Hakan SARIOGLU’s Post


Founder & CEO - SAKARYA UZAY HAVACILIK OTOMASYON SANAYİ YAZILIM SİSTEMLERİ AR-GE TEST MERKEZİ

Understanding and Addressing Cache System Failures: A Detailed Insight

Cache systems are indispensable in modern computing, enabling faster data access and reducing the load on backend systems. However, when not properly managed, they can lead to severe issues that undermine system performance and reliability. Here’s an in-depth look at common cache failures, their causes, and strategies to mitigate them:

1. Thundering Herd Problem
What Happens: When a large number of cache keys expire at the same time, incoming queries miss the cache and overwhelm the database, potentially causing it to crash.
Solutions:
Stagger Expiry Times: Add a random offset to each key's expiration time so that expirations, and the database reloads they trigger, are spread out over time.
Prioritize Core Data: During cache recovery, allow only essential business data to reach the database and block non-critical requests.
Asynchronous Cache Refresh: Refresh popular keys in the background before they expire so they remain available.

2. Cache Penetration
What Happens: Requests for keys that exist in neither the cache nor the database bypass the cache and hit the database, often as a result of malicious traffic or poorly structured queries. This puts pressure on both the cache and the database.
Solutions:
Cache Null Values: Store a null result for non-existent keys so that repeated lookups do not hit the database.
Use Bloom Filters: A lightweight, probabilistic structure that reports whether a key is definitely absent or possibly present, checked before the database is queried.
Input Validation: Prevent unnecessary queries through proper input sanitization and query optimization.
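A minimal sketch of a cache-aside read path that combines three of the mitigations above: jittered expiry, null-value caching, and a Bloom filter check. It assumes an in-process dict standing in for a real cache server; the class names `BloomFilter` and `CacheAside`, the SHA-256-based hashing, and the TTL values (300 s base, up to 60 s jitter, 30 s for nulls) are illustrative choices, not taken from the post.

```python
import hashlib
import random
import time
from typing import Callable, Dict, Iterator, Optional, Tuple


class BloomFilter:
    """Probabilistic set: may return false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class CacheAside:
    """Hypothetical read path; a dict stands in for a real cache server."""

    def __init__(self, load: Callable[[str], Optional[str]], known_keys: BloomFilter,
                 base_ttl: float = 300.0, jitter: float = 60.0, null_ttl: float = 30.0) -> None:
        self.load = load              # database lookup; returns None for missing rows
        self.known_keys = known_keys  # Bloom filter of keys that exist in the database
        self.base_ttl = base_ttl
        self.jitter = jitter
        self.null_ttl = null_ttl
        self.store: Dict[str, Tuple[Optional[str], float]] = {}  # key -> (value, expires_at)
        self.db_hits = 0

    def _ttl(self) -> float:
        # Stagger expiry: base TTL plus a random offset, so keys written together expire apart.
        return self.base_ttl + random.uniform(0, self.jitter)

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit (possibly a cached null)
        if not self.known_keys.might_contain(key):
            return None  # definitely not in the database: skip the query
        self.db_hits += 1
        value = self.load(key)
        # Cache nulls too, with a shorter TTL, so repeated misses stay off the database.
        ttl = self._ttl() if value is not None else self.null_ttl
        self.store[key] = (value, now + ttl)
        return value
```

The two checks complement each other: the Bloom filter stops lookups for keys that were never added, while null caching covers keys the filter still reports as "possibly present" (false positives, or rows deleted after being added, since a standard Bloom filter cannot remove entries).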

ByteByteGo


How can Cache Systems go wrong?

The diagram below shows 4 typical cases where caches can go wrong and their solutions.

1. Thundering herd problem
This happens when a large number of keys in the cache expire at the same time. Query requests then hit the database directly, which overloads it. There are two ways to mitigate this issue: one is to avoid setting the same expiry time for all keys by adding a random number to each key's expiry; the other is to allow only core business data to hit the database and prevent non-core data from accessing the database until the cache is back up.

2. Cache penetration
This happens when the key exists in neither the cache nor the database. The application cannot retrieve relevant data from the database to update the cache, so every request for that key falls through to the database. This creates a lot of pressure on both the cache and the database. There are two suggested solutions. One is to cache a null value for non-existent keys, avoiding hitting the database. The other is to use a Bloom filter to check whether the key exists first; if it doesn't, we can avoid hitting the database.

3. Cache breakdown
This is similar to the thundering herd problem. It happens when a hot key expires and a large number of requests hit the database at once. Since hot keys take up 80% of the queries, we do not set an expiration time for them.

4. Cache crash
This happens when the cache is down and all requests go to the database. There are two ways to solve this problem. One is to set up a circuit breaker: when the cache is down, the breaker trips and application services stop accessing both the cache and the database, so requests fail fast instead of flooding the database. The other is to set up a cache cluster to improve cache availability.

Over to you: Have you run into any of these issues in production?

--
Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): https://bit.ly/bbg-social

#systemdesign #coding #interviewtips
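The circuit breaker from the cache crash case can be sketched as a small state machine: closed while calls succeed, open (failing fast) after a number of consecutive failures, and half-open after a cooldown so a single trial call can test whether the cache has recovered. This is a minimal single-threaded sketch; the class names, the failure threshold, and the cooldown are assumptions for illustration, not ByteByteGo's implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the protected dependency while the breaker is open."""


class CircuitBreaker:
    """Single-threaded sketch: closed -> open after N failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cooldown elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast: touch neither the cache nor the database.
            raise CircuitOpenError("cache unavailable, failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) and restart the cooldown
            raise
        self.failures = 0
        self.opened_at = None  # a successful call closes the breaker
        return result
```

In use, an application would wrap its cache client calls in `breaker.call(...)` and, on `CircuitOpenError`, return a default or error response rather than sending the request to the database. A production breaker would also need to handle concurrent callers in the half-open state, which this sketch does not.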

Hakan SARIOGLU


2w

3. Cache Breakdown
What Happens: When a frequently accessed ("hot") key expires, many requests hit the database simultaneously, causing overload.
Solutions:
Disable Expiry for Hot Keys: Keep critical keys in the cache indefinitely.
Lazy Loading: Let the cache repopulate keys incrementally, on demand, rather than all at once.
Distributed Cache Layers: Use multiple cache layers to spread the load and isolate the impact of key expiration.

4. Cache Crash
What Happens: A cache system failure forces all requests to the database, leading to unmanageable load and potential system downtime.
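For the cache breakdown mitigations, here is a minimal sketch in which keys on a hot list are stored without an expiry, while other keys are loaded lazily on a miss with a TTL. It also adds a per-key mutex (sometimes called "single-flight"), a technique not mentioned in the post, so that when a key is missing only one caller queries the database and concurrent callers wait for that result. The class name `HotKeyAwareCache`, the `hot_keys` set, and the 300 s TTL are hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Optional, Set, Tuple


class HotKeyAwareCache:
    """Hypothetical in-process cache: hot keys never expire, other keys load lazily with a TTL."""

    def __init__(self, load: Callable[[str], str], hot_keys: Set[str], ttl: float = 300.0) -> None:
        self.load = load          # database lookup (assumed to return a non-None value)
        self.hot_keys = hot_keys  # assumption: hot keys are known up front, e.g. from access metrics
        self.ttl = ttl
        self.store: Dict[str, Tuple[str, Optional[float]]] = {}  # key -> (value, expires_at or None)
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _fresh(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= time.monotonic():
            return None  # expired (hot keys have expires_at=None, so they never expire)
        return value

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._fresh(key)
        if value is not None:
            return value
        # Lazy loading: only the requested key is (re)built, on demand.
        # Per-key mutex (added technique, not from the post): one caller queries the
        # database while concurrent callers for the same key wait for its result.
        with self._lock_for(key):
            value = self._fresh(key)  # re-check: another thread may have filled it
            if value is not None:
                return value
            value = self.load(key)
            expires_at = None if key in self.hot_keys else time.monotonic() + self.ttl
            self.store[key] = (value, expires_at)
            return value
```

Never-expiring hot keys trade freshness for protection: they have to be updated or invalidated explicitly when the underlying data changes, which is where the asynchronous cache refresh mentioned earlier fits in.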
