Sunday, November 23, 2014

Investigate into why key cache in apache cassandra 1.0.8 gets reduced

Today, we will investigate into apache cassandra 1.0.8 when and why it reduce configured key cache. If you run the command nodetool cfstats. One of the statistics would probably interest you. I paste the snippet below.
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.9655797101449275
Row cache: disabled

After cassandra instance has been running for sometime, and you start to notice that the key cache capacity has gone down.
Key cache capacity: 150000
Key cache size: 150000
Key cache hit rate: 0.962251615630851
Row cache: disabled

As seen above, the initial capacity for this column family has 20,000 total key for cache. Currently, all object (that is 20,000) occupied fully in the key cache assigned. The hit rate is 96% which is very good statistics. So after a while, what had happened and why was it reduce? Let's investigate into the log file.
 WARN [ScheduledTasks:1] 2014-02-02 00:46:46,384 (line 187) Reducing MyColumnFamily KeyCache capacity from 200000 to 150000 to reduce memory pressure

Apparently memory is not enough at this point of time and the key cache is reduced to free up more memory for the cassandra instance. Let's look at the cassandra yaml file if there is any description for the key cache.
# emergency pressure valve #2: the first time heap usage after a full
# (CMS) garbage collection is above this fraction of the max,
# Cassandra will reduce cache maximum _capacity_ to the given fraction
# of the current _size_. Should usually be set substantially above
# flush_largest_memtables_at, since that will have less long-term
# impact on the system.
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6

There are two configurations that reduce the cache size. When memory heap usage at 85%, key cache is reduced to 60% of its initial value. So now we dive deeper into the code to see what happened. Let's read into class GCInspector.
double usage = (double) memoryUsed / memoryMax;

if (memoryUsed > DatabaseDescriptor.getReduceCacheSizesAt() * memoryMax && !cacheSizesReduced)
cacheSizesReduced = true;
logger.warn("Heap is " + usage + " full. You may need to reduce memtable and/or cache sizes. Cassandra is now reducing cache sizes to free up memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically");

When memory used is greater than reduce_cache_sizes_at (configured in cassanra.yaml, value at 0.85) multiply maximum memory in the heap and cache has not been reduced before. For example, if jvm is assigned with 8GB of heap, so the if statement evaluation become valid under such arithmetic, memory usage greater than 6.8GB when cache size has not been reduced before.

When the condition become true, StorageService will start to reduce cache size. A simple for loop over all column families to reduce the cache size. As seen here, there are two caches are being reduced. The rowcache and the keycache. Because we did not enable row cache and also not a focus on this study, I'll leave as an exercise for you. The investigation continue on the keyCache.reduceCacheSize();. As the snippet of code below shown.
public void reduceCacheSize()
if (getCapacity() > 0)
int newCapacity = (int) (DatabaseDescriptor.getReduceCacheCapacityTo() * size());
logger.warn(String.format("Reducing %s %s capacity from %d to %s to reduce memory pressure",
cfName, cacheType, getCapacity(), newCapacity));

So if the capacity is initially assigned to a value larger than 0, then a new capacity is set. The new capacity is such that, reduce_cache_capacity_to (default at cassandra yaml, 0.60) multiply with the current size of the cache. For example, if the cache is occupied at 20000 x 0.60, the new value will be the new cache capacity at 12000.

This wrap up the investigation. Final thought, because the memory consumption is exceed certain amount of threshold, this emergency pressure valve kicked in. To fix immediate, an increase heap for cassandra instance will solve, but the correct would probably reduce node load or increase node for the cluster. When cache capacity is reduced, expect read become slower too and in data storage perspective, speed and performance is everything and reduced cache is definitely an impact to the cluster.

No comments:

Post a Comment