Friday, July 18, 2014

Cassandra discarding obsolete commit log

Often times, I noticed the log show the following lines in apache cassandra 1.0.8
INFO [COMMIT-LOG-WRITER] 2014-06-30 08:58:55,039 (line 490) Discarding obsolete commit log:CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1404095354976.log)

It looks nothing to worry about as the log level is INFO. However, I often see this and just to check to understand what it really is.

This log is written by
private void maybeDiscardSegment(CommitLogSegment segment, Iterator<CommitLogSegment> iter)
if (segment.isSafeToDelete() && iter.hasNext())
{"Discarding obsolete commit log:" + segment);
// usually this will be the first (remaining) segment, but not always, if segment A contains
// writes to a CF that is unflushed but is followed by segment B whose CFs are all flushed.
if (logger.isDebugEnabled())
logger.debug("Not safe to delete commit log " + segment + "; dirty is " + segment.dirtyString() + "; hasNext: " + iter.hasNext());

So why discard segment? From the code documentation

Commit Log tracks every write operation into the system. The aim of the commit log is to be able to successfully recover data that was not stored to disk via the Memtable. Every Commit Log maintains a header represented by the abstraction CommitLogHeader. The header contains a bit array and an array of longs and both the arrays are of size, #column families for the Table, the Commit Log represents.

Whenever a ColumnFamily is written to, for the first time its bit flag is set to one in the CommitLogHeader. When it is flushed to disk by the Memtable its corresponding bit in the header is set to zero. This helps track which CommitLogs can be thrown away as a result of Memtable flushes. Additionally, when a ColumnFamily is flushed and written to disk, its entry in the array of longs is updated with the offset in the Commit Log file where it was written. This helps speed up recovery since we can seek to these offsets and start processing the commit log.

Every Commit Log is rolled over everytime it reaches its threshold in size; the new log inherits the "dirty" bits from the old.

Over time there could be a number of commit logs that would be generated. To allow cleaning up non-active commit logs, whenever we flush a column family and update its bit flag in the active CL, we take the dirty bit array and bitwise & it with the headers of the older logs. If the result is 0, then it is safe to remove the older file. (Since the new CL inherited the old's dirty bitflags, getting a zero for any given bit in the anding means that either the CF was clean in the old CL or it has been flushed since the switch in the new.)

So that's pretty clear why commit log is remove. Tracing the call upward,
+--- CommitLog.discardCompletedSegmentsInternal()
+--- CommitLog.discardCompletedSegments()
+--- ColumnFamilyStore.maybeSwitchMemtable()
+--- ColumnFamilyStore.truncate()

So whenever a column family is truncated, the commit log is discard (remove) as well which make sense. When maybeSwitchMemtable is executed, that is, memtable is flushed, after that, its segments get discard (remove) as well.

This logging message is just fine as it is part of cassandra operation. That's it for this article.

No comments:

Post a Comment