Saturday, November 30, 2013

how does read performance gains when in compression?

Read the following interesting discussion in the cassandra mailing list, and think very good explanation and would like to share out.

how does read performance gains when in compression?

Cite from Artur Kronenberg
The way I understand it is that compression gives you the advantage of having to use way less IO and rather use CPU. The bottleneck of reads is usually the IO time you need to read the data from disk. As a figure, we had about 25 reads/s reading from disk, while we get up to 3000 reads/s when we have all of it in cache. So having good compression reduces the amount you have to read from disk. Rather you may spend a little bit more time decompressing data, but this data will be in cache anyways so it won't matter.

Cite from Edward Capriolo
The big * in the explanation: Smaller file size footprint leads to better disk cache, however decompression adds work for the JVM to do and increases the churn of objects in the JVM. Additionally compression block sizes might be 4KB while for some use cases a small row may be 200bytes. This means that internally a large block might be decompressed to get at the row inside of it.

In many use cases compression is a performance win, but not necessarily in all cases. In particular if you are already doing JVM performance tuning issues to stop garbage collection pauses enabling compression could make performance worse.

Thursday, November 14, 2013

Wednesday, November 13, 2013

cassandra 2.0 catch 101 – part2

After playing playing around cassandra 2.0 for quite sometime and in this article, I'm gonna share with you a strange issue that encountered, unable to drop table no matter how.

I'm using the stress tools in cassandra package to create the table column family. It seem that the keyspaces and table created successfully. Following are the output.

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate, interval_key_rate,latency,95th,99.9th,elapsed_time

So everything seem to created okay in cassandra.

cqlsh:system> desc keyspaces;

jw_schema1 system system_traces

cqlsh:system> use jw_schema1;
cqlsh:jw_schema1> desc tables;

Counter1 Counter3 Standard1 Super1 SuperCounter1

cqlsh:jw_schema1> desc table Counter1;

CREATE TABLE "Counter1" (
key blob,
column1 ascii,
value counter,
PRIMARY KEY (key, column1)
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND


when selecting or dropping table in any tables within the keyspaces, things started to become wrong and cassandra server debug log show nothing wrong.
cqlsh:jw_schema1> select * from Counter1;
Bad Request: unconfigured columnfamily counter1

DEBUG [Thrift:105] 2013-11-13 20:55:29,050 (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:29,050 (line 159) request complete

cqlsh:jw_schema1> drop table Counter1;
Bad Request: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.

DEBUG [Thrift:105] 2013-11-13 20:55:59,392 (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:59,393 (line 159) request complete

and using the datastax java binary driver.
public void connect(String node) {
cluster = Cluster.builder().addContactPoint(node)
.withReconnectionPolicy(new ConstantReconnectionPolicy(100L)).build();
session = cluster.connect("jw_schema1");

ExecutionInfo info = session.execute("DROP TABLE Counter1").getExecutionInfo();

Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(
at com.datastax.driver.core.Session.execute(
at com.datastax.driver.core.Session.execute(
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.Responses$Error.asException(
at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(
at com.datastax.driver.core.RequestHandler.setFinalResult(
at com.datastax.driver.core.RequestHandler.onSet(
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(
at org.jboss.netty.util.internal.DeadLockProofWorker$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

So I'm not sure what is gone wrong, but I'end up dropping the keyspace as a work around.
cqlsh:system> drop keyspace jw_schema1;

work around
cqlsh:system> desc keyspaces;

TestKeyspace system system_traces

cqlsh:system> drop keyspace TestKeyspace;
Bad Request: Cannot drop non existing keyspace 'testkeyspace'.
cqlsh:system> drop keyspace "TestKeyspace";
cqlsh:system> desc keyspaces;

system system_traces