Friday, December 20, 2013

A maven introduction.

Recently I have been working on an open source project and stumbled upon Maven. I come from an Ant background, and I guessed that getting started with Maven should not be that difficult.

When you see a pom.xml file in a Java project, that tells you it is a Maven configuration file. For instance, a simple Java project would look like this:
project home
+-- src
|   +-- main
|   |   +-- java
|   |   +-- resources
|   +-- test
|       +-- java
|       +-- resources
+-- target
+-- pom.xml

The most basic command, and the one you will probably use most often, is

mvn package

With the above command, mvn will compile your classes, run any tests, and package the deliverable code and resources into target/my-app-1.0.jar. If mvn produces this jar, that should be enough, and the developer can concentrate on the Java project itself.

But if you are adventurous and want to know more about Maven, read on. Maven has several phases that you can invoke from the command line. The following are the phases of the standard Maven lifecycle, in order.

  • process-resources

  • compile

  • process-test-resources

  • test-compile

  • test

  • package

  • install

  • deploy
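Invoking a phase also runs every phase before it, which is why mvn package compiles and tests first. Here is a minimal Java sketch of that ordering, using only the phase names listed above. The class and method names are made up for illustration and are not part of Maven.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the ordered phase list above; hypothetical, not a Maven API. */
final class LifecycleSketch {
    // Phase names copied from the list above, in execution order.
    static final List<String> PHASES = Collections.unmodifiableList(Arrays.asList(
            "process-resources", "compile", "process-test-resources",
            "test-compile", "test", "package", "install", "deploy"));

    /** Returns every phase that runs when the requested phase is invoked. */
    static List<String> phasesUpTo(String requested) {
        int index = PHASES.indexOf(requested);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown phase: " + requested);
        }
        // subList is end-exclusive, so add one to include the requested phase.
        return PHASES.subList(0, index + 1);
    }

    public static void main(String[] args) {
        // Lists the phases from process-resources through package.
        System.out.println(phasesUpTo("package"));
    }
}
```

So mvn install, for example, would go through everything mvn package does and then one more phase.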

To satisfy the library dependencies of your project, you specify the coordinates of each library it depends on in pom.xml. You can use  this site to search for the libraries it depends on.

I hope this serves as a simple starting point for using Maven in your Java project. If you have reached here and have further questions, see this link  and this link .

Changing ElasticSearch logging level by updating cluster setting.

In this article, we are going to learn how to update the logging level for all the nodes in the elasticsearch cluster. Because logging is crucial to understanding the system's behaviour, from time to time we change the logging level in elasticsearch.yml and restart the elasticsearch instance so that the new level is picked up. Unfortunately, a restart in live production takes some time (because of shard recovery), so this is not efficient.

Luckily, there is a cluster setting which allows the logging level to be changed on the fly.

So if you want to understand what is happening in a cluster node, you can change the logging level

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "logger.cluster.service" : "DEBUG"
  }
}'

and tail the elasticsearch log; you should see new log entries start to appear. Because logging is managed by the class NodeSettingsService, you should look into the elasticsearch packages that are initialized with this class. Example elasticsearch packages: cluster.service, cluster.routing.allocation.allocator, indices.ttl.IndicesTTLService, etc. Note that the package prefix, org.elasticsearch, is not needed when the setting is updated.
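To make the naming rule concrete, here is a small Java sketch that turns a package or class name into the setting key and builds the JSON body for the curl call above. The class and method names are hypothetical, and I am assuming the full prefix in the source tree is org.elasticsearch. It only builds strings; it does not talk to a cluster.

```java
/** Sketch only: derives the cluster-settings key and JSON body for a logger. */
final class LoggerSettingSketch {
    // Assumed full package prefix of the elasticsearch classes.
    private static final String PACKAGE_PREFIX = "org.elasticsearch.";

    /** Drops the package prefix (if present) and prepends "logger.". */
    static String settingKey(String packageOrClass) {
        String name = packageOrClass.startsWith(PACKAGE_PREFIX)
                ? packageOrClass.substring(PACKAGE_PREFIX.length())
                : packageOrClass;
        return "logger." + name;
    }

    /** Builds a transient-settings body like the one sent with curl above. */
    static String transientBody(String packageOrClass, String level) {
        return "{\"transient\":{\"" + settingKey(packageOrClass)
                + "\":\"" + level + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(transientBody("org.elasticsearch.cluster.service", "DEBUG"));
    }
}
```

The printed body can be passed to curl with -d in place of the hand-written JSON.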

If you want more information, this link should be of more help.

Friday, December 6, 2013

cassandra 2.0 catch 101 – part4

It has been a while since my last post, mainly due to an abundance of work. :-(  In this article, I'm going to share the lessons learned on cassandra 2.0.2 using cqlsh 4.1.0.

Recently we had to remove all the files in /var/lib/cassandra/ simply because something broke when we upgraded from cassandra 2.0.0 to 2.0.2, and nobody on the team had the time to go into the details. Since this is a just4fun cluster, we agreed to remove the directory /var/lib/cassandra/ and start the cluster using cassandra 2.0.2.

To better understand cassandra, we will take a detailed look at alter table. But before that, let's create a new keyspace and table.
cqlsh> CREATE KEYSPACE jw_schema1 WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};

and the corresponding entry in the cassandra system.log

INFO [Thrift:7] 2013-12-06 16:17:21,902 (line 217) Create new Keyspace: jw_schema1, rep strategy:SimpleStrategy{}, strategy_options: {replication_factor=3}, durable_writes: true

cassandra 2.0 catch 101 – part3

Many of us come from a mysql / postgres background and are used to talking to the database quickly from the command line. Commenting in cassandra cql works differently from what you may be used to in sql. Read the example below.
cqlsh:jw_schema1> #select * from users;
Invalid syntax at line 1, char 1
#select * from users;
cqlsh:jw_schema1> --select * from users;
cqlsh:jw_schema1> -select * from users;
Bad Request: line 1:0 no viable alternative at input '-'
cqlsh:jw_schema1> -- select * from users;

As you can see, the hash glyph does not work in cqlsh; you need to put double dashes in front of the comment you want to make.

Voila! =)

Saturday, November 30, 2013

how does read performance gains when in compression?

I read the following interesting discussion on the cassandra mailing list, thought it was a very good explanation, and would like to share it.

how does read performance gains when in compression?

Cite from Artur Kronenberg
The way I understand it is that compression gives you the advantage of having to use way less IO and rather use CPU. The bottleneck of reads is usually the IO time you need to read the data from disk. As a figure, we had about 25 reads/s reading from disk, while we get up to 3000 reads/s when we have all of it in cache. So having good compression reduces the amount you have to read from disk. Rather you may spend a little bit more time decompressing data, but this data will be in cache anyways so it won't matter.

Cite from Edward Capriolo
The big * in the explanation: Smaller file size footprint leads to better disk cache, however decompression adds work for the JVM to do and increases the churn of objects in the JVM. Additionally compression block sizes might be 4KB while for some use cases a small row may be 200bytes. This means that internally a large block might be decompressed to get at the row inside of it.

In many use cases compression is a performance win, but not necessarily in all cases. In particular if you are already doing JVM performance tuning issues to stop garbage collection pauses enabling compression could make performance worse.
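To put a rough number on the block-size point, the sketch below works out how many bytes get decompressed per byte of row data, using the 4KB block and 200-byte row from the quote. It assumes the row sits entirely inside one block and ignores everything else (caching, compression ratio). The class and method names are made up for illustration.

```java
/** Back-of-the-envelope arithmetic for the block-size remark quoted above. */
final class ChunkReadSketch {
    /** Bytes decompressed for every byte of row data that was actually wanted. */
    static double readAmplification(int blockBytes, int rowBytes) {
        if (blockBytes <= 0 || rowBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (double) blockBytes / rowBytes;
    }

    public static void main(String[] args) {
        // 4KB block and 200-byte row, the figures from the quote.
        System.out.printf("%.2fx%n", readAmplification(4 * 1024, 200));
    }
}
```

With those figures the whole block is roughly twenty times larger than the row being fetched, which is the extra JVM work the quote is talking about.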

Thursday, November 14, 2013

Wednesday, November 13, 2013

cassandra 2.0 catch 101 – part2

After playing around with cassandra 2.0 for quite some time, in this article I'm going to share a strange issue that I encountered: being unable to drop a table no matter what.

I'm using the stress tool in the cassandra package to create the tables (column families). It seems that the keyspace and tables were created successfully. The following is the output.

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate, interval_key_rate,latency,95th,99.9th,elapsed_time

So everything seems to have been created okay in cassandra.

cqlsh:system> desc keyspaces;

jw_schema1 system system_traces

cqlsh:system> use jw_schema1;
cqlsh:jw_schema1> desc tables;

Counter1 Counter3 Standard1 Super1 SuperCounter1

cqlsh:jw_schema1> desc table Counter1;

CREATE TABLE "Counter1" (
key blob,
column1 ascii,
value counter,
PRIMARY KEY (key, column1)
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND


When selecting from or dropping any table within the keyspace, things started to go wrong, and the cassandra server debug log showed nothing wrong.
cqlsh:jw_schema1> select * from Counter1;
Bad Request: unconfigured columnfamily counter1

DEBUG [Thrift:105] 2013-11-13 20:55:29,050 (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:29,050 (line 159) request complete

cqlsh:jw_schema1> drop table Counter1;
Bad Request: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.

DEBUG [Thrift:105] 2013-11-13 20:55:59,392 (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:59,393 (line 159) request complete

and using the datastax java binary driver:
public void connect(String node) {
    cluster = Cluster.builder().addContactPoint(node)
            .withReconnectionPolicy(new ConstantReconnectionPolicy(100L)).build();
    session = cluster.connect("jw_schema1");

    // Same unquoted table name as typed in cqlsh above.
    ExecutionInfo info = session.execute("DROP TABLE Counter1").getExecutionInfo();
}

Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(
at com.datastax.driver.core.Session.execute(
at com.datastax.driver.core.Session.execute(
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.Responses$Error.asException(
at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(
at com.datastax.driver.core.RequestHandler.setFinalResult(
at com.datastax.driver.core.RequestHandler.onSet(
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(
at org.jboss.netty.util.internal.DeadLockProofWorker$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

So I'm not sure what went wrong, but I ended up dropping the keyspace as a workaround.
cqlsh:system> drop keyspace jw_schema1;

Workaround: put the mixed-case name in double quotes.
cqlsh:system> desc keyspaces;

TestKeyspace system system_traces

cqlsh:system> drop keyspace TestKeyspace;
Bad Request: Cannot drop non existing keyspace 'testkeyspace'.
cqlsh:system> drop keyspace "TestKeyspace";
cqlsh:system> desc keyspaces;

system system_traces
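The transcript above fits the way CQL treats identifiers, as far as I understand it: an unquoted name is folded to lower case, while a double-quoted name keeps its case. Below is a minimal Java sketch of that rule. The class and method names are hypothetical, this is not the cassandra or driver code, and escaped quotes inside a quoted name are not handled.

```java
import java.util.Locale;

/** Sketch of how a name typed in cqlsh appears to be resolved. */
final class CqlIdentifierSketch {
    /** Returns the name the server would look up for the given identifier. */
    static String resolve(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            // Quoted: strip the quotes and keep the case exactly as typed.
            return identifier.substring(1, identifier.length() - 1);
        }
        // Unquoted: case-insensitive, so fold to lower case.
        return identifier.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(resolve("TestKeyspace"));     // name looked up when unquoted
        System.out.println(resolve("\"TestKeyspace\"")); // name looked up when quoted
    }
}
```

If that is right, the same quoting should also apply to the stress tables, for example drop table "Counter1"; instead of dropping the whole keyspace.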