So, to answer question like, how do I determine if my cluster is at bottleneck? To answer this type of question, you will need to have the measuring tools ready and measure over time. Meaning that you need to display statistics in graphical form and with the history, it should give an indication of the cluster performance.
Because the topic will grow huge, hence, we will focus on a specific metric. This article gonna inspect the metric exposed by the jmx beans. In order to inspect the jmx metrics, you will need a jmx client. There is a gui jmx client that comes with the jdk, that is jconsole. Because nature of this article, I would suggest you go for cli jmx client, for example jmxterm. You can read introduction of jmxterm here.
There are many metrics exposed by cassandra jmx beans. But we will focus on bean org.apache.cassandra.db:type=CompactionManager.
If you are using jmxterm, you can read the output below:
$ cat test.script
$ java -jar jmxterm-1.0-alpha-4-uber.jar -i test.script
Welcome to JMX terminal. Type "help" for available commands.
#Connection to localhost:7199 is opened
#bean is set to org.apache.cassandra.db:type=CompactionManager
#mbean = org.apache.cassandra.db:type=CompactionManager:
PendingTasks = 0;
So if you plot PendingTasks in a graph over time with a periodic interval, it should give insight to your cluster performance. You can also plot the statistics output from nodetool tpstats. I would suggest also, you plot Message type dropped as those metrics indicate over time that the performance is impacted. If you have a stock cassandra settings, you will probably want to fine tune to your node at this point after this investigation and analysis on the graph.
There is no best strategies, as mentioned earlier, you need experience and there are many other metrics measuring tools, example sar , iostats, top, your application measurement. So it take some times to even master all of these but it is crucial if you managed a production cluster.