Use Nodetool

Cassandra’s nodetool allows you to narrow problems from the cluster down to a particular node and gives a lot of insight into the state of the Cassandra process itself. There are dozens of useful commands (see nodetool help for all the commands), but briefly some of the most useful for troubleshooting:

Cluster Status

You can use nodetool status to assess status of the cluster:

$ nodetool status <optional keyspace>

Datacenter: dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.1.1  4.69 GiB   1            100.0%            35ea8c9f-b7a2-40a7-b9c5-0ee8b91fdd0e  r1
UN  127.0.1.2  4.71 GiB   1            100.0%            752e278f-b7c5-4f58-974b-9328455af73f  r2
UN  127.0.1.3  4.69 GiB   1            100.0%            9dc1a293-2cc0-40fa-a6fd-9e6054da04a7  r3

In this case we can see that we have three nodes in one datacenter with about 4.6GB of data each and they are all "up". The up/down status of a node is independently determined by every node in the cluster, so you may have to run nodetool status on multiple nodes in a cluster to see the full view.

You can use nodetool status plus a little grep to see which nodes are down:

$ nodetool status | grep -v '^UN'
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
DN  127.0.0.5  105.73 KiB  1            33.3%             df303ac7-61de-46e9-ac79-6e630115fd75  r1

In this case there are two datacenters and there is one node down in datacenter dc2 and rack r1. This may indicate an issue on 127.0.0.5 warranting investigation.

Coordinator Query Latency

You can view latency distributions of coordinator read and write latency to help narrow down latency issues using nodetool proxyhistograms:

$ nodetool proxyhistograms
Percentile       Read Latency      Write Latency      Range Latency   CAS Read Latency  CAS Write Latency View Write Latency
                     (micros)           (micros)           (micros)           (micros)           (micros)           (micros)
50%                    454.83             219.34               0.00               0.00               0.00               0.00
75%                    545.79             263.21               0.00               0.00               0.00               0.00
95%                    654.95             315.85               0.00               0.00               0.00               0.00
98%                    785.94             379.02               0.00               0.00               0.00               0.00
99%                   3379.39            2346.80               0.00               0.00               0.00               0.00
Min                     42.51             105.78               0.00               0.00               0.00               0.00
Max                  25109.16           43388.63               0.00               0.00               0.00               0.00

Here you can see the full latency distribution of reads, writes, range requests (e.g. select * from keyspace.table), CAS read (compare phase of CAS) and CAS write (set phase of compare and set). These can be useful for narrowing down high level latency problems, for example in this case if a client had a 20 millisecond timeout on their reads they might experience the occasional timeout from this node but less than 1% (since the 99% read latency is 3.3 milliseconds < 20 milliseconds).

Local Query Latency

If you know which table is having latency/error issues, you can use nodetool tablehistograms to get a better idea of what is happening locally on a node:

$ nodetool tablehistograms keyspace table
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             73.46            182.79             17084               103
75%             1.00             88.15            315.85             17084               103
95%             2.00            126.93            545.79             17084               103
98%             2.00            152.32            654.95             17084               103
99%             2.00            182.79            785.94             17084               103
Min             0.00             42.51             24.60             14238                87
Max             2.00          12108.97          17436.92             17084               103

This shows you percentile breakdowns particularly critical metrics.

The first column contains how many sstables were read per logical read. A very high number here indicates that you may have chosen the wrong compaction strategy, e.g. SizeTieredCompactionStrategy typically has many more reads per read than LeveledCompactionStrategy does for update heavy workloads.

The second column shows you a latency breakdown of local write latency. In this case we see that while the p50 is quite good at 73 microseconds, the maximum latency is quite slow at 12 milliseconds. High write max latencies often indicate a slow commitlog volume (slow to fsync) or large writes that quickly saturate commitlog segments.

The third column shows you a latency breakdown of local read latency. We can see that local Cassandra reads are (as expected) slower than local writes, and the read speed correlates highly with the number of sstables read per read.

The fourth and fifth columns show distributions of partition size and column count per partition. These are useful for determining if the table has on average skinny or wide partitions and can help you isolate bad data patterns. For example if you have a single cell that is 2 megabytes, that is probably going to cause some heap pressure when it’s read.

Threadpool State

You can use nodetool tpstats to view the current outstanding requests on a particular node. This is useful for trying to find out which resource (read threads, write threads, compaction, request response threads) the Cassandra process lacks. For example:

$ nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              2         0             12         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0           1940         0                 0
MutationStage                          0         0              0         0                 0
GossipStage                            0         0          10293         0                 0
Repair-Task                            0         0              0         0                 0
RequestResponseStage                   0         0             16         0                 0
ReadRepairStage                        0         0              0         0                 0
CounterMutationStage                   0         0              0         0                 0
MemtablePostFlush                      0         0             83         0                 0
ValidationExecutor                     0         0              0         0                 0
MemtableFlushWriter                    0         0             30         0                 0
ViewMutationStage                      0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0
MemtableReclaimMemory                  0         0             30         0                 0
PendingRangeCalculator                 0         0             11         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0              0         0                 0
Native-Transport-Requests              0         0            192         0                 0
MigrationStage                         0         0             14         0                 0
PerDiskMemtableFlushWriter_0           0         0             30         0                 0
Sampler                                0         0              0         0                 0
ViewBuildExecutor                      0         0              0         0                 0
InternalResponseStage                  0         0              0         0                 0
AntiEntropyStage                       0         0              0         0                 0

Message type           Dropped                  Latency waiting in queue (micros)
                                             50%               95%               99%               Max
READ                         0               N/A               N/A               N/A               N/A
RANGE_SLICE                  0              0.00              0.00              0.00              0.00
_TRACE                       0               N/A               N/A               N/A               N/A
HINT                         0               N/A               N/A               N/A               N/A
MUTATION                     0               N/A               N/A               N/A               N/A
COUNTER_MUTATION             0               N/A               N/A               N/A               N/A
BATCH_STORE                  0               N/A               N/A               N/A               N/A
BATCH_REMOVE                 0               N/A               N/A               N/A               N/A
REQUEST_RESPONSE             0              0.00              0.00              0.00              0.00
PAGED_RANGE                  0               N/A               N/A               N/A               N/A
READ_REPAIR                  0               N/A               N/A               N/A               N/A

This command shows you all kinds of interesting statistics. The first section shows a detailed breakdown of threadpools for each Cassandra stage, including how many threads are current executing (Active) and how many are waiting to run (Pending). Typically if you see pending executions in a particular threadpool that indicates a problem localized to that type of operation. For example if the RequestResponseState queue is backing up, that means that the coordinators are waiting on a lot of downstream replica requests and may indicate a lack of token awareness, or very high consistency levels being used on read requests (for example reading at ALL ties up RF RequestResponseState threads whereas LOCAL_ONE only uses a single thread in the ReadStage threadpool). On the other hand if you see a lot of pending compactions that may indicate that your compaction threads cannot keep up with the volume of writes and you may need to tune either the compaction strategy or the concurrent_compactors or compaction_throughput options.

The second section shows drops (errors) and latency distributions for all the major request types. Drops are cumulative since process start, but if you have any that indicate a serious problem as the default timeouts to qualify as a drop are quite high (~5-10 seconds). Dropped messages often warrants further investigation.

Compaction State

As Cassandra is a LSM datastore, Cassandra sometimes has to compact sstables together, which can have adverse effects on performance. In particular, compaction uses a reasonable quantity of CPU resources, invalidates large quantities of the OS page cache, and can put a lot of load on your disk drives. There are great os tools <os-iostat> to determine if this is the case, but often it’s a good idea to check if compactions are even running using nodetool compactionstats:

$ nodetool compactionstats
pending tasks: 2
- keyspace.table: 2

id                                   compaction type keyspace table completed total    unit  progress
2062b290-7f3a-11e8-9358-cd941b956e60 Compaction      keyspace table 21848273  97867583 bytes 22.32%
Active compaction remaining time :   0h00m04s

In this case there is a single compaction running on the keyspace.table table, has completed 21.8 megabytes of 97 and Cassandra estimates (based on the configured compaction throughput) that this will take 4 seconds. You can also pass -H to get the units in a human readable format.

Generally each running compaction can consume a single core, but the more you do in parallel the faster data compacts. Compaction is crucial to ensuring good read performance so having the right balance of concurrent compactions such that compactions complete quickly but don’t take too many resources away from query threads is very important for performance. If you notice compaction unable to keep up, try tuning Cassandra’s concurrent_compactors or compaction_throughput options.

Cassandra Documentation

Version: