Monitoring
Metrics in Cassandra are managed using the Dropwizard Metrics library. These metrics can be queried via JMX or pushed to external monitoring systems using a number of built-in and third-party reporter plugins.
Metrics are collected for a single node. It’s up to the operator to use an external monitoring system to aggregate them.
Metric Types
All metrics reported by Cassandra fit into one of the following types.
Gauge
-
An instantaneous measurement of a value.
Counter
-
A gauge for an AtomicLong instance. Typically this is consumed by monitoring the change since the last call to see if there is a large increase compared to the norm.
Histogram
-
Measures the statistical distribution of values in a stream of data. In addition to minimum, maximum, mean, etc., it also measures median, 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles.
Timer
-
Measures both the rate that a particular piece of code is called and the histogram of its duration.
Latency
-
Special type that tracks latency (in microseconds) with a Timer plus a Counter that tracks the total latency accrued since starting. The latter is useful if you track the change in total latency since the last check. Each metric name of this type will have 'Latency' and 'TotalLatency' appended to it.
Meter
-
A meter metric which measures mean throughput and one-, five-, and fifteen-minute exponentially-weighted moving average throughputs.
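Because Counter values and the TotalLatency half of a Latency metric accumulate from node startup, a monitoring system usually works with the difference between two polls. The Java sketch below computes the mean latency over one polling interval as the change in TotalLatency (microseconds) divided by the change in the Latency timer's count. The `LatencyDelta` class, its `Sample` holder, and the rule that a negative delta means the node restarted (so the counters began again at zero) are illustrative assumptions; fetching the raw values is left to JMX or a reporter.

```java
/**
 * Sketch: deriving interval statistics from cumulative Cassandra metrics.
 * Assumes two polls of one Latency metric: its ...TotalLatency counter (microseconds)
 * and the event count of its ...Latency timer. Names here are hypothetical.
 */
final class LatencyDelta {

    /** One poll of a Latency metric. */
    static final class Sample {
        final long totalLatencyMicros; // value of the ...TotalLatency counter
        final long count;              // events recorded by the ...Latency timer

        Sample(long totalLatencyMicros, long count) {
            this.totalLatencyMicros = totalLatencyMicros;
            this.count = count;
        }
    }

    /** Change in a cumulative counter; a negative difference is treated as a restart from zero. */
    static long counterDelta(long previous, long current) {
        long delta = current - previous;
        return delta >= 0 ? delta : current;
    }

    /** Mean latency in microseconds of the events recorded between two polls (0 if none). */
    static double meanLatencyMicros(Sample previous, Sample current) {
        long events = counterDelta(previous.count, current.count);
        if (events == 0) {
            return 0.0; // no requests during the interval
        }
        long micros = counterDelta(previous.totalLatencyMicros, current.totalLatencyMicros);
        return (double) micros / events;
    }

    public static void main(String[] args) {
        Sample t0 = new Sample(1_000_000L, 2_000L);
        Sample t1 = new Sample(1_600_000L, 3_000L);
        // 600,000 us of latency spread over 1,000 new requests
        System.out.println(meanLatencyMicros(t0, t1) + " us");
    }
}
```

The same `counterDelta` idea applies to any plain Counter: alert on the size of the change between polls rather than on the raw total.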
Table Metrics
Each table in Cassandra has metrics responsible for tracking its state and performance.
The metric names are all appended with the specific Keyspace and Table name.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Table.<MetricName>.<Keyspace>.<Table>
- JMX MBean
-
org.apache.cassandra.metrics:type=Table keyspace=<Keyspace> scope=<Table> name=<MetricName>
Note
There is a special table called ‘all’ without a keyspace. This represents the aggregation of metrics across all tables and keyspaces on the node.
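As an illustration of the two name formats above, the following Java sketch builds both names for one table metric using only the JDK's `javax.management.ObjectName`. In the string form of a real JMX ObjectName the key properties are separated by commas, which is how the sketch writes them. `ReadLatency`, `ks1`, and `users` are example values, and the class name is hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: the two reported names of one Cassandra table metric. */
final class TableMetricNames {

    /** Dotted metric name, as used by metric reporters. */
    static String metricName(String metric, String keyspace, String table) {
        return "org.apache.cassandra.metrics.Table." + metric + "." + keyspace + "." + table;
    }

    /** JMX ObjectName; in string form the key properties are comma-separated. */
    static ObjectName mbeanName(String metric, String keyspace, String table) {
        String name = "org.apache.cassandra.metrics:type=Table"
                + ",keyspace=" + keyspace
                + ",scope=" + table
                + ",name=" + metric;
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            // Thrown if a value contains characters such as ',', '=' or ':'.
            throw new IllegalArgumentException("Invalid ObjectName: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Example values only.
        System.out.println(metricName("ReadLatency", "ks1", "users"));
        System.out.println(mbeanName("ReadLatency", "ks1", "users"));
    }
}
```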
Keyspace Metrics
Each keyspace in Cassandra has metrics responsible for tracking its state and performance. Most of these metrics are the same as the Table Metrics above, aggregated at the keyspace level.
Reported name format:
ThreadPool Metrics
Cassandra splits work of a particular type into its own thread pool. This provides back-pressure and asynchrony for requests on a node. It’s important to monitor the state of these thread pools since they can tell you how saturated a node is. The metric names are all appended with the specific thread pool name.
Reported name format:
The following thread pools can be monitored.
Client Request Metrics
Client requests have their own set of metrics that encapsulate the work happening at the coordinator level. Different types of client requests are broken down by request type.
Reported name format:
Cache Metrics
Cassandra caches have metrics to track the effectiveness of the caches.
Though the Reported name format:
The following caches are covered:
Misses and MissLatency are only defined for the ChunkCache.
CQL Metrics
Metrics specific to CQL prepared statement caching.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.CQL.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=CQL name=<MetricName>
Name | Type | Description |
---|---|---|
PreparedStatementsCount | Gauge<Integer> | Number of cached prepared statements. |
PreparedStatementsEvicted | Counter | Number of prepared statements evicted from the prepared statement cache. |
PreparedStatementsExecuted | Counter | Number of prepared statements executed. |
RegularStatementsExecuted | Counter | Number of non-prepared statements executed. |
PreparedStatementsRatio | Gauge<Double> | Percentage of statements that are prepared vs. unprepared. |
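PreparedStatementsRatio can be cross-checked from the two execution counters. The sketch below assumes the ratio is PreparedStatementsExecuted / (PreparedStatementsExecuted + RegularStatementsExecuted), expressed as a fraction between 0 and 1; the class and method names are hypothetical.

```java
/** Sketch: share of executed statements that were prepared (assumed formula). */
final class PreparedRatio {

    /**
     * preparedExecuted / (preparedExecuted + regularExecuted),
     * or NaN when no statements have been executed yet.
     */
    static double preparedFraction(long preparedExecuted, long regularExecuted) {
        long total = preparedExecuted + regularExecuted;
        return total == 0 ? Double.NaN : (double) preparedExecuted / total;
    }

    public static void main(String[] args) {
        System.out.println(preparedFraction(900L, 100L)); // 0.9
    }
}
```

Feeding it the change in each counter between two polls, rather than the raw totals, shows recent client behavior instead of the average since startup.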
DroppedMessage Metrics
Metrics specific to tracking dropped messages for different types of requests. Dropped writes are stored and retried by Hinted Handoff.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.DroppedMessage.<MetricName>.<Type>
- JMX MBean
-
org.apache.cassandra.metrics:type=DroppedMessage scope=<Type> name=<MetricName>
Name | Type | Description |
---|---|---|
CrossNodeDroppedLatency | Timer | The dropped latency across nodes. |
InternalDroppedLatency | Timer | The dropped latency within node. |
Dropped | Meter | Number of dropped messages. |
The different types of messages tracked are:
Name | Description |
---|---|
BATCH_STORE | Batchlog write |
BATCH_REMOVE | Batchlog cleanup (after being successfully applied) |
COUNTER_MUTATION | Counter writes |
HINT | Hint replay |
MUTATION | Regular writes |
READ | Regular reads |
READ_REPAIR | Read repair |
PAGED_SLICE | Paged read |
RANGE_SLICE | Token range read |
REQUEST_RESPONSE | RPC Callbacks |
_TRACE | Tracing writes |
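Because every message type is exposed under the same MBean name with a different scope, a single JMX pattern can select all Dropped meters at once. The sketch below builds such a pattern with `javax.management.ObjectName` and checks names against it locally; the class name and helper methods are hypothetical, and the names are written in the comma-separated ObjectName form.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: selecting every DroppedMessage "Dropped" meter with one JMX pattern. */
final class DroppedMessagePattern {

    static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    /** Property-list pattern: matches the Dropped meter for any scope (message type). */
    static final ObjectName ALL_DROPPED =
            parse("org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*");

    /** Name of the Dropped meter for one message type, e.g. "MUTATION". */
    static ObjectName droppedFor(String messageType) {
        return parse("org.apache.cassandra.metrics:type=DroppedMessage,scope="
                + messageType + ",name=Dropped");
    }

    public static void main(String[] args) {
        for (String type : new String[] {"MUTATION", "READ", "HINT"}) {
            ObjectName name = droppedFor(type);
            System.out.println(name + " matches: " + ALL_DROPPED.apply(name));
        }
    }
}
```

Against a live node, passing the pattern to `MBeanServerConnection.queryNames(pattern, null)` should return one name per registered message type.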
Streaming Metrics
Metrics reported during Streaming operations, such as repair, bootstrap, and rebuild.
These metrics are specific to a peer endpoint, with the source node being the node you are pulling the metrics from.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Streaming.<MetricName>.<PeerIP>
- JMX MBean
-
org.apache.cassandra.metrics:type=Streaming scope=<PeerIP> name=<MetricName>
Name | Type | Description |
---|---|---|
IncomingBytes | Counter | Number of bytes streamed to this node from the peer. |
OutgoingBytes | Counter | Number of bytes streamed to the peer endpoint from this node. |
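IncomingBytes and OutgoingBytes are cumulative counters, so a transfer rate for a peer comes from two samples taken a known time apart. This Java sketch computes bytes per second from such samples; the class name, the millisecond timestamps, and the restart handling (treating a drop in the counter as a reset to zero) are assumptions for illustration.

```java
/** Sketch: turning two samples of a per-peer byte counter into a transfer rate. */
final class StreamingRate {

    /**
     * Average bytes per second between two polls of IncomingBytes or OutgoingBytes
     * for one peer. Timestamps are in milliseconds (e.g. System.currentTimeMillis()).
     */
    static double bytesPerSecond(long previousBytes, long previousMillis,
                                 long currentBytes, long currentMillis) {
        long elapsedMillis = currentMillis - previousMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        long bytes = currentBytes - previousBytes;
        if (bytes < 0) {
            bytes = currentBytes; // counter restarted from zero
        }
        return bytes * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // 50 MiB received from one peer over 10 seconds
        double rate = bytesPerSecond(0L, 0L, 50L * 1024 * 1024, 10_000L);
        System.out.printf("%.1f MiB/s%n", rate / (1024 * 1024));
    }
}
```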
Compaction Metrics
Metrics specific to Compaction work.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Compaction.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=Compaction name=<MetricName>
Name | Type | Description |
---|---|---|
BytesCompacted | Counter | Total number of bytes compacted since server [re]start. |
PendingTasks | Gauge<Integer> | Estimated number of compactions remaining to perform. |
CompletedTasks | Gauge<Long> | Number of completed compactions since server [re]start. |
TotalCompactionsCompleted | Meter | Throughput of completed compactions since server [re]start. |
PendingTasksByTableName | Gauge<Map<String, Map<String, Integer>>> | Estimated number of compactions remaining to perform, grouped by keyspace and then table name. This info is also kept in Table Metrics. |
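PendingTasksByTableName returns a nested map of keyspace to table to pending compaction count. Assuming the value has been fetched into a `Map<String, Map<String, Integer>>`, this Java sketch totals it and finds the table with the most pending work; the class name and the sample data in `main` are hypothetical.

```java
import java.util.Map;

/** Sketch: summarising the nested map exposed by PendingTasksByTableName. */
final class PendingCompactions {

    /** Total pending compactions across all keyspaces and tables. */
    static int total(Map<String, Map<String, Integer>> byKeyspace) {
        int sum = 0;
        for (Map<String, Integer> byTable : byKeyspace.values()) {
            for (int pending : byTable.values()) {
                sum += pending;
            }
        }
        return sum;
    }

    /** "keyspace.table" with the most pending compactions, or null if the map is empty. */
    static String busiestTable(Map<String, Map<String, Integer>> byKeyspace) {
        String busiest = null;
        int max = -1;
        for (Map.Entry<String, Map<String, Integer>> ks : byKeyspace.entrySet()) {
            for (Map.Entry<String, Integer> table : ks.getValue().entrySet()) {
                if (table.getValue() > max) {
                    max = table.getValue();
                    busiest = ks.getKey() + "." + table.getKey();
                }
            }
        }
        return busiest;
    }

    public static void main(String[] args) {
        // Hypothetical gauge value: keyspace -> (table -> pending compactions)
        Map<String, Map<String, Integer>> sample = Map.of(
                "ks1", Map.of("users", 3, "events", 12),
                "ks2", Map.of("orders", 1));
        System.out.println(total(sample) + " pending, busiest: " + busiestTable(sample));
    }
}
```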
CommitLog Metrics
Metrics specific to the CommitLog.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.CommitLog.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=CommitLog name=<MetricName>
Name | Type | Description |
---|---|---|
CompletedTasks | Gauge<Long> | Total number of commit log messages written since [re]start. |
PendingTasks | Gauge<Long> | Number of commit log messages written but yet to be fsync’d. |
TotalCommitLogSize | Gauge<Long> | Current size, in bytes, used by all the commit log segments. |
WaitingOnSegmentAllocation | Timer | Time spent waiting for a CommitLogSegment to be allocated; under normal conditions this should be zero. |
WaitingOnCommit | Timer | The time spent waiting on CL fsync; for Periodic this only occurs when the sync is lagging its sync interval. |
Storage Metrics
Metrics specific to the storage engine.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Storage.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=Storage name=<MetricName>
Name | Type | Description |
---|---|---|
Exceptions | Counter | Number of internal exceptions caught. Under normal conditions this should be zero. |
Load | Counter | Size, in bytes, of the on-disk data this node manages. |
TotalHints | Counter | Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint. |
TotalHintsInProgress | Counter | Number of hints currently being sent. |
HintedHandoff Metrics
Metrics specific to Hinted Handoff. There are also some metrics related to hints tracked in Storage Metrics.
These metrics include the peer endpoint in the metric name.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.HintedHandOffManager.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=HintedHandOffManager name=<MetricName>
Name | Type | Description |
---|---|---|
Hints_created-<PeerIP> | | |
Hints_not_stored-<PeerIP> | | |
HintsService Metrics
Metrics specific to the Hints delivery service. There are also some metrics related to hints tracked in Storage Metrics.
These metrics include the peer endpoint in the metric name.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.HintsService.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=HintsService name=<MetricName>
Name | Type | Description |
---|---|---|
HintsSucceeded | | |
HintsFailed | | |
HintsTimedOut | | |
Hint_delays | Histogram | Histogram of hint delivery delays (in milliseconds). |
Hint_delays-<PeerIP> | Histogram | Histogram of hint delivery delays (in milliseconds) per peer. |
SSTable Index Metrics
Metrics specific to the SSTable index metadata.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Index.<MetricName>.RowIndexEntry
- JMX MBean
-
org.apache.cassandra.metrics:type=Index scope=RowIndexEntry name=<MetricName>
Name | Type | Description |
---|---|---|
IndexedEntrySize | Histogram | Histogram of the on-heap size, in bytes, of the index across all SSTables. |
IndexInfoCount | Histogram | Histogram of the number of on-heap index entries managed across all SSTables. |
IndexInfoGets | Histogram | Histogram of the number of index seeks performed per SSTable. |
BufferPool Metrics
Metrics specific to the internal recycled buffer pool Cassandra manages. This pool is meant to keep allocations and GC lower by recycling on-heap and off-heap buffers.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.BufferPool.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=BufferPool name=<MetricName>
Name | Type | Description |
---|---|---|
Size | Gauge<Long> | Size, in bytes, of the managed buffer pool. |
Misses | Meter | |
Client Metrics
Metrics specific to client management.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Client.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=Client name=<MetricName>
Name | Type | Description |
---|---|---|
connectedNativeClients | Gauge<Integer> | Number of clients connected to this node's native protocol server. |
connections | Gauge<List<Map<String, String>>> | List of all connections and their state information. |
connectedNativeClientsByUser | Gauge<Map<String, Integer>> | Number of connected native clients by username. |
Batch Metrics
Metrics specific to batch statements.
Reported name format:
- Metric Name
-
org.apache.cassandra.metrics.Batch.<MetricName>
- JMX MBean
-
org.apache.cassandra.metrics:type=Batch name=<MetricName>
Name | Type | Description |
---|---|---|
PartitionsPerCounterBatch | Histogram | Distribution of the number of partitions processed per counter batch. |
PartitionsPerLoggedBatch | Histogram | Distribution of the number of partitions processed per logged batch. |
PartitionsPerUnloggedBatch | Histogram | Distribution of the number of partitions processed per unlogged batch. |
JVM Metrics
JVM metrics such as memory and garbage collection statistics can either be accessed by connecting to the JVM using JMX or can be exported using Metric Reporters.
BufferPool
- Metric Name
-
jvm.buffers.<direct|mapped>.<MetricName>
- JMX MBean
-
java.nio:type=BufferPool name=<direct|mapped>
Name | Type | Description |
---|---|---|
Capacity | Gauge<Long> | Estimated total capacity of the buffers in this pool. |
Count | Gauge<Long> | Estimated number of buffers in the pool. |
Used | Gauge<Long> | Estimated memory that the Java virtual machine is using for this buffer pool. |
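The JDK exposes the same buffer pools through `java.lang.management.BufferPoolMXBean`. This sketch reads them in the JVM it runs in; against a Cassandra node you would read the corresponding java.nio:type=BufferPool MBean over JMX instead. We assume that Capacity, Count, and Used above correspond to the bean's `getTotalCapacity()`, `getCount()`, and `getMemoryUsed()` values; the class name is hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: reading this JVM's NIO buffer pools (typically "direct" and "mapped"). */
final class BufferPools {

    /** One line per pool: buffer count, memory used and total capacity, in bytes. */
    static List<String> describe() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .map(pool -> pool.getName()
                        + ": count=" + pool.getCount()
                        + ", used=" + pool.getMemoryUsed() // -1 if no estimate is available
                        + ", capacity=" + pool.getTotalCapacity())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```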
FileDescriptorRatio
- Metric Name
-
jvm.fd.<MetricName>
- JMX MBean
-
java.lang:type=OperatingSystem name=<OpenFileDescriptorCount|MaxFileDescriptorCount>
Name | Type | Description |
---|---|---|
Usage | Ratio | Ratio of used to total file descriptors. |
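The Usage ratio can be reproduced from the two OperatingSystem attributes named in the MBean above. The sketch below reads them in-process through `com.sun.management.UnixOperatingSystemMXBean`, a JDK-specific interface that is typically available on OpenJDK/HotSpot builds for Unix-like systems (an assumption about the runtime); elsewhere it returns NaN. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Sketch: open / max file descriptors for the current JVM. */
final class FileDescriptorUsage {

    /** Returns open / max descriptors, or NaN if the counts are not available. */
    static double ratio() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            if (open >= 0 && max > 0) {
                return (double) open / max;
            }
        }
        return Double.NaN; // e.g. on Windows or runtimes without this interface
    }

    public static void main(String[] args) {
        System.out.printf("File descriptor usage: %.4f%n", ratio());
    }
}
```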
GarbageCollector
- Metric Name
-
jvm.gc.<gc_type>.<MetricName>
- JMX MBean
-
java.lang:type=GarbageCollector name=<gc_type>
Name | Type | Description |
---|---|---|
Count | Gauge<Long> | Total number of collections that have occurred. |
Time | Gauge<Long> | Approximate accumulated collection elapsed time in milliseconds. |
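Count and Time come from the JDK's `GarbageCollectorMXBean`, and both are cumulative since JVM start. This sketch lists them for the JVM it runs in and, as an assumed monitoring heuristic rather than anything Cassandra reports, turns two polls of a collector's Time into the fraction of wall-clock time spent collecting. Class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: per-collector Count and Time values for the current JVM. */
final class GcStats {

    /** One line per collector: name, collection count and accumulated time (ms). */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative since JVM start; -1 means the value is undefined for this collector.
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** Rough share of wall-clock time spent in GC between two polls of a collector's Time. */
    static double gcTimeFraction(long previousGcMillis, long currentGcMillis, long wallMillis) {
        if (wallMillis <= 0) {
            throw new IllegalArgumentException("wallMillis must be positive");
        }
        return (double) (currentGcMillis - previousGcMillis) / wallMillis;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

For concurrent collectors the accumulated time may include work done alongside application threads, so treat the fraction as an indicator rather than an exact pause share.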
Memory
- Metric Name
-
jvm.memory.<heap/non-heap/total>.<MetricName>
- JMX MBean
-
java.lang:type=Memory
Name | Type | Description |
---|---|---|
Committed | Gauge<Long> | Amount of memory in bytes that is committed for the JVM to use. |
Init | Gauge<Long> | Amount of memory in bytes that the JVM initially requests from the OS. |
Max | Gauge<Long> | Maximum amount of memory in bytes that can be used for memory management. |
Usage | Ratio | Ratio of used to maximum memory. |
Used | Gauge<Long> | Amount of used memory in bytes. |
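The Usage ratio is used divided by max, which is only meaningful when max is defined; `java.lang.management.MemoryUsage` reports an undefined max as -1, which is common for non-heap memory. This sketch computes the ratio for the JVM it runs in and returns NaN in that case; the class name and the NaN convention are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Sketch: used / max memory for the heap and non-heap areas of this JVM. */
final class HeapUsage {

    /** used / max, or NaN when max is undefined (-1) or zero. */
    static double usageRatio(MemoryUsage usage) {
        long max = usage.getMax();
        return max > 0 ? (double) usage.getUsed() / max : Double.NaN;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.println("heap: init=" + heap.getInit() + " used=" + heap.getUsed()
                + " committed=" + heap.getCommitted() + " max=" + heap.getMax()
                + " usage=" + usageRatio(heap));
        System.out.println("non-heap usage=" + usageRatio(nonHeap)); // NaN if max is undefined
    }
}
```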
MemoryPool
- Metric Name
-
jvm.memory.pools.<memory_pool>.<MetricName>
- JMX MBean
-
java.lang:type=MemoryPool name=<memory_pool>
Name | Type | Description |
---|---|---|
Committed | Gauge<Long> | Amount of memory in bytes that is committed for the JVM to use. |
Init | Gauge<Long> | Amount of memory in bytes that the JVM initially requests from the OS. |
Max | Gauge<Long> | Maximum amount of memory in bytes that can be used for memory management. |
Usage | Ratio | Ratio of used to maximum memory. |
Used | Gauge<Long> | Amount of used memory in bytes. |
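Per-pool values come from `java.lang.management.MemoryPoolMXBean`. This sketch lists the pools of the current JVM whose used/max ratio has reached a threshold, skipping pools with no defined max; pool names depend on the garbage collector in use, and the class, method, and 80% threshold in `main` are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: memory pools whose usage ratio is at or above a threshold. */
final class MemoryPools {

    /** Names of pools with used / max >= threshold; pools without a defined max are skipped. */
    static List<String> poolsAtOrAbove(double threshold) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null || usage.getMax() <= 0) {
                continue;
            }
            double ratio = (double) usage.getUsed() / usage.getMax();
            if (ratio >= threshold) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Pools at or above 80% of max: " + poolsAtOrAbove(0.8));
    }
}
```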
JMX
Any JMX-based client can access metrics from Cassandra.
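For example, a plain Java client needs only the JDK's `javax.management.remote` package. The sketch below connects to a node, reads the Count attribute of the Storage Load metric, and prints it. Port 7199 (the usual Cassandra JMX port), the RMI service URL form, the `Count` attribute name, and reachability of remote JMX without authentication are assumptions to adjust for your setup; the class name is hypothetical.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Sketch: reading one metric attribute from a Cassandra node over remote JMX. */
final class JmxMetricReader {

    /** RMI-based JMX service URL for host:port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        String url = "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(url, e);
        }
    }

    /** Reads one attribute (e.g. "Count" on a counter MBean) from any MBean server. */
    static Object read(MBeanServerConnection connection, String objectName, String attribute) {
        try {
            return connection.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Cannot read " + objectName + " " + attribute, e);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199; // assumed JMX port
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            Object load = read(connection,
                    "org.apache.cassandra.metrics:type=Storage,name=Load", "Count");
            System.out.println("Load (bytes): " + load);
        }
    }
}
```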
If you wish to access JMX metrics over HTTP, you can download Mx4jTool and place mx4j-tools.jar into the classpath. On startup you will see in the log:
HttpAdaptor version 3.0.2 started on port 8081
To choose a different port (8081 is the default) or a different listen address (0.0.0.0 is not the default), edit conf/cassandra-env.sh and uncomment:
#MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"
#MX4J_PORT="-Dmx4jport=8081"
Metric Reporters
As mentioned at the top of this section on monitoring, Cassandra metrics can be exported to a number of monitoring systems using built-in and third-party reporter plugins.
The configuration of these plugins is managed by the metrics reporter config project. There is a sample configuration file located at conf/metrics-reporter-config-sample.yaml.
Once configured, start Cassandra with the flag -Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml.
The specified .yaml file plus any third-party reporter jars must all be in Cassandra’s classpath.