Production recommendations
The cassandra.yaml
and jvm.options
files have a number of notes and
recommendations for production usage.
This page expands on some of the information in the files.
Tokens
Using more than one token-range per node is referred to as virtual nodes, or vnodes.
vnodes
facilitate flexible expansion with more streaming peers when a new node bootstraps
into a cluster.
Limiting the negative impact of streaming (I/O and CPU overhead) enables incremental cluster expansion.
However, more tokens leads to sharing data with more peers, and results in decreased availability.
These two factors must be balanced based on a cluster’s characteristic reads and writes.
To learn more,
Cassandra Availability in Virtual Nodes, Joseph Lynch and Josh Snyder is recommended reading.
Change the number of tokens using the setting in the cassandra.yaml
file:
num_tokens: 16
Here are the most common token counts with a brief explanation of when and why you would use each one.
Token Count | Description |
---|---|
1 |
Maximum availablility, maximum cluster size, fewest peers, but inflexible expansion. Must always double size of cluster to expand and remain balanced. |
4 |
A healthy mix of elasticity and availability. Recommended for clusters which will eventually reach over 30 nodes. Requires adding approximately 20% more nodes to remain balanced. Shrinking a cluster may result in cluster imbalance. |
8 |
Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on performance. |
16 |
Best for heavily elastic clusters which expand and shrink regularly, but may have issues availability with larger clusters. Not recommended for clusters over 50 nodes. |
In addition to setting the token count, it’s extremely important that
allocate_tokens_for_local_replication_factor
in cassandra.yaml
is set to an
appropriate number of replicates, to ensure even token allocation.
Read ahead
Read ahead is an operating system feature that attempts to keep as much data as possible loaded in the page cache. Spinning disks can have long seek times causing high latency, so additional throughput on reads using page cache can improve performance. By leveraging read ahead, the OS can pull additional data into memory without the cost of additional seeks. This method works well when the available RAM is greater than the size of the hot dataset, but can be problematic when the reverse is true (dataset > RAM). The larger the hot dataset, the less read ahead is useful.
Read ahead is definitely not useful in the following cases:
-
Small partitions, such as tables with a single partition key
-
Solid state drives (SSDs)
Read ahead can actually increase disk usage, and in some cases result in as much as a 5x latency and throughput performance penalty. Read-heavy, key/value tables with small (under 1KB) rows are especially prone to this problem.
The recommended read ahead settings are:
Hardware | Initial Recommendation |
---|---|
Spinning Disks |
64KB |
SSD |
4KB |
Read ahead can be adjusted on Linux systems using the blockdev
tool.
For example, set the read ahead of the disk /dev/sda1
to 4KB:
$ blockdev --setra 8 /dev/sda1
The |
All systems are different, so use these recommendations as a starting point and tune, based on your SLA and throughput requirements. To understand how read ahead impacts disk resource usage, we recommend carefully reading through the Diving Deep, using external tools section.
Compression
Compressed data is stored by compressing fixed-size byte buffers and writing the
data to disk.
The buffer size is determined by the chunk_length_in_kb
element in the compression
map of a table’s schema settings for WITH COMPRESSION
.
The default setting is 16KB starting with Cassandra 4.0.
Since the entire compressed buffer must be read off-disk, using a compression chunk length that is too large can lead to significant overhead when reading small records. Combined with the default read ahead setting, the result can be massive read amplification for certain workloads. Therefore, picking an appropriate value for this setting is important.
LZ4Compressor is the default and recommended compression algorithm. If you need additional information on compression, read The Last Pickle blogpost on compression performance.
Compaction
There are different compaction strategies available for different workloads. We recommend reading about the different strategies to understand which is the best for your environment. Different tables may, and frequently do use different compaction strategies in the same cluster.
Encryption
It is significantly better to set up peer-to-peer encryption and client server encryption when setting up your production cluster. Setting it up after the cluster is serving production traffic is challenging to do correctly. If you ever plan to use network encryption of any type, we recommend setting it up when initially configuring your cluster. Changing these configurations later is not impossible, but mistakes can result in downtime or data loss.
Ensure keyspaces are created with NetworkTopologyStrategy
Production clusters should never use SimpleStrategy
.
Production keyspaces should use the NetworkTopologyStrategy
(NTS).
For example:
CREATE KEYSPACE mykeyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
};
Cassandra clusters initialized with NetworkTopologyStrategy
can take advantage
of the ability to configure multiple racks and data centers.
Configure racks and snitch
Correctly configuring or changing racks after a cluster has been provisioned is an unsupported process.
Migrating from a single rack to multiple racks is also unsupported and can
result in data loss.
Using GossipingPropertyFileSnitch
is the most flexible solution for
on-premise or mixed cloud environments.
Ec2Snitch
is reliable for AWS EC2 only environments.