Production Recommendations

The cassandra.yaml and jvm.options files have a number of notes and recommendations for production usage. This page expands on some of the notes in these files with additional information.

Tokens

Using more than 1 token (referred to as vnodes) allows for more flexible expansion and more streaming peers when bootstrapping new nodes into the cluster. This can limit the negative impact of streaming (I/O and CPU overhead) as well as allow for incremental cluster expansion.

As a tradeoff, more tokens will lead to sharing data with more peers, which can result in decreased availability. To learn more about this we recommend reading this paper.

The number of tokens can be changed using the following setting:

num_tokens: 16

Here are the most common token counts with a brief explanation of when and why you would use each one.

Token Count Description
1 Maximum availablility, maximum cluster size, fewest peers, but inflexible expansion. Must always double size of cluster to expand and remain balanced.
4 A healthy mix of elasticity and availability. Recommended for clusters which will eventually reach over 30 nodes. Requires adding approximately 20% more nodes to remain balanced. Shrinking a cluster may result in cluster imbalance.
16 Best for heavily elastic clusters which expand and shrink regularly, but may have issues availability with larger clusters. Not recommended for clusters over 50 nodes.

In addition to setting the token count, it’s extremely important that allocate_tokens_for_local_replication_factor be set as well, to ensure even token allocation.

Read Ahead

Read ahead is an operating system feature that attempts to keep as much data loaded in the page cache as possible. The goal is to decrease latency by using additional throughput on reads where the latency penalty is high due to seek times on spinning disks. By leveraging read ahead, the OS can pull additional data into memory without the cost of additional seeks. This works well when available RAM is greater than the size of the hot dataset, but can be problematic when the hot dataset is much larger than available RAM. The benefit of read ahead decreases as the size of your hot dataset gets bigger in proportion to available memory.

With small partitions (usually tables with no partition key, but not limited to this case) and solid state drives, read ahead can increase disk usage without any of the latency benefits, and in some cases can result in up to a 5x latency and throughput performance penalty. Read heavy, key/value tables with small (under 1KB) rows are especially prone to this problem.

We recommend the following read ahead settings:

Hardware Initial Recommendation
Spinning Disks 64KB
SSD 4KB

Read ahead can be adjusted on Linux systems by using the blockdev tool.

For example, we can set read ahead of ``/dev/sda1` to 4KB by doing the following:

blockdev --setra 8 /dev/sda1

Note: blockdev accepts the number of 512 byte sectors to read ahead. The argument of 8 above is equivilent to 4KB.

Since each system is different, use the above recommendations as a starting point and tuning based on your SLA and throughput requirements. To understand how read ahead impacts disk resource usage we recommend carefully reading through the troubleshooting portion of the documentation.

Compression

Compressed data is stored by compressing fixed size byte buffers and writing the data to disk. The buffer size is determined by the chunk_length_in_kb element in the compression map of the schema settings.

The default setting is 16KB starting with Cassandra 4.0.

Since the entire compressed buffer must be read off disk, using too high of a compression chunk length can lead to significant overhead when reading small records. Combined with the default read ahead setting this can result in massive read amplification for certain workloads.

LZ4Compressor is the default and recommended compression algorithm.

There is additional information on this topic on The Last Pickle Blog.

Compaction

There are different compaction strategies available for different workloads. We recommend reading up on the different strategies to understand which is the best for your environment. Different tables may (and frequently do) use different compaction strategies on the same cluster.

Encryption

It is significantly easier to set up peer to peer encryption and client server encryption when setting up your production cluster as opposed to setting it up once the cluster is already serving production traffic. If you are planning on using network encryption eventually (in any form), we recommend setting it up now. Changing these configurations down the line is not impossible, but mistakes can result in downtime or data loss.

Ensure Keyspaces are Created with NetworkTopologyStrategy

Production clusters should never use SimpleStrategy. Production keyspaces should use the NetworkTopologyStrategy (NTS).

For example:

create KEYSPACE mykeyspace WITH replication =
{'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

NetworkTopologyStrategy allows Cassandra to take advantage of multiple racks and data centers.

Configure Racks and Snitch

Correctly configuring or changing racks after a cluster has been provisioned is an unsupported process. Migrating from a single rack to multiple racks is also unsupported and can result in data loss.

Using GossipingPropertyFileSnitch is the most flexible solution for on premise or mixed cloud environments. Ec2Snitch is reliable for AWS EC2 only environments.