Compression

Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of data on disk by compressing the SSTable in chunks of a user-configurable size (chunk_length_in_kb). As Cassandra SSTables are immutable, the CPU cost of compressing is only paid when the SSTable is written. Subsequent updates to data land in different SSTables, so Cassandra does not need to decompress, overwrite, and recompress data when UPDATE commands are issued. On reads, Cassandra locates the relevant compressed chunks on disk, decompresses each full chunk, and then proceeds with the remainder of the read path (merging data from disk and memtables, read repair, and so on).
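The chunked layout described above can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: zlib stands in for the configured compressor, and the names compress_chunks, read_range, and CHUNK_LENGTH are hypothetical. It only shows that a read decompresses every chunk that overlaps the requested range in full, even when a few bytes are needed.

```python
import zlib

# Hypothetical chunk size, mirroring chunk_length_in_kb = 16
CHUNK_LENGTH = 16 * 1024


def compress_chunks(data: bytes, chunk_length: int = CHUNK_LENGTH) -> list[bytes]:
    """Compress each fixed-size slice of the input independently."""
    return [
        zlib.compress(data[start:start + chunk_length])
        for start in range(0, len(data), chunk_length)
    ]


def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_length: int = CHUNK_LENGTH) -> bytes:
    """Return `length` uncompressed bytes starting at uncompressed `offset`.

    Every chunk overlapping the range is decompressed in full.
    """
    if length <= 0:
        return b""
    first = offset // chunk_length
    last = (offset + length - 1) // chunk_length
    plain = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_length
    return plain[start:start + length]


# Example: 10,000 small rows written once, then a 50-byte read
data = b"".join(b"row-%06d|" % i for i in range(10_000))
chunks = compress_chunks(data)
fragment = read_range(chunks, 20_000, 50)
```

Because each chunk is compressed independently, a smaller chunk size means less wasted decompression per read, at the cost of giving the compressor less context.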

Compression algorithms typically trade off between the following three areas:

  • Compression speed: How fast does the compression algorithm compress data? This is critical in the flush and compaction paths because data must be compressed before it is written to disk.
  • Decompression speed: How fast does the compression algorithm decompress data? This is critical in the read and compaction paths because data must be read off disk in a full chunk and decompressed before it can be returned.
  • Ratio: By how much is the uncompressed data reduced? Cassandra typically measures this as the size of data on disk relative to the uncompressed size, so lower values mean better compression. For example, a ratio of 0.5 means that the data on disk is 50% the size of the uncompressed data. Cassandra exposes this ratio per table as the SSTable Compression Ratio field of nodetool tablestats.
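As a small worked example of the ratio definition above, the following Python sketch computes the ratio and the corresponding space saving. The function names are hypothetical and the sizes are made up for illustration.

```python
def compression_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """On-disk size relative to uncompressed size; lower means better compression."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed_bytes must be positive")
    return compressed_bytes / uncompressed_bytes


def space_saved_percent(ratio: float) -> float:
    """Percentage of disk space saved for a given compression ratio."""
    return (1.0 - ratio) * 100.0


# Illustrative sizes: 100 GiB of uncompressed data stored in 50 GiB on disk
GIB = 1024 ** 3
ratio = compression_ratio(50 * GIB, 100 * GIB)
saved = space_saved_percent(ratio)
```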

Cassandra offers five compression algorithms by default that make different tradeoffs in these areas. While benchmarking compression algorithms depends on many factors (algorithm parameters such as compression level, the compressibility of the input data, the underlying processor class, and so on), the following table should help you pick a starting point based on your application’s requirements. It gives an extremely rough grading of the different choices by their performance in these areas (A is relatively good, F is relatively bad):

Compression Algorithm   Cassandra Class     Compression Speed   Decompression Speed   Ratio   C* Version
LZ4                     LZ4Compressor       A+                  A+                    C+      >= 1.2.2
LZ4HC                   LZ4Compressor       C+                  A+                    B+      >= 3.6
Zstd                    ZstdCompressor      A-                  A-                    A+      >= 4.0
Snappy                  SnappyCompressor    A-                  A                     C       >= 1.0
Deflate (zlib)          DeflateCompressor   C                   C                     A       >= 1.0

Generally speaking, for a performance-critical (latency or throughput) application, LZ4 is the right choice as it gets an excellent ratio per CPU cycle spent. This is why it is the default choice in Cassandra.

For storage-critical applications (disk footprint), however, Zstd may be a better choice as it can achieve a significantly better ratio than LZ4.

Snappy is kept for backwards compatibility and LZ4 will typically be preferable.

Deflate is kept for backwards compatibility and Zstd will typically be preferable.

Configuring Compression

Compression is configured on a per-table basis as an optional argument to CREATE TABLE or ALTER TABLE. Three options are available for all compressors:

  • class (default: LZ4Compressor): specifies the compression class to use. The two “fast” compressors are LZ4Compressor and SnappyCompressor and the two “good” ratio compressors are ZstdCompressor and DeflateCompressor.
  • chunk_length_in_kb (default: 16KiB): specifies the number of kilobytes of data per compression chunk. The main tradeoff here is that larger chunk sizes give compression algorithms more context and improve their ratio, but require each read to fetch and decompress more data from disk.
  • crc_check_chance (default: 1.0): determines how likely Cassandra is to verify the checksum on each compression chunk during reads to protect against data corruption. Unless you have profiles indicating this is a performance problem it is highly encouraged not to turn this off as it is Cassandra’s only protection against bitrot.
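The probabilistic behavior of crc_check_chance can be sketched in Python as follows. This is an illustration under assumptions, not Cassandra's code: it assumes a CRC32 checksum stored alongside each compressed chunk, uses zlib as a stand-in compressor, and the names read_chunk and CorruptChunkError are hypothetical.

```python
import random
import zlib


class CorruptChunkError(Exception):
    """Raised when a chunk's checksum does not match (hypothetical name)."""


def read_chunk(compressed: bytes, stored_crc: int, crc_check_chance: float,
               rng: random.Random) -> bytes:
    """Decompress a chunk, verifying its checksum with the given probability."""
    # rng.random() is in [0.0, 1.0), so 1.0 always verifies and 0.0 never does
    if rng.random() < crc_check_chance:
        if zlib.crc32(compressed) != stored_crc:
            raise CorruptChunkError("checksum mismatch: chunk is corrupt")
    return zlib.decompress(compressed)


# Example: write a chunk with its checksum, then read it back with full checking
rng = random.Random(42)
payload = b"sensor-reading;" * 200
chunk = zlib.compress(payload)
chunk_crc = zlib.crc32(chunk)
restored = read_chunk(chunk, chunk_crc, 1.0, rng)
```

With a value below 1.0, only that fraction of chunk reads is verified on average, so corruption in a given chunk can go unnoticed until a read of it happens to be checked.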

The LZ4Compressor supports the following additional options:

  • lz4_compressor_type (default fast): specifies whether to use the high ratio version (a.k.a. LZ4HC) or the fast version (a.k.a. LZ4) of LZ4. The high mode supports a configurable level, which allows operators to tune the tradeoff between performance and ratio via the lz4_high_compressor_level option. Note that in 4.0 and above it may be preferable to use the Zstd compressor.
  • lz4_high_compressor_level (default 9): A number between 1 and 17 inclusive that represents how much CPU time to spend trying to get more compression ratio. Generally, lower levels are faster but achieve a worse ratio, and higher levels are slower but achieve a better ratio.

The ZstdCompressor supports the following additional option:

  • compression_level (default 3): A number between -131072 and 22 inclusive that represents how much CPU time to spend trying to get more compression ratio. The lower the level, the faster the speed (at the cost of ratio). Values from 20 to 22 are called “ultra levels” and should be used with caution, as they require more memory. The default of 3 is a good choice for competing with Deflate ratios and 1 is a good choice for competing with LZ4.

Users can set compression using the following syntax:

CREATE TABLE keyspace.table (id int PRIMARY KEY) WITH compression = {'class': 'LZ4Compressor'};

Or

ALTER TABLE keyspace.table WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64, 'crc_check_chance': 0.5};

Once enabled, compression can be disabled by using ALTER TABLE to set enabled to false:

ALTER TABLE keyspace.table WITH compression = {'enabled':'false'};

Operators should be aware, however, that changing compression does not take effect immediately. Data is compressed when an SSTable is written, and because SSTables are immutable, existing SSTables keep their old compression settings after an ALTER TABLE until they are rewritten by compaction. If an operator needs compression changes to take effect immediately, the operator can trigger an SSTable rewrite using nodetool scrub or nodetool upgradesstables -a, both of which rebuild the SSTables on disk, re-compressing the data in the process.

Benefits and Uses

Compression’s primary benefit is that it reduces the amount of data written to disk. The reduced size not only lowers storage requirements, it often also increases read and write throughput, because the CPU time spent compressing and decompressing data is often less than the time it would take to read or write the larger volume of uncompressed data.

Compression is most useful in tables composed of many rows, where the rows are similar in nature. Tables containing similar text columns (such as repeated JSON blobs) often compress very well. Tables containing data that has already been compressed or random data (e.g. benchmark datasets) do not typically compress well.
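The difference between similar and random data is easy to observe with a short Python sketch. It uses zlib as a stand-in for Cassandra's compressors, and the row contents are made up for illustration.

```python
import json
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size relative to uncompressed size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Many similar JSON rows: only the user_id changes from row to row
similar_rows = b"".join(
    json.dumps({"user_id": i, "status": "active", "country": "US"}).encode()
    for i in range(5_000)
)

# Random bytes of the same total size, similar to a synthetic benchmark dataset
random_rows = os.urandom(len(similar_rows))

similar_ratio = ratio(similar_rows)
random_ratio = ratio(random_rows)
```

The similar rows shrink to a small fraction of their original size, while the random bytes stay at roughly their original size once the compression framing overhead is included.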

Operational Impact

  • Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
  • Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.
  • To prevent slow compressors (Zstd, Deflate, LZ4HC) from blocking flushes for too long, all three flush with the default fast LZ4 compressor and then rely on normal compaction to re-compress the data into the desired compression strategy. See CASSANDRA-15379 <https://issues.apache.org/jira/browse/CASSANDRA-15379> for more details.
  • The compression path checksums data to ensure correctness. While the traditional Cassandra read path does not have a way to ensure correctness of data on disk, compressed tables let the user set crc_check_chance (a float from 0.0 to 1.0) so that Cassandra probabilistically validates chunks on read to verify that the bits on disk are not corrupt.
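To see why the off-heap footprint mentioned above depends on chunk_length_in_kb and the compression ratio, the following Python sketch gives a rough back-of-the-envelope estimate. It assumes that the dominant cost is one 8-byte offset per compression chunk; that figure and the function name chunk_offset_bytes are assumptions of this sketch, and real usage includes other overheads.

```python
import math

TIB = 1024 ** 4
GIB = 1024 ** 3
MIB = 1024 ** 2


def chunk_offset_bytes(on_disk_bytes: int, ratio: float, chunk_length_in_kb: int,
                       bytes_per_offset: int = 8) -> int:
    """Estimate memory for chunk offsets (assumes 8 bytes per chunk by default)."""
    # Chunks are cut from uncompressed data, so convert on-disk size back first
    uncompressed_bytes = on_disk_bytes / ratio
    chunks = math.ceil(uncompressed_bytes / (chunk_length_in_kb * 1024))
    return chunks * bytes_per_offset


# 1 TiB on disk at a 0.5 ratio with the default 16 KiB chunks
default_estimate = chunk_offset_bytes(TIB, 0.5, 16)

# Same data with larger 64 KiB chunks
large_chunk_estimate = chunk_offset_bytes(TIB, 0.5, 64)
```

Under these assumptions the default chunk size needs about 1 GiB per TiB on disk, and quadrupling the chunk size cuts the estimate to a quarter.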

Advanced Use

Advanced users can provide their own compression class by implementing the interface at org.apache.cassandra.io.compress.ICompressor.