Cassandra Documentation

Version:

Configure SAI indexes

Configuring your Apache Cassandra environment for Storage-Attached Indexing (SAI) may require some customization of the cassandra.yaml file.

Compaction strategies

Read queries perform better with compaction strategies that produce fewer SSTables.

For most environments that include SAI indexes, using the default SizeTieredCompactionStrategy (STCS) is recommended. This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace. For details, see Configuring compaction.

For time series data, an alternative is the TimeWindowCompactionStrategy (TWCS). TWCS compacts SSTables using a series of time windows. While in a time window, TWCS compacts all SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of these SSTables are compacted into a single SSTable. Then the next time window starts and the process repeats. The duration of the time window is the only setting required. See TimeWindowCompactionStrategy. For more information about TWCS, see Time Window Compaction Strategy.

In general, do not use LeveledCompactionStrategy (LCS) unless your index queries restrict the token range, either directly or by providing a restriction on the partition key. However, if you decide to use LCS, use the following guidelines:

  • The 160 MB default for the CREATE TABLE command’s sstable_size_in_mb option, described in this topic, may result in suboptimal performance for index queries that do not restrict on token range or partition key.

  • While even higher values may be appropriate, depending on your hardware, we recommend at least doubling the default value of sstable_size_in_mb.

Example:

CREATE TABLE IF NOT EXISTS my_keyspace.my_table
.
.
.
   WITH compaction = {
     'class' : 'LeveledCompactionStrategy',
     'sstable_size_in_mb' : '320' };

After increasing the MB value, observe whether the query performance improves on tables with SAI indexes. To observe any performance deltas, per query, look at the QueryLatency and SSTableIndexesHit data in the Cassandra query metrics.

Using a larger value reserves more disk space, because the SSTables are larger, and the ones destined for replacement will use more space while being compacted. However, the larger value results in having fewer SSTables, which lowers query latencies. Each SAI index should ultimately consume less space on disk because of better long-term compression with the larger indexes.

If query performance degrades on large (sstable_max_size ~2GB) SAI indexed SSTables when the workload is not dominated by reads but is experiencing increased write amplification, consider using Unified Compaction Strategy (UCS).

The cassandra.yaml options sai_sstable_indexes_per_query_warn_threshold (default: 32) and sai_sstable_indexes_per_query_fail_threshold (default: disabled) determine the number of SSTable indexes a SAI query touches before warning clients and failing queries respectively. When enabled, they can provide feedback for clients and protection for the database in the face of sub-optimal read queries.

About SAI encryption

With SAI indexes, its on-disk components are simply additional SSTable data. To protect sensitive user data, including any present in the table’s partition key values, SAI will need to encrypt all parts of the index that contain user data,the trie index data for strings and the kd-tree data for numerics. By design, SAI does not encrypt non-user data such as postings metadata or SSTable-level offsets and tokens.