The basic idea of SizeTieredCompactionStrategy (STCS) is to merge sstables of approximately the same size. All
sstables are put into different buckets depending on their size. An sstable is added to a bucket if its size
is within bucket_low and bucket_high of the current average size of the sstables already in the bucket. This
creates several buckets, and the most interesting of them is compacted. The most interesting bucket is the one
whose sstables take the most reads.
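The bucketing rule and the choice of bucket can be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra's implementation. The SSTable record, its per-sstable reads counter, and the functions get_buckets and most_interesting are hypothetical. The min_members cut-off is a simplification added so that a lone sstable is never picked. Other rules the real strategy applies are left out.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str
    size: int       # size on disk, any consistent unit
    reads: int = 0  # hypothetical per-sstable read counter


def get_buckets(sstables, bucket_low=0.5, bucket_high=1.5):
    """Group sstables of similar size, in the spirit of STCS.

    An sstable joins the first bucket whose running average avg satisfies
    bucket_low * avg < size < bucket_high * avg; otherwise it starts a
    new bucket of its own.
    """
    buckets = []  # each entry: [average_size, [members]]
    for sst in sorted(sstables, key=lambda s: s.size):
        for bucket in buckets:
            avg, members = bucket
            if bucket_low * avg < sst.size < bucket_high * avg:
                members.append(sst)
                # The bucket's average moves as members are added.
                bucket[0] = sum(m.size for m in members) / len(members)
                break
        else:
            buckets.append([sst.size, [sst]])
    return [members for _, members in buckets]


def most_interesting(buckets, min_members=2):
    """Return the bucket whose sstables take the most reads.

    min_members is a hypothetical simplification: buckets with fewer
    sstables are ignored. Returns None if no bucket qualifies.
    """
    candidates = [b for b in buckets if len(b) >= min_members]
    if not candidates:
        return None
    return max(candidates, key=lambda b: sum(s.reads for s in b))
```

For example, with sstables of sizes 100, 110, 120 and 1000, the first three land in one bucket and the 1000-sized sstable starts its own. Sorting by size first means each sstable is compared against buckets of similar or smaller sstables.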
When running a major compaction with STCS you will end up with two sstables per data directory (one for repaired data and one for unrepaired data). There is also an option (-s) to do a major compaction that splits the output into several sstables. The sizes of the sstables are approximately 50%, 25%, 12.5%… of the total size.
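The halving pattern of a split major compaction can be illustrated with a short sketch. Here each output takes half of what remains, which gives 50%, 25%, 12.5%, and so on, of the total. The function split_sizes and its min_size cut-off are hypothetical. The cut-off is assumed so that the sequence ends: once the next half would fall below it, the remainder becomes one final sstable. The exact stopping rule Cassandra uses may differ.

```python
def split_sizes(total_size, min_size):
    """Approximate output sizes of a splitting major compaction.

    Each output takes half of what is left. min_size (hypothetical)
    stops the halving; the remainder goes into one last sstable.
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    sizes = []
    remaining = total_size
    while remaining // 2 >= min_size:
        half = remaining // 2
        sizes.append(half)
        remaining -= half
    sizes.append(remaining)  # final sstable holds whatever is left
    return sizes
```

For a total of 1024 units and a cut-off of 128, this yields outputs of 512, 256, 128 and 128, which together add up to the full 1024.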
bucket_low: if bucket_low * avg_bucket_size < sstable_size (and the bucket_high condition holds, see below), then the sstable is added to the bucket.
bucket_high: if sstable_size < bucket_high * avg_bucket_size (and the bucket_low condition holds, see above), then the sstable is added to the bucket.
Defragmentation happens when a read touches many sstables. The result of the read is written into the memtable so that the next read does not have to touch as many sstables. This can cause writes on a read-only cluster.
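The sketch below uses a toy read path to show why defragmentation turns reads into writes. ToyTable and its methods are hypothetical. DEFRAG_THRESHOLD is an assumed stand-in for whatever limit the real system applies. Rows are modelled as column -> (timestamp, value) maps, and the newest timestamp wins. The toy has no client write path, so its memtable only ever contains rows written back by defragmentation.

```python
class ToyTable:
    """Toy read path illustrating read-triggered defragmentation."""

    DEFRAG_THRESHOLD = 2  # hypothetical: defragment if more sstables are touched

    def __init__(self, sstables):
        self.sstables = sstables  # immutable list of {key: {col: (ts, value)}}
        self.memtable = {}        # key -> merged row written back by reads
        self.writes = 0           # writes caused purely by reads
        self.last_touched = 0     # sstables touched by the most recent read

    def read(self, key):
        if key in self.memtable:
            # A defragmented copy exists; no sstables need to be touched.
            self.last_touched = 0
            return {col: val for col, (_, val) in self.memtable[key].items()}
        merged = {}
        touched = 0
        for sstable in self.sstables:
            row = sstable.get(key)
            if row is None:
                continue
            touched += 1
            for col, (ts, val) in row.items():
                # Keep the newest version of each column.
                if col not in merged or ts > merged[col][0]:
                    merged[col] = (ts, val)
        self.last_touched = touched
        if touched > self.DEFRAG_THRESHOLD:
            # Write the merged row back: a write issued by a read.
            self.memtable[key] = dict(merged)
            self.writes += 1
        return {col: val for col, (_, val) in merged.items()}
```

If one key is spread across three sstables, the first read merges all three and writes the result to the memtable. A second read of the same key is served without touching any sstable, but the write counter has already gone up even though no client wrote anything.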