sstablemetadata
Print information about an sstable from the related Statistics.db and Summary.db files to standard output.
Cassandra must be stopped before this tool is executed, or unexpected results will occur. Note: the script does not verify that Cassandra is stopped.
Usage
sstablemetadata <options> <sstable filename(s)>
--gc_grace_seconds <arg> | The gc_grace_seconds to use when calculating droppable tombstones |
Print all the metadata
Run sstablemetadata against the Data.db file(s) related to a table. If necessary, find the Data.db file(s) using sstableutil.
Example:
sstableutil keyspace1 standard1 | grep Data
/var/lib/cassandra/data/keyspace1/standard1-f6845640a6cb11e8b6836d2c86545d91/mc-1-big-Data.db

sstablemetadata /var/lib/cassandra/data/keyspace1/standard1-f6845640a6cb11e8b6836d2c86545d91/mc-1-big-Data.db

SSTable: /var/lib/cassandra/data/keyspace1/standard1-f6845640a6cb11e8b6836d2c86545d91/mc-1-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1535025576141000
Maximum timestamp: 1535025604309000
SSTable min local deletion time: 2147483647
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
TTL min: 86400
TTL max: 86400
First token: -9223004712949498654 (key=39373333373831303130)
Last token: 9222554117157811897 (key=4f3438394e39374d3730)
Estimated droppable tombstones: 0.9188263888888889
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1535025390651, position=226400)=CommitLogPosition(segmentId=1535025390651, position=6849139)}
totalColumnsSet: 100000
totalRows: 20000
Estimated tombstone drop times:
1535039100: 80390
1535039160: 5645
1535039220: 13965
Count Row Size Cell Count
1 0 0
2 0 0
3 0 0
4 0 0
5 0 20000
6 0 0
7 0 0
8 0 0
10 0 0
12 0 0
14 0 0
17 0 0
20 0 0
24 0 0
29 0 0
35 0 0
42 0 0
50 0 0
60 0 0
72 0 0
86 0 0
103 0 0
124 0 0
149 0 0
179 0 0
215 0 0
258 20000 0
310 0 0
372 0 0
446 0 0
535 0 0
642 0 0
770 0 0
924 0 0
1109 0 0
1331 0 0
1597 0 0
1916 0 0
2299 0 0
2759 0 0
3311 0 0
3973 0 0
4768 0 0
5722 0 0
6866 0 0
8239 0 0
9887 0 0
11864 0 0
14237 0 0
17084 0 0
20501 0 0
24601 0 0
29521 0 0
35425 0 0
42510 0 0
51012 0 0
61214 0 0
73457 0 0
88148 0 0
105778 0 0
126934 0 0
152321 0 0
182785 0 0
219342 0 0
263210 0 0
315852 0 0
379022 0 0
454826 0 0
545791 0 0
654949 0 0
785939 0 0
943127 0 0
1131752 0 0
1358102 0 0
1629722 0 0
1955666 0 0
2346799 0 0
2816159 0 0
3379391 0 0
4055269 0 0
4866323 0 0
5839588 0 0
7007506 0 0
8409007 0 0
10090808 0 0
12108970 0 0
14530764 0 0
17436917 0 0
20924300 0 0
25109160 0 0
30130992 0 0
36157190 0 0
43388628 0 0
52066354 0 0
62479625 0 0
74975550 0 0
89970660 0 0
107964792 0 0
129557750 0 0
155469300 0 0
186563160 0 0
223875792 0 0
268650950 0 0
322381140 0 0
386857368 0 0
464228842 0 0
557074610 0 0
668489532 0 0
802187438 0 0
962624926 0 0
1155149911 0 0
1386179893 0 0
1663415872 0 0
1996099046 0 0
2395318855 0 0
2874382626 0
3449259151 0
4139110981 0
4966933177 0
5960319812 0
7152383774 0
8582860529 0
10299432635 0
12359319162 0
14831182994 0
17797419593 0
21356903512 0
25628284214 0
30753941057 0
36904729268 0
44285675122 0
53142810146 0
63771372175 0
76525646610 0
91830775932 0
110196931118 0
132236317342 0
158683580810 0
190420296972 0
228504356366 0
274205227639 0
329046273167 0
394855527800 0
473826633360 0
568591960032 0
682310352038 0
818772422446 0
982526906935 0
1179032288322 0
1414838745986 0
Estimated cardinality: 20196
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1442880000
EncodingStats minTimestamp: 1535025565275000
KeyType: org.apache.cassandra.db.marshal.BytesType
ClusteringTypes: [org.apache.cassandra.db.marshal.UTF8Type]
StaticColumns: {C3:org.apache.cassandra.db.marshal.BytesType, C4:org.apache.cassandra.db.marshal.BytesType, C0:org.apache.cassandra.db.marshal.BytesType, C1:org.apache.cassandra.db.marshal.BytesType, C2:org.apache.cassandra.db.marshal.BytesType}
RegularColumns: {}
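The two steps in the example above (finding the Data.db files with sstableutil, then passing each one to sstablemetadata) could be combined in a small shell function. This is only a sketch, not part of the Cassandra tooling: the name `metadata_for_table` is hypothetical, and it assumes both tools are on the PATH, that Cassandra is stopped, and that sstableutil prints one file path per line as in the example.

```shell
# Hypothetical helper: print the metadata of every Data.db file of one table.
# Assumes sstableutil and sstablemetadata are on the PATH and Cassandra is stopped.
metadata_for_table() {
    local keyspace="$1" table="$2" data_file
    # sstableutil lists every component file of the table; keep only Data.db.
    sstableutil "$keyspace" "$table" | grep -- '-Data\.db$' |
    while IFS= read -r data_file; do
        # Redirect stdin so the tool cannot consume the remaining file list.
        sstablemetadata "$data_file" < /dev/null
    done
}

# Usage (not executed here):
#   metadata_for_table keyspace1 standard1
```

Filtering on the `-Data.db` suffix rather than on the bare word `Data` avoids matching a directory or keyspace name that happens to contain it.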
Specify gc grace seconds
To see the ratio of droppable tombstones for a given gc grace seconds value, use the --gc_grace_seconds option. Because the sstablemetadata tool doesn’t access the schema directly, passing the gc_grace_seconds value configured in the schema gives a more accurate estimate of droppable tombstones. The gc_grace_seconds value provided is subtracted from the current machine time (in seconds).
Example:
sstablemetadata /var/lib/cassandra/data/keyspace1/standard1-41b52700b4ed11e896476d2c86545d91/mc-12-big-Data.db | grep "Estimated tombstone drop times" -A4
Estimated tombstone drop times:
1536599100: 1
1536599640: 1
1536599700: 2

echo $(date +%s)
1536602005

# if gc_grace_seconds was configured at 100, all of the tombstones would be currently droppable
sstablemetadata --gc_grace_seconds 100 /var/lib/cassandra/data/keyspace1/standard1-41b52700b4ed11e896476d2c86545d91/mc-12-big-Data.db | grep "Estimated droppable tombstones"
Estimated droppable tombstones: 4.0E-5

# if gc_grace_seconds was configured at 4700, some of the tombstones would be currently droppable
sstablemetadata --gc_grace_seconds 4700 /var/lib/cassandra/data/keyspace1/standard1-41b52700b4ed11e896476d2c86545d91/mc-12-big-Data.db | grep "Estimated droppable tombstones"
Estimated droppable tombstones: 9.61111111111111E-6

# if gc_grace_seconds was configured at 5000, none of the tombstones would be currently droppable
sstablemetadata --gc_grace_seconds 5000 /var/lib/cassandra/data/keyspace1/standard1-41b52700b4ed11e896476d2c86545d91/mc-12-big-Data.db | grep "Estimated droppable tombstones"
Estimated droppable tombstones: 0.0
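The subtraction described above can be illustrated with a short sketch that computes the cutoff (current time minus gc_grace_seconds) and counts the tombstones in a drop-time listing that fall before it. The function name `count_droppable` is hypothetical, and the strict "earlier than the cutoff" comparison is an assumption. It counts raw listing entries only, so it will not reproduce the ratio that sstablemetadata estimates internally.

```shell
# Hypothetical helper, not part of Cassandra.
# Arguments: $1 = current time in epoch seconds, $2 = gc_grace_seconds.
# Stdin: "drop_time: count" lines as printed under "Estimated tombstone drop times".
count_droppable() {
    # Cutoff = now - gc_grace_seconds; tombstones dated before it are counted.
    awk -v cutoff="$(( $1 - $2 ))" -F': *' \
        '$1 + 0 < cutoff { n += $2 } END { print n + 0 }'
}

# Usage with the drop times and clock value from the example:
#   printf '1536599100: 1\n1536599640: 1\n1536599700: 2\n' | count_droppable 1536602005 100
```

With the values from the example, a gc_grace_seconds of 100 gives a cutoff of 1536601905, which is later than all three drop times, so all four tombstones are counted. A value of 5000 gives a cutoff of 1536597005, which is earlier than all of them, so none are.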
Explanation of each value printed above
Value | Explanation |
---|---|
SSTable | prefix of the sstable filenames related to this sstable |
Partitioner | partitioner type used to distribute data across nodes; defined in cassandra.yaml |
Bloom Filter FP | precision of Bloom filter used in reads; defined in the table definition |
Minimum timestamp | minimum timestamp of any entry in this sstable, in epoch microseconds |
Maximum timestamp | maximum timestamp of any entry in this sstable, in epoch microseconds |
SSTable min local deletion time | minimum timestamp of deletion date, based on TTL, in epoch seconds |
SSTable max local deletion time | maximum timestamp of deletion date, based on TTL, in epoch seconds |
Compressor | blank (-) by default; if not blank, indicates type of compression enabled on the table |
TTL min | time-to-live in seconds; default 0 unless defined in the table definition |
TTL max | time-to-live in seconds; default 0 unless defined in the table definition |
First token | lowest token and related key found in the sstable summary |
Last token | highest token and related key found in the sstable summary |
Estimated droppable tombstones | ratio of tombstones to columns, using configured gc grace seconds if relevant |
SSTable level | compaction level of this sstable, if leveled compaction (LCS) is used |
Repaired at | the timestamp this sstable was marked as repaired via sstablerepairedset, in epoch milliseconds |
Replay positions covered | the interval of time and commitlog positions related to this sstable |
totalColumnsSet | number of cells in the table |
totalRows | number of rows in the table |
Estimated tombstone drop times | approximate number of rows that will expire, ordered by epoch seconds |
Count Row Size Cell Count | two histograms in two columns; one represents distribution of Row Size and the other represents distribution of Cell Count |
Estimated cardinality | an estimate of unique values, used for compaction |
EncodingStats* minTTL | in epoch milliseconds |
EncodingStats* minLocalDeletionTime | in epoch seconds |
EncodingStats* minTimestamp | in epoch microseconds |
KeyType | the type of partition key, useful in reading and writing data from/to storage; defined in the table definition |
ClusteringTypes | the type of clustering key, useful in reading and writing data from/to storage; defined in the table definition |
StaticColumns | a list of the shared columns in the table |
RegularColumns | a list of non-static, non-key columns in the table |

* For the encoding stats values, the delta between the value and the current epoch time is used when encoding and storing data in the most optimal way.
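Because the values in the table use different units (epoch seconds, milliseconds and microseconds), it can help to normalize them before comparing them. The sketch below is a hypothetical helper, not part of Cassandra. The unit labels `s`, `ms` and `us` are arbitrary names chosen for this illustration.

```shell
# Hypothetical helper: convert a printed value to whole epoch seconds.
# $1 = value, $2 = unit label (s, ms or us; labels are this sketch's own).
to_epoch_seconds() {
    value="$1"
    unit="$2"
    case "$unit" in
        s)  echo "$value" ;;                    # e.g. local deletion times
        ms) echo $(( value / 1000 )) ;;         # e.g. Repaired at
        us) echo $(( value / 1000000 )) ;;      # e.g. Minimum/Maximum timestamp
        *)  echo "unknown unit: $unit" >&2; return 1 ;;
    esac
}

# Usage with the Minimum timestamp from the example output:
#   to_epoch_seconds 1535025576141000 us
# On systems with GNU date, the result can then be shown as a calendar date:
#   date -u -d "@$(to_epoch_seconds 1535025576141000 us)"
```

Integer division discards the sub-second part, which is usually sufficient when comparing against second-resolution values such as the tombstone drop times.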