Like any database, Apache Cassandra can be impacted by data modeling choices gone awry or the unexpected use of even a well-designed data model. Problems arise when partitions grow too large or accumulate too many rows or tombstones. These problematic partitions can impact the reading and writing of other partitions, which can lead to service-impacting incidents. While the Cassandra community is always actively working to reduce the impact of such partitions, it can be necessary for operators to take immediate action when they are encountered. Apache Cassandra 4.1 introduces a new feature to give operators this control: the Partition Denylist.
The Partition Denylist allows operators to make a choice between providing access to the entire data set with reduced performance or reducing the available data set (by preventing reads and writes to specific partitions) to ensure performance is not affected. This choice gives operators new control over how these problematic partitions affect other reads and writes serviced by Cassandra, converting operations that would be slow and eat resources to immediate failures before any resources are consumed. Operators have the choice to prevent reads and/or writes to the partition as well as range reads that include the partition.
For operators, JMX is the primary interface to interact with the Denylist. There are four operations available to operators: add a key to the denylist, check if a key is currently in the denylist, remove a key from the denylist, and force a refresh of a node’s local denylist cache. The examples below will use jmxterm but any JMX interface will work.
To add a key to the denylist, all that is needed is the keyspace, table, and partition key. In jmxterm, denylisting the key baz in the keyspace foo and table bar looks like:
    bean org.apache.cassandra.db:type=StorageProxy
    run denylistKey foo bar baz
All operations are defined on the StorageProxy mbean.
To determine if a key (baz) for a particular keyspace and table (foo.bar) is in the denylist, use the isKeyDenylisted operation:
    bean org.apache.cassandra.db:type=StorageProxy
    run isKeyDenylisted foo bar baz
After mitigation or for other reasons, operators will eventually want to remove a key from the denylist. Like the other operations, all that is needed is the keyspace, table, and key. In this case, to remove the key baz in foo.bar from the denylist:
    bean org.apache.cassandra.db:type=StorageProxy
    run removeDenylistKey foo bar baz
For performance purposes, the denylist implementation maintains a local cache on every node. The cache is refreshed periodically (by default every 5 minutes), but it may be necessary to refresh it manually. This can be done with a quick JMX call on the nodes where it's necessary (typically the nodes that own a partition that was recently added):
    bean org.apache.cassandra.db:type=StorageProxy
    run loadPartitionDenylist
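The four operations above lend themselves to scripting, for example when the same key has to be denylisted from a runbook. The following is a minimal sketch in Python that only builds the jmxterm command lines shown in this post and optionally pipes them to jmxterm. The helper names, the jar path, and the use of jmxterm's `-l` (connection URL) and `-n` (non-interactive) options are assumptions to verify against your own jmxterm version; 7199 is Cassandra's usual JMX port.

```python
import subprocess

# MBean that defines all denylist operations (from this post).
STORAGE_PROXY_BEAN = "org.apache.cassandra.db:type=StorageProxy"

# Operation name -> whether it takes keyspace, table, and key arguments.
OPERATIONS = {
    "denylistKey": True,
    "isKeyDenylisted": True,
    "removeDenylistKey": True,
    "loadPartitionDenylist": False,
}


def jmxterm_commands(operation, keyspace=None, table=None, key=None):
    """Return the jmxterm command lines for one denylist operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown denylist operation: {operation}")
    run = f"run {operation}"
    if OPERATIONS[operation]:
        if None in (keyspace, table, key):
            raise ValueError(f"{operation} needs keyspace, table, and key")
        run += f" {keyspace} {table} {key}"
    return [f"bean {STORAGE_PROXY_BEAN}", run]


def run_with_jmxterm(commands, jar="jmxterm.jar", url="localhost:7199"):
    """Pipe commands to jmxterm (hypothetical jar path; -l/-n are assumed options)."""
    script = "\n".join(commands) + "\n"
    return subprocess.run(
        ["java", "-jar", jar, "-l", url, "-n"],
        input=script, text=True, capture_output=True, check=True,
    )


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try anywhere.
    for line in jmxterm_commands("denylistKey", "foo", "bar", "baz"):
        print(line)
```

Keeping command construction separate from execution makes it easy to review exactly what will be sent to a node before anything is run.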
Other aspects of the denylist are controlled via properties in cassandra.yaml. Operators can control whether reads, range reads, and/or writes fail for partitions listed in the denylist. They can also control the frequency of local node cache refreshes and the consistency level used to load the cache. A higher consistency level gives more accuracy at the cost of potentially not being able to complete the query, while a lower consistency level makes a successful load more likely but increases the chance of stale or missing records in the denylist.
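To make the effect of those per-operation switches concrete, here is a toy model in Python. It is not Cassandra's implementation: the class, field, and method names are hypothetical, the error text is illustrative, and a range read is simplified to "the list of keys the range would cover". It only shows the idea that a denied operation fails immediately, before any real work is done, and that each kind of operation can be switched independently.

```python
from dataclasses import dataclass, field


class DenylistedPartitionError(Exception):
    """Raised instead of doing any work on a denylisted partition."""


@dataclass
class DenylistModel:
    # Hypothetical toggles, one per kind of operation that can be denied.
    deny_reads: bool = True
    deny_writes: bool = True
    deny_range_reads: bool = True
    # Stand-in for a node's local cache of (keyspace, table, key) entries.
    cache: set = field(default_factory=set)

    def check(self, operation, keyspace, table, keys):
        """Fail fast if `operation` touches a denylisted key and that kind is denied."""
        enabled = {
            "read": self.deny_reads,
            "write": self.deny_writes,
            "range_read": self.deny_range_reads,
        }[operation]
        if not enabled:
            return
        for key in keys:
            if (keyspace, table, key) in self.cache:
                raise DenylistedPartitionError(
                    f"{operation} denied for partition {key} in {keyspace}/{table}"
                )


if __name__ == "__main__":
    model = DenylistModel(deny_writes=False, cache={("foo", "buz", 3)})
    model.check("read", "foo", "buz", [9])   # passes: key 9 is not denylisted
    model.check("write", "foo", "buz", [3])  # passes: writes are not denied here
    try:
        model.check("range_read", "foo", "buz", [1, 2, 3])
    except DenylistedPartitionError as err:
        print(err)
```

In this sketch the check happens before any data is touched, which mirrors the goal described earlier of turning slow, resource-hungry operations into immediate failures.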
Clients of Cassandra have no control over which partitions are in the denylist. They will, however, get an informative exception telling them why the read or write failed. In the example below, key 3 has been placed into the denylist while key 9 remains unaffected; both reads and writes are configured to be denied.
    cqlsh> select * from system_distributed.partition_denylist;

     ks_name | table_name | key
    ---------+------------+------------
         foo |        buz | 0x00000003

    cqlsh> select * from foo.buz where key=3;
    InvalidRequest: Error from server: code=2200 [Invalid query] message= \
      "Unable to read denylisted partition [0xDecoratedKey(9010454139840013625, 00000003)] in foo/buz"

    cqlsh> select * from foo.buz where key=9;

     key | c1 | c2 | c3 | c4
    -----+----+----+----+----
       9 |  1 |  3 |  4 |  5
       9 |  9 |  3 |  4 |  5

    cqlsh> insert into foo.buz (key, c1, c2, c3, c4) VALUES (3, 1, 1, 1, 1);
    InvalidRequest: Error from server: code=2200 [Invalid query] message= \
      "Unable to write to denylisted partition [0xDecoratedKey(9010454139840013625, 00000003)] in foo/buz"
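The denylist table and the error messages show the key as raw bytes (0x00000003) rather than as the value 3 used in the queries. Assuming the key column of foo.buz is a CQL int, which is consistent with the four-byte value in the output, the mapping is a 32-bit big-endian encoding. A small Python sketch (the function name is made up for illustration) reproduces it:

```python
import struct


def int_key_to_blob(key):
    """Encode a CQL int partition key as 4 big-endian bytes, shown as a 0x-prefixed blob."""
    return "0x" + struct.pack(">i", key).hex()


print(int_key_to_blob(3))  # 0x00000003
print(int_key_to_blob(9))  # 0x00000009
```

This can help when matching rows in system_distributed.partition_denylist against the keys an application actually queries. Other key types, and composite partition keys in particular, serialize differently and are not covered by this sketch.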
The Partition Denylist provides operators with a powerful tool when data models run off the tracks. The Cassandra community continues to strengthen and widen those tracks with the hopes that the need to prevent reads and writes to specific partitions will diminish. However, users always find new ways to surprise database authors and administrators. For those cases, the Partition Denylist provides a quick solution with a significant impact.