Testing

Creating tests is one of the most important and also most difficult parts of developing Cassandra. There are different ways to test your code depending on what you’re working on.

Cassandra tests can be divided into three main categories, based on how they are executed:

  • Java tests - tests implemented in Java and being a part of the Cassandra project. You can distinguish the following subcategories there:

    • JUnit tests - consist of unit tests, single-node integration tests and some tool tests; those tests may run a server with limited functionality in the same JVM as the test code

    • JVM distributed tests - integration tests against one or multiple nodes, each running in its own classloader; also contains upgrade tests

    • Micro-benchmarks - micro-benchmarks implemented with JMH framework

  • CQLSH tests - CQLSH tests are Python tests written with the Nose test framework. They verify the CQLSH client that can be found in the bin directory. They aim at verifying CQLSH-specific behavior (output formatting, autocompletion, parsing, etc.).

  • Python distributed tests - Python distributed tests are implemented on top of the PyTest framework and located outside the main Cassandra project in the separate repository apache/cassandra-dtest. They test Cassandra via CCM, verifying operation results, logs, and cluster state. Python distributed tests are Cassandra version agnostic. They include upgrade tests.

If you want to run DTests with your own version of CCM, refer to requirements.txt in apache/cassandra-dtest for how to do it.

The recipes for running those tests can be found in the cassandra-builds repository.

Running full test suites locally takes hours, if not days. Beyond running the specific tests that you know apply to the work at hand, or that are failing, it is recommended to rely on the project’s Continuous Integration systems. If you are not a committer and don’t have access to a premium CircleCI plan, ask one of the committers to test your patch on the project’s ci-cassandra.apache.org.

Java tests

The simplest test to write for Cassandra code is a unit test. Cassandra uses JUnit as a testing framework and test cases can be found in the test/unit directory. Ideally, you’ll create a unit test for your implementation that exclusively covers the class you created (the unit under test).

Unfortunately, this is not always possible, because Cassandra doesn’t have a very mock friendly code base. Often you’ll find yourself in a situation where you have to use the embedded Cassandra instance to interact with your test. If you want to use CQL in your test, you can extend CQLTester and use some convenient helper methods, as shown here:

@Test
public void testBatchAndList() throws Throwable
{
   createTable("CREATE TABLE %s (k int PRIMARY KEY, l list<int>)");
   execute("BEGIN BATCH " +
           "UPDATE %1$s SET l = l + [ 1 ] WHERE k = 0; " +
           "UPDATE %1$s SET l = l + [ 2 ] WHERE k = 0; " +
           "UPDATE %1$s SET l = l + [ 3 ] WHERE k = 0; " +
           "APPLY BATCH");

   assertRows(execute("SELECT l FROM %s WHERE k = 0"),
              row(list(1, 2, 3)));
}

JUnit tests

To run the unit tests:

ant test

However, this is probably not what you want to do, since that command would run all the unit tests (those from test/unit). It would take about an hour or more to finish.

To run a specific test class or even a single method, use the following command:

ant testsome -Dtest.name=<TestClassName> -Dtest.methods=<testMethodName>
  • test.name property is for either a simple or fully qualified class name

  • test.methods property is optional; if not specified, all test cases from the specified class are executed. You can also specify multiple methods by separating them with commas

You can also use the IDE to run the tests - when you generate IDE files and properly import the Cassandra project, you can run the tests by right-clicking on the test class or package name. Remember that for some tests it is not enough to compile with the IDE, and you need to call ant jar to build the distribution artifacts. When a test runs some tool as an external process, the tool expects Cassandra artifacts to be in the build directory.

Note that those commands apply to the tests in the test/unit directory. There are, however, some other test categories that have tests in individual directories:

  • test/burn - to run them, first build the uber jar with ant burn-test-jar, and then to run the tests call ant test-burn or ant burn-testsome

  • test/long - to run them, call ant long-test or ant long-testsome

  • test/memory - to run them, call ant test-memory

  • test/microbench discussed in Micro-benchmarks

  • test/distributed discussed in JVM distributed tests

Hint

If you get an error similar to the one below, install the ant-optional package because you need the JUnitTask class (see prerequisites).

Throws: cassandra-trunk/build.xml:1134: taskdef A class needed by class org.krummas.junit.JStackJUnitTask cannot be found:
org/apache/tools/ant/taskdefs/optional/junit/JUnitTask  using the classloader
AntClassLoader[/.../cassandra-trunk/lib/jstackjunit-0.0.1.jar]

Stress and FQLTool tests

Stress and FQLTool are separate modules located under the tools directory in the Cassandra project. They have their own source code and unit tests. To run the tests for those tools, first build jar artifacts for them by calling:

ant fqltool-build fqltool-build-test
ant stress-build stress-build-test

Then you can execute the tests with either one of the commands:

ant fqltool-test
ant stress-test
ant stress-test-some

or using your IDE.

JVM distributed tests

JVM distributed tests can run a cluster of nodes inside a single JVM - they utilize a particular framework (that can be found at apache/cassandra-in-jvm-dtest-api) for that purpose. Those tests are intended to test features that require multiple running nodes or to verify specific behaviors when the nodes get restarted, including upgrading them from one version to another. The tests are located in the test/distributed directory of the Cassandra project; however, only the org.apache.cassandra.distributed.test and org.apache.cassandra.distributed.upgrade packages contain the actual tests. The rest of the files are various utilities related to the distributed test framework.

The distributed tests can be run in a few ways. The ant test-jvm-dtest command runs all the distributed JVM tests. Running the whole suite is rarely what you want; thus, there is also ant test-jvm-dtest-some, which allows specifying the test class and test method in a similar way as for the ant testsome command, for example:

ant test-jvm-dtest-some -Dtest.name=org.apache.cassandra.distributed.test.SchemaTest

ant test-jvm-dtest-some -Dtest.name=org.apache.cassandra.distributed.test.SchemaTest -Dtest.methods=readRepair
Hint

Unlike for JUnit tests, for JVM distributed tests you need to provide a fully qualified class name.
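
If you script such runs, the rule from the hint can be checked before Ant is even started. The following is a minimal sketch, not part of the Cassandra build: jvm_dtest_command is a hypothetical helper name, and it only assembles the command line shown above.

```python
def jvm_dtest_command(test_class, methods=None):
    """Build the argument list for `ant test-jvm-dtest-some`."""
    # JVM distributed tests need a fully qualified class name, so reject
    # a bare simple name early instead of letting the Ant run fail later.
    if "." not in test_class:
        raise ValueError(
            f"expected a fully qualified class name, got {test_class!r}")
    command = ["ant", "test-jvm-dtest-some", f"-Dtest.name={test_class}"]
    if methods:
        # Several methods are passed as one comma-separated property value.
        command.append("-Dtest.methods=" + ",".join(methods))
    return command


if __name__ == "__main__":
    print(" ".join(jvm_dtest_command(
        "org.apache.cassandra.distributed.test.SchemaTest", ["readRepair"])))
```

The returned list can be handed to subprocess.run from the root of the Cassandra checkout.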

Distributed tests can also be run using an IDE (in fact, you can even debug them).

Upgrade tests

JVM upgrade tests can be run in exactly the same way as any other JVM distributed tests. However, running them requires some preparation - for example, if a test verifies the upgrade from Cassandra 3.0 and Cassandra 3.11 to the current version (say Cassandra 4.0), you need to have prepared dtest uber JARs for all involved versions. To do this:

  1. Check out the Cassandra 3.0 based branch you want to test the upgrade from into some other directory

  2. Build dtest uber JAR with ant dtest-jar command

  3. Copy the created build/dtest-3.0.x.jar to the build directory of your target Cassandra project

  4. Repeat the procedure for Cassandra 3.11

  5. Once you have dtest jars of all the involved versions for the upgrade test, you can finally execute the test using your favorite method, say:

ant test-jvm-dtest-some -Dtest.name=org.apache.cassandra.distributed.upgrade.MixedModeReadTest
Hint

You may pre-generate dtest uber JARs for certain past Cassandra releases, store them somewhere, and reuse them in your future work - no need to rebuild them all the time.

Running multiple tests

It is possible to define a list of test classes to run with a single command. Define a text file, by default called testlist.txt, and put it into your project directory. Here is an example of that file:

org/apache/cassandra/db/ReadCommandTest.java
org/apache/cassandra/db/ReadCommandVerbHandlerTest.java

Essentially, you list the paths to the class files of the tests you want to run. Then you call ant testclasslist, which uses the text file to run the listed tests. Note that, by default, it applies to the tests under the test/unit directory and takes the testlist.txt file, but this behavior can be modified by providing additional parameters:

ant testclasslist -Dtest.classlistprefix=<category> -Dtest.classlistfile=<class list file>

For example, if you want to run the distributed tests this way, and say our tests were listed in the distributed-tests-set.txt file (paths to test classes relative to test/distributed directory), you can do that by calling:

ant testclasslist -Dtest.classlistprefix=distributed -Dtest.classlistfile=distributed-tests-set.txt
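
When the set of tests comes from somewhere else (for example, a list of failing classes copied from a CI report), the list file can be generated instead of typed by hand. This is a small sketch under the assumption that entries follow the example file above, that is, the package path plus the .java suffix; both function names are hypothetical.

```python
from pathlib import Path


def class_to_list_entry(class_name):
    """Turn a fully qualified test class name into a list-file entry."""
    return class_name.replace(".", "/") + ".java"


def write_test_list(class_names, list_file="testlist.txt"):
    """Write one entry per line and return the entries that were written."""
    entries = [class_to_list_entry(name) for name in class_names]
    Path(list_file).write_text("\n".join(entries) + "\n")
    return entries


if __name__ == "__main__":
    for entry in write_test_list([
            "org.apache.cassandra.db.ReadCommandTest",
            "org.apache.cassandra.db.ReadCommandVerbHandlerTest"]):
        print(entry)
```

Run from the project directory, this produces the same two-line testlist.txt as the example shown earlier.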

Running coverage analysis

Coverage reports from the executed JVM tests can be obtained in two ways. The first is through an IDE: for example, IntelliJ supports running tests with coverage analysis (another run button next to the one for running in debug mode).

The other way is to run the Ant target codecoverage. It works with all the ways of running JVM tests mentioned above; the only difference is that instead of specifying the target directly, you pass it as a property called taskname. For example, given the original test command is:

ant testsome -Dtest.name=org.apache.cassandra.utils.concurrent.AccumulatorTest

to run it with coverage analysis, do:

ant codecoverage -Dtaskname=testsome -Dtest.name=org.apache.cassandra.utils.concurrent.AccumulatorTest

It applies to all the targets like test, testsome, long-test, etc., even testclasslist. You can find the coverage report in build/jacoco (index.html is the entry point for the HTML version, but there are also XML and CSV reports).

Note that if you run various tests that way, the coverage information is added to the previously collected runs. That is, you get the cumulative coverage from all runs unless you clean up the project or at least clean up the recorded coverage information by executing the command ant jacoco-cleanup.
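
The rewrite from a plain test command to its coverage variant is mechanical: the target moves into the taskname property and everything else stays. A minimal sketch of that transformation, with with_coverage being a hypothetical helper name:

```python
def with_coverage(ant_command):
    """Rewrite ['ant', <target>, <props>...] to run under codecoverage."""
    if len(ant_command) < 2 or ant_command[0] != "ant":
        raise ValueError("expected a command like ['ant', '<target>', ...]")
    target, properties = ant_command[1], ant_command[2:]
    # The original target becomes the taskname property; the remaining
    # -D properties are passed through unchanged.
    return ["ant", "codecoverage", f"-Dtaskname={target}", *properties]


if __name__ == "__main__":
    plain = ["ant", "testsome",
             "-Dtest.name=org.apache.cassandra.utils.concurrent.AccumulatorTest"]
    print(" ".join(with_coverage(plain)))
```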

Micro-benchmarks

To run micro-benchmarks, first build the uber jar for the JMH framework. Use the following ant command:

ant build-jmh

Then, you can run either all benchmarks (from the test/microbench directory) or the tests matching the name specified by the benchmark.name property when executing the ant microbench command. Whether you run all benchmarks or just a selected one, only classes under the microbench package are selected. The class selection pattern is actually .*microbench.*${benchmark.name}. For example, in order to run org.apache.cassandra.test.microbench.ChecksumBench, execute:

ant microbench -Dbenchmark.name=ChecksumBench
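
To see which classes a given benchmark.name would select, the pattern can be tried out in isolation. The sketch below assumes the pattern is applied as a regular-expression search over the fully qualified class name; how JMH applies it internally is not shown in this document, so treat the result as an approximation. The function names are hypothetical.

```python
import re


def selection_pattern(benchmark_name=""):
    """Build the class selection pattern described above."""
    return ".*microbench.*" + benchmark_name


def is_selected(class_name, benchmark_name=""):
    # Assumption: the pattern is used as a regex search over the
    # fully qualified benchmark class name.
    pattern = selection_pattern(benchmark_name)
    return re.search(pattern, class_name) is not None


if __name__ == "__main__":
    for name in ("org.apache.cassandra.test.microbench.ChecksumBench",
                 "org.apache.cassandra.test.microbench.CompactionBench"):
        print(name, is_selected(name, "ChecksumBench"))
```

With an empty benchmark name the pattern reduces to .*microbench.*, which is why only classes under the microbench package are ever selected.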

The ant microbench command runs the benchmarks with default parameters as defined in the build.xml file (see the microbench target definition). If you want to run JMH with custom parameters, consider using the test/bin/jmh script. In addition to allowing you to customize JMH options, it also sets up the environment and JVM options by running the Cassandra init script (conf/cassandra-env.sh). Therefore, it makes the environment for running the tests more similar to the production environment. For example:

test/bin/jmh -gc true org.apache.cassandra.test.microbench.CompactionBench.compactTest

You may also find it useful to list all the tests with test/bin/jmh -l or test/bin/jmh -lp (the latter also shows the default parameters). The list of all options can be shown by running test/bin/jmh -h.

Python tests

Docker

The Docker approach is recommended for running Python distributed tests. The behavior will be more repeatable, matching the same environment as the official testing on Cassandra CI.

Setup Docker

If you are on Linux, you need to install Docker using the system package manager.

If you are on MacOS, you can use either Docker Desktop or some other approach.

Pull the Docker image

The Docker image used on the official Cassandra CI can be found in this repository. You can use either docker/testing/ubuntu2004_j11.docker or docker/testing/ubuntu2004_j11_w_dependencies.docker. The second choice has prefetched dependencies for building each main Cassandra branch. Those images can be either built locally (as per instructions in the GitHub repo) or pulled from the Docker Hub.

First, pull the image from Docker Hub (it will either fetch or update the image you previously fetched):

docker pull apache/cassandra-testing-ubuntu2004-java11-w-dependencies

Start the container

docker run -di -m 8G --cpus 4 \
--mount type=bind,source=/path/to/cassandra/project,target=/home/cassandra/cassandra \
--mount type=bind,source=/path/to/cassandra-dtest,target=/home/cassandra/cassandra-dtest \
--name test \
apache/cassandra-testing-ubuntu2004-java11-w-dependencies \
dumb-init bash
Hint

Many distributed tests are not that demanding in terms of resources - 4G / 2 cores should be enough to start one node. However, some tests really run multiple nodes, and some of them are automatically skipped if the machine has less than 32G (there is a way to force running them though). Usually 8G / 4 cores is a convenient choice which is enough for most of the tests.

To log into the container, use the following docker exec command:

docker exec -it `docker container ls -f name=test -q` bash

Setup Python environment

The tests are implemented in Python, so it is a good idea to set up a Python virtual environment with all the required dependencies. If you are familiar with the Python ecosystem, you know what it is all about. Otherwise, follow the instructions; they should be enough to run the tests.

For Python distributed tests do:

cd /home/cassandra/cassandra-dtest
virtualenv --python=python3 --clear --always-copy ../dtest-venv
source ../dtest-venv/bin/activate
CASS_DRIVER_NO_CYTHON=1 pip install -r requirements.txt

For CQLSH tests, replace some paths:

cd /home/cassandra/cassandra/pylib
virtualenv --python=python3 --clear --always-copy ../../cqlsh-venv
source ../../cqlsh-venv/bin/activate
CASS_DRIVER_NO_CYTHON=1 pip install -r requirements.txt
Hint

You may wonder why the environment variable CASS_DRIVER_NO_CYTHON=1 was added. It is not required, but it avoids compiling the Cassandra driver with Cython, which is not needed unless you want to test the Cython-compiled driver. In the end, it speeds up the installation of the requirements significantly, from the order of minutes to the order of seconds.

The above commands are also helpful for importing those test projects into your IDE. In that case, you need to run them on your host system rather than in Docker container. For example, when you open the project in IntelliJ, the Python plugin may ask you to select the runtime environment. In this case, choose the existing virtualenv based environment and point to bin/python under the created dtest-venv directory (or cqlsh-venv, or whichever name you have chosen).

Whether you want to play with Python distributed tests or CQLSH tests, you need to select the right virtual environment. Remember to switch to the one you want:

deactivate
source /home/cassandra/dtest-venv/bin/activate

or

deactivate
source /home/cassandra/cqlsh-venv/bin/activate

CQLSH tests

CQLSH tests are located in the pylib/cqlshlib/test directory. They are based on the Nose framework. They require a running Cassandra cluster (of one or more nodes) as they start a CQL shell client which tries to connect to a live node. Each test case starts the CQLSH client as a subprocess, issues some commands, and verifies the outcome returned by CQLSH to the console.

You need to set up and activate the virtualenv for CQLSH tests (see Setup Python environment section for details).

So let’s start the cluster first - you can use CCM for that (note that CCM gets automatically installed with the virtualenv and is immediately available once the virtualenv is activated):

ccm create test -n 1 --install-dir=/home/cassandra/cassandra
ccm updateconf "enable_user_defined_functions: true"
ccm updateconf "enable_scripted_user_defined_functions: true"
ccm updateconf "cdc_enabled: true"
ccm start --wait-for-binary-proto

When those commands complete successfully, there is a cluster up and running, and you can run the CQLSH tests. To do so, go to the pylib/cqlshlib directory (not to the test subdirectory) and call the nosetests command without any arguments. The tests take around 5 minutes to complete.

Finally, remember that since you manually started the cluster, you need to stop it manually - just call:

ccm remove test

There is a helper script that does all of those things for you. In particular, it builds the Cassandra project, creates a virtual environment, runs the CCM cluster, executes the tests, and eventually removes the cluster. You can find the script in the pylib directory. The only argument it requires is the Cassandra project directory:

~/cassandra/pylib$ ./cassandra-cqlsh-tests.sh /home/cassandra/cassandra

As you noticed, if you have already built Cassandra, the previous method of running tests is much faster. Just remember that all the ccm updateconf calls must be aligned with the Cassandra version you are testing, with the supported features enabled. Otherwise, Cassandra won’t start.

Running selected tests

You may run all tests from the selected file by passing that file as an argument:

~/cassandra/pylib/cqlshlib$ nosetests test/test_constants.py

To run a specific test case, you need to specify the module, class name, and the test name, for example:

~/cassandra/pylib/cqlshlib$ nosetests cqlshlib.test.test_cqlsh_output:TestCqlshOutput.test_boolean_output

For more information on selecting tests with the Nose framework, see the Nose documentation.

Python distributed tests

One way of doing integration or system testing at a larger scale is using dtest (Cassandra distributed test). These dtests automatically set up Cassandra clusters with certain configurations and simulate use cases you want to test.

The best way to learn how to write dtests is probably by reading the introduction "How to Write a Dtest" (http://www.datastax.com/dev/blog/how-to-write-a-dtest). Looking at existing, recently updated tests in the project is another good activity. New tests must follow certain style conventions that are checked before contributions are accepted. In contrast to Cassandra, dtest issues and pull requests are managed on GitHub; therefore, you should make sure to link any created dtests in your Cassandra ticket and also refer to the ticket number in your dtest PR.

Creating a good dtest can be tough, but it should not prevent you from submitting patches! Please ask in the corresponding JIRA ticket how to write a good dtest for the patch. In most cases a reviewer or committer will be able to support you, and in some cases they may offer to write a dtest for you.

Run the tests - quick examples

Note that you need to set up and activate the virtualenv for DTests (see Setup Python environment section for details). Tests are implemented with the PyTest framework, so you use the pytest command to run them. Let’s run some tests:

pytest --cassandra-dir=/home/cassandra/cassandra schema_metadata_test.py::TestSchemaMetadata::test_clustering_order

That command runs the test_clustering_order test case from TestSchemaMetadata class, located in the schema_metadata_test.py file. You may also provide the file and class to run all test cases from that class:

pytest --cassandra-dir=/home/cassandra/cassandra schema_metadata_test.py::TestSchemaMetadata

or just the file name to run all test cases from all classes defined in that file.

pytest --cassandra-dir=/home/cassandra/cassandra schema_metadata_test.py

You may also specify several individual targets:

pytest --cassandra-dir=/home/cassandra/cassandra schema_metadata_test.py::TestSchemaMetadata::test_basic_table_datatype  schema_metadata_test.py::TestSchemaMetadata::test_udf

If you run pytest without specifying any test, it runs all the tests it can find. See the PyTest documentation for more on test selection. You probably noticed that --cassandra-dir=/home/cassandra/cassandra is added to every command line. It is one of the cassandra-dtest custom arguments, and a mandatory one: unless it is defined, you cannot run any Cassandra dtest.

Setting up PyTest

All the possible options can be listed by invoking pytest --help. You see tons of possible parameters - some of them are native PyTest options, and some come from Cassandra DTest. When you look carefully at the help note, you notice that some commonly used options, usually fixed for all the invocations, can be put into the pytest.ini file. In particular, it is quite practical to define the following:

cassandra_dir = /home/cassandra/cassandra
log_cli = True
log_cli_level = DEBUG

so that you do not have to provide --cassandra-dir param each time you run a test. The other two options set up console logging - remove them if you want logs stored only in log files.
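
PyTest reads such options from the [pytest] section of pytest.ini, so the three lines above belong under that header. As a quick sanity check, the file can be parsed with the standard configparser module. This is only a sketch: EXAMPLE_INI and read_pytest_options are illustrative names, and your real pytest.ini may contain additional options.

```python
import configparser

# Illustrative file content; the [pytest] header is what pytest expects
# in pytest.ini, the three options are the ones discussed above.
EXAMPLE_INI = """\
[pytest]
cassandra_dir = /home/cassandra/cassandra
log_cli = True
log_cli_level = DEBUG
"""


def read_pytest_options(ini_text):
    """Return the options of the [pytest] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["pytest"])


if __name__ == "__main__":
    for key, value in read_pytest_options(EXAMPLE_INI).items():
        print(f"{key} = {value}")
```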

Running tests with specific configuration

There are a couple of options to enforce exact test configuration (their names are quite self-explanatory):

  • --use-vnodes

  • --num-tokens=xxx - enables the support of virtual nodes with a certain number of tokens

  • --use-off-heap-memtables - use off-heap memtables instead of the default heap-based

  • --data-dir-count-per-instance=xxx - the number of data directories configured per each instance

Note that the list can grow in the future as new predefined configurations can be added to dtests. It is also possible to pass extra Java properties to each Cassandra node started by the tests - define those options in the JVM_EXTRA_OPTS environment variable before running the test.
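
When the tests are launched from a script, the variable can be set on a copy of the environment rather than exported globally. A minimal sketch, assuming several options are joined with spaces; dtest_environment is a hypothetical helper and the property names in the usage part are placeholders, not real Cassandra properties.

```python
import os


def dtest_environment(extra_java_options, base_env=None):
    """Return a copy of the environment with JVM_EXTRA_OPTS set."""
    # Copy so that the caller's (or the process-wide) environment
    # is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["JVM_EXTRA_OPTS"] = " ".join(extra_java_options)
    return env


if __name__ == "__main__":
    # Placeholder properties, for illustration only.
    env = dtest_environment(["-Dexample.first=1", "-Dexample.second=2"])
    print(env["JVM_EXTRA_OPTS"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when invoking pytest.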

Listing the tests

You can do a dry run, so that the tests are only listed and not invoked. To do that, add --collect-only to the pytest command. The additional -q option prints the results in the same format in which you would pass the test names to the pytest command:

pytest --collect-only -q

lists all the tests pytest would run if no particular test is specified. Similarly, to list test cases in some class, do:

$ pytest --collect-only -q schema_metadata_test.py::TestSchemaMetadata
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_keyspace
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_table
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_table_with_2ary_indexes
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_user_types
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_udf
schema_metadata_test.py::TestSchemaMetadata::test_creating_and_dropping_uda
schema_metadata_test.py::TestSchemaMetadata::test_basic_table_datatype
schema_metadata_test.py::TestSchemaMetadata::test_collection_table_datatype
schema_metadata_test.py::TestSchemaMetadata::test_clustering_order
schema_metadata_test.py::TestSchemaMetadata::test_compact_storage
schema_metadata_test.py::TestSchemaMetadata::test_compact_storage_composite
schema_metadata_test.py::TestSchemaMetadata::test_nondefault_table_settings
schema_metadata_test.py::TestSchemaMetadata::test_indexes
schema_metadata_test.py::TestSchemaMetadata::test_durable_writes
schema_metadata_test.py::TestSchemaMetadata::test_static_column
schema_metadata_test.py::TestSchemaMetadata::test_udt_table
schema_metadata_test.py::TestSchemaMetadata::test_udf
schema_metadata_test.py::TestSchemaMetadata::test_uda

You can copy/paste the selected test case to the pytest command to run it.
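
Because each listed line is a complete test identifier, the listing is also easy to post-process in a script. The sketch below assumes that every line containing "::" is a test identifier and that anything else (blank lines, the trailing summary) can be ignored; parse_collected is a hypothetical helper name.

```python
def parse_collected(output):
    """Extract test identifiers from `pytest --collect-only -q` output."""
    # Test identifiers contain '::'; blank lines and the summary do not.
    return [line.strip() for line in output.splitlines() if "::" in line]


if __name__ == "__main__":
    sample = (
        "schema_metadata_test.py::TestSchemaMetadata::test_udf\n"
        "schema_metadata_test.py::TestSchemaMetadata::test_uda\n"
        "\n"
    )
    for test_id in parse_collected(sample):
        print(test_id)
```

The output of the real command can be captured with subprocess.run(..., capture_output=True, text=True) and passed to the function as a string.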

Filtering tests

Based on configuration

Most tests run with any configuration, but a subset of tests (test cases) only run if a specific configuration is used. In particular, there are tests annotated with:

  • @pytest.mark.vnodes - the test is only invoked when the support of virtual nodes is enabled

  • @pytest.mark.no_vnodes - the test is only invoked when the support of virtual nodes is disabled

  • @pytest.mark.no_offheap_memtables - the test is only invoked if off-heap memtables are not used

Note that enabling or disabling vnodes is obviously mutually exclusive. If a test is marked to run only with vnodes, it does not run when vnodes is disabled; similarly, when a test is marked to run only without vnodes, it does not run when vnodes is enabled - therefore, there are always some tests which would not run with a single configuration.
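
The three rules can be summarized as a small decision function. This is a model of the behavior described above, not the code used by cassandra-dtest; the function name and its parameters are made up for illustration.

```python
def runs_with_configuration(marks, use_vnodes, use_off_heap_memtables):
    """Tell whether a test with the given marks runs in a configuration."""
    if "vnodes" in marks and not use_vnodes:
        return False  # needs vnodes, but they are disabled
    if "no_vnodes" in marks and use_vnodes:
        return False  # needs vnodes disabled, but they are enabled
    if "no_offheap_memtables" in marks and use_off_heap_memtables:
        return False  # must not run with off-heap memtables
    return True  # unmarked tests run with any configuration


if __name__ == "__main__":
    for marks in (set(), {"vnodes"}, {"no_vnodes"}):
        print(sorted(marks), runs_with_configuration(marks, True, False))
```

Whatever value use_vnodes takes, one of the vnodes or no_vnodes groups is filtered out, which is why a single configuration never runs every test.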

Based on resource usage

There are also tests marked with:

@pytest.mark.resource_intensive

which means that the test requires more resources than a regular test because it usually starts a cluster of several nodes. The meaning of resource-intensive is hardcoded to 32GB of available memory, and unless your machine or Docker container has at least that amount of RAM, such a test is skipped. There are a couple of arguments that allow for some control of that automatic exclusion:

  • --force-resource-intensive-tests - forces the execution of tests marked as resource_intensive, regardless of whether there is enough memory available or not

  • --only-resource-intensive-tests - only run tests marked as resource_intensive - all the tests without the resource_intensive annotation are filtered out; technically, it is equivalent to passing the native PyTest argument: -m resource_intensive

  • --skip-resource-intensive-tests - skip all tests marked as resource_intensive - it is the opposite of the previous argument, and it is equivalent to the native PyTest argument: -m 'not resource_intensive'

Based on the test type

Upgrade tests are marked with:

@pytest.mark.upgrade_test

Those tests are not invoked by default at all (just like running PyTest with -m 'not upgrade_test'), and you have to add some extra options to run them:

  • --execute-upgrade-tests - enables execution of upgrade tests along with other tests - when this option is added, the upgrade tests are not filtered out

  • --execute-upgrade-tests-only - execute only upgrade tests and filter out all other tests which do not have @pytest.mark.upgrade_test annotation (just like running PyTest with -m 'upgrade_test')

Filtering examples

It does not matter whether you want to invoke individual tests or all tests or whether you only want to list them; the above filtering rules apply. So by using --collect-only option, you can learn which tests would be invoked.

To list all the applicable tests for the current configuration, use the following command:

pytest --collect-only -q --execute-upgrade-tests --force-resource-intensive-tests

List tests specific to vnodes (which would only run if vnodes are enabled):

pytest --collect-only -q --execute-upgrade-tests --force-resource-intensive-tests --use-vnodes -m vnodes

List tests that are not resource-intensive:

pytest --collect-only -q --execute-upgrade-tests --skip-resource-intensive-tests

Upgrade tests

Upgrade tests always involve more than one product version. There are two kinds of upgrade tests regarding the product versions they span - let’s call them fixed and generated.

In the case of fixed tests, the origin and target versions are hardcoded. They are listed like any other test, for example:

pytest --collect-only -q --execute-upgrade-tests --execute-upgrade-tests-only upgrade_tests/upgrade_supercolumns_test.py

prints:

upgrade_tests/upgrade_supercolumns_test.py::TestSCUpgrade::test_upgrade_super_columns_through_all_versions
upgrade_tests/upgrade_supercolumns_test.py::TestSCUpgrade::test_upgrade_super_columns_through_limited_versions

When you look into the code, you will see the fixed upgrade path:

def test_upgrade_super_columns_through_all_versions(self):
    self._upgrade_super_columns_through_versions_test(upgrade_path=[indev_2_2_x, indev_3_0_x, indev_3_11_x, indev_trunk])

The generated upgrade tests are listed several times - the first occurrence of the test case is a generic test definition, and then it is repeated many times in generated test classes. For example:

pytest --cassandra-dir=/home/cassandra/cassandra --collect-only -q --execute-upgrade-tests --execute-upgrade-tests-only upgrade_tests/cql_tests.py -k test_set

prints:

upgrade_tests/cql_tests.py::cls::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_current_2_2_x_To_indev_2_2_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_current_3_0_x_To_indev_3_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_current_3_11_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_current_4_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_2_2_x_To_indev_3_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_2_2_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_3_0_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_3_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_3_11_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_4_0_x_To_indev_trunk::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_2_2_x_To_indev_2_2_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_3_0_x_To_indev_3_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_3_11_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_4_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_2_2_x_To_indev_3_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_2_2_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_3_0_x_To_indev_3_11_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_3_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_3_11_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_4_0_x_To_indev_trunk::test_set

In this example, the test case name is just test_set, and the class name is TestCQL - the suffix of the class name is automatically generated from the provided specification. The first component is the cluster specification - there are two variants: Nodes2RF1 and Nodes3RF3. The first denotes that the upgrade is tested on a 2-node cluster with a keyspace using replication factor = 1. Analogously, the second variant uses a 3-node cluster with RF = 3.

Then, there is the upgrade specification - for example, Upgrade_indev_3_11_x_To_indev_4_0_x - which means that this test upgrades from the development version of Cassandra 3.11 to the development version of Cassandra 4.0 - the meaning of indev/current and where they are defined is explained later.
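
The naming scheme is regular enough to be taken apart with a regular expression, which can help when filtering a long listing by topology or upgrade path. The pattern below is inferred from the class names shown in the listing above and is an assumption, not something defined by cassandra-dtest; parse_generated_class is a hypothetical helper.

```python
import re

# Inferred layout: <Base>Nodes<N>RF<R>_Upgrade_<origin>_To_<target>
GENERATED_CLASS = re.compile(
    r"^(?P<base>\w+?)Nodes(?P<nodes>\d+)RF(?P<rf>\d+)"
    r"_Upgrade_(?P<origin>\w+?)_To_(?P<target>\w+)$"
)


def parse_generated_class(name):
    """Split a generated upgrade test class name into its components."""
    match = GENERATED_CLASS.match(name)
    if match is None:
        raise ValueError(f"not a generated upgrade test class: {name!r}")
    parts = match.groupdict()
    parts["nodes"] = int(parts["nodes"])
    parts["rf"] = int(parts["rf"])
    return parts


if __name__ == "__main__":
    print(parse_generated_class(
        "TestCQLNodes3RF3_Upgrade_indev_3_11_x_To_indev_4_0_x"))
```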

When you look into the implementation, you notice that such upgrade test classes inherit from the UpgradeTester class, and they have the specifications defined at the end of the file. In this particular case, it is something like:

topology_specs = [
    {'NODES': 3,
     'RF': 3,
     'CL': ConsistencyLevel.ALL},
    {'NODES': 2,
     'RF': 1},
]
specs = [dict(s, UPGRADE_PATH=p, __test__=True)
         for s, p in itertools.product(topology_specs, build_upgrade_pairs())]

As you can see, there is a list of the cluster specifications and the cross product is calculated with upgrade paths returned by the build_upgrade_pairs() function. That list of specifications is used to dynamically generate upgrade tests.

Suppose you need to test something specifically for your scenario. In that case, you can add more cluster specifications, like a test with 1 node or a test with 5 nodes with some different replication factor or consistency level. The build_upgrade_pairs() returns the list of upgrade paths (actually just the origin and target version). That list is generated according to the upgrade manifest.
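
To get a feel for how quickly the number of generated classes grows, the cross product can be reproduced outside of the test suite. In this sketch build_upgrade_pairs is a stand-in that returns two hardcoded pairs as plain tuples (the real function derives them from the upgrade manifest), the string 'ALL' stands in for ConsistencyLevel.ALL, and the single-node topology is an extra entry added purely for illustration.

```python
import itertools


def build_upgrade_pairs():
    """Stand-in for the real function: two hardcoded (origin, target) pairs."""
    return [("indev_3_0_x", "indev_3_11_x"),
            ("indev_3_11_x", "indev_4_0_x")]


topology_specs = [
    {"NODES": 3, "RF": 3, "CL": "ALL"},  # 'ALL' replaces ConsistencyLevel.ALL
    {"NODES": 2, "RF": 1},
    {"NODES": 1, "RF": 1},  # extra topology, added for illustration
]

# Every topology is combined with every upgrade pair.
specs = [dict(s, UPGRADE_PATH=p, __test__=True)
         for s, p in itertools.product(topology_specs, build_upgrade_pairs())]

if __name__ == "__main__":
    for spec in specs:
        print(spec["NODES"], spec["RF"], spec["UPGRADE_PATH"])
```

Three topologies and two upgrade pairs give six specifications, so each added topology multiplies the number of generated test classes by the number of upgrade pairs.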

Upgrade manifest

The upgrade manifest is a file where all the upgrade paths are defined. It is a regular Python file located at upgrade_tests/upgrade_manifest.py. As you may have noticed, the origin and target version descriptions in the upgrade test names consist of an indev or current prefix followed by a version string. The definition of each such version description can be found in the manifest, for example:

indev_3_11_x = VersionMeta(name='indev_3_11_x', family=CASSANDRA_3_11, variant='indev', version='github:apache/cassandra-3.11', min_proto_v=3, max_proto_v=4, java_versions=(8,))

current_3_11_x = VersionMeta(name='current_3_11_x', family=CASSANDRA_3_11, variant='current', version='3.11.10', min_proto_v=3, max_proto_v=4, java_versions=(8,))

There are several properties that describe those two versions:

  • name - the name that appears in the names of the generated test classes

  • family - one of the version families enumerated at the beginning of the upgrade manifest; for example, the family CASSANDRA_3_11 is just the string "3.11". Some major features were introduced or removed with new version families, so some checks or features can be enabled or disabled according to the family, for example:

if self.cluster.version() < CASSANDRA_4_0:
    node1.nodetool("enablethrift")

The family is also used to determine whether the checked-out version matches the target version in the upgrade pair (more on that later).

  • variant and version - there are two variants, indev and current:

    • indev variant means that the development version of Cassandra will be used. That is, the version is checked out from the Git repository and built before running the upgrade (CCM does it). In this case, the version string is specified as github:apache/cassandra-3.11, which means that CCM will check out the cassandra-3.11 branch from the GitHub repository whose alias is apache. Aliases are defined in the CCM configuration file, usually located at ~/.ccm/config - in this particular case, it could be something like:

[aliases]
apache:git@github.com:apache/cassandra.git

    • current variant means that a released version of Cassandra will be used. That is, the Cassandra distribution denoted by the specified version (3.11.10 in this case) is downloaded from the Apache repository/mirror - again, the repository can be defined in the CCM configuration file, under the repositories section, something like:

[repositories]
cassandra=https://archive.apache.org/dist/cassandra

  • min_proto_v, max_proto_v - the range of usable Cassandra driver protocol versions

  • java_versions - supported Java versions
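As a rough illustration of how such version descriptions can be used, the sketch below models VersionMeta as a plain named tuple with the fields listed above. The named-tuple shape and the supports_protocol() helper are assumptions made for this example and may differ from the actual manifest code.

```python
from collections import namedtuple

# Assumed shape: one field per property described above.
VersionMeta = namedtuple(
    "VersionMeta",
    "name family variant version min_proto_v max_proto_v java_versions")

CASSANDRA_3_11 = "3.11"

indev_3_11_x = VersionMeta(
    name="indev_3_11_x", family=CASSANDRA_3_11, variant="indev",
    version="github:apache/cassandra-3.11",
    min_proto_v=3, max_proto_v=4, java_versions=(8,))

current_3_11_x = VersionMeta(
    name="current_3_11_x", family=CASSANDRA_3_11, variant="current",
    version="3.11.10",
    min_proto_v=3, max_proto_v=4, java_versions=(8,))


def supports_protocol(meta, proto_v):
    # Hypothetical helper: is the driver protocol version within range?
    return meta.min_proto_v <= proto_v <= meta.max_proto_v


# Both descriptions belong to the same family but differ in variant.
print(indev_3_11_x.family == current_3_11_x.family)
print(supports_protocol(current_3_11_x, 4))
print(supports_protocol(current_3_11_x, 5))
```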

The possible upgrade paths are defined later in the upgrade manifest. When you scroll through the file, you will find the MANIFEST map, which may look similar to:

MANIFEST = {
    current_2_1_x: [indev_2_2_x, indev_3_0_x, indev_3_11_x],
    current_2_2_x: [indev_2_2_x, indev_3_0_x, indev_3_11_x],
    current_3_0_x: [indev_3_0_x, indev_3_11_x, indev_4_0_x],
    current_3_11_x: [indev_3_11_x, indev_4_0_x],
    current_4_0_x: [indev_4_0_x, indev_trunk],

    indev_2_2_x: [indev_3_0_x, indev_3_11_x],
    indev_3_0_x: [indev_3_11_x, indev_4_0_x],
    indev_3_11_x: [indev_4_0_x],
    indev_4_0_x: [indev_trunk]
}

It is a simple map where each origin version (the key) is associated with a list of possible target versions (the value). For example:

current_4_0_x: [indev_4_0_x, indev_trunk]

means that upgrades from current_4_0_x to indev_4_0_x and from current_4_0_x to indev_trunk will be considered. You may change those upgrade scenarios in your development branch according to your needs. There is a command-line option that allows filtering across upgrade scenarios: --upgrade-version-selection=xxx. The possible values for that option are as follows:

  • indev - the default; selects only those upgrade scenarios where the target version is in the indev variant

  • both - selects upgrade paths where the origin and target versions either are in the same variant or have the same version family

  • releases - selects upgrade paths between versions in the current variant, or from the current to the indev variant if both have the same version family

  • all - no filtering at all - all variants are tested
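The four selection modes can be read as predicates over (origin, target) pairs. The sketch below applies them to a reduced manifest. It follows the descriptions above literally and is not the actual dtest implementation; the Version tuple, the selected() function, and the "trunk" family string are assumptions made for illustration.

```python
from collections import namedtuple

# Reduced, hypothetical version description: only the fields the filter needs.
Version = namedtuple("Version", "name family variant")

current_3_11_x = Version("current_3_11_x", "3.11", "current")
current_4_0_x = Version("current_4_0_x", "4.0", "current")
indev_3_11_x = Version("indev_3_11_x", "3.11", "indev")
indev_4_0_x = Version("indev_4_0_x", "4.0", "indev")
indev_trunk = Version("indev_trunk", "trunk", "indev")  # placeholder family

MANIFEST = {
    current_3_11_x: [indev_3_11_x, indev_4_0_x],
    current_4_0_x: [indev_4_0_x, indev_trunk],
    indev_3_11_x: [indev_4_0_x],
    indev_4_0_x: [indev_trunk],
}


def selected(origin, target, mode):
    same_family = origin.family == target.family
    if mode == "all":
        return True
    if mode == "indev":
        return target.variant == "indev"
    if mode == "both":
        return origin.variant == target.variant or same_family
    if mode == "releases":
        if origin.variant != "current":
            return False
        return target.variant == "current" or (
            target.variant == "indev" and same_family)
    raise ValueError("unknown selection mode: " + mode)


def upgrade_pairs(mode):
    # Flatten the map into (origin, target) pairs, then apply the filter.
    return [(o.name, t.name)
            for o, targets in MANIFEST.items()
            for t in targets
            if selected(o, t, mode)]


for mode in ("indev", "both", "releases", "all"):
    print(mode, len(upgrade_pairs(mode)))
```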

Running upgrades with local distribution

The upgrade test can use your local Cassandra distribution, the one specified by the cassandra_dir property, as the target version if the following preconditions are satisfied:

  • the target version is in the indev variant,

  • the version family set in the version description matches the version family of your local distribution

For example, if your local distribution is branched off from the cassandra-4.0 branch, it likely matches indev_4_0_x. This means that upgrade paths with the target version indev_4_0_x use your local distribution. There is a handy command-line option that filters out all the upgrade tests that do not match the local distribution: --upgrade-target-version-only. Given you are on the cassandra-4.0 branch, applying it to the previous example gives something similar to:

pytest --cassandra-dir=/home/cassandra/cassandra --collect-only -q --execute-upgrade-tests --execute-upgrade-tests-only upgrade_tests/cql_tests.py -k test_set --upgrade-target-version-only

prints:

upgrade_tests/cql_tests.py::cls::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_current_4_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_3_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes3RF3_Upgrade_indev_3_11_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_current_4_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_3_0_x_To_indev_4_0_x::test_set
upgrade_tests/cql_tests.py::TestCQLNodes2RF1_Upgrade_indev_3_11_x_To_indev_4_0_x::test_set

You can see that the upgrade tests were limited to the ones whose target version is in the indev variant and whose family matches 4.0.
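The two preconditions for using the local distribution amount to a simple check on the target version. The following sketch shows that check under stated assumptions: the Version tuple, the uses_local_distribution() function, and the local_family value are hypothetical and only illustrate the rule, not the way dtest detects the local version.

```python
from collections import namedtuple

# Hypothetical reduced version description.
Version = namedtuple("Version", "name family variant")

current_4_0_x = Version("current_4_0_x", "4.0", "current")
indev_3_11_x = Version("indev_3_11_x", "3.11", "indev")
indev_4_0_x = Version("indev_4_0_x", "4.0", "indev")
indev_trunk = Version("indev_trunk", "trunk", "indev")  # placeholder family


def uses_local_distribution(target, local_family):
    # Both preconditions: indev variant and matching version family.
    return target.variant == "indev" and target.family == local_family


paths = [
    (current_4_0_x, indev_4_0_x),
    (current_4_0_x, indev_trunk),
    (indev_3_11_x, indev_4_0_x),
    (indev_4_0_x, indev_trunk),
]

# Assume the checked-out source tree belongs to the 4.0 family.
local_family = "4.0"
kept = [(o.name, t.name) for o, t in paths
        if uses_local_distribution(t, local_family)]
print(kept)
```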

Logging

A couple of common PyTest arguments control what is logged to the file and the console from the Python test code. Those arguments, which start with --log-xxx, are well described in the help message (pytest --help) and in the PyTest documentation, so they will not be discussed further. However, most of the tests start a cluster of Cassandra nodes, and each node generates its own logging information and has its own data directories.

By default, the logs from the nodes are copied to a unique directory created under the logs subdirectory in the root of the dtest project. For example:

(venv) cassandra@b69a382da7cd:~/cassandra-dtest$ ls logs/ -1
1627455923457_test_set
1627456019264_test_set
1627456474949_test_set
1627456527540_test_list
last

The last item is a symbolic link to the directory containing the logs from the last executed test. Each such directory includes the logs from each started node - system, debug, and GC logs, as well as the standard streams captured each time the node was started:

(venv) cassandra@b69a382da7cd:~/cassandra-dtest$ ls logs/last -1
node1.log
node1_debug.log
node1_gc.log
node1_startup-1627456480.3398306-stderr.log
node1_startup-1627456480.3398306-stdout.log
node1_startup-1627456507.2186499-stderr.log
node1_startup-1627456507.2186499-stdout.log
node2.log
node2_debug.log
node2_gc.log
node2_startup-1627456481.10463-stderr.log
node2_startup-1627456481.10463-stdout.log
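When a test starts many nodes, it can help to group those files per node before reading them. The sketch below does that for file names shaped like the listing above; the group_by_node() helper and the assumed naming pattern (a nodeN prefix followed by a dot or underscore) are illustrative only.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "node<N>" followed by "." or "_".
NODE_PREFIX = re.compile(r"(node\d+)[._]")


def group_by_node(filenames):
    groups = defaultdict(list)
    for name in filenames:
        match = NODE_PREFIX.match(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


listing = [
    "node1.log",
    "node1_debug.log",
    "node1_gc.log",
    "node1_startup-1627456480.3398306-stderr.log",
    "node2.log",
    "node2_debug.log",
]

for node, files in group_by_node(listing).items():
    print(node, files)
```

With a real directory, the input could come from os.listdir("logs/last").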

Those log files are not collected if the --delete-logs command-line option is passed to PyTest. The nodes also produce data files, which are sometimes useful to examine when resolving failures. Those files are usually deleted when the test completes, but there are some options to control that behavior:

  • --keep-test-dir - keep the whole CCM directory with data files and logs when the test completes

  • --keep-failed-test-dir - only keep that directory when the test has failed

To find where that directory is for a certain test, you need to grab that information from the test logs. For example, you may add the -s option to the command line and then look for "dtest_setup INFO" messages. For example:

05:56:06,383 dtest_setup INFO cluster ccm directory: /tmp/dtest-0onwvgkr

says that the cluster work directory is /tmp/dtest-0onwvgkr, and all node directories can be found under the test subdirectory:

(venv) cassandra@b69a382da7cd:~/cassandra-dtest$ ls /tmp/dtest-0onwvgkr/test -1
cluster.conf
node1
node2
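Locating the node directories can also be scripted. The sketch below extracts the directory from captured test output and lists the node subdirectories. It assumes the message format shown above and the test cluster subdirectory; the find_ccm_dirs() and node_dirs() helpers are hypothetical.

```python
import re
from pathlib import Path

# Assumed message format, taken from the example log line above.
CCM_DIR_PATTERN = re.compile(r"dtest_setup INFO cluster ccm directory: (\S+)")


def find_ccm_dirs(output):
    # Return every CCM work directory mentioned in the captured output.
    return [Path(found) for found in CCM_DIR_PATTERN.findall(output)]


def node_dirs(ccm_dir, cluster_name="test"):
    # Node directories live under the cluster subdirectory ("test" above).
    cluster_dir = Path(ccm_dir) / cluster_name
    if not cluster_dir.is_dir():
        return []
    return sorted(p for p in cluster_dir.iterdir()
                  if p.is_dir() and p.name.startswith("node"))


sample = "05:56:06,383 dtest_setup INFO cluster ccm directory: /tmp/dtest-0onwvgkr"
for ccm_dir in find_ccm_dirs(sample):
    print(ccm_dir, [p.name for p in node_dirs(ccm_dir)])
```

The directory only survives the test run when --keep-test-dir or --keep-failed-test-dir is used; otherwise node_dirs() returns an empty list.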

Performance Testing

Performance tests for Cassandra are a special breed of tests that are not part of the usual patch contribution process. In fact, many people contribute a lot of patches to Cassandra without ever running performance tests. However, they are important when working on performance improvements; such improvements must be measurable.

Several tools exist for running performance tests. Here are a few to investigate: