Apache Cassandra is an open-source, distributed NoSQL database. It presents a partitioned wide-column storage model with eventually consistent semantics.
Apache Cassandra was initially designed at Facebook using a staged event-driven architecture (SEDA) to combine Amazon’s Dynamo distributed storage and replication techniques with Google’s Bigtable data and storage engine model. Dynamo and Bigtable were both developed to meet emerging requirements for scalable, reliable, and highly available storage systems, but each had areas that could be improved.
Cassandra was designed as a best-in-class combination of both systems to meet emerging storage requirements that were large in scale, both in data footprint and in query volume. As applications began to require full global replication and always-available, low-latency reads and writes, a new kind of database model became necessary, because the relational database systems of the time struggled to meet the requirements of global-scale applications.
Systems like Cassandra are designed for these challenges and seek the following design objectives:
Cassandra provides the Cassandra Query Language (CQL), an SQL-like language, to create and update database schema and access data. CQL allows users to organize data within a cluster of Cassandra nodes using:
CQL supports numerous advanced features over a partitioned dataset such as:
Cassandra explicitly chooses not to implement operations that require cross-partition coordination, as they are typically slow and make highly available global semantics hard to provide. For example, Cassandra does not support:
Apache Cassandra configuration settings are stored in the cassandra.yaml file, which can be edited by hand or with the aid of configuration management tools. Some settings can be changed live through an online interface, but others require a restart of the database to take effect.
Cassandra provides tools for managing a cluster. The nodetool command interacts with Cassandra’s live control interface, allowing runtime manipulation of many settings from cassandra.yaml. The auditlogviewer is used to view the audit logs. The fqltool is used to view, replay and compare full query logs. The auditlogviewer and fqltool are new tools in Apache Cassandra 4.0.
In addition, Cassandra supports atomic snapshot functionality out of the box, which presents a point-in-time snapshot of Cassandra’s data for easy integration with many backup tools. Cassandra also supports incremental backups, where data can be backed up as it is written.
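To make the snapshot idea concrete, here is a minimal Python sketch of the general technique that Cassandra's snapshots rely on: hard-linking immutable data files into a separate directory, so that the snapshot keeps its contents even after the live files are removed or new ones are written. This is a toy illustration only, not Cassandra's implementation. The function names, the file names (data-1.db and so on), the snapshot tag, and the directory layout are all hypothetical, and the sketch does not attempt to model atomicity or incremental backups.

```python
import os
import tempfile


def take_snapshot(data_dir: str, tag: str) -> str:
    """Hard-link every regular file in data_dir into snapshots/<tag>.

    Returns the path of the snapshot directory. Hypothetical helper,
    not a Cassandra API.
    """
    snap_dir = os.path.join(data_dir, "snapshots", tag)
    os.makedirs(snap_dir)
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):  # skips the snapshots/ directory itself
            # A hard link is a second name for the same bytes; nothing is copied.
            os.link(src, os.path.join(snap_dir, name))
    return snap_dir


def demo() -> dict:
    """Take a snapshot, change the live directory, and read the snapshot back."""
    with tempfile.TemporaryDirectory() as data_dir:
        # Hypothetical immutable data files standing in for on-disk tables.
        for name, body in [("data-1.db", "alpha"), ("data-2.db", "beta")]:
            with open(os.path.join(data_dir, name), "w") as f:
                f.write(body)

        snap_dir = take_snapshot(data_dir, "before-upgrade")

        # Later activity in the live directory: one file goes away, one appears.
        os.remove(os.path.join(data_dir, "data-1.db"))
        with open(os.path.join(data_dir, "data-3.db"), "w") as f:
            f.write("gamma")

        # The snapshot still holds exactly the files that existed when it was taken.
        contents = {}
        for name in sorted(os.listdir(snap_dir)):
            with open(os.path.join(snap_dir, name)) as f:
                contents[name] = f.read()
        return contents


if __name__ == "__main__":
    print(demo())
```

Because the linked files are never modified in place, a backup tool can copy the snapshot directory at its own pace without seeing later writes. The sketch assumes a filesystem that supports hard links.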
Apache Cassandra 4.0 has added several new features, including virtual tables, transient replication, audit logging, full query logging, and support for Java 11. Two of these features are experimental: transient replication and Java 11 support.