Apache Cassandra | Apache Cassandra Documentation

Our monthly roundup of key activities and knowledge to keep the community informed.

Release Notes

Released

Apache #Cassandra 4.0-beta3, 3.11.9, 3.0.23, and 2.2.19 were released on November 4 and are in the repositories. Please pay attention to release notes and let the community know if you encounter problems.

Join the Cassandra mailing list to stay updated.

Changed

Cassandra 4.0 is progressing toward GA. There are 1,390 total tickets and remaining tickets represent 5.5% of total scope. Read the full summary shared to the dev mailing list and take a look at the open tickets that need reviewers.

Cassandra 4.0 will be dropping support for older distributions of CentOS 5, Debian 4, and Ubuntu 7.10. Learn more.

Community Notes

Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.

Added

The community weighed options to address reads inconsistencies for Compact Storage as noted in ticket CASSANDRA-16217 (committed). The conversation continues in ticket CASSANDRA-16226 with the aim of ensuring there are no huge performance regressions for common queries when you upgrade from 2.x to 3.0 with Compact Storage tables or drop it from a table on 3.0+.

Added

CASSANDRA-16222 is a Spark library that can compact and read raw Cassandra SSTables into SparkSQL. By reading the sstables directly from a snapshot directory, one can achieve high performance with minimal impact to a production cluster. It was used to successfully export a 32TB Cassandra table (46bn CQL rows) to HDFS in Parquet format in around 70 minutes, a 20x improvement on previous solutions.

Changed

Great news for CEP-2: Kubernetes Operator, the community has agreed to create a community-based operator by merging the cass-operator and CassKop. The work being done can be viewed on GitHub here.

Released

The Reaper community announced v2.1 of its tool that schedules and orchestrates repairs of Apache Cassandra clusters. Read the docs.

Released

Apache Cassandra 4.0-beta-1 was released on FreeBSD.

User Space

Netflix

“With these optimized Cassandra clusters in place, it now costs us 71% less to operate clusters and we could store 35x more data than our previous configuration.” - Maulik Pandey

Yelp

“Cassandra is a distributed wide-column NoSQL datastore and is used at Yelp for both primary and derived data. Yelp’s infrastructure for Cassandra has been deployed on AWS EC2 and ASG (Autoscaling Group) for a while now. Each Cassandra cluster in production spans multiple AWS regions.” - Raghavendra D Prabhu

Do you have a Cassandra case study to share? Email cassandra@constantia.io.

In the News

DevPro Journal: What’s included in the Cassandra 4.0 Release?

JAXenter: Moving to cloud-native applications and data with Kubernetes and Apache Cassandra

DZone: Improving Apache Cassandra’s Front Door and Backpressure

ApacheCon: Building Apache Cassandra 4.0: behind the scenes

Cassandra Tutorials & More

Users in search of a tool for scheduling backups and performing restores with cloud storage support (archiving to AWS S3, GCS, etc) should consider Cassandra Medusa.

Apache Cassandra Deployment on OpenEBS and Monitoring on Kubera - Abhishek Raj, MayaData

Lucene Based Indexes on Cassandra - Rahul Singh, Anant

How Netflix Manages Version Upgrades of Cassandra at Scale - Sumanth Pasupuleti, Netflix

Impacts of many tables in a Cassandra data model - Alex Dejanovski, The Last Pickle

Cassandra Upgrade in production : Strategies and Best Practices - Laxmikant Upadhyay, American Express

Apache Cassandra Collections and Tombstones - Jeremy Hanna

Spark + Cassandra - Javier Ramos, ITNext

How to install the Apache Cassandra NoSQL database server on Ubuntu 20.04 - Jack Wallen, TechRepublic

How to deploy Cassandra on Openshift and open it up to remote connections - Sindhu Murugavel