Our monthly roundup of key activities and knowledge to keep the community informed.
Apache Cassandra 4.0-beta3, 3.11.9, 3.0.23, and 2.2.19 were released on November 4 and are in the repositories. Please read the release notes and let the community know if you encounter problems.
Join the Cassandra mailing list to stay updated.
Cassandra 4.0 is progressing toward GA. There are 1,390 total tickets, and the remaining tickets represent 5.5% of the total scope. Read the full summary shared to the dev mailing list and take a look at the open tickets that need reviewers.
Cassandra 4.0 will drop support for older distributions: CentOS 5, Debian 4, and Ubuntu 7.10. Learn more.
Updates on Cassandra Enhancement Proposals (CEPs), how to contribute, and other community activities.
The community weighed options to address read inconsistencies for Compact Storage, as noted in ticket CASSANDRA-16217 (committed). The conversation continues in ticket CASSANDRA-16226 with the aim of ensuring there are no huge performance regressions for common queries when you upgrade from 2.x to 3.0 with Compact Storage tables or drop Compact Storage from a table on 3.0+.
CASSANDRA-16222 is a Spark library that can compact and read raw Cassandra SSTables into SparkSQL. By reading the SSTables directly from a snapshot directory, one can achieve high performance with minimal impact on a production cluster. It was used to successfully export a 32TB Cassandra table (46bn CQL rows) to HDFS in Parquet format in around 70 minutes, a 20x improvement on previous solutions.
Apache Cassandra 4.0-beta-1 was released on FreeBSD.
“With these optimized Cassandra clusters in place, it now costs us 71% less to operate clusters and we could store 35x more data than our previous configuration.” - Maulik Pandey
“Cassandra is a distributed wide-column NoSQL datastore and is used at Yelp for both primary and derived data. Yelp’s infrastructure for Cassandra has been deployed on AWS EC2 and ASG (Autoscaling Group) for a while now. Each Cassandra cluster in production spans multiple AWS regions.” - Raghavendra D Prabhu
Do you have a Cassandra case study to share? Email email@example.com.
Users in search of a tool for scheduling backups and performing restores with cloud storage support (archiving to AWS S3, GCS, etc.) should consider Cassandra Medusa.
Apache Cassandra Deployment on OpenEBS and Monitoring on Kubera - Abhishek Raj, MayaData
Lucene Based Indexes on Cassandra - Rahul Singh, Anant
How Netflix Manages Version Upgrades of Cassandra at Scale - Sumanth Pasupuleti, Netflix
Impacts of many tables in a Cassandra data model - Alex Dejanovski, The Last Pickle
Cassandra Upgrade in production : Strategies and Best Practices - Laxmikant Upadhyay, American Express
Apache Cassandra Collections and Tombstones - Jeremy Hanna
Spark + Cassandra - Javier Ramos, ITNext
How to install the Apache Cassandra NoSQL database server on Ubuntu 20.04 - Jack Wallen, TechRepublic
How to deploy Cassandra on Openshift and open it up to remote connections - Sindhu Murugavel