Support for arithmetic operators was added to CQL in Cassandra 4.0 and this blog post describes the important things to know about CQL arithmetics operators, and how we addressed the non-trivial challenges around return types and types inference.
Operations on Numbers
Cassandra supports five arithmetic operators on numbers:
Operator |
Description |
+ |
addition |
- |
negates operand/subtraction |
* |
multiplication |
/ |
division |
% |
returns the remainder of a division |
Those operators can be used in the SELECT
INSERT
, UPDATE,
and DELETE
statements to perform some computations. For example, if you wish to know the percentage of completion of the currently running compactions, you can query the sstable_tasks
system view using the following query:
SELECT keyspace_name,
table_name,
Task_id,
(progress * 100) / total AS percentage_of_completion
FROM system_views.sstable_tasks
WHERE kind = 'compaction'
ALLOW FILTERING;
The Important Role of Return Types
An important thing to take into account when using operators is the return type of the operation. In Cassandra the return type is based on the type of the operands and can be determined using the following table:
Left/Right | tinyint | smallint | int | bigint | counter | float | double | varint | decimal |
---|---|---|---|---|---|---|---|---|---|
tinyint |
tinyint |
smallint |
int |
bigint |
bigint |
float |
double |
varint |
decimal |
smallint |
smallint |
smallint |
int |
bigint |
bigint |
float |
double |
varint |
decimal |
int |
int |
int |
int |
bigint |
bigint |
float |
double |
varint |
decimal |
bigint |
bigint |
bigint |
bigint |
bigint |
bigint |
double |
double |
varint |
decimal |
counter |
bigint |
bigint |
bigint |
bigint |
bigint |
double |
double |
varint |
decimal |
float |
float |
float |
float |
double |
double |
float |
double |
decimal |
decimal |
double |
double |
double |
double |
double |
double |
double |
double |
decimal |
decimal |
varint |
varint |
varint |
varint |
varint |
varint |
decimal |
decimal |
varint |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
decimal |
Based on that table you can see that if the percentage_of_completion
had been computed as (progress / total) * 100
instead of as (progress * 100) / total
the output will always have been a bigint
equal to 0
or 1
. As progress
and total
are both bigints
the output of the division would have been of type bigint
and equal to 0 if progress
was smaller than total
.
Type Inference
Of course, literals also play a role in the selection of the return type. Cassandra will try to infer the type of the literals to determine how the operation must be performed.
In an expression like percentage_of_completion
, the only information that C* has to infer the type is its value.
If the literal is an integer, C* will look at it and consider it as an int
, a bigint
or a varint
if the type can hold this value. For example, in our computation, 100 will be considered as an int
because the value can fit in a 32-bit signed integer.
For floats, C* will consider them as either doubles
or decimals
.
If the percentage_of_completion
had been computed as (progress * 100.0) / total
, 100.0 would have been considered as a double
and the percentage_of_completion
type would also have been a double
.
Now, in some cases, Cassandra knows the return type of the expression and can use it to infer the type of the literals.
If for example we have a table like:
CREATE TABLE myTable (pk int, c tinyint, v text, PRIMARY KEY(pk, c));
Cassandra will be able to prepare the following query without problem.
SELECT * FROM myTable WHERE pk = 2 AND c = 1 + ?;
As the c
column is of type tinyint
, C* can guess that the binding parameter and the literal would be of the tinyint
type as only the sum of 2 tinyints
can return a tinyint
.
But if you try to prepare the following query:
SELECT * FROM myTable WHERE pk = ? + 1 AND c = 1;
Cassandra will return an error saying:
Ambiguous '+' operation with args ? and 1: use type casts to disambiguate
The problem here is that the binding parameter could have 3 different types (tinyint
, smallint
or int
) and C* does not know which one the user intend to use.
Hints and Casting to the Rescue
To solve type inference problems related to binding parameters Cassandra has support for type hints. A type hint is a way to explicitly tell Cassandra what will be the type of the parameter.
For example in our previous example if the binding parameter will be an int
, the query can be modified as follow:
SELECT * FROM myTable WHERE pk = (int) ? + 1 AND c = 1;
Type hints can also be used to control the type of a literal. If we go back to the sstable_tasks query. We could have changed the type of the percentage_of_completion
to decimal
by using a decimal
type hint in front of the literal:
(progress * (decimal) 100) / total AS percentage_of_completion
Of course, the same result could have been reached by casting progress into a decimal:
(CAST(progress AS decimal) * 100) / total AS percentage_of_completion
or
(CAST(progress AS decimal) / total) * 100 AS percentage_of_completion
as the operation between a decimal
and a bigint
will result in a decimal.
Operator Precedence
*
, /
and %
operators have a higher precedence level than +
and -
operator. By consequence, they will be evaluated before the other two. If two operators in an expression have the same precedence level, they will be evaluated left to right based on their positions in the expression.
Parentheses can be used to modify the order in which the operations must be performed within an expression. Everything within parentheses will be evaluated first to yield a single value before that value can be used by any operator outside the parentheses.
For example, it is possible to compute the percentage remaining before completion for the compactions as:
(total - progress) * 100 / total AS remaining_percentage
Operations on Timestamps and Dates
In version 4.0, CQL also supports the addition or subtraction of durations from timestamps
and dates
.
If you are working with time-series data and use the following table to store your sensor data:
CREATE TABLE sensor_data (
sensor text,
day date,
ts timeuuid,
value double,
primary key((sensor, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
You can use the following query to retrieve some statistics on the data from the previous day:
SELECT sensor, day, min(value), max(value), avg(value)
FROM sensor_data
WHERE sensor = ? AND day = currentdate() - 1d;
You can express durations as (quantity unit)+
like 12h30m
where the unit can be:
-
y
: years (12` months) -
mo
: months (1 month) -
w
: weeks (7 days) -
d
: days (1 day) -
h
: hours (3,600,000,000,000 nanoseconds) -
m
: minutes (60,000,000,000 nanoseconds) -
s
: seconds (1,000,000,000 nanoseconds) -
ms
: milliseconds (1,000,000 nanoseconds) -
us
orµs
: microseconds (1000 nanoseconds) -
ns
: nanoseconds (1 nanosecond)
What About Daylight Savings and Leap Seconds?
Internally the timestamp and date types store information in UTC time. As UTC does not change with a change of seasons arithmetic operations on timestamps and dates are safe and will always return the expected results. However, be aware that the Java libraries used internally by Cassandra, ignore leap seconds.