Data Types
CQL is a typed language and supports a rich set of data types, including native types, collection types, user-defined types, tuple types, and custom types:
cql_type::= native_type| collection_type| user_defined_type | tuple_type | custom_type
Native types
The native types supported by CQL are:
native_type::= ASCII | BIGINT | BLOB | BOOLEAN | COUNTER | DATE
| DECIMAL | DOUBLE | DURATION | FLOAT | INET | INT |
SMALLINT | TEXT | TIME | TIMESTAMP | TIMEUUID | TINYINT |
UUID | VARCHAR | VARINT | VECTOR
The following table gives additional information on the native data types, and on which kind of constants each type supports:
Type | Constants supported | Description |
---|---|---|
|
|
ASCII character string |
|
|
64-bit signed long |
|
|
Arbitrary bytes (no validation) |
|
|
Either |
|
|
Counter column (64-bit signed value). See |
|
|
A date (with no corresponding time value). See |
|
|
Variable-precision decimal |
|
|
64-bit IEEE-754 floating point |
|
|
A duration with nanosecond precision. See |
|
|
32-bit IEEE-754 floating point |
|
|
An IP address, either IPv4 (4 bytes long) or IPv6 (16 bytes long). Note
that there is no |
|
|
32-bit signed int |
|
|
16-bit signed int |
|
|
UTF8 encoded string |
|
|
A time (with no corresponding date value) with nanosecond precision. See
|
|
|
A timestamp (date and time) with millisecond precision. See |
|
|
Version 1 UUID,
generally used as a “conflict-free” timestamp. Also see |
|
|
8-bit signed int |
|
|
A UUID (of any version) |
|
|
UTF8 encoded string |
|
|
Arbitrary-precision integer |
|
|
A fixed length non-null, flattened array of float values CASSANDRA-18504 added this data type to Cassandra 5.0. |
Counters
The counter
type is used to define counter columns. A counter column
is a column whose value is a 64-bit signed integer and on which 2
operations are supported: incrementing and decrementing (see the
UPDATE statement for syntax).
Note that the value of a counter cannot
be set: a counter does not exist until first incremented/decremented,
and that first increment/decrement is made as if the prior value was 0.
Counters have a number of important limitations:
-
They cannot be used for columns part of the
PRIMARY KEY
of a table. -
A table that contains a counter can only contain counters. In other words, either all the columns of a table outside the
PRIMARY KEY
have thecounter
type, or none of them have it. -
Counters do not support expiration.
-
The deletion of counters is supported, but is only guaranteed to work the first time you delete a counter. In other words, you should not re-update a counter that you have deleted (if you do, proper behavior is not guaranteed).
-
Counter updates are, by nature, not idemptotent. An important consequence is that if a counter update fails unexpectedly (timeout or loss of connection to the coordinator node), the client has no way to know if the update has been applied or not. In particular, replaying the update may or may not lead to an over count.
Working with timestamps
Values of the timestamp
type are encoded as 64-bit signed integers
representing a number of milliseconds since the standard base time known
as the epoch: January 1 1970 at
00:00:00 GMT.
Timestamps can be input in CQL either using their value as an integer
,
or using a string
that represents an
ISO 8601 date. For instance, all
of the values below are valid timestamp
values for Mar 2, 2011, at
04:05:00 AM, GMT:
-
1299038700000
-
'2011-02-03 04:05+0000'
-
'2011-02-03 04:05:00+0000'
-
'2011-02-03 04:05:00.000+0000'
-
'2011-02-03T04:05+0000'
-
'2011-02-03T04:05:00+0000'
-
'2011-02-03T04:05:00.000+0000'
The +0000
above is an RFC 822 4-digit time zone specification; +0000
refers to GMT. US Pacific Standard Time is -0800
. The time zone may be
omitted if desired ('2011-02-03 04:05:00'
), and if so, the date will
be interpreted as being in the time zone under which the coordinating
Cassandra node is configured. There are however difficulties inherent in
relying on the time zone configuration being as expected, so it is
recommended that the time zone always be specified for timestamps when
feasible.
The time of day may also be omitted ('2011-02-03'
or
'2011-02-03+0000'
), in which case the time of day will default to
00:00:00 in the specified or default time zone. However, if only the
date part is relevant, consider using the date type.
Date type
Values of the date
type are encoded as 32-bit unsigned integers
representing a number of days with “the epoch” at the center of the
range (2^31). Epoch is January 1st, 1970
For timestamps, a date can be input either as an
integer
or using a date string
. In the later case, the format should
be yyyy-mm-dd
(so '2011-02-03'
for instance).
Time type
Values of the time
type are encoded as 64-bit signed integers
representing the number of nanoseconds since midnight.
For timestamps, a time can be input either as an
integer
or using a string
representing the time. In the later case,
the format should be hh:mm:ss[.fffffffff]
(where the sub-second
precision is optional and if provided, can be less than the nanosecond).
So for instance, the following are valid inputs for a time:
-
'08:12:54'
-
'08:12:54.123'
-
'08:12:54.123456'
-
'08:12:54.123456789'
Duration type
Values of the duration
type are encoded as 3 signed integer of
variable lengths. The first integer represents the number of months, the
second the number of days and the third the number of nanoseconds. This
is due to the fact that the number of days in a month can change, and a
day can have 23 or 25 hours depending on the daylight saving.
Internally, the number of months and days are decoded as 32 bits
integers whereas the number of nanoseconds is decoded as a 64 bits
integer.
A duration can be input as:
-
(quantity unit)+
like12h30m
where the unit can be:-
y
: years (12 months) -
mo
: months (1 month) -
w
: weeks (7 days) -
d
: days (1 day) -
h
: hours (3,600,000,000,000 nanoseconds) -
m
: minutes (60,000,000,000 nanoseconds) -
s
: seconds (1,000,000,000 nanoseconds) -
ms
: milliseconds (1,000,000 nanoseconds) -
us
orµs
: microseconds (1000 nanoseconds) -
ns
: nanoseconds (1 nanosecond)
-
-
ISO 8601 format:
P[n]Y[n]M[n]DT[n]H[n]M[n]S or P[n]W
-
ISO 8601 alternative format:
P[YYYY]-[MM]-[DD]T[hh]:[mm]:[ss]
For example:
INSERT INTO RiderResults (rider, race, result)
VALUES ('Christopher Froome', 'Tour de France', 89h4m48s);
INSERT INTO RiderResults (rider, race, result)
VALUES ('BARDET Romain', 'Tour de France', PT89H8M53S);
INSERT INTO RiderResults (rider, race, result)
VALUES ('QUINTANA Nairo', 'Tour de France', P0000-00-00T89:09:09);
Duration columns cannot be used in a table’s PRIMARY KEY
. This
limitation is due to the fact that durations cannot be ordered. It is
effectively not possible to know if 1mo
is greater than 29d
without
a date context.
A 1d
duration is not equal to a 24h
one as the duration type has
been created to be able to support daylight saving.
Collections
CQL supports three kinds of collections: maps
, sets
and lists
. The
types of those collections is defined by:
collection_type::= MAP '<' cql_type',' cql_type'>'
| SET '<' cql_type '>'
| LIST '<' cql_type'>'
and their values can be inputd using collection literals:
collection_literal::= map_literal | set_literal | list_literal
map_literal::= '\{' [ term ':' term (',' term : term)* ] '}'
set_literal::= '\{' [ term (',' term)* ] '}'
list_literal::= '[' [ term (',' term)* ] ']'
Note however that neither bind_marker
nor NULL
are supported inside
collection literals.
Noteworthy characteristics
Collections are meant for storing/denormalizing relatively small amount of data. They work well for things like “the phone numbers of a given user”, “labels applied to an email”, etc. But when items are expected to grow unbounded (“all messages sent by a user”, “events registered by a sensor”…), then collections are not appropriate and a specific table (with clustering columns) should be used. Concretely, (non-frozen) collections have the following noteworthy characteristics and limitations:
-
Individual collections are not indexed internally. Which means that even to access a single element of a collection, the whole collection has to be read (and reading one is not paged internally).
-
While insertion operations on sets and maps never incur a read-before-write internally, some operations on lists do. Further, some lists operations are not idempotent by nature (see the section on lists below for details), making their retry in case of timeout problematic. It is thus advised to prefer sets over lists when possible.
Please note that while some of those limitations may or may not be removed/improved upon in the future, it is a anti-pattern to use a (single) collection to store large amounts of data.
Maps
A map
is a (sorted) set of key-value pairs, where keys are unique and
the map is sorted by its keys. You can define and insert a map with:
CREATE TABLE users (
id text PRIMARY KEY,
name text,
favs map<text, text> // A map of text keys, and text values
);
INSERT INTO users (id, name, favs)
VALUES ('jsmith', 'John Smith', { 'fruit' : 'Apple', 'band' : 'Beatles' });
// Replace the existing map entirely.
UPDATE users SET favs = { 'fruit' : 'Banana' } WHERE id = 'jsmith';
Further, maps support:
-
Updating or inserting one or more elements:
UPDATE users SET favs['author'] = 'Ed Poe' WHERE id = 'jsmith'; UPDATE users SET favs = favs + { 'movie' : 'Cassablanca', 'band' : 'ZZ Top' } WHERE id = 'jsmith';
-
Removing one or more element (if an element doesn’t exist, removing it is a no-op but no error is thrown):
DELETE favs['author'] FROM users WHERE id = 'jsmith'; UPDATE users SET favs = favs - { 'movie', 'band'} WHERE id = 'jsmith';
Note that for removing multiple elements in a
map
, you remove from it aset
of keys.
Lastly, TTLs are allowed for both INSERT
and UPDATE
, but in both
cases the TTL set only apply to the newly inserted/updated elements. In
other words:
UPDATE users USING TTL 10 SET favs['color'] = 'green' WHERE id = 'jsmith';
will only apply the TTL to the { 'color' : 'green' }
record, the rest
of the map remaining unaffected.
Sets
A set
is a (sorted) collection of unique values. You can define and
insert a set with:
CREATE TABLE images (
name text PRIMARY KEY,
owner text,
tags set<text> // A set of text values
);
INSERT INTO images (name, owner, tags)
VALUES ('cat.jpg', 'jsmith', { 'pet', 'cute' });
// Replace the existing set entirely
UPDATE images SET tags = { 'kitten', 'cat', 'lol' } WHERE name = 'cat.jpg';
Further, sets support:
-
Adding one or multiple elements (as this is a set, inserting an already existing element is a no-op):
UPDATE images SET tags = tags + { 'gray', 'cuddly' } WHERE name = 'cat.jpg';
-
Removing one or multiple elements (if an element doesn’t exist, removing it is a no-op but no error is thrown):
UPDATE images SET tags = tags - { 'cat' } WHERE name = 'cat.jpg';
Lastly, for sets, TTLs are only applied to newly inserted values.
Lists
As mentioned above and further discussed at the end of this section, lists have limitations and specific performance considerations that you should take into account before using them. In general, if you can use a set instead of list, always prefer a set. |
A list
is a (sorted) collection of non-unique values where
elements are ordered by their position in the list. You can define and
insert a list with:
CREATE TABLE plays (
id text PRIMARY KEY,
game text,
players int,
scores list<int> // A list of integers
)
INSERT INTO plays (id, game, players, scores)
VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
// Replace the existing list entirely
UPDATE plays SET scores = [ 3, 9, 4] WHERE id = '123-afde';
Further, lists support:
-
Appending and prepending values to a list:
UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = '123-afde'; UPDATE plays SET players = 6, scores = [ 3 ] + scores WHERE id = '123-afde';
Warning
The append and prepend operations are not idempotent by nature. So in particular, if one of these operations times out, then retrying the operation is not safe and it may (or may not) lead to appending/prepending the value twice. |
-
Setting the value at a particular position in a list that has a pre-existing element for that position. An error will be thrown if the list does not have the position:
UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';
-
Removing an element by its position in the list that has a pre-existing element for that position. An error will be thrown if the list does not have the position. Further, as the operation removes an element from the list, the list size will decrease by one element, shifting the position of all the following elements one forward:
DELETE scores[1] FROM plays WHERE id = '123-afde';
-
Deleting all the occurrences of particular values in the list (if a particular element doesn’t occur at all in the list, it is simply ignored and no error is thrown):
UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde';
Warning
Setting and removing an element by position and removing occurences of particular values incur an internal read-before-write. These operations will run slowly and use more resources than usual updates (with the exclusion of conditional write that have their own cost). |
Lastly, for lists, TTLs only apply to newly inserted values.
Working with vectors
Vectors are fixed-size sequences of non-null values of a certain data type. They use the same literals as lists.
You can define, insert and update a vector with:
CREATE TABLE plays (
id text PRIMARY KEY,
game text,
players int,
scores vector<int, 3> // A vector of 3 integers
)
INSERT INTO plays (id, game, players, scores)
VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
// Replace the existing vector entirely
UPDATE plays SET scores = [ 3, 9, 4] WHERE id = '123-afde';
Note that it isn’t possible to change the individual values of a vector, and it isn’t possible to select individual elements of a vector.
User-Defined Types (UDTs)
CQL support the definition of user-defined types (UDTs). Such a
type can be created, modified and removed using the
create_type_statement
, alter_type_statement
and
drop_type_statement
described below. But once created, a UDT is simply
referred to by its name:
user_defined_type::= udt_name
udt_name::= [ keyspace_name '.' ] identifier
Creating a UDT
Creating a new user-defined type is done using a CREATE TYPE
statement
defined by:
create_type_statement::= CREATE TYPE [ IF NOT EXISTS ] udt_name
'(' field_definition ( ',' field_definition)* ')'
field_definition::= identifier cql_type
A UDT has a name (used to declared columns of that type) and is a set of named and typed fields. Fields name can be any type, including collections or other UDT. For instance:
CREATE TYPE phone (
country_code int,
number text,
);
CREATE TYPE address (
street text,
city text,
zip text,
phones map<text, phone>
);
CREATE TABLE user (
name text PRIMARY KEY,
addresses map<text, frozen<address>>
);
Things to keep in mind about UDTs:
-
Attempting to create an already existing type will result in an error unless the
IF NOT EXISTS
option is used. If it is used, the statement will be a no-op if the type already exists. -
A type is intrinsically bound to the keyspace in which it is created, and can only be used in that keyspace. At creation, if the type name is prefixed by a keyspace name, it is created in that keyspace. Otherwise, it is created in the current keyspace.
-
As of Cassandra , UDT have to be frozen in most cases, hence the
frozen<address>
in the table definition above.
UDT literals
Once a user-defined type has been created, value can be input using a UDT literal:
udt_literal::= '{' identifier ':' term ( ',' identifier ':' term)* '}'
In other words, a UDT literal is like a map` literal but its keys are the names of the fields of the type. For instance, one could insert into the table define in the previous section using:
INSERT INTO user (name, addresses)
VALUES ('z3 Pr3z1den7', {
'home' : {
street: '1600 Pennsylvania Ave NW',
city: 'Washington',
zip: '20500',
phones: { 'cell' : { country_code: 1, number: '202 456-1111' },
'landline' : { country_code: 1, number: '...' } }
},
'work' : {
street: '1600 Pennsylvania Ave NW',
city: 'Washington',
zip: '20500',
phones: { 'fax' : { country_code: 1, number: '...' } }
}
}
);
To be valid, a UDT literal can only include fields defined by the
type it is a literal of, but it can omit some fields (these will be set to NULL
).
Altering a UDT
An existing user-defined type can be modified using an ALTER TYPE
statement:
alter_type_statement::= ALTER TYPE [ IF EXISTS ] udt_name alter_type_modification
alter_type_modification::= ADD [ IF NOT EXISTS ] field_definition
| RENAME [ IF EXISTS ] identifier TO identifier (AND identifier TO identifier )*
If the type does not exist, the statement will return an error, unless IF EXISTS
is used in which case the operation is a no-op.
You can:
-
Add a new field to the type (
ALTER TYPE address ADD country text
). That new field will beNULL
for any values of the type created before the addition. If the new field exists, the statement will return an error, unlessIF NOT EXISTS
is used in which case the operation is a no-op. -
Rename the fields of the type. If the field(s) does not exist, the statement will return an error, unless
IF EXISTS
is used in which case the operation is a no-op.
ALTER TYPE address RENAME zip TO zipcode;
Dropping a UDT
You can drop an existing user-defined type using a DROP TYPE
statement:
drop_type_statement::= DROP TYPE [ IF EXISTS ] udt_name
Dropping a type results in the immediate, irreversible removal of that type. However, attempting to drop a type that is still in use by another type, table or function will result in an error.
If the type dropped does not exist, an error will be returned unless
IF EXISTS
is used, in which case the operation is a no-op.
Tuples
CQL also support tuples and tuple types (where the elements can be of different types). Functionally, tuples can be though as anonymous UDT with anonymous fields. Tuple types and tuple literals are defined by:
tuple_type::= TUPLE '<' cql_type( ',' cql_type)* '>'
tuple_literal::= '(' term( ',' term )* ')'
and can be created:
CREATE TABLE durations (
event text,
duration tuple<int, text>,
);
INSERT INTO durations (event, duration) VALUES ('ev1', (3, 'hours'));
Unlike other composed types, like collections and UDTs, a tuple is always
frozen <frozen>
(without the need of the frozen
keyword)
and it is not possible to update only some elements of a tuple (without
updating the whole tuple). Also, a tuple literal should always have the
same number of value than declared in the type it is a tuple of (some of
those values can be null but they need to be explicitly declared as so).
Custom Types
Custom types exists mostly for backward compatibility purposes and their usage is discouraged. Their usage is complex, not user friendly and the other provided types, particularly user-defined types, should almost always be enough. |
A custom type is defined by:
custom_type::= string
A custom type is a string
that contains the name of Java class that
extends the server side AbstractType
class and that can be loaded by
Cassandra (it should thus be in the CLASSPATH
of every node running
Cassandra). That class will define what values are valid for the type
and how the time sorts when used for a clustering column. For any other
purpose, a value of a custom type is the same than that of a blob
, and
can in particular be input using the blob
literal syntax.