CREATE INDEX
Defines a new secondary index (2i) or storage-attached index (SAI) for a single column of a table.
Apache Cassandra supports creating a secondary index or storage-attached index on most columns, including the partition and clustering columns of a PRIMARY KEY, collections, and static columns.
For maps, you can index using the key, value, or entries (a key:value pair).
All column data types except the following are supported for SAI indexes:
- counter
- non-frozen user-defined type (UDT)
See also: CREATE CUSTOM INDEX for Storage-Attached Indexes (SAI), DROP INDEX
Syntax
BNF definition:
index_name::= re('[a-zA-Z_0-9]+')
CREATE INDEX [ IF NOT EXISTS ] <index_name>
  ON [<keyspace_name>.]<table_name>
  ( [ ( KEYS | FULL | ENTRIES ) ] <column_name>)
//  | [ (KEYS(<map_name>)) ]
//  | [ (VALUES(<map_name>)) ]
//  | [ (ENTRIES(<map_name>)) ]
  [USING 'sai']
  [ WITH OPTIONS = { <option_map> } ];
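Syntax conventions | Description |
---|---|
UPPERCASE | Literal keyword. |
Lowercase | Not literal. |
< > | Variable value. Replace with a user-defined value. |
[ ] | Optional. Square brackets ([ ]) surround optional command arguments. Do not type the square brackets. |
( ) | Group. Parentheses (( )) identify a group to choose from. Do not type the parentheses. |
\| | Or. A vertical bar (\|) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
... | Repeatable. An ellipsis (...) indicates that you can repeat the syntax element as often as required. |
' ' | Single quotation (') marks must surround literal strings in CQL statements. |
{ } | Map collection. Braces ({ }) enclose map collections or key-value pairs. A colon separates the key and the value. |
< , > | Set, list, map, or tuple. Angle brackets (< >) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
; | End CQL statement. A semicolon (;) terminates all CQL statements. |
-- | Separate the command line options from the command arguments with two hyphens (--). |
' ' | Search CQL only: Single quotation marks (') surround an entire XML schema declaration. |
@xml_entity='xml_entity_type' | Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |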
Required parameters
Parameter | Description |
---|---|
table_name | Name of the table to index. |
column_name | Name of the column to index. SAI allows only alphanumeric characters and underscores in names, and returns an error if the name contains other characters. |
Optional parameters
Parameter | Description |
---|---|
index_name | Name of the index. Enclose in quotes to use special characters or preserve capitalization. If no name is specified, Apache Cassandra generates a name for the index. |
keyspace_name | Name of the keyspace that contains the table to index. If no name is specified, the current keyspace is used. |
map_name | Used with collections, identifier of the map column to index. |
option_map | Define options in JSON simple format. |
|
|
Ignore case in matching string values. Default: |
|
|
When set to When implementations keep strings in a normalized form, equivalent strings have a unique binary representation. See Unicode Standard Annex #15, Unicode Normalization Forms. Default: |
|
|
When set to |
|
|
Vector search relies on computing the similarity or distance between vectors to identify relevant matches.
The similarity function is used to compute the similarity between two vectors.
Valid options are: |
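As a hedged sketch of how an option map might be passed, the statements below assume the SAI option names case_sensitive, normalize, ascii, and similarity_function as documented for Apache Cassandra 5.0, along with two hypothetical tables, cycling.cyclist_semi_pro and cycling.comments_vs. Verify the option names, their accepted values, and vector support against your Cassandra version before relying on them.

```cql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS cycling.cyclist_semi_pro (
  id int PRIMARY KEY,
  firstname text,
  lastname text
);

-- Assumed option names: case_sensitive, normalize, ascii.
-- Option values are passed as quoted strings inside the option map.
CREATE INDEX IF NOT EXISTS lastname_sai_idx
  ON cycling.cyclist_semi_pro (lastname)
  USING 'sai'
  WITH OPTIONS = { 'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true' };

-- Hypothetical table with a vector column (needs a version with vector search).
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  id uuid PRIMARY KEY,
  comment text,
  comment_vector vector<float, 5>
);

-- Assumed option name: similarity_function.
CREATE INDEX IF NOT EXISTS comment_vector_idx
  ON cycling.comments_vs (comment_vector)
  USING 'sai'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```

The string options only make sense on text columns, and the similarity function only on vector columns, which is why the sketch uses two separate indexes.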
Usage notes
If the column already contains data, it is indexed during the execution of this statement. After an index has been created, it is automatically updated when data in the column changes.
Indexing with the CREATE INDEX command can impact performance.
Before creating an index, understand when an index helps and when it does not.
Restriction: Indexing counter columns is not supported.
SAI indexes
You can define an SAI index on one of the columns in a table’s composite partition key, that is, a partition key composed of multiple columns. If you need to query based on one of those columns, an SAI index is a helpful option. You can also define an SAI index on each column in a composite partition key, if needed.
Defining one or more SAI indexes based on any column in a database table allows queries to use the indexed column to filter results.
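A minimal sketch of the composite partition key case, assuming the cycling.rank_by_year_and_name table from the Examples section, whose partition key is (race_year, race_name). The index names race_year_idx and race_name_idx are hypothetical.

```cql
-- One SAI index per partition key column (index names are illustrative).
CREATE INDEX IF NOT EXISTS race_year_idx
  ON cycling.rank_by_year_and_name (race_year)
  USING 'sai';

CREATE INDEX IF NOT EXISTS race_name_idx
  ON cycling.rank_by_year_and_name (race_name)
  USING 'sai';

-- Filter on a single partition key column instead of the full partition key.
SELECT *
FROM cycling.rank_by_year_and_name
WHERE race_year = 2014;
```

Without such an index, a query that restricts only part of the partition key would normally be rejected unless ALLOW FILTERING is added.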
SAI query operators
SAI supports the following query operators for tables with SAI indexes:
- Numerics: =, <, >, <=, >=, AND, OR, IN
- Strings: =, CONTAINS, CONTAINS KEY, AND, OR, IN
SAI does not support the following query operators for tables with SAI indexes:
- Strings or Numerics: LIKE
See the SAI section.
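The following sketch illustrates a few of the supported operators. It assumes a hypothetical table, cycling.cyclist_points, with SAI indexes on a numeric column and a text column. All table, column, and index names here are illustrative only.

```cql
-- Hypothetical table for demonstrating SAI query operators.
CREATE TABLE IF NOT EXISTS cycling.cyclist_points (
  id uuid PRIMARY KEY,
  lastname text,
  points int
);

CREATE INDEX IF NOT EXISTS points_sai_idx
  ON cycling.cyclist_points (points)
  USING 'sai';

CREATE INDEX IF NOT EXISTS points_lastname_sai_idx
  ON cycling.cyclist_points (lastname)
  USING 'sai';

-- Numeric range: >= and < combined with AND.
SELECT * FROM cycling.cyclist_points
WHERE points >= 100 AND points < 200;

-- String equality combined with a numeric comparison on a second index.
SELECT * FROM cycling.cyclist_points
WHERE lastname = 'VOS' AND points > 50;
```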
Examples
Creating an SAI index on a clustering column
Define a table having a composite partition key, and then create an index on a clustering column.
CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
CREATE INDEX IF NOT EXISTS rank_idx
ON cycling.rank_by_year_and_name (rank)
USING 'sai';
SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+-------------------
2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2015 | Giro d'Italia - Stage 11 - Forli > Imola | 1 | Ilnur ZAKARIN
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
(4 rows)
Creating an index on a set or list collection
Create an index on a set or list collection column as you would any other column.
Enclose the name of the collection column in parentheses at the end of the CREATE INDEX statement.
For example, add a collection of teams to the cyclist_career_teams table, then index the data in the teams set.
CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
id UUID PRIMARY KEY,
lastname text,
teams set<text>
);
CREATE INDEX IF NOT EXISTS teams_idx
ON cycling.cyclist_career_teams (teams);
SELECT * FROM cycling.cyclist_career_teams WHERE teams CONTAINS 'Rabobank-Liv Woman Cycling Team';
id | lastname | teams
--------------------------------------+-----------------+------------------------------------------------------------------------------------------------------
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}
1c9ebc13-1eab-4ad5-be87-dce433216d40 | BRAND | {'AA Drink - Leontien.nl', 'Leontien.nl', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}
e7cd5752-bc0d-4157-a80f-7523add8dbcd | VAN DER BREGGEN | {'Rabobank-Liv Woman Cycling Team', 'Sengers Ladies Cycling Team', 'Team Flexpoint'}
(3 rows)
Creating an index on map keys
You can create an index on map collection keys.
If an index of the map values of the collection exists, drop that index before creating an index on the map collection keys.
Assume a cyclist table contains this map data, where nation is the map key and CANADA is the map value:
{'nation':'CANADA'}
To index map keys, use the KEYS keyword and map name in nested parentheses in the CREATE INDEX statement.
To run a SELECT query on the table, use CONTAINS KEY in WHERE clauses.
This query returns cyclist teams that have an entry for the year 2015.
CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams map<int, text>
);
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
SELECT *
FROM cycling.cyclist_teams
WHERE teams CONTAINS KEY 2015;
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2015: 'Rabobank-Liv Woman Cycling Team'}
(2 rows)
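If a values index already exists on the same map, the requirement stated at the start of this section to drop it first could look like the sketch below. The index name team_year_values_idx is hypothetical, and whether the drop is actually required may depend on your Cassandra version and index type.

```cql
-- Hypothetical earlier index on the map values of teams:
-- CREATE INDEX team_year_values_idx ON cycling.cyclist_teams ( VALUES (teams) );

-- Drop the values index before creating the keys index.
DROP INDEX IF EXISTS cycling.team_year_values_idx;

CREATE INDEX IF NOT EXISTS team_year_keys_idx
  ON cycling.cyclist_teams ( KEYS (teams) );
```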
Creating an index on map entries
You can create an index on map entries.
An ENTRIES index can be created only on a map column of a table that doesn’t have an existing index.
To index collection entries, use the ENTRIES keyword and map name in nested parentheses.
To query the map entries in the table, use a WHERE clause with the map name, a key, and the value to match.
This query finds cyclists whose age entry is 23.
CREATE TABLE IF NOT EXISTS cycling.birthday_list (
cyclist_name text PRIMARY KEY,
blist map<text, text>
);
CREATE INDEX IF NOT EXISTS blist_idx
ON cycling.birthday_list ( ENTRIES(blist) );
SELECT *
FROM cycling.birthday_list
WHERE blist[ 'age' ] = '23';
cyclist_name | blist
------------------+----------------------------------------------------------
Claudio HEINEN | {'age': '23', 'bday': '27/07/1992', 'nation': 'GERMANY'}
Laurence BOURQUE | {'age': '23', 'bday': '27/07/1992', 'nation': 'CANADA'}
(2 rows)
Use the same index to find cyclists from the same country:
SELECT *
FROM cycling.birthday_list
WHERE blist[ 'nation' ] = 'NETHERLANDS';
cyclist_name | blist
---------------+--------------------------------------------------------------
Luc HAGENAARS | {'age': '28', 'bday': '27/07/1987', 'nation': 'NETHERLANDS'}
Toine POELS | {'age': '52', 'bday': '27/07/1963', 'nation': 'NETHERLANDS'}
(2 rows)
Creating an index on map values
To create an index on map values, use the VALUES keyword and map name in nested parentheses.
To query the table, use a WHERE clause with the map name and the value it contains.
CREATE TABLE IF NOT EXISTS cycling.birthday_list (
cyclist_name text PRIMARY KEY,
blist map<text, text>
);
CREATE INDEX IF NOT EXISTS blist_values_idx
ON cycling.birthday_list ( VALUES(blist) );
SELECT *
FROM cycling.birthday_list
WHERE blist CONTAINS 'NETHERLANDS';
cyclist_name | blist
---------------+--------------------------------------------------------------
Luc HAGENAARS | {'age': '28', 'bday': '27/07/1987', 'nation': 'NETHERLANDS'}
Toine POELS | {'age': '52', 'bday': '27/07/1963', 'nation': 'NETHERLANDS'}
(2 rows)
Creating an index on the full content of a frozen collection
You can create an index on a full FROZEN collection.
A FULL index can be created on a set, list, or map column of a table that doesn’t have an existing index.
Create an index on the full content of a FROZEN list.
The table in this example stores the number of Pro wins, Grand Tour races, and Classic races that a cyclist has competed in.
To index the full collection, use the FULL keyword and collection name in nested parentheses.
For example, index the frozen list rnumbers.
To query the table, use a WHERE clause with the collection name and values.
CREATE TABLE IF NOT EXISTS cycling.race_starts (
cyclist_name text PRIMARY KEY,
rnumbers FROZEN<LIST<int>>
);
CREATE INDEX IF NOT EXISTS rnumbers_idx
ON cycling.race_starts ( FULL(rnumbers) );
SELECT *
FROM cycling.race_starts
WHERE rnumbers = [39, 7, 14];
cyclist_name | rnumbers
----------------+-------------
John DEGENKOLB | [39, 7, 14]
(1 rows)