Cassandra Documentation

Version:

You are viewing the documentation for a prerelease version.

Physical data modeling

Once you have a logical data model defined, creating the physical model is a relatively simple process.

You walk through each of the logical model tables, assigning types to each item. You can use any valid CQL data type <data-types>, including the basic types, collections, and user-defined types. You may identify additional user-defined types that can be created to simplify your design.

After you’ve assigned data types, you analyze the model by performing size calculations and testing out how the model works. You may make some adjustments based on your findings. Once again let’s cover the data modeling process in more detail by working through an example.

Before getting started, let’s look at a few additions to the Chebotko notation for physical data models. To draw physical models, you need to be able to add the typing information for each column. This figure shows the addition of a type for each column in a sample table.

image

The figure includes a designation of the keyspace containing each table and visual cues for columns represented using collections and user-defined types. Note the designation of static columns and secondary index columns. There is no restriction on assigning these as part of a logical model, but they are typically more of a physical data modeling concern.

Hotel Physical Data Model

Now let’s get to work on the physical model. First, you need keyspaces to contain the tables. To keep the design relatively simple, create a hotel keyspace to contain tables for hotel and availability data, and a reservation keyspace to contain tables for reservation and guest data. In a real system, you might divide the tables across even more keyspaces in order to separate concerns.

For the hotels table, use Cassandra’s text type to represent the hotel’s id. For the address, create an address user defined type. Use the text type to represent the phone number, as there is considerable variance in the formatting of numbers between countries.

While it would make sense to use the uuid type for attributes such as the hotel_id, this document uses mostly text attributes as identifiers, to keep the samples simple and readable. For example, a common convention in the hospitality industry is to reference properties by short codes like "AZ123" or "NY229". This example uses these values for hotel_ids, while acknowledging they are not necessarily globally unique.

You’ll find that it’s often helpful to use unique IDs to uniquely reference elements, and to use these uuids as references in tables representing other entities. This helps to minimize coupling between different entity types. This may prove especially effective if you are using a microservice architectural style for your application, in which there are separate services responsible for each entity type.

As you work to create physical representations of various tables in the logical hotel data model, you use the same approach. The resulting design is shown in this figure:

image

Note that the address type is also included in the design. It is designated with an asterisk to denote that it is a user-defined type, and has no primary key columns identified. This type is used in the hotels and hotels_by_poi tables.

User-defined types are frequently used to help reduce duplication of non-primary key columns, as was done with the address user-defined type. This can reduce complexity in the design.

Remember that the scope of a UDT is the keyspace in which it is defined. To use address in the reservation keyspace defined below design, you’ll have to declare it again. This is just one of the many trade-offs you have to make in data model design.

Reservation Physical Data Model

Now, let’s examine reservation tables in the design. Remember that the logical model contained three denormalized tables to support queries for reservations by confirmation number, guest, and hotel and date. For the first iteration of your physical data model design, assume you’re going to manage this denormalization manually. Note that this design could be revised to use Cassandra’s (experimental) materialized view feature.

image

Note that the address type is reproduced in this keyspace and guest_id is modeled as a uuid type in all of the tables.

Material adapted from Cassandra, The Definitive Guide. Published by O’Reilly Media, Inc. Copyright © 2020 Jeff Carpenter, Eben Hewitt. All rights reserved. Used with permission.