First steps with Cassandra

First steps with Apache Cassandra (C*)

According to Wikipedia, Apache Cassandra is

an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication allowing low latency operations for all clients.

As a first step, let’s dig into the data model.

Cassandra data model

There are five key elements that we need to look at to describe the Cassandra data model: column, column family, super column, super column family and keyspace.

Column

A column is the most basic unit of data structure in the Cassandra data model. A column is a triplet of a name, a value, and a clock, which can be thought of as a timestamp. We can imagine something like the JSON below:
{
 "name" : "Davide",
 "value" : "davide.brambilla@fake.com",
 "timestamp" : 1385844381000
}
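
The triplet above can be modeled in a few lines of Python. This is only a sketch: the `Column` class and the `reconcile` helper are hypothetical names, not part of any Cassandra driver, and the second email address is made up. The "newest timestamp wins" rule is a simplification of how the timestamp is used to decide which version of a column is current.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical model of a column: a name, a value and a timestamp."""
    name: str
    value: str
    timestamp: int  # milliseconds since the epoch, as in the JSON above


def reconcile(a: Column, b: Column) -> Column:
    """Return the version with the newer timestamp (simplified last-write-wins).

    On equal timestamps this sketch simply keeps `a`.
    """
    if a.name != b.name:
        raise ValueError("can only reconcile two versions of the same column")
    return b if b.timestamp > a.timestamp else a


old = Column("Davide", "davide.brambilla@fake.com", 1385844381000)
new = Column("Davide", "davide@fake.com", 1385844999000)  # made-up later write
current = reconcile(old, new)
print(current.value)
```

The order of the arguments does not matter as long as the timestamps differ: the version with the larger timestamp is the one returned.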

When you’re working with Cassandra, you may choose to use wide or skinny rows. A wide row is a row that has lots and lots (perhaps tens of thousands or even millions) of columns. A skinny row is a row that has a small number of columns, something closer to the relational model. Wide rows are typically used to store lists of things; for instance, they can store the list of user actions on your catalogue items. Skinny rows are more similar to traditional RDBMS rows: they contain similar sets of column names, the main difference being that in Cassandra columns are optional.

The basic structure of a super column (described in its own section below) is its name and the set of columns it stores; its columns are held as a map whose keys are the column names and whose values are the columns.

Column Family

A column family is a container for an ordered collection of rows, each of which is itself an ordered collection of columns. You can think of a column family as an RDBMS table, but there are some differences. First, although column families are defined, each row may have different columns, so there is no strict schema as in an RDBMS. Second, a column family has two attributes: a name and a comparator, which is used to sort columns when they are returned in a query result.

Each column family is stored on disk in its own separate file, so to optimize performance it’s important to keep columns that you are likely to query together in the same column family.
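
As a rough mental model, a column family can be pictured as a map from row keys to rows, where each row is itself a map of columns returned in comparator order. The Python sketch below uses plain dictionaries. The names `users`, `user_actions` and `sorted_columns`, as well as all sample values, are made up for illustration, and real Cassandra storage is considerably more involved.

```python
# Skinny rows: few columns per row, and not every row has every column.
users = {
    "davide": {"email": "davide.brambilla@fake.com", "city": "Milan"},
    "anna": {"email": "anna@fake.com"},  # no "city" column: columns are optional
}

# Wide row: one row per user, one column per action, named by its timestamp.
user_actions = {
    "davide": {
        1385844381000: "view:item42",
        1385844000000: "view:item7",
        1385844999000: "buy:item42",
    },
}


def sorted_columns(row, comparator=lambda name: name):
    """Return (name, value) pairs ordered by column name.

    `comparator` stands in for the column family comparator: a key
    function that decides how column names are ordered.
    """
    return sorted(row.items(), key=lambda item: comparator(item[0]))


actions = sorted_columns(user_actions["davide"])
print([value for _, value in actions])
```

With timestamps as column names, sorting by name returns the actions in chronological order, which is why wide rows suit lists of events.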

Super Column

A super column is a special kind of column: whereas a regular column stores a byte array as its value, a super column stores a map of subcolumns as its value. The super column structure goes only one level deep, so you cannot define a super column that stores another super column. Also be aware that Cassandra does not index subcolumns, so when you load a super column into memory, all of its subcolumns are loaded as well. To avoid this, you may prefer to define a composite key instead, for instance something like “contentid:insertts”. Note that super columns are not supported in CQL 3.
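
To make the trade-off concrete, the sketch below contrasts a super column, modeled as a nested map, with a flat row whose column names follow the “contentid:insertts” pattern. It is plain Python with hypothetical names (`super_row`, `composite_row`, `columns_for_content`) and made-up sample values; it only illustrates the shape of the data, not how Cassandra stores it.

```python
# Super column style: reading "content42" pulls in every subcolumn at once.
super_row = {
    "content42": {  # super column name
        "1385844381000": "view",  # subcolumn name -> value
        "1385844999000": "buy",
    },
    "content7": {
        "1385844000000": "view",
    },
}

# Composite key style: each column name is "contentid:insertts".
composite_row = {
    f"{content_id}:{insert_ts}": value
    for content_id, subcolumns in super_row.items()
    for insert_ts, value in subcolumns.items()
}


def columns_for_content(row, content_id):
    """Select the columns of one content item by composite-name prefix."""
    prefix = f"{content_id}:"
    return {
        name: value
        for name, value in sorted(row.items())
        if name.startswith(prefix)
    }


print(columns_for_content(composite_row, "content42"))
```

In the flat layout every column is addressable by its own name, so a reader can select a subset of columns instead of loading a whole nested map.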

Super Column Family

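If you want to create a group of related columns, that is, add another dimension on top of columns, you can use a super column family: a column family whose rows hold super columns instead of regular columns. Note that super column families are not supported in CQL 3.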

Keyspace

A keyspace is the outermost container for data in Cassandra; it can be thought of as the counterpart of a relational database. A keyspace has a name and a set of attributes that define keyspace-wide behavior. You can create as many keyspaces as your application needs. The basic attributes that you can set per keyspace are:

  • Replication factor: the number of nodes that will hold a copy (replica) of each row of data.
  • Replica placement strategy: how the replicas will be placed in the ring.
  • Column families: a keyspace is a container for a list of one or more column families. Each keyspace has at least one and often many column families.

Generally, it’s not recommended to create more than a single keyspace per application; separate keyspaces are useful mainly when you need to specify different replication options.
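
The interplay between replication factor and replica placement can be sketched as follows. This is a toy model in Python that assumes a very simple placement rule: start at the node that owns the key's token, then keep walking the ring clockwise until enough replicas are found. The `Keyspace` class, the `replicas_for` method, the `token` function, the node names and the keyspace name are all hypothetical, and actual placement strategies are more elaborate.

```python
import bisect
import zlib
from dataclasses import dataclass, field


def token(key: str) -> int:
    """Toy partitioner: map a key to a position on a ring of size 2**32."""
    return zlib.crc32(key.encode("utf-8"))


@dataclass
class Keyspace:
    name: str
    replication_factor: int
    ring: dict  # node token -> node name
    column_families: list = field(default_factory=list)

    def replicas_for(self, row_key: str) -> list:
        """Return the nodes that hold a copy of the given row."""
        tokens = sorted(self.ring)
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(tokens, token(row_key)) % len(tokens)
        # A row cannot have more replicas than there are nodes.
        count = min(self.replication_factor, len(tokens))
        return [self.ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


ks = Keyspace(
    name="catalogue",
    replication_factor=2,
    ring={0: "node-a", 2**30: "node-b", 2**31: "node-c", 3 * 2**30: "node-d"},
    column_families=["users", "user_actions"],
)
print(ks.replicas_for("davide"))
```

Because the replication factor lives on the keyspace, every column family inside it shares the same setting, which is the reason for splitting data into several keyspaces only when the replication options must differ.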

