Cassandra NoSQL Database

https://en.wikipedia.org/wiki/Apache_Cassandra

  • every node in the cluster has the same role (no master)
  • no single point of failure
  • supports multi-datacenter replication
  • read and write throughput scale roughly linearly with the number of nodes
  • consistency is tunable per request (consistency levels such as ONE, QUORUM or ALL)
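To make the tunable consistency concrete: each request names a consistency level, which fixes how many replicas must answer. A read is guaranteed to overlap the latest acknowledged write when read replicas + write replicas > replication factor. The following is a plain-Java sketch of that arithmetic only (no driver involved). The class and method names are hypothetical, and only a few levels are modelled.

```java
/**
 * Sketch: replica acknowledgements per consistency level and the R + W > RF rule.
 * Hypothetical helper, not part of any Cassandra API; only a few levels are modelled.
 */
public class ConsistencyMath {

    // How many replicas must respond for the given level and replication factor (rf).
    public static int required(String level, int rf) {
        switch (level) {
            case "ONE":
                return 1;
            case "TWO":
                return 2;
            case "QUORUM":
                return rf / 2 + 1; // majority of the replicas
            case "ALL":
                return rf;
            default:
                throw new IllegalArgumentException("level not modelled: " + level);
        }
    }

    // True when every read set must intersect every write set, so reads see the latest write.
    public static boolean readsSeeLatestWrite(String readLevel, String writeLevel, int rf) {
        return required(readLevel, rf) + required(writeLevel, rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(readsSeeLatestWrite("QUORUM", "QUORUM", rf)); // true  (2 + 2 > 3)
        System.out.println(readsSeeLatestWrite("ONE", "ONE", rf));       // false (1 + 1 = 2)
        System.out.println(readsSeeLatestWrite("ONE", "ALL", rf));       // true  (1 + 3 > 3)
    }
}
```

Writing and reading at QUORUM is the usual way to get this guarantee while still tolerating a minority of replicas being down.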

Getting started

Run a single-node database via Docker (reachable at localhost:9042). Add -d to run it in the background:

docker run -p 9042:9042 --name cas0 cassandra

Or start your own multi-node cluster (adapted from https://gokhanatil.com/2018/02/build-a-cassandra-cluster-on-docker.html). Start the first node:

docker run -p 9042:9042 --name cas1 -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra

Once the first node is running, add more nodes that use its IP address as the seed. One of them (cas4) is placed in a different data center (datacenter2):

FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas2 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas3 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas4 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter2 cassandra

Check the status of your cluster

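docker exec -ti cas1 nodetool status

Example output (UN = Up/Normal; the datacenter2 node is still joining, so it shows UJ and its ownership is not yet known):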
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.101.0.1 108.61 KiB 256 68.2% 68f10b1a-0313-4fb7-8640-6b1afdab1a5f rack1
UN 10.101.0.3 69.91 KiB 256 65.1% eb7d4399-ea6e-4a67-8dca-9a64f318ea8f rack1
UN 10.101.0.2 93.98 KiB 256 66.8% c8b4140f-faeb-4f0c-b802-7494364777db rack1
Datacenter: datacenter2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UJ 10.101.0.4 15.47 KiB 256 ? bd40da84-a7ad-4eee-b481-ec8ca2b263c1 rack1
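If you want to script this check, one minimal approach is to scan the output for node lines whose two-letter code is not UN. The sketch below assumes the whitespace-separated layout shown above (status code first, address second); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: list nodes that are not Up/Normal ("UN") in `nodetool status` output.
 * Assumes each node line starts with a two-letter code (U/D + N/L/J/M) followed
 * by the address, separated by whitespace, as in the output above.
 */
public class NodetoolStatus {

    public static List<String> notUpNormal(String output) {
        List<String> addresses = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Header lines such as "Datacenter: ..." or "-- Address ..." do not match the code pattern
            boolean isNodeLine = cols.length >= 2 && cols[0].matches("[UD][NLJM]");
            if (isNodeLine && !cols[0].equals("UN")) {
                addresses.add(cols[1]);
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "Datacenter: datacenter1",
                "-- Address Load Tokens Owns (effective) Host ID Rack",
                "UN 10.101.0.1 108.61 KiB 256 68.2% 68f10b1a-0313-4fb7-8640-6b1afdab1a5f rack1",
                "Datacenter: datacenter2",
                "UJ 10.101.0.4 15.47 KiB 256 ? bd40da84-a7ad-4eee-b481-ec8ca2b263c1 rack1");
        System.out.println(notUpNormal(output)); // [10.101.0.4]
    }
}
```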

Connect to Cassandra

cqlsh -u cassandra -p *** --request-timeout=300

This also works directly inside the Docker container running the server:

docker exec -ti cas1 cqlsh

And also in Kubernetes:

kubectl exec -ti cassandra-0 -- cqlsh

There is also a web UI frontend for Cassandra: https://hub.docker.com/r/delermando/docker-cassandra-web

FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"

docker run --name cassandra-web -e CASSANDRA_HOST_IP="$FIRST_IP" -e CASSANDRA_PORT=9042 -e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSWORD=cassandra -p 3000:3000 delermando/docker-cassandra-web:v0.4.0

It will then be accessible at http://localhost:3000.

Cassandra Terminology

  • node: One running instance of Cassandra
  • cluster: Several nodes
  • datacenter: Several nodes that can exchange data fast and cheap
  • Column: The basic data structure of Cassandra with column name, column value, and a time stamp
  • SuperColumn: Like a column, but its values are other columns. They can improve performance if you group columns that you often read together into one SuperColumn (a legacy Thrift-era concept that CQL does not expose)
  • Column Family / table: A table with columns and rows. Rows do not need to have a value for every column
  • keyspace: Like a database; groups several column families (tables)
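The timestamp on each column is what lets replicas agree after conflicting writes: the version with the newer write timestamp wins (last write wins). Below is a minimal sketch of that rule. The class is hypothetical, and tie handling is simplified: equal timestamps keep the first argument here, while Cassandra applies its own deterministic tie-break.

```java
/**
 * Sketch of a Cassandra "column": name, value and write timestamp.
 * Conflicting versions are reconciled by last-write-wins on the timestamp.
 * Simplification (assumption): equal timestamps keep the first argument.
 */
public class LwwColumn {
    public final String name;
    public final String value;
    public final long timestampMicros; // write time in microseconds since the epoch

    public LwwColumn(String name, String value, long timestampMicros) {
        this.name = name;
        this.value = value;
        this.timestampMicros = timestampMicros;
    }

    // Merge two replicas' versions of the same column: the newer write wins.
    public static LwwColumn reconcile(LwwColumn a, LwwColumn b) {
        if (!a.name.equals(b.name)) {
            throw new IllegalArgumentException("not the same column");
        }
        return b.timestampMicros > a.timestampMicros ? b : a;
    }

    public static void main(String[] args) {
        LwwColumn onReplica1 = new LwwColumn("name", "John", 1_000L);
        LwwColumn onReplica2 = new LwwColumn("name", "Johnny", 2_000L);
        System.out.println(reconcile(onReplica1, onReplica2).value); // Johnny
        System.out.println(reconcile(onReplica2, onReplica1).value); // Johnny
    }
}
```

Because the result does not depend on argument order (apart from ties), replicas can merge versions in any order and still converge.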

Query

Run these queries in cqlsh, for example inside your Cassandra Docker container.

Get all keyspaces (databases)

SELECT * FROM system_schema.keyspaces;

Get all tables of a keyspace

SELECT * FROM system_schema.tables WHERE keyspace_name = 'keyspace name';

Creating data structures

Create a keyspace where all data is stored on two replicas in datacenter1 and on one replica in datacenter2 (NetworkTopologyStrategy sets a replication factor per datacenter)

CREATE KEYSPACE keyspacetest1
WITH replication = {
        'class' : 'NetworkTopologyStrategy',
        'datacenter1' : 2,
        'datacenter2' : 1
};
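This replication map also determines how many replicas the multi-datacenter consistency levels need: QUORUM counts replicas across all datacenters, LOCAL_QUORUM only those in the coordinator's datacenter. A small sketch of that arithmetic, using a hypothetical helper class with the same values as keyspacetest1:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: replica counts implied by a NetworkTopologyStrategy replication map,
 * and how many acknowledgements QUORUM (all datacenters) and LOCAL_QUORUM
 * (coordinator's datacenter only) need. Hypothetical helper, not a driver API.
 */
public class DcReplication {

    public static int totalReplicas(Map<String, Integer> rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc.values()) {
            total += rf;
        }
        return total;
    }

    // QUORUM: majority of all replicas across every datacenter.
    public static int quorum(Map<String, Integer> rfPerDc) {
        return totalReplicas(rfPerDc) / 2 + 1;
    }

    // LOCAL_QUORUM: majority of the replicas in the local datacenter only.
    public static int localQuorum(Map<String, Integer> rfPerDc, String localDc) {
        Integer rf = rfPerDc.get(localDc);
        if (rf == null) {
            throw new IllegalArgumentException("no replicas in " + localDc);
        }
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("datacenter1", 2); // same values as keyspacetest1
        rf.put("datacenter2", 1);
        System.out.println(totalReplicas(rf));              // 3
        System.out.println(quorum(rf));                     // 2
        System.out.println(localQuorum(rf, "datacenter1")); // 2
        System.out.println(localQuorum(rf, "datacenter2")); // 1
    }
}
```

Note that with 2 replicas in datacenter1, LOCAL_QUORUM there needs both of them, so it fails as soon as one of those replicas is down; this is one reason a replication factor of 3 per datacenter is often used.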

Create a keyspace where all data is stored on 2 nodes (SimpleStrategy ignores datacenters, so it is best suited to single-datacenter or test setups)

CREATE KEYSPACE keyspacetest2
WITH replication = {
        'class': 'SimpleStrategy',
        'replication_factor' : 2
};
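SimpleStrategy places replicas by walking the token ring: the partition key is hashed to a token, the first node whose token is at or after it holds the first replica, and the following nodes clockwise hold the rest until replication_factor nodes are chosen. The sketch below illustrates that walk under two stated simplifications: one token per node instead of many vnodes (256 in the output above), and the key's token is passed in directly rather than computed by the Murmur3 partitioner. The class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Sketch of SimpleStrategy-style replica placement on a token ring.
 * Simplifications (assumptions): one token per node instead of many vnodes,
 * and the partition key's token is given directly instead of being hashed.
 */
public class SimpleRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node

    public void addNode(long token, String node) {
        ring.put(token, node);
    }

    // First replica: first node whose token is >= keyToken (wrapping around);
    // further replicas: the next nodes clockwise, up to replicationFactor nodes.
    public List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Long start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey(); // past the highest token: wrap to the start
        }
        List<String> clockwise = new ArrayList<>(ring.tailMap(start, true).values());
        clockwise.addAll(ring.headMap(start, false).values());
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            replicas.add(node);
        }
        return replicas;
    }

    public static void main(String[] args) {
        SimpleRing cluster = new SimpleRing();
        cluster.addNode(0L, "node1");
        cluster.addNode(100L, "node2");
        cluster.addNode(200L, "node3");
        System.out.println(cluster.replicasFor(50L, 2));  // [node2, node3]
        System.out.println(cluster.replicasFor(250L, 2)); // [node1, node2]
    }
}
```

Since placement depends only on tokens that every node learns via gossip, any node can compute it and act as coordinator, which fits the "no master" design.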

Create a simple table

CREATE TABLE keyspacetest2.people (
        id INT PRIMARY KEY,
        name text
);

Data

INSERT INTO keyspacetest2.people (id, name) VALUES(1, 'John');
INSERT INTO keyspacetest2.people (id, name) VALUES(2, 'Doe');
INSERT INTO keyspacetest2.people (id, name) VALUES(3, 'Jane');
INSERT INTO keyspacetest2.people (id, name) VALUES(4, 'Frank');
SELECT * FROM keyspacetest2.people;

Use a collection type (LIST) in a column

CREATE TABLE keyspacetest1.people2 (id INT, name text, email LIST<text>, PRIMARY KEY (id));
INSERT INTO keyspacetest1.people2 (id, name, email) VALUES (1, 'John', ['test@example.com', 'test2@example.com']);
UPDATE keyspacetest1.people2 SET email = email + ['foo@example.com'] WHERE id = 1;
SELECT * FROM keyspacetest1.people2 WHERE id = 1; // email: ['test@example.com', 'test2@example.com', 'foo@example.com'], name: John

Java Integration

There are several Java clients: http://cassandra.apache.org/doc/latest/getting_started/drivers.html#java

DataStax Java driver (4.x)

https://github.com/datastax/java-driver

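// DataStax Java driver 4.x (java-driver-core + java-driver-query-builder)
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.CqlSessionBuilder;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.querybuilder.QueryBuilder;
import com.datastax.oss.driver.api.querybuilder.insert.RegularInsert;
import com.datastax.oss.driver.api.querybuilder.relation.Relation;
import com.datastax.oss.driver.api.querybuilder.select.Select;
import java.net.InetSocketAddress;
import java.util.List;
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.bindMarker;
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.insertInto;

// Contact points, e.g. new InetSocketAddress("127.0.0.1", 9042)
final List<InetSocketAddress> nodes = ...;

final CqlSessionBuilder builder = CqlSession.builder();
builder.addContactPoints(nodes);
// Must match a datacenter name reported by `nodetool status`, e.g. datacenter1
builder.withLocalDatacenter("NameOfYourDataCenter");

final CqlSession session = builder.build();

// WHERE partitionColumn = ? AND clusterColumn >= ? AND clusterColumn <= ?
final Relation relationA = Relation.column(partitionColumn).isEqualTo(bindMarker());
final Relation minDate = Relation.column(clusterColumn).isGreaterThanOrEqualTo(bindMarker());
final Relation maxDate = Relation.column(clusterColumn).isLessThanOrEqualTo(bindMarker());

// selectFrom takes the keyspace name and the table name
final Select query = QueryBuilder.selectFrom(myKeyspace, myTable)
        .column(myDataColumn)
        .where(relationA)
        .where(minDate)
        .where(maxDate);

final PreparedStatement statement = session.prepare(query.build());
// Values are bound in marker order; now/later must match the clustering column's type
final BoundStatement bound = statement.bind("A", now, later);
final ResultSet rows = session.execute(bound);

// INSERT INTO myKeyspace.myTable (myDataColumn1, myDataColumn2) VALUES (?, ?)
// The inserted columns must include every primary key column of the table
final RegularInsert insert = insertInto(myKeyspace, myTable)
        .value(myDataColumn1, bindMarker())
        .value(myDataColumn2, bindMarker());
final PreparedStatement statement2 = session.prepare(insert.build());
session.execute(statement2.bind("A", "B"));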