Cassandra NoSQL Database
https://en.wikipedia.org/wiki/Apache_Cassandra
- every node in the cluster has the same role (there is no master)
- no single point of failure
- supports replication across multiple data centers
- read and write throughput scales linearly with the number of nodes
- consistency is configurable
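Configurable consistency means that each read or write chooses how many replicas must answer. A minimal sketch of the arithmetic behind the QUORUM level follows; the class and method names are hypothetical and not part of any Cassandra API. A quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write when read replicas + write replicas > replication factor.

```java
// Hypothetical helper for illustration only; not part of any Cassandra driver.
final class ConsistencyMath {

    /** Number of replicas that must respond for QUORUM at the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    /** True when every set of read replicas shares at least one node with every set of write replicas. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM at RF=3 needs " + q + " replicas");
        System.out.println("QUORUM read + QUORUM write overlap: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE read + ONE write overlap: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

Lower levels such as ONE trade this overlap guarantee for lower latency and higher availability.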
Getting started
Run a single-node database via Docker (reachable on localhost:9042). Add -d to run it in the background
docker run -p 9042:9042 cassandra
Or start your own cluster (based on https://gokhanatil.com/2018/02/build-a-cassandra-cluster-on-docker.html)
Once the first node is running, add more nodes and pass them its IP address as the seed ($FIRST_IP in the commands below). The last node, cas4, joins a different data center
docker run --name cas2 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
docker run --name cas3 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
docker run --name cas4 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter2 cassandra
Check the status of your cluster by running nodetool status on any node
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.101.0.1  108.61 KiB  256     68.2%             68f10b1a-0313-4fb7-8640-6b1afdab1a5f  rack1
UN  10.101.0.3  69.91 KiB   256     65.1%             eb7d4399-ea6e-4a67-8dca-9a64f318ea8f  rack1
UN  10.101.0.2  93.98 KiB   256     66.8%             c8b4140f-faeb-4f0c-b802-7494364777db  rack1
Datacenter: datacenter2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UJ  10.101.0.4  15.47 KiB   256     ?                 bd40da84-a7ad-4eee-b481-ec8ca2b263c1  rack1
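The Owns (effective) column takes replication into account, which is why the three datacenter1 values add up to roughly 200% rather than 100%. As a rough sanity check, assuming the status was taken for a keyspace with a replication factor of 2 in datacenter1 (the output itself does not say) and evenly balanced tokens, each of the three nodes should hold about two thirds of the data. The class and method names below are made up for this sketch.

```java
// Hypothetical helper for illustration only; not part of nodetool or any driver.
final class Ownership {

    /** Expected share of the data, in percent, held by each node of an evenly balanced datacenter. */
    static double expectedOwnsPercent(int replicationFactor, int nodesInDatacenter) {
        if (replicationFactor < 0 || nodesInDatacenter <= 0) {
            throw new IllegalArgumentException("need replicationFactor >= 0 and nodesInDatacenter > 0");
        }
        // A row cannot be stored more than once per node, so cap the replica count.
        int replicas = Math.min(replicationFactor, nodesInDatacenter);
        return 100.0 * replicas / nodesInDatacenter;
    }

    public static void main(String[] args) {
        double[] reported = {68.2, 65.1, 66.8}; // values from the datacenter1 output above
        double sum = 0;
        for (double value : reported) {
            sum += value;
        }
        System.out.printf("expected per node: %.1f%%%n", expectedOwnsPercent(2, 3));
        System.out.printf("sum of reported values: %.1f%%%n", sum);
    }
}
```

The node in datacenter2 shows ? because it is still joining (state UJ).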
There is also a web UI frontend for Cassandra: https://hub.docker.com/r/delermando/docker-cassandra-web
docker run --name cassandra-web -e CASSANDRA_HOST_IP="$FIRST_IP" -e CASSANDRA_PORT=9042 -e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSWORD=cassandra -p 3000:3000 delermando/docker-cassandra-web:v0.4.0
It will be accessible at http://localhost:3000

Cassandra Terminology
- node: One running instance of Cassandra
- cluster: Several nodes
- datacenter: Several nodes that can exchange data quickly and cheaply
- Column: The basic data structure of Cassandra, consisting of a column name, a column value, and a timestamp
- SuperColumn: Like a column, but its values are other columns. This can improve performance if you group columns that you often read together into one SuperColumn
- Column Family / table: A table with columns and rows. Rows do not need to have a value for every column
- keyspace: Like a database; groups several column families
Query
For example, start cqlsh inside your Cassandra Docker container.
Get all keyspaces (databases)
DESCRIBE KEYSPACES;
Get all tables
DESCRIBE TABLES;
Creating data structures
You can enter the statements via the web UI, via cqlsh on any of the cluster nodes, or via docker
Create a keyspace where every row is stored on two nodes in datacenter1 and on one node in datacenter2
CREATE KEYSPACE keyspacetest1
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'datacenter1' : 2,
'datacenter2' : 1
};
Create a keyspace where every row is stored on 2 nodes
CREATE KEYSPACE keyspacetest2
WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 2
};
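When keyspaces are created from application code, the NetworkTopologyStrategy statement can be generated from a map of datacenter name to replica count. This is a minimal sketch that only builds the CQL string; the class and method names are hypothetical, and the names passed in are assumed to be trusted, since nothing is escaped or validated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; it builds a CQL string and does not talk to Cassandra.
final class KeyspaceCql {

    /** Builds a CREATE KEYSPACE statement with one replica count per datacenter. */
    static String networkTopology(String keyspace, Map<String, Integer> replicasPerDatacenter) {
        if (replicasPerDatacenter.isEmpty()) {
            throw new IllegalArgumentException("at least one datacenter is required");
        }
        // Render each entry as 'datacenterName': replicaCount
        String datacenters = replicasPerDatacenter.entrySet().stream()
                .map(entry -> "'" + entry.getKey() + "': " + entry.getValue())
                .collect(Collectors.joining(", "));
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace
                + " WITH replication = {'class': 'NetworkTopologyStrategy', " + datacenters + "};";
    }

    public static void main(String[] args) {
        Map<String, Integer> replicas = new LinkedHashMap<>(); // keeps insertion order
        replicas.put("datacenter1", 2);
        replicas.put("datacenter2", 1);
        System.out.println(networkTopology("demo", replicas));
    }
}
```

The resulting string can be run in cqlsh or passed to a driver session.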
Create a simple table
CREATE TABLE keyspacetest2.people (
id INT PRIMARY KEY,
name text
);
Data
INSERT INTO keyspacetest2.people (id, name) VALUES(2, 'Doe');
INSERT INTO keyspacetest2.people (id, name) VALUES(3, 'Jane');
INSERT INTO keyspacetest2.people (id, name) VALUES(4, 'Frank');
Columns can also hold collection types. Here email is assumed to be declared as a list<text> column
INSERT INTO keyspacetest1.people2 (id,name,email) VALUES(1, 'John',['test@example.com', 'test2@example.com']);
UPDATE keyspacetest1.people2 SET email=email+['foo@example.com'] WHERE id=1; // ["test@example.com","test2@example.com","foo@example.com"] John
Java Integration
There are several Java clients: http://cassandra.apache.org/doc/latest/getting_started/drivers.html#java
DataStax Java driver for Cassandra
https://github.com/datastax/java-driver
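import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.*;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.CqlSessionBuilder;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.querybuilder.insert.RegularInsert;
import com.datastax.oss.driver.api.querybuilder.relation.Relation;
import com.datastax.oss.driver.api.querybuilder.select.Select;
// Connect; nodes is a Collection<InetSocketAddress> of contact points
final CqlSessionBuilder builder = CqlSession.builder();
builder.addContactPoints(nodes);
builder.withLocalDatacenter("NameOfYourDataCenter");
final CqlSession session = builder.build();
// Select from one partition, restricted to a range of the clustering column
final Relation relationA = Relation.column(partitionColumn).isEqualTo(bindMarker());
final Relation minDate = Relation.column(clusterColumn).isGreaterThanOrEqualTo(bindMarker());
final Relation maxDate = Relation.column(clusterColumn).isLessThanOrEqualTo(bindMarker());
// selectFrom and insertInto take the keyspace name and the table name
final Select query = selectFrom(myKeyspace, myTable)
.column(myDataColumn)
.where(relationA)
.where(minDate)
.where(maxDate);
final PreparedStatement statement = session.prepare(query.build());
final BoundStatement bound = statement.bind("A", now, later);
final ResultSet rows = session.execute(bound);
// Insert a row with two bound values
final RegularInsert insert = insertInto(myKeyspace, myTable)
.value(myDataColumn1, bindMarker())
.value(myDataColumn2, bindMarker());
final PreparedStatement statement2 = session.prepare(insert.build());
session.execute(statement2.bind("A", "B"));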