Cassandra - An Introduction


Cassandra An Introduction

Mikio L. Braun, Leo Jugel
TU Berlin, twimpact
LinuxTag Berlin, 13 May 2011

LinuxTag Berlin, 13. 5. 2011

(c) 2011 by Mikio L. Braun

@mikiobraun, blog.mikiobraun.de

What is NoSQL

For many web applications, classical databases are not the right choice:

- The database is just used for storing objects.
- Consistency is not essential.
- There is a lot of concurrent access.


NoSQL in comparison

Classical databases:
- Powerful query language
- Scales by using larger servers (scaling up)
- Changes of database schema very costly
- ACID: Atomicity, Consistency, Isolation, Durability
- Transactions, locking, etc.

NoSQL:
- Very simple query language
- Scales through clustering (scaling out)
- No fixed database schema
- Typically only eventually consistent
- Typically no support for transactions etc.


Brewer's CAP Theorem

CAP: Consistency, Availability, Partition Tolerance


- Consistency: you never get old data.
- Availability: read/write operations are always possible.
- Partition tolerance: the other guarantees hold even if the network of servers breaks.

You can only have two of these!

Gilbert, Lynch: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, Volume 33, Issue 2, June 2002

Homepage: http://cassandra.apache.org
Language: Java

History:
- Developed at Facebook for inbox search, released as open source in July 2008
- Apache Incubator since March 2009
- Apache top-level project since February 2010

Main properties:
- Structured key-value store
- Eventually consistent
- Fully equivalent nodes
- Cluster can be modified without restarting

Support: DataStax (http://datastax.com)
Licence: Apache 2.0

Version 0.6.x and 0.7.x

Most important changes in 0.7.x


- Config file format changed from XML to YAML
- Schema modification (ColumnFamilies) without restart
- Beginning support for secondary indices

However, there were also stability problems initially.


Inspirations for Cassandra

Amazon Dynamo:
- Clustering without a dedicated master node
- Peer-to-peer discovery of nodes, Hinted Handoff, etc.

Google BigTable:
- Data model
- Requires a central master node
- Provides much more fine-grained control:
  - which data should be stored together
  - on-the-fly compression, etc.


Installation

- Download the tar.gz from http://cassandra.apache.org/download/
- Unpack
- ./conf contains the config files
- ./bin/cassandra -f to start Cassandra, Ctrl-C to stop


Configuration

Database:
- Version 0.6.x: conf/storage-conf.xml
- Version 0.7.x: conf/cassandra.yaml

JVM parameters:
- Version 0.6.x: bin/cassandra.in.sh
- Version 0.7.x: conf/cassandra-env.sh


Cassandra's Data Model


Keyspace (= database)
  Column Family (= table)
    Row: key -> {name1: value1, name2: value2, name3: value3, ...}
      - key: strings, sorted according to partitioner
      - column names and values: byte arrays
      - columns: sorted by name!
  Super Column Family
    Row: key -> key -> {name1: value1, ...}
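As a rough mental model only (this is not how Cassandra is implemented), the structure above can be pictured as nested maps in which the columns of a row are always read back sorted by name. The class `ColumnFamily` and its methods are made up for this sketch; `get_slice` merely borrows the name of the Thrift call.

```python
# Toy model of the data model: keyspace -> column family -> row key -> columns.
# Column names and values are byte strings; columns are returned sorted by name.

class ColumnFamily:
    def __init__(self):
        self.rows = {}  # row key -> {column name (bytes): value (bytes)}

    def insert(self, key, name, value):
        self.rows.setdefault(key, {})[name] = value

    def get_slice(self, key, start=b"", finish=None):
        """Return (name, value) pairs of one row, sorted by column name."""
        columns = self.rows.get(key, {})
        return [
            (name, columns[name])
            for name in sorted(columns)
            if name >= start and (finish is None or name <= finish)
        ]


keyspace = {"Person": ColumnFamily()}  # a keyspace is a set of column families
person = keyspace["Person"]
person.insert("1", b"name", b"Mikio Braun")
person.insert("1", b"affiliation", b"TU Berlin")
person.insert("1", b"id", b"1")

# Columns come back in name order, regardless of insertion order.
print(person.get_slice("1"))
```

Because columns are kept in name order, a range of columns can be read with a start and end name, which is what the index example later relies on.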


Example: Simple Object Store


class Person {
    long id;
    String name;
    String affiliation;
}

Convert fields to byte arrays:

Keyspace MyDatabase:
  ColumnFamily Person:
    1: {id: 1, name: Mikio Braun, affiliation: TU Berlin}
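One way to do the conversion to byte arrays, sketched in Python. The encoding chosen here (8-byte big-endian for the id, UTF-8 for strings) and the helper names `encode_person` and `decode_person` are assumptions for illustration; the slides do not prescribe a particular encoding.

```python
import struct


def encode_person(person_id, name, affiliation):
    """Turn the fields of a Person into a {column name: value} map of bytes."""
    return {
        b"id": struct.pack(">q", person_id),  # long -> 8 bytes, big-endian
        b"name": name.encode("utf-8"),
        b"affiliation": affiliation.encode("utf-8"),
    }


def decode_person(columns):
    """Inverse of encode_person: rebuild the field values from the bytes."""
    return (
        struct.unpack(">q", columns[b"id"])[0],
        columns[b"name"].decode("utf-8"),
        columns[b"affiliation"].decode("utf-8"),
    )


row_key = "1"
columns = encode_person(1, "Mikio Braun", "TU Berlin")
print(row_key, columns)
```

The resulting map is what would be stored as the columns of row `1` in the `Person` column family.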


Example: Index
class Page {
    long id;
    List<Link> links;
}

class Link {
    long id;
    ...
    int numberOfHits;
}

Object data fields:

Keyspace MyDatabase
  ColumnFamily Pages
    3: {id: 3, ...}
    4: {id: 4, ...}
  ColumnFamily Links
    1: {id: 1, url: ...}
    17: {id: 17, url: ...}

Used for both linking and indexing:

  ColumnFamily LinksPerPageByNumberOfHits
    3: {00000132:00000001: t, 00000025:00000017: t, ...}
    4: {00000044:00000024: t, ...}

Here we exploit that columns are sorted by their names. Of course, everything is encoded in byte arrays, not ASCII.
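A minimal sketch of how such composite column names could be built so that byte order matches numeric order. The layout (two unsigned 32-bit big-endian integers) and the helper names are assumptions for illustration; the hit counts and link ids are the ones from the example row for page 3.

```python
import struct


def column_name(number_of_hits, link_id):
    """Pack (hits, link id) into 8 bytes.

    Big-endian unsigned integers compare byte-wise in the same order as the
    numbers themselves, so sorting by name sorts by hit count.
    """
    return struct.pack(">II", number_of_hits, link_id)


def decode(name):
    return struct.unpack(">II", name)


# Row 3 of LinksPerPageByNumberOfHits: the column names carry the data,
# the value is just a marker.
row = {
    column_name(132, 1): b"t",
    column_name(25, 17): b"t",
}

# Reading the columns in name order yields the links ordered by hit count.
links_by_hits = [decode(name) for name in sorted(row)]
print(links_by_hits)
```

A fixed-width encoding matters here: with variable-width names, 132 could sort before 25.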



Are SuperColumnFamilies necessary?

Usually, you can replace a SuperColumnFamily by several ColumnFamilies. Since SuperColumnFamilies make the implementation and the protocol more complex, there are also people advocating the removal of SuperCFs.


Cassandra's Architecture
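
- Write operation: appended to the Commit Log (disk) and applied to the MemTable (memory)
- Flush: a MemTable is written to disk as an SSTable
- Read operation: served from the MemTable and the SSTables
- Compaction: several SSTables are merged!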
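The interplay of commit log, MemTable, SSTables, and compaction can be sketched with a toy in-memory store. This is a strongly simplified illustration, not Cassandra's implementation: the class `ToyStore`, the flush threshold counted in keys, and the plain Python lists standing in for files on disk are all assumptions.

```python
class ToyStore:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only, stands in for the on-disk log
        self.memtable = {}    # in memory, holds the most recent writes
        self.sstables = []    # sorted, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted table and start a fresh one."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log = []  # flushed data no longer needs the log

    def read(self, key):
        """Check the memtable first, then the SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


store = ToyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.write(key, value)

print(store.read("a"), len(store.sstables))
store.compact()
print(store.sstables)
```

The sketch also hints at why reads can be more expensive than writes: a write touches the log and the memtable once, while a read may have to look into several SSTables until compaction has merged them.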

Cassandra's API

THRIFT-based API

Read operations:
- get: single column
- get_slice: range of columns
- multiget_slice: range of columns in several rows
- get_count: column count
- get_range_slice: several columns from range of rows
- get_indexed_slices: range of columns from index

Write operations:
- insert: single column
- batch_mutate: several columns in several rows
- remove: single column
- truncate: whole ColumnFamily

Other: login, describe_*, add/drop column family/keyspace (since 0.7.x)

Cassandra Clustering

- Fully equivalent nodes, no master node.
- Bootstrapping requires a seed node.

[Diagram: a query reaches the Storage Proxy of one node, which reads from and writes to the nodes according to the consistency level.]



Consistency Level and Replication Factor


Replication factor: On how many nodes is a piece of data stored?

Consistency level:
- ANY: A node has received the operation, even a HintedHandoff node.
- ONE: One node has completed the request.
- QUORUM: Operation has completed on a majority of nodes / newest result is returned.
- LOCAL_QUORUM: QUORUM in local data center
- GLOBAL_QUORUM: QUORUM in global data center
- ALL: Wait till all nodes have completed the request.
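How replication factor and consistency level interact can be made concrete with a little arithmetic. The sketch below covers only ONE, QUORUM, and ALL and leaves out ANY and the data-center-aware levels. The overlap condition W + R > replication factor is the usual rule of thumb for quorum systems and is an addition here, not something stated on the slide; all function names are made up.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must complete the request before it returns (toy model)."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True if every read set must overlap every write set: W + R > RF."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


print(quorum(3))
print(read_sees_latest_write("QUORUM", "QUORUM", 3))
print(read_sees_latest_write("ONE", "ONE", 3))
```

With a replication factor of 3, writing and reading at QUORUM touches 2 + 2 replicas, so at least one replica in every read has seen the latest write; at ONE/ONE this is not guaranteed.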



How to deal with failure

As long as the requirements of the consistency level can be met, everything is fine.

Hinted Handoff:
- A write operation for a faulty node is stored on another node and pushed to the faulty node once it is available again.
- Data won't be readable after write!

Read Repair:
- After a read operation has completed, data will be compared and updated on all nodes in the background.
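The idea behind read repair can be sketched as follows, assuming that each stored value carries a timestamp and that the newest timestamp wins. Replicas are plain dictionaries and the repair runs inline here, whereas Cassandra does it in the background; `read_with_repair` is a made-up name.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and push it to stale replicas.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    answers = [replica.get(key) for replica in replicas]
    known = [answer for answer in answers if answer is not None]
    if not known:
        return None
    newest = max(known, key=lambda pair: pair[0])  # highest timestamp wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


node_a = {"user:1": (10, "old name")}
node_b = {"user:1": (20, "new name")}
node_c = {}  # was unavailable during both writes

value = read_with_repair([node_a, node_b, node_c], "user:1")
print(value, node_a, node_c)
```

After the read, all three toy replicas hold the value with timestamp 20.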


Libraries

Python:
- Pycassa: http://github.com/pycassa/pycass
- Telephus: http://github.com/driftx/Telephus

Java:
- Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
- Hector: http://github.com/rantav/hector
- Kundera: http://code.google.com/p/kundera/
- Pelops: http://github.com/s7/scale7-pelops

Grails:
- grails-cassandra: https://github.com/wolpert/grails-cassandra

.NET:
- Aquiles: http://aquiles.codeplex.com/
- FluentCassandra: http://github.com/managedfusion/fluentcassandra

Ruby:
- Cassandra: http://github.com/fauna/cassandra

PHP:
- phpcassa: http://github.com/thobbs/phpcassa
- SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie

Or roll your own based on THRIFT: http://thrift.apache.org/ :)


TWIMPACT: An Application

- Real-time analysis of Twitter
- Trend analysis based on retweets
- Very high data rate (several million tweets per day, about 50 per second)


TWIMPACT: twimpact.jp


TWIMPACT: twimpact.com


Application Profile

- Information about tweets, users, and retweets
- Text matching for non-API retweets
- Retweet frequency and user impact

Operation profile:

Operation                 Fraction  Duration
get_slice (all)           50.1%     1.1ms
get                        6.0%     1.7ms
get_slice (range)          0.1%     0.8ms
batch_mutate (one row)    14.9%     0.9ms
insert                    21.5%     1.1ms
batch_mutate               6.8%     0.8ms
remove                     0.8%     1.2ms


Practical Experiences with Cassandra

- Very stable
- Read operations relatively expensive
- Multithreading leads to a huge performance increase
- Requires quite extensive tuning
- Clustering doesn't automatically lead to better performance
- Compaction leads to a performance decrease of up to 50%

Performance through Multithreading


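- Multithreading leads to much higher throughput
- How to achieve multithreading without locking support?

[Figure: performance plot with series labeled 2, 4, 8, 16, 32, and 64 (presumably thread counts), measured on a Core i7, 4 cores (2 + 2 HT)]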


Cassandra Tuning

Tuning opportunities:
- Size of memtables, thresholds for flushes
- Size of JVM heap
- Frequency and depth of compaction

Where?
- MemTableThresholds etc. in conf/cassandra.yaml
- JVM parameters in conf/cassandra-env.sh


Overview of JVM GC

- Young Generation (Eden, Survivors): up to a few hundred MB
- Old Generation: dozens of GBs (CMSInitiatingOccupancyFraction)
- Additional memory usage while GC is running

Cassandra's Memory Usage

[Plot: memory usage over time, with annotations for flushes, compaction, and "Memtables, indexes, etc."]

Size of Memtable: 128M, JVM Heap: 3G, #CF: 12
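A back-of-the-envelope calculation shows why these settings matter. It assumes, pessimistically, that every column family holds one full memtable at the same time and ignores any per-object overhead; the function name is made up, the numbers are the ones given above.

```python
def memtable_share(memtable_mb, column_families, heap_mb):
    """Fraction of the heap used if every column family holds a full memtable."""
    return memtable_mb * column_families / heap_mb


# 128 MB memtables, 12 column families, 3 GB heap
share = memtable_share(memtable_mb=128, column_families=12, heap_mb=3 * 1024)
print(f"{share:.0%} of the heap")
```

Under these assumptions, memtables alone can take up half of the heap before indexes and the additional memory needed during GC are counted.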

Cassandra's Memory Usage

Memtables may survive for a very long time (up to several hours):
- They are placed in the old generation.
- GC has to process several dozen GBs.
- Heap too small, GC triggered too late: GC storm

Trade-off:
- I/O load vs. memory usage

Do not neglect compaction!



The Effects of GC and Compactions

[Plot: performance over time, with annotations marking compaction and large GC runs]


Cluster vs Single Node

Our set-up:

- 1-node cluster with six-core CPU and RAID 5 with 6 hard disks
- 4-node cluster with six-core CPUs and RAID 0 with 2 hard disks each

Single node consistently performs 1.5-3 times better. Possible causes:

- Overhead through network communication, consistency levels, etc.
- Hard disk performance is significant
- Cluster still too small

Effectively available disk space:

- 1-node cluster: 6 * 500 GB = 3 TB, with RAID 5 = 2.5 TB (83%)
- 4-node cluster: 4 * 1 TB = 4 TB, with replication factor 2 = 2 TB (50%)
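The disk space figures can be reproduced with a few lines. The sketch assumes that RAID 5 costs one disk's worth of capacity for parity and that replication divides the raw cluster capacity by the replication factor; the function names are made up, the numbers are the ones from the set-up above.

```python
def raid5_usable(disks, disk_gb):
    """RAID 5 spends one disk's worth of space on parity."""
    return (disks - 1) * disk_gb


def replicated_usable(nodes, node_gb, replication_factor):
    """Every piece of data is stored replication_factor times."""
    return nodes * node_gb / replication_factor


single_raw = 6 * 500                     # 3000 GB raw
single = raid5_usable(6, 500)            # usable on the single node
cluster_raw = 4 * 1000                   # 4000 GB raw
cluster = replicated_usable(4, 1000, 2)  # usable on the cluster

print(f"single node: {single} GB ({single / single_raw:.0%})")
print(f"cluster:     {cluster:.0f} GB ({cluster / cluster_raw:.0%})")
```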



Alternatives

- MongoDB, CouchDB, redis, even memcached...
- Persistence: disk or RAM?
- Replication: master/slave or peer-to-peer?
- Sharding?
- Upcoming trend towards more complex query languages (JavaScript), map-reduce operations, etc.


Summary: Cassandra

- Platform which scales well
- Active user and developer community
- Read operations quite expensive
- For optimal performance, extensive tuning is necessary
- Depending on your application, eventual consistency and the lack of transactions/locking might be problematic.


Links

- Apache Cassandra: http://cassandra.apache.org
- Apache Cassandra Wiki: http://wiki.apache.org/cassandra/FrontPage
- DataStax documentation for Cassandra: http://www.datastax.com/docs/0.7/index
- My blog: http://blog.mikiobraun.de
- Twimpact: http://beta.twimpact.com
