
Introduction To NOSQL and Cassandra: @rantav @outbrain

The document provides an introduction to NoSQL and Cassandra. It discusses some of the challenges of modern web applications that have led to the development of NoSQL databases, such as large data sizes, high read/write rates, and frequent schema changes. It then summarizes Cassandra, describing it as a column-oriented distributed database modeled after Bigtable that provides eventual consistency. The document also covers Cassandra's data model and basic operations through its API.

Uploaded by

chrisjaure
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to NOSQL

And Cassandra
@rantav 
@outbrain
SQL is good

• Rich language
• Easy to use and integrate
• Rich toolset 
• Many vendors

• The promise: ACID


o Atomicity
o Consistency
o Isolation
o Durability
SQL Rules
BUT
HOWEVER...
The Challenge: Modern web apps

• Internet-scale data size


• High read-write rates
• Frequent schema changes

• "social" apps - not banks


o They don't need the same level of ACID

SCALING
Scaling Solutions - Replication

Scales Reads
Scaling Solutions - Sharding

Scales also Writes


Brewer's CAP Theorem: 

You can only choose two


CAP
Availability + Consistency (no Partition Tolerance)

• Single master SQL server

• Or - an array of SQLs
Consistency + Partition Tolerance (no Availability)
Availability + Partition Tolerance (no Consistency)
Consistency Levels

• Strong Consistency (RDBMS, Local Disk, RAM, ...)

• Weak Consistency - no guarantees

• Eventual Consistency (Cassandra, DNS, etc.)

o Causal consistency. A writes, then tells B "I wrote"; B's subsequent reads see A's write.
o Read-your-writes consistency (a special case of causal). A always sees its own writes.
o Monotonic read consistency. A reads x. In future reads, A will never read older values of x.
o Monotonic write consistency. Writes by the same process are serialized.
Existing NOSQL Solutions
 

Cassandra

• Developed at Facebook

• Follows the BigTable data model - column oriented

• Follows the Dynamo eventual consistency model

• Open-sourced at Apache

• Implemented in Java
CONSISTENCY DOWN TO EARTH
N/R/W

• N - Number of replicas (nodes) for any data item

• W - Number of nodes a write operation blocks on

• R - Number of nodes a read operation blocks on


N/R/W - Typical Values

• W=1 => Block until first node written successfully


• W=N => Block until all nodes written successfully
• W=0 => Async writes

• R=1 => Block until the first node returns an answer


• R=N => Block until all nodes return an answer
• R=0 => Doesn't make sense

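• QUORUM:
o R = N/2+1 (integer division)
o W = N/2+1 (integer division)
o => R + W > N, so every read overlaps the latest successful write: fully consistent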
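
The quorum arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names (QuorumMath, quorum, readsSeeLatestWrite) are made up for illustration and are not part of Cassandra's API.

```java
// Sketch only: QuorumMath and its methods are illustrative names, not Cassandra API.
final class QuorumMath {
    /** Quorum size for N replicas: floor(N/2) + 1. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** A read set and a write set must share at least one replica when R + W > N. */
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N=3 quorum=" + q + " overlap=" + readsSeeLatestWrite(n, q, q));
        System.out.println("N=3 R=1 W=1 overlap=" + readsSeeLatestWrite(n, 1, 1));
    }
}
```

With N=3, QUORUM reads and writes each block on 2 nodes, so the two sets always intersect; R=1 with W=1 gives no such guarantee.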
Data Model - Forget SQL

Do you know SQL?


Data Model - Vocabulary

• Keyspace – like namespace for unique keys.

• Column Family – very much like a table… but not quite.

• Key – a key that represents a row (of columns)

• Column – a value represented with:

o Column name
o Value
o Timestamp

• Super Column – a column that holds a list of columns


Data Model - Columns

struct Column { 
 1: binary   name, 
 2: binary   value, 
 3: i64      timestamp, 
}

JSON-ish notation:

{ "name":      "emailAddress", 
  "value":     "[email protected]", 
  "timestamp": 123456789 }
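
A plain Java mirror of the Column struct may help show what the timestamp is for. The sketch below assumes "last write wins" conflict resolution (the version with the higher timestamp is kept); ColumnSketch and reconcile are hypothetical names, not the Thrift-generated classes.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: ColumnSketch is a hypothetical stand-in for the Thrift Column struct.
final class ColumnSketch {
    final byte[] name;
    final byte[] value;
    final long timestamp;

    ColumnSketch(String name, String value, long timestamp) {
        this.name = name.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    String valueAsString() {
        return new String(value, StandardCharsets.UTF_8);
    }

    // Last write wins: keep the version with the higher timestamp.
    // Ties keep the first argument, an arbitrary choice made for this sketch.
    static ColumnSketch reconcile(ColumnSketch a, ColumnSketch b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        ColumnSketch older = new ColumnSketch("visits", "243", 123456789L);
        ColumnSketch newer = new ColumnSketch("visits", "244", 123456790L);
        System.out.println(reconcile(older, newer).valueAsString());
    }
}
```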
Data Model - Column Family

• Similar to SQL tables


• Has many columns
• Has many rows
Data Model - Rows

• Primary key for objects


• All keys are arbitrary length strings

 "Users": { 
    "ran":{ 
        {"name":"emailAddress", "value":"[email protected]"},
        {"name":"webSite", "value":"http://bar.com"} 
    }, 
    "f.rat":{ 
        {"name":"emailAddress", "value":"[email protected]"} 
    } 
 },
 "Stats":{ 
    "ran":{ 
        {"name":"visits", "value":"243"}
    }  
 }
Data Model - Short Notation

Users:                                 CF
    ran:                               ROW
        emailAddress: [email protected]      COLUMN
        webSite: http://bar.com        COLUMN
    f.rat:                             ROW
        emailAddress: [email protected]  COLUMN
Stats:                                 CF
    ran:                               ROW
        visits: 243                    COLUMN
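
One way to picture the short notation is as nested sorted maps: column family, then row key, then column name. The Java sketch below is a mental model only; DataModelSketch and its methods are hypothetical and say nothing about how Cassandra actually stores data.

```java
import java.util.Map;
import java.util.TreeMap;

// Mental model only: column family -> row key -> column name -> value.
final class DataModelSketch {
    final Map<String, Map<String, Map<String, String>>> columnFamilies = new TreeMap<>();

    void insert(String cf, String rowKey, String column, String value) {
        columnFamilies.computeIfAbsent(cf, k -> new TreeMap<>())
                      .computeIfAbsent(rowKey, k -> new TreeMap<>())
                      .put(column, value);
    }

    /** Returns the value, or null if the column family, row or column is missing. */
    String get(String cf, String rowKey, String column) {
        Map<String, Map<String, String>> rows = columnFamilies.get(cf);
        if (rows == null) {
            return null;
        }
        Map<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        DataModelSketch ks = new DataModelSketch();
        ks.insert("Users", "ran", "webSite", "http://bar.com");
        ks.insert("Stats", "ran", "visits", "243");
        System.out.println(ks.get("Stats", "ran", "visits"));
    }
}
```

Rows in the same column family need not share the same columns, which is why each row gets its own map.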
Data Model - Songs example

Songs: 
    Meir Ariel: 
        Shir Keev: 6:13, 
        Tikva: 4:11,
        Erol: 6:17
        Suetz: 5:30
        Dr Hitchakmut: 3:30
    Mashina:
        Rakevet Layla: 3:02
        Optikai: 5:40
Data Model - Super Columns

Songs: 
    Meir Ariel:
        Shirey Hag:
            Shir Keev: 6:13, 
            Tikva: 4:11,
            Erol: 6:17
        Vegluy Eynaim: 
            Suetz: 5:30
            Dr Hitchakmut: 3:30
    Mashina:
        ...
Data Model - Super Columns

• Columns whose values are lists of columns


The API

get
get_slice
multiget
multiget_slice
get_count
get_range_slice
get_range_slices
insert
remove
batch_insert
batch_mutate
The True API

get(keyspace, key, column_path, consistency)


get_slice(ks, key, column_parent, predicate, consistency)
multiget(ks, keys, column_path, consistency)
multiget_slice(ks, keys, column_parent, predicate, consistency)
...
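
To give a feel for what a slice call returns, here is a small Java sketch of slicing one row's sorted columns by a name range with a count limit. It is an in-memory imitation, not the Thrift API: SliceSketch and slice are hypothetical names, and the real predicate has more options than the three parameters assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory imitation of a slice over one row; not the Thrift get_slice call.
final class SliceSketch {
    /** Column names between start and finish (both inclusive), at most count of them. */
    static List<String> slice(NavigableMap<String, String> row,
                              String start, String finish, int count) {
        List<String> names = new ArrayList<>();
        for (String name : row.subMap(start, true, finish, true).keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("Shir Keev", "6:13");
        row.put("Tikva", "4:11");
        row.put("Erol", "6:17");
        row.put("Suetz", "5:30");
        row.put("Dr Hitchakmut", "3:30");
        System.out.println(slice(row, "E", "T", 10));
    }
}
```

Because columns are kept sorted by name, a slice is a contiguous range scan rather than a filter over the whole row.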
Consistency Model

• N - per keyspace
• R - per read request
• W - per write request
Consistency Model

Cassandra defines:

enum ConsistencyLevel {
    ZERO = 0,
    ONE = 1,
    QUORUM = 2,
    DCQUORUM = 3,
    ALL = 5,
}
Java Code

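// Imports assume the Cassandra 0.6 Thrift bindings (package org.apache.cassandra.thrift).
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Open a Thrift connection to a node on localhost.
TTransport tr = new TSocket("localhost", 9160); 
TProtocol proto = new TBinaryProtocol(tr); 
Cassandra.Client client = new Cassandra.Client(proto); 
tr.open(); 

String key_user_id = "1"; 

// The timestamp is used to decide which write is the most recent.
long timestamp = System.currentTimeMillis(); 

// Insert column "name" = "Chris Goffinet" into row "1" of column family Standard1.
client.insert("Keyspace1", 
              key_user_id, 
              new ColumnPath("Standard1", 
                             null,   // no super column
                             "name".getBytes("UTF-8")), 
              "Chris Goffinet".getBytes("UTF-8"),
              timestamp, 
              ConsistencyLevel.ONE); 

tr.close();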
Java Client - Hector
http://github.com/rantav/hector
• The de facto Java client for Cassandra

• Encapsulates Thrift
• Adds JMX (monitoring)
• Connection pooling
• Failover
• Open-sourced on GitHub, with a growing community of developers and users.
Java Client - Hector - cont

 /**
   * Insert a new value keyed by key
   *
   * @param key   Key for the value
   * @param value the String value to insert
   */
  public void insert(final String key, final String value) {
    Mutator m = createMutator(keyspaceOperator);
    m.insert(key, 
             CF_NAME, 
             createColumn(COLUMN_NAME, value));
  }
Java Client - Hector - cont

  /**
   * Get a string value.
   *
   * @return The string value; null if no value exists for the given key.
   */
  public String get(final String key) throws HectorException {
    ColumnQuery<String, String> q = createColumnQuery(keyspaceOperator, serializer, serializer);
    Result<HColumn<String, String>> r = q.setKey(key).
        setName(COLUMN_NAME).
        setColumnFamily(CF_NAME).
        execute();
    HColumn<String, String> c = r.get();
    return c == null ? null : c.getValue();
  }
Extra

If you're not snoring yet...


Sorting

Columns are sorted by name, according to the column family's comparator type


• BytesType 
• UTF8Type
• AsciiType
• LongType
• LexicalUUIDType
• TimeUUIDType

Rows are ordered according to the partitioner


• RandomPartitioner
• OrderPreservingPartitioner
• CollatingOrderPreservingPartitioner
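
The difference between the first two partitioners can be sketched in Java. RandomPartitioner derives a row's position from an MD5 hash of its key, so rows are spread evenly but lose their natural order; an order-preserving partitioner positions rows by the key itself. PartitionerSketch and its methods are hypothetical names, and details such as the key encoding are assumptions of this sketch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the two ordering ideas; not Cassandra's partitioner classes.
final class PartitionerSketch {
    /** Random-style token: the MD5 digest of the key as a non-negative integer. */
    static BigInteger randomToken(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the 128-bit value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Order-preserving style: rows are ordered by comparing the keys directly. */
    static int orderPreservingCompare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println("ran   -> " + randomToken("ran").toString(16));
        System.out.println("f.rat -> " + randomToken("f.rat").toString(16));
        System.out.println("f.rat before ran by key: " + (orderPreservingCompare("f.rat", "ran") < 0));
    }
}
```

Hashing gives good load balancing with no effort; preserving order allows range scans over row keys but can create hot spots.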
Thrift

Cross-language protocol
Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...

struct UserProfile { 
    1: i32    uid, 
    2: string name, 
    3: string blurb 
}

service UserStorage { 
    void         store(1: UserProfile user),
    UserProfile  retrieve(1: i32 uid) 
}
Thrift

Generating sources:

thrift --gen java cassandra.thrift

thrift --gen py cassandra.thrift


Internals
Required Reading ;-)

BigTable http://labs.google.com/papers/bigtable.html

Dynamo http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
From Dynamo:

• Symmetric p2p architecture
• Gossip based discovery and error detection
• Distributed key-value store
o Pluggable partitioning 
o Pluggable topology discovery
• Eventually consistent, tunable per operation
From BigTable

• Sparse, column-oriented array


• SSTable disk storage
o Append-only commit log
o Memtable (buffering and sorting)
o Immutable sstable files
o Compactions
o High write performance 
Architecture Layers

Cluster Management   Single Host   Consistency

Messaging service    Commit log    Tombstones
Gossip               Memtable      Hinted handoff
Failure detection    SSTable       Read repair
Cluster state        Indexes       Bootstrap
Partitioner          Compaction    Monitoring
Replication                        Admin tools
Gossip

• p2p
• Enables seamless addition of nodes.
• Rebalancing of keys
• Fast detection of nodes that go down.
• Every node knows about all others - no master.
Internals - Consistent Hashing
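
A minimal Java sketch of a consistent-hashing ring, under simplifying assumptions: tokens are plain long values chosen by hand, each node owns the range ending at its token, and the N replicas are simply the next N nodes clockwise. TokenRingSketch and its methods are hypothetical names, not Cassandra classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified ring: token -> node name, replicas are the next nodes clockwise.
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Nodes responsible for keyToken: the first node at or after it, then its successors. */
    List<String> replicasFor(long keyToken, int n) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        int wanted = Math.min(n, ring.size());
        while (result.size() < wanted) {
            result.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey(); // wrap around the ring
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(0L, "A");
        ring.addNode(100L, "B");
        ring.addNode(200L, "C");
        System.out.println(ring.replicasFor(150L, 2));
    }
}
```

Adding a node only moves the keys in the range just before its token, which is why nodes can join without reshuffling the whole cluster.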

 
Memtables

• In-memory representation of recently written data


• When the table is full, it's sorted and then flushed to disk -> sstable
SSTables

Sorted String Tables


• Immutable
• On-disk
• Sorted by a string key
• In-memory index of elements
• Binary search (in memory) to find element location
• Bloom filter to reduce number of unneeded binary searches.
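
The memtable-to-SSTable flow can be imitated in memory with a short Java sketch. Assumptions of this sketch: a "full" memtable means a fixed entry count, the memtable is kept sorted as it is written (a TreeMap) rather than sorted at flush time, and an SSTable is just an immutable sorted key list searched with binary search. Nothing here touches disk, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// In-memory imitation of memtable flushes and SSTable lookups.
final class MemtableSketch {
    /** Immutable, sorted snapshot of a flushed memtable. */
    static final class SSTable {
        final List<String> keys;
        final List<String> values;

        SSTable(TreeMap<String, String> sorted) {
            keys = Collections.unmodifiableList(new ArrayList<>(sorted.keySet()));
            values = Collections.unmodifiableList(new ArrayList<>(sorted.values()));
        }

        String get(String key) {
            int i = Collections.binarySearch(keys, key); // keys are sorted
            return i >= 0 ? values.get(i) : null;
        }
    }

    private final int limit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<SSTable> sstables = new ArrayList<>();

    MemtableSketch(int limit) {
        this.limit = limit;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= limit) {
            flush();
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(new SSTable(memtable));
        memtable = new TreeMap<>();
    }

    /** Checks the memtable first, then SSTables from newest to oldest. */
    String get(String key) {
        String value = memtable.get(key);
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);
        }
        return value;
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

Writes never modify an existing SSTable; a newer value simply shadows the older one until the tables are merged.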
Write Path

 
Write Path

 
Compactions

 
Write Properties

• No Locks in the critical path


• Always available to writes, even if there are failures.

• No reads
• No seeks 
• Fast 
• Atomic within ColumnFamily
Read Path
Reads
Read Properties

• Read multiple SSTables 


• Slower than writes (but still fast) 
• Seeks can be mitigated with more RAM
• Uses probabilistic bloom filters to reduce lookups.
Bloom Filters

• Space efficient probabilistic data structure


• Test whether an element is a member of a set
• Allows false positives, but not false negatives
• k hash functions
• Union and intersection are implemented as bitwise OR, AND
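
A minimal bloom filter in Java, to make the bullets above concrete. The hashing scheme (deriving k indexes from String.hashCode with double hashing) is an assumption made for brevity, not what Cassandra uses, and BloomFilterSketch is a hypothetical name.

```java
import java.util.BitSet;

// Minimal bloom filter: k bit positions per element, union via bitwise OR.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // i-th bit index for an element, using double hashing on hashCode().
    private int index(String element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive indexes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String element) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(element, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String element) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }

    /** Union of two filters with the same size and hash count: bitwise OR of the bits. */
    BloomFilterSketch union(BloomFilterSketch other) {
        if (size != other.size || hashCount != other.hashCount) {
            throw new IllegalArgumentException("filters must share size and hash count");
        }
        BloomFilterSketch result = new BloomFilterSketch(size, hashCount);
        result.bits.or(bits);
        result.bits.or(other.bits);
        return result;
    }
}
```

A lookup that returns false lets a read skip an SSTable entirely; a true answer still requires the real search, since it may be a false positive.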
Compactions

• Merge keys 
• Combine columns 
• Discard tombstones
• Merge bloom filters using the bitwise OR operation

• Large and Small compactions


Deletions

• Deletion marker (tombstone) necessary to suppress data in older SSTables, until compaction
• Read repair complicates things a little 
• Eventual consistency complicates things more 
• Solution: configurable delay before tombstone GC, after which tombstones are not repaired
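
How compaction and tombstones interact can be sketched in Java. Assumptions of this sketch: each table maps a key to a single cell, the cell with the higher timestamp wins, a tombstone is a cell with a null value, and tombstones older than a cutoff (gcBefore, standing in for the configurable delay) are dropped during the merge. All names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a two-table compaction with tombstone garbage collection.
final class CompactionSketch {
    static final class Cell {
        final String value; // null marks a tombstone
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges two tables; newer timestamps win, tombstones older than gcBefore are discarded. */
    static TreeMap<String, Cell> compact(Map<String, Cell> first,
                                         Map<String, Cell> second,
                                         long gcBefore) {
        TreeMap<String, Cell> merged = new TreeMap<>(first);
        for (Map.Entry<String, Cell> e : second.entrySet()) {
            Cell existing = merged.get(e.getKey());
            if (existing == null || e.getValue().timestamp > existing.timestamp) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        merged.values().removeIf(c -> c.isTombstone() && c.timestamp < gcBefore);
        return merged;
    }
}
```

Until the cutoff passes, the tombstone survives compaction so that replicas which missed the delete can still learn about it.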
Extra Long list of subjects

SEDA
anti entropy
hinted handoff
repair on read
timestamps -> vector clocks
consistent hashing
merkle trees
References

• http://horicky.blogspot.com/2009/11/nosql-patterns.html
• http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• http://labs.google.com/papers/bigtable.html
• https://nosqleast.com/2009/
• http://bret.appspot.com/entry/how-friendfeed-uses-mysql
• http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
• http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
• http://wiki.apache.org/cassandra/DataModel
• http://incubator.apache.org/thrift/
