Ado Lecture III 2024-26
Lecture III
MBA(DSDA) 2024-26, SCIT
BigTable
ADO
• BigTable
• Amazon DynamoDB
• HBase
• Cassandra
Revision of Last Lecture
ADO
Journey From
RDBMS to NoSQL (II)
Amazon DynamoDB
• Amazon DynamoDB is a key-value and document
database that is fully managed, multi-region, and auto-
scaling so that you don’t have to worry about the
infrastructure or datacenter.
• Dynamo paper: published by a group of Amazon.com
engineers in 2007, it described a new kind of database.
• DynamoDB is a fully managed NoSQL database released by
AWS in 2012, in the same period as the other main NoSQL
solutions such as Apache Cassandra (2008) and MongoDB (2009).
• Single-table design: put all your entities in the same
table.
Amazon DynamoDB
• Create primary keys for your items which are
composed of a partition key and sort key.
• You can use just the partition key as the primary
key, but for most cases, you will also want to
leverage a sort key.
• DynamoDB allows two kinds of primary keys:
– Simple primary keys, made of a single element called
partition key (PK).
– Composite primary keys, made of partition key (PK)
and sort key (SK).
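The two kinds of primary key can be sketched as plain data. This is a minimal illustration, not a call to AWS: the helper name `key_schema` and the attribute names `Artist` and `SongTitle` are assumptions chosen for the example. `HASH` and `RANGE` are the key type names DynamoDB uses for the partition key and the sort key in a table's key schema.

```python
def key_schema(partition_key, sort_key=None):
    """Build a DynamoDB-style KeySchema list (hypothetical helper).

    HASH marks the partition key (PK), RANGE marks the sort key (SK).
    """
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


# Simple primary key: partition key only.
simple = key_schema("Artist")

# Composite primary key: partition key plus sort key.
composite = key_schema("Artist", "SongTitle")

print(simple)
print(composite)
```

With a simple key, each partition key value identifies exactly one item. With a composite key, many items may share a partition key as long as their sort keys differ.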
Amazon DynamoDB
• Partitioning: horizontal scaling
{
    TableName: "Music",
    KeyConditionExpression: "Artist = :a and SongTitle = :t",
    ExpressionAttributeValues: {
        ":a": "No One You Know",
        ":t": "Call Me Today"
    }
}
Source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Query.html
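A rough sketch of why the partition key enables horizontal scaling: the key is hashed to pick one partition, so a query on `Artist` only touches that partition. Everything here is an in-memory toy under stated assumptions: `NUM_PARTITIONS`, `partition_for`, `put_item`, and `query` are hypothetical names, the sample items are illustrative, and MD5 is only a stand-in, since DynamoDB's actual hash function is internal.

```python
import hashlib

NUM_PARTITIONS = 4  # assumption: small fixed count, for illustration only


def partition_for(partition_key):
    """Map a partition key to a partition index with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


# One dict per partition: partition key -> {sort key: item}
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def put_item(item):
    """Store an item under its (Artist, SongTitle) composite key."""
    partition = partitions[partition_for(item["Artist"])]
    partition.setdefault(item["Artist"], {})[item["SongTitle"]] = item


def query(artist, song_title=None):
    """Look only at the single partition that owns `artist`."""
    songs = partitions[partition_for(artist)].get(artist, {})
    if song_title is not None:
        return [songs[song_title]] if song_title in songs else []
    # Without a sort key condition, return items ordered by sort key.
    return [songs[title] for title in sorted(songs)]


put_item({"Artist": "No One You Know", "SongTitle": "My Dog Spot"})
put_item({"Artist": "No One You Know", "SongTitle": "Call Me Today"})
put_item({"Artist": "The Acme Band", "SongTitle": "Still in Love"})

print(query("No One You Know", "Call Me Today"))
print([item["SongTitle"] for item in query("No One You Know")])
```

Adding partitions spreads the keys over more machines, which is the horizontal scaling the slide refers to. A real system also has to move data when the partition count changes, which this sketch ignores.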
Amazon DynamoDB
• Useful Links:
• https://medium.com/swlh/data-modeling-in-aws-dynamodb-dcec6798e955
• https://blog.theodo.com/2021/04/introduction-to-dynamo-db-modeling/
BigTable
• Bigtable + MapReduce -> HBase (on HDFS)
A column-oriented database serializes all of the values of a column together, then the
values of the next column, and so on.
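The difference between the two layouts can be shown on a tiny table. This is a minimal sketch: the sample rows and the function names `row_oriented` and `column_oriented` are made up for illustration, and a flat Python list stands in for bytes on disk.

```python
# Three rows with the columns (id, name, age); sample data only.
rows = [
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Carol", 41),
]


def row_oriented(rows):
    """Serialize all values of a row together, then the next row."""
    return [value for row in rows for value in row]


def column_oriented(rows):
    """Serialize all values of a column together, then the next column."""
    return [value for column in zip(*rows) for value in column]


print(row_oriented(rows))
# [1, 'Alice', 30, 2, 'Bob', 25, 3, 'Carol', 41]

print(column_oriented(rows))
# [1, 2, 3, 'Alice', 'Bob', 'Carol', 30, 25, 41]
```

In the column layout all ages sit next to each other, so an aggregate over one column reads a contiguous run of values instead of skipping through every row.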
HBase
• Revision
HBase
• HBase Blocks
Key:Value
Cell Example
More at:
HBase Query Example: put(), get(), scan() Command in HBase
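To make the key:value idea concrete, here is a toy in-memory table that mimics the behaviour of put, get, and scan. It is not the HBase client API: the class `ToyTable`, its method signatures, and the sample data are assumptions for illustration. It does follow the HBase model in which a cell is addressed by row key, column (family:qualifier), and timestamp.

```python
class ToyTable:
    """In-memory stand-in for an HBase-style table (not the real client)."""

    def __init__(self):
        # (row key, "family:qualifier") -> {timestamp: value}
        self._cells = {}

    def put(self, row, column, value, timestamp):
        """Write one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        """Return the newest version of one cell, or None if absent."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for start_row <= row < stop_row."""
        for row, column in sorted(self._cells):
            if start_row <= row < stop_row:
                yield row, column, self.get(row, column)


table = ToyTable()
table.put("row1", "info:name", "Alice", timestamp=1)
table.put("row1", "info:name", "Alicia", timestamp=2)  # newer version
table.put("row2", "info:name", "Bob", timestamp=1)

print(table.get("row1", "info:name"))
print(list(table.scan("row1", "row2")))
```

The get returns the latest version of the cell, and the scan walks rows in sorted key order with the stop row excluded.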
Cassandra
History
• Cassandra was developed at Facebook for
inbox search.
• It was open-sourced by Facebook in July 2008.
• Cassandra was accepted into Apache
Incubator in March 2009.
• It became an Apache top-level project in
February 2010.
Cassandra
Important Websites
• http://cassandra.apache.org/
• https://academy.datastax.com/planet-cassandra/what-is-apache-cassandra
• https://www.credera.com/blog/technology-insights/java/cassandra-explained-5-minutes-less/
Cassandra
Keyspace
Cassandra
Column Family
Cassandra
Column
Example of Sequence
CREATE SEQUENCE sequence_name
    START WITH initial_value
    INCREMENT BY increment_value
    MINVALUE minimum_value
    MAXVALUE maximum_value
    CYCLE | NOCYCLE;
Cassandra
• Although Cassandra is classed as a column-oriented
database, a type that stores its data in columns, it is
actually a partitioned data store: “partitioned” means
that the database uses a unique key for each row to
distribute the rows across multiple nodes. It stores
data in sparse hash tables: “sparse” means that not all
rows need to have the same columns.
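Both terms can be seen in a small sketch. The node names, the helpers `node_for`, `insert`, and `read`, and the sample rows are hypothetical. Hashing with MD5 modulo the node count is a simplification and not Cassandra's actual partitioner.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(row_key):
    """Pick the node that owns a row from a stable hash of its key."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Each node holds a hash table: row key -> {column name: value}
storage = {node: {} for node in NODES}


def insert(row_key, **columns):
    """Partitioned: the row key alone decides which node stores the row."""
    storage[node_for(row_key)].setdefault(row_key, {}).update(columns)


def read(row_key):
    return storage[node_for(row_key)].get(row_key)


# Sparse: the two rows do not share the same set of columns.
insert("user1", name="Asha", email="asha@example.com")
insert("user2", name="Ravi", city="Pune")

print(node_for("user1"), read("user1"))
print(node_for("user2"), read("user2"))
```

No storage is spent on the columns a row lacks: `user2` simply has no `email` entry rather than an empty one.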
Cassandra
• Cassandra handles various types of data, such as
structured, semi-structured, and unstructured
data.
• Cassandra is fully distributed
• Cassandra is designed as a decentralized database,
meaning that all nodes are the same; there’s no
concept of master/slave nodes. This also means
that there is no single point of failure since there
are no special hosts, and the cluster continues
operations regardless of node failures.
Cassandra
• Cassandra uses an efficient log–structured
engine that turns updates into sequential I/O.
Cassandra
• It is scalable, fault-tolerant, and tunably consistent.
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its
data model on Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database
management systems.
• Cassandra implements a Dynamo-style replication model with
no single point of failure, but adds a more powerful “column
family” data model.
• Cassandra is used by some of the biggest companies, such
as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix,
and more.
Cassandra Features
• Elastic scalability − Cassandra is highly scalable; it
allows you to add more hardware to accommodate more
customers and more data as required.
• Always on architecture − Cassandra has no single
point of failure and it is continuously available for
business-critical applications that cannot afford a
failure.
• Fast linear-scale performance − Cassandra is linearly
scalable, i.e., it increases your throughput as you
increase the number of nodes in the cluster.
Therefore it maintains a quick response time.
Cassandra Features
• Flexible data storage − Cassandra accommodates all possible
data formats including: structured, semi-structured, and
unstructured. It can dynamically accommodate changes to
your data structures according to your need.
• Easy data distribution − Cassandra provides the flexibility to
distribute data where you need by replicating data across
multiple data centers.
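• Transaction support − Cassandra does not offer full ACID
(Atomicity, Consistency, Isolation, Durability) transactions as
an RDBMS does; writes are atomic, isolated, and durable at the
level of a single partition, and consistency is tunable per
operation.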
• Fast writes − Cassandra was designed to run on cheap
commodity hardware. It performs blazingly fast writes and
can store hundreds of terabytes of data, without sacrificing
the read efficiency.
Cassandra Architecture
• Node − The place where data is stored. It is the basic
component of Cassandra.
• Data center − A collection of related nodes.
• Cluster − A collection of one or more data centers.
• Commit log − Every write operation is written to the commit
log, which is used for crash recovery.
• Mem-table − After data is written to the commit log, it is
written to the mem-table, where it is held temporarily in memory.
• SSTable − When the mem-table reaches a certain threshold, its
data is flushed to an SSTable file on disk.
Cassandra Architecture
• Cassandra places replicas of data on different nodes based on
two factors:
• The Replication Strategy determines where the next replica is
placed.
• The Replication Factor determines the total number of replicas
placed on different nodes.
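The interplay of the two factors can be sketched with a toy ring. This follows the idea behind the simple strategy: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise. The names `RING_SIZE`, `token`, `build_ring`, and `replicas_for`, the node names, and the use of MD5 with one token per node are all assumptions for illustration, not Cassandra's implementation.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32  # hypothetical token space


def token(value):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def build_ring(nodes):
    """Return (token, node) pairs sorted by token, one token per node."""
    return sorted((token(node), node) for node in nodes)


def replicas_for(key, ring, replication_factor):
    """Replication factor: how many copies. Strategy: which nodes get them."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
placement = replicas_for("user42", ring, replication_factor=3)
print(placement)
```

A topology-aware strategy would change only the selection rule inside `replicas_for`, for example by skipping nodes in a rack or data center that already holds a copy.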
Cassandra Architecture
Keyspace
The basic attributes of a Keyspace in Cassandra
are −
• Replication factor − It is the number of
machines in the cluster that will receive copies of
the same data.
• Replica placement strategy − The strategy used to
place replicas in the ring. The strategies are
simple strategy (rack-unaware strategy), old
network topology strategy (rack-aware strategy),
and network topology strategy (datacenter-shard
strategy).
Cassandra Architecture
Keyspace
The basic attributes of a Keyspace in Cassandra
are −
• Column families − Keyspace is a container for a
list of one or more column families. A column
family, in turn, is a container of a collection of
rows. Each row contains ordered columns.
Column families represent the structure of your
data. Each keyspace has at least one and often
many column families.
Cassandra Architecture
Keyspace
CREATE KEYSPACE keyspace_name
WITH replication = {'class': 'SimpleStrategy',
                    'replication_factor': 3};
Cassandra Architecture
• Replication Strategy
• SimpleStrategy
• NetworkTopologyStrategy
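The two strategies differ in the replication options they take. The sketch below only builds the CQL text and does not connect to a cluster. The helper `create_keyspace_cql`, the keyspace names, and the data center names `dc1` and `dc2` are hypothetical. With NetworkTopologyStrategy the replication factor is given per data center.

```python
def create_keyspace_cql(name, strategy, **options):
    """Return a CREATE KEYSPACE statement as a string (nothing is executed)."""
    replication = {"class": strategy, **options}
    body = ", ".join(f"'{key}': {value!r}" for key, value in replication.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


# SimpleStrategy: one replication factor for the whole cluster.
simple_cql = create_keyspace_cql("demo", "SimpleStrategy", replication_factor=3)

# NetworkTopologyStrategy: one replica count per data center.
multi_dc_cql = create_keyspace_cql("demo_multi", "NetworkTopologyStrategy", dc1=3, dc2=2)

print(simple_cql)
print(multi_dc_cql)
```

The data center names in the second statement must match the names the cluster itself reports, so `dc1` and `dc2` are placeholders only.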
Cassandra Architecture
• Write Operation
1. When a write request reaches a node, it is first recorded in
the commit log.
2. Cassandra then writes the data to the mem-table. Every write
held in the mem-table is therefore also recorded in the commit
log. The mem-table holds data temporarily in memory, while the
commit log keeps a durable record of writes for crash
recovery.
3. When the mem-table is full, its data is flushed to an SSTable
data file.
SSTable stands for Sorted Strings Table.
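The three steps of the write operation can be sketched with a toy node. The class `ToyNode`, its attribute names, and the tiny `memtable_limit` are assumptions for illustration. Python lists and dicts stand in for the on-disk commit log and SSTable files.

```python
class ToyNode:
    """Toy model of the write path: commit log -> mem-table -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # sorted, flushed snapshots of the mem-table
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))           # step 1: log the write
        self.memtable[key] = value                     # step 2: update mem-table
        if len(self.memtable) >= self.memtable_limit:  # step 3: flush when full
            self.flush()

    def flush(self):
        """Write the mem-table out sorted by key, then start a fresh one."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)   # mem-table is now full, so it is flushed
node.write("c", 3)

print(node.commit_log)
print(node.sstables)
print(node.memtable)
```

Both steps 1 and 2 are cheap appends or in-memory updates, and the flush writes one sorted file in a single pass, which is why the write path turns updates into sequential I/O.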
Cassandra
Native Data Types
BigTable- Case Study
Moving from Cassandra to Auto-Scaling Bigtable at
Spotify (Cloud Next '19)
https://www.youtube.com/watch?v=Hfd3VZOYXNU&autoplay=1
Cassandra
Downloading Cassandra:
https://cassandra.apache.org/_/download.html
Installing Cassandra on Windows
https://phoenixnap.com/kb/install-cassandra-on-windows
https://www.javatpoint.com/cassandra-setup-and-installation
Installing Cassandra on Linux
https://www.hostinger.in/tutorials/set-up-and-install-cassandra-ubuntu/
MongoDB
Installation:
• Community Server 8.0.3
https://www.mongodb.com/try/download/community-edition
Tools
• MongoDB Shell
• MongoDB Compass
• MongoDB Database Tools
MongoDB
Each Row (in Column Store)
MongoDB
Mind Mapping
MongoDB
Documents
• The document model is a superset of other data models, including
key-value pairs, relational, objects, graph, and geospatial.
• Key-value pairs can be modeled with fields and values in a
document. Any field in a document can be indexed, providing
developers with additional flexibility in how to query the data.
• Relational data can be modeled differently (and some would argue
more intuitively) by keeping related data together in a single
document using embedded documents and arrays. Related data
can also be stored in separate documents, and database
references can be used to connect the related data.
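The two ways of modeling related data can be compared side by side. Plain Python dicts stand in for documents here. The field names, the sample values, and the helper `resolve_customer` are made up for illustration, and no MongoDB driver is involved.

```python
# Embedded: the related data lives inside one document,
# using an embedded document ("customer") and an array ("items").
order_embedded = {
    "_id": 1,
    "customer": {"name": "Asha", "city": "Pune"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Referenced: the customer is a separate document, and the order
# stores only a reference (the customer's _id).
customers = {101: {"_id": 101, "name": "Asha", "city": "Pune"}}
order_referenced = {
    "_id": 2,
    "customer_id": 101,
    "items": [{"sku": "A1", "qty": 2}],
}


def resolve_customer(order, customers):
    """Follow the reference from an order to its customer document."""
    return customers[order["customer_id"]]


print(order_embedded["customer"]["city"])
print(resolve_customer(order_referenced, customers)["city"])
```

Embedding answers the query with one document read, while referencing avoids duplicating the customer in every order at the cost of a second lookup.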
MongoDB
Documents
• Documents map to objects in most popular
programming languages.
• Graph nodes and/or edges can be modeled as
documents. Edges can also be modeled
through database references. Graph queries can be run
using operations like $graphLookup.
• Geospatial data can be modeled as arrays in documents.
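A small sketch of the graph and geospatial cases. The `$graphLookup` stage is written as a Python dict using the stage's documented fields (`from`, `startWith`, `connectFromField`, `connectToField`, `as`). The collection name `employees`, the sample documents, and the function `reporting_hierarchy` are assumptions. That function only emulates the traversal in memory so the idea can be run without a server.

```python
# Graph edges modeled as references between documents (reportsTo -> name).
employees = [
    {"_id": 1, "name": "Dev"},
    {"_id": 2, "name": "Eliot", "reportsTo": "Dev"},
    {"_id": 3, "name": "Ron", "reportsTo": "Eliot"},
]

# Aggregation stage that would ask MongoDB to follow those edges.
graph_lookup_stage = {
    "$graphLookup": {
        "from": "employees",
        "startWith": "$reportsTo",
        "connectFromField": "reportsTo",
        "connectToField": "name",
        "as": "reportingHierarchy",
    }
}


def reporting_hierarchy(employee, collection):
    """In-memory emulation: repeatedly follow reportsTo to matching names."""
    by_name = {doc["name"]: doc for doc in collection}
    found, seen = [], set()
    frontier = [employee.get("reportsTo")]
    while frontier:
        name = frontier.pop()
        if name is None or name in seen or name not in by_name:
            continue
        seen.add(name)
        found.append(by_name[name])
        frontier.append(by_name[name].get("reportsTo"))
    return found


chain = [doc["name"] for doc in reporting_hierarchy(employees[2], employees)]
print(chain)

# Geospatial data as an array inside a document: GeoJSON lists
# longitude first, then latitude.
place = {"name": "Pune", "location": {"type": "Point", "coordinates": [73.8567, 18.5204]}}
print(place["location"]["coordinates"])
```

The emulation returns the managers in the order they are reached. MongoDB itself does not guarantee an order for the documents in the output array of `$graphLookup`.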