NoSQL Database

The document discusses the benefits and limitations of relational databases compared to NoSQL databases, highlighting that while relational databases are designed for structured data and provide ACID properties, they struggle with scalability and distributed applications. NoSQL databases offer flexibility, horizontal scaling, and schema-less design, making them suitable for handling large volumes of diverse data. The CAP theorem is introduced, emphasizing the trade-offs between consistency, availability, and partition tolerance in distributed systems.


Relational databases 5

• Benefits of Relational databases:

🡺 Designed for OLTP
🡺 ACID properties
🡺 Strong consistency, concurrency, recovery
🡺 Mathematical background
🡺 Standard query language (SQL)
🡺 Lots of tools to use with them, e.g. reporting services, entity frameworks, ...
NoSQL why, what and when? 8

But...
❑ Relational databases were not built for distributed applications.

Because...
❑ Joins are expensive
❑ Hard to scale horizontally (adding more machines)
❑ Object-relational impedance mismatch occurs
❑ Expensive (product cost, hardware, maintenance)
NoSQL why, what and when? 9

And...
They are weak in:
❑ Speed (performance)
❑ High availability
❑ Partition tolerance
Why NoSQL now? Driving Trends 11

What is NoSQL? 13

❑ A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained models than traditional relational databases.

❑ NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact allow SQL-like query languages to be used.
Motivations of NoSQL databases 14

o Simplicity of design
o Simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases)
o Finer control over availability: servers can be added or removed without application downtime
o Limiting the object-relational impedance mismatch
Characteristics of NoSQL databases 14

NoSQL avoids:
▶ Overhead of ACID transactions
▶ Complexity of SQL queries
▶ Burden of up-front schema design
▶ DBA presence
▶ Transactions (they should be handled at the application layer)

Provides:
▶ Easy and frequent changes to the DB
▶ Fast development
▶ Large data volumes (e.g. Google)
▶ Schema-less design
What do we need? 26

• We need a distributed database system with the following features:
– Fault tolerance
– High availability
– Consistency
– Scalability

Which is impossible!!!
According to the CAP theorem
CAP Theorem
■ Three properties of a system:
❑ Consistency (all copies have the same value)
❑ Availability (system can run even if parts have failed)
❑ Via replication
❑ Partitions (network can break into two or more parts, each with active systems that can't talk to the other parts)
■ Brewer's CAP "Theorem": you can have at most two of these three properties for any system
■ Very large systems will partition at some point:
❑ 🡺 Choose one of consistency or availability
❑ Traditional databases choose consistency
❑ Most Web applications choose availability
■ Except for specific parts such as order processing
Availability

■ Traditionally thought of as the server/process being available "five nines" of the time (99.999%).
■ However, for a large node system, at almost any point in time there is a good chance that a node is either down or there is a network disruption among the nodes.
❑ We want a system that is resilient in the face of network disruption
Eventual Consistency

■ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
■ For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
■ Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
❑ Soft state: copies of a data item may be inconsistent
❑ Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
CAP theorem 27

We cannot achieve all three properties at once in distributed database systems.
NoSQL when? 10

o To handle a huge volume of structured, semi-structured and unstructured data.
o Where there is a need to follow modern software development practices like Agile/Scrum, and if you need to deliver prototypes or applications fast.
o If you prefer object-oriented programming.
o If your relational database is not capable of scaling up to your traffic at an acceptable cost.
o If you want an efficient, scale-out architecture in place of an expensive, monolithic architecture.
o If you have local data transactions that need not be very durable.
o If you are going with schema-less data and want to include new fields without any ceremony.
o When your priority is easy scalability and availability.
NoSQL when not? 10

o If you are required to perform complex and dynamic querying and reporting, then you should avoid NoSQL, as it has limited query functionality. For such requirements, prefer SQL.
o NoSQL also lacks the ability to perform dynamic operations and cannot guarantee ACID properties. For cases like financial transactions, you may go with SQL databases.
o You should also avoid NoSQL if your application needs run-time flexibility.
o If consistency is a must and there are not going to be any large-scale changes in data volume, then going with a SQL database is the better option.
NoSQL is getting more & more popular 15
What is a schema-less data model? 16

In relational databases:

▶ You can't add a record which does not fit the schema
▶ You need to add NULLs to unused items in a row
▶ You have to consider the datatypes, i.e. you can't add a string to an integer field
▶ You can't add multiple items in a field (you would have to create another table: primary key, foreign key, joins, normalization, ... !!!)
What is a schema-less data model? 17

In NoSQL databases:

▶ There is no schema to consider
▶ There are no unused cells
▶ There are no explicit datatypes (types are implicit)
▶ Most of these considerations are handled in the application layer
▶ We gather all items in an aggregate (document)
Aggregate Data Models 18

NoSQL databases are classified into four major data models:

• Key-value
• Document
• Column family (or wide column)
• Graph

Each DB has its own query language.


Aggregate Data Models 18

Column family: Azure Cosmos DB, Accumulo, Cassandra, Scylla, HBase

Document: Azure Cosmos DB, Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB

Key-value: Azure Cosmos DB, Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm

Graph: Azure Cosmos DB, AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4j, AgensGraph, OrientDB, Virtuoso
Key-value data model 19

🡺 Simplest NoSQL databases
🡺 The main idea is the use of a hash table
🡺 Access data (values) by strings called keys
🡺 Data has no required format: data may have any format
🡺 Data model: (key, value) pairs
🡺 Basic operations: Insert(key, value), Fetch(key), Update(key), Delete(key) (see the sketch below)
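As a rough illustration (not tied to any particular product), the four basic operations map directly onto a hash table. The sketch below is a minimal in-memory key-value store in Java; the class and method names are made up to mirror the operation list above.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory key-value store sketching the four basic operations.
// Real key-value databases (Redis, Riak, ...) add persistence, replication
// and expiry, but the access pattern is the same: everything goes by key.
public class SimpleKeyValueStore {
    private final Map<String, byte[]> table = new HashMap<>();

    public void insert(String key, byte[] value) { table.put(key, value); }

    public byte[] fetch(String key) { return table.get(key); }   // null if absent

    public void update(String key, byte[] value) { table.put(key, value); }

    public void delete(String key) { table.remove(key); }
}
```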
Column family data model 20

🡺 Based on Google's Bigtable
🡺 The column is the lowest/smallest unit of data
🡺 The names and format of the columns can vary from row to row in the same table
🡺 Each column family typically contains multiple columns that are used together
🡺 Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately
Column family data model 20

🡺 A wide-column store can be interpreted as a two-dimensional key-value store (see the sketch below)
🡺 A cell is a tuple that contains a name, a value and a timestamp
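A minimal sketch of that two-dimensional view, using nothing beyond the standard Java library: row key -> column name -> timestamped versions. The class and method names are illustrative, not any vendor's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// A wide-column row: row key -> column qualifier -> (timestamp -> value).
// This mirrors the (name, value, timestamp) tuple mentioned above; real
// stores such as Cassandra or HBase persist and distribute this structure.
public class WideColumnSketch {
    // row key -> column -> versions ordered by timestamp
    private final Map<String, Map<String, TreeMap<Long, String>>> rows = new HashMap<>();

    public void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Latest version of a cell, or null if the row/column does not exist.
    public String getLatest(String rowKey, String column) {
        Map<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) return null;
        return row.get(column).lastEntry().getValue();
    }
}
```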
Column family data model 21

Some statistics about Facebook Search (using Cassandra):

❖ MySQL, > 50 GB of data:
🡺 Writes average: ~300 ms
🡺 Reads average: ~350 ms

❖ Rewritten with Cassandra, > 50 GB of data:
🡺 Writes average: 0.12 ms
🡺 Reads average: 15 ms
Graph data model 22

🡺 Similar to the network data model at a high level of abstraction
🡺 Based on graph theory
🡺 You can use graph algorithms easily
🡺 Graph query languages (GQL): Gremlin, Cypher, SPARQL
🡺 The underlying storage mechanism of graph databases can vary: relational, key-value store or document-oriented database (see the sketch below)
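As a toy illustration only (real graph databases add indexing, persistence and query languages such as Gremlin or Cypher), a property graph can be sketched as labeled nodes plus typed edges; all names below are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tiny property-graph sketch: nodes with properties, connected by typed edges.
// Graph databases optimise traversals over exactly this kind of structure.
public class GraphSketch {
    record Edge(String type, String to) {}

    private final Map<String, Map<String, String>> nodeProps = new HashMap<>();
    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    public void addNode(String id, Map<String, String> props) {
        nodeProps.put(id, props);
        adjacency.putIfAbsent(id, new ArrayList<>());
    }

    public void addEdge(String from, String type, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(type, to));
    }

    // One-hop traversal: ids of neighbours reached over edges of a given type.
    public List<String> neighbours(String id, String type) {
        List<String> out = new ArrayList<>();
        for (Edge e : adjacency.getOrDefault(id, List.of())) {
            if (e.type().equals(type)) out.add(e.to());
        }
        return out;
    }
}
```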
Document based data model 23

• The central concept of a document-oriented database is the notion of a document
• Documents in a document store are roughly equivalent to the programming concept of an object
• While each document-oriented database implementation differs on the details of this definition, in general they all assume documents encapsulate and encode data (or information) in some standard format or encoding
• Encodings in use include XML, YAML and JSON, as well as binary forms like BSON
• Different types of documents are allowed in a single store
• Documents are addressed in the database via a unique key that represents that document. This key is a simple identifier (or ID), typically a string, a URI, or a path
Document based data model 23

• Each key is paired with a complex data structure known as a document (see the example below)
• Documents can contain many different key-value pairs, key-array pairs, or even nested documents
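For illustration, a hypothetical user document expressed as plain Java maps; in a real document store this would be serialised as JSON/BSON and addressed by its unique key. All field names are invented.

```java
import java.util.List;
import java.util.Map;

// A document is just nested key-value data; field names here are made up.
public class DocumentExample {
    public static void main(String[] args) {
        Map<String, Object> document = Map.of(
            "_id", "user:42",                       // the document's key
            "name", "Alice",
            "emails", List.of("alice@example.com"), // key-array pair
            "address", Map.of(                      // nested document
                "city", "Minsk",
                "country", "Belarus"
            )
        );
        System.out.println(document);
    }
}
```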
SQL vs NoSQL 25
Common Advantages of NoSQL Systems

■ Cheap, easy to implement (open source)
■ Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
❑ When data is written, the latest version is on at least one node and then replicated to other nodes
❑ No single point of failure
■ Easy to distribute
■ Don't require a schema
What does NoSQL Not Provide?
■ Joins
■ Group by
❑ But PNUTS (a massively parallel and geographically distributed database system for Yahoo!'s web applications) provides a materialized-view approach to joins/aggregation
■ ACID transactions
■ SQL
■ Integration with applications that are based on SQL
What: HBase is...
An open-source, non-relational, distributed column-family database modeled after Google's Bigtable.

Think of it as a sparse, consistent, distributed, multidimensional, sorted map: labeled tables of rows, where a row consists of key-value cells:

(row key, column family, column, timestamp) -> value

HBase
Random, real-time read/write access to Big Data.
The goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HDFS vs HBase
HBase
Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API as well as through the REST, Avro or Thrift gateway APIs.

HBase runs on top of HDFS and is well-suited for fast read and write operations on large datasets with high throughput and low input/output latency.
Phoenix
HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase.
Apache Phoenix is an open-source, massively parallel, relational database engine supporting OLTP for Hadoop, using Apache HBase as its backing store.
Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL.
Phoenix compiles queries and other statements into native NoSQL store APIs (see the sketch below).
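A minimal sketch of that JDBC usage, assuming the Phoenix client jar is on the classpath and a ZooKeeper quorum on localhost:2181; the table and columns are made up. Note that Phoenix uses UPSERT rather than INSERT.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Querying HBase through Phoenix with plain JDBC. Phoenix compiles the SQL
// below into native HBase operations.
public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // Connection string format: jdbc:phoenix:<zookeeper quorum>
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {

            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS CITY " +
                               "(ID VARCHAR PRIMARY KEY, COUNTRY VARCHAR, POPULATION BIGINT)");
            stmt.executeUpdate("UPSERT INTO CITY VALUES ('Minsk', 'Belarus', 1937000)");
            conn.commit();  // Phoenix connections are not auto-commit by default

            try (ResultSet rs = stmt.executeQuery("SELECT ID, POPULATION FROM CITY")) {
                while (rs.next()) {
                    System.out.println(rs.getString("ID") + " -> " + rs.getLong("POPULATION"));
                }
            }
        }
    }
}
```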
Usage
HBase now serves several data-driven websites.
Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018 (to MyRocks).
Twitter runs HBase across its entire Hadoop cluster.
HP IceWall SSO is a web-based single sign-on solution that uses HBase to store user data for authenticating users.
Adobe currently has about 30 nodes running HDFS, Hadoop and HBase in clusters ranging from 5 to 14 nodes, on both production and development.

Powered By Apache HBase: http://hbase.apache.org/poweredbyhbase.html
Enterprises that use HBase
What: Part of the Hadoop ecosystem

Provides real-time random read/write access to data stored in HDFS.

[Diagram: consumers read data from HBase, producers write data to HBase, and HBase writes to HDFS]
Hive vs. HBase
o Unlike Hive, HBase operations run in real time on its database rather than as MapReduce jobs
o Apache Hive is a data warehouse system built on top of Hadoop. Apache HBase is a NoSQL key/value store on top of HDFS
o Apache Hive provides SQL features over Spark/Hadoop data. HBase can store or process Hadoop data with near real-time read/write needs
o Hive should be used for analytical querying of data collected over a period of time. HBase is primarily used to store and process unstructured Hadoop data
o HBase is perfect for real-time querying of Big Data. Hive should not be used for real-time querying
What: Features-1

Linear scalability, capable of storing hundreds of terabytes of data

Automatic and configurable sharding of tables

Automatic failover support

Strictly consistent reads and writes


What: Features-2
Integrates nicely with Hadoop MapReduce (both as source and destination)

Easy Java API for client access

Thrift gateway and REST APIs

Bulk import of large amounts of data

Replication across clusters & backup options

Block cache and Bloom filters for real-time queries
How to use HBase?
HBase Table
How: the Data

Row keys are uninterpreted byte arrays.
Columns are grouped into column families (CFs); CFs are defined statically upon table creation.
A cell is an uninterpreted byte array plus a timestamp.
Rows are ordered and accessed by row key; different kinds of data are separated into different CFs, and all values are stored as byte arrays.
Rows can have different columns, a cell can have multiple versions, and data can be very "sparse".

Row Key         Data
Minsk           geo:{'country':'Belarus', 'region':'Minsk'}
                demography:{'population':'1,937,000'@ts=2011}
New_York_City   geo:{'country':'USA', 'state':'NY'}
                demography:{'population':'8,175,133'@ts=2010, 'population':'8,244,910'@ts=2011}
Suva            geo:{'country':'Fiji'}
How: Writing the Data
Row updates are atomic.

Updates across multiple rows are NOT atomic; there is no transaction support out of the box.

HBase stores N versions of a cell (default 3).

Tables are usually "sparse": not all columns are populated in every row (a Java client sketch follows).
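A sketch of a single-row write with the HBase Java client, assuming an HBase 2.x client on the classpath and an already-created table 'cities' with a column family 'geo' (both names are made up); the single Put below is atomic, per the rule above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Writing one row: all cells in a single Put are applied atomically.
public class HBaseWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("cities"))) {

            Put put = new Put(Bytes.toBytes("Minsk"));                     // row key
            put.addColumn(Bytes.toBytes("geo"), Bytes.toBytes("country"),  // cf:qualifier
                          Bytes.toBytes("Belarus"));
            put.addColumn(Bytes.toBytes("geo"), Bytes.toBytes("region"),
                          Bytes.toBytes("Minsk"));
            table.put(put);   // atomic for this row only
        }
    }
}
```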
How: Reading the Data
A reader will always read the last written (and committed) values.

Reading a single row: Get
Reading multiple rows: Scan (very fast)

A Scan usually defines a start key and a stop key.
Rows are ordered, so it is easy to do a partial key scan (see the sketch below).
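A sketch of Get and Scan with the HBase 2.x Java client, reusing the hypothetical 'cities' table from the write example; the start/stop keys simply bound a partial key scan.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Reading: Get for a single row, Scan with start/stop keys for a range.
public class HBaseReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("cities"))) {

            // Single row by key.
            Result row = table.get(new Get(Bytes.toBytes("Minsk")));
            System.out.println(Bytes.toString(
                row.getValue(Bytes.toBytes("geo"), Bytes.toBytes("country"))));

            // Range scan: rows are sorted by key, so [startRow, stopRow) is cheap.
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("M"))
                .withStopRow(Bytes.toBytes("N"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```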


How: MapReduce Integration
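The original slide shows only a diagram. As a hedged sketch of the integration, the HBase mapreduce utilities can wire a table in as MapReduce input roughly as follows (HBase 2.x client assumed; the mapper and table name are made up).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Using an HBase table as MapReduce input: each region becomes an input
// split, and the mapper receives (row key, Result) pairs.
public class HBaseMapReduceSketch {

    // Hypothetical mapper that emits one count per row.
    static class RowCountMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text("rows"), new IntWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-row-count");
        job.setJarByClass(HBaseMapReduceSketch.class);

        TableMapReduceUtil.initTableMapperJob(
            "cities",              // input table (assumed to exist)
            new Scan(),            // full-table scan
            RowCountMapper.class,
            Text.class,            // mapper output key
            IntWritable.class,     // mapper output value
            job);

        job.setNumReduceTasks(0);                        // map-only sketch
        job.setOutputFormatClass(NullOutputFormat.class); // discard output for brevity
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```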
How: Sharding the Data
Automatic and configurable sharding of tables:

Tables are partitioned into Regions.
A Region is defined by start & end row keys.
Regions are the "atoms" of distribution.
Regions are assigned to RegionServers (HBase cluster slaves).
How: Setup: Components

[Diagram: HBase components -- client, ZooKeeper, ...]
How: Setup: Hadoop Cluster
Typical Hadoop+HBase setup:

Master Node: HDFS NameNode, MapReduce JobTracker, HBase HMaster
Slave Nodes (each): HDFS DataNode, MapReduce TaskTracker, HBase RegionServer


How: Setup: Automatic Failover
When to Use HBase?
When: What HBase is good at

Serving large amounts of data: built to scale from the get-go, with fast random access to the data

Write-heavy applications*

Append-style writing (inserting/overwriting new data) rather than heavy read-modify-write operations
When: HBase vs ...
General COMMANDS
• status: Provides the status of HBase,
for example, the number of servers.
• version: Provides the version of HBase
being used.
• table_help: Provides help for
table-reference commands.
• whoami: Provides information about the
user.
HBase DDL commands
• create: Creates a table.
• list: Lists all the tables in HBase.
• disable: Disables a table.
• is_disabled: Verifies whether a table
is disabled.
• enable: Enables a table.
• is_enabled: Verifies whether a table
is enabled.
• describe: Provides the description of
a table.
• alter: Alters a table.
• exists: Verifies whether a table
exists.
• drop: Drops a table from HBase.
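These are HBase shell commands; roughly equivalent steps through the Java Admin API (HBase 2.x client assumed, with a made-up table and column family) look like this:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

// Create / disable / drop a table programmatically, mirroring the shell's
// exists, create, list/describe, disable and drop commands.
public class HBaseDdlExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {

            TableName name = TableName.valueOf("cities");

            if (!admin.tableExists(name)) {                        // shell: exists
                admin.createTable(                                 // shell: create 'cities', 'geo'
                    TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("geo"))
                        .build());
            }

            System.out.println(admin.listTableDescriptors());      // shell: list / describe

            admin.disableTable(name);                              // shell: disable
            admin.deleteTable(name);                               // shell: drop
        }
    }
}
```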
HBase Data Manipulation commands

• put: Puts a cell value at a specified column in a specified row in a particular table.
• get: Fetches the contents of a row or a cell.
• delete: Deletes a cell value in a table.
• deleteall: Deletes all the cells in a given row.
• scan: Scans and returns the table data.
• count: Counts and returns the number of rows in a table.
• truncate: Disables, drops, and recreates a table.
