
04-2 Intro Nosql

The document discusses the evolution of databases from RDBMS to NoSQL. It describes that traditional RDBMS were designed for business data processing but not suitable for modern web applications with different needs like scalability, flexibility and high availability. This led to the emergence of NoSQL databases with different data models like key-value, column-family, and document stores to address the shortcomings of RDBMS for large scale web and cloud applications. Popular NoSQL databases discussed include DynamoDB, HBase, Cassandra, Redis and MongoDB.

28/11/2022

NoSQL

Eras of Databases


Eras of Databases

DB engines ranking according to their popularity (2019)


Before NoSQL

Star schema

OLTP
OLAP cube

RDBMS: one size fits all needs


ICDE 2005 conference

The last 25 years of commercial DBMS development can be summed up in a single phrase:
"one size fits all". This phrase refers to the fact that the traditional DBMS architecture
(originally designed and optimized for business data processing) has been used to support
many data-centric applications with widely varying characteristics and requirements. In this
paper, we argue that this concept is no longer applicable to the database market, and that the
commercial world will fracture into a collection of independent database engines ...

After NoSQL


NoSQL landscape

How to write a CV


Why NoSQL

• Web applications have different needs
  • Horizontal scalability – lowers cost
  • Geographically distributed
  • Elasticity
  • Schema-less, flexible schema for semi-structured data
  • Easier for developers
  • Heterogeneous data storage
  • High availability/disaster recovery
• Web applications do not always need
  • Transactions
  • Strong consistency
  • Complex queries


SQL vs NoSQL

SQL                           | NoSQL
Gigabytes to terabytes        | Petabytes (1,000 TB) to exabytes (1,000 PB) to zettabytes (1,000 EB)
Centralized                   | Distributed
Structured                    | Semi-structured and unstructured
Structured Query Language     | No standard declarative query language
Stable data model             | Schema-less
Complex relationships         | Less complex relationships
ACID properties               | Eventual consistency
Transactions are the priority | High availability, high scalability
Joins across tables           | Embedded structures
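
The last row of the comparison (joins vs. embedded structures) can be illustrated with a small sketch. The table names, column names, and sample data below are made up for illustration; the relational side uses Python's built-in sqlite3 module and the NoSQL side is a plain nested dictionary standing in for a stored document.

```python
import sqlite3

# Relational side: two tables linked by a foreign key, combined with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders VALUES (11, 1, 'pen');
""")
joined = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

# NoSQL side: the orders are embedded inside the customer record itself,
# so reading the customer also reads the orders, with no join.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "orders": [{"id": 10, "item": "book"}, {"id": 11, "item": "pen"}],
}
embedded = [(customer_doc["name"], o["item"]) for o in customer_doc["orders"]]
```

Both `joined` and `embedded` hold the same (name, item) pairs. The embedded form trades the flexibility of ad hoc joins for a single-record read.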


NoSQL use cases

• Massive data volume at scale (Big volume)
  • Google, Amazon, Yahoo, Facebook – 10-100K servers
• Extreme query workload (Big velocity)
• High availability
• Flexible, schema evolution


Relational data model revisited

• Data is usually stored row by row (row store)
• Standardized query language (SQL)
• Data model defined before you add data
• Joins merge data from multiple tables
• Results are tables
• Pros: Mature, ACID transactions with fine-grained security controls, widely used
• Cons: Requires up-front data modeling, does not scale out easily
• Examples: Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2


Key/value data model

• Simple key/value interface
  • GET, PUT, DELETE
• Value can contain any kind of data
• Super fast and easy to scale (no joins)
• Examples
  • Berkeley DB, Memcached, DynamoDB, Redis, Riak
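
The GET/PUT/DELETE interface can be sketched in a few lines. This is a toy in-memory class written for illustration (the class and method names are hypothetical), not the API of any of the products listed above.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store does not inspect the value: it can be any kind of data.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by exact key only; there are no joins or secondary queries.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "Alice"})
store.put("page:/home", b"<html>...</html>")
user = store.get("user:42")
removed = store.delete("page:/home")
```

Because every operation touches exactly one key, keys can be spread over many machines without coordination between them, which is where the "easy to scale" claim comes from.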


Key/value vs. table

• A table with two columns and a simple interface
  • Add a key-value
  • For this key, give me the value
  • Delete a key
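
To make the analogy concrete, the three operations can be written against an actual two-column table. This is a minimal sketch using Python's built-in sqlite3 module; the table name `kv` and the column names `k` and `v` are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")


def put(key, value):
    # Add a key-value pair, overwriting any existing value for the key.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def get(key):
    # For this key, give me the value (None if the key is absent).
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


def delete(key):
    # Delete a key.
    conn.execute("DELETE FROM kv WHERE k = ?", (key,))


put("color", "blue")
put("color", "green")   # same key: the value is replaced
current = get("color")
delete("color")
after_delete = get("color")
```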


Key/value vs. Relational data model


Memcached

• Open source in-memory key-value caching system
• Make effective use of RAM on many distributed web servers
• Designed to speed up dynamic web applications by alleviating database load
• Simple interface for highly distributed RAM caches
• 30ms read times typical
• Designed for quick deployment, ease of development
• APIs in many languages
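
A common way a cache alleviates database load is the cache-aside pattern: check the cache first and query the database only on a miss. The sketch below assumes a dictionary-backed `TTLCache` class as a stand-in for a Memcached client and a fake `slow_db_lookup` function; both are hypothetical and no real Memcached library is used.

```python
import time


class TTLCache:
    """Dict-backed stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


stats = {"db_calls": 0}


def slow_db_lookup(user_id):
    # Placeholder for an expensive database query.
    stats["db_calls"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache()


def get_user(user_id):
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:  # cache miss: read from the database, then fill the cache
        user = slow_db_lookup(user_id)
        cache.set(key, user, ttl_seconds=60)
    return user


get_user(7)
get_user(7)  # served from the cache; the database is not queried again
```

After the two calls, `stats["db_calls"]` is 1: only the first request reached the database.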


Redis

• Open source in-memory key-value store with optional durability
• Focus on high speed reads and writes of common data structures to RAM
• Allows simple lists, sets and hashes to be stored within the value and manipulated
• Many features that developers like: expiration, transactions, pub/sub, partitioning
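
The idea of structured values with expiration can be sketched as follows. This is a toy model, not Redis and not a Redis client: the class `MiniStructStore` is hypothetical, its method names only loosely mirror the Redis commands LPUSH, SADD, HSET, HGET and EXPIRE, and the clock is injectable so that expiry can be shown without waiting.

```python
import time


class MiniStructStore:
    """Toy store whose values are lists, sets or hashes (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> list, set or dict
        self._expires = {}   # key -> expiry deadline
        self._clock = clock

    def _live(self, key):
        # Drop the key if its deadline has passed, then return its value.
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expires.pop(key, None)
        return self._data.get(key)

    def lpush(self, key, item):
        self._live(key)
        self._data.setdefault(key, []).insert(0, item)

    def lrange_all(self, key):
        return list(self._live(key) or [])

    def sadd(self, key, member):
        self._live(key)
        self._data.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self._live(key) or set())

    def hset(self, key, field, value):
        self._live(key)
        self._data.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return (self._live(key) or {}).get(field)

    def expire(self, key, seconds):
        if self._live(key) is not None:
            self._expires[key] = self._clock() + seconds


now = [0.0]  # fake clock, advanced by hand
store = MiniStructStore(clock=lambda: now[0])
store.lpush("recent", "a")
store.lpush("recent", "b")
store.sadd("tags", "x")
store.sadd("tags", "x")          # sets ignore duplicates
store.hset("user:1", "name", "Alice")
store.expire("user:1", 30)
name_before = store.hget("user:1", "name")
now[0] = 31.0                    # move past the 30-second deadline
name_after = store.hget("user:1", "name")
```

The point of the design is that the structure is manipulated inside the store (push, add, set a field) instead of being fetched, modified and written back by the client.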


Amazon DynamoDB

• Scalable key-value store
• Fastest growing product in Amazon's history
• Focus on storage throughput and predictable read and write times
• Strong integration with S3 and Elastic MapReduce


Column family store

• Dynamic schema, column-oriented data model


• Sparse, distributed persistent multi-dimensional sorted map
• (row, column (family), timestamp) -> cell contents
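
The mapping (row, column, timestamp) -> cell contents can be modeled directly. The sketch below is an in-memory illustration only: the class name `SparseSortedMap` is hypothetical, sorting happens at scan time (real systems keep the data sorted on disk), and the sample row and column names follow the web-page example commonly used to explain this model.

```python
from collections import defaultdict


class SparseSortedMap:
    """In-memory sketch of a (row, column, timestamp) -> value map."""

    def __init__(self):
        # row -> column -> {timestamp: value}; absent cells take no space.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for start_row <= row < end_row."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column in sorted(self._rows[row]):
                    yield row, column, self.get(row, column)


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
latest = table.get("com.cnn.www", "contents:")
as_of_4 = table.get("com.cnn.www", "contents:", timestamp=4)
scanned = list(table.scan("com.a", "com.z"))
```

Column names carry their family as a prefix ("contents:", "anchor:"), and each cell keeps several timestamped versions.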


Column families

• Group columns into "Column families"
• Group column families into "Super-Columns"
• Query all columns within a family or super-column
• Similar data grouped together to improve speed


Column family data model vs. relational

• Sparse matrix, preserve table structure
• One row could have millions of columns but can be very sparse
• Hybrid row/column stores
• Number of columns is extensible
• New columns can be inserted without doing an "alter table"
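
A small sketch of what "sparse" and "no alter table" mean in practice: each row stores only the columns it actually has, and a new column appears the moment some row writes it. The function names and the "family:qualifier" naming below are assumptions made for illustration.

```python
rows = {}  # row key -> {"family:qualifier": value}; only present columns are stored


def put(row_key, column, value):
    rows.setdefault(row_key, {})[column] = value


def columns_in_family(row_key, family):
    # Return every column of one row that belongs to the given family.
    prefix = family + ":"
    return {c: v for c, v in rows.get(row_key, {}).items() if c.startswith(prefix)}


put("user1", "info:name", "Alice")
put("user2", "info:name", "Bob")
put("user2", "info:twitter", "@bob")   # new column for one row, no schema change
put("user2", "stats:logins", 12)

info_user2 = columns_in_family("user2", "info")
```

Here "user1" never stores an "info:twitter" cell at all, which is how a row with potentially millions of columns stays cheap when most of them are empty.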


Bigtable

• ACM TOCS 2008
• Fault-tolerant, persistent
• Scalable
  • Thousands of servers
  • Terabytes of in-memory data
  • Petabyte of disk-based data
  • Millions of reads/writes per second, efficient scans
• Self-managing
  • Servers can be added/removed dynamically
  • Servers adjust to load imbalance


Apache HBase

• Open-source implementation of the Bigtable design, written in Java
• Part of the Apache Hadoop ecosystem


Apache Cassandra

• Apache open source column family database
• Supported by DataStax
• Peer-to-peer distribution model
• Strong reputation for linear scale out (millions of writes/second)
• Written in Java and works well with HDFS and MapReduce


Graph data model

• Core abstractions: Nodes, Relationships, Properties on both


Graph database store

• A database that stores data in an explicit graph structure


• Each node knows its adjacent nodes
• Queries are really graph traversals
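
The two points above (each node knows its adjacent nodes, queries are traversals) can be sketched with adjacency lists and a breadth-first search. The `Graph` class, the `KNOWS` relationship type and the sample people are hypothetical; this is not the API of any graph database.

```python
from collections import deque


class Graph:
    """Toy property graph stored as adjacency lists (hypothetical)."""

    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.adjacent = {}   # node id -> list of (relationship type, neighbor id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.adjacent.setdefault(node_id, [])

    def add_relationship(self, src, rel_type, dst):
        self.adjacent[src].append((rel_type, dst))

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first: nodes reachable from start via rel_type within max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the hop limit
            for rtype, neighbor in self.adjacent[node]:
                if rtype == rel_type and neighbor not in seen:
                    seen.add(neighbor)
                    found.append(neighbor)
                    queue.append((neighbor, depth + 1))
        return found


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.add_relationship("alice", "KNOWS", "bob")
g.add_relationship("bob", "KNOWS", "carol")
g.add_relationship("carol", "KNOWS", "dave")

friends_of_friends = g.traverse("alice", "KNOWS", max_depth=2)
```

The query "who is within two KNOWS hops of alice" follows stored adjacency directly, so its cost depends on the neighborhood visited and not on the total size of the graph.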


Linking open data


Neo4j

• Graph database designed to be easy to use by Java developers
• Disk-based (not just RAM)
• Full ACID
• High availability (with Enterprise Edition)
• 32 billion nodes, 32 billion relationships, 64 billion properties
• Embedded Java library
• REST API


Document store

• Documents, not values, not tables


• JSON or XML formats
• Document is identified by ID
• Allow indexing on properties
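
A minimal sketch of these ideas: documents are stored as JSON under an ID, and a secondary index maps a property value to the IDs of the documents that contain it. The `DocumentStore` class and the sample documents are hypothetical, and the index only handles hashable scalar values such as strings and numbers.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy JSON document store with property indexes (hypothetical)."""

    def __init__(self):
        self._docs = {}      # document ID -> JSON text
        self._indexes = {}   # property -> {value -> set of document IDs}

    def create_index(self, prop):
        index = defaultdict(set)
        for doc_id, raw in self._docs.items():
            doc = json.loads(raw)
            if prop in doc:
                index[doc[prop]].add(doc_id)
        self._indexes[prop] = index

    def put(self, doc_id, doc):
        self.delete(doc_id)  # drop stale index entries when overwriting
        self._docs[doc_id] = json.dumps(doc)
        for prop, index in self._indexes.items():
            if prop in doc:
                index[doc[prop]].add(doc_id)

    def delete(self, doc_id):
        raw = self._docs.pop(doc_id, None)
        if raw is None:
            return
        old = json.loads(raw)
        for prop, index in self._indexes.items():
            if prop in old:
                index[old[prop]].discard(doc_id)

    def get(self, doc_id):
        raw = self._docs.get(doc_id)
        return json.loads(raw) if raw is not None else None

    def find(self, prop, value):
        # Requires create_index(prop) to have been called first.
        ids = self._indexes[prop].get(value, set())
        return [self.get(i) for i in sorted(ids)]


store = DocumentStore()
store.create_index("city")
store.put("u1", {"name": "Alice", "city": "Hanoi"})
store.put("u2", {"name": "Bob", "city": "Paris", "tags": ["admin"]})
store.put("u3", {"name": "Carol", "city": "Hanoi"})

in_hanoi = [d["name"] for d in store.find("city", "Hanoi")]
```

Unlike a plain key/value store, the store understands the inside of each value, so documents with different shapes (only "u2" has "tags") can still be queried by a shared property.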


MongoDB

• Open source JSON data store created by 10gen (now MongoDB, Inc.)
• Master-slave scale-out model (replica sets with a primary and secondaries in current releases)
• Strong developer community
• Sharding built-in, automatic
• Implemented in C++ with many APIs (C++, JavaScript, Java, Perl, Python, etc.)


MongoDB architecture

• Replica set
  • Copies of the data on each node
  • Data safety
  • High availability
  • Disaster recovery
  • Maintenance
  • Read scaling
• Sharding
  • “Partitions” of the data
  • Horizontal scale
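
How the two mechanisms combine can be sketched as follows: a hash of the key picks the shard (partition), and every member of that shard's replica set holds a copy. This is a simplified illustration and not MongoDB's actual algorithm; the `ShardedCluster` class is hypothetical, the hash-modulo placement is an assumption, and writes are copied synchronously here for brevity where a real system would replicate through a log.

```python
import hashlib


class ShardedCluster:
    """Toy cluster: each shard is a replica set of identical dicts (hypothetical)."""

    def __init__(self, num_shards, replicas_per_shard):
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]

    def _shard_for(self, key):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def write(self, key, value):
        # Copy the write to every member of the owning replica set.
        for replica in self.shards[self._shard_for(key)]:
            replica[key] = value

    def read(self, key, replica_index=0):
        # Any replica of the owning shard can serve the read (read scaling).
        return self.shards[self._shard_for(key)][replica_index].get(key)


cluster = ShardedCluster(num_shards=3, replicas_per_shard=2)
for i in range(100):
    cluster.write(f"user:{i}", i)

total_keys = sum(len(shard[0]) for shard in cluster.shards)
value_from_secondary = cluster.read("user:5", replica_index=1)
```

Each key lives on exactly one shard, so adding shards spreads the data (horizontal scale), while the copies inside a shard provide safety and availability.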


Apache CouchDB

• Apache project
• Open source JSON data store
• Written in Erlang
• RESTful JSON API
• B-tree based indexing, shadowing B-tree versioning
• ACID fully supported
• View model
• Data compaction
• Security


Thank you for your attention!


Q&A
