DECS 43A - Big Data Analysis
Unit - 3
Big Data Technologies and Databases
Dr. S. P. Ponnusamy
Assistant Professor and Head
Big Data Analytics
Government Arts and Science College
Tittagudi-606106
Department of Computer Science
Agenda
Introduction to NoSQL
Uses, Features and Types
Need, Advantages, Disadvantages
Application of NoSQL
Overview of NewSQL
Comparing SQL, NoSQL and NewSQL
Introduction to MongoDB and its needs
Characteristics of MongoDB
Introduction to Apache Cassandra and its needs
Characteristics of Cassandra
Introduction to NoSQL
RDBMS (SQL) ….. NoSQL
• Value of RDBMS
• Getting Persistent Data
• Concurrency
• Shared DB integration
• A (mostly) standard model
Introduction to NoSQL
RDBMS Characteristics
Introduction to NoSQL
RDBMS (SQL) ….. NoSQL
Introduction to NoSQL
What is NoSQL ?
Introduction to NoSQL
Where does NoSQL come from?
• Non-relational DBMSs are not new
• But NoSQL represents a new incarnation
– Due to massively scalable Internet applications
– Based on distributed and parallel computing
• Development
– Starts with Google
– First research paper published in 2003
– Continued thanks to Lucene's developers at Apache (Hadoop) and to Amazon (Dynamo)
– Many more products and much interest then came from Facebook, Netflix, Yahoo, eBay, Hulu, IBM, and many more
Uses of NoSQL
• Log Analysis
• Social Networking Feeds
• Time Based Data (not easily analyzed in RDBMS)
• Dealing with a rich variety of data (structured, semi-structured and unstructured)
Features of NoSQL
• Open Source
• Non-Relational
• Distributed
• Schema-less
• Cluster friendly
• Born out of 21st century Web Applications
Types of NoSQL
• Broad Classification
1. Key-Value or the big hash table (Amazon S3 (Dynamo), Scalaris)
2. Schema-less
– Column Based (Cassandra, HBase)
– Document Based (CouchDB)
– Graph Based (Neo4j)
Types of NoSQL
Key-value Data Store
• Store data in a schema-less way
• Store data as maps
– HashMaps or associative arrays
– Provide very efficient average-case lookup time for accessing data by key
• Notable for:
– Couchbase (Zynga, Vimeo, NAVTEQ, ...)
– Redis (Craigslist, Instagram, Stack Overflow, Flickr, ...)
– Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
– Apache Cassandra (Facebook, Digg, Reddit, Twitter,...)
– Voldemort (LinkedIn, eBay, …)
– Riak (GitHub, Comcast, Mochi, ...)
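The map-based storage described on this slide can be sketched in a few lines of Python. This is only a minimal in-memory illustration, assuming a single process and no persistence or replication; the class name KeyValueStore, its methods, and the sample keys are hypothetical and do not correspond to the API of any product listed above.

```python
# A toy in-memory key-value store built on Python's dict (a hash map).
# KeyValueStore, put, get and delete are hypothetical names for illustration.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values are opaque: the store does not inspect or validate them,
        # which is what "schema-less" means here.
        self._data[key] = value

    def get(self, key, default=None):
        # Hash-map lookup: constant time on average.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1001", {"name": "Asha", "city": "Chennai"})  # made-up sample data
store.put("session:9f3c", "logged-in")
store.put("counter:visits", 42)
```

Each value has a different shape (a dictionary, a string, an integer), yet all are stored and fetched the same way, by key only.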
Types of NoSQL
Document based
Types of NoSQL
Document based - JSON
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
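The two documents above sit in the same collection even though only one has an isbn field and only one has a date. A rough Python sketch of that idea follows, using plain dictionaries rather than a real database driver. The find helper is a hypothetical stand-in for a query API, plain strings stand in for ObjectId values, and the truncated body field is left out.

```python
from datetime import datetime, timezone

# Two documents in one collection; each carries only the fields it needs.
# Plain strings stand in for ObjectId values (an assumption of this sketch).
collection = [
    {
        "_id": "51156a1e056d6f966f268f81",
        "type": "Article",
        "author": "Derick Rethans",
        "title": "Introduction to Document Databases with MongoDB",
        "date": datetime(2013, 4, 24, 16, 26, 31, 911000, tzinfo=timezone.utc),
    },
    {
        "_id": "51156a1e056d6f966f268f82",
        "type": "Book",
        "author": "Derick Rethans",
        "title": "php|architect's Guide to Date and Time Programming with PHP",
        "isbn": "978-0-9738621-5-7",
    },
]


def find(docs, **criteria):
    """Hypothetical helper: return documents matching every field=value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


by_author = find(collection, author="Derick Rethans")  # matches both documents
books = find(collection, type="Book")                  # matches only the book
with_isbn = [d for d in collection if "isbn" in d]     # field may simply be absent
```

No schema change is needed to add the isbn field to books: documents that lack it are simply skipped by queries that ask for it.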
Types of NoSQL
Document based - MongoDB
Types of NoSQL
Column based
• Data are stored in a column-oriented way
– Data efficiently stored
– Avoids consuming space for storing nulls
– Columns are grouped in column-families
– Data isn’t stored as a single table but is stored by column families
– Unit of data is a set of key/value pairs
• Identified by “row-key”
• Ordered and sorted based on row-key
• Notable for:
– Google's Bigtable (used in all Google's services)
– HBase (Facebook, StumbleUpon, Hulu, Yahoo!, ...)
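The column-family layout described above can be approximated with nested dictionaries: row key, then column family, then column name. The following is only a conceptual sketch, assuming in-memory storage; the class ColumnFamilyTable, the family names and the sample rows are hypothetical and are not the API of Bigtable or HBase.

```python
# A toy column-family table: row key -> column family -> {column: value}.
# All names here (ColumnFamilyTable, "profile", "activity") are hypothetical.
class ColumnFamilyTable:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        # Only columns that were actually written exist, so no space
        # is spent on nulls for the columns a row does not use.
        return self.rows.get(row_key, {}).get(family, {})

    def scan(self):
        # Yield rows in sorted row-key order, mimicking the ordering
        # by row key described on the slide.
        for row_key in sorted(self.rows):
            yield row_key, self.rows[row_key]


table = ColumnFamilyTable(["profile", "activity"])
table.put("user#002", "profile", "name", "Ravi")          # made-up sample data
table.put("user#001", "profile", "name", "Asha")
table.put("user#001", "profile", "email", "asha@example.com")
table.put("user#001", "activity", "last_login", "2024-01-15")

ordered_keys = [key for key, _ in table.scan()]
```

Here user#002 has no email and no activity columns at all; nothing is stored for them, whereas a relational table would hold a null in each empty cell.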
Types of NoSQL
Graph based
• Graph-oriented
• Everything is stored as an edge, a node or an attribute.
• Each node and edge can have any number of attributes.
• Both the nodes and edges can be labelled.
• Labels can be used to narrow searches.
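The points above (labelled nodes and edges, each with any number of attributes) can be sketched as a small property graph in Python. This is an illustrative in-memory model only; the class PropertyGraph, its methods, the labels and the sample data are all hypothetical and are not Neo4j's API or query language.

```python
# A toy property graph: nodes and edges each carry a label and free-form attributes.
# PropertyGraph and its methods are hypothetical names for illustration.
class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> (label, attributes)
        self.edges = []   # (source_id, label, target_id, attributes)

    def add_node(self, node_id, label, **attrs):
        self.nodes[node_id] = (label, attrs)

    def add_edge(self, source, label, target, **attrs):
        self.edges.append((source, label, target, attrs))

    def nodes_with_label(self, label):
        # Labels narrow the search to one kind of node.
        return sorted(n for n, (lbl, _) in self.nodes.items() if lbl == label)

    def neighbours(self, source, edge_label):
        # Follow only edges of the given label that leave the source node.
        return [t for s, lbl, t, _ in self.edges if s == source and lbl == edge_label]


g = PropertyGraph()
g.add_node("asha", "Person", age=29)          # made-up sample data
g.add_node("ravi", "Person")                  # a node may have no attributes
g.add_node("college1", "College", city="Chennai")
g.add_edge("asha", "KNOWS", "ravi", since=2019)
g.add_edge("asha", "STUDIED_AT", "college1")

people = g.nodes_with_label("Person")
friends = g.neighbours("asha", "KNOWS")
```

Filtering by the Person label or the KNOWS edge label avoids touching unrelated parts of the graph, which is the sense in which labels narrow searches.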
Why NoSQL?
Advantages of NoSQL
Disadvantages of NoSQL
CAP Theorem
SQL vs NoSQL
NewSQL
Characteristics of NewSQL
Comparison
End