
Bcis5420 - Lecture Note - ch6 - Big Data Technologies

The document discusses big data technologies including data lakes, NoSQL databases, and MongoDB. It covers characteristics of big data like volume, variety, and velocity. It also compares schema on read and schema on write approaches and provides examples of JSON and XML data structures.

Uploaded by

483-ROHIT SURAPALLI


Ch.6. Big Data Technologies


Section. 1

Introduction to Big Data


Big Data and Analytics

• Big Data: data that exist in very large volumes and many different data types and that need to be processed at a very high velocity (e.g., Internet of Things)

• Analytics: systematic analysis and interpretation of data, typically using mathematical, statistical, and computational tools, to improve understanding of a real-world domain


Characteristics of Big Data

• Five V’s of Big Data

▪ Volume – a much larger quantity of data than is typical for relational databases
▪ Variety – various data formats (e.g., structured, semi-structured, unstructured)
▪ Velocity – data arrive at a very fast rate (e.g., mobile sensors, Web clickstreams)
▪ Veracity – the credibility and reliability of the data
▪ Value – the practical benefit the data provide


Why Relational DBMS is NOT for Big Data

• Big data are hard to organize into relational structures because of their many data types (variety), enormous size (volume), and rapid growth rate (velocity)
• Hence, it is hard to ensure the credibility (veracity) and usefulness (value) of the data

[Figure: Big data along three dimensions. Variety (data types and sources): unstructured data (e.g., image, video, text), semi-structured data (e.g., various data frames such as CSV and JSON), and structured data (e.g., data in rows and columns, tidy data). Velocity: real-time (or near real-time) data transactions. Volume: large size of data. A Data Lake holds all of these, while an analytics-friendly Data Warehouse holds structured data.]


Technologies for Big Data

• Data Lake
▪ A large integrated repository for internal and external data that does not follow a predefined schema (unlike a data warehouse)
▪ Capture everything, dive in anywhere, flexible access

• Schema on Read, rather than Schema on Write

▪ Schema on Write: the data model exists before any data are stored; this is how traditional (relational) databases are designed
▪ Schema on Read: the data model is determined later, depending on how you want to use the data (XML, JSON, BSON)
▪ Capture and store the data first; decide how you want to use it later
▪ More flexible in terms of data integration and migration
▪ Little need for data normalization
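The contrast between the two approaches can be sketched in a few lines of JavaScript. This is a minimal in-memory illustration, not a database: the STUDENT schema, the function names (`insertWithSchema`, `captureRaw`, `readGrades`), and the sample records are all hypothetical. Schema on write validates every record against a schema before storing it; schema on read stores raw JSON text as-is and imposes structure only when a reader decides what it needs.

```javascript
// Schema on write vs. schema on read: a hypothetical in-memory sketch.

// Schema on write: the schema exists before data are stored,
// and every record is validated against it at write time.
const studentSchema = { ID: "string", FName: "string", LName: "string", Grade: "number" };
const writeTable = [];

function insertWithSchema(record) {
  for (const [column, type] of Object.entries(studentSchema)) {
    if (typeof record[column] !== type) {
      throw new Error(`Rejected: column ${column} must be a ${type}`);
    }
  }
  writeTable.push(record);
}

// Schema on read: raw data are captured without any validation;
// structure is imposed only when someone reads the data for a purpose.
const lake = [];

function captureRaw(text) {
  lake.push(text); // stored exactly as received
}

function readGrades() {
  // This particular reader decides, at read time, that it needs only ID and Grade.
  return lake
    .map((text) => JSON.parse(text))
    .filter((doc) => typeof doc.Grade === "number")
    .map((doc) => ({ ID: doc.ID, Grade: doc.Grade }));
}

insertWithSchema({ ID: "ID100", FName: "John", LName: "Smith", Grade: 90 });
let rejected = false;
try {
  insertWithSchema({ ID: "ID101", FName: "Peter", LName: "Gregory" }); // no Grade
} catch (err) {
  rejected = true;
}

captureRaw('{"ID": "ID100", "FName": "John", "LName": "Smith", "Grade": 90}');
captureRaw('{"ID": "ID101", "FName": "Peter", "LName": "Gregory"}');
captureRaw('{"ID": "ID102", "Grade": 88, "Comment": "transfer student"}');

console.log(writeTable.length, rejected); // 1 true
console.log(readGrades()); // projected rows for ID100 and ID102 only
```

The record without a Grade is rejected at write time in the first approach, but in the second it is kept and simply ignored by this particular reader; a different reader could still use it.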


Schema on Write vs. Schema on Read

▪ Schema on Write: relational database design
▪ Schema on Read: Big Data approach, requiring more programming and technical skills for database managers and analysts


Examples of JSON and XML

The same data in different data
structures, JSON and XML
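Since the original figure is not reproduced here, the following sketch uses one hypothetical STUDENT record (the same values as the key-value and document examples in Section 2) and serializes it both ways. JSON comes from the built-in `JSON.stringify`; the XML is assembled by a small hypothetical helper, `toXml`, because no XML library is assumed.

```javascript
// One hypothetical STUDENT record serialized two ways: JSON and XML.
const student = { ID: "ID100", FName: "John", LName: "Smith", Grade: 90 };

// JSON: JavaScript can serialize plain objects directly.
const asJson = JSON.stringify(student);

// XML: build the markup by hand. The values in this example contain no
// characters that would need escaping (<, >, &), so none is done here.
function toXml(tag, record) {
  const children = Object.entries(record)
    .map(([key, value]) => `<${key}>${value}</${key}>`)
    .join("");
  return `<${tag}>${children}</${tag}>`;
}
const asXml = toXml("student", student);

console.log(asJson); // {"ID":"ID100","FName":"John","LName":"Smith","Grade":90}
console.log(asXml);  // <student><ID>ID100</ID><FName>John</FName><LName>Smith</LName><Grade>90</Grade></student>

// Parsing the JSON back recovers the same structure.
console.log(JSON.parse(asJson).LName); // Smith
```

Both strings carry the same data; JSON uses key-value pairs inside braces, while XML wraps each value in a named opening and closing tag.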


Section. 2

NoSQL


NoSQL (Not Only SQL)

• A category of recently introduced data storage and retrieval technologies that are not based on the relational model
• Supports schema on read, which suits data lakes
• Well suited to cloud environments
• Mostly open source

(Source: aws.amazon.com)


NoSQL Classifications
• Key-Value Stores
▪ A simple pair of a key and an associated collection of values
▪ The key is usually a string
▪ The database has no knowledge of the structure or meaning of the values
▪ Example: Redis

Key format: Table_Name:Primary_Key_Value:Attribute_Name = Value

STUDENT (relational table)
ID      FName   LName     Grade
ID100   John    Smith     90
ID101   Peter   Gregory   95

STUDENT (as key-value pairs)
STUDENT:ID100:FName = "John"
STUDENT:ID100:LName = "Smith"
STUDENT:ID100:Grade = 90
STUDENT:ID101:FName = "Peter"
STUDENT:ID101:LName = "Gregory"
STUDENT:ID101:Grade = 95
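The flattening of the STUDENT table into key-value pairs can be sketched with a JavaScript `Map`. This is a toy stand-in, not a Redis client: the `KeyValueStore` class is hypothetical. Like a real key-value store, it only stores and returns values by key and never interprets them.

```javascript
// A toy key-value store using the Table_Name:Primary_Key_Value:Attribute_Name key format.
// Values are opaque: the store never looks inside them.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined if the key does not exist
  }
}

const store = new KeyValueStore();
const rows = [
  { ID: "ID100", FName: "John", LName: "Smith", Grade: 90 },
  { ID: "ID101", FName: "Peter", LName: "Gregory", Grade: 95 },
];

// Flatten each relational row into one key-value pair per attribute.
for (const row of rows) {
  for (const attribute of ["FName", "LName", "Grade"]) {
    store.set(`STUDENT:${row.ID}:${attribute}`, row[attribute]);
  }
}

console.log(store.get("STUDENT:ID101:LName")); // Gregory
console.log(store.data.size); // 6
```

In Redis itself, the same pairs would be written and read with its `SET` and `GET` commands, e.g. `SET STUDENT:ID100:FName "John"`. Note that answering a question such as "which students have a grade above 90?" requires scanning keys, because the store has no notion of a Grade column.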


NoSQL Classifications (cont’d)

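• Wide-Column Stores

▪ Data are distributed based on both key values (records) and columns, using “column families”
▪ Example: Cassandra

[Figure: the relational STUDENT table (ID, FName, LName, Grade), in which ID101 has Grade = N/A, compared with a wide-column layout labeled Column Family_1 and Column Family_2, in which row ID100 stores FName, LName, and Grade (John, Smith, 90) while row ID101 stores only FName and LName (Peter, Gregory), so no empty Grade cell is kept]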
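A wide-column layout can be approximated with nested maps: row key, then column family, then column, then value. This is a hypothetical sketch (not Cassandra's data model or API), and the family names `name` and `result` are illustrative assumptions. It shows the key property of the figure: rows are sparse, so ID101 simply has no Grade cell instead of storing N/A.

```javascript
// A toy wide-column layout: row key -> column family -> column -> value.
// Family names "name" and "result" are hypothetical.
const studentTable = new Map();

function putCell(rowKey, family, column, value) {
  if (!studentTable.has(rowKey)) studentTable.set(rowKey, new Map());
  const row = studentTable.get(rowKey);
  if (!row.has(family)) row.set(family, new Map());
  row.get(family).set(column, value);
}

function getCell(rowKey, family, column) {
  // Optional chaining returns undefined for any missing row, family, or column.
  return studentTable.get(rowKey)?.get(family)?.get(column);
}

putCell("ID100", "name", "FName", "John");
putCell("ID100", "name", "LName", "Smith");
putCell("ID100", "result", "Grade", 90);
putCell("ID101", "name", "FName", "Peter");
putCell("ID101", "name", "LName", "Gregory");
// No Grade cell is written for ID101.

// Count the cells actually stored for a row.
function cellCount(rowKey) {
  let count = 0;
  for (const columns of studentTable.get(rowKey).values()) count += columns.size;
  return count;
}

console.log(cellCount("ID100"), cellCount("ID101")); // 3 2
console.log(getCell("ID101", "result", "Grade")); // undefined
```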


NoSQL Classifications (cont’d)

• Document Stores

▪ Like a key-value store, but a “document” goes further than a “value”: the database understands the document’s internal structure
▪ A document is structured so that specific elements can be manipulated separately
▪ Example: MongoDB

Document format: { Primary_Key: ID1, Attribute_1: Value_1, Attribute_2: Value_2, … }

STUDENT (relational table)
ID      FName   LName     Grade
ID100   John    Smith     90
ID101   Peter   Gregory   95

STUDENT (as documents)
{_id: "ID100", FName: "John", LName: "Smith", Grade: 90}
{_id: "ID101", FName: "Peter", LName: "Gregory", Grade: 95}
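The difference from a key-value store is that individual elements of a document can be read, queried, or updated. A minimal sketch with a hypothetical in-memory store (`insertOne`, `updateField`, and `findByField` are illustrative names here, not a driver API) stores the two STUDENT documents keyed by `_id`:

```javascript
// A toy document store keyed by _id. Unlike an opaque key-value store,
// it can look inside each document to match or change a single field.
const students = new Map();

function insertOne(doc) {
  students.set(doc._id, { ...doc }); // shallow copy; fine for these flat documents
}

function updateField(id, field, value) {
  const doc = students.get(id);
  if (doc === undefined) return false;
  doc[field] = value; // only this element changes; the rest of the document is untouched
  return true;
}

function findByField(field, value) {
  return [...students.values()].filter((doc) => doc[field] === value);
}

insertOne({ _id: "ID100", FName: "John", LName: "Smith", Grade: 90 });
insertOne({ _id: "ID101", FName: "Peter", LName: "Gregory", Grade: 95 });

updateField("ID100", "Grade", 92);
console.log(students.get("ID100").Grade); // 92
console.log(findByField("LName", "Gregory").length); // 1
```

In the MongoDB shell, the single-field update would correspond to `db.students.updateOne({_id: "ID100"}, {$set: {Grade: 92}})`, assuming a collection named `students`.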


NoSQL Classifications (cont’d)

• Graph-Oriented Database

▪ Maintains information about the relationships between data (entities) as graphs of nodes and edges, rather than through related tables
▪ More intuitive and analytics friendly (easy to understand)
▪ Example: Neo4j

[Figure: the same CUSTOMER, ORDER, ORDERLINE, and PRODUCT entities shown as related tables in a relational DB and as connected nodes in a graph-oriented DB]
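The CUSTOMER, ORDER, ORDER LINE, PRODUCT example can be sketched as a small labeled graph. The node IDs (C1, O1, L1, P1, ...) and edge labels (PLACED, CONTAINS, FOR) are hypothetical, and this is plain JavaScript rather than Neo4j. The point is that "which products did this customer order?" becomes a walk along edges instead of a chain of table joins.

```javascript
// A toy graph for the CUSTOMER -> ORDER -> ORDER LINE -> PRODUCT example.
// Node IDs and edge labels are hypothetical.
const edges = [
  ["C1", "PLACED", "O1"],
  ["C1", "PLACED", "O2"],
  ["O1", "CONTAINS", "L1"],
  ["O1", "CONTAINS", "L2"],
  ["O2", "CONTAINS", "L3"],
  ["L1", "FOR", "P1"],
  ["L2", "FOR", "P2"],
  ["L3", "FOR", "P1"],
];

// Adjacency list: node -> list of [label, neighbor].
const adjacency = new Map();
for (const [from, label, to] of edges) {
  if (!adjacency.has(from)) adjacency.set(from, []);
  adjacency.get(from).push([label, to]);
}

// Follow a sequence of edge labels from a start node.
function traverse(start, labels) {
  let frontier = [start];
  for (const label of labels) {
    frontier = frontier.flatMap((node) =>
      (adjacency.get(node) ?? [])
        .filter(([edgeLabel]) => edgeLabel === label)
        .map(([, neighbor]) => neighbor)
    );
  }
  return [...new Set(frontier)]; // remove duplicates (P1 is reached twice)
}

console.log(traverse("C1", ["PLACED", "CONTAINS", "FOR"])); // [ 'P1', 'P2' ]
```

In a relational database, the same question would need three joins (CUSTOMER to ORDER to ORDERLINE to PRODUCT); in the graph, each step simply follows the stored relationships.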


Section. 3

MongoDB


MongoDB
• “Mongo” is short for “humongous,” meaning huge or gigantic
• A document-store database
• Collections
▪ Equivalent to tables in a relational database; each collection contains multiple documents
• Keys
▪ Equivalent to columns in a relational database
• Documents
▪ Equivalent to rows in a relational database
▪ Documents do not need to have the same structure
• Relationships
▪ The _id property serves as the “primary key”
▪ A document can store another document’s _id in one of its own JSON properties, acting as a “foreign key”


JSON in Mongo DB vs. Relational DB

• JSON Data
{
  "Advisor": [
    {"_id": "Advisor1", "name": "Luke Shire", "dept": "Sales"},
    {"_id": "Advisor2", "name": "Merry Longbottom", "dept": "Accounting"},
    {"_id": "Advisor3", "name": "Jason Manor", "dept": "Marketing", "tell": "123-456-7890"}
  ]
}

• Relational Data: missing values may occur (i.e., data anomalies)
ADVISOR
id         name               dept         tell
Advisor1   Luke Shire         Sales        (missing)
Advisor2   Merry Longbottom   Accounting   (missing)
Advisor3   Jason Manor        Marketing    123-456-7890
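Why the relational version has missing values can be shown by forcing the three Advisor documents into one table. In this sketch (the `toTable` helper is hypothetical), the table's columns are the union of all keys, so any document lacking a key produces a `null` cell, whereas the JSON documents simply omit the field.

```javascript
// The Advisor documents from the slide: only the third has a "tell" field.
const advisors = [
  { _id: "Advisor1", name: "Luke Shire", dept: "Sales" },
  { _id: "Advisor2", name: "Merry Longbottom", dept: "Accounting" },
  { _id: "Advisor3", name: "Jason Manor", dept: "Marketing", tell: "123-456-7890" },
];

// Force the documents into one relational table: the columns are the union
// of all keys, and a document lacking a key gets null (a missing value).
function toTable(docs) {
  const columns = [...new Set(docs.flatMap((doc) => Object.keys(doc)))];
  const rows = docs.map((doc) => columns.map((col) => (col in doc ? doc[col] : null)));
  return { columns, rows };
}

const table = toTable(advisors);
console.log(table.columns); // [ '_id', 'name', 'dept', 'tell' ]

const nullCount = table.rows.flat().filter((value) => value === null).length;
console.log(nullCount); // 2
```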


Mongo Documents with Relationships

[Figure: (a) a document in the product collection; (b) a document in the reviewer collection]

Using the primary key, “_id”, it is possible to retrieve more details about a reviewer, such as first and last names.
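The figure is not reproduced here, so the product and reviewer documents below are hypothetical stand-ins (field names such as `reviewer_id`, `first_name`, and `last_name` are assumptions). The sketch resolves each review's `reviewer_id` ("foreign key") against the reviewer collection's `_id` ("primary key") to obtain first and last names.

```javascript
// Hypothetical documents: a product whose reviews store a reviewer's _id,
// and a separate reviewer collection holding the reviewers' details.
const products = [
  {
    _id: "P1",
    name: "Desk Lamp",
    reviews: [
      { reviewer_id: "R1", rating: 5 },
      { reviewer_id: "R2", rating: 3 },
    ],
  },
];
const reviewers = [
  { _id: "R1", first_name: "Ana", last_name: "Lopez" },
  { _id: "R2", first_name: "Ben", last_name: "Kim" },
];

// Resolve each reviewer_id ("foreign key") against reviewer _id ("primary key").
function reviewsWithNames(product) {
  const byId = new Map(reviewers.map((r) => [r._id, r]));
  return product.reviews.map((review) => {
    const reviewer = byId.get(review.reviewer_id);
    return {
      rating: review.rating,
      reviewer: reviewer ? `${reviewer.first_name} ${reviewer.last_name}` : null,
    };
  });
}

console.log(reviewsWithNames(products[0]));
```

Inside MongoDB itself, this kind of lookup is typically expressed with the `$lookup` aggregation stage (using its `from`, `localField`, `foreignField`, and `as` fields) rather than in application code.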


How to Query in Mongo DB

• Select Documents (rows) in a Collection (table) by Conditions

▪ db.collection.find()

• Types of Brackets

▪ Round brackets ( ) enclose the arguments of a method called on a collection (i.e., table), such as find( )
▪ Curly braces { } for a document (i.e., row) and for an operator expression
▪ Square brackets [ ] for an array (i.e., a group of values)


How to Query in Mongo DB (cont’d)

• Equality ($eq) and And ($and) Operators

▪ Using Yelp review data, show businesses that are open
: db.collection.find({is_open: {$eq: 1}})
▪ Show businesses from Colorado
: db.collection.find({state: {$eq: "CO"}})
Note: the $eq operator CAN be dropped, producing the same result: db.collection.find({state: "CO"})
▪ Show businesses from Boulder, Colorado
: db.collection.find({$and: [{state: {$eq: "CO"}}, {city: {$eq: "Boulder"}}]})
Note: the $and operator CAN be dropped, because MongoDB implicitly combines comma-separated conditions on different fields with AND; the query is equivalent to db.collection.find({state: "CO", city: "Boulder"})
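To see why the shorter forms return the same documents, here is a toy matcher that mimics a tiny subset of MongoDB filter semantics: `{field: value}`, `{field: {$eq: value}}`, an explicit `$and`, and an implicit AND across the top-level keys of a filter. It is a hypothetical sketch, not MongoDB (which has many more rules for arrays, nested fields, and types), and the business documents are invented for illustration.

```javascript
// Toy matcher for a small subset of MongoDB filter semantics.
function matches(doc, filter) {
  // Implicit AND: every top-level key of the filter must be satisfied.
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    if (condition !== null && typeof condition === "object" && "$eq" in condition) {
      return doc[key] === condition.$eq;
    }
    return doc[key] === condition; // {field: value} is shorthand for $eq
  });
}

function find(collection, filter) {
  return collection.filter((doc) => matches(doc, filter));
}

// Hypothetical Yelp-style business documents.
const businesses = [
  { name: "Cafe A", city: "Boulder", state: "CO", is_open: 1 },
  { name: "Diner B", city: "Denver", state: "CO", is_open: 0 },
  { name: "Grill C", city: "Austin", state: "TX", is_open: 1 },
];

const explicit = find(businesses, { $and: [{ state: { $eq: "CO" } }, { city: { $eq: "Boulder" } }] });
const implicit = find(businesses, { state: "CO", city: "Boulder" });
console.log(explicit.map((b) => b.name), implicit.map((b) => b.name)); // [ 'Cafe A' ] [ 'Cafe A' ]
```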


How to Query in Mongo DB (cont’d)

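• Greater Than ($gt) and Greater Than or Equal ($gte) Operators

▪ Show businesses with a star rating greater than 3.5
: db.collection.find({stars: {$gt: 3.5}})
▪ Show businesses with a star rating greater than or equal to 4
: db.collection.find({stars: {$gte: 4.0}})
▪ Show open businesses with a star rating greater than or equal to 4
: db.collection.find({$and: [{is_open: {$eq: 1}}, {stars: {$gte: 4}}]})
Note: the $and operator CAN also be dropped here, even though two different operators ($eq and $gte) are used, because conditions on different fields are implicitly combined with AND: db.collection.find({is_open: 1, stars: {$gte: 4}}) is equivalent. An explicit $and is generally needed only when the same operator (e.g., two $or clauses) must appear more than once in one query.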
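The comparison operators and the two ways of combining conditions can be checked with a toy matcher supporting `$eq`, `$gt`, and `$gte`. This is a hypothetical sketch of the filter logic, not MongoDB itself, and the star ratings are invented. It compares the explicit `$and` form with the comma-separated form when the two conditions use different operators.

```javascript
// Toy matcher for $eq, $gt, $gte with implicit AND and explicit $and.
// Hypothetical sketch; real MongoDB has many more comparison rules.
const comparators = {
  $eq: (a, b) => a === b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
};

function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    if (condition !== null && typeof condition === "object") {
      // Every operator inside {op: value} must hold; a missing field never matches.
      return Object.entries(condition).every(([op, value]) => {
        if (!(op in comparators)) throw new Error(`Unsupported operator: ${op}`);
        return doc[key] !== undefined && comparators[op](doc[key], value);
      });
    }
    return doc[key] === condition;
  });
}

// Hypothetical business documents.
const businesses = [
  { name: "Cafe A", stars: 4.5, is_open: 1 },
  { name: "Diner B", stars: 4.0, is_open: 0 },
  { name: "Grill C", stars: 3.5, is_open: 1 },
  { name: "Deli D", stars: 3.8, is_open: 1 },
];
const names = (filter) => businesses.filter((doc) => matches(doc, filter)).map((b) => b.name);

console.log(names({ stars: { $gt: 3.5 } }));  // [ 'Cafe A', 'Diner B', 'Deli D' ]
console.log(names({ stars: { $gte: 4.0 } })); // [ 'Cafe A', 'Diner B' ]
console.log(names({ $and: [{ is_open: { $eq: 1 } }, { stars: { $gte: 4 } }] })); // [ 'Cafe A' ]
console.log(names({ is_open: 1, stars: { $gte: 4 } }));                        // [ 'Cafe A' ]
```

Grill C (exactly 3.5) is excluded by `$gt: 3.5`, which is the difference between a strict and a non-strict comparison.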


How to Query in Mongo DB (cont’d)

• Less Than ($lt) and Less Than or Equal ($lte) Operators

▪ Show businesses with a star rating less than 4.5
: db.collection.find({stars: {$lt: 4.5}})
▪ Show businesses with a star rating less than or equal to 4
: db.collection.find({stars: {$lte: 4.0}})
▪ Show open businesses with a star rating less than or equal to 4
: db.collection.find({$and: [{is_open: {$eq: 1}}, {stars: {$lte: 4}}]})
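Each of these queries can also be read as an ordinary boolean predicate. The sketch below writes the three filters as JavaScript `Array.prototype.filter` callbacks over hypothetical business documents; it only illustrates what each filter selects and does not call MongoDB.

```javascript
// The three $lt / $lte queries written as plain JavaScript predicates.
// The business documents are hypothetical.
const businesses = [
  { name: "Cafe A", stars: 4.5, is_open: 1 },
  { name: "Diner B", stars: 4.0, is_open: 0 },
  { name: "Grill C", stars: 3.5, is_open: 1 },
  { name: "Deli D", stars: 4.0, is_open: 1 },
  { name: "Bar E", stars: 4.2, is_open: 1 },
];
const names = (docs) => docs.map((b) => b.name);

// db.collection.find({stars: {$lt: 4.5}})
const lessThan = businesses.filter((b) => b.stars < 4.5);
// db.collection.find({stars: {$lte: 4.0}})
const atMost = businesses.filter((b) => b.stars <= 4.0);
// db.collection.find({$and: [{is_open: {$eq: 1}}, {stars: {$lte: 4}}]})
const openAtMost = businesses.filter((b) => b.is_open === 1 && b.stars <= 4);

console.log(names(lessThan));   // [ 'Diner B', 'Grill C', 'Deli D', 'Bar E' ]
console.log(names(atMost));     // [ 'Diner B', 'Grill C', 'Deli D' ]
console.log(names(openAtMost)); // [ 'Grill C', 'Deli D' ]
```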


How to Query in Mongo DB (cont’d)

• In ($in) and Not In ($nin) Operators (the values to match are listed in an array)

▪ Show businesses in Colorado or Texas
: db.collection.find({state: {$in: ["CO", "TX"]}})
▪ Show businesses in neither Colorado nor Texas
: db.collection.find({state: {$nin: ["CO", "TX"]}})
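The behavior of `$in` and `$nin` can be sketched with two small hypothetical helpers over invented business documents. One detail worth knowing, which the sketch reproduces: in MongoDB, `$nin` also selects documents that do not have the field at all.

```javascript
// Toy versions of $in and $nin (hypothetical sketch, not MongoDB itself).
// $in: the field exists and its value equals ANY value in the array.
// $nin: the opposite; as in MongoDB, this also selects documents
// that do not have the field at all.
function inOp(doc, field, values) {
  return field in doc && values.includes(doc[field]);
}
function ninOp(doc, field, values) {
  return !inOp(doc, field, values);
}

// Hypothetical business documents.
const businesses = [
  { name: "Cafe A", state: "CO" },
  { name: "Grill C", state: "TX" },
  { name: "Pho E", state: "AZ" },
  { name: "Truck F" }, // no state field
];
const names = (docs) => docs.map((b) => b.name);

// db.collection.find({state: {$in: ["CO", "TX"]}})
console.log(names(businesses.filter((b) => inOp(b, "state", ["CO", "TX"])))); // [ 'Cafe A', 'Grill C' ]
// db.collection.find({state: {$nin: ["CO", "TX"]}})
console.log(names(businesses.filter((b) => ninOp(b, "state", ["CO", "TX"])))); // [ 'Pho E', 'Truck F' ]
```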


References

• The major contents of this note are reproduced from the BCIS 5420 textbook: Topi et al. Modern Database Management, 13th edition. Pearson, 2019.

• Unless a specific reference source is given, the photos and icons used in this material are from the following sources of copyright-free images: imagesource.com, iconfinder.com, and pexels.com

• The diagrams used are from the textbook publisher’s materials
