Chapter14_BigData&NoSQLDatabases

Database class

Uploaded by

chamso Abou

Big Data and NoSQL Databases

CSC3326
Learning Objectives

• Understand Big Data and its 3Vs

• Get introduced to NoSQL databases
• Understand the difference between Relational and NoSQL databases
• Distinguish the different types of NoSQL databases
• Explain how a document database such as MongoDB stores and
manipulates data
Big Data
• The rapid pace of data growth is the top challenge for organizations, followed closely by
system performance and scalability.

• Big Data is a movement to find new and better ways to manage large amounts of data and
derive business insight from it, while providing high performance and scalability at a reasonable
cost. Its defining characteristics are summarized as the 3 Vs: volume, velocity, and variety.

• Volume: the quantity of data to be stored. The storage capacities associated with Big Data are
extremely large.

• Velocity: the rate at which new data enters the system as well as the rate at which the data
must be processed.

• Variety: the variations in the structure of the data to be stored. Data can be structured,
unstructured, or semistructured.
Big Data
As the quantity of data needing to be stored increases, the need for
larger storage capacity increases as well. There are two ways to meet it:
• Scaling up keeps the same number of systems but migrates each
system to a larger system.
• Scaling out means that when the workload exceeds the capacity of a
server, the workload is spread out across a number of servers. This is also
referred to as clustering: creating a cluster of low-cost servers to share a
workload.
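Scaling out can be illustrated with a minimal sketch, assuming a hypothetical cluster of three servers modeled as in-memory maps. The names servers, hashKey, serverFor, put, and get are made up for this example, and placing keys by hash modulo the server count is a simplification; production systems typically use more elaborate placement schemes.

```javascript
// Hypothetical cluster: three low-cost servers, each modeled as an in-memory Map.
const servers = [new Map(), new Map(), new Map()];

// Small deterministic string hash, for illustration only.
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

// Pick the server responsible for a key.
function serverFor(key) {
  return hashKey(key) % servers.length;
}

// Store a value on the server that owns the key.
function put(key, value) {
  servers[serverFor(key)].set(key, value);
}

// Read a value back from the server that owns the key.
function get(key) {
  return servers[serverFor(key)].get(key);
}

// Spread nine hypothetical orders across the cluster.
for (let i = 0; i < 9; i++) {
  put(`order:${i}`, { total: i * 10 });
}

const storedPerServer = servers.map((s) => s.size);
const totalStored = storedPerServer.reduce((a, b) => a + b, 0);
console.log(storedPerServer, totalStored);
```

Each key always maps to the same server, so a read goes straight to the node that holds the data, and no single server has to hold the whole data set.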
Big Data
• There is a need for databases that can provide:
o Scalability
o Flexibility
o Low cost
o High availability
NoSQL
• Although much of the transactional data that organizations use works well in a
structured environment, most of the data in the world is semistructured or
unstructured.
• Relational databases impose a structure on the data when the data is captured
and stored.
• Big Data requires that the data be captured in whatever format it naturally
exists, without any attempt to impose a data model or structure on the data.
NoSQL
• NoSQL represents a broad array of nonrelational database technologies
that have developed to address the challenges presented by Big Data.
• NoSQL DBs are built to be flexible, scalable, and capable of rapidly
responding to the data management demands of Big Data applications.
• NoSQL DBs take a nonrelational approach to the storage and
processing of data.
• NoSQL DBs do not force data to fit predefined structures.
• NoSQL DBs provide distributed, fault-tolerant databases for processing
unstructured data.
• NoSQL DBs are not based on the relational model or SQL.
NoSQL
• Most NoSQL databases fall into one of four categories: key-value data stores, document databases,
column-oriented databases, and graph databases.
Data Consistency in Distributed Systems
• Distributed systems offer a range of benefits, including increased scalability, fault tolerance, and performance.
• However, managing data consistency in distributed systems is a very complex problem.
• Two consistency modes:
o Strong consistency: data must be identical across all server nodes at any given time, so every node
returns the same value for a given entity. Data across nodes must therefore be updated immediately after a
write request is made to one of the server nodes. While the update propagates, access to the data is locked.
o Eventual consistency: temporary inconsistencies between server nodes are allowed. Updates take time to
reach the other nodes, but the data across nodes becomes consistent eventually. Because access to the data
is not locked in the meantime, the data stays highly available.
• Strong consistency provides immediate consistency but can result in higher latency and lower
availability. In contrast, eventual consistency prioritizes availability but can lead to temporary data
inconsistencies.
• When choosing between strong and eventual consistency, it is important to consider the specific
requirements of your system.
• Nonrelational DBs typically favor eventual consistency, while relational DBs typically provide strong consistency.
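The difference between the two modes can be sketched with a toy model, assuming a hypothetical Cluster class in which each node is an in-memory map. A strong write updates every node before returning, while an eventual write updates one node and queues the rest. The class and method names are invented for this illustration and do not correspond to any real database API.

```javascript
// Toy model of a replicated data store (all names are hypothetical).
class Cluster {
  constructor(nodeCount) {
    this.nodes = Array.from({ length: nodeCount }, () => new Map());
    this.pending = []; // replication work that has not been applied yet
  }

  // Strong consistency: every node is updated before the write returns.
  writeStrong(key, value) {
    for (const node of this.nodes) {
      node.set(key, value);
    }
  }

  // Eventual consistency: node 0 is updated now, the others later.
  writeEventual(key, value) {
    this.nodes[0].set(key, value);
    for (let i = 1; i < this.nodes.length; i++) {
      this.pending.push({ node: i, key, value });
    }
  }

  // Apply the queued updates, standing in for background replication.
  replicate() {
    for (const { node, key, value } of this.pending) {
      this.nodes[node].set(key, value);
    }
    this.pending = [];
  }

  read(nodeIndex, key) {
    return this.nodes[nodeIndex].get(key);
  }
}

const cluster = new Cluster(3);
cluster.writeStrong("stock", 10);   // all three nodes now hold 10
cluster.writeEventual("stock", 9);  // only node 0 holds 9 so far

const staleRead = cluster.read(2, "stock"); // node 2 has not caught up yet
cluster.replicate();
const freshRead = cluster.read(2, "stock"); // node 2 has now converged

console.log(staleRead, freshRead);
```

The stale read is the temporary inconsistency that eventual consistency tolerates in exchange for never blocking the write on the slower nodes.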
Strong Consistency Mode
Performance
• NoSQL DBs provide high scalability and high performance.
• Example: the data for a blog website.
• In a document-based NoSQL database, all data related to each post is collected
into a single self-contained document containing data about the user, the post
details, and the comments.
• In a relational DB, this data would be split into three tables: user, post, and
comment.
What are the benefits of NoSQL databases?
• Flexible data models: NoSQL databases typically have very flexible schemas.

• Horizontal scaling: most NoSQL databases allow you to scale out horizontally, meaning you
can add cheaper commodity servers whenever you need to.

• Fast queries: queries in NoSQL databases can be faster than in SQL databases. Data in SQL
databases is typically normalized, so queries require you to join data from multiple tables. As
your tables grow, the joins can become expensive. Data in NoSQL databases, however, is
typically stored in a way that is optimized for queries. The rule of thumb is that data that is
accessed together should be stored together.
The Disadvantages of NoSQL Databases
• NoSQL databases also have their own limitations and weaknesses.

• The lack of SQL: there is no standard query language shared across NoSQL databases.

• The lack of ACID: ACID stands for the four key properties that define a transaction
(Atomicity, Consistency, Isolation, and Durability). Many NoSQL databases do not fully support
these properties, particularly for transactions that span multiple records.
SQL Databases vs. NoSQL Databases

• Data Storage Model
o SQL: tables with fixed rows and columns
o NoSQL: Document: JSON documents; Key-value: key-value pairs; Wide-column: tables with rows and dynamic columns; Graph: nodes and edges

• Development History
o SQL: developed in the 1970s with a focus on reducing data duplication
o NoSQL: developed in the late 2000s with a focus on scaling and allowing for rapid application change driven by agile and DevOps practices

• Examples
o SQL: Oracle, MySQL, Microsoft SQL Server, and PostgreSQL
o NoSQL: Document: MongoDB and CouchDB; Key-value: Redis and DynamoDB; Wide-column: Cassandra and HBase; Graph: Neo4j and Amazon Neptune

• Primary Purpose
o SQL: general purpose
o NoSQL: Document: general purpose; Key-value: large amounts of data with simple lookup queries; Wide-column: large amounts of data with predictable query patterns; Graph: analyzing and traversing relationships between connected data

• Schemas
o SQL: rigid
o NoSQL: flexible

• Scaling
o SQL: vertical (scale-up with a larger server)
o NoSQL: horizontal (scale-out across commodity servers)

• Multi-Record ACID Transactions
o SQL: supported
o NoSQL: most do not support multi-record ACID transactions; however, some, like MongoDB, do

• Joins
o SQL: typically required
o NoSQL: typically not required

Key-Value Databases
• Key-value (KV) databases are conceptually the simplest of the NoSQL data models.
• A KV database is a NoSQL database that stores data as a collection of key-value pairs. The key acts
as an identifier for the value. The value can be anything, such as text, an XML document, or an
image.
• The database does not attempt to understand the contents of the value component or its
meaning; it simply stores whatever value is provided for the key.
• Key-value pairs are typically organized into “buckets.” A bucket can roughly be thought of as the
KV database equivalent of a table. A bucket is a logical grouping of keys. Keys must be
unique within a bucket, but they can be duplicated across buckets.
• Operations on KV databases are rather simple: only get, store, and delete operations are
used. Get (or fetch) retrieves the value component of the pair, store places a value in a key, and
delete removes a key-value pair.
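The bucket and key rules above can be sketched in a few lines, assuming a hypothetical in-memory KeyValueStore class (not the API of any real KV product). The bucket names, keys, and values are made up for illustration; note that the store never inspects the value it is given.

```javascript
// Hypothetical in-memory key-value store with buckets.
class KeyValueStore {
  constructor() {
    this.buckets = new Map(); // bucket name -> Map of key -> value
  }

  // Store a value under a key; the value is treated as opaque.
  store(bucket, key, value) {
    if (!this.buckets.has(bucket)) {
      this.buckets.set(bucket, new Map());
    }
    this.buckets.get(bucket).set(key, value);
  }

  // Get (fetch) the value component for a key, or undefined if absent.
  get(bucket, key) {
    const b = this.buckets.get(bucket);
    return b ? b.get(key) : undefined;
  }

  // Delete a key-value pair; returns true if something was removed.
  delete(bucket, key) {
    const b = this.buckets.get(bucket);
    return b ? b.delete(key) : false;
  }
}

const kv = new KeyValueStore();

// The same key can appear in two different buckets.
kv.store("customers", "10010", "<customer><name>Example Name</name></customer>");
kv.store("orders", "10010", "any opaque order payload");

// Storing again under an existing key replaces the value, so keys stay unique per bucket.
kv.store("customers", "10010", "replacement value");

const customerValue = kv.get("customers", "10010");
const removed = kv.delete("orders", "10010");
const afterDelete = kv.get("orders", "10010");

console.log(customerValue, removed, afterDelete);
```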
Document Databases
• Document databases are conceptually similar to key-value databases, and they can almost be considered a
subtype of KV databases.

• A document database is a NoSQL database that stores data in tagged documents in key-value pairs.

• Unlike a KV database where the value component can contain any type of data, a document database always
stores a document in the value component.

• The document can be in any encoded format, such as XML or JSON (JavaScript Object Notation).

• While KV databases do not attempt to understand the content of the value component, document databases
do.

• Despite the use of tags in documents, document databases are considered schema-less, that is, they do not
impose a predefined structure on the data that is stored

• Being schema-less means that although all documents have tags, not all documents are required to have the
same tags, so each document can have its own structure

• Tags inside the document are accessible to the DBMS, which makes sophisticated querying possible.

• Document databases group documents into logical groups called collections.
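A minimal sketch of these ideas, assuming a collection modeled as a plain JavaScript array and a hypothetical find helper (this is not a real database driver call). The second document and its pages tag are invented to show that documents in one collection need not share the same tags.

```javascript
// A collection modeled as an array of documents (hypothetical data).
const books = [
  { _id: 101, title: "Database Systems", author: ["Coronel", "Morris"] },
  { _id: 102, title: "Hypothetical Second Book", pages: 320 }, // no author tag, extra pages tag
];

// Because tags are visible to the system, documents can be queried by tag value.
// Returns every document whose tags match all of the given criteria.
function find(collection, criteria) {
  return collection.filter((doc) =>
    Object.entries(criteria).every(([tag, value]) => doc[tag] === value)
  );
}

const byTitle = find(books, { title: "Database Systems" });
const withPagesTag = books.filter((doc) => "pages" in doc);

console.log(byTitle.length, withPagesTag.map((doc) => doc._id));
```

A key-value store could only return the whole value for a key; here the query looks inside each document, which is what makes more sophisticated querying possible.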

Column-Oriented Databases
• A column family database is a NoSQL database that organizes data in key-value pairs with keys
mapped to a set of columns in the value component.

• Each row key in the column family can have different columns.
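As a rough illustration, a column family can be pictured as a map from row keys to sets of named columns. The sketch below assumes a hypothetical customer column family with made-up row keys and values; it models only the logical layout, not how a real column family database stores data on disk.

```javascript
// Hypothetical column family: each row key maps to its own set of columns.
const customerFamily = new Map([
  ["row1", { name: "Alice", city: "Rabat" }],
  ["row2", { name: "Bob", email: "bob@example.com", phone: "555-0100" }],
]);

// List the column names present for a given row key.
function columnsOf(rowKey) {
  return Object.keys(customerFamily.get(rowKey) ?? {});
}

const row1Columns = columnsOf("row1");
const row2Columns = columnsOf("row2");

console.log(row1Columns, row2Columns);
```

The two rows belong to the same column family but carry different columns, which a relational table with fixed columns would not allow without null values.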
Graph Databases
• A graph database is a NoSQL database based on graph theory to store data about relationship-rich
environments.

• Modeling and storing data about relationships is the focus of graph databases.

• The primary components of graph databases are nodes, edges, and properties

• The node is a specific instance of something we want to keep data about.

• Properties are like attributes; they are the data that we need to store about the node

• An edge is a relationship between nodes.

• Edges can be in one direction, or they can be bidirectional.

• A query in a graph database is called a traversal.
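The components above can be sketched with plain data structures, assuming a hypothetical social graph in which people follow each other. The node ids, the FOLLOWS edge type, and the traverse function are invented for this example; a bidirectional relationship could be modeled here as two directed edges.

```javascript
// Nodes: specific instances, each with properties (hypothetical data).
const nodes = new Map([
  ["alice", { name: "Alice" }],
  ["bob", { name: "Bob" }],
  ["carol", { name: "Carol" }],
  ["dave", { name: "Dave" }],
]);

// Edges: directed relationships between nodes.
const edges = [
  { from: "alice", to: "bob", type: "FOLLOWS" },
  { from: "bob", to: "carol", type: "FOLLOWS" },
  { from: "carol", to: "dave", type: "FOLLOWS" },
];

// Traversal: follow outgoing edges breadth-first, up to maxDepth hops from the start node.
function traverse(start, maxDepth) {
  const visited = new Set([start]);
  const reached = [];
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const id of frontier) {
      for (const edge of edges) {
        if (edge.from === id && !visited.has(edge.to)) {
          visited.add(edge.to);
          reached.push(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  return reached;
}

const withinTwoHops = traverse("alice", 2);
const names = withinTwoHops.map((id) => nodes.get(id).name);

console.log(withinTwoHops, names);
```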

Aggregate Awareness

• Key-value, document, and column family databases are aggregate aware

• Aggregate aware means that the data is collected or aggregated around a central
topic or entity.
• For example, a blog website might organize data around individual blog posts. All data
related to each blog post is aggregated into a single denormalized collection that
might include data about the blog post (title, content, and date posted), the poster
(user name and screen name), and all comments made on the post (comment content
and commenter’s user name and screen name). In a normalized, relational database,
this same data might call for USER, BLOGPOST, and COMMENT tables.
• Determining the best central entity for forming aggregates is one of the most
important tasks in designing most NoSQL databases.
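The blog example can be made concrete with a short sketch. All field names and values below are hypothetical; the point is only to contrast one aggregated document with the USER, BLOGPOST, and COMMENT rows that a normalized design would need, and to show that rebuilding the aggregate from those rows requires join-like lookups.

```javascript
// Aggregate: everything about one blog post in a single document (hypothetical data).
const blogPost = {
  _id: 1,
  title: "Why NoSQL?",
  content: "A short post body.",
  datePosted: "2024-01-15",
  poster: { userName: "asmith", screenName: "Alice" },
  comments: [
    { content: "Nice post", userName: "bjones", screenName: "Bob" },
    { content: "Thanks", userName: "asmith", screenName: "Alice" },
  ],
};

// Normalized equivalent: three tables modeled as arrays of rows.
const USER = [
  { userName: "asmith", screenName: "Alice" },
  { userName: "bjones", screenName: "Bob" },
];
const BLOGPOST = [
  { postId: 1, title: "Why NoSQL?", content: "A short post body.", datePosted: "2024-01-15", userName: "asmith" },
];
const COMMENT = [
  { commentId: 1, postId: 1, userName: "bjones", content: "Nice post" },
  { commentId: 2, postId: 1, userName: "asmith", content: "Thanks" },
];

// Rebuilding the aggregate means joining BLOGPOST to USER and COMMENT to USER.
function assemblePost(postId) {
  const post = BLOGPOST.find((p) => p.postId === postId);
  const findUser = (userName) => USER.find((u) => u.userName === userName);
  return {
    _id: post.postId,
    title: post.title,
    content: post.content,
    datePosted: post.datePosted,
    poster: findUser(post.userName),
    comments: COMMENT.filter((c) => c.postId === postId).map((c) => ({
      content: c.content,
      ...findUser(c.userName),
    })),
  };
}

const rebuilt = assemblePost(1);
console.log(rebuilt.comments.length, rebuilt.poster.screenName);
```

With the aggregate, one read returns the whole post. The price is redundancy: Alice's screen name is stored with the post and again with her comment.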
Working with MongoDB
• The name, MongoDB, comes from the word humongous as its developers
intended their new product to support extremely large data sets. It is designed
for:
• High availability
• High scalability
• High performance
• As a document database, MongoDB is schema-less and aggregate aware
• Schema-less means that documents are not all required to conform to the same
structure, and the structure of documents does not have to be declared ahead of
time.
Working with MongoDB

• Data is stored in documents, documents of a similar type are stored in
collections, and related collections are stored in a database.
• Documents are written in JSON; internally, MongoDB stores them in BSON, a binary encoding of JSON.
• JavaScript Object Notation (JSON) is a data interchange format that represents
data as a logical object.
• Objects are enclosed in curly brackets {} that contain key-value pairs.
Working with MongoDB (JSON)
• A single JSON object can contain many key:value pairs separated by commas.
• A simple JSON document to store data on a book might look like this:
{_id: 101, title: "Database Systems"}
• This document contains two key:value pairs:
o _id is a key with 101 as the associated value
o title is a key with “Database Systems” as the associated value

• The value component may have multiple values that would be appropriate for a given key
• When there are multiple values for a single key, an array is used.
• Arrays in JSON are placed inside square brackets []. For example, the above document
could be expanded to:
{_id: 101, title: "Database Systems", author: ["Coronel", "Morris"]}
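The book document can be checked with a short sketch in plain JavaScript. One detail worth noting: the examples above write keys without quotes, which the MongoDB shell accepts because it reads them as JavaScript object literals, whereas strict JSON text requires keys in double quotes. The variable names below are chosen for this illustration only.

```javascript
// Strict JSON text: keys and string values are in double quotes.
const text = '{"_id": 101, "title": "Database Systems", "author": ["Coronel", "Morris"]}';

// Parse the text into an object and read its key:value pairs.
const book = JSON.parse(text);
const title = book.title;
const authorCount = book.author.length; // the array holds multiple values for one key
const firstAuthor = book.author[0];

// The unquoted-key form is a valid JavaScript object literal but not strict JSON.
let unquotedKeysRejected = false;
try {
  JSON.parse('{_id: 101, title: "Database Systems"}');
} catch (err) {
  unquotedKeysRejected = err instanceof SyntaxError;
}

console.log(title, authorCount, firstAuthor, unquotedKeysRejected);
```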
Embedded documents
• Objects can also have other objects embedded as a value.
• Consider another simple document with data about a publisher that is
related to the book in the previous example.
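A minimal sketch of such a document follows. The publisher name and city are hypothetical values invented for illustration, and getPath is a small helper written for this example; it mimics the dot notation that MongoDB uses to address fields of embedded documents, but it is not a MongoDB function.

```javascript
// Book document with the publisher embedded as an object value.
const book = {
  _id: 101,
  title: "Database Systems",
  author: ["Coronel", "Morris"],
  publisher: { name: "Example Press", city: "Boston" }, // hypothetical publisher data
};

// Hypothetical helper: resolve a dotted path such as "publisher.city" inside a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const publisherName = getPath(book, "publisher.name");
const publisherCity = getPath(book, "publisher.city");
const missingField = getPath(book, "publisher.country");

console.log(publisherName, publisherCity, missingField);
```

Everything needed to display the book, including its publisher, comes back in a single read of one document.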
Embedded documents
• In a relational environment, we would have used a BOOK table and a
PUBLISHER table with a 1:M relationship.
• In a document database, the publisher data can instead be embedded in each book document.
Although this increases redundancy, NoSQL databases often accept
redundancy to improve scalability.
• With document databases, we are attempting to avoid the need for joins,
making documents independent of each other so they can be easily scaled out
to many computers in a cluster.
Creating Databases and Collections in MongoDB
• MongoDB databases comprise collections of documents.
• Each MongoDB server can host many databases.
• A database object contains collections. Collections are also objects.
Collection objects contain document objects.
• In addition to holding data content, an object can also have methods,
which are programmed functions for manipulating the object.
• MongoDB has two versions of the command-line MongoDB shell (the legacy mongo shell
and the newer mongosh) and a graphical interface called MongoDB Compass.
• A list of the databases available on the server can be retrieved with the
command:
show dbs
• The following command switches to a database named demo. MongoDB creates the
database when data is first stored in it:
use demo
• Using the createCollection() method with the db variable creates a
collection with the specified name. The following command creates a
"newproducts" collection inside the demo database:
db.createCollection("newproducts")
Inserting Documents in MongoDB

• To insert a single document:
db.<collection name>.insertOne({document})

• To insert several documents at once:
db.<collection name>.insertMany([{document1}, {document2}, {document3}])
• The following command displays all of the documents in the products
collection:
db.products.find()
