
PG Program in Cloud Computing

Big Data Notes: RDBMS, NoSQL and DynamoDB

Big data is everywhere: online retail, social media, conglomerates, and more.

● Used in analyzing user needs and giving better recommendations.
● Used in analyzing the performance of a sports player, preparing strategy and making game predictions.
● Used in financial institutions: picking up news and trends, algorithmic trading, and price prediction in real estate.
How big is Big Data? Petabytes (1 petabyte = 1 million gigabytes)
10 billion photos ~ 1.5 petabytes
VVV - Volume | Variety | Velocity

Python alone is slow when it comes to big data processing, hence tools like Hadoop (HDFS and MapReduce), Hive (SQL-like queries over HDFS) and Spark (PySpark) are used, as in the sketch below.
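A minimal PySpark sketch of the kind of distributed processing these tools enable; the application name and the input file "sales.csv" are illustrative placeholders, and it assumes PySpark is installed.

from pyspark.sql import SparkSession

# Start a Spark session (on a cluster this would be backed by YARN/HDFS;
# locally it runs in-process).
spark = SparkSession.builder.appName("big-data-demo").getOrCreate()

# Read a CSV into a distributed DataFrame and run a simple aggregation.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)  # placeholder path
df.groupBy("region").count().show()

spark.stop()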
A large amount of physical hardware is required for big data, and setting those resources up initially costs a lot. Cloud services avoid that up-front cost with the pay-per-gigabyte and pay-per-compute-hour model, but one should still be careful about spend when using the cloud for big data.
Big data is significantly helpful in large-scale analysis, e.g. analyzing the data of all smart-car users or purchase trends from shopping complexes; this is done to increase profit and productivity.

Applications are evolving, and so are the databases.

RDBMS
● The data is present in separate categorical tables, while a linking column relates the tables. Linking or joining tables on the foreign key (the linking column) is done through the SELECT query fired at the database whenever details from multiple tables are needed. Data from multiple tables is joined to create a 'result set', which is consumed by the application or the user (see the sketch below).
● But the join operation happens every time the query is executed: if the query is run 2 million times, the joining is done 2 million times.
● If the number of tables is high, the number of joins in a single query can also be very high.
● So fetching data within a few seconds from such a large join becomes time consuming.

NoSQL
● The size of data stored is increasing at a very high velocity.

● The data types and formats are also varied and increasing.

CAP:-
Consistency - written data is readable immediately after the write operation finishes.
Availability - the ability to keep serving requests despite downtime of some database servers.
Partition tolerance - data, once stored, should remain available even when parts of the cluster cannot communicate with each other.

Master-slave architectures are CP (Consistent and Partition tolerant).


Masterless architectures are AP (Available and Partition tolerant).

In RDBMS, the data is mapped properly (normalized), and any kind of query can be fired to fetch data across many hundreds of tables.
In NoSQL, you reverse this mechanism: you plan the queries (and their fetch time) up front and design the tables around those access patterns.
DynamoDB is very similar to Cassandra.

DynamoDB
It is a NoSQL, key-value type database.
Main concepts: Tables, Indices, Global Tables.
Interaction - insert, delete and edit data.

● Hotspot - a sudden rise in incoming requests for one particular partition.
● Partitions split when the read load on them grows.
● The partition key and sort key combined together uniquely identify one record (see the sketch below).
● Global secondary index - the partition (row) key and the sort key can both differ from the table's keys.
● Local secondary index - the partition key must be the same as the table's; only the sort key can differ.
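A hedged boto3 sketch of how the partition key and sort key identify items; the table name "GameScores", the region and the attribute names are illustrative assumptions.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("GameScores")  # assumed existing table

# Partition key + sort key together identify exactly one item.
response = table.get_item(Key={"player_id": "p-100", "game_date": "2021-05-01"})
print(response.get("Item"))

# A query returns all items under one partition key, optionally narrowed by the sort key.
response = table.query(KeyConditionExpression=Key("player_id").eq("p-100"))
print(response["Items"])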

One table cannot have all the data. Aim to retrieve data with a minimum number of tables, with the data flattened out (denormalized) as much as possible, preferably data that does not change often.

Read capacity unit (RCU) - 1 strongly consistent read of up to 4 KB per second, or 2 eventually consistent reads.
Write capacity unit (WCU) - 1 write of up to 1 KB per second.
So a strongly consistent read of 20 KB per second needs 20 KB / 4 KB = 5 RCU (see the helper below).
(Each partition supports at most 3000 RCU and 1000 WCU.)
(If a partition grows beyond 10 GB, the partition will split.)
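A small back-of-the-envelope helper for the capacity arithmetic above; the item sizes and request rates are illustrative.

import math

def rcu_needed(item_size_kb, reads_per_sec, strongly_consistent=True):
    # 1 RCU = one strongly consistent read of up to 4 KB per second.
    units_per_read = math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        units_per_read = math.ceil(units_per_read / 2)
    return units_per_read * reads_per_sec

def wcu_needed(item_size_kb, writes_per_sec):
    # 1 WCU = one write of up to 1 KB per second.
    return math.ceil(item_size_kb / 1) * writes_per_sec

print(rcu_needed(20, 1))                             # 20 KB strongly consistent -> 5 RCU
print(rcu_needed(20, 1, strongly_consistent=False))  # -> 3 RCU (rounded up)
print(wcu_needed(20, 1))                             # -> 20 WCU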

Auto scaling - you can configure a target utilization percentage of the provisioned RCU/WCU; once consumption crosses it, more capacity is added (see the sketch below).
At the initial stage, the load should be monitored and the default read units should be sized correctly.
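A hedged sketch of configuring read auto scaling through the Application Auto Scaling API with boto3; the table name, capacity limits and the 70% target utilization are illustrative assumptions.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",          # assumed table name
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Scale out once ~70% of the provisioned RCU is consumed.
autoscaling.put_scaling_policy(
    PolicyName="GameScoresReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)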

DynamoDB Streams

Global tables
● Global tables are a managed solution for deploying a multi-region, multi-master database without having to build and maintain your own replication solution.
● Ideal for massively scaled applications with globally dispersed users.
● A global table is a collection of one or more replica tables, all owned by a single AWS account.
● Each replica stores the same set of data items; any given global table can have only one replica table per region (see the sketch below).
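A hedged boto3 sketch of creating a global table (the 2017.11.29 version of the feature); it assumes a table named "Users" already exists in both regions with DynamoDB Streams enabled.

import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

# Join the existing regional replicas into one global table.
client.create_global_table(
    GlobalTableName="Users",
    ReplicationGroup=[
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
)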

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html


DynamoDB Accelerator (DAX)
DAX is a pass-through layer, a cluster of cache nodes in front of DynamoDB.
So before a request hits DynamoDB, the data is checked in this cache layer; if the data is not present there, the query is sent on to DynamoDB (see the sketch below).
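A hedged sketch of reading through DAX with the amazon-dax-client Python package; the cluster endpoint, table and key are placeholders, and the exact client constructor arguments may differ by package version.

import boto3
from amazondax import AmazonDaxClient

session = boto3.Session(region_name="us-east-1")

# The DAX client exposes the same low-level API shape as the plain DynamoDB client.
dax = AmazonDaxClient(
    session,
    endpoints=["my-dax-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com:8111"],  # placeholder
)

# Cache hit: served by the DAX cluster. Cache miss: passed through to DynamoDB.
item = dax.get_item(
    TableName="GameScores",
    Key={"player_id": {"S": "p-100"}, "game_date": {"S": "2021-05-01"}},
)
print(item.get("Item"))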

© 2013 - 2021 Great Lakes E-Learning Services Pvt. Ltd. All rights reserved
