
PG Program in Cloud Computing

Big Data Notes: RDBMS, NoSQL and DynamoDB

Big data is everywhere: online retail, social media, conglomerates, and more.

● Used in analyzing user needs and giving better recommendations.
● Used in analyzing the performance of a sports player, preparing strategy and making game predictions.
● Used in financial institutions: picking up news and trends, algorithmic trading, and price prediction in real estate.
How big is Big Data? Petabytes (1 petabyte = 1 million gigabytes)
10 billion photos ~ 1.5 petabytes
VVV - Volume | Variety | Velocity

Python alone is slow when it comes to big data processing, hence tools like Hadoop (HDFS and MapReduce), Hive (SQL-like queries over HDFS) and Spark (PySpark) are used, as in the sketch below.
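A minimal PySpark sketch of the kind of distributed processing these tools enable; the application name and the input file "sales.csv" are illustrative placeholders, and it assumes PySpark is installed.

from pyspark.sql import SparkSession

# Start a Spark session (on a cluster this would be backed by YARN/HDFS;
# locally it runs in-process).
spark = SparkSession.builder.appName("big-data-demo").getOrCreate()

# Read a CSV into a distributed DataFrame and run a simple aggregation.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)  # placeholder path
df.groupBy("region").count().show()

spark.stop()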
A large amount of physical hardware is required for big data, and setting those resources up initially costs a lot. Cloud services avoid that up-front cost with the pay-per-gigabyte and pay-per-compute-hour model, but one should still be careful about spend when using the cloud for big data.
Big data is significantly helpful in large-scale analysis, e.g. analyzing the data of all smart-car users or purchase trends from shopping complexes; this is done to increase profit and productivity.

Applications are evolving, and so are the databases.

RDBMS
● The data is present in separate categorical tables, while a linking column relates the tables. Linking or joining tables on the foreign key (the linking column) is done through the SELECT query fired at the database whenever details from multiple tables are needed. Data from multiple tables is joined to create a 'result set', which is consumed by the application or the user (see the sketch below).
● But the join operation happens every time the query is executed: if the query is run 2 million times, the joining is done 2 million times.
● If the number of tables is high, the number of joins in a single query can also be very high.
● So fetching data within a few seconds from such a large join becomes time consuming.

NoSQL
● The size of data stored is increasing at a very high velocity.

● The data types and formats are also varied and increasing.

CAP:-
Consistency - written data is readable immediately after the write operation finishes.
Availability - the ability to keep serving requests despite downtime of some database servers.
Partition tolerance - data, once stored, should remain available even when parts of the cluster cannot communicate with each other.

Master-slave architectures are CP (Consistent and Partition tolerant).


Masterless architectures are AP (Available and Partition tolerant).

In RDBMS, the data is mapped properly (normalized), and any kind of query can be fired to fetch data across many hundreds of tables.
In NoSQL, you reverse this mechanism: you plan the queries (and their fetch time) up front and design the tables around those access patterns.
DynamoDB is very similar to Cassandra.

DynamoDB
It is a NoSQL, key-value type database.
Main concepts: Tables, Indices, Global Tables.
Interaction - insert, delete and edit data.

● Hotspot - a sudden rise in incoming requests for one particular partition.
● Partitions split when the read load on them grows.
● The partition key and sort key combined together uniquely identify one record (see the sketch below).
● Global secondary index - the partition (row) key and the sort key can both differ from the table's keys.
● Local secondary index - the partition key must be the same as the table's; only the sort key can differ.
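A hedged boto3 sketch of how the partition key and sort key identify items; the table name "GameScores", the region and the attribute names are illustrative assumptions.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("GameScores")  # assumed existing table

# Partition key + sort key together identify exactly one item.
response = table.get_item(Key={"player_id": "p-100", "game_date": "2021-05-01"})
print(response.get("Item"))

# A query returns all items under one partition key, optionally narrowed by the sort key.
response = table.query(KeyConditionExpression=Key("player_id").eq("p-100"))
print(response["Items"])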

One table cannot have all the data. Aim to retrieve data with a minimum number of tables, with the data flattened out (denormalized) as much as possible, preferably data that does not change often.

Read capacity unit (RCU) - 1 strongly consistent read of up to 4 KB per second, or 2 eventually consistent reads.
Write capacity unit (WCU) - 1 write of up to 1 KB per second.
So a strongly consistent read of 20 KB per second needs 20 KB / 4 KB = 5 RCU (see the helper below).
(Each partition supports at most 3000 RCU and 1000 WCU.)
(If a partition grows beyond 10 GB, the partition will split.)
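A small back-of-the-envelope helper for the capacity arithmetic above; the item sizes and request rates are illustrative.

import math

def rcu_needed(item_size_kb, reads_per_sec, strongly_consistent=True):
    # 1 RCU = one strongly consistent read of up to 4 KB per second.
    units_per_read = math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        units_per_read = math.ceil(units_per_read / 2)
    return units_per_read * reads_per_sec

def wcu_needed(item_size_kb, writes_per_sec):
    # 1 WCU = one write of up to 1 KB per second.
    return math.ceil(item_size_kb / 1) * writes_per_sec

print(rcu_needed(20, 1))                             # 20 KB strongly consistent -> 5 RCU
print(rcu_needed(20, 1, strongly_consistent=False))  # -> 3 RCU (rounded up)
print(wcu_needed(20, 1))                             # -> 20 WCU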

Auto scaling - you can configure a target utilization percentage of the provisioned RCU/WCU; once consumption crosses it, more capacity is added (see the sketch below).
At the initial stage, the load should be monitored and the default read units should be sized correctly.
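A hedged sketch of configuring read auto scaling through the Application Auto Scaling API with boto3; the table name, capacity limits and the 70% target utilization are illustrative assumptions.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",          # assumed table name
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Scale out once ~70% of the provisioned RCU is consumed.
autoscaling.put_scaling_policy(
    PolicyName="GameScoresReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)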

DynamoDB Streams

Global tables
● Global tables are a managed solution for deploying a multi-region, multi-master database without having to build and maintain your own replication solution.
● Ideal for massively scaled applications with globally dispersed users.
● A global table is a collection of one or more replica tables, all owned by a single AWS account.
● Each replica stores the same set of data items; any given global table can have only one replica table per region (see the sketch below).
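A hedged boto3 sketch of creating a global table (the 2017.11.29 version of the feature); it assumes a table named "Users" already exists in both regions with DynamoDB Streams enabled.

import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

# Join the existing regional replicas into one global table.
client.create_global_table(
    GlobalTableName="Users",
    ReplicationGroup=[
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
)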

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html


DynamoDB Accelerator (DAX)
DAX is a pass-through layer, a cluster of cache nodes in front of DynamoDB.
So before a request hits DynamoDB, the data is checked in this cache layer; if the data is not present there, the query is sent on to DynamoDB (see the sketch below).
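A hedged sketch of reading through DAX with the amazon-dax-client Python package; the cluster endpoint, table and key are placeholders, and the exact client constructor arguments may differ by package version.

import boto3
from amazondax import AmazonDaxClient

session = boto3.Session(region_name="us-east-1")

# The DAX client exposes the same low-level API shape as the plain DynamoDB client.
dax = AmazonDaxClient(
    session,
    endpoints=["my-dax-cluster.xxxxxx.dax-clusters.us-east-1.amazonaws.com:8111"],  # placeholder
)

# Cache hit: served by the DAX cluster. Cache miss: passed through to DynamoDB.
item = dax.get_item(
    TableName="GameScores",
    Key={"player_id": {"S": "p-100"}, "game_date": {"S": "2021-05-01"}},
)
print(item.get("Item"))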

© 2013 - 2021 Great Lakes E-Learning Services Pvt. Ltd. All rights reserved
