Big Data - RDBMS, NoSQL and DynamoDB
RDBMS
● The data is stored in separate tables, one per category, and a linking column (the foreign key) relates the tables.
Tables are joined on the foreign key by the SELECT query that we send to the database
whenever we need details from multiple tables.
Data from multiple tables can thus be joined to create a ‘result set’, which is consumed by the
application or the user.
● But the join operation happens every time the query is executed. For example, if the query is run 2
million times, the join is performed 2 million times.
● If the number of tables is high, the number of joins can itself be very high.
● So fetching the data from such a large join within a few seconds becomes difficult.
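The repeated join described above can be sketched with Python's built-in sqlite3 module. The `customers` and `orders` tables, their columns, and the sample rows are assumptions made for illustration only; they are not part of these notes.

```python
import sqlite3

# Hypothetical schema: two tables related by a linking (foreign key) column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'phone');
""")

JOIN_QUERY = """
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ?
    ORDER BY o.id
"""

def fetch_orders(customer_id):
    # The database performs the join again on every call to this function.
    return conn.execute(JOIN_QUERY, (customer_id,)).fetchall()

result_set = fetch_orders(1)  # rows joined from both tables
```

Calling `fetch_orders` two million times means two million joins, which is the cost the bullets above point at.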
NoSQL
● The volume of stored data is increasing at a very high velocity.
© 2013 - 2021 Great Lakes E-Learning Services Pvt. Ltd. All rights reserved
PG Program in Cloud Computing
● The data types and formats are also increasingly varied.
CAP:
Consistency: written data is readable immediately after the write operation finishes
Availability: the ability to keep serving requests when a database server goes down
Partition tolerance: the system keeps working, and stored data stays reachable, even when the network between database nodes is partitioned
In RDBMS, the data is mapped out properly, and any kind of query can be fired to fetch data across
hundreds of tables.
In NoSQL, you have to reverse this approach: plan up front how the data will be fetched, and design the tables around those queries.
DynamoDB is very similar to Cassandra.
DynamoDB
It is a NoSQL database of the key-value type.
Tables
Indices
Global Tables
Interaction - insert, delete, and edit data
One table cannot always hold all the data. Design for retrieval from a minimum number of tables, with
the data flattened out as much as possible, especially data that does not change often.
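As a rough illustration of flattening, the sketch below copies rarely changing customer fields into each order item so that one key lookup replaces a join. The data, the field names, and the choice of `(customer_id, order_id)` as the key are all hypothetical; plain dictionaries stand in for tables.

```python
# Hypothetical normalized data, as it might sit in two relational tables.
customers = {1: {"name": "Asha", "city": "Chennai"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "laptop"},
    {"order_id": 11, "customer_id": 1, "item": "mouse"},
]

def flatten(customers, orders):
    """Build one self-contained item per order, copying in customer fields."""
    items = {}
    for order in orders:
        customer = customers[order["customer_id"]]
        # Assumed key layout: customer_id first, order_id second.
        key = (order["customer_id"], order["order_id"])
        items[key] = {
            "customer_id": order["customer_id"],
            "order_id": order["order_id"],
            "item": order["item"],
            # Duplicated on purpose: these values rarely change.
            "customer_name": customer["name"],
            "customer_city": customer["city"],
        }
    return items

flat_table = flatten(customers, orders)
item = flat_table[(1, 11)]  # a single key lookup, no join at read time
```

The trade-off is duplication: if a copied field did change often, every item carrying it would need rewriting, which is why the notes restrict this to data that does not change often.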
Read capacity unit (RCU) - 1 strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4 KB.
Write capacity unit (WCU) - 1 write per second for an item up to 1 KB.
So a strongly consistent read of a 20 KB item needs 20 KB / 4 KB = 5 RCU.
(A partition supports up to 3000 RCU and 1000 WCU.)
(If a partition grows beyond 10 GB, the partition will split.)
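The capacity arithmetic above can be written as a small calculator. This is a minimal sketch: the function names are made up for illustration, and `estimate_partitions` is only a rough estimate derived from the per-partition figures in these notes, since the service manages partitions itself.

```python
import math

RCU_ITEM_KB = 4   # one RCU covers a read of up to 4 KB
WCU_ITEM_KB = 1   # one WCU covers a write of up to 1 KB

def read_capacity_units(item_kb, reads_per_second=1, strongly_consistent=True):
    """RCUs needed; item size is rounded up to the next 4 KB step."""
    units = math.ceil(item_kb / RCU_ITEM_KB) * reads_per_second
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)

def write_capacity_units(item_kb, writes_per_second=1):
    """WCUs needed; item size is rounded up to the next 1 KB step."""
    return math.ceil(item_kb / WCU_ITEM_KB) * writes_per_second

def estimate_partitions(rcu, wcu, table_size_gb):
    """Rough estimate only, based on 3000 RCU / 1000 WCU / 10 GB per partition."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    return max(by_throughput, by_size, 1)

example_rcu = read_capacity_units(20)  # the 20 KB example from the notes
```

For the 20 KB item, `read_capacity_units(20)` gives 5, matching the worked figure; with `strongly_consistent=False` the 2.5 units round up to 3.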
Autoscaling - you can configure a target utilization percentage of the provisioned RCU; when consumption goes beyond it, more capacity is added.
At the initial stage, the load should be monitored and the default read capacity should be set accordingly.
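The idea of a utilization percentage can be illustrated with a simplified model: pick the provisioned capacity at which the observed consumption would sit at the target percentage. The function name and its `min_units` / `max_units` parameters are assumptions for this sketch; the real autoscaling service reacts to monitored metrics over time and is not reproduced here.

```python
import math

def desired_capacity(consumed_units, target_utilization_pct,
                     min_units=1, max_units=None):
    """Capacity at which consumption equals the target utilization (simplified)."""
    desired = math.ceil(consumed_units * 100 / target_utilization_pct)
    desired = max(desired, min_units)      # never go below the configured floor
    if max_units is not None:
        desired = min(desired, max_units)  # never exceed the configured ceiling
    return desired

# Example: 90 RCU consumed with a 70% target suggests about 129 provisioned RCU.
suggested = desired_capacity(90, 70)
```

Monitoring the load first, as the notes advise, is what supplies a realistic `consumed_units` value and sensible minimum and maximum bounds.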
DynamoDB Streams
Global tables
Global tables are a managed solution for deploying a multi-region, multi-master database, without
having to build and maintain your own replication solution.
Ideal for massively scaled applications with globally dispersed users.
A global table is a collection of one or more replica tables, all owned by a single AWS account.
Each replica stores the same set of data items. Any given global table can only have one replica
table per region.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
DynamoDB Accelerator (DAX)
DAX is a pass-through layer: a cluster of cache nodes that sits in front of DynamoDB.
Before a read hits DynamoDB, the data is looked up in the cache layer. If the data is not
present in this layer, the query is sent to DynamoDB.
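The lookup order described above (cache first, database on a miss) can be sketched as follows. This is not the DAX client API: `ReadThroughCache` is a hypothetical class, a plain dictionary stands in for the DynamoDB table, and expiry and eviction are left out.

```python
class ReadThroughCache:
    """Toy pass-through cache: check the cache, fall back to the store on a miss."""

    def __init__(self, backing_store):
        self._store = backing_store  # stands in for the DynamoDB table
        self._cache = {}             # stands in for the cache cluster
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: forward the read to the backing store.
        self.misses += 1
        value = self._store.get(key)
        if value is not None:
            self._cache[key] = value  # keep it for later reads
        return value

table = {"user#1": {"name": "Asha"}}  # hypothetical stored item
cache = ReadThroughCache(table)
first = cache.get("user#1")   # miss: served by the backing store
second = cache.get("user#1")  # hit: served by the cache
```

Only the first read reaches the backing store; repeated reads of the same key are answered from the cache layer.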