BDA Module-3

The document discusses the CAP Theorem, which states that a distributed data system cannot simultaneously provide Consistency, Availability, and Partition Tolerance, emphasizing the trade-offs in database design. It also covers NoSQL database architecture, highlighting its flexibility, scalability, and various data models, including key-value, document, column, and graph databases. Additionally, it contrasts NoSQL with traditional SQL databases and outlines strategies for handling big data challenges, such as even distribution, replication, and efficient query execution.

Uploaded by

laxmishetti1

BDA MODULE 3

1. CAP THEOREM

The CAP Theorem states that a distributed data system cannot simultaneously provide Consistency (C),
Availability (A), and Partition Tolerance (P). A distributed system can guarantee at most two of these three
properties. The theorem is central to understanding trade-offs in distributed database design: an application
service has to decide which two of C, A, and P it relies on.
• Consistency means all copies hold the same value, as in a traditional database.
• Availability means at least one copy remains reachable when a partition occurs or a node fails.
• Partition means the parts of the system are active but may be unable to cooperate (share data), as can
happen in distributed databases.
1. Consistency in distributed database:
a. All nodes observe the same data at the same time.
b. Operations in one partition of the database should reflect in other related partitions in a distributed
database.
c. Example: Operations that change the sales data from a specific showroom in a table should also
reflect in related tables using that sales data.
2. Availability:
a. During transactions, field values must be available in other partitions of the database.
b. Each request receives a response on success or failure.
c. Replication ensures availability.
3. Partition:
a. Partitioning: Division of a large database into separate databases, following specified procedures,
so that operations on them are not affected.
b. Partition Tolerance: Refers to the continuation of operations as a whole even in case of message
loss, node failure, or a node not being reachable.

Brewer's CAP Theorem

The CAP theorem demonstrates that any distributed system cannot guarantee Consistency (C), Availability (A),
and Partition Tolerance (P) together.
1. Consistency: All nodes observe the same data at the same time.
2. Availability: Each request receives a response on success or failure.
3. Partition Tolerance: The system continues to operate as a whole even in case of message loss, node
failure, or a node not being reachable.
When a network failure (partition) occurs, the choice is between:
• The database must answer, even though the answer may be old or wrong data (AP).
• The database should not answer unless it has the latest copy of the data (CP).
The CAP theorem therefore implies that, once a network partition occurs, consistency and availability are
mutually exclusive choices. The three possible pairings are:
• CA: Consistency and Availability.
• AP: Availability and Partition Tolerance.
• CP: Consistency and Partition Tolerance.
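
The AP versus CP choice described above can be illustrated with a toy replica. This is only a sketch under stated assumptions: the `Replica` class, its `mode` labels, and the `partitioned` flag are hypothetical names invented for illustration, not the behaviour of any particular database.

```python
class Replica:
    """A toy replica that may be cut off from the primary by a partition."""

    def __init__(self, mode):
        self.mode = mode          # "AP" or "CP" (hypothetical labels)
        self.value = "v1"         # last value this replica received
        self.partitioned = False  # True while the primary is unreachable

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP (or no partition): answer with whatever copy is held locally.
        return self.value


ap, cp = Replica("AP"), Replica("CP")
# Assume the primary has moved on to "v2", but a partition
# prevents that update from reaching either replica.
ap.partitioned = cp.partitioned = True

ap_answer = ap.read()        # available, but possibly stale
try:
    cp.read()
    cp_answered = True
except RuntimeError:
    cp_answered = False      # consistent, but unavailable
```

The AP replica keeps answering with its old copy, while the CP replica gives up availability until it can confirm the latest value.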

2. NOSQL DATA ARCHITECTURE PATTERN, CHARACTERISTICS, TRANSACTIONS AND SOLUTIONS

Definition: NoSQL (Not Only SQL) databases are non-relational databases designed to handle large-scale data
with flexibility, scalability, and high performance. They support diverse data models like key-value, document,
column-family, and graph, making them ideal for semi-structured or unstructured data in modern applications.
NoSQL databases are used to manage and store big data efficiently and flexibly. They organize data into logical
patterns that help in storing, retrieving, and managing data effectively. The following are the primary data
architecture patterns in NoSQL:

i. Key-Value Store Database

This is one of the simplest models in NoSQL databases. Data is stored as Key-Value Pairs, where:
• Key: A unique identifier (strings, integers, or characters).
• Value: Linked to the key and can be of any data type (e.g., JSON, BLOB, strings).
Typically uses hash tables to store key-value pairs.

• Applications: Commonly used in shopping websites or e-commerce platforms.

• Advantages:
o Can handle large data volumes and heavy loads.
o Fast and easy retrieval of data using keys.
• Examples: DynamoDB, Berkeley DB
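
A minimal sketch of the key-value model, assuming an in-memory store backed by a Python `dict` (itself a hash table). The `KeyValueStore` class and the key names are hypothetical and only illustrate the put/get-by-key access pattern; they are not the API of DynamoDB or Berkeley DB.

```python
class KeyValueStore:
    """In-memory key-value store; the dict underneath is a hash table."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # The value is opaque to the store: any type is accepted.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
# A shopping-cart style usage, with a JSON-like value and a binary value.
store.put("cart:42", {"items": ["pen", "notebook"], "total": 7.5})
store.put("session:abc", b"\x00\x01")
cart = store.get("cart:42")
```

Because every lookup goes through the key, the store never needs to inspect the value, which is what keeps retrieval simple and fast.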

ii. Column Store Database

o Stores data in individual cells grouped into columns.
o Unlike relational databases, data is stored column-wise rather than row-wise.
o Columns can differ in format and titles across rows.
• Applications: Suitable for analytical operations like SUM, AVERAGE, and COUNT.
• Advantages:
o Data is readily available for column-specific queries.
o Optimized for queries on large datasets.
• Examples: HBase, Bigtable by Google, Cassandra
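
To see why a column-wise layout suits SUM, AVERAGE, and COUNT, the following sketch converts a few row-wise records into per-column lists and aggregates one column at a time. The sample records and field names are made up for illustration.

```python
# Row-wise layout: each record holds all of its fields together.
rows = [
    {"showroom": "A", "units": 3, "revenue": 300},
    {"showroom": "B", "units": 5, "revenue": 450},
    {"showroom": "A", "units": 2, "revenue": 250},
]

# Column-wise layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate touches only the column it needs.
total_revenue = sum(columns["revenue"])                         # SUM
average_units = sum(columns["units"]) / len(columns["units"])   # AVERAGE
row_count = len(columns["showroom"])                            # COUNT
```

In the row-wise form, computing the same sum would require reading every field of every record; in the column-wise form only the `revenue` list is scanned.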

iii. Document Database

o Stores data as key-value pairs, but the values are called Documents.
o Documents are complex data structures (e.g., JSON, XML, text, arrays).
o Nested documents are commonly used.
• Applications: Ideal for managing semi-structured data such as JSON files.
• Advantages:
o Suitable for handling unstructured and semi-structured data.
o Easy storage, retrieval, and management of documents.
• Examples: MongoDB, CouchDB
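
The document model can be sketched with nested Python dictionaries standing in for JSON documents. The `collection` contents and the `find` helper below are hypothetical; real document databases such as MongoDB or CouchDB have their own query interfaces, which this does not reproduce.

```python
import json

# Documents are keyed by id; their shapes may differ (order-2 has a coupon).
collection = {
    "order-1": {
        "customer": {"name": "Asha", "city": "Pune"},
        "items": [{"sku": "P1", "qty": 2}],
    },
    "order-2": {
        "customer": {"name": "Ravi", "city": "Mysuru"},
        "items": [{"sku": "P2", "qty": 1}],
        "coupon": "NEW10",
    },
}


def find(docs, path, expected):
    """Return ids of documents whose nested field at dotted `path` equals `expected`."""
    matches = []
    for doc_id, doc in docs.items():
        node = doc
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if node == expected:
            matches.append(doc_id)
    return matches


pune_orders = find(collection, "customer.city", "Pune")
serialized = json.dumps(collection["order-2"])  # documents serialize directly to JSON
```

The query reaches into a nested sub-document without any schema having been declared, which is the flexibility the document model is valued for.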

iv. Graph Database

o Stores data in the form of graphs, where:
▪ Nodes: Represent entities or objects.
▪ Edges: Represent relationships between nodes and are uniquely identified.
• Applications: Useful for handling data with complex relationships, such as social networks.
• Advantages:
o Fast data traversal due to the connected structure.
o Suitable for managing spatial data.
• Examples: Neo4J, FlockDB (used by Twitter)
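
As a rough illustration of nodes, uniquely identified edges, and traversal, the sketch below stores a small "follows" graph as an adjacency list and walks it breadth-first. The edge ids, node names, and the `within_hops` helper are assumptions made for this example, not a graph database API.

```python
from collections import deque

# Each edge has its own id, a source node, a relationship label, and a target node.
edges = [
    ("e1", "alice", "FOLLOWS", "bob"),
    ("e2", "bob", "FOLLOWS", "carol"),
    ("e3", "alice", "FOLLOWS", "dave"),
    ("e4", "carol", "FOLLOWS", "erin"),
]

# Adjacency list: node -> list of nodes it points to.
adjacency = {}
for _edge_id, src, _label, dst in edges:
    adjacency.setdefault(src, []).append(dst)


def within_hops(start, max_hops):
    """Breadth-first traversal: nodes reachable from `start` in at most `max_hops` edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(n for n in distance if n != start)


two_hops = within_hops("alice", 2)
```

A "friends of friends" question like this follows stored relationships directly instead of joining tables, which is where the fast traversal comes from.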

Features/ Characteristics:

1. Schema Flexibility: NoSQL databases do not require a fixed schema, allowing for dynamic addition and
modification of fields. This makes them suitable for handling unstructured and semi-structured data.
2. Scalability: Supports horizontal scaling, meaning data can be distributed across multiple servers (nodes)
to handle large-scale operations efficiently.
3. Auto Sharding: Automatically partitions large datasets across multiple servers, ensuring efficient load
distribution and improved query performance.
4. Replication: Provides data redundancy by replicating data across multiple nodes, ensuring high
availability and fault tolerance.
5. Integrated Caching: Built-in caching mechanisms reduce latency and improve data retrieval speed.
6. High Performance: Optimized for fast read and write operations, making them ideal for real-time
analytics and applications.
7. Distributed Architecture: Operates on distributed systems, making it possible to store and process
massive amounts of data across geographically dispersed nodes.
8. Handles Big Data: Designed to store and manage large volumes of data, including unstructured and semi-
structured formats like JSON, XML, and binary data (BLOBs).
9. Semi-Structured Data Support: Can handle irregular or flexible data formats, making them versatile for
modern applications that deal with dynamic data.
10. CAP Theorem Trade-offs: Each NoSQL database prioritizes a particular combination of Consistency,
Availability, and Partition Tolerance, depending on the use case, as per Brewer's CAP theorem.

Big Data NoSQL Transactions

NoSQL transactions differ significantly from traditional SQL-based systems, as they prioritize scalability and
flexibility over strict ACID compliance. Key features include:

1. Relaxation of ACID Properties: NoSQL databases often relax one or more ACID properties (Atomicity,
Consistency, Isolation, Durability) to enhance scalability and performance.
2. CAP Theorem: NoSQL databases are characterized by two out of three CAP properties: Consistency,
Availability, and Partition Tolerance, depending on the use case.
3. BASE Model: Transactions in NoSQL follow the BASE properties: Basically Available, Soft state, and
Eventual consistency, emphasizing availability and scalability over strict consistency.
4. Atomicity in Operations: While multi-document or multi-collection transactions are limited, atomicity
is often maintained within a single document or key-value pair.
5. Consistency: Transactions ensure eventual consistency, meaning all data replicas will synchronize over
time, but immediate consistency across all nodes is not guaranteed.
6. Isolation: Transactions are isolated from one another, ensuring that incomplete operations do not interfere
with others in the system.
7. Durability: Changes made during transactions are durable and persist even in the event of a system failure,
though the mechanism might vary from SQL databases.
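
The "soft state" and "eventual consistency" parts of BASE can be sketched with two replicas that synchronize periodically. This is a simplified model under stated assumptions: the `Replica` class, the explicit version numbers, and the `anti_entropy` function (a newest-version-wins sync) are hypothetical, and real systems use more elaborate mechanisms.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what is already held.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Push every replica's entries to every other replica; the newest version wins."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("stock:P1", 10, version=1)
anti_entropy([r1, r2])               # both replicas now agree on 10

r1.write("stock:P1", 9, version=2)   # update lands on one replica only
stale_read = r2.read("stock:P1")     # soft state: r2 still answers with 10

anti_entropy([r1, r2])               # replicas synchronize over time
converged_read = r2.read("stock:P1") # eventual consistency: r2 now sees 9
```

Between the write and the next synchronization, the system stays available but a reader may see the older value; after synchronization all replicas converge.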

Big Data NoSQL Solutions

NoSQL databases offer scalable and cost-effective solutions to handle the challenges of big data. Key
characteristics include:

1. High and Easy Scalability: Horizontal scalability allows adding new nodes to expand capacity, making
it suitable for terabytes and petabytes of data.
2. Replication: Data is replicated across multiple nodes, ensuring high availability, fault tolerance, and
reliability.
3. Distributed Shards: Data is partitioned into shards and distributed across clusters, improving
performance and throughput.
4. Cost-Effectiveness: NoSQL databases use inexpensive, open-source tools and commodity hardware,
reducing implementation and operational costs.
5. Schema-Less Data Model: No predefined schema is required, allowing flexibility in storing and
managing unstructured or semi-structured data.
6. Integrated Caching: Built-in caching in memory improves performance, eliminating the need for
separate caching infrastructure as in traditional SQL systems.
7. Flexibility: Unlike rigid SQL databases, NoSQL solutions are highly flexible, supporting various data
formats and structures without stringent constraints.

3. SHARED NOTHING ARCHITECTURE FOR BIG DATA TASK

The Shared Nothing Architecture (SN) is a scalable and distributed design model used in NoSQL databases and
big data systems. It ensures no single point of contention by decentralizing the system components. Each node
operates independently, with its own memory and disk storage, making it highly suitable for big data tasks.
Here are the key distribution models under SN architecture:

i. Single Server Model

• Simplest distribution option for NoSQL data stores.
• Entire application runs sequentially on a single server.
• Suitable for graph databases, which process relationships between nodes on one server.
• Efficient for small-scale applications but lacks scalability for larger datasets.

ii. Sharding Very Large Databases

• Sharding divides the database into smaller, more manageable pieces (shards), which are distributed
across multiple nodes.
• Provides horizontal scalability, as shards allow the addition of nodes to the cluster without
reconfiguring the application.
• Applications can process shards in parallel, improving performance.
• If a node or shard fails, the system can migrate the affected shard to another node, ensuring continuity.
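
A minimal sketch of sharding, assuming the simplest placement rule: hash the key and take it modulo the number of shards. The `ShardedStore` class is hypothetical, and each "shard" is just a dictionary standing in for a separate node.

```python
import hashlib


class ShardedStore:
    def __init__(self, shard_count):
        # One dict per shard; in a real cluster each would live on its own node.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]  # keys held by each shard
```

Each request touches only one shard, so shards can be served in parallel. Note that with plain modulo placement, changing the shard count remaps most keys; the consistent hashing technique in section 5 is designed to limit that movement.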

iii. Master-Slave Distribution Model

• In this model, one node serves as the master (primary) while others act as slaves (secondary).
• The master node directs operations and updates slave nodes.
• Slave nodes handle read operations, while the master handles write operations and updates.
• Advantages:
o Improved processing performance due to the distribution of large datasets across slave nodes.
o Data redundancy ensures fault tolerance.
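
The division of labour in this model can be sketched as follows: writes go to the master, which copies them to every slave, and reads are spread across the slaves. The `MasterSlaveCluster` class is hypothetical, and it assumes synchronous replication and round-robin read routing purely to keep the example short.

```python
import itertools


class MasterSlaveCluster:
    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self._next_slave = itertools.cycle(range(slave_count))
        self.reads_served = [0] * slave_count  # per-slave read counter

    def write(self, key, value):
        # All writes go through the master...
        self.master[key] = value
        # ...which then updates every slave (synchronously, as a simplification).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are rotated across the slaves to spread the load.
        index = next(self._next_slave)
        self.reads_served[index] += 1
        return self.slaves[index].get(key)


cluster = MasterSlaveCluster(slave_count=3)
cluster.write("price:P1", 120)
results = [cluster.read("price:P1") for _ in range(6)]
```

Six reads are served evenly by three slaves while the master handles the single write, which is the read-scaling benefit the model offers.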

iv. Peer-to-Peer Distribution Model

• All nodes (peers) function equally, removing the concept of master-slave hierarchy.
• Characteristics of the model:
o All nodes handle read requests and provide responses.
o Replication ensures all nodes have updated data, enhancing consistency.
o Node failures do not disrupt write capabilities.
• Widely used by databases like Cassandra, where data is distributed across the cluster.
• Adding nodes increases system performance and scalability.

4. NOSQL VS. SQL (RDBMS)

| Feature         | RDBMS                                                                   | NoSQL                                                                         |
|-----------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| Data Model      | Tabular (rows and columns)                                              | Document, Key-value, Graph, Wide-column                                       |
| Schema          | Structured (predefined schema)                                          | Flexible (schema-less or dynamic)                                             |
| ACID Properties | Strong support for ACID (Atomicity, Consistency, Isolation, Durability) | May or may not support ACID                                                   |
| Scalability     | Vertical scaling (adding more powerful hardware)                        | Horizontal scaling (adding more nodes)                                        |
| Data Types      | Primarily structured data (numbers, text, dates)                        | Supports various data types, including unstructured and semi-structured data |
| Use Cases       | Complex transactions, data warehousing, OLTP systems                    | Big data, high-velocity data, content management, mobile applications         |
| Examples        | MySQL, PostgreSQL, Oracle Database, SQL Server                          | MongoDB, Cassandra, Redis, Neo4j                                              |

5. HANDLING BIG DATA PROBLEMS.

Big Data systems deal with vast volumes of structured, semi-structured, and unstructured data. Efficient
handling of these problems requires distributed and scalable solutions. Here are four key ways to address Big
Data challenges:

i. Even Distribution Using Hash Rings (Consistent Hashing)

• Consistent Hashing: A technique to distribute data evenly across a cluster using a hashing algorithm.
• How It Works: The algorithm generates a pointer (hash value) for a dataset, allowing client nodes to
locate data within the cluster using only the Collection_ID hash.
• Hash Ring: A circular map of hash values used to assign datasets consistently to specific processors or
nodes.
• Benefits: Ensures balanced data distribution and prevents overloading any single node, improving
efficiency and scalability.
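
A hash ring can be sketched in a few lines. The version below is a simplified illustration, not a production implementation: the `HashRing` class, the use of SHA-256, the 100 virtual points per node, and the node and key names are all assumptions chosen for the example.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual points per node, to smooth the distribution
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise: pick the first ring point after the key's position,
        # wrapping around to the start of the ring if necessary.
        # (Rebuilding `positions` on each call is slow but keeps the sketch short.)
        positions = [position for position, _ in self._ring]
        index = bisect.bisect_right(positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"collection:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Any client that knows the ring can locate a key from its hash alone. When `node-d` joins, only the keys that now fall to `node-d` change owner; every other key stays where it was.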

ii. Replication for Horizontal Scaling and Fault Tolerance

• Replication: Creating real-time backup copies of data across multiple nodes.
• Enables horizontal scaling by distributing client read-requests to multiple nodes.
• Advantages:
o Fault-tolerant data retrieval in distributed environments.
o Improved performance as client requests are handled by replicated nodes.

iii. Moving Queries to Data Instead of Data to Queries

• In distributed environments, moving data-intensive queries to the nodes where data resides reduces
network overhead.
• This approach is efficient, especially when using cloud services or large-scale databases.
• Advantages:
o Faster query execution.
o Reduced data transfer costs.
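
The difference between the two directions can be sketched by comparing how much has to cross the network in each case. The `DataNode` class and its methods are hypothetical; the "network" here is just a function call, so the example only counts what would be shipped.

```python
class DataNode:
    def __init__(self, rows):
        self.rows = rows

    def fetch_all(self):
        """Data to query: ship every row to the caller."""
        return list(self.rows)

    def run_local_sum(self, column):
        """Query to data: compute on the node and ship back a single number."""
        return sum(row[column] for row in self.rows)


nodes = [
    DataNode([{"amount": a} for a in range(0, 1000)]),
    DataNode([{"amount": a} for a in range(1000, 2000)]),
]

# Option 1: pull all rows to the client, then aggregate there.
shipped_rows = [row for node in nodes for row in node.fetch_all()]
total_via_data_movement = sum(row["amount"] for row in shipped_rows)

# Option 2: send the query to each node and combine the partial results.
partials = [node.run_local_sum("amount") for node in nodes]
total_via_query_movement = sum(partials)
```

Both options give the same total, but the first moves 2000 rows and the second moves two numbers.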

iv. Query Distribution Across Multiple Nodes

• Query Analyzers: Analyze client queries and distribute them evenly across data nodes or replica nodes.
• Parallel Query Execution: Queries are executed simultaneously across multiple nodes, enhancing
performance.
• Benefits:
o Efficient resource utilization.
o Reduced query response time.
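
Parallel query execution can be sketched as a scatter-gather step: the same query is sent to every node at once and the partial answers are combined. The example below uses threads from Python's standard `concurrent.futures` module as stand-ins for data nodes; the `partitions` data and the function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Each entry stands in for the slice of data held by one node.
partitions = {
    "node-1": [12, 7, 30],
    "node-2": [5, 18],
    "node-3": [22, 9, 1, 40],
}


def count_over(threshold, values):
    """The per-node query: count values greater than `threshold`."""
    return sum(1 for value in values if value > threshold)


def scatter_gather(threshold):
    # Scatter: submit the query to every node so they run concurrently.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = {
            name: pool.submit(count_over, threshold, values)
            for name, values in partitions.items()
        }
        # Gather: wait for each partial result.
        per_node = {name: future.result() for name, future in futures.items()}
    return per_node, sum(per_node.values())


per_node, total = scatter_gather(threshold=10)
```

The overall response time is bounded by the slowest node rather than the sum of all nodes, which is why spreading a query across nodes shortens it.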
