no sql

notes for nosql 21 scheme

Uploaded by

hrakshitharaju2192001

2.3 Relaxing Consistency

• Consistency ensures all parts of a system see the same data, but sometimes it's okay to
sacrifice it to achieve better performance or availability.
• Why relax consistency?
◦ In some systems, ensuring 100% consistency slows things down too much.
◦ Example: Databases allow "relaxed isolation levels" like Read Committed for better
performance, even if it allows minor inconsistencies.
◦ Some systems (like MySQL before it supported transactions) skip strict consistency
to be faster.
◦ Large systems (e.g., eBay) drop strict consistency to handle massive operations
effectively, especially with sharding.

2.4 The CAP Theorem

• The CAP theorem states that a distributed system can guarantee only two of three properties
at the same time: Consistency, Availability, and Partition Tolerance.
◦ Consistency: All users see the same data simultaneously.
◦ Availability: The system responds to all requests, even during failures.
◦ Partition Tolerance: The system works despite communication breakdowns
(network issues).
Examples:

• Single machine: Ensures Consistency + Availability because it doesn’t deal with partitions.

• Distributed systems: Often need to choose between consistency and availability when
partitions occur.

◦ Example: Hotel booking during a network failure:

▪ Consistent system: No one can book (low availability but avoids
overbooking).
▪ Available system: Both users can book (overbooking may occur, but the
system keeps responding).
• Trade-off: Systems often allow small inconsistencies to improve availability or reduce
response time. For instance:

◦ Shopping carts: Even if the network fails, users can add items. Later, systems
merge carts to fix inconsistencies.
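The cart merge mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular store: the function name `merge_carts` and the rule "keep the larger quantity per item" are assumptions chosen for the example.

```python
def merge_carts(cart_a: dict[str, int], cart_b: dict[str, int]) -> dict[str, int]:
    """Merge two diverged copies of a cart (assumed rule: keep the larger quantity)."""
    merged = dict(cart_a)
    for item, quantity in cart_b.items():
        merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Two copies of the same cart that diverged while the nodes could not communicate.
cart_on_node_1 = {"book": 1, "pen": 2}
cart_on_node_2 = {"book": 1, "lamp": 1}

merged_cart = merge_carts(cart_on_node_1, cart_on_node_2)
print(merged_cart)  # {'book': 1, 'pen': 2, 'lamp': 1}
```

A union-style merge like this never loses an added item, but an item removed on one node can reappear after the merge. That is the kind of small inconsistency the trade-off accepts.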
Key Takeaway:

• Decide between consistency and latency based on the system's needs. For example:
◦ Finance systems prioritize up-to-date consistency.
◦ Media websites tolerate minor delays in consistency.

2.5 Relaxing Durability

• Durability ensures data remains safe even after a crash, but sometimes it's okay to relax
durability for better speed.
When to relax durability?
• Session Data: Temporary user data on websites can be stored in memory. If lost, the user
may need to log in again, but the system stays fast.
• Telemetry Data: Devices sending data can prioritize speed over saving every update.
• Replicated Data: In systems with master-slave models:
◦ If the master fails before updates are copied to slaves, some data is lost.
◦ To avoid this, the master can wait for slaves to confirm updates, but this slows the
system.
Trade-off:

• Balancing durability with performance depends on the importance of the data. Critical
operations can enforce durability, while less-critical ones can skip it for speed.
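The replication trade-off can be illustrated with a toy model. The `Primary` class and its `wait_for_replica` flag are hypothetical names made up for this sketch. It only simulates the idea with in-memory dictionaries and does not reflect any real database's API.

```python
class Primary:
    """Toy primary node that copies writes to a single in-memory replica."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self.pending: list[tuple[str, str]] = []  # accepted but not yet replicated

    def write(self, key: str, value: str, wait_for_replica: bool) -> None:
        self.data[key] = value
        if wait_for_replica:
            # Durable but slower: the write completes only after the copy exists.
            self.replica[key] = value
        else:
            # Fast but risky: replication is deferred.
            self.pending.append((key, value))

    def crash(self) -> dict[str, str]:
        """Simulate losing the primary: only data already on the replica survives."""
        self.pending.clear()
        return dict(self.replica)


primary = Primary()
primary.write("order:1", "paid", wait_for_replica=True)           # critical data
primary.write("session:9", "cart-draft", wait_for_replica=False)  # less critical data
survivors = primary.crash()
print(survivors)  # {'order:1': 'paid'}
```

Choosing the flag per write mirrors the point above: critical operations pay for durability, while less critical ones skip the wait.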

Version stamps for multiple nodes, explained in a simpler way using real-world analogies:

1. Counter-based Versioning

• Think of it like a counter on a paper.

• Every time a node (like a server or computer) makes a change, it adds 1 to a counter.
• If two nodes give you different counters (like 4 and 6), the node with the higher number is
the newer version.
• Problem: If multiple nodes are updating at the same time, you can’t tell which change came
first if the counters are not properly synchronized. It's great for a single main node, but not
for many nodes updating at once.
2. Timestamps

• Imagine a clock showing the current time.

• Every time a node updates something, it records the current time (timestamp).
• If one node has a timestamp of 10:00 AM and another node has 10:05 AM, the second one
is the newer version.
• Problem: Clocks on different nodes can be out of sync, so if one node's clock is wrong, it
could cause confusion. Also, this doesn't help if two nodes update data at the same time.
3. Version History (Like a Journal of Changes)

• Think of a journal where you keep a record of all changes.

• Every time a node updates something, it writes a new entry in its journal.
• If two nodes share their journals, you can check who made the change first, or if both
journals show the same changes.
• Problem: This method requires a lot of space and tracking to store all changes.
4. Vector Stamps (Version Vectors)

• Imagine having a list of counters, one for each node in the system.
• Each time a node makes a change, it increases its counter. So, if there are three nodes, the
version stamp could look like:
[Node A: 4, Node B: 5, Node C: 3].
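• When two nodes share information, they compare their lists counter by counter. A list is
newer only if none of its counters is lower than the other list's and at least one is higher.
• Example: If Node A’s list is [4, 5, 3] and Node B’s list is [4, 4, 3], Node A is
newer because it has a higher value for Node B’s counter and no lower values.
• If each list has a higher value somewhere (for example, [5, 4, 3] and [4, 5, 3]), neither
is newer: the two updates conflict.
• Problem: It doesn’t resolve conflicts on its own; it only detects them (for example, if two
nodes change the same value at the same time).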
Why is Vector Stamping Important?

• Vector stamps work well in systems where multiple nodes are updating data independently,
and they help detect if updates are inconsistent (for example, if two nodes made conflicting
changes). This method lets you track which node made which change and determine whether
any conflicts happened.
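The comparison of vector stamps can be sketched in a few lines. The function name `compare` and the dictionary representation (node name to counter) are assumptions for illustration, and a node missing from a stamp is treated as having counter 0.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two vector stamps given as {node: counter} mappings."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"      # each side has seen an update the other has not
    if a_ahead:
        return "a is newer"
    if b_ahead:
        return "b is newer"
    return "equal"


print(compare({"A": 4, "B": 5, "C": 3}, {"A": 4, "B": 4, "C": 3}))  # a is newer
print(compare({"A": 5, "B": 4, "C": 3}, {"A": 4, "B": 5, "C": 3}))  # conflict
```

The function only reports a conflict. Deciding which value wins, or how to merge the two, is left to the application.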
MapReduce Model for Calculating Average Ordered Quantity of Each Product

To calculate the average ordered quantity of each product using MapReduce, we need to follow
the typical steps in the MapReduce model, but we have to handle the specific aggregation of sums
and counts for each product. Here's the breakdown of the process:

1. Map Phase:
◦ For each order record, emit the product and the ordered quantity.
◦ Emit (product, quantity) as key-value pairs, where the key is the product,
and the value is the ordered quantity.
2. Shuffle and Sort Phase:

◦ Group the records by product so that all records related to the same product are
brought together.
3. Reduce Phase:

◦ For each group of records (i.e., all records for the same product), sum the ordered
quantities and count how many orders there are for that product.
◦ Calculate the average by dividing the total sum by the count.
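The three phases above can be sketched in plain Python running on a single machine. The function names (`map_order`, `shuffle`, `reduce_average`) and the sample records are illustrative assumptions. A real MapReduce framework would run the map and reduce functions on many nodes and perform the shuffle itself.

```python
from collections import defaultdict


def map_order(record):
    """Map: emit (product, quantity) for one order record."""
    product, quantity = record
    yield product, quantity


def shuffle(pairs):
    """Shuffle and sort: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_average(product, quantities):
    """Reduce: sum and count the quantities, then divide."""
    return product, sum(quantities) / len(quantities)


orders = [
    ("Product A", 10),
    ("Product B", 20),
    ("Product A", 15),
    ("Product A", 5),
    ("Product B", 10),
]

mapped = [pair for record in orders for pair in map_order(record)]
grouped = shuffle(mapped)
averages = dict(reduce_average(p, q) for p, q in grouped.items())
print(averages)  # {'Product A': 10.0, 'Product B': 15.0}
```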
MapReduce Calculation for Average Ordered Quantity

Input Data: Suppose we have the following records (product, ordered quantity):

Product A, 10
Product B, 20
Product A, 15
Product A, 5
Product B, 10
Step-by-Step Calculation:

1. Map Phase:

• Each record is processed and emitted as a key-value pair where the key is the product, and
the value is the ordered quantity.

Input Records:
("Product A", 10)
("Product B", 20)
("Product A", 15)
("Product A", 5)
("Product B", 10)

Map Output:
("Product A", 10)
("Product B", 20)
("Product A", 15)
("Product A", 5)
("Product B", 10)
2. Shuffle and Sort Phase:

• The system groups all the records by the key, i.e., by product. So we get:

Shuffled and Sorted Output:

"Product A" -> [10, 15, 5]
"Product B" -> [20, 10]
3. Reduce Phase:

• For each product, the reducer calculates:

◦ The total quantity (sum of quantities).
◦ The count (the number of orders for that product).
◦ Then calculates the average by dividing the total quantity by the count.
For "Product A":

• Sum of quantities = 10 + 15 + 5 = 30
• Count of orders = 3
• Average = Sum / Count = 30 / 3 = 10
For "Product B":

• Sum of quantities = 20 + 10 = 30
• Count of orders = 2
• Average = Sum / Count = 30 / 2 = 15

Reduce Output:
"Product A" -> Average = 10
"Product B" -> Average = 15
Example and Diagram:

Input Data:

Product      Ordered Quantity
Product A    10
Product B    20
Product A    15
Product A    5
Product B    10

Map Phase:

After the Map phase, we emit:

Key          Value (Ordered Quantity)
Product A    10
Product B    20
Product A    15
Product A    5
Product B    10

Shuffle and Sort Phase:

We group the records by product:

Key          Values (Ordered Quantities)
Product A    [10, 15, 5]
Product B    [20, 10]

Reduce Phase:

For each product, we calculate the total and the count:

• For Product A:

◦ Total Quantity = 10 + 15 + 5 = 30
◦ Count = 3
◦ Average = 30 / 3 = 10
• For Product B:

◦ Total Quantity = 20 + 10 = 30
◦ Count = 2
◦ Average = 30 / 2 = 15
Final Output (After Reduce Phase):

Product      Average Ordered Quantity
Product A    10
Product B    15
Visual Diagram:

Input Data:
(Product A, 10)
(Product B, 20)
(Product A, 15)
(Product A, 5)
(Product B, 10)

Map Phase:
Emit:
(Product A, 10)
(Product B, 20)
(Product A, 15)
(Product A, 5)
(Product B, 10)

Shuffle & Sort Phase:

Group by Key:
(Product A, [10, 15, 5])
(Product B, [20, 10])

Reduce Phase:
For Product A:
Total = 10 + 15 + 5 = 30
Count = 3
Average = 30 / 3 = 10

For Product B:
Total = 20 + 10 = 30
Count = 2
Average = 30 / 2 = 15

Output:
(Product A, 10)
(Product B, 15)
Explanation:

• Map: The Map function emits key-value pairs where the key is the product, and the value is
the quantity ordered.
• Shuffle & Sort: The system groups all records by the product (key), so we get a list of all
quantities for each product.
• Reduce: The Reduce function calculates the total and count for each product, and
computes the average by dividing the total by the count.
Conclusion:

By structuring our calculation in this way, we can efficiently compute the average ordered quantity
for each product using MapReduce. This approach scales well, as each task can be processed in
parallel, allowing for distributed computation across a cluster.
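One detail matters when the work is split across nodes: partial results must carry both the sum and the count, because averages of averages are generally wrong. The sketch below assumes a hypothetical split of Product A's orders over two nodes. The `combine` helper is an illustrative name, not part of any framework.

```python
from functools import reduce


def combine(a, b):
    """Merge two partial (sum, count) pairs into one."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical split: Product A's three orders are processed on two nodes.
node_1 = [(10, 1), (15, 1)]
node_2 = [(5, 1)]

partial_1 = reduce(combine, node_1)  # (25, 2)
partial_2 = reduce(combine, node_2)  # (5, 1)

total, count = combine(partial_1, partial_2)
correct_average = total / count
print(correct_average)  # 10.0

# Averaging the per-node averages gives a different, incorrect result.
naive_average = (25 / 2 + 5 / 1) / 2
print(naive_average)  # 8.75
```

Because `combine` is associative and commutative, the partial pairs can be merged in any order, which is what allows the reduction to run in parallel.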

Key-value stores are a type of NoSQL database where data is stored as pairs of a key and a value.
These databases are designed for fast, simple data retrieval and are often used when you need to
store a lot of data and access it quickly.

Here’s a simple explanation of the key features of key-value stores:

1. Consistency

• In key-value stores, consistency refers to whether data is the same across all copies
(replicas) of the data. If data is updated in one place, it may take some time for all replicas to
reflect the change.
• Some key-value stores like Riak use eventual consistency, meaning changes will
eventually be consistent across all replicas, but not immediately.
• Example: If you update a product’s price in one location, it might take some time before all
users see the new price.
2. Transactions

• Transactions in key-value stores are simpler than in traditional databases. In most key-
value stores, you can’t have complex multi-step transactions like in relational databases.
Instead, a key-value store can ensure that data is written to a certain number of replicas
(multiple servers) before the write is considered successful.
• Example: If you want to update data, you can set rules like "write to at least 3 servers," but it
doesn’t ensure full transactional behavior (like locking or rolling back changes).
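The "write to at least N servers" rule can be sketched with a toy model. The function `quorum_write`, its parameter `w`, and the `reachable` set are assumptions for illustration. Replicas are plain dictionaries, not a real store's client API.

```python
def quorum_write(replicas: list[dict], key: str, value: str,
                 w: int, reachable: set[int]) -> bool:
    """Write to every reachable replica; succeed only if at least w acknowledged."""
    acks = 0
    for index, replica in enumerate(replicas):
        if index in reachable:
            replica[key] = value
            acks += 1
    # No rollback: replicas that accepted the write keep it even on failure.
    return acks >= w


replicas = [{}, {}, {}]
ok = quorum_write(replicas, "cart:42", "2 items", w=3, reachable={0, 1})
print(ok)        # False
print(replicas)  # [{'cart:42': '2 items'}, {'cart:42': '2 items'}, {}]
```

The failed write still leaves its value on two replicas. That is the difference from a relational transaction, which would undo the partial change.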
3. Query Features

• Key-value stores are simple: you can only look up data by its key. There’s no complex
querying like in relational databases (e.g., filtering by column values).
• If you need more complex querying, some key-value stores like Riak allow you to search
inside the value (e.g., if values are stored as JSON, you can search the fields inside the
JSON).
• Example: If you have a shopping cart, you would query by the cart ID (the key), and you get
the cart details (the value).
4. Structure of Data

• Key-value stores don’t impose a structure on the value part of the data. The value can be
anything, such as text, numbers, JSON data, or even binary files.
• Example: You can store user profiles as JSON objects, and the key could be the user ID.
5. Scaling

• Sharding: To handle more data, key-value stores split (or shard) the data across multiple
servers. For example, if you store data based on a key, a key-value store might assign each
key to a specific server based on a hash function.
• Replication: Data is often copied across multiple servers (replication) to ensure it’s always
available even if a server goes down.
• Example: If you have a key for “user123” that’s stored on one server, the data could also be
stored on several other servers for backup and availability.
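Hash-based placement can be sketched as below. The helper names `shard_for` and `replicas_for`, the use of SHA-256, and the "next servers in order" replica rule are all assumptions for illustration. Real stores typically use more elaborate schemes such as consistent hashing.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a server index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, copies: int) -> list[int]:
    """Place copies on the primary shard and the servers that follow it."""
    first = shard_for(key, num_shards)
    return [(first + i) % num_shards for i in range(copies)]


primary_server = shard_for("user123", num_shards=4)
all_servers = replicas_for("user123", num_shards=4, copies=3)
print(primary_server, all_servers)
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) ensures every client computes the same server for the same key.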
Why Use Key-Value Stores?

• They are very fast for simple lookups by key.

• They are flexible, allowing you to store a wide variety of data types without worrying about
a fixed schema (structure).
• They scale well as your application grows because they can distribute data across multiple
servers.
Example Use Cases:

• Session storage: Storing temporary user session data (e.g., a session ID as the key and the
user’s details as the value).
• Caching: Storing frequently accessed data for quick retrieval, like product details or page
content.
• Shopping cart: Storing cart information with the cart ID as the key and the list of items as
the value.
Conclusion:

Key-value stores are perfect when you need simple, fast lookups, can manage with minimal
querying, and are looking to scale your application quickly. They work well for specific use cases
like session management, caching, and storing user profiles.
Map-Reduce is a simple yet powerful method used to process large amounts of data in parallel
across many computers. It consists of two main steps: Map and Reduce.

Basic Map-Reduce Steps:

1. Map Step:
◦ In the Map step, we break down the input data into smaller pieces and process each
piece independently.
◦ For example, let’s say we have orders, and each order has products with quantity and
price. The Map function takes each order and creates a key-value pair:
▪ Key: Product ID (e.g., "A001")
▪ Value: Quantity and Price (e.g., Quantity: 3, Price: 20)
2. Map Output Example:

◦ Order 1: Product A001, Quantity 3, Price 20 → Emit ("A001", (3, 20))

◦ Order 2: Product A001, Quantity 2, Price 20 → Emit ("A001", (2, 20))
3. Shuffle and Sort Step:

◦ The system groups all the key-value pairs by their key (Product ID in this case). So
all "A001" values will be grouped together for the next step.
4. Reduce Step:

◦ In the Reduce step, the system takes all the values for the same key and combines
them. For example:
▪ For Product A001, we get the quantities and prices from the Map step.
▪ We sum the quantities and calculate total revenue by multiplying quantity
with price.
5. Reduce Calculation Example for Product A001:

◦ Quantity = 3 + 2 = 5
◦ Revenue = (3 * 20) + (2 * 20) = 60 + 40 = 100
6. Final Output:

◦ The final result will be something like:

▪ Product A001 → Total Quantity: 5, Total Revenue: 100
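The steps above can be sketched in plain Python on a single machine. The record layout and the function names `map_order` and `reduce_product` are assumptions for illustration. A real framework would distribute the map and reduce calls and do the grouping itself.

```python
from collections import defaultdict

orders = [
    {"order": 1, "product": "A001", "quantity": 3, "price": 20},
    {"order": 2, "product": "A001", "quantity": 2, "price": 20},
]


def map_order(order):
    """Map: emit (product ID, (quantity, price)) for one order."""
    yield order["product"], (order["quantity"], order["price"])


def reduce_product(values):
    """Reduce: sum the quantities and the revenue (quantity * price)."""
    return {
        "quantity": sum(quantity for quantity, _ in values),
        "revenue": sum(quantity * price for quantity, price in values),
    }


# Shuffle and sort: group the emitted values by product ID.
grouped = defaultdict(list)
for order in orders:
    for key, value in map_order(order):
        grouped[key].append(value)

totals = {product: reduce_product(values) for product, values in grouped.items()}
print(totals)  # {'A001': {'quantity': 5, 'revenue': 100}}
```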
In Riak, data is stored in buckets, and operations are performed using HTTP requests. Here's how
data can be read from and written to a bucket:

1. Storing Data (POST)

To store data in Riak, send a POST request with the data and specify the bucket and key.

curl -v -X POST -d '{"lastVisit": 1324669989288, "user": {"customerId": "91cfdf5bcb7c", "name": "buyer"}}' \
  -H "Content-Type: application/json" \
  http://localhost:8098/buckets/session/keys/a7e618d9db25
2. Retrieving Data (GET)

To retrieve data, send a GET request with the bucket name and key:

curl -i http://localhost:8098/buckets/session/keys/a7e618d9db25
3. Deleting Data (DELETE)

To delete data, use a DELETE request with the key:

curl -i -X DELETE http://localhost:8098/buckets/session/keys/a7e618d9db25
Summary:

• POST to store data: curl -X POST -d '{"data"}' ...
• GET to retrieve data: curl -i ...
• DELETE to remove data: curl -X DELETE ...
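The same three requests can be built from Python's standard library. This sketch assumes a Riak node listening on localhost port 8098, as the curl commands target. The helper names are hypothetical, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

# Assumed address of a local Riak node, with the bucket and key from the notes.
BASE_URL = "http://localhost:8098/buckets/session/keys/a7e618d9db25"


def build_store_request(document: dict) -> urllib.request.Request:
    """POST a JSON document to the bucket/key URL."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_fetch_request() -> urllib.request.Request:
    """GET the value stored under the key."""
    return urllib.request.Request(BASE_URL, method="GET")


def build_delete_request() -> urllib.request.Request:
    """DELETE the key and its value."""
    return urllib.request.Request(BASE_URL, method="DELETE")


session = {"lastVisit": 1324669989288,
           "user": {"customerId": "91cfdf5bcb7c", "name": "buyer"}}
store_request = build_store_request(session)
fetch_request = build_fetch_request()
delete_request = build_delete_request()
print(store_request.get_method(), fetch_request.get_method(), delete_request.get_method())
# POST GET DELETE
```

With a node running, each request could be sent with `urllib.request.urlopen(request)`.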
