
no sql

notes for nosql 21 scheme

2.3 Relaxing Consistency

• Consistency ensures all parts of a system see the same data, but sometimes it's okay to
sacrifice it to achieve better performance or availability.
• Why relax consistency?
◦ In some systems, ensuring 100% consistency slows things down too much.
◦ Example: Databases allow "relaxed isolation levels" like Read Committed for better
performance, even if it allows minor inconsistencies.
◦ Some systems (like MySQL before it supported transactions) skip strict consistency
to be faster.
◦ Large systems (e.g., eBay) drop strict consistency to handle massive operations
effectively, especially with sharding.

2.4 The CAP Theorem

• CAP theorem states that in a distributed system, you can only guarantee two of the three:
Consistency, Availability, or Partition Tolerance.
◦ Consistency: All users see the same data simultaneously.
◦ Availability: The system responds to all requests, even during failures.
◦ Partition Tolerance: The system works despite communication breakdowns
(network issues).
Examples:

• Single machine: Ensures Consistency + Availability because it doesn’t deal with partitions.

• Distributed systems: Often need to choose between consistency and availability when
partitions occur.

◦ Example: Hotel booking during a network failure:


▪ Consistent system: No one can book (low availability but avoids
overbooking).
▪ Available system: Both users can book (overbooking may occur, but the
system keeps responding).
• Trade-off: Systems often allow small inconsistencies to improve availability or reduce
response time. For instance:

◦ Shopping carts: Even if the network fails, users can add items. Later, systems
merge carts to fix inconsistencies.
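The cart merge mentioned above can be sketched in a few lines. This is only an illustration: the merge rule (keep every item, and for items in both carts keep the larger quantity) and the names `merge_carts`, `cart_on_node_1`, and `cart_on_node_2` are assumptions, not something the notes specify.

```python
def merge_carts(cart_a, cart_b):
    """Merge two diverged copies of a cart (item -> quantity).

    Assumed rule: keep every item seen in either copy; when both
    copies contain an item, keep the larger quantity.
    """
    merged = dict(cart_a)
    for item, quantity in cart_b.items():
        merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Two copies of the same cart that diverged during a network failure.
cart_on_node_1 = {"book": 1, "pen": 2}
cart_on_node_2 = {"book": 1, "lamp": 1}

merged = merge_carts(cart_on_node_1, cart_on_node_2)
print(merged)  # {'book': 1, 'pen': 2, 'lamp': 1}
```

A rule like this never loses an added item, but it can bring back an item the user removed on one copy, which is the kind of small inconsistency such systems accept.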
Key Takeaway:

• Decide between consistency and latency based on the system's needs. For example:
◦ Finance systems prioritize up-to-date consistency.
◦ Media websites tolerate minor delays in consistency.

2.5 Relaxing Durability

• Durability ensures data remains safe even after a crash, but sometimes it's okay to relax
durability for better speed.
When to relax durability?
• Session Data: Temporary user data on websites can be stored in memory. If lost, the user
may need to log in again, but the system stays fast.
• Telemetry Data: Devices sending data can prioritize speed over saving every update.
• Replicated Data: In systems with master-slave models:
◦ If the master fails before updates are copied to slaves, some data is lost.
◦ To avoid this, the master can wait for slaves to confirm updates, but this slows the
system.
Trade-off:

• Balancing durability with performance depends on the importance of the data. Critical
operations can enforce durability, while less-critical ones can skip it for speed.
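The replication trade-off can be shown with a toy model. This is a rough sketch, not a real replication protocol: the `Master` class, its methods, and the `wait_for_slave` flag are hypothetical names introduced for illustration.

```python
class Master:
    """Toy master node with a single slave copy."""

    def __init__(self):
        self.log = []         # writes accepted by the master
        self.slave_copy = []  # writes already copied to the slave

    def write(self, value, wait_for_slave):
        self.log.append(value)
        if wait_for_slave:
            # Slower path: acknowledge only after the slave has the update.
            self.replicate()

    def replicate(self):
        self.slave_copy = list(self.log)

    def crash(self):
        """Return what survives on the slave once the master is lost."""
        return list(self.slave_copy)


fast = Master()
fast.write("update-1", wait_for_slave=False)
survived_fast = fast.crash()

safe = Master()
safe.write("update-1", wait_for_slave=True)
survived_safe = safe.crash()

print(survived_fast)  # []
print(survived_safe)  # ['update-1']
```

The fast write returns sooner but is lost when the master fails before copying it; the safe write costs an extra round trip on every update.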

Version stamps for multiple nodes can be explained in a simpler way, using real-world analogies:

1. Counter-based Versioning

• Think of it like a counter on a paper.


• Every time a node (like a server or computer) makes a change, it adds 1 to a counter.
• If two nodes give you different counters (like 4 and 6), the node with the higher number is
the newer version.
• Problem: If multiple nodes are updating at the same time, you can’t tell which change came
first, because the counters on different nodes are not synchronized. It's great for a single
main node, but not for many nodes updating at once.
2. Timestamps

• Imagine a clock showing the current time.


• Every time a node updates something, it records the current time (timestamp).
• If one node has a timestamp of 10:00 AM and another node has 10:05 AM, the second one
is the newer version.
• Problem: Clocks on different nodes can be out of sync, so if one node's clock is wrong, it
could cause confusion. Also, this doesn't help if two nodes update data at the same time.
3. Version History (Like a Journal of Changes)

• Think of a journal where you keep a record of all changes.


• Every time a node updates something, it writes a new entry in its journal.
• If two nodes share their journals, you can check who made the change first, or if both
journals show the same changes.
• Problem: This method requires a lot of space and tracking to store all changes.
4. Vector Stamps (Version Vectors)

• Imagine having a list of counters, one for each node in the system.
• Each time a node makes a change, it increases its counter. So, if there are three nodes, the
version stamp could look like:
[Node A: 4, Node B: 5, Node C: 3].
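• When two nodes share information, they compare their lists counter by counter. A list is
newer only if every counter is greater than or equal to the matching counter in the other list
and at least one is strictly greater.
• Example: If Node A’s list is [4, 5, 3] and Node B’s list is [4, 4, 3], Node A is
newer because it has a higher value for Node B’s counter and no lower values.
• Problem: It doesn’t resolve conflicts on its own, it only detects them. For example, if two
nodes change the same value at the same time, each list is ahead on a different counter
(such as [5, 4, 3] versus [4, 5, 3]) and neither is newer.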
Why is Vector Stamping Important?

• Vector stamps work well in systems where multiple nodes are updating data independently,
and they help detect if updates are inconsistent (for example, if two nodes made conflicting
changes). This method ensures that you can track which node made what change, and
determine if any conflicts happened.
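The comparison of vector stamps can be sketched as follows. This is a minimal illustration assuming each stamp is a dictionary from node name to counter; the function name `compare` and its return labels are made up for this example.

```python
def compare(stamp_a, stamp_b):
    """Compare two vector stamps given as dicts of node -> counter.

    Returns "equal", "a_newer", "b_newer", or "conflict".
    A missing node is treated as a counter of 0.
    """
    nodes = set(stamp_a) | set(stamp_b)
    a_ahead = any(stamp_a.get(n, 0) > stamp_b.get(n, 0) for n in nodes)
    b_ahead = any(stamp_b.get(n, 0) > stamp_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"  # each side has seen a change the other has not
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


stamp_1 = {"A": 4, "B": 5, "C": 3}
stamp_2 = {"A": 4, "B": 4, "C": 3}
stamp_3 = {"A": 5, "B": 4, "C": 3}

print(compare(stamp_1, stamp_2))  # a_newer
print(compare(stamp_1, stamp_3))  # conflict
```

The function only reports a conflict; deciding which value wins, or how to merge the two, is left to the application.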
MapReduce Model for Calculating Average Ordered Quantity of Each Product

To calculate the average ordered quantity of each product using MapReduce, we need to follow
the typical steps in the MapReduce model, but we have to handle the specific aggregation of sums
and counts for each product. Here's the breakdown of the process:

1. Map Phase:
◦ For each order record, emit the product and the ordered quantity.
◦ Emit (product, quantity) as key-value pairs, where the key is the product,
and the value is the ordered quantity.
2. Shuffle and Sort Phase:

◦ Group the records by product so that all records related to the same product are
brought together.
3. Reduce Phase:

◦ For each group of records (i.e., all records for the same product), sum the ordered
quantities and count how many orders there are for that product.
◦ Calculate the average by dividing the total sum by the count.
MapReduce Calculation for Average Ordered Quantity

Input Data: Suppose we have the following records (product, ordered quantity):

Product A, 10
Product B, 20
Product A, 15
Product A, 5
Product B, 10
Step-by-Step Calculation:

1. Map Phase:

• Each record is processed and emitted as a key-value pair where the key is the product, and
the value is the ordered quantity.

Input Records:
("Product A", 10)
("Product B", 20)
("Product A", 15)
("Product A", 5)
("Product B", 10)

Map Output:
("Product A", 10)
("Product B", 20)
("Product A", 15)
("Product A", 5)
("Product B", 10)
2. Shuffle and Sort Phase:

• The system groups all the records by the key, i.e., by product. So we get:

Shuffled and Sorted Output:


"Product A" -> [10, 15, 5]
"Product B" -> [20, 10]
3. Reduce Phase:

• For each product, the reducer calculates:


◦ The total quantity (sum of quantities).
◦ The count (the number of orders for that product).
◦ Then calculates the average by dividing the total quantity by the count.
For "Product A":

• Sum of quantities = 10 + 15 + 5 = 30
• Count of orders = 3
• Average = Sum / Count = 30 / 3 = 10
For "Product B":

• Sum of quantities = 20 + 10 = 30
• Count of orders = 2
• Average = Sum / Count = 30 / 2 = 15

Reduce Output:
"Product A" -> Average = 10
"Product B" -> Average = 15
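The three phases above can be imitated on one machine with plain Python. This is only a sketch of the data flow, not a real MapReduce framework; the function names `map_phase`, `shuffle_and_sort`, and `reduce_phase` are chosen for illustration.

```python
from collections import defaultdict

# Input records: (product, ordered quantity), as in the example above.
orders = [
    ("Product A", 10),
    ("Product B", 20),
    ("Product A", 15),
    ("Product A", 5),
    ("Product B", 10),
]


def map_phase(records):
    """Emit one (product, quantity) pair per order record."""
    for product, quantity in records:
        yield product, quantity


def shuffle_and_sort(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """For each product, divide the summed quantity by the order count."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


grouped = shuffle_and_sort(map_phase(orders))
averages = reduce_phase(grouped)
print(grouped)   # {'Product A': [10, 15, 5], 'Product B': [20, 10]}
print(averages)  # {'Product A': 10.0, 'Product B': 15.0}
```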
Example and Diagram:

Input Data:

Product      Ordered Quantity
Product A    10
Product B    20
Product A    15
Product A    5
Product B    10

Map Phase:

After the Map phase, we emit:


Key          Value (Ordered Quantity)
Product A    10
Product B    20
Product A    15
Product A    5
Product B    10

Shuffle and Sort Phase:

We group the records by product:

Key          Values (Ordered Quantities)
Product A    [10, 15, 5]
Product B    [20, 10]

Reduce Phase:

For each product, we calculate the total and the count:

• For Product A:

◦ Total Quantity = 10 + 15 + 5 = 30
◦ Count = 3
◦ Average = 30 / 3 = 10
• For Product B:

◦ Total Quantity = 20 + 10 = 30
◦ Count = 2
◦ Average = 30 / 2 = 15
Final Output (After Reduce Phase):

Product      Average Ordered Quantity
Product A    10
Product B    15
Visual Diagram:

Input Data:
(Product A, 10)
(Product B, 20)
(Product A, 15)
(Product A, 5)
(Product B, 10)

Map Phase:
Emit:
(Product A, 10)
(Product B, 20)
(Product A, 15)
(Product A, 5)
(Product B, 10)

Shuffle & Sort Phase:


Group by Key:
(Product A, [10, 15, 5])
(Product B, [20, 10])

Reduce Phase:
For Product A:
Total = 10 + 15 + 5 = 30
Count = 3
Average = 30 / 3 = 10

For Product B:
Total = 20 + 10 = 30
Count = 2
Average = 30 / 2 = 15

Output:
(Product A, 10)
(Product B, 15)
Explanation:

• Map: The Map function emits key-value pairs where the key is the product, and the value is
the quantity ordered.
• Shuffle & Sort: The system groups all records by the product (key), so we get a list of all
quantities for each product.
• Reduce: The Reduce function calculates the total and count for each product, and
computes the average by dividing the total by the count.
Conclusion:

By structuring our calculation in this way, we can efficiently compute the average ordered quantity
for each product using MapReduce. This approach scales well, as each task can be processed in
parallel, allowing for distributed computation across a cluster.
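One reason sums and counts need separate handling is that averages cannot be combined directly: the average of two partial averages is generally not the overall average. A common workaround, sketched here as an assumption rather than something these notes prescribe, is to carry (sum, count) pairs through the reduce step and divide only at the end. The node split and the names `to_partial` and `combine` are hypothetical.

```python
from functools import reduce


def to_partial(quantity):
    """Turn one quantity into a (sum, count) pair."""
    return (quantity, 1)


def combine(left, right):
    """Add two (sum, count) pairs; safe to apply in any grouping."""
    return (left[0] + right[0], left[1] + right[1])


# Product A's orders, split across two hypothetical nodes.
node_1 = [10, 15]
node_2 = [5]

partial_1 = reduce(combine, map(to_partial, node_1))  # (25, 2)
partial_2 = reduce(combine, map(to_partial, node_2))  # (5, 1)

total, count = combine(partial_1, partial_2)
average = total / count
print(average)  # 10.0

# Averaging the per-node averages gives a different, wrong answer.
naive = (sum(node_1) / len(node_1) + sum(node_2) / len(node_2)) / 2
print(naive)  # 8.75
```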

Key-value stores are a type of NoSQL database where data is stored as pairs of a key and a value.
These databases are designed for fast, simple data retrieval and are often used when you need to
store a lot of data and access it quickly.

Here’s a simple explanation of the key features of key-value stores:

1. Consistency

• In key-value stores, consistency refers to whether data is the same across all copies
(replicas) of the data. If data is updated in one place, it may take some time for all replicas to
reflect the change.
• Some key-value stores like Riak use eventual consistency, meaning changes will
eventually be consistent across all replicas, but not immediately.
• Example: If you update a product’s price in one location, it might take some time before all
users see the new price.
2. Transactions

• Transactions in key-value stores are simpler than in traditional databases. In most key-
value stores, you can’t have complex multi-step transactions like in relational databases.
Instead, a key-value store can ensure that data is written to a certain number of replicas
(multiple servers) before the write is considered successful.
• Example: If you want to update data, you can set rules like "write to at least 3 servers," but it
doesn’t ensure full transactional behavior (like locking or rolling back changes).
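The "write to at least N servers" rule can be sketched with in-memory replicas. This is a toy model, not Riak's actual API: the `Replica` class, the `quorum_write` function, and the parameter `w` are names invented for illustration.

```python
class Replica:
    """Toy replica that stores data in memory and may be down."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def put(self, key, value):
        if not self.up:
            return False  # no acknowledgement from a failed server
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = sum(1 for replica in replicas if replica.put(key, value))
    return acks >= w


replicas = [Replica("n1"), Replica("n2"), Replica("n3", up=False)]

ok_w2 = quorum_write(replicas, "cart:42", ["book"], w=2)
ok_w3 = quorum_write(replicas, "cart:42", ["book"], w=3)

print(ok_w2)  # True
print(ok_w3)  # False
```

Note that the failed `w=3` write still left the value on two replicas. Nothing is rolled back, which is the sense in which this falls short of full transactional behavior.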
3. Query Features

• Key-value stores are simple: you can only look up data by its key. There’s no complex
querying like in relational databases (e.g., filtering by column values).
• If you need more complex querying, some key-value stores like Riak allow you to search
inside the value (e.g., if values are stored as JSON, you can search the fields inside the
JSON).
• Example: If you have a shopping cart, you would query by the cart ID (the key), and you get
the cart details (the value).
4. Structure of Data

• Key-value stores don’t impose a structure on the value part of the data. The value can be
anything, such as text, numbers, JSON data, or even binary files.
• Example: You can store user profiles as JSON objects, and the key could be the user ID.
5. Scaling

• Sharding: To handle more data, key-value stores split (or shard) the data across multiple
servers. For example, if you store data based on a key, a key-value store might assign each
key to a specific server based on a hash function.
• Replication: Data is often copied across multiple servers (replication) to ensure it’s always
available even if a server goes down.
• Example: If you have a key for “user123” that’s stored on one server, the data could also be
stored on several other servers for backup and availability.
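Hash-based sharding and replication can be sketched as below. This is a simplified illustration: the server names, the modulo placement, and the "next servers in the list" replica rule are assumptions, and real stores typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["server-0", "server-1", "server-2"]


def shard_for(key, servers=SERVERS):
    """Pick the primary server for a key from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def replicas_for(key, copies=2, servers=SERVERS):
    """Return the primary plus the next servers in the list as replicas."""
    start = servers.index(shard_for(key, servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]


print(shard_for("user123"))
print(replicas_for("user123"))
```

Because the hash is stable, every client computes the same server for "user123" without consulting a central directory.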
Why Use Key-Value Stores?

• They are very fast for simple lookups by key.


• They are flexible, allowing you to store a wide variety of data types without worrying about
a fixed schema (structure).
• They scale well as your application grows because they can distribute data across multiple
servers.
Example Use Cases:

• Session storage: Storing temporary user session data (e.g., a session ID as the key and the
user’s details as the value).
• Caching: Storing frequently accessed data for quick retrieval, like product details or page
content.
• Shopping cart: Storing cart information with the cart ID as the key and the list of items as
the value.
Conclusion:

Key-value stores are perfect when you need simple, fast lookups, can manage with minimal
querying, and are looking to scale your application quickly. They work well for specific use cases
like session management, caching, and storing user profiles.

Map-Reduce is a simple yet powerful method used to process large amounts of data in parallel
across many computers. It consists of two main steps: Map and Reduce.

Basic Map-Reduce Steps:

1. Map Step:
◦ In the Map step, we break down the input data into smaller pieces and process each
piece independently.
◦ For example, let’s say we have orders, and each order has products with quantity and
price. The Map function takes each order and creates a key-value pair:
▪ Key: Product ID (e.g., "A001")
▪ Value: Quantity and Price (e.g., Quantity: 3, Price: 20)
2. Map Output Example:

◦ Order 1: Product A001, Quantity 3, Price 20 → Emit ("A001", (3, 20))
◦ Order 2: Product A001, Quantity 2, Price 20 → Emit ("A001", (2, 20))
3. Shuffle and Sort Step:

◦ The system groups all the key-value pairs by their key (Product ID in this case). So
all "A001" values will be grouped together for the next step.
4. Reduce Step:

◦ In the Reduce step, the system takes all the values for the same key and combines
them. For example:
▪ For Product A001, we get the quantities and prices from the Map step.
▪ We sum the quantities and calculate total revenue by multiplying quantity
with price.
5. Reduce Calculation Example for Product A001:

◦ Quantity = 3 + 2 = 5
◦ Revenue = (3 * 20) + (2 * 20) = 60 + 40 = 100
6. Final Output:

◦ The final result will be something like:


▪ Product A001 → Total Quantity: 5, Total Revenue: 100
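The quantity and revenue example can be traced in plain Python. This is a single-machine sketch of the steps above, not a distributed job; the order dictionaries and the function names `map_order` and `reduce_product` are illustrative assumptions.

```python
from collections import defaultdict

# The two orders from the example above.
orders = [
    {"product": "A001", "quantity": 3, "price": 20},
    {"product": "A001", "quantity": 2, "price": 20},
]


def map_order(order):
    """Emit (product ID, (quantity, price)) for one order."""
    return order["product"], (order["quantity"], order["price"])


def reduce_product(values):
    """Sum quantities and quantity * price over all values for one product."""
    total_quantity = sum(quantity for quantity, _ in values)
    total_revenue = sum(quantity * price for quantity, price in values)
    return {"quantity": total_quantity, "revenue": total_revenue}


# Shuffle and sort: group the mapped values by product ID.
groups = defaultdict(list)
for order in orders:
    key, value = map_order(order)
    groups[key].append(value)

totals = {key: reduce_product(values) for key, values in groups.items()}
print(totals)  # {'A001': {'quantity': 5, 'revenue': 100}}
```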
In Riak, data is stored in buckets, and operations are performed using HTTP requests. Here's how
data can be read from and written to a bucket:

1. Storing Data (POST)

To store data in Riak, send a POST request with the data and specify the bucket and key.

curl -v -X POST \
-d '{"lastVisit": 1324669989288, "user": {"customerId": "91cfdf5bcb7c", "name": "buyer"}}' \
-H "Content-Type: application/json" \
http://localhost:8098/buckets/session/keys/a7e618d9db25
2. Retrieving Data (GET)

To retrieve data, send a GET request with the bucket name and key:

curl -i http://localhost:8098/buckets/session/keys/a7e618d9db25
3. Deleting Data (DELETE)

To delete data, use a DELETE request with the key:

curl -i -X DELETE http://localhost:8098/buckets/session/keys/a7e618d9db25
Summary:

• POST to store data: curl -X POST -d '{"data"}' ...
• GET to retrieve data: curl -i ... (GET is curl's default method)
• DELETE to remove data: curl -X DELETE ...
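The same three requests can be prepared from Python's standard library. This sketch only builds the request objects and does not send them, since sending requires a running Riak node; the URL assumes a local node on port 8098 with the bucket and key used in the curl commands, and the variable names are illustrative.

```python
import json
import urllib.request

# Assumed local Riak node, with the bucket "session" and key from the curl examples.
URL = "http://localhost:8098/buckets/session/keys/a7e618d9db25"

session = {
    "lastVisit": 1324669989288,
    "user": {"customerId": "91cfdf5bcb7c", "name": "buyer"},
}

# POST: store the JSON document under the key.
store = urllib.request.Request(
    URL,
    data=json.dumps(session).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# GET: retrieve the value; DELETE: remove it.
fetch = urllib.request.Request(URL, method="GET")
delete = urllib.request.Request(URL, method="DELETE")

for request in (store, fetch, delete):
    print(request.get_method(), request.full_url)

# To actually send a request (needs a running Riak node):
# with urllib.request.urlopen(store) as response:
#     print(response.status)
```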
