Nosql Mod4
DOCUMENT DATABASES
1. Documents:
o A document database stores and retrieves documents in formats such as XML, JSON, and BSON.
2. Storage:
o Document databases can be viewed as key-value stores where the value (the
document) is examinable.
o _id in MongoDB:
o ROWID in Oracle:
o Schema Flexibility:
o Data Representation:
▪ Embedding:
o Dynamic Schema:
• MongoDB structure:
o When storing a document, you need to specify which database and collection
it belongs to.
o The w parameter specifies how many nodes must confirm a write before the
write is reported as successful.
▪ For example:
▪ If there is only one node and w is "majority", the write succeeds as
soon as that node accepts it, because one node is the whole majority.
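The majority rule above can be sketched in a few lines. This is a minimal illustration, not driver code: the function names `majority` and `write_acknowledged` are hypothetical, and the sketch assumes every node counts toward the majority.

```python
def majority(node_count: int) -> int:
    """Smallest number of nodes that is more than half of node_count."""
    return node_count // 2 + 1


def write_acknowledged(acks: int, w, node_count: int) -> bool:
    """Return True once enough nodes have confirmed the write.

    w is either an integer or the string "majority" (hypothetical helper,
    modelled on the w parameter described above).
    """
    required = majority(node_count) if w == "majority" else int(w)
    return acks >= required


# One node, w="majority": a single confirmation is enough.
single_node_ok = write_acknowledged(acks=1, w="majority", node_count=1)
# Three nodes, w="majority": one confirmation is not enough, two are.
three_node_one_ack = write_acknowledged(acks=1, w="majority", node_count=3)
three_node_two_acks = write_acknowledged(acks=2, w="majority", node_count=3)
```

A higher w makes writes safer but slower, since the client waits for more nodes to confirm.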
o MongoDB allows reading from secondary (slave) nodes by setting the slaveOk
parameter.
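mongo.slaveOk();  // legacy Java driver: allow reads from secondary nodes
BasicDBObject query = new BasicDBObject();  // query document
query.put("name", "Martin");  // match documents whose name is "Martin"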
o In this example:
shopping.setWriteConcern(REPLICAS_SAFE);
o With REPLICAS_SAFE, a write is reported as successful only after it has
reached the primary and at least one secondary node.
o In a traditional RDBMS transaction, after making changes you can either
commit (keep) or roll back (discard) them.
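mongo.setWriteConcern(REPLICAS_SAFE);
DBCollection shopping = mongo.getDB(orderDatabase)  // orderDatabase: assumed name, the lookup line is missing
    .getCollection(shoppingCollection);
try {
    shopping.insert(order, REPLICAS_SAFE);  // assumed call: the original statement is missing
} catch (MongoException writeException) {
    dealWithWriteFailure(order, writeException);
}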
o In this code:
• CAP Theorem:
o The CAP theorem states that a distributed database can guarantee at most
two of three properties at the same time: Consistency, Availability, and
Partition Tolerance.
o MongoDB uses replica sets for high availability. A replica set consists of
multiple nodes where one is the primary (master) node, and others are
secondary (slave) nodes.
o Master-Slave Replication: The primary node handles all write requests, and
the data is asynchronously replicated to the secondary nodes.
o If the primary node fails, the secondary nodes automatically elect a new
primary. Future requests are routed to the newly elected primary.
o When the failed node comes back online, it rejoins the replica set as a
secondary and catches up with the rest of the nodes by pulling the missing
data.
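The catch-up step can be pictured as replaying the operations the returning node missed. The sketch below is a simplified model and not MongoDB's actual mechanism: the `Member` class and its `log` list are hypothetical, and it assumes the returning node's log is a clean prefix of the source's log, so nothing has to be undone.

```python
class Member:
    """A replica set member holding an ordered log of applied operations."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, operation):
        self.log.append(operation)

    def catch_up_from(self, source):
        """Pull and apply every operation that source has and this member lacks.

        Assumes self.log is a prefix of source.log.
        Returns the number of operations pulled.
        """
        missing = source.log[len(self.log):]
        self.log.extend(missing)
        return len(missing)


primary = Member("node-a")
returning = Member("node-b")

# Both nodes saw the first write; node-b was offline for the next two.
primary.apply("insert order 1")
returning.apply("insert order 1")
primary.apply("insert order 2")
primary.apply("update order 1")

pulled = returning.catch_up_from(primary)
```

After the call, `returning.log` matches `primary.log` and the node can serve reads as a secondary again.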
• Priority Assignment:
o Nodes in the replica set can have different voting rights. Nodes can be
assigned a priority (a number between 0 and 1000) to influence the election
of the primary node.
o For example, nodes in the primary data center can be assigned a higher
priority to ensure they are elected as the primary node.
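How priority influences the choice of primary can be sketched as follows. This is a deliberately simplified model: real elections also take into account how up to date each member's data is, and the `Node` class, node names, and `elect_primary` function are hypothetical. The sketch assumes that a primary can be elected only while a majority of nodes is up, and that a node with priority 0 never becomes primary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int  # 0 to 1000; 0 means the node is never eligible
    up: bool = True


def elect_primary(nodes: List[Node]) -> Optional[Node]:
    """Pick the reachable node with the highest non-zero priority.

    Returns None when fewer than a majority of nodes are up,
    or when no reachable node is eligible.
    """
    reachable = [n for n in nodes if n.up]
    if len(reachable) < len(nodes) // 2 + 1:
        return None
    candidates = [n for n in reachable if n.priority > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)


# Nodes in the primary data center (dc1) get higher priorities.
replica_set = [
    Node("dc1-a", priority=10),
    Node("dc1-b", priority=5),
    Node("dc2-a", priority=1),
]

first_choice = elect_primary(replica_set).name

replica_set[0].up = False          # dc1-a fails
second_choice = elect_primary(replica_set).name

replica_set[1].up = False          # dc1-b fails too: no majority left
no_primary = elect_primary(replica_set)
```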
o The application automatically discovers the other nodes in the replica set.
o If the primary node fails, the MongoDB driver will automatically connect to
the newly elected primary node, and the application does not need to handle
node selection or failure recovery.
o Similar availability setups using replication and failover mechanisms are found
in other products like CouchDB, RavenDB, and Terrastore.
• CouchDB Querying:
o Views: CouchDB uses views for querying documents. Views can be:
o Example: For aggregating reviews and calculating the average rating, you can
create a view that performs the count and average calculations.
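The review example can be sketched as a map step that emits one (product, rating) pair per review and a reduce step that computes the count and average per product. CouchDB views are normally written as JavaScript map and reduce functions; the Python below only mirrors the idea, and the field names `type`, `product`, and `rating` are assumptions made for illustration.

```python
from collections import defaultdict


def map_review(doc):
    """Emit (product, rating) for review documents, nothing for others."""
    if doc.get("type") == "review":
        yield doc["product"], doc["rating"]


def reduce_ratings(ratings):
    """Summarise all ratings emitted for one product."""
    return {"count": len(ratings), "average": sum(ratings) / len(ratings)}


def build_view(docs):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_review(doc):
            grouped[key].append(value)
    return {key: reduce_ratings(values) for key, values in grouped.items()}


documents = [
    {"type": "review", "product": "book-1", "rating": 4},
    {"type": "review", "product": "book-1", "rating": 5},
    {"type": "review", "product": "book-2", "rating": 3},
    {"type": "order", "orderId": 99},  # ignored by the map step
]

view = build_view(documents)
```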
o Unlike key-value stores, document databases allow querying the data within
the document without retrieving the entire document by its key.
o MongoDB provides many other constructs that can be combined for creating
complex queries.
▪ MongoDB: db.order.find()
▪ MongoDB: db.order.find({"customerId":"883c2c5b4e5b"})
▪ MongoDB: db.order.find({customerId:"883c2c5b4e5b"},{orderId:1, orderDate:1})
▪ SQL:
▪ MongoDB: db.orders.find({"items.product.name":/Refactoring/})
▪ Advantage: MongoDB queries are simpler because data is embedded
in documents, allowing direct querying of child objects.
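To show what querying a child object by a dotted path means, here is a small in-memory sketch. It is not how MongoDB evaluates queries; the `values_at` and `find` helpers and the sample orders are hypothetical, and only the path `items.product.name` and the pattern `Refactoring` come from the query above.

```python
import re


def values_at(doc, path):
    """Yield every value found at a dotted path, descending into lists."""
    current = [doc]
    for part in path.split("."):
        next_level = []
        for item in current:
            children = item if isinstance(item, list) else [item]
            for child in children:
                if isinstance(child, dict) and part in child:
                    next_level.append(child[part])
        current = next_level
    for value in current:
        if isinstance(value, list):
            yield from value
        else:
            yield value


def find(collection, path, pattern):
    """Return documents where any string value at path matches pattern."""
    regex = re.compile(pattern)
    return [
        doc
        for doc in collection
        if any(isinstance(v, str) and regex.search(v) for v in values_at(doc, path))
    ]


orders = [
    {"orderId": 99, "items": [
        {"product": {"name": "NoSQL Distilled"}},
        {"product": {"name": "Refactoring"}},
    ]},
    {"orderId": 100, "items": [
        {"product": {"name": "Clean Code"}},
    ]},
]

matches = find(orders, "items.product.name", "Refactoring")
```

In a relational schema the same lookup would typically need joins across order, item, and product tables.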
• Scaling Concept:
o Scaling involves adding nodes or changing data storage to handle more load,
without migrating the database to a larger server.
o The focus is on database features that support increased load rather than
modifying the application itself.
▪ For a 3-node replica set, more slave nodes can be added as the read
load increases.
▪ Adding a Node:
▪ Advantages:
o Sharding:
▪ Sharding splits data by the value of a chosen field (the shard key,
e.g., state or year) and distributes it across different MongoDB nodes.
▪ Sharding Command:
▪ The data is split based on the specified key (e.g., first name) to ensure
balanced distribution across the shards.
o Each shard can be a replica set to improve read performance within the
shard.
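Routing a document to a shard by key range can be sketched as below. This is only an illustration of the idea: MongoDB manages and rebalances its own ranges, whereas the `RangeShardRouter` class, the split points, and the shard names here are fixed, hypothetical values.

```python
import bisect


class RangeShardRouter:
    """Route a key to a shard using sorted split points.

    With split points ["h", "p"], keys below "h" go to the first shard,
    keys from "h" up to (not including) "p" go to the second, and the
    rest go to the third.
    """

    def __init__(self, split_points, shard_names):
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key: str) -> str:
        index = bisect.bisect_right(self.split_points, key.lower())
        return self.shard_names[index]


# Shard customers by first name (the example key mentioned above).
router = RangeShardRouter(["h", "p"], ["shard0", "shard1", "shard2"])

placement = {name: router.shard_for(name) for name in ["Alice", "Martin", "Pramod", "Zoe"]}
```

A key whose values are spread evenly over the ranges keeps the shards balanced; a skewed key would send most documents to one shard.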
o Geographical Sharding:
o Document databases are ideal for storing various types of event data across
different applications.
o They serve as a central data store for event logging, especially when the data
captured by events keeps changing.
▪ They typically support JSON documents, which align well with web-
based content management and publishing.
o They support easy updates to parts of a document, making them ideal for
tracking metrics like page views or unique visitors.
o New metrics can be added easily without the need for schema changes.
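Updating one part of a document and adding a new metric on the fly can be sketched with plain dictionaries. The `increment` helper is hypothetical and loosely modelled on an increment-style update; the document layout and metric names are assumptions.

```python
def increment(doc, path, amount=1):
    """Add amount to the counter at a dotted path, creating it if absent."""
    parts = path.split(".")
    target = doc
    for part in parts[:-1]:
        # Create intermediate sub-documents as needed.
        target = target.setdefault(part, {})
    target[parts[-1]] = target.get(parts[-1], 0) + amount
    return doc


page = {"url": "/home", "metrics": {"pageViews": 10}}

# Update an existing metric without rewriting the rest of the document.
increment(page, "metrics.pageViews")

# Start tracking a new metric; no schema change is required.
increment(page, "metrics.uniqueVisitors")
```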
o They are useful for storing product and order information, allowing data
models to evolve without expensive database refactoring or data migration.
o Document databases are not ideal for scenarios requiring atomic operations
across multiple documents.
o Schema flexibility can cause issues if you need ad hoc queries, or queries
against data whose structure keeps changing.