Nosql Mod4

Document databases store and retrieve documents in formats like XML and JSON, allowing for schema flexibility and dynamic data representation. Key features include self-describing hierarchical structures, handling missing data, and support for embedding child documents. Popular document databases include MongoDB, CouchDB, and RavenDB, with MongoDB serving as a representative example for its features, consistency, transactions, and scaling capabilities.

Uploaded by

Prerana S A
Copyright
© All Rights Reserved

MODULE 4

DOCUMENT DATABASES

1. Documents:

o The primary concept in document databases.

o Stores and retrieves documents in formats like XML, JSON, BSON, etc.

o Documents are self-describing, hierarchical tree structures consisting of maps, collections, and scalar values.

o Documents are similar to each other but not required to be identical.

2. Storage:

o Documents are stored in the value part of a key-value store.

o Document databases can be viewed as key-value stores where the value (the
document) is examinable.

3. Terminology Comparison (Oracle vs MongoDB):

o _id in MongoDB:

▪ A special field found in all documents.

▪ Similar to ROWID in Oracle.

▪ _id can be user-assigned, as long as it remains unique.

o ROWID in Oracle:

▪ Serves a similar function as MongoDB’s _id field.

9.1 WHAT IS A DOCUMENT DATABASE?

A document can be considered analogous to a row in a traditional RDBMS table, although two documents in the same collection need not share the same fields.

1. Key Features:

o Schema Flexibility:

▪ Documents in a collection can have different attribute names and structures.

▪ No fixed schema like in traditional RDBMS, where every row in a table must follow the same schema.

▪ Example: One document has an addresses field, while another has a likes field.

o Data Representation:

▪ Attributes can vary between documents, e.g., some documents may have a likes field, while others may not.

▪ Embedding:

▪ Child documents (e.g., addresses) can be embedded inside the main document for easier access and better performance.

o Handling Missing Data:

▪ If an attribute is missing, it is assumed to be not set or not relevant to the document, unlike in an RDBMS, where every column must be defined and missing data is stored as null or empty.

o Dynamic Schema:

▪ New attributes can be added to documents without the need to define them or modify existing documents.

2. Popular Document Databases:

MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, Lotus Notes (uses document storage)
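
The schema flexibility, embedding, and missing-data behavior listed under Key Features can be sketched with plain Java maps. This is only an illustration under stated assumptions: the class name, field values, and the two sample documents are hypothetical, and no database client is involved.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: documents modeled as nested maps, not a database client.
final class FlexibleDocuments {
    // First document: has embedded child documents under "addresses".
    static final Map<String, Object> FIRST = Map.of(
        "_id", 1,
        "firstname", "Martin",
        "addresses", List.of(Map.of("city", "Chicago", "state", "IL")));

    // Second document in the same collection: no "addresses", but a "likes" field.
    static final Map<String, Object> SECOND = Map.of(
        "_id", 2,
        "firstname", "Pramod",
        "likes", List.of("Biking", "Photography"));

    // A missing attribute is simply absent; nothing is stored as null for it.
    static boolean hasField(Map<String, Object> doc, String field) {
        return doc.containsKey(field);
    }

    public static void main(String[] args) {
        System.out.println(hasField(FIRST, "addresses"));
        System.out.println(hasField(SECOND, "addresses"));
        System.out.println(hasField(SECOND, "likes"));
    }
}
```

Both maps could sit in the same collection even though their fields differ, which is the point the list above makes about schema flexibility.
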
9.2 FEATURES

• MongoDB as a representative of document databases: While there are many specialized document databases, MongoDB is used as a representative to explain features.

• MongoDB structure:

o A MongoDB instance can have multiple databases.

o Each database can contain multiple collections.

• Comparison with RDBMS:

o An RDBMS instance is analogous to a MongoDB instance.

o Schemas in RDBMS are similar to MongoDB databases.

o RDBMS tables are equivalent to MongoDB collections.

• Storing documents in MongoDB:

o When storing a document, you need to specify which database and collection
it belongs to.

o Example: database.collection.insert(document) or db.coll.insert(document).

9.2.1 Consistency in MongoDB:

• Replica Sets for Consistency:

o MongoDB configures consistency through replica sets, which replicate writes to multiple servers.

o The number of servers to which a write must be propagated is configurable.

o A write can be considered successful only after being propagated to a certain number of servers.

• Example Command for Consistency:

o Command: db.runCommand({ getlasterror : 1 , w : "majority" })

o The w parameter specifies how many nodes must confirm the write before
it’s successful.

▪ For example:

▪ If there is one node and w is "majority", the write is successful immediately.

▪ If there are three nodes and w is "majority", the write must complete on at least two nodes.

• Impact of Consistency Settings on Write Performance:

o Stronger consistency (higher w value) leads to slower write performance as more nodes need to confirm the write.

• Increasing Read Performance:

o MongoDB allows reading from secondary (slave) nodes by setting the slaveOk
parameter.

o The slaveOk parameter can be set at the connection, database, collection, or operation level.

• Example Code for Read Consistency:

// Legacy Java driver API: allow reads from secondaries on this connection
Mongo mongo = new Mongo("localhost:27017");
mongo.slaveOk();

o This sets slaveOk for the MongoDB connection.

• Example Code for Query with Slave Read:

DBCollection collection = getOrderCollection();
BasicDBObject query = new BasicDBObject();
query.put("name", "Martin");
DBCursor cursor = collection.find(query).slaveOk();

o In this example:

▪ A query is created to find documents with the name "Martin."

▪ The query uses slaveOk() to allow reading from a slave node.

• WriteConcern for Write Consistency:

o WriteConcern controls the consistency level for write operations.

o By default, a write is considered successful once the database receives it.

o You can configure WriteConcern to wait for writes to sync to disk or propagate to multiple nodes.

• Example Code for Setting WriteConcern:

DBCollection shopping = database.getCollection("shopping");
shopping.setWriteConcern(REPLICAS_SAFE);

o This sets the WriteConcern to REPLICAS_SAFE, ensuring that writes are propagated to both the master and at least one slave.

• Setting WriteConcern per Operation:

WriteResult result = shopping.insert(order, REPLICAS_SAFE);

o This ensures that the write operation is replicated safely across the nodes.

• Trade-offs in Consistency Settings:

o The choice between read performance (slaveOk) and write consistency (WriteConcern) should be made based on application needs and business requirements.
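
The "majority" rule described above (one node out of one, two nodes out of three) comes down to simple arithmetic. The following is a minimal sketch, assuming every node in the replica set counts toward the majority; the class and method names are hypothetical and this is not driver code.

```java
// Illustrative only: the arithmetic behind w: "majority", not MongoDB driver code.
final class MajorityWrite {
    // Smallest number of nodes that is more than half of the replica set.
    static int majority(int nodes) {
        if (nodes < 1) {
            throw new IllegalArgumentException("nodes must be >= 1");
        }
        return nodes / 2 + 1;
    }

    // A write counts as successful once enough nodes have acknowledged it.
    static boolean isAcknowledged(int acks, int nodes) {
        return acks >= majority(nodes);
    }

    public static void main(String[] args) {
        System.out.println(majority(1));
        System.out.println(majority(3));
        System.out.println(isAcknowledged(1, 3));
    }
}
```

The trade-off noted above follows directly: the larger the required acknowledgement count, the longer each write waits before it is reported as successful.
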

9.2.2 Transactions in MongoDB:

• Traditional RDBMS Transactions:

o In traditional RDBMS, transactions allow modifications to multiple tables using commands like insert, update, or delete.

o After making changes, you can decide to either commit (keep) or rollback
(discard) the changes.

• Transactions in NoSQL (MongoDB):

o In NoSQL systems like MongoDB, traditional transactions spanning multiple operations were historically not available.

o In the MongoDB versions described here (before version 4.0 added multi-document transactions), atomicity is supported only at the single-document level. This means:

▪ A write either succeeds or fails at the document level.

▪ There is no concept of commit or rollback for operations spanning multiple documents or collections.

▪ However, some NoSQL products, like RavenDB, do support transactions across multiple operations.

• Write Concern for Fine Control:

o MongoDB provides a way to control write operations' success using the WriteConcern parameter.

o By default, all writes are considered successful as soon as they are received by the database.

o You can configure WriteConcern to ensure the write is propagated to more than one node before being reported as successful.

▪ For example, setting WriteConcern.REPLICAS_SAFE ensures the write is propagated to the primary and at least one secondary node before reporting success.

▪ Different levels of WriteConcern provide varying safety levels:

▪ WriteConcern.NONE is used for the lowest safety level, suitable for less critical operations like writing log entries.

• Example Code for Transactions with Write Concern:

final Mongo mongo = new Mongo(mongoURI);
mongo.setWriteConcern(REPLICAS_SAFE);
DBCollection shopping = mongo.getDB(orderDatabase)
        .getCollection(shoppingCollection);
try {
    WriteResult result = shopping.insert(order, REPLICAS_SAFE);
    // Writes made it to primary and at least one secondary
} catch (MongoException writeException) {
    // Writes did not make it to minimum of two nodes including primary
    dealWithWriteFailure(order, writeException);
}

o In this code:

▪ The MongoDB connection is configured with WriteConcern.REPLICAS_SAFE to ensure the write is successful only when the write reaches the primary and at least one secondary node.

▪ If the write operation fails to propagate to the required nodes, a MongoException is caught, and the failure is handled by the dealWithWriteFailure() method.
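
Single-document atomicity, as described at the start of this section, can be pictured with an in-memory analogy. The sketch below is an assumption-laden illustration, not MongoDB itself: the OrderStore class, its fields, and the "total" and "lastItem" attributes are hypothetical, and ConcurrentHashMap.compute stands in for the database applying one document's update as a single step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative analogy only: an in-memory store, not MongoDB.
final class OrderStore {
    // Hypothetical collection: _id -> document.
    private final ConcurrentHashMap<String, Map<String, Object>> orders = new ConcurrentHashMap<>();

    // Applies both field changes to one document as a single atomic step.
    void addItem(String orderId, String item, int price) {
        orders.compute(orderId, (id, doc) -> {
            Map<String, Object> updated = (doc == null) ? new HashMap<>() : new HashMap<>(doc);
            int total = (Integer) updated.getOrDefault("total", 0);
            updated.put("total", total + price);
            updated.put("lastItem", item);
            return updated;
        });
    }

    // Returns the current document for an _id, or null if none exists.
    Map<String, Object> find(String orderId) {
        return orders.get(orderId);
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.addItem("order-1", "book", 30);
        store.addItem("order-1", "pen", 5);
        System.out.println(store.find("order-1"));
    }
}
```

Readers of one order never see "total" changed without "lastItem", but nothing here ties an update of one order to an update of another, which mirrors the lack of commit or rollback across documents.
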

9.2.3 Availability in MongoDB:

• CAP Theorem:

o The CAP theorem states that a distributed database can guarantee at most two of three properties at once: Consistency, Availability, and Partition Tolerance.

o Document databases such as MongoDB improve availability by replicating data, so that data remains accessible even when the primary node is down.

• Replica Sets in MongoDB:

o MongoDB uses replica sets for high availability. A replica set consists of
multiple nodes where one is the primary (master) node, and others are
secondary (slave) nodes.

o Master-Slave Replication: The primary node handles all write requests, and
the data is asynchronously replicated to the secondary nodes.

o If the primary node fails, the secondary nodes automatically elect a new
primary. Future requests are routed to the newly elected primary.

o When the failed node comes back online, it rejoins the replica set as a
secondary and catches up with the rest of the nodes by pulling the missing
data.

• Priority Assignment:

o Nodes in the replica set can have different voting rights. Nodes can be
assigned a priority (a number between 0 and 1000) to influence the election
of the primary node.

o For example, nodes in the primary data center can be assigned a higher
priority to ensure they are elected as the primary node.

• Automatic Node Discovery:

o When an application connects to a replica set, it only needs to connect to one node (whether primary or secondary).

o The application automatically discovers the other nodes in the replica set.

o If the primary node fails, the MongoDB driver will automatically connect to
the newly elected primary node, and the application does not need to handle
node selection or failure recovery.

• Use Cases for Replica Sets:

o Data Redundancy: Ensures that data is available on multiple nodes, preventing data loss.

o Automated Failover: Automatically elects a new primary if the current primary node fails.

o Read Scaling: Distributes read requests across secondary nodes to reduce the load on the primary node.

o Server Maintenance Without Downtime: Allows for maintenance of servers without interrupting service, as secondary nodes can handle requests during maintenance.

o Disaster Recovery: Ensures data remains accessible and recoverable in case of disasters.

• Comparison with Other Products:

o Similar availability setups using replication and failover mechanisms are found
in other products like CouchDB, RavenDB, and Terrastore.
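
The effect of priorities on failover, described under Priority Assignment, can be sketched as a simple selection rule. This is a deliberately simplified illustration: a real election involves more than priority (for example, votes and how up to date each node is), and the Node class, host names, and priority values below are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative simplification of primary election by priority.
final class PriorityElection {
    // Hypothetical node description; real replica set members track much more state.
    static final class Node {
        final String host;
        final int priority;   // 0 to 1000, as in the text above
        final boolean up;

        Node(String host, int priority, boolean up) {
            this.host = host;
            this.priority = priority;
            this.up = up;
        }
    }

    // Chooses the reachable node with the highest non-zero priority, if any.
    static Optional<Node> electPrimary(List<Node> nodes) {
        return nodes.stream()
            .filter(n -> n.up)
            .filter(n -> n.priority > 0)   // priority 0 means "never primary"
            .max(Comparator.comparingInt(n -> n.priority));
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("dc1-a:27017", 10, false),  // failed primary
            new Node("dc1-b:27017", 5, true),
            new Node("dc2-a:27017", 1, true));
        System.out.println(electPrimary(nodes).map(n -> n.host).orElse("none"));
    }
}
```

Giving the nodes in the primary data center higher priorities, as the text suggests, makes them the preferred choice whenever they are reachable.
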

9.2.4 Query Features in Document Databases:

• CouchDB Querying:

o Views: CouchDB uses views for querying documents. Views can be:

▪ Materialized Views: Precomputed results stored in the database, so values are not recalculated for every request. They are refreshed when queried if the underlying data has changed since the last update.

▪ Dynamic Views: Computed at runtime using map-reduce functions.

o Example: For aggregating reviews and calculating the average rating, you can create a view that performs the count and average calculations.

• Advantages of Document Databases Over Key-Value Stores:

o Unlike key-value stores, document databases allow querying the data within
the document without retrieving the entire document by its key.

o This brings document databases closer to the relational database query model.

• MongoDB Query Language:

o MongoDB’s query language is expressed using JSON.

o Some common constructs in MongoDB queries:

▪ $query: For the WHERE clause.

▪ $orderby: For sorting data.

▪ $explain: To view the execution plan of the query.

o MongoDB provides many other constructs that can be combined for creating
complex queries.

• MongoDB Query Examples:

o Fetching All Documents:

▪ SQL: SELECT * FROM order

▪ MongoDB: db.order.find()

o Fetching Orders for a Specific Customer:

▪ SQL: SELECT * FROM order WHERE customerId = "883c2c5b4e5b"

▪ MongoDB: db.order.find({"customerId":"883c2c5b4e5b"})

o Selecting Specific Fields for a Customer:

▪ SQL: SELECT orderId, orderDate FROM order WHERE customerId = "883c2c5b4e5b"

▪ MongoDB: db.order.find({customerId:"883c2c5b4e5b"},{orderId:1, orderDate:1})

o Querying Embedded Documents:

▪ Example: Querying orders where an item has a product name like "Refactoring".

▪ SQL:

SELECT * FROM customerOrder, orderItem, product
WHERE customerOrder.orderId = orderItem.customerOrderId
AND orderItem.productId = product.productId
AND product.name LIKE '%Refactoring%'

▪ MongoDB: db.orders.find({"items.product.name":/Refactoring/})

▪ Advantage: MongoDB queries are simpler because data is embedded in documents, allowing direct querying of child objects.
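
To make the dotted-path query on embedded documents more concrete, here is a rough sketch of how a path such as items.product.name could be resolved against a nested document. It assumes documents are plain maps and lists; the EmbeddedQuery class, its methods, and the sample order are hypothetical and do not reflect how MongoDB evaluates queries internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of matching a dotted path against a nested document.
final class EmbeddedQuery {
    // Collects every value reachable by following the path, descending into lists.
    static List<Object> resolve(Object node, String[] path, int depth) {
        List<Object> found = new ArrayList<>();
        if (node instanceof List<?>) {
            for (Object element : (List<?>) node) {
                found.addAll(resolve(element, path, depth));
            }
        } else if (depth == path.length) {
            found.add(node);
        } else if (node instanceof Map<?, ?>) {
            Object child = ((Map<?, ?>) node).get(path[depth]);
            if (child != null) {
                found.addAll(resolve(child, path, depth + 1));
            }
        }
        return found;
    }

    // True if any string value at the path contains a match for the pattern.
    static boolean matches(Map<String, Object> doc, String dottedPath, Pattern pattern) {
        for (Object value : resolve(doc, dottedPath.split("\\."), 0)) {
            if (value instanceof String && pattern.matcher((String) value).find()) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical order document with two embedded items.
    static Map<String, Object> sampleOrder() {
        return Map.of(
            "orderId", 99,
            "items", List.of(
                Map.of("product", Map.of("name", "Refactoring")),
                Map.of("product", Map.of("name", "NoSQL Distilled"))));
    }

    public static void main(String[] args) {
        System.out.println(matches(sampleOrder(), "items.product.name", Pattern.compile("Refactoring")));
    }
}
```

No join is needed because the items and their products live inside the order document, which is the advantage the comparison with SQL points out.
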

9.2.5 SCALING IN DOCUMENT DATABASES:

• Scaling Concept:

o Scaling involves adding nodes or changing data storage to handle more load,
without migrating the database to a larger server.

o The focus is on database features that support increased load rather than
modifying the application itself.

• Scaling for Heavy-Read Loads:

o Horizontal Scaling for Reads:

▪ Achieved by adding more read slaves (secondary nodes) to a replica set.

▪ For a 3-node replica set, more slave nodes can be added as the read load increases.

▪ The slaveOk flag allows read operations to be directed to the slave nodes.

▪ Adding a Node:

▪ New node is added with rs.add("mongod:27017").

▪ The new node syncs with existing nodes, joins as a secondary node, and starts serving read requests.

▪ Advantages:

▪ No need to restart other nodes.

▪ No downtime for the application.

• Scaling for Writes:

o Sharding:

▪ Sharding splits data based on a certain field (e.g., state or year), and the data is moved across different Mongo nodes.

▪ This allows for horizontal scaling for writes.

▪ Sharding Command:

▪ db.runCommand({ shardcollection: "ecommerce.customer", key: { firstname: 1 } })

▪ The data is split based on the specified key (e.g., first name) to ensure balanced distribution across the shards.

▪ As more nodes are added, the number of writable nodes increases, providing better scalability for write operations.

• Sharding and Replica Sets:

o Each shard can be a replica set to improve read performance within the
shard.

o As new shards are added, data is rebalanced across the shards.

o Zero Downtime: The application does not experience downtime, although performance may temporarily decrease while data is being moved to rebalance shards.

• Shard Key Importance:

o The choice of shard key is critical to data distribution and performance.

o Geographical Sharding:

▪ Sharding can be done based on user location, such as East Coast or West Coast, ensuring that data is served from the closest shards for faster access.
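
How a shard key decides where a document is written can be sketched with a small range-based router. This is an illustration under assumptions: routing by key range is assumed for a key such as firstname, and the three shard names and range boundaries ("h", "q") are hypothetical rather than anything MongoDB would choose.

```java
import java.util.Locale;
import java.util.TreeMap;

// Illustrative sketch of range-based routing on a shard key.
final class RangeShardRouter {
    // Lower bound of each key range -> shard name (hypothetical layout).
    private final TreeMap<String, String> lowerBounds = new TreeMap<>();

    RangeShardRouter() {
        lowerBounds.put("", "shard0");   // keys below "h"
        lowerBounds.put("h", "shard1");  // keys from "h" up to, but not including, "q"
        lowerBounds.put("q", "shard2");  // keys from "q" onward
    }

    // Finds the range whose lower bound is the greatest one not above the key.
    String shardFor(String firstname) {
        String key = firstname.toLowerCase(Locale.ROOT);
        return lowerBounds.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        RangeShardRouter router = new RangeShardRouter();
        System.out.println(router.shardFor("Martin"));
        System.out.println(router.shardFor("Alice"));
    }
}
```

A poorly chosen key would send most writes to a single range, which is why the choice of shard key matters so much for balanced distribution.
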

9.3 SUITABLE USE CASES FOR DOCUMENT DATABASES:

• 9.3.1 Event Logging:

o Document databases are ideal for storing various types of event data across
different applications.

o They serve as a central data store for event logging, especially when the data
captured by events keeps changing.

o Events can be sharded by the application name or event type (e.g., order_processed, customer_logged).

• 9.3.2 Content Management Systems, Blogging Platforms:

o Document databases are suitable for content management systems and blogging platforms because:

▪ They don’t have predefined schemas, allowing flexibility in storing various types of data (e.g., user comments, profiles).

▪ They typically support JSON documents, which align well with web-based content management and publishing.

• 9.3.3 Web Analytics or Real-Time Analytics:

o Document databases are effective for storing real-time analytics data.

o They support easy updates to parts of a document, making them ideal for tracking metrics like page views or unique visitors.

o New metrics can be added easily without the need for schema changes.

• 9.3.4 E-Commerce Applications:

o E-commerce applications benefit from document databases due to their flexible schema.

o They are useful for storing product and order information, allowing data
models to evolve without expensive database refactoring or data migration.
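
The partial updates mentioned for web analytics (9.3.3) can be pictured as incrementing counters inside a single document, with new metrics appearing on first use. The sketch below is an in-memory illustration only; the PageMetrics class and the metric names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: one analytics document per page, held in memory.
final class PageMetrics {
    // Metric name -> count; no metric has to be declared in advance.
    private final Map<String, Long> metrics = new HashMap<>();

    // Increments a metric, creating it on first use.
    void increment(String metric) {
        metrics.merge(metric, 1L, Long::sum);
    }

    // Returns the current count, or 0 for a metric that was never recorded.
    long get(String metric) {
        return metrics.getOrDefault(metric, 0L);
    }

    public static void main(String[] args) {
        PageMetrics home = new PageMetrics();
        home.increment("pageViews");
        home.increment("pageViews");
        home.increment("uniqueVisitors");
        System.out.println(home.get("pageViews"));
    }
}
```

Adding a metric such as "uniqueVisitors" required no change to any definition, which corresponds to adding new metrics without schema changes.
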

9.4 WHEN NOT TO USE DOCUMENT DATABASES:

• 9.4.1 Complex Transactions Spanning Different Operations:

o Document databases are not ideal for scenarios requiring atomic operations
across multiple documents.

o If cross-document transactions are necessary, document databases may not be suitable.

o However, some document databases, like RavenDB, support these types of operations.

• 9.4.2 Queries against Varying Aggregate Structure:

o Document databases offer flexible schemas, meaning they don’t enforce schema restrictions.

o This flexibility can cause issues if you need to query ad hoc or make queries where the structure of the data keeps changing.

o If the design of aggregates is constantly changing, the data may need to be stored at a lower granularity (normalized data), which could lead to inefficiency in document databases.
