Module 4

Uploaded by

vishnu priya

NOSQL DATABASE

MODULE -4
DOCUMENT DATABASES

Department of
Computer Science & Engineering

www.cambridge.edu.in
DOCUMENT DATABASES
Documents are the main concept in document databases.

• The database stores and retrieves documents, which can be XML, JSON,
BSON (Binary JSON), and so on.

• These documents are self-describing, hierarchical tree data structures which
can consist of maps, collections, and scalar values.
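
The tree shape described above can be sketched in plain Java. This is only an illustration using standard collections, not a database API; the class name DocumentShape and the sample values are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

final class DocumentShape {
    // Hypothetical document: a map whose values are scalars, lists, and nested maps.
    static Map<String, Object> sampleDocument() {
        return Map.of(
            "firstname", "Pramod",
            "citiesvisited", List.of("Chicago", "London"),
            "addresses", List.of(Map.of("state", "MH", "city", "PUNE")));
    }

    // Depth of the tree: a scalar counts as 0, every map or list adds one level.
    static int depth(Object node) {
        int deepestChild = 0;
        if (node instanceof Map) {
            for (Object child : ((Map<?, ?>) node).values()) {
                deepestChild = Math.max(deepestChild, depth(child));
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                deepestChild = Math.max(deepestChild, depth(child));
            }
        } else {
            return 0;
        }
        return deepestChild + 1;
    }

    public static void main(String[] args) {
        System.out.println("depth = " + depth(sampleDocument()));
    }
}
```

The sample has three levels: the document itself, the addresses list, and each address map inside that list.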
What Is a Document Database?

A document is a record in a document-based database that contains data on an
item and any associated metadata. Field-value pairs form documents that can
include a variety of data types like characters, integers, dates, arrays, and
objects. They are commonly saved in XML, JSON, or BSON formats.
Let's look at how terminology compares in Oracle and MongoDB.

The _id is a special field that is found on all documents in MongoDB, just like
ROWID in Oracle. In MongoDB, _id can be assigned by the user, as long as it is
unique.
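
The uniqueness rule for _id can be sketched with an in-memory map. This is a simplified stand-in, not the MongoDB driver: the class IdCollection is hypothetical, and a random UUID string is used here only as a placeholder for the identifier the database would generate when the user supplies none.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class IdCollection {
    private final Map<Object, Map<String, Object>> documentsById = new LinkedHashMap<>();

    // Stores a copy of the document and returns its _id.
    // A placeholder id is generated when the caller did not supply one.
    Object insert(Map<String, Object> document) {
        Map<String, Object> stored = new HashMap<>(document);
        Object id = stored.computeIfAbsent("_id", key -> UUID.randomUUID().toString());
        if (documentsById.putIfAbsent(id, stored) != null) {
            throw new IllegalStateException("duplicate _id: " + id);
        }
        return id;
    }

    int size() {
        return documentsById.size();
    }
}
```

Inserting two documents with the same user-assigned _id fails on the second call, which mirrors the "as long as it is unique" condition.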
{
"firstname": "Martin",
"likes": [ "Biking", "Photography" ],
"lastcity": "Boston",
"lastVisited":
}

firstname   likes                 lastcity   lastVisited
Martin      Biking, Photography   Boston     -

• JSON data is specified in name and value pairs.
• Each data item is separated by commas in JSON.
• The square brackets in JSON are used to hold an array.
• The curly braces in JSON are used to hold an object.
{
  "firstname": "Pramod",
  "citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
  "addresses": [
    { "state": "AK",
      "city": "DILLINGHAM",
      "type": "R"
    },
    { "state": "MH",
      "city": "PUNE",
      "type": "R"
    }
  ],
  "lastcity": "Chicago"
}
firstname | citiesvisited | addresses                             | lastcity
          |               | state, city, type | state, city, type |
Features of Document Databases
9.2.1. Consistency
• Consistency in a MongoDB database is configured by using replica sets and
choosing to wait for the writes to be replicated to all the slaves or a given
number of slaves.

• Every write can specify the number of servers the write has to be propagated
to before it returns as successful.

A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells the
database how strong the consistency you want is.
For example, if you have one server and specify the w as
majority, the write will return immediately since there is only
one node.
db.runCommand({ getlasterror : 1 , w : 1 })
If you have three nodes in the replica set and specify w as majority, the write
will have to complete at a minimum of two nodes before it is reported as a
success. You can increase the w value for stronger consistency but you will
suffer on write performance, since now the writes have to complete at more
nodes.
db.runCommand({ getlasterror : 1 , w: "majority" })
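
The arithmetic behind w : "majority" can be sketched as follows. This is a simplified model that treats every node as a voting member; the class WriteQuorum and its method names are hypothetical and are not part of any MongoDB driver.

```java
final class WriteQuorum {
    // Number of nodes that form a majority in a replica set of the given size.
    static int majority(int replicaSetSize) {
        if (replicaSetSize < 1) {
            throw new IllegalArgumentException("a replica set needs at least one node");
        }
        return replicaSetSize / 2 + 1;
    }

    // True when enough nodes have acknowledged the write to satisfy the requested w.
    static boolean isAcknowledged(int acknowledgements, int requiredW) {
        return acknowledgements >= requiredW;
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {1, 3, 5}) {
            System.out.println(nodes + " node(s): majority = " + majority(nodes));
        }
    }
}
```

With one node the majority is 1, so the write returns immediately; with three nodes it is 2, matching the description above.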
Read Performance

Replica sets also allow you to increase the read performance by allowing
reading from slaves by setting slaveOk; this parameter can be set on the
connection, or database, or collection, or individually for each operation.

Mongo mongo = new Mongo("localhost:27017");
mongo.slaveOk();
Here we are setting slaveOk per operation, so that we can decide
which operations can work with data from the slave node.

DBCollection collection = getOrderCollection();
BasicDBObject query = new BasicDBObject();
query.put("name", "Martin");
DBCursor cursor = collection.find(query).slaveOk();
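
The effect of the slaveOk flag on where a read is sent can be sketched like this. It is an assumption-laden model, not driver code: the class ReadRouter is hypothetical, and the round-robin choice among slaves is picked only for illustration (a real driver may select nodes differently).

```java
import java.util.List;

final class ReadRouter {
    private final String primary;
    private final List<String> secondaries;
    private int next = 0;

    ReadRouter(String primary, List<String> secondaries) {
        this.primary = primary;
        this.secondaries = secondaries;
    }

    // Picks the node for a read: the primary unless the operation opted in
    // to reading from slaves. Slaves are used in round-robin order (assumption).
    String nodeForRead(boolean slaveOk) {
        if (!slaveOk || secondaries.isEmpty()) {
            return primary;
        }
        String node = secondaries.get(next % secondaries.size());
        next++;
        return node;
    }
}
```

Because the flag is passed per call, some operations can stay on the master while others accept data from a slave node.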
WriteConcern:

What is write concern in MongoDB?

Write concern describes the level of acknowledgement requested from MongoDB
for write operations to a standalone mongod, to replica sets, or to sharded
clusters.
You make sure that certain writes are written to the master and some
slaves by setting WriteConcern to REPLICAS_SAFE. Shown below is
code where we are setting the WriteConcern for all writes to a
collection:

DBCollection shopping = database.getCollection("shopping");
shopping.setWriteConcern(REPLICAS_SAFE);
WriteConcern can also be set per operation by specifying it on the insert
call:

WriteResult result = shopping.insert(order, REPLICAS_SAFE);


9.2.2. Transactions
Transactions, in the traditional RDBMS sense, mean that you can start
modifying the database with insert, update, or delete commands over different
tables and then decide if you want to keep the changes or not by using commit
or rollback.

These constructs are generally not available in NoSQL solutions: a write either
succeeds or fails.
• Transactions at the single-document level are known as atomic transactions.
• Transactions involving more than one operation are not possible, although
there are products such as RavenDB that do support transactions across
multiple operations.
final Mongo mongo = new Mongo(mongoURI);
mongo.setWriteConcern(REPLICAS_SAFE);
DBCollection shopping = mongo.getDB(orderDatabase)
                             .getCollection(shoppingCollection);
try {
    WriteResult result = shopping.insert(order, REPLICAS_SAFE);
    // Writes made it to primary and at least one secondary
} catch (MongoException writeException) {
    // Writes did not make it to minimum of two nodes including primary
    dealWithWriteFailure(order, writeException);
}
9.2.3. Availability
• The CAP theorem dictates that we can have only two of Consistency,
Availability, and Partition Tolerance.
• Document databases try to improve on availability by replicating data using
the master-slave setup.
• The same data is available on multiple nodes and the clients can get to the
data even when the primary node is down.
• MongoDB implements replication, providing high availability using replica
sets.
• In a replica set, there are two or more nodes participating in an
asynchronous master-slave replication.
• The replica-set nodes elect the master, or primary, among themselves.
Assuming all the nodes have equal voting rights, some nodes can be favored for
being closer to the other servers, for having more RAM, and so on.
• Users can affect this by assigning a priority (a number between 0 and 1000)
to a node.
• All requests go to the master node, and the data is replicated to the slave
nodes. If the master node goes down, the remaining nodes in the replica set
vote among themselves to elect a new master.
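
The election described above can be sketched in a deliberately simplified form. The model below assumes that a master is chosen only when a majority of the set is reachable, that the reachable node with the highest priority wins, and that priority 0 excludes a node; it ignores other factors a real election considers, such as how up to date each node is. The class ElectionSketch and its Node type are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ElectionSketch {
    static final class Node {
        final String name;
        final int priority;      // 0 to 1000; 0 means "never becomes master"
        final boolean reachable;

        Node(String name, int priority, boolean reachable) {
            this.name = name;
            this.priority = priority;
            this.reachable = reachable;
        }
    }

    // Returns the name of the new master, or empty when no majority is reachable.
    static Optional<String> electPrimary(List<Node> nodes) {
        long reachable = nodes.stream().filter(n -> n.reachable).count();
        if (reachable < nodes.size() / 2 + 1) {
            return Optional.empty();
        }
        return nodes.stream()
            .filter(n -> n.reachable && n.priority > 0)
            .max(Comparator.comparingInt((Node n) -> n.priority))
            .map(n -> n.name);
    }
}
```

In a three-node set where the old master is down, the two remaining nodes still form a majority, so the one with the higher priority takes over.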
Figure: Example configuration of replica sets.

Replica sets are generally used for data redundancy, automated failover, read
scaling, server maintenance without downtime, and disaster recovery. Similar
availability setups can be achieved with CouchDB, RavenDB, Terrastore, and
other products.
9.2.4. Query Features

Document databases provide different query features.


• CouchDB allows you to query via views: complex queries on documents which
can be either materialized or dynamic.
• With CouchDB, if you need to aggregate the number of reviews for a product
as well as the average rating, you could add a view implemented via map-reduce
to return the count of reviews and the average of their ratings.
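
The review aggregation can be sketched with Java streams. This is not CouchDB view code (CouchDB views are written as map and reduce functions inside the database); it only mirrors the idea of grouping by product and reducing to a count and an average. The class ReviewView and the Review type are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReviewView {
    static final class Review {
        final String productId;
        final int rating;

        Review(String productId, int rating) {
            this.productId = productId;
            this.rating = rating;
        }
    }

    // "Map" step: key each review by its product.
    // "Reduce" step: fold the ratings of one product into count and average.
    static Map<String, IntSummaryStatistics> summarize(List<Review> reviews) {
        return reviews.stream().collect(Collectors.groupingBy(
            (Review r) -> r.productId,
            Collectors.summarizingInt((Review r) -> r.rating)));
    }
}
```

For each product, getCount() on the result gives the number of reviews and getAverage() gives the average rating.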
MongoDB has a query language which is expressed via JSON and has constructs
such as $query for the where clause, $orderby for sorting the data, or $explain
to show the execution plan of the query. There are many more constructs like
these that can be combined to create a MongoDB query.
SQL: SELECT * FROM order
MQL: db.order.find()

SQL: SELECT * FROM order WHERE customerId = "883c2c5b4e5b"
MQL: db.order.find({"customerId":"883c2c5b4e5b"})

SQL: SELECT orderId,orderDate FROM order WHERE customerId = "883c2c5b4e5b"
MQL: db.order.find({customerId:"883c2c5b4e5b"}, {orderId:1,orderDate:1})

SQL: SELECT * FROM customerOrder, orderItem, product
     WHERE customerOrder.orderId = orderItem.customerOrderId
     AND orderItem.productId = product.productId
     AND product.name LIKE '%Refactoring%'
MQL: db.orders.find({"items.product.name":/Refactoring/})
The query for MongoDB is simpler because the objects are embedded inside
a single document and you can query based on the embedded child
documents.
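
How a dotted path such as items.product.name reaches into embedded child documents can be sketched over plain maps and lists. This is an illustration of the idea only, not how MongoDB evaluates queries internally; the class EmbeddedQuery is hypothetical and only string values are matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class EmbeddedQuery {
    // Collects every value reachable via the dotted path, descending into lists.
    static List<Object> resolve(Object root, String path) {
        List<Object> current = new ArrayList<>();
        current.add(root);
        for (String key : path.split("\\.")) {
            List<Object> next = new ArrayList<>();
            for (Object value : current) {
                collect(value, key, next);
            }
            current = next;
        }
        return current;
    }

    // Looks the key up in a map, or in every element when the value is a list.
    private static void collect(Object value, String key, List<Object> out) {
        if (value instanceof List) {
            for (Object element : (List<?>) value) {
                collect(element, key, out);
            }
        } else if (value instanceof Map) {
            Object child = ((Map<?, ?>) value).get(key);
            if (child != null) {
                out.add(child);
            }
        }
    }

    // True when any string found at the path contains a match for the pattern.
    static boolean matches(Map<String, Object> document, String path, Pattern pattern) {
        for (Object value : resolve(document, path)) {
            if (value instanceof String && pattern.matcher((String) value).find()) {
                return true;
            }
        }
        return false;
    }
}
```

No join is needed because the items and their products travel inside the order document itself.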
9.2.5. Scaling
The idea of scaling is to add nodes or change data storage without simply
migrating the database to a bigger box.
READ OPERATION:

• Scaling for heavy-read loads can be achieved by adding more read slaves, so
that all the reads can be directed to the slaves.
• Given a heavy-read application, with our 3-node replica-set cluster, we can
add more read capacity to the cluster as the read load increases just by
adding more slave nodes to the replica set to execute reads with the slaveOk
flag (Figure 9.2). This is horizontal scaling for reads.

Once the new node, mongo D, is started, it needs to be added to the replica
set.

rs.add("mongod:27017");

When a new node is added, it will sync up with the existing nodes, join the
replica set as secondary node, and start serving read requests.

An advantage of this setup is that we do not have to restart any other nodes,
and there is no downtime for the application either.
WRITE OPERATION:

• When we want to scale for writes, we can start sharding the data.
• Sharding is similar to partitions in RDBMS where we split data by value in a
certain column, such as state or year. With RDBMS, partitions are usually on
the same node.
• In sharding, the data is also split by a certain field, but then moved to
different Mongo nodes.
• We can add more nodes to the cluster and increase the number of writable
nodes, enabling horizontal scaling for writes.

db.runCommand( { shardcollection : "ecommerce.customer", key : {firstname : 1} })

Splitting the data on the first name of the customer ensures that the data is
balanced across the shards for optimal write performance; furthermore, each
shard can be a replica set ensuring better read performance within the shard.
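
Routing a write to a shard by the firstname key can be sketched with a sorted map of key ranges. The range boundaries and shard names below are made up for the example, and the class ShardRouter is hypothetical; it only illustrates splitting data by the value of one field.

```java
import java.util.Map;
import java.util.TreeMap;

final class ShardRouter {
    // Maps the inclusive lower bound of each key range to the shard that owns it.
    private final TreeMap<String, String> lowerBoundToShard = new TreeMap<>();

    ShardRouter addRange(String lowerBound, String shard) {
        lowerBoundToShard.put(lowerBound, shard);
        return this;
    }

    // Finds the range with the greatest lower bound that is <= the key.
    String shardFor(String firstname) {
        Map.Entry<String, String> entry = lowerBoundToShard.floorEntry(firstname);
        if (entry == null) {
            throw new IllegalArgumentException("no range covers key: " + firstname);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter()
            .addRange("", "shardA")    // hypothetical ranges
            .addRange("H", "shardB")
            .addRange("Q", "shardC");
        System.out.println("Martin -> " + router.shardFor("Martin"));
    }
}
```

Adding another range adds another writable node to route to, which is the horizontal scaling for writes described above.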
Suitable Use Cases

9.3.1. Event Logging

Applications have different event logging needs; within the enterprise, there
are many different applications that want to log events. Document databases
can store all these different types of events and can act as a central data
store for event storage. This is especially true when the type of data being
captured by the events keeps changing. Events can be sharded by the name of
the application where the event originated or by the type of event such as
order_processed or customer_logged.

9.3.2. Content Management Systems, Blogging Platforms

Since document databases have no predefined schemas and usually understand
JSON documents, they work well in content management systems or applications
for publishing websites, managing user comments, user registrations, profiles,
and web-facing documents.

E-Commerce Applications

E-commerce applications often need to have flexible schema for products and
orders.
When Not to Use

9.4.1. Complex Transactions Spanning Different Operations

9.4.2. Queries against Varying Aggregate Structure
