
Syllabus Module 2:

Document-based databases: MongoDB - Documents - JSON & BSON format, representing
relationships, CRUD operations, Indexing, Aggregation, Sharding architecture and Replication
strategies, consistency and locking.

MongoDB
• MongoDB is a powerful, flexible, and scalable general-purpose database.
• It combines the ability to scale out with features such as secondary indexes, range queries,
sorting, aggregations, and geospatial indexes.
• Ease of Use
• Easy Scaling

Ease of Use
• A document-oriented database replaces the concept of a “row” with a more flexible model, the
“document.”
• By allowing embedded documents and arrays, the document oriented approach makes it possible
to represent complex hierarchical relationships with a single record.
• There are also no predefined schemas: a document’s keys and values are not of fixed types or
sizes.
• Without a fixed schema, adding or removing fields as needed becomes easier. Generally, this
makes development faster as developers can quickly iterate. It is also easier to experiment.
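As a brief illustration of embedded documents and arrays (the posts collection and its fields here are hypothetical assumptions, a sketch rather than an example from the notes), a blog post and its comments can live in a single record:

> db.posts.insert({
...     "title" : "My first post",
...     "comments" : [                                // array of embedded documents
...         {"author" : "joe", "text" : "Nice post!"},
...         {"author" : "sam", "text" : "Agreed."}
...     ]
... })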

Easy Scaling
• Data set sizes for applications are growing at an incredible pace. Increases in available bandwidth
and cheap storage have created an environment where even small-scale applications need to
store more data than many databases were meant to handle.
• A terabyte of data, once an unheard-of amount of information, is now commonplace.
• Scaling a database comes down to the choice between scaling up (getting a bigger machine) or
scaling out (partitioning data across more machines).
• Scaling up is often the path of least resistance, but it has drawbacks: large machines are often
very expensive, and eventually a physical limit is reached where a more powerful machine cannot
be purchased at any cost.
• The alternative is to scale out: to add storage space or increase performance, buy another
commodity server and add it to your cluster. This is both cheaper and more scalable; however, it
is more difficult to administer a thousand machines than it is to care for one.
• MongoDB was designed to scale out.
• Its document-oriented data model makes it easier for it to split up data across multiple servers.
• MongoDB automatically takes care of balancing data and load across a cluster, redistributing
documents automatically and routing user requests to the correct machines.
• This allows developers to focus on programming the application, not scaling it.

MongoDB – Features
• Indexing
• MongoDB supports generic secondary indexes, allowing a variety of fast queries, and
provides unique, compound, geospatial, and full-text indexing capabilities as well.
• Aggregation
• MongoDB supports an "aggregation pipeline" that lets you build complex aggregations
from simple pieces and lets the database optimize them (a sketch follows this list).
• Special collection types
• MongoDB supports time-to-live collections for data that should expire at a certain time,
such as sessions. It also supports fixed-size collections, which are useful for holding recent
data, such as logs.
• File storage
• MongoDB supports an easy-to-use protocol for storing large files and file metadata.
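
A rough shell sketch of the aggregation pipeline and the special collection types above (the collection and field names are hypothetical assumptions, not the notes' own examples):

// Aggregation pipeline: total page views per URL, highest first
> db.analytics.aggregate([
...     {"$group" : {"_id" : "$url", "views" : {"$sum" : "$pageviews"}}},
...     {"$sort" : {"views" : -1}}])

// Time-to-live collection: sessions expire 30 minutes after "lastActivity"
// (the indexed field must hold a date for TTL expiration to apply)
> db.sessions.ensureIndex({"lastActivity" : 1}, {"expireAfterSeconds" : 1800})

// Fixed-size (capped) collection for holding recent log data
> db.createCollection("log", {"capped" : true, "size" : 100000})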

MongoDB – Basic Concepts


• A document is the basic unit of data for MongoDB and is roughly equivalent to a row in a relational
database management system (but much more expressive).
• Similarly, a collection can be thought of as a table with a dynamic schema.
• A single instance of MongoDB can host multiple independent databases, each of which can have
its own collections.
• Every document has a special key, "_id", that is unique within a collection.
• MongoDB comes with a simple but powerful JavaScript shell, which is useful for the
administration of MongoDB instances and data manipulation.
• At the heart of MongoDB is the document: an ordered set of keys with associated values.
• The representation of a document varies by programming language, but most languages have a
data structure that is a natural fit, such as a map, hash, or dictionary.
• Example : {"greeting" : "Hello, world!", "foo" : 3}
In this example the value for "greeting" is a string, whereas the value for "foo" is an integer.
• The keys in a document are strings.
• Any UTF-8 character is allowed in a key, with a few notable exceptions:
o Keys must not contain the character \0 (the null character). This character is used to signify
the end of a key.
o The . and $ characters have some special properties and should be used only in certain
circumstances.
• MongoDB is type-sensitive and case-sensitive (illustrated below).
• Documents cannot contain duplicate keys.
• Key/value pairs in documents are ordered: {"x" : 1, "y" : 2} is not the same document as
{"y" : 2, "x" : 1}.
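A quick shell illustration of type- and case-sensitivity (the items collection is a hypothetical sketch):
> db.items.insert({"count" : 5})
> db.items.find({"count" : "5"})   // no match: the string "5" is not the integer 5
> db.items.find({"Count" : 5})     // no match: keys are case-sensitive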
Collections
• A collection is a group of documents.
• If a document is the MongoDB analog of a row in a relational database, then a collection can be
thought of as the analog to a table.
Dynamic Schemas
• Collections have dynamic schemas.
• This means that the documents within a single collection can have any number of different
“shapes.”
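For example (a hypothetical sketch, not the notes' own data), these two differently shaped documents can legally be stored in the same collection:
> db.people.insert({"name" : "joe", "age" : 30})
> db.people.insert({"greeting" : "Hello, world!"})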

MongoDB – CRUD Operations

Inserting and Saving Documents


• Inserts are the basic method for adding data to MongoDB.
• To insert a document into a collection, use the collection’s insert method:
> db.foo.insert({"bar" : "baz"})
• This will add an "_id" key to the document (if one does not already exist) and store the document in MongoDB.

Batch Insert
• If you have a situation where you are inserting multiple documents into a collection, you can make
the insert faster by using batch inserts.
• Batch inserts allow you to pass an array of documents to the database
• > db.foo.batchInsert([{"_id" : 0}, {"_id" : 1}, {"_id" : 2}])
• > db.foo.find()
• { "_id" : 0 } { "_id" : 1 } { "_id" : 2 }
• Batch inserts are only useful if you are inserting multiple documents into a single collection:
• you cannot use batch inserts to insert into multiple collections with a single request.
• If you are importing a batch and a document halfway through the batch fails to be inserted, the
documents up to that document will be inserted and everything after that document will not:
• > db.foo.batchInsert([{"_id" : 0}, {"_id" : 1}, {"_id" : 1}, {"_id" : 2}])
• Only the first two documents will be inserted, as the third will produce an error: you cannot insert
two documents with the same "_id".

Removing Documents
• Now that there’s data in our database, let’s delete it:
• > db.foo.remove()
• This will remove all of the documents in the foo collection.
• This doesn’t actually remove the collection, and any meta information about it will still exist.
• Suppose that we want to remove everyone from the mailing.list collection where the value for
"opt-out" is true:
• > db.mailing.list.remove({"opt-out" : true})
• Once data has been removed, it is gone forever. There is no way to undo the remove or recover
deleted documents
Remove Speed
• Removing documents is usually a fairly quick operation, but if you want to clear an entire
collection, it is faster to drop it:
• > db.tester.drop()

Updating Documents
• Once a document is stored in the database, it can be changed using the update method.
• update takes two parameters: a query document, which locates documents to update, and a
modifier document, which describes the changes to make to the documents found.
Document Replacement
• The simplest type of update fully replaces a matching document with a new one.
• This can be useful to do a dramatic schema migration
• { "_id" : ObjectId("4b2b9f67a1f631733d917a7a"), ‘
"name" : "joe",
"friends" : 32,
"enemies" : 2 }
• We want to move the "friends" and "enemies" fields to a "relationships" subdocument.
> var joe = db.users.findOne({"name" : "joe"});
> joe.relationships = {"friends" : joe.friends, "enemies" : joe.enemies};
{ "friends" : 32, "enemies" : 2 }
> joe.username = joe.name;
"joe"
> delete joe.friends;
true
> delete joe.enemies;
true
> delete joe.name;
true
> db.users.update({"name" : "joe"}, joe);

• { "_id" : ObjectId("4b2b9f67a1f631733d917a7a"),
"username" : "joe",
"relationships" : {
"friends" : 32,
"enemies" : 2 }
}

Using Modifiers
• Usually only certain portions of a document need to be updated.
• You can update specific fields in a document using atomic update modifiers.
• Update modifiers are special keys that can be used to specify complex update operations, such as
altering, adding, or removing keys, and even manipulating arrays and embedded documents.
• Each URL and its number of page views is stored in a document that looks like this:
• { "_id" : ObjectId("4b253b067525f35f94b60a31"),
"url" : "www.example.com",
"pageviews" : 52 }
• To increment the counter, use the "$inc" modifier:
• > db.analytics.update({"url" : "www.example.com"},
... {"$inc" : {"pageviews" : 1}})

“$set” modifier
• "$set" sets the value of a field.
• If the field does not yet exist, it will be created.
• This can be handy for updating schema or adding user-defined keys.
• db.users.findOne()
{ "_id" : ObjectId("4b253b067525f35f94b60a31"),
"name" : "joe",
"age" : 30,
"sex" : "male",
"location" : "Wisconsin" }
• db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, ...
{"$set" : {"favorite book" : "War and Peace"}})

Upserts
• An upsert is a special type of update.
• If no document is found that matches the update criteria, a new document will be created by
combining the criteria document and the update document.
• If a matching document is found, it will be updated normally.
• Upserts can be handy because they can eliminate the need to “seed” your collection: you can
often have the same code create and update documents.
• an upsert (the third parameter to update specifies that this should be an upsert):
• db.analytics.update({"url" : "/blog"}, {"$inc" : {"pageviews" : 1}}, true)

MongoDB – Indexing
• A database index is similar to a book’s index.
• Instead of looking through the whole book, the database takes a shortcut and just looks at an
ordered list that points to the content, which allows it to query orders of magnitude faster.
• A query that does not use an index is called a table scan (a term inherited from relational
databases), which means that the server has to “look through the whole book” to find a query’s
results.
• This process is basically what you’d do if you were looking for information in a book without an
index: you start at page 1 and read through the whole thing.
• In general, you want to avoid making the server do table scans because it is very slow for large
collections.
• Indexes have their price: every write (insert, update, or delete) will take longer for every index
you add.
• This is because MongoDB has to update all your indexes whenever your data changes, as well as
the document itself.
• Thus, MongoDB limits you to 64 indexes per collection.
• Generally you should not have more than a couple of indexes on any given collection.
• The tricky part becomes figuring out which fields to index.
• To choose which fields to create indexes for, look through your common queries and queries that
need to be fast and try to find a common set of keys from those.
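
As a minimal sketch of creating and checking an index (the users collection, username field, and sample query are assumptions for illustration), ensureIndex builds the index and explain() shows how a query was serviced:
> db.users.ensureIndex({"username" : 1})
> db.users.find({"username" : "user101"}).explain()   // reports whether an index scan or a table scan was performed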

MongoDB – Indexing - Compound Indexes


• An index keeps all of its values in a sorted order so it makes sorting documents by the indexed key
much faster.
• However, an index can only help with sorting if the indexed keys are a prefix of the sort keys. For
example, a single-key index on "username" wouldn't help much for this sort:
• > db.users.find().sort({"age" : 1, "username" : 1})
• This sorts by "age" and then "username", so a strict sorting by "username" isn’t terribly helpful.
• To optimize this sort, you could make an index on "age" and "username":
• > db.users.ensureIndex({"age" : 1, "username" : 1})
• This is called a compound index and is useful if your query has multiple sort directions or multiple
keys in the criteria.
• A compound index is an index on more than one field.
• The way MongoDB uses this index depends on the type of query you’re doing.
• The three most common ways are:
• db.users.find({"age" : 21}).sort({"username" : -1})
• This is a point query, which searches for a single value (although there may be multiple
documents with that value).
• Due to the second field in the index, the results are already in the correct order for the sort:
MongoDB can start with the last match for {"age" : 21} and traverse the index in order.
• This type of query is very efficient: MongoDB can jump directly to the correct age and doesn't
need to sort the results, as traversing the index returns the data in the correct order.
• Note that sort direction doesn’t matter: MongoDB is comfortable traversing the index in either
direction.
• db.users.find({"age" : {"$gte" : 21, "$lte" : 30}})
• This is a multi-value query, which looks for documents matching multiple values (in this case, all
ages between 21 and 30).
• db.users.find({"age" : {"$gte" : 21, "$lte" : 30}}).sort({"username" : 1})
• This is a multi-value query, like the previous one, but this time it has a sort.
• As before, MongoDB will use the index to match the criteria.
• However, the index doesn’t return the usernames in sorted order and the query requested that
the results be sorted by username, so MongoDB has to sort the results in memory before
returning them.
• Thus, this query is usually less efficient than the queries above.
• One other index you can use in the last example is the same keys in reverse order: {"username" :
1, "age" : 1}.
• MongoDB will then traverse all the index entries, but in the order you want them back in.
• It will pick out the matching documents using the "age" part of the index.

• This is good in that it does not require any giant in-memory sorts.
• However, it does have to scan the entire index to find all matches.
• Thus, putting the sort key first is generally a good strategy when you’re using a limit so MongoDB
can stop scanning the index after a couple of matches.
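A sketch of that strategy with hypothetical data (assuming the {"username" : 1, "age" : 1} index just described):
> db.users.ensureIndex({"username" : 1, "age" : 1})
> db.users.find({"age" : {"$gte" : 21, "$lte" : 30}}).sort({"username" : 1}).limit(10)
// the sort key leads the index, so MongoDB can return the first 10 matches
// in order without a large in-memory sort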
• If your query is only looking for the fields that are included in the index, it does not need to fetch
the document.
• When an index contains all the values requested by the user, it is considered to be covering a
query.
• Whenever practical, use covered indexes in preference to going back to documents.
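
For example (a hypothetical sketch built on the {"username" : 1} index from earlier), this query can be answered from the index alone because it asks only for the indexed field; note that "_id" must be excluded explicitly, since it is returned by default but is not in the index:
> db.users.find({"username" : "joe"}, {"username" : 1, "_id" : 0})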

MongoDB – Indexing - Types of Indexes


Unique Indexes
• Unique indexes guarantee that each value will appear at most once in the index.
• For example, if you want to make sure no two documents can have the same value in the
"username" key, you can create a unique index:
• > db.users.ensureIndex({"username" : 1}, {"unique" : true})
• For example, suppose that we try to insert the following documents into the collection above:
• > db.users.insert({username: "bob"})
• > db.users.insert({username: "bob"})
• E11000 duplicate key error index: test.users.$username_1 dup key: { : "bob" }
• If you check the collection, you’ll see that only the first "bob" was stored.

Compound unique indexes


• You can also create a compound unique index.
• If you do this, individual keys can have the same values, but the combination of values across all
keys in an index entry can appear in the index at most once.
• For example, if we had a unique index on {"username" : 1, "age" : 1}, the following inserts would
be legal:
• > db.users.insert({"username" : "bob"})
• > db.users.insert({"username" : "bob", "age" : 23})
• > db.users.insert({"username" : "fred", "age" : 23})
• However, attempting to insert a second copy of any of these documents would cause a duplicate
key exception.

Sparse Indexes
• Unique indexes count null as a value, so you cannot have a unique index with more than one
document missing the key.
• However, there are lots of cases where you may want the unique index to be enforced only if the
key exists.
• If you have a field that may or may not exist but must be unique when it does, you can combine
the unique option with the sparse option.
• To create a sparse index, include the sparse option.
• For example, if providing an email address was optional but, if provided, should be unique, we
could do:
• > db.users.ensureIndex({"email" : 1}, {"unique" : true, "sparse" : true})
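
A sketch of the resulting behavior (the documents here are hypothetical):
> db.users.insert({})                              // no email field: allowed
> db.users.insert({})                              // a second missing email: still allowed, because the index is sparse
> db.users.insert({"email" : "joe@example.com"})
> db.users.insert({"email" : "joe@example.com"})   // fails with a duplicate key error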

MongoDB – Sharding
• Sharding refers to the process of splitting data up across machines;
• the term partitioning is also sometimes used to describe this concept.
• By putting a subset of data on each machine, it becomes possible to store more data and handle
more load without requiring larger or more powerful machines, just a larger quantity of
less-powerful machines.
• Manual sharding can be done with almost any database software.
• Manual sharding is when an application maintains connections to several different database
servers, each of which is completely independent.
• The application manages storing different data on different servers and querying against the
appropriate server to get data back.
• This approach can work well but becomes difficult to maintain when adding or removing nodes
from the cluster or in the face of changing data distributions or load patterns.
• MongoDB supports autosharding, which tries to both abstract the architecture away from the
application and simplify the administration of such a system.
• MongoDB allows your application to ignore the fact that it isn’t talking to a standalone MongoDB
server, to some extent.
• On the operations side, MongoDB automates balancing data across shards and makes it easier to
add and remove capacity.
• MongoDB’s sharding allows you to create a cluster of many machines (shards) and break up your
collection across them, putting a subset of data on each shard.
• This allows your application to grow beyond the resource limits of a standalone server or replica
set.
• One of the goals of sharding is to make a cluster of 5, 10, or 1,000 machines look like a single
machine to your application.
• To hide these details from the application, we run a routing process called mongos in front of the
shards.
• This router keeps a “table of contents” that tells it which shard contains which data.
• Applications can connect to this router and issue requests normally.

• The router, knowing what data is on which shard, is able to forward the requests to the
appropriate shard(s).
• If there are responses to the request, the router collects them, merges them, and sends them
back to the application.
• As far as the application knows, it is connected to a standalone mongod.
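
As a minimal sketch of turning on autosharding from a shell connected to a mongos (the database name, collection, and shard key are assumptions for illustration):
> sh.enableSharding("mydb")                            // allow collections in mydb to be sharded
> sh.shardCollection("mydb.users", {"username" : 1})   // split the collection across shards by username
> sh.status()                                          // show the shards and how chunks are distributed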
• Deciding when to shard is a balancing act.
• You generally do not want to shard too early because it adds operational complexity to your
deployment and forces you to make design decisions that are difficult to change later.
• On the other hand, you do not want to wait too long to shard because it is difficult to shard an
overloaded system without downtime.
• In general, sharding is used to:
o Increase available RAM
o Increase available disk space
o Reduce load on a server
o Read or write data with greater throughput than a single mongod can handle
• Thus, good monitoring is important to decide when sharding will be necessary.
• Carefully measure each of these metrics.
• Generally people speed toward one of these bottlenecks much faster than the others, so figure
out which one your deployment will need to provision for first and make plans well in advance
about when and how you plan to convert your replica set.
• As you add shards, performance should increase roughly linearly per shard up to hundreds of
shards.
• However, you will usually experience a performance drop if you move from a non-sharded system
to just a few shards.
• Due to the overhead of moving data, maintaining metadata, and routing, small numbers of shards
will generally have higher latency and may even have lower throughput than a non-sharded
system.
• Thus, you may want to jump directly to three or more shards.
