Fast querying in MongoDB involves optimizing performance through indexing strategies. Indexes are the most impactful way to tune query performance. The type of index used, such as single field, compound, geospatial, and text indexes, depends on the nature of the queries. Index selection, sort order, and background building can further enhance querying speed. Monitoring indexes helps optimize their usage over time.


Fast Querying: Indexing Strategies to Optimize Performance

Muthu Chinnasamy
Senior Solutions Architect
Agenda

• Introduction
• What is fast Querying?
• Index Types & Properties
• Index Intersection
• Index Monitoring
• New in MongoDB 3.0
Introduction
MongoDB's unique architecture
• MongoDB uniquely brings the best features of both RDBMS and NoSQL

RDBMS                  NoSQL
Strong consistency     Flexibility
Secondary indexes      Scalability
Rich query language    Performance


When to use an index?
• Indexes are the single biggest tunable
performance factor for an application
• Use for frequently accessed queries
• Use when low latency response time needed
How different are indexes in MongoDB?
Compared to NoSQL stores, MongoDB indexes are
• Native in the database and not maintained by developers in their code
• Strongly consistent: atomically updated with the data as part of the same write operation
What is Fast Querying?
The query
Question:
Find the zip codes in New York City with a population of more than 100,000. Sort the results by population in descending order.
Query:
db.zips.find({state:'NY', city:'NEW YORK', pop:{'$gt':100000}}).sort({pop:-1})
Output:
{"zip" : "10021", "city" : "NEW YORK", "pop" : 106564, "state" : "NY" }
{"zip" : "10025", "city" : "NEW YORK", "pop" : 100027, "state" : "NY" }
No Index – Collection Scan
Observations:
"cursor" : "BasicCursor"
"n" : 2
"nscannedObjects" : 29470
"nscanned" : 29470
"scanAndOrder" : true
"millis" : 12

29470 documents scanned!
Index on a single field
Create Index:
db.zips.ensureIndex({state:1})

Observations:
"cursor" : "BtreeCursor state_1"
"n" : 2
"nscannedObjects" : 1596
"nscanned" : 1596
"scanAndOrder" : true
"millis" : 3

Better. Only 1596 documents scanned for the same result!
Compound Index on two fields
Create Index:
db.zips.ensureIndex({state:1, city:1})

Observations:
"cursor" : "BtreeCursor state_1_city_1"
"n" : 2
"nscannedObjects" : 40
"nscanned" : 40
"scanAndOrder" : true
"millis" : 0

Much better. Only 40 documents scanned for the same result!
Compound Index on three fields
Create Index:
db.zips.ensureIndex({state:1, city:1, pop:1})

Observations:
"cursor" : "BtreeCursor state_1_city_1_pop_1 reverse"
"n" : 2
"nscannedObjects" : 2
"nscanned" : 2
"scanAndOrder" : false
"millis" : 0

2 documents scanned for the same result. This is fast querying folks!
Types of Indexes
Be sure to remove unneeded indexes
Drop Indexes:
db.zips.dropIndex({state:1, city:1})
db.zips.dropIndex({state:1})

Why drop those indexes?
– Not used by MongoDB for the given queries
– Consume space
– Slow down write operations, since every index must be updated on each write
Use projection
• Reduce data sent back to the client over the network
• Use the projection clause with 1 to include a field and 0 to exclude it
– Return specified fields only in a query
– Return all but excluded fields
– Use $, $elemMatch, or $slice operators to project array fields

// exclude _id and include item & qty fields
> db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id:0 } )
// project all fields except the type field
> db.inventory.find( { type: 'food' }, { type:0 } )
// return only the first two elements of the ratings array; other fields are returned as usual
> db.inventory.find( { _id: 5 }, { ratings: { $slice: 2 } } )
Covered (Index only) Queries
• Returns data from an index only
– Not accessing the collection in a query
– Performance optimization
– Works with compound indexes
– Invoke with a projection that returns only indexed fields and excludes _id

> db.users.ensureIndex( { user : 1, password : 1 } )

> db.users.find( { user: "Muthu" }, { _id: 0, password: 1 } )
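The rule behind covered queries can be written as a small check: every field in the filter and every returned field must be part of the index, and _id must be excluded unless it is indexed. The sketch below is a simplified model in plain JavaScript, assuming a flat filter and an inclusion-style projection; isCovered is a hypothetical helper, not a MongoDB API, and it ignores cases such as multikey indexes.

```javascript
// Simplified coverage check; isCovered is a hypothetical helper for illustration.
function isCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  // Every field used in the filter must be in the index.
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  // Fields explicitly included by the projection.
  const included = Object.keys(projection).filter((f) => projection[f] === 1);
  const returned = new Set(included);
  // _id is returned by default unless the projection sets _id: 0.
  if (projection._id !== 0) returned.add("_id");
  const projectionOk =
    included.length > 0 && [...returned].every((f) => indexed.has(f));
  return filterOk && projectionOk;
}

const usersIndex = { user: 1, password: 1 };
console.log(isCovered(usersIndex, { user: "Muthu" }, { _id: 0, password: 1 })); // true
console.log(isCovered(usersIndex, { user: "Muthu" }, { password: 1 })); // false, _id is not in the index
```

The second call shows the common pitfall: forgetting _id: 0 forces a document fetch because _id is not part of the index.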
Ensure indexes fit in RAM
Index Types & Properties
Indexing Basics

// Create index on author (ascending)
>db.articles.ensureIndex( { author : 1 } )

// Create index on author (descending)
>db.articles.ensureIndex( { author : -1 } )

// Create index on arrays of values on the "tags" field – multi key index.
>db.articles.ensureIndex( { tags : 1 } )
Sub-document indexes
• Index on sub-documents
– Using dot notation

>db.interactions.ensureIndex( { "daily.comments" : 1 } )

>db.interactions.find( { "daily.comments" : { "$gte" : 150 } } )

Example document:
{
  '_id' : ObjectId(..),
  'article_id' : ObjectId(..),
  'section' : 'schema',
  'date' : ISODate(..),
  'daily' : { 'views' : 45, 'comments' : 150 },
  'hours' : {
    0 : { 'views' : 10 },
    1 : { 'views' : 2 },
    …
    23 : { 'views' : 14, 'comments' : 10 }
  }
}
Compound indexes
• Indexes defined on multiple fields

> db.articles.ensureIndex( { author : 1, tags : 1 } )

// The compound index supports queries on both fields
> db.articles.find( { author : 'Muthu C', tags : 'MongoDB' } )

// and queries on the leading field alone
> db.articles.find( { author : 'Muthu C' } )

// so you don't need a separate single field index on "author" such as
> db.articles.ensureIndex( { author : 1 } )
Sort order
• Sort order doesn't matter on single field indexes
– We can read from either side of the btree
• { attribute: 1 } or { attribute: -1 }
• Sort order matters on compound indexes
– We'll want to query on author and sort by date in the application

// index on author ascending but date descending
>db.articles.ensureIndex( { 'author' : 1, 'date' : -1 } )
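Why direction matters only for compound indexes can be made concrete: an index can be read forward or backward, so a sort is supported when its directions match the index definition exactly or are its exact mirror. The sketch below is a simplified model in plain JavaScript; indexSupportsSort is a hypothetical helper, and it deliberately ignores the case where equality conditions on leading fields let a sort start at a later field.

```javascript
// Simplified rule: the sort must follow a prefix of the index fields, with
// directions either all equal to the index or all reversed (backward scan).
function indexSupportsSort(indexKeys, sortSpec) {
  const idx = Object.entries(indexKeys);
  const sort = Object.entries(sortSpec);
  if (sort.length === 0 || sort.length > idx.length) return false;
  // Field names must match the index prefix, in order.
  const sameFields = sort.every(([field], i) => idx[i][0] === field);
  if (!sameFields) return false;
  const forward = sort.every(([, dir], i) => dir === idx[i][1]);
  const backward = sort.every(([, dir], i) => dir === -idx[i][1]);
  return forward || backward;
}

const authorDate = { author: 1, date: -1 };
console.log(indexSupportsSort(authorDate, { author: 1, date: -1 })); // true, forward scan
console.log(indexSupportsSort(authorDate, { author: -1, date: 1 })); // true, backward scan
console.log(indexSupportsSort(authorDate, { author: 1, date: 1 }));  // false, mixed directions
console.log(indexSupportsSort({ author: 1 }, { author: -1 }));       // true, single field
```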


Options

• Uniqueness constraints (unique, dropDups)

// index on author must be unique. Reject duplicates
>db.articles.ensureIndex( { 'author' : 1 }, { unique : true } )

• Sparse Indexes

// sparse: documents that contain none of the indexed fields are left out of the index
>db.articles.ensureIndex( { 'author' : 1, 'likes' : 1 }, { sparse : true } )

* In a non-sparse index, missing fields are stored as null(s) in the index


Background Index Builds

• Index creation is a blocking operation that can take a long time
• Background creation yields to other operations
• Build more than one index in background concurrently
• Restart secondaries in standalone mode to build indexes
// To build in the background
> db.articles.ensureIndex(
  { 'author' : 1, 'date' : -1 },
  { background : true }
)
Other Index Types

• Geospatial Indexes (2dsphere)
• Text Indexes
• TTL Collections (expireAfterSeconds)
• Hashed Indexes for sharding
Geospatial Index - 2dSphere
• Indexes on geospatial fields
– Using GeoJSON objects
– Geometries on spheres

Supported GeoJSON objects:
Point
LineString
Polygon
MultiPoint
MultiLineString
MultiPolygon
GeometryCollection

//GeoJSON object structure for indexing (coordinates are [longitude, latitude])
{
  name: 'MongoDB Palo Alto',
  location: { type : "Point",
              coordinates: [ -122.158574, 37.449157 ] }
}

// Index on GeoJSON objects
>db.articles.ensureIndex( { location: "2dsphere" } )
Extended Articles document
• Store the location the article was posted from
• Geo location from browser

//JavaScript function to get geolocation.
navigator.geolocation.getCurrentPosition();
//You will need to translate into GeoJSON

Articles collection
>db.articles.insert({
  'text' : 'Article content…',
  'date' : ISODate(...),
  'title' : 'Indexing MongoDB',
  'author' : 'Muthu C',
  'tags' : ['mongodb', 'database', 'geospatial'],
  'location' : {
    'type' : 'Point',
    'coordinates' : [-122.158, 37.449]
  }
});
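The translation into GeoJSON mentioned above could look like the following sketch. It assumes a position object shaped like the one the browser passes to the getCurrentPosition callback (coords.latitude and coords.longitude); toGeoJSONPoint and samplePosition are hypothetical names. Note that GeoJSON lists longitude first, then latitude.

```javascript
// Convert a browser-style position into a GeoJSON Point.
// toGeoJSONPoint is a hypothetical helper, not part of any MongoDB driver.
function toGeoJSONPoint(position) {
  const { latitude, longitude } = position.coords;
  if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) {
    throw new RangeError("latitude or longitude out of range");
  }
  // GeoJSON order is [longitude, latitude].
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Hand-written sample standing in for what the browser would provide.
const samplePosition = { coords: { latitude: 37.449, longitude: -122.158 } };
console.log(JSON.stringify(toGeoJSONPoint(samplePosition)));
// {"type":"Point","coordinates":[-122.158,37.449]}
```

In a browser, the same function could be used inside the success callback, for example navigator.geolocation.getCurrentPosition((pos) => save(toGeoJSONPoint(pos))), where save is whatever the application uses to store the article.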
Geo Spatial Example
– Query for locations 'near' a particular coordinate

>db.articles.find( {
  location: { $near : {
    $geometry : { type : "Point", coordinates : [-122.158, 37.449] },
    $maxDistance : 5000
  } }
})
Text Indexes

• Use text indexes to support text search of string content in documents of a collection
• Text indexes can include any field whose value is a string or an array of string elements
• Text indexes can be very large
• To perform queries that access the text index, use the $text query operator
• A collection can have at most one text index
Text Search

Operators
$text, $search, $language, $meta

• Only one text index per collection
• $** wildcard specifier to index all text fields in the collection
• Use weights to change the importance of fields

// Index specific fields
>db.articles.ensureIndex(
  { title : "text", content : "text" }
)

// Or index all text fields under a custom name
>db.articles.ensureIndex(
  { "$**" : "text" },
  { name : "MyTextIndex" }
)

// Or index all text fields with per-field weights
>db.articles.ensureIndex(
  { "$**" : "text" },
  { weights : { "title" : 10, "content" : 5 },
    name : "MyTextIndex" }
)
Search
• Use the $text and $search operators to query
• $meta for scoring results

// Search articles collection
> db.articles.find( { $text: { $search: "MongoDB" } } )

> db.articles.find(
  { $text: { $search: "MongoDB" } },
  { score: { $meta: "textScore" }, _id:0, title:1 } )

{ "title" : "Indexing MongoDB", "score" : 0.75 }


Performance best practices
• MongoDB performs best when the working set fits in RAM
• When the working set exceeds the RAM of a single server, consider sharding across multiple servers
• Use SSDs for write-heavy applications
• Use the compression features of WiredTiger
• Queries on the absence of values and negation queries do not use indexes efficiently
• Use covered queries that use the index only
Performance best practices
• Avoid large indexed arrays
• Use caution indexing low-cardinality fields
• Eliminate unnecessary indexes
• Remove indexes that are prefixes of other indexes
• Avoid regular expressions that are not left-anchored (rooted)
• Use the WiredTiger option to place indexes on a separate, higher-performance volume
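The advice to remove indexes that are prefixes of other indexes can be checked mechanically. The sketch below is plain JavaScript over a list of index key patterns; isPrefixOf and findPrefixIndexes are hypothetical helpers, and the check assumes plain ascending/descending indexes. An index with special options (unique, sparse, TTL) may still be needed even if its keys are a prefix of another index.

```javascript
// True when `shorter` has fewer fields than `longer` and matches its leading
// fields with the same directions.
function isPrefixOf(shorter, longer) {
  const a = Object.entries(shorter);
  const b = Object.entries(longer);
  return (
    a.length < b.length &&
    a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir)
  );
}

// Return the key patterns that are a prefix of some other pattern in the list.
function findPrefixIndexes(indexes) {
  return indexes.filter((candidate) =>
    indexes.some((other) => other !== candidate && isPrefixOf(candidate, other))
  );
}

// Key patterns created on the zips collection earlier in the deck.
const zipsIndexes = [
  { state: 1 },
  { state: 1, city: 1 },
  { state: 1, city: 1, pop: 1 },
];
console.log(findPrefixIndexes(zipsIndexes));
// [ { state: 1 }, { state: 1, city: 1 } ]
```

The two patterns reported are the same ones the deck drops with dropIndex.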
We recognize customers need help
Rapid Start Consulting Service
https://www.mongodb.com/products/consulting#rapid_start
Index Intersection
Index Intersection
• Consider the scenario of a collection with a compound index
{status:1, order_date: -1}, where your query is
a. find({order_date:{'$gt': new Date(…)}, status: 'A'})

MongoDB should be able to use this index, as all fields of the
compound index are used in the query
Index Intersection
• Consider the scenario of a collection with a compound index
{status:1, order_date: -1}, where your query is
b. find({status: 'A'})

MongoDB should be able to use this index, as the leading field of
the compound index is used in the query
Index Intersection
• Consider the scenario of a collection with a compound index
{status:1, order_date: -1}, where your query is
c. find({order_date:{'$gt': new Date(…)}}) //not leading field

MongoDB will not be able to use this index, as order_date in the
query is not a leading field of the compound index
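The leading-field rule used in scenarios a to c can be sketched as a prefix check: walk the index fields in order and stop at the first one the query does not constrain. This is a simplified model in plain JavaScript that follows the slides' rule; usablePrefix is a hypothetical helper, and the dates are placeholders.

```javascript
// Return the leading index fields that the query constrains.
// An empty result means the query cannot use the index under this rule.
function usablePrefix(indexKeys, filter) {
  const queried = new Set(Object.keys(filter));
  const prefix = [];
  for (const field of Object.keys(indexKeys)) {
    if (!queried.has(field)) break; // a gap ends the usable prefix
    prefix.push(field);
  }
  return prefix;
}

const ordersIndex = { status: 1, order_date: -1 };
const someDate = new Date(0); // placeholder value for illustration

// a. both fields: the whole index is usable
console.log(usablePrefix(ordersIndex, { order_date: { $gt: someDate }, status: "A" }));
// b. leading field only: usable on status
console.log(usablePrefix(ordersIndex, { status: "A" }));
// c. non-leading field only: empty prefix, index not usable
console.log(usablePrefix(ordersIndex, { order_date: { $gt: someDate } }));
```

Note that the order of fields in the query document does not matter; only the order of fields in the index does.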
Index Intersection
• Consider the scenario of a collection with a compound index
{status:1, order_date: -1}, where your query is
d. find( {} ).sort({order_date: 1}) // sort order is different

MongoDB will not be able to use this index, as the sort on
order_date does not start with the leading field of the compound
index and its direction differs from that of the index
Index Intersection
Index intersection should be able to resolve all four query
combinations with two separate indexes

a. find({order_date:{'$gt': new Date(…)}, status: 'A'})
b. find({status: 'A'})
c. find({order_date:{'$gt': new Date(…)}}) //not leading field
d. find( {} ).sort({order_date: 1}) // sort order is different

Instead of the compound index {status:1, order_date: -1}, you
would create two single field indexes on {status:1} and
{order_date: -1}
Index Intersection – How to check?
db.zips.find({state: 'CA', city: 'LOS ANGELES'})
"inputStage" : {
"stage" : "AND_SORTED",
"inputStages" : [
{
"stage" : "IXSCAN",

"indexName" : "state_1",

{
"stage" : "IXSCAN",

"indexName" : "city_1",

Index monitoring
The Query Optimizer
• For each "type" of query, MongoDB periodically
tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for each
“type” of query (used for next 1,000 times)
• As of MongoDB 2.6 can use the intersection of
multiple indexes to fulfill queries
Explain plan
• Use to evaluate operations and indexes
– Which indexes have been used.. If any.
– How many documents / objects have been scanned
– View via the console or via code

//To view via the console
> db.articles.find({author:'Joe D'}).explain()
Explain() method
• What are the key metrics?
– # docs returned
– # index entries scanned
– Index used? Which one?
– Whether the query was covered?
– Whether in-memory sort performed?
– How long did the query take in millisec?
Explain plan output (no index)

{
"cursor" : "BasicCursor",
…
"n" : 12,
"nscannedObjects" : 25820,
"nscanned" : 25820,
…
"indexOnly" : false,

"millis" : 27,

}

Other Types:
• BasicCursor
  • Full collection scan
• BtreeCursor
• GeoSearchCursor
• Complex Plan
• TextCursor
Explain plan output (Index)
{
"cursor" : "BtreeCursor author_1_date_-1",
…
"n" : 12,
"nscannedObjects" : 12,
"nscanned" : 12,
…
"indexOnly" : false,

"millis" : 0,

}

Other Types:
• BasicCursor
  • Full collection scan
• BtreeCursor
• GeoSearchCursor
• Complex Plan
• TextCursor
Explain() method in 3.0
• By default .explain() gives query planner verbosity
mode. To see stats use .explain("executionStats")
• Descriptive names used for some key fields
{ …
"nReturned" : 2,
"executionTimeMillis" : 0,
"totalKeysExamined" : 2,
"totalDocsExamined" : 2,
"indexName" : "state_1_city_1_pop_1",
"direction" : "backward",

}
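The 3.0 field names make it easy to compute a rough efficiency ratio: how many index keys and documents were examined per document returned. The sketch below is plain JavaScript over an object with the fields shown above; summarizeExecution is a hypothetical helper, and the usedIndex and possiblyCovered flags are heuristics, not values reported by the server.

```javascript
// Summarize executionStats-style numbers into per-result ratios.
function summarizeExecution(stats) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = stats;
  return {
    keysPerResult: nReturned > 0 ? totalKeysExamined / nReturned : null,
    docsPerResult: nReturned > 0 ? totalDocsExamined / nReturned : null,
    // Heuristic: some index keys were examined.
    usedIndex: totalKeysExamined > 0,
    // Heuristic: index keys examined but no documents fetched.
    possiblyCovered: totalKeysExamined > 0 && totalDocsExamined === 0,
  };
}

// Numbers from the explain output above (three-field compound index).
console.log(
  summarizeExecution({ nReturned: 2, totalKeysExamined: 2, totalDocsExamined: 2 })
);

// The collection scan from earlier in the deck, mapped onto the new field
// names as an assumption (n -> nReturned, nscannedObjects -> totalDocsExamined).
console.log(
  summarizeExecution({ nReturned: 2, totalKeysExamined: 0, totalDocsExamined: 29470 })
);
```

A ratio close to 1 means the index is selective for the query; the collection scan examines 14735 documents per result.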
Explain() method in 3.0
• Fine grained query introspection into query plan and
query execution – Stages
• Support for commands: Count, Group, Delete,
Update
• db.collection.explain().find() – Allows for additional
chaining of query modifiers
– Returns a cursor to the explain result
– var a = db.zips.explain().find({state: 'NY'})
– a.next() to return the results
Database Profiler

• Collect actual samples from a running MongoDB instance
• Tunable for level and slowness
• Can be controlled dynamically
Using Database profiler
• Enable to see slow queries
– (or all queries)
– Default 100ms

// Enable database profiler on the console, 0=off 1=slow 2=all
> db.setProfilingLevel(1, 50)
{ "was" : 0, "slowms" : 50, "ok" : 1 }

// View profile with
> show profile

// See the raw data
>db.system.profile.find().pretty()
New in MongoDB 3.0
Indexes on a separate storage device
$ mongod --dbpath DBPATH --storageEngine wiredTiger --wiredTigerDirectoryForIndexes

• Available only when WiredTiger is configured as the storage engine
• With the wiredTigerDirectoryForIndexes storage engine option
  • One file per collection under DBPATH/collection
  • One file per index under DBPATH/index
• Allows customers to place indexes on a dedicated storage device such as SSD for higher performance
Index compression
$ mongod --dbpath DBPATH --storageEngine wiredTiger --wiredTigerIndexPrefixCompression

• Compression is on in WiredTiger by default
• Indexes on disk are compressed using prefix compression
• Allows indexes to be compressed in RAM
Fine grain control for DBAs

MongoDB 3.0 enhancements allow fine grain control for DBAs
• WiredTiger storage engine for wide use cases
• Index placement on faster storage devices
• Index compression saving disk and RAM capacity
• Finer compression controls for collections and indexes during creation time
Register now: mongodbworld.com
Super Early Bird Ends April 3!
Use Code MuthuChinnasamy for additional 25% Off
*Come as a group of 3 or more – Save another 25%
MongoDB World is back!
June 1-2 in New York.
Use code MuthuChinnasamy for 25% off!
Come as a Group of 3 or More & Save Another 25%.
MongoDB can help you!
MongoDB Enterprise Advanced
The best way to run MongoDB in your data center

MongoDB Management Service (MMS)
The easiest way to run MongoDB in the cloud

Production Support
In production and under control

Development Support
Let’s get you running

Consulting
We solve problems

Training
Get your teams up to speed.
