MongoDB Manual
Release 3.0.1
Contents
1 Introduction to MongoDB . . . . 3
1.1 What is MongoDB . . . . 3
2 Install MongoDB . . . . 5
2.1 Recommended Operating Systems for Production Deployments . . . . 5
2.2 Other Supported Operating Systems . . . . 5
2.3 Installation Guides . . . . 5
2.4 First Steps with MongoDB . . . . 48
2.5 Additional Resources . . . . 54
3 . . . . 55
3.1 . . . . 55
3.2 . . . . 58
3.3 . . . . 91
3.4 . . . . 127
4 Data Models . . . . 141
4.1 Data Modeling Introduction . . . . 141
4.2 Data Modeling Concepts . . . . 143
4.3 Data Model Examples and Patterns . . . . 149
4.4 Data Model Reference . . . . 166
5 Administration . . . . 181
5.1 Administration Concepts . . . . 181
5.2 Administration Tutorials . . . . 219
5.3 Administration Reference . . . . 280
6 Security . . . . 305
6.1 Security Introduction . . . . 305
6.2 Security Concepts . . . . 307
6.3 Security Tutorials . . . . 321
6.4 Security Reference . . . . 389
7 Aggregation . . . . 417
7.1 Aggregation Introduction . . . . 417
7.2 Aggregation Concepts . . . . 421
7.3 Aggregation Examples . . . . 434
7.4 Aggregation Reference . . . . 451
8 Indexes . . . . 463
8.1 Index Introduction . . . . 463
8.2 Index Concepts . . . . 468
8.3 Indexing Tutorials . . . . 502
8.4 Indexing Reference . . . . 537
9 Replication . . . . 541
9.1 Replication Introduction . . . . 541
9.2 Replication Concepts . . . . 545
9.3 Replica Set Tutorials . . . . 581
9.4 Replication Reference . . . . 631
10 Sharding . . . . 641
10.1 Sharding Introduction . . . . 641
10.2 Sharding Concepts . . . . 647
10.3 Sharded Cluster Tutorials . . . . 669
10.4 Sharding Reference . . . . 715
11 . . . . 723
11.1 . . . . 723
11.2 . . . . 726
11.3 . . . . 736
11.4 . . . . 738
11.5 . . . . 742
11.6 . . . . 747
11.7 . . . . 750
11.8 . . . . 755
11.9 . . . . 757
12 Release Notes . . . . 763
12.1 Current Stable Release . . . . 763
12.2 Previous Stable Releases . . . . 794
12.3 Other MongoDB Release Notes . . . . 887
12.4 MongoDB Version Numbers . . . . 887
13 About MongoDB Documentation . . . . 889
13.1 . . . . 889
13.2 . . . . 889
13.3 . . . . 890
13.4 . . . . 890
13.5 . . . . 890
See About MongoDB Documentation (page 889) for more information about the MongoDB Documentation project,
this Manual and additional editions of this text.
Note: This version of the PDF does not include the reference section; see MongoDB Reference Manual1 for a PDF
edition of all MongoDB Reference Material.
1 http://docs.mongodb.org/master/MongoDB-reference-manual.pdf
CHAPTER 1
Introduction to MongoDB
Welcome to MongoDB. This document provides a brief introduction to MongoDB and some key concepts. See the
installation guides (page 5) for information on downloading and installing MongoDB.
CHAPTER 2
Install MongoDB
MongoDB runs on most platforms and supports both 32-bit and 64-bit architectures.
[Table: supported platforms for MongoDB and MongoDB Enterprise]
Install on Ubuntu (page 14) Install MongoDB on Ubuntu Linux systems using .deb packages.
Install on Debian (page 17) Install MongoDB on Debian systems using .deb packages.
Install on Other Linux Systems (page 19) Install the official build of MongoDB on other Linux systems from
MongoDB archives.
Install on OS X (page 21) Install the official build of MongoDB on OS X systems from Homebrew packages or from
MongoDB archives.
Install on Windows (page 24) Install MongoDB on Windows systems and optionally start MongoDB as a Windows
service.
Install MongoDB Enterprise (page 29) MongoDB Enterprise is available for MongoDB Enterprise subscribers and
includes several additional features, including support for SNMP monitoring, LDAP authentication, Kerberos
authentication, and System Event Auditing.
Install MongoDB Enterprise on Red Hat (page 29) Install the MongoDB Enterprise build and required dependencies on Red Hat Enterprise or CentOS Systems using packages.
Install MongoDB Enterprise on Ubuntu (page 32) Install the MongoDB Enterprise build and required dependencies on Ubuntu Linux Systems using packages.
Install MongoDB Enterprise on Amazon AMI (page 40) Install the MongoDB Enterprise build and required
dependencies on Amazon Linux AMI.
Install MongoDB Enterprise on Windows (page 41) Install the MongoDB Enterprise build and required dependencies using the .msi installer.
Overview Use this tutorial to install MongoDB on Red Hat Enterprise Linux or CentOS Linux using .rpm packages.
While some of these distributions include their own MongoDB packages, the official MongoDB packages are generally
more up to date.
Packages MongoDB provides packages of the officially supported MongoDB builds in its own repository. This
repository provides the MongoDB distribution in the following packages:
mongodb-org
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-org-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-org-mongos
This package contains the mongos daemon.
mongodb-org-shell
This package contains the mongo shell.
mongodb-org-tools
This package contains the following MongoDB tools: mongoimport, bsondump, mongodump,
mongoexport, mongofiles, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts The mongodb-org package includes various control scripts, including the init script
/etc/rc.d/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script for use in such environments. See the
mongos reference for configuration details.
Considerations For production deployments, always run MongoDB on 64-bit systems.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages has bind_ip set to
127.0.0.1 by default. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB
Step 1: Configure the package management system (yum). Create a
/etc/yum.repos.d/mongodb-org-3.0.repo file so that you can install MongoDB directly, using
yum.
Use the following repository file to specify the latest stable release of MongoDB.
[mongodb-org-3.0]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1
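The repository file above can also be written from a script. A minimal sketch, assuming a hypothetical helper named write_mongodb_repo that takes the target directory as its argument (normally /etc/yum.repos.d, which requires root):

```shell
# Hypothetical helper: writes the MongoDB 3.0 yum repository definition
# into the directory given as $1 (defaults to /etc/yum.repos.d).
write_mongodb_repo() {
    repo_dir="${1:-/etc/yum.repos.d}"
    # The quoted 'EOF' keeps $releasever literal, so yum (not the shell)
    # expands it when it reads the file.
    cat > "$repo_dir/mongodb-org-3.0.repo" <<'EOF'
[mongodb-org-3.0]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1
EOF
}
```

For example, in a root shell run write_mongodb_repo /etc/yum.repos.d, then yum repolist to check that the mongodb-org-3.0 repository is visible.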
The repository above installs only versions from the MongoDB 3.0 release series. If you'd like to install
MongoDB packages from a particular release series (page 887), such as 2.4 or 2.6, you can specify the release
series in the repository configuration. For example, to restrict your system to the 2.6 release series, create a
/etc/yum.repos.d/mongodb-org-2.6.repo file to hold the following configuration information for the
MongoDB 2.6 repository:
[mongodb-org-2.6]
name=MongoDB 2.6 Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
The .repo files for each release are also available in the repository itself1. Remember that odd-numbered minor release
versions (e.g. 2.5) are development versions and are unsuitable for production use.
Step 2: Install the MongoDB packages and associated tools. When you install the packages, you choose whether
to install the current release or a previous one. This step provides the commands for both.
To install the latest stable version of MongoDB, issue the following command:
sudo yum install -y mongodb-org
To install a specific release of MongoDB, specify each component package individually and append the version number
to the package name, as in the following example:
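Because every component package takes the same version suffix, the argument list can be built with a small helper. A minimal sketch, assuming a hypothetical function named mongodb_yum_pkgs and yum's name-version form (for example mongodb-org-3.0.1):

```shell
# Hypothetical helper: prints the five MongoDB package names with the
# version given in $1 appended, separated by single spaces.
mongodb_yum_pkgs() {
    version="$1"
    pkgs=""
    for name in mongodb-org mongodb-org-server mongodb-org-shell \
                mongodb-org-mongos mongodb-org-tools; do
        # yum accepts "name-version" to request a specific release
        pkgs="${pkgs:+$pkgs }$name-$version"
    done
    printf '%s\n' "$pkgs"
}

# Usage (as root, with the repository configured in Step 1):
#   yum install -y $(mongodb_yum_pkgs 3.0.1)
```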
You can specify any available version of MongoDB. However, yum will upgrade the packages when a newer version
becomes available. To prevent unintended upgrades, pin the package. To pin a package, add the following exclude
directive to your /etc/yum.conf file:
exclude=mongodb-org,mongodb-org-server,mongodb-org-shell,mongodb-org-mongos,mongodb-org-tools
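Adding the exclude directive is easy to repeat by accident, so a script may want to add it only once. A minimal sketch, assuming a hypothetical helper named pin_mongodb_yum and a yum.conf whose [main] section is the last section in the file, as in a default installation; it does not merge with an exclude= line that is already present:

```shell
# Hypothetical helper: appends the MongoDB exclude directive to the yum
# configuration file given as $1 (defaults to /etc/yum.conf) unless the
# exact line is already there.
pin_mongodb_yum() {
    conf="${1:-/etc/yum.conf}"
    line='exclude=mongodb-org,mongodb-org-server,mongodb-org-shell,mongodb-org-mongos,mongodb-org-tools'
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}
```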
Versions of the MongoDB packages before 2.6 use a different repo location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB
Important: You must configure SELinux to allow MongoDB to start on Red Hat Linux-based systems (Red Hat
Enterprise Linux or CentOS Linux). Administrators have three options:
enable access to the relevant ports (e.g. 27017) for SELinux. See Default MongoDB Port (page 408) for more
information on MongoDB's default ports. For default settings, this can be accomplished by running
semanage port -a -t mongod_port_t -p tcp 27017
should be changed to
SELINUX=permissive
All three options require root privileges. The latter two options each require a system reboot and may have larger
implications for your deployment.
1 https://repo.mongodb.org/yum/{{distro_name}}/
You may alternatively choose not to install the SELinux packages when you are installing your Linux operating system,
or choose to remove the relevant packages. This option is the most invasive and is not recommended.
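To see which SELinux mode is configured to apply at boot, a script can read the SELINUX= line of the SELinux configuration file. A minimal sketch, assuming the conventional Red Hat location /etc/selinux/config and a hypothetical helper named selinux_configured_mode; the getenforce command reports the mode currently in effect instead:

```shell
# Hypothetical helper: prints the mode set by the SELINUX= line in the file
# given as $1 (defaults to /etc/selinux/config), e.g. "enforcing".
selinux_configured_mode() {
    cfg="${1:-/etc/selinux/config}"
    # Match only uncommented SELINUX= lines (not SELINUXTYPE=), keep the
    # value, and report the last assignment if there are several.
    sed -n 's/^[[:space:]]*SELINUX=\([A-Za-z]*\).*/\1/p' "$cfg" | tail -n 1
}
```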
The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb
by default, and runs using the mongod user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
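One way to grant that access is to change the ownership of both directories. A minimal sketch, assuming a hypothetical helper named grant_mongodb_dirs and the package default paths; it must run as root unless the caller already owns the directories:

```shell
# Hypothetical helper: gives the account named in $1 ownership of the
# MongoDB data and log directories. Any further arguments replace the
# default directory list (/var/lib/mongo and /var/log/mongodb).
grant_mongodb_dirs() {
    user="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /var/lib/mongo /var/log/mongodb
    fi
    # -R also changes the owner of every file already in the directories
    chown -R "$user" "$@"
}
```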
Step 1: Start MongoDB. You can start the mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
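The same check can be scripted with grep. A minimal sketch, assuming a hypothetical helper named mongod_ready that succeeds only when the log contains the line shown above:

```shell
# Hypothetical helper: exits 0 if the mongod log given as $1 (defaults to
# /var/log/mongodb/mongod.log) contains the "waiting for connections" line.
mongod_ready() {
    log="${1:-/var/log/mongodb/mongod.log}"
    # Brackets are escaped so grep matches them literally
    grep -q '\[initandlisten\] waiting for connections on port' "$log"
}

# Usage:  mongod_ready && echo "mongod is accepting connections"
```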
Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the
/var/log/mongodb/mongod.log file.
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on SUSE
Overview Use this tutorial to install MongoDB on SUSE Linux from .rpm packages. While SUSE distributions
include their own MongoDB packages, the official MongoDB packages are generally more up to date.
2.3. Installation Guides
Packages MongoDB provides packages of the officially supported MongoDB builds in its own repository. This
repository provides the MongoDB distribution in the following packages:
mongodb-org
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-org-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-org-mongos
This package contains the mongos daemon.
mongodb-org-shell
This package contains the mongo shell.
mongodb-org-tools
This package contains the following MongoDB tools: mongoimport, bsondump, mongodump,
mongoexport, mongofiles, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts The mongodb-org package includes various control scripts, including the init script
/etc/rc.d/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script for use in such environments. See the
mongos reference for configuration details.
Considerations For production deployments, always run MongoDB on 64-bit systems.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages has bind_ip set to
127.0.0.1 by default. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Note: SUSE Linux Enterprise Server 11 and potentially other versions of SLES and other SUSE distributions ship
with virtual memory address space limited to 8GB by default. This must be adjusted in order to prevent virtual memory
allocation failures as the database grows.
The SLES packages for MongoDB adjust these limits in the default scripts, but you will need to make this change
manually if you are using custom scripts and/or the tarball release rather than the SLES packages.
Install MongoDB
Step 1: Configure the package management system (zypper). Add the repository so that you can install MongoDB using zypper.
Use the following command to specify the latest stable release of MongoDB.
zypper addrepo --no-gpgcheck http://repo.mongodb.org/zypper/suse/11/mongodb-org/3.0/x86_64/ mongodb
This repository only offers the 3.0 MongoDB release. If you'd like to install MongoDB packages from a previous
release series (page 887), such as 2.6, you can specify the release series in the repository configuration. For example,
to restrict your system to the 2.6 release series, use the following command:
zypper addrepo --no-gpgcheck http://downloads-distro.mongodb.org/repo/suse/os/x86_64/ mongodb
Step 2: Install the MongoDB packages and associated tools. When you install the packages, you choose whether
to install the current release or a previous one. This step provides the commands for both.
To install the latest stable version of MongoDB, issue the following command:
sudo zypper install mongodb-org
To install a specific release of MongoDB, specify each component package individually and append the version number
to the package name, as in the following example:
You can specify any available version of MongoDB. However, zypper will upgrade the packages when a newer
version becomes available. To prevent unintended upgrades, pin the packages by running the following command:
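zypper can pin packages with its addlock subcommand, which accepts several package names at once. A minimal sketch, assuming a hypothetical helper named lock_mongodb_zypper that locks all five MongoDB packages; setting the hypothetical variable ZYPPER=echo prints the command instead of running it:

```shell
# Hypothetical helper: asks zypper to lock the five MongoDB packages so a
# later update leaves the installed versions in place. Run as root.
lock_mongodb_zypper() {
    "${ZYPPER:-zypper}" addlock mongodb-org mongodb-org-server \
        mongodb-org-shell mongodb-org-mongos mongodb-org-tools
}
```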
Previous versions of MongoDB packages use a different repository location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB The MongoDB instance stores its data files in /var/lib/mongo and its log files in
/var/log/mongodb by default, and runs using the mongod user account. You can specify alternate log and
data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. You can start the mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the
/var/log/mongodb/mongod.log file.
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on Amazon Linux
Overview Use this tutorial to install MongoDB on Amazon Linux from .rpm packages.
Packages MongoDB provides packages of the officially supported MongoDB builds in its own repository. This
repository provides the MongoDB distribution in the following packages:
mongodb-org
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-org-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-org-mongos
This package contains the mongos daemon.
mongodb-org-shell
This package contains the mongo shell.
mongodb-org-tools
This package contains the following MongoDB tools: mongoimport, bsondump, mongodump,
mongoexport, mongofiles, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts The mongodb-org package includes various control scripts, including the init script
/etc/rc.d/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script for use in such environments. See the
mongos reference for configuration details.
Considerations For production deployments, always run MongoDB on 64-bit systems.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages has bind_ip set to
127.0.0.1 by default. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB
Step 1: Configure the package management system (yum). Create a
/etc/yum.repos.d/mongodb-org-3.0.repo file so that you can install MongoDB directly, using
yum.
Use the following repository file to specify the latest stable release of MongoDB.
[mongodb-org-3.0]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1
The repository above installs only versions from the MongoDB 3.0 release series. If you'd like to install
MongoDB packages from a particular release series (page 887), such as 2.4 or 2.6, you can specify the release
series in the repository configuration. For example, to restrict your system to the 2.6 release series, create a
/etc/yum.repos.d/mongodb-org-2.6.repo file to hold the following configuration information for the
MongoDB 2.6 repository:
[mongodb-org-2.6]
name=MongoDB 2.6 Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
The .repo files for each release are also available in the repository itself2. Remember that odd-numbered minor release
versions (e.g. 2.5) are development versions and are unsuitable for production use.
Step 2: Install the MongoDB packages and associated tools. When you install the packages, you choose whether
to install the current release or a previous one. This step provides the commands for both.
To install the latest stable version of MongoDB, issue the following command:
sudo yum install -y mongodb-org
To install a specific release of MongoDB, specify each component package individually and append the version number
to the package name, as in the following example:
You can specify any available version of MongoDB. However, yum will upgrade the packages when a newer version
becomes available. To prevent unintended upgrades, pin the package. To pin a package, add the following exclude
directive to your /etc/yum.conf file:
exclude=mongodb-org,mongodb-org-server,mongodb-org-shell,mongodb-org-mongos,mongodb-org-tools
Versions of the MongoDB packages before 2.6 use a different repo location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB The MongoDB instance stores its data files in /var/lib/mongo and its log files in
/var/log/mongodb by default, and runs using the mongod user account. You can specify alternate log and
data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
2 https://repo.mongodb.org/yum/{{distro_name}}/
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. You can start the mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the
/var/log/mongodb/mongod.log file.
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on Ubuntu
Overview Use this tutorial to install MongoDB on Ubuntu Linux systems from .deb packages. While Ubuntu
includes its own MongoDB packages, the official MongoDB packages are generally more up-to-date.
Packages MongoDB provides packages of the officially supported MongoDB builds in its own repository. This
repository provides the MongoDB distribution in the following packages:
mongodb-org
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-org-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-org-mongos
This package contains the mongos daemon.
mongodb-org-shell
This package contains the mongo shell.
mongodb-org-tools
This package contains the following MongoDB tools: mongoimport, bsondump, mongodump,
mongoexport, mongofiles, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts The mongodb-org package includes various control scripts, including the init script
/etc/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script for use in such environments. See the
mongos reference for configuration details.
Considerations For production deployments, always run MongoDB on 64-bit systems.
You cannot install this package concurrently with the mongodb, mongodb-server, or mongodb-clients packages provided by Ubuntu.
MongoDB only provides packages for Ubuntu 12.04 LTS (Precise Pangolin) and 14.04 LTS (Trusty Tahr). These
packages may work with other Ubuntu releases.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages has bind_ip set to
127.0.0.1 by default. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB
Step 1: Import the public key used by the package management system. The Ubuntu package management tools
(i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with
GPG keys. Issue the following command to import the MongoDB public GPG Key3:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
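Before apt-get can see the MongoDB packages, APT needs a source list entry for the MongoDB repository. A minimal sketch, assuming the repository line format deb http://repo.mongodb.org/apt/ubuntu <codename>/mongodb-org/3.0 multiverse and a hypothetical helper named write_mongodb_apt_list; on the target system the codename (precise or trusty) comes from lsb_release -sc:

```shell
# Hypothetical helper: writes an APT source list for the MongoDB 3.0 Ubuntu
# repository. $1 is the Ubuntu codename, $2 the target directory
# (defaults to /etc/apt/sources.list.d, which requires root).
write_mongodb_apt_list() {
    codename="$1"
    list_dir="${2:-/etc/apt/sources.list.d}"
    printf 'deb http://repo.mongodb.org/apt/ubuntu %s/mongodb-org/3.0 multiverse\n' \
        "$codename" > "$list_dir/mongodb-org-3.0.list"
}

# Usage (as root):  write_mongodb_apt_list "$(lsb_release -sc)"
```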
Step 3: Reload local package database. Issue the following command to reload the local package database:
sudo apt-get update
Step 4: Install the MongoDB packages. You can install either the latest stable version of MongoDB or a specific
version of MongoDB.
3 http://docs.mongodb.org/10gen-gpg-key.asc
Install the latest stable version of MongoDB. Issue the following command:
sudo apt-get install -y mongodb-org
Install a specific release of MongoDB. Specify each component package individually and append the version number to the package name, as in the following example:
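apt-get requests a specific version with the package=version form, so the argument list can be built with a small helper. A minimal sketch, assuming a hypothetical function named mongodb_apt_pkgs:

```shell
# Hypothetical helper: prints "package=version" pairs for the five MongoDB
# packages, separated by single spaces.
mongodb_apt_pkgs() {
    version="$1"
    pkgs=""
    for name in mongodb-org mongodb-org-server mongodb-org-shell \
                mongodb-org-mongos mongodb-org-tools; do
        pkgs="${pkgs:+$pkgs }$name=$version"
    done
    printf '%s\n' "$pkgs"
}

# Usage (as root):  apt-get install -y $(mongodb_apt_pkgs 3.0.1)
```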
Pin a specific version of MongoDB. Although you can specify any available version of MongoDB, apt-get will
upgrade the packages when a newer version becomes available. To prevent unintended upgrades, pin the package. To
pin the version of MongoDB at the currently installed version, issue the following command sequence:
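echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections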
Versions of the MongoDB packages before 2.6 use a different repo location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB The MongoDB instance stores its data files in /var/lib/mongodb and its log files in
/var/log/mongodb by default, and runs using the mongodb user account. You can specify alternate log and
data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongodb and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. Issue the following command to start mongod:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. Verify that the mongod process has started successfully
by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
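Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart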
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on Debian
Overview Use this tutorial to install MongoDB from .deb packages on the current stable Debian release. While
Debian includes its own MongoDB packages, the official MongoDB packages are more up to date.
Packages MongoDB provides packages of the officially supported MongoDB builds in its own repository. This
repository provides the MongoDB distribution in the following packages:
mongodb-org
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-org-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-org-mongos
This package contains the mongos daemon.
mongodb-org-shell
This package contains the mongo shell.
mongodb-org-tools
This package contains the following MongoDB tools: bsondump, mongodump, mongoexport,
mongofiles, mongoimport, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts
The mongodb-org package includes various control scripts, including the init script
/etc/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script for use in such environments. See the
mongos reference for configuration details.
Considerations
For production deployments, always run MongoDB on 64-bit systems.
You cannot install this package concurrently with the mongodb, mongodb-server, or mongodb-clients packages that your release of Debian may include.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages sets bind_ip to
127.0.0.1. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB
The Debian package management tools (i.e. dpkg and apt) ensure package consistency and
authenticity by requiring that distributors sign packages with GPG keys.
Step 1: Import the public key used by the package management system. Issue the following command to add
the MongoDB public GPG Key4 to the system key ring.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Step 3: Reload local package database. Issue the following command to reload the local package database:
sudo apt-get update
Step 4: Install the MongoDB packages. You can install either the latest stable version of MongoDB or a specific
version of MongoDB.
Install the latest stable version of MongoDB. Issue the following command:
sudo apt-get install -y mongodb-org
Install a specific release of MongoDB. Specify each component package individually and append the version number to the package name, as in the following example:
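apt-get accepts a package=version argument to request a particular version of a package. The sketch below builds those arguments for each component package. It assumes that the version you want is available in the configured repository (3.0.1 in the usage line is only an example), and the helper name versioned_packages is hypothetical.

```shell
# Hypothetical helper: print "<package>=<version>" for each package,
# space-separated, for use as apt-get install arguments.
versioned_packages() {
  version=$1
  shift
  out=""
  for pkg in "$@"; do
    # ${out:+ } adds a separating space only after the first pair.
    out="$out${out:+ }$pkg=$version"
  done
  printf '%s\n' "$out"
}
```

For example:
sudo apt-get install -y $(versioned_packages 3.0.1 mongodb-org mongodb-org-server mongodb-org-shell mongodb-org-mongos mongodb-org-tools)
The command substitution is deliberately unquoted so that each package=version pair becomes a separate argument.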
Pin a specific version of MongoDB. Although you can specify any available version of MongoDB, apt-get will
upgrade the packages when a newer version becomes available. To prevent unintended upgrades, pin the package. To
pin the version of MongoDB at the currently installed version, issue the following command sequence:
echo
echo
echo
echo
echo
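To confirm that the pins took effect, run dpkg --get-selections, which lists each package with its selection state. The following sketch reads that output and prints any of the named packages that are not on hold. The helper name unheld_packages is hypothetical.

```shell
# Hypothetical helper (not part of MongoDB or dpkg): read
# "dpkg --get-selections"-style lines on stdin and print each named
# package that is not in the "hold" state, in argument order.
unheld_packages() {
  awk -v want="$*" '
    BEGIN { n = split(want, names, " "); for (i = 1; i <= n; i++) need[names[i]] = 1 }
    ($1 in need) && $2 == "hold" { held[$1] = 1 }
    END { for (i = 1; i <= n; i++) if (!(names[i] in held)) print names[i] }
  '
}
```

For example:
dpkg --get-selections | unheld_packages mongodb-org mongodb-org-server mongodb-org-shell mongodb-org-mongos mongodb-org-tools
No output means that every listed package is held.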
Versions of the MongoDB packages before 2.6 use a different repo location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB
The MongoDB instance stores its data files in /var/lib/mongodb and its log files in
/var/log/mongodb by default, and runs using the mongodb user account. You can specify alternate log and
data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongodb and /var/log/mongodb directories to give this user access to these directories.
4 http://docs.mongodb.org/10gen-gpg-key.asc
Step 1: Start MongoDB. Issue the following command to start mongod:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. Verify that the mongod process has started successfully
by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
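The log line may not be written immediately after the service starts. The polling sketch below waits up to a given number of seconds for it. It assumes the same log message and uses a hypothetical helper name, wait_for_mongod.

```shell
# Hypothetical helper: poll a mongod log file until it contains the
# "waiting for connections" line. Returns 0 as soon as the line is found,
# or 1 after about <timeout> seconds (default 30) of one-second waits.
wait_for_mongod() {
  logfile=$1
  timeout=${2:-30}
  elapsed=0
  while :; do
    if grep -q '\[initandlisten\] waiting for connections on port' "$logfile" 2>/dev/null; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

For example: sudo service mongod start && wait_for_mongod /var/log/mongodb/mongod.log 60. mongod appends to the log across restarts, so a line left by an earlier run also satisfies this check.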
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on Linux Systems
Overview
Compiled versions of MongoDB for Linux provide a simple option for installing MongoDB on other
Linux systems without supported packages.
Considerations
For production deployments, always run MongoDB on 64-bit systems.
Install MongoDB
MongoDB provides archives for both 64-bit and 32-bit Linux. Follow the installation procedure
appropriate for your system.
Install for 64-bit Linux
Step 1: Download the binary files for the desired release of MongoDB. Download the binaries from
https://www.mongodb.org/downloads.
For example, to download the latest release through the shell, issue the following:
curl -O http://downloads.mongodb.org/linux/mongodb-linux-x86_64-3.0.1.tgz
Step 2: Extract the files from the downloaded archive. For example, from a system shell, you can extract through
the tar command:
tar -zxvf mongodb-linux-x86_64-3.0.1.tgz
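Before you extract the files, you can run tar -tzf to list the archive's contents without unpacking it. This is a quick way to confirm that the download is a readable archive with a bin/mongod entry. A minimal sketch, using the hypothetical helper name archive_has_mongod:

```shell
# Hypothetical helper: return 0 if the given .tgz archive lists an entry
# ending in /bin/mongod, without extracting anything.
archive_has_mongod() {
  tar -tzf "$1" | grep -q '/bin/mongod$'
}
```

For example: archive_has_mongod mongodb-linux-x86_64-3.0.1.tgz && tar -zxvf mongodb-linux-x86_64-3.0.1.tgz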
Step 3: Copy the extracted archive to the target directory. Copy the extracted folder to the location from which
MongoDB will run.
mkdir -p mongodb
cp -R -n mongodb-linux-x86_64-3.0.1/ mongodb
Step 4: Ensure the location of the binaries is in the PATH variable. The MongoDB binaries are in the bin/
directory of the archive. To ensure that the binaries are in your PATH, you can modify your PATH.
For example, you can add the following line to your shell's rc file (e.g. ~/.bashrc):
export PATH=<mongodb-install-directory>/bin:$PATH
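The export line above prepends the directory every time the rc file is read, so duplicate entries can accumulate in PATH (for example, in nested shells). The following sketch is an idempotent alternative. It assumes a POSIX-compatible shell such as bash, and the helper name path_prepend is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already on PATH; leave it unchanged
    *) PATH="$1${PATH:+:$PATH}" ;;     # add a ':' only if PATH was non-empty
  esac
  export PATH
}
```

For example, in ~/.bashrc after the function definition:
path_prepend <mongodb-install-directory>/bin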
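Install for 32-bit Linux
Step 1: Download the binary files for the desired release of MongoDB. Download the binaries from
https://www.mongodb.org/downloads.
For example, to download the latest release through the shell, use the curl -O command from the 64-bit procedure
with the mongodb-linux-i686-3.0.1.tgz archive name in place of the 64-bit one.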
Step 2: Extract the files from the downloaded archive. For example, from a system shell, you can extract through
the tar command:
tar -zxvf mongodb-linux-i686-3.0.1.tgz
Step 3: Copy the extracted archive to the target directory. Copy the extracted folder to the location from which
MongoDB will run.
mkdir -p mongodb
cp -R -n mongodb-linux-i686-3.0.1/ mongodb
Step 4: Ensure the location of the binaries is in the PATH variable. The MongoDB binaries are in the bin/
directory of the archive. To ensure that the binaries are in your PATH, you can modify your PATH.
For example, you can add the following line to your shell's rc file (e.g. ~/.bashrc):
export PATH=<mongodb-install-directory>/bin:$PATH
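Run MongoDB
Step 1: Create the data directory. Before you start MongoDB for the first time, create the directory to which the
mongod process will write data. By default, the mongod process uses the /data/db directory. If you create a directory
other than this one, you must specify that directory in the dbpath option when starting the mongod process later in
this procedure. The following example command creates the default /data/db directory: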
mkdir -p /data/db
Step 2: Set permissions for the data directory. Before running mongod for the first time, ensure that the user
account running mongod has read and write permissions for the directory.
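You can combine creating the data directory and checking its permissions in a small check before you start mongod. The sketch below creates the directory if needed and reports an error if the current user cannot write to it. The helper name ensure_dbpath is hypothetical. Creating /data/db itself usually requires sudo, because only root can normally write to the filesystem root.

```shell
# Hypothetical helper: create a mongod data directory if needed and confirm
# that the current user can write to it. Returns non-zero with a message
# on standard error otherwise.
ensure_dbpath() {
  dir=${1:-/data/db}
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "no write permission on $dir for user $(id -un)" >&2
    return 1
  fi
}
```

For example: ensure_dbpath /data/db && mongod --dbpath /data/db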
Step 3: Run MongoDB. To run MongoDB, run the mongod process at the system prompt. If necessary, specify the
path of the mongod or the data directory. See the following examples.
Run without specifying paths If your system PATH variable includes the location of the mongod binary and if you
use the default data directory (i.e., /data/db), simply enter mongod at the system prompt:
mongod
Specify the path of the mongod If your PATH does not include the location of the mongod binary, enter the full
path to the mongod binary at the system prompt:
<path to binary>/mongod
Specify the path of the data directory If you do not use the default data directory (i.e., /data/db), specify the
path to the data directory using the --dbpath option:
mongod --dbpath <path to data directory>
Step 4: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on OS X
Install MongoDB with Homebrew
Homebrew8 installs binary packages based on published formulae. This section describes how to update brew to
the latest packages and install MongoDB. Homebrew requires some initial setup and configuration, which is beyond
the scope of this document.
Step 1: Update Homebrew's package database. In a system shell, issue the following command:
brew update
You can install MongoDB via brew with several different options. Use one of the following operations:
Install the MongoDB Binaries To install the MongoDB binaries, issue the following command in a system shell:
brew install mongodb
To build MongoDB from the source files and include SSL sup-
Install the Latest Development Release of MongoDB To install the latest development release for use in testing
and development, issue the following command in a system shell:
brew install mongodb --devel
Install MongoDB Manually
Only install MongoDB using this procedure if you cannot use Homebrew (page 22).
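Step 1: Download the binary files for the desired release of MongoDB. Download the binaries from
https://www.mongodb.org/downloads.
Step 2: Extract the files from the downloaded archive. For example, from a system shell, you can extract through
the tar command: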
8 http://brew.sh/
Step 3: Copy the extracted archive to the target directory. Copy the extracted folder to the location from which
MongoDB will run.
mkdir -p mongodb
cp -R -n mongodb-osx-x86_64-3.0.1/ mongodb
Step 4: Ensure the location of the binaries is in the PATH variable. The MongoDB binaries are in the bin/
directory of the archive. To ensure that the binaries are in your PATH, you can modify your PATH.
For example, you can add the following line to your shell's rc file (e.g. ~/.bashrc):
export PATH=<mongodb-install-directory>/bin:$PATH
Run MongoDB
Step 1: Create the data directory. Before you start MongoDB for the first time, create the directory to which the mongod process will write data. By
default, the mongod process uses the /data/db directory. If you create a directory other than this one, you must
specify that directory in the dbpath option when starting the mongod process later in this procedure.
The following example command creates the default /data/db directory:
mkdir -p /data/db
Step 2: Set permissions for the data directory. Before running mongod for the first time, ensure that the user
account running mongod has read and write permissions for the directory.
Step 3: Run MongoDB.
To run MongoDB, run the mongod process at the system prompt. If necessary, specify the path of the mongod or the
data directory. See the following examples.
Run without specifying paths If your system PATH variable includes the location of the mongod binary and if you
use the default data directory (i.e., /data/db), simply enter mongod at the system prompt:
mongod
Specify the path of the mongod If your PATH does not include the location of the mongod binary, enter the full
path to the mongod binary at the system prompt:
<path to binary>/mongod
Specify the path of the data directory If you do not use the default data directory (i.e., /data/db), specify the
path to the data directory using the --dbpath option:
mongod --dbpath <path to data directory>
Step 4: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also consider the Production Notes
(page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB on Windows
Requirements
On Windows, MongoDB requires Windows Server 2008 R2, Windows Vista, or later. The .msi installer includes all
other software dependencies and will automatically upgrade any older version of MongoDB installed using an .msi
file.
Get MongoDB
Step 1: Determine which MongoDB build you need.
MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows Vista. 32-bit versions
of MongoDB are only intended for older systems and for use in testing and development systems. 32-bit versions of
MongoDB only support databases smaller than 2GB.
MongoDB for Windows 64-bit Legacy runs on Windows Vista, Windows Server 2003, and Windows Server 2008
and does not include recent performance enhancements.
To find which version of Windows you are running, enter the following commands in the Command Prompt or PowerShell:
wmic os get caption
wmic os get osarchitecture
Download the latest production release of MongoDB from the MongoDB downloads page10. Ensure you download
the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB do not work with
32-bit Windows.
Install MongoDB
Interactive Installation
Step 1: Install MongoDB for Windows.
In Windows Explorer, locate the downloaded MongoDB .msi file, which typically is located in the default
Downloads folder. Double-click the .msi file. A set of screens will appear to guide you through the installation process.
You may specify an installation directory if you choose the Custom installation option. These instructions assume
that you have installed MongoDB to C:\mongodb.
MongoDB is self-contained and does not have any other system dependencies. You can install and run MongoDB from any
folder you choose (e.g. D:\test\mongodb).
Unattended Installation
You may install MongoDB unattended on Windows from the command line using msiexec.exe.
Step 1: Install MongoDB for Windows.
Open a shell in the directory containing the .msi installation binary of your choice and invoke:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="<installation directory>"
By default, this method installs the following MongoDB binaries: mongod.exe, mongo.exe, mongodump.exe,
mongorestore.exe, mongoimport.exe, mongoexport.exe, mongostat.exe, and mongotop.exe.
You can specify the installation location for the executable by modifying the <installation directory>
value. To install specific subsets of the binaries, you may specify an ADDLOCAL argument:
10 http://www.mongodb.org/downloads
The <binary set(s)> value is a comma-separated list including one or more of the following:
Server - includes mongod.exe
Client - includes mongo.exe
MonitoringTools - includes mongostat.exe and mongotop.exe
ImportExportTools - includes mongodump.exe, mongorestore.exe, mongoexport.exe, and
mongoimport.exe
MiscellaneousTools - includes bsondump.exe, mongofiles.exe, mongooplog.exe, and
mongoperf.exe
For instance, to install only the entire set of tools to C:\mongodb, invoke:
You may also specify ADDLOCAL=ALL to install the complete set of binaries, as in the following:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="C:\mongodb" ADDLOCAL=ALL
Run MongoDB
Warning: Do not make mongod.exe visible on public networks without running in Secure Mode with the
auth setting. MongoDB is designed to be run in trusted environments, and the database does not enable Secure
Mode by default.
MongoDB requires a data directory to store all data. MongoDB's default data directory path is \data\db. Create
this folder using the following commands from a Command Prompt:
md \data\db
You can specify an alternate path for data files using the --dbpath option to mongod.exe, for example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
If your path includes spaces, enclose the entire path in double quotes, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
To start MongoDB, run mongod.exe. For example, from the Command Prompt:
C:\mongodb\bin\mongod.exe
This starts the main MongoDB database process. The "waiting for connections" message in the console
output indicates that the mongod.exe process is running successfully.
Depending on the security level of your system, Windows may pop up a Security Alert dialog box about blocking
some features of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select
"Private Networks, such as my home or work network" and click "Allow access". For additional
information on security and MongoDB, please see the Security Documentation (page 307).
Step 3: Connect to MongoDB.
To connect to MongoDB through the mongo.exe shell, open another Command Prompt.
C:\mongodb\bin\mongo.exe
If you want to develop applications using .NET, see the documentation of C# and MongoDB11 for more information.
Step 4: Begin using MongoDB.
To begin using MongoDB, see Getting Started with MongoDB (page 48). Also consider the Production Notes
(page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Manually Create a Windows Service for MongoDB
You can set up the MongoDB server as a Windows Service that starts automatically at boot time.
The following procedure assumes you have installed MongoDB using the .msi installer with the path
C:\mongodb\.
If you have installed in an alternative directory, you will need to adjust the paths as appropriate.
Step 1: Open an Administrator command prompt.
Windows 7 / Vista / Server 2008 (and R2) Press Win + R, then type cmd, then press Ctrl + Shift +
Enter.
Windows 8 Press Win + X, then press A.
Execute the remaining steps from the Administrator command prompt.
Step 2: Create directories.
Create directories for your database and log files:
mkdir c:\data\db
mkdir c:\data\log
Step 3: Create a configuration file. This file can include any of the configuration options for mongod, but
must include a valid setting for logpath:
The following creates a configuration file, specifying both the logpath and the dbpath settings in the configuration
file:
echo logpath=c:\data\log\mongod.log> "C:\mongodb\mongod.cfg"
echo dbpath=c:\data\db>> "C:\mongodb\mongod.cfg"
sc.exe requires a space between = and the configuration values (e.g. binPath= ), and a \ to escape double
quotes.
If successfully created, the following log message will display:
[SC] CreateService SUCCESS
To remove the MongoDB service, first stop the service and then run the following command:
sc.exe delete MongoDB
Additional Resources
MongoDB for Developers Free Course12
MongoDB for .NET Developers Free Online Course13
MongoDB Architecture Guide14
12 https://university.mongodb.com/courses/M101P/about
13 https://university.mongodb.com/courses/M101N/about
14 https://www.mongodb.com/lp/white-paper/architecture-guide
Install MongoDB Enterprise on Red Hat Enterprise Linux or CentOS
Overview
Use this tutorial to install MongoDB Enterprise on Red Hat Enterprise Linux or CentOS Linux from .rpm packages.
Packages
MongoDB provides packages of the officially supported MongoDB Enterprise builds in its own repository. This
repository provides the MongoDB Enterprise distribution in the following packages:
mongodb-enterprise
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-enterprise-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-enterprise-mongos
This package contains the mongos daemon.
mongodb-enterprise-shell
This package contains the mongo shell.
mongodb-enterprise-tools
This package contains the following MongoDB tools: bsondump, mongodump, mongoexport,
mongofiles, mongoimport, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts
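The mongodb-enterprise package includes various control scripts, including the init script
/etc/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.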
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script.
Considerations
MongoDB only provides Enterprise packages for Red Hat Enterprise Linux and CentOS Linux versions 5 and 6,
64-bit.
The default /etc/mongod.conf configuration file supplied by the 3.0 series packages sets bind_ip to
127.0.0.1. Modify this setting as needed for your environment before initializing a replica set.
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB Enterprise
When you install the packages for MongoDB Enterprise, you choose whether to install the current release or a previous
one. This procedure describes how to do both.
Step 1: Configure repository. Create an /etc/yum.repos.d/mongodb-enterprise.repo file so that
you can install MongoDB Enterprise directly, using yum.
Use the following repository file to specify the latest stable release of MongoDB Enterprise.
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/stable/$basearch/
gpgcheck=0
enabled=1
If you'd like to install MongoDB Enterprise packages from a particular release series (page 887), such as 2.4 or 2.6, you can specify the release series in the repository configuration. For example, to restrict your system to the 2.6 release series, create a
/etc/yum.repos.d/mongodb-enterprise-2.6.repo file to hold the following configuration information
for the MongoDB Enterprise 2.6 repository:
[mongodb-enterprise-2.6]
name=MongoDB Enterprise 2.6 Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/2.6/$basearch/
gpgcheck=0
enabled=1
.repo files for each release can also be found in the repository itself15. Remember that odd-numbered minor release
versions (e.g. 2.5) are development versions and are unsuitable for production deployment.
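The two .repo files above differ only in the section name, the name= value, and the release-series component of baseurl. The following sketch writes either one. It assumes the URL layout shown above, and the helper name write_enterprise_repo is hypothetical.

```shell
# Hypothetical helper: write a yum .repo file for a MongoDB Enterprise
# release series. series: "stable" for the latest stable release, or a
# series such as "2.6". dest: path of the file to write.
write_enterprise_repo() {
  series=$1
  dest=$2
  if [ "$series" = "stable" ]; then
    id="mongodb-enterprise"; name="MongoDB Enterprise Repository"
  else
    id="mongodb-enterprise-$series"; name="MongoDB Enterprise $series Repository"
  fi
  # Single quotes keep $releasever and $basearch literal; yum expands them.
  {
    printf '[%s]\n' "$id"
    printf 'name=%s\n' "$name"
    printf 'baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/%s/$basearch/\n' "$series"
    printf 'gpgcheck=0\n'
    printf 'enabled=1\n'
  } > "$dest"
}
```

For example: write_enterprise_repo 2.6 /tmp/mongodb-enterprise-2.6.repo && sudo cp /tmp/mongodb-enterprise-2.6.repo /etc/yum.repos.d/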
15 https://repo.mongodb.com/yum/redhat/
Step 2: Install the MongoDB Enterprise packages and associated tools. You can install either the latest stable
version of MongoDB Enterprise or a specific version of MongoDB Enterprise.
To install the latest stable version of MongoDB Enterprise, issue the following command:
sudo yum install -y mongodb-enterprise
Pin a specific version of MongoDB Enterprise. Although you can specify any available version of MongoDB
Enterprise, yum will upgrade the packages when a newer version becomes available. To prevent unintended upgrades,
pin the package. To pin a package, add the following exclude directive to your /etc/yum.conf file:
exclude=mongodb-enterprise,mongodb-enterprise-server,mongodb-enterprise-shell,mongodb-enterprise-mong
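The exclude directive is a single comma-separated list of package names. The following sketch builds it from the names in the Packages section above. The helper name yum_exclude_line is hypothetical.

```shell
# Hypothetical helper: print a yum "exclude=" directive listing the given
# packages, separated by commas. IFS is changed only inside the subshell.
yum_exclude_line() {
  ( IFS=,; printf 'exclude=%s\n' "$*" )
}
```

For example, yum_exclude_line mongodb-enterprise mongodb-enterprise-server mongodb-enterprise-shell mongodb-enterprise-mongos mongodb-enterprise-tools prints the directive. Add the printed line to the [main] section of /etc/yum.conf. Appending it with sudo tee -a /etc/yum.conf is only correct if [main] is the last section in that file.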
Previous versions of MongoDB packages use different naming conventions. See the 2.4 version of documentation for
more information16.
Step 4: When the install completes, you can run MongoDB.
Run MongoDB Enterprise
Important: You must configure SELinux to allow MongoDB to start on Red Hat Linux-based systems (Red Hat
Enterprise Linux or CentOS Linux). Administrators have three options:
Enable access to the relevant ports (e.g. 27017) for SELinux. See Default MongoDB Port (page 408) for more
information on MongoDB's default ports. For default settings, this can be accomplished by running
semanage port -a -t mongod_port_t -p tcp 27017
should be changed to
SELINUX=permissive
All three options require root privileges. The latter two options each require a system reboot and may have larger
implications for your deployment.
You may alternatively choose not to install the SELinux packages when you are installing your Linux operating system,
or choose to remove the relevant packages. This option is the most invasive and is not recommended.
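For the permissive-mode option, you change the SELINUX= line in the SELinux configuration file to SELINUX=permissive. The following is a minimal sketch. It assumes that the configuration file is /etc/selinux/config (the usual location on Red Hat Enterprise Linux and CentOS), and the helper name set_selinux_permissive is hypothetical.

```shell
# Hypothetical helper: set SELINUX=permissive in an SELinux config file.
# The default path /etc/selinux/config is an assumption; editing it requires
# root, and the change takes effect only after a reboot.
set_selinux_permissive() {
  conf=${1:-/etc/selinux/config}
  [ -f "$conf" ] || return 1
  tmp=$(mktemp) || return 1
  if grep -q '^SELINUX=' "$conf"; then
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    sed 's/^SELINUX=.*/SELINUX=permissive/' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  else
    # No SELINUX= line yet: keep the existing content and append one.
    { cat "$conf"; printf 'SELINUX=permissive\n'; } > "$tmp" || { rm -f "$tmp"; return 1; }
  fi
  # Write back over the original file (same inode) so ownership and labels are kept.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

Run the function as root, for example from a script started with sudo, and then reboot as described above.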
16 http://docs.mongodb.org/v2.4/tutorial/install-mongodb-on-linux
The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb
by default, and runs using the mongod user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. You can start the mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the
/var/log/mongodb/mongod.log file.
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB Enterprise on Ubuntu
Overview
Use this tutorial to install MongoDB Enterprise on Ubuntu Linux systems from .deb packages.
Packages
MongoDB provides packages of the officially supported MongoDB Enterprise builds in its own repository. This
repository provides the MongoDB Enterprise distribution in the following packages:
mongodb-enterprise
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-enterprise-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-enterprise-mongos
This package contains the mongos daemon.
mongodb-enterprise-shell
This package contains the mongo shell.
mongodb-enterprise-tools
This package contains the following MongoDB tools: bsondump, mongodump, mongoexport,
mongofiles, mongoimport, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts
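The mongodb-enterprise package includes various control scripts, including the init script
/etc/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.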
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script.
Considerations
MongoDB only provides Enterprise packages for Ubuntu 12.04 LTS (Precise Pangolin) and 14.04 LTS (Trusty Tahr).
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
Install MongoDB Enterprise
Step 1: Import the public key used by the package management system. The Ubuntu package management tools
(i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with
GPG keys. Issue the following command to import the MongoDB public GPG Key17 :
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
If you'd like to install MongoDB Enterprise packages from a particular release series (page 887), such as 2.4 or 2.6,
you can specify the release series in the repository configuration. For example, to restrict your system to the 2.6 release
series, add the following repository:
echo "deb http://repo.mongodb.com/apt/ubuntu "$(lsb_release -sc)"/mongodb-enterprise/2.6 multiverse"
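The repository line follows the pattern deb <repository URL> <codename>/mongodb-enterprise/<series> multiverse. The following sketch prints it for a given release series and Ubuntu codename. It assumes the repository URL http://repo.mongodb.com/apt/ubuntu. The helper name enterprise_apt_source and the list-file name in the usage line are hypothetical.

```shell
# Hypothetical helper: print an apt source line for a MongoDB Enterprise
# release series. codename: Ubuntu release codename, e.g. the output of
# "lsb_release -sc".
enterprise_apt_source() {
  series=$1
  codename=$2
  printf 'deb http://repo.mongodb.com/apt/ubuntu %s/mongodb-enterprise/%s multiverse\n' "$codename" "$series"
}
```

For example:
enterprise_apt_source 2.6 "$(lsb_release -sc)" | sudo tee /etc/apt/sources.list.d/mongodb-enterprise.list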
17 http://docs.mongodb.org/10gen-gpg-key.asc
Step 3: Reload local package database. Issue the following command to reload the local package database:
sudo apt-get update
Step 4: Install the MongoDB Enterprise packages. When you install the packages, you choose whether to install
the current release or a previous one. This step provides instructions for both.
To install the latest stable version of MongoDB Enterprise, issue the following command:
sudo apt-get install mongodb-enterprise
To install a specific release of MongoDB Enterprise, specify each component package individually and append the
version number to the package name, as in the following example that installs the 2.6.1 release of MongoDB Enterprise:
You can specify any available version of MongoDB Enterprise. However, apt-get will upgrade the packages when
a newer version becomes available. To prevent unintended upgrades, pin the package. To pin the version of MongoDB
Enterprise at the currently installed version, issue the following command sequence:
echo
echo
echo
echo
echo
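A hold placed with dpkg --set-selections stays in effect until you change the selection state back. The following sketch performs the reverse operation by printing selection lines with a chosen state; the install state releases a hold. It assumes the mongodb-enterprise package names from the Packages section, and the helper name selection_lines is hypothetical.

```shell
# Hypothetical helper: print "<package> <state>" lines for
# dpkg --set-selections, e.g. state "install" to release a "hold".
selection_lines() {
  state=$1
  shift
  for pkg in "$@"; do
    printf '%s %s\n' "$pkg" "$state"
  done
}
```

For example, to let apt-get upgrade the packages again:
selection_lines install mongodb-enterprise mongodb-enterprise-server mongodb-enterprise-shell mongodb-enterprise-mongos mongodb-enterprise-tools | sudo dpkg --set-selections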
Previous versions of MongoDB Enterprise packages use different naming conventions. See the 2.4 version of documentation18 for more information.
Run MongoDB Enterprise
The MongoDB instance stores its data files in /var/lib/mongodb and its log files in /var/log/mongodb
by default, and runs using the mongodb user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongodb and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. Issue the following command to start mongod:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. Verify that the mongod process has started successfully
by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB Enterprise on Debian
Overview
Use this tutorial to install MongoDB Enterprise on Debian Linux systems from .deb packages.
Packages
MongoDB provides packages of the officially supported MongoDB Enterprise builds in its own repository. This
repository provides the MongoDB Enterprise distribution in the following packages:
mongodb-enterprise
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-enterprise-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-enterprise-mongos
This package contains the mongos daemon.
mongodb-enterprise-shell
This package contains the mongo shell.
mongodb-enterprise-tools
This package contains the following MongoDB tools: bsondump, mongodump, mongoexport,
mongofiles, mongoimport, mongooplog, mongoperf, mongorestore, mongostat, and
mongotop.
Control Scripts
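The mongodb-enterprise package includes various control scripts, including the init script
/etc/init.d/mongod. These scripts are used to stop, start, and restart daemon processes.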
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script.
Considerations
Changed in version 2.6: The package structure and names have changed as of version 2.6. For instructions on installation of an older release, please refer to the documentation for the appropriate version.
MongoDB only provides Enterprise packages for 64-bit versions of Debian Wheezy.
Install MongoDB Enterprise
Step 1: Import the public key used by the package management system. Issue the following command to add
the MongoDB public GPG Key19 to the system key ring.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
If you'd like to install MongoDB Enterprise packages from a particular release series (page 887), such as 2.6, you can
specify the release series in the repository configuration. For example, to restrict your system to the 2.6 release series,
add the following repository:
Step 3: Reload local package database. Issue the following command to reload the local package database:
sudo apt-get update
Step 4: Install the MongoDB Enterprise packages. When you install the packages, you choose whether to install
the current release or a previous one. This step provides instructions for both.
To install the latest stable version of MongoDB Enterprise, issue the following command:
sudo apt-get install mongodb-enterprise
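To install a specific release of MongoDB Enterprise, specify each component package individually and append the
version number to the package name, as in the following example that installs the 2.6.1 release of MongoDB Enterprise:
sudo apt-get install mongodb-enterprise=2.6.1 mongodb-enterprise-server=2.6.1 mongodb-enterprise-shell=2.6.1 mongodb-enterprise-mongos=2.6.1 mongodb-enterprise-tools=2.6.1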
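You can specify any available version of MongoDB Enterprise. However, apt-get will upgrade the packages when
a newer version becomes available. To prevent unintended upgrades, pin the packages. To pin the version of MongoDB
Enterprise at the currently installed version, issue the following command sequence:
echo "mongodb-enterprise hold" | sudo dpkg --set-selections
echo "mongodb-enterprise-server hold" | sudo dpkg --set-selections
echo "mongodb-enterprise-shell hold" | sudo dpkg --set-selections
echo "mongodb-enterprise-mongos hold" | sudo dpkg --set-selections
echo "mongodb-enterprise-tools hold" | sudo dpkg --set-selections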
19 http://docs.mongodb.org/10gen-gpg-key.asc
Run MongoDB Enterprise
The MongoDB instance stores its data files in /var/lib/mongodb and its log files in /var/log/mongodb
by default, and runs using the mongodb user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongodb and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. Issue the following command to start mongod:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. Verify that the mongod process has started successfully
by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
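The check in Step 2 can also be scripted. The following is a minimal JavaScript sketch that runs under Node.js or any other JavaScript engine. The helper name listeningPort is hypothetical, and the sketch assumes only that the log contains the line quoted above. A log that spans several restarts can contain that line more than once, so the sketch reports the port from the last occurrence:

```javascript
// Hypothetical helper: return the port from the most recent
// "[initandlisten] waiting for connections on port <port>" line in the
// given mongod.log text, or null if no such line has been logged yet.
function listeningPort(logText) {
  var pattern = /\[initandlisten\] waiting for connections on port (\d+)/g;
  var match;
  var port = null;
  while ((match = pattern.exec(logText)) !== null) {
    port = Number(match[1]); // keep the latest startup's port
  }
  return port;
}

// Example under Node.js (requires read permission on the log file):
//   var fs = require("fs");
//   listeningPort(fs.readFileSync("/var/log/mongodb/mongod.log", "utf8"));
```

A null result means mongod has not finished starting, or has not started at all. In that case, check the rest of the log for errors.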
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB Enterprise on SUSE
Overview
Use this tutorial to install MongoDB Enterprise on SUSE Linux. MongoDB Enterprise is available on select platforms
and contains support for several features related to security and monitoring.
Packages
MongoDB provides packages of the officially supported MongoDB Enterprise builds in its own repository. This
repository provides the MongoDB Enterprise distribution in the following packages:
mongodb-enterprise
This package is a metapackage that will automatically install the four component packages listed below.
mongodb-enterprise-server
This package contains the mongod daemon and associated configuration and init scripts.
mongodb-enterprise-mongos
This package contains the mongos daemon.
mongodb-enterprise-shell
This package contains the mongo shell.
mongodb-enterprise-tools
This package contains the following MongoDB tools: bsondump, mongodump,
mongoexport, mongofiles, mongoimport, mongooplog, mongoperf, mongorestore,
mongostat, and mongotop.
Control Scripts
The mongodb-enterprise-server package includes various control scripts, including the init script.
The package configures MongoDB using the /etc/mongod.conf file in conjunction with the control scripts. See
the Configuration File reference for documentation of settings available in the configuration file.
As of version 3.0.1, there are no control scripts for mongos. The mongos process is used only in sharding (page 647).
You can use the mongod init script to derive your own mongos control script.
Considerations
Note: SUSE Linux Enterprise Server 11 and potentially other versions of SLES and other SUSE distributions ship
with virtual memory address space limited to 8GB by default. This must be adjusted in order to prevent virtual memory
allocation failures as the database grows.
The SLES packages for MongoDB adjust these limits in the default scripts, but you will need to make this change
manually if you are using custom scripts and/or the tarball release rather than the SLES packages.
Step 1: Configure the package management system (zypper). Add the repository so that you can install MongoDB using zypper.
Use the following command to specify the latest stable release of MongoDB.
If you'd like to install MongoDB packages from a previous release series (page 887), such as 2.6, you can specify the
release series in the repository configuration. For example, to restrict your system to the 2.6 release series, use the
following command:
Step 2: Install the MongoDB packages and associated tools. When you install the packages, you choose whether
to install the current release or a previous one. This step provides the commands for both.
To install the latest stable version of MongoDB, issue the following command:
To install a specific release of MongoDB, specify each component package individually and append the version number
to the package name, as in the following example:
You can specify any available version of MongoDB. However, zypper will upgrade the packages when a newer
version becomes available. To prevent unintended upgrades, pin the packages by running the following command:
Previous versions of MongoDB packages use a different repository location. Refer to the version of the documentation
appropriate for your MongoDB version.
Run MongoDB Enterprise
The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb
by default, and runs using the mongod user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
Step 1: Start MongoDB. You can start the mongod process by issuing the following command:
sudo service mongod start
Step 2: Verify that MongoDB has started successfully. You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
Step 3: Stop MongoDB. As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop
Step 4: Restart MongoDB. You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the
/var/log/mongodb/mongod.log file.
Step 5: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB Enterprise on Amazon Linux AMI
Overview
Use this tutorial to install MongoDB Enterprise on Amazon Linux AMI. MongoDB Enterprise is available on select
platforms and contains support for several features related to security and monitoring.
Prerequisites
To use MongoDB Enterprise on Amazon Linux AMI, you must install several prerequisite packages:
net-snmp
net-snmp-libs
openssl
net-snmp-utils
cyrus-sasl
cyrus-sasl-lib
cyrus-sasl-devel
cyrus-sasl-gssapi
To install these packages, you can issue the following command:
sudo yum install openssl net-snmp net-snmp-libs net-snmp-utils cyrus-sasl cyrus-sasl-lib cyrus-sasl-devel cyrus-sasl-gssapi
Note: The Enterprise packages include an example SNMP configuration file named mongod.conf. This file is not
a MongoDB configuration file.
Step 1: Download and install the MongoDB Enterprise packages. After you have installed the required prerequisite packages, download and install the MongoDB Enterprise packages from http://www.mongodb.com/thankyou/download/mongodb-enterprise. The MongoDB binaries are located in the bin/ directory of the archive. To
download and install, use the following sequence of commands.
curl -O http://downloads.10gen.com/linux/mongodb-linux-x86_64-enterprise-amzn64-3.0.1.tgz
tar -zxvf mongodb-linux-x86_64-enterprise-amzn64-3.0.1.tgz
cp -R -n mongodb-linux-x86_64-enterprise-amzn64-3.0.1/ mongodb
Step 2: Ensure the location of the MongoDB binaries is included in the PATH variable. Once you have copied
the MongoDB binaries to their target location, ensure that the location is included in your PATH variable. If it is not,
either include it or create symbolic links from the binaries to a directory that is included.
Run MongoDB Enterprise
The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb
by default, and runs using the mongod user account. You can specify alternate log and data file directories in
/etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.
If you change the user that runs the MongoDB process, you must modify the access control rights to the
/var/lib/mongo and /var/log/mongodb directories to give this user access to these directories.
Step 1: Create the data directory. Before you start MongoDB for the first time, create the directory to which
the mongod process will write data. By default, the mongod process uses the /data/db directory. If you create a
directory other than this one, you must specify that directory in the dbpath option when starting the mongod process
later in this procedure.
The following example command creates the default /data/db directory:
mkdir -p /data/db
Step 2: Set permissions for the data directory. Before running mongod for the first time, ensure that the user
account running mongod has read and write permissions for the directory.
Step 3: Run MongoDB. To run MongoDB, run the mongod process at the system prompt. If necessary, specify the
path of the mongod or the data directory. See the following examples.
Run without specifying paths If your system PATH variable includes the location of the mongod binary and if you
use the default data directory (i.e., /data/db), simply enter mongod at the system prompt:
mongod
Specify the path of the mongod If your PATH does not include the location of the mongod binary, enter the full
path to the mongod binary at the system prompt:
<path to binary>/mongod
Specify the path of the data directory If you do not use the default data directory (i.e., /data/db), specify the
path to the data directory using the --dbpath option:
mongod --dbpath <path to data directory>
Step 4: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Install MongoDB Enterprise on Windows
New in version 2.6.
Overview
Use this tutorial to install MongoDB Enterprise on Windows systems. MongoDB Enterprise is available on select
platforms and contains support for several features related to security and monitoring.
Prerequisites
MongoDB Enterprise Server for Windows requires Windows Server 2008 R2 or later. The .msi installer includes all
other software dependencies and will automatically upgrade any older version of MongoDB installed using an .msi
file.
Get MongoDB Enterprise
To find which version of Windows you are running, enter the following commands in the Command Prompt or Powershell:
wmic os get caption
wmic os get osarchitecture
Interactive Installation
Step 1: Install MongoDB Enterprise for Windows. In Windows Explorer, locate the downloaded MongoDB .msi
file, which typically is located in the default Downloads folder. Double-click the .msi file. A set of screens will
appear to guide you through the installation process.
You may specify an installation directory if you choose the Custom installation option. These instructions assume
that you have installed MongoDB to C:\mongodb.
MongoDB is self-contained and does not have any other system dependencies. You can install and run MongoDB from any
folder you choose (e.g. D:\test\mongodb).
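Unattended Installation
You may install MongoDB unattended on Windows from the command line using msiexec.exe.
Step 1: Install MongoDB Enterprise for Windows. Change to the directory containing the .msi installation
binary of your choice and invoke:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="<installation directory>"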
By default, this method installs the following MongoDB binaries: mongod.exe, mongo.exe, mongodump.exe,
mongorestore.exe, mongoimport.exe, mongoexport.exe, mongostat.exe, and mongotop.exe.
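You can specify the installation location for the executable by modifying the <installation directory>
value. To install specific subsets of the binaries, you may specify an ADDLOCAL argument:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="<installation directory>" ADDLOCAL=<binary set(s)>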
20 http://www.mongodb.com/products/mongodb-enterprise
The <binary set(s)> value is a comma-separated list including one or more of the following:
Server - includes mongod.exe
Client - includes mongo.exe
MonitoringTools - includes mongostat.exe and mongotop.exe
ImportExportTools - includes mongodump.exe, mongorestore.exe, mongoexport.exe, and
mongoimport.exe
MiscellaneousTools - includes bsondump.exe, mongofiles.exe, mongooplog.exe, and
mongoperf.exe
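For instance, to install only the tools (every binary set except Server and Client) to C:\mongodb, invoke:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="C:\mongodb" ADDLOCAL=MonitoringTools,ImportExportTools,MiscellaneousTools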
You may also specify ADDLOCAL=ALL to install the complete set of binaries, as in the following:
msiexec.exe /q /i mongodb-<version>-signed.msi INSTALLLOCATION="C:\mongodb" ADDLOCAL=ALL
Warning: Do not make mongod.exe visible on public networks without running in Secure Mode with the
auth setting. MongoDB is designed to be run in trusted environments, and the database does not enable Secure
Mode by default.
Step 1: Set up the MongoDB environment. MongoDB requires a data directory to store all data. MongoDB's
default data directory path is \data\db. Create this folder using the following commands from a Command Prompt:
md \data\db
You can specify an alternate path for data files using the --dbpath option to mongod.exe, for example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
If your path includes spaces, enclose the entire path in double quotes, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"
This starts the main MongoDB database process. The waiting for connections message in the console
output indicates that the mongod.exe process is running successfully.
Depending on the security level of your system, Windows may pop up a Security Alert dialog box about blocking
some features of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select
Private Networks, such as my home or work network, and click Allow access. For additional
information on security and MongoDB, please see the Security Documentation (page 307).
C:\mongodb\bin\mongo.exe
If you want to develop applications using .NET, see the documentation of C# and MongoDB21 for more information.
Step 4: Begin using MongoDB. To begin using MongoDB, see Getting Started with MongoDB (page 48). Also
consider the Production Notes (page 198) document before deploying MongoDB in a production environment.
Later, to stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Manually Create a Windows Service for MongoDB Enterprise
You can set up the MongoDB server as a Windows Service that starts automatically at boot time.
The following procedure assumes you have installed MongoDB using the .msi installer with the path
C:\mongodb\.
If you have installed in an alternative directory, you will need to adjust the paths as appropriate.
Step 1: Open an Administrator command prompt. Press Win + R, then type cmd, then press Ctrl + Shift
+ Enter.
Execute the remaining steps from the Administrator command prompt.
Step 2: Create directories. Create directories for your database and log files:
mkdir c:\data\db
mkdir c:\data\log
Step 3: Create a configuration file. This file can include any of the configuration options for mongod, but must
include a valid setting for logpath. The following commands create a configuration file that specifies both the
logpath and the dbpath settings:
echo logpath=c:\data\log\mongod.log> "C:\mongodb\mongod.cfg"
echo dbpath=c:\data\db>> "C:\mongodb\mongod.cfg"
sc.exe requires a space between = and the configuration values (e.g. binPath= ), and a \ to escape double quotes.
If successfully created, the following log message will display:
[SC] CreateService SUCCESS
Step 6: Stop or remove the MongoDB service as needed. To stop the MongoDB service, use the following command:
net stop MongoDB
To remove the MongoDB service, first stop the service and then run the following command:
sc.exe delete MongoDB
Replace 2.2 with the appropriate release number to download the public key. Keys are available for all MongoDB
releases beginning with 2.2.
Procedures
Use PGP/GPG
Step 1: Download the MongoDB installation file. Download the binaries from
https://www.mongodb.org/downloads based on your environment.
For example, to download the 2.6.0 release for OS X through the shell, type this command:
curl -LO http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.0.tgz
If you have not downloaded and imported the key file, enter these
Download and import the key file, as described above, if you receive a message like this one:
gpg: Signature made Thu Mar 6 15:11:28 2014 EST using RSA key ID AAB2461C
gpg: Can't check signature: public key not found
gpg will return the following message if the package is properly signed, but you do not currently trust the signing
key in your local trustdb.
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: DFFA 3DCF 326E 302C 4787 673A 01C4 E7FA AAB2 461C
Use SHA
MongoDB provides checksums using both the SHA-1 and SHA-256 hash functions. You can use either.
Step 1: Download the MongoDB installation file. Download the binaries from
https://www.mongodb.org/downloads based on your environment.
For example, to download the 2.6.0 release for OS X through the shell, type this command:
curl -LO http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.0.tgz
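Step 3: Use the SHA-1 or SHA-256 checksum to verify the MongoDB package file. Compute the checksum of the package
file. By default, shasum computes a SHA-1 checksum:
shasum mongodb-osx-x86_64-2.6.3.tgz
which generates output of this form:
fe511ee40428edda3a507f70d2b91d16b0483674 mongodb-osx-x86_64-2.6.3.tgz
To compute a SHA-256 checksum instead, pass -a 256:
shasum -a 256 mongodb-osx-x86_64-2.6.3.tgz
Compare the computed value with the published checksum for the same package file. The two values must match exactly.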
Use MD5
Step 1: Download the MongoDB installation file. Download the binaries from
https://www.mongodb.org/downloads based on your environment.
For example, to download the 2.6.0 release for OS X through the shell, type this command:
curl -LO http://downloads.mongodb.org/osx/mongodb-osx-x86_64-2.6.0.tgz
Step 3: Verify the checksum values for the MongoDB package file (Linux). Compute the checksum of the package file:
md5sum mongodb-linux-x86_64-2.6.0.tgz
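If shasum, md5, or md5sum is not available, you can perform the same comparison in JavaScript. The following is a minimal Node.js sketch that uses only the built-in crypto and fs modules. The helper names are hypothetical. The sketch assumes that the published checksum file uses the common "<hex digest> <file name>" layout, and it compares only the first field:

```javascript
// Node.js sketch: verify a downloaded MongoDB package against a published
// checksum. Uses only Node's built-in crypto and fs modules.
var crypto = require("crypto");
var fs = require("fs");

// Hex digest of a Buffer; algorithm is "sha1", "sha256", or "md5".
function digestHex(buffer, algorithm) {
  return crypto.createHash(algorithm).update(buffer).digest("hex");
}

// Assumes a checksum line of the form "<hex digest> <file name>";
// only the first whitespace-separated field is used.
function expectedDigest(checksumLine) {
  return checksumLine.trim().split(/\s+/)[0].toLowerCase();
}

// True if the file's digest equals the published value.
function verifyFile(path, checksumLine, algorithm) {
  var actual = digestHex(fs.readFileSync(path), algorithm);
  return actual === expectedDigest(checksumLine);
}

// Usage (the checksum file name is a hypothetical example):
//   var line = fs.readFileSync("mongodb-osx-x86_64-2.6.3.tgz.sha256", "utf8");
//   verifyFile("mongodb-osx-x86_64-2.6.3.tgz", line, "sha256");
```

The sketch reads the whole package into memory, which is acceptable for files of this size. For much larger files, streaming the data into the hash would be the more economical choice.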
From a system prompt, start mongo by issuing the mongo command, as follows:
mongo
By default, mongo looks for a database server listening on port 27017 on the localhost interface. To connect to
a server on a different port or interface, use the --port and --host options.
22 http://api.mongodb.org/js
Select a Database
After starting the mongo shell, your session will use the test database by default. At any time, issue the following
operation at the mongo shell to report the name of the current database:
db
1. From the mongo shell, display the list of databases, with the following operation:
show dbs
2. Switch to a new database named mydb, with the following operation:
use mydb
3. Confirm that your session has the mydb database as context, by checking the value of the db object, which
returns the name of the current database, as follows:
db
At this point, if you issue the show dbs operation again, it will not include the mydb database. MongoDB
will not permanently create a database until you insert data into that database. The Create a Collection and
Insert Documents (page 49) section describes the process for inserting data.
New in version 2.4: show databases also returns a list of databases.
Display mongo Help
At any point, you can access help for the mongo shell using the following operation:
help
Furthermore, you can append the .help() method to some JavaScript methods, any cursor object, as well as the db
and db.collection objects to return additional help information.
Create a Collection and Insert Documents
In this section, you insert documents into a new collection named testData within the new database named mydb.
MongoDB will create a collection implicitly upon its first use. You do not need to create a collection before inserting
data. Furthermore, because MongoDB uses dynamic schemas (page 724), you also need not specify the structure of
your documents before inserting them into the collection.
1. From the mongo shell, confirm you are in the mydb database by issuing the following:
db
2. If mongo does not return mydb for the previous operation, set the context to the mydb database, with the
following operation:
use mydb
3. Create two documents named j and k by using the following sequence of JavaScript operations:
j = { name : "mongo" }
k = { x : 3 }
4. Insert the j and k documents into the testData collection with the following sequence of operations:
db.testData.insert( j )
db.testData.insert( k )
When you insert the first document, the mongod will create both the mydb database and the testData
collection.
5. Confirm that the testData collection exists. Issue the following operation:
show collections
The mongo shell will return the list of the collections in the current (i.e. mydb) database. At this point, the only
collection with user data is testData.
6. Confirm that the documents exist in the testData collection by issuing a query on the collection using the
find() method:
db.testData.find()
This operation returns the following results. The ObjectId (page 174) values will be unique:
{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
{ "_id" : ObjectId("4c2209fef3924d31102bd84b"), "x" : 3 }
All MongoDB documents must have an _id field with a unique value. These operations do not explicitly
specify a value for the _id field, so mongo creates a unique ObjectId (page 174) value for the field before
inserting it into the collection.
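ObjectId (page 174) values are not opaque. The first four bytes (the first eight hex digits) record the time the ObjectId was created, in seconds since the Unix epoch. The following illustrative JavaScript sketch decodes that field by hand. The function names are hypothetical. In the mongo shell, the built-in getTimestamp() method returns the same date:

```javascript
// Illustrative decoding of an ObjectId given as its 24-digit hex string.
// The first 8 hex digits (4 bytes) are a big-endian count of seconds
// since the Unix epoch.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// The same value as a JavaScript Date (which counts milliseconds).
function objectIdDate(hex) {
  return new Date(objectIdSeconds(hex) * 1000);
}

// Usage:
//   objectIdDate("4c2209f9f3924d31102bd84a")
// In the mongo shell, the built-in method returns the same date:
//   ObjectId("4c2209f9f3924d31102bd84a").getTimestamp()
```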
Insert Documents using a For Loop or a JavaScript Function
To perform the remaining procedures in this tutorial, first add more documents to your database using one or both of
the procedures described in Generate Test Data (page 52).
Working with the Cursor
When you query a collection, MongoDB returns a cursor object that contains the results of the query. The mongo
shell then iterates over the cursor to display the results. Rather than returning all results at once, the shell iterates over
the cursor 20 times to display the first 20 results and then waits for a request to iterate over the remaining results. In
the shell, enter it to iterate over the next set of results.
The procedures in this section show other ways to work with a cursor. For comprehensive documentation on cursors,
see crud-read-cursor.
Iterate over the Cursor with a Loop
Before using this procedure, add documents to a collection using one of the procedures in Generate Test Data
(page 52). You can name your database and collections anything you choose, but this procedure will assume the
database named test and a collection named testData.
1. In the MongoDB JavaScript shell, query the testData collection and assign the resulting cursor object to the
c variable:
var c = db.testData.find()
2. Print the full result set by using a while loop to iterate over the c variable:
while ( c.hasNext() ) printjson( c.next() )
The hasNext() function returns true if the cursor has documents. The next() method returns the next
document. The printjson() method renders the document in a JSON-like format.
The operation displays all documents:
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be6"), "x" : 1 }
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be7"), "x" : 2 }
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be8"), "x" : 3 }
...
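The following minimal sketch wraps the same traversal in reusable functions. It assumes the mongo shell's cursor methods hasNext(), next(), and forEach() and its printjson() helper. The function names printWithWhile and printWithForEach are hypothetical. Both functions visit the documents one at a time rather than converting the cursor to an array first:

```javascript
// Print each document with an explicit hasNext()/next() loop.
// Returns how many documents were printed.
function printWithWhile(cursor) {
  var printed = 0;
  while (cursor.hasNext()) {
    printjson(cursor.next());
    printed++;
  }
  return printed;
}

// The same traversal using the cursor's forEach() method.
function printWithForEach(cursor) {
  var printed = 0;
  cursor.forEach(function (doc) {
    printjson(doc);
    printed++;
  });
  return printed;
}

// Usage in the mongo shell (each call needs a fresh cursor):
//   printWithWhile(db.testData.find())
//   printWithForEach(db.testData.find())
```

Either form exhausts the cursor. To iterate again, run the query again.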
The following procedure lets you manipulate a cursor object as if it were an array:
1. In the mongo shell, query the testData collection and assign the resulting cursor object to the c variable:
var c = db.testData.find()
2. To find the document at the array index 4, use the following operation:
printjson( c [ 4 ] )
When you access documents in a cursor using the array index notation, mongo first calls the
cursor.toArray() method and loads into RAM all documents returned by the cursor. The index is then
applied to the resulting array. This operation iterates the cursor completely and exhausts the cursor.
For very large result sets, mongo may run out of available memory.
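When you only need one document at a known position, you can avoid the memory cost of the array notation by asking the server to skip ahead. The following is a minimal sketch that assumes the mongo shell's cursor methods skip(), limit(), hasNext(), and next(). The helper name documentAt is hypothetical:

```javascript
// Hypothetical helper: return the document at zero-based position n of a
// query's results, or null if there are fewer than n + 1 matches.
// skip() and limit() are applied on the server, so only one document is
// sent to the shell instead of the entire result set.
function documentAt(collection, query, n) {
  var cursor = collection.find(query).skip(n).limit(1);
  return cursor.hasNext() ? cursor.next() : null;
}

// Usage in the mongo shell, comparable to printjson( c [ 4 ] ):
//   printjson(documentAt(db.testData, {}, 4))
```

The server still has to step over the skipped documents, so large skip values are not free. Without a sort(), positions follow whatever order the server returns and are not guaranteed to be stable.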
For more information on the cursor, see crud-read-cursor.
Query for Specific Documents
MongoDB has a rich query system that allows you to select and filter the documents in a collection along specific
fields and values. See Query Documents (page 95) and Read Operations (page 58) for a full account of queries in
MongoDB.
In this procedure, you query for specific documents in the testData collection by passing a query document as a
parameter to the find() method. A query document specifies the criteria the query must match to return a document.
In the mongo shell, query for all documents where the x field has a value of 18 by passing the { x : 18 } query
document as a parameter to the find() method:
db.testData.find( { x : 18 } )
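A query document is an ordinary JavaScript object. Besides plain equality, a field can hold a document of query operators such as $gt and $lt. The following is a minimal sketch. The helper name betweenQuery is hypothetical. It builds a range query for any field:

```javascript
// Hypothetical helper: build a query document that matches documents
// whose `field` value is greater than `low` and less than `high`.
function betweenQuery(field, low, high) {
  var query = {};
  query[field] = { $gt: low, $lt: high };
  return query;
}

// In the mongo shell, these two calls send the same query:
//   db.testData.find(betweenQuery("x", 15, 20))
//   db.testData.find({ x: { $gt: 15, $lt: 20 } })
```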
With the findOne() method you can return a single document from a MongoDB collection. The findOne()
method takes the same parameters as find(), but returns a document rather than a cursor.
To retrieve one document from the testData collection, issue the following command:
db.testData.findOne()
For more information on querying for documents, see the Query Documents (page 95) and Read Operations (page 58)
documentation.
Limit the Number of Documents in the Result Set
To increase performance, you can constrain the size of the result by limiting the amount of data your application must
receive over the network.
To specify the maximum number of documents in the result set, call the limit() method on a cursor, as in the
following command:
db.testData.find().limit(3)
MongoDB will return the following result, with different ObjectId (page 174) values:
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be6"), "x" : 1 }
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be7"), "x" : 2 }
{ "_id" : ObjectId("51a7dc7b2cacf40b79990be8"), "x" : 3 }
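limit() on its own returns the first documents the server reaches, in no guaranteed order. To get a meaningful subset, such as the largest values of a field, combine limit() with sort(). The following is a minimal sketch that assumes the mongo shell's cursor methods sort(), limit(), and toArray(). The helper name topN is hypothetical:

```javascript
// Hypothetical helper: return the n documents with the largest values
// of `field`, as an array.
function topN(collection, field, n) {
  var sortSpec = {};
  sortSpec[field] = -1; // -1 sorts descending; 1 would sort ascending
  return collection.find().sort(sortSpec).limit(n).toArray();
}

// Usage in the mongo shell:
//   topN(db.testData, "x", 3)
```

toArray() is acceptable here because limit() bounds the result to n documents.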
From the mongo shell, use the following for loop to insert 25 documents into the testData collection. If the
testData collection does not exist, MongoDB will implicitly create the collection.
for (var i = 1; i <= 25; i++) {
db.testData.insert( { x : i } )
}
Query the collection with the find() method:
db.testData.find()
The mongo shell displays the first 20 documents in the collection. Your ObjectId (page 174) values will be different:
{ "_id" : ObjectId("53d7be30242b692a1138ac7d"), "x" : 1 }
{ "_id" : ObjectId("53d7be30242b692a1138ac7e"), "x" : 2 }
{ "_id" : ObjectId("53d7be30242b692a1138ac7f"), "x" : 3 }
{ "_id" : ObjectId("53d7be30242b692a1138ac80"), "x" : 4 }
{ "_id" : ObjectId("53d7be30242b692a1138ac81"), "x" : 5 }
{ "_id" : ObjectId("53d7be30242b692a1138ac82"), "x" : 6 }
{ "_id" : ObjectId("53d7be30242b692a1138ac83"), "x" : 7 }
{ "_id" : ObjectId("53d7be30242b692a1138ac84"), "x" : 8 }
{ "_id" : ObjectId("53d7be30242b692a1138ac85"), "x" : 9 }
{ "_id" : ObjectId("53d7be30242b692a1138ac86"), "x" : 10 }
{ "_id" : ObjectId("53d7be30242b692a1138ac87"), "x" : 11 }
{ "_id" : ObjectId("53d7be30242b692a1138ac88"), "x" : 12 }
{ "_id" : ObjectId("53d7be30242b692a1138ac89"), "x" : 13 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8a"), "x" : 14 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8b"), "x" : 15 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8c"), "x" : 16 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8d"), "x" : 17 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8e"), "x" : 18 }
{ "_id" : ObjectId("53d7be30242b692a1138ac8f"), "x" : 19 }
{ "_id" : ObjectId("53d7be30242b692a1138ac90"), "x" : 20 }
Type "it" for more
The find() method returns a cursor. To iterate the cursor (page 108) and return more documents, type it in the
mongo shell. The shell will exhaust the cursor and return these documents:
{ "_id" : ObjectId("53d7be30242b692a1138ac91"), "x" : 21 }
{ "_id" : ObjectId("53d7be30242b692a1138ac92"), "x" : 22 }
{ "_id" : ObjectId("53d7be30242b692a1138ac93"), "x" : 23 }
{ "_id" : ObjectId("53d7be30242b692a1138ac94"), "x" : 24 }
{ "_id" : ObjectId("53d7be30242b692a1138ac95"), "x" : 25 }
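The for loop above makes one insert() call per document. In the mongo shell, insert() also accepts an array of documents, so you can build the same test data first and insert it in a single call. The following is a minimal sketch. The helper name makeTestDocs is hypothetical:

```javascript
// Hypothetical helper: build `count` test documents { x: 1 } ... { x: count },
// matching the documents the for loop above inserts one at a time.
function makeTestDocs(count) {
  var docs = [];
  for (var i = 1; i <= count; i++) {
    docs.push({ x: i });
  }
  return docs;
}

// In the mongo shell, insert all 25 documents with one insert() call:
//   db.testData.insert(makeTestDocs(25))
```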
The insertData() function takes three parameters: a database, a new or existing collection, and the number of
documents to create. The function creates documents with an x field set to an incremented integer, as in the following
example documents:
{ "_id" : ObjectId("51a4da9b292904caffcff6eb"), "x" : 0 }
{ "_id" : ObjectId("51a4da9b292904caffcff6ec"), "x" : 1 }
{ "_id" : ObjectId("51a4da9b292904caffcff6ed"), "x" : 2 }
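The listing of insertData() itself is not reproduced in this copy of the text. The following is a minimal sketch that matches the description above, but it is not necessarily the original code. It assumes the mongo shell's global db object and its getSiblingDB(), getCollection(), insert(), and count() methods. Returning the final document count is a choice made for this sketch:

```javascript
// Sketch of insertData(dbName, colName, num): insert `num` documents
// whose x field counts up from 0 into the collection dbName.colName,
// creating the database and collection implicitly if needed.
// Returns the collection's document count afterwards.
function insertData(dbName, colName, num) {
  var col = db.getSiblingDB(dbName).getCollection(colName);
  for (var i = 0; i < num; i++) {
    col.insert({ x: i });
  }
  return col.count();
}
```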
Store the function in your .mongorc.js file. The mongo shell loads and parses the .mongorc.js file on startup so your
function is available every time you start a session.
Example
Specify database name, collection name, and the number of documents to insert as arguments to insertData().
insertData("test", "testData", 400)
This operation inserts 400 documents into the testData collection in the test database. If the collection and
database do not exist, MongoDB creates them implicitly before inserting documents.
Additional Resources
Python utils to create random JSON data and import into mongoDB27
27 https://github.com/10gen-labs/ipsum
28 https://docs.mms.mongodb.com/tutorial/getting-started
CHAPTER 3
MongoDB provides rich semantics for reading and manipulating data. CRUD stands for create, read, update, and
delete. These terms are the foundation for all interactions with the database.
MongoDB CRUD Introduction (page 55) An introduction to the MongoDB data model as well as queries and data
manipulations.
MongoDB CRUD Concepts (page 58) The core documentation of query and data manipulation.
MongoDB CRUD Tutorials (page 91) Examples of basic query and data modification operations.
MongoDB CRUD Reference (page 127) Reference material for the query and data manipulation interfaces.
MongoDB stores all documents in collections. A collection is a group of related documents that share a common set of
indexes. Collections are analogous to tables in relational databases.
Data Modification
Data modification refers to operations that create, update, or delete data. In MongoDB, these operations modify the
data of a single collection. For the update and delete operations, you can specify the criteria to select the documents
to update or remove.
In the following diagram, the insert operation adds a new document to the users collection.
Distributed Queries (page 67) Describes how sharded clusters and replica sets affect the performance of read operations.
Read Operations Overview
Read operations, or queries, retrieve data stored in the database. In MongoDB, queries select documents from a single
collection.
Queries specify criteria, or conditions, that identify the documents that MongoDB returns to the clients. A query may
include a projection that specifies the fields from the matching documents to return. The projection limits the amount
of data that MongoDB returns to the client over the network.
Query Interface
For query operations, MongoDB provides a db.collection.find() method. The method accepts both the
query criteria and projections and returns a cursor (page 62) to the matching documents. You can optionally modify
the query to impose limits, skips, and sort orders.
The following diagram highlights the components of a MongoDB query operation:
Example
db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)
This query selects the documents in the users collection that match the condition age is greater than 18. To specify
the greater than condition, the query criteria use the $gt (greater than) query selection operator. The query returns
at most 5 matching documents (or, more precisely, a cursor to those documents). The matching documents return
with only the _id, name, and address fields. See Projections (page 60) for details.
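To make the semantics of this example concrete, the following plain JavaScript sketch reproduces its filter, limit, and projection steps on an in-memory array. It does not talk to MongoDB; the sample documents and the helper name findAdults are hypothetical.

```javascript
// In-memory illustration of
//   db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)
// The documents are hypothetical sample data; no MongoDB server is involved.
const users = [
  { _id: 1, name: "ana", age: 17, address: "1 Elm St", status: "A" },
  { _id: 2, name: "bo", age: 22, address: "2 Oak St", status: "B" },
  { _id: 3, name: "cy", age: 31, address: "3 Pine St", status: "A" },
];

// Hypothetical helper that applies the criteria, the limit, and the projection.
function findAdults(docs, limit) {
  return docs
    .filter((doc) => doc.age > 18) // criteria: { age: { $gt: 18 } }
    .slice(0, limit) // .limit(5): at most `limit` documents
    .map((doc) => ({ _id: doc._id, name: doc.name, address: doc.address })); // _id is implicit
}

console.log(findAdults(users, 5));
```

Note that the status and age fields never reach the caller: the projection is what keeps them off the network in the real query.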
See SQL to MongoDB Mapping Chart (page 130) for additional examples of MongoDB queries and the corresponding
SQL statements.
Query Behavior
Consider the following diagram of the query process that specifies query criteria and a sort modifier:
In the diagram, the query selects documents from the users collection. Using a query selection operator
to define the conditions for matching documents, the query selects documents that have age greater than (i.e. $gt)
18. Then the sort() modifier sorts the results by age in ascending order.
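The same filter-then-sort pipeline can be sketched in plain JavaScript: keep documents with age greater than 18, then sort ascending by age, the equivalent of sort( { age: 1 } ). The sample array is hypothetical, and the sketch only handles numeric ages; it ignores MongoDB's comparison order for mixed BSON types.

```javascript
// Hypothetical in-memory documents standing in for the users collection.
const users = [
  { _id: 1, name: "ana", age: 40 },
  { _id: 2, name: "bo", age: 15 },
  { _id: 3, name: "cy", age: 19 },
  { _id: 4, name: "di", age: 28 },
];

// Equivalent of db.users.find( { age: { $gt: 18 } } ).sort( { age: 1 } )
// for numeric ages only. filter() returns a new array, so sort() does not
// reorder the original users array.
const sorted = users
  .filter((doc) => doc.age > 18)
  .sort((a, b) => a.age - b.age); // ascending, like { age: 1 }

console.log(sorted.map((doc) => doc.age)); // [ 19, 28, 40 ]
```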
For additional examples of queries, see Query Documents (page 95).
Projections
Queries in MongoDB return all fields in all matching documents by default. To limit the amount of data that MongoDB
sends to applications, include a projection in the queries. By projecting results with a subset of fields, applications
reduce their network overhead and processing requirements.
Projections, which are the second argument to the find() method, may either specify a list of fields to return or list
fields to exclude in the result documents.
Important: Except for excluding the _id field in inclusive projections, you cannot mix exclusive and inclusive
projections.
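The inclusion/exclusion rule can be sketched as a small function over plain objects. applyProjection is a hypothetical helper, not a MongoDB API; it only understands top-level fields with the values 1 and 0, and it throws where MongoDB would reject a mixed projection.

```javascript
// Hypothetical sketch of the projection rules described above.
// projection values: 1 = include, 0 = exclude (top-level fields only).
function applyProjection(doc, projection) {
  const nonId = Object.entries(projection).filter(([field]) => field !== "_id");
  const includes = nonId.filter(([, v]) => v === 1).map(([field]) => field);
  const excludes = nonId.filter(([, v]) => v === 0).map(([field]) => field);
  if (includes.length > 0 && excludes.length > 0) {
    // Only _id may be excluded from an inclusive projection.
    throw new Error("cannot mix inclusion and exclusion (except _id)");
  }
  const keepId = projection._id !== 0; // _id is returned unless set to 0
  const inclusive = includes.length > 0 || (excludes.length === 0 && projection._id === 1);
  if (inclusive) {
    const out = {};
    if (keepId && "_id" in doc) out._id = doc._id;
    for (const field of includes) {
      if (field in doc) out[field] = doc[field];
    }
    return out;
  }
  // Exclusive mode: copy every field except the excluded ones.
  const out = { ...doc };
  for (const field of excludes) delete out[field];
  if (!keepId) delete out._id;
  return out;
}

console.log(applyProjection({ _id: 1, name: "sue", history: [] }, { history: 0 })); // { _id: 1, name: 'sue' }
```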
Consider the following diagram of the query process that specifies query criteria and a projection:
In the diagram, the query selects from the users collection. The criteria match the documents that have age equal
to 18. Then the projection specifies that only the name field should return in the matching documents.
Projection Examples
Exclude One Field From a Result Set
db.records.find( { "user_id": { $lt: 42 } }, { "history": 0 } )
This query selects documents in the records collection that match the condition { "user_id": { $lt: 42 } }, and
uses the projection { "history": 0 } to exclude the history field from the documents in the result set.
Return Two Fields and the _id Field
db.records.find( { "user_id": { $lt: 42 } }, { "name": 1, "email": 1 } )
This query selects documents in the records collection that match the query { "user_id": { $lt: 42 } } and uses
the projection { "name": 1, "email": 1 } to return just the _id field (implicitly included), the name field, and the
email field in the documents in the result set.
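Return Two Fields and Exclude _id
db.records.find( { "user_id": { $lt: 42 } }, { "_id": 0, "name": 1, "email": 1 } )
This query selects documents in the records collection that match the query { "user_id": { $lt: 42 } }, and only
returns the name and email fields in the documents in the result set.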
See Limit Fields to Return from a Query (page 106) for more examples of queries with projection statements.
To manually iterate the cursor to access the documents, see Iterate a Cursor in the mongo Shell (page 108).
Cursor Behaviors
Closure of Inactive Cursors By default, the server will automatically close the cursor after 10 minutes of inactivity
or once the client has exhausted the cursor. To override this behavior, you can specify the noTimeout flag in your query
using cursor.addOption(); however, you should then either close the cursor manually or exhaust the cursor. In the
mongo shell, you can set the noTimeout flag:
var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
See your driver documentation for information on setting the noTimeout flag. For the mongo shell, see
cursor.addOption() for a complete list of available cursor flags.
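As a mental model of the closure rule, the following sketch decides whether an idle server-side cursor would be closed. The 10-minute threshold comes from the text above; the function name shouldClose and the shape of the cursor object are hypothetical, and the real server checks for idle cursors periodically rather than on demand.

```javascript
// Hypothetical model of the default idle-cursor policy described above.
const IDLE_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes of inactivity

// cursor: { lastUsedMs, exhausted, noTimeout } (hypothetical shape)
function shouldClose(cursor, nowMs) {
  if (cursor.exhausted) return true; // the client has read every result
  if (cursor.noTimeout) return false; // stays open until closed or exhausted by the client
  return nowMs - cursor.lastUsedMs >= IDLE_TIMEOUT_MS;
}
```

The noTimeout branch is the reason the text asks you to close or exhaust such cursors yourself: nothing in this policy ever reclaims them otherwise.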
Cursor Isolation Because the cursor is not isolated during its lifetime, intervening write operations on a document
may result in a cursor that returns a document more than once if that document has changed. To handle this situation,
see the information on snapshot mode (page 734).
1 You can set DBQuery.shellBatchSize to change the number of documents the mongo shell displays per iteration from the default value of 20. See Executing Queries
(page 267) for more information.
Cursor Batches The MongoDB server returns the query results in batches. Batch size will not exceed the maximum
BSON document size. For most queries, the first batch returns 101 documents or just enough documents to exceed 1
megabyte. Subsequent batches are 4 megabytes. To override the default size of the batch, see batchSize() and
limit().
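The default batching rule can be turned into a small planning function. This is a simplified, hypothetical model: it takes document sizes in bytes and applies only the 101-document and 1 MB limits for the first batch and the 4 MB limit for later batches, ignoring batchSize(), limit(), sorts, and per-batch overhead.

```javascript
// Simplified model of default batch sizing (hypothetical; see assumptions above).
const ONE_MB = 1024 * 1024;
const FIRST_BATCH_MAX_DOCS = 101;
const FIRST_BATCH_BYTES = 1 * ONE_MB;
const LATER_BATCH_BYTES = 4 * ONE_MB;

// docSizes: document sizes in bytes, in result order.
// Returns the number of documents in each batch.
function planBatches(docSizes) {
  const batches = [];
  let i = 0;
  while (i < docSizes.length) {
    const first = batches.length === 0;
    const byteLimit = first ? FIRST_BATCH_BYTES : LATER_BATCH_BYTES;
    let count = 0;
    let bytes = 0;
    // Add documents until the byte limit is exceeded or, for the first
    // batch, until 101 documents have been added. Every batch gets at
    // least one document, so the loop always makes progress.
    while (i < docSizes.length && bytes <= byteLimit) {
      if (first && count === FIRST_BATCH_MAX_DOCS) break;
      bytes += docSizes[i];
      count += 1;
      i += 1;
    }
    batches.push(count);
  }
  return batches;
}

console.log(planBatches(new Array(250).fill(1024))); // [ 101, 149 ]
```

With small documents the 101-document cap ends the first batch; with documents of 100 KB, the 1 MB threshold is crossed first (after 11 documents).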
For queries that include a sort operation without an index, the server must load all the documents in memory to perform
the sort before returning any results.
As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next()
will perform a getmore operation to retrieve the next batch. To see how many documents remain in the batch
as you iterate the cursor, you can use the objsLeftInBatch() method, as in the following example:
var myCursor = db.inventory.find();
var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
myCursor.objsLeftInBatch();
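The interplay between next(), getmore, and objsLeftInBatch() can be simulated without a server. SimulatedCursor is a hypothetical class that only mimics the method names of the mongo shell cursor; it serves documents from the current batch and issues a simulated getmore only when that batch is empty.

```javascript
// Hypothetical client-side cursor over pre-computed batches. It only mimics the
// names hasNext(), next(), and objsLeftInBatch(); it is not the shell's cursor.
class SimulatedCursor {
  constructor(batches) {
    // Copy so the caller's arrays are not mutated; assumes non-empty batches.
    this.pending = batches.map((batch) => batch.slice());
    this.current = this.pending.shift() || []; // the initial query returns the first batch
    this.getMores = 0; // number of simulated getmore round trips
  }

  hasNext() {
    return this.current.length > 0 || this.pending.length > 0;
  }

  next() {
    if (this.current.length === 0) {
      if (this.pending.length === 0) throw new Error("cursor exhausted");
      this.current = this.pending.shift(); // simulated getmore for the next batch
      this.getMores += 1;
    }
    return this.current.shift();
  }

  objsLeftInBatch() {
    return this.current.length;
  }
}

const myCursor = new SimulatedCursor([["a", "b", "c"], ["d", "e"]]);
const myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
console.log(myFirstDocument, myCursor.objsLeftInBatch()); // a 2
```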
Cursor Information
The db.serverStatus() method returns a document that includes a metrics field. The metrics field contains a cursor field with the following information:
number of timed out cursors since the last server restart
number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period
of inactivity
number of pinned open cursors
total number of open cursors
Consider the following example which calls the db.serverStatus() method and accesses the metrics field
from the results and then the cursor field from the metrics field:
db.serverStatus().metrics.cursor
See also:
db.serverStatus()
Query Optimization
Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process.
This simplifies the work associated with fulfilling queries within MongoDB.
If your application queries a collection on a particular field or set of fields, then an index on the queried field or a
compound index (page 472) on the set of fields can prevent the query from scanning the whole collection to find and
return the query results. For more information about indexes, see the complete documentation of indexes in MongoDB
(page 468).
Example
An application queries the inventory collection on the type field. The value of the type field is user-driven.
var typeValue = <someUserInput>;
db.inventory.find( { type: typeValue } );
To improve the performance of this query, add an ascending, or a descending, index to the inventory collection
on the type field. 2 In the mongo shell, you can create indexes using the db.collection.createIndex()
method:
db.inventory.createIndex( { type: 1 } )
This index can prevent the above query on type from scanning the whole collection to return the results.
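A rough way to see why the index helps is to count how many documents each strategy must examine. The sketch below builds a hypothetical in-memory index as a Map from each type value to matching array positions; it is an analogy for the B-tree index that MongoDB maintains, not its implementation.

```javascript
// Hypothetical inventory documents.
const inventory = [
  { _id: 1, type: "food", item: "apple" },
  { _id: 2, type: "tool", item: "hammer" },
  { _id: 3, type: "food", item: "bread" },
  { _id: 4, type: "toy", item: "ball" },
];

// Collection scan: every document is examined.
function collectionScan(docs, typeValue) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.type === typeValue) matches.push(doc);
  }
  return { matches, examined };
}

// Stand-in for an index on { type: 1 }: type value -> array positions.
function buildTypeIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    if (!index.has(doc.type)) index.set(doc.type, []);
    index.get(doc.type).push(pos);
  });
  return index;
}

// Index lookup: only the documents the index points to are examined.
function indexLookup(docs, index, typeValue) {
  const positions = index.get(typeValue) || [];
  return { matches: positions.map((pos) => docs[pos]), examined: positions.length };
}
```

Both strategies return the same two food documents, but the scan touches all four documents while the lookup touches only two; the gap grows with the collection.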
To analyze the performance of the query with an index, see Analyze Query Performance (page 109).
In addition to optimizing read operations, indexes can support sort operations and allow for more efficient storage
utilization. See db.collection.createIndex() and Indexing Tutorials (page 502) for more information about
index creation.
Query Selectivity
Query selectivity refers to how well the query predicate excludes or filters out documents in a collection. Query
selectivity can determine whether or not queries can use indexes effectively or even use indexes at all.
More selective queries match a smaller percentage of documents. For instance, an equality match on the unique _id
field is highly selective as it can match at most one document.
Less selective queries match a larger percentage of documents and cannot use indexes effectively, or in some cases
at all.
For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of
the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne
query that must scan all documents in a collection.
The selectivity of regular expressions depends on the expressions themselves. For details, see regular expression and index use.
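Selectivity can be expressed as the fraction of documents a predicate matches (lower is more selective). The sketch below computes that fraction for an equality match and a $ne match on the same hypothetical data; the helper name matchFraction is hypothetical, and every sample document has the status field, so $ne's behavior on missing fields does not come into play.

```javascript
// Hypothetical collection: 1 document with status "D", 9 with status "A".
const docs = [{ status: "D" }];
for (let i = 0; i < 9; i += 1) docs.push({ status: "A" });

// Fraction of documents matched by a predicate (lower = more selective).
function matchFraction(collection, predicate) {
  const matched = collection.filter(predicate).length;
  return matched / collection.length;
}

const eqFraction = matchFraction(docs, (d) => d.status === "D"); // like { status: "D" }
const neFraction = matchFraction(docs, (d) => d.status !== "D"); // like { status: { $ne: "D" } }
console.log(eqFraction, neFraction); // 0.1 0.9
```

A $ne predicate that matches 90% of the collection forces an index-based plan to walk most of the index anyway, which is why it may perform no better than a collection scan.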
Covering a Query
An index covers (page 64) a query when both of the following apply:
all the fields in the query (page 95) are part of an index, and
all the fields returned in the results are in the same index.
For example, a collection inventory has the following index on the type and item fields:
db.inventory.createIndex( { type: 1, item: 1 } )
2 For single-field indexes, the selection between ascending and descending order is immaterial. For compound indexes, the selection is important.
See indexing order (page 473) for more details.
This index will cover the following operation which queries on the type and item fields and returns only the item
field:
db.inventory.find(
{ type: "food", item:/^c/ },
{ item: 1, _id: 0 }
)
For the specified index to cover the query, the projection document must explicitly specify _id: 0 to exclude the
_id field from the result since the index does not include the _id field.
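The two coverage conditions, plus the _id rule, can be captured in a small predicate. coversQuery is a hypothetical helper over plain objects that considers only top-level field names and inclusive projections; it ignores the multikey, embedded-document, and sharded-collection restrictions listed under Limitations, so use explain() to confirm coverage on a real deployment.

```javascript
// Hypothetical check for whether an index could cover a query.
// indexKey:   e.g. { type: 1, item: 1 }
// filter:     the query criteria object
// projection: an inclusive projection object
function coversQuery(indexKey, filter, projection) {
  const indexed = new Set(Object.keys(indexKey));
  // Condition 1: all the fields in the query are part of the index.
  const queryOk = Object.keys(filter).every((field) => indexed.has(field));
  // Only inclusive projections are modeled; without included fields the
  // server would return whole documents (assumed to hold unindexed fields).
  const included = Object.keys(projection).filter(
    (field) => field !== "_id" && projection[field] === 1
  );
  if (included.length === 0) return false;
  // Condition 2: all returned fields are in the same index. _id is returned
  // unless the projection explicitly sets _id: 0.
  const returned = projection._id === 0 ? included : included.concat("_id");
  return queryOk && returned.every((field) => indexed.has(field));
}

const typeItemIndex = { type: 1, item: 1 };
console.log(coversQuery(typeItemIndex, { type: "food", item: /^c/ }, { item: 1, _id: 0 })); // true
console.log(coversQuery(typeItemIndex, { type: "food", item: /^c/ }, { item: 1 })); // false
```

The second call returns false only because _id is implicitly returned and is not part of the { type: 1, item: 1 } index.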
Performance Because the index contains all fields required by the query, MongoDB can both match the query
conditions (page 95) and return the results using only the index.
Querying only the index can be much faster than querying documents outside of the index. Index keys are typically
smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.
Limitations
Restrictions on Indexed Fields An index cannot cover a query if:
any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an
array, the index becomes a multi-key index (page 474) and cannot support a covered query.
any of the indexed fields in the query predicate or returned in the projection are fields in embedded documents.
For example, consider a collection users with documents of the following form:
The { "user.login":
Restrictions on Sharded Collection An index cannot cover a query on a sharded collection when run against a
mongos if the index does not contain the shard key, with the following exception for the _id index: If a query on a
sharded collection only specifies a condition on the _id field and returns only the _id field, the _id index can cover
the query when run against a mongos even if the _id field is not the shard key.
Changed in version 3.0: In previous versions, an index could not cover (page 64) a query on a sharded collection when
run against a mongos.
explain To determine whether a query is a covered query, use the db.collection.explain() or the
explain() method and review the results.
db.collection.explain() provides information on the execution of other operations, such as
db.collection.update(). See db.collection.explain() for details.
Query Optimization
As collections change over time, the query optimizer deletes the query plan and re-evaluates after any of the following
events:
The collection receives 1,000 write operations.
Sharded clusters allow you to partition a data set among a cluster of mongod instances in a way that is nearly transparent to the application. For an overview of sharded clusters, see the Sharding (page 641) section of this manual.
For a sharded cluster, applications issue operations to one of the mongos instances associated with the cluster.
Read operations on sharded clusters are most efficient when directed to a specific shard. Queries to sharded collections
should include the collection's shard key (page 654). When a query includes a shard key, the mongos can use cluster
metadata from the config database (page 650) to route the queries to shards.
If a query does not include the shard key, the mongos must direct the query to all shards in the cluster. These scatter/gather
queries can be inefficient. On larger clusters, scatter/gather queries are infeasible for routine operations.
For more information on read operations in sharded clusters, see the Sharded Cluster Query Routing (page 658) and
Shard Keys (page 654) sections.
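To illustrate targeted versus scatter/gather routing, here is a sketch of how a router could choose shards from range-based chunk metadata. The chunk table, the shard key userId, the shard names, and routeQuery are hypothetical and far simpler than the real config metadata; only equality matches on the shard key are handled.

```javascript
// Hypothetical chunk metadata for shard key "userId": [min, max) -> shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Returns the shards that must receive a query.
function routeQuery(filter, shardKey) {
  if (!(shardKey in filter)) {
    return allShards; // no shard key: scatter/gather to every shard
  }
  const value = filter[shardKey]; // equality match on the shard key
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return [chunk.shard]; // targeted query: a single shard
}

console.log(routeQuery({ userId: 1234 }, "userId")); // [ 'shardB' ]
console.log(routeQuery({ status: "A" }, "userId")); // [ 'shardA', 'shardB', 'shardC' ]
```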
Replica sets use read preferences to determine where and how to route read operations to members of the replica set.
By default, MongoDB always reads data from a replica set's primary. You can modify that behavior by changing the
read preference mode (page 637).
You can configure the read preference mode (page 637) on a per-connection or per-operation basis to allow reads from
secondaries in order to:
reduce latency in multi-data-center deployments,
improve read throughput by distributing high read-volumes (relative to write volume),
support backup operations, and/or
allow reads during failover (page 560) situations.
Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary,
and the state of secondaries will trail the primary by some amount of time. Often, applications don't rely on this kind
of strict consistency, but application developers should always consider the needs of their application before setting
read preference.
For more information on read preference or on the read preference modes, see Read Preference (page 568) and Read
Preference Modes (page 637).
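The five read preference modes (primary, primaryPreferred, secondary, secondaryPreferred, nearest) can be summarized as a member-selection function. This sketch assumes a hypothetical list of members with a state, a reachability flag, and a measured pingMs; it always picks the single fastest candidate and ignores the tag sets and latency window that real drivers apply, so treat it as an outline of the mode semantics rather than driver behavior.

```javascript
// Hypothetical replica set member: { name, state: "PRIMARY" | "SECONDARY", up, pingMs }
function selectMember(members, mode) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.state === "PRIMARY") || null;
  const secondaries = up.filter((m) => m.state === "SECONDARY");
  const fastest = (list) =>
    list.length === 0 ? null : list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a));
  switch (mode) {
    case "primary":
      return primary; // default: never read from a secondary
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries);
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(up); // any member, lowest latency
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}
```

During a failover the primary is unreachable, so the primary mode returns no member while primaryPreferred falls back to a secondary; this is the "allow reads during failover" case listed above.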
Write Concern (page 76) Describes the kind of guarantee MongoDB provides when reporting on the success of a
write operation.
Atomicity and Transactions (page 80) Describes write operation atomicity in MongoDB.
Distributed Write Operations (page 80) Describes how MongoDB directs write operations on sharded clusters and
replica sets and the performance characteristics of these operations.
Write Operation Performance (page 85) Introduces the performance constraints and factors for writing data to MongoDB deployments.
Bulk Write Operations (page 86) Provides an overview of MongoDB's bulk write operations.
Storage (page 88) Introduces the storage allocation strategies available for MongoDB collections.
Write Operations Overview
A write operation is any operation that creates or modifies data in the MongoDB instance. In MongoDB, write
operations target a single collection. All write operations in MongoDB are atomic on the level of a single document.
There are three classes of write operations in MongoDB: insert (page 72), update (page 73), and remove (page 74).
Insert operations add new data to a collection. Update operations modify existing data, and remove operations delete
data from a collection. No insert, update, or remove can affect more than one document atomically.
For the update and remove operations, you can specify criteria, or conditions, that identify the documents to update or
remove. These operations use the same query syntax to specify the criteria as read operations (page 58).
MongoDB allows applications to determine the acceptable level of acknowledgement required of write operations.
See Write Concern (page 76) for more information.
Insert
db.users.insert(
{
name: "sue",
age: 26,
status: "A"
}
)
Example
db.users.update(
{ age: { $gt: 18 } },
{ $set: { status: "A" } },
{ multi: true }
)
This update operation on the users collection sets the status field to A for the documents that match the criteria
of age greater than 18.
For more information, see db.collection.update() and update() Examples.
Default Update Behavior By default, the db.collection.update() method updates a single document.
However, with the multi option, update() can update all documents in a collection that match a query.
The db.collection.update() method either updates specific fields in the existing document or replaces the
document. See db.collection.update() for details as well as examples.
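The difference between the default single-document update and multi: true can be sketched in memory. updateDocs is a hypothetical helper that takes a predicate in place of the query criteria and supports only a top-level $set; it is not MongoDB's update engine, and its updatedCount simply counts the matched documents it touched.

```javascript
// Hypothetical in-memory stand-in for db.collection.update() with $set.
// matches: predicate standing in for the query criteria.
function updateDocs(docs, matches, setFields, options = {}) {
  let updatedCount = 0;
  for (const doc of docs) {
    if (!matches(doc)) continue;
    Object.assign(doc, setFields); // { $set: setFields }, top-level fields only
    updatedCount += 1;
    if (!options.multi) break; // default: update a single document
  }
  return { updatedCount };
}
```

For example, updateDocs(users, (d) => d.age > 18, { status: "A" }, { multi: true }) mirrors the example above, while omitting { multi: true } stops after the first matching document.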
When performing update operations that increase the document size beyond the allocated space for that document, the
update operation relocates the document on disk.
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
Changed in version 2.6: Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document.
Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.
Update Behavior with the upsert Option If the update() method includes upsert: true and no documents
match the query portion of the update operation, then the update operation creates a new document. If there are
matching documents, then the update operation with the upsert: true modifies the matching document or documents.
By specifying upsert: true, applications can indicate, in a single operation, that if no matching documents are found
for the update, an insert should be performed. See update() for details on performing an upsert.
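The upsert decision can be sketched as follows. When nothing matches, MongoDB builds the new document from the equality fields of the query and then applies the update operators; the sketch mimics that for plain equality queries and $set fields only. The helper upsert and its nextId parameter (standing in for a generated ObjectId) are hypothetical, and only the first match is updated.

```javascript
// Hypothetical model of update() with upsert: true and an equality query.
// query: plain { field: value } criteria; setFields: the $set fields.
function upsert(docs, query, setFields, nextId) {
  const matches = (doc) => Object.entries(query).every(([k, v]) => doc[k] === v);
  const target = docs.find(matches);
  if (target) {
    Object.assign(target, setFields); // a match exists: ordinary update
    return { upserted: false, doc: target };
  }
  // No match: insert a document built from the query's equality fields
  // plus the $set fields.
  const created = { _id: nextId, ...query, ...setFields };
  docs.push(created);
  return { upserted: true, doc: created };
}
```

If two clients run the same upsert concurrently, both may see "no match" and both insert; a unique index on the queried field, as recommended below, turns the second insert into a duplicate key error instead of a duplicate document.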
Changed in version 2.6: In 2.6, the new Bulk() methods and the underlying update command allow you to perform
many updates with upsert: true operations in a single call.
If you create documents using the upsert option to update(), consider using a unique index to prevent duplicated
operations.
Remove
Example
db.users.remove(
{ status: "D" }
)
This delete operation on the users collection removes all documents that match the criteria of status equal to D.
For more information, see db.collection.remove() method and Remove Documents (page 105).
Remove Behavior By default, db.collection.remove() method removes all documents that match its query.
However, the method can accept a flag to limit the delete operation to a single document.
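The effect of that flag (the justOne argument of db.collection.remove()) can be sketched in memory. removeDocs is a hypothetical helper that takes a predicate in place of the query document and deletes matches from an array in place.

```javascript
// Hypothetical in-memory model of db.collection.remove(query, justOne).
function removeDocs(docs, matches, justOne = false) {
  let removed = 0;
  for (let i = 0; i < docs.length; ) {
    if (matches(docs[i])) {
      docs.splice(i, 1); // delete in place; do not advance i
      removed += 1;
      if (justOne) break; // limit the delete to a single document
    } else {
      i += 1;
    }
  }
  return removed;
}
```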
Isolation of Write Operations
The modification of a single document is always atomic, even if the write operation modifies multiple embedded
documents within that document. No other operations are atomic.
If a write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. You can, however, attempt to isolate a write operation that affects multiple documents using the isolation
operator.
For more information, see Atomicity and Transactions (page 80).
Additional Methods
The db.collection.save() method can either update an existing document or insert a document if the document cannot be found by the _id field. See db.collection.save() for more information and examples.
MongoDB also provides methods to perform write operations in bulk. See Bulk() for more information.
Write Concern
Write concern describes the guarantee that MongoDB provides when reporting on the success of a write operation.
The strength of the write concern determines the level of guarantee. When inserts, updates and deletes have a weak
write concern, write operations return quickly. In some failure cases, write operations issued with weak write concerns
may not persist. With stronger write concerns, clients wait after sending a write operation for MongoDB to confirm
the write operations.
MongoDB provides different levels of write concern to better address the specific needs of applications. Clients
may adjust write concern to ensure that the most important operations persist successfully to an entire MongoDB
deployment. For other less critical operations, clients can adjust the write concern to ensure faster performance rather
than ensure persistence to the entire deployment.
Changed in version 2.6: A new protocol for write operations (page 815) integrates write concern with the write
operations.
For details on write concern configurations, see Write Concern Reference (page 128).
Considerations
Default Write Concern The mongo shell and the MongoDB drivers use Acknowledged (page 77) as the default
write concern.
See Acknowledged (page 77) for more information, including when this write concern became the default.
Read Isolation MongoDB allows clients to read documents inserted or modified before it commits these modifications to disk, regardless of write concern level or journaling configuration. As a result, applications may observe two
classes of behaviors:
For systems with multiple concurrent readers and writers, MongoDB will allow clients to read the results of a
write operation before the write operation returns.
If the mongod terminates before the journal commits, even if a write returns successfully, queries may have
read data that will not exist after the mongod restarts.
Other database systems refer to these isolation semantics as read uncommitted. For all inserts and updates, MongoDB modifies each document in isolation: clients never see documents in intermediate states. For multi-document
operations, MongoDB does not provide any multi-document transactions or isolation.
When a standalone mongod returns a successful journaled write concern, the data is fully committed to disk and will
be available after mongod restarts.
For replica sets, write operations are durable only after a write replicates and commits to the journal on a majority of
the voting members of the set. MongoDB regularly commits data to the journal regardless of journaled write concern:
use the commitIntervalMs to control how often a mongod commits the journal.
Timeouts Clients can set a wtimeout (page 129) value as part of a replica acknowledged (page 79) write concern. If
the write concern is not satisfied in the specified interval, the operation returns an error, even if the write concern will
eventually succeed.
MongoDB does not rollback or undo modifications made before the wtimeout interval expired.
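The timeout semantics can be sketched with promises: the client stops waiting once wtimeout elapses and reports an error, while the write itself stays applied, matching the no-rollback rule above. The simulated replication delay, the function names, and the result shape are hypothetical; drivers report write concern errors in their own formats.

```javascript
// Hypothetical sketch: wait for replica acknowledgement or give up.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// store: array standing in for the primary's data.
// replicationDelayMs: simulated time until enough members acknowledge.
async function writeWithTimeout(store, doc, replicationDelayMs, wtimeoutMs) {
  store.push(doc); // applied on the primary immediately
  const acknowledged = sleep(replicationDelayMs).then(() => ({ ok: true }));
  const timedOut = sleep(wtimeoutMs).then(() => ({
    ok: false,
    error: "write concern timeout", // reported to the client...
  }));
  // ...but the document stays in `store`: nothing is rolled back.
  return Promise.race([acknowledged, timedOut]);
}
```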
Write Concern Levels
MongoDB has the following levels of conceptual write concern, listed from weakest to strongest:
Unacknowledged With an unacknowledged write concern, MongoDB does not acknowledge the receipt of write
operations. Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network
errors when possible. The driver's ability to detect network errors depends on the system's networking configuration.
Before the releases outlined in Default Write Concern Change (page 887), this was the default write concern.
Acknowledged With a receipt acknowledged write concern, the mongod confirms that it received the write operation and applied the change to the in-memory view of data. Acknowledged write concern allows clients to catch
network, duplicate key, and other errors.
MongoDB uses the acknowledged write concern by default starting in the driver releases outlined in Releases
(page 887).
Changed in version 2.6: The mongo shell write methods now incorporate the write concern (page 76) in the write
methods and provide the default write concern whether run interactively or in a script. See Write Method Acknowledgements (page 821) for details.
Acknowledged write concern does not confirm that the write operation has persisted to the disk system.
Journaled With a journaled write concern, MongoDB acknowledges the write operation only after committing
the data to the journal. This write concern ensures that MongoDB can recover the data following a shutdown or power
interruption.
You must have journaling enabled to use this write concern.
With a journaled write concern, write operations must wait for the next journal commit. To reduce latency for these operations, MongoDB also increases the frequency with which it commits operations to the journal. See commitIntervalMs
for more information.
Note: Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the
primary of the set regardless of the level of replica acknowledged write concern.
Replica Acknowledged Replica sets present additional considerations with regards to write concern. The default
write concern only requires acknowledgement from the primary.
With replica acknowledged write concern, you can guarantee that the write operation propagates to additional members
of the replica set. See Write Concern for Replica Sets (page 566) for more information.
Note: Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the
primary of the set regardless of the level of replica acknowledged write concern.
See also:
Write Concern Reference (page 128)
Using the $isolated operator, a write operation that affects multiple documents can prevent other processes from
interleaving once the write operation modifies the first document. This ensures that no client sees the changes until the
write operation completes or errors out.
An isolated write operation does not provide all-or-nothing atomicity. That is, an error during the write operation does
not roll back all its changes that preceded the error.
The $isolated operator does not work on sharded clusters.
For an example of an update operation that uses the $isolated operator, see $isolated. For an example of a
remove operation that uses the $isolated operator, see isolate-remove-operations.
Transaction-Like Semantics
Since a single document can contain multiple embedded documents, single-document atomicity is sufficient for many
practical use cases. For cases where a sequence of write operations must operate as if in a single transaction, you can
implement a two-phase commit (page 114) in your application.
However, two-phase commits can only offer transaction-like semantics. Using two-phase commit ensures data consistency, but it is possible for applications to return intermediate data during the two-phase commit or rollback.
For more information on two-phase commit and rollback, see Perform Two Phase Commits (page 114).
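The two-phase commit pattern can be outlined as a sequence of single-document writes, loosely following the pattern in Perform Two Phase Commits. The state names (initial, pending, applied, done), the pendingTransactions field, and the plain objects standing in for the accounts and transactions collections are assumptions of this sketch; recovery and rollback are not modeled.

```javascript
// In-memory outline of the two-phase commit pattern. In a real deployment
// each numbered step is a separate, individually atomic single-document write.
const accounts = {
  A: { balance: 1000, pendingTransactions: [] },
  B: { balance: 1000, pendingTransactions: [] },
};
const txn = { _id: 1, source: "A", destination: "B", value: 100, state: "initial" };

function transfer(t) {
  if (t.state !== "initial") throw new Error("transaction already started");
  t.state = "pending"; // 1. mark the transaction as pending
  // 2. apply to each account once, recording the transaction id so a
  //    retry after a crash does not apply it twice
  for (const [name, delta] of [[t.source, -t.value], [t.destination, t.value]]) {
    const acct = accounts[name];
    if (!acct.pendingTransactions.includes(t._id)) {
      acct.balance += delta;
      acct.pendingTransactions.push(t._id);
    }
  }
  t.state = "applied"; // 3. both accounts are updated
  // 4. clear the pending markers, then 5. mark the transaction done
  for (const name of [t.source, t.destination]) {
    const acct = accounts[name];
    acct.pendingTransactions = acct.pendingTransactions.filter((id) => id !== t._id);
  }
  t.state = "done";
}

transfer(txn);
console.log(accounts.A.balance, accounts.B.balance, txn.state); // 900 1100 done
```

Between steps, another reader can observe an account that has been debited while the transaction is still pending; this is the intermediate data the text warns about.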
Concurrency Control
Concurrency control allows multiple applications to run concurrently without causing data inconsistency or conflicts.
One approach is to create a unique index (page 490) on a field (or fields) that should have only unique values (or a
unique combination of values). The index prevents insertions or updates that would result in duplicate values. For examples of
use cases, see update() and Unique Index and findAndModify() and Unique Index.
Another approach is to specify the expected current value of a field in the query predicate for the write operations. For
an example, see Update if Current (page 120).
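The "update if current" idea is a compare-and-set: the write's query predicate includes the value the client read earlier, so the update matches nothing if another client changed the field in between. updateIfCurrent is a hypothetical in-memory stand-in for such an update on a single document.

```javascript
// Hypothetical model of the "update if current" pattern.
function updateIfCurrent(doc, field, expected, newValue) {
  // Stands in for update({ _id: ..., [field]: expected }, { $set: { [field]: newValue } }):
  // the query predicate includes the expected current value.
  if (doc[field] !== expected) return false; // no match: someone else changed it, so re-read and retry
  doc[field] = newValue;
  return true;
}

const product = { _id: 1, qty: 10 };
const seenByClient1 = product.qty; // both clients read qty = 10
const seenByClient2 = product.qty;
const ok1 = updateIfCurrent(product, "qty", seenByClient1, seenByClient1 - 3);
const ok2 = updateIfCurrent(product, "qty", seenByClient2, seenByClient2 - 4);
console.log(ok1, ok2, product.qty); // true false 7
```

Without the expected value in the predicate, the second client would silently overwrite the first client's change and leave qty at 6.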
The two-phase commit pattern provides a variation where the query predicate includes the application identifier
(page 118) as well as the expected state of the data in the write operation.
Distributed Write Operations
Write Operations on Sharded Clusters
For sharded collections in a sharded cluster, the mongos directs write operations from applications to the shards that
are responsible for the specific portion of the data set. The mongos uses the cluster metadata from the config database
(page 650) to route the write operation to the appropriate shards.
MongoDB partitions data in a sharded collection into ranges, called chunks, based on the values of the shard key. Then, MongoDB
distributes these chunks to shards. The shard key determines the distribution of chunks to shards, which can affect the
performance of write operations in the cluster.
Important: Update operations that affect a single document must include the shard key or the _id field. Updates
that affect multiple documents are more efficient in some situations if they have the shard key, but can be broadcast to
all shards.
If the value of the shard key increases or decreases with every insert, all insert operations target a single shard. As a
result, the capacity of a single shard becomes the limit for the insert capacity of the sharded cluster.
For more information, see Sharded Cluster Tutorials (page 669) and Bulk Write Operations (page 86).
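The following sketch shows why a monotonically increasing shard key funnels inserts to one shard: with range-based chunks, each new key is greater than every existing key, so it always lands in the chunk whose upper bound is unbounded. The chunk boundaries and shard names are hypothetical.

```javascript
// Hypothetical range-based chunk layout on a numeric shard key.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

function shardFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Keys that increase with every insert, e.g. a counter or a timestamp.
const counts = {};
for (let key = 500; key < 1500; key += 1) {
  const shard = shardFor(key);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts); // { shard2: 1000 }
```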
Write Operations on Replica Sets
In replica sets, all write operations go to the set's primary, which applies the write operation and then records the operations in the primary's operation log or oplog. The oplog is a reproducible sequence of operations to the data set.
Secondary members of the set continuously replicate the oplog and apply the operations to themselves in an
asynchronous process.
Large volumes of write operations, particularly bulk operations, may create situations where the secondary members
have difficulty applying the replicating operations from the primary at a sufficient rate: this can cause the secondary's
state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for
normal operation of the replica set, particularly failover (page 560) in the form of rollbacks (page 564) as well as
general read consistency (page 565).
To help avoid this issue, you can customize the write concern (page 76) to return confirmation of the write operation
to another member 4 of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries
to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the
secondaries can maintain a largely current state with respect to the primary.
For more information on replica sets and write operations, see Replica Acknowledged (page 79), Oplog Size (page 573),
and Change the Size of the Oplog (page 608).
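The periodic confirmation technique amounts to attaching a stronger write concern to every Nth write, in line with footnote 4's suggestion of a w value of 2. writeConcernFor is a hypothetical helper; how the resulting { w: ... } document is passed along depends on your driver or on the shell method's writeConcern option.

```javascript
// Hypothetical helper: choose a write concern for the i-th write (0-based).
function writeConcernFor(i, every) {
  // Every `every` operations, wait for acknowledgement from a second member
  // so secondaries get a chance to catch up with the primary.
  return (i + 1) % every === 0 ? { w: 2 } : { w: 1 };
}

const concerns = [];
for (let i = 0; i < 3000; i += 1) concerns.push(writeConcernFor(i, 1000));
const strong = concerns.filter((wc) => wc.w === 2).length;
console.log(strong); // 3
```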
4 Intermittently issuing a write concern with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will
allow the secondaries to remain current with the state of the primary.
Changed in version 2.6: In Master/Slave (page 575) deployments, MongoDB treats w: "majority" as equivalent to w: 1. In earlier
versions of MongoDB, w: "majority" produces an error in master/slave (page 575) deployments.
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in
addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance
of write operations. 5
In general, the performance gains that indexes provide for read operations are worth the insertion penalty. However,
in order to optimize write performance when possible, be careful when creating new indexes and evaluate the existing
indexes to ensure that your queries actually use these indexes.
For indexes and queries, see Query Optimization (page 63). For more information on indexes, see Indexes (page 463)
and Indexing Strategies (page 532).
Document Growth and the MMAPv1 Storage Engine
Some update operations can increase the size of the document; for instance, if an update adds a new field to the
document.
For the MMAPv1 storage engine, if an update operation causes a document to exceed the currently allocated record
size, MongoDB relocates the document on disk with enough contiguous space to hold the document. Updates that
require relocations take longer than updates that do not, particularly if the collection has indexes. If a collection has
indexes, MongoDB must update all index entries. Thus, for a collection with many indexes, the move will impact the
write throughput.
Changed in version 3.0.0: By default, MongoDB uses Power of 2 Sized Allocations (page 90) to add padding automatically (page 90) for the MMAPv1 storage engine. The Power of 2 Sized Allocations (page 90) ensures that MongoDB
allocates document space in sizes that are powers of 2, which helps ensure that MongoDB can efficiently reuse free
space created by document deletion or relocation as well as reduce the occurrences of reallocations in many cases.
Although Power of 2 Sized Allocations (page 90) minimize the occurrence of reallocation, they do not eliminate document reallocation.
See Storage (page 88) for more information.
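To see what power-of-2 sizing means numerically, the sketch below rounds a record size up to the next power of 2 and reports how much a document can grow before it outgrows its allocation. It models only the rounding described in this section; the function names are hypothetical, and MMAPv1 applies further rules (for example, for very large documents) that are not modeled here.

```javascript
// Smallest power of 2 that is >= size (size in bytes, size >= 1).
function powerOf2Allocation(size) {
  let alloc = 1;
  while (alloc < size) alloc *= 2;
  return alloc;
}

// Bytes a document can grow in place before it needs relocation.
function growthHeadroom(size) {
  return powerOf2Allocation(size) - size;
}

console.log(powerOf2Allocation(700), growthHeadroom(700)); // 1024 324
console.log(powerOf2Allocation(1025), growthHeadroom(1025)); // 2048 1023
```

Because every record size is one of a few standard sizes, space freed by a deleted or relocated document is likely to fit a later document exactly, which is the reuse benefit described above.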
Storage Performance
Hardware The capability of the storage system creates some important physical limits for the performance of MongoDB's write operations. Many unique factors related to the storage system of the drive affect write performance,
including random access patterns, disk caches, disk readahead and RAID configurations.
Solid state drives (SSDs) can outperform spinning hard disks (HDDs) by 100 times or more for random workloads.
See Production Notes (page 198) for recommendations regarding additional hardware and configuration options.
Journaling MongoDB uses write ahead logging to an on-disk journal to guarantee write operation (page 71) durability and to provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation
to the journal.
5 For inserts and updates to un-indexed fields, the overhead for sparse indexes (page 490) is less than for non-sparse indexes. Also for non-sparse
indexes, updates that do not change the record size have less indexing overhead.
While the durability assurance provided by the journal typically outweighs the performance costs of the additional write
operations, consider the following interactions between the journal and performance:
If the journal and the data files reside on the same block device, the data files and the journal may have to contend
for a finite number of available write operations. Moving the journal to a separate device may increase the
capacity for write operations.
If applications specify a write concern (page 76) that includes journaled (page 77), mongod will decrease the
duration between journal commits, which can increase the overall write load.
The duration between journal commits is configurable using the commitIntervalMs run-time option. Decreasing the
period between journal commits will increase the number of write operations, which can limit MongoDB's capacity
for write operations. Increasing the amount of time between commits may decrease the total number of write
operations, but also increases the chance that the journal will not record a write operation in the event of a failure.
For additional information on journaling, see Journaling Mechanics (page 300).
Bulk Write Operations
Overview
MongoDB provides clients the ability to perform write operations in bulk. Bulk write operations affect a single
collection. MongoDB allows applications to determine the acceptable level of acknowledgement required for bulk
write operations.
New Bulk methods provide the ability to perform bulk insert, update, and remove operations. MongoDB also supports
bulk insert through passing an array of documents to the db.collection.insert() method.
Changed in version 2.6: Previous versions of MongoDB provided the ability for bulk inserts only. With previous
versions, clients could perform bulk inserts by passing an array of documents to the db.collection.insert() method. To
see the documentation for earlier versions, see Bulk Inserts.
Ordered vs Unordered Operations
Bulk write operations can be either ordered or unordered. With an ordered list of operations, MongoDB executes
the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return
without processing any remaining write operations in the list.
With an unordered list of operations, MongoDB can execute the operations in parallel. If an error occurs during the
processing of one of the write operations, MongoDB will continue to process remaining write operations in the list.
Executing an ordered list of operations on a sharded collection will generally be slower than executing an unordered
list since with an ordered list, each operation must wait for the previous operation to finish.
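As a rough illustration of the difference, the following plain JavaScript sketch models how an ordered list stops at the first error while an unordered list continues. It does not contact a server: runBulk and the simulated failing operation are hypothetical, and for simplicity the model processes unordered operations one at a time even though MongoDB may run them in parallel.

```javascript
// Hypothetical model of ordered vs. unordered bulk error handling.
// Each op is a function that returns normally or throws to simulate a write error.
function runBulk(ops, ordered) {
  const result = { nApplied: 0, writeErrors: [] };
  for (let index = 0; index < ops.length; index++) {
    try {
      ops[index]();
      result.nApplied++;
    } catch (err) {
      result.writeErrors.push({ index, errmsg: err.message });
      // An ordered list stops at the first error; an unordered list keeps going.
      if (ordered) break;
    }
  }
  return result;
}

// Three operations where the second fails (for example, a duplicate _id).
const ops = [
  () => {},
  () => { throw new Error("duplicate key"); },
  () => {},
];

console.log(runBulk(ops, true));  // ordered: the third op is never attempted
console.log(runBulk(ops, false)); // unordered: the third op still runs
```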
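Bulk Methods
To use the Bulk() methods:
1. Initialize a list of operations using either db.collection.initializeOrderedBulkOp() or
db.collection.initializeUnorderedBulkOp().
2. Add write operations to the list using the following methods:
Bulk.insert()
Bulk.find()
Bulk.find.upsert()
Bulk.find.update()
Bulk.find.updateOne()
Bulk.find.replaceOne()
Bulk.find.remove()
Bulk.find.removeOne()
3. To execute the list of operations, use the Bulk.execute() method. You can specify the write concern for
the list in the Bulk.execute() method.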
Once executed, you cannot re-execute the list without reinitializing.
For example,
var bulk = db.items.initializeUnorderedBulkOp();
bulk.insert( { _id: 1, item: "abc123", status: "A", soldQty: 5000 } );
bulk.insert( { _id: 2, item: "abc456", status: "A", soldQty: 150 } );
bulk.insert( { _id: 3, item: "abc789", status: "P", soldQty: 0 } );
bulk.execute( { w: "majority", wtimeout: 5000 } );
For more examples, refer to the reference page for each http://docs.mongodb.org/manual/reference/method/js-bul
method. For information and examples on performing bulk insert using the db.collection.insert(), see
db.collection.insert().
See also:
New Write Operation Protocol (page 815)
Bulk Execution Mechanics
When executing an ordered list of operations, MongoDB groups adjacent operations by the operation type.
When executing an unordered list of operations, MongoDB groups and may also reorder the operations to increase
performance. As such, when performing unordered bulk operations, applications should not depend on the ordering.
Each group of operations can have at most 1000 operations. If a group exceeds this limit, MongoDB will
divide the group into smaller groups of 1000 or less. For example, if the bulk operations list consists of 2000 insert
operations, MongoDB creates 2 groups, each with 1000 operations.
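The following plain JavaScript sketch models the grouping rule described above for an ordered list: adjacent operations of the same type share a group, and no group exceeds 1000 operations. The function groupOrderedOps is hypothetical and is not MongoDB's implementation; the real grouping is an internal detail that may change, and unordered lists may additionally be reordered.

```javascript
// Hypothetical sketch of grouping an ordered bulk list:
// adjacent operations of the same type share a group,
// and no group holds more than MAX_GROUP_SIZE operations.
const MAX_GROUP_SIZE = 1000;

function groupOrderedOps(ops) {
  const groups = [];
  for (const op of ops) {
    const last = groups[groups.length - 1];
    if (last && last.type === op.type && last.ops.length < MAX_GROUP_SIZE) {
      last.ops.push(op);                          // extend the current group
    } else {
      groups.push({ type: op.type, ops: [op] });  // start a new group
    }
  }
  return groups;
}

// 2000 inserts followed by one update.
const ops = [];
for (let i = 0; i < 2000; i++) ops.push({ type: "insert", doc: { _id: i } });
ops.push({ type: "update", q: { _id: 0 }, u: { $set: { x: 1 } } });

const groups = groupOrderedOps(ops);
console.log(groups.map(g => `${g.type} x ${g.ops.length}`));
```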
The sizes and grouping mechanics are internal performance details and are subject to change in future versions.
To see how the operations are grouped for a bulk operation execution, call Bulk.getOperations() after the
execution.
For more information, see Bulk.execute().
Strategies for Bulk Inserts to a Sharded Collection
Large bulk insert operations, including initial data inserts or routine data import, can affect sharded cluster performance. For bulk inserts, consider the following strategies:
Pre-Split the Collection If the sharded collection is empty, then the collection has only one initial chunk, which
resides on a single shard. MongoDB must then take time to receive data, create splits, and distribute the split chunks
to the available shards. To avoid this performance cost, you can pre-split the collection, as described in Split Chunks
in a Sharded Cluster (page 701).
Insert to Multiple mongos To parallelize import processes, send bulk insert or insert operations to more than one
mongos instance. For empty collections, first pre-split the collection as described in Split Chunks in a Sharded Cluster
(page 701).
Avoid Monotonic Throttling If your shard key increases monotonically during an insert, then all inserted data goes
to the last chunk in the collection, which will always end up on a single shard. Therefore, the insert capacity of the
cluster will never exceed the insert capacity of that single shard.
If your insert volume is larger than what a single shard can process, and if you cannot avoid a monotonically increasing
shard key, then consider the following modifications to your application:
Reverse the binary bits of the shard key. This preserves the information and avoids correlating insertion order
with increasing sequence of values.
Swap the first and last 16-bit words to shuffle the inserts.
Example
The following example, in C++, swaps the leading and trailing 16-bit words of generated BSON ObjectIds so that they are
no longer monotonically increasing.
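#include <algorithm>
#include "mongo/client/dbclient.h"

using namespace mongo;

// Generate an ObjectId, then swap its leading and trailing 16-bit words
// (bytes 0-1 and bytes 10-11 of the 12-byte value).
OID make_an_id() {
    OID x = OID::gen();
    // getData() returns a const pointer; x is a non-const local, so writing through it is safe.
    unsigned char *p = const_cast<unsigned char *>( x.getData() );
    std::swap_ranges( p, p + 2, p + 10 );
    return x;
}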
void foo() {
// create an object
BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
// now we may insert o into a sharded collection
}
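To illustrate the other strategy, reversing the bits of the shard key, here is a JavaScript sketch for a hypothetical 32-bit unsigned integer key; reverseBits32 is not part of any MongoDB driver. Reversing the bits twice returns the original value, so no information is lost, yet consecutive keys no longer produce increasing values.

```javascript
// Reverse the bit order of a 32-bit unsigned integer (hypothetical shard key).
function reverseBits32(n) {
  let result = 0;
  for (let i = 0; i < 32; i++) {
    result = (result << 1) | (n & 1); // append the lowest remaining bit of n
    n >>>= 1;                         // unsigned shift to the next bit
  }
  return result >>> 0;                // reinterpret the bit pattern as unsigned
}

// Consecutive keys map to values that are no longer monotonically increasing.
for (const key of [1, 2, 3, 4]) {
  console.log(key, "->", reverseBits32(key));
}
```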
See also:
Shard Keys (page 654) for information on choosing a shard key. Also see Shard Key Internals (page 654) (in
particular, Choosing a Shard Key (page 673)).
Storage
New in version 3.0: MongoDB adds support for additional storage engines. MongoDB's original storage engine,
known as MMAPv1, remains the default in 3.0, but the new WiredTiger engine is available and can offer additional
flexibility and improved throughput for many workloads.
Data Model
MongoDB stores data in the form of BSON documents, which are rich mappings of keys, or field names, to values.
BSON supports a rich collection of types, and fields in BSON documents may hold arrays of values or embedded
documents. All documents in MongoDB must be less than 16MB, the maximum BSON document size.
All documents are part of a collection, which is a logical grouping of documents in a MongoDB database. The
documents in a collection share a set of indexes, and typically these documents share common fields and structure.
In MongoDB the database construct is a group of related collections. Each database has a distinct set of data files and
can contain a large number of collections. A single MongoDB deployment may have many databases.
MMAPv1 Storage Engine
MMAPv1 is MongoDB's original storage engine based on memory mapped files. It excels at workloads with high
volume inserts, reads, and in-place updates. MMAPv1 is the default storage engine in MongoDB 3.0 and all previous
versions.
Journal In order to ensure that all modifications to a MongoDB data set are durably written to disk, MongoDB
records all modifications to a journal that it writes to disk more frequently than it writes the data files. The journal
allows MongoDB to successfully recover data from data files after a mongod instance exits without flushing all
changes.
See Journaling Mechanics (page 300) for more information about the journal in MongoDB.
Record Storage Characteristics All records are contiguously located on disk, and when a document becomes
larger than the allocated record, MongoDB must allocate a new record. New allocations require MongoDB to move a
document and update all indexes that refer to the document, which takes more time than in-place updates and leads to
storage fragmentation.
Changed in version 3.0.0.
By default, MongoDB uses Power of 2 Sized Allocations (page 90) so that every document in MongoDB is stored in a
record which contains the document itself and extra space, or padding. Padding allows the document to grow as the
result of updates while minimizing the likelihood of reallocations.
Record Allocation Strategies MongoDB supports multiple record allocation strategies that determine how mongod
adds padding to a document when creating a record. Because documents in MongoDB may grow after insertion and
all records are contiguous on disk, the padding can reduce the need to relocate documents on disk following updates.
Relocations are less efficient than in-place updates and can lead to storage fragmentation. As a result, all padding
strategies trade additional space for increased efficiency and decreased fragmentation.
Different allocation strategies support different kinds of workloads: power of 2 allocations (page 90) are more
efficient for insert/update/delete workloads, while exact fit allocations (page 90) are ideal for collections without update
and delete workloads.
Power of 2 Sized Allocations Changed in version 3.0.0.
MongoDB 3.0 uses the power of 2 sizes allocation as the default record allocation strategy for MMAPv1. With the
power of 2 sizes allocation strategy, each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ...
2MB). For documents larger than 2MB, the allocation is rounded up to the nearest multiple of 2MB.
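The rounding rule can be sketched in JavaScript as follows. This is a simplified model, not MMAPv1 code: it assumes 32 bytes as the smallest size (the first value in the list above) and ignores any per-record header overhead the storage engine adds.

```javascript
// Hypothetical model of power of 2 sized record allocation.
const MIN_RECORD = 32;               // assumed smallest record size
const TWO_MB = 2 * 1024 * 1024;

function powerOf2RecordSize(docBytes) {
  if (docBytes > TWO_MB) {
    // Above 2MB: round up to the nearest multiple of 2MB.
    return Math.ceil(docBytes / TWO_MB) * TWO_MB;
  }
  // Up to 2MB: the smallest power of 2 that can hold the document.
  let size = MIN_RECORD;
  while (size < docBytes) size *= 2;
  return size;
}

for (const bytes of [20, 100, 129, TWO_MB + 1]) {
  console.log(bytes, "->", powerOf2RecordSize(bytes));
}
```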
The power of 2 sizes allocation strategy has the following key properties:
Can efficiently reuse freed records to reduce fragmentation. Quantizing record allocation sizes into a fixed set
of sizes increases the probability that an insert will fit into the free space created by an earlier document deletion
or relocation.
Can reduce moves. The added padding space gives a document room to grow without requiring a move. In
addition to saving the cost of moving, this results in fewer updates to indexes. Although the power of 2 sizes
strategy can minimize moves, it does not eliminate them entirely.
No Padding Allocation Strategy Changed in version 3.0.0.
For collections whose workloads do not change the document sizes, such as workloads that consist of insert-only
operations or update operations that do not increase document size (such as incrementing a counter), you
can disable the power of 2 allocation (page 90) using the collMod command with the noPadding flag or the
db.createCollection() method with the noPadding option.
Prior to version 3.0.0, MongoDB used an allocation strategy that included a dynamically calculated padding as a
factor of the document size.
Capped Collections
Capped collections are fixed-size collections that support high-throughput operations that store records in insertion
order. Capped collections work like circular buffers: once a collection fills its allocated space, it makes room for new
documents by overwriting the oldest documents in the collection.
See Capped Collections (page 208) for more information.
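The circular-buffer behavior can be sketched with a small JavaScript class. This is a hypothetical model, not MongoDB's implementation: to stay short it caps the collection by document count, whereas real capped collections are sized in bytes.

```javascript
// Hypothetical model of a capped collection limited by document count.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs); // fixed-size storage, reused in a circle
    this.next = 0;                   // slot the next insert writes to
    this.count = 0;
  }

  insert(doc) {
    this.slots[this.next] = doc;     // overwrites the oldest document once full
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Return documents in insertion order, oldest first.
  find() {
    const start = this.count < this.maxDocs ? 0 : this.next;
    const docs = [];
    for (let i = 0; i < this.count; i++) {
      docs.push(this.slots[(start + i) % this.maxDocs]);
    }
    return docs;
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ _id: i });
console.log(log.find()); // the two oldest documents have been overwritten
```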
Insert a document into a collection named inventory. The operation will create the collection if the collection does
not currently exist.
db.inventory.insert(
   {
     item: "ABC1",
     details: {
        model: "14Q3",
        manufacturer: "XYZ Company"
     },
     stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
     category: "clothing"
   }
)
The operation returns a WriteResult object with the status of the operation. A successful insert of the document
returns the following object:
WriteResult({ "nInserted" : 1 })
The nInserted field specifies the number of documents inserted. If the operation encounters an error, the
WriteResult object will contain the error information.
Step 2: Review the inserted document.
If the insert operation is successful, verify the insertion by querying the collection.
db.inventory.find()
The returned document shows that MongoDB added an _id field to the document. If a client inserts a document that
does not contain the _id field, MongoDB adds the field with the value set to a generated ObjectId. The ObjectId
values in your documents will differ from the ones shown.
Insert an Array of Documents
You can pass an array of documents to the db.collection.insert() method to insert multiple documents.
Step 1: Create an array of documents.
item: "IJK2",
details: { model: "14Q2", manufacturer: "M5 Corporation" },
stock: [ { size: "S", qty: 5 }, { size: "L", qty: 1 } ],
category: "houseware"
}
];
The method returns a BulkWriteResult object with the status of the operation. A successful insert of the documents returns the following object:
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 3,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
The nInserted field specifies the number of documents inserted. If the operation encounters an error, the
BulkWriteResult object will contain information regarding the error.
The inserted documents will each have an _id field added by MongoDB.
Insert Multiple Documents with Bulk
New in version 2.6.
MongoDB provides a Bulk() API that you can use to perform multiple write operations in bulk. The following
sequence of operations describes how you would use the Bulk() API to insert a group of documents into a MongoDB
collection.
Step 1: Initialize a Bulk operations builder.
Initialize a Bulk operations builder for the collection inventory.
var bulk = db.inventory.initializeUnorderedBulkOp();
The operation returns an unordered operations builder which maintains a list of operations to perform. Unordered
operations mean that MongoDB can execute the operations in parallel and in a nondeterministic order. If an error
occurs during the processing of one of the write operations, MongoDB will continue to process the remaining write
operations in the list.
You can also initialize an ordered operations builder; see db.collection.initializeOrderedBulkOp()
for details.
Add two insert operations to the bulk object using the Bulk.insert() method.
bulk.insert(
{
item: "BE10",
details: { model: "14Q2", manufacturer: "XYZ Company" },
stock: [ { size: "L", qty: 5 } ],
category: "clothing"
}
);
bulk.insert(
{
item: "ZYT1",
details: { model: "14Q1", manufacturer: "ABC Company" },
stock: [ { size: "S", qty: 5 }, { size: "M", qty: 5 } ],
category: "houseware"
}
);
Call the execute() method on the bulk object to execute the operations in its list.
bulk.execute();
The method returns a BulkWriteResult object with the status of the operation. A successful insert of the documents returns the following object:
BulkWriteResult({
"writeErrors" : [ ],
"writeConcernErrors" : [ ],
"nInserted" : 2,
"nUpserted" : 0,
"nMatched" : 0,
"nModified" : 0,
"nRemoved" : 0,
"upserted" : [ ]
})
The nInserted field specifies the number of documents inserted. If the operation encounters an error, the
BulkWriteResult object will contain information regarding the error.
Additional Examples and Methods
For more examples, see db.collection.insert().
The db.collection.update() method, the db.collection.findAndModify(), and the
db.collection.save() method can also add new documents. See the individual reference pages for the
methods for more information and examples.
This tutorial provides examples of read operations using the db.collection.find() method in the mongo
shell. In these examples, the retrieved documents contain all their fields. To restrict the fields to return in the retrieved
documents, see Limit Fields to Return from a Query (page 106).
Select All Documents in a Collection
An empty query document ({}) selects all documents in the collection:
db.inventory.find( {} )
Not specifying a query document to the find() is equivalent to specifying an empty query document. Therefore the
following operation is equivalent to the previous operation:
db.inventory.find()
The following example retrieves from the inventory collection all documents where the type field has the value
snacks:
db.inventory.find( { type: "snacks" } )
Although you can express a query that checks one field against several values using the $or operator, use the $in
operator rather than the $or operator when performing equality checks on the same field.
Refer to the http://docs.mongodb.org/manual/reference/operator/query document for the complete list of query operators.
Specify AND Conditions
A compound query can specify conditions for more than one field in the collections documents. Implicitly, a logical
AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection
that match all the conditions.
In the following example, the query document specifies an equality match on the field type and a less than ($lt)
comparison match on the field price:
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
This query selects all documents where the type field has the value food and the value of the price field is less
than 9.95. See comparison operators for other comparison operators.
10 The db.collection.findOne() method also performs a read operation to return a single document. Internally, the
db.collection.findOne() method is the db.collection.find() method with a limit of 1.
Specify OR Conditions
Using the $or operator, you can specify a compound query that joins each clause with a logical OR conjunction so
that the query selects the documents in the collection that match at least one condition.
In the following example, the query document selects all documents in the collection where the field qty has a value
greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
db.inventory.find(
{
$or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ]
}
)
Embedded Documents
When the field holds an embedded document, a query can either specify an exact match on the embedded document
or specify a match by individual fields in the embedded document using the dot notation.
Exact Match on the Embedded Document
To specify an equality match on the whole embedded document, use the query document { <field>: <value>
} where <value> is the document to match. Equality matches on an embedded document require an exact match of
the specified <value>, including the field order.
In the following example, the query matches all documents where the value of the field producer is an embedded
document that contains only the field company with the value ABC123 and the field address with the value
123 Street, in the exact order:
db.inventory.find(
    {
      producer:
        {
          company: 'ABC123',
          address: '123 Street'
        }
    }
)
Use the dot notation to match by specific fields in an embedded document. Equality matches for specific fields in
an embedded document will select documents in the collection where the embedded document contains the specified
fields with the specified values. The embedded document can contain additional fields.
In the following example, the query uses the dot notation to match all documents where the value of the field
producer is an embedded document that contains a field company with the value ABC123 and may contain
other fields:
db.inventory.find( { 'producer.company': 'ABC123' } )
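The difference between the two forms can be modeled in plain JavaScript. exactDocMatch and fieldMatch are hypothetical helpers that handle only flat embedded documents with primitive values, and the sample producer documents (including the zip field) are invented for illustration.

```javascript
// Order-sensitive equality for flat embedded documents:
// same fields, same values, same order.
function exactDocMatch(a, b) {
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  return ka.length === kb.length &&
    ka.every((k, i) => k === kb[i] && a[k] === b[k]);
}

// Dot-notation style match: only the named field must match.
const fieldMatch = (doc, field, value) => doc[field] === value;

const producers = [
  { company: "ABC123", address: "123 Street" },
  { address: "123 Street", company: "ABC123" },           // same fields, different order
  { company: "ABC123", address: "123 Street", zip: "1" }, // extra field
];

const target = { company: "ABC123", address: "123 Street" };
console.log(producers.map(p => exactDocMatch(p, target)));           // [ true, false, false ]
console.log(producers.map(p => fieldMatch(p, "company", "ABC123"))); // [ true, true, true ]
```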
Arrays
When the field holds an array, you can query for an exact array match or for specific values in the array. If the array
holds embedded documents, you can query for specific fields in the embedded documents using dot notation.
If you specify multiple conditions using the $elemMatch operator, the array must contain at least one element that
satisfies all the conditions. See Single Element Satisfies the Criteria (page 98).
If you specify multiple conditions without using the $elemMatch operator, then some combination of the array
elements, not necessarily a single element, must satisfy all the conditions; i.e. different elements in the array can
satisfy different parts of the conditions. See Combination of Elements Satisfies the Criteria (page 98).
Consider an inventory collection that contains the following documents:
{ _id: 5, type: "food", item: "aaa", ratings: [ 5, 8, 9 ] }
{ _id: 6, type: "food", item: "bbb", ratings: [ 5, 9 ] }
{ _id: 7, type: "food", item: "ccc", ratings: [ 9, 5, 8 ] }
To specify equality match on an array, use the query document { <field>: <value> } where <value> is
the array to match. Equality matches on the array require that the array field match exactly the specified <value>,
including the element order.
The following example queries for all documents where the field ratings is an array that holds exactly three elements, 5, 8, and 9, in this order:
db.inventory.find( { ratings: [ 5, 8, 9 ] } )
Equality matches can specify a single element in the array to match. These specifications match if the array contains
at least one element with the specified value.
3.3. MongoDB CRUD Tutorials
The following example queries for all documents where ratings is an array that contains 5 as one of its elements:
db.inventory.find( { ratings: 5 } )
Equality matches can also target an element at a particular index or position of the array using the
dot notation.
In the following example, the query uses the dot notation to match all documents where the ratings array contains
5 as the first element:
db.inventory.find( { 'ratings.0': 5 } )
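The three equality forms above can be modeled in plain JavaScript against the sample documents. The helper functions are hypothetical and handle only arrays of numbers; the server's matcher also deals with nested arrays, mixed types, and other cases this sketch ignores.

```javascript
// Sample documents from the inventory collection above.
const docs = [
  { _id: 5, type: "food", item: "aaa", ratings: [5, 8, 9] },
  { _id: 6, type: "food", item: "bbb", ratings: [5, 9] },
  { _id: 7, type: "food", item: "ccc", ratings: [9, 5, 8] },
];

// { ratings: [ 5, 8, 9 ] }: same elements in the same order.
const exactArray = (arr, value) =>
  arr.length === value.length && arr.every((x, i) => x === value[i]);

// { ratings: 5 }: at least one element equals the value.
const containsElement = (arr, value) => arr.includes(value);

// { 'ratings.0': 5 }: the element at the given index equals the value.
const elementAt = (arr, index, value) => arr[index] === value;

const ids = pred => docs.filter(d => pred(d.ratings)).map(d => d._id);
console.log(ids(r => exactArray(r, [5, 8, 9]))); // [ 5 ]
console.log(ids(r => containsElement(r, 5)));    // [ 5, 6, 7 ]
console.log(ids(r => elementAt(r, 0, 5)));       // [ 5, 6 ]
```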
Single Element Satisfies the Criteria Use $elemMatch operator to specify multiple criteria on the elements of
an array such that at least one array element satisfies all the specified criteria.
The following example queries for documents where the ratings array contains at least one element that is greater
than ($gt) 5 and less than ($lt) 9:
db.inventory.find( { ratings: { $elemMatch: { $gt: 5, $lt: 9 } } } )
The operation returns the following documents, whose ratings array contains the element 8 which meets the criteria:
{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 7, "type" : "food", "item" : "ccc", "ratings" : [ 9, 5, 8 ] }
Combination of Elements Satisfies the Criteria The following example queries for documents where the
ratings array contains elements that in some combination satisfy the query conditions; e.g., one element can satisfy
the greater than 5 condition and another element can satisfy the less than 9 condition, or a single element can satisfy
both:
db.inventory.find( { ratings: { $gt: 5, $lt: 9 } } )
The document with the "ratings" : [ 5, 9 ] matches the query since the element 9 is greater than 5 (the
first condition) and the element 5 is less than 9 (the second condition).
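The contrast between the two queries comes down to whether one element must satisfy both bounds. Here is a plain JavaScript model of each predicate applied to the sample documents; the helpers are hypothetical and not driver APIs.

```javascript
// Sample documents from the inventory collection above.
const docs = [
  { _id: 5, type: "food", item: "aaa", ratings: [5, 8, 9] },
  { _id: 6, type: "food", item: "bbb", ratings: [5, 9] },
  { _id: 7, type: "food", item: "ccc", ratings: [9, 5, 8] },
];

// { ratings: { $elemMatch: { $gt: 5, $lt: 9 } } }:
// a single element must satisfy every condition.
const elemMatch = arr => arr.some(x => x > 5 && x < 9);

// { ratings: { $gt: 5, $lt: 9 } }:
// each condition only needs some element, possibly a different one.
const combination = arr => arr.some(x => x > 5) && arr.some(x => x < 9);

const ids = pred => docs.filter(d => pred(d.ratings)).map(d => d._id);
console.log(ids(elemMatch));   // [ 5, 7 ]
console.log(ids(combination)); // [ 5, 6, 7 ]
```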
Match a Field in the Embedded Document Using the Array Index If you know the array index of the embedded
document, you can specify the document using the embedded document's position using the dot notation.
The following example selects all documents where the memos field contains an array whose first element (i.e. index 0)
is a document that contains the field by whose value is shipping:
db.inventory.find( { 'memos.0.by': 'shipping' } )
Match a Field Without Specifying Array Index If you do not know the index position of the document in the array,
concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the embedded
document.
The following example selects all documents where the memos field contains an array that contains at least one
embedded document that contains the field by with the value shipping:
db.inventory.find( { 'memos.by': 'shipping' } )
type: "food",
item: "xyz",
qty: 25,
price: 2.5,
ratings: [ 5, 8, 9 ],
memos: [ { memo: "on time", by: "shipping" }, { memo: "approved", by: "billing" } ]
}
{
_id: 101,
type: "fruit",
item: "jkl",
qty: 10,
price: 4.25,
ratings: [ 5, 9 ],
memos: [ { memo: "on time", by: "payment" }, { memo: "delayed", by: "shipping" } ]
}
Single Element Satisfies the Criteria Use $elemMatch operator to specify multiple criteria on an array of embedded documents such that at least one embedded document satisfies all the specified criteria.
The following example queries for documents where the memos array has at least one embedded document that
contains both the field memo equal to on time and the field by equal to shipping:
db.inventory.find(
{
memos:
{
$elemMatch:
{
memo: 'on time',
by: 'shipping'
}
}
}
)
Combination of Elements Satisfies the Criteria The following example queries for documents where the memos
array contains elements that in some combination satisfy the query conditions; e.g. one element satisfies the field
memo equal to on time condition and another element satisfies the field by equal to shipping condition, or
a single element can satisfy both criteria:
db.inventory.find(
{
'memos.memo': 'on time',
'memos.by': 'shipping'
}
)
For the document with item equal to "MNO2", use the $set operator to update the category field and the
details field to the specified values and the $currentDate operator to update the field lastModified with
the current date.
db.inventory.update(
{ item: "MNO2" },
{
$set: {
category: "apparel",
details: { model: "14Q3", manufacturer: "XYZ Company" }
},
$currentDate: { lastModified: true }
}
)
The update operation returns a WriteResult object which contains the status of the operation. A successful update
of the document returns the following object:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
The nMatched field specifies the number of existing documents matched for the update, and nModified specifies
the number of existing documents modified.
Step 2: Update an embedded field.
To update a field within an embedded document, use the dot notation. When using the dot notation, enclose the whole
dotted field name in quotes.
The following updates the model field within the embedded details document.
db.inventory.update(
{ item: "ABC1" },
{ $set: { "details.model": "14Q2" } }
)
The update operation returns a WriteResult object which contains the status of the operation. A successful update
of the document returns the following object:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
By default, the update() method updates a single document. To update multiple documents, use the multi option
in the update() method.
Update the category field to "apparel" and update the lastModified field to the current date for all documents that have category field equal to "clothing".
db.inventory.update(
   { category: "clothing" },
   {
     $set: { category: "apparel" },
     $currentDate: { lastModified: true }
   },
   { multi: true }
)
The update operation returns a WriteResult object which contains the status of the operation. A successful update
of the document returns the following object:
WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 3 })
The following operation replaces the document with item equal to "BE10". The newly replaced document will only
contain the _id field and the fields in the replacement document.
db.inventory.update(
{ item: "BE10" },
{
item: "BE05",
stock: [ { size: "S", qty: 20 }, { size: "M", qty: 5 } ],
category: "apparel"
}
)
The update operation returns a WriteResult object which contains the status of the operation. A successful update
of the document returns the following object:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
upsert Option
By default, if no document matches the update query, the update() method does nothing.
However, by specifying upsert: true, the update() method either updates matching document or documents, or
inserts a new document using the update specification if no matching document exists.
Step 1: Specify upsert: true for the update replacement operation.
When you specify upsert: true for an update operation to replace a document and no matching documents
are found, MongoDB creates a new document using the equality conditions in the update conditions document, and
replaces this document, except for the _id field if specified, with the update document.
The following operation either updates a matching document by replacing it with a new document or adds a new
document if no matching document exists.
db.inventory.update(
{ item: "TBD1" },
{
item: "TBD1",
details: { "model" : "14Q4", "manufacturer" : "ABC Company" },
stock: [ { "size" : "S", "qty" : 25 } ],
category: "houseware"
},
{ upsert: true }
)
The update operation returns a WriteResult object which contains the status of the operation, including whether
the db.collection.update() method modified an existing document or added a new document.
WriteResult({
"nMatched" : 0,
"nUpserted" : 1,
"nModified" : 0,
"_id" : ObjectId("53dbd684babeaec6342ed6c7")
})
When you specify upsert: true for an update operation that modifies specific fields and no matching documents
are found, MongoDB creates a new document using the equality conditions in the update conditions document, and
applies the modification as specified in the update document.
The following update operation either updates specific fields of a matching document or adds a new document if no
matching document exists.
db.inventory.update(
{ item: "TBD2" },
{
$set: {
details: { "model" : "14Q3", "manufacturer" : "IJK Co." },
category: "houseware"
}
},
{ upsert: true }
)
The update operation returns a WriteResult object which contains the status of the operation, including whether
the db.collection.update() method modified an existing document or added a new document.
WriteResult({
"nMatched" : 0,
"nUpserted" : 1,
"nModified" : 0,
"_id" : ObjectId("53dbd7c8babeaec6342ed6c8")
})
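The construction of the new document in the example above can be approximated in plain JavaScript. buildUpsertDoc is hypothetical: it handles only top-level equality conditions and a $set document, and the string "generated-id" stands in for the ObjectId the server would generate.

```javascript
// Hypothetical model of building the inserted document when an upsert finds no match.
function buildUpsertDoc(query, update, newId) {
  const doc = { _id: newId };
  for (const [field, cond] of Object.entries(query)) {
    const isOperatorExpr = cond !== null && typeof cond === "object" &&
      Object.keys(cond).some(k => k.startsWith("$"));
    if (!isOperatorExpr) doc[field] = cond; // copy equality conditions only
  }
  if (update.$set) Object.assign(doc, update.$set); // then apply the modification
  return doc;
}

const newDoc = buildUpsertDoc(
  { item: "TBD2" },
  { $set: { details: { model: "14Q3", manufacturer: "IJK Co." }, category: "houseware" } },
  "generated-id" // placeholder for a server-generated ObjectId
);
console.log(newDoc);
```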
To remove all documents from a collection, it may be more efficient to use the drop() method to drop the entire
collection, including the indexes, and then recreate the collection and rebuild the indexes.
Remove Documents that Match a Condition
To remove the documents that match a deletion criteria, call the remove() method with the <query> parameter.
The following example removes all documents from the inventory collection where the type field equals food:
db.inventory.remove( { type : "food" } )
For large deletion operations, it may be more efficient to copy the documents that you want to keep to a new collection
and then use drop() on the original collection.
Remove a Single Document that Matches a Condition
To remove a single document, call the remove() method with the justOne parameter set to true or 1.
The following example removes one document from the inventory collection where the type field equals food:
db.inventory.remove( { type : "food" }, 1 )
To delete a single document sorted by some specified order, use the findAndModify() method.
Value          Description
<1 or true>    Specify the inclusion of a field.
<0 or false>   Specify the suppression of the field.
Important: The _id field is, by default, included in the result set. To suppress the _id field from the result set,
specify _id: 0 in the projection document.
You cannot combine inclusion and exclusion semantics in a single projection with the exception of the _id field.
This tutorial offers various query examples that limit the fields to return for all matching documents. The examples in
this tutorial use a collection inventory and use the db.collection.find() method in the mongo shell. The
db.collection.find() method returns a cursor (page 62) to the retrieved documents. For examples on query
selection criteria, see Query Documents (page 95).
Return All Fields in Matching Documents
If you specify no projection, the find() method returns all fields of all documents that match the query.
db.inventory.find( { type: 'food' } )
This operation will return all documents in the inventory collection where the value of the type field is food.
The returned documents contain all of their fields.
Return the Specified Fields and the _id Field Only
A projection can explicitly include several fields. In the following operation, the find() method returns all documents
that match the query. In the result set, only the item and qty fields and, by default, the _id field are returned in the
matching documents.
db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
This operation returns all documents that match the query. Each returned document contains only the item, qty, and
_id fields.
Return All But the Excluded Field
To exclude a single field or group of fields, you can use a projection in the following form:
db.inventory.find( { type: 'food' }, { type:0 } )
This operation returns all documents where the value of the type field is food. In the result set, the matching
documents do not include the type field.
With the exception of the _id field you cannot combine inclusion and exclusion statements in projection documents.
Projection for Array Fields
For fields that contain arrays, MongoDB provides the following projection operators: $elemMatch, $slice, and
$.
For example, the inventory collection contains the following document:
{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
Then the following operation uses the $slice projection operator to return just the first two elements in the ratings
array.
db.inventory.find( { _id: 5 }, { ratings: { $slice: 2 } } )
$elemMatch, $slice, and $ are the only ways to project portions of an array. For instance, you cannot project a
portion of an array using the array index; e.g. the projection { "ratings.0": 1 } will not return an array containing
only the first element.
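The $slice projection forms can be modeled on a plain JavaScript array. This is a hypothetical sketch: sliceProjection is not a MongoDB function. It assumes the standard forms { $slice: <count> } and { $slice: [ <skip>, <limit> ] }, where a negative count or skip counts from the end of the array.

```javascript
// Hypothetical model of the $slice projection on a plain array (not a MongoDB API).
// count >= 0: first `count` elements; count < 0: last |count| elements.
// [skip, limit]: skip elements (a negative skip counts from the end), then take `limit`.
function sliceProjection(arr, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    const start = skip < 0 ? Math.max(arr.length + skip, 0) : Math.min(skip, arr.length);
    return arr.slice(start, start + limit);
  }
  return spec >= 0 ? arr.slice(0, spec) : arr.slice(Math.max(arr.length + spec, 0));
}

const ratings = [5, 8, 9];
console.log(sliceProjection(ratings, 2));      // [ 5, 8 ]
console.log(sliceProjection(ratings, -1));     // [ 9 ]
console.log(sliceProjection(ratings, [1, 1])); // [ 8 ]
```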
db.students.update(
{ _id: 1 },
{
$push: {
scores: {
$each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
$sort: { score: 1 },
$slice: -3
}
}
}
)
Note: When using the $sort modifier on the array element, access the field in the embedded document element
directly instead of using the dot notation on the array field.
After the operation, the document contains only the top 3 scores in the scores array:
{
"_id" : 1,
"scores" : [
{ "attempt" : 3, "score" : 7 },
{ "attempt" : 2, "score" : 8 },
{ "attempt" : 1, "score" : 10 }
]
}
See also:
$push operator,
$each modifier,
$sort modifier, and
$slice modifier.
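The effect of $each, $sort, and $slice can be replayed on a plain array. The original document is not shown in this excerpt, so this sketch assumes it started as { _id: 1, scores: [ { attempt: 1, score: 10 }, { attempt: 2, score: 8 } ] }. It also assumes that MongoDB appends the $each elements, then sorts, then slices, regardless of the order in which the modifiers are written. The sort field is assumed to be numeric. pushEachSortSlice is a hypothetical name.

```javascript
// Hypothetical model of { $push: { field: { $each, $sort, $slice } } } on a plain array.
// Order of application: append the $each items, sort, then slice.
function pushEachSortSlice(array, { each, sort, slice }) {
  let result = array.concat(each);
  if (sort) {
    // Single-field sort on a numeric field; direction is 1 (ascending) or -1 (descending).
    const [field, direction] = Object.entries(sort)[0];
    result.sort((a, b) => (a[field] - b[field]) * direction);
  }
  if (slice !== undefined) {
    result = slice >= 0 ? result.slice(0, slice) : result.slice(Math.max(result.length + slice, 0));
  }
  return result;
}

// Assumed starting array (not shown in this excerpt).
const scores = [ { attempt: 1, score: 10 }, { attempt: 2, score: 8 } ];
const updated = pushEachSortSlice(scores, {
  each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
  sort: { score: 1 },
  slice: -3
});
console.log(updated.map((s) => s.attempt)); // [ 3, 2, 1 ]
```

Sorting ascending and keeping the last three elements is what leaves the three highest scores in the array.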
13 You can use DBQuery.shellBatchSize to change the number of documents displayed per iteration from the default
value of 20. See Executing Queries (page 267) for more information.
You can also use the cursor method next() to access the documents, as in the following example:
var myCursor = db.inventory.find( { type: 'food' } );
while (myCursor.hasNext()) {
print(tojson(myCursor.next()));
}
As an alternative print operation, consider the printjson() helper method to replace print(tojson()):
var myCursor = db.inventory.find( { type: 'food' } );
while (myCursor.hasNext()) {
printjson(myCursor.next());
}
You can use the cursor method forEach() to iterate the cursor and access the documents, as in the following
example:
var myCursor = db.inventory.find( { type: 'food' } );
myCursor.forEach(printjson);
See JavaScript cursor methods and your driver documentation for more information on cursor methods.
Iterator Index
In the mongo shell, you can use the toArray() method to iterate the cursor and return the documents in an array,
as in the following:
var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];
The toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts
the cursor.
Additionally, some drivers provide access to the documents by using an index on the cursor (i.e.
cursor[index]). This is a shortcut for first calling the toArray() method and then using an index on the
resulting array.
Consider the following example:
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];
The following query retrieves documents where the quantity field has a value between 100 and 200, inclusive:
db.inventory.find( { quantity: { $gte: 100, $lte: 200 } } )
To support the query on the quantity field, add an index on the quantity field:
db.inventory.createIndex( { quantity: 1 } )
To support the query, add a compound index (page 472). With compound indexes (page 472), the order of the fields
matters.
For example, add the following two compound indexes. The first index orders by quantity field first, and then the
type field. The second index orders by type first, and then the quantity field.
db.inventory.createIndex( { quantity: 1, type: 1 } )
db.inventory.createIndex( { type: 1, quantity: 1 } )
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2,
"executionTimeMillis" : 0,
"totalKeysExamined" : 5,
"totalDocsExamined" : 2,
"executionStages" : {
...
}
},
...
}
See also:
Query Optimization (page 63), Query Plans (page 66), Optimize Query Performance (page 214), Indexing Strategies
(page 532)
Pattern Overview
Consider a scenario where you want to transfer funds from account A to account B. In a relational database system,
you can subtract the funds from A and add the funds to B in a single multi-statement transaction. In MongoDB, you
can emulate a two-phase commit to achieve a comparable result.
The examples in this tutorial use the following two collections:
1. A collection named accounts to store account information.
2. A collection named transactions to store information on the fund transfer transactions.
Insert into the accounts collection a document for account A and a document for account B.
db.accounts.insert(
[
{ _id: "A", balance: 1000, pendingTransactions: [] },
{ _id: "B", balance: 1000, pendingTransactions: [] }
]
)
The operation returns a BulkWriteResult() object with the status of the operation. Upon successful insert, the
BulkWriteResult() object has nInserted set to 2.
Initialize Transfer Record
For each fund transfer to perform, insert into the transactions collection a document with the transfer information.
The document contains the following fields:
source and destination fields, which refer to the _id fields from the accounts collection,
value field, which specifies the amount of transfer affecting the balance of the source and
destination accounts,
state field, which reflects the current state of the transfer. The state field can have the value of initial,
pending, applied, done, canceling, and canceled.
lastModified field, which reflects the last modification date.
To initialize the transfer of 100 from account A to account B, insert into the transactions collection a document
with the transfer information, the transaction state of "initial", and the lastModified field set to the current
date:
db.transactions.insert(
{ _id: 1, source: "A", destination: "B", value: 100, state: "initial", lastModified: new Date() }
)
The operation returns a WriteResult() object with the status of the operation. Upon successful insert, the
WriteResult() object has nInserted set to 1.
Transfer Funds Between Accounts Using Two-Phase Commit
Step 1: Retrieve the transaction to start. From the transactions collection, find a transaction in the initial
state. Currently the transactions collection has only one document, namely the one added in the Initialize
Transfer Record (page 115) step. If the collection contains additional documents, the query will return any transaction
with an initial state unless you specify additional query conditions.
var t = db.transactions.findOne( { state: "initial" } )
Type the variable t in the mongo shell to print the contents of the variable. The operation should print a document
similar to the following, except that the lastModified field should reflect the date of your insert operation:
{ "_id" : 1, "source" : "A", "destination" : "B", "value" : 100, "state" : "initial", "lastModified"
Step 2: Update transaction state to pending. Set the transaction state from initial to pending and use the
$currentDate operator to set the lastModified field to the current date.
db.transactions.update(
{ _id: t._id, state: "initial" },
{
$set: { state: "pending" },
$currentDate: { lastModified: true }
}
)
The operation returns a WriteResult() object with the status of the operation. Upon successful update, the
nMatched and nModified fields are both 1.
In the update statement, the state: "initial" condition ensures that no other process has already updated this
record. If nMatched and nModified are 0, go back to the first step to get a different transaction and restart the
procedure.
Step 3: Apply the transaction to both accounts. Apply the transaction t to both accounts using the update()
method if the transaction has not been applied to the accounts. In the update condition, include the condition
pendingTransactions: { $ne: t._id } in order to avoid re-applying the transaction if the step is run
more than once.
To apply the transaction to the account, update both the balance field and the pendingTransactions field.
Update the source account, subtracting from its balance the transaction value and adding to its
pendingTransactions array the transaction _id.
db.accounts.update(
{ _id: t.source, pendingTransactions: { $ne: t._id } },
{ $inc: { balance: -t.value }, $push: { pendingTransactions: t._id } }
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Update the destination account, adding to its balance the transaction value and adding to its
pendingTransactions array the transaction _id .
db.accounts.update(
{ _id: t.destination, pendingTransactions: { $ne: t._id } },
{ $inc: { balance: t.value }, $push: { pendingTransactions: t._id } }
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Step 4: Update transaction state to applied. Use the following update() operation to set the transaction's
state to applied and update the lastModified field:
db.transactions.update(
{ _id: t._id, state: "pending" },
{
$set: { state: "applied" },
$currentDate: { lastModified: true }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Step 5: Update both accounts' list of pending transactions. Remove the applied transaction _id from the
pendingTransactions array for both accounts.
Update the source account.
db.accounts.update(
{ _id: t.source, pendingTransactions: t._id },
{ $pull: { pendingTransactions: t._id } }
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Update the destination account.
db.accounts.update(
{ _id: t.destination, pendingTransactions: t._id },
{ $pull: { pendingTransactions: t._id } }
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Step 6: Update transaction state to done. Complete the transaction by setting the state of the transaction to
done and updating the lastModified field:
db.transactions.update(
{ _id: t._id, state: "applied" },
{
$set: { state: "done" },
$currentDate: { lastModified: true }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
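The six steps can be traced end to end with in-memory objects standing in for the two collections. In this sketch, conditionalUpdate, setState, applyToAccounts, and clearPending are hypothetical helpers. Each mutation runs only when its guard condition matches, which plays the role of the query part of the update() calls above. The sketch is single-threaded and does not model real concurrency. Running step 3 twice shows why the pendingTransactions: { $ne: t._id } condition makes that step safe to repeat.

```javascript
// In-memory stand-ins for the accounts and transactions collections (hypothetical).
const accounts = {
  A: { _id: "A", balance: 1000, pendingTransactions: [] },
  B: { _id: "B", balance: 1000, pendingTransactions: [] }
};
const transactions = {
  1: { _id: 1, source: "A", destination: "B", value: 100, state: "initial", lastModified: new Date() }
};

// Hypothetical helper: mutate the document only if `matches` holds, mirroring
// how each update() above is guarded by its query condition.
function conditionalUpdate(doc, matches, mutate) {
  if (!doc || !matches(doc)) return { nMatched: 0, nModified: 0 };
  mutate(doc);
  return { nMatched: 1, nModified: 1 };
}

// Steps 2, 4, and 6: move the transaction from one state to the next.
function setState(t, from, to) {
  return conditionalUpdate(transactions[t._id], (d) => d.state === from, (d) => {
    d.state = to;
    d.lastModified = new Date();
  });
}

// Step 3: the "not already pending" guard makes re-running this step harmless.
function applyToAccounts(t) {
  conditionalUpdate(accounts[t.source], (d) => !d.pendingTransactions.includes(t._id), (d) => {
    d.balance -= t.value;
    d.pendingTransactions.push(t._id);
  });
  conditionalUpdate(accounts[t.destination], (d) => !d.pendingTransactions.includes(t._id), (d) => {
    d.balance += t.value;
    d.pendingTransactions.push(t._id);
  });
}

// Step 5: the equivalent of $pull, guarded by "transaction is pending on this account".
function clearPending(t) {
  for (const id of [t.source, t.destination]) {
    conditionalUpdate(accounts[id], (d) => d.pendingTransactions.includes(t._id), (d) => {
      d.pendingTransactions = d.pendingTransactions.filter((x) => x !== t._id);
    });
  }
}

const t = Object.values(transactions).find((d) => d.state === "initial"); // Step 1
if (setState(t, "initial", "pending").nMatched === 1) {                   // Step 2
  applyToAccounts(t);                                                     // Step 3
  applyToAccounts(t);                                                     // a repeat is a no-op
  setState(t, "pending", "applied");                                      // Step 4
  clearPending(t);                                                        // Step 5
  setState(t, "applied", "done");                                         // Step 6
}
console.log(accounts.A.balance, accounts.B.balance, transactions[1].state); // 900 1100 done
```

Every step either checks a state value or checks pendingTransactions before it changes anything. Repeating a step after an interruption therefore does not apply the transfer twice. The recovery procedures below rely on this property.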
Recovering from Failure Scenarios
The most important part of the transaction procedure is not the prototypical example above, but rather the possibility
for recovering from the various failure scenarios when transactions do not complete successfully. This section presents
an overview of possible failures and provides steps to recover from these kinds of events.
Recovery Operations
The two-phase commit pattern allows applications running the sequence to resume the transaction and arrive at a
consistent state. Run the recovery operations at application startup, and possibly at regular intervals, to catch any
unfinished transactions.
The time required to reach a consistent state depends on how long the application needs to recover each transaction.
The following recovery procedures use the lastModified date as an indicator of whether the pending transaction
requires recovery; specifically, if the pending or applied transaction has not been updated in the last 30 minutes,
the procedures determine that these transactions require recovery. You can use different conditions to make this
determination.
Transactions in Pending State To recover from failures that occur after the "Update transaction state to pending"
step but before the "Update transaction state to applied" step, retrieve from the transactions collection a pending
transaction for recovery:
var dateThreshold = new Date();
dateThreshold.setMinutes(dateThreshold.getMinutes() - 30);
var t = db.transactions.findOne( { state: "pending", lastModified: { $lt: dateThreshold } } );
And resume from the "Apply the transaction to both accounts" step.
Transactions in Applied State To recover from failures that occur after the "Update transaction state to applied"
step but before the "Update transaction state to done" step, retrieve from the transactions collection an applied
transaction for recovery:
var dateThreshold = new Date();
dateThreshold.setMinutes(dateThreshold.getMinutes() - 30);
var t = db.transactions.findOne( { state: "applied", lastModified: { $lt: dateThreshold } } );
And resume from the "Update both accounts' list of pending transactions" step.
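The 30-minute staleness test used by both recovery queries can be expressed as a small predicate. This is a hypothetical sketch: recoveryThreshold and needsRecovery are illustrative names. It takes the current time as a parameter so the check is deterministic. It mirrors the setMinutes() arithmetic in the queries above.

```javascript
// Hypothetical helpers mirroring the recovery criterion above: a transaction in
// the given state whose lastModified is older than `minutes` needs recovery.
function recoveryThreshold(now, minutes) {
  const threshold = new Date(now.getTime());
  threshold.setMinutes(threshold.getMinutes() - minutes); // same arithmetic as the shell query
  return threshold;
}

function needsRecovery(tx, state, now, minutes = 30) {
  // Equivalent to { state: <state>, lastModified: { $lt: dateThreshold } }
  return tx.state === state && tx.lastModified < recoveryThreshold(now, minutes);
}

const now = new Date("2015-03-01T12:00:00Z");
const stale = { state: "applied", lastModified: new Date("2015-03-01T11:15:00Z") };
const fresh = { state: "applied", lastModified: new Date("2015-03-01T11:45:00Z") };
console.log(needsRecovery(stale, "applied", now)); // true
console.log(needsRecovery(fresh, "applied", now)); // false
```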
Rollback Operations
In some cases, you may need to roll back or undo a transaction; e.g., if the application needs to cancel the
transaction or if one of the accounts does not exist or stops existing during the transaction.
Transactions in Applied State After the "Update transaction state to applied" step, you should not roll back
the transaction. Instead, complete that transaction and create a new transaction (page 115) to reverse the
transaction by switching the values in the source and the destination fields.
Transactions in Pending State After the "Update transaction state to pending" step, but before the "Update
transaction state to applied" step, you can roll back the transaction using the following procedure:
Step 1: Update transaction state to canceling. Update the transaction state from pending to canceling.
db.transactions.update(
{ _id: t._id, state: "pending" },
{
$set: { state: "canceling" },
$currentDate: { lastModified: true }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Step 2: Undo the transaction on both accounts. To undo the transaction on both accounts, reverse the transaction
t if the transaction has been applied. In the update condition, include the condition pendingTransactions:
t._id in order to update the account only if the pending transaction has been applied.
Update the destination account, subtracting from its balance the transaction value and removing the transaction
_id from the pendingTransactions array.
db.accounts.update(
{ _id: t.destination, pendingTransactions: t._id },
{
$inc: { balance: -t.value },
$pull: { pendingTransactions: t._id }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to
1. If the pending transaction has not been previously applied to this account, no document will match the update
condition and nMatched and nModified will be 0.
Update the source account, adding to its balance the transaction value and removing the transaction _id from
the pendingTransactions array.
db.accounts.update(
{ _id: t.source, pendingTransactions: t._id },
{
$inc: { balance: t.value},
$pull: { pendingTransactions: t._id }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to
1. If the pending transaction has not been previously applied to this account, no document will match the update
condition and nMatched and nModified will be 0.
Step 3: Update transaction state to canceled. To finish the rollback, update the transaction state from
canceling to canceled.
db.transactions.update(
{ _id: t._id, state: "canceling" },
{
$set: { state: "canceled" },
$currentDate: { lastModified: true }
}
)
Upon successful update, the method returns a WriteResult() object with nMatched and nModified set to 1.
Multiple Applications
Transactions exist, in part, so that multiple applications can create and run operations concurrently without causing
data inconsistency or conflicts. In our procedure, to update or retrieve the transaction document, the update conditions
include a condition on the state field to prevent reapplication of the transaction by multiple applications.
For example, applications App1 and App2 both grab the same transaction, which is in the initial state. App1
applies the whole transaction before App2 starts. When App2 attempts to perform the "Update transaction state to
pending" step, the update condition, which includes the state: "initial" criterion, will not match any document,
and nMatched and nModified will be 0. This should signal App2 to go back to the first step and restart the
procedure with a different transaction.
When multiple applications are running, it is crucial that only one application can handle a given transaction at any
point in time. As such, in addition to including the expected state of the transaction in the update condition, you can
also create a marker in the transaction document itself to identify the application that is handling the transaction. Use
the findAndModify() method to modify the transaction and get it back in one step:
t = db.transactions.findAndModify(
{
query: { state: "initial", application: { $exists: false } },
update:
{
$set: { state: "pending", application: "App1" },
$currentDate: { lastModified: true }
},
new: true
}
)
Amend the transaction operations to ensure that only applications that match the identifier in the application field
apply the transaction.
If the application App1 fails during transaction execution, you can use the recovery procedures (page 117), but applications should ensure that they own the transaction before applying it. For example, to find and resume
the pending job, use a query that resembles the following:
var dateThreshold = new Date();
dateThreshold.setMinutes(dateThreshold.getMinutes() - 30);
db.transactions.find(
{
application: "App1",
state: "pending",
lastModified: { $lt: dateThreshold }
}
)
Changed in version 2.6: The db.collection.update() method now returns a WriteResult() object that
contains the status of the operation. Previous versions required an extra db.getLastErrorObj() method call.
var myDocument = db.products.findOne( { sku: "abc123" } );
if ( myDocument ) {
var oldQuantity = myDocument.quantity;
var oldReordered = myDocument.reordered;
var results = db.products.update(
{
_id: myDocument._id,
quantity: oldQuantity,
reordered: oldReordered
},
{
$inc: { quantity: 50 },
$set: { reordered: true }
}
)
if ( results.hasWriteError() ) {
print( "unexpected error updating document: " + tojson(results) );
}
else if ( results.nMatched === 0 ) {
print( "No matching document for " +
"{ _id: "+ myDocument._id.toString() +
", quantity: " + oldQuantity +
", reordered: " + oldReordered
+ " } "
);
}
}
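The update-if-current idea can be shown without a server. You read the fields, then apply the change only if those fields still hold the values you read. In this in-memory sketch, updateIfCurrent is a hypothetical helper. A direct assignment simulates the concurrent writer.

```javascript
// Hypothetical in-memory sketch of update-if-current: apply the change only if
// every field in `expected` still has the value that was read earlier.
function updateIfCurrent(doc, expected, apply) {
  for (const [field, value] of Object.entries(expected)) {
    if (doc[field] !== value) return { nMatched: 0 }; // another writer changed it
  }
  apply(doc);
  return { nMatched: 1 };
}

const product = { _id: 1, sku: "abc123", quantity: 10, reordered: false };
const snapshot = { quantity: product.quantity, reordered: product.reordered };

product.quantity = 5; // simulate a concurrent writer modifying the document first

const result = updateIfCurrent(product, snapshot, (d) => {
  d.quantity += 50;
  d.reordered = true;
});
console.log(result.nMatched, product.quantity); // 0 5
```

An nMatched of 0 corresponds to the "No matching document" branch in the shell example above. A typical application would re-read the document and try again.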
Note: If your query is on an indexed field, do not use tailable cursors, but instead, use a regular cursor. Keep track of
the last value of the indexed field returned by the query. To retrieve the newly added documents, query the collection
again using the last value of the indexed field in the query criteria, as in the following example:
db.<collection>.find( { indexedField: { $gt: <lastvalue> } } )
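The technique in the note above can be sketched in memory: remember the last value of an indexed field, then query for larger values. fetchNewer is a hypothetical helper. The sketch assumes the tracked field only increases as documents are added. Here that field is insertDate, stored as numbers for simplicity.

```javascript
// Hypothetical sketch: return documents whose `field` is greater than the last
// value seen, in ascending order (like find({ field: { $gt: lastValue } }).sort()).
function fetchNewer(collection, field, lastValue) {
  return collection
    .filter((doc) => lastValue === undefined || doc[field] > lastValue)
    .sort((a, b) => a[field] - b[field]);
}

const events = [
  { _id: 1, insertDate: 100 },
  { _id: 2, insertDate: 200 }
];
let lastValue; // nothing seen yet

// First poll: everything is new; remember the largest value returned.
let batch = fetchNewer(events, "insertDate", lastValue);
if (batch.length > 0) lastValue = batch[batch.length - 1].insertDate;

events.push({ _id: 3, insertDate: 300 }); // a new document arrives

// Next poll: only documents added after the last one seen.
batch = fetchNewer(events, "insertDate", lastValue);
console.log(batch.map((d) => d._id)); // [ 3 ]
```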
continue;
}
BSONObj o = c->next();
lastValue = o["insertDate"];
cout << o.toString() << endl;
}
query = QUERY( "insertDate" << GT << lastValue ).hint( BSON( "$natural" << 1 ) );
}
}
* Loop through the outer while (1) loop to re-query with the new query condition and repeat.
See also:
Detailed blog post on tailable cursors: http://shtylman.com/post/the-tail-of-mongodb
Counter Collection Implementation Use a separate counters collection to track the last number sequence used.
The _id field contains the sequence name and the seq field contains the last value of the sequence.
1. Insert into the counters collection the initial value for the userid:
db.counters.insert(
{
_id: "userid",
seq: 0
}
)
2. Create a getNextSequence function that accepts a name of the sequence. The function uses the
findAndModify() method to atomically increment the seq value and return this new value:
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
}
);
return ret.seq;
}
3. Use the getNextSequence() function during insert():
db.users.insert(
{
_id: getNextSequence("userid"),
name: "Bob D."
}
)
findAndModify Behavior When findAndModify() includes the upsert: true option and the query
field(s) are not uniquely indexed, the method could insert a document multiple times in certain circumstances. For
instance, if multiple clients each invoke the method with the same query condition and these methods complete the
find phase before any of the methods perform the modify phase, these methods could insert the same document.
In the counters collection example, the query field is the _id field, which always has a unique index. Consider
that the findAndModify() includes the upsert: true option, as in the following modified example:
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true,
upsert: true
}
);
return ret.seq;
}
If multiple clients were to invoke the getNextSequence() method with the same name parameter, then the
methods would observe one of the following behaviors:
Exactly one findAndModify() would successfully insert a new document.
Zero or more findAndModify() methods would update the newly inserted document.
Zero or more findAndModify() methods would fail when they attempted to insert a duplicate.
If the method fails due to a unique index constraint violation, retry the method. Absent a delete of the document, the
retry should not fail.
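The retry advice can be packaged as a small wrapper. In this sketch, retryOnDuplicateKey is a hypothetical helper. The operation is assumed to throw an error object whose code property is 11000 on a duplicate-key violation. fakeFindAndModifyUpsert only simulates a client that loses the insert race once; it is not the mongo shell's findAndModify(). The attempt limit is an added safeguard and is not prescribed by the text above.

```javascript
// Hypothetical retry wrapper: re-run `operation` when it fails with a
// duplicate-key error (code 11000), up to `maxAttempts` attempts in total.
function retryOnDuplicateKey(operation, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return operation();
    } catch (err) {
      // Rethrow anything that is not a duplicate key, or when out of attempts.
      if (err.code !== 11000 || attempt >= maxAttempts) throw err;
      // Otherwise a competing client inserted first; retrying updates that document.
    }
  }
}

// Simulated counter store: the first call loses an upsert race (hypothetical).
const counters = {};
let racedOnce = false;
function fakeFindAndModifyUpsert(name) {
  if (!racedOnce) {
    racedOnce = true;
    counters[name] = { _id: name, seq: 1 }; // another client inserted first
    const err = new Error("E11000 duplicate key error");
    err.code = 11000;
    throw err;
  }
  counters[name] = counters[name] || { _id: name, seq: 0 };
  counters[name].seq += 1; // the $inc on the now-existing document
  return counters[name];
}

console.log(retryOnDuplicateKey(() => fakeFindAndModifyUpsert("userid")).seq); // 2
```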
Optimistic Loop
In this pattern, an Optimistic Loop calculates the incremented _id value and attempts to insert a document with the
calculated _id value. If the insert is successful, the loop ends. Otherwise, the loop will iterate through possible _id
values until the insert is successful.
1. Create a function named insertDocument that performs the "insert if not present" loop. The function wraps
the insert() method and takes doc and targetCollection arguments.
Changed in version 2.6: The db.collection.insert() method now returns a WriteResult object
that contains the status of the operation. Previous versions required an extra db.getLastErrorObj()
method call.
function insertDocument(doc, targetCollection) {
while (1) {
var cursor = targetCollection.find( {}, { _id: 1 } ).sort( { _id: -1 } ).limit(1);
var seq = cursor.hasNext() ? cursor.next()._id + 1 : 1;
doc._id = seq;
var results = targetCollection.insert(doc);
if( results.hasWriteError() ) {
if( results.writeError.code == 11000 /* dup key */ )
continue;
else
print( "unexpected error inserting data: " + tojson( results ) );
}
break;
}
}
insertDocument(
{
name: "Ted R."
},
myCollection
)
The while loop may iterate many times in collections with larger insert volumes.
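The optimistic loop can be exercised without a server by keeping the collection in a Map keyed by _id. In this hypothetical sketch, insertWithNextId mirrors insertDocument() above. store.has() stands in for the duplicate-key error (code 11000). The onBeforeInsert hook simulates another client taking the computed _id first.

```javascript
// Hypothetical in-memory model of the optimistic loop: compute max(_id) + 1 and
// retry if a concurrent insert has already taken that value.
function insertWithNextId(store, doc, onBeforeInsert) {
  while (true) {
    const ids = [...store.keys()];
    const seq = ids.length > 0 ? Math.max(...ids) + 1 : 1;
    if (onBeforeInsert) onBeforeInsert(seq); // hook to simulate a racing client
    if (store.has(seq)) continue;            // "duplicate key": recompute and retry
    store.set(seq, { ...doc, _id: seq });
    return seq;
  }
}

const users = new Map();
insertWithNextId(users, { name: "Sample A" });
let raced = false;
const id = insertWithNextId(users, { name: "Ted R." }, (seq) => {
  // The first time through, another client grabs the same _id.
  if (!raced) { raced = true; users.set(seq, { _id: seq, name: "other client" }); }
});
console.log(id, users.size); // 3 3
```

Each collision costs one more read of the current maximum _id. This is why the loop may iterate many times when many clients insert at once.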
MongoDB allows clients to read documents inserted or modified before it commits these modifications to disk, regardless of write concern level or journaling configuration. As a result, applications may observe two classes of behaviors:
For systems with multiple concurrent readers and writers, MongoDB will allow clients to read the results of a
write operation before the write operation returns.
If the mongod terminates before the journal commits, even if a write returns successfully, queries may have
read data that will not exist after the mongod restarts.
Other database systems refer to these isolation semantics as read uncommitted. For all inserts and updates, MongoDB modifies each document in isolation: clients never see documents in intermediate states. For multi-document
operations, MongoDB does not provide any multi-document transactions or isolation.
When a standalone mongod returns a successful journaled write concern, the data is fully committed to disk and will
be available after mongod restarts.
For replica sets, write operations are durable only after a write replicates and commits to the journal on a majority of
the voting members of the set. MongoDB regularly commits data to the journal regardless of journaled write concern;
use the commitIntervalMs setting to control how often a mongod commits the journal.
Write concern can include the w (page 129) option to specify the required number of acknowledgments before returning, the j (page 129) option to require writes to the journal before returning, and wtimeout (page 129) option to specify
a time limit to prevent write operations from blocking indefinitely.
In sharded clusters, mongos instances will pass the write concern on to the shards.
w Option The w option provides the ability to disable write concern entirely as well as specify the write concern for
replica sets.
The w option accepts the following values:
1: Provides acknowledgment of write operations on a standalone mongod or the primary in a replica set.
This is the default write concern for MongoDB.
0: Disables basic acknowledgment of write operations, but returns information about socket exceptions and
networking errors to the application.
If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal
commit prevails, and the server will require that mongod acknowledge the write operation.
<Number greater than 1>: Guarantees that write operations have propagated successfully to the specified number
of replica set members, including the primary.
For example, w: 2 indicates acknowledgments from the primary and at least one secondary.
If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the
non-existent members to become available, which means MongoDB blocks indefinitely.
"majority": Confirms that write operations have propagated to the majority of voting nodes: a majority of the
replica set's voting members must acknowledge the write operation before it succeeds. This allows you to avoid
hard-coding assumptions about the size of your replica set into your application.
Changed in version 3.0: In previous versions, w: "majority" referred to the majority of the replica set's members.
Changed in version 2.6: In Master/Slave (page 575) deployments, MongoDB treats w: "majority" as equivalent to
w: 1. In earlier versions of MongoDB, w: "majority" produced an error in master/slave (page 575) deployments.
<tag set>: By specifying a tag set (page 614), you can have fine-grained control over which replica set members
must acknowledge a write operation to satisfy the required level of write concern.
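The arithmetic behind w: "majority", and behind the indefinite-blocking warning for numeric w values, is small enough to show directly. This sketch assumes that majority means a strict majority (more than half) of voting members. majorityCount and isSatisfiable are hypothetical helper names, not MongoDB APIs.

```javascript
// Hypothetical helper: acknowledgments required by w: "majority" for a replica
// set with `votingMembers` voting members (a strict majority).
function majorityCount(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Hypothetical helper: a numeric w larger than the number of data-bearing
// members can never be satisfied; without wtimeout such a write blocks indefinitely.
function isSatisfiable(w, dataBearingMembers) {
  return w <= dataBearingMembers;
}

console.log([3, 4, 5, 7].map((n) => majorityCount(n))); // [ 2, 3, 3, 4 ]
console.log(isSatisfiable(4, 3));                      // false
```

A four-member set needs three acknowledgments, the same as a five-member set. This is one reason an odd number of voting members is common.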
j Option The j option confirms that the mongod instance has written the data to the on-disk journal. This ensures
that data is not lost if the mongod instance shuts down unexpectedly. Set to true to enable.
Changed in version 2.6: Specifying a write concern that includes j: true to a mongod or mongos running with
--nojournal option now errors. Previous versions would ignore the j: true.
Note: Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the
primary of the set regardless of the level of replica acknowledged write concern.
wtimeout This option specifies a time limit, in milliseconds, for the write concern. wtimeout is only applicable
for w values greater than 1.
wtimeout causes write operations to return with an error after the specified limit, even if the required write concern
will eventually succeed. When these write operations return, MongoDB does not undo successful data modifications
performed before the write concern exceeded the wtimeout time limit.
If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will
block indefinitely. Specifying a wtimeout value of 0 is equivalent to a write concern without the wtimeout option.
See also:
Write Concern Introduction (page 76) and Write Concern for Replica Sets (page 79).
SQL to MongoDB Mapping Chart
In addition to the charts that follow, you might want to consider the Frequently Asked Questions (page 723) section for
a selection of common questions about MongoDB.
Terminology and Concepts
The following table presents the various SQL terminology and concepts and the corresponding MongoDB terminology
and concepts.
SQL Terms/Concepts | MongoDB Terms/Concepts
database | database
table | collection
row | document or BSON document
column | field
index | index
table joins | embedded documents and linking
primary key (specify any unique column or column combination as primary key) | primary key (in MongoDB, the primary key is automatically set to the _id field)
aggregation (e.g. group by) | aggregation pipeline (see the SQL to Aggregation Mapping Chart (page 458))
Executables
The following table presents some database executables and the corresponding MongoDB executables. This table is
not meant to be exhaustive.
Database | Database Server | Database Client
MongoDB | mongod | mongo
MySQL | mysqld | mysql
Oracle | oracle | sqlplus
Informix | IDS | DB-Access
DB2 | DB2 Server | DB2 Client
Examples
The following table presents the various SQL statements and the corresponding MongoDB statements. The examples
in the table assume the following conditions:
The SQL examples assume a table named users.
The MongoDB examples assume a collection named users that contains documents of the following prototype:
{
_id: ObjectId("509a8fb2f3f4948bd2f983a0"),
user_id: "abc123",
age: 55,
status: 'A'
}
Create and Alter The following table presents the various SQL statements related to table-level actions and the
corresponding MongoDB statements.
db.users.createIndex( { user_id: 1 } )
CREATE INDEX
idx_user_id_asc_age_desc
ON users(user_id, age DESC)
db.users.drop()
For more information, see db.collection.insert(), db.createCollection(),
db.collection.update(), $set, $unset, db.collection.createIndex(), indexes (page 468),
db.collection.drop(), and Data Modeling Concepts (page 143).
Insert The following table presents the various SQL statements related to inserting records into tables and the corresponding MongoDB statements.
db.users.insert(
{ user_id: "bcd001", age: 45, status: "A" }
)
SELECT *
FROM users
db.users.find()
SELECT id,
user_id,
status
FROM users
db.users.find(
{ },
{ user_id: 1, status: 1 }
)
db.users.find(
{ },
{ user_id: 1, status: 1, _id: 0 }
)
SELECT *
FROM users
WHERE status = "A"
db.users.find(
{ status: "A" }
)
db.users.find(
{ status: "A" },
{ user_id: 1, status: 1, _id: 0 }
)
SELECT *
FROM users
WHERE status != "A"
db.users.find(
{ status: { $ne: "A" } }
)
SELECT *
FROM users
WHERE status = "A"
AND age = 50
db.users.find(
{ status: "A",
age: 50 }
)
SELECT *
FROM users
WHERE status = "A"
OR age = 50
db.users.find(
{ $or: [ { status: "A" } ,
{ age: 50 } ] }
)
SELECT *
FROM users
WHERE age > 25
db.users.find(
{ age: { $gt: 25 } }
)
SELECT *
FROM users
WHERE age < 25
db.users.find(
{ age: { $lt: 25 } }
)
SELECT *
FROM users
WHERE age > 25
AND
age <= 50
db.users.find(
{ age: { $gt: 25, $lte: 50 } }
)
SELECT *
FROM users
WHERE user_id like "%bc%"
db.users.find(
{ user_id: /bc/ }
)
For more information, see db.collection.find(), db.collection.distinct(), db.collection.findOne(), $ne, $and, $or, $gt, $lt, $exists, $lte, $regex, limit(), skip(), explain(), sort(), and count().
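To make the operator mappings above concrete, here is a small sketch that evaluates the same kinds of filters against plain JavaScript objects, so it runs without a MongoDB server (for example under Node.js). The helpers `matches()` and `find()` are hypothetical and model only implicit AND, equality, regular expressions, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, and `$or`; they are not how MongoDB evaluates queries internally.

```javascript
// Hypothetical in-memory illustration of MongoDB-style query filters.
// Not MongoDB's query engine: only implicit AND, equality, regular
// expressions, $ne, $gt, $gte, $lt, $lte, and $or are modeled.
const comparators = {
  $ne: (value, operand) => value !== operand,
  $gt: (value, operand) => value > operand,
  $gte: (value, operand) => value >= operand,
  $lt: (value, operand) => value < operand,
  $lte: (value, operand) => value <= operand,
};

// An operator object is a non-empty object whose keys all start with "$".
function isOperatorObject(cond) {
  return cond !== null && typeof cond === "object" && !(cond instanceof RegExp) &&
    Object.keys(cond).length > 0 && Object.keys(cond).every((key) => key.startsWith("$"));
}

function matches(doc, filter) {
  // Every top-level clause must hold (implicit AND).
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      return cond.some((clause) => matches(doc, clause));
    }
    const value = doc[key];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (isOperatorObject(cond)) {
      // All operators on one field must hold, e.g. { $gt: 25, $lte: 50 }.
      return Object.entries(cond).every(([op, operand]) => {
        if (!Object.prototype.hasOwnProperty.call(comparators, op)) {
          throw new Error(`unsupported operator ${op}`);
        }
        return comparators[op](value, operand);
      });
    }
    return value === cond; // plain equality
  });
}

function find(collection, filter = {}) {
  return collection.filter((doc) => matches(doc, filter));
}

// Sample documents shaped like the users prototype above.
const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 20, status: "D" },
];
```

With this sample data, the range filter { age: { $gt: 25, $lte: 50 } } selects only bcd001, because both operators on age must hold for the same document.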
Update Records
The following table presents the various SQL statements related to updating existing records in tables and the corresponding MongoDB statements.
SQL Update Statements
UPDATE users
SET status = "C"
WHERE age > 25
db.users.update(
{ age: { $gt: 25 } },
{ $set: { status: "C" } },
{ multi: true }
)
UPDATE users
SET age = age + 3
WHERE status = "A"
db.users.update(
{ status: "A" } ,
{ $inc: { age: 3 } },
{ multi: true }
)
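The `multi` option in the update statements above controls how many documents change: by default, update() modifies only the first matching document. The following sketch models that rule for `$set` and `$inc` on an in-memory array; `update()` here is a hypothetical helper, not the mongo shell method, and a predicate function stands in for the query document.

```javascript
// Hypothetical sketch of update() semantics for $set and $inc over an
// in-memory array. The predicate stands in for the query document.
// Without { multi: true } only the first matching document changes.
function update(collection, predicate, changes, options = {}) {
  let nMatched = 0;
  for (const doc of collection) {
    if (!predicate(doc)) continue;
    nMatched += 1;
    for (const [field, value] of Object.entries(changes.$set || {})) {
      doc[field] = value; // $set replaces the field value
    }
    for (const [field, amount] of Object.entries(changes.$inc || {})) {
      // $inc creates a missing field with the increment as its value
      doc[field] = (doc[field] === undefined ? 0 : doc[field]) + amount;
    }
    if (!options.multi) break; // default: update a single document
  }
  return { nMatched };
}
```

Calling update(users, (d) => d.age > 25, { $set: { status: "C" } }) without the options argument changes at most one document, which is why the SQL-equivalent statements above pass { multi: true }.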
db.users.remove({})
"awards" : [
{
"award" : "Turing Award",
"year" : 1971,
"by" : "ACM"
},
{
"award" : "Kyoto Prize",
"year" : 1988,
"by" : "Inamori Foundation"
},
{
"award" : "National Medal of Science",
"year" : 1990,
"by" : "National Science Foundation"
}
]
}
{
"_id" : 3,
"name" : {
"first" : "Grace",
"last" : "Hopper"
},
"title" : "Rear Admiral",
"birth" : ISODate("1906-12-09T05:00:00Z"),
"death" : ISODate("1992-01-01T05:00:00Z"),
"contribs" : [
"UNIVAC",
"compiler",
"FLOW-MATIC",
"COBOL"
],
"awards" : [
{
"award" : "Computer Sciences Man of the Year",
"year" : 1969,
"by" : "Data Processing Management Association"
},
{
"award" : "Distinguished Fellow",
"year" : 1973,
"by" : " British Computer Society"
},
{
"award" : "W. W. McDowell Award",
"year" : 1976,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Technology",
"year" : 1991,
"by" : "United States"
}
]
}
{
"_id" : 4,
"name" : {
"first" : "Kristen",
"last" : "Nygaard"
},
"birth" : ISODate("1926-08-27T04:00:00Z"),
"death" : ISODate("2002-08-10T04:00:00Z"),
"contribs" : [
"OOP",
"Simula"
],
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
{
"award" : "IEEE John von Neumann Medal",
"year" : 2001,
"by" : "IEEE"
}
]
}
{
"_id" : 5,
"name" : {
"first" : "Ole-Johan",
"last" : "Dahl"
},
"birth" : ISODate("1931-10-12T04:00:00Z"),
"death" : ISODate("2002-06-29T04:00:00Z"),
"contribs" : [
"OOP",
"Simula"
],
"awards" : [
{
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
{
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
{
"award" : "IEEE John von Neumann Medal",
"year" : 2001,
"by" : "IEEE"
}
]
}
{
"_id" : 6,
"name" : {
"first" : "Guido",
"last" : "van Rossum"
},
"birth" : ISODate("1956-01-31T05:00:00Z"),
"contribs" : [
"Python"
],
"awards" : [
{
"award" : "Award for the Advancement of Free Software",
"year" : 2001,
"by" : "Free Software Foundation"
},
{
"award" : "NLUUG Award",
"year" : 2003,
"by" : "NLUUG"
}
]
}
{
"_id" : ObjectId("51e062189c6ae665454e301d"),
"name" : {
"first" : "Dennis",
"last" : "Ritchie"
},
"birth" : ISODate("1941-09-09T04:00:00Z"),
"death" : ISODate("2011-10-12T04:00:00Z"),
"contribs" : [
"UNIX",
"C"
],
"awards" : [
{
"award" : "Turing Award",
"year" : 1983,
"by" : "ACM"
},
{
"award" : "National Medal of Technology",
"year" : 1998,
"by" : "United States"
},
{
"award" : "Japan Prize",
"year" : 2011,
"by" : "The Japan Prize Foundation"
}
]
}
{
"_id" : 8,
"name" : {
"first" : "Yukihiro",
"aka" : "Matz",
"last" : "Matsumoto"
},
"birth" : ISODate("1965-04-14T04:00:00Z"),
"contribs" : [
"Ruby"
],
"awards" : [
{
"award" : "Award for the Advancement of Free Software",
"year" : "2011",
"by" : "Free Software Foundation"
}
]
}
{
"_id" : 9,
"name" : {
"first" : "James",
"last" : "Gosling"
},
"birth" : ISODate("1955-05-19T04:00:00Z"),
"contribs" : [
"Java"
],
"awards" : [
{
"award" : "The Economist Innovation Award",
"year" : 2002,
"by" : "The Economist"
},
{
"award" : "Officer of the Order of Canada",
"year" : 2007,
"by" : "Canada"
}
]
}
{
"_id" : 10,
"name" : {
"first" : "Martin",
"last" : "Odersky"
},
"contribs" : [
"Scala"
]
}
CHAPTER 4
Data Models
Data in MongoDB has a flexible schema. Collections do not enforce document structure. This flexibility gives you
data-modeling choices to match your application and its performance requirements.
Data Modeling Introduction (page 141) An introduction to data modeling in MongoDB.
Data Modeling Concepts (page 143) The core documentation detailing the decisions you must make when determining a data model, and discussing considerations that should be taken into account.
Data Model Examples and Patterns (page 149) Examples of possible data models that you can use to structure your
MongoDB documents.
Data Model Reference (page 166) Reference material for data modeling for developers of MongoDB applications.
Embedded Data
Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation.
See Embedded Data Models (page 144) for the strengths and weaknesses of embedding documents.
Embedded data models allow applications to store related pieces of information in the same database record. As a
result, applications may need to issue fewer queries and updates to complete common operations.
In general, use embedded data models when:
you have "contains" relationships between entities. See Model One-to-One Relationships with Embedded Documents (page 150).
you have one-to-many relationships between entities. In these relationships the many or child documents
always appear with or are viewed in the context of the one or parent documents. See Model One-to-Many
Relationships with Embedded Documents (page 151).
In general, embedding provides better performance for read operations, as well as the ability to request and retrieve
related data in a single database operation. Embedded data models make it possible to update related data in a single
atomic write operation.
However, embedding related data in documents may lead to situations where documents grow after creation. With the
MMAPv1 storage engine, document growth can impact write performance and lead to data fragmentation.
In version 3.0.0, MongoDB uses Power of 2 Sized Allocations (page 90) as the default allocation strategy for MMAPv1
in order to account for document growth, minimizing the likelihood of data fragmentation. See Power of 2 Sized
Allocations (page 90) for details. Furthermore, documents in MongoDB must be smaller than the maximum BSON
document size. For bulk binary data, consider GridFS (page 148).
Use dot notation to reach into embedded documents. See query for data in
arrays (page 97) and query data in embedded documents (page 96) for more examples on accessing data in arrays and
embedded documents.
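As a rough model of how dot notation walks into embedded documents and arrays, the following sketch resolves a path against a plain object, such as the bios documents shown earlier. `getPath()` and `matchesPath()` are hypothetical helpers; they ignore nested arrays and other edge cases that MongoDB handles.

```javascript
// Hypothetical helper: resolve a dot-notation path such as "name.first"
// or "awards.award" against a plain JavaScript object. When a segment
// reaches an array and the next segment is not a numeric position, the
// rest of the path is applied to every element, so one document can
// yield several values.
function getPath(value, path) {
  if (value === null || value === undefined) return [];
  const [head, ...rest] = path.split(".");
  if (Array.isArray(value) && !/^\d+$/.test(head)) {
    return value.flatMap((element) => getPath(element, path));
  }
  const next = value[head];
  if (rest.length === 0) {
    if (next === undefined) return [];
    // A terminal array contributes its elements, as in { contribs: "COBOL" }.
    return Array.isArray(next) ? next : [next];
  }
  return getPath(next, rest.join("."));
}

// True when any value reached by the path equals the expected value.
function matchesPath(doc, path, expected) {
  return getPath(doc, path).some((v) => v === expected);
}
```

For the Grace Hopper document, matchesPath(doc, "awards.award", "Distinguished Fellow") is true because the path fans out across the awards array, while "awards.1.year" addresses one element by position.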
To distribute data and application traffic in a sharded collection, MongoDB uses the shard key (page 654). Selecting
the proper shard key (page 654) has significant implications for performance, and can enable or prevent query isolation
and increased write capacity. It is important to consider carefully the field or fields to use as the shard key.
See Sharding Introduction (page 641) and Shard Keys (page 654) for more information.
Indexes
Use indexes to improve performance for common queries. Build indexes on fields that appear often in queries and for
all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
Each index requires at least 8KB of data space.
Adding an index has some negative performance impact for write operations. For collections with a high write-to-read ratio, indexes are expensive since each insert must also update any indexes.
Collections with a high read-to-write ratio often benefit from additional indexes. Indexes do not affect un-indexed
read operations.
When active, each index consumes disk space and memory. This usage can be significant and should be tracked
for capacity planning, especially for concerns over working set size.
See Indexing Strategies (page 532) for more information on indexes as well as Analyze Query Performance (page 109).
Additionally, the MongoDB database profiler (page 225) may help identify inefficient queries.
Large Number of Collections
In certain situations, you might choose to store related information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environments and applications. The logs
collection contains documents of the following form:
{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs_dev and logs_debug. The logs_dev collection would contain
only the documents related to the dev environment.
Generally, having a large number of collections has no significant performance penalty and results in very good
performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
Each collection has a certain minimum overhead of a few kilobytes.
Each index, including the index on _id, requires at least 8KB of data space.
For each database, a single namespace file (i.e. <database>.ns) stores all meta-data for that database, and
each index and collection has its own entry in the namespace file. MongoDB places limits on the size
of namespace files.
With the MMAPv1 storage engine, MongoDB limits the number of namespaces. You may
wish to know the current number of namespaces in order to determine how many additional namespaces the
database can support. To get the current number of namespaces, run the following in the mongo shell:
db.system.namespaces.count()
The limit on the number of namespaces depends on the <database>.ns size. The namespace file defaults to
16 MB.
To change the size of the new namespace file, start the server with the option --nssize <new size MB>.
For existing databases, after starting up the server with --nssize, run the db.repairDatabase() command from the mongo shell. For impacts and considerations on running db.repairDatabase(), see
repairDatabase.
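The namespace accounting above can be turned into a quick capacity estimate. The sketch assumes that each collection uses one namespace entry, its _id index another, and each secondary index one more. The budget is an input you supply: the MongoDB limits documentation describes the default 16 MB namespace file as supporting approximately 24,000 namespaces, but treat that as an approximation. System collections also consume entries, which is why the estimate starts from the count reported by db.system.namespaces.count(). The helper names are hypothetical.

```javascript
// Back-of-the-envelope namespace budgeting for MMAPv1 (a sketch, not an
// official formula). Assumption: each collection costs one entry, plus
// one for its _id index, plus one per secondary index.
function namespacesPerCollection(secondaryIndexes) {
  return 1 /* collection */ + 1 /* _id index */ + secondaryIndexes;
}

// How many more collections fit, given an assumed budget and the
// current namespace count (e.g. from db.system.namespaces.count()).
function additionalCollections(namespaceBudget, currentNamespaces, secondaryIndexes) {
  const remaining = Math.max(0, namespaceBudget - currentNamespaces);
  return Math.floor(remaining / namespacesPerCollection(secondaryIndexes));
}

// Example: an assumed budget of about 24,000 namespaces, 1,000 already
// in use, and two secondary indexes per new collection.
const estimate = additionalCollections(24000, 1000, 2);
```

Here each new collection costs four entries, so the remaining 23,000 entries allow roughly 5,750 more collections under these assumptions.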
Data Lifecycle Management
Data modeling decisions should take data lifecycle management into consideration.
The Time to Live or TTL feature (page 211) of collections expires documents after a period of time. Consider using
the TTL feature if your application requires some data to persist in the database for a limited period of time.
Additionally, if your application only uses recently inserted documents, consider Capped Collections (page 208).
Capped collections provide first-in-first-out (FIFO) management of inserted documents and efficiently support operations that insert and read documents based on insertion order.
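The first-in-first-out behavior of capped collections can be sketched with a small in-memory class. Real capped collections are bounded by a size in bytes and can optionally cap the number of documents; this hypothetical CappedCollection models only a document-count limit.

```javascript
// Hypothetical sketch of capped-collection behavior: documents stay in
// insertion order, and once maxDocs is reached the oldest document is
// discarded to make room for the newest (first in, first out).
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // evict the oldest document
    }
  }

  // Natural (insertion) order, oldest first.
  find() {
    return [...this.docs];
  }

  // Reverse insertion order, newest first.
  findNewestFirst() {
    return [...this.docs].reverse();
  }
}
```

Inserting five documents into a CappedCollection(3) leaves only the last three, and reading them back requires no sort because insertion order is the storage order.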
4.2.3 GridFS
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks [7], and stores each of those
chunks as a separate document. By default GridFS limits chunk size to 255k. GridFS uses two collections to store
files. One collection stores the file chunks, and the other stores file metadata.
When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform
range queries on files stored through GridFS. You also can access information from arbitrary sections of files, which
allows you to skip into the middle of a video or audio file.
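The chunking described above can be sketched as follows. The helpers are hypothetical, operate on a Uint8Array, and use the 255k default chunk size (255 * 1024 bytes). They also show why a reader can skip into the middle of a file: a byte range maps to a small, computable set of chunk numbers.

```javascript
// Hypothetical helpers modeling GridFS chunking on a Uint8Array.
// Chunk documents carry files_id, n (0-based sequence number) and data,
// as described for the chunks collection; a real chunk document also
// has its own _id, which this sketch omits.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 255k = 261120 bytes

function splitIntoChunks(filesId, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    // subarray() clamps the end, so the last chunk may be shorter.
    chunks.push({ files_id: filesId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Sequence numbers of the chunks that hold bytes [start, end) of a file,
// which is all a reader must fetch to serve that byte range.
function chunksForRange(start, end, chunkSize = DEFAULT_CHUNK_SIZE) {
  const result = [];
  if (end <= start) return result;
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  for (let n = first; n <= last; n += 1) result.push(n);
  return result;
}
```

A 600,000-byte file becomes three chunks (two full 261,120-byte chunks and a 77,760-byte remainder), and reading bytes near the first boundary touches only chunks 0 and 1.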
GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you want access
without having to load the entire file into memory. For more information on when to use GridFS, see When should I use GridFS? (page 729).
Changed in version 2.4.10: The default chunk size changed from 256k to 255k.
Implement GridFS
To store and retrieve files using GridFS, use either of the following:
A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
The mongofiles command-line tool. See the mongofiles reference for complete documentation.
GridFS Collections
GridFS stores files in two collections:
chunks stores the binary chunks. For details, see The chunks Collection (page 172).
files stores the files metadata. For details, see The files Collection (page 173).
[7] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses
two collections with names prefixed by fs bucket:
fs.files
fs.chunks
You can choose a different bucket name than fs, and create multiple buckets in a single database.
Each document in the chunks collection represents a distinct chunk of a file as represented in the GridFS store. Each
chunk is identified by its unique ObjectId stored in its _id field.
For descriptions of all fields in the chunks and files collections, see GridFS Reference (page 172).
GridFS Index
GridFS uses a unique, compound index on the chunks collection for the files_id and n fields. The files_id
field contains the _id of the chunk's parent document. The n field contains the sequence number of the chunk.
GridFS numbers all chunks, starting with 0. For descriptions of the documents and fields in the chunks collection,
see GridFS Reference (page 172).
The GridFS index allows efficient retrieval of chunks using the files_id and n values, as shown in the following
example:
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not
create this index, issue the following operation using the mongo shell:
db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
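Reading a file back follows the index order: select the chunks by files_id, sort by n, and concatenate. In this hypothetical reassemble() sketch, the gap and duplicate check stands in for the guarantee that the unique { files_id: 1, n: 1 } index provides.

```javascript
// Hypothetical sketch of reading a GridFS-style file from chunk documents.
// Mirrors find({ files_id }).sort({ n: 1 }) followed by concatenation.
function reassemble(chunks, filesId) {
  // filter() returns a new array, so sorting does not mutate the input.
  const mine = chunks
    .filter((c) => c.files_id === filesId)
    .sort((a, b) => a.n - b.n);
  const total = mine.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  mine.forEach((chunk, i) => {
    // Sequence numbers must run 0, 1, 2, ... with no gaps or repeats,
    // which is what the unique compound index guarantees on the server.
    if (chunk.n !== i) {
      throw new Error(`missing or duplicate chunk at n=${i} for ${filesId}`);
    }
    out.set(chunk.data, offset);
    offset += chunk.data.length;
  });
  return out;
}
```

Because the chunks are sorted by n before concatenation, the order in which they were stored or returned does not matter.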
Additional Resources
Building MongoDB Applications with Binary Files Using GridFS: Part 1
Building MongoDB Applications with Binary Files Using GridFS: Part 2
Model Tree Structures with Parent References (page 155) Presents a data model that organizes documents in
a tree-like structure by storing references (page 145) to parent nodes in child nodes.
Model Tree Structures with Child References (page 156) Presents a data model that organizes documents in a
tree-like structure by storing references (page 145) to child nodes in parent nodes.
See Model Tree Structures (page 154) for additional examples of data models for tree structures.
Model Specific Application Contexts (page 161) Examples for models for specific application contexts.
Model Data for Atomic Operations (page 162) Illustrates how embedding fields related to an atomic update
within the same document ensures that the fields are in sync.
Model Data to Support Keyword Search (page 162) Describes one method for supporting keyword search by
storing keywords in an array in the same document as the text field. Combined with a multi-key index, this
pattern can support an application's keyword search operations.
Model One-to-One Relationships with Embedded Documents
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses embedded (page 144) documents to describe relationships between
connected data.
Pattern
Consider the following example that maps patron and address relationships. The example illustrates the advantage of
embedding over referencing if you need to view one data entity in context of the other. In this one-to-one relationship
between patron and address data, the address belongs to the patron.
In the normalized data model, the address document contains a reference to the patron document.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
If the address data is frequently retrieved with the name information, then with referencing, your application needs
to issue multiple queries to resolve the reference. The better data model would be to embed the address data in the
patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
address: {
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
}
With the embedded data model, your application can retrieve the complete patron information with one query.
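The difference in round trips can be sketched with in-memory arrays standing in for the two collections. The helper names are hypothetical: getPatronWithReference() needs one lookup per collection, while embedAddress() produces the single embedded document shown above.

```javascript
// Hypothetical in-memory stand-ins for the two layouts above.
const patrons = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" },
];

// Normalized model: assembling one view needs a lookup per collection.
function getPatronWithReference(id) {
  const patron = patrons.find((p) => p._id === id);          // query 1
  const address = addresses.find((a) => a.patron_id === id); // query 2
  return { ...patron, address };
}

// Embedded model: copy the address into the patron, dropping the
// patron_id back-reference because the nesting now expresses it.
function embedAddress(patron, address) {
  const { patron_id: _backReference, ...fields } = address;
  return { ...patron, address: fields };
}
```

Once the embedded document is stored, a single lookup by _id returns the name and the address together.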
Model One-to-Many Relationships with Embedded Documents
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses embedded (page 144) documents to describe relationships between
connected data.
Pattern
Consider the following example that maps patron and multiple address relationships. The example illustrates the
advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities.
In the normalized data model, the address documents contain a reference to the patron document.
{
_id: "joe",
name: "Joe Bookreader"
}
{
patron_id: "joe",
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
}
{
patron_id: "joe",
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: "12345"
}
If your application frequently retrieves the address data with the name information, then your application needs
to issue multiple queries to resolve the references. A better schema would be to embed the address data
entities in the patron data, as in the following document:
{
_id: "joe",
name: "Joe Bookreader",
addresses: [
{
street: "123 Fake Street",
city: "Faketon",
state: "MA",
zip: "12345"
},
{
street: "1 Some Other Street",
city: "Boston",
state: "MA",
zip: "12345"
}
]
}
With the embedded data model, your application can retrieve the complete patron information with one query.
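A one-time migration from the referenced layout to the embedded layout amounts to grouping address documents by patron_id. The sketch below uses a hypothetical embedAddresses() helper over arrays; a real migration would read from and write to collections.

```javascript
// Hypothetical migration helper: turn referencing address documents into
// an embedded addresses array on each patron document.
function embedAddresses(patrons, addresses) {
  // Group the referencing documents by the patron they point to.
  const byPatron = new Map();
  for (const { patron_id, ...address } of addresses) {
    if (!byPatron.has(patron_id)) byPatron.set(patron_id, []);
    byPatron.get(patron_id).push(address);
  }
  // Each patron gets an addresses array (empty when nothing references it).
  return patrons.map((patron) => ({ ...patron, addresses: byPatron.get(patron._id) || [] }));
}
```

The grouping keeps the original order of the address documents, so the first address in the source collection becomes the first element of the array.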
Model One-to-Many Relationships with Document References
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that uses references (page 145) between documents to describe relationships
between connected data.
Pattern
Consider the following example that maps publisher and book relationships. The example illustrates the advantage of
referencing over embedding to avoid repetition of the publisher information.
Embedding the publisher document inside the book document would lead to repetition of the publisher data, as the
following documents show:
{
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
{
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher: {
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
}
To avoid repetition of the publisher data, use references and keep the publisher information in a separate collection
from the book collection.
When using references, the growth of the relationships determines where to store the reference. If the number of books
per publisher is small with limited growth, storing the book reference inside the publisher document may sometimes
be useful. Otherwise, if the number of books per publisher is unbounded, this data model would lead to mutable,
growing arrays, as in the following example:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English"
}
To avoid mutable, growing arrays, store the publisher reference inside the book document:
{
_id: "oreilly",
name: "O'Reilly Media",
founded: 1980,
location: "CA"
}
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
{
_id: 234567890,
title: "50 Tips and Tricks for MongoDB Developer",
author: "Kristina Chodorow",
published_date: ISODate("2011-05-06"),
pages: 68,
language: "English",
publisher_id: "oreilly"
}
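With references, the application resolves publisher_id itself. A common approach is to collect the distinct ids from a batch of books and fetch the publishers in one extra query (for example with $in). The following sketch models that application-level join in memory; resolvePublishers() is a hypothetical helper.

```javascript
// Hypothetical in-memory stand-ins for the publisher and book collections.
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: 1980, location: "CA" },
];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" },
];

// Application-level join: gather the distinct publisher_id values, fetch
// those publishers once (like find({ _id: { $in: ids } })), then attach.
function resolvePublishers(books, publishers) {
  const ids = new Set(books.map((b) => b.publisher_id));
  const byId = new Map(
    publishers.filter((p) => ids.has(p._id)).map((p) => [p._id, p])
  );
  return books.map((b) => ({ ...b, publisher: byId.get(b.publisher_id) || null }));
}
```

The publisher data is stored once and shared by both books, at the cost of the second lookup.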
Model Tree Structures with Parent References (page 155) Presents a data model that organizes documents in a tree-like structure by storing references (page 145) to parent nodes in child nodes.
Model Tree Structures with Child References (page 156) Presents a data model that organizes documents in a tree-like structure by storing references (page 145) to child nodes in parent nodes.
Model Tree Structures with an Array of Ancestors (page 157) Presents a data model that organizes documents in a
tree-like structure by storing references (page 145) to parent nodes and an array that stores all ancestors.
Model Tree Structures with Materialized Paths (page 159) Presents a data model that organizes documents in a tree-like structure by storing full relationship paths between documents. In addition to the tree node, each document
stores the _id of the node's ancestors or path as a string.
Model Tree Structures with Nested Sets (page 160) Presents a data model that organizes documents in a tree-like
structure using the Nested Sets pattern. This optimizes discovering subtrees at the expense of tree mutability.
Model Tree Structures with Parent References
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references
(page 145) to parent nodes in children nodes.
Pattern
The Parent References pattern stores each tree node in a document; in addition to the tree node, the document stores
the id of the node's parent.
Consider the following hierarchy of categories:
The following example models the tree using Parent References, storing the reference to the parent category in the
field parent:
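db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "dbm", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )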
You can create an index on the field parent to enable fast search by the parent node:
db.categories.createIndex( { parent: 1 } )
You can query by the parent field to find its immediate children nodes:
db.categories.find( { parent: "Databases" } )
The Parent Links pattern provides a simple solution to tree storage but requires multiple queries to retrieve subtrees.
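The cost of retrieving a subtree with Parent References shows up when walking the tree one level at a time: each level needs another query on the parent field. This in-memory sketch counts those round trips; descendants() is a hypothetical helper, and the array mirrors the Books, Programming, Databases, Languages, MongoDB, and dbm categories used in these examples.

```javascript
// In-memory stand-in for the categories collection (Parent References).
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];

// Breadth-first walk: one simulated query per tree level, equivalent to
// find( { parent: { $in: frontier } } ) against the collection.
function descendants(rootId) {
  const found = [];
  let frontier = [rootId];
  let queryCount = 0;
  while (frontier.length > 0) {
    queryCount += 1;
    const children = categories.filter((c) => frontier.includes(c.parent));
    found.push(...children.map((c) => c._id));
    frontier = children.map((c) => c._id);
  }
  return { found, queryCount };
}
```

For Programming, the walk needs three queries: two levels that return children and a final one that finds none, and deeper trees need proportionally more.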
Model Tree Structures with Child References
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing references
(page 145) in the parent nodes to the child nodes.
Pattern
The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores in an
array the id(s) of the node's children.
Consider the following hierarchy of categories:
The following example models the tree using Child References, storing the reference to the node's children in the field
children:
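db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "dbm", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )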
The query to retrieve the immediate children of a node is fast and straightforward:
db.categories.findOne( { _id: "Databases" } ).children
You can create an index on the field children to enable fast search by the child nodes:
db.categories.createIndex( { children: 1 } )
You can query for a node in the children field to find its parent node as well as its siblings:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are
necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple
parents.
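Querying an array field for a single value matches documents whose array contains that value, which is what makes the parent and sibling lookup above work. The following in-memory sketch mirrors db.categories.find( { children: "MongoDB" } ); parentsOf() and siblingsOf() are hypothetical helpers.

```javascript
// In-memory stand-in for the categories collection (Child References).
const categories = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] },
];

// Like find( { children: id } ): an array field matches when any element
// equals the queried value. More than one result means several parents.
function parentsOf(id) {
  return categories.filter((c) => c.children.includes(id));
}

// Siblings are the parents' other children.
function siblingsOf(id) {
  return parentsOf(id)
    .flatMap((parent) => parent.children)
    .filter((child) => child !== id);
}
```

Because parentsOf() returns every document that lists the node, the same lookup works when a node has multiple parents, as in the graph case mentioned above.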
Model Tree Structures with an Array of Ancestors
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents using references
(page 145) to parent nodes and an array that stores all ancestors.
Pattern
The Array of Ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores in
an array the id(s) of the node's ancestors or path.
Consider the following hierarchy of categories:
The following example models the tree using Array of Ancestors. In addition to the ancestors field, these documents also store the reference to the immediate parent category in the parent field:
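db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "dbm", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )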
The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
You can create an index on the field ancestors to enable fast search by the ancestors nodes:
db.categories.createIndex( { ancestors: 1 } )
You can query by the field ancestors to find all its descendants:
db.categories.find( { ancestors: "Programming" } )
The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node
by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working
with subtrees.
The Array of Ancestors pattern is slightly slower than the Materialized Paths (page 159) pattern but is more straightforward to use.
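Both directions are single operations with this pattern: ancestors come from one document, and descendants come from one query on the ancestors array. The following sketch models the two queries above on an in-memory copy of the category documents; ancestorsOf() and descendantsOf() are hypothetical helpers.

```javascript
// In-memory stand-in for the categories collection (Array of Ancestors).
const categories = [
  { _id: "MongoDB", ancestors: ["Books", "Programming", "Databases"], parent: "Databases" },
  { _id: "dbm", ancestors: ["Books", "Programming", "Databases"], parent: "Databases" },
  { _id: "Databases", ancestors: ["Books", "Programming"], parent: "Programming" },
  { _id: "Languages", ancestors: ["Books", "Programming"], parent: "Programming" },
  { _id: "Programming", ancestors: ["Books"], parent: "Books" },
  { _id: "Books", ancestors: [], parent: null },
];

// Like findOne( { _id: id } ).ancestors: one document read.
function ancestorsOf(id) {
  const doc = categories.find((c) => c._id === id);
  return doc ? doc.ancestors : [];
}

// Like find( { ancestors: id } ): one query, which an index on the
// ancestors field can serve because each element is indexed.
function descendantsOf(id) {
  return categories.filter((c) => c.ancestors.includes(id)).map((c) => c._id);
}
```

Unlike the Parent References walk, neither lookup needs to repeat per tree level.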
Model Tree Structures with Materialized Paths
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure in MongoDB documents by storing full
relationship paths between documents.
Pattern
The Materialized Paths pattern stores each tree node in a document; in addition to the tree node, the document stores as
a string the id(s) of the node's ancestors or path. Although the Materialized Paths pattern requires additional steps of
working with strings and regular expressions, the pattern also provides more flexibility in working with the path, such
as finding nodes by partial paths.
Consider the following hierarchy of categories:
The following example models the tree using Materialized Paths, storing the path in the field path; the path string
uses the comma (,) as a delimiter:
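db.categories.insert( { _id: "Books", path: null } )
db.categories.insert( { _id: "Programming", path: ",Books," } )
db.categories.insert( { _id: "Databases", path: ",Books,Programming," } )
db.categories.insert( { _id: "Languages", path: ",Books,Programming," } )
db.categories.insert( { _id: "MongoDB", path: ",Books,Programming,Databases," } )
db.categories.insert( { _id: "dbm", path: ",Books,Programming,Databases," } )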
You can query to retrieve the whole tree, sorting by the field path:
db.categories.find().sort( { path: 1 } )
You can use regular expressions on the path field to find the descendants of Programming:
db.categories.find( { path: /,Programming,/ } )
You can also retrieve the descendants of Books where Books is also at the topmost level of the hierarchy:
db.categories.find( { path: /^,Books,/ } )
To create an index on the field path, use the following invocation:
db.categories.createIndex( { path: 1 } )
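The regular expression queries above can be tried on an in-memory copy of the category documents. In this sketch, descendantsOf() mirrors /,Programming,/, underRootPath() mirrors the anchored /^,Books,/ form, and sortedByPath() mimics sorting on path with null ordered before strings, as MongoDB orders them; all three helpers are hypothetical.

```javascript
// In-memory stand-in for the categories collection (Materialized Paths).
const categories = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "Languages", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
  { _id: "dbm", path: ",Books,Programming,Databases," },
];

// Escape regex metacharacters so an _id is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Like find( { path: /,<id>,/ } ): descendants contain ",<id>," anywhere.
function descendantsOf(id) {
  const pattern = new RegExp("," + escapeRegExp(id) + ",");
  return categories.filter((c) => c.path !== null && pattern.test(c.path)).map((c) => c._id);
}

// Like find( { path: /^<rootPath>/ } ): the match is anchored at the root.
function underRootPath(rootPath) {
  const pattern = new RegExp("^" + escapeRegExp(rootPath));
  return categories.filter((c) => c.path !== null && pattern.test(c.path)).map((c) => c._id);
}

// Like find().sort( { path: 1 } ): null sorts before any string.
function sortedByPath() {
  return [...categories]
    .sort((a, b) => {
      if (a.path === b.path) return 0;
      if (a.path === null) return -1;
      if (b.path === null) return 1;
      return a.path < b.path ? -1 : 1;
    })
    .map((c) => c._id);
}
```

Sorting by path lists each node after its ancestors, because a parent's path is a prefix of its children's paths.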
Model Tree Structures with Nested Sets
Overview
Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how
you model data can affect application performance and database capacity. See Data Modeling Concepts (page 143)
for a full high level overview of data modeling in MongoDB.
This document describes a data model that describes a tree-like structure that optimizes discovering subtrees at the
expense of tree mutability.
Pattern
The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the tree. The application
visits each node in the tree twice; first during the initial trip, and second during the return trip. The Nested Sets pattern
stores each tree node in a document; in addition to the tree node, the document stores the id of the node's parent, the node's
initial stop in the left field, and its return stop in the right field.
Consider the following hierarchy of categories:
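The following example models the tree using Nested Sets:
db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
db.categories.insert( { _id: "dbm", parent: "Databases", left: 8, right: 9 } )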
The Nested Sets pattern provides a fast and efficient solution for finding subtrees but is inefficient for modifying the
tree structure. As such, this pattern is best for static trees that do not change.
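The round-trip numbering and the subtree lookup can be sketched together. numberTree() assigns left on the way down and right on the way back up a depth-first walk. subtree() returns the nodes whose left and right fall strictly inside the chosen node's range, the in-memory equivalent of querying { left: { $gt: node.left }, right: { $lt: node.right } }. The children map and both helpers are hypothetical, and the root's parent is set to 0 here.

```javascript
// Hypothetical description of the category hierarchy used in these examples.
const children = {
  Books: ["Programming"],
  Programming: ["Languages", "Databases"],
  Databases: ["MongoDB", "dbm"],
  Languages: [],
  MongoDB: [],
  dbm: [],
};

// Depth-first round trip: "left" is the stop on the way down and
// "right" is the stop on the way back up.
function numberTree(rootId) {
  const docs = [];
  let counter = 0;
  function visit(id, parent) {
    const doc = { _id: id, parent, left: ++counter };
    docs.push(doc);
    for (const child of children[id]) visit(child, id);
    doc.right = ++counter; // return stop
  }
  visit(rootId, 0);
  return docs;
}

// Descendants are exactly the nodes nested inside the node's interval.
function subtree(docs, id) {
  const node = docs.find((d) => d._id === id);
  return docs
    .filter((d) => d.left > node.left && d.right < node.right)
    .map((d) => d._id);
}
```

Inserting a node anywhere in the tree shifts the left and right values of every node visited after it, which is why this pattern suits static trees.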
Model Data to Support Keyword Search (page 162) Describes one method for supporting keyword search by storing
keywords in an array in the same document as the text field. Combined with a multi-key index, this pattern can
support an application's keyword search operations.
Model Monetary Data (page 164) Describes two methods to model monetary data in MongoDB.
Model Time Data (page 165) Describes how to deal with local time in MongoDB.
Model Data for Atomic Operations
Pattern
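Given a book document that stores the number of copies available for checkout in the available field and the current
checkout information in the checkout array, you can use the db.collection.update() method to atomically
update both the available field and the checkout field with new checkout information:
db.books.update(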
{ _id: 123456789, available: { $gt: 0 } },
{
$inc: { available: -1 },
$push: { checkout: { by: "abc", date: new Date() } }
}
)
The operation returns a WriteResult() object that contains information on the status of the operation:
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
The nMatched field shows that 1 document matched the update condition, and nModified shows that the operation
updated 1 document.
If no document matched the update condition, then nMatched and nModified would be 0 and would indicate that
you could not check out the book.
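The check-and-update logic can be sketched in memory. This hypothetical checkout() helper applies the decrement and the push only when the filter available > 0 matches, and returns counts shaped like the WriteResult fields above. It is single-threaded JavaScript, so it shows the logic, not MongoDB's server-side atomicity guarantee; the book document shape (available, checkout) is assumed from the update operation.

```javascript
// Hypothetical in-memory version of the conditional update above.
function checkout(books, bookId, patron, date) {
  // Filter: { _id: bookId, available: { $gt: 0 } }
  const book = books.find((b) => b._id === bookId && b.available > 0);
  if (!book) {
    return { nMatched: 0, nUpserted: 0, nModified: 0 }; // nothing to check out
  }
  book.available -= 1;                      // $inc: { available: -1 }
  book.checkout.push({ by: patron, date }); // $push: { checkout: { ... } }
  return { nMatched: 1, nUpserted: 0, nModified: 1 };
}
```

Because the availability test is part of the filter, the counter and the checkout list can never disagree: either both change or neither does.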
Model Data to Support Keyword Search
Note: Keyword search is not the same as text search or full text search, and does not provide stemming or other
text-processing features. See the Limitations of Keyword Indexes (page 163) section for more information.
Starting in version 2.4, MongoDB provides a text search feature. See Text Indexes (page 486) for more information.
If your application needs to perform queries on the content of a field that holds text, you can perform exact matches
on the text or use $regex to use regular expression pattern matches. However, for many operations on text, these
methods do not satisfy application requirements.
This pattern describes one method for supporting application keyword search in MongoDB: store keywords in an array
in the same document as the text field. Combined with a multi-key index (page 474), this pattern can support an
application's keyword search operations.
Pattern
To add structures to your document to support keyword-based queries, create an array field in your documents and add
the keywords as strings in the array. You can then create a multi-key index (page 474) on the array and create queries
that select values from the array.
Example
Suppose you have a collection of library volumes for which you want to provide topic-based search. For each volume,
you add the array topics, and you add as many keywords as needed for a given volume.
For the Moby-Dick volume you might have the following document:
{ title : "Moby-Dick" ,
author : "Herman Melville" ,
published : 1851 ,
ISBN : "0451526996" ,
topics : [ "whaling" , "allegory" , "revenge" , "American" ,
"novel" , "nautical" , "voyage" , "Cape Cod" ]
}
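Create a multi-key index on the topics array, for example with db.volumes.createIndex( { topics: 1 } ). The multi-key
index creates a separate index entry for each keyword in the topics array. For example, the index contains one entry
for whaling and another for allegory.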
You then query based on the keywords. For example:
db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
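To make the idea of "one index entry per keyword" concrete, the following sketch builds an inverted index in memory: each array element maps to the set of documents that contain it. This is only a conceptual model of a multi-key index, not MongoDB's index format; buildMultikeyIndex() and the second volume are hypothetical.

```javascript
// Hypothetical in-memory model of a multi-key index: one entry per array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // keyword -> Set of _id values
  for (const doc of docs) {
    // A Set removes repeated keywords, so each document appears once per keyword.
    for (const keyword of new Set(doc[field] || [])) {
      if (!index.has(keyword)) index.set(keyword, new Set());
      index.get(keyword).add(doc._id);
    }
  }
  return index;
}

const volumes = [
  { _id: 1, title: "Moby-Dick",
    topics: ["whaling", "allegory", "revenge", "American", "novel", "nautical", "voyage", "Cape Cod"] },
  { _id: 2, title: "Hypothetical Sea Journal", topics: ["nautical", "voyage"] }
];

const topicsIndex = buildMultikeyIndex(volumes, "topics");
// Similar in spirit to db.volumes.find( { topics: "voyage" } ):
console.log([...(topicsIndex.get("voyage") || [])]); // [ 1, 2 ]
```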
Note: An array with a large number of elements, such as one with several hundreds or thousands of keywords, will
incur greater indexing costs on insertion.
Limitations of Keyword Indexes
MongoDB can support keyword searches using specific data models and multi-key indexes (page 474); however, these
keyword indexes are not comparable to, and cannot substitute for, full-text products in the following respects:
Stemming. Keyword queries in MongoDB cannot parse keywords for root or related words.
Synonyms. Applications must implement support for synonyms or related queries in the application layer.
Ranking. The keyword lookups described in this document do not provide a way to weight results.
Asynchronous Indexing. MongoDB builds indexes synchronously, which means that keyword indexes are always current and can operate in real time. However, asynchronous bulk indexes may be
more efficient for some kinds of content and workloads.
Model Monetary Data
Overview
MongoDB stores numeric data as either IEEE 754 standard 64-bit floating point numbers or as 32-bit or 64-bit signed
integers. Applications that handle monetary data often require capturing fractional units of currency. However, arithmetic on floating point numbers, as implemented in modern hardware, often does not conform to requirements for
monetary arithmetic. In addition, some fractional numeric quantities, such as one third and one tenth, have no exact
representation in binary floating point numbers.
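JavaScript numbers are IEEE 754 64-bit doubles, the same format MongoDB uses for floating point values, so a few lines of plain JavaScript (runnable in Node.js or the mongo shell) show why fractional currency amounts are a problem:

```javascript
// 0.1, 0.2 and 0.3 have no exact binary representation, so the error surfaces in arithmetic.
const sum = 0.1 + 0.2;
console.log(sum);         // 0.30000000000000004
console.log(sum === 0.3); // false

// One third has no finite binary expansion either.
console.log(1 / 3);       // 0.3333333333333333

// Sums of powers of two within 53 bits of precision are exact, e.g. 0.25 = 2^-2.
console.log(0.25 + 0.25 === 0.5); // true
```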
Note: Arithmetic mentioned on this page refers to server-side arithmetic performed by mongod or mongos, and not
to client-side arithmetic.
This document describes two ways to model monetary data in MongoDB:
Exact Precision (page 164) which multiplies the monetary value by a power of 10.
Arbitrary Precision (page 165) which uses two fields for the value: one field to store the exact monetary value
as a non-numeric data type and another field to store a floating point approximation of the value.
Use Cases for Exact Precision Model
If you regularly need to perform server-side arithmetic on monetary data, the exact precision model may be appropriate.
For instance:
If you need to query the database for exact, mathematically valid matches, use Exact Precision (page 164).
If you need to be able to do server-side arithmetic, e.g., $inc, $mul, and aggregation framework
arithmetic, use Exact Precision (page 164).
Use Cases for Arbitrary Precision Model
If there is no need to perform server-side arithmetic on monetary data, modeling monetary data using the arbitrary
precision model may be suitable. For instance:
If you need to handle an arbitrary or unforeseen degree of precision, see Arbitrary Precision (page 165).
If server-side approximations are sufficient, possibly with client-side post-processing, see Arbitrary Precision
(page 165).
Exact Precision
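To model monetary data using the exact precision model:
1. Determine the maximum precision needed for the monetary value. For example, your application may require
precision down to the tenth of one cent for monetary values in USD currency.
2. Convert the monetary value into an integer by multiplying the value by a power of 10 that ensures the maximum
precision needed becomes the least significant digit of the integer. For example, if the required maximum
precision is the tenth of one cent, multiply the monetary value by 1000.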
3. Store the converted monetary value.
For example, the following scales 9.99 USD by 1000 to preserve precision up to one tenth of a cent.
{ price: 9990, currency: "USD" }
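The scaling step can be done on the client without ever passing the amount through a floating point number. The following is a minimal sketch, assuming amounts arrive as decimal strings such as "9.99"; toScaledInteger() and fromScaledInteger() are hypothetical helpers, not MongoDB or driver APIs.

```javascript
// Hypothetical helpers (not part of MongoDB): convert between a decimal string
// and an integer scaled by 10^scaleDigits, without passing through a double.
function toScaledInteger(amount, scaleDigits) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(amount);
  if (!match) throw new Error("not a decimal string: " + amount);
  const [, sign, whole, frac = ""] = match;
  if (frac.length > scaleDigits) {
    throw new RangeError("more fractional digits than the chosen scale: " + amount);
  }
  const value = Number(whole + frac.padEnd(scaleDigits, "0"));
  if (!Number.isSafeInteger(value)) throw new RangeError("value too large: " + amount);
  return sign === "-" ? -value : value;
}

function fromScaledInteger(value, scaleDigits) {
  const sign = value < 0 ? "-" : "";
  const digits = String(Math.abs(value)).padStart(scaleDigits + 1, "0");
  const cut = digits.length - scaleDigits;
  return sign + digits.slice(0, cut) + (scaleDigits > 0 ? "." + digits.slice(cut) : "");
}

// 9.99 USD scaled by 1000 (tenths of a cent), matching the document above.
const item = { price: toScaledInteger("9.99", 3), currency: "USD" };
console.log(item.price);                               // 9990
const total = item.price + toScaledInteger("0.25", 3); // integer addition, as $inc would do
console.log(fromScaledInteger(total, 3));              // 10.240
```

Amounts larger than Number.MAX_SAFE_INTEGER after scaling would need a 64-bit integer type (NumberLong in the mongo shell) and BigInt arithmetic on the client; the sketch rejects them instead.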
Arbitrary Precision
To model monetary data using the arbitrary precision model, store the value in two fields:
1. In one field, encode the exact monetary value as a non-numeric data type; e.g., BinData or a string.
2. In the second field, store a double-precision floating point approximation of the exact value.
The following example uses the arbitrary precision model to store 9.99 USD for the price and 0.25 USD for the
fee:
{
price: { display: "9.99", approx: 9.9900000000000002, currency: "USD" },
fee: { display: "0.25", approx: 0.25, currency: "USD" }
}
With some care, applications can perform range and sort queries on the field with the numeric approximation. However, the use of the approximation field for the query and sort operations requires that applications perform client-side
post-processing to decode the non-numeric representation of the exact value and then filter out the returned documents
based on the exact monetary value.
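The client-side post-processing step might look like the following sketch. It assumes the display field holds a plain decimal string and that the candidates came from a range query on the approx field widened slightly for rounding; the collection name items and all function names are hypothetical. Exact comparison uses BigInt so no precision is lost.

```javascript
// Hypothetical client-side post-processing for the arbitrary precision model.
// Candidates would come from a widened range query on the approx field, e.g.
// (mongo shell, assumed collection name):
//   db.items.find( { "price.approx": { $gte: 9.99 - 1e-9, $lte: 10.00 + 1e-9 } } )

function fractionDigits(display) {
  const dot = display.indexOf(".");
  return dot === -1 ? 0 : display.length - dot - 1;
}

// Exact integer value of a decimal string at the given scale, as a BigInt.
function toBigIntAtScale(display, scale) {
  const m = /^(-?)(\d+)(?:\.(\d+))?$/.exec(display);
  if (!m) throw new Error("invalid decimal string: " + display);
  const [, sign, whole, frac = ""] = m;
  const n = BigInt(whole + frac.padEnd(scale, "0"));
  return sign === "-" ? -n : n;
}

// Exact comparison of two decimal strings: -1, 0, or 1.
function compareExact(a, b) {
  const scale = Math.max(fractionDigits(a), fractionDigits(b));
  const x = toBigIntAtScale(a, scale);
  const y = toBigIntAtScale(b, scale);
  return x < y ? -1 : x > y ? 1 : 0;
}

// Keep only candidates whose exact value lies in [minDisplay, maxDisplay].
function filterByExactRange(candidates, minDisplay, maxDisplay) {
  return candidates.filter(d =>
    compareExact(d.price.display, minDisplay) >= 0 &&
    compareExact(d.price.display, maxDisplay) <= 0);
}

// The last candidate's approximation rounds to 10, so the range query returns it,
// but its exact value is above 10.00 and the filter drops it.
const candidates = [
  { price: { display: "9.99", approx: 9.99 } },
  { price: { display: "10.00", approx: 10.0 } },
  { price: { display: "10.000000000000000001", approx: 10.0 } }
];
console.log(filterByExactRange(candidates, "9.99", "10.00").map(d => d.price.display)); // [ '9.99', '10.00' ]
```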
For use cases of this model, see Use Cases for Arbitrary Precision Model (page 164).
Model Time Data
Overview
MongoDB stores times in UTC (page 178) by default, and will convert any local time representations into this form.
Applications that must operate or report on some unmodified local time value may store the time zone alongside the
UTC timestamp, and compute the original local time in their application logic.
Example
In the MongoDB shell, you can store both the current date and the current client's offset from UTC:
var now = new Date();
db.data.save( { date: now, offset: now.getTimezoneOffset() } );
You can reconstruct the original local time by applying the saved offset:
var record = db.data.findOne();
var localNow = new Date( record.date.getTime() -
( record.offset * 60000 ) );
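The same arithmetic can be checked outside the shell with plain JavaScript Dates. In this sketch the record is built by hand rather than read with db.data.findOne(), and makeRecord() and localWallClock() are hypothetical helpers. getTimezoneOffset() returns the difference UTC minus local time in minutes (for example, 300 for UTC-5), which is why the offset is subtracted.

```javascript
// Build a record shaped like the one stored above: a UTC instant plus the client's offset.
function makeRecord(date, offsetMinutes) {
  return { date: date, offset: offsetMinutes };
}

// Shift the stored UTC instant by the saved offset; the UTC fields of the
// result then read as the client's original local wall-clock time.
function localWallClock(record) {
  return new Date(record.date.getTime() - record.offset * 60000);
}

// Hypothetical record: 17:00 UTC captured by a client whose offset was 300 minutes (UTC-5).
const record = makeRecord(new Date(Date.UTC(2015, 2, 10, 17, 0, 0)), 300);
const localNow = localWallClock(record);
console.log(localNow.toISOString()); // 2015-03-10T12:00:00.000Z
```

Note that the resulting Date still carries a "Z" when printed: it is a shifted instant that encodes local wall-clock time, not a time-zone-aware value.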
4.4.1 Documents
MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs:
{ "item": "pencil", "qty": 500, "type": "no.2" }
Document Structure
MongoDB documents are composed of field-and-value pairs and have the following structure:
{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}
The value of a field can be any of the BSON data types (page 176), including other documents, arrays, and arrays of
documents. The following document contains values of varying types:
var mydoc = {
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
birth: new Date('Jun 23, 1912'),
death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
Changed in version 2.6: Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document.
Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.
The _id Field
The _id field has the following behavior and constraints:
By default, MongoDB creates a unique index on the _id field during the creation of a collection.
The _id field is always the first field in the documents. If the server receives a document that does not have the
_id field first, then the server will move the field to the beginning.
The _id field may contain values of any BSON data type (page 176), other than an array.
Warning: To ensure functioning replication, do not store values that are of the BSON regular expression
type in the _id field.
The following are common options for storing values for _id:
Use an ObjectId (page 174).
Use a natural unique identifier, if available. This saves space and avoids an additional index.
Generate an auto-incrementing number. See Create an Auto-Incrementing Sequence Field (page 124).
Generate a UUID in your application code. For a more efficient storage of the UUID values in the collection
and in the _id index, store the UUID as a value of the BSON BinData type.
Index keys that are of the BinData type are more efficiently stored in the index if:
the binary subtype value is in the range of 0-7 or 128-135, and
the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.
Use your driver's BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other
drivers. See your driver documentation12 for information concerning UUID interoperability.
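The size rules above can be expressed as a small check. The sketch below converts a canonical UUID string into its 16 raw bytes (the form stored in BinData, commonly with BSON subtype 4 for UUIDs) and tests whether a subtype/length pair falls in the compact ranges listed above. isCompactBinDataKey() and uuidToBytes() are hypothetical helpers; the byte order shown is the canonical RFC 4122 order, and drivers that use the legacy subtype 3 may order bytes differently, which is the kind of interoperability issue mentioned above.

```javascript
// Lengths listed above as stored more efficiently in the index.
const COMPACT_LENGTHS = new Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 32]);

function isCompactBinDataKey(subtype, byteLength) {
  const subtypeOk = (subtype >= 0 && subtype <= 7) || (subtype >= 128 && subtype <= 135);
  return subtypeOk && COMPACT_LENGTHS.has(byteLength);
}

// Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes in canonical order.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error("invalid UUID string: " + uuid);
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

const bytes = uuidToBytes("123e4567-e89b-12d3-a456-426614174000");
console.log(bytes.length);                          // 16
console.log(isCompactBinDataKey(4, bytes.length));  // true: 16 raw bytes
console.log(isCompactBinDataKey(4, 36));            // false: length of the UUID as a 36-character string
```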
Note: Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert
operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id
field and generate the ObjectId.
Dot Notation
MongoDB uses the dot notation to access the elements of an array and to access the fields of an embedded document.
To access an element of an array by the zero-based index position, concatenate the array name with the dot (.) and
zero-based index position, and enclose in quotes:
'<array>.<index>'
See also $ positional operator for update operations and $ projection operator when array index position is unknown.
To access a field of an embedded document with dot-notation, concatenate the embedded document name with the dot
(.) and the field name, and enclose in quotes:
'<embedded document>.<field>'
See also:
Embedded Documents (page 96) for dot notation examples with embedded documents.
Arrays (page 97) for dot notation examples with arrays.
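As an illustration of how the segments of a dot-notation path address embedded fields and array positions, the following hypothetical client-side helper walks a path through a plain JavaScript object shaped like the mydoc example. It is only a sketch of path resolution: MongoDB's query matcher also traverses arrays implicitly (for example, matching a field inside any element of an array of documents), which this helper does not attempt.

```javascript
// Hypothetical helper: resolve '<embedded document>.<field>' and '<array>.<index>' paths.
function getByPath(doc, path) {
  return path.split(".").reduce((value, key) => {
    if (value === null || value === undefined) return undefined;
    if (Array.isArray(value)) {
      // This sketch only handles zero-based numeric positions on arrays.
      return /^\d+$/.test(key) ? value[Number(key)] : undefined;
    }
    return typeof value === "object" ? value[key] : undefined;
  }, doc);
}

const doc = {
  name: { first: "Alan", last: "Turing" },
  contribs: ["Turing machine", "Turing test", "Turingery"]
};

console.log(getByPath(doc, "name.last"));  // Turing
console.log(getByPath(doc, "contribs.1")); // Turing test
console.log(getByPath(doc, "contribs.5")); // undefined
```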
Some community supported drivers may have alternate behavior and may resolve a DBRef into a document automatically.
169
Manual References
Background
Using manual references is the practice of including one document's _id field in another document. The application
can then issue a second query to resolve the referenced fields as needed.
Process
Consider the following operation to insert two documents, using the _id field of the first document as a reference in
the second document:
original_id = ObjectId()
db.places.insert({
"_id": original_id,
"name": "Broadway Center",
"url": "bc.example.net"
})
db.people.insert({
"name": "Erin",
"places_id": original_id,
"url": "bc.example.net/Erin"
})
Then, when a query returns the document from the people collection you can, if needed, make a second query for
the document referenced by the places_id field in the places collection.
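The two-step resolution can be sketched with in-memory stand-ins for the places and people collections. The Map keyed by _id, the string ids (standing in for ObjectId values), and resolvePlace() are all hypothetical; in the shell the second step would be a query such as db.places.findOne( { _id: person.places_id } ).

```javascript
// Hypothetical in-memory collections; string ids stand in for ObjectId values.
const places = new Map([
  ["p1", { _id: "p1", name: "Broadway Center", url: "bc.example.net" }]
]);
const people = [
  { _id: "u1", name: "Erin", places_id: "p1", url: "bc.example.net/Erin" }
];

// The "second query": follow places_id to the referenced document, if it still exists.
function resolvePlace(person) {
  return places.get(person.places_id) || null;
}

const erin = people.find(p => p.name === "Erin");
console.log(resolvePlace(erin).name); // Broadway Center
```

Because nothing enforces the reference, the referenced document may have been deleted; returning null for a dangling reference is one way an application can handle that.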
Use
For nearly every case where you want to store a relationship between two documents, use manual references
(page 170). The references are simple to create and your application can resolve references as needed.
The only limitation of manual linking is that these references do not convey the database and collection names. If you
have documents in a single collection that relate to documents in more than one collection, you may need to consider
using DBRefs.
DBRefs
Background
DBRefs are a convention for representing a document, rather than a specific reference type. They include the name of
the collection, and in some cases the database name, in addition to the value from the _id field.
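Format
DBRefs have the following fields:
$ref
The $ref field holds the name of the collection where the referenced document resides.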
$id
The $id field contains the value of the _id field in the referenced document.
$db
Optional.
Contains the name of the database where the referenced document resides.
Only some drivers support $db references.
Example
DBRef documents resemble the following document:
{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }
For example, the following DBRef points to a document in the creators collection of the users database that has
ObjectId("5126bc054aed4daf9e2ab772") in its _id field:
{ "$ref" : "creators", "$id" : ObjectId("5126bc054aed4daf9e2ab772"), "$db" : "users" }
Note: The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
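What a driver's dereference helper does can be sketched against an in-memory model of databases and collections. Everything here is hypothetical: the nested databases object, the string id standing in for an ObjectId, and dereference() itself. The only behavior taken from the format above is that $ref names the collection, $id holds the _id value, and the optional $db names the database (falling back to the current one).

```javascript
// Hypothetical in-memory "server": database name -> collection name -> Map of _id -> document.
const databases = {
  users: {
    creators: new Map([
      ["5126bc054aed4daf9e2ab772", { _id: "5126bc054aed4daf9e2ab772", name: "example creator" }]
    ])
  }
};

// Follow a DBRef-shaped object; $db is optional, so fall back to the current database.
function dereference(ref, currentDb) {
  const dbName = ref.$db !== undefined ? ref.$db : currentDb;
  const db = databases[dbName];
  const coll = db && db[ref.$ref];
  return (coll && coll.get(ref.$id)) || null;
}

const ref = { $ref: "creators", $id: "5126bc054aed4daf9e2ab772", $db: "users" };
console.log(dereference(ref, "test")); // the document from users.creators
```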
C The C driver contains no support for DBRefs. You can traverse references manually.
C++ The C++ driver contains no support for DBRefs. You can traverse references manually.
C# The C# driver supports DBRefs using the MongoDBRef14 class and the FetchDBRef15 method.
Haskell The Haskell driver contains no support for DBRefs. You can traverse references manually.
Java The DBRef16 class provides support for DBRefs from Java.
JavaScript The mongo shell's JavaScript interface provides a DBRef.
Node.js The Node.js driver supports DBRefs using the DBRef17 class and the dereference18 method.
Perl The Perl driver contains no support for DBRefs. You can traverse references manually or use the MongoDBx::AutoDeref19 CPAN module.
PHP The PHP driver supports DBRefs, including the optional $db reference, using the MongoDBRef20 class.
14 http://api.mongodb.org/csharp/current/html/46c356d3-ed06-a6f8-42fa-e0909ab64ce2.htm
15 http://api.mongodb.org/csharp/current/html/1b0b8f48-ba98-1367-0a7d-6e01c8df436f.htm
16 http://api.mongodb.org/java/current/com/mongodb/DBRef.html
17 http://mongodb.github.io/node-mongodb-native/api-bson-generated/db_ref.html
18 http://mongodb.github.io/node-mongodb-native/api-generated/db.html#dereference
19 http://search.cpan.org/dist/MongoDBx-AutoDeref/
20 http://www.php.net/manual/en/class.mongodbref.php/
Python The Python driver supports DBRefs using the DBRef21 class and the dereference22 method.
Ruby The Ruby driver supports DBRefs using the DBRef23 class and the dereference24 method.
Scala The Scala driver contains no support for DBRefs. You can traverse references manually.
Use
In most cases you should use the manual reference (page 170) method for connecting two or more related documents.
However, if you need to reference documents from multiple collections, consider using DBRefs.
chunks.data
The chunk's payload as a BSON binary type.
The chunks collection uses a compound index on files_id and n, as described in GridFS Index (page 149).
The files Collection
Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a
document in the files collection:
{
"_id" : <ObjectId>,
"length" : <num>,
"chunkSize" : <num>,
"uploadDate" : <timestamp>,
"md5" : <hash>,
"filename" : <string>,
"contentType" : <string>,
"aliases" : <string array>,
"metadata" : <dataObject>
}
Documents in the files collection contain some or all of the following fields. Applications may create additional
arbitrary fields:
files._id
The unique ID for this document. The _id is of the data type you chose for the original document. The default
type for MongoDB documents is BSON ObjectId.
files.length
The size of the document in bytes.
files.chunkSize
The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is
255 kilobytes.
Changed in version 2.4.10: The default chunk size changed from 256k to 255k.
files.uploadDate
The date the document was first stored by GridFS. This value has the Date type.
files.md5
An MD5 hash returned by the filemd5 command. This value has the String type.
files.filename
Optional. A human-readable name for the document.
files.contentType
Optional. A valid MIME type for the document.
files.aliases
Optional. An array of alias strings.
files.metadata
Optional. Any additional information you want to store.
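The relationship between files.length, files.chunkSize, and the chunk documents can be worked out with simple arithmetic. The sketch below assumes chunk sequence numbers (chunks.n) start at 0 and uses the default chunk size of 255 kilobytes (255 × 1024 = 261120 bytes); chunkCount() and chunkRange() are hypothetical helpers, not part of any GridFS API.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 255 kilobytes = 261120 bytes

// Number of chunk documents needed for a file of `length` bytes.
function chunkCount(length, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(length / chunkSize);
}

// Byte range [start, end) of the file stored in the chunk with sequence number n.
function chunkRange(n, length, chunkSize = DEFAULT_CHUNK_SIZE) {
  const start = n * chunkSize;
  if (n < 0 || start >= length) throw new RangeError("no such chunk: " + n);
  return { start, end: Math.min(start + chunkSize, length) };
}

console.log(chunkCount(1000000));    // 4
console.log(chunkRange(3, 1000000)); // { start: 783360, end: 1000000 }
```

Every chunk except the last holds exactly chunkSize bytes; the last holds the remainder (216640 bytes in this example).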
4.4.4 ObjectId
Overview
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
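Because the ObjectId hexadecimal string is 24 characters (2 per byte), the layout above can be split apart by position. The following hypothetical parser is a sketch that assumes the 4/3/2/3-byte layout described in this section; it recovers the same creation time that getTimestamp() reports for the example ObjectId used later in this section.

```javascript
// Hypothetical parser for the 12-byte layout described above (24 hex characters).
function parseObjectIdHex(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex characters: " + hex);
  return {
    timestampSeconds: parseInt(hex.slice(0, 8), 16), // 4 bytes: seconds since the Unix epoch
    machineId: hex.slice(8, 14),                     // 3 bytes
    processId: hex.slice(14, 18),                    // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16)         // 3 bytes
  };
}

const parts = parseObjectIdHex("507f191e810c19729de860ea");
console.log(parts.timestampSeconds);                                 // 1350506782
console.log(new Date(parts.timestampSeconds * 1000).toISOString()); // 2012-10-17T20:46:22.000Z
```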
In MongoDB, documents stored in a collection require a unique _id field that acts as a primary key. Because ObjectIds
are small, most likely unique, and fast to generate, MongoDB uses ObjectIds as the default value for the _id field if
the _id field is not specified. MongoDB clients should add an _id field with a unique ObjectId. However, if a client
does not add an _id field, mongod will add an _id field that holds an ObjectId.
Using ObjectIds for the _id field provides the following additional benefits:
in the mongo shell, you can access the creation time of the ObjectId, using the getTimestamp() method.
sorting on an _id field that stores ObjectId values is roughly equivalent to sorting by creation time.
Important: The relationship between the order of ObjectId values and generation time is not strict within a
single second. If multiple systems, or multiple processes or threads on a single system, generate values within a
single second, ObjectId values do not represent a strict insertion order. Clock skew between clients can also
result in non-strict ordering, because client drivers generate ObjectId values, not the mongod
process.
Also consider the Documents (page 166) section for related information on MongoDB's document orientation.
ObjectId()
The mongo shell provides the ObjectId() wrapper class to generate a new ObjectId, and to provide the following
helper attribute and methods:
str
The hexadecimal string representation of the object.
getTimestamp()
Returns the timestamp portion of the object as a Date.
toString()
Returns the JavaScript representation in the form of a string literal ObjectId(...).
Changed in version 2.2: In previous versions, toString() returned the hexadecimal string representation,
which as of version 2.2 can be retrieved by the str property.
valueOf()
Returns the representation of the object as a hexadecimal string. The returned string is the str attribute.
Changed in version 2.2: In previous versions, valueOf() returned the object.
Examples
Consider the following uses of the ObjectId() class in the mongo shell:
Generate a new ObjectId
To generate a new ObjectId using the ObjectId() constructor with a unique hexadecimal string:
y = ObjectId("507f191e810c19729de860ea")
Convert an ObjectId into a Timestamp
To return the timestamp of an ObjectId() object, use the getTimestamp() method as follows:
ObjectId("507f191e810c19729de860ea").getTimestamp()
To return the hexadecimal string representation of an ObjectId(), use the valueOf() method as follows:
ObjectId("507f191e810c19729de860ea").valueOf()
To return the string representation of an ObjectId() object, use the toString() method as follows:
ObjectId("507f191e810c19729de860ea").toString()
This returns the following string:
ObjectId("507f191e810c19729de860ea")
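4.4.5 BSON Types
Type                      Number   Notes
Double                    1
String                    2
Object                    3
Array                     4
Binary data               5
Undefined                 6        Deprecated.
Object id                 7
Boolean                   8
Date                      9
Null                      10
Regular Expression        11
JavaScript                13
Symbol                    14
JavaScript (with scope)   15
32-bit integer            16
Timestamp                 17
64-bit integer            18
Min key                   255
Max key                   127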
To determine a field's type, see Check Types in the mongo Shell (page 263).
If you convert BSON to JSON, see the Extended JSON reference.
Comparison/Sort Order
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to
highest:
1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectId
9. Boolean
10. Date
11. Timestamp
12. Regular Expression
13. MaxKey (internal type)
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion
before comparison.
Changed in version 3.0.0: Date objects sort before Timestamp objects. Previously Date and Timestamp objects sorted
together.
The comparison treats a non-existent field as it would an empty BSON Object. As such, a sort on the a field in
documents { } and { a: null } would treat the documents as equivalent in sort order.
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than
comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose
value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A
comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
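The array rule for sorts can be sketched as a sort key: an array contributes its smallest element to an ascending sort and its largest to a descending sort. This is a simplified model restricted to numbers and non-empty arrays of numbers; sortKey() and sortByField() are hypothetical, and the sketch ignores cross-type ordering and the empty-array case described above.

```javascript
// Sort key for one field value: arrays contribute their smallest element to an
// ascending sort (direction 1) and their largest element to a descending sort (-1).
function sortKey(value, direction) {
  if (!Array.isArray(value)) return value;
  if (value.length === 0) throw new Error("empty arrays are outside this sketch");
  return direction === 1 ? Math.min(...value) : Math.max(...value);
}

function sortByField(docs, field, direction) {
  return [...docs].sort((x, y) => {
    const a = sortKey(x[field], direction);
    const b = sortKey(y[field], direction);
    return direction === 1 ? a - b : b - a;
  });
}

const docs = [ { a: [1, 5] }, { a: 2 }, { a: [3] } ];
console.log(sortByField(docs, "a", 1).map(d => d.a));  // [ [ 1, 5 ], 2, [ 3 ] ]
console.log(sortByField(docs, "a", -1).map(d => d.a)); // [ [ 1, 5 ], [ 3 ], 2 ]
```

The same document, { a: [1, 5] }, sorts first in both directions: by its minimum 1 ascending, and by its maximum 5 descending.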
MongoDB sorts BinData in the following order:
1. First, the length or size of the data.
2. Then, by the BSON one-byte subtype.
3. Finally, by the data, performing a byte-by-byte comparison.
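The three-step BinData ordering translates directly into a comparator. The { subtype, bytes } representation and compareBinData() are hypothetical; the sketch only encodes the order listed above: length first, then subtype, then bytes.

```javascript
// Hypothetical representation of a BinData value: { subtype: number, bytes: Uint8Array }.
function compareBinData(a, b) {
  // 1. Shorter data sorts first, regardless of subtype or content.
  if (a.bytes.length !== b.bytes.length) return a.bytes.length - b.bytes.length;
  // 2. Then the one-byte subtype.
  if (a.subtype !== b.subtype) return a.subtype - b.subtype;
  // 3. Finally a byte-by-byte comparison.
  for (let i = 0; i < a.bytes.length; i++) {
    if (a.bytes[i] !== b.bytes[i]) return a.bytes[i] - b.bytes[i];
  }
  return 0;
}

const values = [
  { subtype: 0, bytes: Uint8Array.of(1, 2) },
  { subtype: 4, bytes: Uint8Array.of(0) },
  { subtype: 0, bytes: Uint8Array.of(1, 1) },
  { subtype: 0, bytes: Uint8Array.of(9) }
];
values.sort(compareBinData);
// Order: [9] (subtype 0), [0] (subtype 4), [1, 1], [1, 2]
```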
The following sections describe special considerations for particular BSON types.
ObjectId
ObjectIds are small, likely unique, fast to generate, and ordered. These values consist of 12 bytes, where the first
four bytes are a timestamp that reflects the ObjectId's creation. Refer to the ObjectId (page 174) documentation for
more information.
String
BSON strings are UTF-8. In general, drivers for each programming language convert from the language's string format
to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in
BSON strings with ease. 26 In addition, MongoDB $regex queries support UTF-8 in the regex string.
Timestamps
BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date (page 178)
type. Timestamp values are a 64-bit value where:
the first 32 bits are a time_t value (seconds since the Unix epoch)
the second 32 bits are an incrementing ordinal for operations within a given second.
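Treating a timestamp as one unsigned 64-bit integer with the seconds in the high-order 32 bits and the ordinal in the low-order 32 bits, packing and unpacking is a pair of shifts. The sketch uses BigInt because the value exceeds JavaScript's safe integer range; packTimestamp() and unpackTimestamp() are hypothetical helpers, and the on-disk byte order of the BSON encoding is a separate concern handled by drivers and the server.

```javascript
const UINT32_MAX = 0xffffffff;

// Pack (seconds, ordinal) into one unsigned 64-bit value, seconds in the high 32 bits.
function packTimestamp(seconds, ordinal) {
  for (const v of [seconds, ordinal]) {
    if (!Number.isInteger(v) || v < 0 || v > UINT32_MAX) {
      throw new RangeError("expected an unsigned 32-bit integer: " + v);
    }
  }
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

function unpackTimestamp(value) {
  return { seconds: Number(value >> 32n), ordinal: Number(value & 0xffffffffn) };
}

// Values taken from the Timestamp(1412180887, 1) example later in this section.
const ts = packTimestamp(1412180887, 1);
console.log(unpackTimestamp(ts)); // { seconds: 1412180887, ordinal: 1 }
// With seconds in the high bits, comparing packed values orders by second, then by ordinal.
console.log(packTimestamp(1412180887, 2) > ts); // true
```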
Within a single mongod instance, timestamp values are always unique.
In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON
timestamp value.
26 Given strings using UTF-8 character sets, using sort() on strings will be reasonably correct. However, because internally sort() uses the
C++ strcmp api, the sort order may handle some characters incorrectly.
Note: The BSON timestamp type is for internal MongoDB use. For most cases, in application development, you will
want to use the BSON date type. See Date (page 178) for more information.
If you insert a document containing an empty BSON timestamp in a top-level field, the MongoDB server will replace
that empty timestamp with the current timestamp value. For example, if you insert a document with an empty
timestamp value, as in the following operation:
var a = new Timestamp();
db.test.insert( { ts: a } );
Then, the db.test.find() operation will return a document that resembles the following:
{ "_id" : ObjectId("542c2b97bac0595474108b48"), "ts" : Timestamp(1412180887, 1) }
If ts were a field in an embedded document, the server would have left it as an empty timestamp value.
Changed in version 2.6: Previously, the server would only replace empty timestamp values in the first two fields,
including _id, of an inserted document. Now MongoDB will replace any top-level field.
Date
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This
results in a representable date range of about 290 million years into the past and future.
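The "about 290 million years" figure follows from the size of a signed 64-bit millisecond count: 2^63 milliseconds in each direction, divided by the number of milliseconds in a year. A quick check in JavaScript, using the average Julian year of 365.25 days:

```javascript
// A signed 64-bit millisecond count reaches roughly 2^63 in each direction.
const maxMillis = 2 ** 63;                      // as a double; precise enough for a range estimate
const msPerYear = 365.25 * 24 * 60 * 60 * 1000; // average Julian year in milliseconds
const years = maxMillis / msPerYear;
console.log(Math.round(years / 1e6));           // 292 (million years in each direction)
```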
The official BSON specification27 refers to the BSON Date type as the UTC datetime.
Changed in version 2.0: BSON Date type is signed. 28
Example
Construct a Date using the new Date() constructor in the mongo shell:
var mydate1 = new Date()
Example
Construct a Date using the ISODate() constructor in the mongo shell:
var mydate2 = ISODate()
Example
Return the Date value as string:
mydate1.toString()
Example
Return the month portion of the Date value; months are zero-indexed, so that January is month 0:
mydate1.getMonth()
27 http://bsonspec.org/#/specification
28 Prior to version 2.0, Date values were incorrectly interpreted as unsigned integers, which affected sorts, range queries, and indexes on Date
fields. Because indexes are not recreated when upgrading, please re-index if you created an index on Date values with an earlier version, and dates
before 1970 are relevant to your application.
CHAPTER 5
Administration
The administration documentation addresses the ongoing operation and maintenance of MongoDB instances and deployments. This documentation includes both high-level overviews of these concerns as well as tutorials that cover
specific procedures and processes for operating MongoDB.
Administration Concepts (page 181) Core conceptual documentation of operational practices for managing MongoDB deployments and systems.
MongoDB Backup Methods (page 182) Describes approaches and considerations for backing up a MongoDB
database.
Monitoring for MongoDB (page 185) An overview of monitoring tools, diagnostic strategies, and approaches
to monitoring replica sets and sharded clusters.
Production Notes (page 198) A collection of notes that describe best practices and considerations for the operations of MongoDB instances and deployments.
Continue reading from Administration Concepts (page 181) for additional documentation of MongoDB administration.
Administration Tutorials (page 219) Tutorials that describe common administrative procedures and practices for operating MongoDB instances and deployments.
Configuration, Maintenance, and Analysis (page 220) Describes routine management operations, including
configuration and performance analysis.
Backup and Recovery (page 240) Outlines procedures for data backup and restoration with mongod instances
and deployments.
Continue reading from Administration Tutorials (page 219) for more tutorials of common MongoDB maintenance operations.
Administration Reference (page 280) Reference and documentation of the internal mechanics of administrative features,
systems, functions, and operations.
See also:
The MongoDB Manual contains administrative documentation and tutorials throughout several sections. See Replica
Set Tutorials (page 581) and Sharded Cluster Tutorials (page 669) for additional tutorials and information.
Operational Strategies (page 182) Higher level documentation of key concepts for the operation and maintenance of
MongoDB deployments.
MongoDB Backup Methods (page 182) Describes approaches and considerations for backing up a MongoDB
database.
Monitoring for MongoDB (page 185) An overview of monitoring tools, diagnostic strategies, and approaches
to monitoring replica sets and sharded clusters.
Run-time Database Configuration (page 192) Outlines common MongoDB configurations and examples of
best-practice configurations for common use cases.
Continue reading from Operational Strategies (page 182) for additional documentation.
Data Management (page 207) Core documentation that addresses issues in data management, organization, maintenance, and lifecycle management.
Data Center Awareness (page 207) Presents the MongoDB features that allow application developers and
database administrators to configure their deployments to be more data center aware or allow operational
and location-based separation.
Capped Collections (page 208) Capped collections provide a special type of size-constrained collections that
preserve insertion order and can support high volume inserts.
Expire Data from Collections by Setting TTL (page 211) TTL collections make it possible to automatically
remove data from a collection based on the value of a timestamp and are useful for managing data like
machine generated event data that are only useful for a limited period of time.
Optimization Strategies for MongoDB (page 212) Techniques for optimizing application performance with MongoDB.
Continue reading from Optimization Strategies for MongoDB (page 212) for additional documentation.
The mongodump tool reads data from a MongoDB database and creates high fidelity BSON files. The
mongorestore tool can populate a MongoDB database with the data from these BSON files. These tools are
simple and efficient for backing up small MongoDB deployments, but are not ideal for capturing backups of larger
systems.
mongodump and mongorestore can operate against a running mongod process, and can manipulate the underlying data files directly. By default, mongodump does not capture the contents of the local database (page 632).
mongodump only captures the documents in the database. The resulting backup is space efficient, but
mongorestore or mongod must rebuild the indexes after restoring data.
When connected to a MongoDB instance, mongodump can adversely affect mongod performance. If your data is
larger than system memory, the queries will push the working set out of memory.
To mitigate the impact of mongodump on the performance of the replica set, use mongodump to capture backups from a secondary (page 547) member of a replica set. Alternatively, you can shut down a secondary and use
mongodump with the data files directly. If you shut down a secondary to capture data with mongodump, ensure that
the operation can complete before its oplog becomes too stale to continue replicating.
1 http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2
For replica sets, mongodump also supports a point in time feature with the --oplog option. Applications may
continue modifying data while mongodump captures the output. To restore a point in time backup created with
--oplog, use mongorestore with the --oplogReplay option.
If applications modify data while mongodump is creating a backup, mongodump will compete for resources with
those applications.
See Back Up and Restore with MongoDB Tools (page 246), Backup a Small Sharded Cluster with mongodump
(page 249), and Backup a Sharded Cluster with Database Dumps (page 252) for more information.
MongoDB Management Service (MMS) Cloud Backup
The MongoDB Management Service2 supports the backing up and restoring of MongoDB deployments.
MMS continually backs up MongoDB replica sets and sharded clusters by reading the oplog data from your MongoDB
deployment.
MMS Backup offers point in time recovery of MongoDB replica sets and a consistent snapshot of sharded clusters.
MMS achieves point in time recovery by storing oplog data so that it can create a restore for any moment in time in
the last 24 hours for a particular replica set or sharded cluster. Sharded cluster snapshots are difficult to achieve with
other MongoDB backup methods.
To restore a MongoDB deployment from an MMS Backup snapshot, you download a compressed archive of your
MongoDB data files and distribute those files before restarting the mongod processes.
To get started with MMS Backup, sign up for MMS3. For the complete documentation of MMS, see the MMS
Manual4.
MongoDB Management Service (MMS) On Prem Backup Software
MongoDB Subscribers can install and run the same core software that powers MongoDB Management Service (MMS)
Cloud Backup (page 184) on their own infrastructure. The On Prem version of MMS has functionality similar to the
cloud version and is available with Standard and Enterprise subscriptions.
For more information about On Prem MMS, see the MongoDB subscription5 page and the MMS On Prem Manual6.
Further Reading
Backup and Restore with Filesystem Snapshots (page 241) An outline of procedures for creating MongoDB data set
backups using system-level file snapshot tools, such as LVM or native storage appliance tools.
Restore a Replica Set from MongoDB Backups (page 244) Describes procedure for restoring a replica set from an
archived backup such as a mongodump or MMS7 Backup file.
Back Up and Restore with MongoDB Tools (page 246) The procedure for writing the contents of a database to a
BSON (i.e. binary) dump file for backing up MongoDB databases, as well as using this copy of a database to
restore a MongoDB instance.
Backup and Restore Sharded Clusters (page 249) Detailed procedures and considerations for backing up sharded
clusters and single shards.
2 https://mms.mongodb.com/
3 https://mms.mongodb.com/
4 https://docs.mms.mongodb.com/
5 https://www.mongodb.com/products/subscriptions
6 https://mms.mongodb.com/help-hosted/current/
7 https://mms.mongodb.com/
Chapter 5. Administration
Recover Data after an Unexpected Shutdown (page 257) Recover data from MongoDB data files that were not properly closed or have an invalid state.
Additional Resources
Backup and its Role in Disaster Recovery White Paper8
Backup vs. Replication: Why Do You Need Both?9
Monitoring for MongoDB
Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you
to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's
normal operational parameters will allow you to diagnose problems before they escalate to failures.
This document presents an overview of the available monitoring utilities and the reporting statistics available in MongoDB. It also introduces diagnostic strategies and suggestions for monitoring replica sets and sharded clusters.
Note: MongoDB Management Service (MMS)10 , a hosted service, and Ops Manager11 , an on-premise solution,
provide monitoring, backup, and automation of MongoDB instances. See the MMS documentation12 and Ops Manager
documentation13 for more information.
Monitoring Strategies
There are three methods for collecting data about the state of a running MongoDB instance:
First, there is a set of utilities distributed with MongoDB that provides real-time reporting of database activities.
Second, database commands return statistics regarding the current database state with greater fidelity.
Third, MMS Monitoring14 collects data from running MongoDB deployments and provides visualization and
alerts based on that data.
Each strategy can help answer different questions and is useful in different contexts. These methods are complementary.
MongoDB Reporting Tools
This section provides an overview of the reporting methods distributed with MongoDB. It also offers examples of the
kinds of questions that each method is best suited to help you address.
Utilities The MongoDB distribution includes a number of utilities that quickly return statistics about an instance's
performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.
8 https://www.mongodb.com/lp/white-paper/backup-disaster-recovery
9 http://www.mongodb.com/blog/post/backup-vs-replication-why-do-you-need-both
10 https://mms.mongodb.com/
11 https://www.mongodb.com/products/mongodb-enterprise-advanced
12 https://docs.mms.mongodb.com/
13 https://docs.opsmanager.mongodb.com
14 https://mms.mongodb.com/
mongostat mongostat captures and returns the counts of database operations by type (e.g. insert, query, update,
delete, etc.). These counts report on the load distribution on the server.
Use mongostat to understand the distribution of operation types and to inform capacity planning. See the
mongostat manual for details.
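mongostat works by repeatedly sampling serverStatus, whose opcounters section holds cumulative operation counts since the mongod started. As a rough sketch of the same arithmetic (not mongostat's actual implementation), the hypothetical function opcounterRates below turns two serverStatus documents taken a known number of seconds apart into per-second rates. It is plain ES5 JavaScript, so it should run in the mongo shell as well as in Node.js.

```javascript
// Compute per-second operation rates from two serverStatus() samples,
// similar in spirit to the insert/query/update/delete columns of mongostat.
// opcounters values are cumulative since the mongod started.
function opcounterRates(earlier, later, elapsedSeconds) {
  if (elapsedSeconds <= 0) {
    throw new Error("elapsedSeconds must be positive");
  }
  var rates = {};
  Object.keys(later.opcounters).forEach(function (opType) {
    var delta = later.opcounters[opType] - earlier.opcounters[opType];
    rates[opType] = delta / elapsedSeconds;
  });
  return rates;
}

// Usage in the mongo shell (assumes a running mongod):
//   var a = db.serverStatus(); sleep(1000); var b = db.serverStatus();
//   printjson(opcounterRates(a, b, 1));
```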
mongotop mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports
these statistics on a per collection basis.
Use mongotop to check if your database activity and use match your expectations. See the mongotop manual
for details.
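mongotop reports figures derived from the top administrative command, which returns cumulative per-namespace timings in microseconds under a totals field. Assuming that document shape, a minimal sketch of mongotop-style ranking takes two samples and sorts namespaces by the time spent between them; busiestNamespaces is a hypothetical name, not part of MongoDB.

```javascript
// Rank namespaces by time spent between two samples of the "top" admin
// command, roughly what mongotop displays. Times are in microseconds.
function busiestNamespaces(earlier, later, limit) {
  var rows = [];
  Object.keys(later.totals).forEach(function (ns) {
    var now = later.totals[ns];
    var before = earlier.totals[ns];
    // Skip the informational "note" string and namespaces with no prior sample.
    if (typeof now !== "object" || !before) {
      return;
    }
    rows.push({
      ns: ns,
      totalMicros: now.total.time - before.total.time,
      readMicros: now.readLock.time - before.readLock.time,
      writeMicros: now.writeLock.time - before.writeLock.time
    });
  });
  rows.sort(function (a, b) { return b.totalMicros - a.totalMicros; });
  return rows.slice(0, limit);
}

// Usage in the mongo shell:
//   var a = db.adminCommand({ top: 1 }); sleep(1000);
//   var b = db.adminCommand({ top: 1 }); printjson(busiestNamespaces(a, b, 5));
```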
HTTP Console MongoDB provides a web interface that exposes diagnostic and monitoring information in a simple
web page. The web interface is accessible at localhost:<port>, where the <port> number is 1000 more than
the mongod port.
For example, if a locally running mongod is using the default port 27017, access the HTTP console at
http://localhost:28017.
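The port arithmetic is simple enough to capture in a helper for scripts that monitor several instances. This sketch assumes the HTTP interface is enabled on the mongod in question; httpConsoleUrl and HTTP_CONSOLE_OFFSET are illustrative names only.

```javascript
// The HTTP console listens on the mongod port plus 1000.
var HTTP_CONSOLE_OFFSET = 1000;

function httpConsoleUrl(host, mongodPort) {
  return "http://" + host + ":" + (mongodPort + HTTP_CONSOLE_OFFSET);
}
```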
Commands MongoDB includes a number of commands that report on the state of the database.
These data may provide a finer level of granularity than the utilities discussed above. Consider using their output
in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the
activity of your instance. The db.currentOp method is another useful tool for identifying the database instance's
in-progress operations.
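For example, a script could scan the output of db.currentOp() for operations that have been running too long. The sketch below assumes each entry in the inprog array carries the active, secs_running, opid, op, and ns fields; longRunningOps and its threshold are hypothetical.

```javascript
// Return in-progress operations that have been running for at least
// minSeconds, longest first. Input is the document returned by db.currentOp().
function longRunningOps(currentOpResult, minSeconds) {
  return currentOpResult.inprog
    .filter(function (op) {
      // secs_running is only present for active operations.
      return op.active && op.secs_running >= minSeconds;
    })
    .map(function (op) {
      return { opid: op.opid, op: op.op, ns: op.ns, secs: op.secs_running };
    })
    .sort(function (a, b) { return b.secs - a.secs; });
}

// Usage in the mongo shell: printjson(longRunningOps(db.currentOp(), 5));
```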
serverStatus The serverStatus command, or db.serverStatus() from the shell, returns a general
overview of the status of the database, detailing disk usage, memory use, connections, journaling, and index access.
The command returns quickly and does not impact MongoDB performance.
serverStatus outputs an account of the state of a MongoDB instance. This command is rarely run directly. In
most cases, the data is more meaningful when aggregated, as one would see with monitoring tools including MMS15 .
Nevertheless, all administrators should be familiar with the data provided by serverStatus.
dbStats The dbStats command, or db.stats() from the shell, returns a document that addresses storage use
and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database,
and object, collection, and index counters.
Use this data to monitor the state and storage capacity of a specific database. This output also allows you to compare
use between databases and to determine the average document size in a database.
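To compare databases, one approach is to collect db.stats() from each and sort by data size. This sketch (compareDbStats is a hypothetical helper) recomputes the average document size as dataSize divided by objects, the same quantity dbStats reports as avgObjSize; in the usage comment, the listDatabases command and db.getSiblingDB() gather the inputs.

```javascript
// Summarize db.stats() output for several databases so their storage use
// can be compared side by side. Sizes are in bytes unless a scale was passed.
function compareDbStats(statsList) {
  return statsList
    .map(function (s) {
      return {
        db: s.db,
        objects: s.objects,
        dataSize: s.dataSize,
        indexSize: s.indexSize,
        // Average document size; dbStats also reports this as avgObjSize.
        avgDocSize: s.objects > 0 ? s.dataSize / s.objects : 0
      };
    })
    .sort(function (a, b) { return b.dataSize - a.dataSize; });
}

// Usage in the mongo shell:
//   var names = db.adminCommand({ listDatabases: 1 }).databases.map(function (d) { return d.name; });
//   printjson(compareDbStats(names.map(function (n) { return db.getSiblingDB(n).stats(); })));
```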
collStats The collStats command, or db.collection.stats() from the shell, provides statistics that resemble dbStats at the collection level, including a count of the objects in the collection, the size of the collection, the
amount of disk space used by the collection, and information about its indexes.
replSetGetStatus The replSetGetStatus command (rs.status() from the shell) returns an
overview of your replica set's status. The replSetGetStatus document details the state and configuration of
the replica set and statistics about its members.
Use this data to ensure that replication is properly configured, and to check the connections between the current host
and the other members of the replica set.
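As one way to act on this output, the following sketch lists members that rs.status() reports as unreachable (health other than 1) or in a state other than PRIMARY, SECONDARY, or ARBITER. The set of "normal" states is an assumption chosen for illustration, and unhealthyMembers is a hypothetical name.

```javascript
// Flag replica set members that rs.status() reports as unreachable
// (health 0) or in a state other than PRIMARY, SECONDARY, or ARBITER.
var NORMAL_STATES = ["PRIMARY", "SECONDARY", "ARBITER"];

function unhealthyMembers(replSetStatus) {
  return replSetStatus.members
    .filter(function (m) {
      return m.health !== 1 || NORMAL_STATES.indexOf(m.stateStr) === -1;
    })
    .map(function (m) {
      return { name: m.name, health: m.health, state: m.stateStr };
    });
}

// Usage in the mongo shell: printjson(unhealthyMembers(rs.status()));
```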
15 https://mms.mongodb.com/
Third Party Tools A number of third party monitoring tools have support for MongoDB, either directly, or through
their own plugins.
Self Hosted Monitoring Tools These are monitoring tools that you must install, configure and maintain on your
own servers. Most are open source.
Tool | Plugin | Description
Ganglia16 | mongodb-ganglia17 | Python script to report operations per second, memory usage, btree statistics, master/slave status and current connections.
Ganglia | gmond_python_modules18 | Parses output from the serverStatus and replSetGetStatus commands.
Motop19 | None | Real-time monitoring tool for MongoDB servers. Shows current operations ordered by duration every second.
mtop20 | None | A top-like tool.
Munin21 | mongo-munin22 | Retrieves server statistics.
Munin | mongomon23 | Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB).
Munin | munin-plugins Ubuntu PPA24 | Some additional munin plugins not in the main distribution.
Nagios25 | nagios-plugin-mongodb26 | A simple Nagios check script, written in Python.
Also consider dex27 , an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes
to make indexing recommendations.
As part of MongoDB Enterprise28 , you can run MMS On-Prem29 , which offers the features of MMS in a package that
runs within your own infrastructure.
Hosted (SaaS) Monitoring Tools These are monitoring tools provided as a hosted service, usually through a paid
subscription.
Name | Notes
MongoDB Management Service | MMS is a cloud-based suite of services for managing MongoDB deployments. MMS provides monitoring, backup, and automation functionality.
Scout30 | Several plugins, including MongoDB Monitoring31, MongoDB Slow Queries32, and MongoDB Replica Set Monitoring33.
Server Density34 | Dashboard for MongoDB35, MongoDB-specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps.
Application Performance Management36 | IBM has an Application Performance Management SaaS offering that includes monitoring for MongoDB and other applications and middleware.
16 http://sourceforge.net/apps/trac/ganglia/wiki
17 https://github.com/quiiver/mongodb-ganglia
18 https://github.com/ganglia/gmond_python_modules
19 https://github.com/tart/motop
20 https://github.com/beaufour/mtop
21 http://munin-monitoring.org/
22 https://github.com/erh/mongo-munin
23 https://github.com/pcdummy/mongomon
24 https://launchpad.net/chris-lea/+archive/munin-plugins
25 http://www.nagios.org/
26 https://github.com/mzupan/nagios-plugin-mongodb
27 https://github.com/mongolab/dex
28 http://www.mongodb.com/products/mongodb-enterprise
29 https://mms.mongodb.com/
30 http://scoutapp.com
Process Logging
During normal operation, mongod and mongos instances report a live account of all server activity and operations to
either standard output or a log file. The following runtime settings control these options.
quiet. Limits the amount of information written to the log or output.
verbosity. Increases the amount of information written to the log or output. You can also modify the logging
verbosity during runtime with the logLevel parameter or the db.setLogLevel() method in the shell.
path. Enables logging to a file, rather than the standard output. You must specify the full path to the log file
when adjusting this setting.
logAppend. Adds information to a log file instead of overwriting the file.
Note: You can specify these configuration options as command line arguments to mongod or mongos.
For example:
mongod -v --logpath /var/log/mongodb/server1.log --logappend
This command starts a mongod instance in verbose mode, appending data to the log file at /var/log/mongodb/server1.log.
Degraded performance in MongoDB is typically a function of the relationship between the quantity of data stored
in the database, the amount of system RAM, the number of connections to the database, and the amount of time the
database spends in a locked state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability
of hardware on the host system for virtualized environments. Some users also experience performance limitations as a
result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other
situations, performance issues may indicate that the database is operating at capacity and that it is time to add
additional capacity to the database.
The following are some causes of degraded performance in MongoDB.
Locks MongoDB uses a locking system to ensure data set consistency. However, if certain operations are long-running, or a queue forms, performance will degrade as requests and operations wait for the lock. Lock-related
slowdowns can be intermittent. To see if the lock has been affecting your performance, look to the data in the globalLock section of the serverStatus output. If globalLock.currentQueue.total is consistently high, then
there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue
that may be affecting performance.
31 https://scoutapp.com/plugin_urls/391-mongodb-monitoring
32 http://scoutapp.com/plugin_urls/291-mongodb-slow-queries
33 http://scoutapp.com/plugin_urls/2251-mongodb-replica-set-monitoring
34 http://www.serverdensity.com
35 http://www.serverdensity.com/mongodb-monitoring/
36 http://ibmserviceengage.com
If globalLock.totalTime is high relative to uptime, the database has existed in a lock state for a significant
amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of
long running queries. Long queries are often the result of a number of factors: ineffective use of indexes, nonoptimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults
(page 218) and disk reads.
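Because lock-related slowdowns can be intermittent, a single serverStatus sample may mislead. A minimal sketch, assuming you have collected several serverStatus documents over time, checks whether globalLock.currentQueue.total stayed at or above a threshold in every sample. The threshold value and the function name lockQueueConsistentlyHigh are hypothetical; choose a threshold from your own baseline.

```javascript
// Report whether globalLock.currentQueue.total stayed at or above a
// threshold in every sample, i.e. whether the lock queue is consistently high.
function lockQueueConsistentlyHigh(samples, threshold) {
  if (samples.length === 0) {
    return false;
  }
  return samples.every(function (s) {
    return s.globalLock.currentQueue.total >= threshold;
  });
}

// Usage in the mongo shell: collect samples a few seconds apart, e.g.
//   var samples = []; for (var i = 0; i < 5; i++) { samples.push(db.serverStatus()); sleep(2000); }
//   lockQueueConsistentlyHigh(samples, 10);
```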
Memory Usage MongoDB uses memory mapped files to store data. Given a data set of sufficient size, the MongoDB
process will allocate all available memory on the system for its use. While this is part of the design, and affords
MongoDB superior performance, the memory mapped files make it difficult to determine if the amount of RAM is
sufficient for the data set.
The memory usage metrics in the mem section of the serverStatus output can provide insight into MongoDB's memory use.
Check the resident memory use (i.e. mem.resident): if this exceeds the amount of system memory and there is a
significant amount of data on disk that isn't in RAM, you may have exceeded the capacity of your system.
You should also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of
system memory, some operations will require disk access (page faults) to read data from virtual memory and negatively
affect performance.
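The two checks above can be written as a small function. This sketch assumes mem.resident and mem.mapped are reported in megabytes and that the host's physical memory, also in megabytes, comes from db.hostInfo().system.memSizeMB; memoryPressure is a hypothetical name. mem.mapped only carries meaningful values with the MMAPv1 storage engine.

```javascript
// Compare mem.resident and mem.mapped (megabytes) from serverStatus()
// with the host's physical memory in megabytes.
function memoryPressure(serverStatus, systemMemMB) {
  var mem = serverStatus.mem;
  return {
    residentFraction: mem.resident / systemMemMB,
    residentExceedsRam: mem.resident > systemMemMB,
    // mem.mapped is only meaningful for the MMAPv1 storage engine.
    mappedExceedsRam: typeof mem.mapped === "number" && mem.mapped > systemMemMB
  };
}

// Usage in the mongo shell:
//   printjson(memoryPressure(db.serverStatus(), db.hostInfo().system.memSizeMB));
```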
Page Faults With the MMAPv1 storage engine, page faults can occur as MongoDB reads from or writes data to parts
of its data files that are not currently located in physical memory. In contrast, operating system page faults happen
when physical memory is exhausted and pages of physical memory are swapped to disk.
MongoDB reports the page faults it triggers as a running total since the process started. To check for page
faults, see the extra_info.page_faults value in the serverStatus output; the change between two samples gives the rate.
MongoDB on Windows counts both hard and soft page faults.
The MongoDB page fault counter may increase dramatically in moments of poor performance and may correlate
with limited physical memory environments. Page faults also can increase while accessing much larger data sets,
for example, scanning an entire collection. Limited and sporadic MongoDB page faults do not necessarily indicate a
problem or a need to tune the database.
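To tell sporadic page faults from sustained ones, one option is to turn a series of serverStatus samples into per-interval rates. This sketch assumes extra_info.page_faults is a running total and uses each sample's localTime field as its timestamp; pageFaultProfile and the threshold are hypothetical.

```javascript
// Given serverStatus() samples in time order, compute the page fault rate for
// each interval and the fraction of intervals whose rate exceeds a threshold.
// A high fraction suggests sustained faulting; a low one, sporadic faults.
function pageFaultProfile(samples, faultsPerSecThreshold) {
  var rates = [];
  for (var i = 1; i < samples.length; i++) {
    var prev = samples[i - 1];
    var cur = samples[i];
    // localTime is a Date; subtracting Dates yields milliseconds.
    var seconds = (cur.localTime - prev.localTime) / 1000;
    if (seconds <= 0) {
      continue;
    }
    rates.push((cur.extra_info.page_faults - prev.extra_info.page_faults) / seconds);
  }
  var high = rates.filter(function (r) { return r > faultsPerSecThreshold; }).length;
  return { rates: rates, fractionHigh: rates.length ? high / rates.length : 0 };
}
```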
A single page fault completes quickly and is not problematic. However, in aggregate, large volumes of page faults
typically indicate that MongoDB is reading too much data from disk. In many situations, MongoDB's read locks will
yield after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read
into memory. This approach improves concurrency, and also improves overall throughput in high volume systems.
Increasing the amount of RAM accessible to MongoDB may help reduce the frequency of page faults. If this is not
possible, you may want to consider deploying a sharded cluster or adding shards to your deployment to distribute load
among mongod instances.
See What are page faults? (page 754) for more information.
Number of Connections In some cases, the number of connections between the application layer (i.e. clients) and
the database can overwhelm the ability of the server to handle requests. This can produce performance irregularities.
The following fields in the serverStatus document can provide insight:
globalLock.activeClients contains a counter of the total number of clients with active operations in
progress or queued.
connections is a container for the following two fields:
current: the total number of clients currently connected to the database instance.
available: the total number of unused connections available for new clients.
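The connections fields can be summarized as a utilization ratio, since current plus available approximates the connection limit of the mongod. This is a minimal sketch; connectionUsage is a hypothetical name, and it also reads globalLock.activeClients.total from the same serverStatus document.

```javascript
// Summarize connection usage from serverStatus(): current + available is the
// effective connection limit for this mongod.
function connectionUsage(serverStatus) {
  var c = serverStatus.connections;
  var limit = c.current + c.available;
  return {
    current: c.current,
    available: c.available,
    utilization: limit > 0 ? c.current / limit : 0,
    activeClients: serverStatus.globalLock.activeClients.total
  };
}

// Usage in the mongo shell: printjson(connectionUsage(db.serverStatus()));
```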
If requests are high because there are numerous concurrent application requests, the database may have trouble keeping
up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy
applications increase the size of your replica set and distribute read operations to secondary members. For write heavy
applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod
instances.
Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported
MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently.
Extremely high numbers of connections, particularly without a corresponding workload, are often indicative of a driver or
other configuration error.
Unless constrained by system-wide limits, MongoDB has no limit on incoming connections. You can modify system
limits using the ulimit command, or by editing your system's /etc/sysctl file. See UNIX ulimit Settings
(page 281) for more information.
Database Profiling MongoDB's Profiler is a database profiling system that can help identify inefficient queries
and operations.
The following profiling levels are available:
Level | Setting
0 | Off. No profiling
1 | On. Only includes slow operations
2 | On. Includes all operations
Enable the profiler by setting the profile value using the following command in the mongo shell:
db.setProfilingLevel(1)
The slowOpThresholdMs setting defines what constitutes a slow operation. To set the threshold above
which the profiler considers operations slow (and thus, included in the level 1 profiling data), you can configure
slowOpThresholdMs at runtime as an argument to the db.setProfilingLevel() operation.
See the documentation of db.setProfilingLevel() for more information about this command.
By default, mongod records all slow queries to its log, as defined by slowOpThresholdMs.
Note: Because the database profiler can negatively impact performance, only enable profiling for strategic intervals
and as minimally as possible on production systems.
You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded
cluster.
You can view the output of the profiler in the system.profile collection of your database by issuing the show
profile command in the mongo shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (100, in this
example) is above the slowOpThresholdMs threshold.
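Once profiling data accumulates, grouping it by namespace and operation type can show where slow operations concentrate. This sketch assumes each system.profile document has ns, op, and millis fields; summarizeProfile is a hypothetical helper, and the usage comment reuses the query shown above.

```javascript
// Group system.profile documents by namespace and operation type, reporting
// how many were captured and their total and maximum duration in milliseconds.
function summarizeProfile(profileDocs) {
  var groups = {};
  profileDocs.forEach(function (doc) {
    var key = doc.ns + " " + doc.op;
    if (!groups[key]) {
      groups[key] = { ns: doc.ns, op: doc.op, count: 0, totalMillis: 0, maxMillis: 0 };
    }
    var g = groups[key];
    g.count += 1;
    g.totalMillis += doc.millis;
    g.maxMillis = Math.max(g.maxMillis, doc.millis);
  });
  return Object.keys(groups)
    .map(function (k) { return groups[k]; })
    .sort(function (a, b) { return b.totalMillis - a.totalMillis; });
}

// Usage in the mongo shell:
//   printjson(summarizeProfile(db.system.profile.find({ millis: { $gt: 100 } }).toArray()));
```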
See also:
Optimization Strategies for MongoDB (page 212) addresses strategies that may improve the performance of your
database queries and operations.
Beyond the basic monitoring requirements for any MongoDB instance, for replica sets, administrators must monitor
replication lag. Replication lag refers to the amount of time that it takes to copy (i.e. replicate) a write operation
on the primary to a secondary. Some small delay period may be acceptable, but two significant problems emerge as
replication lag grows:
First, operations that occurred during the period of lag are not replicated to one or more secondaries. If you're
using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
Second, if the replication lag exceeds the length of the operation log (oplog) then MongoDB will have to perform
an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. This is uncommon
under normal circumstances, but if you configure the oplog to be smaller than the default, the issue can arise.
Note: The size of the oplog is only configurable during the first run using the --oplogSize argument to the
mongod command, or preferably, the oplogSizeMB setting in the MongoDB configuration file. If you do not
specify this on the command line before running with the --replSet option, mongod will create a default
sized oplog.
By default, the oplog is 5 percent of total available disk space on 64-bit systems. For more information about
changing the oplog size, see the Change the Size of the Oplog (page 608).
For causes of replication lag, see Replication Lag (page 627).
Replication issues are most often the result of network connectivity issues between members, or the result of a primary
that does not have the resources to support application and replication traffic. To check the status of a replica set, use the
replSetGetStatus command or the following helper in the shell:
rs.status()
The replSetGetStatus reference provides a more in-depth overview of this output. In general, watch the
value of optimeDate, and pay particular attention to the time difference between the primary and the secondary
members.
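The optimeDate comparison can be automated. This sketch computes each secondary's lag, in seconds, relative to the primary from rs.status() output; it assumes exactly one member reports stateStr PRIMARY, and replicationLagSeconds is a hypothetical name.

```javascript
// Compute each secondary's lag behind the primary, in seconds, from the
// optimeDate fields of rs.status() output.
function replicationLagSeconds(replSetStatus) {
  var primary = replSetStatus.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    throw new Error("no PRIMARY member in replSetGetStatus output");
  }
  return replSetStatus.members
    .filter(function (m) { return m.stateStr === "SECONDARY"; })
    .map(function (m) {
      // Subtracting Dates yields milliseconds.
      return { name: m.name, lagSeconds: (primary.optimeDate - m.optimeDate) / 1000 };
    });
}

// Usage in the mongo shell: printjson(replicationLagSeconds(rs.status()));
```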
Sharding and Monitoring
In most cases, the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB
instances. In addition, clusters require further monitoring to ensure that data is effectively distributed among nodes
and that sharding operations are functioning appropriately.
See also:
See the Sharding Concepts (page 647) documentation for more information.
Config Servers The config database maintains a map identifying which documents are on which shards. The cluster
updates this map as chunks move between shards. When a configuration server becomes inaccessible, certain sharding
operations become unavailable, such as moving chunks and starting mongos instances. However, clusters remain
accessible from already-running mongos instances.
Because inaccessible configuration servers can seriously impact the availability of a sharded cluster, you should monitor your configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
MMS37 monitors config servers and can create notifications if a config server becomes inaccessible.
37 https://mms.mongodb.com/
Balancing and Chunk Distribution The most effective sharded cluster deployments evenly balance chunks among
the shards. To facilitate this, MongoDB has a background balancer process that distributes data to ensure that chunks
are always optimally distributed among the shards.
Issue the db.printShardingStatus() or sh.status() command to the mongos by way of the mongo
shell. This returns an overview of the entire cluster including the database name, and a list of the chunks.
Stale Locks In nearly every case, all locks used by the balancer are automatically released when they become stale.
However, because any long-lasting lock can block future balancing, it's important to ensure that all locks are legitimate.
To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following
command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
For active deployments, the above query can provide insights. The balancing process, which originates on a randomly
selected mongos, takes a special balancer lock that prevents other balancing activity from transpiring. Use the
following command, also to the config database, to check the status of the balancer lock.
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively using this lock.
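To look for locks that may be stale, a script can flag lock documents that have been held longer than expected. This sketch assumes config.locks documents carry state, who, why, and when fields, and that a state of 0 means the lock is not held; possiblyStaleLocks and the age threshold are hypothetical. A flagged lock is only a candidate for investigation, not proof that it is stale.

```javascript
// Return lock documents from config.locks that appear to be held (state other
// than 0, by assumption) and were taken more than maxAgeMinutes before `now`.
function possiblyStaleLocks(lockDocs, now, maxAgeMinutes) {
  var cutoffMs = maxAgeMinutes * 60 * 1000;
  return lockDocs
    .filter(function (l) { return l.state !== 0 && now - l.when > cutoffMs; })
    .map(function (l) {
      return { id: l._id, who: l.who, why: l.why, minutesHeld: (now - l.when) / 60000 };
    });
}

// Usage in the mongo shell, connected to a mongos:
//   printjson(possiblyStaleLocks(db.getSiblingDB("config").locks.find().toArray(), new Date(), 60));
```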
Run-time Database Configuration
The command line and configuration file interfaces provide MongoDB administrators with a large number of options and settings for controlling the operation of the database system. This document provides an overview
of common configurations and examples of best-practice configurations for common use cases.
While both interfaces provide access to the same collection of options and settings, this document primarily uses the
configuration file interface. If you run MongoDB using a control script or installed from a package for your operating
system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by checking the
contents of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control scripts start the
mongod with the appropriate configuration file (see below).
To start a MongoDB instance using this configuration, issue a command in either of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database
instance.
Configure the Database
Consider the following basic configuration:
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true
For most standalone servers, this is a sufficient base configuration. It makes several assumptions, explained
below:
fork is true, which enables daemon mode for mongod: it detaches (i.e. forks) the MongoDB process from
the current session and allows you to run the database as a conventional server.
bindIp is 127.0.0.1, which forces the server to only listen for requests on the localhost IP. Only bind to
secure interfaces that the application-level systems can access with access control provided by system network
filtering (i.e. firewall).
New in version 2.6: mongod installed from official .deb (page 17) and .rpm (page 7) packages have the
bind_ip configuration set to 127.0.0.1 by default.
port is 27017, which is the default MongoDB port for database instances. MongoDB can bind to any port.
You can also filter access based on port using network filtering tools.
Note: UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.
quiet is true. This disables all but the most critical entries in output/log file, and is not recommended for
production systems. If you do set this option, you can use setParameter to modify this setting at
runtime.
dbPath is /srv/mongodb, which specifies where MongoDB will store its data files. /srv/mongodb and
/var/lib/mongodb are popular locations. The user account that mongod runs under will need read and
write access to this directory.
systemLog.path is /var/log/mongodb/mongod.log, which is where mongod will write its output.
If you do not set this value, mongod writes all output to standard output (i.e. stdout).
logAppend is true, which ensures that mongod does not overwrite an existing log file following the server
start operation.
storage.journal.enabled is true, which enables journaling. Journaling ensures single instance write durability. 64-bit builds of mongod enable journaling by default. Thus, this setting may be redundant.
Given the default configuration, some of these values may be redundant. However, in many situations explicitly stating
the configuration increases overall system intelligibility.
Security Considerations
The following collection of configuration options is useful for limiting access to a mongod instance. Consider the
following:
bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
auth = true
Replication Configuration Replica set configuration is straightforward, and only requires that the replSetName
have a value that is consistent among all members of the set. Consider the following:
replSet = set0
Use descriptive names for sets. Once configured, use the mongo shell to add hosts to the replica set.
See also:
Replica set reconfiguration.
To enable authentication for the replica set, add the following option:
keyFile = /srv/mongodb/keyfile
New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets.
Setting keyFile enables authentication and specifies a key file that replica set members use when authenticating
to each other. The content of the key file is arbitrary, but must be the same on all members of the replica set and
mongos instances that connect to the set. The keyfile must be less than one kilobyte in size and may only contain
characters in the base64 set and the file must not have group or world permissions on UNIX systems.
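Before distributing a key file, a small check can catch the size, character-set, and permission problems described above. This Node.js sketch (keyFileProblems is a hypothetical helper) assumes that mongod ignores whitespace in the file; the permission check uses the numeric mode reported by fs.statSync().

```javascript
// Check key file contents against the constraints above: under one kilobyte,
// base64 characters only, and no group or world permission bits.
var BASE64_ONLY = /^[A-Za-z0-9+\/=]*$/;
var GROUP_WORLD_BITS = 63; // octal 077: rwx bits for group and others

function keyFileProblems(content, mode) {
  var problems = [];
  if (content.length >= 1024) {
    problems.push("key file must be less than one kilobyte");
  }
  // Assumption: whitespace is not part of the key, so ignore it here.
  if (!BASE64_ONLY.test(content.replace(/\s/g, ""))) {
    problems.push("key file may only contain base64 characters");
  }
  if ((mode & GROUP_WORLD_BITS) !== 0) {
    problems.push("key file must not have group or world permissions");
  }
  return problems;
}

// Usage (Node.js):
//   var fs = require("fs"); var path = "/srv/mongodb/keyfile";
//   console.log(keyFileProblems(fs.readFileSync(path, "utf8"), fs.statSync(path).mode));
```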
See also:
The Replica set Reconfiguration section for information regarding the process for changing a replica set during operation.
Additionally, consider the Replica Set Security (page 310) section for information on configuring authentication with
replica sets.
Finally, see the Replication (page 541) document for more information on replication in MongoDB and replica set
configuration in general.
Sharding Configuration Sharding requires a number of mongod instances with different configurations. The config servers store the cluster's metadata, while the cluster distributes data among one or more shard servers.
Note: Config servers are not replica sets.
Set up one or three config server instances as normal (page 192) mongod instances, and then add the following
configuration options:
configsvr = true
bind_ip = 10.8.0.12
port = 27001
This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there are
no port conflicts, and that your config server is accessible from all of your mongos and mongod instances.
To set up shards, configure two or more mongod instances using your base configuration (page 192), with the
shardsvr value for the clusterRole setting:
shardsvr = true
Finally, to establish the cluster, configure at least one mongos process with the following settings:
configdb = 10.8.0.12:27001
chunkSize = 64
You can specify multiple configDB instances by specifying hostnames and ports in the form of a comma separated
list. In general, avoid modifying the chunkSize from the default value of 64,38 and ensure this setting is consistent
among all mongos instances.
See also:
The Sharding (page 641) section of the manual for more information on sharding and cluster configuration.
Run Multiple Database Instances on the Same System
In many cases running multiple instances of mongod on a single system is not recommended. On some types of
deployments 39 and for testing purposes you may need to run more than one mongod on a single system.
In these cases, use a base configuration (page 192) for each instance, but consider the following configuration values:
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid
The dbPath value controls the location of the mongod instance's data directory. Ensure that each database has a
distinct and well labeled data directory. The pidFilePath controls where the mongod process places its process id
file. As this file tracks the specific mongod process, it is crucial that the file be unique and well labeled to make it easy to start
and stop these processes.
Create additional control scripts and/or adjust your existing MongoDB configuration and control script as needed to
control these processes.
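When scripting several instances, generating each instance's settings from one template helps keep data directories and PID files distinct. This sketch follows the /srv/mongodb/db0 naming above; the port and logpath conventions, and the name instanceSettings, are hypothetical choices for illustration.

```javascript
// Generate distinct settings for several mongod instances on one host.
// dbpath/pidfilepath follow the /srv/mongodb/db<N> naming; the port and
// logpath conventions are hypothetical.
function instanceSettings(count, basePort) {
  var configs = [];
  for (var i = 0; i < count; i++) {
    configs.push([
      "dbpath = /srv/mongodb/db" + i + "/",
      "pidfilepath = /srv/mongodb/db" + i + ".pid",
      "port = " + (basePort + i),
      "logpath = /var/log/mongodb/db" + i + ".log"
    ].join("\n"));
  }
  return configs;
}
```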
Diagnostic Configurations
The following configuration options control various mongod behaviors for diagnostic purposes. The following settings have default values tuned for general production purposes:
slowms = 50
profile = 1
verbose = true
objcheck = true
Use the base configuration (page 192) and add these options if you are experiencing some unknown issue or performance problem as needed:
slowOpThresholdMs configures the threshold above which a query is considered slow, for the purposes of the logging
system and the database profiler. The default value is 100 milliseconds. Set a lower value if the database
profiler does not return useful results, or a higher value to only log the longest running queries. See Optimization
Strategies for MongoDB (page 212) for more information on optimizing operations in MongoDB.
38 Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk
sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal.
39 Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod instances.
Additionally, you may find that multiple databases with small working sets may function acceptably on a single system.
mode sets the database profiler level. The profiler is not active by default because of the possible impact of the
profiler itself on performance. Unless this setting has a value, queries are not profiled.
verbosity controls the amount of logging output that mongod writes to the log. Only use this option if you
are experiencing an issue that is not reflected in the normal logging level.
wireObjectCheck forces mongod to validate all requests from clients upon receipt. Use this option to
ensure that invalid requests are not causing errors, particularly when running a database with untrusted clients.
This option may affect database performance.
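The diagnostic options above can also be passed on the mongod command line on top of a base configuration file. This Python sketch assembles such a command line using the mongod flags --config, --slowms, --profile, -v, and --objcheck; the helper name and the path /etc/mongod.conf are assumptions. Because the profiler only accepts levels 0 (off), 1 (slow operations only), and 2 (all operations), the sketch rejects any other level and defaults to 1.

```python
from __future__ import annotations

VALID_PROFILE_LEVELS = {0, 1, 2}  # 0 = off, 1 = slow operations only, 2 = all operations


def diagnostic_args(config_file: str, slowms: int = 50, profile: int = 1,
                    verbosity: int = 1, objcheck: bool = True) -> list[str]:
    """Return mongod arguments that add diagnostic options to a base config file."""
    if profile not in VALID_PROFILE_LEVELS:
        raise ValueError(f"profile must be one of {sorted(VALID_PROFILE_LEVELS)}, got {profile}")
    if slowms < 0:
        raise ValueError("slowms must be non-negative")
    args = ["mongod", "--config", config_file,
            "--slowms", str(slowms), "--profile", str(profile)]
    if verbosity > 0:
        args.append("-" + "v" * verbosity)  # -v, -vv, ... raise log verbosity
    if objcheck:
        args.append("--objcheck")  # validate client requests on receipt
    return args


print(" ".join(diagnostic_args("/etc/mongod.conf")))
```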
Import and Export MongoDB Data
This document provides an overview of the import and export programs included in the MongoDB distribution. These
tools are useful when you want to back up or export a portion of your data without capturing the state of the entire
database, or for simple data ingestion cases. For more complex data migration tasks, you may want to write your own
import and export scripts using a client driver to interact with the database itself. For disaster recovery protection and
routine database backup operations, use full database instance backups (page 182).
Warning: Because these tools primarily operate by interacting with a running mongod instance, they can impact
the performance of your running database.
Not only do these processes create traffic for a running database instance, they also force the database to read all
data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed
data, causing a deterioration in performance for the database's regular workload.
See also:
MongoDB Backup Methods (page 182) or MMS Backup Manual40 for more information on backing up MongoDB
instances. Additionally, consider the following references for the MongoDB import/export tools:
mongoimport
mongoexport
mongorestore
mongodump
Data Import, Export, and Backup Operations
For resilient and non-disruptive backups, use a file system or block-level disk snapshot function, such as the methods described in the MongoDB Backup Methods (page 182) document. The tools and operations discussed provide
functionality that is useful in the context of providing some kinds of backups.
In contrast, use import and export tools to back up a small subset of your data or to move data to or from a third-party
system. These backups may capture a small, crucial set of data or a frequently modified section of data for extra
insurance, or for ease of access.
Warning: mongoimport and mongoexport do not reliably preserve all rich BSON data types because JSON
can only represent a subset of the types supported by BSON. As a result, data exported or imported with these tools
may lose some measure of fidelity. See the Extended JSON reference for more information.
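To see why a JSON round trip can lose fidelity, consider a date value. The Python sketch below is not mongoexport; it only shows that plain JSON has no date type, so a datetime comes back from the round trip as a string unless some convention (such as MongoDB Extended JSON) records the original type. The sample document is hypothetical.

```python
import json
from datetime import datetime, timezone

# A hypothetical document whose "created" field is a real date value,
# comparable to a BSON date stored in MongoDB.
doc = {"name": "Ada", "created": datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc)}

# Plain JSON cannot hold a datetime, so serialize it as an ISO-8601 string.
exported = json.dumps(doc, default=lambda value: value.isoformat())

# Parsing the JSON back yields a str: nothing in the file says it was a date.
restored = json.loads(exported)

print(type(doc["created"]).__name__)       # datetime
print(type(restored["created"]).__name__)  # str
print(restored["created"])                 # 2015-03-01T12:00:00+00:00
```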
No matter how you decide to import or export your data, consider the following guidelines:
Label files so that you can identify the contents of the export or backup as well as the point in time the export or backup reflects.
40 https://docs.mms.mongodb.com/tutorial/nav/backup-use/
Chapter 5. Administration
Do not create or apply exports if the backup process itself will have an adverse effect on a production system.
Make sure that they reflect a consistent data state. Export or backup processes can impact data integrity (i.e.
type fidelity) and consistency if updates continue during the backup process.
Test backups and exports by restoring and importing to ensure that the backups are useful.
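One way to follow the labeling guideline is to put the database, the collection, and a UTC timestamp into every export file name. The Python helper below is a sketch of that idea; the <db>.<collection>.<timestamp>.<format> pattern is an assumption, not a MongoDB convention.

```python
from __future__ import annotations

from datetime import datetime, timezone


def export_filename(db: str, collection: str, when: datetime | None = None,
                    fmt: str = "json") -> str:
    """Return a file name recording what was exported and when (in UTC)."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{db}.{collection}.{stamp}.{fmt}"


print(export_filename("sales", "contacts",
                      datetime(2015, 3, 1, 12, 30, tzinfo=timezone.utc)))
# sales.contacts.20150301T123000Z.json
```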
Human Intelligible Import/Export Formats
This section describes a process to import/export a collection to a file in a JSON or CSV format.
The examples in this section use the MongoDB tools mongoimport and mongoexport. These tools may also be
useful for importing data into a MongoDB database from third party applications.
If you want to simply copy a database or collection from one instance to another, consider using the copydb,
clone, or cloneCollection commands, which may be more suited to this task. The mongo shell provides
the db.copyDatabase() method.
Collection Export with mongoexport With the mongoexport utility, you can create a backup file.
In its simplest invocation, the command takes the following form:
mongoexport --collection collection --out collection.json
This exports all documents in the collection named collection into the file collection.json. Without
the output specification (i.e. --out collection.json), mongoexport writes output to standard output (i.e.
stdout). You can further narrow the results by supplying a query filter with the --query option, and limit results to a
single database with the --db option. For instance:
mongoexport --db sales --collection contacts --query '{"field": 1}'
This command returns all documents in the sales database's contacts collection that have a field named field with
a value of 1. Enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment.
The resulting documents are written to standard output.
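When a script rather than a person runs mongoexport, you can sidestep shell quoting entirely. This Python sketch serializes the query with json.dumps and builds an argument list that could be passed to subprocess.run without shell=True, so no shell ever parses the query. shlex.join then shows the equivalent single-quoted command line for interactive use. The function name mongoexport_argv is hypothetical; the flags are the ones used above.

```python
from __future__ import annotations

import json
import shlex


def mongoexport_argv(db: str, collection: str, query: dict,
                     out: str | None = None) -> list[str]:
    """Build a mongoexport argument vector with the query as a JSON string."""
    argv = ["mongoexport", "--db", db, "--collection", collection,
            "--query", json.dumps(query)]
    if out is not None:
        argv += ["--out", out]
    return argv


argv = mongoexport_argv("sales", "contacts", {"field": 1})

# For a command line you intend to paste into a shell, shlex adds the
# single quotes around the query.
print(shlex.join(argv))
# mongoexport --db sales --collection contacts --query '{"field": 1}'

# To run it (requires mongoexport on PATH and a reachable mongod):
#   import subprocess; subprocess.run(argv, check=True)
```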
By default, mongoexport returns one JSON document per MongoDB document. Specify the --jsonArray
argument to return the export as a single JSON array. Use the --type=csv option to return the result in CSV
(comma separated values) format; CSV exports also require the --fields or --fieldFile option to name the fields to export.
Changed in version 3.0.0: mongoexport removed the --csv option and replaced it with the --type option.
If your mongod instance is not running, you can use the --dbpath option to specify the location of your MongoDB instance's database files. See the following example:
mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/
In this mode, mongoexport reads the data files directly and locks the data directory to prevent conflicting writes. The mongod process must
not be running or attached to these data files when you run mongoexport in this configuration.
The --host and --port options allow you to specify a non-local host to connect to when capturing the export. Consider
the following example:
mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection con
On any mongoexport command you may specify username and password credentials, as above.
5.1. Administration Concepts
Collection Import with mongoimport Use mongoimport to restore a backup taken with mongoexport. Most of the arguments
to mongoexport also exist for mongoimport.
Consider the following command:
mongoimport --collection collection --file collection.json
This imports the contents of the file collection.json into the collection named collection. If you do not
specify a file with the --file option, mongoimport accepts input over standard input (i.e. stdin).
If you specify the --upsert option, all mongoimport operations attempt to update existing documents
in the database and insert the other documents. This option may affect performance, depending on your
configuration.
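The difference between a plain import and an --upsert import can be modeled in a few lines. This Python sketch is a simplified, in-memory model, not mongoimport itself: it assumes documents are matched on _id, treats the upsert as a whole-document replacement, and counts a duplicate _id without upsert as skipped instead of modeling the server's duplicate-key error. The function name is hypothetical.

```python
from __future__ import annotations


def import_documents(collection: dict, docs: list[dict], upsert: bool = False) -> dict:
    """Import docs into collection (a dict mapping _id -> document).

    New _id values are always inserted. An existing _id is replaced when
    upsert is True and skipped otherwise.
    """
    stats = {"inserted": 0, "replaced": 0, "skipped": 0}
    for doc in docs:
        key = doc["_id"]
        if key not in collection:
            collection[key] = doc
            stats["inserted"] += 1
        elif upsert:
            collection[key] = doc  # overwrite the existing document
            stats["replaced"] += 1
        else:
            stats["skipped"] += 1
    return stats


contacts = {1: {"_id": 1, "email": "old@example.net"}}
print(import_documents(contacts, [{"_id": 1, "email": "new@example.net"},
                                  {"_id": 2, "email": "b@example.net"}]))
# {'inserted': 1, 'replaced': 0, 'skipped': 1}
```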
Use the --db option to import these documents into a particular database. If your MongoDB
instance is not running, use the --dbpath option to specify the location of your MongoDB instance's database
files. Consider using the --journal option to ensure that mongoimport records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport in this
configuration.
Use the --ignoreBlanks option to ignore blank fields. For CSV and TSV imports, this option provides the
desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
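The effect of --ignoreBlanks on CSV input can be illustrated with Python's csv module. In this sketch, blank fields are dropped from each row before it becomes a document; all values remain strings, and the sample data and function name are hypothetical.

```python
import csv
import io


def rows_to_documents(csv_text: str, ignore_blanks: bool = False) -> list:
    """Turn CSV text with a header row into dicts, optionally dropping blank fields."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if ignore_blanks:
            row = {key: value for key, value in row.items() if value != ""}
        docs.append(dict(row))
    return docs


sample = "name,email,phone\nAda,ada@example.net,\nGrace,,555-0100\n"
print(rows_to_documents(sample, ignore_blanks=True))
# [{'name': 'Ada', 'email': 'ada@example.net'}, {'name': 'Grace', 'phone': '555-0100'}]
```

Without ignore_blanks, each document would carry empty-string fields such as 'phone': ''.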
Production Notes
This page details system configurations that affect MongoDB, especially in production.
Note: MongoDB Management Service (MMS)41 , a hosted service, and Ops Manager42 , an on-premise solution,
provide monitoring, backup, and automation of MongoDB instances. See the MMS documentation43 and Ops Manager
documentation44 for more information.
MongoDB
Storage Engines Changed in version 3.0: MongoDB includes support for two storage engines: MMAPv1 (page 89),
the storage engine available in previous versions of MongoDB, and WiredTiger (page 89). MongoDB uses the
MMAPv1 engine by default.
The files in the dbPath directory must correspond to the configured storage engine. mongod will not start if dbPath
contains data files created by a storage engine other than the one specified by --storageEngine.
Supported Platforms MongoDB distributions are currently available for Mac OS X, Linux, Windows Server 2008
R2 64bit, Windows 7 (64 bit), Windows Vista, and Solaris. The MongoDB distribution for Solaris does not include
support for the WiredTiger storage engine (page 89).
For a full list of the recommended operating systems for production deployments, see: Recommended Operating
Systems for Production Deployments (page 5).
41 https://mms.mongodb.com/
42 https://www.mongodb.com/products/mongodb-enterprise-advanced
43 https://docs.mms.mongodb.com/
44 https://docs.opsmanager.mongodb.com
See also:
Platform Specific Considerations (page 203)
Use the Latest Stable Packages Be sure you have the latest stable release.
All releases are available on the Downloads45 page. The Downloads46 page is a good place to verify the current stable
release, even if you are installing via a package manager.
Use 64-bit Builds Always use 64-bit builds for production.
Although the 32-bit builds exist, they are unsuitable for production deployments. 32-bit builds also do not support
the WiredTiger storage engine. For more information, see the 32-bit limitations page (page 726).
Concurrency
MMAPv1 Changed in version 3.0: Beginning with MongoDB 3.0, MMAPv1 (page 89) provides collection-level
locking: each collection has its own readers-writer lock, which allows multiple clients to modify documents in different
collections at the same time.
For the MongoDB 2.2 through 2.6 series, each database has a readers-writer lock that allows concurrent read access to a database, but gives exclusive access to a single write operation per database. See the Concurrency (page 738)
page for more information. In earlier versions of MongoDB, all write operations contended for a single readers-writer
lock for the entire mongod instance.
WiredTiger WiredTiger (page 89) supports concurrent access by readers and writers to the documents in a collection. Clients can read documents while write operations are in progress, and multiple threads can modify different
documents in a collection at the same time.
See also:
Manage Connection Pool Sizes (page 200), Allocate Sufficient RAM and CPU (page 200)
Data Consistency
Journaling MongoDB uses write ahead logging to an on-disk journal. Journaling guarantees that MongoDB can
quickly recover write operations (page 71) that were not written to data files in cases where mongod terminated as a
result of a crash or other serious failure.
Leave journaling enabled in order to ensure that mongod will be able to recover its data files and keep the data files
in a valid state following a crash. See Journaling (page 300) for more information.
Write Concern Write concern describes the guarantee that MongoDB provides when reporting on the success of
a write operation. The strength of the write concern determines the level of guarantee. When inserts, updates and
deletes have a weak write concern, write operations return quickly. In some failure cases, write operations issued with
weak write concerns may not persist. With stronger write concerns, clients wait after sending a write operation for
MongoDB to confirm the write operations.
MongoDB provides different levels of write concern to better address the specific needs of applications. Clients
may adjust write concern to ensure that the most important operations persist successfully to an entire MongoDB
45 http://www.mongodb.org/downloads
46 http://www.mongodb.org/downloads
deployment. For other less critical operations, clients can adjust the write concern to ensure faster performance rather
than ensure persistence to the entire deployment.
See the Write Concern (page 76) document for more information about choosing an appropriate write concern level
for your deployment.
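To make the weak versus strong distinction concrete, the Python sketch below builds insert command documents that carry an explicit writeConcern: w: 1 waits for acknowledgment from a single mongod (the primary in a replica set), w: "majority" waits for a majority of replica set members, j: true also waits for the journal, and wtimeout bounds the wait in milliseconds. The helper name and the collections are hypothetical; you would pass the resulting document to your driver's command method.

```python
def insert_command(collection, documents, w=1, j=False, wtimeout_ms=0):
    """Build an insert command document with an explicit write concern.

    The command name ("insert") must be the first key; Python 3.7+ dicts
    keep insertion order, so the literal below preserves it.
    """
    write_concern = {"w": w}
    if j:
        write_concern["j"] = True  # also wait for the journal commit
    if wtimeout_ms:
        write_concern["wtimeout"] = wtimeout_ms  # stop waiting after this many ms
    return {"insert": collection, "documents": documents,
            "writeConcern": write_concern}


# Less critical data: return as soon as one mongod acknowledges the write.
page_view = insert_command("events", [{"page": "/home"}])

# Critical data: wait for a majority of members and the journal, up to 5 s.
order = insert_command("orders", [{"total": 42}],
                       w="majority", j=True, wtimeout_ms=5000)

print(page_view["writeConcern"])  # {'w': 1}
print(order["writeConcern"])      # {'w': 'majority', 'j': True, 'wtimeout': 5000}
```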
Networking
Use Trusted Networking Environments Always run MongoDB in a trusted environment, with network rules that
prevent access from all unknown machines, systems, and networks. As with any sensitive system that is dependent on
network access, your MongoDB deployment should only be accessible to specific systems that require access, such as
application servers, monitoring services, and other MongoDB components.
Note: By default, authorization (page 312) is not enabled, and mongod assumes a trusted environment. Enable
authorization mode as needed. For more information on authentication mechanisms supported in MongoDB as
well as authorization in MongoDB, see Authentication (page 308) and Authorization (page 312).
For additional information and considerations on security, refer to the documents in the Security Section (page 305),
specifically:
Security Checklist (page 322)
Configuration Options (page 314)
Firewalls (page 315)
Network Security Tutorials (page 324)
For Windows users, consider the Windows Server Technet Article on TCP Configuration47 when deploying MongoDB
on Windows.
Disable HTTP Interfaces MongoDB provides interfaces to check the status of the server and, optionally, run queries
on it, over HTTP. In production environments, disable the HTTP interfaces.
See HTTP Status Interface (page 317).
Manage Connection Pool Sizes To avoid overloading the connection resources of a single mongod or mongos
instance, ensure that clients maintain reasonable connection pool sizes.
When using the WiredTiger storage engine (page 89), the number of incoming connections to WiredTiger should be
less than or equal to the number of cores available on the machine.
The connPoolStats command returns information regarding the number of open connections to the current
database for mongos and mongod instances in sharded clusters.
Hardware Considerations
MongoDB is designed specifically with commodity hardware in mind and has few hardware requirements or limitations. MongoDB's core components run on little-endian hardware, primarily x86/x86_64 processors. Client libraries
(i.e. drivers) can run on big or little endian systems.
Allocate Sufficient RAM and CPU
47 http://technet.microsoft.com/en-us/library/dd349797.aspx
MMAPv1 The MMAPv1 storage engine is not CPU bound due to its concurrency model. As such, increasing the
number of cores can help but does not provide significant return.
Increasing the amount of RAM accessible to MongoDB may help reduce the frequency of page faults.
WiredTiger The WiredTiger storage engine is CPU bound since the number of connections to WiredTiger should
be less than or equal to the number of cores available on the machine. As such, increasing the number of cores can
improve performance.
If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM
available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the
amount of RAM available in the container. The exact amount depends on the other processes running in the container.
The size of the WiredTiger cache should be sufficient to hold the entire working set for the mongod. To adjust the
size of the WiredTiger cache, see storage.wiredTiger.engineConfig.cacheSizeGB.
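Choosing a cacheSizeGB value for a container is simple arithmetic once you decide how much memory to leave for everything else. This Python sketch assumes cacheSizeGB takes a whole number of gigabytes and that the reserve figure is your own estimate; MongoDB does not prescribe one, and the function name is hypothetical.

```python
import math


def wiredtiger_cache_gb(container_ram_gb: float, reserved_gb: float) -> int:
    """Return a whole-number cacheSizeGB strictly below the container's RAM.

    reserved_gb is the memory you leave for everything else in the
    container (mongod's non-cache memory, other processes). It must be
    positive so that the cache stays below the container limit.
    """
    if reserved_gb <= 0:
        raise ValueError("reserved_gb must be positive")
    size = math.floor(container_ram_gb - reserved_gb)
    if size < 1:
        raise ValueError("not enough container RAM for a 1 GB cache with this reserve")
    return size


# An 8 GB container with 3 GB reserved for other memory use:
print(wiredtiger_cache_gb(8, 3))  # 5
```

The result would go into storage.wiredTiger.engineConfig.cacheSizeGB; whether that size also holds the working set is a separate check.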
See also:
Concurrency (page 199)
Use Solid State Disks (SSDs) MongoDB has good results and a good price-performance ratio with SATA SSD
(Solid State Disk).
Use SSD if available and economical. Spinning disks can be performant, but SSDs' capacity for random I/O operations
works well with the update model of MMAPv1.
Commodity (SATA) spinning drives are often a good option, as the random I/O performance increase with more
expensive spinning drives is not that dramatic (only on the order of 2x). Using SSDs or increasing RAM may be more
effective in increasing I/O throughput.
MongoDB and NUMA Hardware Running MongoDB on a system with Non-Uniform Memory Access (NUMA)
can cause a number of operational problems, including slow performance for periods of time and high system process
usage.
When running MongoDB servers and clients on NUMA hardware, you should configure a memory interleave policy so
that the host behaves in a non-NUMA fashion. MongoDB checks NUMA settings on start up when deployed on Linux
(since version 2.0) and Windows (since version 2.6) machines. If the NUMA configuration may degrade performance,
MongoDB prints a warning.
See The MySQL swap insanity problem and the effects of NUMA48 post, which describes the effects of NUMA on
databases. This blog post addresses the impact of NUMA for MySQL, but the issues for MongoDB are similar. The
post introduces NUMA and its goals, and illustrates how these goals are not compatible with production databases.
Configuring NUMA on Windows On Windows, memory interleaving must be enabled through the machine's
BIOS. Please consult your system documentation for details.
Configuring NUMA on Linux When running MongoDB on Linux you may instead use the numactl command
and start the MongoDB programs (mongod, mongos, or clients) in the following manner:
numactl --interleave=all <path>
where <path> is the path to the program you are starting. Then, disable zone reclaim in the proc settings using the
following command:
echo 0 > /proc/sys/vm/zone_reclaim_mode
48 http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
To fully disable NUMA behavior, you must perform both operations. For more information, see the Documentation
for /proc/sys/vm/*49 .
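On a Linux host you can confirm that zone reclaim is off by reading /proc/sys/vm/zone_reclaim_mode, where 0 means disabled. This Python sketch separates parsing from file access so it can be tested off Linux; it returns None where the file does not exist. It does not verify the numactl interleave policy, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")


def zone_reclaim_disabled(raw: str) -> bool:
    """True if the file contents mean zone reclaim is off (value 0)."""
    return int(raw.strip()) == 0


def check_host(path: Path = ZONE_RECLAIM) -> bool | None:
    """Read the live setting; return None where the file is absent."""
    try:
        return zone_reclaim_disabled(path.read_text())
    except FileNotFoundError:
        return None


print(check_host())
```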
Disk and Storage Systems
Swap Assign swap space for your systems. Allocating swap space can avoid issues with memory contention and
can prevent the OOM Killer on Linux systems from killing mongod.
For the MMAPv1 storage engine, the method mongod uses to map files to memory ensures that the operating system
will never store MongoDB data in swap space.
The WiredTiger storage engine also ensures that the operating system does not store MongoDB data in swap space.
RAID Most MongoDB deployments should use disks backed by RAID-10.
RAID-5 and RAID-6 do not typically provide sufficient performance to support a MongoDB deployment.
Avoid RAID-0 with MongoDB deployments. While RAID-0 provides good write performance, it also provides limited
availability and can lead to reduced performance on read operations, particularly when using Amazon's EBS volumes.
Remote Filesystems With the MMAPv1 storage engine, the Network File System protocol (NFS) is not recommended as you may see performance problems when both the data files and the journal files are hosted on NFS. You
may experience better performance if you place the journal on local or iSCSI volumes.
The WiredTiger storage engine does not experience performance issues with NFS.
If you decide to use NFS, add the following NFS options to your /etc/fstab file: bg, nolock, and noatime.
Separate Components onto Different Storage Devices For improved performance, consider separating your
database's data, journal, and logs onto different storage devices, based on your application's access and write patterns.
For the WiredTiger storage engine, you can also store the indexes on a different storage device. See
storage.wiredTiger.engineConfig.directoryForIndexes.
Note: Using different storage devices will affect your ability to create snapshot-style backups of your data, since the
files will be on different devices and volumes.
Scheduling for Virtual Devices Local block devices attached to virtual machine instances via the hypervisor should
use a noop scheduler for best performance. The noop scheduler allows the operating system to defer I/O scheduling to
the underlying hypervisor.
Architecture
Replica Sets See the Replica Set Architectures (page 553) document for an overview of architectural considerations
for replica set deployments.
49 http://www.kernel.org/doc/Documentation/sysctl/vm.txt
Sharded Clusters See the Sharded Cluster Production Architecture (page 652) document for an overview of recommended sharded cluster architectures for production deployments.
See also:
Design Notes (page 215)
Compression
WiredTiger can compress collection data using either the snappy or the zlib compression library. snappy provides a lower
compression rate but has little performance cost, whereas zlib provides a better compression rate but has a higher
performance cost.
By default, WiredTiger uses the snappy compression library.
To change the compression setting, see
storage.wiredTiger.collectionConfig.blockCompressor.
WiredTiger uses prefix compression on all indexes by default.
Platform Specific Considerations
Note: MongoDB uses the GNU C Library50 (glibc) if available on a system. MongoDB requires at least version
glibc-2.12-1.2.el6 to avoid a known bug with earlier versions. For best results use at least version 2.13.
MongoDB on Linux
Kernel and File Systems When running MongoDB in production on Linux, it is recommended that you use Linux
kernel version 2.6.36 or later.
With the MMAPv1 storage engine, MongoDB preallocates its database files before using them and often creates large
files. As such, you should use the Ext4 or XFS file system:
In general, if you use the Ext4 file system, use at least version 2.6.23 of the Linux Kernel.
In general, if you use the XFS file system, use at least version 2.6.25 of the Linux Kernel.
Some Linux distributions require different versions of the kernel to support using ext4 and/or xfs:
Linux Distribution | Filesystem | Kernel Version
CentOS 5.5 | ext4, xfs | 2.6.18-194.el5
CentOS 5.6 | ext4, xfs | 2.6.18-3.0.el5
CentOS 5.8 | ext4, xfs | 2.6.18-308.8.2.el5
CentOS 6.1 | ext4, xfs | 2.6.32-131.0.15.el6.x86_64
RHEL 5.6 | ext4 | 2.6.18-3.0
RHEL 6.0 | xfs | 2.6.32-71
Ubuntu 10.04.4 LTS | ext4, xfs | 2.6.32-38-server
Amazon Linux AMI release 2012.03 | ext4 | 3.2.12-3.2.4.amzn1.x86_64
fsync() on Directories
Important: MongoDB requires a filesystem that supports fsync() on directories. For example, HGFS and VirtualBox's
shared folders do not support this operation.
50 http://www.gnu.org/software/libc/
Recommended Configuration For both the MMAPv1 and WiredTiger storage engines, consider the
following recommendations:
Turn off atime for the storage volume containing the database files.
Set the file descriptor limit, -n, and the user process limit (ulimit), -u, above 20,000, according to the suggestions in the ulimit (page 281) document. A low ulimit will affect MongoDB when under heavy use and can
produce errors and lead to failed connections to MongoDB processes and loss of service.
Disable Transparent Huge Pages, as MongoDB performs better with normal (4096 bytes) virtual memory pages.
See Transparent Huge Pages Settings (page 285).
Disable NUMA in your BIOS. If that is not possible, see MongoDB on NUMA Hardware (page 201).
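To compare current limits with the 20,000 guideline, Python's resource module (Unix only) exposes the same soft limits that ulimit -n and ulimit -u report. This is a sketch: it checks the limits of the process running the script, which may differ from the limits your init system applies to the mongod service. The function name is hypothetical.

```python
import resource

MIN_LIMIT = 20_000  # "above 20,000", per the recommendation above


def limits_report(minimum: int = MIN_LIMIT) -> dict:
    """Compare the soft open-file (ulimit -n) and process (ulimit -u) limits to minimum."""
    report = {}
    for label, which in (("open files (-n)", resource.RLIMIT_NOFILE),
                         ("processes (-u)", resource.RLIMIT_NPROC)):
        soft, _hard = resource.getrlimit(which)
        unlimited = soft == resource.RLIM_INFINITY
        report[label] = {
            "soft": "unlimited" if unlimited else soft,
            "ok": unlimited or soft > minimum,
        }
    return report


for label, result in limits_report().items():
    status = "OK" if result["ok"] else "too low"
    print(f"{label}: {result['soft']} ({status})")
```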
For the MMAPv1 storage engine:
Ensure that readahead settings for the block devices that store the database files are appropriate. For random
access use patterns, set low readahead values. A readahead of 32 (16 KB) often works well; blockdev expresses readahead in 512-byte sectors.
For a standard block device, you can run sudo blockdev --report to get the readahead settings and
sudo blockdev --setra <value> <device> to change the readahead settings. Refer to your specific operating system manual for more information.
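The readahead figures are easier to reason about once the unit is explicit: blockdev reports and sets readahead in 512-byte sectors, so a value of 32 corresponds to 16 KB. The Python helpers below do that conversion in both directions; they are illustrative only, and their names are hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --getra/--setra count 512-byte sectors


def readahead_kib(sectors: int) -> float:
    """Convert a blockdev readahead value (sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


def sectors_for_kib(kib: int) -> int:
    """Convert a desired readahead in KiB to the sector count blockdev expects."""
    return kib * 1024 // SECTOR_BYTES


print(readahead_kib(32))    # 16.0
print(sectors_for_kib(16))  # 32
```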
For all MongoDB deployments:
Use the Network Time Protocol (NTP) to synchronize time among your hosts. This is especially important in
sharded clusters.
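SSL Libraries On Linux platforms, you may observe one of the following statements in the MongoDB log:
<path to SSL libs>/libssl.so.<version>: no version information available (required by /usr/bin/mongod)
<path to SSL libs>/libcrypto.so.<version>: no version information available (required by /usr/bin/mongod)
These warnings indicate that the system's SSL libraries are different from the SSL libraries that the mongod was
compiled against. Typically these messages do not require intervention; however, you can use the following operations
to determine the symbol versions that mongod expects: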
objdump -T <path to mongod>/mongod | grep " SSL_"
objdump -T <path to mongod>/mongod | grep " CRYPTO_"
These operations will return output that resembles one of the following lines:
0000000000000000      DF *UND*  0000000000000000  libssl.so.10 SSL_write
0000000000000000      DF *UND*  0000000000000000  OPENSSL_1.0.0 SSL_write
The last two strings in this output are the symbol version and symbol name. Compare these values with the values
returned by the following operations to detect symbol version mismatches:
objdump -T <path to SSL libs>/libssl.so.1*
objdump -T <path to SSL libs>/libcrypto.so.1*
This procedure is neither exact nor exhaustive: many symbols used by mongod from the libcrypto library do not
begin with CRYPTO_.
MongoDB on Windows
MongoDB 2.6.6 and Later Using MMAPv1 Microsoft has released a hotfix for Windows 7 and Windows Server
2008 R2, KB2731284 51 , that repairs a bug in these operating systems' use of memory-mapped files that adversely
affects the performance of MongoDB using the MMAPv1 storage engine.
Install this hotfix to obtain significant performance improvements on MongoDB 2.6.6 and later releases in the 2.6
series, which use MMAPv1 exclusively, and on 3.0 and later when using MMAPv1 as the storage engine.
MongoDB 3.0 Using WiredTiger For MongoDB instances using the WiredTiger storage engine, performance on
Windows is comparable to performance on Linux.
MongoDB on Virtual Environments This section describes considerations when running MongoDB in some of the
more common virtual environments.
For all platforms, consider Scheduling for Virtual Devices (page 202).
EC2 MongoDB is compatible with EC2.
You may alternately choose to obtain a set of Amazon Machine Images (AMI) that bundle together MongoDB and
Amazon's Provisioned IOPS storage volumes. Provisioned IOPS can greatly increase MongoDB's performance and
ease of use. For more information, see this blog post52 .
Azure For all MongoDB deployments using Azure, you must mount the volume that hosts the mongod instance's
dbPath with the Host Cache Preference READ/WRITE.
This applies to all Azure deployments, using any guest operating system.
If your volumes have inappropriate cache settings, MongoDB may eventually shut down with the following error:
[DataFileSync] FlushViewOfFile for <data file> failed with error 1 ...
[DataFileSync] Fatal Assertion 16387
These shut downs do not produce data loss when storage.journal.enabled is set to true. You can safely
restart mongod at any time following this event.
The performance characteristics of MongoDB may change with READ/WRITE caching enabled.
The TCP keepalive on the Azure load balancer is 240 seconds by default, which can cause it to silently drop connections if the TCP keepalive on your Azure systems is greater than this value. You should set tcp_keepalive_time
to 120 to ameliorate this problem.
On Linux systems you can use the following operation to check the value of tcp_keepalive_time:
cat /proc/sys/net/ipv4/tcp_keepalive_time
The value is measured in seconds. You can change the tcp_keepalive_time value with the following operation:
echo <value> > /proc/sys/net/ipv4/tcp_keepalive_time
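On Linux, the comparison against the Azure load balancer's 240-second value can be scripted. This Python sketch parses the contents of /proc/sys/net/ipv4/tcp_keepalive_time and reports whether the keepalive time does not exceed that value; the function name is hypothetical, and it only reads the setting without changing it.

```python
from pathlib import Path

AZURE_LB_IDLE_SECONDS = 240  # Azure load balancer value described above
KEEPALIVE_FILE = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def keepalive_within(raw: str, limit: int = AZURE_LB_IDLE_SECONDS) -> bool:
    """True if the keepalive time (in seconds) does not exceed limit."""
    return int(raw.strip()) <= limit


if KEEPALIVE_FILE.exists():
    print(keepalive_within(KEEPALIVE_FILE.read_text()))
```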
For Windows systems, issue the following command to view the keep alive setting:
reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime
The registry value is not present by default. The system default, used if the value is absent, is 7200000 milliseconds
or 0x6ddd00 in hexadecimal. To set a shorter keep alive period use the following invocation in an Administrator
Command Prompt, where <value> is expressed in hexadecimal (e.g. 0x01d4c0 is 120000):
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d <value>
51 http://support.microsoft.com/kb/2731284
52 http://www.mongodb.com/blog/post/provisioned-iops-aws-marketplace-significantly-boosts-mongodb-performance-ease-use
Windows users should consider the Windows Server Technet Article on KeepAliveTime53 for more information on
setting keep alive for MongoDB deployments on Windows systems.
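Because KeepAliveTime is entered in hexadecimal but usually reasoned about in milliseconds, a small conversion helper avoids arithmetic slips. This Python sketch formats a millisecond value as a zero-padded 0x string suitable for the registry value; the six-digit padding is only a cosmetic choice, and the function name is hypothetical.

```python
def keepalive_registry_value(milliseconds: int) -> str:
    """Format a KeepAliveTime value (milliseconds) as a hex string."""
    if milliseconds <= 0:
        raise ValueError("KeepAliveTime must be positive")
    return f"0x{milliseconds:06x}"


print(keepalive_registry_value(120_000))    # 0x01d4c0  (120 seconds)
print(keepalive_registry_value(7_200_000))  # 0x6ddd00  (the 2-hour system default)
```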
VMWare MongoDB is compatible with VMWare. As some users have run into issues with VMWare's memory
overcommit feature, disabling the feature is recommended.
It is possible to clone a virtual machine running MongoDB. You might use this function to spin up a new virtual host
to add as a member of a replica set. If you clone a VM with journaling enabled, the clone snapshot will be valid. If
not using journaling, first stop mongod, then clone the VM, and finally, restart mongod.
MongoDB on Solaris The MongoDB distribution for Solaris does not include support for the WiredTiger storage
engine (page 89).
Performance Monitoring
iostat On Linux, use the iostat command to check if disk I/O is a bottleneck for your database. Specify a number
of seconds when running iostat to avoid displaying stats covering the time since server boot.
For example, the following command will display extended statistics and the time for each displayed report, with
traffic in MB/s, at one second intervals:
iostat -xmt 1
To make backups of your MongoDB database, please refer to MongoDB Backup Methods Overview (page 182).
Additional Resources
Blog Post: Capacity Planning and Hardware Provisioning for MongoDB In Ten Minutes55
Whitepaper: MongoDB Multi-Data Center Deployments56
Whitepaper: Security Architecture57
Whitepaper: MongoDB Architecture Guide58
53 https://technet.microsoft.com/en-us/library/cc957549.aspx
54 http://www.gropp.org/?id=projects&sub=bwm-ng
55 https://www.mongodb.com/blog/post/capacity-planning-and-hardware-provisioning-mongodb-ten-minutes
56 http://www.mongodb.com/lp/white-paper/multi-dc
57 https://www.mongodb.com/lp/white-paper/mongodb-security-architecture
58 https://www.mongodb.com/lp/whitepaper/architecture-guide
Operational Overview MongoDB includes a number of features that allow database administrators and developers
to segregate application operations to MongoDB deployments by functional or geographical groupings.
This capability provides data center awareness, which allows applications to target MongoDB deployments with
consideration of the physical location of the mongod instances. MongoDB supports segmentation of operations
across different dimensions, which may include multiple data centers and geographical regions in multi-data center
deployments, racks, networks, or power circuits in single data center deployments.
MongoDB also supports segregation of database operations based on functional or operational parameters, to ensure
that certain mongod instances are only used for reporting workloads or that certain high-frequency portions of a
sharded collection only exist on specific shards.
Specifically, with MongoDB, you can:
ensure write operations propagate to specific members of a replica set, or to specific members of replica sets.
59 http://www.mongodb.com/presentations/webinar-mongodb-administration-101
The Write Concern (page 76) and Read Preference (page 568) documents, which address capabilities related to
data center awareness.
Deploy a Geographically Redundant Replica Set (page 588).
Additional Resource
Capped collections guarantee preservation of the insertion order. As a result, queries do not need an index to
return documents in insertion order. Without this indexing overhead, they can support higher insertion throughput.
Capped collections guarantee that insertion order is identical to the order on disk (natural order) and do so
by prohibiting updates that increase document size. Capped collections only allow updates that fit the original
document size, which ensures a document does not change its location on disk.
Capped collections automatically remove the oldest documents in the collection without requiring scripts or
explicit remove operations.
For example, the oplog.rs collection that stores a log of the operations in a replica set uses a capped collection.
Consider the following potential use cases for capped collections:
Store log information generated by high-volume systems. Inserting documents in a capped collection without
an index is close to the speed of writing log information directly to a file system. Furthermore, the built-in
first-in-first-out property maintains the order of events, while managing storage use.
Cache small amounts of data in a capped collection. Since caches are read-heavy rather than write-heavy, you would
either need to ensure that this collection always remains in the working set (i.e. in RAM) or accept some write
penalty for the required index or indexes.
Recommendations and Restrictions
You can only make in-place updates of documents. If the update operation causes a document to grow beyond
its original size, the update operation will fail.
If you plan to update documents in a capped collection, create an index so that these update operations do not
require a table scan.
If you update a document in a capped collection to a size smaller than its original size, and then a secondary
resyncs from the primary, the secondary will replicate and allocate space based on the current smaller document
size. If the primary then receives an update which increases the document back to its original size, the primary
will accept the update but the secondary will fail with a failing update: objects in a capped
ns cannot grow error message.
To prevent this error, create your secondary from a snapshot of one of the other up-to-date members of the
replica set. Follow our tutorial on filesystem snapshots (page 241) to seed your new secondary.
Seeding the secondary with a filesystem snapshot is the only way to guarantee the primary and secondary binary
files are compatible. MMS Backup snapshots are insufficient in this situation since you need more than the
content of the secondary to match the primary.
You cannot delete documents from a capped collection. To remove all documents from a collection, use the
drop() method to drop the collection.
You cannot shard a capped collection.
Capped collections created after 2.2 have an _id field and an index on the _id field by default. Capped
collections created before 2.2 do not have an index on the _id field by default. If you are using capped
collections with replication prior to 2.2, you should explicitly create an index on the _id field.
Warning: If you have a capped collection in a replica set outside of the local database, before 2.2,
you should create a unique index on _id. Ensure uniqueness using the unique: true option to
the createIndex() method or by using an ObjectId for the _id field. Alternatively, you can pass the
autoIndexId option to the create command when creating the capped collection, as in the Query a Capped Collection (page 210) procedure.
Use natural ordering to retrieve the most recently inserted elements from the collection efficiently. This is
(somewhat) analogous to tail on a log file.
The aggregation pipeline operator $out cannot write results to a capped collection.
Procedures
Create a Capped Collection
You must create capped collections explicitly using the createCollection()
method, which is a helper in the mongo shell for the create command. When creating a capped collection you must
specify the maximum size of the collection in bytes, which MongoDB will pre-allocate for the collection. The size of
the capped collection includes a small amount of space for internal overhead.
db.createCollection( "log", { capped: true, size: 100000 } )
If the size field is less than or equal to 4096, then the collection will have a cap of 4096 bytes. Otherwise, MongoDB
will raise the provided size to make it an integer multiple of 256.
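The size adjustment above can be expressed as a small function. This is a sketch of the rule as stated in this section, not MongoDB's internal code. The name effectiveCappedSize is hypothetical.

```javascript
// Hypothetical helper modelling the cap-size rule described above.
// It is not part of MongoDB; it only restates the documented adjustment.
function effectiveCappedSize(requestedBytes) {
  const MIN_CAP = 4096; // requested sizes <= 4096 bytes get a 4096-byte cap
  const ALIGN = 256;    // larger sizes are raised to a multiple of 256
  if (requestedBytes <= MIN_CAP) {
    return MIN_CAP;
  }
  // Round up to the next integer multiple of 256 (unchanged if already aligned).
  return Math.ceil(requestedBytes / ALIGN) * ALIGN;
}
```

Under this rule, the 100000-byte log collection created above would be raised to 100096 bytes. A size such as 5242880, which is already a multiple of 256, is left unchanged.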
Additionally, you may also specify a maximum number of documents for the collection using the max field as in the
following document:
db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )
Important: The size argument is always required, even when you specify max number of documents. MongoDB
will remove older documents if a collection reaches the maximum size limit before it reaches the maximum document
count.
See createCollection() and create.
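The interaction between size and max can be illustrated with a small in-memory model of first-in-first-out eviction. This is a sketch under simplifying assumptions, not how mongod stores data:

* each document's size is passed in explicitly;
* BSON encoding and the internal overhead mentioned above are ignored;
* CappedModel and its methods are hypothetical names.

```javascript
// Hypothetical in-memory model of capped-collection eviction.
// Document sizes are supplied by the caller; real BSON sizes and
// storage overhead are not modelled.
class CappedModel {
  constructor(sizeBytes, maxDocs = Infinity) {
    this.sizeBytes = sizeBytes; // always required, like the size option
    this.maxDocs = maxDocs;     // optional, like the max option
    this.docs = [];             // kept in insertion (natural) order
    this.usedBytes = 0;
  }

  insert(doc, docBytes) {
    if (docBytes > this.sizeBytes) {
      throw new Error("document larger than the collection cap");
    }
    this.docs.push({ doc, docBytes });
    this.usedBytes += docBytes;
    // Remove the oldest documents until both limits hold; whichever
    // limit is reached first triggers the removal.
    while (this.usedBytes > this.sizeBytes || this.docs.length > this.maxDocs) {
      const oldest = this.docs.shift();
      this.usedBytes -= oldest.docBytes;
    }
  }

  // Oldest first, analogous to find() with no sort on a capped collection.
  naturalOrder() {
    return this.docs.map((entry) => entry.doc);
  }

  // Newest first, analogous to find().sort( { $natural: -1 } ).
  reverseNaturalOrder() {
    return this.naturalOrder().reverse();
  }
}
```

With a 250-byte cap and max set to 10, inserting three 100-byte documents removes the first one. The size limit is reached before the document count limit.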
Query a Capped Collection
If you perform a find() on a capped collection with no ordering specified, MongoDB
guarantees that the ordering of results is the same as the insertion order.
To retrieve documents in reverse insertion order, issue find() along with the sort() method with the $natural
parameter set to -1, as shown in the following example:
db.cappedCollection.find().sort( { $natural: -1 } )
Check if a Collection is Capped
Use the isCapped() method to determine if a collection is capped, as follows:
db.collection.isCapped()
Convert a Collection to Capped
You can convert a non-capped collection to a capped collection with the
convertToCapped command:
db.runCommand({"convertToCapped": "mycoll", size: 100000});
The size parameter specifies the size of the capped collection in bytes.
Warning: This command obtains a global write lock and will block other operations until it has completed.
Changed in version 2.2: Before 2.2, capped collections did not have an index on _id unless you specified
autoIndexId to the create command; since 2.2, this is the default.
Automatically Remove Data After a Specified Period of Time
For additional flexibility when expiring data, consider MongoDB's TTL indexes, as described in Expire Data from Collections by Setting TTL (page 211). These indexes
allow you to expire and remove data from normal collections using a special type, based on the value of a date-typed
field and a TTL value for the index.
TTL Collections (page 211) are not compatible with capped collections.
Tailable Cursor
You can use a tailable cursor with capped collections. Similar to the Unix tail -f command,
the tailable cursor tails the end of a capped collection. As new documents are inserted into the capped collection,
you can use the tailable cursor to continue retrieving documents.
See Create Tailable Cursor (page 121) for information on creating a tailable cursor.
Expire Data from Collections by Setting TTL
New in version 2.2.
This document provides an introduction to MongoDB's time to live or TTL collection feature. TTL collections make
it possible to store data in MongoDB and have the mongod automatically remove data after a specified number of
seconds or at a specific clock time.
Data expiration is useful for some classes of information, including machine generated event data, logs, and session
information that only need to persist for a limited period of time.
A special TTL index property (page 488) supports the implementation of TTL collections. The TTL feature relies on a
background thread in mongod that reads the date-typed values in the index and removes expired documents from the
collection.
Procedures
To create a TTL index (page 488), use the db.collection.createIndex() method with the
expireAfterSeconds option on a field whose value is either a date (page 178) or an array that contains date
values (page 178).
Note: The TTL index is a single field index. Compound indexes do not support the TTL property. For more
information on TTL indexes, see TTL Indexes (page 488).
Expire Documents after a Specified Number of Seconds
To expire data after a specified number of seconds has
passed since the indexed field, create a TTL index on a field that holds values of BSON date type or an array of BSON
date-typed objects and specify a positive non-zero value in the expireAfterSeconds field. A document will
expire when the number of seconds in the expireAfterSeconds field has passed since the time specified in its
indexed field. 64
For example, the following operation creates an index on the log_events collection's createdAt field and specifies the expireAfterSeconds value of 3600 to set the expiration time to be one hour after the time specified by
createdAt.
db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
When adding documents to the log_events collection, set the createdAt field to the current time:
64 If the field contains an array of BSON date-typed objects, data expires if at least one of the BSON date-typed objects is older than the number of
seconds specified in expireAfterSeconds.
db.log_events.insert( {
"createdAt": new Date(),
"logEvent": 2,
"logMessage": "Success!"
} )
MongoDB will automatically delete documents from the log_events collection when the document's createdAt
value is older than the number of seconds specified in expireAfterSeconds.
See also:
$currentDate operator
Expire Documents at a Specific Clock Time
To expire documents at a specific clock time, begin by creating a
TTL index on a field that holds values of BSON date type or an array of BSON date-typed objects and specify an
expireAfterSeconds value of 0. For each document in the collection, set the indexed date field to a value
corresponding to the time the document should expire. If the indexed date field contains a date in the past, MongoDB
considers the document expired.
For example, the following operation creates an index on the log_events collection's expireAt field and specifies
the expireAfterSeconds value of 0:
db.log_events.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
For each document, set the value of expireAt to correspond to the time the document should expire. For instance,
the following insert() operation adds a document that should expire at July 22, 2013 14:00:00.
db.log_events.insert( {
"expireAt": new Date('July 22, 2013 14:00:00'),
"logEvent": 2,
"logMessage": "Success!"
} )
MongoDB will automatically delete documents from the log_events collection when the document's expireAt
value is older than the number of seconds specified in expireAfterSeconds, i.e. 0 seconds older in this case. As
such, the data expires at the specified expireAt value.
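Both TTL patterns follow the same rule, which can be stated as a small predicate. This is a sketch; isExpired is a hypothetical name, not a MongoDB API. It reflects three points from this section:

* for an array of dates, the earliest date decides (footnote 64);
* with expireAfterSeconds set to 0, the rule reduces to "the stored date is in the past";
* documents whose indexed field is not a date do not expire.

The real background task runs periodically (every 60 seconds), so actual deletion can lag behind the moment this predicate first returns true.

```javascript
// Hypothetical predicate restating the TTL expiration rule.
// indexedValue: the document's value for the TTL-indexed field.
function isExpired(indexedValue, expireAfterSeconds, now = new Date()) {
  // Treat a single value and an array of values uniformly.
  const values = Array.isArray(indexedValue) ? indexedValue : [indexedValue];
  const dates = values.filter((v) => v instanceof Date);
  if (dates.length === 0) {
    return false; // missing or non-date values never expire
  }
  // For arrays, the earliest date determines expiration.
  const earliestMs = Math.min(...dates.map((d) => d.getTime()));
  return now.getTime() > earliestMs + expireAfterSeconds * 1000;
}
```

For the createdAt index, pass 3600 as expireAfterSeconds. For the expireAt index, pass 0, so a document expires as soon as its expireAt time has passed.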
MongoDB provides a database profiler that shows performance characteristics of each operation against the database.
Use the profiler to locate any queries or write operations that are running slow. You can use this information, for
example, to determine what indexes to create.
For more information, see Database Profiling (page 219).
Use db.currentOp() to Evaluate mongod Operations
The cursor.explain() and db.collection.explain() methods return information on a query execution, such as the index MongoDB selected to fulfill the query and execution statistics. You can run the methods in
queryPlanner mode, executionStats mode, or allPlansExecution mode to control the amount of information returned.
Example
To use cursor.explain() on a query for documents matching the expression { a: 1 }, in the collection
named records, use an operation that resembles the following in the mongo shell:
db.records.find( { a: 1 } ).explain("executionStats")
Capped Collections (page 208) are circular, fixed-size collections that keep documents well-ordered, even without the
use of an index. This means that capped collections can receive very high-speed writes and sequential reads.
These collections are particularly useful for keeping log files but are not limited to that purpose. Use capped collections
where appropriate.
Use Natural Order for Fast Reads
To return documents in the order they exist on disk, sort the operation using the $natural operator. On a
capped collection, this also returns the documents in the order in which they were written.
Natural order does not use indexes but can be fast for operations when you want to select the first or last items on disk.
See also:
sort() and limit().
For commonly issued queries, create indexes (page 463). If a query searches multiple fields, create a compound index
(page 472). Scanning an index is much faster than scanning a collection. Index structures are smaller than the
documents they reference, and store references in order.
Example
If you have a posts collection containing blog posts, and if you regularly issue a query that sorts on the
author_name field, then you can optimize the query by creating an index on the author_name field:
db.posts.createIndex( { author_name : 1 } )
Indexes also improve efficiency on queries that routinely sort on a given field.
Example
If you regularly issue a query that sorts on the timestamp field, then you can optimize the query by creating an
index on the timestamp field:
db.posts.createIndex( { timestamp : 1 } )
Because MongoDB can read indexes in both ascending and descending order, the direction of a single-key index does
not matter.
Indexes support queries, update operations, and some phases of the aggregation pipeline (page 423).
Index keys that are of the BinData type are more efficiently stored in the index if:
the binary subtype value is in the range of 0-7 or 128-135, and
the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.
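The two conditions above can be checked together before choosing a binary key format. This is a sketch; isCompactBinDataKey is a hypothetical helper that simply restates the listed subtype ranges and lengths.

```javascript
// Byte-array lengths listed above as eligible for compact index storage.
const COMPACT_BINDATA_LENGTHS = new Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 32]);

// Hypothetical helper: true when both documented conditions hold.
function isCompactBinDataKey(subtype, byteLength) {
  const subtypeOk =
    (subtype >= 0 && subtype <= 7) || (subtype >= 128 && subtype <= 135);
  return subtypeOk && COMPACT_BINDATA_LENGTHS.has(byteLength);
}
```

For example, a UUID stored as BinData subtype 4 is 16 bytes long and meets both conditions.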
Limit the Number of Query Results to Reduce Network Demand
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you
can reduce the demand on network resources by issuing the limit() method.
This is typically used in conjunction with sort operations. For example, if you need only 10 results from your query to
the posts collection, you would issue the following command:
db.posts.find().sort( { timestamp : -1 } ).limit(10)
When you need only a subset of fields from documents, you can achieve better performance by returning only the
fields you need:
For example, if in your query to the posts collection, you need only the timestamp, title, author, and
abstract fields, you would issue the following command:
db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )
For more information on using projections, see Limit Fields to Return from a Query (page 106).
Use $hint to Select a Particular Index
In most cases the query optimizer (page 66) selects the optimal index for a specific operation; however, you can force
MongoDB to use a specific index using the hint() method. Use hint() to support performance testing, or on
some queries where you must select a field or fields included in several indexes.
Use the Increment Operator to Perform Operations Server-Side
Use MongoDB's $inc operator to increment or decrement values in documents. The operator increments the value
of the field on the server side, as an alternative to selecting a document, making simple modifications in the client
and then writing the entire document to the server. The $inc operator can also help avoid race conditions, which
would result when two application instances queried for a document, manually incremented a field, and saved the
entire document back at the same time.
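The race condition described above can be reproduced without a database by interleaving two read-modify-write sequences on a plain object. This is an illustrative simulation only. The function names and the count field are hypothetical, and the object stands in for a stored document.

```javascript
// Client-side read-modify-write: both instances read before either writes.
function clientSideIncrementRace(doc) {
  const readByA = doc.count; // instance A reads the document
  const readByB = doc.count; // instance B reads the same value
  doc.count = readByA + 1;   // A saves its modified copy
  doc.count = readByB + 1;   // B overwrites A's save: one increment is lost
  return doc.count;
}

// Server-side increment: each update applies to the current stored value,
// which is the behavior $inc provides on the server.
function serverSideIncrement(doc, amount) {
  doc.count += amount;
  return doc.count;
}
```

Starting from a count of 5, the interleaved client-side version ends at 6 instead of 7. In the mongo shell, the server-side form corresponds to an update such as db.counters.update( { _id: "pageviews" }, { $inc: { count: 1 } } ), where the counters collection and pageviews _id are hypothetical.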
Design Notes
This page details features of MongoDB that may be important to keep in mind when developing applications.
Schema Considerations
Dynamic Schema
Data in MongoDB has a dynamic schema. Collections do not enforce document structure. This
facilitates iterative development and polymorphism. Nevertheless, collections often hold documents with highly homogeneous structures. See Data Modeling Concepts (page 143) for more information.
Some operational considerations include:
the exact set of collections to be used;
the indexes to be used: with the exception of the _id index, all indexes must be created explicitly;
shard key declarations: choosing a good shard key is very important as the shard key cannot be changed once
set.
Avoid importing unmodified data directly from a relational database. In general, you will want to roll up certain
data into richer documents that take advantage of MongoDB's support for embedded documents and nested arrays.
Case Sensitive Strings
MongoDB strings are case sensitive. So a search for "joe" will not find "Joe".
Consider:
storing data in a normalized case format, or
using regular expressions ending with the i option, and/or
using $toLower or $toUpper in the aggregation framework (page 421).
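When a regular expression with the i option is built from user input, the input should be escaped so that characters such as "." match literally. This is a sketch; caseInsensitiveExact is a hypothetical helper. The mongo shell accepts a JavaScript RegExp as a query value.

```javascript
// Hypothetical helper: an anchored, case-insensitive regex that matches
// the given text exactly, with regex metacharacters escaped.
function caseInsensitiveExact(text) {
  const escaped = text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped + "$", "i");
}
```

For example, db.users.find( { name: caseInsensitiveExact("joe") } ) would find "Joe" but not "Joey". The users collection and name field are hypothetical. Case-insensitive regular expressions cannot use indexes efficiently, which is one reason to prefer storing a normalized-case copy of the field for frequent lookups.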
Type Sensitive Fields
MongoDB data is stored in the BSON format, a binary encoded serialization of JSON-like
documents. BSON encodes additional type information. See bsonspec.org65 for more information.
Consider the following document which has a field x with the string value "123":
{ x : "123" }
Then the following query which looks for a number value 123 will not return that document:
db.mycollection.find( { x : 123 } )
General Considerations
By Default, Updates Affect One Document
To update multiple documents that meet your query criteria, set the
update multi option to true or 1. See: Update Multiple Documents (page 74).
Prior to MongoDB 2.2, you would specify the upsert and multi options in the update method as positional
boolean options. See: the update method reference documentation.
BSON Document Size Limit
The BSON Document Size limit is currently set at 16MB per document. If you
require larger documents, use GridFS (page 148).
No Fully Generalized Transactions
MongoDB does not have fully generalized transactions (page 80). If you
model your data using rich documents that closely resemble your application's objects, each logical object will be in
one MongoDB document. MongoDB allows you to modify a document in a single atomic operation. This kind of
data modification pattern covers most common uses of transactions in other systems.
Replica Set Considerations
Use an Odd Number of Replica Set Members
Replica sets (page 541) perform consensus elections. To ensure
that elections will proceed successfully, either use an odd number of members, typically three, or else use an arbiter
to ensure an odd number of votes.
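The reasoning behind an odd number of votes is simple arithmetic. A successful election needs a strict majority of votes, so adding a member to reach an even count raises the majority without increasing how many members can fail. This is a sketch; electionMath is a hypothetical name.

```javascript
// Hypothetical helper: votes needed for a strict majority, and how many
// voting members can be unavailable while an election can still succeed.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}
```

Three voting members tolerate one unavailable member, and so do four. Five voting members, for example four data-bearing members plus an arbiter, tolerate two.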
Keep Replica Set Members Up-to-Date
MongoDB replica sets support automatic failover (page 560). It is important for your secondaries to be up-to-date. There are various strategies for assessing consistency:
1. Use monitoring tools to alert you to lag events. See Monitoring for MongoDB (page 185) for a detailed discussion of MongoDB's monitoring options.
2. Specify appropriate write concern.
3. If your application requires manual fail over, you can configure your secondaries as priority 0 (page 548).
Priority 0 secondaries require manual action for a failover. This may be practical for a small replica set, but
large deployments should fail over automatically.
See also:
replica set rollbacks (page 564).
65 http://bsonspec.org/#/specification
Sharding Considerations
Pick your shard keys carefully. You cannot choose a new shard key for a collection that is already sharded.
Shard key values are immutable.
When enabling sharding on an existing collection, MongoDB imposes a maximum size on those collections to ensure that it is possible to create chunks. For a detailed explanation of this limit, see:
<sharding-existing-collection-data-size>.
To shard large amounts of data, create a new empty sharded collection, and ingest the data from the source
collection using an application level import operation.
Unique indexes are not enforced across shards except for the shard key itself. See Enforce Unique Keys for
Sharded Collections (page 711).
Consider pre-splitting (page 700) an empty sharded collection before a massive bulk import.
Analyze Performance
As you develop and operate applications with MongoDB, you may want to analyze the performance of the database
as well as the application. Consider the following as you begin to investigate the performance of MongoDB.
Overview
Degraded performance in MongoDB is typically a function of the relationship between the quantity of
data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of
time the database spends in a locked state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability
of hardware on the host system for virtualized environments. Some users also experience performance limitations as a
result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other
situations, performance issues may indicate that the database is operating at capacity and that it is time to add
additional capacity to the database.
The following are some causes of degraded performance in MongoDB.
Locks
MongoDB uses a locking system to ensure data set consistency. However, if certain operations are long-running, or a queue forms, performance will slow as requests and operations wait for the lock. Lock-related slowdowns
can be intermittent. To see if the lock has been affecting your performance, look to the data in the globalLock section
of the serverStatus output. If globalLock.currentQueue.total is consistently high, then there is a
chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that may be
affecting performance.
If globalLock.totalTime is high relative to uptime, the database has existed in a lock state for a significant
amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of
long-running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults
(page 218) and disk reads.
Memory Use for MMAPv1 Storage Engine
MongoDB uses memory mapped files to store data. Given a data set
of sufficient size, the MongoDB process will allocate all available memory on the system for its use. While this is part
of the design, and affords MongoDB superior performance, the memory mapped files make it difficult to determine if
the amount of RAM is sufficient for the data set.
The memory usage status metrics of the serverStatus output can provide insight into MongoDB's memory use.
Check the resident memory use (i.e. mem.resident): if this exceeds the amount of system memory and there is a
significant amount of data on disk that isn't in RAM, you may have exceeded the capacity of your system.
You should also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of
system memory, some operations will require disk access (page faults) to read data from virtual memory and negatively
affect performance.
Page Faults for MMAPv1 Storage Engine
With the MMAPv1 storage engine, page faults can occur as MongoDB
reads from or writes data to parts of its data files that are not currently located in physical memory. In contrast,
operating system page faults happen when physical memory is exhausted and pages of physical memory are swapped
to disk.
Page faults triggered by MongoDB are reported as the total number of page faults in one second. To check for page
faults, see the extra_info.page_faults value in the serverStatus output.
MongoDB on Windows counts both hard and soft page faults.
The MongoDB page fault counter may increase dramatically in moments of poor performance and may correlate
with limited physical memory environments. Page faults also can increase while accessing much larger data sets,
for example, scanning an entire collection. Limited and sporadic MongoDB page faults do not necessarily indicate a
problem or a need to tune the database.
A single page fault completes quickly and is not problematic. However, in aggregate, large volumes of page faults
typically indicate that MongoDB is reading too much data from disk. In many situations, MongoDB's read locks will
yield after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read
into memory. This approach improves concurrency, and also improves overall throughput in high volume systems.
Increasing the amount of RAM accessible to MongoDB may help reduce the frequency of page faults. If this is not
possible, you may want to consider deploying a sharded cluster or adding shards to your deployment to distribute load
among mongod instances.
See What are page faults? (page 754) for more information.
Number of Connections
In some cases, the number of connections between the application layer (i.e. clients) and
the database can overwhelm the ability of the server to handle requests. This can produce performance irregularities.
The following fields in the serverStatus document can provide insight:
globalLock.activeClients contains a counter of the total number of clients with active operations in
progress or queued.
connections is a container for the following two fields:
current: the total number of current clients that connect to the database instance.
available: the total number of unused connections available for new clients.
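The fields listed above can be summarized in one place when checking a server. This is a sketch that assumes a status document shaped like db.serverStatus() output in this release. Only the fields read here are assumed to exist: connections.current, connections.available, globalLock.activeClients.total, and globalLock.currentQueue.total. connectionSummary is a hypothetical name.

```javascript
// Hypothetical helper summarizing connection and queue pressure from a
// serverStatus-shaped document.
function connectionSummary(status) {
  const { current, available } = status.connections;
  const total = current + available;
  return {
    current,
    available,
    // Fraction of connection slots currently in use.
    utilization: total === 0 ? 0 : current / total,
    // Clients with operations in progress or queued.
    activeClients: status.globalLock.activeClients.total,
    // Operations waiting for a lock.
    queued: status.globalLock.currentQueue.total,
  };
}
```

In the mongo shell, this could be called as connectionSummary(db.serverStatus()). A high utilization without a matching workload points toward the driver or configuration problems described below.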
If requests are high because there are numerous concurrent application requests, the database may have trouble keeping
up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy
applications increase the size of your replica set and distribute read operations to secondary members. For write heavy
applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod
instances.
Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported
MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently.
Extremely high numbers of connections, particularly without a corresponding workload, are often indicative of a driver or
other configuration error.
Unless constrained by system-wide limits, MongoDB has no limit on incoming connections. You can modify system
limits using the ulimit command, or by editing your system's /etc/sysctl file. See UNIX ulimit Settings
(page 281) for more information.
Database Profiling
MongoDB's Profiler is a database profiling system that can help identify inefficient queries
and operations.
The following profiling levels are available:
Level  Setting
0      Off. No profiling
1      On. Only includes slow operations
2      On. Includes all operations
Enable the profiler by setting the profile value using the following command in the mongo shell:
db.setProfilingLevel(1)
The slowOpThresholdMs setting defines what constitutes a slow operation. To set the threshold above
which the profiler considers operations slow (and thus, included in the level 1 profiling data), you can configure
slowOpThresholdMs at runtime as an argument to the db.setProfilingLevel() operation.
See the documentation of db.setProfilingLevel() for more information about this command.
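The profiling levels and the slow-operation threshold combine into a simple rule for which operations are recorded. This is a sketch of that rule as described above; isProfiled is a hypothetical name, and the default threshold of 100 milliseconds matches mongod's default.

```javascript
// Hypothetical model of which operations the profiler records.
function isProfiled(level, durationMs, slowOpThresholdMs = 100) {
  if (level === 0) return false; // off
  if (level === 2) return true;  // all operations
  if (level === 1) return durationMs > slowOpThresholdMs; // slow operations only
  throw new RangeError("profiling level must be 0, 1, or 2");
}
```

For example, after db.setProfilingLevel(1, 200) in the mongo shell, a 150-millisecond operation would not be recorded. At level 2, the same operation would be recorded.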
By default, mongod records all slow queries to its log, as defined by slowOpThresholdMs.
Note: Because the database profiler can negatively impact performance, only enable profiling for strategic intervals
and as minimally as possible on production systems.
You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded
cluster.
You can view the output of the profiler in the system.profile collection of your database by issuing the show
profile command in the mongo shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (100, in this
example) is above the slowOpThresholdMs threshold.
See also:
Optimization Strategies for MongoDB (page 212) addresses strategies that may improve the performance of your
database queries and operations.
Continue reading from Configuration, Maintenance, and Analysis (page 220) for additional tutorials of fundamental MongoDB maintenance procedures.
Backup and Recovery (page 240) Outlines procedures for data backup and restoration with mongod instances and
deployments.
Backup and Restore with Filesystem Snapshots (page 241) An outline of procedures for creating MongoDB
data set backups using system-level file snapshot tools, such as LVM or native storage appliance tools.
Backup and Restore Sharded Clusters (page 249) Detailed procedures and considerations for backing up
sharded clusters and single shards.
Recover Data after an Unexpected Shutdown (page 257) Recover data from MongoDB data files that were not
properly closed or have an invalid state.
Continue reading from Backup and Recovery (page 240) for additional tutorials of MongoDB backup and recovery procedures.
MongoDB Scripting (page 259) An introduction to the scripting capabilities of the mongo shell and the scripting
capabilities embedded in MongoDB instances.
MongoDB Tutorials (page 277) A complete list of tutorials in the MongoDB Manual that address MongoDB operation and use.
You specify a command first by constructing a standard BSON document whose first key is the name of the command.
For example, specify the isMaster command using the following BSON document:
{ isMaster: 1 }
Issue Commands
The mongo shell provides a helper method for running commands called db.runCommand(). The following
operation in mongo runs the above command:
db.runCommand( { isMaster: 1 } )
Many drivers provide an equivalent for the db.runCommand() method. Internally, running commands with
db.runCommand() is equivalent to a special query against the $cmd collection.
Many common commands have their own shell helpers or wrappers in the mongo shell and drivers, such as the
db.isMaster() method in the mongo JavaScript shell.
You can use the maxTimeMS option to specify a time limit for the execution of a command; see Terminate a Command
(page 224) for more information on operation termination.
admin Database Commands
You must run some commands on the admin database. Normally, these operations resemble the following:
use admin
db.runCommand( {buildInfo: 1} )
However, there's also a command helper that automatically runs the command in the context of the admin database:
db._adminCommand( {buildInfo: 1} )
Command Responses
All commands return, at minimum, a document with an ok field indicating whether the command has succeeded:
{ 'ok': 1 }
By default, MongoDB stores data in the /data/db directory. On Windows, MongoDB stores data in C:\data\db.
On all platforms, MongoDB listens for connections from clients on port 27017.
To start MongoDB using all defaults, issue the following command at the system shell:
mongod
Specify a Data Directory
If you want mongod to store data files at a path other than /data/db you can specify
a dbPath. The dbPath must exist before you start mongod. If it does not exist, create the directory and set the
permissions so that mongod can read and write data to this path. For more information on permissions, see the
security operations documentation (page 322).
To specify a dbPath for mongod to use as a data directory, use the --dbpath option. The following invocation
will start a mongod instance and store data in the /srv/mongodb path:
mongod --dbpath /srv/mongodb/
Specify a TCP Port
Only a single process can listen for connections on a given network interface and port at a time. If you run
multiple mongod processes on a single machine, or have other processes that must use this port, you must assign each
a different port to listen on for client connections.
To specify a port to mongod, use the --port option on the command line. The following command starts mongod
listening on port 12345:
mongod --port 12345
Additional Configuration Options For an overview of common configurations and configurations for common use cases, see Run-time Database Configuration (page 192).
Chapter 5. Administration
In a clean shutdown a mongod completes all pending operations, flushes all data to data files, and closes all data files.
Other shutdowns are unclean and can compromise the validity of the data files.
To ensure a clean shutdown, always shut down mongod instances using one of the following methods:
Use shutdownServer() Shut down the mongod from the mongo shell using the db.shutdownServer()
method as follows:
use admin
db.shutdownServer()
Calling the same method from a control script accomplishes the same result.
For systems with authorization enabled, users may only issue db.shutdownServer() when authenticated
to the admin database. On systems without authentication enabled, users must issue it via the localhost interface.
Use --shutdown From the Linux command line, shut down the mongod using the --shutdown option in the
following command:
mongod --shutdown
Use CTRL-C When running the mongod instance in interactive mode (i.e. without --fork), issue Control-C
to perform a clean shutdown.
Use kill From the Linux command line, shut down a specific mongod instance using the following command:
kill <mongod process ID>
Procedure If the mongod is the primary in a replica set, the shutdown process has the
following steps:
1. Check how up-to-date the secondaries are.
2. If no secondary is within 10 seconds of the primary, mongod will return a message that it will not shut down.
You can pass the shutdown command a timeoutSecs argument to wait for a secondary to catch up.
3. If there is a secondary within 10 seconds of the primary, the primary will step down and wait for the secondary
to catch up.
4. After 60 seconds or once the secondary has caught up, the primary will shut down.
Force Replica Set Shutdown If there is no up-to-date secondary and you want the primary to shut down, issue the
shutdown command with the force argument, as in the following mongo shell operation:
db.adminCommand({shutdown : 1, force : true})
If no secondary is immediately up-to-date and you want the primary to keep checking for a specified number of
seconds, issue shutdown with the timeoutSecs argument. If any of the secondaries catch up within the allotted time,
the primary will shut down. If no secondaries catch up, it will not shut down.
The following command issues shutdown with timeoutSecs set to 5:
db.adminCommand({shutdown : 1, timeoutSecs : 5})
Alternatively, you can use the timeoutSecs argument with the db.shutdownServer() method:
db.shutdownServer({timeoutSecs : 5})
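The shutdown rules above can be summarized as a small decision procedure. The plain JavaScript sketch below is a simplified, hypothetical model of that logic, not MongoDB's implementation: decideShutdown and the pollLags callback are illustrative names, lags are whole seconds behind the primary, and the 60-second step-down wait is only described in the returned message.

```javascript
// Simplified, hypothetical model of the primary's shutdown decision described
// above. It is not MongoDB's implementation.
const CATCH_UP_WINDOW_SECS = 10; // "within 10 seconds of the primary"

function hasUpToDateSecondary(lagsSecs) {
  return lagsSecs.some((lag) => lag <= CATCH_UP_WINDOW_SECS);
}

// pollLags(t) returns the secondaries' lags (in seconds) t seconds after the
// shutdown command was issued; timeoutSecs mirrors the shutdown argument.
function decideShutdown(pollLags, { force = false, timeoutSecs = 0 } = {}) {
  for (let t = 0; t <= timeoutSecs; t++) {
    if (hasUpToDateSecondary(pollLags(t))) {
      return "step down, wait up to 60 seconds for a secondary to catch up, then shut down";
    }
  }
  // No secondary caught up within the allotted time.
  return force ? "shut down anyway (force)" : "refuse to shut down";
}

console.log(decideShutdown(() => [3, 40]));               // step down, ...
console.log(decideShutdown(() => [40]));                  // refuse to shut down
console.log(decideShutdown((t) => [t < 4 ? 30 : 2], { timeoutSecs: 5 })); // step down, ...
console.log(decideShutdown(() => [40], { force: true })); // shut down anyway (force)
```

The third call corresponds to timeoutSecs : 5 with a secondary that catches up after four seconds; with a secondary that needs longer than five seconds, the same call would refuse to shut down.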
MongoDB provides two facilities to terminate running operations: maxTimeMS() and db.killOp(). Use these
operations as needed to control the behavior of operations in a MongoDB deployment.
Available Procedures
From the mongo shell, you can use the cursor.maxTimeMS() method to set a time limit, such as 30 milliseconds, for a query.
Terminate a Command Consider a potentially long-running operation that uses distinct to return each distinct value of the city field in a collection named collection:
db.runCommand( { distinct: "collection",
key: "city" } )
You can add the maxTimeMS field to the command document to set a time limit of 45 milliseconds for the operation:
db.runCommand( { distinct: "collection",
key: "city",
maxTimeMS: 45 } )
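Because the time limit is just another field in the command document, it can also be added programmatically. The plain JavaScript sketch below (Node.js syntax, which the legacy mongo shell may not support) assumes a hypothetical withTimeLimit helper; it copies the command document rather than mutating it and keeps the command name as the first field, since the command name must lead the document.

```javascript
// Hypothetical helper: returns a copy of a command document with maxTimeMS set.
// Spreading the original keeps its key order, so the command name
// ("distinct" here) stays first and maxTimeMS is appended.
function withTimeLimit(command, maxTimeMS) {
  if (!Number.isInteger(maxTimeMS) || maxTimeMS <= 0) {
    throw new RangeError("maxTimeMS must be a positive integer");
  }
  return { ...command, maxTimeMS };
}

const distinctCity = { distinct: "collection", key: "city" };
const limited = withTimeLimit(distinctCity, 45);
console.log(JSON.stringify(limited));
// {"distinct":"collection","key":"city","maxTimeMS":45}
```

The resulting document has the same fields as the second runCommand example above.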
killOp The db.killOp() method interrupts a running operation at the next interrupt point. db.killOp()
identifies the target operation by operation ID.
db.killOp(<opId>)
Warning: Terminate running operations with extreme caution. Only use db.killOp() to terminate operations
initiated by clients and do not terminate internal database operations.
Related
To return a list of running operations see db.currentOp().
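To find candidates for db.killOp(), you can inspect the inprog array that db.currentOp() returns. The plain JavaScript sketch below applies such a filter to an object shaped like that output (using its opid, active, secs_running, and ns fields). The sample entries, the longRunningOpIds name, and the namespace-based filter, a simple way to leave internal operations such as replication alone, are assumptions for illustration.

```javascript
// Sketch: select operation IDs to review before calling db.killOp().
// `currentOp` is an object shaped like db.currentOp() output: { inprog: [...] }.
function longRunningOpIds(currentOp, minSecs, namespace) {
  return currentOp.inprog
    .filter((op) => op.active && (op.secs_running || 0) >= minSecs)
    // Only one application namespace, so internal operations are left alone.
    .filter((op) => op.ns === namespace)
    .map((op) => op.opid);
}

// Made-up sample entries.
const sampleCurrentOp = {
  inprog: [
    { opid: 101, active: true, secs_running: 42, op: "query", ns: "test.collection" },
    { opid: 102, active: true, secs_running: 1, op: "query", ns: "test.collection" },
    { opid: 103, active: true, secs_running: 600, op: "getmore", ns: "local.oplog.rs" },
  ],
};

console.log(longRunningOpIds(sampleCurrentOp, 30, "test.collection")); // [ 101 ]
```

Review the returned IDs before passing any of them to db.killOp(), in line with the warning above.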
1 - collects profiling data for slow operations only. By default slow operations are those slower than 100
milliseconds.
You can modify the threshold for slow operations with the slowOpThresholdMs runtime option or the
setParameter command. See the Specify the Threshold for Slow Operations (page 226) section for more
information.
2 - collects profiling data for all database operations.
Enable Database Profiling and Set the Profiling Level
You can enable database profiling from the mongo shell or through a driver using the profile command. This
section describes how to do so from the mongo shell. See your driver documentation if you want to
control the profiler from within your application.
When you enable profiling, you also set the profiling level (page 225). The profiler records data in the
system.profile (page 288) collection. MongoDB creates the system.profile (page 288) collection in a
database after you enable profiling for that database.
5.2. Administration Tutorials
To enable profiling and set the profiling level, use the db.setProfilingLevel() helper in the mongo shell,
passing the profiling level as a parameter. For example, to enable profiling for all database operations, consider the
following operation in the mongo shell:
db.setProfilingLevel(2)
The shell returns a document showing the previous level of profiling. The "ok" : 1 key-value pair indicates the
operation succeeded.
To verify the new setting, see the Check Profiling Level (page 226) section.
Specify the Threshold for Slow Operations The threshold for slow operations applies to the entire mongod instance. When you change the threshold, you change it for all databases on the instance.
Important: Changing the slow operation threshold for the database profiler also affects the profiling subsystem's
slow operation threshold for the entire mongod instance. Always set the threshold to the highest useful value.
By default the slow operation threshold is 100 milliseconds. Databases with a profiling level of 1 will log operations
slower than 100 milliseconds.
To change the threshold, pass two parameters to the db.setProfilingLevel() helper in the mongo shell. The
first parameter sets the profiling level for the current database, and the second sets the default slow operation threshold
for the entire mongod instance.
For example, the following command sets the profiling level for the current database to 0, which disables profiling,
and sets the slow-operation threshold for the mongod instance to 20 milliseconds. Any database on the instance with
a profiling level of 1 will use this threshold:
db.setProfilingLevel(0,20)
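The split between a per-database level and an instance-wide threshold is easy to misread, so the plain JavaScript sketch below models it. ProfilerSettings is a hypothetical class, not MongoDB code; it assumes that an operation counts as slow when it takes strictly longer than the threshold, as described above, and that databases start at level 0.

```javascript
// Hypothetical model of the profiler settings described above (not MongoDB code):
// the profiling level is per database, the slow-operation threshold is per mongod.
class ProfilerSettings {
  constructor() {
    this.levels = new Map(); // database name -> 0, 1 or 2; unset means 0
    this.slowms = 100;       // instance-wide default threshold in milliseconds
  }

  // Mirrors db.setProfilingLevel(level, slowms) issued against database `db`.
  setProfilingLevel(db, level, slowms) {
    const was = this.levels.get(db) || 0;
    this.levels.set(db, level);
    if (slowms !== undefined) {
      this.slowms = slowms; // changes the threshold for every database
    }
    return { was, ok: 1 }; // reports the previous level
  }

  // Would an operation on `db` that took `millis` milliseconds be profiled?
  isProfiled(db, millis) {
    const level = this.levels.get(db) || 0;
    if (level === 2) return true;
    if (level === 1) return millis > this.slowms;
    return false;
  }
}

const settings = new ProfilerSettings();
settings.setProfilingLevel("reporting", 1); // "reporting" profiles slow ops
settings.setProfilingLevel("test", 0, 20);  // disables "test", threshold now 20 ms
console.log(settings.isProfiled("reporting", 25)); // true: 25 ms > 20 ms
console.log(settings.isProfiled("test", 25));      // false: level 0
```

Note how the call against "test" changes what gets profiled on "reporting": that is the instance-wide effect the Important note warns about.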
Check Profiling Level To view the profiling level (page 225), issue the following from the mongo shell:
db.getProfilingStatus()
Disable Profiling To disable profiling, use the following helper in the mongo shell:
db.setProfilingLevel(0)
Enable Profiling for an Entire mongod Instance For development purposes in testing environments, you can
enable database profiling for an entire mongod instance. The profiling level applies to all databases provided by the
mongod instance.
To enable profiling for a mongod instance, pass the following parameters to mongod at startup or within the
configuration file:
mongod --profile=1 --slowms=15
This sets the profiling level to 1, which collects profiling data for slow operations only, and defines slow operations as
those that last longer than 15 milliseconds.
See also:
mode and slowOpThresholdMs.
Database Profiling and Sharding You cannot enable profiling on a mongos instance. To enable profiling in a
sharded cluster, you must enable profiling for each mongod instance in the cluster.
View Profiler Data
The database profiler logs information about database operations in the system.profile (page 288) collection.
To view profiling information, query the system.profile (page 288) collection. You can use $comment to add
data to the query document to make it easier to analyze data from the profiler. To view example queries, see Example
Profiler Data Queries.
For an explanation of the output data, see Database Profiler Output (page 288).
Example Profiler Data Queries This section displays example queries to the system.profile (page 288) collection. For an explanation of the query output, see Database Profiler Output (page 288).
To return the most recent 10 log entries in the system.profile (page 288) collection, run a query similar to the
following:
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
To return all operations except command operations ($cmd), run a query similar to the following:
db.system.profile.find( { op: { $ne : 'command' } } ).pretty()
To return operations for a particular collection, run a query similar to the following. This example returns operations
in the mydb database's test collection:
db.system.profile.find( { ns : 'mydb.test' } ).pretty()
To return operations slower than 5 milliseconds, run a query similar to the following:
db.system.profile.find( { millis : { $gt : 5 } } ).pretty()
To return information from a certain time range, run a query similar to the following:
db.system.profile.find(
{
ts : {
$gt : new ISODate("2012-12-09T03:00:00Z") ,
$lt : new ISODate("2012-12-09T03:40:00Z")
}
}
).pretty()
The following example looks at the time range, suppresses the user field from the output to make it easier to read,
and sorts the results by how long each operation took to run:
db.system.profile.find(
{
ts : {
$gt : new ISODate("2011-07-12T03:00:00Z") ,
$lt : new ISODate("2011-07-12T03:40:00Z")
}
},
{ user : 0 }
).sort( { millis : -1 } )
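To make the semantics of the last query concrete, the plain JavaScript sketch below applies the same time-range filter, { user : 0 } projection, and descending millis sort to ordinary objects. The documents are made-up samples containing only a few system.profile fields, and slowestInRange is a hypothetical name.

```javascript
// In-memory sketch of the last query above, applied to plain objects shaped
// like system.profile documents. The entries are made-up samples.
function slowestInRange(profileDocs, from, to) {
  return profileDocs
    .filter((doc) => doc.ts > from && doc.ts < to) // ts: { $gt: from, $lt: to }
    .map(({ user, ...rest }) => rest)              // projection { user: 0 }
    .sort((a, b) => b.millis - a.millis);          // sort { millis: -1 }
}

const docs = [
  { ts: new Date("2011-07-12T03:05:00Z"), op: "query", ns: "mydb.test", millis: 12, user: "app" },
  { ts: new Date("2011-07-12T03:10:00Z"), op: "update", ns: "mydb.test", millis: 48, user: "app" },
  { ts: new Date("2011-07-12T04:00:00Z"), op: "query", ns: "mydb.test", millis: 90, user: "app" },
];

const result = slowestInRange(docs,
  new Date("2011-07-12T03:00:00Z"), new Date("2011-07-12T03:40:00Z"));
console.log(result.map((d) => d.millis)); // [ 48, 12 ]
```

The 90-millisecond operation is excluded because its timestamp falls outside the range, even though it is the slowest.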
Show the Five Most Recent Events On a database that has profiling enabled, the show profile helper in the
mongo shell displays the 5 most recent operations that took at least 1 millisecond to execute. Issue show profile
from the mongo shell, as follows:
show profile
Profiler Overhead
When enabled, profiling has a minor effect on performance. The system.profile (page 288) collection is a
capped collection with a default size of 1 megabyte. A collection of this size can typically store several thousand
profile documents, but some applications may use more or less profiling data per operation.
To change the size of the system.profile (page 288) collection, you must:
1. Disable profiling.
2. Drop the system.profile (page 288) collection.
3. Create a new system.profile (page 288) collection.
4. Re-enable profiling.
For example, to create a new system.profile (page 288) collection that is 4000000 bytes, use the following
sequence of operations in the mongo shell:
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection( "system.profile", { capped: true, size:4000000 } )
db.setProfilingLevel(1)
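To choose a size for system.profile, divide the capped size by the average size of your profile documents. The plain JavaScript sketch below does that arithmetic; the 400-byte average is an assumption for illustration (you could measure your own, for example with the avgObjSize value from db.system.profile.stats()), and the estimate ignores storage overhead.

```javascript
// Back-of-the-envelope sizing for a capped system.profile collection.
// The average document size passed in is an assumption; measure your own.
function approxProfileCapacity(cappedSizeBytes, avgDocBytes) {
  return Math.floor(cappedSizeBytes / avgDocBytes);
}

// With an assumed average of 400 bytes per profile document:
console.log(approxProfileCapacity(1024 * 1024, 400)); // 2621 (default 1 MB)
console.log(approxProfileCapacity(4000000, 400));     // 10000
```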
To change the size of the system.profile (page 288) collection on a secondary, you must stop the secondary, run
it as a standalone, and then perform the steps above. When done, restart the standalone as a member of the replica set.
For more information, see Perform Maintenance on Replica Set Members (page 610).
When used with the --logpath option or systemLog.path setting, mongod and mongos instances report a
live account of all activity and operations to a log file. When reporting activity data to a log file, by default, MongoDB
only rotates logs in response to the logRotate command, or when the mongod or mongos process receives a
SIGUSR1 signal from the operating system.
MongoDB's standard log rotation approach archives the current log file and starts a new one. To do this, the mongod
or mongos instance renames the current log file by appending a UTC timestamp to the filename, in ISODate format.
It then opens a new log file, closes the old log file, and sends all new log entries to the new log file.
You can also configure MongoDB to support the Linux/Unix logrotate utility by setting systemLog.logRotate
or --logRotate to reopen. With reopen, mongod or mongos closes the log file, and then reopens a log file
with the same name, expecting that another process renamed the file prior to rotation.
Finally, you can configure mongod to send log data to the syslog using the --syslog option. In this case, you
can take advantage of alternative log rotation tools.
See also:
For information on logging, see the Process Logging (page 188) section.
Default Log Rotation Behavior
By default, MongoDB uses the --logRotate rename behavior. With rename, mongod or mongos renames
the current log file by appending a UTC timestamp to the filename, opens a new log file, closes the old log file, and
sends all new log entries to the new log file.
Step 1: Start a mongod instance.
mongod -v --logpath /var/log/mongodb/server1.log
Step 4: View the new log files List the new log files to view the newly-created log:
ls /var/log/mongodb/server1.log*
There should be two log files listed: server1.log, which is the new log file that mongod or mongos created after the
rotation, and server1.log.<timestamp>, the renamed original log file.
Rotating log files does not modify the old rotated log files. When you rotate a log, you rename the server1.log
file to include the timestamp, and a new, empty server1.log file receives all new log input.
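The renamed file's suffix is the rotation time in UTC. Assuming the suffix takes the form YYYY-MM-DDTHH-MM-SS (an ISO date with the colons replaced by hyphens), the hypothetical rotatedLogName helper below reproduces such a name in plain JavaScript; verify the exact format against your own ls output after a rotation.

```javascript
// Sketch of a rotated log file name under the default "rename" behavior.
// Assumption: the suffix is the UTC time in ISO form with ':' replaced by '-'.
function rotatedLogName(logPath, when) {
  const stamp = when.toISOString() // e.g. 2011-11-24T23:30:00.000Z
    .slice(0, 19)                  // drop milliseconds and the trailing Z
    .replace(/:/g, "-");           // 2011-11-24T23-30-00
  return `${logPath}.${stamp}`;
}

console.log(rotatedLogName("/var/log/mongodb/server1.log",
  new Date(Date.UTC(2011, 10, 24, 23, 30, 0))));
// /var/log/mongodb/server1.log.2011-11-24T23-30-00
```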
Log Rotation with --logRotate reopen
You should rename the log file using an external process, following the typical Linux/Unix log rotate behavior.
Syslog Log Rotation
Do not include --logpath. Since --syslog tells mongod to send log data to the syslog, specifying a
--logpath will cause an error.
To specify the facility level used when logging messages to the syslog, use the --syslogFacility option or
systemLog.syslogFacility configuration setting.
Step 2: Rotate the log. Store and rotate the log output using your system's default log rotation mechanism.
Forcing a Log Rotation with SIGUSR1
For Linux and Unix-based systems, you can use the SIGUSR1 signal to rotate the logs for a single process, as in the
following:
Manage Journaling
MongoDB uses write ahead logging to an on-disk journal to guarantee write operation (page 71) durability and to
provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation to the
journal. If MongoDB should terminate or encounter an error before it can write the changes from the journal to the
data files, MongoDB can re-apply the write operation and maintain a consistent state.
Without a journal, if mongod exits unexpectedly, you must assume your data is in an inconsistent state, and you must
run either repair (page 257) or, preferably, resync (page 613) from a clean member of the replica set.
With journaling enabled, if mongod stops unexpectedly, the program can recover everything written to the journal,
and the data remains in a consistent state. By default, the greatest extent of lost writes, i.e., those not made to the
journal, are those made in the last 100 milliseconds. See commitIntervalMs for more information on the default.
With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the data set plus
the write working set. The write working set is the amount of unique data you expect to see written between
re-mappings of the private view. For information on views, see Storage Views used in Journaling (page 301).
Important: Changed in version 2.0: For 64-bit builds of mongod, journaling is enabled by default. For other
platforms, see storage.journal.enabled.
Procedures
Enable Journaling Changed in version 2.0: For 64-bit builds of mongod, journaling is enabled by default.
To enable journaling, start mongod with the --journal command line option.
If no journal files exist, when mongod starts, it must preallocate new journal files. During this operation, the mongod
is not listening for connections until preallocation completes; on some systems this may take several minutes.
During this period your applications and the mongo shell are not available.
Disable Journaling
Warning: Do not disable journaling on production systems. If your mongod instance stops unexpectedly without
shutting down cleanly for any reason (e.g. power failure) and you are not running with journaling, then you
must recover from an unaffected replica set member or backup, as described in repair (page 257).
To disable journaling, start mongod with the --nojournal command line option.
Get Commit Acknowledgment You can get commit acknowledgment with the Write Concern (page 76) and the j
option. For details, see Write Concern Reference (page 128).
Avoid Preallocation Lag To avoid preallocation lag (page 301), you can preallocate files in the journal directory by
copying them from another instance of mongod.
Preallocated files do not contain data. It is safe to later remove them. But if you restart mongod with journaling,
mongod will create them again.
Example
The following sequence preallocates journal files for an instance of mongod running on port 27017 with a database
path of /data/db.
For demonstration purposes, the sequence starts by creating a set of journal files in the usual way.
1. Create a temporary directory into which to create a set of journal files:
mkdir ~/tmpDbpath
2. Create a set of journal files by starting a mongod instance that uses the temporary directory:
mongod --port 10000 --dbpath ~/tmpDbpath --journal
3. When you see the following log output, indicating mongod has the files, press CONTROL+C to stop the
mongod instance:
[initandlisten] waiting for connections on port 10000
4. Preallocate journal files for the new instance of mongod by moving the journal files from the data directory of
the existing instance to the data directory of the new instance:
mv ~/tmpDbpath/journal /data/db/
Monitor Journal Status Use the following commands and methods to monitor journal status:
serverStatus
The serverStatus command returns database status information that is useful for assessing performance.
journalLatencyTest
Use journalLatencyTest to measure how long it takes on your volume to write to the disk in an append-only fashion. You can run this command on an idle system to get a baseline sync time for journaling. You can
also run this command on a busy system to see the sync time on a busy system, which may be higher if the
journal directory is on the same volume as the data files.
The journalLatencyTest command also provides a way to check if your disk drive is buffering writes in
its local cache. If the number is very low (i.e., less than 2 milliseconds) and the drive is non-SSD, the drive
is probably buffering writes. In that case, enable cache write-through for the device in your operating system,
unless you have a disk controller card with battery backed RAM.
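The rule of thumb above can be written as a small decision function. interpretJournalLatency is a hypothetical plain JavaScript helper, not part of MongoDB; the 2-millisecond cutoff comes from the text, and the drive and controller flags are inputs you would determine yourself.

```javascript
// Hypothetical helper encoding the rule of thumb above; not part of MongoDB.
// syncMillis: the sync time journalLatencyTest reports for the journal volume.
function interpretJournalLatency(syncMillis, { isSSD, batteryBackedController }) {
  const likelyBuffering = syncMillis < 2 && !isSSD;
  if (!likelyBuffering) {
    return "no sign of write buffering";
  }
  return batteryBackedController
    ? "probably buffering writes; battery-backed controller RAM, write-through not required"
    : "probably buffering writes; enable cache write-through for the device";
}

console.log(interpretJournalLatency(0.5, { isSSD: false, batteryBackedController: false }));
// probably buffering writes; enable cache write-through for the device
```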
Change the Group Commit Interval Changed in version 2.0.
You can set the group commit interval using the --journalCommitInterval command line option. The allowed
range is 2 to 300 milliseconds.
Lower values increase the durability of the journal at the expense of disk performance.
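A deployment script can check the value against the allowed range before building the mongod command line. The plain JavaScript sketch below uses a hypothetical helper name and additionally assumes the value should be a whole number of milliseconds.

```javascript
// Sketch with a hypothetical helper name: validate a group commit interval
// against the documented 2-300 ms range and build the mongod arguments.
function journalCommitIntervalArgs(ms) {
  // Assumption: the value is a whole number of milliseconds.
  if (!Number.isInteger(ms) || ms < 2 || ms > 300) {
    throw new RangeError("journalCommitInterval must be an integer from 2 to 300");
  }
  return ["--journal", "--journalCommitInterval", String(ms)];
}

console.log(["mongod", ...journalCommitIntervalArgs(50)].join(" "));
// mongod --journal --journalCommitInterval 50
```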
Recover Data After Unexpected Shutdown On a restart after a crash, MongoDB replays all journal files in the
journal directory before the server becomes available. If MongoDB must replay journal files, mongod notes these
events in the log output.
There is no reason to run repairDatabase in these situations.
The _id field holds the name of the function and is unique per database.
The value field holds the function definition.
Once you save a function in the system.js collection, you can use the function from any JavaScript context; e.g.
$where operator, mapReduce command or db.collection.mapReduce().
In the mongo shell, you can use db.loadServerScripts() to load all the scripts saved in the system.js
collection for the current database. Once loaded, you can invoke the functions directly in the shell, as in the following
example:
db.loadServerScripts();
echoFunction(3);
myAddFunction(3, 5);
Ensure you have an up-to-date backup of your data set. See MongoDB Backup Methods (page 182).
Consult the following documents for any special considerations or compatibility issues specific to your MongoDB release:
The release notes, located at Release Notes (page 763).
The documentation for your driver. See the Drivers page for more information.
If your installation includes replica sets, plan the upgrade during a predefined maintenance window.
Before you upgrade a production environment, use the procedures in this document to upgrade a staging environment that reproduces your production environment, to ensure that your production configuration is compatible
with all changes.
Upgrade Procedure
To upgrade a replica set, upgrade each member individually, starting with the secondaries and finishing with the
primary. Plan the upgrade during a predefined maintenance window.
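The upgrade order can be derived from replica set status. The plain JavaScript sketch below works on an object shaped like rs.status() output, using its members array and each member's name and stateStr fields; upgradeOrder and the host names are hypothetical, and members in other states (for example, arbiters) are simply skipped.

```javascript
// Sketch: derive a rolling-upgrade order (secondaries first, primary last)
// from an object shaped like rs.status() output.
function upgradeOrder(rsStatus) {
  const secondaries = rsStatus.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => m.name);
  const primary = rsStatus.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY in the status document");
  }
  return [...secondaries, primary.name];
}

// Made-up members.
const exampleStatus = {
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY" },
    { name: "db2.example.net:27017", stateStr: "SECONDARY" },
    { name: "db3.example.net:27017", stateStr: "SECONDARY" },
  ],
};
console.log(upgradeOrder(exampleStatus).join(" -> "));
// db2.example.net:27017 -> db3.example.net:27017 -> db1.example.net:27017
```

Between members, you would still wait for each upgraded secondary to return to SECONDARY, as described in the steps that follow.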
Upgrade Secondaries Upgrade each secondary separately as follows:
1. Upgrade the secondary's mongod binary by following the instructions below in Upgrade a MongoDB Instance
(page 234).
2. After upgrading a secondary, wait for the secondary to recover to the SECONDARY state before upgrading the
next instance. To check the member's state, issue rs.status() in the mongo shell.
The secondary may briefly go into STARTUP2 or RECOVERING. This is normal. Make sure to wait for the
secondary to fully recover to SECONDARY before you continue the upgrade.
Upgrade the Primary
1. Step down the primary to initiate the normal failover (page 560) procedure, using one of the following:
The rs.stepDown() helper in the mongo shell.
Overview
MongoDB Enterprise can provide database metrics via SNMP, in support of centralized data collection and aggregation. This procedure explains the setup and configuration of a mongod instance as an SNMP subagent, as well as
initializing and testing of SNMP support with MongoDB Enterprise.
See also:
Troubleshoot SNMP (page 239) and Monitor MongoDB Windows with SNMP (page 238) for complete instructions on
using MongoDB with SNMP on Windows systems.
Considerations
Only mongod instances provide SNMP support. mongos and the other MongoDB binaries do not support SNMP.
Configuration Files
mongod.conf.master:
The configuration file to run mongod as the SNMP master. This file sets SNMP run-time configuration options.
Procedure
Step 1: Copy configuration files. Use the following sequence of commands to move the SNMP configuration files
to the SNMP service configuration directory.
First, create the SNMP configuration directory if needed and then, from the installation directory, copy the configuration files to the SNMP service configuration directory:
mkdir -p /etc/snmp/
cp MONGOD-MIB.txt /usr/share/snmp/mibs/MONGOD-MIB.txt
cp mongod.conf.subagent /etc/snmp/mongod.conf
By default SNMP uses UNIX domain sockets for communication between the agent (i.e. snmpd or the master) and sub-agent
(i.e. MongoDB).
Ensure that the agentXAddress specified in the SNMP configuration file for MongoDB matches the
agentXAddress in the SNMP master configuration file.
Step 2: Start MongoDB. Start mongod with the snmp-subagent option to send data to the SNMP master.
mongod --snmp-subagent
Step 3: Confirm SNMP data retrieval. Use snmpwalk to collect data from mongod:
Connect an SNMP client to verify the ability to collect SNMP data from MongoDB.
Install the net-snmp package, which provides the snmpwalk SNMP client.
snmpwalk -m /usr/share/snmp/mibs/MONGOD-MIB.txt -v 2c -c mongodb 127.0.0.1:<port> 1.3.6.1.4.1.34601
<port> refers to the port defined by the SNMP master, not the primary port used by mongod for client communication.
Optional: Run MongoDB as SNMP Master
You can run mongod with the snmp-master option for testing purposes. To do this, use the SNMP master configuration file instead of the subagent configuration file. From the directory containing the unpacked MongoDB installation
files:
cp mongod.conf.master /etc/snmp/mongod.conf
Overview
MongoDB Enterprise can report system information into SNMP traps, to support centralized data collection and
aggregation. This procedure explains the setup and configuration of a mongod.exe instance as an SNMP subagent,
as well as initializing and testing of SNMP support with MongoDB Enterprise.
See also:
Monitor MongoDB With SNMP on Linux (page 236) and Troubleshoot SNMP (page 239) for more information.
Considerations
Only mongod.exe instances provide SNMP support. mongos.exe and the other MongoDB binaries do not support
SNMP.
Configuration Files
Step 1: Copy configuration files. Use the following sequence of commands to move the SNMP configuration files
to the SNMP service configuration directory.
First, create the SNMP configuration directory if needed and then, from the installation directory, copy the configuration files to the SNMP service configuration directory:
md C:\snmp\etc\config
copy MONGOD-MIB.txt C:\snmp\etc\config\MONGOD-MIB.txt
copy mongod.conf.subagent C:\snmp\etc\config\mongod.conf
Edit the configuration file to ensure that the communication between the agent (i.e. snmpd or the master) and subagent (i.e. MongoDB) uses TCP.
Ensure that the agentXAddress specified in the SNMP configuration file for MongoDB matches the
agentXAddress in the SNMP master configuration file.
Step 2: Start MongoDB. Start mongod.exe with the snmp-subagent option to send data to the SNMP master.
mongod.exe --snmp-subagent
Step 3: Confirm SNMP data retrieval. Use snmpwalk to collect data from mongod.exe:
Connect an SNMP client to verify the ability to collect SNMP data from MongoDB.
Install the net-snmp package, which provides the snmpwalk SNMP client.
snmpwalk -m C:\snmp\etc\config\MONGOD-MIB.txt -v 2c -c mongodb 127.0.0.1:<port> 1.3.6.1.4.1.34601
<port> refers to the port defined by the SNMP master, not the primary port used by mongod.exe for client
communication.
Optional: Run MongoDB as SNMP Master
You can run mongod.exe with the snmp-master option for testing purposes. To do this, use the SNMP master
configuration file instead of the subagent configuration file. From the directory containing the unpacked MongoDB
installation files:
copy mongod.conf.master C:\snmp\etc\config\mongod.conf
Troubleshoot SNMP
New in version 2.6.
Enterprise Feature
SNMP is only available in MongoDB Enterprise.
Overview
MongoDB Enterprise can report system information into SNMP traps, to support centralized data collection and
aggregation. This document identifies common problems you may encounter when deploying MongoDB Enterprise
with SNMP as well as possible solutions for these issues.
See Monitor MongoDB With SNMP on Linux (page 236) and Monitor MongoDB Windows with SNMP (page 238) for
complete installation instructions.
Issues
AgentX is the SNMP agent extensibility protocol defined in Internet RFC 2741. It explains how to define additional
data to monitor over SNMP. When MongoDB fails to connect to the agentx master agent, use the following procedure
to ensure that the SNMP subagent can connect properly to the SNMP master.
1. Make sure the master agent is running.
2. Compare the SNMP master's configuration file with the subagent configuration file. Ensure that the agentx
socket definition is the same between the two.
3. Check the SNMP configuration files to see if they specify using UNIX Domain Sockets. If so, confirm that the
mongod has appropriate permissions to open a UNIX domain socket.
Error Parsing Command Line You may see one of the following errors at the command line:
Error parsing command line: unknown option snmp-master
try 'mongod --help' for more information
Error parsing command line: unknown option snmp-subagent
try 'mongod --help' for more information
mongod binaries that are not part of the Enterprise Edition produce this error. Install the Enterprise Edition (page 29)
and attempt to start mongod again.
Other MongoDB binaries, including mongos, will produce this error if you attempt to start them with snmp-master
or snmp-subagent. Only mongod supports SNMP.
Error Starting SNMPAgent The following line in the log file indicates that mongod cannot read the
mongod.conf file:
[SNMPAgent] warning: error starting SNMPAgent as master err:1
If running on Linux, ensure mongod.conf exists in the /etc/snmp directory, and ensure that the mongod UNIX
user has permission to read the mongod.conf file.
If running on Windows, ensure mongod.conf exists in C:\snmp\etc\config.
Backup and Restore Sharded Clusters (page 249) Detailed procedures and considerations for backing up sharded
clusters and single shards.
Recover Data after an Unexpected Shutdown (page 257) Recover data from MongoDB data files that were not properly closed or have an invalid state.
Backup and Restore with Filesystem Snapshots
This document describes a procedure for creating backups of MongoDB systems using system-level tools, such as
LVM or storage appliance, as well as the corresponding restoration strategies.
These filesystem snapshots, or block-level backup methods use system level tools to create copies of the device that
holds MongoDB's data files. These methods complete quickly and work reliably, but require more system configuration outside of MongoDB.
See also:
MongoDB Backup Methods (page 182) and Back Up and Restore with MongoDB Tools (page 246).
Snapshots Overview
Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to hard links. As the working data diverges from the snapshot, the snapshot process uses a
copy-on-write strategy. As a result the snapshot only stores modified data.
After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The
resulting backup contains a full copy of all data.
Snapshots have the following limitations:
The database must be valid when the snapshot takes place. This means that all writes accepted by the database
need to be fully written to disk: either to the journal or to data files.
If all writes are not on disk when the backup occurs, the backup will not reflect these changes. If writes are in
progress when the backup occurs, the data files will reflect an inconsistent state. With journaling all data-file
states resulting from in-progress writes are recoverable; without journaling you must flush all pending writes
to disk before running the backup operation and must ensure that no writes occur during the entire backup
procedure.
If you do use journaling, the journal must reside on the same volume as the data.
Snapshots create an image of an entire disk. Unless you need to back up your entire system, consider
isolating your MongoDB data files, journal (if applicable), and configuration on one logical disk that doesn't
contain any other data.
Alternately, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data.
Copy data from snapshots onto other systems to ensure that data is safe from site failures.
Although different snapshot methods provide different capabilities, the LVM method outlined below does not
provide any capacity for capturing incremental backups.
Snapshots With Journaling If your mongod instance has journaling enabled, then you can use any kind of file
system or volume/block level snapshot tool to create backups.
If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk
packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment.
Note: Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB.
Snapshots with Amazon EBS in a RAID 10 Configuration If your deployment depends on Amazon's Elastic
Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all
disks using the platform's snapshot tool. As an alternative, you can do one of the following:
Flush all writes to disk and create a write lock to ensure consistent state during the backup process.
If you choose this option see Create Backups on Instances that do not have Journaling Enabled (page 244).
Configure LVM to run and hold your MongoDB data files on top of the RAID within your system.
If you choose this option, perform the LVM backup operation described in Create a Snapshot (page 242).
Backup and Restore Using LVM on a Linux System
This section provides an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system, the following steps provide a high-level overview of the
backup operation.
Note: Only use the following procedure as a guideline for a backup system and infrastructure. Production backup
systems must consider a number of application specific requirements and factors unique to specific environments.
Create a Snapshot To create a snapshot with LVM, issue a command as root in the following format:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb
volume in the vg0 volume group.
This example creates a snapshot named mdb-snap01 located at /dev/vg0/mdb-snap01. The location and
paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM
configuration.
The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of the data on the disk, but rather the quantity of differences between the current state of
/dev/vg0/mongodb and its state when the snapshot (i.e. /dev/vg0/mdb-snap01) was created.
Warning: Ensure that you create snapshots with enough space to account for data growth, particularly for the
period of time that it takes to copy data out of the system or to a temporary image.
If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create
another.
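The space a snapshot needs depends on how much of the origin volume changes while the snapshot exists, not on the size of the data. The following JavaScript sketch makes that arithmetic explicit. It assumes you have measured a steady rate at which previously unchanged blocks on /dev/vg0/mongodb are overwritten (changeRateMBps) and know how long the snapshot must survive (copySeconds). The function names and the safety factor are illustrative, and the estimate ignores LVM's own metadata overhead and chunk granularity.

```javascript
// Rough sizing sketch for the copy-on-write space of an LVM snapshot.
// ASSUMPTIONS: changeRateMBps is a measured rate of newly changed origin
// blocks (a block rewritten twice is only copied once), copySeconds is how
// long the snapshot must live, and safetyFactor is an arbitrary margin.
// LVM metadata overhead and chunk size are ignored.
function estimateSnapshotSizeMB(changeRateMBps, copySeconds, safetyFactor) {
  if (changeRateMBps < 0 || copySeconds < 0 || safetyFactor < 1) {
    throw new Error("expected non-negative rate and duration, safetyFactor >= 1");
  }
  // Changed blocks accumulate for the whole lifetime of the snapshot.
  return Math.ceil(changeRateMBps * copySeconds * safetyFactor);
}

// Seconds until a snapshot of the given size fills up (and becomes
// unusable) at a steady change rate.
function secondsUntilFull(snapshotSizeMB, changeRateMBps) {
  return changeRateMBps === 0 ? Infinity : snapshotSizeMB / changeRateMBps;
}

// Example: 0.5 MB/s of new changes while a 10-minute copy runs.
// estimateSnapshotSizeMB(0.5, 600, 1.5) returns 450
// secondsUntilFull(100, 0.5) returns 200
```

With these example numbers, the 100M snapshot from the lvcreate example would fill after about 200 seconds, well before a 10-minute copy finishes.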
The snapshot will exist when the command returns. You can restore directly from the snapshot at any time or by
creating a new logical volume and restoring from this snapshot to the alternate image.
While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing
backup data. Snapshots typically depend and reside on the same storage infrastructure as the original disk images.
Therefore, it's crucial that you archive these snapshots and store them elsewhere.
Archive a Snapshot After creating a snapshot, mount the snapshot and copy the data to separate storage. Your
system might try to compress the backup images as you move them offline. Alternatively, take a block level copy of the
snapshot image, such as with the following procedure:
Chapter 5. Administration
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
Restore a Snapshot To restore a snapshot created with the above method, issue the following sequence of commands:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
You can implement off-system backups using the combined process (page 243) and SSH.
This sequence is identical to procedures explained above, except that it archives and compresses the backup on a
remote system using SSH.
Consider the following procedure:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com gzip > /opt/backup/mdb-snap01.gz
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Create Backups on Instances that do not have Journaling Enabled
If your mongod instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a
functional backup of a consistent state is more complicated. As described in this section, you must flush all writes to
disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, then
for your backup use a secondary which is not receiving reads (i.e. hidden member).
Important: This procedure is only supported with the MMAPv1 storage engine.
In the following procedure, you must issue the db.fsyncLock() and db.fsyncUnlock() operations
on the same connection.
The client that issues db.fsyncLock() is solely responsible for issuing a
db.fsyncUnlock() operation and must be able to handle potential error conditions so that it can perform the
db.fsyncUnlock() before terminating the connection.
Step 1: Flush writes to disk and lock the database to prevent further writes. To flush writes to disk and lock
the database, issue the db.fsyncLock() method in the mongo shell:
db.fsyncLock();
Changed in version 2.2: When used in combination with fsync or db.fsyncLock(), mongod will block
reads, including those from mongodump, when a queued write operation waits behind the fsync lock. Do not use
mongodump with db.fsyncLock().
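Because the client that calls db.fsyncLock() must also call db.fsyncUnlock() on the same connection, even when something goes wrong, a try/finally wrapper is a natural fit. The sketch below is plain JavaScript intended for the mongo shell. withFsyncLock is a hypothetical helper name, backupStep stands in for whatever triggers and waits for your snapshot, and checking ok: 1 in the lock response is an assumption about the command's reply.

```javascript
// Sketch: hold the fsync lock only while backupStep runs, and always
// attempt the unlock through the same db handle (same connection).
// `dbHandle` is the mongo shell's `db` or any object exposing fsyncLock()
// and fsyncUnlock(); `backupStep` is a hypothetical callback you supply.
function withFsyncLock(dbHandle, backupStep) {
  var lockResult = dbHandle.fsyncLock();
  if (!lockResult || lockResult.ok !== 1) {
    // Nothing was locked, so there is nothing to unlock.
    throw new Error("fsyncLock failed: " + JSON.stringify(lockResult));
  }
  try {
    return backupStep();
  } finally {
    // Runs whether backupStep returned normally or threw, so this
    // client always attempts the unlock before giving up control.
    dbHandle.fsyncUnlock();
  }
}

// Usage in the mongo shell (the callback body is a placeholder):
// withFsyncLock(db, function () { /* take the snapshot here */ });
```

Any error thrown by the backup step still propagates to the caller, but only after the unlock has been issued.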
Restore a Replica Set from MongoDB Backups
This procedure outlines the process for taking MongoDB data and restoring that data into a new replica set. Use this
approach for seeding test deployments from production backups as well as part of disaster recovery.
You cannot restore a single data set to three new mongod instances and then create a replica set. In this situation
MongoDB will force the secondaries to perform an initial sync. The procedures in this document describe the correct
and efficient ways to deploy a replica set.
Restore Database into a Single Node Replica Set
Step 1: Obtain backup MongoDB Database files. The backup files may come from a file system snapshot
(page 241). The MongoDB Management Service (MMS)74 produces MongoDB database files for stored snapshots75
and point-in-time snapshots76. You can also use mongorestore to restore database files using data created with
mongodump. See Back Up and Restore with MongoDB Tools (page 246) for more information.
Step 2: Start a mongod using data files from the backup as the data path. The following example uses
/data/db as the data path, as specified in the dbpath setting:
mongod --dbpath /data/db
Step 3: Convert the standalone mongod to a single-node replica set Convert the standalone mongod process to
a single-node replica set by shutting down the mongod instance, and restarting it with the --replSet option, as in
the following example:
mongod --dbpath /data/db --replSet <replName>
Optionally, you can explicitly set an oplogSizeMB to control the size of the oplog created for this replica set member.
Step 4: Connect to the mongod instance. For example, use the following command to connect to a mongod instance
running on the localhost interface:
mongo
Step 5: Initiate the new replica set. Use rs.initiate() to initiate the new replica set, as in the following
example:
rs.initiate()
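Calling rs.initiate() with no argument lets MongoDB generate a default configuration. If you want to control the member's host string, you can instead pass an explicit configuration document. The following sketch builds one. buildReplSetConfig is a hypothetical helper, the set name "rs0" and the host "localhost:27017" are placeholders, and the document's _id must match the name given to --replSet.

```javascript
// Sketch: build a replica set configuration document for rs.initiate().
// setName must equal the <replName> passed to mongod --replSet; the host
// strings are "hostname:port" placeholders.
function buildReplSetConfig(setName, hosts) {
  if (!setName || !hosts || hosts.length === 0) {
    throw new Error("need a set name and at least one host");
  }
  var members = [];
  for (var i = 0; i < hosts.length; i++) {
    // Each member needs a unique numeric _id.
    members.push({ _id: i, host: hosts[i] });
  }
  return { _id: setName, members: members };
}

// Usage in the mongo shell for the single-node set started above:
// rs.initiate(buildReplSetConfig("rs0", ["localhost:27017"]));
```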
MongoDB provides two options for restoring secondary members of a replica set:
Manually copy the database files to each data directory.
Allow initial sync (page 574) to distribute data automatically.
The following sections outline both approaches.
Note: If your database is large, initial sync can take a long time to complete. For large databases, it might be
preferable to copy the database files onto each host.
Copy Database Files and Restart mongod Instance Use the following sequence of operations to seed additional
members of the replica set with the restored data by copying MongoDB data files directly.
Step 1: Shut down the mongod instance that you restored. Use --shutdown or db.shutdownServer()
to ensure a clean shutdown.
74 https://mms.mongodb.com/
75 https://docs.mms.mongodb.com/tutorial/restore-from-snapshot/
76 https://docs.mms.mongodb.com/tutorial/restore-from-point-in-time-snapshot/
Step 2: Copy the primary's data directory to each secondary. Copy the primary's data directory into the dbPath
of the other members of the replica set. The dbPath is /data/db by default.
Step 3: Start the mongod instance that you restored.
Step 4: Add the secondaries to the replica set. In a mongo shell connected to the primary, add the secondaries to
the replica set using rs.add(). See Deploy a Replica Set (page 583) for more information about deploying a replica
set.
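When seeding several secondaries, it can help to add them one at a time and stop at the first failure, so you know exactly which members joined. This sketch takes the shell's rs helper as a parameter (so it can be exercised with a stand-in object) and assumes rs.add() returns a command response with ok: 1 on success. addSecondaries is a hypothetical name and the host names are placeholders.

```javascript
// Sketch: add each seeded secondary with rs.add(), stopping at the first
// failure. `replSet` is the shell's `rs` helper, passed in explicitly.
function addSecondaries(replSet, hosts) {
  var added = [];
  for (var i = 0; i < hosts.length; i++) {
    var res = replSet.add(hosts[i]);
    if (!res || res.ok !== 1) {
      throw new Error("rs.add(\"" + hosts[i] + "\") failed after adding [" +
                      added.join(", ") + "]: " + JSON.stringify(res));
    }
    added.push(hosts[i]);
  }
  return added;
}

// Usage from a mongo shell connected to the primary (hypothetical hosts):
// addSecondaries(rs, ["mongodb1.example.net:27017", "mongodb2.example.net:27017"]);
```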
Update Secondaries using Initial Sync Use the following sequence of operations to seed additional members of
the replica set with the restored data using the default initial sync operation.
Step 1: Ensure that the data directories on the prospective replica set members are empty.
Step 2: Add each prospective member to the replica set. When you add a member to the replica set, Initial Sync
(page 574) copies the data from the primary to the new member.
Back Up and Restore with MongoDB Tools
This document describes the process for writing and restoring backups to files in binary format with the mongodump
and mongorestore tools.
Use these tools for backups if other backup methods, such as the MMS Backup Service77 or file system snapshots
(page 241) are unavailable.
See also:
MongoDB Backup Methods (page 182), mongodump, and mongorestore.
Backup a Database with mongodump
To back up the user-defined roles on a database, you must have the find (page 404) action on the admin database's
admin.system.roles (page 287) collection. Both the backup (page 395) and userAdminAnyDatabase
(page 396) roles provide this privilege.
Basic mongodump Operations The mongodump utility backs up data by connecting to a running mongod or
mongos instance.
The utility can create a backup for an entire server, database or collection, or can use a query to back up just part of a
collection.
When you run mongodump without any arguments, the command connects to the MongoDB instance on the local
system (e.g. 127.0.0.1 or localhost) on port 27017 and creates a database backup named dump/ in the
current directory.
To back up data from a mongod or mongos instance running on the same machine and on the default port of 27017,
use the following command:
mongodump
The data format used by mongodump from version 2.2 or later is incompatible with earlier versions of mongod. Do
not use recent versions of mongodump to back up older data stores.
You can also specify the --host and --port of the MongoDB instance that the mongodump should connect to.
For example:
mongodump --host mongodb.example.net --port 27017
mongodump will write BSON files that hold a copy of data accessible via the mongod listening on port 27017 of
the mongodb.example.net host. See Create Backups from Non-Local mongod Instances (page 247) for more
information.
To specify a different output directory, you can use the --out or -o option:
mongodump --out /data/backup/
To limit the amount of data included in the database dump, you can specify --db and --collection as options to
mongodump. For example:
mongodump --collection myCollection --db test
This operation creates a dump of the collection named myCollection from the database test in a dump/ subdirectory of the current working directory.
mongodump overwrites output files if they exist in the backup data folder. Before running the mongodump command
multiple times, either ensure that you no longer need the files in the output folder (the default is the dump/ folder) or
rename the folders or files.
Point in Time Operation Using Oplogs Use the --oplog option with mongodump to collect the oplog entries
to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data
from the source database as well as all of the oplog entries from the beginning to the end of the backup procedure. This
operation, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects the
specific moment in time that corresponds to when mongodump completed creating the dump file.
Create Backups from Non-Local mongod Instances The --host and --port options for mongodump allow
you to connect to and back up from a remote host. Consider the following example:
mongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/m
On any mongodump command you may, as above, specify username and password credentials for database
authentication.
Restore a Database with mongorestore
To restore collection data to a database with authentication enabled, the connecting user must possess the appropriate
user roles.
To restore a single database, the connecting user must possess the readWrite (page 390) role for that database.
Alternatively, the readWriteAnyDatabase (page 396) role provides access to restore any database. The restore
(page 395) role also provides the requisite permissions.
Changed in version 2.6.
To restore users and user-defined roles (page 313) on a given database, you must have access to the admin database.
MongoDB stores the user data and role definitions for all databases in the admin database.
Specifically, to restore users to a given database, you must have the insert (page 404) action (page 403) on the
admin database's admin.system.users (page 287) collection. The restore (page 395) role provides this
privilege.
To restore user-defined roles to a database, you must have the insert (page 404) action on the admin database's
admin.system.roles (page 287) collection. The restore (page 395) role provides this privilege.
If your database is running with authentication enabled, you must possess the userAdmin (page 392) role on the
database you are restoring, or the userAdminAnyDatabase (page 396) role, which allows you to restore user data
to any database. The restore (page 395) role also provides the requisite privileges.
Basic mongorestore Operations The mongorestore utility restores a binary backup created by
mongodump. By default, mongorestore looks for a database backup in the dump/ directory.
The mongorestore utility restores data by connecting to a running mongod or mongos directly.
mongorestore can restore either an entire database backup or a subset of the backup.
To use mongorestore to connect to an active mongod or mongos, use a command with the following prototype
form:
mongorestore --port <port number> <path to the backup>
For example, if the path to the backup is the dump-2013-10-25 directory, mongorestore imports that backup to the mongod instance
running on the localhost interface.
Restore Point in Time Oplog Backup If you created your database dump using the --oplog option to ensure a
point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:
mongorestore --oplogReplay
You may also consider using the mongorestore --objcheck option to check the integrity of objects while
inserting them into the database, or you may consider the mongorestore --drop option to drop each collection
from the database before restoring from backups.
Restore Backups to Non-Local mongod Instances By default, mongorestore connects to a MongoDB instance
running on the localhost interface (e.g. 127.0.0.1) and on the default port (27017). If you want to restore to a
different host or port, use the --host and --port options.
Consider the following example:
mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mong
As above, you may specify username and password credentials if your mongod requires authentication.
Additional Resources
Overview If your sharded cluster holds a small data set, you can connect to a mongos using mongodump. You can
create backups of your MongoDB cluster if your backup infrastructure can capture the entire backup in a reasonable
amount of time and if you have a storage system that can hold the complete MongoDB data set.
See MongoDB Backup Methods (page 182) and Backup and Restore Sharded Clusters (page 249) for complete information on backups in MongoDB and backups of sharded clusters in particular.
Important: By default, mongodump issues its queries to the non-primary nodes.
78 https://www.mongodb.com/lp/white-paper/backup-disaster-recovery
79 http://mms.mongodb.com
80 http://www.mongodb.com/blog/post/backup-vs-replication-why-do-you-need-both
To back up all the databases in a cluster via mongodump, you should have the backup (page 395) role. The backup
(page 395) role provides the required privileges for backing up all databases. The role confers no additional access, in
keeping with the policy of least privilege.
To back up a given database, you must have read access on the database. Several roles provide this access, including
the backup (page 395) role.
To back up the system.profile (page 288) collection, which is created when you activate database profiling
(page 219), you must have additional read access on this collection. Several roles provide this access, including the
clusterAdmin (page 392) and dbAdmin (page 391) roles.
Changed in version 2.6.
To back up users and user-defined roles (page 313) for a given database, you must have access to the admin database.
MongoDB stores the user data and role definitions for all databases in the admin database.
Specifically, to back up a given database's users, you must have the find (page 404) action (page 403)
on the admin database's admin.system.users (page 287) collection. The backup (page 395) and
userAdminAnyDatabase (page 396) roles both provide this privilege.
To back up the user-defined roles on a database, you must have the find (page 404) action on the admin database's
admin.system.roles (page 287) collection. Both the backup (page 395) and userAdminAnyDatabase
(page 396) roles provide this privilege.
Considerations If you use mongodump without specifying a database or collection, mongodump will capture
collection data and the cluster meta-data from the config servers (page 650).
You cannot use the --oplog option for mongodump when capturing data from mongos. As a result, if you need
to capture a backup that reflects a single moment in time, you must stop all writes to the cluster for the duration of the
backup operation.
Procedure
Capture Data You can perform a backup of a sharded cluster by connecting mongodump to a mongos. Use the
following operation at your system's prompt:
mongodump --host mongos3.example.net --port 27017
mongodump will write BSON files that hold a copy of data stored in the sharded cluster accessible via the mongos
listening on port 27017 of the mongos3.example.net host.
Restore Data Backups created with mongodump do not reflect the chunks or the distribution of data in the sharded
collection or collections. Like all mongodump output, these backups contain separate directories for each database
and BSON files for each collection in that database.
You can restore mongodump output to any MongoDB instance, including a standalone, a replica set, or a new sharded
cluster. When restoring data to a sharded cluster, you must deploy and configure sharding before restoring data from
the backup. See Deploy a Sharded Cluster (page 670) for more information.
Backup a Sharded Cluster with Filesystem Snapshots
Overview This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance. An alternate procedure uses mongodump
to create binary database dumps when file-system snapshots are not available. See Backup a Sharded Cluster with
Database Dumps (page 252) for the alternate procedure.
See MongoDB Backup Methods (page 182) and Backup and Restore Sharded Clusters (page 249) for complete information on backups in MongoDB and backups of sharded clusters in particular.
Important: To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a
running production system, you can only capture an approximation of a point-in-time snapshot.
Considerations
Balancing It is essential that you stop the balancer before capturing a backup.
If the balancer is active while you capture backups, the backup artifacts may be incomplete and/or have duplicate data,
as chunks may migrate while recording backups.
Precision In this procedure, you will stop the cluster balancer and take a backup of the config database, and
then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time
snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise
the snapshot will only approximate a moment in time.
For approximate point-in-time snapshots, you can improve the quality of the backup while minimizing impact on the
cluster by taking the backup from a secondary member of the replica set that provides each shard.
Consistency If the journal and data files are on the same logical volume, you can use a single point-in-time snapshot
to capture a valid copy of the data.
If the journal and data files are on different file systems, you must use db.fsyncLock() and
db.fsyncUnlock() to capture a valid copy of your data.
Procedure
Step 1: Disable the balancer. Disable the balancer process that equalizes the distribution of data among the shards.
To disable the balancer, use the sh.stopBalancer() method in the mongo shell.
Consider the following example:
use config
sh.stopBalancer()
For more information, see the Disable the Balancer (page 695) procedure.
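Before starting the snapshots, it is worth confirming that the balancer is actually disabled and that no balancing round is still in progress. The sketch below takes the shell's sh helper as a parameter so it can be tested with a stand-in. ensureBalancerStopped is a hypothetical name, and the sketch assumes that sh.getBalancerState() and sh.isBalancerRunning() return booleans, as they do in shells of this release; check the behavior of your own shell version before relying on it.

```javascript
// Sketch: stop the balancer and refuse to continue unless it is disabled
// and idle. `shHelper` is the mongo shell's `sh` object, passed in.
function ensureBalancerStopped(shHelper) {
  shHelper.stopBalancer();  // disables the balancer and waits for a running round
  if (shHelper.getBalancerState()) {
    throw new Error("balancer is still enabled; do not start the backup");
  }
  if (shHelper.isBalancerRunning()) {
    throw new Error("a balancing round is still in progress; wait and retry");
  }
  return true;
}

// Usage from a mongo shell connected to a mongos:
// ensureBalancerStopped(sh);
```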
Step 2: If necessary, lock one secondary member of each replica set in each shard. If your mongod does not
have journaling enabled or your journal and data files are on different volumes, you must lock your mongod before
capturing a backup.
If your mongod has journaling enabled and your journal and data files are on the same volume, you may skip this
step.
If you need to lock the mongod, attempt to lock one secondary member of each replica set in each shard so that your
backups reflect the state of your database at the nearest possible approximation of a single moment in time.
To lock a secondary, connect through the mongo shell to the secondary member's mongod instance and issue the
db.fsyncLock() method.
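Locking one secondary per shard is easier to get right if a failure part-way through releases the locks already taken. This sketch keeps each locked db handle so that the matching db.fsyncUnlock() goes over the same connection. lockSecondaries, unlockAll, and the connect callback are hypothetical names; in the mongo shell, connect could be function (host) { return new Mongo(host).getDB("admin"); }, and the host strings are placeholders.

```javascript
// Sketch: fsync-lock one secondary per shard. If any lock fails, unlock
// the members locked so far before rethrowing. `connect` is a hypothetical
// callback that returns a db handle for a "host:port" string.
function lockSecondaries(hosts, connect) {
  var locked = [];
  try {
    for (var i = 0; i < hosts.length; i++) {
      var handle = connect(hosts[i]);
      var res = handle.fsyncLock();
      if (!res || res.ok !== 1) {
        throw new Error("fsyncLock failed on " + hosts[i]);
      }
      locked.push(handle);  // keep the handle: unlock must use the same connection
    }
  } catch (e) {
    unlockAll(locked);
    throw e;
  }
  return locked;
}

// Unlock every handle returned by lockSecondaries().
function unlockAll(lockedHandles) {
  for (var i = 0; i < lockedHandles.length; i++) {
    lockedHandles[i].fsyncUnlock();
  }
}

// Usage in the mongo shell (hypothetical hosts, one secondary per shard):
// var locked = lockSecondaries(["shard0-sec.example.net:27018", "shard1-sec.example.net:27018"],
//   function (host) { return new Mongo(host).getDB("admin"); });
// ... take the snapshots, then:
// unlockAll(locked);
```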
Step 3: Back up one of the config servers. Backing up a config server (page 650) backs up the sharded cluster's
metadata. You need to back up only one config server, as they all hold the same data. Do one of the following to back up
one of the config servers:
Create a file-system snapshot of the config server. Do this only if the config server has journaling enabled. Use
the procedure in Backup and Restore with Filesystem Snapshots (page 241). Never use db.fsyncLock() on config
databases.
Create a database dump to back up the config server. Issue mongodump against one of the config mongod
instances or via the mongos. If you are running MongoDB 2.4 or later with the --configsvr option, then include
the --oplog option to ensure that the dump includes a partial oplog containing operations from the duration of the
mongodump operation. For example:
mongodump --oplog --db config
Step 4: Back up the replica set members of the shards that you locked. You may back up the shards in parallel.
For each shard, create a snapshot. Use the procedure in Backup and Restore with Filesystem Snapshots (page 241).
Step 5: Unlock locked replica set members. If you locked any members, unlock them now.
Unlock all locked replica set members of each shard using the db.fsyncUnlock() method in the mongo shell.
Step 6: Enable the balancer. Re-enable the balancer with the sh.setBalancerState() method. Use the
following command sequence when connected to the mongos with the mongo shell:
use config
sh.setBalancerState(true)
Backup a Sharded Cluster with Database Dumps
Overview This document describes a procedure for taking a backup of all components of a sharded cluster. This
procedure uses mongodump to create dumps of the mongod instance. An alternate procedure uses file system snapshots to capture the backup data, and may be more efficient in some situations if your system configuration allows file
system backups. See Backup and Restore Sharded Clusters (page 249) for more information.
See MongoDB Backup Methods (page 182) and Backup and Restore Sharded Clusters (page 249) for complete information on backups in MongoDB and backups of sharded clusters in particular.
Prerequisites
Important: To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a
running production system, you can only capture an approximation of a point-in-time snapshot.
To back up all the databases in a cluster via mongodump, you should have the backup (page 395) role. The backup
(page 395) role provides the required privileges for backing up all databases. The role confers no additional access, in
keeping with the policy of least privilege.
To back up a given database, you must have read access on the database. Several roles provide this access, including
the backup (page 395) role.
To back up the system.profile (page 288) collection, which is created when you activate database profiling
(page 219), you must have additional read access on this collection. Several roles provide this access, including the
clusterAdmin (page 392) and dbAdmin (page 391) roles.
Changed in version 2.6.
To back up users and user-defined roles (page 313) for a given database, you must have access to the admin database.
MongoDB stores the user data and role definitions for all databases in the admin database.
Specifically, to back up a given database's users, you must have the find (page 404) action (page 403)
on the admin database's admin.system.users (page 287) collection. The backup (page 395) and
userAdminAnyDatabase (page 396) roles both provide this privilege.
To back up the user-defined roles on a database, you must have the find (page 404) action on the admin database's
admin.system.roles (page 287) collection. Both the backup (page 395) and userAdminAnyDatabase
(page 396) roles provide this privilege.
Consideration To create these backups of a sharded cluster, you will stop the cluster balancer and take a backup of
the config database, and then take backups of each shard in the cluster using mongodump to capture the backup
data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes
before taking the backups; otherwise the snapshot will only approximate a moment in time.
For approximate point-in-time snapshots, taking the backup from a single offline secondary member of the replica set
that provides each shard can improve the quality of the backup while minimizing impact on the cluster.
Procedure
Step 1: Disable the balancer process. Disable the balancer process that equalizes the distribution of data among
the shards. To disable the balancer, use the sh.stopBalancer() method in the mongo shell. For example:
use config
sh.stopBalancer()
For more information, see the Disable the Balancer (page 695) procedure.
Warning: If you do not stop the balancer, the backup could have duplicate data or omit data as chunks migrate
while recording backups.
Step 2: Lock replica set members. Lock one member of each replica set in each shard so that your backups reflect
the state of your database at the nearest possible approximation of a single moment in time. Lock these mongod
instances in as short an interval as possible.
To lock or freeze a sharded cluster, you shut down one member of each replica set. Ensure that the oplog has sufficient
capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See
Oplog Size (page 573) for more information.
Step 3: Back up one config server. Use mongodump to back up one of the config servers (page 650). This backs up
the cluster's metadata. You only need to back up one config server, as they all hold the same data.
Use the mongodump tool to capture the content of the config mongod instances.
Your config servers must run MongoDB 2.4 or later with the --configsvr option, and the mongodump command
must include the --oplog option to capture a consistent copy of the config database.
Step 4: Back up replica set members. Back up the replica set members of the shards that you shut down using
mongodump and specifying the --dbpath option. You may back up the shards in parallel. Consider the following
invocation:
mongodump --journal --dbpath /data/db/ --out /data/backup/
You must run mongodump on the same system where the mongod ran. This operation will create a dump of all the
data managed by the mongod instances that used the dbPath /data/db/. mongodump writes the output of this
dump to the /data/backup/ directory.
Step 5: Restart replica set members. Restart all stopped replica set members of each shard as normal and allow
them to catch up with the state of the primary.
Step 6: Re-enable the balancer process. Re-enable the balancer with the sh.setBalancerState() method.
Use the following command sequence when connected to the mongos with the mongo shell:
use config
sh.setBalancerState(true)
Overview In a sharded cluster, the balancer process is responsible for distributing sharded data around the cluster,
so that each shard has roughly the same amount of data.
However, when creating backups from a sharded cluster, it is important that you disable the balancer while taking
backups to ensure that no chunk migrations affect the content of the backup captured by the backup procedure. Using
the procedure outlined in the section Disable the Balancer (page 695) you can manually stop the balancer process
temporarily. As an alternative you can use this procedure to define a balancing window so that the balancer is always
disabled during your automated backup operation.
Procedure If you have an automated backup schedule, you can disable all balancing operations for a period of time.
For instance, consider the following command:
use config
db.settings.update( { _id : "balancer" }, { $set : { activeWindow : { start : "6:00", stop : "23:00" } } }, { upsert : true } )
This operation configures the balancer to run between 6:00am and 11:00pm, server time. Schedule your backup
operation to run and complete outside of this time. Ensure that the backup can complete outside the window when
the balancer is running and that the balancer can effectively balance the collection among the shards in the window
allotted to each.
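To check that a scheduled backup really runs outside the balancing window, you can compare the backup's start time and expected duration against the activeWindow values. The sketch below does that arithmetic. It assumes times are "H:MM" or "HH:MM" strings in server time, as in the example above, treats the window as including its start and excluding its stop, and reads a start later than stop as a window that wraps past midnight. The function names are hypothetical.

```javascript
// Convert "H:MM" or "HH:MM" (server time) to minutes after midnight.
function toMinutes(hhmm) {
  var parts = String(hhmm).split(":");
  var h = parseInt(parts[0], 10);
  var m = parseInt(parts[1], 10);
  if (parts.length !== 2 || isNaN(h) || isNaN(m) || h < 0 || h > 23 || m < 0 || m > 59) {
    throw new Error("bad time: " + hhmm);
  }
  return h * 60 + m;
}

// ASSUMPTION: the window is [start, stop); start > stop wraps past midnight.
function inWindow(minute, start, stop) {
  return start <= stop ? (minute >= start && minute < stop)
                       : (minute >= start || minute < stop);
}

// True if every minute of the backup falls outside the balancer window.
function backupFitsOutsideWindow(startHHMM, durationMinutes, activeWindow) {
  var start = toMinutes(activeWindow.start);
  var stop = toMinutes(activeWindow.stop);
  var t0 = toMinutes(startHHMM);
  for (var i = 0; i < durationMinutes; i++) {
    if (inWindow((t0 + i) % 1440, start, stop)) {
      return false;
    }
  }
  return true;
}

// With the window from the example above ({ start: "6:00", stop: "23:00" }):
// backupFitsOutsideWindow("23:30", 240, { start: "6:00", stop: "23:00" }) returns true
// backupFitsOutsideWindow("4:00", 180, { start: "6:00", stop: "23:00" }) returns false
```

A four-hour backup starting at 23:30 ends at 3:30, entirely outside the window; a three-hour backup starting at 4:00 would still be running when balancing resumes at 6:00.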
Restore a Single Shard
Overview Restoring a single shard from backup with other unaffected shards requires a number of special considerations and practices. This document outlines the additional tasks you must perform when restoring a single shard.
Consider the following resources on backups in general as well as backup and restoration of sharded clusters specifically:
Overview You can restore a sharded cluster either from snapshots (page 241) or from BSON database dumps
(page 252) created by the mongodump tool. This document provides procedures for both:
Restore a Sharded Cluster with Filesystem Snapshots (page 255)
Restore a Sharded Cluster with Database Dumps (page 256)
Related Documents For an overview of backups in MongoDB, see MongoDB Backup Methods (page 182). For
complete information on backups and backups of sharded clusters in particular, see Backup and Restore Sharded
Clusters (page 249).
For backup procedures, see:
Backup a Sharded Cluster with Filesystem Snapshots (page 250)
Backup a Sharded Cluster with Database Dumps (page 252)
Procedures Use the procedure for the type of backup files to restore.
Restore a Sharded Cluster with Filesystem Snapshots
Step 1: Shut down the entire cluster. Stop all mongos and mongod processes, including all shards and all config
servers.
Step 2: Restore the data files. On each server, extract the data files to the location where the mongod instance
will access them. Restore the following:
Data files for each server in each shard. Because replica sets provide each production shard, restore all the members of the replica set or use the other standard approaches for restoring a replica set from backup. See the Restore a
Snapshot (page 243) and Restore a Database with mongorestore (page 248) sections for details on these procedures.
Data files for each config server.
Step 3: Restart the config servers. Restart each config server (page 650) mongod instance by issuing a command
similar to the following for each, using values appropriate to your configuration:
mongod --configsvr --dbpath /data/configdb --port 27019
Step 4: If shard hostnames have changed, update the config string and config database. If shard hostnames
have changed, start one mongos instance using the updated config string with the new configdb hostnames and
ports.
Then update the shards collection in the Config Database (page 716) to reflect the new hostnames. Then stop the
mongos instance.
Step 5: Restart all the shard mongod instances.
Step 6: Restart all the mongos instances. If shard hostnames have changed, make sure to use the updated config
string.
Step 7: Connect to a mongos to ensure the cluster is operational. Connect to a mongos instance from a mongo
shell and use the db.printShardingStatus() method to ensure that the cluster is operational, as follows:
db.printShardingStatus()
show collections
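For Step 4, you can work out the new host strings for the shards collection before you touch the config database. The following Node.js sketch is a hypothetical helper, not a MongoDB tool. It makes these assumptions:

- Shard documents have the form { _id: ..., host: "rsName/host:port,..." } or { _id: ..., host: "host:port" }.
- You supply a mapping from old host:port strings to new ones.

```javascript
// Hypothetical helper (Node.js): plan new `host` values for config.shards documents.
// Assumption: replica set shards use "rsName/host:port,host:port"; standalone shards use "host:port".

function rewriteShardHost(hostString, renameMap) {
  const slash = hostString.indexOf("/");
  const setPrefix = slash === -1 ? "" : hostString.slice(0, slash + 1); // keep "rs0/" if present
  const memberList = slash === -1 ? hostString : hostString.slice(slash + 1);
  const renamed = memberList.split(",").map((m) => renameMap[m] || m); // unknown members unchanged
  return setPrefix + renamed.join(",");
}

// Return only the shards whose host string actually changes.
function planShardUpdates(shardDocs, renameMap) {
  return shardDocs
    .map((doc) => ({ _id: doc._id, oldHost: doc.host, newHost: rewriteShardHost(doc.host, renameMap) }))
    .filter((u) => u.newHost !== u.oldHost);
}

// Illustrative input; hostnames are placeholders.
const shards = [
  { _id: "shard0000", host: "rs0/old1.example.net:27018,old2.example.net:27018" },
  { _id: "shard0001", host: "new3.example.net:27018" },
];
const renameMap = {
  "old1.example.net:27018": "new1.example.net:27018",
  "old2.example.net:27018": "new2.example.net:27018",
};
console.log(planShardUpdates(shards, renameMap));
```

You could then apply each planned change from the mongos started in Step 4 with an update along the lines of db.getSiblingDB("config").shards.update( { _id: <id> }, { $set: { host: <newHost> } } ). Check the exact host format against your own config.shards documents first.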
Restore a Sharded Cluster with Database Dumps
Step 1: Shut down the entire cluster. Stop all mongos and mongod processes, including all shards and all config
servers.
Step 3: Restart the config servers. Restart each config server (page 650) mongod instance by issuing a command
similar to the following for each, using values appropriate to your configuration:
mongod --configsvr --dbpath /data/configdb --port 27019
Step 4: If shard hostnames have changed, update the config string and config database. If shard hostnames
have changed, start one mongos instance using the updated config string with the new configdb hostnames and
ports.
Then update the shards collection in the Config Database (page 716) to reflect the new hostnames. Then stop the
mongos instance.
Step 5: Restart all the shard mongod instances.
Step 6: Restart all the mongos instances. If shard hostnames have changed, make sure to use the updated config
string.
Step 7: Connect to a mongos to ensure the cluster is operational. Connect to a mongos instance from a mongo
shell and use the db.printShardingStatus() method to ensure that the cluster is operational, as follows:
db.printShardingStatus()
show collections
Process
Indications When you are aware of a mongod instance running without journaling that stops unexpectedly and
you're not running with replication, you should always run the repair operation before starting MongoDB again. If
you're using replication, then restore from a backup and allow replication to perform an initial sync (page 573) to
restore data.
If the mongod.lock file in the data directory specified by dbPath (/data/db by default) is not a zero-byte file,
mongod will refuse to start, and you will find a message that contains the following line in your MongoDB log
or output:
Unclean shutdown detected.
This indicates that you need to run mongod with the --repair option. If you run repair when the mongod.lock
file exists in your dbPath, or the optional --repairpath, you will see a message that contains the following line:
old lock file: /data/db/mongod.lock. probably means unclean shutdown
If you see this message, as a last resort you may remove the lockfile and run the repair operation before starting the
database normally, as in the following procedure:
Overview
There are two processes to repair data files that result from an unexpected shutdown:
Use the --repair option in conjunction with the --repairpath option. mongod will read the existing
data files, and write the existing data to new data files.
You do not need to remove the mongod.lock file before using this procedure.
Use the --repair option. mongod will read the existing data files, write the existing data to new files and
replace the existing, possibly corrupt, files with new files.
You must remove the mongod.lock file before using this procedure.
Note: --repair functionality is also available in the shell with the db.repairDatabase() helper for the
repairDatabase command.
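You can script the lock-file check described under Indications and run it before you start mongod. This Node.js sketch runs outside the mongo shell. It assumes, as the text above states, that a zero-byte mongod.lock means a clean shutdown and a non-empty one means an unclean shutdown. The function name and return labels are illustrative.

```javascript
// Sketch (Node.js): classify mongod.lock in a dbPath before starting mongod.
// Assumption (from the text above): zero bytes = clean shutdown, non-empty = unclean shutdown.
const fs = require("fs");
const path = require("path");

function lockFileState(dbPath) {
  const lockPath = path.join(dbPath, "mongod.lock");
  if (!fs.existsSync(lockPath)) return "absent";
  return fs.statSync(lockPath).size === 0 ? "clean" : "unclean";
}

const state = lockFileState("/data/db"); // replace with your dbPath
if (state === "unclean") {
  console.log("Unclean shutdown: repair the data files before starting mongod (see Procedures).");
} else {
  console.log("Lock file is " + state + ".");
}
```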
Procedures
Important: Always run mongod as the same user to avoid changing the permissions of the MongoDB data files.
Repair Data Files and Preserve Original Files To repair your data files using the --repairpath option, which
preserves the original data files unmodified, use the following procedure:
Step 1: Start mongod with the --repair and --repairpath options. Start the mongod
instance using the --repair option and the --repairpath option. Issue a command similar to the following:
mongod --dbpath /data/db --repair --repairpath /data/db0
When this completes, the new repaired data files will be in the /data/db0 directory.
Step 2: Start mongod with the new data directory. Start mongod using the following invocation to point the
dbPath at /data/db0:
mongod --dbpath /data/db0
Once you confirm that the data files are operational, you may delete or archive the old data files in the /data/db
directory. You may also wish to move the repaired files to the old database location or update the dbPath to indicate
the new location.
Repair Data Files without Preserving Original Files To repair your data files without preserving the original files,
do not use the --repairpath option, as in the following procedure:
Warning: After you remove the mongod.lock file you must run the --repair process before using your
database.
Step 1: Remove the stale lock file. For example:
rm /data/db/mongod.lock
Replace /data/db with the dbPath where your MongoDB instance's data files reside.
Step 2: Start mongod using the option to replace the original files with the repaired files. Start the mongod
instance using the --repair option, which replaces the original data files with the repaired data files. Issue a
command similar to the following:
mongod --dbpath /data/db --repair
When this completes, the repaired data files will replace the original data files in the /data/db directory.
Step 3: Start mongod as usual.
Start mongod using the following invocation to point the dbPath at /data/db:
mongod --dbpath /data/db
mongod.lock
In normal operation, you should never remove the mongod.lock file and start mongod. Instead, use one
of the above methods to recover the database and remove the lock files. In dire situations you can remove the lockfile,
start the database using the possibly corrupt files, and attempt to recover data from the database; however, it's
impossible to predict the state of the database in these situations.
If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always
proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup
(page 182) or, if running as a replica set, restore by performing an initial sync using data from an intact member of the
set, as described in Resync a Member of a Replica Set (page 613).
Note: Most examples in the MongoDB Manual use the mongo shell; however, many drivers provide similar
interfaces to MongoDB.
Server-side JavaScript (page 260) Details MongoDB's support for executing JavaScript code for server-side operations.
Data Types in the mongo Shell (page 261) Describes the super-set of JSON available for use in the mongo shell.
Write Scripts for the mongo Shell (page 264) An introduction to the mongo shell for writing scripts to manipulate
data and administer MongoDB.
Getting Started with the mongo Shell (page 266) Introduces the use and operation of the MongoDB shell.
Access the mongo Shell Help Information (page 270) Describes the available methods for accessing online help for
the operation of the mongo interactive shell.
mongo Shell Quick Reference (page 272) A high level reference to the use and operation of the mongo shell.
Server-side JavaScript
Overview
MongoDB provides the following commands, methods, and operator that perform server-side execution of JavaScript
code:
mapReduce and the corresponding mongo shell method db.collection.mapReduce(). mapReduce
operations map, or associate, values to keys, and for keys with multiple values, reduce the values for each key
to a single object. For more information, see Map-Reduce (page 424).
$where operator that evaluates a JavaScript expression or a function in order to query for documents.
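The following plain-JavaScript sketch makes the map/reduce and $where semantics concrete. It runs in Node.js over an in-memory array. It is not how the server executes these operations. On the server, emit is a global available to the map function; this sketch passes a local emit in as an argument instead. The helper names are hypothetical.

```javascript
// Sketch (Node.js) of map-reduce and $where semantics over an in-memory array.
// Not the server's implementation; `emit` is passed in here instead of being global.

function mapReduceInMemory(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // map: called with `this` bound to each document; emits key/value pairs
  for (const doc of docs) mapFn.call(doc, emit);
  // reduce: only keys with multiple values are reduced to a single value
  const out = {};
  for (const [key, values] of groups) {
    out[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  }
  return out;
}

// $where-style filter: the predicate sees each document as `this`.
function whereInMemory(docs, predicate) {
  return docs.filter((doc) => predicate.call(doc));
}

const orders = [
  { cust: "A", amount: 10 },
  { cust: "B", amount: 5 },
  { cust: "A", amount: 7 },
];

const totals = mapReduceInMemory(
  orders,
  function (emit) { emit(this.cust, this.amount); },
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(totals); // { A: 17, B: 5 }

console.log(whereInMemory(orders, function () { return this.amount > 6; }).length); // 2
```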
You can also specify a JavaScript file to the mongo shell to run on the server. For more information, see Running .js
files via a mongo shell Instance on the Server (page 260).
JavaScript in MongoDB
Although these methods use JavaScript, most interactions with MongoDB do not use JavaScript but use an
idiomatic driver in the language of the interacting application.
You can also disable server-side execution of JavaScript. For details, see Disable Server-Side Execution of JavaScript
(page 261).
Running .js files via a mongo shell Instance on the Server
You can specify a JavaScript (.js) file to a mongo shell instance to execute the file on the server. This is a good
technique for performing batch administrative work. When you run the mongo shell on the server, connecting via the
localhost interface, the connection is fast with low latency.
The command helpers (page 272) provided in the mongo shell are not available in JavaScript files because they are
not valid JavaScript. The following table maps the most common mongo shell helpers to their JavaScript equivalents.
Shell Helpers               JavaScript Equivalents
show dbs, show databases    db.adminCommand('listDatabases')
use <db>                    db = db.getSiblingDB('<db>')
show collections            db.getCollectionNames()
show users                  db.getUsers()
show roles                  db.getRoles({showBuiltinRoles: true})
show log <logname>          db.adminCommand({ 'getLog' : '<logname>' })
show logs                   db.adminCommand({ 'getLog' : '*' })
it                          cursor = db.collection.find()
                            if ( cursor.hasNext() ){
                               cursor.next();
                            }
Concurrency
Disable Server-Side Execution of JavaScript
You can disable all server-side execution of JavaScript by passing the --noscripting option on the command
line or by setting security.javascriptEnabled to false in a configuration file.
See also:
Store a JavaScript Function on the Server (page 233)
Data Types in the mongo Shell
BSON supports more data types than JSON. Drivers provide native support for these
data types in host languages, and the mongo shell also provides several helper classes to support the use of these data
types in the mongo JavaScript shell. See the Extended JSON reference for additional information.
Types
Date The mongo shell provides various methods to return the date, either as a string or as a Date object:
Date() method which returns the current date as a string.
new Date() constructor which returns a Date object using the ISODate() wrapper.
ISODate() constructor which returns a Date object using the ISODate() wrapper.
Internally, Date objects are stored as a 64-bit integer representing the number of milliseconds since the Unix epoch
(Jan 1, 1970), which results in a representable date range of about 290 million years into the past and future.
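The figure of about 290 million years follows from the 64-bit millisecond count. The following worked arithmetic runs in Node.js. It assumes a Julian year of 365.25 days for the estimate.

```javascript
// Worked arithmetic (Node.js): a signed 64-bit millisecond count spans about +/-2^63 ms.
const maxMillis = 2 ** 63;                      // ~9.22e18 ms (held as a double, so approximate)
const msPerYear = 1000 * 60 * 60 * 24 * 365.25; // Julian year in milliseconds (assumption)
const years = maxMillis / msPerYear;
console.log(Math.round(years / 1e6) + " million years"); // 292 million years

// The Unix epoch is millisecond 0:
console.log(new Date(0).toISOString()); // 1970-01-01T00:00:00.000Z
```

A plain JavaScript Date covers a much narrower range: +/-8.64e15 ms, or about 274,000 years either side of the epoch. So an ordinary JavaScript Date cannot hold dates near the BSON limits.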
Return Date as a String To return the date as a string, use the Date() method, as in the following example:
var myDateString = Date();
To print the value of the variable, type the variable name in the shell, as in the following:
myDateString
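Return Date The mongo shell wraps objects of Date type with the ISODate() helper; however, the objects remain of
type Date. The following example uses both the new Date() constructor and the ISODate() constructor to return
Date objects:
var myDate = new Date();
var myDateInitUsingISODateWrapper = ISODate();
You can use the new operator with the ISODate() constructor as well.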
To print the value of the variable, type the variable name in the shell, as in the following:
myDate
The result is the Date value of myDate wrapped in the ISODate() helper:
ISODate("2012-12-19T06:01:17.171Z")
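The string-versus-object distinction comes from standard JavaScript, so you can check it outside the shell. This Node.js sketch does not use ISODate(), which exists only in the mongo shell.

```javascript
// Standard JavaScript behavior (Node.js): Date() without `new` returns a string,
// while `new Date()` returns a Date object. ISODate() is shell-only and not used here.
const asString = Date();
const asObject = new Date();

console.log(typeof asString);          // string
console.log(asObject instanceof Date); // true
// The ISO-8601 form that the shell displays inside ISODate(...):
console.log(asObject.toISOString());   // current time, e.g. in the form YYYY-MM-DDTHH:MM:SS.sssZ
```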
ObjectId The mongo shell provides the ObjectId() wrapper class around the ObjectId data type. To generate a
new ObjectId, use the following operation in the mongo shell:
new ObjectId
See ObjectId (page 174) for full documentation of ObjectIds in MongoDB.
NumberLong By default, the mongo shell treats all numbers as floating-point values. The mongo shell provides
the NumberLong() wrapper to handle 64-bit integers.
The NumberLong() wrapper accepts the long as a string:
NumberLong("2090845886852")
The following examples use the NumberLong() wrapper to write to the collection:
db.collection.insert( { _id: 10, calc: NumberLong("2090845886852") } )
db.collection.update( { _id: 10 },
{ $set: { calc: NumberLong("2555555000000") } } )
db.collection.update( { _id: 10 },
{ $inc: { calc: NumberLong(5) } } )
If you use $inc to increment the value of a field that contains a NumberLong object by a float, the data type
changes to a floating-point value, as in the following example:
1. Use $inc to increment the calc field by 5, which the mongo shell treats as a float:
db.collection.update( { _id: 10 },
{ $inc: { calc: 5 } } )
In the updated document, the calc field contains a floating point value:
{ "_id" : 10, "calc" : 2555555000010 }
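The type change matters once values grow large. A double can represent every integer exactly only up to 2^53, which is also why NumberLong() accepts its value as a string. The following Node.js sketch uses BigInt as a stand-in for a 64-bit integer. It is not the shell's NumberLong.

```javascript
// Sketch (Node.js): doubles lose integer precision above 2^53.
// BigInt stands in for a 64-bit integer here; it is not NumberLong.
console.log(Number.MAX_SAFE_INTEGER);             // 9007199254740991 (2^53 - 1)
console.log(Number.isSafeInteger(2555555000010)); // true: the value above is still exact

const big = 9007199254740993;                     // 2^53 + 1 cannot be stored as a double
console.log(big === 9007199254740992);            // true: the literal was silently rounded

const exact = BigInt("9007199254740993");         // parsing from a string keeps every digit
console.log((exact + 5n).toString());             // 9007199254740998
```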
NumberInt By default, the mongo shell treats all numbers as floating-point values. The mongo shell provides the
NumberInt() constructor to explicitly specify 32-bit integers.
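NumberInt() targets the 32-bit signed range, -2^31 through 2^31 - 1. The following Node.js sketch checks that range with a hypothetical fitsInt32() helper. It also uses JavaScript's | 0 coercion to show 32-bit wraparound. It does not claim that NumberInt() handles out-of-range input the same way.

```javascript
// Sketch (Node.js): the 32-bit signed integer range targeted by NumberInt().
const INT32_MIN = -(2 ** 31);  // -2147483648
const INT32_MAX = 2 ** 31 - 1; //  2147483647

// Hypothetical helper: true if n is an integer that fits in 32 signed bits.
function fitsInt32(n) {
  return Number.isInteger(n) && n >= INT32_MIN && n <= INT32_MAX;
}

console.log(fitsInt32(5));             // true
console.log(fitsInt32(2090845886852)); // false: a value this large needs NumberLong
console.log((INT32_MAX + 1) | 0);      // -2147483648: plain JavaScript 32-bit coercion wraps
```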
Check Types in the mongo Shell
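To determine the type of fields, the mongo shell provides the instanceof and typeof operators.
instanceof returns a boolean to test if a value is an instance of some type. For example, the following operation
tests whether the _id field of a document mydoc is an instance of type ObjectId:
mydoc._id instanceof ObjectId
typeof returns the type of a field. For example, the following operation returns the type of the _id field:
typeof mydoc._id
In this case typeof will return the more generic object type rather than ObjectId type.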
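You can reproduce the difference between the two operators in Node.js. ObjectId below is a hypothetical stand-in class, because the real ObjectId type is provided by the mongo shell and the drivers.

```javascript
// Sketch (Node.js): instanceof vs typeof.
// ObjectId is a hypothetical stand-in class, not the real BSON type.
class ObjectId {
  constructor(hex) { this.str = hex; }
}

const mydoc = { _id: new ObjectId("507f1f77bcf86cd799439011"), qty: 3 };

console.log(mydoc._id instanceof ObjectId); // true: tests the specific type
console.log(typeof mydoc._id);              // object: typeof reports only the generic kind
console.log(typeof mydoc.qty);              // number
```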
Write Scripts for the mongo Shell
You can write scripts for the mongo shell in JavaScript that manipulate data in MongoDB or perform administrative
operation. For more information about the mongo shell see MongoDB Scripting (page 259), and see the Running .js
files via a mongo shell Instance on the Server (page 260) section for more information about using these mongo script.
This tutorial provides an introduction to writing JavaScript that uses the mongo shell to access MongoDB.
Opening New Connections
From the mongo shell or from a JavaScript file, you can instantiate database connections using the Mongo() constructor:
new Mongo()
new Mongo(<host>)
new Mongo(<host:port>)
Consider the following example that instantiates a new connection to the MongoDB instance running on localhost on
the default port and sets the global db variable to myDatabase using the getDB() method:
conn = new Mongo();
db = conn.getDB("myDatabase");
Additionally, you can use the connect() method to connect to the MongoDB instance. The following example
connects to the MongoDB instance that is running on localhost with the non-default port 27020 and sets the
global db variable:
db = connect("localhost:27020/myDatabase");
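The address string passed to connect() combines host, port, and database. The hypothetical helper below runs in Node.js and is not part of the mongo shell. It splits such a string under these assumptions:

- Omitted parts default to localhost, port 27017, and the test database.
- IPv6 literals and full mongodb:// URIs are not handled.

```javascript
// Hypothetical helper (Node.js): split a "<host>:<port>/<database>" address.
// Assumed defaults: host "localhost", port 27017, database "test".
function parseConnectString(address) {
  const slash = address.indexOf("/");
  const hostPort = slash === -1 ? address : address.slice(0, slash);
  const db = slash === -1 ? "test" : (address.slice(slash + 1) || "test");
  const colon = hostPort.lastIndexOf(":");
  const host = colon === -1 ? hostPort : hostPort.slice(0, colon);
  const port = colon === -1 ? 27017 : Number(hostPort.slice(colon + 1));
  return { host: host || "localhost", port, db };
}

console.log(parseConnectString("localhost:27020/myDatabase"));
// { host: 'localhost', port: 27020, db: 'myDatabase' }
```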
When writing scripts for the mongo shell, consider the following:
To set the db global variable, use the getDB() method or the connect() method. You can assign the
database reference to a variable other than db.
Write operations in the mongo shell use safe writes by default. If performing bulk operations, use the
Bulk() methods. See Write Method Acknowledgements (page 821) for more information.
Changed in version 2.6: Before MongoDB 2.6, call db.getLastError() explicitly to wait for the result of
write operations (page 71).
You cannot use any shell helper (e.g. use <dbname>, show dbs, etc.) inside the JavaScript file because
they are not valid JavaScript.
The following table maps the most common mongo shell helpers to their JavaScript equivalents.
Shell Helpers               JavaScript Equivalents
show dbs, show databases    db.adminCommand('listDatabases')
use <db>                    db = db.getSiblingDB('<db>')
show collections            db.getCollectionNames()
show users                  db.getUsers()
show roles                  db.getRoles({showBuiltinRoles: true})
show log <logname>          db.adminCommand({ 'getLog' : '<logname>' })
show logs                   db.adminCommand({ 'getLog' : '*' })
it                          cursor = db.collection.find()
                            if ( cursor.hasNext() ){
                               cursor.next();
                            }
In interactive mode, mongo prints the results of operations, including the content of all cursors. In scripts, use
either the JavaScript print() function or the mongo-specific printjson() function, which prints formatted
JSON.
Example
To print all items in a result cursor in mongo shell scripts, use the following idiom:
cursor = db.collection.find();
while ( cursor.hasNext() ) {
printjson( cursor.next() );
}
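You can exercise the same hasNext()/next() idiom without a server. This Node.js sketch uses a hypothetical ArrayCursor class over an in-memory array. It approximates printjson() with JSON.stringify(), and the shell's own output format differs slightly.

```javascript
// Sketch (Node.js): the hasNext()/next() iteration idiom with a hypothetical cursor.
class ArrayCursor {
  constructor(docs) { this.docs = docs; this.pos = 0; }
  hasNext() { return this.pos < this.docs.length; }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.docs[this.pos++];
  }
}

// Approximation of the shell's printjson(); formatting differs from the real shell.
const printjson = (obj) => console.log(JSON.stringify(obj, null, 2));

const cursor = new ArrayCursor([{ _id: 1, name: "Joe" }, { _id: 2, name: "Ann" }]);
while (cursor.hasNext()) {
  printjson(cursor.next());
}
```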
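Scripting
From the system prompt, use mongo to evaluate JavaScript.
--eval option Use the --eval option to mongo to pass the shell a JavaScript fragment, as in the following:
mongo test --eval "printjson(db.getCollectionNames())"
This returns the output of db.getCollectionNames() using the mongo shell connected to the mongod or
mongos instance running on port 27017 on the localhost interface.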
Execute a JavaScript file You can specify a .js file to the mongo shell, and mongo will execute the JavaScript
directly. Consider the following example:
mongo localhost:27017/test myjsfile.js
This operation executes the myjsfile.js script in a mongo shell that connects to the test database on the
mongod instance accessible via the localhost interface on port 27017.
Alternately, you can specify the MongoDB connection parameters inside the JavaScript file using the Mongo()
constructor. See Opening New Connections (page 264) for more information.
You can execute a .js file from within the mongo shell, using the load() function, as in the following:
load("myjstest.js")
Note: There is no search path for the load() function. If the desired script is not in the current working directory
or at the fully specified path, mongo will not be able to access the file.
To start the mongo shell and connect to your MongoDB instance running on localhost with the default port:
1. Go to your <mongodb installation dir>:
cd <mongodb installation dir>
2. Type ./bin/mongo to start mongo:
./bin/mongo
If you have added the <mongodb installation dir>/bin to the PATH environment variable, you can
just type mongo instead of ./bin/mongo.
3. To display the database you are using, type db:
db
The operation should return test, which is the default database. To switch databases, issue the use <db>
helper, as in the following example:
use <database>
To list the available databases, use the helper show dbs. See also How can I access different databases
temporarily? (page 737) to access a different database from the current database without switching your current
database context (i.e. db).
To start the mongo shell with other options, see examples of starting up mongo and mongo reference which
provides details on the available options.
Note: When starting, mongo checks the user's HOME directory for a JavaScript file named .mongorc.js. If found,
mongo interprets the content of .mongorc.js before displaying the prompt for the first time. If you use the shell to
evaluate a JavaScript file or expression, either by using the --eval option on the command line or by specifying a .js
file to mongo, mongo will read the .mongorc.js file after the JavaScript has finished processing. You can prevent
.mongorc.js from being loaded by using the --norc option.
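The .mongorc.js rules in the note above amount to a small decision table. This sketch encodes them as a hypothetical startupOrder() function in Node.js. The step labels are illustrative and do not correspond to shell output.

```javascript
// Sketch (Node.js): the documented .mongorc.js loading order as a decision function.
// Step labels are illustrative only.
function startupOrder({ evalOrFile = false, norc = false } = {}) {
  const steps = [];
  if (evalOrFile) {
    steps.push("run --eval expression or .js file");
    if (!norc) steps.push("read ~/.mongorc.js"); // read after the JavaScript finishes
  } else {
    if (!norc) steps.push("read ~/.mongorc.js"); // read before the first prompt
    steps.push("display prompt");
  }
  return steps;
}

console.log(startupOrder());                                 // rc file, then prompt
console.log(startupOrder({ evalOrFile: true }));             // script, then rc file
console.log(startupOrder({ evalOrFile: true, norc: true })); // script only
```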
Executing Queries
From the mongo shell, you can use the shell methods to run queries, as in the following example:
db.<collection>.find()
The find() method is the JavaScript method to retrieve documents from <collection>. The find()
method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a
variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first
20 documents that match the query. The mongo shell will prompt Type it to iterate another 20 times.
You can set the DBQuery.shellBatchSize attribute to change the number of documents per iteration from the default
value of 20, as in the following example, which sets it to 10:
DBQuery.shellBatchSize = 10;
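You can model the batching described above as splitting the result set into pages of shellBatchSize documents, where each it command prints the next page. This is a Node.js sketch with a hypothetical pageResults() helper. It is not the shell's cursor implementation.

```javascript
// Sketch (Node.js): page a result set the way the interactive shell prints it.
// pageResults() is hypothetical; each "it" would print the next page.
function pageResults(docs, shellBatchSize = 20) {
  const pages = [];
  for (let i = 0; i < docs.length; i += shellBatchSize) {
    pages.push(docs.slice(i, i + shellBatchSize));
  }
  return pages;
}

const docs = Array.from({ length: 45 }, (_, i) => ({ _id: i }));
console.log(pageResults(docs).map((p) => p.length));     // [ 20, 20, 5 ]
console.log(pageResults(docs, 10).map((p) => p.length)); // [ 10, 10, 10, 10, 5 ]
```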
For more information and examples on cursor handling in the mongo shell, see Cursors (page 62).
See also Cursor Help (page 271) for list of cursor help in the mongo shell.
For more documentation of basic MongoDB operations in the mongo shell, see:
Getting Started with MongoDB (page 48)
mongo Shell Quick Reference (page 272)
Read Operations (page 58)
Write Operations (page 71)
Indexing Tutorials (page 502)
Print
The mongo shell automatically prints the results of the find() method if the returned cursor is not assigned to
a variable using the var keyword. To format the result, you can add the .pretty() to the operation, as in the
following:
db.<collection>.find().pretty()
In addition, you can use the following explicit print methods in the mongo shell:
print() to print without formatting
print(tojson(<obj>)) to print with JSON formatting and equivalent to printjson()
printjson() to print with JSON formatting and equivalent to print(tojson(<obj>))
Evaluate a JavaScript File
You can execute a .js file from within the mongo shell, using the load() function, as in the following:
load("myjstest.js")
Note: There is no search path for the load() function. If the desired script is not in the current working directory
or at the fully specified path, mongo will not be able to access the file.
You may modify the content of the prompt by creating the variable prompt in the shell. The prompt variable can
hold strings as well as any arbitrary JavaScript. If prompt holds a function that returns a string, mongo can display
dynamic information in each prompt. Consider the following examples:
Example
To create a prompt with the number of operations issued in the current session, define the following variables:
cmdCount = 1;
prompt = function() {
return (cmdCount++) + "> ";
}
Example
To create a mongo shell prompt in the form of <database>@<hostname>$ define the following variables:
host = db.serverStatus().host;
prompt = function() {
return db+"@"+host+"$ ";
}
Example
To create a mongo shell prompt that contains the system up time and the number of documents in the current database,
define the following prompt variable:
prompt = function() {
return "Uptime:"+db.serverStatus().uptime+" Documents:"+db.stats().objects+" > ";
}
Note: As the mongo shell interprets code edited in an external editor, it may modify code in functions, depending on
the JavaScript compiler. For example, mongo may convert 1+1 to 2 or remove comments. The actual changes affect only the
appearance of the code and will vary based on the version of JavaScript used but will not affect the semantics of the
code.
5.2. Administration Tutorials
To see the list of options and help for starting the mongo shell, use the --help option from the command line:
mongo --help
Shell Help
Database Help
To see the list of databases on the server, use the show dbs command:
show dbs
New in version 2.4: show databases is now an alias for show dbs
To see the list of help for methods you can use on the db object, call the db.help() method:
db.help()
To see the implementation of a method in the shell, type the db.<method name> without the parentheses
(()), as in the following example, which will return the implementation of the method db.updateUser():
db.updateUser
Collection Help
To see the list of collections in the current database, use the show collections command:
show collections
To see the help for methods available on the collection objects (e.g. db.<collection>), use the
db.<collection>.help() method:
db.collection.help()
<collection> can be the name of a collection that exists, although you may specify a collection that doesn't
exist.
To see the collection method implementation, type the db.<collection>.<method> name without the
parentheses (()), as in the following example, which will return the implementation of the save() method:
db.collection.save
Cursor Help
When you perform read operations (page 59) with the find() method in the mongo shell, you can use various
cursor methods to modify the find() behavior and various JavaScript methods to handle the cursor returned from
the find() method.
To list the available modifier and cursor handling methods, use the db.collection.find().help()
command:
db.collection.find().help()
<collection> can be the name of a collection that exists, although you may specify a collection that doesn't
exist.
To see the implementation of the cursor method, type the db.<collection>.find().<method> name
without the parentheses (()), as in the following example, which will return the implementation of the
toArray() method:
db.collection.find().toArray
To get a list of the wrapper classes available in the mongo shell, such as BinData(), type help misc in the
mongo shell:
help misc
You can retrieve previous commands issued in the mongo shell with the up and down arrow keys. Command history
is stored in the ~/.dbshell file. See .dbshell for more information.
Command Line Options
The mongo executable can be started with numerous options. See mongo executable page for details on all
available options.
The following table displays some common options for mongo:
Option      Description
--help
    Show command line options.
--nodb
    Start mongo shell without connecting to a database.
    To connect later, see Opening New Connections (page 264).
--shell
    Used in conjunction with a JavaScript file (i.e. <file.js>) to continue in the mongo shell after running
    the JavaScript file.
    See JavaScript file (page 266) for an example.
Command Helpers
The mongo shell provides various help. The following table displays some common help methods and commands:
Help Methods and Commands    Description
help
    Show help.
db.help()
    Show help for database methods.
db.<collection>.help()
    Show help on collection methods. The <collection> can be the name of an existing
    collection or a non-existing collection.
show dbs
    Print a list of all databases on the server.
use <db>
    Switch current database to <db>. The mongo shell variable db is set to the current
    database.
show collections
    Print a list of all collections for current database.
show users
    Print a list of users for current database.
show roles
    Print a list of all roles, both user-defined and built-in, for the current database.
show profile
    Print the five most recent operations that took 1 millisecond or more. See documentation
    on the database profiler (page 225) for more information.
show databases
    New in version 2.4: Print a list of all available databases.
load()
    Execute a JavaScript file. See Getting Started with the mongo Shell (page 266) for more
    information.
Basic Shell JavaScript Operations
JavaScript Operations    Description
db.auth()
    If running in secure mode, authenticate the user.
coll = db.<collection>
    Set a specific collection in the current database to a variable coll, as in the following example:
    coll = db.myCollection;
    You can perform operations on the myCollection collection using the variable, as in the following example:
    coll.find();
find()
insert()
update()
save()
remove()
drop()
createIndex()
db.getSiblingDB()
Function
previous-history
next-history
beginning-of-line
Queries
In the mongo shell, perform read operations using the find() and findOne() methods.
The find() method returns a cursor object which the mongo shell iterates to print documents on screen. By default,
mongo prints the first 20. The mongo shell will prompt the user to Type it to continue iterating the next 20
results.
The following table provides some common read operations in the mongo shell:
Read Operations    Description
db.collection.find( <query> )
    Find the documents matching the <query> criteria in the collection. If the <query> criteria is not specified
    or is empty (i.e. {}), the read operation selects all documents in the collection.
    The following example selects the documents in the users collection with the name field equal to "Joe":
    coll = db.users;
    coll.find( { name: "Joe" } );
    For more information on specifying the <query> criteria, see Query Documents (page 95).
db.collection.find( <query>, <projection> )
    Find documents matching the <query> criteria and return just specific fields in the <projection>.
    The following example selects all documents from the collection but returns only the name field and the _id
    field. The _id field is always returned unless you explicitly exclude it.
    coll = db.users;
    coll.find( { }, { name: true } );
    For more information on specifying the <projection>, see Limit Fields to Return from a Query (page 106).
db.collection.find().sort( <sort order> )
    Return results in the specified <sort order>.
    The following example selects all documents from the collection and returns the results sorted by the name
    field in ascending order (1). Use -1 for descending order:
    coll = db.users;
    coll.find().sort( { name: 1 } );
db.collection.find( <query> ).sort( <sort order> )
    Return the documents matching the <query> criteria in the specified <sort order>.
db.collection.find( <query> ).limit( <n> )
    Limit the result to <n> documents. Highly recommended for best performance if you need only a certain
    number of documents.
db.collection.find( <query> ).skip( <n> )
    Skip <n> results.
db.collection.count()
    Returns the total number of documents in the collection.
db.collection.find( <query> ).count()
    Returns the total number of documents that match the query.
    The count() method ignores limit() and skip(). For example, if 100 documents match but the limit is 10,
    count() will return 100. This will be faster than iterating yourself, but still takes time.
db.collection.findOne( <query> )
    Find and return a single document. Returns null if not found.
    The following example selects a single document in the users collection whose name field equals "Joe":
    coll = db.users;
    coll.findOne( { name: "Joe" } );
    Internally, the findOne() method is the find() method with a limit(1).
See Query Documents (page 95) and Read Operations (page 58) documentation for more information and examples.
See http://docs.mongodb.org/manual/reference/operator/query for other query operators.
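You can model the semantics summarized in the table above over an in-memory array: projection keeping _id, sort direction, skip and limit, count() ignoring both, and findOne() returning null. This Node.js sketch is not the server's query engine. It supports only equality matches and single-field sorts, and the helper names are hypothetical.

```javascript
// Sketch (Node.js) of read-operation semantics over an in-memory array.
// Not the server's query engine: equality matches and single-field sorts only.
const users = [
  { _id: 1, name: "Joe", age: 30 },
  { _id: 2, name: "Ann", age: 25 },
  { _id: 3, name: "Joe", age: 41 },
];

// Equality-only match: every query field must equal the document's value.
const matches = (doc, query) => Object.keys(query).every((k) => doc[k] === query[k]);

// Inclusion projection; _id is kept unless explicitly set to false.
function project(doc, projection) {
  if (!projection || Object.keys(projection).length === 0) return { ...doc };
  const out = {};
  if (projection._id !== false) out._id = doc._id;
  for (const [k, v] of Object.entries(projection)) {
    if (k !== "_id" && v) out[k] = doc[k];
  }
  return out;
}

function find(coll, query = {}, projection, { sort, skip = 0, limit = 0 } = {}) {
  let result = coll.filter((d) => matches(d, query));
  if (sort) {
    const [field, dir] = Object.entries(sort)[0]; // 1 ascending, -1 descending
    result = [...result].sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
  }
  result = result.slice(skip, limit > 0 ? skip + limit : undefined);
  return result.map((d) => project(d, projection));
}

// count() ignores skip and limit: it counts every matching document.
const count = (coll, query = {}) => coll.filter((d) => matches(d, query)).length;

// findOne() is find() with a limit of 1, returning null when nothing matches.
const findOne = (coll, query = {}) => find(coll, query, undefined, { limit: 1 })[0] ?? null;

console.log(find(users, { name: "Joe" }).length);                                 // 2
console.log(find(users, {}, { name: true })[0]);                                  // { _id: 1, name: 'Joe' }
console.log(find(users, {}, undefined, { sort: { age: -1 }, limit: 1 })[0].age); // 41
console.log(count(users, { name: "Joe" }));                                       // 2
console.log(findOne(users, { name: "Zed" }));                                     // null
```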
Error Checking Methods
The following table lists some common methods to support database administration:
JavaScript Database Administration Methods    Description
db.cloneDatabase(<host>)
    Clone the current database from the <host> specified. The <host> database
    instance must be in noauth mode.
db.copyDatabase(<from>, <to>, <host>)
    Copy the <from> database from the <host> to the <to> database on the
    current server.
    The <host> database instance must be in noauth mode.
db.fromColl.renameCollection(<toColl>)
    Rename collection from fromColl to <toColl>.
db.repairDatabase()
    Repair and compact the current database. This operation can be very slow on
    large databases.
db.getCollectionNames()
    Get the list of all collections in the current database.
db.dropDatabase()
    Drops the current database.
See also administrative database methods for a full list of methods.
Opening Additional Connections
JavaScript Connection Methods    Description
db = connect("<host>:<port>/<dbname>")
    Open a new database connection.
conn = new Mongo()
db = conn.getDB("dbname")
    Open a connection to a new server using new Mongo().
    Use the getDB() method of the connection to select a database.
See also Opening New Connections (page 264) for more information on opening new connections from the mongo
shell.
Miscellaneous
Object.bsonsize(<document>)
    Prints the BSON size of a <document> in bytes.
See the MongoDB JavaScript API Documentation for a full list of JavaScript methods.
Additional Resources
Consider the following reference material that addresses the mongo shell and its interface:
mongo
js-administrative-methods
database-commands
Aggregation Reference (page 451)
Additionally, the MongoDB source code repository includes a jstests directory, which contains numerous mongo
shell scripts.
Administration
Replica Sets
Development Patterns
Perform Two Phase Commits (page 114)
Create an Auto-Incrementing Sequence Field (page 124)
Enforce Unique Keys for Sharded Collections (page 711)
Aggregation Examples (page 434)
Model Data to Support Keyword Search (page 162)
Limit Number of Elements in an Array after an Update (page 107)
Perform Incremental Map-Reduce (page 445)
Troubleshoot the Map Function (page 447)
Troubleshoot the Reduce Function (page 448)
Store a JavaScript Function on the Server (page 233)
Text Search Patterns
Create a text Index (page 524)
Specify a Language for Text Index (page 525)
Specify Name for text Index (page 527)
Control Search Results with Weights (page 528)
Limit the Number of Entries Scanned (page 529)
Data Modeling Patterns
Model One-to-One Relationships with Embedded Documents (page 150)
Model One-to-Many Relationships with Embedded Documents (page 151)
Model One-to-Many Relationships with Document References (page 152)
Model Data for Atomic Operations (page 162)
Model Tree Structures with Parent References (page 155)
Model Tree Structures with Child References (page 156)
Model Tree Structures with Materialized Paths (page 159)
Model Tree Structures with Nested Sets (page 160)
See also:
The MongoDB Manual contains administrative documentation and tutorials throughout several sections. See Replica
Set Tutorials (page 581) and Sharded Cluster Tutorials (page 669) for additional tutorials and information.
Transparent Huge Pages (THP) Settings (page 285) Describes Transparent Huge Pages (THP) and provides detailed
instructions on disabling them.
System Collections (page 287) Introduces the internal collections that MongoDB uses to track per-database metadata,
including indexes, collections, and authentication credentials.
Database Profiler Output (page 288) Describes the data collected by MongoDB's operation profiler, which introspects operations and reports data for analysis on performance and behavior.
Server Status Output (page 292) Provides an example and a high level overview of the output of the
serverStatus command.
Journaling Mechanics (page 300) Describes the internal operation of MongoDB's journaling facility and outlines
how the journal allows MongoDB to provide durability and crash resiliency.
Exit Codes and Statuses (page 302) Lists the unique codes returned by mongos and mongod processes upon exit.
Resource Utilization
mongod and mongos each use threads and file descriptors to track connections and manage internal operations. This
section outlines the general resource utilization patterns for MongoDB. Use these figures in combination with the
actual information about your deployment and its use to determine ideal ulimit settings.
Generally, all mongod and mongos instances:
track each incoming connection with a file descriptor and a thread.
track each internal thread or pthread as a system process.
mongod
1 file descriptor for each data file in use by the mongod instance.
1 file descriptor for each journal file used by the mongod instance when storage.journal.enabled is
true.
In replica sets, each mongod maintains a connection to all other members of the set.
mongod uses background threads for a number of internal processes, including TTL collections (page 211), replication, and replica set health checks, which may require a small number of additional resources.
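The per-resource figures above can be turned into a rough file-descriptor estimate when choosing a value for ulimit -n. The following is a minimal sketch under stated assumptions. The function name estimateMongodFds and its headroom factor are hypothetical. It counts one descriptor per incoming connection, per data file, and per journal file (only when journaling is enabled). It also assumes one socket per connection to another replica set member. It ignores everything else a real process holds open, such as log files.

```javascript
// Hypothetical helper: rough count of file descriptors a mongod needs,
// using only the per-resource figures described in this section.
function estimateMongodFds({
  connections,          // expected concurrent incoming connections
  dataFiles,            // data files in use by this mongod
  journalFiles = 0,     // journal files in use
  journaling = true,    // whether storage.journal.enabled is true
  replicaSetPeers = 0,  // other replica set members (assumed 1 socket each)
  headroom = 1.5,       // arbitrary safety factor, not a MongoDB setting
}) {
  const base =
    connections +                      // 1 descriptor per incoming connection
    dataFiles +                        // 1 descriptor per data file
    (journaling ? journalFiles : 0) +  // 1 per journal file when journaling
    replicaSetPeers;                   // 1 socket per connection to a peer
  return Math.ceil(base * headroom);
}

const needed = estimateMongodFds({
  connections: 1000,
  dataFiles: 40,
  journalFiles: 3,
  replicaSetPeers: 2,
});
console.log(needed); // 1568
```

Treat the result as a lower bound to compare against the current ulimit -n value, not as a replacement for the recommended settings.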
mongos
In addition to the threads and file descriptors for client connections, mongos must maintain connections to all config
servers and all shards, which includes all members of all replica sets.
For mongos, consider the following behaviors:
mongos instances maintain a connection pool to each shard so that the mongos can reuse connections and
quickly fulfill requests without needing to create new connections.
You can limit the number of incoming connections using the maxIncomingConnections run-time option.
By restricting the number of incoming connections you can prevent a cascade effect where the mongos creates
too many connections on the mongod instances.
Note: Changed in version 2.6: MongoDB removed the upward limit on the maxIncomingConnections
setting.
You can use the ulimit command at the system prompt to check system limits, as in the following example:
$ ulimit -a
-t: cpu time (seconds)         unlimited
-f: file size (blocks)         unlimited
-d: data seg size (kbytes)     unlimited
-s: stack size (kbytes)        8192
-c: core file size (blocks)    0
-m: resident set size (kbytes) unlimited
-u: processes                  192276
-n: file descriptors           21000
-l: locked-in-memory size (kb) 40000
-v: address space (kb)         unlimited
-x: file locks                 unlimited
-i: pending signals            192276
-q: bytes in POSIX msg queues  819200
-e: max nice                   30
-r: max rt priority            65
-N 15:                         unlimited
ulimit refers to the per-user limitations for various resources. Therefore, if your mongod instance executes as
a user that is also running multiple processes, or multiple mongod processes, you might see contention for these
resources. Also, be aware that the processes value (i.e. -u) refers to the combined number of distinct processes
and sub-process threads.
You can change ulimit settings by issuing a command in the following form:
ulimit -n <value>
There are both hard and soft ulimits that affect MongoDB's performance. The hard ulimit is the maximum value to
which a user can set a given resource limit. This is the ceiling: no non-root process can increase the hard ulimit. In
contrast, the soft ulimit is the limit that is actually enforced for a session or process, but any process can increase it
up to the hard ulimit maximum.
A low soft ulimit can cause can't create new thread, closing connection errors if the number
of connections grows too high. For this reason, it is extremely important to set both ulimit values to the recommended values.
ulimit will modify both hard and soft values unless the -H or -S modifiers are specified when modifying limit
values.
For many distributions of Linux you can change values by substituting the -n option for any possible value in the
output of ulimit -a. On OS X, use the launchctl limit command. See your operating system documentation
for the precise procedure for changing system limits on running systems.
After changing the ulimit settings, you must restart the process to take advantage of the modified settings. You can
use the /proc file system to see the current limitations on a running process.
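On Linux, a running process's limits appear in /proc/<pid>/limits as a fixed-width table with Limit, Soft Limit, Hard Limit and Units columns. The following is a minimal sketch that parses that text into soft and hard values. The function name parseProcLimits is hypothetical. The sketch assumes columns are separated by two or more spaces, which matches current kernel output but is not a stable interface.

```javascript
// Parse the text of /proc/<pid>/limits (Linux) into
// { "<limit name>": { soft, hard, units } }.
// "unlimited" becomes Infinity; numeric values become numbers.
function parseProcLimits(text) {
  const toValue = (v) => (v === "unlimited" ? Infinity : Number(v));
  const limits = {};
  const lines = text.split("\n").filter((l) => l.trim() !== "");
  for (const line of lines.slice(1)) { // skip the header row
    // Columns are padded with runs of spaces; names contain single spaces.
    const [name, soft, hard, units = ""] = line.trim().split(/\s{2,}/);
    limits[name] = { soft: toValue(soft), hard: toValue(hard), units };
  }
  return limits;
}

const sample = [
  "Limit                     Soft Limit           Hard Limit           Units     ",
  "Max cpu time              unlimited            unlimited            seconds   ",
  "Max processes             64000                64000                processes ",
  "Max open files            64000                64000                files     ",
].join("\n");

const limits = parseProcLimits(sample);
console.log(limits["Max open files"].soft); // 64000
```

In Node.js you could obtain the input with fs.readFileSync(path, "utf8"). Here path is /proc/<pid>/limits, with <pid> replaced by the mongod process ID.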
Depending on your system's configuration and default settings, any change to system limits made using ulimit
may revert following a system restart. Check your distribution and operating system documentation for more
information.
Note: SUSE Linux Enterprise Server 11 and potentially other versions of SLES and other SUSE distributions ship
with virtual memory address space limited to 8GB by default. This must be adjusted in order to prevent virtual memory
allocation failures as the database grows.
The SLES packages for MongoDB adjust these limits in the default scripts, but you will need to make this change
manually if you are using custom scripts and/or the tarball release rather than the SLES packages.
Every deployment may have unique requirements and settings; however, the following thresholds and settings are
particularly important for mongod and mongos deployments:
-f (file size): unlimited
-t (cpu time): unlimited
-v (virtual memory): unlimited (see footnote 87)
-n (open files): 64000
-m (memory size): unlimited (see footnotes 87 and 88)
-u (processes/threads): 64000
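The recommended thresholds above can be encoded as data and compared against a process's current soft limits. The following is a minimal sketch. It assumes limits arrive as a plain object keyed by ulimit flag letter, a hypothetical input shape that you might assemble from ulimit -a output. RECOMMENDED and findLowLimits are illustrative names, and Infinity stands for unlimited. The -m setting is left out because it has no effect on recent Linux kernels (footnote 88).

```javascript
// Recommended ulimit settings for mongod/mongos from this section.
// Keys are ulimit flags; Infinity means "unlimited".
const RECOMMENDED = {
  f: Infinity, // file size
  t: Infinity, // cpu time
  v: Infinity, // virtual memory
  n: 64000,    // open files
  u: 64000,    // processes/threads
};

// Return the flags whose current value is below the recommendation.
// `current` maps flags to numbers; a missing flag is treated as 0.
function findLowLimits(current) {
  return Object.entries(RECOMMENDED)
    .filter(([flag, wanted]) => (current[flag] ?? 0) < wanted)
    .map(([flag, wanted]) => ({ flag: `-${flag}`, current: current[flag], wanted }));
}

// Values taken from the example `ulimit -a` output earlier in this section.
const low = findLowLimits({ t: Infinity, f: Infinity, v: Infinity, n: 21000, u: 192276 });
console.log(low); // [ { flag: '-n', current: 21000, wanted: 64000 } ]
```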
Always remember to restart your mongod and mongos instances after changing the ulimit settings to ensure that
the changes take effect.
Linux distributions using Upstart
For Linux distributions that use Upstart, you can specify limits within service scripts if you start mongod and/or
mongos instances as Upstart services. You can do this by using limit stanzas89.
Specify the Recommended ulimit Settings (page 283), as in the following example:
limit fsize unlimited unlimited    # (file size)
limit cpu unlimited unlimited      # (cpu time)
limit as unlimited unlimited       # (virtual memory size)
limit nofile 64000 64000           # (open files)
limit nproc 64000 64000            # (processes/threads)
87 If you limit virtual or resident memory size on a system running MongoDB, the operating system will refuse to honor additional allocation
requests.
88 The -m parameter to ulimit has no effect on Linux systems with kernel versions more recent than 2.4.30. You may omit -m if you wish.
89 http://upstart.ubuntu.com/wiki/Stanzas#limit
Each limit stanza sets the soft limit to the first value specified and the hard limit to the second.
After changing limit stanzas, ensure that the changes take effect by restarting the application services, using
the following form:
restart <service name>
Linux distributions using systemd
For Linux distributions that use systemd, you can specify limits within the [Service] sections of service scripts
if you start mongod and/or mongos instances as systemd services. You can do this by using resource limit directives90.
Specify the Recommended ulimit Settings (page 283), as in the following example:
[Service]
# Other directives omitted
# (file size)
LimitFSIZE=infinity
# (cpu time)
LimitCPU=infinity
# (virtual memory size)
LimitAS=infinity
# (open files)
LimitNOFILE=64000
# (processes/threads)
LimitNPROC=64000
Each systemd limit directive sets both the hard and soft limits to the value specified.
After changing limit directives, ensure that the changes take effect by restarting the application services, using
the following form:
systemctl restart <service name>
You can copy and paste this function into a current shell session or load it as part of a script. Call the function with
one of the following invocations:
return-limits mongod
return-limits mongos
return-limits mongod mongos
If your system uses grub-legacy, you can edit the configuration file directly.
Add transparent_hugepage=never to the kernel line of your /boot/grub/grub.conf or /etc/grub.conf as root.
If your system uses grub2, you can edit the GRUB_CMDLINE_LINUX_DEFAULT value in /etc/default/grub
as root to add transparent_hugepage=never to any existing kernel options:
GRUB_CMDLINE_LINUX_DEFAULT="<kernel options> transparent_hugepage=never"
as root.
See your operating systems documentation for details on the precise location of the grub-legacy or grub2
configuration file.
If necessary, create a new profile from an existing default profile by copying the relevant directory. In the example we
use the default profile as the base and call our new profile no-thp, but you may start with any profile and name
the new profile whatever you wish.
cp -r /etc/tune-profiles/default /etc/tune-profiles/no-thp
Locate the ktune.sh file in the new profile directory (/etc/tune-profiles/no-thp/ktune.sh in this
case) and add the following:
set_transparent_hugepages never
Then enable the new profile:
tuned-adm profile no-thp
where no-thp should be replaced with the name of the profile you defined, if it is different.
Warning: If you are using tuned and ktune and do not wish to disable them, you must use this method, or
tuned may override THP settings configured using either of the other two methods.
In /etc/rc.local (Alternate)
Step 1: Edit /etc/rc.local
/etc/rc.local is a user-configurable script that is run at the end of post-boot system initialization. Add the
following script lines to the file as root to disable THP upon each boot.
if test -f /sys/kernel/mm/transparent_hugepage/khugepaged/defrag; then
  echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
These lines should immediately precede exit 0, which should already be the last line in the file. Note that on Red
Hat Enterprise Linux, CentOS, and potentially other Red Hat-based derivatives, transparent_hugepage in the
paths in the script should be replaced by redhat_transparent_hugepage.
Step 2: Apply the changes to /etc/rc.local as root
source /etc/rc.local
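Step 3: Test your changes
Check the current THP setting with:
cat /sys/kernel/mm/transparent_hugepage/enabled
or
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
on Red Hat Enterprise Linux, CentOS, and potentially other Red Hat-based derivatives.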
Correct output resembles:
always madvise [never]
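The kernel marks the active THP mode by wrapping it in square brackets, so always madvise [never] means THP is disabled. The following is a minimal sketch of performing that check from a script. The helper name activeThpMode is hypothetical. Reading the file is left to the caller because the path differs on Red Hat-based systems, as noted above.

```javascript
// Return the bracketed (active) value from a THP sysfs file such as
// /sys/kernel/mm/transparent_hugepage/enabled, or null if none is marked.
function activeThpMode(contents) {
  const match = contents.match(/\[([^\]]+)\]/);
  return match ? match[1] : null;
}

console.log(activeThpMode("always madvise [never]\n")); // never
console.log(activeThpMode("[always] madvise never"));   // always
```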
admin.system.version
New in version 2.6.
Stores the schema version of the user credential documents.
System collections also include these collections stored directly in each database:
<database>.system.namespaces
Deprecated since version 3.0: Access this data using listCollections.
The <database>.system.namespaces (page 288) collection contains information about all of the
database's collections.
<database>.system.indexes
Deprecated since version 3.0: Access this data using listIndexes.
The <database>.system.indexes (page 288) collection lists all the indexes in the database.
<database>.system.profile
The <database>.system.profile (page 288) collection stores database profiling information. For information on profiling, see Database Profiling (page 219).
<database>.system.js
The <database>.system.js (page 288) collection holds special JavaScript code for use in server side
JavaScript (page 260). See Store a JavaScript Function on the Server (page 233) for more information.
"nscanned" : 8,
"scanAndOrder" : true,
"moved" : true,
"nmoved" : 1,
"nupdated" : 1,
"keyUpdates" : 0,
"numYield" : 0,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(0),
"w" : NumberLong(258)
},
"timeAcquiringMicros" : {
"r" : NumberLong(0),
"w" : NumberLong(7)
}
},
"millis" : 0,
"client" : "127.0.0.1",
"user" : ""
}
Output Reference
For any single operation, the documents created by the database profiler will include a subset of the following fields.
The precise selection of fields in these documents depends on the type of operation.
system.profile.ts
The timestamp of the operation.
system.profile.op
The type of operation. The possible values are:
insert
query
update
remove
getmore
command
system.profile.ns
The namespace the operation targets. Namespaces in MongoDB take the form of the database, followed by a
dot (.), followed by the name of the collection.
system.profile.query
The query document (page 95) used.
system.profile.command
The command operation.
system.profile.updateobj
The <update> document passed in during an update (page 71) operation.
system.profile.cursorid
The ID of the cursor accessed by a getmore operation.
system.profile.ntoreturn
Changed in version 2.2: In 2.0, MongoDB includes this field for query and command operations. In 2.2, MongoDB
also includes this field for getmore operations.
The number of documents the operation specified to return. For example, the profile command would
return one document (a results document) so the ntoreturn (page 289) value would be 1. The limit(5)
method would return five documents so the ntoreturn (page 289) value would be 5.
If the ntoreturn (page 289) value is 0, the command did not specify a number of documents to return, as
would be the case with a simple find() method with no limit specified.
system.profile.ntoskip
New in version 2.2.
The number of documents the skip() method specified to skip.
system.profile.nscanned
The number of documents that MongoDB scans in the index (page 463) in order to carry out the operation.
In general, if nscanned (page 290) is much higher than nreturned (page 291), the database is scanning
many objects to find the target objects. Consider creating an index to improve this.
system.profile.scanAndOrder
scanAndOrder (page 290) is a boolean that is true when a query cannot use the order of documents in the
index for returning sorted results: MongoDB must sort the documents after it receives the documents from a
cursor.
If scanAndOrder (page 290) is false, MongoDB can use the order of the documents in an index to return
sorted results.
system.profile.moved
Changed in version 3.0.0: Only appears when using the MMAPv1 storage engine.
This field appears with a value of true when an update operation moved one or more documents to a new
location on disk. If the operation did not result in a move, this field does not appear. Operations that result in
a move take more time than updates that do not result in a move and typically occur as a result of document
growth.
See also:
Document Growth and the MMAPv1 Storage Engine (page 85)
system.profile.nmoved
New in version 2.2.
Changed in version 3.0.0: Only appears when using the MMAPv1 storage engine.
The number of documents the operation moved on disk. This field appears only if the operation resulted in a
move. The field's implicit value is zero, and the field is present only when non-zero.
system.profile.nupdated
New in version 2.2.
The number of documents updated by the operation.
system.profile.keyUpdates
New in version 2.2.
The number of index (page 463) keys the update changed in the operation. Changing an index key carries a
small performance cost because the database must remove the old key and insert a new key into the B-tree
index.
system.profile.numYield
New in version 2.2.
The number of times the operation yielded to allow other operations to complete. Typically, operations yield
when they need access to data that MongoDB has not yet fully read into memory. This allows other operations
that have data in memory to complete while MongoDB reads in data for the yielding operation. For more
information, see the FAQ on when operations yield (page 739).
Changed in version 3.0.0: system.profile.numYield does not apply to databases using the WiredTiger
(page 88) storage engine, and as such, is not included in the profiler output for those databases.
system.profile.lockStats
New in version 2.2.
The time in microseconds the operation spent acquiring and holding locks. This field reports data for the
following lock types:
R - global read lock
W - global write lock
r - database-specific read lock
w - database-specific write lock
system.profile.lockStats.timeLockedMicros
The time in microseconds the operation held a specific lock. For operations that require more than one
lock, like those that lock the local database to update the oplog, this value may be longer than the total
length of the operation (i.e. millis (page 291).)
system.profile.lockStats.timeAcquiringMicros
The time in microseconds the operation spent waiting to acquire a specific lock.
Changed in version 3.0.0: system.profile.lockStats (page 291) does not apply to databases using the
WiredTiger (page 88) storage engine, and as such, is not included in the profiler output for those databases.
system.profile.nreturned
The number of documents returned by the operation.
system.profile.responseLength
The length in bytes of the operation's result document. A large responseLength (page 291) can affect
performance. To limit the size of the result document for a query operation, you can use any of the following:
Projections (page 106)
The limit() method
The batchSize() method
Note: When MongoDB writes query profile information to the log, the responseLength (page 291) value
is in a field named reslen.
system.profile.millis
The time in milliseconds from the perspective of the mongod from the beginning of the operation to the end of
the operation.
system.profile.client
The IP address or hostname of the client connection where the operation originates.
For some operations, such as db.eval(), the client is 0.0.0.0:0 instead of an actual client.
system.profile.user
The authenticated user who ran the operation.
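Following the advice under nscanned, you can scan profiler documents for operations whose nscanned is much larger than nreturned. The following is a minimal sketch in plain JavaScript. It assumes the documents have already been fetched into an array, for example with db.system.profile.find().toArray() in the mongo shell. The name findIndexCandidates and the default ratio of 100 are hypothetical choices, not MongoDB guidance.

```javascript
// Flag profiler entries that scanned far more documents than they returned,
// which suggests an index could help. `minRatio` is an arbitrary threshold.
function findIndexCandidates(profileDocs, minRatio = 100) {
  return profileDocs
    .filter((doc) => typeof doc.nscanned === "number")
    .map((doc) => ({
      ns: doc.ns,
      op: doc.op,
      // Treat a missing or zero nreturned as 1 to avoid dividing by zero.
      ratio: doc.nscanned / Math.max(doc.nreturned ?? 0, 1),
    }))
    .filter((entry) => entry.ratio >= minRatio)
    .sort((a, b) => b.ratio - a.ratio);
}

const candidates = findIndexCandidates([
  { op: "query", ns: "test.users", nscanned: 50000, nreturned: 10 },
  { op: "query", ns: "test.orders", nscanned: 12, nreturned: 12 },
  { op: "update", ns: "test.users", nscanned: 800, nupdated: 1 },
]);
console.log(candidates.map((c) => `${c.op} ${c.ns} ${c.ratio}`));
// [ 'query test.users 5000', 'update test.users 800' ]
```

Update operations do not report nreturned, so the sketch divides their nscanned by one. Whether that is a useful signal depends on your workload.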
The server-status-locks section reports statistics for each lock type and mode:
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
},
"MMAPV1Journal" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
},
"Metadata" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
},
"oplog" : {
"acquireCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"acquireWaitCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"timeAcquiringMicros" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
},
"deadlockCount" : {
"r" : NumberLong(<num>),
"w" : NumberLong(<num>),
"R" : NumberLong(<num>),
"W" : NumberLong(<num>)
}
}
},
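Every lock resource in this document has the same shape. You can therefore reduce each resource to a single figure, such as the total time spent waiting to acquire it across the r, w, R and W modes. The following is a minimal sketch. It assumes the locks subdocument has already been converted to plain JavaScript numbers (in the mongo shell the values are NumberLong). The name rankLockWaits is hypothetical, and modes missing from a resource count as zero.

```javascript
// Sum timeAcquiringMicros over the r/w/R/W modes for each lock resource
// and return resources sorted by total wait time, largest first.
function rankLockWaits(locks) {
  return Object.entries(locks)
    .map(([resource, stats]) => {
      const waits = stats.timeAcquiringMicros ?? {};
      const totalMicros = ["r", "w", "R", "W"]
        .reduce((sum, mode) => sum + (waits[mode] ?? 0), 0);
      return { resource, totalMicros };
    })
    .sort((a, b) => b.totalMicros - a.totalMicros);
}

const ranked = rankLockWaits({
  Global: { timeAcquiringMicros: { r: 120, w: 4000 } },
  Database: { timeAcquiringMicros: { r: 10, w: 250, W: 9000 } },
  Collection: {},
});
console.log(ranked[0]); // { resource: 'Database', totalMicros: 9260 }
```

These counters are cumulative. Comparing two samples taken some time apart is more informative than a single snapshot.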
The server-status-globallock field reports on MongoDB's global system lock. In most cases, the locks document
provides more fine-grained data that reflects lock use:
"globalLock" : {
"totalTime" : <num>,
"lockTime" : <num>,
"currentQueue" : {
"total" : <num>,
"readers" : <num>,
"writers" : <num>
},
"activeClients" : {
"total" : <num>,
"readers" : <num>,
"writers" : <num>
}
},
The server-status-connections field reports on MongoDB's current number of open incoming connections:
Changed in version 2.4: The totalCreated field.
"connections" : {
"current" : <num>,
"available" : <num>,
"totalCreated" : NumberLong(<num>)
},
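current counts open incoming connections, and available counts the additional incoming connections that could still be opened. Their sum therefore approximates the connection ceiling. The following is a minimal sketch that computes utilization from this subdocument. connectionUtilization is a hypothetical helper, and the 80% alert threshold in the example is an arbitrary choice.

```javascript
// Fraction of the incoming-connection capacity currently in use,
// computed from the "connections" subdocument of serverStatus.
function connectionUtilization({ current, available }) {
  const capacity = current + available;
  return capacity === 0 ? 0 : current / capacity;
}

const usage = connectionUtilization({ current: 900, available: 100 });
if (usage > 0.8) {
  console.log(`connections at ${(usage * 100).toFixed(0)}% of capacity`);
}
console.log(usage); // 0.9
```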
The fields in the server-status-extra-info document provide platform-specific information. The following example
block is from a Linux-based system:
"extra_info" : {
"note" : "fields vary by platform",
"heap_usage_bytes" : <num>,
"page_faults" : <num>
},
"indexCounters" : {
"accesses" : <num>,
"hits" : <num>,
"misses" : <num>,
"resets" : <num>,
"missRatio" : <num>
},
The server-status-backgroundflushing document reports on the process MongoDB uses to write data to disk. The
server-status-backgroundflushing information only returns for instances that use the MMAPv1 storage engine:
"backgroundFlushing" : {
"flushes" : <num>,
"total_ms" : <num>,
"average_ms" : <num>,
"last_ms" : <num>,
"last_finished" : ISODate("")
},
The server-status-repl document reports on the state of replication and the replica set. This document only appears for
replica sets.
"repl" : {
"setName" : "<string>",
"ismaster" : <boolean>,
"secondary" : <boolean>,
"hosts" : [
<hostname>,
<hostname>,
<hostname>
],
"primary" : <hostname>,
"me" : <hostname>,
"rbid": <num>,
"slaves": [
{
"rid": <ObjectId>,
"optime": <timestamp>,
"host": <hostname>,
"memberID": <num>
}
],
},
The server-status-opcounters document reports the number of operations this MongoDB instance has processed:
"opcounters" : {
"insert" : <num>,
"query" : <num>,
"update" : <num>,
"delete" : <num>,
"getmore" : <num>,
"command" : <num>
},
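The opcounters values accumulate from the time the mongod process starts, so computing a rate requires two samples. The following is a minimal sketch that computes per-second rates from two opcounters documents taken a known number of seconds apart. opcounterRates is a hypothetical helper. A counter that goes down (for example after a restart) is reported as null rather than guessed.

```javascript
// Per-second operation rates between two opcounters samples.
function opcounterRates(before, after, elapsedSeconds) {
  const rates = {};
  for (const key of Object.keys(after)) {
    const delta = after[key] - (before[key] ?? 0);
    // A negative delta means the counters were reset (e.g. mongod restarted).
    rates[key] = delta < 0 ? null : delta / elapsedSeconds;
  }
  return rates;
}

const t0 = { insert: 100, query: 5000, update: 40, delete: 0, getmore: 10, command: 700 };
const t1 = { insert: 400, query: 5600, update: 40, delete: 5, getmore: 30, command: 760 };
const rates = opcounterRates(t0, t1, 10);
console.log(rates);
```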
The server-status-range-deleter document reports statistics on the most recent deletions performed by the range
deleter, which removes documents from a shard after their chunk has migrated to another shard.
The rangeDeleter document is only present in the output of serverStatus when explicitly enabled.
"rangeDeleter" : {
"lastDeleteStats" : [
{
"deletedDocs" : NumberLong(<num>),
"queueStart" : <date>,
"queueEnd" : <date>,
"deleteStart" : <date>,
"deleteEnd" : <date>,
"waitForReplStart" : <date>,
"waitForReplEnd" : <date>
}
]
}
The server-status-security document reports details about the security features and use:
"security" : {
"SSLServerSubjectName": <string>,
"SSLServerHasCertificateAuthority": <boolean>,
"SSLServerCertificateExpirationDate": <date>
},
The server-status-storage-engine document reports details about the current storage engine:
"storageEngine" : {
"name" : <string>
},
The server-status-asserts document reports the number of assertions or errors produced by the server:
"asserts" : {
"regular" : <num>,
"warning" : <num>,
"msg" : <num>,
"user" : <num>,
"rollovers" : <num>
},
The server-status-journaling document reports on data that reflect this mongod instance's journaling-related operations and performance during a journal group commit interval (page 232). The server-status-journaling information
only returns for instances that use the MMAPv1 storage engine and have journaling enabled:
"dur" : {
"commits" : <num>,
"journaledMB" : <num>,
"writeToDataFilesMB" : <num>,
"compression" : <num>,
"commitsInWriteLock" : <num>,
"earlyCommits" : <num>,
"timeMs" : {
"dt" : <num>,
"prepLogBuffer" : <num>,
"writeToJournal" : <num>,
"writeToDataFiles" : <num>,
"remapPrivateView" : <num>
}
},
The server-status-recordstats document reports data on MongoDB's ability to predict page faults and yield write
operations when required data isn't in memory:
"recordStats" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>,
"local" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>
},
"<database>" : {
"accessesNotInMemory" : <num>,
"pageFaultExceptionsThrown" : <num>
}
},
The server-status-workingset document provides an estimated size of the MongoDB instance's working set. This data
may not exactly reflect the size of the working set in all cases. Additionally, the workingSet document is only
present in the output of serverStatus when explicitly enabled.
New in version 2.4.
"workingSet" : {
"note" : "thisIsAnEstimate",
"pagesInMemory" : <num>,
"computationTimeMicros" : <num>,
"overSeconds" : <num>
},
The server-status-metrics document contains a number of operational metrics that are useful for monitoring the state
and workload of a mongod instance.
New in version 2.4.
Changed in version 2.6: Added the cursor document.
"metrics" : {
"command": {
"<command>": {
"failed": <num>,
"total": <num>
}
},
"cursor" : {
"timedOut" : NumberLong(<num>),
"open" : {
"noTimeout" : NumberLong(<num>),
"pinned" : NumberLong(<num>),
"total" : NumberLong(<num>)
}
},
"document" : {
"deleted" : NumberLong(<num>),
"inserted" : NumberLong(<num>),
"returned" : NumberLong(<num>),
"updated" : NumberLong(<num>)
},
"getLastError" : {
"wtime" : {
"num" : <num>,
"totalMillis" : <num>
},
"wtimeouts" : NumberLong(<num>)
},
"operation" : {
"fastmod" : NumberLong(<num>),
"idhack" : NumberLong(<num>),
"scanAndOrder" : NumberLong(<num>)
},
"queryExecutor": {
"scanned" : NumberLong(<num>)
},
"record" : {
"moves" : NumberLong(<num>)
},
"repl" : {
"apply" : {
"batches" : {
"num" : <num>,
"totalMillis" : <num>
},
"ops" : NumberLong(<num>)
},
"buffer" : {
"count" : NumberLong(<num>),
"maxSizeBytes" : <num>,
"sizeBytes" : NumberLong(<num>)
},
"network" : {
"bytes" : NumberLong(<num>),
"getmores" : {
"num" : <num>,
"totalMillis" : <num>
},
"ops" : NumberLong(<num>),
"readersCreated" : NumberLong(<num>)
},
"oplog" : {
"insert" : {
"num" : <num>,
"totalMillis" : <num>
},
"insertBytes" : NumberLong(<num>)
},
"preload" : {
"docs" : {
"num" : <num>,
"totalMillis" : <num>
},
"indexes" : {
"num" : <num>,
"totalMillis" : <num>
}
}
},
"storage" : {
"freelist" : {
"search" : {
"bucketExhausted" : <num>,
"requests" : <num>,
"scanned" : <num>
}
}
},
"ttl" : {
"deletedDocuments" : NumberLong(<num>),
"passes" : NumberLong(<num>)
}
},
The final ok field holds the return status for the serverStatus command:
"ok" : 1
Journal Files
With journaling enabled, MongoDB creates a journal subdirectory within the directory defined by dbPath, which is
/data/db by default. The journal directory holds journal files, which contain write-ahead redo logs. The directory
also holds a last-sequence-number file. A clean shutdown removes all the files in the journal directory. A dirty shutdown (crash) leaves files in the journal directory; these are used to automatically recover the database to a consistent
state when the mongod process is restarted.
Journal files are append-only files and have file names prefixed with j._. When a journal file holds 1 gigabyte of data,
MongoDB creates a new journal file. Once MongoDB applies all the write operations in a particular journal file to the
database data files, it deletes the file, as it is no longer needed for recovery purposes. Unless you write many bytes of
data per second, the journal directory should contain only two or three journal files.
You can use the storage.smallFiles run-time option when starting mongod to limit the size of each journal
file to 128 megabytes, if you prefer.
To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory
is on a different filesystem from the database data files.
Important: If you place the journal on a different filesystem from your data files you cannot use a filesystem snapshot
alone to capture valid backups of a dbPath directory. In this case, use fsyncLock() to ensure that database files
are consistent before the snapshot and fsyncUnlock() once the snapshot is complete.
Note: Depending on your filesystem, you might experience a preallocation lag the first time you start a mongod
instance with journaling enabled.
MongoDB may preallocate journal files if the mongod process determines that it is more efficient to preallocate
journal files than create new journal files as needed. The preallocation lag might last several
minutes, during which you will not be able to connect to the database. This is a one-time preallocation and does not
occur with future invocations.
To avoid preallocation lag, see Avoid Preallocation Lag (page 231).
Storage Views used in Journaling
With journaling, MongoDB's storage layer has two internal views of the data set.
The shared view stores modified data for upload to the MongoDB data files. The shared view is the only view
with direct access to the MongoDB data files. When running with journaling, mongod asks the operating system to
map your existing on-disk data files to the shared view in virtual memory. The operating system maps the files
but does not load them. MongoDB later loads data files into the shared view as needed.
The private view stores data for use with read operations (page 58). The private view is the first place
MongoDB applies new write operations (page 71). Upon a journal commit, MongoDB copies the changes made in
the private view to the shared view, where they are then available for uploading to the database data files.
The journal is an on-disk view that stores new write operations after MongoDB applies the operations to the private
view but before applying them to the data files. The journal provides durability. If the mongod instance were to
crash without having applied the writes to the data files, the journal could replay the writes to the shared view for
eventual upload to the data files.
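The relationship between the private view, the journal, the shared view and the data files can be illustrated with a toy in-memory model. This is a conceptual sketch only and not MongoDB's implementation. The class ToyJournaledStore, its method names, and the use of Map objects for the views and data files are hypothetical simplifications. The model traces the order described above:
1. Writes land in the private view.
2. A group commit appends the batch to the journal and copies it to the shared view.
3. A flush writes the shared view to the data files, after which the journal entries are no longer needed.
4. Recovery after a crash replays the journal into the shared view.

```javascript
// Toy model of journaling with private and shared views (conceptual only).
class ToyJournaledStore {
  constructor() {
    this.dataFiles = new Map();   // durable on-disk data files
    this.sharedView = new Map();  // mapped view that is flushed to the data files
    this.privateView = new Map(); // where new writes are applied first
    this.pending = [];            // writes not yet group-committed
    this.journal = [];            // committed batches not yet in the data files
  }

  write(key, value) {
    this.privateView.set(key, value); // 1. apply to the private view
    this.pending.push([key, value]);
  }

  read(key) {
    return this.privateView.get(key);
  }

  groupCommit() {
    if (this.pending.length === 0) return;
    this.journal.push(this.pending); // 2. append the batch to the journal...
    for (const [k, v] of this.pending) this.sharedView.set(k, v); // ...then copy to the shared view
    this.pending = [];
  }

  flushToDataFiles() {
    for (const [k, v] of this.sharedView) this.dataFiles.set(k, v); // 3. write the data files
    this.journal = []; // applied batches are no longer needed for recovery
  }

  crashAndRecover() {
    // Uncommitted writes and both in-memory views are lost.
    this.pending = [];
    this.sharedView = new Map(this.dataFiles);
    for (const batch of this.journal) {
      for (const [k, v] of batch) this.sharedView.set(k, v); // 4. replay the journal
    }
    this.privateView = new Map(this.sharedView);
  }
}

const store = new ToyJournaledStore();
store.write("a", 1);
store.groupCommit();  // "a" is now in the journal
store.write("b", 2);  // not yet group-committed
store.crashAndRecover();
console.log(store.read("a"), store.read("b")); // 1 undefined
```

The example shows why the journal provides durability. The committed write survives the crash, while the write still waiting for the next group commit is lost.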
How Journaling Records Write Operations
MongoDB copies the write operations to the journal in batches called group commits. These group commits help
minimize the performance impact of journaling, since a group commit must block all writers during the commit. See
12
Returned by the mongod.exe process on Windows when it receives a Control-C, Close, Break or Shutdown
event.
14
Returned by MongoDB applications which encounter an unrecoverable error, an uncaught exception or uncaught
signal. The system exits without performing a clean shutdown.
20
Message: ERROR: wsastartup failed <reason>
Returned by MongoDB applications on Windows following an error in the WSAStartup function.
Message: NT Service Error
Returned by MongoDB applications for Windows due to failures installing, starting or removing the NT Service
for the application.
45
Returned when a MongoDB application cannot open a file or cannot obtain a lock on a file.
47
MongoDB applications exit cleanly following a large clock skew (32768 milliseconds) event.
48
mongod exits cleanly if the server socket closes. The server socket is on port 27017 by default, or as specified
to the --port run-time option.
49
Returned by mongod.exe or mongos.exe on Windows when either receives a shutdown message from the
Windows Service Control Manager.
100
Returned by mongod when the process throws an uncaught exception.
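A supervisor or wrapper script can turn these statuses into readable messages. The sketch below is a hypothetical plain JavaScript (Node.js) helper, not a MongoDB tool. It covers only the codes listed here, and the description strings paraphrase the text above.

```javascript
// Exit statuses listed above, mapped to short descriptions.
// Hypothetical helper for a supervisor script; not part of MongoDB.
const MONGOD_EXIT_CODES = new Map([
  [12, "Windows: received a Control-C, Close, Break, or Shutdown event"],
  [14, "unrecoverable error, uncaught exception, or uncaught signal (no clean shutdown)"],
  [20, "Windows: WSAStartup failed or NT Service error"],
  [45, "could not open a file or obtain a lock on a file"],
  [47, "clean exit after a large clock skew (32768 ms)"],
  [48, "clean exit because the server socket closed"],
  [49, "Windows: shutdown message from the Service Control Manager"],
  [100, "uncaught exception"],
]);

function describeExitCode(code) {
  return MONGOD_EXIT_CODES.get(code) ?? `unlisted exit code ${code}`;
}

console.log(describeExitCode(45)); // could not open a file or obtain a lock on a file
```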
CHAPTER 6
Security
This section outlines basic security and risk management strategies and access control. The included tutorials outline
specific tasks for configuring firewalls, authentication, and system privileges.
Security Introduction (page 305) A high-level introduction to security and MongoDB deployments.
Security Concepts (page 307) The core documentation of security.
Authentication (page 308) Mechanisms for verifying user and instance access to MongoDB.
Authorization (page 312) Control access to MongoDB instances using authorization.
Network Exposure and Security (page 314) Discusses potential security risks related to the network and strategies for decreasing possible network-based attack vectors for MongoDB.
Continue reading from Security Concepts (page 307) for additional documentation of MongoDB's security
features and operation.
Security Tutorials (page 321) Tutorials for enabling and configuring security features for MongoDB.
Security Checklist (page 322) A high-level overview of global security considerations for administrators of
MongoDB deployments. Use this checklist if you are new to deploying MongoDB in production and
want to implement high quality security practices.
Network Security Tutorials (page 324) Ensure that the underlying network configuration supports a secure operating environment for MongoDB deployments, and appropriately limits access to MongoDB deployments.
Access Control Tutorials (page 344) These tutorials describe procedures relevant for the configuration, operation, and maintenance of MongoDB's access control system.
User and Role Management Tutorials (page 370) MongoDB's access control system provides a flexible role-based access control system that you can use to limit access to MongoDB deployments. The tutorials in
this section describe the configuration and setup of the authorization system.
Continue reading from Security Tutorials (page 321) for additional tutorials that address the use and management
of secure MongoDB deployments.
Create a Vulnerability Report (page 387) Report a vulnerability in MongoDB.
Security Reference (page 389) Reference for security related functions.
6.1.1 Authentication
Before gaining access to a system, all clients should identify themselves to MongoDB. This ensures that no client can
access the data stored in MongoDB without being explicitly allowed.
MongoDB supports a number of authentication mechanisms (page 308) that clients can use to verify their identity.
MongoDB supports two mechanisms: a password-based challenge and response protocol and x.509 certificates. Additionally, MongoDB Enterprise1 also provides support for LDAP proxy authentication (page 310) and Kerberos authentication (page 310).
See Authentication (page 308) for more information.
6.1.3 Auditing
Auditing provides administrators with the ability to verify that the implemented security policies are controlling activity in the system. Retaining audit information ensures that administrators have enough information to perform forensic
investigations and comply with regulations and policies that require audit data.
See Auditing (page 317) for more information.
6.1.4 Encryption
Transport Encryption
You can use SSL to encrypt all of MongoDB's network traffic. SSL ensures that MongoDB network traffic is only
readable by the intended client.
See Configure mongod and mongos for SSL (page 331) for more information.
1 http://www.mongodb.com/products/mongodb-enterprise
Chapter 6. Security
Encryption at Rest
There are two broad classes of approaches to encrypting data at rest with MongoDB. You can use these solutions
together or independently:
Application Level Encryption
Provide encryption on a per-field or per-document basis within the application layer. To encrypt document or field
level data, write custom encryption and decryption routines or use a commercial solution such as the Vormetric Data
Security Platform2 .
Storage Encryption
Encrypt all MongoDB data on the storage or operating system to ensure that only authorized processes can access
protected data. A number of third-party libraries can integrate with the operating system to provide transparent disk-level encryption. For example:
Linux Unified Key Setup (LUKS) LUKS is available for most Linux distributions. For configuration explanation,
see the LUKS documentation from Red Hat3 .
IBM Guardium Data Encryption IBM Guardium Data Encryption4 provides support for disk-level encryption for
Linux and Windows operating systems.
Vormetric Data Security Platform The Vormetric Data Security Platform5 provides disk and file-level encryption in
addition to application level encryption.
Bitlocker Drive Encryption Bitlocker Drive Encryption6 is a feature available on Windows Server 2008 and 2012
that provides disk encryption.
Properly configured disk encryption, when used alongside good security policies that protect relevant accounts, passwords, and encryption keys, can help ensure compliance with standards, including HIPAA, PCI-DSS, and FERPA.
6.2.1 Authentication
Authentication is the process of verifying the identity of a client. When access control, i.e. authorization (page 312),
is enabled, MongoDB requires all clients to authenticate themselves first in order to determine the access for the client.
Although authentication and authorization (page 312) are closely connected, authentication is distinct from authorization. Authentication verifies the identity of a user; authorization determines the verified user's access to resources and
operations.
MongoDB supports a number of authentication mechanisms (page 308) that clients can use to verify their identity.
These mechanisms allow MongoDB to integrate into your existing authentication system. See Authentication Mechanisms (page 308) for details.
In addition to verifying the identity of a client, MongoDB can require members of replica sets and sharded clusters to
authenticate their membership (page 310) to their respective replica set or sharded cluster. See Authentication Between
MongoDB Instances (page 310) for more information.
Client Users
To authenticate a client in MongoDB, you must add a corresponding user to MongoDB. When adding a user, you
create the user in a specific database. Together, the user's name and database serve as a unique identifier for that
user. That is, if two users have the same name but are created in different databases, they are two separate users. To
authenticate, the client must authenticate the user against the user's database. For instance, if using the mongo shell
as a client, you can specify the database for the user with the --authenticationDatabase option.
To add and manage user information, MongoDB provides the db.createUser() method as well as other user
management methods. For an example of adding a user to MongoDB, see Add a User to a Database (page 372).
MongoDB stores all user information, including name (page 401), password (page 401), and the user's
database (page 401), in the system.users (page 400) collection in the admin database.
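Because a user is identified by the pair of name and database, a user registry behaves like a map keyed on both parts. The sketch below is a hypothetical in-memory model in plain JavaScript (Node.js). It does not call db.createUser() or read system.users, and roles are simplified to plain strings.

```javascript
// Hypothetical in-memory model: a user is identified by (database, name).
class UserRegistry {
  constructor() {
    this.users = new Map();
  }

  // Encode the pair unambiguously as the map key.
  static key(db, name) {
    return JSON.stringify([db, name]);
  }

  add(db, name, roles) {
    const k = UserRegistry.key(db, name);
    if (this.users.has(k)) throw new Error(`user "${name}" already exists in "${db}"`);
    this.users.set(k, { db, name, roles });
  }

  // Look a user up in the database where it was created.
  find(db, name) {
    return this.users.get(UserRegistry.key(db, name));
  }
}

const registry = new UserRegistry();
registry.add("reporting", "alice", ["read"]);
registry.add("sales", "alice", ["readWrite"]); // same name, separate user
console.log(registry.users.size);             // 2
console.log(registry.find("admin", "alice")); // undefined: not created in admin
```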
Authentication Mechanisms
Changed in version 3.0.
MongoDB supports multiple authentication mechanisms. MongoDB's default authentication method is a challenge
and response mechanism (SCRAM-SHA-1) (page 309). MongoDB also supports x.509 certificate authentication
(page 309), LDAP proxy authentication (page 310), and Kerberos authentication (page 310).
This section introduces the mechanisms available in MongoDB.
To specify the authentication mechanism to use, see authenticationMechanisms.
SCRAM-SHA-1 Authentication
For membership authentication, members of sharded clusters and replica sets can use x.509 certificates instead of key
files. See Use x.509 Certificate for Membership Authentication (page 350) for more information.
Kerberos Authentication
MongoDB Enterprise8 supports authentication using a Kerberos service. Kerberos is an industry-standard authentication protocol for large client/server systems.
To use MongoDB with Kerberos, you must have a properly configured Kerberos deployment, configured Kerberos
service principals (page 319) for MongoDB, and added Kerberos user principals (page 319) to MongoDB.
See Kerberos Authentication (page 318) for more information on Kerberos and MongoDB. To configure MongoDB to
use Kerberos authentication, see Configure MongoDB with Kerberos Authentication on Linux (page 359) and Configure
MongoDB with Kerberos Authentication on Windows (page 362).
LDAP Proxy Authority Authentication
MongoDB Enterprise9 supports proxy authentication through a Lightweight Directory Access Protocol (LDAP) service. See Authenticate Using SASL and LDAP with OpenLDAP (page 356) and Authenticate Using SASL and LDAP
with ActiveDirectory (page 354).
MongoDB Enterprise for Windows does not include LDAP support for authentication. However, MongoDB Enterprise
for Linux supports using LDAP authentication with an ActiveDirectory server.
MongoDB does not support LDAP authentication in mixed sharded cluster deployments that contain both version 2.4
and version 2.6 shards.
Authentication Behavior
Client Authentication
Clients can authenticate using the challenge and response (page 309), x.509 (page 309), LDAP Proxy (page 310) and
Kerberos (page 310) mechanisms.
Each client connection should authenticate as exactly one user. If a client authenticates to a database as one user and
later authenticates to the same database as a different user, the second authentication invalidates the first. While clients
can authenticate as multiple users if the users are defined on different databases, we recommend authenticating as one
user at a time, providing the user with appropriate privileges on the databases required by the user.
See Authenticate to a MongoDB Instance or Cluster (page 364) for more information.
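The one-user-per-database rule can be modeled as a map from database to authenticated user on each connection. This is a minimal plain JavaScript (Node.js) sketch. The `Connection` class is hypothetical, and no credentials are checked: only the bookkeeping described above is modeled.

```javascript
// Hypothetical model of one client connection's authentication state:
// at most one authenticated user per database.
class Connection {
  constructor() {
    this.authenticated = new Map(); // database -> user name
  }

  // Credentials are not verified here; only the bookkeeping is modeled.
  authenticate(db, user) {
    this.authenticated.set(db, user); // replaces any earlier user on this db
  }

  users() {
    return [...this.authenticated.entries()];
  }
}

const conn = new Connection();
conn.authenticate("test", "alice");
conn.authenticate("test", "bob");        // second login on "test" invalidates alice
conn.authenticate("reporting", "carol"); // possible, but one user at a time is recommended
console.log(conn.users());               // bob on test, carol on reporting
```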
Authentication Between MongoDB Instances
You can authenticate members of replica sets and sharded clusters. To authenticate members of a single MongoDB
deployment to each other, MongoDB can use the keyFile and x.509 (page 309) mechanisms. Using keyFile
authentication for members also enables authorization.
Always run replica sets and sharded clusters in a trusted networking environment. Ensure that the network permits
only trusted traffic to reach each mongod and mongos instance.
8 http://www.mongodb.com/products/mongodb-enterprise
9 http://www.mongodb.com/products/mongodb-enterprise
Use your environment's firewall and network routing to ensure that only traffic from clients and other members can
reach your mongod and mongos instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs).
Always ensure that:
Your network configuration will allow every member of the replica set or sharded cluster to contact every other
member.
If you use MongoDB's authentication system to limit access to your infrastructure, ensure that you configure a
keyFile on all members to permit authentication.
See Generate a Key File (page 365) for instructions on generating a key file and turning on key file authentication for
members. For an example of using key files for sharded cluster authentication, see Enable Authentication in a Sharded
Cluster (page 346).
Authentication on Sharded Clusters
In sharded clusters, applications authenticate directly to mongos instances, using credentials stored in the admin
database of the config servers. The shards in the sharded cluster also have credentials, and clients can authenticate
directly to the shards to perform maintenance directly on the shards. In general, applications and clients should connect
to the sharded cluster through the mongos.
Changed in version 2.6: Previously, the credentials for authenticating to a database on a cluster resided on the primary
shard (page 649) for that database.
Some maintenance operations, such as cleanupOrphaned, compact, rs.reconfig(), require direct connections to specific shards in a sharded cluster. To perform these operations with authentication enabled, you must connect
directly to the shard and authenticate as a shard local administrative user. To create a shard local administrative user,
connect directly to the shard and create the user. MongoDB stores shard local users in the admin database of the shard
itself. These shard local users are completely independent from the users added to the sharded cluster via mongos.
Shard local users are local to the shard and are inaccessible by mongos. Direct connections to a shard should only be
for shard-specific maintenance and configuration.
Localhost Exception
The localhost exception allows you to enable authorization before creating the first user in the system. When active,
the localhost exception allows connections from the localhost interface to create the first user on the admin database.
The exception applies only when there are no users created in the MongoDB instance.
Changed in version 3.0: The localhost exception changed so that these connections only have access to create the first
user on the admin database. In previous versions, connections that gained access using the localhost exception had
unrestricted access to the MongoDB instance.
If you use the localhost exception when deploying a new MongoDB system, the first user you create must be
in the admin database with privileges to create other users, such as a user with the userAdmin (page 392) or
userAdminAnyDatabase (page 396) role. See Enable Client Access Control (page 344) and Create a User Administrator (page 370) for more information.
In the case of a sharded cluster, the localhost exception applies to each shard individually as well as to the cluster as
a whole. Once you create a sharded cluster and add an administrator to the mongos instance, you must still prevent
unauthorized access to the individual shards. Follow one of the following steps for each shard in your cluster:
Create an administrative user, or
Disable the localhost exception at startup. To disable the localhost exception, use setParameter
in your configuration file, or --setParameter on the command line to set the
enableLocalhostAuthBypass parameter to 0.
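The conditions under which the 3.0 localhost exception applies can be summarized as a single predicate. This is a simplified plain JavaScript (Node.js) sketch, not MongoDB's implementation. The function and field names are hypothetical, and the `enableLocalhostAuthBypass` field stands in for the server parameter of that name.

```javascript
// Simplified predicate for the MongoDB 3.0 localhost exception described above.
// Hypothetical field names; real enforcement happens inside mongod/mongos.
function localhostExceptionAllows({ fromLocalhost, userCount, enableLocalhostAuthBypass, action, db }) {
  return (
    enableLocalhostAuthBypass &&   // not disabled via setParameter
    fromLocalhost &&               // connection arrives on the localhost interface
    userCount === 0 &&             // no users exist yet on this instance
    action === "createUser" &&     // 3.0: only creating the first user...
    db === "admin"                 // ...on the admin database
  );
}

const base = { fromLocalhost: true, userCount: 0, enableLocalhostAuthBypass: true };
console.log(localhostExceptionAllows({ ...base, action: "createUser", db: "admin" }));               // true
console.log(localhostExceptionAllows({ ...base, action: "find", db: "admin" }));                     // false
console.log(localhostExceptionAllows({ ...base, userCount: 1, action: "createUser", db: "admin" })); // false
```

For a sharded cluster, the same check applies independently on each shard, which is why each shard needs its own administrative user or the bypass disabled.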
6.2. Security Concepts
6.2.2 Authorization
MongoDB employs Role-Based Access Control (RBAC) to govern access to a MongoDB system. A user is granted
one or more roles (page 312) that determine the user's access to database resources and operations. Outside of role
assignments, the user has no access to the system.
MongoDB does not enable authorization by default. You can enable authorization using the --auth or
the --keyFile options, or if using a configuration file, with the security.authorization or the
security.keyFile settings.
MongoDB provides built-in roles (page 390), each with a dedicated purpose for a common use case. Examples include
the read (page 390), readWrite (page 390), dbAdmin (page 391), and root (page 397) roles.
Administrators can also create new roles and privileges to meet operational needs. Administrators can assign
privileges scoped as granularly as the collection level.
When granted a role, a user receives all the privileges of that role. A user can have several roles concurrently, in which
case the user receives the union of all the privileges of the respective roles.
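The union rule can be illustrated with plain objects in JavaScript (Node.js). This is a minimal sketch. Roles are represented as hypothetical objects holding privileges of the form `{ resource, actions }`, and the role names and contents below are made-up examples, not definitions of built-in roles.

```javascript
// Effective privileges are the union of the privileges of all granted roles.
// Role contents here are hypothetical, not MongoDB's built-in definitions.
function resourceKey({ db, collection }) {
  return JSON.stringify([db, collection]);
}

function effectivePrivileges(roles) {
  const byResource = new Map(); // resource key -> { resource, actions: Set }
  for (const role of roles) {
    for (const { resource, actions } of role.privileges) {
      const k = resourceKey(resource);
      if (!byResource.has(k)) byResource.set(k, { resource, actions: new Set() });
      for (const a of actions) byResource.get(k).actions.add(a);
    }
  }
  return byResource;
}

const reader = { name: "inventoryReader", privileges: [
  { resource: { db: "products", collection: "inventory" }, actions: ["find"] },
] };
const editor = { name: "inventoryEditor", privileges: [
  { resource: { db: "products", collection: "inventory" }, actions: ["find", "update"] },
] };

const merged = effectivePrivileges([reader, editor]);
const inv = merged.get(resourceKey({ db: "products", collection: "inventory" }));
console.log([...inv.actions]); // [ 'find', 'update' ]
```

Because sets only ever grow as roles are added, a role can add access but never take it away. This matches the statement that roles always grant privileges.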
Roles
A role consists of privileges that pair resources with allowed operations. Each privilege is defined directly in the role
or inherited from another role.
A role's privileges apply to the database where the role is created. A role created on the admin database can include
privileges that apply to all databases or to the cluster (page 403).
A user assigned a role receives all the privileges of that role. The user can have multiple roles and can have different
roles on different databases.
Roles always grant privileges and never limit access. For example, if a user has both read (page 390) and
readWriteAnyDatabase (page 396) roles on a database, the greater access prevails.
Privileges
A privilege consists of a specified resource and the actions permitted on the resource.
A privilege resource (page 402) is either a database, collection, set of collections, or the cluster. If the resource is the cluster, the
affiliated actions affect the state of the system rather than a specific database or collection.
An action (page 403) is a command or method the user is allowed to perform on the resource. A resource can have
multiple allowed actions. For available actions see Privilege Actions (page 403).
For example, a privilege that includes the update (page 404) action allows a user to modify existing documents on
the resource. To additionally grant the user permission to create documents on the resource, the administrator would
add the insert (page 404) action to the privilege.
For privilege syntax, see admin.system.roles.privileges (page 398).
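Checking whether a privilege permits an action amounts to matching the resource and looking the action up in its list. This is a minimal plain JavaScript (Node.js) sketch with a hypothetical `isAllowed` helper and a made-up `sales.orders` resource. Only exact database/collection matches are modeled.

```javascript
// Hypothetical check: does any privilege grant `action` on `resource`?
// Only exact db/collection matches are modeled here.
function isAllowed(privileges, resource, action) {
  return privileges.some(
    (p) =>
      p.resource.db === resource.db &&
      p.resource.collection === resource.collection &&
      p.actions.includes(action)
  );
}

const orders = { db: "sales", collection: "orders" };
const privileges = [{ resource: orders, actions: ["update"] }];

console.log(isAllowed(privileges, orders, "update")); // true: may modify existing documents
console.log(isAllowed(privileges, orders, "insert")); // false: may not create documents

// To also allow creating documents, add the insert action to the privilege.
privileges[0].actions.push("insert");
console.log(isAllowed(privileges, orders, "insert")); // true
```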
Inherited Privileges
A role can include one or more existing roles in its definition, in which case the role inherits all the privileges of the
included roles.
A role can inherit privileges from other roles in its database. A role created on the admin database can inherit
privileges from roles in any database.
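Resolving inherited privileges is a graph traversal: a role's privileges are its own plus those of every role it includes, transitively. This is a plain JavaScript (Node.js) sketch under stated assumptions. The role catalog, the role names, and the flat `"<db>.<collection>:<action>"` privilege strings are hypothetical simplifications. The database check mirrors the rule above that only roles created on admin may inherit from roles in other databases.

```javascript
// Hypothetical role catalog keyed by "<db>:<role>". Privileges are simplified
// to strings of the form "<db>.<collection>:<action>".
const catalog = new Map([
  ["products:viewer", { db: "products", privileges: ["products.inventory:find"], roles: [] }],
  ["products:editor", {
    db: "products",
    privileges: ["products.inventory:update"],
    roles: [{ db: "products", role: "viewer" }],
  }],
  ["admin:opsLead", {
    db: "admin",
    privileges: [],
    roles: [{ db: "products", role: "editor" }],
  }],
]);

// A role's privileges: its own plus those of every included role, transitively.
function resolvePrivileges(db, role, seen = new Set()) {
  const id = `${db}:${role}`;
  if (seen.has(id)) return new Set(); // already visited (also guards against cycles)
  seen.add(id);
  const def = catalog.get(id);
  if (!def) throw new Error(`unknown role ${id}`);
  const result = new Set(def.privileges);
  for (const parent of def.roles) {
    // Only roles created on admin may include roles from other databases.
    if (def.db !== "admin" && parent.db !== def.db) {
      throw new Error(`${id} cannot inherit from ${parent.db}:${parent.role}`);
    }
    for (const p of resolvePrivileges(parent.db, parent.role, seen)) result.add(p);
  }
  return result;
}

// opsLead inherits update (from editor) and find (from viewer) on products.inventory.
console.log([...resolvePrivileges("admin", "opsLead")].sort());
```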
User-Defined Roles
By creating a role with privileges (page 312) that are scoped to a specific collection in a particular database, administrators can implement collection-level access control.
See Collection-Level Access Control (page 314) for more information.
Users
MongoDB stores user credentials in the protected admin.system.users (page 287) collection. Use the user management
methods to view and edit user credentials.
Role Assignment to Users
User administrators create the users that access the system's databases. MongoDB's user management commands let
administrators create users and assign them roles.
MongoDB scopes a user to the database in which the user is created. MongoDB stores all user definitions in the admin
database, no matter which database the user is scoped to. MongoDB stores users in the admin database's system.users
collection (page 400). Do not access this collection directly but instead use the user management commands.
The first role assigned in a database should be either userAdmin (page 392) or userAdminAnyDatabase
(page 396). This user can then create all other users in the system. See Create a User Administrator (page 370).
Protect the User and Role Collections
MongoDB stores role and user data in the protected admin.system.roles (page 287) and
admin.system.users (page 287) collections, which are only accessible using the user management methods.
If you disable access control, do not modify the admin.system.roles (page 287) and admin.system.users
(page 287) collections using normal insert() and update() operations.
Additional Information
See the reference section for documentation of all built-in roles (page 390) and all available privilege actions
(page 403). Also consider the reference for the form of the resource documents (page 402).
To create users see the Create a User Administrator (page 370) and Add a User to a Database (page 372) tutorials.
privileges: [
{ resource: { db: "products", collection: "inventory" }, actions: [ "find", "update", "insert" ] },
{ resource: { db: "products", collection: "orders" }, actions: [ "find" ] }
]
The first privilege scopes its actions to the inventory collection of the products database. The second privilege
scopes its actions to the orders collection of the products database.
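Read per collection, the two privileges above grant different access to each namespace, and nothing to any other collection. This is a minimal plain JavaScript (Node.js) sketch that tabulates the same privileges array. The `actionsByCollection` helper is hypothetical.

```javascript
// The privileges array from the example role above.
const privileges = [
  { resource: { db: "products", collection: "inventory" }, actions: ["find", "update", "insert"] },
  { resource: { db: "products", collection: "orders" }, actions: ["find"] },
];

// Hypothetical helper: list the actions granted per "db.collection".
function actionsByCollection(privs) {
  const summary = {};
  for (const { resource, actions } of privs) {
    const ns = `${resource.db}.${resource.collection}`;
    summary[ns] = (summary[ns] || []).concat(actions);
  }
  return summary;
}

const granted = actionsByCollection(privileges);
console.log(granted["products.inventory"]); // [ 'find', 'update', 'insert' ]
console.log(granted["products.orders"]);    // [ 'find' ]
console.log(granted["products.customers"]); // undefined: no access from this role
```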
Additional Information
For more information on user-defined roles and MongoDB authorization model, see Authorization (page 312). For a
tutorial on creating user-defined roles, see Create a Role (page 374).
The enabled setting for mongod and mongos instances controls whether the home status page is available.
Changed in version 2.6: The mongod and mongos instances run with the http interface disabled by default.
The status interface is read-only by default, and the default port for the status page is 28017. Authentication does not
control or affect access to this interface.
Warning: Disable this interface for production deployments. If you enable this interface, you should only allow
trusted clients to access this port. See Firewalls (page 315).
rest
The net.http.RESTInterfaceEnabled setting for mongod enables a fully interactive administrative REST
interface, which is disabled by default. The net.http.RESTInterfaceEnabled configuration makes the http
status interface, which is read-only by default, fully interactive. Use the net.http.RESTInterfaceEnabled
setting with the enabled setting.
The REST interface does not support any authentication and you should always restrict access to this interface to only
allow trusted clients to connect to this port.
You may also enable this interface on the command line as mongod --rest --httpinterface.
Warning: Disable this option for production deployments. If you leave this interface enabled, you should only
allow trusted clients to access this port.
bind_ip
The bindIp setting for mongod and mongos instances limits the network interfaces on which MongoDB programs
will listen for incoming connections. You can also specify multiple interfaces by passing bindIp a comma-separated
list of IP addresses. You can use the mongod --bind_ip and mongos --bind_ip options on the
command line at run time to limit the network accessibility of a MongoDB program.
Important: Make sure that your mongod and mongos instances are only accessible on trusted networks. If your
system has more than one network interface, bind MongoDB programs to the private or internal network interface.
port
The port setting for mongod and mongos instances changes the main port on which the mongod or mongos
instance listens for connections. The default port is 27017. Changing the port does not meaningfully reduce risk or
limit exposure. You may also specify this option on the command line as mongod --port or mongos --port.
Setting port also indirectly sets the port for the HTTP status interface, which, when enabled, listens on the port numbered
1000 greater than the primary mongod port.
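The port arithmetic reduces to a one-line calculation. This is a hypothetical plain JavaScript (Node.js) helper. The range check is an addition of this sketch, based on the 65535 upper limit of TCP port numbers, and does not describe documented mongod behavior.

```javascript
// The HTTP status interface listens 1000 above the main port.
function httpStatusPort(mainPort = 27017) {
  const port = mainPort + 1000;
  // Added by this sketch: TCP port numbers stop at 65535.
  if (port > 65535) throw new RangeError(`status port ${port} is out of range`);
  return port;
}

console.log(httpStatusPort());      // 28017, the default status page port
console.log(httpStatusPort(27018)); // 28018
```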
Only allow trusted clients to connect to the port for the mongod and mongos instances. See Firewalls (page 315).
See also Security Considerations (page 193) and Default MongoDB Port (page 408).
Firewalls
Firewalls allow administrators to filter and control access to a system by providing granular control over which network
communications are permitted. For administrators of MongoDB, the following capabilities are important: limiting incoming traffic
on a specific port to specific systems, and limiting incoming traffic from untrusted hosts.
On Linux systems, the iptables interface provides access to the underlying netfilter firewall. On Windows
systems, the netsh command line interface provides access to the underlying Windows Firewall. For additional information about firewall configuration, see Configure Linux iptables Firewall for MongoDB (page 324) and Configure
Windows netsh Firewall for MongoDB (page 327).
For best results and to minimize overall exposure, ensure that only traffic from trusted sources can reach mongod and
mongos instances and that the mongod and mongos instances can only connect to trusted outputs.
See also:
For MongoDB deployments on Amazon Web Services, see the Amazon EC211 page, which addresses Amazon's
Security Groups and other EC2-specific security features.
Virtual Private Networks
Virtual private networks, or VPNs, make it possible to link two networks over an encrypted and limited-access trusted
network. Typically, MongoDB users who use VPNs use SSL rather than IPSEC VPNs for performance reasons.
Depending on configuration and implementation, VPNs provide for certificate validation and a choice of encryption
protocols, which requires a rigorous level of authentication and identification of all clients. Furthermore, because
VPNs provide a secure tunnel, by using a VPN connection to control access to your MongoDB instance, you can
prevent tampering and man-in-the-middle attacks.
The mongo program can evaluate JavaScript expressions using the command line --eval option. Also, the mongo
program can evaluate a JavaScript file (.js) passed directly to it (e.g. mongo someFile.js).
Because the mongo program evaluates the JavaScript directly, inputs should only come from trusted sources.
.mongorc.js File
If a .mongorc.js file exists 12 , the mongo shell will evaluate a .mongorc.js file before starting. You can disable
this behavior by passing the mongo --norc option.
11 http://docs.mongodb.org/ecosystem/platforms/amazon-ec2
12 On Linux and Unix systems, mongo reads the .mongorc.js file from $HOME/.mongorc.js (i.e. ~/.mongorc.js). On Windows,
mongo.exe reads the .mongorc.js file from %HOME%\.mongorc.js or %HOMEDRIVE%%HOMEPATH%\.mongorc.js.
REST API
The REST API to MongoDB provides additional information and write access on top of the HTTP status interface.
While the REST API does not provide any support for insert, update, or remove operations, it does provide administrative access, and its accessibility represents a vulnerability in a secure environment. The REST interface is disabled
by default, and is not recommended for production use.
If you must use the REST API, please control and limit access to the REST API. The REST API does not include any
support for authentication, even when running with authorization enabled.
See the following documents for instructions on restricting access to the REST API interface:
Configure Linux iptables Firewall for MongoDB (page 324)
Configure Windows netsh Firewall for MongoDB (page 327)
6.2.6 Auditing
New in version 2.6.
MongoDB Enterprise includes an auditing capability for mongod and mongos instances. The auditing facility allows
administrators and users to track system activity for deployments with multiple users and applications. The auditing
facility can write audit events to the console, the syslog, a JSON file, or a BSON file.
Audit Events and Filter
To enable auditing for MongoDB Enterprise, see Configure System Events Auditing (page 384).
Once enabled, the auditing system can record the following operations:
schema (DDL),
replica set,
In a Kerberos-based system, every participant in the authenticated communication is known as a principal, and every
principal must have a unique name.
Principals belong to administrative units called realms. For each realm, the Kerberos Key Distribution Center (KDC)
maintains a database of the realm's principals and the principals' associated secret keys.
For a client-server authentication, the client requests from the KDC a ticket for access to a specific asset. The KDC
uses the client's secret and the server's secret to construct the ticket, which allows the client and server to
authenticate each other while keeping the secrets hidden.
For the configuration of MongoDB for Kerberos support, two kinds of principal names are of interest: user principals
(page 319) and service principals (page 319).
13 Audit configuration can include a filter (page 385) to limit events to audit.
User Principal To authenticate using Kerberos, you must add the Kerberos user principals to MongoDB in the
$external database. User principal names have the form:
<username>@<KERBEROS REALM>
For every user you want to authenticate using Kerberos, you must create a corresponding user in MongoDB in the
$external database.
For examples of adding a user to MongoDB as well as authenticating as that user, see Configure MongoDB with
Kerberos Authentication on Linux (page 359) and Configure MongoDB with Kerberos Authentication on Windows
(page 362).
See also:
User and Role Management Tutorials (page 370) for general information regarding creating and managing users in
MongoDB.
Service Principal Every MongoDB mongod and mongos instance (or mongod.exe or mongos.exe on Windows) must have an associated service principal. Service principal names have the form:
<service>/<fully qualified domain name>@<KERBEROS REALM>
For MongoDB, the <service> defaults to mongodb. For example, if m1.example.com is a MongoDB server,
and example.com maintains the EXAMPLE.COM Kerberos realm, then m1 should have the service principal name
mongodb/m1.example.com@EXAMPLE.COM.
To specify a different value for <service>, use serviceName during the start up of mongod or mongos (or
mongod.exe or mongos.exe). The mongo shell or other clients may also specify a different service principal name
using serviceName.
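The two principal-name forms above can be assembled mechanically. This is a plain JavaScript (Node.js) sketch with hypothetical helper names. It only formats strings and does not contact a KDC. The alternative service name `mongodbsvc` and the user `reportingapp` are made-up examples.

```javascript
// Hypothetical helpers that format Kerberos principal names; no KDC is contacted.
function userPrincipal(username, realm) {
  return `${username}@${realm}`;
}

// <service>/<fully qualified domain name>@<KERBEROS REALM>, where <service>
// defaults to "mongodb" unless a different serviceName is configured.
function servicePrincipal(fqdn, realm, service = "mongodb") {
  return `${service}/${fqdn}@${realm}`;
}

console.log(servicePrincipal("m1.example.com", "EXAMPLE.COM"));               // mongodb/m1.example.com@EXAMPLE.COM
console.log(servicePrincipal("m1.example.com", "EXAMPLE.COM", "mongodbsvc")); // mongodbsvc/m1.example.com@EXAMPLE.COM
console.log(userPrincipal("reportingapp", "EXAMPLE.COM"));                    // reportingapp@EXAMPLE.COM
```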
Each MongoDB host must be reachable over the network using the fully qualified domain name (FQDN) part of its
service principal name.
By default, Kerberos attempts to identify hosts using the /etc/krb5.conf file before using DNS to resolve hosts.
On Windows, if running MongoDB as a service, see Assign Service Principal Name to MongoDB Windows Service
(page 364).
Linux Keytab Files
Linux systems can store Kerberos authentication keys for a service principal (page 319) in keytab files. Each Kerberized mongod and mongos instance running on Linux must have access to a keytab file containing keys for its service
principal (page 319).
To keep keytab files secure, use file permissions that restrict access to only the user that runs the mongod or mongos
process.
Tickets
On Linux, MongoDB clients can use Kerberos's kinit program to initialize a credential cache for authenticating the
user principal to servers.
Unlike on Linux systems, mongod and mongos instances running on Windows do not require access to keytab
files. Instead, the mongod and mongos instances read their server credentials from a credential store specific to the
operating system.
However, from the Windows Active Directory, you can export a keytab file for use on Linux systems. See Ktpass14
for more information.
Authenticate With Kerberos
To configure MongoDB for Kerberos support and authenticate, see Configure MongoDB with Kerberos Authentication
on Linux (page 359) and Configure MongoDB with Kerberos Authentication on Windows (page 362).
Operational Considerations
The HTTP Console
The MongoDB HTTP Console15 interface does not support Kerberos authentication.
DNS
Each host that runs a mongod or mongos instance must have both A and PTR DNS records to provide forward and
reverse lookup.
Without A and PTR DNS records, the host cannot resolve the components of the Kerberos domain or the Key Distribution Center (KDC).
System Time Synchronization
To successfully authenticate, the system time for each mongod and mongos instance must be within 5 minutes of the
system time of the other hosts in the Kerberos infrastructure.
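The 5-minute requirement is a bound on the difference between two clocks. This is a plain JavaScript (Node.js) sketch with a hypothetical helper and made-up timestamps. Treating exactly 5 minutes as acceptable is an assumption of this sketch.

```javascript
// Maximum clock difference tolerated between Kerberos participants, per the text above.
const MAX_SKEW_MS = 5 * 60 * 1000;

// Hypothetical helper: are two hosts' clocks close enough to authenticate?
// The boundary (exactly 5 minutes) is treated as acceptable here.
function withinKerberosSkew(hostTime, otherTime) {
  return Math.abs(hostTime.getTime() - otherTime.getTime()) <= MAX_SKEW_MS;
}

const kdc = new Date("2015-03-01T12:00:00Z");
console.log(withinKerberosSkew(new Date("2015-03-01T12:04:59Z"), kdc)); // true
console.log(withinKerberosSkew(new Date("2015-03-01T12:05:01Z"), kdc)); // false
```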
Kerberized MongoDB Environments
Driver Support
Although MongoDB supports the use of Kerberos authentication with other authentication mechanisms, only add
the other mechanisms as necessary. See the Incorporate Additional Authentication Mechanisms
section in Configure MongoDB with Kerberos Authentication on Linux (page 359) and Configure MongoDB with
Kerberos Authentication on Windows (page 362) for details.
Additional Resources
MongoDB LDAP and Kerberos Authentication with Dell (Quest) Authentication Services20
MongoDB with Red Hat Enterprise Linux Identity Management and Kerberos21
Enable Authentication after Creating the User Administrator (page 347) Describes an alternative process for
enabling authentication for MongoDB deployments.
User and Role Management Tutorials (page 370) MongoDB's access control system provides flexible role-based
access control that you can use to limit access to MongoDB deployments. The tutorials in this section
describe the configuration and setup of the authorization system.
Add a User to a Database (page 372) Create non-administrator users using MongoDB's role-based authentication system.
Create a Role (page 374) Create a custom role.
Modify a User's Access (page 379) Modify the actions available to a user on specific database resources.
View Roles (page 380) View a role's privileges.
Continue reading from User and Role Management Tutorials (page 370) for additional tutorials on managing
users and privileges in MongoDB's authorization system.
Configure System Events Auditing (page 384) Enable and configure the MongoDB Enterprise system event auditing feature.
Create a Vulnerability Report (page 387) Report a vulnerability in MongoDB.
Rules in iptables configurations fall into chains, which describe the process for filtering and processing specific
streams of traffic. Chains have an order, and packets must pass through earlier rules in a chain to reach later rules.
This document addresses only the following two chains:
INPUT Controls all incoming traffic.
OUTPUT Controls all outgoing traffic.
Given the default ports (page 314) of all MongoDB processes, you must configure networking rules that permit only
required communication between your application and the appropriate mongod and mongos instances.
Be aware that, by default, iptables allows all connections and traffic unless explicitly
disabled. The configuration changes outlined in this document will create rules that explicitly allow traffic from
specific addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed. When
you have properly configured your iptables rules to allow only the traffic that you want to permit, you can Change
Default Policy to DROP (page 327).
Patterns
This section contains a number of patterns and examples for configuring iptables for use with MongoDB deployments. If you have configured different ports using the port configuration setting, you will need to modify the rules
accordingly.
Traffic to and from mongod Instances This pattern is applicable to all mongod instances running as standalone
instances or as part of a replica set.
The goal of this pattern is to explicitly allow traffic to the mongod instance from the application server. In the
following examples, replace <ip-address> with the IP address of the application server:
The first rule allows all incoming traffic from <ip-address> on port 27017, which allows the application server to
connect to the mongod instance. The second rule allows outgoing traffic from the mongod to reach the application
server.
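As a sketch, the following Python builds the two rules described above as iptables argument lists. The helper name mongod_rules is hypothetical. -A, -s, -d, -p tcp, --destination-port, --source-port, -m state --state and -j ACCEPT are standard iptables options. The exact connection-state lists used here are an assumption of this sketch.

```python
import shlex
from typing import List

def mongod_rules(ip_address: str, port: int = 27017) -> List[List[str]]:
    """Return the INPUT and OUTPUT rules that let ip_address talk to mongod on port."""
    inbound = [
        "iptables", "-A", "INPUT",
        "-s", ip_address,                    # only the application server may connect
        "-p", "tcp", "--destination-port", str(port),
        "-m", "state", "--state", "NEW,ESTABLISHED",
        "-j", "ACCEPT",
    ]
    outbound = [
        "iptables", "-A", "OUTPUT",
        "-d", ip_address,                    # replies from mongod to the application server
        "-p", "tcp", "--source-port", str(port),
        "-m", "state", "--state", "ESTABLISHED",
        "-j", "ACCEPT",
    ]
    return [inbound, outbound]

for rule in mongod_rules("198.51.100.55"):
    print(shlex.join(rule))
```

Printing the lists with shlex.join produces command lines you can review before running them as root.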
Optional
If you have only one application server, you can replace <ip-address> with either the IP address itself, such as
198.51.100.55, or its equivalent in CIDR notation, 198.51.100.55/32. If you want to permit
a larger block of possible IP addresses, you can allow traffic from a /24 using either of the following specifications for
<ip-address>:
10.10.10.10/24
10.10.10.10/255.255.255.0
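The following sketch uses Python's standard ipaddress module to show that the two /24 spellings above describe the same block of addresses. strict=False is needed because 10.10.10.10 has host bits set. The module then normalizes both forms to 10.10.10.0/24.

```python
import ipaddress

# Both spellings from the text; strict=False tolerates the host bits in .10.
by_prefix = ipaddress.ip_network("10.10.10.10/24", strict=False)
by_netmask = ipaddress.ip_network("10.10.10.10/255.255.255.0", strict=False)
single_host = ipaddress.ip_network("198.51.100.55/32")

print(by_prefix)                                          # 10.10.10.0/24
print(by_prefix == by_netmask)                            # True
print(by_prefix.num_addresses)                            # 256
print(ipaddress.ip_address("10.10.10.200") in by_prefix)  # True
print(ipaddress.ip_address("10.10.11.1") in by_prefix)    # False
print(single_host.num_addresses)                          # 1
```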
Traffic to and from mongos Instances mongos instances provide query routing for sharded clusters. Clients
connect to mongos instances, which behave from the client's perspective as mongod instances. In turn, the mongos
connects to all mongod instances that are components of the sharded cluster.
Use the same iptables command to allow traffic to and from these instances as you would from the mongod
instances that are members of the replica set. Take the configuration outlined in the Traffic to and from mongod
Instances (page 325) section as an example.
Traffic to and from a MongoDB Config Server Config servers host the config database that stores metadata
for sharded clusters. Each production cluster has three config servers, initiated using the mongod --configsvr
option. 26 Config servers listen for connections on port 27019. As a result, add the following iptables rules to the
config server to allow incoming and outgoing connections on port 27019, for connection to the other config servers.
Replace <ip-address> with the address or address space of all the mongod instances that provide config servers.
Additionally, config servers need to allow incoming connections from all of the mongos instances in the cluster and
all mongod instances in the cluster. Add rules that resemble the following:
26
You also can run a config server by using the configsvr value for the clusterRole setting in a configuration file.
Replace <ip-address> with the address of the mongos instances and the shard mongod instances.
Traffic to and from a MongoDB Shard Server For shard servers running as mongod --shardsvr 27, the default
port number is 27018 when running with the shardsvr value for the clusterRole setting, so you must
configure the following iptables rules to allow traffic to and from each shard:
Replace the <ip-address> specification with the IP address of all mongod instances. This allows you to permit incoming
and outgoing traffic between all shards, including constituent replica set members, to:
all mongod instances in the shard's replica sets.
all mongod instances in other shards.
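If you have changed ports with the port setting, the rules in this section must use your ports instead of the defaults. The following small Python sketch performs that lookup. The table holds the default ports used in this section: 27017 for mongod and mongos, 27018 for shard servers, and 27019 for config servers. The helper name port_for is hypothetical.

```python
from typing import Optional

# Default ports used in this section; an explicit port setting overrides them.
DEFAULT_PORTS = {
    "mongod": 27017,     # standalone mongod or replica set member
    "mongos": 27017,     # query router
    "shardsvr": 27018,   # mongod --shardsvr (clusterRole: shardsvr)
    "configsvr": 27019,  # mongod --configsvr (clusterRole: configsvr)
}

def port_for(role: str, configured_port: Optional[int] = None) -> int:
    """Return the port a firewall rule should open for an instance of this role."""
    if configured_port is not None:
        return configured_port
    return DEFAULT_PORTS[role]

print(port_for("shardsvr"))          # 27018
print(port_for("configsvr", 37019))  # 37019
```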
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface.
For all deployments, you should restrict access to this port to only the monitoring instance.
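The rules for the HTTP interface open a different port from the instance's main port. The sketch below assumes the MongoDB 3.0 behavior, in which the HTTP interface listens on the port 1000 greater than the instance's port. That gives 28017 for a mongod on 27017, 28018 for a shard server, and 28019 for a config server. The helper names are hypothetical, and the iptables flags are the standard options used for the other INPUT rules.

```python
from typing import List

HTTP_PORT_OFFSET = 1000  # assumption: HTTP interface port = main port + 1000

def http_interface_port(main_port: int) -> int:
    """Port of the HTTP/REST interface for an instance listening on main_port."""
    return main_port + HTTP_PORT_OFFSET

def monitoring_rule(monitor_ip: str, main_port: int) -> List[str]:
    """iptables INPUT rule admitting only the monitoring host to the HTTP interface."""
    return ["iptables", "-A", "INPUT", "-s", monitor_ip,
            "-p", "tcp", "--destination-port", str(http_interface_port(main_port)),
            "-m", "state", "--state", "NEW,ESTABLISHED", "-j", "ACCEPT"]

for port in (27017, 27018, 27019):  # mongod, shard server, config server defaults
    print(port, "->", http_interface_port(port))
```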
Optional
For shard server mongod instances running with the shardsvr value for the clusterRole setting, the
rule would resemble the following:
For config server mongod instances running with the configsvr value for the clusterRole setting, the
rule would resemble the following:
The default policy for iptables chains is to allow all traffic. After completing all iptables configuration changes,
you must change the default policy to DROP so that all traffic that isn't explicitly allowed as above will not be able to
reach components of the MongoDB deployment. Issue the following commands to change this policy:
iptables -P INPUT DROP
iptables -P OUTPUT DROP
This section contains a number of basic operations for managing and using iptables. There are various front end
tools that automate some aspects of iptables configuration, but at the core all iptables front ends provide the
same basic functionality:
Make all iptables Rules Persistent By default all iptables rules are only stored in memory. When your
system restarts, your firewall rules will revert to their defaults. When you have tested a rule set and verified
that it effectively controls traffic, use the following operations to make the rule set persistent.
On Red Hat Enterprise Linux, Fedora Linux, and related distributions you can issue the following command:
service iptables save
On Debian, Ubuntu, and related distributions, you can use the following command to dump the iptables rules to
the /etc/iptables.conf file:
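iptables-save > /etc/iptables.conf
To restore the saved rules, run the following operation:
iptables-restore < /etc/iptables.conf
Place this restore command in your rc.local file, or in the /etc/network/if-up.d/iptables file with other
similar operations.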
List all iptables Rules To list all currently applied iptables rules, use the following operation at the system
shell.
iptables -L
Flush all iptables Rules If you make a configuration mistake when entering iptables rules or simply need to
revert to the default rule set, you can use the following operation at the system shell to flush all rules:
iptables -F
If you've already made your iptables rules persistent, you will need to repeat the appropriate procedure in the
Make all iptables Rules Persistent (page 327) section.
Configure Windows netsh Firewall for MongoDB
On Windows Server systems, the netsh program provides methods for managing the Windows Firewall. These
firewall rules make it possible for administrators to control what hosts can connect to the system, and limit risk
exposure by limiting the hosts that can connect to a system.
6.3. Security Tutorials
This document outlines basic Windows Firewall configurations. Use these approaches as a starting point for your
larger networking organization. For a detailed overview of security practices and risk management for MongoDB, see
Security Concepts (page 307).
See also:
Windows Firewall29 documentation from Microsoft.
Overview
Windows Firewall processes rules in an order determined by rule type, as follows:
1. Windows Service Hardening
2. Connection security rules
3. Authenticated Bypass Rules
4. Block Rules
5. Allow Rules
6. Default Rules
By default, the policy in Windows Firewall allows all outbound connections and blocks all incoming connections.
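The following deliberately simplified Python model illustrates this evaluation order. It covers only block rules, allow rules and the default behavior described above. Windows Service Hardening, connection security rules and authenticated bypass rules are left out. The Rule class and decide function are hypothetical and are not part of any Windows API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Simplified model: block rules are consulted before allow rules, then defaults apply.
PRECEDENCE = {"block": 0, "allow": 1}

@dataclass
class Rule:
    kind: str                        # "block" or "allow"
    direction: str                   # "in" or "out"
    matches: Callable[[Dict], bool]  # predicate over a connection description

def decide(conn: Dict, rules: List[Rule]) -> str:
    """Return "allow" or "block" for a connection such as {"direction": "in", "port": 27017}."""
    relevant = [r for r in rules if r.direction == conn["direction"]]
    # sorted() is stable, so rules of the same kind keep their original order.
    for rule in sorted(relevant, key=lambda r: PRECEDENCE[r.kind]):
        if rule.matches(conn):
            return rule.kind
    # Default rules: outbound connections allowed, inbound connections blocked.
    return "allow" if conn["direction"] == "out" else "block"
```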
Given the default ports (page 314) of all MongoDB processes, you must configure networking rules that permit only
required communication between your application and the appropriate mongod.exe and mongos.exe instances.
The configuration changes outlined in this document will create rules which explicitly allow traffic from specific
addresses and on specific ports, using a default policy that drops all traffic that is not explicitly allowed.
You can configure the Windows Firewall using either the netsh command line tool or a Windows application.
On Windows Server 2008 this application is Windows Firewall With Advanced Security in Administrative Tools. On
previous versions of Windows Server, access the Windows Firewall application in the System and Security control
panel.
The procedures in this document use the netsh command line tool.
Patterns
This section contains a number of patterns and examples for configuring Windows Firewall for use with MongoDB
deployments. If you have configured different ports using the port configuration setting, you will need to modify the
rules accordingly.
Traffic to and from mongod.exe Instances This pattern is applicable to all mongod.exe instances running as
standalone instances or as part of a replica set. The goal of this pattern is to explicitly allow traffic to the mongod.exe
instance from the application server.
netsh advfirewall firewall add rule name="Open mongod port 27017" dir=in action=allow protocol=TCP lo
This rule allows all incoming traffic to port 27017, which allows the application server to connect to the
mongod.exe instance.
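The following minimal Python sketch assembles a netsh allow rule in the same name=value form. name, dir, action, protocol, localport and remoteip are netsh advfirewall parameters. The helper name netsh_allow_rule is hypothetical. Adding remoteip restricts the rule to the application server, which matches the goal of this pattern.

```python
from typing import Optional

def netsh_allow_rule(name: str, port: int, direction: str = "in",
                     remote_ip: Optional[str] = None) -> str:
    """Return a netsh command line for a TCP allow rule on the given local port."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    parts = [
        "netsh advfirewall firewall add rule",
        f'name="{name}"',
        f"dir={direction}",
        "action=allow",
        "protocol=TCP",
        f"localport={port}",
    ]
    if remote_ip is not None:
        parts.append(f"remoteip={remote_ip}")  # limit the rule to one host or subnet
    return " ".join(parts)

print(netsh_allow_rule("Open mongod port 27017", 27017, remote_ip="198.51.100.55"))
```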
Windows Firewall also allows enabling network access for an entire application rather than to a specific port, as in the
following example:
29 http://technet.microsoft.com/en-us/network/bb545423.aspx
netsh advfirewall firewall add rule name="Allowing mongod" dir=in action=allow program=" C:\mongodb\b
You can allow all access for a mongos.exe server, with the following invocation:
netsh advfirewall firewall add rule name="Allowing mongos" dir=in action=allow program=" C:\mongodb\b
Traffic to and from mongos.exe Instances mongos.exe instances provide query routing for sharded clusters.
Clients connect to mongos.exe instances, which behave from the client's perspective as mongod.exe instances.
In turn, the mongos.exe connects to all mongod.exe instances that are components of the sharded cluster.
Use the same Windows Firewall command to allow traffic to and from these instances as you would from the
mongod.exe instances that are members of the replica set.
netsh advfirewall firewall add rule name="Open mongod shard port 27018" dir=in action=allow protocol=
Traffic to and from a MongoDB Config Server Configuration servers host the config database that stores metadata for sharded clusters. Each production cluster has three configuration servers, initiated using the mongod
--configsvr option. 30 Configuration servers listen for connections on port 27019. As a result, add the following Windows Firewall rules to the config server to allow incoming and outgoing connections on port 27019, for
connection to the other config servers.
netsh advfirewall firewall add rule name="Open mongod config svr port 27019" dir=in action=allow prot
Additionally, config servers need to allow incoming connections from all of the mongos.exe instances in the cluster
and all mongod.exe instances in the cluster. Add rules that resemble the following:
netsh advfirewall firewall add rule name="Open mongod config svr inbound" dir=in action=allow protoco
Replace <ip-address> with the addresses of the mongos.exe instances and the shard mongod.exe instances.
Traffic to and from a MongoDB Shard Server For shard servers running as mongod --shardsvr 31, the default
port number is 27018 when running with the shardsvr value for the clusterRole setting, so you must
configure the following Windows Firewall rules to allow traffic to and from each shard:
netsh advfirewall firewall add rule name="Open mongod shardsvr inbound" dir=in action=allow protocol=
netsh advfirewall firewall add rule name="Open mongod shardsvr outbound" dir=out action=allow protoco
Replace the <ip-address> specification with the IP address of all mongod.exe instances. This allows you to
permit incoming and outgoing traffic between all shards, including constituent replica set members, to:
all mongod.exe instances in the shard's replica sets.
all mongod.exe instances in other shards. 32
30 You also can run a config server by using the configsvr value for the clusterRole setting in a configuration file.
31 You can also specify the shard server option with the shardsvr value for the clusterRole setting in the configuration file. Shard members
are also often conventional replica sets using the default port.
32 All shards in a cluster need to be able to communicate with all other shards to facilitate chunk and balancing operations.
netsh advfirewall firewall add rule name="Open mongod config svr outbound" dir=out action=allow proto
netsh advfirewall firewall add rule name="Open mongod HTTP monitoring inbound" dir=in action=all
Replace <ip-address> with the address of the instance that needs access to the HTTP or REST interface.
For all deployments, you should restrict access to this port to only the monitoring instance.
Optional
For shard server mongod instances running with the shardsvr value for the clusterRole setting, the
rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongos HTTP monitoring inbound" dir=in action=all
For config server mongod instances running with the configsvr value for the clusterRole setting, the
rule would resemble the following:
netsh advfirewall firewall add rule name="Open mongod configsvr HTTP monitoring inbound" dir=in
This section contains a number of basic operations for managing and using netsh. While you can use the GUI front
ends to manage the Windows Firewall, all core functionality is accessible from netsh.
Delete all Windows Firewall Rules To delete the firewall rule allowing mongod.exe traffic:
netsh advfirewall firewall delete rule name="Open mongod port 27017" protocol=tcp localport=27017
netsh advfirewall firewall delete rule name="Open mongod shard port 27018" protocol=tcp localport=270
List All Windows Firewall Rules To return a list of all Windows Firewall rules:
netsh advfirewall firewall show rule name=all
Backup and Restore Windows Firewall Rules To simplify administration of a larger collection of systems, you can
easily export and import Windows Firewall rules between servers:
Export all firewall rules with the following command:
netsh advfirewall export "C:\temp\MongoDBfw.wfw"
Replace "C:\temp\MongoDBfw.wfw" with a path of your choosing. You can use a command in the following
form to import a file created using this operation:
netsh advfirewall import "C:\temp\MongoDBfw.wfw"
This document helps you to configure MongoDB to support SSL. MongoDB clients can use SSL to encrypt connections to mongod and mongos instances.
These instructions assume that you have already installed a build of MongoDB that includes SSL support and that your
client driver supports SSL. For instructions on upgrading a cluster currently not using SSL to using SSL, see Upgrade
a Cluster to Use SSL (page 338).
Changed in version 2.6: MongoDB's SSL encryption only allows use of strong SSL ciphers with a minimum of 128-bit
key length for all connections.
New in version 2.6: MongoDB Enterprise for Windows includes support for SSL.
Prerequisites
MongoDB Support New in version 3.0: Most MongoDB distributions now include support for SSL.
Certain distributions of MongoDB33 do not contain support for SSL. To use SSL, be sure to choose a package that
supports SSL. All MongoDB Enterprise34 supported platforms include SSL support.
Client Support See SSL Configuration for Clients (page 334) to learn about SSL support for Python, Java, Ruby,
and other clients.
Certificate Authorities For production use, your MongoDB deployment should use valid certificates generated and
signed by a single certificate authority. You or your organization can generate and maintain an independent certificate
authority, or use certificates generated by a third-party SSL vendor. Obtaining and managing certificates is beyond the
scope of this documentation.
.pem File Before you can use SSL, you must have a .pem file containing a public key certificate and its associated
private key.
MongoDB can use any valid SSL certificate issued by a certificate authority, or a self-signed certificate. If you use a
self-signed certificate, although the communications channel will be encrypted, there will be no validation of server
identity. Although such a situation will prevent eavesdropping on the connection, it leaves you vulnerable to a man-in-the-middle attack. Using a certificate signed by a trusted certificate authority will permit MongoDB drivers to verify
the server's identity.
In general, avoid using self-signed certificates unless the network is trusted.
Additionally, with regard to authentication among replica set/sharded cluster members (page 310), in order to minimize exposure of the private key and allow hostname validation, it is advisable to use different certificates on different
servers.
33 http://www.mongodb.org/downloads
34 http://www.mongodb.com/products/mongodb-enterprise
For testing purposes, you can generate a self-signed certificate and private key on a Unix system with a command that
resembles the following:
cd /etc/ssl/
openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.k
This operation generates a new, self-signed certificate with no passphrase that is valid for 365 days. Once you have
the certificate, concatenate the certificate and private key to a .pem file, as in the following example:
cat mongodb-cert.key mongodb-cert.crt > mongodb.pem
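The concatenation step can also be scripted. The following Python sketch performs the same operation as the cat command above, writing the key first and then the certificate. It also checks that each input looks like a PEM block. The function name build_pem is hypothetical, and the check does not validate the certificate itself.

```python
from pathlib import Path

def build_pem(key_path: str, cert_path: str, pem_path: str) -> None:
    """Write the private key followed by the certificate, like `cat key crt > pem`."""
    key = Path(key_path).read_text()
    cert = Path(cert_path).read_text()
    # Matches both "BEGIN PRIVATE KEY" and "BEGIN RSA PRIVATE KEY" headers.
    if "PRIVATE KEY-----" not in key:
        raise ValueError(f"{key_path} does not look like a PEM private key")
    if "-----BEGIN CERTIFICATE-----" not in cert:
        raise ValueError(f"{cert_path} does not look like a PEM certificate")
    if not key.endswith("\n"):
        key += "\n"  # keep the two PEM blocks on separate lines
    Path(pem_path).write_text(key + cert)
```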
See also:
Use x.509 Certificates to Authenticate Clients (page 348)
Procedures
Set Up mongod and mongos with SSL Certificate and Key To use SSL in your MongoDB deployment, include
the following run-time options with mongod and mongos:
net.ssl.mode set to requireSSL. This setting restricts each server to use only SSL encrypted connections.
You can also specify either the value allowSSL or preferSSL to set up the use of mixed SSL modes on a
port. See net.ssl.mode for details.
net.ssl.PEMKeyFile with the .pem file that contains the SSL certificate and key.
Consider the following syntax for mongod:
mongod --sslMode requireSSL --sslPEMKeyFile <pem>
For example, given an SSL certificate located at /etc/ssl/mongodb.pem, configure mongod to use SSL encryption for all connections with the following command:
mongod --sslMode requireSSL --sslPEMKeyFile /etc/ssl/mongodb.pem
Note:
Specify <pem> with the full path name to the certificate.
If the private key portion of the <pem> is encrypted, specify the passphrase. See SSL Certificate Passphrase
(page 334).
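For scripted deployments, you can assemble the run-time options above programmatically. The following Python sketch builds the mongod argument list. The helper name mongod_ssl_args is hypothetical. Its optional ca_file parameter corresponds to the --sslCAFile option used for certificate validation in the next procedure.

```python
from typing import List, Optional

# The net.ssl.mode values discussed in this section.
SSL_MODES = ("allowSSL", "preferSSL", "requireSSL")

def mongod_ssl_args(pem_key_file: str, mode: str = "requireSSL",
                    ca_file: Optional[str] = None) -> List[str]:
    """Return a mongod argument list using the SSL run-time options above."""
    if mode not in SSL_MODES:
        raise ValueError(f"unexpected sslMode {mode!r}")
    args = ["mongod", "--sslMode", mode, "--sslPEMKeyFile", pem_key_file]
    if ca_file is not None:
        args += ["--sslCAFile", ca_file]  # validate peer certificates against this CA
    return args

print(" ".join(mongod_ssl_args("/etc/ssl/mongodb.pem")))
```

The sketch only builds the list. Passing it to subprocess.run would start the server.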
You may also specify these options in the configuration file, as in the following example:
sslMode = requireSSL
sslPEMKeyFile = /etc/ssl/mongodb.pem
To connect to mongod and mongos instances using SSL, the mongo shell and MongoDB tools must include the
--ssl option. See SSL Configuration for Clients (page 334) for more information on connecting to mongod and
mongos running with SSL.
See also:
Upgrade a Cluster to Use SSL (page 338)
Set Up mongod and mongos with Certificate Validation To set up mongod or mongos for SSL encryption
using an SSL certificate signed by a certificate authority, include the following run-time options during startup:
net.ssl.mode set to requireSSL. This setting restricts each server to use only SSL encrypted connections.
You can also specify either the value allowSSL or preferSSL to set up the use of mixed SSL modes on a
port. See net.ssl.mode for details.
net.ssl.PEMKeyFile with the name of the .pem file that contains the signed SSL certificate and key.
net.ssl.CAFile with the name of the .pem file that contains the root certificate chain from the Certificate Authority.
Consider the following syntax for mongod:
mongod --sslMode requireSSL --sslPEMKeyFile <pem> --sslCAFile <ca>
For example, given a signed SSL certificate located at /etc/ssl/mongodb.pem and the certificate authority file
at /etc/ssl/ca.pem, you can configure mongod for SSL encryption as follows:
mongod --sslMode requireSSL --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Note:
Specify the <pem> file and the <ca> file with either the full path name or the relative path name.
If the <pem> is encrypted, specify the passphrase. See SSL Certificate Passphrase (page 334).
You may also specify these options in the configuration file, as in the following example:
sslMode = requireSSL
sslPEMKeyFile = /etc/ssl/mongodb.pem
sslCAFile = /etc/ssl/ca.pem
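The configuration-file example above uses the key = value format. MongoDB 2.6 and later also read YAML configuration files, in which these options appear under net.ssl (as in net.ssl.mode). The following Python sketch renders the equivalent YAML fragment. The helper name ssl_yaml_config is hypothetical, and the output is not checked against a YAML parser.

```python
from typing import Optional

def ssl_yaml_config(pem_key_file: str, ca_file: Optional[str] = None,
                    mode: str = "requireSSL") -> str:
    """Render the net.ssl settings above as a YAML configuration-file fragment."""
    lines = ["net:", "  ssl:", f"    mode: {mode}", f"    PEMKeyFile: {pem_key_file}"]
    if ca_file is not None:
        lines.append(f"    CAFile: {ca_file}")
    return "\n".join(lines) + "\n"

print(ssl_yaml_config("/etc/ssl/mongodb.pem", "/etc/ssl/ca.pem"), end="")
```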
To connect to mongod and mongos instances using SSL, the mongo tools must include both the --ssl and
--sslPEMKeyFile options. See SSL Configuration for Clients (page 334) for more information on connecting to
mongod and mongos running with SSL.
See also:
Upgrade a Cluster to Use SSL (page 338)
Block Revoked Certificates for Clients To prevent clients with revoked certificates from connecting, include the
sslCRLFile run-time option to specify a .pem file that contains revoked certificates.
For example, the following mongod with SSL configuration includes the sslCRLFile setting:
mongod --sslMode requireSSL --sslCRLFile /etc/ssl/ca-crl.pem --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem
Clients with revoked certificates in the /etc/ssl/ca-crl.pem will not be able to connect to this mongod instance.
Validate Only if a Client Presents a Certificate In most cases it is important to ensure that clients present valid
certificates. However, if you have clients that cannot present a client certificate, or are transitioning to using a certificate
authority you may only want to validate certificates from clients that present a certificate.
If you want to bypass validation for clients that don't present certificates, include the
allowConnectionsWithoutCertificates run-time option with mongod and mongos. If the client
does not present a certificate, no validation occurs. These connections, though not validated, are still encrypted using
SSL.
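For example, consider the following mongod with an SSL configuration that includes the
allowConnectionsWithoutCertificates setting:
mongod --sslMode requireSSL --sslAllowConnectionsWithoutCertificates --sslPEMKeyFile /etc/ssl/mongodb.pem --sslCAFile /etc/ssl/ca.pem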
Then, clients can connect either with the option --ssl and no certificate or with the option --ssl and a valid
certificate. See SSL Configuration for Clients (page 334) for more information on SSL connections for clients.
Note: If the client presents a certificate, the certificate must be a valid certificate.
All connections, including those that have not presented certificates, are encrypted using SSL.
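The acceptance rules above reduce to a small decision table. The following Python sketch models it, assuming the server has a CA file configured so that presented certificates are validated. accept_client is a hypothetical name. The model covers only acceptance; as stated above, every accepted connection is encrypted either way.

```python
def accept_client(presents_cert: bool, cert_valid: bool,
                  allow_without_certs: bool) -> bool:
    """Decide whether mongod accepts an SSL client, assuming a CA file is configured."""
    if presents_cert:
        # A presented certificate must be valid, whatever the setting.
        return cert_valid
    # No certificate: accepted only with allowConnectionsWithoutCertificates.
    return allow_without_certs
```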
SSL Certificate Passphrase The PEM files for PEMKeyFile and clusterFile may be encrypted. With encrypted PEM files, you must specify the passphrase at startup with a command-line or a configuration file option or
enter the passphrase when prompted.
Changed in version 2.6: In previous versions, you could only specify the passphrase with a command-line or a configuration file option.
To specify the passphrase in clear text on the command line or in a configuration file, use the PEMKeyPassword
and/or the clusterPassword option.
To have MongoDB prompt for the passphrase at the start of mongod or mongos and avoid specifying the passphrase
in clear text, omit the PEMKeyPassword and/or the clusterPassword option. MongoDB will prompt for each
passphrase as necessary.
Important: The passphrase prompt option is available if you run the MongoDB instance in the foreground with
a connected terminal. If you run mongod or mongos in a non-interactive session (e.g. without a terminal or as a
service on Windows), you cannot use the passphrase prompt option.
For SSL connections, you must use the mongo shell built with SSL support or distributed with MongoDB Enterprise.
To support SSL, mongo has the following settings:
--ssl
--sslPEMKeyFile with the name of the .pem file that contains the SSL certificate and key.
--sslCAFile with the name of the .pem file that contains the certificate from the Certificate Authority (CA).
Warning: If the mongo shell or any other tool that connects to mongos or mongod is run without
--sslCAFile, it will not attempt to validate server certificates. This results in vulnerability to expired
mongod and mongos certificates as well as to foreign processes posing as valid mongod or mongos
instances. Ensure that you always specify the CA file against which server certificates should be validated
in cases where intrusion is a possibility.
--sslPEMKeyPassword option if the client certificate-key file is encrypted.
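The warning applies to any TLS client, not only the mongo shell. The sketch below illustrates it with Python's standard ssl module, which is not a MongoDB tool. It contrasts a context that validates the server against a CA file with one that encrypts traffic but does not verify the server, like a client started without --sslCAFile. The helper name client_context is hypothetical. The unverified branch exists only to show the risk.

```python
import ssl
from typing import Optional

def client_context(ca_file: Optional[str]) -> ssl.SSLContext:
    """Client TLS context; without a CA file the server is not verified."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # defaults: CERT_REQUIRED, hostname check on
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)  # validate servers against this CA
    else:
        # Comparable to omitting --sslCAFile: encrypted, but any server is trusted.
        ctx.check_hostname = False  # must be disabled before relaxing verify_mode
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```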
Connect to MongoDB Instance with SSL Encryption To connect to a mongod or mongos instance that requires
only an SSL encryption mode (page 332), start the mongo shell with --ssl, as in the following:
mongo --ssl
Connect to MongoDB Instance that Requires Client Certificates To connect to a mongod or mongos that requires CA-signed client certificates (page 332), start the mongo shell with --ssl and the --sslPEMKeyFile
option to specify the signed certificate-key file, as in the following:
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
Connect to MongoDB Instance that Validates when Presented with a Certificate To connect to a mongod or
mongos instance that only requires valid certificates when the client presents a certificate (page 333), start mongo
shell either with the --ssl option and no certificate or with the --ssl option and a valid signed certificate.
For example, if mongod is running with weak certificate validation, both of the following mongo shell clients can
connect to that mongod:
mongo --ssl
mongo --ssl --sslPEMKeyFile /etc/ssl/client.pem
The MMS Monitoring agent will also have to connect via SSL in order to gather its statistics. Because the agent
already utilizes SSL for its communications to the MMS servers, this is just a matter of enabling SSL support in MMS
itself on a per host basis.
Use the Edit host button (i.e. the pencil) on the Hosts page in the MMS console to enable SSL.
Please see the MMS documentation36 for more information about MMS configuration.
PyMongo
Add the ssl=True parameter to a PyMongo MongoClient37 to create a MongoDB connection to an SSL MongoDB instance:
36 https://docs.mms.mongodb.com/
37 http://api.mongodb.org/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient
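PyMongo also accepts the same setting as an ssl=true option in a mongodb:// connection string, the form the .NET example below also uses. The following sketch builds such a URI with the standard library only. The helper name ssl_connection_uri and the host mongodb.example.net are hypothetical. Passing the resulting URI to pymongo.MongoClient requires PyMongo to be installed, which the sketch does not assume.

```python
from urllib.parse import urlencode

def ssl_connection_uri(host: str, port: int = 27017, **options: str) -> str:
    """Build a mongodb:// URI with ssl=true plus any extra query options."""
    query = urlencode({"ssl": "true", **options})
    return f"mongodb://{host}:{port}/?{query}"

uri = ssl_connection_uri("mongodb.example.net")
print(uri)  # mongodb://mongodb.example.net:27017/?ssl=true
# With PyMongo installed: client = pymongo.MongoClient(uri)
```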
Recent versions of the Ruby driver support connections to SSL servers. Install the latest version of the
driver with the following command:
gem install mongo
In the node-mongodb-native41 driver, use the following invocation to connect to a mongod or mongos instance via
SSL:
var db1 = new Db(MONGODB, new Server("127.0.0.1", 27017,
{ auto_reconnect: false, poolSize: 4, ssl: true } ));
As of release 1.6, the .NET driver supports SSL connections with mongod and mongos instances. To connect using
SSL, you must add an option to the connection string, specifying ssl=true as follows:
var connectionString = "mongodb://localhost/?ssl=true";
var server = MongoServer.Create(connectionString);
The .NET driver will validate the certificate against the local trusted certificate store, in addition to encrypting the connection to the server. This behavior may produce issues during testing if the server uses a self-signed certificate. If
you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the
.NET driver from validating the certificate, as follows:
var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false";
var server = MongoServer.Create(connectionString);
mongodump
mongoexport
mongofiles
mongoimport
mongooplog
mongorestore
mongostat
mongotop
To use SSL connections with these tools, use the same SSL options as the mongo shell. See mongo Shell SSL
Configuration (page 334).
Upgrade a Cluster to Use SSL
Note: The default distribution of MongoDB44 does not contain support for SSL. To use SSL you can either compile
MongoDB with SSL support or use MongoDB Enterprise. See Configure mongod and mongos for SSL (page 331) for
more information about SSL and MongoDB.
Changed in version 2.6.
The MongoDB server supports listening for both SSL encrypted and unencrypted connections on the same TCP port.
This allows upgrades of MongoDB clusters to use SSL encrypted connections. To upgrade from a MongoDB cluster
using no SSL encryption to one using only SSL encryption, use the following rolling upgrade process:
1. For each node of a cluster, start the node with the option --sslMode set to allowSSL. The --sslMode
allowSSL setting allows the node to accept both SSL and non-SSL incoming connections. Its connections to
other servers do not use SSL. Include other SSL options (page 331) as well as any other options that are required
for your specific configuration. For example:
mongod --replSet <name> --sslMode allowSSL --sslPEMKeyFile <path to SSL Certificate and key PEM
2. Switch all clients to use SSL. See SSL Configuration for Clients (page 334).
3. For each node of a cluster, use the setParameter command to update the sslMode to preferSSL. 45
With preferSSL as its net.ssl.mode, the node accepts both SSL and non-SSL incoming connections,
and its connections to other servers use SSL. For example:
db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "preferSSL" } )
As an alternative to using the setParameter command, you can also restart the nodes with the appropriate SSL options and values.
4. For each node of the cluster, use the setParameter command to update the sslMode to requireSSL. 1
With requireSSL as its net.ssl.mode, the node will reject any non-SSL connections. For example:
db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "requireSSL" } )
5. After the upgrade of all nodes, edit the configuration file with the appropriate SSL settings to ensure
that upon subsequent restarts, the cluster uses SSL.
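The order of these steps matters because each mode changes both what a node accepts and what it sends. The following Python sketch encodes the behavior described in steps 1, 3 and 4, plus the fact that requireSSL nodes also connect out over SSL. It checks that nodes in adjacent stages of the rolling upgrade can still reach each other, while a node in allowSSL cannot reach one in requireSSL. MODES and can_connect are hypothetical names, and the model considers only whether a connection is accepted.

```python
# (accepts SSL, accepts non-SSL, outgoing connections use SSL) for each net.ssl.mode
MODES = {
    "allowSSL":   (True, True, False),  # step 1
    "preferSSL":  (True, True, True),   # step 3
    "requireSSL": (True, False, True),  # step 4
}
UPGRADE_ORDER = ["allowSSL", "preferSSL", "requireSSL"]

def can_connect(sender_mode: str, receiver_mode: str) -> bool:
    """True if a node in sender_mode can open a connection to a node in receiver_mode."""
    sender_uses_ssl = MODES[sender_mode][2]
    accepts_ssl, accepts_plain, _ = MODES[receiver_mode]
    return accepts_ssl if sender_uses_ssl else accepts_plain

# During each rolling step, nodes are in one of two adjacent modes.
for older, newer in zip(UPGRADE_ORDER, UPGRADE_ORDER[1:]):
    print(older, "<->", newer, can_connect(older, newer) and can_connect(newer, older))
# Skipping preferSSL would leave allowSSL nodes unable to reach requireSSL nodes.
print("allowSSL -> requireSSL", can_connect("allowSSL", "requireSSL"))
```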
Configure MongoDB for FIPS
New in version 2.6.
Overview
The Federal Information Processing Standard (FIPS) is a U.S. government computer security standard used to certify
software modules and libraries that encrypt and decrypt data securely. You can configure MongoDB to run with a
FIPS 140-2 certified library for OpenSSL. Configure FIPS to run by default or as needed from the command line.
Prerequisites
Only the MongoDB Enterprise46 version supports FIPS mode. See Install MongoDB Enterprise (page 29) to download
and install MongoDB Enterprise47 to use FIPS mode.
Your system must have an OpenSSL library configured with the FIPS 140-2 module. At the command line, type
openssl version to confirm your OpenSSL software includes FIPS support.
For Red Hat Enterprise Linux 6.x (RHEL 6.x) or its derivatives such as CentOS 6.x, the OpenSSL toolkit must be
at least openssl-1.0.1e-16.el6_5 to use FIPS mode. To upgrade the toolkit for these platforms, issue the
following command:
sudo yum update openssl
Some versions of Linux periodically execute a process to prelink dynamic libraries with pre-assigned addresses. This
process modifies the OpenSSL libraries, specifically libcrypto. The OpenSSL FIPS mode will subsequently fail
the signature check performed upon startup to ensure libcrypto has not been modified since compilation.
To configure the Linux prelink process to not prelink libcrypto:
sudo bash -c "echo '-b /usr/lib64/libcrypto.so.*' >>/etc/prelink.conf.d/openssl-prelink.conf"
Considerations
FIPS is a property of the encryption system and not the access control system. However, if your environment requires FIPS-compliant encryption and access control, you must ensure that the access control system uses only FIPS-compliant encryption.
MongoDB's FIPS support covers the way that MongoDB uses OpenSSL for network encryption and X509 authentication. If you use Kerberos or LDAP Proxy authentication, you must ensure that these external mechanisms are
FIPS-compliant. MONGODB-CR authentication is not FIPS compliant.
46 http://www.mongodb.com/products/mongodb-enterprise
47 http://www.mongodb.com/products/mongodb-enterprise
Procedure
See Configure mongod and mongos for SSL (page 331) for configuration details.
Run mongod or mongos instance in FIPS mode. Perform these steps after you Configure mongod and mongos
for SSL (page 331).
Step 1: Change configuration file. To configure your mongod or mongos instance to use FIPS mode, shut down
the instance and update the configuration file with the following setting:
net:
   ssl:
      FIPSMode: true
Step 2: Start mongod or mongos instance with configuration file. For example, run this command to start the
mongod instance with its configuration file:
mongod --config /etc/mongodb.conf
Confirm FIPS mode is running. Check the server log file for a message that FIPS is active:
FIPS 140-2 mode activated
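If you want to automate this check, one option is a small script that scans the log text for the message shown above. The Node.js sketch below does that. The function names are hypothetical, and the log path you pass in is whatever your deployment configures; none is assumed here.

```javascript
const fs = require("fs");

// Message that mongod or mongos logs when FIPS mode is active.
const FIPS_MESSAGE = "FIPS 140-2 mode activated";

// Returns true if the given log text contains the FIPS message.
function fipsActive(logText) {
  return logText.includes(FIPS_MESSAGE);
}

// Reads a log file from disk and checks it (path supplied by the caller).
function fipsActiveInFile(logPath) {
  return fipsActive(fs.readFileSync(logPath, "utf8"));
}

// Example with inline text instead of a real log file:
console.log(fipsActive("...startup...\n... FIPS 140-2 mode activated\n")); // true
```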
With authentication (page 308) enabled, MongoDB forces all clients to identify themselves before granting access to
the server. Authorization (page 312), in turn, allows administrators to define and limit the resources and operations
that a user can access. Using authentication and authorization is a key part of a complete security strategy.
All MongoDB deployments support authentication. By default, MongoDB does not require authorization checking.
You can enforce authorization checking when deploying MongoDB, or on an existing deployment; however, you
cannot enable authorization checking on a running deployment without downtime.
This tutorial provides a procedure for creating a MongoDB replica set (page 541) that uses the challenge-response authentication mechanism. The tutorial includes creation of a minimal authorization system to support basic operations.
Considerations
Authentication In this procedure, you will configure MongoDB using the default challenge-response authentication
mechanism, using the keyFile to supply the password for inter-process authentication (page 310). The content of
the key file is the shared secret used for all internal authentication.
All deployments that enforce authorization checking should have one user administrator that can create new users
and modify existing users. During this procedure you will create a user administrator that you will use to administer
this deployment.
Architecture In production, deploy each member of the replica set to its own machine and, if possible, bind to the
standard MongoDB port of 27017. Use the bind_ip option to ensure that MongoDB listens for connections from
applications on configured addresses.
For geographically distributed replica sets, ensure that the majority of the set's mongod instances reside in the
primary site.
See Replica Set Deployment Architectures (page 553) for more information.
Connectivity Ensure that network traffic can pass between all members of the set and all clients in the network
securely and efficiently. Consider the following:
Establish a virtual private network. Ensure that your network topology routes all traffic between members within
a single site over the local area network.
Configure access control to prevent connections from unknown clients to the replica set.
Configure networking and firewall rules so that incoming and outgoing packets are permitted only on the default
MongoDB port and only from within your deployment.
Finally, ensure that each member of a replica set is accessible by way of resolvable DNS or hostnames. You should
either configure your DNS names appropriately or set up your system's /etc/hosts file to reflect this configuration.
Configuration Specify the run-time configuration on each system in a configuration file stored in
/etc/mongodb.conf or a related location. Create the directory where MongoDB stores data files before deploying MongoDB.
For more information about run-time and other configuration options, see
http://docs.mongodb.org/manual/reference/configuration-options.
Procedure
This procedure deploys a replica set in which all members use the same key file.
Step 1: Start one member of the replica set. This mongod should not enable auth.
Step 2: Create administrative users. The following operations will create two users: a user administrator that will
be able to create and modify users (siteUserAdmin), and a root (page 397) user (siteRootAdmin) that you
will use to complete the remainder of the tutorial:
use admin
db.createUser( {
user: "siteUserAdmin",
pwd: "<password>",
You may generate a key file using any method you choose. Always ensure that the password stored in the key file is
both long and high in entropy. Using openssl in this manner helps generate such a key.
Step 5: Copy the key file to each member of the replica set. Copy the mongodb-keyfile to all hosts where
components of a MongoDB deployment run. Set the permissions of these files to 600 so that only the owner of each
file can read or write it. This prevents other users on the system from accessing the shared secret.
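Before starting each member, you might verify that the copied key file really has owner-only permissions. The Node.js sketch below is one way to check this on a POSIX system. The function names and the temporary demo file are illustrative assumptions, not part of MongoDB.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Returns the permission bits of a file as an octal string, e.g. "600".
function permissionString(filePath) {
  return (fs.statSync(filePath).mode & 0o777).toString(8);
}

// True when only the owner can read or write the key file (mode 600).
function keyFileIsPrivate(filePath) {
  return permissionString(filePath) === "600";
}

// Usage: create a throwaway file, restrict it, and check it.
const demo = path.join(os.tmpdir(), "mongodb-keyfile-demo");
fs.writeFileSync(demo, "placeholder-secret\n");
fs.chmodSync(demo, 0o600);
console.log(keyFileIsPrivate(demo)); // true on POSIX systems
fs.unlinkSync(demo);
```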
Step 6: Start each member of the replica set with the appropriate options. For each member, start a mongod
and specify the key file and the name of the replica set. Also specify other parameters as needed for your deployment.
For the replication-specific parameters required by your deployment, see cli-mongod-replica-set.
If your application connects to more than one replica set, each set should have a distinct name. Some drivers group
replica set connections by replica set name.
The following example specifies parameters through the --keyFile and --replSet command-line options:
mongod --keyFile /mysecretdirectory/mongodb-keyfile --replSet "rs0"
In production deployments, you can configure a control script to manage this process. Control scripts are beyond the
scope of this document.
Step 7: Connect to the member of the replica set where you created the administrative users. Connect to
the replica set member you started and authenticate as the siteRootAdmin user. From the mongo shell, use the
following operation to authenticate:
use admin
db.auth("siteRootAdmin", "<password>");
Step 8: Initiate the replica set. Use rs.initiate() on the replica set member:
rs.initiate()
MongoDB initiates a set that consists of the current member and that uses the default replica set configuration.
Step 9: Verify the initial replica set configuration. Use rs.conf() to display the replica set configuration object
(page 632):
rs.conf()
Step 10: Add the remaining members to the replica set. Add the remaining members with the rs.add()
method.
The following example adds two members:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
When complete, you have a fully functional replica set. The new replica set will elect a primary.
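rs.initiate() also accepts a replica set configuration document. As an alternative to Steps 8 and 10, you could pass all members in a single call. The Node.js sketch below only builds such a document for you to paste into rs.initiate() in the mongo shell. The set name rs0 and the mongodb1/mongodb2 hosts come from the examples above. mongodb0.example.net, the explicit :27017 ports, and the helper name are hypothetical.

```javascript
// Builds a document of the form accepted by rs.initiate(config):
// { _id: <set name>, members: [ { _id: <n>, host: "<host:port>" }, ... ] }.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match the --replSet name given to each mongod
    members: hosts.map((host, index) => ({ _id: index, host: host })),
  };
}

// mongodb0.example.net is a hypothetical name for the first member.
const config = buildReplSetConfig("rs0", [
  "mongodb0.example.net:27017",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017",
]);
console.log(JSON.stringify(config, null, 2));
```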
Step 11: Check the status of the replica set. Use the rs.status() operation:
rs.status()
Step 12: Create additional users to address operational requirements. You can use built-in roles (page 390) to
create common types of database users, such as the dbOwner (page 392) role to create a database administrator, the
readWrite (page 390) role to create a user who can update data, or the read (page 390) role to create a user who
can only read data. You also can define custom roles (page 313).
For example, the following creates a database administrator for the products database:
use products
db.createUser(
{
user: "productsDBAdmin",
pwd: "password",
roles:
[
{
role: "dbOwner",
db: "products"
}
]
}
)
For an overview of roles and privileges, see Authorization (page 312). For more information on adding users, see Add
a User to a Database (page 372).
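The same createUser pattern covers the other built-in roles mentioned in Step 12. As an illustration, the Node.js sketch below builds the user documents you would pass to db.createUser() for a dbOwner, a readWrite, and a read user on the products database. The user names other than productsDBAdmin, the placeholder passwords, and the helper name are hypothetical.

```javascript
// Builds a user document for db.createUser() with one role on one database.
function userDocument(user, pwd, role, db) {
  return { user: user, pwd: pwd, roles: [{ role: role, db: db }] };
}

// Hypothetical users; replace the placeholder passwords before use.
const users = [
  userDocument("productsDBAdmin", "<password>", "dbOwner", "products"),
  userDocument("productsWriter", "<password>", "readWrite", "products"),
  userDocument("productsReader", "<password>", "read", "products"),
];

// Print each document in a form you can paste into the mongo shell.
for (const doc of users) {
  console.log(`db.createUser(${JSON.stringify(doc)})`);
}
```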
Enabling access control on a MongoDB instance restricts access to the instance by requiring that users identify themselves when connecting. In this procedure, you enable access control and then create the instance's first user, which
must be a user administrator. The user administrator grants further access to the instance by creating additional users.
Considerations
If you create the user administrator before enabling access control, MongoDB disables the localhost exception
(page 311). In that case, you must use the Enable Authentication after Creating the User Administrator (page 347)
procedure to enable access control.
This procedure uses the localhost exception (page 311) to allow you to create the first user after enabling authentication.
See Localhost Exception (page 311) and Authentication (page 308) for more information.
Procedure
Step 1: Start the MongoDB instance with authentication enabled. Start the mongod or mongos instance with
the authorization or keyFile setting. Use authorization on a standalone instance. Use keyFile on an
instance in a replica set or sharded cluster.
For example, to start a mongod with authentication enabled and a key file stored in /private/var, first set the
following option in the mongod's configuration file:
security:
   keyFile: /private/var/key.pem
Then start the mongod and specify the config file. For example:
mongod --config /etc/mongodb/mongodb.conf
After you enable authentication, only the user administrator can connect to the MongoDB instance. The user administrator must log in and grant further access to the instance by creating additional users.
Step 2: Connect to the MongoDB instance via the localhost exception. Connect to the MongoDB instance from
a client running on the same system. This access is made possible by the localhost exception (page 311).
Step 3: Create the system user administrator. Add the user with the userAdminAnyDatabase (page 396)
role, and only that role.
The following example creates the user siteUserAdmin on the admin database:
use admin
db.createUser(
{
user: "siteUserAdmin",
pwd: "password",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
After you create the user administrator, the localhost exception (page 311) is no longer available.
The mongo shell executes a number of commands at start up. As a result, when you log in as the user administrator,
you may see authentication errors from one or more commands. You may ignore these errors, which are expected,
because the userAdminAnyDatabase (page 396) role does not have permissions to run some of the start up
commands.
Step 4: Create additional users. Log in with the user administrator's credentials and create additional users. See
Add a User to a Database (page 372).
Next Steps
If you need to disable access control for any reason, restart the process without the authorization or keyFile
setting.
Enable Authentication in a Sharded Cluster
New in version 2.0: Support for authentication with sharded clusters.
Overview
When authentication is enabled on a sharded cluster, every client that accesses the cluster must provide credentials.
This includes MongoDB instances that access each other within the cluster.
To enable authentication on a sharded cluster, you must enable authentication individually on each component of the
cluster. This means enabling authentication on each mongos and each mongod, including each config server, and all
members of a shard's replica set.
Authentication requires an authentication mechanism and, in most cases, a key file. The content of the key file
must be the same on all cluster members.
Considerations
It is not possible to convert an existing sharded cluster that does not enforce access control to require authentication
without taking all components of the cluster offline for a short period of time.
As described in Localhost Exception (page 311), the localhost exception will apply to the individual shards unless you
either create an administrative user or disable the localhost exception on each shard.
Procedure
Step 1: Create a key file. Create the key file your deployment will use to authenticate servers to each other.
To generate pseudo-random data to use for a keyfile, issue the following openssl command:
openssl rand -base64 741 > mongodb-keyfile
chmod 600 mongodb-keyfile
You may generate a key file using any method you choose. Always ensure that the password stored in the key file is
both long and high in entropy. Using openssl in this manner helps generate such a key.
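If openssl is not available, a comparable key can be produced from Node.js. This sketch mirrors `openssl rand -base64 741`: it base64-encodes 741 random bytes from crypto.randomBytes() and writes the result with mode 600. Unlike openssl, it writes the key as a single line rather than wrapped lines. The function names are hypothetical, and the output path is up to you.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Generates base64-encoded random data, analogous to `openssl rand -base64 741`.
function generateKeyFileContent(numBytes = 741) {
  return crypto.randomBytes(numBytes).toString("base64");
}

// Writes the key to `filePath` so that only the owner can read or write it.
function writeKeyFile(filePath, numBytes = 741) {
  const content = generateKeyFileContent(numBytes);
  fs.writeFileSync(filePath, content + "\n", { mode: 0o600 });
  fs.chmodSync(filePath, 0o600); // enforce 600 even if the file already existed
  return content;
}

// 741 bytes encode to 988 base64 characters (741 / 3 * 4, no padding).
console.log(generateKeyFileContent().length); // 988
```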
Step 2: Enable authentication on each component in the cluster. On each mongos and mongod in the cluster,
including all config servers and shards, specify the key file using one of the following approaches:
Specify the key file in the configuration file. In the configuration file, set the keyFile option to the key file's path
and then start the component, as in the following example:
security:
   keyFile: /srv/mongodb/keyfile
Specify the key file at runtime. When starting the component, set the --keyFile option, which is an option
for both mongos instances and mongod instances. Set the --keyFile to the key file's path. The keyFile
setting implies the authorization setting, which means in most cases you do not need to set authorization
explicitly.
Step 3: Add users. While connected to a mongos, add the first administrative user and then add subsequent users.
See Create a User Administrator (page 370).
Related Documents
Enabling authentication on a MongoDB instance restricts access to the instance by requiring that users identify themselves when connecting. In this procedure, you will create the instance's first user, which must be a user administrator,
and then enable authentication. Then, you can authenticate as the user administrator to create additional users and
grant additional access to the instance.
This procedure outlines how to enable authentication after creating the user administrator. The approach requires a
restart. To enable authentication without restarting, see Enable Client Access Control (page 344).
Considerations
This document outlines a procedure for enabling authentication on an existing MongoDB instance that does not yet
require authentication: you create the first user while authentication is still disabled, and then restart the instance
with authentication required. Alternatively, you can use the localhost exception (page 311) to gain access to a system
that has authentication enabled but no users. See Enable Client Access Control (page 344) for the description of that procedure.
Procedure
For details on starting a mongod or mongos, see Manage mongod Processes (page 222) or Deploy a Sharded Cluster
(page 670).
Step 2: Create the system user administrator. Add the user with the userAdminAnyDatabase (page 396)
role, and only that role.
The following example creates the user siteUserAdmin on the admin database:
use admin
db.createUser(
{
user: "siteUserAdmin",
pwd: "password",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
Step 3: Re-start the MongoDB instance with authentication enabled. Re-start the mongod or mongos instance
with the authorization or keyFile setting. Use authorization on a standalone instance. Use keyFile
on an instance in a replica set or sharded cluster.
The following example enables authentication on a standalone mongod using the authorization command-line
option:
mongod --auth --config /etc/mongodb/mongodb.conf
Step 4: Create additional users. Log in with the user administrator's credentials and create additional users. See
Add a User to a Database (page 372).
Next Steps
If you need to disable authentication for any reason, restart the process without the authorization or keyFile
option.
Use x.509 Certificates to Authenticate Clients
New in version 2.6.
MongoDB supports x.509 certificate authentication for use with a secure SSL connection (page 331). The x.509 client
authentication allows clients to authenticate to servers with certificates (page 348) rather than with a username and
password.
To use x.509 authentication for the internal authentication of replica set/sharded cluster members, see Use x.509
Certificate for Membership Authentication (page 350).
Prerequisites
Certificate Authority For production use, your MongoDB deployment should use valid certificates generated and
signed by a single certificate authority. You or your organization can generate and maintain an independent certificate
authority, or use certificates generated by a third-party SSL vendor. Obtaining and managing certificates is beyond the
scope of this documentation.
Client x.509 Certificate The client certificate must have the following properties:
A single Certificate Authority (CA) must issue the certificates for both the client and the server.
Client certificates must contain the following fields:
keyUsage = digitalSignature
extendedKeyUsage = clientAuth
A client x.509 certificate's subject, which contains the Distinguished Name (DN), must differ from that of a
Member x.509 Certificate (page 351) to prevent client certificates from identifying the client as a cluster member
and granting full permission on the system. Specifically, the subjects must differ with regard to at least one of
the following attributes: Organization (O), the Organizational Unit (OU) or the Domain Component (DC).
Each unique MongoDB user must have a unique certificate.
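You can check the requirement that a client subject differ from a member subject in O, OU, or DC before issuing certificates. The Node.js sketch below does this with a deliberately naive DN parser. The parser splits on commas and does not handle escaped characters or multi-valued RDNs. The function names and the alice DN are hypothetical.

```javascript
// Naive DN parser: assumes no escaped commas and no multi-valued RDNs.
function parseDN(dn) {
  return dn.split(",").map((part) => {
    const i = part.indexOf("=");
    return [part.slice(0, i).trim().toUpperCase(), part.slice(i + 1).trim()];
  });
}

// Sorted values of one attribute, so the order of attributes does not matter.
function valuesOf(dn, attr) {
  return parseDN(dn)
    .filter(([key]) => key === attr)
    .map(([, value]) => value)
    .sort();
}

// True if the two DNs differ in at least one of O, OU, or DC.
function differsInOrgFields(clientDN, memberDN) {
  return ["O", "OU", "DC"].some(
    (attr) => JSON.stringify(valuesOf(clientDN, attr)) !== JSON.stringify(valuesOf(memberDN, attr))
  );
}

const memberDN = "CN=host1,OU=Dept1,O=MongoDB,ST=NY,C=US";
console.log(differsInOrgFields("CN=myName,OU=myOrgUnit,O=myOrg,C=US", memberDN)); // true
console.log(differsInOrgFields("CN=alice,OU=Dept1,O=MongoDB,C=US", memberDN));    // false, unsafe
```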
Procedures
mongod --clusterAuthMode x509 --sslMode requireSSL --sslPEMKeyFile <path to SSL certificate and key P
Warning: If the --sslCAFile option and its target file are not specified, x.509 client and member authentication will not function. mongod, and mongos in sharded systems, will not be able to verify the certificates of
processes connecting to it against the trusted certificate authority (CA) that issued them, breaking the certificate
chain.
As of version 2.6.4, mongod will not start with x.509 authentication enabled if the CA file is not specified.
Use Configuration File You may also specify these options in the configuration file.
Starting in MongoDB 2.6, you can specify the configuration for MongoDB in YAML format, e.g.:
security:
   clusterAuthMode: x509
net:
   ssl:
      mode: requireSSL
      PEMKeyFile: <path to SSL certificate and key PEM file>
      CAFile: <path to root CA PEM file>
For backwards compatibility, you can also specify the configuration using the older configuration file format48 , e.g.:
clusterAuthMode = x509
sslMode = requireSSL
sslPEMKeyFile = <path to SSL certificate and key PEM file>
sslCAFile = <path to the root CA PEM file>
Include any additional options, SSL or otherwise, that are required for your specific configuration.
Add x.509 Certificate subject as a User To authenticate with a client certificate, you must first add the value of
the subject from the client certificate as a MongoDB user. Each unique x.509 client certificate corresponds to a
single MongoDB user; i.e. you cannot use a single client certificate to authenticate more than one MongoDB user.
1. You can retrieve the subject from the client certificate with the following command:
openssl x509 -in <pathToClient PEM> -inform PEM -subject -nameopt RFC2253
subject= CN=myName,OU=myOrgUnit,O=myOrg,L=myLocality,ST=myState,C=myCountry
-----BEGIN CERTIFICATE-----
# ...
-----END CERTIFICATE-----
2. Add the value of the subject, omitting the spaces, from the certificate as a user.
For example, in the mongo shell, to add the user with both the readWrite role in the test database and the
userAdminAnyDatabase role which is defined only in the admin database:
db.getSiblingDB("$external").runCommand(
{
createUser: "CN=myName,OU=myOrgUnit,O=myOrg,L=myLocality,ST=myState,C=myCountry",
roles: [
{ role: 'readWrite', db: 'test' },
{ role: 'userAdminAnyDatabase', db: 'admin' }
],
writeConcern: { w: "majority" , wtimeout: 5000 }
}
)
In the above example, to add the user with the readWrite role in the test database, the role specification
document specified test in the db field. To add userAdminAnyDatabase role for the user, the above
example specified admin in the db field.
Note: Some roles are defined only in the admin database, including: clusterAdmin, readAnyDatabase,
readWriteAnyDatabase, dbAdminAnyDatabase, and userAdminAnyDatabase. To add a user with these roles, specify
admin in the db field.
See Add a User to a Database (page 372) for details on adding a user with roles.
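Steps 1 and 2 can be connected with a short script. The Node.js sketch below takes the `subject=` line printed by the openssl command and drops the prefix and the space after it. It does not alter spaces inside attribute values. It then builds the createUser command document from the example above. The helper names are hypothetical. You would still run the resulting document with db.getSiblingDB("$external").runCommand() in the mongo shell.

```javascript
// Extracts the RFC 2253 subject from openssl output such as
// "subject= CN=myName,OU=myOrgUnit,...". Returns null if there is no subject line.
function subjectFromOpensslOutput(output) {
  const line = output.split("\n").find((l) => l.startsWith("subject="));
  return line ? line.slice("subject=".length).trim() : null;
}

// Builds the createUser command document for the $external database.
function createX509UserCommand(subject, roles) {
  return {
    createUser: subject,
    roles: roles,
    writeConcern: { w: "majority", wtimeout: 5000 },
  };
}

const output = [
  "subject= CN=myName,OU=myOrgUnit,O=myOrg,L=myLocality,ST=myState,C=myCountry",
  "-----BEGIN CERTIFICATE-----",
  "-----END CERTIFICATE-----",
].join("\n");

const subject = subjectFromOpensslOutput(output);
console.log(subject); // CN=myName,OU=myOrgUnit,O=myOrg,L=myLocality,ST=myState,C=myCountry
console.log(JSON.stringify(createX509UserCommand(subject, [
  { role: "readWrite", db: "test" },
  { role: "userAdminAnyDatabase", db: "admin" },
])));
```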
Authenticate with an x.509 Certificate To authenticate with a client certificate, you must first add a MongoDB user
that corresponds to the client certificate. See Add x.509 Certificate subject as a User (page 349).
To authenticate, use the db.auth() method in the $external database, specifying "MONGODB-X509" for the
mechanism field, and the user that corresponds to the client certificate (page 349) for the user field.
For example, if using the mongo shell,
1. Connect mongo shell to the mongod set up for SSL:
mongo --ssl --sslPEMKeyFile <path to CA signed client PEM file> --sslCAFile <path to root CA PEM
2. To perform the authentication, use the db.auth() method in the $external database. For the mechanism
field, specify "MONGODB-X509", and for the user field, specify the user, or the subject, that corresponds
to the client certificate.
db.getSiblingDB("$external").auth(
{
mechanism: "MONGODB-X509",
user: "CN=myName,OU=myOrgUnit,O=myOrg,L=myLocality,ST=myState,C=myCountry"
}
)
MongoDB supports x.509 certificate authentication for use with a secure SSL connection (page 331). Sharded cluster
members and replica set members can use x.509 certificates to verify their membership to the cluster or the replica set
instead of using keyfiles (page 308). The membership authentication is an internal process.
For client authentication with x.509, see Use x.509 Certificates to Authenticate Clients (page 348).
Member x.509 Certificate
The member certificate, used for internal authentication to verify membership to the sharded cluster or a replica set,
must have the following properties:
A single Certificate Authority (CA) must issue all the x.509 certificates for the members of a sharded cluster or
a replica set.
The Distinguished Name (DN), found in the member certificate's subject, must specify a non-empty value
for at least one of the following attributes: Organization (O), the Organizational Unit (OU) or the Domain
Component (DC).
The Organization attributes (Os), the Organizational Unit attributes (OUs), and the Domain Components (DCs)
must match those from the certificates for the other cluster members. To match, the certificates must agree on every
specification of these attributes, including the absence of an attribute. The order of the attributes
does not matter.
In the following example, the two DNs contain matching specifications for O, OU as well as the non-specification
of the DC attribute.
CN=host1,OU=Dept1,O=MongoDB,ST=NY,C=US
C=US, ST=CA, O=MongoDB, OU=Dept1, CN=host2
However, the following two DNs contain a mismatch for the OU attribute since one contains two OU specifications and the other, only one specification.
CN=host1,OU=Dept1,OU=Sales,O=MongoDB
CN=host2,OU=Dept1,O=MongoDB
Either the Common Name (CN) or one of the Subject Alternative Name (SAN) entries must match the hostname
of the server, used by the other members of the cluster.
For example, the certificates for a cluster could have the following subjects:
subject= CN=<myhostname1>,OU=Dept1,O=MongoDB,ST=NY,C=US
subject= CN=<myhostname2>,OU=Dept1,O=MongoDB,ST=NY,C=US
subject= CN=<myhostname3>,OU=Dept1,O=MongoDB,ST=NY,C=US
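The member DN rules above can be expressed as a small check: compare every O, OU, and DC specification, ignore attribute order, and require at least one of them to be present. The Node.js sketch below runs this check against the two DN pairs from the examples. It assumes a naive parser that splits on commas, with no escaped characters and no multi-valued RDNs. The function names are hypothetical, and the sketch does not check the CN/SAN hostname rule.

```javascript
// Naive DN parser: assumes no escaped commas and no multi-valued RDNs.
function parseDN(dn) {
  return dn.split(",").map((part) => {
    const i = part.indexOf("=");
    return [part.slice(0, i).trim().toUpperCase(), part.slice(i + 1).trim()];
  });
}

// Collects the O, OU, and DC values of a DN, sorted so attribute order is ignored.
function orgFields(dn) {
  const fields = { O: [], OU: [], DC: [] };
  for (const [key, value] of parseDN(dn)) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) fields[key].push(value);
  }
  for (const key of Object.keys(fields)) fields[key].sort();
  return fields;
}

// A member DN must have a non-empty value for at least one of O, OU, or DC.
function hasOrgField(dn) {
  return Object.values(orgFields(dn)).some((values) => values.some((v) => v !== ""));
}

// Two member DNs match when both satisfy hasOrgField and their O, OU, and DC
// specifications are identical, including attributes that neither specifies.
function membersMatch(dnA, dnB) {
  if (!hasOrgField(dnA) || !hasOrgField(dnB)) return false;
  return JSON.stringify(orgFields(dnA)) === JSON.stringify(orgFields(dnB));
}

console.log(membersMatch("CN=host1,OU=Dept1,O=MongoDB,ST=NY,C=US",
                         "C=US, ST=CA, O=MongoDB, OU=Dept1, CN=host2")); // true
console.log(membersMatch("CN=host1,OU=Dept1,OU=Sales,O=MongoDB",
                         "CN=host2,OU=Dept1,O=MongoDB"));                  // false
```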
You can use an x.509 certificate that does not have Extended Key Usage (EKU) attributes set. If you use EKU attributes
in the PEMKeyFile certificate, then specify the clientAuth and/or serverAuth attributes (i.e. TLS Web
Client Authentication and TLS Web Server Authentication) as needed. The certificate that you specify for the
PEMKeyFile option requires the serverAuth attribute, and the certificate you specify to clusterFile requires
the clientAuth attribute. If you omit clusterFile, mongod will use the certificate specified to PEMKeyFile
for member authentication.
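The EKU rules in the preceding paragraph can be summarized as a check over the two certificate settings. In the Node.js sketch below, each certificate is a hypothetical object listing its EKU values, where an empty list means no EKU is set. The sketch reports which required attributes are missing. One rule is inferred from the paragraph above rather than separately documented: when clusterFile is omitted, the PEMKeyFile certificate also serves for member authentication and so needs clientAuth too.

```javascript
// Returns a list of EKU problems for the PEMKeyFile and (optional) clusterFile
// certificates. Each certificate is { eku: [...] }; an empty list means no EKU is set.
function missingEku(pemKeyFile, clusterFile) {
  const problems = [];
  const check = (name, cert, needed) => {
    const eku = cert.eku || [];
    if (eku.length === 0) return; // certificates without EKU attributes are allowed
    for (const value of needed) {
      if (!eku.includes(value)) problems.push(`${name} lacks ${value}`);
    }
  };
  if (clusterFile) {
    check("PEMKeyFile", pemKeyFile, ["serverAuth"]);
    check("clusterFile", clusterFile, ["clientAuth"]);
  } else {
    // Without clusterFile, the PEMKeyFile certificate is also used for member
    // authentication, so (by inference) it needs both attributes.
    check("PEMKeyFile", pemKeyFile, ["serverAuth", "clientAuth"]);
  }
  return problems;
}

console.log(missingEku({ eku: ["serverAuth"] }, { eku: ["clientAuth"] })); // no problems
console.log(missingEku({ eku: ["serverAuth"] }, null)); // reports the missing clientAuth
```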
Configure Replica Set/Sharded Cluster
Use Command-line Options To specify the x.509 certificate for internal cluster member authentication, append
the additional SSL options --clusterAuthMode and --sslClusterFile, as in the following example for a
member of a replica set:
mongod --replSet <name> --sslMode requireSSL --clusterAuthMode x509 --sslClusterFile <path to members
Include any additional options, SSL or otherwise, that are required for your specific configuration. For instance, if
the membership key is encrypted, set the --sslClusterPassword to the passphrase to decrypt the key or have
MongoDB prompt for the passphrase. See SSL Certificate Passphrase (page 334) for details.
Warning: If the --sslCAFile option and its target file are not specified, x.509 client and member authentication will not function. mongod, and mongos in sharded systems, will not be able to verify the certificates of
processes connecting to it against the trusted certificate authority (CA) that issued them, breaking the certificate
chain.
As of version 2.6.4, mongod will not start with x.509 authentication enabled if the CA file is not specified.
Use Configuration File You can specify the configuration for MongoDB in a YAML formatted configuration
file, as in the following example:
security:
   clusterAuthMode: x509
net:
   ssl:
      mode: requireSSL
      PEMKeyFile: <path to SSL certificate and key PEM file>
      CAFile: <path to root CA PEM file>
      clusterFile: <path to x.509 membership certificate and key PEM file>
To upgrade clusters that are currently using keyfile authentication to x.509 authentication, use a rolling upgrade process.
Clusters Currently Using SSL For clusters using SSL and keyfile authentication, to upgrade to x.509 cluster authentication, use the following rolling upgrade process:
1. For each node of a cluster, start the node with the option --clusterAuthMode set to sendKeyFile and
the option --sslClusterFile set to the appropriate path of the node's certificate. Include other SSL options
(page 331) as well as any other options that are required for your specific configuration. For example:
With this setting, each node continues to use its keyfile to authenticate itself as a member. However, each
node can now accept either a keyfile or an x.509 certificate from other members to authenticate those members.
Upgrade all nodes of the cluster to this setting.
2. Then, for each node of a cluster, connect to the node and use the setParameter command to update the
clusterAuthMode to sendX509. For example,
db.getSiblingDB('admin').runCommand( { setParameter: 1, clusterAuthMode: "sendX509" } )
With this setting, each node uses its x.509 certificate, specified with the --sslClusterFile option in the
previous step, to authenticate itself as a member. However, each node continues to accept either a keyfile or an
x.509 certificate from other members to authenticate those members. Upgrade all nodes of the cluster to this
setting.
As an alternative to using the setParameter command, you can also restart the nodes with the appropriate SSL and x509 options and
values.
3. Optional but recommended. Finally, for each node of the cluster, connect to the node and use the
setParameter command to update the clusterAuthMode to x509 to only use the x.509 certificate for
authentication. For example:
db.getSiblingDB('admin').runCommand( { setParameter: 1, clusterAuthMode: "x509" } )
4. After the upgrade of all nodes, edit the configuration file with the appropriate x.509 settings to ensure
that upon subsequent restarts, the cluster uses x.509 authentication.
See --clusterAuthMode for the various modes and their descriptions.
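Each step must reach every node before the next step begins, and a compatibility check makes the reason visible. The Node.js model below encodes what each clusterAuthMode value sends and accepts. sendKeyFile and sendX509 behave as described in the steps above, and x509 sends and accepts only certificates. keyFile, the starting point, is modeled as sending and accepting only key files. The function names are hypothetical, and this is not a MongoDB API.

```javascript
// What each clusterAuthMode sends when connecting, and what it accepts.
const AUTH_MODES = {
  keyFile:     { sends: "keyFile", accepts: ["keyFile"] },
  sendKeyFile: { sends: "keyFile", accepts: ["keyFile", "x509"] },
  sendX509:    { sends: "x509",    accepts: ["keyFile", "x509"] },
  x509:        { sends: "x509",    accepts: ["x509"] },
};

// Can a member in mode `from` authenticate to a member in mode `to`?
function canAuthenticate(from, to) {
  return AUTH_MODES[to].accepts.includes(AUTH_MODES[from].sends);
}

// Every ordered pair of members must be able to authenticate.
function membersCanAuthenticate(modes) {
  return modes.every((a) => modes.every((b) => canAuthenticate(a, b)));
}

console.log(membersCanAuthenticate(["keyFile", "sendKeyFile"])); // true
console.log(membersCanAuthenticate(["sendKeyFile", "sendX509"])); // true
console.log(membersCanAuthenticate(["sendX509", "x509"]));        // true
console.log(membersCanAuthenticate(["keyFile", "sendX509"]));     // false
```

The last case shows why step 1 must finish on every node first. A node that still uses keyFile rejects the certificate that a sendX509 node presents.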
Clusters Currently Not Using SSL For clusters using keyfile authentication but not SSL, to upgrade to x.509
authentication, use the following rolling upgrade process:
1. For each node of a cluster, start the node with the option --sslMode set to allowSSL, the option
--clusterAuthMode set to sendKeyFile and the option --sslClusterFile set to the appropriate path of the node's certificate. Include other SSL options (page 331) as well as any other options that are
required for your specific configuration. For example:
mongod --replSet <name> --sslMode allowSSL --clusterAuthMode sendKeyFile --sslClusterFile <path
The --sslMode allowSSL setting allows the node to accept both SSL and non-SSL incoming connections.
Its outgoing connections do not use SSL.
The --clusterAuthMode sendKeyFile setting allows each node to continue using its keyfile to authenticate itself as a member. However, each node can now accept either a keyfile or an x.509 certificate from other
members to authenticate those members.
Upgrade all nodes of the cluster to these settings.
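2. Then, for each node of a cluster, connect to the node and use the setParameter command to update the
sslMode to preferSSL and the clusterAuthMode to sendX509. For example:
db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "preferSSL", clusterAuthMode: "sendX509" } )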
With the sslMode set to preferSSL, the node accepts both SSL and non-SSL incoming connections, and its
outgoing connections use SSL.
With the clusterAuthMode set to sendX509, each node uses its x.509 certificate, specified with the
--sslClusterFile option in the previous step, to authenticate itself as a member. However, each node
continues to accept either a keyfile or an x.509 certificate from other members to authenticate those members.
Upgrade all nodes of the cluster to these settings.
3. Optional but recommended. Finally, for each node of the cluster, connect to the node and use the
setParameter command to update the sslMode to requireSSL and the clusterAuthMode to x509.
For example:
db.getSiblingDB('admin').runCommand( { setParameter: 1, sslMode: "requireSSL", clusterAuthMode: "x509" } )
With the sslMode set to requireSSL, the node only uses SSL connections.
With the clusterAuthMode set to x509, the node only uses the x.509 certificate for authentication.
4. After the upgrade of all nodes, edit the configuration file with the appropriate SSL and x.509 settings
to ensure that upon subsequent restarts, the cluster uses x.509 authentication.
See --clusterAuthMode for the various modes and their descriptions.
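On this combined path, a node's state is a pair of sslMode and clusterAuthMode. Two members can talk only if both the transport and the membership authentication succeed. The Node.js sketch below simulates the rolling upgrade one node at a time and reports the first intermediate state, if any, that would cut members off. It models the behavior described above. The starting state (sslMode disabled, clusterAuthMode keyFile), the three-node cluster, and all function names are assumptions for illustration.

```javascript
// Transport behavior for each sslMode (disabled = node not yet upgraded).
const SSL = {
  disabled:   { outSSL: false, inSSL: false, inPlain: true },
  allowSSL:   { outSSL: false, inSSL: true,  inPlain: true },
  preferSSL:  { outSSL: true,  inSSL: true,  inPlain: true },
  requireSSL: { outSSL: true,  inSSL: true,  inPlain: false },
};
// Membership authentication behavior for each clusterAuthMode.
const AUTH = {
  keyFile:     { sends: "keyFile", accepts: ["keyFile"] },
  sendKeyFile: { sends: "keyFile", accepts: ["keyFile", "x509"] },
  sendX509:    { sends: "x509",    accepts: ["keyFile", "x509"] },
  x509:        { sends: "x509",    accepts: ["x509"] },
};

// Can node A ([sslMode, authMode]) connect to and authenticate with node B?
function canTalk([sslA, authA], [sslB, authB]) {
  const transportOk = SSL[sslA].outSSL ? SSL[sslB].inSSL : SSL[sslB].inPlain;
  return transportOk && AUTH[authB].accepts.includes(AUTH[authA].sends);
}

// Moves nodes one at a time through `phases`; returns the first broken
// cluster state, or null if every intermediate state is fully connected.
function simulateRollingUpgrade(nodeCount, start, phases) {
  const nodes = Array.from({ length: nodeCount }, () => start);
  for (const phase of phases) {
    for (let i = 0; i < nodeCount; i++) {
      nodes[i] = phase;
      const ok = nodes.every((a) => nodes.every((b) => canTalk(a, b)));
      if (!ok) return nodes.slice();
    }
  }
  return null;
}

const documented = [
  ["allowSSL", "sendKeyFile"], // step 1
  ["preferSSL", "sendX509"],   // step 2
  ["requireSSL", "x509"],      // step 3
];
console.log(simulateRollingUpgrade(3, ["disabled", "keyFile"], documented)); // null
```

Skipping step 2 makes the simulation fail on the first node moved to requireSSL. Its peers, still at allowSSL, make non-SSL outgoing connections that it rejects.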
MongoDB Enterprise for Windows does not include LDAP support for authentication. However, MongoDB Enterprise
for Linux supports using LDAP authentication with an ActiveDirectory server.
MongoDB does not support LDAP authentication in mixed sharded cluster deployments that contain both version 2.4
and version 2.6 shards. See Upgrade MongoDB to 2.6 (page 829) for upgrade instructions.
Use secure encrypted or trusted connections between clients and the server, as well as between saslauthd and the
LDAP server. The LDAP server uses the SASL PLAIN mechanism, sending and receiving data in plain text. You
should use only a trusted channel such as a VPN, a connection encrypted with SSL, or a trusted wired network.
Configure saslauthd
LDAP support for user authentication requires proper configuration of the saslauthd daemon process as well as
the MongoDB server.
Step 1: Specify the mechanism. On systems that configure saslauthd with the
/etc/sysconfig/saslauthd file, such as Red Hat Enterprise Linux, Fedora, CentOS, and Amazon
Linux AMI, set the mechanism MECH to ldap:
MECH=ldap
On systems that configure saslauthd with the /etc/default/saslauthd file, such as Ubuntu, set the
MECHANISMS option to ldap:
MECHANISMS="ldap"
Step 2: Adjust caching behavior. On certain Linux distributions, saslauthd starts with the caching of authentication credentials enabled. Until restarted or until the cache expires, saslauthd will not contact the LDAP server
to re-authenticate users in its authentication cache. This allows saslauthd to successfully authenticate users in its
cache, even if the LDAP server is down or if the cached users' credentials are revoked.
To set the expiration time (in seconds) for the authentication cache, see the -t option50 of saslauthd.
Step 3: Configure LDAP Options with ActiveDirectory. If the saslauthd.conf file does not exist, create it.
The saslauthd.conf file usually resides in the /etc folder. If specifying a different file path, see the -O option51
of saslauthd.
To use with ActiveDirectory, start saslauthd with the following configuration options set in the
saslauthd.conf file:
50 http://www.linuxcommand.org/man_pages/saslauthd8.html
51 http://www.linuxcommand.org/man_pages/saslauthd8.html
Chapter 6. Security
For the <ldap uri>, specify the URI of the LDAP server. For example, ldap_servers: ldaps://ad.example.net.
Configure MongoDB
Step 1: Add user to MongoDB for authentication. Add the user to the $external database in MongoDB. To
specify the user's privileges, assign roles (page 312) to the user.
For example, the following adds a user with read-only access to the records database.
db.getSiblingDB("$external").createUser(
{
user : <username>,
roles: [ { role: "read", db: "records" } ]
}
)
Use default Unix-domain socket path. To use the default Unix-domain socket path, set the saslauthdPath to
the empty string "", as in the following command line example:
mongod --auth --setParameter saslauthdPath="" --setParameter authenticationMechanisms=PLAIN
Step 3: Authenticate the user in the mongo shell. To perform the authentication in the mongo shell, use the
db.auth() method in the $external database.
Specify the value "PLAIN" in the mechanism field, the user and password in the user and pwd fields respectively,
and the value false in the digestPassword field. You must specify false for digestPassword since the
server must receive an undigested password to forward on to saslauthd, as in the following example:
db.getSiblingDB("$external").auth(
{
mechanism: "PLAIN",
user: <username>,
pwd: <cleartext password>,
digestPassword: false
}
)
The server forwards the password in plain text. In general, use only on a trusted channel (VPN, SSL, trusted wired
network). See Considerations.
Authenticate Using SASL and LDAP with OpenLDAP
MongoDB Enterprise provides support for proxy authentication of users. This allows administrators to configure
a MongoDB cluster to authenticate users by proxying authentication requests to a specified Lightweight Directory
Access Protocol (LDAP) service.
Considerations
MongoDB Enterprise for Windows does not include LDAP support for authentication. However, MongoDB Enterprise
for Linux supports using LDAP authentication with an ActiveDirectory server.
MongoDB does not support LDAP authentication in mixed sharded cluster deployments that contain both version 2.4
and version 2.6 shards. See Upgrade MongoDB to 2.6 (page 829) for upgrade instructions.
Use secure encrypted or trusted connections between clients and the server, as well as between saslauthd and the
LDAP server. The LDAP server uses the SASL PLAIN mechanism, sending and receiving data in plain text. You
should use only a trusted channel such as a VPN, a connection encrypted with SSL, or a trusted wired network.
Configure saslauthd
LDAP support for user authentication requires proper configuration of the saslauthd daemon process as well as
the MongoDB server.
Step 1: Specify the mechanism. On systems that configure saslauthd with the
/etc/sysconfig/saslauthd file, such as Red Hat Enterprise Linux, Fedora, CentOS, and Amazon
Linux AMI, set the mechanism MECH to ldap:
MECH=ldap
On systems that configure saslauthd with the /etc/default/saslauthd file, such as Ubuntu, set the
MECHANISMS option to ldap:
MECHANISMS="ldap"
Step 2: Adjust caching behavior. On certain Linux distributions, saslauthd starts with the caching of authentication credentials enabled. Until restarted or until the cache expires, saslauthd will not contact the LDAP server
to re-authenticate users in its authentication cache. This allows saslauthd to successfully authenticate users in its
cache, even if the LDAP server is down or if the cached users' credentials are revoked.
To set the expiration time (in seconds) for the authentication cache, see the -t option52 of saslauthd.
Step 3: Configure LDAP Options with OpenLDAP. If the saslauthd.conf file does not exist, create it. The
saslauthd.conf file usually resides in the /etc folder. If specifying a different file path, see the -O option53 of
saslauthd.
To connect to an OpenLDAP server, update the saslauthd.conf file with the following configuration options:
ldap_servers: <ldap uri>
ldap_search_base: <search base>
ldap_filter: <filter>
The ldap_servers option specifies the URI of the LDAP server used for authentication. In general, for OpenLDAP
installed on the local machine, you can specify the value ldap://localhost:389 or, if using LDAP over SSL, the value
ldaps://localhost:636.
The ldap_search_base option specifies the distinguished name relative to which the search is performed. The search
includes the base object and the objects below it.
The ldap_filter option specifies the search filter.
The values for these configuration options should correspond to the values specific to your environment. For example,
to filter on email, specify ldap_filter: (mail=%n) instead.
OpenLDAP Example A sample saslauthd.conf file for OpenLDAP includes the following content:
ldap_servers: ldaps://ad.example.net
ldap_search_base: ou=Users,dc=example,dc=com
ldap_filter: (uid=%u)
To use this sample OpenLDAP configuration, create users with a uid attribute (login name) and place them under the
Users organizational unit (ou) under the domain components (dc) example and com.
For more information on saslauthd configuration, see http://www.openldap.org/doc/admin24/guide.html#Configuringsaslauthd.
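If you manage saslauthd.conf from a provisioning script, rendering it from named values makes a missing option or a mistyped URI scheme fail before saslauthd is restarted. A minimal JavaScript sketch, assuming only the three options shown above; renderSaslauthdConf is a hypothetical helper, and treating ldap_servers as a space-separated list of URIs is an assumption about saslauthd's parsing.

```javascript
// Hypothetical provisioning helper (not part of saslauthd or MongoDB):
// render the three OpenLDAP options described above as saslauthd.conf lines.
function renderSaslauthdConf({ servers, searchBase, filter }) {
  const options = [
    ["ldap_servers", servers],
    ["ldap_search_base", searchBase],
    ["ldap_filter", filter]
  ];
  const missing = options.filter(([, value]) => !value).map(([key]) => key);
  if (missing.length > 0) {
    throw new Error("missing option(s): " + missing.join(", "));
  }
  // Assumed: ldap_servers may list several URIs separated by spaces; each
  // should use the ldap:// or ldaps:// (LDAP over SSL) scheme.
  const uris = servers.trim().split(/\s+/);
  if (!uris.every(uri => /^ldaps?:\/\//.test(uri))) {
    throw new Error("ldap_servers must contain ldap:// or ldaps:// URIs");
  }
  return options.map(([key, value]) => `${key}: ${value}`).join("\n") + "\n";
}

// Reproduce the sample OpenLDAP configuration shown above.
console.log(renderSaslauthdConf({
  servers: "ldaps://ad.example.net",
  searchBase: "ou=Users,dc=example,dc=com",
  filter: "(uid=%u)"
}));
```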
Step 4: Test the saslauthd configuration. Use the testsaslauthd utility to test the saslauthd configuration.
For example:
52 http://www.linuxcommand.org/man_pages/saslauthd8.html
53 http://www.linuxcommand.org/man_pages/saslauthd8.html
Configure MongoDB
Step 1: Add user to MongoDB for authentication. Add the user to the $external database in MongoDB. To
specify the user's privileges, assign roles (page 312) to the user.
For example, the following adds a user with read-only access to the records database.
db.getSiblingDB("$external").createUser(
{
user : <username>,
roles: [ { role: "read", db: "records" } ]
}
)
Use default Unix-domain socket path. To use the default Unix-domain socket path, set the saslauthdPath to
the empty string "", as in the following command line example:
mongod --auth --setParameter saslauthdPath="" --setParameter authenticationMechanisms=PLAIN
Step 3: Authenticate the user in the mongo shell. To perform the authentication in the mongo shell, use the
db.auth() method in the $external database.
Specify the value "PLAIN" in the mechanism field, the user and password in the user and pwd fields respectively,
and the value false in the digestPassword field. You must specify false for digestPassword since the
server must receive an undigested password to forward on to saslauthd, as in the following example:
db.getSiblingDB("$external").auth(
{
mechanism: "PLAIN",
user: <username>,
pwd: <cleartext password>,
digestPassword: false
}
)
The server forwards the password in plain text. In general, use only on a trusted channel (VPN, SSL, trusted wired
network). See Considerations.
Configure MongoDB with Kerberos Authentication on Linux
New in version 2.4.
Overview
MongoDB Enterprise supports authentication using a Kerberos service (page 318). Kerberos is an industry standard
authentication protocol for large client/server systems.
Prerequisites
Setting up and configuring a Kerberos deployment is beyond the scope of this document. This tutorial assumes
you have configured a Kerberos service principal (page 319) for each mongod and mongos instance in your
MongoDB deployment, and you have a valid keytab file (page 319) for each mongod and mongos instance.
To verify MongoDB Enterprise binaries:
mongod --version
In the output from this command, look for the string modules: subscription or modules: enterprise
to confirm your system has MongoDB Enterprise.
Procedure
The following procedure outlines the steps to add a Kerberos user principal to MongoDB, configure a standalone
mongod instance for Kerberos support, and connect using the mongo shell and authenticate the user principal.
Step 1: Start mongod without Kerberos. For the initial addition of Kerberos users, start mongod without Kerberos
support.
If a Kerberos user is already in MongoDB and has the privileges required to create a user, you can start mongod with
Kerberos support.
Step 2: Connect to mongod. Connect via the mongo shell to the mongod instance. If mongod has --auth
enabled, ensure you connect with the privileges required to create a user.
Step 3: Add Kerberos Principal(s) to MongoDB. Add a Kerberos principal, <username>@<KERBEROS
REALM> or <username>/<instance>@<KERBEROS REALM>, to MongoDB in the $external database.
Specify the Kerberos realm in all uppercase. The $external database allows mongod to consult an external source
(e.g. Kerberos) to authenticate. To specify the user's privileges, assign roles (page 312) to the user.
The following example adds the Kerberos principal application/reporting@EXAMPLE.NET with read-only
access to the records database:
use $external
db.createUser(
{
user: "application/reporting@EXAMPLE.NET",
roles: [ { role: "read", db: "records" } ]
}
)
Add additional principals as needed. For every user you want to authenticate using Kerberos, you must
create a corresponding user in MongoDB. For more information about creating and managing users, see
http://docs.mongodb.org/manual/reference/command/nav-user-management.
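Since every principal must be added to $external with its realm in uppercase, a small helper can normalize the name before it reaches db.createUser(). A minimal JavaScript sketch; kerberosPrincipal is a hypothetical function, and rejecting '@' and '/' inside the username and instance is a simplifying assumption of this sketch rather than a documented MongoDB rule.

```javascript
// Hypothetical helper (not a MongoDB API): build a principal name in the
// <username>@<REALM> or <username>/<instance>@<REALM> form described above,
// upper-casing the realm as the text requires.
function kerberosPrincipal(username, realm, instance) {
  if (!username || !realm) {
    throw new Error("username and realm are required");
  }
  if (/[@\/]/.test(username) || (instance && /[@\/]/.test(instance))) {
    throw new Error("username and instance must not contain '@' or '/'");
  }
  const primary = instance ? `${username}/${instance}` : username;
  return `${primary}@${realm.toUpperCase()}`;
}

console.log(kerberosPrincipal("application", "example.net", "reporting"));
// → application/reporting@EXAMPLE.NET
```

The returned string is the value for the user field of db.getSiblingDB("$external").createUser() in the mongo shell.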
Step 4: Start mongod with Kerberos support. To start mongod with Kerberos support, set the environment
variable KRB5_KTNAME to the path of the keytab file and the mongod parameter authenticationMechanisms
to GSSAPI in the following form:
env KRB5_KTNAME=<path to keytab file> \
mongod \
--setParameter authenticationMechanisms=GSSAPI \
<additional mongod options>
For example, the following starts a standalone mongod instance with Kerberos support:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
/opt/mongodb/bin/mongod --auth \
--setParameter authenticationMechanisms=GSSAPI \
--dbpath /opt/mongodb/data
The path to your mongod as well as your keytab file (page 319) may differ. Modify or include additional mongod
options as required for your configuration. The keytab file (page 319) must be accessible only to the owner of the
mongod process.
With the official .deb or .rpm packages, you can set the KRB5_KTNAME in an environment settings file. See
KRB5_KTNAME (page 361) for details.
Step 5: Connect mongo shell to mongod and authenticate. Connect the mongo shell client as the Kerberos principal application/reporting@EXAMPLE.NET. Before connecting, you must have used Kerberos's kinit
program to get credentials for application/reporting@EXAMPLE.NET.
You can connect and authenticate from the command line.
mongo --authenticationMechanism=GSSAPI --authenticationDatabase='$external' \
--username application/reporting@EXAMPLE.NET
Or, alternatively, you can first connect mongo to the mongod, and then from the mongo shell, use the db.auth()
method to authenticate in the $external database.
use $external
db.auth( { mechanism: "GSSAPI", user: "application/reporting@EXAMPLE.NET" } )
Additional Considerations
KRB5_KTNAME If you installed MongoDB Enterprise using one of the official .deb or .rpm packages, and you
use the included init/upstart scripts to control the mongod instance, you can set the KRB5_KTNAME variable in the
default environment settings file instead of setting the variable each time.
For .rpm packages, the default environment settings file is /etc/sysconfig/mongod.
For .deb packages, the file is /etc/default/mongodb.
Set the KRB5_KTNAME value in a line that resembles the following:
export KRB5_KTNAME="<path to keytab>"
Configure mongos for Kerberos To start mongos with Kerberos support, set the environment variable KRB5_KTNAME to the path of its keytab file (page 319) and the mongos parameter
authenticationMechanisms to GSSAPI in the following form:
env KRB5_KTNAME=<path to keytab file> \
mongos \
--setParameter authenticationMechanisms=GSSAPI \
<additional mongos options>
For example, the following starts a mongos instance with Kerberos support:
env KRB5_KTNAME=/opt/mongodb/mongos.keytab \
mongos \
--setParameter authenticationMechanisms=GSSAPI \
--configdb shard0.example.net,shard1.example.net,shard2.example.net \
--keyFile /opt/mongodb/mongos.keyfile
The path to your mongos as well as your keytab file (page 319) may differ. The keytab file (page 319) must be accessible
only to the owner of the mongos process.
Modify or include any additional mongos options as required for your configuration. For example, instead of using --keyFile for internal authentication of sharded cluster members, you can use x.509 member authentication
(page 350).
Use a Config File To configure mongod or mongos for Kerberos support using a configuration file,
specify the authenticationMechanisms setting in the configuration file:
setParameter=authenticationMechanisms=GSSAPI
Modify or include any additional mongod options as required for your configuration.
For example, if /opt/mongodb/mongod.conf contains the following configuration settings for a standalone
mongod:
auth = true
setParameter=authenticationMechanisms=GSSAPI
dbpath=/opt/mongodb/data
you can start mongod with Kerberos support as follows:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
/opt/mongodb/bin/mongod --config /opt/mongodb/mongod.conf
The path to your mongod, keytab file (page 319), and configuration file may differ. The keytab file (page 319) must
be accessible only to the owner of the mongod process.
Troubleshoot Kerberos Setup for MongoDB If you encounter problems when starting mongod or mongos with
Kerberos authentication, see Troubleshoot Kerberos Authentication on Linux (page 366).
Incorporate Additional Authentication Mechanisms Kerberos authentication (GSSAPI (page 310) (Kerberos))
can work alongside MongoDB's challenge/response authentication mechanisms (SCRAM-SHA-1 (page 309) and
MONGODB-CR (page 309)), MongoDB's authentication mechanism for LDAP (PLAIN (page 310) (LDAP SASL)),
and MongoDB's authentication mechanism for x.509 (MONGODB-X509 (page 309)). Specify the mechanisms as
follows:
--setParameter authenticationMechanisms=GSSAPI,SCRAM-SHA-1
Only add the other mechanisms if in use. This parameter setting does not affect MongoDB's internal authentication of
cluster members.
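When a startup script assembles the --setParameter value, validating the list against the mechanism names above catches typos such as SCRAM-SHA1 before mongod or mongos is launched. A minimal JavaScript sketch; KNOWN_MECHANISMS and authMechanismsArgs are hypothetical names, and the set covers only the mechanisms named in this section.

```javascript
// Mechanism names listed in the paragraph above.
const KNOWN_MECHANISMS = new Set([
  "GSSAPI",        // Kerberos
  "SCRAM-SHA-1",   // challenge/response
  "MONGODB-CR",    // challenge/response
  "PLAIN",         // LDAP SASL
  "MONGODB-X509"   // x.509
]);

// Hypothetical helper: build the --setParameter arguments for mongod/mongos,
// rejecting empty lists and unknown names so a typo is caught before launch.
// Duplicates are dropped; the order is otherwise preserved.
function authMechanismsArgs(mechanisms) {
  const unique = [...new Set(mechanisms)];
  if (unique.length === 0) {
    throw new Error("specify at least one mechanism");
  }
  const unknown = unique.filter(m => !KNOWN_MECHANISMS.has(m));
  if (unknown.length > 0) {
    throw new Error("unknown mechanism(s): " + unknown.join(", "));
  }
  return ["--setParameter", "authenticationMechanisms=" + unique.join(",")];
}

console.log(authMechanismsArgs(["GSSAPI", "SCRAM-SHA-1"]).join(" "));
// → --setParameter authenticationMechanisms=GSSAPI,SCRAM-SHA-1
```

Returning an array rather than a single string lets the result be appended directly to an argument list, for example one passed to Node's child_process.spawn(), without shell quoting.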
Additional Resources
MongoDB LDAP and Kerberos Authentication with Dell (Quest) Authentication Services54
MongoDB with Red Hat Enterprise Linux Identity Management and Kerberos55
Configure MongoDB with Kerberos Authentication on Windows
New in version 2.6.
Overview
MongoDB Enterprise supports authentication using a Kerberos service (page 318). Kerberos is an industry standard
authentication protocol for large client/server systems. Kerberos allows MongoDB and applications to take advantage
of existing authentication infrastructure and processes.
Prerequisites
Setting up and configuring a Kerberos deployment is beyond the scope of this document. This tutorial assumes you have
configured a Kerberos service principal (page 319) for each mongod.exe and mongos.exe instance.
Procedures
Step 1: Start mongod.exe without Kerberos. For the initial addition of Kerberos users, start mongod.exe
without Kerberos support.
If a Kerberos user is already in MongoDB and has the privileges required to create a user, you can start mongod.exe
with Kerberos support.
54 https://www.mongodb.com/blog/post/mongodb-ldap-and-kerberos-authentication-dell-quest-authentication-services
55 http://docs.mongodb.org/ecosystem/tutorial/manage-red-hat-enterprise-linux-identity-management/
Step 2: Connect to mongod. Connect via the mongo.exe shell to the mongod.exe instance. If mongod.exe
has --auth enabled, ensure you connect with the privileges required to create a user.
Step 3: Add Kerberos Principal(s) to MongoDB. Add a Kerberos principal, <username>@<KERBEROS
REALM>, to MongoDB in the $external database. Specify the Kerberos realm in all uppercase. The $external
database allows mongod.exe to consult an external source (e.g. Kerberos) to authenticate. To specify the user's
privileges, assign roles (page 312) to the user.
The following example adds the Kerberos principal reportingapp@EXAMPLE.NET with read-only access to the
records database:
use $external
db.createUser(
{
user: "reportingapp@EXAMPLE.NET",
roles: [ { role: "read", db: "records" } ]
}
)
Add additional principals as needed. For every user you want to authenticate using Kerberos, you must
create a corresponding user in MongoDB. For more information about creating and managing users, see
http://docs.mongodb.org/manual/reference/command/nav-user-management.
Step 4: Start mongod.exe with Kerberos support. You must start mongod.exe as the service principal account (page 364).
To start mongod.exe with Kerberos support, set the mongod.exe parameter authenticationMechanisms
to GSSAPI:
mongod.exe --setParameter authenticationMechanisms=GSSAPI <additional mongod.exe options>
For example, the following starts a standalone mongod.exe instance with Kerberos support:
mongod.exe --auth --setParameter authenticationMechanisms=GSSAPI
Or, alternatively, you can first connect mongo.exe to the mongod.exe, and then from the mongo.exe shell, use
the db.auth() method to authenticate in the $external database.
use $external
db.auth( { mechanism: "GSSAPI", user: "reportingapp@EXAMPLE.NET" } )
Additional Considerations
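Configure mongos.exe for Kerberos To start mongos.exe with Kerberos support, set the mongos.exe parameter authenticationMechanisms to GSSAPI. You must start mongos.exe as the service principal account (page 364). Use the following form:
mongos.exe --setParameter authenticationMechanisms=GSSAPI <additional mongos.exe options>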
For example, the following starts a mongos instance with Kerberos support:
Modify or include any additional mongos.exe options as required for your configuration. For example, instead of
using --keyFile for internal authentication of sharded cluster members, you can use x.509 member authentication (page 350).
Assign Service Principal Name to MongoDB Windows Service Use setspn.exe to assign the service principal
name (SPN) to the account running the mongod.exe and the mongos.exe service:
setspn.exe -A <service>/<fully qualified domain name> <service account name>
For example, if mongod.exe runs as a service named mongodb on testserver.mongodb.com with the service account name mongodtest, assign the SPN as follows:
setspn.exe -A mongodb/testserver.mongodb.com mongodtest
Incorporate Additional Authentication Mechanisms Kerberos authentication (GSSAPI (page 310) (Kerberos))
can work alongside MongoDB's challenge/response authentication mechanisms (SCRAM-SHA-1 (page 309) and
MONGODB-CR (page 309)), MongoDB's authentication mechanism for LDAP (PLAIN (page 310) (LDAP SASL)),
and MongoDB's authentication mechanism for x.509 (MONGODB-X509 (page 309)). Specify the mechanisms as
follows:
--setParameter authenticationMechanisms=GSSAPI,SCRAM-SHA-1
Only add the other mechanisms if in use. This parameter setting does not affect MongoDB's internal authentication of
cluster members.
Authenticate to a MongoDB Instance or Cluster
Overview
To authenticate to a running mongod or mongos instance, you must have user credentials for a resource on that
instance. When you authenticate to MongoDB, you authenticate either to a database or to a cluster. Your user privileges
determine the resource you can authenticate to.
You authenticate to a resource either by:
using the authentication options when connecting to the mongod or mongos instance, or
connecting first and then authenticating to the resource with the authenticate command or the db.auth()
method.
This section describes both approaches.
In general, always use a trusted channel (VPN, SSL, trusted wired network) for connecting to a MongoDB instance.
Prerequisites
You must have user credentials on the database or cluster to which you are authenticating.
Procedures
Step 2: Close the session when your work is complete. To close an authenticated session, use the logout command:
db.runCommand( { logout: 1 } )
Step 3: Authenticate. Use either the authenticate command or the db.auth() method to provide your
username and password to the database. For example:
db.auth( "prodManager", "cleartextPassword" )
Step 4: Close the session when your work is complete. To close an authenticated session, use the logout command:
db.runCommand( { logout: 1 } )
This section describes how to generate a key file to store authentication information. After generating a key file,
specify the key file using the keyFile option when starting a mongod or mongos instance.
A key's length must be between 6 and 1024 characters and may only contain characters in the base64 set. The key
file must not have group or world permissions on UNIX systems. Key file permissions are not checked on Windows
systems.
MongoDB strips whitespace characters (e.g. \x0d, \x09, and \x20) for cross-platform convenience. As a result, the
following operations produce identical keys:
echo
echo
echo
echo
-e
-e
-e
-e
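The length, character-set, and whitespace rules above can be checked before a key file is distributed. A minimal JavaScript sketch; normalizeKey and checkKey are hypothetical helpers. Stripping \x0a (newline) in addition to the characters named above, and applying the 6 to 1024 character limit to the key after whitespace is removed, are assumptions of this sketch.

```javascript
// Strip whitespace as described above. \x0d, \x09 and \x20 come from the
// text; also stripping \x0a (newline) is an assumption of this sketch.
function normalizeKey(contents) {
  return contents.replace(/[\x09\x0a\x0d\x20]/g, "");
}

// Check the documented constraints on the normalized key: 6 to 1024
// characters drawn only from the base64 set (A-Z, a-z, 0-9, +, / and the
// = padding character). Measuring length after stripping is assumed.
function checkKey(contents) {
  const key = normalizeKey(contents);
  if (key.length < 6 || key.length > 1024) {
    return { ok: false, reason: `length ${key.length} is not between 6 and 1024` };
  }
  if (!/^[A-Za-z0-9+\/=]+$/.test(key)) {
    return { ok: false, reason: "contains characters outside the base64 set" };
  }
  return { ok: true, key };
}

// Inputs that differ only in whitespace normalize to the same key.
console.log(normalizeKey("my secret key\n") === normalizeKey("my\r\nsecret\r\nkey\r\n")); // true
```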
Procedure
Step 1: Create a key file. Create the key file your deployment will use to authenticate servers to each other.
To generate pseudo-random data to use for a keyfile, issue the following openssl command:
openssl rand -base64 741 > mongodb-keyfile
chmod 600 mongodb-keyfile
You may generate a key file using any method you choose. Always ensure that the password stored in the key file is
both long and contains a high amount of entropy. Using openssl in this manner helps generate such a key.
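If openssl is not available, a comparable key can be produced with Node's crypto module. A minimal JavaScript sketch, assuming Node.js; generateKeyFileContents and writeKeyFile are hypothetical helpers that mimic openssl rand -base64 741 (printed as 64-character lines) followed by chmod 600.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Rough equivalent of `openssl rand -base64 741`: 741 random bytes encoded as
// base64 (988 characters; 741 is divisible by 3, so there is no "=" padding),
// wrapped at 64 characters per line the way openssl prints it.
function generateKeyFileContents(numBytes = 741) {
  const encoded = crypto.randomBytes(numBytes).toString("base64");
  return encoded.match(/.{1,64}/g).join("\n") + "\n";
}

// Write the key and restrict it to the owner, like `chmod 600`. The explicit
// chmod matters because writeFileSync's mode only applies to newly created
// files and is also filtered by the process umask.
function writeKeyFile(path, contents) {
  fs.writeFileSync(path, contents, { mode: 0o600 });
  fs.chmodSync(path, 0o600);
}

// Usage: writeKeyFile("mongodb-keyfile", generateKeyFileContents());
```

With 741 input bytes the key is 988 base64 characters once line breaks are stripped, which stays within the 1024-character limit described above.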
Step 2: Specify the key file when starting a MongoDB instance. Specify the path to the key file with the keyFile
option.
Troubleshoot Kerberos Authentication on Linux
New in version 2.4.
Kerberos Configuration Checklist
If you have difficulty starting mongod or mongos with Kerberos (page 318) on Linux systems, ensure that:
The mongod and the mongos binaries are from MongoDB Enterprise.
To verify MongoDB Enterprise binaries:
mongod --version
In the output from this command, look for the string modules: subscription or modules:
enterprise to confirm your system has MongoDB Enterprise.
You are not using the HTTP Console56. MongoDB Enterprise does not support Kerberos authentication over the
HTTP Console interface.
Either the service principal name (SPN) in the keytab file (page 319) matches the SPN for the
mongod or mongos instance, or the mongod or mongos instance uses the --setParameter
saslHostName=<host name> option to match the name in the keytab file.
The canonical system hostname of the system that runs the mongod or mongos instance is a resolvable, fully
qualified domain name for this host. You can test the system hostname resolution with the hostname -f command
at the system prompt.
Each host that runs a mongod or mongos instance has both the A and PTR DNS records to provide forward
and reverse lookup. The records allow the host to resolve the components of the Kerberos infrastructure.
Both the Kerberos Key Distribution Center (KDC) and the system running the mongod or mongos instance must
be able to resolve each other using DNS. By default, Kerberos attempts to resolve hosts using the content of the
/etc/krb5.conf file before using DNS to resolve hosts.
The clocks of the systems running the mongod or mongos instances and of the Kerberos infrastructure are synchronized to within the maximum time skew (default is 5 minutes) of each other. Time differences greater than
the maximum time skew will prevent successful authentication.
56 http://docs.mongodb.org/ecosystem/tools/http-interface/#http-console
If you still encounter problems with Kerberos on Linux, you can start both mongod and mongo (or another client)
with the environment variable KRB5_TRACE set to different files to produce more verbose logging of the Kerberos
process to help further troubleshooting. For example, the following starts a standalone mongod with KRB5_TRACE
set:
env KRB5_KTNAME=/opt/mongodb/mongod.keytab \
KRB5_TRACE=/opt/mongodb/log/mongodb-kerberos.log \
/opt/mongodb/bin/mongod --dbpath /opt/mongodb/data \
--fork --logpath /opt/mongodb/log/mongod.log \
--auth --setParameter authenticationMechanisms=GSSAPI
In some situations, MongoDB will return error messages from the GSSAPI interface if there is a problem with the
Kerberos service. Some common error messages are:
GSSAPI error in client while negotiating security context. This error occurs on the
client and reflects insufficient credentials or a malicious attempt to authenticate.
If you receive this error, ensure that you are using the correct credentials and the correct fully qualified domain
name when connecting to the host.
GSSAPI error acquiring credentials. This error occurs during the start of the mongod or mongos
and reflects improper configuration of the system hostname or a missing or incorrectly configured keytab file.
If you encounter this problem, consider the items in the Kerberos Configuration Checklist (page 366), in particular, whether the SPN in the keytab file (page 319) matches the SPN for the mongod or mongos instance.
To determine whether the SPNs match:
1. Examine the keytab file, with the following command:
klist -k <keytab>
Ensure that this name matches the name in the keytab file, or start mongod or mongos with the
--setParameter saslHostName=<hostname>.
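Comparing keytab principals against the host name by eye is error-prone when a keytab holds many entries. A minimal JavaScript sketch that parses klist -k output; keytabPrincipals and keytabHasHost are hypothetical helpers. The MIT Kerberos output layout (a key version number followed by the principal on each entry line) is an assumption, and so is mongodb as the service name.

```javascript
// Hypothetical helper: list the principals in `klist -k <keytab>` output.
// Assumes each entry line starts with the key version number (KVNO); the
// first token containing "@" on that line is taken as the principal, so
// extra columns such as timestamps or encryption types are skipped.
function keytabPrincipals(klistOutput) {
  const principals = new Set();
  for (const line of klistOutput.split(/\r?\n/)) {
    const match = line.match(/^\s*\d+\s.*?(\S+@\S+)/);
    if (match) principals.add(match[1]);
  }
  return [...principals];
}

// Does the keytab hold a service principal for this host? The service name
// "mongodb" is an assumed default; principal names are case-sensitive, so
// the comparison is exact.
function keytabHasHost(klistOutput, fqdn, service = "mongodb") {
  const prefix = `${service}/${fqdn}@`;
  return keytabPrincipals(klistOutput).some(p => p.startsWith(prefix));
}
```

Pass the output of hostname -f as fqdn; a false result points to the SPN mismatch described in this checklist.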
See also:
Kerberos Authentication (page 318)
Configure MongoDB with Kerberos Authentication on Linux (page 359)
Configure MongoDB with Kerberos Authentication on Windows (page 362)
Implement Field Level Redaction
The $redact pipeline operator restricts the contents of the documents based on information stored in the documents
themselves.
To store the access criteria data, add a field to the documents and embedded documents. To allow for multiple combinations of access levels for the same data, consider setting the access field to an array of arrays. Each array element
contains a required set that allows a user with that set to access the data.
Then, include the $redact stage in the db.collection.aggregate() operation to restrict contents of the
result set based on the access required to view the data.
For more information on the $redact pipeline operator, including its syntax and associated system variables as well
as additional examples, see $redact.
Procedure
For example, a forecasts collection contains documents of the following form where the tags field determines
the access levels required to view the data:
{
_id: 1,
title: "123 Department Report",
tags: [ [ "G" ], [ "FDW" ] ],
year: 2014,
subsections: [
{
subtitle: "Section 1: Overview",
tags: [ [ "SI", "G" ], [ "FDW" ] ],
content: "Section 1: This is the content of section 1."
},
{
subtitle: "Section 2: Analysis",
tags: [ [ "STLW" ] ],
For each document, the tags field contains various access groupings necessary to view the data. For example, the
value [ [ "G" ], [ "FDW", "TGE" ] ] can specify that a user requires either access level ["G"] or both [
"FDW", "TGE" ] to view the data.
Consider a user who only has access to view information tagged with either "FDW" or "TGE". To run a query on all
documents with year 2014 for this user, include a $redact stage as in the following:
var userAccess = [ "FDW", "TGE" ];
db.forecasts.aggregate(
[
{ $match: { year: 2014 } },
{ $redact:
{
$cond: {
if: { $anyElementTrue:
{
$map: {
input: "$tags" ,
as: "fieldTag",
in: { $setIsSubset: [ "$$fieldTag", userAccess ] }
}
}
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}
]
)
The aggregation operation returns the following redacted document for the user:
{ "_id" : 1,
"title" : "123 Department Report",
"tags" : [ [ "G" ], [ "FDW" ] ],
"year" : 2014,
"subsections" :
[
{
"subtitle" : "Section 1: Overview",
"tags" : [ [ "SI", "G" ], [ "FDW" ] ],
"content" : "Section 1: This is the content of section 1."
},
{
"subtitle" : "Section 3: Budgeting",
See also:
$map, $setIsSubset, $anyElementTrue
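To check by hand which parts of a document a given set of access levels can see, the $redact condition above can be mirrored in plain JavaScript. This is a sketch of the evaluation logic only, not MongoDB's implementation. It assumes, as in the example, that every document and embedded document carries a tags array, and it treats every non-array object as an embedded document.

```javascript
const PRUNED = Symbol("pruned");

// $setIsSubset: true when every element of `subset` is in `set`.
function setIsSubset(subset, set) {
  return subset.every(item => set.includes(item));
}

// The $cond test: $anyElementTrue over $map(... $setIsSubset ...).
function canView(tags, userAccess) {
  if (!Array.isArray(tags)) {
    throw new Error("this sketch expects a tags array on every (embedded) document");
  }
  return tags.some(fieldTag => setIsSubset(fieldTag, userAccess));
}

// Documents are kept and descended into ($$DESCEND) or dropped ($$PRUNE);
// arrays are walked element by element; other values pass through.
function redactValue(value, userAccess) {
  if (Array.isArray(value)) {
    return value
      .map(item => redactValue(item, userAccess))
      .filter(item => item !== PRUNED);
  }
  if (value !== null && typeof value === "object") {
    if (!canView(value.tags, userAccess)) return PRUNED;
    const kept = {};
    for (const [key, field] of Object.entries(value)) {
      const result = redactValue(field, userAccess);
      if (result !== PRUNED) kept[key] = result;
    }
    return kept;
  }
  return value;
}

// Returns the redacted document, or null if the whole document is pruned.
function redact(doc, userAccess) {
  const result = redactValue(doc, userAccess);
  return result === PRUNED ? null : result;
}

// A shortened version of the forecasts document above.
const forecast = {
  _id: 1,
  title: "123 Department Report",
  tags: [["G"], ["FDW"]],
  year: 2014,
  subsections: [
    { subtitle: "Section 1: Overview", tags: [["SI", "G"], ["FDW"]],
      content: "Section 1: This is the content of section 1." },
    { subtitle: "Section 2: Analysis", tags: [["STLW"]] }
  ]
};
console.log(JSON.stringify(redact(forecast, ["FDW", "TGE"]), null, 2));
```

For the user with access ["FDW", "TGE"], the sketch keeps Section 1 and prunes Section 2, which agrees with the redacted result of the aggregation.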
User administrators create users and create and assign roles. A user administrator can grant any privilege in the
database and can create new ones. In a MongoDB deployment, create the user administrator as the first user. Then let
this user create all other users.
To provide user administrators, MongoDB has userAdmin (page 392) and userAdminAnyDatabase (page 396)
roles, which grant access to actions (page 403) that support user and role management. Following the policy of least
privilege, userAdmin (page 392) and userAdminAnyDatabase (page 396) confer no additional privileges.
Carefully control access to these roles. A user with either of these roles can grant itself unlimited additional privileges.
Specifically, a user with the userAdmin (page 392) role can grant itself any privilege in the database. A user assigned
either the userAdmin (page 392) role on the admin database or the userAdminAnyDatabase (page 396) can
grant itself any privilege in the system.
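The escalation rules in this paragraph can be turned into a simple audit over the roles assigned to each user. A minimal JavaScript sketch; escalationScopes is a hypothetical helper that only recognizes the built-in userAdmin and userAdminAnyDatabase roles in the { role, db } form, so custom roles that carry the grantRole action are not detected.

```javascript
// Hypothetical audit helper based on the paragraph above. Given a user's
// roles array in the { role, db } form used with db.createUser(), report
// where the user could grant itself additional privileges:
//   ["system"]            userAdminAnyDatabase, or userAdmin on admin
//   ["database:<name>"]   userAdmin on that database
//   []                    neither role
function escalationScopes(roles) {
  const scopes = new Set();
  for (const { role, db } of roles) {
    if (role === "userAdminAnyDatabase" || (role === "userAdmin" && db === "admin")) {
      return ["system"]; // can grant itself any privilege in the system
    }
    if (role === "userAdmin") {
      scopes.add("database:" + db); // any privilege in that database
    }
  }
  return [...scopes];
}

console.log(escalationScopes([{ role: "userAdmin", db: "records" }]));
```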
Prerequisites
Required Access You must have the createUser (page 404) action (page 403) on a database to create a new user
on that database.
You must have the grantRole (page 404) action (page 403) on a role's database to grant the role to another user.
If you have the userAdmin (page 392) or userAdminAnyDatabase (page 396) role, you have those actions.
First User Restrictions If your MongoDB deployment has no users, you must connect to mongod using the localhost exception (page 311) or use the --noauth option when starting mongod to gain full access to the system. Once
you have access, you can skip to Creating the system user administrator in this procedure.
If users exist in the MongoDB database, but none of them has the appropriate prerequisites to create a new user or you
do not have access to them, you must restart mongod with the --noauth option.
Procedure
Step 1: Connect to MongoDB with the appropriate privileges. Connect to mongod or mongos either through
the localhost exception (page 311) or as a user with the privileges indicated in the prerequisites section.
In the following example, manager has the required privileges specified in Prerequisites (page 371).
mongo --port 27017 -u manager -p 123456 --authenticationDatabase admin
The following example creates the user siteUserAdmin on the admin database:
use admin
db.createUser(
{
user: "siteUserAdmin",
pwd: "password",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
Step 3: Create a user administrator for a single database. Optionally, you may want to create user administrators
that only have access to administer users in a specific database by way of the userAdmin (page 392) role.
The following example creates the user recordsUserAdmin on the records database:
use records
db.createUser(
{
user: "recordsUserAdmin",
pwd: "password",
roles: [ { role: "userAdmin", db: "records" } ]
}
)
Related Documents
Each application and user of a MongoDB system should map to a distinct application or administrator. This access
isolation facilitates access revocation and ongoing user maintenance. At the same time, users should have only the
minimal set of privileges required, to ensure a system of least privilege.
To create a user, you must define the user's credentials and assign that user roles (page 312). Credentials verify the
user's identity to a database, and roles determine the user's access to database resources and operations.
For an overview of credentials and roles in MongoDB see Security Introduction (page 305).
Considerations
For users that authenticate using external mechanisms, 60 you do not need to provide credentials when creating users.
For all users, select the roles that have the exact required privileges (page 312). If the correct roles do not exist, create
roles (page 374).
You can create a user without assigning roles, choosing instead to assign the roles later. To do so, create the user with
an empty roles (page 401) array.
Prerequisites
To create a user on a system that uses authentication (page 308), you must authenticate as a user administrator. If you
have not yet created a user administrator, do so as described in Create a User Administrator (page 370).
57 https://www.mongodb.com/lp/white-paper/mongodb-security-architecture
58 http://www.mongodb.com/webinar/securing-your-mongodb-deployment
59 https://www.mongodb.com/presentations/creating-single-view-part-3-securing-your-deployment
60 Configure MongoDB with Kerberos Authentication on Linux (page 359), Authenticate Using SASL and LDAP with OpenLDAP (page 356),
Authenticate Using SASL and LDAP with ActiveDirectory (page 354), and x.509 certificates provide external authentication mechanisms.
Required Access You must have the createUser (page 404) action (page 403) on a database to create a new user
on that database.
You must have the grantRole (page 404) action (page 403) on a role's database to grant the role to another user.
If you have the userAdmin (page 392) or userAdminAnyDatabase (page 396) role, you have those actions.
First User Restrictions If your MongoDB deployment has no users, you must connect to mongod using the localhost exception (page 311) or use the --noauth option when starting mongod to gain full access to the system. Once
you have access, you can create a user administrator as described in Create a User Administrator (page 370).
If users exist in the MongoDB database, but none of them has the appropriate prerequisites to create a new user or you
do not have access to them, you must restart mongod with the --noauth option.
Procedures
Step 1: Connect to MongoDB with the appropriate privileges. Connect to the mongod or mongos with the
privileges specified in the Prerequisites (page 372) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Create the new user. Create the user in the database to which the user will belong. Pass a well-formed user
document to the db.createUser() method.
The following operation creates a user in the reporting database with the specified name, password, and roles.
use reporting
db.createUser(
{
user: "reportsUser",
pwd: "12345678",
roles: [
{ role: "read", db: "reporting" },
{ role: "read", db: "products" },
{ role: "read", db: "sales" },
{ role: "readWrite", db: "accounts" }
]
}
)
To authenticate the reportsUser, you must authenticate the user in the reporting database.
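MongoDB itself decides what counts as a well-formed user document. The Node.js sketch below only approximates the shape used in this section. It covers the empty roles array mentioned under Considerations and the external-authentication case, where pwd is omitted. `isWellFormedUserDoc` and its `external` option are hypothetical.

```javascript
// Hypothetical, approximate shape check; MongoDB performs the real validation.
function isWellFormedUserDoc(doc, { external = false } = {}) {
  if (typeof doc.user !== "string" || doc.user.length === 0) return false;
  // Externally authenticated users are created without a password.
  if (!external && (typeof doc.pwd !== "string" || doc.pwd.length === 0)) return false;
  // roles must be an array (possibly empty) of role names or { role, db } documents.
  if (!Array.isArray(doc.roles)) return false;
  return doc.roles.every(
    (r) =>
      typeof r === "string" ||
      (r !== null && typeof r === "object" &&
        typeof r.role === "string" && typeof r.db === "string")
  );
}
```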
Create an Administrative User with Unrestricted Access
Overview
Most users should have only the minimal set of privileges required for their operations, in keeping with the policy of
least privilege. However, some authorization architectures may require a user with unrestricted access. To support
these super users, you can create users with access to all database resources (page 402) and actions (page 403).
For many deployments, you may be able to avoid having any users with unrestricted access by having an administrative
user with the createUser (page 404) and grantRole (page 404) actions granted as needed to support operations.
If users truly need unrestricted access to a MongoDB deployment, MongoDB provides a built-in role (page 390) named
root (page 397) that grants the combined privileges of all built-in roles. This document describes how to create an
administrative user with the root (page 397) role.
For descriptions of the access each built-in role provides, see the section on built-in roles (page 390).
Prerequisites
Required Access You must have the createUser (page 404) action (page 403) on a database to create a new user
on that database.
You must have the grantRole (page 404) action (page 403) on a role's database to grant the role to another user.
If you have the userAdmin (page 392) or userAdminAnyDatabase (page 396) role, you have those actions.
First User Restrictions If your MongoDB deployment has no users, you must connect to mongod using the localhost exception (page 311) or use the --noauth option when starting mongod to gain full access to the system. Once
you have access, you can skip to Step 2 in this procedure.
If users exist in the MongoDB database, but none of them has the appropriate prerequisites to create a new user or you
do not have access to them, you must restart mongod with the --noauth option.
Procedure
Step 1: Connect to MongoDB with the appropriate privileges. Connect to the mongod or mongos as a user
with the privileges specified in the Prerequisites (page 374) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Create the administrative user. In the admin database, create a new user using the db.createUser()
method. Give the user the built-in root (page 397) role.
For example:
use admin
db.createUser(
{
user: "superuser",
pwd: "12345678",
roles: [ "root" ]
}
)
Authenticate against the admin database to test the new user account. Use db.auth() while using the admin
database or use the mongo shell with the --authenticationDatabase option.
Create a Role
Overview
Roles grant users access to MongoDB resources. By default, MongoDB provides a number of built-in roles (page 390)
that administrators may use to control access to a MongoDB system. However, if these roles cannot describe the
desired set of privileges, you can create a new, customized role in a particular database.
Except for roles created in the admin database, a role can only include privileges that apply to its database and can
only inherit from other roles in its database.
A role created in the admin database can include privileges that apply to the admin database, other databases or to
the cluster (page 403) resource, and can inherit from roles in other databases as well as the admin database.
MongoDB uses the combination of the database name and the role name to uniquely define a role.
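The scoping rule above can be sketched as a predicate. In this hypothetical Node.js helper, `isRoleAllowedIn` checks that a role created outside admin targets only its own database and inherits only from roles in that database. A resource of `{ db: "", collection: "" }` (all databases) or `{ cluster: true }` therefore fails outside admin. This is a simplified model, not MongoDB's actual validation, and it treats a bare role name as a role in the same database.

```javascript
// Simplified model of the role-scoping rule above; not MongoDB's validation.
function isRoleAllowedIn(roleDb, roleDoc) {
  // Roles in admin may reference any resource and inherit from any database.
  if (roleDb === "admin") return true;
  // Outside admin, every privilege must target this database only; this
  // rejects { cluster: true } and { db: "", collection: "" } (all databases).
  const privilegesOk = roleDoc.privileges.every(
    ({ resource }) => !resource.cluster && resource.db === roleDb
  );
  // Inherited roles must live in the same database (a bare name does).
  const inheritsOk = roleDoc.roles.every(
    (r) => typeof r === "string" || r.db === roleDb
  );
  return privilegesOk && inheritsOk;
}
```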
Prerequisites
To create a new role, use the db.createRole() method, specifying the privileges in the privileges array and
the inherited roles in the roles array.
Create a Role to Manage Current Operations The following example creates a role named manageOpRole
which provides only the privileges to run both db.currentOp() and db.killOp(). 61
Step 1: Connect to MongoDB with the appropriate privileges. Connect to mongod or mongos with the privileges specified in the Prerequisites (page 375) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
The siteUserAdmin has privileges to create roles in the admin database as well as in other databases.
Step 2: Create a new role to manage current operations. manageOpRole has privileges that act on multiple
databases as well as the cluster resource (page 403). As such, you must create the role in the admin database.
use admin
db.createRole(
{
role: "manageOpRole",
privileges: [
{ resource: { cluster: true }, actions: [ "killop", "inprog" ] },
{ resource: { db: "", collection: "" }, actions: [ "killCursors" ] }
],
roles: []
}
)
61 The built-in role clusterMonitor (page 393) also provides the privilege to run db.currentOp() along with other privileges, and the
built-in role hostManager (page 394) provides the privilege to run db.killOp() along with other privileges.
Create a Role to Run mongostat The following example creates a role named mongostatRole that provides
only the privileges to run mongostat. 62
Step 1: Connect to MongoDB with the appropriate privileges. Connect to mongod or mongos with the privileges specified in the Prerequisites (page 375) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
The siteUserAdmin has privileges to create roles in the admin database as well as in other databases.
Step 2: Create a new role to run mongostat. mongostatRole has privileges that act on the cluster
resource (page 403). As such, you must create the role in the admin database.
use admin
db.createRole(
{
role: "mongostatRole",
privileges: [
{ resource: { cluster: true }, actions: [ "serverStatus" ] }
],
roles: []
}
)
A role provides a user privileges to perform a set of actions (page 403) on a resource (page 402). A user can have
multiple roles.
In MongoDB systems with authorization enforced, you must grant a user a role for the user to access a database
resource. To assign a role, first determine the privileges the user needs and then determine the role that grants those
privileges.
For an overview of roles and privileges, see Authorization (page 312). For descriptions of the access each built-in role
provides, see the section on built-in roles (page 390).
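Choosing roles from the privileges a user needs can be pictured as a lookup over a role catalog. The Node.js sketch below is hypothetical: `rolesCovering` and the `findOnly` role are invented for illustration, and the catalog lists actions on a single resource only. The actions listed for read are the ones shown in the db.getRole() output later in this chapter.

```javascript
// Hypothetical helper; catalog maps a role name to the actions it grants
// on one resource (here, all non-system collections of one database).
function rolesCovering(requiredActions, catalog) {
  return Object.entries(catalog)
    .filter(([, actions]) => requiredActions.every((a) => actions.includes(a)))
    // Prefer roles with the fewest extra actions, in the spirit of least privilege.
    .map(([name, actions]) => ({
      name,
      extra: actions.filter((a) => !requiredActions.includes(a)).length,
    }))
    .sort((x, y) => x.extra - y.extra)
    .map((r) => r.name);
}

const catalog = {
  // Actions of the built-in read role, as listed in the db.getRole() output in this chapter.
  read: ["collStats", "dbHash", "dbStats", "find", "killCursors", "planCacheRead"],
  findOnly: ["find"], // hypothetical user-defined role
};
```

If no role covers the required actions, the result is empty. That is the case where you would create a new role (page 374).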
62 The built-in role clusterMonitor (page 393) also provides the privilege to run mongostat along with other privileges.
Prerequisites
You must have the grantRole (page 404) action (page 403) on a database to grant a role on that database.
To view a role's information, you must be explicitly granted the role or must have the viewRole (page 405) action
(page 403) on the role's database.
Procedure
Step 1: Connect with the privilege to grant roles. Connect to the mongod or mongos as a user with the privileges
specified in the Prerequisites (page 377) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Identify the user's roles and privileges. To display the roles and privileges of the user to be modified, use
the db.getUser() and db.getRole() methods.
For example, to view roles for reportsUser created in Add a User to a Database (page 372), issue:
use reporting
db.getUser("reportsUser")
To display the privileges granted to the user by the readWrite role on the "accounts" database, issue:
use accounts
db.getRole( "readWrite", { showPrivileges: true } )
Step 3: Identify the privileges to grant or revoke. If the user requires additional privileges, grant to the user the
role, or roles, with the required set of privileges. If such a role does not exist, create a new role (page 374) with the
appropriate set of privileges.
Step 4: Grant a role to a user. Grant the user the role using the db.grantRolesToUser() method.
For example, the following grants new roles to the user reportsUser created in Add a User to a Database
(page 372).
use reporting
db.grantRolesToUser(
"reportsUser",
[
{ role: "readWrite", db: "products" } ,
{ role: "readAnyDatabase", db:"admin" }
]
)
A user's privileges determine the access the user has to MongoDB resources (page 402) and the actions (page 403)
that user can perform. Users receive privileges through role assignments. A user can have multiple roles, and each
role can have multiple privileges.
6.3. Security Tutorials
To view a role's information, you must be explicitly granted the role or must have the viewRole (page 405) action
(page 403) on the role's database.
Procedure
Step 1: Connect to MongoDB with the appropriate privileges. Connect to mongod or mongos as a user with
the privileges specified in the prerequisite section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Identify the user's roles. Use the usersInfo command or db.getUser() method to display user
information.
For example, to view roles for reportsUser created in Add a User to a Database (page 372), issue:
use reporting
db.getUser("reportsUser")
In the returned document, the roles (page 401) field displays all roles for reportsUser:
...
"roles" : [
   { "role" : "readWrite", "db" : "accounts" },
   { "role" : "read", "db" : "reporting" },
   { "role" : "read", "db" : "products" },
   { "role" : "read", "db" : "sales" }
]
Step 3: Identify the privileges granted by the roles. For a given role, use the db.getRole() method, or the
rolesInfo command, with the showPrivileges option:
For example, to view the privileges granted by the read role on the products database, issue the following operation:
use products
db.getRole( "read", { showPrivileges: true } )
The returned document contains the privileges and inheritedPrivileges arrays. The privileges array lists
the privileges directly specified by the role and excludes those privileges inherited from other roles. The
inheritedPrivileges array lists all privileges granted by this role, both directly specified and inherited. If the role
does not inherit from other roles, the two fields are the same.
...
"privileges" : [
{
"resource": { "db" : "products", "collection" : "" },
"actions": [ "collStats","dbHash","dbStats","find","killCursors","planCacheRead" ]
},
{
"resource" : { "db" : "products", "collection" : "system.js" },
"actions": [ "collStats","dbHash","dbStats","find","killCursors","planCacheRead" ]
}
],
"inheritedPrivileges" : [
{
"resource": { "db" : "products", "collection" : "" },
"actions": [ "collStats","dbHash","dbStats","find","killCursors","planCacheRead" ]
},
{
"resource" : { "db" : "products", "collection" : "system.js" },
"actions": [ "collStats","dbHash","dbStats","find","killCursors","planCacheRead" ]
}
]
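The relationship between the two arrays can be modeled as a union. Under that assumption, inheritedPrivileges merges a role's own privileges with those of every role it inherits from, grouped by resource. The Node.js sketch below is a simplified model; `inheritedPrivileges`, the `roleDefs` map, and the `reporter` role used in the check are hypothetical.

```javascript
// Simplified model: a role's inheritedPrivileges is the union of its own
// privileges and the inherited privileges of every role it inherits from.
// roleDefs maps "db.role" keys to { privileges, roles }; not a MongoDB API.
function inheritedPrivileges(key, roleDefs, seen = new Set()) {
  if (seen.has(key)) return []; // already merged, or a cycle
  seen.add(key);
  const def = roleDefs[key];
  const merged = new Map(); // resource (serialized) -> Set of actions
  const add = (privileges) => {
    for (const { resource, actions } of privileges) {
      const k = JSON.stringify(resource);
      if (!merged.has(k)) merged.set(k, new Set());
      for (const a of actions) merged.get(k).add(a);
    }
  };
  add(def.privileges);
  for (const { role, db } of def.roles) {
    add(inheritedPrivileges(`${db}.${role}`, roleDefs, seen));
  }
  return [...merged].map(([k, actions]) => ({
    resource: JSON.parse(k),
    actions: [...actions].sort(),
  }));
}
```

For a role with an empty roles array, the result contains only the role's own privileges. This matches the note that the two fields are then the same.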
When a user's responsibilities change, modify the user's access to include only those roles the user requires. This
follows the policy of least privilege.
To change a user's access, first determine the privileges the user needs and then determine the roles that grant those
privileges. Grant and revoke roles using the db.grantRolesToUser() and db.revokeRolesFromUser()
methods.
For an overview of roles and privileges, see Authorization (page 312). For descriptions of the access each built-in role
provides, see the section on built-in roles (page 390).
Prerequisites
You must have the grantRole (page 404) action (page 403) on a database to grant a role on that database.
You must have the revokeRole (page 405) action (page 403) on a database to revoke a role on that database.
To view a role's information, you must be explicitly granted the role or must have the viewRole (page 405) action
(page 403) on the role's database.
Procedure
Step 1: Connect to MongoDB with the appropriate privileges. Connect to mongod or mongos as a user with
the privileges specified in the prerequisite section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Identify the user's roles and privileges. To display the roles and privileges of the user to be modified, use
the db.getUser() and db.getRole() methods.
For example, to view roles for reportsUser created in Add a User to a Database (page 372), issue:
use reporting
db.getUser("reportsUser")
To display the privileges granted to the user by the readWrite role on the "accounts" database, issue:
use accounts
db.getRole( "readWrite", { showPrivileges: true } )
Step 3: Identify the privileges to grant or revoke. If the user requires additional privileges, grant to the user the
role, or roles, with the required set of privileges. If such a role does not exist, create a new role (page 374) with the
appropriate set of privileges.
To revoke a subset of privileges provided by an existing role: revoke the original role and grant a role that contains
only the required privileges. You may need to create a new role (page 374) if a role does not exist.
Step 4: Modify the user's access.
Revoke a Role Revoke a role with the db.revokeRolesFromUser() method. The following example operation removes the readWrite (page 390) role on the accounts database from the reportsUser:
use reporting
db.revokeRolesFromUser(
"reportsUser",
[
{ role: "readWrite", db: "accounts" }
]
)
Grant a Role Grant a role using the db.grantRolesToUser() method. For example, the following operation
grants the reportsUser user the read (page 390) role on the accounts database:
use reporting
db.grantRolesToUser(
"reportsUser",
[
{ role: "read", db: "accounts" }
]
)
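The revoke-then-grant approach from Step 3 can be sketched on a plain array of role documents. `applyRoleChanges` is a hypothetical Node.js helper that mimics the net effect of db.revokeRolesFromUser() followed by db.grantRolesToUser(). It does not talk to a server.

```javascript
// Hypothetical helper: net effect of db.revokeRolesFromUser() followed by
// db.grantRolesToUser() on an array of { role, db } documents (no server calls).
function applyRoleChanges(current, revoke, grant) {
  const key = ({ role, db }) => `${db}.${role}`;
  const revoked = new Set(revoke.map(key));
  const result = current.filter((r) => !revoked.has(key(r)));
  for (const r of grant) {
    // Granting a role the user already has changes nothing.
    if (!result.some((x) => key(x) === key(r))) result.push(r);
  }
  return result;
}
```

Applied to reportsUser's roles, revoking readWrite on accounts and granting read on accounts leaves the three read roles plus read on accounts.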
For sharded clusters, the changes to the user are instant on the mongos on which the command runs. However, other mongos instances in the cluster may take up to 10 minutes to refresh their user cache. See
userCacheInvalidationIntervalSecs.
View Roles
Overview
A role (page 312) grants privileges to the users who are assigned the role. Each role is scoped to a particular
database, but MongoDB stores all role information in the admin.system.roles (page 287) collection in the
admin database.
Prerequisites
To view a role's information, you must be explicitly granted the role or must have the viewRole (page 405) action
(page 403) on the role's database.
Procedures
The following procedures use the rolesInfo command. You also can use the methods db.getRole() (singular)
and db.getRoles().
View a Role in the Current Database If the role is in the current database, you can refer to the role by name, as for
the role dataEntry on the current database:
db.runCommand({ rolesInfo: "dataEntry" })
If the role is in a different database, specify the role as a document of the form { role: "<role name>", db: "<role db>" }.
To view the custom appWriter role in the orders database, issue the following command from the mongo shell:
db.runCommand({ rolesInfo: { role: "appWriter", db: "orders" } })
View Multiple Roles To view information for multiple roles, specify each role as a document or string in an array.
To view the custom appWriter and clientWriter roles in the orders database, as well as the dataEntry
role on the current database, use the following command from the mongo shell:
db.runCommand( { rolesInfo: [ { role: "appWriter", db: "orders" },
{ role: "clientWriter", db: "orders" },
"dataEntry" ]
} )
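Because rolesInfo accepts a role name, a document, or an array of either, a small builder can keep the forms straight. `rolesInfoCommand` below is a hypothetical Node.js helper. The showPrivileges field mirrors the option used with db.getRole() earlier in this chapter; confirm it against the rolesInfo reference for your version.

```javascript
// Hypothetical builder for the rolesInfo command documents shown above.
// roles: a role name (current database), a { role, db } document, or an array of either.
function rolesInfoCommand(roles, { showPrivileges = false } = {}) {
  const list = Array.isArray(roles) ? roles : [roles];
  if (list.length === 0) throw new Error("specify at least one role");
  const cmd = { rolesInfo: list.length === 1 ? list[0] : list };
  if (showPrivileges) cmd.showPrivileges = true; // also return privilege details
  return cmd;
}
```

In the mongo shell, the resulting document is what you would pass to db.runCommand(), as in the examples above.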
To view all custom roles, query the admin.system.roles (page 398) collection directly, for example:
db = db.getSiblingDB('admin')
db.system.roles.find()
Strong passwords help prevent unauthorized access, and all users should have strong passwords. You can use the
openssl program to generate unique strings for use in passwords, as in the following command:
openssl rand -base64 48
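If openssl is not available, Node.js can produce an equivalent string. The sketch below is intended as an equivalent of `openssl rand -base64 48`, not a MongoDB tool. crypto.randomBytes draws from a cryptographically strong source, and 48 bytes encode to exactly 64 base64 characters.

```javascript
// Node.js counterpart (an assumption, not part of MongoDB) to `openssl rand -base64 48`.
const crypto = require("crypto");

function generatePassword(numBytes = 48) {
  // randomBytes draws from a cryptographically strong pseudo-random source.
  return crypto.randomBytes(numBytes).toString("base64");
}
```

The result can contain + and / characters, so quote it when passing it on a command line.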
Prerequisites
You must have the changeAnyPassword action (page 403) on a database to modify the password of any user on
that database.
To change your own password, you must have the changeOwnPassword (page 404) action (page 403) on your
database. See Change Your Password and Custom Data (page 382).
Procedure
Step 1: Connect to MongoDB with the appropriate privileges. Connect to the mongod or mongos with the
privileges specified in the Prerequisites (page 381) section.
The following procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Change the password. Pass the user's username and the new password to the db.changeUserPassword() method.
Users with appropriate privileges can change their own passwords and custom data. Custom data (page 401) stores
optional user information.
Considerations
To generate a strong password for use in this procedure, you can use the openssl utility's rand command. For
example, issue openssl rand with the following options to create a base64-encoded string of 48 pseudo-random
bytes:
openssl rand -base64 48
Prerequisites
To modify your own password and custom data, you must have privileges that grant the changeOwnPassword
(page 404) and changeOwnCustomData (page 404) actions (page 403), respectively, on the user's database.
Step 1: Connect as a user with privileges to manage users and roles. Connect to the mongod or mongos with
privileges to manage users and roles, such as a user with userAdminAnyDatabase (page 396) role. The following
procedure uses the siteUserAdmin created in Create a User Administrator (page 370).
mongo --port 27017 -u siteUserAdmin -p password --authenticationDatabase admin
Step 2: Create a role with appropriate privileges. In the admin database, create a new role with the
changeOwnPassword (page 404) and changeOwnCustomData (page 404) actions.
use admin
db.createRole(
{ role: "changeOwnPasswordCustomDataRole",
privileges: [
{
resource: { db: "", collection: ""},
actions: [ "changeOwnPassword", "changeOwnCustomData" ]
}
],
roles: []
}
)
Step 3: Add a user with this role. In the test database, create a new user with the created
"changeOwnPasswordCustomDataRole" role. For example, the following operation creates a user with both
the built-in role readWrite (page 390) and the user-created "changeOwnPasswordCustomDataRole".
use test
db.createUser(
{
user:"user123",
pwd:"12345678",
roles:[ "readWrite", { role:"changeOwnPasswordCustomDataRole", db:"admin" } ]
}
)
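This example, like the root example earlier, gives some roles as bare names. When db.createUser() receives a bare name, it refers to the role of that name in the database where the user is created, so "readWrite" here means readWrite on test. The hypothetical Node.js helper below spells out that expansion. It models the rule and is not shell code.

```javascript
// Hypothetical model of how db.createUser() reads role entries: a bare
// name means the role of that name in the database where the user is created.
function normalizeRoles(roles, currentDb) {
  return roles.map((r) => (typeof r === "string" ? { role: r, db: currentDb } : r));
}
```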
Step 1: Connect with the appropriate privileges. Connect to the mongod or mongos as a user with appropriate
privileges.
For example, the following operation connects to MongoDB as user123 created in the Prerequisites (page 382)
section.
mongo --port 27017 -u user123 -p 12345678 --authenticationDatabase test
To check that you have the privileges specified in the Prerequisites (page 382) section as well as to see user information,
use the usersInfo command with the showPrivileges option.
Step 2: Change your password and custom data. Use the db.updateUser() method to update the password
and custom data.
For example, the following operation changes the user's password to KNlZmiaNUp0B and custom data to
{ title: "Senior Manager" }:
use test
db.updateUser(
"user123",
{
pwd: "KNlZmiaNUp0B",
customData: { title: "Senior Manager" }
}
)
To enable auditing and print audit events to the syslog (option is unavailable on Windows) in JSON format, specify
syslog for the --auditDestination setting. For example:
mongod --dbpath data/db --auditDestination syslog
Warning: The syslog message limit can result in the truncation of the audit messages. The auditing system will
neither detect the truncation nor error upon its occurrence.
You may also specify these options in the configuration file:
storage:
dbPath: data/db
auditLog:
destination: syslog
Output to Console
To enable auditing and print the audit events to standard output (i.e. stdout), specify console for the
--auditDestination setting. For example:
mongod --dbpath data/db --auditDestination console
To enable auditing and print audit events to a file in JSON format, specify file for the --auditDestination setting, JSON for the --auditFormat setting, and the output filename for the --auditPath. The --auditPath
option accepts either a full or a relative path name. For example, the following enables auditing and records
audit events to a file with the relative path name of data/db/auditLog.json:
mongod --dbpath data/db --auditDestination file --auditFormat JSON --auditPath data/db/auditLog.json
The audit file rotates at the same time as the server log file.
You may also specify these options in the configuration file:
storage:
dbPath: data/db
auditLog:
destination: file
format: JSON
path: data/db/auditLog.json
Note: Printing audit events to a file in JSON format degrades server performance more than printing to a file in BSON
format.
To enable auditing and print audit events to a file in BSON binary format, specify file for the
--auditDestination setting, BSON for the --auditFormat setting, and the output filename for the
--auditPath. The --auditPath option accepts either a full or a relative path name. For example, the following enables auditing and records audit events to a BSON file with the relative path name of
data/db/auditLog.bson:
mongod --dbpath data/db --auditDestination file --auditFormat BSON --auditPath data/db/auditLog.bson
The audit file rotates at the same time as the server log file.
You may also specify these options in the configuration file:
storage:
dbPath: data/db
auditLog:
destination: file
format: BSON
path: data/db/auditLog.bson
To view the contents of the file, pass the file to the MongoDB utility bsondump. For example, the following converts
the audit log into a human-readable form and outputs it to the terminal:
bsondump data/db/auditLog.bson
Filter Events
By default, the audit facility records all auditable operations as detailed in Audit Event Actions, Details, and Results
(page 410). The audit feature has an --auditFilter option to determine which events to record.
The --auditFilter option takes a string representation of a query document of the form:
{ <field1>: <expression1>, ... }
The <field> can be any field in the audit message (page 409), including fields returned in the param
(page 410) document.
The <expression> is a query condition expression.
To specify an audit filter, enclose the filter document in single quotes to pass the document as a string.
To specify the audit filter in a configuration file, you must use the YAML format of the configuration file.
Filter for Multiple Operation Types
The following example uses the filter { atype: { $in: [ "createCollection", "dropCollection" ] } } to audit
only the createCollection (page 404) and dropCollection (page 404) actions.
To specify an audit filter, enclose the filter document in single quotes to pass the document as a string.
mongod --dbpath data/db --auditDestination file --auditFilter '{ atype: { $in: [ "createCollection",
To specify the audit filter in a configuration file, you must use the YAML format of the configuration file.
storage:
dbPath: data/db
auditLog:
destination: file
format: JSON
path: data/db/auditLog.json
filter: '{ atype: { $in: [ "createCollection", "dropCollection" ] } }'
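The single quotes matter because the filter contains $in, which a POSIX shell would otherwise try to expand. The Node.js sketch below builds such an argument from an object. `auditFilterArg` is hypothetical and assumes a POSIX shell. It also assumes that mongod accepts the quoted-key JSON that JSON.stringify produces; verify this against your version.

```javascript
// Hypothetical helper: serialize an audit filter and wrap it in single quotes
// for a POSIX shell, so "$in" is not expanded. Embedded single quotes are
// closed, escaped, and reopened ('\'').
function auditFilterArg(filter) {
  const json = JSON.stringify(filter);
  return "'" + json.replace(/'/g, "'\\''") + "'";
}
```

For the filter above, the helper yields '{"atype":{"$in":["createCollection","dropCollection"]}}', which can follow --auditFilter on the mongod command line.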
The <field> can include any field in the audit message (page 409). For authentication operations, the audit messages
include a db field in the param document.
The following example uses the filter { atype: "authenticate", "param.db": "test" } to audit
only the authenticate operations that occur against the test database.
To specify an audit filter, enclose the filter document in single quotes to pass the document as a string.
mongod --dbpath data/db --auth --auditDestination file --auditFilter '{ atype: "authenticate", "param
To specify the audit filter in a configuration file, you must use the YAML format of the configuration file.
storage:
dbPath: data/db
security:
authorization: enabled
auditLog:
destination: file
format: JSON
path: data/db/auditLog.json
filter: '{ atype: "authenticate", "param.db": "test" }'
To filter on all authenticate operations across databases, use the filter { atype: "authenticate" }.
The following example uses the filter { roles: { role: "readWrite", db: "test" } } to audit only
operations for users with the readWrite (page 390) role on the test database. This includes users with roles that
inherit from readWrite (page 390).
To specify an audit filter, enclose the filter document in single quotes to pass the document as a string.
mongod --dbpath data/db --auth --auditDestination file --auditFilter '{ roles: { role: "readWrite", d
To specify the audit filter in a configuration file, you must use the YAML format of the configuration file.
storage:
dbPath: data/db
security:
authorization: enabled
auditLog:
destination: file
format: JSON
path: data/db/auditLog.json
filter: '{ roles: { role: "readWrite", db: "test" } }'
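Audit filters use query-document semantics. The Node.js sketch below is a deliberately small model of that matching. `auditFilterMatches` supports only equality and $in on dotted paths, and treats an array field as matching when any element equals the condition. The check runs it on a trimmed, hypothetical audit message. The model does not expand role inheritance, so unlike mongod it would not match users whose roles merely inherit from readWrite.

```javascript
// Deliberately small model of audit-filter matching; mongod's matcher does far more.
// Supports equality and $in on dotted paths such as "param.db".
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Embedded documents compare exactly, including field order.
const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

function matchesValue(actual, cond) {
  // An array field matches if the whole array or any element matches.
  const candidates = Array.isArray(actual) ? [actual, ...actual] : [actual];
  const isIn = cond !== null && typeof cond === "object" && !Array.isArray(cond) && "$in" in cond;
  const wanted = isIn ? cond.$in : [cond];
  return candidates.some((v) => wanted.some((w) => same(v, w)));
}

function auditFilterMatches(event, filter) {
  return Object.entries(filter).every(([path, cond]) =>
    matchesValue(getPath(event, path), cond)
  );
}
```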
To capture read and write operations in the audit, you must also enable the audit system to log authorization
successes using the auditAuthorizationSuccess parameter. 63
Note: Enabling auditAuthorizationSuccess degrades performance more than logging only the authorization
failures.
To specify an audit filter, enclose the filter document in single quotes to pass the document as a string.
mongod --dbpath data/db --auth --setParameter auditAuthorizationSuccess=true --auditDestination file
To specify the audit filter in a configuration file, you must use the YAML format of the configuration file.
storage:
dbPath: data/db
security:
authorization: enabled
auditLog:
destination: file
format: JSON
path: data/db/auditLog.json
filter: '{ atype: "authCheck", "param.command": { $in: [ "insert", "delete" ] } }'
setParameter: { auditAuthorizationSuccess: true }
To report an issue, we strongly suggest filing a ticket in the SECURITY 64 project in JIRA. MongoDB, Inc. responds to
vulnerability notifications within 48 hours.
Create the Report in JIRA
Submit a ticket in the Security 65 project at: <http://jira.mongodb.org/browse>. The ticket number will become the
reference identification for the issue for its lifetime. You can use this identifier for tracking purposes.
Information to Provide
All vulnerability reports should contain as much information as possible so MongoDB's developers can move quickly
to resolve the issue. In particular, please include the following:
The name of the product.
Common Vulnerability information, if applicable, including:
CVSS (Common Vulnerability Scoring System) Score.
CVE (Common Vulnerability and Exposures) Identifier.
Contact information, including an email address and/or phone number, if applicable.
Send the Report via Email
While JIRA is the preferred reporting method, you may also report vulnerabilities via email to [email protected] .
You may encrypt email using MongoDB's public key at https://docs.mongodb.org/10gen-security-gpg-key.asc.
MongoDB, Inc. responds to vulnerability reports sent via email with a response email that contains a reference number
for a JIRA ticket posted to the SECURITY 67 project.
Evaluation of a Vulnerability Report
MongoDB, Inc. validates all submitted vulnerabilities and uses Jira to track all communications regarding a vulnerability, including requests for clarification or additional information. If needed, MongoDB representatives set up a
conference call to exchange information regarding the vulnerability.
Disclosure
MongoDB, Inc. requests that you do not publicly disclose any information regarding the vulnerability or exploit the
issue until it has had the opportunity to analyze the vulnerability, to respond to the notification, and to notify key users,
customers, and partners.
The amount of time required to validate a reported vulnerability depends on the complexity and severity of the issue.
MongoDB, Inc. takes all reported vulnerabilities very seriously and will always ensure that there is a clear and open
channel of communication with the reporter.
64 https://jira.mongodb.org/browse/SECURITY
65 https://jira.mongodb.org/browse/SECURITY
66 [email protected]
67 https://jira.mongodb.org/browse/SECURITY
388
Chapter 6. Security
After validating an issue, MongoDB, Inc. coordinates public disclosure of the issue with the reporter in a mutually
agreed timeframe and format. If required or requested, the reporter of a vulnerability will receive credit in the published
security bulletin.
Description
Authenticates a user to a database.
Description
Creates a new user.
Updates user data.
Changes an existing user's password.
Deprecated. Removes a user from a database.
Deletes all users associated with a database.
Removes a single user.
Grants a role and its privileges to a user.
Removes a role from a user.
Returns information about the specified user.
Returns information about all users associated with a database.
Description
Creates a role and specifies its privileges.
Updates a user-defined role.
Deletes a user-defined role.
Deletes all user-defined roles associated with a database.
Assigns privileges to a user-defined role.
Removes the specified privileges from a user-defined role.
Specifies roles from which a user-defined role inherits privileges.
Removes inherited roles from a user-defined role.
Returns information for the specified role.
Returns information for all the user-defined roles in a database.
Default MongoDB Port (page 408) List of default ports used by MongoDB.
System Event Audit Messages (page 409) Reference on system event audit messages.
Built-In Roles
MongoDB grants access to data and commands through role-based authorization (page 312) and provides built-in
roles that grant the different levels of access commonly needed in a database system. You can additionally create
user-defined roles (page 313).
A role grants privileges to perform sets of actions (page 403) on defined resources (page 402). A given role applies to
the database on which it is defined and can grant access down to a collection level of granularity.
Each of MongoDB's built-in roles defines access at the database level for all non-system collections in the role's
database and at the collection level for all system collections (page 287).
MongoDB provides the built-in database user (page 390) and database administration (page 391) roles on every
database. MongoDB provides all other built-in roles only on the admin database.
This section describes the privileges for each built-in role. You can also view the privileges for a built-in role at any
time by issuing the rolesInfo command with the showPrivileges and showBuiltinRoles fields both set
to true.
Database User Roles
The admin database includes the following roles for administering the whole system rather than just a single database.
These roles include but are not limited to replica set and sharded cluster administrative functions.
clusterAdmin
Provides the greatest cluster-management access. This role combines the privileges granted by the
clusterManager (page 392), clusterMonitor (page 393), and hostManager (page 394) roles. Additionally, the role provides the dropDatabase (page 407) action.
clusterManager
Provides management and monitoring actions on the cluster. A user with this role can access the config and
local databases, which are used in sharding and replication, respectively.
Provides the following actions on the cluster as a whole:
The admin database includes the following roles for backing up and restoring data:
backup
Provides minimal privileges needed for backing up data. This role provides sufficient privileges to use the
MongoDB Management Service (MMS)69 backup agent, or to use mongodump to back up an entire mongod
instance.
Provides the following actions (page 403) on the mms.backup collection in the admin database:
insert (page 404)
update (page 404)
Provides the listDatabases (page 408) action on the cluster as a whole.
Provides the listCollections (page 408) action on all databases.
Provides the listIndexes (page 408) action for all collections.
Provides the find (page 404) action on the following:
all non-system collections in the cluster
all the following system collections in the cluster:
system.indexes (page 288), system.namespaces (page 288), and system.js (page 288)
Provides the following actions on all non-system collections and system.js (page 288) collections in the
cluster; on the admin.system.users (page 287) and admin.system.roles (page 287) collections in
the admin database; and on legacy system.users collections from versions of MongoDB prior to 2.6:
collMod (page 406)
createCollection (page 404)
createIndex (page 404)
dropCollection (page 404)
insert (page 404)
Provides the listCollections (page 408) action on all databases.
Provides the following additional actions on admin.system.users (page 287) and legacy
system.users collections:
find (page 404)
remove (page 404)
update (page 404)
Provides the find (page 404) action on all the system.namespaces (page 288) collections in the cluster.
Although restore (page 395) includes the ability to modify the documents in the admin.system.users
(page 287) collection using normal modification operations, modify this data only with the user management
methods.
All-Database Roles
The admin database provides the following roles that apply to all databases in a mongod instance and are roughly
equivalent to their single-database counterparts:
readAnyDatabase
Provides the same read-only permissions as read (page 390), except it applies to all databases in the cluster.
The role also provides the listDatabases (page 408) action on the cluster as a whole.
readWriteAnyDatabase
Provides the same read and write permissions as readWrite (page 390), except it applies to all databases in
the cluster. The role also provides the listDatabases (page 408) action on the cluster as a whole.
userAdminAnyDatabase
Provides the same access to user administration operations as userAdmin (page 392), except it applies to all
databases in the cluster. The role also provides the following actions on the cluster as a whole:
authSchemaUpgrade (page 405)
invalidateUserCache (page 405)
listDatabases (page 408)
The role also provides the following actions on the admin.system.users (page 287) and
admin.system.roles (page 287) collections on the admin database, and on legacy system.users
collections from versions of MongoDB prior to 2.6:
collStats (page 407)
dbHash (page 408)
dbStats (page 408)
__system
MongoDB assigns this role to user objects that represent cluster members, such as replica set members and
mongos instances. The role entitles its holder to take any action against any object in the database.
Do not assign this role to user objects representing applications or human administrators, other than in exceptional circumstances.
If you need access to all actions on all resources, for example to run applyOps commands, do not assign
this role. Instead, create a user-defined role (page 374) that grants anyAction (page 408) on anyResource
(page 403) and ensure that only the users who need access to these operations have this access.
system.roles Collection
New in version 2.6.
The system.roles collection in the admin database stores the user-defined roles. To create and manage these
user-defined roles, MongoDB provides role management commands.
system.roles Schema
admin.system.roles.privileges[n].resource
A document that specifies the resources upon which the privilege actions (page 399) apply. The document has one of the following forms:
{ db: <database>, collection: <collection> }
or
{ cluster : true }
Consider the following sample documents found in the system.roles collection of the admin database.
A User-Defined Role Specifies Privileges The following is a sample document for a user-defined role appUser
defined for the myApp database:
{
_id: "myApp.appUser",
role: "appUser",
db: "myApp",
privileges: [
{ resource: { db: "myApp" , collection: "" },
actions: [ "find", "createCollection", "dbStats", "collStats" ] },
{ resource: { db: "myApp", collection: "logs" },
actions: [ "insert" ] },
{ resource: { db: "myApp", collection: "data" },
actions: [ "insert", "update", "remove", "compact" ] },
{ resource: { db: "myApp", collection: "system.js" },
actions: [ "find" ] },
],
roles: []
}
The privileges array lists the four privileges that the appUser role specifies:
The first privilege permits its actions ( "find", "createCollection", "dbStats", "collStats") on
all the collections in the myApp database excluding its system collections. See Specify a Database as Resource
(page 402).
The next two privileges permit additional actions on specific collections, logs and data, in the myApp
database. See Specify a Collection of a Database as Resource (page 402).
The last privilege permits actions on one of the system collections (page 287) in the myApp database. While the first
privilege gives database-wide permission for the find action, the action does not apply to myApp's system
collections. To give access to a system collection, a privilege must explicitly specify the collection. See Resource
Document (page 402).
As indicated by the empty roles array, appUser inherits no additional privileges from other roles.
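The matching rules just described can be checked mechanically. The following is a minimal sketch in plain Node.js (no server involved) that evaluates the appUser document's own privileges. The helpers roleAllows, resourceCovers, and isSystemCollection are hypothetical and are not MongoDB APIs. The sketch also assumes that any collection whose name starts with "system." counts as a system collection.

```javascript
// The appUser role document from the example above, as a plain object.
const appUser = {
  _id: "myApp.appUser",
  role: "appUser",
  db: "myApp",
  privileges: [
    { resource: { db: "myApp", collection: "" },
      actions: ["find", "createCollection", "dbStats", "collStats"] },
    { resource: { db: "myApp", collection: "logs" }, actions: ["insert"] },
    { resource: { db: "myApp", collection: "data" },
      actions: ["insert", "update", "remove", "compact"] },
    { resource: { db: "myApp", collection: "system.js" }, actions: ["find"] },
  ],
  roles: [],
};

// Assumption of this sketch: names starting with "system." are system collections.
function isSystemCollection(name) {
  return name.startsWith("system.");
}

// A database resource (collection: "") covers only the non-system collections
// of that database; a collection resource covers exactly the named collection.
function resourceCovers(resource, db, coll) {
  if (resource.db !== db) return false;
  if (resource.collection === "") return !isSystemCollection(coll);
  return resource.collection === coll;
}

// True if any of the role's own privileges grants `action` on db.coll.
// Inherited roles are ignored here; appUser's roles array is empty anyway.
function roleAllows(role, action, db, coll) {
  return role.privileges.some(
    (p) => p.actions.includes(action) && resourceCovers(p.resource, db, coll)
  );
}

console.log(roleAllows(appUser, "find", "myApp", "system.js"));      // true: explicit privilege
console.log(roleAllows(appUser, "find", "myApp", "system.profile")); // false: not named explicitly
```

The second call shows why the explicit system.js privilege is needed. The database-wide find privilege alone does not reach any system collection.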
User-Defined Role Inherits from Other Roles The following is a sample document for a user-defined role
appAdmin defined for the myApp database. The document shows that the appAdmin role both specifies its own privileges
and inherits privileges from other roles:
{
_id: "myApp.appAdmin",
role: "appAdmin",
db: "myApp",
privileges: [
{
resource: { db: "myApp", collection: "" },
actions: [ "insert", "dbStats", "collStats", "compact", "repairDatabase" ]
}
],
roles: [
{ role: "appUser", db: "myApp" }
]
}
The privileges array lists the privileges that the appAdmin role specifies. This role has a single privilege that
permits its actions ( "insert", "dbStats", "collStats", "compact", "repairDatabase") on all the
collections in the myApp database excluding its system collections. See Specify a Database as Resource (page 402).
The roles array lists the roles, identified by the role names and databases, from which the role appAdmin inherits
privileges.
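Inheritance means that a role's effective privileges are its own privileges plus those of every role reachable through its roles array. Here is a minimal sketch of that flattening in plain Node.js. The in-memory store keyed by "<db>.<role>" and the function effectivePrivileges are hypothetical illustrations. This is not how the server stores or resolves roles internally.

```javascript
// Hypothetical in-memory role store keyed by "<db>.<role>", mirroring the _id
// convention of the system.roles documents above.
const roles = {
  "myApp.appUser": {
    privileges: [
      { resource: { db: "myApp", collection: "" }, actions: ["find", "createCollection", "dbStats", "collStats"] },
      { resource: { db: "myApp", collection: "logs" }, actions: ["insert"] },
      { resource: { db: "myApp", collection: "data" }, actions: ["insert", "update", "remove", "compact"] },
      { resource: { db: "myApp", collection: "system.js" }, actions: ["find"] },
    ],
    roles: [],
  },
  "myApp.appAdmin": {
    privileges: [
      { resource: { db: "myApp", collection: "" }, actions: ["insert", "dbStats", "collStats", "compact", "repairDatabase"] },
    ],
    roles: [{ role: "appUser", db: "myApp" }],
  },
};

// Collect a role's own privileges plus those of every role it inherits from,
// following the roles arrays transitively. `seen` prevents visiting a role twice.
function effectivePrivileges(roleId, store, seen = new Set()) {
  if (seen.has(roleId)) return [];
  seen.add(roleId);
  const role = store[roleId];
  if (!role) throw new Error(`unknown role ${roleId}`);
  const result = [...role.privileges];
  for (const parent of role.roles) {
    result.push(...effectivePrivileges(`${parent.db}.${parent.role}`, store, seen));
  }
  return result;
}

const adminPrivs = effectivePrivileges("myApp.appAdmin", roles);
console.log(adminPrivs.length); // 5: one of its own plus four inherited from appUser
```

Through inheritance, appAdmin gains the explicit find privilege on myApp.system.js, even though its own database-wide privilege excludes system collections.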
system.users Collection
Changed in version 2.6.
The system.users collection in the admin database stores user authentication (page 308) and authorization
(page 312) information. To manage data in this collection, MongoDB provides user management commands.
system.users Schema
roles: [
{ role: "<role name>", db: "<database>" },
...
],
customData: <custom information>
}
"storedKey" : "wxWGN3ElQ25WbPjACeXdUmN4nNo=",
"serverKey" : "h7vBq5tACT/BtrIElY2QTm+pQzM="
}
},
roles : [
{ role: "read", db: "home" },
{ role: "readWrite", db: "test" },
{ role: "appUser", db: "myApp" }
],
customData : { zipCode: "64157" }
}
The document shows that a user Kari is associated with the home database. Kari has the read (page 390) role
in the home database, the readWrite (page 390) role in the test database, and the appUser role in the myApp
database.
Resource Document
The resource document specifies the resources upon which a privilege permits actions.
Database and/or Collection Resource
Specify a Collection of a Database as Resource If the resource document specifies both the db and collection
fields as non-empty strings, the resource is the specified collection in the specified database. For example, the following
document specifies a resource of the inventory collection in the products database:
{ db: "products", collection: "inventory" }
For a user-defined role scoped for a non-admin database, the resource specification for its privileges must specify the
same database as the role. User-defined roles scoped for the admin database can specify other databases.
Specify a Database as Resource If only the collection field is an empty string (""), the resource is the specified
database, excluding the system collections (page 287). For example, the following resource document specifies the
resource of the test database, excluding the system collections:
{ db: "test", collection: "" }
For a user-defined role scoped for a non-admin database, the resource specification for its privileges must specify the
same database as the role. User-defined roles scoped for the admin database can specify other databases.
Note: When you specify a database as the resource, system collections are excluded, unless you name them explicitly,
as in the following:
{ db: "test", collection: "system.js" }
Specify Collections Across Databases as Resource If only the db field is an empty string (""), the resource is all
collections with the specified name across all databases. For example, the following document specifies the resource
of all the accounts collections across all the databases:
{ db: "", collection: "accounts" }
For user-defined roles, only roles scoped for the admin database can have this resource specification for their privileges.
Specify All Non-System Collections in All Databases If both the db and collection fields are empty strings
(""), the resource is all collections, excluding the system collections (page 287), in all the databases:
{ db: "", collection: "" }
For user-defined roles, only roles scoped for the admin database can have this resource specification for their privileges.
Cluster Resource
Use the cluster resource for actions that affect the state of the system rather than acting on a specific set of databases
or collections. Examples of such actions are shutdown, replSetReconfig, and addShard. For example, the
following privilege document grants the shutdown action on the cluster:
{ resource: { cluster : true }, actions: [ "shutdown" ] }
For user-defined roles, only roles scoped for the admin database can have this resource specification for their privileges.
anyResource
The internal resource anyResource gives access to every resource in the system and is intended for internal use.
Do not use this resource, other than in exceptional circumstances. The syntax for this resource is { anyResource:
true }.
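The resource forms above fit in one small matching function. The following is a sketch in plain Node.js under stated assumptions. resourceMatches is a hypothetical helper, not a MongoDB API. A "target" is modeled as either { cluster: true } or { db, collection }, and any name starting with "system." is treated as a system collection.

```javascript
// Assumption of this sketch: names starting with "system." are system collections.
function isSystemCollection(name) {
  return name.startsWith("system.");
}

// Does `resource` (a resource document) cover `target`?
function resourceMatches(resource, target) {
  if (resource.anyResource === true) return true;                 // every resource
  if (resource.cluster === true) return target.cluster === true;  // cluster resource
  if (target.cluster === true) return false;  // db/collection resources never cover the cluster
  const { db, collection } = resource;
  if (db !== "" && db !== target.db) return false;                // "" = any database
  if (collection === "") return !isSystemCollection(target.collection); // "" = non-system only
  return collection === target.collection;    // a named collection, system.* included when explicit
}

console.log(resourceMatches({ db: "test", collection: "" }, { db: "test", collection: "system.js" }));         // false
console.log(resourceMatches({ db: "test", collection: "system.js" }, { db: "test", collection: "system.js" })); // true
console.log(resourceMatches({ db: "", collection: "accounts" }, { db: "sales", collection: "accounts" }));     // true
```

The checks follow the order of the text: anyResource, then the cluster resource, then the four db/collection combinations.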
Privilege Actions
New in version 2.6.
Privilege actions define the operations a user can perform on a resource (page 402). A MongoDB privilege (page 312)
comprises a resource (page 402) and the permitted actions. This page lists available actions grouped by common
purpose.
MongoDB provides built-in roles with pre-defined pairings of resources and permitted actions. For lists of the actions
granted, see Built-In Roles (page 390). To define custom roles, see Create a Role (page 374).
find
User can perform the db.collection.find() method. Apply this action to database or collection resources.
insert
User can perform the insert command. Apply this action to database or collection resources.
remove
User can perform the db.collection.remove() method. Apply this action to database or collection
resources.
update
User can perform the update command. Apply this action to database or collection resources.
Database Management Actions
changeCustomData
User can change the custom information of any user in the given database. Apply this action to database
resources.
changeOwnCustomData
Users can change their own custom information. Apply this action to database resources.
changeOwnPassword
Users can change their own passwords. Apply this action to database resources.
changePassword
User can change the password of any user in the given database. Apply this action to database resources.
createCollection
User can perform the db.createCollection() method. Apply this action to database or collection resources.
createIndex
Provides access to the db.collection.createIndex() method and the createIndexes command.
Apply this action to database or collection resources.
createRole
User can create new roles in the given database. Apply this action to database resources.
createUser
User can create new users in the given database. Apply this action to database resources.
dropCollection
User can perform the db.collection.drop() method. Apply this action to database or collection resources.
dropRole
User can delete any role from the given database. Apply this action to database resources.
dropUser
User can remove any user from the given database. Apply this action to database resources.
emptycapped
User can perform the emptycapped command. Apply this action to database or collection resources.
enableProfiler
User can perform the db.setProfilingLevel() method. Apply this action to database resources.
grantRole
User can grant any role in the database to any user from any database in the system. Apply this action to database
resources.
killCursors
User can kill cursors on the target collection.
revokeRole
User can remove any role from any user from any database in the system. Apply this action to database resources.
unlock
User can perform the db.fsyncUnlock() method. Apply this action to the cluster resource.
viewRole
User can view information about any role in the given database. Apply this action to database resources.
viewUser
User can view the information of any user in the given database. Apply this action to database resources.
Deployment Management Actions
authSchemaUpgrade
User can perform the authSchemaUpgrade command. Apply this action to the cluster resource.
cleanupOrphaned
User can perform the cleanupOrphaned command. Apply this action to the cluster resource.
cpuProfiler
User can enable and use the CPU profiler. Apply this action to the cluster resource.
inprog
User can use the db.currentOp() method to return pending and active operations. Apply this action to the
cluster resource.
invalidateUserCache
Provides access to the invalidateUserCache command. Apply this action to the cluster resource.
killop
User can perform the db.killOp() method. Apply this action to the cluster resource.
planCacheRead
User can perform the planCacheListPlans and planCacheListQueryShapes commands and the
PlanCache.getPlansByQuery() and PlanCache.listQueryShapes() methods. Apply this action to database or collection resources.
planCacheWrite
User can perform the planCacheClear command and the PlanCache.clear() and
PlanCache.clearPlansByQuery() methods. Apply this action to database or collection resources.
storageDetails
User can perform the storageDetails command. Apply this action to database or collection resources.
Replication Actions
appendOplogNote
User can append notes to the oplog. Apply this action to the cluster resource.
replSetConfigure
User can configure a replica set. Apply this action to the cluster resource.
replSetGetStatus
User can perform the replSetGetStatus command. Apply this action to the cluster resource.
replSetHeartbeat
User can perform the replSetHeartbeat command. Apply this action to the cluster resource.
replSetStateChange
User can change the state of a replica set through the replSetFreeze, replSetMaintenance,
replSetStepDown, and replSetSyncFrom commands. Apply this action to the cluster resource.
resync
User can perform the resync command. Apply this action to the cluster resource.
Sharding Actions
addShard
User can perform the addShard command. Apply this action to the cluster resource.
enableSharding
User can enable sharding on a database using the enableSharding command and can shard a collection
using the shardCollection command. Apply this action to database or collection resources.
flushRouterConfig
User can perform the flushRouterConfig command. Apply this action to the cluster resource.
getShardMap
User can perform the getShardMap command. Apply this action to the cluster resource.
getShardVersion
User can perform the getShardVersion command. Apply this action to database resources.
listShards
User can perform the listShards command. Apply this action to the cluster resource.
moveChunk
User can perform the moveChunk command. In addition, user can perform the movePrimary command
provided that the privilege is applied to an appropriate database resource. Apply this action to database or
collection resources.
removeShard
User can perform the removeShard command. Apply this action to the cluster resource.
shardingState
User can perform the shardingState command. Apply this action to the cluster resource.
splitChunk
User can perform the splitChunk command. Apply this action to database or collection resources.
splitVector
User can perform the splitVector command. Apply this action to database or collection resources.
Server Administration Actions
applicationMessage
User can perform the logApplicationMessage command. Apply this action to the cluster resource.
closeAllDatabases
User can perform the closeAllDatabases command. Apply this action to the cluster resource.
collMod
User can perform the collMod command. Apply this action to database or collection resources.
compact
User can perform the compact command. Apply this action to database or collection resources.
connPoolSync
User can perform the connPoolSync command. Apply this action to the cluster resource.
convertToCapped
User can perform the convertToCapped command. Apply this action to database or collection resources.
dropDatabase
User can perform the dropDatabase command. Apply this action to database resources.
dropIndex
User can perform the dropIndexes command. Apply this action to database or collection resources.
fsync
User can perform the fsync command. Apply this action to the cluster resource.
getParameter
User can perform the getParameter command. Apply this action to the cluster resource.
hostInfo
User can perform the hostInfo command, which provides information about the server the MongoDB instance runs on.
Apply this action to the cluster resource.
logRotate
User can perform the logRotate command. Apply this action to the cluster resource.
reIndex
User can perform the reIndex command. Apply this action to database or collection resources.
renameCollectionSameDB
Allows the user to rename collections on the current database using the renameCollection command.
Apply this action to database resources.
Additionally, the user must either have find (page 404) on the source collection or not have find (page 404)
on the destination collection.
If a collection with the new name already exists, the user must also have the dropCollection (page 404)
action on the destination collection.
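The three conditions for renameCollectionSameDB combine as a simple boolean check. Below is a minimal sketch of it in plain Node.js. The functions holds and canRenameWithinDb are hypothetical, and the "<action>:<collection>" string encoding of grants (with "*" standing for the whole database) is invented purely for this example.

```javascript
// grants: a Set of "<action>:<collection>" strings; "*" means the whole database.
function holds(grants, action, coll) {
  return grants.has(`${action}:*`) || grants.has(`${action}:${coll}`);
}

function canRenameWithinDb(grants, source, destination, destinationExists) {
  // 1. renameCollectionSameDB must be granted on the database resource.
  if (!grants.has("renameCollectionSameDB:*")) return false;
  // 2. Either find on the source, or no find on the destination (presumably so a
  //    rename cannot expose source documents the user is not allowed to read).
  if (!holds(grants, "find", source) && holds(grants, "find", destination)) return false;
  // 3. Replacing an existing collection also requires dropCollection on it.
  if (destinationExists && !holds(grants, "dropCollection", destination)) return false;
  return true;
}

const grants = new Set(["renameCollectionSameDB:*", "find:archive"]);
console.log(canRenameWithinDb(grants, "orders", "archive", false)); // false: find on destination only
```

Modeling the rule as a pure function makes the asymmetric find condition easy to see. Holding find on neither collection is acceptable. Holding find only on the destination is not.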
repairDatabase
User can perform the repairDatabase command. Apply this action to database resources.
setParameter
User can perform the setParameter command. Apply this action to the cluster resource.
shutdown
User can perform the shutdown command. Apply this action to the cluster resource.
touch
User can perform the touch command. Apply this action to the cluster resource.
Diagnostic Actions
collStats
User can perform the collStats command. Apply this action to database or collection resources.
connPoolStats
User can perform the connPoolStats and shardConnPoolStats commands. Apply this action to the
cluster resource.
cursorInfo
User can perform the cursorInfo command. Apply this action to the cluster resource.
dbHash
User can perform the dbHash command. Apply this action to database or collection resources.
dbStats
User can perform the dbStats command. Apply this action to database resources.
diagLogging
User can perform the diagLogging command. Apply this action to the cluster resource.
getCmdLineOpts
User can perform the getCmdLineOpts command. Apply this action to the cluster resource.
getLog
User can perform the getLog command. Apply this action to the cluster resource.
indexStats
User can perform the indexStats command. Apply this action to database or collection resources.
listDatabases
User can perform the listDatabases command. Apply this action to the cluster resource.
listCollections
User can perform the listCollections command. Apply this action to database resources.
listIndexes
User can perform the listIndexes command. Apply this action to database or collection resources.
netstat
User can perform the netstat command. Apply this action to the cluster resource.
serverStatus
User can perform the serverStatus command. Apply this action to the cluster resource.
validate
User can perform the validate command. Apply this action to database or collection resources.
top
User can perform the top command. Apply this action to the cluster resource.
Internal Actions
anyAction
Allows any action on a resource. Do not assign this action except for exceptional circumstances.
internal
Allows internal actions. Do not assign this action except for exceptional circumstances.
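Each action above says which kind of resource it applies to. That information can drive a simple sanity check for a privilege document before you use it in a user-defined role. The following sketch transcribes only a handful of entries from the lists above into a hypothetical ACTION_SCOPES table. resourceKind and invalidActions are illustrative helpers, not MongoDB APIs, and anyResource is ignored.

```javascript
// Partial table transcribed from the action lists above (extend as needed).
const ACTION_SCOPES = {
  find: ["database", "collection"],
  insert: ["database", "collection"],
  collStats: ["database", "collection"],
  dropDatabase: ["database"],
  listCollections: ["database"],
  shutdown: ["cluster"],
  listDatabases: ["cluster"],
  addShard: ["cluster"],
};

// Classify a resource document as "cluster", "database" (collection: ""),
// or "collection" (a named collection).
function resourceKind(resource) {
  if (resource.cluster === true) return "cluster";
  if (resource.collection === "") return "database";
  return "collection";
}

// Return the actions in `privilege` that the table does not allow on its
// resource kind. Actions missing from this partial table are reported too.
function invalidActions(privilege) {
  const kind = resourceKind(privilege.resource);
  return privilege.actions.filter((a) => !(ACTION_SCOPES[a] || []).includes(kind));
}

console.log(invalidActions({ resource: { db: "test", collection: "" }, actions: ["find", "shutdown"] }));
// [ 'shutdown' ]: shutdown applies only to the cluster resource
```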
Default MongoDB Port
The following table lists the default ports used by MongoDB:
Default Port    Description
27017    The default port for mongod and mongos instances. You can change this port with port or --port.
27018    The default port when running with --shardsvr runtime operation or the shardsvr value for the clusterRole setting in a configuration file.
27019    The default port when running with --configsvr runtime operation or the configsvr value for the clusterRole setting in a configuration file.
28017    The default port for the web status page. The web status page is always accessible at a port number that is 1000 greater than the port determined by port.
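The port rules in the table reduce to a lookup plus one offset. The sketch below is a plain Node.js illustration with hypothetical function names. Its clusterRole argument mirrors the clusterRole setting: "shardsvr", "configsvr", or undefined for a plain mongod or mongos.

```javascript
// Default port by cluster role, per the table above.
function defaultPort(clusterRole) {
  switch (clusterRole) {
    case "shardsvr": return 27018;
    case "configsvr": return 27019;
    default: return 27017; // plain mongod or mongos
  }
}

// The web status page is always 1000 above the instance's port.
function webStatusPort(port) {
  return port + 1000;
}

console.log(webStatusPort(defaultPort()));            // 28017
console.log(webStatusPort(defaultPort("configsvr"))); // 28019
```

Because the offset applies to whatever port the instance uses, an instance started with --port also moves its status page.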
Audit Message
The event auditing feature (page 317) can record events in JSON format. To configure auditing output, see Configure
System Events Auditing (page 384).
The recorded JSON messages have the following syntax:
{
atype: <String>,
ts : { "$date": <timestamp> },
local: { ip: <String>, port: <int> },
remote: { ip: <String>, port: <int> },
users : [ { user: <String>, db: <String> }, ... ],
roles: [ { role: <String>, db: <String> }, ... ],
param: <document>,
result: <int>
}
field String atype Action type. See Audit Event Actions, Details, and Results (page 410).
field document ts Document that contains the date and UTC time of the event, in ISO 8601 format.
field document local Document that contains the local ip address and the port number of the running
instance.
field document remote Document that contains the remote ip address and the port number of the
incoming connection associated with the event.
field array users Array of user identification documents. Because MongoDB allows a session to log in
with a different user per database, this array can have more than one user. Each document contains a
user field for the username and a db field for the authentication database for that user.
field array roles Array of documents that specify the roles (page 312) granted to the user. Each document
contains a role field for the name of the role and a db field for the database associated with the
role.
field document param Specific details for the event. See Audit Event Actions, Details, and Results
(page 410).
field integer result Error code. See Audit Event Actions, Details, and Results (page 410).
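Because each audit message is a JSON document with the fields above, a log line can be summarized with ordinary JSON parsing. The sketch below shows this in plain Node.js. summarizeAuditLine is a hypothetical helper, and the sample message is made up for illustration. Its param is left empty because param contents vary by atype. Result 18 is the Authentication Failed code listed for authenticate.

```javascript
// Summarize one audit message in the JSON format above.
function summarizeAuditLine(line) {
  const msg = JSON.parse(line);
  const who = (msg.users || []).map((u) => `${u.user}@${u.db}`).join(",") || "(none)";
  return {
    atype: msg.atype,
    when: new Date(msg.ts.$date),              // ISO 8601 date in the ts document
    from: `${msg.remote.ip}:${msg.remote.port}`,
    who,
    ok: msg.result === 0,                      // 0 means Success
  };
}

// Made-up sample message for illustration only.
const sample = JSON.stringify({
  atype: "authenticate",
  ts: { $date: "2015-03-01T12:00:00.000Z" },
  local: { ip: "127.0.0.1", port: 27017 },
  remote: { ip: "10.0.0.5", port: 51234 },
  users: [],
  roles: [],
  param: {},
  result: 18,
});

const summary = summarizeAuditLine(sample);
console.log(summary.atype, summary.from, summary.ok); // authenticate 10.0.0.5:51234 false
```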
70 http://www.mongodb.com/products/mongodb-enterprise
For each atype (action type), the following table lists the associated param details and the result values, if any.
atype: authenticate
result: 0 - Success; 18 - Authentication Failed
param:
{
command: <name>,
ns: <database>.<collection>,
args: <command object>
}
The ns field is optional. The args field may be redacted.
result: 0 - Success; 13 - Unauthorized to perform the operation. By default, the auditing system logs only the authorization failures. To enable the system to log authorization successes, use the auditAuthorizationSuccess parameter. 71
param: { ns: <database>.<collection> }
result: 0 - Success
atype: createDatabase
param: { ns: <database> }
result: 0 - Success
atype: createIndex (page 404)
param:
{
ns: <database>.<collection>,
indexName: <index name>,
indexSpec: <index specification>
}
result: 0 - Success
atype: renameCollection
param:
{
old: <database>.<collection>,
new: <database>.<collection>
}
result: 0 - Success
atype: dropCollection (page 404)
param: { ns: <database>.<collection> }
result: 0 - Success
param: { ns: <database> }
result: 0 - Success
param:
{
ns: <database>.<collection>,
indexName: <index name>
}
result: 0 - Success
71 Enabling auditAuthorizationSuccess degrades performance more than logging only the authorization failures.
createUser (page 404)
dropAllUsersFromDatabase
{ db: <database> }
0 - Success
updateUser
{
grantRolesToUser
{
user: <user name>,
db: <database>,
roles: [
{
role: <role name>,
db: <database>
},
...
]
}
revokeRolesFromUser
0 - Success
{
role: <role name>,
db: <database>,
roles: [
{
role: <role name>,
db: <database>
},
...
],
privileges: [
{
resource: <resource document>,
actions: [ <action>, ... ]
},
...
]
}
The roles and the privileges fields are optional. For details on the resource document, see Resource Document (page 402). For a list of actions, see Privilege Actions (page 403).
updateRole
dropAllRolesFromDatabase
{ db: <database> }
0 - Success
grantRolesToRole
{
role: <role name>,
db: <database>,
roles: [
{
role: <role name>,
db: <database>
},
...
]
}
revokeRolesFromRole
grantPrivilegesToRole
revokePrivilegesFromRole
replSetReconfig
shardCollection
{
ns: <database>.<collection>,
key: <shard key pattern>,
options: { unique: <boolean> }
}
addShard (page 406)
0 - Success
{
shard: <shard name>,
connectionString: <hostname>:<port>,
maxSize: <maxSize>
}
When a shard is a replica set, the connectionString includes the replica set name and can include other members of the replica set.
0 - Success
{ shard: <shard name> }
0 - Success
{ }
Indicates commencement of database shutdown.
applicationMessage (page 406)
0 - Success
{ msg: <custom message string> }
See logApplicationMessage.
In version 2.2 and earlier, the read-write users of a database all have access to the system.users collection, which
contains the user names and user password hashes. 72
Password Hashing Insecurity
If a user has the same password for multiple databases, the hash will be the same. A malicious user could exploit this
to gain access on a second database using a different user's credentials.
As a result, always use unique username and password combinations for each database.
Thanks to Will Urbanski, from Dell SecureWorks, for identifying this issue.
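The root cause is that the legacy hash does not depend on the database. The following sketch illustrates this in Node.js. It assumes the legacy MONGODB-CR scheme, in which the stored value is the MD5 hex digest of "<user>:mongo:<password>". legacyCredentialHash is a hypothetical name, and the credentials are made up.

```javascript
const crypto = require("crypto");

// Assumed legacy (MONGODB-CR) digest: md5("<user>:mongo:<password>") in hex.
// The database name is not an input, so the same credentials hash identically
// in every database.
function legacyCredentialHash(user, password) {
  return crypto.createHash("md5").update(`${user}:mongo:${password}`).digest("hex");
}

const homeHash = legacyCredentialHash("kari", "s3cret"); // as stored for the home database
const testHash = legacyCredentialHash("kari", "s3cret"); // as stored for the test database
console.log(homeHash === testHash); // true
```

Using a different username or password per database, as recommended above, changes the input string and therefore the stored hash.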
CHAPTER 7
Aggregation
Aggregation operations process data records and return computed results. Aggregation operations group values from
multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
MongoDB provides three ways to perform aggregation: the aggregation pipeline (page 421), the map-reduce function
(page 424), and single purpose aggregation methods and commands (page 426).
Aggregation Introduction (page 417) A high-level introduction to aggregation.
Aggregation Concepts (page 421) Introduces the use and operation of the data aggregation modalities available in
MongoDB.
Aggregation Pipeline (page 421) The aggregation pipeline is a framework for performing aggregation tasks,
modeled on the concept of data processing pipelines. Using this framework, MongoDB passes the documents of a single collection through a pipeline. The pipeline transforms the documents into aggregated
results, and is accessed through the aggregate database command.
Map-Reduce (page 424) Map-reduce is a generic multi-phase data aggregation modality for processing large quantities of data. MongoDB provides map-reduce with the mapReduce database command.
Single Purpose Aggregation Operations (page 426) MongoDB provides a collection of specific data aggregation operations to support a number of common data aggregation functions. These operations include
returning counts of documents, distinct values of a field, and simple grouping operations.
Aggregation Mechanics (page 429) Details internal optimization operations, limits, support for sharded collections, and concurrency concerns.
Aggregation Examples (page 434) Examples and tutorials for data aggregation operations in MongoDB.
Aggregation Reference (page 451) References for all aggregation operations material for all data aggregation methods in MongoDB.
Map-Reduce
MongoDB also provides map-reduce (page 424) operations to perform aggregation. In general, map-reduce operations
have two phases: a map stage that processes each document and emits one or more objects for each input document,
and a reduce phase that combines the output of the map operation. Optionally, map-reduce can have a finalize stage to
make final modifications to the result. Like other aggregation operations, map-reduce can specify a query condition to
select the input documents as well as sort and limit the results.
Chapter 7. Aggregation
Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize
operation. While custom JavaScript provides great flexibility compared to the aggregation pipeline, in general, map-reduce is less efficient and more complex than the aggregation pipeline.
Note: Starting in MongoDB 2.4, certain mongo shell functions and properties are inaccessible in map-reduce operations. MongoDB 2.4 also provides support for multiple JavaScript operations to run at the same time. Before
MongoDB 2.4, JavaScript code executed in a single thread, raising concurrency issues for map-reduce.
The aggregation pipeline provides an alternative to map-reduce and may be the preferred solution for aggregation tasks
where the complexity of map-reduce may be unwarranted.
The aggregation pipeline has some limitations on value types and result size. See Aggregation Pipeline Limits (page 432)
for details on limits and restrictions on the aggregation pipeline.
Pipeline
The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the
pipeline. Pipeline stages do not need to produce one output document for every input document; e.g., some stages may
generate new documents or filter out documents. Pipeline stages can appear multiple times in the pipeline.
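The stage model can be illustrated with a small plain-JavaScript sketch that runs without a MongoDB server (for example, under Node.js). Each stage is modeled as a function from an array of documents to a new array, so a stage can drop documents or produce several outputs per input, and the same stage type can appear twice. The helpers matchStage, unwindStage, and runPipeline are hypothetical and only imitate the shape of $match and $unwind; this is not how mongod executes a pipeline.

```javascript
// Illustrative only: a pipeline modeled as an ordered list of stage functions.
// Each stage receives the full array of documents and returns a new array.

// Keeps only documents for which the predicate returns true (like a $match filter).
function matchStage(predicate) {
  return (docs) => docs.filter(predicate);
}

// Emits one output document per element of an array field (like $unwind),
// so a single input document can produce several output documents.
function unwindStage(field) {
  return (docs) =>
    docs.flatMap((doc) =>
      (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
    );
}

// Passes the documents through each stage in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample documents.
const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe", likes: ["tennis"] },
];

// The same kind of stage ($match-like) appears twice in this pipeline.
const result = runPipeline(users, [
  matchStage((d) => d.likes.length > 0),
  unwindStage("likes"),
  matchStage((d) => d.likes !== "tennis"),
]);
// result: [ { _id: "jane", likes: "golf" }, { _id: "jane", likes: "racquetball" } ]
```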
MongoDB provides the db.collection.aggregate() method in the mongo shell and the aggregate command for the aggregation pipeline. See aggregation-pipeline-operator-reference for the available stages.
For example usage of the aggregation pipeline, consider Aggregation with User Preference Data (page 438) and
Aggregation with the Zip Code Data Set (page 435).
Pipeline Expressions
Some pipeline stages take a pipeline expression as their operand. Pipeline expressions specify the transformation to
apply to the input documents. Expressions have a document (page 166) structure and can contain other expressions
(page 452).
Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other
documents: expression operations provide in-memory transformation of documents.
Generally, expressions are stateless and are only evaluated when seen by the aggregation process, with one exception:
accumulator expressions.
The accumulators, used with the $group pipeline operator, maintain their state (e.g. totals, maximums, minimums,
and related data) as documents progress through the pipeline.
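As an illustration of how accumulators differ from stateless expressions, the following plain-JavaScript sketch keeps one running state per group key and folds each arriving document into it, similar in spirit to $sum and $max inside $group. The groupBy helper, the accumulator objects, and the orders data are hypothetical and are not part of MongoDB.

```javascript
// Illustrative sketch: each accumulator has an initial state and a step
// function that folds one value into that state.
const sumAccumulator = { init: () => 0, step: (state, value) => state + value };
const maxAccumulator = {
  init: () => -Infinity,
  step: (state, value) => Math.max(state, value),
};

// Groups documents by keyFn(doc) and runs every accumulator over
// valueFn(doc) for the documents in each group.
function groupBy(docs, keyFn, accumulators) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    if (!groups.has(key)) {
      // First document for this key: create fresh state for every accumulator.
      const fresh = {};
      for (const [name, acc] of Object.entries(accumulators)) {
        fresh[name] = acc.spec.init();
      }
      groups.set(key, fresh);
    }
    const state = groups.get(key);
    // The state persists across documents, unlike a stateless expression.
    for (const [name, acc] of Object.entries(accumulators)) {
      state[name] = acc.spec.step(state[name], acc.valueFn(doc));
    }
  }
  return [...groups].map(([key, state]) => ({ _id: key, ...state }));
}

// Hypothetical sample documents.
const orders = [
  { cust: "A", price: 25 },
  { cust: "B", price: 10 },
  { cust: "A", price: 40 },
];

const totals = groupBy(orders, (d) => d.cust, {
  total: { spec: sumAccumulator, valueFn: (d) => d.price },
  maxPrice: { spec: maxAccumulator, valueFn: (d) => d.price },
});
// totals: [ { _id: "A", total: 65, maxPrice: 40 }, { _id: "B", total: 10, maxPrice: 10 } ]
```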
For more information on expressions, see Expressions (page 452).
Aggregation Pipeline Behavior
In MongoDB, the aggregate command operates on a single collection, logically passing the entire collection into
the aggregation pipeline. To optimize the operation, wherever possible, use the following strategies to avoid scanning
the entire collection.
Pipeline Operators and Indexes
The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the
pipeline.
New in version 2.4: The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear,
the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
Even when the pipeline uses an index, aggregation still requires access to the actual documents; i.e. indexes cannot
fully cover an aggregation pipeline.
Changed in version 2.6: In previous versions, for very select use cases, an index could cover a pipeline.
Early Filtering
If your aggregation operation requires only a subset of the data in a collection, use the $match, $limit, and $skip
stages to restrict the documents that enter at the beginning of the pipeline. When placed at the beginning of a pipeline,
$match operations use suitable indexes to scan only the matching documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at the start of the pipeline is logically equivalent to a
single query with a sort and can use an index. When possible, place $match operators at the beginning of the pipeline.
Additional Features
The aggregation pipeline has an internal optimization phase that provides improved performance for certain sequences
of operators. For details, see Aggregation Pipeline Optimization (page 429).
The aggregation pipeline supports operations on sharded collections. See Aggregation Pipeline and Sharded Collections (page 432).
7.2.2 Map-Reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. For
map-reduce operations, MongoDB provides the mapReduce database command.
Consider the following map-reduce operation:
In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the
collection that match the query condition). The map function emits key-value pairs. For those keys that have multiple
values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores
the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further
condense or process the results of the aggregation.
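The phases described above can be imitated in plain JavaScript to make the data flow concrete: a query selects input documents, map calls emit with this bound to each document, values are grouped by key, reduce runs only for keys with more than one value, and an optional finalize post-processes each result. mapReduceSketch and the orders data are hypothetical, and the sketch simplifies real behavior; for example, mongod provides emit as a global function and may call reduce several times on partial results.

```javascript
// Illustrative emulation of the map-reduce data flow; not MongoDB's implementation.
// Unlike mongod, which provides a global emit(), this sketch passes emit to map.
function mapReduceSketch(docs, map, reduce, options = {}) {
  const { query = () => true, finalize } = options;
  const emitted = new Map(); // key -> array of emitted values

  // Records each key/value pair produced by the map function.
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // Map phase: run map with `this` bound to each document that matches the query.
  for (const doc of docs.filter(query)) {
    map.call(doc, emit);
  }

  // Reduce phase: only keys with more than one value are reduced.
  const results = [];
  for (const [key, values] of emitted) {
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // Optional finalize phase.
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Hypothetical orders; the query below selects only status "A" documents.
const orders = [
  { cust_id: "abc123", status: "A", price: 25 },
  { cust_id: "abc123", status: "A", price: 50 },
  { cust_id: "xyz123", status: "A", price: 10 },
  { cust_id: "xyz123", status: "D", price: 99 },
];

const mapPrice = function (emit) {
  emit(this.cust_id, this.price);
};
const sumPrices = (key, values) => values.reduce((a, b) => a + b, 0);

const out = mapReduceSketch(orders, mapPrice, sumPrices, {
  query: (d) => d.status === "A",
});
// out: [ { _id: "abc123", value: 75 }, { _id: "xyz123", value: 10 } ]
```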
All map-reduce functions in MongoDB are JavaScript and run within the mongod process. Map-reduce operations
take the documents of a single collection as the input and can perform any arbitrary sorting and limiting before
beginning the map stage. mapReduce can return the results of a map-reduce operation as a document, or may write
the results to collections. The input and the output collections may be sharded.
Note: For most aggregation operations, the Aggregation Pipeline (page 421) provides better performance and a more
coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the
aggregation pipeline.
that replace, merge, or reduce new results with previous results. See mapReduce and Perform Incremental
Map-Reduce (page 445) for details and examples.
When returning the results of a map reduce operation inline, the result documents must be within the BSON Document
Size limit, which is currently 16 megabytes. For additional information on limits and restrictions on map-reduce
operations, see the http://docs.mongodb.org/manual/reference/command/mapReduce reference page.
MongoDB supports map-reduce operations on sharded collections (page 641). Map-reduce operations can also output
the results to a sharded collection. See Map-Reduce and Sharded Collections (page 433).
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
The following operation would count all documents in the collection and return the number 4:
db.records.count()
The following operation will count only the documents where the value of the field a is 1 and return 3:
db.records.count( { a: 1 } )
Distinct
The distinct operation takes a number of documents that match a query and returns all of the unique values for a field
in the matching documents. The distinct command and db.collection.distinct() method provide this
operation in the mongo shell. Consider the following examples of a distinct operation:
Example
Given a collection named records with only the following documents:
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
{ a: 2, b: 2 }
Consider the following db.collection.distinct() operation which returns the distinct values of the field b:
db.records.distinct( "b" )
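A plain-JavaScript sketch of the same semantics, run on the six documents above, yields the distinct b values 0, 1, 4, and 2. distinctSketch is a hypothetical helper, not the server implementation; it deduplicates with a Set, so it only handles scalar values, and its result order (first occurrence) is a property of the sketch rather than a documented guarantee of distinct.

```javascript
// Illustrative sketch of distinct: unique values of one field among the
// documents that match an optional filter. Uses a Set, so it only
// deduplicates scalar values (numbers, strings), not embedded documents.
function distinctSketch(docs, field, filter = () => true) {
  const seen = new Set();
  for (const doc of docs) {
    if (filter(doc) && field in doc) seen.add(doc[field]);
  }
  return [...seen]; // order of first occurrence
}

// The records documents listed above.
const records = [
  { a: 1, b: 0 }, { a: 1, b: 1 }, { a: 1, b: 1 },
  { a: 1, b: 4 }, { a: 2, b: 2 }, { a: 2, b: 2 },
];

const bValues = distinctSketch(records, "b"); // [ 0, 1, 4, 2 ]
const bWhereA2 = distinctSketch(records, "b", (d) => d.a === 2); // [ 2 ]
```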
Group
The group operation takes a number of documents that match a query, and then collects groups of documents based
on the value of a field or fields. It returns an array of documents with computed results for each group of documents.
Access the grouping functionality via the group command or the db.collection.group() method in the
mongo shell.
Warning: group does not support data in sharded collections. In addition, the results of the group operation
must be no larger than 16 megabytes.
Consider the following group operation:
Example
Given a collection named records with the following documents:
{ a: 1, count: 4 }
{ a: 1, count: 2 }
{ a: 1, count: 4 }
{ a: 2, count: 3 }
{ a: 2, count: 1 }
{ a: 1, count: 5 }
{ a: 4, count: 4 }
Consider the following group operation which groups documents by the field a, where a is less than 3, and sums the
field count for each group:
db.records.group( {
key: { a: 1 },
cond: { a: { $lt: 3 } },
reduce: function(cur, result) { result.count += cur.count },
initial: { count: 0 }
} )
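To make the result of that group operation concrete, the following plain-JavaScript sketch reuses its key, cond, reduce, and initial settings on the seven documents above. groupSketch is hypothetical and expresses cond as a JavaScript predicate rather than a query document. Documents with a: 4 fail the condition, so the sketch returns { a: 1, count: 15 } (4 + 2 + 4 + 5) and { a: 2, count: 4 } (3 + 1).

```javascript
// Illustrative emulation of group(): filter with cond, bucket by key fields,
// then fold each document into a copy of `initial` with `reduce`.
function groupSketch(docs, { keyFields, cond, reduce, initial }) {
  const groups = new Map();
  for (const doc of docs.filter(cond)) {
    // Build the group key from the key fields, e.g. { a: 1 }.
    const key = {};
    for (const f of keyFields) key[f] = doc[f];
    const id = JSON.stringify(key);
    if (!groups.has(id)) {
      // Each group starts from its own copy of the initial document.
      groups.set(id, Object.assign({}, key, JSON.parse(JSON.stringify(initial))));
    }
    reduce(doc, groups.get(id));
  }
  return [...groups.values()];
}

// The records documents listed above.
const records = [
  { a: 1, count: 4 }, { a: 1, count: 2 }, { a: 1, count: 4 },
  { a: 2, count: 3 }, { a: 2, count: 1 }, { a: 1, count: 5 },
  { a: 4, count: 4 },
];

const grouped = groupSketch(records, {
  keyFields: ["a"],
  cond: (d) => d.a < 3,
  reduce: function (cur, result) { result.count += cur.count; },
  initial: { count: 0 },
});
// grouped: [ { a: 1, count: 15 }, { a: 2, count: 4 } ]
```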
See also:
The $group for related functionality in the aggregation pipeline (page 421).
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results.
If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
Pipeline Sequence Optimization
$sort + $match Sequence Optimization When you have a sequence with $sort followed by a $match, the
$match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of
the following stages:
{ $sort: { age : -1 } },
{ $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
$skip + $limit Sequence Optimization When you have a sequence with $skip followed by a $limit, the
$limit moves before the $skip. With the reordering, the $limit value increases by the $skip amount.
For example, if the pipeline consists of the following stages:
{ $skip: 10 },
{ $limit: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $limit: 15 },
{ $skip: 10 }
This optimization allows for more opportunities for $sort + $limit Coalescence (page 430), such as with $sort +
$skip + $limit sequences. See $sort + $limit Coalescence (page 430) for details on the coalescence and $sort +
$skip + $limit Sequence (page 431) for an example.
For aggregation operations on sharded collections (page 432), this optimization reduces the results returned from each
shard.
$redact + $match Sequence Optimization When the pipeline has the $redact stage immediately followed by the $match stage, the aggregation can sometimes add a portion of the $match stage before the
$redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well
as query the collection to limit the number of documents that enter the pipeline. See Pipeline Operators and Indexes
(page 423) for more information.
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a $match stage that contains the { year: 2014 } portion of the original condition before the $redact stage:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs
after any sequence reordering optimization.
$sort + $limit Coalescence When a $sort immediately precedes a $limit, the optimizer can coalesce the
$limit into the $sort. This allows the sort operation to only maintain the top n results as it progresses, where
n is the specified limit, and MongoDB only needs to store n items in memory 5 . See sort-and-memory for more
information.
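The memory saving comes from keeping only the best n documents seen so far instead of buffering the whole input. A minimal sketch of that idea in plain JavaScript, assuming a small sorted buffer with insertion (a binary heap is the usual choice for large n); topNSorted is a hypothetical name and this is not MongoDB's sort implementation:

```javascript
// Illustrative top-n sort: keeps at most n documents in memory at any time.
// compare(a, b) < 0 means a sorts before b.
function topNSorted(docs, compare, n) {
  if (n <= 0) return [];
  const kept = []; // always sorted, length <= n
  for (const doc of docs) {
    // Skip documents that cannot enter the current top n.
    if (kept.length === n && compare(doc, kept[n - 1]) >= 0) continue;
    // Insert doc at its sorted position (after any equal elements).
    let i = kept.length;
    while (i > 0 && compare(doc, kept[i - 1]) < 0) i--;
    kept.splice(i, 0, doc);
    // Drop the document that fell out of the top n.
    if (kept.length > n) kept.pop();
  }
  return kept;
}

// Equivalent of { $sort: { age: -1 } } followed by { $limit: 2 } on made-up data.
const byAgeDesc = (a, b) => b.age - a.age;
const people = [{ age: 30 }, { age: 52 }, { age: 18 }, { age: 41 }];
const oldestTwo = topNSorted(people, byAgeDesc, 2);
// oldestTwo: [ { age: 52 }, { age: 41 } ]
```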
$limit + $limit Coalescence When a $limit immediately follows another $limit, the two stages can
coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, if a
pipeline contains the following sequence:
{ $limit: 100 },
{ $limit: 10 }
Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where
the limit amount 10 is the minimum of the two initial limits 100 and 10.
{ $limit: 10 }
$skip + $skip Coalescence When a $skip immediately follows another $skip, the two stages can coalesce
into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, if a pipeline contains
the following sequence:
{ $skip: 5 },
{ $skip: 2 }
5 The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit (page 432).
Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the
skip amount 7 is the sum of the two initial skip amounts 5 and 2.
{ $skip: 7 }
$match + $match Coalescence When a $match immediately follows another $match, the two stages can
coalesce into a single $match combining the conditions with an $and. For example, if a pipeline contains the following
sequence:
{ $match: { year: 2014 } },
{ $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
Examples
The following examples are some sequences that can take advantage of both sequence reordering and coalescence.
Generally, coalescence occurs after any sequence reordering optimization.
$sort + $skip + $limit Sequence A pipeline contains a sequence of $sort followed by a $skip followed
by a $limit:
{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }
First, the optimizer performs the $skip + $limit Sequence Optimization (page 429) to transform the sequence to the
following:
{ $sort: { age : -1 } },
{ $limit: 15 },
{ $skip: 10 }
The $skip + $limit Sequence Optimization (page 429) increases the $limit amount with the reordering. See $skip +
$limit Sequence Optimization (page 429) for details.
The reordered sequence now has $sort immediately preceding the $limit, and the pipeline can coalesce the two
stages to decrease memory usage during the sort operation. See $sort + $limit Coalescence (page 430) for more
information.
$limit + $skip + $limit + $skip Sequence A pipeline contains a sequence of alternating $limit and
$skip stages:
{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10 },
{ $skip: 2 }
The $skip + $limit Sequence Optimization (page 429) reverses the position of the { $skip: 5 } and { $limit: 10 }
stages and increases the limit amount:
{ $limit: 100 },
{ $limit: 15 },
{ $skip: 5 },
{ $skip: 2 }
The optimizer then coalesces the two $limit stages into a single $limit stage and the two $skip stages into a
single $skip stage. The resulting sequence is the following:
{ $limit: 15 },
{ $skip: 7 }
See $limit + $limit Coalescence (page 430) and $skip + $skip Coalescence (page 430) for details.
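The reordering and coalescence rules in this section can be expressed as a small rewrite over an array of stage documents, which reproduces both example sequences above. This is a hypothetical sketch of the rules as stated here (reorder first, then coalesce adjacent $limit, $skip, and $match stages), not MongoDB's optimizer; it does not model $sort + $limit coalescence, which changes how the sort runs rather than the list of stages.

```javascript
// Illustrative only: reorder first, then coalesce adjacent stages.
// Each stage is an object with a single operator key, e.g. { $skip: 10 }.
const nameOf = (stage) => Object.keys(stage)[0];

function reorder(pipeline) {
  const out = pipeline.slice(); // leave the caller's array untouched
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < out.length; i++) {
      const a = out[i];
      const b = out[i + 1];
      if (nameOf(a) === "$sort" && nameOf(b) === "$match") {
        // $sort + $match: filter before sorting.
        out[i] = b;
        out[i + 1] = a;
        changed = true;
      } else if (nameOf(a) === "$skip" && nameOf(b) === "$limit") {
        // $skip + $limit: move $limit first, increased by the skip amount.
        out[i] = { $limit: b.$limit + a.$skip };
        out[i + 1] = a;
        changed = true;
      }
    }
  }
  return out;
}

function coalesce(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    const name = nameOf(stage);
    if (!prev || nameOf(prev) !== name) {
      out.push(stage);
    } else if (name === "$limit") {
      // $limit + $limit: keep the smaller limit.
      out[out.length - 1] = { $limit: Math.min(prev.$limit, stage.$limit) };
    } else if (name === "$skip") {
      // $skip + $skip: add the skip amounts.
      out[out.length - 1] = { $skip: prev.$skip + stage.$skip };
    } else if (name === "$match") {
      // $match + $match: combine the conditions with $and.
      out[out.length - 1] = { $match: { $and: [prev.$match, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

const optimize = (pipeline) => coalesce(reorder(pipeline));

const example1 = optimize([{ $sort: { age: -1 } }, { $skip: 10 }, { $limit: 5 }]);
// example1: [ { $sort: { age: -1 } }, { $limit: 15 }, { $skip: 10 } ]
const example2 = optimize([{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }]);
// example2: [ { $limit: 15 }, { $skip: 7 } ]
```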
See also:
explain option in the db.collection.aggregate()
Aggregation Pipeline Limits
Aggregation operations with the aggregate command have the following limitations.
Result Size Restrictions
If the aggregate command returns a single document that contains the complete result set, the command will
produce an error if the result set exceeds the BSON Document Size limit, which is currently 16 megabytes. To
manage result sets that exceed this limit, the aggregate command can return result sets of any size if the command
returns a cursor or stores the results in a collection.
Changed in version 2.6: The aggregate command can return results as a cursor or store the results in a collection,
which are not subject to the size limit. The db.collection.aggregate() returns a cursor and can return result
sets of any size.
Memory Restrictions
When operating on a sharded collection, the aggregation pipeline is split into two parts. The first pipeline runs on each
shard, or if an early $match can exclude shards through the use of the shard key in the predicate, the pipeline runs on
only the relevant shards.
The second pipeline consists of the remaining pipeline stages and runs on the primary shard (page 649). The primary
shard merges the cursors from the other shards and runs the second pipeline on these results. The primary shard
forwards the final results to the mongos. In previous versions, the second pipeline would run on the mongos. 6
Optimization
When splitting the aggregation pipeline into two parts, the pipeline is split to ensure that the shards perform as many
stages as possible with consideration for optimization.
To see how the pipeline was split, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
Map-Reduce and Sharded Collections
Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the
behaviors of mapReduce specific to sharded collections.
Sharded Collection as Input
When using a sharded collection as the input for a map-reduce operation, mongos will automatically dispatch the map-reduce job to each shard in parallel. There is no special option required. mongos will wait for jobs on all shards to
finish.
Sharded Collection as Output
6 Until all shards upgrade to v2.6, the second pipeline runs on the mongos if any shards are still running v2.4.
mongos retrieves the results from each shard, performs a merge sort to order the results, and proceeds to the
reduce/finalize phase as needed. mongos then writes the result to the output collection in sharded mode.
This model requires only a small amount of memory, even for large data sets.
Shard chunks are not automatically split during insertion. This requires manual intervention until the chunks
are granular and balanced.
Important: For best results, only use the sharded output options for mapReduce in version 2.2 or later.
Data Model
Each document in the zipcodes collection has the following form:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
In this example, the aggregation pipeline (page 423) consists of the $group stage followed by the $match stage:
The $group stage groups the documents of the zipcodes collection by the state field, calculates the
totalPop field for each state, and outputs a document for each unique state.
7 http://media.mongodb.org/zips.json
The new per-state documents have two fields: the _id field and the totalPop field. The _id field contains
the value of the state; i.e. the group by field. The totalPop field is a calculated field that contains the total
population of each state. To calculate the value, $group uses the $sum operator to add the population field
(pop) for each state.
After the $group stage, the documents in the pipeline resemble the following:
{
"_id" : "AK",
"totalPop" : 550043
}
The $match stage filters these grouped documents to output only those documents whose totalPop value is
greater than or equal to 10 million. The $match stage does not alter the matching documents but outputs the
matching documents unmodified.
The equivalent SQL for this aggregation operation is:
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
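To check the logic of the $group and $match stages (or the equivalent SQL) without a server, the following plain-JavaScript sketch performs the same two steps. statesOver and the sample documents are hypothetical, and the threshold is a parameter (defaulting to 10 million) so that a tiny made-up sample can exercise the filter.

```javascript
// Illustrative equivalent of:
//   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
//   { $match: { totalPop: { $gte: threshold } } }
function statesOver(zipcodes, threshold = 10 * 1000 * 1000) {
  const totals = new Map(); // state -> summed population
  for (const z of zipcodes) {
    totals.set(z.state, (totals.get(z.state) || 0) + z.pop);
  }
  return [...totals]
    .map(([state, totalPop]) => ({ _id: state, totalPop }))
    .filter((doc) => doc.totalPop >= threshold);
}

// Made-up sample documents (not taken from the zips.json data set).
const sample = [
  { _id: "00001", city: "X", state: "AA", pop: 6000 },
  { _id: "00002", city: "Y", state: "AA", pop: 5000 },
  { _id: "00003", city: "Z", state: "BB", pop: 9000 },
];

const big = statesOver(sample, 10000); // [ { _id: "AA", totalPop: 11000 } ]
```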
See also:
$group, $match, $sum
Return Average City Population by State
The following aggregation operation returns the average populations for cities in each state:
db.zipcodes.aggregate( [
{ $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
{ $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )
In this example, the aggregation pipeline (page 423) consists of the $group stage followed by another $group
stage:
The first $group stage groups the documents by the combination of city and state, uses the $sum expression to calculate the population for each combination, and outputs a document for each city and state
combination. 8
After this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
A second $group stage groups the documents in the pipeline by the _id.state field (i.e. the state field
inside the _id document), uses the $avg expression to calculate the average city population (avgCityPop)
for each state, and outputs a document for each state.
The documents that result from this aggregation operation resemble the following:
8 A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.
{
"_id" : "MN",
"avgCityPop" : 5335
}
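The two $group stages can be mirrored in plain JavaScript to show why the city totals are computed first. In this hypothetical sketch (avgCityPopByState and the sample are made up), city P has two zip codes, so averaging per zip code would give 200, while averaging per city gives (400 + 200) / 2 = 300.

```javascript
// Illustrative equivalent of the two $group stages:
//   1) sum pop per (state, city)   2) average those sums per state.
function avgCityPopByState(zipcodes) {
  // Stage 1: key "state|city" (assumes city names contain no "|").
  const cityPop = new Map();
  for (const z of zipcodes) {
    const key = z.state + "|" + z.city;
    const entry = cityPop.get(key) || { state: z.state, pop: 0 };
    entry.pop += z.pop;
    cityPop.set(key, entry);
  }
  // Stage 2: running sum and count per state, then the average.
  const perState = new Map();
  for (const { state, pop } of cityPop.values()) {
    const s = perState.get(state) || { sum: 0, n: 0 };
    s.sum += pop;
    s.n += 1;
    perState.set(state, s);
  }
  return [...perState].map(([state, s]) => ({ _id: state, avgCityPop: s.sum / s.n }));
}

// Made-up sample: city P has two zip codes, so its population is summed first.
const sample = [
  { state: "AA", city: "P", pop: 100 },
  { state: "AA", city: "P", pop: 300 },
  { state: "AA", city: "Q", pop: 200 },
];
const averages = avgCityPopByState(sample); // [ { _id: "AA", avgCityPop: 300 } ]
```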
See also:
$group, $sum, $avg
Return Largest and Smallest Cities by State
The following aggregation operation returns the smallest and largest cities by population for each state:
db.zipcodes.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } },
{ $group:
{
_id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop:
{ $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" }
}
},
// the following $project is optional, and
// modifies the output format.
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
}
}
] )
In this example, the aggregation pipeline (page 423) consists of a $group stage, a $sort stage, another $group
stage, and a $project stage:
The first $group stage groups the documents by the combination of the city and state, calculates the sum
of the pop values for each combination, and outputs a document for each city and state combination.
At this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
The $sort stage orders the documents in the pipeline by the pop field value, from smallest to largest; i.e. by
increasing order. This operation does not alter the documents.
The next $group stage groups the now-sorted documents by the _id.state field (i.e. the state field inside
the _id document) and outputs a document for each state.
The stage also calculates the following four fields for each state. Using the $last expression, the $group
operator creates the biggestCity and biggestPop fields that store the city with the largest population
and that population. Using the $first expression, the $group operator creates the smallestCity and
smallestPop fields that store the city with the smallest population and that population.
The documents, at this stage in the pipeline, resemble the following:
{
"_id" : "WA",
"biggestCity" : "SEATTLE",
"biggestPop" : 520096,
"smallestCity" : "BENGE",
"smallestPop" : 2
}
The final $project stage renames the _id field to state and moves the biggestCity, biggestPop,
smallestCity, and smallestPop into biggestCity and smallestCity embedded documents.
The output documents of this aggregation operation resemble the following:
{
"state" : "RI",
"biggestCity" : {
"name" : "CRANSTON",
"pop" : 176404
},
"smallestCity" : {
"name" : "CLAYVILLE",
"pop" : 45
}
}
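The combination of an ascending $sort with $first and $last can be checked with a plain-JavaScript sketch: after sorting, the first document seen for each state is its smallest city and the last is its biggest. largestAndSmallest and the sample data are hypothetical.

```javascript
// Illustrative equivalent of $group (city totals), $sort { pop: 1 },
// $group with $first/$last, and the reshaping $project.
function largestAndSmallest(zipcodes) {
  const cityPop = new Map();
  for (const z of zipcodes) {
    const key = JSON.stringify([z.state, z.city]);
    cityPop.set(key, (cityPop.get(key) || 0) + z.pop);
  }
  // Ascending by population, like { $sort: { pop: 1 } }.
  const sorted = [...cityPop]
    .map(([key, pop]) => {
      const [state, city] = JSON.parse(key);
      return { state, city, pop };
    })
    .sort((a, b) => a.pop - b.pop);

  const byState = new Map();
  for (const doc of sorted) {
    const entry = byState.get(doc.state);
    if (!entry) {
      // First document seen for the state ($first) is the smallest city.
      byState.set(doc.state, { smallest: doc, biggest: doc });
    } else {
      // Each later document replaces $last, ending at the biggest city.
      entry.biggest = doc;
    }
  }
  return [...byState].map(([state, { biggest, smallest }]) => ({
    state,
    biggestCity: { name: biggest.city, pop: biggest.pop },
    smallestCity: { name: smallest.city, pop: smallest.pop },
  }));
}

// Made-up sample: P totals 75 across two zip codes.
const sample = [
  { state: "AA", city: "P", pop: 50 },
  { state: "AA", city: "Q", pop: 900 },
  { state: "AA", city: "P", pop: 25 },
  { state: "AA", city: "R", pop: 10 },
];
const extremes = largestAndSmallest(sample);
// extremes: [ { state: "AA", biggestCity: { name: "Q", pop: 900 },
//               smallestCity: { name: "R", pop: 10 } } ]
```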
All documents from the users collection pass through the pipeline, which consists of the following operations:
The $project operator:
creates a new field called name.
converts the value of the _id to upper case, with the $toUpper operator. Then the $project creates
a new field, named name to hold this value.
suppresses the _id field. $project will pass the _id field by default, unless explicitly suppressed.
The $sort operator orders the results by the name field.
The results of the aggregation would resemble the following:
{
"name" : "JANE"
},
{
"name" : "JILL"
},
{
"name" : "JOE"
}
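A plain-JavaScript equivalent of this $project and $sort, on hypothetical users documents, maps each _id to an upper-case name field, omits _id, and sorts by name. upperNamesSorted is a made-up helper; it compares strings with JavaScript's < operator, which is adequate for plain ASCII names like these.

```javascript
// Illustrative equivalent of projecting { name: { $toUpper: "$_id" } } with
// _id suppressed, followed by { $sort: { name: 1 } }.
function upperNamesSorted(users) {
  return users
    .map((u) => ({ name: String(u._id).toUpperCase() })) // new field; _id dropped
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// Hypothetical users documents.
const users = [{ _id: "jane" }, { _id: "joe" }, { _id: "jill" }];
const names = upperNamesSorted(users);
// names: [ { name: "JANE" }, { name: "JILL" }, { name: "JOE" } ]
```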
The pipeline passes all documents in the users collection through the following operations:
The $project operator:
Creates two new fields: month_joined and name.
Suppresses the _id from the results. The aggregate() method includes the _id, unless explicitly
suppressed.
The $month operator converts the values of the joined field to integer representations of the month. Then
the $project operator assigns those values to the month_joined field.
The $sort operator sorts the results by the month_joined field.
The operation returns results that resemble the following:
{
"month_joined" : 1,
"name" : "ruth"
},
{
"month_joined" : 1,
"name" : "harold"
},
{
"month_joined" : 1,
"name" : "kate"
},
{
"month_joined" : 2,
"name" : "jill"
}
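The month extraction can be sketched in plain JavaScript with hypothetical users documents. $month returns 1 through 12, whereas Date.prototype.getUTCMonth() returns 0 through 11, so the sketch adds 1; it reads the UTC month because BSON dates are stored in UTC. monthJoined is a hypothetical helper.

```javascript
// Illustrative equivalent of projecting { $month: "$joined" } as month_joined
// and the _id as name, then sorting by month_joined.
function monthJoined(users) {
  return users
    .map((u) => ({ month_joined: u.joined.getUTCMonth() + 1, name: u._id }))
    .sort((a, b) => a.month_joined - b.month_joined);
}

// Hypothetical users documents.
const users = [
  { _id: "jill", joined: new Date("2011-02-25T00:00:00Z") },
  { _id: "ruth", joined: new Date("2011-01-14T00:00:00Z") },
];
const months = monthJoined(users);
// months: [ { month_joined: 1, name: "ruth" }, { month_joined: 2, name: "jill" } ]
```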
The pipeline passes all documents in the users collection through the following operations:
The $project operator creates a new field called month_joined.
The $month operator converts the values of the joined field to integer representations of the month. Then
the $project operator assigns the values to the month_joined field.
The $group operator collects all documents with a given month_joined value and counts how many documents there are for that value. Specifically, for each unique value, $group creates a new per-month document
with two fields:
_id, which contains a nested document with the month_joined field and its value.
number, which is a generated field. The $sum operator increments this field by 1 for every document
containing the given month_joined value.
The $sort operator sorts the documents created by $group according to the contents of the month_joined
field.
The result of this aggregation operation would resemble the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
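Counting sign-ups per month amounts to a group-and-count keyed on the month number. The following hypothetical plain-JavaScript sketch (joinsPerMonth and its data are made up) builds the same _id and number shape as the output above.

```javascript
// Illustrative equivalent of projecting { $month: "$joined" }, grouping with
// { number: { $sum: 1 } } per month, and sorting by month.
function joinsPerMonth(users) {
  const counts = new Map(); // month (1-12) -> number of users
  for (const u of users) {
    const month = u.joined.getUTCMonth() + 1;
    counts.set(month, (counts.get(month) || 0) + 1);
  }
  return [...counts]
    .map(([month, number]) => ({ _id: { month_joined: month }, number }))
    .sort((a, b) => a._id.month_joined - b._id.month_joined);
}

// Hypothetical users documents.
const users = [
  { _id: "a", joined: new Date("2012-03-05T00:00:00Z") },
  { _id: "b", joined: new Date("2012-01-10T00:00:00Z") },
  { _id: "c", joined: new Date("2012-01-22T00:00:00Z") },
];
const perMonth = joinsPerMonth(users);
// perMonth: [ { _id: { month_joined: 1 }, number: 2 },
//             { _id: { month_joined: 3 }, number: 1 } ]
```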
The pipeline begins with all documents in the users collection, and passes these documents through the following
operations:
The $unwind operator separates each value in the likes array, and creates a new version of the source
document for every element in the array.
Example
Given the following document from the users collection:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
joined : ISODate("2011-03-02"),
likes : "racquetball"
}
The $group operator collects all documents with the same value for the likes field and counts each grouping.
With this information, $group creates a new document with two fields:
_id, which contains the likes value.
number, which is a generated field. The $sum operator increments this field by 1 for every document
containing the given likes value.
The $sort operator sorts these documents by the number field in reverse order.
The $limit operator only includes the first 5 result documents.
The results of aggregation would resemble the following:
{
"_id" : "golf",
"number" : 33
},
{
"_id" : "racquetball",
"number" : 31
},
{
"_id" : "swimming",
"number" : 24
},
{
"_id" : "handball",
"number" : 19
},
{
"_id" : "tennis",
"number" : 18
}
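The $unwind, $group, $sort, and $limit sequence can be mirrored in plain JavaScript on a few hypothetical users documents. topLikes is a made-up helper; ties in number keep their first-seen order here, which is a property of the sketch and not something the aggregation pipeline guarantees.

```javascript
// Illustrative equivalent of:
//   { $unwind: "$likes" }, { $group: { _id: "$likes", number: { $sum: 1 } } },
//   { $sort: { number: -1 } }, { $limit: limit }
function topLikes(users, limit = 5) {
  // $unwind: one entry per element of each likes array.
  const unwound = users.flatMap((u) => u.likes || []);
  // $group with $sum: 1 per document.
  const counts = new Map();
  for (const like of unwound) counts.set(like, (counts.get(like) || 0) + 1);
  return [...counts]
    .map(([like, number]) => ({ _id: like, number }))
    .sort((a, b) => b.number - a.number) // descending, like number: -1
    .slice(0, limit); // $limit
}

// Hypothetical users documents.
const users = [
  { _id: "jane", likes: ["golf", "racquetball"] },
  { _id: "joe", likes: ["tennis", "golf", "swimming"] },
  { _id: "jill", likes: ["golf", "tennis"] },
];
const top2 = topLikes(users, 2);
// top2: [ { _id: "golf", number: 3 }, { _id: "tennis", number: 2 } ]
```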
2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
The valuesPrices is an array whose elements are the price values emitted by the map function and
grouped by keyCustId.
The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
3. Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function
and the reduceFunction1 reduce function.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
};
emit(key, value);
}
};
2. Define the corresponding reduce function with two arguments keySKU and countObjVals:
countObjVals is an array whose elements are the objects mapped to the grouped keySKU values
passed by map function to the reducer function.
The function reduces the countObjVals array to a single object reducedVal that contains the
count and the qty fields.
In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
var reduceFunction2 = function(keySKU, countObjVals) {
var reducedVal = { count: 0, qty: 0 };
for (var idx = 0; idx < countObjVals.length; idx++) {
reducedVal.count += countObjVals[idx].count;
reducedVal.qty += countObjVals[idx].qty;
}
return reducedVal;
};
3. Define a finalize function with two arguments key and reducedVal. The function modifies the
reducedVal object to add a computed field named avg and returns the modified object:
var finalizeFunction2 = function (key, reducedVal) {
reducedVal.avg = reducedVal.qty/reducedVal.count;
return reducedVal;
};
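4. Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and
finalizeFunction2 functions.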
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date:
{ $gt: new Date('01/01/2012') }
},
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents with ord_date greater than new
Date('01/01/2012'). Then it outputs the results to a collection map_reduce_example. If the
map_reduce_example collection already exists, the operation will merge the existing contents with the
results of this map-reduce operation.
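Because reduceFunction2 and finalizeFunction2 are ordinary JavaScript functions, their arithmetic can be checked outside MongoDB, for example under Node.js. The sketch below copies them (declaring reducedVal with var) and applies them to hypothetical values emitted for a single SKU; it does not model the query filter or the merge output mode.

```javascript
// Copies of the reduce and finalize functions defined in the steps above.
var reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};

var finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Hypothetical values emitted for one SKU: three line items of 5, 10 and 15.
var emittedForSku = [
  { count: 1, qty: 5 },
  { count: 1, qty: 10 },
  { count: 1, qty: 15 },
];

var stats = finalizeFunction2("mmm", reduceFunction2("mmm", emittedForSku));
// stats: { count: 3, qty: 30, avg: 10 }
```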
db.sessions.save( { userid: "a", ts: ISODate('2011-11-03 14:17:00'), length: 95 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-03 14:23:00'), length: 110 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-03 15:02:00'), length: 120 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-03 16:45:00'), length: 45 } );
db.sessions.save( { userid: "a", ts: ISODate('2011-11-04 11:05:00'), length: 105 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-04 13:14:00'), length: 120 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-04 17:00:00'), length: 130 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-04 15:37:00'), length: 65 } );
2. Define the corresponding reduce function with two arguments key and values to calculate the total time and
the count. The key corresponds to the userid, and the values is an array whose elements correspond to
the individual objects mapped to the userid in the mapFunction.
3. Define the finalize function with two arguments key and reducedValue. The function modifies the
reducedValue document to add another field avg_time and returns the modified document.
var finalizeFunction = function (key, reducedValue) {
if (reducedValue.count > 0)
reducedValue.avg_time = reducedValue.total_time / reducedValue.count;
return reducedValue;
};
4. Perform map-reduce on the sessions collection using the mapFunction, the reduceFunction, and the
finalizeFunction functions. Output the results to a collection session_stat. If the session_stat
collection already exists, the operation will replace the contents:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
out: "session_stat",
finalize: finalizeFunction
}
)
db.sessions.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } );
db.sessions.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } );
db.sessions.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } );
db.sessions.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } );
At the end of the day, perform incremental map-reduce on the sessions collection, but use the query field to select
only the new documents. Output the results to the collection session_stat, but reduce the contents with the
results of the incremental map-reduce:
db.sessions.mapReduce( mapFunction,
reduceFunction,
{
1. Define the map function that maps the price to the cust_id for each document and emits the cust_id and
price pair:
var map = function() {
emit(this.cust_id, this.price);
};
3. Invoke the map function with a single document from the orders collection:
var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
map.apply(myDoc);
5. Invoke the map function with multiple documents from the orders collection:
var myCursor = db.orders.find( { cust_id: "abc123" } );
while (myCursor.hasNext()) {
var doc = myCursor.next();
print ("document _id= " + tojson(doc._id));
map.apply(doc);
print();
}
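These steps call map.apply() outside of a map-reduce operation, so they rely on a stand-in emit function defined in a step that this excerpt does not include. The sketch below is a hypothetical plain-JavaScript version that records each emitted pair in an array instead of printing it, so the map function can be checked against sample documents (the documents here are illustrative, not taken from the orders collection).

```javascript
// Hypothetical emit() stand-in: collects (key, value) pairs for inspection.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Map function from step 1: emits the price keyed by cust_id.
var map = function () {
  emit(this.cust_id, this.price);
};

// apply() binds `this` to each sample document, as map-reduce would.
var sampleOrders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 50 }
];
sampleOrders.forEach(function (doc) { map.apply(doc); });
// emitted → [ { key: "abc123", value: 25 }, { key: "abc123", value: 50 } ]
```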
5. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects.
valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
8. Verify that reduceFunction2 returned a document with exactly the count and the qty fields:
{ "count" : 6, "qty" : 30 }
2. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects.
valuesCountObjects is an array of documents that contain two fields count and qty:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
var reducedValue = { count: 0, qty: 0 };
for (var idx = 0; idx < valuesCountObjects.length; idx++) {
reducedValue.count += valuesCountObjects[idx].count;
reducedValue.qty += valuesCountObjects[idx].qty;
}
return reducedValue;
};
3. Invoke the reduceFunction2 first with values1 and then with values2:
reduceFunction2('myKey', values1);
reduceFunction2('myKey', values2);
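values1 and values2 are defined in a step not shown in this excerpt. The sketch below assumes two hypothetical arrays holding the same elements in different orders and checks, in plain JavaScript, that reduceFunction2 (with reducedValue declared using var) returns the same document for both.

```javascript
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

// Hypothetical inputs: the same elements in a different order.
var values1 = [ { count: 1, qty: 5 }, { count: 2, qty: 10 }, { count: 3, qty: 15 } ];
var values2 = [ { count: 3, qty: 15 }, { count: 1, qty: 5 }, { count: 2, qty: 10 } ];

var r1 = reduceFunction2('myKey', values1);
var r2 = reduceFunction2('myKey', values2);
// Both → { count: 6, qty: 30 }
var sameResult = JSON.stringify(r1) === JSON.stringify(r2);
```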
3. Define a sample valuesIdempotent array that contains an element that is a call to the reduceFunction2
function:
var valuesIdempotent = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
reduceFunction2(myKey, [ { count:3, qty: 15 } ] )
];
4. Define a sample values1 array that combines the values passed to reduceFunction2:
var values1 = [
{ count: 1, qty: 5 },
{ count: 2, qty: 10 },
{ count: 3, qty: 15 }
];
5. Invoke the reduceFunction2 first with myKey and valuesIdempotent and then with myKey and
values1:
reduceFunction2(myKey, valuesIdempotent);
reduceFunction2(myKey, values1);
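The idempotence check can also be run end to end in plain JavaScript. This sketch assumes myKey is a string (its definition is not shown in this excerpt) and compares the results of the two calls by value.

```javascript
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var myKey = 'myKey'; // hypothetical; defined in a step not shown here

// One element is itself the output of reduceFunction2 (a re-reduce).
var valuesIdempotent = [
  { count: 1, qty: 5 },
  { count: 2, qty: 10 },
  reduceFunction2(myKey, [ { count: 3, qty: 15 } ])
];
var values1 = [ { count: 1, qty: 5 }, { count: 2, qty: 10 }, { count: 3, qty: 15 } ];

var fromReReduce = reduceFunction2(myKey, valuesIdempotent);
var fromRaw = reduceFunction2(myKey, values1);
// Both → { count: 6, qty: 30 }
```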
MongoDB for Time Series Data: Analyzing Time Series Data Using the Aggregation Framework and Hadoop10
The Aggregation Framework11
Webinar: Exploring the Aggregation Framework12
Name Description
$project Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document.
$match Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. $match uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match).
$redact Reshapes each document in the stream by restricting the content for each document based on information stored in the documents themselves. Incorporates the functionality of $project and $match. Can be used to implement field-level redaction. For each input document, outputs either one or zero documents.
$limit Passes the first n documents unmodified to the pipeline where n is the specified limit. For each input document, outputs either one document (for the first n documents) or zero documents (after the first n documents).
$skip Skips the first n documents where n is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first n documents) or one document (if after the first n documents).
$unwind Deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.
$group Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.
$sort Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document.
$geoNear Returns an ordered stream of documents based on the proximity to a geospatial point. Incorporates the functionality of $match, $sort, and $limit for geospatial data. The output documents include an additional distance field and can include a location identifier field.
$out Writes the resulting documents of the aggregation pipeline to a collection. To use the $out stage, it must be the last stage in the pipeline.
Expressions
Expressions can include field paths and system variables (page 452), literals (page 453), expression objects (page 453),
and expression operators (page 453). Expressions can be nested.
Field Path and System Variables
Aggregation expressions use field paths to access fields in the input documents. To specify a field path, use a string that
prefixes the field name (or the dotted field name, if the field is in an embedded document) with a dollar sign $. For example,
use "$user" to specify the field path for the user field or "$user.name" to specify the field path to the "user.name"
field.
"$<field>" is equivalent to "$$CURRENT.<field>" where CURRENT (page 461) is a system variable that
defaults to the root of the current object in most stages, unless stated otherwise in specific stages. CURRENT
(page 461) can be rebound.
Along with the CURRENT (page 461) system variable, other system variables (page 460) are also available for use in
expressions. To use user-defined variables, use $let and $map expressions. To access variables in expressions, use
a string that prefixes the variable name with $$.
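To illustrate how a field path string maps onto a document, here is a hypothetical helper in plain JavaScript (not part of MongoDB) that resolves "$user.name"-style paths against a single document. It is deliberately simplified: it does not traverse arrays, which real field paths do, and it rejects "$$" variable references such as $$CURRENT.

```javascript
// Hypothetical helper: resolve a "$a.b.c" field path against `doc`.
// Returns undefined when any segment is missing. Arrays are not handled.
function resolveFieldPath(path, doc) {
  if (typeof path !== "string" || path.charAt(0) !== "$" || path.charAt(1) === "$") {
    throw new Error('expected a field path such as "$user.name"');
  }
  var value = doc;
  var segments = path.slice(1).split(".");
  for (var i = 0; i < segments.length; i++) {
    if (value === null || typeof value !== "object") return undefined;
    value = value[segments[i]];
  }
  return value;
}

var doc = { user: { name: "Alice", age: 27 } };
resolveFieldPath("$user.name", doc); // → "Alice"
resolveFieldPath("$user", doc);      // → { name: "Alice", age: 27 }
```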
Literals
Literals can be of any type. However, MongoDB parses string literals that start with a dollar sign $ as a path to a field
and numeric/boolean literals in expression objects (page 453) as projection flags. To avoid parsing literals, use the
$literal expression.
Expression Objects
If the expressions are numeric or boolean literals, MongoDB treats the literals as projection flags (e.g. 1 or true to
include the field), valid only in the $project stage. To avoid treating numeric or boolean literals as projection flags,
use the $literal expression to wrap the numeric or boolean literals.
Operator Expressions
Operator expressions are similar to functions that take arguments. In general, these expressions take an array of
arguments and have the following form:
{ <operator>: [ <argument1>, <argument2> ... ] }
If the operator accepts a single argument, you can omit the outer array designating the argument list:
{ <operator>: <argument> }
To avoid parsing ambiguity if the argument is a literal array, you must wrap the literal array in a $literal expression
or keep the outer array that designates the argument list.
Boolean Expressions Boolean expressions evaluate their argument expressions as booleans and return a boolean as
the result.
In addition to the false boolean value, Boolean expressions evaluate the following as false: null, 0, and
undefined values. Boolean expressions evaluate all other values as true, including non-zero numeric values
and arrays.
Name Description
$and Returns true only when all its expressions evaluate to true. Accepts any number of argument expressions.
$or Returns true when any of its expressions evaluates to true. Accepts any number of argument expressions.
$not Returns the boolean value that is the opposite of its argument expression. Accepts a single argument expression.
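The truthiness rule above differs from JavaScript's in at least one visible way: an empty string is not in the list of false values, so by the stated rule aggregation boolean expressions treat it as true, whereas JavaScript treats "" as false. A plain-JavaScript sketch of the rule as stated here, with hypothetical helper names; it covers JavaScript numbers only, not other BSON numeric types.

```javascript
// Sketch of the stated rule: false, null, 0 and undefined are false;
// every other value (including "", [] and non-zero numbers) is true.
function aggTruthy(value) {
  return !(value === false || value === null || value === 0 || value === undefined);
}

// $and / $or / $not built on that rule.
function aggAnd(args) { return args.every(aggTruthy); }
function aggOr(args) { return args.some(aggTruthy); }
function aggNot(arg) { return !aggTruthy(arg); }
```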
Set Expressions Set expressions perform set operations on arrays, treating arrays as sets. Set expressions ignore
the duplicate entries in each input array and the order of the elements.
If the set operation returns a set, the operation filters out duplicates in the result to output an array that contains only
unique entries. The order of the elements in the output array is unspecified.
If a set contains a nested array element, the set expression does not descend into the nested array but evaluates the
array at top-level.
Name Description
$setEquals Returns true if the input sets have the same distinct elements. Accepts two or more argument expressions.
$setIntersection Returns a set with elements that appear in all of the input sets. Accepts any number of argument expressions.
$setUnion Returns a set with elements that appear in any of the input sets. Accepts any number of argument expressions.
$setDifference Returns a set with elements that appear in the first set but not in the second set; i.e. performs a relative complement13 of the second set relative to the first. Accepts exactly two argument expressions.
$setIsSubset Returns true if all elements of the first set appear in the second set, including when the first set equals the second set; i.e. not a strict subset14. Accepts exactly two argument expressions.
$anyElementTrue Returns true if any elements of a set evaluate to true; otherwise, returns false. Accepts a single argument expression.
$allElementsTrue Returns true if no element of a set evaluates to false; otherwise, returns false. Accepts a single argument expression.
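A plain-JavaScript sketch of several of these set semantics, with hypothetical function names. It assumes element equality can be approximated with JSON.stringify, which is adequate for the numbers, strings and nested arrays used here but not for all BSON types (for example, objects with reordered keys). Nested arrays are compared as whole elements, matching the top-level rule above.

```javascript
// Approximate element equality by serialized form.
function keyOf(x) { return JSON.stringify(x); }

// Drop duplicates, keeping the first occurrence.
function distinct(arr) {
  var seen = {};
  return arr.filter(function (x) {
    var k = keyOf(x);
    if (Object.prototype.hasOwnProperty.call(seen, k)) return false;
    seen[k] = true;
    return true;
  });
}

function contains(arr, x) {
  return arr.some(function (y) { return keyOf(y) === keyOf(x); });
}

// concat spreads each input set one level only, so nested arrays stay whole.
function setUnion(sets) { return distinct([].concat.apply([], sets)); }

function setIntersection(sets) {
  if (sets.length === 0) return [];
  return distinct(sets[0]).filter(function (x) {
    return sets.every(function (s) { return contains(s, x); });
  });
}

function setDifference(a, b) {
  return distinct(a).filter(function (x) { return !contains(b, x); });
}

function setIsSubset(a, b) { return setDifference(a, b).length === 0; }

function setEquals(a, b) { return setIsSubset(a, b) && setIsSubset(b, a); }
```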
Comparison Expressions Comparison expressions return a boolean, except for $cmp, which returns a number.
The comparison expressions take two argument expressions and compare both value and type, using the specified
BSON comparison order (page 176) for values of different types.
Name Description
$cmp Returns 0 if the two values are equivalent, 1 if the first value is greater than the second, and -1 if the first value is less than the second.
$eq Returns true if the values are equivalent.
$gt Returns true if the first value is greater than the second.
$gte Returns true if the first value is greater than or equal to the second.
$lt Returns true if the first value is less than the second.
$lte Returns true if the first value is less than or equal to the second.
$ne Returns true if the values are not equivalent.
Arithmetic Expressions Arithmetic expressions perform mathematical operations on numbers. Some arithmetic expressions can also support date arithmetic.
Name Description
$add Adds numbers to return the sum, or adds numbers and a date to return a new date. If adding numbers and a date, treats the numbers as milliseconds. Accepts any number of argument expressions, but at most one expression can resolve to a date.
$subtract Returns the result of subtracting the second value from the first. If the two values are numbers, returns the difference. If the two values are dates, returns the difference in milliseconds. If the two values are a date and a number in milliseconds, returns the resulting date. Accepts two argument expressions. If the two values are a date and a number, specify the date argument first, as it is not meaningful to subtract a date from a number.
$multiply Multiplies numbers to return the product. Accepts any number of argument expressions.
$divide Returns the result of dividing the first number by the second. Accepts two argument expressions.
$mod Returns the remainder of the first number divided by the second. Accepts two argument expressions.
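A plain-JavaScript sketch of the date rules for $add and $subtract described above, using JavaScript Date objects as stand-ins for BSON dates. The function names are hypothetical, and the sketch simply rejects the cases the table rules out (more than one date in $add, a date subtracted from a number).

```javascript
function isDate(x) { return x instanceof Date; }

// $add: numbers sum; with one date argument, the numbers are milliseconds.
function aggAdd(args) {
  var dates = args.filter(isDate);
  if (dates.length > 1) throw new Error("$add: at most one date argument");
  var ms = args.reduce(function (sum, x) {
    return sum + (isDate(x) ? x.getTime() : x);
  }, 0);
  return dates.length === 1 ? new Date(ms) : ms;
}

// $subtract: number - number, date - date (milliseconds), date - number (date).
function aggSubtract(a, b) {
  if (isDate(a) && isDate(b)) return a.getTime() - b.getTime();
  if (isDate(a)) return new Date(a.getTime() - b);
  if (isDate(b)) throw new Error("$subtract: date must be the first argument");
  return a - b;
}

var noon = new Date(Date.UTC(2014, 0, 1, 12, 0, 0));
aggAdd([noon, 60 * 60 * 1000]); // → 2014-01-01T13:00:00Z
```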
String Expressions String expressions, with the exception of $concat, have well-defined behavior only for
strings of ASCII characters.
$concat behavior is well-defined regardless of the characters used.
13 http://en.wikipedia.org/wiki/Complement_(set_theory)
14 http://en.wikipedia.org/wiki/Subset
Name Description
$concat Concatenates any number of strings.
$substr Returns a substring of a string, starting at a specified index position up to a specified length. Accepts three expressions as arguments: the first argument must resolve to a string, and the second and third arguments must resolve to integers.
$toLower Converts a string to lowercase. Accepts a single argument expression.
$toUpper Converts a string to uppercase. Accepts a single argument expression.
$strcasecmp Performs case-insensitive string comparison and returns: 0 if two strings are equivalent, 1 if the first string is greater than the second, and -1 if the first string is less than the second.
Text Search Expressions
Name Description
$meta Access text search metadata.
Array Expressions
Name Description
$size Returns the number of elements in the array. Accepts a single expression as argument.
Variable Expressions
Name Description
$map Applies a subexpression to each element of an array and returns the array of resulting values in order. Accepts named parameters.
$let Defines variables for use within the scope of a subexpression and returns the result of the subexpression. Accepts named parameters.
Literal Expressions
Name Description
$literal Returns a value without parsing. Use for values that the aggregation pipeline may interpret as an expression. For example, wrap a string that starts with a $ in a $literal expression to keep it from being parsed as a field path.
Date Expressions
Name Description
$dayOfYear Returns the day of the year for a date as a number between 1 and 366 (leap year).
$dayOfMonth Returns the day of the month for a date as a number between 1 and 31.
$dayOfWeek Returns the day of the week for a date as a number between 1 (Sunday) and 7 (Saturday).
$year Returns the year for a date as a number (e.g. 2014).
$month Returns the month for a date as a number between 1 (January) and 12 (December).
$week Returns the week number for a date as a number between 0 (the partial week that precedes the first Sunday of the year) and 53 (leap year).
$hour Returns the hour for a date as a number between 0 and 23.
$minute Returns the minute for a date as a number between 0 and 59.
$second Returns the seconds for a date as a number between 0 and 60 (leap seconds).
$millisecond Returns the milliseconds of a date as a number between 0 and 999.
$dateToString Returns the date as a formatted string.
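Several of these operators number fields differently from JavaScript's Date methods; for example, JavaScript's getUTCDay() returns 0 (Sunday) through 6 (Saturday), while $dayOfWeek returns 1 through 7. A plain-JavaScript sketch deriving the values described above, assuming dates are interpreted in UTC. The helper name is hypothetical; the week formula counts weeks that start on Sunday, with days before the first Sunday in week 0.

```javascript
// Sketch: $dayOfYear, $dayOfWeek, $month and $week values for a Date (UTC).
function dateParts(d) {
  var startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  var startOfDay = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  var yday = Math.round((startOfDay - startOfYear) / 86400000); // 0-based day of year
  var wday = d.getUTCDay();                                      // 0 = Sunday
  return {
    dayOfYear: yday + 1,                    // 1..366
    dayOfWeek: wday + 1,                    // 1 (Sunday)..7 (Saturday)
    month: d.getUTCMonth() + 1,             // 1 (January)..12 (December)
    week: Math.floor((yday + 7 - wday) / 7) // 0 = partial week before the first Sunday
  };
}

// Wednesday, 1 January 2014 → dayOfYear 1, dayOfWeek 4, month 1, week 0
dateParts(new Date(Date.UTC(2014, 0, 1)));
```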
Conditional Expressions
Name Description
$cond A ternary operator that evaluates one expression and, depending on the result, returns the value of one of the other two expressions. Accepts either three expressions in an ordered list or three named parameters.
$ifNull Returns either the non-null result of the first expression or the result of the second expression if the first expression results in a null result. Null result encompasses instances of undefined values or missing fields. Accepts two expressions as arguments. The result of the second expression can be null.
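A plain-JavaScript sketch of the two conditional operators, with hypothetical helper names and a hypothetical sample document. $ifNull treats null, undefined, and a missing field (undefined in JavaScript) alike; the $cond sketch assumes its condition uses the same truthiness rule as the boolean expressions (false, null, 0 and undefined are false).

```javascript
// $ifNull: fall back to the replacement only for null/undefined/missing.
function aggIfNull(expr, replacement) {
  return expr === null || expr === undefined ? replacement : expr;
}

// $cond in its ordered-list form: [ <if>, <then>, <else> ].
function aggCond(ifValue, thenValue, elseValue) {
  var truthy = !(ifValue === false || ifValue === null ||
                 ifValue === 0 || ifValue === undefined);
  return truthy ? thenValue : elseValue;
}

var doc = { description: null, qty: 250 };
aggIfNull(doc.description, "Unspecified");  // → "Unspecified"
aggIfNull(doc.missingField, "Unspecified"); // → "Unspecified"
aggCond(doc.qty >= 250, 30, 20);            // → 30
```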
Accumulators
Accumulators, available only for the $group stage, compute values by combining documents that share the same
group key. Accumulators take as input a single expression, evaluating the expression once for each input document,
and maintain their state for the group of documents.
Name Description
$sum Returns a sum for each group. Ignores non-numeric values.
$avg Returns an average for each group. Ignores non-numeric values.
$first Returns a value from the first document for each group. Order is only defined if the documents are in a defined order.
$last Returns a value from the last document for each group. Order is only defined if the documents are in a defined order.
$max Returns the highest expression value for each group.
$min Returns the lowest expression value for each group.
$push Returns an array of expression values for each group.
$addToSet Returns an array of unique expression values for each group. Order of the array elements is undefined.
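A plain-JavaScript sketch of what a $group stage does with a few of these accumulators: documents are bucketed by a group key, and each accumulator folds one value per document into per-group state. The function name and sample documents are hypothetical; input order stands in for a preceding $sort, and the sketch returns null for $avg when a group has no numeric values (an assumption).

```javascript
// Emulates { $group: { _id: "$cust_id", total: { $sum: "$price" },
//   avgPrice: { $avg: "$price" }, firstPrice: { $first: "$price" },
//   skus: { $addToSet: "$sku" } } } over an in-memory array.
function groupOrders(docs) {
  var state = {};
  var keys = [];
  docs.forEach(function (doc) {
    var k = doc.cust_id;
    if (!Object.prototype.hasOwnProperty.call(state, k)) {
      state[k] = { _id: k, total: 0, n: 0, firstPrice: doc.price, skus: [] };
      keys.push(k);
    }
    var s = state[k];
    if (typeof doc.price === "number") { // $sum and $avg ignore non-numeric values
      s.total += doc.price;
      s.n += 1;
    }
    if (s.skus.indexOf(doc.sku) === -1) s.skus.push(doc.sku); // $addToSet
  });
  return keys.map(function (k) {
    var s = state[k];
    return { _id: s._id, total: s.total, avgPrice: s.n > 0 ? s.total / s.n : null,
             firstPrice: s.firstPrice, skus: s.skus };
  });
}

var result = groupOrders([
  { cust_id: "abc123", price: 50, sku: "xxx" },
  { cust_id: "abc123", price: "n/a", sku: "yyy" },
  { cust_id: "abc123", price: 30, sku: "xxx" },
  { cust_id: "xyz789", price: 10, sku: "zzz" }
]);
// result[0] → { _id: "abc123", total: 80, avgPrice: 40, firstPrice: 50, skus: ["xxx", "yyy"] }
```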
Description
aggregate: New in version 2.2. Designed with specific goals of improving performance and usability for aggregation tasks. Uses a pipeline approach where objects are transformed as they pass through a series of pipeline operators such as $group, $match, and $sort. See http://docs.mongodb.org/manual/reference/operator/aggregation for more information on the pipeline operators.
mapReduce: Implements the Map-Reduce aggregation for processing large data sets.
group: Provides grouping functionality. Is slower than the aggregate command and has less functionality than the mapReduce command.
Key Features
aggregate: Pipeline operators can be repeated as needed. Pipeline operators need not produce one output document for every input document. Can also generate new documents or filter out documents.
mapReduce: In addition to grouping operations, can perform complex aggregation tasks as well as perform incremental aggregation on continuously growing datasets. See Map-Reduce Examples (page 442) and Perform Incremental Map-Reduce (page 445).
group: Can group either by existing fields or, with a custom keyf JavaScript function, by calculated fields. See group for information and an example using the keyf function.
Flexibility
aggregate: Limited to the operators and expressions supported by the aggregation pipeline. However, can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results by using the $project pipeline operator. See $project for more information as well as http://docs.mongodb.org/manual/reference/operator/aggregation for more information on all the available pipeline operators.
mapReduce: Custom map, reduce and finalize JavaScript functions offer flexibility to aggregation logic. See mapReduce for details and restrictions on the functions.
group: Custom reduce and finalize JavaScript functions offer flexibility to grouping logic. See group for details and restrictions on these functions.
Output Results
aggregate: Returns results in various options (inline as a document that contains the result set, a cursor to the result set) or stores the results in a collection. The result is subject to the BSON Document size limit if returned inline as a document that contains the result set. Changed in version 2.6: Can return results as a cursor or store the results to a collection.
mapReduce: Returns results in various options (inline, new collection, merge, replace, reduce). See mapReduce for details on the output options. Changed in version 2.2: Provides much better support for sharded map-reduce output than previous versions.
group: Returns results inline as an array of grouped items. The result set must fit within the maximum BSON document size limit. Changed in version 2.2: The returned array can contain at most 20,000 elements; i.e. at most 20,000 unique groupings. Previous versions had a limit of 10,000 elements.
Sharding
aggregate: Supports non-sharded and sharded input collections.
mapReduce: Supports non-sharded and sharded input collections.
group: Does not support sharded collections.
Notes
mapReduce: Prior to 2.4, JavaScript code executed in a single thread.
group: Prior to 2.4, JavaScript code executed in a single thread.
More Information
aggregate: See Aggregation Pipeline (page 421) and aggregate.
mapReduce: See Map-Reduce (page 424) and mapReduce.
group: See group.
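SQL Terms, Functions, and Concepts / MongoDB Aggregation Operators
WHERE: $match
GROUP BY: $group
HAVING: $match
SELECT: $project
ORDER BY: $sort
LIMIT: $limit
SUM(): $sum
COUNT(): $sum
join: No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.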
Examples
The following table presents a quick reference of SQL aggregation statements and the corresponding MongoDB statements. The examples in the table assume the following conditions:
The SQL examples assume two tables, orders and order_lineitem that join by the
order_lineitem.order_id and the orders.id columns.
The MongoDB examples assume one collection orders that contains documents of the following prototype:
{
cust_id: "abc123",
ord_date: ISODate("2012-11-02T17:04:11.102Z"),
status: 'A',
price: 50,
items: [ { sku: "xxx", qty: 25, price: 1 },
{ sku: "yyy", qty: 25, price: 1 } ]
}
SQL Example
MongoDB Example
db.orders.aggregate( [
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
] )
Description
Count all records from orders
db.orders.aggregate( [
{
$group: {
For cust_id with multiple records,
return the cust_id and the corresponding record count.
Aggregation Commands
Name Description
aggregate Performs aggregation tasks (page 421) such as group using the aggregation framework.
count Counts the number of documents in a collection.
distinct Displays the distinct values found for a specified key in a collection.
group Groups documents in a collection by the specified key and performs simple aggregation.
mapReduce Performs map-reduce (page 424) aggregation for large data sets.
Aggregation Methods
Name Description
db.collection.aggregate() Provides access to the aggregation pipeline (page 421).
db.collection.group() Groups documents in a collection by the specified key and performs simple aggregation.
db.collection.mapReduce() Performs map-reduce (page 424) aggregation for large data sets.
Variable Description
ROOT References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage.
CURRENT References the start of the field path being processed in the aggregation pipeline stage. Unless documented otherwise, all stages start with CURRENT (page 461) set to the same value as ROOT (page 461).
CURRENT (page 461) is modifiable. However, since $<field> is equivalent to $$CURRENT.<field>, rebinding CURRENT (page 461) changes the meaning of $ accesses.
DESCEND One of the allowed results of a $redact expression.
PRUNE One of the allowed results of a $redact expression.
KEEP One of the allowed results of a $redact expression.
See also:
$let, $redact
CHAPTER 8
Indexes
Indexes provide high performance read operations for frequently used queries.
This section introduces indexes in MongoDB, describes the types and configuration options for indexes, and describes
special types of indexing MongoDB supports. The section also provides tutorials detailing procedures and operational
concerns, and provides information on how applications may use indexes.
Index Introduction (page 463) An introduction to indexes in MongoDB.
Index Concepts (page 468) The core documentation of indexes in MongoDB, including geospatial and text indexes.
Index Types (page 469) MongoDB provides different types of indexes for different purposes and different types
of content.
Index Properties (page 488) The properties you can specify when building indexes.
Index Creation (page 493) The options available when creating indexes.
Index Intersection (page 495) The use of index intersection to fulfill a query.
Indexing Tutorials (page 502) Examples of operations involving indexes, including index creation and querying indexes.
Indexing Reference (page 537) Reference material for indexes in MongoDB.
8.1.1 Optimization
Consider the documentation of the query optimizer (page 66) for more information on the relationship between queries
and indexes.
Create indexes to support common and user-facing queries. Having these indexes will ensure that MongoDB only
scans the smallest possible number of documents.
Indexes can also optimize the performance of other operations in specific situations:
Sorted Results
MongoDB can use indexes to return documents sorted by the index key directly from the index without requiring an
additional sort phase.
An index is traversable for sorting in either direction. For details, see Use Indexes to Sort Query Results (page 533).
Covered Results
When the query criteria and the projection of a query include only the indexed fields, MongoDB will return results
directly from the index without scanning any documents or bringing documents into memory. These covered queries
can be very efficient.
The _id index is unique, and prevents clients from inserting two documents with the same value for the _id field.
Single Field
In addition to the MongoDB-defined _id index, MongoDB supports user-defined indexes on a single field of a document (page 470). Consider the following illustration of a single-field index:
Compound Index
MongoDB also supports user-defined indexes on multiple fields. These compound indexes (page 472) behave like
single-field indexes; however, the query can select documents based on additional fields. The order of fields listed
in a compound index has significance. For instance, if a compound index consists of { userid: 1, score:
-1 }, the index sorts first by userid and then, within each userid value, sorts by score. Consider the following
illustration of this compound index:
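The ordering that a compound index specification describes can also be expressed as a comparator. This plain-JavaScript sketch (hypothetical helper, not MongoDB code) orders index entries field by field according to a spec such as { userid: 1, score: -1 }. It uses JavaScript's < and > in place of BSON comparison, so it only makes sense for values of the same type.

```javascript
// Build a comparator from an index spec: 1 = ascending, -1 = descending.
// Object.keys preserves the spec's field order, which determines precedence.
function compareByIndexSpec(spec) {
  var fields = Object.keys(spec);
  return function (a, b) {
    for (var i = 0; i < fields.length; i++) {
      var f = fields[i];
      var dir = spec[f];
      if (a[f] < b[f]) return -dir;
      if (a[f] > b[f]) return dir;
    }
    return 0; // equal on every indexed field
  };
}

var entries = [
  { userid: "ca2", score: 75 },
  { userid: "aa1", score: 55 },
  { userid: "ca2", score: 30 },
  { userid: "aa1", score: 90 }
];
entries.sort(compareByIndexSpec({ userid: 1, score: -1 }));
// → aa1/90, aa1/55, ca2/75, ca2/30
```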
Multikey Index
MongoDB uses multikey indexes (page 474) to index the content stored in arrays. If you index a field that holds an
array value, MongoDB creates separate index entries for every element of the array. These multikey indexes (page 474)
allow queries to select documents that contain arrays by matching on element or elements of the arrays. MongoDB
automatically determines whether to create a multikey index if the indexed field contains an array value; you do not
need to explicitly specify the multikey type.
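The rule that a multikey index holds a separate entry for every array element can be sketched in plain JavaScript. The helper and sample documents below are hypothetical and simplified: the sketch handles one top-level field and does not model dotted paths, nested arrays, empty arrays, or repeated elements.

```javascript
// Sketch: the (key, _id) entries a multikey index on a top-level `field` would hold.
function multikeyEntries(docs, field) {
  var entries = [];
  docs.forEach(function (doc) {
    var value = doc[field];
    var keys = Array.isArray(value) ? value : [value]; // one entry per element
    keys.forEach(function (k) { entries.push({ key: k, _id: doc._id }); });
  });
  return entries;
}

var products = [
  { _id: 1, item: "Banana", category: ["food", "produce", "grocery"] },
  { _id: 2, item: "Apple", category: "food" }
];
var indexEntries = multikeyEntries(products, "category");
// indexEntries → food/1, produce/1, grocery/1, food/2
var foodIds = indexEntries
  .filter(function (e) { return e.key === "food"; })
  .map(function (e) { return e._id; });
// foodIds → [1, 2]: a match on one element finds both documents
```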
Consider the following illustration of a multikey index:
Geospatial Index
To support efficient queries of geospatial coordinate data, MongoDB provides two special indexes: 2d indexes
(page 483) that use planar geometry when returning results and 2dsphere indexes (page 478) that use spherical geometry to return results.
See 2d Index Internals (page 484) for a high level introduction to geospatial indexes.
Text Indexes
MongoDB provides a text index type that supports searching for string content in a collection. These text indexes
do not store language-specific stop words (e.g. the, a, or) and stem the words in a collection to only store root
words.
See Text Indexes (page 486) for more information on text indexes and search.
Hashed Indexes
To support hash-based sharding (page 654), MongoDB provides a hashed index (page 487) type, which indexes the
hash of the value of a field. These indexes have a more random distribution of values along their range, but only
support equality matches and cannot support range-based queries.
Multikey Indexes (page 474) A multikey index is an index on an array field, adding an index key for each value
in the array.
Geospatial Indexes and Queries (page 476) Geospatial indexes support location-based searches on data that is
stored as either GeoJSON objects or legacy coordinate pairs.
Text Indexes (page 486) Text indexes support search of string content in documents.
Hashed Index (page 487) Hashed indexes maintain entries with hashes of the values of the indexed field and
are primarily used with sharded clusters to support hashed shard keys.
Index Properties (page 488) The properties you can specify when building indexes.
TTL Indexes (page 488) The TTL index is used for TTL collections, which expire data after a period of time.
Unique Indexes (page 490) A unique index causes MongoDB to reject all documents that contain a duplicate
value for the indexed field.
Sparse Indexes (page 490) A sparse index does not index documents that do not have the indexed field.
Index Creation (page 493) The options available when creating indexes.
Index Intersection (page 495) The use of index intersection to fulfill a query.
Multikey Index Bounds (page 497) The computation of bounds on a multikey index scan.
MongoDB indexes may be ascending (i.e. 1) or descending (i.e. -1) in their ordering. Nevertheless, MongoDB
may traverse the index in either direction. As a result, for single-field indexes, ascending and descending indexes are
interchangeable. This is not the case for compound indexes: in compound indexes, the direction of the sort order can
have a greater impact on the results.
See Sort Order (page 473) for more information on the impact of index order on results in compound indexes.
Index Intersection
MongoDB can use the intersection of indexes to fulfill queries with compound conditions. See Index Intersection
(page 495) for details.
Limits
Certain restrictions apply to indexes, such as the length of the index keys or the number of indexes per collection. See
Index Limitations for details.
Index Type Documentation
Single Field Indexes (page 470) A single field index only includes data from a single field of the documents in a
collection. MongoDB supports single field indexes on fields at the top level of a document and on fields in
sub-documents.
Compound Indexes (page 472) A compound index includes more than one field of the documents in a collection.
Multikey Indexes (page 474) A multikey index is an index on an array field, adding an index key for each value in
the array.
Geospatial Indexes and Queries (page 476) Geospatial indexes support location-based searches on data that is stored
as either GeoJSON objects or legacy coordinate pairs.
Text Indexes (page 486) Text indexes support search of string content in documents.
Hashed Index (page 487) Hashed indexes maintain entries with hashes of the values of the indexed field and are
primarily used with sharded clusters to support hashed shard keys.
Single Field Indexes
MongoDB provides complete support for indexes on any field in a collection of documents. By default, all collections
have an index on the _id field (page 471), and applications and users may add additional indexes to support important
queries and operations.
MongoDB supports indexes that contain either a single field or multiple fields depending on the operations that this
index-type supports. This document describes indexes that contain a single field. Consider the following illustration
of a single field index.
See also:
Compound Indexes (page 472) for information about indexes that include multiple fields, and Index Introduction
(page 463) for a higher level introduction to indexing in MongoDB.
Example Given the following document in the friends collection:
{ "_id" : ObjectId(...),
"name" : "Alice",
"age" : 27
}
Cases
_id Field Index MongoDB creates the _id index, which is an ascending unique index (page 490) on the _id field,
for all collections when the collection is created. You cannot remove the index on the _id field.
Think of the _id field as the primary key for a collection. Every document must have a unique _id field. You may
store any unique value in the _id field. The default value of _id is an ObjectId which is generated when the client
inserts the document. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
Note: In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the
uniqueness of the values in the _id field to prevent errors. This is most-often done by using a standard auto-generated
ObjectId.
Before version 2.2, capped collections did not have an _id field. In version 2.2 and newer, capped collections do
have an _id field, except those in the local database. See Capped Collections Recommendations and Restrictions
(page 209) for more information.
Indexes on Embedded Fields You can create indexes on fields within embedded documents, just as you can index
top-level fields in documents. Indexes on embedded fields differ from indexes on embedded documents (page 472),
which include the full content up to the maximum index size of the embedded document in the index. Instead,
indexes on embedded fields allow you to use dot notation to introspect into embedded documents.
Consider a collection named people that holds documents that resemble the following example document:
{"_id": ObjectId(...),
"name": "John Doe",
"address": {
"street": "Main",
"zipcode": "53511",
"state": "WI"
}
}
You can create an index on the address.zipcode field, using the following specification:
db.people.createIndex( { "address.zipcode": 1 } )
Indexes on Embedded Documents You can also create indexes on embedded documents.
For example, the factories collection contains documents that contain a metro field, such as:
{
_id: ObjectId(...),
metro: {
city: "New York",
state: "NY"
},
name: "Giant Factory"
}
The metro field is an embedded document, containing the embedded fields city and state. The following command creates an index on the metro field as a whole:
db.factories.createIndex( { metro: 1 } )
The following query can use the index on the metro field:
db.factories.find( { metro: { city: "New York", state: "NY" } } )
This query returns the above document. When performing equality matches on embedded documents, field order
matters and the embedded documents must match exactly. For example, the following query does not match the above
document:
db.factories.find( { metro: { state: "NY", city: "New York" } } )
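The field-order rule can be made concrete with a small sketch in plain JavaScript (runnable in Node.js, not against a MongoDB server). The function embeddedEquals is hypothetical, not MongoDB's BSON comparison. It assumes that object key order stands in for stored field order, which holds for ordinary string field names but not for integer-like keys, which JavaScript reorders.

```javascript
// Hypothetical sketch (not MongoDB's BSON comparison): exact, order-sensitive
// equality of two embedded documents, following the matching rule above.
// Handles nested plain objects and scalar values only.
function embeddedEquals(a, b) {
  const isObj = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
  if (!isObj(a) || !isObj(b)) return a === b;
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  // Field names must appear in the same order, and each value must match exactly.
  return aKeys.every((key, i) => key === bKeys[i] && embeddedEquals(a[key], b[key]));
}

const stored = { city: "New York", state: "NY" };
console.log(embeddedEquals(stored, { city: "New York", state: "NY" })); // true
console.log(embeddedEquals(stored, { state: "NY", city: "New York" })); // false
```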
Compound Indexes
MongoDB supports compound indexes, where a single index structure holds references to multiple fields within a
collection's documents. The following diagram illustrates an example of a compound index on two fields:
Chapter 8. Indexes
Consider a collection named products that holds documents that resemble the following document:
{
"_id": ObjectId(...),
"item": "Banana",
"category": ["food", "produce", "grocery"],
"location": "4th Street Store",
"stock": 4,
"type": "cases",
"arrival": Date(...)
}
If applications query on the item field as well as query on both the item field and the stock field, you can specify
a single compound index to support both of these queries:
db.products.createIndex( { "item": 1, "stock": 1 } )
Important: You may not create compound indexes that have hashed index fields. You will receive an error if you
attempt to create a compound index that includes a hashed index (page 487).
The order of the fields in a compound index is very important. In the previous example, the index will contain
references to documents sorted first by the values of the item field and, within each value of the item field, sorted
by values of the stock field. See Sort Order (page 473) for more information.
In addition to supporting queries that match on all the index fields, compound indexes can support queries that match
on the prefix of the index fields. For details, see Prefixes (page 473).
Sort Order Indexes store references to fields in either ascending (1) or descending (-1) sort order. For single-field
indexes, the sort order of keys doesn't matter because MongoDB can traverse the index in either direction. However,
for compound indexes (page 472), sort order can matter in determining whether the index can support a sort operation.
Consider a collection events that contains documents with the fields username and date. Applications can issue
queries that return results sorted first by ascending username values and then by descending (i.e. from most recent to
least recent) date values, such as:
db.events.find().sort( { username: 1, date: -1 } )
or queries that return results sorted first by descending username values and then by ascending date values, such
as:
db.events.find().sort( { username: -1, date: 1 } )
The following index can support both these sort operations:
db.events.createIndex( { "username" : 1, "date" : -1 } )
However, the above index cannot support sorting by ascending username values and then by ascending date
values, such as the following:
db.events.find().sort( { username: 1, date: 1 } )
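The rule behind these three examples can be sketched as follows. indexSupportsSort is a hypothetical helper, not a MongoDB API. It assumes that the sort fields must match a leading run of the index fields in the same order, and that the index can be walked either forward (every direction equal) or backward (every direction inverted).

```javascript
// Hypothetical helper (not a MongoDB API): can an index with key pattern
// `indexSpec` be traversed forward or backward to yield the order in `sortSpec`?
// Assumes sort fields must be a leading run of index fields, in the same order.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexFields = Object.entries(indexSpec);
  const sortFields = Object.entries(sortSpec);
  if (sortFields.length > indexFields.length) return false;
  let forward = true;  // all directions match the index
  let backward = true; // all directions are the inverse of the index
  for (let i = 0; i < sortFields.length; i++) {
    const [sortField, sortDir] = sortFields[i];
    const [indexField, indexDir] = indexFields[i];
    if (sortField !== indexField) return false;
    if (sortDir !== indexDir) forward = false;
    if (sortDir !== -indexDir) backward = false;
  }
  return forward || backward;
}

const eventsIndex = { username: 1, date: -1 };
console.log(indexSupportsSort(eventsIndex, { username: 1, date: -1 })); // true
console.log(indexSupportsSort(eventsIndex, { username: -1, date: 1 })); // true
console.log(indexSupportsSort(eventsIndex, { username: 1, date: 1 }));  // false
```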
Prefixes Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound
index:
{ "item": 1, "location": 1, "stock": 1 }
The index has the following index prefixes:
{ item: 1 }
{ item: 1, location: 1 }
For a compound index, MongoDB can use the index to support queries on the index prefixes. As such, MongoDB can
use the index for queries on the following fields:
the item field,
the item field and the location field,
the item field and the location field and the stock field.
MongoDB can also use the index to support a query on the item and stock fields, since the item field corresponds to a
prefix. However, the index would not be as efficient in supporting the query as would be an index on only item and
stock.
However, MongoDB cannot use the index to support queries that include the following fields since without the item
field, none of the listed fields correspond to a prefix index:
the location field,
the stock field, or
the location and stock fields.
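A minimal sketch of the prefix rule, assuming a query is described only by the set of fields it filters on. prefixUsage is a hypothetical helper: it returns "prefix" when the query fields form an exact index prefix, "partial" when only some of them sit on a prefix (the less efficient item and stock case), and "none" when the first index field is absent.

```javascript
// Hypothetical helper (not a MongoDB API): how a compound index with fields
// `indexFields` (in key order) relates to a query filtering on `queryFields`.
function prefixUsage(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  // Count the leading index fields that the query filters on.
  let prefixLength = 0;
  while (prefixLength < indexFields.length && wanted.has(indexFields[prefixLength])) {
    prefixLength++;
  }
  if (prefixLength === 0) return "none"; // first index field missing: no prefix applies
  if (prefixLength === wanted.size) return "prefix"; // every query field lies on the prefix
  return "partial"; // a prefix applies; remaining fields are filtered less efficiently
}

const productIndex = ["item", "location", "stock"];
console.log(prefixUsage(productIndex, ["item", "location"])); // prefix
console.log(prefixUsage(productIndex, ["item", "stock"]));    // partial
console.log(prefixUsage(productIndex, ["location", "stock"])); // none
```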
If you have a collection that has both a compound index and an index on its prefix (e.g. { a: 1, b: 1 } and
{ a: 1 }), if neither index has a sparse or unique constraint, then you can remove the index on the prefix (e.g. {
a: 1 }). MongoDB will use the compound index in all of the situations that it would have used the prefix index.
Index Intersection Starting in version 2.6, MongoDB can use index intersection (page 495) to fulfill queries. The
choice between creating compound indexes that support your queries or relying on index intersection depends on the
specifics of your system. See Index Intersection and Compound Indexes (page 496) for more details.
Multikey Indexes
To index a field that holds an array value, MongoDB adds index items for each item in the array. These multikey indexes
allow MongoDB to return documents from queries using the value of an array. MongoDB automatically determines
whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify
the multikey type.
Consider the following illustration of a multikey index:
Multikey indexes support all operations supported by other MongoDB indexes, and applications can use multikey indexes to select documents based on ranges of values stored in an array. Multikey indexes support arrays
that hold both values (e.g. strings, numbers) and nested documents.
Limitations
Interactions between Compound and Multikey Indexes While you can create multikey compound indexes
(page 472), at most one field in a compound index may hold an array. For example, given an index on { a: 1,
b: 1 }, the following documents are permissible:
{a: [1, 2], b: 1}
{a: 1, b: [1, 2]}
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection
with the {a: 1, b: 1 } index:
{a: [1, 2], b: [1, 2]}
If you attempt to insert such a document, MongoDB will reject the insertion, and produce an error that says cannot
index parallel arrays. MongoDB does not index parallel arrays because they require the index to include
each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult
to maintain indexes.
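The parallel-array restriction can be illustrated by generating the keys a compound index would need for one document. compoundMultikeyKeys is a hypothetical sketch that handles top-level fields only, and its error text is not MongoDB's exact message. It emits one key per array element and refuses documents in which more than one indexed field holds an array, since those would require the Cartesian product described above.

```javascript
// Hypothetical sketch (not MongoDB's key generator): the keys a compound index
// on the top-level fields `fields` would hold for one document.
function compoundMultikeyKeys(doc, fields) {
  const arrayFields = fields.filter((field) => Array.isArray(doc[field]));
  if (arrayFields.length > 1) {
    // Indexing this document would need every combination of the array values.
    throw new Error("cannot index parallel arrays: " + arrayFields.join(", "));
  }
  let keys = [[]];
  for (const field of fields) {
    // A scalar contributes one value; the single array field contributes one per element.
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    const next = [];
    for (const partial of keys) {
      for (const value of values) next.push([...partial, value]);
    }
    keys = next;
  }
  return keys;
}

console.log(compoundMultikeyKeys({ a: [1, 2], b: 1 }, ["a", "b"])); // two keys: [1, 1] and [2, 1]
```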
Shard Keys
Important: The index of a shard key cannot be a multikey index.
Hashed Indexes Hashed indexes are not compatible with multikey indexes.
To compute the hash for a hashed index, MongoDB collapses embedded documents and computes the hash for the
entire value. For fields that hold arrays or embedded documents, you cannot use the index to support queries that
introspect the embedded document.
Examples
Index Basic Arrays Given the following document:
{
"_id" : ObjectId("..."),
"name" : "Warm Weather",
"author" : "Steve",
"tags" : [ "weather", "hot", "record", "april" ]
}
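An index on the tags field, { tags: 1 }, would be a multikey index and would include these four separate entries for
that document:
"weather",
"hot",
"record", and
"april".
Queries could use the multikey index to return this document for any of the above values.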
Index Arrays with Embedded Documents You can create multikey indexes on fields in objects embedded in arrays,
as in the following example:
Consider a feedback collection with documents in the following form:
{
"_id": ObjectId(...),
"title": "Grocery Quality",
"comments": [
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the cheddar selection." },
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the mustard selection." },
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the olive selection." }
]
}
An index on the comments.text field would be a multikey index and would add items to the index for all embedded
documents in the array.
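With the index { "comments.text": 1 } on the feedback collection, consider the following query:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )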
The query would select the documents in the collection that contain the following embedded document in the
comments array:
{ author_id: ObjectId(...),
date: Date(...),
text: "Please expand the olive selection." }
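Both examples above come down to expanding arrays while following a field path. The sketch below, multikeyKeys, is a hypothetical illustration rather than MongoDB's key generator. It walks a dotted path such as comments.text, expands arrays met along the way, and returns one key per distinct scalar value. The sample data reuses the tags and comments examples with the ObjectId and Date fields omitted.

```javascript
// Hypothetical sketch (not MongoDB's key generator): collect the keys a multikey
// index on a dotted path would hold for one document.
function multikeyKeys(doc, path) {
  let values = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const value of values) {
      // Arrays met along the path are expanded element by element.
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && part in item) next.push(item[part]);
      }
    }
    values = next;
  }
  // A terminal array (such as tags) yields one key per element; duplicates collapse.
  const keys = new Set();
  for (const value of values) {
    for (const key of Array.isArray(value) ? value : [value]) keys.add(key);
  }
  return [...keys];
}

// Sample data from the examples above, without the ObjectId and Date fields.
const post = { name: "Warm Weather", author: "Steve", tags: ["weather", "hot", "record", "april"] };
const feedback = {
  title: "Grocery Quality",
  comments: [
    { text: "Please expand the cheddar selection." },
    { text: "Please expand the mustard selection." },
    { text: "Please expand the olive selection." },
  ],
};
console.log(multikeyKeys(post, "tags"));
console.log(multikeyKeys(feedback, "comments.text"));
```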
Geospatial Indexes and Queries
MongoDB offers a number of indexes and query mechanisms to handle geospatial information. This section introduces
MongoDB's geospatial features. For complete examples of geospatial queries in MongoDB, see Geospatial Index
Tutorials (page 515).
Surfaces Before storing your location data and writing queries, you must decide the type of surface to use to perform
calculations. The type you choose affects how you store data, what type of index to build, and the syntax of your
queries.
MongoDB offers two surface types:
Spherical To calculate geometry over an Earth-like sphere, store your location data on a spherical surface and use
2dsphere (page 478) index.
Store your location data as GeoJSON objects with this coordinate-axis order: longitude, latitude. The coordinate
reference system for GeoJSON uses the WGS84 datum.
Flat To calculate distances on a Euclidean plane, store your location data as legacy coordinate pairs and use a 2d
(page 483) index.
Location Data If you choose spherical surface calculations, you store location data as either:
GeoJSON Objects Queries on GeoJSON objects always calculate on a sphere. The default coordinate reference
system for GeoJSON uses the WGS84 datum.
New in version 2.4: Support for GeoJSON storage and queries is new in version 2.4. Prior to version 2.4, all geospatial
data used coordinate pairs.
Changed in version 2.6: Support for additional GeoJSON types: MultiPoint, MultiLineString, MultiPolygon, GeometryCollection.
MongoDB supports the following GeoJSON objects:
Point
LineString
Polygon
MultiPoint
MultiLineString
MultiPolygon
GeometryCollection
Legacy Coordinate Pairs MongoDB supports spherical surface calculations on legacy coordinate pairs using a
2dsphere index by converting the data to the GeoJSON Point type.
If you choose flat surface calculations via a 2d index, you can store data only as legacy coordinate pairs.
Query Operations MongoDB's geospatial query operators let you query for:
Inclusion MongoDB can query for locations contained entirely within a specified polygon. Inclusion queries use
the $geoWithin operator.
Both 2d and 2dsphere indexes can support inclusion queries. MongoDB does not require an index for inclusion
queries; however, such indexes will improve query performance.
Intersection MongoDB can query for locations that intersect with a specified geometry. These queries apply only
to data on a spherical surface. These queries use the $geoIntersects operator.
Only 2dsphere indexes support intersection.
Proximity MongoDB can query for the points nearest to another point. Proximity queries use the $near operator.
The $near operator requires a 2d or 2dsphere index.
8.2. Index Concepts
Geospatial Indexes MongoDB provides the following geospatial index types to support the geospatial queries.
2dsphere 2dsphere (page 478) indexes support:
Calculations on a sphere
GeoJSON objects and include backwards compatibility for legacy coordinate pairs
Compound indexes with scalar index fields (i.e. ascending or descending) as a prefix or suffix of the 2dsphere
index field
New in version 2.4: 2dsphere indexes are not available before version 2.4.
See also:
Query a 2dsphere Index (page 516)
2d 2d (page 483) indexes support:
Calculations using flat geometry
Legacy coordinate pairs (i.e., geospatial points on a flat coordinate system)
Compound indexes with only one additional field, as a suffix of the 2d index field
See also:
Query a 2d Index (page 519)
Geospatial Indexes and Sharding You cannot use a geospatial index as the shard key index.
You can create and maintain a geospatial index on a sharded collection if it uses fields other than the shard key fields.
For sharded collections, queries using $near are not supported. You can instead use either the geoNear command
or the $geoNear aggregation stage.
You also can query for geospatial data using $geoWithin.
Additional Resources The following pages provide complete documentation for geospatial indexes and queries:
2dsphere Indexes (page 478) A 2dsphere index supports queries that calculate geometries on an earth-like sphere.
The index supports data stored as both GeoJSON objects and as legacy coordinate pairs.
2d Indexes (page 483) The 2d index supports data stored as legacy coordinate pairs and is intended for use in MongoDB 2.2 and earlier.
geoHaystack Indexes (page 484) A haystack index is a special index optimized to return results over small areas. For
queries that use spherical geometry, a 2dsphere index is a better option than a haystack index.
2d Index Internals (page 484) Provides a more in-depth explanation of the internals of geospatial indexes. This material is not necessary for normal operations but may be useful for troubleshooting and for further understanding.
2dsphere Indexes New in version 2.4.
A 2dsphere index supports queries that calculate geometries on an earth-like sphere. The index supports data stored
as both GeoJSON objects and as legacy coordinate pairs. The index supports legacy coordinate pairs by converting
the data to the GeoJSON Point type. The default datum for an earth-like sphere in MongoDB 2.4 is WGS84.
Coordinate-axis order is longitude, latitude.
The 2dsphere index supports all MongoDB geospatial queries: queries for inclusion, intersection and proximity. See
the http://docs.mongodb.org/manual/reference/operator/query-geospatial reference for the query
operators that support geospatial queries.
To create a 2dsphere index, use the db.collection.createIndex() method. A compound (page 472)
2dsphere index can reference multiple location and non-location fields within a collection's documents. See Create
a 2dsphere Index (page 515) for more information.
2dsphere Version 2 Changed in version 2.6.
MongoDB 2.6 introduces a version 2 of 2dsphere indexes. Version 2 is the default version of 2dsphere
indexes created in MongoDB 2.6. To create a 2dsphere index as a version 1, include the option {
"2dsphereIndexVersion": 1 } when creating the index.
Additional GeoJSON Objects Changed in version 2.6.
Version 2 adds support for additional GeoJSON objects: MultiPoint (page 481), MultiLineString (page 482), MultiPolygon (page 482), and GeometryCollection (page 482).
sparse Property Changed in version 2.6.
Version 2 2dsphere indexes are sparse (page 490) by default and ignore the sparse: true (page 490) option. If
a document lacks a 2dsphere index field (or the field is null or an empty array), MongoDB does not add an
entry for the document to the 2dsphere index. For inserts, MongoDB inserts the document but does not add it to the
2dsphere index.
For a compound index that includes a 2dsphere index key along with keys of other types, only the 2dsphere
index field determines whether the index references a document.
Earlier versions of MongoDB only support Version 1 2dsphere indexes. Version 1 2dsphere indexes are not
sparse by default and will reject documents with null location fields.
Considerations
geoNear and $geoNear Restrictions The geoNear command and the $geoNear pipeline stage require that
a collection have at most one 2dsphere index and/or at most one 2d (page 483) index, whereas geospatial query
operators (e.g. $near and $geoWithin) permit collections to have multiple geospatial indexes.
The geospatial index restriction for the geoNear command and the $geoNear pipeline stage exists because neither
the geoNear command nor the $geoNear pipeline stage syntax includes the location field. As such, index selection
among multiple 2d indexes or 2dsphere indexes is ambiguous.
No such restriction applies for geospatial query operators since these operators take a location field, eliminating the
ambiguity.
Shard Key Restrictions You cannot use a 2dsphere index as a shard key when sharding a collection. However,
you can create and maintain a geospatial index on a sharded collection by using a different field as the shard key.
Data Restrictions Fields with 2dsphere (page 478) indexes must hold geometry data in the form of coordinate pairs
or GeoJSON data. If you attempt to insert a document with non-geometry data in a 2dsphere indexed field, or build
a 2dsphere index on a collection where the indexed field has non-geometry data, the operation will fail.
MultiLineString The following example stores coordinates of GeoJSON type MultiLineString:
{ loc:
  {
    type: "MultiLineString",
    coordinates: [
      [ [ -73.96943, 40.78519 ], [ -73.96082, 40.78095 ] ],
      [ [ -73.96415, 40.79229 ], [ -73.95544, 40.78854 ] ],
      [ [ -73.97162, 40.78205 ], [ -73.96374, 40.77715 ] ],
      [ [ -73.97880, 40.77247 ], [ -73.97036, 40.76811 ] ]
    ]
  }
}
MultiPolygon The following example stores coordinates of GeoJSON type MultiPolygon:
{ loc:
  {
    type: "MultiPolygon",
    coordinates: [
      [ [ [ -73.958, 40.8003 ], [ -73.9498, 40.7968 ], [ -73.9737, 40.7648 ], [ -73.9814, 40.7681 ], [ -73.958, 40.8003 ] ] ],
      [ [ [ -73.958, 40.8003 ], [ -73.9498, 40.7968 ], [ -73.9737, 40.7648 ], [ -73.958, 40.8003 ] ] ]
    ]
  }
}
2d Indexes Use a 2d index for data stored as points on a two-dimensional plane. The 2d index is intended for
legacy coordinate pairs used in MongoDB 2.2 and earlier.
Use a 2d index if:
your database has legacy location data from MongoDB 2.2 or earlier, and
you do not intend to store any location data as GeoJSON objects.
See the http://docs.mongodb.org/manual/reference/operator/query-geospatial reference for the
query operators that support geospatial queries.
Considerations The geoNear command and the $geoNear pipeline stage require that a collection have at most
one 2d index and/or at most one 2dsphere index (page 478), whereas geospatial query operators (e.g. $near and
$geoWithin) permit collections to have multiple geospatial indexes.
The geospatial index restriction for the geoNear command and the $geoNear pipeline stage exists because neither
the geoNear command nor the $geoNear pipeline stage syntax includes the location field. As such, index selection
among multiple 2d indexes or 2dsphere indexes is ambiguous.
No such restriction applies for geospatial query operators since these operators take a location field, eliminating the
ambiguity.
Do not use a 2d index if your location data includes GeoJSON objects. To index on both legacy coordinate pairs and
GeoJSON objects, use a 2dsphere (page 478) index.
You cannot use a 2d index as a shard key when sharding a collection. However, you can create and maintain a
geospatial index on a sharded collection by using a different field as the shard key.
Behavior The 2d index supports calculations on a flat, Euclidean plane. The 2d index also supports distance-only
calculations on a sphere, but for geometric calculations (e.g. $geoWithin) on a sphere, store data as GeoJSON
objects and use the 2dsphere index type.
A 2d index can reference two fields. The first must be the location field. A 2d compound index constructs queries
that select first on the location field, and then filters those results by the additional criteria. A compound 2d index can
cover queries.
Points on a 2D Plane To store location data as legacy coordinate pairs, use an array or an embedded document.
When possible, use the array format:
loc : [ <longitude> , <latitude> ]
Arrays are preferred as certain languages do not guarantee associative map ordering.
For all points, if you use longitude and latitude, store coordinates in longitude, latitude order.
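The following sketch normalizes a legacy coordinate pair, in either the array or the embedded-document form, into a GeoJSON Point like the one a 2dsphere index converts legacy pairs to. legacyPairToPoint is hypothetical. It assumes the pair holds longitude and latitude (hence the range check), and it shows why field order matters for the embedded-document form: Object.values returns the fields in their insertion order.

```javascript
// Hypothetical sketch: normalize a legacy coordinate pair into a GeoJSON Point.
// Assumes the pair holds longitude and latitude, in that order.
function legacyPairToPoint(pair) {
  // Array form: position decides the axis. Embedded-document form: Object.values
  // returns the fields in insertion order, so field order decides the axis.
  const coords = Array.isArray(pair) ? pair : Object.values(pair);
  if (coords.length !== 2 || !coords.every((c) => typeof c === "number")) {
    throw new TypeError("expected exactly two numeric coordinates");
  }
  const [lng, lat] = coords;
  if (lng < -180 || lng > 180 || lat < -90 || lat > 90) {
    throw new RangeError("expected longitude, latitude order within valid ranges");
  }
  return { type: "Point", coordinates: [lng, lat] };
}

console.log(legacyPairToPoint([55.5, 42.3]));
console.log(legacyPairToPoint({ lng: -74, lat: 44.74 }));
```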
sparse Property 2d indexes are sparse (page 490) by default and ignore the sparse: true (page 490) option. If
a document lacks a 2d index field (or the field is null or an empty array), MongoDB does not add an entry for the
document to the 2d index. For inserts, MongoDB inserts the document but does not add it to the 2d index.
For a compound index that includes a 2d index key along with keys of other types, only the 2d index field determines
whether the index references a document.
geoHaystack Indexes A geoHaystack index is a special index that is optimized to return results over small
areas. geoHaystack indexes improve performance on queries that use flat geometry.
For queries that use spherical geometry, a 2dsphere index is a better option than a haystack index. 2dsphere indexes (page 478) allow field reordering; geoHaystack indexes require the first field to be the location field. Also,
geoHaystack indexes are only usable via commands and so always return all results at once.
Behavior geoHaystack indexes create buckets of documents from the same geographic area in order to improve
performance for queries limited to that area. Each bucket in a geoHaystack index contains all the documents within
a specified proximity to a given longitude and latitude.
sparse Property geoHaystack indexes are sparse (page 490) by default and ignore the sparse: true (page 490)
option. If a document lacks a geoHaystack index field (or the field is null or an empty array), MongoDB does
not add an entry for the document to the geoHaystack index. For inserts, MongoDB inserts the document but does
not add it to the geoHaystack index.
geoHaystack indexes include one geoHaystack index key and one non-geospatial index key; however, only the
geoHaystack index field determines whether the index references a document.
Create geoHaystack Index To create a geoHaystack index, see Create a Haystack Index (page 521). For
information and examples on querying a haystack index, see Query a Haystack Index (page 522).
2d Index Internals This document provides a more in-depth explanation of the internals of MongoDB's 2d geospatial indexes. This material is not necessary for normal operations or application development but may be useful for
troubleshooting and for further understanding.
Calculation of Geohash Values for 2d Indexes When you create a geospatial index on legacy coordinate pairs,
MongoDB computes geohash values for the coordinate pairs within the specified location range (page 519) and then
indexes the geohash values.
To calculate a geohash value, recursively divide a two-dimensional map into quadrants. Then assign each quadrant a
two-bit value. For example, a two-bit representation of four quadrants would be:
01  11
00  10
These two-bit values (00, 01, 10, and 11) represent each of the quadrants and all points within each quadrant. For
a geohash with two bits of resolution, all points in the bottom left quadrant would have a geohash of 00. The top
left quadrant would have the geohash of 01. The bottom right and top right would have a geohash of 10 and 11,
respectively.
To provide additional precision, continue dividing each quadrant into sub-quadrants. Each sub-quadrant would have
the geohash value of the containing quadrant concatenated with the value of the sub-quadrant. The geohash for the
upper-right quadrant is 11, and the geohash for the sub-quadrants would be (clockwise from the top left): 1101,
1111, 1110, and 1100, respectively.
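The subdivision can be sketched in a few lines. quadrantGeohash is a hypothetical illustration of the scheme described above, not MongoDB's internal geohash code. Each level appends one bit for left (0) or right (1) and one for bottom (0) or top (1), which reproduces the 00, 01, 10, 11 labels and the 1101, 1111, 1110, 1100 sub-quadrants. The default range of -180 to 180 on both axes is an assumption of the sketch.

```javascript
// Hypothetical sketch of the recursive quadrant subdivision described above;
// not MongoDB's internal geohash code. Each level appends two bits: the first
// is 0 for the left half and 1 for the right half, the second is 0 for the
// bottom half and 1 for the top half.
function quadrantGeohash(x, y, levels, min = -180, max = 180) {
  let xMin = min, xMax = max;
  let yMin = min, yMax = max;
  let hash = "";
  for (let level = 0; level < levels; level++) {
    const xMid = (xMin + xMax) / 2;
    const yMid = (yMin + yMax) / 2;
    if (x >= xMid) { hash += "1"; xMin = xMid; } else { hash += "0"; xMax = xMid; }
    if (y >= yMid) { hash += "1"; yMin = yMid; } else { hash += "0"; yMax = yMid; }
  }
  return hash;
}

console.log(quadrantGeohash(45, 135, 2)); // prints 1101: top-left sub-quadrant of the top-right quadrant
```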
Multi-location Documents for 2d Indexes New in version 2.0: Support for multiple locations in a document.
While 2d geospatial indexes do not support more than one set of coordinates in a document, you can use a multi-key
index (page 474) to index multiple coordinate pairs in a single document. In the simplest example you may have a
field (e.g. locs) that holds an array of coordinates, as in the following example:
{ _id : ObjectId(...),
locs : [ [ 55.5 , 42.3 ] ,
[ -74 , 44.74 ] ,
{ lng : 55.5 , lat : 42.3 } ]
}
The values of the array may be either arrays, as in [ 55.5, 42.3 ], or embedded documents, as in { lng :
55.5 , lat : 42.3 }.
You could then create a geospatial index on the locs field, as in the following:
db.places.createIndex( { "locs": "2d" } )
You may also model the location data as a field inside of an embedded document. In this case, the document would
contain a field (e.g. addresses) that holds an array of documents where each document has a field (e.g. loc:) that
holds location coordinates. For example:
{ _id : ObjectId(...),
  name : "...",
  addresses : [ {
                  context : "home" ,
                  loc : [ 55.5, 42.3 ]
                } ,
                {
                  context : "home",
                  loc : [ -74 , 44.74 ]
                }
              ]
}
You could then create the geospatial index on the addresses.loc field as in the following example:
db.records.createIndex( { "addresses.loc": "2d" } )
To include the location field with the distance field in multi-location document queries, specify includeLocs:
true in the geoNear command.
See also:
geospatial-query-compatibility-chart
Text Indexes
If the compound text index includes keys preceding the text index key, to perform a $text search, the
query predicate must include equality match conditions on the preceding keys.
See also Text Index and Sort (page 486) for additional limitations.
For an example of a compound text index, see Limit the Number of Entries Scanned (page 529).
Drop a Text Index To drop a text index, pass the name of the index to the db.collection.dropIndex()
method. To get the name of the index, run the getIndexes() method.
For information on the default naming scheme for text indexes as well as overriding the default name, see Specify
Name for text Index (page 527).
Storage Requirements and Performance Costs text indexes have the following storage requirements and performance costs:
text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed
field for each document inserted.
Building a text index is very similar to building a large multi-key index and will take longer than building a
simple ordered (scalar) index on the same data.
When building a large text index on an existing collection, ensure that you have a sufficiently high limit on
open file descriptors. See the recommended settings (page 281).
text indexes will impact insertion throughput because MongoDB must add an index entry for each unique
post-stemmed word in each indexed field of each new source document.
Additionally, text indexes do not store phrases or information about the proximity of words in the documents.
As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
Text Search Text search supports the search of string content in documents of a collection. MongoDB provides the
$text operator to perform text search in queries and in aggregation pipelines (page 530).
The text search process:
tokenizes and stems the search term(s) during both the index creation and the text command execution.
assigns a score to each document that contains the search term in the indexed fields. The score determines the
relevance of a document to a given search query.
The $text operator can search for words and phrases. The query matches on the complete stemmed words. For
example, if a document field contains the word blueberry, a search on the term blue will not match the document.
However, a search on either blueberry or blueberries will match.
For information and examples on various text search patterns, see the $text query operator. For examples of text
search in aggregation pipeline, see Text Search in the Aggregation Pipeline (page 530).
Hashed Index
MongoDB can use the hashed index to support equality queries, but hashed indexes do not support range queries.
You may not create compound indexes that have hashed index fields or specify a unique constraint
on a hashed index; however, you can create both a hashed index and an ascending/descending
(i.e. non-hashed) index on the same field: MongoDB will use the scalar index for range queries.
Warning: MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For
example, a hashed index would store the same value for fields holding 2.3, 2.2, or 2.9. To
prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to
64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values
larger than 2^53.
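To see which values collide, the truncation step can be modeled directly. These helpers are hypothetical and do not compute MongoDB's actual hash; they only group values by the integer they truncate to, and flag values that cannot round-trip through an integer or exceed 2^53 in magnitude.

```javascript
// Hypothetical helpers modeling the truncation described in the warning above.
// They do not compute MongoDB's hash; they only show which values collapse to
// the same integer before hashing.
function truncatedKey(value) {
  return Math.trunc(value); // drop the fractional part
}

// True if the value survives conversion to an integer and back unchanged and
// is no larger than 2^53 in magnitude.
function safeForHashedIndex(value) {
  return Number.isInteger(value) && Math.abs(value) <= 2 ** 53;
}

// Group values that would produce the same hashed-index entry.
function findCollisions(values) {
  const groups = new Map();
  for (const value of values) {
    const key = truncatedKey(value);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

console.log(findCollisions([2.3, 2.2, 2.9, 5]));
```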
Create a hashed index using an operation that resembles the following:
db.active.createIndex( { a: "hashed" } )
This operation creates a hashed index for the active collection on the a field.
TTL Indexes
Behavior
Expiration of Data TTL indexes expire documents after the specified number of seconds has passed since the
indexed field value; i.e. the expiration threshold is the indexed field value plus the specified number of seconds.
If the field is an array, and there are multiple date values in the index, MongoDB uses the lowest (i.e. earliest) date
value in the array to calculate the expiration threshold.
If the indexed field in a document is not a date or an array that holds a date value(s), the document will not expire.
If a document does not contain the indexed field, the document will not expire.
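The expiration rules above translate into a small function. expirationThreshold and isExpired are hypothetical sketches, not the TTL background thread. They treat a document as eligible for deletion once the current time reaches the threshold; actual removal happens later, whenever the background task next runs.

```javascript
// Hypothetical sketch of the expiration rule above (not MongoDB's TTL thread).
// Returns the Date at which a document becomes eligible for deletion, or null
// if it never expires.
function expirationThreshold(fieldValue, expireAfterSeconds) {
  let dates;
  if (fieldValue instanceof Date) {
    dates = [fieldValue];
  } else if (Array.isArray(fieldValue)) {
    // Only date elements count, and the earliest one determines the threshold.
    dates = fieldValue.filter((v) => v instanceof Date);
  } else {
    dates = []; // missing field or a non-date value
  }
  if (dates.length === 0) return null;
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return new Date(earliest + expireAfterSeconds * 1000);
}

// Eligible once `now` reaches the threshold; removal itself is deferred.
function isExpired(doc, field, expireAfterSeconds, now) {
  const threshold = expirationThreshold(doc[field], expireAfterSeconds);
  return threshold !== null && now.getTime() >= threshold.getTime();
}

const createdAt = new Date("2015-01-01T00:00:00Z");
console.log(expirationThreshold(createdAt, 3600).toISOString()); // 2015-01-01T01:00:00.000Z
```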
Delete Operations A background thread in mongod reads the values in the index and removes expired documents
from the collection.
When the TTL thread is active, you will see delete (page 71) operations in the output of db.currentOp() or in the
data collected by the database profiler (page 225).
Timing of the Delete Operation When you build a TTL index in the background (page 493), the TTL thread can
begin deleting documents while the index is building. If you build a TTL index in the foreground, MongoDB begins
removing expired documents as soon as the index finishes building.
The TTL index does not guarantee that expired data will be deleted immediately upon expiration. There may be a
delay between the time a document expires and the time that MongoDB removes the document from the database.
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a
collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may
exist for some time beyond the 60 second period between runs of the background task.
Replica Sets On replica sets, the TTL background thread only deletes documents on the primary. However, the TTL
background thread does run on secondaries. Secondary members replicate deletion operations from the primary.
Support for Queries A TTL index supports queries in the same way non-TTL indexes do.
Record Allocation A collection with a TTL index has usePowerOf2Sizes enabled, and you cannot modify this
setting for the collection. As a result of enabling usePowerOf2Sizes, MongoDB must allocate more disk space
relative to data size. This approach helps mitigate the possibility of storage fragmentation caused by frequent delete
operations and leads to more predictable storage use patterns.
Restrictions
TTL indexes are single-field indexes. Compound indexes (page 472) do not support TTL and ignore the
expireAfterSeconds option.
The _id field does not support TTL indexes.
You cannot create a TTL index on a capped collection (page 208) because MongoDB cannot remove documents
from a capped collection.
You cannot use createIndex() to change the value of expireAfterSeconds of an existing index.
Instead use the collMod database command in conjunction with the index collection flag. Otherwise, to
change the value of the option of an existing index, you must drop the index first and recreate it.
If a non-TTL single-field index already exists for a field, you cannot create a TTL index on the same field
since you cannot create indexes that have the same key specification and differ only by the options. To
change a non-TTL single-field index to a TTL index, you must drop the index first and recreate it with the
expireAfterSeconds option.
Additional Information
For examples, see Expire Data from Collections by Setting TTL (page 211).
Unique Indexes
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field.
To create a unique index, use the db.collection.createIndex() method with the unique option set to
true. For example, to create a unique index on the user_id field of the members collection, use the following
operation in the mongo shell:
db.members.createIndex( { "user_id": 1 }, { unique: true } )
Unique Constraint Across Separate Documents The unique constraint applies to separate documents in the collection. That is, the unique index prevents separate documents from having the same value for the indexed key, but the
index does not prevent multiple elements or embedded documents in an indexed array of a single document from
having the same value. In the case of a single document with repeating values, the repeated value is inserted into the
index only once.
For example, a collection has a unique index on a.b:
db.collection.createIndex( { "a.b": 1 }, { unique: true } )
The unique index permits the insertion of the following document into the collection if no other document in the
collection has the a.b value of 5:
db.collection.insert( { a: [ { b: 5 }, { b: 5 } ] } )
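To make the per-document behavior concrete, the following is a minimal plain JavaScript model, not MongoDB's implementation, of a unique index on a dotted path such as a.b. The helpers extractKeys and UniqueIndexModel are hypothetical. The model collapses repeated values within one document into a single entry before checking them against entries owned by other documents, and it indexes a missing path as null, as a non-sparse index does.

```javascript
// Conceptual model only (hypothetical helpers, not a MongoDB API).
// Collects every value reachable at a dotted path, descending into arrays
// the way a multikey index does. A missing path yields a single null key,
// as in a non-sparse index.
function extractKeys(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && part in item) {
          next.push(item[part]);
        }
      }
    }
    current = next;
  }
  // A terminal array contributes one key per element.
  const keys = current.flatMap((v) => (Array.isArray(v) ? v : [v]));
  return keys.length > 0 ? keys : [null];
}

class UniqueIndexModel {
  constructor(path) {
    this.path = path;
    this.owners = new Map(); // serialized key -> _id of the owning document
  }

  // Returns the number of distinct keys indexed for the document, or throws
  // if another document already owns one of them.
  insert(doc) {
    // Repeated values inside one document collapse to a single entry.
    const keys = new Set(extractKeys(doc, this.path).map((k) => JSON.stringify(k)));
    for (const key of keys) {
      if (this.owners.has(key) && this.owners.get(key) !== doc._id) {
        throw new Error(`duplicate key error: ${this.path} = ${key}`);
      }
    }
    for (const key of keys) this.owners.set(key, doc._id);
    return keys.size;
  }
}

const index = new UniqueIndexModel("a.b");
index.insert({ _id: 1, a: [{ b: 5 }, { b: 5 }] }); // accepted: a single key, 5
```

A second document whose a.b array also contains 5 would be rejected by this model, which mirrors the constraint described above.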
Unique Index and Missing Field If a document does not have a value for the indexed field in a unique index, the
index will store a null value for this document. Because of the unique constraint, MongoDB will only permit one
document that lacks the indexed field. If more than one document lacks a value for the indexed field or is missing
the indexed field entirely, the index build will fail with a duplicate key error.
You can combine the unique constraint with the sparse index (page 490) to filter these null values from the unique
index and avoid the error.
Restrictions You may not specify a unique constraint on a hashed index (page 487).
See also:
Create a Unique Index (page 505)
Sparse Indexes
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null
value. The index skips over any document that is missing the indexed field. The index is sparse because it does not
include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing
null values for those documents that do not contain the indexed field.
To create a sparse index, use the db.collection.createIndex() method with the sparse option set to
true. For example, the following operation in the mongo shell creates a sparse index on the xmpp_id field of the
addresses collection:
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )
Chapter 8. Indexes
Note: Do not confuse sparse indexes in MongoDB with block-level7 indexes in other databases. Think of them as
dense indexes with a specific filter.
Behavior
Create a Sparse Index On A Collection Consider a collection scores that contains the following documents:
{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
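The collection has a sparse index on the field score:
db.scores.createIndex( { score: 1 } , { sparse: true } )
Then, the following query on the scores collection uses the sparse index to return the documents that have the
score field less than ($lt) 90: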
db.scores.find( { score: { $lt: 90 } } )
Because the document for the userid "newbie" does not contain the score field and thus does not meet the query
criteria, the query can use the sparse index to return the results:
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
7 http://en.wikipedia.org/wiki/Database_index#Sparse_index
Sparse Index On A Collection Cannot Return Complete Results Consider a collection scores that contains the
following documents:
{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
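The collection has a sparse index on the field score:
db.scores.createIndex( { score: 1 } , { sparse: true } )
Because the document for the userid "newbie" does not contain the score field, the sparse index does not contain
an entry for that document.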
Consider the following query to return all documents in the scores collection, sorted by the score field:
db.scores.find().sort( { score: -1 } )
Even though the sort is by the indexed field, MongoDB will not select the sparse index to fulfill the query in order to
return complete results:
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
To use the sparse index, explicitly specify the index with hint():
db.scores.find().sort( { score: -1 } ).hint( { score: 1 } )
The use of the index results in the return of only those documents with the score field:
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
See also:
explain() and Analyze Query Performance (page 109)
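The reason the planner avoids the sparse index for a full sort can be sketched in plain JavaScript. This is a conceptual model, not MongoDB internals, and the helper names are hypothetical. It assumes numeric score values and uses -Infinity as a simplified stand-in for the fact that a missing field sorts below any number; scanning the sparse index alone drops the "newbie" document, while a collection scan with an in-memory sort keeps it.

```javascript
// Conceptual model (not MongoDB internals). Assumes numeric score values.
const scores = [
  { userid: "newbie" },
  { userid: "abby", score: 82 },
  { userid: "nina", score: 90 },
];

const has = (doc, field) => Object.prototype.hasOwnProperty.call(doc, field);

// A sparse index keeps entries, in ascending key order, only for documents
// that contain the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => has(doc, field))
    .map((doc) => ({ key: doc[field], doc }))
    .sort((x, y) => x.key - y.key);
}

// Walking the index backwards yields a descending sort, which is what
// sort({ score: -1 }).hint({ score: 1 }) asks for.
function scanDescending(index) {
  return index.slice().reverse().map((entry) => entry.doc);
}

// A collection scan plus an in-memory sort sees every document. A missing
// field sorts below any number, so it comes last in descending order.
function collectionSortDescending(docs, field) {
  const rank = (doc) => (has(doc, field) ? doc[field] : -Infinity);
  return docs.slice().sort((x, y) => {
    const a = rank(x);
    const b = rank(y);
    return a === b ? 0 : a < b ? 1 : -1;
  });
}
```

Comparing the two functions on the scores array reproduces the two result sets shown above: two documents from the index scan, three from the collection scan.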
Sparse Index with Unique Constraint Consider a collection scores that contains the following documents:
{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }
You could create an index with a unique constraint (page 490) and sparse filter on the score field using the following
operation:
db.scores.createIndex( { score: 1 } , { sparse: true, unique: true } )
This index would permit the insertion of documents that have unique values for the score field or that do not include a
score field. Consider the following insert operations (page 91):
db.scores.insert( { "userid": "AAAAAAA", "score": 43 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 34 } )
db.scores.insert( { "userid": "CCCCCCC" } )
db.scores.insert( { "userid": "DDDDDDD" } )
However, the index would not permit the addition of the following documents, because documents with score
values of 82 and 90 already exist:
db.scores.insert( { "userid": "AAAAAAA", "score": 82 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 90 } )
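The accept/reject pattern above can be modeled with a small JavaScript class. This is a sketch under stated assumptions: UniqueFieldIndex is a hypothetical name, it only handles a single top-level field, and it is not how MongoDB stores index entries. With sparse set, documents without the field get no entry and therefore never collide; without it, a missing field is indexed as null, so only one such document is allowed.

```javascript
// Conceptual model of a unique index on one top-level field, optionally
// sparse. Hypothetical class; it is not how MongoDB stores index entries.
class UniqueFieldIndex {
  constructor(field, { sparse = false } = {}) {
    this.field = field;
    this.sparse = sparse;
    this.keys = new Set();
  }

  // Returns true if the document is accepted, false on a duplicate key.
  insert(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!hasField && this.sparse) {
      return true; // sparse: no index entry, so no uniqueness check
    }
    // Non-sparse: a missing field is indexed as null.
    const key = JSON.stringify(hasField ? doc[this.field] : null);
    if (this.keys.has(key)) {
      return false;
    }
    this.keys.add(key);
    return true;
  }
}

// Seed with the three scores documents shown earlier.
const sparseUnique = new UniqueFieldIndex("score", { sparse: true });
[
  { userid: "newbie" },
  { userid: "abby", score: 82 },
  { userid: "nina", score: 90 },
].forEach((doc) => sparseUnique.insert(doc));
```

Feeding the six insert operations from this section into sparseUnique.insert() accepts the first four and rejects the two with scores of 82 and 90.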
Background Construction
By default, creating an index blocks all other operations on a database. When building an index on a collection, the
database that holds the collection is unavailable for read or write operations until the index build completes. Any
operation that requires a read or write lock on all databases (e.g. listDatabases) will wait for the foreground index
build to complete.
For potentially long-running index builds, consider building the index in the background so that the MongoDB
database remains available during the build. For example, to create an index on the zipcode field of the people
collection in the background, issue the following:
db.people.createIndex( { zipcode: 1}, {background: true} )
Behavior
As of MongoDB version 2.4, a mongod instance can build more than one index in the background concurrently.
Changed in version 2.4: Before 2.4, a mongod instance could only build one background index per database at a time.
Changed in version 2.2: Before 2.2, a single mongod instance could only build one index at a time.
Background indexing operations run in the background so that other database operations can run while creating the
index. However, the mongo shell session or connection where you are creating the index will block until the index
build is complete. To continue issuing commands to the database, open another connection or mongo instance.
Queries will not use partially-built indexes: the index will only be usable once the index build is complete.
Note: If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection, including running repairDatabase, dropping the collection (i.e. db.collection.drop()), and running compact. These operations will return an error during background index builds.
8.2. Index Concepts
Performance
The background index operation uses an incremental approach that is slower than the normal foreground index
builds. If the index is larger than the available RAM, then the incremental process can take much longer than the
foreground build.
If your application includes createIndex() operations and the index does not already exist, building the index
at that point can have a severe impact on the performance of the database.
To avoid performance issues, make sure that your application checks for the indexes at start up using the
getIndexes() method or the equivalent method for your driver8 and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance
windows.
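The start-up check described above can be sketched as a pure function over the array that getIndexes() returns, where each element carries its key pattern in a key field. The function name missingIndexes and the list of required key patterns are hypothetical; in the mongo shell you might pass db.people.getIndexes() as the first argument and terminate if the result is non-empty.

```javascript
// Returns the required key patterns that are absent from `existing`, where
// `existing` has the shape of db.collection.getIndexes() output.
// Hypothetical helper, not part of any driver.
function missingIndexes(existing, required) {
  // Field order and direction both matter for compound indexes, so compare
  // the ordered [field, direction] pairs of each key pattern.
  const signature = (keyPattern) => JSON.stringify(Object.entries(keyPattern));
  const present = new Set(existing.map((index) => signature(index.key)));
  return required.filter((keyPattern) => !present.has(signature(keyPattern)));
}

// Example input shaped like getIndexes() output for a people collection.
const existing = [
  { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.people" },
  { v: 1, key: { zipcode: 1 }, name: "zipcode_1", ns: "test.people" },
];
```

Because the comparison is on the ordered key pattern, { zipcode: -1 } or a compound index listed in a different field order would be reported as missing.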
Interrupted Index Builds
If a background index build is in progress when the mongod process terminates, when the instance restarts the index
build will restart as foreground index build. If the index build encounters any errors, such as a duplicate key error, the
mongod will exit with an error.
To start the mongod after a failed index build, set storage.indexBuildRetry to false or use the
--noIndexBuildRetry option to skip the index build on start up.
Building Indexes on Secondaries
Changed in version 2.6: Secondary members can now build indexes in the background. Previously all index builds on
secondaries were in the foreground.
Background index operations on replica set secondaries begin after the primary completes building the index. If
MongoDB builds an index in the background on the primary, the secondaries will then build that index in the background.
To build large indexes on secondaries, the best approach is to restart one secondary at a time in standalone mode and
build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other
members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step
down the primary, restart it as a standalone, and build the index on the former primary.
The amount of time required to build the index on a secondary must be within the window of the oplog, so that the
secondary can catch up with the primary.
Indexes on secondary members in recovering mode are always built in the foreground to allow them to catch up as
soon as possible.
See Build Indexes on Replica Sets (page 507) for a complete procedure for building indexes on secondaries.
Drop Duplicates
MongoDB cannot create a unique index (page 490) on a field that has duplicate values. To force the creation of a
unique index, you can specify the dropDups option, which will only index the first occurrence of a value for the key
and delete all subsequent documents that contain that value.
Warning: Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
8 http://api.mongodb.org/
For example, a collection orders has the following indexes:
{ qty: 1 }
{ item: 1 }
MongoDB can use the intersection of the two indexes to support the following query:
db.orders.find( { item: "abc123", qty: { $gt: 15 } } )
To determine if MongoDB used index intersection, run explain(); the results of explain() will include either an
AND_SORTED stage or an AND_HASH stage.
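Conceptually, index intersection means each index scan produces the set of record ids that satisfy one predicate, and only ids present in both sets are candidates for the whole query. The JavaScript sketch below models that idea with a set intersection, loosely analogous to a hash-based AND stage; it is not the query planner, and the sample orders documents and helper names are hypothetical.

```javascript
// Conceptual model of index intersection (not the query planner itself).
// Hypothetical sample data for the orders collection.
const orders = [
  { _id: 1, item: "abc123", qty: 10 },
  { _id: 2, item: "abc123", qty: 20 },
  { _id: 3, item: "xyz789", qty: 30 },
];

// Stand-in for scanning a single-field index: returns the _id values of the
// documents whose indexed field satisfies the predicate.
function indexScan(docs, field, predicate) {
  return new Set(docs.filter((d) => predicate(d[field])).map((d) => d._id));
}

// Keep only ids produced by both scans.
function intersect(a, b) {
  return [...a].filter((id) => b.has(id)).sort((x, y) => x - y);
}

const byItem = indexScan(orders, "item", (v) => v === "abc123"); // ids 1, 2
const byQty = indexScan(orders, "qty", (v) => v > 15); // ids 2, 3
const candidates = intersect(byItem, byQty); // [2]
```

Only document 2 matches both item: "abc123" and qty: { $gt: 15 }, which is what the query above asks for.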
Index Prefix Intersection
With index intersection, MongoDB can use an intersection of either the entire index or the index prefix. An index
prefix is a subset of a compound index, consisting of one or more keys starting from the beginning of the index.
Consider a collection orders with the following indexes:
{ qty: 1 }
{ status: 1, ord_date: -1 }
To fulfill the following query which specifies a condition on both the qty field and the status field, MongoDB can
use the intersection of the two indexes:
db.orders.find( { qty: { $gt: 10 } , status: "A" } )
The two indexes can, either individually or through index intersection, support all four aforementioned queries.
The choice between creating compound indexes that support your queries or relying on index intersection depends on
the specifics of your system.
See also:
compound indexes (page 472), Create Compound Indexes to Support Several Different Queries (page 533)
Index Intersection and Sort
Index intersection does not apply when the sort() operation requires an index completely separate from the query
predicate.
For example, the orders collection has the following indexes:
{ qty: 1 }
{ status: 1, ord_date: -1 }
{ status: 1 }
{ ord_date: -1 }
MongoDB cannot use index intersection for the following query with sort:
db.orders.find( { qty: { $gt: 10 } } ).sort( { status: 1 } )
That is, MongoDB does not use the { qty: 1 } index for the query and the separate { status: 1 } or
{ status: 1, ord_date: -1 } index for the sort.
However, MongoDB can use index intersection for a query with sort when the index { status: 1, ord_date: -1 }
can fulfill part of the query predicate.
The following query uses $elemMatch to require that the array contains at least one single element that matches
both conditions:
db.survey.find( { ratings : { $elemMatch: { $gte: 3, $lte: 6 } } } )
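Taking the predicates separately:
the bounds for the greater than or equal to 3 predicate (i.e. $gte: 3) are [ [ 3, Infinity ] ];
the bounds for the less than or equal to 6 predicate (i.e. $lte: 6) are [ [ -Infinity, 6 ] ].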
Because the query uses $elemMatch to join these predicates, MongoDB can intersect the bounds to:
ratings: [ [ 3, 6 ] ]
If the query does not join the conditions on the array field with $elemMatch, MongoDB cannot intersect the multikey
index bounds. Consider the following query:
db.survey.find( { ratings : { $gte: 3, $lte: 6 } } )
The query searches the ratings array for at least one element greater than or equal to 3 and at least one element
less than or equal to 6. Because a single element does not need to meet both criteria, MongoDB does not intersect the
bounds and uses either [ [ 3, Infinity ] ] or [ [ -Infinity, 6 ] ]. MongoDB makes no guarantee
as to which of these two bounds it chooses.
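Why the bounds may only be intersected under $elemMatch can be checked with a small JavaScript model. It represents a bound as an inclusive [low, high] pair and uses the survey document with ratings: [ 2, 9 ]; the helper names are hypothetical and this is not MongoDB's bounds-building code. That document satisfies the plain query, because 9 >= 3 and 2 <= 6, yet no element falls inside the intersected range [ 3, 6 ], so scanning only that range would miss it.

```javascript
// Conceptual model: an index bound as an inclusive [low, high] pair.
function intersectBounds([lo1, hi1], [lo2, hi2]) {
  const lo = Math.max(lo1, lo2);
  const hi = Math.min(hi1, hi2);
  return lo <= hi ? [lo, hi] : null; // null: empty range, nothing can match
}

// True if at least one array element would produce an index key in range.
function anyKeyInBounds(values, [lo, hi]) {
  return values.some((v) => v >= lo && v <= hi);
}

// { ratings: { $gte: 3, $lte: 6 } } without $elemMatch: each condition may
// be satisfied by a different element.
const matchesPlain = (ratings) => ratings.some((v) => v >= 3) && ratings.some((v) => v <= 6);

// { ratings: { $elemMatch: { $gte: 3, $lte: 6 } } }: one element must satisfy both.
const matchesElemMatch = (ratings) => ratings.some((v) => v >= 3 && v <= 6);

const intersected = intersectBounds([3, Infinity], [-Infinity, 6]); // [3, 6]
```

For the $elemMatch form, matchesElemMatch and anyKeyInBounds(ratings, intersected) always agree, which is why intersecting is safe there.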
Compound Bounds for Multikey Index
Compounding bounds refers to using bounds for multiple keys of a compound index (page 472). For instance, given a
compound index { a: 1, b: 1 } with bounds on field a of [ [ 3, Infinity ] ] and bounds on field
b of [ [ -Infinity, 6 ] ], compounding the bounds results in the use of both bounds:
{ a: [ [ 3, Infinity ] ], b: [ [ -Infinity, 6 ] ] }
If MongoDB cannot compound the two bounds, MongoDB always constrains the index scan by the bound on its
leading field, in this case, a: [ [ 3, Infinity ] ].
Compound Index on an Array Field
Consider a compound multikey index; i.e. a compound index (page 472) where one of the indexed fields is an array.
For example, a collection survey contains documents with a field item and an array field ratings:
{ _id: 1, item: "ABC", ratings: [ 2, 9 ] }
{ _id: 2, item: "XYZ", ratings: [ 4, 3 ] }
Create a compound index (page 472) on the item field and the ratings field:
db.survey.createIndex( { item: 1, ratings: 1 } )
MongoDB can compound the two bounds to use the combined bounds of:
If an array contains embedded documents, to index on fields contained in the embedded documents, use the dotted
field name (page 169) in the index specification. For instance, given the following array of embedded documents:
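ratings: [ { score: 2, by: "mn" }, { score: 9, by: "anon" } ]
the dotted field name for the score field is "ratings.score".
For example, a collection survey2 contains documents with a field item and an array field ratings: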
{
_id: 1,
item: "ABC",
ratings: [ { score: 2, by: "mn" }, { score: 9, by: "anon" } ]
}
{
_id: 2,
item: "XYZ",
ratings: [ { score: 5, by: "anon" }, { score: 7, by: "wv" } ]
}
Create a compound index (page 472) on the non-array field item as well as two fields from an array
ratings.score and ratings.by:
db.survey2.createIndex( { "item": 1, "ratings.score": 1, "ratings.by": 1 } )
MongoDB can compound the bounds for the item key with either the bounds for "ratings.score" or the bounds
for "ratings.by", depending upon the query predicates and the index key values. MongoDB makes no guarantee
as to which bounds it compounds with the item field. For instance, MongoDB will either choose to compound the
item bounds with the "ratings.score" bounds:
{
"item" : [ [ "XYZ", "XYZ" ] ],
"ratings.score" : [ [ -Infinity, 5 ] ],
"ratings.by" : [ [ MinKey, MaxKey ] ]
}
Or, MongoDB may choose to compound the item bounds with "ratings.by" bounds:
{
"item" : [ [ "XYZ", "XYZ" ] ],
"ratings.score" : [ [ MinKey, MaxKey ] ],
"ratings.by" : [ [ "anon", "anon" ] ]
}
However, to compound the bounds for "ratings.score" with the bounds for "ratings.by", the query must
use $elemMatch. See Compound Bounds of Index Fields from an Array (page 500) for more information.
Compound Bounds of Index Fields from an Array
To compound the bounds for index keys from the same array:
the index keys must share the same field path up to but excluding the field names, and
the query must specify predicates on the fields using $elemMatch on that path.
For a field in an embedded document, the dotted field name (page 169), such as "a.b.c.d", is the field path for
d. To compound the bounds for index keys from the same array, the $elemMatch must be on the path up to but
excluding the field name itself; i.e. "a.b.c".
For instance, create a compound index (page 472) on the ratings.score and the ratings.by fields:
db.survey2.createIndex( { "ratings.score": 1, "ratings.by": 1 } )
The fields "ratings.score" and "ratings.by" share the field path ratings. The following query uses
$elemMatch on the field ratings to require that the array contains at least one single element that matches both
conditions:
db.survey2.find( { ratings: { $elemMatch: { score: { $lte: 5 }, by: "anon" } } } )
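Taking the predicates separately:
the bounds for the score: { $lte: 5 } predicate are [ -Infinity, 5 ];
the bounds for the by: "anon" predicate are [ "anon", "anon" ].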
MongoDB can compound the two bounds to use the combined bounds of:
{ "ratings.score" : [ [ -Infinity, 5 ] ], "ratings.by" : [ [ "anon", "anon" ] ] }
Query Without $elemMatch If the query does not join the conditions on the indexed array fields with
$elemMatch, MongoDB cannot compound their bounds. Consider the following query:
db.survey2.find( { "ratings.score": { $lte: 5 }, "ratings.by": "anon" } )
Because a single embedded document in the array does not need to meet both criteria, MongoDB does not compound
the bounds. When using a compound index, if MongoDB cannot constrain all the fields of the index, MongoDB
always constrains the leading field of the index, in this case "ratings.score":
{
"ratings.score": [ [ -Infinity, 5 ] ],
"ratings.by": [ [ MinKey, MaxKey ] ]
}
$elemMatch on Incomplete Path If the query does not specify $elemMatch on the path of the embedded fields,
up to but excluding the field names, MongoDB cannot compound the bounds of index keys from the same array.
For example, a collection survey3 contains documents with a field item and an array field ratings:
{
_id: 1,
item: "ABC",
ratings: [ { score: { q1: 2, q2: 5 } }, { score: { q1: 8, q2: 4 } } ]
}
{
_id: 2,
item: "XYZ",
ratings: [ { score: { q1: 7, q2: 8 } }, { score: { q1: 9, q2: 5 } } ]
}
Create a compound index (page 472) on the ratings.score.q1 and the ratings.score.q2 fields:
db.survey3.createIndex( { "ratings.score.q1": 1, "ratings.score.q2": 1 } )
The fields "ratings.score.q1" and "ratings.score.q2" share the field path "ratings.score" and
the $elemMatch must be on that path.
The following query, however, uses an $elemMatch but not on the required path:
db.survey3.find( { ratings: { $elemMatch: { 'score.q1': 2, 'score.q2': 8 } } } )
As such, MongoDB cannot compound the bounds, and the "ratings.score.q2" field will be unconstrained
during the index scan. To compound the bounds, the query must use $elemMatch on the path "ratings.score":
db.survey3.find( { 'ratings.score': { $elemMatch: { 'q1': 2, 'q2': 8 } } } )
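The path rule from this section can be expressed as a small JavaScript predicate. It is a simplified sketch with hypothetical names: it only compares dotted paths and does not check which path segment is actually an array. Two index keys qualify for compounding when they share the path up to but excluding their last component and the query's $elemMatch sits on exactly that shared path, which reproduces the survey2 and survey3 outcomes above.

```javascript
// Path up to but excluding the last component: "a.b.c.d" -> "a.b.c".
function parentPath(dottedField) {
  return dottedField.split(".").slice(0, -1).join(".");
}

// Simplified check of the rule: shared parent path plus $elemMatch on it.
// `elemMatchPath` is the path the query applies $elemMatch to, or null.
function canCompoundBounds(keyA, keyB, elemMatchPath) {
  const shared = parentPath(keyA);
  return shared !== "" && shared === parentPath(keyB) && elemMatchPath === shared;
}
```

For example, canCompoundBounds("ratings.score.q1", "ratings.score.q2", "ratings") is false, matching the survey3 query whose $elemMatch is on ratings, while the same call with "ratings.score" is true.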
Compound $elemMatch Clauses Consider a query that contains multiple $elemMatch clauses on different field
paths, for instance, "a.b": { $elemMatch: ... }, "a.c": { $elemMatch: ... }. MongoDB cannot combine the bounds of the "a.b" with the bounds of "a.c" since "a.b" and "a.c" also require
$elemMatch on the path a.
For example, a collection survey4 contains documents with a field item and an array field ratings:
{
_id: 1,
item: "ABC",
ratings: [
{ score: { q1: 2, q2: 5 }, certainty: { q1: 2, q2: 3 } },
{ score: { q1: 8, q2: 4 }, certainty: { q1: 10, q2: 10 } }
]
}
{
_id: 2,
item: "XYZ",
ratings: [
{ score: { q1: 7, q2: 8 }, certainty: { q1: 5, q2: 5 } },
{ score: { q1: 9, q2: 5 }, certainty: { q1: 7, q2: 7 } }
]
}
Create a compound index (page 472) on the ratings.score.q1, ratings.score.q2, ratings.certainty.q1, and ratings.certainty.q2 fields:
db.survey4.createIndex( {
"ratings.score.q1": 1,
"ratings.score.q2": 1,
"ratings.certainty.q1": 1,
"ratings.certainty.q2": 1
} )
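Consider the following query with two $elemMatch clauses:
db.survey4.find( {
"ratings.score": { $elemMatch: { q1: 5, q2: 5 } },
"ratings.certainty": { $elemMatch: { q1: 7, q2: 7 } }
} )
Taking the predicates separately:
the bounds for the "ratings.score" predicate are the compound bounds:
{ "ratings.score.q1" : [ [ 5, 5 ] ], "ratings.score.q2" : [ [ 5, 5 ] ] };
the bounds for the "ratings.certainty" predicate are the compound bounds: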
{ "ratings.certainty.q1" : [ [ 7, 7 ] ], "ratings.certainty.q2" : [ [ 7, 7 ] ] }
However, MongoDB cannot compound the bounds for "ratings.score" and "ratings.certainty"
since $elemMatch does not join the two. Instead, MongoDB constrains the leading field of the index
"ratings.score.q1" which can be compounded with the bounds for "ratings.score.q2":
{
"ratings.score.q1" : [ [ 5, 5 ] ],
"ratings.score.q2" : [ [ 5, 5 ] ],
"ratings.certainty.q1" : [ [ MinKey, MaxKey ] ],
"ratings.certainty.q2" : [ [ MinKey, MaxKey ] ]
}
Create a Unique Index (page 505) Build an index that enforces unique values for the indexed field or fields.
Create a Sparse Index (page 505) Build an index that omits references to documents that do not include the indexed
field. This saves space when indexing fields that are present in only some documents.
Create a Hashed Index (page 506) Compute a hash of the value of a field in a collection and index the hashed value.
These indexes permit equality queries and may be suitable shard keys for some collections.
Build Indexes on Replica Sets (page 507) To build indexes on a replica set, you build the indexes separately on the
primary and the secondaries, as described here.
Build Indexes in the Background (page 508) Background index construction allows read and write operations to
continue while building the index, but takes longer to complete and results in a larger index.
Build Old Style Indexes (page 509) A {v : 0} index is necessary if you need to roll back from MongoDB version
2.0 (or later) to MongoDB version 1.8.
Create an Index
Indexes allow MongoDB to process and fulfill queries quickly by creating small and efficient representations of the
documents in a collection. Users can create indexes for any collection on any field in a document. By default,
MongoDB creates an index on the _id field of every collection.
This tutorial describes how to create an index on a single field. MongoDB also supports compound indexes (page 472),
which are indexes on multiple fields. See Create a Compound Index (page 504) for instructions on building compound
indexes.
Create an Index on a Single Field
To create an index, use createIndex() or a similar method from your driver10 . The createIndex() method
only creates an index if an index of the same specification does not already exist.
For example, the following operation creates an index on the userid field of the records collection:
db.records.createIndex( { userid: 1 } )
The value of the field in the index specification describes the kind of index for that field. For example, a value of 1
specifies an index that orders items in ascending order. A value of -1 specifies an index that orders items in descending
order. For additional index types, see Index Types (page 469).
The created index will support queries that select on the field userid, such as the following:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
But the created index does not support the following query on the profile_url field:
db.records.find( { profile_url: 2 } )
For queries that cannot use an index, MongoDB must scan all documents in a collection for documents that match the
query.
Additional Considerations
Although indexes can improve query performance, indexes also present some operational considerations. See Operational Considerations for Indexes (page 147) for more information.
10 http://api.mongodb.org/
If your collection holds a large amount of data, and your application needs to be able to access the data while building
the index, consider building the index in the background, as described in Background Construction (page 493). To
build indexes on replica sets, see the Build Indexes on Replica Sets (page 507) section for more information.
Note: To build or rebuild indexes for a replica set see Build Indexes on Replica Sets (page 507).
Some drivers may specify indexes using NumberLong(1) rather than 1 as the specification. This does not have any
effect on the resulting index.
See also:
Create a Compound Index (page 504), Indexing Tutorials (page 502) and Index Concepts (page 468) for more information.
Create a Compound Index
Indexes allow MongoDB to process and fulfill queries quickly by creating small and efficient representations of the
documents in a collection. MongoDB supports indexes that include content on a single field, as well as compound
indexes (page 472) that include content from multiple fields. Continue reading for instructions and examples of
building a compound index.
Build a Compound Index
To create a compound index (page 472) use an operation that resembles the following prototype:
db.collection.createIndex( { a: 1, b: 1, c: 1 } )
The value of the field in the index specification describes the kind of index for that field. For example, a value of 1
specifies an index that orders items in ascending order. A value of -1 specifies an index that orders items in descending
order. For additional index types, see Index Types (page 469).
Example
The following operation will create an index on the item, category, and price fields of the products collection:
db.products.createIndex( { item: 1, category: 1, price: 1 } )
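How 1 and -1 combine across the fields of a compound specification can be sketched as a JavaScript comparator built from the specification object. This is a simplified model with a hypothetical name, comparatorFor: it assumes each field holds comparable values of a single type and ignores BSON cross-type ordering and the other index types. Fields are compared in the order they appear in the specification, and a later field only breaks ties left by earlier ones.

```javascript
// Builds a comparator that orders documents the way an index with the
// given specification orders its keys: 1 = ascending, -1 = descending,
// applied field by field in specification order.
function comparatorFor(spec) {
  const fields = Object.entries(spec);
  return (x, y) => {
    for (const [field, direction] of fields) {
      if (x[field] < y[field]) return -direction;
      if (x[field] > y[field]) return direction;
    }
    return 0; // equal on every indexed field
  };
}

// Hypothetical products documents.
const products = [
  { item: "b", category: "x", price: 5 },
  { item: "a", category: "y", price: 3 },
  { item: "a", category: "x", price: 9 },
  { item: "a", category: "x", price: 1 },
];

const byIndexOrder = products.slice().sort(comparatorFor({ item: 1, category: 1, price: 1 }));
```

Sorting with comparatorFor({ item: 1, price: -1 }) instead lists all "a" items by descending price before the "b" item.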
Additional Considerations
If your collection holds a large amount of data, and your application needs to be able to access the data while building
the index, consider building the index in the background, as described in Background Construction (page 493). To
build indexes on replica sets, see the Build Indexes on Replica Sets (page 507) section for more information.
Note: To build or rebuild indexes for a replica set see Build Indexes on Replica Sets (page 507).
Some drivers may specify indexes using NumberLong(1) rather than 1 as the specification. This does not have any
effect on the resulting index.
See also:
Create an Index (page 503), Indexing Tutorials (page 502) and Index Concepts (page 468) for more information.
For example, you may want to create a unique index on the "tax-id" field of the accounts collection to prevent
storing multiple account records for the same legal entity:
db.accounts.createIndex( { "tax-id": 1 }, { unique: true } )
The _id index (page 471) is a unique index. In some situations you may consider using the _id field itself for this
kind of data rather than using a unique index on another field.
If a document does not have a value for a field, the index entry for that item will be null in any index that includes
it. Thus, in many situations you will want to combine the unique constraint with the sparse option. Sparse
indexes skip over any document that is missing the indexed field, rather than storing null for the index entry. Since
unique indexes cannot have duplicate values for a field, without the sparse option MongoDB will reject the second
document and all subsequent documents without the indexed field. Consider the following prototype:
db.collection.createIndex( { a: 1 }, { unique: true, sparse: true } )
You can also enforce a unique constraint on compound indexes (page 472), as in the following prototype:
db.collection.createIndex( { a: 1, b: 1 }, { unique: true } )
These indexes enforce uniqueness for the combination of index keys and not for either key individually.
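Uniqueness on the combination of keys can be modeled in JavaScript by treating the tuple of indexed values as the key. The factory makeCompoundUnique is hypothetical and only handles top-level fields; like a non-sparse index, it treats a missing field as null. Documents that repeat a or b individually are accepted, and only a repeated (a, b) pair is rejected.

```javascript
// Model of a unique compound index: the key is the tuple of all indexed
// fields, so only a repeated combination counts as a duplicate.
function makeCompoundUnique(fields) {
  const seen = new Set();
  // Returns true if the document is accepted, false on a duplicate tuple.
  return function insert(doc) {
    const tuple = fields.map((f) => (f in doc ? doc[f] : null));
    const key = JSON.stringify(tuple);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}

const insertAB = makeCompoundUnique(["a", "b"]);
```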
Drop Duplicates
To force the creation of a unique index (page 490) on a collection with duplicate values in the field you are
indexing, you can use the dropDups option. This will force MongoDB to create a unique index by deleting documents
with duplicate values when building the index. Consider the following prototype invocation of createIndex():
db.collection.createIndex( { a: 1 }, { unique: true, dropDups: true } )
See the full documentation of duplicate dropping (page 494) for more information.
Warning: Specifying { dropDups: true } may delete data from your database. Use with extreme caution.
See also:
Index Concepts (page 468) and Indexing Tutorials (page 502) for more information.
Prototype
To create a sparse index (page 490) on a field, use an operation that resembles the following prototype:
db.collection.createIndex( { a: 1 }, { sparse: true } )
Example
The following operation creates a sparse index on the users collection that only includes a document in the index if
the twitter_name field exists in a document.
db.users.createIndex( { twitter_name: 1 }, { sparse: true } )
The index excludes all documents that do not include the twitter_name field.
Considerations
Note: Sparse indexes can affect the results returned by the query, particularly with respect to sorts on fields not
included in the index. See the sparse index (page 490) section for more information.
Procedure
To create a hashed index (page 487), specify hashed as the value of the index key, as in the following example:
Example
Specify a hashed index on _id
db.collection.createIndex( { _id: "hashed" } )
Considerations
MongoDB supports hashed indexes of any single field. The hashing function collapses embedded documents and
computes the hash for the entire value, but does not support multi-key (i.e. arrays) indexes.
You may not create compound indexes that have hashed index fields.
Build Indexes on Replica Sets
For replica sets, secondaries will begin building indexes after the primary finishes building the index. In sharded
clusters, the mongos will send createIndex() to the primary members of the
replica set for each shard, which then replicate to the secondaries after the primary finishes building the index.
To minimize the impact of building an index on your replica set, use the following procedure to build indexes:
See also:
Indexing Tutorials (page 502) and Index Concepts (page 468) for more information.
Considerations
Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without
falling too far behind to catch up. See the oplog sizing (page 573) documentation for additional information.
This procedure takes one member out of the replica set at a time, so it affects only one member of the set at a
time rather than all secondaries at the same time.
Do not use this procedure when building a unique index (page 490) with the dropDups option.
Before version 2.6, background index creation operations (page 493) became foreground indexing operations
on secondary members of replica sets. Starting in version 2.6, background index builds replicate as background index builds
on the secondaries.
Procedure
Note: If you need to build an index in a sharded cluster, repeat the following procedure for each replica set that
provides each shard.
Stop One Secondary Stop the mongod process on one secondary. Restart the mongod process without the
--replSet option and running on a different port. 11 This instance is now in standalone mode.
For example, if your mongod normally runs on the default port of 27017 with the --replSet option, you
would use the following invocation:
mongod --port 47017
11 By running the mongod on a different port, you ensure that the other members of the replica set and all clients will not contact the member
while you are building the index.
Build the Index Create the new index using the createIndex() in the mongo shell, or comparable method in
your driver. This operation will create or rebuild the index on this mongod instance.
For example, to create an ascending index on the username field of the records collection, use the following
mongo shell operation:
db.records.createIndex( { username: 1 } )
See also:
Create an Index (page 503) and Create a Compound Index (page 504) for more information.
Restart the Program mongod When the index build completes, start the mongod instance with the --replSet
option on its usual port:
mongod --port 27017 --replSet rs0
Modify the port number (e.g. 27017) or the replica set name (e.g. rs0) as needed.
Allow replication to catch up on this member.
Build Indexes on all Secondaries Changed in version 2.6: Secondary members can now build indexes in the background (page 508). Previously all index builds on secondaries were in the foreground.
For each secondary in the set, build an index according to the following steps:
1. Stop One Secondary (page 507)
2. Build the Index (page 508)
3. Restart the Program mongod (page 508)
Build the Index on the Primary To build an index on the primary you can either:
1. Build the index in the background (page 508) on the primary.
2. Step down the primary using the rs.stepDown() method in the mongo shell to cause the current primary to
become a secondary gracefully and allow the set to elect another member as primary.
Then repeat the index building procedure, listed below, to build the index on the former primary:
(a) Stop One Secondary (page 507)
(b) Build the Index (page 508)
(c) Restart the Program mongod (page 508)
Building the index in the background takes longer than the foreground index build and results in a less compact index
structure. Additionally, the background index build may impact write performance on the primary. However, building
the index in the background allows the set to remain continuously available for write operations while MongoDB builds
the index.
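The rolling procedure above can be summarized as an ordered checklist. The following JavaScript is a minimal sketch, not a MongoDB tool. The helper rollingIndexBuildPlan and its option names are hypothetical, and the host names are placeholders. Each member's other startup options (such as --dbpath) are assumed unchanged and are omitted. The function only generates strings and does not contact any server.

```javascript
// Hypothetical helper: expand the rolling index-build procedure into an
// ordered checklist. It builds strings only; nothing is executed.
function rollingIndexBuildPlan(opts) {
  const { secondaries, primary, setName, port, standalonePort, indexCmd } = opts;
  const steps = [];

  // Steps for one member: restart standalone, build the index, rejoin the set.
  // Other startup options (e.g. --dbpath) are assumed unchanged and omitted.
  function buildOnMember(host) {
    steps.push(`[${host}] stop the mongod process`);
    steps.push(`[${host}] mongod --port ${standalonePort}`);
    steps.push(`[${host}] ${indexCmd}`);
    steps.push(`[${host}] mongod --port ${port} --replSet ${setName}`);
    steps.push(`[${host}] wait for replication to catch up`);
  }

  // One secondary at a time, never all secondaries at once.
  secondaries.forEach((host) => buildOnMember(host));

  // Step down the primary so it becomes a secondary, then repeat on it.
  steps.push(`[${primary}] rs.stepDown()`);
  buildOnMember(primary);
  return steps;
}

const plan = rollingIndexBuildPlan({
  secondaries: ["db1.example.net", "db2.example.net"], // placeholder hosts
  primary: "db0.example.net",
  setName: "rs0",
  port: 27017,
  standalonePort: 47017,
  indexCmd: "db.records.createIndex( { username: 1 } )",
});
plan.forEach((step) => console.log(step));
```

Generating the plan as data makes the ordering constraint explicit: every secondary finishes before the primary steps down.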
Build Indexes in the Background
By default, MongoDB builds indexes in the foreground, which prevents all read and write operations to the database
while the index builds. Also, no operation that requires a read or write lock on all databases (e.g. listDatabases) can
occur during a foreground index build.
Background index construction (page 493) allows read and write operations to continue while building the index.
See also:
Index Concepts (page 468) and Indexing Tutorials (page 502) for more information.
Considerations
Background index builds take longer to complete and result in an index that is initially larger, or less compact, than an
index built in the foreground. Over time, the compactness of indexes built in the background will approach foreground-built indexes.
After MongoDB finishes building the index, background-built indexes are functionally identical to any other index.
Procedure
To create an index in the background, add the background option to the createIndex() operation, as in the
following operation:
db.collection.createIndex( { a: 1 }, { background: true } )
Consider the section on background index construction (page 493) for more information about these indexes and their
implications.
Build Old Style Indexes
Important: Use this procedure only if you must have indexes that are compatible with a version of MongoDB earlier
than 2.0.
MongoDB version 2.0 introduced the {v:1} index format. MongoDB versions 2.0 and later support both the {v:1}
format and the earlier {v:0} format.
MongoDB versions prior to 2.0, however, support only the {v:0} format. If you need to roll back MongoDB to a
version prior to 2.0, you must drop and re-create your indexes.
To build pre-2.0 indexes, use the dropIndexes() and createIndex() methods. You cannot simply reindex the collection. When you reindex on versions that only support {v:0} indexes, the v
fields in the index definition still hold values of 1, even though the indexes would now use the {v:0} format. If you
were to upgrade again to version 2.0 or later, these indexes would not work.
Example
Suppose you rolled back from MongoDB 2.0 to MongoDB 1.8, and suppose you had the following index on the
items collection:
{ "v" : 1, "key" : { "name" : 1 }, "ns" : "mydb.items", "name" : "name_1" }
The v field tells you the index is a {v:1} index, which is incompatible with version 1.8.
To drop the index, issue the following command:
db.items.dropIndex( { name : 1 } )
See also:
Index Performance Enhancements (page 873).
Where the value of nIndexesWas reflects the number of indexes before removing this index.
For text (page 486) indexes, pass the index name to the db.collection.dropIndex() method. See Use the
Index Name to Drop a text Index (page 528) for details.
Remove All Indexes
You can also use the db.collection.dropIndexes() method to remove all indexes, except for the _id index (page 471),
from a collection.
These shell helpers provide wrappers around the dropIndexes database command. Your client library may
have a different or additional interface for these operations.
Modify an Index
To modify an existing index, you need to drop and recreate the index.
The method returns a document with the status of the results. The method only creates an index if the index does
not already exist. See Create an Index (page 503) and Index Creation Tutorials (page 502) for more information on
creating indexes.
Step 2: Attempt to modify the index.
To modify an existing index, you cannot just re-issue the createIndex() method with the updated specification
of the index.
For example, the following operation attempts to remove the unique constraint from the previously created index by
using the createIndex() method.
db.orders.createIndex(
{ "cust_id" : 1, "ord_date" : -1, "items" : 1 }
)
The method returns a document with the status of the operation. Upon successful operation, the ok field in the returned
document should specify a 1. See Remove Indexes (page 510) for more information about dropping indexes.
Step 4: Recreate the index without the unique constraint.
The method returns a document with the status of the results. Upon successful operation, the returned document
should show the numIndexesAfter to be greater than numIndexesBefore by one.
See also:
Index Introduction (page 463), Index Concepts (page 468).
Rebuild Indexes
If you need to rebuild indexes for a collection, you can use the db.collection.reIndex() method to rebuild all
indexes on a collection in a single operation. This operation drops all indexes, including the _id index (page 471), and
then rebuilds all indexes.
See also:
Index Concepts (page 468) and Indexing Tutorials (page 502).
Process
MongoDB will return the following document when the operation completes:
{
   "nIndexesWas" : 2,
   "msg" : "indexes dropped for collection",
   "nIndexes" : 2,
   "indexes" : [
      {
         "key" : {
            "_id" : 1,
            "tax-id" : 1
         },
         "ns" : "records.accounts",
         "name" : "_id_"
      }
   ],
   "ok" : 1
}
This shell helper provides a wrapper around the reIndex database command. Your client library may have
a different or additional interface for this operation.
Additional Considerations
Note: To build or rebuild indexes for a replica set see Build Indexes on Replica Sets (page 507).
To see the status of an indexing process, you can use the db.currentOp() method in the mongo shell. To filter
the current operations for index creation operations, see the index creation example in the db.currentOp() reference.
The msg field will include the percent of the build that is complete.
To terminate an ongoing index build, use the db.killOp() method in the mongo shell. For index builds, the effects
of db.killOp() may not be immediate and may occur well after much of the index build operation has completed.
You cannot terminate a replicated index build on secondary members of a replica set. To minimize the impact of
building an index on replica sets, see Build Indexes on Replica Sets (page 507).
Changed in version 2.4: Before MongoDB 2.4, you could only terminate background index builds. After 2.4, you can
terminate both background index builds and foreground index builds.
See also:
db.currentOp(), db.killOp()
Return a List of All Indexes
When performing maintenance you may want to check which indexes exist on a collection. Every index on a collection
has a corresponding document in the system.indexes (page 288) collection, and you can use standard queries (i.e.
find()) to list the indexes, or in the mongo shell, the getIndexes() method to return a list of the indexes on a
collection, as in the following examples.
See also:
Index Concepts (page 468) and Indexing Tutorials (page 502) for more information about indexes in MongoDB and
common index management operations.
List all Indexes on a Collection
To return a list of all indexes on a collection, use the db.collection.getIndexes() method or a similar
method for your driver.[12]
For example, to view all indexes on the people collection:
db.people.getIndexes()
To return a list of all indexes on all collections in a database, use the following operation in the mongo shell:
db.system.indexes.find()
See system.indexes (page 288) for more information about these documents.
Measure Index Use
Synopsis
Query performance is a good general indicator of index use; however, for more precise insight into index use, MongoDB provides a number of tools that allow you to study query operations and observe index use for your database.
See also:
Index Concepts (page 468) and Indexing Tutorials (page 502) for more information.
[12] http://api.mongodb.org/
Operations
Return Query Plan with explain() Use the db.collection.explain() or the cursor.explain()
method in executionStats mode to return statistics about the query process, including the index used, the number of
documents scanned, and the time the query takes to process in milliseconds.
Run db.collection.explain() or the cursor.explain() method in allPlansExecution mode to view
partial execution statistics collected during plan selection.
db.collection.explain() provides information on the execution of other operations, such as
db.collection.update(). See db.collection.explain() for details.
Control Index Use with hint() To force MongoDB to use a particular index for a db.collection.find()
operation, specify the index with the hint() method. Append the hint() method to the find() method. Consider
the following example:
db.people.find(
{ name: "John Doe", zipcode: { $gt: "63000" } }
).hint( { zipcode: 1 } )
To view the execution statistics for a specific index, append the hint() method followed by cursor.explain() to
db.collection.find(), e.g.:
db.people.find(
{ name: "John Doe", zipcode: { $gt: "63000" } }
).hint( { zipcode: 1 } ).explain("executionStats")
Specify the $natural operator to the hint() method to prevent MongoDB from using any index:
db.people.find(
{ name: "John Doe", zipcode: { $gt: "63000" } }
).hint( { $natural: 1 } )
Instance Index Use Reporting MongoDB provides a number of metrics of index use and operation that you may
want to consider when analyzing index use for your database:
- In the output of serverStatus:
  - indexCounters
  - scanned
  - scanAndOrder
- In the output of collStats:
  - totalIndexSize
  - indexSizes
- In the output of dbStats:
  - dbStats.indexes
  - dbStats.indexSize
The following procedure presents steps to populate a collection with documents that contain a GeoJSON data field
and create 2dsphere indexes (page 478). Although the procedure populates the collection first, you can also create the
indexes before populating the collection.
Procedure
First, populate a collection places with documents that store location data as GeoJSON Point (page 480) in a field
named loc. The coordinate order is longitude, then latitude.
db.places.insert(
{
loc : { type: "Point", coordinates: [ -73.97, 40.77 ] },
name: "Central Park",
category : "Parks"
}
)
db.places.insert(
{
loc : { type: "Point", coordinates: [ -73.88, 40.78 ] },
name: "La Guardia Airport",
category : "Airport"
}
)
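Because GeoJSON puts longitude before latitude, swapped coordinates are an easy mistake. Here is a minimal sketch, assuming you build documents in JavaScript before inserting them. The helper geoJsonPoint is hypothetical. Its range checks (longitude -180 to 180, latitude -90 to 90) are an application-side safeguard, not a reproduction of MongoDB's own validation.

```javascript
// Hypothetical helper: build a GeoJSON Point for a 2dsphere-indexed field.
// GeoJSON order is [ longitude, latitude ].
function geoJsonPoint(longitude, latitude) {
  if (longitude < -180 || longitude > 180) {
    throw new RangeError(`longitude out of range: ${longitude}`);
  }
  if (latitude < -90 || latitude > 90) {
    throw new RangeError(`latitude out of range: ${latitude}`);
  }
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Same document as the Central Park example above.
const centralPark = {
  loc: geoJsonPoint(-73.97, 40.77),
  name: "Central Park",
  category: "Parks",
};
console.log(JSON.stringify(centralPark));
```

In the mongo shell, the resulting object would be passed to db.places.insert(). Passing the coordinates in latitude, longitude order for a point like this one would fail the latitude check instead of silently storing a wrong location.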
Create a 2dsphere Index For example, the following creates a 2dsphere (page 478) index on the location field
loc:
db.places.createIndex( { loc : "2dsphere" } )
Create a Compound Index with 2dsphere Index Key A compound index (page 472) can include a 2dsphere
index key in combination with non-geospatial index keys. For example, the following operation creates a compound
index where the first key loc is a 2dsphere index key, and the remaining keys category and name are non-geospatial index keys, specifically descending (-1) and ascending (1) keys respectively.
db.places.createIndex( { loc : "2dsphere" , category : -1, name: 1 } )
Unlike the 2d (page 483) index, a compound 2dsphere index does not require the location field to be the first field
indexed. For example:
db.places.createIndex( { category : 1 , loc : "2dsphere" } )
Considerations
The geoNear command and the $geoNear pipeline stage require that a collection have at most one 2dsphere
index and/or at most one 2d (page 483) index, whereas geospatial query operators (e.g. $near and $geoWithin)
permit collections to have multiple geospatial indexes.
The geospatial index restriction for the geoNear command and the $geoNear pipeline stage exists because neither
the geoNear command nor the $geoNear pipeline stage syntax includes the location field. As such, index selection
among multiple 2d indexes or 2dsphere indexes is ambiguous.
No such restriction applies for geospatial query operators since these operators take a location field, eliminating the
ambiguity.
As such, although this tutorial creates multiple 2dsphere indexes, to use the geoNear command or the $geoNear
pipeline stage against the example collection, you will need to drop all but one of the 2dsphere indexes.
To query using the 2dsphere index, see Query a 2dsphere Index (page 516).
Query a 2dsphere Index
The following sections describe queries supported by the 2dsphere index. For an overview of recommended geospatial queries, see the geospatial query compatibility chart.
GeoJSON Objects Bounded by a Polygon
The $geoWithin operator queries for location data found within a GeoJSON polygon. Your location data must be
stored in GeoJSON format. Use the following syntax:
db.<collection>.find( { <location field> :
{ $geoWithin :
{ $geometry :
{ type : "Polygon" ,
coordinates : [ <coordinates> ]
} } } } )
The following example selects all points and shapes that exist entirely within a GeoJSON polygon:
db.places.find( { loc :
                  { $geoWithin :
                    { $geometry :
                      { type : "Polygon" ,
                        coordinates : [ [
                                          [ 0 , 0 ] ,
                                          [ 3 , 6 ] ,
                                          [ 6 , 1 ] ,
                                          [ 0 , 0 ]
                                        ] ]
                } } } } )
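A polygon query fails if a ring is not closed. The sketch below uses hypothetical helper names. It checks the two GeoJSON linear-ring rules (at least four positions, and the first position equal to the last) before building the $geoWithin filter used in the example above. It is plain JavaScript and runs without a server. In the mongo shell, the resulting object would be passed to db.places.find().

```javascript
// Hypothetical helper: a GeoJSON linear ring needs at least four positions,
// and its first and last positions must be identical (the ring is closed).
function isClosedRing(ring) {
  if (!Array.isArray(ring) || ring.length < 4) return false;
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
}

// Build the $geoWithin / $geometry filter only if every ring is valid.
function polygonWithinQuery(field, rings) {
  if (!rings.every(isClosedRing)) {
    throw new Error("every ring must have >= 4 positions and be closed");
  }
  return {
    [field]: {
      $geoWithin: { $geometry: { type: "Polygon", coordinates: rings } },
    },
  };
}

// The triangle used in the examples above: [0,0] -> [3,6] -> [6,1] -> [0,0].
const query = polygonWithinQuery("loc", [[[0, 0], [3, 6], [6, 1], [0, 0]]]);
console.log(JSON.stringify(query));
```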
The following example uses $geoIntersects to select all indexed points and shapes that intersect with the polygon
defined by the coordinates array.
db.places.find( { loc :
                  { $geoIntersects :
                    { $geometry :
                      { type : "Polygon" ,
                        coordinates: [ [
                                         [ 0 , 0 ] ,
                                         [ 3 , 6 ] ,
                                         [ 6 , 1 ] ,
                                         [ 0 , 0 ]
                                       ] ]
                } } } } )
Proximity queries return the points closest to the defined point and sort the results by distance. A proximity query on
GeoJSON data requires a 2dsphere index.
To query for proximity to a GeoJSON point, use either the $near operator or the geoNear command. Distance is in
meters.
The $near operator uses the following syntax:
db.<collection>.find( { <location field> :
                         { $near :
                           { $geometry :
                              { type : "Point" ,
                                coordinates : [ <longitude> , <latitude> ] } ,
                             $maxDistance : <distance in meters>
                      } } } )
The geoNear command offers more options and returns more information than does the $near operator. To run the
command, see geoNear.
Points within a Circle Defined on a Sphere
To select all grid coordinates in a spherical cap on a sphere, use $geoWithin with the $centerSphere operator.
Specify an array that contains:
- The grid coordinates of the circle's center point
- The circle's radius measured in radians. To calculate radians, see Calculate Distance Using Spherical Geometry
  (page 522).
Use the following syntax:
db.<collection>.find( { <location field> :
{ $geoWithin :
{ $centerSphere :
[ [ <x>, <y> ] , <radius> ] }
} } )
The following example queries grid coordinates and returns all documents within a 10-mile radius of longitude 88 W
and latitude 30 N. The example converts the distance, 10 miles, to radians by dividing by the approximate equatorial
radius of the Earth, 3963.2 miles:
db.places.find( { loc :
{ $geoWithin :
{ $centerSphere :
[ [ -88 , 30 ] , 10 / 3963.2 ]
} } } )
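The radius arithmetic from the example above can be wrapped in a small function. This is a sketch with a hypothetical helper name. It reuses the text's approximate radius of 3963.2 miles, and west longitudes are passed as negative numbers.

```javascript
// Approximate equatorial radius of the Earth in miles, as used in the text.
const EARTH_RADIUS_MILES = 3963.2;

// Hypothetical helper: build a $geoWithin / $centerSphere filter from a
// radius in miles. $centerSphere expects the radius in radians.
function centerSphereQuery(field, longitude, latitude, radiusMiles) {
  const radiusRadians = radiusMiles / EARTH_RADIUS_MILES;
  return {
    [field]: {
      $geoWithin: { $centerSphere: [[longitude, latitude], radiusRadians] },
    },
  };
}

// 10 miles around 88 W, 30 N (west longitude is negative).
const filter = centerSphereQuery("loc", -88, 30, 10);
console.log(JSON.stringify(filter));
```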
Create a 2d Index
To build a geospatial 2d index, use the db.collection.createIndex() method and specify 2d.
Use the following syntax:
db.<collection>.createIndex( { <location field> : "2d" ,
<additional field> : <value> } ,
{ <index-specification options> } )
By default, a 2d index assumes longitude and latitude and has boundaries of -180 inclusive and 180 non-inclusive. If
documents contain coordinate data outside of the specified range, MongoDB returns an error.
Important: The default boundaries allow applications to insert documents with invalid latitudes greater than 90 or
less than -90. The behavior of geospatial queries with such invalid points is not defined.
On 2d indexes you can change the location range.
You can build a 2d geospatial index with a location range other than the default. Use the min and max options when
creating the index. Use the following syntax:
db.collection.createIndex( { <location field> : "2d" } ,
{ min : <lower bound> , max : <upper bound> } )
By default, a 2d index on legacy coordinate pairs uses 26 bits of precision, which is roughly equivalent to 2 feet or 60
centimeters of precision using the default range of -180 to 180. Precision is measured by the size in bits of the geohash
values used to store location data. You can configure geospatial indexes with up to 32 bits of precision.
Index precision does not affect query accuracy. The actual grid coordinates are always used in the final query processing. Advantages to lower precision are a lower processing overhead for insert operations and use of less space. An
advantage to higher precision is that queries scan smaller portions of the index to return results.
To configure a location precision other than the default, use the bits option when creating the index. Use the following
syntax:
db.<collection>.createIndex( {<location field> : "<index type>"} ,
{ bits : <bit precision> } )
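The sketch below is a rough check of the "about 60 centimeters" figure. It assumes that each axis of the default -180 to 180 range is divided into 2^bits equal cells, and it converts degrees to meters at about 111,320 meters per degree at the equator. Both are simplifying assumptions for illustration, not a description of MongoDB's geohash internals.

```javascript
// Rough estimate of 2d-index cell size, ASSUMING each coordinate axis is
// split into 2^bits equal cells across [min, max] and ~111,320 m per degree
// of longitude at the equator.
function cellSizeMeters(bits, min = -180, max = 180) {
  const degreesPerCell = (max - min) / Math.pow(2, bits);
  const METERS_PER_DEGREE = 111320;
  return degreesPerCell * METERS_PER_DEGREE;
}

console.log(cellSizeMeters(26).toFixed(2)); // "0.60" (default 26 bits)
console.log(cellSizeMeters(32).toFixed(4)); // "0.0093" (maximum 32 bits)
```

Under these assumptions, the default 26 bits gives cells of roughly 0.6 m, which agrees with the figure quoted above. The 32-bit maximum gives cells of under a centimeter.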
For information on the internals of geohash values, see Calculation of Geohash Values for 2d Indexes (page 484).
Query a 2d Index
The following sections describe queries supported by the 2d index. For an overview of recommended geospatial
queries, see the geospatial query compatibility chart.
Points within a Shape Defined on a Flat Surface
To select all legacy coordinate pairs found within a given shape on a flat surface, use the $geoWithin operator along
with a shape operator. Use the following syntax:
db.<collection>.find( { <location field> :
{ $geoWithin :
{ $box|$polygon|$center : <coordinates>
} } } )
The following queries for documents within a rectangle defined by [ 0 , 0 ] at the bottom left corner and by [
100 , 100 ] at the top right corner.
db.places.find( { loc :
{ $geoWithin :
{ $box : [ [ 0 , 0 ] ,
[ 100 , 100 ] ]
} } } )
The following queries for documents that are within the circle centered on [ -74 , 40.74 ] and with a radius of
10:
db.places.find( { loc: { $geoWithin :
{ $center : [ [-74, 40.74 ] , 10 ]
} } } )
For syntax and examples for each shape, see the following:
- $box
- $polygon
- $center (defines a circle)
Points within a Circle Defined on a Sphere
MongoDB supports rudimentary spherical queries on flat 2d indexes for legacy reasons. In general, spherical calculations should use a 2dsphere index, as described in 2dsphere Indexes (page 478).
To query for legacy coordinate pairs in a spherical cap on a sphere, use $geoWithin with the $centerSphere
operator. Specify an array that contains:
- The grid coordinates of the circle's center point
- The circle's radius measured in radians. To calculate radians, see Calculate Distance Using Spherical Geometry
  (page 522).
Use the following syntax:
db.<collection>.find( { <location field> :
{ $geoWithin :
{ $centerSphere : [ [ <x>, <y> ] , <radius> ] }
} } )
The following example query returns all documents within a 10-mile radius of longitude 88 W and latitude 30 N. The
example converts distance to radians by dividing distance by the approximate equatorial radius of the earth, 3963.2
miles:
db.<collection>.find( { loc : { $geoWithin :
{ $centerSphere :
[ [ -88 , 30 ] , 10 / 3963.2 ]
} } } )
Proximity queries return the 100 legacy coordinate pairs closest to the defined point and sort the results by distance.
Use either the $near operator or the geoNear command. Both require a 2d index.
The $near operator uses the following syntax:
The geoNear command offers more options and returns more information than does the $near operator. To run the
command, see geoNear.
Exact Matches on a Flat Surface
Changed in version 2.6: Previously, 2d indexes would support exact-match queries for coordinate pairs.
You cannot use a 2d index to return an exact match for a coordinate pair. Use a scalar, ascending or descending, index
on a field that stores coordinates to return exact matches.
In the following example, the find() operation will return an exact match on a location if you have a { loc:
1} index:
db.<collection>.find( { loc: [ <x> , <y> ] } )
This query will return any documents with the value of [ <x> , <y> ].
Create a Haystack Index
A haystack index must reference two fields: the location field and a second field. The second field is used for exact
matches. Haystack indexes return documents based on location and an exact match on a single additional criterion.
These indexes are not necessarily suited to returning the closest documents to a particular location.
To build a haystack index, use the following syntax:
db.coll.createIndex( { <location field> : "geoHaystack" ,
<additional field> : 1 } ,
{ bucketSize : <bucket value> } )
To build a haystack index, you must specify the bucketSize option when creating the index. A bucketSize
of 5 creates an index that groups location values that are within 5 units of the specified longitude and latitude. The
bucketSize also determines the granularity of the index. You can tune the parameter to the distribution of your
data so that in general you search only very small regions. The areas defined by buckets can overlap. A document can
exist in multiple buckets.
Example
If you have a collection with documents that contain fields similar to the following:
{ _id : 100, pos: { lng : 126.9, lat : 35.2 } , type : "restaurant"}
{ _id : 200, pos: { lng : 127.5, lat : 36.1 } , type : "restaurant"}
{ _id : 300, pos: { lng : 128.0, lat : 36.7 } , type : "national park"}
The following operations create a haystack index with buckets that store keys within 1 unit of longitude or latitude.
db.places.createIndex( { pos : "geoHaystack", type : 1 } ,
{ bucketSize : 1 } )
This index stores the document with an _id field that has the value 200 in two different buckets:
In a bucket that includes the document where the _id field has a value of 100
In a bucket that includes the document where the _id field has a value of 300
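The bucket membership in this example can be approximated by comparing coordinates. This is a simplified illustration only. The hypothetical helper withinBucket treats two positions as sharing a bucket when they differ by at most bucketSize on both longitude and latitude. It is not MongoDB's internal bucketing algorithm, but it reproduces the grouping described for _id 200.

```javascript
// Simplified illustration, NOT MongoDB's internal haystack bucketing:
// two positions "share a bucket" if both coordinates differ by <= bucketSize.
function withinBucket(a, b, bucketSize) {
  return (
    Math.abs(a.lng - b.lng) <= bucketSize &&
    Math.abs(a.lat - b.lat) <= bucketSize
  );
}

// The sample documents from the example above (type field omitted).
const docs = [
  { _id: 100, pos: { lng: 126.9, lat: 35.2 } },
  { _id: 200, pos: { lng: 127.5, lat: 36.1 } },
  { _id: 300, pos: { lng: 128.0, lat: 36.7 } },
];

// Documents within 1 unit of _id 200 on both axes.
const neighborsOf200 = docs
  .filter((d) => d._id !== 200 && withinBucket(docs[1].pos, d.pos, 1))
  .map((d) => d._id);
console.log(neighborsOf200);
```

With bucketSize 1, _id 200 lies within range of both 100 and 300, while 100 and 300 differ by 1.1 in longitude and do not share a bucket.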
To query using a haystack index you use the geoSearch command. See Query a Haystack Index (page 522).
By default, queries that use a haystack index return 50 documents.
Query a Haystack Index
A haystack index is a special 2d geospatial index that is optimized to return results over small areas. To create a
haystack index see Create a Haystack Index (page 521).
To query a haystack index, use the geoSearch command. You must specify both the coordinates and the additional
field to geoSearch. For example, to return all documents with the value restaurant in the type field near the
example point, the command would resemble:
db.runCommand( { geoSearch : "places" ,
search : { type: "restaurant" } ,
near : [-74, 40.74] ,
maxDistance : 10 } )
Note: Haystack indexes are not suited to queries for the complete list of documents closest to a particular location.
The closest documents could be more distant compared to the bucket size.
Note: Spherical query operations (page 522) are not currently supported by haystack indexes.
The find() method and geoNear command cannot access the haystack index.
true } option.
Important: These three queries use radians for distance. Other query types do not.
For spherical query operators to function properly, you must convert distances to radians, and convert from radians to
the distance units used by your application.
To convert:
distance to radians: divide the distance by the radius of the sphere (e.g. the Earth) in the same units as the
distance measurement.
radians to distance: multiply the radian measure by the radius of the sphere (e.g. the Earth) in the units system
that you want to convert the distance to.
The equatorial radius of the Earth is approximately 3,963.2 miles or 6,378.1 kilometers.
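The two conversions can be written as a pair of functions. Here is a minimal sketch, assuming the approximate equatorial radii given above. The helper names and the EARTH_RADIUS table are hypothetical.

```javascript
// Approximate equatorial radii of the Earth quoted in this section.
const EARTH_RADIUS = { miles: 3963.2, km: 6378.1 };

function radiusFor(unit) {
  const r = EARTH_RADIUS[unit];
  if (r === undefined) throw new Error(`unknown unit: ${unit}`);
  return r;
}

// distance -> radians: divide by the sphere's radius in the same unit.
function distanceToRadians(distance, unit) {
  return distance / radiusFor(unit);
}

// radians -> distance: multiply by the sphere's radius in the target unit.
function radiansToDistance(radians, unit) {
  return radians * radiusFor(unit);
}

const r = distanceToRadians(100, "miles");
console.log(r);                          // 100 miles in radians
console.log(radiansToDistance(r, "km")); // the same distance in kilometers
```

Converting through radians also gives a unit change for free: 100 miles comes back as roughly 160.9 kilometers.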
The following query would return documents from the places collection within the circle described by the center [
-74, 40.74 ] with a radius of 100 miles:
db.places.find( { loc: { $geoWithin: { $centerSphere: [ [ -74, 40.74 ] ,
100 / 3963.2 ] } } } )
You may also use the distanceMultiplier option of the geoNear command to convert radians in the mongod process,
rather than in your application code. See distance multiplier (page 523).
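The following spherical query returns documents in the collection places within 100 miles of the point [
-74, 40.74 ]. Because the point is a legacy coordinate pair and spherical is true, maxDistance is expressed in radians:
db.runCommand( { geoNear: "places",
near: [ -74, 40.74 ],
spherical: true,
maxDistance: 100 / 3963.2
} )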
Warning: Spherical queries that wrap around the poles or at the transition from -180 to 180 longitude raise an
error.
Note: While the default Earth-like bounds for geospatial indexes are between -180 inclusive and 180 non-inclusive, valid values
for latitude are between -90 and 90.
Distance Multiplier
The distanceMultiplier option of the geoNear command returns distances only after multiplying the results
by an assigned value. This allows MongoDB to return converted values, and removes the requirement to convert units
in application logic.
Using distanceMultiplier in spherical queries provides results from the geoNear command that do not need
radian-to-distance conversion. The following example uses distanceMultiplier in the geoNear command
with a spherical (page 522) example:
db.runCommand( { geoNear: "places",
near: [ -74, 40.74 ],
spherical: true,
distanceMultiplier: 3963.2
} )
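The geoNear command document can also be assembled programmatically. The sketch below uses a hypothetical helper name. It sets maxDistance in radians, because near is a legacy coordinate pair and spherical is true. It also sets distanceMultiplier so that returned distances come back in miles. The helper only builds the object. In the mongo shell, you would pass it to db.runCommand().

```javascript
// Approximate equatorial radius of the Earth in miles, as used in the text.
const EARTH_RADIUS_MILES = 3963.2;

// Hypothetical helper: a spherical geoNear command on legacy coordinates
// that limits results to maxMiles and reports distances in miles.
function sphericalGeoNearInMiles(collection, point, maxMiles) {
  return {
    geoNear: collection,
    near: point,                              // legacy [ x, y ] pair
    spherical: true,
    maxDistance: maxMiles / EARTH_RADIUS_MILES, // radians for legacy points
    distanceMultiplier: EARTH_RADIUS_MILES,     // radians -> miles in output
  };
}

const cmd = sphericalGeoNearInMiles("places", [-74, 40.74], 100);
console.log(JSON.stringify(cmd));
```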
The following example creates a text index on the fields subject and content:
db.collection.createIndex(
{
subject: "text",
content: "text"
}
)
This text index catalogs all string data in the subject field and the content field, where the field value is either
a string or an array of string elements.
Index All Fields
To allow for text search on all fields with string content, use the wildcard specifier ($**) to index all fields that contain
string content.
The following example indexes any string value in the data of every field of every document in collection and
names the index TextIndex:
db.collection.createIndex(
{ "$**": "text" },
{ name: "TextIndex" }
)
Note: In order to drop a text index, use the index name. See Use the Index Name to Drop a text Index (page 528)
for more information.
The default language associated with the indexed data determines the rules to parse word roots (i.e. stemming) and
ignore stop words. The default language for the indexed data is english.
To specify a different language, use the default_language option when creating the text index. See Text Search
Languages (page 538) for the languages available for default_language.
The following example creates for the quotes collection a text index on the content field and sets the
default_language to spanish:
db.quotes.createIndex(
{ content : "text" },
{ default_language: "spanish" }
)
Changed in version 2.6: Added support for language overrides within embedded documents.
Specify the Index Language within the Document If a collection contains documents or embedded documents that
are in different languages, include a field named language in the documents or embedded documents and specify
as its value the language for that document or embedded document.
MongoDB will use the specified language for that document or embedded document when building the text index:
The specified language in the document overrides the default language for the text index.
The specified language in an embedded document overrides the language specified in an enclosing document or
the default language for the index.
See Text Search Languages (page 538) for a list of supported languages.
For example, a collection quotes contains multi-language documents that include the language field in the document and/or the embedded document as needed:
{
_id: 1,
language: "portuguese",
original: "A sorte protege os audazes.",
translation:
[
{
language: "english",
quote: "Fortune favors the bold."
},
{
language: "spanish",
quote: "La suerte protege a los audaces."
}
]
}
{
_id: 2,
language: "spanish",
original: "Nada hay más surrealista que la realidad.",
translation:
[
{
language: "english",
quote: "There is nothing more surreal than reality."
},
{
language: "french",
quote: "Il n'y a rien de plus surréaliste que la réalité."
}
]
}
{
_id: 3,
original: "is this a dagger which I see before me.",
translation:
{
language: "spanish",
quote: "Es este un puñal que veo delante de mí."
}
}
Suppose you create a text index on the original and translation.quote fields with the default language of English:
db.quotes.createIndex( { original: "text", "translation.quote": "text" } )
Then, for the documents and embedded documents that contain the language field, the text index uses that language to parse word stems and other linguistic characteristics.
For embedded documents that do not contain the language field:
- If the enclosing document contains the language field, then the index uses the document's language for the
  embedded document.
- Otherwise, the index uses the default language for the embedded documents.
For documents that do not contain the language field, the index uses the default language, which is English.
Use any Field to Specify the Language for a Document To use a field with a name other than language, include
the language_override option when creating the index.
For example, give the following command to use idioma as the field name instead of language:
db.quotes.createIndex( { quote : "text" },
{ language_override: "idioma" } )
The documents of the quotes collection may specify a language with the idioma field:
{ _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
{ _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." }
{ _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
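The language-selection rules above amount to a short fallback chain. The index checks the embedded document's language field first, then the enclosing document's, then falls back to the index default. The sketch below models that chain in JavaScript. The helper effectiveLanguage is hypothetical, not MongoDB source, and its overrideField parameter stands in for the language_override option.

```javascript
// Hypothetical model of text-index language selection (not MongoDB code):
// embedded document's field -> enclosing document's field -> index default.
// `overrideField` plays the role of the language_override option.
function effectiveLanguage(doc, embedded, defaultLanguage = "english", overrideField = "language") {
  if (embedded && embedded[overrideField]) return embedded[overrideField];
  if (doc[overrideField]) return doc[overrideField];
  return defaultLanguage;
}

// Document _id: 3 from the quotes example (no top-level language field).
const quote3 = {
  _id: 3,
  original: "is this a dagger which I see before me.",
  translation: { language: "spanish", quote: "Es este un puñal que veo delante de mí." },
};

console.log(effectiveLanguage(quote3));                     // "english"
console.log(effectiveLanguage(quote3, quote3.translation)); // "spanish"
```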
The text index, like other indexes, must fall within the index name length limit.
Specify a Name for text Index
To avoid creating an index with a name that exceeds the index name length limit, you can pass the name
option to the db.collection.createIndex() method:
db.collection.createIndex(
{
content: "text",
"users.comments": "text",
"users.profiles": "text"
},
{
name: "MyTextIndex"
}
)
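To see whether the default name of a text index risks the length limit, you can compute it ahead of time. This sketch makes two assumptions. First, the default name concatenates each indexed field name with _text, joined by underscores. Second, the fully qualified name <database>.<collection>.$<index name> may not exceed 128 characters, as in MongoDB 3.0. Check the Limits and Thresholds reference for your version. The helper names are hypothetical.

```javascript
// ASSUMPTION: 3.0-era limit on "<db>.<collection>.$<index name>".
const MAX_QUALIFIED_INDEX_NAME = 128;

// Default text index name: each field followed by "_text", joined with "_".
function defaultTextIndexName(fields) {
  return fields.map((f) => `${f}_text`).join("_");
}

// True if the fully qualified index name fits within the limit.
function fitsNameLimit(dbName, collection, indexName) {
  const qualified = `${dbName}.${collection}.$${indexName}`;
  return qualified.length <= MAX_QUALIFIED_INDEX_NAME;
}

const fields = ["content", "users.comments", "users.profiles"];
const defaultName = defaultTextIndexName(fields);
console.log(defaultName); // content_text_users.comments_text_users.profiles_text
console.log(fitsNameLimit("test", "collection", defaultName));
console.log(fitsNameLimit("test", "x".repeat(80), defaultName));
console.log(fitsNameLimit("test", "x".repeat(80), "MyTextIndex"));
```

With an 80-character collection name, the default name overflows the assumed limit, while the short explicit name "MyTextIndex" still fits. This is why passing the name option helps.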
Whether the text (page 486) index has the default name or you specified a name for the text (page 486) index, to drop
the text (page 486) index, pass the index name to the db.collection.dropIndex() method.
For example, consider the index created by the following operation:
db.collection.createIndex(
{
content: "text",
"users.comments": "text",
"users.profiles": "text"
},
{
name: "MyTextIndex"
}
)
Then, to remove this text index, pass the name "MyTextIndex" to the db.collection.dropIndex()
method, as in the following:
db.collection.dropIndex("MyTextIndex")
{ _id: 2,
content: "Who doesn't like cake?",
about: "food",
keywords: [ "cake", "food", "dessert" ]
}
To create a text index with different field weights for the content field and the keywords field, include the
weights option to the createIndex() method. For example, the following command creates an index on three
fields and assigns weights to two of the fields:
db.blog.createIndex(
{
content: "text",
keywords: "text",
about: "text"
},
{
weights: {
content: 10,
keywords: 5,
},
name: "TextIndex"
}
)
[Table: sample inventory documents with fields _id (1 through 6) and dept]
Consider the common use case that performs text searches by individual departments, such as:
db.inventory.find( { dept: "kitchen", $text: { $search: "green" } } )
To limit the text search to scan only those documents within a specific dept, create a compound index that first specifies an ascending/descending index key on the field dept and then a text index key on the field description:
db.inventory.createIndex(
{
dept: 1,
description: "text"
}
)
Then, the text search within a particular department will limit the scan of indexed documents. For example, the
following query scans only those documents with dept equal to kitchen:
db.inventory.find( { dept: "kitchen", $text: { $search: "green" } } )
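To see why the leading dept key helps, the following JavaScript sketch models the compound index as a list of (dept, term) keys and counts how many keys match the search with and without the equality condition on dept. It is an illustration under simplifying assumptions (whitespace tokenization, no stemming), not a description of MongoDB's index internals; the helper functions and the sample documents are hypothetical.

```javascript
// Illustrative model: each index key is a (dept, term) pair pointing at a
// document _id. With dept as the leading key, an equality match on dept
// lets the lookup skip the keys of every other department.
function buildIndex(docs) {
  const keys = [];
  for (const doc of docs) {
    for (const term of doc.description.toLowerCase().split(/\s+/)) {
      keys.push({ dept: doc.dept, term, _id: doc._id });
    }
  }
  return keys;
}

// Keys matching the term when the index has no dept prefix: all departments.
function keysExaminedTextOnly(keys, term) {
  return keys.filter((k) => k.term === term).length;
}

// Keys matching when the index is { dept: 1, description: "text" } and the
// query has an equality condition on dept.
function keysExaminedWithPrefix(keys, dept, term) {
  return keys.filter((k) => k.dept === dept && k.term === term).length;
}

// Hypothetical sample documents.
const docs = [
  { _id: 1, dept: "tech", description: "green keyboard" },
  { _id: 2, dept: "kitchen", description: "green bowl" },
  { _id: 3, dept: "garden", description: "green hose" },
];
const keys = buildIndex(docs);
console.log(keysExaminedTextOnly(keys, "green"));              // 3
console.log(keysExaminedWithPrefix(keys, "kitchen", "green")); // 1
```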
Note:
A compound text index cannot include any other special index types, such as multi-key (page 474) or geospatial (page 478) index fields.
If the compound text index includes keys preceding the text index key, to perform a $text search, the
query predicate must include equality match conditions on the preceding keys.
See also:
Text Indexes (page 486)
Text Search in the Aggregation Pipeline
New in version 2.6. In the aggregation pipeline, text search is available via the use of the $text query operator in
the $match stage.
Restrictions
The $text operator assigns a score to each document that contains the search term in the indexed fields. The score
represents the relevance of a document to a given text search query. The score can be part of a $sort pipeline
specification as well as part of the projection expression. The { $meta: "textScore" } expression provides
information on the processing of the $text operation. See $meta aggregation for details on accessing the score for
projection or sort.
The metadata is only available after the $match stage that includes the $text operation.
Examples The following examples assume a collection articles that has a text index on the field subject:
db.articles.createIndex( { subject: "text" } )
The following aggregation searches for the term cake in the $match stage and calculates the total views for the
matching documents in the $group stage.
db.articles.aggregate(
[
{ $match: { $text: { $search: "cake" } } },
{ $group: { _id: null, views: { $sum: "$views" } } }
]
)
To sort by the text search score, include a $meta expression in the $sort stage. The following example matches on
either the term cake or tea, sorts by the textScore in descending order, and returns only the title field in the
result set.
db.articles.aggregate(
[
{ $match: { $text: { $search: "cake tea" } } },
{ $sort: { score: { $meta: "textScore" } } },
{ $project: { title: 1, _id: 0 } }
]
)
The specified metadata determines the sort order. For example, the "textScore" metadata sorts in descending
order. See $meta for more information on metadata as well as an example of overriding the default sort order of the
metadata.
Match on Text Score
The "textScore" metadata is available for projections, sorts, and conditions subsequent to the $match stage that
includes the $text operation.
The following example matches on either the term cake or tea, projects the title and the score fields, and then
returns only those documents with a score greater than 1.0.
db.articles.aggregate(
[
{ $match: { $text: { $search: "cake tea" } } },
{ $project: { title: 1, _id: 0, score: { $meta: "textScore" } } },
{ $match: { score: { $gt: 1.0 } } }
]
)
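The pipelines above combine a $text match with later stages that use the score. As an illustration only, the following JavaScript applies the equivalent of the $project, $match, and $sort logic to hypothetical documents whose scores are already known. In MongoDB the score comes from { $meta: "textScore" }; the textScore field and the score values here are made up for the sketch.

```javascript
// Model of { $project: { title: 1, _id: 0, score: { $meta: "textScore" } } }
function projectTitleAndScore(docs) {
  return docs.map((d) => ({ title: d.title, score: d.textScore }));
}

// Model of { $match: { score: { $gt: minScore } } }
function matchScoreGreaterThan(docs, minScore) {
  return docs.filter((d) => d.score > minScore);
}

// Model of sorting by { $meta: "textScore" }, which orders highest score first.
function sortByScoreDescending(docs) {
  return [...docs].sort((a, b) => b.score - a.score);
}

// Hypothetical documents that matched "cake tea", with made-up scores.
const matched = [
  { _id: 1, title: "Cake recipes", textScore: 1.5 },
  { _id: 2, title: "Tea time", textScore: 0.75 },
  { _id: 3, title: "Tea and cake", textScore: 2.25 },
];

const result = sortByScoreDescending(
  matchScoreGreaterThan(projectTitleAndScore(matched), 1.0)
);
console.log(result.map((d) => d.title)); // [ 'Tea and cake', 'Cake recipes' ]
```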
The following aggregation searches in Spanish for documents that contain the term saber but not the term claro in
the $match stage and calculates the total views for the matching documents in the $group stage.
8.3. Indexing Tutorials
db.articles.aggregate(
[
{ $match: { $text: { $search: "saber -claro", $language: "es" } } },
{ $group: { _id: null, views: { $sum: "$views" } } }
]
)
If you only ever query on a single key in a given collection, then you need to create just one single-key index for that
collection. For example, you might create an index on category in the products collection:
db.products.createIndex( { "category": 1 } )
If you sometimes query on only one key and at other times query on that key combined with a second key, then creating
a compound index is more efficient than creating a single-key index. MongoDB will use the compound index for both
queries. For example, you might create an index on both category and item.
db.products.createIndex( { "category": 1, "item": 1 } )
This index supports both kinds of query: you can query on just category, or on category combined
with item. A single compound index (page 472) on multiple fields can support all the queries that search a prefix
subset of those fields.
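The prefix rule can be expressed as a small check. The following JavaScript sketch, using a hypothetical isIndexPrefix() helper, tests whether the fields a query filters on are exactly the leading fields of a compound index key pattern. It models only the prefix rule described here; it does not reproduce the query planner, which can still use an index partially for non-prefix queries or consider other indexes.

```javascript
// Returns true when the queried fields are exactly the first fields of the
// index key pattern. Field order inside the query document does not matter.
function isIndexPrefix(indexPattern, queryFields) {
  const indexKeys = Object.keys(indexPattern);
  if (queryFields.length === 0 || queryFields.length > indexKeys.length) {
    return false;
  }
  const wanted = new Set(queryFields);
  return indexKeys.slice(0, queryFields.length).every((k) => wanted.has(k));
}

const index = { category: 1, item: 1 };
console.log(isIndexPrefix(index, ["category"]));         // true
console.log(isIndexPrefix(index, ["item", "category"])); // true
console.log(isIndexPrefix(index, ["item"]));             // false
```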
Example
The following index on a collection:
{ x: 1, y: 1, z: 1 }
can support the queries that its prefix indexes support:
{ x: 1 }
{ x: 1, y: 1 }
There are some situations where the prefix indexes may offer better query performance: for example, if z is a large
array.
The { x: 1, y: 1, z: 1 } index can also support many of the same queries as the following index:
{ x: 1, z: 1 }
Also, { x: 1, z: 1 } has an additional use. Given the following query:
db.collection.find( { x: 5 } ).sort( { z: 1} )
The { x: 1, z: 1 } index supports both the query and the sort operation, while the { x: 1, y: 1,
z: 1 } index only supports the query. For more information on sorting, see Use Indexes to Sort Query Results
(page 533).
Starting in version 2.6, MongoDB can use index intersection (page 495) to fulfill queries. The choice between creating
compound indexes that support your queries or relying on index intersection depends on the specifics of your system.
See Index Intersection and Compound Indexes (page 496) for more details.
Use Indexes to Sort Query Results
In MongoDB, sort operations can obtain the sort order by retrieving documents based on the ordering in an index. If
the query planner cannot obtain the sort order from an index, it will sort the results in memory. Sort operations that
use an index often have better performance than those that do not use an index. In addition, sort operations that do not
use an index will abort when they use 32 megabytes of memory.
Sort with a Single Field Index
If an ascending or a descending index is on a single field, the sort operation on the field can be in either direction.
For example, create an ascending index on the field a for a collection records:
db.records.createIndex( { a: 1 } )
The index can also support the following descending sort on a by traversing the index in reverse order:
db.records.find().sort( { a: -1 } )
The following query and sort operations use the index prefixes to sort the results. These operations do not need to sort
the result set in memory.
Example                                                       Index Prefix
db.data.find().sort( { a: 1 } )                               { a: 1 }
db.data.find().sort( { a: -1 } )                              { a: 1 }
db.data.find().sort( { a: 1, b: 1 } )                         { a: 1, b: 1 }
db.data.find().sort( { a: -1, b: -1 } )                       { a: 1, b: 1 }
db.data.find().sort( { a: 1, b: 1, c: 1 } )                   { a: 1, b: 1, c: 1 }
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )      { a: 1, b: 1 }
Consider the following example in which the prefix keys of the index appear in both the query predicate and the sort:
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )
In such cases, MongoDB can use the index to retrieve the documents in the order specified by the sort. As the example
shows, the index prefix in the query predicate can be different from the prefix in the sort.
Sort and Non-prefix Subset of an Index An index can support sort operations on a non-prefix subset of the index
key pattern. To do so, the query must include equality conditions on all the prefix keys that precede the sort keys.
For example, the collection data has the following index:
{ a: 1, b: 1, c: 1, d: 1 }
The following operations can use the index to get the sort order:
Example                                                       Index Prefix
db.data.find( { a: 5 } ).sort( { b: 1, c: 1 } )               { a: 1, b: 1, c: 1 }
db.data.find( { b: 3, a: 4 } ).sort( { c: 1 } )               { a: 1, b: 1, c: 1 }
db.data.find( { a: 5, b: { $lt: 3} } ).sort( { b: 1 } )       { a: 1, b: 1 }
As the last operation shows, only the index fields preceding the sort subset must have the equality conditions in the
query document; the other index fields may specify other conditions.
If the query does not specify an equality condition on an index prefix that precedes or overlaps with the sort specification, the operation will not efficiently use the index. For example, the following operations specify a sort document
of { c: 1 }, but the query documents do not contain equality matches on the preceding index fields a and b:
db.data.find( { a: { $gt: 2 } } ).sort( { c: 1 } )
db.data.find( { c: 5 } ).sort( { c: 1 } )
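The rules in this section, for both prefix and non-prefix sorts, can be summarized in one check. The following JavaScript sketch uses a hypothetical indexCanProvideSort() helper: the sort keys must appear as a contiguous run of the index key pattern, every index key before that run needs an equality condition in the query, and the sort directions must either all match the index or all be reversed (a reverse traversal). It is a simplified model of the cases shown above, not MongoDB's query planner.

```javascript
// equalityFields lists the fields that have equality conditions in the query.
function indexCanProvideSort(indexPattern, sortSpec, equalityFields) {
  const indexKeys = Object.keys(indexPattern);
  const sortKeys = Object.keys(sortSpec);
  const start = indexKeys.indexOf(sortKeys[0]);
  if (start === -1) return false;

  // Every index key that precedes the sort keys needs an equality match.
  for (const key of indexKeys.slice(0, start)) {
    if (!equalityFields.includes(key)) return false;
  }

  // The sort keys must follow the index order, with directions that all
  // match the index (forward scan) or are all reversed (backward scan).
  let forward = true;
  let backward = true;
  for (let i = 0; i < sortKeys.length; i++) {
    const indexKey = indexKeys[start + i];
    if (indexKey !== sortKeys[i]) return false;
    if (sortSpec[indexKey] === indexPattern[indexKey]) backward = false;
    else forward = false;
  }
  return forward || backward;
}

const idx = { a: 1, b: 1, c: 1, d: 1 };
console.log(indexCanProvideSort(idx, { b: 1, c: 1 }, ["a"])); // true
console.log(indexCanProvideSort(idx, { c: 1 }, []));          // false
```

With the { a: 1, b: 1, c: 1, d: 1 } index, the three operations from the table above return true, while the two operations that sort on c without equality matches on both a and b return false.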
The above example shows an index size of almost 4.3 gigabytes. To ensure this index fits in RAM, you must have not
only more than that much RAM available but also RAM available for the rest of the working set. Also
remember:
If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and
the working set must be able to fit in memory at the same time.
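As a back-of-the-envelope aid, the following JavaScript sketch adds up the index sizes for every collection you use and checks them, together with the rest of the working set, against available RAM. In the mongo shell, the per-collection figures could come from db.collection.totalIndexSize(), which reports a size in bytes. The helper indexesFitInRam() is not part of MongoDB, and the sizes and RAM figures below are hypothetical.

```javascript
// Sum the index sizes (in bytes) of every collection in use and compare
// them, plus the rest of the working set, with the available RAM.
function indexesFitInRam(indexSizesBytes, otherWorkingSetBytes, ramBytes) {
  const totalIndexBytes = indexSizesBytes.reduce((sum, n) => sum + n, 0);
  return {
    totalIndexBytes,
    fits: totalIndexBytes + otherWorkingSetBytes <= ramBytes,
  };
}

const MB = 1000 * 1000;
// Two collections with 4300 MB and 1200 MB of indexes, 8000 MB of RAM.
console.log(indexesFitInRam([4300 * MB, 1200 * MB], 2000 * MB, 8000 * MB).fits); // true
console.log(indexesFitInRam([4300 * MB, 1200 * MB], 3000 * MB, 8000 * MB).fits); // false
```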
There are some limited cases where indexes do not need to fit in memory. See Indexes that Hold Only Recent Values
in RAM (page 535).
See also:
collStats and db.collection.stats()
Indexes that Hold Only Recent Values in RAM
Indexes do not have to fit entirely into RAM in all cases. If the value of the indexed field increments with every insert,
and most queries select recently added documents, then MongoDB only needs to keep the parts of the index that hold
the most recent or right-most values in RAM. This allows for efficient index use for read and write operations and
minimizes the amount of RAM required to support the index.
Create Queries that Ensure Selectivity
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow
MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
To ensure selectivity, write queries that limit the number of possible documents with the indexed field. Write queries
that are appropriately selective relative to your indexed data.
Example
Suppose you have a field called status where the possible values are new and processed. If you add an index
on status, you've created a low-selectivity index. The index will be of little help in locating records.
A better strategy, depending on your queries, would be to create a compound index (page 472) that includes the low-selectivity field and another field. For example, you could create a compound index on status and created_at.
Another option, again depending on your use case, might be to use separate collections, one for each status.
Example
Consider an index { a : 1 } (i.e. an index on the key a sorted in ascending order) on a collection where a has
three values evenly distributed across the collection:
{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 1, b: "cd" }
{ _id: ObjectId(), a: 1, b: "ef" }
{ _id: ObjectId(), a: 2, b: "jk" }
{ _id: ObjectId(), a: 2, b: "lm" }
{ _id: ObjectId(), a: 2, b: "no" }
{ _id: ObjectId(), a: 3, b: "pq" }
{ _id: ObjectId(), a: 3, b: "rs" }
{ _id: ObjectId(), a: 3, b: "tv" }
If you query for { a: 2, b: "no" }, MongoDB must scan 3 documents in the collection to return the one
matching result. Similarly, a query for { a: { $gt: 1}, b: "tv" } must scan 6 documents, also to
return one result.
Consider the same index on a collection where a has nine values evenly distributed across the collection:
{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 2, b: "cd" }
{ _id: ObjectId(), a: 3, b: "ef" }
{ _id: ObjectId(), a: 4, b: "jk" }
{ _id: ObjectId(), a: 5, b: "lm" }
{ _id: ObjectId(), a: 6, b: "no" }
{ _id: ObjectId(), a: 7, b: "pq" }
{ _id: ObjectId(), a: 8, b: "rs" }
{ _id: ObjectId(), a: 9, b: "tv" }
If you query for { a: 2, b: "cd" }, MongoDB must scan only one document to fulfill the query. The index
and query are more selective because the values of a are evenly distributed and the query can select a specific document
using the index.
However, although the index on a is more selective, a query such as { a: { $gt: 5 }, b: "tv" } would
still need to scan 4 documents.
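The scan counts in these examples follow from how the index is used: an index on a alone narrows the candidates to the documents that satisfy the condition on a, and each of those must then be checked for b. The following JavaScript sketch, with a hypothetical docsExamined() helper, rebuilds both example collections and counts those candidates. It models the arithmetic above rather than measuring MongoDB; explain() reports what a real query examines.

```javascript
// Count the documents examined when only an index on a is available:
// every document whose a value satisfies the condition on a.
function docsExamined(docs, matchesA) {
  return docs.filter((d) => matchesA(d.a)).length;
}

const bValues = ["ab", "cd", "ef", "jk", "lm", "no", "pq", "rs", "tv"];
// a takes three values: 1, 1, 1, 2, 2, 2, 3, 3, 3 ...
const threeValues = bValues.map((b, i) => ({ a: Math.floor(i / 3) + 1, b }));
// ... or nine values: 1 through 9.
const nineValues = bValues.map((b, i) => ({ a: i + 1, b }));

console.log(docsExamined(threeValues, (a) => a === 2)); // 3
console.log(docsExamined(threeValues, (a) => a > 1));   // 6
console.log(docsExamined(nineValues, (a) => a === 2));  // 1
console.log(docsExamined(nineValues, (a) => a > 5));    // 4
```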
If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries
may perform faster without indexes. To determine performance, see Measure Index Use (page 513).
For a conceptual introduction to indexes in MongoDB see Index Concepts (page 468).
Description
Builds one or more indexes for a collection.
Removes indexes from a collection.
Defragments a collection and rebuilds the indexes.
Rebuilds all indexes on a collection.
Internal command that scans for a collection's data and indexes for correctness.
Experimental command that collects and aggregates statistics on all indexes.
Performs a geospatial query that returns the documents closest to a given point.
Performs a geospatial query that uses MongoDB's haystack index functionality.
An internal command to support geospatial queries.
Internal command that validates index on shard key.
Description
Forces MongoDB to report on query execution plans. See explain().
Forces MongoDB to use a specific index. See hint().
Specifies an exclusive upper limit for the index to use in a query. See max().
Specifies an inclusive lower limit for the index to use in a query. See min().
Forces the cursor to only return fields included in the index.
Forces the query to use the index on the _id field. See snapshot().
es or spanish
sv or swedish
tr or turkish
Note: If you specify a language value of "none", then the text search uses simple tokenization with no list of stop
words and no stemming.
CHAPTER 9
Replication
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide
redundancy and high availability, and are the basis for all production deployments. This section introduces replication
in MongoDB as well as the components and architecture of replica sets. The section also provides tutorials for common
tasks related to replica sets.
Replication Introduction (page 541) An introduction to replica sets, their behavior, operation, and use.
Replication Concepts (page 545) The core documentation of replica set operations, configurations, architectures and
behaviors.
Replica Set Members (page 545) Introduces the components of replica sets.
Replica Set Deployment Architectures (page 553) Introduces architectural considerations related to replica
sets deployment planning.
Replica Set High Availability (page 560) Presents the details of the automatic failover and recovery process
with replica sets.
Replica Set Read and Write Semantics (page 565) Presents the semantics for targeting read and write operations to the replica set, with an awareness of location and set configuration.
Replica Set Tutorials (page 581) Tutorials for common tasks related to the use and maintenance of replica sets.
Replication Reference (page 631) Reference for functions and operations related to replica sets.
The secondaries replicate the primary's oplog and apply the operations to their data sets. The secondaries' data sets
reflect the primary's data set. If the primary is unavailable, the replica set will elect a secondary to be primary. By
default, clients read from the primary; however, clients can specify a read preference (page 568) to send read
operations to secondaries. Reads from secondaries may return data that does not reflect the state of the primary. See
secondaries (page 547) for more information.
You may add an extra mongod instance to a replica set as an arbiter. Arbiters do not maintain a data set. Arbiters
only exist to vote in elections. If your replica set has an even number of members, add an arbiter to obtain a majority
of votes in an election for primary. Arbiters do not require dedicated hardware. See arbiter (page 552) for more
information.
An arbiter will always be an arbiter. A primary may step down and become a secondary. A secondary may become
the primary during an election.
Chapter 9. Replication
See Replica Set Elections (page 561) and Rollbacks During Replica Set Failover (page 564) for more information.
Additional Features
Replica sets provide a number of options to support application needs. For example, you may deploy a replica set
with members in multiple data centers (page 559), or control the outcome of elections by adjusting the priority of
some members. Replica sets also support dedicated members for reporting, disaster recovery, or backup functions.
See Priority 0 Replica Set Members (page 548), Hidden Replica Set Members (page 550) and Delayed Replica Set
Members (page 551) for more information.
You can also maintain an arbiter (page 552) as part of a replica set. Arbiters do not keep a copy of the data. However,
arbiters play a role in the elections that select a primary if the current primary is unavailable.
The minimum requirements for a replica set are: a primary (page 546), a secondary (page 547), and an arbiter (page 552).
Most deployments, however, will keep three members that store data: a primary (page 546) and two secondary members
(page 547).
Changed in version 3.0.0: A replica set can have up to 50 members (page 769) but only 7 voting members. In
previous versions, replica sets could have up to 12 members.
1 While replica sets are the recommended solution for production, a replica set can support up to 50 members in total. If your deployment
requires more than 50 members, you'll need to use master-slave (page 575) replication. However, master-slave replication lacks automatic
failover capabilities.
All members of the replica set can accept read operations. However, by default, an application directs its read operations to the primary member. See Read Preference (page 568) for details on changing the default read behavior.
The replica set can have at most one primary. If the current primary becomes unavailable, an election determines the
new primary. See Replica Set Elections (page 561) for more details.
In the following 3-member replica set, the primary becomes unavailable. This triggers an election which selects one
of the remaining secondaries as the new primary.
In the following three-member replica set, the primary becomes unavailable. This triggers an election where one of
the remaining secondaries becomes the new primary.
See Replica Set Elections (page 561) for more details.
You can configure a secondary member for a specific purpose. You can configure a secondary to:
Prevent it from becoming a primary in an election, which allows it to reside in a secondary data center or to
serve as a cold standby. See Priority 0 Replica Set Members (page 548).
Prevent applications from reading from it, which allows it to run applications that require separation from normal
traffic. See Hidden Replica Set Members (page 550).
Keep a running historical snapshot for use in recovery from certain errors, such as unintentionally deleted
databases. See Delayed Replica Set Members (page 551).
Priority 0 Replica Set Members
A priority 0 member is a secondary that cannot become primary. Priority 0 members cannot trigger elections.
Otherwise these members function as normal secondaries. A priority 0 member maintains a copy of the data set,
accepts read operations, and votes in elections. Configure a priority 0 member to prevent secondaries from becoming
primary, which is particularly useful in multi-data center deployments.
For example, in a three-member replica set, one data center might host the primary and a secondary, while a second
data center hosts one priority 0 member that cannot become primary.
Priority 0 Members as Standbys A priority 0 member can function as a standby. In some replica sets, it might not
be possible to add a new member in a reasonable amount of time. A standby member keeps a current copy of the data
to be able to replace an unavailable member.
In many cases, you need not set standby to priority 0. However, in sets with varied hardware or geographic distribution
(page 559), a priority 0 standby ensures that only qualified members become primary.
A priority 0 standby may also be valuable for some members of a set with different hardware or workload profiles.
In these cases, deploy a member with priority 0 so it can't become primary. Also consider using a hidden member
(page 550) for this purpose.
If your set already has seven voting members, also configure the member as non-voting (page 564).
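The configuration change itself is small. The following JavaScript sketch writes it as a pure function over a replica set configuration document shaped like the one rs.conf() returns; in the mongo shell you would then pass the result to rs.reconfig(). The helper makePriorityZero(), the member position, and the host names are hypothetical, and Prevent Secondary from Becoming Primary (page 601) remains the supported procedure. The sketch also applies the seven-voting-member rule above by setting votes to 0, treating a missing votes field as the default of 1 vote.

```javascript
// Pure-function sketch. In the mongo shell, after defining the function:
//   rs.reconfig(makePriorityZero(rs.conf(), 2))
function makePriorityZero(config, memberIndex) {
  // Copy the members so the original configuration is left unchanged.
  const members = config.members.map((m) => ({ ...m }));
  const target = members[memberIndex];
  target.priority = 0;

  // Members without an explicit votes field have 1 vote by default.
  const votes = (m) => (m.votes === undefined ? 1 : m.votes);
  const otherVoters = members.filter(
    (m, i) => i !== memberIndex && votes(m) > 0
  ).length;

  // At most 7 members can vote; if 7 others already do, make this one non-voting.
  if (otherVoters >= 7) target.votes = 0;

  return { ...config, members };
}

// Hypothetical three-member configuration.
const cfg = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" },
  ],
};
console.log(makePriorityZero(cfg, 2).members[2]);
```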
Priority 0 Members and Failover When configuring a priority 0 member, consider potential failover patterns,
including all possible network partitions. Always ensure that your main data center contains both a quorum of voting
members and members that are eligible to be primary.
Configuration To configure a priority 0 member, see Prevent Secondary from Becoming Primary (page 601).
Hidden Replica Set Members
A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members
are good for workloads with different usage patterns from the other members in the replica set. Hidden members must
always be priority 0 members (page 548) and so cannot become primary. The db.isMaster() method does not
display hidden members. Hidden members, however, may vote in elections (page 561).
In the following five-member replica set, all four secondary members have copies of the primary's data set, but one of
the secondary members is hidden.
Behavior
Read Operations Clients will not distribute reads with the appropriate read preference (page 568) to hidden members. As a result, these members receive no traffic other than basic replication. Use hidden members for dedicated
tasks such as reporting and backups. Delayed members (page 551) should be hidden.
In a sharded cluster, mongos do not interact with hidden members.
Voting Hidden members do vote in replica set elections. If you stop a hidden member, ensure that the set has an
active majority or the primary will step down.
For the purposes of backups, you can avoid stopping a hidden member with the db.fsyncLock() and
db.fsyncUnlock() operations to flush all writes and lock the mongod instance for the duration of the backup
operation.
Further Reading For more information about backing up MongoDB databases, see MongoDB Backup Methods
(page 182). To configure a hidden member, see Configure a Hidden Replica Set Member (page 603).
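Configuring a hidden member follows the same pattern as other member settings. As a sketch only, the following JavaScript sets hidden to true and, because hidden members must be priority 0 members, also sets priority to 0 on one member of an rs.conf()-style document; in the mongo shell the result would be passed to rs.reconfig(). The helper makeHidden() and the sample hosts are hypothetical; the tutorial referenced above describes the supported procedure.

```javascript
// Mark one member of an rs.conf()-style document as hidden. Hidden members
// must be priority 0, so both fields are set together. The input is not modified.
function makeHidden(config, memberIndex) {
  const members = config.members.map((m, i) =>
    i === memberIndex ? { ...m, priority: 0, hidden: true } : { ...m }
  );
  return { ...config, members };
}

// Hypothetical five-member set; hide the member at position 4.
const cfg = {
  _id: "rs0",
  members: [0, 1, 2, 3, 4].map((i) => ({
    _id: i,
    host: `mongodb${i}.example.net:27017`,
  })),
};
const hidden = makeHidden(cfg, 4);
console.log(hidden.members[4]);
```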
Delayed members contain copies of a replica set's data set. However, a delayed member's data set reflects an earlier,
or delayed, state of the set. For example, if the current time is 09:52 and a member has a delay of an hour, the delayed
member has no operation more recent than 08:52.
Because delayed members are a rolling backup or a running historical snapshot of the data set, they may help
you recover from various kinds of human error. For example, a delayed member can make it possible to recover from
unsuccessful application upgrades and operator errors including dropped databases and collections.
Considerations
Requirements Delayed members:
Must be priority 0 (page 548) members. Set the priority to 0 to prevent a delayed member from becoming
primary.
Should be hidden (page 550) members. Always prevent applications from seeing and querying delayed members.
Do vote in elections for primary.
Behavior Delayed members apply operations from the oplog on a delay. When choosing the amount of delay,
consider that the amount of delay:
must be equal to or greater than your expected maintenance window durations.
must be smaller than the capacity of the oplog. For more information on oplog size, see Oplog Size (page 573).
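The two constraints on the delay can be checked with simple arithmetic. The following JavaScript sketch, with a hypothetical checkSlaveDelay() helper, compares a proposed delay against a maintenance window and against the time span the oplog covers, all in seconds. How you estimate the oplog window is an assumption left to you; in the mongo shell, rs.printReplicationInfo() reports the time span between the first and last oplog entries.

```javascript
// Returns a list of problems with a proposed delay; an empty list means
// both rules above are satisfied. All values are in seconds.
function checkSlaveDelay(delaySecs, maintenanceWindowSecs, oplogWindowSecs) {
  const problems = [];
  if (delaySecs < maintenanceWindowSecs) {
    problems.push("delay is shorter than the maintenance window");
  }
  if (delaySecs >= oplogWindowSecs) {
    problems.push("delay does not fit within the oplog window");
  }
  return problems;
}

// A one-hour delay, a 30-minute maintenance window, and a 24-hour oplog window.
console.log(checkSlaveDelay(3600, 1800, 86400)); // []
```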
Sharding In sharded clusters, delayed members have limited utility when the balancer is enabled. Because delayed
members replicate chunk migrations with a delay, the state of delayed members in a sharded cluster is not useful for
recovering to a previous state of the sharded cluster if any migrations occur during the delay window.
Example In the following 5-member replica set, the primary and all secondaries have copies of the data set. One
member applies operations with a delay of 3600 seconds, or an hour. This delayed member is also hidden and is a
priority 0 member.
Configuration A delayed member has its priority equal to 0, hidden equal to true, and its slaveDelay
equal to the number of seconds of delay:
{
"_id" : <num>,
"host" : <hostname:port>,
"priority" : 0,
"slaveDelay" : <seconds>,
"hidden" : true
}
To configure a delayed member, see Configure a Delayed Replica Set Member (page 604).
Replica Set Arbiter
An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a
vote in elections for primary (page 561). Arbiters always have exactly 1 election vote, and thus allow replica sets
to have an odd number of members without the overhead of a member that replicates data.
Important: Do not run an arbiter on systems that also host the primary or the secondary members of the replica set.
Only add an arbiter to sets with even numbers of members. If you add an arbiter to a set with an odd number of
members, the set may suffer from tied elections. To add an arbiter, see Add an Arbiter to Replica Set (page 593).
Example
For example, in the following replica set, an arbiter allows the set to have an odd number of votes for elections:
Security
Authentication When running with authorization, arbiters exchange credentials with other members of the
set to authenticate. MongoDB encrypts the authentication process. The MongoDB authentication exchange is cryptographically secure.
Arbiters use keyfiles to authenticate to the replica set.
Communication The only communication between arbiters and other set members consists of votes during elections,
heartbeats, and configuration data. These exchanges are not encrypted.
However, if your MongoDB deployment uses SSL, MongoDB will encrypt all communication between replica set
members. See Configure mongod and mongos for SSL (page 331) for more information.
As with all MongoDB components, run arbiters in trusted network environments.
Fault Tolerance
Number of Members    Majority Required to Elect a New Primary    Fault Tolerance
3                    2                                           1
4                    3                                           1
5                    3                                           2
6                    4                                           2
Adding a member to the replica set does not always increase the fault tolerance. However, in these cases, additional
members can provide support for dedicated functions, such as backups or reporting.
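The fault tolerance figures follow from the majority rule: electing a primary requires votes from a strict majority of the set. The following JavaScript sketch computes the majority and the resulting fault tolerance for a given number of members, assuming every member votes; faultTolerance() is a hypothetical helper, not a MongoDB function.

```javascript
// Majority needed to elect a primary and the number of members that can be
// unavailable while the rest still form that majority (all members vote).
function faultTolerance(members) {
  const majority = Math.floor(members / 2) + 1;
  return { members, majority, tolerance: members - majority };
}

for (const n of [3, 4, 5, 6]) {
  console.log(faultTolerance(n));
}
```

For 3, 4, 5, and 6 members it gives fault tolerances of 1, 1, 2, and 2, which is why adding a fourth member does not make a three-member set more tolerant of failures.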
Use Hidden and Delayed Members for Dedicated Functions Add hidden (page 550) or delayed (page 551) members to support dedicated functions, such as backup or reporting.
Load Balance on Read-Heavy Deployments In a deployment with very high read traffic, you can improve read
throughput by distributing reads to secondary members. As your deployment grows, add or move members to alternate
data centers to improve redundancy and availability.
Always ensure that the main facility is able to elect a primary.
Add Capacity Ahead of Demand The existing members of a replica set must have spare capacity to support adding
a new member. Always add new members before the current demand saturates the capacity of the set.
Determine the Distribution of Members
Distribute Members Geographically To protect your data if your main data center fails, keep at least one member
in an alternate data center. Set these members' priority to 0 to prevent them from becoming primary.
Keep a Majority of Members in One Location When a replica set has members in multiple data centers, network
partitions can prevent communication between data centers. To replicate data, members must be able to communicate
with other members.
In an election, members must see each other to create a majority. To ensure that the replica set members can confirm
a majority and elect a primary, keep a majority of the sets members in one location.
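A quick way to test a proposed layout is to count votes per location. The following JavaScript sketch, with a hypothetical canElectPrimaryIn() helper and hypothetical members, checks whether the members in one data center hold a strict majority of the set's votes and include at least one member that can become primary. It assumes each member carries a tags.dc value (a tag name chosen for this example), treats missing votes and priority fields as the defaults of 1, and treats arbiters as voting but never electable.

```javascript
// Can the members in one data center elect a primary by themselves, for
// example during a network partition between data centers?
function canElectPrimaryIn(members, dataCenter) {
  const votes = (m) => (m.votes === undefined ? 1 : m.votes);
  const priority = (m) => (m.priority === undefined ? 1 : m.priority);

  const totalVotes = members.reduce((sum, m) => sum + votes(m), 0);
  const local = members.filter((m) => m.tags && m.tags.dc === dataCenter);
  const localVotes = local.reduce((sum, m) => sum + votes(m), 0);

  // At least one local member must be able to become primary.
  const electable = local.some((m) => !m.arbiterOnly && priority(m) > 0);
  return localVotes > totalVotes / 2 && electable;
}

// Hypothetical layout: two members in the main data center and one
// priority 0 member in a second data center.
const members = [
  { host: "dc1-a.example.net:27017", tags: { dc: "dc1" } },
  { host: "dc1-b.example.net:27017", tags: { dc: "dc1" } },
  { host: "dc2-a.example.net:27017", tags: { dc: "dc2" }, priority: 0 },
];
console.log(canElectPrimaryIn(members, "dc1")); // true
console.log(canElectPrimaryIn(members, "dc2")); // false
```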
Target Operations with Tags
Use replica set tags (page 614) to ensure that operations replicate to specific data centers. Tags also support targeting
read operations to specific machines.
See also:
Data Center Awareness (page 207) and Operational Segregation in MongoDB Deployments (page 207).
Use Journaling to Protect Against Power Failures
Enable journaling to protect data against service interruptions. Without journaling, MongoDB cannot recover data after
unexpected shutdowns, including power failures and unexpected reboots.
All 64-bit versions of MongoDB after version 2.0 have journaling enabled by default.
Replica Set Naming
If your application connects to more than one replica set, each set should have a distinct name. Some drivers group
replica set connections by replica set name.
Deployment Patterns
The following documents describe common replica set deployment patterns. Other patterns are possible and effective
depending on the application's requirements. If needed, combine features of each architecture in your own deployment:
Three Member Replica Sets (page 555) Three-member replica sets provide the minimum recommended architecture
for a replica set.
Replica Sets with Four or More Members (page 555) Four or more member replica sets provide greater redundancy
and can support greater distribution of read operations and dedicated functionality.
Geographically Distributed Replica Sets (page 559) Geographically distributed sets include members in multiple locations to protect against facility-specific failures, such as power outages.
The minimum architecture of a replica set has three members. A three member replica set can have either three
members that hold data, or two members that hold data and an arbiter.
Primary with Two Secondary Members A replica set with three members that store data has:
One primary (page 546).
Two secondary (page 547) members. Both secondaries can become the primary in an election (page 561).
These deployments provide two complete copies of the data set at all times in addition to the primary. These replica
sets provide additional fault tolerance and high availability (page 560). If the primary is unavailable, the replica set
elects a secondary to be primary and continues normal operation. The old primary rejoins the set when available.
Primary with a Secondary and an Arbiter A three-member replica set with two members that store data has:
One primary (page 546).
One secondary (page 547) member. The secondary can become primary in an election (page 561).
One arbiter (page 552). The arbiter only votes in elections.
Since the arbiter does not hold a copy of the data, these deployments provide only one complete copy of the data in addition to the primary.
Arbiters require fewer resources, at the expense of more limited redundancy and fault tolerance.
However, a deployment with a primary, secondary, and an arbiter ensures that a replica set remains available if the
primary or the secondary is unavailable. If the primary is unavailable, the replica set will elect the secondary to be
primary.
See also:
Deploy a Replica Set (page 583).
Replica Sets with Four or More Members
Overview Although the standard replica set configuration has three members, you can deploy larger sets. Add
additional members to a set to increase redundancy or to add capacity for distributing secondary read operations.
Considerations As you add new members to a replica set, consider the following:
9.2. Replication Concepts
Odd Number of Voting Members Ensure that the replica set has an odd number of voting members. If you have
an even number of voting members, deploy an arbiter (page 552) so that the set has an odd number.
For example, the following replica set includes an arbiter to ensure an odd number of voting members.
Maximum Number of Voting Members A replica set can have up to 50 members, but only 7 voting
members. 2 If the replica set already has 7 voting members, additional members must be non-voting members
(page 564).
For example, the following 9 member replica set has 7 voting members and 2 non-voting members.
Electability of Members Some members of the replica set, such as members that have network constraints or
limited resources, should not be able to become primary in a failover. Configure members that should not become
primary to have priority 0 (page 548).
For example, the secondary member in the third data center with a priority of 0 cannot become primary:
See also:
Deploy a Replica Set (page 583), Add an Arbiter to Replica Set (page 593), and Add Members to a Replica Set
(page 595).
Geographically Distributed Replica Sets
Adding members to a replica set in multiple data centers adds redundancy and provides fault tolerance if one data
center is unavailable. Members in additional data centers should have a priority of 0 (page 548) to prevent them from
becoming primary.
For example, the architecture of a geographically distributed replica set may be:
One primary in the main data center.
One secondary member in the main data center. This member can become primary at any time.
One priority 0 (page 548) member in a second data center. This member cannot become primary.
In the following replica set, the primary and one secondary are in Data Center 1, while Data Center 2 has a priority 0
(page 548) secondary that cannot become a primary.
If the primary is unavailable, the replica set will elect a new primary from Data Center 1. If the data centers cannot
connect to each other, the member in Data Center 2 will not become the primary.
If Data Center 1 becomes unavailable, you can manually recover the data set from Data Center 2 with minimal
downtime. With sufficient write concern (page 76), there will be no data loss.
To facilitate elections, the main data center should hold a majority of members. Also ensure that the set has an odd
number of members. If adding a member in another data center results in a set with an even number of members,
deploy an arbiter (page 552). For more information on elections, see Replica Set Elections (page 561).
See also:
Deploy a Geographically Redundant Replica Set (page 588).
Additional Resource
Whitepaper: MongoDB Multi-Data Center Deployments3
Webinar: Multi-Data Center Deployment4
Replica sets remove rollback data from the data set when needed, without intervention; however, administrators must apply or discard the saved rollback data manually.
Failover Processes
The replica set recovers from the loss of a primary by holding an election. Consider the following:
Replica Set Elections (page 561) Elections occur when the primary becomes unavailable and the replica set members
autonomously select a new primary.
Rollbacks During Replica Set Failover (page 564) A rollback reverts write operations on a former primary when the
member rejoins the replica set after a failover.
Replica Set Elections
Replica sets use elections to determine which set member will become primary. Elections occur after initiating a
replica set, and also any time the primary becomes unavailable. The primary is the only member in the set that can
accept write operations. If a primary becomes unavailable, elections allow the set to recover normal operations without
manual intervention. Elections are part of the failover process (page 560).
In the following three-member replica set, the primary is unavailable. The remaining secondaries hold an election to
choose a new primary.
Behavior Elections are essential for independent operation of a replica set; however, elections take time to complete.
While an election is in progress, the replica set has no primary and cannot accept writes, and all remaining members
become read-only. MongoDB avoids elections unless necessary.
If a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining
members become read-only.
Factors and Conditions that Affect Elections
Heartbeats Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not
return within 10 seconds, the other members mark the delinquent member as inaccessible.
Priority Comparisons The priority setting affects elections. Members will prefer to vote for members with the
highest priority value.
Members with a priority value of 0 cannot become primary and do not seek election. For details, see Priority 0 Replica
Set Members (page 548).
A replica set does not hold an election as long as the current primary has the highest priority value or no secondary
with higher priority is within 10 seconds of the latest oplog entry in the set.
If a higher-priority member catches up to within 10 seconds of the latest oplog entry of the current primary, the set
holds an election in order to provide the higher-priority node a chance to become primary.
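The priority rule above can be expressed as a small predicate. This is a simplified sketch, not MongoDB's implementation: shouldCallPriorityElection is a hypothetical name, optimes are modeled as plain seconds, and "within 10 seconds" is assumed to be inclusive.

```javascript
// Simplified model of the priority takeover rule; optimes are in seconds.
const CATCH_UP_WINDOW_SECONDS = 10;

// Hypothetical helper: true if some secondary has a higher priority than the
// primary and is within the catch-up window of the primary's latest oplog entry.
function shouldCallPriorityElection(primary, secondaries) {
  return secondaries.some(
    (s) =>
      s.priority > primary.priority &&
      primary.optime - s.optime <= CATCH_UP_WINDOW_SECONDS
  );
}

const primary = { host: "a", priority: 1, optime: 1000 };
console.log(shouldCallPriorityElection(primary, [{ host: "b", priority: 2, optime: 985 }])); // false: 15 s behind
console.log(shouldCallPriorityElection(primary, [{ host: "b", priority: 2, optime: 995 }])); // true: 5 s behind
```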
Optime The optime is the timestamp of the last operation that a member applied from the oplog. A replica set
member cannot become primary unless it has the highest (i.e. most recent) optime of any visible member in the set.
Connections A replica set member cannot become primary unless it can connect to a majority of the members in the
replica set. For the purposes of elections, a majority refers to the total number of votes, rather than the total number of
members.
If you have a three-member replica set, where every member has one vote, the set can elect a primary as long as two
members can connect to each other. If two members are unavailable, the remaining member remains a secondary
because it cannot connect to a majority of the set's members. If the remaining member is a primary and two members
become unavailable, the primary steps down and becomes a secondary.
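The majority test described above counts votes, not members. A minimal sketch, assuming members are plain objects with a host and an optional votes field (default 1); hasMajority is a hypothetical helper, not part of MongoDB:

```javascript
// Votes a member carries; an absent votes field counts as 1.
function votesOf(member) {
  return member.votes === undefined ? 1 : member.votes;
}

// Hypothetical helper: can the members in reachableHosts form a strict majority
// of all votes in the set (and so elect or keep a primary)?
function hasMajority(allMembers, reachableHosts) {
  const totalVotes = allMembers.reduce((sum, m) => sum + votesOf(m), 0);
  const reachableVotes = allMembers
    .filter((m) => reachableHosts.includes(m.host))
    .reduce((sum, m) => sum + votesOf(m), 0);
  return reachableVotes > totalVotes / 2;
}

const set = [{ host: "a" }, { host: "b" }, { host: "c" }];
console.log(hasMajority(set, ["a", "b"])); // true: 2 of 3 votes
console.log(hasMajority(set, ["a"]));      // false: 1 of 3 votes, so it stays secondary
```

The same check shows why an even split is dangerous: in a four-member set divided two and two, neither side has more than half of the votes.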
Network Partitions Network partitions affect the formation of a majority for an election. If a primary steps down
and neither portion of the replica set has a majority, the set will not elect a new primary, and the replica set becomes
read-only.
To avoid this situation, place a majority of instances in one data center and a minority of instances in any other data
centers combined.
Election Mechanics
Election Triggering Events Replica sets hold an election any time there is no primary. Specifically, the following:
the initiation of a new replica set.
a secondary loses contact with a primary. Secondaries call for elections when they cannot see a primary.
a primary steps down.
Note: Priority 0 members (page 548) do not trigger elections, even when they cannot connect to the primary.
A primary will step down:
Participation in Elections Every replica set member has a priority that helps determine its eligibility to become a
primary. In an election, the replica set elects an eligible member with the highest priority value as primary. By
default, all members have a priority of 1 and have an equal chance of becoming primary. In the default configuration,
all members can also trigger an election.
You can set the priority value to weight the election in favor of a particular member or group of members. For
example, if you have a geographically distributed replica set (page 559), you can adjust priorities so that only members
in a specific data center can become primary.
The first member to receive the majority of votes becomes primary. By default, each member has a single vote unless
you modify its votes setting. Non-voting members (page 605) have a votes value of 0. All other members have 1
vote.
Changed in version 3.0.0: Members cannot have votes greater than 1. For details, see Replica Set Configuration
Validation (page 774).
The state of a member also affects its eligibility to vote. Only members in the following states can vote: PRIMARY,
SECONDARY, RECOVERING, ARBITER, and ROLLBACK.
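Combining the votes setting with the member-state rule gives the number of votes a member can actually cast. A minimal sketch: effectiveVotes is a hypothetical helper, and it assumes member objects carry a state string using the state names listed above (STARTUP2 is used as an example of a non-voting state).

```javascript
// States in which a member's vote counts, as listed above.
const VOTING_STATES = new Set(["PRIMARY", "SECONDARY", "RECOVERING", "ARBITER", "ROLLBACK"]);

// Hypothetical helper: the number of votes a member contributes right now.
function effectiveVotes(member) {
  const votes = member.votes === undefined ? 1 : member.votes;
  // Since 3.0.0 votes may not exceed 1; this sketch rejects anything but 0 or 1.
  if (votes !== 0 && votes !== 1) {
    throw new RangeError(`votes for ${member.host} must be 0 or 1, got ${votes}`);
  }
  return VOTING_STATES.has(member.state) ? votes : 0;
}

const members = [
  { host: "a", state: "PRIMARY" },
  { host: "b", state: "SECONDARY" },
  { host: "c", state: "STARTUP2" },            // not in a voting state
  { host: "d", state: "SECONDARY", votes: 0 }, // non-voting member
];
console.log(members.reduce((sum, m) => sum + effectiveVotes(m), 0)); // 2
```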
Important: Do not alter the number of votes in a replica set to control the outcome of an election. Instead, modify
the priority value.
Vetoes in Elections All members of a replica set can veto an election, including non-voting members (page 564). A
member will veto an election:
If the member seeking an election is not a member of the voters set.
If the member seeking an election is not up-to-date with the most recent operation accessible in the replica set.
If the member seeking an election has a lower priority than another member in the set that is also eligible for
election.
If a priority 0 member (page 548) 6 is the most current member at the time of the election. In this case, another
eligible member of the set will catch up to the state of this secondary member and then attempt to become
primary.
If the current primary has more recent operations (i.e. a higher optime) than the member seeking election,
from the perspective of the voting member.
If the current primary has the same or more recent operations (i.e. a higher or equal optime) than the member
seeking election.
6
Remember that hidden (page 550) and delayed (page 551) imply priority 0 (page 548) configuration.
Non-Voting Members Non-voting members hold copies of the replica set's data and can accept read operations from
client applications. Non-voting members do not vote in elections, but can veto (page 563) an election and become
primary.
Because a replica set can have up to 50 members, but only 7 voting members, non-voting members allow a
replica set to have more than seven members.
For instance, the following nine-member replica set has seven voting members and two non-voting members.
Important: Do not alter the number of votes to control which members will become primary. Instead, modify the
priority option. Only alter the number of votes in exceptional cases, for example, to permit more than seven
members.
When possible, all members should have one vote. Changing the number of votes can cause the wrong members to
become primary.
To configure a non-voting member, see Configure Non-Voting Replica Set Member (page 605).
Rollbacks During Replica Set Failover
A rollback reverts write operations on a former primary when the member rejoins its replica set after a failover.
A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully
replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or rolls back,
its write operations to maintain database consistency with the other members.
MongoDB attempts to avoid rollbacks, which should be rare. When a rollback does occur, it is often the result of a
network partition. Secondaries that cannot keep up with the throughput of operations on the former primary increase
the size and impact of the rollback.
A rollback does not occur if the write operations replicate to another member of the replica set before the primary
steps down and if that member remains available and accessible to a majority of the replica set.
Collect Rollback Data When a rollback does occur, administrators must decide whether to apply or ignore the
rollback data. MongoDB writes the rollback data to BSON files in the rollback/ folder under the database's
dbPath directory. The names of rollback files have the following form:
<database>.<collection>.<timestamp>.bson
For example:
records.accounts.2011-05-09T18-10-04.0.bson
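When sorting through many rollback files, it helps to split each name into its parts. A minimal sketch: parseRollbackFile is a hypothetical helper, and the timestamp pattern (YYYY-MM-DDTHH-MM-SS.N) is an assumption inferred from the single example above. Database names cannot contain dots, but collection names can, so the collection is taken as everything between the database name and the timestamp.

```javascript
// Assumed filename pattern: <database>.<collection>.<timestamp>.bson
const ROLLBACK_FILE =
  /^([^.]+)\.(.+)\.(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}\.\d+)\.bson$/;

// Hypothetical helper: returns { database, collection, timestamp } or null.
function parseRollbackFile(name) {
  const m = ROLLBACK_FILE.exec(name);
  if (m === null) return null;
  return { database: m[1], collection: m[2], timestamp: m[3] };
}

console.log(parseRollbackFile("records.accounts.2011-05-09T18-10-04.0.bson"));
```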
Administrators must apply rollback data manually after the member completes the rollback and returns to secondary
status. Use bsondump to read the contents of the rollback files. Then use mongorestore to apply the changes to
the new primary.
Avoid Replica Set Rollbacks To prevent rollbacks, use replica acknowledged write concern (page 79) to guarantee
that the write operations propagate to the members of a replica set.
Rollback Limitations A mongod instance will not roll back more than 300 megabytes of data. If your system must
roll back more than 300 megabytes, you must manually intervene to recover the data. If this is the case, the following
line will appear in your mongod log:
[replica set sync] replSet syncThread: 13410 replSet too much data to roll back
In this situation, save the data directly or force the member to perform an initial sync. To force initial sync, sync from
a current member of the set by deleting the content of the dbPath directory for the member that requires a larger
rollback.
See also:
Replica Set High Availability (page 560) and Replica Set Elections (page 561).
Write Concern for Replica Sets (page 566) Write concern is the guarantee an application requires from MongoDB
to consider a write operation successful.
Read Preference (page 568) Applications specify read preference to control how drivers direct read operations to
members of the replica set.
Read Preference Processes (page 570) With replica sets, read operations may have additional semantics and behavior.
Write Concern for Replica Sets
From the perspective of a client application, whether a MongoDB instance is running as a single server (i.e. standalone) or a replica set is transparent. However, replica sets offer some configuration options for write operations. 7
Verify Write Operations to Replica Sets
For a replica set, the default write concern (page 76) confirms write operations only on the primary. You can, however, override this default write concern, for example, to confirm write operations on a specified number of the replica set
members.
To override the default write concern, specify a write concern with each write operation. For example, the following
method includes a write concern that specifies that the method return only after the write propagates to the primary
and at least one secondary or the method times out after 5 seconds.
db.products.insert(
{ item: "envelopes", qty : 100, type: "Clasp" },
{ writeConcern: { w: 2, wtimeout: 5000 } }
)
You can include a timeout threshold for a write concern. This prevents write operations from blocking indefinitely
if the write concern is unachievable. For example, if the write concern requires acknowledgement from 4 members
of the replica set and the replica set has only 3 available members, then without a timeout the operation blocks until
those members become available. See wtimeout (page 129).
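The interaction between w and wtimeout can be summarized as a small decision function. This is a hypothetical model, not driver or server code: requiredAcks and writeConcernOutcome are illustrative names, "majority" is taken as a majority of the voting members, availableMembers stands for the data-bearing members able to acknowledge, and a wtimeout of 0 or absent means no time limit.

```javascript
// Hypothetical model: resolve w to the number of acknowledgements required.
function requiredAcks(w, votingMembers) {
  return w === "majority" ? Math.floor(votingMembers / 2) + 1 : w;
}

// Hypothetical model of the outcome of a write with the given write concern.
function writeConcernOutcome({ w, wtimeout = 0 }, votingMembers, availableMembers) {
  if (availableMembers >= requiredAcks(w, votingMembers)) return "acknowledged";
  // No time limit: the write waits for enough members to become available.
  return wtimeout > 0 ? `timed out after ${wtimeout} ms` : "blocks until members return";
}

console.log(writeConcernOutcome({ w: 4 }, 4, 3));                 // blocks until members return
console.log(writeConcernOutcome({ w: 4, wtimeout: 5000 }, 4, 3)); // timed out after 5000 ms
console.log(writeConcernOutcome({ w: 2, wtimeout: 5000 }, 3, 3)); // acknowledged
```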
See also:
Write Method Acknowledgements (page 821)
Modify Default Write Concern
You can modify the default write concern for a replica set by setting the getLastErrorDefaults setting in the
replica set configuration (page 632). The following sequence of commands creates a configuration that waits for the
write operation to complete on a majority of the voting members before returning:
cfg = rs.conf()
cfg.settings = cfg.settings || {}   // keep any existing settings instead of discarding them
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }
rs.reconfig(cfg)
If you issue a write operation with a specific write concern, the write operation uses its own write concern instead of
the default.
Note: Use of insufficient write concern can lead to rollbacks (page 564) in the case of replica set failover (page 560).
Always ensure that your operations have specified the required write concern for your application.
7
Sharded clusters where the shards are also replica sets provide the same configuration options with regard to write and read operations.
See also:
Write Concern (page 76) and connections-write-concern
Custom Write Concerns
You can tag (page 614) the members of replica sets and use the tags to create custom write concerns. See Configure
Replica Set Tag Sets (page 614) for information on configuring custom write concerns using tag sets.
Read Preference
Read preference describes how MongoDB clients route read operations to members of a replica set.
By default, an application directs its read operations to the primary member in a replica set. Reading from the primary
guarantees that read operations reflect the latest version of a document. However, by distributing some or all reads to
secondary members of the replica set, you can improve read throughput or reduce latency for an application that does
not require fully up-to-date data.
Important: You must exercise care when specifying read preferences: modes other than primary (page 637) can
and will return stale data because the secondary queries will not include the most recent write operations to the replica
set's primary.
Use Cases
Indications The following are common use cases for using non-primary (page 637) read preference modes:
Read Preference Mode            Description
primary (page 637)              Default mode. All operations read from the current replica set primary.
primaryPreferred (page 637)     In most situations, operations read from the primary but if it is unavailable, operations
                                read from secondary members.
secondary (page 637)            All operations read from the secondary members of the replica set.
secondaryPreferred (page 637)   In most situations, operations read from secondary members but if no secondary
                                members are available, operations read from the primary.
nearest (page 638)              Operations read from the member of the replica set with the least network latency,
                                irrespective of the member's type.
The syntax for specifying the read preference mode is specific to the driver and to the idioms of the host language8 .
Read preference modes are also available to clients connecting to a sharded cluster through a mongos. The mongos
instance obeys specified read preferences when connecting to the replica set that provides each shard in the cluster.
In the mongo shell, the readPref() cursor method provides access to read preferences.
For more information, see read preference background (page 568) and read preference behavior (page 570). See also
the documentation for your driver9 .
Tag Sets
Tag sets allow you to target read operations to specific members of a replica set.
Custom read preferences and write concerns evaluate tag sets in different ways. Read preferences consider the value
of a tag when selecting a member to read from. Write concerns ignore the value of a tag when selecting a member,
except to consider whether or not the value is unique.
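The difference between the two uses of tags can be seen side by side. A minimal sketch with hypothetical helpers: distinctTagValues counts unique values of one tag among acknowledging members (roughly how a custom write concern mode such as { dc: 2 } is satisfied), while matchesTagSet compares tag values exactly, as read preferences do. The dc tag and host names are illustrative.

```javascript
// Write-concern side: number of distinct values of tagName among the members
// that acknowledged the write. Only uniqueness matters, not the values themselves.
function distinctTagValues(ackedMembers, tagName) {
  const values = new Set();
  for (const m of ackedMembers) {
    if (m.tags && tagName in m.tags) values.add(m.tags[tagName]);
  }
  return values.size;
}

// Read-preference side: a member matches only if every tag value is equal.
function matchesTagSet(member, tagSet) {
  return Object.entries(tagSet).every(([key, value]) => (member.tags || {})[key] === value);
}

const a = { host: "a", tags: { dc: "east" } };
const b = { host: "b", tags: { dc: "east" } };
const c = { host: "c", tags: { dc: "west" } };
console.log(distinctTagValues([a, b], "dc")); // 1: both acknowledgements come from one data center
console.log(distinctTagValues([a, c], "dc")); // 2
console.log(matchesTagSet(c, { dc: "west" })); // true
```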
You can specify tag sets with the following read preference modes:
primaryPreferred (page 637)
secondary (page 637)
secondaryPreferred (page 637)
nearest (page 638)
Tags are not compatible with mode primary (page 637) and, in general, only apply when selecting (page 571) a
secondary member of a set for a read operation. However, the nearest (page 638) read mode, when combined with
a tag set, selects the matching member with the lowest network latency. This member may be a primary or secondary.
All interfaces use the same member selection logic (page 571) to choose the member to which to direct read operations,
basing the choice on read preference mode and tag sets.
For information on configuring tag sets, see the Configure Replica Set Tag Sets (page 614) tutorial.
For more information on how read preference modes (page 637) interact with tag sets, see the documentation for each
read preference mode (page 636).
Read Preference Processes
Changed in version 2.2.
8 http://api.mongodb.org/
9 http://api.mongodb.org/
MongoDB drivers use the following procedures to direct operations to replica sets and sharded clusters. To determine
how to route their operations, applications periodically update their view of the replica set's state, identifying which
members are up or down, which member is primary, and verifying the latency to each mongod instance.
Member Selection
Clients, by way of their drivers, and mongos instances for sharded clusters, periodically update their view of the
replica set's state.
When you select non-primary (page 637) read preference, the driver will determine which member to target using
the following process:
1. Assembles a list of suitable members, taking into account member type (i.e. secondary, primary, or all members).
2. Excludes members not matching the tag sets, if specified.
3. Determines which suitable member is the closest to the client in absolute terms.
4. Builds a list of members that are within a defined ping distance (in milliseconds) of the absolute nearest
member.
Applications can configure the threshold used in this stage. The default acceptable latency is 15 milliseconds,
which you can override in the drivers with their own secondaryAcceptableLatencyMS option. For
mongos you can use the --localThreshold or localPingThresholdMs runtime options to set this
value.
5. Selects a member from these hosts at random. The member receives the read operation.
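The five selection steps above can be sketched as one function. This is a simplified, hypothetical model rather than any driver's code: only the secondary and nearest modes are modeled, a single tag set is supported, member objects are assumed to look like { host, type, tags, pingMs }, and random is injectable so the choice can be made deterministic in tests.

```javascript
// Hypothetical model of member selection for a non-primary read preference.
function selectMember(members, { mode, tagSet, acceptableLatencyMs = 15, random = Math.random }) {
  // 1. Suitable member types: secondaries only, or all members for "nearest".
  let suitable = members.filter((m) => mode === "nearest" || m.type === "secondary");
  // 2. Exclude members that do not match every tag in the tag set.
  if (tagSet) {
    suitable = suitable.filter((m) =>
      Object.entries(tagSet).every(([key, value]) => (m.tags || {})[key] === value)
    );
  }
  if (suitable.length === 0) return null;
  // 3. Round-trip time of the absolute nearest suitable member.
  const nearestMs = Math.min(...suitable.map((m) => m.pingMs));
  // 4. Keep members within the acceptable latency window of that member.
  const inWindow = suitable.filter((m) => m.pingMs - nearestMs <= acceptableLatencyMs);
  // 5. Choose one of them at random.
  return inWindow[Math.floor(random() * inWindow.length)];
}

const members = [
  { host: "p", type: "primary", pingMs: 2 },
  { host: "s1", type: "secondary", pingMs: 5, tags: { dc: "east" } },
  { host: "s2", type: "secondary", pingMs: 18, tags: { dc: "east" } },
  { host: "s3", type: "secondary", pingMs: 40, tags: { dc: "west" } },
];
console.log(selectMember(members, { mode: "secondary" }).host); // s1 or s2 (s3 is 35 ms slower than s1)
```

With the default 15 millisecond window, s2 (13 ms slower than s1) stays eligible while s3 does not.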
Drivers can then associate the thread or connection with the selected member. This request association (page 571) is
configurable by the application. See your driver documentation about request association configuration and default
behavior.
Request Association
Important: Request association is configurable by the application. See your driver documentation about request
association configuration and default behavior.
Because secondary members of a replica set may lag behind the current primary by different amounts, reads for
secondary members may reflect data at different points in time. To prevent sequential reads from jumping around in
time, the driver can associate application threads to a specific member of the set after the first read, thereby preventing
reads from other members. The thread will continue to read from the same member until:
The application performs a read with a different read preference,
The thread terminates, or
The client receives a socket exception, as is the case when there is a network error or when the mongod closes
connections during a failover. This triggers a retry (page 571), which may be transparent to the application.
When using request association, if the client detects that the set has elected a new primary, the driver will discard all
associations between threads and members.
Auto-Retry
Connections between MongoDB drivers and mongod instances in a replica set must balance two concerns:
1. The client should attempt to prefer current results, and any connection should read from the same member of
the replica set as much as possible. Requests should prefer request association (page 571) (e.g. pinning).
2. The client should minimize the amount of time that the database is inaccessible as the result of a connection
issue, networking problem, or failover in a replica set.
As a result, MongoDB drivers:
Reuse a connection to a specific mongod for as long as possible after establishing a connection to that instance.
This connection is pinned to this mongod.
Attempt to reconnect to a new member, obeying existing read preference modes (page 637), if the connection to
mongod is lost.
Reconnections are transparent to the application itself. If the connection permits reads from secondary members, after reconnecting, the application can receive two sequential reads returning from different secondaries.
Depending on the state of the individual secondary members' replication, the documents can reflect the state of
your database at different moments.
Return an error only after attempting to connect to three members of the set that match the read preference mode
(page 637) and tag set (page 570). If there are fewer than three members of the set, the client will error after
connecting to all existing members of the set.
After this error, the driver selects a new member using the specified read preference mode. In the absence of a
specified read preference, the driver uses primary (page 637).
After detecting a failover situation,10 the driver attempts to refresh the state of the replica set as quickly as
possible.
Changed in version 3.0.0: mongos instances take a slightly different approach. mongos instances return connections
to secondaries to the connection pool after every request. As a result, the mongos reevaluates read preference for
every operation.
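The driver's rule of returning an error only after trying up to three matching members can be sketched as follows. connectWithRetry and tryConnect are hypothetical names; matchingMembers is assumed to be already filtered by read preference mode and tag set, and tryConnect is assumed to return true on success.

```javascript
// Hypothetical sketch: try at most three matching members (or all of them,
// if there are fewer than three) before surfacing an error.
function connectWithRetry(matchingMembers, tryConnect) {
  const maxAttempts = Math.min(3, matchingMembers.length);
  for (let i = 0; i < maxAttempts; i++) {
    if (tryConnect(matchingMembers[i])) return matchingMembers[i];
  }
  throw new Error(`no connection after ${maxAttempts} attempt(s)`);
}

const hosts = ["a", "b", "c", "d"];
console.log(connectWithRetry(hosts, (h) => h === "b")); // b
```

If only the fourth host were reachable, this sketch would throw after three attempts, matching the three-member limit described above.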
Read Preference in Sharded Clusters
Changed in version 2.2: Before version 2.2, mongos did not support the read preference mode semantics (page 637).
In most sharded clusters, each shard consists of a replica set. As such, read preferences are also applicable. With
regard to read preference, read operations in a sharded cluster are identical to unsharded replica sets.
Unlike simple replica sets, in sharded clusters, all interactions with the shards pass from the clients to the mongos
instances that are actually connected to the set members. mongos is then responsible for the application of read
preferences, which is transparent to applications.
There are no configuration changes required for full support of read preference modes in sharded environments, as long
as the mongos is at least version 2.2. All mongos maintain their own connection pool to the replica set members.
As a result:
A request without a specified preference has primary (page 637), the default, unless the mongos reuses an
existing connection that has a different mode set.
To prevent confusion, always explicitly set your read preference mode.
All nearest (page 638) and latency calculations reflect the connection between the mongos and the mongod
instances, not the client and the mongod instances.
This produces the desired result, because all results must pass through the mongos before returning to the
client.
10
When a failover occurs, all members of the set close all client connections that produce a socket error in the driver. This behavior prevents or
minimizes rollback.
When you start a replica set member for the first time, MongoDB creates an oplog of a default size. The size depends
on the architectural details of your operating system.
In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24
hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming
too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can
hold much higher numbers of operations.
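The 24-hour reasoning above is simple division: the oplog window is the oplog size divided by the rate at which operations fill it. A minimal sketch assuming a steady write rate; the function names and the figures (a 24000 MB oplog filled at 1000 MB per hour) are hypothetical.

```javascript
// Worked arithmetic, simplified: assumes a steady rate of oplog growth.
function oplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  return oplogSizeMB / oplogMBPerHour;
}

// A secondary can resume replicating only if its outage fits inside the window.
function canResumeAfterOutage(outageHours, oplogSizeMB, oplogMBPerHour) {
  return outageHours <= oplogWindowHours(oplogSizeMB, oplogMBPerHour);
}

console.log(oplogWindowHours(24000, 1000));         // 24
console.log(canResumeAfterOutage(20, 24000, 1000)); // true
console.log(canResumeAfterOutage(30, 24000, 1000)); // false
```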
Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. However, after you have
started a replica set member for the first time, you can only change the size of the oplog using the Change the Size of
the Oplog (page 608) procedure.
By default, the size of the oplog is as follows:
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the available free disk
space, but will always allocate at least 1 gigabyte and never more than 50 gigabytes.
For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
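The default sizing rules above reduce to a clamp on 5% of free disk space for most 64-bit platforms. A minimal sketch: defaultOplogSizeMB and the OS labels are hypothetical, 1 gigabyte is assumed to be 1024 megabytes, and any 64-bit OS other than "osx" takes the 5% branch.

```javascript
const MB_PER_GB = 1024; // assumption: 1 gigabyte taken as 1024 megabytes

// Hypothetical helper mirroring the default oplog sizes listed above.
function defaultOplogSizeMB({ os, is64Bit, freeDiskMB }) {
  if (!is64Bit) return 48;       // about 48 MB on 32-bit systems
  if (os === "osx") return 183;  // 64-bit OS X
  const fivePercent = freeDiskMB / 20; // 5% of free disk space
  // Never less than 1 GB and never more than 50 GB.
  return Math.min(Math.max(fivePercent, 1 * MB_PER_GB), 50 * MB_PER_GB);
}

console.log(defaultOplogSizeMB({ os: "linux", is64Bit: true, freeDiskMB: 100 * MB_PER_GB }));    // 5120
console.log(defaultOplogSizeMB({ os: "linux", is64Bit: true, freeDiskMB: 10 * MB_PER_GB }));     // 1024
console.log(defaultOplogSizeMB({ os: "windows", is64Bit: true, freeDiskMB: 2048 * MB_PER_GB })); // 51200
```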
If you can predict your replica set's workload to resemble one of the following patterns, then you might want to create
an oplog that is larger than the default. Conversely, if your application predominantly performs reads with a minimal
amount of write operations, a smaller oplog may be sufficient.
The following workloads might require a larger oplog size.
Updates to Multiple Documents at Once The oplog must translate multi-updates into individual operations in order
to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or
disk use.
Deletions Equal the Same Amount of Data as Inserts If you delete roughly the same amount of data as you insert,
the database will not grow significantly in disk use, but the size of the operation log can be quite large.
Significant Number of In-Place Updates If a significant portion of the workload is updates that do not increase the
size of the documents, the database records a large number of operations but does not change the quantity of data on
disk.
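The first workload above, updates to multiple documents at once, grows the oplog because one statement becomes many entries. The following is a simplified model, not MongoDB's oplog code: the op, ns, o and o2 field names mirror oplog entries, but expandMultiUpdate and applyEntry are hypothetical, and recording an increment as a $set of the resulting value is how this sketch achieves idempotency.

```javascript
// Simplified model: a multi-update that adds `delta` to `field` becomes one
// entry per matched document, each recording the resulting value with $set.
function expandMultiUpdate(ns, matchedDocs, field, delta) {
  return matchedDocs.map((doc) => ({
    op: "u",
    ns,
    o2: { _id: doc._id },
    o: { $set: { [field]: doc[field] + delta } },
  }));
}

// Applies one entry to an in-memory collection keyed by _id.
function applyEntry(collection, entry) {
  Object.assign(collection.get(entry.o2._id), entry.o.$set);
}

const docs = [{ _id: 1, qty: 5 }, { _id: 2, qty: 7 }];
const entries = expandMultiUpdate("test.products", docs, "qty", 1);
const coll = new Map(docs.map((d) => [d._id, { ...d }]));
for (const e of entries) applyEntry(coll, e);
for (const e of entries) applyEntry(coll, e); // replaying does not double-increment
console.log(coll.get(2).qty); // 8
```

One update statement here produced two oplog entries; an update matching a million documents would produce a million.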
Oplog Status
To view oplog status, including the size and the time range of operations, issue the
rs.printReplicationInfo() method. For more information on oplog status, see Check the Size of the
Oplog (page 629).
Under various exceptional situations, updates to a secondary's oplog might lag behind the desired performance time.
Use db.getReplicationInfo() from a secondary member and the replication status output to assess
the current state of replication and determine if there is any unintended replication delay.
See Replication Lag (page 627) for more information.
Replica Set Data Synchronization
In order to maintain up-to-date copies of the shared data set, members of a replica set sync or replicate data from other
members. MongoDB uses two forms of data synchronization: initial sync (page 574) to populate new members with
the full data set, and replication to apply ongoing changes to the entire data set.
Initial Sync
Initial sync copies all the data from one member of the replica set to another member. A member uses initial sync
when the member has no data, such as when the member is new, or when the member has data but is missing a history
of the set's replication.
When you perform an initial sync, MongoDB:
1. Clones all databases. To clone, the mongod queries every collection in each source database and inserts all data
into its own copies of these collections. At this time, _id indexes are also built. The clone process only copies
valid data, omitting invalid documents.
2. Applies all changes to the data set. Using the oplog from the source, the mongod updates its data set to reflect
the current state of the replica set.
3. Builds all indexes on all collections (except _id indexes, which were already completed).
When the mongod finishes all index builds, the member can transition to a normal state, i.e. secondary.
Changed in version 3.0: When the clone process omits an invalid document from the sync, MongoDB writes a message
to the logs that begins with Cloner: found corrupt document in <collection>.
To perform an initial sync, see Resync a Member of a Replica Set (page 613).
Replication
Replica set members replicate data continuously after the initial sync. This process keeps the members up to date
with all changes to the replica set's data. In most cases, secondaries synchronize from the primary. Secondaries
may automatically change their sync targets if needed based on changes in the ping time and state of other members'
replication.
For a member to sync from another, both members must have the same value for the buildIndexes setting.
Beginning in version 2.2, secondaries avoid syncing from delayed members (page 551) and hidden members
(page 550).
Validity and Durability
In a replica set, only the primary can accept write operations. Writing only to the primary provides strict consistency
among members.
Journaling provides single-instance write durability. Without journaling, if a MongoDB instance terminates ungracefully, you must assume that the database is in an invalid state.
Multithreaded Replication
MongoDB applies write operations in batches using multiple threads to improve concurrency. MongoDB groups
batches by namespace and applies operations using a group of threads, but always applies the write operations to a
namespace in order.
While applying a batch, MongoDB blocks all reads. As a result, secondaries can never return data that reflects a state
that never existed on the primary.
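The batching rule above (group by namespace, apply namespaces concurrently, keep order within each namespace) can be sketched with JavaScript promises. This is a hypothetical model: groupByNamespace and applyBatch are illustrative names, Node's single-threaded async code stands in for MongoDB's thread pool, and the read blocking during a batch is not modeled.

```javascript
// Hypothetical sketch: split a batch of oplog entries by namespace.
function groupByNamespace(batch) {
  const groups = new Map();
  for (const entry of batch) {
    if (!groups.has(entry.ns)) groups.set(entry.ns, []);
    groups.get(entry.ns).push(entry); // push keeps each namespace's oplog order
  }
  return groups;
}

// Applies namespaces concurrently but each namespace's entries strictly in order.
async function applyBatch(batch, applyOne) {
  const groups = groupByNamespace(batch);
  await Promise.all(
    [...groups.values()].map(async (entries) => {
      for (const entry of entries) await applyOne(entry);
    })
  );
}

const batch = [
  { ns: "shop.orders", ts: 1 },
  { ns: "shop.users", ts: 2 },
  { ns: "shop.orders", ts: 3 },
];
applyBatch(batch, async (e) => console.log(`applied ${e.ns} ts=${e.ts}`));
```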
Pre-Fetching Indexes to Improve Replication Throughput
To help improve the performance of applying oplog entries, MongoDB fetches memory pages that hold affected data
and indexes. This pre-fetch stage minimizes the amount of time MongoDB holds the write lock while applying oplog
entries. By default, secondaries will pre-fetch all Indexes (page 463).
Optionally, you can disable all pre-fetching or only pre-fetch the index on the _id field. See the
secondaryIndexPrefetch setting for more information.
In addition to providing all the functionality of master-slave deployments, replica sets are also more robust for production use. Master-slave replication preceded replica sets and made it possible to have a large number of non-master
(i.e. slave) nodes, as well as to restrict replicated operations to only a single database; however, master-slave replication provides less redundancy and does not automate failover. See Deploy Master-Slave Equivalent using Replica
Sets (page 578) for a replica set configuration that is equivalent to master-slave replication. If you wish to convert an
existing master-slave deployment to a replica set, see Convert a Master-Slave Deployment to a Replica Set (page 578).
Fundamental Operations
Initial Deployment
To configure a master-slave deployment, start two mongod instances: one in master mode, and the other in slave
mode.
To start a mongod instance in master mode, invoke mongod as follows:
mongod --master --dbpath /data/masterdb/
With the --master option, the mongod will create a local.oplog.$main (page 634) collection, which is the operation log that queues the operations that the slaves apply to replicate from the master. The --dbpath
is optional.
To start a mongod instance in slave mode, invoke mongod as follows:
mongod --slave --source <masterhostname>[:<port>] --dbpath /data/slavedb/
Specify the hostname and port of the master instance to the --source argument. The --dbpath is optional.
For slave instances, MongoDB stores data about the source server in the local.sources (page 634) collection.
Configuration Options for Master-Slave Deployments
As an alternative to specifying the --source run-time option, you can add a document to local.sources (page 634)
specifying the master instance, as in the following operation in the mongo shell:
1
2
3
use local
db.sources.find()
db.sources.insert( { host: "<masterhostname>", only: "<databasename>" } );
In line 1, you switch context to the local database. In line 2, the find() operation should return no documents,
which confirms that the sources collection is empty. Finally, line 3 uses db.collection.insert()
to insert the source document into the local.sources (page 634) collection; omit the only field to replicate all
databases. The model of the local.sources (page 634) document is as follows:
host
The host field specifies the master mongod instance and holds a resolvable hostname, such as an IP address, a name
from a hosts file, or preferably a fully qualified domain name.
You can append :<port> to the hostname if the mongod is not running on the default port 27017.
only
Optional. Specify a name of a database. When specified, MongoDB will only replicate the indicated database.
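The following JavaScript sketch shows how the host and only fields combine into a local.sources document. The helper name makeSourceDoc and the host names are hypothetical. It appends :<port> only when the master does not use the default port 27017 and leaves out only unless a database name is given. It is written in plain ES5 so that it should also work in the mongo shell, where you could pass the result to db.sources.insert().

```javascript
// Hypothetical helper: build a local.sources document for a slave.
//   host: resolvable hostname of the master (preferably a FQDN)
//   port: optional; appended as ":<port>" only if it is not the default 27017
//   only: optional database name; when set, only that database replicates
function makeSourceDoc(host, port, only) {
  var DEFAULT_PORT = 27017;
  var doc = { host: host };
  if (port && port !== DEFAULT_PORT) {
    doc.host = host + ":" + port;
  }
  if (only) {
    doc.only = only;
  }
  return doc;
}

var allDatabases = makeSourceDoc("master.example.net");
// → { host: "master.example.net" }
var oneDatabase = makeSourceDoc("master.example.net", 27018, "records");
// → { host: "master.example.net:27018", only: "records" }
// In the mongo shell: use local, then db.sources.insert(oneDatabase)
```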
Chapter 9. Replication
Master instances store operations in an oplog, which is a capped collection (page 208). As a result, if a slave falls too
far behind the state of the master, it cannot catch up and must re-sync from scratch. A slave may become out of sync
with a master if:
The slave falls far behind the data updates available from that master.
The slave stops (i.e. shuts down) and restarts later, after the master has overwritten the relevant operations in its
oplog.
When slaves are out of sync, replication stops. Administrators must intervene manually to restart replication by using the
resync command. Alternatively, the --autoresync option allows a slave to restart replication automatically, after a
ten-second pause, when the slave falls out of sync with the master. With --autoresync specified, the slave will only
attempt to re-sync once in a ten-minute period.
To prevent these situations, specify a larger oplog when you start the master instance by adding the
--oplogSize option to mongod. If you do not specify --oplogSize, mongod allocates 5% of the available disk
space to the oplog at startup, with a minimum of 1GB on 64-bit machines and 50MB on 32-bit
machines.
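As a worked example of the sizing rule above, the sketch below computes the default oplog allocation: 5% of the available disk space, raised to the platform minimum when it falls below it. The function name and the use of binary units (1GB = 1024 × 1024 × 1024 bytes) are assumptions for illustration. The exact figure mongod chooses may differ by version and platform, so treat this as arithmetic, not as mongod's code.

```javascript
// Sketch of the default oplog sizing rule for master-slave deployments:
// 5% of available disk space, but never less than the platform minimum.
// Units are bytes; binary megabytes/gigabytes are an assumption here.
const MB = 1024 * 1024;
const GB = 1024 * MB;

function defaultOplogBytes(availableDiskBytes, is64Bit) {
  const minimum = is64Bit ? 1 * GB : 50 * MB;
  const fivePercent = Math.floor(availableDiskBytes / 20); // 5% == 1/20
  return Math.max(fivePercent, minimum);
}

// 500 GB free on a 64-bit machine: 5% is 25 GB, well above the 1 GB floor.
console.log(defaultOplogBytes(500 * GB, true) / GB); // 25
// 10 GB free on a 64-bit machine: 5% is 0.5 GB, so the 1 GB minimum applies.
console.log(defaultOplogBytes(10 * GB, true) / GB); // 1
```

The second case shows why small disks can still end up with a comparatively large oplog: the minimum dominates.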
Run time Master-Slave Configuration
MongoDB provides a number of command line options for mongod instances in master-slave deployments. See the
Master-Slave Replication Command Line Options for options.
Diagnostics
On a master instance, issue the following operation in the mongo shell to return replication status from the perspective
of the master:
rs.printReplicationInfo()
For previous versions, use
On a slave instance, use the following operation in the mongo shell to return the replication status from the perspective
of the slave:
rs.printSlaveReplicationInfo()
For previous versions, use
Use the serverStatus command, as in the following operation, to return the status of replication:
db.serverStatus( { repl: 1 } )
See server status repl fields for documentation of the relevant section of output.
Security
When running with authorization enabled, in master-slave deployments configure a keyFile so that slave
mongod instances can authenticate and communicate with the master mongod instance.
To enable authentication and configure the keyFile add the following option to your configuration file:
keyFile = /srv/mongodb/keyfile
Note: You may choose to set these run-time configuration options using the --keyFile option on the command line.
Setting keyFile enables authentication and specifies a key file for the mongod instances to use when authenticating
to each other. The content of the key file is arbitrary but must be the same on all members of the deployment so that
they can connect to each other.
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not
have group or world permissions on UNIX systems. Use the following command to generate random content for a key file
with the OpenSSL package:
openssl rand -base64 741
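If you prefer to generate and check the key file from Node.js rather than with openssl, the sketch below mirrors the constraints stated above: fewer than 1024 bytes, base64 characters only, and no group or world permission bits. The function names are hypothetical. Because openssl rand -base64 wraps its output across lines, this sketch ignores whitespace when checking the character set. That tolerance is an assumption of the sketch, not a rule stated in the text above.

```javascript
const crypto = require("crypto");

// 741 random bytes encode to exactly 988 base64 characters (741 / 3 * 4),
// which stays under the one-kilobyte limit, like `openssl rand -base64 741`.
function generateKeyContent() {
  return crypto.randomBytes(741).toString("base64");
}

// Check the content rules: under one kilobyte, base64 characters only.
// Whitespace is ignored here (assumption) because openssl wraps its output.
function isValidKeyContent(content) {
  if (Buffer.byteLength(content, "utf8") >= 1024) {
    return false;
  }
  const stripped = content.replace(/\s+/g, "");
  return stripped.length > 0 && /^[A-Za-z0-9+\/=]+$/.test(stripped);
}

// On UNIX the key file must have no group or world permission bits:
// none of the bits in 0o077 may be set (for example, mode 0o600 passes).
function hasNoGroupOrWorldAccess(mode) {
  return (mode & 0o077) === 0;
}

const key = generateKeyContent();
console.log(key.length, isValidKeyContent(key)); // 988 true
console.log(hasNoGroupOrWorldAccess(0o600), hasNoGroupOrWorldAccess(0o644)); // true false
```

For a file already on disk, you could pass `fs.statSync(path).mode` to hasNoGroupOrWorldAccess(). The file-type bits in that value do not affect the 0o077 mask.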
See also:
Security (page 305) for more information about security in MongoDB
Ongoing Administration and Operation of Master-Slave Deployments
Deploy Master-Slave Equivalent using Replica Sets
If you want a replication configuration that resembles master-slave replication using replica sets, consider the following replica configuration document. In this deployment, hosts <master> and <slave> 11 provide
replication that is roughly equivalent to a two-instance master-slave deployment:
{
_id : 'setName',
members : [
{ _id : 0, host : "<master>", priority : 1 },
{ _id : 1, host : "<slave>", priority : 0, votes : 0 }
]
}
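If the original deployment had more than one slave, the same pattern extends to each of them. The sketch below builds such a configuration document in plain ES5 JavaScript, which should also run in the mongo shell. The function name, set name, and host names are hypothetical. Every former slave receives priority 0 and votes 0, as the <slave> member does above, so only the former master is eligible to become primary.

```javascript
// Hypothetical helper: build a replica set configuration that mimics a
// master-slave deployment with one master and any number of slaves.
function masterSlaveEquivalentConfig(setName, masterHost, slaveHosts) {
  var members = [{ _id: 0, host: masterHost, priority: 1 }];
  for (var i = 0; i < slaveHosts.length; i++) {
    // priority 0: never eligible for primary; votes 0: no vote in elections.
    members.push({ _id: i + 1, host: slaveHosts[i], priority: 0, votes: 0 });
  }
  return { _id: setName, members: members };
}

var cfg = masterSlaveEquivalentConfig("setName", "master.example.net:27017", [
  "slave1.example.net:27017",
  "slave2.example.net:27017"
]);
// Inspect with printjson(cfg) in the mongo shell, or
// console.log(JSON.stringify(cfg, null, 2)) in Node.js.
```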
See Replica Set Configuration (page 632) for more information about replica set configurations.
Convert a Master-Slave Deployment to a Replica Set
To convert a master-slave deployment to a replica set, restart the current master as a one-member replica set. Then
remove the data directories from the former slaves and add the former slaves as new secondaries to the new replica set.
1. To confirm that the current instance is master, run:
db.isMaster()
11 In replica set configurations, the host field must hold a resolvable hostname.
2. Shut down the mongod processes on the master and all slave(s), using the following command while connected
to each instance:
db.adminCommand({shutdown : 1, force : true})
3. Back up your /data/db directories, in case you need to revert to the master-slave deployment.
4. Start the former master with the --replSet option, as in the following:
mongod --replSet <setname>
5. Connect to the mongod with the mongo shell, and initiate the replica set with the following command:
rs.initiate()
When the command returns, you will have successfully deployed a one-member replica set. You can check the
status of your replica set at any time by running the following command:
rs.status()
You can now follow the Convert a Standalone to a Replica Set (page 594) tutorial to deploy your replica set, picking up
from the Expand the Replica Set (page 595) section.
Failing over to a Slave (Promotion)
To permanently fail over from an unavailable or damaged master (A in the following example) to a slave (B):
1. Shut down A.
2. Stop mongod on B.
3. Back up and move all data files that begin with local on B from the dbPath.
Warning: Removing local.* is irrevocable and cannot be undone. Perform this step with extreme caution.
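Step 3 can be scripted. The Node.js sketch below moves every entry in the dbPath whose name begins with local into a separate backup directory instead of deleting it, so the files remain available if you need to revert. The function name and paths are hypothetical. Run anything like this only while mongod on B is stopped, and keep the backup directory outside the dbPath.

```javascript
const fs = require("fs");
const path = require("path");

// Move every entry in dbPath whose name starts with "local" (for example
// local.0, local.ns, or a "local" directory) into backupDir, and return the
// moved names. Assumes mongod is stopped, backupDir lies outside dbPath, and
// both are on the same filesystem (fs.renameSync cannot cross devices).
function moveLocalFiles(dbPath, backupDir) {
  fs.mkdirSync(backupDir, { recursive: true });
  const moved = [];
  for (const name of fs.readdirSync(dbPath)) {
    if (name.startsWith("local")) {
      fs.renameSync(path.join(dbPath, name), path.join(backupDir, name));
      moved.push(name);
    }
  }
  return moved.sort();
}

// Example (hypothetical paths): moveLocalFiles("/data/db", "/data/local-backup");
```

Moving rather than deleting trades a little disk space for a way back, which suits a step the warning describes as irrevocable.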
If you have a master (A) and a slave (B) and you would like to reverse their roles, follow this procedure. The procedure
assumes A is healthy, up-to-date and available.
If A is not healthy but the hardware is okay (power outage, server crash, etc.), skip steps 1 and 2 and, in step 8,
replace all of A's files with B's files.
If A is not healthy and the hardware is not okay, replace A with a new machine. Also follow the instructions in the
previous paragraph.
To invert the master and slave in a deployment:
1. Halt writes on A using the fsync command.
2. Make sure B is up to date with the state of A.
3. Shut down B.
4. Back up and move all data files that begin with local on B from the dbPath to remove the existing
local.sources data.
Warning: Removing local.* is irrevocable and cannot be undone. Perform this step with extreme caution.
If you can stop write operations to the master for an indefinite period, you can copy the data files from the master to
the new slave and then start the slave with --fastsync.
Warning: Be careful with --fastsync. If the data on both instances is not identical, a discrepancy will exist
forever.
fastsync is a way to start a slave by starting with an existing master disk image/backup. This option declares that
the administrator guarantees the image is correct and completely up-to-date with that of the master. If you have a full
and complete copy of data from a master you can use this option to avoid a full synchronization upon starting the
slave.
Creating a Slave from an Existing Slave's Disk Image
You can copy another slave's data file snapshot without any special options. Only take data snapshots when a
mongod process is down or locked using db.fsyncLock().
Resyncing a Slave that is too Stale to Recover
Slaves asynchronously apply write operations from the master, which the slaves poll from the master's oplog. The oplog
is finite in length, and if a slave is too far behind, a full resync will be necessary. To resync the slave, connect to the
slave using the mongo shell and issue the resync command:
use admin
db.runCommand( { resync: 1 } )
This forces a full resync of all data (which will be very slow on a large database). You can achieve the same effect by
stopping mongod on the slave, deleting the entire content of the dbPath on the slave, and restarting the mongod.
Slave Chaining
Slaves cannot be chained. They must all connect to the master directly.
If a slave attempts to replicate from another slave, you will see the following line in the mongod log:
assertion 13051 tailable cursor requested on non capped collection ns:local.oplog.$main
To change a slave's source, manually modify the slave's local.sources (page 634) collection.
Example
Consider the following: If you accidentally set an incorrect hostname for the slave's source, as in the following
example:
mongod --slave --source prod.mississippi
You can correct this by restarting the slave without the --slave and --source arguments:
mongod
Connect to this mongod instance using the mongo shell and update the local.sources (page 634) collection,
with the following operation sequence:
use local
db.sources.update( { host : "prod.mississippi" },
{ $set : { host : "prod.mississippi.example.net" } } )
Restart the slave with the correct command line arguments or with no --source option. After configuring
local.sources (page 634) the first time, the --source will have no subsequent effect. Therefore, both of
the following invocations are correct:
mongod --slave --source prod.mississippi.example.net
or
mongod --slave
Convert a Standalone to a Replica Set (page 594) Convert an existing standalone mongod instance into a
three-member replica set.
Add Members to a Replica Set (page 595) Add a new member to an existing replica set.
Remove Members from Replica Set (page 598) Remove a member from a replica set.
Continue reading from Replica Set Deployment Tutorials (page 582) for additional tutorials related to setting
up replica set deployments.
Member Configuration Tutorials (page 600) Tutorials that describe the process for configuring replica set members.
Adjust Priority for Replica Set Member (page 600) Change the precedence given to a replica set member in
an election for primary.
Prevent Secondary from Becoming Primary (page 601) Make a secondary member ineligible for election as
primary.
Configure a Hidden Replica Set Member (page 603) Configure a secondary member to be invisible to applications in order to support significantly different usage, such as dedicated backups.
Continue reading from Member Configuration Tutorials (page 600) for more tutorials that describe replica set
configuration.
Replica Set Maintenance Tutorials (page 607) Procedures and tasks for common operations on active replica set
deployments.
Change the Size of the Oplog (page 608) Increase the size of the oplog which logs operations. In most cases,
the default oplog size is sufficient.
Resync a Member of a Replica Set (page 613) Sync the data on a member. Either perform initial sync on a
new member or resync the data on an existing member that has fallen too far behind to catch up by way of
normal replication.
Force a Member to Become Primary (page 611) Force a replica set member to become primary.
Change Hostnames in a Replica Set (page 622) Update the replica set configuration to reflect changes in
members' hostnames.
Continue reading from Replica Set Maintenance Tutorials (page 607) for descriptions of additional replica set
maintenance procedures.
Troubleshoot Replica Sets (page 626) Describes common issues and operational challenges for replica sets. For additional diagnostic information, see FAQ: MongoDB Diagnostics (page 757).
Convert a Standalone to a Replica Set (page 594) Convert an existing standalone mongod instance into a three-member replica set.
Add Members to a Replica Set (page 595) Add a new member to an existing replica set.
Remove Members from Replica Set (page 598) Remove a member from a replica set.
Replace a Replica Set Member (page 599) Update the replica set configuration when the hostname of a member's
corresponding mongod instance has changed.
Deploy a Replica Set
This tutorial describes how to create a three-member replica set from three existing mongod instances running with
access control (page 312) disabled.
To deploy a replica set with enabled access control (page 312), see Deploy Replica Set and Configure Authentication
and Authorization (page 340). If you wish to deploy a replica set from a single MongoDB instance, see Convert
a Standalone to a Replica Set (page 594). For more information on replica set deployments, see the Replication
(page 541) and Replica Set Deployment Architectures (page 553) documentation.
Overview
Three-member replica sets provide enough redundancy to survive most network partitions and other system failures.
These sets also have sufficient capacity for many distributed read operations. Replica sets should always have an odd
number of members. This ensures that elections (page 561) will proceed smoothly. For more about designing replica
sets, see the Replication overview (page 541).
The basic procedure is to start the mongod instances that will become members of the replica set, configure the replica
set itself, and then add the mongod instances to it.
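The arithmetic behind the odd-number advice can be shown in a few lines of JavaScript. This is a sketch that counts only voting members and ignores how a partition splits them. An election needs a strict majority of votes, so adding a fourth member raises the required majority without letting the set survive any additional failure.

```javascript
// For a replica set with n voting members, an election needs a strict
// majority of votes; the set can lose (n - majority) members and still elect.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
// 3 2 1
// 4 3 1
// 5 3 2
```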
Requirements
For production deployments, you should maintain as much separation between members as possible by hosting the
mongod instances on separate machines. When using virtual machines for production deployments, you should place
each mongod instance on a separate host server serviced by redundant power circuits and redundant network paths.
Before you can deploy a replica set, you must install MongoDB on each system that will be part of your replica set. If
you have not already installed MongoDB, see the installation tutorials (page 5).
Before creating your replica set, you should verify that your network configuration allows all possible connections
between each member. For a successful replica set deployment, every member must be able to connect to every other
member. For instructions on how to check your connection, see Test Connections Between all Members (page 628).
Considerations When Deploying a Replica Set
Architecture In production, deploy each member of the replica set to its own machine and, if possible, bind to the
standard MongoDB port of 27017. Use the bind_ip option to ensure that MongoDB listens for connections from
applications on configured addresses.
For geographically distributed replica sets, ensure that the majority of the set's mongod instances reside in the
primary site.
See Replica Set Deployment Architectures (page 553) for more information.
Connectivity Ensure that network traffic can pass between all members of the set and all clients in the network
securely and efficiently. Consider the following:
Establish a virtual private network. Ensure that your network topology routes all traffic between members within
a single site over the local area network.
Configure access control to prevent connections from unknown clients to the replica set.
Configure networking and firewall rules so that incoming and outgoing packets are permitted only on the default
MongoDB port and only from within your deployment.
Finally, ensure that each member of a replica set is accessible by way of resolvable DNS or hostnames. You should
either configure your DNS names appropriately or set up your systems' /etc/hosts files to reflect this configuration.
Configuration Specify the run time configuration on each system in a configuration file stored in
/etc/mongodb.conf or a related location. Create the directory where MongoDB stores data files before deploying MongoDB.
For more information about the run time options used above and other configuration options, see
http://docs.mongodb.org/manual/reference/configuration-options.
Procedure
The following procedure outlines the steps to deploy a replica set when access control is disabled.
Step 1: Start each member of the replica set with the appropriate options. For each member, start a mongod and
specify the replica set name through the replSet option. Specify any other parameters specific to your deployment.
For replication-specific parameters, see cli-mongod-replica-set.
If your application connects to more than one replica set, each set should have a distinct name. Some drivers group
replica set connections by replica set name.
The following example specifies the replica set name through the --replSet command-line option:
mongod --replSet "rs0"
You can also specify the replica set name in the configuration file. To start mongod with a configuration file, specify the file with the --config option:
mongod --config $HOME/.mongodb/config
In production deployments, you can configure a control script to manage this process. Control scripts are beyond the
scope of this document.
Step 2: Connect a mongo shell to a replica set member. For example, to connect to a mongod running on
localhost on the default port of 27017, simply issue:
mongo
Step 3: Initiate the replica set. Use rs.initiate() on the replica set member:
rs.initiate()
MongoDB initiates a set that consists of the current member and that uses the default replica set configuration.
Step 4: Verify the initial replica set configuration. Use rs.conf() to display the replica set configuration object
(page 632):
rs.conf()
Step 5: Add the remaining members to the replica set. Add the remaining members with the rs.add() method.
The following example adds two members:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
When complete, you have a fully functional replica set. The new replica set will elect a primary.
Step 6: Check the status of the replica set. Use the rs.status() operation:
rs.status()
See also:
Deploy Replica Set and Configure Authentication and Authorization (page 340)
Deploy a Replica Set for Testing and Development
This procedure describes deploying a replica set in a development or test environment. For a production deployment,
refer to the Deploy a Replica Set (page 583) tutorial.
This tutorial describes how to create a three-member replica set from three existing mongod instances running with
access control (page 312) disabled.
To deploy a replica set with enabled access control (page 312), see Deploy Replica Set and Configure Authentication
and Authorization (page 340). If you wish to deploy a replica set from a single MongoDB instance, see Convert
a Standalone to a Replica Set (page 594). For more information on replica set deployments, see the Replication
(page 541) and Replica Set Deployment Architectures (page 553) documentation.
Overview
Three-member replica sets provide enough redundancy to survive most network partitions and other system failures.
These sets also have sufficient capacity for many distributed read operations. Replica sets should always have an odd
number of members. This ensures that elections (page 561) will proceed smoothly. For more about designing replica
sets, see the Replication overview (page 541).
The basic procedure is to start the mongod instances that will become members of the replica set, configure the replica
set itself, and then add the mongod instances to it.
9.3. Replica Set Tutorials
Requirements
For test and development systems, you can run your mongod instances on a local system, or within a virtual instance.
Before you can deploy a replica set, you must install MongoDB on each system that will be part of your replica set. If
you have not already installed MongoDB, see the installation tutorials (page 5).
Before creating your replica set, you should verify that your network configuration allows all possible connections
between each member. For a successful replica set deployment, every member must be able to connect to every other
member. For instructions on how to check your connection, see Test Connections Between all Members (page 628).
Considerations
1. Create the necessary data directories for each member by issuing a command similar to the following:
mkdir -p /srv/mongodb/rs0-0 /srv/mongodb/rs0-1 /srv/mongodb/rs0-2
This will create directories called rs0-0, rs0-1, and rs0-2, which will contain the instances' database files.
2. Start your mongod instances in their own shell windows by issuing the following commands:
First member:
mongod --port 27017 --dbpath /srv/mongodb/rs0-0 --replSet rs0 --smallfiles --oplogSize 128
Second member:
mongod --port 27018 --dbpath /srv/mongodb/rs0-1 --replSet rs0 --smallfiles --oplogSize 128
Third member:
mongod --port 27019 --dbpath /srv/mongodb/rs0-2 --replSet rs0 --smallfiles --oplogSize 128
This starts each instance as a member of a replica set named rs0, each running on a distinct port, and specifies
the path to your data directory with the --dbpath setting. If you are already using the suggested ports, select
different ports.
The --smallfiles and --oplogSize settings reduce the disk space that each mongod instance uses. This is
ideal for testing and development deployments, as it prevents overloading your machine.
For more information on these and other configuration options, see
http://docs.mongodb.org/manual/reference/configuration-options.
3. Connect to one of your mongod instances through the mongo shell. You will need to indicate which instance
by specifying its port number. For the sake of simplicity and clarity, you may want to choose the first one, which
listens on port 27017.
4. In the mongo shell, use rs.initiate() to initiate the replica set. You can create a replica set configuration
object in the mongo shell environment, as in the following example:
rsconf = {
_id: "rs0",
members: [
{
_id: 0,
host: "<hostname>:27017"
}
]
}
replacing <hostname> with your system's hostname, and then pass the rsconf object to rs.initiate() as
follows:
rs.initiate( rsconf )
5. Display the current replica configuration (page 632) by issuing the following command:
rs.conf()
6. In the mongo shell connected to the primary, add the second and third mongod instances to the replica set
using the rs.add() method. Replace <hostname> with your system's hostname in the following examples:
rs.add("<hostname>:27018")
rs.add("<hostname>:27019")
When complete, you should have a fully functional replica set. The new replica set will elect a primary.
Check the status of your replica set at any time with the rs.status() operation.
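Instead of calling rs.add() for each member, you could pass rs.initiate() a configuration that lists all three members at once. The sketch below builds that document in plain ES5 JavaScript, which should also work in the mongo shell. The helper name is hypothetical, and devhost.example.net stands in for your system's hostname.

```javascript
// Hypothetical helper: build a replica set configuration for several mongod
// instances that share one hostname and differ only by port.
function buildLocalReplSetConfig(setName, hostname, ports) {
  var members = [];
  for (var i = 0; i < ports.length; i++) {
    // _id values must be unique within the set; the array index is used here.
    members.push({ _id: i, host: hostname + ":" + ports[i] });
  }
  return { _id: setName, members: members };
}

var rsconf = buildLocalReplSetConfig("rs0", "devhost.example.net", [27017, 27018, 27019]);
// In the mongo shell connected to the first instance: rs.initiate(rsconf)
```

Either approach should end with the same three members. rs.add() is convenient interactively, while a single configuration document is easier to keep in a setup script.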
See also:
The documentation of the following shell functions for more information:
rs.initiate()
rs.conf()
rs.reconfig()
rs.add()
You may also consider the simple setup script12 as an example of a basic automatically-configured replica set.
12 https://github.com/mongodb/mongo-snippets/blob/master/replication/simple-setup.py
Refer to Replica Set Read and Write Semantics (page 565) for a detailed explanation of read and write semantics in
MongoDB.
Deploy a Geographically Redundant Replica Set
Overview
This tutorial outlines the process for deploying a replica set with members in multiple locations. The tutorial addresses
three-member sets, four-member sets, and sets with more than four members.
For appropriate background, see Replication (page 541) and Replica Set Deployment Architectures (page 553). For
related tutorials, see Deploy a Replica Set (page 583) and Add Members to a Replica Set (page 595).
Considerations
While replica sets provide basic protection against single-instance failure, replica sets whose members are all located
in a single facility are susceptible to errors in that facility. Power outages, network interruptions, and natural disasters
are all issues that can affect replica sets whose members are colocated. To protect against these classes of failures,
deploy a replica set with one or more members in a geographically distinct facility or data center to provide redundancy.
Prerequisites
In general, the requirements for any geographically redundant replica set are as follows:
Ensure that a majority of the voting members (page 564) are within a primary facility, Site A. This includes
priority 0 members (page 548) and arbiters (page 552). Deploy other members in secondary facilities, Site B,
Site C, etc., to provide additional copies of the data. See Determine the Distribution of Members (page 554)
for more information on the voting requirements for geographically redundant replica sets.
If you deploy a replica set with an even number of members, deploy an arbiter (page 552) on Site A. The arbiter
must be on site A to keep the majority there.
For instance, for a three-member replica set you need two instances in Site A and one member in a secondary facility,
Site B. Site A should be the same facility as, or very close to, your primary application infrastructure (i.e. application
servers, caching layer, users, etc.).
A four-member replica set should have at least two members in Site A, with the remaining members in one or more
secondary sites, as well as a single arbiter in Site A.
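The voting rule above can be checked mechanically. The JavaScript sketch below counts votes per site, treating arbiters and priority 0 members as ordinary voters. It shows why the four-member layout needs its arbiter in Site A. The helper name, host names, and the site annotation are hypothetical. site is not a replica set configuration field, and votes defaults to 1 as it does for replica set members.

```javascript
// Hypothetical helper: does one site hold a strict majority of the votes?
// Each member is { host, site, votes }; "site" is an annotation for this
// sketch only. A missing votes value counts as 1.
function siteHoldsMajority(members, site) {
  var total = 0;
  var inSite = 0;
  for (var i = 0; i < members.length; i++) {
    var votes = members[i].votes === undefined ? 1 : members[i].votes;
    total += votes;
    if (members[i].site === site) {
      inSite += votes;
    }
  }
  return inSite > total / 2;
}

// Three members: two in Site A, one in Site B.
var threeMember = [
  { host: "a1.example.net", site: "A" },
  { host: "a2.example.net", site: "A" },
  { host: "b1.example.net", site: "B" }
];
// Four data-bearing members split evenly, then the same with an arbiter in Site A.
var fourMember = [
  { host: "a1.example.net", site: "A" },
  { host: "a2.example.net", site: "A" },
  { host: "b1.example.net", site: "B" },
  { host: "b2.example.net", site: "B" }
];
var fourPlusArbiter = fourMember.concat([{ host: "arb.example.net", site: "A" }]);

// siteHoldsMajority(threeMember, "A")     → true  (2 of 3 votes)
// siteHoldsMajority(fourMember, "A")      → false (2 of 4 votes)
// siteHoldsMajority(fourPlusArbiter, "A") → true  (3 of 5 votes)
```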
For all configurations in this tutorial, deploy each replica set member on a separate system. Although you may deploy
more than one replica set member on a single system, doing so reduces the redundancy and capacity of the replica set.
Such deployments are typically for testing purposes and beyond the scope of this tutorial.
This tutorial assumes you have installed MongoDB on each system that will be part of your replica set. If you have
not already installed MongoDB, see the installation tutorials (page 5).
Procedures
General Considerations
Architecture In production, deploy each member of the replica set to its own machine and, if possible, bind to the
standard MongoDB port of 27017. Use the bind_ip option to ensure that MongoDB listens for connections from
applications on configured addresses.
For geographically distributed replica sets, ensure that the majority of the set's mongod instances reside in the
primary site.
See Replica Set Deployment Architectures (page 553) for more information.
Connectivity Ensure that network traffic can pass between all members of the set and all clients in the network
securely and efficiently. Consider the following:
Establish a virtual private network. Ensure that your network topology routes all traffic between members within
a single site over the local area network.
Configure access control to prevent connections from unknown clients to the replica set.
Configure networking and firewall rules so that incoming and outgoing packets are permitted only on the default
MongoDB port and only from within your deployment.
Finally, ensure that each member of a replica set is accessible by way of resolvable DNS or hostnames. You should
either configure your DNS names appropriately or set up your systems' /etc/hosts files to reflect this configuration.
Configuration Specify the run time configuration on each system in a configuration file stored in
/etc/mongodb.conf or a related location. Create the directory where MongoDB stores data files before deploying MongoDB.
For more information about the run time options used above and other configuration options, see
http://docs.mongodb.org/manual/reference/configuration-options.
You can also specify the replica set name in the configuration file. To start mongod with a configuration file, specify the file with the --config option:
mongod --config $HOME/.mongodb/config
In production deployments, you can configure a control script to manage this process. Control scripts are beyond the
scope of this document.
Step 2: Connect a mongo shell to a replica set member. For example, to connect to a mongod running on
localhost on the default port of 27017, simply issue:
mongo
Step 3: Initiate the replica set. Use rs.initiate() on the replica set member:
rs.initiate()
MongoDB initiates a set that consists of the current member and that uses the default replica set configuration.
Step 4: Verify the initial replica set configuration. Use rs.conf() to display the replica set configuration object
(page 632):
rs.conf()
Step 5: Add the remaining members to the replica set. Add the remaining members with the rs.add() method.
The following example adds two members:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
When complete, you have a fully functional replica set. The new replica set will elect a primary.
Step 6: Configure the outside member as a priority 0 member. Configure the member located in Site B (in this
example, mongodb2.example.net) as a priority 0 member (page 548).
1. View the replica set configuration to determine the member's position in the members array. Keep in mind the
array position is not the same as the _id:
rs.conf()
2. Copy the replica set configuration object to a variable (to cfg in the example below). Then, in the variable,
set the correct priority for the member. Then pass the variable to rs.reconfig() to update the replica set
configuration.
For example, to set priority for the third member in the array (i.e., the member at position 2), issue the following
sequence of commands:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
Note: The rs.reconfig() shell method can force the current primary to step down, causing an election.
When the primary steps down, all clients will disconnect. This is the intended behavior. While most elections complete within a minute, always make sure any replica configuration changes occur during scheduled
maintenance periods.
After these commands return, you have a geographically redundant three-member replica set.
Step 7: Check the status of the replica set. Use the rs.status() operation:
rs.status()
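Because the array position can differ from the _id, it may be safer to locate the Site B member by its host field before changing its priority. The following ES5 sketch uses a hypothetical helper, setPriorityZero(), on a stand-in configuration object. In the mongo shell you would start from cfg = rs.conf() and finish with rs.reconfig(cfg), as in the reconfiguration step above.

```javascript
// Hypothetical helper: set priority 0 on the member whose host matches,
// looking it up by host instead of by array position or _id.
// Returns the array position that was changed.
function setPriorityZero(cfg, host) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i].host === host) {
      cfg.members[i].priority = 0;
      return i;
    }
  }
  throw new Error("no member with host " + host);
}

// Stand-in for the document rs.conf() returns (other fields omitted).
var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017", priority: 1 },
    { _id: 1, host: "mongodb1.example.net:27017", priority: 1 },
    { _id: 2, host: "mongodb2.example.net:27017", priority: 1 }
  ]
};
var changedPosition = setPriorityZero(cfg, "mongodb2.example.net:27017");
// changedPosition → 2; then, in the mongo shell: rs.reconfig(cfg)
```

Throwing on an unknown host keeps a typo from silently producing an unchanged configuration that you might then pass to rs.reconfig().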
Deploy a Geographically Redundant Four-Member Replica Set A geographically redundant four-member deployment has two additional considerations:
One host (e.g. mongodb4.example.net) must be an arbiter. This host can run on a system that is also used
for an application server or on the same machine as another MongoDB process.
You must decide how to distribute your systems. There are three possible architectures for the four-member
replica set:
Three members in Site A, one priority 0 member (page 548) in Site B, and an arbiter in Site A.
Two members in Site A, two priority 0 members (page 548) in Site B, and an arbiter in Site A.
Two members in Site A, one priority 0 member in Site B, one priority 0 member in Site C, and an arbiter
in Site A.
In most cases, the first architecture is preferable because it is the least complex.
To deploy a geographically redundant four-member set:
Step 1: Start each member of the replica set with the appropriate options. For each member, start a mongod and
specify the replica set name through the replSet option. Specify any other parameters specific to your deployment.
For replication-specific parameters, see the replication options in the mongod reference.
If your application connects to more than one replica set, each set should have a distinct name. Some drivers group
replica set connections by replica set name.
The following example specifies the replica set name through the --replSet command-line option:
mongod --replSet "rs0"
You can also specify the replica set name in the configuration file. To start mongod with a configuration file, specify the file with the --config option:
mongod --config $HOME/.mongodb/config
In production deployments, you can configure a control script to manage this process. Control scripts are beyond the
scope of this document.
Step 2: Connect a mongo shell to a replica set member.
For example, to connect to a mongod running on localhost on the default port of 27017, simply issue:
mongo
Step 3: Initiate the replica set. Use rs.initiate() on the replica set member:
rs.initiate()
MongoDB initiates a set that consists of the current member and that uses the default replica set configuration.
Step 4: Verify the initial replica set configuration. Use rs.conf() to display the replica set configuration object
(page 632):
rs.conf()
Step 5: Add the remaining members to the replica set. Use rs.add() in a mongo shell connected to the current
primary. The commands should resemble the following:
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
rs.add("mongodb3.example.net")
When complete, you should have a fully functional replica set. The new replica set will elect a primary.
Chapter 9. Replication
Step 6: Add the arbiter. In the same shell session, issue the following command to add the arbiter (e.g.
mongodb4.example.net):
rs.addArb("mongodb4.example.net")
Step 7: Configure outside members as priority 0 members. Configure each member located outside of Site A (e.g.
mongodb3.example.net) as a priority 0 member (page 548).
1. View the replica set configuration to determine the members array position for the member. Keep in mind the
array position is not the same as the _id:
rs.conf()
2. Copy the replica set configuration object to a variable (to cfg in the example below). Then, in the variable,
set the correct priority for the member. Then pass the variable to rs.reconfig() to update the replica set
configuration.
For example, to set priority for the third member in the array (i.e., the member at position 2), issue the following
sequence of commands:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
Note: The rs.reconfig() shell method can force the current primary to step down, causing an election.
When the primary steps down, all clients will disconnect. This is the intended behavior. While most elections complete within a minute, always make sure any replica set configuration changes occur during scheduled
maintenance periods.
After these commands return, you have a geographically redundant four-member replica set.
Step 8: Check the status of the replica set. Use the rs.status() operation:
rs.status()
Deploy a Geographically Redundant Set with More than Four Members
The above procedures detail the steps necessary for deploying a geographically redundant replica set. Larger
replica set deployments follow the same steps, but have additional considerations:
Never deploy more than seven voting members.
If you have an even number of members, use the procedure for a four-member set (page 591). Ensure that
a single facility, Site A, always has a majority of the members by deploying the arbiter in that site. For
example, if a set has six members, deploy at least three voting members in addition to the arbiter in Site A, and
the remaining members in alternate sites.
If you have an odd number of members, use the procedure for a three-member set (page 589). Ensure that a
single facility, Site A, always has a majority of the members of the set. For example, if a set has five members,
deploy three members within Site A and two members in other facilities.
If you have a majority of the members of the set outside of Site A and the network partitions to prevent communication between sites, the current primary in Site A will step down, even if none of the members outside of
Site A are eligible to become primary.
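The placement rules above come down to counting votes: Site A can keep a primary through a partition only if the voting members in Site A hold a strict majority of all votes. As a sketch under stated assumptions, the hypothetical helper siteHoldsMajority() below takes a plain list of members annotated with a site label (the site field is an illustration only and is not a replica set configuration field) and an optional votes count that defaults to 1, and checks that rule.

```javascript
// Hypothetical helper: does `site` hold a strict majority of all votes?
// Each member is a plain object { host, site, votes }; `site` is an
// illustrative label only, and a missing `votes` counts as 1.
function siteHoldsMajority(members, site) {
  var total = 0;
  var local = 0;
  members.forEach(function (m) {
    var v = (m.votes === undefined) ? 1 : m.votes;
    total += v;
    if (m.site === site) {
      local += v;
    }
  });
  return local > total / 2;
}

// First four-member architecture above: three members and the arbiter
// in Site A, one priority 0 member in Site B.
var layout = [
  { host: "mongodb0.example.net", site: "A" },
  { host: "mongodb1.example.net", site: "A" },
  { host: "mongodb2.example.net", site: "A" },
  { host: "mongodb3.example.net", site: "B" },
  { host: "mongodb4.example.net", site: "A" } // arbiter
];

var siteAOk = siteHoldsMajority(layout, "A"); // 4 of 5 votes
```

The same check applies to the six-member example: three voting members plus the arbiter in Site A give Site A 4 of 7 votes, while moving the arbiter out of Site A leaves only 3 of 7.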
Add an Arbiter to Replica Set
Arbiters are mongod instances that are part of a replica set but do not hold data. Arbiters participate in elections
(page 561) in order to break ties. If a replica set has an even number of members, add an arbiter.
Arbiters have minimal resource requirements and do not require dedicated hardware. You can deploy an arbiter on an
application server or a monitoring host.
Important: Do not run an arbiter on the same system as a member of the replica set.
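The reason to add an arbiter to an even-sized set is arithmetic: electing a primary requires a strict majority of votes, floor(n / 2) + 1 for n voting members. The minimal sketch below uses two hypothetical helpers (not shell methods) to show that four voting members tolerate the loss of only one, the same as three, while an arbiter that brings the count to five raises the tolerance to two.

```javascript
// Hypothetical helpers illustrating election arithmetic; not shell methods.
// Votes required to elect a primary among n voting members (strict majority).
function majorityOf(n) {
  return Math.floor(n / 2) + 1;
}

// Voting members that can be unavailable while a majority can still form.
function faultTolerance(n) {
  return n - majorityOf(n);
}

// Three, four, and five voters (five = four data-bearing members + arbiter).
var summary = [3, 4, 5].map(function (n) {
  return { voters: n, majority: majorityOf(n), tolerated: faultTolerance(n) };
});
// summary: 3 voters -> majority 2, tolerates 1
//          4 voters -> majority 3, tolerates 1
//          5 voters -> majority 3, tolerates 2
```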
9.3. Replica Set Tutorials
Considerations
An arbiter does not store data, but until the arbiter's mongod process is added to the replica set, the arbiter will act
like any other mongod process and start up with a set of data files and with a full-sized journal.
To minimize the default creation of data, set the following in the arbiter's configuration file:
storage.journal.enabled to false
storage.mmapv1.smallFiles to true
Warning: These settings are specific to arbiters. Never set storage.journal.enabled to false on a data-bearing node,
and do not set storage.mmapv1.smallFiles unless specifically indicated.
Add an Arbiter
1. Create a data directory (e.g. dbPath) for the arbiter. The mongod instance uses the directory for configuration
data. The directory will not hold the data set. For example, create the /data/arb directory:
mkdir /data/arb
2. Start the arbiter. Specify the data directory and the replica set name. The following starts an arbiter using the
/data/arb dbPath for the rs replica set:
mongod --port 30000 --dbpath /data/arb --replSet rs
3. Connect to the primary and add the arbiter to the replica set. Use the rs.addArb() method, as in the following
example:
rs.addArb("m1.example.net:30000")
This operation adds the arbiter running on port 30000 on the m1.example.net host.
Convert a Standalone to a Replica Set
This tutorial describes the process for converting a standalone mongod instance into a three-member replica set. Use
standalone instances for testing and development, but always use replica sets in production. To install a standalone
instance, see the installation tutorials (page 5).
To deploy a replica set without using a pre-existing mongod instance, see Deploy a Replica Set (page 583).
Procedure
If your application connects to more than one replica set, each set should have a distinct name. Some drivers
group replica set connections by replica set name.
Expand the Replica Set Add additional replica set members by doing the following:
1. On two distinct systems, start two new standalone mongod instances. For information on starting a standalone
instance, see the installation tutorial (page 5) specific to your environment.
2. On your connection to the original mongod instance (the former standalone instance), issue a command in the
following form for each new instance to add to the replica set:
rs.add("<hostname><:port>")
Replace <hostname> and <port> with the resolvable hostname and port of the mongod instance to add to
the set. For more information on adding a host to a replica set, see Add Members to a Replica Set (page 595).
Sharding Considerations If the new replica set is part of a sharded cluster, change the shard host information in
the config database by doing the following:
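1. Connect to one of the sharded cluster's mongos instances and issue a command in the following form:
db.getSiblingDB("config").shards.save( {_id: "<name>", host: "<replica-set>/<member,><member,><>" } )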
Replace <name> with the name of the shard. Replace <replica-set> with the name of the replica set.
Replace <member,><member,><> with the list of the members of the replica set.
2. Restart all mongos instances. If possible, restart all components of the replica sets (i.e., all mongos and all
shard mongod instances).
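A replica set shard's host value in the config database takes the form <replica-set>/<member>,<member>,... The hypothetical helper shardHostString() below is an illustrative sketch, not part of MongoDB: it assembles that string from a replica set name and a list of host:port pairs, which may reduce typing errors when you update the shard entry.

```javascript
// Hypothetical helper: build the host value for a replica set shard,
// in the form "<replica-set>/<member>,<member>,...".
function shardHostString(replSetName, members) {
  if (!replSetName || members.length === 0) {
    throw new Error("need a replica set name and at least one member");
  }
  return replSetName + "/" + members.join(",");
}

var shardHost = shardHostString("rs0", [
  "mongodb0.example.net:27017",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017"
]);
// shardHost is
// "rs0/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017"
```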
Add Members to a Replica Set
Overview
This tutorial explains how to add an additional member to an existing replica set. For background on replication
deployment patterns, see the Replica Set Deployment Architectures (page 553) document.
Maximum Voting Members A replica set can have a maximum of seven voting members (page 561). To add
a member to a replica set that already has seven votes, you must either add the member as a non-voting member
(page 564) or remove a vote from an existing member.
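Before adding a member, you can count the voting members in the current configuration. The sketch below is illustrative and relies on two assumptions labeled in its comments: the limit of seven comes from the text above, and a member with no votes field is treated as having the default single vote. votingMemberCount() and canAddVotingMember() are hypothetical helpers, not mongo shell methods.

```javascript
// Hypothetical helpers (not mongo shell methods). The limit of seven comes
// from the text above; a member with no votes field has the default 1 vote.
var MAX_VOTING_MEMBERS = 7;

function votingMemberCount(cfg) {
  return cfg.members.filter(function (m) {
    return m.votes === undefined || m.votes > 0;
  }).length;
}

function canAddVotingMember(cfg) {
  return votingMemberCount(cfg) < MAX_VOTING_MEMBERS;
}

// Seven members, all with the default vote: no room for another voter.
var fullCfg = { _id: "rs0", members: [] };
for (var i = 0; i < 7; i++) {
  fullCfg.members.push({ _id: i, host: "mongodb" + i + ".example.net:27017" });
}
var full = canAddVotingMember(fullCfg);       // false
fullCfg.members[6].votes = 0;                 // make one member non-voting
var roomAfter = canAddVotingMember(fullCfg);  // true
```

In the shell you would pass the result of rs.conf() instead of the hand-built document.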
Control Scripts In production deployments you can configure a control script to manage member processes.
Existing Members You can use these procedures to add new members to an existing set. You can also use the same
procedure to re-add a removed member. If the removed member's data is still relatively recent, it can recover and
catch up easily.
Data Files If you have a backup or snapshot of an existing member, you can move the data files (e.g. the dbPath
directory) to a new system and use them to quickly initiate a new member. The files must be:
A valid copy of the data files from a member of the same replica set. See Backup and Restore with Filesystem
Snapshots (page 241) document for more information.
Important: Always use filesystem snapshots to create a copy of a member of the existing replica set. Do not
use mongodump and mongorestore to seed a new replica set member.
More recent than the oldest operation in the primary's oplog. The new member must be able to become current
by applying operations from the primary's oplog.
Requirements
Prepare the Data Directory Before adding a new member to an existing replica set, prepare the new member's data
directory using one of the following strategies:
Make sure the new member's data directory does not contain data. The new member will copy the data from an
existing member.
If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all
data as part of the replication process. This process takes time but does not require administrator intervention.
Manually copy the data directory from an existing member. The new member becomes a secondary member
and will catch up to the current state of the replica set. Copying the data over may shorten the amount of time
for the new member to become current.
Ensure that you can copy the data directory to the new member and begin replication within the window allowed
by the oplog (page 573). Otherwise, the new instance will have to perform an initial sync, which completely
resynchronizes the data, as described in Resync a Member of a Replica Set (page 613).
Use rs.printReplicationInfo() to check the current state of replica set members with regard to the
oplog.
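Whether copying the data directory is worthwhile depends on how long the copy takes relative to the time span the oplog covers. As a rough sketch, assume you read that span in seconds from the timeDiff field of db.getReplicationInfo() on the primary (the same oplog time range that rs.printReplicationInfo() reports) and supply your own estimate of the copy time. The hypothetical function fitsOplogWindow() then checks that the copy, plus a safety margin you choose, fits inside the window. The default 50% margin is an arbitrary illustration, not a MongoDB recommendation.

```javascript
// Hypothetical helper: does copying the data and starting replication fit
// within the oplog window? Time arguments are in seconds.
// oplogWindowSecs: span between the first and last oplog entries, e.g. the
//   timeDiff field of db.getReplicationInfo() on the primary (assumption).
// copySecs: your estimate of how long the copy and restart will take.
// marginFraction: extra headroom; 0.5 (the default here) requires 50% spare.
function fitsOplogWindow(oplogWindowSecs, copySecs, marginFraction) {
  var margin = (marginFraction === undefined) ? 0.5 : marginFraction;
  return copySecs * (1 + margin) <= oplogWindowSecs;
}

// A 24-hour oplog window and a 10-hour copy: 15 hours with margin fits.
var ok = fitsOplogWindow(24 * 3600, 10 * 3600);
// A 20-hour copy would need 30 hours with the same margin, so it does not.
var tooSlow = fitsOplogWindow(24 * 3600, 20 * 3600);
```

Keep in mind that the window moves as the primary keeps accepting writes, so a generous margin is safer than a tight one.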
For background on replication deployment patterns, see the Replica Set Deployment Architectures (page 553) document.
Take note of the host name and port information for the new mongod instance.
For more information on configuration options, see the mongod manual page.
Optional
You can specify the data directory and replica set in a configuration file, such as /etc/mongodb.conf, and start the
mongod with the following command:
mongod --config /etc/mongodb.conf
rs.add("mongodb3.example.net")
4. Verify that the member is now part of the replica set. Call the rs.conf() method, which displays the replica
set configuration (page 632):
rs.conf()
To view replica set status, issue the rs.status() method. For a description of the status fields, see
http://docs.mongodb.org/manual/reference/command/replSetGetStatus.
Configure and Add a Member You can add a member to a replica set by passing to the rs.add() method a
members document. The document must be in the form of a replSetGetConfig.members document. These
documents define a replica set member in the same form as the replica set configuration document (page 632).
Important: Specify a value for the _id field of the members document. MongoDB does not automatically populate
the _id field in this case. Finally, the members document must declare the host value. All other fields are optional.
Example
To add a member with the following configuration:
an _id of 1.
a hostname and port number of mongodb3.example.net:27017.
a priority value within the replica set of 0.
a configuration as hidden.
Issue the following:
rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true})
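Because MongoDB does not populate _id for you here and host is mandatory, a small check before calling rs.add() can catch an incomplete document. The hypothetical helper memberDoc() below is a sketch, not a shell method: it copies the supplied fields and throws if either required field is missing. The numeric type check on _id is an assumption of this sketch.

```javascript
// Hypothetical helper: build a members document for rs.add(), refusing to
// proceed without the two fields the text above requires (_id and host).
function memberDoc(fields) {
  if (typeof fields._id !== "number") {
    throw new Error("members document needs a numeric _id");
  }
  if (typeof fields.host !== "string" || fields.host.length === 0) {
    throw new Error("members document needs a host value");
  }
  var doc = {};
  Object.keys(fields).forEach(function (k) {
    doc[k] = fields[k];
  });
  return doc;
}

var newMember = memberDoc({
  _id: 1,
  host: "mongodb3.example.net:27017",
  priority: 0,
  hidden: true
});
// In the mongo shell you would then call: rs.add(newMember)
```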
1. Shut down the mongod instance for the member you wish to remove. To shut down the instance, connect using
the mongo shell and the db.shutdownServer() method.
2. Connect to the replica set's current primary. To determine the current primary, use db.isMaster() while
connected to any member of the replica set.
3. Use rs.remove() in either of the following forms to remove the member:
rs.remove("mongod3.example.net:27017")
rs.remove("mongod3.example.net")
MongoDB disconnects the shell briefly as the replica set elects a new primary. The shell then automatically
reconnects. The shell displays a DBClientCursor::init call() failed error even though the command succeeds.
Remove a Member Using rs.reconfig()
To remove a member you can manually edit the replica set configuration document (page 632), as described here.
1. Shut down the mongod instance for the member you wish to remove. To shut down the instance, connect using
the mongo shell and the db.shutdownServer() method.
2. Connect to the replica set's current primary. To determine the current primary, use db.isMaster() while
connected to any member of the replica set.
3. Issue the rs.conf() method to view the current configuration document and determine the position in the
members array of the member to remove:
Example
mongod_C.example.net is in position 2 of the following configuration document:
{
"_id" : "rs",
"version" : 7,
"members" : [
{
"_id" : 0,
"host" : "mongod_A.example.net:27017"
},
{
"_id" : 1,
"host" : "mongod_B.example.net:27017"
},
{
"_id" : 2,
"host" : "mongod_C.example.net:27017"
}
]
}
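4. Assign the current configuration document to the variable cfg:
cfg = rs.conf()
5. Modify the cfg object to remove the member.
Example
To remove mongod_C.example.net:27017 use the following JavaScript operation:
cfg.members.splice(2,1)
6. Overwrite the replica set configuration document with the new configuration by issuing the following: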
rs.reconfig(cfg)
As a result of rs.reconfig(), the shell will disconnect while the replica set renegotiates which member is
primary. The shell displays a DBClientCursor::init call() failed error even though the command succeeds, and will automatically reconnect.
7. To confirm the new configuration, issue rs.conf().
For the example above the output would be:
{
"_id" : "rs",
"version" : 8,
"members" : [
{
"_id" : 0,
"host" : "mongod_A.example.net:27017"
},
{
"_id" : 1,
"host" : "mongod_B.example.net:27017"
}
]
}
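Removing by array position is easy to get wrong if the configuration changes between viewing it and editing it. A minimal sketch, assuming exact host:port matching, is the hypothetical helper removeMemberByHost() below: it finds the member by host, splices it out of cfg.members in place, and throws if nothing matches so that a typo cannot silently leave the configuration unchanged. It leaves version alone, since the example output above shows the version rising from 7 to 8 after rs.reconfig().

```javascript
// Hypothetical helper: remove the member whose host matches exactly from
// cfg.members (in place) and return it; throw if no member matches.
function removeMemberByHost(cfg, host) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i].host === host) {
      return cfg.members.splice(i, 1)[0];
    }
  }
  throw new Error("no member with host " + host);
}

// The configuration from the example above.
var cfg = {
  _id: "rs",
  version: 7,
  members: [
    { _id: 0, host: "mongod_A.example.net:27017" },
    { _id: 1, host: "mongod_B.example.net:27017" },
    { _id: 2, host: "mongod_C.example.net:27017" }
  ]
};

var removed = removeMemberByHost(cfg, "mongod_C.example.net:27017");
// removed._id is 2; cfg.members now holds only the A and B members.
// In the mongo shell you would then call: rs.reconfig(cfg)
```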
To change the hostname for a replica set member, modify the host field. The value of the _id field will not change when
you reconfigure the set.
See Replica Set Configuration (page 632) and rs.reconfig() for more information.
Note: Any replica set configuration change can trigger the current primary to step down, which forces an election
(page 561). During the election, the current shell session and clients connected to this replica set disconnect, which
produces an error even when the operation succeeds.
Example
To change the hostname to mongo2.example.net for the replica set member configured at members[0], issue
the following sequence of commands:
cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net"
rs.reconfig(cfg)
The priority settings of replica set members affect the outcomes of elections (page 561) for primary. Use this setting
to ensure that some members are more likely to become primary and that others can never become primary.
The value of the member's priority setting determines the member's priority in elections. The higher the number,
the higher the priority.
Considerations
To modify priorities, you update the members array in the replica configuration object. The array index begins with
0. Do not confuse this index value with the value of the replica set member's _id field in the array.
The value of priority can be any floating point (i.e. decimal) number between 0 and 1000. The default value for
the priority field is 1.
To block a member from seeking election as primary, assign it a priority of 0. Hidden members (page 550), delayed
members (page 551), and arbiters all have priority set to 0.
Adjust priority during a scheduled maintenance window. Reconfiguring priority can force the current primary to step
down, leading to an election. Before an election the primary closes all open client connections.
Procedure
Step 1: Copy the replica set configuration to a variable. In the mongo shell, use rs.conf() to retrieve the
replica set configuration and assign it to a variable. For example:
cfg = rs.conf()
Step 2: Change each member's priority value. Change each member's priority value, as configured in the
members array.
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
This sequence of operations modifies the value of cfg to set the priority for the first three members defined in the
members array.
Step 3: Assign the replica set the new configuration. Use rs.reconfig() to apply the new configuration.
rs.reconfig(cfg)
This operation updates the configuration of the replica set using the configuration defined by the value of cfg.
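To catch out-of-range values before calling rs.reconfig(), you could route each change through a validating helper. The sketch below is illustrative: isValidPriority() and setPriority() are hypothetical names, not shell methods, and they encode only the range stated in the Considerations above (a number from 0 to 1000).

```javascript
// Hypothetical helpers: validate a priority against the documented range
// (0 to 1000) and set it on the member at a given array index.
function isValidPriority(p) {
  return typeof p === "number" && isFinite(p) && p >= 0 && p <= 1000;
}

function setPriority(cfg, index, p) {
  if (!isValidPriority(p)) {
    throw new Error("priority must be a number from 0 to 1000, got " + p);
  }
  if (index < 0 || index >= cfg.members.length) {
    throw new Error("no member at array index " + index);
  }
  cfg.members[index].priority = p;
}

// Stand-in for cfg = rs.conf(), with three members.
var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

// The same changes as Step 2 above.
setPriority(cfg, 0, 0.5);
setPriority(cfg, 1, 2);
setPriority(cfg, 2, 2);
```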
Prevent Secondary from Becoming Primary
Overview
In a replica set, by default all secondary members are eligible to become primary through the election process. You
can use the priority to affect the outcome of these elections by making some members more likely to become
primary and other members less likely or unable to become primary.
Secondaries that cannot become primary are also unable to trigger elections. In all other respects these secondaries
are identical to other secondaries.
To prevent a secondary member from ever becoming a primary in a failover, assign the secondary a priority of 0, as
described here. For a detailed description of secondary-only members and their purposes, see Priority 0 Replica Set
Members (page 548).
Considerations
When updating the replica configuration object, access the replica set members in the members array with the array
index. The array index begins with 0. Do not confuse this index value with the value of the _id field in each
document in the members array.
Note: MongoDB does not permit the current primary to have a priority of 0. To prevent the current primary from
again becoming a primary, you must first step down the current primary using rs.stepDown().
Procedure
Step 1: Retrieve the current replica set configuration. The rs.conf() method returns a replica set configuration document (page 632) that contains the current configuration for a replica set.
In a mongo shell connected to a primary, run the rs.conf() method and assign the result to a variable:
cfg = rs.conf()
The returned document contains a members field which contains an array of member configuration documents, one
document for each member of the replica set.
Step 2: Assign priority value of 0. To prevent a secondary member from becoming a primary, update the secondary
member's priority to 0.
To assign a priority value to a member of the replica set, access the member configuration document using the array
index. In this tutorial, the secondary member to change corresponds to the configuration document found at position
2 of the members array.
cfg.members[2].priority = 0
The configuration change does not take effect until you reconfigure the replica set.
Step 3: Reconfigure the replica set. Use rs.reconfig() method to reconfigure the replica set with the updated
replica set configuration document.
Pass the cfg variable to the rs.reconfig() method:
rs.reconfig(cfg)
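Because MongoDB does not permit the current primary to have a priority of 0, you might compare the target member's host with the primary's before editing cfg. The sketch below assumes you pass in the primary's host:port string, for example from the primary field of the db.isMaster() output; setPriorityZero() is a hypothetical helper, not a shell method.

```javascript
// Hypothetical helper: set priority 0 on cfg.members[index], refusing if
// that member is the current primary (primaryHost is "host:port").
function setPriorityZero(cfg, index, primaryHost) {
  var member = cfg.members[index];
  if (member === undefined) {
    throw new Error("no member at array index " + index);
  }
  if (member.host === primaryHost) {
    throw new Error(member.host + " is the current primary; step it down first with rs.stepDown()");
  }
  member.priority = 0;
  return member;
}

// Stand-in for cfg = rs.conf().
var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

// In the shell the primary could come from db.isMaster().primary.
var changed = setPriorityZero(cfg, 2, "mongodb0.example.net:27017");
```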
Related Documents
priority
Adjust Priority for Replica Set Member (page 600)
Replica Set Reconfiguration
Replica Set Elections (page 561)
The most common use of hidden nodes is to support delayed members (page 551). If you only need to prevent a
member from becoming primary, configure a priority 0 member (page 548).
If the chainingAllowed setting allows secondary members to sync from other secondaries, MongoDB by default
prefers non-hidden members over hidden members when selecting a sync target. MongoDB will only choose hidden
members as a last resort. If you want a secondary to sync from a hidden member, use the replSetSyncFrom
database command to override the default sync target. See the documentation for replSetSyncFrom before using
the command.
See also:
Manage Chained Replication (page 621)
Changed in version 2.0: For sharded clusters running with replica sets before 2.0, if you reconfigured a member as
hidden, you had to restart mongos to prevent queries from reaching the hidden member.
Examples
Member Configuration Document To configure a secondary member as hidden, set its priority value to 0 and
set its hidden value to true in its member configuration:
{
"_id" : <num>,
"host" : <hostname:port>,
"priority" : 0,
"hidden" : true
}
Configuration Procedure The following example hides the secondary member currently at the index 0 in the
members array. To configure a hidden member, use the following sequence of operations in a mongo shell connected to the primary, specifying the member to configure by its array index in the members array:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)
After re-configuring the set, this secondary member has a priority of 0 so that it cannot become primary and is hidden.
The other members in the set will not advertise the hidden member in the isMaster or db.isMaster() output.
When updating the replica configuration object, access the replica set members in the members array with the array
index. The array index begins with 0. Do not confuse this index value with the value of the _id field in each
document in the members array.
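Since a hidden member must also have priority 0, you could scan a configuration for members that break that rule before reconfiguring. In this sketch, invalidHiddenMembers() is a hypothetical helper that returns the hosts of hidden members whose priority is not 0. It treats a missing priority as the default value of 1, which is therefore flagged.

```javascript
// Hypothetical helper: list hosts of hidden members whose priority is not 0.
// A member with no priority field has the default priority of 1.
function invalidHiddenMembers(cfg) {
  return cfg.members.filter(function (m) {
    return m.hidden === true && m.priority !== 0;
  }).map(function (m) {
    return m.host;
  });
}

var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017", priority: 0, hidden: true },
    { _id: 1, host: "mongodb1.example.net:27017", hidden: true },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

var problems = invalidHiddenMembers(cfg);
// problems is ["mongodb1.example.net:27017"]: hidden, but default priority 1
```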
Warning:
The rs.reconfig() shell method can force the current primary to step down, which causes an election
(page 561). When the primary steps down, the mongod closes all client connections. While this typically
takes 10-20 seconds, try to make these changes during scheduled maintenance periods.
To successfully reconfigure a replica set, a majority of the members must be accessible. If your replica set
has an even number of members, add an arbiter (page 593) to ensure that members can quickly obtain a
majority of votes in an election for primary.
Related Documents
The following example sets a 1-hour delay on a secondary member currently at the index 0 in the members array. To
set the delay, issue the following sequence of operations in a mongo shell connected to the primary:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)
After the replica set reconfigures, the delayed secondary member cannot become primary and is hidden from applications. The slaveDelay value delays both replication and the member's oplog by 3600 seconds (1 hour).
When updating the replica configuration object, access the replica set members in the members array with the array
index. The array index begins with 0. Do not confuse this index value with the value of the _id field in each
document in the members array.
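The three assignments above always travel together, and slaveDelay is given in seconds. The hypothetical helper makeDelayed() below is a sketch, not a shell method, that applies them in one call. It also takes an optional oplog window in seconds and refuses a delay that does not fit inside it, on the assumption (see Oplog Size) that a delayed member's delay must be shorter than the period the oplog covers.

```javascript
// Hypothetical helper: configure cfg.members[index] as a hidden, priority 0,
// delayed member. delaySecs is in seconds; oplogWindowSecs is optional.
function makeDelayed(cfg, index, delaySecs, oplogWindowSecs) {
  var m = cfg.members[index];
  if (m === undefined) {
    throw new Error("no member at array index " + index);
  }
  if (oplogWindowSecs !== undefined && delaySecs >= oplogWindowSecs) {
    throw new Error("slaveDelay must be shorter than the oplog window");
  }
  m.priority = 0;
  m.hidden = true;
  m.slaveDelay = delaySecs;
  return m;
}

var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" }
  ]
};

// One-hour delay on the member at index 0, with a 24-hour oplog window.
var delayed = makeDelayed(cfg, 0, 3600, 24 * 3600);
// In the mongo shell you would then call: rs.reconfig(cfg)
```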
Warning:
The rs.reconfig() shell method can force the current primary to step down, which causes an election
(page 561). When the primary steps down, the mongod closes all client connections. While this typically
takes 10-20 seconds, try to make these changes during scheduled maintenance periods.
To successfully reconfigure a replica set, a majority of the members must be accessible. If your replica set
has an even number of members, add an arbiter (page 593) to ensure that members can quickly obtain a
majority of votes in an election for primary.
Related Documents
slaveDelay
Replica Set Reconfiguration
Oplog Size (page 573)
Change the Size of the Oplog (page 608) tutorial
Replica Set Elections (page 561)
Configure Non-Voting Replica Set Member
Non-voting members allow you to add additional members for read distribution beyond the maximum seven voting
members. To configure a member as non-voting, set its votes value to 0.
Example
To disable the ability to vote in elections for the fourth, fifth, and sixth replica set members, use the following command
sequence in the mongo shell connected to the primary. You identify each replica set member by its array index in the
members array:
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)
This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members
array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow
them to vote in elections. Place voting members so that your designated primary or primaries can reach a majority of
votes in the event of a network partition.
When updating the replica configuration object, access the replica set members in the members array with the array
index. The array index begins with 0. Do not confuse this index value with the value of the _id field in each
document in the members array.
Warning:
The rs.reconfig() shell method can force the current primary to step down, which causes an election
(page 561). When the primary steps down, the mongod closes all client connections. While this typically
takes 10-20 seconds, try to make these changes during scheduled maintenance periods.
To successfully reconfigure a replica set, a majority of the members must be accessible. If your replica set
has an even number of members, add an arbiter (page 593) to ensure that members can quickly obtain a
majority of votes in an election for primary.
In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks, or
the wrong members from becoming primary. Use priority to control which members are more likely to become
primary.
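When a set grows past seven members, the change above amounts to setting votes to 0 on the extra members. The hypothetical helper capVotingMembers() below is a sketch, not a shell method: it walks the members array in order, keeps the first maxVoters voting members, and sets votes to 0 on the rest, returning the hosts it changed. Array order is an arbitrary choice here. In practice, choose which members vote according to where your designated primary must reach a majority.

```javascript
// Hypothetical helper: give 0 votes to every voting member after the first
// `maxVoters` voting members in array order; return the hosts changed.
// A member with no votes field is treated as having the default 1 vote.
function capVotingMembers(cfg, maxVoters) {
  var voters = 0;
  var changed = [];
  cfg.members.forEach(function (m) {
    var votes = (m.votes === undefined) ? 1 : m.votes;
    if (votes === 0) {
      return; // already non-voting
    }
    if (voters < maxVoters) {
      voters += 1;
    } else {
      m.votes = 0;
      changed.push(m.host);
    }
  });
  return changed;
}

// Nine members with default votes; cap at seven voters.
var cfg = { _id: "rs0", members: [] };
for (var i = 0; i < 9; i++) {
  cfg.members.push({ _id: i, host: "mongodb" + i + ".example.net:27017" });
}
var nowNonVoting = capVotingMembers(cfg, 7);
// nowNonVoting lists mongodb7 and mongodb8; call rs.reconfig(cfg) in the shell.
```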
Related Documents
votes
Replica Set Reconfiguration
1. If your application is connecting directly to the secondary, modify the application so that MongoDB queries
don't reach the secondary.
2. Shut down the secondary.
3. Remove the secondary from the replica set by calling the rs.remove() method. Perform this operation while
connected to the current primary in the mongo shell:
rs.remove("<hostname><:port>")
4. Verify that the replica set no longer includes the secondary by calling the rs.conf() method in the mongo
shell:
rs.conf()
Optional
You may remove the data instead.
6. Create a new, empty data directory to point to when restarting the mongod instance. You can reuse the previous
name. For example:
mkdir /data/db
7. Restart the mongod instance for the secondary, specifying the port number, the empty data directory, and the
replica set. You can use the same port number you used before. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db --replSet rs
8. In the mongo shell convert the secondary to an arbiter using the rs.addArb() method:
rs.addArb("<hostname><:port>")
9. Verify the arbiter belongs to the replica set by calling the rs.conf() method in the mongo shell.
rs.conf()
1. If your application is connecting directly to the secondary or has a connection string referencing the secondary,
modify the application so that MongoDB queries don't reach the secondary.
2. Create a new, empty data directory to be used with the new port number. For example:
mkdir /data/db-temp
3. Start a new mongod instance on the new port number, specifying the new data directory and the existing replica
set. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db-temp --replSet rs
4. In the mongo shell connected to the current primary, convert the new mongod instance to an arbiter using the
rs.addArb() method:
rs.addArb("<hostname><:port>")
5. Verify the arbiter has been added to the replica set by calling the rs.conf() method in the mongo shell.
rs.conf()
8. Verify that the replica set no longer includes the old secondary by calling the rs.conf() method in the mongo
shell:
rs.conf()
Optional
You may remove the data instead.
Perform Maintenance on Replica Set Members (page 610) Perform maintenance on a member of a replica set while
minimizing downtime.
Force a Member to Become Primary (page 611) Force a replica set member to become primary.
Resync a Member of a Replica Set (page 613) Sync the data on a member. Either perform initial sync on a new
member or resync the data on an existing member that has fallen too far behind to catch up by way of normal
replication.
Configure Replica Set Tag Sets (page 614) Assign tags to replica set members for use in targeting read and write
operations to specific members.
Reconfigure a Replica Set with Unavailable Members (page 618) Reconfigure a replica set when a majority of
replica set members are down or unreachable.
Manage Chained Replication (page 621) Disable or enable chained replication. Chained replication occurs when a
secondary replicates from another secondary instead of the primary.
Change Hostnames in a Replica Set (page 622) Update the replica set configuration to reflect changes in members'
hostnames.
Configure a Secondary's Sync Target (page 625) Specify the member that a secondary member synchronizes from.
Change the Size of the Oplog
The oplog exists internally as a capped collection, so you cannot modify its size in the course of normal operations. In
most cases the default oplog size (page 573) is an acceptable size; however, in some situations you may need a larger
or smaller oplog. For example, you might need to change the oplog size if your applications perform large numbers of
multi-updates or deletes in short periods of time.
This tutorial describes how to resize the oplog. For a detailed explanation of oplog sizing, see Oplog Size (page 573).
For details on how oplog size affects delayed members and replication lag, see Delayed Replica Set Members
(page 551).
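When choosing a new size, one rough approach is to scale the space the current oplog uses by the ratio of the window you want to the window you have. This assumes your write rate stays similar. The sketch below is illustrative: estimateOplogBytes() is a hypothetical helper, and it assumes you take usedMB and timeDiffHours from db.getReplicationInfo() in your shell version. It returns an estimate in bytes, the unit the capped collection size argument uses.

```javascript
// Hypothetical estimate of an oplog size, in bytes, that would cover
// `targetHours` at the current write rate.
// usedMB: space the current oplog entries occupy (assumed source:
//   db.getReplicationInfo().usedMB)
// windowHours: hours those entries span (assumed source:
//   db.getReplicationInfo().timeDiffHours)
function estimateOplogBytes(usedMB, windowHours, targetHours) {
  if (windowHours <= 0) {
    throw new Error("oplog window must be positive");
  }
  var mbPerHour = usedMB / windowHours;
  return Math.ceil(mbPerHour * targetHours * 1024 * 1024);
}

// 990 MB covering 11 hours is 90 MB per hour; 72 hours needs 6480 MB.
var bytes = estimateOplogBytes(990, 11, 72);
```

If the current oplog has not yet filled, its window understates the space it will eventually cover, so treat the result as a starting point rather than a final figure.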
Overview
To change the size of the oplog, you must perform maintenance on each member of the replica set in turn. The
procedure requires: stopping the mongod instance and starting as a standalone instance, modifying the oplog size,
and restarting the member.
Important: Always start rolling replica set maintenance with the secondaries, and finish with the maintenance on the
primary member.
Procedure
Restart a Secondary in Standalone Mode on a Different Port Shut down the mongod instance for one of the
non-primary members of your replica set. For example, to shut down, use the db.shutdownServer() method:
db.shutdownServer()
Restart this mongod as a standalone instance running on a different port and without the --replSet parameter. Use
a command similar to the following:
mongod --port 37017 --dbpath /srv/mongodb
Create a Backup of the Oplog (Optional) Optionally, back up the existing oplog on the standalone instance, as in
the following example:
mongodump --db local --collection 'oplog.rs' --port 37017
Recreate the Oplog with a New Size and a Seed Entry Save the last entry from the oplog. For example, connect
to the instance using the mongo shell, and enter the following command to switch to the local database:
use local
In mongo shell scripts you can use the following operation to set the db object:
db = db.getSiblingDB('local')
Ensure that the temp temporary collection is empty by dropping the collection:
db.temp.drop()
Use the db.collection.save() method and a sort on reverse natural order to find the last entry and save it to a
temporary collection:
db.temp.save( db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next() )
Remove the Existing Oplog Collection Drop the old oplog.rs collection in the local database. Use the following command:
db = db.getSiblingDB('local')
db.oplog.rs.drop()
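Create a New Oplog Use the create command to create a new oplog of a different size. Specify the size argument
in bytes. For example, a value of 2 * 1024 * 1024 * 1024 bytes creates a 2-gigabyte oplog:
db.runCommand( { create: "oplog.rs", capped: true, size: (2 * 1024 * 1024 * 1024) } )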
Insert the Last Entry of the Old Oplog into the New Oplog Insert the previously saved last entry from the old
oplog into the new oplog. For example:
db.oplog.rs.save( db.temp.findOne() )
To confirm the entry is in the new oplog, use the following operation:
db.oplog.rs.find()
Restart the Member
Restart the mongod as a member of the replica set on its usual port. For example:
db.shutdownServer()
mongod --replSet rs0 --dbpath /srv/mongodb
The replica set member will recover and catch up before it is eligible for election to primary.
Repeat Process for all Members that may become Primary
Repeat this procedure for all members whose oplog size you want to change. Repeat the procedure for the primary
as part of the following step.
Change the Size of the Oplog on the Primary
To finish the rolling maintenance operation, step down the primary
with the rs.stepDown() method and repeat the oplog resizing procedure above.
Perform Maintenance on Replica Set Members
Overview
Replica sets allow a MongoDB deployment to remain available during the majority of a maintenance window.
This document outlines the basic procedure for performing maintenance on each of the members of a replica set.
Furthermore, this particular sequence strives to minimize the amount of time that the primary is unavailable and to
control the impact on the entire deployment.
Use these steps as the basis for common replica set operations, particularly for procedures such as upgrading to the
latest version of MongoDB (page 233) and changing the size of the oplog (page 608).
Procedure
For each member of a replica set, starting with a secondary member, perform the following sequence of events, ending
with the primary:
Restart the mongod instance as a standalone.
Perform the task on the standalone instance.
Restart the mongod instance as a member of the replica set.
Step 1: Stop a secondary. In the mongo shell, shut down the mongod instance:
db.shutdownServer()
Step 2: Restart the secondary as a standalone on a different port. At the operating system shell prompt, restart
mongod as a standalone instance running on a different port and without the --replSet parameter:
mongod --port 37017 --dbpath /srv/mongodb
Always start mongod with the same user, even when restarting a replica set member as a standalone instance.
Step 3: Perform maintenance operations on the secondary. While the member is a standalone, use the mongo
shell to perform maintenance:
mongo --port 37017
Step 4: Restart mongod as a member of the replica set. After performing all maintenance tasks, use the following
procedure to restart the mongod as a member of the replica set on its usual port.
From the mongo shell, shut down the standalone server after completing the maintenance:
db.shutdownServer()
Restart the mongod instance as a member of the replica set using its normal command-line arguments or configuration
file.
The secondary takes time to catch up to the primary (page 574). From the mongo shell, use the following command
to verify that the member has caught up from the RECOVERING (page 636) state to the SECONDARY (page 635) state.
rs.status()
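The state check above can also be done programmatically. The following is a minimal sketch that assumes a document shaped like the rs.status() output and reads only the members[n].name, stateStr and optimeDate fields; the function memberCaughtUp, its maxLagSecs threshold and the sample values are hypothetical, not part of MongoDB. It is written in ES5-style JavaScript so that it runs under Node.js; the assumption is that the same function could be pasted into the mongo shell and called with rs.status().

```javascript
// Hypothetical helper: report whether a member has left RECOVERING and
// caught up with the primary, given a document shaped like rs.status().
function memberCaughtUp(status, memberName, maxLagSecs) {
  var member = null, primary = null;
  status.members.forEach(function (m) {
    if (m.name === memberName) { member = m; }
    if (m.stateStr === "PRIMARY") { primary = m; }
  });
  if (member === null || primary === null) {
    return { caughtUp: false, state: member ? member.stateStr : null, lagSecs: null };
  }
  // Subtracting two Date objects yields a difference in milliseconds.
  var lagSecs = (primary.optimeDate - member.optimeDate) / 1000;
  return {
    caughtUp: member.stateStr === "SECONDARY" && lagSecs <= maxLagSecs,
    state: member.stateStr,
    lagSecs: lagSecs
  };
}

// Illustrative input only; in the mongo shell, pass rs.status() instead.
var sampleStatus = {
  set: "rs0",
  members: [
    { name: "m1.example.net:27017", stateStr: "PRIMARY",
      optimeDate: new Date("2015-03-01T12:00:10Z") },
    { name: "m2.example.net:27017", stateStr: "RECOVERING",
      optimeDate: new Date("2015-03-01T12:00:00Z") }
  ]
};

console.log(JSON.stringify(memberCaughtUp(sampleStatus, "m2.example.net:27017", 5)));
// {"caughtUp":false,"state":"RECOVERING","lagSecs":10}
```

Re-running the check until caughtUp is true mirrors repeatedly calling rs.status() by hand.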
Step 5: Perform maintenance on the primary last. To perform maintenance on the primary after completing
maintenance tasks on all secondaries, use rs.stepDown() in the mongo shell to step down the primary and allow
one of the secondaries to be elected the new primary. Specify a 300 second waiting period to prevent the member from
being elected primary again for five minutes:
rs.stepDown(300)
After the primary steps down, the replica set will elect a new primary. See Replica Set Elections (page 561) for more
information about replica set elections.
Force a Member to Become Primary
Overview
You can force a replica set member to become primary by giving it a higher priority value than any other member
in the set.
Optionally, you also can force a member never to become primary by setting its priority value to 0, which means
the member can never seek election (page 561) as primary. For more information, see Priority 0 Replica Set Members
(page 548).
Consideration
A majority of the configured members of a replica set must be available for the set to reconfigure or to elect a primary.
See Replica Set Elections (page 561) for more information.
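The majority rule can be made concrete with a little arithmetic. This sketch assumes that every configured member has exactly one vote (members with votes set to 0 would not count toward the total); the function names are hypothetical.

```javascript
// Sketch: majority arithmetic for a replica set in which every configured
// member has one vote (an assumption; non-voting members are excluded).
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True if enough members are reachable to elect a primary or reconfigure.
function canElectOrReconfigure(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

console.log(majorityNeeded(3));            // 2
console.log(majorityNeeded(4));            // 3
console.log(canElectOrReconfigure(5, 2));  // false
```

Note that a four-member set needs three reachable members, so it tolerates no more failures than a three-member set.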
Procedures
Force a Member to be Primary by Setting its Priority High
Changed in version 2.0.
For more information on priorities, see priority.
This procedure assumes your current primary is m1.example.net and that you'd like to make m3.example.net
primary instead. The procedure also assumes you have a three-member replica set with the following configuration
(for more information on configurations, see Replica Set Configuration Use):
{
"_id" : "rs",
"version" : 7,
"members" : [
{
"_id" : 0,
"host" : "m1.example.net:27017"
},
{
"_id" : 1,
"host" : "m2.example.net:27017"
},
{
"_id" : 2,
"host" : "m3.example.net:27017"
}
]
}
1. In the mongo shell, use the following sequence of operations to make m3.example.net the primary:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 0.5
cfg.members[2].priority = 1
rs.reconfig(cfg)
This prevents m1.example.net from being primary for 86,400 seconds (24 hours), even if there is no other
member that can become primary. When m3.example.net catches up with m1.example.net it will
become primary.
If you later want to make m1.example.net primary again while it waits for m3.example.net to catch
up, issue the following command to make m1.example.net seek election again:
rs.freeze()
3. In a mongo shell connected to the mongod running on mdb0.example.net, step down this instance so that the
mongod is not eligible to become primary for 120 seconds:
rs.stepDown(120)
Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure
can replace the data more quickly but requires more manual steps.
See Sync by Copying Data Files from Another Member (page 614).
Procedures
Warning: During initial sync, mongod will remove the content of the dbPath.
This procedure relies on MongoDB's regular process for initial sync (page 574). This will store the current data on the
member. For an overview of the MongoDB initial sync process, see the Replication Processes (page 573) section.
If the instance has no data, you can simply follow the Add Members to a Replica Set (page 595) or Replace a Replica
Set Member (page 599) procedure to add a new member to a replica set.
You can also force a mongod that is already a member of the set to perform an initial sync by restarting the instance
without the content of the dbPath as follows:
1. Stop the member's mongod instance. To ensure a clean shutdown, use the db.shutdownServer() method
from the mongo shell or, on Linux systems, the mongod --shutdown option.
2. Delete all data and sub-directories from the member's data directory. By removing the contents of the dbPath,
MongoDB will perform a complete resync. Consider making a backup first.
At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of
the database and network connection between members of the replica set.
Initial sync operations can affect the other members of the set and create additional traffic to the primary. An initial
sync can only occur if another member of the set is accessible and up to date.
Sync by Copying Data Files from Another Member
This approach seeds a new or stale member using the data
files from an existing member of the replica set. The data files must be sufficiently recent to allow the new member to
catch up with the oplog. Otherwise the member would need to perform an initial sync.
Copy the Data Files
You can capture the data files as either a snapshot or a direct copy. However, in most cases you
cannot copy data files from a running mongod instance to another because the data files will change during the file
copy operation.
Important: If copying data files, you must copy the content of the local database.
You cannot use a mongodump backup for the data files, only a snapshot backup. For approaches to capturing a
consistent snapshot of a running mongod instance, see the MongoDB Backup Methods (page 182) documentation.
Sync the Member
After you have copied the data files from the seed source, start the mongod instance and allow
it to apply all operations from the oplog until it reflects the current state of the replica set.
Configure Replica Set Tag Sets
Tag sets let you customize write concern and read preferences for a replica set. MongoDB stores tag sets in the replica
set configuration object, which is the document returned by rs.conf(), in the members[n].tags embedded
document.
This section introduces the configuration of tag sets. For an overview on tag sets and their use, see Replica Set Write
Concern (page 79) and Tag Sets (page 570).
Differences Between Read Preferences and Write Concerns
Custom read preferences and write concerns evaluate tag sets in different ways:
Read preferences consider the value of a tag when selecting a member to read from.
Write concerns do not use the value of a tag to select a member except to consider whether or not the value is
unique.
For example, a tag set for a read operation may resemble the following document:
{ "disk": "ssd", "use": "reporting" }
To fulfill such a read operation, a member would need to have both of these tags. Any of the following tag sets would
satisfy this requirement:
{ "disk": "ssd", "use": "reporting" }
{ "disk": "ssd", "use": "reporting", "rack": "a" }
{ "disk": "ssd", "use": "reporting", "rack": "d" }
{ "disk": "ssd", "use": "reporting", "mem": "r" }
The following tag sets would not be able to fulfill this query:
{ "disk": "ssd" }
{ "use": "reporting" }
{ "disk": "ssd", "use": "production" }
{ "disk": "ssd", "use": "production", "rack": "k" }
{ "disk": "spinning", "use": "reporting", "mem": "32" }
You could add tag sets to the members of this replica set with the following command sequence in the mongo shell:
conf = rs.conf()
conf.members[0].tags = { "dc": "east", "use": "production" }
conf.members[1].tags = { "dc": "east", "use": "reporting" }
conf.members[2].tags = { "use": "production" }
rs.reconfig(conf)
After this operation the output of rs.conf() would resemble the following:
{
"_id" : "rs0",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "mongodb0.example.net:27017",
"tags" : {
"dc": "east",
"use": "production"
}
},
{
"_id" : 1,
"host" : "mongodb1.example.net:27017",
"tags" : {
"dc": "east",
"use": "reporting"
}
},
{
"_id" : 2,
"host" : "mongodb2.example.net:27017",
"tags" : {
"use": "production"
}
}
]
}
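To illustrate how a read preference evaluates a tag set, the following sketch implements the matching rule described earlier in this section: a member qualifies only if it carries every tag in the tag set with the same value, and any extra tags on the member are ignored. The function matchesTagSet is hypothetical and is not how the drivers are implemented internally; the member tags are taken from the rs.conf() output above.

```javascript
// Sketch of read-preference tag matching: every tag in the tag set must be
// present on the member with the same value; extra member tags are ignored.
function matchesTagSet(memberTags, tagSet) {
  var tags = memberTags || {};
  return Object.keys(tagSet).every(function (key) {
    return Object.prototype.hasOwnProperty.call(tags, key) && tags[key] === tagSet[key];
  });
}

var wanted = { "disk": "ssd", "use": "reporting" };
console.log(matchesTagSet({ "disk": "ssd", "use": "reporting", "rack": "a" }, wanted)); // true
console.log(matchesTagSet({ "disk": "ssd", "use": "production" }, wanted));            // false

// Members tagged as in the rs.conf() output above.
var confMembers = [
  { _id: 0, tags: { "dc": "east", "use": "production" } },
  { _id: 1, tags: { "dc": "east", "use": "reporting" } },
  { _id: 2, tags: { "use": "production" } }
];
var production = confMembers.filter(function (m) {
  return matchesTagSet(m.tags, { "use": "production" });
});
console.log(production.map(function (m) { return m._id; })); // [ 0, 2 ]
```

Under this rule an empty tag set, {}, matches every member.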
Given a five-member replica set with members in two data centers:
1. a facility VA tagged dc.va
2. a facility GTO tagged dc.gto
Create a custom write concern to require confirmation from two data centers using replica set tags, using the following
sequence of operations in the mongo shell:
1. Create a replica set configuration JavaScript object conf:
conf = rs.conf()
2. Add tags to the replica set members reflecting their locations:
conf.members[0].tags = { "dc.va": "rack1"}
conf.members[1].tags = { "dc.va": "rack2"}
conf.members[2].tags = { "dc.gto": "rack1"}
conf.members[3].tags = { "dc.gto": "rack2"}
conf.members[4].tags = { "dc.va": "rack1"}
rs.reconfig(conf)
3. Create a custom getLastErrorModes setting to ensure that the write operation will propagate to at least
one member of each facility:
conf.settings = { getLastErrorModes: { MultipleDC : { "dc.va": 1, "dc.gto": 1 } } }
4. Reconfigure the replica set using the modified conf configuration object:
rs.reconfig(conf)
To ensure that a write operation propagates to at least one member of the set in both data centers, use the MultipleDC
write concern mode as follows:
db.users.insert( { id: "xyz", status: "A" }, { writeConcern: { w: "MultipleDC" } } )
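Unlike read preferences, write concern modes count distinct tag values rather than matching them. The following minimal sketch, built around the hypothetical function satisfiesMode, checks whether the members that acknowledged a write satisfy a getLastErrorModes entry; it assumes those members are given as configuration documents with their tags, using the five-member example above.

```javascript
// Sketch: does a set of acknowledging members satisfy a getLastErrorModes
// mode? For each tag in the mode, count the DISTINCT values of that tag
// among the acknowledging members and compare with the required number.
function satisfiesMode(mode, ackedMembers) {
  return Object.keys(mode).every(function (tag) {
    var seen = {};
    var distinct = 0;
    ackedMembers.forEach(function (m) {
      if (m.tags && Object.prototype.hasOwnProperty.call(m.tags, tag)) {
        var value = m.tags[tag];
        if (!Object.prototype.hasOwnProperty.call(seen, value)) {
          seen[value] = true;
          distinct += 1;
        }
      }
    });
    return distinct >= mode[tag];
  });
}

// Tags from the five-member example above.
var members = [
  { _id: 0, tags: { "dc.va": "rack1" } },
  { _id: 1, tags: { "dc.va": "rack2" } },
  { _id: 2, tags: { "dc.gto": "rack1" } },
  { _id: 3, tags: { "dc.gto": "rack2" } },
  { _id: 4, tags: { "dc.va": "rack1" } }
];

console.log(satisfiesMode({ "dc.va": 1, "dc.gto": 1 }, [members[0], members[2]])); // true
// Members 0 and 4 share the value "rack1", so they count as one dc.va value.
console.log(satisfiesMode({ "dc.va": 2, "dc.gto": 2 },
  [members[0], members[4], members[2], members[3]])); // false
```

This distinct-value counting is why the "racks" in the example below are expressed as tag values rather than as separate tags.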
Alternatively, if you want to ensure that each write operation propagates to at least 2 racks in each facility, reconfigure
the replica set as follows in the mongo shell:
1. Create a replica set configuration object conf:
conf = rs.conf()
2. Redefine the getLastErrorModes value to require two different values of both dc.va and dc.gto:
conf.settings = { getLastErrorModes: { MultipleDC : { "dc.va": 2, "dc.gto": 2 } } }
3. Reconfigure the replica set using the modified conf configuration object:
rs.reconfig(conf)
Now, the following write operation will only return after the write operation propagates to at least two different racks
in each facility:
Changed in version 2.6: A new protocol for write operations (page 815) integrates write concerns with the write
operations. Previous versions used the getLastError command to specify the write concerns.
db.users.insert( { id: "xyz", status: "A" }, { writeConcern: { w: "MultipleDC" } } )
Configure Tag Sets for Functional Segregation of Read and Write Operations
13 Since read preferences and write concerns use the value of fields in tag sets differently, larger deployments may have some redundancy.
To target a read operation to a member of the replica set with a disk type of ssd, you could use the following tag set:
{ disk: "ssd" }
However, to create comparable write concern modes, you would specify a different set of getLastErrorModes
configuration. Consider the following sequence of operations in the mongo shell:
1. Create a replica set configuration object conf:
conf = rs.conf()
3. Reconfigure the replica set using the modified conf configuration object:
rs.reconfig(conf)
Now you can specify the MultipleDC write concern mode, as in the following, to ensure that a write operation
propagates to each data center.
Changed in version 2.6: A new protocol for write operations (page 815) integrates write concerns with the write
operations. Previous versions used the getLastError command to specify the write concerns.
db.users.insert( { id: "xyz", status: "A" }, { writeConcern: { w: "MultipleDC" } } )
Additionally, you can specify the ssd write concern mode to ensure that a write operation propagates to at least one
instance with an SSD.
Reconfigure a Replica Set with Unavailable Members
To reconfigure a replica set when a majority of members are available, use the rs.reconfig() operation on the
current primary, following the example in the Replica Set Reconfiguration Procedure.
This document provides the following options for re-configuring a replica set when only a minority of members are
accessible:
Reconfigure by Forcing the Reconfiguration (page 619)
Reconfigure by Replacing the Replica Set (page 619)
You may need to use one of these procedures, for example, in a geographically distributed replica set, where no local
group of members can reach a majority. See Replica Set Elections (page 561) for more information on this situation.
3. On the same member, remove the down and unreachable members of the replica set from the members array
by setting the array equal to the surviving members alone. Consider the following example, which uses the cfg
variable created in the previous step:
cfg.members = [cfg.members[0] , cfg.members[4] , cfg.members[7]]
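Rather than hard-coding array indexes, the members array can be filtered by hostname. This is a minimal sketch that assumes you already know which hosts are still reachable; the keepSurvivors function, the reachableHosts list and the hostnames are hypothetical.

```javascript
// Sketch: keep only the members whose host appears in a list of hosts known
// to be up. `reachableHosts` is supplied by the operator (hypothetical input).
function keepSurvivors(cfg, reachableHosts) {
  cfg.members = cfg.members.filter(function (m) {
    return reachableHosts.indexOf(m.host) !== -1;
  });
  return cfg;
}

// Illustrative eight-member configuration with hypothetical hostnames.
var cfgExample = { _id: "rs0", version: 12, members: [] };
for (var i = 0; i < 8; i++) {
  cfgExample.members.push({ _id: i, host: "m" + i + ".example.net:27017" });
}

keepSurvivors(cfgExample, ["m0.example.net:27017", "m4.example.net:27017", "m7.example.net:27017"]);
console.log(cfgExample.members.map(function (m) { return m._id; })); // [ 0, 4, 7 ]
```

The filtered cfg would then be passed to rs.reconfig(cfg, {force : true}) as in step 4.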
4. On the same member, reconfigure the set by using the rs.reconfig() command with the force option set
to true:
rs.reconfig(cfg, {force : true})
This operation forces the secondary to use the new configuration. The configuration is then propagated to all the
surviving members listed in the members array. The replica set then elects a new primary.
Note: When you use force : true, the version number in the replica set configuration increases significantly, by tens or hundreds of thousands. This is normal and designed to prevent set version collisions if you
accidentally force re-configurations on both sides of a network partition and then the network partitioning ends.
5. If the failure or partition was only temporary, shut down or decommission the removed members as soon as
possible.
Reconfigure by Replacing the Replica Set
Use the following procedure only for versions of MongoDB prior to version 2.0. If you're running MongoDB 2.0 or
later, use the above procedure, Reconfigure by Forcing the Reconfiguration (page 619).
These procedures are for situations where a majority of the replica set members are down or unreachable. If a majority
is running, then skip these procedures and instead use the rs.reconfig() command according to the examples in
replica-set-reconfiguration-usage.
If you run a pre-2.0 version and a majority of your replica set is down, you have the two options described here. Both
involve replacing the replica set.
Reconfigure by Turning Off Replication
This option replaces the replica set with a standalone server.
1. Stop the surviving mongod instances. To ensure a clean shutdown, use an existing control script or use the
db.shutdownServer() method.
For example, to use the db.shutdownServer() method, connect to the server using the mongo shell and
issue the following sequence of commands:
use admin
db.shutdownServer()
2. Create a backup of the data directory (i.e. dbPath) of the surviving members of the set.
Optional: If you have a backup of the database, you may instead remove this data.
3. Restart one of the mongod instances without the --replSet parameter.
The data is now accessible and provided by a single server that is not a replica set member. Clients can use this
server for both reads and writes.
When possible, re-deploy a replica set to provide redundancy and to protect your deployment from operational interruption.
Reconfigure by Breaking the Mirror
This option selects a surviving replica set member to be the new primary
and to seed a new replica set. In the following procedure, the new primary is db0.example.net. MongoDB
copies the data from db0.example.net to all the other members.
1. Stop the surviving mongod instances. To ensure a clean shutdown, use an existing control script or use the
db.shutdownServer() method.
For example, to use the db.shutdownServer() method, connect to the server using the mongo shell and
issue the following sequence of commands:
use admin
db.shutdownServer()
2. Move the data directories (i.e. dbPath) for all the members except db0.example.net, so that all the
members except db0.example.net have empty data directories. For example:
mv /data/db /data/db-old
3. Move the data files for the local database (i.e. local.*) so that db0.example.net has no local database.
For example:
mkdir /data/local-old
mv /data/db/local* /data/local-old/
MongoDB performs an initial sync on the added members by copying all data from db0.example.net to
the added members.
See also:
Resync a Member of a Replica Set (page 613)
To disable chained replication, set the chainingAllowed field in Replica Set Configuration (page 632) to false.
You can use the following sequence of commands to set chainingAllowed to false:
1. Copy the configuration settings into the cfg object:
cfg = rs.config()
2. Take note of whether the current configuration settings contain the settings embedded document. If they do,
skip this step.
Warning: To avoid data loss, skip this step if the configuration settings contain the settings embedded
document.
If the current configuration settings do not contain the settings embedded document, create the embedded
document by issuing the following command:
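cfg.settings = { }
3. Issue the following sequence of commands to set chainingAllowed to false:
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)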
To re-enable chained replication, set chainingAllowed to true. You can use the following sequence of commands:
cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
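The two cases in step 2 can be folded into one function that creates the settings document only when it is missing, so that existing settings such as getLastErrorModes survive. The helper setChainingAllowed and the sample configuration are hypothetical.

```javascript
// Sketch: set chainingAllowed without discarding existing settings. The
// settings document is created only when it is missing, which mirrors the
// warning in step 2 above.
function setChainingAllowed(cfg, allowed) {
  if (cfg.settings === undefined || cfg.settings === null) {
    cfg.settings = {};
  }
  cfg.settings.chainingAllowed = allowed;
  return cfg;
}

// Illustrative configuration that already has a custom write concern mode.
var cfgWithSettings = {
  _id: "rs0", version: 3, members: [],
  settings: { getLastErrorModes: { MultipleDC: { "dc.va": 1, "dc.gto": 1 } } }
};
setChainingAllowed(cfgWithSettings, false);
console.log(JSON.stringify(cfgWithSettings.settings));
// {"getLastErrorModes":{"MultipleDC":{"dc.va":1,"dc.gto":1}},"chainingAllowed":false}
```

In the mongo shell, the assumed usage would be rs.reconfig(setChainingAllowed(rs.config(), false)).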
Overview
This document provides two separate procedures for changing the hostnames in the host field. Use either of the
following approaches:
Change hostnames without disrupting availability (page 623). This approach ensures your applications will
always be able to read and write data to the replica set, but the approach can take a long time and may incur
downtime at the application layer.
If you use the first procedure, you must configure your applications to connect to the replica set at both the old
and new locations, which often requires a restart and reconfiguration at the application layer and which may
affect the availability of your applications. Re-configuring applications is beyond the scope of this document.
Stop all members running on the old hostnames at once (page 624). This approach has a shorter maintenance
window, but the replica set will be unavailable during the operation.
See also:
Replica Set Reconfiguration Process, Deploy a Replica Set (page 583), and Add Members to a Replica Set (page 595).
Assumptions
(d) Use rs.reconfig() to update the replica set configuration document (page 632) with the new hostname.
For example, the following sequence of commands updates the hostname for the secondary at the array
index 1 of the members array (i.e. members[1]) in the replica set configuration document:
cfg = rs.conf()
cfg.members[1].host = "mongodb1.example.net:27017"
rs.reconfig(cfg)
3. For each member of the replica set, perform the following sequence of operations:
(a) Open a mongo shell connected to the mongod running on the new, temporary port. For example, for a
member running on a temporary port of 37017, you would issue this command:
mongo --port 37017
(b) Edit the replica set configuration manually. The replica set configuration is the only document in the
system.replset collection in the local database. Edit the replica set configuration with the new
hostnames and correct ports for all the members of the replica set. Consider the following sequence of
commands to change the hostnames in a three-member set:
use local
cfg = db.system.replset.findOne( { "_id": "rs" } )
cfg.members[0].host = "mongodb0.example.net:27017"
cfg.members[1].host = "mongodb1.example.net:27017"
cfg.members[2].host = "mongodb2.example.net:27017"
db.system.replset.update( { "_id": "rs" } , cfg )
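When every member changes at once, an explicit old-to-new map is less error-prone than editing each members[n].host by hand. A minimal sketch, in which the renameHosts function and the old database*.example.com names are hypothetical:

```javascript
// Sketch: apply an old-to-new hostname map to a replica set configuration
// document, leaving members that are not in the map unchanged.
function renameHosts(cfg, hostMap) {
  cfg.members.forEach(function (m) {
    if (Object.prototype.hasOwnProperty.call(hostMap, m.host)) {
      m.host = hostMap[m.host];
    }
  });
  return cfg;
}

// Hypothetical old hostnames.
var renameExample = {
  _id: "rs", version: 5,
  members: [
    { _id: 0, host: "database0.example.com:27017" },
    { _id: 1, host: "database1.example.com:27017" },
    { _id: 2, host: "database2.example.com:27017" }
  ]
};
renameHosts(renameExample, {
  "database0.example.com:27017": "mongodb0.example.net:27017",
  "database1.example.com:27017": "mongodb1.example.net:27017",
  "database2.example.com:27017": "mongodb2.example.net:27017"
});
console.log(renameExample.members.map(function (m) { return m.host; }).join(", "));
// mongodb0.example.net:27017, mongodb1.example.net:27017, mongodb2.example.net:27017
```

The resulting cfg would then be written back with db.system.replset.update( { "_id": "rs" } , cfg ) as in step (b).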
5. Connect to one of the mongod instances using the mongo shell. For example:
mongo --port 27017
Secondaries capture data from the primary member to maintain an up-to-date copy of the set's data. However, by
default secondaries may automatically change their sync targets to secondary members based on changes in the ping
time between members and the state of other members' replication. See Replica Set Data Synchronization (page 574)
and Manage Chained Replication (page 621) for more information.
For some deployments, implementing a custom replication sync topology may be more effective than the default sync
target selection logic. MongoDB provides the ability to specify a host to use as a sync target.
To override the default sync target selection logic, you may manually configure a secondary member's sync target to
temporarily pull oplog entries. The following provide access to this functionality:
replSetSyncFrom command, or
rs.syncFrom() helper in the mongo shell
Considerations
Sync Logic
Only modify the default sync logic as needed, and always exercise caution. rs.syncFrom() will
not affect an in-progress initial sync operation. To affect the sync target for the initial sync, run the rs.syncFrom()
operation before initial sync.
9.3. Replica Set Tutorials
If you run rs.syncFrom() during initial sync, MongoDB produces no error messages, but the sync target will not
change until after the initial sync operation.
Persistence
replSetSyncFrom and rs.syncFrom() provide a temporary override of default behavior.
mongod will revert to the default sync behavior in the following situations:
The mongod instance restarts.
The connection between the mongod and the sync target closes.
Changed in version 2.4: The sync target falls more than 30 seconds behind another member of the replica set; the
mongod will revert to the default sync target.
Target
The member to sync from must be a valid source for data in the set. To sync from a member, the member
must:
Have data. It cannot be an arbiter, in startup or recovering mode, and must be able to answer data queries.
Be accessible.
Be a member of the same set in the replica set configuration.
Build indexes with the buildIndexes setting.
A different member of the set, to prevent syncing from itself.
If you attempt to replicate from a member that is more than 10 seconds behind the current member, mongod will log
a warning but will still replicate from the lagging member.
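The target requirements listed above, plus the 10-second lag warning, can be expressed as a checklist. This is a sketch only: each argument is a hypothetical object that merges fields from rs.status() (name, stateStr, health, optimeDate) with buildIndexes from rs.conf(), the function checkSyncTarget is not a MongoDB API, and the same-set requirement is not checked.

```javascript
// Sketch of the sync-target checks listed above. `member` is the member that
// will sync; `target` is the proposed sync source.
function checkSyncTarget(member, target) {
  var problems = [];
  var warnings = [];
  if (target.name === member.name) {
    problems.push("a member cannot sync from itself");
  }
  // Arbiters and members in STARTUP, STARTUP2 or RECOVERING cannot serve data.
  if (target.stateStr !== "PRIMARY" && target.stateStr !== "SECONDARY") {
    problems.push("target state is " + target.stateStr);
  }
  if (target.health !== 1) {
    problems.push("target is not accessible");
  }
  if (target.buildIndexes === false) {
    problems.push("target does not build indexes");
  }
  // How far the target trails this member; mongod only warns about this.
  var lagSecs = (member.optimeDate - target.optimeDate) / 1000;
  if (lagSecs > 10) {
    warnings.push("target is " + lagSecs + " seconds behind this member");
  }
  return { ok: problems.length === 0, problems: problems, warnings: warnings };
}

var thisMember = { name: "m2.example.net:27017", stateStr: "SECONDARY", health: 1,
                   optimeDate: new Date("2015-03-01T12:00:30Z") };
var candidate = { name: "m3.example.net:27017", stateStr: "SECONDARY", health: 1,
                  buildIndexes: true, optimeDate: new Date("2015-03-01T12:00:15Z") };
console.log(JSON.stringify(checkSyncTarget(thisMember, candidate)));
// {"ok":true,"problems":[],"warnings":["target is 15 seconds behind this member"]}
```

If the check passes, the operator would run rs.syncFrom("m3.example.net:27017") on the syncing member.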
If you run replSetSyncFrom during initial sync, MongoDB produces no error messages, but the sync target will
not change until after the initial sync operation.
Procedure
A delayed member (page 551) may show as 0 seconds behind the primary when the inactivity period on the
primary is greater than the slaveDelay value.
Note: The rs.status() method is a wrapper around the replSetGetStatus database command.
Monitor the rate of replication by watching the oplog time in the replica graph in the MongoDB Management
Service14. For more information, see the documentation for MMS15.
Possible causes of replication lag include:
Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network
routing issue.
Use tools including ping to test latency between set members and traceroute to expose the routing of
packets between network endpoints.
Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then
the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant
systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP
network (as is the case with Amazon's EBS system).
Use system-level tools to assess disk status, including iostat or vmstat.
Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results,
configure write concern (page 76) to require confirmation of replication to secondaries, as described in replica
set write concern (page 79). This prevents write operations from returning if replication cannot keep up with
the write load.
Use the database profiler to see if there are slow queries or long-running operations that correspond to the
incidences of lag.
14 https://mms.mongodb.com/
15 https://docs.mms.mongodb.com/
2. Test the connection from m2.example.net to the other two hosts with the following operation set from
m2.example.net, as in:
mongo --host m1.example.net --port 27017
mongo --host m3.example.net --port 27017
You have now tested the connection between m2.example.net and m1.example.net in both directions.
3. Test the connection from m3.example.net to the other two hosts with the following operation set from the
m3.example.net host, as in:
mongo --host m1.example.net --port 27017
mongo --host m2.example.net --port 27017
If any connection fails, in either direction, check your networking and firewall configuration and reconfigure your environment to allow these connections.
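configured oplog size:   10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time:  Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time:   Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now:                     Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)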
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a
minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72
hours or even a week's worth of operations.
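The log length in the replication summary above (as printed by the rs.printReplicationInfo() shell helper) can be turned into hours and used for a rough sizing estimate. The sketch below assumes the 94400-second log length and 10.10546875MB size shown, and a constant write rate, which real workloads rarely have; the function names are hypothetical.

```javascript
// Sketch: convert the "log length start to end" figure (in seconds) into
// hours, and estimate the oplog size needed for a longer window assuming the
// write rate stays constant (a simplification).
function oplogWindowHours(logLengthSecs) {
  return logLengthSecs / 3600;
}

function estimateOplogSizeMB(currentSizeMB, logLengthSecs, targetHours) {
  return currentSizeMB * targetHours / oplogWindowHours(logLengthSecs);
}

var hours = oplogWindowHours(94400);
console.log(hours.toFixed(2));                                       // 26.22
console.log(hours >= 24);                                             // true
console.log(estimateOplogSizeMB(10.10546875, 94400, 72).toFixed(2)); // 27.75
```

Because write volume usually varies, treat such an estimate as a lower bound and leave headroom.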
For more information on how oplog size affects operations, see:
Oplog Size (page 573),
Delayed Replica Set Members (page 551), and
Check the Replication Lag (page 627).
Note: You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all
members.
To change oplog size, see the Change the Size of the Oplog (page 608) tutorial.
Oplog Entry Timestamp Error
Consider the following error in mongod output and logs:
replSet error fatal couldn't query the local local.oplog.rs collection.
<timestamp> [rsStart] bad replSet oplog entry?
Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is
Timestamp.
Check the type of the ts value using the following two queries against the oplog collection:
db = db.getSiblingDB("local")
db.oplog.rs.find().sort({$natural:-1}).limit(1)
db.oplog.rs.find({ts:{$type:17}}).sort({$natural:-1}).limit(1)
The first query returns the last document in the oplog, while the second returns the last document in the oplog where
the ts value is a Timestamp. The $type operator allows you to select BSON type 17, which is the Timestamp data type.
If the queries don't return the same document, then the last document in the oplog has the wrong data type in the ts
field.
Example
If the first query returns this as the last oplog entry:
{ "ts" : {t: 1347982456000, i: 1},
"h" : NumberLong("8191276672478122996"),
"op" : "n",
"ns" : "",
"o" : { "msg" : "Reconfig set", "version" : 4 } }
And the second query returns this as the last entry where ts has the Timestamp type:
{ "ts" : Timestamp(1347982454000, 1),
"h" : NumberLong("6188469075153256465"),
"op" : "n",
"ns" : "",
"o" : { "msg" : "Reconfig set", "version" : 3 } }
Then the value for the ts field in the last oplog entry is of the wrong data type.
To set the proper type for this value and resolve this issue, use an update operation that resembles the following:
db.oplog.rs.update( { ts: { t:1347982456000, i:1 } },
{ $set: { ts: new Timestamp(1347982456000, 1)}})
Modify the timestamp values as needed based on your oplog entry. This operation may take some period to complete
because the update must scan and pull the entire oplog into memory.
Duplicate Key Error on local.slaves
Changed in version 3.0.0.
MongoDB 3.0.0 removes the local.slaves (page 634) collection. For local.slaves error in earlier versions
of MongoDB, refer to the appropriate version of the MongoDB Manual.
The following document provides a representation of a replica set configuration document. The configuration of your
replica set may include only a subset of these settings:
{
_id: <string>,
version: <int>,
members: [
{
_id: <int>,
host: <string>,
arbiterOnly: <boolean>,
buildIndexes: <boolean>,
hidden: <boolean>,
priority: <number>,
tags: <document>,
slaveDelay: <int>,
votes: <number>
},
...
],
settings: {
getLastErrorDefaults : <document>,
chainingAllowed : <boolean>,
getLastErrorModes : <document>,
heartbeatTimeoutSecs: <int>
}
}
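As an illustration of this structure, the following sketch checks a handful of constraints on a configuration document: a string _id, an integer version, unique member _id and host values, and priority 0 for hidden and delayed members (which MongoDB requires). The function validateReplSetConfig is hypothetical; mongod performs many more checks when you call rs.initiate() or rs.reconfig().

```javascript
// Sketch: a few structural checks on a replica set configuration document.
// Returns an array of error strings (empty if no problem was found).
function validateReplSetConfig(cfg) {
  var errors = [];
  var isInt = function (v) { return typeof v === "number" && v % 1 === 0; };
  if (typeof cfg._id !== "string") { errors.push("_id must be a string"); }
  if (!isInt(cfg.version)) { errors.push("version must be an integer"); }
  if (!Array.isArray(cfg.members) || cfg.members.length === 0) {
    errors.push("members must be a non-empty array");
    return errors;
  }
  var ids = {}, hosts = {};
  cfg.members.forEach(function (m) {
    if (!isInt(m._id)) { errors.push("member _id must be an integer"); }
    else if (ids[m._id]) { errors.push("duplicate member _id " + m._id); }
    ids[m._id] = true;
    if (typeof m.host !== "string") { errors.push("member host must be a string"); }
    else if (hosts[m.host]) { errors.push("duplicate host " + m.host); }
    hosts[m.host] = true;
    // priority defaults to 1 when absent, so it must be set to 0 explicitly.
    var priority = m.priority === undefined ? 1 : m.priority;
    if ((m.hidden === true || m.slaveDelay > 0) && priority !== 0) {
      errors.push("member " + m._id + " is hidden or delayed but has priority " + priority);
    }
  });
  return errors;
}

var badConfig = {
  _id: "rs0", version: 2,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 0, host: "mongodb1.example.net:27017", hidden: true }
  ]
};
console.log(JSON.stringify(validateReplSetConfig(badConfig)));
// ["duplicate member _id 0","member 0 is hidden or delayed but has priority 1"]
```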
Every mongod instance has its own local database, which stores data used in the replication process, and other
instance-specific data. The local database is invisible to replication: collections in the local database are not
replicated.
In replication, the local database stores internal replication data for each member of a replica set. The local
database stores the following collections:
Changed in version 2.4: When running with authentication (i.e. authorization), authenticating to the local
database is not equivalent to authenticating to the admin database. In previous versions, authenticating to the local
database provided access to all databases.
local.startup_log
On startup, each mongod instance inserts a document into startup_log (page 633) with diagnostic information about the mongod instance itself and host information. startup_log (page 633) is a capped collection.
This information is primarily useful for diagnostic purposes.
Example
Consider the following prototype of a document from the startup_log (page 633) collection:
{
"_id" : "<string>",
"hostname" : "<string>",
"startTime" : ISODate("<date>"),
"startTimeLocal" : "<string>",
"cmdLine" : {
"dbpath" : "<path>",
"<option>" : <value>
},
"pid" : <number>,
"buildinfo" : {
"version" : "<string>",
"gitVersion" : "<string>",
"sysInfo" : "<string>",
"loaderFlags" : "<string>",
"compilerFlags" : "<string>",
"allocator" : "<string>",
"versionArray" : [ <num>, <num>, <...> ],
"javascriptEngine" : "<string>",
"bits" : <number>,
"debug" : <boolean>,
"maxBsonObjectSize" : <number>
}
}
Documents in the startup_log (page 633) collection contain the following fields:
local.startup_log._id
Includes the system hostname and a millisecond epoch value.
local.startup_log.hostname
The system's hostname.
local.startup_log.startTime
A UTC ISODate value that reflects when the server started.
local.startup_log.startTimeLocal
A string that reports the startTime (page 633) in the system's local time zone.
local.startup_log.cmdLine
An embedded document that reports the mongod runtime options and their values.
local.startup_log.pid
The process identifier for this process.
local.startup_log.buildinfo
An embedded document that reports information about the build environment and settings used to compile
this mongod. This is the same output as buildInfo. See buildInfo.
local.system.replset
local.system.replset (page 634) holds the replica set's configuration object as its single document. To
view the object's configuration information, issue rs.conf() from the mongo shell. You can also query this
collection directly.
local.oplog.rs
local.oplog.rs (page 634) is the capped collection that holds the oplog. You set its size at creation using
the oplogSizeMB setting. To resize the oplog after replica set initiation, use the Change the Size of the Oplog
(page 608) procedure. For additional information, see the Oplog Size (page 573) section.
local.replset.minvalid
This contains an object used internally by replica sets to track replication status.
local.slaves
Removed in version 3.0: Replica set members no longer mirror replication status of the set to the
local.slaves (page 634) collection. Use rs.status() instead.
Collections used in Master/Slave Replication
Number  Name                   State Description
0       STARTUP (page 635)     Not yet an active member of any set. All members start up in this state. The mongod parses the replica set configuration document (page 600) while in STARTUP (page 635).
1       PRIMARY (page 635)     The member in state primary (page 546) is the only member that can accept write operations.
2       SECONDARY (page 635)   A member in state secondary (page 547) is replicating the data store. Data is available for reads, although they may be stale.
3       RECOVERING (page 636)  Can vote. Members either perform startup self-checks, or transition from completing a rollback (page 564) or resync (page 613).
5       STARTUP2 (page 635)    The member has joined the set and is running an initial sync.
6       UNKNOWN (page 636)     The member's state, as seen from another member of the set, is not yet known.
7       ARBITER (page 635)     Arbiters do not replicate data and exist solely to participate in elections.
8       DOWN (page 636)        The member, as seen from another member of the set, is unreachable.
9       ROLLBACK (page 636)    This member is actively performing a rollback (page 564). Data is not available for reads.
10      REMOVED (page 636)     This member was once in a replica set but was subsequently removed.
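The numeric codes in this table are the values reported in the state field of each member entry returned by rs.status(), which also carries a stateStr string. The following sketch decodes those codes in plain JavaScript. The helper names are hypothetical, and the input is a hand-written array shaped like rs.status().members rather than live output.

```javascript
// Numeric replica set member states, transcribed from the table above.
// Code 4 does not appear in that table, so this sketch does not map it.
const MEMBER_STATES = {
  0: "STARTUP",
  1: "PRIMARY",
  2: "SECONDARY",
  3: "RECOVERING",
  5: "STARTUP2",
  6: "UNKNOWN",
  7: "ARBITER",
  8: "DOWN",
  9: "ROLLBACK",
  10: "REMOVED",
};

// Look up the name for a numeric state code, flagging codes not in the table.
function stateName(code) {
  return Object.prototype.hasOwnProperty.call(MEMBER_STATES, code)
    ? MEMBER_STATES[code]
    : `unrecognized state ${code}`;
}

// Per the table, only a PRIMARY member accepts write operations.
function acceptsWrites(code) {
  return code === 1;
}

// Summarize { name, state } entries (hypothetical input shaped like rs.status().members).
function summarize(members) {
  return members.map((m) => `${m.name}: ${stateName(m.state)}`);
}
```

For example, summarize([{ name: "db1:27017", state: 1 }]) yields ["db1:27017: PRIMARY"].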
States
Core States
PRIMARY
Members in PRIMARY (page 635) state accept write operations. A replica set has at most one primary at a time.
A SECONDARY (page 635) member becomes primary after an election (page 561). Members in the PRIMARY
(page 635) state are eligible to vote.
SECONDARY
Members in SECONDARY (page 635) state replicate the primary's data set and can be configured to accept read
operations. Secondaries are eligible to vote in elections, and may be elected to the PRIMARY (page 635) state if
the primary becomes unavailable.
ARBITER
Members in ARBITER (page 635) state do not replicate data or accept write operations. They are eligible to
vote, and exist solely to break a tie during elections. Replica sets should only have a member in the ARBITER
(page 635) state if the set would otherwise have an even number of members, and could suffer from tied elections. There should be at most one arbiter configured in any replica set.
See Replica Set Members (page 545) for more information on core states.
Other States
STARTUP
Each member of a replica set starts up in STARTUP (page 635) state. mongod then loads that member's
replica set configuration, and transitions the member's state to STARTUP2 (page 635). Members in STARTUP
(page 635) are not eligible to vote, as they are not yet a recognized member of any replica set.
STARTUP2
Each member of a replica set enters the STARTUP2 (page 635) state as soon as mongod finishes loading
that member's configuration, at which time it becomes an active member of the replica set. The member then
decides whether or not to undertake an initial sync. If a member begins an initial sync, the member remains in
STARTUP2 (page 635) until all data is copied and all indexes are built. Afterwards, the member transitions to
RECOVERING (page 636).
RECOVERING
A member of a replica set enters RECOVERING (page 636) state when it is not ready to accept reads. The
RECOVERING (page 636) state can occur during normal operation, and doesn't necessarily reflect an error
condition. Members in the RECOVERING (page 636) state are eligible to vote in elections, but are not eligible
to enter the PRIMARY (page 635) state.
A member transitions from RECOVERING (page 636) to SECONDARY (page 635) after replicating enough
data to guarantee a consistent view of the data for client reads. The only difference between RECOVERING
(page 636) and SECONDARY (page 635) states is that RECOVERING (page 636) prohibits client reads and
SECONDARY (page 635) permits them. SECONDARY (page 635) state does not guarantee anything about the
staleness of the data with respect to the primary.
Due to overload, a secondary may fall far enough behind the other members of the replica set such that it may
need to resync (page 613) with the rest of the set. When this happens, the member enters the RECOVERING
(page 636) state and requires manual intervention.
Error States
Members in any error state can't vote.
UNKNOWN
Members that have never communicated status information to the replica set are in the UNKNOWN (page 636)
state.
DOWN
Members that lose their connection to the replica set are seen as DOWN (page 636) by the remaining members of
the set.
REMOVED
Members that are removed from the replica set enter the REMOVED (page 636) state. When members enter the
REMOVED (page 636) state, the logs will mark this event with a replSet REMOVED message entry.
ROLLBACK
Whenever the replica set replaces a primary in an election, the old primary may contain documents that did not
replicate to the secondary members. In this case, the old primary member reverts those writes. During rollback
(page 564), the member will have ROLLBACK (page 636) state.
FATAL
A member in FATAL (page 636) encountered an unrecoverable error. The member must be shut down and
restarted; a resync may be required as well.
Read Preference Reference
Read preference describes how MongoDB clients route read operations to members of a replica set.
By default, an application directs its read operations to the primary member in a replica set. Reading from the primary
guarantees that read operations reflect the latest version of a document. However, by distributing some or all reads to
secondary members of the replica set, you can improve read throughput or reduce latency for an application that does
not require fully up-to-date data.
Read Preference Mode           Description
primary (page 637)             Default mode. All operations read from the current replica set primary.
primaryPreferred (page 637)    In most situations, operations read from the primary but if it is unavailable, operations read from secondary members.
secondary (page 637)           All operations read from the secondary members of the replica set.
secondaryPreferred (page 637)  In most situations, operations read from secondary members but if no secondary members are available, operations read from the primary.
nearest (page 638)             Operations read from the member of the replica set with the least network latency, irrespective of the member's type.
Read Preference Modes
primary
All read operations use only the current replica set primary. This is the default. If the primary is unavailable,
read operations produce an error or throw an exception.
The primary (page 637) read preference mode is not compatible with read preference modes that use tag sets
(page 570). If you specify a tag set with primary (page 637), the driver will produce an error.
primaryPreferred
In most situations, operations read from the primary member of the set. However, if the primary is unavailable,
as is the case during failover situations, operations read from secondary members.
When the read preference includes a tag set (page 570), the client reads first from the primary, if available, and
then from secondaries that match the specified tags. If no secondaries have matching tags, the read operation
produces an error.
Since the application may receive data from a secondary, read operations using the primaryPreferred
(page 637) mode may return stale data in some situations.
Warning: Changed in version 2.2: mongos added full support for read preferences.
When connecting to a mongos instance older than 2.2, using a client that supports read preference modes,
primaryPreferred (page 637) will send queries to secondaries.
secondary
Operations read only from the secondary members of the set. If no secondaries are available, then this read
operation produces an error or exception.
Most sets have at least one secondary, but there are situations where there may be no available secondary. For
example, a set with a primary, a secondary, and an arbiter may not have any secondaries if a member is in
recovering state or unavailable.
When the read preference includes a tag set (page 570), the client attempts to find secondary members that
match the specified tag set and directs reads to a random secondary from among the nearest group (page 571).
If no secondaries have matching tags, the read operation produces an error. 16
Read operations using the secondary (page 637) mode may return stale data.
secondaryPreferred
In most situations, operations read from secondary members, but in situations where the set consists of a single
primary (and no other members), the read operation will use the set's primary.
16 If your set has more than one secondary, and you use the secondary (page 637) read preference mode, consider the following effect. If
you have a three member replica set (page 555) with a primary and two secondaries, and if one secondary becomes unavailable, all secondary
(page 637) queries must target the remaining secondary. This will double the load on this secondary. Plan and provide capacity to support this as
needed.
When the read preference includes a tag set (page 570), the client attempts to find a secondary member that
matches the specified tag set and directs reads to a random secondary from among the nearest group (page 571).
If no secondaries have matching tags, the client ignores tags and reads from the primary.
Read operations using the secondaryPreferred (page 637) mode may return stale data.
nearest
The driver reads from the nearest member of the set according to the member selection (page 571) process.
Reads in the nearest (page 638) mode do not consider the member's type. Reads in nearest (page 638)
mode may read from both primaries and secondaries.
Set this mode to minimize the effect of network latency on read operations without preference for current or
stale data.
If you specify a tag set (page 570), the client attempts to find a replica set member that matches the specified
tag set and directs reads to an arbitrary member from among the nearest group (page 571).
Read operations using the nearest (page 638) mode may return stale data.
Note: All operations read from a member of the nearest group of the replica set that matches the specified
read preference mode. The nearest (page 638) mode prefers low latency reads over a member's primary or
secondary status.
For nearest (page 638), the client assembles a list of acceptable hosts based on tag set and then narrows that
list to the host with the shortest ping time and all other members of the set that are within the local threshold,
or acceptable latency. See Member Selection (page 571) for more information.
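The fallback rules for each mode can be summarized as a filter that decides which members a read may target. This is a simplified sketch, not driver code. It assumes hypothetical member objects with state and tags fields, and it applies a single tag document rather than an ordered list of tag sets. It also treats a set with neither a primary nor a matching secondary under secondaryPreferred as an error, which the text above does not spell out.

```javascript
// Hypothetical member shape: { host, state: "PRIMARY" | "SECONDARY", tags: { ... } }
// True when every key/value pair in tagDoc appears in the member's tags.
function matchesTags(member, tagDoc) {
  return Object.entries(tagDoc).every(([k, v]) => member.tags && member.tags[k] === v);
}

// Return the members a read may target under `mode`, or throw where the
// text above says the read "produces an error".
function candidates(mode, members, tagDoc) {
  const primary = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter(
    (m) => m.state === "SECONDARY" && (!tagDoc || matchesTags(m, tagDoc))
  );
  switch (mode) {
    case "primary":
      if (tagDoc) throw new Error("tag sets are not compatible with mode primary");
      if (primary.length === 0) throw new Error("no primary available");
      return primary;
    case "primaryPreferred":
      if (primary.length > 0) return primary;
      if (secondaries.length === 0) throw new Error("no matching secondary");
      return secondaries;
    case "secondary":
      if (secondaries.length === 0) throw new Error("no matching secondary");
      return secondaries;
    case "secondaryPreferred":
      if (secondaries.length > 0) return secondaries;
      // No matching secondary: ignore tags and fall back to the primary
      if (primary.length === 0) throw new Error("no member available");
      return primary;
    case "nearest": {
      // Member type is ignored; only the tag filter applies before the latency check
      const any = members.filter(
        (m) => (m.state === "PRIMARY" || m.state === "SECONDARY") && (!tagDoc || matchesTags(m, tagDoc))
      );
      if (any.length === 0) throw new Error("no matching member");
      return any;
    }
    default:
      throw new Error(`unsupported mode: ${mode}`);
  }
}
```

For nearest, a real client would then narrow these candidates to the members within the latency window described above.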
Use Cases
Depending on the requirements of an application, you can configure different applications to use different read preferences, or use different read preferences for different queries in the same application. Consider the following applications for different read preference strategies.
Maximize Consistency To avoid stale reads under all circumstances, use primary (page 637). This prevents all
queries when the set has no primary, which happens during elections, or when a majority of the replica set is not
accessible.
Maximize Availability To permit read operations when possible, use primaryPreferred (page 637). When
there's a primary you will get consistent reads, but if there is no primary you can still query secondaries.
Minimize Latency To always read from a low-latency node, use nearest (page 638). The driver or mongos will
read from the nearest member and those no more than 15 milliseconds 17 further away than the nearest member.
nearest (page 638) does not guarantee consistency. If the nearest member to your application server is a secondary
with some replication lag, queries could return stale data. nearest (page 638) only reflects network distance and
does not reflect I/O or CPU load.
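The latency rule above can be sketched as a filter over measured ping times: keep the nearest member plus every member no more than 15 milliseconds slower by default. The member objects, the pingMs field, and the helper names are hypothetical. Real drivers and mongos measure latency themselves and expose the threshold through their own settings (see footnote 17).

```javascript
// Hypothetical member shape: { host, pingMs }
// Keep the fastest member and every member within thresholdMs of it.
function nearestGroup(members, thresholdMs = 15) {
  if (members.length === 0) return [];
  const fastest = Math.min(...members.map((m) => m.pingMs));
  return members.filter((m) => m.pingMs - fastest <= thresholdMs);
}

// Choose one member of the acceptable group at random.
// `random` is injectable so the choice can be made deterministic in tests.
function pickNearest(members, thresholdMs = 15, random = Math.random) {
  const group = nearestGroup(members, thresholdMs);
  if (group.length === 0) throw new Error("no eligible member");
  return group[Math.floor(random() * group.length)];
}
```

With pings of 5, 18, and 40 ms, the default window keeps the 5 ms and 18 ms members. The 40 ms member is excluded even if it is the primary.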
Query From Geographically Distributed Members If the members of a replica set are geographically distributed,
you can create replica set tags that reflect the location of the instance and then configure your application to query
the members nearby.
17 This threshold is configurable. See localPingThresholdMs for mongos or your driver documentation for the appropriate setting.
For example, if members in east and west data centers are tagged (page 614) {dc: "east"} and {dc:
"west"}, your application servers in the east data center can read from nearby members with the following read
preference:
db.collection.find().readPref( 'nearest', [ { 'dc': 'east' } ] )
Although nearest (page 638) already favors members with low network latency, including the tag makes the choice
more predictable.
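The second argument to readPref() is an ordered list of tag documents. This sketch models the commonly documented matching rule, which is an assumption here because the tag set details belong to the Tag Sets section rather than this one. The client tries each tag document in order and uses the first one that matches at least one member, and an empty document {} matches any member. The member objects and helper names are hypothetical.

```javascript
// Hypothetical member shape: { host, tags: { ... } }
// A member matches a tag document when it has every key/value pair in it.
function matchesTagDoc(member, tagDoc) {
  const tags = member.tags || {};
  return Object.entries(tagDoc).every(([key, value]) => tags[key] === value);
}

// Try each tag document in order; return the members matched by the first
// document that matches anything, or [] when none match.
function applyTagSets(members, tagSets) {
  for (const tagDoc of tagSets) {
    const matched = members.filter((m) => matchesTagDoc(m, tagDoc));
    if (matched.length > 0) return matched;
  }
  return [];
}
```

With this rule, a list such as [ { 'dc': 'east' }, {} ] prefers east members but falls back to any member instead of failing.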
Reduce load on the primary To shift read load from the primary, use mode secondary (page 637). Although
secondaryPreferred (page 637) is tempting for this use case, it carries some risk: if all secondaries are unavailable and your set has enough arbiters to prevent the primary from stepping down, then the primary will receive all
traffic from clients. If the primary is unable to handle this load, queries will compete with writes. For this reason, use
secondary (page 637) to distribute read load to replica sets, not secondaryPreferred (page 637).
Read Preferences for Database Commands
Because some database commands read and return data from the database, all of the official drivers support full read
preference mode semantics (page 637) for the following commands:
group
mapReduce 18
aggregate 19
collStats
dbStats
count
distinct
geoNear
geoSearch
geoWalk
parallelCollectionScan
New in version 2.4: mongos adds support for routing commands to shards using read preferences. Previously
mongos sent all commands to shards' primaries.
18 Only inline mapReduce operations that do not write data support read preference, otherwise these operations must run on the primary
members.
19 Using the $out pipeline operator forces the aggregation pipeline to run on the primary.
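Footnotes 18 and 19 mean that a command's name alone does not decide whether it can honor a non-primary read preference. The following sketch checks a command document against the list above and those two footnotes. The helper is hypothetical, not part of any driver. It relies on the command name being the first key of the document and recognizes only out: { inline: 1 } for mapReduce and a $out stage for aggregate.

```javascript
// Commands listed above as supporting read preference semantics.
const READ_PREF_COMMANDS = new Set([
  "group", "mapReduce", "aggregate", "collStats", "dbStats", "count",
  "distinct", "geoNear", "geoSearch", "geoWalk", "parallelCollectionScan",
]);

// Hypothetical helper: could this command document run on a non-primary member?
function mayUseReadPreference(cmd) {
  const name = Object.keys(cmd)[0]; // command name is the first key
  if (!READ_PREF_COMMANDS.has(name)) return false;
  if (name === "mapReduce") {
    // Footnote 18: only inline output, which writes no data, qualifies
    return Boolean(cmd.out && cmd.out.inline === 1);
  }
  if (name === "aggregate") {
    // Footnote 19: a $out stage forces the pipeline onto the primary
    return !(cmd.pipeline || []).some((stage) => "$out" in stage);
  }
  return true;
}
```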
CHAPTER 10
Sharding
Sharding is the process of storing data records across multiple machines and is MongoDB's approach to meeting the
demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor
provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding,
you add more machines to support data growth and the demands of read and write operations.
Sharding Introduction (page 641) A high-level introduction to horizontal scaling, data partitioning, and sharded
clusters in MongoDB.
Sharding Concepts (page 647) The core documentation of sharded cluster features, configuration, architecture and
behavior.
Sharded Cluster Components (page 647) A sharded cluster consists of shards, config servers, and mongos
instances.
Sharded Cluster Architectures (page 651) Outlines the requirements for sharded clusters, and provides examples of several possible architectures for sharded clusters.
Sharded Cluster Behavior (page 654) Discusses the operations of sharded clusters with regards to the automatic balancing of data in a cluster and other related availability and security considerations.
Sharding Mechanics (page 662) Discusses the internal operation and behavior of sharded clusters, including
chunk migration, balancing, and the cluster metadata.
Sharded Cluster Tutorials (page 669) Tutorials that describe common procedures and administrative operations relevant to the use and maintenance of sharded clusters.
Sharding Reference (page 715) Reference for sharding-related functions and operations.
Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amounts of RAM are disproportionately
more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller
instances. As a result there is a practical maximum capability for vertical scaling.
Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or
shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
Sharding addresses the challenge of scaling to support high throughput and large data sets:
Sharding reduces the number of operations each shard handles. Each shard processes fewer operations as the
cluster grows. As a result, a cluster can increase capacity and throughput horizontally.
For example, to insert data, the application only needs to access the shard responsible for that record.
Sharding reduces the amount of data that each server needs to store. Each shard stores less data as the cluster
grows.
For example, if a database has a 1 terabyte data set, and there are 4 shards, then each shard might hold only
256GB of data. If there are 40 shards, then each shard might hold only 25GB of data.
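The per-shard figures follow from dividing the data set evenly across the shards, taking 1 terabyte as 1024 GB and assuming a perfectly even distribution:

```latex
\text{data per shard} = \frac{\text{total data}}{\text{number of shards}}, \qquad
\frac{1024\ \text{GB}}{4} = 256\ \text{GB}, \qquad
\frac{1024\ \text{GB}}{40} = 25.6\ \text{GB} \approx 25\ \text{GB}
```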
A sharded cluster has the following components: shards, query routers and config servers.
Shards store the data. To provide high availability and data consistency, in a production sharded cluster, each shard is
a replica set 1. For more information on replica sets, see Replica Sets (page 545).
Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard
or shards. The query router processes and targets operations to shards and then returns results to the clients. A sharded
cluster can contain more than one query router to divide the client request load. A client sends requests to one query
router. Most sharded clusters have many query routers.
Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The
query router uses this metadata to target operations to specific shards. Production sharded clusters have exactly 3
config servers.
the chunks evenly across the shards. To divide the shard key values into chunks, MongoDB uses either range based
partitioning or hash based partitioning. See the Shard Key (page 654) documentation for more information.
Range Based Sharding
For range-based sharding, MongoDB divides the data set into ranges determined by the shard key values to provide
range based partitioning. Consider a numeric shard key: If you visualize a number line that goes from negative
infinity to positive infinity, each value of the shard key falls at some point on that line. MongoDB partitions this line
into smaller, non-overlapping ranges called chunks where a chunk is a range of values from some minimum value to
some maximum value.
Given a range based partitioning system, documents with close shard key values are likely to be in the same chunk,
and therefore on the same shard.
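Because chunks are non-overlapping ranges that together cover the whole number line, finding the chunk for a shard key value amounts to a search over sorted range boundaries. The chunk table, shard names, and helper below are hypothetical. -Infinity and Infinity stand in for the lowest and highest possible shard key values, and each chunk is treated as including its minimum and excluding its maximum.

```javascript
// Hypothetical chunk table for a numeric shard key, sorted by min.
const chunks = [
  { min: -Infinity, max: 0, shard: "shard0000" },
  { min: 0, max: 100, shard: "shard0001" },
  { min: 100, max: 500, shard: "shard0000" },
  { min: 500, max: Infinity, shard: "shard0002" },
];

// Binary search for the chunk whose [min, max) range contains key.
function findChunk(chunkList, key) {
  let lo = 0;
  let hi = chunkList.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunkList[mid].min) hi = mid - 1;
    else if (key >= chunkList[mid].max) lo = mid + 1;
    else return chunkList[mid];
  }
  throw new Error(`no chunk covers key ${key}`);
}
```

Keys 42 and 99 both land in the [0, 100) chunk, which illustrates why documents with close shard key values tend to share a chunk and therefore a shard.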
Next, the destination shard captures and applies all changes made to the data during the migration process. Finally,
the metadata regarding the location of the chunk on config server is updated.
If there's an error during the migration, the balancer aborts the process leaving the chunk unchanged on the origin
shard. MongoDB removes the chunk's data from the origin shard after the migration completes successfully.
Shards A shard is a MongoDB instance that holds a subset of a collection's data. Each shard is either a single
mongod instance or a replica set. In production, all shards are replica sets. For more information see Shards
(page 648).
Config Servers Each config server (page 650) is a mongod instance that holds metadata about the cluster. The
metadata maps chunks to shards. For more information, see Config Servers (page 650).
Routing Instances Each router is a mongos instance that routes the reads and writes from applications to the shards.
Applications do not access the shards directly. For more information see Sharded Cluster Query Routing
(page 658).
Enable sharding in MongoDB on a per-collection basis. For each collection you shard, you will specify a shard key
for that collection.
To deploy a sharded cluster, see Deploy a Sharded Cluster (page 670).
Shards
A shard is a replica set or a single mongod that contains a subset of the data for the sharded cluster. Together, the
cluster's shards hold the entire data set for the cluster.
Typically each shard is a replica set. The replica set provides redundancy and high availability for the data in each
shard.
Important: MongoDB shards data on a per collection basis. You must access all data in a sharded cluster via the
mongos instances. If you connect directly to a shard, you will see only its fraction of the cluster's data. There is no
particular order to the data set on a specific shard. MongoDB does not guarantee that any two contiguous chunks will
reside on a single shard.
Primary Shard
Every database has a primary 8 shard that holds all the un-sharded collections in that database.
To change the primary shard for a database, use the movePrimary command. The process of migrating the primary
shard may take significant time to complete, and you should not access the collections until it completes.
When you deploy a new sharded cluster with shards that were previously used as replica sets, all existing databases
continue to reside on their original shard. Databases created subsequently may reside on any shard in the cluster.
Shard Status
Use the sh.status() method in the mongo shell to see an overview of the cluster. This report includes which
shard is primary for the database and the chunk distribution across the shards. See sh.status() method for more
details.
8 The term primary shard has nothing to do with the term primary in the context of replica sets.
Config Servers
Config servers are special mongod instances that store the metadata (page 668) for a sharded cluster. Config servers
use a two-phase commit to ensure immediate consistency and reliability. Config servers do not run as replica sets. All
config servers must be available to deploy a sharded cluster or to make any changes to cluster metadata.
A production sharded cluster has exactly three config servers. For testing purposes you may deploy a cluster with a
single config server. But to ensure redundancy and safety in production, you should always use three.
Warning: If your cluster has a single config server, then the config server is a single point of failure. If the config
server is inaccessible, the cluster is not accessible. If you cannot recover the data on a config server, the cluster
will be inoperable.
Always use three config servers for production deployments.
Each sharded cluster must have its own config servers. Do not use the same config servers for different sharded
clusters.
Tip
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers
without downtime.
Config Database
Config servers store the metadata in the config database (page 716). The mongos instances cache this data and use it
to route reads and writes to shards.
Read and Write Operations on Config Servers
MongoDB only writes data to the config server in the following cases:
To create splits in existing chunks. For more information, see chunk splitting (page 666).
To migrate a chunk between shards. For more information, see chunk migration (page 664).
MongoDB reads data from the config server data in the following cases:
A new mongos starts for the first time, or an existing mongos restarts.
After a chunk migration, the mongos instances update themselves with the new cluster metadata.
MongoDB also uses the config server to manage distributed locks.
Config Server Availability
If one or two config servers become unavailable, the cluster's metadata becomes read only. You can still read and
write data from the shards, but no chunk migrations or splits will occur until all three servers are available.
If all three config servers are unavailable, you can still use the cluster if you do not restart the mongos instances
until after the config servers are accessible again. If you restart the mongos instances before the config servers are
available, the mongos will be unable to route reads and writes.
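The availability rules above can be condensed into a small decision function for a cluster with three config servers. This is a hypothetical model of the documented behavior, not an API. The inputs are the number of reachable config servers and whether the mongos was restarted while none of them were reachable.

```javascript
// Hypothetical model of config server availability for a three-server cluster.
function clusterCapabilities(configServersUp, mongosRestartedWhileAllDown = false) {
  if (configServersUp < 0 || configServersUp > 3) {
    throw new RangeError("expected 0-3 reachable config servers");
  }
  return {
    // A running mongos keeps routing with its cached metadata; one that
    // restarted while every config server was down cannot load any.
    canRouteReadsWrites: configServersUp > 0 || !mongosRestartedWhileAllDown,
    // Chunk splits and migrations change metadata and need all three servers.
    canSplitOrMigrate: configServersUp === 3,
  };
}
```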
Clusters become inoperable without the cluster metadata. Always ensure that the config servers remain available and
intact. As such, backups of config servers are critical. The data on the config server is small compared to the data
stored in a cluster. This means the config server has a relatively low activity load, and the config server does not need
to be always available to support a sharded cluster. As a result, it is easy to back up the config servers.
If the name or address that a sharded cluster uses to connect to a config server changes, you must restart every mongod
and mongos instance in the sharded cluster. Avoid downtime by using CNAMEs to identify config servers within the
MongoDB deployment.
See Renaming Config Servers and Cluster Availability (page 657) for more information.
Your cluster should manage a large quantity of data if sharding is to have an effect. The default chunk size is 64
megabytes. And the balancer (page 663) will not begin moving data across shards until the imbalance of chunks among
the shards exceeds the migration threshold (page 664). In practical terms, unless your cluster has many hundreds of
megabytes of data, your data will remain on a single shard.
In some situations, you may need to shard a small collection of data. But most of the time, sharding a small collection
is not worth the added complexity and overhead unless you need additional write capacity. If you have a small data
set, a properly configured single MongoDB instance or a replica set will usually be enough for your persistence layer
needs.
Chunk size is user configurable. For most deployments, the default value of 64 megabytes is ideal. See
Chunk Size (page 666) for more information.
Production Cluster Architecture
In a production cluster, you must ensure that data is redundant and that your systems are highly available. To that end,
a production cluster must have the following components:
Components
Config Servers Three config servers (page 650). Each config server must be on a separate machine. A single sharded
cluster must have exclusive use of its config servers (page 650). If you have multiple sharded clusters, you will need
to have a group of config servers for each cluster.
Shards Two or more replica sets. These replica sets are the shards. For information on replica sets, see Replication
(page 541).
Query Routers (mongos) One or more mongos instances. The mongos instances are the routers for the cluster.
Typically, deployments have one mongos instance on each application server.
You may also deploy a group of mongos instances and use a proxy/load balancer between the application and the
mongos. In these deployments, you must configure the load balancer for client affinity so that every connection from
a single client reaches the same mongos.
Because cursors and other resources are specific to a single mongos instance, each client must interact with only
one mongos instance.
When a chunk grows beyond the chunk size (page 666), MongoDB attempts to split the chunk into smaller chunks,
always based on ranges in the shard key.
Considerations
Shard keys are immutable and cannot be changed after insertion. See the system limits for sharded cluster for more
information.
The index on the shard key cannot be a multikey index (page 474).
Hashed Shard Keys
If you shard an empty collection using a hashed shard key, MongoDB will automatically create and migrate
chunks so that each shard has two chunks. You can control how many chunks MongoDB will create with the
numInitialChunks parameter to shardCollection or by manually creating chunks on the empty collection
using the split command.
To shard a collection using a hashed shard key, see Shard a Collection Using a Hashed Shard Key (page 675).
Tip
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need
to compute hashes.
The shard key affects write and query performance by determining how MongoDB partitions data in the cluster
and how effectively the mongos instances can direct operations to the cluster. Consider the following operational
impacts of shard key selection:
Write Scaling Some possible shard keys will allow your application to take advantage of the increased write capacity
that the cluster can provide, while others do not. Consider the following example where you shard by the values of the
default _id field, which is ObjectId.
MongoDB generates ObjectId values upon document creation to produce a unique identifier for the object. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular
and predictable pattern. Even though this value has high cardinality (page 675), when you use it, a date, or any other
monotonically increasing number as the shard key, all insert operations store data in a single chunk, and
therefore on a single shard. As a result, the write capacity of this shard will define the effective write capacity of the
cluster.
A shard key that increases monotonically will not hinder performance if you have a very low insert rate, or if most
of your write operations are update() operations distributed through your entire data set. Generally, choose shard
keys that have both high cardinality and will distribute write operations across the entire cluster.
Typically, a computed shard key that has some amount of randomness, such as ones that include a cryptographic
hash (e.g. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However,
random shard keys do not typically provide query isolation (page 655), which is another important characteristic of
shard keys.
New in version 2.4: MongoDB makes it possible to shard a collection on a hashed index. This can greatly improve
write scaling. See Shard a Collection Using a Hashed Shard Key (page 675).
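To illustrate why a monotonically increasing key concentrates inserts, the sketch below routes 1000 sequential keys through a range-partitioned chunk table and through a hashed one. The hash is 32-bit FNV-1a, chosen only for brevity. It is not the hash MongoDB uses for hashed indexes. The split points and the four-way split of the hash space are also hypothetical.

```javascript
// 32-bit FNV-1a over the key's decimal string (a stand-in, not MongoDB's hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Range partitioning with hypothetical split points: every key at or above
// 3000 falls into the last chunk, which extends to +Infinity.
function rangeChunk(key) {
  const bounds = [1000, 2000, 3000];
  const idx = bounds.findIndex((b) => key < b);
  return idx === -1 ? bounds.length : idx;
}

// Hashed partitioning: divide the 32-bit hash space into four equal chunks (0-3).
function hashedChunk(key) {
  return Math.floor(fnv1a(String(key)) / 2 ** 30);
}

// Count how many of n increasing keys, starting at `start`, land in each chunk.
function countHits(chunkOf, start, n) {
  const hits = {};
  for (let k = start; k < start + n; k++) {
    const c = chunkOf(k);
    hits[c] = (hits[c] || 0) + 1;
  }
  return hits;
}
```

Under range partitioning, countHits(rangeChunk, 5000, 1000) sends all 1000 inserts to chunk 3. The hashed variant spreads the same keys across several chunks, at the cost of the query isolation discussed below.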
Querying The mongos provides an interface for applications to interact with sharded clusters that hides the complexity of data partitioning. A mongos receives queries from applications and uses metadata from the config server
(page 650) to route queries to the mongod instances with the appropriate data. While the mongos succeeds in making all querying operational in sharded environments, the shard key you select can have a profound effect on query
performance.
See also:
The Sharded Cluster Query Routing (page 658) and config server (page 650) sections for a more general overview of
querying in sharded environments.
Query Isolation Generally, the fastest queries in a sharded environment are those that mongos will route to a single shard, using the shard key and the cluster metadata from the config server (page 650). For queries that don't include the shard key, mongos must query all shards, wait for their responses, and then return the result to the application. These scatter/gather queries can be long-running operations.
If your query includes the first component of a compound shard key 9, the mongos can route the query directly to a
single shard, or a small number of shards, which provides better performance. Even if you query values of the shard
key that reside in different chunks, the mongos will route queries directly to specific shards.
To select a shard key for a collection:
1. Determine the most commonly included fields in queries for a given application.
2. Find which of these operations are most performance dependent.
If this field has low cardinality (i.e. not sufficiently selective), you should add a second field to the shard key, making a compound shard key. The data may become more splittable with a compound shard key.
See Sharded Cluster Query Routing (page 658) for more information on query operations in the context of sharded clusters.
Sorting In sharded systems, the mongos performs a merge-sort of all sorted query results from the shards. See
Sharded Cluster Query Routing (page 658) and Use Indexes to Sort Query Results (page 533) for more information.
Indivisible Chunks An insufficiently granular shard key can result in chunks that are unsplittable. See Create a
Shard Key that is Easily Divisible (page 674) for more information.
Additional Information
If each application server has its own mongos instance, other application servers can continue to access the database.
Furthermore, mongos instances do not maintain persistent state, and they can restart and become unavailable without
losing any state or data. When a mongos instance starts, it retrieves a copy of the config database and can begin
routing queries.
A Single mongod Becomes Unavailable in a Shard
Replica sets (page 541) provide high availability for shards. If the unavailable mongod is a primary, then the replica set will elect (page 561) a new primary. If the unavailable mongod is a secondary and it disconnects, the primary and the other secondary members continue to hold all data. In a three-member replica set, even if a single member of the set experiences catastrophic failure, two other members have full copies of the data. 10
9 In many ways, you can think of the shard key as a cluster-wide index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. Consider the Index Concepts (page 468) page for more information on indexes and compound indexes.
Always investigate availability interruptions and failures. If a system is unrecoverable, replace it and create a new
member of the replica set as soon as possible to replace the lost redundancy.
All Members of a Replica Set Become Unavailable
If all members of a replica set within a shard are unavailable, all data held in that shard is unavailable. However, the data on all other shards will remain available, and it's possible to read and write data to the other shards. Your application must be able to deal with partial results, and you should investigate the cause of the interruption and attempt to recover the shard as soon as possible.
One or Two Config Databases Become Unavailable
Three distinct mongod instances provide the config database, using a special two-phase commit to maintain consistent state between these mongod instances. If one or two of them become unavailable, cluster operations continue as normal, but the cluster cannot perform chunk migrations (page 663) or create new chunk splits (page 701). Replace the config server as soon as possible. If all config databases become unavailable, the cluster can become inoperable.
Note: All config servers must be running and available when you first initiate a sharded cluster.
If the name or address that a sharded cluster uses to connect to a config server changes, you must restart every mongod
and mongos instance in the sharded cluster. Avoid downtime by using CNAMEs to identify config servers within the
MongoDB deployment.
To avoid downtime when renaming config servers, use DNS names unrelated to physical or virtual hostnames to refer
to your config servers (page 650).
Generally, refer to each config server using the DNS alias (e.g. a CNAME record). When specifying the config server
connection string to mongos, use these names. These records make it possible to change the IP address or rename
config servers without changing the connection string and without having to restart the entire cluster.
Shard Keys and Cluster Availability
If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard
will only render some data unavailable.
If your shard key distributes data required for every operation throughout the cluster, then the failure of the entire
shard will render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query
operations to a single shard.
Sharded Cluster Query Routing
MongoDB mongos instances route queries and write operations to shards in a sharded cluster. From the perspective of applications, mongos instances provide the only interface to a sharded cluster. Applications never connect or communicate directly with the shards.
The mongos tracks what data is on which shard by caching the metadata from the config servers (page 650). The
mongos uses the metadata to route operations from applications and clients to the mongod instances. A mongos
has no persistent state and consumes minimal system resources.
The most common practice is to run mongos instances on the same systems as your application servers, but you can
maintain mongos instances on the shards or on other dedicated resources.
Note: Changed in version 2.1.
Some aggregation operations using the aggregate command (i.e. db.collection.aggregate()) will cause
mongos instances to require more CPU resources than in previous versions. This modified performance profile may
dictate alternate architecture decisions if you use the aggregation framework extensively in a sharded environment.
Routing Process
A mongos instance uses the following processes to route queries and return results.
How mongos Determines which Shards Receive a Query A mongos instance routes a query to a cluster by:
1. Determining the list of shards that must receive the query.
2. Establishing a cursor on all targeted shards.
In some cases, when the shard key or a prefix of the shard key is a part of the query, the mongos can route the query
to a subset of the shards. Otherwise, the mongos must direct the query to all shards that hold documents for that
collection.
Example
Given the following shard key:
{ zipcode: 1, u_id: 1, c_date: 1 }
Depending on the distribution of chunks in the cluster, the mongos may be able to target the query at a subset of
shards, if the query contains the following fields:
{ zipcode: 1 }
{ zipcode: 1, u_id: 1 }
{ zipcode: 1, u_id: 1, c_date: 1 }
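The prefix rule in this example can be sketched as a small check. It is a simplified model, not mongos's actual targeting logic. It only tests which shard key fields a query mentions. It ignores query operators and the actual chunk distribution, both of which also affect how many shards are contacted. The helper names `shardKeyPrefixInQuery` and `canTarget` are hypothetical.

```javascript
// Return the longest prefix of the shard key whose fields all appear in the query.
function shardKeyPrefixInQuery(shardKey, query) {
  const prefix = [];
  for (const field of Object.keys(shardKey)) {
    if (!(field in query)) break; // a gap ends the prefix
    prefix.push(field);
  }
  return prefix;
}

// In this model, a query is targetable only if it contains at least the
// first shard key field.
function canTarget(shardKey, query) {
  return shardKeyPrefixInQuery(shardKey, query).length > 0;
}

const shardKey = { zipcode: 1, u_id: 1, c_date: 1 };

console.log(canTarget(shardKey, { zipcode: "63109" }));               // true
console.log(canTarget(shardKey, { zipcode: "63109", u_id: 123 }));    // true
console.log(canTarget(shardKey, { u_id: 123, c_date: new Date() })); // false: no zipcode
```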
How mongos Handles Query Modifiers If the result of the query is not sorted, the mongos instance opens a result
cursor that round robins results from all cursors on the shards.
Changed in version 2.0.5: In versions prior to 2.0.5, the mongos exhausted each cursor, one by one.
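Round-robin interleaving of unsorted results can be sketched with a generator. In this model, plain arrays stand in for the per-shard cursors. The function name `roundRobin` is hypothetical and does not come from MongoDB's source.

```javascript
// Interleave results from several shard cursors one document at a time,
// skipping cursors that are exhausted.
function* roundRobin(cursors) {
  const iters = cursors.map((c) => c[Symbol.iterator]());
  const done = new Array(iters.length).fill(false);
  let active = iters.length;
  while (active > 0) {
    for (let i = 0; i < iters.length; i++) {
      if (done[i]) continue;
      const next = iters[i].next();
      if (next.done) {
        done[i] = true;
        active -= 1;
      } else {
        yield next.value;
      }
    }
  }
}

const shardA = ["a1", "a2", "a3"];
const shardB = ["b1"];
const shardC = ["c1", "c2"];
console.log([...roundRobin([shardA, shardB, shardC])]);
// [ 'a1', 'b1', 'c1', 'a2', 'c2', 'a3' ]
```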
If the query specifies sorted results using the sort() cursor method, the mongos instance passes the $orderby
option to the shards. The primary shard for the database receives and performs a merge sort for all results before
returning the data to the client via the mongos.
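The merge step can be sketched as follows, whichever process performs it. Each shard returns its results already sorted, so combining them only requires repeatedly taking the smallest remaining head element. The function `mergeSorted` and the `age` field are hypothetical. This simple linear scan over the heads stands in for whatever merge implementation MongoDB actually uses.

```javascript
// Merge already-sorted result lists (one per shard) into one sorted list.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  while (true) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (best === -1 ||
          compare(shardResults[i][positions[i]], shardResults[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

// Each shard returns its documents already sorted by { age: 1 }.
const byAge = (x, y) => x.age - y.age;
const result = mergeSorted(
  [[{ age: 21 }, { age: 40 }], [{ age: 18 }, { age: 33 }], [{ age: 25 }]],
  byAge
);
console.log(result.map((d) => d.age)); // [ 18, 21, 25, 33, 40 ]
```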
If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit
to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip
to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents
when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass
the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
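The skip and limit handling can be sketched as two small functions. The limit sent to the shards is limit plus skip, because a single shard might hold every one of the first skip + limit documents. The exact skip and limit are then applied once to the combined result. The helper names `limitSentToShards` and `applySkipLimit` are hypothetical.

```javascript
// Limit forwarded to each shard: none when only skip() is used,
// limit + skip when both skip() and limit() are present.
function limitSentToShards(skip, limit) {
  if (limit === undefined) return undefined; // shards return unskipped, unlimited results
  return limit + (skip || 0);
}

// After gathering (and, for sorted queries, merging) the shard results,
// apply skip and limit once to the combined result.
function applySkipLimit(combined, skip, limit) {
  const start = skip || 0;
  return limit === undefined ? combined.slice(start) : combined.slice(start, start + limit);
}

console.log(limitSentToShards(20, 10)); // 30
console.log(applySkipLimit([1, 2, 3, 4, 5, 6, 7], 2, 3)); // [ 3, 4, 5 ]
```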
Detect Connections to mongos Instances
To detect if the MongoDB instance that your client is connected to is mongos, use the isMaster command. When
a client connects to a mongos, isMaster returns a document with a msg field that holds the string isdbgrid. For
example:
{
"ismaster" : true,
"msg" : "isdbgrid",
"maxBsonObjectSize" : 16777216,
"ok" : 1
}
If the application is instead connected to a mongod, the returned document does not include the isdbgrid string.
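An application can turn this check into a small predicate on the isMaster reply. The function name `isMongos` is hypothetical. The sample replies below are abridged to the fields shown above.

```javascript
// Return true when an isMaster reply came from a mongos.
function isMongos(isMasterReply) {
  return isMasterReply.msg === "isdbgrid";
}

// Abridged sample replies.
const fromMongos = { ismaster: true, msg: "isdbgrid", maxBsonObjectSize: 16777216, ok: 1 };
const fromMongod = { ismaster: true, maxBsonObjectSize: 16777216, ok: 1 };

console.log(isMongos(fromMongos)); // true
console.log(isMongos(fromMongod)); // false
```

In the mongo shell, the reply would come from db.isMaster(), for example isMongos(db.isMaster()).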
Broadcast Operations and Targeted Operations
For queries that include the shard key or portion of the shard key, mongos can target the query at a specific shard or
set of shards. This is the case only if the portion of the shard key included in the query is a prefix of the shard key. For
example, if the shard key is:
{ a: 1, b: 1, c: 1 }
The mongos program can route queries that include the full shard key or either of the following shard key prefixes at
a specific shard or set of shards:
{ a: 1 }
{ a: 1, b: 1 }
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still have to contact
multiple shards 11 to fulfill these queries.
11 mongos will route some queries, even some that include the shard key, to all shards, if needed.
Sharding operates on the collection level. You can shard multiple collections within a database or have multiple
databases with sharding enabled. 12 However, in production deployments, some databases and collections will use
sharding, while other databases and collections will only reside on a single shard.
Regardless of the data architecture of your sharded cluster, ensure that all queries and operations use the mongos
router to access the data cluster. Use the mongos even for operations that do not impact the sharded data.
The balancer process is responsible for redistributing the chunks of a sharded collection evenly among the shards for
every sharded collection. By default, the balancer process is always enabled.
Any mongos instance in the cluster can start a balancing round. When a balancer process is active, the responsible
mongos acquires a lock by modifying a document in the lock collection in the Config Database (page 716).
Note: Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping (i.e. clock skew)
between mongos instances could lead to failed distributed locks. This carries the possibility of data loss, particularly
with skews larger than 5 minutes. Always use the network time protocol (NTP) by running ntpd on your servers to
minimize clock skew.
To address uneven chunk distribution for a sharded collection, the balancer migrates chunks (page 664) from shards with more chunks to shards with fewer chunks. The balancer migrates the chunks one at a time until there is an even dispersion of chunks for the collection across the shards.
Chunk migrations carry some overhead in terms of bandwidth and workload, both of which can impact database
performance. The balancer attempts to minimize the impact by:
Moving only one chunk at a time. See also Chunk Migration Queuing (page 665).
Starting a balancing round only when the difference in the number of chunks between the shard with the greatest
number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection
reaches the migration threshold (page 664).
You may disable the balancer temporarily for maintenance. See Disable the Balancer (page 695) for details.
You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See
Schedule the Balancing Window (page 694) for details.
Note: The specification of the balancing window is relative to the local time zone of all individual mongos instances
in the cluster.
See also:
Manage Sharded Cluster Balancer (page 693).
Migration Thresholds
To minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of
chunks for a sharded collection has reached certain thresholds. The thresholds apply to the difference in number
of chunks between the shard with the most chunks for the collection and the shard with the fewest chunks for that
collection. The balancer has the following thresholds:
Changed in version 2.2: The following thresholds appear first in 2.2. Prior to this release, a balancing round would
only start if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.
Number of Chunks    Migration Threshold
Fewer than 20       2
20-79               4
80 and greater      8
Once a balancing round starts, the balancer will not stop until, for the collection, the difference between the number
of chunks on any two shards for that collection is less than two or a chunk migration fails.
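The threshold table and the stopping rule can be combined into a small simulation. This sketch makes two assumptions. First, "Number of Chunks" counts all chunks of the collection. Second, a round starts when the difference is greater than or equal to the threshold. The helper names are hypothetical, and the balancer's real choice of source and destination shard may differ from the simple max/min rule used here.

```javascript
// Migration threshold from the table, keyed on the collection's total chunk count.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// A round starts when the most- and least-loaded shards differ by the threshold.
function shouldStartRound(chunksPerShard) {
  const total = chunksPerShard.reduce((a, b) => a + b, 0);
  const diff = Math.max(...chunksPerShard) - Math.min(...chunksPerShard);
  return diff >= migrationThreshold(total);
}

// Once started, the round continues until any two shards differ by less than two.
function roundFinished(chunksPerShard) {
  return Math.max(...chunksPerShard) - Math.min(...chunksPerShard) < 2;
}

// Move one chunk at a time from the fullest shard to the emptiest one.
function simulateRound(chunksPerShard) {
  const shards = chunksPerShard.slice();
  let moves = 0;
  if (!shouldStartRound(shards)) return { shards, moves };
  while (!roundFinished(shards)) {
    shards[shards.indexOf(Math.max(...shards))] -= 1;
    shards[shards.indexOf(Math.min(...shards))] += 1;
    moves += 1;
  }
  return { shards, moves };
}

console.log(shouldStartRound([10, 9, 0])); // true: 19 chunks, threshold 2, difference 10
console.log(simulateRound([30, 10, 10]));  // { shards: [ 17, 17, 16 ], moves: 13 }
```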
Shard Size
By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. To
ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance
metrics.
When adding a shard, you may set a maximum size for that shard. This prevents the balancer from migrating chunks
to the shard when the value of mapped exceeds the maximum size. Use the maxSize parameter of the addShard
command to set the maximum size for the shard.
See also:
Change the Maximum Storage Size for a Given Shard (page 692) and Monitoring for MongoDB (page 185).
Chunk Migration Across Shards
Chunk migration moves the chunks of a sharded collection from one shard to another and is part of the balancer
(page 663) process.
Chunk Migration
MongoDB migrates chunks in a sharded cluster to distribute the chunks of a sharded collection evenly among shards.
Migrations may be either:
Manual. Only use manual migration in limited cases, such as to distribute data during bulk inserts. See Migrating
Chunks Manually (page 702) for more details.
Automatic. The balancer (page 663) process automatically migrates chunks when there is an uneven distribution
of a sharded collection's chunks across the shards. See Migration Thresholds (page 664) for more details.
All chunk migrations use the following procedure:
1. The balancer process sends the moveChunk command to the source shard.
2. The source starts the move with an internal moveChunk command. During the migration process, operations
to the chunk route to the source shard. The source shard is responsible for incoming write operations for the
chunk.
3. The destination shard builds any indexes required by the source that do not exist on the destination.
4. The destination shard begins requesting documents in the chunk and starts receiving copies of the data.
5. After receiving the final document in the chunk, the destination shard starts a synchronization process to ensure
that it has the changes to the migrated documents that occurred during the migration.
6. When fully synchronized, the destination shard connects to the config database and updates the cluster metadata
with the new location for the chunk.
7. After the destination shard completes the update of the metadata, and once there are no open cursors on the
chunk, the source shard deletes its copy of the documents.
Changed in version 2.4: If the balancer needs to perform additional chunk migrations from the source shard,
the balancer can start the next chunk migration without waiting for the current migration process to finish this
deletion step. See Chunk Migration Queuing (page 665).
The migration process ensures consistency and maximizes the availability of chunks during balancing.
Chunk Migration Queuing Changed in version 2.4.
To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time. However, the balancer does not wait for the current migration's delete phase to complete before starting the next chunk migration. See Chunk Migration (page 664) for the chunk migration process and the delete phase.
This queuing behavior allows shards to unload chunks more quickly in a heavily imbalanced cluster, such as when performing initial data loads without pre-splitting and when adding new shards.
This behavior also affects the moveChunk command, and migration scripts that use the moveChunk command may proceed more quickly.
In some cases, the delete phases may persist longer. If multiple delete phases are queued but not yet complete, a crash of the replica set's primary can orphan data from multiple migrations.
Chunk Migration and Replication Changed in version 3.0: The default value of secondaryThrottle became true for all chunk migrations.
New in version 3.0: The new writeConcern field in the balancer configuration document allows you to specify the write concern (page 76) semantics of the _secondaryThrottle option.
By default, each document operation during chunk migration propagates to at least one secondary before the balancer proceeds with the next document, which is equivalent to a write concern of { w: 2 }. You can set the writeConcern option on the balancer configuration to set different write concern semantics.
To override this behavior and allow the balancer to continue without waiting for replication to a secondary, set the _secondaryThrottle parameter to false. See Change Replication Behavior for Chunk Migration (Secondary Throttle) (page 693) to update the _secondaryThrottle parameter for the balancer.
For the moveChunk command, the secondaryThrottle parameter is independent of the _secondaryThrottle parameter for the balancer.
Independent of the secondaryThrottle setting, certain phases of the chunk migration have the following replication policy:
MongoDB briefly pauses all application writes to the source shard before updating the config servers with the
new location for the chunk, and resumes the application writes after the update. The chunk move requires all
writes to be acknowledged by a majority of the members of the replica set both before and after committing the
chunk move to config servers.
When an outgoing chunk migration finishes and cleanup occurs, all writes must be replicated to a majority of
servers before further cleanup (from other outgoing migrations) or new incoming migrations can proceed.
Changed in version 2.4: In previous versions, the balancer did not wait for the document move to replicate to a
secondary. For details, see Secondary Throttle in the v2.2 Manual13 .
Jumbo Chunks
During chunk migration, if the chunk exceeds the specified chunk size (page 666) or if the number of documents in the
chunk exceeds Maximum Number of Documents Per Chunk to Migrate, MongoDB does not migrate
the chunk. Instead, MongoDB attempts to split (page 666) the chunk. If the split is unsuccessful, MongoDB labels the
chunk as jumbo to avoid repeated attempts to migrate the chunk.
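The decision described above can be sketched as a function. The size and document-count limits are passed in as parameters. The values below are illustrative only; the real document-count rule is the one given under Maximum Number of Documents Per Chunk to Migrate. The helper names and the `distinctShardKeyValues` field are hypothetical. They model the case where a chunk holding a single shard key value cannot be split.

```javascript
// Hypothetical model of the migrate / split / jumbo decision.
function planMigration(chunk, maxChunkBytes, maxDocsToMigrate, trySplit) {
  const tooLarge =
    chunk.sizeBytes > maxChunkBytes || chunk.docCount > maxDocsToMigrate;
  if (!tooLarge) return "migrate";
  if (trySplit(chunk)) return "split";
  chunk.jumbo = true; // flag it so the balancer stops retrying this chunk
  return "jumbo";
}

// Toy split check: only a chunk spanning more than one shard key value can split.
const canSplit = (chunk) => chunk.distinctShardKeyValues > 1;

const MB = 1024 * 1024;
const MAX_BYTES = 64 * MB;  // the default chunk size
const MAX_DOCS = 100000;    // illustrative value, not MongoDB's actual limit

const small = { sizeBytes: 10 * MB, docCount: 500, distinctShardKeyValues: 40 };
const large = { sizeBytes: 90 * MB, docCount: 5000, distinctShardKeyValues: 40 };
const singleValue = { sizeBytes: 90 * MB, docCount: 5000, distinctShardKeyValues: 1 };

console.log(planMigration(small, MAX_BYTES, MAX_DOCS, canSplit));       // migrate
console.log(planMigration(large, MAX_BYTES, MAX_DOCS, canSplit));       // split
console.log(planMigration(singleValue, MAX_BYTES, MAX_DOCS, canSplit)); // jumbo
```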
Chunk Splits in a Sharded Cluster
As chunks grow beyond the specified chunk size (page 666) a mongos instance will attempt to split the chunk in half.
Splits may lead to an uneven distribution of the chunks for a collection across the shards. In such cases, the mongos
instances will initiate a round of migrations to redistribute chunks across shards. See Sharded Collection Balancing
(page 663) for more details on balancing chunks across shards.
Chunk Size
The default chunk size in MongoDB is 64 megabytes. You can increase or reduce the chunk size (page 705), mindful
of its effect on the cluster's efficiency.
1. Small chunks lead to a more even distribution of data at the expense of more frequent migrations. This creates expense at the query routing (mongos) layer.
2. Large chunks lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. However, these efficiencies come at the expense of a potentially more uneven distribution of data.
3. Chunk size affects the Maximum Number of Documents Per Chunk to Migrate.
13 http://docs.mongodb.org/v2.2/tutorial/configure-sharded-cluster-balancer/#sharded-cluster-config-secondary-throttle
For many deployments, it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly
less evenly distributed data set.
Limitations
Changing the chunk size affects when chunks split but there are some limitations to its effects.
Automatic splitting only occurs during inserts or updates. If you lower the chunk size, it may take time for all
chunks to split to the new size.
Splits cannot be undone. If you increase the chunk size, existing chunks must grow through inserts or updates
until they reach the new size.
Note: Chunk ranges are inclusive of the lower boundary and exclusive of the upper boundary.
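A quick way to see the boundary rule is a lookup over half-open ranges [min, max). The chunk names and numeric bounds below are hypothetical, chosen only to show where a boundary value lands.

```javascript
// Chunk ranges include their lower bound and exclude their upper bound: [min, max).
function chunkContains(chunk, key) {
  return key >= chunk.min && key < chunk.max;
}

const toyChunks = [
  { name: "chunk1", min: 0, max: 100 },
  { name: "chunk2", min: 100, max: 200 },
];

// A key equal to a boundary belongs to the chunk that starts at that boundary.
function owningChunk(key) {
  const c = toyChunks.find((ch) => chunkContains(ch, key));
  return c ? c.name : null;
}

console.log(owningChunk(99));  // chunk1
console.log(owningChunk(100)); // chunk2 (100 is chunk1's exclusive upper bound)
console.log(owningChunk(200)); // null: outside both ranges in this toy example
```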
Indivisible Chunks
In some cases, chunks can grow beyond the specified chunk size (page 666) but cannot undergo a split; e.g. if a chunk
represents a single shard key value. See Considerations for Selecting Shard Keys (page 673) for considerations for
selecting a shard key.
Shard Key Indexes
All sharded collections must have an index that starts with the shard key. If you shard a collection without any
documents and without such an index, the shardCollection command will create the index on the shard key. If
the collection already has documents, you must create the index before using shardCollection.
Changed in version 2.2: The index on the shard key no longer needs to be only on the shard key. This index can be an
index of the shard key itself, or a compound index where the shard key is a prefix of the index.
Important: The index on the shard key cannot be a multikey index (page 474).
A sharded collection named people has for its shard key the field zipcode. It currently has the index { zipcode: 1 }. You can replace this index with a compound index { zipcode: 1, username: 1 }, as follows:
1. Create an index on { zipcode: 1, username: 1 }:
db.people.createIndex( { zipcode: 1, username: 1 } );
2. When MongoDB finishes building the index, you can safely drop the existing index on { zipcode: 1 }:
db.people.dropIndex( { zipcode: 1 } );
Since the index on the shard key cannot be a multikey index, the index { zipcode: 1, username: 1 } can only replace the index { zipcode: 1 } if there are no array values for the username field.
If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
For restrictions on shard key indexes, see limits-shard-keys.
Sharded Cluster Metadata
Config servers (page 650) store the metadata for a sharded cluster. The metadata reflects the state and organization of the
sharded data sets and system. The metadata includes the list of chunks on every shard and the ranges that define the
chunks. The mongos instances cache this data and use it to route read and write operations to shards.
Config servers store the metadata in the Config Database (page 716).
Important: Always back up the config database before doing any maintenance on the config server.
To access the config database, issue the following command from the mongo shell:
use config
In general, you should never edit the content of the config database directly. The config database contains the
following collections:
changelog (page 717)
chunks (page 718)
collections (page 719)
databases (page 719)
lockpings (page 719)
locks (page 719)
mongos (page 720)
settings (page 720)
shards (page 721)
version (page 721)
For more information on these collections and their role in sharded clusters, see Config Database (page 716). See Read
and Write Operations on Config Servers (page 650) for more information about reads and updates to the metadata.
Deploy Three Config Servers for Production Deployments (page 677) Convert a test deployment with one config
server to a production deployment with three config servers.
Convert a Replica Set to a Replicated Sharded Cluster (page 678) Convert a replica set to a sharded cluster in which
each shard is its own replica set.
Convert Sharded Cluster to Replica Set (page 683) Replace your sharded cluster with a single replica set.
See also:
Enable Authentication in a Sharded Cluster (page 346)
Deploy a Sharded Cluster
Use the following sequence of tasks to deploy a sharded cluster:
Warning: Sharding and localhost Addresses
If you use either localhost or 127.0.0.1 as the hostname portion of any host identifier, for example as the
host argument to addShard or the value to the --configdb run time option, then you must use localhost
or 127.0.0.1 for all host settings for any MongoDB instances in the cluster. If you mix localhost addresses and
remote host addresses, MongoDB will produce an error.
The config server processes are mongod instances that store the cluster's metadata. You designate a mongod as a
config server using the --configsvr option. Each config server stores a complete copy of the cluster's metadata.
In production deployments, you must deploy exactly three config server instances, each running on different servers
to assure good uptime and data safety. In test environments, you can run all three instances on a single server.
Important: All members of a sharded cluster must be able to connect to all other members of a sharded cluster,
including all shards and all config servers. Ensure that the network and security systems, including all interfaces and
firewalls, allow these connections.
1. Create data directories for each of the three config server instances. By default, a config server stores its data
files in the /data/configdb directory. You can choose a different location. To create a data directory, issue a
command similar to the following:
mkdir /data/configdb
2. Start the three config server instances. Start each by issuing a command using the following syntax:
mongod --configsvr --dbpath <path> --port <port>
The default port for config servers is 27019. You can specify a different port. The following example starts a
config server using the default port and default data directory:
mongod --configsvr --dbpath /data/configdb --port 27019
The mongos instances are lightweight and do not require data directories. You can run a mongos instance on a
system that runs other cluster components, such as on an application server or a server running a mongod process. By
default, a mongos instance runs on port 27017.
When you start the mongos instance, specify the hostnames of the three config servers, either in the configuration file
or as command line parameters.
Tip
To avoid downtime, give each config server a logical DNS name (unrelated to the server's physical or virtual hostname). Without logical DNS names, moving or renaming a config server requires shutting down every mongod and
mongos instance in the sharded cluster.
To start a mongos instance, issue a command using the following syntax:
mongos --configdb <config server hostnames>
For example, to start a mongos that connects to config server instances running on the following hosts and on the
default ports:
cfg0.example.net
cfg1.example.net
cfg2.example.net
You would issue the following command:
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019
Each mongos in a sharded cluster must use the same configDB string, with identical host names listed in identical
order.
If you start a mongos instance with a string that does not exactly match the string used by the other mongos instances
in the cluster, the mongos returns a Config Database String Error (page 714) error and refuses to start.
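Because the comparison is exact, listing the same hosts in a different order counts as a mismatch. The sketch below builds the --configdb value from an ordered host list and compares strings verbatim. The helper names `configDbString` and `allMongosMatch` are hypothetical; the host names are the ones from the example above.

```javascript
// Build the --configdb value from an ordered list of config server hosts.
function configDbString(hosts, port) {
  return hosts.map((h) => `${h}:${port}`).join(",");
}

// Every mongos must use exactly the same string, so compare the strings verbatim.
function allMongosMatch(strings) {
  return strings.every((s) => s === strings[0]);
}

const hosts = ["cfg0.example.net", "cfg1.example.net", "cfg2.example.net"];
const good = configDbString(hosts, 27019);
const reordered = configDbString([hosts[1], hosts[0], hosts[2]], 27019);

console.log(good); // cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019
console.log(allMongosMatch([good, good]));      // true
console.log(allMongosMatch([good, reordered])); // false: same hosts, different order
```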
Add Shards to the Cluster
A shard can be a standalone mongod or a replica set. In a production environment, each shard should be a replica set.
Use the procedure in Deploy a Replica Set (page 583) to deploy replica sets for each shard.
1. From a mongo shell, connect to the mongos instance. Issue a command using the following syntax:
mongo --host <hostname of machine running mongos> --port <port mongos listens on>
For example, if a mongos is accessible at mongos0.example.net on port 27017, issue the following
command:
mongo --host mongos0.example.net --port 27017
2. Add each shard to the cluster using the sh.addShard() method, as shown in the examples below. Issue
sh.addShard() separately for each shard. If the shard is a replica set, specify the name of the replica set and
specify a member of the set. In production deployments, all shards should be replica sets.
Optional
You can instead use the addShard database command, which lets you specify a name and maximum size for
the shard. If you do not specify these, MongoDB automatically assigns a name and maximum size. To use the
database command, see addShard.
The following are examples of adding a shard with sh.addShard():
To add a shard for a replica set named rs1 with a member running on port 27017 on
mongodb0.example.net, issue the following command:
sh.addShard( "rs1/mongodb0.example.net:27017" )
sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net
To add a shard for a standalone mongod on port 27017 of mongodb0.example.net, issue the following command:
sh.addShard( "mongodb0.example.net:27017" )
Note: It might take some time for chunks to migrate to the new shard.
Before you can shard a collection, you must enable sharding for the collection's database. Enabling sharding for a
database does not redistribute data but makes it possible to shard the collections in that database.
Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores
all data before sharding begins.
1. From a mongo shell, connect to the mongos instance. Issue a command using the following syntax:
mongo --host <hostname of machine running mongos> --port <port mongos listens on>
2. Issue the sh.enableSharding() method, specifying the name of the database for which to enable sharding.
Use the following syntax:
sh.enableSharding("<database>")
Optionally, you can enable sharding for a database using the enableSharding command, which uses the following
syntax:
db.runCommand( { enableSharding: <database> } )
3. Enable sharding for a collection by issuing the sh.shardCollection() method in the mongo shell. The
method uses the following syntax:
sh.shardCollection("<database>.<collection>", shard-key-pattern)
Replace the <database>.<collection> string with the full namespace of your database, which consists
of the name of your database, a dot (.), and the full name of the collection. The shard-key-pattern
represents your shard key, which you specify in the same form as you would an index key pattern.
Example
The following sequence of commands shards four collections:
sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } )
sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } )
sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } )
sh.shardCollection("events.alerts", { "_id": "hashed" } )
In order, these operations shard:
(a) The people collection in the records database using the shard key { "zipcode": 1, "name": 1 }.
This shard key distributes documents by the value of the zipcode field. If a number of documents have the same value for this field, then that chunk will be splittable (page 675) by the values of the name field.
(b) The addresses collection in the people database using the shard key { "state": 1, "_id": 1 }.
This shard key distributes documents by the value of the state field. If a number of documents have the same value for this field, then that chunk will be splittable (page 675) by the values of the _id field.
(c) The chairs collection in the assets database using the shard key { "type": 1, "_id": 1 }.
This shard key distributes documents by the value of the type field. If a number of documents have the same value for this field, then that chunk will be splittable (page 675) by the values of the _id field.
(d) The alerts collection in the events database using the shard key { "_id": "hashed" }.
For many collections there may be no single, naturally occurring key that possesses all the qualities of a good shard
key. The following strategies may help construct a useful shard key from existing data:
1. Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in
the _id field.
2. Use a compound shard key that uses two or three values from all documents that provide the right mix of
cardinality with scalable write operations and query isolation.
3. Determine that the impact of using a less than ideal shard key is insignificant in your use case, given:
Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your
database and cluster. Appropriate shard key choice depends on the schema of your data and the way that your applications query and write data.
Create a Shard Key that is Easily Divisible An easily divisible shard key makes it easy for MongoDB to distribute
content among the shards. Shard keys that have a limited number of possible values can result in chunks that are
unsplittable.
For instance, if a chunk represents a single shard key value, then MongoDB cannot split the chunk even when the
chunk exceeds the size at which splits (page 666) occur.
See also:
Cardinality (page 675)
Create a Shard Key that has High Degree of Randomness A shard key with a high degree of randomness prevents
any single shard from becoming a bottleneck and will distribute write operations among the cluster.
See also:
Write Scaling (page 655)
Create a Shard Key that Targets a Single Shard A shard key that targets a single shard makes it possible for the
mongos program to return most query operations directly from a single specific mongod instance. Your shard key
should be the primary field used by your queries. Fields with a high degree of randomness make it difficult to target
operations to specific shards.
See also:
Query Isolation (page 655)
Shard Using a Compound Shard Key The challenge when selecting a shard key is that there is not always an
obvious choice. Often, an existing field in your collection may not be the optimal key. In those situations, computing
a special purpose shard key into an additional field or using a compound shard key may help produce one that is more
ideal.
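As a rough sketch of computing such a field in the application layer, the JavaScript below derives one value from a low-cardinality field that queries filter on and a high-cardinality field that keeps chunks splittable, and stores it in a copy of each document before insert. The field names state, customerId, and shardKey and the helper withComputedShardKey are hypothetical. A compound shard key such as { state: 1, customerId: 1 } would give a similar ordering without an extra field.

```javascript
// Hypothetical application-side helper: derive a single shard key value
// from a low-cardinality field (state, useful for targeting queries) and a
// high-cardinality field (customerId, keeps chunks splittable).
// Assumes customerId is a non-negative integer with at most 10 digits.
function withComputedShardKey(doc) {
  var copy = {};
  for (var k in doc) {
    if (Object.prototype.hasOwnProperty.call(doc, k)) copy[k] = doc[k];
  }
  // Zero-pad the id so that string ordering matches numeric ordering.
  var id = String(doc.customerId);
  while (id.length < 10) id = "0" + id;
  copy.shardKey = doc.state + "|" + id;
  return copy;
}

console.log(withComputedShardKey({ state: "MA", customerId: 42 }).shardKey);
// MA|0000000042
```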
Cardinality In the context of MongoDB, cardinality refers to the ability of the system to partition data into chunks.
For example, consider a collection of data such as an address book that stores address records.
Consider the use of a state field as a shard key:
The state key's value holds the US state for a given address document. This field has a low cardinality, as all
documents that have the same value in the state field must reside on the same shard, even if a particular state's
chunk exceeds the maximum chunk size.
Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly
among a small number of fixed chunks. This may have a number of effects:
If MongoDB cannot split a chunk because all of its documents have the same shard key value, migrations involving these un-splittable chunks will take longer than other migrations, and it will be more difficult for your
data to stay balanced.
If you have a fixed maximum number of chunks, you will never be able to use more than that number of
shards for this collection.
Consider the use of a zipcode field as a shard key:
While this field has a large number of possible values, and thus has potentially higher cardinality, it's possible
that a large number of users could have the same value for the shard key, which would make this chunk of users
un-splittable.
In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. dry cleaning businesses in America), then a value like zipcode would be sufficient.
However, if your address book is more geographically concentrated (e.g. ice cream stores in Boston, Massachusetts), then you may have a much lower cardinality.
Consider the use of a phone-number field as a shard key:
Phone number has a high cardinality: because users will generally have a unique value for this field, MongoDB
will be able to split as many chunks as needed.
While high cardinality is necessary for ensuring an even distribution of data, having a high cardinality does not
guarantee sufficient query isolation (page 655) or appropriate write scaling (page 655).
If you choose a shard key with low cardinality, some chunks may grow too large for MongoDB to migrate. See Jumbo
Chunks (page 666) for more information.
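The difference between the state, zipcode, and phone-number keys can be checked against a sample of your own data. This plain JavaScript sketch uses a hypothetical helper, cardinalityProfile, and made-up sample documents. It reports two numbers for a candidate field. The first is the count of distinct values, which bounds how many chunks that field alone can produce. The second is the fraction of documents sharing the most common value, which would all be confined to one chunk that cannot be split.

```javascript
// Hypothetical helper: summarize how well a candidate shard key field
// could be split, based on a sample of documents.
function cardinalityProfile(docs, field) {
  var counts = Object.create(null);
  var distinct = 0;
  var largest = 0;
  docs.forEach(function (doc) {
    var key = JSON.stringify(doc[field]);
    if (!(key in counts)) {
      counts[key] = 0;
      distinct++;
    }
    counts[key]++;
    if (counts[key] > largest) largest = counts[key];
  });
  return {
    distinctValues: distinct, // upper bound on chunks for this field alone
    largestGroupFraction: docs.length ? largest / docs.length : 0
  };
}

// Made-up sample address records.
var sample = [
  { state: "MA", zipcode: "02139", phone: "555-0101" },
  { state: "MA", zipcode: "02139", phone: "555-0102" },
  { state: "MA", zipcode: "02116", phone: "555-0103" },
  { state: "NY", zipcode: "10001", phone: "555-0104" }
];
["state", "zipcode", "phone"].forEach(function (f) {
  console.log(f, cardinalityProfile(sample, f));
});
// state: 2 distinct values, 75% of documents share one value
// phone: 4 distinct values, no value shared
```

A sample can only suggest a problem; the real distribution of a production collection may differ.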
Shard Key Selection Strategy
When selecting a shard key, it is difficult to balance the qualities of an ideal shard key, which sometimes dictate
opposing strategies. For instance, it's difficult to produce a key that both has a high degree of randomness for even data
distribution and allows your application to target specific shards. For some workloads, it's more
important to have an even data distribution, and for others targeted queries are essential.
Therefore, the selection of a shard key is about balancing both your data and the performance characteristics caused
by different possible data distributions and system workloads.
Shard a Collection Using a Hashed Shard Key
New in version 2.4.
Hashed shard keys (page 654) use a hashed index (page 506) of a field as the shard key to partition data across your
sharded cluster.
For suggestions on choosing the right field as your hashed shard key, see Hashed Shard Keys (page 654). For limitations on hashed indexes, see Create a Hashed Index (page 506).
Note: If chunk migrations are in progress while creating a hashed shard key collection, the initial chunk distribution
may be uneven until the balancer automatically balances the collection.
To shard a collection using a hashed shard key, use an operation in the mongo shell that resembles the following:
sh.shardCollection( "records.active", { a: "hashed" } )
This operation shards the active collection in the records database, using a hash of the a field as the shard key.
Specify the Initial Number of Chunks
If you shard an empty collection using a hashed shard key, MongoDB automatically creates and migrates empty chunks
so that each shard has two chunks. To control how many chunks MongoDB creates when sharding the collection, use
shardCollection with the numInitialChunks parameter.
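The shardCollection command takes the namespace, the key pattern, and the optional numInitialChunks field in one document, with the command name as the first field. The sketch below only builds that document in plain JavaScript so its shape can be inspected. shardCollectionCommand is a hypothetical helper. In a mongo shell connected to a mongos you would pass the resulting object to db.adminCommand().

```javascript
// Hypothetical builder for the shardCollection command document.
// The command name must be the first field, so it is set first.
function shardCollectionCommand(ns, hashedField, numInitialChunks) {
  var key = {};
  key[hashedField] = "hashed";
  var cmd = { shardCollection: ns, key: key };
  if (numInitialChunks !== undefined) {
    // Only meaningful for an empty collection with a hashed shard key.
    cmd.numInitialChunks = numInitialChunks;
  }
  return cmd;
}

// Without numInitialChunks, MongoDB creates two chunks per shard.
var cmd = shardCollectionCommand("records.active", "a", 8);
console.log(JSON.stringify(cmd));
// {"shardCollection":"records.active","key":{"a":"hashed"},"numInitialChunks":8}
```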
Important: MongoDB 2.4 adds support for hashed shard keys. After sharding a collection with a hashed shard key,
you must use the MongoDB 2.4 or higher mongos and mongod instances in your sharded cluster.
Warning: MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For
example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To
prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to
64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values
larger than 2^53.
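To see why the warning matters, the sketch below applies the same truncation in plain JavaScript (Node.js): 2.3, 2.2, and 2.9 all become 2, so they would feed the same value to the hash. safeForHashedIndex is a hypothetical, conservative check: it accepts a number only if it is already an integer no larger than 2^53 in magnitude. That is one reading of the warning above, not the server's exact rule.

```javascript
// Truncation toward zero, as described for hashed indexes.
var truncated = [2.3, 2.2, 2.9].map(Math.trunc);
console.log(truncated); // [ 2, 2, 2 ]: three values, one hash input

// Hypothetical, conservative check before using a numeric field as a
// hashed shard key: the value must survive truncation unchanged and must
// not exceed 2^53 in magnitude.
function safeForHashedIndex(x) {
  return typeof x === "number" &&
    Math.trunc(x) === x &&
    Math.abs(x) <= Math.pow(2, 53);
}
```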
Balancing When you add a shard to a sharded cluster, you affect the balance of chunks among the shards of a cluster
for all existing sharded collections. The balancer will begin migrating chunks so that the cluster will achieve balance.
See Sharded Collection Balancing (page 663) for more information.
Capacity Planning When adding a shard to a cluster, always ensure that the cluster has enough capacity to support
the migration required for balancing the cluster without affecting legitimate production traffic.
Add a Shard to a Cluster
2. Add a shard to the cluster using the sh.addShard() method, as shown in the examples below. Issue
sh.addShard() separately for each shard. If the shard is a replica set, specify the name of the replica
set and specify a member of the set. In production deployments, all shards should be replica sets.
Optional
You can instead use the addShard database command, which lets you specify a name and maximum size for
the shard. If you do not specify these, MongoDB automatically assigns a name and maximum size. To use the
database command, see addShard.
The following are examples of adding a shard with sh.addShard():
To add a shard for a replica set named rs1 with a member running on port 27017 on
mongodb0.example.net, issue the following command:
sh.addShard( "rs1/mongodb0.example.net:27017" )
sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net
To add a shard for a standalone mongod on port 27017 of mongodb0.example.net, issue the following command:
sh.addShard( "mongodb0.example.net:27017" )
Note: It might take some time for chunks to migrate to the new shard.
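The argument to sh.addShard() follows the two forms shown above: "<replica set name>/<host:port>[,<host:port>...]" for a replica set, and a bare "<host:port>" for a standalone mongod. The sketch below builds that string in plain JavaScript. addShardArgument is a hypothetical helper, and its result is what you would pass to sh.addShard() from a mongo shell connected to a mongos.

```javascript
// Hypothetical helper: build the argument for sh.addShard().
// Replica set shards use "<setName>/<host:port>[,<host:port>...]";
// a standalone mongod is just "<host:port>".
function addShardArgument(setName, hosts) {
  if (!hosts || hosts.length === 0) {
    throw new Error("at least one host:port is required");
  }
  if (!setName) {
    if (hosts.length !== 1) {
      throw new Error("a standalone shard takes exactly one host");
    }
    return hosts[0];
  }
  return setName + "/" + hosts.join(",");
}

console.log(addShardArgument("rs1", ["mongodb0.example.net:27017"]));
// rs1/mongodb0.example.net:27017
console.log(addShardArgument(null, ["mongodb0.example.net:27017"]));
// mongodb0.example.net:27017
```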
3. Start all three config servers, using the same invocation that you used for the single config server.
mongod --configsvr
This tutorial converts a single three-member replica set to a sharded cluster with two shards. Each shard is an independent three-member replica set. The procedure is as follows:
1. Create the initial three-member replica set and insert data into a collection. See Set Up Initial Replica Set
(page 678).
2. Start the config databases and a mongos. See Deploy Config Databases and mongos (page 679).
3. Add the initial replica set as a shard. See Add Initial Replica Set as a Shard (page 680).
4. Create a second shard and add to the cluster. See Add Second Shard (page 680).
5. Shard the desired collection. See Shard a Collection (page 681).
Prerequisites
This tutorial uses a total of ten servers: one server for the mongos and three servers each for the first replica set, the
second replica set, and the config servers (page 650).
Each server must have a resolvable domain, hostname, or IP address within your system.
The tutorial uses the default data directories (e.g. /data/db and /data/configdb). Create the appropriate directories
with appropriate permissions. To use different paths, see
http://docs.mongodb.org/manual/reference/configuration-options.
The tutorial uses the default ports (page 408) (e.g. 27017 and 27019). To use different ports, see
http://docs.mongodb.org/manual/reference/configuration-options.
Considerations
In production deployments, use exactly three config servers. Each config server must be on a separate machine.
In development and testing environments, you can deploy a cluster with a single config server.
Procedures
Set Up Initial Replica Set This procedure creates the initial three-member replica set rs0. The replica
set members are on the following hosts: mongodb0.example.net, mongodb1.example.net, and
mongodb2.example.net.
Step 1: Start each member of the replica set with the appropriate options. For each member, start a mongod,
specifying the replica set name through the replSet option. Include any other parameters specific to your deployment. For replication-specific parameters, see cli-mongod-replica-set.
mongod --replSet "rs0"
Repeat this step for the other two members of the rs0 replica set.
Step 2: Connect a mongo shell to a replica set member. Connect a mongo shell to one member of the replica set
(e.g. mongodb0.example.net)
mongo mongodb0.example.net
From the mongo shell, run rs.initiate() to initiate a replica set that consists
rs.initiate()
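rs.initiate() can also take an explicit configuration document that lists every member up front. The sketch below builds such a document in plain JavaScript for rs0. replicaSetConfig is a hypothetical helper; the _id and members[i].host fields follow the replica set configuration document (page 632). In a mongo shell connected to mongodb0.example.net you would pass the returned object to rs.initiate().

```javascript
// Hypothetical helper: build a replica set configuration document.
// Member _id values must be unique within the set; here they are 0..n-1.
function replicaSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, i) {
      return { _id: i, host: host };
    })
  };
}

var rs0Config = replicaSetConfig("rs0", [
  "mongodb0.example.net:27017",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017"
]);
// In the mongo shell: rs.initiate(rs0Config)
```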
Step 5: Create and populate a new collection. The following step adds one million documents to the collection
test_collection and can take several minutes depending on your system.
Issue the following operations on the primary of the replica set:
use test
var bulk = db.test_collection.initializeUnorderedBulkOp();
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "K
for (var i = 0; i < 1000000; i++) {
  var user_id = i;
  var name = people[Math.floor(Math.random() * people.length)];
  var number = Math.floor(Math.random() * 10001);
  bulk.insert( { "user_id": user_id, "name": name, "number": number } );
}
bulk.execute();
For more information on deploying a replica set, see Deploy a Replica Set (page 583).
Deploy Config Databases and mongos This procedure deploys the three config servers and the mongos.
The config servers use the following hosts: mongodb7.example.net, mongodb8.example.net, and
mongodb9.example.net; the mongos uses mongodb6.example.net.
Step 1: Start three config databases. On each mongodb7.example.net, mongodb8.example.net, and
mongodb9.example.net server, start the config server using default data directory /data/configdb and the
default port 27019:
mongod --configsvr
To modify the default settings or to include additional options specific to your deployment, see
http://docs.mongodb.org/manual/reference/configuration-options.
Step 2: Start a mongos instance. On mongodb6.example.net, start the mongos specifying the config
servers. The mongos runs on the default port 27017.
This tutorial specifies a small --chunkSize of 1 MB to test sharding with the test_collection created earlier.
Note: In production environments, do not use a small chunkSize.
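The exact mongos invocation is not shown in this step. As a hedged sketch, the JavaScript below assembles plausible arguments from the three config server hosts and the 1 MB chunk size this tutorial uses. --configdb and --chunkSize are mongos options in this release. mongosArgs is hypothetical, and appending the default config server port 27019 to bare hostnames is an assumption of the sketch.

```javascript
// Hypothetical helper: assemble mongos command-line arguments.
// Config servers listen on 27019 by default, so bare hostnames get that
// port. Assumes plain hostnames or IPv4 addresses (no IPv6 literals).
function mongosArgs(configHosts, chunkSizeMB) {
  var withPorts = configHosts.map(function (h) {
    return h.indexOf(":") === -1 ? h + ":27019" : h;
  });
  var args = ["--configdb", withPorts.join(",")];
  if (chunkSizeMB !== undefined) {
    args.push("--chunkSize", String(chunkSizeMB));
  }
  return args;
}

var args = mongosArgs([
  "mongodb7.example.net",
  "mongodb8.example.net",
  "mongodb9.example.net"
], 1);
console.log("mongos " + args.join(" "));
// mongos --configdb mongodb7.example.net:27019,mongodb8.example.net:27019,mongodb9.example.net:27019 --chunkSize 1
```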
Add Initial Replica Set as a Shard The following procedure adds the initial replica set rs0 as a shard.
Step 1: Connect a mongo shell to the mongos.
mongo mongodb6.example.net:27017/admin
Step 2: Add the shard. Add a shard to the cluster with the sh.addShard method:
sh.addShard( "rs0/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017" )
Add Second Shard The following procedure deploys a new replica set rs1 for the second shard and
adds it to the cluster. The replica set members are on the following hosts: mongodb3.example.net,
mongodb4.example.net, and mongodb5.example.net.
Step 1: Start each member of the replica set with the appropriate options. For each member, start a mongod,
specifying the replica set name through the replSet option. Include any other parameters specific to your deployment. For replication-specific parameters, see cli-mongod-replica-set.
mongod --replSet "rs1"
Repeat this step for the other two members of the rs1 replica set.
Step 2: Connect a mongo shell to a replica set member. Connect a mongo shell to one member of the replica set
(e.g. mongodb3.example.net)
mongo mongodb3.example.net
From the mongo shell, run rs.initiate() to initiate a replica set that consists
rs.initiate()
Step 4: Add the remaining members to the replica set. Add the remaining members with the rs.add() method.
rs.add("mongodb4.example.net")
rs.add("mongodb5.example.net")
Step 6: Add the shard. In a mongo shell connected to the mongos, add the shard to the cluster with the
sh.addShard() method:
sh.addShard( "rs1/mongodb3.example.net:27017,mongodb4.example.net:27017,mongodb5.example.net:27017" )
Shard a Collection
Step 1: Connect a mongo shell to the mongos.
mongo mongodb6.example.net:27017/admin
Step 2: Enable sharding for a database. Before you can shard a collection, you must first enable sharding for the
collection's database. Enabling sharding for a database does not redistribute data but makes it possible to shard the
collections in that database.
The following operation enables sharding on the test database:
sh.enableSharding( "test" )
Step 3: Determine the shard key. For the collection to shard, determine the shard key. The shard key (page 654)
determines how MongoDB distributes the documents between shards. Good shard keys:
have values that are evenly distributed among all documents,
group documents that are often accessed at the same time into contiguous chunks, and
allow for effective distribution of activity among shards.
Once you shard a collection with the specified shard key, you cannot change the shard key. For more information on
shard keys, see Shard Keys (page 654) and Considerations for Selecting Shard Keys (page 673).
This procedure will use the number field as the shard key for test_collection.
Step 4: Create an index on the shard key. Before sharding a non-empty collection, create an index on the shard
key (page 667).
use test
db.test_collection.createIndex( { number : 1 } )
Step 5: Shard the collection. In the test database, shard the test_collection, specifying number as the
shard key.
use test
sh.shardCollection( "test.test_collection", { "number" : 1 } )
The balancer (page 663) will redistribute chunks of documents when it next runs. As clients insert additional documents into this collection, the mongos will route the documents between the shards.
Step 6: Confirm the shard is balancing. To confirm balancing activity, run db.stats() or
db.printShardingStatus() in the test database.
use test
db.stats()
db.printShardingStatus()
"storageSize" : 177561600,
"numExtents" : 21,
"indexes" : 4,
"indexSize" : 58540160,
"fileSize" : 536870912,
"extentFreeList" : {
"num" : 0,
"totalSize" : 0
},
"ok" : 1
}
Run these commands for a second time to demonstrate that chunks are migrating from rs0 to rs1.
Convert Sharded Cluster to Replica Set
This tutorial describes the process for converting a sharded cluster to a non-sharded replica set. To convert a replica set
into a sharded cluster, see Convert a Replica Set to a Replicated Sharded Cluster (page 678). See the Sharding (page 641)
documentation for more information on sharded clusters.
Convert a Cluster with a Single Shard into a Replica Set
In the case of a sharded cluster with only one shard, that shard contains the full data set. Use the following procedure
to convert that cluster into a non-sharded replica set:
1. Reconfigure the application to connect to the primary member of the replica set hosting the single shard. That
replica set will be the new non-sharded replica set.
2. Optionally remove the --shardsvr option, if your mongod started with this option.
Tip
Changing the --shardsvr option will change the port that mongod listens on for incoming connections.
The single-shard cluster is now a non-sharded replica set that will accept read and write operations on the data set.
You may now decommission the remaining sharding infrastructure.
Convert a Sharded Cluster into a Replica Set
Use the following procedure to transition from a sharded cluster with more than one shard to an entirely new replica
set.
1. With the sharded cluster running, deploy a new replica set (page 583) in addition to your sharded cluster. The
replica set must have sufficient capacity to hold all of the data files from all of the current shards combined. Do
not configure the application to connect to the new replica set until the data transfer is complete.
2. Stop all writes to the sharded cluster. You may reconfigure your application or stop all mongos instances.
If you stop all mongos instances, the applications will not be able to read from the database. If you stop all
mongos instances, start a temporary mongos instance on a port that applications cannot access for the data migration
procedure.
3. Use mongodump and mongorestore (page 246) to migrate the data from the mongos instance to the new replica
set.
Note: Not all collections on all databases are necessarily sharded. Do not solely migrate the sharded collections.
Ensure that all databases and all collections migrate correctly.
4. Reconfigure the application to use the non-sharded replica set instead of the mongos instance.
The application will now use the un-sharded replica set for reads and writes. You may now decommission the remaining unused sharded cluster infrastructure.
To list the databases that have sharding enabled, query the databases collection in the Config Database (page 716).
A database has sharding enabled if the value of the partitioned field is true. Connect to a mongos instance
with a mongo shell, and run the following operation to get a full list of databases with sharding enabled:
use config
db.databases.find( { "partitioned": true } )
Example
You can use the following sequence of commands to return a list of all databases in the cluster:
use config
db.databases.find()
List Shards
To list the current set of configured shards, use the listShards command, as follows:
use admin
db.runCommand( { listShards : 1 } )
To view cluster details, issue db.printShardingStatus() or sh.status(). Both methods return the same
output.
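The host field reported for a replica set shard uses the same "<setName>/<host:port>,..." form that sh.addShard() accepts, while a standalone shard is a bare host:port. A small parsing sketch in plain JavaScript follows. parseShardHost is a hypothetical helper that you could apply to each entry of the shards array returned by listShards.

```javascript
// Hypothetical helper: split a shard "host" string from listShards output
// into its replica set name (or null for a standalone) and member list.
function parseShardHost(host) {
  var slash = host.indexOf("/");
  var setName = slash === -1 ? null : host.slice(0, slash);
  var members = (slash === -1 ? host : host.slice(slash + 1)).split(",");
  return { replicaSet: setName, members: members };
}

console.log(parseShardHost("m0.example.net:30001"));
console.log(parseShardHost("rs1/mongodb3.example.net:27017,mongodb4.example.net:27017"));
```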
Example
In the following example output from sh.status()
sharding version displays the version number of the shard metadata.
shards displays a list of the mongod instances used as shards in the cluster.
databases displays all databases in the cluster, including databases that do not have sharding enabled.
The chunks information for the foo database displays how many chunks are on each shard and displays the
range of each chunk.
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "host" : "m0.example.net:30001" }
{ "_id" : "shard0001", "host" : "m3.example2.net:50000" }
databases:
{
{
4. Start the config server instance on the new system. The default invocation is:
mongod --configsvr
When you start the third config server, your cluster will become writable and it will be able to create new splits and
migrate chunks as needed.
Migrate Config Servers with Different Hostnames
Overview
Sharded clusters use a group of three config servers to store cluster metadata, and all three config servers must be
available to support cluster metadata changes that include chunk splits and migrations. If one of the config servers is
unavailable or inoperable, you must replace it as soon as possible.
This procedure migrates a config server (page 650) in a sharded cluster (page 647) to a new server that uses a different
hostname. Use this procedure only if the config server will not be accessible via the same hostname. If possible, avoid
changing the hostname so that you can instead use the procedure to migrate a config server and use the same hostname
(page 686).
Considerations
Changing the hostname of a config server (page 650) requires downtime and requires restarting every process in the
sharded cluster.
While migrating config servers, always make sure that all mongos instances have three config servers specified in the
configDB setting at all times. Also ensure that you specify the config servers in the same order for each mongos
instance's configDB setting.
Procedure
1. Disable the cluster balancer process temporarily. See Disable the Balancer (page 695) for more information.
2. Shut down the config server to migrate.
This renders all config data for the sharded cluster read only.
3. Copy the contents of dbPath from the old config server to the new config server. For example, to copy
the contents of dbPath to a machine named mongodb.config2.example.net, use a command that
resembles the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
4. Start the config server instance on the new system. The default invocation is:
mongod --configsvr
Sharded clusters use a group of three config servers to store cluster metadata, and all three config servers must be
available to support cluster metadata changes that include chunk splits and migrations. If one of the config servers is
unavailable or inoperable, you must replace it as soon as possible.
This procedure replaces an inoperable config server (page 650) in a sharded cluster (page 647). Use this procedure
only to replace a config server that has become inoperable (e.g. hardware failure).
This process assumes that the hostname of the instance will not change. If you must change the hostname of the
instance, use the procedure to migrate a config server and use a new hostname (page 686).
Considerations
In the course of this procedure never remove a config server from the configDB parameter on any of the mongos
instances.
Procedure
Step 1: Provision a new system, with the same IP address and hostname as the previous host. You will have to
ensure the new system has the same IP address and hostname as the system it's replacing or you will need to modify
the DNS records and wait for them to propagate.
Step 2: Shut down one of the remaining config servers. Copy all of this host's dbPath from the current
system to the system that will provide the new config server. This command, issued on the system with the data files,
may resemble the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
Step 3: If necessary, update DNS and/or networking. Ensure the new config server is accessible by the same
name as the previous config server.
Step 4: Start the new config server.
mongod --configsvr
Disable the balancer to stop chunk migration (page 664) and do not perform any metadata write operations until the
process finishes. If a migration is in progress, the balancer will complete the in-progress migration before stopping.
To disable the balancer, connect to one of the cluster's mongos instances and issue the following method:
sh.stopBalancer()
Migrate each config server (page 650) by starting with the last config server listed in the configDB string. Proceed
in reverse order of the configDB string. Migrate and restart a config server before proceeding to the next. Do not
rename a config server during this process.
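Because the order is defined by the configDB string, it can be derived mechanically. This plain JavaScript sketch lists the config servers in the order to migrate them. configServerMigrationOrder is a hypothetical helper and the cfg*.example.net hostnames are examples.

```javascript
// Hypothetical helper: config servers in migration order, which is the
// reverse of their order in the configDB string.
function configServerMigrationOrder(configDB) {
  return configDB.split(",").map(function (s) { return s.trim(); }).reverse();
}

var order = configServerMigrationOrder(
  "cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019");
console.log(order.join(" -> "));
// cfg2.example.net:27019 -> cfg1.example.net:27019 -> cfg0.example.net:27019
```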
Note: If the name or address that a sharded cluster uses to connect to a config server changes, you must restart every
mongod and mongos instance in the sharded cluster. Avoid downtime by using CNAMEs to identify config servers
within the MongoDB deployment.
See Migrate Config Servers with Different Hostnames (page 686) for more information.
Important: Start with the last config server listed in configDB.
1. Shut down the config server.
This renders all config data for the sharded cluster read only.
2. Change the DNS entry that points to the system that provided the old config server, so that the same hostname
points to the new system. How you do this depends on how you organize your DNS and hostname resolution
services.
3. Copy the contents of dbPath from the old config server to the new config server.
For example, to copy the contents of dbPath to a machine named mongodb.config2.example.net,
you might issue a command similar to the following:
rsync -az /data/configdb/ mongodb.config2.example.net:/data/configdb
4. Start the config server instance on the new system. The default invocation is:
mongod --configsvr
If the configDB string will change as part of the migration, you must shut down all mongos instances before
changing the configDB string. This avoids errors in the sharded cluster over configDB string conflicts.
If the configDB string will remain the same, you can migrate the mongos instances sequentially or all at once.
1. Shut down the mongos instances using the shutdown command. If the configDB string is changing, shut
down all mongos instances.
2. If the hostname has changed for any of the config servers, update the configDB string for each mongos
instance. The mongos instances must all use the same configDB string. The strings must list identical host
names in identical order.
Tip
To avoid downtime, give each config server a logical DNS name (unrelated to the server's physical or virtual
hostname). Without logical DNS names, moving or renaming a config server requires shutting down every
mongod and mongos instance in the sharded cluster.
3. Restart the mongos instances being sure to use the updated configDB string if hostnames have changed.
For more information, see Start the mongos Instances (page 671).
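Before restarting, it can help to compare the configDB value each mongos will use. The plain JavaScript sketch below assumes you have already collected those strings yourself, for example from each instance's configuration. configDBMismatches is a hypothetical helper, and the mongos names and hosts are examples.

```javascript
// Hypothetical helper: every mongos must use an identical configDB string,
// listing the same hostnames in the same order. This compares exact
// strings, so it also flags differences in spacing.
function configDBMismatches(configDBByMongos) {
  var names = Object.keys(configDBByMongos);
  if (names.length === 0) return [];
  var reference = configDBByMongos[names[0]];
  return names.filter(function (name) {
    return configDBByMongos[name] !== reference;
  });
}

var configDBSeen = {
  "mongos-a": "cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019",
  "mongos-b": "cfg0.example.net:27019,cfg2.example.net:27019,cfg1.example.net:27019"
};
console.log(configDBMismatches(configDBSeen)); // [ 'mongos-b' ]
```

The first entry serves as the reference. If that entry is the odd one out, every other instance is reported instead, so treat any non-empty result as "the strings disagree."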
Migrate the Shards
Migrate the shards one at a time. For each shard, follow the appropriate procedure in this section.
Migrate a Replica Set Shard To migrate a replica set shard, migrate each member separately. First migrate the
non-primary members, and then migrate the primary last.
If the replica set has two voting members, add an arbiter (page 552) to the replica set to ensure the set keeps a majority
of its votes available during the migration. You can remove the arbiter after completing the migration.
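The reason for the arbiter is majority arithmetic: a replica set needs a strict majority of its voting members to elect a primary. The sketch below, using hypothetical helper names, checks whether a set keeps that majority while one voting member is shut down for migration.

```javascript
// Strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True if the remaining voters can still form a majority while
// one voting member is offline.
function keepsMajorityWithOneDown(votingMembers) {
  return votingMembers - 1 >= majority(votingMembers);
}

console.log(keepsMajorityWithOneDown(2)); // false: 1 of 2 is not a majority
console.log(keepsMajorityWithOneDown(3)); // true: 2 of 3, e.g. after adding an arbiter
```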
Migrate a Member of a Replica Set Shard
1. Shut down the mongod process. To ensure a clean shutdown, use the shutdown command.
2. Move the data directory (i.e., the dbPath) to the new machine.
3. Restart the mongod process at the new location.
4. Connect to the replica set's current primary.
5. If the hostname of the member has changed, use rs.reconfig() to update the replica set configuration
document (page 632) with the new hostname.
For example, the following sequence of commands updates the hostname for the instance at position 2 in the
members array:
cfg = rs.conf()
cfg.members[2].host = "pocatello.example.net:27017"
rs.reconfig(cfg)
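Hard-coding members[2] assumes you already know the member's position in the array. A variant that finds the member by its old host string is sketched below. replaceMemberHost is a hypothetical helper: it modifies and returns the document you obtained from rs.conf(), which you would then pass to rs.reconfig(). The example configuration is made up.

```javascript
// Hypothetical helper: change one member's host in a replica set config
// document, locating the member by its old host string.
function replaceMemberHost(cfg, oldHost, newHost) {
  var matches = cfg.members.filter(function (m) { return m.host === oldHost; });
  if (matches.length !== 1) {
    throw new Error("expected exactly one member with host " + oldHost);
  }
  matches[0].host = newHost;
  return cfg;
}

// Made-up example; in the mongo shell you would start from rs.conf().
var exampleCfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};
replaceMemberHost(exampleCfg, "mongodb2.example.net:27017", "pocatello.example.net:27017");
// In the mongo shell: rs.reconfig(exampleCfg)
```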
2. Wait until the primary has stepped down and another member has entered the PRIMARY (page 635) state. Then, to migrate
the stepped-down primary, follow the Migrate a Member of a Replica Set Shard (page 690) procedure.
You can check the output of rs.status() to confirm the change in status.
Migrate a Standalone Shard The ideal procedure for migrating a standalone shard is to convert the standalone to a
replica set (page 594) and then use the procedure for migrating a replica set shard (page 690). In production clusters,
all shards should be replica sets, which provides continued availability during maintenance windows.
Migrating a shard as standalone is a multi-step process during which part of the shard may be unavailable. If the shard
is the primary shard for a database, the process includes the movePrimary command. While movePrimary
runs, you should stop modifying data in that database. To migrate the standalone shard, use the Remove Shards from
an Existing Sharded Cluster (page 697) procedure.
Re-Enable the Balancer
To complete the migration, re-enable the balancer to resume chunk migrations (page 664).
Connect to one of the cluster's mongos instances and pass true to the sh.setBalancerState() method:
sh.setBalancerState(true)
operational requirements. If you encounter a situation where you need to modify the behavior of the balancer, use the
procedures described in this document.
For conceptual information about the balancer, see Sharded Collection Balancing (page 663) and Cluster Balancer
(page 663).
Schedule a Window of Time for Balancing to Occur
You can schedule a window of time during which the balancer can migrate chunks, as described in the following
procedures:
Schedule the Balancing Window (page 694)
Remove a Balancing Window Schedule (page 695).
The mongos instances use their own local timezones when respecting the balancer window.
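A balancing window is stored in the balancer document of config.settings as activeWindow, with start and stop times of day in "HH:MM" form. A window such as 23:00 to 6:00 wraps past midnight. The sketch below shows one way to evaluate such a window against a mongos's local time. inBalancingWindow and toMinutes are hypothetical helpers. Treating the start as inclusive and the stop as exclusive is an assumption of this sketch, not documented server behavior.

```javascript
// Minutes since midnight for an "HH:MM" string (24-hour clock).
function toMinutes(hhmm) {
  var parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Hypothetical helper: is a local time inside a start/stop window?
// Start is treated as inclusive and stop as exclusive (an assumption).
function inBalancingWindow(now, window) {
  var t = toMinutes(now);
  var start = toMinutes(window.start);
  var stop = toMinutes(window.stop);
  if (start <= stop) return t >= start && t < stop;
  return t >= start || t < stop; // window wraps past midnight
}

var nightly = { start: "23:00", stop: "6:00" };
console.log(inBalancingWindow("01:30", nightly)); // true
console.log(inBalancingWindow("12:00", nightly)); // false
```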
Configure Default Chunk Size
The default chunk size for a sharded cluster is 64 megabytes. In most situations, the default size is appropriate for
splitting and migrating chunks. For details on how chunk size affects deployments, see Chunk Size
(page 666).
Changing the default chunk size affects chunks that are processed during migrations and auto-splits but does not
retroactively affect all chunks.
To configure default chunk size, see Modify Chunk Size in a Sharded Cluster (page 705).
Change the Maximum Storage Size for a Given Shard
The maxSize field in the shards (page 721) collection in the config database (page 716) sets the maximum size
for a shard, allowing you to control whether the balancer will migrate chunks to a shard. If the mapped size [15] is above
a shard's maxSize, the balancer will not move chunks to the shard. Also, the balancer will not move chunks off an
overloaded shard. This must happen manually. The maxSize value only affects the balancer's selection of destination
shards.
By default, maxSize is not specified, allowing shards to consume the total amount of available space on their machines if necessary.
You can set maxSize both when adding a shard and once a shard is running.
To set maxSize when adding a shard, set the addShard command's maxSize parameter to the maximum size in
megabytes. For example, the following command run in the mongo shell adds a shard with a maximum size of 125
megabytes:
db.runCommand( { addShard : "example.net:34008", maxSize : 125 } )
To set maxSize on an existing shard, insert or update the maxSize field in the shards (page 721) collection in the
config database (page 716). Set the maxSize in megabytes.
Example
Assume you have the following shard without a maxSize field:
{ "_id" : "shard0000", "host" : "example.net:34001" }
[15] This value includes the mapped size of all data files, including the local and admin databases. Account for this when setting maxSize.
Run the following sequence of commands in the mongo shell to insert a maxSize of 125 megabytes:
use config
db.shards.update( { _id : "shard0000" }, { $set : { maxSize : 125 } } )
To later increase the maxSize setting to 250 megabytes, run the following:
use config
db.shards.update( { _id : "shard0000" }, { $set : { maxSize : 250 } } )
Change Replication Behavior for Chunk Migration (Secondary Throttle)
Changed in version 3.0.0: The balancer configuration document added a configurable writeConcern to control the
semantics of the _secondaryThrottle option.
The _secondaryThrottle parameter of the balancer and the moveChunk command affects the replication behavior during chunk migration (page 666). By default, _secondaryThrottle is true, which means each document move during chunk migration propagates to at least one secondary before the balancer proceeds with its next
operation: this is equivalent to a write concern of { w: 2 }.
You can also configure the writeConcern for the _secondaryThrottle operation to control how long migrations wait for replication to complete. For more information on the replication behavior during various steps of
chunk migration, see Chunk Migration and Replication.
To change the balancer's _secondaryThrottle and writeConcern values, connect to a mongos instance and
directly update the _secondaryThrottle and writeConcern values in the settings (page 720) collection of the config database
(page 716). For example, from a mongo shell connected to a mongos, issue the following command:
use config
db.settings.update(
   { "_id" : "balancer" },
   { $set : { "_secondaryThrottle" : false,
              "writeConcern" : { "w" : "majority" } } },
   { upsert : true }
)
The effects of changing the _secondaryThrottle and writeConcern value may not be immediate. To ensure
an immediate effect, stop and restart the balancer to enable the selected value of _secondaryThrottle. See
Manage Sharded Cluster Balancer (page 693) for details.
Manage Sharded Cluster Balancer
This page describes common administrative procedures related to balancing. For an introduction to balancing, see
Sharded Collection Balancing (page 663). For lower level information on balancing, see Cluster Balancer (page 663).
See also:
Configure Behavior of Balancer Process in Sharded Clusters (page 691)
Check the Balancer State
The following command checks if the balancer is enabled (i.e. that the balancer is allowed to run). The command does
not check if the balancer is active (i.e. if it is actively balancing chunks).
To see if the balancer is enabled in your cluster, issue the following command, which returns a boolean:
sh.getBalancerState()
New in version 3.0.0: You can also see if the balancer is enabled using sh.status(). The currently-enabled
field indicates whether the balancer is enabled, while the currently-running field indicates if the balancer is
currently running.
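Check the Balancer Lock
To see if the balancer process is active in your cluster, do the following:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the Config Database (page 716):
use config
3. Use the following query to return the balancer lock:
db.locks.find( { _id : "balancer" } ).pretty()
When this command returns, you will see output like the following:
{   "_id" : "balancer",
    "process" : "mongos0.example.net:1292810611:1804289383",
    "state" : 2,
    "ts" : ObjectId("4d0f872630c42d1978be8a2e"),
    "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",
    "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886",
    "why" : "doing balance round" }
This output confirms that:
The balancer originates from the mongos running on the system with the hostname mongos0.example.net.
The value in the state field indicates that a mongos has the lock. For version 2.0 and later, the value of an
active lock is 2; for earlier versions the value is 1.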
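Reading the lock document can be sketched as a small helper. The function name is hypothetical, and it assumes, based only on the sample output above, that the process string begins with the mongos hostname followed by colon-separated identifiers, and that an active lock has state 2 (version 2.0 and later).

```javascript
// Hypothetical helper: summarize a config.locks document for the balancer.
// Assumptions: "process" starts with the mongos hostname, followed by
// colon-separated identifiers (as in the sample), and state 2 means the
// lock is held (MongoDB 2.0 and later).
function describeBalancerLock(lockDoc) {
  var host = lockDoc.process.split(":")[0];
  return {
    host: host,
    held: lockDoc.state === 2
  };
}

var summary = describeBalancerLock({
  _id: "balancer",
  process: "mongos0.example.net:1292810611:1804289383",
  state: 2,
  why: "doing balance round"
});
// summary.host is "mongos0.example.net" and summary.held is true
```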
Schedule the Balancing Window
In some situations, particularly when your data set grows slowly and a migration can impact performance, it's useful
to be able to ensure that the balancer is active only at certain times. Use the following procedure to specify a window
during which the balancer will be able to migrate chunks:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the Config Database (page 716):
use config
3. Issue the following operation to ensure the balancer is not in the stopped state:
sh.setBalancerState( true )
The balancer will not activate if in the stopped state or outside the activeWindow timeframe.
4. Use an operation modeled on the following example update() operation to modify the balancer's window:
db.settings.update(
   { _id : "balancer" },
   { $set : { activeWindow : { start : "<start-time>", stop : "<end-time>" } } },
   { upsert : true }
)
Replace <start-time> and <end-time> with time values using two-digit hour and minute values (e.g.
HH:MM) that describe the beginning and end boundaries of the balancing window. These times are evaluated
relative to the time zone of each individual mongos instance in the sharded cluster. If your mongos instances
are physically located in different time zones, use a common time zone (e.g. GMT) to ensure that the balancer
window is interpreted correctly.
For instance, running the following will force the balancer to run between 11PM and 6AM local time only:
db.settings.update(
   { _id : "balancer" },
   { $set : { activeWindow : { start : "23:00", stop : "6:00" } } },
   { upsert : true }
)
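To make the window semantics concrete, here is a sketch of how a start/stop pair such as 23:00 to 6:00 can be evaluated against a local clock, including windows that cross midnight. This is not MongoDB's implementation. The function names are hypothetical, and the boundary handling (start inclusive, stop exclusive) is an assumption.

```javascript
// Sketch, not MongoDB's implementation: check whether a moment falls inside
// an activeWindow such as { start: "23:00", stop: "6:00" }.
// Assumptions: start is inclusive, stop is exclusive, times are read in the
// local time zone of the evaluating process (each mongos uses its own), and
// a stop earlier than start means the window wraps past midnight.
function toMinutes(hhmm) {
  var parts = hhmm.split(":");
  var h = parseInt(parts[0], 10);
  var m = parseInt(parts[1], 10);
  if (parts.length !== 2 || isNaN(h) || isNaN(m) ||
      h < 0 || h > 23 || m < 0 || m > 59) {
    throw new Error("expected HH:MM, got " + hhmm);
  }
  return h * 60 + m;
}

function insideBalancerWindow(activeWindow, date) {
  var start = toMinutes(activeWindow.start);
  var stop = toMinutes(activeWindow.stop);
  var now = date.getHours() * 60 + date.getMinutes(); // local time
  if (start <= stop) {
    return now >= start && now < stop; // window within one day
  }
  return now >= start || now < stop;   // window crosses midnight
}
```

Because getHours() reads the local clock, two processes in different time zones can disagree about the same instant. That is the reason for the advice above to use a common time zone.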
Note: The balancer window must be sufficient to complete the migration of all data inserted during the day.
As data insert rates can change based on activity and usage patterns, it is important to ensure that the balancing window
you select will be sufficient to support the needs of your deployment.
Do not use the sh.startBalancer() method when you have set an activeWindow.
Remove a Balancing Window Schedule
If you have set the balancing window (page 694) and wish to remove the schedule so that the balancer is always
running, issue the following sequence of operations:
use config
db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })
Disable the Balancer
By default, the balancer may run at any time and only moves chunks as needed. To disable the balancer for a short
period of time and prevent all migration, use the following procedure:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following operation to disable the balancer:
sh.stopBalancer()
If a migration is in progress, the system will complete the in-progress migration before stopping.
3. To verify that the balancer will not start, issue the following command, which returns false if the balancer is
disabled:
sh.getBalancerState()
Optionally, to verify no migrations are in progress after disabling, issue the following operation in the mongo
shell:
use config
while( sh.isBalancerRunning() ) {
print("waiting...");
sleep(1000);
}
Note:
To disable the balancer from a driver that does not have the sh.stopBalancer() or
sh.setBalancerState() helpers, issue the following command from the config database:
db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true )
Enable the Balancer
Use this procedure if you have disabled the balancer and are ready to re-enable it:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue one of the following operations to enable the balancer:
From the mongo shell, issue:
sh.setBalancerState(true)
From a driver that does not have the sh.startBalancer() helper, issue the following from the config
database:
db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true )
If MongoDB migrates a chunk during a backup (page 182), you can end up with an inconsistent snapshot of your sharded
cluster. Never run a backup while the balancer is active. To ensure that the balancer is inactive during your backup
operation, either:
Set the balancing window (page 694) so that the balancer is inactive during the backup. Ensure that the backup
can complete while you have the balancer disabled.
Manually disable the balancer (page 695) for the duration of the backup procedure.
If you turn the balancer off while it is in the middle of a balancing round, the shut down is not instantaneous. The
balancer completes the chunk move in-progress and then ceases all further balancing rounds.
Before starting a backup operation, confirm that the balancer is not active. The following expression returns true
only when the balancer is disabled and no balancing round is running:
!sh.getBalancerState() && !sh.isBalancerRunning()
When the backup procedure is complete you can reactivate the balancer process.
Disable Balancing on a Collection
You can disable balancing for a specific collection with the sh.disableBalancing() method. You may want
to disable the balancer for a specific collection to support maintenance operations or atypical workloads, for example,
during data ingestions or data exports.
When you disable balancing on a collection, MongoDB will not interrupt in-progress migrations.
To disable balancing on a collection, connect to a mongos with the mongo shell and call the
sh.disableBalancing() method.
For example:
sh.disableBalancing("students.grades")
The sh.disableBalancing() method accepts as its parameter the full namespace of the collection.
Enable Balancing on a Collection
You can enable balancing for a specific collection with the sh.enableBalancing() method.
When you enable balancing for a collection, MongoDB will not immediately begin balancing data. However, if the
data in your sharded collection is not balanced, MongoDB will be able to begin distributing the data more evenly.
To enable balancing on a collection, connect to a mongos with the mongo shell and call the
sh.enableBalancing() method.
For example:
sh.enableBalancing("students.grades")
The sh.enableBalancing() method accepts as its parameter the full namespace of the collection.
Confirm Balancing is Enabled or Disabled
To confirm whether balancing for a collection is enabled or disabled, query the collections collection in the
config database for the collection namespace and check the noBalance field. For example:
db.getSiblingDB("config").collections.findOne({_id : "students.grades"}).noBalance;
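The findOne(...).noBalance expression above can print true, false, or nothing at all. The following hypothetical helper interprets the collection document. It assumes, as a labeled assumption, that balancing is enabled unless noBalance is explicitly true; the field is typically absent for collections whose balancing was never disabled.

```javascript
// Hypothetical helper: interpret a config.collections document.
// Assumption: balancing is enabled unless noBalance is explicitly true;
// an absent field (undefined) therefore counts as enabled.
function isCollectionBalancingEnabled(collectionDoc) {
  return collectionDoc.noBalance !== true;
}
```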
To successfully migrate data from a shard, the balancer process must be enabled. Check the balancer state using
the sh.getBalancerState() helper in the mongo shell. For more information, see the section on balancer
operations (page 695).
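To determine the name of the shard, connect to a mongos instance with the mongo shell and either:
Use the listShards command, as in the following:
db.adminCommand( { listShards: 1 } )
Run either the sh.status() or the db.printShardingStatus() method.
The shards._id field lists the name of each shard.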
From the admin database, run the removeShard command. This begins draining chunks from the shard you are
removing to other shards in the cluster. For example, for a shard named mongodb0, run:
use admin
db.runCommand( { removeShard: "mongodb0" } )
Depending on your network capacity and the amount of data, this operation can take from a few minutes to several
days to complete.
Check the Status of the Migration
To check the progress of the migration at any stage in the process, run removeShard from the admin database
again. For example, for a shard named mongodb0, run:
use admin
db.runCommand( { removeShard: "mongodb0" } )
In the output, the remaining document displays the remaining number of chunks that MongoDB must migrate to
other shards and the number of MongoDB databases that have primary status on this shard.
Continue checking the status of the removeShard command until the number of chunks remaining is 0. Always run the
command on the admin database. If you are on a database other than admin, you can use sh._adminCommand
to run the command on admin.
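The removal workflow can be summarized as a small decision function over a removeShard response. This is a sketch with a hypothetical name, not an official client API. It reads only the fields this section describes: state, remaining.chunks, and remaining.dbs. It also assumes that the first response, which reports that draining has started, carries no remaining document.

```javascript
// Hypothetical helper: decide the next step of shard removal from a
// removeShard response, using the state and remaining fields described above.
function nextRemovalStep(res) {
  if (res.state === "completed") {
    return "done";          // safe to stop the shard's processes
  }
  if (!res.remaining) {
    return "wait";          // draining just started; check again later
  }
  if (Number(res.remaining.chunks) > 0) {
    return "wait";          // chunks are still draining
  }
  if (Number(res.remaining.dbs) > 0) {
    return "movePrimary";   // shard is still primary for some databases
  }
  return "finalize";        // run removeShard once more to finish
}
```

Number() is used so the comparison also works if the counts arrive as NumberLong values in the mongo shell.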
If the shard is the primary shard for one or more databases in the cluster, then the shard will have unsharded data. If
the shard is not the primary shard for any databases, skip to the next task, Finalize the Migration (page 699).
In a cluster, a database with unsharded collections stores those collections only on a single shard. That shard becomes
the primary shard for that database. (Different databases in a cluster can have different primary shards.)
Warning: Do not perform this procedure until you have finished draining the shard.
1. To determine if the shard you are removing is the primary shard for any of the cluster's databases, issue one of
the following methods:
sh.status()
db.printShardingStatus()
In the resulting document, the databases field lists each database and its primary shard. For example, the
following database field shows that the products database uses mongodb0 as the primary shard:
{
"_id" : "products",
"partitioned" : true,
"primary" : "mongodb0" }
2. To move a database to another shard, use the movePrimary command. For example, to migrate all remaining
unsharded data from mongodb0 to mongodb1, issue the following command:
db.runCommand( { movePrimary: "products", to: "mongodb1" })
This command does not return until MongoDB completes moving all data, which may take a long time. The
response from this command will resemble the following:
{ "primary" : "mongodb1", "ok" : 1 }
Finalize the Migration
To clean up all metadata information and finalize the removal, run removeShard again. For example, for a shard
named mongodb0, run:
use admin
db.runCommand( { removeShard: "mongodb0" } )
Once the value of the state field is completed, you may safely stop the processes comprising the mongodb0
shard.
See also:
Backup and Restore Sharded Clusters (page 249)
Use splitAt() to split a chunk in two, using the queried document as the lower bound in the new chunk:
Example
The following command splits the chunk that contains the value of 63109 for the zipcode field in the people
collection of the records database.
sh.splitAt( "records.people", { "zipcode": "63109" } )
Note: splitAt() does not necessarily split the chunk into two equally sized chunks. The split occurs at the location
of the document matching the query, regardless of where that document is in the chunk.
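The effect of splitAt() on chunk boundaries can be sketched as splitting a half-open range [min, max) at a point that becomes the lower bound of the new chunk. The chunk bounds used in the test are hypothetical. Comparing keys with plain string < is an assumption that only holds for simple same-length keys like these zip codes, not for BSON ordering in general.

```javascript
// Illustrative sketch (hypothetical helper): split a chunk range [min, max)
// at splitPoint, which becomes the lower bound of the new upper chunk.
// Keys are compared with plain < on strings, valid only for simple keys.
function splitRangeAt(range, splitPoint) {
  if (!(range.min < splitPoint && splitPoint < range.max)) {
    throw new Error("split point must lie strictly inside the chunk range");
  }
  return [
    { min: range.min, max: splitPoint },
    { min: splitPoint, max: range.max }
  ];
}
```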
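The following command moves the chunk that includes the shard key value smith to the shard named
mongodb-shard3.example.net:
db.adminCommand( { moveChunk : "myapp.users",
                   find : { username : "smith" },
                   to : "mongodb-shard3.example.net" } )
The command will block until the migration is complete.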
Tip
To return a list of shards, use the listShards command.
Example
Evenly migrate chunks
To evenly migrate chunks for the myapp.users collection, place each prefix chunk on a different shard from the previous
one, rotating through the shards, and run the following commands in the mongo shell:
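As a hedged sketch of the round-robin placement, the following computes which two-letter prefix chunk goes to which shard without touching a cluster. Several values are illustrative assumptions, not taken from a real deployment: the five shard hostnames, the prefix step of 6, and an email shard key. The actual moveChunk call appears only as a comment.

```javascript
// Hypothetical sketch: plan a round-robin placement for chunks whose lower
// bounds are two-letter prefixes ("aa", "ag", "am", ...). Shard host names,
// the step, and the "email" shard key are illustrative assumptions.
function planPrefixMigrations(shards, step) {
  var plan = [];
  var first = "a".charCodeAt(0);
  var n = 0; // global counter so consecutive chunks go to different shards
  for (var x = first; x < first + 26; x++) {
    for (var y = first; y < first + 26; y += step) {
      var prefix = String.fromCharCode(x) + String.fromCharCode(y);
      plan.push({ prefix: prefix, to: shards[n % shards.length] });
      n++;
    }
  }
  return plan;
}

var shards = ["sh0.example.net", "sh1.example.net", "sh2.example.net",
              "sh3.example.net", "sh4.example.net"];
var plan = planPrefixMigrations(shards, 6);

// In the mongo shell, each planned move could then be issued as:
// plan.forEach(function (p) {
//   db.adminCommand({ moveChunk: "myapp.users", find: { email: p.prefix }, to: p.to });
// });
```

Separating the plan from the moves makes it easy to review the placement before issuing any blocking moveChunk commands.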
See Create Chunks in a Sharded Cluster (page 700) for an introduction to pre-splitting.
New in version 2.2: The moveChunk command has the _secondaryThrottle parameter. When set to true,
MongoDB ensures that changes to shards as part of chunk migrations replicate to secondaries throughout the migration operation. For more information, see Change Replication Behavior for Chunk Migration (Secondary Throttle)
(page 693).
Changed in version 2.4: In 2.4, _secondaryThrottle is true by default.
Warning: The moveChunk command may produce the following error message:
The collection's metadata lock is already taken.
This occurs when clients have too many open cursors that access the migrating chunk. You may either wait until
the cursors complete their operations or close the cursors manually.
The mergeChunks command allows you to collapse empty chunks into neighboring chunks on the same shard. A
chunk is empty if it has no documents associated with its shard key range.
Important: Empty chunks can make the balancer assess the cluster as properly balanced when it is not.
Empty chunks can occur under various circumstances, including:
If a pre-split (page 700) creates too many chunks, the distribution of data to chunks may be uneven.
If you delete many documents from a sharded collection, some chunks may no longer contain data.
This tutorial explains how to identify chunks available to merge, and how to merge those chunks with neighboring
chunks.
Procedure
Note: Examples in this procedure use a users collection in the test database, using the username field as a
shard key.
Identify Chunk Ranges In the mongo shell, identify the chunk ranges with the following operation:
sh.status()
The chunk ranges appear after the chunk counts for each sharded collection, as in the following excerpts:
Chunk counts:
chunks:
        shard0000    7
        shard0001    7
Chunk range:
{ "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
Verify a Chunk is Empty The mergeChunks command requires at least one empty input chunk. In the mongo
shell, check the amount of data in a chunk using an operation that resembles:
db.runCommand({
"dataSize": "test.users",
"keyPattern": { username: 1 },
"min": { "username": "user36583" },
"max": { "username": "user43229" }
})
If the input chunk to dataSize is empty, dataSize produces output similar to:
{ "size" : 0, "numObjects" : 0, "millis" : 0, "ok" : 1 }
Merge Chunks Merge two contiguous chunks on the same shard, where at least one of them contains no data, with an
operation that resembles the following:
db.runCommand( { mergeChunks: "test.users",
bounds: [ { "username": "user68982" },
{ "username": "user95197" } ]
} )
On any failure condition, mergeChunks returns a document where the value of the ok field is 0.
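The preconditions for mergeChunks can be checked offline. The following sketch takes a list of chunk descriptions and returns the pairs that qualify: contiguous ranges on the same shard, with at least one empty chunk. The input format is an assumption: chunks sorted by lower bound, each bound simplified to its username string, and a dataSize-style size in bytes. The chunk values in the test are hypothetical.

```javascript
// Sketch under stated assumptions: chunks are sorted by lower bound, bounds
// are simplified to single username strings, and size comes from dataSize.
// Returns the bounds that could be passed to mergeChunks for each
// qualifying adjacent pair.
function findMergeCandidates(chunks) {
  var pairs = [];
  for (var i = 0; i + 1 < chunks.length; i++) {
    var a = chunks[i];
    var b = chunks[i + 1];
    var contiguous = a.max === b.min;
    var sameShard = a.shard === b.shard;
    var hasEmpty = a.size === 0 || b.size === 0;
    if (contiguous && sameShard && hasEmpty) {
      pairs.push({ min: a.min, max: b.max, shard: a.shard });
    }
  }
  return pairs;
}
```

Each returned min and max corresponds to the bounds array of a mergeChunks call on that shard's collection.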
View Merged Chunks Ranges After merging all empty chunks, confirm the new chunk, as follows:
sh.status()
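Modify Chunk Size in a Sharded Cluster
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the Config Database (page 716):
use config
3. Issue the following save() operation to store the global chunk size configuration value: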
db.settings.save( { _id:"chunksize", value: <sizeInMB> } )
Note: The chunkSize and --chunkSize options, passed at runtime to the mongos, do not affect the chunk size
after you have initialized the cluster.
To avoid confusion, always set the chunk size using the above procedure instead of the runtime options.
Modifying the chunk size has several limitations:
Automatic splitting only occurs on insert or update.
If you lower the chunk size, it may take time for all chunks to split to the new size.
Splits cannot be undone.
If you increase the chunk size, existing chunks grow only through insertion or updates until they reach the new
size.
The allowed range of the chunk size is between 1 and 1024 megabytes, inclusive.
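To guard against typos before writing to the config database, you can build the settings document with a range check. This is a minimal sketch. The helper name is hypothetical; it enforces only the documented 1 to 1024 megabyte range and returns the document that step 3 saves.

```javascript
// Hypothetical helper: build the config.settings chunk size document,
// rejecting values outside the documented range of 1 to 1024 megabytes.
function chunkSizeSetting(sizeInMB) {
  if (typeof sizeInMB !== "number" || isNaN(sizeInMB)) {
    throw new Error("chunk size must be a number of megabytes");
  }
  if (sizeInMB < 1 || sizeInMB > 1024) {
    throw new Error("chunk size must be between 1 and 1024 MB, inclusive");
  }
  return { _id: "chunksize", value: sizeInMB };
}

// In the mongo shell, after "use config":
// db.settings.save(chunkSizeSetting(32))
```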
Clear jumbo Flag
If MongoDB cannot split a chunk that exceeds the specified chunk size (page 666) or contains a number of documents
that exceeds the max, MongoDB labels the chunk as jumbo (page 666).
If the chunk size no longer hits the limits, MongoDB clears the jumbo flag for the chunk when the mongos reloads
or rewrites the chunk metadata.
In cases where you need to clear the flag manually, the following procedures outline the steps to manually clear the
jumbo flag.
Procedures
Divisible Chunks The preferred way to clear the jumbo flag from a chunk is to attempt to split the chunk. If the
chunk is divisible, MongoDB removes the flag upon successful split of the chunk.
Step 1: Connect to mongos. Connect a mongo shell to a mongos.
Step 2: Find the jumbo Chunk. Run sh.status(true) to find the chunk labeled jumbo.
sh.status(true)
For example, the following output from sh.status(true) shows that the chunk with shard key range { "x" : 2 } -->> { "x" : 4 } is jumbo.
shards:
...
databases:
...
test.foo
shard key: { "x" : 1 }
chunks:
shard-b 2
shard-a 2
{ "x" : { "$minKey" : 1 } } -->> { "x" : 1 } on : shard-b Timestamp(2, 0)
{ "x" : 1 } -->> { "x" : 2 } on : shard-a Timestamp(3, 1)
{ "x" : 2 } -->> { "x" : 4 } on : shard-a Timestamp(2, 2) jumbo
{ "x" : 4 } -->> { "x" : { "$maxKey" : 1 } } on : shard-b Timestamp(3, 0)
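If you have saved sh.status(true) output as text, for example to a log file, a small parser can list the jumbo chunks. This is a sketch. The regular expression is tailored to the sample format above and is an assumption. The config.chunks collection, where jumbo chunks carry jumbo: true, remains the authoritative source.

```javascript
// Sketch: list jumbo chunks from saved sh.status(true) text. The pattern
// matches lines shaped like the sample above and is an assumption about
// that format; config.chunks is the authoritative source.
function findJumboChunks(statusText) {
  var jumbo = [];
  var re = /^\s*(\{.*\}) -->> (\{.*\}) on : (\S+) Timestamp\(\d+, \d+\) jumbo\s*$/;
  statusText.split("\n").forEach(function (line) {
    var m = re.exec(line);
    if (m) {
      jumbo.push({ min: m[1], max: m[2], shard: m[3] });
    }
  });
  return jumbo;
}
```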
Step 3: Split the jumbo Chunk. Use either sh.splitAt() or sh.splitFind() to split the jumbo chunk.
sh.splitAt( "test.foo", { x: 3 })
MongoDB removes the jumbo flag upon successful split of the chunk.
Indivisible Chunks In some instances, MongoDB cannot split the no-longer-jumbo chunk, such as a chunk whose range
covers a single shard key value, so the preferred method to clear the flag is not applicable. In such cases, you can
clear the flag using the following steps.
Important: Only use this method if the preferred method (page 706) is not applicable.
Before modifying the config database (page 716), always back up the config database.
If you clear the jumbo flag for a chunk that still exceeds the chunk size and/or the document number limit, MongoDB
will re-label the chunk as jumbo when MongoDB tries to move the chunk.
Step 1: Stop the balancer. Disable the cluster balancer process temporarily, following the steps outlined in Disable
the Balancer (page 695).
Step 2: Create a backup of config database. Use mongodump against a config server to create a backup of the
config database. For example:
mongodump --db config --port <config server port> --out <output file>
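Step 3: Connect to mongos. Connect a mongo shell to a mongos.
Step 4: Find the jumbo Chunk. Run sh.status(true) to find the chunk labeled jumbo.
sh.status(true)
For example, the following output from sh.status(true) shows that the chunk with shard key range { "x" : 2 } -->> { "x" : 3 } is jumbo.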
...
databases:
...
test.foo
shard key: { "x" : 1 }
chunks:
shard-b 2
shard-a 2
{ "x" : { "$minKey" : 1 } } -->> { "x" : 1 } on : shard-b Timestamp(2, 0)
{ "x" : 1 } -->> { "x" : 2 } on : shard-a Timestamp(3, 1)
{ "x" : 2 } -->> { "x" : 3 } on : shard-a Timestamp(2, 2) jumbo
{ "x" : 3 } -->> { "x" : { "$maxKey" : 1 } } on : shard-b Timestamp(3, 0)
Step 5: Update chunks collection. In the chunks collection of the config database, unset the jumbo flag for
the chunk. For example,
db.getSiblingDB("config").chunks.update(
{ ns: "test.foo", min: { x: 2 }, jumbo: true },
{ $unset: { jumbo: "" } }
)
Step 6: Restart the balancer. Restart the balancer, following the steps in Enable the Balancer (page 696).
Step 7: Optional. Clear current cluster meta information. To ensure that mongos instances update their cluster
information cache, run flushRouterConfig in the admin database.
db.adminCommand({ flushRouterConfig: 1 } )
Shard key range tags are distinct from replica set member tags (page 570).
Hash-based sharding only supports tag-aware sharding on an entire collection.
Shard ranges are always inclusive of the lower value and exclusive of the upper boundary.
Behavior and Operations
The balancer migrates chunks of documents in a sharded collection to the shards associated with a tag that has a shard
key range with an upper bound greater than the chunk's lower bound.
During balancing rounds, if the balancer detects that any chunks violate configured tags, the balancer migrates those
chunks to shards associated with those tags.
After configuring a tag with a shard key range and associating it with a shard or shards, the cluster may take some time
to balance the data among the shards. This depends on the division of chunks and the current distribution of data in
the cluster.
Once configured, the balancer respects tag ranges during future balancing rounds (page 663).
See also:
Manage Shard Tags (page 709)
Associate tags with a particular shard using the sh.addShardTag() method when connected to a mongos instance. A single shard may have multiple tags, and multiple shards may also have the same tag.
Example
The following example adds the tag NYC to two shards, and the tags SFO and NRT to a third shard:
sh.addShardTag("shard0000", "NYC")
sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")
sh.addShardTag("shard0002", "NRT")
You may remove tags from a particular shard using the sh.removeShardTag() method when connected to a
mongos instance, as in the following example, which removes the NRT tag from a shard:
sh.removeShardTag("shard0002", "NRT")
To assign a tag to a range of shard keys use the sh.addTagRange() method when connected to a mongos instance.
Any given shard key range may only have one assigned tag. You cannot overlap defined ranges, or tag the same range
more than once.
Example
Given a collection named users in the records database, sharded by the zipcode field, the following operations
assign:
the NYC tag to two ranges of zip codes in Manhattan and Brooklyn
the SFO tag to one range of zip codes in San Francisco
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")
Note: Shard ranges are always inclusive of the lower value and exclusive of the upper boundary.
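The inclusive-lower, exclusive-upper rule determines which tag covers a given shard key value. The following sketch applies that rule to the three ranges above. The helper is hypothetical. Plain string comparison is an assumption that matches BSON ordering for these same-length zip code strings, but not for shard keys in general.

```javascript
// Illustrative sketch: find the tag whose range covers a shard key value,
// using the lower-inclusive, upper-exclusive rule. Plain string comparison
// is assumed; it fits these same-length zip code strings only.
var tagRanges = [
  { min: "10001", max: "10281", tag: "NYC" },
  { min: "11201", max: "11240", tag: "NYC" },
  { min: "94102", max: "94135", tag: "SFO" }
];

function tagForKey(ranges, key) {
  for (var i = 0; i < ranges.length; i++) {
    var r = ranges[i];
    if (key >= r.min && key < r.max) {
      return r.tag;
    }
  }
  return null; // no tag range covers this key
}
```

For example, "10001" falls in an NYC range, while "10281" does not, because the upper bound is exclusive.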
The mongo shell does not provide a helper for removing a tag range. You may delete a tag assignment from a shard key
range by removing the corresponding document from the tags (page 721) collection of the config database.
Each document in the tags (page 721) collection holds, in its _id field, the namespace of the sharded collection and a
minimum shard key value.
Example
The following example removes the NYC tag assignment for the range of zip codes within Manhattan:
use config
db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" }}, tag: "NYC" })
The output from sh.status() lists tags associated with a shard, if any, for each shard. A shard's tags exist in the
shards document in the shards (page 721) collection of the config database. To return all shards with a specific
tag, use a sequence of operations that resemble the following, which will return only those shards tagged with NYC:
use config
db.shards.find({ tags: "NYC" })
You can find tag ranges for all namespaces in the tags (page 721) collection of the config database. The output of
sh.status() displays all tag ranges. To return all shard key ranges tagged with NYC, use the following sequence
of operations:
use config
db.tags.find({ tag: "NYC" })
The unique constraint on indexes ensures that only one document can have a value for a field in a collection. For
sharded collections these unique indexes cannot enforce uniqueness because insert and indexing operations are local
to each shard.
MongoDB does not support creating new unique indexes in sharded collections and will not allow you to shard collections with unique indexes on fields other than the _id field.
If you need to ensure that a field is always unique in a sharded collection, there are three options:
1. Enforce uniqueness of the shard key (page 654).
MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key.
You cannot specify a unique constraint on a hashed index (page 487).
2. Use a secondary collection to enforce uniqueness.
Create a minimal collection that only contains the unique field and a reference to a document in the main
collection. If you always insert into a secondary collection before inserting to the main collection, MongoDB
will produce an error if you attempt to use a duplicate key.
If you have a small data set, you may not need to shard this collection and you can create multiple unique
indexes. Otherwise you can shard on a single unique key.
3. Use guaranteed unique identifiers.
Universally unique identifiers (i.e. UUID) like the ObjectId are guaranteed to be unique.
Procedures
Remember that the _id field index is always unique. By default, MongoDB inserts an ObjectId into the _id field.
However, you can manually insert your own value into the _id field and use this as the shard key. To use the _id
field as the shard key, use the following operation:
db.runCommand( { shardCollection : "test.users", key : { _id : 1 } } )
Limitations
You can only enforce uniqueness on one single field in the collection using this method.
If you use a compound shard key, you can only enforce uniqueness on the combination of component keys in
the shard key.
In most cases, the best shard keys are compound keys that include elements that permit write scaling (page 655) and
query isolation (page 655), as well as high cardinality (page 675). These ideal shard keys are not often the same keys
that require uniqueness and enforcing unique values in these collections requires a different approach.
Unique Constraints on Arbitrary Fields If you cannot use a unique field as the shard key or if you need to enforce
uniqueness over multiple fields, you must create another collection to act as a proxy collection. This collection must
contain both a reference to the original document (i.e. its ObjectId) and the unique key.
If you must shard this proxy collection, then shard on the unique key using the above procedure (page 711); otherwise, you can simply create multiple unique indexes on the collection.
Process Consider the following for the proxy collection:
{
   "_id" : ObjectId("..."),
   "email" : "..."
}
The _id field holds the ObjectId of the document it reflects, and the email field is the field on which you want to
ensure uniqueness.
To shard this collection, use the following operation using the email field as the shard key:
db.runCommand( { shardCollection : "records.proxy" ,
key : { email : 1 } ,
unique : true } );
If you do not need to shard the proxy collection, use the following command to create a unique index on the email
field:
db.proxy.createIndex( { "email" : 1 }, { unique : true } )
You may create multiple unique indexes on this collection if you do not plan to shard the proxy collection.
To insert documents, use the following procedure in the JavaScript shell:
db = db.getSiblingDB('records');
var primary_id = ObjectId();
var res = db.proxy.insert({
   "_id" : primary_id,
   "email" : "user@example.net"    // placeholder address
})
// Continue only if the insert into the proxy collection succeeded;
// a duplicate email fails with a duplicate key error.
if ( !res.hasWriteError() ) {
   db.information.insert({
      "_id" : primary_id,
      "email" : "user@example.net"    // same placeholder address
      // additional information...
   })
}
You must insert a document into the proxy collection first. If this operation succeeds, the email field is unique, and
you may continue by inserting the actual document into the information collection.
See the full documentation of createIndex() and shardCollection.
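The ordering rule (proxy first, main collection second) can be modeled in memory to show why it guarantees uniqueness. This is a sketch only. An object stands in for the unique index on the proxy collection's email field, and arrays stand in for the proxy and information collections. All names are hypothetical.

```javascript
// In-memory sketch of the proxy-collection ordering. The "claimed" object
// stands in for the unique index on the proxy's email field; the arrays
// stand in for the proxy and information collections. Names are hypothetical.
function makeUserStore() {
  var claimed = {};     // email -> true, like a unique index
  var proxy = [];
  var information = [];

  function insertUser(id, email, extra) {
    // Step 1: claim the unique value in the proxy collection first.
    if (Object.prototype.hasOwnProperty.call(claimed, email)) {
      return { ok: false, error: "duplicate key: " + email };
    }
    claimed[email] = true;
    proxy.push({ _id: id, email: email });

    // Step 2: only after the proxy insert succeeds, store the full document.
    var doc = { _id: id, email: email };
    for (var key in extra) {
      if (Object.prototype.hasOwnProperty.call(extra, key)) {
        doc[key] = extra[key];
      }
    }
    information.push(doc);
    return { ok: true };
  }

  return { insertUser: insertUser, proxy: proxy, information: information };
}
```

In a real deployment the unique index raises the duplicate key error. The application must still handle the case where the second insert fails after the first succeeded, as the considerations that follow point out.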
Considerations
Your application must catch errors when inserting documents into the proxy collection and must enforce
consistency between the two collections.
If the proxy collection requires sharding, you must shard on the single field on which you want to enforce
uniqueness.
To enforce uniqueness on more than one field using sharded proxy collections, you must have one proxy collection for every field for which to enforce uniqueness. If you create multiple unique indexes on a single proxy
collection, you will not be able to shard proxy collections.
Use Guaranteed Unique Identifier The best way to ensure a field has unique values is to generate universally
unique identifiers (UUIDs), such as MongoDB's ObjectId values.
This approach is particularly useful for the _id field, which must be unique: for collections where you are not
sharding by the _id field, the application is responsible for ensuring that the _id field is unique.
Shard GridFS Data Store
When sharding a GridFS store, consider the following:
files Collection
Most deployments will not need to shard the files collection. The files collection is typically small, and only
contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded situation.
If you must shard the files collection, use the _id field possibly in combination with an application field.
Leaving files unsharded means that all the file metadata documents live on one shard. For production GridFS stores
you must store the files collection on a replica set.
chunks Collection
To shard the chunks collection by { files_id : 1 , n : 1 }, issue commands similar to the following:
db.fs.chunks.createIndex( { files_id : 1 , n : 1 } )
db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 , n : 1 } } )
You may also want to shard using just the files_id field, as in the following operation:
db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 } } )
Important: { files_id : 1 , n : 1 } and { files_id : 1 } are the only supported shard keys
for the chunks collection of a GridFS store.
And:
mongos specified a different config database string
To solve the issue, restart the mongos with the correct string.
Cursor Fails Because of Stale Config Data
A query returns the following warning when one or more of the mongos instances has not yet updated its cache of
the cluster's metadata from the config database:
could not initialize cursor across all shards because : stale config detected
This warning should not propagate back to your application. The warning will repeat until all the mongos instances
refresh their caches. To force an instance to refresh its cache, run the flushRouterConfig command.
Avoid Downtime when Moving Config Servers
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers
without downtime.
flushRouterConfig: Forces an update to the cluster metadata cached by a mongos.
addShard: Adds a shard to a sharded cluster.
cleanupOrphaned: Removes orphaned data with shard key values outside of the ranges of the chunks owned by a shard.
checkShardingIndex: Internal command that validates index on shard key.
enableSharding: Enables sharding on a specific database.
listShards: Returns a list of configured shards.
removeShard: Starts the process of removing a shard from a sharded cluster.
getShardMap: Internal command that reports on the state of a sharded cluster.
getShardVersion: Internal command that returns the config server version.
mergeChunks: Provides the ability to combine chunks on a single shard.
setShardVersion: Internal command to set the config server version.
shardCollection: Enables the sharding functionality for a collection, allowing the collection to be sharded.
shardingState: Reports whether the mongod is a member of a sharded cluster.
unsetSharding: Internal command that affects connections between instances in a MongoDB deployment.
split: Creates a new chunk.
splitChunk: Internal command to split a chunk. Instead, use the methods sh.splitFind() and sh.splitAt().
splitVector: Internal command that determines split points.
medianKey: Deprecated internal command. See splitVector.
moveChunk: Internal command that migrates chunks between shards.
movePrimary: Reassigns the primary shard when removing a shard from a sharded cluster.
isdbgrid: Verifies that a process is a mongos.
You can return a list of the collections with the following helper:
show collections
Collections
config
config.changelog
Each document in the changelog (page 717) collection contains the following fields:
config.changelog._id
The value of changelog._id is: <hostname>-<timestamp>-<increment>.
config.changelog.server
The hostname of the server that holds this data.
config.changelog.clientAddr
A string that holds the address of the client, a mongos instance that initiates this change.
config.changelog.time
An ISODate timestamp that reflects when the change occurred.
config.changelog.what
Reflects the type of change recorded. Possible values are:
dropCollection
dropCollection.start
dropDatabase
dropDatabase.start
moveChunk.start
moveChunk.commit
split
multi-split
config.changelog.ns
Namespace where the change occurred.
config.changelog.details
A document that contains additional details regarding the change.
The content of the details (page 718) document depends on the type of change.
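Because every changelog document records its change type in the what field, a quick tally by type is a simple way to see what the cluster has been doing (splits, migrations, drops). In the mongo shell the entries would come from the config.changelog collection, for example via db.getSiblingDB("config").changelog.find(); the sketch below assumes a hypothetical in-memory array of such documents instead, and the helper name countByWhat is illustrative only.

```javascript
// Hypothetical sample documents shaped like config.changelog entries;
// only the fields used below are included.
const changelog = [
  { what: "split", ns: "test.users" },
  { what: "moveChunk.start", ns: "test.users" },
  { what: "moveChunk.commit", ns: "test.users" },
  { what: "split", ns: "test.orders" },
];

// Count how many changes of each type (the "what" field) were recorded.
function countByWhat(entries) {
  const counts = {};
  for (const entry of entries) {
    counts[entry.what] = (counts[entry.what] || 0) + 1;
  }
  return counts;
}

console.log(countByWhat(changelog));
```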
config.chunks
"shard" : "shard0004"
}
These documents store the range of values for the shard key that describe the chunk in the min and max fields.
Additionally, the shard field identifies the shard in the cluster that owns the chunk.
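To make the min/max semantics concrete: each chunk covers a half-open range of shard key values, with min inclusive and max exclusive, so every value belongs to exactly one chunk. The sketch below assumes hypothetical chunk documents for a numeric shard key x and uses -Infinity and Infinity as stand-ins for MinKey and MaxKey; shardForValue is an illustrative name, not a MongoDB function.

```javascript
// Hypothetical documents shaped like config.chunks entries for a numeric
// shard key "x". -Infinity and Infinity stand in for MinKey and MaxKey.
const chunks = [
  { min: { x: -Infinity }, max: { x: 0 }, shard: "shard0000" },
  { min: { x: 0 }, max: { x: 100 }, shard: "shard0004" },
  { min: { x: 100 }, max: { x: Infinity }, shard: "shard0001" },
];

// A chunk covers [min, max): the lower bound is inclusive and the
// upper bound is exclusive, so a boundary value belongs to the next chunk.
function shardForValue(chunkDocs, field, value) {
  const owner = chunkDocs.find(
    (c) => value >= c.min[field] && value < c.max[field]
  );
  return owner ? owner.shard : null;
}

console.log(shardForValue(chunks, "x", 42));  // shard0004
console.log(shardForValue(chunks, "x", 100)); // shard0001
console.log(shardForValue(chunks, "x", -5));  // shard0000
```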
config.collections
config.databases
config.lockpings
config.locks
If a mongos holds the balancer lock, the state field has a value of 2, which means that the balancer is active.
The when field indicates when the balancer began the current operation.
Changed in version 2.0: The value of the state field was 1 before MongoDB 2.0.
config.mongos
config.settings
config.shards
If the shard has tags (page 708) assigned, this document has a tags field that holds an array of the tags, as in
the following example:
{ "_id" : "shard0001", "host" : "localhost:30001", "tags": [ "NYC" ] }
config.tags
config.version
CHAPTER 11
For systems with multiple concurrent readers and writers, MongoDB will allow clients to read the results of a
write operation before the write operation returns.
If the mongod terminates before the journal commits, even if a write returns successfully, queries may have
read data that will not exist after the mongod restarts.
Other database systems refer to these isolation semantics as read uncommitted. For all inserts and updates, MongoDB modifies each document in isolation: clients never see documents in intermediate states. For multi-document
operations, MongoDB does not provide any multi-document transactions or isolation.
When a standalone mongod returns a successful journaled write concern, the data is fully committed to disk and will
be available after mongod restarts.
For replica sets, write operations are durable only after a write replicates and commits to the journal on a majority of
the voting members of the set. MongoDB regularly commits data to the journal regardless of journaled write concern:
use the commitIntervalMs setting to control how often a mongod commits the journal.
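As a quick arithmetic aid for the replica set case above, a majority means strictly more than half of the voting members. The helper name majorityOf below is hypothetical; it only computes the count and says nothing about which members acknowledge a write.

```javascript
// Smallest number of voting members that is strictly more than half.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

console.log(majorityOf(3)); // 2
console.log(majorityOf(4)); // 3
console.log(majorityOf(5)); // 3
```

Note that adding a fourth voting member to a three-member set raises the majority from 2 to 3, which is one reason odd numbers of voting members are common.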
Collections are containers for documents that share one or more indexes. Databases are groups of collections stored
on disk using a single set of data files. 6
For an example acme.users namespace, acme is the database name and users is the collection name. Period
characters can occur in collection names, so that acme.user.history is a valid namespace, with acme as the
database name, and user.history as the collection name.
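The naming rule above amounts to splitting the namespace at its first period only. A minimal sketch, where splitNamespace is a hypothetical helper name:

```javascript
// Split a namespace at its first period: everything before it is the
// database name; everything after it, including further periods,
// is the collection name.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot === -1) {
    throw new Error(`not a collection namespace: ${ns}`);
  }
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

console.log(splitNamespace("acme.users"));        // db "acme", collection "users"
console.log(splitNamespace("acme.user.history")); // db "acme", collection "user.history"
```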
While data models like this appear to support nested collections, the collection namespace is flat, and there is no
difference from the perspective of MongoDB between acme, acme.users, and acme.records.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the
collection. This strategy allows applications to store a value in the _id field that would have occupied space in
another portion of the document.
You can store any value in the _id field, but because this value serves as a primary key for documents in the
collection, it must uniquely identify them. If the field's value is not unique, then it cannot serve as a primary key
as there would be collisions in the collection.
Use shorter field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the
space used by a document; however, for small documents the field names may represent a proportionally large
amount of space. Consider a collection of documents that resemble the following:
{ last_name : "Smith", best_score: 3.9 }
If you shorten the field named last_name to lname and the field named best_score to score, as follows,
you could save 9 bytes per document.
{ lname : "Smith", score : 3.9 }
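The 9-byte figure follows from how BSON stores field names: each name is a null-terminated UTF-8 string, so renaming a field saves the difference in encoded lengths. last_name to lname saves 4 bytes and best_score to score saves 5. The sketch below checks that arithmetic; nameBytes is an illustrative helper, not a BSON library function.

```javascript
// BSON stores each field name as a null-terminated UTF-8 string, so the
// per-document saving from a rename is the difference in encoded sizes.
const encoder = new TextEncoder();
function nameBytes(name) {
  return encoder.encode(name).length + 1; // +1 for the terminating null byte
}

const renames = { last_name: "lname", best_score: "score" };
let saved = 0;
for (const [oldName, newName] of Object.entries(renames)) {
  saved += nameBytes(oldName) - nameBytes(newName);
}
console.log(saved); // 9
```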
Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents
and where document overhead is not of significant concern. Shorter field names do not reduce the size of
indexes, because indexes have a predefined structure.
In general it is not necessary to use short field names.
Embed documents.
In some cases you may want to embed documents in other documents and save on the per-document overhead.
Here, my_query will have a value such as { name : "Joe" }. If my_query contained special characters, for example a comma (,), a colon (:), or an opening brace ({), the query simply would not match any documents. In particular, users cannot hijack a
query and convert it to a delete.
JavaScript
Note: You can disable all server-side execution of JavaScript, by passing the --noscripting option on the
command line or setting security.javascriptEnabled in a configuration file.
All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server:
$where
mapReduce
group
You must exercise care in these cases to prevent users from submitting malicious JavaScript.
Fortunately, you can express most queries in MongoDB without JavaScript and for queries that require JavaScript, you
can mix JavaScript and non-JavaScript in a single query. Place all the user-supplied fields directly in a BSON field and
pass JavaScript code to the $where field.
If you need to pass user-supplied values in a $where clause, you may escape these values with the CodeWScope
mechanism. When you set user-submitted values as variables in the scope document, you can avoid evaluating them
on the database server.
Dollar Sign Operator Escaping
Field names in MongoDB's query language have semantic meaning. The dollar sign (i.e. $) is a reserved character used
to represent operators (e.g. $inc). Thus, you should ensure that your application's users cannot inject operators
into their inputs.
In some cases, you may wish to build a BSON object with a user-provided key. In these situations, you will need
to substitute other characters for the reserved $ and . characters in keys. Any character is sufficient, but consider using the Unicode full width
equivalents: U+FF04 (i.e. ＄) and U+FF0E (i.e. ．).
Consider the following example:
BSONObj my_object = BSON( a_key << a_name );
The user may have supplied a $ value in the a_key value. At the same time, my_object might be { $where :
"things" }. Consider the following cases:
Insert. Inserting this into the database does no harm. The insert process does not evaluate the object as a query.
Note: MongoDB client drivers, if properly implemented, check for reserved characters in keys on inserts.
Update. The update() operation permits $ operators in the update argument but does not support the
$where operator. Still, some users may be able to inject operators that can manipulate a single document
only. Therefore your application should escape keys, as mentioned above, if reserved characters are possible.
Query. Generally this is not a problem for queries that resemble { x : user_obj }: dollar signs are
not top level and have no effect. Theoretically, it may be possible for the user to build a query themselves,
but checking the user-submitted content for $ characters in key names may help protect against this kind of
injection.
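The check suggested above, looking for $ characters in key names of user-submitted content, can be sketched as a recursive walk over the object before it is embedded in a query. hasDollarKeys is a hypothetical helper; it rejects rather than escapes, which is the simpler choice when operator-like keys are never legitimate input.

```javascript
// Recursively check a user-supplied value for key names containing "$",
// which MongoDB reserves for operators. Arrays are walked by index.
function hasDollarKeys(value) {
  if (value === null || typeof value !== "object") {
    return false;
  }
  for (const [key, child] of Object.entries(value)) {
    if (key.includes("$") || hasDollarKeys(child)) {
      return true;
    }
  }
  return false;
}

console.log(hasDollarKeys({ name: "Joe" }));                 // false
console.log(hasDollarKeys({ name: { $ne: null } }));          // true
console.log(hasDollarKeys({ tags: [{ $where: "1 == 1" }] })); // true
```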
Driver-Specific Issues
See the PHP MongoDB Driver Security Notes8 page in the PHP driver documentation for more information.
8 http://us.php.net/manual/en/mongo.security.php
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to
highest:
1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectId
9. Boolean
10. Date
11. Timestamp
12. Regular Expression
13. MaxKey (internal type)
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion
before comparison.
Changed in version 3.0.0: Date objects sort before Timestamp objects. Previously Date and Timestamp objects sorted
together.
The comparison treats a non-existent field as it would an empty BSON Object. As such, a sort on the a field in
documents { } and { a: null } would treat the documents as equivalent in sort order.
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than
comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose
value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A
comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
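The array rule can be sketched for numeric values: an ascending sort uses the smallest element of an array and a descending sort uses the largest, while non-array values are used as-is. This is a simplified model under stated assumptions (numbers only, empty arrays not modeled); sortValue is a hypothetical helper, not how the server is implemented.

```javascript
// Effective value for a numeric array field in a sort: the smallest
// element for ascending order, the largest for descending order.
// Non-array values are used as-is. Empty arrays are not modeled here.
function sortValue(value, ascending) {
  if (!Array.isArray(value)) return value;
  return ascending ? Math.min(...value) : Math.max(...value);
}

const docs = [{ a: [1, 5] }, { a: 2 }, { a: [3] }];

const ascendingOrder = [...docs].sort(
  (x, y) => sortValue(x.a, true) - sortValue(y.a, true)
);
const descendingOrder = [...docs].sort(
  (x, y) => sortValue(y.a, false) - sortValue(x.a, false)
);

console.log(ascendingOrder.map((d) => JSON.stringify(d.a)).join(" "));  // [1,5] 2 [3]
console.log(descendingOrder.map((d) => JSON.stringify(d.a)).join(" ")); // [1,5] [3] 2
```

The document { a: [1, 5] } sorts first in both directions: its smallest element (1) is below 2 when ascending, and its largest element (5) is above 3 when descending.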
MongoDB sorts BinData in the following order:
1. First, the length or size of the data.
2. Then, by the BSON one-byte subtype.
3. Finally, by the data, performing a byte-by-byte comparison.
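The three BinData rules translate directly into a comparator. In this sketch a BinData value is assumed to be a plain object { subtype, data } with data as a Uint8Array; compareBinData is an illustrative name.

```javascript
// Compare two BinData-like values { subtype, data } in the order described:
// length first, then the one-byte subtype, then byte-by-byte.
function compareBinData(a, b) {
  if (a.data.length !== b.data.length) return a.data.length - b.data.length;
  if (a.subtype !== b.subtype) return a.subtype - b.subtype;
  for (let i = 0; i < a.data.length; i++) {
    if (a.data[i] !== b.data[i]) return a.data[i] - b.data[i];
  }
  return 0;
}

const values = [
  { subtype: 0, data: Uint8Array.of(1, 2, 3) },
  { subtype: 4, data: Uint8Array.of(9) },
  { subtype: 0, data: Uint8Array.of(1, 2, 2) },
  { subtype: 0, data: Uint8Array.of(0xff) },
];
values.sort(compareBinData);
console.log(values.map((v) => `${v.subtype}:${Array.from(v.data).join(",")}`).join(" "));
// 0:255 4:9 0:1,2,2 0:1,2,3
```

Note that the single byte 0xff sorts before the three-byte values: length is compared before content.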
Consider the following mongo example:
db.test.insert( { x : 3 } );
db.test.insert( { x : 2.9 } );
db.test.insert( { x : new Date() } );
db.test.insert( { x : true } );
db.test.find().sort({x:1});
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Nov 17 2009 16:28:03 GMT-0500 (EST)" }
The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation
on BSON types and the $type operator for additional information.
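The ordering in the sort example above can be reproduced with a small comparator that ranks values by the type order listed earlier and only then compares values of the same type. This is a simplified sketch: it covers only the JavaScript types that map onto those BSON types, treats a missing value (undefined) like null, and the names TYPE_RANK, rankOf, and compareMixed are hypothetical.

```javascript
// Ranks taken from the BSON comparison order, for the JavaScript types
// that map onto those BSON types. Other BSON types are not modeled.
const TYPE_RANK = { null: 2, number: 3, string: 4, object: 5, array: 6, boolean: 9, date: 10 };

function rankOf(value) {
  if (value === null || value === undefined) return TYPE_RANK.null; // missing treated like null
  if (Array.isArray(value)) return TYPE_RANK.array;
  if (value instanceof Date) return TYPE_RANK.date;
  return TYPE_RANK[typeof value];
}

function compareMixed(a, b) {
  const diff = rankOf(a) - rankOf(b);
  if (diff !== 0) return diff; // different types: the type order decides
  if (typeof a === "string") return a < b ? -1 : a > b ? 1 : 0;
  if (typeof a === "number" || typeof a === "boolean" || a instanceof Date) {
    return Number(a) - Number(b);
  }
  return 0; // same rank, e.g. two nulls or two objects: treated as equal here
}

const xs = [3, 2.9, new Date(), true];
xs.sort(compareMixed);
console.log(xs.map((v) => (v instanceof Date ? "Date" : String(v))).join(", "));
// 2.9, 3, true, Date
```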
Warning: Data models that associate a field name with different data types within a collection are strongly
discouraged.
A lack of internal consistency complicates application code and can lead to unnecessary complexity for application
developers.
See also:
The Tailable Cursors (page 121) page for an example of a C++ use of MinKey.
11.2.14 When multiplying values of mixed types, what type conversion rules apply?
The $mul operator multiplies the numeric value of a field by a number. For multiplication with values of mixed numeric types
(32-bit integer, 64-bit integer, float), the following type conversion rules apply:
                  32-bit Integer              64-bit Integer    Float
32-bit Integer    32-bit or 64-bit Integer    64-bit Integer    Float
64-bit Integer    64-bit Integer              64-bit Integer    Float
Float             Float                       Float             Float
Note:
If the product of two 32-bit integers exceeds the maximum value for a 32-bit integer, the result is a 64-bit integer.
Integer operations of any type that exceed the maximum value for a 64-bit integer produce an error.
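The table and the note can be combined into one small model of the result type. This is a hypothetical sketch, not the server's implementation: operands are assumed to be objects { type, value } with type "int32", "int64", or "double", and BigInt is used so that integer overflow can be detected exactly.

```javascript
// Hypothetical model of the $mul type conversion rules.
// Operands: { type: "int32" | "int64" | "double", value: number | bigint }.
const INT32_MIN = -(2n ** 31n);
const INT32_MAX = 2n ** 31n - 1n;
const INT64_MIN = -(2n ** 63n);
const INT64_MAX = 2n ** 63n - 1n;

function mulResult(a, b) {
  // Any Float operand makes the result a Float.
  if (a.type === "double" || b.type === "double") {
    return { type: "double", value: Number(a.value) * Number(b.value) };
  }
  const product = BigInt(a.value) * BigInt(b.value);
  // Integer results beyond the 64-bit range are an error.
  if (product < INT64_MIN || product > INT64_MAX) {
    throw new RangeError("product exceeds the 64-bit integer range");
  }
  // Two 32-bit integers stay 32-bit unless the product no longer fits.
  if (a.type === "int32" && b.type === "int32" &&
      product >= INT32_MIN && product <= INT32_MAX) {
    return { type: "int32", value: product };
  }
  return { type: "int64", value: product };
}

console.log(mulResult({ type: "int32", value: 6 }, { type: "int32", value: 7 }).type);             // int32
console.log(mulResult({ type: "int32", value: 2 ** 20 }, { type: "int32", value: 2 ** 20 }).type); // int64
console.log(mulResult({ type: "int32", value: 2 }, { type: "int64", value: 3n }).type);            // int64
console.log(mulResult({ type: "int64", value: 2n }, { type: "double", value: 1.5 }).type);         // double
```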
Type Check
The { cancelDate : { $type: 10 } } query matches documents that contain the cancelDate field
whose value is null only; i.e. the value of the cancelDate field is of BSON Type Null (i.e. 10):
db.test.find( { cancelDate : { $type: 10 } } )
The query returns only the document that contains the null value:
{ "_id" : 1, "cancelDate" : null }
Existence Check
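The { cancelDate : { $exists: false } } query matches documents that do not contain the
cancelDate field:
db.test.find( { cancelDate : { $exists: false } } )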
The query returns only the document that does not contain the cancelDate field:
{ "_id" : 2 }
See also:
The reference documentation for the $type and $exists operators.
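The difference between "null", "missing", and "null or missing" can be sketched in plain JavaScript over the two sample documents. The filters below only mimic the server-side matching for this one field and are not how MongoDB evaluates queries; the equality case reflects MongoDB's behavior that { cancelDate : null } matches both documents whose field is null and documents that lack the field.

```javascript
// The two sample documents from the examples above.
const sampleDocs = [{ _id: 1, cancelDate: null }, { _id: 2 }];

// Like { cancelDate : { $type: 10 } }: the field exists and holds null.
const typeNull = sampleDocs.filter((d) => "cancelDate" in d && d.cancelDate === null);

// Like { cancelDate : { $exists: false } }: the field is absent.
const missing = sampleDocs.filter((d) => !("cancelDate" in d));

// Like { cancelDate : null }: an equality match on null matches both cases.
const equalsNull = sampleDocs.filter((d) => !("cancelDate" in d) || d.cancelDate === null);

console.log(typeNull.map((d) => d._id).join(","));   // 1
console.log(missing.map((d) => d._id).join(","));    // 2
console.log(equalsNull.map((d) => d._id).join(",")); // 1,2
```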
db.getCollection("_foo").insert( { a : 1 } )
As a cursor returns documents other operations may interleave with the query: if some of these operations are updates (page 71) that cause the
document to move (in the case of a table scan, caused by document growth) or that change the indexed field on the index used by the query; then
the cursor will return the same document more than once.
11 MongoDB does not permit changes to the value of the _id field; it is not possible for a cursor that traverses this index to pass the same
document more than once.
Warning:
You cannot use snapshot() with sharded collections.
You cannot use snapshot() with sort() or hint() cursor methods.
As an alternative, if your collection has a field or fields that are never modified, you can use a unique index on this
field or these fields to achieve a similar result as the snapshot(). Query with hint() to explicitly force the query
to use that index.
See also:
Record Allocation Strategies (page 90)
You can exit the line continuation mode if you enter two blank lines, as in the following example:
> if (x > 0
...
...
>
11.3.3 Does the mongo shell support tab completion and other keyboard shortcuts?
The mongo shell supports keyboard shortcuts. For example,
Use the up/down arrow keys to scroll through command history. See .dbshell documentation for more information on the .dbshell file.
Use <Tab> to autocomplete or to list the completion possibilities, as in the following example which uses
<Tab> to complete the method name starting with the letter c:
db.myCollection.c<Tab>
Because there are many collection methods starting with the letter c, the <Tab> will list the various methods
that start with c.
For a full list of the shortcuts, see Shell Keyboard Shortcuts.
The mongo shell prompt should now reflect the new prompt:
[email protected]>
You can add the logic for the prompt in the .mongorc.js file to set the prompt each time you start up the mongo shell.
11.3.5 Can I edit long shell operations with an external text editor?
You can use your own editor in the mongo shell by setting the EDITOR environment variable before starting the
mongo shell. Once in the mongo shell, you can edit with the specified editor by typing edit <variable> or
edit <function>, as in the following example:
1. Set the EDITOR variable from the command line prompt:
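export EDITOR=vim
2. Start the mongo shell:
mongo
3. Define a function myFunction:
function myFunction () {}
4. Edit the function using your editor:
edit myFunction
The command should open the vim edit session. Remember to save your changes.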
5. Type myFunction to see the function definition:
myFunction
You may be familiar with a readers-writer lock as multi-reader or shared exclusive lock. See the Wikipedia page on Readers-Writer
Locks (http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock) for more information.
When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write
operation holds the lock exclusively, and no other read or write operations may share the lock.
Locks are writer greedy, which means write locks have preference over reads. When both a read and write are
waiting for a lock, MongoDB grants the lock to the write.
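The writer-greedy policy can be sketched as a function that decides which waiting requests are granted when the lock becomes free. This is a simplified model under stated assumptions (the lock is currently free, requests are listed in arrival order); grantNext and the request objects are hypothetical and say nothing about MongoDB's internal scheduling.

```javascript
// Decide which waiting requests are granted when the lock becomes free.
// Requests are { id, mode: "read" | "write" } in arrival order.
function grantNext(waiting) {
  const firstWrite = waiting.find((r) => r.mode === "write");
  if (firstWrite) {
    // Writer greedy: a waiting write is preferred over waiting reads,
    // and a write lock is exclusive, so exactly one request is granted.
    return [firstWrite];
  }
  // With no writes waiting, all waiting reads can share the read lock.
  return waiting.filter((r) => r.mode === "read");
}

const queue = [
  { id: "r1", mode: "read" },
  { id: "w1", mode: "write" },
  { id: "r2", mode: "read" },
];
console.log(grantNext(queue).map((r) => r.id).join(","));                // w1
console.log(grantNext([queue[0], queue[2]]).map((r) => r.id).join(",")); // r1,r2
```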
Operation: Lock Type
Issue a query: Read lock
Get more data from a cursor: Read lock
Insert data: Write lock
Remove data: Write lock
Update data: Write lock
Map-reduce: Read lock and write lock, unless operations are specified as non-atomic. Portions of
map-reduce jobs can run concurrently.
Create an index: Building an index in the foreground, which is the default, locks the database for
extended periods of time.
db.eval(): Write lock. The db.eval() method takes a global write lock while evaluating the
JavaScript function. To avoid taking this global write lock, you can use the eval
command with nolock: true.
eval: Write lock. By default, the eval command takes a global write lock while evaluating the
JavaScript function. If used with nolock: true, the eval command does not take
a global write lock while evaluating the JavaScript function. However, the logic within
the JavaScript function may take write locks for write operations.
aggregate(): Read lock
db.auth(), and
db.addUser().
11.4.7 Does a MongoDB operation ever lock more than one database?
The following MongoDB operations lock multiple databases:
db.copyDatabase() must lock the entire mongod instance at once.
db.repairDatabase() obtains a global write lock and will block other operations until it finishes.
Journaling, which is an internal operation, locks all databases for short intervals. All databases share a single
journal.
User authentication (page 308) requires a read lock on the admin database for deployments using 2.6 user
credentials (page 400). For deployments using the 2.4 schema for user credentials, authentication locks the
admin database as well as the database the user is accessing.
All writes to a replica set's primary lock both the database receiving the writes and then the local database for
a short time. The lock for the local database allows the mongod to write to the primary's oplog and accounts
for a small portion of the total time of the operation.
11.4.11 What kind of concurrency does MongoDB provide for JavaScript operations?
Changed in version 2.4: The V8 JavaScript engine added in 2.4 allows multiple JavaScript operations to run at the
same time. Prior to 2.4, a single mongod could only run a single JavaScript operation at once.
11.5.10 How does MongoDB ensure unique _id field values when using a shard key other than _id?
If you do not use _id as the shard key, then your application/client layer must be responsible for keeping the _id
field unique. It is problematic for collections to have duplicate _id values.
If you're not sharding your collection by the _id field, then you should be sure to store a globally unique identifier in
that field. The default BSON ObjectId (page 174) works well in this case.
11.5.11 I've enabled sharding and added a second shard, but all the data is still on one server. Why?
First, ensure that you've declared a shard key for your collection. Until you have configured the shard key, MongoDB
will not create chunks, and sharding will not occur.
Next, keep in mind that the default chunk size is 64 MB. As a result, in most situations, the collection needs to have at
least 64 MB of data before a migration will occur.
Additionally, the system which balances chunks among the servers attempts to avoid superfluous migrations. Depending on the number of shards, your shard key, and the amount of data, systems often require at least 10 chunks of data
to trigger migrations.
You can run db.printShardingStatus() to see all the chunks present in your cluster.
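The size thresholds above can be turned into a rough lower bound on how many chunks a collection has. The sketch assumes every chunk holds at most the chunk size (64 MB by default) and ignores chunks that cannot be split; because chunks typically split before they are full, real chunk counts are usually higher. minChunkCount is a hypothetical helper.

```javascript
// Rough lower bound on the number of chunks: if no chunk holds more than
// chunkSizeMB of data, at least ceil(dataMB / chunkSizeMB) chunks exist.
function minChunkCount(dataMB, chunkSizeMB = 64) {
  return Math.ceil(dataMB / chunkSizeMB);
}

console.log(minChunkCount(50));  // 1: too little data for a split, so nothing migrates
console.log(minChunkCount(100)); // 2
console.log(minChunkCount(640)); // 10
```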
11.5.18 What is the process for moving, renaming, or changing the number of config servers?
See Sharded Cluster Tutorials (page 669) for information on migrating and replacing config servers.
11.5.20 Is it possible to quickly update mongos servers after updating a replica set configuration?
The mongos instances will detect these changes without intervention over time. However, if you want to force the
mongos to reload its configuration, run the flushRouterConfig command against each mongos directly.
When this happens, the primary member of the shard's replica set then terminates to protect data consistency. If a
secondary member can access the config database, data on the shard becomes accessible again after an election.
The user will need to resolve the chunk migration failure independently. If you encounter this issue, contact the
MongoDB User Group23 or MongoDB Support24 to address this issue.
23 http://groups.google.com/group/mongodb-user
24 http://www.mongodb.org/about/support
11.5.27 How does draining a shard affect the balancing of uneven chunk distribution?
The sharded cluster balancing process controls both migrating chunks from decommissioned shards (i.e. draining) and
normal cluster balancing activities. Consider the following behaviors for different versions of MongoDB in situations
where you remove a shard in a cluster with an uneven chunk distribution:
After MongoDB 2.2, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.
Before MongoDB 2.2, the balancer handles the uneven chunk distribution and then removes the chunks from
the draining shard.
11.6.5 Does replication work over the Internet and WAN connections?
Yes.
For example, a deployment may maintain a primary and secondary in an East-coast data center along with a secondary
member for disaster recovery in a West-coast data center.
See also:
Deploy a Geographically Redundant Replica Set (page 588)
11.6.8 What is the preferred replication method: replica sets or replica pairs?
Deprecated since version 1.6.
Replica sets replaced replica pairs in version 1.6. Replica sets are the preferred replication mechanism in MongoDB.
Journaling is enabled by default on all 64-bit builds of MongoDB v2.0 and greater.
11.6.10 Are write operations durable if write concern does not acknowledge writes?
Yes.
However, if you want confirmation that a given write has arrived at the server, use write concern (page 76).
After the default write concern change (page 887), the default write concern acknowledges
all write operations, and unacknowledged writes must be explicitly configured.
See the http://docs.mongodb.org/manual/applications/drivers documentation for your driver for
more information.
Changed in version 2.6: The mongo shell now defaults to use safe writes (page 76). See Write Method Acknowledgements (page 821) for more information.
A new protocol for write operations (page 815) integrates write concerns with the write operations. Previous versions
issued a getLastError command after a write to specify a write concern.
11.6.12 What information do arbiters exchange with the rest of the replica set?
Arbiters never receive the contents of a collection but do exchange the following data with the rest of the replica set:
Credentials used to authenticate the arbiter with the replica set. All MongoDB processes within a replica set use
keyfiles. These exchanges are encrypted.
Replica set configuration data and voting data. This information is not encrypted. Only credential exchanges
are encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the
replica set are secure. See the documentation for Configure mongod and mongos for SSL (page 331) for more information. Run all arbiters on secure networks, as with all MongoDB components.
See
The overview of Arbiter Members of Replica Sets (page ??).
11.6.15 Is it normal for replica set members to use different amounts of disk space?
Yes.
Factors including different oplog sizes, different levels of storage fragmentation, and MongoDB's data file preallocation can lead to some variation in storage utilization between nodes. Storage use disparities will be most pronounced when you add members at different times.
...
"wiredTiger" : {
...
"cache" : {
"tracked dirty bytes in the cache" : <num>,
"bytes currently in the cache" : <num>,
"maximum bytes configured" : <num>,
"bytes read into cache" : <num>,
"bytes written from cache" : <num>,
"pages evicted by application threads" : <num>,
"checkpoint blocked page eviction" : <num>,
"unmodified pages evicted" : <num>,
"page split during eviction deepened the tree" : <num>,
"modified pages evicted" : <num>,
"pages selected for eviction unable to be evicted" : <num>,
"pages evicted because they exceeded the in-memory maximum" : <num>,
"pages evicted because they had chains of deleted items" : <num>,
"failed eviction of pages that exceeded the in-memory maximum" : <num>,
"hazard pointer blocked page eviction" : <num>,
"internal pages evicted" : <num>,
"maximum page size at eviction" : <num>,
"eviction server candidate queue empty when topping up" : <num>,
"eviction server candidate queue not empty when topping up" : <num>,
"eviction server evicting pages" : <num>,
"eviction server populating queue, but not evicting pages" : <num>,
"eviction server unable to reach eviction goal" : <num>,
"pages split during eviction" : <num>,
"pages walked for eviction" : <num>,
"eviction worker thread evicting pages" : <num>,
"in-memory page splits" : <num>,
"percentage overhead" : <num>,
"tracked dirty pages in the cache" : <num>,
"pages currently held in the cache" : <num>,
"pages read into cache" : <num>,
"pages written from cache" : <num>,
},
...
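Two of the cache fields above are often read as ratios of the configured maximum: how full the cache is and how much of it is dirty. The sketch assumes a serverStatus-like object with made-up numbers; in the mongo shell the same function could be applied to the result of db.serverStatus() on a mongod that uses WiredTiger. cacheUsage is a hypothetical helper.

```javascript
// Derive cache utilization ratios from a serverStatus-like document.
// Field names follow the "cache" section above; the numbers are made up.
function cacheUsage(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  const max = cache["maximum bytes configured"];
  return {
    used: cache["bytes currently in the cache"] / max,
    dirty: cache["tracked dirty bytes in the cache"] / max,
  };
}

const sample = {
  wiredTiger: {
    cache: {
      "maximum bytes configured": 1073741824,
      "bytes currently in the cache": 536870912,
      "tracked dirty bytes in the cache": 107374182,
    },
  },
};
const usage = cacheUsage(sample);
console.log(usage.used.toFixed(2), usage.dirty.toFixed(2)); // 0.50 0.10
```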
Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation. MongoDB memory
maps data files to memory as it accesses documents. Unaccessed data is not mapped to memory.
Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it
were memory.
Why are the files in my data directory larger than the data in my database?
The data files in your data directory, which is the /data/db directory in default configurations, might be larger than
the data set inserted into the database. Consider the following possible causes:
Preallocated data files
MongoDB preallocates its data files to avoid filesystem fragmentation, and because of this, the size of these files does
not necessarily reflect the size of your data.
The storage.mmapv1.smallFiles option will reduce the size of these files, which may be useful if you have
many small databases on disk.
The oplog
If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated
capped collection in the local database.
The default allocation is approximately 5% of disk space on 64-bit installations. In most cases, you should not need
to resize the oplog. See Oplog Sizing (page 573) for more information.
The journal
The data directory contains the journal files, which store write operations on disk before MongoDB applies them to
databases. See Journaling Mechanics (page 300).
Empty records
MongoDB maintains lists of empty records in data files as it deletes documents and collections. MongoDB can reuse
this space, but will not, by default, return this space to the operating system.
To de-fragment allocated storage, use compact. By de-fragmenting storage, MongoDB can more effectively use the
allocated space. compact requires up to 2 gigabytes of extra disk space to run. Do not use compact if you are
critically low on disk space.
compact only removes fragmentation from MongoDB data files within a collection, and does not return any disk
space to the operating system.
If you must reclaim disk space, you can use repairDatabase. This command rebuilds the database,
de-fragmenting the associated storage in the process.
This may release space to the operating system.
repairDatabase requires up to 2 gigabytes of extra disk space to run. Do not use repairDatabase if you
are critically low on disk space.
Warning: repairDatabase requires enough free disk space to hold both the old and new database files while
the repair is running. Be aware that repairDatabase will block all other operations and may take a long time
to complete.
MongoDB also provides the following methods to return specific sizes for the collection:
db.collection.dataSize() to return data size in bytes for the collection.
db.collection.storageSize() to return allocation size in bytes, including unused space.
db.collection.totalSize() to return the data size plus the index size in bytes.
db.collection.totalIndexSize() to return the index size in bytes.
The following script prints the statistics for each collection in each database:
// Print the stats() output for every collection in every database.
db.adminCommand("listDatabases").databases.forEach(function (d) {
  var mdb = db.getSiblingDB(d.name);
  mdb.getCollectionNames().forEach(function (c) {
    var s = mdb.getCollection(c).stats();
    printjson(s);
  });
});
11.8.11 Can I use a multi-key index to support a query for a whole array?
Not entirely. The index can partially support these queries because it can speed the selection of the first element of
the array; however, comparing all subsequent items in the array cannot use the index and must scan the documents
individually.
11.8.12 How can I effectively use an indexing strategy for attribute lookups?
For simple attribute lookups that don't require sorted result sets or range queries, consider creating a field that contains
an array of documents where each document has a field (e.g. attrib) that holds a specific type of attribute. You can
index this attrib field.
For example, the attrib field in the following document allows you to add an unlimited number of attribute types:
{ _id : ObjectId(...),
  attrib : [
            { k: "color", v: "red" },
            { k: "shape", v: "rectangle" },
            { k: "color", v: "blue" },
            { k: "avail", v: true }
           ]
}
Queries on any of these attributes can use the same { "attrib.k": 1, "attrib.v": 1 } index.
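The reason the pattern works is that a lookup must match k and v within the same array element, which is what { attrib : { $elemMatch : { k : ..., v : ... } } } expresses. The sketch below mimics that matching rule in plain JavaScript on the example document; it does not model index use, and hasAttribute is a hypothetical helper.

```javascript
// The attrib pattern: one array element per attribute, each a { k, v } pair.
const product = {
  _id: 1,
  attrib: [
    { k: "color", v: "red" },
    { k: "shape", v: "rectangle" },
    { k: "color", v: "blue" },
    { k: "avail", v: true },
  ],
};

// Mimics { attrib: { $elemMatch: { k, v } } }: k and v must match within
// the same array element, not across different elements.
function hasAttribute(doc, k, v) {
  return doc.attrib.some((a) => a.k === k && a.v === v);
}

console.log(hasAttribute(product, "color", "blue")); // true
console.log(hasAttribute(product, "avail", true));   // true
console.log(hasAttribute(product, "shape", "blue")); // false
```

The last call returns false even though "shape" and "blue" both occur in the array, because they occur in different elements.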
11.9.1 Where can I find information about a mongod process that stopped running unexpectedly?
If mongod shuts down unexpectedly on a UNIX or UNIX-based platform, and if mongod fails to log a shutdown or
error message, then check your system logs for messages pertaining to MongoDB. For example, for logs located in
/var/log/messages, use the following commands:
sudo grep mongod /var/log/messages
sudo grep score /var/log/messages
The value is measured in seconds. You can change the tcp_keepalive_time value with the following operation:
echo <value> > /proc/sys/net/ipv4/tcp_keepalive_time
For OS X systems, issue the following command to view the keep alive setting:
sysctl net.inet.tcp.keepidle
The above methods of setting the TCP keepalive are not persistent; you will need to reset the new
tcp_keepalive_time value each time you reboot or restart a system. See your operating system's documentation for instructions on setting the TCP keepalive value persistently.
For Windows systems, issue the following command to view the keep alive setting:
reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime
The registry value is not present by default. The system default, used if the value is absent, is 7200000 milliseconds
or 0x6ddd00 in hexadecimal. To set a shorter keep alive period, use the following invocation in an Administrator
Command Prompt, where <value> is expressed in hexadecimal (e.g. 0x01d4c0 is 120000):
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d <value>
Windows users should consider the Windows Server Technet Article on KeepAliveTime30 for more information on
setting keep alive for MongoDB deployments on Windows systems.
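Because the Linux tcp_keepalive_time value is in seconds while the Windows KeepAliveTime value is in milliseconds and entered in hexadecimal, a small conversion helps avoid mistakes. keepAliveHex is a hypothetical helper; the six-digit padding only matches the format used in the examples above.

```javascript
// Convert a keepalive period in seconds to the hexadecimal millisecond
// value used for the Windows KeepAliveTime registry entry.
function keepAliveHex(seconds) {
  const ms = seconds * 1000;
  return "0x" + ms.toString(16).padStart(6, "0");
}

console.log(keepAliveHex(120));  // 0x01d4c0
console.log(keepAliveHex(7200)); // 0x6ddd00 (the Windows default of 7200000 ms)
```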
Changed in version 2.0.
You will need to restart mongod and mongos servers for new system-wide tcp_keepalive_time settings to
take effect on server versions newer than 2.0 if the new value is less than 600 seconds. Values greater than or equal to
600 seconds will be ignored by mongod and mongos.
30 https://technet.microsoft.com/en-us/library/cc957549.aspx
The number or rate of page faults and other MMS gauges to detect when you need more RAM
Each database connection thread will need up to 1 MB of RAM.
MongoDB defers to the operating system when loading data into memory from disk. It simply memory maps
(page 752) all its data files and relies on the operating system to cache data. The OS typically evicts the least-recently-used data from RAM when it runs low on memory. For example, if clients access indexes more frequently
than documents, then indexes will more likely stay in RAM, but it depends on your particular usage.
To calculate how much RAM you need, you must calculate your working set size, or the portion of your data that
clients use most often. This depends on your access patterns, what indexes you have, and the size of your documents.
Because MongoDB uses a thread per connection model, each database connection also will need up to 1MB of RAM,
whether active or idle.
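The sizing rule above reduces to simple arithmetic. This Node.js sketch is a rough upper-bound estimate under stated assumptions: you must measure or estimate the working set size yourself, the 1 MB per connection figure comes from the text, and operating system and other overhead are ignored. `estimateRamBytes` is a hypothetical name.

```javascript
// Rough upper-bound RAM estimate (hypothetical helper): the working set plus
// up to 1 MB per connection thread, active or idle. OS overhead is ignored.
const MB = 1024 * 1024;
const GB = 1024 * MB;

function estimateRamBytes(workingSetBytes, connectionCount) {
  const perConnectionBytes = 1 * MB; // "up to 1 MB" per connection, per the text
  return workingSetBytes + connectionCount * perConnectionBytes;
}

// Example with assumed numbers: a 6 GB working set and 500 open connections.
const needed = estimateRamBytes(6 * GB, 500);
console.log((needed / GB).toFixed(2) + " GB"); // 6.49 GB
```

The connection term matters most for deployments with many idle client connections, since each one counts whether or not it is active.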
If page faults are infrequent, your working set fits in RAM. If fault rates rise higher than that, you risk performance
degradation. This is less critical with SSD drives than with spinning disks.
How do I read memory statistics in the UNIX top command?
Because mongod uses memory-mapped files (page 752), the memory statistics in top require interpretation in a
special way. On a large database, VSIZE (virtual bytes) tends to be the size of the entire database. If mongod
is the only significant process running on the machine, RSIZE (resident bytes) approaches the total memory of the machine, as this counts file
system cache contents.
For Linux systems, use the vmstat command to help determine how the system uses memory. On OS X systems use
vm_stat.
Finally, if your shard key has a low cardinality (page 675), MongoDB may not be able to create sufficient splits among
the data.
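To see why low cardinality limits splitting, note that a chunk covers a range of shard key values, so all documents that share one shard key value must stay in the same chunk. The hypothetical Node.js helper below counts distinct shard key values in a sample of documents; that count is an upper bound on how many chunks MongoDB could create for the key (the documents and field names are made up for illustration).

```javascript
// Hypothetical helper: the number of distinct shard key values bounds the
// number of chunks, because one key value cannot be split across chunks.
function maxPossibleChunks(docs, shardKeyField) {
  const distinct = new Set(docs.map((d) => d[shardKeyField]));
  return distinct.size;
}

const docs = [
  { _id: 1, status: "active" },
  { _id: 2, status: "inactive" },
  { _id: 3, status: "active" },
  { _id: 4, status: "active" },
];

console.log(maxPossibleChunks(docs, "status")); // 2: low cardinality
console.log(maxPossibleChunks(docs, "_id"));    // 4: one value per document
```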
Why would one shard receive a disproportionate amount of traffic in a sharded cluster?
In some situations, a single shard or a subset of the cluster will receive a disproportionate portion of the traffic and
workload. In almost all cases this is the result of a shard key that does not effectively allow write scaling (page 655).
It's also possible that you have hot chunks. In this case, you may be able to solve the problem by splitting and then
migrating parts of these chunks.
In the worst case, you may have to consider re-sharding your data and choosing a different shard key (page 673) to
correct this pattern.
What can prevent a sharded cluster from balancing?
If you have just deployed your sharded cluster, you may want to consider the troubleshooting suggestions for a new
cluster where data remains on a single shard (page 760).
If the cluster was initially balanced, but later developed an uneven distribution of data, consider the following possible
causes:
You have deleted or removed a significant amount of data from the cluster. If you have added additional data, it
may have a different distribution with regard to its shard key.
Your shard key has low cardinality (page 675) and MongoDB cannot split the chunks any further.
Your data set is growing faster than the balancer can distribute data around the cluster. This is uncommon and
typically is the result of:
a balancing window (page 694) that is too short, given the rate of data growth.
an uneven distribution of write operations (page 655) that requires more data migration. You may have to
choose a different shard key to resolve this issue.
poor network connectivity between shards, which may lead to chunk migrations that take too long to
complete. Investigate your network configuration and interconnections between shards.
Why do chunk migrations affect sharded cluster performance?
If migrations impact your cluster's or application's performance, consider the following options, depending on the nature
of the impact:
1. If migrations only interrupt your cluster sporadically, you can limit the balancing window (page 694) to prevent
balancing activity during peak hours. Ensure that there is enough time remaining to keep the data from becoming
out of balance again.
2. If the balancer is always migrating chunks to the detriment of overall cluster performance:
You may want to decrease the chunk size (page 705) to limit the size of the migration.
Your cluster may be over capacity, and you may want to add one or two shards (page 676) to
the cluster to distribute load.
It's also possible that your shard key causes your application to direct all writes to a single shard. This kind of activity
pattern can require the balancer to migrate most data soon after writing it. Consider redeploying your cluster with a
shard key that provides better write scaling (page 655).
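A toy model of range-based routing illustrates the single-shard write pattern. The chunk boundaries and shard names below are hypothetical, and in a real cluster mongos routes operations using chunk metadata from the config servers; the sketch only shows how a monotonically increasing shard key sends every new insert to the chunk that holds the highest values.

```javascript
// Simplified model of range-based chunk routing (hypothetical chunk layout).
// Each chunk owns [min, max); the last chunk extends to Infinity.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" },
];

function routeInsert(shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

// A monotonically increasing key (e.g. a counter) that has moved past the last
// split point sends every new insert to the same shard.
const counts = {};
for (let key = 5000; key < 5100; key++) {
  const shard = routeInsert(key);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts); // { shard0002: 100 }
```

Afterwards the balancer has to move data off that shard, which is the migration load described above.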
CHAPTER 12
Release Notes
Always install the latest, stable version of MongoDB. See MongoDB Version Numbers (page 887) for more information.
See the following release notes for an account of the changes in major versions. Release notes also include instructions
for upgrade.
3.0.1 Changelog
Security
SERVER-17507: MongoDB3 enterprise AuditLog [3]
SERVER-17379: Change "or" to "and" in webserver localhost exception check [4]
SERVER-16944: dbAdminAnyDatabase should have full parity with dbAdmin for a given database [5]
1 http://docs.opsmanager.mongodb.com/current/
2 http://docs.opsmanager.mongodb.com/current/release-notes/application/
3 https://jira.mongodb.org/browse/SERVER-17507
4 https://jira.mongodb.org/browse/SERVER-17379
5 https://jira.mongodb.org/browse/SERVER-16944
SERVER-16849: On mongos we always invalidate the user cache once, even if no user definitions are changing [6]
SERVER-16452: Failed login attempts should log source IP address [7]
Querying
SERVER-17395: Add FSM tests to stress yielding [8]
SERVER-17387: invalid projection for findAndModify triggers fassert() failure [9]
SERVER-14723: Crash during query planning for geoNear with multiple 2dsphere indices [10]
SERVER-17486: Crash when parsing invalid polygon coordinates [11]
Replication
SERVER-17515: copyDatabase fails to replicate indexes to secondary [12]
SERVER-17499: Using eval command to run getMore on aggregation cursor trips fatal assertion [13]
SERVER-17487: cloner dropDups removes _id entries belonging to other records [14]
SERVER-17302: consider blacklist in shouldChangeSyncSource [15]
Sharding
SERVER-17398: Deadlock in MigrateStatus::startCommit [16]
SERVER-17300: Balancer tries to create config.tags index multiple times [17]
SERVER-16849: On mongos we always invalidate the user cache once, even if no user definitions are changing [18]
SERVER-5004: balancer should check for stopped between chunk moves in current round [19]
Indexing
SERVER-17521: improve createIndex validation of empty name [20]
SERVER-17436: MultiIndexBlock may access deleted collection after recovering from yield [21]
Aggregation Framework
SERVER-17224: Aggregation pipeline with 64MB document can terminate server [22]
6 https://jira.mongodb.org/browse/SERVER-16849
7 https://jira.mongodb.org/browse/SERVER-16452
8 https://jira.mongodb.org/browse/SERVER-17395
9 https://jira.mongodb.org/browse/SERVER-17387
10 https://jira.mongodb.org/browse/SERVER-14723
11 https://jira.mongodb.org/browse/SERVER-17486
12 https://jira.mongodb.org/browse/SERVER-17515
13 https://jira.mongodb.org/browse/SERVER-17499
14 https://jira.mongodb.org/browse/SERVER-17487
15 https://jira.mongodb.org/browse/SERVER-17302
16 https://jira.mongodb.org/browse/SERVER-17398
17 https://jira.mongodb.org/browse/SERVER-17300
18 https://jira.mongodb.org/browse/SERVER-16849
19 https://jira.mongodb.org/browse/SERVER-5004
20 https://jira.mongodb.org/browse/SERVER-17521
21 https://jira.mongodb.org/browse/SERVER-17436
22 https://jira.mongodb.org/browse/SERVER-17224
Write Ops
SERVER-17489: in bulk ops, only mark last operation with commit=synchronous [23]
SERVER-17276: WriteConflictException retry loops needed for collection creation on upsert [24]
Concurrency
SERVER-17501: Increase journalling capacity limits [25]
SERVER-17416: Deadlock between MMAP V1 journal lock and oplog collection lock [26]
SERVER-17395: Add FSM tests to stress yielding [27]
Storage
SERVER-17515: copyDatabase fails to replicate indexes to secondary [28]
SERVER-17436: MultiIndexBlock may access deleted collection after recovering from yield [29]
SERVER-17416: Deadlock between MMAP V1 journal lock and oplog collection lock [30]
SERVER-17381: Rename rocksExperiment to RocksDB [31]
SERVER-17369: [Rocks] Fix the calculation of nextPrefix [32]
SERVER-17345: WiredTiger -> session.truncate: the start cursor position is after the stop cursor position [33]
SERVER-17331: RocksDB configuring and monitoring [34]
SERVER-17323: MMAPV1Journal lock counts are changing during WT run [35]
SERVER-17319: invariant at shutdown rc9, rc10, rc11 with wiredTiger [36]
SERVER-17293: Server crash setting wiredTigerEngineRuntimeConfig:eviction=(threads_max=8) [37]
WiredTiger
SERVER-17510: Didn't find RecordId in WiredTigerRecordStore on collections after an idle period [38]
SERVER-17506: Race between inserts and checkpoints can lose records under WiredTiger [39]
SERVER-17487: cloner dropDups removes _id entries belonging to other records [40]
SERVER-17481: WiredTigerRecordStore::validate should call WT_SESSION::verify [41]
23 https://jira.mongodb.org/browse/SERVER-17489
24 https://jira.mongodb.org/browse/SERVER-17276
25 https://jira.mongodb.org/browse/SERVER-17501
26 https://jira.mongodb.org/browse/SERVER-17416
27 https://jira.mongodb.org/browse/SERVER-17395
28 https://jira.mongodb.org/browse/SERVER-17515
29 https://jira.mongodb.org/browse/SERVER-17436
30 https://jira.mongodb.org/browse/SERVER-17416
31 https://jira.mongodb.org/browse/SERVER-17381
32 https://jira.mongodb.org/browse/SERVER-17369
33 https://jira.mongodb.org/browse/SERVER-17345
34 https://jira.mongodb.org/browse/SERVER-17331
35 https://jira.mongodb.org/browse/SERVER-17323
36 https://jira.mongodb.org/browse/SERVER-17319
37 https://jira.mongodb.org/browse/SERVER-17293
38 https://jira.mongodb.org/browse/SERVER-17510
39 https://jira.mongodb.org/browse/SERVER-17506
40 https://jira.mongodb.org/browse/SERVER-17487
41 https://jira.mongodb.org/browse/SERVER-17481
Fixed race condition in WiredTiger between inserts and checkpoints that could result in lost records: SERVER-17506 [67].
Resolved issue in WiredTiger's capped collections implementation that caused a server crash: SERVER-17345 [68].
Fixed issue in initial sync with duplicate _id entries: SERVER-17487 [69].
Fixed deadlock condition in MMAPv1 between the journal lock and the oplog collection lock: SERVER-17416 [70].
All issues closed in 3.0.1 [71]
Major Changes
Pluggable Storage Engine API
MongoDB 3.0 introduces a pluggable storage engine API that allows third parties to develop storage engines for
MongoDB.
60 https://jira.mongodb.org/browse/SERVER-16452
61 https://jira.mongodb.org/browse/SERVER-17252
62 https://jira.mongodb.org/browse/SERVER-14166
63 https://jira.mongodb.org/browse/SERVER-17338
64 https://jira.mongodb.org/browse/SERVER-17443
65 https://jira.mongodb.org/browse/SERVER-17442
66 https://jira.mongodb.org/browse/SERVER-17395
67 https://jira.mongodb.org/browse/SERVER-17506
68 https://jira.mongodb.org/browse/SERVER-17345
69 https://jira.mongodb.org/browse/SERVER-17487
70 https://jira.mongodb.org/browse/SERVER-17416
71 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%223.0.1%22%20AND%20project%20%3D%20SERVER
WiredTiger
MongoDB 3.0 introduces support for the WiredTiger [72] storage engine. With the support for WiredTiger, MongoDB
now supports two storage engines:
MMAPv1, the storage engine available in previous versions of MongoDB and the default storage engine for
MongoDB 3.0, and
WiredTiger [73], available only in the 64-bit versions of MongoDB 3.0.
WiredTiger Usage WiredTiger is an alternative to the default MMAPv1 storage engine. WiredTiger supports all MongoDB features, including operations that report on server, database, and collection statistics. Switching to WiredTiger,
however, requires a change to the on-disk storage format (page 773). For instructions on changing the storage engine
to WiredTiger, see the appropriate sections in the Upgrade MongoDB to 3.0 (page 778) documentation.
MongoDB 3.0 replica sets and sharded clusters can have members with different storage engines; however, performance can vary according to workload. For details, see the appropriate sections in the Upgrade MongoDB to 3.0
(page 778) documentation.
The WiredTiger storage engine requires the latest official MongoDB drivers. For more information, see WiredTiger
and Driver Version Compatibility (page 773).
See also:
Support for touch Command (page 773), WiredTiger Storage Engine (page 89) section in the Storage (page 88) documentation
WiredTiger Configuration To configure the behavior and properties of the WiredTiger storage engine, see
storage.wiredTiger configuration options. You can set WiredTiger options on the command line.
See also:
WiredTiger Storage Engine (page 89) section in the Storage (page 88) documentation
WiredTiger Concurrency and Compression The 3.0 WiredTiger storage engine provides document-level locking
and compression.
By default, WiredTiger compresses collection data using the snappy compression library. WiredTiger uses prefix
compression on all indexes by default.
See also:
WiredTiger (page 199) section in the Production Notes (page 198)
MMAPv1 Improvements
MMAPv1 Concurrency Improvement In version 3.0, the MMAPv1 storage engine adds support for collection-level locking.
MMAPv1 Configuration Changes To support multiple storage engines, some configuration settings for MMAPv1
have changed. See Configuration File Options Changes (page 772).
72 http://wiredtiger.com
73 http://wiredtiger.com
MMAPv1 Record Allocation Behavior Changes MongoDB 3.0 no longer implements dynamic record allocation and deprecates paddingFactor. The default allocation strategy for collections in instances that use MMAPv1
is power of 2 allocation (page 90), which has been improved to better handle large document sizes. In 3.0, the
usePowerOf2Sizes flag is ignored, so the power of 2 strategy is used for all collections that do not have
noPadding flag set.
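The rounding performed by the power of 2 strategy can be sketched as follows. This Node.js function is illustrative only and is not MongoDB's code. It assumes record sizes are powers of 2 from 32 bytes up to 2 MB and that larger records round up to the next multiple of 2 MB, and it ignores record headers; check the power of 2 allocation (page 90) documentation for the exact rules.

```javascript
// Sketch of power-of-2 record sizing, ASSUMING: powers of 2 starting at
// 32 bytes up to 2 MB, then rounding up to the next multiple of 2 MB.
// Record headers and other storage overhead are ignored.
const TWO_MB = 2 * 1024 * 1024;

function powerOf2AllocationSize(recordBytes) {
  if (recordBytes > TWO_MB) {
    return Math.ceil(recordBytes / TWO_MB) * TWO_MB;
  }
  let size = 32;
  while (size < recordBytes) {
    size *= 2;
  }
  return size;
}

console.log(powerOf2AllocationSize(100));             // 128
console.log(powerOf2AllocationSize(3 * 1024 * 1024)); // 4194304 (4 MB)
```

The unused space in each allocation is what lets a document grow in place; noPadding trades that headroom for tighter storage.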
For collections with workloads that consist only of inserts or in-place updates (such as incrementing counters), you
can disable the power of 2 strategy. To disable the power of 2 strategy for a collection, use the collMod command
with the noPadding flag or the db.createCollection() method with the noPadding option.
Warning: Do not set noPadding if the workload includes removes or any updates that may cause documents
to grow. For more information, see No Padding Allocation Strategy (page 90).
When low on disk space, MongoDB 3.0 no longer errors on all writes but only when the required disk allocation fails.
As such, MongoDB now allows in-place updates and removes when low on disk space.
See also:
Dynamic Record Allocation (page 773)
Replica Sets
Increased Number of Replica Set Members
The process that a primary member of a replica set uses to step down has the following changes:
Before stepping down, replSetStepDown will attempt to terminate long running user operations that would
block the primary from stepping down, such as an index build, a write operation or a map-reduce job.
To help prevent rollbacks, replSetStepDown will wait for an electable secondary to catch up to the state
of the primary before stepping down. Previously, a primary would wait for a secondary to catch up to within 10
seconds of the primary (i.e. a secondary with a replication lag of 10 seconds or less) before stepping down.
replSetStepDown now accepts a secondaryCatchUpPeriodSecs parameter that specifies how long the primary should wait for a secondary to catch up before stepping down.
Initial sync builds indexes more efficiently for each collection and applies oplog entries in batches using threads.
Definition of w: majority (page 129) write concern changed to mean majority of voting nodes.
Stronger restrictions on Replica Set Configuration (page 632). For details, see Replica Set Configuration Validation (page 774).
For pre-existing collections on secondary members, MongoDB 3.0 no longer automatically builds missing _id
indexes.
See also:
Replication Changes (page 773) in Compatibility Changes in MongoDB 3.0 (page 772)
Sharded Clusters
MongoDB 3.0 provides the following enhancements to sharded clusters:
Adds a new sh.removeTagRange() helper to improve management of sharded collections with tags. The
new sh.removeTagRange() method acts as a complement to sh.addTagRange().
Provides a more predictable read preference behavior. mongos instances no longer pin connections to members
of replica sets when performing read operations. Instead, mongos reevaluates read preferences (page 568) for
every operation, so that read behavior remains predictable when read preferences change.
Provides a new writeConcern setting to configure the write concern (page 76) of chunk migration operations. You can configure the writeConcern setting for the balancer (page 693) as well as for moveChunk
and cleanupOrphaned commands.
Improves visibility of balancer operations. sh.status() includes information about the state of the balancer.
See sh.status() for details.
See also:
Sharded Cluster Setting (page 775) in Compatibility Changes in MongoDB 3.0 (page 772)
Security Improvements
MongoDB 3.0 includes the following security enhancements:
Adds a new SCRAM-SHA-1 (page 309) challenge-response user authentication mechanism.
Increases restrictions when using the Localhost Exception (page 311) to access MongoDB. For details, see
Localhost Exception Changed (page 775).
See also:
Security Changes (page 775)
Improvements
New Query Introspection System
MongoDB 3.0 includes a new query introspection system that provides an improved output format and a finer-grained
introspection into both query plan and query execution.
For details, see the new db.collection.explain() method and the new explain command as well as the
updated cursor.explain() method.
To improve usability of the log messages for diagnosis, MongoDB categorizes some log messages under specific
components, or operations, and provides the ability to set the verbosity level for these components. For information,
see http://docs.mongodb.org/manual/reference/log-messages.
MongoDB Tools Enhancements
All MongoDB tools are now written in Go and maintained as a separate project.
New options for parallelized mongodump and mongorestore. You can control the number of collections
that mongorestore will restore at a time with the --numParallelCollections option.
New options --excludeCollection and --excludeCollectionsWithPrefix for mongodump to
exclude collections.
mongorestore can now accept BSON data input from standard input in addition to reading BSON data from
file.
mongostat and mongotop can now return output in JSON format with the --json option.
Added configurable write concern to mongoimport, mongorestore, and mongofiles. Use the
--writeConcern option.
mongofiles now allows you to configure the GridFS prefix with the --prefix option so that you can use
custom namespaces and store multiple GridFS namespaces in a single database.
See also:
MongoDB Tools Changes (page 774)
Indexes
Background index builds will no longer automatically interrupt if dropDatabase, drop, dropIndexes
operations occur for the database or collection affected by the index builds. The dropDatabase, drop,
and dropIndexes commands will still fail with the error message "a background operation is
currently running", as in 2.6.
If you specify multiple indexes to the createIndexes command,
the command only scans the collection once, and
if at least one index is to be built in the foreground, the operation will build all the specified indexes in the
foreground.
For sharded collections, indexes can now cover queries (page 64) that execute against the mongos if the index
includes the shard key.
See also:
Indexes (page 776) in Compatibility Changes in MongoDB 3.0 (page 772)
Query Enhancements
Most non-Enterprise MongoDB distributions now include support for SSL. Previously, only MongoDB Enterprise
distributions came with SSL support included; for non-Enterprise distributions, you had to build MongoDB locally
with the --ssl flag (i.e. scons --ssl).
MongoDB Enterprise Features
Auditing
Auditing (page 317) in MongoDB Enterprise can filter on any field in the audit message (page 409), including the fields
returned in the param (page 410) document. This enhancement, along with the auditAuthorizationSuccess
parameter, enables auditing to filter on CRUD operations. However, enabling auditAuthorizationSuccess to
audit all authorization successes degrades performance more than auditing only the authorization failures.
Additional Information
Changes Affecting Compatibility
Compatibility Changes in MongoDB 3.0 The following 3.0 changes can affect the compatibility with older versions of MongoDB. See Release Notes for MongoDB 3.0 (page 763) for the full list of the 3.0 changes.
Storage Engine
Configuration File Options Changes With the introduction of additional storage engines in 3.0, some
configuration file options have changed:
Previous Setting                    New Setting
storage.journal.commitIntervalMs    storage.mmapv1.journal.commitIntervalMs
storage.journal.debugFlags          storage.mmapv1.journal.debugFlags
storage.nsSize                      storage.mmapv1.nsSize
storage.preallocDataFiles           storage.mmapv1.preallocDataFiles
storage.quota.enforced              storage.mmapv1.quota.enforced
storage.quota.maxFilesPerDB         storage.mmapv1.quota.maxFilesPerDB
storage.smallFiles                  storage.mmapv1.smallFiles
3.0 mongod instances are backward compatible with existing configuration files, but will issue warnings if you
attempt to use the old settings.
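If you generate configuration files programmatically, the renames in the table above amount to inserting `mmapv1` after `storage`. A hypothetical Node.js helper that rewrites only the seven listed option names and returns any other name unchanged:

```javascript
// Hypothetical helper: map the 2.6-style MMAPv1 option names from the table
// above to their 3.0 names. Any other option name is returned unchanged.
const MMAPV1_OPTION_RENAMES = {
  "storage.journal.commitIntervalMs": "storage.mmapv1.journal.commitIntervalMs",
  "storage.journal.debugFlags": "storage.mmapv1.journal.debugFlags",
  "storage.nsSize": "storage.mmapv1.nsSize",
  "storage.preallocDataFiles": "storage.mmapv1.preallocDataFiles",
  "storage.quota.enforced": "storage.mmapv1.quota.enforced",
  "storage.quota.maxFilesPerDB": "storage.mmapv1.quota.maxFilesPerDB",
  "storage.smallFiles": "storage.mmapv1.smallFiles",
};

function migrateOptionName(name) {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(MMAPV1_OPTION_RENAMES, name)
    ? MMAPV1_OPTION_RENAMES[name]
    : name;
}

console.log(migrateOptionName("storage.smallFiles")); // storage.mmapv1.smallFiles
console.log(migrateOptionName("storage.dbPath"));     // storage.dbPath (unchanged)
```

An explicit lookup table is used instead of a blanket string rewrite so that options outside the table, such as storage.dbPath, are never renamed by accident.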
Data Files Must Correspond to Configured Storage Engine The files in the dbPath directory must correspond
to the configured storage engine (i.e. --storageEngine). mongod will not start if dbPath contains data files
created by a storage engine other than the one specified by --storageEngine.
See also:
Change Storage Engine to WiredTiger sections in Upgrade MongoDB to 3.0 (page 778)
WiredTiger and Driver Version Compatibility For MongoDB 3.0 deployments that use the WiredTiger storage
engine, the following operations return no output when issued in previous versions of the mongo shell or drivers:
db.getCollectionNames()
db.collection.getIndexes()
show collections
show tables
Use the 3.0 mongo shell or the 3.0 compatible version (page 776) of the official drivers when connecting to 3.0
mongod instances that use WiredTiger. The 2.6.8 mongo shell is also compatible with 3.0 mongod instances that
use WiredTiger.
db.fsyncLock() is not Compatible with WiredTiger With WiredTiger the db.fsyncLock() and
db.fsyncUnlock() operations cannot guarantee that the data files do not change. As a result, do not use these
methods to ensure consistency for the purposes of creating backups.
Support for touch Command If a storage engine does not support touch, the touch command will
return an error.
The MMAPv1 storage engine supports touch.
The WiredTiger storage engine does not support touch.
Dynamic Record Allocation MongoDB 3.0 no longer supports dynamic record allocation and deprecates paddingFactor.
MongoDB 3.0 deprecates the newCollectionsUsePowerOf2Sizes parameter such that you can no longer use
the parameter to disable the power of 2 sizes allocation for a collection. Instead, use the collMod command with the
noPadding flag or the db.createCollection() method with the noPadding option. Only set noPadding
for collections with workloads that consist only of inserts or in-place updates (such as incrementing counters).
Warning: Only set noPadding to true for collections whose workloads have no update operations that cause
documents to grow, such as for collections with workloads that are insert-only. For more information, see No
Padding Allocation Strategy (page 90).
For more information, see MMAPv1 Record Allocation Behavior Changes (page 769).
Replication Changes
Replica Set Oplog Format Change MongoDB 3.0 is not compatible with oplog entries generated by versions of
MongoDB before 2.2.1. If you upgrade from one of these versions, you must wait for new oplog entries to overwrite
all old oplog entries generated by one of these versions before upgrading to 3.0.0 or later.
Secondaries may abort if they replay a pre-2.6 oplog with an index build operation that would fail on a 2.6 or later
primary.
Replica Set Configuration Validation MongoDB 3.0 provides a stricter validation of replica set configuration
settings (page 632) and rejects invalid replica set configurations.
Stricter validations include:
Arbiters can only have 1 vote. Previously, arbiters could also have a value of 0 for votes. If an arbiter has any
value other than 1 for votes, you must fix the setting.
Non-arbiter members can only have a value of 0 or 1 for votes. If a non-arbiter member has any other value for
votes, you must fix the setting.
_id in the Replica Set Configuration (page 632) must specify the same name as that specified by --replSet
or replication.replSetName. Otherwise, you must fix the setting.
Disallows 0 for getLastErrorDefaults value. If getLastErrorDefaults value is 0, you must fix
the setting.
settings can only contain the recognized settings. Previously, MongoDB ignored unrecognized settings. If
settings contains unrecognized settings, you must remove the unrecognized settings.
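The checks listed above can be sketched as a pre-upgrade script. This Node.js function is a hypothetical illustration, not a MongoDB API; it works on a plain object shaped like the document that rs.conf() returns. The set of recognized settings keys is an assumption (chainingAllowed, getLastErrorDefaults, getLastErrorModes, heartbeatTimeoutSecs); confirm it against the Replica Set Configuration (page 632) reference for your version.

```javascript
// ASSUMPTION: recognized keys in `settings` for 3.0.
const RECOGNIZED_SETTINGS = new Set([
  "chainingAllowed",
  "getLastErrorDefaults",
  "getLastErrorModes",
  "heartbeatTimeoutSecs",
]);

// Hypothetical validator mirroring the stricter 3.0 checks; returns a list of problems.
function validateReplSetConfig(config, replSetName) {
  const problems = [];
  if (config._id !== replSetName) {
    problems.push(`_id "${config._id}" does not match --replSet "${replSetName}"`);
  }
  for (const m of config.members) {
    const votes = m.votes === undefined ? 1 : m.votes; // votes defaults to 1
    if (m.arbiterOnly && votes !== 1) {
      problems.push(`arbiter ${m.host} must have votes: 1`);
    } else if (!m.arbiterOnly && votes !== 0 && votes !== 1) {
      problems.push(`member ${m.host} must have votes: 0 or 1`);
    }
  }
  const settings = config.settings || {};
  if (settings.getLastErrorDefaults === 0) {
    problems.push("getLastErrorDefaults must not be 0");
  }
  for (const key of Object.keys(settings)) {
    if (!RECOGNIZED_SETTINGS.has(key)) {
      problems.push(`unrecognized setting "${key}"`);
    }
  }
  return problems;
}

// Hypothetical configuration with three problems.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017" },
    { _id: 1, host: "db1.example.net:27017", votes: 2 },
    { _id: 2, host: "arb.example.net:27017", arbiterOnly: true, votes: 0 },
  ],
  settings: { chainingAllowed: true, heartbeatTimeout: 10 },
};
console.log(validateReplSetConfig(config, "rs0"));
```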
To fix the settings before upgrading to MongoDB 3.0, connect to the primary and reconfigure your replica set to
valid configuration settings.
If you have already upgraded to MongoDB 3.0, you must downgrade to MongoDB 2.6 (page 788) first and then fix
the settings. Once you have reconfigured the replica set, you can re-upgrade to MongoDB 3.0.
Change of w: majority Semantics A write concern with a w: majority (page 129) value is satisfied when a
majority of the voting members replicates a write operation. In previous versions, majority referred to a majority of all
voting and non-voting members of the set.
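The effect of the new definition can be shown with a short calculation. This Node.js sketch is a hypothetical illustration: it counts members by their votes value only (votes defaults to 1) and does not model arbiters, which vote but hold no data.

```javascript
// Acknowledgements required for w: "majority" under the 3.0 definition:
// a majority of the voting members (votes > 0). Arbiters are not modeled.
function majorityAckCount(members) {
  const voting = members.filter((m) => (m.votes === undefined ? 1 : m.votes) > 0);
  return Math.floor(voting.length / 2) + 1;
}

// Hypothetical 7-member set in which 2 members are non-voting (votes: 0).
const members = [
  { host: "a" }, { host: "b" }, { host: "c" }, { host: "d" }, { host: "e" },
  { host: "f", votes: 0 }, { host: "g", votes: 0 },
];

console.log(majorityAckCount(members));          // 3 (majority of 5 voting members)
console.log(Math.floor(members.length / 2) + 1); // 4 (pre-3.0: majority of all 7)
```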
Remove local.slaves Collection MongoDB 3.0 removes the local.slaves collection that tracked the secondaries' replication progress. To track the replication progress, use the rs.status() method.
Replica Set State Change The FATAL replica set state does not exist as of 3.0.0.
HTTP Interface The HTTP Interface (i.e. net.http.enabled) no longer reports replication data.
MongoDB Tools Changes
Require a Running MongoDB Instance The 3.0 versions of MongoDB tools, mongodump, mongorestore,
mongoexport, mongoimport, mongofiles, and mongooplog, must connect to running MongoDB instances
and these tools cannot directly modify the data files with --dbpath as in previous versions. Ensure that you start
your mongod instance(s) before using these tools.
Removed Options
Removed --dbpath, --journal, and --filter options for mongodump, mongorestore,
mongoimport, mongoexport, and bsondump.
Removed --locks option for mongotop.
Removed --noobjcheck option for bsondump and mongorestore.
Removed --csv option for mongoexport. Use the new --type option to specify the export format type
(csv or json).
See also:
MongoDB Tools Enhancements (page 771)
Sharded Cluster Setting
Remove releaseConnectionsAfterResponse Parameter MongoDB now always releases connections after response. The releaseConnectionsAfterResponse parameter is no longer available.
Security Changes
MongoDB 2.4 User Model Removed After deprecating the 2.4 user model in 2.6, MongoDB 3.0 completely removes support for the 2.4 user model. MongoDB will exit with an error message if there is user data with the 2.4 schema.
If your deployment still uses the 2.4 user model, see Upgrade User Authorization Data to 2.6 Format (page 833) to
upgrade to the 2.6 user model before upgrading to 3.0.
After upgrading to 3.0 from 2.6, if you wish to make use of the new SCRAM-SHA-1 challenge-response mechanism,
you will need to upgrade the authentication schema a second time. The upgrade from the 2.4 to the 2.6 user model
does not encompass the necessary changes to use SCRAM-SHA-1 under 3.0. See MongoDB 3.0 and SCRAM-SHA-1
(page 785) for further details.
Localhost Exception Changed In 3.0, the localhost exception changed so that these connections only have access
to create the first user on the admin database. In previous versions, connections that gained access using the localhost
exception had unrestricted access to the MongoDB instance.
See Localhost Exception (page 311) for more information.
db.addUser() Removed 3.0 removes the legacy db.addUser() method. Use db.createUser() and
db.updateUser() instead.
SSL Configuration Option Changes MongoDB 3.0 introduced new net.ssl.allowConnectionsWithoutCertificates
configuration file setting and --sslAllowConnectionsWithoutCertificates command line option for
mongod and mongos. These options replace previous net.ssl.weakCertificateValidation and
--sslWeakCertificateValidation options, which became aliases. Update your configuration to ensure
future compatibility.
SSL Certificates Validation By default, MongoDB instances will only start if their certificate (i.e.
net.ssl.PEMKeyFile) is valid.
You can disable this behavior with the
net.ssl.allowInvalidCertificates setting or the --sslAllowInvalidCertificates command line option.
SSL Certificate Hostname Validation By default, MongoDB validates the hostnames of hosts attempting to
connect using certificates against the hostnames listed in those certificates. In certain deployment situations this
behavior may be undesirable. It is now possible to disable such hostname validation without disabling validation of the rest of the certificate information with the net.ssl.allowInvalidHostnames setting or the
--sslAllowInvalidHostnames command line option.
SSLv3 Ciphers Disabled In light of vulnerabilities in legacy SSL ciphers [75], these ciphers have been explicitly
disabled in MongoDB. No configuration changes are necessary.
mongo Shell Version Compatibility Versions of the mongo shell before 3.0 are not compatible with 3.0 deployments of MongoDB that enforce access control. If you have a 3.0 MongoDB deployment that requires access control,
you must use 3.0 versions of the mongo shell.
HTTP Status Interface and REST API Compatibility Neither the HTTP status interface nor the REST API supports the SCRAM-SHA-1 (page 309) challenge-response user authentication mechanism introduced in version 3.0.
Indexes
Remove dropDups Option The dropDups option is no longer available for ensureIndex() and
createIndexes.
Changes to Restart Behavior during Background Indexing For 3.0 mongod instances, if a background index
build is in progress when the mongod process terminates, the index build restarts as a foreground index build
when the instance restarts. If the index build encounters any errors, such as a duplicate key error, the mongod exits
with an error.
To start the mongod after a failed index build, set storage.indexBuildRetry to false or use
--noIndexBuildRetry to skip the index build on startup.
2d Indexes and Geospatial Near Queries For $near queries that use a 2d (page 483) index:
MongoDB no longer uses a default limit of 100 documents.
Specifying a batchSize() is no longer analogous to specifying a limit().
For $nearSphere queries that use a 2d (page 483) index, MongoDB no longer uses a default limit of 100 documents.
Driver Compatibility Changes Each officially supported driver has released a version that includes support for all
new features introduced in MongoDB 3.0. Upgrading to one of these versions is strongly recommended as part of the
upgrade process.
A driver upgrade is necessary in certain scenarios due to changes in functionality:
Use of the SCRAM-SHA-1 authentication method
Use of functionality that calls listIndexes or listCollections
75 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3566
Deprecate Access to system.indexes and system.namespaces MongoDB 3.0 deprecates direct access
to the system.indexes and system.namespaces collections. Use the createIndexes and listIndexes
commands instead. See also WiredTiger and Driver Version Compatibility (page 773).
Collection Name Validation MongoDB 3.0 more consistently enforces the collection naming
restrictions. Ensure your application does not create or depend on invalid collection names.
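As a rough pre-upgrade audit, you might scan the collection names your application creates. The sketch below checks only a few well-known restrictions (empty names, the $ character, the null character, and the reserved system. prefix). It is not the complete set of rules MongoDB 3.0 enforces, and the function names are hypothetical.

```javascript
// Hypothetical pre-upgrade audit. Checks a SUBSET of MongoDB's collection
// naming restrictions; it is not an exhaustive implementation of the rules
// MongoDB 3.0 enforces (for example, namespace length is not checked).
function collectionNameProblems(name) {
  const problems = [];
  if (typeof name !== "string" || name.length === 0) {
    problems.push("name must be a non-empty string");
    return problems;
  }
  if (name.includes("$")) problems.push("contains '$'");
  if (name.includes("\0")) problems.push("contains the null character");
  if (name.startsWith("system.")) problems.push("uses the reserved 'system.' prefix");
  return problems;
}

// Returns only the names that fail at least one check, with their problems.
function auditCollectionNames(names) {
  return names
    .map((name) => ({ name, problems: collectionNameProblems(name) }))
    .filter((entry) => entry.problems.length > 0);
}
```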
Platform Support Commercial support for MongoDB on the Linux32 and Win32 platforms is no longer provided; however, MongoDB distributions will continue to be built for these platforms.
Removed/Deprecated Commands The following commands are no longer available in MongoDB 3.0:
closeAllDatabases
getoptime
text
The following commands and methods are deprecated in MongoDB 3.0:
diagLogging
eval, db.eval()
db.collection.copyTo()
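One way to act on these lists is to grep your code for the affected names and classify the hits. The sketch below encodes the lists from this page, plus the removed db.addUser() method described above; the function name auditCommands is hypothetical, and how you collect the names from your code base is up to you.

```javascript
// Hypothetical audit helper built from the lists on this page. It flags
// command or method names that MongoDB 3.0 removed or deprecated.
const REMOVED_IN_30 = new Set([
  "closeAllDatabases",
  "getoptime",
  "text",
  "db.addUser()", // removed shell method (see "db.addUser() Removed")
]);
const DEPRECATED_IN_30 = new Set([
  "diagLogging",
  "eval",
  "db.eval()",
  "db.collection.copyTo()",
]);

// Classifies each distinct name; names on neither list are omitted.
function auditCommands(names) {
  const removed = [];
  const deprecated = [];
  for (const name of new Set(names)) { // de-duplicates, keeps first-seen order
    if (REMOVED_IN_30.has(name)) removed.push(name);
    else if (DEPRECATED_IN_30.has(name)) deprecated.push(name);
  }
  return { removed, deprecated };
}
```

Removed names must be fixed before upgrading; deprecated ones still work in 3.0 but are worth scheduling for replacement.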
Date and Timestamp Comparison Order MongoDB 3.0 no longer treats the Timestamp (page 177) and the Date
(page 178) data types as equivalent for comparison purposes. Instead, the Timestamp (page 177) data type has a higher
comparison/sort order (i.e. is greater) than the Date (page 178) data type. If your application relies on the equivalent
comparison/sort order of Date and Timestamp objects, modify your application accordingly before upgrading.
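The change matters because MongoDB compares values of different BSON types by a fixed type order before looking at the values themselves. The sketch below models that idea with a simplified, partial type-rank table in which Timestamp ranks above Date, as described for 3.0. The table excerpt and the { type, value } tagging are assumptions for illustration, not MongoDB's internal implementation.

```javascript
// Simplified model of cross-type comparison in MongoDB 3.0: values of
// different types are ordered by type rank first. The rank table is a partial
// excerpt of the documented order; the {type, value} tagging is illustrative.
const TYPE_RANK = {
  null: 1,
  number: 2,
  string: 3,
  object: 4,
  array: 5,
  binData: 6,
  objectId: 7,
  boolean: 8,
  date: 9,
  timestamp: 10, // 3.0: Timestamp sorts after (is greater than) Date
  regex: 11,
};

// Returns -1, 0 or 1. Within one type, this sketch only compares a plain
// numeric `value` (for example, milliseconds for a Date).
function compareBson(a, b) {
  const byType = TYPE_RANK[a.type] - TYPE_RANK[b.type];
  if (byType !== 0) return Math.sign(byType);
  return Math.sign(a.value - b.value);
}

// Example: a Timestamp with a tiny value still sorts after a Date.
const sample = [
  { type: "timestamp", value: 1 },
  { type: "date", value: Date.UTC(2015, 0, 1) },
];
sample.sort(compareBson); // sample[0] is now the Date entry
```

An application that stored a mix of Date and Timestamp values in one field and relied on them interleaving by time will see all Dates sort ahead of all Timestamps after the upgrade.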
Some changes in 3.0 can affect compatibility (page 772) and may require user actions. For a detailed list of compatibility changes, see Compatibility Changes in MongoDB 3.0 (page 772).
Upgrade Process
Upgrade MongoDB to 3.0 In the general case, the upgrade from MongoDB 2.6 to 3.0 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 3.0. However, before
you attempt any upgrade, familiarize yourself with the content of this document, particularly the procedure for
upgrading sharded clusters (page 781).
Upgrade Recommendations and Checklists When upgrading, consider the following:
Upgrade Requirements To upgrade an existing MongoDB deployment to 3.0, you must be running 2.6. If you're
running a version of MongoDB before 2.6, you must upgrade to 2.6 before upgrading to 3.0. See Upgrade MongoDB to
2.6 (page 829) for the procedure to upgrade from 2.4 to 2.6. Once upgraded to MongoDB 2.6, you cannot downgrade
to any version earlier than MongoDB 2.4.
Preparedness Always test your application with the new MongoDB version in a staging environment before deploying
the upgrade to your production environment.
Some changes in MongoDB 3.0 require manual checks and intervention. Before beginning your upgrade, see the
Compatibility Changes in MongoDB 3.0 (page 772) document to ensure that your applications and deployments are
compatible with MongoDB 3.0. Resolve the incompatibilities in your deployment before starting the upgrade.
Downgrade Limitations Once upgraded to MongoDB 3.0, you cannot downgrade to a version lower than 2.6.5.
If you upgrade to 3.0 and have run authSchemaUpgrade, you cannot downgrade to 2.6 without disabling --auth
or restoring a pre-upgrade backup, as authSchemaUpgrade discards the MONGODB-CR credentials used in 2.6.
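The version rules above can be expressed as a small pre-flight check. This sketch assumes plain major.minor.patch version strings and encodes only the constraints stated on this page: upgrading directly to 3.0 requires 2.6, and once on 3.0 you cannot go below 2.6.5. The function names are hypothetical, and the check does not account for the authSchemaUpgrade restriction.

```javascript
// Hypothetical helpers encoding the upgrade/downgrade constraints stated on
// this page. Assumes plain "major.minor.patch" version strings.
function parseVersion(v) {
  const parts = v.split(".").map(Number);
  while (parts.length < 3) parts.push(0); // "3.0" -> [3, 0, 0]
  if (parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version string: ${v}`);
  }
  return parts;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Upgrading to 3.0 requires a 2.6-series deployment; older deployments must
// upgrade to 2.6 first.
function canUpgradeDirectlyTo30(current) {
  const [major, minor] = parseVersion(current);
  return major === 2 && minor === 6;
}

// After upgrading to 3.0, the lowest downgrade target is 2.6.5.
// Does NOT consider whether authSchemaUpgrade has been run.
function canDowngradeFrom30To(target) {
  return compareVersions(target, "2.6.5") >= 0 && compareVersions(target, "3.0.0") < 0;
}
```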
Upgrade MongoDB Processes
Upgrade Standalone mongod Instance to MongoDB 3.0 The following steps outline the procedure to upgrade a
standalone mongod from version 2.6 to 3.0. To upgrade from version 2.4 to 3.0, upgrade to version 2.6 (page 829)
first, and then use the following procedure to upgrade from 2.6 to 3.0.
Upgrade Binaries If you installed MongoDB from the MongoDB apt, yum, or zypper repositories, you should
upgrade to 3.0 using your package manager. Follow the appropriate installation instructions (page 6) for your Linux
system. This will involve adding a repository for the new release, then performing the actual upgrade.
Otherwise, you can manually upgrade MongoDB:
Step 1: Download 3.0 binaries. Download binaries of the latest release in the 3.0 series from the MongoDB Download Page98. See Install MongoDB (page 5) for more information.
Step 2: Replace 2.6 binaries. Shut down your mongod instance. Replace the existing binary with the 3.0 mongod
binary and restart mongod.
Change Storage Engine to WiredTiger To change the storage engine to WiredTiger, you must manually export
the data with mongodump and then import it with mongorestore.
Step 1: Start 3.0 mongod. Ensure that the 3.0 mongod is running with the default MMAPv1 engine.
Step 2: Export the data using mongodump.
mongodump --out <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongodump for available options.
Step 3: Create data directory for WiredTiger. Create a new data directory for WiredTiger. Ensure that the user
account running mongod has read and write permissions for the new directory.
mongod with WiredTiger will not start with data files created with a different storage engine.
Step 4: Restart the mongod with WiredTiger. Restart the 3.0 mongod, specifying wiredTiger as the
--storageEngine and the newly created data directory for WiredTiger as the --dbpath. Specify additional
options as appropriate.
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath>
You can also specify the options in a configuration file. To specify the storage engine, use the new
storage.engine setting.
Step 5: Upload the exported data using mongorestore.
mongorestore <exportDataDestination>
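The order of these steps matters: dump while the 3.0 mongod still runs on MMAPv1, restart on an empty WiredTiger data directory, then restore. The sketch below only assembles those command lines in order, using the flags shown above; the function and parameter names are hypothetical, it runs nothing, and it adds no authentication or other options you may need.

```javascript
// Hypothetical planner: returns the command lines for the standalone
// MMAPv1 -> WiredTiger migration in the order given above. It only builds
// strings (paths are not shell-quoted); it does not run anything.
function planWiredTigerMigration({ exportDir, newDbPath, extraMongodArgs = [] }) {
  if (!exportDir || !newDbPath) {
    throw new Error("exportDir and newDbPath are required");
  }
  return [
    // Step 2: export while the 3.0 mongod is still running with MMAPv1.
    `mongodump --out ${exportDir}`,
    // Step 4: after creating the new directory in Step 3, restart with WiredTiger.
    ["mongod", "--storageEngine", "wiredTiger", "--dbpath", newDbPath, ...extraMongodArgs].join(" "),
    // Step 5: load the exported data into the WiredTiger instance.
    `mongorestore ${exportDir}`,
  ];
}
```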
Prerequisites
If the oplog contains entries generated by versions of MongoDB that precede version 2.2.1, you must wait for
the entries to be overwritten by later versions before you can upgrade to MongoDB 3.0. For more information,
see Replica Set Oplog Format Change (page 774).
The stricter validation of replica set configurations in MongoDB 3.0 (page 774) may invalidate previously valid
configurations and prevent replica sets from starting in MongoDB 3.0. For more information, see
Replica Set Configuration Validation (page 774).
To upgrade a replica set from MongoDB 2.4 to 3.0, upgrade all members of the replica set to version 2.6
(page 829) first, and then follow the procedure to upgrade from MongoDB 2.6 to 3.0.
Upgrade Binaries You can upgrade from MongoDB 2.6 to 3.0 using a rolling upgrade to minimize downtime by
upgrading the members individually while the other members are available:
Step 1: Upgrade secondary members of the replica set. Upgrade the secondary members of the set one at a time
by shutting down the mongod and replacing the 2.6 binary with the 3.0 binary. After upgrading a mongod instance,
wait for the member to recover to SECONDARY state before upgrading the next instance. To check the member's state,
issue rs.status() in the mongo shell.
Step 2: Step down the replica set primary. Use rs.stepDown() in the mongo shell to step down the primary
and force the set to failover (page 560). rs.stepDown() expedites the failover procedure and is preferable to
shutting down the primary directly.
Step 3: Upgrade the primary. When rs.status() shows that the primary has stepped down and another member has assumed PRIMARY state, shut down the previous primary and replace the mongod binary with the 3.0 binary
and start the new instance.
Replica set failover is not instant and renders the set unavailable to accept writes until the failover process completes. This may take 30 seconds or more, so perform the upgrade during a scheduled maintenance window.
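The rolling order can be derived from the output of rs.status(). The sketch below assumes a plain array shaped like the members array that rs.status() returns (each entry with name and stateStr fields) and produces the sequence described above: secondaries one at a time, then a step-down, then the former primary. The function name and action strings are hypothetical; members in any other state (for example, arbiters) are only listed for manual handling.

```javascript
// Hypothetical planner for the rolling upgrade described above. Input is
// assumed to look like rs.status().members: [{ name, stateStr }, ...].
function planRollingUpgrade(members) {
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  if (primaries.length !== 1) {
    throw new Error(`expected exactly one PRIMARY, found ${primaries.length}`);
  }
  const primary = primaries[0];
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const other = members.filter(
    (m) => m.stateStr !== "PRIMARY" && m.stateStr !== "SECONDARY"
  );

  const steps = [];
  // Step 1: upgrade secondaries one at a time, waiting for each to return to
  // SECONDARY (check with rs.status()) before moving on.
  for (const s of secondaries) {
    steps.push(`upgrade ${s.name}`);
    steps.push(`wait until ${s.name} is SECONDARY`);
  }
  // Step 2: step down the primary with rs.stepDown().
  steps.push(`rs.stepDown() on ${primary.name}`);
  // Step 3: once another member is PRIMARY, upgrade the former primary.
  steps.push("wait for a new PRIMARY");
  steps.push(`upgrade ${primary.name}`);
  return { steps, needsManualReview: other.map((m) => m.name) };
}
```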
Change Replica Set Storage Engine to WiredTiger In MongoDB 3.0, replica sets can have members with different
storage engines. As such, you can update members to use the WiredTiger storage engine in a rolling manner. Before
changing all the members to use WiredTiger, you may prefer to run with mixed storage engines for some period.
However, performance can vary according to workload.
Note: Before enabling the new WiredTiger storage engine, ensure that all replica set/sharded cluster members are
running at least MongoDB version 2.6.8, and preferably version 3.0.0 or newer.
To change the storage engine to WiredTiger for an existing secondary replica set member, remove the member's data
and perform an initial sync (page 613):
Step 1: Shutdown the secondary member. Stop the mongod instance for the secondary member.
Step 2: Prepare data directory for WiredTiger. mongod with WiredTiger will not start if the --dbpath directory contains data files created with a different storage engine.
For the stopped secondary member, either delete the content of the data directory or create a new data directory. If
creating a new directory, ensure that the user account running mongod has read and write permissions for the new
directory.
Step 3: Restart the secondary member with WiredTiger. Restart the 3.0 mongod, specifying wiredTiger as
the --storageEngine and the data directory for WiredTiger as the --dbpath. Specify additional options as
appropriate for the member.
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath>
Since no data exists in the --dbpath, the mongod will perform an initial sync. The length of the initial sync process
depends on the size of the database and network connection between members of the replica set.
You can also specify the options in a configuration file. To specify the storage engine, use the new
storage.engine setting.
To update all members of the replica set to use WiredTiger, update the secondary members first. Then step down the
primary, and update the stepped-down member.
Upgrade a Sharded Cluster to 3.0 Only upgrade sharded clusters to 3.0 if all members of the cluster are currently
running instances of 2.6. The only supported upgrade path for sharded clusters running 2.4 is via 2.6. The upgrade
process checks all components of the cluster and will produce warnings if any component is running version 2.4.
Considerations The upgrade process does not require any downtime. However, while you upgrade the sharded
cluster, ensure that clients do not make changes to the collection meta-data. For example, during the upgrade, do not
do any of the following:
sh.enableSharding()
sh.shardCollection()
sh.addShard()
db.createCollection()
db.collection.drop()
db.dropDatabase()
any operation that creates a database
any other operation that modifies the cluster metadata in any way. See Sharding Reference (page 715) for a complete list of sharding commands. Note, however, that not all commands on the Sharding Reference (page 715)
page modify the cluster metadata.
Upgrade Sharded Clusters Optional but Recommended. As a precaution, take a backup of the config database
before upgrading the sharded cluster.
Step 1: Disable the Balancer. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the
Balancer (page 695).
Step 2: Upgrade the cluster's metadata. Start a single 3.0 mongos instance with the configDB pointing to the
cluster's config servers and with the --upgrade option.
To run a mongos with the --upgrade option, you can upgrade an existing mongos instance to 3.0, or if you need
to avoid reconfiguring a production mongos instance, you can use a new 3.0 mongos that can reach all the config
servers.
To upgrade the metadata, run:
mongos --configdb <configDB string> --upgrade
You can include the --logpath option to output the log messages to a file instead of the standard output. Also
include any other options required to start mongos instances in your cluster, such as --sslOnNormalPorts or
--sslPEMKeyFile.
The 3.0 mongos will output informational log messages.
<timestamp> ...
<timestamp> SHARDING [mongosMain] starting ...
<timestamp> SHARDING [mongosMain] starting ...
<timestamp> SHARDING [mongosMain] about to ...
<timestamp> SHARDING [mongosMain] checking ...
...
After a successful upgrade, restart the mongos instance. If mongos fails to start, check the log for more information.
If the mongos instance loses its connection to the config servers during the upgrade or if the upgrade is otherwise
unsuccessful, you may always safely retry the upgrade.
Step 4: Upgrade the remaining mongos instances to 3.0. Upgrade and restart the other mongos instances in the
sharded cluster without the --upgrade option.
After you have successfully upgraded all mongos instances, you can proceed to upgrade the other components in
your sharded cluster.
Warning: Do not upgrade the mongod instances until after you have upgraded all the mongos instances.
Step 5: Upgrade the config servers. After you have successfully upgraded all mongos instances, upgrade all 3
mongod config server instances, leaving the first config server listed in the mongos --configdb argument to
upgrade last.
Step 6: Upgrade the shards. Upgrade each shard, one at a time, upgrading the mongod secondaries before running
replSetStepDown and upgrading the primary of each shard.
Step 7: Re-enable the balancer. Once the upgrade of sharded cluster components is complete, re-enable the balancer (page 696).
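The component order for a sharded cluster upgrade is easy to get wrong, so it can help to generate it. This sketch assumes you have the list of mongos hosts, the config servers in the order they appear in the mongos --configdb argument, and the list of shards. It returns the step sequence from this section, with the first listed config server upgraded last. All names are hypothetical, and each shard step still expands into the replica set procedure above.

```javascript
// Hypothetical planner for the sharded cluster upgrade order described above.
// configServers must be in the same order as the mongos --configdb argument.
function planShardedUpgrade({ mongosHosts, configServers, shards }) {
  if (mongosHosts.length === 0) throw new Error("need at least one mongos");
  if (configServers.length === 0) throw new Error("need the config server list");

  const [upgrader, ...otherMongos] = mongosHosts;
  const [firstConfig, ...restConfig] = configServers;

  return [
    "disable the balancer",
    // Steps 2-3: a single 3.0 mongos with --upgrade updates the metadata.
    `run 3.0 mongos --upgrade on ${upgrader}, then restart it without --upgrade`,
    // Step 4: remaining mongos instances, all before any mongod.
    ...otherMongos.map((h) => `upgrade mongos ${h}`),
    // Step 5: config servers, leaving the first one listed in --configdb for last.
    ...restConfig.map((h) => `upgrade config server ${h}`),
    `upgrade config server ${firstConfig}`,
    // Step 6: shards one at a time (secondaries, replSetStepDown, then primary).
    ...shards.map((s) => `upgrade shard ${s}`),
    "re-enable the balancer",
  ];
}
```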
Change Sharded Cluster Storage Engine to WiredTiger For a sharded cluster in MongoDB 3.0, you can choose
to update the shards to use the WiredTiger storage engine and have the config servers use MMAPv1. If you update the
config servers to use WiredTiger, you must update all three config servers to use WiredTiger.
Note: Before enabling the new WiredTiger storage engine, ensure that all replica set/sharded cluster members are
running at least MongoDB version 2.6.8, and preferably version 3.0.0 or newer.
Change Shards to Use WiredTiger To change the storage engine for the shards to WiredTiger, refer to the procedures in Change Storage Engine to WiredTiger for replica sets (page 780) and Change Storage Engine to WiredTiger
for standalone mongod (page 779) as appropriate for your shards.
Change Config Server to Use WiredTiger For a sharded cluster in MongoDB 3.0, you may continue to use
MMAPv1 for the config servers even if the shards use WiredTiger. If, however, you decide to change the config
servers to use the WiredTiger storage engine, you must change all three config servers to use WiredTiger.
Note: During this process, only two config servers will be running at any given time to ensure that the sharded
cluster's metadata is read only.
Step 1: Disable the Balancer. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the
Balancer (page 695).
Step 2: Stop the last config server listed in the mongos configDB setting.
Step 3: Export data of the second config server listed in the mongos configDB setting.
mongodump --out <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongodump for available options.
Step 4: For the second config server, create a new data directory for WiredTiger. Ensure that the user account
running mongod has read and write permissions for the new directory.
mongod with WiredTiger will not start if the --dbpath directory contains data files created with a different storage
engine.
Step 5: Restart the second config server with WiredTiger. Specify wiredTiger as the --storageEngine
and the newly created WiredTiger data directory as the --dbpath as well as any additional options as appropriate.
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath> --configsvr
You can also specify the options in a configuration file. To specify the storage engine, use the new
storage.engine setting.
Step 6: Upload the exported data using mongorestore to the second config server.
mongorestore <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongorestore for available options.
Step 10: For the third config server, create a new data directory for WiredTiger. Ensure that the user account
running mongod has read and write permissions for the new directory.
mongod with WiredTiger will not start if the --dbpath directory contains data files created with a different storage
engine.
Step 11: Restart the third config server with WiredTiger. Specify wiredTiger as the --storageEngine
and the newly created WiredTiger data directory as the --dbpath as well as any additional options as appropriate.
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath> --configsvr
You can also specify the options in a configuration file. To specify the storage engine, use the new
storage.engine setting.
Step 12: Upload the exported data using mongorestore to the third config server.
mongorestore <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongorestore for available options.
Step 14: For the first config server, create data directory for WiredTiger. Ensure that the user account running
mongod has read and write permissions for the new directory.
mongod with WiredTiger will not start if the --dbpath directory contains data files created with a different storage
engine.
Step 15: Restart the first config server with WiredTiger. Specify wiredTiger as the --storageEngine
and the newly created WiredTiger data directory as the --dbpath as well as any additional options as appropriate.
mongod --storageEngine wiredTiger --dbpath <newWiredTigerDBPath> --configsvr
You can also specify the options in a configuration file. To specify the storage engine, use the new
storage.engine setting.
Step 16: Upload the exported data using mongorestore to the first config server.
mongorestore <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongorestore for available options.
Once all three config servers are up, the sharded cluster's metadata is available for writes.
Step 18: Re-enable the balancer. Once all three config servers are up and running with WiredTiger, re-enable the
balancer (page 696).
Upgrade Authentication Schema to Enable SCRAM-SHA-1 See MongoDB 3.0 and SCRAM-SHA-1 (page 785)
for details on SCRAM-SHA-1 upgrade scenarios.
General Upgrade Procedure Except as described on this page, moving between 2.6 and 3.0 is a drop-in replacement:
Step 1: Stop the existing mongod instance. For example, on Linux, run 2.6 mongod with the --shutdown
option as follows:
mongod --dbpath /var/mongod/data --shutdown
Replace /var/mongod/data with your MongoDB dbPath. See also the Stop mongod Processes (page 223) for
alternate methods of stopping a mongod instance.
Step 2: Start the new mongod instance. Ensure you start the 3.0 mongod with the same dbPath:
mongod --dbpath /var/mongod/data
Overview MongoDB 3.0 includes support for the SCRAM-SHA-1 (page 309) challenge-response user authentication
mechanism. This changes how MongoDB uses and stores user credentials. If your deployment uses authentication
and authorization, you must upgrade the authentication schema in addition to upgrading MongoDB processes if you
wish to make use of SCRAM-SHA-1.
You may, alternatively, opt to continue to use the MONGODB-CR challenge-response mechanism and skip this upgrade.
See the SCRAM-SHA-1 (page 309) documentation for further information on its advantages.
Recommendation SCRAM-SHA-1 represents a significant improvement in security over MONGODB-CR, the previous default authentication mechanism: you are strongly urged to upgrade. The next major version of MongoDB is
likely to remove all support for MONGODB-CR.
Upgrade Scenarios The following scenarios are possible when upgrading from 2.6 to 3.0:
If you are starting with a new 3.0 installation without any users or upgrading from a 2.6 database that has no
users and wish to use SCRAM-SHA-1, no action is required. All new users created will have the correct format
for SCRAM-SHA-1.
If you are upgrading from a 2.6 database with existing data, including users, and wish to continue to use
MONGODB-CR, no action is required. All new users created under 3.0 will continue to use the same authentication model as users already in the database. You can execute the upgrade to SCRAM-SHA-1 at any
point in the future.
If you are upgrading from a 2.6 database with existing data, including users, and wish to upgrade to
SCRAM-SHA-1, you may follow the steps under the heading Upgrade a 2.6 Database to Use SCRAM-SHA-1
on 3.0 (page 787).
Considerations Before upgrading the authentication model, you should first upgrade MongoDB binaries to 3.0. For
sharded clusters, ensure that all cluster components are 3.0.
You should also upgrade all drivers used by applications that will connect to upgraded database instances. The minimum driver versions that support SCRAM-SHA-1 are:
C - 1.1.0
C++ - 1.0.0
C# - 1.1
Java - 2.13
Node.js - 2.0
Perl - 0.706.0.0
PHP - 1.6.0
Python - 2.8
Motor - 0.4
Ruby - 1.12
Scala - 2.8.0
See the MongoDB Drivers Page99 for links to download upgraded drivers.
99 http://docs.mongodb.org/ecosystem/drivers
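Before switching to SCRAM-SHA-1, you can check the driver versions your applications use against the list above. The table below copies the minimum versions from this page; the checker, its name, and its version parsing (dot-separated non-negative integers only, with missing parts treated as zero) are a hypothetical sketch.

```javascript
// Minimum driver versions for SCRAM-SHA-1, as listed above. The checker and
// its version parsing (dot-separated integers only) are a hypothetical sketch.
const MIN_SCRAM_DRIVER = {
  "C": "1.1.0",
  "C++": "1.0.0",
  "C#": "1.1",
  "Java": "2.13",
  "Node.js": "2.0",
  "Perl": "0.706.0.0",
  "PHP": "1.6.0",
  "Python": "2.8",
  "Motor": "0.4",
  "Ruby": "1.12",
  "Scala": "2.8.0",
};

function parseDotted(v) {
  const parts = v.split(".").map(Number);
  if (parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`non-numeric version: ${v}`);
  }
  return parts;
}

// Compares versions of any length; missing parts count as 0 ("1.1" == "1.1.0").
function compareDotted(a, b) {
  const pa = parseDotted(a);
  const pb = parseDotted(b);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = i < pa.length ? pa[i] : 0;
    const y = i < pb.length ? pb[i] : 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// True if the given driver version is at or above the listed minimum.
function supportsScramSha1(driver, version) {
  const min = MIN_SCRAM_DRIVER[driver];
  if (min === undefined) throw new Error(`no minimum listed for driver: ${driver}`);
  return compareDotted(version, min) >= 0;
}
```

Comparing numerically, part by part, matters here: a plain string comparison would wrongly rank Python 2.10 below 2.8.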
Requirements To upgrade the authentication model, you must have a user in the admin database with the role
userAdminAnyDatabase (page 396).
Timing Because downgrades are more difficult after you upgrade the user authentication model, once you upgrade
the MongoDB binaries to version 3.0, allow your MongoDB deployment to run for a day or two before following this
procedure.
This allows 3.0 some time to burn in and decreases the likelihood of downgrades occurring after the user privilege
model upgrade. The user authentication and access control will continue to work as it did in 2.6.
If you decide to upgrade the user authentication model immediately instead of waiting the recommended burn-in
period, then for sharded clusters you must wait at least 10 seconds after upgrading the sharded cluster before running the
authentication upgrade command.
Replica Sets For a replica set, it is only necessary to run the upgrade process on the primary as the changes will
automatically replicate to the secondaries.
Sharded Clusters For a sharded cluster, connect to one mongos instance and run the upgrade procedure to upgrade
the cluster's authentication data. By default, the procedure will upgrade the authentication data of the shards as well.
To override this behavior, run authSchemaUpgrade with the upgradeShards: false option. If you choose
to override, you must run the upgrade procedure on the mongos first, and then run the procedure on the primary
members of each shard.
For a sharded cluster, do not run the upgrade process directly against the config servers (page 650). Instead, perform
the upgrade process using one mongos instance to interact with the config database.
Procedure: Upgrade a 2.6 Database to Use SCRAM-SHA-1
Important: This procedure discards the MONGODB-CR credentials used by 2.6, and therefore is irreversible short of
restoring from backups.
This procedure disables MONGODB-CR as an authentication mechanism.
Step 1: Connect to the MongoDB instance. Connect and authenticate to the mongod instance for a single deployment, the primary mongod for a replica set, or a mongos for a sharded cluster as an admin database user with the
role userAdminAnyDatabase (page 396).
Step 2: Upgrade authentication schema. Use the authSchemaUpgrade command in the admin database to
update the user data using the mongo shell.
Run authSchemaUpgrade command.
db.adminCommand({authSchemaUpgrade: 1});
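For a sharded cluster, to override the default upgrade of the shards' authentication data, include upgradeShards: false in the command: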
db.adminCommand(
{authSchemaUpgrade: 1, upgradeShards: false }
);
If you override the default behavior or your cluster has shard local users, after running authSchemaUpgrade on
a mongos instance, you will need to connect to the primary for each shard and repeat the upgrade process after
upgrading on the mongos.
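Which instances need the command depends on the deployment type and on whether you override the shard upgrade. The sketch below returns the command documents from this section together with the targets on which to run them; the function, its input shape, and the target labels are hypothetical, and you would still run each command yourself, for example with db.adminCommand() in the mongo shell.

```javascript
// Hypothetical helper: decides where to run authSchemaUpgrade and with which
// command document, following the rules in this section.
//   kind: "standalone" | "replicaSet" | "sharded"
//   upgradeShards: sharded clusters only; false overrides the default
//   hasShardLocalUsers: true if the shards also have shard-local users
function planAuthSchemaUpgrade({ kind, upgradeShards = true, hasShardLocalUsers = false, shardPrimaries = [] }) {
  switch (kind) {
    case "standalone":
      return [{ target: "the mongod", command: { authSchemaUpgrade: 1 } }];
    case "replicaSet":
      // Changes replicate to the secondaries, so only the primary is needed.
      return [{ target: "the primary", command: { authSchemaUpgrade: 1 } }];
    case "sharded": {
      // Never run against the config servers directly; use one mongos.
      const command = upgradeShards
        ? { authSchemaUpgrade: 1 }
        : { authSchemaUpgrade: 1, upgradeShards: false };
      const steps = [{ target: "one mongos", command }];
      // After the mongos, repeat on each shard primary if the default was
      // overridden or the shards have shard-local users.
      if (!upgradeShards || hasShardLocalUsers) {
        for (const p of shardPrimaries) {
          steps.push({ target: p, command: { authSchemaUpgrade: 1 } });
        }
      }
      return steps;
    }
    default:
      throw new Error(`unknown deployment kind: ${kind}`);
  }
}
```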
Result After this procedure is complete, all users in the database will have SCRAM-SHA-1-style credentials, and
any subsequently-created users will also have this type of credentials.
Additional Resources
Blog Post: Improved Password-Based Authentication in MongoDB 3.0: SCRAM Explained (Part 1)100
Blog Post: Improved Password-Based Authentication in MongoDB 3.0: SCRAM Explained (Part 2)101
Downgrade MongoDB from 3.0 Before you attempt any downgrade, familiarize yourself with the content of this
document, particularly the Downgrade Recommendations and Checklist (page 788) and the procedure for downgrading
sharded clusters (page 790).
Downgrade Recommendations and Checklist When downgrading, consider the following:
Downgrade Path Once upgraded to MongoDB 3.0, you cannot downgrade to a version lower than 2.6.5.
Important: If you upgrade to MongoDB 3.0 and have run authSchemaUpgrade, you cannot downgrade to the
2.6 series without disabling --auth.
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongodump for available options.
Step 3: Create data directory for MMAPv1. Create a new data directory for MMAPv1. Ensure that the user
account running mongod has read and write permissions for the new directory.
Step 4: Restart the mongod with MMAPv1. Restart the 3.0 mongod, specifying the newly created data directory
for MMAPv1 as the --dbpath. You do not have to specify --storageEngine as MMAPv1 is the default.
mongod --dbpath <newMMAPv1DBPath>
You can update members to use the MMAPv1 storage engine in a rolling manner.
Note: When running a replica set with mixed storage engines, performance can vary according to workload.
To change the storage engine to MMAPv1 for an existing secondary replica set member, remove the member's data
and perform an initial sync (page 613):
Step 1: Shutdown the secondary member. Stop the mongod instance for the secondary member.
Step 2: Prepare data directory for MMAPv1. Prepare --dbpath directory for initial sync.
For the stopped secondary member, either delete the content of the data directory or create a new data directory. If
creating a new directory, ensure that the user account running mongod has read and write permissions for the new
directory.
Step 3: Restart the secondary member with MMAPv1. Restart the 3.0 mongod, specifying the MMAPv1 data
directory as the --dbpath. Specify additional options as appropriate for the member. You do not have to specify
--storageEngine since MMAPv1 is the default.
mongod --dbpath <preparedMMAPv1DBPath>
Since no data exists in the --dbpath, the mongod will perform an initial sync. The length of the initial sync process
depends on the size of the database and network connection between members of the replica set.
Repeat for the remaining secondary members. Once all the secondary members have switched to MMAPv1, step
down the primary, and update the stepped-down member.
Downgrade Binaries Once upgraded to MongoDB 3.0, you cannot downgrade to a version lower than 2.6.5.
The following steps outline a rolling downgrade process for the replica set. The rolling downgrade process
minimizes downtime by downgrading the members individually while the other members are available:
Step 1: Downgrade secondary members of the replica set. Downgrade each secondary member of the replica set,
one at a time:
1. Shut down the mongod. See Stop mongod Processes (page 223) for instructions on safely terminating mongod
processes.
2. Replace the 3.0 binary with the 2.6 binary and restart.
3. Wait for the member to recover to SECONDARY state before downgrading the next secondary. To check the
member's state, use the rs.status() method in the mongo shell.
Step 2: Step down the primary. Use rs.stepDown() in the mongo shell to step down the primary and force
the normal failover (page 560) procedure.
rs.stepDown()
rs.stepDown() expedites the failover procedure and is preferable to shutting down the primary directly.
Step 3: Replace and restart former primary mongod. When rs.status() shows that the primary has stepped
down and another member has assumed PRIMARY state, shut down the previous primary and replace the mongod
binary with the 2.6 binary and start the new instance.
Replica set failover is not instant but will render the set unavailable to writes and interrupt reads until the failover process completes. Typically this takes 10 seconds or more. You may wish to plan the downgrade during a predetermined
maintenance window.
Downgrade a 3.0 Sharded Cluster
Requirements While the downgrade is in progress, you cannot make changes to the collection meta-data. For
example, during the downgrade, do not do any of the following:
sh.enableSharding()
sh.shardCollection()
sh.addShard()
db.createCollection()
db.collection.drop()
db.dropDatabase()
any operation that creates a database
any other operation that modifies the cluster meta-data in any way. See Sharding Reference (page 715) for a complete list of sharding commands. Note, however, that not all commands on the Sharding Reference (page 715)
page modify the cluster meta-data.
Change Storage Engine to MMAPv1 If you have changed the storage engine to WiredTiger, change the storage
engine to MMAPv1 before downgrading to 2.6.
Change Shards to Use MMAPv1 To change the storage engine to MMAPv1, refer to the procedure in Change Storage Engine to MMAPv1 for replica set members (page 789) and Change Storage Engine to MMAPv1 for standalone
mongod (page 788) as appropriate for your shards.
Change Config Servers to Use MMAPv1
Note: During this process, only two config servers will be running at any given time to ensure that the sharded
cluster's metadata is read only.
Step 1: Disable the Balancer. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the
Balancer (page 695).
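To confirm that Step 1 took effect, you can read the balancer document from the config database. The sketch below assumes the document shape { _id: "balancer", stopped: true } that the mongo shell's balancer helpers write to config.settings; the function name is hypothetical, and sh.getBalancerState() remains the built-in way to ask the same question. The check reflects only the setting: a balancing round already in progress may still be finishing, which sh.isBalancerRunning() reports.

```javascript
// Sketch: interpret the balancer document from config.settings.
// Assumes the shape { _id: "balancer", stopped: true } when the balancer is
// disabled; a missing document (null) means the balancer is enabled.
function balancerDisabled(settingsDoc) {
  return settingsDoc != null && settingsDoc.stopped === true;
}
```

From a mongo shell connected to a mongos, this could be called as `balancerDisabled(db.getSiblingDB("config").settings.findOne({ _id: "balancer" }))`.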
Step 2: Stop the last config server listed in the mongos' configDB setting.
Step 3: Export the data of the second config server listed in the mongos' configDB setting.
mongodump --out <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongodump for available options.
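mongodump and mongorestore connect to localhost:27017 unless told otherwise, while a mongod started with --configsvr listens on port 27019 by default, so the commands in these steps usually also need --host and --port options (unless the config server was started on an explicit port). The sketch below assembles argument lists for both tools; the function name, host name, port, dump path, and credentials are hypothetical placeholders.

```javascript
// Sketch: build argv arrays for the mongodump/mongorestore calls in the
// config server steps. All values passed in are hypothetical placeholders.
function toolArgs(tool, { host, port, username, password, dir }) {
  if (tool !== "mongodump" && tool !== "mongorestore") {
    throw new Error(`unsupported tool: ${tool}`);
  }
  // Target the config server explicitly instead of the default localhost:27017.
  const args = [tool, "--host", host, "--port", String(port)];
  if (username) {
    // Only needed when the config server runs with authorization enabled.
    args.push("--username", username, "--password", password);
  }
  if (tool === "mongodump") {
    args.push("--out", dir); // directory that receives the export
  } else {
    args.push(dir); // mongorestore reads the dump directory given as an argument
  }
  return args;
}
```

The resulting array could be handed to a process launcher, for example Node's `child_process.spawn(args[0], args.slice(1))`.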
Step 4: For the second config server, create a new data directory for MMAPv1. Ensure that the user account
running mongod has read and write permissions for the new directory.
Step 5: Restart the second config server with MMAPv1. Specify the newly created MMAPv1 data directory as
the --dbpath as well as any additional options as appropriate.
mongod --dbpath <newMMAPv1DBPath> --configsvr
Step 6: Upload the exported data using mongorestore to the second config server.
mongorestore <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongorestore for available options.
Step 10: For the third config server, create a new data directory for MMAPv1. Ensure that the user account
running mongod has read and write permissions for the new directory.
Step 11: Restart the third config server with MMAPv1. Specify the newly created MMAPv1 data directory as
the --dbpath as well as any additional options as appropriate.
mongod --dbpath <newMMAPv1DBPath> --configsvr
Step 12: Upload the exported data using mongorestore to the third config server.
mongorestore <exportDataDestination>
Specify additional options as appropriate, such as username and password if running with authorization enabled. See
mongorestore for available options.
Step 14: For the first config server, create a new data directory for MMAPv1. Ensure that the user account running
mongod has read and write permissions for the new directory.
Step 15: Restart the first config server with MMAPv1. Specify the newly created MMAPv1 data directory as the
--dbpath as well as any additional options as appropriate.
mongod --dbpath <newMMAPv1DBPath> --configsvr
Step 16: Upload the exported data using mongorestore to the first config server.
mongorestore <exportDataDestination>
Step 17: Enable writes to the sharded cluster's metadata. Restart the second config server, specifying the newly
created MMAPv1 data directory as the --dbpath. Specify additional options as appropriate.
mongod --dbpath <newMMAPv1DBPath> --configsvr
Once all three config servers are up, the sharded cluster's metadata is available for writes.
Step 18: Re-enable the balancer. Once all three config servers are up and running with MMAPv1, re-enable the
balancer (page 696).
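The order in which the steps above touch the three config servers is easy to lose track of. The sketch below maps the hosts in a mongos configDB string to the roles those steps give them, covering only the steps listed above; the function name and the configDB value used with it are hypothetical.

```javascript
// Sketch: map the hosts in a mongos configDB string to the roles the steps
// above give them. The configDB value used with it is hypothetical.
function configServerPlan(configDB) {
  const hosts = configDB.split(",").map((h) => h.trim());
  if (hosts.length !== 3) {
    throw new Error("this procedure expects exactly three config servers");
  }
  const [first, second, third] = hosts;
  return {
    stopFirst: third,                     // Step 2: stop the last listed server
    convertOrder: [second, third, first], // Steps 3-6, then 10-12, then 14-16
    restartLast: second,                  // Step 17: restart the second server
  };
}
```

For example, with a configDB of `cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019`, the plan stops cfg3 first and converts cfg2, then cfg3, then cfg1.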
Downgrade Binaries
Once upgraded to MongoDB 3.0, you cannot downgrade to a version lower than 2.6.5.
The downgrade procedure for a sharded cluster reverses the order of the upgrade procedure. The version v6 config
database is backwards compatible with MongoDB 2.6.
Step 1: Disable the Balancer. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the
Balancer (page 695).
Step 2: Downgrade each shard, one at a time. For each shard:
1. Downgrade the mongod secondaries before downgrading the primary.
2. To downgrade the primary, run replSetStepDown and then downgrade it.
Step 3: Downgrade the config servers. Downgrade all 3 mongod config server instances, leaving the first system
in the mongos --configdb argument to downgrade last.
Step 4: Downgrade the mongos instances. Downgrade and restart each mongos, one at a time. The downgrade
process is a binary drop-in replacement.
Step 5: Re-enable the balancer. Once the downgrade of the sharded cluster components is complete, re-enable the balancer (page 696).
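The component ordering in Steps 1 through 5 can be sketched as a small planning function. The cluster description shape ({ shards, configServers, mongos }) is a hypothetical input format, and because the steps do not specify an order among the config servers other than the first --configdb entry, the sketch simply keeps their listed order.

```javascript
// Sketch: list the order in which the binary downgrade steps touch each
// component. The cluster description shape is a hypothetical input format.
function binaryDowngradeOrder(cluster) {
  const order = ["disable balancer"];
  // Step 2: shards one at a time; secondaries before the primary.
  for (const shard of cluster.shards) {
    for (const secondary of shard.secondaries) order.push(`downgrade ${secondary}`);
    order.push(`replSetStepDown on ${shard.primary}`);
    order.push(`downgrade ${shard.primary}`);
  }
  // Step 3: config servers, keeping the first --configdb entry for last.
  // The order among the remaining servers is not specified; list order is used.
  const [firstConfig, ...otherConfigs] = cluster.configServers;
  for (const c of otherConfigs) order.push(`downgrade ${c}`);
  order.push(`downgrade ${firstConfig}`);
  // Step 4: mongos instances, one at a time.
  for (const m of cluster.mongos) order.push(`downgrade ${m}`);
  order.push("re-enable balancer"); // Step 5
  return order;
}
```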
General Downgrade Procedure
Except as described on this page, moving between 2.6 and 3.0 is a drop-in replacement:
Step 1: Stop the existing mongod instance. For example, on Linux, run the 3.0 mongod with the --shutdown
option as follows:
mongod --dbpath /var/mongod/data --shutdown
Replace /var/mongod/data with your MongoDB dbPath. See also Stop mongod Processes (page 223) for
alternate methods of stopping a mongod instance.
Step 2: Start the new mongod instance. Ensure you start the 2.6 mongod with the same dbPath:
mongod --dbpath /var/mongod/data
Download
2.6.8 Changes
Security and Networking
SERVER-17278 [108] BSON BinData validation enforcement
SERVER-17022 [109] No SSL Session Caching may not be respected
SERVER-17264 [110] improve bson validation
Query and Aggregation
SERVER-16655 [111] Geo predicate is unable to use compound 2dsphere index if it is root of $or clause
SERVER-16527 [112] 2dsphere explain reports works for nscanned & nscannedObjects
SERVER-15802 [113] Query optimizer should always use equality predicate over unique index when possible
103 http://www.mongodb.org/downloads
104 https://github.com/mongodb/mongo/blob/v3.0/distsrc/THIRD-PARTY-NOTICES
105 http://bit.ly/1CpOu6t
106 https://mms.mongodb.com/help-hosted/v1.4/
107 https://mms.mongodb.com/help-hosted/v1.4/management/changelog/
108 https://jira.mongodb.org/browse/SERVER-17278
109 https://jira.mongodb.org/browse/SERVER-17022
110 https://jira.mongodb.org/browse/SERVER-17264
111 https://jira.mongodb.org/browse/SERVER-16655
112 https://jira.mongodb.org/browse/SERVER-16527
113 https://jira.mongodb.org/browse/SERVER-15802
Replication
SERVER-16599 [115] copydb and clone commands can crash the server if a primary steps down
SERVER-16315 [116] Replica set nodes should not threaten to veto nodes whose config version is higher than their own
SERVER-16274 [117] secondary fasserts trying to replicate an index
SERVER-15471 [118] Better error message when replica is not found in GhostSync::associateSlave
Sharding
SERVER-17191 [119] Spurious warning during upgrade of sharded cluster
SERVER-17163 [120] Fatal error logOp but not primary in MigrateStatus::go
SERVER-16984 [121] UpdateLifecycleImpl can return empty collectionMetadata even if ns is sharded
SERVER-10904 [122] Possible for _master and _slaveConn to be pointing to different connections even with primary read pref
Storage
SERVER-17087 [123] Add listCollections command functionality to 2.6 shell & client
SERVER-14572 [124] Increase C runtime stdio file limit
Tools
SERVER-17216 [125] 2.6 mongostat cannot be used with 3.0 mongod
SERVER-14190 [126] mongorestore parseMetadataFile passes non-null terminated string to fromjson
Build and Packaging
SERVER-14803 [127] Support static libstdc++ builds for non-Linux builds
SERVER-15400 [128] Create Windows Enterprise Zip File with vcredist and dependent dlls
114 https://jira.mongodb.org/browse/SERVER-14044
115 https://jira.mongodb.org/browse/SERVER-16599
116 https://jira.mongodb.org/browse/SERVER-16315
117 https://jira.mongodb.org/browse/SERVER-16274
118 https://jira.mongodb.org/browse/SERVER-15471
119 https://jira.mongodb.org/browse/SERVER-17191
120 https://jira.mongodb.org/browse/SERVER-17163
121 https://jira.mongodb.org/browse/SERVER-16984
122 https://jira.mongodb.org/browse/SERVER-10904
123 https://jira.mongodb.org/browse/SERVER-17087
124 https://jira.mongodb.org/browse/SERVER-14572
125 https://jira.mongodb.org/browse/SERVER-17216
126 https://jira.mongodb.org/browse/SERVER-14190
127 https://jira.mongodb.org/browse/SERVER-14803
128 https://jira.mongodb.org/browse/SERVER-15400
Stability
SERVER-12061 [141] Do not silently ignore read errors when syncing a replica set node
SERVER-12058 [142] Primary should abort if encountered problems writing to the oplog
Querying
SERVER-16291 [143] Cannot set/list/clear index filters on the secondary
SERVER-15958 [144] The isMultiKey value is not correct in the output of aggregation explain plan
SERVER-15899 [145] Querying against path in document containing long array of subdocuments with nested arrays causes stack overflow
SERVER-15696 [146] $regex, $in and $sort with index returns too many results
SERVER-15639 [147] Text queries can return incorrect results and leak memory when multiple predicates given on same text index prefix field
SERVER-15580 [148] Evaluating candidate query plans with concurrent writes on same collection may crash mongod
SERVER-15528 [149] Distinct queries can scan many index keys without yielding read lock
SERVER-15485 [150] CanonicalQuery::canonicalize can leak a LiteParsedQuery
SERVER-15403 [151] $min and $max equal errors in 2.6 but not in 2.4
SERVER-15233 [152] Cannot run planCacheListQueryShapes on a Secondary
SERVER-14799 [153] count with hint doesn't work when hint is a document
Replication
SERVER-16107 [154] 2.6 mongod crashes with segfault when added to a 2.8 replica set with >= 12 nodes.
SERVER-15994 [155] listIndexes and listCollections can be run on secondaries without slaveOk bit
SERVER-15849 [156] do not forward replication progress for nodes that are no longer part of a replica set
SERVER-15491 [157] SyncSourceFeedback can crash due to SocketException in authenticateInternalUser
141 https://jira.mongodb.org/browse/SERVER-12061
142 https://jira.mongodb.org/browse/SERVER-12058
143 https://jira.mongodb.org/browse/SERVER-16291
144 https://jira.mongodb.org/browse/SERVER-15958
145 https://jira.mongodb.org/browse/SERVER-15899
146 https://jira.mongodb.org/browse/SERVER-15696
147 https://jira.mongodb.org/browse/SERVER-15639
148 https://jira.mongodb.org/browse/SERVER-15580
149 https://jira.mongodb.org/browse/SERVER-15528
150 https://jira.mongodb.org/browse/SERVER-15485
151 https://jira.mongodb.org/browse/SERVER-15403
152 https://jira.mongodb.org/browse/SERVER-15233
153 https://jira.mongodb.org/browse/SERVER-14799
154 https://jira.mongodb.org/browse/SERVER-16107
155 https://jira.mongodb.org/browse/SERVER-15994
156 https://jira.mongodb.org/browse/SERVER-15849
157 https://jira.mongodb.org/browse/SERVER-15491
Sharding
SERVER-15318 [158] copydb should not use exhaust flag when used against mongos
SERVER-14728 [159] Shard depends on string comparison of replica set connection string
SERVER-14506 [160] special top chunk logic can move max chunk to a shard with incompatible tag
SERVER-14299 [161] For sharded limit=N queries with sort, mongos can request >N results from shard
SERVER-14080 [162] Have migration result reported in the changelog correctly
SERVER-12472 [163] Fail MoveChunk if an index is needed on TO shard and data exists
Storage
SERVER-16283 [164] Can't start new wiredtiger node with log file or config file in data directory - false detection of old mmapv1 files
SERVER-15986 [165] Starting with different storage engines in the same dbpath should error/warn
SERVER-14057 [166] Changing TTL expiration time with collMod does not correctly update index definition
Indexing and Write Operations
SERVER-14287 [167] ensureIndex can abort reIndex and lose indexes
SERVER-14886 [168] Updates against paths composed with array index notation and positional operator fail with error
Data Aggregation
SERVER-15552 [169] Errors writing to temporary collections during mapReduce command execution should be operation-fatal
Build and Packaging
SERVER-14184 [170] Unused preprocessor macros from s2 conflict on OS X Yosemite
SERVER-14015 [171] S2 Compilation on GCC 4.9/Solaris fails
SERVER-16017 [172] Suse11 enterprise packages fail due to unmet dependencies
SERVER-15598 [173] Ubuntu 14.04 Enterprise packages depend on unavailable libsnmp15 package
SERVER-13595 [174] Red Hat init.d script error: YAML config file parsing
158 https://jira.mongodb.org/browse/SERVER-15318
159 https://jira.mongodb.org/browse/SERVER-14728
160 https://jira.mongodb.org/browse/SERVER-14506
161 https://jira.mongodb.org/browse/SERVER-14299
162 https://jira.mongodb.org/browse/SERVER-14080
163 https://jira.mongodb.org/browse/SERVER-12472
164 https://jira.mongodb.org/browse/SERVER-16283
165 https://jira.mongodb.org/browse/SERVER-15986
166 https://jira.mongodb.org/browse/SERVER-14057
167 https://jira.mongodb.org/browse/SERVER-14287
168 https://jira.mongodb.org/browse/SERVER-14886
169 https://jira.mongodb.org/browse/SERVER-15552
170 https://jira.mongodb.org/browse/SERVER-14184
171 https://jira.mongodb.org/browse/SERVER-14015
172 https://jira.mongodb.org/browse/SERVER-16017
173 https://jira.mongodb.org/browse/SERVER-15598
174 https://jira.mongodb.org/browse/SERVER-13595
Querying
SERVER-15287 [190] Query planner sort analysis incorrectly allows index key pattern plugin fields to provide sort
SERVER-15286 [191] Assertion in date indexes when opposite-direction-sorted and double or filtered
SERVER-15279 [192] Disable hash-based index intersection (AND_HASH) by default
SERVER-15152 [193] When evaluating plans, some index candidates cause complete index scan
SERVER-15015 [194] Assertion failure when combining $max and $min and reverse index scan
SERVER-15012 [195] Server crashes on indexed rooted $or queries using a 2d index
SERVER-14969 [196] Dropping index during active aggregation operation can crash server
SERVER-14961 [197] Plan ranker favors intersection plans if predicate generates empty range index scan
SERVER-14892 [198] Invalid {$elemMatch:
SERVER-14706 [199] Queries that use negated $type predicate over a field may return incomplete results when an index is present on that field
SERVER-13104 [200] Plan enumerator doesn't enumerate all possibilities for a nested $or
SERVER-14984 [201] Server aborts when running $centerSphere query with NaN radius
SERVER-14981 [202] Server aborts when querying against 2dsphere index with coarsestIndexedLevel:0
SERVER-14831 [203] Text search trips assertion when default language only supported in textIndexVersion=1 used
Replication
SERVER-15038 [204] Multiple background index builds may not interrupt cleanly for commands, on secondaries
SERVER-14887 [205] Allow user document changes made on a 2.4 primary to replicate to a 2.6 secondary
SERVER-14805 [206] Use multithreaded oplog replay during initial sync
Sharding
SERVER-15056 [207] Sharded connection cleanup on setup error can crash mongos
SERVER-13702 [208] Commands without optional query may target to wrong shards on mongos
190 https://jira.mongodb.org/browse/SERVER-15287
191 https://jira.mongodb.org/browse/SERVER-15286
192 https://jira.mongodb.org/browse/SERVER-15279
193 https://jira.mongodb.org/browse/SERVER-15152
194 https://jira.mongodb.org/browse/SERVER-15015
195 https://jira.mongodb.org/browse/SERVER-15012
196 https://jira.mongodb.org/browse/SERVER-14969
197 https://jira.mongodb.org/browse/SERVER-14961
198 https://jira.mongodb.org/browse/SERVER-14892
199 https://jira.mongodb.org/browse/SERVER-14706
200 https://jira.mongodb.org/browse/SERVER-13104
201 https://jira.mongodb.org/browse/SERVER-14984
202 https://jira.mongodb.org/browse/SERVER-14981
203 https://jira.mongodb.org/browse/SERVER-14831
204 https://jira.mongodb.org/browse/SERVER-15038
205 https://jira.mongodb.org/browse/SERVER-14887
206 https://jira.mongodb.org/browse/SERVER-14805
207 https://jira.mongodb.org/browse/SERVER-15056
208 https://jira.mongodb.org/browse/SERVER-13702
SERVER-15156 [209] MongoDB upgrade 2.4 to 2.6 check returns error in config.changelog collection
Storage
SERVER-15369 [210] explicitly zero .ns files on creation
SERVER-15319 [211] Verify 2.8 freelist is upgrade-downgrade safe with 2.6
SERVER-15111 [212] partially written journal last section causes recovery to fail
Indexing
SERVER-14848 [213] Port index_id_desc.js to v2.6 and master branches
SERVER-14205 [214] ensureIndex failure reports ok: 1 on some failures
Write Operations
SERVER-15106 [215] Incorrect nscanned and nscannedObjects for idhack updates in 2.6.4 profiler or slow query log
SERVER-15029 [216] The $rename modifier uses incorrect dotted source path
SERVER-14829 [217] UpdateIndexData::clear() should reset all member variables
Data Aggregation
SERVER-15087 [218] Server crashes when running concurrent mapReduce and dropDatabase commands
SERVER-14969 [219] Dropping index during active aggregation operation can crash server
SERVER-14168 [220] Warning logged when incremental MR collections are unsuccessfully dropped on secondaries
Packaging
SERVER-14679 [221] (CentOS 7/RHEL 7) init.d script should create directory for pid file if it is missing
SERVER-14023 [222] Support for RHEL 7 Enterprise .rpm packages
SERVER-13243 [223] Support for Ubuntu 14 Trusty Enterprise .deb packages
SERVER-11077 [224] Support for Debian 7 Enterprise .deb packages
SERVER-10642 [225] Generate Community and Enterprise packages for SUSE 11
209 https://jira.mongodb.org/browse/SERVER-15156
210 https://jira.mongodb.org/browse/SERVER-15369
211 https://jira.mongodb.org/browse/SERVER-15319
212 https://jira.mongodb.org/browse/SERVER-15111
213 https://jira.mongodb.org/browse/SERVER-14848
214 https://jira.mongodb.org/browse/SERVER-14205
215 https://jira.mongodb.org/browse/SERVER-15106
216 https://jira.mongodb.org/browse/SERVER-15029
217 https://jira.mongodb.org/browse/SERVER-14829
218 https://jira.mongodb.org/browse/SERVER-15087
219 https://jira.mongodb.org/browse/SERVER-14969
220 https://jira.mongodb.org/browse/SERVER-14168
221 https://jira.mongodb.org/browse/SERVER-14679
222 https://jira.mongodb.org/browse/SERVER-14023
223 https://jira.mongodb.org/browse/SERVER-13243
224 https://jira.mongodb.org/browse/SERVER-11077
225 https://jira.mongodb.org/browse/SERVER-10642
Storage
SERVER-14198 [262] Std::set<pointer> and Windows Heap Allocation Reuse produces non-deterministic results
SERVER-13975 [263] Creating index on collection named system can cause server to abort
SERVER-13729 [264] Reads & Writes are blocked during data file allocation on Windows
SERVER-13681 [265] mongod B stalls during background flush on Windows
Indexing
SERVER-14494 [266] Dropping collection during active background index build on secondary triggers segfault
Write Ops
SERVER-14257 [267] remove command can cause process termination by throwing unhandled exception if profiling is enabled
SERVER-14024 [268] Update fails when query contains part of a DBRef and results in an insert (upsert:true)
SERVER-13764 [269] debug mechanisms report incorrect nscanned / nscannedObjects for updates
Networking
SERVER-13734 [270] Remove catch (...) from handleIncomingMsg
Geo
SERVER-14039 [271] $nearSphere query with 2d index, skip, and limit returns incomplete results
SERVER-13701 [272] Query using 2d index throws exception when using explain()
Text Search
SERVER-14738 [273] Updates to documents with text-indexed fields may lead to incorrect entries
SERVER-14027 [274] Renaming collection within same database fails if wildcard text index present
Tools
SERVER-14212 [275] mongorestore may drop system users and roles
SERVER-14048 [276] mongodump against mongos can't send dump to standard output
262 https://jira.mongodb.org/browse/SERVER-14198
263 https://jira.mongodb.org/browse/SERVER-13975
264 https://jira.mongodb.org/browse/SERVER-13729
265 https://jira.mongodb.org/browse/SERVER-13681
266 https://jira.mongodb.org/browse/SERVER-14494
267 https://jira.mongodb.org/browse/SERVER-14257
268 https://jira.mongodb.org/browse/SERVER-14024
269 https://jira.mongodb.org/browse/SERVER-13764
270 https://jira.mongodb.org/browse/SERVER-13734
271 https://jira.mongodb.org/browse/SERVER-14039
272 https://jira.mongodb.org/browse/SERVER-13701
273 https://jira.mongodb.org/browse/SERVER-14738
274 https://jira.mongodb.org/browse/SERVER-14027
275 https://jira.mongodb.org/browse/SERVER-14212
276 https://jira.mongodb.org/browse/SERVER-14048
Admin
SERVER-14556 [277] Default dbpath for mongod --configsvr changes in 2.6
SERVER-14355 [278] Allow dbAdmin role to manually create system.profile collections
Packaging
SERVER-14283 [279] Parameters in installed config file are out of date
JavaScript
SERVER-14254 [280] Do not store native function pointer as a property in function prototype
SERVER-13798 [281] v8 garbage collection can cause crash due to independent lifetime of DBClient and Cursor objects
SERVER-13707 [282] mongo shell may crash when converting invalid regular expression
Shell
SERVER-14341 [283] negative opcounter values in serverStatus
SERVER-14107 [284] Querying for a document containing a value of either type Javascript or JavascriptWithScope crashes the shell
Usability
SERVER-13833 [285] userAdminAnyDatabase role should be able to create indexes on admin.system.users and admin.system.roles
Logging and Diagnostics
SERVER-12512 [286] Add role-based, selective audit logging.
SERVER-14341 [287] negative opcounter values in serverStatus
Testing
SERVER-14731 [288] plan_cache_ties.js sometimes fails
SERVER-14147 [289] make index_multi.js retry on connection failure
SERVER-13615 [290] sharding_rs2.js intermittent failure due to reliance on opcounters
277 https://jira.mongodb.org/browse/SERVER-14556
278 https://jira.mongodb.org/browse/SERVER-14355
279 https://jira.mongodb.org/browse/SERVER-14283
280 https://jira.mongodb.org/browse/SERVER-14254
281 https://jira.mongodb.org/browse/SERVER-13798
282 https://jira.mongodb.org/browse/SERVER-13707
283 https://jira.mongodb.org/browse/SERVER-14341
284 https://jira.mongodb.org/browse/SERVER-14107
285 https://jira.mongodb.org/browse/SERVER-13833
286 https://jira.mongodb.org/browse/SERVER-12512
287 https://jira.mongodb.org/browse/SERVER-14341
288 https://jira.mongodb.org/browse/SERVER-14731
289 https://jira.mongodb.org/browse/SERVER-14147
290 https://jira.mongodb.org/browse/SERVER-13615
2.6.3 Changes
SERVER-14302 [291] Fixed: Equality queries on _id with projection may return no results on sharded collections
SERVER-14304 [292] Fixed: Equality queries on _id with projection on _id may return orphan documents on sharded collections
2.6.2 Changes
Security
SERVER-13727 [293] The backup (page 395) authorization role now includes privileges to run the collStats command.
SERVER-13804 [294] The built-in role restore (page 395) now has privileges on system.roles collection.
SERVER-13612 [295] Fixed: SSL-enabled server appears not to be sending the list of supported certificate issuers to the client
SERVER-13753 [296] Fixed: mongod may terminate if x.509 authentication certificate is invalid
SERVER-13945 [297] For replica set/sharded cluster member authentication (page 350), now matches x.509 cluster certificates by attributes instead of by substring comparison.
SERVER-13868 [298] Now marks V1 users as probed on databases that do not have surrogate user documents.
SERVER-13850 [299] Now ensures that the user cache entry is up to date before using it to determine a user's roles in user management commands on mongos.
SERVER-13588 [300] Fixed: Shell prints startup warning when auth enabled
Querying
SERVER-13731 [301] Fixed: Stack overflow when parsing deeply nested $not query
SERVER-13890 [302] Fixed: Index bounds builder constructs invalid bounds for multiple negations joined by an $or
SERVER-13752 [303] Verified assertion on empty $in clause and sort on second field in a compound index.
SERVER-13337 [304] Re-enabled idhack for queries with projection.
SERVER-13715 [305] Fixed: Aggregation pipeline execution can fail with $or and blocking sorts
SERVER-13714 [306] Fixed: non-top-level indexable $not triggers query planning bug
291 https://jira.mongodb.org/browse/SERVER-14302
292 https://jira.mongodb.org/browse/SERVER-14304
293 https://jira.mongodb.org/browse/SERVER-13727
294 https://jira.mongodb.org/browse/SERVER-13804
295 https://jira.mongodb.org/browse/SERVER-13612
296 https://jira.mongodb.org/browse/SERVER-13753
297 https://jira.mongodb.org/browse/SERVER-13945
298 https://jira.mongodb.org/browse/SERVER-13868
299 https://jira.mongodb.org/browse/SERVER-13850
300 https://jira.mongodb.org/browse/SERVER-13588
301 https://jira.mongodb.org/browse/SERVER-13731
302 https://jira.mongodb.org/browse/SERVER-13890
303 https://jira.mongodb.org/browse/SERVER-13752
304 https://jira.mongodb.org/browse/SERVER-13337
305 https://jira.mongodb.org/browse/SERVER-13715
306 https://jira.mongodb.org/browse/SERVER-13714
SERVER-13769 [307] Fixed: distinct command on indexed field with geo predicate fails to execute
SERVER-13675 [308] Fixed: Plans with differing performance can tie during plan ranking
SERVER-13899 [309] Fixed: Whole index scan query solutions can use incompatible indexes, return incorrect results
SERVER-13852 [310] Fixed: IndexBounds::endKeyInclusive not initialized by constructor
SERVER-14073 [311] planSummary no longer truncated at 255 characters
SERVER-14174 [312] Fixed: If ntoreturn is a limit (rather than batch size) extra data gets buffered during plan ranking
SERVER-13789 [313] Some nested queries no longer trigger an assertion error
SERVER-14064 [314] Added planSummary information for count command log message.
SERVER-13960 [315] Queries containing $or no longer miss results if multiple clauses use the same index.
SERVER-14180 [316] Fixed: Crash with and clause, $elemMatch, and nested $mod or regex
SERVER-14176 [317] Natural order sort specification no longer ignored if query is specified.
SERVER-13754 [318] Bounds no longer combined for $or queries that can use merge sort.
Geospatial
SERVER-13687 [319] Results of $near query on compound multi-key 2dsphere index are now sorted by distance.
Write Operations
SERVER-13802 [320] Insert field validation no longer stops at first Timestamp() field.
Replication
SERVER-13993 [321] Fixed: log a message when shouldChangeSyncTarget() believes a node should change sync targets
SERVER-13976 [322] Fixed: Cloner needs to detect failure to create collection
Sharding
SERVER-13616 [323] Resolved: type 7 (OID) error when acquiring distributed lock for first time
SERVER-13812 [324] Now catches exception thrown by getShardsForQuery for geo query.
307 https://jira.mongodb.org/browse/SERVER-13769
308 https://jira.mongodb.org/browse/SERVER-13675
309 https://jira.mongodb.org/browse/SERVER-13899
310 https://jira.mongodb.org/browse/SERVER-13852
311 https://jira.mongodb.org/browse/SERVER-14073
312 https://jira.mongodb.org/browse/SERVER-14174
313 https://jira.mongodb.org/browse/SERVER-13789
314 https://jira.mongodb.org/browse/SERVER-14064
315 https://jira.mongodb.org/browse/SERVER-13960
316 https://jira.mongodb.org/browse/SERVER-14180
317 https://jira.mongodb.org/browse/SERVER-14176
318 https://jira.mongodb.org/browse/SERVER-13754
319 https://jira.mongodb.org/browse/SERVER-13687
320 https://jira.mongodb.org/browse/SERVER-13802
321 https://jira.mongodb.org/browse/SERVER-13993
322 https://jira.mongodb.org/browse/SERVER-13976
323 https://jira.mongodb.org/browse/SERVER-13616
324 https://jira.mongodb.org/browse/SERVER-13812
SERVER-14138 [325] mongos will now correctly target multiple shards for nested field shard key predicates.
SERVER-11332 [326] Fixed: Authentication requests delayed if first config server is unresponsive
Map/Reduce
SERVER-14186 [327] Resolved: rs.stepDown during mapReduce causes fassert in logOp
SERVER-13981 [328] Temporary map/reduce collections are now correctly replicated to secondaries.
Storage
SERVER-13750 [329] convertToCapped on empty collection no longer aborts after invariant() failure.
SERVER-14056 [330] Moving large collection across databases with renameCollection no longer triggers fatal assertion.
SERVER-14082 [331] Fixed: Excessive freelist scanning for MaxBucket
SERVER-13737 [332] CollectionOptions parser now skips non-numeric for size/max elements if values nonnumeric.
Build and Packaging
SERVER-13950 [333] MongoDB Enterprise now includes required dependency list.
SERVER-13862 [334] Support for mongodb-org-server installation 2.6.1-1 on RHEL5 via RPM.
SERVER-13724 [335] Added SCons flag to override treating all warnings as errors.
Diagnostics
SERVER-13587 [336] Resolved: ndeleted in system.profile documents reports 1 too few documents removed
SERVER-13368 [337] Improved exposure of timing information in currentOp.
Administration
SERVER-13954 [338] security.javascriptEnabled option is now available in the YAML configuration file.
325 https://jira.mongodb.org/browse/SERVER-14138
326 https://jira.mongodb.org/browse/SERVER-11332
327 https://jira.mongodb.org/browse/SERVER-14186
328 https://jira.mongodb.org/browse/SERVER-13981
329 https://jira.mongodb.org/browse/SERVER-13750
330 https://jira.mongodb.org/browse/SERVER-14056
331 https://jira.mongodb.org/browse/SERVER-14082
332 https://jira.mongodb.org/browse/SERVER-13737
333 https://jira.mongodb.org/browse/SERVER-13950
334 https://jira.mongodb.org/browse/SERVER-13862
335 https://jira.mongodb.org/browse/SERVER-13724
336 https://jira.mongodb.org/browse/SERVER-13587
337 https://jira.mongodb.org/browse/SERVER-13368
338 https://jira.mongodb.org/browse/SERVER-13954
Tools
SERVER-10464 [339] mongodump can now query oplog.$main and oplog.rs when using --dbpath.
SERVER-13760 [340] mongoexport can now handle large timestamps on Windows.
Shell
SERVER-13865 [341] The shell now returns the correct WriteResult for a compatibility-mode upsert with a non-OID equality predicate on the _id field.
SERVER-13037 [342] Fixed a typo in the error message for compatibility mode.
Internal Code
SERVER-13794 [343] Fixed: unused snapshot history consuming significant heap space.
SERVER-13446 [344] Removed the Solaris builds' dependency on ILLUMOS libc.
SERVER-14092 [345] The MongoDB 2.4 to 2.6 upgrade check no longer returns an error on internal collections.
SERVER-14000 [346] Added the new lsb file location for Debian 7.1.
Testing
SERVER-13723 [347] Stabilized tags.js after a change in its timeout when it was ported to use write commands.
SERVER-13494 [348] Fixed: setup_multiversion_mongodb.py doesn't download 2.4.10 because of non-numeric version sorting.
SERVER-13603 [349] Fixed: test suites with options tests fail when run with --nopreallocj.
SERVER-13948 [350] Fixed: awaitReplication() failures related to getting a config version from the master, causing test failures.
SERVER-13839 [351] Fixed sync2.js failure.
SERVER-13972 [352] Fixed connections_opened.js failure.
SERVER-13712 [353] Reduced peak disk usage of test suites.
SERVER-14249 [354] Added tests for querying the oplog via mongodump using --dbpath.
SERVER-10462 [355] Fixed: Windows file-locking-related buildbot failures.
[339] https://jira.mongodb.org/browse/SERVER-10464
[340] https://jira.mongodb.org/browse/SERVER-13760
[341] https://jira.mongodb.org/browse/SERVER-13865
[342] https://jira.mongodb.org/browse/SERVER-13037
[343] https://jira.mongodb.org/browse/SERVER-13794
[344] https://jira.mongodb.org/browse/SERVER-13446
[345] https://jira.mongodb.org/browse/SERVER-14092
[346] https://jira.mongodb.org/browse/SERVER-14000
[347] https://jira.mongodb.org/browse/SERVER-13723
[348] https://jira.mongodb.org/browse/SERVER-13494
[349] https://jira.mongodb.org/browse/SERVER-13603
[350] https://jira.mongodb.org/browse/SERVER-13948
[351] https://jira.mongodb.org/browse/SERVER-13839
[352] https://jira.mongodb.org/browse/SERVER-13972
[353] https://jira.mongodb.org/browse/SERVER-13712
[354] https://jira.mongodb.org/browse/SERVER-14249
[355] https://jira.mongodb.org/browse/SERVER-10462
2.6.1 Changes
Stability
SERVER-13739 [356] Repair database failure can delete database files
Build and Packaging
SERVER-13287 [357] Addition of debug symbols has doubled compile time
SERVER-13563 [358] Upgrading from 2.4.x to 2.6.0 via yum clobbers configuration file
SERVER-13691 [359] yum and apt stable repositories contain release candidate 2.6.1-rc0 packages
SERVER-13515 [360] Cannot install MongoDB as a service on Windows
Querying
SERVER-13066 [361] Negations over multikey fields do not use index
SERVER-13495 [362] Concurrent GETMORE and KILLCURSORS operations can cause a race condition and a server crash
SERVER-13503 [363] The $where operator should not be allowed under $elemMatch
SERVER-13537 [364] Large skip and limit values can cause a crash in the blocking sort stage
SERVER-13557 [365] Incorrect negation of $elemMatch value in 2.6
SERVER-13562 [366] Queries that use tailable cursors do not stream results if skip() is applied
SERVER-13566 [367] Using the OplogReplay flag with extra predicates can yield incorrect results
SERVER-13611 [368] Missing sort order for compound index leads to unnecessary in-memory sort
SERVER-13618 [369] Optimization for sorted $in queries not applied to reverse sort order
SERVER-13661 [370] Increase the maximum allowed depth of query objects
SERVER-13664 [371] Query with $elemMatch using a compound multikey index can generate incorrect results
SERVER-13677 [372] Query planner should traverse through $all while handling $elemMatch object predicates
SERVER-13766 [373] Dropping index or collection while $or query is yielding triggers fatal assertion
[356] https://jira.mongodb.org/browse/SERVER-13739
[357] https://jira.mongodb.org/browse/SERVER-13287
[358] https://jira.mongodb.org/browse/SERVER-13563
[359] https://jira.mongodb.org/browse/SERVER-13691
[360] https://jira.mongodb.org/browse/SERVER-13515
[361] https://jira.mongodb.org/browse/SERVER-13066
[362] https://jira.mongodb.org/browse/SERVER-13495
[363] https://jira.mongodb.org/browse/SERVER-13503
[364] https://jira.mongodb.org/browse/SERVER-13537
[365] https://jira.mongodb.org/browse/SERVER-13557
[366] https://jira.mongodb.org/browse/SERVER-13562
[367] https://jira.mongodb.org/browse/SERVER-13566
[368] https://jira.mongodb.org/browse/SERVER-13611
[369] https://jira.mongodb.org/browse/SERVER-13618
[370] https://jira.mongodb.org/browse/SERVER-13661
[371] https://jira.mongodb.org/browse/SERVER-13664
[372] https://jira.mongodb.org/browse/SERVER-13677
[373] https://jira.mongodb.org/browse/SERVER-13766
Geospatial
SERVER-13666 [374] $near queries with out-of-bounds points in legacy format can lead to crashes
SERVER-13540 [375] The geoNear command no longer returns distance in radians for legacy points
SERVER-13486 [376] The geoNear command can create BSON objects that are too large for aggregation
Replication
SERVER-13500 [377] Changing replica set configuration can crash running members
SERVER-13589 [378] Background index builds from a 2.6.0 primary fail to complete on 2.4.x secondaries
SERVER-13620 [379] Replicated data definition commands will fail on secondaries during background index build
SERVER-13496 [380] Creating an index with the same name but a different spec in a mixed-version replica set can abort replication
Sharding
SERVER-12638 [381] Initial sharding with hashed shard key can result in duplicate split points
SERVER-13518 [382] The _id field is no longer automatically generated by mongos when missing
SERVER-13777 [383] Migrated ranges waiting for deletion do not report cursors still open
Security
SERVER-9358 [384] Log rotation can overwrite previous log files
SERVER-13644 [385] Sensitive credentials in startup options are not redacted and may be exposed
SERVER-13441 [386] Inconsistent error handling in user management shell helpers
Write Operations
SERVER-13466 [387] Error message in collection creation failure contains incorrect namespace
SERVER-13499 [388] Yield policy for batch-inserts should be the same as for batch-updates/deletes
SERVER-13516 [389] Array updates on documents with more than 128 BSON elements may crash mongod
[374] https://jira.mongodb.org/browse/SERVER-13666
[375] https://jira.mongodb.org/browse/SERVER-13540
[376] https://jira.mongodb.org/browse/SERVER-13486
[377] https://jira.mongodb.org/browse/SERVER-13500
[378] https://jira.mongodb.org/browse/SERVER-13589
[379] https://jira.mongodb.org/browse/SERVER-13620
[380] https://jira.mongodb.org/browse/SERVER-13496
[381] https://jira.mongodb.org/browse/SERVER-12638
[382] https://jira.mongodb.org/browse/SERVER-13518
[383] https://jira.mongodb.org/browse/SERVER-13777
[384] https://jira.mongodb.org/browse/SERVER-9358
[385] https://jira.mongodb.org/browse/SERVER-13644
[386] https://jira.mongodb.org/browse/SERVER-13441
[387] https://jira.mongodb.org/browse/SERVER-13466
[388] https://jira.mongodb.org/browse/SERVER-13499
[389] https://jira.mongodb.org/browse/SERVER-13516
Decreased mongos memory footprint when shards have several tags SERVER-16683 [395]
Removed check for shard version if the primary server is down SERVER-16237 [396]
Fixed: /etc/init.d/mongod startup script failure with dirname message SERVER-16081 [397]
Fixed: mongos can cause shards to hit the in-memory sort limit by requesting more results than needed SERVER-14306 [398]
All issues closed in 2.6.7 [399]
2.6.6 December 09, 2014
Fixed: Evaluating candidate query plans with concurrent writes on the same collection may crash mongod SERVER-15580 [400]
Fixed: 2.6 mongod crashes with a segfault when added to a 2.8 replica set with 12 or more members SERVER-16107 [401]
Fixed: $regex, $in and $sort with index returns too many results SERVER-15696 [402]
Change: moveChunk will fail if there is data on the target shard and a required index does not exist. SERVER-12472 [403]
Primary should abort if it encounters problems writing to the oplog SERVER-12058 [404]
All issues closed in 2.6.6 [405]
[390] https://jira.mongodb.org/browse/SERVER-17087
[391] https://jira.mongodb.org/browse/SERVER-16599
[392] https://jira.mongodb.org/browse/SERVER-16274
[393] https://jira.mongodb.org/browse/SERVER-15802
[394] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.8%22%20AND%20project%20%3D%20SERVER
[395] https://jira.mongodb.org/browse/SERVER-16683
[396] https://jira.mongodb.org/browse/SERVER-16237
[397] https://jira.mongodb.org/browse/SERVER-16081
[398] https://jira.mongodb.org/browse/SERVER-14306
[399] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.7%22%20AND%20project%20%3D%20SERVER
[400] https://jira.mongodb.org/browse/SERVER-15580
[401] https://jira.mongodb.org/browse/SERVER-16107
[402] https://jira.mongodb.org/browse/SERVER-15696
[403] https://jira.mongodb.org/browse/SERVER-12472
[404] https://jira.mongodb.org/browse/SERVER-12058
[405] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.6%22%20AND%20project%20%3D%20SERVER
Fix for text index where, under specific circumstances, in-place updates to a text-indexed field may result in incorrect/incomplete results SERVER-14738 [412]
Check the size of the split point before performing a manual split chunk operation SERVER-14431 [413]
Ensure read preferences are re-evaluated by drawing secondary connections from a global pool and releasing them back to the pool at the end of a query/command SERVER-9788 [414]
Allow reads from secondaries when both audit and authorization are enabled in a sharded cluster SERVER-14170 [415]
All issues closed in 2.6.4 [416]
2.6.3 June 19, 2014
Equality queries on _id with projection may return no results on sharded collections SERVER-14302 [417].
Equality queries on _id with projection on _id may return orphan documents on sharded collections SERVER-14304 [418].
All issues closed in 2.6.3 [419].
2.6.2 June 16, 2014
Query plans with differing performance can tie during plan ranking SERVER-13675 [420].
[406] https://jira.mongodb.org/browse/SERVER-15029
[407] https://jira.mongodb.org/browse/SERVER-15111
[408] https://jira.mongodb.org/browse/SERVER-15369
[409] https://jira.mongodb.org/browse/SERVER-14961
[410] https://jira.mongodb.org/browse/SERVER-10642
[411] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.5%22%20AND%20project%20%3D%20SERVER
[412] https://jira.mongodb.org/browse/SERVER-14738
[413] https://jira.mongodb.org/browse/SERVER-14431
[414] https://jira.mongodb.org/browse/SERVER-9788
[415] https://jira.mongodb.org/browse/SERVER-14170
[416] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.4%22%20AND%20project%20%3D%20SERVER
[417] https://jira.mongodb.org/browse/SERVER-14302
[418] https://jira.mongodb.org/browse/SERVER-14304
[419] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.3%22%20AND%20project%20%3D%20SERVER
[420] https://jira.mongodb.org/browse/SERVER-13675
Fix to install the MongoDB service on Windows with the --install option SERVER-13515 [426].
Allow direct upgrade from 2.4.x to 2.6.0 via yum SERVER-13563 [427].
Fix issues with background index builds on secondaries: SERVER-13589 [428] and SERVER-13620 [429].
Redact credential information passed as startup options SERVER-13644 [430].
2.6.1 Changelog (page 810).
All issues closed in 2.6.1 [431].
Major Changes
The following changes in MongoDB affect both the standard and Enterprise editions:
Aggregation Enhancements
The aggregation pipeline adds the ability to return result sets of any size, either by returning a cursor or writing the
output to a collection. Additionally, the aggregation pipeline supports variables and adds new operations to handle sets
and redact data.
The db.collection.aggregate() method now returns a cursor, which enables the aggregation pipeline to return
result sets of any size.
Aggregation pipelines now support an explain operation to aid analysis of aggregation operations.
Aggregation can now use a more efficient external-disk-based sorting process.
New pipeline stages:
$out stage to output to a collection.
$redact stage to allow additional control over access to the data.
New or modified operators:
set expression operators.
[421] https://jira.mongodb.org/browse/SERVER-13753
[422] https://jira.mongodb.org/browse/SERVER-13981
[423] https://jira.mongodb.org/browse/SERVER-14138
[424] https://jira.mongodb.org/browse/SERVER-14186
[425] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.2%22%20AND%20project%20%3D%20SERVER
[426] https://jira.mongodb.org/browse/SERVER-13515
[427] https://jira.mongodb.org/browse/SERVER-13563
[428] https://jira.mongodb.org/browse/SERVER-13589
[429] https://jira.mongodb.org/browse/SERVER-13620
[430] https://jira.mongodb.org/browse/SERVER-13644
[431] https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.6.1%22%20AND%20project%20%3D%20SERVER
Text search is now enabled by default, and the query system, including the aggregation pipeline $match stage,
includes the $text operator, which resolves text-search queries.
MongoDB 2.6 includes an updated text index (page 486) format and deprecates the text command.
Insert and Update Improvements
Improvements to the update and insert systems include additional operations and improvements that increase consistency of modified data.
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
New or enhanced update operators:
$bit operator supports bitwise xor operation.
$min and $max operators that perform conditional update depending on the relative size of the specified
value and the current value of a field.
$push operator has enhanced support for the $sort, $slice, and $each modifiers and supports a new
$position modifier.
$currentDate operator to set the value of a field to the current date.
The $mul operator for multiplicative increments for insert and update operations.
See also:
Update Operator Syntax Validation (page 824)
New Write Operation Protocol
A new write protocol integrates write operations with write concerns. The protocol also provides improved support
for bulk operations.
MongoDB 2.6 adds the write commands insert, update, and delete, which provide the basis for the improved
bulk insert. All officially supported MongoDB drivers support the new write commands.
The mongo shell now includes methods to perform bulk-write operations. See Bulk() for more information.
See also:
Write Method Acknowledgements (page 821)
MongoDB now distributes MSI packages for Microsoft Windows. This is the recommended method for MongoDB
installation under Windows.
Security Improvements
MongoDB 2.6 enhances support for secure deployments through improved SSL support, x.509-based authentication,
an improved authorization system with more granular controls, as well as centralized credential storage, and improved
user management tools.
Specifically these changes include:
A new authorization model (page 312) that provides the ability to create custom User-Defined Roles (page 313)
and the ability to specify user privileges at a collection-level granularity.
Global user management, which stores all user and user-defined role data in the admin database and provides
a new set of commands for managing users and roles.
x.509 certificate authentication for client authentication (page 348) as well as for internal authentication
(page 350) of sharded and/or replica set cluster members. x.509 authentication is only available for deployments using SSL.
Enhanced SSL Support:
Rolling upgrades of clusters (page 338) to use SSL.
MongoDB Tools (page 337) support connections to mongod and mongos instances using SSL connections.
Prompt for passphrase (page 334) by mongod or mongos at startup.
Require the use of strong SSL ciphers, with a minimum 128-bit key length for all connections. The strong-cipher requirement prevents an old or malicious client from forcing use of a weak cipher.
MongoDB disables the http interface by default, limiting network exposure (page 314). To enable the interface,
see the net.http.enabled setting.
See also:
New Authorization Model (page 823), SSL Certificate Hostname Validation (page 823), and Security Checklist
(page 322).
Query Engine Improvements
MongoDB can now use index intersection (page 495) to fulfill queries supported by more than one index.
Index Filters (page 67) to limit which indexes can become the winning plan for a query.
Plan cache methods (http://docs.mongodb.org/manual/reference/method/js-plan-cache) to view
and clear the query plans (page 66) cached by the query optimizer.
MongoDB can now use count() with hint(). See count() for details.
Improvements
Geospatial Enhancements
Support for MultiPoint (page 481), MultiLineString (page 482), MultiPolygon (page 482), and GeometryCollection (page 482).
Support for geospatial query clauses in $or expressions.
See also:
2dsphere Index Version 2 (page 823), $maxDistance Changes (page 826), Deprecated $uniqueDocs (page 826),
Stronger Validation of Geospatial Queries (page 826)
Index Build Enhancements
Background index build (page 493) allowed on secondaries. If you initiate a background index build on a
primary, the secondaries will replicate the index build in the background.
Automatic rebuild of interrupted index builds after a restart.
If a standalone or a primary instance terminates during an index build without a clean shutdown, mongod
now restarts the index build when the instance restarts. If the instance shuts down cleanly or if a user kills
the index build, the interrupted index builds do not automatically restart upon the restart of the server.
If a secondary instance terminates during an index build, the mongod instance will now restart the interrupted index build when the instance restarts.
To disable this behavior, use the --noIndexBuildRetry command-line option.
ensureIndex() now wraps a new createIndex command.
The dropDups option to ensureIndex() and createIndex is deprecated.
See also:
Enforce Index Key Length Limit (page 819)
Enhanced Sharding and Replication Administration
MongoDB 2.6 supports a YAML-based configuration file format in addition to the previous configuration file format.
See the documentation of the Configuration File for more information.
Operational Changes
Storage
usePowerOf2Sizes is now the default allocation strategy for all new collections. The new allocation strategy uses
more storage relative to total document size but results in lower levels of storage fragmentation and more predictable
storage capacity planning over time.
12.2. Previous Stable Releases
Removed upward limit for the maxIncomingConnections for mongod and mongos. Previous versions
capped the maximum possible maxIncomingConnections setting at 20,000 connections.
Connection pools for a mongos instance may be used by multiple MongoDB servers. This can reduce the
number of connections needed for high-volume workloads and reduce resource consumption in sharded clusters.
The C++ driver now monitors replica set health with the isMaster command instead of
replSetGetStatus. This allows the C++ driver to support systems that require authentication.
New cursor.maxTimeMS() and corresponding maxTimeMS option for commands to specify a time limit.
Tool Improvements
MongoDB Enterprise for Windows (page 41) is now available. It includes support for Kerberos, SSL, and SNMP.
MongoDB Enterprise for Windows does not include LDAP support for authentication. However, MongoDB Enterprise
for Linux supports using LDAP authentication with an ActiveDirectory server.
MongoDB Enterprise for Windows includes OpenSSL version 1.0.1g.
Auditing
MongoDB Enterprise adds auditing (page 317) capability for mongod and mongos instances. See Auditing
(page 317) for details.
MongoDB Enterprise provides support for proxy authentication of users. This allows administrators to configure a
MongoDB cluster to authenticate users by proxying authentication requests to a specified Lightweight Directory Access Protocol (LDAP) service. See Authenticate Using SASL and LDAP with OpenLDAP (page 356) and Authenticate
Using SASL and LDAP with ActiveDirectory (page 354) for details.
MongoDB does not support LDAP authentication in mixed sharded cluster deployments that contain both version 2.4
and version 2.6 shards. See Upgrade MongoDB to 2.6 (page 829) for upgrade instructions.
Expanded SNMP Support
MongoDB Enterprise has greatly expanded its SNMP support to provide SNMP access to nearly the full range of
metrics provided by db.serverStatus().
See also:
SNMP Changes (page 824)
Additional Information
Changes Affecting Compatibility
Compatibility Changes in MongoDB 2.6
The following 2.6 changes can affect compatibility with older versions of MongoDB. See Release Notes for MongoDB 2.6 (page 794) for the full list of the 2.6 changes.
Index Changes
Enforce Index Key Length Limit
Description MongoDB 2.6 implements stronger enforcement of the limit on index key length.
Creating indexes will error if an index key in an existing document exceeds the limit:
db.collection.ensureIndex(), db.collection.reIndex(), compact, and repairDatabase will error and not
create the index. Previous versions of MongoDB would create the index but not index such documents.
Because db.collection.reIndex(), compact, and repairDatabase drop all the indexes
from a collection and then recreate them sequentially, the error from the index key limit prevents these operations from rebuilding any remaining indexes for the collection and, in the case of the repairDatabase
command, from continuing with the remainder of the process.
Inserts will error:
db.collection.insert() and other operations that perform inserts (e.g. db.collection.save() and
db.collection.update() with upsert that result in inserts) will fail to insert if the new document has an
indexed field whose corresponding index entry exceeds the limit. Previous versions of MongoDB would insert but
not index such documents.
mongorestore and mongoimport will fail to insert if the new document has an indexed field whose
corresponding index entry exceeds the limit.
Updates will error:
db.collection.update() and db.collection.save() operations on an indexed field will
error if the updated value causes the index entry to exceed the limit.
If an existing document contains an indexed field whose index entry exceeds the limit, updates on other
fields that result in the relocation of a document on disk will error.
Chunk Migration will fail:
Migrations will fail for a chunk that has a document with an indexed field whose index entry exceeds the
limit.
If left unfixed, the chunk will repeatedly fail migration, effectively ceasing chunk balancing for that collection. Or, if chunk splits occur in response to the migration failures, this response would lead to an unnecessarily large number of chunks and an overly large config database.
Secondary members of replica sets will warn:
Secondaries will continue to replicate documents with an indexed field whose corresponding index entry
exceeds the limit on initial sync but will print warnings in the logs.
Secondaries allow index build and rebuild operations on a collection that contains an indexed field whose
corresponding index entry exceeds the limit but with warnings in the logs.
With mixed version replica sets where the secondaries are version 2.6 and the primary is version 2.4,
secondaries will replicate documents inserted or updated on the 2.4 primary, but will print error messages
in the log if the documents contain an indexed field whose corresponding index entry exceeds the limit.
Solution Run db.upgradeCheckAllDBs() to find current keys that violate this limit and correct as appropriate.
Preferably, run the test before upgrading; i.e. connect the 2.6 mongo shell to your MongoDB 2.4 database and
run the method.
If you have an existing data set and want to disable the default index key length validation so that you can upgrade
before resolving these indexing issues, use the failIndexKeyTooLong parameter.
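For a rough idea of what the upgrade check is looking for, here is a sketch that flags documents whose value for one indexed string field is already at least 1024 UTF-8 bytes. It assumes the 1024-byte index key limit. The helpers utf8Length and docsWithOversizedKey are hypothetical, and the sketch inspects only a single top-level field of documents you pass in. It also ignores the per-entry overhead, so values just below the threshold can still exceed the limit. db.upgradeCheckAllDBs() remains the authoritative check.

```javascript
// Assumed limit: an index entry must be smaller than 1024 bytes.
var INDEX_KEY_LIMIT_BYTES = 1024;

// Count the UTF-8 bytes of a JavaScript (UTF-16) string.
function utf8Length(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < str.length) {
      bytes += 4; // a surrogate pair encodes one 4-byte code point
      i++;        // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Hypothetical helper: return the documents whose string value for `field`
// alone already reaches the limit. Entry overhead is ignored, so this only
// catches clear-cut cases.
function docsWithOversizedKey(docs, field) {
  return docs.filter(function (doc) {
    var v = doc[field];
    return typeof v === "string" && utf8Length(v) >= INDEX_KEY_LIMIT_BYTES;
  });
}

// Usage in the mongo shell (collection and field names are hypothetical):
// docsWithOversizedKey(db.articles.find({}, { title: 1 }).toArray(), "title")
```

Loading a whole collection with toArray() is only practical for small collections. The check is deliberately one-sided: a hit is certainly too long, but a miss is not a guarantee.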
Index Specifications Validate Field Names
Description In MongoDB 2.6, create and re-index operations fail when the index key refers to an empty field, e.g.
"a..b" : 1 or the field name starts with a dollar sign ($).
db.collection.ensureIndex() will not create a new index with an invalid or empty key name.
db.collection.reIndex(), compact, and repairDatabase will error if an index exists with
an invalid or empty key name.
Chunk migration will fail if an index exists with an invalid or empty key name.
Previous versions of MongoDB allowed the index.
Solution Run db.upgradeCheckAllDBs() to find current keys that violate these naming rules and correct as appropriate.
Preferably, run the test before upgrading; i.e. connect the 2.6 mongo shell to your MongoDB 2.4 database and
run the method.
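The two naming rules above can also be screened on the client before calling ensureIndex(). This is a minimal sketch using a hypothetical helper, invalidIndexKeyNames, and it is not MongoDB's own validator. It makes two assumptions. First, it checks every dotted path component for a leading "$", which is slightly broader than the wording above. Second, it exempts the wildcard text index key "$**", since { "$**": "text" } is a valid text index specification.

```javascript
// Hypothetical client-side pre-check, not MongoDB's own validator.
// Returns the key names that contain an empty path component (e.g. "a..b")
// or a component starting with "$".
function invalidIndexKeyNames(keySpec) {
  return Object.keys(keySpec).filter(function (name) {
    if (name === "$**") return false; // wildcard text index key is allowed
    return name.split(".").some(function (part) {
      return part.length === 0 || part.charAt(0) === "$";
    });
  });
}

// Usage in the mongo shell: report offending keys in existing indexes.
// db.mycollection.getIndexes().forEach(function (ix) {
//   var bad = invalidIndexKeyNames(ix.key);
//   if (bad.length) print(ix.name + ": " + bad.join(", "));
// });
```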
if you specify an index name that already exists but the key specifications differ; e.g. in the following
example, the second db.collection.ensureIndex() will error.
db.mycollection.ensureIndex( { a: 1 }, { name: "myIdx" } )
db.mycollection.ensureIndex( { z: 1 }, { name: "myIdx" } )
Previous versions did not create the index but did not error.
Write Method Acknowledgements
Description The mongo shell write methods db.collection.insert(), db.collection.update(),
db.collection.save() and db.collection.remove() now integrate the write concern (page 76)
directly into the method rather than with a separate getLastError command to provide safe writes (page 77)
whether run interactively in the mongo shell or non-interactively in a script. In previous versions, these methods
exhibited a fire-and-forget behavior. [432]
Existing scripts for the mongo shell that used these methods will now observe safe writes which take longer
than the previous fire-and-forget behavior.
The write methods now return a WriteResult object that contains the results of the operation, including any write errors and write concern errors, and obviates the need to call getLastError command to get the status of the results. See db.collection.insert(), db.collection.update(),
db.collection.save() and db.collection.remove() for details.
In sharded environments, mongos no longer supports fire-and-forget behavior. This limits throughput when
writing data to sharded clusters.
Solution Scripts that used these mongo shell methods for bulk write operations with fire-and-forget behavior should
use the Bulk() methods.
In sharded environments, applications using any driver or mongo shell should use Bulk() methods for optimal
performance when inserting or modifying groups of documents.
For example, instead of:
for (var i = 1; i <= 1000000; i++) {
db.test.insert( { x : i } );
}
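Here is a sketch of the Bulk() alternative to the loop above, assuming a 2.6 mongo shell and the same test collection. The helper name insertRangeWithBulk is hypothetical. It queues every insert on an unordered bulk operation and submits them with Bulk.execute().

```javascript
// Hypothetical helper: queue inserts of { x: 1 } .. { x: n } on an
// unordered bulk operation and submit them together.
function insertRangeWithBulk(coll, n) {
  var bulk = coll.initializeUnorderedBulkOp();
  for (var i = 1; i <= n; i++) {
    bulk.insert( { x : i } );
  }
  // execute() returns a BulkWriteResult describing the outcome.
  return bulk.execute();
}

// Usage in the mongo shell:
// var result = insertRangeWithBulk(db.test, 1000000);
// print(result.nInserted);
```

An unordered operation lets the server continue past individual failures. Use initializeOrderedBulkOp() instead if the inserts must stop at the first error.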
The Bulk.execute() method returns a BulkWriteResult object that contains the result of the operation.
See also:
New Write Operation Protocol (page 815), Bulk(), Bulk.execute(),
db.collection.initializeUnorderedBulkOp(), db.collection.initializeOrderedBulkOp()
db.collection.aggregate() Change
Description The db.collection.aggregate() method in the mongo shell defaults to returning a cursor to
the results set. This change enables the aggregation pipeline to return result sets of any size and requires cursor
iteration to access the result set. For example:
var myCursor = db.orders.aggregate( [
{
$group: {
_id: "$cust_id",
total: { $sum: "$price" }
}
}
] );
myCursor.forEach( function(x) { printjson (x); } );
Previous versions returned a single document with a field result that contained an array of the result set,
subject to the BSON Document size limit. Accessing the result set in the previous versions of MongoDB required
accessing the result field and iterating the array. For example:
var returnedDoc = db.orders.aggregate( [
{
$group: {
_id: "$cust_id",
total: { $sum: "$price" }
}
}
] );
var myArray = returnedDoc.result; // access the result field
myArray.forEach( function(x) { printjson (x); } );
Solution Update scripts that currently expect db.collection.aggregate() to return a document with a
result array to handle cursors instead.
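Scripts that must run against both older and 2.6 shells during an upgrade could normalize the two return shapes. This is a minimal sketch with a hypothetical helper, aggregateToArray. It assumes the 2.6 return value exposes toArray() and the legacy value carries its array in the result field, as in the examples above.

```javascript
// Hypothetical compatibility helper for both aggregate() return shapes:
//   - 2.6 shell: a cursor (has toArray())
//   - pre-2.6:  a document whose "result" field holds the array
function aggregateToArray(aggResult) {
  if (aggResult && typeof aggResult.toArray === "function") {
    return aggResult.toArray();   // 2.6+: drain the cursor
  }
  if (aggResult && Array.isArray(aggResult.result)) {
    return aggResult.result;      // legacy: inline result array
  }
  throw new Error("unrecognized aggregate() return value");
}

// Usage in the mongo shell:
// aggregateToArray( db.orders.aggregate( [
//   { $group: { _id: "$cust_id", total: { $sum: "$price" } } }
// ] ) ).forEach( function(x) { printjson(x); } );
```

Draining the cursor into an array gives up the ability to handle result sets of any size. For large results, iterate the cursor directly as in the first example.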
See also:
Aggregation Enhancements (page 814), db.collection.aggregate()
Write Concern Validation
Description Specifying a write concern that includes j: true to a mongod or mongos instance running with
--nojournal option now errors. Previous versions would ignore the j: true.
Solution Either remove the j: true specification from the write concern when issued against a mongod or
mongos instance with --nojournal or run mongod or mongos with journaling.
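The first option, removing j: true, can be applied in one place rather than in every call. This is a minimal sketch with a hypothetical helper, writeConcernFor. It assumes your script already knows whether the target instance runs with journaling; how you determine that is outside the sketch.

```javascript
// Hypothetical helper: return a copy of a write concern with j: true removed
// when the target mongod/mongos runs with --nojournal.
function writeConcernFor(baseConcern, journalingEnabled) {
  var wc = {};
  Object.keys(baseConcern).forEach(function (k) { wc[k] = baseConcern[k]; });
  if (!journalingEnabled && wc.j === true) {
    delete wc.j;
  }
  return wc;
}

// Usage in the mongo shell:
// db.test.insert( { x: 1 }, { writeConcern: writeConcernFor( { w: 1, j: true }, false ) } )
```

Dropping j: true removes the journal-commit guarantee for those writes. Running with journaling is the option that preserves durability.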
Security Changes
MongoDB provides the allowInvalidCertificates setting for:
mongod and mongos to bypass the validation of SSL certificates on other servers in the cluster.
mongo shell, MongoDB tools that support SSL (page 337), and the C++ driver to bypass the validation of
server certificates.
When using the allowInvalidCertificates setting, MongoDB logs as a warning the use of the invalid
certificates.
Warning: The allowInvalidCertificates setting bypasses the other certificate validation, such as
checks for expiration and valid signatures.
See also:
2dsphere Version 2 (page 479)
Log Messages
Update operators (e.g. $set) cannot repeat in the update statement. For example, the following
expression is invalid:
{ $set: { a: 5 }, $set: { b: 5 } }
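The fix is to list all fields for an operator under a single key, e.g. { $set: { a: 5, b: 5 } }. In JavaScript, a literal with two $set keys keeps only the last one, so code that assembles updates from pieces should merge them first. This is a minimal sketch with a hypothetical helper, mergeUpdates. When two fragments set the same field under the same operator, the later fragment wins. The helper does not detect one field used under two different operators.

```javascript
// Hypothetical helper: merge partial update documents so that each operator
// (such as $set) appears only once in the final update statement.
function mergeUpdates(parts) {
  var merged = {};
  parts.forEach(function (part) {
    Object.keys(part).forEach(function (op) {
      merged[op] = merged[op] || {};
      Object.keys(part[op]).forEach(function (field) {
        merged[op][field] = part[op][field]; // later fragments win
      });
    });
  });
  return merged;
}

// mergeUpdates( [ { $set: { a: 5 } }, { $set: { b: 5 } } ] )
// yields { $set: { a: 5, b: 5 } }
```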
Run
{ $exists:
Solution To override the behavior to use the sparse index and return incomplete results, explicitly specify the index
with a hint().
See Sparse Index On A Collection Cannot Return Complete Results (page 492) for an example that details the new
behavior.
sort() Specification Values
Description The sort() method only accepts the following values for the sort keys:
1 to specify ascending order for a field,
-1 to specify descending order for a field, or
$meta expression to specify sort by the text search score.
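A script that builds sort specifications dynamically could validate them against the three accepted forms before calling sort(). This is a minimal sketch with a hypothetical helper, isValidSortSpec. It assumes plain JavaScript numbers (not NumberInt or NumberLong wrappers) and the { $meta: "textScore" } form of the $meta expression.

```javascript
// Hypothetical client-side check of a sort() specification: each value must
// be 1, -1, or a { $meta: "textScore" } expression.
function isValidSortSpec(spec) {
  return Object.keys(spec).every(function (field) {
    var v = spec[field];
    if (v === 1 || v === -1) return true;
    return v !== null && typeof v === "object" &&
      Object.keys(v).length === 1 && v.$meta === "textScore";
  });
}

// isValidSortSpec( { price: -1 } )                    -> true
// isValidSortSpec( { score: { $meta: "textScore" } } ) -> true
// isValidSortSpec( { price: true } )                  -> false
```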
The following query uses the index to search for documents where price is not greater than or equal to
50:
db.orders.find( { price: { $not: { $gte: 50 } } } )
In previous versions, indexed plans would only return matching documents where the type of the field
matches the type of the query predicate:
{ "_id" : 1, "status" : "A", "cust_id" : "123", "price" : 40 }
If using a collection scan, previous versions would return the same results as those in 2.6.
MongoDB 2.6 allows chaining of $not expressions.
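To make the 2.6 result set concrete, the following sketch models the query in plain JavaScript. It is a simplified model, not the server's matching code: it assumes scalar fields only and that $gte matches only numeric values when the bound is a number. Under that model $not matches every document that fails $gte, including documents whose price is missing or is not a number. The documents and helper names are hypothetical.

```javascript
// Simplified $gte: only numbers can satisfy a numeric bound (assumption:
// scalar fields, no arrays, no cross-type comparison).
function gte(value, bound) {
  return typeof value === "number" && value >= bound;
}

// $not negates the inner predicate, so missing or non-numeric values match.
function notGte(doc, field, bound) {
  return !gte(doc[field], bound);
}

const orders = [
  { _id: 1, price: 40 },
  { _id: 2, price: 60 },
  { _id: 3, price: "N/A" },
  { _id: 4 }
];
const ids = orders
  .filter(function (o) { return notGte(o, "price", 50); })
  .map(function (o) { return o._id; });
console.log(ids); // [ 1, 3, 4 ]
```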
null Comparison Queries
Description
$lt and $gt comparisons to null no longer match documents that are missing the field.
null equality conditions on array elements (e.g. "a.b":
the nested field a.b (e.g. a: [ 2, 3 ]).
null equality queries (i.e. field:
nested array (e.g. field: [ "A", "B" ]). Earlier versions could only match documents where the
field contains the nested array.
The $all operator returns no match if the array field contains nested arrays (e.g. field: [ "a",
["b"] ]) and $all on the nested field is the element of the nested array (e.g. "field.1": {
$all: [ "b" ] }). Previous versions would return a match.
$mod Operator Enforces Strict Syntax
Description The $mod operator now only accepts an array with exactly two elements, and errors when passed an
array with fewer or more elements. See mod-not-enough-elements and mod-too-many-elements for details.
In previous versions, if passed an array with one element, the $mod operator used 0 as the second element,
and if passed an array with more than two elements, $mod ignored all but the first two elements. Previous
versions did return an error when passed an empty array.
Solution Ensure that the array passed to $mod contains exactly two elements:
If the array contains a single element, add 0 as the second element.
If the array contains more than two elements, remove the extra elements.
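The solution steps can be expressed as a small helper. normalizeModArgs is a hypothetical name; it rewrites a $mod argument the way pre-2.6 servers interpreted it, so existing queries should keep returning the same results on 2.6.

```javascript
// Hypothetical helper applying the 2.6 $mod solution: always produce
// exactly [ divisor, remainder ].
function normalizeModArgs(args) {
  if (!Array.isArray(args) || args.length === 0) {
    // Empty arrays were already an error before 2.6.
    throw new Error("$mod needs [ divisor, remainder ]");
  }
  if (args.length === 1) {
    // Reproduce the pre-2.6 behavior of using 0 as the remainder.
    return [args[0], 0];
  }
  // Drop anything past the second element, which pre-2.6 servers ignored.
  return args.slice(0, 2);
}

console.log(JSON.stringify({ qty: { $mod: normalizeModArgs([4]) } }));       // {"qty":{"$mod":[4,0]}}
console.log(JSON.stringify({ qty: { $mod: normalizeModArgs([4, 1, 7]) } })); // {"qty":{"$mod":[4,1]}}
```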
$where Must Be Top-Level
Description $where expressions can now only be at top level and cannot be nested within another expression, such
as $elemMatch.
Solution Update existing queries that nest $where.
$exists and notablescan If the MongoDB server has disabled collection scans, i.e. notablescan, then
$exists queries that have no indexed solution will error.
MinKey and MaxKey Queries
Description Equality matches for either MinKey or MaxKey no longer match documents missing the field.
Nested Array Queries with $elemMatch
Description The $elemMatch query operator no longer traverses recursively into nested arrays.
For example, if a collection test contains the following document:
{ "_id": 1, "a" : [ [ 1, 2, 5 ] ] }
In 2.6, the following $elemMatch query does not match the document:
db.test.find( { a: { $elemMatch: { $gt: 1, $lt: 5 } } } )
Solution Update existing queries that rely upon the old behavior.
Text Search Compatibility MongoDB does not support the use of the $text query operator in mixed sharded
cluster deployments that contain both version 2.4 and version 2.6 shards. See Upgrade MongoDB to 2.6 (page 829)
for upgrade instructions.
Replica Set/Sharded Cluster Validation
Upgrade MongoDB to 2.6 In the general case, the upgrade from MongoDB 2.4 to 2.6 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 2.6. However, before
you attempt any upgrade, familiarize yourself with the content of this document, particularly the Upgrade Recommendations and Checklists (page 830), the procedure for upgrading sharded clusters (page 831), and the considerations
for reverting to 2.4 after running 2.6 (page 835).
If you did not install the mongodb-org package and instead installed a subset of MongoDB components, replace
mongodb-org in the commands above with the appropriate package names.
See installation instructions for Ubuntu (page 14), RHEL (page 7), Debian (page 17), or other Linux Systems (page 19)
for a list of the available packages and complete MongoDB installation instructions.
Upgrade MongoDB Processes
Upgrade Standalone mongod Instance to MongoDB 2.6 The following steps outline the procedure to upgrade a
standalone mongod from version 2.4 to 2.6. To upgrade from version 2.2 to 2.6, upgrade to version 2.4 (page 855)
first, and then follow the procedure to upgrade from 2.4 to 2.6.
1. Download binaries of the latest release in the 2.6 series from the MongoDB Download Page436 . See Install
MongoDB (page 5) for more information.
2. Shut down your mongod instance. Replace the existing binary with the 2.6 mongod binary and restart
mongod.
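The supported upgrade paths described in this section can be encoded as a lookup, which may help scripts that manage many instances refuse an unsupported jump. upgradePathTo26 is a hypothetical helper, and it knows only the two paths described here.

```javascript
// Hypothetical helper: the release series to pass through, in order,
// when upgrading a standalone mongod or replica set to 2.6.
function upgradePathTo26(currentSeries) {
  const paths = {
    "2.4": ["2.6"],        // direct upgrade
    "2.2": ["2.4", "2.6"]  // must upgrade to 2.4 first
  };
  if (!Object.prototype.hasOwnProperty.call(paths, currentSeries)) {
    throw new Error("No upgrade path from " + currentSeries + " is described here");
  }
  return paths[currentSeries].slice(); // copy so callers cannot modify the table
}

console.log(upgradePathTo26("2.2")); // [ '2.4', '2.6' ]
console.log(upgradePathTo26("2.4")); // [ '2.6' ]
```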
Upgrade a Replica Set to 2.6 The following steps outline the procedure to upgrade a replica set from MongoDB
2.4 to MongoDB 2.6. To upgrade from MongoDB 2.2 to 2.6, upgrade all members of the replica set to version 2.4
(page 855) first, and then follow the procedure to upgrade from MongoDB 2.4 to 2.6.
You can upgrade from MongoDB 2.4 to 2.6 using a rolling upgrade to minimize downtime by upgrading the members individually while the other members are available:
Step 1: Upgrade secondary members of the replica set. Upgrade the secondary members of the set one at a time
by shutting down the mongod and replacing the 2.4 binary with the 2.6 binary. After upgrading a mongod instance,
wait for the member to recover to SECONDARY state before upgrading the next instance. To check the member's state,
issue rs.status() in the mongo shell.
Step 2: Step down the replica set primary. Use rs.stepDown() in the mongo shell to step down the primary
and force the set to failover (page 560). rs.stepDown() expedites the failover procedure and is preferable to
shutting down the primary directly.
Step 3: Upgrade the primary. When rs.status() shows that the primary has stepped down and another member has assumed PRIMARY state, shut down the previous primary and replace the mongod binary with the 2.6 binary
and start the new instance.
Replica set failover is not instant and will leave the set unable to accept writes until the failover process completes.
This typically takes 30 seconds or more, so schedule the upgrade procedure during a maintenance window.
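The checks in Steps 1 and 3 can be sketched as plain JavaScript over a document shaped like the output of rs.status(), whose members array reports each member's name (host:port) and stateStr. The helper names and host names are hypothetical; in practice the input would be the document returned by rs.status() rather than the hand-written sample used here.

```javascript
// Hypothetical helpers over a document shaped like rs.status() output.
function memberState(status, name) {
  const matches = status.members.filter(function (m) { return m.name === name; });
  return matches.length ? matches[0].stateStr : undefined;
}

// Step 1: move to the next secondary only once the upgraded one is SECONDARY.
function readyForNextMember(status, upgradedName) {
  return memberState(status, upgradedName) === "SECONDARY";
}

// Step 3: shut down the old primary only after another member is PRIMARY.
function failoverComplete(status, oldPrimaryName) {
  return status.members.some(function (m) {
    return m.stateStr === "PRIMARY" && m.name !== oldPrimaryName;
  });
}

// Hand-written sample: db1 was the primary and has just stepped down.
const status = {
  members: [
    { name: "db1.example.net:27017", stateStr: "SECONDARY" },
    { name: "db2.example.net:27017", stateStr: "PRIMARY" },
    { name: "db3.example.net:27017", stateStr: "RECOVERING" }
  ]
};
console.log(readyForNextMember(status, "db3.example.net:27017")); // false
console.log(failoverComplete(status, "db1.example.net:27017"));   // true
```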
Upgrade a Sharded Cluster to 2.6 Only upgrade sharded clusters to 2.6 if all members of the cluster are currently
running instances of 2.4. The only supported upgrade path for sharded clusters running 2.2 is via 2.4. The upgrade
process checks all components of the cluster and will produce warnings if any component is running version 2.2.
436 http://www.mongodb.org/downloads
Considerations The upgrade process does not require any downtime. However, while you upgrade the sharded
cluster, ensure that clients do not make changes to the collection meta-data. For example, during the upgrade, do not
do any of the following:
sh.enableSharding()
sh.shardCollection()
sh.addShard()
db.createCollection()
db.collection.drop()
db.dropDatabase()
any operation that creates a database
any other operation that modifies the cluster meta-data in any way. See Sharding Reference (page 715) for a complete list of sharding commands. Note, however, that not all commands on the Sharding Reference (page 715)
page modify the cluster meta-data.
Upgrade Sharded Clusters Optional but Recommended. As a precaution, take a backup of the config database
before upgrading the sharded cluster.
Step 1: Disable the Balancer. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the
Balancer (page 695).
Step 2: Upgrade the cluster's meta-data. Start a single 2.6 mongos instance with the configDB pointing to the
cluster's config servers and with the --upgrade option.
To run a mongos with the --upgrade option, you can upgrade an existing mongos instance to 2.6, or if you need
to avoid reconfiguring a production mongos instance, you can use a new 2.6 mongos that can reach all the config
servers.
To upgrade the meta data, run:
mongos --configdb <configDB string> --upgrade
You can include the --logpath option to output the log messages to a file instead of the standard output. Also
include any other options required to start mongos instances in your cluster, such as --sslOnNormalPorts or
--sslPEMKeyFile.
The mongos will exit upon completion of the --upgrade process.
The upgrade will prevent any chunk moves or splits from occurring during the upgrade process. If the data files have
many sharded collections or if failed processes hold stale locks, acquiring the locks for all collections can take seconds
or minutes. Watch the log for progress updates.
Step 3: Ensure mongos --upgrade process completes successfully. The mongos will exit upon completion
of the meta data upgrade process. If successful, the process will log the following messages:
upgrade of config server to v5 successful
Config database is at version v5
After a successful upgrade, restart the mongos instance. If mongos fails to start, check the log for more information.
If the mongos instance loses its connection to the config servers during the upgrade or if the upgrade is otherwise
unsuccessful, you may always safely retry the upgrade.
Step 4: Upgrade the remaining mongos instances to v2.6. Upgrade the other mongos instances in the sharded
cluster and restart them without the --upgrade option. After upgrading all the mongos, see Complete Sharded Cluster
Upgrade (page 833) for information on upgrading the other cluster components.
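If mongos runs with --logpath, the success check in Step 3 can be scripted. The sketch below is plain JavaScript that takes the log contents as a string (for example read with Node's fs.readFileSync from whatever file was passed to --logpath) and looks for both success messages quoted in Step 3. metadataUpgradeSucceeded is a hypothetical name; substring matching is used so that timestamps and other line prefixes do not matter.

```javascript
// The two messages that mongos logs after a successful --upgrade.
const SUCCESS_MESSAGES = [
  "upgrade of config server to v5 successful",
  "Config database is at version v5"
];

// Hypothetical check: true only if every success message appears
// somewhere in the log text.
function metadataUpgradeSucceeded(logText) {
  return SUCCESS_MESSAGES.every(function (message) {
    return logText.indexOf(message) !== -1;
  });
}

// Hand-written sample; a real log would come from the --logpath file.
const sampleLog = "upgrade of config server to v5 successful\n" +
  "Config database is at version v5\n";
console.log(metadataUpgradeSucceeded(sampleLog));             // true
console.log(metadataUpgradeSucceeded("upgrade in progress")); // false
```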
Complete Sharded Cluster Upgrade After you have successfully upgraded all mongos instances, you can upgrade
the other instances in your MongoDB deployment.
Warning: Do not upgrade mongod instances until after you have upgraded all mongos instances.
While the balancer is still disabled, upgrade the components of your sharded cluster in the following order:
Upgrade all 3 mongod config server instances, leaving the first system in the mongos --configdb argument to upgrade last.
Upgrade each shard, one at a time, upgrading the mongod secondaries before running replSetStepDown
and upgrading the primary of each shard.
When this process is complete, re-enable the balancer (page 696).
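The ordering rules above can be turned into an explicit checklist before a maintenance window. The sketch below is plain JavaScript; planShardedUpgrade, the shard description format, and the host names are hypothetical, and it assumes every mongos has already been upgraded, as the warning requires.

```javascript
// Hypothetical planner. configDB is the comma-separated --configdb string;
// each shard lists its primary and secondaries.
function planShardedUpgrade(configDB, shards) {
  const configServers = configDB.split(",");
  const steps = [];
  // Config servers: every server except the first listed, then the first.
  configServers.slice(1).concat(configServers[0]).forEach(function (host) {
    steps.push("upgrade config server " + host);
  });
  // Shards one at a time: secondaries, step down, then the former primary.
  shards.forEach(function (shard) {
    shard.secondaries.forEach(function (host) {
      steps.push("upgrade secondary " + host);
    });
    steps.push("replSetStepDown on " + shard.primary);
    steps.push("upgrade former primary " + shard.primary);
  });
  steps.push("re-enable balancer");
  return steps;
}

const plan = planShardedUpgrade("cfg1:27019,cfg2:27019,cfg3:27019", [
  { name: "shard0", primary: "s0a:27018", secondaries: ["s0b:27018", "s0c:27018"] }
]);
plan.forEach(function (step, i) { console.log(i + 1 + ". " + step); });
```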
Upgrade Procedure Once upgraded to MongoDB 2.6, you cannot downgrade to any version earlier than MongoDB
2.4. If you have text or 2dsphere indexes, you can only downgrade to MongoDB 2.4.10 or later.
Except as described on this page, moving between 2.4 and 2.6 is a drop-in replacement:
Step 1: Stop the existing mongod instance. For example, on Linux, run 2.4 mongod with the --shutdown
option as follows:
mongod --dbpath /var/mongod/data --shutdown
Replace /var/mongod/data with your MongoDB dbPath. See also the Stop mongod Processes (page 223) for
alternate methods of stopping a mongod instance.
Step 2: Start the new mongod instance. Ensure you start the 2.6 mongod with the same dbPath:
mongod --dbpath /var/mongod/data
Timing Because downgrades are more difficult after you upgrade the user authorization model, once you upgrade
the MongoDB binaries to version 2.6, allow your MongoDB deployment to run a day or two without upgrading the
user authorization model.
This allows 2.6 some time to burn in and decreases the likelihood of downgrades occurring after the user privilege
model upgrade. The user authentication and access control will continue to work as it did in 2.4, but it will be
impossible to create or modify users or to use user-defined roles until you run the authorization upgrade.
If you decide to upgrade the user authorization model immediately instead of waiting the recommended burn in
period, then for sharded clusters, you must wait at least 10 seconds after upgrading the sharded clusters to run the
authorization upgrade script.
Replica Sets For a replica set, it is only necessary to run the upgrade process on the primary as the changes will
automatically replicate to the secondaries.
Sharded Clusters For a sharded cluster, connect to a mongos and run the upgrade procedure to upgrade the cluster's
authorization data. By default, the procedure will upgrade the authorization data of the shards as well.
To override this behavior, run the upgrade command with the additional parameter upgradeShards: false. If
you choose to override, you must run the upgrade procedure on the mongos first, and then run the procedure on the
primary members of each shard.
For a sharded cluster, do not run the upgrade process directly against the config servers (page 650). Instead, perform
the upgrade process using one mongos instance to interact with the config database.
Requirements To upgrade the authorization model, you must have a user in the admin database with the role
userAdminAnyDatabase (page 396).
Procedure
Step 1: Connect to MongoDB instance. Connect and authenticate to the mongod instance for a single deployment
or a mongos for a sharded cluster as an admin database user with the role userAdminAnyDatabase (page 396).
Step 2: Upgrade authorization schema. Use the authSchemaUpgrade command in the admin database to
update the user data using the mongo shell.
Run authSchemaUpgrade command.
db.getSiblingDB("admin").runCommand({authSchemaUpgrade: 1 });
If you override the behavior, after running authSchemaUpgrade on a mongos instance, you will need to connect
to the primary for each shard and repeat the upgrade process after upgrading on the mongos.
Result All users in a 2.6 system are stored in the admin.system.users (page 287) collection. To manipulate
these users, use the user management methods.
The upgrade procedure copies the version 2.4 admin.system.users collection to admin.system.backup_users.
The upgrade procedure leaves the version 2.4 <database>.system.users collection(s) intact.
Downgrade MongoDB from 2.6 Before you attempt any downgrade, familiarize yourself with the content of this
document, particularly the Downgrade Recommendations and Checklist (page 835) and the procedure for downgrading
sharded clusters (page 839).
Downgrade Recommendations and Checklist When downgrading, consider the following:
Downgrade Path Once upgraded to MongoDB 2.6, you cannot downgrade to any version earlier than MongoDB
2.4. If you created text or 2dsphere indexes while running 2.6, you can only downgrade to MongoDB 2.4.10 or
later.
Preparedness
Remove or downgrade version 2 text indexes (page 838) before downgrading MongoDB 2.6 to 2.4.
Remove or downgrade version 2 2dsphere indexes (page 838) before downgrading MongoDB 2.6 to 2.4.
Downgrade 2.6 User Authorization Model (page 835). If you have upgraded to the 2.6 user authorization model,
you must downgrade the user model to 2.4 before downgrading MongoDB 2.6 to 2.4.
Procedures Follow the downgrade procedures:
To downgrade sharded clusters, see Downgrade a 2.6 Sharded Cluster (page 839).
To downgrade replica sets, see Downgrade a 2.6 Replica Set (page 839).
To downgrade a standalone MongoDB instance, see Downgrade 2.6 Standalone mongod Instance (page 839).
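The downgrade path rule can be checked before any binaries are replaced. This is a minimal sketch in plain JavaScript with hypothetical helper names; versions are compared component by component as numbers, so 2.4.10 correctly sorts after 2.4.9, which a plain string comparison would get wrong.

```javascript
function parseVersion(v) {
  return v.split(".").map(Number);
}

// Numeric, component-wise comparison: returns -1, 0, or 1.
function compareVersions(a, b) {
  const x = parseVersion(a);
  const y = parseVersion(b);
  for (let i = 0; i < Math.max(x.length, y.length); i++) {
    const d = (x[i] || 0) - (y[i] || 0);
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  return 0;
}

// Hypothetical check: the target must be a 2.4 release, and at least
// 2.4.10 if text or 2dsphere indexes were created while running 2.6.
function downgradeTargetAllowed(target, createdTextOr2dsphereOn26) {
  const minimum = createdTextOr2dsphereOn26 ? "2.4.10" : "2.4.0";
  return compareVersions(target, minimum) >= 0 &&
    compareVersions(target, "2.6.0") < 0;
}

console.log(downgradeTargetAllowed("2.4.9", true));  // false
console.log(downgradeTargetAllowed("2.4.10", true)); // true
console.log(downgradeTargetAllowed("2.2.7", false)); // false
```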
Downgrade 2.6 User Authorization Model If you have upgraded to the 2.6 user authorization model, you must
first downgrade the user authorization model to 2.4 before downgrading MongoDB 2.6 to 2.4.
Considerations
For a replica set, it is only necessary to run the downgrade process on the primary as the changes will automatically replicate to the secondaries.
For sharded clusters, although the procedure lists the downgrade of the cluster's authorization data first, you
may downgrade the authorization data of the cluster or shards first.
You must have the admin.system.backup_users and admin.system.new_users collections created during the upgrade process.
Important. The downgrade process returns the user data to its state prior to upgrading to 2.6 authorization
model. Any changes made to the user/role data using the 2.6 users model will be lost.
Access Control Prerequisites To downgrade the authorization model, you must connect as a user with the following
privileges:
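{ resource: { db: "admin", collection: "system.new_users" }, actions: [ "find", "insert", "update" ] }
{ resource: { db: "admin", collection: "system.backup_users" }, actions: [ "find" ] }
{ resource: { db: "admin", collection: "system.users" }, actions: [ "find", "remove", "insert" ] }
{ resource: { db: "admin", collection: "system.version" }, actions: [ "find", "update" ] }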
If no user exists with the appropriate privileges, create an authorization model downgrade user:
Step 1: Connect as user with privileges to manage users and roles. Connect and authenticate as a user with
the userAdminAnyDatabase (page 396) role.
Step 2: Create a role with required privileges. Using the db.createRole method, create a role (page 313)
with the required privileges.
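use admin
db.createRole(
  {
    role: "downgradeAuthRole",
    privileges: [
      { resource: { db: "admin", collection: "system.new_users" }, actions: [ "find", "insert", "update" ] },
      { resource: { db: "admin", collection: "system.backup_users" }, actions: [ "find" ] },
      { resource: { db: "admin", collection: "system.users" }, actions: [ "find", "remove", "insert" ] },
      { resource: { db: "admin", collection: "system.version" }, actions: [ "find", "update" ] }
    ],
    roles: [ ]
  }
)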
Step 3: Create a user with the new role. Create a user and assign the user the downgradeAuthRole.
use admin
db.createUser(
{
user: "downgradeAuthUser",
pwd: "somePass123",
roles: [ { role: "downgradeAuthRole", db: "admin" } ]
}
)
Note: Instead of creating a new user, you can also grant the role to an existing user. See the
db.grantRolesToUser() method.
Step 4: Authenticate as the new user. Authenticate as the newly created user.
use admin
db.auth( "downgradeAuthUser", "somePass123" )
Step 1: Connect and authenticate to MongoDB instance. Connect and authenticate to the mongod instance for a
single deployment or a mongos for a sharded cluster with the appropriate privileges. See Access Control Prerequisites
(page 836) for details.
Step 2: Create backup of 2.6 admin.system.users collection. Copy all documents in the
admin.system.users (page 287) collection to the admin.system.new_users collection:
db.getSiblingDB("admin").system.users.find().forEach( function(userDoc) {
    // Declare status with var so it stays local to the callback.
    var status = db.getSiblingDB("admin").system.new_users.save( userDoc );
    // Report any document that could not be copied.
    if (status.hasWriteError()) {
        print(status.writeError);
    }
} );
The method returns a WriteResult object with the status of the operation.
WriteResult object should have "nModified" equal to 1.
The method returns a WriteResult object with the number of documents removed in the "nRemoved" field.
Step 5: Copy documents from the admin.system.backup_users collection. Copy all documents from the
admin.system.backup_users, created during the 2.6 upgrade, to admin.system.users.
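db.getSiblingDB("admin").system.backup_users.find().forEach(
    function (userDoc) {
        // Declare status with var so it stays local to the callback.
        var status = db.getSiblingDB("admin").system.users.insert( userDoc );
        // Report any document that could not be restored.
        if (status.hasWriteError()) {
            print(status.writeError);
        }
    }
);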
For a sharded cluster, repeat the downgrade process by connecting to the primary replica set member for each shard.
Note: The cluster's mongos instances will fail to detect the authorization model downgrade until the user cache
is refreshed. You can run invalidateUserCache on each mongos instance to refresh immediately, or you can
wait until the cache is refreshed automatically at the end of the user cache invalidation interval. To
run invalidateUserCache, you must have a privilege with the invalidateUserCache (page 405) action, which
is granted by the userAdminAnyDatabase (page 396) and hostManager (page 394) roles.
Result The downgrade process returns the user data to its state prior to upgrading to 2.6 authorization model. Any
changes made to the user/role data using the 2.6 users model will be lost.
Downgrade Updated Indexes
Text Index Version Check If you have version 2 text indexes (i.e. the default version for text indexes in MongoDB
2.6), drop the version 2 text indexes before downgrading MongoDB. After the downgrade, enable text search and
recreate the dropped text indexes.
To determine the version of your text indexes, run db.collection.getIndexes() to view index specifications. For text indexes, the method returns the version information in the field textIndexVersion. For example,
the following shows that the text index on the quotes collection is version 2.
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "quote_text_translation.quote_text",
"ns" : "test.quotes",
"weights" : {
"quote" : 1,
"translation.quote" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 2
}
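Given the array that db.collection.getIndexes() returns, the version 2 text indexes can be picked out by their textIndexVersion field. The sketch below runs as plain JavaScript against a copy of the quotes specification shown above plus a default _id index; v2TextIndexNames is a hypothetical name. The returned names are what you would pass to db.collection.dropIndex() before downgrading.

```javascript
// Hypothetical filter over getIndexes() output: names of text indexes
// whose textIndexVersion is 2.
function v2TextIndexNames(indexSpecs) {
  return indexSpecs
    .filter(function (spec) { return spec.textIndexVersion === 2; })
    .map(function (spec) { return spec.name; });
}

const quotesIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.quotes" },
  {
    v: 1,
    key: { _fts: "text", _ftsx: 1 },
    name: "quote_text_translation.quote_text",
    ns: "test.quotes",
    weights: { quote: 1, "translation.quote": 1 },
    default_language: "english",
    language_override: "language",
    textIndexVersion: 2
  }
];
console.log(v2TextIndexNames(quotesIndexes)); // [ 'quote_text_translation.quote_text' ]
```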
2dsphere Index Version Check If you have version 2 2dsphere indexes (i.e. the default version for 2dsphere
indexes in MongoDB 2.6), drop the version 2 2dsphere indexes before downgrading MongoDB. After the downgrade, recreate the 2dsphere indexes.
To determine the version of your 2dsphere indexes, run db.collection.getIndexes() to view
index specifications.
For 2dsphere indexes, the method returns the version information in the field
2dsphereIndexVersion. For example, the following shows that the 2dsphere index on the locations
collection is version 2.
{
"v" : 1,
"key" : {
"geo" : "2dsphere"
},
"name" : "geo_2dsphere",
"ns" : "test.locations",
"sparse" : true,
"2dsphereIndexVersion" : 2
}
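Because the 2dsphere indexes must be recreated after the downgrade, it helps to record what each one looked like before dropping it. The sketch below filters getIndexes() output on 2dsphereIndexVersion and keeps the namespace, name, and key pattern; twoDSphereRebuildPlan is a hypothetical name, and other options (such as sparse in the example above) are deliberately left out so you can decide whether to reapply them on 2.4.

```javascript
// Hypothetical helper: namespace, name, and key pattern of each version 2
// 2dsphere index, for dropping now and recreating after the downgrade.
function twoDSphereRebuildPlan(indexSpecs) {
  return indexSpecs
    .filter(function (spec) { return spec["2dsphereIndexVersion"] === 2; })
    .map(function (spec) {
      return { ns: spec.ns, name: spec.name, key: spec.key };
    });
}

const locationIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.locations" },
  {
    v: 1,
    key: { geo: "2dsphere" },
    name: "geo_2dsphere",
    ns: "test.locations",
    sparse: true,
    "2dsphereIndexVersion": 2
  }
];
console.log(JSON.stringify(twoDSphereRebuildPlan(locationIndexes)));
// [{"ns":"test.locations","name":"geo_2dsphere","key":{"geo":"2dsphere"}}]
```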
rs.stepDown() expedites the failover procedure and is preferable to shutting down the primary directly.
Step 3: Replace and restart former primary mongod. When rs.status() shows that the primary has stepped
down and another member has assumed PRIMARY state, shut down the previous primary and replace the mongod
binary with the 2.4 binary and start the new instance.
Replica set failover is not instant but will render the set unavailable to writes and interrupt reads until the failover process completes. Typically this takes 10 seconds or more. You may wish to plan the downgrade during a predetermined
maintenance window.
Downgrade a 2.6 Sharded Cluster
Requirements While the downgrade is in progress, you cannot make changes to the collection meta-data. For
example, during the downgrade, do not do any of the following:
sh.enableSharding()
sh.shardCollection()
sh.addShard()
db.createCollection()
db.collection.drop()
db.dropDatabase()
any operation that creates a database
any other operation that modifies the cluster meta-data in any way. See Sharding Reference (page 715) for a complete list of sharding commands. Note, however, that not all commands on the Sharding Reference (page 715)
page modify the cluster meta-data.
Procedure The downgrade procedure for a sharded cluster reverses the order of the upgrade procedure.
1. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the Balancer (page 695).
2. Downgrade each shard, one at a time. For each shard,
(a) Downgrade the mongod secondaries before downgrading the primary.
(b) To downgrade the primary, run replSetStepDown and downgrade.
3. Downgrade all 3 mongod config server instances, leaving the first system in the mongos --configdb
argument to downgrade last.
4. Downgrade and restart each mongos, one at a time. The downgrade process is a binary drop-in replacement.
5. Turn on the balancer, as described in Enable the Balancer (page 696).
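Because the downgrade reverses the upgrade order, writing the sequence out explicitly can help avoid mistakes. This plain JavaScript sketch produces the step list for the procedure above; planShardedDowngrade, the shard description format, and the host names are hypothetical.

```javascript
// Hypothetical planner: balancer off, shards, config servers (first listed
// last), mongos instances, balancer on.
function planShardedDowngrade(configDB, shards, mongosHosts) {
  const steps = ["turn off balancer"];
  shards.forEach(function (shard) {
    shard.secondaries.forEach(function (host) {
      steps.push("downgrade secondary " + host);
    });
    steps.push("replSetStepDown on " + shard.primary);
    steps.push("downgrade former primary " + shard.primary);
  });
  const configServers = configDB.split(",");
  configServers.slice(1).concat(configServers[0]).forEach(function (host) {
    steps.push("downgrade config server " + host);
  });
  mongosHosts.forEach(function (host) {
    steps.push("downgrade and restart mongos " + host);
  });
  steps.push("turn on balancer");
  return steps;
}

const downPlan = planShardedDowngrade(
  "cfg1:27019,cfg2:27019,cfg3:27019",
  [{ name: "shard0", primary: "s0a:27018", secondaries: ["s0b:27018"] }],
  ["app1:27017", "app2:27017"]
);
downPlan.forEach(function (step, i) { console.log(i + 1 + ". " + step); });
```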
Downgrade Procedure Once upgraded to MongoDB 2.6, you cannot downgrade to any version earlier than MongoDB 2.4. If you have text or 2dsphere indexes, you can only downgrade to MongoDB 2.4.10 or later.
Except as described on this page, moving between 2.4 and 2.6 is a drop-in replacement:
Step 1: Stop the existing mongod instance. For example, on Linux, run 2.6 mongod with the --shutdown
option as follows:
mongod --dbpath /var/mongod/data --shutdown
Replace /var/mongod/data with your MongoDB dbPath. See also the Stop mongod Processes (page 223) for
alternate methods of stopping a mongod instance.
Step 2: Start the new mongod instance. Ensure you start the 2.4 mongod with the same dbPath:
mongod --dbpath /var/mongod/data
Other Resources
2.4.13 - Changes
Security: Enforce BSON BinData length validation (SERVER-17278441 )
Security: Disable SSLv3 ciphers (SERVER-15673442 )
Networking: Improve BSON validation (SERVER-17264443 )
2.4.12 - Changes
Sharding: Sharded connection cleanup on setup error can crash mongos (SERVER-15056444 )
Sharding: type 7 (OID) error when acquiring distributed lock for first time (SERVER-13616445 )
Storage: explicitly zero .ns files on creation (SERVER-15369446 )
Storage: partially written journal last section causes recovery to fail (SERVER-15111447 )
2.4.11 - Changes
Security: Potential information leak (SERVER-14268448 )
Replication: _id with $prefix field causes replication failure due to unvalidated insert (SERVER-12209449 )
Sharding: Invalid access: seg fault in SplitChunkCommand::run (SERVER-14342450 )
Indexing: Creating descending index on _id can corrupt namespace (SERVER-14833451 )
439 https://jira.mongodb.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SERVER+AND+fixVersion+in+%28%222.5.0%22%2C+%222.5.1%22%2
rc1%22%2C+%222.6.0-rc2%22%2C+%222.6.0-rc3%22%29
440 https://github.com/mongodb/mongo/blob/v2.6/distsrc/THIRD-PARTY-NOTICES
441 https://jira.mongodb.org/browse/SERVER-17278
442 https://jira.mongodb.org/browse/SERVER-15673
443 https://jira.mongodb.org/browse/SERVER-17264
444 https://jira.mongodb.org/browse/SERVER-15056
445 https://jira.mongodb.org/browse/SERVER-13616
446 https://jira.mongodb.org/browse/SERVER-15369
447 https://jira.mongodb.org/browse/SERVER-15111
448 https://jira.mongodb.org/browse/SERVER-14268
449 https://jira.mongodb.org/browse/SERVER-12209
450 https://jira.mongodb.org/browse/SERVER-14342
451 https://jira.mongodb.org/browse/SERVER-14833
Text Search: Updates to documents with text-indexed fields may lead to incorrect entries (SERVER-14738452 )
Build: Add SCons flag to override treating all warnings as errors (SERVER-13724453 )
Packaging: Fix mongodb enterprise 2.4 init script to allow multiple processes per host (SERVER-14336454 )
JavaScript: Do not store native function pointer as a property in function prototype (SERVER-14254455 )
2.4.10 - Changes
Indexes: Fixed issue that can cause index corruption when building indexes concurrently (SERVER-12990456 )
Indexes: Fixed issue that can cause index corruption when shutting down secondary node during index build
(SERVER-12956457 )
Indexes: Mongod now recognizes incompatible future text and geo index versions and exits gracefully
(SERVER-12914458 )
Indexes: Fixed issue that can cause secondaries to fail replication when building the same index multiple times
concurrently (SERVER-12662459 )
Indexes: Fixed issue that can cause index corruption on the tenth index in a collection if the index build fails
(SERVER-12481460 )
Indexes: Introduced versioning for text and geo indexes to ensure backwards compatibility (SERVER-12175461 )
Indexes: Disallowed building indexes on the system.indexes collection, which can lead to initial sync failure on
secondaries (SERVER-10231462 )
Sharding: Avoid frequent immediate balancer retries when config servers are out of sync (SERVER-12908463 )
Sharding: Add indexes to locks collection on config servers to avoid long queries in case of large numbers of
collections (SERVER-12548464 )
Sharding: Fixed issue that can corrupt the config metadata cache when sharding collections concurrently
(SERVER-12515465 )
Sharding: Don't move chunks created on collections with a hashed shard key if the collection already contains
data (SERVER-9259466 )
Replication: Fixed issue where node appears to be down in a replica set during a compact operation (SERVER-12264467 )
Replication: Fixed issue that could cause delays in elections when a node is not vetoing an election (SERVER-12170468 )
452 https://jira.mongodb.org/browse/SERVER-14738
453 https://jira.mongodb.org/browse/SERVER-13724
454 https://jira.mongodb.org/browse/SERVER-14336
455 https://jira.mongodb.org/browse/SERVER-14254
456 https://jira.mongodb.org/browse/SERVER-12990
457 https://jira.mongodb.org/browse/SERVER-12956
458 https://jira.mongodb.org/browse/SERVER-12914
459 https://jira.mongodb.org/browse/SERVER-12662
460 https://jira.mongodb.org/browse/SERVER-12481
461 https://jira.mongodb.org/browse/SERVER-12175
462 https://jira.mongodb.org/browse/SERVER-10231
463 https://jira.mongodb.org/browse/SERVER-12908
464 https://jira.mongodb.org/browse/SERVER-12548
465 https://jira.mongodb.org/browse/SERVER-12515
466 https://jira.mongodb.org/browse/SERVER-9259
467 https://jira.mongodb.org/browse/SERVER-12264
468 https://jira.mongodb.org/browse/SERVER-12170
Replication: Step down all primaries if multiple primaries are detected in replica set to ensure correct election
result (SERVER-10793469 )
Replication: Upon clock skew detection, secondaries will switch to sync directly from the primary to avoid sync
cycles (SERVER-8375470 )
Runtime: The SIGXCPU signal is now caught and mongod writes a log message and exits gracefully (SERVER-12034471 )
Runtime: Fixed issue where mongod fails to start on Linux when /sys/dev/block directory is not readable
(SERVER-9248472 )
Windows: No longer zero-fill newly allocated files on systems other than Windows 7 or Windows Server 2008
R2 (SERVER-8480473 )
GridFS: Chunk size is decreased to 255 KB (from 256 KB) to avoid overhead with usePowerOf2Sizes option
(SERVER-13331474 )
SNMP: Fixed MIB file validation under smilint (SERVER-12487475 )
Shell: Fixed issue in V8 memory allocation that could cause long-running shell commands to crash (SERVER-11871476 )
Shell: Fixed memory leak in the md5sumFile shell utility method (SERVER-11560477 )
Previous Releases
All 2.4.9 improvements478 .
All 2.4.8 improvements479 .
All 2.4.7 improvements480 .
All 2.4.6 improvements481 .
All 2.4.5 improvements482 .
All 2.4.4 improvements483 .
All 2.4.3 improvements484 .
All 2.4.2 improvements485 .
All 2.4.1 improvements486 .
469 https://jira.mongodb.org/browse/SERVER-10793
470 https://jira.mongodb.org/browse/SERVER-8375
471 https://jira.mongodb.org/browse/SERVER-12034
472 https://jira.mongodb.org/browse/SERVER-9248
473 https://jira.mongodb.org/browse/SERVER-8480
474 https://jira.mongodb.org/browse/SERVER-13331
475 https://jira.mongodb.org/browse/SERVER-12487
476 https://jira.mongodb.org/browse/SERVER-11871
477 https://jira.mongodb.org/browse/SERVER-11560
478 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.9%22%20AND%20project%20%3D%20SERVER
479 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.8%22%20AND%20project%20%3D%20SERVER
480 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.7%22%20AND%20project%20%3D%20SERVER
481 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.6%22%20AND%20project%20%3D%20SERVER
482 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.5%22%20AND%20project%20%3D%20SERVER
483 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.4%22%20AND%20project%20%3D%20SERVER
484 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.3%22%20AND%20project%20%3D%20SERVER
485 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.2%22%20AND%20project%20%3D%20SERVER
486 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.1%22%20AND%20project%20%3D%20SERVER
Fix for instances where mongos incorrectly reports a successful write SERVER-12146506.
Make non-primary read preferences consistent with slaveOK versioning logic SERVER-11971507.
Allow new sharded cluster connections to read from secondaries when primary is down SERVER-7246508.
All 2.4.9 improvements509.
2.4.8 November 1, 2013
Fix for possible loss of documents during the chunk migration process if a document in the chunk is very large SERVER-10478518.
Fix for C++ client shutdown issues SERVER-8891519.
Improved replication robustness in presence of high network latency SERVER-10085520.
Improved Solaris support SERVER-9832521, SERVER-9786522, and SERVER-7080523.
All 2.4.6 improvements524.
2.4.5 July 3, 2013
Fix for CVE-2013-4650: improperly granted user system privileges on databases other than local SERVER-9983525.
Fix for CVE-2013-3969: remotely triggered segmentation fault in the JavaScript engine SERVER-9878526.
Fix to prevent identical background indexes from being built SERVER-9856527.
Config server performance improvements SERVER-9864528 and SERVER-5442529.
Improved initial sync resilience to network failure SERVER-9853530.
All 2.4.5 improvements531.
2.4.4 June 4, 2013
Fix for mongo shell ignoring a modified object's _id field SERVER-9385536.
518 https://jira.mongodb.org/browse/SERVER-10478
519 https://jira.mongodb.org/browse/SERVER-8891
520 https://jira.mongodb.org/browse/SERVER-10085
521 https://jira.mongodb.org/browse/SERVER-9832
522 https://jira.mongodb.org/browse/SERVER-9786
523 https://jira.mongodb.org/browse/SERVER-7080
524 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.6%22%20AND%20project%20%3D%20SERVER
525 https://jira.mongodb.org/browse/SERVER-9983
526 https://jira.mongodb.org/browse/SERVER-9878
527 https://jira.mongodb.org/browse/SERVER-9856
528 https://jira.mongodb.org/browse/SERVER-9864
529 https://jira.mongodb.org/browse/SERVER-5442
530 https://jira.mongodb.org/browse/SERVER-9853
531 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.5%22%20AND%20project%20%3D%20SERVER
532 https://jira.mongodb.org/browse/SERVER-9721
533 https://jira.mongodb.org/browse/SERVER-9661
534 https://jira.mongodb.org/browse/SERVER-8813
535 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.4%22%20AND%20project%20%3D%20SERVER
536 https://jira.mongodb.org/browse/SERVER-9385
Add support for text search of content in MongoDB databases as a beta feature. See Text Indexes (page 486) for more
information.
Geospatial Support Enhancements
Add new 2dsphere index (page 478). The new index supports GeoJSON547 objects Point, LineString, and
Polygon. See 2dsphere Indexes (page 478) and Geospatial Indexes and Queries (page 476).
Introduce operators $geometry, $geoWithin and $geoIntersects to work with the GeoJSON data.
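The following is a minimal sketch of a query document that combines $geoWithin with a $geometry Polygon. The helper polygonWithinQuery, the collection places, and the field loc are hypothetical names chosen for illustration. Before building the query, the helper checks that the ring is closed: GeoJSON requires a polygon's linear ring to have at least four positions, with the first and last positions identical.

```javascript
// Hypothetical helper: builds a $geoWithin query for a GeoJSON Polygon.
// `ring` is an array of [longitude, latitude] pairs forming one linear ring.
function polygonWithinQuery(field, ring) {
  var first = ring[0];
  var last = ring[ring.length - 1];
  // GeoJSON linear rings are closed and have four or more positions.
  if (ring.length < 4 || first[0] !== last[0] || first[1] !== last[1]) {
    throw new Error("polygon ring must be closed and have at least 4 positions");
  }
  var query = {};
  query[field] = {
    $geoWithin: {
      $geometry: { type: "Polygon", coordinates: [ring] }
    }
  };
  return query;
}

// Possible usage in the mongo shell (collection and field are assumptions):
// db.places.ensureIndex({ loc: "2dsphere" })
// db.places.find(polygonWithinQuery("loc", [[0, 0], [3, 6], [6, 1], [0, 0]]))
```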
Hashed Index
Add new hashed index (page 487) to index documents using hashes of field values. When used to index a shard key,
the hashed index ensures an evenly distributed shard key. See also Hashed Shard Keys (page 654).
537 https://jira.mongodb.org/browse/SERVER-4739
538 https://jira.mongodb.org/browse/SERVER-9093
539 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.3%22%20AND%20project%20%3D%20SERVER
540 https://jira.mongodb.org/browse/SERVER-9267
541 https://jira.mongodb.org/browse/SERVER-9230
542 https://jira.mongodb.org/browse/SERVER-9125
543 https://jira.mongodb.org/browse/SERVER-9014
544 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.2%22%20AND%20project%20%3D%20SERVER
545 https://jira.mongodb.org/browse/SERVER-9087
546 https://jira.mongodb.org/issues/?jql=fixVersion%20%3D%20%222.4.1%22%20AND%20project%20%3D%20SERVER
547 http://geojson.org/geojson-spec.html
Improve support for geospatial queries. See the $geoWithin operator and the $geoNear pipeline stage.
Improve sort efficiency when the $sort stage immediately precedes a $limit in the pipeline.
Add new operators $millisecond and $concat and modify how $min operator processes null values.
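The following is a hedged sketch of a pipeline that benefits from the $sort and $limit improvement and also uses $concat. The collection players and its fields first, last, and score are assumptions. Because $limit directly follows $sort, the server can keep only the top n documents while it sorts, rather than sorting the whole input.

```javascript
// Hypothetical pipeline builder; field names are illustrative only.
function topScorersPipeline(n) {
  return [
    { $sort: { score: -1 } },   // $sort immediately followed by...
    { $limit: n },              // ...$limit, the pattern optimized in 2.4
    { $project: {
        name: { $concat: ["$first", " ", "$last"] },  // string concatenation
        score: 1
    } }
  ];
}

// Possible usage in the mongo shell:
// db.players.aggregate(topScorersPipeline(10))
```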
Changes to Update Operators
The mapReduce command, group command, and the $where operator expressions cannot access certain global
functions or properties, such as db, that are available in the mongo shell. See the individual command or operator for
details.
Improvements to serverStatus Command
Provide additional metrics and customization for the serverStatus command. See db.serverStatus() and
serverStatus for more information.
Security Enhancements
Introduce a role-based access control system; User Privileges548 now use a new format for Privilege Documents.
Enforce uniqueness of the user in user privilege documents per database. Previous versions of MongoDB did
not enforce this requirement, and existing databases may have duplicates.
Support encrypted connections using SSL certificates signed by a Certificate Authority. See Configure mongod
and mongos for SSL (page 331).
For more information on security and risk management strategies, see MongoDB Security Practices and Procedures
(page 305).
Performance Improvements
V8 JavaScript Engine
JavaScript Changes in MongoDB 2.4 Consider the following impacts of V8 JavaScript Engine (page 848) in MongoDB 2.4:
Tip
Use the new interpreterVersion() method in the mongo shell and the javascriptEngine field in the
output of db.serverBuildInfo() to determine which JavaScript engine a MongoDB binary uses.
548 http://docs.mongodb.org/v2.4/reference/user-privileges
Improved Concurrency Previously, MongoDB operations that required the JavaScript interpreter had to acquire
a lock, and a single mongod could only run a single JavaScript operation at a time. The switch to V8 improves
concurrency by permitting multiple JavaScript operations to run at the same time.
Modernized JavaScript Implementation (ES5) The 5th edition of ECMAScript549, abbreviated as ES5, adds many new language features, including:
standardized JSON550 ,
strict mode551 ,
function.bind()552 ,
array extensions553 , and
getters and setters.
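The sketch below exercises each ES5 feature in the list above with plain JavaScript that does not depend on MongoDB. The function name es5Demo and the sample values are illustrative. It is written in ES5 syntax (var and function expressions), so it should also run in the V8-based 2.4 mongo shell.

```javascript
function es5Demo() {
  "use strict"; // strict mode applies to this function only

  // Standardized JSON
  var json = JSON.stringify({ name: "MongoDB", version: 2.4 });
  var parsed = JSON.parse(json);

  // Function.prototype.bind(): fix the value of `this`
  var counter = { base: 10, add: function (x) { return this.base + x; } };
  var addToBase = counter.add.bind(counter);

  // Array extensions: filter() and map()
  var squaresOfEvens = [1, 2, 3, 4]
    .filter(function (n) { return n % 2 === 0; })
    .map(function (n) { return n * n; });

  // Getter and setter defined with Object.defineProperty()
  var temp = { _c: 0 };
  Object.defineProperty(temp, "f", {
    get: function () { return this._c * 9 / 5 + 32; },
    set: function (v) { this._c = (v - 32) * 5 / 9; }
  });
  temp.f = 212; // the setter stores the Celsius value

  return {
    json: json,
    parsedVersion: parsed.version,
    added: addToBase(5),
    squares: squaresOfEvens,
    celsius: temp._c,
    fahrenheit: temp.f
  };
}
```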
With V8, MongoDB supports the ES5 implementation of Javascript with the following exceptions.
Note: The following features do not work as expected on documents returned from MongoDB queries:
Object.seal() throws an exception on documents returned from MongoDB queries.
Object.freeze() throws an exception on documents returned from MongoDB queries.
Object.preventExtensions() incorrectly allows the addition of new properties on documents returned
from MongoDB queries.
enumerable properties, when added to documents returned from MongoDB queries, are not saved during
write operations.
See SERVER-8216554, SERVER-8223555, SERVER-8215556, and SERVER-8214557 for more information.
For objects that have not been returned from MongoDB queries, the features work as expected.
Removed Non-Standard SpiderMonkey Features V8 does not support the following non-standard SpiderMonkey558 JavaScript extensions, which MongoDB previously supported through its use of SpiderMonkey as its JavaScript engine.
E4X Extensions V8 does not support the non-standard E4X559 extensions. E4X provides a native XML560 object
to the JavaScript language and adds the syntax for embedding literal XML documents in JavaScript code.
You need to use alternative XML processing if you used any of the following constructors/methods:
XML()
Namespace()
QName()
XMLList()
isXMLName()
549 http://www.ecma-international.org/publications/standards/Ecma-262.htm
550 http://www.ecma-international.org/ecma-262/5.1/#sec-15.12.1
551 http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2
552 http://www.ecma-international.org/ecma-262/5.1/#sec-15.3.4.5
553 http://www.ecma-international.org/ecma-262/5.1/#sec-15.4.4.16
554 https://jira.mongodb.org/browse/SERVER-8216
555 https://jira.mongodb.org/browse/SERVER-8223
556 https://jira.mongodb.org/browse/SERVER-8215
557 https://jira.mongodb.org/browse/SERVER-8214
558 https://developer.mozilla.org/en-US/docs/SpiderMonkey
559 https://developer.mozilla.org/en-US/docs/E4X
560 https://developer.mozilla.org/en-US/docs/E4X/Processing_XML_with_E4X
Destructuring Assignment V8 does not support the non-standard destructuring assignments. Destructuring assignment "extract[s] data from arrays or objects using a syntax that mirrors the construction of array and object literals." (Mozilla docs561)
Example
The following destructuring assignment is invalid with V8 and throws a SyntaxError:
var a = [4, 8, 15];
var [b, , c] = a; // <== destructuring assignment
print(b) // 4
print(c) // 15
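In ES5, the destructuring example above becomes plain index and property reads. The sketch below shows one possible rewrite. The object o and the variable n are illustrative additions that show the object form of destructuring.

```javascript
// ES5 replacement for `var [b, , c] = a;`: read the elements by index.
var a = [4, 8, 15];
var b = a[0]; // first element
var c = a[2]; // third element (the second is skipped)

// Object destructuring such as `var { name: n } = o;` becomes a property read.
var o = { name: "MongoDB", version: 2.4 };
var n = o.name;
```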
Iterator(), StopIteration(), and Generators V8 does not support Iterator(), StopIteration(), and generators562 .
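Without Iterator() and generators, you can build an explicit iterator from a closure. The sketch below is one hypothetical way to do this. The makeIterator and collect helpers are illustrative. They use the same hasNext()/next() shape as mongo shell cursors, and they check hasNext() instead of catching StopIteration.

```javascript
// Hypothetical replacement for an Iterator()/generator loop.
function makeIterator(arr) {
  var index = 0; // position of the next element to return
  return {
    hasNext: function () { return index < arr.length; },
    next: function () {
      if (index >= arr.length) {
        throw new Error("iterator exhausted");
      }
      return arr[index++];
    }
  };
}

// Drains an iterator into an array by checking hasNext() before each next().
function collect(iter) {
  var out = [];
  while (iter.hasNext()) {
    out.push(iter.next());
  }
  return out;
}
```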
InternalError() V8 does not support InternalError(). Use Error() instead.
for each...in Construct V8 does not support the for each...in563 construct. Use the for (var x in y) construct instead.
Example
The following for each (var x in y) construct is invalid with V8:
var o = { name: 'MongoDB', version: 2.4 };
for each (var value in o) {
  print(value);
}
Instead, in version 2.4, you can use the for (var x in y) construct:
var o = { name: 'MongoDB', version: 2.4 };
for (var prop in o) {
  var value = o[prop];
  print(value);
}
You can also use the array instance method forEach() with the ES5 method Object.keys():
Object.keys(o).forEach(function (key) {
  var value = o[key];
  print(value);
});
561 https://developer.mozilla.org/en-US/docs/JavaScript/New_in_JavaScript/1.7#Destructuring_assignment_(Merge_into_own_page.2Fsection)
562 https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Iterators_and_Generators
563 https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Statements/for_each...in
Instead, you can implement it using the Array instance method forEach() and the ES5 method Object.keys():
var a = { w: 1, x: 2, y: 3, z: 4 };
var arr = [];
Object.keys(a).forEach(function (key) {
  var val = a[key];
  if (val > 2) arr.push(val * val);
});
printjson(arr);
Note:
The new logic uses the Array instance method forEach() and not the generic method
Array.forEach(); V8 does not support Array generic methods. See Array Generic Methods (page 853) for
more information.
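Another option, offered here as a suggestion rather than as the manual's recommended form, is to chain the ES5 Array instance methods map() and filter() after Object.keys(). This removes the explicit push() from the forEach() version. Like that version, it uses only instance methods, so it does not rely on Array generics.

```javascript
var a = { w: 1, x: 2, y: 3, z: 4 };
var arr = Object.keys(a)
  .map(function (key) { return a[key]; })      // values of a
  .filter(function (val) { return val > 2; })  // keep values greater than 2
  .map(function (val) { return val * val; });  // square them
```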
Multiple Catch Blocks V8 does not support multiple catch blocks and will throw a SyntaxError.
Example
The following multiple catch blocks are invalid with V8 and will throw "SyntaxError: Unexpected token if":
try {
  something()
} catch (err if err instanceof SomeError) {
  print('some error')
} catch (err) {
  print('standard error')
}
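V8 accepts only a single catch clause. You can therefore move the conditional catch branches into an if/else inside that one clause. In the sketch below, SomeError and runAndClassify are hypothetical stand-ins for the names in the invalid example.

```javascript
// Hypothetical custom error type, standing in for SomeError above.
function SomeError(message) {
  this.name = "SomeError";
  this.message = message;
}
SomeError.prototype = Object.create(Error.prototype);
SomeError.prototype.constructor = SomeError;

// One catch clause; the error type is tested inside it.
function runAndClassify(fn) {
  try {
    fn();
    return "no error";
  } catch (err) {
    if (err instanceof SomeError) {
      return "some error";
    }
    return "standard error";
  }
}
```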
Conditional Function Definition V8 will produce different outcomes than SpiderMonkey with conditional function
definitions565 .
Example
The following conditional function definition produces different outcomes in SpiderMonkey versus V8:
function test () {
  if (false) {
    function go () {};
  }
  print(typeof go)
}
564 https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Predefined_Core_Objects#Array_comprehensions
565 https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Functions
With SpiderMonkey, the conditional function outputs undefined, whereas with V8, the conditional function outputs
function.
If your code defines functions this way, it is highly recommended that you refactor the code. The following example
refactors the conditional function definition to work in both SpiderMonkey and V8.
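function test () {
  var go;
  if (false) {
    go = function () {}
  }
  print(typeof go)
}
ECMAScript 5 does not permit function declarations inside blocks. To make V8 reject conditional function definitions such as the one in the first example, enable strict mode566. V8 then reports: SyntaxError: In strict mode code, functions can only be declared at top level or immediately within another function.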
String Generic Methods V8 does not support String generics567. String generics are a set of methods on the String class that mirror instance methods.
Example
The following use of the generic method String.toLowerCase() is invalid with V8:
var name = 'MongoDB';
var lower = String.toLowerCase(name);
With V8, use the String instance method toLowerCase() available through an instance of the String class
instead:
var name = 'MongoDB';
var lower = name.toLowerCase();
print(name + ' becomes ' + lower);
566 http://www.nczonline.net/blog/2012/03/13/its-time-to-start-using-javascript-strict-mode/
567 https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String#String_generic_methods
With V8, use the String instance methods instead of the following generic methods:
String.charAt()
String.charCodeAt()
String.concat()
String.endsWith()
String.indexOf()
String.lastIndexOf()
String.localeCompare()
String.match()
String.quote()
String.replace()
String.search()
String.slice()
String.split()
String.startsWith()
String.substr()
String.substring()
String.toLocaleLowerCase()
String.toLocaleUpperCase()
String.toLowerCase()
String.toUpperCase()
String.trim()
String.trimLeft()
String.trimRight()
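Code sometimes passed a String generic around as a standalone function, for example to map(). In that case, you can call the instance method through String.prototype with call(). The lowerAll helper below is a hypothetical example of this pattern. Because the method converts its this value to a string, it also accepts non-string values.

```javascript
// Hypothetical helper: lower-cases every value using the instance method
// String.prototype.toLowerCase invoked with call().
function lowerAll(values) {
  return values.map(function (s) {
    return String.prototype.toLowerCase.call(s);
  });
}
```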
Array Generic Methods V8 does not support Array generic methods568. Array generics are a set of methods on the Array class that mirror instance methods.
Example
The following use of the generic method Array.every() is invalid with V8:
var arr = [4, 8, 15, 16, 23, 42];
function isEven (val) {
  return 0 === val % 2;
}
var allEven = Array.every(arr, isEven);
print(allEven);
With V8, use the Array instance method every() available through an instance of the Array class instead:
var allEven = arr.every(isEven);
print(allEven);
With V8, use the Array instance methods instead of the following generic methods:
Array.concat()
Array.every()
Array.filter()
Array.forEach()
Array.indexOf()
Array.join()
Array.lastIndexOf()
Array.map()
Array.pop()
Array.push()
Array.reverse()
Array.shift()
Array.slice()
Array.some()
Array.sort()
Array.splice()
Array.unshift()
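Array generics were often applied to array-like objects such as arguments, which do not have the Array instance methods. A common ES5 replacement is to call the method through Array.prototype. The helpers allArgsEven and argsToArray below are hypothetical examples of this pattern.

```javascript
function isEven(val) {
  return 0 === val % 2;
}

// `arguments` is array-like but is not an Array, so it lacks every() and slice().
function allArgsEven() {
  return Array.prototype.every.call(arguments, isEven);
}

// Converts `arguments` into a real Array.
function argsToArray() {
  return Array.prototype.slice.call(arguments);
}
```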
Array Instance Method toSource() V8 does not support the Array instance method toSource()569. Use the Array instance method toString() instead.
uneval() V8 does not support the non-standard method uneval(). Use the standardized JSON.stringify()570
method instead.
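The snippet below sketches what the replacements for uneval() and toSource() produce. JSON.stringify() emits JSON rather than JavaScript source, and JSON.parse() reverses it. Unlike uneval(), JSON.stringify() omits function-valued properties instead of serializing them. Array toString() returns only the comma-joined elements. The sample values are illustrative.

```javascript
var doc = { name: "MongoDB", tags: ["db", "nosql"], version: 2.4 };
var text = JSON.stringify(doc);         // JSON text, not JavaScript source
var roundTrip = JSON.parse(text);       // back to a plain object
var withFn = JSON.stringify({ f: function () {} }); // function property is dropped
var arrText = [4, 8, 15].toString();    // comma-joined elements, no brackets
```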
Change the default JavaScript engine from SpiderMonkey to V8. The change provides improved concurrency for JavaScript operations and a modernized JavaScript implementation, and removes non-standard SpiderMonkey features. It affects all JavaScript behavior, including the mapReduce, group, and eval commands and the $where query operator.
568 https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array#Array_generic_methods
569 https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/toSource
570 https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/JSON/stringify
See JavaScript Changes in MongoDB 2.4 (page 848) for more information about all changes.
BSON Document Validation Enabled by Default for mongod and mongorestore
Enable basic BSON object validation for mongod and mongorestore when writing to MongoDB data files. See
wireObjectCheck for details.
Index Build Enhancements
Add support for multiple concurrent index builds in the background by a single mongod instance. See building
indexes in the background (page 493) for more information on background index builds.
Allow the db.killOp() method to terminate a foreground index build.
Improve index validation during index creation. See Compatibility and Index Type Changes in MongoDB 2.4
(page 862) for more information.
Set Parameters as Command Line Options
Provide --setParameter as a command line option for mongos and mongod. See mongod and mongos for the list of available options for setParameter.
Changed Replication Behavior for Chunk Migration
By default, each document move during chunk migration (page 664) in a sharded cluster propagates to at least one
secondary before the balancer proceeds with its next operation. See Chunk Migration and Replication (page 666).
Improved Chunk Migration Queue Behavior
Increase performance for moving multiple chunks off an overloaded shard. The balancer no longer waits for the current migration's delete phase to complete before starting the next chunk migration. See Chunk Migration Queuing (page 665) for details.
Enterprise
The following changes are specific to MongoDB Enterprise Editions:
SASL Library Change
In 2.4.4, MongoDB Enterprise uses Cyrus SASL. Earlier 2.4 Enterprise versions use GNU SASL (libgsasl). To
upgrade to 2.4.4 MongoDB Enterprise or greater, you must install all package dependencies related to this change,
including the appropriate Cyrus SASL GSSAPI library. See Install MongoDB Enterprise (page 29) for details of the
dependencies.
In 2.4, MongoDB Enterprise supports authentication via a Kerberos mechanism. See Configure MongoDB with Kerberos Authentication on Linux (page 359) for more information. For drivers that provide support for Kerberos authentication to MongoDB, refer to Driver Support (page 320).
For more information on security and risk management strategies, see MongoDB Security Practices and Procedures
(page 305).
Additional Information
Platform Notes
For OS X, MongoDB 2.4 only supports OS X versions 10.6 (Snow Leopard) and later. There are no other platform
support changes in MongoDB 2.4. See the downloads page571 for more information on platform support.
Upgrade Process
Upgrade MongoDB to 2.4 In the general case, the upgrade from MongoDB 2.2 to 2.4 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 2.4. However, before you attempt any upgrade, familiarize yourself with the content of this document, particularly the procedure for upgrading sharded clusters (page 856) and the considerations for reverting to 2.2 after running 2.4 (page 860).
Upgrade Recommendations and Checklist When upgrading, consider the following:
For all deployments using authentication, upgrade the drivers (i.e. client libraries) before upgrading the mongod instance or instances.
To upgrade a sharded cluster to 2.4, you must follow the meta-data upgrade procedure (page 856).
If you're using 2.2.0 and running with authorization enabled, you will need to upgrade first to 2.2.1 and then upgrade to 2.4. See Rolling Upgrade Limitation for 2.2.0 Deployments Running with auth Enabled (page 860).
If you have system.users documents (i.e. for authorization) that you created before 2.4 you must
ensure that there are no duplicate values for the user field in the system.users collection in any database.
If you do have documents with duplicate user fields, you must remove them before upgrading.
See Security Enhancements (page 848) for more information.
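The checklist above requires you to remove duplicate user values from system.users. The sketch below is one hedged way to find them. findDuplicateUsers is a hypothetical helper that operates on an array of system.users documents. The loop over all databases, shown as comments, is an assumed mongo shell usage and needs a user with privileges to read every database.

```javascript
// Hypothetical helper: returns user names that appear more than once in
// the documents of a single database's system.users collection.
function findDuplicateUsers(userDocs) {
  var counts = Object.create(null); // no inherited keys
  var dups = [];
  userDocs.forEach(function (doc) {
    counts[doc.user] = (counts[doc.user] || 0) + 1;
    if (counts[doc.user] === 2) {
      dups.push(doc.user); // report each duplicated name once
    }
  });
  return dups;
}

// Possible usage in the mongo shell:
// db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
//   var users = db.getSiblingDB(d.name).system.users.find().toArray();
//   var dups = findDuplicateUsers(users);
//   if (dups.length) { print(d.name + ": " + dups.join(", ")); }
// });
```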
Upgrade Standalone mongod Instance to MongoDB 2.4
1. Download binaries of the latest release in the 2.4 series from the MongoDB Download Page572 . See Install
MongoDB (page 5) for more information.
2. Shut down your mongod instance. Replace the existing binary with the 2.4 mongod binary and restart mongod.
Upgrade a Replica Set from MongoDB 2.2 to MongoDB 2.4 To minimize downtime, perform a rolling upgrade: upgrade the members of the set one at a time while the other members remain available.
Use the following procedure:
571 http://www.mongodb.org/downloads/
572 http://www.mongodb.org/downloads
1. Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 2.2 binary with the 2.4 binary. After upgrading a mongod instance, wait for the member to recover to SECONDARY state before upgrading the next instance. To check the member's state, issue rs.status() in the mongo shell.
2. Use the mongo shell method rs.stepDown() to step down the primary to allow the normal failover (page 560) procedure. rs.stepDown() expedites the failover procedure and is preferable to shutting down the primary directly.
Once the primary has stepped down and another member has assumed PRIMARY state, as observed in the output of rs.status(), shut down the previous primary, replace the mongod binary with the 2.4 binary, and start the new process.
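Step 1 waits for each upgraded member to return to SECONDARY state before you move on. The hypothetical helper below summarizes rs.status() output for that check. It reads the name and stateStr fields of each entry in members, and it treats PRIMARY, SECONDARY, and ARBITER as settled states. That choice of states is an assumption for this sketch.

```javascript
// Hypothetical helper: summarizes rs.status() output for a rolling upgrade.
function membersSettled(status) {
  var settled = ["PRIMARY", "SECONDARY", "ARBITER"];
  var notReady = status.members.filter(function (m) {
    return settled.indexOf(m.stateStr) === -1;
  });
  return {
    ok: notReady.length === 0,
    notReady: notReady.map(function (m) { return m.name + " (" + m.stateStr + ")"; })
  };
}

// Possible usage in the mongo shell:
// printjson(membersSettled(rs.status()))
```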
Note: Replica set failover is not instant but will render the set unavailable to read or accept writes until the
failover process completes. Typically this takes 10 seconds or more. You may wish to plan the upgrade during
a predefined maintenance window.
Overview Upgrading a sharded cluster from MongoDB version 2.2 to 2.4 (or 2.3) requires that you run a 2.4
mongos with the --upgrade option, described in this procedure. The upgrade process does not require downtime.
The upgrade to MongoDB 2.4 adds epochs to the meta-data for all collections and chunks in the existing cluster.
MongoDB 2.2 processes are capable of handling epochs, even though 2.2 did not require them. This procedure applies
only to upgrades from version 2.2. Earlier versions of MongoDB do not correctly handle epochs. See Cluster Metadata Upgrade (page 856) for more information.
After completing the meta-data upgrade you can fully upgrade the components of the cluster. With the balancer
disabled:
Upgrade all mongos instances in the cluster.
Upgrade all 3 mongod config server instances.
Upgrade the mongod instances for each shard, one at a time.
See Upgrade Sharded Cluster Components (page 859) for more information.
Cluster Meta-data Upgrade
Considerations Beware of the following properties of the cluster upgrade process:
Before you start the upgrade, ensure that the amount of free space on the filesystem for the config database
(page 716) is at least 4 to 5 times the amount of space currently used by the config database (page 716) data
files.
Additionally, ensure that all indexes in the config database (page 716) are {v:1} indexes. If a critical index is
a {v:0} index, chunk splits can fail due to known issues with the {v:0} format. {v:0} indexes are present
on clusters created with MongoDB 2.0 or earlier.
The duration of the metadata upgrade depends on the network latency between the node performing the upgrade
and the three config servers. Ensure low latency between the upgrade process and the config servers.
While the upgrade is in progress, you cannot make changes to the collection meta-data. For example, during the
upgrade, do not perform:
sh.enableSharding(),
sh.shardCollection(),
sh.addShard(),
db.createCollection(),
db.collection.drop(),
db.dropDatabase(),
any operation that creates a database, or
any other operation that modifies the cluster meta-data in any way. See Sharding Reference (page 715) for a complete list of sharding commands. Note, however, that not all commands on the Sharding Reference (page 715) page modify the cluster meta-data.
Once you upgrade to 2.4 and complete the upgrade procedure, do not use 2.0 mongod and mongos processes in your cluster. 2.0 processes may re-introduce old meta-data formats into cluster meta-data.
The upgraded config database will require more storage space than before, to make backup and working copies of the
config.chunks (page 718) and config.collections (page 719) collections. As always, if storage requirements increase, the mongod might need to pre-allocate additional data files. See How can I get information on the
storage use of a database? (page 755) for more information.
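The considerations above require every index in the config database to be a {v:1} index. The hypothetical helper findV0Indexes filters the output of db.collection.getIndexes() for that check. It also treats an index document with no v field as version 0. That handling is an assumption about index specifications created by old releases, so confirm the result against your own getIndexes() output.

```javascript
// Hypothetical helper: returns "<ns> <name>" for each {v:0} index.
// An index with no `v` field is assumed to be version 0.
function findV0Indexes(indexes) {
  return indexes
    .filter(function (ix) { return (ix.v === undefined ? 0 : ix.v) === 0; })
    .map(function (ix) { return ix.ns + " " + ix.name; });
}

// Possible usage in the mongo shell, connected to a mongos:
// var configDB = db.getSiblingDB("config");
// configDB.getCollectionNames().forEach(function (name) {
//   var found = findV0Indexes(configDB.getCollection(name).getIndexes());
//   if (found.length) { printjson(found); }
// });
```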
Meta-data Upgrade Procedure Changes to the meta-data format for sharded clusters, stored in the config database
(page 716), require a special meta-data upgrade procedure when moving to 2.4.
Do not perform operations that modify meta-data while performing this procedure. See Upgrade a Sharded Cluster
from MongoDB 2.2 to MongoDB 2.4 (page 856) for examples of prohibited operations.
1. Before you start the upgrade, ensure that the amount of free space on the filesystem for the config database
(page 716) is at least 4 to 5 times the amount of space currently used by the config database (page 716) data
files.
Additionally, ensure that all indexes in the config database (page 716) are {v:1} indexes. If a critical index is
a {v:0} index, chunk splits can fail due to known issues with the {v:0} format. {v:0} indexes are present
on clusters created with MongoDB 2.0 or earlier.
The duration of the metadata upgrade depends on the network latency between the node performing the upgrade
and the three config servers. Ensure low latency between the upgrade process and the config servers.
To check the version of your indexes, use db.collection.getIndexes().
If any index on the config database is {v:0}, you should rebuild those indexes by connecting to the mongos
and either: rebuild all indexes using the db.collection.reIndex() method, or drop and rebuild specific
indexes using db.collection.dropIndex() and then db.collection.ensureIndex(). If you
need to upgrade the _id index to {v:1} use db.collection.reIndex().
You may have {v:0} indexes on other databases in the cluster.
2. Turn off the balancer (page 663) in the sharded cluster, as described in Disable the Balancer (page 695).
Optional
For additional security during the upgrade, you can make a backup of the config database using mongodump
or other backup tools.
3. Ensure there are no version 2.0 mongod or mongos processes still active in the sharded cluster. The automated
upgrade process checks for 2.0 processes, but network availability can prevent a definitive check. Wait 5 minutes
after stopping or upgrading version 2.0 mongos processes to confirm that none are still active.
4. Start a single 2.4 mongos process with configDB pointing to the sharded cluster's config servers (page 650) and with the --upgrade option. The upgrade process happens before the process becomes a daemon (i.e. before --fork).
You can upgrade an existing mongos instance to 2.4 or you can start a new mongos instance that can reach all
config servers if you need to avoid reconfiguring a production mongos.
Start the mongos with a command that resembles the following:
mongos --configdb <config servers> --upgrade
Without the --upgrade option, 2.4 mongos processes will fail to start until the upgrade process is complete.
The upgrade will prevent any chunk moves or splits from occurring during the upgrade process. If there are
very many sharded collections or there are stale locks held by other failed processes, acquiring the locks for all
collections can take seconds or minutes. See the log for progress updates.
5. When the mongos process starts successfully, the upgrade is complete. If the mongos process fails to start,
check the log for more information.
If the mongos terminates or loses its connection to the config servers during the upgrade, you may always
safely retry the upgrade.
However, if the upgrade failed during the short critical section, the mongos will exit and report that the upgrade will require manual intervention. To continue the upgrade process, you must follow the Resync after an
Interruption of the Critical Section (page 859) procedure.
Optional
If the mongos logs show the upgrade waiting for the upgrade lock, a previous upgrade process may still be
active or may have ended abnormally. After 15 minutes of no remote activity mongos will force the upgrade
lock. If you can verify that there are no running upgrade processes, you may connect to a 2.2 mongos process
and force the lock manually:
mongo <mongos.example.net>
db.getMongo().getCollection("config.locks").findOne({ _id : "configUpgrade" })
If the process specified in the process field of this document is verifiably offline, run the following operation
to force the lock.
If you have any doubts about the activity of another upgrade operation, it is always safer to wait for the mongos to verify that the lock is inactive. In addition to the configUpgrade lock, the mongos may need to wait for specific collection locks. Do not force the specific collection locks.
6. Upgrade and restart other mongos processes in the sharded cluster, without the --upgrade option.
See Upgrade Sharded Cluster Components (page 859) for more information.
7. Re-enable the balancer (page 695). You can now perform operations that modify cluster meta-data.
Once you have upgraded, do not introduce version 2.0 MongoDB processes into the sharded cluster. This can reintroduce old meta-data formats into the config servers. The meta-data change made by this upgrade process will help
prevent errors caused by cross-version incompatibilities in future versions of MongoDB.
Resync after an Interruption of the Critical Section During the short critical section of the upgrade that applies
changes to the meta-data, it is unlikely but possible that a network interruption can prevent all three config servers
from verifying or modifying data. If this occurs, the config servers (page 650) must be re-synced, and there may be
problems starting new mongos processes. The sharded cluster will remain accessible, but avoid all cluster metadata changes until you resync the config servers. Operations that change meta-data include: adding shards, dropping
databases, and dropping collections.
Note: Only perform the following procedure if something (e.g. network, power, etc.) interrupts the upgrade process
during the short critical section of the upgrade. Remember, you may always safely attempt the meta data upgrade
procedure (page 857).
To resync the config servers:
1. Turn off the balancer (page 663) in the sharded cluster and stop all meta-data operations. If you are in the
middle of an upgrade process (Upgrade a Sharded Cluster from MongoDB 2.2 to MongoDB 2.4 (page 856)),
you have already disabled the balancer.
2. Shut down two of the three config servers, preferably the last two listed in the configDB string. For example, if your configDB string is configA:27019,configB:27019,configC:27019, shut down
configB and configC. Shutting down the last two config servers ensures that most mongos instances will
have uninterrupted access to cluster meta-data.
3. mongodump the data files of the active config server (configA).
4. Move the data files of the deactivated config servers (configB and configC) to a backup location.
5. Create new, empty data directories.
6. Restart the disabled config servers with --dbpath pointing to the now-empty data directory and --port
pointing to an alternate port (e.g. 27020).
7. Use mongorestore to restore the dump of the active config server (configA) to the restarted config servers on the new port (configB:27020,configC:27020). These config servers are now re-synced.
8. Restart the restored config servers on the old port, resetting the port back to the old settings (configB:27019
and configC:27019).
9. In some cases connection pooling may cause spurious failures, as the mongos disables old connections only
after attempted use. 2.4 fixes this problem, but to avoid this issue in version 2.2, you can restart all mongos
instances (one-by-one, to avoid downtime) and use the rs.stepDown() method before restarting each of the
shard replica set primaries.
10. The sharded cluster is now fully resynced; however before you attempt the upgrade process again, you must
manually reset the upgrade state using a version 2.2 mongos. Begin by connecting to the 2.2 mongos with the
mongo shell:
mongo <mongos.example.net>
11. Finally retry the upgrade process, as in Upgrade a Sharded Cluster from MongoDB 2.2 to MongoDB 2.4
(page 856).
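The resync steps above start from the configDB string: the first server listed stays up and the last two are resynced through an alternate port. The following plain JavaScript sketch (runnable in Node.js; it calls no MongoDB API) derives that plan. The helper name planConfigResync, the three-server assumption, and the "port + 1" convention for the alternate port are illustrative assumptions taken from the example ports 27019 and 27020.

```javascript
// Hypothetical helper: derive the resync plan from a configDB string.
// Assumes exactly three config servers listed as host:port, and that the
// alternate port is the original port + 1 (as in the 27019 -> 27020 example).
function planConfigResync(configDB) {
  const servers = configDB.split(",").map(function (entry) {
    const parts = entry.trim().split(":");
    return { host: parts[0], port: Number(parts[1]) };
  });
  if (servers.length !== 3) {
    throw new Error("expected three config servers, got " + servers.length);
  }
  const active = servers[0];        // stays up; mongodump runs against it
  const targets = servers.slice(1); // the last two listed are resynced
  return {
    dumpFrom: active.host + ":" + active.port,
    restoreTo: targets.map(function (s) { return s.host + ":" + (s.port + 1); }),
    restartOn: targets.map(function (s) { return s.host + ":" + s.port; })
  };
}

const plan = planConfigResync("configA:27019,configB:27019,configC:27019");
console.log(plan.dumpFrom);            // configA:27019
console.log(plan.restoreTo.join(",")); // configB:27020,configC:27020
console.log(plan.restartOn.join(",")); // configB:27019,configC:27019
```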
Upgrade Sharded Cluster Components After you have successfully completed the meta-data upgrade process
described in Meta-data Upgrade Procedure (page 857), and the 2.4 mongos instance starts, you can upgrade the
other processes in your MongoDB deployment.
While the balancer is still disabled, upgrade the components of your sharded cluster in the following order:
Upgrade all mongos instances in the cluster, in any order.
Upgrade all 3 mongod config server instances, upgrading the first system in the mongos --configdb argument last.
Upgrade each shard, one at a time, upgrading the mongod secondaries before running replSetStepDown
and upgrading the primary of each shard.
When this process is complete, you can now re-enable the balancer (page 695).
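The component order above can be expressed as a small planning function. This is a hypothetical sketch in plain JavaScript, not a MongoDB tool; the shape of the cluster description (a mongos list, a configdb string, and a primary plus secondaries per shard) and all host names are assumptions made for illustration.

```javascript
// Hypothetical planner for the component upgrade order described above.
function upgradeOrder(cluster) {
  const steps = [];
  // 1. All mongos instances, in any order.
  cluster.mongos.forEach(function (m) { steps.push("upgrade mongos " + m); });
  // 2. The config servers, leaving the first one in --configdb for last.
  const configs = cluster.configdb.split(",");
  configs.slice(1).concat([configs[0]]).forEach(function (c) {
    steps.push("upgrade config server " + c);
  });
  // 3. Each shard in turn: secondaries first, then step down and upgrade the primary.
  cluster.shards.forEach(function (shard) {
    shard.secondaries.forEach(function (s) { steps.push("upgrade secondary " + s); });
    steps.push("replSetStepDown on " + shard.primary);
    steps.push("upgrade former primary " + shard.primary);
  });
  steps.push("re-enable the balancer");
  return steps;
}

const steps = upgradeOrder({
  mongos: ["app1:27017"],
  configdb: "cfgA:27019,cfgB:27019,cfgC:27019",
  shards: [{ primary: "s0a:27018", secondaries: ["s0b:27018", "s0c:27018"] }]
});
steps.forEach(function (s, i) { console.log((i + 1) + ". " + s); });
```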
Rolling Upgrade Limitation for 2.2.0 Deployments Running with auth Enabled MongoDB cannot support
deployments that mix 2.2.0 and 2.4.0, or greater, components. MongoDB version 2.2.1 and later processes can exist in
mixed deployments with 2.4-series processes. Therefore you cannot perform a rolling upgrade from MongoDB 2.2.0
to MongoDB 2.4.0. To upgrade a cluster with 2.2.0 components, use one of the following procedures.
1. Perform a rolling upgrade of all 2.2.0 processes to the latest 2.2-series release (e.g. 2.2.3) so that there are no
processes in the deployment that predate 2.2.1. When there are no 2.2.0 processes in the deployment, perform a
rolling upgrade to 2.4.0.
2. Stop all processes in the cluster. Upgrade all processes to a 2.4-series release of MongoDB, and start all processes at the same time.
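The rule above reduces to a version check over every process in the deployment: a rolling upgrade to 2.4 is possible only when nothing older than 2.2.1 remains. The sketch below is illustrative plain JavaScript; the function names are hypothetical and it assumes plain numeric version strings such as "2.2.3" (no release-candidate suffixes).

```javascript
// Compare two "x.y.z" version strings numerically; negative if a < b.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// A rolling upgrade to 2.4 requires every process to be 2.2.1 or later.
function canRollingUpgradeTo24(versions) {
  return versions.every(function (v) { return compareVersions(v, "2.2.1") >= 0; });
}

console.log(canRollingUpgradeTo24(["2.2.3", "2.2.3", "2.2.1"])); // true
console.log(canRollingUpgradeTo24(["2.2.3", "2.2.0"]));          // false
```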
Upgrade from 2.3 to 2.4 If you used a mongod from the 2.3 or 2.4-rc (release candidate) series, you can safely
transition these databases to 2.4.0 or later; however, if you created 2dsphere or text indexes using a mongod
before v2.4-rc2, you will need to rebuild these indexes. For example:
db.records.dropIndex( { loc: "2dsphere" } )
db.records.dropIndex( "records_text" )
db.records.ensureIndex( { loc: "2dsphere" } )
db.records.ensureIndex( { records: "text" } )
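To find which indexes need rebuilding before running commands like the ones above, you can inspect the output of db.records.getIndexes(). The sketch below is plain JavaScript (it runs in Node.js and uses no MongoDB API); the helper name indexesToRebuild and the sample index documents are illustrative assumptions, including the assumption that a text index's key is reported with an internal _fts field.

```javascript
// Hypothetical helper: select 2dsphere and text indexes from an array shaped
// like the output of db.collection.getIndexes().
function indexesToRebuild(indexes) {
  return indexes.filter(function (idx) {
    return Object.keys(idx.key).some(function (field) {
      const type = idx.key[field];
      return type === "2dsphere" || type === "text";
    });
  });
}

// Illustrative sample input; a real list comes from db.records.getIndexes().
const sampleIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_" },
  { v: 1, key: { loc: "2dsphere" }, name: "loc_2dsphere" },
  { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "records_text" }
];

indexesToRebuild(sampleIndexes).forEach(function (idx) {
  console.log(idx.name); // loc_2dsphere, then records_text
});
```

Dropping a text index by name, as with db.records.dropIndex( "records_text" ) above, avoids having to reconstruct its internal key pattern.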
Downgrade MongoDB from 2.4 to Previous Versions In some cases the on-disk format of data files used by 2.4
and 2.2 mongod is compatible, and you can upgrade and downgrade if needed. However, several new features in 2.4
are incompatible with previous versions:
2dsphere indexes are incompatible with 2.2 and earlier mongod instances.
text indexes are incompatible with 2.2 and earlier mongod instances.
Using a hashed index as a shard key is incompatible with 2.2 and earlier mongos instances.
hashed indexes are incompatible with 2.0 and earlier mongod instances.
Important: Collections sharded using hashed shard keys should not use 2.2 mongod instances, which cannot
correctly support cluster operations for these collections.
If you completed the meta-data upgrade for a sharded cluster (page 856), you can safely downgrade to 2.2 MongoDB
processes. Do not use 2.0 processes after completing the upgrade procedure.
Note: In sharded clusters, once you have completed the meta-data upgrade procedure (page 856), you cannot use 2.0
mongod or mongos instances in the same cluster.
If you complete the meta-data upgrade, you can safely downgrade components in any order. When upgrading again,
always upgrade mongos instances before mongod instances.
Do not create 2dsphere or text indexes in a cluster that has 2.2 components.
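Before downgrading, you can check a deployment against the incompatibilities listed above. The following is a hypothetical sketch in plain JavaScript; the input format (a name, a list of index key patterns, and an optional shard key per collection) and the numeric target version (2.2 or 2.0) are assumptions. In practice the inputs would come from getIndexes() and sh.status().

```javascript
// Hypothetical pre-downgrade check based on the incompatibilities above.
function downgradeBlockers(collections, target) {
  const problems = [];
  collections.forEach(function (c) {
    c.indexes.forEach(function (key) {
      Object.keys(key).forEach(function (field) {
        const type = key[field];
        // 2dsphere and text indexes: not usable by 2.2 and earlier mongod.
        if ((type === "2dsphere" || type === "text") && target <= 2.2) {
          problems.push(c.name + ": " + type + " index on " + field);
        }
        // hashed indexes: not usable by 2.0 and earlier mongod.
        if (type === "hashed" && target <= 2.0) {
          problems.push(c.name + ": hashed index on " + field);
        }
      });
    });
    // Hashed shard keys: not supported by 2.2 and earlier components.
    const shardKey = c.shardKey || {};
    const hashedKey = Object.keys(shardKey).some(function (f) {
      return shardKey[f] === "hashed";
    });
    if (hashedKey && target <= 2.2) {
      problems.push(c.name + ": sharded on a hashed shard key");
    }
  });
  return problems;
}

const sample = [
  { name: "records", indexes: [{ _id: 1 }, { loc: "2dsphere" }] },
  { name: "events", indexes: [{ _id: 1 }, { uid: "hashed" }], shardKey: { uid: "hashed" } }
];
downgradeBlockers(sample, 2.2).forEach(function (b) { console.log(b); });
```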
Considerations and Compatibility If you upgrade to MongoDB 2.4, and then need to run MongoDB 2.2 with the
same data files, consider the following limitations.
If you use a hashed index as the shard key index, which is only possible under 2.4, you will not be able to
query data in this sharded collection. Furthermore, a 2.2 mongos cannot properly route an insert operation
for a collection sharded using a hashed index for the shard key index: any data that you insert using a 2.2
mongos will not arrive on the correct shard and will not be reachable by future queries.
If you never create a 2dsphere or text index, you can move between a 2.4 and 2.2 mongod for a given
data set; however, after you create the first 2dsphere or text index with a 2.4 mongod you will need to run
a 2.2 mongod with the --upgrade option and drop any 2dsphere or text index.
Upgrade and Downgrade Procedures
Basic Downgrade and Upgrade
Except as described below, moving between 2.2 and 2.4 is a drop-in replacement:
Then, you will need to drop any existing 2dsphere or text indexes using db.collection.dropIndex(),
for example:
db.records.dropIndex( { loc: "2dsphere" } )
db.records.dropIndex( "records_text" )
Warning: --upgrade will run repairDatabase on any database where you have created a 2dsphere or
text index, which will rebuild all indexes.
Troubleshooting Upgrade/Downgrade Operations If you do not use --upgrade, when you attempt to start a
2.2 mongod and you have created a 2dsphere or text index, mongod will return the following message:
'need to upgrade database index_plugin_upgrade with pdfile version 4.6, new version: 4.5 Not upgradin
While running 2.4, to check the data file version of a MongoDB database, use the following operation in the shell:
db.getSiblingDB('<databasename>').stats().dataFileVersion
The major data file version [573] for both 2.2 and 2.4 is 4. The minor data file version for 2.2 is 5, and the minor data file
version for 2.4 is 6 after you create a 2dsphere or text index.
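The version numbers above can be checked programmatically. This sketch assumes that dataFileVersion is reported as an object with major and minor fields, for example { major: 4, minor: 5 }; the function name describeDataFileVersion is hypothetical, and its messages simply restate the rules in this section.

```javascript
// Interprets dataFileVersion as reported by
//   db.getSiblingDB('<databasename>').stats().dataFileVersion
// assumed to be an object such as { major: 4, minor: 5 }.
function describeDataFileVersion(v) {
  if (v.major !== 4) return "unexpected major version " + v.major;
  if (v.minor === 5) return "compatible with 2.2";
  if (v.minor === 6) {
    return "has a 2dsphere or text index: run 2.2 mongod with --upgrade and drop those indexes";
  }
  return "unrecognized minor version " + v.minor;
}

console.log(describeDataFileVersion({ major: 4, minor: 5 })); // compatible with 2.2
console.log(describeDataFileVersion({ major: 4, minor: 6 }));
```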
Compatibility and Index Type Changes in MongoDB 2.4 In 2.4 MongoDB includes two new features related to
indexes that users upgrading to version 2.4 must consider, particularly with regard to possible downgrade paths. For
more information on downgrades, see Downgrade MongoDB from 2.4 to Previous Versions (page 860).
New Index Types In 2.4 MongoDB adds two new index types: 2dsphere and text. These index types do not
exist in 2.2, and for each database, creating a 2dsphere or text index will upgrade the data-file version and make
that database incompatible with 2.2.
If you intend to downgrade, you should always drop all 2dsphere and text indexes before moving to 2.2.
You can use the downgrade procedure (page 860) to downgrade these databases and run 2.2 if needed; however, this
will run a full database repair (as with repairDatabase) for all affected databases.
Index Type Validation In MongoDB 2.2 and earlier you could specify invalid index types that did not exist. In
these situations, MongoDB would create an ascending (e.g. 1) index. Invalid indexes include index types specified by
strings that do not refer to an existing index type, and all numbers other than 1 and -1. 574
In 2.4, creating any invalid index will result in an error. Furthermore, you cannot create a 2dsphere or text index
on a collection if its containing database has any invalid index types. 1
Example
If you attempt to add an invalid index in MongoDB 2.4, as in the following:
db.coll.ensureIndex( { field: "1" } )
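The 2.4 validation rule can be summarized as: numeric types must be 1 or -1, and string types must name an existing index type. The sketch below is plain JavaScript, not MongoDB's implementation; the list of accepted strings is an assumption that includes the types named in this section (2dsphere, text, hashed) plus 2d and geoHaystack.

```javascript
// Assumed set of valid string index types in 2.4.
const VALID_STRING_TYPES = ["2d", "2dsphere", "text", "hashed", "geoHaystack"];

// Per the rule above: numbers other than 1 and -1, and unknown strings, are invalid.
function isValidIndexType(value) {
  if (typeof value === "number") return value === 1 || value === -1;
  return typeof value === "string" && VALID_STRING_TYPES.indexOf(value) !== -1;
}

// Returns the fields of a key pattern whose index type would be rejected.
function invalidFields(keyPattern) {
  return Object.keys(keyPattern).filter(function (field) {
    return !isValidIndexType(keyPattern[field]);
  });
}

console.log(invalidFields({ field: "1" }).join(","));                // field
console.log(invalidFields({ a: 1, b: -1, loc: "2dsphere" }).length); // 0
```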
See Upgrade MongoDB to 2.4 (page 855) for full upgrade instructions.
Other Resources
MongoDB Downloads [575].
All JIRA issues resolved in 2.4 [576].
All backwards incompatible changes [577].
573 The data file version (i.e. pdfile version) is independent and unrelated to the release version of MongoDB.
574 In 2.4, indexes that specify a type of "1" or "-1" (the strings "1" and "-1") will continue to exist, despite a warning on start-up. However,
a secondary in a replica set cannot complete an initial sync from a primary that has a "1" or "-1" index. Avoid all indexes with invalid types.
575 http://mongodb.org/downloads
576 https://jira.mongodb.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SERVER+AND+fixVersion+in+%28%222.3.2%22,+%222.3.1%22,+%222
rc0%22,+%222.4.0-rc1%22,+%222.4.0-rc2%22,+%222.4.0-rc3%22%29
577 https://jira.mongodb.org/issues/?jql=project%20%3D%20SERVER%20AND%20fixVersion%20in%20(%222.3.2%22%2C%20%222.3.1%22%2C%20%222.3.0%22
rc0%22%2C%20%222.4.0-rc1%22%2C%20%222.4.0-rc2%22%2C%20%222.4.0-rc3%22)%20AND%20%22Backwards%20Compatibility%22%20in%20(%22Major%
1. Download binaries of the latest release in the 2.2 series from the MongoDB Download Page [579].
2. Shut down your mongod instance. Replace the existing binary with the 2.2 mongod binary and restart MongoDB.
Upgrading a Replica Set
You can upgrade to 2.2 by performing a rolling upgrade of the set by upgrading the members individually while the
other members are available to minimize downtime. Use the following procedure:
1. Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 2.0
binary with the 2.2 binary. After upgrading a mongod instance, wait for the member to recover to SECONDARY
state before upgrading the next instance. To check the member's state, issue rs.status() in the mongo
shell.
578 https://github.com/mongodb/mongo/blob/v2.4/distsrc/THIRD-PARTY-NOTICES
579 http://downloads.mongodb.org/
2. Use the mongo shell method rs.stepDown() to step down the primary to allow the normal failover
(page 560) procedure. rs.stepDown() expedites the failover procedure and is preferable to shutting down
the primary directly.
Once the primary has stepped down and another member has assumed PRIMARY state, as observed in the output
of rs.status(), shut down the previous primary, replace the mongod binary with the 2.2 binary, and start
the new process.
Note: Replica set failover is not instant but will render the set unavailable to read or accept writes until the
failover process completes. Typically this takes 10 seconds or more. You may wish to plan the upgrade during
a predefined maintenance window.
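The rolling upgrade above follows a fixed order: secondaries one at a time, then a step-down, then the former primary. The following hypothetical sketch computes that order from data shaped like the members array of rs.status() (only the name and stateStr fields are used). As a simplification it leaves members in other states, such as arbiters, out of the plan; host names are illustrative.

```javascript
// Hypothetical planner for a rolling replica set upgrade.
function rollingUpgradePlan(members, targetVersion) {
  const primaries = members.filter(function (m) { return m.stateStr === "PRIMARY"; });
  if (primaries.length !== 1) throw new Error("expected exactly one PRIMARY");
  const steps = [];
  members.forEach(function (m) {
    if (m.stateStr !== "SECONDARY") return; // other states are skipped in this sketch
    steps.push("upgrade " + m.name + " to " + targetVersion);
    steps.push("wait for " + m.name + " to return to SECONDARY");
  });
  steps.push("rs.stepDown() on " + primaries[0].name);
  steps.push("wait for another member to become PRIMARY");
  steps.push("upgrade " + primaries[0].name + " to " + targetVersion);
  return steps;
}

const steps = rollingUpgradePlan([
  { name: "db1:27017", stateStr: "PRIMARY" },
  { name: "db2:27017", stateStr: "SECONDARY" },
  { name: "db3:27017", stateStr: "SECONDARY" }
], "2.2");
steps.forEach(function (s) { console.log(s); });
```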
Changes
Major Features
Aggregation Framework The aggregation framework makes it possible to do aggregation operations without needing to use map-reduce. The aggregate command exposes the aggregation framework, and the aggregate()
helper in the mongo shell provides an interface to these operations. Consider the following resources for background
on the aggregation framework and its use:
Documentation: Aggregation Concepts (page 421)
Reference: Aggregation Reference (page 451)
Examples: Aggregation Examples (page 434)
TTL Collections TTL collections remove expired data from a collection, using a special index and a background
thread that deletes expired documents every minute. These collections are useful as an alternative to capped collections
in some cases, such as for data warehousing and caching cases, including: machine generated event data, logs, and
session information that needs to persist in a database for only a limited period of time.
For more information, see the Expire Data from Collections by Setting TTL (page 211) tutorial.
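The expiry rule behind TTL collections can be modeled in a few lines. The shell call in the comment uses the ensureIndex() expireAfterSeconds option; the collection name log_events and the field createdAt are illustrative, and the function expiredDocs is a hypothetical plain JavaScript model of which documents the background thread would remove, not server code.

```javascript
// A TTL index is created in the mongo shell with, for example:
//   db.log_events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
// This function models which documents a background pass at time `now` removes.
function expiredDocs(docs, field, expireAfterSeconds, now) {
  return docs.filter(function (doc) {
    const value = doc[field];
    if (!(value instanceof Date)) return false; // non-date values never expire
    return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
  });
}

const now = new Date("2013-01-01T01:00:00Z");
const docs = [
  { _id: "a", createdAt: new Date("2012-12-31T23:00:00Z") },
  { _id: "b", createdAt: new Date("2013-01-01T00:30:00Z") },
  { _id: "c", createdAt: "not a date" }
];
console.log(expiredDocs(docs, "createdAt", 3600, now).map(function (d) { return d._id; }));
```

Because the background thread runs about once a minute, a document can remain in the collection for a short time after it becomes eligible for deletion.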
580 https://jira.mongodb.org/browse/SERVER-6902
Concurrency Improvements MongoDB 2.2 increases the server's capacity for concurrent operations with the following improvements:
1. DB Level Locking [581]
2. Improved Yielding on Page Faults [582]
3. Improved Page Fault Detection on Windows [583]
To reflect these changes, MongoDB now provides changed and improved reporting for concurrency and use. See locks
and server-status-record-stats in server status, and see db.currentOp(), mongotop, and mongostat.
Improved Data Center Awareness with Tag Aware Sharding MongoDB 2.2 adds additional support for geographic distribution or other custom partitioning for sharded collections in clusters. By using this tag aware sharding, you can automatically ensure that data in a sharded database system is always on specific shards. For example,
with tag aware sharding, you can ensure that data is closest to the application servers that use that data most frequently.
Shard tagging controls data location, and is complementary to but separate from replica set tagging, which controls
read preference (page 568) and write concern (page 76). For example, shard tagging can pin all USA data to one
or more logical shards, while replica set tagging can control which mongod instances (e.g. production or
reporting) the application uses to service requests.
See the documentation for the following helpers in the mongo shell that support tagged sharding configuration:
sh.addShardTag()
sh.addTagRange()
sh.removeShardTag()
Also, see Tag Aware Sharding (page 708) and Manage Shard Tags (page 709).
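Tag ranges map contiguous ranges of shard key values to tags, and the balancer keeps each range on shards carrying that tag. The sketch below models only the lookup step; the shard name shard0000, the namespace records.users, the zipcode field, and the NYC and SFO tags are illustrative, and it compares single-field string keys with JavaScript's < operator rather than MongoDB's full BSON ordering.

```javascript
// In the mongo shell, tags and ranges are set up with calls such as:
//   sh.addShardTag("shard0000", "NYC")
//   sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
// Each range includes its minimum and excludes its maximum.
function tagFor(ranges, field, value) {
  for (let i = 0; i < ranges.length; i++) {
    const r = ranges[i];
    if (r.min[field] <= value && value < r.max[field]) return r.tag;
  }
  return null; // value is outside every tagged range
}

const ranges = [
  { min: { zipcode: "10001" }, max: { zipcode: "10281" }, tag: "NYC" },
  { min: { zipcode: "94102" }, max: { zipcode: "94135" }, tag: "SFO" }
];
console.log(tagFor(ranges, "zipcode", "10011")); // NYC
console.log(tagFor(ranges, "zipcode", "94110")); // SFO
console.log(tagFor(ranges, "zipcode", "60601")); // null
```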
Fully Supported Read Preference Semantics All MongoDB clients and drivers now support full read preferences
(page 568), including consistent support for a full range of read preference modes (page 637) and tag sets (page 570).
This support extends to the mongos and applies identically to single replica sets and to the replica sets for each shard
in a sharded cluster.
Additional read preference support now exists in the mongo shell using the readPref() cursor method.
Compatibility Changes
Authentication Changes MongoDB 2.2 provides more reliable and robust support for authentication clients, including drivers and mongos instances.
If your cluster runs with authentication:
For all drivers, use the latest release of your driver and check its release notes.
In sharded environments, to ensure that your cluster remains available during the upgrade process you must use
the upgrade procedure for sharded clusters (page 864).
581 https://jira.mongodb.org/browse/SERVER-4328
582 https://jira.mongodb.org/browse/SERVER-3357
583 https://jira.mongodb.org/browse/SERVER-4538
findAndModify Returns Null Value for Upserts that Perform Inserts In version 2.2, for upserts that perform
inserts with the new option set to false, findAndModify commands will now return the following output:
{ 'ok': 1.0, 'value': null }
In the mongo shell, upsert findAndModify operations that perform inserts (with new set to false) only output a
null value.
In version 2.0 these operations would return an empty document, e.g. { }.
See: SERVER-6226 [584] for more information.
mongodump 2.2 Output Incompatible with Pre-2.2 mongorestore If you use the mongodump tool from the
2.2 distribution to create a dump of a database, you must use a 2.2 (or later) version of mongorestore to restore
that dump.
See: SERVER-6961 [585] for more information.
ObjectId().toString() Returns String Literal ObjectId("...") In version 2.2, the toString()
method returns the string representation of the ObjectId() (page 174) object and has the format ObjectId("...").
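Consider the following example that calls the toString() method on the ObjectId("507c7f79bcf86cd7994f6c0e") object:
ObjectId("507c7f79bcf86cd7994f6c0e").toString()
The method returns the string ObjectId("507c7f79bcf86cd7994f6c0e").
Consider the following example that calls the valueOf() method on the ObjectId("507c7f79bcf86cd7994f6c0e") object:
ObjectId("507c7f79bcf86cd7994f6c0e").valueOf()
The method returns only the hexadecimal string 507c7f79bcf86cd7994f6c0e.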
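To check which behavior a script relies on, the difference between the two methods can be reproduced outside the shell. The class below, FakeObjectId, is a hypothetical stand-in written in plain JavaScript to mirror the 2.2 behavior described above; it is not the mongo shell's ObjectId implementation.

```javascript
// Hypothetical stand-in mirroring 2.2 behavior: toString() returns the literal
// form ObjectId("..."), valueOf() returns only the 24-character hex string.
class FakeObjectId {
  constructor(hex) {
    if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex digits");
    this.str = hex;
  }
  toString() { return 'ObjectId("' + this.str + '")'; }
  valueOf() { return this.str; }
}

const id = new FakeObjectId("507c7f79bcf86cd7994f6c0e");
console.log(id.toString()); // ObjectId("507c7f79bcf86cd7994f6c0e")
console.log(id.valueOf());  // 507c7f79bcf86cd7994f6c0e
```

Code written for 2.0 that expects toString() to return the bare hexadecimal string should call valueOf() instead.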
This change does not affect collections created with now illegal names in earlier versions of MongoDB.
These new restrictions are in addition to the existing restrictions on collection names which are:
A collection name should begin with a letter or an underscore.
A collection name cannot contain the null character.
A collection name cannot begin with the system. prefix, which MongoDB reserves for system collections such as the
system.indexes collection.
The maximum size of a collection name is 128 characters, including the name of the database. However, for
maximum flexibility, collections should have names less than 80 characters.
Collection names may contain any other valid UTF-8 string.
See SERVER-4442 [586] and the Are there any restrictions on the names of Collections? (page 734) FAQ item.
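The rules above can be collected into a simple pre-flight check. This is an illustrative sketch in plain JavaScript, not MongoDB's validation logic; the function name is hypothetical, and it assumes the 128-character limit counts the full <database>.<collection> namespace including the dot.

```javascript
// Hypothetical check of the collection-name rules listed above.
function collectionNameProblems(dbName, collName) {
  const problems = [];
  if (!/^[A-Za-z_]/.test(collName)) {
    problems.push("should begin with a letter or an underscore");
  }
  if (collName.indexOf("\u0000") !== -1) {
    problems.push("contains the null character");
  }
  if (collName.indexOf("system.") === 0) {
    problems.push("uses the reserved system. prefix");
  }
  // Assumption: the limit applies to "<database>.<collection>", dot included.
  if ((dbName + "." + collName).length > 128) {
    problems.push("namespace exceeds 128 characters");
  } else if (collName.length >= 80) {
    problems.push("longer than the recommended 80 characters");
  }
  return problems;
}

console.log(collectionNameProblems("records", "users"));      // []
console.log(collectionNameProblems("records", "system.foo"));
```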
Restrictions on Database Names for Windows Database names on Windows can no longer contain the
following characters:
/\. "*<>:|?
The names of the data files include the database name. If you attempt to upgrade a database instance with one or more
of these characters, mongod will refuse to start.
Change the name of these databases before upgrading. See SERVER-4584 [587] and SERVER-6729 [588] for more information.
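A quick way to find affected databases before upgrading is to scan their names for the characters listed above. The function below is a hypothetical plain JavaScript helper; the character list is copied from this section, and the database names in the example are made up.

```javascript
// Characters not allowed in database names on Windows, as listed above:
// / \ . space " * < > : | ?
const WINDOWS_FORBIDDEN_CHARS = ["/", "\\", ".", " ", "\"", "*", "<", ">", ":", "|", "?"];

function forbiddenWindowsChars(dbName) {
  return WINDOWS_FORBIDDEN_CHARS.filter(function (ch) { return dbName.indexOf(ch) !== -1; });
}

["sales", "my db", "logs:2012?"].forEach(function (name) {
  const bad = forbiddenWindowsChars(name);
  console.log(name + ": " + (bad.length === 0 ? "ok" : "rename before upgrading"));
});
```

On a running instance, the names to check could come from, for example, db.adminCommand({ listDatabases: 1 }).databases, where each entry has a name field.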
_id Fields and Indexes on Capped Collections All capped collections now have an _id field by default, if they
exist outside of the local database, and now have indexes on the _id field. This change only affects capped
collections created with 2.2 instances and does not affect existing capped collections.
See: SERVER-5516 [589] for more information.
New $elemMatch Projection Operator The $elemMatch operator allows applications to narrow the data returned from queries so that the query operation will only return the first matching element in an array. See the
$elemMatch reference and the SERVER-2238 [590] and SERVER-828 [591] issues for more information.
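The following sketch models what an $elemMatch projection returns: the _id field plus only the first array element that matches. The shell query in the comment shows the projection form; the schools collection and its fields are illustrative, and the helper projectElemMatch is hypothetical plain JavaScript that supports equality conditions only.

```javascript
// Models the result of a projection such as:
//   db.schools.find({ zipcode: "63109" }, { students: { $elemMatch: { school: 102 } } })
function projectElemMatch(doc, arrayField, criteria) {
  const result = { _id: doc._id };
  const values = Array.isArray(doc[arrayField]) ? doc[arrayField] : [];
  const match = values.find(function (el) {
    return el !== null && typeof el === "object" &&
      Object.keys(criteria).every(function (k) { return el[k] === criteria[k]; });
  });
  // Keep only the first matching element; omit the field when nothing matches.
  if (match !== undefined) result[arrayField] = [match];
  return result;
}

const school = {
  _id: 1,
  zipcode: "63109",
  students: [
    { name: "john", school: 102, age: 10 },
    { name: "jess", school: 102, age: 11 },
    { name: "jeff", school: 108, age: 15 }
  ]
};
console.log(JSON.stringify(projectElemMatch(school, "students", { school: 102 })));
// {"_id":1,"students":[{"name":"john","school":102,"age":10}]}
```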
Windows Specific Changes
Windows XP is Not Supported As of 2.2, MongoDB does not support Windows XP. Please upgrade to a more
recent version of Windows to use the latest releases of MongoDB. See SERVER-5648 [592] for more information.
Service Support for mongos.exe You may now run mongos.exe instances as a Windows Service. See the
mongos.exe reference and Manually Create a Windows Service for MongoDB (page 27) and SERVER-1589 [593] for
more information.
586 https://jira.mongodb.org/browse/SERVER-4442
587 https://jira.mongodb.org/browse/SERVER-4584
588 https://jira.mongodb.org/browse/SERVER-6729
589 https://jira.mongodb.org/browse/SERVER-5516
590 https://jira.mongodb.org/browse/SERVER-2238
591 https://jira.mongodb.org/browse/SERVER-828
592 https://jira.mongodb.org/browse/SERVER-5648
593 https://jira.mongodb.org/browse/SERVER-1589
Log Rotate Command Support MongoDB for Windows now supports log rotation by way of the logRotate
database command. See SERVER-2612 [594] for more information.
New Build Using SlimReadWrite Locks for Windows Concurrency Labeled 2008+ on the Downloads Page [595],
this build for 64-bit versions of Windows Server 2008 R2 and for Windows 7 or newer offers increased performance
over the standard 64-bit Windows build of MongoDB. See SERVER-3844 [596] for more information.
Tool Improvements
Index Definitions Handled by mongodump and mongorestore When you specify the --collection option
to mongodump, mongodump will now back up the definitions for all indexes that exist on the source database. When
you attempt to restore this backup with mongorestore, the target mongod will rebuild all indexes. See SERVER-808 [597] for more information.
mongorestore now includes the --noIndexRestore option to provide the preceding behavior. Use
--noIndexRestore to prevent mongorestore from building previous indexes.
mongooplog for Replaying Oplogs The mongooplog tool makes it possible to pull oplog entries from a mongod
instance and apply them to another mongod instance. You can use mongooplog to achieve point-in-time backup of
a MongoDB data set. See the SERVER-3873 [598] case and the mongooplog reference.
Authentication Support for mongotop and mongostat mongotop and mongostat now contain support for
username/password authentication. See SERVER-3875 [599] and SERVER-3871 [600] for more information regarding this
change. Also consider the documentation of the following options for additional information:
mongotop --username
mongotop --password
mongostat --username
mongostat --password
Write Concern Support for mongoimport and mongorestore mongoimport now provides an option to
halt the import if the operation encounters an error, such as a network interruption, a duplicate key exception, or a
write error. The --stopOnError option will produce an error rather than silently continue importing data. See
SERVER-3937 [601] for more information.
In mongorestore, the --w option provides support for configurable write concern.
mongodump Support for Reading from Secondaries You can now run mongodump when connected to a secondary member of a replica set. See SERVER-3854 [602] for more information.
594 https://jira.mongodb.org/browse/SERVER-2612
595 http://www.mongodb.org/downloads
596 https://jira.mongodb.org/browse/SERVER-3844
597 https://jira.mongodb.org/browse/SERVER-808
598 https://jira.mongodb.org/browse/SERVER-3873
599 https://jira.mongodb.org/browse/SERVER-3875
600 https://jira.mongodb.org/browse/SERVER-3871
601 https://jira.mongodb.org/browse/SERVER-3937
602 https://jira.mongodb.org/browse/SERVER-3854
mongoimport Support for full 16MB Documents Previously, mongoimport would only import documents
that were less than 4 megabytes in size. This issue is now corrected, and you may use mongoimport to import
documents that are up to 16 megabytes in size. See SERVER-4593 [603] for more information.
Timestamp() Extended JSON format MongoDB extended JSON now includes a new Timestamp() type to
represent the Timestamp type that MongoDB uses for timestamps in the oplog among other contexts.
This permits tools like mongooplog and mongodump to query for specific timestamps. Consider the following
mongodump operation:
Improved Shell User Interface 2.2 includes a number of changes that improve the overall quality and consistency
of the user interface for the mongo shell:
Full Unicode support.
Bash-like line editing features. See SERVER-4312 [605] for more information.
Multi-line command support in shell history. See SERVER-3470 [606] for more information.
Windows support for the edit command. See SERVER-3998 [607] for more information.
Helper to load Server-Side Functions The db.loadServerScripts() method loads the contents of the current
database's system.js collection into the current mongo shell session. See SERVER-1651 [608] for more information.
Support for Bulk Inserts If you pass an array of documents to the insert() method, the mongo shell will now
perform a bulk insert operation. See SERVER-3819 [609] and SERVER-2395 [610] for more information.
Note: For bulk inserts on sharded clusters, the getLastError command alone is insufficient to verify success.
Applications must verify the success of bulk inserts in application logic.
Operations
Support for Logging to Syslog See the SERVER-2957 [611] case and the documentation of the syslogFacility
run-time option or the mongod --syslog and mongos --syslog command-line options.
touch Command Added the touch command to read the data and/or indexes from a collection into memory. See:
SERVER-2023 [612] and touch for more information.
603 https://jira.mongodb.org/browse/SERVER-4593
604 https://jira.mongodb.org/browse/SERVER-3483
605 https://jira.mongodb.org/browse/SERVER-4312
606 https://jira.mongodb.org/browse/SERVER-3470
607 https://jira.mongodb.org/browse/SERVER-3998
608 https://jira.mongodb.org/browse/SERVER-1651
609 https://jira.mongodb.org/browse/SERVER-3819
610 https://jira.mongodb.org/browse/SERVER-2395
611 https://jira.mongodb.org/browse/SERVER-2957
612 https://jira.mongodb.org/browse/SERVER-2023
indexCounters No Longer Report Sampled Data indexCounters now report actual counters that reflect
index use and state. In previous versions, these data were sampled. See SERVER-5784 [613] and indexCounters for
more information.
Padding Specifiable on compact Command See the documentation of the compact command and the SERVER-4018 [614]
issue for more information.
Added Build Flag to Use System Libraries The Boost library, version 1.49, is now embedded in the MongoDB
code base.
If you want to build MongoDB binaries using system Boost libraries, you can pass scons the
--use-system-boost flag, as follows:
scons --use-system-boost
When building MongoDB, you can also pass scons a flag to compile MongoDB using only system libraries rather
than the included versions of the libraries. For example:
scons --use-system-all
Improved Logging for Replica Set Lag When secondary members of a replica set fall behind in replication,
mongod now provides better reporting in the log. This makes it possible to track replication in general and identify what process may produce errors or halt replication. See SERVER-3575 [620] for more information.
Replica Set Members can Sync from Specific Members The new replSetSyncFrom command and new
rs.syncFrom() helper in the mongo shell make it possible for you to manually configure from which member of the set a replica will poll oplog entries. Use these commands to override the default selection logic if needed.
Always exercise caution with replSetSyncFrom when overriding the default behavior.
Replica Set Members will not Sync from Members Without Indexes Unless buildIndexes: false To
prevent inconsistency between members of replica sets, if a member of a replica set has buildIndexes set to
false, other members of the replica set will not sync from this member, unless they also have buildIndexes set
to false. See SERVER-4160 [621] for more information.
613 https://jira.mongodb.org/browse/SERVER-5784
614 https://jira.mongodb.org/browse/SERVER-4018
615 https://jira.mongodb.org/browse/SERVER-3829
616 https://jira.mongodb.org/browse/SERVER-5172
617 https://jira.mongodb.org/browse/SERVER-188
618 https://jira.mongodb.org/browse/SERVER-4683
619 http://goog-perftools.sourceforge.net/doc/tcmalloc.html
620 https://jira.mongodb.org/browse/SERVER-3575
621 https://jira.mongodb.org/browse/SERVER-4160
New Option To Configure Index Pre-Fetching during Replication By default, when replicating operations, secondaries will pre-fetch indexes (page 463) associated with a query to improve replication throughput in most cases. The
replication.secondaryIndexPrefetch setting and --replIndexPrefetch option allow administrators to disable this feature or allow the mongod to pre-fetch only the index on the _id field. See SERVER-6718 [622]
for more information.
Map Reduce Improvements
Index on Shard Keys Can Now Be a Compound Index If your shard key uses the prefix of an existing index,
then you do not need to maintain a separate index for your shard key in addition to your existing index. This index,
however, cannot be a multi-key index. See the Shard Key Indexes (page 667) documentation and SERVER-1506 [625]
for more information.
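The prefix rule can be checked mechanically by comparing field order. The function below is a hypothetical sketch in plain JavaScript; it compares field names only (not sort direction), and it takes the multi-key status as an input flag rather than detecting it. The zipcode and username fields are illustrative.

```javascript
// Can an existing index serve as the shard key index? True when the shard key
// fields are a prefix of the index fields (in order) and the index is not multi-key.
function indexSupportsShardKey(shardKey, indexKey, isMultiKey) {
  if (isMultiKey) return false;
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) return false;
  return shardFields.every(function (field, i) { return indexFields[i] === field; });
}

console.log(indexSupportsShardKey({ zipcode: 1 }, { zipcode: 1, username: 1 }, false));  // true
console.log(indexSupportsShardKey({ username: 1 }, { zipcode: 1, username: 1 }, false)); // false
```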
Migration Thresholds Modified The migration thresholds (page 664) have changed in 2.2 to permit more even
distribution of chunks in collections that have smaller quantities of data. See the Migration Thresholds (page 664)
documentation for more information.
Licensing Changes
Added License notice for Google Perftools (TCMalloc Utility). See the License Notice [626] and the SERVER-4683 [627]
for more information.
Resources
MongoDB Downloads [628].
All JIRA issues resolved in 2.2 [629].
All backwards incompatible changes [630].
All third party license notices [631].
What's New in MongoDB 2.2 Online Conference [632].
622 https://jira.mongodb.org/browse/SERVER-6718
623 https://jira.mongodb.org/browse/SERVER-4521
624 https://jira.mongodb.org/browse/SERVER-4158
625 https://jira.mongodb.org/browse/SERVER-1506
626 https://github.com/mongodb/mongo/blob/v2.2/distsrc/THIRD-PARTY-NOTICES#L231
627 https://jira.mongodb.org/browse/SERVER-4683
628 http://mongodb.org/downloads
629 https://jira.mongodb.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SERVER+AND+fixVersion+in+%28%222.1.0%22%2C+%222.1.1%22%2
rc0%22%2C+%222.2.0-rc1%22%2C+%222.2.0-rc2%22%29+ORDER+BY+component+ASC%2C+key+DESC
630 https://jira.mongodb.org/issues/?filter=11225&jql=project%20%3D%20SERVER%20AND%20fixVersion%20in%20(10483%2C%2010893%2C%2010894%2C%20
631 https://github.com/mongodb/mongo/blob/v2.2/distsrc/THIRD-PARTY-NOTICES
632 http://www.mongodb.com/events/webinar/mongodb-online-conference-sept
Read through all release notes before upgrading, and ensure that no changes will affect your deployment.
If you create new indexes in 2.0, then downgrading to 1.8 is possible but you must reindex the new collections.
mongoimport and mongoexport now correctly adhere to the CSV spec for handling CSV input/output. This
may break existing import/export workflows that relied on the previous behavior. For more information see SERVER-1097 [633].
Journaling (page 300) is enabled by default in 2.0 for 64-bit builds. If you still prefer to run without journaling, start
mongod with the --nojournal run-time option. Otherwise, MongoDB creates journal files during startup. The
first time you start mongod with journaling, you will see a delay as mongod creates new files. In addition, you may
see reduced write throughput.
2.0 mongod instances are interoperable with 1.8 mongod instances; however, for best results, upgrade your deployments using the following procedures:
Upgrading a Standalone mongod
1. Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 1.8
binary with the 2.0.x binary from the MongoDB Download Page [635].
2. To avoid losing the last few updates on failover you can temporarily halt your application (failover should take
less than 10 seconds), or you can set write concern (page 76) in your application code to confirm that each
update reaches multiple servers.
3. Use the rs.stepDown() to step down the primary to allow the normal failover (page 560) procedure.
rs.stepDown() and replSetStepDown provide for shorter and more consistent failover procedures than
simply shutting down the primary directly.
When the primary has stepped down, shut down its instance and upgrade by replacing the mongod binary with
the 2.0.x binary.
633 https://jira.mongodb.org/browse/SERVER-1097
634 http://downloads.mongodb.org/
635 http://downloads.mongodb.org/
1. Upgrade all config server instances first, in any order. Since config servers use two-phase commit, shard configuration metadata updates will halt until all are up and running.
2. Upgrade mongos routers in any order.
Changes
Compact Command
A compact command is now available for compacting a single collection and its indexes. Previously, the only way
to compact was to repair the entire database.
Concurrency Improvements
When going to disk, the server will yield the write lock when writing data that is not likely to be in memory. The
initial implementation of this feature now exists.
See SERVER-2563 [636] for more information.
The specific operations that yield in 2.0 are:
Updates by _id
Removes
Long cursor iterations
Default Stack Size
MongoDB 2.0 reduces the default stack size. This change can reduce total memory usage when there are many (e.g.,
1000+) client connections, as there is a thread per connection. While portions of a thread's stack can be swapped out
if unused, some operating systems do this slowly enough that it might be an issue. The default stack size is the lesser of
the system setting or 1MB.
Index Performance Enhancements
v2.0 includes significant improvements to the index (page 509). Indexes are often 25% smaller and 25% faster (depends
on the use case). When upgrading from previous versions, the benefits of the new index type are realized only if you
create a new index or re-index an old one.
Dates are now signed, and the max index key size has increased slightly from 819 to 1024 bytes.
All operations that create a new index will result in a 2.0 index by default. For example:
Reindexing an older-version index results in a 2.0 index. However, reindexing on a secondary does
not work in versions prior to 2.0. Do not reindex on a secondary. For a workaround, see SERVER-3866 [637].
The repairDatabase command converts indexes to 2.0 indexes.
636 https://jira.mongodb.org/browse/SERVER-2563
637 https://jira.mongodb.org/browse/SERVER-3866
To convert all indexes for a given collection to the 2.0 type (page 873), invoke the compact command.
Once you create new indexes, downgrading to 1.8.x will require a re-index of any indexes created using 2.0. See Build
Old Style Indexes (page 509).
Sharding Authentication
Hidden Nodes in Sharded Clusters In 2.0, mongos instances can now determine when a member of a replica set
becomes hidden without requiring a restart. In 1.8, if you reconfigured a member as hidden, you had to
restart mongos to prevent queries from reaching the hidden member.
Priorities Each replica set member can now have a priority value consisting of a floating-point number from 0 to 1000,
inclusive. Priorities let you control which member of the set you prefer to have as primary: the member with the
highest priority that can see a majority of the set will be elected primary.
For example, suppose you have a replica set with three members, A, B, and C, and suppose that their priorities are set
as follows:
A's priority is 2.
B's priority is 3.
C's priority is 1.
During normal operation, the set will always choose B as primary. If B becomes unavailable, the set will elect A as
primary.
For more information, see the priority documentation.
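The election preference described above can be modeled in a few lines of JavaScript. This is a simplified sketch, not MongoDB's election code: it ignores votes, optime freshness, and other real conditions, and the visibleMembers field (how many members, including itself, a node can reach) is a hypothetical stand-in for network reachability.

```javascript
// Simplified model: among members that are up and can see a majority of the
// set, the one with the highest priority becomes primary. Priority 0 members
// are never eligible.
function electPrimary(members) {
  const majority = Math.floor(members.length / 2) + 1;
  const candidates = members.filter(
    (m) => m.up && m.priority > 0 && m.visibleMembers >= majority
  );
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => b.priority - a.priority); // highest priority first
  return candidates[0].name;
}

// The A/B/C example from the text.
const set = [
  { name: "A", priority: 2, up: true, visibleMembers: 3 },
  { name: "B", priority: 3, up: true, visibleMembers: 3 },
  { name: "C", priority: 1, up: true, visibleMembers: 3 },
];
console.log(electPrimary(set)); // B

// B becomes unavailable; A and C can still see each other (2 of 3).
const withoutB = [
  { name: "A", priority: 2, up: true, visibleMembers: 2 },
  { name: "B", priority: 3, up: false, visibleMembers: 0 },
  { name: "C", priority: 1, up: true, visibleMembers: 2 },
];
console.log(electPrimary(withoutB)); // A
```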
Data-Center Awareness You can now tag replica set members to indicate their location. You can use these tags
to design custom write rules (page 76) across data centers, racks, specific servers, or any other architecture choice.
For example, an administrator can define rules such as "very important write", "customerData", or "audit-trail" to
replicate to certain servers, racks, data centers, etc. Then in the application code, the developer would say:
db.foo.insert(doc, {w : "very important write"})
which would succeed if it fulfilled the conditions the DBA defined for "very important write".
For more information, see Data Center Awareness (page 207).
Drivers may also support tag-aware reads. Instead of specifying slaveOk, you specify slaveOk with tags indicating
which data-centers to read from. For details, see the Drivers638 documentation.
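The following JavaScript sketch models how a tag-based rule can be checked, assuming members carry tags such as { dc: "ny" } and a rule maps a tag name to the number of distinct tag values that must acknowledge the write (the shape used by settings.getLastErrorModes in the replica set configuration). Host names and tag values are hypothetical, and the check is a local model, not the server's logic.

```javascript
// In the mongo shell the real configuration would look roughly like:
//   cfg.members[0].tags = { dc: "ny" };
//   cfg.settings.getLastErrorModes = { "very important write": { dc: 2 } };
//   rs.reconfig(cfg);
// Below, hypothetical tagged members and the same rule are checked locally.
const members = [
  { host: "ny1:27017", tags: { dc: "ny", rack: "r1" } },
  { host: "ny2:27017", tags: { dc: "ny", rack: "r2" } },
  { host: "sf1:27017", tags: { dc: "sf", rack: "r1" } },
];

const modes = {
  "very important write": { dc: 2 }, // at least 2 distinct "dc" values
};

function satisfiesMode(ackedHosts, mode) {
  const acked = members.filter((m) => ackedHosts.includes(m.host));
  return Object.entries(mode).every(([tag, needed]) => {
    const distinct = new Set(
      acked.filter((m) => tag in m.tags).map((m) => m.tags[tag])
    );
    return distinct.size >= needed;
  });
}

// Both acknowledgements come from one data center: rule not met.
console.log(satisfiesMode(["ny1:27017", "ny2:27017"], modes["very important write"])); // false
// Acknowledgements span two data centers: rule met.
console.log(satisfiesMode(["ny1:27017", "sf1:27017"], modes["very important write"])); // true
```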
w : majority You can also set w to majority to ensure that the write propagates to a majority of nodes, effectively committing it. The value for majority will automatically adjust as you add or remove nodes from the
set.
For more information, see Write Concern (page 76).
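As a rough illustration of how the majority threshold tracks the size of the set, here is a JavaScript sketch. It assumes every member counts toward the majority; the exact counting rules are defined by the server, and the function names are hypothetical.

```javascript
// Number of acknowledgements a w: "majority" write waits for, recomputed
// from the current member count (simplification: all members count).
function majorityCount(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

function isMajorityAcknowledged(acks, memberCount) {
  return acks >= majorityCount(memberCount);
}

// The threshold adjusts automatically as members are added or removed.
for (const n of [3, 4, 5, 7]) {
  console.log(`${n} members -> ${majorityCount(n)} acknowledgements`);
}
// 3 -> 2, 4 -> 3, 5 -> 3, 7 -> 4
```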
638 http://docs.mongodb.org/ecosystem/drivers
Reconfiguration with a Minority Up If the majority of servers in a set has been permanently lost, you can now
force a reconfiguration of the set to bring it back online.
For more information see Reconfigure a Replica Set with Unavailable Members (page 618).
Primary Checks for a Caught up Secondary before Stepping Down To minimize time without a primary, the
rs.stepDown() method will now fail if the primary does not see a secondary within 10 seconds of its latest
optime. You can force the primary to step down anyway, but by default it will return an error message.
See also Force a Member to Become Primary (page 611).
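A sketch of the 10-second catch-up check, with optimes modeled as plain seconds (real optimes are timestamps, so this is a simplification). The function and its force parameter are hypothetical stand-ins for the default and forced behaviors of rs.stepDown() described above.

```javascript
// Step-down is allowed only if some secondary is within 10 seconds of the
// primary's latest optime, unless the caller forces it.
const CATCH_UP_WINDOW_SECS = 10;

function canStepDown(primaryOptimeSecs, secondaryOptimesSecs, force = false) {
  if (force) return true;
  return secondaryOptimesSecs.some(
    (t) => primaryOptimeSecs - t <= CATCH_UP_WINDOW_SECS
  );
}

console.log(canStepDown(1000, [975, 988]));  // false: secondaries are 25s and 12s behind
console.log(canStepDown(1000, [975, 995]));  // true: one secondary is 5s behind
console.log(canStepDown(1000, [975], true)); // true: forced
```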
Extended Shutdown on the Primary to Minimize Interruption When you call the shutdown command, the
primary will refuse to shut down unless there is a secondary whose optime is within 10 seconds of the primary. If such
a secondary isn't available, the primary will step down and wait up to a minute for the secondary to be fully caught up
before shutting down.
Note that to get this behavior, you must issue the shutdown command explicitly; sending a signal to the process will
not trigger this behavior.
You can also force the primary to shut down, even without an up-to-date secondary available.
Maintenance Mode When repair or compact runs on a secondary, the secondary will automatically drop into
recovering mode until the operation finishes. This prevents clients from trying to read from it while it's busy.
Geospatial Features
Multi-Location Documents Indexing is now supported on documents which have multiple location objects, embedded either inline or in embedded documents. Additional command options are also supported, allowing results to
return with not only distance but the location used to generate the distance.
For more information, see Multi-location Documents for 2d Indexes (page 485).
Polygon searches Polygonal $within queries are also now supported for simple polygon shapes. For details, see
the $within operator documentation.
Journaling Enhancements
Journaling is now enabled by default for 64-bit platforms. Use the --nojournal command line option to
disable it.
The journal is now compressed for faster commits to disk.
A new --journalCommitInterval run-time option exists for specifying your own group commit interval.
The default settings do not change.
A new { getLastError: { j: true } } option is available to wait for the group commit. The
group commit will happen sooner when a client is waiting on {j: true}. If journaling is disabled, {j:
true} is a no-op.
Set the continueOnError option for bulk inserts, in the driver, so that bulk insert will continue to insert any
remaining documents even if an insert fails, as is the case with duplicate key exceptions or network interruptions. The
getLastError command will report whether any inserts have failed, not just the last one. If multiple errors occur,
the client will only receive the most recent getLastError results.
Note: For bulk inserts on sharded clusters, the getLastError command alone is insufficient to verify success.
Applications must verify the success of bulk inserts in application logic.
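The continueOnError behavior can be illustrated with a toy JavaScript model in which a Map keyed by _id stands in for a collection with a unique _id index. The bulkInsert function is hypothetical, not driver code; it only mirrors the semantics described above: keep going after a failure, record that some insert failed, and keep only the most recent error.

```javascript
function bulkInsert(collection, docs, { continueOnError = false } = {}) {
  let anyFailed = false;
  let lastError = null;
  for (const doc of docs) {
    if (collection.has(doc._id)) {
      anyFailed = true;
      lastError = `duplicate key: _id ${doc._id}`; // only the latest error is kept
      if (!continueOnError) break;                 // default: stop at first failure
      continue;
    }
    collection.set(doc._id, doc);
  }
  return { anyFailed, lastError, inserted: collection.size };
}

const docs = [{ _id: 1 }, { _id: 2 }, { _id: 1 }, { _id: 3 }, { _id: 2 }];

// Stops at the first duplicate: 2 documents inserted.
console.log(bulkInsert(new Map(), docs));
// Continues past both duplicates: 3 documents inserted, last error is for _id 2.
console.log(bulkInsert(new Map(), docs, { continueOnError: true }));
```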
Map Reduce
Output to a Sharded Collection Using the new sharded flag, it is possible to send the result of a map/reduce to
a sharded collection. Combined with the reduce or merge flags, it is possible to keep adding data to very large
collections from map/reduce jobs.
For more information, see Map-Reduce (page 424) and the mapReduce reference.
Performance Improvements Map/reduce performance will benefit from the following:
Larger in-memory buffer sizes, reducing the amount of disk I/O needed during a job
Larger JavaScript heap size, allowing for larger objects and less GC
Supports pure JavaScript execution with the jsMode flag. See the mapReduce reference.
New Querying Features
Additional regex options: s allows the dot (.) to match all characters, including newlines. This is in addition to
the currently supported i, m, and x options. See $regex.
$and A special boolean $and query operator is now available.
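A small JavaScript illustration of the s option. The query document uses the $regex/$options syntax, and in the mongo shell it would be passed to a call such as db.notes.find(query); the collection notes and field body are hypothetical. Because the check runs locally, the sketch uses JavaScript's own RegExp s (dotAll) flag as a stand-in for the server-side match, which MongoDB performs with PCRE, so treat it as an approximation.

```javascript
// Query document with the "s" option (collection and field are hypothetical).
const query = { body: { $regex: "start.*end", $options: "s" } };
console.log(JSON.stringify(query));

// Local approximation of the effect on ".": without "s", the dot stops at
// newlines; with "s", it matches them too.
const text = "start\nmiddle\nend";
console.log(new RegExp("start.*end").test(text));      // false
console.log(new RegExp("start.*end", "s").test(text)); // true
```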
Command Output Changes
The output of the validate command and the documents in the system.profile collection have both been
enhanced to return information as BSON objects with keys for each value rather than as free-form strings.
Shell Features
Custom Prompt You can define a custom prompt for the mongo shell. You can change the prompt at any time by
setting the prompt variable to a string or a custom JavaScript function returning a string. For examples, see Use a
Custom Prompt (page 268).
Default Shell Init Script On startup, the shell will check for a .mongorc.js file in the user's home directory.
The shell will execute this file after connecting to the database and before displaying the prompt.
If you would like the shell not to run the .mongorc.js file automatically, start the shell with --norc.
For more information, see the mongo reference.
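As a sketch of how the two shell features combine, a .mongorc.js file could define a custom prompt function. The prompt variable and db.getName() are mongo shell features; the typeof guard and the "test" fallback are assumptions added only so the file can also be loaded outside the shell (e.g., under Node) for checking.

```javascript
// Example ~/.mongorc.js contents: show the current database and a UTC clock.
var prompt = function () {
  var dbName = typeof db !== "undefined" ? db.getName() : "test"; // fallback outside the shell
  var time = new Date().toISOString().slice(11, 19);              // HH:MM:SS (UTC)
  return dbName + "@" + time + "> ";
};
```

Start the shell with --norc to skip this file for a session.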
In 2.0, when running with authentication (e.g. with authorization enabled), all database commands require
authentication except the following:
isMaster
authenticate
getnonce
buildInfo
ping
isdbgrid
Resources
MongoDB Downloads639
All JIRA Issues resolved in 2.0640
All Backward Incompatible Changes641
Read through all release notes before upgrading and ensure that no changes will affect your deployment.
Upgrading a Standalone mongod
641 https://jira.mongodb.org/issues/?filter=11023&jql=project%20%3D%20SERVER%20AND%20fixVersion%20in%20(10889%2C%2010886%2C%2010784%2C%20
642 http://downloads.mongodb.org/
"ismaster" : false,
"secondary" : true,
"hosts" : [
"ubuntu:27017",
"ubuntu:27018"
],
"arbiters" : [
"ubuntu:27019"
],
"primary" : "ubuntu:27018",
"ok" : 1
}
// for each secondary
config.members[0].priority = 0
config.members[3].priority = 0
config.members[4].priority = 0
rs.reconfig(config)
5. Shut down the primary (the final 1.6 server), and then restart it with the 1.8.x binary from the MongoDB
Download Page645 .
Upgrading a Sharded Cluster
Returning to 1.6
If for any reason you must move back to 1.6, follow the steps above in reverse. Be careful that you have not
inserted any documents larger than 4MB while running on 1.8 (where the max size has increased to 16MB). If you
have, you will get errors when the server tries to read those documents.
Journaling Returning to 1.6 after using 1.8 Journaling (page 300) works fine, as journaling does not change anything
about the data file format. Suppose you are running 1.8.x with journaling enabled and you decide to switch back to
1.6. There are two scenarios:
If you shut down cleanly with 1.8.x, just restart with the 1.6 mongod binary.
If 1.8.x shut down uncleanly, start 1.8.x up again and let the journal files run to fix any damage (incomplete
writes) that may have existed at the crash. Then shut down 1.8.x cleanly and restart with the 1.6 mongod binary.
Changes
Journaling
MongoDB now supports write-ahead Journaling Mechanics (page 300) to facilitate fast crash recovery and durability
in the storage engine. With journaling enabled, a mongod can be quickly restarted following a crash without needing
to repair the collections.
Sparse and Covered Indexes
Sparse Indexes (page 490) are indexes that only include documents that contain the fields specified in the index.
Documents missing the field will not appear in the index at all. This can significantly reduce index size for indexes of
fields that contain only a subset of documents within a collection.
Covered Indexes (page 64) enable MongoDB to answer queries entirely from the index when the query only selects
fields that the index contains.
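The difference between a regular and a sparse index can be shown with a toy JavaScript model in which an array of entries stands in for the index (real indexes are B-trees). The documents and the twitter field are hypothetical; in the shell, a comparable sparse index would be created with something like db.users.ensureIndex({ twitter: 1 }, { sparse: true }), where users is a hypothetical collection.

```javascript
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    // A sparse index skips documents that lack the field entirely.
    if (sparse && !hasField) continue;
    // A regular index still holds an entry for such documents, keyed as null.
    entries.push({ key: hasField ? doc[field] : null, _id: doc._id });
  }
  return entries;
}

// Hypothetical documents: only some have a "twitter" field.
const docs = [
  { _id: 1, twitter: "@alice" },
  { _id: 2 },
  { _id: 3, twitter: "@carol" },
  { _id: 4 },
];

console.log(buildIndex(docs, "twitter").length);                   // 4
console.log(buildIndex(docs, "twitter", { sparse: true }).length); // 2
```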
Incremental MapReduce Support
The mapReduce command supports new options that enable incrementally updating existing collections. Previously,
a MapReduce job could output either to a temporary collection or to a named permanent collection, which it would
overwrite with new data.
You now have several options for the output of your MapReduce jobs:
648 http://downloads.mongodb.org/
You can merge MapReduce output into an existing collection. Output from the Reduce phase will replace
existing keys in the output collection if it already exists. Other keys will remain in the collection.
You can now re-reduce your output with the contents of an existing collection. Each key output by the reduce
phase will be reduced with the existing document in the output collection.
You can replace the existing output collection with the new results of the MapReduce job (equivalent to setting
a permanent output collection in previous releases)
You can compute MapReduce inline and return results to the caller without persisting the results of the job. This
is similar to the temporary collections generated in previous releases, except results are limited to 8MB.
For more information, see the out field options in the mapReduce document.
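The four output behaviors can be modeled with a JavaScript Map standing in for the output collection. This is a toy sketch under that assumption, not the server's implementation; in the mapReduce command itself the choice is made through the out field, e.g. { replace: "coll" }, { merge: "coll" }, { reduce: "coll" }, or { inline: 1 }. The applyOutput function is hypothetical.

```javascript
function applyOutput(existing, results, mode, reduce) {
  switch (mode) {
    case "replace": // discard the old collection entirely
      return new Map(results);
    case "merge": { // new values overwrite matching keys; other keys remain
      const out = new Map(existing);
      for (const [k, v] of results) out.set(k, v);
      return out;
    }
    case "reduce": { // re-reduce new values with existing ones for matching keys
      const out = new Map(existing);
      for (const [k, v] of results) {
        out.set(k, out.has(k) ? reduce(k, [out.get(k), v]) : v);
      }
      return out;
    }
    case "inline": // returned to the caller, nothing persisted
      return Array.from(results, ([_id, value]) => ({ _id, value }));
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const existing = new Map([["a", 5], ["b", 1]]);
const results = new Map([["a", 2], ["c", 4]]);

console.log(Object.fromEntries(applyOutput(existing, results, "replace", sum))); // { a: 2, c: 4 }
console.log(Object.fromEntries(applyOutput(existing, results, "merge", sum)));   // { a: 2, b: 1, c: 4 }
console.log(Object.fromEntries(applyOutput(existing, results, "reduce", sum)));  // { a: 7, b: 1, c: 4 }
```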
Additional Changes and Enhancements
1.8.1
Sharding migrate fix when moving larger chunks.
Durability fix with background indexing.
Fixed mongos concurrency issue with many incoming connections.
1.8.0
All changes from 1.7.x series.
1.7.6
Bug fixes.
1.7.5
Journaling (page 300).
Extent allocation improvements.
Improved replica set connectivity for mongos.
getLastError improvements for sharding.
1.7.4
mongos routes slaveOk queries to secondaries in replica sets.
New mapReduce output options.
Sparse Indexes (page 490).
1.7.3
Initial covered index (page 64) support.
Distinct can use data from indexes when possible.
mapReduce can merge or reduce results into an existing collection.
mongod tracks and mongostat displays network usage. See mongostat.
1.8.1649 , 1.8.0650
1.7.6651 , 1.7.5652 , 1.7.4653 , 1.7.3654 , 1.7.2655 , 1.7.1656 , 1.7.0657
649 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/v09MbhEm62Y
650 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/JeHQOnam6Qk
651 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/3t6GNZ1qGYc
652 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/S5R0Tx9wkEg
653 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/9Om3Vuw-y9c
654 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/DfNUrdbmflI
655 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/df7mwK6Xixo
656 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/HUR9zYtTpA8
657 https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/TUnJCg9161A
Resources
MongoDB Downloads658
All JIRA Issues resolved in 1.8659
Geo
2d geospatial search (page 484)
geo $center and $box searches
2. The default write concern on the new MongoClient class will be to acknowledge all write operations.
This will allow your application to receive acknowledgment of all write operations.
See the documentation of Write Concern (page 76) for more information about write concern in MongoDB.
Please migrate to the new MongoClient class expeditiously.
Releases
The following driver releases will include the changes outlined in Changes (page 887). See each driver's release notes
for a full account of each release as well as other related driver-specific changes.
C#, version 1.7
Java, version 2.10.0
Node.js, version 1.2
Perl, version 0.501.1
PHP, version 1.4
Python, version 2.4
Ruby, version 1.8
Important: Always upgrade to the latest stable revision of your release series.
The version numbering system for MongoDB differs from the system used for the MongoDB drivers. Drivers use only
the first number to indicate a major version. For details, see drivers-version-numbers.
Example
Version numbers
2.0.0 : Stable release.
2.0.1 : Revision.
2.1.0 : Development release for testing only. Includes new features and changes for testing. Interfaces and
stability may not be compatible in development releases.
2.2.0 : Stable release. This is a culmination of the 2.1.x development series.
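The numbering convention in the example can be expressed as a short JavaScript sketch: an even second component marks a stable series, an odd one a development series, and a non-zero third component a revision. The classifyVersion function is hypothetical and applies only to server versions, not driver versions.

```javascript
function classifyVersion(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  const kind = minor % 2 === 0 ? "stable" : "development";
  return { major, minor, patch, kind, isRevision: patch > 0 };
}

for (const v of ["2.0.0", "2.0.1", "2.1.0", "2.2.0"]) {
  const c = classifyVersion(v);
  console.log(v, c.kind, c.isRevision ? "revision" : "");
}
```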
CHAPTER 13
About MongoDB Documentation
The MongoDB Manual1 contains comprehensive documentation on the MongoDB document-oriented database management system. This page describes the manual's licensing, editions, and versions, and describes how to make a
change request and how to contribute to the manual.
For more information on MongoDB, see MongoDB: A Document Oriented Database2 . To download MongoDB, see
the downloads page3 .
13.1 License
This manual is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported4 (i.e.
CC-BY-NC-SA) license.
The MongoDB Manual is copyright 2011-2015 MongoDB, Inc.
13.2 Editions
In addition to the MongoDB Manual5 , you can also access this content in the following editions:
ePub Format6
Single HTML Page7
PDF Format8 (without reference.)
HTML tar.gz9
You also can access PDF files that contain subsets of the MongoDB Manual:
MongoDB Reference Manual10
MongoDB CRUD Operations11
1 http://docs.mongodb.org/manual/#
2 http://www.mongodb.org/about/
3 http://www.mongodb.org/downloads
4 http://creativecommons.org/licenses/by-nc-sa/3.0/
5 http://docs.mongodb.org/manual/#
6 http://docs.mongodb.org/master/MongoDB-manual.epub
7 http://docs.mongodb.org/master/single/
8 http://docs.mongodb.org/master/MongoDB-manual.pdf
9 http://docs.mongodb.org/master/manual.tar.gz
10 http://docs.mongodb.org/master/MongoDB-reference-manual.pdf
11 http://docs.mongodb.org/master/MongoDB-crud-guide.pdf
and stable version of the manual is always available at
To this end, the MongoDB Documentation Project is preparing to launch a translation effort to allow the community
to help bring the documentation to speakers of other languages.
If you would like to express interest in helping to translate the MongoDB documentation once this project is opened
to the public, please:
complete the MongoDB Contributor Agreement24 , and
join the mongodb-translators25 user group.
The mongodb-translators26 user group exists to facilitate collaboration between translators and the documentation
team at large. You can join the group without signing the Contributor Agreement, but you will not be allowed to
contribute translations.
See also:
Contribute to the Documentation (page 890)
Style Guide and Documentation Conventions (page 892)
MongoDB Manual Organization (page 901)
MongoDB Documentation Practices and Processes (page 898)
MongoDB Documentation Build System (page 902)
The entire documentation source for this manual is available in the mongodb/docs repository27 , which is one of the
MongoDB project repositories on GitHub28 .
To contribute to the documentation, you can open a GitHub account29 , fork the mongodb/docs repository30 , make a
change, and issue a pull request.
In order for the documentation team to accept your change, you must complete the MongoDB Contributor Agreement31 .
You can clone the repository by issuing the following command at your system shell:
git clone git://github.com/mongodb/docs.git
2011-09-27: Document created with a (very) rough list of style guidelines, conventions, and questions.
2012-01-12: Document revised based on slight shifts in practice, and as part of an effort of making it easier for people
outside of the documentation team to contribute to documentation.
2012-03-21: Merged in content from the Jargon, and cleaned up style in light of recent experiences.
2012-08-10: Addition to the Referencing section.
2013-02-07: Migrated this document to the manual. Added map-reduce terminology convention. Other edits.
2013-11-15: Added new table of preferred terms.
Naming Conventions
This section contains guidelines on naming files, sections, documents and other document elements.
File naming Convention:
For Sphinx, all files should have a .txt extension.
Separate words in file names with hyphens (i.e. -.)
For most documents, file names should have a terse one or two word name that describes the material covered in the document. Allow the path of the file within the document tree to add some of the required context/categorization.
For example, it's acceptable to have http://docs.mongodb.org/manual/core/sharding.rst and http://docs.mongodb.org/manual/administration/sharding.rst.
For tutorials, the full title of the document should be in the file name.
For example, http://docs.mongodb.org/manual/tutorial/replace-one-configuration-server-in-a-shard
Phrase headlines and titles so users can determine what questions the text will answer, and what material it will
address, without needing to read the content. This shortens the amount of time that people spend
looking for answers, and improves search/scanning and possibly SEO.
Prefer titles and headers in the form of "Using foo" over "How to Foo."
When using target references (i.e. :ref: references in documents), use names that include enough context to
be intelligible through all documentation. For example, use replica-set-secondary-only-node as
opposed to secondary-only-node. This makes the source more usable and easier to maintain.
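To illustrate the file naming conventions above, here is a hypothetical JavaScript helper (not part of the documentation toolchain) that derives a tutorial's source file name from its title. Lowercasing is an assumption drawn from the example path rather than a stated rule; the hyphen separators and .txt extension follow the conventions listed.

```javascript
function tutorialFileName(title) {
  const slug = title
    .toLowerCase()                 // assumption: matches the example path
    .replace(/[^a-z0-9]+/g, "-")   // any run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, "");      // trim leading/trailing hyphens
  return `${slug}.txt`;            // Sphinx source files use .txt
}

console.log(tutorialFileName("Replace One Configuration Server in a Shard"));
// replace-one-configuration-server-in-a-shard.txt
```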
Style Guide
This includes the local typesetting, English, and grammatical conventions and preferences that all documents in the manual
should use. The goal here is to choose good standards that are clear and have a stylistic minimalism that does not
interfere with or distract from the content. A uniform style will improve user experience and minimize the effect of a
multi-authored document.
Punctuation
Use the Oxford comma.
Oxford commas are the commas in a list of things (e.g. "something, something else, and another thing") before
the conjunction (e.g. "and" or "or").
Do not add two spaces after terminal punctuation, such as periods.
Place commas and periods inside quotation marks.
Headings Use title case for headings and document titles. Title case capitalizes the first letter of the first, last, and
all significant words.
Verbs Verb tense and mood preferences, with examples:
Avoid the first person. For example, do not say, "We will begin the backup process by locking the database," or
"I begin the backup process by locking my database instance."
Use the second person. "If you need to back up your database, start by locking the database first." In practice,
however, it's more concise to imply second person using the imperative, as in "Before initiating a backup, lock
the database."
When indicated, use the imperative mood. For example: "Back up your databases often" and "To prevent data
loss, back up your databases."
The future perfect is also useful in some cases. For example, "Creating disk snapshots without locking the
database will lead to an invalid state."
Avoid helper verbs, as possible, to increase clarity and concision. For example, attempt to avoid "this does
foo" and "this will do foo" when possible. Use "does foo" over "will do foo" in situations where "this foos" is
unacceptable.
Referencing
To refer to future or planned functionality in MongoDB or a driver, always link to the Jira case. The Manual's
conf.py provides an :issue: role that links directly to a Jira case (e.g. :issue:`SERVER-9001`).
For non-object references (i.e. functions, operators, methods, database commands, settings) always reference
only the first occurrence of the reference in a section. You should always reference objects, except in section
headings.
Structure references with the why first; the link second.
For example, instead of this:
Use the Convert a Replica Set to a Replicated Sharded Cluster (page 678) procedure if you have an existing
replica set.
Type this:
To deploy a sharded cluster for an existing replica set, see Convert a Replica Set to a Replicated Sharded Cluster
(page 678).
General Formulations
Contractions are acceptable insofar as they are necessary to increase readability and flow. Avoid otherwise.
Make lists grammatically correct.
Do not use a period after every item unless the list item completes the unfinished sentence before the list.
Do not include .. code-block:: [language] in footnotes.
As it makes sense, use the .. code-block:: [language] form to insert literal blocks into the text.
While the double colon, ::, is functional, the .. code-block:: [language] form makes the source
easier to read and understand.
For all mentions of referenced types (i.e. commands, operators, expressions, functions, statuses, etc.) use the
reference types to ensure uniform formatting and cross-referencing.
Preferred Term | Concept | Dispreferred Alternatives | Notes
document | | record, object, row |
instance | | process (acceptable sometimes), node (never acceptable), server |
field name | | key, column |
field/value | The name/value pair that describes a unit of data in MongoDB. | |
value | | data |
MongoDB | | mongo, mongodb, cluster |
embedded document | An embedded or nested document within a document or an array. | embedded document, nested document |
map-reduce | | mapReduce, map reduce, map/reduce |
cluster | A sharded cluster. | grid, shard cluster, set, deployment |
sharded cluster | | shard cluster, cluster, sharded system |
replica set | | set, replication deployment |
deployment | | cluster, system | Typically in the form MongoDB deployment. Includes standalones, replica sets and sharded clusters.
Geo-Location
1. While MongoDB is capable of storing coordinates in embedded documents, in practice, users should only
store coordinates in arrays. (See: DOCS-41.)
MongoDB Documentation Practices and Processes
This document provides an overview of the practices and processes.
Commits
When relevant, include a Jira case identifier in a commit message. Reference documentation cases when applicable,
but feel free to reference other cases from jira.mongodb.org36 .
Err on the side of creating a larger number of discrete commits rather than bundling a large set of changes into one
commit.
35 https://jira.mongodb.org/browse/DOCS-41
36 http://jira.mongodb.org/
For the sake of consistency, remove trailing whitespaces in the source file.
Hard wrap files to between 72 and 80 characters per line.
Standards and Practices
At least two people should vet all non-trivial changes to the documentation before publication. One of the
reviewers should have significant technical experience with the material covered in the documentation.
All development and editorial work should transpire on GitHub branches or forks that editors can then merge
into the publication branches.
Collaboration
Building the documentation is useful because Sphinx40 and docutils can catch numerous errors in the format and
syntax of the documentation. Additionally, having access to an example of the documentation as it will appear to users
provides a more effective basis for the review process. Besides Sphinx, Pygments, and Python-Docutils,
the documentation repository contains all requirements for building the documentation resource.
Talk to someone on the documentation team if you are having problems running builds yourself.
Publication
The makefile for this repository contains targets that automate the publication process. Use make html to publish
a test build of the documentation in the build/ directory of your repository. Use make publish to build the full
contents of the manual from the current branch in the ../public-docs/ directory relative to the docs repository.
Other targets include:
man - builds UNIX Manual pages for all MongoDB utilities.
push - builds and deploys the contents of the ../public-docs/.
pdfs - builds a PDF version of the manual (requires LaTeX dependencies.)
Branches
This section provides an overview of the git branches in the MongoDB documentation repository and their use.
37 https://jira.mongodb.org/browse/DOCS
38 https://github.com/
39 https://github.com/mongodb/docs
40 http://sphinx.pocoo.org/
At the present time, future work transpires in the master branch, with the main publication being current. As the
documentation stabilizes, the documentation team will begin to maintain branches of the documentation for specific
MongoDB releases.
Migration from Legacy Documentation
The MongoDB.org Wiki contains a wealth of information. As the transition to the Manual (i.e. this project and
resource) continues, it's critical that no information disappears or goes missing. The following process outlines how
to migrate a wiki page to the manual:
1. Read the relevant sections of the Manual, and see what the new documentation has to offer on a specific topic.
In this process you should follow cross references and gain an understanding of both the underlying information
and how the parts of the new content relate to one another.
2. Read the wiki page you wish to redirect, and take note of all of the factual assertions and examples presented by the
wiki page.
3. Test the factual assertions of the wiki page to the greatest extent possible. Ensure that example output is accurate.
In the case of commands and reference material, make sure that documented options are accurate.
4. Make corrections to the manual page or pages to reflect any missing pieces of information.
The target of the redirect need not contain every piece of information on the wiki page, if the manual as a
whole does, and relevant section(s) with the information from the wiki page are accessible from the target of the
redirection.
5. As necessary, get these changes reviewed by another writer and/or someone familiar with the area of the information in question.
At this point, update the relevant Jira case with the target that you've chosen for the redirect, and make the ticket
unassigned.
6. When someone has reviewed the changes and published those changes to Manual, you, or preferably someone
else on the team, should make a final pass at both pages with fresh eyes and then make the redirect.
Steps 1-5 should ensure that no information is lost in the migration, and that the final review in step 6 is
trivial to complete.
Review Process
Types of Review The content in the Manual undergoes many types of review, including the following:
Initial Technical Review Review by an engineer familiar with MongoDB and the topic area of the documentation.
This review focuses on technical content, and correctness of the procedures and facts presented, but can improve any
aspect of the documentation that may still be lacking. When both the initial technical review and the content review
are complete, the piece may be published.
Content Review Textual review by another writer to ensure stylistic consistency with the rest of the manual. Depending on the content, this may precede or follow the initial technical review. When both the initial technical review
and the content review are complete, the piece may be published.
Consistency Review This occurs post-publication and is content focused. The goal of consistency reviews is to
increase the internal consistency of the documentation as a whole. Insert relevant cross-references, update the style as
needed, and provide background fact-checking.
Consistency reviews should be as systematic as possible; avoid encouraging stylistic and
information drift by editing only small sections at a time.
Subsequent Technical Review If the documentation needs to be updated following a change in functionality of the
server or following the resolution of a user issue, changes may be significant enough to warrant additional technical
review. These reviews follow the same form as the initial technical review, but are often less involved and cover a
smaller area.
Review Methods If you're not a usual contributor to the documentation and would like to review something, you
can submit reviews using any of the following methods:
If you're reviewing an open pull request in GitHub, the best way to comment is on the overview diff, which
you can find by clicking on the diff button in the upper left portion of the screen. You can also use the
following URL to reach this interface:
https://github.com/mongodb/docs/pull/[pull-request-id]/files
Replace [pull-request-id] with the identifier of the pull request. Make all comments inline, using
GitHub's comment system.
You may also provide comments directly on commits, or on the pull request itself, but these commit-comments
are archived in less coherent ways and generate less useful emails, while comments on the pull request lead to
less specific changes to the document.
Leave feedback on Jira cases in the DOCS41 project. These are better for more general changes that aren't
necessarily tied to a specific line, or affect multiple files.
Create a fork of the repository in your GitHub account, make any required changes and then create a pull request
with your changes.
If you insert lines that begin with any of the following annotations:
.. TODO:
TODO:
.. TODO
TODO
followed by your comments, it will be easier for the original writer to locate your comments. The two dots ..
format is a comment in reStructuredText, which will hide your comments from Sphinx and publication if you're
worried about that.
This format is often easier for reviewers with larger portions of content to review.
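Because these annotations always begin a line, a writer can list every outstanding review comment with one search. The following shell function is a minimal sketch. The function name find_review_todos, the default source directory, and the assumption that source files use the .txt extension (as contents.txt and index.txt do) are illustrative and are not part of the documentation tooling.

```shell
# Hypothetical helper: list reviewer annotations (".. TODO:", "TODO:",
# ".. TODO", "TODO") that start a line in the documentation sources.
# Assumes reStructuredText sources use the .txt extension.
find_review_todos() {
  dir=${1:-source}
  # -r recurses, -n prints line numbers, -E enables extended regexes.
  # The pattern allows leading spaces and an optional ".. " comment marker.
  grep -rnE --include='*.txt' '^[[:space:]]*(\.\. )?TODO' "$dir"
}

# Usage, from the repository root:
# find_review_todos source
```

Each match prints as file:line:text, so the writer can jump straight to the annotated line.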
MongoDB Manual Organization
This document provides an overview of the global organization of the documentation resource. Refer to the notes
below if you are having trouble understanding the reasoning behind a file's current location, or if you want to add new
documentation but aren't sure how to integrate it into the existing resource.
If you have questions, don't hesitate to open a ticket in the Documentation Jira Project42 or contact the documentation
team43 .
41 http://jira.mongodb.org/browse/DOCS
42 https://jira.mongodb.org/browse/DOCS
43 [email protected]
Global Organization
Indexes and Experience The documentation project has two index files:
http://docs.mongodb.org/manual/contents.txt and http://docs.mongodb.org/manual/index.txt.
The contents file provides the documentation's tree structure, which Sphinx uses to create the left-pane navigational
structure, to power the Next and Previous page functionality, and to provide all overarching outlines of the
resource. The index file is not included in the contents file (and thus builds will produce a warning here) and is
the page that users first land on when visiting the resource.
Having separate contents and index files provides a bit more flexibility with the organization of the resource while
also making it possible to customize the primary user experience.
Topical Organization The placement of files in the repository depends on the type of documentation rather than the
topic of the content. Like the difference between contents.txt and index.txt, decoupling the organization
of the files from the organization of the information makes the documentation more flexible and better able to
address changes in the product and in users' needs.
Files in the source/ directory represent the tip of a logical tree of documents, while directories are containers of
types of content. The administration and applications directories, however, are legacy artifacts and, with a
few exceptions, contain sub-navigation pages.
With several exceptions in the reference/ directory, there is only one level of sub-directories in the source/
directory.
Tools
The organization of the site, like that of all Sphinx sites, derives from the toctree44 structure. However, in order to annotate
the table of contents and provide additional flexibility, the MongoDB documentation generates toctree45 structures
using data from YAML files stored in the source/includes/ directory. These files start with ref-toc or toc
and generate output in the source/includes/toc/ directory. Briefly, this system has the following behavior:
files that start with ref-toc refer to the documentation of API objects (i.e. commands, operators and methods),
and the build system generates files that hold toctree46 directives as well as files that hold tables that list
objects and a brief description.
files that start with toc refer to all other documentation and the build system generates files that hold
toctree47 directives as well as files that hold definition lists that contain links to the documents and short
descriptions of the content.
file names that have spec following toc or ref-toc will generate aggregated tables or definition lists and
allow ad-hoc combinations of documents for landing pages and quick reference guides.
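The naming rules above amount to a prefix match on the file name. The function below is a hypothetical sketch of that classification, not part of the build system. It assumes that "spec" appears immediately after the prefix (as in toc-spec- or ref-toc-spec-), and the example file names are invented for illustration.

```shell
# Hypothetical helper: describe the generated output for a YAML source
# file in source/includes/, based only on its name prefix.
# More specific prefixes are matched first.
toc_kind() {
  name=$(basename "$1")
  case "$name" in
    ref-toc-spec*) echo "aggregated table (API objects)" ;;
    ref-toc*)      echo "toctree + table of API objects" ;;
    toc-spec*)     echo "aggregated definition list" ;;
    toc*)          echo "toctree + definition list" ;;
    *)             echo "not a toc source file" ;;
  esac
}

# Usage:
# for f in source/includes/ref-toc* source/includes/toc*; do
#   echo "$f: $(toc_kind "$f")"
# done
```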
MongoDB Documentation Build System
This document contains more direct instructions for building the MongoDB documentation.
Getting Started
Install Dependencies The MongoDB Documentation project depends on the following tools:
44 http://sphinx-doc.org/markup/toctree.html#directive-toctree
45 http://sphinx-doc.org/markup/toctree.html#directive-toctree
46 http://sphinx-doc.org/markup/toctree.html#directive-toctree
47 http://sphinx-doc.org/markup/toctree.html#directive-toctree
Python
Git
Inkscape (image generation)
LaTeX/PDF LaTeX (typically texlive; for building PDFs)
Giza48
OS X Install Sphinx, Docutils, and their dependencies with the following easy_install command:
easy_install giza
Feel free to use pip rather than easy_install to install Python packages.
To generate the images used in the documentation, download and install Inkscape49 .
Optional
To generate PDFs for the full production build, install a TeX distribution. If you do not already have a
LaTeX installation, use MacTeX50 . You need TeX only to build PDFs.
Arch Linux Install packages from the system repositories with the following command:
pacman -S inkscape python2-pip
Optional
To generate PDFs for the full production build, install the following packages from the system repository:
pacman -S texlive-bin texlive-core texlive-latexextra
Debian/Ubuntu Install the required system packages with the following command:
apt-get install inkscape python-pip
Optional
To generate PDFs for the full production build, install the following packages from the system repository:
apt-get install texlive-latex-recommended
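After installing the packages for your platform, you can confirm that the tools are on your PATH before running a build. This is a minimal sketch. The helper name check_tools is hypothetical, and the executable names python, git, inkscape, and giza are assumptions: on some systems, such as Arch Linux, Python 2 may be installed as python2 instead.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if any command is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Required tools (executable names are assumptions; adjust for your system):
# check_tools python git inkscape giza
# Optional: the latex target uses pdflatex to compile PDFs.
# check_tools pdflatex
```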
48 https://pypi.python.org/pypi/giza
49 http://inkscape.org/download/
50 http://www.tug.org/mactex/2011/
The MongoDB documentation build system is entirely accessible via make targets. For example, to build an HTML
version of the documentation issue the following command:
make html
You can find the build output in build/<branch>/html, where <branch> is the name of the current branch.
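Because the output directory depends on the current branch, a small helper can print the path after a build. This sketch is hypothetical. It assumes you run it from the root of a Git checkout of the documentation repository, and it also accepts an explicit branch name as an optional argument.

```shell
# Hypothetical helper: print build/<branch>/html for the given branch,
# or for the currently checked-out Git branch if no argument is given.
html_output_dir() {
  branch=${1:-$(git symbolic-ref --short HEAD 2>/dev/null)}
  # symbolic-ref fails on a detached HEAD, which leaves branch empty.
  if [ -z "$branch" ]; then
    echo "cannot determine the current branch" >&2
    return 1
  fi
  echo "build/$branch/html"
}

# Usage: build, then list the output for the current branch.
# make html && ls "$(html_output_dir)"
```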
In addition to the html target, the build system provides the following targets:
publish Builds and integrates all output for the production build. Build output is in
build/public/<branch>/. When you run publish in the master branch, the build will generate
some output in build/public/.
push; stage Uploads the production build to the production or staging web servers. Depends on publish. Requires access to the production or staging environment.
push-all; stage-all Uploads the entire content of build/public/ to the web servers. Depends on
publish. Not used in common practice.
push-with-delete; stage-with-delete Modifies the action of push and stage to remove remote files
that don't exist in the local build. Use with caution.
html; latex; dirhtml; epub; texinfo; man; json These are standard targets derived from the default
Sphinx Makefile, with adjusted dependencies. Additionally, for all of these targets you can append -nitpick
to increase Sphinx's verbosity, or -clean to remove all Sphinx build artifacts.
latex performs several additional post-processing steps on .tex output generated by Sphinx. This target will
also compile PDFs using pdflatex.
html and man also generate a .tar.gz file of the build outputs for inclusion in the final releases.
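The suffix convention composes mechanically with the standard target names. The helper below is a hypothetical sketch that checks a base target against the list above and appends -nitpick or -clean. It does not read the Makefile, so it will not reflect targets that are added or removed later.

```shell
# Hypothetical helper: compose a variant of one of the standard Sphinx
# targets (html, latex, dirhtml, epub, texinfo, man, json).
sphinx_target() {
  base=$1
  variant=${2:-}
  case "$base" in
    html|latex|dirhtml|epub|texinfo|man|json) ;;
    *) echo "unknown Sphinx target: $base" >&2; return 1 ;;
  esac
  case "$variant" in
    "")            echo "$base" ;;
    nitpick|clean) echo "$base-$variant" ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

# Usage: this runs "make html-nitpick".
# make "$(sphinx_target html nitpick)"
```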
If you have any questions, please feel free to open a Jira Case51 .
51 https://jira.mongodb.org/browse/DOCS