MongoDB Architecture Guide
Table of Contents
Introduction
Availability
Security
Operational Management
MongoDB Atlas
Conclusion
We Can Help
Resources
Introduction
MongoDB wasn't designed in a lab. We built MongoDB
from our own experiences building large-scale, high
availability, robust systems. We didn't start from scratch, we
really tried to figure out what was broken, and tackle that.
So the way I think about MongoDB is that if you take
MySQL, and change the data model from relational to
document-based, you get a lot of great features:
embedded docs for speed, manageability, agile
development with dynamic schemas, easier horizontal
scalability because joins aren't as important. There are a lot
of things that work great in relational databases: indexes,
dynamic queries and updates to name a few, and we
haven't changed much there. For example, the way you
design your indexes in MongoDB should be exactly the
way you do it in MySQL or Oracle, you just have the option
of indexing an embedded field.
Eliot Horowitz, MongoDB CTO and Co-Founder
MongoDB is designed for how we build and run
data-driven applications with modern development
techniques, programming models, computing resources,
and operational automation.
Figure 1: MongoDB Nexus Architecture, blending the best
of relational and NoSQL technologies
Relational databases have reliably served applications for
many years, and offer features that remain critical today as
developers build the next generation of applications:
Expressive query language & secondary indexes.
Users should be able to access and manipulate their
data in sophisticated ways to support both operational
and analytical applications. Indexes play a critical role in
providing efficient access to data, supported natively by
the database rather than maintained in application code.
Strong consistency. Applications should be able to
immediately read what has been written to the
database. It is much more complex to build applications
around an eventually consistent model, imposing
significant work on the developer, even for the most
sophisticated engineering teams.
Enterprise Management and Integrations.
Databases are just one piece of application
infrastructure, and need to fit seamlessly into the
enterprise IT stack. Organizations need a database that
can be secured, monitored, automated, and integrated
with their existing technology infrastructure, processes,
and staff, including operations teams, DBAs, and data
engineers.
However, modern applications impose requirements not
addressed by relational databases, and this has driven the
development of NoSQL databases which offer:
Flexible Data Model. NoSQL databases emerged to
address the requirements for the data we see
dominating modern applications. Whether document,
graph, key-value, or wide-column, all of them offer a
flexible data model, making it easy to store and combine
data of any structure and allow dynamic modification of
the schema without downtime or performance impact.
Scalability and Performance. NoSQL databases were
all built with a focus on scalability, so they all include
some form of sharding or partitioning. This allows the
database to be scaled out across commodity hardware
deployed on-premises or in the cloud, enabling almost
unlimited growth with higher throughput and lower
latency than relational databases.
Always-On Global Deployments. NoSQL databases
are designed for continuously available systems that
provide a consistent, high quality experience for users
all over the world. They are designed to run across many
nodes, including replication to automatically synchronize
data across servers, racks, and
geographically-dispersed data centers.
While offering these innovations, NoSQL systems have
sacrificed the critical capabilities that people have come to
expect and rely upon from relational databases. MongoDB
offers a different approach. With its Nexus Architecture,
MongoDB is the only database that harnesses the
innovations of NoSQL while maintaining the foundation of
relational databases.
Figure 2: Flexible Storage Architecture, optimising MongoDB for unique application demands
MongoDB Multimodel Architecture
MongoDB embraces two key trends in modern IT:
Organizations are rapidly expanding the range of
applications they deliver to digitally transform the
business.
CIOs are rationalizing their technology portfolios to a
strategic set of vendors they can leverage to more
efficiently support their business.
With MongoDB, organizations can address diverse
application needs, hardware resources, and deployment
designs with a single database technology:
MongoDB's flexible document data model presents a
superset of other database models. It allows data to be
represented as simple key-value pairs and flat, table-like
structures, through to rich documents and objects with
deeply nested arrays and sub-documents.
With an expressive query language, documents can be
queried in many ways, from simple lookups to creating
sophisticated processing pipelines for data analytics.
Figure 3: Example relational data model for a blogging
application
Documents that tend to share a similar structure are
organized as collections. It may be helpful to think of
collections as being analogous to a table in a relational
database: documents are similar to rows, and fields are
similar to columns.
For example, consider the data model for a blogging
application. In a relational database the data model would
comprise multiple tables. To simplify the example, assume
there are tables for Categories, Tags, Users, Comments
and Articles.
In MongoDB the data could be modeled as two collections,
one for users, and the other for articles. In each blog
document there might be multiple comments, multiple tags,
and multiple categories, each expressed as an embedded
array.
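As a rough illustration of the two-collection model described above, the documents might be shaped like the Python dictionaries below. The field names (title, author_id, comments, and so on) are assumptions made for this sketch; the guide does not prescribe them.

```python
# Hypothetical documents for the blogging example. All field names are
# illustrative assumptions, not a schema defined by this guide.
user = {
    "_id": 1,
    "name": "Ada",
    "email": "ada@example.com",
}

article = {
    "_id": 100,
    "title": "Modeling data with documents",
    "author_id": user["_id"],            # reference into the users collection
    "categories": ["databases", "design"],
    "tags": ["mongodb", "schema"],
    "comments": [                        # comments embedded as an array
        {"user_id": 1, "text": "Nice overview."},
        {"user_id": 2, "text": "What about indexes?"},
    ],
}

# Everything needed to render the article lives in one document, so reading it
# does not require joining separate Categories, Tags, or Comments tables.
comment_texts = [comment["text"] for comment in article["comments"]]
print(comment_texts)
```

Embedding keeps data that is read together in one place; the reference from the article to its author shows that not everything has to be embedded.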
Document Validation
Dynamic schemas bring great agility, but it is also important
that controls can be implemented to maintain data quality,
especially if the database is shared by multiple
applications. Unlike NoSQL databases that push
enforcement of these controls back into application code,
MongoDB provides document validation within the
database. Users can enforce checks on document
structure, data types, data ranges and the presence of
mandatory fields. As a result, DBAs can apply data
governance standards, while developers maintain the
benefits of a flexible document model.
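The checks listed above (structure, data types, ranges, mandatory fields) could be expressed as a validator attached to a collection. The sketch below only builds the command document in plain Python; the collection name and fields are hypothetical, and a driver would be needed to actually send it to a server (for example via PyMongo's db.command).

```python
def build_create_command(collection: str) -> dict:
    """Build a `create` command document carrying validation rules.

    The fields (name, email, age) are hypothetical examples.
    """
    validator = {
        "$and": [
            {"name": {"$type": "string"}},       # data type check
            {"email": {"$exists": True}},        # mandatory field
            {"age": {"$gte": 0, "$lte": 150}},   # data range check
        ]
    }
    return {
        "create": collection,
        "validator": validator,
        "validationLevel": "strict",   # apply the rules to all inserts and updates
        "validationAction": "error",   # reject documents that fail validation
    }


command = build_create_command("contacts")
print(command)
```

Fields not mentioned in the validator remain unconstrained, which is how the rules coexist with a flexible document model.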
Schema Design
Although MongoDB provides schema flexibility, schema
design is still important. Developers and DBAs should
consider a number of topics, including the types of queries
the application will need to perform, relationships between
data, how objects are managed in the application code, and
how documents will change over time. Schema design is an
extensive topic that is beyond the scope of this document.
Idiomatic Drivers
MongoDB provides native drivers for all popular
programming languages and frameworks to make
development natural. Supported drivers include Java,
JavaScript, .NET, Python, Perl, PHP, Scala and others, in
addition to 30+ community-developed drivers. MongoDB
drivers are designed to be idiomatic for the given language.
One fundamental difference with relational databases is
that the MongoDB query model is implemented as
methods or functions within the API of a specific
programming language, as opposed to a completely
separate language like SQL. This, coupled with the affinity
between MongoDB's JSON document model and the data
structures used in object-oriented programming, makes
integration with applications simple. For a complete list of
drivers see the MongoDB Drivers page.
Figure 5: Interactively build and execute database queries
with MongoDB Compass
Aggregation Framework queries return aggregations
and transformations of values returned by the query
(e.g., count, min, max, average, similar to a SQL GROUP
BY statement).
JOINs and graph traversals. Through the $lookup
stage of the aggregation pipeline, documents from
separate collections can be combined through a left
outer JOIN operation. $graphLookup brings native
graph processing within MongoDB, enabling efficient
traversals across trees, graphs and hierarchical data to
uncover patterns and surface previously unidentified
connections.
MapReduce queries execute complex data processing
that is expressed in JavaScript and executed across
data in the database.
Additionally, the MongoDB Connector for Apache Spark
exposes Spark's Scala, Java, Python, and R libraries.
MongoDB data is materialized as DataFrames and
Datasets for analysis through machine learning, graph,
streaming, and SQL APIs.
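To make the aggregation and $lookup stages described above more concrete, here is a sketch of a pipeline definition written as Python data. The collection and field names (customers, customer_id, amount, status) are assumptions for illustration. Since no server is involved, the second half computes the same grouping in plain Python to show what the $match and $group stages would produce.

```python
from collections import defaultdict

# Pipeline definition. A driver would pass this list to an aggregate call.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {
        "_id": "$customer_id",
        "total": {"$sum": "$amount"},   # similar to SUM(amount) ... GROUP BY
        "orders": {"$sum": 1},          # similar to COUNT(*)
    }},
    {"$lookup": {                       # left outer join against another collection
        "from": "customers",
        "localField": "_id",
        "foreignField": "_id",
        "as": "customer",
    }},
]

# Plain-Python equivalent of the $match and $group stages, for illustration only.
orders = [
    {"customer_id": "a", "status": "shipped", "amount": 10},
    {"customer_id": "a", "status": "shipped", "amount": 5},
    {"customer_id": "b", "status": "pending", "amount": 7},
    {"customer_id": "b", "status": "shipped", "amount": 3},
]

totals = defaultdict(lambda: {"total": 0, "orders": 0})
for order in orders:
    if order["status"] == "shipped":
        group = totals[order["customer_id"]]
        group["total"] += order["amount"]
        group["orders"] += 1

print(dict(totals))
```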
Indexing
Query Optimization
MongoDB automatically optimizes queries to make
evaluation as efficient as possible. Evaluation normally
includes selecting data based on predicates, and sorting
data based on the sort criteria provided. The query
optimizer selects the best index to use by periodically
running alternate query plans and selecting the index with
the best response time for each query type. The results of
this empirical test are stored as a cached query plan and
are updated periodically. Developers can review and
optimize plans using the powerful explain method and
index filters. Using MongoDB Compass, DBAs can
visualize index coverage, enabling them to determine which
specific fields are indexed, their type, size, and how often
they are used. Compass also provides the ability to
visualize explain plans, presenting key information on how
a query performed, for example the number of documents
returned, execution time, index usage, and more. Each
Covered Queries
Queries that return results containing only indexed fields
are called covered queries. These results can be returned
without reading from the source documents. With the
appropriate indexes, workloads can be optimized to use
predominantly covered queries.
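The rule above can be approximated with a small check. This is a simplified sketch, not the query planner's logic: it only looks at top-level field names, and the index and field names are hypothetical. It also encodes one practical detail, namely that _id is returned by default, so it has to be excluded from the projection (or be part of the index) for the query to be covered.

```python
def is_covered(index_fields, query, projection):
    """Return True if the filter fields and the returned fields all come from the index."""
    indexed = set(index_fields)
    filter_fields = set(query)
    returned = {field for field, include in projection.items() if include}
    # _id is returned unless the projection excludes it explicitly.
    if projection.get("_id", 1):
        returned.add("_id")
    return filter_fields <= indexed and returned <= indexed


index = ["email", "status"]  # hypothetical compound index on (email, status)

covered = is_covered(index, {"email": "ada@example.com"},
                     {"_id": 0, "email": 1, "status": 1})
not_covered_id = is_covered(index, {"email": "ada@example.com"}, {"email": 1})
not_covered_filter = is_covered(index, {"name": "Ada"}, {"_id": 0, "email": 1})
print(covered, not_covered_id, not_covered_filter)
```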
Figure 6: Automatic sharding provides horizontal scalability
in MongoDB.
Range Sharding. Documents are partitioned across
shards according to the shard key value. Documents
with shard key values close to one another are likely to
be co-located on the same shard. This approach is well
suited for applications that need to optimize range
based queries.
Hash Sharding. Documents are distributed according
to an MD5 hash of the shard key value. This approach
guarantees a uniform distribution of writes across
shards, but is less optimal for range-based queries.
Zone Sharding. Provides the ability for DBAs and
operations teams to define specific rules governing data
placement in a sharded cluster. Zones accommodate a
range of deployment scenarios, for example locating
data by geographic region, by hardware configuration
for tiered storage architectures, or by application
feature. Administrators can continuously refine data
placement rules by modifying shard key ranges, and
MongoDB will automatically migrate the data to its new
zone.
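The difference between range and hash sharding can be illustrated with a toy placement function. This is a deliberately simplified sketch: the shard count, the split points, and the modulo step are assumptions for illustration and do not reflect how MongoDB actually manages chunks.

```python
import hashlib
from bisect import bisect_right

NUM_SHARDS = 3
# Hypothetical split points: shard 0 holds keys below 1000,
# shard 1 holds 1000 to 1999, shard 2 holds 2000 and above.
RANGE_BOUNDARIES = [1000, 2000]


def range_shard(key: int) -> int:
    """Place a key by comparing it against the sorted split points."""
    return bisect_right(RANGE_BOUNDARIES, key)


def hash_shard(key: int) -> int:
    """Place a key by an MD5 hash of its value, reduced to a shard number."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


nearby_keys = range(1000, 1010)
range_placement = {range_shard(k) for k in nearby_keys}
hash_placement = [hash_shard(k) for k in nearby_keys]
print(range_placement, hash_placement)
```

Neighboring keys land on the same shard under range placement, which suits range queries, while hashing scatters them, which spreads writes but forces range queries to touch more shards.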
Tens of thousands of organizations use MongoDB to build
high-performance systems at scale. You can read more
about them on the MongoDB scaling page.
Query Router
Sharding is transparent to applications; whether there is
one or one hundred shards, the application code for
querying MongoDB is the same. Applications issue queries
to a query router that dispatches the query to the
appropriate shards.
For key-value queries that are based on the shard key, the
query router will dispatch the query to the shard that
manages the document with the requested key. When
using range-based sharding, queries that specify ranges on
the shard key are only dispatched to shards that contain
documents with values within the range. For queries that
don't use the shard key, the query router will broadcast the
query to all shards.
Figure 7: Sharding is transparent to applications.
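The routing behavior described in this section (targeted for shard-key lookups, limited to matching shards for shard-key ranges, broadcast otherwise) can be sketched as a small function. The shard key name, split points, and shard numbering are hypothetical, and only the $gte and $lte operators are handled.

```python
from bisect import bisect_right
from typing import List

SHARD_KEY = "user_id"        # hypothetical shard key
BOUNDARIES = [1000, 2000]    # hypothetical split points between three shards
ALL_SHARDS = [0, 1, 2]


def route(query: dict) -> List[int]:
    """Return the shards a toy query router would contact for this query."""
    if SHARD_KEY not in query:
        # No shard key in the filter: broadcast to every shard.
        return list(ALL_SHARDS)
    condition = query[SHARD_KEY]
    if isinstance(condition, dict):
        # Range on the shard key: contact only shards overlapping the range.
        low = condition.get("$gte", float("-inf"))
        high = condition.get("$lte", float("inf"))
        first = bisect_right(BOUNDARIES, low)
        last = bisect_right(BOUNDARIES, high)
        return list(range(first, last + 1))
    # Equality on the shard key: exactly one shard owns the value.
    return [bisect_right(BOUNDARIES, condition)]


print(route({"user_id": 1500}))
print(route({"user_id": {"$gte": 500, "$lte": 1500}}))
print(route({"name": "Ada"}))
```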
Consistency
Transaction Model & Configurable Write Availability
MongoDB is ACID compliant at the document level. One or
more fields may be written in a single operation, including
updates to multiple sub-documents and elements of an
array. The ACID guarantees provided by MongoDB ensure
complete isolation as a document is updated; any errors
cause the operation to roll back and clients receive a
consistent view of the document.
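As an illustration of a single operation that touches several fields, a sub-document, and an array element at once, the update below is written as plain Python data in the shape of an update command. The collection and field names are hypothetical. Because every change targets the same document, the document-level guarantee described above would apply to the update as a whole.

```python
# Filter selecting exactly one document (hypothetical _id).
filter_doc = {"_id": 100}

# One update document combining several modifications.
update_doc = {
    "$set": {
        "title": "Updated title",    # top-level field
        "author.name": "Ada",        # field inside a sub-document
    },
    "$push": {"comments": {"user_id": 3, "text": "Thanks!"}},  # append to an array
    "$inc": {"comment_count": 1},    # increment a counter in the same operation
}

update_command = {
    "update": "articles",
    "updates": [{"q": filter_doc, "u": update_doc}],
}
print(update_command)
```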
MongoDB also allows users to specify write availability in
the system using an option called the write concern. The
default write concern acknowledges writes from the
application, allowing the client to catch network exceptions
and duplicate key errors. Developers can use MongoDB's
Write Concerns to configure operations to commit to the
application only after specific policies have been fulfilled,
for example only after the operation has been flushed to
the journal on disk. This is the same mode used by many
traditional relational databases to provide durability
guarantees. As a distributed system, MongoDB presents
additional flexibility in enabling users to achieve their
desired durability goals, such as writing to at least two
replicas in one data center and one replica in a second
data center. Each query can specify the appropriate write
concern, ranging from unacknowledged to
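A few write concern settings can be sketched as the documents a driver would attach to a write. The helper below is hypothetical; the w, j, and wtimeout fields are the write concern options themselves, while the collection name and values are illustrative.

```python
def write_concern(w=1, journal=False, timeout_ms=None) -> dict:
    """Build a write concern document (hypothetical helper for illustration)."""
    concern = {"w": w}
    if journal:
        concern["j"] = True               # wait for the on-disk journal
    if timeout_ms is not None:
        concern["wtimeout"] = timeout_ms  # give up waiting after this many ms
    return concern


unacknowledged = write_concern(w=0)                      # fire and forget
journaled = write_concern(w=1, journal=True)             # durable on the primary
majority = write_concern(w="majority", timeout_ms=5000)  # most replica set members

insert_command = {
    "insert": "orders",
    "documents": [{"_id": 1, "item": "book"}],
    "writeConcern": journaled,
}
print(unacknowledged, journaled, majority)
```

Stronger settings trade write latency for durability, which is why the choice is left to each operation.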
Availability
Replication
MongoDB maintains multiple copies of data called replica
sets using native replication. A replica set is a fully
self-healing shard that helps prevent database downtime
and can be used to scale read operations. Replica failover
is fully automated, eliminating the need for administrators
to intervene manually.
A replica set consists of multiple replicas. At any given time
one member acts as the primary replica set member and
the other members act as secondary replica set members.
MongoDB is strongly consistent by default: reads and
writes are issued to a primary copy of the data. If the
primary member fails for any reason (e.g., hardware failure,
network partition) one of the secondary members is
automatically elected to primary, typically within several
seconds. As discussed below, sophisticated rules govern
which secondary replicas are evaluated for promotion to
the primary member.
Figure 8: Self-Healing MongoDB Replica Sets for High
Availability
The number of replicas in a MongoDB replica set is
configurable: a larger number of replicas provides
increased data durability and protection against database
downtime (e.g., in case of multiple machine failures, rack
failures, data center failures, or network partitions). Up to
50 members can be provisioned per replica set.
Enabling tunable consistency, applications can optionally
read from secondary replicas, where data is eventually
consistent by default. Reads from secondaries can be
useful in scenarios where it is acceptable for data to be
slightly out of date, such as some reporting and analytical
applications. Administrators can control which secondary
members service a query, based on a consistency window
defined in the driver. For data-center aware reads,
applications can also read from the closest copy of the
data as measured by ping distance to reduce the effects of
geographic latency. For more on reading from secondaries
see the entry on Read Preference.
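Read behavior of this kind is commonly configured through connection string options. The sketch below assembles two such strings in plain Python. The host names and replica set name are hypothetical, and using maxStalenessSeconds to express the consistency window is an assumption about which option the text refers to.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, read_preference, max_staleness_seconds=None):
    """Assemble a MongoDB connection string with read preference options."""
    options = {"replicaSet": replica_set, "readPreference": read_preference}
    if max_staleness_seconds is not None:
        # Upper bound on how far behind a secondary may be and still serve reads.
        options["maxStalenessSeconds"] = max_staleness_seconds
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical replica set members.
hosts = ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]

# Reporting workload: prefer secondaries, but only reasonably fresh ones.
reporting_uri = build_uri(hosts, "rs0", "secondaryPreferred", 120)

# Latency-sensitive workload: read from whichever member is closest.
nearest_uri = build_uri(hosts, "rs0", "nearest")

print(reporting_uri)
print(nearest_uri)
```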
Replica sets also provide operational flexibility by providing
a way to upgrade hardware and software without requiring
Election Priority
Sophisticated algorithms control the replica set election
process, ensuring only the most suitable secondary
member is promoted to primary, and reducing the risk of
Security
The frequency and severity of data breaches continue to
escalate. Industry analysts predict cybercrime will cost the
global economy $6 trillion annually by 2021. Organizations
face an onslaught of new threat classes and threat actors
with phishing, ransomware and intellectual property theft
growing more than 50% year on year, and key
infrastructure subject to increased disruption. With
databases storing an organization's most important
information assets, securing them is top of mind for
administrators.
MongoDB Enterprise Advanced features extensive
capabilities to defend, detect, and control access to data:
Authentication. Simplifying access control to the
database, MongoDB offers integration with external
security mechanisms including LDAP, Windows Active
Directory, Kerberos, and x.509 certificates.
Authorization. User-defined roles enable
administrators to configure granular permissions for a
user or an application based on the privileges they need
to do their job. These can be defined in MongoDB, or
centrally within an LDAP server. Additionally,
administrators can define views that expose only a
subset of data from an underlying collection, i.e. a view
that filters or masks specific fields, such as Personally
Identifiable Information (PII) from customer data or
health records.
Auditing. For regulatory compliance, security
administrators can use MongoDB's native audit log to
track any operation taken against the database,
whether DML, DCL or DDL.
Encryption. MongoDB data can be encrypted on the
network, on disk and in backups. With the Encrypted
storage engine, protection of data at-rest is an integral
feature within the database. By natively encrypting
database files on disk, administrators eliminate both the
management and performance overhead of external
encryption mechanisms. Only those staff who have the
appropriate database authorization credentials can
access the encrypted data, providing additional levels of
defence.
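The field-masking views mentioned under Authorization can be sketched as a view definition whose pipeline excludes sensitive fields. The view, collection, and field names are hypothetical. The helper function only mimics the effect of the exclusion locally so the result can be inspected without a server.

```python
# Command document defining a read-only view over a hypothetical collection.
create_view_command = {
    "create": "customers_public",
    "viewOn": "customers",
    "pipeline": [{"$project": {"ssn": 0, "date_of_birth": 0}}],  # hide PII fields
}


def apply_exclusion(document: dict, projection: dict) -> dict:
    """Local stand-in for an exclusion-style $project stage (illustration only)."""
    excluded = {field for field, flag in projection.items() if flag == 0}
    return {key: value for key, value in document.items() if key not in excluded}


customer = {
    "_id": 1,
    "name": "Ada",
    "ssn": "000-00-0000",
    "date_of_birth": "1990-01-01",
}
visible = apply_exclusion(customer, create_view_command["pipeline"][0]["$project"])
print(visible)
```

Granting a role access to the view rather than to the underlying collection is what keeps the masked fields out of reach.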
Monitoring
High-performance distributed systems benefit from
comprehensive monitoring. Ops Manager and Cloud
Manager have been developed to give administrators the
insights needed to ensure smooth operations and a great
experience for end users.
Featuring charts, custom dashboards, and automated
alerting, Ops Manager tracks 100+ key database and
systems health metrics including operations counters,
memory and CPU utilization, replication status, open
connections, queues and any node status.
The metrics are securely reported to Ops Manager where
they are processed, aggregated, alerted and visualized in a
browser, letting administrators easily determine the health
of MongoDB in real-time. Historic performance can be
reviewed in order to create operational baselines and to
support capacity planning. The Visual Query Profiler
provides a quick and convenient way for DBAs to analyze
the performance of specific queries or query families. It can
also provide recommendations on the addition of indexes
that would improve performance of common operations.
Figure 9: Ops Manager self-service portal: simple, intuitive and powerful. Deploy and upgrade entire clusters with a single
click.
Integration with existing monitoring tools is also
straightforward via the Ops Manager and Cloud Manager
RESTful API, and with packaged integrations to leading
Application Performance Management (APM) platforms
such as New Relic. This integration allows MongoDB
status to be consolidated and monitored alongside the rest
Figure 10: Ops Manager provides real time & historic
visibility into the MongoDB deployment.
Conclusion
MongoDB is the database for today's applications:
innovative, fast time-to-market, globally scalable, reliable,
We Can Help
We are the MongoDB experts. Over 2,000 organizations
rely on our commercial products, including startups and
more than 50% of the Fortune 100. We offer software and
services to make your life easier:
MongoDB Enterprise Advanced is the best way to run
MongoDB in your data center. It's a finely-tuned package
of advanced software, support, certifications, and other
services designed for the way you do business.
MongoDB Atlas is a database as a service for MongoDB,
letting you focus on apps instead of ops. With MongoDB
Atlas, you only pay for what you use with a convenient
hourly billing model. With the click of a button, you can
scale up and down when you need to, with no downtime,
full security, and high performance.
Resources
For more information, please visit mongodb.com.
Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB
(mongodb.com/cloud)
New York Palo Alto Washington, D.C. London Dublin Barcelona Sydney Tel Aviv
US 866-237-8815 INTL +1-650-440-4474
© 2016 MongoDB, Inc. All rights reserved.