MongoDB Top 7 NoSQL Considerations
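Table of Contents
Introduction
Key Consideration #1: Data Model
Graph Database
Document Database
Key Consideration #2: Query Model
Graph Database
Document Database
Key Consideration #3: Consistency and Transactional Model
Consistent Systems
Key Consideration #4: Interfaces
Idiomatic Drivers
APIs
Key Consideration #5: Mobile Data
Schema Flexibility
Edge-to-Cloud Synchronization
Key Consideration #6: Data Platform
Database as a Service
Key Consideration #7: Commercial Support, Community Strength, Freedom From Lock-In
Commercial Support
Community Strength
Conclusion
Resources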
Introduction
Data and software are at the heart of business today. But for many organizations,
realizing the full potential of the digital economy remains a significant challenge. Since
the inception of MongoDB, we’ve understood that the biggest challenge organizations
face is working with data:
• Demands for higher productivity and faster time to market are being held back by rigid relational data models that are mismatched to modern code and impose complex interdependencies among engineering teams.
• Organizations are unable to work with, and extract insights from, massive increases in the new and rapidly changing structured, semi-structured, and polymorphic data generated by today’s applications.
• Monolithic and fragile legacy databases inhibit the wholesale shift to distributed systems and cloud computing that deliver the resilience and scale businesses need, making it harder to satisfy new regulatory requirements for data privacy.
• Previously separate transactional, analytical, search, and mobile workloads are converging to create rich, data-driven applications and customer experiences. However, each workload traditionally has been powered by its own database, creating duplicated data silos stitched together with fragile ETL pipelines accessed by different APIs.
To address these limitations, several non-tabular alternatives to relational databases have emerged. Generally referred to as NoSQL databases, these systems discard the foundation that has made relational databases so useful for generations of applications: expressive query language, secondary indexes, and strong consistency. NoSQL databases share several key characteristics, including a more flexible data model, higher scalability, and superior performance.
Although the term NoSQL often is used as an umbrella category for all non-tabular databases, it’s too vague and poorly defined to be a useful descriptor of the underlying data model. Primarily, it neglects the trade-offs NoSQL databases make to achieve flexibility, scalability, and performance.
To help technology decision-makers navigate the complex and evolving domain of NoSQL and non-tabular databases, we’ve highlighted the key differences between them in this white paper. We also explore critical considerations based on seven dimensions that define these systems: data model; query model; consistency and transactional model; APIs; mobile data; data platform; and commercial support, community strength, and freedom from lock-in.
Key Consideration #1: Data Model
The primary way in which NoSQL databases differ from relational databases is the data model. Although there are dozens of NoSQL databases, they generally fall into three categories: key-value or wide-column, graph, and document.
Graph Database
Graph databases use graph structures with nodes, edges, and properties to represent data relationships. In essence, data is modeled as a network of relationships among specific elements. Their main appeal is in their ability to model and navigate relationships among entities in an application. This makes graph databases incredibly efficient for finding patterns, making predictions, and creating solutions. Flexible schemas allow developers to easily make changes to graph databases as requirements change. This is especially valuable for agile teams building modern applications.
Applications
Graph databases are useful in cases where traversing relationships is core to the application, such as navigating social network connections, network topologies, or supply chains. Other use cases include detecting fraud, building recommendation engines, managing IT networks, and computing graph algorithms between data.
Examples
Neo4j, Amazon Neptune
Document Database
Whereas relational databases store data in rows and columns, document databases store data in documents by using JavaScript Object Notation (JSON), a text-based data interchange format popular among developers. Documents provide an intuitive and natural way to model data that closely aligns with object-oriented programming: each document is effectively an object that matches the objects developers work with in code. Documents contain one or more fields, and each field contains a typed value such as a string, date, binary, decimal value, or array. Rather than spreading out a record across multiple columns and tables connected with foreign keys, each record is stored along with its associated (i.e., related) data in a single, hierarchical document. This model accelerates developer productivity, simplifies data access, and, in many cases, eliminates the need for expensive join operations and complex abstraction layers such as object-relational mapping (ORM).
The schema of a relational database is defined by tables; in a document database, the notion of a schema is dynamic: each document can contain different fields. This flexibility can be particularly helpful for modeling data where structures can change between each record (i.e., polymorphic data). It also makes it easier to evolve an application during its life cycle, such as by adding new fields.
False assumptions about NoSQL databases are prevalent. One of the most common assumptions is that NoSQL databases are schemaless and that data modeling is not necessary. This impression arises because NoSQL databases are well suited to storing unstructured data and because they dispense with the tabular structure of relational databases. This is why NoSQL databases are often referred to as non-relational databases. Ideally, schema design and data modeling in a document database are based on the access patterns for the data you’re working with. In essence, data that is accessed together should be stored together. While a document database does allow you to store data without defining what it is, the shape of that data matters if you plan to do more than simply retrieve whole documents by keys. In some cases, schema design is irrelevant (for example, if you’re simply storing pre-existing documents) and a simple key-value store would suffice. On the other hand, if you expect to filter, modify, and retrieve data efficiently, schema design and data modeling are essential.
Developers are often in the best position to know the data access patterns for their applications. Document schemas can increase performance for a given set of hardware by reducing computation, I/O operations, and contention between users. With today’s pay-as-you-go cloud pricing, that’s an important consideration. What really differentiates a document database from relational databases is the ability to co-locate related data in the atomic unit of storage so multiple values for an attribute can exist within a single record rather than being broken up into rows and stored independently. A document database with a properly designed schema enables you to filter and retrieve data with minimal computational overhead and in a single I/O operation. This can make finding and retrieving data far faster and less expensive.
Applications
Document databases are useful for a wide variety of applications due to the flexibility of the data model, the ability to query on any field, and the natural mapping of the document model to objects in modern programming languages.
Examples
MongoDB, Azure Cosmos DB, Apache CouchDB
Takeaways
• Documents are a superset of other data models, so they support a wider variety of data types and use cases.
• In key-value and wide-column databases, values are opaque to the system; only the primary key can be queried.
• The wide-column model provides more granular access to data than the key-value model but is less flexible than the document model.
• Key-value and wide-column databases are desired for their simplicity, performance, and scalability.
• Graph databases use nodes and edges to represent relationships such as parent-child, actions, and ownership.
• Graph databases are most useful for navigating social connections, network topologies, and supply chains.
• The document data model has the broadest applicability.
• The document data model is the most natural because it maps directly to objects in modern object-oriented languages.
• Despite common assumptions, data modeling and schemas are critical elements of document databases if you expect to filter, modify, and retrieve data efficiently.
Key Consideration #2: Query Model
Graph Database
Graph databases provide rich query models in which simple and complex relationships can be interrogated to make direct and indirect inferences about the data in the system. Although relationship analysis tends to be efficient, other types of analysis are less optimal. As a result, graph databases are rarely used for general-purpose, operational applications. Rather, they’re often coupled with document or relational databases to surface graph-specific data structures and queries. For use cases involving multiple query patterns, there’s an option to employ a multimodel database where different data models and query types are available within a single platform. For example, MongoDB offers the $graphLookup aggregation stage for graph processing natively within the database. $graphLookup enables efficient traversals across graphs, trees, and hierarchical data to uncover patterns and surface previously unidentified connections.
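
To illustrate the kind of traversal $graphLookup describes, here is a hedged Python sketch. The employees collection, its reportsTo field, and the reportingHierarchy output name are assumptions made for the example. The pipeline dictionary shows the shape of the stage, and reporting_hierarchy is a pure-Python stand-in that walks the same links in memory; it is not how the database implements the stage.

```python
# Hypothetical "employees" collection: each document names its manager.
employees = [
    {"_id": 1, "name": "Dev", "reportsTo": None},
    {"_id": 2, "name": "Eliot", "reportsTo": "Dev"},
    {"_id": 3, "name": "Ron", "reportsTo": "Eliot"},
    {"_id": 4, "name": "Andrew", "reportsTo": "Eliot"},
]

# Aggregation pipeline with a $graphLookup stage: starting from each
# employee's reportsTo value, follow reportsTo -> name links recursively.
pipeline = [
    {
        "$graphLookup": {
            "from": "employees",
            "startWith": "$reportsTo",
            "connectFromField": "reportsTo",
            "connectToField": "name",
            "as": "reportingHierarchy",
        }
    }
]


def reporting_hierarchy(docs, start):
    """In-memory illustration: collect the chain of managers above `start`."""
    by_name = {d["name"]: d for d in docs}
    found, seen = [], set()
    frontier = [start["reportsTo"]]
    while frontier:
        name = frontier.pop()
        if name is None or name in seen or name not in by_name:
            continue  # stop at the root, on cycles, or on dangling links
        seen.add(name)
        found.append(name)
        frontier.append(by_name[name]["reportsTo"])
    return found


chain = reporting_hierarchy(employees, employees[2])  # managers above "Ron"
```

With a driver such as PyMongo and a running deployment, the pipeline would typically be passed to the collection's aggregate method; that step is omitted here so the sketch runs without a database.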
Document Database
Document databases provide the ability to query and update any field within a document, although capabilities in this domain vary. Some databases, such as MongoDB, provide a rich set of indexing options to optimize a wide variety of queries and to automate data management, including text, geospatial, compound, sparse, wildcard, time to live (TTL), and unique indexes. Some document databases support real-time analytics against data in place without having to replicate to a dedicated analytics application or search engine. MongoDB, for instance, provides an aggregation framework for developers to create processing pipelines for data analytics and transformations via faceted search, joins, unions, geospatial processing, materialized views, and graph traversals. To update data, MongoDB provides expressive update methods that enable developers to perform complex manipulations against matching elements of a document, including elements embedded in nested arrays, all in a single transactional update operation.
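
The snippet below sketches what an update against matching elements of an embedded array can look like. The order document, its items array, and the sku and price fields are hypothetical. The update uses MongoDB's filtered positional operator, and apply_update assumes a PyMongo-style collection object whose update_one method accepts an array_filters argument.

```python
# Hypothetical order document with an embedded array of line items, e.g.
# {"_id": 1001, "items": [{"sku": "abc", "price": 10.0}, {"sku": "xyz", ...}]}
order_filter = {"_id": 1001}

# Set the price of every embedded item whose sku matches, in one update.
# "$[item]" is a filtered positional operator bound by array_filters below.
update = {"$set": {"items.$[item].price": 12.5}}
array_filters = [{"item.sku": "abc"}]


def apply_update(collection):
    """Send the update through a PyMongo-style collection object."""
    return collection.update_one(
        order_filter, update, array_filters=array_filters
    )
```

Because the whole change is expressed as one update against one document, the application does not need to read the array, modify it in code, and write it back.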
Takeaways
• The biggest difference between non-tabular databases lies in the ability to query data efficiently.
• Key-value databases and wide-column stores provide a single means of accessing data: primary keys. Although fast, they offer limited query functionality and may impose additional development costs and application-level requirements to support more complex query patterns.
• Document databases provide the richest query functionality, which allows them to address a wide variety of operational and real-time analytics applications.
Key Consideration #3: Consistency and Transactional Model
Most NoSQL systems maintain multiple copies of data for availability and scalability purposes. These databases can impose different guarantees on the consistency of data across copies. NoSQL databases tend to be categorized as either strongly consistent or eventually consistent. With a strongly consistent system, writes by the application are immediately visible in subsequent queries. With an eventually consistent system, the visibility of writes depends on which data replica is serving the query. For example, when reflecting inventory levels for products in a product catalog, with a consistent system each query will see the current inventory as it’s updated by the application. With an eventually consistent system, the inventory levels may not be accurate for a query at a given time but will eventually become accurate as data is replicated across all nodes in the database cluster. For this reason, application code can be different for eventually consistent systems: rather than updating the inventory by taking the current inventory and subtracting one, for example, developers are encouraged to issue idempotent writes that explicitly set the inventory level. Developers also need to build additional control logic in their apps to handle potentially stale or deleted data.
Most NoSQL systems offer atomicity guarantees at the level of an individual record. Atomicity is one of four transaction properties that constitute ACID transactions. The four properties in an ACID transaction are:
• Atomicity
• Consistency
• Isolation
• Durability
The point of ACID transactions is to guarantee data validity despite errors, power failures, and other mishaps. Atomicity is an assurance that database operations are indivisible or irreducible such that either all operations complete or none complete. Because these databases can combine related data that otherwise would be modeled across separate parent-child tables in a tabular schema, atomic single-record operations provide transaction semantics that meet the data integrity needs of the majority of applications.
It’s important to note that some developers and database administrators have been conditioned by 40 years of relational data modeling to assume multirecord transactions are a requirement for any database, regardless of the underlying data model. Some are concerned that although multidocument transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple records is required.
MongoDB added support for multidocument ACID transactions in 2018 so developers could address a wider range of use cases with the familiarity of how transactions are handled in relational databases. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution. MongoDB is unusual in offering the transactional guarantees of traditional relational databases with the flexibility and scale that come from NoSQL databases.
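
The inventory example above, in which a write that sets a level is safer than one that subtracts one, can be sketched in a few lines of Python. The inventory dictionaries and the widget SKU are hypothetical, and the "replay" simply simulates a write that is applied twice, for instance after a retry.

```python
def decrement(inventory, sku):
    """Relative update: the result depends on how many times it is applied."""
    inventory[sku] -= 1


def set_level(inventory, sku, level):
    """Absolute update: applying it again leaves the same result."""
    inventory[sku] = level


# One unit is sold, but the write is delivered twice.
relative = {"widget": 10}
decrement(relative, "widget")
decrement(relative, "widget")  # replay of the same logical write

absolute = {"widget": 10}
set_level(absolute, "widget", 9)
set_level(absolute, "widget", 9)  # replay is harmless

print(relative["widget"], absolute["widget"])
```

The relative update drifts when replayed, while the absolute update converges on the intended value, which is why idempotent writes are the usual recommendation for eventually consistent systems.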
Consistent Systems
Applications can have different requirements for data consistency. For many applications, it’s imperative for data to be consistent at all times. Because development teams have worked under a model of consistency with relational databases for decades, this approach is more natural and familiar. In other cases, eventual consistency is an acceptable trade-off for the flexibility it allows in the system’s availability.
Document and graph databases can be consistent or eventually consistent. MongoDB provides tunable consistency. By default, data is consistent: all writes and reads access the primary copy of the data. As an option, read queries can be issued against secondary copies where data may be eventually consistent if the write operation has not yet been synchronized with the secondary copy; the consistency choice is made at the query level.
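
The following toy model, which is not MongoDB code, illustrates why a read from a secondary copy can be stale until replication catches up, and how choosing the source per read changes what the application sees. The ToyReplicaSet class and its method names are invented for this sketch.

```python
class ToyReplicaSet:
    """Toy model only: one primary copy and one lagging secondary copy."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []  # writes not yet copied to the secondary

    def write(self, key, value):
        """All writes go to the primary and are queued for replication."""
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Copy queued writes to the secondary, in order."""
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()

    def read(self, key, prefer="primary"):
        """Choose the copy to read from on a per-query basis."""
        source = self.primary if prefer == "primary" else self.secondary
        return source.get(key)


rs = ToyReplicaSet()
rs.write("widget", 10)
rs.replicate()
rs.write("widget", 9)  # not yet replicated

stale = rs.read("widget", prefer="secondary")  # still sees the old value
fresh = rs.read("widget")  # the primary sees the latest write
rs.replicate()
caught_up = rs.read("widget", prefer="secondary")
```

In the PyMongo driver, the comparable per-query choice is expressed as a read preference, for example by calling a collection's with_options method with ReadPreference.SECONDARY_PREFERRED.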
Takeaways
• Different consistency models pose different trade-offs for applications in the areas of consistency, availability, and performance.
• MongoDB provides tunable consistency, defined at the query level.
• Eventually consistent systems provide some advantages for inserts at the cost of making reads, updates, and deletes more complex, while incurring performance overhead via read repairs and compactions.
• Most NoSQL databases provide single-record atomicity. This is sufficient for many applications but not all.
• MongoDB provides multidocument ACID guarantees, making it easier to address a range of use cases with a single data platform.
Key Consideration #4: Interfaces
There is no single standard for interfacing with NoSQL databases. Each presents different designs and capabilities for application developers. The maturity of APIs can affect the time and cost required for developing and maintaining the application and database.
Idiomatic Drivers
Programming languages provide different paradigms for working with data. Idiomatic drivers are created by development teams that are experts in a given language and know how programmers prefer to work within a language. This approach can also provide efficiencies for accessing and processing data by leveraging specific features in a programming language. Because idiomatic drivers are easier for developers to learn and use, they reduce the onboarding time required for teams to begin working with a database. For example, idiomatic drivers provide direct interfaces to set and get documents or fields within documents. With other types of interfaces, it may be necessary to retrieve and parse entire documents and navigate to specific values in order to set or get a field.
MongoDB supports idiomatic drivers in more than a dozen languages including Java, .NET, Ruby, Node.js, Python, PHP, C, C++, C#, JavaScript, Go, Rust, and Scala. Dozens of other drivers are supported by the developer community.
working with a database. For example, idiomatic
APIs
Some systems provide representational state transfer (RESTful) interfaces. This approach has the appeal of simplicity and familiarity, although it also incurs the inherent latencies associated with HTTP. For our multi-cloud developer data platform, MongoDB Atlas, the MongoDB Atlas Data API is a fully managed REST-like API that enables developers to access their MongoDB Atlas data and perform CRUD operations and aggregations. With the Atlas Data API, you can read and write data in Atlas with standard HTTPS requests.
SQL-like APIs help reduce the learning curve for non-developers already skilled in SQL, such as business analysts and data scientists. The MongoDB Atlas SQL Interface enables users to leverage existing SQL knowledge and familiar tools to query and analyze Atlas data live. The Atlas SQL Interface uses mongosql, a SQL-92-compatible dialect that’s designed for the document model. It also leverages Atlas Data Federation functionality for running queries across Atlas clusters and cloud storage, like S3.
Command Line Interface (CLI)
CLIs are text-based interfaces for interacting with a database, application, file, or piece of hardware. CLIs are often the interaction method of choice for advanced developers who prefer control and speed over a more visual interface like a graphical user interface (GUI). The MongoDB Atlas CLI is the fastest way to create and manage an Atlas database, automate ongoing operations, and scale a deployment for the full application development lifecycle. The Atlas CLI gives users a streamlined experience for both onboarding and ongoing management of an Atlas database in the cloud.
Takeaways
• The maturity and functionality of APIs vary significantly across non-relational products.
• MongoDB’s idiomatic drivers minimize onboarding time for new developers and simplify application development.
• Carefully evaluate the SQL-like APIs offered by non-relational databases to ensure they can meet the needs of applications and developers.
Key Consideration #5: Mobile Data
The performance of mobile applications is just as important as the performance of server-based architectures. But mobile apps introduce the added challenge of not always being connected to the network. Application developers need a solution for keeping all of their customers’ apps in sync with the backend database, no matter where they are in the world and what kind of network connection they have. The solution also needs to scale easily and quickly as more users download an app, and to support the cutting edge of mobile development technologies as they evolve. NoSQL databases, which are engineered to scale out on demand by leveraging less expensive commodity hardware or cloud infrastructure, are ideally suited to the extra demands placed on the backend by mobile applications that sync to it.
Schema Flexibility
Because new features are always being added in mobile apps, making schema changes in relational databases for new situational relationships becomes increasingly time-consuming. Mobile applications also present more use cases than relational databases are designed to handle, including device type, operating system, firmware, and location. For NoSQL databases, adding features or updating objects to account for new use cases is simply a matter of entering new lines of code. NoSQL databases also are ideal for handling frequent application updates that are a continual part of the app development life cycle. There’s no need to overhaul the logic just to fix a bug. And making changes in one part of the database is not likely to affect other parts of the application.
Edge-to-Cloud Synchronization
MongoDB Atlas Device Sync is a fully managed service that syncs data between mobile devices and MongoDB Atlas. This solution addresses the unique technical challenges of mobile and offline-first development, allowing organizations to rapidly build responsive applications for their customers and remote workforces that drive user adoption, improve productivity, and deliver ROI. Device Sync enables teams to take advantage of robust bidirectional data sync between devices and Atlas without having to write complex conflict resolution and networking code.
Takeaways
• The same flexible data model, higher scalability, and superior performance found in NoSQL databases for server environments make NoSQL an ideal solution for mobile applications and data.
• NoSQL databases are engineered to scale out on demand by leveraging less expensive commodity hardware or cloud infrastructure.
• The lack of rigid relational schemas makes NoSQL development more agile and better equipped to add new features, update apps, and fix bugs without having to overhaul the entire database.
• MongoDB Atlas Device Sync is a fully managed service that syncs data between mobile devices and MongoDB Atlas.
Key Consideration #6: Data Platform
The Superset of All Data Models
The way to eliminate the Data and Innovation Recurring Tax (DIRT) imposed by rigid tabular structures is by using a developer data platform that simplifies and accelerates how developers work with data. MongoDB has built a developer data platform that reduces the need for niche databases and the associated costs of deploying and maintaining a complicated sprawl of data technologies. It makes it faster and easier for teams to work with data to support the demands of modern applications while helping to massively simplify an organization’s data infrastructure.
In many cases, the relationships between data are more natural to model with documents and subdocuments than in separate tables. Documents map directly to objects in modern object-oriented languages, so the developer experience more closely resembles how developers already think and code. This makes the document model an ideal platform to build upon. The document model is a superset of other data models because it can be used to support graph workloads, key-value, time-series, and geospatial data. So there’s no need for additional niche NoSQL databases.
Database as a Service
A modern developer data platform enabled through a database-as-a-service capability gives developers the freedom and flexibility to work seamlessly with data wherever their applications and users need it, and build integrated search features on top of cloud data across all the major public cloud platforms. Rather than rigid tabular schemas and complex relationships, the Atlas developer data platform provides a fully elastic data infrastructure that can be updated as needed via idiomatic drivers that developers are already familiar with. This allows developers more time to focus on their applications rather than managing databases themselves.
Takeaways
• Modern multi-cloud environments require flexibility, speed, and elasticity not found in relational databases with tabular schemas.
• Rigid tabular structures lead to the Data and Innovation Recurring Tax (DIRT).
• Distributed databases deployed from the cloud to the edge of the network give organizations the ability to create a resilient, high-availability data platform that puts data closer to the applications that need it.
• Database-as-a-service capabilities allow developers to spend less time managing databases and more time building applications and rich query experiences.
Key Consideration #7: Commercial Support, Community Strength, Freedom From Lock-In
A database is a major investment. Once an application has been built on a given database, it is costly, challenging, and risky to migrate it to a different one. Companies usually invest in a small number of core technologies so they can develop expertise, integrations, and best practices that can be amortized across many projects. NoSQL databases are still a relatively emergent technology. Although there are many new options in the market, only a subset of technologies and vendors will stand the test of time.
Commercial Support
Consider the health of the vendor or product when evaluating a database. It is important not only that the product continues to exist, but also that it evolves and adds new features as the needs of users dictate. Having a strong, experienced support organization capable of providing services globally is another relevant consideration.
Community Strength
There are significant advantages to having a strong community around a technology, particularly databases. A database with a strong community of users makes it easier to find and hire developers who are familiar with the product. It makes it easier to find best practices, documentation, and code samples, all of which reduce risk in new projects. It also helps organizations retain key technical talent. A strong community encourages other technology vendors to develop integrations and participate in the ecosystem.
Freedom From Lock-In
Many organizations have been burned by database lock-in and abusive commercial practices. The use of open-source software and commodity hardware has provided an escape route for many, but organizations also have concerns that as they move to the cloud, they may end up trading one form of lock-in for another.
It’s important to evaluate the licensing and availability of any major new software investment. Also critical is having the flexibility to run the database wherever it’s needed, whether it’s from a developer’s laptop in early-stage adoption, on your own infrastructure as you go into production, or in the cloud under a database-as-a-service consumption model.
The MongoDB Atlas database enables you to deploy data across AWS, Google Cloud, and Microsoft Azure. In addition, you can create a multi-cloud cluster to enable applications that make use of two or more clouds at the same time. MongoDB Enterprise Advanced gives developers and DevOps teams the option to download and run the database on their own infrastructure. Wherever you choose to run MongoDB, it uses the same codebase, APIs, and management tooling.
Takeaways
• Community size and commercial strength are important for evaluating NoSQL databases.
• MongoDB is one of the very few NoSQL database companies to be publicly traded; it has the largest and most active community; its support teams, spread across the world, provide 24/7 coverage; it boasts user groups in most major cities; and it provides extensive documentation.
• MongoDB is available to run on your own infrastructure or as a fully managed cloud service on all of the leading public cloud platforms.
Conclusion
As the technology landscape evolves, organizations increasingly find the need to evaluate new
databases to support changing application and business requirements. Considering the media hype
around NoSQL databases and the commensurate lack of clarity in the market, it’s important to make
clear distinctions between the available solutions when possible. As discussed in this white paper, there
are several key criteria to consider when evaluating these technologies. Many organizations find that
document databases such as MongoDB are best suited to meet these criteria.
Resources
Presentations
Try MongoDB