MongoDB
REPORT
Big Data
MONGODB
By:
Benamara Ichrak
Messar Aya
Presented to:
Mrs. Midoun
Academic year 2024/2025
Table of Contents
Introduction
Consistency
Availability
Security
Running MongoDB
Conclusion
Resources
Introduction
"MongoDB is designed for how we build and run data-driven applications with modern development."
— Eliot Horowitz, MongoDB CTO and Co-Founder

• Developers are working with applications that create massive volumes of new, rapidly changing data types — structured, semi-structured, and polymorphic data.
• Long gone is the twelve-to-eighteen month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times every day.
• CIOs are rationalizing their technology portfolios to a strategic set of vendors they can leverage to more efficiently support their business.

Figure 1: MongoDB Nexus Architecture, blending the best of relational and NoSQL technologies

Consider what NoSQL technologies have done to address the requirements of modern applications:
• Flexible Data Model. NoSQL databases emerged to address the requirements for the data we see dominating modern applications. Whether document, graph, key-value, or wide-column, all of them offer a flexible data model, making it easy to store and combine data of any structure and to allow dynamic modification of the schema without downtime or performance impact.

MongoDB Multimodel Architecture
With MongoDB, organizations can address diverse application needs, computing platforms, and deployment designs with a single database technology:

• MongoDB's flexible document data model presents a superset of other database models. It allows data to be represented as simple key-value pairs and flat, table-like structures, through to rich documents and objects with deeply nested arrays and sub-documents.

• A flexible storage architecture lets multiple storage engines coexist within a single deployment, with data replicated between technologies using native replication. MongoDB ships with four supported storage engines, all of which can coexist within a single MongoDB replica set. This makes it easy to evaluate and migrate between them, and to optimize for specific application requirements – for example, combining the in-memory engine for ultra-low-latency operations with a disk-based engine for persistence.
Figure 2: Flexible storage architecture, optimising MongoDB for unique application demands
MongoDB stores data as documents in a binary representation called BSON (Binary JSON), which extends the JSON representation to include additional types such as int, long, date, floating point, and decimal128. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents. MongoDB BSON documents are closely aligned to the structure of objects in the programming language, which makes it simpler and faster for developers to model how data in the application will map to data stored in the database.
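As a brief illustration of this mapping, the following sketch stores one such document from Python with the pymongo driver; the connection string, database, collection, and field names are illustrative assumptions, not taken from this report.

```python
from pymongo import MongoClient

# Hypothetical local deployment and "catalog" database.
client = MongoClient("mongodb://localhost:27017")
products = client["catalog"]["products"]

# A single BSON document holds flat fields, an array, and a nested
# sub-document, mirroring an object in application code.
products.insert_one({
    "name": "coffee maker",
    "price": 59.90,                      # stored as a double
    "tags": ["kitchen", "appliance"],    # array field
    "manufacturer": {                    # sub-document
        "name": "Acme",
        "country": "DE",
    },
})

print(products.find_one({"name": "coffee maker"}))
```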
Documents that tend to share a similar structure are organized as collections. It may be helpful to think of a collection as being analogous to a table in a relational database: documents are similar to rows, and fields are similar to columns.

In a relational database, the data for a given record is usually spread across many tables. With the MongoDB document model, data is more localized, which significantly reduces the need to JOIN separate tables. The result is dramatically higher performance and scalability across commodity hardware, as a single read to the database can retrieve the entire document containing all related data. Unlike many NoSQL databases, users don't need to give up JOINs entirely. For additional flexibility, MongoDB provides the ability to perform equi and non-equi JOINs that combine data from multiple collections, typically when executing analytical queries against live, operational data.

Schema Governance

While MongoDB's flexible schema is a powerful feature for many users, there are situations where strict guarantees on the schema's data structure and content are required. Unlike NoSQL databases that push enforcement of these controls back into application code, MongoDB provides schema validation within the database via syntax derived from the proposed IETF JSON Schema standard.

Using schema validation, DevOps and DBA teams can define a prescribed document structure for each collection, which can reject any documents that do not conform to it.
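A minimal sketch of such a validator, created from Python with pymongo; the collection name and required fields are illustrative assumptions.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["demo"]

# Documents inserted into "customers" must contain a string name and an
# address sub-document; non-conforming documents are rejected by default.
db.create_collection(
    "customers",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "address"],
            "properties": {
                "name": {"bsonType": "string"},
                "address": {"bsonType": "object"},
            },
        }
    },
)
```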
Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected or still written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example, requiring a valid customer name and address, while other fields can be freeform, such as social media handle and cellphone number. And of course, validation can be turned off entirely, allowing complete schema flexibility, which is especially useful during the development phase of the application.
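For example, validation on an existing collection can be relaxed to log-only behaviour with the collMod command; a hedged sketch in pymongo, reusing the hypothetical "customers" collection from above.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["demo"]

# Log a warning instead of rejecting non-conforming documents,
# which is convenient during the development phase.
db.command(
    "collMod", "customers",
    validationAction="warn",     # "error" rejects non-conforming documents
    validationLevel="moderate",  # only validate inserts and updates to already-valid docs
)
```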
Schema Design
Although MongoDB provides schema flexibility,
schema design is still important. Developers and
DBAs should consider a number of topics,
including the types of queries the application will
need to perform, relationships between data, how
objects are managed in the application code, and
how documents will change over time. Schema
design is an extensive topic that is beyond the
scope of this document. For more information,
please see Data Modeling Considerations.
Idiomatic Drivers
MongoDB provides native drivers for all popular
programming languages and frameworks to
make development natural. Supported drivers
include Java, JavaScript, .NET, Python, Perl, PHP,
Scala and others, in addition to 30+ community-
developed drivers. MongoDB drivers are
designed to be idiomatic for the given
programming language.
One fundamental difference with relational databases is that the MongoDB query model is implemented as methods or functions within the API of a specific programming language, as opposed to a completely separate language like SQL. This, coupled with the affinity between MongoDB's JSON document model and the data structures used in object-oriented programming, makes integration with applications simple. For a complete list of drivers see the MongoDB Drivers page.
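For instance, with the Python driver a query is just a method call taking a plain dictionary; the collection and field names below are illustrative assumptions.

```python
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# The filter, sort, and limit are expressed through the driver's API
# rather than in a separate query language such as SQL.
recent_large_orders = (
    orders.find({"status": "shipped", "total": {"$gt": 100}})
          .sort("total", -1)
          .limit(10)
)
for order in recent_large_orders:
    print(order["_id"], order["total"])
```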
MongoDB queries can return a subset of specific fields within the document, or perform complex aggregations and transformations of many documents:

• Key-value queries return results based on any field in the document, often the primary key.

• Range queries return results based on values defined as inequalities (e.g., greater than, less than or equal to, between).

• Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.

Data Visualization with BI Tools

With the MongoDB Connector for BI, modern application data can be easily analyzed with industry-standard SQL-based BI and analytics platforms. Business analysts and data scientists can seamlessly analyze multi-structured, polymorphic data managed in MongoDB, alongside traditional data in their SQL databases, using the same BI tools deployed within millions of enterprises.
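A minimal sketch of the three query types listed above, using pymongo; the "places" collection, its fields, and the coordinates are illustrative assumptions (the geospatial query also needs a 2dsphere index).

```python
from pymongo import MongoClient, GEOSPHERE

places = MongoClient("mongodb://localhost:27017")["demo"]["places"]
places.create_index([("location", GEOSPHERE)])  # required for $near

# Key-value query: match on a single field.
print(places.find_one({"name": "Central Cafe"}))

# Range query: inequalities on a numeric field.
print(list(places.find({"rating": {"$gte": 3, "$lt": 5}})))

# Geospatial query: documents within 500 m of a GeoJSON point.
print(list(places.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [2.3522, 48.8566]},
            "$maxDistance": 500,
        }
    }
})))
```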
Indexing

Indexes are a crucial mechanism for optimizing system performance and scalability while providing flexible access to the data. Like most database management systems, while indexes will improve the performance of some operations by orders of magnitude, they incur associated overhead in write operations, disk usage, and memory consumption. By default, the WiredTiger storage engine compresses indexes in RAM, freeing up more of the working set for documents.

• Array Indexes. For fields that contain an array, each array value is stored as a separate index entry. For example, if documents describe products and there is an index on the component field, each component is indexed and queries on the component field can be optimized by this index. There is no special syntax required for creating array indexes – if the field contains an array, it will be indexed as an array index.

• TTL Indexes. In some cases data should expire out of the system automatically. Time to Live (TTL) indexes allow the user to specify a period of time after which the data is automatically deleted from the database.
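A short sketch of both index types in pymongo; the collection names, the "components" field, and the 3600-second expiry are illustrative assumptions.

```python
import datetime
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["demo"]

# Array (multikey) index: ordinary syntax is used, and because
# "components" holds an array, every element is indexed.
db["products"].create_index([("components", ASCENDING)])

# TTL index: documents are deleted automatically once "createdAt" is
# older than 3600 seconds.
db["sessions"].create_index([("createdAt", ASCENDING)], expireAfterSeconds=3600)
db["sessions"].insert_one({
    "user": "aya",
    "createdAt": datetime.datetime.now(datetime.timezone.utc),
})
```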
The MongoDB query optimizer selects the best index to use by periodically running alternate query plans and selecting the index with the best response time for each query type. The results of this empirical test are stored as a cached query plan and are updated periodically. Developers can review and optimize plans using the powerful explain method and index filters. Using MongoDB Compass, DBAs can visualize index coverage, enabling them to determine which specific fields are indexed, their type, size, and how often they are used. Compass also provides the ability to visualize explain plans, presenting key information on how a query performed – for example the number of documents returned, execution time, index usage, and more. Each stage of the execution pipeline is represented as a node in a tree, making it simple to view explain plans from queries distributed across multiple nodes.

MongoDB change streams let applications subscribe to data changes in a collection and react to them in a natural, event-driven style. Use cases enabled by MongoDB change streams include:

• Powering trading applications that need to be updated in real time as stock prices rise and fall.

• Refreshing scoreboards in multiplayer games.
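A hedged sketch of a change stream consumer in pymongo; it assumes a replica set deployment (change streams are not available on a standalone server) and a hypothetical "prices" collection.

```python
from pymongo import MongoClient

prices = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")["market"]["prices"]

# Block and react to every insert, update, and delete on the collection,
# e.g. to push fresh stock prices or scoreboard entries to clients.
with prices.watch() as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
```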
Covered Queries
Queries that return results containing only
indexed fields are called covered queries. These
results can be returned without reading from the
source documents. With the appropriate
indexes, workloads can be optimized to use
predominantly covered queries.
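A small sketch of a covered query in pymongo; the compound index, collection, and field names are illustrative assumptions.

```python
from pymongo import MongoClient, ASCENDING

users = MongoClient("mongodb://localhost:27017")["demo"]["users"]
users.create_index([("email", ASCENDING), ("status", ASCENDING)])

# Filter and projection use only indexed fields, and _id is excluded,
# so the result can be served from the index alone.
query = users.find(
    {"email": "a@example.com"},
    {"_id": 0, "email": 1, "status": 1},
)
plan = query.explain()
# Expect an IXSCAN with no FETCH stage in the winning plan.
print(plan["queryPlanner"]["winningPlan"])
```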
MongoDB Data
Management
Auto-Sharding
MongoDB provides horizontal scale-out for databases using a technique called sharding. MongoDB automatically balances the data in the sharded cluster as the data grows or the size of the cluster increases or decreases.
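A hedged sketch of enabling sharding from pymongo; it assumes the client is connected to a mongos router of an existing sharded cluster, and the database, collection, and shard key are illustrative.

```python
from pymongo import MongoClient

# Connect to a mongos router (not a standalone mongod).
client = MongoClient("mongodb://localhost:27017")

client.admin.command("enableSharding", "sales")
client.admin.command(
    "shardCollection", "sales.orders",
    key={"customer_id": "hashed"},  # hashed shard key spreads writes evenly
)
```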
Consistency
Write operations are atomic at the level of a single document: if an error occurs, it causes the operation to roll back and clients receive a consistent view of the document.

MongoDB also allows users to specify write availability in the system using an option called the write concern. The default write concern acknowledges writes from the application, allowing the client to catch network exceptions and duplicate key errors. Developers can use MongoDB's Write Concerns to configure
operations to commit to the application only
after specific policies have been fulfilled – for
example only after the operation has been
flushed to the journal on disk. This is the same
mode used by many traditional relational
databases to provide durability guarantees. As
a distributed system, MongoDB presents
additional flexibility in enabling users to achieve
their desired durability goals, such as writing to
at least two replicas in one data center and
one replica in a second data center. Each query
can specify the appropriate write concern,
ranging from unacknowledged to
acknowledgement that writes have been
committed to all replicas.
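A brief sketch of a stricter-than-default write concern in pymongo; the replica set URI and collection are illustrative assumptions.

```python
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://host1:27017,host2:27017/?replicaSet=rs0")

# Acknowledge the write only after a majority of replica set members have
# committed it and it has been flushed to the journal on disk.
orders = client["shop"].get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True)
)
orders.insert_one({"item": "book", "qty": 2})
```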
Availability
Replication
MongoDB maintains multiple copies of data
called replica sets using native replication. A
replica set is a fully
self-healing shard that helps prevent database
downtime and can be used to scale read
operations. Replica failover is fully automated,
eliminating the need for administrators to intervene manually.
A replica set consists of multiple replicas. At any given time one member acts as the primary replica set member and the other members act as secondary replica set members. MongoDB is strongly consistent by default: reads and writes are issued to a primary copy of the data. If the primary member fails for any reason (e.g., hardware failure, network partition), one of the secondary members is automatically elected to primary, typically within several seconds. As discussed below, sophisticated rules govern which secondary replicas are evaluated for promotion to the primary member.

Sophisticated algorithms control the replica set election process, ensuring only the most suitable secondary member is promoted to primary, and reducing the risk of unnecessary failovers (also known as "false positives"). In a typical deployment, a new primary replica set member is promoted within several seconds of the original primary failing. During this time, queries configured with the appropriate read preference can continue to be serviced by secondary replica set members, based on a consistency window defined in the driver. For data-center aware reads, applications can also read from the closest copy of the data.
The election algorithms evaluate a range of
parameters including analysis of election
identifiers and timestamps to identify those
replica set members that have applied the most
recent updates from the primary; heartbeat and
connectivity status; and user-defined priorities
assigned to replica set members. In an election,
the replica set elects an eligible member with
the highest priority value as primary. By default,
all members have a priority of 1 and have an
equal chance of becoming primary; however, it is
possible to set priority values that affect the
likelihood of a replica becoming primary.
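A minimal sketch of routing reads with a read preference in pymongo; the host names and replica set name are illustrative assumptions.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
)

# Prefer secondaries for this collection, falling back to the primary;
# ReadPreference.NEAREST would instead pick the lowest-latency member
# for data-center aware reads.
orders = client["shop"].get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
print(orders.count_documents({"status": "shipped"}))
```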
MongoDB replica sets allow for hybrid in-memory and on-disk database deployments. Data managed by the In-Memory engine can be processed and analyzed in real time, before being automatically replicated to MongoDB instances configured with the persistent disk-based WiredTiger storage engine. Lengthy ETL cycles, typical when moving data between different databases, are avoided, and users no longer have to trade away the scalable capacity or durability guarantees offered by disk storage.
End-to-End Compression

The WiredTiger and Encrypted storage engines support native compression, reducing physical storage footprint by as much as 80%. In addition to reduced storage space, compression enables much higher storage I/O scalability as fewer bits are read from disk.

Compression is also applied to the wire protocol from clients to the database, and to intra-cluster traffic. Network traffic can be compressed by up to 80%, bringing major performance gains to busy network environments and reducing connectivity costs, especially in public cloud environments, or when connecting remote assets such as IoT devices and gateways.
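Wire-protocol compression is negotiated by the driver; a hedged pymongo sketch (it assumes a MongoDB version with these compressors enabled server-side and, for zstd/snappy, the optional zstandard/python-snappy packages installed).

```python
from pymongo import MongoClient

# Offer compressors in order of preference; the server picks the first one
# it also supports for client-to-server traffic (intra-cluster compression
# is configured on the servers themselves).
client = MongoClient(
    "mongodb://localhost:27017",
    compressors="zstd,snappy,zlib",
)
client.admin.command("ping")
```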
Running MongoDB
• One-click scale up, out, or down on demand. MongoDB Atlas can provision additional storage capacity as needed without manual intervention.

• Automated patching and single-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features.

• Upgrade. In minutes, with no downtime.

• Scale. Add capacity, without taking the application offline.

• Visualize. Graphically display query performance to identify and fix slow running operations.
Monitoring

High-performance distributed systems benefit from comprehensive monitoring. Ops Manager provides this monitoring across the entire MongoDB deployment.
Figure 9: Ops Manager self-service portal: simple, intuitive and powerful. Deploy and upgrade entire
clusters with a single click.
Figure 10: Ops Manager provides real time & historic visibility into the MongoDB deployment.

Featuring charts, custom dashboards, and automated alerting, Ops Manager tracks 100+ key database and systems health metrics
including operations counters, memory and CPU
utilization, replication status, open connections,
queues and any node status.
Teams can also explore the database's schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics, directly within the Ops Manager UI.
Disaster Recovery: backups help meet Recovery Point Objective (RPO) targets.
Resources