DF100 - 01 - Introduction To MongoDB and Atlas
Introduction to
MongoDB & Atlas
MongoDB Developer Fundamentals
Release: 20240216
Topics we cover
Why a new Database?
MongoDB Terminology
MongoDB Benefits
When to use MongoDB
The MongoDB Atlas Platform
Why a new Database?
The world is changing
MongoDB was established in 2007 (as 10gen) by three entrepreneurs who had previously
developed some of the internet's first advertising and shopping networks (DoubleClick).
They were frustrated by the limitations of the databases available at the time and could not
find a suitable one, so they built their own.
The database proved popular, so they initially focussed on being a database company, returning
to the cloud and then BaaS later.
It is a highly available and horizontally scalable database designed to be the primary database for
modern business applications, focusing on the developer and DevOps audiences.
Fifty years of RDBMS has not been forgotten. The creators of MongoDB wanted to retain as much
"Database" behavior as possible but as a distributed system.
Question - What two fundamental RDBMS features are hard to do efficiently in a distributed
system?
Relational Data Model
EmpID | Name           | Dept | Title | Manager | Payband
9950  | Dunham, Justin | 500  | 1500  | 6531    | C
1 9950 100
2 9950 200
Answer:
● Joins - These are slow in a distributed system because of network hops between servers
● Transactions - These are slow for similar reasons, and reliability is also an issue
The typical RDBMS application schema depicted above shows the problem: "one record" may consist
of 9 rows spread across 6 tables, which is costly to assemble in a distributed system.
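To make the cost of joins concrete, here is a minimal in-memory sketch in plain JavaScript. The table layouts, the field names, and the findRows helper are assumptions made for illustration; only the values (9950, 500, 1500, Marketing, Product Manager) come from the slides. Each call to findRows stands in for one round trip that, in a distributed system, could be a network hop.

```javascript
// Hypothetical normalized tables, held in memory as arrays of rows.
const employees = [{ empId: 9950, name: "Dunham, Justin", deptId: 500, titleId: 1500 }];
const departments = [{ deptId: 500, name: "Marketing" }];
const titles = [{ titleId: 1500, name: "Product Manager" }];
const benefits = [
  { benefitId: 1, empId: 9950, planId: 100 },
  { benefitId: 2, empId: 9950, planId: 200 },
];

// Count every table access; each one could be a separate server call.
let lookups = 0;
function findRows(table, predicate) {
  lookups += 1;
  return table.filter(predicate);
}

// Assemble "one record" for the application from four tables.
function loadEmployee(empId) {
  const [emp] = findRows(employees, (r) => r.empId === empId);
  const [dept] = findRows(departments, (r) => r.deptId === emp.deptId);
  const [title] = findRows(titles, (r) => r.titleId === emp.titleId);
  const plans = findRows(benefits, (r) => r.empId === empId).map((r) => r.planId);
  return { empId, name: emp.name, dept: dept.name, title: title.name, plans };
}

const record = loadEmployee(9950);
console.log(record, `lookups: ${lookups}`);
```

Even this small example needs four table accesses to build a single employee record.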
Relational - Denormalization
EmpID | Name           | Dept      | Title           | Manager | Payband
9950  | Dunham, Justin | Marketing | Product Manager | 6531    | C
We can reduce the number of tables hence reduce the number of calls between servers by
denormalizing data.
Denormalizing might seem like a backward step for those from an RDBMS background, but in
reality, it is often utilized for performance reasons. With MongoDB, this is not considered bad
practice.
The focus should be on the business requirements and optimizing a schema for this.
In this example using an RDBMS, we cannot denormalize this 1:N relationship any further to create
a single record without resorting to messy workarounds such as delimited lists. A single record
would remove the need for joins and make the data easier to use and digest.
Document - Arrays
The example above depicts how we would ideally want to store this data in RDBMS, but it is not
possible.
With the document model, we can not only store scalar types such as integer, float, text, etc., but
we can also store embedded data - which equates to how we would ideally want to store the above
(a table inside a table)
We have two of these special container types: Array and Nested Document. An Array is a field
with multiple values; a Nested Document is a field that has subfields of its own.
This explicit parent-child storage mechanism improves retrieval and query performance and
simplifies our application queries and updates.
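As an illustration of the two container types, the employee from the relational example could be written as a single document along these lines. This is a sketch only: the field names and the overall shape are assumptions, while the values are taken from the tables on the slides.

```javascript
// One employee as a single document (field names are illustrative).
const employee = {
  _id: 9950,
  name: "Dunham, Justin",
  dept: "Marketing",
  title: "Product Manager",
  payband: "C",
  manager: { empId: 6531 }, // Nested Document: a field with subfields
  benefits: [               // Array: a field with multiple values
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" },
  ],
};

// The child rows travel with the parent, so no join is needed to read them.
const plans = employee.benefits.map((b) => b.plan);
console.log(JSON.stringify(employee, null, 2));
console.log(plans);
```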
An example document with the nested parent-child relationships described previously is shown
above.
Note how this visualization is easier to digest than looking at such relationships in tabular form.
The above is JSON, which is human readable. We don't store JSON in MongoDB because it is not
performant; we use BSON. This is similar to viewing data from Oracle as CSV: it is a convenient
way to display the data, but it is not how the data is stored.
The field names above will also be stored as data, which creates some overhead. But compression
in MongoDB greatly reduces the space required to store field names repeated across documents.
BSON is similar to how we used to save objects to disk by serializing them to a stream of bytes in
1990s/2000s Windows applications.
Terminology
RDBMS
MongoDB
Namespace
We explained that a Document is like a richer, more structured version of a Row with Repeating
fields and Depth
Apart from that, the concepts are very similar to any other database. Documents are grouped
together in collections (tables), and the documents in a collection should have a similar, if not
identical, structure.
Collections are grouped together in databases, where they relate to the same application or
business "thing", like a "Schema" in Oracle or a "Database" in MySQL and MSSQL.
Inside a document you have fields. We haven't mentioned this yet, but you can add indexes on
fields to speed up querying.
We will mix and match these through this training to ease in the new terminology.
Note: A field may also be referred to as a key.
MongoDB - Agility
EmpID | Name           | Dept      | Title           | Manager | Payband | BenType | Plan
9950  | Dunham, Justin | Marketing | Product Manager | 6531    | C       | Health  | PPO Plus
      |                |           |                 |         |         | Dental  | Standard

EmpID | Name | Title | Payband | Bonus
The MongoDB document approach means that not all documents in a collection (records in a table,
in SQL terms) need to have exactly the same fields.
Logically they should be variants of the same type of record, and consistency should be enforced
to a degree. However, this flexibility is advantageous.
No need to store missing fields, and there is no cost for adding new ones. The database server
doesn’t have a list of expected fields by default. This is efficient if you need to store sparse data
where in an RDBMS you may have a lot of null values.
Every document in MongoDB already logically contains every possible legal field name; each field
simply has a value of null by default unless you set it to something else.
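The "missing means null" idea can be sketched in plain JavaScript. The two sample documents and the valueOf helper below are made up for illustration; the helper mimics how a MongoDB filter such as { bonus: null } also matches documents where the field is absent.

```javascript
// Two documents in the same collection with different sets of fields.
const staff = [
  { empId: 9950, name: "Dunham, Justin", title: "Product Manager", payband: "C" },
  { empId: 9951, name: "Example, Pat", title: "Sales Rep", payband: "B", bonus: 5000 },
];

// Treat a field that was never stored as null (hypothetical helper).
const valueOf = (doc, field) => (field in doc ? doc[field] : null);

// Equivalent in spirit to filtering on { bonus: null }.
const withoutBonus = staff.filter((doc) => valueOf(doc, "bonus") === null);
console.log(withoutBonus.map((doc) => doc.empId));
```

Note that the first document never stores a bonus field at all, so it costs no space.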
MongoDB - Usability
When designing MongoDB, careful consideration was given to how to interact with the data.
Traditional SQL was considered inadequate:
● Using SQL as the internal transfer mechanism between programs is inefficient, as it requires
parsing, which is expensive and error-prone and opens the door to injection attacks.
● SQL was not designed as an interprocess communication mechanism but as a UI (that's why it's
basically English). It's a terrible fit, but ODBC/JDBC use it.
● SQL needs to be standard - and the SQL standard doesn't work for documents. Better no
SQL than SQL-ish.
Idiomatic Driver Libraries instead:
● MongoDB uses BSON to communicate and utilizes an object-based API in the form of
drivers.
● Drivers that work with native data types (Maps, Dictionaries, Objects) are available for
many popular programming languages as well as idiomatic ones for other less common
languages
● For testing/debugging, we have an interactive JSON REPL (Read Eval Print Loop)
environment. This allows interaction with the database in real-time, rather than having to
compile code every time you run a query
Drivers for: Java, .NET, C, C++, JavaScript, Python, Swift, Haskell, Go, Ruby, NodeJS, Scala, PHP
MongoDB - Usability
Compass as a GUI to access the database
> db.collection.findOne()
{
  "_id": ObjectId("5106c1c2fc629bfe52792e86"),
  "product": "MongoDB",
  …
}
MongoDB - Utility
MongoDB provides agility, scalability, and performance without sacrificing the functionality of
a relational database (like full index support and rich queries).
It utilizes a planner to select what indexes might optimize query performance. Indexes are used to
limit the data that needs to be scanned.
This differs from some NoSQL DBs where you can only query on predefined patterns.
MongoDB behaves very much like a typical RDBMS database in that it also edits smartly - i.e., the
server makes changes directly instead of fetching the record back to the client, modifying it, and
pushing it back (like many NoSQL databases do, which causes race conditions and locking
problems)
SQL databases have always been able to summarise and aggregate on the server side rather than
simply returning raw data. Early on, like other vendors, MongoDB adopted Map-Reduce to achieve
this, but it was complicated and slow, so it was replaced by the aggregation pipeline.
The aggregation pipeline has extended MongoDB query capability to allow complex in-database
computation and aggregation to provide summarised data. It is declarative, which makes it more
like SQL, rather than requiring you to write functions as Map-Reduce does.
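To show what "declarative" means here, a pipeline is just an array of stage documents. The sketch below defines one using the real $match and $group stages, then computes the same summary by hand over a few made-up documents so it can run without a database. The collection contents and field names are assumptions for illustration; in mongosh the pipeline would be passed to db.employees.aggregate(pipeline).

```javascript
// The pipeline is data, not code: count Marketing staff per payband.
const pipeline = [
  { $match: { dept: "Marketing" } },
  { $group: { _id: "$payband", headcount: { $sum: 1 } } },
];

// Made-up sample documents standing in for a collection.
const docs = [
  { name: "A", dept: "Marketing", payband: "C" },
  { name: "B", dept: "Marketing", payband: "C" },
  { name: "C", dept: "Marketing", payband: "B" },
  { name: "D", dept: "Sales", payband: "C" },
];

// Hand-written equivalent of the two stages above.
const counts = {};
for (const d of docs) {
  if (d.dept !== "Marketing") continue;             // $match
  counts[d.payband] = (counts[d.payband] || 0) + 1; // $group with $sum: 1
}
console.log(pipeline.length, counts);
```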
Availability and Scalability
Replica Set
Compression of data
Allowing scalability was one of the driving factors for creating MongoDB, along with the ability to
run over multiple physical machines to take advantage of more RAM, CPU, and Disk.
A prerequisite for this is ‘High Availability’ because the more machines you deploy, the greater the
risk for failure. If configured correctly, MongoDB can survive losing up to 50% of its servers and
keep working (with caveats). A server can be taken down, upgraded, maintained, and replaced
without downtime to the cluster.
The concept of sharding is analogous to table partitioning. When scaling out, auto-sharding
can be used to spread data across multiple shards in a cluster. It can dynamically resize
partitions and move data accordingly.
MongoDB compresses data on disk (up to about 50%) to use space efficiently and minimize storage costs.
Enterprise Tooling
Enterprise Management Tools:
Working at scale requires good enterprise management tooling, which we have in the form of
Atlas (MongoDB as a service) and
Ops Manager and Cloud Manager (hosted Ops Manager) to manage your own hardware.
These tools let you monitor, deploy, upgrade live, back up, and so on when you supply the
hardware; with Atlas, even the hardware is provided for you and scaled to fit.
In addition, we have integration with Kubernetes and Terraform to ensure MongoDB can be
deployed in these environments.
When MongoDB should be used
When you need high-speed access to complex objects
● Atomic partial updates
● Fast retrieval
● Secondary indexes
● Aggregation capabilities
Imagine writing the backend for a complex system such as a computer game where the player has
created a whole environment with buildings, characters, etc.
Partial updates allow multiple processes to effectively update the same document simultaneously
where there is no overlap of changes.
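A rough in-memory sketch of partial updates follows. The applyUpdate and setPath helpers are hypothetical stand-ins that imitate the $set and $inc update operators with dot notation; they are not the server's implementation. The point is that each update names only the fields it changes, so two writers touching different fields do not overwrite each other.

```javascript
// Walk a dotted path and replace the final field with fn(oldValue).
function setPath(doc, path, fn) {
  const keys = path.split(".");
  const last = keys.pop();
  let target = doc;
  for (const k of keys) target = target[k] ??= {};
  target[last] = fn(target[last]);
}

// Hypothetical imitation of the $set and $inc update operators.
function applyUpdate(doc, update) {
  for (const [path, value] of Object.entries(update.$set ?? {})) {
    setPath(doc, path, () => value);
  }
  for (const [path, delta] of Object.entries(update.$inc ?? {})) {
    setPath(doc, path, (old) => (old ?? 0) + delta);
  }
  return doc;
}

// A made-up game document updated by two independent processes.
const world = { player: "p1", score: 10, buildings: { tower: { level: 1 } } };
applyUpdate(world, { $inc: { score: 5 } });                   // process 1
applyUpdate(world, { $set: { "buildings.tower.level": 2 } }); // process 2
console.log(world);
```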
The aggregation pipeline is a mature and powerful tool for querying data in a fast, efficient, and
easy to understand manner.
MongoDB can store larger text objects and blobs inline with the rest of the data, rather than
spilling them onto overflow pages that have their own APIs.
When MongoDB should be used (Cont.)
When you value rapid development
● Interaction by Objects
● Rich functionality
Choose MongoDB because of the ability of NoSQL to ingest structured, semi-structured, and
unstructured information without requiring tedious, expensive, and time-consuming
database mapping or extract, transform, and load (ETL) processes.
Leading internet companies release new software versions every half hour or less. Used well,
MongoDB can assist in enabling such rapid evolution of live systems.
When MongoDB should be used (Cont.)
When we have large data volumes.
● Data volumes growing
● Growth potentially unlimited.
● No big up-front payment.
Global application usage may require localized data for speed or legal reasons.
‘Zone Sharding’ can be used to distribute data across different hardware depending on data age,
relevance, etc.
Things to be conscious of
Easy to build with no training.
● Easy to get wrong.
● Performance can suffer
● Issues can arise too late!
Easy to get wrong - Although MongoDB is a powerful tool, your applications will suffer in the long
run if it is adopted using bad practices. Training and good design are important!
Just because something is free and its basics are easy to understand does not mean it is simple
to master.
Optimizing the schema design for the most frequently used application queries is crucial to
long-term application performance. Changes to schema design are not impossible, but it is better
to try to get this right from the beginning.
Developers, as well as DBAs, should understand how the database works, as developers often
perform traditional DBA tasks.
With an RDBMS
With MongoDB
Dispelling some myths that some people may have heard about MongoDB - Some features such as
schema enforcing are optional, not missing.
You don’t need to store every field for each record. If you just omit it, it’s null and consumes no
space.
Normally a given field has a single data type that can be enforced or not, depending on your
requirements.
Data is stored as serialized binary information as BSON and not as JSON. MongoDB is not a JSON
database any more than Oracle is a CSV database.
The BI connector is an add-on for MongoDB Enterprise that provides a MySQL Compatible proxy
that makes MongoDB pretend to be a MySQL server and provides read-only SQL access.
What is different in MongoDB
You can Query with SQL but normally don't
● Interaction is from code using Object-based APIs
● Rather than constructing SQL Strings
● SQL is used only to enable third-party BI tools.
SQL can be used to query MongoDB, but this is not common. MongoDB uses a more performant
query language that involves sending serialized objects over the wire to describe data and give
instructions.
Injection attacks are a concern with SQL because the server has to parse pseudo-English text into
machine instructions.
With MongoDB's object-based queries no such parsing is required, which also reduces server CPU usage.
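The difference between building a SQL string and building a query object can be sketched as follows. The table, field name, and input value are made up, and the sketch assumes the user input is a plain string; applications should still validate input types before placing them in a filter.

```javascript
// Untrusted input containing SQL syntax.
const userInput = "x' OR '1'='1";

// String concatenation: the input becomes part of the statement text,
// so the parser sees an extra OR condition.
const sql = "SELECT * FROM employees WHERE name = '" + userInput + "'";

// Object-based filter: the input stays a value and is never parsed as syntax.
const filter = { name: userInput };

console.log(sql);
console.log(filter);
```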
Good schema design is still paramount with MongoDB. Just populating a database with data and
relying on a dynamic schema is possible, but it may cause performance issues further down the
line when querying.
MongoDB will automatically create a unique _id when one is not supplied. This is an ObjectId, a
12-byte value that plays a role similar to a GUID, and it can be used as the primary key if no
other is used.
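As a small aside on what an ObjectId contains: its first 4 bytes are a creation timestamp in seconds since the Unix epoch. The sketch below decodes that timestamp from the ObjectId value shown on the earlier findOne() slide, using plain JavaScript rather than a driver. In mongosh, ObjectId values expose the same information through getTimestamp().

```javascript
// ObjectId from the findOne() example, as a 24-character hex string.
const hex = "5106c1c2fc629bfe52792e86";

// The first 8 hex characters (4 bytes) hold the creation time in seconds.
const seconds = parseInt(hex.slice(0, 8), 16);
const createdAt = new Date(seconds * 1000);

console.log(createdAt.toISOString());
```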
The MongoDB Atlas Platform
MongoDB is more than just a document database - the core database is part of a larger platform of
tools and technologies.
The MongoDB Atlas Platform
[Diagram: the MongoDB Atlas Platform, with labels including federated queries using MQL, native
visualization of MongoDB data, and edge-to-cloud synchronization]
MongoDB Atlas is a fully-managed, integrated data layer designed for modern applications and
modern operating models. At its heart, Atlas delivers the latest versions of MongoDB, which
provides foundational features such as
● An intuitive and flexible data model
● MongoDB Query Language (MQL) for building nearly any workload
● Transactional guarantees at a global scale
● Unique data distribution capabilities
Atlas Search gives you full-text search, powered by Lucene, on your Atlas-hosted databases.
Atlas Data Lake allows you to store data cheaply in Amazon S3 in JSON, CSV, or Parquet format and
query it like a MongoDB database.
● This includes automatically migrating data from Atlas to S3 as it gets older
● And federated queries over Atlas and Data Lake
MongoDB Charts
● Rich, document-centric, visualization functionality provided by MongoDB Charts
MongoDB Realm
● Data management at the edge and edge-to-cloud synchronization provided by MongoDB
Realm.
● Backend as a service - serverless code hosting, authentication of end users, and scheduled
and event-driven triggers.
Atlas Managed Clusters
MongoDB Atlas is MongoDB as a
Service.
M0 clusters are free sandbox replica set clusters. You can deploy one M0 cluster per Atlas project.
If you do not have an account / corporate login, you can create one:
https://www.mongodb.com/cloud
We are going to configure a 3-node replica set for this training.
A Node is synonymous with an instance or a single MongoDB server.
A Cluster is a group of nodes.
Launch an Atlas
cluster
Setting-up MongoDB Atlas
We will set up an Atlas Free Tier
cluster
Highly available
512 MB of storage
Secure by default
Some functionalities are not available/limited (throughput, number of connections, data transfer
limits, monitoring, alerting, API access, etc.)
It is technically large enough to run most small business applications and is free for life.
Demo
One of the students will share their screen and the instructor will show the following steps: (Other
students should follow along as the Instructor shows each step)
show dbs is a helper - it's equivalent to a small piece of JavaScript code that lists the databases.
Some commands:
● db - this is a database object and is used at the beginning of most commands to run
against a DB. Executing just db without any other methods, gives you the current
database name.
● show dbs - shows available databases
● use <database> - selects database to use. Note that it does not need to already exist
● show collections - show collections available in the currently selected database. If there
are no collections in a database, the command gives an empty response.
Database Interaction
Create new document object as { name: 'Jon', hungry: true, title: 'director' }
Technically you are creating an object - and putting it in MongoDB as a Document, but Record is a
generic term.
Some commands:
● db - this is a database object and is used at the beginning of most commands to run
against a db
● insertOne() - this allows us to insert an object, e.g.
db.getCollection('test').insertOne({name: "bob"})
● find() - this allows us to query, e.g. db.getCollection('test').find()
Note: The employees collection is automatically created when we execute the insert command.
Demo:
1. db.employees.find({})
2. var employee = { "name" : "Jon", "hungry" : true, "title" : "director" }
   { name: 'Jon', hungry: true, title: 'director' }
3. db.employees.insertOne(employee)
4. db.employees.find({hungry:true})
5. show collections
A JavaScript REPL
mongosh - JavaScript and NodeJS REPL
Atlas [primary] test> for(let i=0; i<10; i++) {print(i);}
0
1
2
3
4
5
6
7
8
9
We can run JavaScript code snippets in mongosh. It is a fully functional JavaScript and NodeJS
REPL for interacting with MongoDB deployments.
Note: REPL stands for Read Evaluate Print Loop. It is a programming language environment
(basically a console window) that takes a single expression as user input and returns the result
to the console after execution.
The REPL session provides a convenient way to quickly test simple JavaScript code.
Help!
…
db.help
Atlas [primary] test> db.help
db.<collection>.help …
Intelligent Autocomplete
Compass
GUI tool that includes:
● Aggregation Builder
● mongosh
● Visual Explain Plans
#1. Select two features that are NOT part of the core MongoDB database
A Aggregations
B Transactions
C Stored procedures
D Field level updates
E Triggers
MongoDB
- includes an aggregation framework
- can work with transactions
- allows you to update individual fields in a document (an update with $set)
MongoDB does not include stored procedures.
Triggers are a feature of Realm, not of the core database.
#2. Which of the following statements are correct? MongoDB documents are ...
A Stored as BSON on the disk
B Often queried using BSON syntax
C Homogeneous in a collection
D Created with a primary key even if this is not provided
E Distributed over a network when sharding
A Create an Inner Join on documents
B Define Foreign Key Constraints
C Natively query data using SQL
You can use $lookup to gather data from other collections in a query, but it performs a left outer
join rather than an inner join, and it can affect performance, so it is not recommended as a
default approach.
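For reference, a $lookup stage is written as a document with from, localField, foreignField, and as. The sketch below shows such a stage and then imitates its left outer join behaviour over made-up in-memory data, so it can run without a database. The collection and field names are assumptions for illustration.

```javascript
// A $lookup stage as it might appear in an aggregation pipeline.
const lookupStage = {
  $lookup: {
    from: "departments",
    localField: "deptId",
    foreignField: "_id",
    as: "department",
  },
};

// Made-up data standing in for two collections.
const employees = [
  { name: "Jon", deptId: 500 },
  { name: "Ann", deptId: 999 }, // no matching department
];
const departments = [{ _id: 500, name: "Marketing" }];

// Left outer join: every employee is kept; matches land in an array.
const joined = employees.map((e) => ({
  ...e,
  department: departments.filter((d) => d._id === e.deptId),
}));
console.log(JSON.stringify(joined, null, 2));
```

An inner join would have dropped Ann; here she is kept with an empty department array.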
MongoDB stores data in BSON format, which includes a binary data type.
It is possible to use SQL syntax to query data in MongoDB using specific connectors, like the BI
connector (but not natively)
It is possible to enforce specific fields and data types on a collection to define the document
structure.
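One way to express such rules is a $jsonSchema validator document. The sketch below builds one for the employees collection used in the demo; which fields are required and which types they take are assumptions chosen for illustration. In mongosh a validator like this could be supplied when creating the collection, for example db.createCollection("employees", { validator }). The missingFields helper is a hypothetical local check, not part of MongoDB.

```javascript
// Validator document describing the expected shape of an employee.
const validator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "title"],
    properties: {
      name: { bsonType: "string" },
      hungry: { bsonType: "bool" },
      title: { bsonType: "string" },
    },
  },
};

// Hypothetical local helper: list required fields a document lacks.
const missingFields = (doc) =>
  validator.$jsonSchema.required.filter((field) => !(field in doc));

console.log(missingFields({ name: "Jon", hungry: true, title: "director" }));
console.log(missingFields({ name: "Jon" }));
```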
#4. Select four components in the Atlas
platform which use MQL (MongoDB Query Language)
Realm has its own APIs, whereas Atlas Search, Data Lake, Charts, and the cloud database all work
with MQL (as used in the mongo shell).
Refer to the MongoDB Atlas Platform diagram.
#5. Select one statement which best
describes MongoDB Atlas
Recap
MongoDB is a modern document-oriented
database with key attributes such as:
● Agility
● Usability
● Utility
● Scalability & Availability
● Data volume, velocity, and variety have increased substantially; software development
cycles have decreased; requirements for always-on, scalable systems are increasingly
table-stakes. The way software and hardware are purchased and consumed has changed
a lot since the RDBMS was first invented.
● MongoDB is a document-oriented database for modern applications:
○ By using the document model, MongoDB makes it easier to map to modern
programming paradigms, faster to retrieve data, and importantly makes it easier
to build systems that scale across multiple machines to escape the limits of a
single server. Documents are stored as binary in BSON, although often shown
using JSON.
○ MongoDB is agile, usable, capable, and scalable. We integrate well with
programming languages and don't sacrifice functionality for scale.
● MongoDB is a general-purpose database with a wide range of use cases and value to bring
to all of them.
● Some ways of working with MongoDB are similar to an RDBMS; others require re-thinking
to ensure that we make the most out of a horizontally scalable document-oriented
database and avoid costly mistakes.