Lesson 5 - The Document Model and MongoDB
Distributed and resilient: Document databases are distributed, which allows for
horizontal scaling (typically cheaper than vertical scaling): the data is spread
across multiple machines rather than one machine being made bigger as the data
grows. Spreading data across machines in this way is referred to as sharding.
The design also provides high availability and resiliency, because the data lives
in replica sets that create redundancy: if the primary machine fails, a secondary
machine takes over and keeps the data available.
Object mapping: Documents easily map to objects, the most frequently used data
structure in the most popular programming languages. This allows developers to
rapidly develop their applications as it is an intuitive process.
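As a minimal sketch of this mapping, the Python example below turns an application object into a document-shaped dictionary and back. The `User` class and the helper names are hypothetical; the field names mirror the sample user document used in this lesson. No database driver is involved, so this only illustrates the shape of the data.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    # Hypothetical application object; fields mirror the lesson's sample user.
    user_id: str
    age: int
    Status: str


def to_document(user: User) -> dict:
    """Convert the object into a plain dict with the same shape as a document."""
    return asdict(user)


def from_document(doc: dict) -> User:
    """Rebuild the object from a document-shaped dict."""
    return User(**doc)


user = User(user_id="Sean", age=29, Status="A")
document = to_document(user)
restored = from_document(document)

# Print the document in its JSON presentation form.
print(json.dumps(document))
```

Each attribute of the object becomes one field-value pair, which is why no separate translation layer is needed for simple objects.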
Flexible schema: Document databases have a flexible schema, meaning that not all
documents in a collection need to have the same fields. Note that some document
databases support schema validation, so a schema can still be defined and enforced
where needed.
The Document Model: Structure and Syntax
{
"_id": ObjectId("5f4f7fef2d4b45b7f11b6d7a"),
"user_id": "Sean",
"age": 29,
"Status": "A"
}
Above is an example of a document representing a user's details, including user_id,
age, and a status category.
Each field name in a MongoDB document must be enclosed within quotation marks. In
JSON, string values must be quoted as well.
Each field-value pair in the document is separated from the next pair by a comma.
The final field-value pair doesn't require a comma, as the closing curly brace
indicates the end of the document.
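The comma rule can be checked with a strict JSON parser. The sketch below uses Python's standard `json` module; the `parses` helper is hypothetical, and `_id` is left out because `ObjectId(...)` is shell notation rather than plain JSON. Tools that accept JavaScript syntax may be more lenient than a strict JSON parser about a trailing comma.

```python
import json

# Commas between pairs, none after the final pair.
valid = '{"user_id": "Sean", "age": 29, "Status": "A"}'
# A comma after the final pair is rejected by strict JSON.
trailing_comma = '{"user_id": "Sean", "age": 29, "Status": "A",}'
# A missing comma between two pairs is also rejected.
missing_comma = '{"user_id": "Sean" "age": 29}'


def parses(text: str) -> bool:
    """Return True if the text is accepted by the strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {
    "valid": parses(valid),
    "trailing_comma": parses(trailing_comma),
    "missing_comma": parses(missing_comma),
}
print(results)
```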
Quiz
B. Field-value pairs
C. Objects
CORRECT: Field-Value pairs - MongoDB stores field-value pairs where the value can
be one of several data types including a sub-object.
INCORRECT: Objects - MongoDB does not store data as an object; rather, it is stored
as field-value pairs within an object.
JSON and BSON
How is data stored?
A document typically stores information
about one object and any of its related
metadata.
JSON is just how we tend to present documents, but it actually has very little to do
with how they are stored/processed within MongoDB.
BSON, or binary JSON, is actually how data is stored within the database and it is
also how data is transmitted across the network.
It is a very lightweight format that aims to keep the data size small, it was
designed to be easily traversable, and as a binary format it is computationally
efficient to encode and decode.
What is BSON?
Bridges the gap between binary representation and JSON format
Optimized for:
● Speed
● Space
● Flexibility
Highly performant
JSON is human readable but space and speed inefficient. Binary JSON, or BSON,
was developed to address these shortcomings.
BSON is optimized for speed and space to facilitate both efficient storage and
efficient transmission across the network.
BSON is highly performant because it is designed to make data easy to traverse,
which also enables fast retrieval.
The key point to note is that BSON is the underlying storage format that data is
written to using MongoDB.
Okay, but what really is a BSON document?
> { "hello" : "world" }
Let's look at the traditional "Hello World" example. First we have it represented in
JSON, and then as it is encoded in BSON.
See: http://bsonspec.org/faq.html
Here is the BSON representation of "Hello World" once it is encoded. The document
size comes first, then the type of the field (a string), then the field name (hello),
and then the length of the string value. This is followed by the field value itself
and finally the indicator marking the end of the object.
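The layout described above can be reproduced by hand. The following is a minimal sketch that encodes a document containing only string fields, using Python's `struct` module. The function names are hypothetical, and this is not how a MongoDB driver is implemented; it only follows the byte order in the description (size, type, field name, value length, value, terminator).

```python
import struct


def encode_string_element(name: str, value: str) -> bytes:
    """Encode one string field: type byte, field name, value length, value."""
    name_bytes = name.encode("utf-8") + b"\x00"    # field name, null-terminated
    value_bytes = value.encode("utf-8") + b"\x00"  # value, null-terminated
    # Length is a little-endian int32 and counts the trailing null byte.
    length = struct.pack("<i", len(value_bytes))
    return b"\x02" + name_bytes + length + value_bytes  # 0x02 marks a string


def encode_document(fields: dict) -> bytes:
    """Encode a dict of string fields as a BSON document."""
    body = b"".join(encode_string_element(n, v) for n, v in fields.items())
    body += b"\x00"                                # end-of-document marker
    total_size = 4 + len(body)                     # size prefix counts itself
    return struct.pack("<i", total_size) + body


hello_bson = encode_document({"hello": "world"})
print(hello_bson.hex(" "))
```

For `{"hello": "world"}` the result is 22 bytes long, so the first four bytes are `16 00 00 00` (22 as a little-endian 32-bit integer).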
Interesting, what data types are supported?
Type                                    Representation
64-bit binary floating point            "\x01" e_name double
UTF-8 string                            "\x02" e_name string
Embedded document                       "\x03" e_name document
Array                                   "\x04" e_name document
Binary data                             "\x05" e_name binary
Undefined (value) - Deprecated          "\x06" e_name
ObjectId                                "\x07" e_name (byte*12)
Boolean "false"                         "\x08" e_name "\x00"
Boolean "true"                          "\x08" e_name "\x01"
UTC datetime                            "\x09" e_name int64
Null value                              "\x0A" e_name
Regular expression                      "\x0B" e_name cstring cstring
DBPointer - Deprecated                  "\x0C" e_name string (byte*12)
JavaScript code                         "\x0D" e_name string
Symbol - Deprecated                     "\x0E" e_name string
JavaScript code w/ scope - Deprecated   "\x0F" e_name code_w_s
32-bit integer                          "\x10" e_name int32
Timestamp                               "\x11" e_name uint64
64-bit integer                          "\x12" e_name int64
128-bit decimal floating point          "\x13" e_name decimal128
Min key                                 "\xFF" e_name
Max key                                 "\x7F" e_name
BSON offers a number of data types beyond those available in JSON; examples include
decimal128 for 128-bit decimal floating point values.
What is JSON?
JSON is short for JavaScript Object Notation.
Human-readable
Designed to be easy to read and write.
JSON uses text encoding, which makes it human-readable but slower to parse than
BSON. It is based on two data structures: the ordered list and the object of
name-value pairs.
The syntax of JSON is well known and intuitive to most developers, as it follows
patterns similar to those of popular programming languages.
It uses curly brackets to mark the start and the end of the document.
MongoDB refers to keys as fields. Within each field-value pair, the field is
separated from its value by a colon (:).
Each field name must be enclosed within quotation marks. String values must also be
quoted.
Each field-value pair is separated within the document by commas.
JSON: Object
{
"_id": ObjectId("5f4f7fef2d4b45b7f11b6d7a"),
"user_id": "Eoin",
"age": 29,
"Status": "A"
}
[Syntax diagram: object]
https://www.json.org/
The JSON specification website has a number of very helpful syntax/railway diagrams
that we will explore over the next few slides to better understand JSON.
The first concept to understand is the object, which is the main container. We can
interpret the syntax diagram by matching its start and end brackets to those of our
example JSON document.
JSON: Field Value Pair Separator
We can see that the syntax diagram also clearly shows that every field must be
separated from its value by a colon, and that field-value pairs are separated from
one another by commas. We can again reference our example document and apply it to
the syntax diagram to verify this.
JSON: Values
[Syntax diagram: value. A value is whitespace, then a string, number, object, array,
true, false, or null, then whitespace.]
The value syntax diagram for JSON highlights the range of possible types for the
value. We highlight the string and the number types that correspond to examples
highlighted in our sample document.
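To see the range of value types in practice, the sketch below parses a small document with Python's standard `json` module and records the Python type each JSON value maps to. Only `user_id` and `age` come from the sample document; the other fields are hypothetical and were added to cover the remaining value types.

```python
import json

# user_id and age follow the lesson's sample; the other fields are hypothetical.
text = """
{
  "user_id": "Sean",
  "age": 29,
  "score": 4.5,
  "active": true,
  "nickname": null,
  "tags": ["a", "b"],
  "address": {"country": "USA"}
}
"""

doc = json.loads(text)

# Map each field to the name of the Python type its JSON value was parsed into.
value_types = {field: type(value).__name__ for field, value in doc.items()}

for field, type_name in value_types.items():
    print(f"{field}: {type_name}")
```

Note that JSON has a single number type; the split into `int` and `float` here is a choice made by the Python parser.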
JSON: Quoting Strings
String values in JSON must be enclosed in double quotes. Quoting allows for
multi-word, space-separated text, so a sentence, or indeed whole paragraphs, can be
stored in a string value rather than just a single word or letter.
JSON: More values
[Syntax diagram: value, with number, object, array, and true among the options]
Slide caption (partially recovered): values to represent other objects (sub-object),
arrays, ...
Let's compare JSON and BSON broadly. The efficiency and fast parsing of BSON are key
to understanding why it is the storage format used. Equally, because JSON is
human-readable, it is clear why it is used as the presentation format in mongosh,
MongoDB Compass, and Atlas's Data Explorer.
Quiz
Which of the following are true for MongoDB documents using
JSON? More than one answer choice can be correct.
CORRECT: Start and end with curly braces - As highlighted in the JSON specification,
all JSON documents must start and end with curly brackets.
CORRECT: Represents a value as a number, a string, an object, booleans, null, or an
array - A JSON document value can hold any of these data types per the JSON
specification.
INCORRECT: Separate a field/key from a value by a comma - A field/key is separated
by a colon “:” rather than a comma “,” from the corresponding value. Commas are
used to separate field-value pairs.
{
"_id": ObjectId("5f4f7fef2d4b45b7f11b6d7a"),
"user_id": "Sean",
"age": 29,
"Status": "A"
}
{
"_id": ObjectId("5f4f7fef2d4b45b7f11b6d7a"),
"user_id": "Daniel",
"age": 25,
"Status": "A",
"Country": "USA"
}
MongoDB does not enforce a single schema on a collection. Documents can have common
fields, but they are not required to by default.
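The two user documents from this slide can be compared directly. The sketch below models a collection as a plain Python list of dictionaries (an assumption made for illustration; no database is involved, and `_id` is omitted) and computes which fields are shared and which appear only in some documents.

```python
# A collection modeled as a list of dicts; the values are taken from the slide.
users = [
    {"user_id": "Sean", "age": 29, "Status": "A"},
    {"user_id": "Daniel", "age": 25, "Status": "A", "Country": "USA"},
]

# Fields present in every document.
common_fields = set.intersection(*(set(doc) for doc in users))
# Fields present in at least one document.
all_fields = set.union(*(set(doc) for doc in users))
# Fields that only some documents carry.
optional_fields = all_fields - common_fields

print("common:", sorted(common_fields))
print("optional:", sorted(optional_fields))
```

Both documents sit in the same collection even though only one of them has a `Country` field.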
JSON Schema
The schema definition can be used by any query to inspect document structure and
content. For example, DBAs can identify all documents that do not conform to a
prescribed schema.
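As a rough illustration of finding non-conforming documents, the sketch below checks documents against a deliberately simplified schema. `USER_SCHEMA` and `violations` are hypothetical names, and the schema format (field name mapped to an expected Python type) is an assumption for this example; it is not the JSON Schema vocabulary and not MongoDB's own validation syntax.

```python
# Hypothetical, simplified schema: required field name -> expected Python type.
USER_SCHEMA = {"user_id": str, "age": int, "Status": str}


def violations(doc: dict, schema: dict) -> list:
    """Return a list of human-readable problems found in one document."""
    problems = []
    for field, expected_type in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


# Sample data; the second and third documents are made up to fail the check.
collection = [
    {"user_id": "Sean", "age": 29, "Status": "A"},
    {"user_id": "Daniel", "age": "25", "Status": "A"},  # age stored as a string
    {"user_id": "Eoin", "Status": "A"},                 # age is missing
]

non_conforming = [doc for doc in collection if violations(doc, USER_SCHEMA)]

for doc in non_conforming:
    print(doc["user_id"], violations(doc, USER_SCHEMA))
```

Extra fields are allowed in this sketch, which matches the flexible-schema idea: the schema states what must be present, not everything that may be present.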
We will cover some best practices when it comes to modeling data in MongoDB.
Quiz
B. Documents
C. Records
CORRECT: Documents - A MongoDB collection holds all the documents for that
specific collection.
Quiz
CORRECT: Fields are enclosed with quotation marks - Each field must be enclosed
with quotation marks per the JSON specification, which we’ll cover immediately after
this quiz.
CORRECT: Documents are represented as JSON - Documents are represented in
MongoDB as JSON to simplify their readability, however they are stored as BSON.
We’ll explore JSON and then BSON in the next lessons.
INCORRECT: Documents are represented as BSON - Documents are represented as
JSON however they are stored as BSON.
MongoDB University has free self-paced courses and labs ranging from beginner
to advanced levels.
Sign up for the MongoDB Student Pack to receive $50 in Atlas credits and free
certification!
This concludes the material for this lesson. However, there are many more ways to
learn about MongoDB and non-relational databases, and they are all free! Check out
MongoDB’s University page to find free courses that go into more depth about
everything MongoDB and non-relational. For students and educators alike, MongoDB
for Academia is here to offer support in many forms. Check out our educator
resources and join the Educator Community. Students can receive $50 in Atlas credits
and free certification through the GitHub Student Developer Pack.