04 Chapter Pattern in Mongodb2
PATTERNS IN MONGODB
10/10/2022
EXTENDED REFERENCE PATTERN
Introduction
˗ The Extended Reference pattern is well suited for applications that perform repeated joins, querying the same related data many times.
˗ The idea behind this pattern is to copy the frequently accessed data of the "one" side onto the "many" side of a one-to-many relationship.
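A minimal sketch of the idea, assuming plain Python dicts stand in for MongoDB collections; the `customers` data, the `make_order` helper, and all field names are hypothetical. The order (the "many" side) stores the customer's `_id` plus copies of only the few customer fields that order screens read most often, so displaying an order needs no join.

```python
# Hypothetical "customers" collection (the "one" side), keyed by _id.
customers = {
    101: {"_id": 101, "name": "Ana", "address": "12 Main St", "loyalty_points": 340},
}


def make_order(order_id, customer_id, items):
    """Build an order document (the "many" side) with an extended reference."""
    customer = customers[customer_id]
    return {
        "_id": order_id,
        "items": items,
        # Extended reference: the customer's _id plus only the frequently read fields.
        "customer": {
            "id": customer["_id"],
            "name": customer["name"],
            "address": customer["address"],
        },
    }


order = make_order(5001, 101, [{"sku": "A1", "qty": 2}])
print(order["customer"]["name"])  # → Ana (read without a second lookup)
```

Rarely used fields such as `loyalty_points` stay only in the customer document, which keeps the duplication small.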
Introduction
˗ Do you have nightmares about long and complex SQL queries
performing too many joins?
Introduction
˗ Relational Tables and Document Collections
Even if you migrated from a 10-table model in a relational database to a 3-collection model in MongoDB, you may still need to run many queries that join data from different collections.
These MongoDB queries will be nowhere near as complicated as their SQL equivalents.
But with big data, anything you do too often can become a liability for your performance.
Introduction
˗ Relational Tables and Document Collections
Managing Duplication
˗ Minimize duplication:
• Select fields that do not change often. For example, a person's user ID can be complemented by their name, since people rarely change their name.
• Only bring over the fields you need to avoid the join.
˗ After a source is updated:
• Identify what should be changed, i.e., the list of extended references.
• Decide when the extended references should be updated.
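A minimal sketch of propagating a source update to its extended references, assuming in-memory lists stand in for the `customers` and `orders` collections; `rename_customer` and the field names are hypothetical. Here the copies are updated immediately after the source changes; a batch job that runs later would be an equally valid answer to "when", depending on how stale the copies may be. In MongoDB itself this would typically be one multi-document update filtering on `customer.id`.

```python
customers = {101: {"_id": 101, "name": "Ana"}}
orders = [
    {"_id": 1, "customer": {"id": 101, "name": "Ana"}},
    {"_id": 2, "customer": {"id": 101, "name": "Ana"}},
    {"_id": 3, "customer": {"id": 202, "name": "Bo"}},
]


def rename_customer(customer_id, new_name):
    """Update the source document, then every extended reference copied from it.

    Returns how many order documents were touched.
    """
    customers[customer_id]["name"] = new_name
    touched = 0
    for order in orders:
        ref = order["customer"]
        if ref["id"] == customer_id:  # this order holds a copy of the changed field
            ref["name"] = new_name
            touched += 1
    return touched


print(rename_customer(101, "Ana Silva"))  # → 2
```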
Managing Duplication
˗ Duplication may be better than a unique reference: sometimes duplication is not a problem but rather the right solution.
• Example: invoices and customer addresses. When the invoice was created, the customer may have lived in one location, and they may have moved since.
• If the invoice kept only a reference to the customer's address, it would point to the new address rather than the one that was valid when the invoice was issued. Copying the address into the invoice preserves that historical value.
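A minimal sketch of the invoice case, assuming a dict stands in for the customers collection; `create_invoice` and the field names are hypothetical. The address is copied (snapshotted) into the invoice at creation time, so a later move does not change what the invoice says.

```python
import copy

customers = {7: {"_id": 7, "address": {"street": "3 Rue A", "city": "Lyon"}}}


def create_invoice(invoice_id, customer_id, total):
    # Duplicate the address as it is now, instead of storing a reference to it.
    address = copy.deepcopy(customers[customer_id]["address"])
    return {"_id": invoice_id, "customer_id": customer_id,
            "total": total, "billing_address": address}


invoice = create_invoice(900, 7, 120.0)
# Later, the customer moves.
customers[7]["address"] = {"street": "8 Rue B", "city": "Paris"}
print(invoice["billing_address"]["city"])  # → Lyon
```

The `deepcopy` also protects the invoice if the customer's address object is later modified in place rather than replaced.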
Problem: Too many repetitive joins
Solution: Identify the fields on the lookup side and bring those fields into the main object
SUBSET PATTERN
Introduction
˗ Working Set fits in RAM
MongoDB keeps frequently accessed data, referred to as
the working set, in RAM.
Introduction
˗ Working set fits in RAM
When the working set of data and indexes grows beyond the physical RAM allotted, performance is reduced.
MongoDB tries to optimize the use of RAM by pulling into memory, from disk, only the documents it needs.
When no more memory is available, it evicts data it no longer needs to make room for the documents it must process.
This works well as long as the working set fits in RAM.
Introduction
˗ When the working set is larger than the RAM, the server finds itself frequently dropping documents. This process constantly discards documents that still need to be kept in memory, so they must be read back from disk again.
Subset Pattern
Problem:
˗ Working set is too big
˗ Lots of pages are evicted from memory
˗ A large part of each document is rarely needed
Solution:
˗ Split the collection into 2 collections:
• Most used part of documents
• Least used part of documents
˗ Duplicate the part of a 1-N or N-N relationship that is often used into the most used side
Use case examples:
˗ List of reviews for a product
˗ List of comments on an article
˗ List of actors in a movie
Summary
˗ Reduces working set size
˗ Split information into:
• Frequently needed
• Rarely needed
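A minimal sketch of the split for the "list of actors in a movie" use case, assuming plain Python dicts and lists stand in for the two MongoDB collections. The field names (`top_cast`, `billing`, `movie_id`) and the cutoff `MAX_EMBEDDED_CAST` are hypothetical. The movie document keeps a small, duplicated slice of the cast (the most used part), while the full cast lives in a separate collection (the least used part).

```python
MAX_EMBEDDED_CAST = 3  # hypothetical: how many actors the movie page shows


def split_movie(movie):
    """Split one large movie document into a small main document and a cast collection."""
    full_cast = movie["cast"]  # assumed to be ordered by billing
    # Most used part: everything except the long cast list, plus the top-billed actors.
    movie_doc = {k: v for k, v in movie.items() if k != "cast"}
    movie_doc["top_cast"] = full_cast[:MAX_EMBEDDED_CAST]
    # Least used part: one document per actor, loaded only when the full cast is requested.
    cast_docs = [{"movie_id": movie["_id"], "billing": i + 1, "actor": name}
                 for i, name in enumerate(full_cast)]
    return movie_doc, cast_docs


movie = {"_id": "m1", "title": "Example Movie", "cast": ["A", "B", "C", "D", "E"]}
movie_doc, cast_docs = split_movie(movie)
print(movie_doc["top_cast"])  # → ['A', 'B', 'C']
print(len(cast_docs))         # → 5
```

The trade-off is visible here: the top three actors are stored twice, and showing the full cast costs an extra read.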
Exercise
Apply the Subset Pattern to:
1. Design the storage of Product information and customers' Reviews of each Product.
2. Design the storage of article information and viewers' Comments on each article.
COMPUTED PATTERN
Introduction
˗ The usefulness of data becomes much more apparent when
we can compute values from it.
˗ Example:
What's the total sales revenue of the latest Amazon Alexa?
How many viewers watched the latest blockbuster movie?
˗ These types of questions can be answered from data stored in
a database but must be computed.
Introduction
˗ Running these computations every time they are requested, however, becomes a highly resource-intensive process, especially on huge datasets: CPU cycles, disk access, and memory can all be involved.
˗ In big data systems, these kinds of repeated computations can lead to very poor performance.
Kinds of Computations/Transformations
˗ Kinds of Computations
Mathematical Operations (e.g., sum, average, median)
Fan Out Operations (on read or on write; fanning out on write means preparing the data on save)
Roll-up Operations (seeing data at a higher level by running a group of operations)
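One way to picture fan out on write versus fan out on read is a hypothetical social feed, sketched below with in-memory dicts standing in for collections; `followers`, `publish`, and `feed_on_read` are illustrative names, not a MongoDB API. Fanning out on write turns one logical write into several writes so that reads find the data already prepared; fanning out on read does no preparation and gathers the data at every read.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "bob": ["carol"]}  # author -> readers
posts_by_author = defaultdict(list)  # source data
feeds = defaultdict(list)            # data prepared on save, one feed per reader


def publish(author, text):
    """Fan out on write: one logical write becomes one write per follower."""
    posts_by_author[author].append(text)
    for reader in followers.get(author, []):
        feeds[reader].append((author, text))


def feed_on_read(reader):
    """Fan out on read: nothing is prepared; each read gathers posts from every followed author."""
    return [(author, text)
            for author, readers in followers.items() if reader in readers
            for text in posts_by_author[author]]


publish("alice", "hello")
publish("bob", "hi")
print(feeds["carol"])         # → [('alice', 'hello'), ('bob', 'hi')]
print(feed_on_read("carol"))  # same posts, computed at read time
```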
Mathematical Operations
˗ Mathematical Operations are the ones where we compute a
sum or an average, find a median, etc.
˗ These are often associated with calling a built-in function in
the server.
˗ Why apply the Computed pattern in this case?
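A minimal sketch of a mathematical operation under the Computed pattern, using the "how many viewers watched the movie?" question: instead of summing all screenings on every read, the total is updated on each write and stored in the movie document. Plain dicts stand in for collections, and `record_screening`, `recompute`, and the field names are hypothetical; in MongoDB the stored counters would typically be updated with the `$inc` operator. The source screenings are kept so the totals can be recomputed if needed.

```python
movies = {"m1": {"_id": "m1", "title": "Blockbuster", "total_viewers": 0, "screenings": 0}}
screenings = []  # source data, kept so the computed values can be rebuilt


def record_screening(movie_id, viewers):
    """Store the raw event and update the precomputed values at write time."""
    screenings.append({"movie_id": movie_id, "viewers": viewers})
    movie = movies[movie_id]
    movie["total_viewers"] += viewers
    movie["screenings"] += 1


def recompute(movie_id):
    """Rebuild the computed values from the source data (e.g., after a bug fix)."""
    rows = [s["viewers"] for s in screenings if s["movie_id"] == movie_id]
    return {"total_viewers": sum(rows), "screenings": len(rows)}


for v in (120, 85, 240):
    record_screening("m1", v)
print(movies["m1"]["total_viewers"])  # → 445
```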
Roll Up Operations
˗ OLAP stands for Online Analytical Processing. An OLAP server is a software technology that allows users to analyze information from multiple database systems at the same time.
˗ It is based on a multidimensional data model and allows the user to query multidimensional data (e.g., Delhi -> 2018 -> Sales data). OLAP databases are divided into one or more cubes, and these cubes are known as hypercubes.
Roll Up Operations
Roll Up Operations
˗ In the cube, the drill-down operation is performed by moving down in the concept hierarchy of the Time dimension (Quarter -> Month).
Roll Up Operations
˗ OLAP operations: There are five basic analytical operations that can be performed on an OLAP cube:
Drill down: In the drill-down operation, less detailed data is converted into more detailed data. It can be done by:
• Moving down in the concept hierarchy
• Adding a new dimension
Roll Up Operations
˗ OLAP operations
Roll up: It is the opposite of the drill-down operation. It performs aggregation on the OLAP cube. It can be done by:
• Climbing up in the concept hierarchy
• Reducing the dimensions
Roll Up Operations
˗ OLAP operations
In the cube given in the overview section, the roll-up operation is performed by climbing up in the concept hierarchy of the Location dimension (City -> Country).
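A minimal sketch of the City -> Country roll-up, assuming a flat list of hypothetical sales facts and a hypothetical city-to-country mapping (the cities and amounts are illustrative, not taken from the cube in the figure). Climbing the hierarchy means replacing each city by its country and aggregating the measures that now share a key.

```python
from collections import defaultdict

# Hypothetical sales facts at the City level of the Location hierarchy.
sales = [
    {"city": "Delhi", "year": 2018, "amount": 120},
    {"city": "Mumbai", "year": 2018, "amount": 80},
    {"city": "Toronto", "year": 2018, "amount": 50},
    {"city": "Vancouver", "year": 2018, "amount": 70},
]
city_to_country = {"Delhi": "India", "Mumbai": "India",
                   "Toronto": "Canada", "Vancouver": "Canada"}


def roll_up_to_country(rows):
    """Roll-up: climb the Location hierarchy (City -> Country) and sum the amounts."""
    totals = defaultdict(int)
    for row in rows:
        totals[(city_to_country[row["city"]], row["year"])] += row["amount"]
    return dict(totals)


print(roll_up_to_country(sales))
# → {('India', 2018): 200, ('Canada', 2018): 120}
```

Drill-down is the reverse direction: it needs the finer City-level rows, which is why the detailed facts are kept.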
Roll Up Operations
˗ Roll-up operations merge data together.
Examples:
• Grouping categories together into a parent category.
• Grouping time-based data from small intervals into larger ones. This type of roll-up is often seen in reporting for hourly, daily, monthly, or yearly summaries.
• Mathematical computations are also roll-ups.
Any operation that wants to see data at a higher level is essentially rolling up data.
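A minimal sketch of a time-based roll-up from hourly to daily summaries, assuming hypothetical hourly order counts held in a Python list; `roll_up_daily` is an illustrative name. Each hourly bucket is mapped to its calendar day and the counts are summed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly buckets.
hourly = [
    {"ts": datetime(2022, 10, 10, 9), "orders": 4},
    {"ts": datetime(2022, 10, 10, 10), "orders": 6},
    {"ts": datetime(2022, 10, 11, 9), "orders": 3},
]


def roll_up_daily(rows):
    """Merge small intervals (hours) into a larger one (days) by summing."""
    daily = defaultdict(int)
    for row in rows:
        daily[row["ts"].date()] += row["orders"]
    return dict(daily)


for day, total in sorted(roll_up_daily(hourly).items()):
    print(day.isoformat(), total)
# prints:
# 2022-10-10 10
# 2022-10-11 3
```

The same function applied to the daily result (grouping by month) would give the next level of the hierarchy.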
Example of Roll Up
˗ Scenario
An inventory (warehouse) holds different types of wine.
The inventory changes once in a while, but not frequently.
What happens more frequently is looking at the wines organized by various categories.
Example of Roll Up
Source documents (wine collection):
{ Sku: 12345,
  Color: "red",
  Type: "port",
  Name: "Fonseca",
  Country: "Portugal",
  Price: 50.00,
  …}
{ Sku: 12346,
  Color: "rose",
  Type: "Bandol",
  Name: "Domaine Sorin",
  Country: "France",
  Price: 50.00,
  …}
{ Sku: 12387,
  Color: "white",
  Type: "Chardonnay",
  Name: "Robert Mondavi",
  Country: "USA",
  Price: 10.00,
  …}

Computed summary documents:
{ attribute: "country",
  groups: [
    {name: "France", count: 5123},
    {name: "Portugal", count: 1022},
    {name: "USA", count: 4231},
    …
  ]
  …
}
{ attribute: "type",
  groups: [
    {name: "Bandol", count: 12},
    {name: "Chardonnay", count: 933},
    {name: "Port", count: 36},
    …
  ]
  …
}
Example of Roll Up
˗ For example, I may want to see the count of wines per country of origin or per type, so I can buy what is missing in my inventory to cover all my customers' needs.
˗ If I look at the aggregated data (the summary documents) more often than at the changes happening in my wine collection, it makes more sense to generate this data once and cache it in the appropriate documents.
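A minimal sketch of generating and caching the summary documents, assuming plain Python lists stand in for the collections and using the three sample wines from the example (with the type written "Port", as in the summary document). With only three wines, every count is 1, unlike the larger counts on the slide. `build_summary` is a hypothetical helper; in MongoDB itself, the counting step would typically be an aggregation with a `$group` stage, and its output would be written to the summary collection.

```python
from collections import Counter

wines = [
    {"Sku": 12345, "Color": "red", "Type": "Port", "Name": "Fonseca",
     "Country": "Portugal", "Price": 50.00},
    {"Sku": 12346, "Color": "rose", "Type": "Bandol", "Name": "Domaine Sorin",
     "Country": "France", "Price": 50.00},
    {"Sku": 12387, "Color": "white", "Type": "Chardonnay", "Name": "Robert Mondavi",
     "Country": "USA", "Price": 10.00},
]


def build_summary(docs, attribute):
    """Roll up the wines by one attribute into a {attribute, groups: [...]} document."""
    counts = Counter(doc[attribute] for doc in docs)
    return {"attribute": attribute.lower(),
            "groups": [{"name": name, "count": n} for name, n in sorted(counts.items())]}


# Computed pattern: rebuild the cached summaries only when the inventory changes,
# then serve the frequent reads from these documents instead of re-counting.
summaries = {attr: build_summary(wines, attr) for attr in ("Country", "Type")}
print(summaries["Country"]["groups"])
# → [{'name': 'France', 'count': 1}, {'name': 'Portugal', 'count': 1}, {'name': 'USA', 'count': 1}]
```

Because the wine documents are kept, the summaries can always be regenerated from their source if the computation ever needs to be redone.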
Summary
Problem:
˗ Costly computation or manipulation of data
˗ Executed frequently on the same data, producing the same result
Solution:
˗ Perform the operation and store the result in the appropriate document and collection
˗ If you need to redo the operations, keep their source data
Summary
In summary, the computed pattern is a useful pattern if
you want to avoid performing similar operations many
times.