04 Chapter Pattern in Mongodb2

Mongo

Uploaded by

tai43464
Copyright: © All Rights Reserved

10/10/2022

PATTERNS IN MONGODB

Patterns in NoSQL Data Modeling

1. Attribute Pattern
2. Extended Reference Pattern
3. Subset Pattern
4. Computed Pattern
5. Bucket Pattern
6. Schema Versioning Pattern
7. Tree Patterns
8. Polymorphic Pattern
9. Other Patterns


EXTENDED REFERENCE PATTERN

Introduction
˗ The Extended Reference pattern is well suited to applications that perform the same joins repeatedly, querying the same related data many times.
˗ The idea behind this pattern is to copy the frequently accessed fields of the “one” side onto the “many” side of a one-to-many relationship.


Introduction
˗ Do you have nightmares about long and complex SQL queries
performing too many joins?

Introduction
˗ Relational Tables and Document Collections
• Even if you migrate from a 10-table relational model to a 3-collection model in MongoDB, you may still have many queries that need to join data from different collections.
• These MongoDB queries will be nowhere near as complicated as their SQL equivalents.
• But with big data, anything you do too often can become a liability for performance.


Introduction
˗ Relational Tables and Document Collections

How joins are performed in MongoDB

˗ Joins on the application side: before MongoDB servers supported any joining facilities, the only option was to perform joins in the application.
˗ Lookups
• $lookup: the aggregation framework supports joins through the $lookup stage.
• $graphLookup: performs recursive queries over the same collection (a recursive self-join), similar to the traversals found in graph databases.
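The sketch below makes the two join styles above concrete. It performs an application-side join over in-memory lists that stand in for hypothetical `orders` and `customers` collections. It also defines the equivalent `$lookup` stage as a plain dictionary. The collection names, field names and data are illustrative assumptions, and no server is contacted.

```python
from typing import Any

Doc = dict[str, Any]

# Hypothetical collections, held in memory for illustration only.
customers: list[Doc] = [
    {"_id": 1, "name": "Alice", "address": "12 Main St"},
    {"_id": 2, "name": "Bob", "address": "34 Oak Ave"},
]
orders: list[Doc] = [
    {"_id": 100, "customer_id": 1, "total": 25.0},
    {"_id": 101, "customer_id": 2, "total": 40.0},
    {"_id": 102, "customer_id": 1, "total": 15.5},
]


def app_side_join(orders: list[Doc], customers: list[Doc]) -> list[Doc]:
    """Join each order with its customer in application code.

    Mimics the output shape of $lookup: matches are placed in an array
    field named "customer" (empty if no customer matches).
    """
    by_id = {c["_id"]: c for c in customers}  # index the "one" side once
    joined = []
    for order in orders:
        match = by_id.get(order["customer_id"])
        joined.append({**order, "customer": [match] if match else []})
    return joined


# The same join done by the server, as an aggregation stage run on "orders".
lookup_stage: Doc = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}
```

With PyMongo, the stage would be passed as `db.orders.aggregate([lookup_stage])`, where `db` is a hypothetical database handle. In either style, every read of an order pays for the join again.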


How joins are performed in MongoDB


˗ Avoid a join by embedding the joined table
• To avoid a physical join, you can embed the “many” side of a one-to-many relationship in the document on the “one” side.

Extended Reference pattern


˗ Extended Reference for Many-to-One Relationships
˗ Example: a collection of orders, which has a many-to-one relationship with customers.


Extended Reference pattern


˗ Extended Reference for Many-to-One Relationships
˗ Example (cont.):
• One customer can have many orders, and each order belongs to one customer.
• Assume the application focuses on order management and fulfillment, so we query specific orders more often than all orders for a given customer.
• The focus of queries is on the “many” side.

Extended Reference pattern


˗ Extended Reference for Many-to-One Relationships
• With the Extended Reference pattern, we copy only the fields that need frequent access, leaving the rest of the information in the source collection.
• Embed the “one” side of a one-to-many relationship into the “many” side.


Extended Reference pattern


˗ Extended Reference for Many-to-One Relationships
˗ Example (cont.): we embed information from the “one” side, the customer's address, into every order.
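Here is a minimal sketch of building such an order, assuming hypothetical field names. Only the customer fields the order workflow reads often (`name` and `address`) are copied into the order. They sit alongside the `_id` that still points to the full customer document.

```python
import copy
from typing import Any

Doc = dict[str, Any]

# Customer fields that order processing reads frequently (assumption).
EXTENDED_FIELDS = ("name", "address")


def make_order(order_id: int, customer: Doc, items: list[Doc]) -> Doc:
    """Build an order document that embeds an extended reference to its customer."""
    extended_ref = {"_id": customer["_id"]}  # keep the real reference too
    for field in EXTENDED_FIELDS:
        # deepcopy so later edits to the customer object do not leak into the order
        extended_ref[field] = copy.deepcopy(customer[field])
    return {"_id": order_id, "customer": extended_ref, "items": items}


customer = {
    "_id": 1,
    "name": "Alice",
    "address": {"street": "12 Main St", "city": "Hanoi"},
    "phone": "555-0100",
    "loyalty_history": [],  # large or rarely needed data stays in customers
}
order = make_order(100, customer, [{"sku": "A1", "qty": 2}])
```

Displaying or shipping the order now needs a single read. `phone` and `loyalty_history` stay only in the customer document. Because the values are copied, later changes to the customer do not alter existing orders.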

Managing Duplication
˗ Minimize duplication:
• Select fields that do not change often. For example, a person's user ID can be complemented by their name, as people rarely change their name.
• Only bring over the fields you need to avoid the join.
˗ After a source is updated:
• Identify what should be changed, that is, the list of extended references.
• Decide when the extended references should be updated.
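When a source document does change, its extended references have to be found and updated. The sketch below assumes each order embeds a `customer` sub-document holding the customer's `_id` and `name`; these field names are hypothetical. In MongoDB this would be a single `update_many` filtered on `customer._id`. The in-memory function applies the same rule to a list of dictionaries.

```python
from typing import Any

Doc = dict[str, Any]


def propagate_name_change(orders: list[Doc], customer_id: int, new_name: str) -> int:
    """Update the duplicated customer name in every order that references customer_id.

    Returns the number of orders actually modified, similar in spirit to
    UpdateResult.modified_count in PyMongo.
    """
    modified = 0
    for order in orders:
        ref = order.get("customer", {})
        if ref.get("_id") == customer_id and ref.get("name") != new_name:
            ref["name"] = new_name
            modified += 1
    return modified


# Server-side equivalent with PyMongo (needs a running server; db is hypothetical):
#   db.orders.update_many({"customer._id": customer_id},
#                         {"$set": {"customer.name": new_name}})

orders = [
    {"_id": 100, "customer": {"_id": 1, "name": "Alice"}},
    {"_id": 101, "customer": {"_id": 2, "name": "Bob"}},
    {"_id": 102, "customer": {"_id": 1, "name": "Alice"}},
]
```

The "when" decision is whether this runs immediately or in a periodic batch job. Delaying it is only acceptable if the application tolerates briefly stale copies.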


Managing Duplication
˗ Duplication may be better than a unique reference: sometimes duplication is not a problem but the right solution.
Example: invoices and customer addresses.
• When the invoice was created, the customer may have lived in one location, and they may have moved since.
• If the invoice only kept a reference to the customer's latest address, it would point to the new address instead of the one that applied when the invoice was issued.

Extended Reference Pattern

Problem:
• Too many repetitive joins
Solution:
• Identify the fields on the lookup side
• Bring those fields into the main object

Use case examples:
• Catalogs
• Mobile applications
• Real-time analytics
Benefits and trade-offs:
• Faster reads
• Reduced number of joins and lookups
• May introduce a lot of duplication if the extended reference contains fields that change often

SUBSET PATTERN


Introduction

˗ Modern applications aren't immune to exhausting resources.

Introduction
˗ Working Set fits in RAM
• MongoDB keeps frequently accessed data, referred to as the working set, in RAM.


Introduction
˗ Working set fits in RAM
• When the working set of data and indexes grows beyond the physical RAM allotted, performance is reduced.
• MongoDB tries to optimize RAM usage by pulling into memory, from disk, only the documents it needs.
• When no more memory is available, it evicts data it no longer needs to make room for the documents it must process.
• This works well as long as the working set fits in RAM.

Introduction
˗ When the working set is larger than RAM, the server frequently has to drop documents, constantly discarding documents that still need to be kept in memory.


Working set is too big


˗ There are 3 solutions:
A. Add more RAM to the server: scale your infrastructure vertically, although that only scales so far.
B. Scale with sharding: either start sharding, or add more shards if you already have a sharded cluster. But that comes with additional costs and complexity that the application may not be ready for.
C. Reduce the size of the working set: this is where we can leverage the Subset Pattern.

The Subset Pattern


˗ The Subset Pattern addresses the issues of a working set that exceeds RAM and causes information to be evicted from memory.
˗ The key is to break up huge documents of which we only need a fraction.


Reduce the Size of a Document


˗ Example: suppose we have a system that keeps many movies in memory, and each movie document takes a fair amount of memory.
• These documents contain some information that users need frequently, such as the top actors and top reviews.
• Other information, such as comments, quotes, and releases, is not needed by most users.

Reduce the Size of a Document


˗ Example (cont.):
• We could keep only 20 of the cast members (the main actors), and also 20 each of the comments, quotes, releases, and reviews.
• The rest of the information can go into a separate collection.
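The split described above can be sketched as a small function. It assumes a hypothetical movie document whose `cast` and `reviews` arrays are already ordered by importance. The function keeps the first N entries of each array in the main document. It moves the overflow into separate documents for a secondary collection. All names are illustrative.

```python
from typing import Any

Doc = dict[str, Any]

SUBSET_FIELDS = ("cast", "reviews")  # one-to-N arrays that get trimmed (assumption)


def split_movie(movie: Doc, keep: int = 20) -> tuple[Doc, list[Doc]]:
    """Return (main_doc, overflow_docs) for the Subset Pattern.

    main_doc keeps only the first `keep` items of each subset field;
    overflow_docs hold the remaining items, tagged with the movie _id
    so they can be fetched on demand from a secondary collection.
    """
    main = dict(movie)  # shallow copy; the original document is left unchanged
    overflow: list[Doc] = []
    for field in SUBSET_FIELDS:
        items = movie.get(field, [])
        main[field] = items[:keep]
        for item in items[keep:]:
            overflow.append({"movie_id": movie["_id"], "kind": field, "item": item})
    return main, overflow


movie = {
    "_id": 42,
    "title": "Example Movie",
    "cast": [f"actor {i}" for i in range(50)],
    "reviews": [{"stars": 5, "text": f"review {i}"} for i in range(30)],
}
main_doc, extra_docs = split_movie(movie)
```

`main_doc` is what stays in the frequently read collection. `extra_docs` would be inserted into the rarely read one and fetched only when a user asks for more.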


Reduce the Size of a Document


• Example (cont.): suppose you start with one document that holds all the information, including a field that has a one-to-one relationship.
• One document

Reduce the Size of a Document


˗ Moving some Fields with One-to-One Relationship
˗ Example (cont.): the full script could be moved to a new collection, and you could access this information through the $lookup operator.


Reduce the Size of a Document


˗ Moving some Fields with One-to-Many Relationship
˗ As for a field that has a one-to-N relationship, you can move
most of those objects to another collection and keep only a
subset of the N relationship in the main document.
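Keeping the embedded subset current as new items arrive is typically done with one array update that pushes, sorts and trims. The sketch below builds that update document, using the real `$push` modifiers `$each`, `$sort` and `$slice`. It targets a hypothetical `recent_reviews` field limited to 10 entries. A plain-Python function applies the same rule in memory and also stores the full review in a list standing in for the separate reviews collection.

```python
from typing import Any

Doc = dict[str, Any]

MAX_RECENT = 10  # size of the embedded subset (assumption)


def push_review_update(review: Doc) -> Doc:
    """Update document: add the review, sort newest first, keep only MAX_RECENT."""
    return {
        "$push": {
            "recent_reviews": {
                "$each": [review],
                "$sort": {"date": -1},
                "$slice": MAX_RECENT,
            }
        }
    }


def apply_push_review(product: Doc, reviews_coll: list[Doc], review: Doc) -> None:
    """In-memory equivalent: store the full review, then refresh the subset."""
    reviews_coll.append({**review, "product_id": product["_id"]})
    recent = product.get("recent_reviews", []) + [review]
    recent.sort(key=lambda r: r["date"], reverse=True)  # newest first
    product["recent_reviews"] = recent[:MAX_RECENT]
```

With PyMongo this could be sent as `db.products.update_one({"_id": product_id}, push_review_update(review))`, where `db` and `product_id` are hypothetical. It would be paired with an insert of the full review into the reviews collection.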

Reduce the Size of a Document


˗ The result on the working set:
• Each document has been split into the part that is frequently accessed and the part that is rarely accessed.
• Because the documents are smaller, the whole working set can fit in memory.


Subset Pattern

Problem:
• Working set is too big
• Lots of pages are evicted from memory
• A large part of each document is rarely needed
Solution:
• Split the collection into 2 collections:
  • Most used part of documents
  • Least used part of documents
• Duplicate the part of a 1-N or N-N relationship that is often used on the most used side

Use case examples:
• List of reviews for a product
• List of comments on an article
• List of actors in a movie

Summary
˗ Reduces working set size
˗ Splits information into:
• Frequently needed
• Rarely needed


Exercise
Apply the Subset Pattern to:
1. Design storage for Product information and customers' Reviews of each Product.

2. Design storage for article information and viewers' Comments on each article.

Work on and submit the exercise in groups; students will be called on individually to go over the solution.


COMPUTED PATTERN

Introduction
˗ The usefulness of data becomes much more apparent when
we can compute values from it.
˗ Example:
• What's the total sales revenue of the latest Amazon Alexa?
• How many viewers watched the latest blockbuster movie?
˗ These types of questions can be answered from data stored in
a database but must be computed.


Introduction
˗ Running these computations every time they're requested, however, becomes highly resource-intensive, especially on huge datasets: CPU cycles, disk access, and memory can all be involved.
˗ In big data systems, these kinds of repeated computations can lead to very poor performance.

The Computed Pattern


˗ The Computed Pattern solves the problem of costly computations or data manipulations performed repeatedly on the same data, producing the same result each time.
˗ The Computed Pattern can be used to limit the overuse of resources and reduce the latency of read operations.
˗ Problem:
• Computations are expensive
• Overuse of resources (CPU)
• Need to reduce latency for read operations


Kinds of Computations/Transformations
˗ Kinds of Computations
• Mathematical operations (e.g. sum, average, median)
• Fan-out operations (on read or on write [prepare on save])
• Roll-up operations (seeing data at a higher level by grouping it)

Mathematical Operations
˗ Mathematical Operations are the ones where we compute a
sum or an average, find a median, etc.
˗ These are often associated with calling a built-in function in
the server.
˗ Why apply the Computed pattern in this case?


Example of Mathematical Operations


˗ Example: suppose we have a write operation.
• This piece of data is added as a document to a certain collection.
• Another part of the application reads this collection and sums the numbers.
• If we do 1,000 times more reads than writes, the sum computed on each read is identical: we repeat the exact same calculation for every one of those read operations.

Example of Mathematical Operations


˗ Example (cont.): to avoid performing identical operations, we compute the result when a new piece of data is received.
˗ When a new piece of data arrives, we read the other elements needed for the sum and store the result in another collection, in a document designed to hold the sum for that element.
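A small simulation makes this read/write trade-off measurable. It is a sketch under simple assumptions: values are plain numbers in an in-memory list, and "cost" is counted as documents scanned. It contrasts summing on every read with maintaining the sum in a separate summary document at write time.

```python
from typing import Any


class SumOnRead:
    """Stores raw documents; every read recomputes the sum."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []
        self.docs_scanned = 0

    def write(self, value: float) -> None:
        self.docs.append({"value": value})

    def read_total(self) -> float:
        self.docs_scanned += len(self.docs)  # every read scans all documents
        return sum(d["value"] for d in self.docs)


class SumOnWrite:
    """Computed Pattern: the sum is updated once per write and stored."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []
        self.summary: dict[str, Any] = {"total": 0.0, "count": 0}  # precomputed document
        self.docs_scanned = 0

    def write(self, value: float) -> None:
        self.docs.append({"value": value})
        self.summary["total"] += value  # incremental update, no rescan
        self.summary["count"] += 1

    def read_total(self) -> float:
        self.docs_scanned += 1  # read only the single summary document
        return self.summary["total"]


def simulate(store: Any, writes: int, reads_per_write: int) -> int:
    """Interleave writes and reads; return how many documents were scanned."""
    for i in range(writes):
        store.write(float(i))
        for _ in range(reads_per_write):
            store.read_total()
    return store.docs_scanned
```

With 1,000 reads per write over ten writes, `SumOnRead` scans 55,000 documents while `SumOnWrite` reads a single summary document 10,000 times. In MongoDB, the write-time update would typically be an `$inc` on the summary document.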


Example of Mathematical Operations


˗ Example (cont.):
• This results in far fewer computations in the system and also reduces the amount of data being read.
• Without the pattern, each sum reads all of those documents.
• Since we no longer compute at read time, we also save on those read operations: that is 1,000 fewer computations and 1,000 fewer reads.

Example of Mathematical Operations


˗ Example 2: tracking ticket sales and reporting them, say for a specific movie, on a movie website.
• There are likely fewer screenings per hour than page views of the movie where we want to display the sums.
• So instead of summing the screening documents to display the total viewers and sales every time someone accesses the movie, we are better off keeping the totals in the movie document and updating them every time we receive a new screening document.
• We don't have to keep the screenings once the calculation is done.
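For the movie example, the running totals can be maintained with MongoDB's `$inc` update operator each time a screening document arrives. The sketch builds that update document and applies the same increments to an in-memory movie document. The field names `viewers`, `revenue`, `total_viewers` and `total_revenue` are assumptions for illustration.

```python
from typing import Any

Doc = dict[str, Any]


def screening_update(screening: Doc) -> Doc:
    """Update document that folds one screening into the movie's totals."""
    return {
        "$inc": {
            "total_viewers": screening["viewers"],
            "total_revenue": screening["revenue"],
        }
    }


def apply_inc(doc: Doc, update: Doc) -> None:
    """Apply a {"$inc": {...}} update in memory; like $inc, missing fields start at 0."""
    for field, amount in update["$inc"].items():
        doc[field] = doc.get(field, 0) + amount


movie = {"_id": 42, "title": "Example Movie"}
for s in [{"viewers": 120, "revenue": 960.0}, {"viewers": 80, "revenue": 640.0}]:
    apply_inc(movie, screening_update(s))
```

With PyMongo this would be `db.movies.update_one({"_id": movie_id}, screening_update(screening))`, where `db` and `movie_id` are hypothetical. The movie page then reads the totals directly from the movie document.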


Example of Mathematical Operations

Fan Out Operations


˗ Fan-out operations perform multiple tasks to carry out one logical task. There are two basic schemes.
• Fan out on reads: to return the appropriate data, the query must fetch data from different locations.

• Or fan out on writes: every logical write operation translates into several writes to different documents.
• In doing so, reads no longer have to fan out, as the data is pre-organized at write time.

Fan Out on Writes


˗ Why would you use fan out on writes?
• If the system has plenty of time when the information arrives, compared to the acceptable latency for returning data on a read, then preparing the data at write time makes a lot of sense.
• If you are doing more writes than reads, so that the system becomes bound by writes, this may not be a good pattern to apply.


Example of Fan Out on Writes


˗ An example of this pattern is a social networking system for sharing photos.
• When a new photo arrives, the system may copy it to each follower's home page.
• This way, when a user loads their own page, the system does not have to spend resources assembling the information from everyone this user follows.
• Everything needed for the user's home page is available in a single document, which leads to a much better experience on the site.
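Here is a minimal fan-out-on-write sketch for the photo example. It assumes a hypothetical in-memory follower graph and one home-feed document per user. Posting a photo writes a copy of a small photo summary into every follower's feed, so loading a home page becomes a single lookup.

```python
from collections import defaultdict
from typing import Any

Doc = dict[str, Any]

# Hypothetical follower graph: author -> users who follow that author.
followers: dict[str, list[str]] = {
    "ann": ["bob", "cid"],
    "bob": ["ann"],
}
# One "home feed" document per user, keyed by user name.
home_feeds: dict[str, Doc] = defaultdict(lambda: {"photos": []})


def post_photo(author: str, photo_id: str, max_feed: int = 100) -> int:
    """Fan out on write: copy the photo summary into each follower's feed.

    Returns the number of feed documents written.
    """
    summary = {"photo_id": photo_id, "author": author}
    targets = followers.get(author, [])
    for user in targets:
        feed = home_feeds[user]["photos"]
        feed.insert(0, dict(summary))  # separate copy per feed, newest first
        del feed[max_feed:]            # keep feed documents bounded
    return len(targets)


def load_home_page(user: str) -> list[Doc]:
    """Read without fan-out: the whole page comes from one document."""
    return home_feeds[user]["photos"]
```

The cost moves to write time: a user with many followers triggers many feed writes. That is why this scheme suits read-heavy systems rather than write-bound ones.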

Example of Fan Out on Writes


Roll Up Operations
˗ OLAP stands for Online Analytical Processing. An OLAP server is a software technology that allows users to analyze information from multiple database systems at the same time.
˗ It is based on a multidimensional data model and allows the user to query multidimensional data (e.g. Delhi -> 2018 -> Sales data). OLAP databases are divided into one or more cubes, and these cubes are known as hypercubes.

Roll Up Operations


Roll Up Operations
˗ In the cube, the drill-down operation is performed by moving down the concept hierarchy of the Time dimension (Quarter -> Month).

Roll Up Operations
˗ OLAP operations: there are five basic analytical operations that can be performed on an OLAP cube:
• Drill down: in the drill-down operation, less detailed data is converted into more detailed data. It can be done by:
  • Moving down in the concept hierarchy
  • Adding a new dimension


Roll Up Operations
˗ OLAP operations
• Roll up: the opposite of the drill-down operation. It performs aggregation on the OLAP cube. It can be done by:
  • Climbing up in the concept hierarchy
  • Reducing the dimensions

Roll Up Operations
˗ OLAP operations
• In the cube given in the overview section, the roll-up operation is performed by climbing up the concept hierarchy of the Location dimension (City -> Country).


Roll Up Operations

Roll Up Operations
˗ Roll-up operations merge data together.
• Examples:
  • Grouping categories together in a parent category
  • Grouping time-based data from small intervals into larger ones
  • Mathematical computations are also roll-ups
• This type of roll-up is often seen in reporting, for hourly, daily, monthly, or yearly summaries.
• Any operation that views data at a higher level is essentially rolling up data.
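This sketch illustrates rolling time-based data up from small intervals to larger ones. It merges hypothetical hourly readings into one daily summary document per day, holding the sum, count and average. In MongoDB the same roll-up would usually be an aggregation with a `$group` stage, shown as a dictionary for reference. The pure-Python version only demonstrates the logic.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any

Doc = dict[str, Any]


def roll_up_daily(hourly: list[Doc]) -> list[Doc]:
    """Group hourly documents by calendar day into one summary document per day."""
    values_by_day: dict[str, list[float]] = defaultdict(list)
    for doc in hourly:
        day = doc["ts"].date().isoformat()  # climb the time hierarchy: hour -> day
        values_by_day[day].append(doc["value"])
    return [
        {"day": day, "sum": sum(vals), "count": len(vals), "avg": sum(vals) / len(vals)}
        for day, vals in sorted(values_by_day.items())
    ]


# Server-side equivalent (for reference only, not executed here).
daily_group_stage: Doc = {
    "$group": {
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$ts"}},
        "sum": {"$sum": "$value"},
        "count": {"$sum": 1},
        "avg": {"$avg": "$value"},
    }
}

# Hypothetical readings: a full day at 2.0 and half a day at 3.0.
hourly = [{"ts": datetime(2022, 10, 9, h), "value": 2.0} for h in range(24)] + [
    {"ts": datetime(2022, 10, 10, h), "value": 3.0} for h in range(12)
]
daily = roll_up_daily(hourly)
```

The daily documents can themselves be rolled up again into monthly or yearly summaries. This repeats the same climb up the time hierarchy.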


Example of Roll Up
˗ Scenario
• An inventory holds different types of wine.
• The inventory changes once in a while, but not frequently.
• What happens more often is looking at the wines organized by various categories.

Example of Roll Up
Left, the wine collection:
{ Sku: 12345, Color: "red", Type: "port", Name: "Fonseca", Country: "Portugal", Price: 50.00, … }
{ Sku: 12346, Color: "rose", Type: "Bandol", Name: "Domaine Sorin", Country: "France", Price: 50.00, … }
{ Sku: 12387, Color: "white", Type: "Chardonnay", Name: "Robert Mondavi", Country: "USA", Price: 10.00, … }

Right, the computed roll-up documents:
{ attribute: "country",
  groups: [
    {name: "France", count: 5123},
    {name: "Portugal", count: 1022},
    {name: "USA", count: 4231},
    …
  ],
  …
}
{ attribute: "type",
  groups: [
    {name: "Bandol", count: 12},
    {name: "Chardonnay", count: 933},
    {name: "Port", count: 36},
    …
  ],
  …
}


Example of Roll Up
˗ For example, we may want to see the count of wines per country of origin or per type, so we can buy what is missing from our inventory to cover all our customers' needs.
˗ If we look at the aggregated data on the right more often than the collection on the left changes, it makes more sense to generate this data and cache it in the appropriate documents.
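The summary documents can be regenerated from the wine documents whenever the inventory changes. This minimal sketch builds one `{attribute, groups}` document per attribute, matching the shape in the example. It uses lower-case field names (`country`, `type`) and a few sample wines rather than the slide's full data. The fourth wine is a hypothetical extra entry added so that counts above 1 appear.

```python
from collections import Counter
from typing import Any

Doc = dict[str, Any]

wines: list[Doc] = [
    {"sku": 12345, "color": "red", "type": "Port", "country": "Portugal", "price": 50.00},
    {"sku": 12346, "color": "rose", "type": "Bandol", "country": "France", "price": 50.00},
    {"sku": 12387, "color": "white", "type": "Chardonnay", "country": "USA", "price": 10.00},
    # Hypothetical extra sample, not from the slide:
    {"sku": 12390, "color": "white", "type": "Chardonnay", "country": "France", "price": 20.00},
]


def roll_up(wines: list[Doc], attribute: str) -> Doc:
    """Count wines per value of `attribute`, in the {attribute, groups} shape."""
    counts = Counter(w[attribute] for w in wines)
    groups = [{"name": name, "count": n} for name, n in sorted(counts.items())]
    return {"attribute": attribute, "groups": groups}


# Cached summary documents, refreshed only when the inventory changes.
summaries = {attr: roll_up(wines, attr) for attr in ("country", "type")}
```

Reads of the summaries far outnumber inventory changes here. Recomputing on change, or adjusting counts with `$inc`, is therefore cheaper than grouping on every page view.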

When to apply the Computed Pattern?


˗ Overuse of resources (CPU): this may be a sign that you're doing much more than transferring data to and from the disk.
˗ Reduce latency for read operations: if you have long-running read operations that depend on complex aggregation queries, you might want to make them run faster.


Summary
Problem:
• Costly computation or manipulation of data
• Executed frequently on the same data, producing the same result
Solution:
• Perform the operation and store the result in the appropriate document and collection
• If you need to redo the operations, keep their source data

Use case examples:
• Internet of Things (IoT)
• Event sourcing
• Time-series data
• Frequent Aggregation Framework queries
Benefits and trade-offs:
• Faster read queries
• Savings on resources such as CPU and disk
• The need for it may be difficult to identify
• Avoid applying or overusing it unless needed

Summary
In summary, the Computed Pattern is useful when you want to avoid performing the same operations many times.
