TeM SWDND501 NoSQL Database Development
SWDND501
SOFTWARE
DEVELOPMENT
NoSQL
Database
Development
TRAINEE'S MANUAL
October, 2024
NOSQL DATABASE DEVELOPMENT
2024
AUTHOR’S NOTE PAGE (COPYRIGHT)
The competent development body of this manual is Rwanda TVET Board ©. Reproduce
with permission.
● This work was initially produced with the Rwanda TVET Board, with
support from KOICA through the TQUM Project.
● This work has copyright, but permission is given to all the Administrative and
Academic Staff of the RTB and TVET Schools to make copies by photocopying or
other duplicating processes for use at their own workplaces.
● This permission does not extend to making of copies for use outside the
immediate environment for which they are made, nor making copies for hire or
resale to third parties.
● The views expressed in this version of the work do not necessarily represent the
views of RTB. The competent body does not give any warranty nor accept any liability.
● RTB owns the copyright to the trainee and trainer’s manuals. Training providers
may reproduce these training manuals in part or in full for training purposes only.
Acknowledgment of RTB copyright must be included on any reproductions. Any
other use of the manuals must be referred to the RTB.
iii | N o S Q L D a t a b a s e D e v e l o p m e n t – T r a i n e e M a n u a l
ACKNOWLEDGEMENTS
The publisher would like to thank the following for their assistance in the elaboration of
this textbook:
Rwanda TVET Board (RTB) extends its appreciation to all parties who contributed to the
development of the trainer’s and trainee’s manuals for the TVET Certificate V in Software
Development, specifically for the module "SWDND501: NOSQL Database Development".
We extend our gratitude to KOICA Rwanda for its contribution to the development of
these training manuals and for its ongoing support of the TVET system in Rwanda.
We extend our gratitude to the TQUM Project for its financial and technical support in
the development of these training manuals.
We would also like to acknowledge the valuable contributions of all TVET trainers and
industry practitioners in the development of this training manual.
The management of Rwanda TVET Board extends its appreciation to both its staff and the
staff of the TQUM Project for their efforts in coordinating these activities.
This training manual was developed:
Production Team
Authoring and Review
MUGISHA Pacifique
MUKAMUHOZA Liberee
NSENGIYUMVA Emmanuel
Validation
NIYONSABA Godelive
HABANABAKIZE Jerome
TABLE OF CONTENT
Indicative content 4.1: Management of database users
Indicative content 4.2: Securing database
Indicative content 4.3: Deployment of database
Learning outcome 4 end assessment
References
ACRONYMS
INTRODUCTION
This trainee's manual includes all the knowledge and skills required in Software
Development, specifically for the module "NOSQL Database Development". Trainees
enrolled in this module will engage in practical activities designed to develop and
enhance their competencies.
The development of this training manual followed the Competency-Based Training and
Assessment (CBT/A) approach, offering ample practical opportunities that mirror real-life
situations.
The trainee's manual is organized into Learning Outcomes, each of which is broken down
into indicative content that includes both theoretical and practical activities. It provides
detailed information on the key competencies required for each learning outcome, along
with the objectives to be achieved.
As a trainee, you will start by addressing questions related to the activities, which are
designed to foster critical thinking and guide you towards practical applications in the
labour market. The manual also provides essential information, including learning hours,
required materials, and key tasks to complete throughout the learning process.
All activities included in this training manual are designed to facilitate both individual and
group work. After completing the activities, you will conduct a formative assessment,
referred to as the end learning outcome assessment. Ensure that you thoroughly review
the key readings and the 'Points to Remember' section.
By the end of the learning outcome, the trainees will be able to:
Resources
Duration: 3 hrs
Tasks:
4: Pay attention to the trainer’s clarification and ask questions where necessary.
5: Read the key readings 1.1.1
NoSQL stands for "Not only SQL". It refers to databases that use non-relational data
models, such as documents, graphs, and key-value pairs, to store and retrieve data.
NoSQL systems are designed to be more flexible than traditional relational databases
and can scale up or down easily to accommodate changes in usage or load. This makes
them well suited to applications whose data is large, fast-changing, or loosely structured.
Applications of NoSQL Databases
NoSQL databases are designed to handle large sets of unstructured or semi-structured data.
Typical applications include:
1. Real-time Analytics:
Processing large volumes of data in real time, such as IoT data, social media feeds, and
financial market data, and analysing data collected over time, like sensor readings, stock
prices, or website traffic.
2. Content Management Systems:
Storing and managing large amounts of unstructured content, such as text, images, videos,
and documents, and handling increasing volumes of content and user interactions.
3. Social Networking:
Representing complex relationships between users, posts, and groups, and handling rapidly
changing data, such as likes, comments, and shares.
4. Gaming:
Maintaining and updating leaderboards in real time, and storing and managing user
preferences and game progress.
5. Internet of Things (IoT):
Storing and analysing data from a large number of connected devices, and processing
sensor data in real time for immediate actions or insights.
6. Big Data Analytics:
Handling massive datasets that are difficult to manage with traditional relational
databases, and processing data of different formats and structures.
MongoDB
MongoDB is a popular NoSQL database that uses a document-oriented model. It stores
data in flexible JSON-like documents, making it well-suited for handling complex and
semi-structured data. MongoDB offers features like automatic sharding, replication, and
indexing for scalability and performance.
Availability
Availability refers to the ability of a system to be accessible and operational when
needed. In the context of databases, it means that users can access and use the data
without interruption. High availability systems often employ redundancy and failover
mechanisms to minimize downtime.
Documents
Documents are the basic unit of data storage in MongoDB. They are JSON-like structures
that can contain nested objects and arrays, allowing for flexible data modeling.
Documents can represent individual entities or groups of related data.
Collection
Collection is a group of related documents in MongoDB. It's analogous to a table in a
relational database. Collections can contain documents with different structures, as long
as they share a common theme or purpose.
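As a rough illustration of documents and collections, the sketch below uses plain Python dictionaries to stand in for documents and a list to stand in for a collection. It does not use MongoDB itself, and the field names (name, courses, address, email) are invented for the example.

```python
import json

# A "collection" sketched as a list of documents (plain dicts).
# Field names are hypothetical, chosen only for illustration.
students = [
    {
        "_id": 1,
        "name": "Alice",
        "courses": ["Networking", "Databases"],                # array field
        "address": {"city": "Kigali", "district": "Gasabo"},   # nested object
    },
    {
        "_id": 2,
        "name": "Bob",
        "email": "bob@example.com",  # a field the first document does not have
    },
]

# Documents in the same collection may have different sets of fields.
field_sets = [sorted(doc.keys()) for doc in students]

# Each document serializes naturally to JSON-like text.
first_as_json = json.dumps(students[0], indent=2)
print(first_as_json)
```

Both documents share a common purpose (describing a student) even though their fields differ, which is the flexibility the definitions above describe.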
Indexing
Indexing is a technique used to improve query performance in MongoDB. It creates data
structures that allow the database to quickly locate specific documents based on their
values. Indexes can be created on individual fields or combinations of fields.
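To show why an index speeds up lookups, here is a minimal sketch in plain Python. It builds a dictionary that maps each value of a field to the positions of the matching documents, so a query can jump straight to them instead of scanning everything. This is only an analogy: MongoDB's own indexes are B-tree structures managed by the server, and the function and field names here are hypothetical.

```python
from collections import defaultdict

documents = [
    {"_id": 1, "name": "Alice", "trade": "IT"},
    {"_id": 2, "name": "Bob", "trade": "Culinary Arts"},
    {"_id": 3, "name": "Chantal", "trade": "IT"},
]

def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index

def find_with_scan(docs, field, value):
    """Without an index: examine every document."""
    return [doc for doc in docs if doc.get(field) == value]

def find_with_index(docs, index, value):
    """With an index: go directly to the matching positions."""
    return [docs[pos] for pos in index.get(value, [])]

trade_index = build_index(documents, "trade")
it_students = find_with_index(documents, trade_index, "IT")
```

Both functions return the same documents; the difference is that the indexed lookup does not need to look at documents that cannot match.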
Optimistic Locking
Optimistic locking is a concurrency control technique that assumes conflicts are rare and
checks for them only when a transaction is about to commit. If a conflict is detected, the
transaction is aborted and the user is typically asked to retry the operation.
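One common way to implement optimistic locking is to keep a version number in each document and accept a write only if the version is still the one that was read. The sketch below simulates this with an in-memory dictionary; the `version` field, the `store` dictionary and the function names are assumptions made for the example, not a MongoDB API.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the version that was read."""

# An in-memory stand-in for a stored document; `version` is a hypothetical field.
store = {"item-1": {"stock": 10, "version": 1}}

def read(doc_id):
    """Return a copy so each caller works on its own snapshot."""
    return dict(store[doc_id])

def write(doc_id, updated, expected_version):
    """Commit only if nobody else changed the document since it was read."""
    current = store[doc_id]
    if current["version"] != expected_version:
        raise VersionConflict(
            f"expected v{expected_version}, found v{current['version']}"
        )
    updated["version"] = expected_version + 1
    store[doc_id] = updated

# Two users read the same version of the document.
user_a = read("item-1")
user_b = read("item-1")

# User A commits first and succeeds.
user_a["stock"] -= 1
write("item-1", user_a, expected_version=1)

# User B's commit is rejected because the version has moved on.
user_b["stock"] -= 5
try:
    write("item-1", user_b, expected_version=1)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # user B would re-read the document and retry
```

No lock is held while the users are editing; the conflict is only checked at commit time, which is why the approach works best when conflicts are rare.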
Relationships
Relationships in MongoDB are modeled either by embedding related data inside a document
or by using references. With references, documents contain the identifiers of other
documents, creating relationships between them. These relationships can be one-to-one,
one-to-many, or many-to-many.
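Two common ways to relate documents are embedding (placing the related data inside the parent document) and referencing (storing the identifier of another document). The sketch below shows both with plain Python data; the collections, fields and helper functions are hypothetical and only illustrate the idea.

```python
# Embedding: related data lives inside the parent document (one-to-many).
student_embedded = {
    "_id": 1,
    "name": "Alice",
    "assessments": [
        {"module": "Databases", "score": 78},
        {"module": "Networking", "score": 85},
    ],
}

# Referencing: documents hold the _id of related documents (many-to-many).
courses = {
    "c1": {"_id": "c1", "title": "Databases"},
    "c2": {"_id": "c2", "title": "Networking"},
}
students = [
    {"_id": 1, "name": "Alice", "course_ids": ["c1", "c2"]},
    {"_id": 2, "name": "Bob", "course_ids": ["c1"]},
]

def course_titles(student):
    """Resolve a student's references by looking up each referenced course."""
    return [courses[cid]["title"] for cid in student["course_ids"]]

def students_in(course_id):
    """Follow the relationship in the other direction."""
    return [s["name"] for s in students if course_id in s["course_ids"]]
```

Embedding keeps everything in one read, while referencing avoids duplicating course data that many students share.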
Data model
Data model defines how data is organized and structured in a database. In MongoDB, the
data model is based on documents and collections, providing flexibility and scalability.
Schema
User Testing:
- Conduct usability testing: Observe users interacting with prototypes or
early versions of the product.
- Gather feedback: Identify areas for improvement and iterate on the design.
- Iterate and refine: Incorporate user insights to enhance the overall
experience.
Additional Considerations:
- Involve stakeholders: Ensure that key stakeholders are involved in the
requirements gathering process.
- Consider constraints: Be mindful of technical limitations, budget
constraints, and timelines.
- Prioritize requirements: Rank requirements based on their importance
and feasibility.
- Document requirements: Create a clear and concise requirements
document.
- Remember: Effective user requirements gathering involves a combination
of research, empathy, and collaboration.
Description of characteristics, features, and datatypes of NoSQL Databases
Characteristics of collections.
Collections in NoSQL databases, particularly document-oriented ones like MongoDB,
have the following key characteristics:
- Dynamic Schema: Unlike relational databases, collections don't require a
fixed schema. Documents within a collection can have different structures,
allowing for flexibility and adaptability to changing data requirements.
- Unstructured or Semi-Structured Data: Collections can store unstructured
or semi-structured data, such as JSON, XML, or binary data. This makes
them suitable for handling complex data formats that don't fit well into
traditional relational tables.
- High Performance: Collections are often optimized for high-performance
read and write operations. This is especially true for document-oriented
databases that use indexing and sharding techniques to distribute data
across multiple servers.
- Scalability: Collections can scale horizontally by adding more servers to a
cluster. This allows for handling large datasets and increasing throughput
without requiring significant changes to the application.
- Flexibility: Collections provide flexibility in terms of data modeling and
query capabilities. You can query documents based on their structure and
content, allowing for complex and ad-hoc analysis.
Features of NoSQL Databases
NoSQL databases offer a range of features that make them well-suited for modern
applications:
- Scalability: The ability to handle large datasets and high traffic loads by distributing
data across multiple servers.
- Performance: Optimized for fast read and write operations, often using indexing
and caching techniques.
- Flexibility: The ability to accommodate changing data structures and
requirements without requiring significant schema changes.
- Fault Tolerance: Built-in mechanisms to ensure data consistency and availability
even in the event of hardware failures or network outages.
- Distributed Architecture: The ability to run across multiple servers, providing
redundancy and scalability.
- Schema-less or Flexible Schema and rich query language: No strict requirement
for a predefined schema, allowing for more dynamic data modelling.
- High Availability: The ability to maintain continuous access to data, even in the
event of failures or maintenance.
Types of NoSQL Databases
- Document-based databases
- Key-value stores
- Column-oriented databases
- Graph-based databases
Document-based databases:
A document-based database is a non-relational database. Instead of storing data
in rows and columns (tables), it stores data as documents, typically in JSON, BSON,
or XML format.
Documents can be stored and retrieved in a form that is much closer to the data objects
used in applications, which means less translation is required to use the data in those
applications. Particular elements of a document can be indexed for faster querying.
A collection is a group of documents with similar contents. Documents in the same
collection are not required to share the same schema, because document databases
have a flexible schema.
Features of Documents Database:
- Flexible schema: Documents in the database have a flexible schema. This means the
documents in the database need not all follow the same schema.
- Faster creation and maintenance: Creating documents is easy, and minimal
maintenance is required once a document has been created.
- Suitable for unstructured data
- Easy to scale Horizontally
- No foreign keys: There is no dynamic relationship between two documents so
documents can be independent of one another. So, there is no requirement for a
foreign key in a document database.
- Open formats: To build a document we use XML, JSON, and others.
Key-Value Stores:
Graph Databases:
Graph-based databases focus on the relationships between elements. They store
data as nodes, and the connections between the nodes are called links or relationships.
Storing data as a graph of interconnected nodes and relationships makes these databases
suitable for modeling complex relationships between data.
Examples include Neo4j, ArangoDB, and Amazon Neptune.
Features of graph database:
- It is easy to identify the relationship between the data by using the links.
- Queries return results in real time.
- The speed depends upon the number of relationships among the database elements.
- Updating data is also easy, as adding a new node or edge to a graph database is a
straightforward task that does not require significant schema changes.
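The node-and-link idea can be sketched with an adjacency list in plain Python. This is not how Neo4j or any other product stores data internally; the `follows` graph and the helper functions are hypothetical and only show how relationships are added and traversed.

```python
from collections import deque

# Nodes and their outgoing "follows" relationships, as an adjacency list.
follows = {
    "alice": ["bob", "chantal"],
    "bob": ["chantal"],
    "chantal": ["david"],
    "david": [],
}

def add_relationship(graph, source, target):
    """Adding a node or edge needs no schema change, only a new entry."""
    graph.setdefault(source, []).append(target)
    graph.setdefault(target, [])

def reachable(graph, start):
    """Breadth-first traversal: every node reachable by following links."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}

add_relationship(follows, "david", "emma")
```

The cost of `reachable` grows with the number of links that have to be followed, which matches the point above that speed depends on the number of relationships.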
Data Types
Common Data Types in NoSQL Databases:
Text/String:
- Used for storing plain text or strings.
- Example: Names, addresses, or any text-based data.
- Supported By: MongoDB, Cassandra, Couchbase, DynamoDB, etc.
Number:
- Handles integers and floating-point numbers; some databases also provide
high-precision decimal types (for example, Decimal128 in MongoDB).
- Example: Quantities, prices, sensor data.
- Supported By: Most NoSQL databases.
Boolean:
- Used for storing true or false values.
- Example: Flags, status indicators, or binary states.
- Supported By: Most NoSQL databases.
Array/Lists:
- Represents ordered collections of elements, which can be a mix of different
types.
- Example: A list of product IDs in an order.
- Supported By: MongoDB, Couchbase, Cassandra (as sets, lists).
Object/Document:
- Stores complex data as key-value pairs where values can be other types or
even nested documents/objects.
- Example: JSON documents representing users or products.
- Supported By: MongoDB, Couchbase, CouchDB, DynamoDB.
Binary/BLOB:
- Stores binary large objects such as files, images, or other media types.
- Example: Images, videos, PDF files.
- Supported By: MongoDB (Binary), Cassandra (BLOB), Couchbase.
Date/Time:
- Stores dates and timestamps.
- Example: Transaction times, event logs.
- Supported By: MongoDB, Cassandra, DynamoDB.
Geospatial Data:
- Handles geographic data like coordinates (latitude, longitude).
- Example: Location data for mapping applications.
- Supported By: MongoDB (Geospatial Indexing), Couchbase, Cassandra (via
custom types).
UUID (Universally Unique Identifier):
- A unique identifier for objects or records.
- Example: User IDs, session tokens.
- Supported By: MongoDB, Cassandra, Couchbase, DynamoDB
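The data types listed above can be seen together in one document. The sketch below builds a hypothetical order record in plain Python, with one field per type, and then converts it to JSON text. Plain JSON has no native date, binary, or UUID type, which is one reason MongoDB stores documents in BSON; the `to_json_safe` helper is an assumption made for this example only.

```python
import json
import uuid
from datetime import datetime, timezone

order = {
    "order_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),  # UUID
    "customer": "Alice",                                # text/string
    "quantity": 3,                                      # number (integer)
    "unit_price": 19.99,                                # number (floating point)
    "paid": True,                                       # boolean
    "product_ids": [101, 205, 309],                     # array/list
    "shipping": {"city": "Kigali", "express": False},   # nested object
    "receipt_pdf": b"%PDF-1.4",                         # binary data
    "created_at": datetime(2024, 10, 1, 9, 30, tzinfo=timezone.utc),  # date/time
    # Geospatial point in GeoJSON form: [longitude, latitude]
    "location": {"type": "Point", "coordinates": [30.06, -1.95]},
}

def to_json_safe(value):
    """Convert types that plain JSON cannot represent into strings."""
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.hex()
    raise TypeError(f"unsupported type: {type(value).__name__}")

as_json = json.dumps(order, default=to_json_safe)
type_names = {field: type(value).__name__ for field, value in order.items()}
```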
Task:
2: By referring to the previous activity 1.1.1, you are requested to define use cases based on
this case study
A TVET (Technical and Vocational Education and Training) school is expanding its student
base and curriculum. The institution offers a wide range of courses, from mechanical
engineering and information technology to culinary arts and electrical installation.
Managing student records, course materials, assessments, and operational data has
become a challenge due to growing enrolment and the diverse nature of the
programs. As a database analyst, you are tasked with defining the use cases for
the school's database, showing the relationships between its users.
Key readings 1.1.2: Defining use cases
Examples of use cases:
1. Social media application (key-value store)
- Use Case: Storing user session data, likes, comments, and follower relationships
in real time.
- Reason: Social media apps require fast retrieval and storage of user-generated
content and interactions. Key-value stores allow for rapid read/write
operations, making them ideal for caching session data, posts, and timelines.
- Example: Twitter uses Redis as a fast, in-memory key-value store to store user
timelines and deliver real-time updates.
2. IoT sensor data (column-family store)
- Use Case: Collecting and analyzing massive volumes of sensor data in real time.
- Reason: IoT devices generate vast amounts of time-series data, and column-family
stores like Cassandra can efficiently store this type of data, allowing fast
writes and optimized querying for analytics.
- Example: Companies that manage smart home devices use Cassandra to store
sensor data from devices such as thermostats, cameras, and motion detectors.
Points to Remember
Application of learning 1.1.
Indicative content 1.2: Analysing NoSQL database
Duration: 3 hrs
i. What is a requirement?
ii. What do you understand by requirement analysis?
iii. Identify the factors to consider when performing requirement analysis.
iv. Differentiate requirement analysis from data analysis.
2: Write your findings on a flipchart, blackboard or whiteboard
Practical Activity 1.2.1: Performing data analysis
Task:
Steps for performing data analysis
- Filling in any missing data
- Structuring the data so it can be manipulated more easily
- Removing any data that isn’t relevant to the analysis you are performing
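The data preparation actions listed above can be sketched on a small set of records. The sales records, the default quantity and the `clean` function below are all hypothetical; they only illustrate filling in missing values, giving the data a consistent structure, and dropping what the analysis does not need.

```python
# Hypothetical raw sales records; None marks a missing value.
raw_sales = [
    {"item": "chair", "qty": "4", "price": 25.0, "note": "walk-in"},
    {"item": "table", "qty": None, "price": 120.0, "note": ""},
    {"item": "shelf", "qty": "2", "price": None, "note": "phone order"},
]

def clean(records, default_qty=1):
    cleaned = []
    for record in records:
        # Remove records that cannot be used: no price means no revenue figure.
        if record["price"] is None:
            continue
        cleaned.append({
            "item": record["item"],
            # Fill in a missing quantity with an assumed default, and
            # structure the value as an integer rather than text.
            "qty": int(record["qty"]) if record["qty"] is not None else default_qty,
            "price": record["price"],
            # The free-text "note" field is dropped as irrelevant here.
        })
    return cleaned

sales = clean(raw_sales)
```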
Step 4: Analysing the Data
Now that you have data to work with that is relevant to the task you have at hand and
are confident that it’s accurate, it’s time to analyze the data.
There are a few different analysis techniques.
- Descriptive Analysis: Descriptive analytics involves looking at past events
and patterns. It is often the first step in data analysis before delving deeper
into a subject.
- Diagnostic Analysis: Diagnostic analytics is a type of analytics aiming to
understand a problem’s root cause.
- Predictive Analysis: Predictive analytics is a type of data analysis that uses
historical data to forecast future trends and growth.
- Prescriptive Analysis: Prescriptive analytics is a type of data analysis that
enables users to make recommendations for future actions.
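As a small illustration of two of these techniques, the sketch below applies descriptive analysis (summarising past values) and a deliberately naive form of predictive analysis (extending the average past change) to made-up monthly sales figures. The numbers and the forecasting rule are assumptions for the example, not a recommended forecasting method.

```python
from statistics import mean, median

# Hypothetical monthly sales totals, oldest first.
monthly_sales = [120, 135, 150, 165, 180]

# Descriptive analysis: summarise what has already happened.
summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# Predictive analysis (very naive): extend the average month-to-month change.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]
forecast_next = monthly_sales[-1] + mean(changes)
```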
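Step 5: Interpret the Results
Because the research questions were clarified at the start, we already know which
answers we are looking for in the data. Interpreting the results means relating the
findings of the analysis back to those questions.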
Implement data validation
Data validation is a crucial step in data analysis to ensure data quality and accuracy. It
involves checking data for errors, inconsistencies, and completeness.
Step 1. Range Checking:
- Verify data within limits: Ensure that numerical values fall within
predefined ranges (e.g., age cannot be negative).
Step 2. Format Checking:
- Validate data format: Check that data adheres to specific formats (e.g.,
email addresses, dates, phone numbers).
Step 3. Consistency Checking:
- Verify data relationships: Ensure that related data values are consistent
(e.g., a person's recorded age should agree with the age calculated from their birth date).
Step 4. Completeness Checking:
- Check for missing values: Verify that all required fields are filled in.
Step 5. Uniqueness Checking:
- Ensure unique values: Verify that certain fields have unique values (e.g.,
customer IDs or social security numbers).
Step 6. Cross-Validation:
- Compare data sources: Compare data from multiple sources to identify
inconsistencies or errors.
Step 7. Business Rule Validation:
- Enforce specific rules: Check that data adheres to predefined business
rules (e.g., product prices must be positive).
Step 8. Regular Expression Validation:
- Use patterns: Employ regular expressions to validate data based on
specific patterns (e.g., validating email addresses).
Step 9. Data Type Validation:
- Ensure correct types: Verify that data values are of the correct data
type (e.g., a field for age should be numeric).
Step 10. Data Quality Checks:
- Assess data quality: Use data quality metrics to evaluate the accuracy,
completeness, consistency, and timeliness of data.
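Several of the checks above can be combined in one validation function. The sketch below is written in plain Python and is independent of any database; the record layout, the age limits and the simple email pattern are assumptions chosen for illustration.

```python
import re

# A deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(record, existing_ids):
    """Return a list of problems found in one record (empty list means valid)."""
    errors = []

    # Completeness check: all required fields must be present and non-empty.
    for field in ("customer_id", "name", "email", "age"):
        if record.get(field) in (None, ""):
            errors.append(f"missing {field}")

    age = record.get("age")
    # Data type check: age must be an integer (bool is excluded explicitly).
    if age is not None and (not isinstance(age, int) or isinstance(age, bool)):
        errors.append("age must be an integer")
    # Range check: age must fall within plausible limits.
    elif isinstance(age, int) and not 0 <= age <= 120:
        errors.append("age out of range")

    # Format check using a regular expression.
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        errors.append("invalid email format")

    # Uniqueness check against identifiers that are already stored.
    if record.get("customer_id") in existing_ids:
        errors.append("duplicate customer_id")

    return errors

good = {"customer_id": "C3", "name": "Alice", "email": "alice@example.com", "age": 30}
bad = {"customer_id": "C1", "name": "", "email": "not-an-email", "age": -5}
good_errors = validate_customer(good, {"C1", "C2"})
bad_errors = validate_customer(bad, {"C1", "C2"})
```

Returning a list of problems rather than stopping at the first one lets the user correct all issues in a record at once.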
Points to Remember
Data validation checks include range checking, format checking, consistency checking,
completeness checking and uniqueness checking.
The considerations in requirement analysis are: identify key stakeholders and end-users,
capture requirements, categorize requirements, and interpret and record
requirements.
Data analysis factors are data cleaning and preparation, exploratory data analysis,
statistical analysis, predictive modelling, and data visualization.
Steps to perform data analysis are:
Define the problem
Collect data
Prepare data
Analyse data
Interpret results
Steps to implement data validation are:
Range checking
Format checking
Consistency checking
Completeness checking
Uniqueness checking
Cross-validation
Business rule validation
Regular expression validation
Data type validation
Data quality checks.
Application of learning 1.2.
MXZY Ltd is a company that generates revenue from selling furniture products. The
company uses a paper-based filing system (books) to record information about sales and
inventory. As a result, information about stock-in, stock-out and customers is poorly
secured, hard to access and difficult to manage. Therefore, the company wants
to switch to a digital system that can be accessed easily by authorized employees. As a
database analyst, you are requested to analyse the above case study and identify the
elements that will be stored in the company's database.
Indicative content 1.3: Preparing database environment
Duration: 4 hrs
4: Pay attention to the trainer’s clarification and ask questions where necessary.
- Query Optimization: MongoDB's query optimizer analyses queries and chooses the
most efficient execution plans to optimize performance.
Other Factors:
- Replication: MongoDB supports replication to provide data redundancy and fault
tolerance. This helps ensure that data is available even if a server fails.
- High Availability: MongoDB offers high availability features like replica sets and
sharded clusters to minimize downtime and ensure data consistency.
- Cloud Integration: MongoDB is available on various cloud platforms, allowing you to
leverage cloud infrastructure for scalability and management.
- Sharding Key Selection: Choose sharding keys carefully to ensure balanced data
distribution and optimal query performance.
- Indexing Strategy: Create appropriate indexes to support common query patterns and
improve query performance.
- Data Modelling: Design your data model to consider scalability requirements and avoid
performance bottlenecks.
- Monitoring and Tuning: Continuously monitor your MongoDB cluster and tune
performance settings as needed.
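The effect of shard key choice on data distribution can be simulated without a cluster. The sketch below hashes a key value to pick one of four imaginary shards and compares a key with many distinct values against a key with only two. The shard count, the sample orders and the hashing scheme are assumptions for illustration; MongoDB uses its own internal mechanisms for distributing data.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key_value, num_shards=NUM_SHARDS):
    """Hash the shard-key value and map it onto one of the shards."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribution(documents, shard_key):
    """Count how many documents land on each shard for a given shard key."""
    return Counter(shard_for(doc[shard_key]) for doc in documents)

# 1,000 hypothetical orders: unique order IDs, but only two possible statuses.
orders = [
    {"order_id": i, "status": "paid" if i % 2 else "pending"}
    for i in range(1000)
]

by_order_id = distribution(orders, "order_id")  # key with many distinct values
by_status = distribution(orders, "status")      # key with only two distinct values

print("by order_id:", dict(by_order_id))
print("by status:  ", dict(by_status))
```

Because `status` has only two possible values, its documents can occupy at most two shards however many shards exist, while `order_id` spreads the documents across all of them.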
Setting up MongoDB environment
Shell environment
To get started with MongoDB, you have to install it on your system. Find and download
the latest version of MongoDB that is compatible with your computer system. You can
use this link (http://www.mongodb.org/downloads) and follow the instructions to install
MongoDB on your PC. In this chapter, you will learn how to set up a complete environment to
start working with MongoDB.
Note that MongoDB will not run on Windows XP, so you need a later version
of Windows to use this database.
Install MongoDB on Windows
Visit the link (http://www.mongodb.org/downloads) and download the installer.
Once the download is complete, double-click the setup file to install it. Follow the steps:
1. Click Next.
2. Now, choose Complete to install MongoDB completely.
3. Then, select the radio button "Run service as Network Service user."
4. The setup wizard will also prompt you to install MongoDB Compass, which is MongoDB's
official graphical user interface (GUI). You can tick the checkbox to install that as well.
Once the installation is complete, start MongoDB from the command prompt in its
bin directory:
C:\Program Files\MongoDB\Server\4.0\bin>mongo.exe
Set Environment Variables:
Add the MongoDB bin directory to your system's PATH environment variable. This allows you to
run MongoDB commands from any directory.
On Windows, you can modify the system environment variables. On macOS and Linux, you can
edit your shell's configuration file (e.g., .bashrc, .zshrc).
Start the MongoDB Server:
- Open a terminal or command prompt and run the following command: mongod
Connect to the MongoDB Shell:
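- Open a new terminal or command prompt and run the following command: mongo
(in recent MongoDB releases the legacy mongo shell has been replaced by mongosh,
so run mongosh instead)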
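After editing the PATH variable, it can be useful to confirm that the MongoDB programs are actually discoverable. The sketch below uses Python's standard `shutil.which` function to look for the executables named in this section. The function name `find_mongodb_tools` is hypothetical, and the list of names is an assumption: which of these exist depends on the MongoDB version installed.

```python
import shutil

def find_mongodb_tools(names=("mongod", "mongosh", "mongo")):
    """Return a mapping of executable name to full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}

tools = find_mongodb_tools()
for name, path in tools.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If `mongod` is reported as not found, the bin directory has probably not been added to PATH correctly, or the terminal was opened before the change was made.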
Compass environment
MongoDB Compass is MongoDB's official graphical user interface (GUI). Instead of typing
commands in the shell, you connect Compass to a running MongoDB server and use it to
visualize and explore data, build queries, and manage indexes.
Atlas environment
MongoDB Atlas is a cloud-based solution that automates the deployment and scaling of
MongoDB databases. Instead of installing MongoDB on your own computer, you create a
cluster on a cloud platform and connect to it from the shell, from Compass, or from your
application.
Task:
1: Go to the computer lab and by referring to the previous theoretical activity 1.3.1, set
database environment
2: Present the steps to set database environment
3: Referring to the steps provided in task 2, set the database environment
4: Present your work to the trainer or classmates.
5: Ask questions for clarification where necessary
6: Read the key readings 1.3.2
Key readings 1.3.2: Setting database environment
Setting up a NoSQL database environment involves several key steps. The exact process may
vary depending on the specific NoSQL database you're using (e.g., MongoDB, Cassandra,
Redis).
Points to Remember
Application of learning 1.3.
Track Fini is a startup company which specialises in real-time financial tracking. The company
is migrating from SQL to NoSQL for its scalability and efficient handling of large
datasets. You are tasked with helping the company create a MongoDB database
environment.
Learning outcome 1 end assessment
Theoretical assessment
Q1. Read the statements carefully, then circle the letter corresponding with the correct
answer:
iii. Which term refers to the process of storing and retrieving data without fixed
schemas?
a) Relational Database
b) Schema Validation
c) NoSQL
d) Data normalization
vii. What does optimistic locking help prevent in MongoDB?
a) Document duplication
b) Schema validation errors
c) Simultaneous updates from overwriting data
d) Indexing failures
ix. In requirement analysis for a NoSQL database, what is the first step?
a) Perform Data Validation
b) Identify Key Stakeholders and End-Users
c) Interpret and Record Requirements
d) Capture Data Models
Q2. Read each statement carefully, then answer TRUE if the statement is correct or
FALSE if it is wrong.
i. NoSQL databases are always schema-less and cannot enforce any structure on the
data.
ii. MongoDB stores data in the form of tables similar to relational databases.
iii. In MongoDB, documents within the same collection can have different structures.
iv. Indexing in MongoDB can improve the performance of read operations by making
data retrieval faster.
v. Optimistic locking prevents other users from accessing a document while it is
being edited.
vi. In MongoDB, relationships between documents can be modeled by embedding
documents or referencing them.
vii. Collections in MongoDB always require a predefined structure for documents.
viii. MongoDB Atlas is a cloud-based solution that automates the deployment and
scaling of MongoDB databases.
ix. In the requirements analysis process for a NoSQL database, identifying key
stakeholders and end-users is the first step.
x. MongoDB's Compass environment is a command-line tool used for database
management.
Q3. Read carefully and then provide answers to the questions below
i. What is NoSQL?
ii. What is MongoDB and how does it work?
iii. What is meant by Availability in a NoSQL context?
iv. What is a Document in MongoDB?
v. What is a Collection in MongoDB?
vi. How does Indexing work in MongoDB?
vii. How are Relationships managed in NoSQL databases like MongoDB?
viii. What is a Data Model in NoSQL?
ix. . What is a Schema in NoSQL databases?
x. How are user requirements identified for a database?
xi. What are the characteristics of collections in MongoDB?
xii. What are key features of NoSQL databases?
xiii. What are the types of NoSQL databases?
xiv. What types of data can NoSQL databases store?
xv. What are the steps in the requirements analysis process for a NoSQL database?
xvi. How is data validation implemented in NoSQL databases like MongoDB?
xvii. How does MongoDB ensure scalability?
xviii. What are the different environments for working with MongoDB?
Practical assessment
Suppose that a newly opened cybercafé in your area needs a database developer to set up
its database infrastructure. You are asked to help the cybercafé set up MongoDB using
the mongosh shell and create a database called ecommerce db.
References :
Learning Outcome 2: Design NoSQL Database
Indicative contents
Duration: 20 hrs
By the end of the learning outcome, the trainees will be able to:
Resources
Indicative content 2.1: Selecting tools for drawing databases.
Duration: 3 hrs
Tasks:
4: Pay attention to the trainer’s clarification and ask for clarifications where necessary.
5: Read the key readings 2.1.1
- Indexing: Offers various indexing options to optimize query performance.
- Replication and Sharding: Provides high availability through replica sets and
horizontal scalability via sharding.
Tool:
- MongoDB Compass: A GUI tool for MongoDB that allows users to visualize and explore
data, build queries, and manage indexes.
Redis
Description: Redis is an in-memory key-value store known for its speed and efficiency. It is often
used for caching, session storage, and real-time analytics.
Key Features:
- In-Memory Storage: Provides extremely fast data access.
- Data Structures: Supports various data types including strings, hashes, lists, sets,
and sorted sets.
- Persistence Options: Offers mechanisms for data durability such as snapshots
and append-only files.
- Pub/Sub: Supports publish/subscribe messaging patterns for real-time
communication.
Tool:
- Redis Desktop Manager (RDM): A GUI tool for managing Redis databases,
offering features like data visualization and management.
Apache Cassandra
Description: Apache Cassandra is a highly scalable column-family store designed for handling
large amounts of data across many servers with no single point of failure.
Key Features:
- Horizontal Scalability: Designed to scale out by adding more nodes to the cluster.
- High Availability: Provides continuous availability with no downtime.
- Tunable Consistency: Allows the configuration of consistency levels based on use
case requirements.
- Distributed Architecture: Supports data replication across multiple nodes and
data centers.
Tool:
- DataStax Studio: An interactive development environment for Apache Cassandra
that allows users to visualize and interact with data.
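Tunable consistency can be illustrated with simple replica arithmetic: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The following is a minimal sketch in plain JavaScript; the function names are hypothetical and the code does not call Cassandra.

```javascript
// Hypothetical helpers illustrating quorum arithmetic; they do not call Cassandra.

// Number of replicas that form a quorum (a majority) for a replication factor.
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1;
}

// A read overlaps the latest acknowledged write when R + W > N.
function readsSeeLatestWrite(n, w, r) {
  return r + w > n;
}

const replicas = 3;
console.log(quorum(replicas));                                                  // majority of 3 replicas
console.log(readsSeeLatestWrite(replicas, quorum(replicas), quorum(replicas))); // quorum writes + quorum reads
console.log(readsSeeLatestWrite(replicas, 1, 1));                               // one replica each: faster, may read stale data
```

Choosing smaller values of R and W lowers latency at the cost of possibly stale reads, which is the trade-off the consistency level lets you tune per use case.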
Neo4j
Description: Neo4j is a graph database designed to represent and query data with complex
relationships. It is often used for applications involving networked data, such as social networks
or recommendation engines.
Key Features:
- Graph Data Model: Stores data as nodes and relationships, making it ideal for highly
interconnected data.
- Cypher Query Language: Provides a powerful query language specifically designed
for graph traversal and pattern matching.
- ACID Compliance: Ensures data integrity through transactional support.
Tool:
- Neo4j Browser: An interactive web-based tool for querying and visualizing graph
data in Neo4j.
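To make the graph data model concrete, here is a small sketch in plain JavaScript that stores nodes and typed relationships in memory and walks them to produce "people you may want to follow" suggestions. The sample data and function names are invented for illustration; a real Neo4j application would express the traversal as a Cypher query instead.

```javascript
// A tiny in-memory graph: nodes plus directed, typed relationships (sample data).
const nodes = new Map([
  ["alice", { label: "Person", name: "Alice" }],
  ["bob", { label: "Person", name: "Bob" }],
  ["carol", { label: "Person", name: "Carol" }],
  ["dave", { label: "Person", name: "Dave" }],
]);

const relationships = [
  { from: "alice", type: "FOLLOWS", to: "bob" },
  { from: "alice", type: "FOLLOWS", to: "dave" },
  { from: "bob", type: "FOLLOWS", to: "carol" },
];

// Ids of the nodes reached from `id` over relationships of the given type.
function neighbours(id, type) {
  return relationships
    .filter((rel) => rel.from === id && rel.type === type)
    .map((rel) => rel.to);
}

// Two-step traversal: people followed by the people `id` follows,
// excluding `id` itself and anyone `id` already follows.
function followSuggestions(id) {
  const direct = new Set(neighbours(id, "FOLLOWS"));
  const suggestions = new Set();
  for (const friend of direct) {
    for (const candidate of neighbours(friend, "FOLLOWS")) {
      if (candidate !== id && !direct.has(candidate)) suggestions.add(candidate);
    }
  }
  return [...suggestions].map((nodeId) => nodes.get(nodeId).name);
}

console.log(followSuggestions("alice")); // names reachable in exactly two FOLLOWS steps
```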
Couchbase
Description: Couchbase is a distributed document-oriented NoSQL database that combines key-
value store capabilities with document database features.
Key Features:
- Multi-Model: Supports both key-value and document models.
- In-Memory Performance: Utilizes in-memory caching for fast data access.
- Global Distribution: Offers cross datacenter replication and automatic failover
for high availability.
- N1QL Query Language: Provides SQL-like queries for JSON documents.
Tool:
- Couchbase Sync Gateway: A tool for synchronizing data between Couchbase
Server and Couchbase Lite (mobile), allowing for offline-first applications.
Amazon DynamoDB
Description: Amazon DynamoDB is a managed key-value and document database service
provided by AWS. It is designed for high availability and seamless scaling.
Key Features:
- Managed Service: AWS handles maintenance tasks like backups, updates, and
scaling.
- Automatic Scaling: Automatically adjusts throughput capacity based on
workload demands.
- High Performance: Provides low-latency access to data and high throughput.
- Integration with AWS Ecosystem: Easily integrates with other AWS services such
as Lambda, S3, and CloudWatch.
Tool:
- AWS DynamoDB Console: A web-based interface for managing DynamoDB
tables, performing queries, and monitoring performance.
Apache HBase
Description: Apache HBase is a column-family store built on top of the Hadoop Distributed File
System (HDFS). It is used for real-time read/write access to large datasets.
Key Features:
- Scalability: Can handle large amounts of data across many servers.
- Strong Consistency: Ensures that read operations reflect the most recent writes.
- Integration with Hadoop: Integrates well with Hadoop for big data processing.
Tool:
- HBase Shell: A command-line tool for managing and querying HBase tables.
RavenDB
Description: RavenDB is a document-oriented database with a focus on simplicity and developer
productivity. It provides features for easy data management and query capabilities.
Key Features:
- Embedded or Standalone: Can be used as an embedded database or as a
standalone server.
- Indexing and Querying: Supports full-text search and complex queries with
automatic indexing.
- Transaction Support: Provides ACID transactions for reliable data operations.
Tool:
- RavenDB Studio: A web-based management tool that provides a user-friendly
interface for database operations, querying, and monitoring.
Task:
Edraw Max is a versatile diagramming and drawing tool used for creating flowcharts, mind
maps, organizational charts, floor plans, and more.
- Visit the Official Website:
Go to the Edraw Max official website.
- Select the Download Option:
Navigate to the download section and choose the version suitable for your operating
system (Windows, macOS, or Linux).
- Download the Installer:
Click on the download link for the installer. This will download a setup file to your
computer.
Install Edraw Max on Windows
Run the Installer:
- Locate the downloaded .exe file (e.g., EdrawMax_Setup.exe) and
double-click it to start the installation process.
Start Installation:
- The installation wizard will open. Click “Next” to proceed.
Read and Accept the License Agreement:
- Review the End User License Agreement (EULA). If you agree, select “I Agree”
to continue.
Choose Installation Folder:
- Select the destination folder where you want to install Edraw Max or accept
the default location. Click “Next”.
Select Additional Tasks:
- Choose any additional tasks or shortcuts you want to create. Click “Next”.
Install:
- Click “Install” to begin the installation. The process will take a few minutes.
Finish Installation:
- Once the installation is complete, click “Finish” to exit the installer. You can
now launch Edraw Max from your desktop or Start menu.
Install Edraw Max on macOS
Run the Installer:
- Locate the downloaded .dmg file (e.g., EdrawMax.dmg) and double-click it to
open.
Drag and Drop to Applications Folder:
- A window will appear with the Edraw Max application icon and a shortcut to
the Applications folder. Drag the Edraw Max icon into the Applications folder.
Open Edraw Max:
- Go to the Applications folder and double-click on Edraw Max to launch it. The
first time you open it, you may need to confirm that you want to open an
application downloaded from the internet.
Install Edraw Max on Linux
If a native Linux package is not available for your distribution, you can run the Windows
version of Edraw Max using a compatibility layer or a virtual machine. A common method uses Wine:
Install Wine:
- First, install Wine on your Linux system, usually through your package manager.
For example, on Debian or Ubuntu:
sudo apt update
sudo apt install wine
Run the Installer with Wine:
- Download the Windows installer for Edraw Max, then use Wine to run it:
wine EdrawMax_Setup.exe
Follow the Windows Installation Instructions:
- Complete the installation wizard as described under "Install Edraw Max on Windows".
Post-Installation
- Activation: Launch Edraw Max. If you have a license key, enter it when prompted
to activate the full version.
- Updates: Check for any updates or patches from within the application or on the
official website.
Points to Remember
Application of learning 2.1.
Suppose that KKK Company needs to install software applications on its computers. As a
software developer, you are asked to help the company install Edraw Max (on Windows
OS).
Indicative content 2.2: Creating conceptual data model.
Duration: 10 hrs
4: Pay attention to the trainer’s clarification and ask for clarification where necessary.
5: Read the key readings 2.2.1
Collections in NoSQL databases (like MongoDB) are equivalent to tables in relational databases.
They hold documents or records that share some common characteristics, but unlike relational
tables, collections don’t enforce a fixed schema.
Example:
If you're designing a database for an e-commerce application, some key collections might include:
- users (for storing customer information)
- products (for listing items for sale)
- orders (for tracking customer purchases)
Each collection contains documents representing individual entities (like individual
users, products, or orders).
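Because a collection does not enforce a fixed schema, two documents in the same collection may carry different fields. The sketch below uses a plain JavaScript array as a stand-in for a users collection; the sample documents and the fieldNames helper are made up for illustration and nothing here talks to MongoDB.

```javascript
// A plain array stands in for the "users" collection (sample data only).
const users = [
  { _id: 1, name: "Alice", email: "alice@example.com" },
  { _id: 2, name: "Bob", nickname: "bobby", loyaltyPoints: 40 }, // different fields, same collection
];

// Collect every field name that appears in at least one document.
function fieldNames(collection) {
  const names = new Set();
  for (const doc of collection) {
    for (const key of Object.keys(doc)) names.add(key);
  }
  return [...names].sort();
}

console.log(fieldNames(users));
// Only some documents have an email, so queries must tolerate missing fields.
console.log(users.filter((doc) => "email" in doc).map((doc) => doc.name));
```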
Modeling Entity Relationships
NoSQL databases focus on performance, scalability, and flexibility, requiring a different
approach for modeling relationships based on the database type (document, key-value, column-
family, graph). There are common techniques for modeling entity relationships in NoSQL
databases:
Sharding and Replication
Sharding:
Definition: Sharding is a technique used to distribute data across multiple servers, or “shards,” to
horizontally scale a database. Each shard contains a subset of the data, which helps in handling
large datasets or high-traffic applications.
Example: An online marketplace may shard its products collection by category. One shard could
hold all electronic items, while another could hold clothing items.
Key Benefits:
- Improved performance
- Increased capacity
- Distributed workload.
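The routing idea behind sharding can be sketched in a few lines of plain JavaScript. The first function follows the marketplace example (one shard per category); the second hashes a key so that documents spread over a fixed number of shards. The shard names, the fallback shard, and the hash function are all assumptions chosen for illustration and are not how MongoDB computes hashed shard keys.

```javascript
// Hypothetical shard map following the marketplace example: one shard per category.
const shardByCategory = { electronics: "shard-A", clothing: "shard-B" };

// Route a product document to a shard based on its category.
function routeByCategory(product) {
  return shardByCategory[product.category] || "shard-default";
}

// Alternative: hash the shard key so documents spread over a fixed number of shards.
function routeByHash(key, shardCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash % shardCount;
}

console.log(routeByCategory({ name: "Laptop", category: "electronics" }));
console.log(routeByCategory({ name: "T-shirt", category: "clothing" }));
console.log(routeByHash("user42", 4)); // shard index between 0 and 3
```

Category-based routing keeps related items together but can overload a popular category; hashing spreads load more evenly but scatters related items across shards.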
Replication:
Definition: Replication involves creating multiple copies of the same data across different servers or
data centers. This ensures data availability and redundancy in case of server failure.
Example: MongoDB uses replica sets to replicate data. A replica set contains a primary server and
multiple secondary servers. The secondaries copy the primary's data and do not accept writes
from clients, but one of them is elected as the new primary if the current primary fails.
Key Benefits:
- Fault tolerance
- High availability
- Load balancing.
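The failover behaviour of a replica set can be modelled with a toy class in plain JavaScript. This is a simplified sketch under stated assumptions: the ReplicaSet class and its methods are hypothetical, replication is shown as immediate (real replication is asynchronous), and the next healthy member is simply promoted (a real replica set holds an election).

```javascript
// Toy model of a replica set: one primary accepts writes, secondaries copy them.
class ReplicaSet {
  constructor(memberNames) {
    this.members = memberNames.map((name, index) => ({
      name,
      role: index === 0 ? "primary" : "secondary",
      healthy: true,
      data: [],
    }));
  }

  // The current healthy primary, or undefined if there is none.
  primary() {
    return this.members.find((m) => m.role === "primary" && m.healthy);
  }

  // Writes need a primary; every healthy member then receives a copy.
  write(doc) {
    if (!this.primary()) throw new Error("no primary available");
    for (const member of this.members) {
      if (member.healthy) member.data.push(doc);
    }
  }

  // Simulate a primary failure and promote the first healthy member.
  failPrimary() {
    const old = this.primary();
    if (!old) return null;
    old.healthy = false;
    old.role = "secondary";
    const next = this.members.find((m) => m.healthy);
    if (next) next.role = "primary";
    return next ? next.name : null;
  }
}

const replicaSet = new ReplicaSet(["node1", "node2", "node3"]);
replicaSet.write({ _id: 1 });
console.log(replicaSet.failPrimary()); // name of the member promoted to primary
replicaSet.write({ _id: 2 });          // writes continue on the new primary
```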
Use Case Example: An E-commerce Platform with Users, Orders, Products, and Reviews.
Entities:
- Review: Represents user reviews for products.
What is a class?
In object-oriented programming (OOP), a class is a blueprint or template for creating objects.
Objects are instances of classes, and each class defines a set of attributes (data members) and
methods (functions or procedures) that the objects created from that class will possess. The
attributes represent the characteristics or properties of the object, while the methods define
the behaviors or actions that the object can perform.
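As a brief illustration, the following JavaScript sketch defines a hypothetical Product class with two attributes and one method, then creates an object from it. The class name, attribute names, and discount rule are invented for this example.

```javascript
// A class is the blueprint; each object created with `new` is an instance.
class Product {
  // Attributes: the data each Product object holds.
  constructor(name, price) {
    this.name = name;
    this.price = price;
  }

  // Method: behaviour available on every Product object.
  applyDiscount(percent) {
    return Math.round(this.price * (100 - percent)) / 100;
  }
}

const laptop = new Product("Laptop", 200);
console.log(laptop.name);              // attribute access
console.log(laptop.applyDiscount(10)); // method call: price after a 10% discount
```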
UML Class Notation
UML class notation is a graphical representation used to depict classes and their relationships in
object-oriented modeling.
Class Name: Written in the top compartment of the class box, typically centered and bold.
Attributes: Also known as properties or fields, attributes represent the data members of the
class. They are listed in the second compartment of the class box and often
include the visibility (e.g., public, private) and the data type of each attribute.
Methods: Also known as functions or operations, methods represent the behaviour or
functionality of the class. They are listed in the third compartment of the class
box and include the visibility (e.g., public, private), return type, and parameters of each
method.
Visibility Notation: Indicates the access level of attributes and methods (for example,
+ for public, - for private, and # for protected).
Relationships between classes: In class diagrams, relationships between classes
describe how classes are connected or interact with each other within a system.
DFDs are used in many organizations to document how a system runs. For example, in a banking
software system, a DFD describes how data moves from one entity to another.
Components of Data Flow Diagrams (DFD)
- Process: Transforms input data into output data. Depending on the notation used, a
process is drawn as a rectangle with rounded corners, an oval, a rectangle, or a
circle.
- Data Flow: Describes the information moving between different parts of the
system. It is drawn as an arrow, labeled with a meaningful name that identifies
the information being moved.
- Data Store: Holds data for later use. It is drawn as two horizontal
lines. A data store is not restricted to a data file; it can be anything that holds
data, such as a folder of documents, an optical disc, or a filing cabinet.
- Terminator (External Entity): An entity that stands outside the
system and communicates with it.
DFD Levels:
Data Flow Diagrams (DFDs) are organized hierarchically to keep each diagram readable, so
a system can be described by DFDs at multiple levels. The levels are as follows:
Level 0: Shows a high-level overview of the system.
It is also known as a context diagram. It is an abstract view that represents the entire
system as a single process (bubble) and shows its relationship to external entities, with
input and output data indicated by incoming and outgoing arrows.
Level 1: Breaks down each major process into sub-processes.
This level provides a more detailed view of the system. The single process of the level 0
DFD (the context diagram) is decomposed into multiple bubbles/processes that represent the
main functions of the system. Each sub-process is depicted as a separate process, and the
data flows and data stores associated with each sub-process are also shown.
Level 2: Breaks down the level 1 sub-processes further.
This level provides an even more detailed view of the system by breaking down the
sub-processes identified in the level 1 DFD into further sub-processes. Each sub-process is
depicted as a separate process on the level 2 DFD. The data flows and data stores
associated with each sub-process are also shown.
Rules for Data Flow Diagram (DFD)
Data can flow from:
- Terminator or External Entity to Process
- Process to Terminator or External Entity
- Process to Data Store
- Data Store to Process
- Process to Process
Data cannot flow from:
- Terminator or External Entity to Terminator or External Entity
- Terminator or External Entity to Data Store
- Data Store to Terminator or External Entity
- Data Store to Data Store
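The rules above can be encoded as a small lookup, which is handy for checking a diagram before drawing it. This is a minimal sketch in plain JavaScript; the kind names ("entity", "process", "store") and the isValidFlow function are labels chosen for this example.

```javascript
// Allowed (from -> to) combinations, taken from the "Data can flow from" list.
const ALLOWED_FLOWS = new Set([
  "entity->process",
  "process->entity",
  "process->store",
  "store->process",
  "process->process",
]);

// Returns true when a data flow between the two node kinds is permitted.
function isValidFlow(fromKind, toKind) {
  return ALLOWED_FLOWS.has(`${fromKind}->${toKind}`);
}

console.log(isValidFlow("entity", "process")); // allowed
console.log(isValidFlow("entity", "store"));   // not allowed: must pass through a process
console.log(isValidFlow("store", "store"));    // not allowed: must pass through a process
```

Every allowed flow has a process on at least one end, which is a quick way to remember the rules.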
A conceptual data model is a high-level representation of an organization's data
requirements, focusing on the entities and their relationships. It's a blueprint for the
database design process.
Practical Activity 2.2.2: Creating data flow diagram
Task:
3: Referring to the steps provided in task 2, create the data flow diagram.
Key readings 2.2.2: Creating Data Flow Diagram
Steps to Create a Data Flow Diagram
Now that you have some background knowledge on data flow diagrams and how they
are categorized, you’re ready to build your own DFD. The process can be broken down
into 5 steps:
Step 1. Identify major inputs and outputs in your system
Nearly every process or system begins with input from an external entity and ends with
the output of data to another entity or database. Identifying such inputs and outputs
gives a macro view of your system: it shows the broadest tasks the system should
achieve. The rest of your DFD will be built on these elements, so it is crucial to know
them early on.
Step 2. Build a context diagram
Once you’ve identified the major inputs and outputs, building a context diagram is
simple. Draw a single process node and connect it to related external entities. This node
represents the most general process that information follows to go from input to output.
For example, in a context diagram for an online community, information flows between
the various external entities via the community. Data flows to and from the external
entities, representing both input and output, and the center node, “online community,”
is the general process.
Step 3. Expand the context diagram into a level 1 DFD
The single process node of your context diagram doesn’t provide much information—you
need to break it down into sub-processes. In your level 1 data flow diagram, you should
include several process nodes, major databases, and all external entities. Walk through
the flow of information: where does the information start and what needs to happen to
it before each data store?
Step 4. Expand to a level 2+ DFD
To enhance the detail of your data flow diagram, follow the same process as in step 3.
The processes in your level 1 DFD can be broken down into more specific sub-processes.
Once again, ensure you add any necessary data stores and flows—at this point, you
should have a fairly detailed breakdown of your system. To progress beyond a level 2
data flow diagram, simply repeat this process. Stop once you’ve reached a satisfactory
level of detail.
Step 5. Confirm the accuracy of your final diagram
When your diagram is completely drawn, walk through it. Pay close attention to the flow
of information: does it make sense? Are all necessary data stores included? By looking at
your final diagram, other parties should be able to understand the way your system
functions. Before presenting your final diagram, check with co-workers to ensure your
diagram is comprehensible.
Points to Remember
An association represents a bi-directional relationship between two classes. It
indicates that instances of one class are connected to instances of another class.
Data Flow Diagrams (DFD) provide a graphical representation of the data flow of a
system that can be understood by both technical and non-technical users.
Components of a Data Flow Diagram are process, data flow, data store, and external
entity.
Levels of a Data Flow Diagram are: level 0, which shows a high-level overview of the
system; level 1, which breaks down each major process into sub-processes; and level 2,
which provides an even more detailed view of the system by breaking down the
sub-processes identified in the level 1 DFD into further sub-processes.
Steps for creating a data flow diagram are:
- Identify major inputs and outputs of the system
- Build a context diagram (level 0 DFD)
- Expand the context diagram into a level 1 DFD
- Expand to a level 2+ DFD
- Confirm the accuracy of your final diagram
ABC Company is migrating from SQL to NoSQL. As a database designer, you are
requested to help the company create a DFD for the MongoDB conceptual data model of its
e-commerce platform, focusing on users, products, orders, and reviews.
Indicative content 2.3: Designing MongoDB database schema
Duration: 7 hrs
workload Tasks:
1: Answer the following questions:
4: Pay attention to the trainer’s clarification and ask for clarification where necessary.
5: Read the key readings 2.3.1
Define Collection Structure
In MongoDB, collections are designed to store related documents. The structure depends on
whether data is stored as embedded documents or separate collections with references.
Embedded vs. Referenced Documents:
- Embedded Documents: Store nested data within the same document. This is
ideal for one-to-one or one-to-few relationships and for scenarios where the
data is always accessed together.
- Referenced Documents: Store data in separate collections with references to
related documents. This is useful for one-to-many or many-to-many
relationships, or when data is accessed independently.
Example:
Embedded: Storing product reviews within the product document for
faster access.
{
  "_id": 101,
  "name": "Laptop",
  "price": 999.99,
  "reviews": [
    { "userId": 1, "rating": 5, "comment": "Great laptop!" },
    { "userId": 2, "rating": 4, "comment": "Good value for money." }
  ]
}
Referenced: Storing orders in a separate collection, with references to
users and products.
{
  "_id": 5001,
  "userId": 1,
  "productIds": [101, 102],
  "total": 1999.98
}
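With referenced documents, the application (or a $lookup stage) has to follow the stored ids to fetch the related data. The sketch below shows that resolution step in plain JavaScript, using in-memory arrays in place of collections. Product 102 ("Desktop PC") and the resolveProducts function are hypothetical additions for this example.

```javascript
// In-memory stand-ins for the products collection and one order document.
const products = [
  { _id: 101, name: "Laptop", price: 999.99 },
  { _id: 102, name: "Desktop PC", price: 999.99 }, // hypothetical second product
];
const order = { _id: 5001, userId: 1, productIds: [101, 102], total: 1999.98 };

// Follow the references in order.productIds to the full product documents.
function resolveProducts(orderDoc, productCollection) {
  const byId = new Map(productCollection.map((p) => [p._id, p]));
  return orderDoc.productIds.map((id) => byId.get(id));
}

console.log(resolveProducts(order, products).map((p) => p.name));
```

With the embedded design, no such extra step is needed because the reviews travel with the product document; that is the trade-off between the two approaches.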
Map Schema Relationships
- One-to-Many: Typically use embedded documents for related data or references
for data that grows large or is queried separately.
- Many-to-Many: Use references to relate data between collections.
- Example: For the e-commerce application:
User to Orders (One-to-Many): One user can have multiple orders, stored as references.
Order to Products (Many-to-Many): An order can contain multiple products, and a
product can be part of multiple orders. This is best represented using references in
both Orders and Products collections.
{
  "_id": 1,
  "name": "Alice",
  "orders": [5001, 5002] // References to Orders collection
}
{
  "_id": 5001,
  "userId": 1, // Reference to Users collection
  "products": [
    { "productId": 101, "quantity": 2 },
    { "productId": 102, "quantity": 1 }
  ]
}
Validate and Normalize Schema
Even though MongoDB does not require a fixed schema by default, it is essential to maintain
some level of validation and normalization for consistency and data integrity.
Validation:
- Schema Validation: MongoDB allows you to define validation rules at the
collection level. You can enforce specific types and structures for your
documents.
- Example: Ensure that all documents in the users collection contain name, email,
and address fields.
{
  "validator": {
    "$jsonSchema": {
      "bsonType": "object",
      "required": [ "name", "email", "address" ],
      "properties": {
        "name": { "bsonType": "string" },
        "email": { "bsonType": "string", "pattern": "^.+@.+$" },
        "address": {
          "bsonType": "object",
          "properties": {
            "street": { "bsonType": "string" },
            "city": { "bsonType": "string" },
            "state": { "bsonType": "string" },
            "zip": { "bsonType": "string" }
          }
        }
      }
    }
  }
}
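In mongosh, a validator document like this one is typically supplied as the options argument of db.createCollection("users", ...). To show what the rules mean without a running server, the sketch below re-implements the top-level checks (required fields, string types, and the email pattern) in plain JavaScript. The validateUser function is a hypothetical illustration, not MongoDB's validation engine, and it does not check the nested address fields.

```javascript
// Same email pattern as in the schema: some text, an @, then more text.
const EMAIL_PATTERN = /^.+@.+$/;

// Returns a list of rule violations; an empty list means the document passes.
function validateUser(doc) {
  const errors = [];
  for (const field of ["name", "email", "address"]) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  if ("name" in doc && typeof doc.name !== "string") {
    errors.push("name must be a string");
  }
  if ("email" in doc && (typeof doc.email !== "string" || !EMAIL_PATTERN.test(doc.email))) {
    errors.push("email must be a string containing @");
  }
  if ("address" in doc &&
      (typeof doc.address !== "object" || doc.address === null || Array.isArray(doc.address))) {
    errors.push("address must be an object");
  }
  return errors;
}

console.log(validateUser({ name: "Alice", email: "alice@example.com", address: { city: "Kigali" } }));
console.log(validateUser({ name: "Bob", email: "bob" })); // missing address, malformed email
```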
Normalization:
- Normalization in NoSQL often means minimizing data redundancy where
necessary. However, you can denormalize if it improves performance (e.g.,
embedding data for faster reads).
- Example: Store a reference to a userID in orders to avoid duplicating user data in
every order document.
Apply Design Patterns
MongoDB offers various design patterns to solve common data modeling challenges.
Common MongoDB Design Patterns:
Bucket Pattern:
- Description: Useful when data grows quickly. Instead of creating one document
per event, you can store multiple related events in one document.
- Example: Instead of creating a new document for every individual user login, you
can group logins for a user into a single document by day.
Schema:
{
  "userId": "user123",
  "logins": [
    { "date": "2023-09-10", "loginCount": 5 },
    { "date": "2023-09-11", "loginCount": 3 }
  ]
}
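The grouping step behind the bucket pattern can be sketched in plain JavaScript: individual login events are folded into one document per user with one entry per day. The sample events and the bucketLogins function are invented for this illustration; in a live system the same effect is usually achieved by updating the bucket document as each event arrives.

```javascript
// Individual login events (sample data), one object per login.
const loginEvents = [
  { userId: "user123", at: "2023-09-10T08:00:00Z" },
  { userId: "user123", at: "2023-09-10T17:30:00Z" },
  { userId: "user123", at: "2023-09-11T09:15:00Z" },
];

// Fold events into one bucket document per user, with a login count per day.
function bucketLogins(events) {
  const buckets = new Map(); // userId -> Map(date -> count)
  for (const { userId, at } of events) {
    const date = at.slice(0, 10); // keep the YYYY-MM-DD part of the timestamp
    if (!buckets.has(userId)) buckets.set(userId, new Map());
    const perDay = buckets.get(userId);
    perDay.set(date, (perDay.get(date) || 0) + 1);
  }
  return [...buckets].map(([userId, perDay]) => ({
    userId,
    logins: [...perDay].map(([date, loginCount]) => ({ date, loginCount })),
  }));
}

console.log(JSON.stringify(bucketLogins(loginEvents), null, 2));
```

Three event documents become one bucket document, which is the saving the pattern aims for when events arrive quickly.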
Extended Reference Pattern:
- Description: This pattern is used when you need to minimize the number of joins
(or $lookup queries) by embedding essential information from the referenced
document.
- Example: Embedding a product's name and price within an order document
instead of performing a join to get the product details during every order
retrieval.
Schema:
{
  "_id": "order123",
  "userId": "user456",
  "items": [
    { "productId": "prod789", "productName": "Laptop", "price": 999.99,
      "quantity": 1 },
    { "productId": "prod654", "productName": "Mouse", "price": 19.99,
      "quantity": 2 }
  ],
  "totalAmount": 1039.97
}
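The sketch below shows, in plain JavaScript, how an order document of this shape could be assembled: only the product fields needed when reading an order (name and price) are copied next to the reference. The catalog contents beyond the two products shown, and the helper names toOrderItem and orderTotal, are assumptions for illustration. The total is computed in cents to avoid floating-point rounding surprises.

```javascript
// Full product documents (sample data); "stock" is a field orders do not need.
const catalog = [
  { _id: "prod789", name: "Laptop", price: 999.99, stock: 12 },
  { _id: "prod654", name: "Mouse", price: 19.99, stock: 300 },
];

// Copy only the fields an order needs, keeping productId as the reference.
function toOrderItem(product, quantity) {
  return { productId: product._id, productName: product.name, price: product.price, quantity };
}

// Sum the line totals in whole cents, then convert back to currency units.
function orderTotal(items) {
  const cents = items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.quantity,
    0
  );
  return cents / 100;
}

const items = [toOrderItem(catalog[0], 1), toOrderItem(catalog[1], 2)];
const orderDoc = { _id: "order123", userId: "user456", items, totalAmount: orderTotal(items) };
console.log(orderDoc.totalAmount);
```

The copied name and price are a snapshot taken when the order is created, so later catalog changes do not alter existing orders.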
Outlier Pattern:
- Description: For handling rare "outlier" documents that are much larger than
others, move those documents to a separate collection to optimize general
query performance.
- Example: If some products have extremely large descriptions or media files,
move those fields to a separate productDetails collection.
Practical Activity 2.3.2: Drawing of entity relationship Diagram (ERD)
Task:
1: Read the case study below and then draw an entity relationship diagram (ERD)
Suppose that you are given the following requirements for a simple database for the
National League (NL):
i. The NL has many teams,
ii. Each team has a name, a city, a coach, a captain, and a set of players,
iii. Each player belongs to only one team,
iv. Each player has a name, a position (such as left wing or goalie), a skill
level, and a set of injury records,
v. The team captain is also a player.
Construct a clean and concise ER diagram for the NL database.
2: Respect instructions from your trainer
6: Read the key readings 2.3.2 and ask for clarification where necessary.
Before starting to draw an ER diagram, it is essential to gather requirements and identify the
entities involved. This preparation phase sets the groundwork for creating a comprehensive
and effective ER diagram.
Consider the following steps:
- Start by understanding the goals and objectives of your project. Gather all necessary
information about the system or database you are working with.
- Identify the main entities involved in the system. These entities can represent real-
world objects, concepts, or people.
- Consider the relationships between the identified entities. Determine how they interact
and depend on each other.
Defining entities: Start by identifying the main entities in your system. Entities are
objects or concepts that have data to be stored.
Adding attributes: Attributes provide additional information about entities. Add relevant
attributes to the entities identified in the previous steps.
Refining the diagram: In this final step, focus on refining the ER diagram to enhance
clarity and readability. Organize the entities and relationships in a logical and intuitive
manner. Group related entities together and arrange them in a way that reflects their
connections.
Points to Remember
Normalization is the process of removing or minimizing data redundancy where
necessary in a database.
Apply design patterns: use MongoDB design patterns such as the bucket
pattern, extended reference pattern, and outlier pattern to optimize the
schema.
Steps for drawing an entity relationship diagram:
Defining entities
Establishing relationships
Adding attributes to the defined entities
Refining the diagram
FitTrack is a company seeking a database developer to design a fitness tracking
application. You are tasked with helping the company identify the application's workload,
define the collection structure, map schema relationships, and validate and normalize
the schema.
Learning outcome 2 end assessment
Theoretical assessment
Q1. Read the statement carefully, circle the letter corresponding to the correct answer
based on the statement given
i. Which of the following is a tool used for drawing NoSQL databases?
a) Microsoft Word
b) Edraw Max
c) Excel
d) Photoshop
ii. What is the first step in installing Edraw Max?
a) Run the software without downloading
b) Download the installer from the official website
c) Purchase a physical copy from a store
d) Set up a MySQL database
vi. Which of the following diagrams helps visualize how data flows through a system?
a) Entity Relationship Diagram (ERD)
b) UML Diagram
c) Data Flow Diagram (DFD)
d) Gantt Chart
vii. What should be considered when designing a conceptual data model?
a) The file system structure
b) The user interface design
c) The high-level entities and their relationships
d) Network configuration
8. How would you design a conceptual data model for an e-commerce website using
MongoDB?
9. How do you identify the application workload for MongoDB schema design?
10. What are common schema design patterns used in MongoDB?
Practical assessment
Imagine that ABC Network Company is located in your area. The company needs to
draw an entity relationship diagram (ERD) based on the following business rules:
a) A salesperson may manage many other salespeople.
b) A salesperson is managed by only one salesperson.
c) A salesperson can be an agent for many customers.
d) A customer is managed by one salesperson.
e) A customer can place many orders.
f) An order can be placed by one customer.
g) An order lists many inventory items.
h) An inventory item may be listed on many orders.
i) An inventory item is assembled from many parts.
j) A part may be assembled into many inventory items.
k) Many employees assemble an inventory item from many parts.
l) A supplier supplies many parts.
m) A part may be supplied by many suppliers.
Now, help the company draw the ERD described above.
References:
Learning Outcome 3: Implement Database Design
Indicative contents
Duration: 20 hrs
By the end of the learning outcome, the trainees will be able to:
Resources
Indicative content 3.1: Perform MongoDB data definition
Duration: 5 hrs
Tasks:
1: Answer the following questions:
i. What do you understand by MongoDB data?
ii. What does data manipulation language mean?
iii. List the features of MongoDB.
iv. Discuss the advantages and disadvantages of MongoDB.
v. Explain MongoDB.
4: Pay attention to the trainer’s clarification and ask questions where necessary.
MongoDB data definition refers to the structure and organization of data within a
MongoDB database, which is designed to be flexible and scalable. In MongoDB, data is
stored in collections, which are analogous to tables in relational databases. Each collection
contains documents, which are the fundamental data units and are stored in a binary
JSON-like format called BSON (Binary JSON).
- Indexes: Structures that improve the speed of data retrieval operations on a collection.
Indexes can be created on one or multiple fields within documents to optimize query
performance.
- Schema: MongoDB does not enforce a schema by default, but a defined schema helps to
maintain consistency and structure. This is often achieved through schema validation
rules, which can be applied to collections.
- Data Types: MongoDB supports various data types, including string, number, date, array,
object, and others, which allow for versatile data representation.
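The schema validation rules described in the list above can be sketched as follows. This is a minimal example, assuming a hypothetical products collection with name, price and tags fields; none of these names come from this manual. The `$jsonSchema` validator is passed to `db.createCollection()`, and the `typeof db` guard only lets the same snippet load outside mongosh.

```javascript
// Hypothetical validation rules for a "products" collection (all names are assumptions).
const productSchema = {
  bsonType: "object",
  required: ["name", "price"],
  properties: {
    name: { bsonType: "string", description: "must be a string and is required" },
    price: { bsonType: ["double", "int"], minimum: 0, description: "must be a non-negative number" },
    tags: { bsonType: "array", items: { bsonType: "string" } }
  }
};

// Inside mongosh, `db` is defined and the collection is created with the validator.
if (typeof db !== "undefined") {
  db.createCollection("products", { validator: { $jsonSchema: productSchema } });
}
```

With this validator in place, an insert such as db.products.insertOne({ name: "Pen" }) would be rejected because the required price field is missing.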
Perform MongoDB Data Definition
MongoDB data definition tasks include creating, dropping, and renaming databases and
collections.
Create a Database
To create a new database, you can use the use command, which switches to a specified
database. If the database doesn’t exist, it will be created when you first insert data into it.
use myDatabase
Create Collections
- Once you have a database, you can create collections within it.
db.createCollection("myCollection")
- You can also create a collection implicitly by inserting a document:
db.myCollection.insertOne({ name: "John Doe", age: 30 })
Drop a Database
To drop an entire database, you can use the following command. This will delete the
database and all its collections:
db.dropDatabase()
Make sure to switch to the database you want to drop using use before executing this
command.
Drop Collections
To drop a specific collection from a database, use:
db.myCollection.drop()
This will remove the specified collection and all the documents within it.
Rename a Database
MongoDB does not provide a direct command to rename a database. However, you can
achieve this by copying the collections from the old database to a new one and then
dropping the old database.
// Switch to the source database
use oldDatabase
// Copy each collection from oldDatabase to newDatabase (repeat for every collection)
db.oldCollection1.find().forEach(function (doc) {
  db.getSiblingDB("newDatabase").newCollection1.insertOne(doc);
});
// After verifying the copy, drop the old database
db.dropDatabase()
Task:
2: Go to the computer lab and, referring to the previous activity, install MongoDB.
- If you are running MongoDB locally, start the MongoDB server (mongod).
- If you are using a cloud service like MongoDB Atlas, log in to your cluster.
Step 2: Open the MongoDB Shell or MongoDB Compass
- For MongoDB Compass, simply open the application and connect to your
MongoDB instance.
- To create a database, type the following command in the MongoDB shell. This
switches to the new database, which will be created when you insert data:
use myDatabase
- If you want to delete a collection from your database, use the .drop() method:
db.myCollection.drop()
- To delete an entire database, switch to the database using the use command
and then use the dropDatabase() command:
i. use myDatabase
ii. db.dropDatabase()
To rename a collection, use the renameCollection() method. Make sure that you are
connected to the database containing the collection you want to rename.
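A minimal sketch of the rename, assuming a collection called myCollection that should become customers (both names are placeholders). `renameCollection()` is called on the source collection and takes the new name as its first argument.

```javascript
// Placeholder names; replace them with your own collection names.
const oldName = "myCollection";
const newName = "customers";

// Inside mongosh, `db` is defined; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  db.getCollection(oldName).renameCollection(newName);
  // The old name should no longer appear in this list.
  printjson(db.getCollectionNames());
}
```

The call fails if a collection with the new name already exists, unless the optional second argument (dropTarget) is set to true.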
Points to Remember
Application of learning 3.1.
MXY Ltd is a company that generates revenue from selling products. The company uses a
paper-based filing system (books) to store information about the sales and inventory of
its products. The company has problems with inefficient security and management of
information about its products and customers. As a database developer, you are requested
to create a MongoDB database with collections and documents to store both product and
customer information.
Indicative content 3.2: MongoDB data manipulation
Duration: 10 hrs
Tasks:
1: Answer the following questions:
i. What is data manipulation?
ii. Give the difference between delete and update operations, with their respective syntax.
iii. Give the difference between replacing and querying documents.
iv. What are bulk write operations?
v. What are aggregation operations?
4: Pay attention to the trainer’s clarification and ask questions where necessary.
Insert Documents
Example:
// Insert a single document into the Users collection
db.Users.insertOne({ name: "John Doe", email: "[email protected]", age: 30 });
db.Workouts.insertMany([
{ userId: 1, type: "Running", duration: 30 },
{ userId: 1, type: "Cycling", duration: 60 }
]);
Update Documents
Example:
// Update a single document: Change the duration of the workout
db.Workouts.updateOne(
{ type: "Running" },
{ $set: { duration: 45 } }
);
Delete Documents
You can delete documents from a collection using the deleteOne() or deleteMany()
methods.
Example:
// Delete a single document
db.Workouts.deleteOne({ type: "Cycling" });
Replacing Documents
You can replace an entire document using the replaceOne() method, which
substitutes an existing document with a new one.
Example:
// Replace the existing document with a new one
db.Users.replaceOne(
{ name: "John Doe" },
{ name: "John Smith", email: "[email protected]", age: 35 }
);
Querying Documents
Querying in MongoDB allows you to retrieve data from collections using methods
like find() and findOne().
Example:
// Find all users older than 30
db.Users.find({ age: { $gt: 30 } });
Indexes
Indexes improve the efficiency of queries by allowing faster lookups. You can
create indexes on specific fields using the createIndex() method.
Example:
// Create an index on the email field
db.Users.createIndex({ email: 1 });
Example:
// Perform multiple operations in bulk on the Users collection
db.Users.bulkWrite([
{ insertOne: { document: { name: "Alice", email: "[email protected]" } } },
{ updateOne: { filter: { name: "John Smith" }, update: { $set: { age: 36 } } } },
{ deleteOne: { filter: { name: "Alice" } } }
]);
This batch operation inserts a new document, updates an existing one, and
deletes another—all in one command.
2. Aggregation Operations
Example:
// Aggregate workout data to calculate the total duration for a specific user
db.Workouts.aggregate([
  { $match: { userId: 1 } }, // Filter by userId
  { $group: { _id: "$userId", totalDuration: { $sum: "$duration" } } } // Group by userId and sum durations
]);
This query filters workout records by userId and calculates the total workout duration.
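To see what the two pipeline stages compute, the same logic can be sketched in plain JavaScript on an in-memory array. This is only an illustration of the semantics, not how MongoDB executes the pipeline; the sample data reuses the two workouts from the insertMany() example and adds one hypothetical record for a second user.

```javascript
// Sample data: the two workouts inserted earlier, plus one hypothetical record.
const workouts = [
  { userId: 1, type: "Running", duration: 30 },
  { userId: 1, type: "Cycling", duration: 60 },
  { userId: 2, type: "Running", duration: 20 } // hypothetical extra record
];

// $match: keep only the documents for userId 1.
const matched = workouts.filter((w) => w.userId === 1);

// $group: one output document per userId, with $sum over duration.
const totals = new Map();
for (const w of matched) {
  totals.set(w.userId, (totals.get(w.userId) || 0) + w.duration);
}
const result = [...totals].map(([id, total]) => ({ _id: id, totalDuration: total }));

console.log(result); // [ { _id: 1, totalDuration: 90 } ]
```

Filtering before grouping means the grouping step only touches the documents that matter, which is why $match is placed first in the pipeline.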
Collection methods allow you to interact with and manipulate data within a collection.
- Insert a Document:
db.collection.insertOne({ name: "John", age: 30 })
- Update Documents:
db.collection.updateOne({ name: "John" }, { $set: { age: 31 } })
- Find Documents:
db.collection.find({ age: { $gte: 25 } })
- Delete a Document:
db.collection.deleteOne({ name: "Alice" })
- Create an Index:
db.collection.createIndex({ name: 1 }) // Ascending index
Cursor Methods
Cursor methods are used to interact with the results returned from a query.
- Limit Results:
db.collection.find().limit(5)
- Sort Results:
db.collection.find().sort({ age: 1 }) // Sort by age in ascending order
- Skip Results:
db.collection.find().skip(10).limit(5) // Skip the first 10 documents, then limit to 5
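The skip() and limit() methods are often combined to page through results. The sketch below converts a page number into the two values; the helper name pageWindow is hypothetical, and the sort on age is only an assumption to keep the page order stable.

```javascript
// Convert a 1-based page number into skip/limit values (helper name is hypothetical).
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) throw new RangeError("page must be >= 1");
  if (!Number.isInteger(pageSize) || pageSize < 1) throw new RangeError("pageSize must be >= 1");
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const win = pageWindow(3, 5); // third page, five documents per page
console.log(win); // { skip: 10, limit: 5 }

// Inside mongosh the values feed the cursor methods; sort first so pages are stable.
if (typeof db !== "undefined") {
  db.collection.find().sort({ age: 1 }).skip(win.skip).limit(win.limit);
}
```

For large collections, skipping many documents becomes slow because the server still walks past them, so this approach suits small page numbers best.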
Database Methods
- Database Stats:
db.stats()
Query plan cache methods allow you to manage and inspect cached query plans.
- Clear the Query Plan Cache:
db.collection.getPlanCache().clear()
Bulk Operation Methods
Bulk operation methods are optimized for executing multiple write operations in a single
command.
User management methods are used to create and manage users in MongoDB.
- Create a User:
db.createUser({
  user: "myUser",
  pwd: "myPassword",
  roles: [{ role: "readWrite", db: "myDatabase" }]
})
- Drop a User:
db.dropUser("myUser")
- List Users:
db.getUsers()
Role Management Methods
Role management methods allow you to manage user roles in the database.
- Grant Role to User:
db.grantRolesToUser("myUser", [{ role: "dbAdmin", db: "myDatabase" }])
- List Roles:
db.getRoles()
Replication Methods
- Shard a Collection:
sh.shardCollection("myDatabase.myCollection", { shardKeyField: 1 })
Object Constructors and Methods
- Authenticate a User:
db.auth("username", "password")
Atlas Search Index Methods
Practical Activity 3.2.2: Executing MongoDB data manipulation
Task:
Data manipulation in MongoDB refers to the various operations that allow you to create, read,
update, and delete (CRUD) documents stored in collections within a MongoDB database.
Step 2. Launch the MongoDB Compass program. If using Linux, run the following command in
the terminal:
mongodb-compass
Step 3. Connect to the MongoDB instance. Adjust the URI if required and click Connect.
Step 5. Enter the database and collection name in the appropriate fields.
Step 6. (Optional) Check the Time-Series box if the database contains time-series data.
Step 7. Review the names and options. Once ready, click Create Database.
The database and collection appear in the database listing on the left.
Operations
Step 11. Apply Mongosh Methods (Collection Methods, Cursor methods and Database
Methods)
Points to Remember
Insert documents: To insert data into a MongoDB collection, use the insertOne() or
insertMany() method. The older insert() and save() methods are deprecated in mongosh.
Update documents: Use updateOne() or updateMany() to modify documents in a
collection. The older update() method is deprecated in mongosh.
Replacing documents: You can replace a single document using the
collection.replaceOne() method. replaceOne() accepts a query document and a
replacement document.
Querying documents: To query data from a MongoDB collection, use the find() method.
In the legacy mongo shell, the pretty() method displays the results in a formatted way;
mongosh formats results by default.
Bulk write operations: MongoDB provides clients the ability to perform write
operations in bulk. A bulk write operation affects a single collection.
Aggregation operations process data records and return computed results.
Mongosh methods: The MongoDB Shell (mongosh) provides methods for
managing and interacting with various MongoDB entities, such as collection methods,
cursor methods, database methods, query plan cache methods, bulk operation
methods, user management methods, replication methods, sharding methods,
and free monitoring methods.
Perform data manipulation
Apply aggregation operations
Apply bulk Write Operations
Apply mongosh methods
XY Ltd is a company that generates revenue from delivering products. The company has
difficulty accessing and managing information about its products and
customers. You are tasked to help the company create a MongoDB database and collections,
execute data manipulation operations, and then apply mongosh methods.
Indicative content 3.3: Applying query optimizations
Duration: 5 hrs
4: Pay attention to the trainer’s clarification and ask questions where necessary
Query Optimizations
Indexing
- Indexes are special data structures that store a small portion of the data set in an
easy-to-traverse form, enabling faster queries. Create indexes on fields that are
frequently queried, sorted, or used in join ($lookup) operations.
- Types of Indexes: Single-field, compound, text, geospatial, and wildcard indexes.
Query Profiling
- Use the MongoDB profiler to analyze query performance and identify slow queries. It
can log operations that exceed a certain execution time.
Schema Design
- Design your schema based on the application's read and write patterns.
Denormalization (embedding documents) can improve read performance, while
normalization (referencing documents) can save space.
Projection
- Use projection to return only the necessary fields in your queries, reducing the
amount of data transferred and processed.
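A minimal sketch of a projection, assuming the Users collection from the earlier examples. The project() helper is a hypothetical plain-JavaScript picture of what an inclusion projection returns for one document; in mongosh the filter and projection objects are simply passed to find().

```javascript
// Filter and projection for the Users collection used in earlier examples.
const filter = { age: { $gt: 30 } };
const projection = { name: 1, email: 1, _id: 0 }; // include name and email, hide _id

// Simplified model of an inclusion projection applied to one document.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id; // _id is returned unless excluded
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical document, for illustration only.
const sample = { _id: 7, name: "John Smith", email: "john@example.com", age: 35 };
console.log(project(sample, projection)); // { name: 'John Smith', email: 'john@example.com' }

// Inside mongosh the same objects are passed straight to find().
if (typeof db !== "undefined") {
  db.Users.find(filter, projection);
}
```

Only the listed fields travel over the network, which matters most when documents contain large arrays or embedded objects that the application does not need.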
Query Optimization
- Use the query optimizer to find the most efficient way to execute a query. This
includes analyzing query shapes and filter criteria.
Caching
- The explain() method provides insights into how MongoDB executes a query,
including the stages and indexes used.
- Analyze the output for metrics like execution time, number of documents examined,
and index usage.
- Enable the profiler to log slow queries and examine the output.
db.setProfilingLevel(1, { slowms: 100 }) // Log queries that take longer than 100ms
- Use MongoDB’s built-in tools or third-party monitoring solutions (like
MongoDB Atlas) to track performance metrics such as query throughput, latency, and
resource utilization.
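The explain() metrics listed above can be condensed into a few numbers. The sketch below is one possible way to do it, assuming the usual layout of an explain("executionStats") result on a standalone server or replica set; the function name summarizeExplain and the sample object are hypothetical, and plans with several input stages or sharded output are not handled.

```javascript
// Summarise the parts of an explain("executionStats") result that matter most.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const winning = explain.queryPlanner.winningPlan;
  // Some server versions nest the readable plan under `queryPlan`.
  const stages = [];
  for (let s = winning.queryPlan || winning; s; s = s.inputStage) {
    stages.push(s.stage); // e.g. FETCH, then IXSCAN
  }
  return {
    stages,
    usesIndex: stages.includes("IXSCAN"),
    timeMs: stats.executionTimeMillis,
    // Documents read per document returned; values near 1 indicate a selective plan.
    examinedPerReturned: stats.nReturned === 0 ? null : stats.totalDocsExamined / stats.nReturned
  };
}

// Hypothetical, abbreviated explain output for a query that scanned the whole collection.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 20, totalDocsExamined: 10000, executionTimeMillis: 12 }
};
console.log(summarizeExplain(sampleExplain));

// Inside mongosh, pass the real output instead.
if (typeof db !== "undefined") {
  printjson(summarizeExplain(db.myCollection.find({ age: { $gt: 30 } }).explain("executionStats")));
}
```

A COLLSCAN stage combined with a high examined-to-returned ratio, as in the sample, is the typical sign that the query would benefit from an index.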
Once you've evaluated the performance of current operations, you can implement specific
optimizations:
Add Indexes
- Analyze the explain() output to determine if your queries could benefit from
additional indexes. Create indexes on frequently queried fields.
db.myCollection.createIndex({ age: 1 })
- Rewrite inefficient queries for better performance. Avoid using $where and regular
expressions when possible, as they can be slow.
// Inefficient
db.myCollection.find({ $where: "this.age > 30" })
// More efficient
db.myCollection.find({ age: { $gt: 30 } })
- Optimize aggregation pipelines by using stages that reduce data size early in the
process (e.g., $match before $group).
- Use limit() to restrict the number of documents returned, which can reduce network
and processing overhead.
- Continually monitor and analyze query performance, updating indexes and optimizing
queries as the application evolves and data grows.
Practical Activity 3.3.2: Creating index in MongoDB
Task:
2: You are requested to go to the computer lab and create indexes in a MongoDB database
for a company of your choice.
Indexes are the most critical tool for optimizing query performance.
- Create Indexes: Identify fields that are frequently queried or sorted and
create indexes on them.
- Compound Indexes: Use compound indexes for queries that filter or sort by
multiple fields.
- Text Indexes: Use text indexes for efficient searching of string content.
- Wildcard Indexes: Consider using wildcard indexes if your document
structure varies widely.
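The index types listed above differ only in the key pattern passed to createIndex(). A minimal sketch follows; the collection names (orders, products, events) and field names are hypothetical.

```javascript
// Key patterns for the index types listed above (collection and field names are hypothetical).
const indexSpecs = {
  singleField: { email: 1 },                  // ascending index on one field
  compound: { customerId: 1, orderDate: -1 }, // filter by customer, newest orders first
  text: { description: "text" },              // word search over string content
  wildcard: { "$**": 1 }                      // all fields, for documents with varying structure
};

// Inside mongosh, `db` is defined and the indexes are created.
if (typeof db !== "undefined") {
  db.orders.createIndex(indexSpecs.compound);
  db.products.createIndex(indexSpecs.text);
  db.events.createIndex(indexSpecs.wildcard);
  printjson(db.orders.getIndexes());
}
```

Field order matters in a compound index: the one sketched here can serve queries on customerId alone or on customerId together with orderDate, but not queries on orderDate alone.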
Use MongoDB tools to analyze and understand how your queries are performing.
- Explain Plan: Use the explain() method to get details about how a query is
executed.
- Profiler: Enable the database profiler to track query performance and
identify slow queries.
Design your schema to reduce the need for complex queries and joins.
- Compact Collections: Use the compact command to reclaim disk space and
improve performance.
- Remove Unused Indexes: Regularly review and remove any unused indexes,
as they can slow down write operations.
Points to Remember
HealthTrack's data engineering team needs to optimize MongoDB queries to improve system
performance and reduce response times. This involves identifying bottlenecks, choosing
which indexes to use, and designing aggregation pipelines that reduce the volume of data
processed. You are tasked to optimize the performance of HealthTrack's database.
Learning outcome 3 end assessment
Theoretical assessment
A) Read the statement carefully, then circle the letter corresponding to the correct
answer on the given statement.
7. What method would you use to create an index on a field called "email" in a
collection named "users"?
a) db.users.createIndex({"email": 1})
b) db.users.addIndex({"email": 1})
c) db.createIndex({"users.email": 1})
d) db.users.index({"email": 1})
B) Read the following statements carefully, then answer True for each
correct statement or False for each wrong statement.
5. In MongoDB, the update() method can only add new fields to an existing
document and cannot modify existing ones.
7. Bulk write operations in MongoDB can only insert documents, not update
or delete them.
9. The cursor methods in MongoDB allow for iteration over the results of a
query.
C) Match Column A with Column B according to the MongoDB terms and their
definitions, then write the correct answer in the column named ANSWERS.
Practical assessment
References:
Learning Outcome 4: Manage MongoDB Database
Indicative contents
Duration: 10 hrs
By the end of the learning outcome, the trainees will be able to:
Resources
Indicative content 4.1: Management of database users
Duration: 3 hrs
Tasks:
4: Pay attention to the trainer’s clarification and ask questions where necessary.
Identify the Role of Database Users
- Database users interact with the MongoDB database to perform operations such as
reading, writing, updating, and deleting data.
- Each user has specific roles that define their access and capabilities within the database.
Types of Roles
Built-in Roles: MongoDB provides several predefined roles with specific permissions:
Custom Roles: Users can also be assigned custom roles tailored to specific application
needs, which can include a combination of privileges.
Creating Users
mongosh
- Use the admin database or the relevant database where you want to create the user:
use admin
Create a New User
- Use the createUser() method to add a new user with specified roles and privileges.
db.createUser({
  user: "appUser",
  pwd: "securePassword123", // Choose a strong password
  roles: [
    { role: "readWrite", db: "myDatabase" }, // Grants read and write access to myDatabase
    { role: "dbAdmin", db: "myDatabase" } // Grants administrative privileges on myDatabase
  ]
})
Parameters:
db.getUsers()
db.updateUser("appUser", {
roles: [{ role: "read", db: "myDatabase" }] // Update the roles assigned to appUser
})
Remove a User
db.createRole({
  role: "customRole",
  privileges: [
    { resource: { db: "myDatabase", collection: "" }, actions: ["find", "insert"] } // Custom privileges
  ],
  roles: [] // No inherited roles
})
- To revoke specific roles from a user, use the revokeRolesFromUser() method:
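A minimal sketch of the revoke call, assuming the appUser account from the createUser() example above still holds the dbAdmin role on myDatabase and was created in the admin database. `revokeRolesFromUser()` takes the user name and an array of role documents.

```javascript
// Roles to take away from the appUser account created above.
const rolesToRevoke = [{ role: "dbAdmin", db: "myDatabase" }];

// Inside mongosh, `db` is defined; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  // Run against the database where the user was created (assumed to be admin here).
  const adminDb = db.getSiblingDB("admin");
  adminDb.revokeRolesFromUser("appUser", rolesToRevoke);
  // Confirm which roles remain.
  printjson(adminDb.getUser("appUser").roles);
}
```

Revoking a role leaves the user's other roles untouched, which makes it safer than replacing the whole role list with updateUser().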
Task:
2: Go to the computer lab and, referring to theoretical activity 4.1.1, create users
and assign them privileges according to their responsibilities in the MongoDB database.
MongoDB is a NoSQL database that is popular for its convenience and features. NoSQL
means that data is not stored in the tabular format used by relational (SQL) databases.
MongoDB is fast and easy to scale. One routine administration task is creating and adding
new users to the system. This activity uses NoSQL Manager, a GUI tool for MongoDB.
With it, you can create a user who has access to a specific database, and you can specify
the access level of that user in the database.
MongoDB provides a considerable number of roles. When creating a user from the console,
you can assign one or more roles, thereby regulating access to your database.
When validating credentials, MongoDB will validate the account against the specified
database and the admin one. It's easy to do this with the NoSQL Manager:
1. Open NoSQL Manager and click New MongoDB Connection button in the toolbar.
2. Next, specify your MongoDB host and port. Leave fields as-is if you are
connecting to a local instance.
3. Test your connection with Test Connection button and click OK to save the connection.
4. Double-click your connection in DB Explorer, double-click the admin database, then
choose Main Menu|Database|Create New User... or right-click Users in DB Explorer and
choose Create New User in the context menu.
5. Specify the user name and password. This example uses the name tiger.
8. You have created the tiger user with root privileges. The root role is a combination of
readWriteAnyDatabase, dbAdminAnyDatabase, userAdminAnyDatabase, clusterAdmin,
restore and backup roles.
9. Disconnect the server before the next step.
storage:
  dbPath: /data/db
net:
  port: 27017
  bindIp: 127.0.0.1
security:
  authorization: enabled
setParameter:
  authenticationMechanisms: "SCRAM-SHA-256"
Open NoSQL Manager, select your connection in DB Explorer and click Edit MongoDB
Connection button in the toolbar.
Edit the Authentication, User and Password fields as described below and click OK to
save the changes.
Connect to your instance in NoSQL Manager. Now you can add, edit and remove users
and roles.
In this example we will create a limited user user42 that has read-only access to the test
database only.
3. Specify the user name and password, select the test database on the Database Roles
tab and click Edit Database Permission button.
5. Click Apply to save user to the database.
6. So, we have just created a limited user user42 in the test database.
7. Create a new MongoDB connection to test this user.
8. Next, connect to the MongoDB instance with the user42 user and try to
execute any command that requires an extra privilege. Try to create a
collection, for example.
Points to Remember
Fin Serve, a financial services provider, wants to improve database user management in its
MongoDB implementation to ensure data security and operational efficiency. You are
tasked to create users, define unique credentials for each user,
and then assign role-based access controls to each user.
Indicative content 4.2: Securing database
Duration: 3 hrs
Tasks:
1: Answer the following questions:
Define the following terms:
i. User authentication
ii. RBAC
iii. Auditing
iv. Describe types of data encryption
2: Write your findings on paper or flipcharts.
3: Present your findings to the trainer or classmates.
4: Pay attention to the trainer’s clarification and ask questions where necessary.
5: Read the key readings 4.2.1
Access control ensures that only authorized users can access your database. MongoDB
provides various authentication mechanisms, including SCRAM, x.509, and LDAP.
- Find the security section and add or uncomment the following line to enable
authentication:
security:
  authorization: "enabled"
Restart MongoDB to apply the changes:
use admin;
db.createUser({
  user: "admin",
  pwd: "secureAdminPassword",
  roles: [{ role: "userAdminAnyDatabase", db: "admin" }]
});
Role-based access control allows you to define specific permissions for users based on their
roles.
use FitTrack;
db.createUser({
  user: "appUser",
  pwd: "securepassword",
  roles: [
    { role: "readWrite", db: "FitTrack" },
    { role: "dbAdmin", db: "FitTrack" }
  ]
});
// updateUser() takes the new field values directly; it does not use $set
db.updateUser("appUser", {
  roles: [{ role: "read", db: "FitTrack" }]
});
Data Encryption and Protect Data
Data encryption helps protect sensitive information both at rest and in transit.
net:
  tls:
    mode: requireTLS
    certificateKeyFile: /path/to/your/certificate.pem
security:
  enableEncryption: true   # encryption at rest (MongoDB Enterprise)
  encryptionKeyFile: /path/to/your/keyfile
Auditing helps track access and modifications to the database, providing visibility into user
actions.
auditLog:
  destination: file
  path: "/var/log/mongodb/auditLog.json"
  format: JSON
cat /var/log/mongodb/auditLog.json | jq .
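Beyond pretty-printing with jq, the audit log can be summarised with a short script. The sketch below assumes the JSON audit format, with one event per line and the atype and result fields; the sample lines are hypothetical and shortened, and the function name summarizeAudit is illustrative.

```javascript
// Count audit events per action type and collect the events that failed.
function summarizeAudit(lines) {
  const byType = {};
  const failures = [];
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const event = JSON.parse(line);
    byType[event.atype] = (byType[event.atype] || 0) + 1;
    if (event.result !== 0) failures.push(event); // result 0 means success
  }
  return { byType, failures };
}

// Hypothetical, shortened audit entries; real entries carry more fields.
const sampleLines = [
  '{"atype":"authenticate","users":[{"user":"appUser","db":"FitTrack"}],"result":0}',
  '{"atype":"authenticate","users":[{"user":"guest","db":"FitTrack"}],"result":18}',
  '{"atype":"createCollection","users":[{"user":"appUser","db":"FitTrack"}],"result":0}'
];

const summary = summarizeAudit(sampleLines);
console.log(summary.byType);          // { authenticate: 2, createCollection: 1 }
console.log(summary.failures.length); // 1
```

In practice the lines would come from reading the audit log file and splitting its content on newlines. Repeated failed authenticate events for the same user are usually the first thing worth investigating.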
Perform Backup and Disaster Recovery
Implementing a robust backup and disaster recovery plan ensures that you can recover
data in case of failure.
Automate Backups:
- Use cron jobs or scheduling tools to automate the backup process. For
example, to back up daily at midnight:
Task:
2: DataSafe Corp has implemented a robust backup strategy that uses MongoDB's native tools,
geographical distribution, and oplog-based replication to safeguard unstructured data and
ensure business continuity. Go to the computer lab and perform a backup following this strategy.
3: Present the steps to perform backup in NoSQL MongoDB
Key readings 4.2.3: Performing NoSQL backup
Perform backup and recovery
Performing backup and disaster recovery in NoSQL involves creating periodic copies of
your NoSQL database data, typically through snapshotting or export/import methods,
and storing them in a separate location. These copies allow you to restore the database
to a previous state in case of hardware failure, data corruption, or other disruptions,
which supports business continuity by minimizing downtime and data loss. Key aspects
include choosing the appropriate backup strategy based on your data volume, access
requirements, and recovery time objectives (RTO), and setting up automated backup
schedules with proper retention policies to maintain multiple versions of your data.
Key points about NoSQL backup and disaster recovery:
Backup methods:
- Snapshot backups: Creating a point-in-time copy of the entire database, often
considered the most reliable method for consistency.
- Export/Import backups: Exporting data to a file format and then importing it to a
new database if needed.
- File system backups: Backing up the database files directly from the file system,
but may not capture data consistency.
Considerations for NoSQL backups:
- Data distribution: NoSQL databases often distribute data across multiple nodes,
requiring a strategy to capture all data consistently.
- Replication: Leverage built-in replication features to create redundant data copies
for improved availability.
- Backup frequency and retention: Determine how often to back up data and how
long to retain backups based on your recovery point objective (RPO).
Disaster recovery strategies:
- Failover clusters: Setting up a secondary cluster in a different location to quickly
switch to in case of a primary site failure.
- Warm standby: Maintaining a partially active replica of the database that can be
quickly brought online in an emergency.
- Cloud-based backups: Utilizing cloud storage services for offsite backups and
disaster recovery capabilities.
Steps for performing a NoSQL backup:
Choose a backup method: Select the most suitable backup method based on your
database type (MongoDB, Cassandra, etc.) and desired data consistency.
Configure backup schedule: Set up automated backups at regular intervals to
capture changes in your data.
Create snapshots or exports: Use the provided database tools to create snapshots
or export data to a backup location.
Monitor backup process: Regularly monitor backup jobs to ensure successful
completion and address any issues.
Test restore process: Periodically perform test restores to verify data integrity and
identify potential problems with your backup strategy.
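The retention policy mentioned under backup frequency and retention can be sketched as a small calculation. This is an illustration only: the function name expiredBackups, the catalogue entries, and the seven-day window are all assumptions, and a real setup would apply the result to the actual backup storage.

```javascript
// Decide which backups fall outside the retention window (function name is hypothetical).
function expiredBackups(backups, now, retentionDays) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return backups
    .filter((b) => new Date(b.takenAt).getTime() < cutoff)
    .map((b) => b.name);
}

// Hypothetical backup catalogue.
const catalogue = [
  { name: "dump-2024-10-01", takenAt: "2024-10-01T00:00:00Z" },
  { name: "dump-2024-10-05", takenAt: "2024-10-05T00:00:00Z" },
  { name: "dump-2024-10-09", takenAt: "2024-10-09T00:00:00Z" }
];

const now = new Date("2024-10-10T00:00:00Z");
console.log(expiredBackups(catalogue, now, 7)); // [ 'dump-2024-10-01' ]
```

Only backups older than the cutoff are listed for removal, so the most recent copies needed to meet the recovery point objective are always kept.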
Points to Remember
While describing database security, take into consideration the following elements:
User authentication
Role-based access control (RBAC)
Data encryption at rest and in transit
Auditing
Backup and disaster recovery
Steps for performing backup are:
Choose the backup methods
Configure backup schedule
Create snapshots or exports
Monitor backup process
Test restore process
Indicative content 4.3: Deployment of database
Duration: 4 hrs
Tasks:
1: Answer the following questions:
i. What do you understand by deployment?
ii. State the deployment options.
iii. State the characteristics of the deployment options used in NoSQL databases.
iv. Identify MongoDB cluster architectures.
2: Write your findings on paper or flipcharts.
4: Pay attention to the trainer’s clarification and ask questions where necessary.
5: Read the key readings 4.3.1
When deploying a NoSQL database, you can choose between three options: "On-Premises",
where the database runs on your own physical hardware; "Cloud", where it is hosted on a
remote cloud provider's infrastructure; or "Hybrid", which combines elements of both. A
hybrid deployment lets you combine on-premises control with cloud scalability, depending
on your specific needs. In every case the database keeps the flexible, non-relational data
structure characteristic of NoSQL databases.
On-Premises:
- Control: Provides full control over hardware, security, and network configuration,
ideal for highly sensitive data or strict compliance requirements.
- Customization: Ability to tailor the database environment to specific needs by
managing hardware upgrades and software installations.
- Cost considerations: Requires upfront investment in hardware and maintenance
staff, potentially higher operational costs compared to cloud.
Cloud:
- Scalability: Easily scale up or down database capacity on demand based on
application usage, without the need for manual hardware management.
- Cost-efficiency: Pay-as-you-go model can reduce costs, especially for applications
with fluctuating data demands.
- High availability: Cloud providers offer robust disaster recovery features and
redundancy across multiple data centers.
Hybrid:
- Data locality: Store frequently accessed data on-premises for faster access while
utilizing the cloud for large data storage or off-peak processing.
- Data migration flexibility: Gradually move data to the cloud while maintaining on-
premises access to critical information.
- Cost optimization: Leverage the benefits of both cloud and on-premises
infrastructure to optimize costs based on specific application needs.
Considerations when Choosing a Deployment Option:
- Data sensitivity: for extremely sensitive data, on-premises may be preferred due to
greater control over security.
- Application requirements: Consider the expected data volume, scalability needs,
and performance demands of your application.
- Budget constraints: Evaluate the upfront costs of hardware versus the pay-as-you-
go cloud model.
Examples of NoSQL databases suitable for various deployment options:
- On-Premises: MongoDB, Couchbase Server, Cassandra
- Cloud: Amazon DynamoDB, Azure Cosmos DB, Google Cloud Firestore
- Hybrid: A combination of on-premises MongoDB with cloud-based data storage on
AWS S3
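The considerations listed above can be sketched as a small decision helper. This is only a hypothetical rule of thumb written for this manual: the function name suggestDeployment and its three yes/no inputs are assumptions, and a real decision would weigh many more factors.

```javascript
// Hypothetical rule of thumb that mirrors the three considerations:
// data sensitivity, application load, and budget.
function suggestDeployment({ sensitiveData, fluctuatingLoad, hardwareBudget }) {
  // Sensitive data that also needs elastic capacity: keep control and scale in the cloud.
  if (sensitiveData && fluctuatingLoad) return "hybrid";
  // Sensitive data with steady load: favour full control.
  if (sensitiveData) return "on-premises";
  // Variable load or no money for hardware: pay-as-you-go fits better.
  if (fluctuatingLoad || !hardwareBudget) return "cloud";
  // Steady load and hardware already budgeted.
  return "on-premises";
}

console.log(suggestDeployment({ sensitiveData: true, fluctuatingLoad: true, hardwareBudget: true }));
console.log(suggestDeployment({ sensitiveData: false, fluctuatingLoad: true, hardwareBudget: false }));
```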
MongoDB Cluster Architectures
MongoDB offers several cluster architectures to accommodate different scalability and
availability requirements.
Three main types:
Single-Node Cluster
Description: A single MongoDB instance running on a single server.
Advantages:
- Simple to set up and manage.
- Suitable for small-scale applications with low data volumes.
Disadvantages:
- Limited scalability and availability.
- Single point of failure.
Replica Set
Description: A group of MongoDB instances that maintain a consistent replica of the same
dataset.
Advantages:
- Improved availability through automatic failover.
- Data redundancy for disaster recovery.
- Read scaling through secondary members.
Disadvantages:
- Increased complexity compared to a single-node cluster.
- Limited write scalability.
Components:
- Primary: The instance that handles write operations and acts as the authoritative
source of data.
- Secondary: Members that replicate the primary's data. They can serve reads but do
not accept writes.
- Arbiter: A member that holds no data but votes in elections to break ties.
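As a minimal sketch of these components, the object below has the shape of a replica set configuration document that could be passed to rs.initiate() in the MongoDB shell. The set name rs0 and the host names are placeholders, and the two helper functions are written only for this example. Note that an arbiter stores no data but still counts as a voting member.

```javascript
// Replica set configuration in the shape accepted by rs.initiate() in mongosh.
// The set name and host names are placeholders.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "db3.example.net:27017", arbiterOnly: true }
  ]
};

// A member votes unless its `votes` field is explicitly set to 0.
function votingMembers(config) {
  return config.members.filter((m) => m.votes !== 0).length;
}

// An election is won with a strict majority of the voting members.
function majority(config) {
  return Math.floor(votingMembers(config) / 2) + 1;
}

console.log(votingMembers(replicaSetConfig), majority(replicaSetConfig)); // prints: 3 2
```

With three voting members, the set can still elect a primary after losing any one member, which is why an odd number of voting members is the usual choice.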
Sharded Cluster
Description: A distributed system that horizontally scales MongoDB across multiple servers by
partitioning data across shards.
Advantages:
- Exceptional scalability for handling large datasets.
- Improved read and write performance.
- High availability when each shard is deployed as a replica set.
Disadvantages:
- Increased complexity and management overhead.
- Requires careful shard key design for optimal performance.
Components:
- Shard: A MongoDB deployment (usually a replica set) that stores a subset of the data.
- Config Server: Stores configuration information about the sharded cluster, including
shard assignments and routing rules.
- Mongos: Query routers that act as a single entry point for client applications.
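The idea of routing by shard key can be illustrated with a simplified simulation. This is not MongoDB's actual algorithm: a real cluster assigns ranges of key values to shards and records that mapping on the config servers, which mongos consults. Here the shard names, the employeeId field and the helpers pickShard and distribute are all assumptions for illustration.

```javascript
// Simplified simulation of hashed routing; real MongoDB routes by key ranges
// stored on the config servers, not by a plain modulo.
const crypto = require("crypto");

const shards = ["shardA", "shardB", "shardC"];

// Hash the shard key value and map it onto one of the shards.
function pickShard(shardKeyValue) {
  const digest = crypto.createHash("sha256").update(String(shardKeyValue)).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}

// Group documents by the shard that their key value maps to.
function distribute(docs, keyField) {
  const placement = {};
  for (const name of shards) placement[name] = [];
  for (const doc of docs) placement[pickShard(doc[keyField])].push(doc);
  return placement;
}

const employees = Array.from({ length: 9 }, (_, i) => ({ employeeId: i + 1 }));
const placement = distribute(employees, "employeeId");
for (const [shard, docs] of Object.entries(placement)) {
  console.log(shard, docs.length);
}
```

Because the same key value always maps to the same shard, a query that includes the shard key can be sent to a single shard instead of being broadcast to all of them.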
Choosing the Right Architecture
The optimal cluster architecture depends on factors such as:
- Data Volume: The amount of data to be stored.
- Read and Write Patterns: The expected frequency and intensity of read and write
operations.
- Availability Requirements: The need for high availability and disaster recovery.
- Scalability Needs: The potential for future growth and scalability.
- Complexity Tolerance: The organization's ability to manage a more complex system.
Scaling MongoDB with Sharding
Sharding is a horizontal scaling technique in MongoDB: a large database is split into
smaller pieces, called shards, which are spread across multiple machines and, if needed,
multiple regions. MongoDB then balances the data across the shards automatically.
Sharding can help with:
Scalability
Sharding allows for near-limitless scaling to handle large data sets and intense workloads.
Performance
Sharding allows read and write operations to be parallelized, which improves overall system
performance.
High availability
Sharding, combined with replication, ensures that data is redundant, providing high
availability and fault tolerance.
Some disadvantages of sharding include:
- Additional complexity: Sharding can increase the complexity of infrastructure and
maintenance.
- Risk of duplicated or lost data: Sharding can introduce a risk of duplicated or lost data.
- Additional system traffic and storage requirements: Sharding can increase system
traffic and storage requirements.
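To make sharding more concrete, the sketch below builds the two administrative command documents commonly used to shard a collection: enableSharding and shardCollection. The database name employees, the collection name staff and the key field employeeId are assumptions for this example, and some MongoDB versions may not require the enableSharding step.

```javascript
// Build the admin command documents for sharding a collection.
// Names passed in below are illustrative only.
function buildShardingCommands(dbName, collName, keyField, hashed = true) {
  return [
    { enableSharding: dbName },
    {
      shardCollection: `${dbName}.${collName}`,
      // "hashed" spreads writes evenly; 1 creates a ranged shard key instead.
      key: { [keyField]: hashed ? "hashed" : 1 }
    }
  ];
}

const commands = buildShardingCommands("employees", "staff", "employeeId");
console.log(JSON.stringify(commands, null, 2));
// In mongosh connected to a mongos router, each document could be run with
// db.adminCommand(command).
```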
Introducing MongoDB Atlas
MongoDB Atlas is a Database as a Service (DBaaS) provided by the team behind MongoDB. It
is a managed service that needs minimal to no configuration. Additionally, you have the
option to deploy MongoDB instances on any of three major cloud providers: AWS, Azure, or
Google Cloud. It is an easy-to-use cloud-based service that was released in 2016 and has
been battle tested since. It is used by both start-ups and many well-established
enterprises such as InVision, eBay, Adobe, and Google.
Although MongoDB Atlas is highly automated, it provides a feature-rich deployment. The
moment we create a cluster, built-in replication kicks in and our data is stored on
multiple nodes. The data remains available even when the primary node is down, because
another member is elected to take over.
Task:
2: Go to the computer lab to deploy MongoDB, applying the deployment options, MongoDB
cluster architectures and sharding.
Key readings 4.3.2: Deploying NoSQL MongoDB
Deploying MongoDB
- MongoDB Community Edition is free to use and its source code is publicly available, but
for production deployment we generally need to take the paid route. We can also download
the Community Edition and run it locally, using it through the command line or the
graphical interface of MongoDB Compass.
- For deployment, we need a server, typically Linux-based. We can either use our own
server or deploy on a professionally managed cloud service. Three popular options are a
Linode server, Heroku, or AWS.
Sign up or Sign in
On the next page, it will ask you to Sign up or Sign in. You can also use your Google
account to do so.
Since I already have an account, I clicked on the ‘Sign in’ option and the following page
came up.
Create New Project
If you already have a project on MongoDB Atlas, you will be taken to the project on
which you last worked. Here, you need to click on the project, and then in the pop-up,
click on New Project.
Accessing database
Then, it will ask us to give the project a name. I have given the name ‘employees’.
Deployment options
On the next screen, it will ask you to give access to members. I have given access to the
existing users. After that, you need to click on the Create Project button.
On the next page, click on the big Build a Database button to create your database.
It will then give you three options to start with. Here, I am going for a Shared server,
which is free. Notice that you also have the option of a Dedicated server, which you
should use for production apps.
Now, it will ask you to choose a cloud provider and the region for the server. Choose the
region that is nearest to your user base to keep latency low. Click on the Create
Cluster button.
Next, it will ask you for a username and password. You should remember these, as you’ll
need them to connect through the NodeJS application. After providing the username and
password, click on the Create User button.
You also need to add the IP address that is allowed to connect for your testing project.
To do so, click on the Add My Current IP Address button.
After that, scroll down a bit and click on the Finish and Close button.
On successful creation of the user and IP address, you will get a pop-up. Click on the
Go to Databases button.
Now, you will be taken to the screen that shows your cluster. Here, click on the
Connect button.
A pop-up will appear. Click on the Connect your application option in the middle.
Now, you will get the connection string and you can copy it. You will need it next to
connect your NodeJS application to the MongoDB database.
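The connection string copied from Atlas contains placeholders for the username and password. The sketch below shows one way to fill them in safely, assuming a hypothetical host name cluster0.example.mongodb.net and made-up credentials; the helper name buildAtlasUri is also an assumption. Special characters in the username or password must be percent-encoded, which encodeURIComponent takes care of.

```javascript
// Build a mongodb+srv connection string from its parts.
// The host and credentials used below are placeholders only.
function buildAtlasUri({ user, password, host, database }) {
  // Percent-encode credentials so characters such as @ or / do not break the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb+srv://${credentials}@${host}/${database}?retryWrites=true&w=majority`;
}

const uri = buildAtlasUri({
  user: "appUser",
  password: "p@ss/word",
  host: "cluster0.example.mongodb.net",
  database: "employees"
});
console.log(uri);
```

In a real project the password should come from an environment variable or a secrets store instead of being written in the source code.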
Connecting to Atlas
You will now connect a simple NodeJS app to your newly created Atlas database. You
will create a simple app with NodeJS and Express by first creating a folder and then
changing to it.
Now, you will create an empty Node app by giving the command npm init -y.
You will then install the mongoose and express packages in it. Mongoose is an npm module
that is used to connect a NodeJS app to a MongoDB database and to model its data. Express
is a web framework that makes programming the NodeJS app much easier.
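A minimal sketch of such an app is shown below. It assumes the packages were installed with npm install mongoose express and that the Atlas connection string is supplied in an environment variable named MONGODB_URI. That variable name, the /health route and the loadConfig helper are choices made for this example, not part of the original walkthrough.

```javascript
// app.js: minimal Express + Mongoose sketch.
// Assumes `npm install mongoose express` has been run in this folder.

// Read settings from environment variables (the names are this example's choice).
function loadConfig(env) {
  if (!env.MONGODB_URI) throw new Error("MONGODB_URI is not set");
  return { uri: env.MONGODB_URI, port: Number(env.PORT) || 3000 };
}

async function main() {
  const express = require("express");
  const mongoose = require("mongoose");
  const { uri, port } = loadConfig(process.env);

  // Connect to the Atlas cluster before accepting requests.
  await mongoose.connect(uri);

  const app = express();
  // readyState is 1 while Mongoose is connected.
  app.get("/health", (req, res) => {
    res.json({ mongoState: mongoose.connection.readyState });
  });
  app.listen(port, () => console.log(`Listening on port ${port}`));
}

// Start the server only when a connection string has been provided.
if (process.env.MONGODB_URI) {
  main().catch((err) => {
    console.error(err);
    process.exit(1);
  });
}
```

The app could then be started with the connection string set in the environment, after which opening /health in a browser shows whether the database connection is up.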
b. The Processes page displays
Points to Remember
Application of learning 4.3.
Learning outcome 4 end assessment
Theoretical assessment
Section A: Read the statement carefully, then circle the letter corresponding to the
correct answer based on the statement given
1. What is the primary role of database users?
a) To create backups
b) To manage database performance
c) To access and manipulate data
d) To configure network settings
2. Which command is used to create a new user in MongoDB?
a) db.createUser()
b) db.addUser()
c) db.newUser()
d) db.insertUser()
3. What is the purpose of roles in MongoDB?
a) To create collections
b) To define user permissions and privileges
c) To enhance database performance
d) To manage database backups
4. Which of the following is a method to enforce authentication in a database?
a) Using indexes
b) Setting up role-based access control
c) Implementing data encryption
d) Creating a user account with a password
5. What does role-based access control (RBAC) in MongoDB allow administrators to do?
a) Back up data automatically
b) Create indexes for faster queries
c) Assign specific roles to users based on their job functions
d) Encrypt sensitive data
6. Which of the following methods helps protect sensitive data in a database?
a) Implementing database sharding
b) Enabling data encryption
c) Creating backup copies
d) Monitoring database performance
7. What is the purpose of auditing system activity in a database?
a) To improve query performance
b) To track changes and access to the database
c) To back up data regularly
d) To manage user roles
8. Which deployment option involves hosting the database on physical servers owned by
the organization?
a) Cloud
b) Hybrid
c) On-Premises
d) Remote
9. Which MongoDB cluster architecture consists of a primary node and multiple
secondary nodes?
a) Single-Node
b) Replica Set
c) Sharded Cluster
d) Hybrid Cluster
10. What is the primary benefit of scaling MongoDB with sharding?
a) Improved data security
b) Enhanced data visualization
c) Increased database availability and performance
d) Simplified user management
Section B: Read each statement below and answer True if the statement is correct or False
if it is wrong.
1) The primary role of database users is to manage server hardware and configuration.
2) A user in MongoDB can be created using the command db.createUser() with specific
roles assigned.
3) Roles in a database are used to assign privileges and control what actions users can
perform.
4) Enforcing authentication in a database is optional and does not significantly impact
security.
5) Role-Based Access Control (RBAC) allows different users to have different permissions
based on their roles within the organization.
6) Data encryption ensures that sensitive information in the database is protected from
unauthorized access.
7) Auditing system activity in a database is only necessary during the initial setup and not
needed afterward.
8) On-premises deployment of a database means that the database is hosted on the
organization's physical servers.
9) A MongoDB sharded cluster allows for horizontal scaling by distributing data across
multiple servers.
10) A replica set in MongoDB consists of only one primary node and no secondary nodes.
Section C: Read the questions carefully and answer them
Practical assessment
References:
October 2024