NONSQL-DATABASE_NOTE

The document outlines a module for developing NoSQL databases as part of a TVET Level 5 curriculum in Software Development, detailing the knowledge, skills, and attitudes required. It includes learning outcomes, assessment methods, and performance criteria for preparing, designing, implementing, and managing NoSQL databases. Key concepts covered include database requirements, types of NoSQL databases, data modeling, and validation processes.

Uploaded by

keza loenah

NoSQL DATABASE DEVELOPMENT

SWDND501
BDCPC301 - Develop NoSQL Database
Trainer: Samie TWAHIRWA

Competence
RQF Level: 5
Learning Hours: 60
Credits: 6

Sector: ICT & MULTIMEDIA

Trade: SOFTWARE DEVELOPMENT

Module Type: Specific

Curriculum: ICTSWD5001-TVET Certificate V in Software Development

Copyright: © Rwanda TVET Board, 2024

Issue Date: February 2024


Purpose statement: This specific module describes the knowledge, skills and attitudes required to
develop NoSQL databases. It is intended to prepare students pursuing TVET Level 5 in Software
Development. At the end of this module, the student will be able to prepare the database
environment, design the database, implement the database, and manage the database.

Learning assumed to be in place: N/A

Delivery modality

Training delivery (100%):
o Theoretical content: 30%
o Practical work: 70%
  - Group project and presentation: 20%
  - Individual project/work: 50%

Assessment (total 100%):
o Formative assessment: 50% (theoretical content 30%, practical work 70%)
o Summative assessment: 50%

Elements of Competence and Performance Criteria

1. Prepare database environment
1.1 Database requirements are properly identified based on user requirements.
1.2 Database is clearly analysed based on database requirements.
1.3 Database environment is successfully prepared based on established standards.

2. Design database
2.1 Drawing tools are properly selected based on database requirements.
2.2 Conceptual Data Modeling is created based on the structure of the data and its relationships.
2.3 Database Schema is clearly designed according to Mongoose.

3. Implement database design
3.1 MongoDB data definition is properly performed based on database requirements.
3.2 MongoDB data manipulation is properly performed based on database requirements.
3.3 Query optimizations are properly applied based on query performance.

4. Manage database
4.1 Database users are effectively managed with appropriate permissions.
4.2 Database is effectively secured in line with best practices.
4.3 Database is successfully deployed based on the targeted environment.

Unit 1: Prepare database environment
Learning Outcome 1.1: Database requirements are properly identified based on user requirements

● Identifying Database requirements


Key Terms Definitions
✔ NoSQL (Not Only SQL):
A type of database designed to handle unstructured, semi-structured, or distributed data that
doesn’t fit well into traditional relational databases. NoSQL databases are flexible, scalable, and
capable of handling large volumes of diverse data. Examples include MongoDB, Cassandra, Redis,
and CouchDB. Common NoSQL models include key-value, document-based, column-family, and
graph databases.
✔ MongoDB:
A document-oriented NoSQL database that stores data in flexible, JSON-like documents. It is
schema-less, allowing for high flexibility in data models, and is well-suited for handling large
amounts of data across distributed systems.
✔ Availability:
A database's ability to remain accessible and operational, even during failures or high traffic. In
NoSQL databases, availability is often prioritized over consistency, especially in distributed
systems (as per the CAP theorem), ensuring that the database continues to respond to requests.
✔ Documents:
In MongoDB, a document is a record or a data unit stored in BSON (Binary JSON) format. A
document consists of fields (key-value pairs) similar to JSON objects. Each document is analogous
to a row in a relational database, but it can have a dynamic schema where fields can vary between
documents.
✔ Collection:
A group of MongoDB documents. A collection is analogous to a table in relational databases, but
unlike tables, a MongoDB collection does not enforce a rigid schema. Documents within a
collection can have varying structures.
✔ Indexing:
A technique used to speed up data retrieval operations in a database. Indexes create an ordered
data structure that allows the database to quickly locate and access data without scanning the
entire dataset. MongoDB supports various types of indexes, including single-field, compound,
and geospatial indexes.
✔ Optimistic Locking:
A concurrency control mechanism where a transaction does not lock data when reading it.
Instead, data is checked for modifications before committing changes. If another transaction has
modified the data, the current transaction will fail, and the user must retry the operation. It’s
often used in distributed systems to avoid locking and ensure high performance.
✔ Relationships:
In MongoDB, relationships between documents can be modeled using embedding (storing
related documents inside one another) or referencing (storing references to other documents).
MongoDB does not enforce foreign keys the way relational databases do; references are resolved
either in application code or with the $lookup aggregation stage, which performs a left outer
join between collections.
✔ Data Model:
A blueprint for how data is stored, organized, and manipulated in a database. In MongoDB, the
data model is more flexible compared to relational databases, allowing for dynamic schema
changes and the ability to store diverse types of data within the same collection.
✔ Schema:
The structure or organization of data in a database. In relational databases, a schema is rigid and
defines how tables, fields, and relationships are structured. In MongoDB, schemas are flexible,
meaning each document in a collection can have a different structure or set of fields. Schema
validation can be applied in MongoDB to enforce certain rules on data.
✔ Mongosh (MongoDB Shell):
The interactive command-line interface used to interact with a MongoDB instance. Mongosh
allows users to perform administrative tasks, manage databases, query data, and execute
JavaScript within the MongoDB environment.
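
The Optimistic Locking entry above can be made concrete with a small sketch. It is a plain JavaScript simulation, not MongoDB code: the in-memory `store`, the `version` field, and the function names are all assumptions chosen for illustration. Two clients read the same document without locking it, and only the first write succeeds.

```javascript
// In-memory stand-in for a collection: maps _id to a stored document.
// (Hypothetical; no database is involved in this sketch.)
const store = new Map();
store.set(1, { _id: 1, name: "Alice", balance: 100, version: 1 });

// Read a snapshot without taking any lock.
function read(id) {
  return { ...store.get(id) };
}

// Write only if nobody changed the document since it was read.
// Returns true on success, false when the caller must re-read and retry.
function updateIfUnchanged(id, expectedVersion, changes) {
  const current = store.get(id);
  if (!current || current.version !== expectedVersion) {
    return false; // conflict: another writer got there first
  }
  store.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Two clients read the same version, then both try to write.
const clientA = read(1);
const clientB = read(1);
const firstWrite = updateIfUnchanged(1, clientA.version, { balance: clientA.balance - 30 });
const secondWrite = updateIfUnchanged(1, clientB.version, { balance: clientB.balance - 50 });

console.log(firstWrite, secondWrite, store.get(1));
```

In MongoDB the same idea is usually expressed by putting the expected version in the update filter (for example `updateOne({ _id: 1, version: 1 }, ...)`) and checking whether a document was actually modified; the exact field name is up to the application.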

Identifying User Requirements

When identifying user requirements for a database system, it’s essential to gather detailed
information about the user's needs. This process involves understanding the type of data, the
volume of data, and how users will interact with the database. Key questions to ask include:

• What kind of data will be stored? (Structured, unstructured, or semi-structured?)
• How much data is expected to be stored and how quickly will it grow?
• How will users access and update the data? (CRUD operations: Create, Read, Update,
Delete)
• What are the performance expectations? (e.g., response time, scalability)
• Are there any security or privacy requirements?
• What kind of availability and consistency is expected?
Once these requirements are collected, they guide the choice of database type (SQL vs. NoSQL)
and the database's structure.

Characteristics of Collections (in MongoDB)

• Flexible Schema: Collections in MongoDB do not enforce a strict schema, meaning each
document (record) in a collection can have a different structure, making it easy to store
diverse data types.
• Grouping of Documents: A collection is a grouping of documents, where each
document represents an individual record (similar to a row in relational databases).
• Indexing Support: Collections can be indexed to improve the speed of queries.
• Sharding and Replication: Collections can be sharded across multiple servers for
scalability. Replication is set up for the whole deployment (a replica set), which keeps
copies of every collection for high availability.
• Storage of Similar Documents: While flexible, collections typically store documents
that have similar fields or serve similar purposes.

Features of NoSQL Databases

• Schema-less Data Storage: NoSQL databases are not rigid with schema design, which
allows for storing structured, semi-structured, or unstructured data.
• Horizontal Scalability: NoSQL databases are designed to scale out by distributing data
across multiple nodes (sharding), making them well suited to handling large amounts of data.
• High Availability: NoSQL databases prioritize availability by replicating data across
multiple nodes, which helps avoid downtime during network or server failures.
• Eventual Consistency: In distributed systems, NoSQL databases often focus on
eventual consistency, meaning that data will become consistent across nodes after some
time.
• Optimized for Large-Scale Data: NoSQL databases are designed for large datasets and
high-velocity data; for such workloads they are typically easier to scale out than a
traditional single-server relational database.
• Handling of Unstructured Data: NoSQL databases can manage and store unstructured
or semi-structured data like JSON, XML, and multimedia files.

Types of NoSQL Databases

1. Key-Value Stores:
o Data is stored as key-value pairs, where each key is unique and maps to a specific
value.
o Example: Redis, Amazon DynamoDB.
2. Document-Oriented Databases:
o Data is stored in documents, typically in JSON, BSON, or XML format, and each
document is semi-structured.
o Example: MongoDB, CouchDB.
3. Column-Family Stores:
o Data is stored in tables but organized by columns instead of rows, making it
efficient for reading/writing large datasets.
o Example: Apache Cassandra, HBase.
4. Graph Databases:
o Designed to store data in nodes and edges, representing relationships between
data points, making them ideal for social networks and recommendation systems.
o Example: Neo4j, Amazon Neptune.

Data Types in NoSQL Databases

• String: A sequence of characters (e.g., text fields, names, and identifiers).
• Number (Integer/Float): Numeric values for calculations and measurements (e.g., age,
product price).
• Boolean: True or false values.
• Array: A collection of elements, such as a list of values (e.g., an array of product IDs).
• Object: A nested structure representing an entity, often used in document-oriented
databases.
• Binary Data: Images, audio, video, or other binary data formats.
• Geospatial Data: Data used for geographic coordinates, supported by certain NoSQL
databases like MongoDB with geospatial queries.
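
To see the data types listed above side by side, here is a minimal sketch of a single document written as a JavaScript object. The product, its field names, and the `kindOf` helper are hypothetical and exist only for illustration; the geospatial field follows the GeoJSON convention of `[longitude, latitude]`.

```javascript
// A hypothetical product document; every field name here is illustrative.
const product = {
  name: "Wireless Mouse",                       // String
  price: 19.99,                                 // Number (floating point)
  stock: 42,                                    // Number (integer)
  inStock: true,                                // Boolean
  tags: ["electronics", "office"],              // Array
  dimensions: { widthCm: 6, lengthCm: 10 },     // Object (nested document)
  thumbnail: new Uint8Array([137, 80, 78, 71]), // Binary data (first bytes of a PNG file)
  warehouse: { type: "Point", coordinates: [30.0619, -1.9441] } // Geospatial (GeoJSON point)
};

// Classify a value using the categories from the list above.
function kindOf(value) {
  if (Array.isArray(value)) return "array";
  if (value instanceof Uint8Array) return "binary";
  if (value !== null && typeof value === "object") return "object";
  return typeof value; // "string", "number", or "boolean"
}

const kinds = Object.fromEntries(
  Object.entries(product).map(([field, value]) => [field, kindOf(value)])
);
console.log(kinds);
```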

Defining Use Cases

Use cases describe how users will interact with the system, detailing the actions users perform to
achieve a specific goal. For NoSQL databases, use cases help define the data models and
operations.

1. E-Commerce Platform:
o Users: Shoppers, sellers, and admins.
o Actions: Browse products, add items to the cart, place orders, view order history.
o Database Operations: Store product catalogs (key-value), manage user profiles
(documents), and track transactions (document collections).
2. Social Media Application:
o Users: General users and administrators.
o Actions: Post updates, like and comment on posts, follow/unfollow other users.
o Database Operations: Store user profiles (documents), manage relationships
between users (graph database), and manage posts and comments (documents).
3. IoT Data Management:
o Users: Device operators and data analysts.
o Actions: Monitor real-time sensor data, store historical data, trigger alerts based
on thresholds.
o Database Operations: Store time-series data (key-value or column-family),
analyze patterns in sensor data (graph or document).

L.O: 1.2 Database is clearly analysed based on database requirements.

Analyzing NoSQL Database

NoSQL databases are highly flexible, but it's essential to conduct a thorough analysis before
implementing them. This analysis includes a comprehensive understanding of system
requirements, data types, scalability needs, and user requirements.

Requirements Analysis Process for NoSQL Databases

The requirements analysis process helps determine the appropriate database structure,
performance, and functionality to meet user needs. The process typically includes the following
steps:

1. Identify Key Stakeholders and End-Users

• Key Stakeholders: Individuals or groups who have a vested interest in the project, such
as:
o Business Leaders: Define the business goals, timelines, and budget.
o IT/Database Administrators: Oversee the database design, performance, and
security.
o Developers: Design and implement the database schema and queries.
o End-Users: Individuals who will interact with the system daily. They provide
crucial input on what the database should achieve (e.g., sales teams, data analysts,
customers using an app).
• End-Users’ Expectations: Gather insights from users about how they expect the
database to work, such as ease of data retrieval, scalability, and the types of queries they
need to perform.
2. Capture Requirements

• Gather Functional Requirements:
o What are the core functions of the database? For example, storing customer data,
retrieving product information, and processing large-scale analytics.
o Identify the specific data types the system needs to store (e.g., unstructured data
like JSON, media files, logs).
o Understand access patterns: Will users mostly be reading or writing data? How
frequently?
• Non-Functional Requirements:
o Performance: What are the expected response times for data queries?
o Scalability: How much data will the system need to handle, both now and in the
future? Will it need to scale out to handle high traffic or large data volumes?
o Availability: How critical is 24/7 uptime for the system? What level of fault
tolerance is required?
o Security: What are the data privacy and security concerns? Are there regulatory
requirements (e.g., GDPR, HIPAA)?

3. Categorize Requirements

• Functional vs. Non-Functional Requirements:
o Functional Requirements: Define what the database must do (e.g., support
CRUD operations, support querying large datasets, ensure quick data retrieval).
o Non-Functional Requirements: Define quality attributes such as performance,
scalability, security, and data consistency.
• Priority Levels:
o Must-Have: Essential requirements for system success.
o Should-Have: Important but not critical features.
o Nice-to-Have: Features that improve the system but are not essential.

4. Interpret and Record Requirements

• Data Modeling: Once the requirements are captured, create data models. In the case of
NoSQL, data modeling may involve:
o Defining collections or tables and their structure.
o Understanding relationships between entities (e.g., embedding vs. referencing
documents in MongoDB).
o Selecting a database that matches the use case (e.g., choosing a document-
oriented database like MongoDB for semi-structured data).
• Document Requirements: Store all gathered requirements in a formal document that
outlines how the system will handle data, scalability, and access.
5. Validate Requirements

• Review Sessions with Stakeholders:
o Validate that the captured requirements accurately reflect the expectations of all
stakeholders.
o Conduct review sessions where stakeholders confirm whether the database design
meets business and technical needs.
• Prototyping: Consider building a prototype of the NoSQL database to test whether it
meets user requirements before full-scale implementation.

Perform Data Analysis

Data analysis involves understanding the types of data that will be stored in the database, the
relationships between the data, and how the data will be used.

• Identify Data Types: Analyze the data that the system will manage (e.g., documents,
multimedia files, JSON objects, log data). Ensure that the database selected can
efficiently store and process this data.
• Data Patterns: Determine how the data will be accessed. For instance, document-based
databases like MongoDB excel at handling semi-structured or unstructured data such as
JSON files, while key-value stores like Redis are optimal for fast retrieval of single
values.
• Analyze Relationships: NoSQL databases handle relationships differently compared to
relational databases:
o Embedding: Store related data in a single document.
o Referencing: Use a reference to link separate documents.
• Query Requirements: Understand the types of queries users will run. For example, if
complex relationships between entities are involved, a graph database (e.g., Neo4j) may
be more appropriate.

Implement Data Validation

Data validation ensures that the data entering the database conforms to the expected format,
structure, and constraints, even in flexible NoSQL databases.

• Schema Validation: Even though NoSQL databases like MongoDB are schema-less,
they offer schema validation to ensure that inserted documents meet specific conditions
(e.g., required fields, field types).
o Example in MongoDB: You can define JSON schema rules to enforce the
structure of documents.
• Constraints:
o Required Fields: Ensure that certain fields (e.g., user_id, email) are always
present in the document.
o Data Types: Enforce that fields conform to a specific data type (e.g., a field must
be a string, number, or array).
o Range Validation: Ensure that numeric or date values fall within expected
ranges.
• Data Integrity: Since NoSQL databases prioritize availability over consistency in some
cases, ensure that the system includes proper mechanisms for validating data integrity,
such as:
o Optimistic Locking: Avoids conflicts during concurrent updates.
o Consistency Checks: Run periodic checks to ensure the data is synchronized and
valid across multiple nodes.
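
The schema validation and constraint rules above can be sketched as follows. The `userSchema` object uses the keywords MongoDB accepts inside `$jsonSchema` (`bsonType`, `required`, `properties`, `minimum`, `maximum`); the "users" collection, the field names, and the age range are assumptions. The `validate` function is a deliberately simplified local checker written for this example, not MongoDB's validator, and it understands only the few keywords used here.

```javascript
// JSON Schema rules for a hypothetical "users" collection.
const userSchema = {
  bsonType: "object",
  required: ["user_id", "email"],                        // required fields
  properties: {
    user_id: { bsonType: "string" },                     // data type rule
    email: { bsonType: "string" },
    age: { bsonType: "number", minimum: 0, maximum: 150 } // range rule
  }
};

// In mongosh the rules would be attached when creating the collection:
//   db.createCollection("users", { validator: { $jsonSchema: userSchema } })

// Simplified local checker (NOT MongoDB's validator): it handles only
// `required`, the "string"/"number" types, and `minimum`/`maximum`.
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties ?? {})) {
    if (!(field in doc)) continue; // optional field not supplied
    const value = doc[field];
    if (typeof value !== rule.bsonType) {
      errors.push(`${field} must be a ${rule.bsonType}`);
      continue;
    }
    if (rule.minimum !== undefined && value < rule.minimum) {
      errors.push(`${field} is below ${rule.minimum}`);
    }
    if (rule.maximum !== undefined && value > rule.maximum) {
      errors.push(`${field} is above ${rule.maximum}`);
    }
  }
  return errors;
}

console.log(validate({ user_id: "u1", email: "alice@example.com", age: 30 }, userSchema));
console.log(validate({ email: "bob@example.com", age: -5 }, userSchema));
```

The first call reports no errors; the second reports the missing `user_id` and the out-of-range `age`. With the real validator in place, MongoDB would reject the second document at insert time by default.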

L.O: 1.3 Database environment is successfully prepared based on established standards.

Preparing Database Environment for MongoDB

Setting up the MongoDB environment involves ensuring that the database is configured for
optimal performance, scalability, and usability. This process includes setting up the necessary
tools, environments, and configurations for both development and production use.

1. Identifying the Scalability of MongoDB

MongoDB is known for its horizontal scalability, which allows it to handle increasing data
volumes by distributing data across multiple servers. Here are the key aspects of MongoDB's
scalability:

Sharding:

• MongoDB uses sharding to partition data across multiple servers. This ensures that large
datasets can be distributed and processed efficiently.
• Shard Key: A key is chosen to distribute data, ensuring an even load across the cluster.
• Horizontal Scalability: New shards (servers) can be added to handle increased
workloads; MongoDB's balancer then redistributes data across the shards in the background.
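
The role of the shard key can be illustrated with a toy routing function. This is a sketch under stated assumptions: `toyHash` is a made-up hash, not the one MongoDB uses for hashed shard keys, and real clusters route by ranges of shard key values rather than by a simple modulo. The `studentId` field and the three-shard cluster are also hypothetical.

```javascript
// Toy hash: NOT the hash function MongoDB uses for hashed shard keys.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value as an unsigned 32-bit integer
  }
  return h;
}

// Pick a shard number (0 .. shardCount - 1) from the document's shard key value.
function shardFor(doc, shardKey, shardCount) {
  return toyHash(doc[shardKey]) % shardCount;
}

// Spread 1000 hypothetical student documents across 3 shards.
const students = [...Array(1000).keys()].map(i => ({
  studentId: `S${i}`,
  name: `Student ${i}`
}));
const counts = [0, 0, 0];
for (const student of students) {
  counts[shardFor(student, "studentId", counts.length)] += 1;
}

console.log(counts); // number of documents routed to each shard
```

The point of the sketch is that the same key value always lands on the same shard, and a key with many distinct values spreads documents across all shards. A key with few distinct values would concentrate the load on one or two shards.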

Replica Sets:

• MongoDB uses replica sets to ensure high availability and fault tolerance.
• A replica set consists of a primary node (where write operations are directed) and
secondary nodes (which replicate the data for backup and fault tolerance).
• Automatic Failover: If the primary node goes down, one of the secondary nodes will
automatically become the new primary.

Load Balancing:

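• In a sharded cluster, the mongos router directs each query to the shards that hold the
relevant data, so concurrent reads and writes are spread across the cluster. Within a
replica set, writes go to the primary, and reads can optionally be sent to secondaries by
setting a read preference.
• Elastic Scalability: Shards and replica set members can be added or removed as data
loads change.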

2. Setting up MongoDB Environment

MongoDB can be set up in multiple environments depending on the use case and deployment
scenario. The three most common environments are MongoDB Shell, Compass, and Atlas.

MongoDB Shell Environment (Mongosh)

MongoDB Shell (Mongosh) is the command-line interface for interacting with MongoDB.

1. Install MongoDB:
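o Download and install MongoDB from the official MongoDB website. Depending on the platform
and version, the shell (mongosh) may be a separate download from the server.
o Ensure that MongoDB is added to the system's path for easy access from the
terminal or command prompt.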
2. Using the Shell (Mongosh):
o After installation, open the terminal or command prompt and run:

    mongosh

o This opens the MongoDB shell, where you can execute MongoDB commands,
run JavaScript code, and manage your database.
3. Basic Shell Commands:
o Show Databases:

    show dbs

o Create/Use a Database (the database is created when data is first written to it):

    use myDatabase

o Insert Data (insertOne replaces the deprecated insert method):

    db.myCollection.insertOne({ name: "John", age: 30 })

o Query Data:

    db.myCollection.find({ name: "John" })

MongoDB Compass Environment

MongoDB Compass is a graphical user interface (GUI) for MongoDB that provides an easier
way to visualize and manage data without using the command line.

1. Install MongoDB Compass:


o Download MongoDB Compass from the official MongoDB website.
o Follow the installation instructions for your operating system.
2. Connecting to MongoDB:
o Open Compass and connect to your MongoDB instance by providing the
connection string (e.g., mongodb://localhost:27017 for a local instance).
3. Features of Compass:
o Visualize Data: Browse through databases and collections, view documents in a
user-friendly way, and edit documents directly.
o Query Builder: Build and run queries without writing code.
o Indexing: View and manage indexes to improve query performance.
o Aggregation Pipeline Builder: Construct complex aggregation pipelines using
the GUI.
4. Sample Operations:
o Insert, update, and delete documents using the visual interface.
o Analyze query performance with the built-in performance analysis tools.

MongoDB Atlas Environment

MongoDB Atlas is MongoDB's fully managed cloud database service, which simplifies database
deployment and management.

1. Setting Up MongoDB Atlas:


o Sign up for an account at MongoDB Atlas.
o Once signed in, create a new cluster by following the step-by-step guide in Atlas.
o Atlas allows you to choose a cloud provider (e.g., AWS, Google Cloud, Azure)
and a region to host your database.
2. Connecting to MongoDB Atlas:
o After creating a cluster, MongoDB Atlas provides a connection string that can be
used to connect to the database from MongoDB Shell, Compass, or any
MongoDB client.
o Example connection string (replace the placeholders in angle brackets with your own
values):

    mongodb+srv://<username>:<password>@<cluster-host>/myDatabase?retryWrites=true&w=majority

3. Managing the Atlas Environment:


o Cluster Management: Easily scale up or down the cluster based on usage.
o Backup and Restore: Automated backups and restoration features are provided.
o Security Features: Atlas offers encryption, access control, IP whitelisting, and
two-factor authentication to secure your data.
o Monitoring and Alerts: Atlas includes a monitoring dashboard that tracks
database performance metrics like memory usage, CPU load, and query
performance.
4. Atlas Integration with Development Environments:
o Connect Atlas to development environments like Node.js, Python, or any other
programming language using MongoDB drivers.
o Atlas supports seamless integration with modern frameworks and tools for cloud-
native applications.
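
When the Atlas connection string is assembled in application code, the username and password need to be percent-encoded if they contain characters such as `@`, `/`, or `#`. The sketch below shows one way to do that; `buildAtlasUri`, the credentials, and the host `cluster0.example.mongodb.net` are made-up placeholders, not values from a real cluster.

```javascript
// Build an SRV connection string from its parts.
// All argument values used below are hypothetical placeholders.
function buildAtlasUri({ username, password, host, database }) {
  const user = encodeURIComponent(username); // escape reserved characters
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}/${database}?retryWrites=true&w=majority`;
}

const uri = buildAtlasUri({
  username: "appUser",
  password: "p@ss/word#1",              // contains characters that must be escaped
  host: "cluster0.example.mongodb.net", // not a real cluster address
  database: "myDatabase"
});

console.log(uri);
```

The resulting string can be passed to mongosh, Compass, or a MongoDB driver. In practice the password would come from an environment variable or a secrets store rather than from source code.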

Sample Setup for Each Environment

1. Shell Example:
o After starting mongosh, you can insert a document into a new collection:

    use school
    db.students.insertOne({ name: "Alice", age: 21, course: "Software Development" })
    db.students.find()

2. Compass Example:
o Use the GUI to visualize the students collection you created in the shell, and run
a query like:

    { "course": "Software Development" }

3. Atlas Example:
o Deploy a production-ready cluster on MongoDB Atlas, connect using the
connection string, and perform the same operations:

    use school
    db.students.insertOne({ name: "Bob", age: 23, course: "Data Science" })

Unit 2: Design NoSQL Database

2.1 Drawing tools are properly selected based on database requirements.

Selecting database drawing tools

1. Identify NoSQL Drawing Tools

Several tools are available to help visualize and draw NoSQL database structures, including
MongoDB. Here are some popular options:

• Hackolade

o Purpose-built for NoSQL databases, Hackolade provides a visual schema design interface for
MongoDB and other NoSQL databases. It helps in modeling collections, relationships, and
visualizing schema changes.
o Key Features: Supports schema generation, JSON schema, relationship diagrams, reverse
engineering from existing databases.

• Studio 3T

o Studio 3T is a professional IDE for MongoDB that includes a visual query builder, schema
explorer, and data modeling tools. It allows you to generate entity-relationship diagrams
(ERDs) directly from MongoDB collections.
o Key Features: Visual query builder, schema explorer, export schema diagrams,
data visualization.

• DbSchema

o DbSchema is a multi-database tool that provides visual design for MongoDB databases. It
offers schema design, ER diagrams, and visual query builders, making it suitable for
managing and exploring MongoDB collections.
o Key Features: Visual schema design, relational and NoSQL support, export
diagrams, version control for schema changes.

• Draw.io

o While Draw.io is a generic diagramming tool, it can be customized to create MongoDB or
NoSQL database diagrams. You can use it to manually design collection structures and
relationships.
o Key Features: Free, cloud-based, customizable templates, no specific NoSQL database
features.

• Lucidchart

o Another general-purpose diagramming tool, Lucidchart allows for database diagramming,
including MongoDB schema representations. While not as feature-rich for NoSQL databases,
it’s useful for simple visual representations.
o Key Features: Cloud-based, collaboration features, easy to use, customizable.

2. Installation of Edraw Max Drawing Tool

Edraw Max is a versatile diagramming tool that supports database diagrams, including
NoSQL databases like MongoDB. Here’s how you can install and use it:

Installation Steps:

1. Download Edraw Max:

o Go to the Edraw Max official website.

o Click on the "Download" button to get the installer for your operating system
(Windows, macOS, or Linux).

2. Run the Installer:

o After downloading, open the installer file.

o Follow the on-screen instructions to install the tool on your computer. It will
involve agreeing to the license agreement and choosing the installation directory.

3. Launch Edraw Max:

o Once installed, open Edraw Max from your desktop or start menu.

o You may need to create an account or sign in if you haven't already.

4. Select Database Diagrams:


o In Edraw Max, you can create database diagrams by navigating to New >
Database Modeling.

o Use the provided templates and tools to design your MongoDB database schema.

Key Features of Edraw Max for Database Design:

• Drag-and-drop interface: Easily create collections, relationships, and fields by dragging
objects into your workspace.

• Template library: Use pre-built database design templates or start from scratch.

• Collaboration: Share diagrams with team members and work collaboratively on database
designs.

• Export options: Export your diagrams as PNG, PDF, SVG, and more for easy sharing.

2.2 Conceptual Data Modeling is created based on the structure of the data and its
relationships.

● Creating a conceptual data model is an essential first step in database design. It represents
the entities, relationships, and data flow at a high level, without delving into the technical
details. Here's a detailed guide on how to approach this for a NoSQL database like
MongoDB:

1. Identify Collections

● In MongoDB, data is stored in collections, which are analogous to tables in a relational
database. To create a conceptual data model, you need to identify the key entities in your
application that will map to collections. Each entity or major concept in your application
should have its own collection.

• Example: For a university application, the collections might be:

o Students

o Courses

o Instructors

o Departments

● These collections represent major entities within your domain.

2. Model Entity Relationships


● Once the collections (entities) are identified, the next step is to define the relationships
between them. In MongoDB, relationships are usually modeled using:

• Embedding: Related data is stored within the same document.

o Example: A student document embeds a list of enrolled courses.

• Referencing: Related data is stored in different documents, with references (like foreign
keys in relational databases).

o Example: A student document references the instructor’s ID instead of embedding
the entire instructor’s information.

● You need to consider:

• One-to-One: Embed if data is always accessed together.

• One-to-Many: Embed or reference, depending on access patterns.

• Many-to-Many: Use references for flexibility.

● Example:

• A student can enroll in multiple courses, and each course has multiple students
(Many-to-Many relationship).

• One instructor can teach multiple courses (One-to-Many relationship).

• Many students and instructors belong to one department (Many-to-One relationship).
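
The difference between embedding and referencing described above can be sketched with plain JavaScript objects standing in for documents. All names, IDs, and field names are hypothetical, and the arrays stand in for collections; no database is involved.

```javascript
// Embedding: the address is stored inside the student document,
// so one read returns everything.
const embeddedStudent = {
  _id: "s1",
  name: "Alice",
  address: { city: "Kigali", district: "Gasabo" }
};

// Referencing: each course stores only the instructor's ID.
const instructors = [{ _id: "i1", name: "Dr. Mugisha" }];
const courses = [
  { _id: "c1", title: "NoSQL Databases", instructorId: "i1" },
  { _id: "c2", title: "Web Development", instructorId: "i1" }
];

// Following a reference takes a second lookup in the other collection.
function instructorOf(course) {
  return instructors.find(i => i._id === course.instructorId) ?? null;
}

// One-to-many: all course titles that reference the same instructor.
function coursesTaughtBy(instructorId) {
  return courses.filter(c => c.instructorId === instructorId).map(c => c.title);
}

console.log(embeddedStudent.address.city);
console.log(instructorOf(courses[0]).name);
console.log(coursesTaughtBy("i1"));
```

Embedding keeps related data together at the cost of duplication if the same data appears in many documents. Referencing avoids duplication, so the instructor's name is stored once, but reading it requires the extra lookup.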

3. Define Sharding and Replication

● When planning for scalability, you should define how your data will be distributed across
different servers. This includes sharding (splitting data across multiple nodes) and replication
(duplicating data across nodes for high availability).

• Sharding:

o In MongoDB, sharding is used to distribute large datasets across multiple servers.

o Identify collections that will grow large and may need sharding (e.g., "Students"
or "Courses" in a large university system).

o Choose a shard key that helps evenly distribute data across servers (e.g., student
IDs or course IDs).

• Replication:

o MongoDB uses replication to ensure high availability by duplicating data across
multiple servers.
o Replication is configured for the deployment as a whole: every member of a replica set
holds a copy of all its databases and collections. Plan the number and placement of replica
set members so that critical data such as "Students" and "Courses" stays available.

4. Visualize High-Level Data Model

● A Conceptual Data Model should be visualized using diagrams to represent the entities,
relationships, and data flow. Two common diagram types for visualizing NoSQL database models are:

UML Class Diagrams:

• UML (Unified Modeling Language) can be used to visually represent the entities
(collections) and their relationships.

• Each class in UML represents a collection in MongoDB, with attributes corresponding to
the fields in that collection.

• Associations between classes represent relationships (embedding or referencing).

● Example: [Figure: UML class diagram]

Data Flow Diagrams (DFDs):

• DFDs illustrate how data flows through the system. They represent the flow of
information between external entities, processes, and data stores (collections).
• In a MongoDB context, the data stores would represent the collections, and the processes
would represent how data is created, read, updated, and deleted (CRUD operations).

● Example: [Figure: data flow diagram]

5. Design a Conceptual Data Model

● Combining all the steps above, you can design a high-level conceptual data model. This
model will focus on the overall structure of your MongoDB collections, their relationships,
and how the data will be accessed and distributed.

● Steps to design the conceptual model:

• Identify collections: Determine key entities that need collections.

• Model relationships: Define how collections relate to each other (embed or reference).

• Sharding & Replication: Plan for scalability by defining shard keys for large collections
and deciding how many replica set members to deploy.

• Visualize the model: Use UML diagrams and DFDs to represent the data model.

Example Conceptual Data Model:

● For a university management system:

 Collections: Students, Courses, Instructors, Departments, Enrollments.

 Relationships:

o A student can enroll in multiple courses.

o Each course is taught by one instructor.

o Instructors and students are part of a department.

 Sharding: Use student ID as the shard key for the "Students" collection.

 Replication: Run the deployment as a replica set so that the "Students" and "Courses" data stays highly available.

● UML and DFD diagrams help in visualizing this model, capturing how entities interact and
how data flows through the system.
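
The university example above can also be written down as plain data before any diagram is drawn. The following is a minimal Python sketch and not part of the original module: the dictionary layout, the field name studentId, and the check_model helper are assumptions made for illustration, and nothing here talks to MongoDB.

```python
# Conceptual model of the university example, expressed as plain data.
# The layout of this dictionary is an assumption made for illustration.
conceptual_model = {
    "collections": ["Students", "Courses", "Instructors", "Departments", "Enrollments"],
    "relationships": [
        # (from, to, kind)
        ("Students", "Courses", "many-to-many via Enrollments"),
        ("Courses", "Instructors", "many-to-one"),
        ("Instructors", "Departments", "many-to-one"),
        ("Students", "Departments", "many-to-one"),
    ],
    # Hypothetical field name for the student ID shard key.
    "shard_keys": {"Students": "studentId"},
}


def check_model(model):
    """Return a list of problems, e.g. relationships naming unknown collections."""
    known = set(model["collections"])
    problems = []
    for source, target, _kind in model["relationships"]:
        for name in (source, target):
            if name not in known:
                problems.append(f"unknown collection: {name}")
    for name in model["shard_keys"]:
        if name not in known:
            problems.append(f"shard key defined for unknown collection: {name}")
    return problems


print(check_model(conceptual_model))
```

A check like this can catch a relationship or shard key that refers to a collection missing from the model before the diagrams are produced.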

2.3 Database Schema is clearly designed according to Mongoose.


Designing MongoDB Database Schema

When designing a MongoDB database schema, it’s essential to focus on factors that will
optimize performance, maintainability, and scalability. The structured approach below covers
identifying the workload, defining collections, mapping relationships, validating and
normalizing the schema, and applying design patterns.

1. Identify Application Workload

Understanding the application workload is crucial because MongoDB’s schema design should be
guided by how the data is accessed and used in the application. Consider the following:

 Read-heavy or write-heavy: Determine whether the application performs more reads or
writes. This will influence how you structure data and choose between embedding and
referencing.
 Access patterns: Analyze the most common queries. MongoDB schemas should be
designed to optimize for the most frequent access patterns.
o Example: If users frequently retrieve both students and their courses, it may be
beneficial to embed course information within the student document.
 Data size and growth: Estimate how much data will be stored now and in the future to
plan for scalability, sharding, and indexing strategies.

Example: For an e-commerce application:

 Workload: Mostly reads (users viewing product pages) but with significant write
operations during product creation and order placement.
 Frequent queries: Retrieving product information, fetching user orders, filtering products
by category.
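
One way to make the workload analysis concrete is to count operations from a sample of application activity. The sketch below is only an illustration: the operation log, the query names, and the summarize_workload helper are all hypothetical and would be replaced by real measurements from the application.

```python
from collections import Counter

# Hypothetical operation log: (operation type, query name).
operations = [
    ("read", "get_product"),
    ("read", "get_product"),
    ("read", "list_products_by_category"),
    ("read", "get_user_orders"),
    ("write", "create_order"),
    ("read", "get_product"),
    ("write", "create_product"),
]


def summarize_workload(ops):
    """Return the read share (0.0 to 1.0) and the queries ordered by frequency."""
    kinds = Counter(kind for kind, _ in ops)
    queries = Counter(name for _, name in ops)
    read_share = kinds["read"] / len(ops) if ops else 0.0
    return read_share, queries.most_common()


read_share, ranked = summarize_workload(operations)
print(f"read share: {read_share:.0%}")
print("most frequent query:", ranked[0][0])
```

The most frequent queries found this way are the ones the schema should serve with as few lookups as possible.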

2. Define Collection Structure

MongoDB uses collections to store documents (analogous to tables in relational databases).
Defining your collection structure involves deciding how to organize data.

Embedding vs. Referencing:

 Embedding: Used when related data is frequently accessed together and can be stored
within the same document.
o Example: An order document can embed the product details since they are
typically viewed together.
 Referencing: Used when related data needs to be separated for flexibility or when it’s
accessed independently.
o Example: A separate collection for users and another for their orders, with user
IDs referenced in the order documents.

Key Considerations:

 Document size limit: A MongoDB document can be at most 16 MB, so avoid embedding
large or unbounded datasets.
 Frequent updates: If an embedded document changes frequently, it may be better to
reference it instead, so that each change does not rewrite a large parent document.

Example Collection Structure for an e-commerce app:

 Users: name, email, address
 Products: name, description, price, category
 Orders: userId (reference to Users), embedded products with quantity, orderDate
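
The structure above can be sketched with plain Python dictionaries standing in for MongoDB documents. This is an illustrative sketch only: the _id values, the sample data, and the helpers build_order and order_total are assumptions, and the productId field kept next to the embedded details is an optional extra.

```python
from datetime import datetime, timezone

# Sample documents; the _id values and field contents are made up.
user = {"_id": "u1", "name": "Alice", "email": "alice@example.com", "address": "Kigali"}

products = {
    "p1": {"_id": "p1", "name": "Keyboard", "description": "USB keyboard",
           "price": 20.0, "category": "accessories"},
    "p2": {"_id": "p2", "name": "Mouse", "description": "Wireless mouse",
           "price": 15.0, "category": "accessories"},
}


def build_order(order_id, user_id, items, catalog):
    """Create an order that references the user and embeds product details."""
    embedded = []
    for product_id, quantity in items:
        product = catalog[product_id]
        embedded.append({
            "productId": product_id,      # reference back to Products
            "name": product["name"],      # embedded copy for fast reads
            "price": product["price"],    # price at the time of the order
            "quantity": quantity,
        })
    return {
        "_id": order_id,
        "userId": user_id,                # reference to Users, not a copy
        "products": embedded,
        "orderDate": datetime.now(timezone.utc),
    }


def order_total(order):
    """Sum price * quantity over the embedded products."""
    return sum(item["price"] * item["quantity"] for item in order["products"])


order = build_order("o1", user["_id"], [("p1", 2), ("p2", 1)], products)
print(order_total(order))
```

Because the product details are embedded, the order can be displayed without reading the Products collection, while the user's details are stored only once in Users.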

3. Map Schema Relationships

MongoDB supports flexible relationships that allow embedding, referencing, or a combination of
both depending on the access patterns. Typical relationships are:

 One-to-One: Embed the related document directly if it’s always accessed together.
o Example: A user profile with address details.
 One-to-Many:
o Embed if the “many” side is relatively small and frequently accessed with the
parent.
o Use references if the “many” side is large or frequently accessed independently.
o Example: A product can have multiple reviews, but the reviews may be stored in
a separate collection if there are a lot of them.
 Many-to-Many: Use references to maintain flexibility and avoid document bloat.
o Example: A many-to-many relationship between students and courses could be
managed through references with a separate collection (e.g., enrollments) to
track which students are enrolled in which courses.
Example:
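
A minimal sketch of the many-to-many case, assuming an enrollments collection that holds one document per (student, course) pair. Plain Python lists stand in for the collections, and all names, IDs, and helper functions are hypothetical.

```python
# Plain lists stand in for the three collections; all values are made up.
students = [{"_id": "s1", "name": "Aline"}, {"_id": "s2", "name": "Eric"}]
courses = [{"_id": "c1", "title": "NoSQL Databases"},
           {"_id": "c2", "title": "Web Development"}]

# One enrollment document per (student, course) pair, holding references only.
enrollments = [
    {"studentId": "s1", "courseId": "c1"},
    {"studentId": "s1", "courseId": "c2"},
    {"studentId": "s2", "courseId": "c1"},
]


def courses_for_student(student_id):
    """Follow the references from a student to the titles of their courses."""
    ids = {e["courseId"] for e in enrollments if e["studentId"] == student_id}
    return [c["title"] for c in courses if c["_id"] in ids]


def students_in_course(course_id):
    """Follow the references from a course to the names of enrolled students."""
    ids = {e["studentId"] for e in enrollments if e["courseId"] == course_id}
    return [s["name"] for s in students if s["_id"] in ids]


print(courses_for_student("s1"))
print(students_in_course("c1"))
```

Neither the student nor the course document grows as enrollments are added, which is the reason for keeping the link in its own collection.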

4. Validate and Normalize Schema

MongoDB supports flexible, schema-less designs, but using schema validation can help enforce
structure and consistency.

Validation: You can define validation rules to ensure data consistency and enforce constraints
like required fields, field types, etc.

 Example: Use the $jsonSchema operator to enforce schema validation.
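
The following sketch shows what a $jsonSchema validator document can look like, written as a Python dictionary. The field names and rules are assumptions chosen for illustration. The violations function is a deliberately simplified local stand-in that checks only required fields, a few types, and minimum; it is not MongoDB's validator, which runs on the server when the validator is attached to a collection.

```python
# Validator document in the shape used with $jsonSchema.
# The fields (name, email, age) are assumptions made for this example.
student_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Simplified mapping used only by the local check below.
PYTHON_TYPES = {"string": str, "int": int, "object": dict}


def violations(document, validator):
    """Local illustration of the rules above; not MongoDB's implementation."""
    schema = validator["$jsonSchema"]
    problems = [f"missing required field: {name}"
                for name in schema.get("required", []) if name not in document]
    for name, rules in schema.get("properties", {}).items():
        if name not in document:
            continue
        value = document[name]
        expected = PYTHON_TYPES[rules["bsonType"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{name} must be of type {rules['bsonType']}")
        elif "minimum" in rules and value < rules["minimum"]:
            problems.append(f"{name} must be >= {rules['minimum']}")
    return problems


print(violations({"name": "Aline", "email": "aline@example.com", "age": 20}, student_validator))
print(violations({"name": "Eric", "age": -1}, student_validator))
```

With a driver such as PyMongo, a dictionary like student_validator would typically be passed as the validator option when the collection is created, so that the server rejects documents that break the rules.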

Normalization: MongoDB allows denormalization (embedding) to improve read
performance, but you should avoid unnecessary data duplication, especially if the data changes
frequently. Normalize (store the data once and reference it) where data integrity is critical.

Example:

 Embed the product details (like name and price) inside the order document, but reference
the user by userId to avoid duplicating user information in every order.

5. Apply Design Patterns

MongoDB has several design patterns that can be applied to optimize schema design:

 Extended Reference Pattern: Use this pattern to partially embed documents and also
maintain references for flexibility.
o Example: For each order, embed basic product details (name, price) for faster
access, but also store a reference (productId) to the full product document.
 Bucket Pattern: This is used to group data into fixed-size "buckets" for performance
reasons. It’s especially useful for time-series data.
o Example: In a logging system, you could store logs grouped by hour in a single
document (bucket).
 Subset Pattern: Store frequently accessed data as a subset inside the document, while
less frequently accessed data is referenced elsewhere.
o Example: In a blog post, store the latest comments inside the post document but
reference the full comment history in a separate collection.

Example:
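
The Bucket and Subset patterns can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed implementation: the field names (timestamp, postedAt, entries, count), the sample data, and both helper functions are hypothetical, and plain dictionaries stand in for MongoDB documents.

```python
from collections import defaultdict
from datetime import datetime


def bucket_logs_by_hour(logs):
    """Bucket pattern: group log entries into one document per hour."""
    buckets = defaultdict(list)
    for entry in logs:
        hour = entry["timestamp"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(entry)
    return [{"hour": hour, "count": len(entries), "entries": entries}
            for hour, entries in sorted(buckets.items())]


def latest_comments_subset(comments, limit=3):
    """Subset pattern: keep only the newest comments inside the post document."""
    newest_first = sorted(comments, key=lambda c: c["postedAt"], reverse=True)
    return newest_first[:limit]


# Made-up sample data.
logs = [
    {"timestamp": datetime(2024, 2, 1, 10, 5), "message": "login"},
    {"timestamp": datetime(2024, 2, 1, 10, 40), "message": "view page"},
    {"timestamp": datetime(2024, 2, 1, 11, 15), "message": "logout"},
]
comments = [
    {"text": "first", "postedAt": datetime(2024, 2, 1, 9, 0)},
    {"text": "second", "postedAt": datetime(2024, 2, 1, 9, 30)},
    {"text": "third", "postedAt": datetime(2024, 2, 1, 10, 0)},
]

buckets = bucket_logs_by_hour(logs)
post = {"title": "Hello", "latestComments": latest_comments_subset(comments, limit=2)}
print([b["count"] for b in buckets])
print([c["text"] for c in post["latestComments"]])
```

In the blog example, the full comment history would still live in its own collection; only the small latestComments list travels with the post.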
