NONSQL-DATABASE_NOTE

The document outlines a module for developing NoSQL databases as part of a TVET Level 5 curriculum in Software Development, detailing the knowledge, skills, and attitudes required. It includes learning outcomes, assessment methods, and performance criteria for preparing, designing, implementing, and managing NoSQL databases. Key concepts covered include database requirements, types of NoSQL databases, data modeling, and validation processes.

Uploaded by

keza loenah

NoSQL DATABASE DEVELOPMENT

SWDND501
BDCPC301 - Develop NoSQL Database
Trainer: Samie TWAHIRWA

Competence
RQF Level: 5
Learning Hours: 60
Credits: 6

Sector: ICT & MULTIMEDIA

Trade: SOFTWARE DEVELOPMENT

Module Type: Specific

Curriculum: ICTSWD5001-TVET Certificate V in Software Development

Copyright: © Rwanda TVET Board, 2024

Issue Date: February 2024

Purpose statement: This specific module describes the knowledge, skills, and attitudes required to
develop NoSQL databases. It is intended for students pursuing TVET Level 5 in Software
Development. At the end of this module, the student will be able to prepare the database
environment, design the database, implement the database, and manage the database.

Learning assumed to be in place: N/A

Delivery modality

Training delivery: 100%. Assessment total: 100%.

Theoretical content: 30% of training delivery, 30% of assessment
Practical work: 70% of training delivery, 70% of assessment
o Group project and presentation: 20%
o Individual project/work: 50%
Formative assessment: 50%
Summative assessment: 50%

Elements of Competence and Performance Criteria

1. Prepare database environment
1.1 Database requirements are properly identified based on user requirements
1.2 Database is clearly analysed based on database requirements.
1.3 Database environment is successfully prepared based on established standards.

2. Design database
2.1 Drawing tools are properly selected based on database requirements.
2.2 Conceptual Data Modeling is created based on the structure of the data and its relationships.
2.3 Database Schema is clearly designed according to Mongoose.

3. Implement database design
3.1 MongoDB data definition are properly performed based on database requirements
3.2 MongoDB data manipulation are properly performed based on database requirements
3.3 Query optimizations are properly applied based on query performance.

4. Manage Database
4.1 Database users are effectively managed with appropriate permissions.
4.2 Database is effectively secured in line with best practices.
4.3 Database is successfully deployed based on the targeted environment.

Unit 1: Prepare database environment
Learning Outcome 1.1: Database requirements are properly identified based on user
requirements

● Identifying Database requirements

Key Terms Definitions
✔ NoSQL (Not Only SQL):
A type of database designed to handle unstructured, semi-structured, or distributed data that
doesn’t fit well into traditional relational databases. NoSQL databases are flexible, scalable, and
capable of handling large volumes of diverse data. Examples include MongoDB, Cassandra, Redis,
and CouchDB. Common NoSQL models include key-value, document-based, column-family, and
graph databases.
✔ MongoDB:
A document-oriented NoSQL database that stores data in flexible, JSON-like documents. It is
schema-less, allowing for high flexibility in data models, and is well-suited for handling large
amounts of data across distributed systems.
✔ Availability:
A database's ability to remain accessible and operational, even during failures or high traffic. In
NoSQL databases, availability is often prioritized over consistency, especially in distributed
systems (as per the CAP theorem), ensuring that the database continues to respond to requests.
✔ Documents:
In MongoDB, a document is a record or a data unit stored in BSON (Binary JSON) format. A
document consists of fields (key-value pairs) similar to JSON objects. Each document is analogous
to a row in a relational database, but it can have a dynamic schema where fields can vary between
documents.
✔ Collection:
A group of MongoDB documents. A collection is analogous to a table in relational databases, but
unlike tables, a MongoDB collection does not enforce a rigid schema. Documents within a
collection can have varying structures.
✔ Indexing:
A technique used to speed up data retrieval operations in a database. Indexes create an ordered
data structure that allows the database to quickly locate and access data without scanning the
entire dataset. MongoDB supports various types of indexes, including single-field, compound,
and geospatial indexes.
✔ Optimistic Locking:
A concurrency control mechanism where a transaction does not lock data when reading it.
Instead, data is checked for modifications before committing changes. If another transaction has
modified the data, the current transaction will fail, and the user must retry the operation. It’s
often used in distributed systems to avoid locking and ensure high performance.
✔ Relationships:
In MongoDB, relationships between documents can be modeled using embedding (storing
related documents inside one another) or referencing (storing references to other documents).
MongoDB has no SQL-style JOIN clause; referenced documents are combined either in the
application or with the $lookup stage of an aggregation pipeline.
✔ Data Model:
A blueprint for how data is stored, organized, and manipulated in a database. In MongoDB, the
data model is more flexible compared to relational databases, allowing for dynamic schema
changes and the ability to store diverse types of data within the same collection.
✔ Schema:
The structure or organization of data in a database. In relational databases, a schema is rigid and
defines how tables, fields, and relationships are structured. In MongoDB, schemas are flexible,
meaning each document in a collection can have a different structure or set of fields. Schema
validation can be applied in MongoDB to enforce certain rules on data.
✔ Mongosh (MongoDB Shell):
The interactive command-line interface used to interact with a MongoDB instance. Mongosh
allows users to perform administrative tasks, manage databases, query data, and execute
JavaScript within the MongoDB environment.
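
The optimistic locking entry above can be made concrete with a small sketch. It is not MongoDB code: it uses an in-memory Map as a stand-in for a collection, and the `version` field, the `read` and `updateIfUnchanged` helpers, and the sample account are all assumptions made for illustration. The point is only that a write succeeds when the version read earlier is still the stored version.

```javascript
// In-memory stand-in for a collection; `version` is a hypothetical field name.
const store = new Map([["acc1", { _id: "acc1", balance: 100, version: 1 }]]);

// Read without taking a lock; return a copy so callers cannot mutate the store.
function read(id) {
  return { ...store.get(id) };
}

// Write only if the stored version still matches the version that was read.
function updateIfUnchanged(id, expectedVersion, changes) {
  const current = store.get(id);
  if (current.version !== expectedVersion) {
    return false; // another writer got there first: the caller must re-read and retry
  }
  store.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Two clients read the same version, then both try to write.
const clientA = read("acc1");
const clientB = read("acc1");
const firstWrite = updateIfUnchanged("acc1", clientA.version, { balance: clientA.balance - 30 });
const secondWrite = updateIfUnchanged("acc1", clientB.version, { balance: clientB.balance - 50 });

console.log(firstWrite, secondWrite, store.get("acc1"));
```

In MongoDB the same idea is commonly expressed by putting the expected version into the filter of an update (for example, an `updateOne` whose filter includes the version and whose update uses `$set` and `$inc`), so that a stale write matches no document.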

Identifying User Requirements

When identifying user requirements for a database system, it’s essential to gather detailed
information about the user's needs. This process involves understanding the type of data, the
volume of data, and how users will interact with the database. Key questions to ask include:

 What kind of data will be stored? (Structured, unstructured, or semi-structured?)

 How much data is expected to be stored and how quickly will it grow?
 How will users access and update the data? (CRUD operations: Create, Read, Update,
Delete)
 What are the performance expectations? (e.g., response time, scalability)
 Are there any security or privacy requirements?
 What kind of availability and consistency is expected?
Once these requirements are collected, they guide the choice of database type (SQL vs. NoSQL)
and the database's structure.

Characteristics of Collections (in MongoDB)

 Flexible Schema: Collections in MongoDB do not enforce a strict schema, meaning each
document (record) in a collection can have a different structure, making it easy to store
diverse data types.
 Grouping of Documents: A collection is a grouping of documents, where each
document represents an individual record (similar to a row in relational databases).
 Indexing Support: Collections can be indexed to improve the speed of queries.
 Sharding and Replication: Collections can be sharded across multiple servers for
scalability and replicated for high availability.
 Storage of Similar Documents: While flexible, collections typically store documents
that have similar fields or serve similar purposes.
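
The flexible-schema and indexing points above can be sketched in plain JavaScript. The sample documents and the `scan` and `buildIndex` helpers are hypothetical, and the Map-based index is a deliberate simplification (MongoDB's own indexes are ordered B-tree structures). The sketch only shows why a lookup through an index touches fewer documents than a full scan.

```javascript
// Hypothetical sample documents; note that the fields vary between documents.
const students = [
  { _id: 1, name: "Alice", course: "Software Development" },
  { _id: 2, name: "Bob", course: "Data Science", age: 23 },
  { _id: 3, name: "Carol", course: "Software Development", email: "carol@example.com" },
];

// Without an index: every document is examined (a collection scan).
function scan(docs, field, value) {
  let examined = 0;
  const matches = docs.filter((doc) => {
    examined += 1;
    return doc[field] === value;
  });
  return { matches, examined };
}

// Simplified index: maps each field value to the documents that contain it.
// Real MongoDB indexes are ordered B-tree structures; a Map is only a stand-in.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!(field in doc)) continue; // documents without the field are skipped here
    const bucket = index.get(doc[field]) ?? [];
    bucket.push(doc);
    index.set(doc[field], bucket);
  }
  return index;
}

const scanned = scan(students, "course", "Software Development");
const courseIndex = buildIndex(students, "course");
const indexed = courseIndex.get("Software Development") ?? [];

console.log(scanned.examined, scanned.matches.length, indexed.length);
```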

Features of NoSQL Databases

 Schema-less Data Storage: NoSQL databases are not rigid with schema design, which
allows for storing structured, semi-structured, or unstructured data.
 Horizontal Scalability: NoSQL databases are designed to scale out by distributing data
across multiple nodes (sharding), making them ideal for handling large amounts of data.
 High Availability: NoSQL databases prioritize availability by replicating data across
multiple nodes, which helps avoid downtime during network or server failures.
 Eventual Consistency: In distributed systems, NoSQL databases often focus on
eventual consistency, meaning that data will become consistent across nodes after some
time.
 Optimized for Large-Scale Data: NoSQL databases are designed for large datasets and high-
velocity data, workloads that are hard to scale out on a single-server relational database.
 Handling of Unstructured Data: NoSQL databases can manage and store unstructured
or semi-structured data like JSON, XML, and multimedia files.

Types of NoSQL Databases

1. Key-Value Stores:
o Data is stored as key-value pairs, where each key is unique and maps to a specific
value.
o Example: Redis, Amazon DynamoDB.
2. Document-Oriented Databases:
o Data is stored in documents, typically in JSON, BSON, or XML format, and each
document is semi-structured.
o Example: MongoDB, CouchDB.
3. Column-Family Stores:
o Data is stored in tables but organized by columns instead of rows, making it
efficient for reading/writing large datasets.
o Example: Apache Cassandra, HBase.
4. Graph Databases:
o Designed to store data in nodes and edges, representing relationships between
data points, making them ideal for social networks and recommendation systems.
o Example: Neo4j, Amazon Neptune.
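
To compare the four models side by side, the sketch below stores roughly the same information in each shape using plain JavaScript structures. It does not use any of the products named above; the keys, the course code "SWD501", and the variable names are assumptions chosen for illustration.

```javascript
// Key-value: an opaque value looked up by a unique key.
const keyValue = new Map([["user:42", JSON.stringify({ name: "Alice" })]]);

// Document: a self-describing record that can nest arrays and objects.
const documentStore = [
  { _id: 42, name: "Alice", courses: [{ code: "SWD501", title: "NoSQL" }] },
];

// Column-family: row key -> column family -> columns.
const columnFamily = {
  "42": { profile: { name: "Alice" }, enrollment: { SWD501: "2024" } },
};

// Graph: nodes plus edges that describe relationships between them.
const graph = {
  nodes: [{ id: "alice" }, { id: "bob" }],
  edges: [{ from: "alice", to: "bob", type: "FOLLOWS" }],
};

// The same question ("what is user 42's name?") is answered differently per model.
const fromKeyValue = JSON.parse(keyValue.get("user:42")).name;
const fromDocument = documentStore.find((doc) => doc._id === 42).name;
const fromColumns = columnFamily["42"].profile.name;

// A relationship question is the natural fit for the graph shape.
const followedByAlice = graph.edges
  .filter((edge) => edge.from === "alice" && edge.type === "FOLLOWS")
  .map((edge) => edge.to);

console.log(fromKeyValue, fromDocument, fromColumns, followedByAlice);
```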

Data Types in NoSQL Databases

 String: A sequence of characters (e.g., text fields, names, and identifiers).

 Number (Integer/Float): Numeric values for calculations and measurements (e.g., age,
product price).
 Boolean: True or false values.
 Array: A collection of elements, such as a list of values (e.g., an array of product IDs).
 Object: A nested structure representing an entity, often used in document-oriented
databases.
 Binary Data: Images, audio, video, or other binary data formats.
 Geospatial Data: Data used for geographic coordinates, supported by certain NoSQL
databases like MongoDB with geospatial queries.
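
As a rough illustration of the data types listed above, the sketch below builds one document that uses each of them. The product, its field names, and the coordinates are made up for the example; the `describe` helper is a hypothetical function that reports the JavaScript-level type and does not reflect BSON type names.

```javascript
// One hypothetical document that uses every data type from the list above.
const product = {
  name: "Laptop", // String
  price: 799.99, // Number (float)
  stock: 12, // Number (integer)
  available: true, // Boolean
  tags: ["electronics", "computers"], // Array
  supplier: { name: "Acme", country: "RW" }, // Object (nested document)
  thumbnail: new Uint8Array([137, 80, 78, 71]), // Binary data (first bytes of a PNG file)
  warehouse: { type: "Point", coordinates: [30.0619, -1.9441] }, // GeoJSON: [longitude, latitude]
};

// Report a coarse type label for a value (JavaScript-level, not BSON names).
function describe(value) {
  if (Array.isArray(value)) return "array";
  if (value instanceof Uint8Array) return "binary";
  if (value !== null && typeof value === "object") return "object";
  return typeof value;
}

const types = Object.fromEntries(
  Object.entries(product).map(([field, value]) => [field, describe(value)])
);

console.log(types);
```

Note that GeoJSON points list longitude before latitude, which is easy to get backwards.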

Defining Use Cases

Use cases describe how users will interact with the system, detailing the actions users perform to
achieve a specific goal. For NoSQL databases, use cases help define the data models and
operations.

1. E-Commerce Platform:
o Users: Shoppers, sellers, and admins.
o Actions: Browse products, add items to the cart, place orders, view order history.
o Database Operations: Store product catalogs (key-value), manage user profiles
(documents), and track transactions (document collections).
2. Social Media Application:
o Users: General users and administrators.
o Actions: Post updates, like and comment on posts, follow/unfollow other users.
o Database Operations: Store user profiles (documents), manage relationships
between users (graph database), and manage posts and comments (documents).
3. IoT Data Management:
o Users: Device operators and data analysts.
o Actions: Monitor real-time sensor data, store historical data, trigger alerts based
on thresholds.
o Database Operations: Store time-series data (key-value or column-family),
analyze patterns in sensor data (graph or document).

L.O: 1.2 Database is clearly analysed based on database requirements.

Analyzing NoSQL Database

NoSQL databases are highly flexible, but it's essential to conduct a thorough analysis before
implementing them. This analysis includes a comprehensive understanding of system
requirements, data types, scalability needs, and user requirements.

Requirements Analysis Process for NoSQL Databases

The requirements analysis process helps determine the appropriate database structure,
performance, and functionality to meet user needs. The process typically includes the following
steps:

1. Identify Key Stakeholders and End-Users

 Key Stakeholders: Individuals or groups who have a vested interest in the project, such
as:
o Business Leaders: Define the business goals, timelines, and budget.
o IT/Database Administrators: Oversee the database design, performance, and
security.
o Developers: Design and implement the database schema and queries.
o End-Users: Individuals who will interact with the system daily. They provide
crucial input on what the database should achieve (e.g., sales teams, data analysts,
customers using an app).
 End-Users’ Expectations: Gather insights from users about how they expect the
database to work, such as ease of data retrieval, scalability, and the types of queries they
need to perform.
2. Capture Requirements

 Gather Functional Requirements:

o What are the core functions of the database? For example, storing customer data,
retrieving product information, and processing large-scale analytics.
o Identify the specific data types the system needs to store (e.g., unstructured data
like JSON, media files, logs).
o Understand access patterns: Will users mostly be reading or writing data? How
frequently?
 Non-Functional Requirements:
o Performance: What are the expected response times for data queries?
o Scalability: How much data will the system need to handle, both now and in the
future? Will it need to scale out to handle high traffic or large data volumes?
o Availability: How critical is 24/7 uptime for the system? What level of fault
tolerance is required?
o Security: What are the data privacy and security concerns? Are there regulatory
requirements (e.g., GDPR, HIPAA)?

3. Categorize Requirements

 Functional vs. Non-Functional Requirements:

o Functional Requirements: Define what the database must do (e.g., support
CRUD operations, support querying large datasets, ensure quick data retrieval).
o Non-Functional Requirements: Define quality attributes such as performance,
scalability, security, and data consistency.
 Priority Levels:
o Must-Have: Essential requirements for system success.
o Should-Have: Important but not critical features.
o Nice-to-Have: Features that improve the system but are not essential.

4. Interpret and Record Requirements

 Data Modeling: Once the requirements are captured, create data models. In the case of
NoSQL, data modeling may involve:
o Defining collections or tables and their structure.
o Understanding relationships between entities (e.g., embedding vs. referencing
documents in MongoDB).
o Selecting a database that matches the use case (e.g., choosing a document-
oriented database like MongoDB for semi-structured data).
 Document Requirements: Store all gathered requirements in a formal document that
outlines how the system will handle data, scalability, and access.
5. Validate Requirements

 Review Sessions with Stakeholders:

o Validate that the captured requirements accurately reflect the expectations of all
stakeholders.
o Conduct review sessions where stakeholders confirm whether the database design
meets business and technical needs.
 Prototyping: Consider building a prototype of the NoSQL database to test whether it
meets user requirements before full-scale implementation.

Perform Data Analysis

Data analysis involves understanding the types of data that will be stored in the database, the
relationships between the data, and how the data will be used.

 Identify Data Types: Analyze the data that the system will manage (e.g., documents,
multimedia files, JSON objects, log data). Ensure that the database selected can
efficiently store and process this data.
 Data Patterns: Determine how the data will be accessed. For instance, document-based
databases like MongoDB excel at handling semi-structured or unstructured data such as
JSON files, while key-value stores like Redis are optimal for fast retrieval of single
values.
 Analyze Relationships: NoSQL databases handle relationships differently compared to
relational databases:
o Embedding: Store related data in a single document.
o Referencing: Use a reference to link separate documents.
 Query Requirements: Understand the types of queries users will run. For example, if
complex relationships between entities are involved, a graph database (e.g., Neo4j) may
be more appropriate.

Implement Data Validation

Data validation ensures that the data entering the database conforms to the expected format,
structure, and constraints, even in flexible NoSQL databases.

 Schema Validation: Even though NoSQL databases like MongoDB are schema-less,
they offer schema validation to ensure that inserted documents meet specific conditions
(e.g., required fields, field types).
o Example in MongoDB: You can define JSON schema rules to enforce the
structure of documents.
 Constraints:
o Required Fields: Ensure that certain fields (e.g., user_id, email) are always
present in the document.
o Data Types: Enforce that fields conform to a specific data type (e.g., a field must
be a string, number, or array).
o Range Validation: Ensure that numeric or date values fall within expected
ranges.
 Data Integrity: Since NoSQL databases prioritize availability over consistency in some
cases, ensure that the system includes proper mechanisms for validating data integrity,
such as:
o Optimistic Locking: Avoids conflicts during concurrent updates.
o Consistency Checks: Run periodic checks to ensure the data is synchronized and
valid across multiple nodes.
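
The required-field, data-type, and range constraints above can be sketched as a rule object in the shape that MongoDB's `$jsonSchema` validator accepts. The `userSchema` contents and the local `validate` function are assumptions for illustration: the function is a much-simplified checker that runs without a database and is not MongoDB's own validator.

```javascript
// Validation rules in the shape accepted by MongoDB's $jsonSchema validator.
const userSchema = {
  bsonType: "object",
  required: ["user_id", "email"],
  properties: {
    user_id: { bsonType: "string" },
    email: { bsonType: "string" },
    age: { bsonType: "int", minimum: 0, maximum: 150 },
  },
};
// In mongosh the rules would be attached roughly like this:
//   db.createCollection("users", { validator: { $jsonSchema: userSchema } })

// Simplified local checker (an illustration only, not MongoDB's validator).
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required) {
    if (!(field in doc)) errors.push(`missing required field: ${field}`);
  }
  for (const [field, rule] of Object.entries(schema.properties)) {
    if (!(field in doc)) continue;
    const value = doc[field];
    const typeOk =
      (rule.bsonType === "string" && typeof value === "string") ||
      (rule.bsonType === "int" && Number.isInteger(value));
    if (!typeOk) {
      errors.push(`${field} must be ${rule.bsonType}`);
      continue;
    }
    if (rule.minimum !== undefined && value < rule.minimum) errors.push(`${field} below minimum`);
    if (rule.maximum !== undefined && value > rule.maximum) errors.push(`${field} above maximum`);
  }
  return errors;
}

const ok = validate({ user_id: "u1", email: "a@example.com", age: 30 }, userSchema);
const bad = validate({ user_id: "u2", age: 200 }, userSchema);

console.log(ok, bad);
```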

L.O: 1.3 Database environment is successfully prepared based on established standards.

Preparing Database Environment for MongoDB

Setting up the MongoDB environment involves ensuring that the database is configured for
optimal performance, scalability, and usability. This process includes setting up the necessary
tools, environments, and configurations for both development and production use.

1. Identifying the Scalability of MongoDB

MongoDB is known for its horizontal scalability, which allows it to handle increasing data
volumes by distributing data across multiple servers. Here are the key aspects of MongoDB's
scalability:

Sharding:

 MongoDB uses sharding to partition data across multiple servers. This ensures that large
datasets can be distributed and processed efficiently.
 Shard Key: A key is chosen to distribute data, ensuring an even load across the cluster.
 Horizontal Scalability: New nodes (servers) can be added to handle increased
workloads; MongoDB then rebalances data across the shards in the background.
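
The role of the shard key can be sketched with a toy example that assigns documents to shards by hashing one field. Everything here is an assumption for illustration: `toyHash` is not the hash function MongoDB uses, `shardFor` is a hypothetical helper, and a real cluster partitions data into ranges of shard key values and routes requests through a query router rather than taking a simple modulo.

```javascript
// Toy string hash (an assumption; MongoDB's hashed sharding uses its own function).
function toyHash(text) {
  let hash = 0;
  for (const ch of text) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep the value in 32-bit unsigned range
  }
  return hash;
}

// Pick a shard number from the value of the chosen shard key field.
function shardFor(doc, shardKey, shardCount) {
  return toyHash(String(doc[shardKey])) % shardCount;
}

// Distribute nine hypothetical student documents over three shards.
const shards = [[], [], []];
for (let i = 1; i <= 9; i += 1) {
  const doc = { studentId: `S${i}`, name: `Student ${i}` };
  shards[shardFor(doc, "studentId", shards.length)].push(doc);
}

console.log(shards.map((shard) => shard.length));
```

Because the shard is derived only from the shard key value, a query that includes the shard key can be sent to a single shard, while a query without it has to ask every shard.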

Replica Sets:

 MongoDB uses replica sets to ensure high availability and fault tolerance.
 A replica set consists of a primary node (where write operations are directed) and
secondary nodes (which replicate the data for backup and fault tolerance).
 Automatic Failover: If the primary node goes down, one of the secondary nodes will
automatically become the new primary.

Load Balancing:

 In a sharded cluster, the mongos router directs each query to the shards that hold the
relevant data, and reads can optionally be served by replica set secondaries, so the system
can handle a large number of concurrent read and write operations.
 Elastic Scalability: Shards and replica set members can be added or removed as data
loads change.

2. Setting up MongoDB Environment

MongoDB can be set up in multiple environments depending on the use case and deployment
scenario. The three most common environments are MongoDB Shell, Compass, and Atlas.

MongoDB Shell Environment (Mongosh)

MongoDB Shell (Mongosh) is the command-line interface for interacting with MongoDB.

1. Install MongoDB:
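o Download and install MongoDB from the official MongoDB website. Depending on the
installer and version, mongosh may need to be downloaded and installed separately.
o Ensure that the MongoDB and mongosh executables are on the system's PATH for easy
access from the terminal or command prompt.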
2. Using the Shell (Mongosh):
o After installation, open the terminal or command prompt and run:

mongosh

o This opens the MongoDB shell, where you can execute MongoDB commands,
run JavaScript code, and manage your database.
3. Basic Shell Commands:
o Show Databases:

show dbs

o Create/Use a Database:

use myDatabase
o Insert Data:

db.myCollection.insertOne({ name: "John", age: 30 })

o Query Data:

db.myCollection.find({ name: "John" })
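
To show what the insert and query commands above do without needing a running server, here is a small in-memory sketch. `TinyCollection` is a hypothetical class written for this example, not part of MongoDB: it supports only equality filters and uses a counter for `_id`, whereas MongoDB generates an ObjectId by default.

```javascript
// Hypothetical in-memory stand-in for a collection; only equality filters are supported.
class TinyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }

  // Store a copy of the document and assign an _id if none was given.
  insertOne(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return { acknowledged: true, insertedId: stored._id };
  }

  // Return every document whose fields equal all key-value pairs in the filter.
  find(filter = {}) {
    return this.docs.filter((doc) =>
      Object.entries(filter).every(([key, value]) => doc[key] === value)
    );
  }
}

const myCollection = new TinyCollection();
myCollection.insertOne({ name: "John", age: 30 });
myCollection.insertOne({ name: "Mary", age: 25 });

const johns = myCollection.find({ name: "John" });
const everyone = myCollection.find();

console.log(johns, everyone.length);
```

An empty filter matches every document, which is why calling `find()` with no arguments lists the whole collection.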

MongoDB Compass Environment

MongoDB Compass is a graphical user interface (GUI) for MongoDB that provides an easier
way to visualize and manage data without using the command line.

1. Install MongoDB Compass:

o Download MongoDB Compass from the official MongoDB website.
o Follow the installation instructions for your operating system.
2. Connecting to MongoDB:
o Open Compass and connect to your MongoDB instance by providing the
connection string (e.g., mongodb://localhost:27017 for a local instance).
3. Features of Compass:
o Visualize Data: Browse through databases and collections, view documents in a
user-friendly way, and edit documents directly.
o Query Builder: Build and run queries without writing code.
o Indexing: View and manage indexes to improve query performance.
o Aggregation Pipeline Builder: Construct complex aggregation pipelines using
the GUI.
4. Sample Operations:
o Insert, update, and delete documents using the visual interface.
o Analyze query performance with the built-in performance analysis tools.

MongoDB Atlas Environment

MongoDB Atlas is MongoDB's fully managed cloud database service, which simplifies database
deployment and management.

1. Setting Up MongoDB Atlas:

o Sign up for an account at MongoDB Atlas.
o Once signed in, create a new cluster by following the step-by-step guide in Atlas.
o Atlas allows you to choose a cloud provider (e.g., AWS, Google Cloud, Azure)
and a region to host your database.
2. Connecting to MongoDB Atlas:
o After creating a cluster, MongoDB Atlas provides a connection string that can be
used to connect to the database from MongoDB Shell, Compass, or any
MongoDB client.
o Example connection string:

mongodb+srv://<username>:<password>@<cluster-host>/myDatabase?retryWrites=true&w=majority

3. Managing the Atlas Environment:

o Cluster Management: Easily scale up or down the cluster based on usage.
o Backup and Restore: Automated backups and restoration features are provided.
o Security Features: Atlas offers encryption, access control, IP whitelisting, and
two-factor authentication to secure your data.
o Monitoring and Alerts: Atlas includes a monitoring dashboard that tracks
database performance metrics like memory usage, CPU load, and query
performance.
4. Atlas Integration with Development Environments:
o Connect Atlas to development environments like Node.js, Python, or any other
programming language using MongoDB drivers.
o Atlas supports seamless integration with modern frameworks and tools for cloud-
native applications.
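
The connection string format from step 2 above can be assembled programmatically. The sketch below is one possible helper, not an official API: `buildAtlasUri` is a hypothetical function, and the user name, password, and host are placeholders rather than real credentials. It shows that special characters in credentials have to be percent-encoded before they are placed in the URI.

```javascript
// All argument values used below are placeholders, not real credentials or hosts.
function buildAtlasUri({ username, password, host, database, options = {} }) {
  // Characters such as @, : and / in credentials must be percent-encoded.
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  const query = new URLSearchParams(options).toString();
  return `mongodb+srv://${credentials}@${host}/${database}${query ? `?${query}` : ""}`;
}

const uri = buildAtlasUri({
  username: "appUser",
  password: "p@ss/word",
  host: "cluster0.example.mongodb.net",
  database: "myDatabase",
  options: { retryWrites: "true", w: "majority" },
});

console.log(uri);
```

In practice the password would come from an environment variable or a secrets manager rather than appearing in source code.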

Sample Setup for Each Environment

1. Shell Example:
o After starting mongosh, you can insert a document into a new collection:

use school
db.students.insertOne({ name: "Alice", age: 21, course: "Software Development" })
db.students.find()

2. Compass Example:
o Use the GUI to visualize the students collection you created in the shell, and run
a query like:

{ "course": "Software Development" }

3. Atlas Example:
o Deploy a production-ready cluster on MongoDB Atlas, connect using the
connection string, and perform the same operations:

use school
db.students.insertOne({ name: "Bob", age: 23, course: "Data Science" })

Unit 2: Design NoSQL Database

2.1 Drawing tools are properly selected based on database requirements.

Selecting Database Drawing Tools

1. Identify NoSQL Drawing Tools

Several tools are available to help visualize and draw NoSQL database structures, including
MongoDB. Here are some popular options:

 Hackolade

o Purpose-built for NoSQL databases, Hackolade provides a visual schema design interface for MongoDB and other NoSQL databases. It helps with modeling collections and relationships and with visualizing schema changes.

o Key Features: Schema generation, JSON Schema support, relationship diagrams, reverse engineering from existing databases.

 Studio 3T

o Studio 3T is a professional IDE for MongoDB that includes a visual query builder, schema explorer, and data modeling tools. It allows you to generate entity-relationship diagrams (ERDs) directly from MongoDB collections.

o Key Features: Visual query builder, schema explorer, export of schema diagrams, data visualization.

 DbSchema

o DbSchema is a multi-database tool that provides visual design for MongoDB databases. It offers schema design, ER diagrams, and visual query builders, making it suitable for managing and exploring MongoDB collections.

o Key Features: Visual schema design, relational and NoSQL support, export of diagrams, version control for schema changes.

 Draw.io

o Draw.io is a generic diagramming tool, but it can be used to create MongoDB or other NoSQL database diagrams by manually drawing collection structures and relationships.

o Key Features: Free, cloud-based, customizable templates, no NoSQL-specific features.

 Lucidchart

o Lucidchart is another general-purpose diagramming tool that supports database diagramming, including MongoDB schema representations. It has fewer NoSQL-specific features than the dedicated tools, but it is useful for simple visual representations.

o Key Features: Cloud-based, collaboration features, easy to use, customizable.

2. Installation of Edraw Max Drawing Tool

Edraw Max is a versatile diagramming tool that supports database diagrams, including
NoSQL databases like MongoDB. Here’s how you can install and use it:

Installation Steps:

1. Download Edraw Max:

o Go to the Edraw Max official website.

o Click on the "Download" button to get the installer for your operating system
(Windows, macOS, or Linux).

2. Run the Installer:

o After downloading, open the installer file.

o Follow the on-screen instructions to install the tool on your computer. It will
involve agreeing to the license agreement and choosing the installation directory.

3. Launch Edraw Max:

o Once installed, open Edraw Max from your desktop or start menu.

o You may need to create an account or sign in if you haven't already.

4. Select Database Diagrams:

o In Edraw Max, you can create database diagrams by navigating to New >
Database Modeling.

o Use the provided templates and tools to design your MongoDB database schema.

Key Features of Edraw Max for Database Design:

 Drag-and-drop interface: Easily create collections, relationships, and fields by dragging objects into your workspace.

 Template library: Use pre-built database design templates or start from scratch.

 Collaboration: Share diagrams with team members and work collaboratively on database
designs.

 Export options: Export your diagrams as PNG, PDF, SVG, and more for easy sharing.

2.2 Conceptual Data Modeling is created based on the structure of the data and its
relationships.

● Creating a conceptual data model is an essential first step in database design. It represents
the entities, relationships, and data flow at a high level, without delving into the technical
details. Here's a detailed guide on how to approach this for a NoSQL database like
MongoDB:

1. Identify Collections

● In MongoDB, data is stored in collections, which are analogous to tables in a relational
database. To create a conceptual data model, you need to identify the key entities in your
application that will map to collections. Each entity or major concept in your application
should have its own collection.

 Example: For a university application, the collections might be:

o Students

o Courses

o Instructors

o Departments

● These collections represent major entities within your domain.

2. Model Entity Relationships

● Once the collections (entities) are identified, the next step is to define the relationships
between them. In MongoDB, relationships are usually modeled using:

 Embedding: Related data is stored within the same document.

o Example: A student document embeds a list of enrolled courses.

 Referencing: Related data is stored in different documents, with references (like foreign
keys in relational databases).

o Example: A student document references the instructor’s ID instead of embedding
the entire instructor’s information.

● You need to consider:

 One-to-One: Embed if data is always accessed together.

 One-to-Many: Embed or reference, depending on access patterns.

 Many-to-Many: Use references for flexibility.

● Example:

 One student can enroll in multiple courses, and each course has many students (Many-to-Many relationship).

 One instructor can teach multiple courses (One-to-Many relationship).

 Students and instructors can belong to a department (Many-to-One relationship).
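
The difference between embedding and referencing can be seen in the shape of the documents themselves. The sketch below uses plain JavaScript objects; the field names, the sample address, and the `resolveDepartment` helper are assumptions for illustration rather than a prescribed schema.

```javascript
// Embedding: the address lives inside the student document and is read with it.
const embeddedStudent = {
  _id: "s1",
  name: "Alice",
  address: { city: "Kigali", street: "KN 5 Rd" },
};

// Referencing: the student stores only the department's _id.
const departments = [{ _id: "d1", name: "Software Development" }];
const referencedStudent = { _id: "s2", name: "Bob", departmentId: "d1" };

// Resolving a reference needs a second lookup in the other collection.
function resolveDepartment(student, allDepartments) {
  return allDepartments.find((dept) => dept._id === student.departmentId) ?? null;
}

const city = embeddedStudent.address.city; // available immediately, no extra lookup
const department = resolveDepartment(referencedStudent, departments);

console.log(city, department.name);
```

Embedding saves the second lookup but duplicates data if many students share the same value; referencing keeps one copy of the department at the cost of an extra read.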

3. Define Sharding and Replication

● When planning for scalability, you should define how your data will be distributed across
different servers. This includes sharding (splitting data across multiple nodes) and replication
(duplicating data across nodes for high availability).

 Sharding:

o In MongoDB, sharding is used to distribute large datasets across multiple servers.

o Identify collections that will grow large and may need sharding (e.g., "Students"
or "Courses" in a large university system).

o Choose a shard key that helps evenly distribute data across servers (e.g., student
IDs or course IDs).

 Replication:

o MongoDB uses replication to ensure high availability by duplicating data across
multiple servers.
o Replication applies to the whole replica set, so every collection (including critical ones
like "Students" and "Courses") is copied to each data-bearing member. Plan how many
members the replica set needs and where they will run.

4. Visualize High-Level Data Model

● A Conceptual Data Model should be visualized using diagrams to represent the entities,
relationships, and data flow. Two popular tools to visualize NoSQL database models are:

UML Class Diagrams:

 UML (Unified Modeling Language) can be used to visually represent the entities
(collections) and their relationships.

 Each class in UML represents a collection in MongoDB, with attributes corresponding to
the fields in that collection.

 Associations between classes represent relationships (embedding or referencing).

● Example: (UML class diagram not reproduced here.)

Data Flow Diagrams (DFDs):

 DFDs illustrate how data flows through the system. They represent the flow of
information between external entities, processes, and data stores (collections).
 In a MongoDB context, the data stores would represent the collections, and the processes
would represent how data is created, read, updated, and deleted (CRUD operations).

● Example: (Data flow diagram not reproduced here.)

5. Design a Conceptual Data Model

● Combining all the steps above, you can design a high-level conceptual data model. This
model will focus on the overall structure of your MongoDB collections, their relationships,
and how the data will be accessed and distributed.

● Steps to design the conceptual model:

 Identify collections: Determine key entities that need collections.

 Model relationships: Define how collections relate to each other (embed or reference).

 Sharding & Replication: Plan for scalability by defining shard keys and for availability
by sizing the replica set.

 Visualize the model: Use UML diagrams and DFDs to represent the data model.

Example Conceptual Data Model:

● For a university management system:

 Collections: Students, Courses, Instructors, Departments, Enrollments.

 Relationships:

o A student can enroll in multiple courses.

o Each course is taught by one instructor.

o Instructors and students are part of a department.

 Sharding: Use student ID as the shard key for the "Students" collection.

 Replication: Run the deployment as a replica set so that all collections, including "Students" and "Courses", stay available if a server fails.

● UML and DFD diagrams help in visualizing this model, capturing how entities interact and
how data flows through the system.
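
To make the shard key idea concrete, here is a simplified sketch of how a student ID could decide where a document is stored. The shard count and the student IDs are assumptions for illustration. Real MongoDB hashed sharding assigns ranges of hash values to chunks rather than using a plain modulo, so treat this only as an intuition aid.

```python
import hashlib

NUM_SHARDS = 3  # assumed number of shards, for illustration only


def shard_for(student_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key value to a shard index using a stable hash."""
    digest = hashlib.md5(student_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical student IDs used as shard key values.
student_ids = ["S001", "S002", "S003", "S004"]
placement = {sid: shard_for(sid) for sid in student_ids}
print(placement)
```

The same student ID always maps to the same shard, which is what lets a query on the shard key be routed to a single shard.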

2.3 Database Schema is clearly designed according to Mongoose.

Designing MongoDB Database Schema

When designing a MongoDB database schema, it’s essential to focus on factors that will
optimize performance, maintainability, and scalability. Here's a structured approach that includes
identifying workloads, defining collections, relationships, validation, normalization, and
applying design patterns.

1. Identify Application Workload

Understanding the application workload is crucial because MongoDB’s schema design should be
guided by how the data is accessed and used in the application. Consider the following:

 Read-heavy or write-heavy: Determine whether the application performs more reads or
writes. This will influence how you structure data and choose between embedding and
referencing.
 Access patterns: Analyze the most common queries. MongoDB schemas should be
designed to optimize for the most frequent access patterns.
o Example: If users frequently retrieve both students and their courses, it may be
beneficial to embed course information within the student document.
 Data size and growth: Estimate how much data will be stored now and in the future to
plan for scalability, sharding, and indexing strategies.

Example: For an e-commerce application:

 Workload: Mostly reads (users viewing product pages) but with significant write
operations during product creation and order placement.
 Frequent queries: Retrieving product information, fetching user orders, filtering products
by category.
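
One simple way to check whether a workload is read-heavy is to count operations from a sample of application activity. The sketch below uses a made-up list of operations, and the 0.5 threshold is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical sample of operations observed in an application log.
operations = ["read", "read", "write", "read", "read", "read", "write", "read"]

counts = Counter(operations)
read_ratio = counts["read"] / len(operations)

# The 0.5 threshold is an arbitrary assumption for this sketch.
workload = "read-heavy" if read_ratio > 0.5 else "write-heavy"
print(f"reads={counts['read']} writes={counts['write']} -> {workload}")
```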
2. Define Collection Structure

MongoDB stores documents in collections, which are analogous to tables in relational databases.

Defining your collection structure involves deciding how to organize data.

Embedding vs. Referencing:

 Embedding: Used when related data is frequently accessed together and can be stored
within the same document.
o Example: An order document can embed the product details since they are
typically viewed together.
 Referencing: Used when related data needs to be separated for flexibility or when it’s
accessed independently.
o Example: A separate collection for users and another for their orders, with user
IDs referenced in the order documents.

Key Considerations:

 Document size limit: A single MongoDB document cannot exceed 16 MB, so you must
avoid embedding large or unbounded datasets.
 Frequent updates: If an embedded document changes frequently, it might be better to
reference it instead to avoid unnecessary large document rewrites.
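
A quick way to get a feel for the size limit is to estimate how large a document is before deciding to embed more data in it. The sketch below measures the JSON encoding as a rough proxy. Real BSON sizes differ, and the helper name approx_size_bytes and the sample order are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB limit mentioned above


def approx_size_bytes(document: dict) -> int:
    """Rough size estimate using JSON; real BSON sizes differ."""
    return len(json.dumps(document).encode("utf-8"))


# Hypothetical order document with one embedded product.
order = {"userId": "U1", "products": [{"name": "Pen", "price": 1.5, "quantity": 2}]}

size = approx_size_bytes(order)
fits = size < MAX_DOCUMENT_BYTES
print(size, fits)
```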

Example Collection Structure for an e-commerce app:

 Users: name, email, address

 Products: name, description, price, category
 Orders: userId (reference to Users), embedded products with quantity, orderDate
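
The structure above can be sketched with plain Python dicts standing in for MongoDB documents. All values and the IDs (U1, P1, P2) are made-up sample data. The order references the user by userId and embeds the product details it needs.

```python
from datetime import date

# Users collection, keyed by a hypothetical user ID.
users = {"U1": {"name": "Alice", "email": "alice@example.com", "address": "Kigali"}}

# Products collection, keyed by a hypothetical product ID.
products = {
    "P1": {"name": "Keyboard", "description": "USB keyboard", "price": 20.0, "category": "accessories"},
    "P2": {"name": "Mouse", "description": "Optical mouse", "price": 10.0, "category": "accessories"},
}

order = {
    "userId": "U1",  # reference to Users
    "products": [    # embedded product details with quantity
        {"name": "Keyboard", "price": 20.0, "quantity": 1},
        {"name": "Mouse", "price": 10.0, "quantity": 2},
    ],
    "orderDate": date(2024, 2, 1).isoformat(),
}

# The total comes from the embedded data alone; no product lookup is needed.
order_total = sum(p["price"] * p["quantity"] for p in order["products"])

# The customer is found by following the reference.
customer = users[order["userId"]]
print(order_total, customer["name"])
```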

3. Map Schema Relationships

MongoDB supports flexible relationships that allow embedding, referencing, or a combination of
both depending on the access patterns. Typical relationships are:

 One-to-One: Embed the related document directly if it’s always accessed together.
o Example: A user profile with address details.
 One-to-Many:
o Embed if the “many” side is relatively small and frequently accessed with the
parent.
o Use references if the “many” side is large or frequently accessed independently.
o Example: A product can have multiple reviews, but the reviews may be stored in
a separate collection if there are a lot of them.
 Many-to-Many: Use references to maintain flexibility and avoid document bloat.
o Example: A many-to-many relationship between students and courses could be
managed through references with a separate collection (e.g., enrollments) to
track which students are enrolled in which courses.
Example:
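
A minimal sketch of the many-to-many case, using lists of dicts in place of collections. The field names (studentId, courseId) and the sample data are assumptions. In MongoDB itself the lookups would be queries against the enrollments collection.

```python
students = [{"_id": "S1", "name": "Alice"}, {"_id": "S2", "name": "Bob"}]
courses = [{"_id": "C1", "title": "Databases"}, {"_id": "C2", "title": "Networks"}]

# Each enrollment document references one student and one course.
enrollments = [
    {"studentId": "S1", "courseId": "C1"},
    {"studentId": "S1", "courseId": "C2"},
    {"studentId": "S2", "courseId": "C1"},
]


def courses_for(student_id):
    """Titles of all courses a student is enrolled in."""
    course_ids = {e["courseId"] for e in enrollments if e["studentId"] == student_id}
    return [c["title"] for c in courses if c["_id"] in course_ids]


def students_in(course_id):
    """Names of all students enrolled in a course."""
    student_ids = {e["studentId"] for e in enrollments if e["courseId"] == course_id}
    return [s["name"] for s in students if s["_id"] in student_ids]


print(courses_for("S1"))
print(students_in("C1"))
```

Because neither the student nor the course document stores the other side, adding an enrollment never makes either document grow.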

4. Validate and Normalize Schema

MongoDB supports flexible, schema-less designs, but using schema validation can help enforce
structure and consistency.

Validation: You can define validation rules to ensure data consistency and enforce constraints
like required fields, field types, etc.

 Example: Use the $jsonSchema operator to enforce schema validation.
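
The sketch below shows what a $jsonSchema validator document can look like, written as a Python dict. The collection fields (name, email) are assumptions. The helper passes_basic_checks is hypothetical and imitates only the "required" and string-type rules locally. MongoDB performs the real validation on the server when the validator is attached to a collection.

```python
# Validator document in the shape MongoDB expects for $jsonSchema.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
        },
    }
}


def passes_basic_checks(document, validator):
    """Hypothetical local check: required fields and string types only."""
    schema = validator["$jsonSchema"]
    for field_name in schema.get("required", []):
        if field_name not in document:
            return False
    for field_name, rule in schema.get("properties", {}).items():
        if field_name in document and rule.get("bsonType") == "string":
            if not isinstance(document[field_name], str):
                return False
    return True


ok = passes_basic_checks({"name": "Alice", "email": "a@example.com"}, user_validator)
missing = passes_basic_checks({"name": "Alice"}, user_validator)
wrong_type = passes_basic_checks({"name": "Alice", "email": 42}, user_validator)
print(ok, missing, wrong_type)
```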

Normalization: MongoDB allows for denormalization (embedding) to improve read
performance, but you should avoid unnecessary data duplication, especially if the data changes
frequently. Normalize by using references where data integrity is critical.

Example:

 Embed the product details (like name and price) inside the order document, but reference
the user by userId to avoid duplicating user information in every order.
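
The benefit of referencing the user can be seen in a small sketch with dicts in place of collections (all sample values are made up). When the user's email changes, only one document is updated, and every order still resolves to the current value.

```python
users = {"U1": {"name": "Alice", "email": "old@example.com"}}

# Orders embed product details but only reference the user.
orders = [
    {"userId": "U1", "products": [{"name": "Pen", "price": 1.0}]},
    {"userId": "U1", "products": [{"name": "Book", "price": 5.0}]},
]

# One update in the users collection; no order document needs to change.
users["U1"]["email"] = "new@example.com"

# Every order sees the new email through the reference.
emails_seen = {users[o["userId"]]["email"] for o in orders}
print(emails_seen)
```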

5. Apply Design Patterns

MongoDB has several design patterns that can be applied to optimize schema design:

 Extended Reference Pattern: Use this pattern to partially embed documents and also
maintain references for flexibility.
o Example: For each order, embed basic product details (name, price) for faster
access, but also store a reference (productId) to the full product document.
 Bucket Pattern: This is used to group data into fixed-size "buckets" for performance
reasons. It’s especially useful for time-series data.
o Example: In a logging system, you could store logs grouped by hour in a single
document (bucket).
 Subset Pattern: Store frequently accessed data as a subset inside the document, while
less frequently accessed data is referenced elsewhere.
o Example: In a blog post, store the latest comments inside the post document but
reference the full comment history in a separate collection.

Example:
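
As an illustration of the Bucket Pattern, the sketch below groups individual log entries into one document per hour. The log entries and the field names (hour, count, entries) are assumptions chosen for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw log entries, one per event.
logs = [
    {"ts": datetime(2024, 2, 1, 10, 5), "msg": "start"},
    {"ts": datetime(2024, 2, 1, 10, 40), "msg": "login"},
    {"ts": datetime(2024, 2, 1, 11, 2), "msg": "logout"},
]

# One bucket per hour, holding the entries and a running count.
buckets = defaultdict(lambda: {"count": 0, "entries": []})
for entry in logs:
    hour = entry["ts"].replace(minute=0, second=0, microsecond=0)
    bucket = buckets[hour]
    bucket["entries"].append({"ts": entry["ts"], "msg": entry["msg"]})
    bucket["count"] += 1

# Each bucket becomes one document instead of one document per log line.
bucket_docs = [
    {"hour": hour, **data}
    for hour, data in sorted(buckets.items(), key=lambda kv: kv[0])
]
print(len(bucket_docs))
```

Three log lines become two documents here. With real logging volumes, the reduction in document count is what makes the pattern worthwhile.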
