NONSQL-DATABASE_NOTE
SWDND501
BDCPC301 - Develop NoSQL Database
Trainer: Samie TWAHIRWA
Competence
RQF Level: 5
Learning Hours: 60
Credits: 6
Practical work:
When identifying user requirements for a database system, it’s essential to gather detailed
information about the user's needs. This process involves understanding the type of data, the
volume of data, and how users will interact with the database. Key questions to ask include:
Flexible Schema: Collections in MongoDB do not enforce a strict schema, meaning each
document (record) in a collection can have a different structure, making it easy to store
diverse data types.
Grouping of Documents: A collection is a grouping of documents, where each
document represents an individual record (similar to a row in relational databases).
Indexing Support: Collections can be indexed to improve the speed of queries.
Sharding and Replication: Collections can be sharded across multiple servers for
scalability and replicated for high availability.
Storage of Similar Documents: While flexible, collections typically store documents
that have similar fields or serve similar purposes.
Schema-less Data Storage: NoSQL databases are not rigid with schema design, which
allows for storing structured, semi-structured, or unstructured data.
Horizontal Scalability: NoSQL databases are designed to scale out by distributing data
across multiple nodes (sharding), making them ideal for handling large amounts of data.
High Availability: NoSQL databases prioritize availability by replicating data across
multiple nodes, which helps avoid downtime during network or server failures.
Eventual Consistency: In distributed systems, NoSQL databases often focus on
eventual consistency, meaning that data will become consistent across nodes after some
time.
Optimized for Large-Scale Data: NoSQL databases are designed to handle large datasets and
high-velocity data, workloads that are often harder to scale with a traditional relational
database on a single server.
Handling of Unstructured Data: NoSQL databases can manage and store unstructured
or semi-structured data like JSON, XML, and multimedia files.
1. Key-Value Stores:
o Data is stored as key-value pairs, where each key is unique and maps to a specific
value.
o Example: Redis, Amazon DynamoDB.
2. Document-Oriented Databases:
o Data is stored in documents, typically in JSON, BSON, or XML format, and each
document is semi-structured.
o Example: MongoDB, CouchDB.
3. Column-Family Stores:
o Data is stored in tables but organized by columns instead of rows, making it
efficient for reading/writing large datasets.
o Example: Apache Cassandra, HBase.
4. Graph Databases:
o Designed to store data in nodes and edges, representing relationships between
data points, making them ideal for social networks and recommendation systems.
o Example: Neo4j, Amazon Neptune.
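The four models above can be pictured as plain JavaScript structures. This is only a minimal sketch with made-up sample values (the keys, names, and the followedBy helper are assumptions for illustration); real database engines add persistence, indexing, and distribution on top of these shapes.

```javascript
// Illustrative in-memory shapes only; all sample values are hypothetical.

// 1. Key-value: a unique key maps to one value.
const keyValueStore = new Map();
keyValueStore.set("session:42", "user=alice;cart=3");

// 2. Document: a self-describing, possibly nested record.
const studentDocument = {
  _id: "s1",
  name: "Alice",
  courses: ["Software Development"],
  address: { city: "Sample City" },
};

// 3. Column-family: each row key maps to named groups of columns.
const columnFamilyRow = {
  rowKey: "sensor-7",
  readings: { "2024-01-01T10:00": 21.5, "2024-01-01T11:00": 22.1 },
  metadata: { location: "lab" },
};

// 4. Graph: nodes plus edges that describe relationships.
const graph = {
  nodes: [{ id: "alice" }, { id: "bob" }],
  edges: [{ from: "alice", to: "bob", type: "FOLLOWS" }],
};

// Who does a given user follow? Walk the edges that start at that node.
function followedBy(graphData, userId) {
  return graphData.edges
    .filter((edge) => edge.from === userId && edge.type === "FOLLOWS")
    .map((edge) => edge.to);
}

console.log(keyValueStore.get("session:42"));
console.log(studentDocument.address.city);
console.log(Object.keys(columnFamilyRow.readings).length);
console.log(followedBy(graph, "alice"));
```

The graph lookup shows why relationship-heavy queries suit a graph model: the relationship itself is stored as data (an edge) rather than reconstructed at query time.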
Use cases describe how users will interact with the system, detailing the actions users perform to
achieve a specific goal. For NoSQL databases, use cases help define the data models and
operations.
1. E-Commerce Platform:
o Users: Shoppers, sellers, and admins.
o Actions: Browse products, add items to the cart, place orders, view order history.
o Database Operations: Store product catalogs (key-value), manage user profiles
(documents), and track transactions (document collections).
2. Social Media Application:
o Users: General users and administrators.
o Actions: Post updates, like and comment on posts, follow/unfollow other users.
o Database Operations: Store user profiles (documents), manage relationships
between users (graph database), and manage posts and comments (documents).
3. IoT Data Management:
o Users: Device operators and data analysts.
o Actions: Monitor real-time sensor data, store historical data, trigger alerts based
on thresholds.
o Database Operations: Store time-series data (key-value or column-family),
analyze patterns in sensor data (graph or document).
NoSQL databases are highly flexible, but it's essential to conduct a thorough analysis before
implementing them. This analysis includes a comprehensive understanding of system
requirements, data types, scalability needs, and user requirements.
The requirements analysis process helps determine the appropriate database structure,
performance, and functionality to meet user needs. The process typically includes the following
steps:
Key Stakeholders: Individuals or groups who have a vested interest in the project, such
as:
o Business Leaders: Define the business goals, timelines, and budget.
o IT/Database Administrators: Oversee the database design, performance, and
security.
o Developers: Design and implement the database schema and queries.
o End-Users: Individuals who will interact with the system daily. They provide
crucial input on what the database should achieve (e.g., sales teams, data analysts,
customers using an app).
End-Users’ Expectations: Gather insights from users about how they expect the
database to work, such as ease of data retrieval, scalability, and the types of queries they
need to perform.
2. Capture Requirements
3. Categorize Requirements
Data Modeling: Once the requirements are captured, create data models. In the case of
NoSQL, data modeling may involve:
o Defining collections or tables and their structure.
o Understanding relationships between entities (e.g., embedding vs. referencing
documents in MongoDB).
o Selecting a database that matches the use case (e.g., choosing a document-oriented
database like MongoDB for semi-structured data).
Document Requirements: Store all gathered requirements in a formal document that
outlines how the system will handle data, scalability, and access.
5. Validate Requirements
Data analysis involves understanding the types of data that will be stored in the database, the
relationships between the data, and how the data will be used.
Identify Data Types: Analyze the data that the system will manage (e.g., documents,
multimedia files, JSON objects, log data). Ensure that the database selected can
efficiently store and process this data.
Data Patterns: Determine how the data will be accessed. For instance, document-based
databases like MongoDB excel at handling semi-structured or unstructured data such as
JSON files, while key-value stores like Redis are optimal for fast retrieval of single
values.
Analyze Relationships: NoSQL databases handle relationships differently compared to
relational databases:
o Embedding: Store related data in a single document.
o Referencing: Use a reference to link separate documents.
Query Requirements: Understand the types of queries users will run. For example, if
complex relationships between entities are involved, a graph database (e.g., Neo4j) may
be more appropriate.
Data validation ensures that the data entering the database conforms to the expected format,
structure, and constraints, even in flexible NoSQL databases.
Schema Validation: Although MongoDB does not enforce a schema by default, it offers
schema validation to ensure that inserted or updated documents meet specific conditions
(e.g., required fields, field types).
o Example in MongoDB: You can define JSON schema rules to enforce the
structure of documents.
Constraints:
o Required Fields: Ensure that certain fields (e.g., user_id, email) are always
present in the document.
o Data Types: Enforce that fields conform to a specific data type (e.g., a field must
be a string, number, or array).
o Range Validation: Ensure that numeric or date values fall within expected
ranges.
Data Integrity: Since NoSQL databases prioritize availability over consistency in some
cases, ensure that the system includes proper mechanisms for validating data integrity,
such as:
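o Optimistic Locking: Detects conflicting concurrent updates, typically by checking a
version field before each write, so one update does not silently overwrite another.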
o Consistency Checks: Run periodic checks to ensure the data is synchronized and
valid across multiple nodes.
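Optimistic locking can be sketched with a version field that every write must match. The example below is an in-memory stand-in, not MongoDB code: the accounts map, the version field name, and the updateIfUnchanged helper are all assumptions made for illustration.

```javascript
// In-memory stand-in for a collection, keyed by _id (hypothetical data).
const accounts = new Map([
  ["a1", { _id: "a1", balance: 100, version: 1 }],
]);

// Apply `changes` only if the stored version still equals the version the
// caller read earlier; bump the version on success.
function updateIfUnchanged(collection, id, expectedVersion, changes) {
  const current = collection.get(id);
  if (!current || current.version !== expectedVersion) {
    return false; // another writer got there first; re-read and retry
  }
  collection.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Two clients read the same document while it is at version 1.
const readByClientA = accounts.get("a1");
const readByClientB = accounts.get("a1");

// Client A writes first and succeeds; client B's write is rejected
// because the version it read is now stale.
const firstWrite = updateIfUnchanged(accounts, "a1", readByClientA.version, { balance: 80 });
const secondWrite = updateIfUnchanged(accounts, "a1", readByClientB.version, { balance: 50 });

console.log(firstWrite, secondWrite, accounts.get("a1"));
```

Against a real MongoDB collection, the same idea is commonly expressed as an updateOne call whose filter includes both _id and the expected version, with $inc raising the version; a matchedCount of 0 then signals a conflict.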
Setting up the MongoDB environment involves ensuring that the database is configured for
optimal performance, scalability, and usability. This process includes setting up the necessary
tools, environments, and configurations for both development and production use.
MongoDB is known for its horizontal scalability, which allows it to handle increasing data
volumes by distributing data across multiple servers. Here are the key aspects of MongoDB's
scalability:
Sharding:
MongoDB uses sharding to partition data across multiple servers. This ensures that large
datasets can be distributed and processed efficiently.
Shard Key: A key is chosen to distribute data, ensuring an even load across the cluster.
Horizontal Scalability: New nodes (servers) can be added to the cluster to absorb
increased workloads, instead of upgrading a single machine.
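The effect of the shard key choice can be illustrated with a toy model. This sketch assumes a simple hash-and-modulo placement, which is a deliberate simplification: MongoDB itself splits data into chunks by ranges of the shard key (or of its hashed value) and a balancer moves chunks between shards. The toyHash, shardFor, and countPerShard names are hypothetical.

```javascript
// Toy string hash (not the hash MongoDB uses); it only needs to be deterministic.
function toyHash(text) {
  let hash = 0;
  for (const char of text) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Map a shard key value to one of `shardCount` shards.
function shardFor(shardKeyValue, shardCount) {
  return toyHash(String(shardKeyValue)) % shardCount;
}

// Count how many documents each shard would receive.
function countPerShard(shardKeyValues, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const value of shardKeyValues) {
    counts[shardFor(value, shardCount)] += 1;
  }
  return counts;
}

// High-cardinality key: every student has a distinct ID.
const studentIds = Array.from({ length: 9000 }, (_, i) => `STU-${i}`);
console.log(countPerShard(studentIds, 3));

// Low-cardinality key: every document shares one value, so one shard gets everything.
const sameCampus = studentIds.map(() => "main-campus");
console.log(countPerShard(sameCampus, 3));
```

The second call shows the risk of a key with few distinct values: all documents land on a single shard, so adding servers does not spread the load.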
Replica Sets:
MongoDB uses replica sets to ensure high availability and fault tolerance.
A replica set consists of a primary node (where write operations are directed) and
secondary nodes (which replicate the data for backup and fault tolerance).
Automatic Failover: If the primary node goes down, one of the secondary nodes will
automatically become the new primary.
Load Balancing:
In a sharded cluster, queries are routed to the shards that hold the relevant data, and reads
can optionally be served by replica set secondaries, which helps the system handle a large
number of concurrent read and write operations.
Elastic Scalability: MongoDB can scale up and down dynamically to meet fluctuating
data loads.
MongoDB can be set up in multiple environments depending on the use case and deployment
scenario. The three most common environments are MongoDB Shell, Compass, and Atlas.
MongoDB Shell (Mongosh) is the command-line interface for interacting with MongoDB.
1. Install MongoDB:
o Download and install MongoDB from the official MongoDB website.
o Ensure that MongoDB is added to the system's path for easy access from the
terminal or command prompt.
2. Using the Shell (Mongosh):
o After installation, open the terminal or command prompt and run:
mongosh
o This opens the MongoDB shell, where you can execute MongoDB commands,
run JavaScript code, and manage your database.
3. Basic Shell Commands:
o Show Databases:
show dbs
o Create/Use a Database:
use myDatabase
o Insert Data:
db.myCollection.insertOne({ name: "John", age: 30 })
o Query Data:
db.myCollection.find({ name: "John" })
MongoDB Compass is a graphical user interface (GUI) for MongoDB that provides an easier
way to visualize and manage data without using the command line.
MongoDB Atlas is MongoDB's fully managed cloud database service, which simplifies database
deployment and management.
mongodb+srv://<username>:<password>@<cluster-host>/myDatabase?retryWrites=true&w=majority
1. Shell Example:
o After starting mongosh, you can insert a document into a new collection:
use school
db.students.insertOne({ name: "Alice", age: 21, course: "Software Development" })
db.students.find()
2. Compass Example:
o Use the GUI to visualize the students collection you created in the shell, and run
a query like:
{ "course": "Software Development" }
3. Atlas Example:
o Deploy a production-ready cluster on MongoDB Atlas, connect using the
connection string, and perform the same operations:
use school
db.students.insertOne({ name: "Bob", age: 23, course: "Data Science" })
Several tools are available to help visualize and draw NoSQL database structures, including
MongoDB schemas. Here are some popular options:
Hackolade
Studio 3T
o Key Features: Visual query builder, schema explorer, export schema diagrams,
data visualization.
DBSchema
Draw.io
Lucidchart
Edraw Max is a versatile diagramming tool that supports database diagrams, including
NoSQL databases like MongoDB. Here’s how you can install and use it:
Installation Steps:
o Click on the "Download" button to get the installer for your operating system
(Windows, macOS, or Linux).
o Follow the on-screen instructions to install the tool on your computer. It will
involve agreeing to the license agreement and choosing the installation directory.
o Once installed, open Edraw Max from your desktop or start menu.
o Use the provided templates and tools to design your MongoDB database schema.
Template library: Use pre-built database design templates or start from scratch.
Collaboration: Share diagrams with team members and work collaboratively on database
designs.
Export options: Export your diagrams as PNG, PDF, SVG, and more for easy sharing.
2.2 Conceptual Data Modeling
A conceptual data model is created based on the structure of the data and its relationships.
● Creating a conceptual data model is an essential first step in database design. It represents
the entities, relationships, and data flow at a high level, without delving into the technical
details. Here's a detailed guide on how to approach this for a NoSQL database like
MongoDB:
1. Identify Collections
o Students
o Courses
o Instructors
o Departments
Referencing: Related data is stored in different documents, with references (like foreign
keys in relational databases).
● Example:
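A minimal sketch of referencing, using plain JavaScript arrays in place of collections. The field names (instructorId, departmentId), the sample values, and the courseWithInstructor helper are assumptions for illustration, not a prescribed schema.

```javascript
// Hypothetical sample data standing in for two collections.
const instructors = [
  { _id: "i1", name: "Instructor A", departmentId: "d1" },
];

const courses = [
  { _id: "c1", title: "Develop NoSQL Database", instructorId: "i1" },
  { _id: "c2", title: "Data Modeling", instructorId: "i1" },
];

// Follow the reference by hand: find the course, then look up the
// instructor whose _id matches the stored instructorId.
function courseWithInstructor(courseId) {
  const course = courses.find((c) => c._id === courseId);
  if (!course) return null;
  const instructor = instructors.find((i) => i._id === course.instructorId) || null;
  return { ...course, instructor };
}

console.log(courseWithInstructor("c1"));
```

Because both courses store only instructorId, the instructor's details live in one place and a change to them does not require touching any course document.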
● When planning for scalability, you should define how your data will be distributed across
different servers. This includes sharding (splitting data across multiple nodes) and replication
(duplicating data across nodes for high availability).
Sharding:
o Identify collections that will grow large and may need sharding (e.g., "Students"
or "Courses" in a large university system).
o Choose a shard key that helps evenly distribute data across servers (e.g., student
IDs or course IDs).
Replication:
● A Conceptual Data Model should be visualized using diagrams to represent the entities,
relationships, and data flow. Two popular tools to visualize NoSQL database models are:
UML (Unified Modeling Language) can be used to visually represent the entities
(collections) and their relationships.
● Example:
DFDs illustrate how data flows through the system. They represent the flow of
information between external entities, processes, and data stores (collections).
In a MongoDB context, the data stores would represent the collections, and the processes
would represent how data is created, read, updated, and deleted (CRUD operations).
● Example:
● Combining all the steps above, you can design a high-level conceptual data model. This
model will focus on the overall structure of your MongoDB collections, their relationships,
and how the data will be accessed and distributed.
Model relationships: Define how collections relate to each other (embed or reference).
Sharding & Replication: Plan for scalability by defining shard keys and identifying
replicated collections.
Visualize the model: Use UML diagrams and DFDs to represent the data model.
Relationships:
Sharding: Use student ID as the shard key for the "Students" collection.
Replication: Replicate the "Students" and "Courses" collections for high availability.
● UML and DFD diagrams help in visualizing this model, capturing how entities interact and
how data flows through the system.
When designing a MongoDB database schema, it’s essential to focus on factors that will
optimize performance, maintainability, and scalability. Here's a structured approach that includes
identifying workloads, defining collections, relationships, validation, normalization, and
applying design patterns.
Understanding the application workload is crucial because MongoDB’s schema design should be
guided by how the data is accessed and used in the application. Consider the following:
Workload: Mostly reads (users viewing product pages) but with significant write
operations during product creation and order placement.
Frequent queries: Retrieving product information, fetching user orders, filtering products
by category.
2. Define Collection Structure
Embedding: Used when related data is frequently accessed together and can be stored
within the same document.
o Example: An order document can embed the product details since they are
typically viewed together.
Referencing: Used when related data needs to be separated for flexibility or when it’s
accessed independently.
o Example: A separate collection for users and another for their orders, with user
IDs referenced in the order documents.
Key Considerations:
Document size limit: MongoDB documents have a maximum size of 16 MB, so avoid
embedding large or unbounded datasets.
Frequent updates: If an embedded document changes frequently, it might be better to
reference it instead to avoid unnecessary large document rewrites.
One-to-One: Embed the related document directly if it’s always accessed together.
o Example: A user profile with address details.
One-to-Many:
o Embed if the “many” side is relatively small and frequently accessed with the
parent.
o Use references if the “many” side is large or frequently accessed independently.
o Example: A product can have multiple reviews, but the reviews may be stored in
a separate collection if there are a lot of them.
Many-to-Many: Use references to maintain flexibility and avoid document bloat.
o Example: A many-to-many relationship between students and courses could be
managed through references with a separate collection (e.g., enrollments) to
track which students are enrolled in which courses.
Example:
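A minimal sketch of the many-to-many case, assuming a separate enrollments collection that holds one document per student and course pair. The arrays stand in for collections, and the sample data and helper names are hypothetical.

```javascript
// Hypothetical sample data standing in for three collections.
const students = [
  { _id: "s1", name: "Alice" },
  { _id: "s2", name: "Bob" },
];

const courseList = [
  { _id: "c1", title: "Software Development" },
  { _id: "c2", title: "Data Science" },
];

// One enrollment document per (student, course) pair.
const enrollments = [
  { studentId: "s1", courseId: "c1" },
  { studentId: "s1", courseId: "c2" },
  { studentId: "s2", courseId: "c1" },
];

// Titles of the courses a student is enrolled in.
function coursesForStudent(studentId) {
  const courseIds = enrollments
    .filter((e) => e.studentId === studentId)
    .map((e) => e.courseId);
  return courseList.filter((c) => courseIds.includes(c._id)).map((c) => c.title);
}

// Names of the students enrolled in a course.
function studentsInCourse(courseId) {
  const studentIds = enrollments
    .filter((e) => e.courseId === courseId)
    .map((e) => e.studentId);
  return students.filter((s) => studentIds.includes(s._id)).map((s) => s.name);
}

console.log(coursesForStudent("s1"));
console.log(studentsInCourse("c1"));
```

Keeping the pairs in their own collection means neither the student nor the course document grows as enrollments are added, and the relationship can be queried from either side.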
MongoDB supports flexible, schema-less designs, but using schema validation can help enforce
structure and consistency.
Validation: You can define validation rules to ensure data consistency and enforce constraints
like required fields, field types, etc.
Example:
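A minimal sketch of validation rules written as a $jsonSchema validator. The students collection and its fields (name, age, course) and the age range are assumptions chosen for illustration; the missingRequiredFields helper is a hypothetical local check that covers only the required list, not the full validation MongoDB performs.

```javascript
// A $jsonSchema validator expressed as a plain object.
// Field names and the age range are assumptions for this sketch.
const studentValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "age", "course"],
    properties: {
      name: { bsonType: "string", description: "required, must be a string" },
      age: { bsonType: "int", minimum: 16, maximum: 100 },
      course: { bsonType: "string" },
    },
  },
};

// In mongosh, against a running server, the validator would be attached with:
//   db.createCollection("students", { validator: studentValidator })

// Tiny local check covering only the `required` list, for illustration.
function missingRequiredFields(doc, validator) {
  return validator.$jsonSchema.required.filter((field) => !(field in doc));
}

console.log(missingRequiredFields({ name: "Alice", age: 21 }, studentValidator));
```

With the validator attached, the server rejects an insert such as the one checked above, because the course field is missing.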
Embed the product details (like name and price) inside the order document, but reference
the user by userId to avoid duplicating user information in every order.
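This mix of embedding and referencing can be sketched as follows. The product and order fields, the sample values, and the toOrderItem and buildOrder helpers are assumptions for illustration only.

```javascript
// Hypothetical full product document, as it might sit in a products collection.
const product = {
  _id: "p1",
  name: "Keyboard",
  price: 25,
  description: "Full-size mechanical keyboard",
  stock: 40,
};

// Copy only the fields shown on an order page, and keep productId so the
// full product document can still be looked up when needed.
function toOrderItem(productDoc, quantity) {
  return {
    productId: productDoc._id,
    name: productDoc.name,
    price: productDoc.price,
    quantity,
  };
}

// The user is referenced by userId only; no user details are copied in.
function buildOrder(orderId, userId, items) {
  const total = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  return { _id: orderId, userId, items, total };
}

const order = buildOrder("o1", "u1", [toOrderItem(product, 2)]);
console.log(JSON.stringify(order, null, 2));
```

One consequence of this design is that the embedded name and price act as a snapshot taken at order time: a later price change in the product document does not alter existing orders.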
MongoDB has several design patterns that can be applied to optimize schema design:
Extended Reference Pattern: Use this pattern to partially embed documents and also
maintain references for flexibility.
o Example: For each order, embed basic product details (name, price) for faster
access, but also store a reference (productId) to the full product document.
Bucket Pattern: This is used to group data into fixed-size "buckets" for performance
reasons. It’s especially useful for time-series data.
o Example: In a logging system, you could store logs grouped by hour in a single
document (bucket).
Subset Pattern: Store frequently accessed data as a subset inside the document, while
less frequently accessed data is referenced elsewhere.
o Example: In a blog post, store the latest comments inside the post document but
reference the full comment history in a separate collection.
Example:
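A minimal sketch of the Bucket Pattern for the logging case, grouping entries by hour into one bucket document each. The log entries, the bucket shape (_id, count, entries), and the helper names are assumptions for illustration; timestamps are assumed to be ISO-8601 strings in UTC.

```javascript
// Hypothetical log entries.
const logs = [
  { timestamp: "2024-05-01T10:05:00Z", message: "login" },
  { timestamp: "2024-05-01T10:40:00Z", message: "view" },
  { timestamp: "2024-05-01T11:02:00Z", message: "logout" },
];

// Bucket key = the timestamp truncated to the hour, e.g. "2024-05-01T10".
function hourBucketKey(timestamp) {
  return timestamp.slice(0, 13);
}

// Group entries into one bucket document per hour.
function bucketByHour(entries) {
  const buckets = new Map();
  for (const entry of entries) {
    const key = hourBucketKey(entry.timestamp);
    if (!buckets.has(key)) {
      buckets.set(key, { _id: key, count: 0, entries: [] });
    }
    const bucket = buckets.get(key);
    bucket.entries.push(entry);
    bucket.count += 1;
  }
  return [...buckets.values()];
}

const hourlyBuckets = bucketByHour(logs);
console.log(JSON.stringify(hourlyBuckets, null, 2));
```

Storing one document per hour instead of one per log line reduces the number of documents and index entries, at the cost of having to cap bucket size so a busy hour does not approach the document size limit.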