Nosql Notes
SWDND501
BDCPC301 - Develop NoSQL Database
Competence
RQF Level: 5
Learning Hours: 60
Credits: 6
LO1. Prepare database environment
I.C.1 Identifying Database Requirements
NoSQL
NoSQL stands for "Not Only SQL" and refers to a variety of database
technologies designed to handle different data storage needs beyond the
capabilities of traditional relational databases.
MongoDB
Availability
Documents
- Example:
json
{
  "email": "[email protected]",
  "age": 30,
  "address": {
    "city": "Anytown",
    "state": "CA"
  }
}
Collection
Indexing
- Example:
javascript
db.users.createIndex({ email: 1 })
Optimistic Locking
A concurrency control method that assumes multiple transactions can
complete without affecting each other. Each transaction works with a
snapshot of the data and only commits changes if no other transaction
has modified the data.
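One way to picture the check-then-commit step is a version counter on each document. The sketch below is plain JavaScript over an in-memory Map, not a MongoDB API; the `version` field and the helper names `readSnapshot` and `commit` are assumptions made for illustration.

```javascript
// In-memory stand-in for a collection; "version" is a hypothetical field name.
const store = new Map([["acct-1", { balance: 100, version: 1 }]]);

// Read a snapshot (a copy) of the current document.
function readSnapshot(id) {
  return { ...store.get(id) };
}

// Commit only if nobody has changed the document since the snapshot was taken.
function commit(id, snapshot, changes) {
  const current = store.get(id);
  if (current.version !== snapshot.version) {
    return false; // conflict: the caller should re-read and retry
  }
  store.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Two transactions read the same snapshot, then both try to commit.
const a = readSnapshot("acct-1");
const b = readSnapshot("acct-1");
const firstCommit = commit("acct-1", a, { balance: a.balance - 30 });
const secondCommit = commit("acct-1", b, { balance: b.balance - 50 });
console.log(firstCommit, secondCommit, store.get("acct-1"));
```

The second commit is rejected because its snapshot is stale, so the first update is not silently overwritten.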
Relationships
- Example:
- Embedding:
json
"orders": [
- Referencing:
json
"order_ids": [1, 2] }
Data Model
- The logical structure of a database, including the relationships and
constraints among different data elements.
Schema
Mongosh
Summary
- Data Types and Structure: What kind of data will be stored? (e.g., user
profiles, transactions, logs)
- Volume of Data: How much data do you expect to store initially and
over time?
- Access Patterns: How will the data be accessed? (e.g., frequent reads,
occasional writes, complex queries)
- Performance: What are the performance requirements? (e.g., response
time, latency)
- Dynamic: Collections can grow as needed, and new fields can be added
to documents without requiring schema changes.
- Eventual Consistency: Some NoSQL databases provide eventual
consistency, ensuring high availability and partition tolerance in
distributed environments.
1. Key-Value Stores:
2. Document Stores:
3. Column-Family Stores:
4. Graph Databases:
- **String**: A sequence of characters. Used for storing text.
By understanding user requirements, characteristics of collections,
features of NoSQL databases, types of NoSQL databases, and supported
data types in MongoDB, you can design and implement a robust and
efficient database system tailored to your application's needs.
Use cases help identify how users will interact with the system and what
functionality is required. Here’s how to define use cases:
1. Identify Actors: Determine who will interact with the system (e.g.,
end-users, administrators, external systems).
2. Define Goals: What do the actors want to achieve? (e.g., search for
products, manage inventory, generate reports).
- Stakeholders: Individuals or groups with an interest in the project
(e.g., business executives, IT managers, data analysts).
- End-Users: The people who will use the database system on a daily
basis (e.g., employees, customers).
2. Capture Requirements
3. Categorize Requirements
- Types:
5. Validate Requirements
1. Data Collection
- Sources: Identify where your data will come from (e.g., user inputs,
transactional data).
2. Data Profiling
3. Data Modelling
- Define Models: Create a data model that represents how data will
be organized and related.
- NoSQL Considerations: Choose an appropriate NoSQL model (e.g.,
document, key-value) based on data structure and access patterns.
4. Data Validation
5. Performance Analysis
- Estimate Growth: Project data growth over time and plan for
scalability.
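As a rough illustration of estimating growth, the following sketch projects storage size under a constant monthly growth rate. The starting size, the rate, and the function name `projectSizeGb` are assumed values chosen for the example, not figures from these notes.

```javascript
// Project storage size assuming compound growth at a fixed monthly rate.
function projectSizeGb(initialGb, monthlyGrowthRate, months) {
  return initialGb * Math.pow(1 + monthlyGrowthRate, months);
}

// Hypothetical inputs: 50 GB today, growing 10% per month, over 12 months.
const projected = projectSizeGb(50, 0.10, 12);
console.log(projected.toFixed(1) + " GB");
```

A projection like this helps decide early whether a single server is enough or whether sharding should be planned.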
By following these processes, you can ensure that your NoSQL database
is well-designed, meets user needs, and performs efficiently.
Data validation ensures the accuracy and quality of data being stored in
your database. For MongoDB, data validation involves defining rules and
constraints that documents must meet before being accepted into the
database. Here’s how to implement data validation:
1. Schema Validation:
- **Example**:
```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        name: {
          bsonType: "string",
        },
        email: {
          bsonType: "string",
          // The backslash is doubled so the regex receives a literal "\."
          pattern: "^.+@.+\\..+$",
        },
        age: {
          bsonType: "int",
        },
      },
    },
  },
});
```
- Use BSON Types: Ensure that fields conform to specific BSON types,
such as `int`, `string`, `date`, etc.
3. Regular Expressions:
- Example:
```javascript
// Fragment of a query filter for the email field
email: {
  $regex: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
  $options: "i"
}
```
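To see what such rules accept and reject, the checks can be tried on plain objects before they are enforced by the database. This is a minimal in-memory sketch, not MongoDB's validator; the function name `validateUser` and the sample documents are assumptions for illustration.

```javascript
// Same email pattern as in the regular-expression example above.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/i;

// Return a list of rule violations; an empty list means the document passes.
function validateUser(doc) {
  const errors = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (typeof doc.email !== "string" || !EMAIL_PATTERN.test(doc.email)) {
    errors.push("email is not a valid address");
  }
  if (!Number.isInteger(doc.age) || doc.age < 0) {
    errors.push("age must be a non-negative integer");
  }
  return errors;
}

const ok = validateUser({ name: "Ada", email: "ada@example.com", age: 30 });
const bad = validateUser({ name: "Bob", email: "not-an-email", age: "30" });
console.log(ok, bad);
```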
1. **Horizontal Scaling**:
2. **Replication**:
- **Replica Sets**: MongoDB uses replica sets to provide redundancy
and high availability. Each replica set contains a primary node and one or
more secondary nodes.
3. **Load Balancing**:
4. **Performance Optimization**:
```bash
mongosh "mongodb://localhost:27017"
```
2. **Compass Environment**
- **Installation**: Download and install MongoDB Compass, the official
GUI for MongoDB.
3. **Atlas Environment**
Before designing the schema, thoroughly understand the application’s
requirements:
- **Data Structure**: Identify what data you need to store (e.g., user
profiles, product details, transactions).
1. **Users Collection**:
- **Document**:
```json
{
  "_id": ObjectId("user123"),
  "email": "[email protected]",
  "passwordHash": "hashed_password",
  "address": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "orders": [
    {
      "orderId": ObjectId("order456"),
      "date": ISODate("2023-07-29T12:34:56Z"),
      "total": 99.99
    }
  ]
}
```
2. **Products Collection**:
- **Document**:
```json
{
  "_id": ObjectId("product789"),
  "name": "Laptop",
  "price": 799.99,
  "stock": 25
}
```
3. **Orders Collection**:
- **Document**:
```json
{
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),
  "items": [
    {
      "productId": ObjectId("product789"),
      "quantity": 1,
      "price": 799.99
    }
  ],
  "total": 799.99,
  "status": "Shipped",
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  }
}
```
### 3. **Indexing**
- **Create Indexes**:
```javascript
db.users.createIndex({ email: 1 })
```
- **Compound Index**: Index on multiple fields to support complex
queries.
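The idea behind a compound index can be pictured as a list of keys kept in a fixed multi-field order. The sketch below is plain JavaScript over an array, not MongoDB's B-tree implementation; the `category` field values and the helper `findByCategory` are assumptions for illustration, and the ordering is analogous to an index specification such as `{ category: 1, price: -1 }`.

```javascript
const products = [
  { name: "Laptop", category: "electronics", price: 799.99 },
  { name: "Phone", category: "electronics", price: 499.0 },
  { name: "Desk", category: "furniture", price: 150.0 },
  { name: "Chair", category: "furniture", price: 85.5 },
];

// Build the "index": entries ordered by category (ascending), then price (descending).
const index = products
  .map((doc, position) => ({ key: [doc.category, doc.price], position }))
  .sort((x, y) => x.key[0].localeCompare(y.key[0]) || y.key[1] - x.key[1]);

// A query on the leading field walks the ordered entries, so results
// come back already sorted by price without a separate sort step.
// (A real index would seek to the first match instead of filtering every entry.)
function findByCategory(category) {
  return index
    .filter((entry) => entry.key[0] === category)
    .map((entry) => products[entry.position].name);
}

console.log(findByCategory("electronics"));
console.log(findByCategory("furniture"));
```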
- **Considerations**:
### 4. **Sharding**
```javascript
sh.enableSharding("ecommerce");
```
- **Set Up Sharding**:
- **Shard Key**: Set the shard key when creating a sharded collection.
```javascript
db.adminCommand({
  shardCollection: "ecommerce.orders",
  key: { orderDate: 1 }
});
```
### 5. **Replication**
```javascript
rs.initiate({
  _id: "ecommerceReplicaSet",
  members: [
    // one { _id: <number>, host: "<host>:<port>" } document per member
  ]
});
```
- **Define Validation Rules**:
```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        name: {
          bsonType: "string",
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
        },
        passwordHash: {
          bsonType: "string",
        },
      },
    },
  },
  // "warn" logs violations but still accepts the document
  validationAction: "warn"
});
```
### 7. **Security**
- **Backup and Restore**: Regularly back up your data and test restore
procedures.
### Summary
### Selecting Tools for Drawing Databases
1. **MongoDB Compass**:
- **Website**: [MongoDB Compass](https://www.mongodb.com/products/compass)
2. **Draw.io (diagrams.net)**:
- **Website**: [Draw.io](https://www.diagrams.net/)
3. **Lucidchart**:
- **Description**: A cloud-based diagramming tool that supports NoSQL
database design.
- **Website**: [Lucidchart](https://www.lucidchart.com/)
4. **ERDPlus**:
- **Website**: [ERDPlus](https://erdplus.com/)
5. **DbSchema**:
- **Website**: [DbSchema](https://www.dbschema.com/)
- **Choose Your Version**: Select the appropriate version for your
operating system (Windows, macOS, or Linux).
- **Create a Diagram**:
- **Save and Export**: Save your work in Edraw Max format or export it
to other formats such as PDF or PNG for sharing.
By using these tools, you can effectively visualize and design your NoSQL
database schemas, which can greatly aid in the development and
management of your database systems.
Creating a conceptual data model for a NoSQL database involves defining
the high-level structure and relationships of your data. Here’s how to
approach this process for a MongoDB database:
- **Examples of Collections**:
- **Example**: Embedding order details within a user document if the
primary access pattern is fetching user orders.
**Example of Relationships**:
- **User and Orders**: A user can have multiple orders. Each order can
reference the user ID.
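The difference between referencing and embedding for the user and orders relationship can be sketched with plain arrays standing in for collections. The IDs, sample values, and helper names `ordersForUser` and `embedOrders` below are hypothetical and only illustrate the two shapes.

```javascript
const users = [{ _id: "user123", name: "Sample User" }];
const orders = [
  { _id: "order456", userId: "user123", total: 799.99 },
  { _id: "order457", userId: "user123", total: 99.99 },
  { _id: "order458", userId: "user999", total: 10.0 },
];

// Referencing: orders live in their own collection and point at the user by ID,
// so reading a user's orders needs a second lookup.
function ordersForUser(userId) {
  return orders.filter((order) => order.userId === userId);
}

// Embedding: the same data folded into the user document for single-read access.
function embedOrders(user) {
  const embedded = ordersForUser(user._id).map(({ _id, total }) => ({ orderId: _id, total }));
  return { ...user, orders: embedded };
}

const userWithOrders = embedOrders(users[0]);
console.log(JSON.stringify(userWithOrders, null, 2));
```

Embedding favors reads that always need the orders with the user; referencing keeps documents small when orders are queried on their own.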
**Example**:
```javascript
db.adminCommand({
  shardCollection: "ecommerce.orders",
  key: { orderDate: 1 }
});
```
- **Replica Set**: Configure a replica set with one primary node and
multiple secondary nodes to replicate data.
**Example**:
```javascript
rs.initiate({
  _id: "ecommerceReplicaSet",
  members: [
    // one { _id: <number>, host: "<host>:<port>" } document per member
  ]
});
```
- **UML Class Diagrams**:
- **Example**:
**Tool**: You can use tools like Lucidchart, Draw.io, or Edraw Max to
create UML Class Diagrams.
- **Example**:
- **Data Flow**: Data flows from the User to the Orders collection and
references the Products collection.
**Tool**: You can create DFDs using tools like Lucidchart, Draw.io, or
Microsoft Visio.
1. **UML Class Diagram**:
- **User**:
- **Order**:
- **Product**:
- **Review**:
- **Data Stores**:
- **Data Flow**:
By following these steps and using these tools, you can effectively create
a conceptual data model that helps in designing and understanding your
MongoDB database schema.
Designing a conceptual data model involves defining the structure and
relationships of your data in MongoDB. This helps ensure that your
database schema is well-organized, efficient, and scalable. Here’s a
step-by-step guide to designing a MongoDB database schema:
- **Types of Workloads**:
- **Considerations**:
- **Identify Collections**: Define what collections you need based on
entities in your application.
**Example Collections**:
- **Users Collection**:
```json
{
  "_id": ObjectId("user123"),
  "email": "[email protected]",
  "passwordHash": "hashed_password",
  "address": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "orders": [
    {
      "orderId": ObjectId("order456"),
      "date": ISODate("2023-07-29T12:34:56Z"),
      "total": 99.99
    }
  ]
}
```
- **Products Collection**:
```json
{
  "_id": ObjectId("product789"),
  "name": "Laptop",
  "price": 799.99,
  "stock": 25
}
```
- **Orders Collection**:
```json
{
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),
  "items": [
    {
      "productId": ObjectId("product789"),
      "quantity": 1,
      "price": 799.99
    }
  ],
  "total": 799.99,
  "status": "Shipped",
  "shippingAddress": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  }
}
```
- **Embedding**:
- **Referencing**:
**Example**:
- **User and Orders**: Embed orders within the user document if the
primary access pattern is to retrieve user details along with their orders.
- **Validation**:
```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        name: {
          bsonType: "string",
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
        },
        passwordHash: {
          bsonType: "string",
        },
      },
    },
  },
  validationAction: "warn"
});
```
- **Normalization**:
- **Avoid Redundant Data**: Store related data in separate collections
to reduce redundancy.
- **Reference Pattern**:
- **Aggregation Pattern**:
- **Bucket Pattern**:
- **Example**: Grouping logs or events into buckets based on time or
category.
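A time-based bucket can be sketched as follows: readings from the same sensor and hour are folded into one bucket document instead of being stored one per document. The sample readings, field names, and the helper `bucketByHour` are assumptions for illustration; this runs in memory and is not a MongoDB API.

```javascript
const events = [
  { sensor: "s1", at: "2023-07-29T12:05:00Z", value: 20.5 },
  { sensor: "s1", at: "2023-07-29T12:40:00Z", value: 21.0 },
  { sensor: "s1", at: "2023-07-29T13:10:00Z", value: 22.3 },
];

// Group readings into one bucket per sensor per hour.
function bucketByHour(items) {
  const buckets = new Map();
  for (const item of items) {
    const hour = item.at.slice(0, 13); // "YYYY-MM-DDTHH"
    const bucketId = `${item.sensor}|${hour}`;
    if (!buckets.has(bucketId)) {
      buckets.set(bucketId, { sensor: item.sensor, hour, count: 0, readings: [] });
    }
    const bucket = buckets.get(bucketId);
    bucket.readings.push({ at: item.at, value: item.value });
    bucket.count += 1;
  }
  return [...buckets.values()];
}

const buckets = bucketByHour(events);
console.log(buckets.map((b) => `${b.hour}: ${b.count}`));
```

Fewer, larger documents mean fewer index entries and fewer reads when a query asks for a whole time range.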
### Summary
Understanding the application workload is crucial for designing a schema
that meets performance and scalability requirements.
- **Types of Workloads**:
- **Considerations**:
**Example Collections**:
- **Products**: Stores product information.
- **Users Collection**:
```json
{
  "_id": ObjectId("user123"),
  "email": "[email protected]",
  "passwordHash": "hashed_password",
  "address": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "orders": [
    {
      "orderId": ObjectId("order456"),
      "date": ISODate("2023-07-29T12:34:56Z"),
      "total": 99.99
    }
  ]
}
```
- **Products Collection**:
```json
{
  "_id": ObjectId("product789"),
  "name": "Laptop",
  "price": 799.99,
  "stock": 25
}
```
- **Orders Collection**:
```json
{
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),
  "items": [
    {
      "productId": ObjectId("product789"),
      "quantity": 1,
      "price": 799.99
    }
  ],
  "total": 799.99,
  "status": "Shipped",
  "shippingAddress": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  }
}
```
- **Embedding**:
- **Referencing**:
- **Use Case**: When data is accessed independently or for many-to-many
relationships.
**Example**:
- **User and Orders**: Embed orders within the user document if the
primary access pattern is to retrieve user details along with their orders.
- **Validation**:
```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        name: {
          bsonType: "string",
          description: "Name is required and must be a string"
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
        },
        passwordHash: {
          bsonType: "string",
        },
      },
    },
  },
  validationAction: "warn"
});
```
- **Normalization**:
Utilize design patterns that are well-suited for MongoDB to optimize
performance and scalability.
- **Reference Pattern**:
- **Aggregation Pattern**:
- **Bucket Pattern**:
### Summary
1. **Identifying the Application Workload**: Understand the types of
operations and performance requirements.
- **Cloud Environment**: Use MongoDB Atlas for managed cloud
deployments.
Once your environment is set up, you can start creating collections and
defining the structure of your documents. Here’s how to do it:
```javascript
// Create 'users' collection
db.createCollection("users");
// Create 'products' collection
db.createCollection("products");
// Create 'orders' collection
db.createCollection("orders");
// Create 'reviews' collection
db.createCollection("reviews");
```
**Example Documents**:
- **Users Collection**:
```javascript
db.users.insertOne({
  // Placeholder IDs; a real ObjectId needs a 24-character hex string
  "_id": ObjectId("user123"),
  "email": "[email protected]",
  "passwordHash": "hashed_password",
  "address": {
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  },
  "orders": [
    {
      "orderId": ObjectId("order456"),
      "date": ISODate("2023-07-29T12:34:56Z"),
      "total": 99.99
    }
  ]
});
```
- **Products Collection**:
```javascript
db.products.insertOne({
  // Placeholder ID; a real ObjectId needs a 24-character hex string
  "_id": ObjectId("product789"),
  "name": "Laptop",
  "price": 799.99,
  "stock": 25
});
```
- **Orders Collection**:
```javascript
db.orders.insertOne({
  // Placeholder IDs; a real ObjectId needs a 24-character hex string
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),
  "items": [
    {
      "productId": ObjectId("product789"),
      "quantity": 1,
      "price": 799.99
    }
  ],
  "total": 799.99,
  "status": "Shipped",
  "shippingAddress": {
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  }
});
```
### **3. Set Up Indexes**
**Example**:
```javascript
db.users.createIndex({ email: 1 });
```
**Example**:
```javascript
sh.enableSharding("ecommerce");
```
**Example**:
```javascript
rs.initiate({
  _id: "ecommerceReplicaSet",
  members: [
    // one { _id: <number>, host: "<host>:<port>" } document per member
  ]
});
```
### **5. Implement Data Validation**
**Example**:
```javascript
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        name: {
          bsonType: "string",
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
        },
        passwordHash: {
          bsonType: "string",
          description: "Password hash is required and must be a string"
        },
      },
    },
  },
  validationAction: "warn"
});
```
### **Summary**
2. **Creating Collections and Defining Document Structures**: Set up
collections and sample documents.
MongoDB has no separate command for creating a database. When you switch
to a database that does not exist yet, MongoDB creates it the first time
you store data in it.
**Example**:
```javascript
use ecommerce;
// Insert a sample document to create the database
```
**Explicit Creation**:
```javascript
db.createCollection("users");
```
**Implicit Creation**:
```javascript
db.products.insertOne({
"name": "Laptop",
"price": 799.99
});
```
### **2. Drop**
**Example**:
```javascript
db.dropDatabase();
```
**Example**:
```javascript
db.users.drop();
```
### **3. Rename**
**Steps**:
1. **Create a New Database**: Copy data from the old database to a new
database.
2. **Drop the Old Database**: After verifying data integrity, drop the old
database.
**Example**:
```javascript
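// Get handles to both databases so each collection is read from the old
// database and written to the new one
const source = db.getSiblingDB("oldDatabase");
const target = db.getSiblingDB("newDatabase");
source.oldCollection.find().forEach(function (doc) {
  target.newCollection.insertOne(doc);
});
// Drop the old database
source.dropDatabase();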
```
**Example**:
```javascript
db.oldCollection.renameCollection("newCollection");
```
**Note**: A collection with the new name must not already exist in the
database when renaming.
### **Summary**
**1. Create**
**2. Drop**
- **Database**: Use `db.dropDatabase()` to drop the entire database.
**3. Rename**
- **Database**: Manually copy data to a new database and drop the old
one.
- **Collections**: Use
`db.collectionName.renameCollection("newCollectionName")` to rename
collections.
**Example:**
```javascript
// Insert a single document
db.users.insertOne({
  "email": "[email protected]",
  "age": 30
});
// Insert multiple documents
db.products.insertMany([
  // one document per array element
]);
```
**Example:**
```javascript
// Update a single document
db.users.updateOne(
  { "email": "[email protected]" },
  { $set: { "age": 31 } }
);
// Update multiple documents
db.products.updateMany(
  // filter document, then update document
);
```
**Example:**
```javascript
// Replace a document
db.users.replaceOne(
  { "email": "[email protected]" },
  {
    "email": "[email protected]",
    "age": 40
  }
);
```
**Query Operators**:
- `$eq`: Equal
### **2. Bulk Write Operations**
**Example:**
```javascript
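db.users.bulkWrite([
  { insertOne: {
    // document: the document to insert
  } },
  { updateOne: {
    // filter, update: which document to change and how
  } },
  { deleteOne: {
    // filter: which document to delete
  } }
]);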
```
**Example:**
```javascript
db.products.aggregate([
  {
    $group: {
      _id: null // null groups all documents into a single result
    }
  }
]);
// Count products per category
db.products.aggregate([
  {
    $group: {
      _id: "$category",
      count: { $sum: 1 }
    }
  }
]);
```
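What a `$group` stage with `count: { $sum: 1 }` computes can be reproduced by hand on a small array. The sketch below is an in-memory equivalent written in plain JavaScript, with hypothetical sample products and a helper named `countByCategory`; it is not how the server executes the pipeline.

```javascript
const products = [
  { name: "Laptop", category: "electronics", price: 799.99 },
  { name: "Phone", category: "electronics", price: 499.0 },
  { name: "Desk", category: "furniture", price: 150.0 },
];

// Count documents per category, returning the same shape as the $group output.
function countByCategory(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.category, (counts.get(doc.category) || 0) + 1);
  }
  return [...counts].map(([category, count]) => ({ _id: category, count }));
}

const grouped = countByCategory(products);
console.log(grouped);
```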
**Aggregation Stages**:
### **Summary**
**2. Bulk Write Operations**: Use `bulkWrite()` for multiple operations in
one request.
```javascript
db.getCollectionNames();
```
```javascript
db.users.drop();
```
#### **c. Create Index**
```javascript
db.users.createIndex({ email: 1 });
```
```javascript
db.users.getIndexes();
```
```javascript
// Skip the first 5 documents and get the next 5
db.users.find().skip(5).limit(5);
```
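The effect of combining skip and limit is the same as taking a slice of the ordered results. A minimal in-memory sketch, with a hypothetical helper `page` and twelve sample documents:

```javascript
// Twelve sample documents numbered 1..12.
const docs = Array.from({ length: 12 }, (_, i) => ({ n: i + 1 }));

// Drop the first `skip` items, then keep at most `limit` items.
function page(items, skip, limit) {
  return items.slice(skip, skip + limit);
}

const secondPage = page(docs, 5, 5);
console.log(secondPage.map((d) => d.n)); // [ 6, 7, 8, 9, 10 ]
```

Large skip values still force the server to walk past the skipped documents, so deep pagination is usually better served by a range filter on an indexed field.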
```javascript
db.adminCommand('listDatabases');
```
```javascript
db.dropDatabase();
```
### **4. Query Plan Cache Methods**
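```javascript
// View the cached query plans for a collection
db.users.getPlanCache().list();
```
```javascript
// Clear the cached query plans for a collection
db.users.getPlanCache().clear();
```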
```javascript
db.users.bulkWrite([
]);
```
```javascript
db.createUser({
  user: "newUser",
  pwd: "password123",
  roles: [] // required; add { role: "<role>", db: "<database>" } entries
});
```
```javascript
db.dropUser("oldUser");
```
```javascript
db.createRole({
role: "customRole",
privileges: [
],
roles: []
});
```
```javascript
db.dropRole("customRole");
```
```javascript
rs.status();
```
```javascript
rs.initiate({
  _id: "myReplicaSet",
  members: [
    // one { _id: <number>, host: "<host>:<port>" } document per member
  ]
});
```
```javascript
sh.enableSharding("ecommerce");
```
### **10. Free Monitoring Methods**
```javascript
db.currentOp();
```
```javascript
db.serverStatus();
```
```javascript
var id = ObjectId();
```
```javascript
// Create a new Date object
var now = new Date();
```
```javascript
use ecommerce;
```
#### **b. Manage Atlas Search Index**
```javascript
// Search indexes can be managed through the Atlas UI or API; recent mongosh
// releases also provide helpers such as db.collection.createSearchIndex().
```
### **Summary**
**2. Cursor Methods**: Iterate, limit, skip, and sort query results.
**4. Query Plan Cache Methods**: View and clear query plans.
Query optimization is crucial for maintaining high performance and
efficiency in MongoDB. It involves analyzing and improving the
performance of queries to ensure they execute as quickly and efficiently
as possible. Here’s how to apply query optimizations in MongoDB:
- **Use Projections**: Only retrieve the fields you need to reduce the
amount of data transferred.
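The saving from a projection can be illustrated by keeping only the requested fields of a document. The sketch below is plain JavaScript with a hypothetical helper `project` and a made-up sample document; it imitates the result of a projection rather than calling MongoDB.

```javascript
// Keep only the listed fields of a document.
function project(doc, fields) {
  const out = {};
  for (const field of fields) {
    if (field in doc) out[field] = doc[field];
  }
  return out;
}

const user = {
  _id: 1,
  name: "Sample User",
  email: "user@example.com",
  passwordHash: "hashed_password",
  address: { city: "Anytown", state: "CA", zip: "12345" },
};

const slim = project(user, ["name", "email"]);
console.log(JSON.stringify(slim).length, "bytes instead of", JSON.stringify(user).length);
```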
```javascript
db.collection.find().limit(10);
```
- **Sort Results Efficiently**: Ensure the sort operation uses an index to
improve performance.
```javascript
db.currentOp();
```
```javascript
db.serverStatus();
```
- **Profiler**: Use the database profiler to log and analyze slow queries.
- **Index Usage**: Ensure queries are utilizing indexes effectively and not
performing full collection scans.
### **3. Optimize Query Performance**
```javascript
db.collection.dropIndex("indexName");
```
#### **c. Optimize Aggregations**
```javascript
db.collection.aggregate([
]);
```
### **Summary**
- **Query Optimization**: Use projections, limits, and covered queries.
#### **a. Monitor Database Performance**
```javascript
db.serverStatus();
```
```javascript
db.currentOp();
```
#### **b. Analyze and Optimize Performance**
```javascript
db.collection.dropIndex("indexName");
```
```bash
mongodump --uri="mongodb://localhost:27017/mydatabase" --out=/backup/directory
```
- **Atlas Backup**: If using MongoDB Atlas, configure automated backups
through the Atlas UI.
- **Create User**: Add new users with specific roles and privileges.
```javascript
db.createUser({
  user: "username",
  pwd: "password",
  roles: [] // required; add { role: "<role>", db: "<database>" } entries
});
```
```javascript
db.dropUser("username");
```
```javascript
db.createRole({
role: "customRole",
privileges: [
],
roles: []
});
```
- **Drop Role**: Remove roles that are no longer needed.
```javascript
db.dropRole("customRole");
```
- **Plan for Failures**: Have a disaster recovery plan in place that
includes backup strategies and procedures for data recovery.
```javascript
sh.enableSharding("mydatabase");
```
```javascript
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" }
  ]
});
```
```javascript
rs.status();
```
```javascript
db.collection.reIndex();
```
- **Remove Unused Collections**: Drop collections that are no longer
needed.
```javascript
db.collection.drop();
```
### **Summary**
- Automate backups, test restores, and plan for disaster recovery.
- Enable and manage sharding and replica sets for scalability and high
availability.
- **Admin**: Has full control over all databases and collections. Manages
users, roles, and global settings.
- **Read-Only Users**: Can only read data but cannot modify or delete it.
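The split between these user types comes down to which actions each role permits. The following is a minimal in-memory sketch of such a check; the role labels `admin` and `readOnly`, the action names, and the helper `canPerform` are hypothetical and do not correspond to MongoDB's built-in role definitions.

```javascript
// Hypothetical mapping of role labels to the actions they allow.
const ROLE_ACTIONS = {
  admin: ["read", "write", "manageUsers"],
  readOnly: ["read"],
};

// Return true if the given role permits the action; unknown roles permit nothing.
function canPerform(role, action) {
  return (ROLE_ACTIONS[role] || []).includes(action);
}

console.log(canPerform("admin", "manageUsers"));
console.log(canPerform("readOnly", "write"));
```

Granting each account only the actions it needs limits the damage a compromised or mistaken account can do.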
#### **b. Creating Users**
```javascript
db.createUser({
  user: "newUser",
  pwd: "password",
  roles: [
    // { role: "<role>", db: "<database>" } entries
  ]
});
```
- `roles`: Specifies the roles and the database on which these roles are
applied.
```javascript
db.createRole({
role: "customRole",
privileges: [
],
roles: []
});
```
```javascript
db.revokeRolesFromUser("username", ["customRole"]);
```
```javascript
db.dropRole("customRole");
```
```yaml
security:
  authorization: "enabled"
```
```javascript
use admin;
db.createUser({
  user: "admin",
  pwd: "adminPassword",
  roles: [] // required; add { role: "<role>", db: "<database>" } entries
});
```
- **Define Roles**: Create roles with specific privileges for various users
or applications.
```yaml
security:
  enableEncryption: true
  encryptionKeyFile: /path/to/keyfile
```
- **Encryption in Transit**: Use TLS/SSL to encrypt data in transit
between the client and server.
```yaml
net:
  ssl:
    mode: requireSSL
    PEMKeyFile: /path/to/ssl.pem
```
```yaml
auditLog:
  destination: file
  format: JSON
  path: /path/to/audit.log
```
```bash
mongodump --uri="mongodb://localhost:27017/mydatabase" --out=/backup/directory
```
### **Summary**
**1. Management of Database Users**:
- **Manage Roles and Privileges**: Create, assign, and revoke roles using
`db.createRole()`, `db.grantRolesToUser()`, and
`db.revokeRolesFromUser()`.
#### **a. On-Premises**
- **Advantages**:
- **Disadvantages**:
- **Advantages**:
- **Disadvantages**:
#### **c. Hybrid**
- **Advantages**:
- **Disadvantages**:
- **Advantages**:
- **Disadvantages**:
- Limited scalability and potential for single points of failure.
- **Components**:
- **Secondary**: Nodes that replicate data from the primary and can
serve read requests.
- **Advantages**:
- **Disadvantages**:
- **Components**:
- **Shard**: A MongoDB instance or replica set that holds a subset of
the data.
- **Advantages**:
- **Disadvantages**:
- **Bad Shard Key**: A field with low cardinality (few distinct values) or
one that concentrates writes on a single shard. Avoid such fields.
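The effect of cardinality on data distribution can be seen with a toy model that assigns each document to a shard by hashing its key. The hash function, the modulo placement, and the sample documents below are assumptions made for illustration; MongoDB's own hashed sharding assigns ranges of hash values to chunks rather than using a simple modulo.

```javascript
// Simple string hash (illustrative only).
function simpleHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Count how many documents land on each shard for a given key field.
function distribute(docs, keyField, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const doc of docs) {
    counts[simpleHash(doc[keyField]) % shardCount] += 1;
  }
  return counts;
}

// 1000 hypothetical orders: "status" has 2 distinct values, "orderId" has 1000.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  orderId: `order-${i}`,
  status: i % 2 === 0 ? "Shipped" : "Pending",
}));

const byStatus = distribute(docs, "status", 4);
const byOrderId = distribute(docs, "orderId", 4);
console.log("status key:", byStatus);
console.log("orderId key:", byOrderId);
```

With only two distinct `status` values, at most two of the four shards can ever receive data, while the high-cardinality `orderId` key reaches every shard.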
#### **b. Adding Shards**
```javascript
sh.addShard("shardA/hostname1:27017,hostname2:27017");
```
### **Summary**
**1. Deployment Options**: