Dod Unit2

Document oriented database

Uploaded by

Madhu
Copyright
© © All Rights Reserved

UNIT 2

MongoDB Features and Installation


MongoDB Overview
MongoDB is a NoSQL database management system (DBMS) developed
by MongoDB, Inc. It provides a flexible, document-based data storage
system that allows users to store, query, and manage data efficiently.
Unlike traditional relational databases, MongoDB doesn't use tables and
rows but instead stores data in JSON-like documents with dynamic
schemas. This makes it suitable for scenarios where data needs to evolve
over time, or where the structure might not be fixed.
Key Features of MongoDB
• Document-Oriented: MongoDB stores data in a flexible, JSON-like
format called BSON (Binary JSON). This allows for nested data,
arrays, and a variety of data types to be stored within a single
document.
• Schemaless: Unlike relational databases, MongoDB doesn't require
a predefined schema. This flexibility is valuable when working with
rapidly changing datasets.
• Scalability: MongoDB is designed to scale horizontally by
partitioning data across multiple servers, a technique known as
sharding. Large datasets and heavy workloads can therefore be spread
across many nodes; each shard is normally deployed as a replica set,
which supplies the fault tolerance.
• Indexing: MongoDB supports various types of indexes to optimize
query performance. Indexes can be created on fields to improve
read efficiency and support fast lookups.
• High Availability: MongoDB supports replication using replica sets,
where data is automatically replicated across multiple servers for
failover and redundancy. If the primary server fails, the remaining
members elect a new primary so the database stays available.
• Rich Query Language: MongoDB provides a robust and flexible
query language that supports CRUD (Create, Read, Update, Delete)
operations, along with powerful features like filtering, sorting,
aggregation, and full-text search (through text indexes).
• Aggregation Framework: MongoDB provides an efficient
aggregation framework to perform complex data analysis and
transformation tasks.
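The sharding idea from the Scalability feature can be sketched in plain JavaScript. This is a toy model, not MongoDB's query router: it assumes a hypothetical collection sharded on a numeric field userId, with made-up chunk boundaries and shard names, and shows how a range-based shard key decides which shard stores a document.

```javascript
// Toy model of range-based sharding. Each chunk covers [min, max) of the
// shard key; the boundaries and shard names below are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Return the shard that owns the chunk containing doc[shardKey].
function routeToShard(doc, shardKey, chunkList) {
  const key = doc[shardKey];
  if (typeof key !== "number") {
    throw new Error(`document is missing numeric shard key "${shardKey}"`);
  }
  const chunk = chunkList.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(routeToShard({ userId: 42, name: "Alice" }, "userId", chunks)); // shardA
console.log(routeToShard({ userId: 1000, name: "Bob" }, "userId", chunks)); // shardB
console.log(routeToShard({ userId: 9876, name: "Carol" }, "userId", chunks)); // shardC
```

Because documents with nearby userId values land on the same shard, a range query on the shard key touches only a few shards. MongoDB also offers hashed shard keys, which spread steadily increasing values more evenly across shards.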
Installation of MongoDB
MongoDB can be installed on a wide range of platforms, including
Windows, macOS, and Linux.
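Installation on Windows:
1. Download: Visit the MongoDB official website and download the
installer (.msi) for Windows.
2. Install: Follow the installation instructions, choose "Complete"
installation, and select the necessary components.
3. Set Path: Add MongoDB's bin directory to the system PATH to enable
command-line usage.
4. Start MongoDB: Use the mongod command to start the MongoDB
server. By default mongod stores its data in \data\db on the current
drive (for example C:\data\db), so create that folder first or pass a
different folder with --dbpath. If you chose to install MongoDB as a
Windows service during setup, the server starts automatically.
5. Connect: Open a new terminal and run mongosh (the MongoDB Shell,
installed separately in recent versions) to connect to the server.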
Conclusion
MongoDB’s flexibility, scalability, and ease of use make it a popular choice
for a variety of applications, from small projects to large-scale enterprise
solutions. Whether you're managing a rapidly evolving dataset or building
a highly scalable application, MongoDB offers the tools and infrastructure
needed for efficient database management.

The Need for NoSQL Databases


Why Relational Databases Aren’t Always Sufficient
Traditional relational databases, such as MySQL and PostgreSQL, have
been the go-to database solutions for decades. They follow strict
schemas, use tables, and support SQL (Structured Query Language).
However, as applications became more complex and the volume and
variety of data grew, limitations of relational databases became apparent.
1. Rigid Schemas: In a relational database, the schema (structure)
must be predefined and enforced. This makes it difficult to handle
dynamic and changing data, where the schema may evolve over
time.
2. Scalability: Scaling relational databases vertically (increasing server
resources) is costly and has physical limits. Scaling horizontally
(distributing data across multiple servers) is challenging because joins
and multi-row transactions become slow and complex once related rows
live on different machines.
3. Data Variety: Modern applications often deal with varied types of
data such as unstructured or semi-structured (e.g., social media
posts, logs, videos). Relational databases struggle to handle such
data types efficiently.
4. High Read/Write Demands: Many modern applications have
massive read and write requirements. Handling millions of
operations per second can strain traditional relational databases,
where locking, consistency, and complex joins are resource-
intensive.
The Rise of NoSQL
NoSQL databases arose as an alternative to relational databases, designed
to address these specific limitations. The "NoSQL" term, standing for "Not
Only SQL," indicates that these databases can store data in different
formats, such as key-value pairs, documents, columns, and graphs.
Advantages of NoSQL
1. Schema Flexibility: NoSQL databases don't require a fixed schema,
allowing dynamic and evolving data structures.
2. Horizontal Scalability: NoSQL databases are designed to scale
across multiple servers (sharding) easily, distributing the load and
enabling faster processing of large datasets.
3. Handling Big Data: NoSQL databases are optimized for handling
massive amounts of data, often required in real-time web
applications and IoT devices.
4. Performance: With few or no join operations and less overhead from
strict consistency requirements, NoSQL databases can deliver faster
reads and writes for the access patterns they are designed around.
5. Cost-Effective Scalability: Since NoSQL databases are designed to
run on commodity hardware, they offer cost-effective ways to scale
out as opposed to the expensive vertical scaling of traditional
relational databases.
Conclusion
The need for NoSQL databases stems from the growing complexity, size,
and variety of modern data. With their flexibility and scalability, NoSQL
databases have become indispensable in fields like big data analytics, IoT,
social media, and real-time applications.

What Are NoSQL Databases?


Definition and Types of NoSQL Databases
NoSQL databases are a category of database systems designed to store
and retrieve data without the constraints of relational models. They
support a variety of data models, including key-value, document, column-
family, and graph.
Types of NoSQL Databases:
1. Key-Value Stores:
o Key-value stores represent the simplest form of NoSQL
databases.
o Data is stored as a pair: a unique key and the corresponding
value.
o These databases are ideal for scenarios where data retrieval is
based solely on a key.
o Examples: Redis, Amazon DynamoDB.
2. Document Databases:
o Document stores represent and store data as documents,
typically in JSON or BSON format.
o Each document is an independent unit and can have different
structures, making them ideal for handling semi-structured
data.
o Examples: MongoDB, CouchDB.
3. Column-Family Stores:
o Column-family (wide-column) stores use a table-like structure,
but each row can hold its own, dynamic set of columns instead of
a fixed list shared by every row.
o Related columns are grouped into column families that are stored
together, allowing for efficient storage of sparse data.
o Examples: Apache Cassandra, HBase.
4. Graph Databases:
o Graph databases store data in graph structures with nodes
(entities) and edges (relationships).
o They are ideal for complex queries about relationships
between data, such as social networks.
o Examples: Neo4j, Amazon Neptune.
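To make the four data models concrete, the sketch below represents similar user data in each shape using plain JavaScript structures. The keys, field names, and values are invented for illustration; Redis, MongoDB, Cassandra, and Neo4j each have their own storage formats and APIs, which this does not imitate.

```javascript
// 1. Key-value: the value is opaque to the store; lookups are by key only.
const keyValue = new Map();
keyValue.set("session:42", JSON.stringify({ userId: 7, cart: [101, 205] }));

// 2. Document: a self-describing, nested object whose fields can vary.
const userDocument = {
  _id: 7,
  name: "Alice",
  tags: ["admin", "beta"],
  address: { city: "New York", zipcode: "10001" },
};

// 3. Column-family (wide-column): row key -> column family -> column -> value.
// Rows need not share the same columns, which keeps sparse data compact.
const wideColumnTable = {
  "user#7": {
    profile: { name: "Alice" },
    logins: { "2024-10-10": "web", "2024-10-11": "mobile" },
  },
  "user#8": { profile: { name: "Bob" } }, // no "logins" columns at all
};

// 4. Graph: nodes (entities) plus labelled edges (relationships).
const graph = {
  nodes: ["alice", "bob", "carol"],
  edges: [
    { from: "alice", to: "bob", type: "FOLLOWS" },
    { from: "bob", to: "carol", type: "FOLLOWS" },
  ],
};

// Each model favours a different access pattern:
const session = JSON.parse(keyValue.get("session:42")); // fetch by key
const city = userDocument.address.city; // navigate inside one document
const aliceLogins = Object.keys(wideColumnTable["user#7"].logins); // read one family
const followedByAlice = graph.edges
  .filter((e) => e.from === "alice")
  .map((e) => e.to); // follow relationships
console.log(session.cart, city, aliceLogins, followedByAlice);
```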
Benefits of NoSQL Databases
1. High Flexibility: NoSQL databases support a wide range of data
models, enabling you to store unstructured, semi-structured, and
structured data without predefined schemas.
2. High Scalability: NoSQL databases can handle large amounts of data
across distributed systems, ensuring seamless scalability.
3. Faster Development: The absence of a rigid schema allows rapid
iterations and faster changes to the database structure.
4. Handling Big Data: NoSQL databases excel in scenarios involving big
data, real-time processing, and complex relationships between
datasets.
Conclusion
NoSQL databases are revolutionizing the way data is stored, retrieved,
and managed. With their ability to handle large-scale, diverse data
efficiently, they are becoming increasingly popular in domains like e-
commerce, IoT, machine learning, and more.
CAP Theorem:
The CAP Theorem, introduced by Eric Brewer in 2000, is a principle that
explains the trade-offs in distributed database systems. It states that a
distributed database can only provide two out of the following three
guarantees:
1. Consistency (C): Every read receives the most recent write, ensuring
that all nodes have the same data at any given time. In other words,
data remains consistent across distributed nodes.
2. Availability (A): Every request (read/write) received by a node that
has not failed gets a non-error response, even if some other nodes are
down, though the response may not reflect the most recent write.
3. Partition Tolerance (P): The system continues to function despite
communication breakdowns between nodes. This means that the
system can tolerate network failures and partitions, ensuring that
data can still be processed in some form.

Trade-offs in the CAP Theorem


According to the CAP theorem, it’s impossible for a distributed system to
achieve all three guarantees (Consistency, Availability, and Partition
Tolerance) simultaneously. Instead, a system can only achieve two:
1. CP (Consistency and Partition Tolerance):
o These systems ensure that data remains consistent and can
tolerate network partitions. However, they may sacrifice
availability. In the event of a partition, the system might
refuse to process requests to maintain data consistency.
o Examples: HBase, MongoDB (in some configurations).
2. AP (Availability and Partition Tolerance):
o These systems remain available even if a network partition
occurs. However, they may sacrifice consistency, meaning that
reads could return stale or outdated data until the partition is
resolved.
o Examples: Cassandra, DynamoDB.
3. CA (Consistency and Availability):
o These systems provide both availability and consistency but
cannot tolerate partitions. If a partition occurs, the system
might become unavailable.
o Examples: Relational databases (in non-distributed
configurations).
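The CP and AP trade-offs above can be made concrete with a toy two-replica store. The class name, the messages, and the rule that clients write through replica A and read from replica B are all assumptions for illustration; real systems use quorums, leader election, and retries rather than this simple switch.

```javascript
// Toy model: two replicas (A and B) of a single value, used to contrast CP
// and AP behaviour during a network partition. Clients write through A and
// read from B. This is an illustration, not any real database's protocol.
class TwoReplicaStore {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.valueA = "v1";
    this.valueB = "v1";
    this.partitioned = false;
  }

  write(value) {
    if (!this.partitioned) {
      this.valueA = value;
      this.valueB = value; // link is up: replicate immediately
      return "ok";
    }
    if (this.mode === "CP") {
      return "error: unavailable during partition"; // refuse to diverge
    }
    this.valueA = value; // AP: accept locally; B is now stale
    return "ok (not replicated yet)";
  }

  readFromB() {
    if (this.partitioned && this.mode === "CP") {
      return "error: unavailable during partition"; // B cannot confirm it is current
    }
    return this.valueB;
  }
}

for (const mode of ["CP", "AP"]) {
  const store = new TwoReplicaStore(mode);
  store.partitioned = true;
  console.log(mode, "|", store.write("v2"), "|", store.readFromB());
}
// CP | error: unavailable during partition | error: unavailable during partition
// AP | ok (not replicated yet) | v1
```

The CP store gives up availability to avoid returning stale data; the AP store stays responsive but replica B serves the old value "v1" until the partition heals.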
Practical Application of CAP
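In practice, network partitions cannot be ruled out in a distributed
system, so partition tolerance is effectively mandatory and the real
choice, made when a partition occurs, is between consistency and
availability. Many large-scale web applications lean toward AP because
staying responsive matters more to them than always returning the
latest data. Several databases, including MongoDB and Cassandra, can
also be tuned to favor consistency or availability based on the
application's needs.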
Conclusion
The CAP Theorem provides valuable insight into the limitations of
distributed systems. Because partitions will happen sooner or later,
developers must decide which guarantee their application can relax
when one occurs, knowing that they cannot achieve all three
simultaneously. This theorem guides database design, helping to
balance the trade-offs between consistency, availability, and partition
tolerance.

BASE Approach:
What is BASE?
The BASE approach is an alternative to the ACID properties (Atomicity,
Consistency, Isolation, Durability) of traditional relational databases. It’s
used in the context of NoSQL databases to ensure availability and
scalability in distributed systems. BASE stands for:
1. Basically Available (BA):
o The system guarantees availability even in the presence of
failures. This doesn't mean that the system is always
consistent, but that users will get a response from the system,
even if it might not reflect the latest data.
2. Soft state (S):
o The state of the system may change over time, even without
input. Data might be replicated across multiple nodes, and
these copies might not be immediately consistent.
3. Eventual consistency (E):
o The system guarantees that, given enough time, data will
become consistent. This contrasts with the immediate
consistency of ACID. While BASE systems allow temporary
inconsistencies, they ensure that the data will eventually be
synchronized across all nodes.
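Soft state and eventual consistency can be illustrated with three replicas that accept writes independently and later reconcile. The sketch assumes a last-write-wins rule based on a per-write timestamp, which is one common reconciliation strategy (Cassandra resolves conflicting writes by timestamp in a similar way); the replica names and the sync schedule are hypothetical.

```javascript
// Toy model of eventual consistency with last-write-wins (LWW) replicas.
// Each replica stores { value, ts }; the higher timestamp wins on merge.
function makeReplica() {
  return { value: null, ts: 0 };
}

// Accept a write on one replica only (fast and "basically available").
function writeLocal(replica, value, ts) {
  if (ts > replica.ts) {
    replica.value = value;
    replica.ts = ts;
  }
}

// Anti-entropy step: two replicas exchange state and both keep the newest.
function sync(r1, r2) {
  const newest = r1.ts >= r2.ts ? { ...r1 } : { ...r2 };
  Object.assign(r1, newest);
  Object.assign(r2, newest);
}

const replicas = { n1: makeReplica(), n2: makeReplica(), n3: makeReplica() };
writeLocal(replicas.n1, "price=10", 1);
writeLocal(replicas.n3, "price=12", 2); // a later update arriving elsewhere

// Soft state: right now the replicas disagree.
console.log(Object.values(replicas).map((r) => r.value));

// After enough sync rounds every replica converges on the newest value.
sync(replicas.n1, replicas.n2);
sync(replicas.n2, replicas.n3);
sync(replicas.n1, replicas.n2);
console.log(Object.values(replicas).map((r) => r.value)); // every replica now reports "price=12"
```

Last-write-wins is simple but silently discards the older of two concurrent writes, which is acceptable for some data (a cached price) and not for others (a bank balance).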
Why BASE over ACID?
The BASE approach is more suited for NoSQL databases and modern
distributed systems because it prioritizes availability and scalability over
strong consistency:
• Relaxed Consistency: Unlike the strict consistency guarantees of
ACID, BASE systems allow temporary inconsistencies. This trade-off
allows for high availability and horizontal scalability, which is crucial
for systems dealing with massive datasets and real-time data.
• Fault Tolerance: BASE systems are designed to be fault-tolerant.
They handle node failures and network partitions gracefully,
ensuring that the system remains operational even in the presence
of failures.
Examples of BASE in NoSQL Databases
• Cassandra: Prioritizes availability and partition tolerance over strict
consistency. By default it relies on eventual consistency, where data is
eventually synchronized across all nodes, but the consistency level can
be tuned per request.
• Amazon DynamoDB: Follows a BASE approach in which reads are
eventually consistent by default unless a strongly consistent read is
explicitly requested.

Conclusion
The BASE approach is well-suited for distributed systems where scalability,
availability, and fault tolerance are more important than strict
consistency. It offers an alternative to the ACID model, which is better
suited for traditional relational databases.

Types of NoSQL Databases


There are four primary types of NoSQL databases, each suited to different
use cases:
1. Key-Value Databases
o Overview: Data is stored as a collection of key-value pairs,
where the key is a unique identifier, and the value can be any
form of data (string, JSON, etc.).
o Use Cases: Caching, session management, and real-time data
processing.
o Examples: Redis, Amazon DynamoDB.
2. Document Databases
o Overview: Document databases store data as documents,
typically in JSON or BSON format. Each document can have a
unique structure, allowing for flexibility in the way data is
stored.
o Use Cases: Content management systems, blogs, e-commerce
platforms.
o Examples: MongoDB, CouchDB.
3. Column-Family Databases
o Overview: Data is stored in wide rows whose columns can vary from
row to row. Columns are grouped into families, which are stored together
for better performance on certain types of queries.
o Use Cases: Time-series data, analytics, and event logging.
o Examples: Cassandra, HBase.
4. Graph Databases
o Overview: Data is stored in nodes (entities) and edges
(relationships). This structure is highly efficient for queries
about relationships between data points.
o Use Cases: Social networks, recommendation engines, fraud
detection.
o Examples: Neo4j, Amazon Neptune.
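The relationship queries that graph databases are built for can be illustrated with a friend-of-friend recommendation, one of the social-network and recommendation-engine use cases listed above. The sketch uses a plain adjacency list with hypothetical names; a graph database would express the same two-hop traversal in its own query language (Cypher, in Neo4j's case).

```javascript
// Undirected "friend" edges stored as an adjacency list (hypothetical data).
const friends = {
  alice: ["bob", "carol"],
  bob: ["alice", "dave"],
  carol: ["alice", "dave", "erin"],
  dave: ["bob", "carol"],
  erin: ["carol"],
};

// Suggest people two hops away who are not already direct friends,
// ranked by the number of mutual friends (ties broken alphabetically).
function suggestFriends(graph, user) {
  const direct = new Set(graph[user] ?? []);
  const scores = new Map();
  for (const friend of direct) {
    for (const candidate of graph[friend] ?? []) {
      if (candidate === user || direct.has(candidate)) continue;
      scores.set(candidate, (scores.get(candidate) ?? 0) + 1);
    }
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .map(([name, mutual]) => ({ name, mutual }));
}

console.log(suggestFriends(friends, "alice"));
// dave (2 mutual friends) first, then erin (1 mutual friend)
```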
Conclusion
Each type of NoSQL database has its strengths and is suited to different
types of data and applications. Choosing the right type of NoSQL database
depends on the structure of the data and the requirements of the
application.

MongoDB Features
1. Schema Flexibility: MongoDB's flexible schema allows developers to
store different types of data without adhering to a rigid structure.
New fields can be added to documents without affecting other
documents in the same collection.
2. High Availability: MongoDB uses replica sets to ensure that data is
replicated across multiple servers. In the event of a server failure,
another replica can take over, ensuring continuous availability.
3. Horizontal Scalability: MongoDB can distribute data across multiple
servers using sharding, so the database can keep handling growing
amounts of data and traffic by adding servers.
4. Rich Query Language: MongoDB supports complex queries,
including filtering, projection, joins (through the $lookup aggregation
stage), and aggregation.
5. BSON Format: MongoDB stores data in BSON (Binary JSON), which is
fast to encode, decode, and traverse and supports more data types
than plain JSON (such as dates and binary data). BSON is not always
smaller than the equivalent JSON text.
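The High Availability feature rests on one rule worth seeing in isolation: a replica set can only elect a new primary while a majority of its voting members can reach each other, which prevents two primaries from accepting conflicting writes. The toy function below captures that majority rule; the member objects and names are hypothetical, and real elections also weigh member priority and how up to date each member's data is.

```javascript
// Toy version of the majority rule behind replica-set failover.
// members: [{ name, healthy, priority }] -- a hypothetical shape, not a driver API.
function electPrimary(members) {
  const healthy = members.filter((m) => m.healthy);
  const majority = Math.floor(members.length / 2) + 1;
  if (healthy.length < majority) {
    return null; // no majority: no primary, so no new writes are accepted
  }
  // Simplification: choose the healthy member with the highest priority.
  return healthy.reduce((best, m) => (m.priority > best.priority ? m : best)).name;
}

const members = [
  { name: "node1", healthy: false, priority: 2 }, // the old primary just failed
  { name: "node2", healthy: true, priority: 1 },
  { name: "node3", healthy: true, priority: 1.5 },
];
console.log(electPrimary(members)); // node3

members[2].healthy = false;
console.log(electPrimary(members)); // null (1 of 3 members is not a majority)
```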
Conclusion
MongoDB’s feature set makes it ideal for applications that require
flexibility, high availability, and scalability. Its BSON format, sharding, and
replica sets provide a robust foundation for modern web applications and
big data platforms.

Document Database
Definition and Overview
A Document Database is a type of NoSQL database designed to store,
retrieve, and manage document-oriented information, which can be semi-
structured or unstructured data. Unlike relational databases that use
tables to organize data, document databases store data in documents that
resemble JSON (JavaScript Object Notation) objects, making them highly
flexible and easily accessible.
Key Features of Document Databases
1. Schema Flexibility: Document databases allow for variable
schemas, meaning documents in the same collection can have
different fields and structures. This flexibility is especially useful in
applications where data types evolve over time, as you do not have
to change the database schema to accommodate new attributes.
2. Nested Documents: Documents can contain nested documents and
arrays, allowing for more complex data representations. This is
beneficial for representing hierarchical data and relationships in a
single document.
3. Rich Data Types: Document databases support various data types,
including strings, integers, dates, arrays, and binary data. This
capability enables you to store complex data structures more
naturally.
4. Easy Data Retrieval: Querying a document database is often
straightforward. Users can retrieve documents based on key
attributes, and queries can return entire documents, unlike SQL
databases that may require joining multiple tables.
5. Scalability: Document databases are designed to scale out easily.
They can distribute data across multiple servers, allowing for
horizontal scaling to accommodate large datasets and high traffic.
6. Integration with Web Applications: Document databases work well
with modern web applications that use JSON for data exchange,
making them an excellent choice for application development.
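The Nested Documents and Easy Data Retrieval points above can be illustrated with an order and its line items. The data and function names are hypothetical: the embedded (document) form answers "show order 1001" with a single lookup, while the normalized (relational) form has to combine rows from two tables, which is the work a SQL join does.

```javascript
// Relational style: two "tables" linked by orderId.
const ordersTable = [{ orderId: 1001, customer: "Alice" }];
const orderItemsTable = [
  { orderId: 1001, sku: "BK-1", qty: 2 },
  { orderId: 1001, sku: "PN-7", qty: 1 },
  { orderId: 1002, sku: "BK-1", qty: 5 },
];

// Reassembling one order means matching rows across both tables (a join).
function getOrderJoined(orderId) {
  const order = ordersTable.find((o) => o.orderId === orderId);
  if (!order) return null;
  const items = orderItemsTable
    .filter((i) => i.orderId === orderId)
    .map(({ sku, qty }) => ({ sku, qty }));
  return { ...order, items };
}

// Document style: the items are embedded, so one lookup returns everything.
const ordersCollection = new Map([
  [1001, { _id: 1001, customer: "Alice", items: [{ sku: "BK-1", qty: 2 }, { sku: "PN-7", qty: 1 }] }],
]);
function getOrderEmbedded(orderId) {
  return ordersCollection.get(orderId) ?? null;
}

console.log(getOrderJoined(1001).items.length, getOrderEmbedded(1001).items.length); // 2 2
```

Embedding works best when child data is read together with its parent; data shared by many parents (many-to-many relationships) is usually stored by reference instead.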
Use Cases
Document databases are particularly well-suited for:
• Content Management Systems (CMS): Websites where content
needs to be structured but may vary greatly from one entry to
another (e.g., blog posts, articles).
• E-Commerce Platforms: Online stores where products may have
different attributes based on their categories, such as clothing sizes,
colors, and materials.
• User Profiles in Social Media: Storing user information, posts, and
interactions in a way that can evolve with user behavior.
• Internet of Things (IoT): Managing data generated by various IoT
devices, which can have different data structures.
Advantages of Document Databases
• Speed: Document databases can be faster than relational
databases for reading a single entity, because related data is stored
together and no join is needed. A single query can retrieve an entire
document.
• Flexibility: The ability to have varying document structures allows
developers to adapt the database schema quickly as the application
evolves.
• Development Efficiency: Faster development cycles, because
changes to the data model can be made without significant
downtime or complex migrations.
Disadvantages
• Data Consistency: The lack of a strict schema can lead to data
inconsistencies, where similar documents may have different fields
or data types.
• Query Complexity: While simple queries are easy, complex queries
involving multiple documents may require more effort than with
relational databases that use joins.
• Limited Relationships: Document databases may not handle
complex relationships (like many-to-many) as efficiently as relational
databases.

MongoDB is Schemaless:
Understanding Schemaless Architecture:
In a schemaless database like MongoDB, the structure of the data is not
predefined. This means you can add new fields to existing documents
without affecting other documents in the same collection. The term
schemaless does not imply that there is no structure; instead, it refers to
the flexibility of having different schemas within the same collection.
Benefits of Being Schemaless
1. Rapid Development: Schemaless databases allow for quicker
iterations and changes to application features. Developers can
introduce new features without worrying about modifying the
database schema.
2. Easier Prototyping: When developing new applications, the lack of
a rigid schema allows teams to quickly prototype and iterate on
their designs, adding fields as needed.
3. Adaptability: As application requirements change, the database can
evolve without the need for significant downtime or complicated
migration scripts.

Potential Issues
While the flexibility of a schemaless design has significant benefits, it can
also introduce challenges:
• Inconsistency: Because documents can have different structures,
data may become inconsistent, which complicates retrieval and
manipulation.
• Validation: Unless you configure MongoDB's optional schema
validation (for example, a $jsonSchema validator on the collection),
data integrity and validation rules must be enforced at the
application level, which can lead to errors if not carefully handled.
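When validation lives in the application, it typically takes the form of a check that runs before every insert. The function below is a minimal sketch of that idea for user-profile documents; the rules (required username and email, optional non-negative integer age) and the function name are assumptions chosen for illustration.

```javascript
// Hypothetical validation rules for a "users" collection, enforced in app code.
// Returns a list of problems; an empty list means the document is acceptable.
function validateUser(doc) {
  const errors = [];
  if (typeof doc.username !== "string" || doc.username.length === 0) {
    errors.push("username must be a non-empty string");
  }
  if (typeof doc.email !== "string" || !doc.email.includes("@")) {
    errors.push("email must be a string containing '@'");
  }
  if ("age" in doc && !(Number.isInteger(doc.age) && doc.age >= 0)) {
    errors.push("age, if present, must be a non-negative integer");
  }
  return errors;
}

// Only documents that pass would be handed to insertOne().
console.log(validateUser({ username: "john_doe", email: "john_doe@example.com", age: 30 })); // []
console.log(validateUser({ username: "", email: "not-an-email", age: -1 }).length); // 3
```

Every code path that writes to the collection must call such a check; forgetting it in one place is exactly the kind of error the Validation point warns about.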
Example of Schemaless Data Storage
Consider an example where we store user profiles:
// Document 1
{
  "username": "john_doe",
  "email": "john_doe@example.com",
  "age": 30,
  "preferences": {
    "language": "English",
    "theme": "dark"
  }
}
// Document 2
{
  "username": "jane_smith",
  "email": "jane_smith@example.com",
  "preferences": {
    "language": "Spanish"
  }
}

In this example, the first document includes an age field and a theme
preference, while the second has neither. This shows how documents in
the same collection can vary in structure.
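Because fields such as age and preferences.theme may be missing from some documents, code that reads them should supply defaults. A short sketch using the two profiles above and JavaScript's optional chaining (?.) and nullish coalescing (??) operators; the default values "unknown" and "light" and the describe() helper are assumptions.

```javascript
// The two user profiles from the example above, as JavaScript objects.
const profiles = [
  { username: "john_doe", email: "john_doe@example.com", age: 30,
    preferences: { language: "English", theme: "dark" } },
  { username: "jane_smith", email: "jane_smith@example.com",
    preferences: { language: "Spanish" } },
];

// Read possibly-missing fields with explicit defaults.
function describe(profile) {
  const age = profile.age ?? "unknown";
  const theme = profile.preferences?.theme ?? "light";
  return `${profile.username}: age ${age}, theme ${theme}`;
}

profiles.forEach((p) => console.log(describe(p)));
// john_doe: age 30, theme dark
// jane_smith: age unknown, theme light
```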

MongoDB Uses BSON


What is BSON?
BSON (Binary JSON) is a binary-encoded serialization format that is used
to store documents in MongoDB. It is similar to JSON but extends the
JSON format to support additional data types, such as binary data and
dates. BSON is designed to be efficient both in terms of storage and
processing speed.
Key Characteristics of BSON
1. Binary Representation: Being binary, BSON is fast to encode and
decode, and every document and string carries a length prefix, so
MongoDB can skip over fields without parsing them. It is not always
more compact than plain-text JSON, because those length prefixes and
type markers take up space.
2. Rich Data Types: BSON supports several additional data types
beyond standard JSON, including:
o Date: For storing date and time values.
o ObjectId: A unique identifier for documents.
o Regular Expression: For regex matching.
o Binary Data: For storing raw binary data.
3. Efficiency in Queries: Because BSON values are typed and
length-prefixed, MongoDB can locate, compare, and index fields
efficiently, which improves query performance.
Example of BSON
When you store a document in MongoDB, it is converted to BSON. For
example:
{
  "name": "Alice",
  "age": 25,
  "email": "alice@example.com",
  "registered_at": ISODate("2024-10-10T12:00:00Z")
}
This document is written in mongosh syntax (ISODate() creates a BSON
Date, a type plain JSON cannot represent). MongoDB stores it in BSON
format, preserving every field together with its type.
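To show what "binary-encoded" means, here is a minimal encoder that follows the BSON specification for just two element types: UTF-8 strings (type 0x02) and 32-bit integers (type 0x10). It is an illustration only; applications should rely on the BSON implementation that ships with the MongoDB drivers. Note the little-endian int32 length prefix on the document and on each string, which is what lets a reader skip fields quickly.

```javascript
// Minimal BSON encoder sketch: supports only string (0x02) and int32 (0x10).
// Layout per the BSON spec:
//   document := int32 totalLength, elements..., 0x00
//   element  := type byte, field name as C string (bytes + 0x00), value
const utf8 = new TextEncoder();

function int32LE(n) {
  const buf = new Uint8Array(4);
  new DataView(buf.buffer).setInt32(0, n, true); // little-endian
  return buf;
}

function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = [utf8.encode(key), Uint8Array.of(0x00)];
    if (typeof value === "string") {
      const str = utf8.encode(value);
      // String value: int32 byte length (including trailing 0x00), bytes, 0x00.
      parts.push(Uint8Array.of(0x02), ...name, int32LE(str.length + 1), str, Uint8Array.of(0x00));
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      parts.push(Uint8Array.of(0x10), ...name, int32LE(value));
    } else {
      throw new TypeError(`unsupported value for field "${key}"`);
    }
  }
  const total = 4 + parts.reduce((n, p) => n + p.length, 0) + 1;
  const out = new Uint8Array(total); // zero-filled, so the final 0x00 is already there
  out.set(int32LE(total), 0);
  let offset = 4;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length); // 22
console.log(Array.from(bytes.slice(0, 5))); // [ 22, 0, 0, 0, 2 ]
```

The 17-character JSON text {"hello":"world"} becomes 22 bytes of BSON, a small example of the length-prefix overhead mentioned above.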

Rich Query Language in MongoDB


Introduction to MongoDB Query Language
MongoDB provides a powerful and expressive query language that allows
users to perform a variety of operations on collections. The query
language is built to handle complex queries while remaining user-friendly.
Basic Query Operations
• Find Documents: The primary way to retrieve documents is through
the find() method.
o Example: To find all users aged over 30:
db.users.find({ age: { $gt: 30 } })
• Projection: You can control which fields to return in the query
results using projection.
o Example: To return only the name and age fields (the _id field is
also returned unless you add _id: 0):
db.users.find({}, { name: 1, age: 1 })
• Sorting: Results can be sorted using the sort() method.
o Example: To sort users by age in descending order:
db.users.find().sort({ age: -1 })
• Counting: You can count the number of documents that match a
query.
o Example: To count users older than 25 (countDocuments() replaces
the deprecated count() method):
db.users.countDocuments({ age: { $gt: 25 } })
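The semantics of these basic operations can be mimicked on an in-memory array, which helps when reasoning about what a query returns. This is a deliberately tiny emulation (equality, $gt, $lt, inclusion projection, and a single-field sort), not how the MongoDB server evaluates queries; the sample users and the helper names matches() and project() are invented.

```javascript
const users = [
  { _id: 1, name: "Asha", age: 34 },
  { _id: 2, name: "Ben", age: 22 },
  { _id: 3, name: "Chen", age: 41 },
  { _id: 4, name: "Dana", age: 28 },
];

// Does one document satisfy a filter such as { age: { $gt: 30 } }?
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, operand]) => {
        if (op === "$gt") return doc[field] > operand;
        if (op === "$lt") return doc[field] < operand;
        throw new Error(`operator ${op} is not supported in this sketch`);
      });
    }
    return doc[field] === cond; // a plain value means equality
  });
}

// Inclusion projection only; like MongoDB, _id is kept unless set to 0.
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0) out._id = doc._id;
  for (const [k, v] of Object.entries(projection)) {
    if (v && k !== "_id" && k in doc) out[k] = doc[k];
  }
  return out;
}

// Like db.users.find({ age: { $gt: 30 } }, { name: 1 }).sort({ age: -1 });
// the sort runs before projecting so the age field is still available.
const result = users
  .filter((d) => matches(d, { age: { $gt: 30 } }))
  .sort((a, b) => b.age - a.age)
  .map((d) => project(d, { name: 1 }));
console.log(result); // Chen (_id 3) first, then Asha (_id 1); no age field

// Like db.users.countDocuments({ age: { $gt: 25 } })
console.log(users.filter((d) => matches(d, { age: { $gt: 25 } })).length); // 3
```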
Aggregation Framework
MongoDB provides an aggregation framework to perform complex data
processing and analysis. You can use pipelines to process data in stages:
• $match: Filters the documents.
• $group: Groups documents by specified keys and performs
aggregations.
• $sort: Sorts the resulting documents.
Example:
db.sales.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$item", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])
This aggregation pipeline filters completed sales, groups them by item,
sums the total amount for each item, and sorts the results in descending
order.
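The three stages of that pipeline can be reproduced step by step in plain JavaScript, which makes the data passed between stages visible. The sales documents are hypothetical, and the code emulates what $match, $group with $sum, and $sort compute; it is not MongoDB's implementation.

```javascript
const sales = [
  { item: "pen", amount: 5, status: "completed" },
  { item: "book", amount: 20, status: "completed" },
  { item: "pen", amount: 7, status: "completed" },
  { item: "book", amount: 15, status: "pending" },
  { item: "lamp", amount: 30, status: "completed" },
];

// Stage 1, like { $match: { status: "completed" } }
const matched = sales.filter((s) => s.status === "completed");

// Stage 2, like { $group: { _id: "$item", total: { $sum: "$amount" } } }
const totals = new Map();
for (const s of matched) {
  totals.set(s.item, (totals.get(s.item) ?? 0) + s.amount);
}
const grouped = [...totals].map(([item, total]) => ({ _id: item, total }));

// Stage 3, like { $sort: { total: -1 } }
const sorted = grouped.sort((a, b) => b.total - a.total);

console.log(sorted);
// lamp (30), then book (20), then pen (12); the pending book sale is excluded
```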
Terms Used in MongoDB
Common Terms Explained
1. Database: The highest level of organization in MongoDB, a database
holds collections and is created implicitly the first time data is
stored in one of its collections.
2. Collection: A collection is a grouping of documents. Collections are
schema-less and can contain documents with different structures.
3. Document: The fundamental unit of data in MongoDB, similar to a
row in a relational database. Documents are stored in BSON format.
4. Field: A key-value pair in a document, analogous to a column in a
relational database. Each document can have its own unique fields.
5. Index: An index improves query performance by allowing MongoDB
to quickly locate documents based on field values.
6. Replica Set: A group of MongoDB servers that maintain the same
dataset, ensuring high availability and redundancy.
7. Shard: A shard is a horizontal partition of data in a MongoDB
database, allowing for horizontal scaling.

Data Types in MongoDB


MongoDB supports several data types, enabling the storage of diverse
data structures:
1. String: Used for textual data.
o Example: {"name": "Alice"}
2. Integer: Stores numeric values, can be 32 or 64-bit.
o Example: {"age": 30}
3. Double: Stores floating-point values.
o Example: {"price": 19.99}
4. Boolean: Represents true/false values.
o Example: {"isActive": true}
5. Date: Stores date and time values.
o Example: {"createdAt": ISODate("2024-10-10T12:00:00Z")}
6. Array: Stores multiple values in a single field.
o Example: {"tags": ["mongodb", "database", "nosql"]}
7. ObjectId: A unique identifier for documents.
o Example: {"_id": ObjectId("507f191e810c19729de860ea")}
8. Embedded Document: Documents within documents to represent
complex relationships.
o Example: {"address": {"city": "New York", "zipcode": "10001"}}
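An ObjectId is a 12-byte value whose first 4 bytes are a big-endian Unix timestamp in seconds, so its creation time can be read back from the hex string. The helper below is a sketch of that decoding (mongosh offers the same through ObjectId(...).getTimestamp()); the function name objectIdTimestamp is an assumption, and it uses the example id from the list above.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// Byte layout: 4-byte timestamp | 5-byte random value | 3-byte counter.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 24 hexadecimal characters (12 bytes)");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes, big-endian
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("507f191e810c19729de860ea").toISOString());
// 2012-10-17T20:46:22.000Z
```

Because the timestamp comes first, sorting by _id roughly follows insertion time for ObjectIds generated by the same process.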

Working with Database Commands


Creating a Database:
In MongoDB, you do not explicitly create a database. You simply switch to
the desired database using the use command; MongoDB creates it
automatically the first time you store data in it.
1. Switch to Database:
• Use the use databaseName command to create or switch to a
database.
• The database will not be created until you add data to it.
use databaseName
Output:
switched to db databaseName
2. Insert Data to Create the Database:
• The database is created when you insert data (a document) into a
collection.
• If the collection does not exist, MongoDB creates both the
collection and the database.
db.users.insertOne({name: "John", age: 25})
Output:
{
  "acknowledged": true,
  "insertedId": ObjectId("64fc0e0d8a243f9b15d84324")
}

3. Check the Databases:
• You can view the list of existing databases with the show dbs
command.
• Your new database will appear in the list once it contains data.
show dbs
Output:
admin 0.000GB
config 0.000GB
local 0.000GB
databaseName 0.001GB
• In short, you create a database by switching to it with use and
inserting data; it only appears once a collection in it holds data.

4. Delete a Database:
• You can delete a database using the dropDatabase() method. This
deletes the currently selected database along with all its
collections and data.
use databaseName
db.dropDatabase()
Output:
{
  "dropped": "databaseName",
  "ok": 1
}
