NoSQL Database
1) Explain how NoSQL databases are different from relational databases. Describe in
detail the key-value store NoSQL data model with an example. [9]
2) Explain BASE properties with its significance. How does the soft state of the
system depend on the Eventual consistency property? [8]
3) List the different NoSQL data models. Explain the document store NoSQL data
model with an example. [9]
6) State and explain the concept of CAP theorem and BASE properties with an
example.
7) BASE Transactions ensure properties like Basically Available, Soft State, Eventual
Consistency. What is the soft state of any system, and how does it depend on the
Eventual consistency property? [6]
8) Enlist the different types of NoSQL databases and explain them with suitable
examples. [8]
9) What is structured and unstructured data? Explain with an example. [4]
10) Explain the CAP theorem referred to during the development of any distributed
application.
11) Analyze the use of NoSQL databases in the current social networking
environment. Also, explain the need for NoSQL databases in the social networking
environment over RDBMS. [6]
12) Explain the difference between SQL and NoSQL databases.
13) Explain the following NoSQL database types with examples and also state the
scenario where each is useful:
i) Column-oriented
ii) Graph
iii) Document-oriented
15) Describe a distributed database. Explain the system architecture of distributed
transactions.
16) Explain the following types of data with examples [9]:
i) Structured
ii) Semi-structured
iii) Unstructured
Sr. No. | Structured Data | Semi-Structured Data | Unstructured Data
1 | Fixed and organized form | Combination of structured and unstructured data | Not predefined or organized form
2 | Schema dependent, less flexible | More flexible than structured, less than unstructured | Most flexible
3 | Uses structured query languages | Uses tags and elements to access the data | Only textual queries are possible
4 | Requires less storage | Significant storage requirements | Huge storage requirements
5 | Examples: Phone numbers, Customer Names, Social Security numbers | Examples: Server logs, Tweets organized by hashtags, emails sorted by the inbox, sent or draft folders | Examples: Emails and messages, Image files, Open-ended survey answers
6 | Well-defined data model | Partially defined data model | No defined data model
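As a rough illustration of the three categories in the table, the Python sketch below represents customer data in each form. The field names and sample values are made up for illustration only.

```python
import json

# Structured: fixed columns, every row has the same fields in the same order.
columns = ("customer_id", "name", "phone")
structured_row = (101, "Ankita", "555-0100")

# Semi-structured: self-describing keys; fields may vary from record to record.
semi_structured = json.loads(
    '{"customer_id": 101, "name": "Ankita", "tags": ["#sale", "#new"]}'
)

# Unstructured: free text with no data model; only text search applies.
unstructured = "Hi, I ordered a phone last week and it has not arrived yet."

# Structured data is addressed by column name or position.
phone = structured_row[columns.index("phone")]

# Semi-structured data is addressed by key, and optional keys need a default.
tags = semi_structured.get("tags", [])
email = semi_structured.get("email", None)

# Unstructured data can only be searched as text.
mentions_phone = "phone" in unstructured.lower()
```

The structured row fails if a column is missing, the semi-structured record tolerates missing keys, and the unstructured text offers nothing to query except its words.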
NoSQL Database
Introduction
• NoSQL stands for not only SQL.
• It is a non-tabular database system that stores data differently than relational
tables.
• Various types of NoSQL databases include document, key-value, wide column,
and graph.
• NoSQL databases support flexible schemas and scale easily as the volume of
data grows.
• NoSQL databases are often chosen for their ability to handle diverse data
types, providing a more dynamic and adaptable approach to data storage and
retrieval.
Need
The NoSQL database technology is usually adopted for the following reasons:
1. NoSQL databases are often used for handling big data as a part of the
fundamental architecture.
2. NoSQL databases are used for storing and modeling structured, semi-
structured, and unstructured data.
3. NoSQL is used where a database must stay highly available while still
performing efficiently.
4. NoSQL databases are non-relational, so they scale out (horizontally) more
easily than relational databases, which suits web applications.
5. NoSQL makes it easy to add capacity as the application grows.
Features
1. NoSQL does not follow any relational model.
2. It is either schema-free or has a relaxed schema, meaning it does not require a
specific definition of the schema.
3. Multiple NoSQL databases can be executed in a distributed fashion.
4. It can process both unstructured and semi-structured data.
5. NoSQL databases have higher scalability.
6. It is cost-effective.
7. It supports data in the form of key-value pairs, documents, wide columns, and graphs.
8. NoSQL databases are designed to handle large volumes of data and high-
velocity data streams.
9. They often provide built-in support for horizontal scaling, allowing seamless
expansion as data grows.
10. NoSQL databases are well-suited for scenarios where data structures may
evolve over time.
11. The flexibility of NoSQL allows developers to work with diverse data types
without rigid constraints.
12. Many NoSQL databases offer automatic sharding, distributing data across
multiple servers for improved performance.
13. NoSQL databases are commonly associated with a decentralized architecture,
enhancing fault tolerance.
14. They excel in use cases requiring quick development cycles, as changes to the
database schema are more straightforward.
15. NoSQL databases are frequently employed in modern web and mobile
application development due to their adaptability.
Types of NoSQL Databases
1. Key-value store
2. Document store
3. Graph based
4. Wide column store
1 Key-Value Store
1. The key-value pair is the simplest type of NoSQL database.
2. Designed to handle large volumes of data and heavy loads efficiently.
3. In key-value storage, each key is unique, and the corresponding value can be
in various formats such as JSON, string, or binary objects.
{"Customer":
[
{"id": 1, "name": "Ankita"},
{"id": 2, "name": "Kavita"}
]
}
4. Example: Here, "id" and "name" are keys, and the corresponding values are 1
and "Ankita" for the first customer, and 2 and "Kavita" for the second.
5. Key-value stores facilitate the storage of schema-less data, making them
particularly useful for scenarios like Shopping Cart Contents.
6. Examples of Key-Value Stores: DynamoDB, Riak, Redis
7. Key-value stores provide a simple and effective way to manage and retrieve
data, making them suitable for various applications that require fast and
scalable data access.
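The points above can be sketched with a toy in-memory store in Python. This is a minimal illustration, not how DynamoDB, Riak, or Redis are implemented; the class name `KeyValueStore` and the `cart:` key prefix are assumptions chosen for the shopping cart scenario.

```python
class KeyValueStore:
    """Toy in-memory key-value store: unique keys, opaque values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by key only; the store never inspects the value.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("cart:ankita", {"items": ["pen", "notebook"], "total": 120})
store.put("cart:kavita", b"\x00\x01binary-blob")  # values can be any format

cart = store.get("cart:ankita")
missing = store.get("cart:nobody")
deleted = store.delete("cart:kavita")
```

Because every operation goes through the key, there is no way to ask "which carts contain a pen" without reading every value, which is the main trade-off of this model.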
2 Document Store
1. Document stores make use of key-value pairs to store and retrieve data.
2. Documents are typically stored in the form of XML and JSON.
3. Among NoSQL database types, document stores feel the most natural to work
with, because a document maps closely to an object in application code.
4. They are commonly used due to their flexibility and the ability to query on any
field within the document.
5. Example:
6. MongoDB and CouchDB are two popular document-oriented NoSQL
databases.
7. Document stores are suitable for scenarios where data structures may evolve
over time, and the ability to work with varying data types is crucial.
8. These databases are known for their adaptability to changing application
requirements.
9. Document stores provide a versatile and scalable solution for managing
diverse data structures and are widely utilized in modern application
development.
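To make "query on any field" concrete, here is a minimal Python sketch in which a collection is modeled as a list of dictionaries. The `find` helper and the sample employees are hypothetical; a real document store such as MongoDB or CouchDB adds indexes, nested-field queries, and persistence on top of this idea.

```python
# A "collection" is modeled as a list of dict documents.
# Documents in the same collection do not need identical fields.
employees = [
    {"_id": 1, "name": "John", "age": 25, "department": "HR"},
    {"_id": 2, "name": "Alice", "age": 28, "skills": ["Python", "SQL"]},
    {"_id": 3, "name": "Ravi", "department": "HR", "address": {"city": "Pune"}},
]


def find(collection, criteria):
    """Return documents whose fields equal every key/value in criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


hr_staff = find(employees, {"department": "HR"})
alice = find(employees, {"name": "Alice", "age": 28})
nobody = find(employees, {"department": "Finance"})
```

Unlike a key-value store, the query looks inside the stored value, so any field can serve as a search criterion.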
BASE Properties
The relational database strongly follows the ACID properties (Atomicity,
Consistency, Isolation, and Durability) while the NoSQL database follows BASE
properties.
BASE properties consist of:
Basically Available:
• The system is guaranteed to be available in the event of failure.
• This property prioritizes system availability over immediate consistency.
• Even in the face of faults or network partitions, the system remains
operational.
Soft State:
• It means that the system state may change even without input.
• The system does not require all components to be in a consistent state at all
times.
• Soft state allows for flexibility and adaptability to changing conditions.
Eventual Consistency:
• The system will become consistent over time.
• While the system may not be immediately consistent, given enough time, all
replicas or nodes in the system will converge to a consistent state.
• This property acknowledges that achieving immediate consistency in a
distributed system might not be practical or efficient.
• BASE properties provide a more relaxed approach to consistency and
availability, making them suitable for distributed and scalable NoSQL
databases where strict adherence to ACID principles may be challenging.
Example:
In a distributed e-commerce system using a NoSQL database with BASE properties:
Basically Available:
Customers can still browse and buy products from available servers even if some
parts of the system go down (e.g., a data center).
Soft State:
Inventory levels may temporarily differ between nodes due to update delays or
network partitions.
The system allows for temporary inconsistency in inventory data.
Eventual Consistency:
After a period without new purchases or updates, all copies of inventory data across
nodes will converge to a consistent state.
The system ensures eventual consistency by providing time for updates to propagate.
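The inventory example can be simulated in a few lines of Python. This is a simplified sketch under stated assumptions: two replicas, updates expressed as stock deltas, and a hypothetical `apply_pending` step that stands in for background propagation. Real systems use more elaborate mechanisms for ordering and conflict resolution.

```python
class Replica:
    def __init__(self, name, stock):
        self.name = name
        self.stock = stock      # local copy of the inventory level
        self.pending = []       # updates received from peers, not yet applied

    def apply_pending(self):
        # Apply queued updates; the state changes without new client input.
        for delta in self.pending:
            self.stock += delta
        self.pending.clear()


def purchase(replica, peers, quantity):
    # The write is accepted locally right away (basically available) ...
    replica.stock -= quantity
    # ... and is only queued for the other replicas (soft state).
    for peer in peers:
        peer.pending.append(-quantity)


a = Replica("node-a", stock=10)
b = Replica("node-b", stock=10)

purchase(a, peers=[b], quantity=3)
before = (a.stock, b.stock)   # replicas temporarily disagree

b.apply_pending()             # propagation catches up
after = (a.stock, b.stock)    # replicas have converged
```

Between the purchase and the propagation step, a reader on node-b sees stale stock. Once no new purchases arrive and pending updates are applied, both replicas report the same value.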
3 Graph Database
• Graph databases are typically used in applications where the relationships
among data elements are a critical aspect.
• Connections between elements in a graph database are called links or
relationships.
• In a graph database, connections are first-class elements of the database and
are stored directly.
• Components of a Graph Database:
1. Node: Represents entities (e.g., people, students).
2. Edge: Represents relationships among the entities.
• Graph databases excel in scenarios where understanding and querying
relationships between data entities are essential.
• Example Use Cases:
• Social Networks
• Logistics
• Spatial Data
Notable Graph Databases:
• Neo4j
• InfiniteGraph
• OrientDB
Graph databases provide a powerful way to model and query relationships, making
them well-suited for applications that heavily rely on understanding and navigating
connections between different data elements.
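A small Python sketch can show why stored relationships make traversal queries simple. The `Graph` class, the `FOLLOWS` relationship name, and the sample people are assumptions for illustration; products such as Neo4j provide their own query languages and storage engines.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.nodes = {}                  # node id -> properties
        self.edges = defaultdict(set)    # (node id, relationship) -> neighbour ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, relationship, target):
        # Relationships are stored directly, so traversal needs no join.
        self.edges[(source, relationship)].add(target)

    def neighbours(self, node_id, relationship):
        return set(self.edges.get((node_id, relationship), set()))


g = Graph()
for person in ("ankita", "kavita", "john", "alice"):
    g.add_node(person, label="Person")

g.add_edge("ankita", "FOLLOWS", "kavita")
g.add_edge("kavita", "FOLLOWS", "john")
g.add_edge("kavita", "FOLLOWS", "alice")

# Friend-of-friend query: who do the people Ankita follows follow?
suggestions = set()
for friend in g.neighbours("ankita", "FOLLOWS"):
    suggestions |= g.neighbours(friend, "FOLLOWS")
suggestions -= g.neighbours("ankita", "FOLLOWS") | {"ankita"}
```

The same "people you may know" query in a relational database would need a self-join on a follows table for every extra hop, which is why social networks are a typical graph database use case.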
4 Wide Column Store
• Wide column store databases excel in quickly aggregating values for a given
column, making them well-suited for data warehousing and business
intelligence applications.
• These databases are adept at handling large-scale distributed data and are
designed for scalability.
• Examples of Column-Based Databases:
• HBase
• Cassandra
Wide column store databases offer a flexible and scalable solution for managing and
analyzing large volumes of data with changing or evolving structures.
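The logical layout of a wide column store can be sketched in Python as row key, then column family, then column. This models only the data layout, not the on-disk column grouping that makes aggregation fast in HBase or Cassandra; the table contents and the helper names `put` and `column_values` are hypothetical.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def column_values(family, column):
    """Collect one column across all rows, skipping rows that lack it."""
    return [
        row[family][column]
        for row in table.values()
        if family in row and column in row[family]
    ]


put("order#1", "info", "customer", "Ankita")
put("order#1", "totals", "amount", 250)
put("order#2", "info", "customer", "Kavita")
put("order#2", "totals", "amount", 100)
put("order#2", "totals", "discount", 10)    # extra column only on this row
put("order#3", "info", "customer", "John")  # no "totals" family at all

total_amount = sum(column_values("totals", "amount"))
discounts = column_values("totals", "discount")
```

Rows may carry different columns, and an aggregate such as the total amount only needs the values of one column rather than whole rows.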
RDBMS
1. The relational database system is based on relationships among the tables.
2. It is vertically scalable.
3. It has a predefined schema.
4. It uses SQL to query the database.
5. It is a table-based database.
6. It emphasizes ACID properties (Atomicity, Consistency, Isolation, and
Durability).
7. Schema is fixed or rigid.
8. Pessimistic: consistency is enforced at the end of every operation.
9. Examples: MySQL, Oracle, PostgreSQL
NoSQL
1. It is a non-relational database system. It can be used in a distributed environment.
2. It is horizontally scalable.
3. It does not have a schema or it may have a relaxed schema.
4. It has no single standard query language; each database provides its own
query interface.
5. It is document-based, graph-based, key-value, or wide-column.
6. It follows Brewer's CAP theorem (Consistency, Availability, and Partition
Tolerance).
7. Schema is dynamic.
8. Optimistic: temporary inconsistency is accepted and resolved over time.
9. Examples: MongoDB, BigTable, Redis
MongoDB
CRUD Operations
1. Create Database
• Command to create: use Database_name
• Example: use mystudents
2. Drop Database
• Command to drop: db.dropDatabase()
• Example: db.dropDatabase()
3. Create Collection
• Command for direct insertion (the collection is created automatically on the
first insert): db.collection_name.insert({key1: value1, key2: value2})
• Command for explicit creation: db.createCollection(name, options)
• Example for explicit creation: db.createCollection("myemp")
4. Display Collection (Read Operation)
• Command to display collections: show collections
• Example: show collections
5. Drop Collection (Delete Operation)
• Command to drop collection: db.collection_name.drop()
• Example: db.myemp.drop()
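6. Insert Documents
• Command to insert: db.collection_name.insert({key: value})
• Example: db.myemp.insert({name: "John", age: 25, department: "HR"})
• Note: insert() is deprecated in current MongoDB shells; insertOne() and
insertMany() are the replacements.
7. Delete Documents
• Command to delete: db.collection_name.remove(delete_criteria)
• Example: db.myemp.remove({name: "John"})
• Note: remove() is likewise deprecated in favor of deleteOne() and
deleteMany().
8. Update Documents
• Command to update: db.collection_name.update(criteria, update_data)
• Example: db.myemp.update({name: "Alice"}, {$set: {age: 28}})
• Note: update() is likewise deprecated in favor of updateOne() and
updateMany().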
9. Sorting
• Command for ascending order: db.collection_name.find().sort({field_name: 1})
• Command for descending order: db.collection_name.find().sort({field_name: -1})
• Example: db.myemp.find().sort({name: 1})
10. Indexing
• Command to create index: db.collection.createIndex({KEY: 1})
• Command to find index: db.collection.getIndexes()
• Command to drop index: db.collection.dropIndex("index_name")
• Example: db.myemp.createIndex({name: 1})
11. Aggregation
• Command for aggregation: db.collection_name.aggregate(aggregate_operation)
• Example: db.customers.aggregate([{$group: {_id: "$type", category: {$sum: 1}}}])
12. Map Reduce
• Command for mapReduce: db.collection.mapReduce(mapFunction, reduceFunction, {out: collection, query: document, sort: document, limit: number})
• Note: mapReduce is deprecated since MongoDB 5.0; the aggregation pipeline is
the recommended alternative.
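To show what the aggregation example and a map-reduce job compute, here is a minimal Python sketch that runs on an in-memory list instead of a MongoDB server. The contents of the `customers` list are invented for illustration, and `map_fn` and `reduce_fn` are hypothetical stand-ins for the functions that would be passed to mapReduce.

```python
from collections import Counter, defaultdict

# Hypothetical contents of the "customers" collection.
customers = [
    {"name": "Ankita", "type": "retail"},
    {"name": "Kavita", "type": "wholesale"},
    {"name": "John", "type": "retail"},
]

# Equivalent of {$group: {_id: "$type", category: {$sum: 1}}}:
# one output document per distinct "type", counting its members.
counts = Counter(doc["type"] for doc in customers)
grouped = [{"_id": key, "category": n} for key, n in sorted(counts.items())]


# The same count expressed in map-reduce style.
def map_fn(doc):
    # Emit one (key, value) pair per document.
    yield doc["type"], 1


def reduce_fn(key, values):
    # Combine all values emitted for one key.
    return sum(values)


emitted = defaultdict(list)
for doc in customers:
    for key, value in map_fn(doc):
        emitted[key].append(value)

reduced = {key: reduce_fn(key, values) for key, values in emitted.items()}
```

Both approaches give the same per-type counts. The `$sum: 1` accumulator adds 1 for every document in a group, which is why the field named `category` in the example is effectively a count.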
Replication
• Replication is the process of making data available across multiple servers to
ensure data availability and resilience against server failures.
• In MongoDB, replication is implemented using replica sets, which consist of a
primary node and one or more secondary nodes.
• The primary node handles all write operations and, by default, read operations,
while the secondary nodes continuously replicate the data from the primary node.
• If the primary node fails, one of the secondary nodes will be elected as the new
primary node, ensuring continuous data availability.
Benefits of Replication:
1. Data availability: Replication ensures that data remains accessible even if the
primary node fails.
2. Fault tolerance: Replication protects against data loss due to hardware failures
or server crashes.
3. Improved performance: Replication can improve read performance by letting
secondary nodes serve some read requests.
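The failover behaviour described above can be sketched in Python. This is a deliberately simplified model: writes are copied to secondaries immediately and the "election" just picks the first live node, whereas a real MongoDB replica set replicates asynchronously and elects a primary by voting. The class and node names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, key, value):
        # Writes go to the primary, then are copied to live secondaries.
        self.primary.data[key] = value
        for node in self.nodes:
            if node is not self.primary and node.alive:
                node.data[key] = value

    def fail_primary(self):
        # Simplified election: the first live secondary becomes primary.
        self.primary.alive = False
        self.primary = next(n for n in self.nodes if n.alive)

    def read(self, key):
        return self.primary.data.get(key)


rs = ReplicaSet(["node-1", "node-2", "node-3"])
rs.write("emp:1", {"name": "John"})
rs.fail_primary()
new_primary = rs.primary.name
still_readable = rs.read("emp:1")
```

After the original primary fails, the data written earlier is still served by the newly promoted node, which is the availability benefit listed above.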
Sharding
• Sharding is a horizontal scaling technique that splits large datasets into
smaller chunks called shards distributed across multiple MongoDB instances.
• Sharding is not replication: replication copies the same data to several
servers, whereas sharding gives each server a different subset of the data to
improve scalability and performance.
• Sharding works by dividing a large collection into multiple shards and using
config servers to maintain metadata about which shard holds which data.
• A router instance (mongos) is responsible for routing client requests to the
appropriate shard based on the shard key.
Benefits of Sharding:
1. Horizontal scalability: Sharding allows MongoDB to handle large datasets by
distributing data across multiple servers.
2. Improved performance: Sharding can improve performance by parallelizing
queries and read operations across multiple shards.
3. Reduced storage costs: Because each server stores only part of the data,
several smaller, cheaper machines can be used instead of one large server.
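The routing role of the shard key can be sketched in Python. This is an illustration under simplifying assumptions: the shard names and the `emp_id` shard key are hypothetical, and the shard is chosen by hashing modulo the shard count, whereas MongoDB assigns ranges of (optionally hashed) key values to shards and records that mapping on the config servers.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]   # hypothetical shard names
storage = {name: {} for name in SHARDS}      # stands in for separate servers


def shard_for(shard_key_value):
    """Router logic: hash the shard key and map it to one shard."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def insert(doc, shard_key="emp_id"):
    target = shard_for(doc[shard_key])
    storage[target][doc[shard_key]] = doc
    return target


def find_by_key(value):
    # A query on the shard key touches only one shard.
    return storage[shard_for(value)].get(value)


for emp_id in range(1, 101):
    insert({"emp_id": emp_id, "name": f"employee-{emp_id}"})

found = find_by_key(42)
sizes = {name: len(docs) for name, docs in storage.items()}
```

Each document lives on exactly one shard, and a lookup by shard key is routed to that shard alone. A query that does not include the shard key would have to be sent to every shard.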
3) List the different NoSQL data models. Explain the document store NoSQL data
model with an example. [9]
NoSQL Data Models:
1. Document Store
2. Key-Value Store
3. Column-Family Store
4. Graph Database
5. Object-Oriented Database
6. Multi-Model Database
Document Store NoSQL Data Model:
Explanation: The document store NoSQL data model organizes and stores data in
flexible, semi-structured documents, typically in formats like JSON or BSON. Each
document contains key-value pairs, and collections of documents form a database.
This model allows for dynamic schema, making it suitable for varied and evolving
data structures.
Example: Consider a blogging platform using a document store NoSQL database:
{
"_id": "123456",
"title": "NoSQL Explained",
"author": "John Doe",
"content": "A detailed explanation of NoSQL databases...",
"tags": ["NoSQL", "Database", "Document Store"],
"date": "2023-01-15"
}
Explanation:
• Each blog post is a document.
• Fields like title, author, content, tags, and date are key-value pairs.
• No predefined schema, allowing flexibility in adding or modifying fields.
• The tags field is an array, showcasing support for nested or varied data structures.
• The _id field uniquely identifies each document.
This document store model is beneficial for scenarios where data structures are
diverse, evolving, or where flexibility in schema is crucial.
7) BASE Transactions ensure properties like Basically Available, Soft State, Eventual
Consistency. What is the soft state of any system, and how does it depend on the
Eventual consistency property? [6]
Soft State in a System:
• Soft state refers to a system characteristic where the state can change over
time without explicit external inputs. It allows for temporary inconsistencies
or variations in the data across different components or nodes within a
distributed system.
Dependence on Eventual Consistency:
• Soft state is acceptable only because of eventual consistency: in a distributed
system with BASE (Basically Available, Soft State, Eventual Consistency)
transactions, replicas are allowed to disagree temporarily because the system
guarantees that they will converge.
Explanation:
1. Basically Available (BA):
• The system remains basically available for operations even in the
presence of temporary inconsistencies or variations in the data.
2. Soft State (S):
• Soft state acknowledges the transient nature of inconsistencies and
allows for temporary variations in the system's data.
3. Eventual Consistency (E):
• Eventual consistency ensures that given enough time without new
updates, all replicas or nodes in the system will converge to a consistent
state.
Dependence Relation:
• Soft state is contingent on the understanding that, due to factors like network
delays, partitions, or updates not instantly propagating, temporary
inconsistencies may exist.
• Eventual consistency acts as a guarantee that, over time, these temporary
inconsistencies will be resolved, and the system will reach a consistent state.