NoSQL Database


NoSQL Databases

1) Explain how NoSQL databases are different from relational databases. Describe in
detail the key-value store NoSQL data model with an example. [9]
2) Explain BASE properties with its significance. How does the soft state of the
system depend on the Eventual consistency property? [8]
3) List the different NoSQL data models. Explain the document store NoSQL data
model with an example. [9]
6) State and explain the concept of CAP theorem and BASE properties with an
example.
7) BASE Transactions ensure properties like Basically Available, Soft State, Eventual
Consistency. What is the soft state of any system, and how does it depend on the
Eventual consistency property? [6]
8) Enlist the different types of NoSQL databases and explain them with suitable
examples. [8]
9) What is structured and unstructured data? Explain with an example. [4]
10) Explain the CAP theorem referred to during the development of any distributed
application.
11) Analyze the use of NoSQL databases in the current social networking
environment. Also, explain the need for NoSQL databases in the social networking
environment over RDBMS. [6]
12) Explain the difference between SQL and NoSQL databases.
13) Explain the following NoSQL database types with examples and also state the
scenario where each is useful:
i) Column-oriented
ii) Graph
iii) Document-oriented
15) Describe a distributed database. Explain the system architecture of distributed
transactions.
16) Explain the following types of data with examples [9]:
i) Structured
ii) Semi-structured
iii) Unstructured
| Sr. No. | Structured Data | Semi-Structured Data | Unstructured Data |
|---------|-----------------|----------------------|-------------------|
| 1 | Fixed and organized form | Combination of structured and unstructured data | Not predefined or organized form |
| 2 | Schema dependent, less flexible | More flexible than structured, less than unstructured | Most flexible |
| 3 | Uses structured query languages | Uses tags and elements to access the data | Only textual queries are possible |
| 4 | Requires less storage | Significant storage requirements | Huge storage requirements |
| 5 | Examples: Phone numbers, Customer Names, Social Security numbers | Examples: Server logs, Tweets organized by hashtags, emails sorted by the inbox, sent or draft folders | Examples: Emails and messages, Image files, Open-ended survey answers |
| 6 | Well-defined data model | Partially defined data model | No defined data model |
| 7 | Examples: Relational databases | Examples: JSON, XML, CSV files | Examples: Text documents, audio files, video files |
| 8 | Limited variability in data | Moderate variability in data | High variability in data |
| 9 | Efficient for analysis | Intermediate efficiency for analysis | Challenging for analysis |
| 10 | Examples: SQL queries | Examples: XPath queries, JSON Path queries | Examples: Full-text search, natural language processing (NLP) |
| 11 | Examples: Excel spreadsheets | Examples: NoSQL databases like MongoDB | Examples: Document stores, file systems |

NoSQL Database
Introduction
• NoSQL stands for not only SQL.
• It is a non-tabular database system that stores data differently than relational
tables.
• Various types of NoSQL databases include document, key-value, wide column,
and graph.
• NoSQL databases support flexible schemas and scale easily as the amount of
data grows.
• NoSQL databases are often chosen for their ability to handle diverse data
types, providing a more dynamic and adaptable approach to data storage and
retrieval.
Need
The NoSQL database technology is usually adopted for the following reasons:
1. NoSQL databases are often used for handling big data as a part of the
fundamental architecture.
2. NoSQL databases are used for storing and modeling structured, semi-
structured, and unstructured data.
3. NoSQL is used where databases must run efficiently with high availability.
4. NoSQL databases are non-relational, so they scale out more easily than
relational databases and fit well with web applications.
5. NoSQL databases scale easily as data and traffic grow.

Features
1. NoSQL does not follow any relational model.
2. It is either schema-free or has a relaxed schema, meaning it does not require a
specific definition of the schema.
3. Multiple NoSQL databases can be executed in a distributed fashion.
4. It can process both unstructured and semi-structured data.
5. NoSQL databases have higher scalability.
6. It is cost-effective.
7. It supports data in the form of key-value pairs, documents, wide columns, and graphs.
8. NoSQL databases are designed to handle large volumes of data and high-
velocity data streams.
9. They often provide built-in support for horizontal scaling, allowing seamless
expansion as data grows.
10. NoSQL databases are well-suited for scenarios where data structures may
evolve over time.
11. The flexibility of NoSQL allows developers to work with diverse data types
without rigid constraints.
12. Many NoSQL databases offer automatic sharding, distributing data across
multiple servers for improved performance.
13. NoSQL databases are commonly associated with a decentralized architecture,
enhancing fault tolerance.
14. They excel in use cases requiring quick development cycles, as changes to the
database schema are more straightforward.
15. NoSQL databases are frequently employed in modern web and mobile
application development due to their adaptability.

Types of NoSQL Databases

1. Key-value store
2. Document store
3. Graph based
4. Wide column store

1 Key-Value Store
1. The key-value pair is the simplest type of NoSQL database.
2. Designed to handle large volumes of data and heavy loads efficiently.
3. In key-value storage, each key is unique, and the corresponding value can be
in various formats such as JSON, string, or binary objects.
4. Example: Here, the key "Customer" maps to a single value, a JSON array of
customer records. The store looks the value up by its key and does not
interpret the fields ("id", "name") inside it.
{"Customer":
[
{"id": 1, "name": "Ankita"},
{"id": 2, "name": "Kavita"}
]
}
5. Key-value stores facilitate the storage of schema-less data, making them
particularly useful for scenarios like Shopping Cart Contents.
6. Examples of Key-Value Stores: DynamoDB, Riak, Redis
7. Key-value stores provide a simple and effective way to manage and retrieve
data, making them suitable for various applications that require fast and
scalable data access.
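The shopping cart scenario above can be sketched as a minimal in-memory key-value store. This is only an illustration of the data model, not how DynamoDB, Riak, or Redis are implemented; the class name `KeyValueStore` and the key `cart:ankita` are assumptions made for the example.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """In-memory sketch: each unique key maps to one opaque value."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites the previous value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookups are by exact key only; the store never inspects the value.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
cart = [{"item": "pen", "qty": 2}, {"item": "notebook", "qty": 1}]

# The application serializes the cart; the store just keeps the bytes.
store.put("cart:ankita", json.dumps(cart).encode("utf-8"))

raw = store.get("cart:ankita")
restored = json.loads(raw) if raw is not None else []
```

Because the value is opaque to the store, any schema lives entirely in the application, which is what makes the model schema-less.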

2 Document Store
1. Document stores make use of key-value pairs to store and retrieve data.
2. Documents are typically stored in the form of XML or JSON.
3. Among NoSQL database types, document stores map most naturally onto the
objects that application code already works with.
4. They are commonly used due to their flexibility and the ability to query on any
field within the document.
5. Example:
6. MongoDB and CouchDB are two popular document-oriented NoSQL
databases.
7. Document stores are suitable for scenarios where data structures may evolve
over time, and the ability to work with varying data types is crucial.
8. These databases are known for their adaptability to changing application
requirements.
9. Document stores provide a versatile and scalable solution for managing
diverse data structures and are widely utilized in modern application
development.

BASE Properties
The relational database strongly follows the ACID properties (Atomicity,
Consistency, Isolation, and Durability) while the NoSQL database follows BASE
properties.
BASE properties consist of:
Basically Available:
• The system remains available for requests even when some of its nodes fail.
• This property prioritizes system availability over immediate consistency.
• Even in the face of faults or network partitions, the system remains
operational.
Soft State:
• It means that the system state may change even without input.
• The system does not require all components to be in a consistent state at all
times.
• Soft state allows for flexibility and adaptability to changing conditions.
Eventual Consistency:
• The system will become consistent over time.
• While the system may not be immediately consistent, given enough time, all
replicas or nodes in the system will converge to a consistent state.
• This property acknowledges that achieving immediate consistency in a
distributed system might not be practical or efficient.
• BASE properties provide a more relaxed approach to consistency and
availability, making them suitable for distributed and scalable NoSQL
databases where strict adherence to ACID principles may be challenging.
Example:
In a distributed e-commerce system using a NoSQL database with BASE properties:
Basically Available:
Customers can still browse and buy products from available servers even if some
parts of the system go down (e.g., a data center).
Soft State:
Inventory levels may temporarily differ between nodes due to update delays or
network partitions.
The system allows for temporary inconsistency in inventory data.
Eventually Consistent:
After a period without new purchases or updates, all copies of inventory data across
nodes will converge to a consistent state.
The system ensures eventual consistency by providing time for updates to propagate.
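The inventory example can be sketched in a few lines. This is a simplified model under stated assumptions: two replicas, a version number attached to each write, and a "highest version wins" merge rule. Real databases use more elaborate mechanisms; the names `Replica` and `sync_all` are hypothetical.

```python
class Replica:
    """One node's copy of the inventory: item -> (version, quantity)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, item, quantity, version):
        self.data[item] = (version, quantity)

    def read(self, item):
        entry = self.data.get(item)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Keep whichever copy carries the higher version (last write wins).
        for item, (version, quantity) in other.data.items():
            if item not in self.data or self.data[item][0] < version:
                self.data[item] = (version, quantity)


def sync_all(replicas):
    """One propagation round: every replica merges from every other one."""
    for target in replicas:
        for source in replicas:
            if source is not target:
                target.merge_from(source)


pune, mumbai = Replica("pune"), Replica("mumbai")
for replica in (pune, mumbai):
    replica.write("pen", 10, version=1)

# A purchase is accepted by one node only; the other has not heard of it yet.
pune.write("pen", 9, version=2)
before = (pune.read("pen"), mumbai.read("pen"))  # soft state: copies differ

sync_all([pune, mumbai])
after = (pune.read("pen"), mumbai.read("pen"))   # converged after propagation
```

Between the write and the sync round the two nodes disagree (soft state); once updates stop and propagation completes, both report the same quantity (eventual consistency).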

3 Graph Database
• Graph databases are typically used in applications where the relationships
among data elements are a critical aspect.
• Connections between elements in a graph database are called links or
relationships.
• In a graph database, connections are first-class elements of the database and
are stored directly.
• Components of a Graph Database:
1. Node: Represents entities (e.g., people, students).
2. Edge: Represents relationships among the entities.
• Graph databases excel in scenarios where understanding and querying
relationships between data entities are essential.
• Example Use Cases:
• Social Networks
• Logistics
• Spatial Data
Notable Graph Databases:
• Neo4j
• InfiniteGraph
• OrientDB
Graph databases provide a powerful way to model and query relationships, making
them well-suited for applications that heavily rely on understanding and navigating
connections between different data elements.
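A small sketch of the node and edge model, using a social network as the use case. It assumes an in-memory adjacency structure and a relationship type named `FRIEND`; it is not the storage format or query language of Neo4j or any other product.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        # node -> {relationship type -> set of neighbouring nodes}
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source, relation, target, bidirectional=False):
        # Relationships are stored directly, not derived through joins.
        self._edges[source][relation].add(target)
        if bidirectional:
            self._edges[target][relation].add(source)

    def neighbours(self, node, relation):
        # .get() avoids creating empty entries for unknown nodes.
        return set(self._edges.get(node, {}).get(relation, set()))

    def friends_of_friends(self, node):
        direct = self.neighbours(node, "FRIEND")
        reachable = set()
        for friend in direct:
            reachable |= self.neighbours(friend, "FRIEND")
        # Exclude the person themselves and people who are already friends.
        return reachable - direct - {node}


network = Graph()
network.add_edge("Ankita", "FRIEND", "Kavita", bidirectional=True)
network.add_edge("Kavita", "FRIEND", "Rahul", bidirectional=True)
network.add_edge("Rahul", "FRIEND", "Priya", bidirectional=True)

suggestions = network.friends_of_friends("Ankita")
```

The "friends of friends" query is a two-hop traversal over stored edges, which is the kind of relationship query the bullets above describe.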

4 Wide Column Store


• The wide column store model is akin to a traditional relational database but
differs in its approach to column organization.
• In this model, columns are created for each row, as opposed to having
predefined columns in the table structure.
• Unlike traditional relational databases where the number of columns is fixed,
the wide column store model allows for flexibility in the number and types of
columns for each record.
• This flexibility is particularly advantageous in scenarios where the data
structure is subject to change or expansion over time.
Example
| Row ID | Columns                                              |
|--------|------------------------------------------------------|
| 1      | Name: Ankita, City: Pune                             |
| 2      | Name: Kavita, City: Mumbai, Email: [email protected] |

• Because the values of related columns are stored together, wide column
stores can read or aggregate a few columns across many rows without loading
whole records, which suits data warehousing and business intelligence
applications.
• These databases are adept at handling large-scale distributed data and are
designed for scalability.
• Examples of Column-Based Databases:
• HBase
• Cassandra
Wide column store databases offer a flexible and scalable solution for managing and
analyzing large volumes of data with changing or evolving structures.
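The per-row column flexibility shown in the example table can be sketched as a map of row keys to column maps. This is a conceptual model only, not the on-disk layout of HBase or Cassandra; the class name `WideColumnTable` and the email value are placeholders chosen for illustration.

```python
class WideColumnTable:
    def __init__(self):
        # row key -> {column name -> value}; no column list is declared up front
        self._rows = {}

    def put(self, row_key, column, value):
        # A new column comes into existence the first time a row writes it.
        self._rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def columns_of(self, row_key):
        return sorted(self._rows.get(row_key, {}))


customers = WideColumnTable()
customers.put(1, "Name", "Ankita")
customers.put(1, "City", "Pune")
customers.put(2, "Name", "Kavita")
customers.put(2, "City", "Mumbai")
customers.put(2, "Email", "kavita@example.com")  # placeholder value

row1_columns = customers.columns_of(1)
row2_columns = customers.columns_of(2)
```

Row 2 carries an `Email` column that row 1 simply does not have, with no table-wide schema change.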

| Sr. No. | ACID | BASE |
|---------|------|------|
| 1 | Stands for Atomicity, Consistency, Isolation, and Durability. | Stands for Basically Available, Soft State, Eventual Consistency. |
| 2 | Provides strong consistency. | Provides weak (eventual) consistency. |
| 3 | Availability is less important. | Availability of the system is more important. |
| 4 | Schema evolution is difficult. | Schema evolution is easy. |
| 5 | Possesses expensive joins and relationships. | Free from joins and relationships. |
| 6 | High maintenance costs. | Low maintenance cost. |
| 7 | Provides vertical scaling. | Provides horizontal scaling. |

RDBMS
1. The relational database system is based on relationships among the tables.
2. It is vertically scalable.
3. It has a predefined schema.
4. It uses SQL to query the database.
5. It is a table-based database.
6. It emphasizes ACID properties (Atomicity, Consistency, Isolation, and
Durability).
7. Schema is fixed or rigid.
8. Pessimistic: conflicts are prevented up front, typically by locking data
during a transaction.
9. Examples: MySQL, Oracle, PostgreSQL

NoSQL
1. It is a non-relational database system. It can be used in a distributed environment.
2. It is horizontally scalable.
3. It does not have a schema or it may have a relaxed schema.
4. It has no single standard query language; each database provides its own
query language or API.
5. It is document-based, graph-based, key-value, or wide-column.
6. Its design trade-offs are described by Brewer's CAP theorem (Consistency,
Availability, and Partition Tolerance).
7. Schema is dynamic.
8. Optimistic: updates are accepted first and conflicts are reconciled
afterwards.
9. Examples: MongoDB, BigTable, Redis

MongoDB

MongoDB is a document-oriented NoSQL database that stores data in JSON-like
documents. It is known for its high performance, scalability, and flexibility, making it
a popular choice for a wide variety of applications, including web applications,
mobile applications, and data warehousing.
Key Features of MongoDB:
• Document-oriented: Data is stored in JSON-like documents, which are
flexible and can hold different types of data.
• Schema-less: MongoDB does not require a predefined schema, so you can add
new fields to documents without changing the schema.
• High performance: MongoDB is designed for high performance and can
handle large volumes of data.
• Scalability: MongoDB is horizontally scalable, so you can add more nodes to
the cluster to handle increasing data volumes.
• Flexibility: MongoDB can be used for a variety of applications, from web
applications to data warehousing.
Use Cases of MongoDB:
• Web applications: MongoDB is a popular choice for storing user data, session
data, and application data in web applications.
• Mobile applications: MongoDB is lightweight and can be easily integrated into
mobile applications.
• Data warehousing: MongoDB can be used to collect and store large amounts
of data for analysis and reporting.
Examples of MongoDB Applications:
• Social media platforms: MongoDB can be used to store user profiles, posts,
and comments.
• E-commerce websites: MongoDB can be used to store product information,
user shopping carts, and order data.
• Game development: MongoDB can be used to store game data, player profiles,
and game statistics.
• IoT applications: MongoDB can be used to store sensor data from IoT devices.
| Feature | SQL | MongoDB |
|---------|-----|---------|
| Data Model | Relational | Document-oriented |
| Schema | Predefined | Flexible or no schema |
| Query Language | SQL | MongoDB's MQL (MongoDB Query Language) |
| Data Storage | Tables | Documents (JSON-like) |
| Data Consistency | ACID (Atomicity, Consistency, Isolation, Durability) | CAP (Consistency, Availability, Partition Tolerance) |
| Scalability | Vertical | Horizontal |
| Data Integrity | Strong | Flexible |
| Applications | Structured data, transaction processing | Unstructured data, web applications, mobile applications |
| Examples | MySQL, Oracle, PostgreSQL | MongoDB, Cassandra, CouchDB |

| StudentID | Name | Age | City | Grade |
|-----------|-----------|-----|----------|-----------|
| 101 | Rahul | 20 | Mumbai | A |
| 102 | Priya | 22 | Delhi | B |
| 103 | Raj | 21 | Kolkata | A |
| 104 | Simran | 23 | Chennai | C |
[
{
"studentID": 101,
"name": "Rahul",
"age": 20,
"city": "Mumbai",
"grade": "A"
},
{
"studentID": 102,
"name": "Priya",
"age": 22,
"city": "Delhi",
"grade": "B"
},
{
"studentID": 103,
"name": "Raj",
"age": 21,
"city": "Kolkata",
"grade": "A"
},
{
"studentID": 104,
"name": "Simran",
"age": 23,
"city": "Chennai",
"grade": "C"
}
]

Various data types supported by MongoDB:


1. Integer:
• Used for storing numerical values.
2. Boolean:
• Used for implementing Boolean values, i.e., true or false.
3. Double:
• Used for storing floating-point data.
4. String:
• Most commonly used for storing string values.
5. Min/Max Keys:
• Used to compare a value against the lowest or highest BSON element.
6. Arrays:
• Used for storing an array or list of multiple values in one key.
7. Object:
• Implemented for embedded documents.
8. Symbol:
• Similar to the string data type; used to store specific symbol types. This
type is deprecated in current MongoDB versions.
9. Null:
• Used for storing null values.
10. Date:
• Used to store a date and time, such as the current one. Custom date or
time objects can also be created.
11. Binary Data:
• Used to store binary data.
12. Regular Expression:
• Used to store regular expressions.

CRUD Operations
1. Create Database
• Command to create: use Database_name
• Example: use mystudents
• Note: the database is actually created only when data is first stored in it.
2. Drop Database
• Command to drop: db.dropDatabase()
• Example: db.dropDatabase()
3. Create Collection
• Command for direct insertion (creates the collection implicitly):
db.collection_name.insertOne({key1: value1, key2: value2})
• Command for explicit creation: db.createCollection(name, options)
• Example for explicit creation: db.createCollection("myemp")
4. Display Collections (Read Operation)
• Command to display collections: show collections
• Example: show collections
• Command to read the documents in a collection: db.collection_name.find()
• Example: db.myemp.find()
5. Drop Collection (Delete Operation)
• Command to drop collection: db.collection_name.drop()
• Example: db.myemp.drop()
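6. Insert Documents
• Command to insert: db.collection_name.insertOne({key: value})
• Example: db.myemp.insertOne({name: "John", age: 25, department: "HR"})
• Note: the older insert(), remove(), and update() helpers are deprecated
in the current shell (mongosh); insertOne/insertMany, deleteOne/deleteMany,
and updateOne/updateMany replace them.
7. Delete Documents
• Command to delete: db.collection_name.deleteOne(delete_criteria) or
db.collection_name.deleteMany(delete_criteria)
• Example: db.myemp.deleteOne({name: "John"})
8. Update Documents
• Command to update: db.collection_name.updateOne(criteria, update_data)
• Example: db.myemp.updateOne({name: "Alice"}, {$set: {age: 28}})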
9. Sorting
• Command for ascending order: db.collection_name.find().sort({field_name: 1})
• Command for descending order: db.collection_name.find().sort({field_name: -1})
• Example: db.myemp.find().sort({name: 1})
10. Indexing
• Command to create index: db.collection.createIndex({KEY: 1})
• Command to find index: db.collection.getIndexes()
• Command to drop index: db.collection.dropIndex("index_name")
• Example: db.myemp.createIndex({name: 1})
11. Aggregation
• Command for aggregation: db.collection_name.aggregate(aggregate_operation)
• Example: db.customers.aggregate([{$group: {_id: "$type", category: {$sum: 1}}}])
This groups the customers by their type field and counts the documents in
each group.
12. Map Reduce
• Command for mapReduce: db.collection.mapReduce(mapFunction, reduceFunction,
{out: collection, query: document, sort: document, limit: number})
• Note: mapReduce is deprecated since MongoDB 5.0; the aggregation pipeline
is the recommended replacement.
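To make the sorting and aggregation commands concrete, the sketch below reproduces what they compute using plain Python lists and dictionaries. It does not talk to MongoDB and is not the pymongo API; the sample `customers` documents are invented for illustration.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample documents standing in for a "customers" collection.
customers = [
    {"name": "Rahul", "type": "retail"},
    {"name": "Priya", "type": "wholesale"},
    {"name": "Raj", "type": "retail"},
]

# What find().sort({name: 1}) computes: documents in ascending order of name.
by_name = sorted(customers, key=itemgetter("name"))

# What {$group: {_id: "$type", category: {$sum: 1}}} computes:
# one output document per distinct "type", with a count of its members.
counts = Counter(doc["type"] for doc in customers)

# The result is sorted here only to make the output deterministic;
# MongoDB does not promise any particular order for $group output.
grouped = [{"_id": t, "category": n} for t, n in sorted(counts.items())]
```

Note that the output field is called `category` only because the original example names it that way; it holds a count.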

Replication
• Replication is the process of making data available across multiple servers to
ensure data availability and resilience against server failures.
• In MongoDB, replication is implemented using replica sets, which consist of a
primary node and multiple secondary nodes.
• The primary node receives all write operations and, by default, the read
operations as well, while the secondary nodes continuously replicate the data
from the primary node.
• If the primary node fails, one of the secondary nodes will be elected as the new
primary node, ensuring continuous data availability.
Benefits of Replication:
1. Data availability: Replication ensures that data remains accessible even if the
primary node fails.
2. Fault tolerance: Replication protects against data loss due to hardware failures
or server crashes.
3. Improved read performance: Read operations can be directed to secondary
nodes, spreading the read load across multiple servers.
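The primary/secondary behaviour described above can be sketched as follows. This is a deliberately simplified model: the "election" just promotes the live node with the longest log, whereas a real replica set elects a primary by majority vote among its members. The names `Node` and `ReplicaSet` are assumptions for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []      # ordered list of writes this node has applied
        self.alive = True


class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, operation):
        # All writes go to the primary first.
        self.primary.log.append(operation)

    def replicate(self):
        # Live secondaries copy the log entries they are missing.
        for node in self.nodes:
            if node.alive and node is not self.primary:
                node.log.extend(self.primary.log[len(node.log):])

    def fail_primary(self):
        self.primary.alive = False
        candidates = [n for n in self.nodes if n.alive]
        if not candidates:
            raise RuntimeError("no live node left to promote")
        # Simplified election: promote the most up-to-date live node.
        self.primary = max(candidates, key=lambda n: len(n.log))


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"name": "John"})
rs.replicate()

rs.fail_primary()                  # node-a goes down
new_primary = rs.primary.name
rs.write({"name": "Alice"})        # writes continue on the new primary
```

The data written before the failure is still present on the promoted node because it had been replicated beforehand.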
Sharding
• Sharding is a horizontal scaling technique that splits large datasets into
smaller chunks called shards distributed across multiple MongoDB instances.
• Sharding is not replication; it simply distributes data across multiple servers
to improve scalability and performance.
• Sharding works by dividing a large collection into multiple shards and using a
config server to maintain metadata about the shards.
• A router instance (mongos) is responsible for routing client requests to the
appropriate shard based on the shard key.
Benefits of Sharding:
1. Horizontal scalability: Sharding allows MongoDB to handle large datasets by
distributing data across multiple servers.
2. Improved performance: Sharding can improve performance by parallelizing
queries and read operations across multiple shards.
3. Lower hardware costs: Each server stores only part of the data, so several
modest machines can take the place of one very large one.
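A minimal sketch of shard-key routing, assuming a hash of the shard key taken modulo the number of shards. MongoDB itself assigns ranges of key values (chunks) to shards and tracks them on the config servers, so this is only the general idea. `ShardRouter` and the shard names are hypothetical.

```python
import hashlib


class ShardRouter:
    """Plays the role of the router: picks a shard from the shard key."""

    def __init__(self, shard_names):
        self._names = list(shard_names)
        self.shards = {name: [] for name in self._names}

    def shard_for(self, shard_key_value):
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
        return self._names[int(digest, 16) % len(self._names)]

    def insert(self, document, shard_key):
        self.shards[self.shard_for(document[shard_key])].append(document)

    def find_by_key(self, shard_key, value):
        # A query on the shard key touches exactly one shard.
        target = self.shards[self.shard_for(value)]
        return [doc for doc in target if doc[shard_key] == value]


router = ShardRouter(["shard-0", "shard-1", "shard-2"])
students = [
    {"studentID": 101, "name": "Rahul"},
    {"studentID": 102, "name": "Priya"},
    {"studentID": 103, "name": "Raj"},
    {"studentID": 104, "name": "Simran"},
]
for student in students:
    router.insert(student, shard_key="studentID")

found = router.find_by_key("studentID", 102)
total_stored = sum(len(docs) for docs in router.shards.values())
```

Each document is stored on exactly one shard, which is what distinguishes sharding from replication.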

List the different NoSQL data models. Explain the document store NoSQL data
model with an example. [9]
NoSQL Data Models:
1. Document Store
2. Key-Value Store
3. Column-Family Store
4. Graph Database
5. Object-Oriented Database
6. Multi-Model Database
Document Store NoSQL Data Model:
Explanation: The document store NoSQL data model organizes and stores data in
flexible, semi-structured documents, typically in formats like JSON or BSON. Each
document contains key-value pairs, and collections of documents form a database.
This model allows for dynamic schema, making it suitable for varied and evolving
data structures.
Example: Consider a blogging platform using a document store NoSQL database:
{
"_id": "123456",
"title": "NoSQL Explained",
"author": "John Doe",
"content": "A detailed explanation of NoSQL databases...",
"tags": ["NoSQL", "Database", "Document Store"],
"date": "2023-01-15"
}
Explanation:
• Each blog post is a document.
• Fields like title, author, content, tags, and date are key-value pairs.
• No predefined schema, allowing flexibility in adding or modifying fields.
• Tags field is an array, showcasing support for nested or varied data structures.
• The _id field uniquely identifies each document.
This document store model is beneficial for scenarios where data structures are
diverse, evolving, or where flexibility in schema is crucial.
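The blog example can be sketched as a list of dictionaries with a small query helper. It assumes simple equality matching, plus "array contains value" matching for array fields; the helper names `matches` and `find`, the second post, and its `likes` field are invented for illustration.

```python
def matches(document, criteria):
    """True if the document satisfies every field/value pair in criteria."""
    for field, expected in criteria.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list) and not isinstance(expected, list):
            # For array fields, match when the array contains the value.
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True


def find(collection, criteria):
    return [doc for doc in collection if matches(doc, criteria)]


posts = [
    {
        "_id": "123456",
        "title": "NoSQL Explained",
        "author": "John Doe",
        "tags": ["NoSQL", "Database", "Document Store"],
        "date": "2023-01-15",
    },
    # A second document with a different set of fields: no "date", extra "likes".
    {
        "_id": "123457",
        "title": "Graph Basics",
        "author": "Jane Roe",
        "tags": ["Graph", "Database"],
        "likes": 12,
    },
]

tagged_database = find(posts, {"tags": "Database"})
by_john = find(posts, {"author": "John Doe"})
```

Both documents live in the same collection even though their fields differ, and any field, including the array field, can be used as a query condition.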

7) BASE Transactions ensure properties like Basically Available, Soft State, Eventual
Consistency. What is the soft state of any system, and how does it depend on the
Eventual consistency property? [6]
Soft State in a System:
• Soft state refers to a system characteristic where the state can change over
time without explicit external inputs. It allows for temporary inconsistencies
or variations in the data across different components or nodes within a
distributed system.
Dependence on Eventual Consistency:
• Soft state is tolerable only because of eventual consistency: replicas may
hold different values for a while, but the system keeps propagating updates
in the background until all of them agree.
Explanation:
1. Basically Available (BA):
• The system remains basically available for operations even in the
presence of temporary inconsistencies or variations in the data.
2. Soft State (S):
• Soft state acknowledges the transient nature of inconsistencies and
allows for temporary variations in the system's data.
3. Eventual Consistency (E):
• Eventual consistency ensures that given enough time without new
updates, all replicas or nodes in the system will converge to a consistent
state.
Dependence Relation:
• Soft state is contingent on the understanding that, due to factors like network
delays, partitions, or updates not instantly propagating, temporary
inconsistencies may exist.
• Eventual consistency acts as a guarantee that, over time, these temporary
inconsistencies will be resolved, and the system will reach a consistent state.

10) CAP Theorem:


1. Consistency (C):
• Definition: All nodes in a distributed system have the same data view
at the same time.
• Implication: Every read receives the most recent write, ensuring a
consistent state across the system.
2. Availability (A):
• Definition: Every request to the distributed system receives a
response, without guarantee that it contains the most recent version of
the data.
• Implication: The system remains operational even in the presence of
node failures.
3. Partition Tolerance (P):
• Definition: The system continues to operate and provide responses
despite network partitions or communication failures between nodes.
• Implication: The system remains functional and available even when
some nodes cannot communicate with others.
Explanation:
• The CAP theorem asserts that in a distributed system, it is impossible to
simultaneously provide all three guarantees (Consistency, Availability, and
Partition Tolerance).
• During network partitions (P), a system must choose between maintaining
Consistency or Availability.
• Different distributed systems prioritize CAP properties based on their specific
use cases and requirements.
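The choice between consistency and availability during a partition can be sketched with two nodes. The labels "CP" and "AP" and the class names are assumptions for this example; real systems expose the trade-off through configuration rather than a single flag.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""


class Node:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (labels used in this sketch)
        self.value = None
        self.partitioned = False    # True when cut off from its peer

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm it holds the latest write, so it refuses.
            raise Unavailable("peer unreachable")
        # AP behaviour: answer with local data, even if it may be stale.
        return self.value


def replicate_write(primary, replica, value):
    primary.value = value
    if not replica.partitioned:
        replica.value = value


def read_during_partition(mode):
    primary, replica = Node(mode), Node(mode)
    replicate_write(primary, replica, "v1")   # both nodes see v1
    replica.partitioned = True                # network partition begins
    replicate_write(primary, replica, "v2")   # replica misses v2
    try:
        return replica.read()
    except Unavailable:
        return "unavailable"


cp_result = read_during_partition("CP")
ap_result = read_during_partition("AP")
```

The CP node gives up availability to avoid returning the outdated `v1`; the AP node stays available and returns `v1`, accepting temporary inconsistency.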
11) Use of NoSQL in Social Networking:
1. Scalability:
• Use in NoSQL: NoSQL databases like MongoDB, Cassandra, or Redis
provide horizontal scalability, allowing social networks to handle a
massive volume of users and data.
• Need Over RDBMS: Traditional RDBMS may struggle with the
scalability demands of rapidly growing social networks.
2. Flexibility in Data Models:
• Use in NoSQL: NoSQL databases accommodate varied data formats,
supporting diverse content such as user profiles, multimedia, and
dynamic relationships.
• Need Over RDBMS: Social networks generate diverse and evolving
data types, making flexible schemas crucial.
3. Quick Iteration and Development:
• Use in NoSQL: NoSQL's dynamic schema allows agile development
and quick iterations, essential for adapting to changing social network
features.
• Need Over RDBMS: RDBMS often requires extensive schema
changes, slowing down development in dynamic social environments.
4. High Performance:
• Use in NoSQL: NoSQL databases excel in read and write
performance, critical for delivering real-time updates and responses.
• Need Over RDBMS: Social networking demands low-latency
interactions, where NoSQL's performance advantages are beneficial.
Explanation:
• NoSQL databases align well with the dynamic, high-volume, and diverse data
requirements of social networking platforms.
• The need for scalability, flexibility, quick development, and high performance
drives the preference for NoSQL over traditional RDBMS in social networking
environments.

13) NoSQL Database Types


NoSQL databases offer a variety of data models, each designed to store and manage
different types of data efficiently. Here's an overview of three NoSQL data models
along with examples and suitable scenarios:
i) Column-oriented Database:
• Data Model: Stores data in column families, so that the values of related
columns are kept together rather than row by row. This layout favors queries
that read a few columns across many rows.
• Examples: Cassandra, HBase, ScyllaDB
• Suitable Scenarios:
o Data warehousing and analytics
o Time-series data analysis
o Large-scale data processing
ii) Graph Database:
• Data Model: Represents data as a network of interconnected nodes and edges.
Nodes represent entities or objects, while edges represent relationships
between them.
• Examples: Neo4j, OrientDB, TinkerGraph (Apache TinkerPop)
• Suitable Scenarios:
o Social network analysis
o Knowledge graphs
o Recommendation systems
o Supply chain management
iii) Document-oriented Database:
• Data Model: Stores data as self-contained documents, which are collections of
key-value pairs that represent a complete entity or data object. Documents can
also contain nested documents and arrays.
• Examples: MongoDB, CouchDB, Elasticsearch
• Suitable Scenarios:
o Storing unstructured or semi-structured data
o Content management systems
o E-commerce applications
o User profile management
15) Distributed Databases
Distributed databases are databases that spread data across multiple servers or
nodes, enabling horizontal scalability and high availability. They are designed to
handle large datasets and maintain performance as the data volume grows.
System Architecture of Distributed Transactions:
Distributed transactions involve multiple participants, often across different nodes
or servers. To ensure data consistency and integrity, distributed transactions employ
protocols and mechanisms like:
• Two-Phase Commit (2PC): A coordination protocol that ensures all
participants agree on the transaction's outcome before committing or aborting
it.
• Distributed Lock Manager (DLM): Coordinates access to shared resources and
prevents conflicts during concurrent transactions.
• Data Replication: Replicates data across multiple nodes to ensure availability
and fault tolerance.
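Two-phase commit can be sketched as a coordinator function and a list of participants. This is a minimal model that ignores timeouts, crashes, and recovery logs; the `Participant` class and the participant names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): commit only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


ok_group = [Participant("orders"), Participant("inventory")]
ok_outcome = two_phase_commit(ok_group)

failing_group = [Participant("orders"), Participant("payment", can_commit=False)]
failed_outcome = two_phase_commit(failing_group)
```

A single "no" vote aborts the transaction on every participant, which is how the protocol keeps all nodes in agreement about the outcome.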
Benefits of Distributed Databases:
• Horizontal Scalability: Easily add more nodes to handle increasing data
volumes and user requests.
• High Availability: Reduce downtime and maintain data availability by
replicating data and tolerating node failures.
• Improved Performance: Distribute workloads across multiple nodes to
improve response times and throughput.
Challenges of Distributed Databases:
• Data Consistency: Maintaining data consistency across multiple nodes can be
complex and require synchronization mechanisms.
• Transaction Management: Managing distributed transactions across multiple
participants adds complexity to the system.
• Complexity of Deployment and Management: Managing a distributed system
with multiple nodes requires careful planning and configuration.
