
NoSQL Technologies Notes Unit 1

This document, titled "NoSQL Technologies", explores foundational and advanced concepts related to NoSQL databases, emphasizing their advantages, types, architectures, and practical applications.

Uploaded by

Shraddha Mayekar

M.Sc. Computer Science, Sem 1

NoSQL Technologies (Unit I)
Syllabus:
Unit 1: Introduction to NoSQL and Interfacing with NoSQL Data Stores Basics
Introduction to NoSQL: Characteristics of NoSQL, NoSQL Storage types, Advantages and Drawbacks,
NoSQL Products
Interfacing and interacting with NoSQL: Storing Data in and Accessing Data from MongoDB, Redis,
HBase and Apache Cassandra, Language Bindings for NoSQL Data Stores
Understanding the storage architecture: Working with Column-Oriented Databases, HBase Distributed
Storage Architecture, Document Store Internals, Understanding Key/Value Stores in Memcached and
Redis, Eventually Consistent Non-relational Databases
Performing CRUD operations: Creating Records, Accessing Data, Updating and Deleting Data

Introduction to NoSQL and Interfacing with NoSQL Data Stores Basics

Introduction to NoSQL:

Characteristics of NoSQL:
NoSQL (Not Only SQL) refers to a broad class of database management systems that differ
from traditional relational database management systems (RDBMS). NoSQL databases are designed
to handle and store large volumes of unstructured or semi-structured data, providing a more flexible
and scalable approach compared to traditional relational databases.

Schema-less Design:
NoSQL databases are typically schema-less or schema-flexible, allowing for dynamic and evolving
data structures. This is in contrast to traditional relational databases that enforce a fixed schema.
Scalability:
NoSQL databases are designed to scale horizontally, allowing them to handle large amounts of data
and high transaction volumes by adding more servers to the database cluster.
High Performance:
NoSQL databases often provide high performance for certain types of queries, especially for read
and write operations on large datasets, due to their optimized storage and retrieval mechanisms.
Distributed Architecture:
Many NoSQL databases are designed with distributed architectures that support distributed data
storage and processing across multiple nodes. This is essential for achieving high availability and
fault tolerance.
Flexible Data Models:
NoSQL databases support various data models, including document-oriented (e.g., MongoDB), key-
value pairs (e.g., Redis), wide-column store (e.g., Apache Cassandra), and graph databases (e.g.,
Neo4j). This flexibility allows developers to choose the most suitable model for their specific
application requirements.
CAP Theorem:
NoSQL databases are often designed with the CAP (Consistency, Availability, Partition Tolerance)
theorem in mind. According to CAP, a distributed system can guarantee at most two of the three
properties at the same time. Because network partitions cannot be ruled out in practice, the real
choice during a partition is between consistency and availability. NoSQL databases vary in how they
prioritize and implement these properties based on their use cases.
Horizontal Partitioning:
NoSQL databases often use horizontal partitioning to distribute data across multiple nodes. This
sharding technique helps in efficiently managing and accessing large datasets.
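
As a rough illustration of the horizontal partitioning described above, the sketch below assigns keys to shards by hashing them. The function names, the key format, and the choice of SHA-256 are assumptions made for this example; real NoSQL systems ship their own partitioners.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then wrap it onto the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard they hash to."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical keys spread over a three-node cluster.
placement = partition([f"user:{i}" for i in range(10)], 3)
```

With simple modulo hashing, changing the number of shards remaps most keys, which is one reason production systems tend to prefer consistent hashing or range-based schemes.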

Developed for Big Data:
NoSQL databases are well-suited for handling large volumes of unstructured or semi-structured data,
making them a popular choice in the context of big data analytics and real-time processing.
Agile Development:
NoSQL databases are conducive to agile development practices, as they allow for quick iteration and
adaptation to changing application requirements without the need for extensive database schema
changes.
Open Source and Community Support:
Many NoSQL databases are open source, fostering a strong community of developers and
contributors. This can be advantageous for learning, troubleshooting, and adopting these databases in
various projects.

NoSQL Storage types:

NoSQL databases are a standard topic in database management courses. They are designed to handle
large volumes of unstructured or semi-structured data and to provide more flexibility and scalability
than traditional relational databases. There are several types of NoSQL databases, each with its own
strengths and use cases. Here are some common types:
Document Stores:
Examples: MongoDB, CouchDB
Description: Document stores store data in the form of documents, typically in JSON or BSON
format. Each document contains key-value pairs, and collections of documents can be grouped
together.
Key-Value Stores:
Examples: Redis, DynamoDB
Description: Key-value stores are simple databases that store data as key-value pairs. These
databases are efficient for simple retrieval and storage of data.
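Column-family Stores:
Examples: Apache Cassandra, HBase
Description: Column-family stores group related columns into column families and store each row's
data by column family rather than as one fixed row. Rows may contain different columns, which makes
these stores suitable for handling large amounts of sparse data with high write and read throughput.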
Graph Databases:
Examples: Neo4j, OrientDB


Description: Graph databases are designed to store and process data in the form of graphs, which
consist of nodes and edges. They are especially useful for managing relationships between different
pieces of data.
Object-oriented Databases:
Examples: db4o, ObjectDB
Description: Object-oriented databases store data in the form of objects, similar to the objects used in
object-oriented programming. This type of database is well-suited for applications with complex data
models.
Multi-model Databases:
Examples: ArangoDB, OrientDB
Description: Multi-model databases support multiple data models within a single database system.
This allows users to choose the most appropriate model for their specific use case.
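
To make the differences between these storage types concrete, the sketch below shapes the same made-up user record for four of the models using plain Python structures. All names and values are hypothetical, and the structures only mimic the shape of the data, not any product's storage format.

```python
import json

# Document store: one self-contained, nested document.
document = {
    "_id": "u1",
    "name": "Asha",
    "emails": ["asha@example.com"],
    "address": {"city": "Mumbai"},
}

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:u1": json.dumps({"name": "Asha", "city": "Mumbai"})}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "u1": {
        "profile": {"name": "Asha", "city": "Mumbai"},
        "contact": {"email": "asha@example.com"},
    }
}

# Graph database: nodes plus edges that carry the relationships.
graph_nodes = {
    "u1": {"label": "Person", "name": "Asha"},
    "c1": {"label": "City", "name": "Mumbai"},
}
graph_edges = [("u1", "LIVES_IN", "c1")]
```

The key-value form can only be fetched whole by its key, whereas the document and column-family forms expose individual fields to the database, and the graph form makes the relationship itself a first-class item.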

Advantages and Drawbacks of NoSQL:

Advantages:
Schema Flexibility:
Advantage: NoSQL databases are schema-less or schema-flexible, allowing for dynamic changes in
data structure without the need for a predefined schema.
Example: MongoDB allows for flexible document structures, accommodating changes in data
requirements without affecting the entire database.
Scalability:
Advantage: NoSQL databases are generally more scalable, both horizontally and vertically,
making them suitable for handling large amounts of data and high traffic.
Example: Cassandra is designed for distributed and horizontally scalable architectures, enabling
seamless scaling across multiple nodes.
Performance:
Advantage: NoSQL databases can provide better performance for certain types of queries and
operations, especially when dealing with large datasets.
Example: Redis, a key-value store, offers high-performance data retrieval and storage for frequently
accessed data.
Support for Unstructured Data:
Advantage: NoSQL databases can handle unstructured and semi-structured data types, making them
suitable for scenarios where data formats are diverse.
Example: Couchbase supports JSON documents, allowing for storage of complex and nested data
structures.
High Availability and Fault Tolerance:
Advantage: Many NoSQL databases are designed with built-in mechanisms for high availability and
fault tolerance, ensuring data availability even in the case of node failures.
Example: Apache Cassandra's decentralized architecture provides fault tolerance by distributing data
across multiple nodes.

Drawbacks:

Lack of Standardization:
Drawback: NoSQL databases lack a standardized query language like SQL, which can make it
challenging for developers and analysts to work with the data.
Example: MongoDB uses its own query language, and each NoSQL database may have its own set of
query mechanisms.
Consistency and ACID Compliance:
Drawback: Some NoSQL databases sacrifice strong consistency and ACID properties in favor of
performance and scalability, which may not be suitable for all use cases.
Example: Cassandra, by default, offers eventual consistency, which means that different nodes may
have slightly different views of the data for a certain period. Its consistency level can, however,
be tuned per operation.
Learning Curve:
Drawback: Developers familiar with traditional relational databases may face a learning curve when
transitioning to NoSQL databases, as they operate on different principles.
Example: Developers accustomed to SQL may need to adapt to the query mechanisms of a
document-oriented database like CouchDB.
Limited Query Capabilities:
Drawback: NoSQL databases may lack the expressive power of SQL for complex querying and
reporting, particularly when dealing with relationships between entities.
Example: Redis, being a key-value store, has limited querying capabilities compared to relational
databases.
Community and Tooling Maturity:
Drawback: Some NoSQL databases have smaller communities and less mature tooling compared to
established relational databases, which can impact support and development resources.
Example: Compared to relational databases like MySQL, certain NoSQL databases may have fewer
third-party tools and integrations.

NoSQL Products Interfacing and interacting with NoSQL:

Storing Data in and Accessing Data from MongoDB –

MongoDB is a popular NoSQL database that stores data as flexible, JSON-like documents, which it
encodes in a binary format called BSON (Binary JSON). It is schema-less, which means you can store
data without a predefined structure, allowing for more flexibility and scalability.

Storing data in MongoDB

Database Creation:
1)MongoDB stores data in databases, each identified by a unique name.
2)A single MongoDB server can host multiple databases.
You select a database with the use command. MongoDB creates the database when you first store data
in it.
use mydatabase
Collections:
1)Each database contains collections, which are analogous to tables in relational databases.
2)Collections are where you store your documents (data records).
You can create a collection explicitly. MongoDB also creates a collection implicitly the first time
you insert a document into it.
db.createCollection("mycollection")
Inserting Data:
You can insert a document into a collection using the insertOne command. The older insert command
is deprecated in current shells in favor of insertOne and insertMany.
db.mycollection.insertOne({ key: "value", another_key: "another_value" })
db.mycollection.insert({ key: "value", another_key: "another_value" })

Accessing Data from MongoDB:

Querying:
You can query MongoDB using the find method, which takes a filter document describing the criteria.
MongoDB supports a rich set of query operators for filtering and retrieving data.
For example, to find all documents in a collection:
db.mycollection.find()
To find documents that match a specific condition:
db.mycollection.find({ key: "value" })
Updating Data:
To update data, you can use the updateOne method (the older update method is deprecated). For
example, to update the first matching document:
db.mycollection.updateOne({ key: "value" }, { $set: { key: "new_value" } })
To update multiple documents:
db.mycollection.updateMany({ key: "value" }, { $set: { key: "new_value" } })
Deleting Data:
To delete documents, you can use the deleteOne method (the older remove method is deprecated). For
example, to delete the first matching document:
db.mycollection.deleteOne({ key: "value" })
To delete all documents that match a condition:
db.mycollection.deleteMany({ key: "value" })
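
The semantics of the MongoDB commands above can be sketched with a small in-memory model. This is not MongoDB or one of its drivers: ToyCollection and matches are hypothetical names, only exact-match filters and the $set operator are handled, and the method names merely echo the shell commands for readability.

```python
def matches(doc, query):
    """True if every field in the filter equals the document's field."""
    return all(doc.get(field) == value for field, value in query.items())


class ToyCollection:
    """Toy stand-in for a collection: a list of dict documents."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(dict(doc))

    def find(self, query=None):
        # An empty filter matches every document.
        return [doc for doc in self.docs if matches(doc, query or {})]

    def update_one(self, query, update):
        # Apply $set to the first matching document only.
        for doc in self.docs:
            if matches(doc, query):
                doc.update(update.get("$set", {}))
                return 1
        return 0

    def delete_many(self, query):
        # Remove every matching document and report how many were removed.
        before = len(self.docs)
        self.docs = [doc for doc in self.docs if not matches(doc, query)]
        return before - len(self.docs)
```

The point of the model is the difference between the "one" and "many" variants: update_one stops at the first match, while delete_many acts on every document that satisfies the filter.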

Storing Data in and Accessing Data from Redis-

1. Introduction to Redis:
Redis (Remote Dictionary Server) is an open-source, in-memory data structure store that can be used
as a database, cache, and message broker.
It supports various data structures such as strings, hashes, lists, sets, and more.
Redis is known for its high performance and simplicity.
2. Basic Data Structures in Redis:
Strings: Simple key-value pairs where the key is a string and the value is a string.
Hashes: Key-value pairs where the key is a string and the value is another set of key-value pairs.
Lists: Ordered collections of strings.
Sets: Unordered collections of unique strings.
Sorted Sets: Similar to sets, but each element is associated with a score, allowing for ordering.

3)Starting Redis Server:
Once installed, start the Redis server. You can run it in the background or foreground depending on
your preference:
redis-server
4)Connecting to Redis:
You can connect to Redis using a Redis client library in your programming language of choice.
Popular libraries exist for languages like Python, Java, Node.js, etc.

5)Data Types and Commands:
Redis supports various data types, and each type has associated commands. Here are some examples:
Strings:
SET key value
GET key

Hashes:
HSET key field value
HGET key field

Lists:
LPUSH key value
LRANGE key start stop

Sets:
SADD key member
SMEMBERS key

Sorted Sets:
ZADD key score member
ZRANGE key start stop
6)Error Handling:
Make sure to handle errors appropriately, especially when dealing with network-related issues or
unexpected responses from the Redis server.
7)Persistence and Configuration:
Redis supports both in-memory storage and persistence to disk. You can configure Redis to
periodically save data to disk.
8)Security:
For production environments, consider securing your Redis instance, such as by setting a password
and restricting access.
9)Scaling:
Redis can be used in a clustered setup for horizontal scaling to handle larger amounts of data and
traffic.
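
The behaviour of a few of the Redis commands listed above can be sketched with ordinary Python containers. ToyRedis is a hypothetical in-memory model, not a Redis client; it only mirrors how LPUSH prepends, how LRANGE and ZRANGE treat the stop index as inclusive, and how sorted sets order members by score.

```python
from collections import deque


def _inclusive_slice(items, start, stop):
    """Redis-style range: stop is inclusive, negative indexes count from the end."""
    size = len(items)
    if start < 0:
        start = max(size + start, 0)
    if stop < 0:
        stop = size + stop
    if stop < 0:
        return []
    return items[start:stop + 1]


class ToyRedis:
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def hset(self, key, field, value):
        self.data.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.data.get(key, {}).get(field)

    def lpush(self, key, value):
        # LPUSH adds to the head of the list.
        self.data.setdefault(key, deque()).appendleft(value)

    def lrange(self, key, start, stop):
        return _inclusive_slice(list(self.data.get(key, [])), start, stop)

    def zadd(self, key, score, member):
        self.data.setdefault(key, {})[member] = score

    def zrange(self, key, start, stop):
        scores = self.data.get(key, {})
        # Order by score, breaking ties by member name.
        ordered = sorted(scores, key=lambda member: (scores[member], member))
        return _inclusive_slice(ordered, start, stop)
```

A range of 0 to -1 returns the whole list or sorted set, which is the usual way to read every element.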

Storing Data in and Accessing Data from HBase-

HBase is a distributed, scalable NoSQL database that is built on top of the Hadoop Distributed
File System (HDFS). It is designed to handle large amounts of sparse data and is well-suited for
applications that require random, real-time read and write access to Big Data.
Table Structure:
HBase organizes data into tables, which consist of rows and columns.
Each table has a primary key (row key), and data is stored sorted by this key.
Tables can have multiple column families, and each column family can have multiple columns.
Column Families and Columns:
Data is grouped into column families, and each column family has a set of columns.
Column families must be defined when the table is created. They can be altered later, but this is a
schema change, so they are best planned up front.
Columns don't need to be predefined and can be added dynamically.
HBase Shell:
HBase provides a shell that allows you to interact with the database using commands. HBase uses the
Hadoop file system to store its data and runs a master server and region servers. Each table is split
into regions, and these regions are distributed across the region servers.
You can create tables, insert data, and perform various operations using the shell.

APIs:
API stands for Application Programming Interface. In the context of APIs, the word Application
refers to any software with a distinct function. Interface can be thought of as a contract of service
between two applications. This contract defines how the two communicate with each other using
requests and responses.
HBase supports APIs for different programming languages, such as Java, Python, and others.
Using these APIs, you can interact with HBase programmatically in your applications.

1)Java API:
HBase provides a Java API that allows you to interact with HBase programmatically. You can use
this API to perform operations such as creating tables, putting data, scanning, and getting data.
2)HBase Shell:
The HBase shell is a command-line interface that allows you to interact with HBase using simple
commands. It's a quick way to perform basic operations.
3)HBase REST API:
HBase also provides a RESTful API, which allows you to interact with HBase using HTTP methods.
This can be useful for integrating HBase with applications that use web services.

Accessing Data from HBase:


1)Get Data by Row Key:
Retrieve data from HBase using the get command. Provide the table name and row key.

2)Scan Data:


Use the scan command to retrieve multiple rows of data from HBase. You can specify start and stop
row keys, column families, column qualifiers, and other parameters.

3)Delete Data:
Use the delete command to remove data from HBase.
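
The table structure and the get, scan, and delete operations described above can be modelled as a sorted, sparse map. ToyHBaseTable is a hypothetical in-memory sketch, not the HBase API: it ignores cell versions and timestamps, and it assumes the common convention that a scan includes the start row and excludes the stop row.

```python
class ToyHBaseTable:
    """Row key -> {"family:qualifier": value}, with fixed column families."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are not predefined, so any column in a known family is accepted.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_row="", stop_row=None):
        # Rows come back in row-key order; stop_row itself is excluded.
        for key in sorted(self.rows):
            if key < start_row:
                continue
            if stop_row is not None and key >= stop_row:
                break
            yield key, dict(self.rows[key])

    def delete(self, row_key, column):
        cells = self.rows.get(row_key, {})
        cells.pop(column, None)
        if not cells:
            self.rows.pop(row_key, None)
```

Because rows are kept in key order, a well-chosen row key turns a range query into a short contiguous scan.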

Storing Data in and Accessing Data from Apache Cassandra-

Apache Cassandra is an open-source, distributed NoSQL database system designed to handle large
amounts of structured data across many commodity servers, which may be spread across data centers
around the world. It provides a highly available service with no single point of failure. It was
initially developed by Facebook and later open-sourced as Apache Cassandra.
Some key points about Apache Cassandra:
-It is scalable and fault-tolerant, with tunable consistency.
-It is a column-oriented (wide-column) database.
1)CQL (Cassandra Query Language):
CQL is the query language used to interact with Cassandra. It is similar to SQL but has some
differences to accommodate the NoSQL nature of Cassandra.
2)Consistency Levels:
Cassandra allows you to choose the consistency level for each read and write operation.
Consistency levels determine how many nodes must respond to consider the operation successful.
3)Secondary Indexes:
While Cassandra is designed for efficient querying based on the primary key, it also supports
secondary indexes for non-primary key columns.
However, using secondary indexes can impact performance and should be used judiciously.
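
The effect of consistency levels can be checked with a little arithmetic. In the sketch below, the function names are made up for illustration; the rule it encodes is the standard one that a quorum is a majority of the replicas, and that a read is guaranteed to reach at least one replica holding the latest write when the number of replicas read plus the number written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """R + W > RF means every read set overlaps every write set."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum reads and quorum writes (2 + 2 > 3) overlap, whereas reading and writing a single replica (1 + 1) does not, which is the eventually consistent case.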

Storing Data:

Start Cassandra:
Start the Cassandra server on each node in your cluster, using the appropriate command for your
operating system.

Access Cassandra Shell:
Connect to the Cassandra cluster using the Cassandra Query Language (CQL) shell. You can use the
'cqlsh' command to interact with the cluster.
Create a Keyspace:
A keyspace in Cassandra is a container for tables. Create a keyspace using the CQL CREATE KEYSPACE
statement.

Create a Table:
Define a table within your keyspace using the CREATE TABLE statement. Specify the primary key and
other columns.

Insert Data:
Insert data into your table using the CQL INSERT statement.

Accessing Data:
Query Data:
Use the CQL SELECT statement to query data from your table.

Update Data:
Update existing data using the CQL UPDATE statement.

Delete Data:
Delete data using the CQL DELETE statement.

Advanced Queries:
Cassandra supports various query operations, including secondary indexes, filtering, and
aggregations. Explore the CQL documentation for more advanced query options.
Driver API:
To interact with Cassandra from your application code, use one of the available Cassandra drivers.
Drivers are available for multiple programming languages, such as Java, Python, Node.js, etc.

Language Bindings for NoSQL Data Stores-


Relational databases store the vast majority of web application persistent data. However, there are
several alternative classifications of storage representations.
1)Key-value pair
2)Document-oriented
3)Column-family table
4)Graph
These persistent data storage representations are commonly used to augment, rather than completely
replace, relational databases. The underlying persistence type used by the NoSQL database often
gives it different performance characteristics than a relational database, with better results on some
types of read/writes and worse performance on others.
1)Key-value Pair
Key-value pair data stores are based on hash map data structures.
Key-value pair data stores
Redis is an open source in-memory key-value pair data store. It can be used for caching, queuing,
and storing session data for faster access than a traditional relational database, among many other use
cases.
Memcached is another widely used in-memory key-value pair storage system.

2)Document-oriented
A document-oriented database provides a semi-structured representation for nested data.
Document-oriented data stores
MongoDB is an open source document-oriented data store with a Binary JSON (BSON) storage format
that is JSON-style and familiar to web developers. PyMongo is a commonly used client for interfacing
with one or more MongoDB instances through Python code. MongoEngine is a Python object-document
mapper (the document-database counterpart of an ORM) written specifically for MongoDB and built on
top of PyMongo.
Riak is an open source distributed data store focused on availability, fault tolerance and large scale
deployments. It is primarily a key-value store, although it is often listed alongside document stores.
Apache CouchDB is also an open source project where the focus is on embracing RESTful-style
HTTP access for working with stored JSON data.

3)Column-family table
A column-family table class of NoSQL data stores builds on the key-value pair type. Each key-value
pair is considered a row in the store while the column family is similar to a table in the relational
database model.
Column-family table data stores
Apache Cassandra
Apache Cassandra is an open-source, distributed NoSQL database system designed to handle
large amounts of data across many commodity servers, providing high availability with no single
point of failure. It was initially developed by Facebook and later open-sourced as Apache Cassandra.
Apache HBase
Apache HBase is a NoSQL, column-oriented database
that is built on top of the Hadoop ecosystem. It is designed to provide low-latency, high-throughput
access to large-scale, distributed datasets.

4)Graph
A graph database represents and stores data in three aspects: nodes, edges and properties.
A node is an entity, such as a person or business.
An edge is the relationship between two entities. For example, an edge could represent that a node
for a person entity is an employee of a business entity.
A property represents information about nodes. For example, an entity representing a person could
have a property of "female" or "male".
Graph data stores
Neo4j is one of the most widely used graph databases and runs on the Java Virtual Machine stack.
Neo4j drivers are available for several languages and enable developers to work with its graph
database capabilities in their preferred programming language.
Cayley is an open source graph data store that originated at Google and is written primarily in Go.
Titan is a distributed graph database built for multi-node clusters.
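
The node, edge, and property vocabulary from the graph description above can be sketched as a tiny in-memory property graph. The identifiers and the EMPLOYEE_OF relationship name are hypothetical, and the two helper functions stand in for the traversals a real graph database would run through its query language.

```python
# Nodes are entities; each carries a dictionary of properties.
nodes = {
    "alice": {"type": "person", "gender": "female"},
    "bob": {"type": "person", "gender": "male"},
    "acme": {"type": "business"},
}

# Edges are (source, relationship, target) triples.
edges = [
    ("alice", "EMPLOYEE_OF", "acme"),
    ("bob", "EMPLOYEE_OF", "acme"),
]


def outgoing(node_id, relationship):
    """Targets reachable from node_id over the given relationship."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]


def incoming(node_id, relationship):
    """Sources that point at node_id over the given relationship."""
    return [src for src, rel, dst in edges if dst == node_id and rel == relationship]
```

Here incoming("acme", "EMPLOYEE_OF") answers "who works for this business?" by following edges rather than joining tables.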

Understanding the storage architecture:

Working with Column-Oriented Databases

Column-oriented databases are a type of database management system (DBMS) that store and
retrieve data in a column-wise fashion, as opposed to row-wise storage used in traditional relational
databases. This design choice offers certain advantages, especially for analytical and reporting
workloads. Here are some key aspects and concepts related to column-oriented databases:
1)Column-Oriented vs. Row-Oriented Databases: Understand the fundamental difference between
column-oriented and row-oriented databases. In row-oriented databases, data is stored in rows,
making it efficient for transactional processing. In column-oriented databases, data is stored in
columns, which is beneficial for analytical queries and aggregations.

2)Advantages of Column-Oriented Databases:


Explore the advantages of using column-oriented databases, such as improved query performance for
analytical workloads, better compression rates, and reduced I/O requirements for certain types of
queries.
3)Data Warehousing and Analytics:
Recognize that column-oriented databases are often used in data warehousing environments where
large volumes of data need to be analyzed for business intelligence and reporting purposes.
Familiarize yourself with the role of these databases in supporting analytics.
4)Popular Column-Oriented Databases:
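Learn about popular column-oriented database systems such as Apache Cassandra, Apache HBase,
Google Bigtable, and ClickHouse. Note that Cassandra, HBase, and Bigtable are wide-column
(column-family) stores, whereas ClickHouse is a columnar analytical database. Understand their
features, use cases, and how they differ from one another.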
5)Query Optimization:
Gain knowledge of query optimization techniques specific to column-oriented databases. Understand
how indexing, compression, and other optimization strategies can enhance query performance.
6)Data Modeling:
Explore data modeling considerations for column-oriented databases. Understand how to design
schemas that align with the strengths of these databases and support efficient query processing.
7)Distributed Systems:
Many column-oriented databases are designed to operate in distributed environments. Study the
principles of distributed systems and how they apply to the scalability and fault tolerance of column-
oriented databases.
8)Integration with Programming Languages:
Understand how to interact with column-oriented databases using programming languages. Many of
these databases provide APIs for languages like Java, Python, and others.
9)Security and Compliance:
Explore security measures and compliance considerations when working with column-oriented
databases, especially in scenarios involving sensitive or regulated data.
10)Research and Current Trends:
Stay updated on the latest research and trends in column-oriented databases. As the field of database
management evolves, new technologies and approaches may emerge.
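
The row-oriented versus column-oriented distinction from point 1, and the compression benefit from point 2, can be illustrated with plain Python lists. The sales data and the run_length_encode helper are made up for this sketch; real column stores use more elaborate encodings.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "west", "amount": 120},
    {"id": 2, "region": "east", "amount": 80},
    {"id": 3, "region": "west", "amount": 50},
]

# Column-oriented layout: each column's values are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field has to visit every record in the row layout...
total_from_rows = sum(row["amount"] for row in rows)
# ...but only needs a single column in the column layout.
total_from_columns = sum(columns["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


# Values in one column share a type and often repeat, so they compress well.
encoded_regions = run_length_encode(sorted(columns["region"]))
```

Both totals are equal; the difference in a real system is how much data must be read from disk to compute them.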

HBase Distributed Storage Architecture-


HBase is a distributed, scalable NoSQL database that is designed to provide random access and
strong consistency for large amounts of sparse data. It is built on top of the Hadoop
Distributed File System (HDFS) and is part of the Apache Hadoop ecosystem. Here is an overview of
the key components and architecture of HBase:
HBase Cluster:
HBase is designed to operate on a cluster of machines to achieve scalability and fault tolerance. The
cluster typically consists of multiple nodes (servers), and each node contributes to the storage and
processing capabilities of the HBase system.
HMaster:
The HMaster server is a master node in the HBase cluster that manages the metadata and coordinates
the overall operation of the HBase system. It keeps track of region servers, assigns regions to region
servers, and handles administrative tasks.
Region Servers:
Region servers are responsible for serving data for a set of regions. A region is a contiguous range
of a table's rows, sorted by row key, and each region is served by one region server at a time. A
table's regions are spread across the region servers, and each region server manages multiple
regions, so the load is distributed and processed in parallel.
Zookeeper:
Apache ZooKeeper is used by HBase for distributed coordination and synchronization. It helps in
managing distributed systems by providing a distributed configuration service, synchronization, and
naming registry. HBase uses ZooKeeper to elect a master (HMaster) and for distributed
synchronization between the HMaster and region servers.
HFile:
HBase stores data in a distributed and fault-tolerant manner using HFiles. HFiles are sorted and
indexed data blocks that contain key-value pairs. Each column family in an HBase table has a
separate set of HFiles on the Hadoop Distributed File System (HDFS).
HLog (Write-Ahead Log):
HBase uses a Write-Ahead Log (HLog) to store changes to data before they are written to the HFiles.
The HLog helps in recovery from failures by allowing the system to replay operations that were not
yet persisted to the HFiles.
MemStore:
When data is written to an HBase table, it is recorded in the HLog and then placed in an in-memory
data structure called the MemStore. The MemStore is a sorted write buffer (not a log) that holds the
most recent updates to the data. When the MemStore reaches a certain threshold, its contents are
flushed to a new HFile on HDFS.
Hadoop Distributed File System (HDFS):
HBase relies on HDFS for distributed storage. HBase tables are divided into regions, and each
region's data is stored as one or more HFiles per column family on HDFS. HDFS provides fault
tolerance and scalability, making it suitable for large-scale storage needs.
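
The interaction between the HLog, the MemStore, and HFiles can be sketched as a toy write path. ToyRegionStore and its flush_threshold parameter are hypothetical, everything lives in memory, and details such as compaction, timestamps, and log truncation are left out.

```python
class ToyRegionStore:
    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the HLog: append-only record of writes
        self.memstore = {}   # in-memory buffer of the most recent writes
        self.hfiles = []     # immutable, sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # 1. Log the change first so it can be replayed after a crash.
        self.wal.append((key, value))
        # 2. Apply it to the in-memory buffer.
        self.memstore[key] = value
        # 3. Flush to a new sorted file once the buffer is full.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # The newest data wins: check the buffer, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, value in hfile:
                if stored_key == key:
                    return value
        return None
```

Writes stay fast because they only append to the log and update memory, while reads may have to consult several files, which is why real systems periodically merge them.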
Advantages of HBase-
-Can store large data sets
-Database can be shared
-Cost-effective from gigabytes to petabytes
-High availability through failover and replication

Disadvantages of HBase-
-No support for SQL structure
-No transaction support
-Sorted only on key
-Memory issues on the cluster

Document Store Internals-


A document store, in the context of databases, is a type of NoSQL database that stores, retrieves, and
manages document-oriented information. Instead of organizing data in tables with rows and columns
like relational databases, document stores store data in flexible, semi-structured formats, typically
using formats like JSON or BSON (Binary JSON).
Data Model:
Document Format: Document stores store data as documents, which are typically JSON or BSON
objects. These documents can contain nested structures and arrays, providing flexibility in data
representation.
Schema-less Design: Document stores are schema-less, meaning that each document in the
database can have a different structure. This flexibility is useful for evolving data models over
time.
Storage and Indexing:
BSON/JSON Storage: Documents are typically stored as JSON text or in a binary encoding such as
BSON (Binary JSON). BSON is designed for fast traversal and supports additional data types,
although it is not always smaller than the equivalent JSON.
Indexing: Document stores use various indexing mechanisms to speed up query performance.
Common indexing techniques include B-tree indexes, hash indexes, and full-text indexes.
Query Language:
Querying Documents: Document stores often provide a query language for retrieving documents.
Queries are usually expressed using a syntax similar to the document format (e.g., JSON-like
queries).
Map-Reduce: Some document stores support map-reduce operations for complex data
transformations.
Concurrency Control:
Concurrency Handling: Document stores implement concurrency control mechanisms to handle
multiple concurrent read and write operations. This may involve techniques like multi-version
concurrency control (MVCC) to ensure consistency.
Replication and Sharding:
Replication: Document stores often support replication for high availability and fault tolerance.
Changes made to one node are replicated to other nodes in the cluster.
Sharding: Sharding involves horizontally partitioning data across multiple nodes to distribute the
load. Document stores may use techniques like range-based or hash-based sharding.
Transaction Support:
ACID Properties: While traditional relational databases follow
ACID properties (Atomicity, Consistency, Isolation, Durability), some document stores relax these
constraints for better scalability and performance. However, there is a trend towards providing
stronger consistency guarantees.
Compression and Compaction:
Compression: Document stores often employ data compression techniques to reduce storage space
and improve I/O performance.
Compaction: Periodic compaction processes may be used to reclaim space and optimize storage.
Security:


Authentication and Authorization: Document stores provide mechanisms for user authentication and
authorization to control access to the database.
Encryption: Encryption may be used to secure data in transit and at rest.
Distributed Systems Concepts:
CAP Theorem: Document stores need to consider the CAP theorem (Consistency, Availability,
Partition Tolerance) when designing distributed systems.
Eventual Consistency: Many document stores aim for eventual consistency in distributed scenarios.
Optimizations and Caching:
Query Optimization: Document stores implement query optimization techniques to improve query
performance.
Caching: Caching mechanisms, both at the application level and within the database,
can be employed to reduce latency.
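
The indexing idea mentioned under Storage and Indexing can be sketched as a hash index over one field of a document collection. The documents, field names, and helper functions are hypothetical; the sketch only shows why an index avoids scanning every document, not how any particular document store builds its indexes.

```python
from collections import defaultdict

# A tiny collection keyed by document id; documents need not share a structure.
documents = {
    1: {"name": "Asha", "city": "Mumbai"},
    2: {"name": "Ravi", "city": "Pune"},
    3: {"name": "Meera", "city": "Mumbai", "tags": ["admin"]},
    4: {"name": "Noor"},
}


def build_index(docs, field):
    """Map each value of the field to the ids of the documents that hold it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def find_by(docs, index, value):
    """Look up matching documents through the index instead of scanning."""
    return [docs[doc_id] for doc_id in sorted(index.get(value, ()))]


city_index = build_index(documents, "city")
```

Documents without the indexed field, such as the fourth one here, simply do not appear in the index.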

Understanding Key/Value Stores in Memcached and Redis-

Memcached and Redis are both popular in-memory key/value stores used for caching and data
storage. They are often employed in web applications to improve performance by reducing the time it
takes to fetch data from a traditional database.

Memcached:
1.Key/Value Store:
Memcached is a distributed memory caching system that stores data in the form of key/value pairs.
Keys are unique identifiers for the stored data.
Values are the actual data associated with the keys.
2.Distributed Caching:
Memcached can be distributed across multiple nodes, allowing for horizontal scaling and
improved performance.
The servers do not communicate with each other; the client library hashes each key to choose a
server, so each node in the Memcached cluster is responsible for a subset of the keys.
3.Memory-Based Storage:
Memcached is entirely memory-based, meaning that data is stored in RAM for fast access.
This makes it suitable for caching frequently accessed data to reduce database load.
4.Data Expiration:
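Each key in Memcached can have a specified expiration time. After this time elapses, the key/value
pair is no longer returned and its memory becomes available for reuse. Items may also be evicted
earlier (least recently used first) when the cache is full.
5.Simple Data Model: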
Memcached provides a simple data model, supporting basic data types like strings and binary data.
It does not have advanced features like data structures or persistence.
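The key/value model with per-key expiration described above can be sketched with a small in-memory class. This is a toy model for study purposes, not the Memcached protocol or client API; the class name `TTLCache` is hypothetical, and the sketch assumes expired entries are dropped lazily when they are next read.

```python
import time


class TTLCache:
    """Toy in-memory key/value cache with optional per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        """Store a value; ttl is the lifetime in seconds (None = no expiry)."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiration on access
            return None
        return value


cache = TTLCache()
cache.set("session:42", b"user-data", ttl=30)  # expires after 30 seconds
cache.set("config", b"static")                 # never expires
```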

Redis:
1.Key/Value Store:
Redis is often referred to as a data structure server rather than just a key/value store because it
supports a rich set of data types beyond simple key/value pairs.
Redis supports strings, lists, sets, hashes, and more as values associated with keys.
2.Persistence:
Unlike Memcached, Redis has the capability to persist data to disk, providing durability.
It can be configured to snapshot data at specified intervals or to append changes to a log file.
3.Advanced Data Structures:
Redis offers a variety of data structures like lists, sets, sorted sets, and hashes, making it more
versatile for various use cases.
This allows for complex operations on the data directly within the Redis server.
4.Atomic Operations:
Redis supports atomic operations on these complex data structures, enabling powerful and efficient
operations.
5.Single-Threaded:
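Redis executes commands on a single thread, meaning that each operation is executed sequentially.
This makes every command atomic without locking and avoids context-switching overhead, but a slow
command delays all the commands queued behind it.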
6.Replication and Clustering:
Redis supports master-slave replication for fault tolerance and scaling.
Redis Cluster is a more advanced feature that allows automatic partitioning of data across multiple
nodes.
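The "append changes to a log file" style of persistence mentioned under point 2 can be sketched as a store that logs every write and replays the log on start-up. This is only an illustration of the idea under simplifying assumptions (one JSON object per line, an fsync after every write); it is not Redis's actual file format, and `AppendOnlyStore` is a hypothetical name.

```python
import json
import os


class AppendOnlyStore:
    """Toy key/value store that persists every change to an append-only log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state by re-applying the logged operations."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as log:
            for line in log:
                op = json.loads(line)
                if op["cmd"] == "set":
                    self.data[op["key"]] = op["value"]
                elif op["cmd"] == "del":
                    self.data.pop(op["key"], None)

    def _append(self, op):
        """Write one operation to the log and force it to disk."""
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def set(self, key, value):
        self._append({"cmd": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"cmd": "del", "key": key})
        self.data.pop(key, None)
```

Creating a second `AppendOnlyStore` on the same path simulates a restart: the new instance replays the log and ends up with the same keys and values. Snapshotting, the other option mentioned above, would instead write the whole dictionary to disk at intervals.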
Use Cases:
Memcached:
Ideal for simple caching scenarios where the goal is to reduce database load by storing frequently
accessed data in memory.
Best suited for read-heavy workloads.
Redis:
Suitable for a broader range of use cases, including caching, real-time analytics, leaderboards,
messaging systems, and more.
Provides additional features like persistence and complex data structures, making it a versatile data
store.
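The caching use case (reducing database load by keeping frequently accessed data in memory) is commonly implemented with a "cache-aside" pattern: look in the cache first and fall back to the database on a miss. The sketch below uses a plain dictionary in place of Memcached or Redis; `CacheAside` and `load_from_db` are hypothetical names.

```python
class CacheAside:
    """Toy cache-aside wrapper around a slow lookup function."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # function that queries the real database
        self._cache = {}           # stands in for Memcached or Redis
        self.db_reads = 0          # counts how often the database was hit

    def get(self, key):
        if key in self._cache:     # cache hit: no database access
            return self._cache[key]
        self.db_reads += 1         # cache miss: load and remember the value
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Call after the database row changes so stale data is not served."""
        self._cache.pop(key, None)


users = {"u1": "Asha"}             # pretend database table
lookup = CacheAside(users.get)
lookup.get("u1")
lookup.get("u1")                   # second call is served from the cache
```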

 Eventually Consistent Non-relational Databases-

Eventually Consistent:

"Eventually consistent" is a term often used in the context of distributed databases, especially in the
realm of non-relational databases. This consistency model acknowledges that in a distributed system,
it may take some time for all nodes to achieve a consistent view of the data.
In an eventually consistent system:
Updates Propagate Over Time: When data is updated in one node of the distributed system, it doesn't
instantly propagate to all other nodes. Instead, the update is propagated over time.
Temporary Inconsistencies: During the propagation period, different nodes may have different views
of the data. This means that at any given point in time, there might be temporary inconsistencies
among the nodes.
Convergence: Eventually, with enough time and system stability, all nodes will converge to a
consistent state where they all have the same data. The system is said to have achieved eventual
consistency.
Consistency Models: In the context of databases, consistency refers to how up-to-date and
synchronized the data is across multiple nodes or replicas in a distributed system.
Eventual Consistency: This model allows for some time lag between the update of a piece of data and
its propagation to all nodes in the system. In other words, the system guarantees that, given enough
time and no further updates, all replicas will converge to the same state.
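The three stages described above (delayed propagation, temporary inconsistency, convergence) can be simulated with two replicas. The sketch assumes a "last write wins" rule based on timestamps and a manual synchronisation step; real systems use various conflict-resolution and background repair mechanisms, and the names `Replica` and `anti_entropy` are hypothetical.

```python
class Replica:
    """Toy replica that keeps the newest version of each key."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Accept the write only if it is newer than what is stored."""
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest versions."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], timestamp=1)          # update lands on r1 only
stale = r2.read("cart")                          # r2 has not seen it yet
r2.write("cart", ["book", "pen"], timestamp=2)   # newer update lands on r2
anti_entropy(r1, r2)                             # replicas converge
```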

Non-relational database:
Non-relational databases, also known as NoSQL databases, are a broad category of database
systems that do not strictly adhere to the traditional relational database model. They are designed to
handle large volumes of diverse, unstructured or semi-structured data, and they are often used in
distributed architectures that scale horizontally to handle large amounts of data and high
transaction volumes.

Some common types of NoSQL databases include:
Document Stores: Store and query data in a flexible, semi-structured format (e.g., MongoDB).
Key-Value Stores: Use a simple key-value pair for data storage and retrieval (e.g., Redis,
DynamoDB).
Column-family Stores: Organize data into columns instead of rows, suitable for analytical queries
(e.g., Apache Cassandra).
Graph Databases: Designed for managing and querying graph-structured data (e.g., Neo4j).
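To make the four categories concrete, the sketch below shows how the same made-up user record might be shaped under each model, using plain Python structures. The field names and values are hypothetical sample data; the actual storage layout of each product differs.

```python
# Document store: one self-contained, nested document per entity.
document = {
    "_id": "u1",
    "name": "Asha",
    "emails": ["asha@example.com"],
    "address": {"city": "Mumbai"},
}

# Key-value store: a value looked up by a single key.
key_value = {
    "user:u1:name": "Asha",
    "user:u1:city": "Mumbai",
}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "u1": {
        "profile": {"name": "Asha", "city": "Mumbai"},
        "contact": {"email": "asha@example.com"},
    }
}

# Graph database: nodes plus labelled edges between them.
nodes = {"u1": {"name": "Asha"}, "u2": {"name": "Ravi"}}
edges = [("u1", "FOLLOWS", "u2")]


def followed_by(user_id):
    """Return the names of the users that user_id follows."""
    return [nodes[dst]["name"] for src, label, dst in edges
            if src == user_id and label == "FOLLOWS"]
```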

 Performing CRUD operations:

 Creating Records, Accessing Data, Updating and Deleting Data-


Performing CRUD operations (Create, Read, Update, Delete) is a fundamental aspect of
working with databases. The specific steps and syntax vary depending on the type of database
you're working with. This section gives a general overview using SQL as an example, but keep in
mind that different databases (e.g., MySQL, PostgreSQL, SQLite, Microsoft SQL Server) may have
variations in syntax.
Creating records involves adding new records to a database.
SQL Example:
Create Table: Before inserting records, you need to have a table in your database to store the data.
You define the structure of the table with columns and their data types.

Insert Record: Once the table is created, you can use the INSERT INTO statement to add new
records.

Insert Multiple Records: You can insert multiple records in a single statement.
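The three steps above can be tried out with Python's built-in `sqlite3` module and an in-memory database. The `students` table, its columns and the sample rows are hypothetical, and the SQL uses SQLite syntax, which may differ slightly in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk

# Create Table: define the columns and their data types.
conn.execute("""
    CREATE TABLE students (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        marks INTEGER
    )
""")

# Insert Record: add a single row.
conn.execute(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    (1, "Asha", 82),
)

# Insert Multiple Records: add several rows in one call.
conn.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(2, "Ravi", 74), (3, "Meera", 91)],
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

The `?` placeholders let the driver pass values separately from the SQL text, which avoids quoting mistakes and SQL injection.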

Create (C):
Objective: Add new records to the database.
SQL Example (for a relational database):

Procedure:
Specify the table in which you want to insert data.
Provide the values for each column, corresponding to the new record.

Read (R):
Objective: Retrieve data from the database.
SQL Example (for a relational database):

Procedure:
Specify the columns you want to retrieve.
Specify the table from which you want to retrieve data.
Optionally, use a condition to filter the results.
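A minimal sketch of the Read procedure, again using `sqlite3` with a hypothetical `students` table: the SELECT list names the columns, FROM names the table, and WHERE filters the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74), (3, "Meera", 91)],
)

# Columns: name, marks. Table: students. Condition: marks of 80 or more.
top_students = conn.execute(
    "SELECT name, marks FROM students WHERE marks >= ? ORDER BY marks DESC",
    (80,),
).fetchall()
```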

Update (U):
Objective: Modify existing records in the database.
SQL Example (for a relational database):

Procedure:
Specify the table to be updated.
Set the new values for the columns.
Use a condition to specify which records should be updated.
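A minimal sketch of the Update procedure with `sqlite3` and the same hypothetical `students` table. Without the WHERE clause, every row in the table would be changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74), (3, "Meera", 91)],
)

# Table: students. New value: marks = 78. Condition: only the row with id 2.
cursor = conn.execute("UPDATE students SET marks = ? WHERE id = ?", (78, 2))
conn.commit()

rows_changed = cursor.rowcount
new_marks = conn.execute("SELECT marks FROM students WHERE id = 2").fetchone()[0]
```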

Delete (D):
Objective: Remove records from the database.
SQL Example (for a relational database):

Procedure:
Specify the table from which you want to delete records.
Use a condition to specify which records should be deleted.

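A minimal sketch of the Delete procedure with `sqlite3` and the same hypothetical `students` table. As with UPDATE, omitting the WHERE clause would remove every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74), (3, "Meera", 91)],
)

# Table: students. Condition: only rows with marks below 80.
cursor = conn.execute("DELETE FROM students WHERE marks < ?", (80,))
conn.commit()

rows_deleted = cursor.rowcount
remaining = [name for (name,) in
             conn.execute("SELECT name FROM students ORDER BY id")]
```
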
The CAP theorem, also known as Brewer's theorem, is a concept in distributed computing that
highlights the trade-offs between three desirable properties of a distributed system: Consistency,
Availability, and Partition Tolerance. The CAP theorem was introduced by computer scientist Eric
Brewer in 2000.
Consistency (C): All nodes in the distributed system see the same data at the same time. In other
words, a write operation on one node is immediately visible to all other nodes.
Availability (A): Every request to the distributed system receives a response, without guarantee that
it contains the most recent version of the information. In other words, the system remains
operational and responsive despite node failures or network partitions.
Partition Tolerance (P): The system continues to operate despite network partitions (communication
failures) that might prevent some nodes from communicating with others.
The CAP theorem states that in a distributed system, it's impossible to simultaneously achieve all
three of these properties. A distributed system can, at most, guarantee two out of the three. This is
often represented as a triangle, where each corner represents one of the three properties, and the
trade-offs lie within the interior of the triangle.
The theorem is often expressed using the following statements:
CA: If you prioritize Consistency and Availability, the system may not tolerate network Partitions.
CP: If you prioritize Consistency and Partition Tolerance, the system may experience periods of
unavailability.
AP: If you prioritize Availability and Partition Tolerance, the system may not guarantee consistency
at all times.
Different distributed databases and systems are designed with different trade-offs based on the
specific requirements of their use cases. For example, traditional relational databases are usually
described as prioritizing Consistency (CA when run on a single node, or CP when replicated
synchronously), while some NoSQL databases, like Cassandra and Couchbase, may prioritize
Availability and Partition Tolerance (AP). It's essential for architects and developers to understand
these trade-offs when designing and selecting distributed systems for particular applications.
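The difference between the CP and AP choices can be illustrated with a toy two-node system. This is a deliberately simplified model (two replicas, a manual partition flag, no recovery logic), and the class names are hypothetical: in CP mode a write during a partition is refused, while in AP mode it is accepted locally and the replicas diverge.

```python
class Node:
    """One replica holding a single value."""

    def __init__(self):
        self.value = None


class TwoNodeSystem:
    """Toy model of a two-replica system in either 'CP' or 'AP' mode."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False  # True = the two nodes cannot talk

    def write(self, node_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by giving up availability.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # AP: stay available by accepting the write on one side only.
            self.nodes[node_index].value = value
            return
        for node in self.nodes:   # healthy network: update both replicas
            node.value = value

    def read(self, node_index):
        return self.nodes[node_index].value
```

With the partition flag set, a CP system raises an error on `write`, so both nodes keep returning the same (older) value; an AP system accepts the write, so `read(0)` and `read(1)` return different values until the partition heals and the replicas are reconciled.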
