Big Data Bhag 4 Changes

Uploaded by

Ash

UNIT -- 4TH “BIG DATA”

Overview of NoSQL:
NoSQL is a term used to refer to non-relational databases that are designed to handle large
volumes of unstructured or semi-structured data. These databases are used to store and manage
big data, the massive amounts of data that organizations generate on a daily basis.

Traditional relational databases are designed to handle structured data, but they are not well-suited
for managing big data. This is because big data is often unstructured or semi-structured, and it can
be difficult to store and manage this type of data in a relational database. NoSQL databases, on the
other hand, are designed to handle unstructured and semi-structured data, making them an ideal
choice for big data applications.

NoSQL databases come in a variety of different types, including document-oriented databases, key-
value stores, column-oriented databases, and graph databases. Each type of NoSQL database is
designed to handle a specific type of data and has its own unique features and capabilities.

One of the key advantages of NoSQL databases is their ability to scale horizontally. This means that
they can handle large volumes of data by adding more servers to the database cluster. NoSQL
databases also offer high availability and fault tolerance, which means that they can continue to
operate even if one or more servers fail.

Key Features/Characteristics of NoSQL :

1. Dynamic schema: NoSQL databases do not have a fixed schema and can
accommodate changing data structures without the need for migrations or schema
alterations.

2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to
a database cluster, making them well-suited for handling large amounts of data and high
levels of traffic.

3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based
data model, where data is stored in a semi-structured format, such as JSON or BSON.

4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model,
where data is stored as a collection of key-value pairs.

5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based data
model, where data is organized into column families rather than fixed relational rows.

6. Distributed and high availability: NoSQL databases are often designed to be highly
available and to automatically handle node failures and data replication across multiple
nodes in a database cluster.

7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible
and dynamic manner, with support for multiple data types and changing data
structures.

8. Performance: NoSQL databases are optimized for high performance and can handle a
high volume of reads and writes, making them suitable for big data and real-time
applications.

Advantages of NoSQL: There are many advantages of working with NoSQL databases such as
MongoDB and Cassandra. The main advantages are high scalability and high availability.

1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means
partitioning the data and storing the partitions across many server instances (in
range-based schemes, in a way that preserves the order of the data), so that the cluster
behaves like a single logical server. Vertical scaling means adding more resources to the
existing machine, whereas horizontal scaling means adding more machines to handle the
data. Vertical scaling is limited by the capacity of a single machine, while horizontal
scaling can continue by adding more servers. Examples of horizontally scaling databases
are MongoDB, Cassandra, etc. NoSQL can handle a huge amount of data because of this
scalability: as the data grows, the cluster can be scaled out to handle it efficiently.
Many NoSQL systems are natively designed to distribute the data, and the query load,
automatically across multiple servers.

2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured
data, which means that they can accommodate dynamic changes to the data model. This
makes NoSQL databases a good fit for applications that need to handle changing data
requirements.

3. High availability: The automatic replication feature of NoSQL databases makes them
highly available, because in case of a failure the data can still be served from a replica
that holds the last consistent state.

4. Scalability: NoSQL databases are highly scalable, which means that they can handle large
amounts of data and traffic with ease. This makes them a good fit for applications that
need to handle large amounts of data or traffic.

5. Performance: NoSQL databases are designed to handle large amounts of data and traffic,
which means that they can offer improved performance compared to traditional
relational databases.

6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional
relational databases, as they are typically less complex and do not require expensive
hardware or software.
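
The sharding idea described under "High scalability" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses hash-based partitioning (one common strategy; range-based partitioning is another), the shards are plain in-memory dictionaries standing in for separate servers, and the names ShardedStore and shard_for are hypothetical, not taken from any NoSQL product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store: each dict stands in for one server in the cluster."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # The same key always hashes to the same shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        # Only the shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=3)
for user_id in ["user:1", "user:2", "user:3", "user:4"]:
    store.put(user_id, {"id": user_id})

# Number of records that ended up on each shard.
print([len(shard) for shard in store.shards])
```

Because a lookup touches only the shard that owns the key, adding shards spreads both the data and the query load. Real systems also have to move data when the number of shards changes, which this sketch does not attempt.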

Disadvantages of NoSQL: NoSQL has the following disadvantages.

1. Lack of standardization: There are many different types of NoSQL databases, each with its
own unique strengths and weaknesses. This lack of standardization can make it difficult to
choose the right database for a specific application.

2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means
that they may relax guarantees of consistency, isolation, or durability in favor of
availability and performance. This can be a drawback for applications that require strong
data consistency guarantees.

3. Narrow focus: NoSQL databases have a narrow focus: they are mainly designed for
storage and provide comparatively little other functionality. Relational databases are a
better choice than NoSQL for transaction management.

4. Open-source: Many NoSQL databases are open-source projects, and there is no reliable
standard for NoSQL yet. In other words, two database systems are unlikely to behave alike.

5. Lack of support for complex queries: NoSQL databases are generally not designed to handle
complex queries, which means that they are often not a good fit for applications that
require complex data analysis or reporting.

6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of
traditional relational databases. This can make them less reliable and less secure than
traditional databases.

7. Management challenge: The purpose of big data tools is to make the management of a
large amount of data as simple as possible. But it is not so easy. Data management in
NoSQL is much more complex than in a relational database. NoSQL, in particular, has a
reputation for being challenging to install and even more demanding to manage on a daily
basis.

8. GUI is not available: GUI tools for accessing the database are not as widely available
in the market as they are for relational databases.

9. Backup: Backup is a weak point for some NoSQL databases. Taking a consistent backup of
data that is spread across many nodes, as in a sharded MongoDB cluster, requires more care
than backing up a single server.

10. Large document size: Some database systems like MongoDB and CouchDB store data in
JSON format. This means that documents are quite large (a concern for big data, network
bandwidth, and speed), and descriptive key names actually hurt, since they increase the
document size.

NoSQL databases categories/storage type:

NoSQL (Not Only SQL) storage types are non-relational databases that are designed to handle large
volumes of unstructured or semi-structured data. These databases are often used to manage big
data, which is characterized by its size, complexity, and diversity.

There are several different types of NoSQL storage types, including:

1. Document databases: These databases store data as semi-structured documents, such
as JSON or XML, and can be queried using document-oriented query languages.

[Figure: Relational vs. Document]
In the diagram, the left side shows a relational table made of rows and columns, and the
right side shows a document database whose structure is similar to JSON. In a relational
database, you have to know in advance which columns you have. In a document database,
the data is stored like a JSON object, and you are not required to define the structure
up front, which makes it flexible.

The document type is mostly used for CMS systems, blogging platforms, real-time
analytics, and e-commerce applications. It should not be used for complex transactions
which require multiple operations or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMS systems.

2. Key-value stores: These databases store data as key-value pairs and are optimized
for simple lookups by key. They are designed to handle lots of data and heavy load.

Key-value pair storage databases store data as a hash table where each key is
unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like “Website” associated with a value.

It is one of the most basic types of NoSQL database. This kind of NoSQL
database is used for collections, dictionaries, associative arrays, etc. Key-value
stores help the developer to store schema-less data. They work well for
shopping cart contents.

Redis, Dynamo, and Riak are some examples of key-value store databases. Dynamo is
described in Amazon’s Dynamo paper, and Riak is based on that design.

3. Column-family stores: These databases store data as column families, which are sets of
columns that are treated as a single entity. They are optimized for fast and efficient
querying of large amounts of data.
The values of a single column are stored contiguously.

[Figure: Column-based NoSQL database]

They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN,
etc., as the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses,
business intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based NoSQL databases.

4. Graph databases: These databases store data as nodes and edges, and are designed to
handle complex relationships between data. A graph database stores entities as
well as the relations amongst those entities. Each entity is stored as a node, with
the relationships as edges. An edge gives a relationship between nodes. Every
node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph
database is multi-relational in nature. Traversing relationships is fast because they are
already captured in the database, and there is no need to calculate them.

Graph-based databases are mostly used for social networks, logistics, and spatial data.

Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases.
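
The column-family item above notes that aggregations such as SUM are fast because the values of one column are stored together. The following Python sketch contrasts the two layouts. It is an in-memory illustration only; the sample records and field names are made up, and real column stores add compression and on-disk formats that are not shown here.

```python
# Hypothetical sample data used only for this illustration.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 50},
]


def sum_amount_row_layout(records):
    """Row layout: every whole record is visited to read one field."""
    return sum(record["amount"] for record in records)


# Column layout: one contiguous list per column.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def sum_amount_column_layout(cols):
    """Column layout: only the 'amount' column is read."""
    return sum(cols["amount"])


print(sum_amount_row_layout(rows), sum_amount_column_layout(columns))
```

Both functions return the same total; the difference is that the column layout reads a single list instead of touching every field of every record, which is what makes aggregation queries cheap.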

NoSQL databases are often used in applications where there is a high volume of data that needs
to be processed and analyzed in real-time, such as social media analytics, e-commerce, and
gaming. They can also be used for other applications, such as content management systems,
document management, and customer relationship management.

Each NoSQL storage type has its own unique features and benefits, making each suitable for
different types of data management applications. NoSQL databases are becoming increasingly
popular due to their ability to handle big data and provide high scalability, availability, and fault
tolerance.

NoSQL Products:

1. MongoDB:
MongoDB is a popular NoSQL database that is designed to handle large volumes of unstructured or
semi-structured data. Unlike traditional relational databases, which use tables and columns to
store data, MongoDB uses a document-oriented data model that allows for flexible schema designs
and easy storage and retrieval of complex data structures.

Some of the key features of MongoDB include:

https://www.mongodb.com/what-is-mongodb/features

1. Document-oriented data model: MongoDB stores data in flexible documents that can have
varying structures or fields. This allows for easy and efficient storage and retrieval of
complex data structures, such as nested arrays and objects.

2. Scalability: MongoDB is designed to scale horizontally, meaning that it can easily handle
large volumes of data across multiple servers. It supports automatic sharding, which allows
data to be partitioned across multiple servers, and provides native support for replication,
ensuring that data remains available even in the event of a server failure.

3. Flexible querying and indexing: MongoDB provides a powerful query language that allows
for rich data filtering and aggregation, as well as support for secondary indexes that allow
for fast queries on specific data fields. It also supports text search, geospatial queries, and
other advanced querying capabilities.

4. Ease of use: MongoDB is designed to be easy to use, with a simple and intuitive query
language and a flexible data model that allows for easy data manipulation. It also provides
a GUI called MongoDB Compass for managing and visualizing data.

5. Community and ecosystem: MongoDB has a large and active community of users and
contributors, which has resulted in a rich ecosystem of third-party tools and extensions.
This includes popular frameworks like Mongoose, which provides an object modeling layer
for MongoDB, and Stitch, which provides a serverless platform for building and deploying
MongoDB-based applications.

2. Cassandra:
Cassandra uses a column-family data model, which is a type of NoSQL data model that stores data in
column families rather than tables. Each column family contains a set of rows, and each row can
have a different number of columns. This allows for flexible schema designs and easy storage and
retrieval of complex data structures.

Some of the key features of Cassandra include:

https://en.wikipedia.org/wiki/Apache_Cassandra

1. Distributed: Every node in the cluster has the same role. There is no single point of
failure. Data is distributed across the cluster (so each node contains different data), but
there is no master, as every node can service any request.

2. Scalability: Cassandra is designed to be highly scalable, allowing it to handle large
volumes of data across multiple nodes in a distributed environment. It supports automatic
sharding, which allows data to be partitioned across multiple servers, and provides native
support for replication, ensuring that data remains available even in the event of a node
failure.

3. High availability: Cassandra is designed to be highly available, ensuring that data is
accessible even in the event of a node failure. It provides support for multi-data center
replication, allowing data to be replicated across multiple data centers for improved fault
tolerance.

4. Performance: Cassandra is designed for high performance, with support for fast reads and
writes. It provides a query language called Cassandra Query Language (CQL), which offers an
SQL-like syntax for filtering and aggregating data. It also supports secondary indexes,
which allow for queries on specific data fields.

5. Flexible data model: Cassandra's column-family data model allows for flexible schema
designs and easy storage and retrieval of complex data structures. It also provides support
for collections and user-defined types, which allow for further flexibility in data modeling.

6. Community and ecosystem: Cassandra has a large and active community of users and
contributors, which has resulted in a rich ecosystem of third-party tools and extensions.
This includes popular frameworks like Apache Spark and Apache Hadoop, which provide
integration with Cassandra for big data analytics.

3. Redis:

Redis is a popular NoSQL database that is designed to handle high-performance data storage and
retrieval. It is an open-source, in-memory data structure store that can be used as a database, cache,
and message broker.

Redis uses a key-value data model, where each value is associated with a unique key. It is
optimized for low-latency and high-throughput operations, making it an ideal choice for real-time
applications that require fast data access.

Some of the key features of Redis include:

1. In-memory data storage: Redis stores data in memory, which allows for fast data access
and retrieval. It also provides support for persistent data storage, which allows data to be
saved to disk for durability.

2. High performance: Redis is designed for high performance, with support for fast reads and
writes. It can handle large volumes of data and provides support for pipelining and
batching operations to improve throughput.

3. Flexible data structures: Redis provides a wide range of data structures, including strings,
hashes, lists, sets, and sorted sets. This allows for flexible data modeling and easy
storage and retrieval of complex data structures.

4. Advanced features: Redis provides support for advanced features such as transactions,
Lua scripting, and pub/sub messaging. It also provides support for geospatial indexing and
search.

5. Community and ecosystem: Redis has a large and active community of users and
contributors, which has resulted in a rich ecosystem of third-party tools and extensions.
This includes popular libraries like Redisson and Jedis, which provide integration with Redis
for Java applications.

4. Neo4j:

Neo4j is a popular NoSQL database that is designed to handle graph-based data storage and
retrieval. It is an open-source, ACID-compliant graph database that can be used to model and
store complex relationships between data.

Neo4j uses a property graph data model, where nodes represent entities and edges represent the
relationships between them. It is optimized for fast traversal and manipulation of graph data,
making it an ideal choice for applications that require complex querying and analysis of relationships.

Some of the key features of Neo4j include:

1. Graph-based data storage: Neo4j stores data in a graph-based format, which allows
for flexible data modeling and easy storage and retrieval of complex relationships.

2. Fast querying: Neo4j is optimized for fast querying and traversal of graph data. It provides
a powerful query language called Cypher, which allows for rich data filtering and
aggregation.

3. Flexible data model: Neo4j's property graph data model allows for flexible schema
designs and easy storage and retrieval of complex data structures. It also provides
support for labeled relationships and dynamic properties, which allow for further
flexibility in data modeling.

4. High scalability: Neo4j is designed to be highly scalable, allowing it to handle large volumes
of data across multiple nodes in a distributed environment. It provides support for
sharding and clustering, ensuring that data is always available even in the event of a node
failure.

5. Community and ecosystem: Neo4j has a large and active community of users and
contributors, which has resulted in a rich ecosystem of third-party tools and extensions.
This includes popular libraries like APOC and GraphAware, which provide integration with
Neo4j for advanced graph algorithms and visualization.

Data Management for Big Data:

Data management for big data refers to the processes and tools used to collect, store, process,
and analyze large and complex data sets. Big data management involves a combination of
technologies, strategies, and methodologies that are designed to handle the volume, velocity, and
variety of data generated by modern organizations.

Some of the key aspects of data management for big data include:

1. Data collection: Big data management begins with the collection of data from
various sources, such as social media, IoT devices, and enterprise systems. This data
is often generated in real-time and can be structured or unstructured.

2. Data storage: Big data requires scalable and flexible storage solutions that can handle
the volume and variety of data being generated. This includes both traditional data
storage technologies such as relational databases and newer technologies such as NoSQL
databases and Hadoop Distributed File System (HDFS).

3. Data processing: Big data processing involves the use of tools and technologies to clean,
transform, and analyze data. This includes batch processing using technologies like Apache
Spark and Apache Hadoop, as well as real-time processing using technologies such as Apache
Kafka and Apache Flink.

4. Data analysis: Big data analysis involves the use of advanced analytics and machine
learning techniques to extract insights and derive value from large and complex data sets.
This includes technologies such as data mining, predictive analytics, and natural language
processing.

5. Data security: Big data management requires robust security measures to protect
sensitive data from cyber threats and other security risks. This includes data encryption,
access control, and data masking.

• Schemaless Database Models:
A schemaless database, like MongoDB, does not have the up-front constraints of a
relational schema, mapping to a more ‘natural’ database. Even when sitting on top of a
data lake, each document is created with a partial schema to aid retrieval. Any
formal schema is applied in the code of your applications; this layer of
abstraction protects the raw data in the NoSQL database and allows for rapid
transformation as your needs change.

Any data, formatted or not, can be stored in a non-tabular NoSQL type of
database. At the same time, using the right tools in the form of a schemaless
database can unlock the value of all of your structured and unstructured data
types.

A NoSQL database is very different from a traditional relational database, which has
a strictly defined schema enforced by the RDBMS. However, in order to assist
with sorting and retrieval, each NoSQL document contains a partial schema; for
instance, all collections and indexes are explicitly listed in the system namespace.
A full schema is only applied to your data when it is retrieved by
your application.

How does a schemaless database work?

In schemaless databases, information is stored in JSON-style documents which
can have varying sets of fields with different data types for each field. So, a
collection could look like this:

{ "name": "Joe", "age": 30, "interests": "football" }
{ "name": "Kate", "age": 25 }

As you can see, the data itself normally has a fairly consistent structure. With
the schemaless MongoDB database, there is some additional structure — the
system namespace contains an explicit list of collections and indexes.
Collections may be implicitly or explicitly created — indexes must be explicitly
declared.

In a schemaless model, data can be stored as flexible documents, where each
document represents an instance of data with its own unique structure. This
approach is commonly used in NoSQL databases such as MongoDB and
Couchbase. Wide-column stores such as Apache Cassandra offer a related
flexibility, since rows in a table need not populate every column.

Some of the benefits of a schemaless model for data management include:
1. Flexibility: A schemaless model allows for greater flexibility in data
modeling, as the structure of the data can evolve over time to meet
changing business needs.
2. Faster development: A schemaless model can speed up the development
process, as it eliminates the need to define a schema upfront, and allows
developers to focus on building features and functionality.
3. Easier data integration: A schemaless model can make it easier to
integrate data from different sources, as the data can be stored in its
natural format, rather than being forced into a predefined schema.
4. Scalability: A schemaless model can be more scalable than a
traditional relational database, as it can handle a greater volume and
variety of data, without requiring changes to the schema.
However, a schemaless model can also present some challenges, such as:
1. Data consistency: Without a predefined schema, ensuring data
consistency can be more challenging, as there is a greater risk of data
errors and inconsistencies.
2. Data governance: A schemaless model can make it harder to
enforce data governance policies, such as data quality standards
and access controls.
3. Data analysis: A schemaless model can make it harder to analyze data, as
the structure of the data is not predefined, and data may need to be
transformed before it can be analyzed.
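
The statement that a schema is applied in application code when data is retrieved can be sketched as follows. This is a minimal illustration, not how any particular database works: the collection is a plain Python list reusing the Joe and Kate documents from the example above, and the Person class and to_person function are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Documents stored as-is; the second one has no "interests" field.
collection = [
    {"name": "Joe", "age": 30, "interests": "football"},
    {"name": "Kate", "age": 25},
]


@dataclass
class Person:
    """The schema the application expects when it reads a document."""
    name: str
    age: int
    interests: Optional[str] = None


def to_person(doc: dict) -> Person:
    """Apply the schema on read: check required fields, default optional ones."""
    if not isinstance(doc.get("name"), str):
        raise ValueError("document needs a string 'name'")
    if not isinstance(doc.get("age"), int):
        raise ValueError("document needs an integer 'age'")
    return Person(name=doc["name"], age=doc["age"], interests=doc.get("interests"))


people = [to_person(doc) for doc in collection]
print(people)
```

The store accepts both documents unchanged, and the consistency checks listed among the challenges above live in to_person rather than in the database, which is the trade-off the schemaless model makes.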

• Key-Value Data Model in NoSQL:

A key-value data model or database is also referred to as a key-value store. It is a non-relational
type of database. It uses an associative array as its basic data structure, in which each
individual key is linked with exactly one value in a collection. Keys are unique identifiers for
the values, and a value can be any kind of entity. A key-value database is a collection of
key-value pairs stored as separate records, with no predefined structure.

How do key-value databases work?

A key-value database associates a value, which can be anything from a simple string to a
complex entity, with a key that is used to keep track of that entity. As in many
programming paradigms, a key-value database resembles a map object, array, or dictionary,
except that it is stored persistently and managed by a DBMS.

The key-value store uses an efficient and compact index structure so that it can quickly and
reliably find a value by its key. For example, Redis is a key-value store used to
track lists, maps, sets, and primitive types (which are simple data structures) in a persistent
database. By supporting only a predetermined number of value types, Redis can expose a very
simple interface for querying and manipulating them, and when properly configured it is
capable of high throughput.
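
The map-like behavior described above can be sketched with a small Python class. It is a toy, in-memory stand-in for a key-value store (no persistence, no networking), and the class name KeyValueStore and the session key format are assumptions made for illustration.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy key-value store: each unique key maps to exactly one value."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Storing under an existing key replaces the previous value.
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookup is by key only; there is no query over the values.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


kv = KeyValueStore()
kv.put("session:42", {"user": "joe", "cart": ["book"]})
print(kv.get("session:42"))
kv.delete("session:42")
print(kv.get("session:42"))
```

The whole interface is put, get, and delete by key, which is why such stores are simple and fast, and also why data cannot be queried without knowing its key.
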

When to use a key-value database:

Here are a few situations in which you can use a key-value database:

• User session attributes in an online app such as finance or gaming, which require
real-time random data access.

• Caching mechanism for frequently accessed data, or a key-based design.

• The application is built on queries that are based on keys.

Features:

• One of the simplest kinds of NoSQL data models.

• Key-value databases use simple functions for storing, getting, and removing data.

• Key-value databases have no query language.

• Built-in redundancy makes this database more reliable.

Advantages:

• It is very easy to use. Because the database is so simple, a value can be of any kind, or
even of different kinds when required.

• Its response time is fast due to its simplicity, provided that the surrounding environment
is well built and optimized.

• Key-value store databases are scalable vertically as well as horizontally.

• Built-in redundancy makes this database more reliable.

Disadvantages:

• Because key-value databases have no query language, queries cannot be ported
from one database to a different database.

• The key-value store database is not refined. You cannot query the database without a key.

Some examples of key-value databases:

Here are some popular key-value databases which are widely used:

• Couchbase: It permits SQL-style querying and text search.

• Amazon DynamoDB: One of the most widely used key-value databases, trusted by a large
number of users. It can easily handle a large number of requests every day and it also
provides various security options.

• Riak: A distributed key-value database used for building applications.

• Aerospike: It is an open-source, real-time database that handles billions of transactions.

• Berkeley DB: It is a high-performance, open-source database providing scalability.


• Graph Based Data Model in NoSQL

The graph-based data model in NoSQL is a type of data model that focuses on the
relationships between data elements. As the name suggests, each element is stored as a
node, and the associations between elements are known as links. Associations are stored
directly, because they are first-class elements of the data model. These data
models give us a conceptual view of the data.

These data models are based on a topological network structure. In graph theory we have
terms like nodes, edges, and properties; here is what they mean in the graph-based data
model.

• Nodes: These are the instances of data that represent the objects to be tracked.

• Edges: Edges represent relationships between nodes.

• Properties: These represent information associated with nodes.

The image below shows nodes with properties, connected by relationships represented as edges.

Working of Graph Data Model:

In these data models, nodes that are related are physically connected, and the
connection itself is also stored as a piece of data. Connecting data in this way
makes it easy to query a relationship. The data model reads the relationship directly from
storage instead of calculating it through a series of query steps. Like many other NoSQL
databases, these data models do not enforce a fixed schema, which keeps the model flexible
and easy to edit.
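
A small Python sketch can make the idea of stored relationships concrete. It is only an in-memory illustration: the people, the company, and the relationship names KNOWS and WORKS_AT are invented for this example, and the function reachable is a hypothetical helper, not the API of any graph database.

```python
from collections import deque

# Nodes carry properties; all names and values are made up for illustration.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 27},
    "carol": {"label": "Person", "age": 41},
    "acme": {"label": "Company"},
}

# Edges are stored with the node they start from: (relationship, target).
edges = {
    "alice": [("KNOWS", "bob")],
    "bob": [("KNOWS", "carol"), ("WORKS_AT", "acme")],
    "carol": [],
    "acme": [],
}


def reachable(start, rel_type):
    """Breadth-first traversal that follows only edges of one relationship type."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for rel, target in edges.get(current, []):
            if rel == rel_type and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


print(reachable("alice", "KNOWS"))  # ['bob', 'carol']
```

Each step of the traversal reads a node's stored edge list directly, with no join or search over unrelated records, which is the property the paragraph above describes.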

Examples of Graph Data Models:

• JanusGraph: An open-source, scalable graph database system that is very helpful in
big data analytics. JanusGraph has different features like:

  • Storage: Many options are available for storing graph data, such as Cassandra.

  • Support for transactions: It supports ACID (Atomicity, Consistency, Isolation, and
  Durability) transactions and can serve thousands of concurrent users.

  • Searching options: Complex searching options are available as optional support.

• Neo4j: The name is often said to stand for Network Exploration and Optimization 4 Java.
As the name suggests, this graph database is written in Java, with native graph storage
and processing. Neo4j has different features like:

  • Scalable: Scalable through data partitioning into pieces known as shards.

  • Higher Availability: Availability is very high due to continuous backups
  and rolling upgrades.

  • Query Language: Uses the programmer-friendly Cypher graph query language.

• DGraph: It is an open-source distributed graph database system designed for scalability.
DGraph main features are:

  • Query Language: It uses GraphQL, a query language originally designed for APIs.

  • Open-source system: Support for many open standards.

Advantages of Graph Data Model:

• Structure: The structures are very agile and workable.

• Explicit Representation: The portrayal of relationships between entities is explicit.

• Real-time O/P Results: Queries give real-time output results.

Disadvantages of Graph Data Model:

• No standard query language: The query language depends on the platform that is used,
so there is no single standard query language.

• Poor fit for transactional systems: Graphs are not well suited to transaction-based systems.

• Small User Base: The user base is small, which makes it difficult to get support
when running into a problem.

Applications of Graph Data Model:

• Graph data models are widely used in fraud detection.

• It is used in digital asset management, where it provides a scalable database model to
keep track of digital assets.

• It is used in network management, where it alerts a network administrator about problems
in a network.

• It is used in context-aware services, for example by giving traffic updates.

• It is used in real-time recommendation engines, which provide a better user experience.

 Document Databases in NoSQL:

A Document Data Model is quite different from other data models because it stores data in JSON,
BSON, or XML documents. In this data model, documents can be nested inside other documents, and
any particular element can be indexed to make queries run faster. Documents are often stored and
retrieved in a form close to the data objects used in applications, which means very little
translation is needed to use the data in an application. JSON is a native format that is often
used both to store and to query data.

So in the document data model, each document consists of key-value pairs. Below is an example:

{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "[email protected]",
"Contact" : "12345"
}

Working of Document Data Model:

This is a semi-structured data model in which a record and the data associated with it are stored
together in a single document, which means the model is not completely unstructured. The main
point is that data here is stored in documents.
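
As a rough illustration of nesting and field indexing, the sketch below treats documents as Python
dictionaries. The `_id` field, the nested address keys, the second document, and the `build_index`
helper are all assumptions made for the example and do not reflect any particular database's API.

```python
import json

# Hypothetical documents; the second one has different fields from the first.
documents = [
    {"_id": 1, "Name": "Yashodhra",
     "Address": {"Area": "Patel Nagar", "Landmark": "Near Patel Nagar"},
     "Contact": "12345"},
    {"_id": 2, "Name": "Ravi", "Contact": "67890", "Tags": ["student"]},
]

def build_index(docs, field):
    """Map each value of `field` to the ids of the documents that contain it."""
    index = {}
    for doc in docs:
        if field in doc:
            index.setdefault(doc[field], []).append(doc["_id"])
    return index

# An index turns a lookup by name into a dictionary access instead of a scan.
name_index = build_index(documents, "Name")
print(name_index["Ravi"])  # [2]

# The nested document serialises to JSON without any translation layer.
print(json.dumps(documents[0]))
```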

Features:

 Document Type Model: Data is stored in documents rather than tables or graphs, so it maps
easily onto the data structures of many programming languages.

 Flexible Schema: The schema is flexible: not all documents in a collection need to have the
same fields.

 Distributed and Resilient: Document data models are distributed by design, which enables
horizontal scaling and distribution of data.

 Manageable Query Language: The query language lets developers perform CRUD (Create, Read,
Update, Delete) operations on the data model.
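
The flexible schema and the CRUD operations listed above can be sketched with a toy in-memory
collection. The `Collection` class and its method names are hypothetical and are not the API of
MongoDB or any other product; the sample field values are illustrative.

```python
import itertools

class Collection:
    """Toy in-memory collection; not the API of any real database."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def create(self, doc):
        """Store a copy of the document and return its generated id."""
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)
        return doc_id

    def read(self, **criteria):
        """Return copies of the documents whose fields match all criteria."""
        return [
            dict(doc, _id=doc_id)
            for doc_id, doc in self._docs.items()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]

    def update(self, doc_id, changes):
        """Add or overwrite fields on one document."""
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        del self._docs[doc_id]

students = Collection()
first = students.create({"name": "Tanmay", "branch": "Computer"})
second = students.create({"name": "Aditi", "email": "aditi@example.com"})  # different fields
students.update(first, {"year": 2})   # a new field appears without a schema change
students.delete(second)
print(students.read(branch="Computer"))
```

Note that the two documents were stored in the same collection even though they do not share the
same fields, which is what the flexible schema feature refers to.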

Examples of Document Data Models :

 Amazon DocumentDB

 MongoDB

 Cosmos DB

 ArangoDB

 Couchbase Server

 CouchDB

Advantages:

 Schema-less: These databases are very good at retaining existing data at massive volumes
because there are no restrictions on the format and structure of stored data.

 Faster creation and maintenance of documents: It is very simple to create a document, and
maintenance requires almost nothing.

 Open formats: Documents are built using simple open formats such as XML, JSON, and their
variants.

 Built-in versioning: As documents grow in size they may also grow in complexity; built-in
versioning decreases conflicts.

Disadvantages:

 Weak Atomicity: Support for multi-document ACID transactions is often limited or absent. A
change involving two collections then requires two separate queries, i.e. one for each
collection. This is where it breaks atomicity requirements.

 Consistency Check Limitations: One can search for collections and documents that are not
connected to, say, an author collection, but doing so may degrade database performance.

 Security: Many web applications today lack security, which in turn results in the leakage of
sensitive data. This is a point of concern, so one must pay attention to web app
vulnerabilities.
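
The weak atomicity point can be made concrete with a small simulation. The two "collections"
below are plain Python structures and the crash is simulated with an exception; the account names,
amounts, and the `transfer` function are hypothetical. The sketch only shows what can go wrong
when two writes are not wrapped in one transaction.

```python
# Two hypothetical collections, modelled as plain Python structures.
accounts = {"A": {"balance": 100}, "B": {"balance": 50}}
ledger = []  # second "collection": one entry per transfer

def transfer(amount, fail_between=False):
    """Apply two separate writes; nothing ties them together atomically."""
    accounts["A"]["balance"] -= amount                # write 1: first collection
    if fail_between:
        raise RuntimeError("simulated crash between the two writes")
    ledger.append({"from": "A", "amount": amount})    # write 2: second collection

try:
    transfer(30, fail_between=True)
except RuntimeError:
    pass

# The first write survived but the second never happened.
print(accounts["A"]["balance"], len(ledger))  # 70 0
```

With a multi-document transaction, both writes would either be applied together or rolled back
together, so the balance and the ledger could not disagree.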

Applications of Document Data Model :

 Content Management: These data models are widely used in building video streaming platforms,
blogs, and similar services, because each item is stored as a single document and the database
is much easier to maintain as the service evolves over time.

 Book Database: These are very useful for building book databases because this data model lets
us nest documents.

 Catalog: These data models are widely used for storing and reading catalog files because they
read quickly even when catalogs have thousands of attributes stored.

 Analytics Platform: These data models are widely used in analytics platforms.

 Object Data Stores:

Object data stores are a type of NoSQL database that is designed to store and manage complex,
hierarchical data structures such as objects or graphs. Object data stores are often used for
applications that require high performance and scalability, such as e-commerce, gaming, and social
media.

In an object data store, data is stored as objects, which are self-contained units of data that contain
both data and behavior. Objects can be complex, with nested structures and relationships to other
objects, and can include both structured and unstructured data.

Object data stores typically use a flexible, schemaless model that allows for dynamic changes to
the data structure as new objects are added or existing objects are updated. This makes it easy to
store and manage complex, evolving data structures.

Object data stores also often provide support for transactions, indexing, and querying, making it
possible to perform complex analytics and searches on the data. Examples cited for this category
include Amazon DynamoDB, Couchbase, and Apache Cassandra, although these are more commonly
classified as key-value, document, and wide-column stores respectively.
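
A minimal sketch of an object that carries both data and behaviour, with a nested structure, is
given below. It uses Python dataclasses purely for illustration; the `Item` and `Cart` classes and
their fields are hypothetical, and `asdict` simply stands in for whatever serialisation a real
store would apply.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Item:
    name: str
    price: float

@dataclass
class Cart:
    owner: str
    items: list = field(default_factory=list)

    def total(self):
        """Behaviour that travels together with the data."""
        return sum(item.price for item in self.items)

cart = Cart("ash", [Item("pen", 1.5), Item("book", 10.0)])
stored = asdict(cart)   # nested dictionary form, as a store might persist it

print(cart.total())     # 11.5
print(stored)
```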

Advantages of Object Data Stores:

1. Flexible Data Model: Object data stores have a flexible data model that allows for easy
storage and management of complex, hierarchical data structures. This makes it easier
to handle unstructured or semi-structured data.

2. Scalability: Object data stores are designed to scale horizontally, which makes it easy to
handle large volumes of data and high traffic loads. They can be easily expanded by
adding more servers to the cluster.

3. High Performance: Object data stores provide high performance for read and write
operations, making them ideal for real-time applications such as gaming, social media, and
financial services.

4. Easy Integration with Applications: Object data stores are easy to integrate with
applications using APIs, making it simple for developers to work with the database.

Disadvantages of Object Data Stores:

1. Limited Querying Capability: Object data stores often have limited querying capability,
which can make it difficult to perform complex analytics or search operations.

2. Lack of Standardization: Unlike relational databases, there is no standard for object data
stores, which can make it difficult to switch between databases or integrate with other
systems.

3. Complexity: Object data stores can be complex to set up and maintain, requiring
specialized skills and knowledge.

Applications of Object Data Stores:

1. Real-Time Applications: Object data stores are ideal for real-time applications such as
gaming, social media, and financial services, where high performance and scalability
are critical.

2. E-commerce: Object data stores are well-suited for e-commerce applications, where
complex data structures such as product catalogs and customer profiles need to be stored
and managed.

3. Internet of Things (IoT): Object data stores are also useful for IoT applications, where
large volumes of data need to be stored and analyzed in real-time.

 Tabular stores:
Tabular stores, also known as columnar stores, are a type of NoSQL database that store data in a
column-oriented format instead of a traditional row-oriented format. Their working, advantages,
disadvantages, and applications are described below.

The Columnar Data Model is an important NoSQL data model. NoSQL databases differ from SQL
databases because they use a data model with a different structure from the row-and-column table
model used by relational database management systems (RDBMS). NoSQL databases use a flexible
schema model that is designed to scale horizontally across many servers and to handle large
volumes of data.
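
Horizontal scaling usually relies on deciding which server holds which piece of data. The sketch
below shows one simple way to do this, hash partitioning, under the assumption that records are
addressed by a string key; the `shard_for` function and the key names are hypothetical, and real
systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib

def shard_for(key, num_servers):
    """Pick a server index from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers

keys = ["user:1", "user:2", "user:3", "user:4", "user:5", "user:6"]

# Show how the same keys are spread over 2 servers and then over 3 servers.
for num_servers in (2, 3):
    placement = {}
    for key in keys:
        placement.setdefault(shard_for(key, num_servers), []).append(key)
    print(num_servers, "servers:", placement)
```

With plain modulo placement, adding a server moves many keys to a different server, which is one
reason production systems tend to prefer schemes that limit this reshuffling.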

Columnar Data Model of NoSQL :

A relational database stores data in rows and reads it row by row, whereas a column store is
organized as a set of columns. So if someone wants to run analytics on a small number of columns,
those columns can be read directly without consuming memory on unwanted data. Values within a
column are usually of the same type and therefore benefit from more efficient compression, which
makes reads faster. Examples of the Columnar Data Model: Cassandra and Apache HBase (part of the
Hadoop ecosystem).
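
To see why same-typed column values compress well, consider run-length encoding, used here only as
an illustrative technique and not as a statement of what Cassandra or HBase do internally. The
sample column and the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A column in which every row holds the same value compresses to one pair.
course_column = ["B-Tech", "B-Tech", "B-Tech", "B-Tech"]
encoded = rle_encode(course_column)
print(encoded)  # [('B-Tech', 4)]
```

The same values stored row by row would be interleaved with names and ids, so the runs that make
this encoding effective would not exist.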

Working of Columnar Data Model:

In the Columnar Data Model, information is organized into columns instead of rows. The columns
then function much like tables do in relational databases. This type of data model is more
flexible because it is a type of NoSQL database. The example below will help in understanding the
Columnar data model:

Row-Oriented Table:

S.No.  Name       Course  Branch       ID
01.    Tanmay     B-Tech  Computer     2
02.    Abhishek   B-Tech  Electronics  5
03.    Samriddha  B-Tech  IT           7
04.    Aditi      B-Tech  E & TC       8

Column-Oriented Table:

S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8

S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8

S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8
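
The same student data can be held in both layouts in a few lines of Python. This is only a sketch
of the idea; the lowercase field names and the `to_columns` helper are assumptions, and real
column stores keep the columns on disk rather than in dictionaries.

```python
# Row-oriented layout: one dictionary per student.
rows = [
    {"sno": 1, "name": "Tanmay", "course": "B-Tech", "branch": "Computer", "id": 2},
    {"sno": 2, "name": "Abhishek", "course": "B-Tech", "branch": "Electronics", "id": 5},
    {"sno": 3, "name": "Samriddha", "course": "B-Tech", "branch": "IT", "id": 7},
    {"sno": 4, "name": "Aditi", "course": "B-Tech", "branch": "E & TC", "id": 8},
]

def to_columns(rows):
    """Pivot a list of row dictionaries into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Column-oriented layout: one list per attribute.
columns = to_columns(rows)

print(columns["name"])
print(sum(columns["id"]))  # 22, computed by touching only the "id" column
```

An aggregate such as the sum above reads a single list in the column layout, whereas the row
layout would have to visit every field of every row.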

Columnar Data Model uses the concept of keyspace, which is like a schema in relational models.

Advantages of Columnar Data Model :

 Well structured: Since these data models compress well, storage is very structured and well
organized.

 Flexibility: There is a large amount of flexibility because columns need not look like each
other, which means one can add new and different columns without disrupting the whole
database.

 Aggregation queries are fast: Aggregation queries are quite fast because the information they
need is stored together in a column. An example would be adding up the total number of
students enrolled in one year.

 Scalability: It can be spread across large clusters of machines, even numbering in thousands.

 Load Times: Since a table can be loaded in a few seconds, load times are excellent.

Disadvantages of Columnar Data Model:

 Designing indexing schema: Designing an effective and working schema is difficult and very
time-consuming.

 Suboptimal data loading: Incremental data loading is suboptimal and should be avoided, but
this might not be an issue for some users.

 Security vulnerabilities: If security is a priority, note that the Columnar data model lacks
inbuilt security features; in that case, one should look into relational databases.

 Online Transaction Processing (OLTP): OLTP applications are also not compatible with columnar
data models because of the way data is stored.

Applications of Columnar Data Model:

 Columnar Data Model is very much used in various Blogging Platforms.

 It is used in Content management systems like WordPress, Joomla, etc.

 It is used in Systems that maintain counters.

 It is used in Systems that require heavy write requests.

 It is used in Services that have expiring usage.

 Document stores:
A document store is a type of NoSQL database that stores data in the form of documents rather than
rows and columns as in relational databases. A document can be a JSON or BSON object, which
contains data fields and values that are stored together in a single document.

Here's how document stores work:

1. Documents are stored in collections or buckets, which can be thought of as containers for
related data.

2. Each document can have its own unique structure, and the fields within the document
can have different data types.

3. The document store database provides APIs for reading, writing, and querying documents.

Some advantages of using document stores include:

1. Flexibility: Document stores are schema-less, which means that they can easily
handle unstructured and semi-structured data. This makes it easy to store data of
varying complexity without having to worry about predefined table structures.

2. Scalability: Document stores are designed to handle large volumes of data, making
them ideal for applications that require high scalability.

3. Performance: Document stores can be optimized for high-speed data retrieval, making
them suitable for applications that require fast and responsive queries.

4. Availability: Document stores are designed to be highly available, with built-in features
for replication and failover.

However, there are also some disadvantages to using document stores, including:

1. Limited querying capabilities: Because document stores are schema-less, querying can
be more complex than with traditional relational databases.

2. Lack of transactional support: Document stores do not support transactions in the same
way as traditional relational databases, which can make it more difficult to ensure data
consistency in certain scenarios.

3. Higher storage requirements: Because documents can contain redundant data, document
stores may require more storage space than traditional relational databases.

Some common applications of document stores include:

1. Content management systems: Document stores can be used to store and manage unstructured
content such as articles, images, and videos.

2. Internet of Things (IoT) applications: Document stores can be used to store sensor data
from IoT devices, which can be semi-structured or unstructured.

3. E-commerce platforms: Document stores can be used to store product information, customer
profiles, and order history.

4. Real-time analytics: Document stores can be used to store and analyze real-time data
streams from various sources such as social media platforms, mobile apps, and IoT devices.

 NoSQL Misconceptions:
There are several common misconceptions about NoSQL databases that can lead to confusion or
misunderstandings about their capabilities and use cases. Some of the most common misconceptions
include:

1. NoSQL databases are always faster than SQL databases: While NoSQL databases can be faster
than SQL databases in some scenarios, such as when handling large volumes of unstructured data,
this is not always the case. The performance of a database depends on many factors, including
the data model, query complexity, hardware, and network latency.

2. NoSQL databases are schemaless: While many NoSQL databases use a flexible, schemaless data
model, this is not true for all NoSQL databases. Some NoSQL databases, such as columnar
databases, have a fixed schema.

3. NoSQL databases are always cheaper than SQL databases: While NoSQL databases can be more
cost-effective than SQL databases in some cases, such as when scaling out to handle large
amounts of data, this is not always the case. The total cost of ownership for a database
depends on many factors, including licensing costs, hardware, maintenance, and support.

4. NoSQL databases are always better for big data: While NoSQL databases can be well-suited
for handling big data, this is not always the case. The best database for a particular use case
depends on many factors, including the type of data, query patterns, and performance
requirements.

5. NoSQL databases can't handle transactions: While some NoSQL databases, such as document
stores, may not have full transaction support, many NoSQL databases do support transactions,
including key-value stores and columnar databases.

 NoSQL over RDBMS:

Difference between NoSQL and RDBMS

The following table highlights the major differences between NoSQL and RDBMS:

Basis of Comparison | NoSQL | RDBMS

Definition | NoSQL databases are also known as non-relational or distributed databases. | RDBMS, which stands for Relational Database Management System, is the most common name for SQL databases.

Query | No declarative query language. | Uses SQL, which stands for Structured Query Language.

Scalability | NoSQL databases are horizontally scalable. | RDBMS databases are vertically scalable.

Design | NoSQL combines multiple database technologies. These databases were created in response to the application's requirements. | Traditional RDBMS systems use SQL syntax and queries to get insights from data. Different OLAP systems use them.

Speed | NoSQL databases use denormalization to optimise themselves. One record stores all the query data. This simplifies finding the matched records, which speeds up queries. | Relational database models contain data in different tables; when running a query, you must integrate the information and set table-spanning restrictions. Because of so many tables, the database's query time is slow.
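
The Speed row can be illustrated by comparing a normalised layout with a denormalised one. The
sketch below is a toy model in Python: the customer and order data, the field names, and the
`order_with_customer` helper are hypothetical, and the "join" is only emulated with dictionary
lookups.

```python
# Normalised (relational style): the data is split across two tables.
customers = {1: {"name": "Aditi"}}
orders = [{"order_id": 10, "customer_id": 1, "item": "book"}]

def order_with_customer(order_id):
    """Emulate a join: find the order, then look up its customer."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    return {**order, "customer_name": customer["name"]}

# Denormalised (NoSQL style): one record already holds all the query data.
order_docs = {
    10: {"order_id": 10, "item": "book", "customer": {"id": 1, "name": "Aditi"}},
}

print(order_with_customer(10)["customer_name"])   # Aditi, after two lookups
print(order_docs[10]["customer"]["name"])         # Aditi, from a single record
```

The price of the single lookup is duplication: the customer's name is copied into every order
document, which is why denormalised stores can need more storage and more care when data changes.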
