Unit 1 Notes in NoSQL

NoSQL databases are used to handle large, distributed datasets and are effective for analyzing unstructured cloud data. They arose due to the demands of big data and cloud computing on relational databases. NoSQL databases sacrifice consistency for availability and scalability. There are several families of NoSQL databases including key-value, document, graph and BigTable influenced databases.

Uploaded by

sudhaaass
Copyright
© © All Rights Reserved

NoSQL Databases

NoSQL databases are used for large sets of distributed data. Some big
data performance issues are not handled effectively by relational
databases; NoSQL databases manage such issues more easily. They are
very efficient at analyzing large volumes of unstructured data that may
be stored across multiple virtual servers in the cloud.

Why NoSQL?
NoSQL - probably the hottest term in database technology today -
was unheard of only a year ago. And yet, today, there are literally
dozens of database systems described as "NoSQL." How did all of
this happen so quickly?

Although the term "NoSQL" is barely a year old, in reality, most of
the databases described as NoSQL have been around a lot longer
than the term itself. Many databases described as NoSQL arose
over the past few years as reactions to strains placed on traditional
relational databases by two other significant trends affecting our
industry: big data and cloud computing.

Of course, database volumes have grown continuously since the
earliest days of computing, but that growth has intensified
dramatically over the past decade as databases have been tasked
with accepting data feeds from customers, the public, point of sale
devices, GPS, mobile devices, RFID readers and so on.

Cloud computing also has placed new challenges on the database.
The economic vision for cloud computing is to provide computing
resources on demand with a "pay-as-you-go" model. A pool of
computing resources can exploit economies of scale and a levelling
of variable demand by adding or subtracting computing resources as
workload demand changes. The traditional RDBMS has been unable
to provide these types of elastic services.

The demands of big data and elastic provisioning call for a database
that can be distributed on large numbers of hosts spread out across
a widely dispersed network. While commercial relational databases
- such as Oracle's RAC - have taken steps to meet this challenge, it's
become apparent that some of the fundamental characteristics of
relational databases are incompatible with the elastic and big data
demands.

Ironically, the demand for NoSQL did not come about because of
problems with the SQL language. The demand is due to the cost of the
strong consistency and transactional integrity of relational
databases. In a transactional relational database, all users see an
identical view of data. In 2000, however, Eric Brewer outlined the now
famous CAP theorem, which states that both Consistency and high
Availability cannot be maintained when a database is Partitioned
across a fallible wide area network.

Google, Facebook, Amazon and other huge web sites, therefore,
developed non-relational databases that sacrificed consistency for
availability and scalability. It just so happened that these
databases didn't support the SQL language either, and, when a
group of developers organized a meeting in June 2009 to discuss
these non-relational databases, the term "NoSQL" seemed
convenient. Perhaps unfortunately, the term NoSQL caught on
beyond expectations, and now is used as shorthand for any
non-relational database.

Within the NoSQL zoo, there are several distinct family trees. Some
NoSQL databases are pure key-value stores without an explicit data
model, with many based on Amazon's Dynamo key-value store.
Others are heavily influenced by Google's BigTable database, which
supports Google products such as Google Maps and Google
Reader. Document databases store highly structured
self-describing objects, usually in JSON, a text format that plays a
role similar to XML. Finally, graph databases store complex
relationships such as those found in social networks.

Within these four NoSQL families are at least a dozen database
systems of significance. Some probably will disappear as the
NoSQL segment matures, and, right now, it's anyone's guess as to
which ones will win, and which will lose.

NoSQL is a fairly imprecise term - it defines what the databases are
not, rather than what they are, and rejects SQL rather than the more
relevant strict consistency of the relational model. As imprecise
as the term may be, however, there's no doubt that NoSQL
databases represent an important direction in database technology.

NoSQL databases can store relationship data — they just
store it differently than relational databases do. In fact, when
compared with relational databases, many find modeling
relationship data in NoSQL databases to be easier than in
relational databases, because related data doesn't have to be
split between tables.
The Value of Relational Databases
1. Getting at Persistent Data
Two areas of memory:
• Fast, small, volatile main memory
• Larger, slower, non-volatile backing store
Since main memory is volatile, to keep data around we write it to a
backing store, commonly a disk, which provides persistent storage.
The backing store can be:
• File system
• Database
The database allows more flexibility than a file system in
storing large amounts of data in a way that allows an application
program to get information quickly and easily.
2. Concurrency
• Enterprise applications tend to have many people using the same
data at once, possibly modifying that data. We have to worry about
coordinating interactions between them to avoid things like double
booking of hotel rooms.
• Since enterprise applications can have lots of users and other
systems all working concurrently, there’s a lot of room for bad
things to happen. Relational databases help to handle this by
controlling all access to their data through transactions.
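The double-booking problem above can be illustrated with a minimal
sketch. It uses Python's built-in sqlite3 module as a stand-in for an
enterprise relational database; the table layout, the book_room helper,
and the sample guests are assumptions made for illustration only.

```python
import sqlite3


def book_room(conn, room, night, guest):
    """Try to book a room for one night; return True on success."""
    try:
        # "with conn" wraps the statement in a transaction:
        # it commits on success and rolls back if an error is raised.
        with conn:
            conn.execute(
                "INSERT INTO bookings (room, night, guest) VALUES (?, ?, ?)",
                (room, night, guest),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already holds this (room, night) pair.
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per room per night.
conn.execute(
    "CREATE TABLE bookings (room INTEGER, night TEXT, guest TEXT, "
    "PRIMARY KEY (room, night))"
)

first = book_room(conn, 101, "2024-05-01", "alice")
second = book_room(conn, 101, "2024-05-01", "bob")
guests = conn.execute("SELECT guest FROM bookings").fetchall()
```

Here the database, not the application, rejects the second booking, so
the rule holds no matter how many programs write to the table.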
3. Integration
• Enterprises require multiple applications, written by different
teams, to collaborate in order to get things done. Applications often
need to use the same data and updates made through one application
have to be visible to others.
• A common way to do this is shared database integration where
multiple applications store their data in a single database.
• Using a single database allows all the applications to use each
others’ data easily, while the database’s concurrency control
handles multiple applications in the same way as it handles multiple
users in a single application.
4. A (Mostly) Standard Model
• Relational databases have succeeded because they provide the core
benefits in a (mostly) standard way.
• As a result, developers can learn the basic relational model and
apply it in many projects.
• Although there are differences between different relational
databases, the core mechanisms remain the same.
5. Impedance Mismatch
• For Application developers using relational databases, the biggest
frustration has been what’s commonly called the impedance mismatch:
the difference between the relational model and the in-memory data
structures.
• The relational data model organizes data into a structure of
tables and rows, or more properly, relations and tuples. A tuple is a
set of name-value pairs and a relation is a set of tuples.
• The values in a relational tuple have to be simple—they cannot
contain any structure, such as a nested record or a list. This
limitation isn’t true for in-memory data structures, which can take
on much richer structures than relations.
• So if you want to use a richer in-memory data structure, you have
to translate it to a relational representation to store it on disk.
Hence the impedance mismatch—two different representations that
require translation.
• In the 1990s, many people expected the impedance mismatch to lead
to relational databases being replaced with databases that replicate
the in-memory data structures to disk. That decade was marked by the
growth of object-oriented programming languages, and with them came
object-oriented databases, both looking to be the dominant
environment for software development in the new millennium. However,
while object-oriented languages succeeded in becoming the major force
in programming, object-oriented databases faded into obscurity.
• Impedance mismatch has been made much easier to deal with by the
wide availability of object relational mapping frameworks, such as
Hibernate and iBATIS that implement well-known mapping patterns, but
the mapping problem is still an issue.
• Relational databases continued to dominate the enterprise computing
world in the 2000s, but during that decade cracks began to open in
their dominance.
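The translation step behind the impedance mismatch can be sketched as
follows. This is only an illustration using Python's built-in sqlite3
module: the order structure, the two tables, and the save and load
helpers are hypothetical names, not part of any mapping framework.

```python
import sqlite3

# In memory, an order is one nested structure: a record holding a list.
order = {
    "id": 1,
    "customer": "Ann",
    "items": [
        {"product": "book", "qty": 2},
        {"product": "pen", "qty": 5},
    ],
}

conn = sqlite3.connect(":memory:")
# Relational tuples hold only simple values, so the list needs its own table.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER, product TEXT, qty INTEGER)"
)


def save(conn, order):
    """Flatten the nested order into rows of two tables."""
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"])
        )
        conn.executemany(
            "INSERT INTO order_items VALUES (?, ?, ?)",
            [(order["id"], i["product"], i["qty"]) for i in order["items"]],
        )


def load(conn, order_id):
    """Reassemble the nested order from the rows."""
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT product, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    items = [{"product": p, "qty": q} for p, q in rows]
    return {"id": oid, "customer": customer, "items": items}


save(conn, order)
restored = load(conn, 1)
```

The save and load functions are the hand-written equivalent of what an
object-relational mapping framework automates.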
6. Application and Integration Databases
• In relational databases, the database often acts as an integration
database, where multiple applications developed by separate teams
store their data in a common database. This improves communication
because all the applications are operating on a consistent set of
persistent data. There are downsides to shared database integration:
• A structure that’s designed to integrate many applications is more
complex than any single application needs.
• If an application wants to make changes to its data storage, it
needs to coordinate with all the other applications using the
database.
• Different applications have different structural and performance
needs, so an index required by one application may cause a
problematic hit on inserts for another.
A different approach is to treat your database as an application
database—which is only accessed by a single application codebase
that’s looked after by a single team.
Advantages:
• With an application database, only the team using the application
needs to know about the database structure, which makes it much
easier to maintain and evolve the schema.
• Since the application team controls both the database and the
application code, the responsibility for database integrity can be
put in the application code.

Introduction to NoSQL
NoSQL is a type of database management system (DBMS) that is
designed to handle and store large volumes of unstructured and semi-
structured data. Unlike traditional relational databases that use tables
with pre-defined schemas to store data, NoSQL databases use flexible
data models that can adapt to changes in data structures and are capable
of scaling horizontally to handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational”
databases, but the term has since evolved to mean “not only SQL,” as
NoSQL databases have expanded to include a wide range of different
database architectures and data models.

NoSQL databases are generally classified into four main categories:
1. Document databases: These databases store data as semi-structured
documents, such as JSON or XML, and can be queried using
document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs,
and are optimized for simple and fast read/write operations.
3. Column-family stores: These databases store data as column
families, which are sets of columns that are treated as a single entity.
They are optimized for fast and efficient querying of large amounts of
data.
4. Graph databases: These databases store data as nodes and edges,
and are designed to handle complex relationships between data.
NoSQL databases are often used in applications where there is a high
volume of data that needs to be processed and analyzed in real-time,
such as social media analytics, e-commerce, and gaming. They can also
be used for other applications, such as content management systems,
document management, and customer relationship management.
However, NoSQL databases may not be suitable for all applications, as
they may not provide the same level of data consistency and
transactional guarantees as traditional relational databases. It is
important to carefully evaluate the specific needs of an application when
choosing a database management system.
NoSQL, originally referring to “non-SQL” or “non-relational”, is a class
of databases that provides a mechanism for storage and retrieval of
data modeled by means other than the tabular relations used in
relational databases. Such databases have existed since the late
1960s, but did not obtain the NoSQL moniker until a surge of
popularity in the early twenty-first century. NoSQL databases are used
in real-time web applications and big data, and their use is
increasing over time.
• NoSQL systems are also sometimes called “Not only SQL” to
emphasize the fact that they may support SQL-like query languages.
Motivations for using a NoSQL database include simplicity of design,
simpler horizontal scaling to clusters of machines, and finer control
over availability. The data structures used by NoSQL databases are
different from those used by default in relational databases, which
makes some operations faster in NoSQL. The suitability of a given
NoSQL database depends on the problem it should solve.
• NoSQL databases, also known as “not only SQL” databases, are a
newer type of database management system that has gained
popularity in recent years. Unlike traditional relational databases,
NoSQL databases are designed to handle large amounts of
unstructured or semi-structured data, and they can accommodate
dynamic changes to the data model. This makes NoSQL databases a
good fit for modern web applications, real-time analytics, and big data
processing.
• Data structures used by NoSQL databases are sometimes also
viewed as more flexible than relational database tables. Many NoSQL
stores compromise consistency in favor of availability, speed and
partition tolerance. Barriers to the greater adoption of NoSQL stores
include the use of low-level query languages, lack of standardized
interfaces, and huge previous investments in existing relational
databases.
• Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation,
Durability) transactions, but a few databases, such as MarkLogic,
Aerospike, FairCom c-treeACE, Google Spanner (though technically a
NewSQL database), Symas LMDB, and OrientDB, have made them
central to their designs.
• Most NoSQL databases offer a concept of eventual consistency, in
which database changes are propagated to all nodes eventually, so
queries for data might not return updated data immediately or might
return data that is no longer accurate, a problem known as stale
reads. Some NoSQL systems may also exhibit lost writes and other
forms of data loss. Some NoSQL systems provide mechanisms such as
write-ahead logging to avoid data loss.
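A stale read under eventual consistency can be shown with a toy
simulation. This is a sketch under simplifying assumptions, not how any
particular NoSQL product works: the Replica and EventuallyConsistentStore
classes and the propagate step are invented for illustration.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on replica 0 first and reach the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica index, key, value) not yet applied

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        return self.replicas[replica].data.get(key)

    def propagate(self):
        """Deliver all queued changes, in order, to the other replicas."""
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock", 10)
store.propagate()           # every replica now agrees on 10
store.write("stock", 9)     # only replica 0 has seen this change so far
stale = store.read("stock", replica=2)
store.propagate()
fresh = store.read("stock", replica=2)
```

Between the second write and the second propagate call, replica 2 still
answers with the old value; once propagation completes, all replicas
converge.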
• One simple example of a NoSQL database is a document database.
In a document database, data is stored in documents rather than
tables. Each document can contain a different set of fields, making it
easy to accommodate changing data requirements.
• Take, for instance, a database that holds data regarding
employees. In a relational database, this information might
be stored in tables, with one table for employee information and
another table for department information. In a document database,
each employee would be stored as a separate document, with all of
their information contained within the document.
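The employee example can be sketched with plain Python dictionaries
standing in for documents. The field names, the sample employees, and
the find helper are assumptions for illustration; a real document
database would provide its own query interface.

```python
import json

# Each employee is one self-contained document. The department details
# are embedded instead of living in a separate table.
employees = [
    {"_id": 1, "name": "Asha", "department": {"name": "Sales", "floor": 2}},
    {
        "_id": 2,
        "name": "Ravi",
        "department": {"name": "IT", "floor": 3},
        "skills": ["python", "sql"],  # a field the first document lacks
    },
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


ravi = find(employees, name="Ravi")[0]
as_json = json.dumps(ravi, sort_keys=True)  # documents serialize directly
```

Note that the two documents carry different sets of fields, and no
schema change was needed to add "skills" to the second one.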
• NoSQL databases are a relatively new type of database management
system that has gained popularity in recent years due to its
scalability and flexibility. They are designed to handle large amounts
of unstructured or semi-structured data and can handle dynamic
changes to the data model. This makes NoSQL databases a good fit
for modern web applications, real-time analytics, and big data
processing.
Key Features of NoSQL :
1. Dynamic schema: NoSQL databases do not have a fixed schema
and can accommodate changing data structures without the need for
migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out
by adding more nodes to a database cluster, making them well-suited
for handling large amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use
a document-based data model, where data is stored in semi-
structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a
key-value data model, where data is stored as a collection of key-
value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a
column-family (wide-column) data model, where data is organized into
groups of columns and each row can hold its own set of columns.
6. Distributed and high availability: NoSQL databases are often
designed to be highly available and to automatically handle node
failures and data replication across multiple nodes in a database
cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve
data in a flexible and dynamic manner, with support for multiple data
types and changing data structures.
8. Performance: NoSQL databases are optimized for high performance
and can handle a high volume of reads and writes, making them
suitable for big data and real-time applications.
Advantages of NoSQL: There are many advantages of working with
NoSQL databases such as MongoDB and Cassandra. The main
advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal
scaling. Sharding means partitioning the data and placing the
partitions on multiple machines, in some schemes in such a way that
the order of the data is preserved. Vertical scaling means adding more
resources to the existing machine, whereas horizontal scaling means
adding more machines to handle the data. Vertical scaling is limited
by the capacity of a single machine, while horizontal scaling is what
NoSQL databases are designed for. Examples of horizontally scaling
databases are MongoDB, Cassandra, etc. NoSQL can handle a huge
amount of data because of this scalability: as the data grows, the
database scales out to handle that data in an efficient manner.
2. Flexibility: NoSQL databases are designed to handle unstructured or
semi-structured data, which means that they can accommodate
dynamic changes to the data model. This makes NoSQL databases a
good fit for applications that need to handle changing data
requirements.
3. High availability: The automatic replication feature in NoSQL
databases makes them highly available: data is copied to several
nodes, so in case of a failure another replica can serve the data.
4. Scalability: NoSQL databases are highly scalable, which means that
they can handle large amounts of data and traffic with ease. This
makes them a good fit for applications that need to handle large
amounts of data or traffic
5. Performance: NoSQL databases are designed to handle large
amounts of data and traffic, which means that they can offer improved
performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective
than traditional relational databases, as they are typically less complex
and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
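The sharding idea mentioned under high scalability can be sketched as
below. This sketch uses hash-based assignment, which spreads keys evenly
but does not preserve key order; a range-based scheme would be needed
for the order-preserving variant the text describes. The shard_for
function and ShardedStore class are hypothetical names, and each "shard"
is just an in-process dictionary standing in for a separate machine.

```python
import hashlib


def shard_for(key, n_shards):
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


class ShardedStore:
    """Spreads key-value pairs over several shards."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", i)

found = store.get("user:42")
sizes = [len(shard) for shard in store.shards]  # how the keys were spread
```

Adding machines in this simple scheme changes n_shards and therefore
moves most keys; production systems typically use techniques such as
consistent hashing to limit that movement.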
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization : There are many different types of NoSQL
databases, each with its own unique strengths and weaknesses. This
lack of standardization can make it difficult to choose the right
database for a specific application
2. Lack of ACID compliance: Many NoSQL databases are not fully
ACID-compliant, which means that they do not guarantee the
consistency, integrity, and durability of data to the same degree as
relational databases. This can be a drawback for applications that
require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a narrow focus: they are
mainly designed for storage and provide little other functionality.
Relational databases are a better choice than NoSQL in the field of
transaction management.
4. Open-source: Many NoSQL databases are open-source, and there is
no reliable standard for NoSQL yet. In other words, two database
systems are unlikely to behave the same way.
5. Lack of support for complex queries : NoSQL databases are not
designed to handle complex queries, which means that they are not a
good fit for applications that require complex data analysis or
reporting.
6. Lack of maturity : NoSQL databases are relatively new and lack the
maturity of traditional relational databases. This can make them less
reliable and less secure than traditional databases.
7. Management challenge : The purpose of big data tools is to make
the management of a large amount of data as simple as possible. But
it is not so easy. Data management in NoSQL is much more complex
than in a relational database. NoSQL, in particular, has a reputation
for being challenging to install and even more hectic to manage on a
daily basis.
8. GUI is not available: GUI tools to access the database are not
widely available in the market.
9. Backup: Backup is a weak point for some NoSQL databases. Taking
a backup that is consistent across all nodes of a distributed cluster
is harder than backing up a single relational server.
10. Large document size: Some database systems like MongoDB and
CouchDB store data in JSON or a JSON-like format. This means that
documents can be quite large, which matters for storage volume,
network bandwidth, and speed, and descriptive key names actually hurt
since they are repeated in every document and increase its size.
Types of NoSQL database: The types of NoSQL databases, and the names
of the database systems that fall into each category, are:
1. Graph databases: Examples – Amazon Neptune, Neo4j
2. Key-value store: Examples – Memcached, Redis, Coherence
3. Tabular: Examples – HBase, Bigtable, Accumulo
4. Document-based: Examples – MongoDB, CouchDB, Cloudant
When should NoSQL be used:
1. When a huge amount of data needs to be stored and retrieved.
2. The relationship between the data you store is not that important
3. The data changes over time and is not structured.
4. Support of Constraints and Joins is not required at the database level
5. The data is growing continuously and you need to scale the database
regularly to handle the data.
In conclusion, NoSQL databases offer several benefits over traditional
relational databases, such as scalability, flexibility, and cost-
effectiveness. However, they also have several drawbacks, such as a
lack of standardization, lack of ACID compliance, and lack of support for
complex queries. When choosing a database for a specific application, it
is important to weigh the benefits and drawbacks carefully to determine
the best fit.

Integration Databases in NoSQL


In NoSQL databases, integration databases refer to databases that
combine different types of NoSQL databases and/or traditional relational
databases to provide a comprehensive and flexible data storage solution.
This can help organizations to overcome some of the limitations of using
a single type of database and to take advantage of the strengths of
multiple database types.
Integration databases typically use a middleware layer to connect and
communicate between the different types of databases. The middleware
layer provides a uniform interface for applications to access data across
the different databases, which can simplify application development and
improve performance.
One of the benefits of integration databases is that they can help
organizations to use the best database type for each data storage
requirement. For example, some data may be best stored in a document
database, while other data may be best stored in a graph database. By
using an integration database, organizations can store all of their data in
a single place while taking advantage of the strengths of each database
type.
Another benefit of integration databases is that they can provide greater
scalability and reliability than using a single database. By using a
distributed database architecture, organizations can distribute their data
across multiple servers and data centers, which can improve
performance and provide greater resilience in the event of a hardware
failure or other issues.

Some popular systems used for integration with NoSQL include:

1. Apache Cassandra: A distributed database that is designed for
scalability and high availability. Its native data model is
column-family (wide-column).
2. Apache Hadoop: A distributed data processing framework that
supports a variety of data sources, including HBase, Cassandra, and
MongoDB.
3. Apache Kafka: A distributed streaming platform that can be used to
integrate multiple data sources and to stream data to multiple
destinations.
Overall, integration databases in NoSQL can provide a powerful
solution for organizations that need to store and manage large
volumes of data across multiple data sources. By using a middleware
layer to connect and communicate between different types of
databases, organizations can take advantage of the strengths of each
database type and provide a flexible and scalable solution for their
data storage needs.

Nowadays, an enormous amount of information is being generated every
second. This information comes in different forms: unstructured,
structured, and semi-structured data. The variety and volume of this
information cannot be managed by traditional databases alone.
Therefore, NoSQL systems have emerged as a new generation of database
systems.
NoSQL databases are better suited to handling heterogeneous data.
NoSQL data integration requires tools that can scale to accommodate a
large volume of information, whereas conventional SQL ETL tools
require complicated manual coding and can also disrupt the production
sources they read from.
A database that serves as a store for numerous applications is called
an integration database; data is thereby integrated across
applications. An integration database needs a schema that takes all
client applications into account. The resulting schema is either more
general, more complicated, or both.
Here is an example for a better understanding of the integration
database. Suppose the computation data of an organization is stored
in an Oracle database and client information is stored in Salesforce.
With the help of database integration processes, employees can get
the integrated data of the two systems in a single place. Some
organizations use website database integration to manage and bring
together information from different site pages. Database integration
is only viable when data from on-premise systems, legacy systems, and
cloud databases is consolidated, since each company uses different
software.
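The idea of bringing two systems together in a single place can be
sketched as follows. Nothing here talks to Oracle or Salesforce: the two
lists are hypothetical stand-ins for rows already exported from each
system, and the integrate function and field names are assumptions for
illustration.

```python
def integrate(billing_rows, crm_rows):
    """Merge records from two sources into one view keyed by client id."""
    merged = {}
    for row in crm_rows:
        merged[row["client_id"]] = {
            "client_id": row["client_id"],
            "name": row["name"],
            "invoices": [],
        }
    for row in billing_rows:
        # A client known only to the billing system still gets an entry.
        client = merged.setdefault(
            row["client_id"],
            {"client_id": row["client_id"], "name": None, "invoices": []},
        )
        client["invoices"].append(row["amount"])
    return merged


# Stand-in for rows read from the organization's Oracle database.
billing_rows = [
    {"client_id": 1, "amount": 120.0},
    {"client_id": 1, "amount": 80.0},
    {"client_id": 2, "amount": 40.0},
]
# Stand-in for client records exported from Salesforce.
crm_rows = [
    {"client_id": 1, "name": "Acme"},
    {"client_id": 2, "name": "Globex"},
]

view = integrate(billing_rows, crm_rows)
```

The merged view needs a shared key (client_id here), which reflects the
point above that an integration schema must account for every
participating application.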

Aggregate Data Model in NoSQL


NoSQL databases store data in formats other than the relational one.
They are used in nearly every industry nowadays. For people who
interact with data in databases, the aggregate data model helps in
that interaction.
Features of NoSQL Databases:
• Schema Agnostic: NoSQL databases do not require a specific
schema or storage structure, unlike a traditional RDBMS.
• Scalability: NoSQL databases scale horizontally; as data grows
rapidly, commodity hardware can be added while preserving
scalability.
• Performance: To increase the performance of a NoSQL system, one
can add further commodity servers, giving reliable and fast database
access with minimal overhead.
• High Availability: A traditional RDBMS relies on primary and
secondary nodes for fetching data, whereas some NoSQL databases use
a masterless architecture.
• Global Availability: As data is replicated among multiple servers and
clouds, it is accessible from anywhere, which minimizes latency.

Aggregate Data Models:

The term aggregate means a collection of related objects that we
treat as a unit. An aggregate is a collection of data that we interact
with as a unit. These units of data, or aggregates, form the
boundaries for ACID operations.
Example of Aggregate Data Model:
The example diagram has two aggregates:
• Customer and Order are the two aggregates, with a link between
them.
• The diamond shows how data fits into the aggregate structure.
• Customer contains a list of billing addresses.
• Payment also contains the billing address.
• The address appears three times and it is copied each time.
• This fits a domain where we don't want the shipping and billing
addresses of an order to change.
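The two aggregates described above can be sketched as nested Python
dictionaries. All field names and values are hypothetical and chosen
only to mirror the bullet points; the diagram itself may use different
ones.

```python
import copy

address = {"city": "Chicago"}

# Customer aggregate: the customer together with its billing addresses.
customer = {
    "id": 1,
    "name": "Martin",
    "billing_addresses": [copy.deepcopy(address)],
}

# Order aggregate: refers to the customer by id, but carries its own
# copies of the shipping address and of the payment's billing address.
order = {
    "id": 99,
    "customer_id": customer["id"],
    "items": [{"product": "book", "price": 32.45}],
    "shipping_address": copy.deepcopy(address),
    "payments": [
        {"card": "1000-1000", "billing_address": copy.deepcopy(address)}
    ],
}

# The customer moves later. Because the address was copied into the
# order, the order keeps the address it was placed with.
customer["billing_addresses"][0]["city"] = "Boston"
```

The address is stored three times, once in the customer and twice in the
order, which costs space but lets each aggregate be read or written as a
single unit.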
Consequences of Aggregate Orientation:
• Aggregation is not a logical data property; it is all about how the
data is being used by applications.
• An aggregate structure may help with some data interactions but be
an obstacle for others.
• It has an important consequence for transactions.
• NoSQL databases generally don't support ACID transactions that span
multiple aggregates, thus sacrificing consistency.
• Instead, aggregate-oriented databases support the atomic
manipulation of a single aggregate at a time.
Advantage:
• It can be used as a primary data source for online applications.
• Easy replication.
• No single point of failure.
• It provides fast performance and horizontal scalability.
• It can handle structured, semi-structured, and unstructured data with
equal effort.
Disadvantage:
• No standard rules.
• Limited query capabilities.
• Doesn't work well with relational data.
• Not so popular in the enterprise.
• When the volume of data increases, it is difficult to maintain unique
values.

Key-Value Data Model in NoSQL


A key-value data model or database is also referred to as a key-value
store. It is a non-relational type of database. It uses an associative
array as its basic data structure, in which each individual key is
linked with exactly one value in a collection. Keys are unique
identifiers for the values. A value can be any kind of entity. A
key-value database is a collection of such key-value pairs stored as
separate records, and it has no predefined structure.

How do key-value databases work?

A key-value database associates a value, which can be anything from a
simple string to a complicated entity, with a key that is used to
track the entity. As in many programming paradigms, a key-value
database resembles a map object, array, or dictionary, except that it
is stored persistently and managed by a DBMS.
The key-value store uses an efficient and compact index structure to
quickly and reliably find a value by its key. For example, Redis is a
key-value store used to track lists, maps, sets, and primitive types
(which are simple data structures) in a persistent database. By
supporting only a predetermined number of value types, Redis can
expose a very simple interface to query and manipulate them, and when
configured appropriately is capable of high throughput.
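The "persistent dictionary" idea can be sketched with Python's standard
shelve module, which stores key-value pairs on disk. This is only an
illustration of the put, get, and delete interface; the KeyValueStore
class is a hypothetical wrapper and is not how Redis or any other
product is implemented.

```python
import os
import shelve
import tempfile


class KeyValueStore:
    """A tiny persistent key-value store backed by a shelve file."""

    def __init__(self, path):
        self._db = shelve.open(path)

    def put(self, key, value):
        self._db[key] = value

    def get(self, key, default=None):
        return self._db.get(key, default)

    def delete(self, key):
        self._db.pop(key, None)

    def close(self):
        self._db.close()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "kv")

    store = KeyValueStore(path)
    # The value is an arbitrary entity; the store does not inspect it.
    store.put("session:42", {"user": "asha", "cart": ["book", "pen"]})
    store.close()

    # Reopen the store: the data survived because it was written to disk.
    store = KeyValueStore(path)
    session = store.get("session:42")
    store.delete("session:42")
    missing = store.get("session:42")
    store.close()
```

Every operation goes through the key, which is why such stores are fast
for key-based access and awkward for anything else.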

When to use a key-value database:

Here are a few situations in which you can use a key-value database:
• User session attributes in an online application such as finance or
gaming, which calls for real-time random data access.
• A caching mechanism for frequently accessed data, or any key-based
design.
• The application is built around queries that are based on keys.

Features:

• One of the simplest kinds of NoSQL data models.
• Key-value databases use simple functions for storing, getting, and
removing data.
• Key-value databases have no query language.
• Built-in redundancy makes this database more reliable.

Advantages:

• It is very easy to use. Due to the simplicity of the database, values
can be of any kind, or even of different kinds when required.
• Its response time is fast due to its simplicity, provided that the
surrounding environment is well built and optimized.
• Key-value store databases are scalable vertically as well as
horizontally.
• Built-in redundancy makes this database more reliable.

Disadvantages:

 As key-value databases have no query language, queries cannot be
ported from one database to a different database.
 The key-value store database is not sophisticated. You cannot query
the database without a key.
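The last limitation can be seen in a short sketch, using a plain Python dictionary as a stand-in for a key-value store. The sample records and the helper `find_by_field` are hypothetical and exist only for illustration.

```python
# A dictionary standing in for a key-value store of student records.
students = {
    "id:2": {"name": "Tanmay", "branch": "Computer"},
    "id:5": {"name": "Abhishek", "branch": "Electronics"},
    "id:7": {"name": "Samriddha", "branch": "IT"},
}

# Lookup by key: one direct access.
by_key = students["id:5"]


def find_by_field(store, field, wanted):
    """Scan every value, because the store has no index on value contents."""
    return [key for key, value in store.items() if value.get(field) == wanted]


# Lookup by value: the application has to visit every record itself.
by_value = find_by_field(students, "branch", "IT")
print(by_key["name"], by_value)
```

Finding a record by anything other than its key means the application must read every pair, which is the work a query language would otherwise do.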

Some examples of key-value databases:

Here are some popular key-value databases which are widely used:
 Couchbase: It permits SQL-style querying and searching for text.
 Amazon DynamoDB: A widely used key-value database that is trusted
by a large number of users. It can easily handle a large number of
requests every day and it also provides various security options.
 Riak: A distributed key-value database used to develop applications.
 Aerospike: An open-source, real-time database that works with
billions of transactions.
 Berkeley DB: It is a high-performance and open-source database
providing scalability.

Columnar Data Model of NoSQL


The Columnar Data Model of NoSQL is important. NoSQL databases are
different from SQL databases because they use a data model whose
structure differs from the row-and-column table model used with
relational database management systems (RDBMS). NoSQL databases use
a flexible schema model that is designed to scale horizontally across
many servers and to handle large volumes of data.

Columnar Data Model of NoSQL :

A relational database stores data in rows and also reads the data row by
row, whereas a column store is organized as a set of columns. So if
someone wants to run analytics on a small number of columns, they can
read those columns directly without consuming memory with unwanted
data. The values in a column are of the same type and so benefit from
more efficient compression, which makes reads faster.
Examples of Columnar Data Model: Cassandra and Apache HBase.

Working of Columnar Data Model:

In the columnar data model, information is organized into columns
instead of rows. These columns function much like tables do in relational
databases. This type of data model is also more flexible, because it is a
type of NoSQL database. The example below will help in understanding
the columnar data model:
Row-Oriented Table:

S.No. Name Course Branch ID

01. Tanmay B-Tech Computer 2

02. Abhishek B-Tech Electronics 5

03. Samriddha B-Tech IT 7

04. Aditi B-Tech E & TC 8


Column – Oriented Table:

S.No. Name ID

01. Tanmay 2

02. Abhishek 5

03. Samriddha 7

04. Aditi 8

S.No. Course ID

01. B-Tech 2

02. B-Tech 5

03. B-Tech 7

04. B-Tech 8

S.No. Branch ID

01. Computer 2

02. Electronics 5

03. IT 7

04. E & TC 8
Columnar Data Model uses the concept of keyspace, which is like a
schema in relational models.
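The two layouts in the tables above can be sketched in Python. This is only an in-memory illustration of the idea, assuming every record has the same fields. The helper name `to_columns` is hypothetical, and real column stores organize data on disk in more elaborate ways.

```python
# Row-oriented: one record per student, all fields stored together.
rows = [
    {"name": "Tanmay", "course": "B-Tech", "branch": "Computer", "id": 2},
    {"name": "Abhishek", "course": "B-Tech", "branch": "Electronics", "id": 5},
    {"name": "Samriddha", "course": "B-Tech", "branch": "IT", "id": 7},
    {"name": "Aditi", "course": "B-Tech", "branch": "E & TC", "id": 8},
]


def to_columns(records):
    """Pivot a list of row dictionaries into a dictionary of column lists."""
    result = {}
    for record in records:
        for field, value in record.items():
            result.setdefault(field, []).append(value)
    return result


# Column-oriented: one list per column; position i belongs to the same student.
columns = to_columns(rows)

# Reading one column touches only that column's values.
print(columns["id"])       # [2, 5, 7, 8]
print(sum(columns["id"]))  # 22
```

An analytic query over `id` never has to load the names or branches, which is the saving the column layout is designed for.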
Advantages of Columnar Data Model :

 Well structured: Since these data models are good at compression,
they are very structured and well organized in terms of storage.
 Flexibility: There is a large amount of flexibility, as the columns do
not have to look like each other, which means one can add new and
different columns without disrupting the whole database.
 Aggregation queries are fast: Most importantly, aggregation queries
are quite fast because the majority of the information they need is
stored in a single column. An example would be adding up the total
number of students enrolled in one year.
 Scalability: It can be spread across large clusters of machines, even
numbering in thousands.
 Load times: Since a row table can easily be loaded in a few seconds,
load times are excellent.
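To see why a column of same-typed values compresses well, here is a small sketch using run-length encoding. This is one common compression technique chosen for illustration. The notes do not say which scheme any particular database uses, and the function name `run_length_encode` is hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Two columns taken from the example tables above.
course = ["B-Tech", "B-Tech", "B-Tech", "B-Tech"]
branch = ["Computer", "Electronics", "IT", "E & TC"]

print(run_length_encode(course))       # [('B-Tech', 4)]
print(len(run_length_encode(branch)))  # 4
```

The `course` column shrinks to a single pair because every value repeats, while the `branch` column, with no repeats, gains nothing. Storing a column's values next to each other is what makes such repeats easy to exploit.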

Disadvantages of Columnar Data Model:

 Designing an indexing schema: Designing an effective, working
schema is difficult and very time-consuming.
 Suboptimal data loading: Incremental data loading is suboptimal and
should be avoided, but this might not be an issue for some users.
 Security vulnerabilities: If security is one of the priorities, note that
the columnar data model lacks inbuilt security features. In that case,
one must look into relational databases.
 Online Transaction Processing (OLTP): OLTP applications are also
not compatible with columnar data models because of the way data is
stored.

Applications of Columnar Data Model:

 The columnar data model is widely used in blogging platforms.
 It is used in content management systems like WordPress, Joomla,
etc.
 It is used in systems that maintain counters.
 It is used in systems that handle heavy write loads.
 It is used in services that have expiring usage.

Graph Based Data Model in NoSQL


The Graph-Based Data Model in NoSQL is a type of data model that
focuses on the relationships between data elements. As the name
suggests, each element is stored as a node, and the associations
between these elements are known as links. Associations are stored
directly, because they are first-class elements of the data model. These
data models give us a conceptual view of the data.
These data models are based on a topological network structure. Graph
theory uses terms like nodes, edges, and properties; here is what they
mean in the graph-based data model.
 Nodes: These are the instances of data that represent the objects
to be tracked.
 Edges: Edges represent the relationships between nodes.
 Properties: These represent information associated with nodes.
The image below shows nodes with their properties, connected by
relationships that are represented as edges.

Working of Graph Data Model :

In these data models, nodes that are connected to each other are
connected physically, and the physical connection between them is also
treated as a piece of data. Connecting data in this way makes it easy to
query a relationship. This data model reads the relationship directly from
storage instead of calculating the connection at query time. Like many
other NoSQL databases, these data models do not have a fixed schema,
which matters because it keeps the model flexible and easy to edit.
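The idea of storing relationships directly can be sketched as follows. This is a toy in-memory structure, not the storage format of any real graph database. The `Graph` class, its method names, and the relationship labels `ENROLLED_IN` and `KNOWS` are hypothetical.

```python
class Graph:
    """Toy graph: nodes carry properties, edges are stored with their source."""

    def __init__(self):
        self.nodes = {}     # node_id -> dictionary of properties
        self.outgoing = {}  # node_id -> list of (relationship, target_id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, relationship, target):
        # The edge is kept next to its source node, so following it later
        # is a direct read rather than a search through all the data.
        self.outgoing.setdefault(source, []).append((relationship, target))

    def neighbours(self, node_id, relationship):
        return [t for rel, t in self.outgoing.get(node_id, []) if rel == relationship]


g = Graph()
g.add_node("tanmay", kind="Student")
g.add_node("aditi", kind="Student")
g.add_node("nosql", kind="Course", title="NoSQL Databases")
g.add_edge("tanmay", "ENROLLED_IN", "nosql")
g.add_edge("aditi", "ENROLLED_IN", "nosql")
g.add_edge("tanmay", "KNOWS", "aditi")

print(g.neighbours("tanmay", "ENROLLED_IN"))  # ['nosql']
print(g.nodes["nosql"]["title"])              # NoSQL Databases
```

Because each node holds its own outgoing edges, answering "which course is Tanmay enrolled in?" reads the stored relationship instead of matching records from separate tables.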
Examples of Graph Data Models :
 JanusGraph: A scalable, open-source graph database system that is
very helpful in big data analytics. JanusGraph has features like:
    Storage: Many options are available for storing graph data, such
   as Cassandra.
    Support for transactions: It supports ACID (Atomicity,
   Consistency, Isolation, and Durability) transactions and can hold
   thousands of concurrent users.
    Searching options: Complex searching options are available as
   optional features.
 Neo4j: The name is sometimes expanded as Network Exploration and
Optimization 4 Java. As the name suggests, this graph database is
written in Java, with native graph storage and processing. Neo4j has
features like:
    Scalable: Scalable through data partitioning into pieces known as
   shards.
    Higher availability: Availability is very high due to continuous
   backups and rolling upgrades.
    Query language: Uses Cypher, a programmer-friendly graph query
   language.
 DGraph: An open-source distributed graph database system designed
for scalability. DGraph's main features are:
    Query language: It uses GraphQL, a query language designed for
   APIs.
    Open-source system: Support for many open standards.

Advantages of Graph Data Model :

 Structure: The structures are very agile and workable.
 Explicit representation: The portrayal of relationships between
entities is explicit.
 Real-time output results: Queries give real-time output results.

Disadvantages of Graph Data Model :

 No standard query language: Since the language depends on the
platform that is used, there is no single standard query language.
 Unsuitable for transactional systems: Graph databases are poorly
suited to transaction-based systems.
 Small user base: The user base is small, which makes it very difficult
to get support when running into a problem.

Applications of Graph Data Model:


 Graph data models are widely used in fraud detection.
 It is used in Digital asset management which provides a scalable
database model to keep track of digital assets.
 It is used in Network management which alerts a network
administrator about problems in a network.
 It is used in Context-aware services by giving traffic updates and
many more.
 It is used in Real-Time Recommendation Engines which provide a
better user experience.
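As an illustration of the recommendation use case, the sketch below suggests accounts by walking two steps along "follows" edges. The data and the function name `recommend` are hypothetical, and real recommendation engines use far richer signals than this.

```python
from collections import Counter

# Hypothetical "follows" edges: each user maps to the set of users they follow.
follows = {
    "tanmay": {"abhishek", "aditi"},
    "abhishek": {"samriddha", "aditi"},
    "aditi": {"samriddha"},
    "samriddha": set(),
}


def recommend(graph, user):
    """Rank accounts followed by the people `user` follows, skipping known ones."""
    counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                counts[candidate] += 1
    # Candidates reached through more friends come first.
    return [name for name, _ in counts.most_common()]


print(recommend(follows, "tanmay"))  # ['samriddha']
```

The whole query is two hops along stored edges, which is the kind of traversal a graph data model is built to answer quickly.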

