
DBMS Module 5 Part 2

The document provides an overview of NoSQL databases, highlighting their characteristics, types, and differences from traditional SQL databases. It explains the advantages and disadvantages of NoSQL, emphasizing its flexibility, scalability, and ability to handle unstructured data. Specific NoSQL databases like Redis and MongoDB are discussed, detailing their features, use cases, and operational mechanisms.

Uploaded by

me1301me

CST204 DBMS

MODULE 5 PART 3

Introduction to NoSQL Databases, Main characteristics of Key-value DB (examples from: Redis), Document DB (examples from: MongoDB)
Main characteristics of Column-Family DB (examples from: Cassandra) and Graph DB (examples from: ArangoDB)

What Is NoSQL?
Databases are one of the most important components of modern technology and applications. Data needs to be stored in a specific structure and format so that it can be retrieved whenever required. However, data is not always structured, i.e., its schema is not always rigid. This chapter covers NoSQL and its characteristic features in detail.

NoSQL can be defined as an approach to database design that accommodates a wide variety of data models, such as key-value, document, columnar, and graph formats, as well as multimedia and external files. NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications.

NoSQL is known for its rich functionality, ease of development, and performance at scale. Because it handles such diverse data, a NoSQL database is also called a non-relational database. It does not follow the rules of Relational Database Management Systems (RDBMS), and hence does not use traditional SQL statements to query data. Some well-known examples are MongoDB, Neo4J, and HyperGraphDB.

Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data. The system response time
becomes slow when you use RDBMS for massive volumes of data.

To resolve this problem, we could “scale up” our systems by upgrading our existing
hardware. This process is expensive.

The alternative for this issue is to distribute database load on multiple hosts whenever the
load increases. This method is known as “scaling out.”

A NoSQL database is non-relational and is designed with web applications in mind, so it usually scales out better than a relational database.
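As a rough illustration of "scaling out", the sketch below spreads keys over several hosts by hashing each key. The host names and the `pick_host` helper are made up for this example, and real NoSQL systems use more elaborate schemes; this only shows the basic idea of distributing load by key.

```python
import hashlib

def pick_host(key, hosts):
    """Map a key to one host using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# Illustrative host names, not taken from the notes.
hosts = ["db-host-0", "db-host-1", "db-host-2"]

# Every key is owned by exactly one host, so each host serves only part of the load.
placement = {key: pick_host(key, hosts)
             for key in ["user:1", "user:2", "user:3", "user:4"]}
```

With plain modulo hashing like this, adding a host moves many keys to a different host, which is one reason production systems prefer ring-based placement.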

Ambily Mohan, Dept of CSE, JBCMET



Features of NoSQL
Non-relational

• NoSQL databases do not follow the relational model
• They do not provide tables with flat, fixed-column records
• They work with self-contained aggregates or BLOBs
• They do not require object-relational mapping or data normalization
• They avoid complex features such as query languages, query planners, referential integrity, joins, and ACID

Schema-free

• NoSQL databases are either schema-free or have relaxed schemas
• They do not require any up-front definition of the schema of the data
• They allow heterogeneous data structures in the same domain

NoSQL is Schema-Free

Simple API

• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols are common, mostly HTTP REST with JSON
• Mostly no standards-based NoSQL query language is used
• Web-enabled databases can run as internet-facing services

Distributed

• Multiple NoSQL databases can be executed in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• ACID guarantees are often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; instead, asynchronous multi-master replication, peer-to-peer replication, or HDFS replication is used
• Often only eventual consistency is provided
• Shared-nothing architecture, which enables less coordination and higher distribution

NoSQL is Shared Nothing.

Types of NoSQL Databases


NoSQL databases usually fall under any one of these four categories:

1. Key-value stores: the most straightforward type, where every item in the database is stored as an attribute name (i.e., "key") along with its value.
2. Wide-column stores: store data together by column rather than by row, and are optimized for querying big datasets.
3. Document databases: pair every key with a composite data structure termed a document. Documents hold many different key-value pairs, key-array pairs, or even nested documents.
4. Graph databases: used for storing information about networks, such as social connections.


Difference Between NoSQL and SQL


Here is the list of comparisons between both the DBMS:

• SQL databases are mainly relational databases (RDBMS), whereas NoSQL databases are mostly non-relational or distributed databases.
• SQL databases are table-oriented, whereas NoSQL databases are document-oriented, key-value, wide-column, or graph stores.
• SQL databases have a predefined, rigid schema, whereas NoSQL databases have a dynamic or flexible schema that can handle unstructured data.
• SQL is used to store structured data, whereas NoSQL is used to store structured as well as unstructured data.
• SQL databases are vertically scalable: excess load is handled by increasing the CPU, SSD, RAM, GPU, etc., of a single server. NoSQL databases are horizontally scalable: load is handled by adding more database servers.
• Examples of SQL databases: MySQL, SQLite, Oracle, PostgreSQL, and MS-SQL. Examples of NoSQL databases: BigTable, MongoDB, Redis, Cassandra, RavenDB, HBase, CouchDB, and Neo4j.
• SQL databases are a good fit for query-intensive environments with complex queries, whereas NoSQL databases are less suited to complex queries; NoSQL query facilities are generally not as powerful as SQL.

RDBMS | NoSQL
Structured and organized database schema | Database schema is dynamic, unstructured, and flexible
Database schema needs to be defined beforehand | Database schema is dynamic and need not be defined beforehand
Table-based databases | NoSQL databases can be document-oriented, key-value, graph, or column-oriented
Vertically scalable | Horizontally scalable
Best fit for transaction-intensive applications | Generally does not support complex transactions
Emphasizes ACID properties (Atomicity, Consistency, Isolation, and Durability) | Follows Brewer's CAP theorem (Consistency, Availability, and Partition tolerance)
Uses Structured Query Language (SQL), with Data Manipulation Language (DML) and Data Definition Language (DDL) for defining and manipulating the data | Query language varies from database to database
Examples: MySQL, Oracle, SQLite, PostgreSQL, and MS-SQL | Examples: MongoDB, Redis, HBase, RavenDB, Cassandra, Neo4j, and CouchDB

Advantages of NoSQL

• Can be used as a primary or analytic data source
• Big data capability: handles data velocity, variety, volume, and complexity
• No single point of failure
• Easy replication
• No need for a separate caching layer
• Provides fast performance and horizontal scalability
• Can handle structured, semi-structured, and unstructured data equally well
• Supports object-oriented programming that is easy to use and flexible
• Does not need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• Can serve as the primary data source for online applications
• Excels at distributed database and multi-data-center operations
• Offers a flexible schema design that can be altered without downtime or service disruption

Disadvantages of NoSQL

• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Does not offer some traditional database capabilities, such as consistency when multiple transactions are performed simultaneously
• As the volume of data increases, it becomes difficult to maintain unique keys
• Does not work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with some enterprises


Web Resource: https://www.w3resource.com/redis/
Web Resource: https://www.w3schools.in/category/mongodb/
Web Resource: https://www.tutorialspoint.com/cassandra/cassandra_introduction.htm
Web Resource: https://www.tutorialspoint.com/arangodb/index.htm

Redis

Redis, developed in 2009, is a flexible, open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Following in the footsteps of other NoSQL databases, such as Cassandra, CouchDB, and MongoDB, Redis allows the user to store vast amounts of data without the limits of a relational database.
It supports various data structures such as strings, hashes, sets, lists, sorted sets, bitmaps, hyperloglogs, and geospatial indexes with radius queries.
Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
Redis can be compiled and used on Linux, OS X, OpenBSD, NetBSD, and FreeBSD. It supports big-endian and little-endian architectures, and both 32-bit and 64-bit systems.
Redis is maintained and developed by Salvatore Sanfilippo. In the past, Pieter Noordhuis and Matt Stancliff provided a very significant amount of code and ideas to both the Redis core and client libraries.

What does the name Redis mean?

Redis stands for REmote DIctionary Server.

What is Redis used for?

Redis is an advanced key-value store that can function as a NoSQL database or as a memory-cache store to improve performance when serving data that is stored in system memory.

How to interact with Redis?

Once installed in a server, run the Redis CLI (Command Line Interface) to issue
commands to Redis. While working on the CLI tool, your command-line prompt will
change to: redis>

Features:


• Speed : Redis holds the whole dataset in memory. Up to 110,000 SETs/second and 81,000 GETs/second can be achieved on an entry-level Linux box. Redis supports pipelining of commands, as well as getting and setting multiple values in a single command, to speed up communication with the client libraries.

• Persistence : While all the data lives in memory, changes are asynchronously saved to disk using flexible policies based on elapsed time and/or the number of updates since the last save. Redis also supports an append-only file persistence mode.

• Data Structures : It supports data structures such as strings, hashes, sets, lists, sorted sets with range queries, bitmaps, hyperloglogs, and geospatial indexes with radius queries.

• Atomic Operations : Redis operations working on the different data types are atomic, so setting or increasing a key, adding and removing elements from a set, and increasing a counter are all accomplished safely.

• Supported Languages : Many languages have Redis bindings, including: ActionScript, C, C++, C#, Clojure, Common Lisp, D, Dart, Erlang, Go, Haskell, Haxe, Io, Java, JavaScript (Node.js), Julia, Lua, Objective-C, Perl, PHP, Pure Data, Python, R, Racket, Ruby, Rust, Scala, Smalltalk, and Tcl.

• Master/Slave Replication : Redis supports very simple and fast master/slave replication. It is so simple that it takes only one line in the configuration file to set it up, and 21 seconds for a slave to complete the initial sync of a 10 million key set on an Amazon EC2 instance.

• Sharding : Distributing the dataset across multiple Redis instances is easy in Redis, as in any other key-value store. It depends mainly on the client libraries for each language being able to do so.

• Portable : Redis is written in ANSI C and works on most POSIX systems, such as Linux, BSD, Mac OS X, Solaris, and so on. Redis is reported to compile and work under WIN32 if compiled with Cygwin, but there is currently no official support for Windows.
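The persistence and atomic-counter ideas in the list above can be sketched with a toy in-memory store that logs every change to an append-only file and replays it on start-up. This is not Redis code or the Redis client API; `TinyStore`, its method names, and the log format are invented for illustration only.

```python
import json
import os
import tempfile

class TinyStore:
    """In-memory key-value store with an append-only log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged command.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                op, key, value = json.loads(line)
                self._apply(op, key, value)

    def _apply(self, op, key, value):
        if op == "set":
            self.data[key] = value
        elif op == "incr":
            self.data[key] = int(self.data.get(key, 0)) + value

    def _log(self, op, key, value):
        # Commands are only ever appended, never rewritten in place.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([op, key, value]) + "\n")

    def set(self, key, value):
        self._log("set", key, value)
        self._apply("set", key, value)

    def incr(self, key, amount=1):
        self._log("incr", key, amount)
        self._apply("incr", key, amount)
        return self.data[key]

    def get(self, key):
        return self.data.get(key)

log_file = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = TinyStore(log_file)
store.set("greeting", "hello")
store.incr("visits")
store.incr("visits")

# Simulates a restart: the state is rebuilt from the log alone.
recovered = TinyStore(log_file)
```

The sketch writes the log synchronously for simplicity, whereas the notes describe Redis saving changes asynchronously.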

Why is Redis different from other key-value stores?

• Redis follows a different evolution path among key-value DBs: values can contain more complex data types, with atomic operations defined on those data types. Redis data types are closely related to fundamental data structures and are exposed to the programmer as such, without additional abstraction layers.

• Redis is an in-memory but persistent-on-disk database, so it represents a different trade-off: very high write and read speed is achieved, with the limitation that data sets can't be larger than memory. Another advantage of in-memory databases is that the memory representation of complex data structures is much simpler to manipulate than the same data structure on disk, so Redis can do a lot with little internal complexity. At the same time, the two on-disk storage formats (RDB and AOF) don't need to be suitable for random access, so they are compact and always generated in an append-only fashion.

What happens if Redis runs out of memory?

Redis will either be killed by the Linux kernel OOM killer, crash with an error, or start to slow down. With modern operating systems, malloc() returning NULL is not common; usually the server will start swapping and Redis performance will degrade, so you'll probably notice there is something wrong.
The INFO command will report the amount of memory Redis is using so you can
write scripts that monitor your Redis servers checking for critical conditions.
Redis has built-in protections allowing the user to set a max limit to memory usage,
using the maxmemory option in the config file to put a limit to the memory Redis can
use. If this limit is reached Redis will start to reply with an error to write commands,
or you can configure it to evict keys when the max memory limit is reached in the
case you are using Redis for caching.
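The two behaviours described above, rejecting writes once a limit is reached or evicting keys when Redis is used as a cache, can be sketched as follows. This is a simplified model, not Redis itself: the limit is counted in keys rather than bytes, and `BoundedCache` and its `evict` flag are hypothetical names.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy store with a key limit; evicts the least recently used key or rejects writes."""

    def __init__(self, max_keys, evict=True):
        self.max_keys = max_keys
        self.evict = evict
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key not in self.items and len(self.items) >= self.max_keys:
            if not self.evict:
                # Models "reply with an error to write commands".
                raise MemoryError("limit reached: write rejected")
            self.items.popitem(last=False)  # drop the least recently used key
        self.items[key] = value
        self.items.move_to_end(key)

cache = BoundedCache(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.set("c", 3)  # limit reached, so "b" is evicted
```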
You can easily build complex systems on top of Redis. Here is a sample list:

• User-defined indexing schemes
• Message queues with real-time new element notification
• Directed and undirected graph stores for following or friending systems
• Real-time publish/subscribe notification systems
• Real-time analytics backends
• Bloom filter servers
• Task queues and job systems
• High score leaderboards
• User ranking systems
• Hierarchical/tree-structured storage systems
• Individual personalized news or data feeds for your users

MongoDB
MongoDB can be defined as a document-oriented database system that follows the NoSQL approach. It provides high availability and high performance, along with automatic scaling. The product was developed by the company 10gen (now MongoDB, Inc.) starting in October 2007, and the company also maintains it. MongoDB is available as a free database management tool (originally under the GNU AGPL, and under the Server Side Public License since 2018) as well as under a commercial license from the vendor. MongoDB was also intended to run on commodity servers. Companies of different sizes, all over the world and across all industries, use MongoDB as their database.
In MongoDB, a database can be defined as a physical container for collections of data. Every database has its own set of files on the file system. Usually, a MongoDB server contains numerous databases.
Collections can be defined as a group of MongoDB documents that exist within a single database. A collection is comparable to a table in a relational database management system. MongoDB collections do not enforce a schema, so documents within a collection can contain different fields. Typically, all the documents residing within a collection are meant for a comparable or related purpose.
A document can be defined as a set of key-value pairs with a dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and a common field can hold different types of data.
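To make the idea of a dynamic schema concrete, the sketch below models a collection as a plain Python list of dictionaries. It does not use a MongoDB driver; the `_id` field mirrors MongoDB's default key, while the other field names and values are made up for illustration.

```python
# One "collection": each document is a set of key-value pairs.
collection = [
    {"_id": 1, "name": "Asha", "contact": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "contact": ["555-0100", "555-0101"]},
    {"_id": 3, "name": "Mina", "address": {"city": "Kochi"}},
]

# Documents in the same collection need not share the same fields.
field_sets = [set(doc) for doc in collection]

# A common field ("contact") can hold different types in different documents.
contact_types = sorted(type(doc["contact"]).__name__
                       for doc in collection if "contact" in doc)
```

Here the first document stores `contact` as a string, the second stores it as a list, and the third omits it and carries a nested document instead.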
The terminologies used in RDBMS and MongoDB
RDBMS | MongoDB
Database | Database
Table | Collection
Tuple or Row | Document
Column | Field
Table Join | Embedded Documents
Primary Key | Primary key / Default key
Mysqld / Oracle | mongod

Here is a list of some popular and multinational companies and organizations that are
using MongoDB as their official database to perform and manage different business
applications.
• Adobe
• McAfee
• LinkedIn
• FourSquare
• MetLife
• eBay
• SAP
Where Is MongoDB Used?
Beginners need to know why MongoDB is used and what need it fills in contrast to SQL and other database systems. In simple words, many modern applications involve big data, analysis of different forms of data, fast feature development, and flexible deployment, which older database systems are not always well equipped to handle. Hence, MongoDB is often the next choice.

Why Use MongoDB?


This NoSQL database supports some basic requirements that are lacking in other database systems. These collective reasons make MongoDB popular among database systems:
• Document-oriented data storage, i.e., data is stored in a JSON-style format, which also increases the readability of the data.
• Replication and high availability of data.
• MongoDB provides auto-sharding.
• MongoDB supports ad hoc queries, which help in searching by field, by range, or by regex terms.
• Indexes can be created to improve overall search performance. MongoDB allows any field within a document to be indexed.
• MongoDB has a rich collection of queries.
• Updating of data can be done at a faster pace.
• It can be integrated with popular programming languages to handle structured as well as unstructured data within various types of applications.
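The ad hoc query and indexing points above can be illustrated without a database by filtering a list of documents. This is a plain-Python sketch, not MongoDB query syntax; the `find` helper, the `products` data, and the dictionary-based index are hypothetical.

```python
import re

products = [
    {"_id": 1, "name": "red mug", "price": 120},
    {"_id": 2, "name": "blue mug", "price": 250},
    {"_id": 3, "name": "red lamp", "price": 900},
]

def find(docs, predicate):
    """Return every document for which the predicate holds (full scan)."""
    return [doc for doc in docs if predicate(doc)]

# Range query on a field.
by_range = find(products, lambda d: 100 <= d["price"] <= 300)

# Regex query on a field.
by_regex = find(products, lambda d: re.search(r"^red", d["name"]) is not None)

# A simple single-field index: field value -> matching documents.
# A lookup through the index avoids scanning every document.
price_index = {}
for doc in products:
    price_index.setdefault(doc["price"], []).append(doc)
by_index = price_index.get(250, [])
```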
Advantages of Using MongoDB
• It is easy to set up, i.e., to install MongoDB.
• Since MongoDB is a schema-less database, there is no hassle of schema migration.
• Since it is a document-oriented database, document queries are used, which play a vital role in supporting dynamic queries.
• Easily scalable.
• Performance tuning is easy compared to relational databases.
• It provides fast access to data because it uses internal memory to hold the data being worked on.
• MongoDB can also be used as a file system, which can help in easy management of load balancing.
• MongoDB also supports searching by regex (regular expression) as well as by fields.
• Users can also run MongoDB as a Windows service.
• It does not require any VM to run on different platforms.
• It also supports sharding of data.

What are the various areas where MongoDB can be used?


MongoDB can be used to support content management systems, online and offline gaming applications, e-commerce systems, mobile applications, data analytics, archiving, and logging.

Cassandra
Apache Cassandra is an open-source, distributed, and decentralized storage system (database) for managing very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure.
Listed below are some of the notable points of Apache Cassandra −
• It is scalable, fault-tolerant, and consistent.
• It is a column-oriented database.
• Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
• Created at Facebook, it differs sharply from relational database management systems.
• Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.
• Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.
Features of Cassandra
• Elastic scalability − Cassandra is highly scalable; it allows more hardware to be added to accommodate more customers and more data as required.
• Always-on architecture − Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance − Cassandra is linearly scalable, i.e., throughput increases as you increase the number of nodes in the cluster. It therefore maintains a quick response time.
• Flexible data storage − Cassandra accommodates all possible data formats, including structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your needs.
• Easy data distribution − Cassandra provides the flexibility to distribute data where you need it by replicating data across multiple data centers.
• Transaction support − Cassandra supports properties like Atomicity, Consistency, Isolation, and Durability (ACID), although atomicity and isolation are provided at the level of a single row or partition and consistency is tunable, rather than in the full multi-statement sense of an RDBMS.
• Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.


The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra has a peer-to-peer distributed architecture across its nodes, and data is distributed among all the nodes in a cluster.
• All the nodes in a cluster play the same role. Each node is independent and at the
same time interconnected to other nodes.
• Each node in a cluster can accept read and write requests, regardless of where the data
is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in the
network.
Data Replication in Cassandra
In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of
data. If it is detected that some of the nodes responded with an out-of-date value, Cassandra
will return the most recent value to the client. After returning the most recent value, Cassandra
performs a read repair in the background to update the stale values.
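The read-repair behaviour just described can be sketched with a few replicas that each hold a value and a timestamp. This is a toy model, not Cassandra code: the replica layout and the `read_with_repair` helper are assumptions, and the repair runs inline here rather than in the background.

```python
# Three replicas of the same piece of data; the middle one is out of date.
replicas = [
    {"value": "v2", "timestamp": 20},
    {"value": "v1", "timestamp": 10},
    {"value": "v2", "timestamp": 20},
]

def read_with_repair(copies):
    """Return the most recent value and bring stale replicas up to date."""
    newest = max(copies, key=lambda c: c["timestamp"])
    result = newest["value"]
    for copy in copies:
        if copy["timestamp"] < newest["timestamp"]:
            copy.update(newest)  # overwrite the stale value and timestamp
    return result

result = read_with_repair(replicas)
```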
The following figure shows a schematic view of how Cassandra uses data replication among
the nodes in a cluster to ensure no single point of failure.

Note − Cassandra uses the Gossip Protocol in the background to allow the nodes to
communicate with each other and detect any faulty nodes in the cluster.
Components of Cassandra
The key components of Cassandra are as follows −
• Node − The place where data is stored.
• Data center − A collection of related nodes.
• Cluster − A component that contains one or more data centers.
• Commit log − The crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
• Mem-table − A memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
• SSTable − A disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
• Bloom filter − A quick, probabilistic algorithm for testing whether an element is a member of a set. It acts as a special kind of cache. Bloom filters are accessed on every query.
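A Bloom filter, the last component in the list above, can be sketched in a few lines. This is a generic textbook version, not Cassandra's implementation; the bit-array size, the number of hashes, and the use of SHA-256 are arbitrary choices made for the example.

```python
import hashlib

class BloomFilter:
    """Set-membership test that may report false positives but never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive several bit positions from one item by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

keys = ["row-1", "row-2", "row-3"]
bloom = BloomFilter()
for key in keys:
    bloom.add(key)
```

A negative answer lets a read skip a data file without touching the disk, which is why the filter behaves like a cache.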
Cassandra Query Language
Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers work with CQL either through cqlsh, an interactive prompt, or through separate application language drivers.
Clients can approach any of the nodes for their read-write operations. That node (the coordinator) acts as a proxy between the client and the nodes holding the data.
Write Operations
Every write activity of a node is captured by the commit log written on that node. The data is then captured and stored in the mem-table. Whenever the mem-table is full, data is written into the SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom filter
to find the appropriate SSTable that holds the required data.
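The write and read paths described above can be sketched for a single node as follows. This is a simplified model and not Cassandra code: `TinyNode` is a hypothetical class, SSTables are plain dictionaries held in memory, and a direct key lookup stands in for the Bloom filter check.

```python
class TinyNode:
    """Toy single-node write/read path: commit log, mem-table, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # every write, kept for crash recovery
        self.memtable = {}     # in-memory table
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write
        self.memtable[key] = value            # 2. update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the threshold is reached

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable.clear()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:                   # stands in for the Bloom filter check
                return table[key]
        return None

    def compact(self):
        # Merge all SSTables, keeping only the newest value per key.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged]

node = TinyNode(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    node.write(key, value)
```

After these five writes the node holds two flushed SSTables and one entry still in the mem-table; calling `node.compact()` merges the SSTables and drops the overwritten value of "a".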
How Cassandra stores its data.
Cluster: A Cassandra database is distributed over several machines that operate together. The outermost container is known as the cluster. For failure handling, every node contains a replica, and in case of a failure, the replica takes charge. Cassandra arranges the nodes in a cluster in a ring format and assigns data to them.
Keyspace: The keyspace is the outermost container for data in Cassandra. The basic attributes of a keyspace in Cassandra are −
• Replication factor − The number of machines in the cluster that will receive copies of the same data.
• Replica placement strategy − The strategy used to place replicas in the ring. The strategies are simple strategy (rack-unaware strategy), old network topology strategy (rack-aware strategy), and network topology strategy (datacenter-shard strategy).
• Column families − A keyspace is a container for a list of one or more column families. A column family, in turn, is a container of a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.
The syntax of creating a Keyspace is as follows −
CREATE KEYSPACE keyspace_name
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
The following illustration shows a schematic view of a Keyspace.
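How a replication factor and a simple placement strategy interact with the ring can be sketched as below. This is an illustrative model only: node names are invented, MD5 stands in for Cassandra's partitioner, and `replicas_for` is a hypothetical helper that walks the ring clockwise from the key's position.

```python
import bisect
import hashlib

def token(value):
    """Position of a node or key on the ring (MD5 used as a stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def replicas_for(key, nodes, replication_factor):
    """Pick the first node at or after the key's token, then the next ones clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token(key)) % len(ring)  # wrap around the ring
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas_for("user:42", nodes, replication_factor=3)
```

With a replication factor of 3, each key is stored on three distinct nodes, so losing one node still leaves two copies.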


A column family is a container for an ordered collection of rows. Each row, in turn, is
an ordered collection of columns. The following table lists the points that differentiate a
column family from a table of relational databases.

Relational Table | Cassandra Column Family
A schema in a relational model is fixed. Once we define certain columns for a table, every row inserted must fill all the columns, at least with a null value. | In Cassandra, although the column families are defined, the columns are not. You can freely add any column to any column family at any time.
Relational tables define only columns, and the user fills in the table with values. | In Cassandra, a table contains columns, or can be defined as a super column family.

A Cassandra column family has the following attributes −


• keys_cached − The number of locations to keep cached per SSTable.
• rows_cached − The number of rows whose entire contents will be cached in memory.
• preload_row_cache − Specifies whether you want to pre-populate the row cache.
Note − Unlike a relational table, whose schema is fixed, a column family’s schema is not fixed: Cassandra does not force individual rows to have all the columns.
The following figure shows an example of a Cassandra column family.


A column is the basic data structure of Cassandra with three values, namely key or
column name, value, and a time stamp. Given below is the structure of a column.

A super column is a special column; therefore, it is also a key-value pair. But a super column stores a map of sub-columns.
Generally, column families are stored on disk in individual files. Therefore, to optimize performance, it is important to keep columns that you are likely to query together in the same column family, and a super column can be helpful here. Given below is the structure of a super column.
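The column and super column structures can be sketched as small Python classes. The class and field names follow the description above, but the rule that a newer timestamp replaces an older value is an assumption added for this example, and none of this is Cassandra's own code.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """A column: name (key), value, and timestamp."""
    name: str
    value: str
    timestamp: int

@dataclass
class SuperColumn:
    """A super column: a name mapped to a set of sub-columns."""
    name: str
    sub_columns: dict = field(default_factory=dict)

    def put(self, column):
        current = self.sub_columns.get(column.name)
        # Assumption for this sketch: the write with the newest timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            self.sub_columns[column.name] = column

address = SuperColumn("address")
address.put(Column("city", "Kochi", timestamp=1))
address.put(Column("pin", "682001", timestamp=1))
address.put(Column("city", "Kollam", timestamp=2))   # newer write replaces "Kochi"
address.put(Column("city", "Old town", timestamp=0)) # older write is ignored
```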

Data Models of Cassandra and RDBMS

RDBMS | Cassandra
RDBMS deals with structured data. | Cassandra deals with unstructured data.
It has a fixed schema. | Cassandra has a flexible schema.
In RDBMS, a table is an array of arrays (ROW x COLUMN). | In Cassandra, a table is a list of “nested key-value pairs” (ROW x COLUMN key x COLUMN value).
Database is the outermost container that contains data corresponding to an application. | Keyspace is the outermost container that contains data corresponding to an application.
Tables are the entities of a database. | Tables or column families are the entities of a keyspace.
Row is an individual record in RDBMS. | Row is a unit of replication in Cassandra.
Column represents the attributes of a relation. | Column is a unit of storage in Cassandra.
RDBMS supports the concepts of foreign keys and joins. | Relationships are represented using collections.

ArangoDB
ArangoDB is described by its developers as a native multi-model database, which sets it apart from other NoSQL databases. In this database, data can be stored as documents, key/value pairs, or graphs, and any or all of the data can be accessed with a single declarative query language. Moreover, different models can be combined in a single query. Owing to this multi-model style, one can build lean applications that scale horizontally with any or all of the three data models.
Layered vs. Native Multi-Model Databases
Many database vendors call their product “multi-model,” but adding a graph layer to a key/value or document store does not qualify as native multi-model.
With ArangoDB, which has a single core and a single query language, one can combine different data models and features in a single query. In ArangoDB, there is no “switching” between data models, and there is no shifting of data from A to B to execute queries. This gives ArangoDB performance advantages over the “layered” approaches.
Features of ArangoDB
• Multi-model Paradigm
• ACID Properties

Ambily Mohan, Dept of CSE, JBCMET


CST204 DBMS

• HTTP API
ArangoDB supports all popular database models. Following are a few models supported by
ArangoDB −
• Document model
• Key/Value model
• Graph model
A single query language is enough to retrieve data stored in any of these models.
The four properties Atomicity, Consistency, Isolation, and Durability (ACID) describe
the guarantees of database transactions. ArangoDB supports ACID-compliant transactions.
ArangoDB allows clients, such as browsers, to interact with the database through an
HTTP API, which is resource-oriented and can be extended with JavaScript.
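As a rough illustration of the HTTP API, the sketch below builds (but does not send) a request that submits an AQL query to the server's cursor endpoint. The server address (a local instance on port 8529), the _system database, the "users" collection and the helper name build_cursor_request are assumptions made for this example; authentication is left out.

```python
import json
import urllib.request

# Assumptions for illustration: a local server on port 8529,
# the _system database, and a hypothetical "users" collection.
BASE_URL = "http://localhost:8529"
DATABASE = "_system"


def build_cursor_request(query, bind_vars):
    """Build (but do not send) a POST /_api/cursor request that runs an AQL query."""
    body = json.dumps({"query": query, "bindVars": bind_vars}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/_db/{DATABASE}/_api/cursor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_cursor_request(
    "FOR u IN users FILTER u.city == @city RETURN u.name",
    {"city": "Kochi"},
)

# Sending the request needs a running server and credentials, for example:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```

Because the interface is plain HTTP with JSON bodies, any client that can issue HTTP requests, including a browser, can talk to the database in the same way.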
Advantages of using ArangoDB
• Consolidation : As a native multi-model database, ArangoDB eliminates the need to
deploy multiple databases, and thus decreases the number of components and their
maintenance. Consequently, it reduces the technology-stack complexity for the
application. In addition to consolidating your overall technical needs, this
simplification leads to lower total cost of ownership and increased flexibility.
• Simplified Performance Scaling : As applications grow over time, ArangoDB can
meet growing performance and storage needs by scaling independently with
different data models. Because ArangoDB can scale both vertically and horizontally,
the back-end system can also be scaled down easily when your performance demands
decrease (a deliberate, desired slow-down), saving on hardware as well as
operational costs.
• Reduced Operational Complexity: The idea behind Polyglot Persistence is to employ
the best tool for every job you undertake. Certain tasks need a document database,
while others may need a graph database. Working with several single-model
databases, however, leads to multiple operational challenges. Integrating single-model
databases is a difficult job in itself. But the biggest challenge is building a large
cohesive structure with data consistency and fault tolerance between separate,
unrelated database systems. It may prove nearly impossible.
Polyglot Persistence can be handled with a native multi-model database, which makes it
easy to store polyglot data while keeping data consistency on a fault-tolerant
system. With ArangoDB, we can use the correct data model for each complex job.
• Strong Data Consistency : If one uses multiple single-model databases, data
consistency can become an issue. These databases aren’t designed to communicate
with each other, so some form of transaction functionality needs to be
implemented to keep your data consistent between different models.
Supporting ACID transactions, ArangoDB manages your different data models with a
single back-end, providing strong consistency on a single instance, and atomic
operations when operating in cluster mode.
• Fault Tolerance : It is a challenge to build fault-tolerant systems with many unrelated
components. This challenge becomes more complex when working with clusters.
Expertise is required to deploy and maintain such systems, using different technologies
and/or technology stacks. Moreover, integrating multiple subsystems that were designed
to run independently incurs large engineering and operational costs.

As a consolidated technology stack, a multi-model database presents an elegant solution.
Designed to enable modern, modular architectures with different data models,
ArangoDB works for cluster usage as well.
• Lower Total Cost of Ownership: Each database technology requires ongoing
maintenance, bug-fix patches, and other code changes which are provided by the
vendor. Embracing a multi-model database significantly reduces the related
maintenance costs simply by reducing the number of database technologies used in
designing an application.
• Transactions : Providing transactional guarantees across multiple machines is a
real challenge, and few NoSQL databases give these guarantees. Being native multi-
model, ArangoDB provides transactions to guarantee data consistency.
Terminology used in ArangoDB
• Document
• Collection
• Collection Identifier
• Collection Name
• Database
• Database Name
• Database Organization
Documents are grouped into collections, and collections exist inside databases.
Identifier and Name are attributes of the collection and the database.
Usually, two documents (vertices) stored in document collections are linked by a document
(edge) stored in an edge collection. This is ArangoDB's graph data model. It follows the
mathematical concept of a directed, labeled graph, except that edges don't just have labels, but
are full-blown documents.
In this model, there exist two types of collections: document collections and edge
collections. Edge collections also store documents, but each of these documents includes
two special attributes: the _from attribute and the _to attribute. These attributes create
the edges (relations) between documents, which are essential for a graph database.
Document collections are also called vertex collections in the context of graphs.
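The graph data model described above can be sketched with plain Python data structures. This is a toy in-memory model, not ArangoDB client code; the collection names (persons, courses, enrolled_in), the sample documents and the helper outbound_neighbors are hypothetical.

```python
# Two document (vertex) collections; each document's _id is "collection/key".
persons = {
    "persons/asha": {"_id": "persons/asha", "name": "Asha"},
    "persons/ravi": {"_id": "persons/ravi", "name": "Ravi"},
}
courses = {
    "courses/cst204": {"_id": "courses/cst204", "title": "DBMS"},
}

# One edge collection: every edge is a full document with _from and _to,
# plus any extra attributes (here "grade").
enrolled_in = [
    {"_from": "persons/asha", "_to": "courses/cst204", "grade": "A"},
    {"_from": "persons/ravi", "_to": "courses/cst204", "grade": "B"},
]


def outbound_neighbors(start_id, edges, vertices):
    """Follow edges whose _from equals start_id and return the target documents."""
    return [
        vertices[edge["_to"]]
        for edge in edges
        if edge["_from"] == start_id and edge["_to"] in vertices
    ]
```

Since edges are directed, following them from "persons/asha" reaches the course document, while starting from "courses/cst204" in the outbound direction reaches nothing.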

Ambily Mohan, Dept of CSE, JBCMET
