BDA Unit-5

NoSQL databases are designed to manage large volumes of unstructured and semi-structured data, offering flexibility in data models and horizontal scalability. They are categorized into four main types: document databases, key-value stores, column-family stores, and graph databases, each suited for different applications. While NoSQL provides advantages like high scalability and performance, it also has drawbacks such as lack of standardization and ACID compliance, making careful evaluation necessary when selecting a database system.

Uploaded by

edla.laxman


UNIT V: Introduction to NoSQL

NoSQL is a type of database management system (DBMS) that is designed to handle and
store large volumes of unstructured and semi-structured data. Unlike traditional relational
databases that use tables with pre-defined schemas to store data, NoSQL databases use
flexible data models that can adapt to changes in data structures and are capable of scaling
horizontally to handle growing amounts of data.

The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term
has since evolved to mean “not only SQL,” as NoSQL databases have expanded to include a
wide range of different database architectures and data models.

NoSQL databases are generally classified into four main categories:

1. Document databases: These databases store data as semi-structured documents, such as
JSON or XML, and can be queried using document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs, and are optimized for
simple and fast read/write operations.
3. Column-family stores: These databases store data as column families, which are sets of
columns that are treated as a single entity. They are optimized for fast and efficient
querying of large amounts of data.
4. Graph databases: These databases store data as nodes and edges, and are designed to
handle complex relationships between data.
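
The four categories above can be made concrete with a small sketch. The following Python snippet is only an illustration, not how any particular database stores data: it expresses one made-up customer record in each of the four models using plain dictionaries and lists. All identifiers (such as cust-1 and the KNOWS edge label) are hypothetical.

```python
# The same customer record expressed in each of the four NoSQL data models.
# All names and values are illustrative, not taken from a real system.

# 1. Document model: one self-describing, possibly nested document.
document = {
    "_id": "cust-1",
    "name": "Alice",
    "address": {"state": "CA", "zipcode": "12345"},
}

# 2. Key-value model: an opaque value looked up by a single key.
key_value = {"cust-1:name": "Alice", "cust-1:state": "CA"}

# 3. Column-family model: row key -> column family -> column -> value.
column_family = {
    "cust-1": {
        "account": {"name": "Alice"},
        "address": {"state": "CA", "zipcode": "12345"},
    }
}

# 4. Graph model: nodes plus labelled edges between them.
nodes = {"cust-1": {"name": "Alice"}, "cust-2": {"name": "Bob"}}
edges = [("cust-1", "KNOWS", "cust-2")]


def neighbours(node_id, label):
    """Return the ids reachable from node_id over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


print(document["address"]["state"])
print(key_value["cust-1:state"])
print(column_family["cust-1"]["address"]["state"])
print(neighbours("cust-1", "KNOWS"))
```

The point of the sketch is the shape of the access path: a nested lookup for documents, a single key for key-value stores, row key plus column family plus column for column-family stores, and edge traversal for graphs.
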
NoSQL databases are often used in applications where there is a high volume of data that
needs to be processed and analyzed in real-time, such as social media analytics, e-commerce,
and gaming. They can also be used for other applications, such as content management
systems, document management, and customer relationship management.

However, NoSQL databases may not be suitable for all applications, as they may not provide
the same level of data consistency and transactional guarantees as traditional relational
databases. It is important to carefully evaluate the specific needs of an application when
choosing a database management system.

NoSQL, which originally meant “non-SQL” or “non-relational”, refers to databases that provide a
mechanism for storing and retrieving data modeled by means other than the
tabular relations used in relational databases. Such databases have existed since the late
1960s, but did not acquire the NoSQL name until a surge of popularity in the early
twenty-first century. NoSQL databases are used in real-time web applications and big data, and their
use is increasing over time.
• NoSQL systems are also sometimes called “Not only SQL” to emphasize the fact that they
may support SQL-like query languages. The appeal of a NoSQL database includes simplicity of design,
simpler horizontal scaling to clusters of machines, and finer control over availability. The
data structures used by NoSQL databases differ from those used by default in
relational databases, which makes some operations faster in NoSQL. The suitability of a
given NoSQL database depends on the problem it is meant to solve.
• NoSQL databases, also known as “not only SQL” databases, are a type of database
management system that has gained popularity in recent years. Unlike traditional
relational databases, NoSQL databases are designed to handle large amounts of
unstructured or semi-structured data, and they can accommodate dynamic changes to the
data model. This makes NoSQL databases a good fit for modern web applications,
real-time analytics, and big data processing.
• Data structures used by NoSQL databases are sometimes also viewed as more flexible
than relational database tables. Many NoSQL stores compromise consistency in favor of
availability, speed, and partition tolerance. Barriers to the greater adoption of NoSQL
stores include the use of low-level query languages, lack of standardized interfaces, and
huge previous investments in existing relational databases.
• Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability)
transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE,
Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB,
have made them central to their designs.
• Most NoSQL databases offer eventual consistency, in which database changes
are propagated to all nodes over time, so a query might not return updated data immediately
and might instead read an outdated value, a problem known as a stale
read. Some NoSQL systems may also exhibit lost writes and other forms of data loss;
some provide mechanisms such as write-ahead logging to avoid data loss.
• One simple example of a NoSQL database is a document database. In a document
database, data is stored in documents rather than tables. Each document can contain a
different set of fields, making it easy to accommodate changing data requirements.
• For example, consider a database that holds data about employees. In a
relational database, this information might be stored in tables, with one table for employee
information and another table for department information. In a document database, each
employee would be stored as a separate document, with all of their information contained
within the document.
• In short, NoSQL databases have
gained popularity in recent years due to their scalability and flexibility. They are designed
to handle large amounts of unstructured or semi-structured data and can handle dynamic
changes to the data model. This makes NoSQL databases a good fit for modern web
applications, real-time analytics, and big data processing.
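
The stale-read behaviour described above under eventual consistency can be sketched with a toy model. The class below is a hypothetical illustration, not the replication protocol of any real NoSQL system: writes land on a primary copy and reach the replicas only when sync() is called, so a read from a replica in between returns the old value.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit the primary and reach replicas only on sync()."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes accepted but not yet propagated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Reads are served by a replica, which may lag behind the primary.
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Propagate every pending write to every replica, in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.sync()
store.write("balance", 80)
stale = store.read("balance", 0)  # replica has not yet seen the second write
store.sync()
fresh = store.read("balance", 0)  # after propagation the replica has caught up
print(stale, fresh)
```

In this sketch the first read is stale because propagation has not happened yet; once the replicas converge, every read returns the latest value, which is what "eventual" refers to.
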

Key Features of NoSQL :

1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate
changing data structures without the need for migrations or schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes
to a database cluster, making them well-suited for handling large amounts of data and
high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based
data model, where data is stored in semi-structured format, such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model,
where data is stored as a collection of key-value pairs.
5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based data
model, where data is organized into columns instead of rows.
6. Distributed and high availability: NoSQL databases are often designed to be highly
available and to automatically handle node failures and data replication across multiple
nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and
dynamic manner, with support for multiple data types and changing data structures.
8. Performance: NoSQL databases are optimized for high performance and can handle a
high volume of reads and writes, making them suitable for big data and real-time
applications.
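
The dynamic-schema and document-based features in the list above can be illustrated with the employee example from earlier in this unit. The sketch below is a plain-Python illustration under assumed data (the employee names, ids, and the skills field are made up); it contrasts a relational layout, where two tables are joined, with a document layout, where each employee carries its own fields.

```python
# Relational style: two "tables" (lists of rows) joined on dept_id.
employees_table = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 20},
]
departments_table = [
    {"dept_id": 10, "dept_name": "Engineering"},
    {"dept_id": 20, "dept_name": "Sales"},
]


def department_of_relational(emp_id):
    """Look up the employee row, then join to the department row."""
    emp = next(e for e in employees_table if e["emp_id"] == emp_id)
    dept = next(d for d in departments_table if d["dept_id"] == emp["dept_id"])
    return dept["dept_name"]


# Document style: each employee is one document holding all of its data.
# Documents need not share the same fields (employee 2 has no "skills").
employee_documents = {
    1: {"name": "Asha", "department": {"name": "Engineering"}, "skills": ["HBase", "Java"]},
    2: {"name": "Ravi", "department": {"name": "Sales"}},
}


def department_of_document(emp_id):
    """Read the department straight out of the employee document."""
    return employee_documents[emp_id]["department"]["name"]


print(department_of_relational(1), department_of_document(1))
```

Adding the skills field to one employee required no change to the other document, which is the practical meaning of a dynamic schema.
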

Advantages of NoSQL: There are many advantages of working with NoSQL databases such
as MongoDB and Cassandra. The main advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning
the data and placing it on multiple machines in such a way that the order of the data is
preserved. Vertical scaling means adding more resources to the existing
machine, whereas horizontal scaling means adding more machines to handle the data.
Vertical scaling is limited by the capacity of a single machine, while horizontal scaling can continue by adding machines.
Examples of horizontally scalable databases are MongoDB, Cassandra, etc. Because of this scalability, NoSQL can
handle a huge amount of data: as the data grows, the database scales out
to handle it efficiently.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured
data, which means that they can accommodate dynamic changes to the data model. This
makes NoSQL databases a good fit for applications that need to handle changing data
requirements.
3. High availability: The automatic replication feature of NoSQL databases makes them highly
available, because in case of a failure the data can be restored from a replica to its previous consistent
state.
4. Scalability: NoSQL databases are highly scalable, which means that they can handle
large amounts of data and traffic with ease. This makes them a good fit for applications
that need to handle large amounts of data or traffic
5. Performance: NoSQL databases are designed to handle large amounts of data and traffic,
which means that they can offer improved performance compared to traditional relational
databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional
relational databases, as they are typically less complex and do not require expensive
hardware or software.
7. Agility: Ideal for agile development.
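
The sharding described in item 1 above can be sketched as follows. This is a minimal illustration of range-based sharding, one of several possible strategies, chosen here because it keeps keys in order across machines as the text describes. The split points and node names are hypothetical.

```python
import bisect

# Keys below "h" go to the first node, keys from "h" up to (not including) "q"
# go to the second node, and everything else goes to the third node.
SPLIT_POINTS = ["h", "q"]
SHARDS = ["node-a", "node-b", "node-c"]


def shard_for(key):
    """Return the node responsible for the key's range."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


placement = {}
for key in ["alice", "bob", "frank", "mary", "zoe"]:
    placement.setdefault(shard_for(key), []).append(key)

print(placement)
```

Scaling out in this model means adding a split point and a node, so that one overloaded range is divided between two machines.
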
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization : There are many different types of NoSQL databases, each
with its own unique strengths and weaknesses. This lack of standardization can make it
difficult to choose the right database for a specific application
2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which
means that they do not guarantee the consistency, integrity, and durability of data. This
can be a drawback for applications that require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a narrow focus: they are mainly designed for
storage and provide little additional functionality. Relational databases are a better choice
for transaction management than NoSQL.
4. Open-source: Many NoSQL databases are open source, and there is no reliable standard for NoSQL
yet. In other words, two database systems are unlikely to be equivalent.
5. Lack of support for complex queries : NoSQL databases are not designed to handle
complex queries, which means that they are not a good fit for applications that require
complex data analysis or reporting.
6. Lack of maturity : NoSQL databases are relatively new and lack the maturity of
traditional relational databases. This can make them less reliable and less secure than
traditional databases.
7. Management challenge : The purpose of big data tools is to make the management of a
large amount of data as simple as possible. But it is not so easy. Data management in
NoSQL is much more complex than in a relational database. NoSQL, in particular, has a
reputation for being challenging to install and even more hectic to manage on a daily
basis.
8. GUI is not available: GUI tools for accessing the database are not widely available in
the market.
9. Backup: Backup can be a weak point for some NoSQL databases such as MongoDB,
where backing up data in a consistent manner is not straightforward.
10. Large document size : Some database systems like MongoDB and CouchDB store data
in JSON format. This means that documents are quite large (BigData, network bandwidth,
speed), and having descriptive key names actually hurts since they increase the document
size.
Types of NoSQL database: Types of NoSQL databases and the name of the databases system
that falls in that category are:
1. Graph Databases: Examples – Amazon Neptune, Neo4j
2. Key value store: Examples – Memcached, Redis, Coherence
3. Tabular: Examples – Hbase, Big Table, Accumulo
4. Document-based: Examples – MongoDB, CouchDB, Cloudant

When should NoSQL be used:

1. When a huge amount of data needs to be stored and retrieved.
2. The relationship between the data you store is not that important
3. The data changes over time and is not structured.
4. Support of Constraints and Joins is not required at the database level
5. The data is growing continuously and you need to scale the database regularly to handle
the data.

In conclusion, NoSQL databases offer several benefits over traditional relational databases,
such as scalability, flexibility, and cost-effectiveness. However, they also have several
drawbacks, such as a lack of standardization, lack of ACID compliance, and lack of support
for complex queries. When choosing a database for a specific application, it is important to
weigh the benefits and drawbacks carefully to determine the best fit.

SQL vs NoSQL: Difference between SQL and NoSQL

When selecting a modern database, the difficulty is choosing between a relational and a
non-relational database. This section compares SQL and NoSQL databases: the former are
relational, and the latter are non-relational. Before looking at the differences, it helps
to look at each one individually.
The two options are:

• SQL
• NoSQL

What is SQL?

Structured Query Language (SQL) is the standard language of table-based relational databases. Using
SQL, users can search, insert, modify, and delete database
records. Its use is not limited to these operations; it also supports tasks such as the
optimization and administration of the database.

What is NoSQL?

NoSQL is a non-relational database or DBMS without a fixed schema that is easy to scale.
It is called for in distributed data stores with very large data storage needs.
Big Data and real-time web apps make use of NoSQL.

What is the Difference between SQL and NoSQL?

SQL databases are table-based, while NoSQL databases are document, key-value,
graph, or wide-column stores. SQL databases suit multi-row transactions, while NoSQL is better
for unstructured data such as documents or JSON. The table below summarizes the differences between
SQL and NoSQL.

Difference between SQL and NoSQL

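+------------------------------------------------+------------------------------------------------+
| SQL                                            | NoSQL                                          |
+------------------------------------------------+------------------------------------------------+
| SQL is pronounced “S-Q-L” or “See-Quel” and is | NoSQL is a distributed or Non-relational       |
| primarily known to be a Relational Database    | Database                                       |
+------------------------------------------------+------------------------------------------------+
| Use of SQL queries and syntax to analyse and   | Apply different types of database              |
| get further data insights. Used for OLAP       | technologies                                   |
| systems                                        |                                                |
+------------------------------------------------+------------------------------------------------+
| Database, here is in table format              | NoSQL databases are document based with        |
|                                                | key-value pairs and graph databases.           |
+------------------------------------------------+------------------------------------------------+
| They are scalable vertically                   | These are horizontally scalable                |
+------------------------------------------------+------------------------------------------------+
| Schema used is pre-defined                     | Dynamic schema is used for unstructured or     |
|                                                | disorganised data                              |
+------------------------------------------------+------------------------------------------------+
| SQL uses specialized DB hardware to enhance    | NoSQL uses commodity hardware                  |
| performance                                    |                                                |
+------------------------------------------------+------------------------------------------------+
| Total focus on ACID (Atomicity, Consistency,   | Makes use of the Brewer’s CAP theorem          |
| Isolation and Durability) properties           | (Consistency, Availability and Partition       |
|                                                | Tolerance)                                     |
+------------------------------------------------+------------------------------------------------+
| Examples are Sqlite, MySql, Oracle, Postgres   | Examples are Cassandra, MongoDB, BigTable,     |
| and MS-SQL                                     | Redis, RavenDb, Hbase, Neo4j and CouchDb       |
+------------------------------------------------+------------------------------------------------+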

MongoDB, Key value store, Tabular, and Document based.

HBase: HBasics, Installation, clients, Building an Online Query Application.

Installing HBase
We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode,
and Fully Distributed mode.

Installing HBase in Standalone Mode

Download the latest stable version of HBase
from http://www.interior-dsgn.com/apache/hbase/stable/ using the “wget” command, and extract it
using the “tar -zxvf” command, as shown below.
$cd /usr/local/
$wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz
$tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz

Switch to superuser mode and move the extracted HBase files into the /usr/local/HBase folder as shown below.

$su
$password: enter your password here
mkdir -p HBase
mv hbase-0.98.8-hadoop2/* HBase/

Configuring HBase in Standalone Mode

Before proceeding with HBase, you have to edit the following files and configure HBase.

hbase-env.sh

Set the Java home for HBase: open the hbase-env.sh file in the conf folder, find the
JAVA_HOME environment variable, and change the existing path to your current JAVA_HOME
value as shown below.
cd /usr/local/HBase/conf
gedit hbase-env.sh
This opens the hbase-env.sh file of HBase. Now replace the existing JAVA_HOME value with
your current value as shown below.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0

hbase-site.xml

This is the main configuration file of HBase. Set the data directory to an appropriate location by
opening the HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several
files, open the hbase-site.xml file as shown below.
#cd /usr/local/HBase/
#cd conf
# gedit hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags.
Within them, set the HBase directory under the property key with the name “hbase.rootdir” as
shown below.
<configuration>
<!-- Set the path where you want HBase to store its files. -->
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/HBase/HFiles</value>
</property>

<!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
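
Before starting HBase, it can help to confirm that the configuration file is well-formed XML and contains the two properties set above. The following is a small sketch using Python's standard xml.etree module; the file contents are embedded as a string so the example is self-contained, and the helper name read_properties is our own, not part of HBase.

```python
import xml.etree.ElementTree as ET

# Contents of hbase-site.xml as configured above (embedded for the example).
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:/home/hadoop/HBase/HFiles</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Parse a Hadoop-style configuration document into a name -> value dict."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


props = read_properties(HBASE_SITE)
required = ("hbase.rootdir", "hbase.zookeeper.property.dataDir")
missing = [name for name in required if name not in props]

for name, value in props.items():
    print(f"{name} = {value}")
print("missing:", missing)
```

To check a real installation, the string could be replaced by the contents of conf/hbase-site.xml read from disk. Note that XML comments must use the `<!-- ... -->` form; a line starting with `//` is treated as text, not as a comment.
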
With this, the HBase installation and configuration is complete. We can start
HBase by using the start-hbase.sh script provided in the bin folder of HBase. To do so,
open the HBase home folder and run the HBase start script as shown below.
$cd /usr/local/HBase/bin
$./start-hbase.sh

If everything goes well, running the HBase start script prints a message
saying that HBase has started.

starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out

Installing HBase in Pseudo-Distributed Mode

Let us now check how HBase is installed in pseudo-distributed mode.

Configuring HBase

Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a
remote system and make sure they are running. Stop HBase if it is running.

hbase-site.xml

Edit hbase-site.xml file to add the following properties.

<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

This property specifies the mode in which HBase should run. In the same file, change
hbase.rootdir from the local file system to the address of your HDFS instance, using the hdfs:// URI syntax.
In this example, HDFS is running on localhost at port 8030.

<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8030/hbase</value>
</property>

Starting HBase

After configuration is over, browse to HBase home folder and start HBase using the following
command.

$cd /usr/local/HBase
$bin/start-hbase.sh
Note: Before starting HBase, make sure Hadoop is running.

Checking the HBase Directory in HDFS

HBase creates its directory in HDFS. To see the created directory, browse to Hadoop bin and
type the following command.

$ ./bin/hadoop fs -ls /hbase

If everything goes well, it will give you the following output.

Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs

Starting and Stopping a Master

Using “local-master-backup.sh” you can start up to 10 servers. Open the home folder of
HBase and execute the following command to start backup masters. Each argument is a port
offset that identifies one backup master.

$ ./bin/local-master-backup.sh start 2 4
To kill a backup master, you need its process id, which is stored in a file
named “/tmp/hbase-USER-X-master.pid”, where USER is your user name and X is the offset used
when the master was started. You can kill the backup master using the following
command.
$ cat /tmp/hbase-user-2-master.pid |xargs kill -9

Starting and Stopping RegionServers

You can run multiple region servers from a single system using the following command.

$ ./bin/local-regionservers.sh start 2 3
To stop a region server, use the following command.

$ ./bin/local-regionservers.sh stop 3

Starting HBaseShell

After installing HBase successfully, you can start the HBase shell. The steps below show the
sequence to follow. Open the terminal and log in as superuser.

Start Hadoop File System

Browse through Hadoop home sbin folder and start Hadoop file system as shown below.

$cd $HADOOP_HOME/sbin
$start-all.sh

Start HBase

Browse through the HBase root directory bin folder and start HBase.

$cd /usr/local/HBase
$./bin/start-hbase.sh

Start HBase Master Server

This will be the same directory. Start it as shown below.

$./bin/local-master-backup.sh start 2

The number identifies the specific server to start.

Start Region

Start the region server as shown below.

$./bin/local-regionservers.sh start 3

Start HBase Shell

You can start HBase shell using the following command.

$cd bin
$./hbase shell

This will give you the HBase Shell Prompt as shown below.
2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2014

hbase(main):001:0>

HBase Web Interface

To access the web interface of HBase, type the following url in the browser.

http://localhost:60010

This interface lists your currently running Region servers, backup masters and HBase tables.

HBase Region servers and Backup Masters

HBase Tables
Setting Java Environment

We can also communicate with HBase using Java libraries, but before accessing HBase using
Java API you need to set classpath for those libraries.

Setting the Classpath

Before proceeding with programming, set the classpath to HBase libraries in .bashrc file.
Open .bashrc in any of the editors as shown below.
$ gedit ~/.bashrc

Set classpath for HBase libraries (lib folder in HBase) in it as shown below.

export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

This is to prevent the “class not found” exception while accessing the HBase using java API.

Apache Drill - Querying Data using HBase

HBase is a distributed column-oriented database built on top of the Hadoop file system. It is a
part of the Hadoop ecosystem that provides random real-time read/write access to data in the
Hadoop File System. One can store the data in HDFS either directly or through HBase. The
following steps are used to query HBase data in Apache Drill.

How to Start Hadoop and HBase?

Step 1: Prerequisites
Before querying HBase data, you must install the following −
• Java version 1.7 or greater
• Hadoop
• HBase
Step 2: Enable Storage Plugin

After successful installation, navigate to the Apache Drill web console and select the Storage menu
option.

Then choose the Enable option for HBase and go to the Update option. You will see the
configuration shown below.

{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": true
}

Here the config setting “hbase.zookeeper.property.clientPort” : “2181” specifies the ZooKeeper
port. In embedded mode it is assigned to ZooKeeper automatically, but in
distributed mode you must specify the ZooKeeper ports separately. The HBase plugin is now
enabled in Apache Drill.
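
As a quick sanity check on the plugin configuration above, the JSON can be parsed to see which ZooKeeper host and port Drill will contact. This is only a sketch using Python's standard json module; the helper name zookeeper_endpoint is hypothetical and not part of Drill.

```python
import json

# The storage-plugin configuration shown above, embedded for the example.
PLUGIN_CONFIG = """
{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false,
  "enabled": true
}
"""


def zookeeper_endpoint(config_text):
    """Return (quorum, port) from an HBase storage-plugin configuration."""
    plugin = json.loads(config_text)
    if plugin.get("type") != "hbase":
        raise ValueError("not an HBase storage plugin configuration")
    config = plugin["config"]
    quorum = config["hbase.zookeeper.quorum"]
    # The port is stored as a string in the configuration, so convert it.
    port = int(config["hbase.zookeeper.property.clientPort"])
    return quorum, port


quorum, port = zookeeper_endpoint(PLUGIN_CONFIG)
print(f"ZooKeeper expected at {quorum}:{port}")
```
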
Step 3: Start Hadoop and HBase

After enabling the plugin, first start your Hadoop server then start HBase.

Creating a Table Using HBase Shell

After Hadoop and HBase have been started, you can start the HBase interactive shell using the “hbase
shell” command, run from the HBase home folder, as shown below.

Query

./bin/hbase shell

You will then see the HBase shell prompt shown below.

Result

hbase(main):001:0>

To query HBase, you should complete the following steps −

Create a Table

Run the following command in the HBase shell to create a “customers” table with the column families “account” and “address”.

Query

hbase(main):001:0> create 'customers','account','address'

Load Data into the Table

Create a simple text file named “hbase-customers.txt” with the following contents.

Example

put 'customers','Alice','account:name','Alice'
put 'customers','Alice','address:street','123 Ballmer Av'
put 'customers','Alice','address:zipcode','12345'
put 'customers','Alice','address:state','CA'
put 'customers','Bob','account:name','Bob'
put 'customers','Bob','address:street','1 Infinite Loop'
put 'customers','Bob','address:zipcode','12345'
put 'customers','Bob','address:state','CA'
put 'customers','Frank','account:name','Frank'
put 'customers','Frank','address:street','435 Walker Ct'
put 'customers','Frank','address:zipcode','12345'
put 'customers','Frank','address:state','CA'
put 'customers','Mary','account:name','Mary'
put 'customers','Mary','address:street','56 Southern Pkwy'
put 'customers','Mary','address:zipcode','12345'
put 'customers','Mary','address:state','CA'
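
Typing such a file by hand becomes tedious for more than a few rows. As a sketch, the put commands can be generated from a dictionary with a few lines of Python. The function name put_commands is our own, and the example assumes that values contain no single quotes (which would need escaping for the HBase shell). Only two of the four customers are included to keep the example short.

```python
# Row key -> {"family:qualifier": value}, mirroring the customers table above.
customers = {
    "Alice": {
        "account:name": "Alice",
        "address:street": "123 Ballmer Av",
        "address:zipcode": "12345",
        "address:state": "CA",
    },
    "Bob": {
        "account:name": "Bob",
        "address:street": "1 Infinite Loop",
        "address:zipcode": "12345",
        "address:state": "CA",
    },
}


def put_commands(table, rows):
    """Build one HBase shell put command per cell."""
    lines = []
    for row_key, cells in rows.items():
        for column, value in cells.items():
            lines.append(f"put '{table}','{row_key}','{column}','{value}'")
    return lines


lines = put_commands("customers", customers)
print("\n".join(lines))
```

The printed output can be redirected into hbase-customers.txt and loaded in the same way as the hand-written file.
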

Now exit the HBase shell and issue the following command from the operating system shell to load the data into the table.

Query

$ cat ../drill_sample/hbase/hbase-customers.txt | bin/hbase shell

Query

Now switch to Apache Drill shell and issue the following command.

0: jdbc:drill:zk = local> select * from hbase.customers;

Result

+------------+---------------------+---------------------------------------------------------------------------+
| row_key    | account             | address                                                                   |
+------------+---------------------+---------------------------------------------------------------------------+
| 416C696365 | {"name":"QWxpY2U="} | {"state":"Q0E=","street":"MTIzIEJhbGxtZXIgQXY=","zipcode":"MTIzNDU="}     |
| 426F62     | {"name":"Qm9i"}     | {"state":"Q0E=","street":"MSBJbmZpbml0ZSBMb29w","zipcode":"MTIzNDU="}     |
| 4672616E6B | {"name":"RnJhbms="} | {"state":"Q0E=","street":"NDM1IFdhbGtlciBDdA==","zipcode":"MTIzNDU="}     |
| 4D617279   | {"name":"TWFyeQ=="} | {"state":"Q0E=","street":"NTYgU291dGhlcm4gUGt3eQ==","zipcode":"MTIzNDU="} |
+------------+---------------------+---------------------------------------------------------------------------+

The output will be 4 rows selected in 1.211 seconds.

Apache Drill returns the HBase data in binary form, which can be converted into readable data
using the CONVERT_FROM function available in Drill. Use the following query to get
readable data from Drill.

Query

0: jdbc:drill:zk = local> SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_id,
. . . . . . . . . . . > CONVERT_FROM(customers.account.name, 'UTF8') AS customers_name,
. . . . . . . . . . . > CONVERT_FROM(customers.address.state, 'UTF8') AS customers_state,
. . . . . . . . . . . > CONVERT_FROM(customers.address.street, 'UTF8') AS customers_street,
. . . . . . . . . . . > CONVERT_FROM(customers.address.zipcode, 'UTF8') AS customers_zipcode
. . . . . . . . . . . > FROM hbase.customers;

Result

+--------------+----------------+-----------------+------------------+--------------------+
| customer_id | customers_name | customers_state | customers_street | customers_zipcode |
+--------------+----------------+-----------------+------------------+--------------------+
| Alice | Alice | CA | 123 Ballmer Av | 12345 |
| Bob | Bob | CA | 1 Infinite Loop | 12345 |
| Frank | Frank | CA | 435 Walker Ct | 12345 |
| Mary | Mary | CA | 56 Southern Pkwy | 12345 |
+--------------+----------------+-----------------+------------------+--------------------+
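
To see why the first query result looked unreadable, the same decoding can be reproduced outside Drill. In that result the row keys look like hexadecimal and the cell values look like Base64; the sketch below assumes exactly that and decodes the first row with Python's standard library. It is an illustration of the encoding, not a description of how Drill's CONVERT_FROM is implemented.

```python
import base64

# First row of the raw result above: hex row key, Base64 cell values.
raw_row = {
    "row_key": "416C696365",
    "account": {"name": "QWxpY2U="},
    "address": {
        "state": "Q0E=",
        "street": "MTIzIEJhbGxtZXIgQXY=",
        "zipcode": "MTIzNDU=",
    },
}


def decode_cell(value):
    """Decode one Base64-encoded cell value into UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


decoded = {
    "customer_id": bytes.fromhex(raw_row["row_key"]).decode("utf-8"),
    "customers_name": decode_cell(raw_row["account"]["name"]),
    "customers_state": decode_cell(raw_row["address"]["state"]),
    "customers_street": decode_cell(raw_row["address"]["street"]),
    "customers_zipcode": decode_cell(raw_row["address"]["zipcode"]),
}
print(decoded)
```

The decoded values match the first row of the CONVERT_FROM result, which confirms that the raw output held the same bytes in an encoded display form.
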

No ratings yet
EOI Documentation Aryan Jain
19 pages
Computer Fundamentals MODULE 4 LESSON 1
No ratings yet
Computer Fundamentals MODULE 4 LESSON 1
9 pages
Application & Cloud Security Analyst
No ratings yet
Application & Cloud Security Analyst
3 pages
CMSR Quick Start Guide: Rosella Software
No ratings yet
CMSR Quick Start Guide: Rosella Software
19 pages
Abdul Rafe Khan Resume3
No ratings yet
Abdul Rafe Khan Resume3
1 page
6236-Implementing and Maintaining Microsoft SQL Server 2008 Reporting Services
No ratings yet
6236-Implementing and Maintaining Microsoft SQL Server 2008 Reporting Services
5 pages
AWS For Pinterest Cloud Computing
No ratings yet
AWS For Pinterest Cloud Computing
9 pages
Report of Algo-Trading
No ratings yet
Report of Algo-Trading
6 pages
Refer.3.3.3.AskF5 - Manual Chapter - AFM DoS - DDoS Protection
No ratings yet
Refer.3.3.3.AskF5 - Manual Chapter - AFM DoS - DDoS Protection
2 pages
OAuth2 Authorization Patterns and Microservices - by Johan Sydseter - Norway Community Site - Medium
No ratings yet
OAuth2 Authorization Patterns and Microservices - by Johan Sydseter - Norway Community Site - Medium
5 pages
Computer Organization
No ratings yet
Computer Organization
2 pages
Unit2 PDF
No ratings yet
Unit2 PDF
8 pages
Apps Setup Assistant
No ratings yet
Apps Setup Assistant
1 page
Tally
No ratings yet
Tally
3 pages
Juan M. Agudo: Head of Product
No ratings yet
Juan M. Agudo: Head of Product
2 pages

UNIT V: Introduction to NoSQL

Key Features of NoSQL:

When should NoSQL be used:

SQL vs NoSQL: Difference between SQL and NoSQL

What is the Difference between SQL and NoSQL?

Difference between SQL and NoSQL

SQL | NoSQL
They are scalable vertically | These are horizontally scalable
Schema used is pre-defined | Dynamic schema is used for unstructured or
SQL uses specialized DB hardware to | NoSQL uses commodity hardware
Examples are SQLite, MySQL, Oracle, | Examples are Cassandra, MongoDB,

MongoDB, Key value store, Tabular, and Document based.

HBase: HBasics, Installation, clients, Building an Online Query Application.

Installing HBase in Standalone Mode

Download the latest stable version of HBase

Configuring HBase in Standalone Mode

starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out

Installing HBase in Pseudo-Distributed Mode

Let us now check how HBase is installed in pseudo-distributed mode.

Edit hbase-site.xml file to add the following properties.

Checking the HBase Directory in HDFS

$ ./bin/hadoop fs -ls /hbase

If everything goes well, it will give you the following output.

Starting and Stopping a Master

Starting and Stopping RegionServers

Start Hadoop File System

Start HBase Master Server

This will be the same directory. Start it as shown below.

$ ./bin/local-master-backup.sh start 2 (number signifies specific

Start the region server as shown below.

Start HBase Shell

You can start HBase shell using the following command.

HBase Web Interface

HBase RegionServers and Backup Masters

Setting the Classpath

export CLASSPATH=$CLASSPATH:/home/hadoop/hbase/lib/*

Apache Drill - Querying Data using HBase

How to Start Hadoop and HBase?

Here the config setting “hbase.zookeeper.property.clientPort” : “2181” indicates ZooKeeper

Creating a Table Using HBase Shell

To query HBase, complete the following steps:

hbase(main):001:0> create 'customers','account','address'
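
The `create` statement above defines a table named customers with two column families, account and address. As a rough illustration of that data model (not HBase client code), the sketch below uses nested Python dictionaries: row key, then column family, then column qualifier. The class name `ToyTable`, its methods, and the sample values are all hypothetical and exist only for this example.

```python
class ToyTable:
    """Toy in-memory stand-in for an HBase table (hypothetical, illustration only)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # column families are fixed when the table is created
        self.rows = {}                 # row key -> family -> qualifier -> value

    def put(self, row_key, column, value):
        # Columns are addressed as "family:qualifier", as in the HBase shell.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        # Return the stored cells for one row, or an empty dict if the row is absent.
        return self.rows.get(row_key, {})


# Mirrors: create 'customers','account','address'
customers = ToyTable("customers", "account", "address")
customers.put("row1", "account:name", "Alice")       # sample values are made up
customers.put("row1", "address:city", "Hyderabad")
```

Qualifiers such as `name` and `city` are not declared up front; only the column families are, which is why a row can carry any set of columns inside a family.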

Load Data into the Table

$ cat ../drill_sample/hbase/hbase-customers.txt | bin/hbase shell

0: jdbc:drill:zk=local> select * from hbase.customers;

The output will be 4 rows selected in 1.211 seconds.

0: jdbc:drill:zk = local> SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_id,
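
HBase stores row keys and cell values as raw bytes, which is why the query wraps `row_key` in `CONVERT_FROM(..., 'UTF8')` to get readable text. The Python sketch below shows the same idea outside Drill, assuming the bytes are UTF-8 encoded. The helper `convert_from_utf8` and the sample row are hypothetical, made up for illustration.

```python
def convert_from_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8 text (hypothetical helper, mirrors the idea of CONVERT_FROM)."""
    return raw.decode("utf-8")


# A made-up row as it might come back undecoded: every value is a byte string.
raw_row = {"row_key": b"cust-001", "account:name": b"Alice"}

# Decode each value so the row is readable, as the SELECT does column by column.
decoded = {column: convert_from_utf8(value) for column, value in raw_row.items()}
```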
