Apache HBase Tutorial & Setup Guide

HBase is a distributed column-oriented database built on top of Hadoop that provides quick random access to large amounts of structured data. It uses a column-oriented data model similar to BigTable and stores data across servers in a distributed file system like HDFS. HBase provides a Java API and shell for clients to perform CRUD operations on tables containing rows and columns.

Uploaded by Vasanth Kumar

Department of Collegiate and Technical Education

Big Data Analytics

Department of Computer Science and Engineering

Module – 2:
Session – 4:
Using Apache HBase:
• HBase is a distributed database, modeled on Google’s Bigtable, that is designed to provide quick random access
to huge amounts of structured data. This tutorial introduces HBase, the procedures
to set up HBase on the Hadoop file system, and ways to interact with the HBase shell. It also describes
how to connect to HBase using Java and how to perform basic operations on HBase using Java.
• Since the 1970s, the RDBMS has been the standard solution for data storage and maintenance problems. After the
advent of big data, companies realized the benefit of processing it and started opting for
solutions like Hadoop.
• Hadoop uses a distributed file system to store big data and MapReduce to process it. Hadoop
excels at storing and processing huge volumes of data in various formats: arbitrary, semi-structured, or even
unstructured.
WHAT IS HBASE?
• HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an
open-source project and is horizontally scalable.
• Its data model is similar to Google’s Bigtable and is designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by the
Hadoop Distributed File System (HDFS).
• It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in
HDFS.
• Data can be stored in HDFS either directly or through HBase. Data consumers read and access
the data in HDFS randomly using HBase. HBase sits on top of HDFS and
provides read and write access.
HBase Architecture
HBase and HDFS
HDFS                                      | HBase
------------------------------------------+------------------------------------------
HDFS is a distributed file system         | HBase is a database built on top of
suitable for storing large files.         | HDFS.
                                          |
HDFS does not support fast individual     | HBase provides fast lookups for larger
record lookups.                           | tables.
                                          |
It provides high-latency batch            | It provides low-latency access to single
processing.                               | rows from billions of records (random
                                          | access).
                                          |
It provides only sequential access to     | HBase keeps rows sorted by key and
data.                                     | stores the data in indexed HDFS files
                                          | (HFiles), which gives fast random access.
STORAGE MECHANISM IN HBASE
• HBase is a column-oriented database, and its tables are sorted by row key. The table schema
defines only column families, which hold key-value pairs. A table can have multiple column
families, and each column family can have any number of columns. Successive column values
are stored contiguously on disk. Each cell value of the table has a timestamp. In short, in
HBase:
▪ A table is a collection of rows.
▪ A row is a collection of column families.
▪ A column family is a collection of columns.
▪ A column is a collection of key-value pairs.
• Given below is an example schema of a table in HBase.
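The hierarchy above (table, row, column family, column, timestamped value) can be sketched as nested maps. This is only a toy model for illustration, not the HBase client API: the class `ToyTable`, its methods, and the sample family names and values are all hypothetical.

```python
class ToyTable:
    """Toy model of an HBase table:
    row key -> column family -> column qualifier -> {timestamp: value}."""

    def __init__(self, families):
        # The schema fixes only the column families; columns are created on write.
        self.families = set(families)
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = (self.rows.setdefault(row, {})
                         .setdefault(family, {})
                         .setdefault(qualifier, {}))
        cell[ts] = value  # every value is stored under its timestamp

    def get(self, row, family, qualifier):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Return row keys in sorted order, since tables are sorted by row."""
        return sorted(self.rows)


table = ToyTable(["personal", "professional"])
table.put("row2", "personal", "name", "Ravi", ts=1)
table.put("row1", "personal", "name", "Raju", ts=1)
table.put("row1", "personal", "city", "Hyderabad", ts=1)
table.put("row1", "personal", "city", "Chennai", ts=2)  # newer version of the same cell
```

Reading `("row1", "personal", "city")` returns the value with the latest timestamp, and a scan returns the rows in key order regardless of insertion order.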
COLUMN ORIENTED AND ROW ORIENTED
• Column-oriented databases store data tables as sections of columns of data rather
than as rows of data. In short, they have column families.

Row-Oriented Database                     | Column-Oriented Database
------------------------------------------+------------------------------------------
It is suitable for Online Transaction     | It is suitable for Online Analytical
Processing (OLTP).                        | Processing (OLAP).
                                          |
Such databases are designed for a small   | Column-oriented databases are designed
number of rows and columns.               | for huge tables.
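To make the difference concrete, the following sketch lays out the same three records both ways. The records and field names are made up for illustration, and plain Python lists stand in for on-disk storage.

```python
records = [
    {"id": 1, "name": "Raju", "salary": 50000},
    {"id": 2, "name": "Ravi", "salary": 45000},
    {"id": 3, "name": "Rani", "salary": 60000},
]

# Row-oriented layout: all fields of one record sit next to each other.
row_layout = [tuple(r.values()) for r in records]

# Column-oriented layout: all values of one column sit next to each other.
column_layout = {key: [r[key] for r in records] for key in records[0]}

# An analytical query such as a total touches a single contiguous column.
total_salary = sum(column_layout["salary"])

# Fetching one complete record is a single read in the row layout.
second_record = row_layout[1]
```

The aggregate only needs `column_layout["salary"]`, which is why column layouts suit analytical workloads, while the row layout returns a whole record in one step.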
• The following image shows column families in a column-oriented database:
HBASE AND RDBMS

HBase                                     | RDBMS
------------------------------------------+------------------------------------------
HBase is schema-less: it has no concept   | An RDBMS is governed by its schema,
of a fixed column schema and defines      | which describes the whole structure of
only column families.                     | its tables.
                                          |
It is built for wide tables. HBase is     | It is thin and built for small tables.
horizontally scalable.                    | It is hard to scale.
                                          |
There are no multi-row transactions in    | An RDBMS is transactional.
HBase.                                    |
                                          |
It has de-normalized data.                | It has normalized data.
                                          |
It is good for semi-structured as well    | It is good for structured data.
as structured data.                       |
FEATURES OF HBASE
▪ HBase is linearly scalable.
▪ It has automatic failover support.
▪ It provides consistent reads and writes.
▪ It integrates with Hadoop, both as a source and a destination.
▪ It has an easy-to-use Java API for clients.
▪ It provides data replication across clusters.
WHERE TO USE HBASE
▪ Apache HBase is used when random, real-time read/write access to big data is needed.
▪ It hosts very large tables on top of clusters of commodity hardware.
▪ Apache HBase is a non-relational database modeled after Google's Bigtable. Bigtable runs
on top of the Google File System; likewise, Apache HBase works on top of Hadoop and HDFS.
Applications of HBase
• It is used for write-heavy applications.
• HBase is used whenever fast random access to available data is needed.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.
HBASE HISTORY

Year       | Event
-----------+---------------------------------------------------------------
Nov 2006   | Google released the paper on Bigtable.
Feb 2007   | The initial HBase prototype was created as a Hadoop contribution.
Oct 2007   | The first usable HBase, along with Hadoop 0.15.0, was released.
Jan 2008   | HBase became a subproject of Hadoop.
Oct 2008   | HBase 0.18.1 was released.
Jan 2009   | HBase 0.19.0 was released.
Sept 2009  | HBase 0.20.0 was released.
May 2010   | HBase became an Apache top-level project.
HBase - Architecture
• In HBase, tables are split into regions, which are served by the region servers. Regions are
vertically divided by column family into “stores”. Stores are saved as files in HDFS. Shown
below is the architecture of HBase.
• Note: The term ‘store’ is used for regions to explain the storage structure.
• HBase has three major components: the client library, a master server, and region servers.
Region servers can be added or removed as required.
MASTER SERVER
• The master server:
▪ Assigns regions to the region servers, with the help of Apache ZooKeeper.
▪ Handles load balancing of the regions across region servers. It unloads busy servers and
shifts regions to less occupied servers.
▪ Maintains the state of the cluster by negotiating the load balancing.
▪ Is responsible for schema changes and other metadata operations, such as the creation of tables and
column families.
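The load-balancing duty described above can be pictured with a very small greedy rebalancer. This is a sketch under simplifying assumptions (load is measured by region count only), not the balancing algorithm HBase actually implements; the function `balance` and the server and region names are hypothetical.

```python
def balance(assignment):
    """Move regions from the busiest server to the least busy one until
    the region counts differ by at most one. Returns the list of moves."""
    moves = []
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        region = assignment[busiest].pop()
        assignment[idlest].append(region)
        moves.append((region, busiest, idlest))


# Hypothetical cluster state: server name -> regions it currently serves.
cluster = {
    "server-a": ["r1", "r2", "r3", "r4", "r5"],
    "server-b": ["r6"],
    "server-c": [],
}
moves = balance(cluster)
```

After the call, each of the three servers holds two regions, and `moves` records which region went from which server to which.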
REGIONS
• Regions are portions of tables that are split up and spread across the region servers.
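Because a table is sorted by row key, each region can be thought of as a contiguous range of keys, and finding the region for a row becomes a search over the regions' start keys. The sketch below assumes that view; the start keys, server names, and the `locate` helper are hypothetical, and the real client lookup involves more machinery than this.

```python
import bisect

# Hypothetical regions of one table: (start_key, region_server). Each region
# covers row keys from its start key up to the next region's start key.
regions = [
    ("", "server-a"),   # the first region starts at the empty key
    ("g", "server-b"),
    ("p", "server-c"),
]
start_keys = [start for start, _ in regions]


def locate(row_key):
    """Return the server holding the region whose key range contains row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return regions[index][1]
```

For instance, `locate("apple")` falls in the first region, while a key equal to a start key such as `"g"` belongs to the region that begins there.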

REGION SERVER
• The region servers host regions and:
▪ Communicate with the client and handle data-related operations.
▪ Handle read and write requests for all the regions under them.
▪ Decide the size of a region by following the region size thresholds.
• A closer look at a region server shows that it contains regions and stores, as shown below:

• The store contains a memstore and HFiles. The memstore works like a cache: anything
that is written to HBase is stored there first. Later, the data is transferred and saved in
HFiles as blocks, and the memstore is flushed.
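The write path described above can be sketched as a small in-memory buffer that is written out as a sorted, immutable file once it fills up. This is a toy model: `ToyStore` and its entry-count threshold are assumptions made for illustration, whereas a real memstore is flushed based on its size in memory and HFiles live in HDFS.

```python
class ToyStore:
    """Toy store: an in-memory buffer (memstore) plus immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical limit, in entries
        self.memstore = {}
        self.hfiles = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Every write lands in the memstore first.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted file, then empty it."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the memstore first, then the files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=3)
for key, value in [("b", 1), ("a", 2), ("c", 3), ("a", 4)]:
    store.put(key, value)
```

The third write triggers a flush, so the first three entries end up in one sorted file; the fourth write stays in the memstore and shadows the older value of `"a"` on reads.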
HBASE WEB INTERFACE
• To access the web interface of HBase, type the following URL in the browser.
• http://localhost:60010 (the default master web UI port before HBase 1.0; later versions default to 16010)
• This interface lists your currently running region servers, backup masters, and HBase tables.
• HBase Region servers and Backup Masters
• HBase Tables