HBase
• Limitations of Hadoop
• Hadoop can perform only batch processing, and data is accessed only in a sequential manner. That means the entire dataset must be scanned even for the simplest of jobs.
• A huge dataset, when processed, produces another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).
• Hadoop Random Access Databases
• HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.
What is HBase?
HBase is a distributed column-oriented database built on top of the Hadoop file system. It
is an open-source project and is horizontally scalable.
HBase's data model is similar to Google's Bigtable, designed to provide quick
random access to huge amounts of structured data. It leverages the fault tolerance
provided by the Hadoop File System (HDFS).
It is a part of the Hadoop ecosystem that provides random real-time read/write access to
data in the Hadoop File System.
One can store data in HDFS either directly or through HBase. Data consumers read/access
the data in HDFS randomly using HBase. HBase sits on top of the Hadoop File System and
provides read and write access.
HBase and HDFS
• HDFS is a distributed file system suitable for storing large files, whereas HBase is a database built on top of HDFS.
• HDFS does not support fast individual record lookups, whereas HBase provides fast lookups for larger tables.
• HDFS provides high-latency batch processing, whereas HBase provides low-latency access to single rows from billions of records (random access).
• HDFS provides only sequential access to data, whereas HBase internally uses hash tables, provides random access, and stores the data in indexed HDFS files for faster lookups.
Storage Mechanism in HBase
HBase is a column-oriented database and the tables in it are sorted by row.
The table schema defines only column families, which are the key-value pairs. A table can have multiple column families, and each column family can have any number of columns; each cell value is addressed by its row key, column family, column qualifier, and timestamp.
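To make this layout concrete, here is a brief, hypothetical HBase shell sketch (the emp table and personal column family are example names only, reused later in these notes); a cell is written and read back by row key and family:qualifier:
create 'emp', 'personal'                          # table with one column family
put 'emp', 'row1', 'personal:name', 'Alice'       # cell = (row key, family:qualifier) -> value
put 'emp', 'row1', 'personal:city', 'Hyderabad'   # another column in the same family
get 'emp', 'row1'                                 # returns the cells of the row with their timestamps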
Column Oriented and Row Oriented
Column-oriented databases are those that store data tables as sections of columns of data, rather
than as rows of data. In short, they have column families.
HBase and RDBMS
• HBase is schema-less; it does not have the concept of a fixed-column schema and defines only column families. An RDBMS, in contrast, is governed by its schema, which describes the whole structure of its tables.
• HBase is built for wide tables and is horizontally scalable. An RDBMS is thin, built for small tables, and hard to scale.
Master Server (HMaster)
• Handles load balancing of the regions across region servers. It unloads the busy servers and shifts the regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
ZooKeeper
• In addition to availability, the ZooKeeper nodes are also used to track server failures or network partitions.
• Clients communicate with region servers via ZooKeeper.
• In pseudo-distributed and standalone modes, HBase itself takes care of ZooKeeper.
HBase Shell: HBase contains an interactive shell through which you can communicate with it.
General Commands:
• status - Provides the status of HBase, for example, the number of servers.
• version - Provides the version of HBase being used.
• table_help - Provides help for table-reference commands.
• whoami - Provides information about the user.
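For example, after starting the shell with the hbase shell command, these can be typed directly at the prompt (a quick sketch; the exact output depends on the HBase version and cluster):
status         # number of live and dead region servers and the average load
version        # the running HBase version string
whoami         # the current user and its authentication method
table_help     # help for table-reference commands (see the DDL and DML lists below)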
Data Definition Language:
These are the commands that operate on the tables in HBase.
• create - Creates a table.
• list - Lists all the tables in HBase.
• disable - Disables a table.
• is_disabled - Verifies whether a table is disabled.
• enable - Enables a table.
• is_enabled - Verifies whether a table is enabled.
• describe - Provides the description of a table.
• drop_all - Drops the tables matching the ‘regex’ given in the command.
• Java Admin API - In addition to the above shell commands, Java provides an Admin API to achieve DDL functionality through programming. HBaseAdmin (in the org.apache.hadoop.hbase.client package) and HTableDescriptor (in org.apache.hadoop.hbase) are two important classes that provide DDL functionality.
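A rough sketch of these commands in sequence from the shell (the emp table and its column families are example names only):
create 'emp', 'personal', 'professional'   # a table with two column families
list                                       # 'emp' now appears in the table listing
describe 'emp'                             # shows the column-family settings
disable 'emp'                              # a table must be disabled before it is dropped or altered
is_disabled 'emp'                          # => true
enable 'emp'
is_enabled 'emp'                           # => true
disable 'emp'
drop_all 'emp.*'                           # drops all disabled tables matching the regex (asks for confirmation)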
Data Manipulation Language:
• put - Puts a cell value at a specified column in a specified row in a particular table.
• get - Fetches the contents of a row or a cell.
• delete - Deletes a cell value in a table.
• deleteall - Deletes all the cells in a given row.
• scan - Scans and returns the table data.
• count - Counts and returns the number of rows in a table.
• truncate - Disables, drops, and recreates a specified table.
• Java client API - In addition to the above shell commands, Java provides a client API to achieve DML functionality, CRUD (Create, Retrieve, Update, Delete) operations, and more through programming, under the org.apache.hadoop.hbase.client package. HTable, Put, and Get are the important classes in this package.
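A minimal sketch of this client API, assuming an existing emp table with a personal column family (the same example names used elsewhere in these notes). It uses the older HTable-style API that matches the create-table example below; newer HBase releases use Connection, Table, and Put.addColumn instead:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
public class PutGetExample {
   public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      // Open a handle to the existing 'emp' table
      HTable table = new HTable(conf, "emp");
      // put: write one cell (row key, column family, qualifier, value)
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);
      // get: read the same cell back
      Get get = new Get(Bytes.toBytes("row1"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(value));
      table.close();
   }
}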
Table Creation in HBase
Creating a Table using HBase Shell
The create command takes the table name followed by one or more column-family names:
create 'emp', 'personal', 'professional'
Creating a Table Using the Java API
The same table can be created programmatically: a table descriptor is built with HTableDescriptor and HColumnDescriptor, and admin.createTable(tableDescriptor); executes the creation, as shown below.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class CreateTable {
   public static void main(String[] args) throws IOException {
      // Instantiating the configuration class
      Configuration con = HBaseConfiguration.create();
      // Instantiating the HBaseAdmin class
      HBaseAdmin admin = new HBaseAdmin(con);
      // Instantiating the table descriptor class
      HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("emp"));
      // Adding column families to the table descriptor
      tableDescriptor.addFamily(new HColumnDescriptor("personal"));
      tableDescriptor.addFamily(new HColumnDescriptor("professional"));
      // Creating the table through admin
      admin.createTable(tableDescriptor);
      System.out.println("Table created");
      admin.close();
   }
}
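After the program runs, the new table can be checked from the HBase shell: the list command should show emp, and describe 'emp' should report the personal and professional column families.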