Uploaded by anoop6387276254
Copyright © All Rights Reserved
Unit 5 – HBase

Agenda

• HBase concepts
• HBase vs. RDBMS
• Advanced usage
• Schema design
• Advanced indexing
• ZooKeeper

Introduction
• HBase is a data model, similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data.

Limitations of Hadoop
• Hadoop can perform only batch processing, and data is accessed only sequentially. This means a job has to scan the entire dataset even for the simplest lookup.
• A different solution is therefore needed, one that can reach any piece of data directly without scanning the rest (random access).
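
To make the contrast concrete, here is a minimal Python sketch under simplifying assumptions: an in-memory list of (key, value) records stands in for a dataset, and the helper names `sequential_lookup` and `keyed_lookup` are hypothetical, not Hadoop or HBase APIs. A sequential read inspects every record until it reaches the target, while a store that keeps records sorted by key (as HBase does with row keys) can locate one with a binary search.

```python
from bisect import bisect_left


def sequential_lookup(records, key):
    """Read records in order until `key` is found; also report how many were inspected."""
    inspected = 0
    for k, v in records:
        inspected += 1
        if k == key:
            return v, inspected
    return None, inspected


def keyed_lookup(sorted_keys, values, key):
    """Binary-search a sorted key list and return the matching value, or None."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return None


# Hypothetical dataset: zero-padded keys, so string order equals numeric order.
records = [(f"row{n:06d}", f"value-{n}") for n in range(100_000)]
keys = [k for k, _ in records]
values = [v for _, v in records]

print(sequential_lookup(records, "row099999"))  # ('value-99999', 100000)
print(keyed_lookup(keys, values, "row099999"))  # value-99999
```

Reaching the last key takes 100,000 inspections sequentially but only about 17 comparisons with binary search. HBase's actual read path is more involved; the sketch only illustrates why keeping data sorted by key makes random access possible.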

What is HBase?

• HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS).
• It is an open-source project and is horizontally scalable.
• HBase is a data model, similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data.
• It leverages the fault tolerance provided by HDFS.
• Data can be stored in HDFS either directly or through HBase.
• HBase sits on top of HDFS and provides random read and write access to the data.

HBase Client

• HBase is written in Java and provides a Java API for communicating with it.
• The client API provides both DDL (data definition language) and DML (data manipulation language) semantics, much like those found in SQL for relational databases.

Storage Mechanism in HBase
• HBase is a column-oriented database, and the rows of its tables are sorted by row key.
• The table schema defines only column families; within each family, data is stored as key-value pairs (column qualifier and value).
• A table can have multiple column families, and each column family can have any number of columns.
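
One way to picture this storage model is as a sorted map of maps. The Python sketch below is a simplified model, not HBase itself: it assumes ASCII string keys and leaves out timestamps, versions, and on-disk storage, and the class name `ToyTable` and the sample values are hypothetical. Each row key maps to its column families, and each family maps any number of column qualifiers to values. Because rows are ordered by key bytes rather than numerically, row `"10"` comes before row `"2"`.

```python
class ToyTable:
    """Simplified HBase-style table: row key -> column family -> qualifier -> value.

    Timestamps, cell versions, and on-disk storage are deliberately left out.
    """

    def __init__(self):
        self.rows = {}

    def put(self, row, family, qualifier, value):
        # A family can hold any number of qualifiers; new ones appear on first write.
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def scan(self):
        # Yield rows sorted by row key. For ASCII keys, Python string order matches
        # the byte order HBase uses. Families and qualifiers are sorted as well,
        # since HBase also keeps the columns of a row in sorted order.
        for row in sorted(self.rows):
            families = self.rows[row]
            yield row, {f: dict(sorted(families[f].items())) for f in sorted(families)}


t = ToyTable()
t.put("2", "personal data", "name", "raju")
t.put("10", "personal data", "name", "ravi")
t.put("10", "personal data", "city", "hyderabad")
t.put("10", "professional data", "designation", "manager")

for row, families in t.scan():
    print(row, families)
# 10 {'personal data': {'city': 'hyderabad', 'name': 'ravi'}, 'professional data': {'designation': 'manager'}}
# 2 {'personal data': {'name': 'raju'}}
```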

HBase vs. RDBMS

Features of HBase

• HBase is linearly scalable.
• It supports automatic failover.
• It provides consistent reads and writes.
• It integrates with Hadoop MapReduce, serving as both a source and a destination for jobs.
• It has an easy-to-use Java API for clients.
• It provides data replication across clusters.

Where to Use HBase

• Data volume - HBase is worthwhile only when there is a very large volume of data (on the order of petabytes) to process in a distributed environment; otherwise it is a misuse of the framework.
• Application types - HBase is preferred when the schema is variable (rows have slightly different sets of columns) and the stored data is accessed by key.
• Hardware environment - HBase can be the right choice when enough hardware is available: HDFS works efficiently with a large number of nodes, and HBase runs on top of HDFS.
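
The "variable schema" point can be illustrated with a short Python sketch. It is a simplified comparison with hypothetical rows and helper names: a fixed-schema relational table reserves a slot for every column in every row (gaps become NULL), whereas an HBase-style row stores only the cells that were actually written.

```python
# Hypothetical rows in one table; each row has its own set of columns.
rows = {
    "user1": {"info:name": "raju", "info:city": "hyderabad"},
    "user2": {"info:name": "ravi", "info:email": "ravi@example.com", "info:phone": "555-0100"},
}


def relational_slots(rows):
    """Slots in a fixed-schema table: every row has every column, missing ones are NULL."""
    all_columns = {col for cols in rows.values() for col in cols}
    return len(rows) * len(all_columns)


def stored_cells(rows):
    """Cells an HBase-style table keeps: only the columns present in each row."""
    return sum(len(cols) for cols in rows.values())


print(relational_slots(rows), stored_cells(rows))  # 8 5
```

The gap widens as rows accumulate more optional columns, which is the kind of sparse, key-addressed data described above.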

Where to Use HBase

• No need for relational features - If features such as transactions, triggers, complex queries, and complex joins are not required, HBase is a good fit.
• Quick access to data - HBase suits workloads that need fast random reads and writes by key.

HBase Table Schema Design

• HBase schema design is very different from relational database schema design.
• A table is created with the create command, which takes the table name followed by its column families:

hbase(main):002:0> create 'emp', 'personal data', 'professional data'
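
The create command declares only the table name and its column families (`personal data` and `professional data`); columns inside a family are not declared in advance. A minimal Python sketch of that rule, using hypothetical `create` and `put` helpers rather than the HBase client: a write to a declared family succeeds with any qualifier, while a write to an undeclared family is rejected, much as HBase refuses writes to a column family the table does not have.

```python
def create(*families):
    """Hypothetical stand-in for `create`: only column families are declared."""
    return {"families": set(families), "rows": {}}


def put(table, row, column, value):
    """Hypothetical stand-in for `put`; `column` uses the shell's 'family:qualifier' form."""
    family = column.partition(":")[0]
    if family not in table["families"]:
        raise ValueError(f"column family '{family}' does not exist")
    table["rows"].setdefault(row, {})[column] = value


emp = create("personal data", "professional data")
put(emp, "1", "personal data:name", "raju")                # declared family
put(emp, "1", "professional data:designation", "manager")  # new qualifier, no schema change

try:
    put(emp, "1", "contact:phone", "555-0100")             # undeclared family
except ValueError as err:
    print(err)  # column family 'contact' does not exist
```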

Inserting Data using HBase Shell

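• Data is inserted into an HBase table with the following command and methods:
• the put command (HBase shell),
• the add() method of the Put class (Java API), and
• the put() method of the HTable class (Java API).
• In HBase 1.0 and later, Put.addColumn() and the Table interface obtained from a Connection are the recommended replacements for Put.add() and HTable.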

hbase(main):005:0> put 'emp','1','personal data:name','raju'
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
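
Each put writes one cell, addressed by table name, row key, `family:qualifier`, and value. The Python sketch below replays the two commands against a hypothetical in-memory model (not the HBase client) and reads the row back grouped by column family, roughly what the shell's `get 'emp','1'` would list, minus the timestamps HBase stores with every cell.

```python
# Hypothetical in-memory model: table name -> row key -> {"family:qualifier": value}
tables = {"emp": {}}


def put(table, row, column, value):
    """Write one cell, like the shell command put 'table','row','family:qualifier','value'."""
    tables[table].setdefault(row, {})[column] = value


def get(table, row):
    """Return all cells of one row, grouped by column family and sorted by column."""
    result = {}
    for column, value in sorted(tables[table].get(row, {}).items()):
        family, _, qualifier = column.partition(":")
        result.setdefault(family, {})[qualifier] = value
    return result


# The two shell commands from the slide, replayed in order.
put("emp", "1", "personal data:name", "raju")
put("emp", "1", "personal data:city", "hyderabad")

print(get("emp", "1"))  # {'personal data': {'city': 'hyderabad', 'name': 'raju'}}
```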

Q & A Time
We have 10 Minutes for Q&A
