HBase
HBase: Part of Hadoop’s Ecosystem
HBase: Overview
• HBase is a distributed column-oriented data store built on top of HDFS
• HBase is an Apache open-source project whose goal is to provide storage for the Hadoop distributed computing environment
• Data is logically organized into tables, rows and columns
HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently address the above points
• Fast record lookup
• Support for record-level insertion
• Support for updates (not in place)
• HBase internally uses hash tables to provide random access, and it stores the data in indexed HDFS files for faster lookups
HBase Data Model
• A column-oriented database stores data in cells grouped into columns, not rows
1. Table & 2. Row
• An HBase table consists of multiple rows. Each row has a row key and one or more columns with values assigned to them. HBase sorts rows alphabetically by row key.
• The main goal is to store data so that related rows are close together. A common row-key pattern is the domain of a site. For example, if our row keys are domains, we should store them in reverse, i.e. org.apache.www, org.apache.mail, or org.apache.jira. This way, all Apache domains are close to each other in the HBase table.
3. Column
• An HBase column consists of a column family and a column qualifier separated by the : (colon) character.
• a. Column family: A column family physically houses a set of columns and their values. Each column family has a set of storage properties, such as how its data is compressed, whether its values should be cached, how its row keys are encoded, and more. Each row in an HBase table has the same column families.
• b. Column qualifier: A column qualifier is added to a column family to provide an index for a given piece of data. Example: if the column family is content, then a column qualifier can be content:html or content:pdf. Column families are fixed at table creation, but column qualifiers are mutable and can vary widely between rows.
4. Cell
• A cell is essentially a combination of a row, a column family, and a column qualifier. It contains a value and a timestamp that represents the version of the value.
5. Timestamp
• A timestamp is an identifier for a given version of a value and is written alongside each value. By default, the timestamp represents the time on the RegionServer when the data was written. However, we can specify a different timestamp value when inserting data into a cell.
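The reversed-domain row-key pattern described above can be illustrated with a minimal sketch in plain Java (no HBase client involved). The class and method names are hypothetical, and a sorted set of strings stands in for HBase's row-key ordering, an assumption that is reasonable for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class RowKeySketch {
    // Reverse the dot-separated labels of a domain: "www.apache.org" becomes "org.apache.www".
    static String reverseDomain(String domain) {
        List<String> labels = new ArrayList<>(Arrays.asList(domain.split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    // Build reversed row keys and return them in sorted order,
    // as a stand-in for how a table keeps rows ordered by row key.
    static List<String> sortedRowKeys(List<String> domains) {
        TreeSet<String> keys = new TreeSet<>();
        for (String domain : domains) {
            keys.add(reverseDomain(domain));
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        // The three Apache hosts end up next to each other once the keys are reversed.
        System.out.println(sortedRowKeys(Arrays.asList(
            "www.apache.org", "www.cnn.com", "mail.apache.org", "jira.apache.org")));
    }
}
```

Without the reversal, www.apache.org and mail.apache.org would sort far apart, because their first labels differ.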
HBase: Keys and Column Families
Each record is divided into Column Families
[Figure labels: column family named “anchor”; column family named “Contents”; column named “apache.com”]
• Key
• Byte array
• Serves as the primary key for the table
• Indexed for fast lookup
• Column Family
• Has a name (string)
• Contains one or more related columns
• Column
• Belongs to one column family
• Included inside the row
• familyName:columnName
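As a small illustration of the familyName:columnName convention, the sketch below splits a full column name into its two parts. The class name is hypothetical, and splitting at the first colon is an assumption of this sketch rather than a statement about the HBase client.

```java
class ColumnNameSketch {
    // Split "familyName:columnName" at the first colon.
    // Returns a two-element array: { family, column }.
    static String[] split(String column) {
        int idx = column.indexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException(
                "expected familyName:columnName, got " + column);
        }
        return new String[] { column.substring(0, idx), column.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("anchor:apache.com");
        System.out.println("family=" + parts[0] + ", column=" + parts[1]);
    }
}
```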
[Figure label: version number for each row]
• Version Number
• Unique within each key value
• By default → system’s timestamp
• Data type is Long
• Value (Cell)
• Byte array
Notes on Data Model
• HBase schema consists of several Tables
• Each table consists of a set of Column Families
• Columns are not part of the schema
Notes on Data Model (Cont’d)
• The version number can be user-supplied
• Versions do not even have to be inserted in increasing order
• Version numbers are unique within each key
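The version rules above can be sketched in plain Java. This is only an in-memory model with hypothetical names, not the HBase client: a sorted map keyed by a long version number holds the versions of one cell, so versions may be supplied by the caller, arrive in any order, and stay unique.

```java
import java.util.Comparator;
import java.util.TreeMap;

class VersionedCellSketch {
    // Versions of a single cell, keyed by version number, highest version first.
    private final TreeMap<Long, String> versions =
        new TreeMap<>(Comparator.<Long>reverseOrder());

    // Store a value under a caller-supplied version number.
    // Re-using a version number replaces the old value, so versions stay unique.
    void put(long version, String value) {
        versions.put(version, value);
    }

    // Store a value using the system time as the default version number.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // Value with the highest version number, or null if nothing was stored.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(9L, "CNN");
        cell.put(3L, "older value");   // lower version inserted later, still accepted
        System.out.println(cell.latest());
    }
}
```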
HBase Physical Model
• Each column family is stored in its own files (called HFiles), separate from the other column families
• Key & Version numbers are replicated with each column family
Example
Column Families
HBase Regions
• Each table is partitioned horizontally, by row-key range, into regions
• Regions are the counterpart of HDFS blocks
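Horizontal partitioning into regions can be pictured with a minimal sketch, assuming each region is identified by its start key and covers all row keys up to the next region's start key. The class, region names, and boundaries are hypothetical and only model the idea in memory.

```java
import java.util.Map;
import java.util.TreeMap;

class RegionLookupSketch {
    // Start key of each region mapped to a region name.
    // A region covers the range [startKey, startKey of the next region).
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Find the region responsible for a row key: the one with the
    // greatest start key that is less than or equal to the row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");          // empty start key: beginning of the table
        table.addRegion("com.cnn", "region-2");
        table.addRegion("org", "region-3");
        System.out.println(table.regionFor("com.cnn.www"));
    }
}
```

Because rows are sorted by key, a range of adjacent row keys usually falls into a single region.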
HBase Architecture
Three Major Components
• The HBaseMaster
• One master
• The HRegionServer
• Many region servers
HBase Components
• Region
• A subset of a table’s rows, like horizontal range partitioning
• Automatically done
• Master
• Responsible for coordinating the slaves
• Assigns regions, detects failures
• Admin functions
https://www.analyticsvidhya.com/blog/2022/10/a-brief-introduction-to-apache-hbase-and-its-architecture/
Big Picture
ZooKeeper
• HBase depends on ZooKeeper
Creating a Table
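// Legacy HBase client API (HBaseAdmin, HTableDescriptor, HColumnDescriptor)
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// config is the HBase configuration object, assumed to be created elsewhere
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family (newer HBase releases reject ':' in family names)
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1");
column[1] = new HColumnDescriptor("columnFamily2");
// The table descriptor carries the table name and its column families
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);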
Operations On Regions: Get()
• Given a key → return corresponding record
Operations On Regions: Scan()
Get(): Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’
Scan(): Select value from table where anchor=‘cnnsi.com’

Example table (shown on both the Get() and Scan() slides):

Row key             Time Stamp   Column “anchor:”
“com.apache.www”    t12
                    t11
“com.cnn.www”       t9           “anchor:cnnsi.com”    “CNN”
                    t8           “anchor:my.look.ca”   “CNN.com”
                    t6
                    t5
                    t3
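The difference between the two operations can be sketched with an in-memory model in plain Java: Get() is a direct lookup by row key, while Scan() walks rows in key order and filters them. The class and method names are hypothetical, timestamps are left out for brevity, and the value stored under anchor:apache.com is a placeholder because it is not given in the example table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GetScanSketch {
    // Row key mapped to its columns ("family:qualifier" mapped to value), both sorted.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Get(): look up one row by key and return the value of one column, or null.
    String get(String rowKey, String column) {
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Scan(): visit every row in key order and keep the keys of rows
    // that contain the given column.
    List<String> scanRowsWithColumn(String column) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> entry : rows.entrySet()) {
            if (entry.getValue().containsKey(column)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        GetScanSketch table = new GetScanSketch();
        table.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        table.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        table.put("com.apache.www", "anchor:apache.com", "placeholder");  // value not in the slide
        System.out.println(table.get("com.apache.www", "anchor:apache.com"));
        System.out.println(table.scanRowsWithColumn("anchor:cnnsi.com"));
    }
}
```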
Operations On Regions: Put()
• Insert a new record (with a new key), Or
Operations On Regions: Delete()
• Multiple levels
• Can mark an entire column family as deleted
• Can mark all column families of a given row as deleted
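The idea of marking data as deleted at different levels can be sketched as follows. This is a simplified in-memory model with hypothetical names: it keeps one marker per column family plus one whole-row marker, and it ignores timestamps, which is an assumption made to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

class DeleteMarkerSketch {
    // Cells of one row: "family:qualifier" mapped to value.
    private final TreeMap<String, String> cells = new TreeMap<>();
    // Column families that have been marked as deleted.
    private final Set<String> deletedFamilies = new HashSet<>();
    // Set when all column families of the row are marked as deleted.
    private boolean rowDeleted = false;

    void put(String column, String value) {
        cells.put(column, value);
    }

    // Mark an entire column family as deleted; the stored cells are not touched.
    void deleteFamily(String family) {
        deletedFamilies.add(family);
    }

    // Mark every column family of this row as deleted.
    void deleteRow() {
        rowDeleted = true;
    }

    // Read a cell, hiding anything covered by a delete marker.
    // The column must have the form "family:qualifier".
    String get(String column) {
        String family = column.substring(0, column.indexOf(':'));
        if (rowDeleted || deletedFamilies.contains(family)) {
            return null;
        }
        return cells.get(column);
    }

    public static void main(String[] args) {
        DeleteMarkerSketch row = new DeleteMarkerSketch();
        row.put("anchor:cnnsi.com", "CNN");
        row.deleteFamily("anchor");
        System.out.println(row.get("anchor:cnnsi.com"));
    }
}
```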
HBase: Joins
• HBase does not support joins
Altering a Table
Logging Operations
HBase Deployment
[Figure labels: master node; slave nodes]
HBase vs. HDFS
HBase vs. RDBMS
When to use HBase
References
• https://www.bmc.com/blogs/hadoop-hbase/
• https://towardsdatascience.com/hbase-working-principle-a-part-of-hadoop-architecture-fbe0453a031b
• https://medium.com/hands-on-apache-hbase/an-introduction-to-apache-hbase-2cdd1d9ff13
• https://builtin.com/data-science/hbase
• https://www.analyticsvidhya.com/blog/2022/10/a-brief-introduction-to-apache-hbase-and-its-architecture/
• https://www.tutorialspoint.com/hbase/hbase_overview.htm
• https://www.geeksforgeeks.org/apache-hbase/
• https://www.simplilearn.com/tutorials/hadoop-tutorial/hbase