Cs525: Special Topics in DBS: Large-Scale Data Management
HBase
HBase: Overview
• HBase is a distributed, column-oriented data store built on top of HDFS
HBase vs. HDFS (Cont’d)
HBase Data Model
HBase Data Model
• HBase is based on Google’s Bigtable model
• Key-Value pairs
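A key-value pair in this model can be pictured as a cell whose key combines row key, column family, column and version. The Java sketch below is a hypothetical model (Cell and CellOrderDemo are illustrative names, not HBase's own KeyValue class, and Strings stand in for byte arrays). It sorts cells in the order a Bigtable-style store keeps them: row, family and column ascending, with the newest version of a column first. The contents:html values in main are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one cell as a key-value pair; not HBase's own KeyValue class.
final class Cell {
    final String row, family, qualifier;   // HBase uses byte arrays; Strings for readability
    final long version;
    final String value;

    Cell(String row, String family, String qualifier, long version, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
        this.value = value;
    }

    // Row, family and column ascending; for the same column, the newest version first.
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.version).reversed());

    @Override
    public String toString() {
        return row + "/" + family + ":" + qualifier + "/" + version + " = " + value;
    }
}

class CellOrderDemo {
    // Returns a sorted copy, leaving the input list untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(Cell.ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("com.cnn.www", "contents", "html", 3, "<html>old"),  // made-up value
                new Cell("com.cnn.www", "anchor", "my.look.ca", 8, "CNN.com"),
                new Cell("com.cnn.www", "contents", "html", 6, "<html>new"),  // made-up value
                new Cell("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN"));
        sorted(cells).forEach(System.out::println);
    }
}
```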
HBase Logical View
HBase: Keys and Column Families
Each record is divided into Column Families
Column family named “anchor”
Column family named “Contents”
Column named “apache.com”
• Key
  • Byte array
  • Serves as the primary key for the table
  • Indexed for fast lookup
• Column Family
  • Has a name (string)
  • Contains one or more related columns
• Column
  • Belongs to one column family
  • Included inside the row
  • familyName:columnName
Version number for each row
• Version Number
  • Unique within each key
  • By default, the system’s timestamp
  • Data type is Long
• Value (Cell)
  • Byte array
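The pieces above nest naturally as sorted maps: row key, then column family, then column, then version, then value. A minimal sketch, assuming an in-memory stand-in rather than the HBase client API (FamilyTable and its methods are hypothetical, and Strings replace byte arrays): families are declared in the table schema, while columns inside a family appear as soon as a cell is written to familyName:columnName.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not the HBase client API: the logical data model as nested sorted maps,
// row key -> column family -> column -> version -> value.
// HBase keys and values are byte arrays; Strings keep the example readable.
class FamilyTable {
    private final Set<String> families;  // declared in the table schema
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    FamilyTable(Set<String> families) {
        this.families = Set.copyOf(families);
    }

    // Write one cell addressed as familyName:columnName; new column names need no declaration.
    void put(String rowKey, String column, long version, String value) {
        int sep = column.indexOf(':');
        if (sep < 0) throw new IllegalArgumentException("expected familyName:columnName: " + column);
        String family = column.substring(0, sep);
        if (!families.contains(family)) throw new IllegalArgumentException("unknown family: " + family);
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(column.substring(sep + 1), k -> new TreeMap<>())
            .put(version, value);
    }

    // Names of the columns a row currently holds in one family.
    Set<String> columns(String rowKey, String family) {
        Map<String, TreeMap<String, TreeMap<Long, String>>> row =
                rows.getOrDefault(rowKey, new TreeMap<>());
        return row.getOrDefault(family, new TreeMap<>()).keySet();
    }

    // Value with the highest version number, or null if the cell does not exist.
    String getLatest(String rowKey, String column) {
        int sep = column.indexOf(':');
        if (sep < 0) return null;
        String family = column.substring(0, sep);
        String qualifier = column.substring(sep + 1);
        if (!columns(rowKey, family).contains(qualifier)) return null;
        return rows.get(rowKey).get(family).get(qualifier).lastEntry().getValue();
    }
}
```

Writing to a family that was never declared (for example people:author on a table with only anchor and contents) is rejected, whereas any new column name under a declared family is accepted.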
Notes on Data Model
• An HBase schema consists of several tables
Notes on Data Model (Cont’d)
• The version number can be user-supplied
• Versions do not even have to be inserted in increasing order
• Version numbers are unique within each key
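A small sketch of these rules, assuming a hypothetical VersionedCell class that models the versions of one cell in memory (not HBase code): versions stay sorted newest first no matter the order in which they arrive, and writing an existing version number replaces that version's value instead of adding a second copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the versions of a single cell (row key + column); not HBase code.
class VersionedCell {
    // Kept sorted newest (largest version) first, whatever order the puts arrive in.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    // The caller supplies the version; re-using a number replaces that version's value,
    // so each version number appears at most once per cell.
    void put(long version, String value) {
        versions.put(version, value);
    }

    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    String at(long version) {
        return versions.get(version);
    }

    List<Long> versionsNewestFirst() {
        return new ArrayList<>(versions.keySet());
    }
}
```

For example, puts with versions 5, 2, 9 and then 5 again leave the versions 9, 5, 2, with the second value stored under 5.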
HBase Physical Model
• Each column family is stored in its own separate files (HFiles on HDFS)
• Key & Version numbers are replicated with each column family
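To make the replication concrete, here is a hypothetical sketch of the physical layout (FamilyStores is an illustrative name, and real HBase stores are sorted HFiles, not in-memory lists): each family gets its own store, and every stored entry repeats the row key and version it belongs to. A row with cells in two families therefore has its key written into both stores, while reading one family only touches that family's store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the physical layout (not HBase's file format): one store per column
// family, where every stored entry carries its own copy of the row key and version number.
class FamilyStores {
    static final class StoredEntry {
        final String rowKey, column;
        final long version;
        final String value;

        StoredEntry(String rowKey, String column, long version, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, List<StoredEntry>> storeByFamily = new LinkedHashMap<>();

    void put(String rowKey, String family, String column, long version, String value) {
        storeByFamily.computeIfAbsent(family, f -> new ArrayList<>())
                     .add(new StoredEntry(rowKey, column, version, value));
    }

    // Reading one family touches only that family's store.
    List<StoredEntry> store(String family) {
        return storeByFamily.getOrDefault(family, List.of());
    }

    // Number of physical copies of a row key across all family stores.
    long copiesOfRowKey(String rowKey) {
        return storeByFamily.values().stream()
                .flatMap(List::stream)
                .filter(e -> e.rowKey.equals(rowKey))
                .count();
    }
}
```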
Example
Column Families
HBase Regions
• Each table is partitioned horizontally into regions (contiguous ranges of row keys); within a region, each column family keeps its own separate store
• Regions are the counterpart of HDFS blocks
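A region can be located from its start key. The sketch below assumes a hypothetical RegionMap in which the first region starts at the empty key and each split key starts the next region; the region holding a row is the one with the greatest start key not exceeding the row key, which TreeMap.floorEntry finds directly.

```java
import java.util.TreeMap;

// Hypothetical sketch: a table's row-key space split into regions, each identified by its
// start key. The first region starts at the empty key, so every row falls into some region.
// Region names are illustrative; HBase compares keys as unsigned bytes, which matches
// String order for the ASCII keys used here.
class RegionMap {
    private final TreeMap<String, String> regionByStartKey = new TreeMap<>();

    RegionMap(String... splitKeys) {
        regionByStartKey.put("", "region-0");
        for (int i = 0; i < splitKeys.length; i++) {
            regionByStartKey.put(splitKeys[i], "region-" + (i + 1));
        }
    }

    // The region holding a row is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        return regionByStartKey.floorEntry(rowKey).getValue();
    }
}
```

With split keys "com.cnn.www" and "org.apache", the row "com.apache.www" lands in region-0 and "net.example" in region-1.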
HBase Architecture
Three Major Components
• The HBaseMaster
  • One master
• The HRegionServer
  • Many region servers
• The HBase client
HBase Components
• Region
  • A subset of a table’s rows, like horizontal range partitioning
  • Partitioning is done automatically
• Master
  • Responsible for coordinating the slaves
  • Assigns regions, detects failures
  • Admin functions
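The master's bookkeeping can be pictured as a map from region to region server. This is a simplified illustration under stated assumptions, not HBase's actual assignment or balancing logic: the hypothetical MasterSketch spreads regions round-robin over the live region servers and, when told that a server has failed, hands that server's regions to the survivors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the master's bookkeeping; not HBase's real assignment or balancing code.
class MasterSketch {
    private final List<String> liveServers = new ArrayList<>();
    private final Map<String, String> serverByRegion = new TreeMap<>();  // region -> server
    private int next = 0;  // round-robin cursor

    MasterSketch(List<String> servers) {
        liveServers.addAll(servers);
    }

    private String pickServer() {
        if (liveServers.isEmpty()) throw new IllegalStateException("no live region servers");
        String server = liveServers.get(next % liveServers.size());
        next++;
        return server;
    }

    // Assign a region to the next live server in round-robin order.
    void assign(String region) {
        serverByRegion.put(region, pickServer());
    }

    // Failure detected: drop the server and reassign each of its regions.
    void serverFailed(String server) {
        liveServers.remove(server);
        for (Map.Entry<String, String> e : serverByRegion.entrySet()) {
            if (e.getValue().equals(server)) e.setValue(pickServer());
        }
    }

    String serverOf(String region) {
        return serverByRegion.get(region);
    }
}
```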
Big Picture
ZooKeeper
• HBase depends on ZooKeeper
Creating a Table
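import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Uses the older client API shown on this slide (HBaseAdmin, HTableDescriptor);
// HBase 2.x replaces it with Admin and TableDescriptorBuilder.
Configuration config = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
// Family names must not contain ':' (the colon only separates family and column names)
desc.addFamily(new HColumnDescriptor("columnFamily1"));
desc.addFamily(new HColumnDescriptor("columnFamily2"));
admin.createTable(desc);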
Operations On Regions: Get()
• Given a key → return the corresponding record
Operations On Regions: Scan()
Get(): Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t5
                   t3
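The Get() operation can be mimicked in memory. The hypothetical GetSketch below (not the HBase client API) loads the two anchor: cells of “com.cnn.www” from the table, writing t9 and t8 as versions 9 and 8. A Get on a row returns each of its columns with the newest value, and a Get on a row plus a column returns a single cell.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of Get() (not the HBase client API): fetch one row by its key,
// optionally narrowing to one column; each column returns its newest version.
class GetSketch {
    // row key -> "family:column" -> version -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> table = new TreeMap<>();

    void put(String row, String column, long version, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>())
             .computeIfAbsent(column, k -> new TreeMap<>())
             .put(version, value);
    }

    // Get(row): every column of the row with its newest value; empty if the key is absent.
    Map<String, String> get(String row) {
        Map<String, String> result = new TreeMap<>();
        table.getOrDefault(row, new TreeMap<>())
             .forEach((column, versions) -> result.put(column, versions.lastEntry().getValue()));
        return result;
    }

    // Get(row, column): newest value of a single cell, or null.
    String get(String row, String column) {
        return get(row).get(column);
    }

    public static void main(String[] args) {
        GetSketch t = new GetSketch();
        t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN");       // t9
        t.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com");  // t8
        System.out.println(t.get("com.cnn.www"));  // {anchor:cnnsi.com=CNN, anchor:my.look.ca=CNN.com}
        System.out.println(t.get("com.cnn.www", "anchor:cnnsi.com"));  // CNN
    }
}
```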
Scan(): Select value from table where anchor=‘cnnsi.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t5
                   t3
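A Scan() with a column condition can be sketched in the same spirit. ScanSketch is a hypothetical in-memory stand-in (not the HBase client API): it walks rows in key order over a [startRow, stopRow) range and keeps only the rows that hold the requested column, so scanning the whole table for anchor:cnnsi.com yields “com.cnn.www” with the value “CNN”. The contents:html value is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of Scan() with a column filter (not the HBase client API):
// walk rows in sorted key order over [startRow, stopRow) and keep rows that hold the column.
class ScanSketch {
    // row key -> "family:column" -> newest value (versions left out to focus on scanning)
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // stopRow == null means "scan to the end of the table".
    Map<String, String> scan(String startRow, String stopRow, String column) {
        NavigableMap<String, Map<String, String>> range = (stopRow == null)
                ? rows.tailMap(startRow, true)
                : rows.subMap(startRow, true, stopRow, false);
        Map<String, String> hits = new LinkedHashMap<>();  // preserves row-key order
        for (Map.Entry<String, Map<String, String>> e : range.entrySet()) {
            String value = e.getValue().get(column);
            if (value != null) hits.put(e.getKey(), value);
        }
        return hits;
    }
}
```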
Operations On Regions: Put()
• Insert a new record (with a new key), or
• Insert a record for an existing key
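The difference between the two cases can be sketched as follows, assuming a hypothetical in-memory PutSketch rather than the HBase client API: a put to a new key creates the row, while a put to an existing key adds a new version of only the cells it names and leaves the row's other columns alone. This sketch keeps every version, whereas HBase keeps up to the maximum number of versions configured for each column family. The example values are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of Put() (not the HBase client API). For a new key a row is
// created; for an existing key only the named cells receive a new version.
// Unlike HBase, which caps versions per column family, this sketch keeps every version.
class PutSketch {
    // row key -> "family:column" -> version -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Returns true if the put created a new row.
    boolean put(String row, Map<String, String> cells, long version) {
        boolean created = !rows.containsKey(row);
        TreeMap<String, TreeMap<Long, String>> r = rows.computeIfAbsent(row, k -> new TreeMap<>());
        cells.forEach((column, value) ->
                r.computeIfAbsent(column, k -> new TreeMap<>()).put(version, value));
        return created;
    }

    String latest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> r = rows.get(row);
        if (r == null || !r.containsKey(column)) return null;
        return r.get(column).lastEntry().getValue();
    }

    int versionCount(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> r = rows.get(row);
        return (r == null || !r.containsKey(column)) ? 0 : r.get(column).size();
    }
}
```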
Operations On Regions: Delete()
• Marking table cells as deleted
• Multiple levels
  • Can mark an entire column family as deleted
  • Can mark all column families of a given row as deleted
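Because a delete only marks data, it can be modelled as markers (tombstones) consulted at read time. The following is a hypothetical sketch (TombstoneTable is an illustrative name, not HBase code) that records markers at cell, family and row level and hides anything a marker covers. It ignores timestamps, so a marker here is permanent; in HBase, data covered by delete markers is physically removed later during compaction.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of Delete() (not HBase code): a delete only records a marker (tombstone)
// at cell, family, or row level; reads skip whatever a marker covers, while the data itself
// stays stored. Timestamps are ignored, so markers here are permanent.
class TombstoneTable {
    // row -> "family:column" -> value (one version per cell to keep the sketch small)
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();
    private final Set<String> deletedCells = new HashSet<>();     // "row/family:column"
    private final Set<String> deletedFamilies = new HashSet<>();  // "row/family"
    private final Set<String> deletedRows = new HashSet<>();      // "row"

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    void deleteCell(String row, String column) { deletedCells.add(row + "/" + column); }

    void deleteFamily(String row, String family) { deletedFamilies.add(row + "/" + family); }

    // Covers all column families of the row.
    void deleteRow(String row) { deletedRows.add(row); }

    // Columns are written as family:column.
    String get(String row, String column) {
        String family = column.substring(0, column.indexOf(':'));
        boolean masked = deletedRows.contains(row)
                || deletedFamilies.contains(row + "/" + family)
                || deletedCells.contains(row + "/" + column);
        if (masked) return null;
        TreeMap<String, String> r = rows.get(row);
        return r == null ? null : r.get(column);
    }

    // True while the value is still physically present, even if a marker hides it.
    boolean physicallyStored(String row, String column) {
        TreeMap<String, String> r = rows.get(row);
        return r != null && r.containsKey(column);
    }
}
```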
HBase: Joins
• HBase does not support joins
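Joins therefore have to be done by the application. One common pattern, shown here as an assumption rather than anything HBase provides, is to scan one table and issue a Get per row against the other. The two maps stand in for hypothetical orders and customers tables keyed by row key; the result has inner-join semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical application-side join (HBase itself offers no join operator).
// The maps stand in for two HBase tables keyed by row key.
class ClientSideJoin {
    // Inner join: for each scanned order, look up its customer by row key.
    static List<String> join(Map<String, String> orders,       // orderId -> customerId
                             Map<String, String> customers) {  // customerId -> name
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> order : orders.entrySet()) {  // plays the role of a Scan
            String name = customers.get(order.getValue());           // plays the role of a Get
            if (name != null) out.add(order.getKey() + "," + name);
        }
        return out;
    }
}
```

Each scanned row costs one extra lookup, so this pattern suits cases where one side is small or the lookups hit row keys directly.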
Altering a Table
Logging Operations
HBase Deployment
Figure: master node and slave nodes
HBase vs. HDFS
HBase vs. RDBMS
When to use HBase