
HBase

HBase is a distributed column-oriented database built on HDFS. It provides Bigtable-style capabilities for Hadoop, including fast random reads and writes and incremental data loading. HBase partitions tables into regions that are distributed across region servers for scalability. The HBase master coordinates region assignments and failures across the region servers.

Uploaded by

Chris Harris


CS525: Special Topics in DBs

Large-Scale Data Management

HBase

Spring 2013
WPI, Mohamed Eltabakh

HBase: Overview
• HBase is a distributed column-oriented data store built on top of HDFS

• HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing

• Data is logically organized into tables, rows, and columns

HBase: Part of Hadoop’s Ecosystem

HBase is built on top of HDFS

HBase files are internally stored in HDFS

HBase vs. HDFS
• Both are distributed systems that scale to hundreds or thousands of nodes

• HDFS is good for batch processing (scans over big files)

• Not good for record lookup
• Not good for incremental addition of small batches
• Not good for updates

HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently address the above points
• Fast record lookup
• Support for record-level insertion
• Support for updates (not in place)

• HBase updates are done by creating new versions of values

HBase vs. HDFS (Cont’d)

If an application needs neither random reads nor random writes → stick to HDFS

HBase Data Model

HBase Data Model
• HBase is based on Google’s Bigtable model
• Key-Value pairs

[Figure labels: Row key, Column Family, TimeStamp, value]
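
One way to picture this key-value model is as a sorted map of maps: row key, then column, then timestamp, then value. The sketch below is only an in-memory illustration in plain Java. It does not use the HBase client API, the class `CellModel` and its methods are hypothetical, and Strings stand in for the byte arrays HBase actually stores.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory illustration of the logical model:
// row key -> "family:qualifier" -> timestamp -> value.
// Hypothetical class; not the HBase client API.
class CellModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Store one value under (row key, column, timestamp).
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Return the value with the highest timestamp, or null if the cell is empty.
    String latest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) return null;
        NavigableMap<Long, String> versions = row.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        CellModel table = new CellModel();
        table.put("com.cnn.www", "contents:", 5L, "<html>v5");
        table.put("com.cnn.www", "contents:", 6L, "<html>v6");
        table.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        System.out.println(table.latest("com.cnn.www", "contents:"));         // newest version
        System.out.println(table.latest("com.cnn.www", "anchor:my.look.ca")); // empty cell
    }
}
```

Keeping every level sorted is what makes both key lookup and "newest version" cheap in this toy model.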

HBase Logical View

HBase: Keys and Column Families
Each record is divided into Column Families

Each row has a Key

Each column family consists of one or more Columns

Callouts: column family named “anchor”; column family named “Contents”; column named “apache.com”

• Key
  • Byte array
  • Serves as the primary key for the table
  • Indexed for fast lookup
• Column Family
  • Has a name (string)
  • Contains one or more related columns
• Column
  • Belongs to one column family
  • Included inside the row
  • familyName:columnName

Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”  “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”   “CNN”
                   t13                               “anchor:my.look.ca”  “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

Callouts: “Version number for each row”; “value”

• Version Number
  • Unique within each key
  • By default, the system’s timestamp
  • Data type is Long
• Value (Cell)
  • Byte array

Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”  “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”   “CNN”
                   t13                               “anchor:my.look.ca”  “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

Notes on Data Model
• HBase schema consists of several Tables
• Each table consists of a set of Column Families
• Columns are not part of the schema

• HBase has Dynamic Columns

• Because column names are encoded inside the cells
• Different cells can have different columns

“Roles” column family has different columns in different cells

Notes on Data Model (Cont’d)
• The version number can be user-supplied
  • Versions do not even have to be inserted in increasing order
  • Version numbers are unique within each key

• Table can be very sparse
  • Many cells are empty

Has two columns [cnnsi.com & my.look.ca]

• Keys are indexed as the primary key

HBase Physical Model

HBase Physical Model
• Each column family is stored in its own set of files (called HFiles)

• Key & Version numbers are replicated with each column family

• Empty cells are not stored
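
The three points above can be sketched as one list of stored entries per column family. This is a simplified in-memory illustration in plain Java, not HBase's actual file format. The class `FamilyStores` and the textual entry layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy physical layout: one list of entries per column family. Every stored entry
// repeats the row key and version; an empty cell simply has no entry.
// Hypothetical class; real HBase files are more elaborate.
class FamilyStores {
    private final Map<String, List<String>> stores = new TreeMap<>();

    void put(String rowKey, String family, String qualifier, long version, String value) {
        String entry = rowKey + "/" + family + ":" + qualifier + "/" + version + "=" + value;
        stores.computeIfAbsent(family, f -> new ArrayList<>()).add(entry);
    }

    // Entries stored for one family (empty list if nothing was ever written to it).
    List<String> entries(String family) {
        return stores.getOrDefault(family, Collections.emptyList());
    }

    public static void main(String[] args) {
        FamilyStores s = new FamilyStores();
        s.put("com.cnn.www", "anchor", "cnnsi.com", 9L, "CNN");
        s.put("com.cnn.www", "contents", "", 6L, "<html>...");
        System.out.println(s.entries("anchor"));
        System.out.println(s.entries("contents"));
    }
}
```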

Example

Column Families

HBase Regions
• Each table is partitioned horizontally, by row-key range, into regions
• Regions are the counterpart of HDFS blocks

Each will be one region
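
Because a region covers a contiguous range of row keys, finding the region for a key amounts to a "largest start key not greater than the row key" lookup. The following plain-Java sketch illustrates that idea with a sorted map. The class `RegionDirectory` and the server names are hypothetical, and this is not how the HBase client is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy region directory: each region covers the row keys from its start key
// (inclusive) up to the next region's start key (exclusive).
// Hypothetical class and server names; not the HBase client API.
class RegionDirectory {
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // Find the server whose region contains rowKey, or null if no region
    // starts at or before that key.
    String locate(String rowKey) {
        Map.Entry<String, String> region = startKeyToServer.floorEntry(rowKey);
        return region == null ? null : region.getValue();
    }

    public static void main(String[] args) {
        RegionDirectory dir = new RegionDirectory();
        dir.addRegion("", "server-1");      // first region starts at the empty key
        dir.addRegion("com.c", "server-2");
        dir.addRegion("org", "server-3");
        System.out.println(dir.locate("com.apache.www"));
        System.out.println(dir.locate("com.cnn.www"));
    }
}
```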

HBase Architecture

Three Major Components
• The HBaseMaster
• One master

• The HRegionServer
• Many region servers

• The HBase client

HBase Components
• Region
• A subset of a table’s rows, like horizontal range partitioning
• Automatically done

• RegionServer (many slaves)

• Manages data regions
• Serves data for reads and writes (using a log)

• Master
• Responsible for coordinating the slaves
• Assigns regions, detects failures
• Admin functions

Big Picture

ZooKeeper
• HBase depends on ZooKeeper

• By default HBase manages the ZooKeeper instance
  • E.g., starts and stops ZooKeeper

• HMaster and HRegionServers

Creating a Table
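import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Uses the older (pre-1.0) client classes shown on this slide.
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
// One descriptor per column family; the family name is given without a trailing ':'.
desc.addFamily(new HColumnDescriptor("columnFamily1"));
desc.addFamily(new HColumnDescriptor("columnFamily2"));
admin.createTable(desc);
admin.close();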

Operations On Regions: Get()
• Given a key → return the corresponding record

• For each value return the highest version

• Can control the number of versions you want
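
The version behaviour of Get() can be sketched with a single cell that keeps its values sorted by timestamp. This is a plain-Java illustration only. The class `VersionedCell` and the `maxVersions` parameter name are assumptions for the example, not the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy versioned cell: timestamp -> value, kept in ascending timestamp order.
// Hypothetical class illustrating get() semantics; not the HBase client API.
class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Return up to maxVersions values, newest first.
    List<String> get(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.descendingMap().values()) {
            if (out.size() >= maxVersions) break;
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(3L, "<html>v3");
        cell.put(5L, "<html>v5");
        cell.put(6L, "<html>v6");
        System.out.println(cell.get(1)); // only the highest version
        System.out.println(cell.get(2)); // the two most recent versions
    }
}
```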

Operations On Regions: Scan()

Get(): Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’

Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”  “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”   “CNN”
                   t8           “anchor:my.look.ca”  “CNN.com”
                   t6
                   t3
Scan(): Select value from table where anchor=‘cnnsi.com’

Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”  “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”   “CNN”
                   t8           “anchor:my.look.ca”  “CNN.com”
                   t6
                   t3
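
Unlike Get(), a Scan() is not given a row key, so it has to walk the rows and keep the ones that match. A minimal plain-Java sketch of that idea follows. The class `ScanDemo` is hypothetical and ignores versions for brevity. It is not the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy scan(): walk rows in row-key order and keep the values stored under one column.
// Hypothetical class; not the HBase client API.
class ScanDemo {
    // row key -> ("family:qualifier" -> value)
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Return "rowKey=value" for every row that has a value in the given column.
    List<String> scan(String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) hits.add(row.getKey() + "=" + value);
        }
        return hits;
    }

    public static void main(String[] args) {
        ScanDemo table = new ScanDemo();
        table.put("com.apache.www", "anchor:apache.com", "APACHE");
        table.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        table.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        System.out.println(table.scan("anchor:cnnsi.com"));
    }
}
```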
Operations On Regions: Put()
• Insert a new record (with a new key), or

• Insert a record for an existing key

Implicit version number (timestamp)

Explicit version number
Operations On Regions: Delete()

• Marking table cells as deleted

• Multiple levels
• Can mark an entire column family as deleted
• Can mark all column families of a given row as deleted
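
Marking data as deleted, rather than removing it immediately, can be sketched with delete markers that reads consult. The plain-Java example below is a deliberate simplification: the class `TombstoneRow` is hypothetical, versions are ignored, and a marker hides everything it covers, including later writes. It is not the HBase client API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy row with marker-based deletes: data is only marked as deleted and hidden
// from reads. Hypothetical class; not the HBase client API.
class TombstoneRow {
    private final Map<String, String> cells = new TreeMap<>(); // "family:qualifier" -> value
    private final Set<String> deletedFamilies = new HashSet<>();
    private boolean rowDeleted = false;

    void put(String column, String value) {
        cells.put(column, value);
    }

    // Mark one column family of this row as deleted.
    void deleteFamily(String family) {
        deletedFamilies.add(family);
    }

    // Mark all column families of this row as deleted.
    void deleteRow() {
        rowDeleted = true;
    }

    // Reads skip anything covered by a delete marker.
    String get(String column) {
        int colon = column.indexOf(':');
        String family = colon < 0 ? column : column.substring(0, colon);
        if (rowDeleted || deletedFamilies.contains(family)) return null;
        return cells.get(column);
    }

    public static void main(String[] args) {
        TombstoneRow row = new TombstoneRow();
        row.put("anchor:cnnsi.com", "CNN");
        row.put("contents:", "<html>...");
        row.deleteFamily("anchor");
        System.out.println(row.get("anchor:cnnsi.com")); // hidden by the family marker
        System.out.println(row.get("contents:"));        // still visible
    }
}
```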

HBase: Joins
• HBase does not support joins

• Can be done in the application layer

• Using scan() and get() operations
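
An application-layer join typically scans one table and, for each row, looks up the matching row in the other table by key. The plain-Java sketch below shows that pattern on two in-memory maps. The class `AppJoin`, the table contents, and the output format are all hypothetical, and map iteration and lookup stand in for scan() and get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Application-side join over two toy tables (maps keyed by row key).
// Hypothetical class and data; not the HBase client API.
class AppJoin {
    // orders: order id -> customer id; customers: customer id -> customer name.
    static List<String> join(TreeMap<String, String> orders, Map<String, String> customers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> order : orders.entrySet()) { // plays the role of scan()
            String name = customers.get(order.getValue());          // plays the role of get()
            if (name != null) {
                out.add(order.getKey() + " -> " + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> orders = new TreeMap<>();
        orders.put("o1", "c1");
        orders.put("o2", "c2");
        Map<String, String> customers = new TreeMap<>();
        customers.put("c1", "Ann");
        customers.put("c2", "Bob");
        System.out.println(join(orders, customers));
    }
}
```

Each matching step costs one lookup, so this approach suits joins where the scanned side is small or already filtered.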

Altering a Table

Logging Operations

HBase Deployment

Master node

Slave nodes

HBase vs. HDFS

HBase vs. RDBMS

When to use HBase
