Cs525: Special Topics in DBS: Large-Scale Data Management

This document provides an overview of HBase, an open source, distributed, column-oriented database built on top of HDFS. It describes HBase's data model using tables, rows, columns and versions, its logical and physical storage layout, architecture involving a master and region servers, basic operations like get, put and scan, and how it compares to HDFS and relational databases. The key aspects covered are its scalability for large datasets, real-time random read/write capabilities and suitability for applications with large amounts of structured or semi-structured data.

Uploaded by

Woya Ma

CS525: Special Topics in DBs

Large-Scale Data Management

HBase

HBase: Overview
• HBase is a distributed column-oriented data store built on top of HDFS

• HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing

• Data is logically organized into tables, rows and columns

HBase vs. HDFS (Cont’d)

If the application needs neither random reads nor random writes → stick to HDFS

HBase Data Model

• HBase is based on Google’s Bigtable model
• Key-Value pairs

HBase Logical View

HBase: Keys and Column Families
Each record is divided into Column Families

Each row has a Key

Each column family consists of one or more Columns

• Key
• Byte array
• Serves as the primary key for the table
• Indexed for fast lookup

• Column Family
• Has a name (string)
• Contains one or more related columns

• Column
• Belongs to one column family
• Included inside the row
• familyName:columnName

Figure callouts: column family named “anchor”; column family named “Contents”; column named “apache.com”

• Version Number
• Unique within each key
• By default → system’s timestamp
• Data type is Long

• Value (Cell)
• Byte array

Figure callouts: version number for each row; value

Notes on Data Model
• HBase schema consists of several Tables

• Each table consists of a set of Column Families

• Columns are not part of the schema

• HBase has Dynamic Columns

• Because column names are encoded inside the cells
• Different cells can have different columns

Figure callout: the “Roles” column family has different columns in different cells
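
One way to picture the data model in these notes is as a sorted map of maps: row key, then column family, then column name, then version, then value. The sketch below models that with plain JDK collections. It is not the HBase API: the names `DataModelSketch`, `put`, and `columnsOf` and the row keys are made up for illustration, and strings stand in for the byte arrays HBase actually stores.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;

// Illustrative model only: HBase stores keys and values as byte arrays;
// strings are used here to keep the sketch readable.
class DataModelSketch {
    // row key -> column family -> column name -> version -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    // Writing a cell creates the column on the fly
    // (this sketch does not enforce a column-family schema).
    void put(String rowKey, String family, String column, long version, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .put(version, value);
    }

    // Columns present in one row of one family; empty cells simply have no entry.
    Set<String> columnsOf(String rowKey, String family) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(rowKey);
        if (families == null || !families.containsKey(family)) {
            return Collections.emptySet();
        }
        return families.get(family).keySet();
    }

    public static void main(String[] args) {
        DataModelSketch table = new DataModelSketch();
        // Two rows in the same "roles" family carry different columns.
        table.put("user1", "roles", "admin", 1L, "yes");
        table.put("user2", "roles", "editor", 1L, "yes");
        table.put("user2", "roles", "reviewer", 2L, "yes");
        System.out.println(table.columnsOf("user1", "roles"));
        System.out.println(table.columnsOf("user2", "roles"));
    }
}
```

Because absent cells have no map entry at all, a sparse table costs nothing for the cells it does not contain.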

Notes on Data Model (Cont’d)
• The version number can be user-supplied
• It does not even have to be inserted in increasing order
• Version numbers are unique within each key

• Table can be very sparse
• Many cells are empty

Figure callout: has two columns [cnnsi.com & my.look.ca]

• Keys are indexed as the primary key

HBase Physical Model

• Each column family is stored in separate files (HFiles)

• Key & version numbers are replicated with each column family

• Empty cells are not stored

HBase maintains a multi-level index on values: <key, column family, column name, timestamp>
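
The index order named above can be illustrated with a comparator over the four components. This is a minimal sketch with plain JDK types, not HBase's own classes; `CellKey` and `ORDER` are hypothetical names. It assumes that, within a column, newer timestamps sort before older ones, so the latest version is found first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the <key, column family, column name, timestamp> tuple.
class CellKey {
    final String rowKey;
    final String family;
    final String column;
    final long timestamp;

    CellKey(String rowKey, String family, String column, long timestamp) {
        this.rowKey = rowKey;
        this.family = family;
        this.column = column;
        this.timestamp = timestamp;
    }

    // Row key, family, and column ascending; timestamp descending (newest first).
    static final Comparator<CellKey> ORDER =
            Comparator.comparing((CellKey c) -> c.rowKey)
                      .thenComparing(c -> c.family)
                      .thenComparing(c -> c.column)
                      .thenComparing((a, b) -> Long.compare(b.timestamp, a.timestamp));

    @Override
    public String toString() {
        return rowKey + "/" + family + ":" + column + "@" + timestamp;
    }

    public static void main(String[] args) {
        List<CellKey> cells = new ArrayList<>();
        cells.add(new CellKey("com.cnn.www", "anchor", "my.look.ca", 8));
        cells.add(new CellKey("com.apache.www", "anchor", "apache.com", 10));
        cells.add(new CellKey("com.cnn.www", "anchor", "cnnsi.com", 9));
        cells.add(new CellKey("com.cnn.www", "anchor", "cnnsi.com", 12));
        cells.sort(ORDER);
        cells.forEach(System.out::println);
    }
}
```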

Example

Column Families

HBase Regions
• Each table, with all of its column families, is partitioned horizontally into regions
• Regions are the counterpart to HDFS blocks

Figure callout: each will be one region

HBase Architecture

Three Major Components

• The HBaseMaster
• One master

• The HRegionServer
• Many region servers

• The HBase client

HBase Components
• Region
• A subset of a table’s rows, like horizontal range partitioning
• Automatically done

• RegionServer (many slaves)

• Manages data regions
• Serves data for reads and writes (using a log)

• Master
• Responsible for coordinating the slaves
• Assigns regions, detects failures
• Admin functions
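
Because each region covers a contiguous range of row keys, finding the region for a key amounts to a floor lookup on the regions' start keys. The sketch below shows that idea with a `TreeMap`. The server names and split points are invented, and a real HBase client locates regions through HBase's own catalog metadata rather than a local map like this one.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of horizontal range partitioning: each region owns [startKey, nextStartKey).
class RegionLookupSketch {
    // region start key -> name of the region server holding it (names are hypothetical)
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionLookupSketch() {
        regionsByStartKey.put("", "regionserver-1");   // first region starts at the empty key
        regionsByStartKey.put("g", "regionserver-2");
        regionsByStartKey.put("p", "regionserver-3");
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch lookup = new RegionLookupSketch();
        System.out.println(lookup.serverFor("com.apache.www")); // regionserver-1
        System.out.println(lookup.serverFor("org.example"));    // regionserver-2
    }
}
```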

Big Picture

ZooKeeper
• HBase depends on ZooKeeper

• By default HBase manages the ZooKeeper instance
• E.g., starts and stops ZooKeeper

• HMaster and HRegionServers

Creating a Table
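// Uses the older HBase client API (HBaseAdmin constructor, HTableDescriptor)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

Configuration config = HBaseConfiguration.create(); // assumed setup: loads hbase-site.xml from the classpath
HBaseAdmin admin = new HBaseAdmin(config);

// Only column families are declared; columns are added when data is written
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1"); // family name only, without the ':' separator
column[1] = new HColumnDescriptor("columnFamily2");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);
admin.close();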

Operations On Regions: Get()
• Given a key → return the corresponding record

• For each value, return the highest version

• Can control the number of versions you want

Operations On Regions: Scan()

Get() example
Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t3

Scan() example (same table as above)
Select value from table where anchor=‘cnnsi.com’

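
The Get() and Scan() queries above can be mimicked on an in-memory model of the example table. This is a sketch with JDK collections, not the HBase client API; `GetScanSketch`, `get`, and `scan` are hypothetical names, and the symbolic timestamps t8, t9, and t10 are replaced by the numbers 8, 9, and 10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GetScanSketch {
    // row key -> "family:column" -> timestamp -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Get(): direct lookup by row key, returning the highest version of one column.
    String get(String rowKey, String column) {
        TreeMap<String, TreeMap<Long, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return null;
        }
        return row.get(column).lastEntry().getValue();
    }

    // Scan(): walk every row in key order and collect the latest value of one column.
    List<String> scan(String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, TreeMap<Long, String>>> row : rows.entrySet()) {
            TreeMap<Long, String> versions = row.getValue().get(column);
            if (versions != null) {
                hits.add(row.getKey() + " -> " + versions.lastEntry().getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        GetScanSketch table = new GetScanSketch();
        table.put("com.apache.www", "anchor:apache.com", 10L, "APACHE");
        table.put("com.cnn.www", "anchor:cnnsi.com", 9L, "CNN");
        table.put("com.cnn.www", "anchor:my.look.ca", 8L, "CNN.com");
        System.out.println(table.get("com.apache.www", "anchor:apache.com"));
        System.out.println(table.scan("anchor:cnnsi.com"));
    }
}
```

The contrast is the point of the sketch: the get touches one row through its key, while the scan has to visit every row because the condition is not on the row key.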
Operations On Regions: Put()
• Insert a new record (with a new key), or

• Insert a record for an existing key

Figure callouts: implicit version number (timestamp); explicit version number

Operations On Regions: Delete()
• Marking table cells as deleted

• Multiple levels
• Can mark an entire column family as deleted
• Can mark all column families of a given row as deleted

• All operations are logged by the RegionServers

• The log is flushed periodically
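
The idea of marking cells as deleted, rather than erasing them, can be sketched as follows. This is an illustrative in-memory model of a single row, not HBase code; `DeleteMarkerSketch` and its methods are hypothetical. It models only a column-level marker and assumes the marker hides every version at or below the marker's timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DeleteMarkerSketch {
    // column -> timestamp -> value (a single row, for brevity)
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();
    // column -> timestamp of the newest delete marker
    private final Map<String, Long> deleteMarkers = new HashMap<>();

    void put(String column, long timestamp, String value) {
        cells.computeIfAbsent(column, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Deleting writes a marker; the stored versions are left in place.
    void delete(String column, long timestamp) {
        deleteMarkers.merge(column, timestamp, Long::max);
    }

    // Reads return the newest version that is not hidden by a marker.
    String get(String column) {
        TreeMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> newest = versions.lastEntry();
        Long marker = deleteMarkers.get(column);
        if (newest == null || (marker != null && newest.getKey() <= marker)) {
            return null;
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        DeleteMarkerSketch row = new DeleteMarkerSketch();
        row.put("anchor:cnnsi.com", 9L, "CNN");
        row.delete("anchor:cnnsi.com", 9L);
        System.out.println(row.get("anchor:cnnsi.com")); // null: hidden by the marker
        row.put("anchor:cnnsi.com", 12L, "CNN v2");
        System.out.println(row.get("anchor:cnnsi.com")); // CNN v2
    }
}
```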

HBase: Joins
• HBase does not support joins

• Can be done in the application layer

• Using scan() and get() operations
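
An application-level join along these lines scans one table and issues a get against the other for each row it sees. The sketch below uses two in-memory sorted maps as stand-ins for HBase tables; the table contents and the names `JoinSketch` and `join` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {
    // Scan the orders table and look up each order's customer by key.
    static List<String> join(TreeMap<String, String> customerIdByOrderId,
                             TreeMap<String, String> customerNameById) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> order : customerIdByOrderId.entrySet()) { // plays the role of scan()
            String customerId = order.getValue();
            String customerName = customerNameById.get(customerId);             // plays the role of get()
            if (customerName != null) {
                joined.add(order.getKey() + "," + customerId + "," + customerName);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        TreeMap<String, String> orders = new TreeMap<>();
        orders.put("order-1", "cust-7");
        orders.put("order-2", "cust-9");
        TreeMap<String, String> customers = new TreeMap<>();
        customers.put("cust-7", "Alice");
        System.out.println(join(orders, customers));
    }
}
```

Each matched row costs one extra lookup, which is why this pattern works best when the scanned side is small or the join key is the other table's row key.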

Altering a Table

Disable the table before changing the schema

Logging Operations

HBase Deployment

Figure callouts: master node; slave nodes

HBase vs. HDFS

HBase vs. RDBMS

When to use HBase
