Module 05 HBase - Distributed NoSQL Database
HBase
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved. Page 2
Contents
1. Introduction to HBase
HBase Overview
HBase is a column-oriented distributed storage system that
offers high reliability, high performance, and scalability.
HBase is suitable for storing very large tables (billions of rows
and millions of columns) and supports real-time data access.
HBase vs. RDB
Application Scenarios of HBase
HBase is suitable for the following scenarios:
Massive data (TB to PB scale)
The Atomicity, Consistency, Isolation, Durability (ACID) properties supported
by traditional relational databases are not required.
High throughput
Efficient random reading of massive data
High scalability
Simultaneous processing of structured and unstructured data
Position of HBase in FusionInsight
Figure: HBase within the FusionInsight architecture (labels: application service layer; OpenAPI/SDK; REST/SNMP/Syslog).
Data Stored by Row
Data Stored by Column
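To make the contrast between the two layouts concrete, here is a minimal Python sketch (an illustration only, not HBase code) that lays out the same small table row by row and column by column. The records and the function names `row_layout` and `column_layout` are hypothetical.

```python
# Hypothetical sample records; field names are illustrative only.
records = [
    {"ID": "01", "Name": "ZhangSan", "Phone": "6875349"},
    {"ID": "02", "Name": "LiLei", "Phone": "6831475"},
]
fields = ["ID", "Name", "Phone"]

def row_layout(rows, cols):
    """Row-oriented: all values of one record are stored next to each other."""
    return [row[c] for row in rows for c in cols]

def column_layout(rows, cols):
    """Column-oriented: all values of one column are stored next to each other."""
    return {c: [row[c] for row in rows] for c in cols}

print(row_layout(records, fields))
# ['01', 'ZhangSan', '6875349', '02', 'LiLei', '6831475']
print(column_layout(records, fields))
# {'ID': ['01', '02'], 'Name': ['ZhangSan', 'LiLei'], 'Phone': ['6875349', '6831475']}
```

With the column layout, reading only the Phone column touches one contiguous list instead of every record.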
KeyValue Storage Model (1)
Figure: an example record with the fields ID, Name, Phone, and Address.
KeyValue has a specific structure. The Key is used to quickly locate a data record,
and the Value stores the user data.
As the basic unit of user data storage, a KeyValue must also store some
metadata about itself, such as a timestamp and type information. This requires
some additional structured space.
Data can be expanded dynamically, adapting to changes in data types and
structures. Data is read and written in blocks. Different columns are not
associated with each other, and neither are different tables.
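A minimal Python sketch of a self-describing KeyValue, assuming a simplified layout: the key part holds the row, column family, qualifier, timestamp, and type, and the value holds the user data. The class and field names are hypothetical; the real HBase `KeyValue` is a byte-level structure with more fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyValue:
    """Hypothetical, simplified KeyValue. The key locates the record and also
    carries metadata describing the cell (timestamp and type)."""
    row: bytes        # RowKey
    family: bytes     # column family
    qualifier: bytes  # column qualifier
    timestamp: int    # version of this cell
    kv_type: str      # for example "Put" or "Delete"
    value: bytes      # user data

    def key(self):
        # Everything except the value forms the key.
        return (self.row, self.family, self.qualifier, self.timestamp, self.kv_type)

kv = KeyValue(b"01", b"A", b"Name", 1000, "Put", b"ZhangSan")
print(kv.key())  # (b'01', b'A', b'Name', 1000, 'Put')
```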
KeyValue Storage Model (2)
Partitioning in a KeyValue database is based on continuous key ranges.
Data subregions are created based on RowKey ranges (the RowKeys are sorted by an
ordering such as alphabetical order). Each subregion is a basic
distributed storage unit.
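The range-based partitioning described above can be sketched in Python as follows. This is an illustration under a simplifying assumption: the hypothetical helper `partition_by_range` cuts the sorted RowKeys into subregions of a fixed row count, whereas HBase splits Regions by data size.

```python
def partition_by_range(row_keys, rows_per_region):
    """Sort RowKeys and cut them into contiguous ranges (subregions).

    Returns a list of (start_key, keys) tuples. A fixed row count per
    subregion is a simplification for illustration.
    """
    ordered = sorted(row_keys)  # lexicographic order for strings
    regions = []
    for i in range(0, len(ordered), rows_per_region):
        chunk = ordered[i:i + rows_per_region]
        regions.append((chunk[0], chunk))
    return regions

keys = ["Row012", "Row001", "Row021", "Row005", "Row030", "Row011"]
for start, chunk in partition_by_range(keys, 2):
    print(start, chunk)
# Row001 ['Row001', 'Row005']
# Row011 ['Row011', 'Row012']
# Row021 ['Row021', 'Row030']
```

Because each subregion covers a contiguous key range, adjacent RowKeys end up in the same storage unit, which keeps range scans local.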
KeyValue Storage Model (3)
The underlying data of HBase is stored as KeyValues, which have a
specific format.
A KeyValue contains key information such as the timestamp and type.
The same key can be associated with multiple Values, and each KeyValue has a
qualifier.
Multiple KeyValues can share the same Key and Qualifier.
In this case, they are distinguished by their timestamps. This is why the same
data record can have multiple versions.
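A minimal Python sketch of timestamp-based versioning: several values share the same (row, qualifier) and are told apart by timestamp, newest first. `VersionedCells` and its `max_versions` parameter are hypothetical names for illustration, not the HBase client API.

```python
from collections import defaultdict

class VersionedCells:
    """Toy store in which several values share the same (row, qualifier)
    and are distinguished by timestamp."""

    def __init__(self):
        self._cells = defaultdict(dict)  # (row, qualifier) -> {timestamp: value}

    def put(self, row, qualifier, timestamp, value):
        self._cells[(row, qualifier)][timestamp] = value

    def get(self, row, qualifier, max_versions=1):
        # .get() avoids creating empty entries in the defaultdict.
        versions = self._cells.get((row, qualifier), {})
        newest_first = sorted(versions.items(), key=lambda tv: tv[0], reverse=True)
        return newest_first[:max_versions]

store = VersionedCells()
store.put("01", "A:Addr.", 100, "Beijing")
store.put("01", "A:Addr.", 200, "Hangzhou")
print(store.get("01", "A:Addr."))                  # [(200, 'Hangzhou')]
print(store.get("01", "A:Addr.", max_versions=2))  # [(200, 'Hangzhou'), (100, 'Beijing')]
```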
HBase Architecture (1)
HBase Architecture (2)
Store: A Region consists of one or more Stores. Each Store corresponds
to a Column Family.
StoreFile: Data flushed to HDFS is stored there as StoreFiles.
HFile: HFile defines the storage format of StoreFiles in the file system. HFile is the underlying
implementation of StoreFile.
HLog: HLogs prevent data loss when a RegionServer fails. All Regions on a
RegionServer share the same HLog.
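The containment relationships above can be sketched in Python as plain data classes. This is a hypothetical model for illustration: a RegionServer holds several Regions and one shared HLog, and each Region holds one Store per column family, with a MemStore and a list of StoreFiles. As in HBase, the sketch appends the edit to the log before buffering it in memory.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    column_family: str
    memstore: dict = field(default_factory=dict)     # in-memory writes
    store_files: list = field(default_factory=list)  # flushed files (HFile format)

@dataclass
class Region:
    name: str
    stores: dict  # column family name -> Store

    @classmethod
    def create(cls, name, column_families):
        # One Store per column family.
        return cls(name, {cf: Store(cf) for cf in column_families})

@dataclass
class RegionServer:
    hlog: list = field(default_factory=list)     # one HLog shared by all Regions
    regions: dict = field(default_factory=dict)  # Region name -> Region

    def put(self, region_name, cf, row, value):
        # Log first so the edit can be replayed if the RegionServer fails,
        # then buffer it in the MemStore of the matching Store.
        self.hlog.append((region_name, cf, row, value))
        self.regions[region_name].stores[cf].memstore[row] = value

rs = RegionServer()
for name in ("region-1", "region-2"):
    rs.regions[name] = Region.create(name, ["ColumnFamily-1", "ColumnFamily-2"])
rs.put("region-1", "ColumnFamily-1", "Row001", "v1")
rs.put("region-2", "ColumnFamily-2", "Row011", "v2")
print(len(rs.hlog))  # 2: both Regions append to the same HLog
```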
HMaster (1)
HMaster (2)
The HMaster process manages all RegionServers.
It handles RegionServer failover.
RegionServer
RegionServer is the data service process of HBase and is responsible
for processing read and write requests for user data.
Region (1)
A data table is divided horizontally into subtables based on the
RowKey range to implement distributed storage. A subtable is called
a Region in HBase.
Each Region is associated with a RowKey range, which is described
by a StartKey and an EndKey.
Each Region only needs to record its StartKey, because its EndKey is
the StartKey of the next Region.
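Because only StartKeys are recorded, finding the Region for a RowKey amounts to a binary search over the sorted StartKeys. The following Python sketch assumes the hypothetical boundaries shown below; the first Region starts at the empty key so that every RowKey falls into some Region, and each range includes its StartKey but not the next Region's StartKey.

```python
import bisect

# Hypothetical Region boundaries: only StartKeys are recorded, in sorted order.
start_keys = ["", "Row011", "Row021", "Row031"]
region_names = ["Region-1", "Region-2", "Region-3", "Region-4"]

def locate_region(row_key):
    """Return the Region whose [StartKey, next StartKey) range holds row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return region_names[index]

print(locate_region("Row005"))  # Region-1
print(locate_region("Row011"))  # Region-2 (a StartKey belongs to its own Region)
print(locate_region("Row099"))  # Region-4 (the last Region has no EndKey)
```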
Region (2)
Figure: the rows of a table, sorted by RowKey, are divided into consecutive Regions, each described by a StartKey and an EndKey:
Region-1: Row001 to Row010
Region-2: Row011 to Row020
Region-3: Row021 to Row030
Region-4: Row031 onward
Region (3)
Figure: the META table and Regions.
Column Family
Figure: each Region's column families are stored as directories in HDFS:
/HBase/table
  /region-1/ColumnFamily-1
  /region-1/ColumnFamily-2
  /region-2/ColumnFamily-1
  /region-2/ColumnFamily-2
  /region-3/ColumnFamily-1
  /region-3/ColumnFamily-2
ZooKeeper
ZooKeeper provides the following functions for HBase:
Distributed lock service
Multiple HMaster processes try to register the same node in ZooKeeper, but only one
HMaster process can register it. The process that successfully registers the node
becomes the active HMaster process.
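The election described above relies on ZooKeeper letting exactly one client create a given node. The Python sketch below simulates that with an in-memory, lock-protected "create if absent". `FakeZooKeeper` is a hypothetical stand-in, not the ZooKeeper client API, and the node path `/hbase/master` is an assumption used only for illustration. In a real deployment the node is ephemeral, so it disappears if the active HMaster's session ends, letting a standby take over.

```python
import threading

class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper node creation."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path, owner):
        # Atomic check-and-create: only the first caller registers the node.
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = owner
            return True

    def owner(self, path):
        return self._nodes.get(path)

zk = FakeZooKeeper()
results = {}

def try_become_active(name):
    results[name] = zk.create("/hbase/master", name)

threads = [threading.Thread(target=try_become_active, args=(f"HMaster-{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, ok in results.items() if ok]
print(winners, zk.owner("/hbase/master"))  # exactly one winner; which one varies
```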
MetaData Table
The metadata table hbase:meta
stores information about
Regions, which the client uses
to locate a specific Region.
Writing Process
Client Initiating a Data Writing Request
Writing Process - Locating a Region
Writing Process - Grouping Data (1)
Writing Process - Grouping Data (2)
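Data grouping involves two
division steps:
First, the client looks up the information
in the meta table to find the Region of each row.
Then the data is grouped by RegionServer, and the data for each RegionServer is sent at the same time. At this point, the
data has already been divided by Region.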
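The two grouping steps can be sketched in Python as follows, assuming a hypothetical snapshot of meta-table information (StartKey, Region, hosting RegionServer) already cached by the client. Each row is first mapped to its Region, and the per-Region batches are then grouped by RegionServer so that one request per server can be sent concurrently. All names here are illustrative, not the HBase client API.

```python
import bisect
from collections import defaultdict

# Hypothetical meta-table snapshot: (StartKey, Region, RegionServer), sorted by StartKey.
META = [
    ("", "Region-1", "RS-A"),
    ("Row011", "Region-2", "RS-B"),
    ("Row021", "Region-3", "RS-A"),
]
START_KEYS = [start for start, _, _ in META]

def group_puts(puts):
    """Step 1: find each row's Region from the meta information.
    Step 2: group the per-Region batches by RegionServer."""
    by_server = defaultdict(lambda: defaultdict(list))
    for row_key, value in puts:
        _, region, server = META[bisect.bisect_right(START_KEYS, row_key) - 1]
        by_server[server][region].append((row_key, value))
    return by_server

batches = group_puts([("Row001", "a"), ("Row015", "b"), ("Row025", "c"), ("Row002", "d")])
for server, regions in sorted(batches.items()):
    # One request per RegionServer, already divided by Region.
    print(server, dict(regions))
```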
Writing Process - Sending a Request to a RegionServer
Data is sent using the encapsulated RPC
framework of HBase.
Writing Process - Process of Writing Data to a Region
Writing Process - Flush
Figure: in a Region, MemStore-1 (ColumnFamily-1) and MemStore-2 (ColumnFamily-2) are each flushed to their own HFile.
Impacts of Multiple HFiles
As time passes, the number of HFiles grows, and each query request
takes longer because it has to read more files.
Compaction (1)
Compaction reduces the number of small files in a column family of a
Region, thereby improving read performance.
There are two kinds of compaction: minor and major.
Minor: compaction covering a small range. It merges small files written over a
consecutive period, with minimum and maximum numbers of files specified.
Major: compaction covering all HFiles of a column family in a Region. During
major compaction, deleted data is cleared.
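A minimal Python sketch of the difference between the two kinds of compaction, under simplifying assumptions: each "file" is a sorted list of (row_key, timestamp, value) entries, only the newest version of each row is kept, and a hypothetical `DELETE` value acts as a delete marker. A minor compaction merges only some files, so it must keep delete markers (an older value may still sit in a file it did not read); a major compaction sees every file and can drop deleted data.

```python
import heapq

DELETE = None  # hypothetical delete marker for a row

def compact(files, major=False):
    """Merge sorted files of (row_key, ts, value) entries into one sorted file.

    For the same row the entry with the highest timestamp wins. Delete
    markers are removed only in a major compaction.
    """
    latest = {}
    for row_key, ts, value in heapq.merge(*files):
        latest[row_key] = (ts, value)  # entries arrive in ascending ts per row
    merged = [(row_key, ts, value) for row_key, (ts, value) in sorted(latest.items())]
    if major:
        merged = [entry for entry in merged if entry[2] is not DELETE]
    return merged

hfile_1 = [("Row001", 1, "a"), ("Row002", 2, "b")]
hfile_2 = [("Row001", 3, "a2")]
hfile_3 = [("Row002", 4, DELETE)]

minor = compact([hfile_2, hfile_3])                       # a few small files
major = compact([hfile_1, hfile_2, hfile_3], major=True)  # every file
print(minor)  # [('Row001', 3, 'a2'), ('Row002', 4, None)]
print(major)  # [('Row001', 3, 'a2')]
```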
Compaction (2)
Figure: Write (put) → MemStore → Flush → HFile → Minor Compaction → Major Compaction
Region Split
A common Region split operation divides a Region (the Parent Region)
into two subregions when the data size of the Region exceeds a
predefined threshold.
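A minimal Python sketch of a size-triggered split, assuming a Region is represented as a sorted list of (row_key, value) pairs. The hypothetical helper `split_region` splits at the middle row, which is a simplification chosen for illustration rather than HBase's own split-point selection.

```python
def split_region(sorted_rows, size_of, threshold):
    """Split a Region into two subregions if its total data size exceeds threshold.

    sorted_rows: list of (row_key, value) pairs sorted by row_key.
    Returns a list containing either the unchanged Region or two subregions.
    """
    total = sum(size_of(value) for _, value in sorted_rows)
    if total <= threshold or len(sorted_rows) < 2:
        return [sorted_rows]
    mid = len(sorted_rows) // 2
    return [sorted_rows[:mid], sorted_rows[mid:]]

parent = [("Row001", "x" * 40), ("Row002", "x" * 40),
          ("Row003", "x" * 40), ("Row004", "x" * 40)]
children = split_region(parent, len, threshold=100)  # 160 bytes > 100
print([child[0][0] for child in children])  # ['Row001', 'Row003'], the StartKeys
```

The second subregion's first row becomes its StartKey, which is also the EndKey of the first subregion.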
Reading Process
Client Initiating a Data Reading Request
Get: When a precise key is provided, the
Get operation reads a
single row of user data.
Locating a Region
OpenScanner
Figure: the data sources inside a Region that OpenScanner covers:
Region
  ColumnFamily-1: MemStore, HFile-11, HFile-12
  ColumnFamily-2: MemStore, HFile-21, HFile-22
Filter
Filter allows users to set filtering criteria during the Scan
operation. Only user data that meets the criteria is returned.
Typical Filter types include:
RowFilter
SingleColumnValueFilter
KeyOnlyFilter
FilterList
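To illustrate how these filter types behave during a Scan, here is a small Python re-implementation of their semantics. It is a hypothetical sketch, not the HBase Java Filter API: rows are plain dicts, each filter is a predicate on (RowKey, cells), and the sample data is made up.

```python
# Hypothetical rows: RowKey -> {"family:qualifier": value}.
ROWS = {
    "01": {"A:Name": "ZhangSan", "A:Age": "23"},
    "02": {"A:Name": "LiLei", "A:Age": "43"},
    "03": {"A:Name": "WangWu", "A:Age": "35"},
}

def row_filter(predicate):
    """Like RowFilter: test the RowKey."""
    return lambda key, cells: predicate(key)

def single_column_value_filter(column, predicate):
    """Like SingleColumnValueFilter: test the value of one column."""
    return lambda key, cells: column in cells and predicate(cells[column])

def filter_list(filters, must_pass_all=True):
    """Like FilterList: combine filters with AND (all) or OR (any)."""
    combine = all if must_pass_all else any
    return lambda key, cells: combine(f(key, cells) for f in filters)

def scan(rows, row_filter_fn, keys_only=False):
    """Scan rows in RowKey order and return only rows that pass the filter.
    keys_only mimics KeyOnlyFilter by replacing values with empty strings."""
    result = []
    for key in sorted(rows):
        cells = rows[key]
        if row_filter_fn(key, cells):
            result.append((key, {c: "" for c in cells} if keys_only else cells))
    return result

older = single_column_value_filter("A:Age", lambda v: int(v) > 30)
not_02 = row_filter(lambda k: k != "02")
print(scan(ROWS, filter_list([older, not_02])))
# [('03', {'A:Name': 'WangWu', 'A:Age': '35'})]
```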
BloomFilter
BloomFilter is used to optimize random reads, that is,
scenarios where the Get operation is performed. It can quickly
check whether a piece of user data exists in a large dataset (most of the
dataset cannot be loaded into memory).
BloomFilter has a certain error rate when checking whether a piece of data
exists: it may report that absent data exists. However, the conclusion "User data XXXX
does not exist" is always accurate.
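A minimal, generic Bloom filter in Python shows why a "does not exist" answer is always accurate: every added key sets all of its bit positions, so a key whose positions are not all set was never added. This is a textbook sketch, not HBase's implementation; the bit-array size and hash count are arbitrary assumptions.

```python
import hashlib

class BloomFilter:
    """Generic Bloom filter: k hash positions over an m-bit array.

    might_contain() may return a false positive but never a false negative.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive k positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for row_key in ("Row001", "Row002", "Row003"):
    bf.add(row_key)
print(bf.might_contain("Row002"))  # True
print(bf.might_contain("Row999"))  # usually False; True would be a false positive
```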
Supporting Secondary Index
The secondary index enables HBase to query data based on specific column
values.
Example table (Column Family A: Name, Addr., Age; Column Family B: Mobile, Email):
RowKey | A:Name   | A:Addr.  | A:Age | B:Mobile | B:Email
01     | ZhangSan | Beijing  | 23    | 6875349  | ……
02     | LiLei    | Hangzhou | 43    | 6831475  | ……
03     | WangWu   | Shenzhen | 35    | 6809568  | ……
04     | ……       | Wuhan    | 28    | 6812645  | ……
05     | ……       | Changsha | 26    | 6889763  | ……
06     | ……       | Jinan    | 35    | 6854912  | ……
Without a secondary index, the Mobile field must be matched row by row across the entire table
to find specified mobile numbers such as '68XXX', which results in long delays.
With a secondary index, the index table is searched first to locate the rows containing the
mobile number, which narrows the search scope and reduces the delay.
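A minimal Python sketch of the contrast above, using Mobile values from the example table. The index is modeled, as an assumption for illustration, as a list of (Mobile value, RowKey) pairs sorted by value, so a prefix search such as '683' jumps straight to the first candidate and stops at the first non-match instead of checking every row. The function names are hypothetical, not a FusionInsight API.

```python
import bisect

# Data table: RowKey -> columns (values from the example table).
DATA = {
    "01": {"A:Name": "ZhangSan", "B:Mobile": "6875349"},
    "02": {"A:Name": "LiLei", "B:Mobile": "6831475"},
    "03": {"A:Name": "WangWu", "B:Mobile": "6809568"},
    "04": {"A:Addr.": "Wuhan", "B:Mobile": "6812645"},
}

def full_scan(table, column, prefix):
    """Without an index: test the column in every row of the table."""
    return [key for key in sorted(table) if table[key].get(column, "").startswith(prefix)]

def build_index(table, column):
    """Index table: (column value, RowKey) pairs sorted by value."""
    return sorted((cells[column], key) for key, cells in table.items() if column in cells)

def indexed_lookup(index, prefix):
    """With an index: binary-search to the first candidate, then read
    consecutive entries until the prefix no longer matches."""
    i = bisect.bisect_left(index, (prefix, ""))
    keys = []
    while i < len(index) and index[i][0].startswith(prefix):
        keys.append(index[i][1])
        i += 1
    return keys

mobile_index = build_index(DATA, "B:Mobile")
print(full_scan(DATA, "B:Mobile", "683"))   # ['02']
print(indexed_lookup(mobile_index, "683"))  # ['02']
```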
HFS
HBase FileStream (HFS) is a separate module of HBase. As an
encapsulation of the HBase and HDFS interfaces, HFS provides
upper-layer applications with capabilities such as storing, reading,
and deleting files.
HFS can store both massive numbers of small files and large
files in HDFS. That is, massive small files (smaller than 10 MB) and
some large files (larger than 10 MB) can be stored in HBase.
HBase MOB (1)
MOB (medium object) data (100 KB to 10 MB) is stored directly in the file
system (HDFS, for example) as HFiles, and the address and size of each
file are stored in HBase as the value. Because dedicated tools manage
these files, the frequency of compaction and split is greatly reduced,
and performance is improved.
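A minimal Python sketch of the MOB idea: small values stay inline in the table, while values at or above a threshold are written to the file system and only a reference (path and size) is kept in the table. The 100 KB threshold is taken from the lower bound of the MOB range above; the function names are hypothetical, and for simplicity the sketch writes one plain file per value instead of HFiles.

```python
import os
import tempfile

MOB_THRESHOLD = 100 * 1024  # assumed threshold: values >= 100 KB are treated as MOB

def put_value(table, file_dir, row_key, value):
    """Store small values inline; write MOB values to a file and keep only
    a reference (path, size) in the table."""
    if len(value) < MOB_THRESHOLD:
        table[row_key] = ("inline", value)
    else:
        path = os.path.join(file_dir, f"{row_key}.mob")
        with open(path, "wb") as f:
            f.write(value)
        table[row_key] = ("ref", (path, len(value)))

def get_value(table, row_key):
    kind, payload = table[row_key]
    if kind == "inline":
        return payload
    path, size = payload
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != size:  # the stored size lets the reader detect a damaged file
        raise IOError(f"size mismatch for {path}")
    return data

table = {}
with tempfile.TemporaryDirectory() as d:
    put_value(table, d, "img-1", b"x" * 10)
    put_value(table, d, "img-2", b"y" * 200 * 1024)
    print(table["img-1"][0], table["img-2"][0])  # inline ref
    print(len(get_value(table, "img-2")))        # 204800
```

Keeping only a small reference in the table means compaction and split rewrite a few bytes per MOB value instead of the whole object.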
HBase MOB (2)
Summary
This module described the following aspects of HBase:
the KeyValue storage model, the technical architecture, the read and
write processes, and the enhanced features of FusionInsight HBase.
Quiz
1. Can a Region in HBase continue to provide services while it is being split?
Quiz
1. What is Compaction used for? ( )
Quiz
1. What is the physical storage unit of HBase? ( )
A. Region
B. Column Family
C. Column
D. Cell
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
Thank You