Module 05 HBase - Distributed NoSQL Database

The document provides an overview of HBase, a distributed column-oriented storage system designed for high reliability, performance, and scalability. It covers HBase's architecture, key processes, and Huawei's enhanced features, highlighting its suitability for massive data storage and real-time access. Additionally, it contrasts HBase with traditional relational databases and details its data storage models, including KeyValue structures and the role of ZooKeeper in managing distributed operations.

Uploaded by

Lucas Oliveira

Technical Principles of HBase

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.

Objectives
 Upon completion of this course, you will understand the following:
 System architecture of HBase
 Key features of HBase
 Basic functions of HBase
 Huawei enhanced features of HBase

2. Functions and Architecture of HBase

3. Key Processes of HBase

4. Huawei Enhanced Features of HBase

HBase Overview
 HBase is a column-based distributed storage system that
features high reliability, performance, and scalability.
 HBase is suitable for storing big table data (which contains billions of rows
and millions of columns) and allows real-time data access.

 HBase uses HDFS as the file storage system to provide a distributed
column-oriented database system that allows real-time data reading and
writing.
 HBase uses ZooKeeper as the coordination service.

HBase compared with a relational database (RDB):

HBase:
1. Distributed storage and column-oriented.
2. Dynamic extension of columns.
3. Supports common commercial hardware, lowering the expansion cost.

RDB:
1. Fixed data structure.
2. Pre-defined data structure.
3. I/O intensive and cost-consuming expansion.

Application Scenarios of HBase
 HBase applies to the following scenarios:
 Massive data (TB and PB)
 The Atomicity, Consistency, Isolation, Durability (ACID) feature supported
by traditional relational databases is not required.
 High throughput
 Efficient random reading of massive data
 High scalability
 Simultaneous processing of structured and unstructured data

Position of HBase in FusionInsight
[Figure: FusionInsight architecture. Components shown: application service layer (OpenAPI/SDK, REST/SNMP/Syslog); DataFarm (Porter, Miner, Farmer); Manager (system management, service governance, security management); Hadoop API and Plugin API; Hive, M/R, Spark, Storm, Flink, and LibrA; YARN/ZooKeeper; HDFS/HBase.]

 HBase is a column-based distributed storage system that features high
reliability, performance, and scalability. It stores massive data and is designed
to eliminate limitations of relational databases in the processing of mass data.

[Figure: a table with columns ID, Name, Phone, and Address, stored row by row]

 Data is stored by row in an underlying file system. Generally, a fixed amount
of space is allocated to each row.
 Advantages: Data can be added, modified, or read by row.
 Disadvantages: Some unnecessary data is obtained when data in a column is
queried.

[Figure: a table with columns ID, Name, Phone, and Address, stored column by column]

 Data is stored by column in an underlying file system.
 Advantages: Data can be read or calculated by column.
 Disadvantages: When a row is read, multiple I/O operations may be
required.
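
The trade-off between the two layouts can be sketched in a few lines of Python. The sample records and helper names are assumptions made for illustration only; they are not part of HBase.

```python
# Illustrative records; the values are made up for this sketch.
rows = [
    {"ID": 1, "Name": "A", "Phone": "111", "Address": "X"},
    {"ID": 2, "Name": "B", "Phone": "222", "Address": "Y"},
]
COLUMNS = ["ID", "Name", "Phone", "Address"]

# Row-based layout: all fields of one row are stored next to each other.
row_store = [[r[c] for c in COLUMNS] for r in rows]

# Column-based layout: all values of one column are stored next to each other.
column_store = {c: [r[c] for r in rows] for c in COLUMNS}


def phones_from_row_store():
    # Every row is touched, including fields the query does not need.
    return [record[COLUMNS.index("Phone")] for record in row_store]


def phones_from_column_store():
    # Only the requested column is touched.
    return column_store["Phone"]


def read_row_from_column_store(i):
    # One access per column: the "multiple I/O operations" drawback.
    return [column_store[c][i] for c in COLUMNS]
```

Querying one column reads far less data in the column layout, while rebuilding a full row needs one access per column.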

KeyValue Storage Model (1)
[Figure: a row with columns ID, Name, Phone, and Address stored as four KeyValues that share Key-01: Value-ID01, Value-Name01, Value-Phone01, and Value-Address01]

 KeyValue has a specific structure. Key is used to quickly query a data record,
and Value is used to store user data.
 As a basic user data storage unit, KeyValue must store some description of
itself, such as timestamp and type information. This requires some structured
space.
 Data can be expanded dynamically, adapting to changes in data types and
structures. Data is read and written by block. Different columns are not
associated with one another, and neither are tables.

KeyValue Storage Model (2)
 Partition mode of a KeyValue database: based on continuous Key ranges.

[Figure: Region_01 through Region_12 distributed across Node1, Node2, and Node3]

Data is partitioned into subregions by RowKey range, with RowKeys sorted in a
defined order (for example, alphabetical order). Each subregion is a basic
distributed storage unit.

 The underlying data of HBase exists in the form of KeyValue. KeyValue has a
specific format.
 A KeyValue contains important metadata such as its timestamp and type.
 The same key can be associated with multiple Values. Each KeyValue has a
qualifier.
 There can be multiple KeyValues associated with the same Key and Qualifier.
In this case, they are distinguished using timestamps. This is why there are
multiple versions of the same data record.
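
The versioning rule above can be illustrated with a simplified model. The `KeyValue` tuple and the helper functions are hypothetical; real HBase cells carry more fields and a binary layout.

```python
from collections import namedtuple

# Hypothetical, simplified KeyValue used only for this sketch.
KeyValue = namedtuple("KeyValue", "row family qualifier timestamp kv_type value")

cells = [
    KeyValue("Key-01", "cf", "Phone", 100, "Put", "111"),
    KeyValue("Key-01", "cf", "Phone", 200, "Put", "222"),
    KeyValue("Key-01", "cf", "Name", 100, "Put", "Alice"),
]


def versions(cells, row, family, qualifier):
    """Return all versions of one cell, newest first."""
    matches = [kv for kv in cells
               if (kv.row, kv.family, kv.qualifier) == (row, family, qualifier)]
    return sorted(matches, key=lambda kv: kv.timestamp, reverse=True)


def latest(cells, row, family, qualifier):
    """Return the value with the largest timestamp, or None if absent."""
    found = versions(cells, row, family, qualifier)
    return found[0].value if found else None
```

Two KeyValues with the same Key and Qualifier but different timestamps are two versions of the same data record; a read normally returns the newest one.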

HBase Architecture (2)
 Store: A Region consists of one or multiple Stores. Each Store corresponds
to a Column Family.
 MemStore: A Store contains one MemStore. Data inserted into a Region by a
client is cached in the MemStore.
 StoreFile: The data flushed to the HDFS is stored as a StoreFile in the HDFS.
 HFile: HFile defines the storage format of StoreFiles in a file system. HFile is
the underlying implementation of StoreFile.
 HLog: HLogs prevent data loss when a RegionServer is faulty. Multiple Regions in a
RegionServer share the same HLog.

[Figure: HMaster managing RegionServer1, RegionServer2, and RegionServer3. "Hey, Region A, please move to RegionServer 1!" "RegionServer 2 is gone! Let the others take over its Regions!"]

HMaster (2)
 The HMaster process manages all the RegionServers.
 Handles RegionServer failovers.
 The HMaster process performs cluster operations including creating,
modifying, and deleting tables.
 The HMaster process migrates Regions.
 Allocates Regions when a new table is created.
 Ensures load balancing during operation.
 Takes over Regions after a RegionServer failover occurs.

RegionServer
 RegionServer is the data service process of HBase and is responsible
for processing read and write requests for user data.
 RegionServer manages Regions. All read and write requests for user
data are handled through interaction with the Regions on RegionServers.
 Regions can be migrated between RegionServers.

[Figure: a RegionServer hosting multiple Regions]

Region (1)
 A data table is divided horizontally into subtables based on the
RowKey range to implement distributed storage. A subtable is called
a Region in HBase.
 Each Region is associated with a RowKey range, which is described
using a StartKey and an EndKey.
 Each Region only needs to record a StartKey, because its EndKey serves as
the StartKey of the next Region.
 Region is the most basic distributed storage unit of HBase.
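
Because only StartKeys need to be recorded, locating the Region for a RowKey reduces to a binary search over the sorted StartKeys. The following is a minimal sketch under assumed Region names and keys; it treats each EndKey as exclusive and uses an empty StartKey to mean "from the beginning of the table".

```python
import bisect

# Hypothetical Regions of one table, sorted by StartKey.
# An empty StartKey means "from the beginning of the table".
start_keys = ["", "Row011", "Row021", "Row031"]
region_names = ["Region-1", "Region-2", "Region-3", "Region-4"]


def locate_region(row_key):
    """Find the Region whose [StartKey, next StartKey) range holds row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return region_names[index]


def end_key(index):
    """The EndKey is the StartKey of the next Region (None for the last one)."""
    return start_keys[index + 1] if index + 1 < len(start_keys) else None
```

For example, `locate_region("Row015")` falls between the StartKeys `Row011` and `Row021`, so it resolves to `Region-2`.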

Region (2)
[Figure: the rows of a table divided into Regions by RowKey range, each Region described by a StartKey and an EndKey: Row001 to Row010 in Region-1, Row011 to Row020 in Region-2, Row021 to Row030 in Region-3, and Row031 onward in Region-4]

 Regions are categorized as Meta Regions and User Regions.
 The Meta Region records the routing information of User Regions.
 Perform the following steps to access data in a Region:
 Search for the address of the Meta Region.
 Search for the address of the User Regions in the Meta Region.

Column Family
[Figure: HDFS directory layout of a table with three Regions and two column families]
/HBase/table
    /region-1/ColumnFamily-1
    /region-1/ColumnFamily-2
    /region-2/ColumnFamily-1
    /region-2/ColumnFamily-2
    /region-3/ColumnFamily-1
    /region-3/ColumnFamily-2

 A ColumnFamily is a physical storage unit of a Region. Multiple column families of the
same Region have different paths in HDFS.
 ColumnFamily information is table-level configuration information. That is, multiple
Regions of the same table have the same column family information. (For example,
each Region has two column families and the configuration information of the same
column family of different Regions is the same.)

ZooKeeper
ZooKeeper provides the following functions for HBase:
 Distributed lock service
 Multiple HMaster processes try to register a node in ZooKeeper, and the node can be
registered by only one HMaster process. The process that successfully registers the node
becomes the active HMaster process.
 Event listening mechanism
 The active HMaster's record is deleted after the active process fails, and the standby
processes receive an update message indicating that the active HMaster is down.
 Micro database role
 ZooKeeper stores the addresses of RegionServers. In this case, ZooKeeper can be regarded
as a micro database.
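
The lock and event listening behavior can be simulated in memory. `TinyCoordinator` is a hypothetical stand-in written for this sketch; it is not the ZooKeeper client API, and the node path is only an illustrative name.

```python
class TinyCoordinator:
    """In-memory stand-in for ZooKeeper; not the real ZooKeeper API."""

    def __init__(self):
        self.nodes = {}      # path -> owner
        self.watchers = {}   # path -> callbacks to run when the node is deleted

    def create(self, path, owner):
        # Only the first caller succeeds, which makes the node act as a lock.
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def delete(self, path):
        # Called here to simulate the active process failing.
        self.nodes.pop(path, None)
        for callback in self.watchers.pop(path, []):
            callback()


def run_for_master(zk, name, path="/hbase/master"):
    """Try to become active; otherwise stand by and retry on failure."""
    if not zk.create(path, name):
        zk.watch(path, lambda: run_for_master(zk, name, path))


zk = TinyCoordinator()
run_for_master(zk, "hmaster-a")
run_for_master(zk, "hmaster-b")
active_before = zk.nodes["/hbase/master"]
zk.delete("/hbase/master")          # hmaster-a goes down
active_after = zk.nodes["/hbase/master"]
```

The first process to register the node becomes active. When its record is deleted, the standby process is notified and registers the node itself.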

MetaData Table
 The metadata table, hbase:meta, stores information about Regions so that
a client can locate the specific Region it needs.
 The metadata table is split into multiple Regions, and the metadata
information of these Regions is stored in ZooKeeper.

[Figure: mapping relation between the metadata table and User Table 1 through User Table N]

 The process of initiating a write request by a client is like a book
supplier sending books to a library. The book supplier must determine
to which building and floor the books should be sent.

Writing Process - Grouping Data (2)
 Grouping data involves two steps:
 Find the Region and RegionServer information of the table based on the
meta table.
 Assign each piece of data to a specific Region according to its RowKey.
 Data destined for the same RegionServer is sent together. By this point,
the data has already been divided by Region.
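
The two grouping steps can be sketched as follows. The routing entries, Region names, and RegionServer names are assumptions for illustration; a real client obtains this information from the meta table.

```python
import bisect
from collections import defaultdict

# Hypothetical routing data, as a client might cache it from the meta table:
# each entry is (StartKey, Region name, RegionServer), sorted by StartKey.
meta = [
    ("", "Region-1", "RegionServer1"),
    ("Row011", "Region-2", "RegionServer2"),
    ("Row021", "Region-3", "RegionServer1"),
]
start_keys = [entry[0] for entry in meta]


def group_puts(puts):
    """Group (row_key, value) pairs by RegionServer, then by Region."""
    batches = defaultdict(lambda: defaultdict(list))
    for row_key, value in puts:
        _, region, server = meta[bisect.bisect_right(start_keys, row_key) - 1]
        batches[server][region].append((row_key, value))
    return batches


batches = group_puts([("Row001", "a"), ("Row015", "b"), ("Row025", "c")])
```

Each top-level key of `batches` corresponds to one request to a RegionServer, and the data inside it is already divided by Region.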

Writing Process - Sending a Request to a RegionServer
 Data is sent using the encapsulated RPC framework of HBase.
 Requests to multiple RegionServers are sent concurrently.
 After sending a data writing request, a client waits for the request
processing result.
 If the client does not catch any exception, it deems that all data has been
written successfully. If writing the data fails completely or partially, the
client can obtain a detailed KeyValue list relevant to the failure.

[Figure: MemStore-2 (ColumnFamily-2) flushed to an HFile]

 In any of the following scenarios, a Flush operation of MemStore is
triggered:
 The total usage of MemStore of a Region reaches the predefined Flush Size
threshold.
 The ratio of occupied memory to total memory of RegionServer reaches the
threshold.
 The number of WALs reaches the threshold.
 MemStore is flushed periodically, every 1 hour by default.
 Users can flush a table or Region separately by using a shell command.
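
The flush triggers listed above can be expressed as a single decision function. This is a sketch only: the `FlushLimits` fields and their values are illustrative assumptions (apart from the one-hour interval stated above), and the actual thresholds come from the cluster configuration.

```python
from dataclasses import dataclass


@dataclass
class FlushLimits:
    # Illustrative values for this sketch; check the cluster configuration
    # for the real ones. Only the one-hour interval comes from the text.
    region_flush_size: int = 128 * 1024 * 1024   # bytes per Region
    server_memory_ratio: float = 0.4             # MemStore share of server memory
    max_wal_files: int = 32
    periodic_interval_s: int = 3600


def flush_reason(region_memstore_bytes, server_memstore_bytes, server_total_bytes,
                 wal_files, seconds_since_flush, manual=False,
                 limits=FlushLimits()):
    """Return the first matching flush trigger, or None if no flush is due."""
    if manual:
        return "manual"
    if region_memstore_bytes >= limits.region_flush_size:
        return "region flush size"
    if server_memstore_bytes / server_total_bytes >= limits.server_memory_ratio:
        return "server memory ratio"
    if wal_files >= limits.max_wal_files:
        return "WAL count"
    if seconds_since_flush >= limits.periodic_interval_s:
        return "periodic"
    return None
```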

As time passes, the number of HFiles increases, and query requests
take much longer.

Compaction (1)
 Compaction aims to reduce the number of small files in a column family in a
Region, thereby increasing reading performance.
 There are two kinds of compaction: major and minor.
 Minor: compaction covering a small range. Minimum and maximum numbers of
files are specified. Small files from a consecutive time range are merged.
 Major: compaction covering all the HFiles in a column family in a Region. During
major compaction, deleted data is cleared.
 During minor compaction, the files to merge are selected by an algorithm.
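
The difference between the two kinds of compaction can be shown with a toy model. The file format (a dict per HFile, with `None` marking a delete) and the selection rule (take the oldest files) are simplifying assumptions of this sketch, not the algorithm HBase uses.

```python
# Each HFile is modelled as a dict {row_key: (timestamp, value)};
# a value of None marks a delete. This format is an assumption of the sketch.
def merge_files(files, drop_deletes):
    merged = {}
    for hfile in files:
        for row_key, (timestamp, value) in hfile.items():
            # Keep only the newest version of each row key.
            if row_key not in merged or timestamp > merged[row_key][0]:
                merged[row_key] = (timestamp, value)
    if drop_deletes:
        merged = {k: v for k, v in merged.items() if v[1] is not None}
    return dict(sorted(merged.items()))


def minor_compaction(files, min_files=3, max_files=10):
    """Merge a run of the oldest files; leave the rest untouched."""
    if len(files) < min_files:
        return files
    chosen, rest = files[:max_files], files[max_files:]
    return [merge_files(chosen, drop_deletes=False)] + rest


def major_compaction(files):
    """Merge every file into one and clear deleted data."""
    return [merge_files(files, drop_deletes=True)]
```

In this model a minor compaction keeps delete markers, because files outside the selected range may still hold older data for the same key, while a major compaction sees every file and can safely clear them.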

[Figure: Compaction. A put is written to the MemStore and flushed into HFiles; minor compaction merges seven HFiles into three, and major compaction merges those into one HFile.]

Region Split
 A Region is split into two subregions when its data size exceeds the
predefined threshold.
 During splitting, the Region being split suspends its read and write
services. The data files of the parent Region are not split and rewritten
into the two subregions. Instead, reference files are created in the new
Regions to achieve quick splitting. Therefore, services of the Region are
suspended only for a short time.
 Routing information of the parent Region cached in clients must be updated.

[Figure: a Parent Region split into DaughterRegion-1 and DaughterRegion-2]
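
The reason splitting is quick can be seen in a small model: the daughter Regions receive references to the parent's files rather than copies of the data. The `Region` and `Reference` classes, the file names, and the split key are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    start_key: str
    end_key: str            # treated as exclusive in this sketch
    files: list = field(default_factory=list)


@dataclass
class Reference:
    """Points at half of a parent file instead of copying its data."""
    parent_file: str
    half: str               # "bottom" (keys below the split key) or "top"


def split_region(parent, split_key):
    bottom = Region(parent.start_key, split_key,
                    [Reference(f, "bottom") for f in parent.files])
    top = Region(split_key, parent.end_key,
                 [Reference(f, "top") for f in parent.files])
    return bottom, top


parent = Region("Row001", "Row031", ["hfile-1", "hfile-2"])
daughter_1, daughter_2 = split_region(parent, "Row016")
```

No data file is rewritten during the split, so the work is proportional to the number of files, not to the amount of data they hold.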

Client Initiating a Data Reading Request
 Get: When a precise key is provided, the Get operation is performed to
read a single row of user data.
 Scan: The Scan operation batch scans user data in a specified Key range.

[Figure: Client asking META: "Hi, META, I want to look for books whose codes range from xxx to xxx. Please find the bookshelf number and the floor information within the code range."]

 During the OpenScanner process, scanners corresponding to the MemStore
and each HFile are created:
 The scanner corresponding to HFile is StoreFileScanner.
 The scanner corresponding to MemStore is MemStoreScanner.
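
A Scan has to combine the results of all these scanners into one sorted stream. The sketch below does this with a heap-based merge; the sample data and the `open_scanner` function are assumptions for illustration and do not mirror the actual scanner classes.

```python
import heapq

# Hypothetical sorted contents: (row_key, timestamp, value).
memstore = [("Row002", 30, "m2"), ("Row005", 31, "m5")]
hfile_1 = [("Row001", 10, "a1"), ("Row002", 11, "a2")]
hfile_2 = [("Row003", 20, "b3")]


def open_scanner(sources, start_key, stop_key):
    """Merge one sorted scanner per source and yield the newest cell per row."""
    scanners = [
        (cell for cell in source if start_key <= cell[0] < stop_key)
        for source in sources
    ]
    # Order by row key, then newest timestamp first.
    merged = heapq.merge(*scanners, key=lambda cell: (cell[0], -cell[1]))
    last_row = None
    for row_key, _, value in merged:
        if row_key != last_row:
            yield row_key, value
            last_row = row_key


result = list(open_scanner([memstore, hfile_1, hfile_2], "Row001", "Row004"))
```

Here `Row002` exists in both the MemStore and an HFile, and the merged scan returns the newer MemStore value.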

Filter
 Filter allows users to set filtering criteria during the Scan
operation. Only user data that meets the criteria is returned.
 Some typical Filter types are:
 RowFilter
 SingleColumnValueFilter
 KeyOnlyFilter
 FilterList

[Figure: a Scan passing rows through a Filter, which returns only the satisfied rows]
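
The idea of filtering during a Scan, and of combining filters in a list, can be sketched with plain predicates. The helper functions below are hypothetical and only mimic the concept; they are not the HBase Filter API.

```python
# Hypothetical helpers that mimic the idea of Filters; not the HBase API.
def row_prefix_filter(prefix):
    return lambda row_key, columns: row_key.startswith(prefix)


def single_column_value_filter(column, expected):
    return lambda row_key, columns: columns.get(column) == expected


def filter_list(*filters):
    # Pass only if every filter passes (a "must pass all" combination).
    return lambda row_key, columns: all(f(row_key, columns) for f in filters)


def scan(table, row_filter):
    """Yield only the rows that satisfy the filter, in RowKey order."""
    for row_key in sorted(table):
        if row_filter(row_key, table[row_key]):
            yield row_key


table = {
    "01": {"A:Addr": "Beijing", "A:Age": 23},
    "02": {"A:Addr": "Hangzhou", "A:Age": 43},
    "11": {"A:Addr": "Beijing", "A:Age": 35},
}
matched = list(scan(table, filter_list(
    row_prefix_filter("0"),
    single_column_value_filter("A:Addr", "Beijing"),
)))
```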

BloomFilter
 BloomFilter is used to optimize scenarios where data is randomly read, that is,
scenarios where the Get operation is performed. It can be used to quickly
check whether a piece of user data exists in a large dataset (most data in the
dataset cannot be loaded to the memory).
 BloomFilter has a certain error rate when it reports that a piece of data
exists. However, the conclusion "User data XXXX does not exist" is always
accurate.
 The data relevant to BloomFilter of HBase is stored in HFiles.
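
A minimal Bloom filter shows why a "does not exist" answer is reliable while an "exists" answer is not. The bit array size, the number of hash functions, and the use of SHA-256 are choices made for this sketch and say nothing about how HBase builds its BloomFilter data.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ("Row001", "Row002", "Row003"):
    bloom.add(row_key)
```

A key that was added always sets all of its bits, so the filter never misses it. A key that was never added can still find all of its bits set by other keys, which is the source of the error rate.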

Supporting Secondary Index
 The secondary index enables HBase to query data based on specific column
values.

        Column Family A              Column Family B
RowKey  A:Name    A:Addr.   A:Age    B:Mobile  B:Email
01      ZhangSan  Beijing   23       6875349   ……
02      LiLei     Hangzhou  43       6831475   ……
03      WangWu    Shenzhen  35       6809568   ……
04      ……        Wuhan     28       6812645   ……
05      ……        Changsha  26       6889763   ……
06      ……        Jinan     35       6854912   ……

Without a secondary index, finding specified mobile numbers such as '68XXX' requires matching
the mobile field row by row across the entire table, which results in a long delay.
With a secondary index, the index table is searched first to identify the location of the
mobile number, which narrows down the search scope and reduces the delay.
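
The difference between the two query paths can be sketched with two dictionaries. The index layout here (column value mapped to RowKeys) is an assumption for illustration and is not a description of how the enhanced feature is implemented.

```python
# Data table keyed by RowKey, using rows from the example above.
data_table = {
    "01": {"A:Name": "ZhangSan", "B:Mobile": "6875349"},
    "02": {"A:Name": "LiLei", "B:Mobile": "6831475"},
    "03": {"A:Name": "WangWu", "B:Mobile": "6809568"},
}


def find_by_full_scan(mobile):
    # Without an index, every row has to be checked.
    return [rk for rk, cols in data_table.items() if cols["B:Mobile"] == mobile]


def build_index(table, column):
    # Index table: indexed column value -> RowKeys of the data table.
    index = {}
    for row_key, cols in table.items():
        index.setdefault(cols[column], []).append(row_key)
    return index


mobile_index = build_index(data_table, "B:Mobile")


def find_by_index(mobile):
    # One lookup in the index table, then direct reads by RowKey.
    return mobile_index.get(mobile, [])
```

Both functions return the same RowKeys, but the indexed lookup does not touch rows that cannot match.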

HFS
 HBase FileStream (HFS) is a separate module of HBase. As an
encapsulation of HBase and HDFS interfaces, HFS provides capabilities
such as storing, reading, and deleting files for upper-level applications.
 HFS provides the ability to store massive small files and large
files in HDFS. That is, massive small files (less than 10 MB) and
some large files (larger than 10 MB) can be stored in HBase.

HBase MOB (1)
 MOB data (100 KB to 10 MB) is stored directly in the file
system (HDFS, for example) as HFiles, and the address and size of
each file are stored in HBase as a value. With tools to manage
these files, the frequency of compaction and splitting can be
greatly reduced, and performance can be improved.

Summary
 This module covers the KeyValue storage model, the technical
architecture, the reading and writing processes, and the enhanced
features of FusionInsight HBase.

2. What are the advantages of the Region splitting of HBase?

A. Reducing the number of files in a column family and Region

B. Improving data reading performance

C. Reducing the number of files in a column family

D. Reducing the number of files in a Region

B. Column Family

C. Column

D. Cell

More Information
 Training materials:
 http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
 Exam outline:
 http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
 Mock exam:
 http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
 Authentication process:
 http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
