HBase

HBase is a distributed, column-oriented database that provides random, real-time read/write access to large datasets. It runs on top of Hadoop and the Hadoop Distributed File System (HDFS): whereas Hadoop on its own supports only sequential batch processing, HBase enables fast random reads and writes through a scalable architecture built from a master server and region servers, with ZooKeeper handling coordination. HBase stores data in tables of rows and columns grouped into column families, and its key-value storage model provides fast lookups, easy scalability, and failover.

HBase

Topics at a glance
⮚Hadoop and its data access limitations
⮚Why HBase?
⮚HBase and its importance in the Hadoop framework
⮚History and architecture of HBase
⮚HBase components and their responsibilities
⮚HBase data storage model
⮚Advantages and disadvantages of HBase
⮚Conclusion of the session
Why HBase?

With the evolution of the internet, the scope of web applications increased:
⮚ Huge volumes of structured and semi-structured data started being generated.
⮚ Semi-structured data includes emails and JSON, XML, and .csv files.
⮚ Loads of semi-structured data were created across the globe.
⮚ Storing and processing this data became a major challenge.
Hadoop and its limitations
⮚Hadoop can perform only batch processing, and data can be accessed only in
a sequential manner.
⮚So if any data needs to be accessed randomly, a new access methodology is
required.

✔New database tools emerged to provide random access to users; within the
Hadoop framework this role is filled by HBase. Examples include:

▪ HBase
▪ Cassandra
▪ CouchDB
▪ Dynamo and MongoDB
HBase
• HBase is an open-source NoSQL database and part of the Hadoop ecosystem.
• It is similar to Google's Bigtable: HBase began as an open-source
implementation of the Bigtable design.
• HBase is written primarily in Java and is intended for real-time Big Data
applications.
• HBase is a distributed, column-oriented, non-relational database management
system that runs on top of the Hadoop Distributed File System (HDFS).
• HBase is a column-oriented database, and its tables are sorted by row key.
• The table schema defines only column families; within a family, data is
stored as key-value pairs.
• It uses log-structured storage with a Write-Ahead Log (WAL).
• It supports fast random access and heavy write workloads.
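The key-value model described above can be sketched in a few lines. This is an illustrative sketch only, not HBase's real API or internals: a row maps column family → qualifier → value, i.e. nested key-value pairs rather than a fixed set of columns. The family and qualifier names are invented for the example.

```python
# Illustrative sketch of HBase's storage model: one row, two column
# families, each holding its own key-value pairs.
row = {
    "personal": {"name": "Amit", "age": "25"},  # column family "personal"
    "job":      {"company": "HCL"},             # column family "job"
}

def get_cell(row_data, family, qualifier):
    """Look up one cell by its (column family, qualifier) coordinate."""
    return row_data[family][qualifier]

print(get_cell(row, "personal", "name"))  # -> Amit
```

Because the schema fixes only the families, a new qualifier such as `row["job"]["designation"]` can be added to any row at any time without altering the table definition.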

How is HBase different from other NoSQL models?

• HBase stores data as key/value pairs in a columnar model, in which all the
columns are grouped together into column families.
• Running HBase on top of Hadoop increases the throughput and performance of
a distributed cluster setup.
• It provides faster random read and write operations.
Features of HBase
• Horizontally scalable: capacity grows by adding nodes, and new columns can
be added to a table at any time.
• Automatic failover: data handling switches automatically to a standby
system in the event of system compromise or failure.
• Integration with the MapReduce framework: HBase tables can serve as the
source and sink of MapReduce jobs, and HBase is built over the Hadoop
Distributed File System.
• It doesn't enforce relationships within your data.
• It is designed to run on a cluster of computers built from commodity
hardware.
• HBase is built for low-latency operations.
History of HBase
• In Nov 2006, Google released the paper on BigTable.
• Feb 2007, Initial HBase prototype was created as a Hadoop
contribution.
• Oct 2007, The first usable HBase along with Hadoop 0.15.0 was
released.
• Jan 2008, HBase became a subproject of Hadoop.
• Oct 2008, HBase 0.18.1 was released.
• Jan 2009, HBase 0.19.0 was released.
• Sept 2009, HBase 0.20.0 was released.
• May 2010, HBase became Apache top-level project.
HBase existence in the Hadoop Ecosystem
HBase Table – To store data

HBase: Keys and Column Families

Each record is divided into Column Families

Each row has a Key

Each column family consists of one or more Columns

Example: Storage Mechanism in HBase

⮚HBase is a column-oriented database.
⮚Data is stored in the form of tables.
HBase Architecture

❖Apache ZooKeeper monitors the system.
❖The HBase Master assigns regions and performs load balancing.
❖Region Servers serve data for reads and writes.
❖Region Servers run on the different machines of the Hadoop cluster.
❖Each Region Server consists of Regions, an HLog, Stores, MemStores, and
various files.
❖All of this sits on top of the HDFS storage system.
HBase - Components

HBase has three major components:

1. Master server (HMaster)
2. Region Server
3. ZooKeeper
1. HMaster

• HMaster is the implementation of the Master server in HBase.
• It acts as a monitoring agent for all Region Server instances in the cluster
and serves as the interface for all metadata changes.
• In a distributed cluster environment, the Master runs on the NameNode.
Responsibilities of HMaster in the HBase architecture

a. Coordinating the Region Servers:
▪ Assigns regions on startup.
▪ Handles recovery and load balancing.
▪ Monitors all RegionServer instances in the HBase cluster.

b. Admin functions:
When a client wants to change a schema or perform metadata operations,
HMaster takes responsibility for them, for example:
• Table (createTable, removeTable, enable, disable)
• ColumnFamily (addColumn, modifyColumn)
• Region (move, assign)
2. Regions & Region Server

⮚A table is split horizontally into several regions according to row-key
ranges (start key to end key).
⮚Each such group of rows is called a region, and regions are assigned to
nodes in the cluster; the node managing them is called a Region Server.
• Region Servers are responsible for processing data read and write requests.
• Each Region Server can manage about 1,000 regions.
⮚HRegionServer is the Region Server implementation.
⮚It is responsible for serving and managing the regions (data) present in a
distributed cluster.
⮚Region Servers run on the DataNodes of the Hadoop cluster.
How regions split

Emp_id   Personal Details    Education_Details        Job Details
         Name     Age        Graduate   Percentage    Company Name   Designation
1        Amit     25         BSc        76            HCL            Project Lead
2        Sumit    30         BTech      80            TCS            Project Manager
3        Varsha   35         MTech      75            Wipro          Project Engineer

[Figure: the table above split by row key into regions, each served by a
Region Server on top of HDFS]
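The horizontal splitting described above can be sketched as a tiny recursive function. This is a hypothetical illustration, not HBase's split policy: the threshold and the midpoint split are invented for the example; real HBase splits a region at a middle row key once the region's store files exceed a configured size.

```python
# Toy sketch of horizontal region splitting: a region whose sorted row
# keys exceed a threshold is split at the middle key into two daughters.
def split_region(rows, max_rows=2):
    """rows: sorted list of row keys held by one region."""
    if len(rows) <= max_rows:
        return [rows]                      # small enough: one region
    mid = len(rows) // 2                   # middle row key = split point
    return split_region(rows[:mid], max_rows) + split_region(rows[mid:], max_rows)

regions = split_region(["1", "2", "3", "4"])
print(regions)  # -> [['1', '2'], ['3', '4']]
```

Each resulting list corresponds to one region's contiguous (start key, end key) range, which is what gets assigned to a Region Server.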
HBase Regions & Region Server (contd.)

⮚When an HBase Region Server receives write and read requests from a client,
it routes the request to the specific region where the relevant column family
resides.
⮚Clients can contact HRegion Servers directly; HMaster's permission is not
required for client communication with HRegion Servers.
⮚The client needs HMaster's help only for operations involving metadata and
schema changes.

❖HRegion Servers perform the following functions:

▪ Hosting and managing regions
▪ Splitting regions automatically
▪ Handling read and write requests
▪ Communicating with the client directly
Region Server
⮚A Region Server runs on an HDFS DataNode and is responsible for processing
data read and write requests.
⮚If a client needs data, it interacts directly with the Region Server.
⮚Regions are the pieces of tables that are split up and spread across the
Region Servers.

Components of a Region Server:

⮚WAL (Write-Ahead Log): a file on the distributed file system that records
new data before it is written, for recovery.
⮚BlockCache: the read cache. The most frequently accessed data is kept in
memory in an LRU (Least Recently Used) cache.
⮚MemStore: the write cache, held in memory.
⮚HFile: stores HBase data on hard disk (HDFS).
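The LRU behaviour of the BlockCache can be sketched with a small ordered map. This is a minimal illustration, not HBase's BlockCache implementation: the capacity of 2 blocks and the block names are invented for the example.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU read-cache sketch (illustrative, fixed capacity)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entry first

    def get(self, key):
        if key not in self.blocks:
            return None               # cache miss -> caller reads from HFile
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

cache = BlockCache(capacity=2)
cache.put("blk1", b"...")
cache.put("blk2", b"...")
cache.get("blk1")           # touch blk1, so blk2 is now the LRU entry
cache.put("blk3", b"...")   # over capacity: evicts blk2
print(cache.get("blk2"))    # -> None
```

This is why "most frequently accessed" data survives in the cache: every hit refreshes an entry's position, and only the coldest block is evicted.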
3. HBase - ZooKeeper
• HBase uses ZooKeeper to coordinate shared state information among the
members of a distributed system.
• The active HMaster and the Region Servers each connect to ZooKeeper with a
session.
• For active sessions, ZooKeeper maintains ephemeral nodes using heartbeats.
• ZooKeeper keeps track of which servers are alive and available, and
provides notification when a server fails.
• Ephemeral nodes are znodes that exist as long as the session that created
them is active; the znode is deleted when the session ends.
• ZooKeeper uses a consensus protocol to ensure the consistency of the
distributed state.
• Each Region Server creates an ephemeral node; HMaster monitors these nodes
to discover available Region Servers.
• The active HMaster also sends heartbeats to ZooKeeper.
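The ephemeral-node mechanism can be modelled in a few lines. This is a toy model, not ZooKeeper's API: the znode paths, the 5-second timeout, and the explicit `now` clock are invented for the example; real ZooKeeper ties znode lifetime to the client session.

```python
# Toy model of ephemeral znodes: a node lives only while its owning
# session keeps heartbeating within the session timeout.
SESSION_TIMEOUT = 5.0
ephemeral = {}  # znode path -> time of last heartbeat

def heartbeat(path, now):
    """A server's session pings ZooKeeper, keeping its znode alive."""
    ephemeral[path] = now

def live_servers(now):
    """Servers whose last heartbeat is within the timeout window."""
    return [p for p, last in ephemeral.items() if now - last <= SESSION_TIMEOUT]

heartbeat("/rs/node1", now=0.0)
heartbeat("/rs/node2", now=4.0)
print(live_servers(now=6.0))  # node1's session has timed out
```

When a znode disappears this way, the watcher (HMaster) is notified and can reassign that server's regions, which is the failure-detection path the bullets above describe.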
Working Process of Zookeeper

1. Active HMaster
2. Inactive HMaster
❖ HBase META Table
• The META table is a special HBase catalog table. It holds the location of
the regions in the HBase cluster.
• It keeps a list of all regions in the system.
• The structure of the .META. table is as follows:
• Key: region start key, region id
• Values: RegionServer
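The catalog lookup that clients perform against .META. can be sketched as follows. This is an illustrative sketch, not the real catalog format: the table name, start keys, region ids, and server names are all made up, and the lookup rule is simply "the entry with the largest start key not greater than the row key".

```python
# Illustrative .META. contents: key = (table, region start key, region id),
# value = the Region Server hosting that region.
meta = {
    ("emp", "",      "r1"): "rs-node-1",   # "" start key = beginning of table
    ("emp", "emp_2", "r2"): "rs-node-2",
}

def find_region_server(table_name, row_key):
    """Pick the region whose start key is the largest one <= row_key."""
    candidates = [(start, rs) for (t, start, _rid), rs in meta.items()
                  if t == table_name and start <= row_key]
    return max(candidates)[1]

print(find_region_server("emp", "emp_1"))  # -> rs-node-1
```

A client caches this answer, so subsequent reads and writes for the same key range go straight to the Region Server without consulting the catalog again.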
HBase table parameters

• Tables: Data is stored in table format in HBase.
• Row Key: Row keys are used to search for records, which makes searches
fast.
• Column Families: Various columns are combined into a column family. Column
families are stored together, which makes the searching process faster
because data belonging to the same column family can be accessed together in
a single seek.
• Column Qualifiers: Each column's name is known as its column qualifier.
• Cell: Data is stored in cells.
• Timestamp: A timestamp is a combination of date and time. Whenever data is
stored, it is stored with its timestamp.
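The timestamp parameter makes every cell versioned, as the following sketch shows. This is illustrative only, not HBase's API: the integer timestamps and cell names are invented, and real HBase uses millisecond epoch times and keeps a configurable number of versions per cell.

```python
# Illustrative sketch: every write carries a timestamp, and a read
# returns the newest version of the cell by default.
cell_versions = {}  # (row, family, qualifier) -> {timestamp: value}

def put(row, family, qualifier, value, ts):
    cell_versions.setdefault((row, family, qualifier), {})[ts] = value

def get_latest(row, family, qualifier):
    versions = cell_versions[(row, family, qualifier)]
    return versions[max(versions)]  # highest timestamp = newest version

put("u1", "account", "last_login", "05.30 PM", ts=1)
put("u1", "account", "last_login", "08.30 PM", ts=2)
print(get_latest("u1", "account", "last_login"))  # -> 08.30 PM
```

Because old versions are kept alongside new ones, a write never overwrites data in place; it simply adds a newer timestamped value.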
HDFS vs. HBase

HDFS                                          HBase
HDFS is a Java-based file system used         HBase is a Java-based NoSQL
for storing large data sets.                  database.

HDFS has a rigid architecture that does       HBase allows dynamic changes and
not allow changes; it doesn't facilitate      can be used for standalone
dynamic storage.                              applications.

HDFS is ideally suited for write-once,        HBase is ideally suited for random
read-many-times use cases.                    writes and reads of data stored in
                                              HDFS.
HBase - Read
• A read in HBase must be reconciled across the HFiles, the MemStore, and
the BlockCache.
• The BlockCache is designed to keep frequently accessed data from the
HFiles in memory, so as to avoid disk reads.
• Each column family has its own BlockCache.
Block: the smallest indexed unit of data and the smallest unit of data that
can be read from disk; the default size is 64 KB.
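The reconciliation order of a read can be sketched as a simple fallthrough. This is a minimal illustration under assumed data, not HBase's real read path (which merges versions across all three sources rather than stopping at the first hit): the row keys and values below are invented.

```python
# Sketch of serving a read: check the in-memory MemStore first, then the
# BlockCache, and only fall back to the on-disk HFiles on a miss.
memstore    = {"row2": "fresh-value"}                       # unflushed writes
block_cache = {"row1": "cached-value"}                      # hot HFile blocks
hfiles      = {"row1": "old-value", "row3": "disk-value"}   # data on HDFS

def read(row_key):
    if row_key in memstore:
        return memstore[row_key]
    if row_key in block_cache:
        return block_cache[row_key]
    value = hfiles.get(row_key)       # disk read
    if value is not None:
        block_cache[row_key] = value  # cache the block for the next read
    return value

print(read("row3"))  # -> disk-value, and "row3" is now in the block cache
```

Only the last step touches disk, which is why a warm BlockCache makes repeated reads of the same rows cheap.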
HBase - Write

When a write is made, by default it goes into two places:
⮚the write-ahead log (WAL), also called the HLog
⮚the in-memory write buffer, the MemStore
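The two write destinations, plus the flush to disk, can be sketched as follows. This is illustrative only: the 2-entry flush threshold is an invented toy value (real MemStores flush when they reach a configured size, typically 128 MB), and the dict-of-sorted-rows stands in for an immutable HFile.

```python
# Sketch of the write path: append to the WAL for durability, buffer in
# the MemStore, and flush the MemStore to an immutable HFile when full.
wal = []        # write-ahead log: replayed after a crash
memstore = {}   # in-memory write buffer
hfiles = []     # immutable, sorted files on disk
FLUSH_SIZE = 2  # assumed tiny threshold for the example

def write(row_key, value):
    wal.append((row_key, value))   # 1. durable log entry first
    memstore[row_key] = value      # 2. then the in-memory buffer
    if len(memstore) >= FLUSH_SIZE:
        hfiles.append(dict(sorted(memstore.items())))  # flush, sorted by key
        memstore.clear()

write("row1", "a")
write("row2", "b")  # reaches the threshold and triggers a flush
print(hfiles)       # -> [{'row1': 'a', 'row2': 'b'}]
```

The WAL is what makes this safe: if the server crashes before a flush, the buffered writes are recovered by replaying the log.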
Advantages of HBase
❖ HBase is designed to store denormalized data.
❖ HBase supports automatic partitioning.
❖ Strong consistency model: once a write returns, all readers see the same
value.
❖ Scales automatically:
– When data grows too large, regions split automatically.
– It uses HDFS to spread and replicate data.
❖ Built-in recovery: it uses the Write-Ahead Log for recovery.
❖ Integrated with Hadoop.
❖ HBase is schema-less: no fixed data model needs to be defined.
❖ HBase can perform random read and write operations.
❖ HBase provides data replication across clusters for higher availability.
❖ It features random access (via internal indexes) to data stored in HDFS
files, for faster lookups and searching.
Disadvantages of HBase

• Single point of failure: if the HMaster goes down, the complete cluster
fails and no work/task can be performed.
• It cannot perform SQL-style functions and doesn't support SQL syntax.
• It does not contain a query optimizer.
• It does not support transactions.
• Business continuity reliability:
– Write-Ahead Log replay is very slow.
– Crash recovery is also slow and complex.
• Joins and normalization are very difficult to perform.
• Storing large binary data is very difficult.
Real-Time Example of HBase: Facebook

How Facebook uses HBase to store user data:

User Account ID | Account Type (Personal/Business) | Type of Contents Posted | Posted for (Public/Private) | Time Stamp | Violating Community Standards | Last Login Activity Time of Account
Rahul3@fb | Personal | Image + Text | Public | 20/08/2022 08.45.00 PM | No | 05.30 PM
ABZ@fb | Business | Text + Video | Public | 25/08/2022 06.45.00 PM | No | 08.30 PM
Tarun@fb | Personal | Text + Video | Public | 25/08/2022 06.45.00 PM | Yes | 08.30 PM

User Account ID | Account Type (Personal/Business) | Type of Contents Posted | Action Required | Account Suspended | Account Blocked | Remarks
Tarun@fb | Personal | Text + Video (hateful content) | Yes | For a period of 2 weeks, etc. | Yes | Violating community standards
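A table like this maps naturally onto the HBase model: the account ID is the row key, and the profile and moderation columns form separate column families. The sketch below is hypothetical; the family and qualifier names are invented for illustration and are not Facebook's actual schema.

```python
# Illustrative mapping of the table above onto HBase's row-key /
# column-family model: one row per account ID, two families.
users = {
    "Tarun@fb": {
        "profile":    {"type": "Personal", "content": "Text + Video"},
        "moderation": {"violating": "Yes", "blocked": "Yes"},
    },
    "Rahul3@fb": {
        "profile":    {"type": "Personal", "content": "Image + Text"},
        "moderation": {"violating": "No", "blocked": "No"},
    },
}

def flagged_accounts(table):
    """Scan for rows whose moderation family marks a violation."""
    return [row_key for row_key, families in table.items()
            if families["moderation"]["violating"] == "Yes"]

print(flagged_accounts(users))  # -> ['Tarun@fb']
```

Grouping the moderation columns into their own family means a scan like this reads only that family's data, which is the "single seek per column family" benefit described earlier.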
Any Queries?
