10 HBase
Limitations of Hadoop
What is HBase
HBase vs HDFS
Storage Mechanism in HBase
Disclaimer: Content present in this PPT is © Copyright IBM Corp.
Limitations of Hadoop
• Hadoop can perform batch processing, and data will be accessed only in a sequential manner. That means one has to search the entire dataset even for the simplest of jobs.
• A huge dataset, when processed, results in another huge dataset, which should also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).
What is HBase
• Since the 1970s, the RDBMS has been the solution for data storage and maintenance related problems. After the advent of big data, companies realized the benefit of processing big data and started opting for solutions like Hadoop.
• Hadoop uses a distributed file system for storing big data, and MapReduce to process it. Hadoop excels in storing and processing huge volumes of data in various formats, such as semi-structured or even unstructured data.
• HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.
• HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).
• It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop file system.
• One can store data in HDFS either directly or through HBase. A data consumer reads/accesses the data in HDFS randomly using HBase. HBase sits on top of the Hadoop file system and provides read and write access.
HBase vs HDFS
HDFS:
• HDFS is a distributed file system suitable for storing large files.
• HDFS does not support fast individual record lookups.
• It provides high-latency batch processing.
• It provides only sequential access to data.
HBase:
• HBase is a database built on top of the Hadoop file system.
• HBase provides fast lookups for larger tables.
• There is no concept of batch processing; it provides low-latency access to single rows from billions of records (random access).
• HBase internally uses hash tables and provides random access, and it stores the data in indexed HDFS files for faster lookups.
(A shell sketch contrasting these access patterns follows this comparison.)
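To make the random-vs-sequential distinction concrete, here is a minimal HBase shell sketch, assuming the 'emp' table that is created later in this deck: a get fetches one row directly by its row key, while a scan walks the table row by row.
hbase(main):001:0> get 'emp', '1'    # random access: fetch one row by row key
hbase(main):002:0> scan 'emp'        # sequential access: iterate over the whole table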
Storage Mechanism in HBase
• HBase is a column-oriented database, and the tables in it are sorted by row. The table schema defines only column families, which are the key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Subsequent column values are stored contiguously on the disk. Each cell value of the table has a timestamp. In short, in an HBase table (see the shell sketch after this list):
• A table is a collection of rows.
• A row is a collection of column families.
• A column family is a collection of columns.
• A column is a collection of key-value pairs.
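For example, a cell is addressed by table, row key, 'column family:column qualifier', and timestamp. A minimal HBase shell sketch, reusing the 'emp' table and the 'personal data' and 'professional data' column families from the command slide later in this deck, with illustrative values:
hbase(main):001:0> create 'emp', 'personal data', 'professional data'
hbase(main):002:0> put 'emp', '1', 'personal data:name', 'Ravi'
hbase(main):003:0> put 'emp', '1', 'professional data:designation', 'Manager'
hbase(main):004:0> get 'emp', '1', {COLUMN => 'personal data:name'}
Here 'emp' is the table, '1' is the row key, 'personal data' and 'professional data' are column families, 'name' and 'designation' are columns (qualifiers), and HBase stamps each stored value with a timestamp automatically.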
Column Oriented and Row Oriented
• Column-oriented databases are those that store data tables as sections of columns of data, rather than as rows of data. In short, they will have column families.
• A row-oriented database is suitable for Online Transaction Processing (OLTP). Such databases are designed for a small number of rows and columns.
• Column-oriented databases are designed for huge tables.
Features of HBase
• HBase is linearly scalable.
• It has automatic failure support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It provides data replication across clusters.
HBase vs RDBMS
HBase:
• HBase is schema-less; it doesn't have the concept of a fixed-column schema and defines only column families (see the shell sketch after this comparison).
• It is built for wide tables. HBase is horizontally scalable.
• No transactions are there in HBase.
• It has de-normalized data.
• It is good for semi-structured as well as structured data.
RDBMS:
• An RDBMS is governed by its schema, which describes the whole structure of the tables.
• It is thin and built for small tables. Hard to scale.
• An RDBMS is transactional.
• It will have normalized data.
• It is good for structured data.
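Because only column families are declared at table-creation time, a new column can be introduced simply by writing to a new qualifier; there is no ALTER TABLE-style schema change. A sketch in the HBase shell, reusing the 'emp' table with illustrative values:
hbase(main):001:0> put 'emp', '1', 'personal data:city', 'Pune'
hbase(main):002:0> put 'emp', '2', 'personal data:name', 'Meena'
Only the column families ('personal data', 'professional data') were defined when the table was created; the 'city' column comes into existence the moment a value is written to it.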
HBase Architecture and its Important Components
HBase has three major components: the client library, a master server, and region servers. Region servers can be added or removed as per requirement.
Master Server
The master server:
• Assigns regions to the region servers and takes the help of Apache ZooKeeper for this task.
• Handles load balancing of the regions across region servers. It unloads the busy servers and shifts the regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
• Is responsible for schema changes and other metadata operations such as creation of tables and column families.
(A shell sketch of master-related commands follows this list.)
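As a sketch, the cluster state that the master manages can be inspected, and the region balancer triggered, from the standard HBase shell (output omitted):
hbase(main):001:0> status 'detailed'   # per-server and per-region cluster state
hbase(main):002:0> balancer            # ask the master to rebalance regions across region servers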
Regions
• Regions are nothing but tables that are split up and spread across the region servers.
• The region servers have regions that:
• Communicate with the client and handle data-related operations.
• Handle read and write requests for all the regions under them.
• Decide the size of the regions by following the region size thresholds.
• The store contains the memstore and HFiles. The memstore is just like a cache memory: anything that is entered into HBase is stored here initially. Later, the data is transferred and saved in HFiles as blocks, and the memstore is flushed. (A shell sketch of flushing and splitting follows this list.)
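A memstore flush and a region split can also be requested manually from the standard HBase shell; a sketch, reusing the 'emp' table:
hbase(main):001:0> flush 'emp'   # write the table's memstore contents out to HFiles
hbase(main):002:0> split 'emp'   # ask the region server to split the table's region(s)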
ZooKeeper
• ZooKeeper is an open-source project that provides services like maintaining configuration information, naming, providing distributed synchronization, etc.
• ZooKeeper has ephemeral nodes representing the different region servers. Master servers use these nodes to discover available servers.
• In addition to availability, the nodes are also used to track server failures or network partitions.
• Clients communicate with region servers via ZooKeeper.
• In pseudo-distributed and standalone modes, HBase itself will take care of ZooKeeper. (A shell sketch for inspecting ZooKeeper follows this list.)
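The ZooKeeper state that HBase maintains (active master, registered region servers, and so on) can be inspected from the standard HBase shell; a sketch, output omitted:
hbase(main):001:0> zk_dump   # dump HBase's view of the ZooKeeper quorum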
Commands
Command to create a table in HBase
hbase(main):001:0> create 'emp', 'personal data', 'professional data'
General Commands
• status - Provides the status of HBase, for example, the number of servers.
• version - Provides the version of HBase being used.
• table_help - Provides help for table-reference commands.
• whoami - Provides information about the user.
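For instance, these general commands take no arguments in the shell (a sketch; output omitted):
hbase(main):002:0> status
hbase(main):003:0> version
hbase(main):004:0> whoami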
Data Definition Language Commands
These are the commands that operate on the tables in HBase.
• create - Creates a table.
• list - Lists all the tables in HBase.
• disable - Disables a table.
• is_disabled - Verifies whether a table is disabled.
• enable - Enables a table.
• is_enabled - Verifies whether a table is enabled.
• describe - Provides the description of a table.
• alter - Alters a table.
• exists - Verifies whether a table exists.
• drop - Drops a table from HBase.
• drop_all - Drops the tables matching the 'regex' given in the command.
Data Manipulation Language Commands
• put - Puts a cell value at a specified column in a specified row in a particular table.
• get - Fetches the contents of a row or a cell.
• delete - Deletes a cell value in a table.
• deleteall - Deletes all the cells in a given row.
• scan - Scans and returns the table data.
• count - Counts and returns the number of rows in a table.
• truncate - Disables, drops, and recreates a specified table.
(A shell sketch walking through these commands follows this list.)
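A short end-to-end sketch of these commands in the standard HBase shell, reusing the 'emp' table from the earlier slide with illustrative values:
hbase(main):001:0> put 'emp', '1', 'personal data:name', 'Ravi'   # write a cell
hbase(main):002:0> get 'emp', '1'                                  # read one row by row key
hbase(main):003:0> scan 'emp'                                      # read all rows
hbase(main):004:0> count 'emp'                                     # number of rows in the table
hbase(main):005:0> delete 'emp', '1', 'personal data:name'         # delete a cell
hbase(main):006:0> disable 'emp'                                   # a table must be disabled before dropping
hbase(main):007:0> drop 'emp'                                      # drop the table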
THANK YOU