HBase
• HBase is a distributed column-oriented database built
on top of HDFS.
• HBase is the Hadoop application to use when users require real-time, random read/write access to very large datasets.
• HBase can store massive amounts of data from
terabytes to petabytes.
• It is column oriented and horizontally scalable.
• Many strategies and implementations exist for database storage and retrieval, but most were not designed for very large, distributed datasets.
• HBase can host very large, sparsely populated tables on clusters built from commodity hardware.
• Unlike relational database systems, HBase is not a relational data store and does not natively support a structured query language like SQL.
Difference between HBase and RDBMS
RDBMS                                  | HBase
RDBMS is mostly row oriented           | HBase is column oriented
RDBMS has a fixed schema               | HBase allows columns to be added at run time
RDBMS is suitable for structured data  | HBase is suitable for structured as well as semi-structured data
RDBMS is optimized for joins           | HBase is not optimized for joins
RDBMS typically stores normalized data | HBase can store denormalized data
• HBase is a NoSQL, column-oriented database that is
built on top of the Hadoop ecosystem.
• It is designed to provide low-latency, high-throughput access to large-scale, distributed datasets.
• HBase is written in Java, and its native client API is Java.
• HBase also supports applications written in other languages through its Avro, REST, and Thrift interfaces.
• The HBase system is designed to scale linearly.
• HBase comprises a set of standard tables with rows and columns, much like a traditional database.
• Each table must have an element defined as a primary key (called the row key in HBase), and all access to HBase tables must go through this key.
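The row-key access pattern described above can be illustrated with a small in-memory sketch. This is not the HBase client API: the `ToyTable` class, its method names, and the sample keys and columns are all hypothetical, chosen only to show point lookups and range scans over rows kept sorted by key.

```python
import bisect


class ToyTable:
    """Toy in-memory table with rows sorted by row key (not the real HBase API)."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        # Insert the row key in sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup: every read is addressed by row key.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        # Range scan over sorted keys: start inclusive, stop exclusive.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


table = ToyTable()
table.put("user#003", "info:name", "Carol")
table.put("user#001", "info:name", "Alice")
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:email", "alice@example.com")

print(table.get("user#001"))
print([key for key, _ in table.scan("user#001", "user#003")])
```

Because rows are ordered by key, a scan over a key range touches only neighbouring rows, which is why row key design matters so much in practice.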
Applications of HBase
1. Real-time Analytics
• HBase is helpful in applications that focus on real-time analytics, as it provides low-latency data access.
• It provides fast read and write performance and can handle large amounts of data, making it suitable for real-time data analysis.
2. Social Media Applications
• HBase is well suited to social media applications that require high scalability and performance.
• It can handle the large volume of data generated by social media platforms and provide real-time analytics capabilities.
3. IoT Applications
• HBase is scalable and provides fast write performance, which makes it helpful in IoT applications that require low-latency data processing.
4. Online Transaction Processing (OLTP)
• HBase can serve OLTP-style workloads built around single-row reads and writes, providing high availability, consistency, and low-latency data access.
• HBase’s distributed architecture and automatic failover capabilities make it a good fit for such applications when they require high availability.
5. Ad serving and clickstream analysis
• HBase can be used to store and process large
volumes of clickstream data for ad serving and
clickstream analysis.
• HBase’s column-oriented data storage and row-key indexing make it a good fit for these types of applications.
Advantages of HBase
1. Scalability
• HBase can handle extremely large datasets that can
be distributed across a cluster of machines.
• It is designed to scale horizontally by adding more nodes to the cluster, which allows it to handle increasingly large amounts of data.
2. High Performance
• HBase is optimized for low-latency, high-throughput access to data.
• It uses a distributed architecture that allows it to process large amounts of data in parallel, which can result in faster query response times.
3. Flexible Data Model
• HBase’s column-oriented data model allows for
flexible schema design and supports sparse
datasets.
• This can make it easier to work with data that has a
variable or evolving schema.
4. Fault Tolerance
• HBase is designed to be fault-tolerant: its data files are stored in HDFS, which replicates them across multiple nodes in the cluster.
• This helps ensure that data is not lost in the event of
a hardware or network failure.
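The flexible, sparse data model from item 3 can be sketched with plain dictionaries. This is only an illustration, not HBase code: the `put` helper, the row keys, and the `reading` and `meta` column families are hypothetical, and the sketch assumes those families already exist while new qualifiers inside them appear at write time.

```python
from collections import defaultdict

# row key -> {"family:qualifier": value}; only cells that exist are stored.
rows = defaultdict(dict)


def put(row_key, column, value):
    """Write one cell; a new qualifier needs no schema change in this sketch."""
    rows[row_key][column] = value


put("sensor#1", "reading:temp", 21.5)
put("sensor#2", "reading:temp", 19.0)
put("sensor#2", "reading:humidity", 40)   # new qualifier added at write time
put("sensor#3", "meta:location", "roof")

# Compare what is actually stored with a dense rows-by-columns grid.
all_columns = sorted({col for cells in rows.values() for col in cells})
stored_cells = sum(len(cells) for cells in rows.values())
dense_cells = len(rows) * len(all_columns)

print(all_columns)
print(stored_cells, "cells stored vs", dense_cells, "in a dense grid")
```

Missing cells cost nothing here, which is the property that makes very wide, sparsely populated tables practical.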
Disadvantages of HBase
1. Complexity
• HBase can be complex to set up and manage.
• It requires knowledge of the Hadoop ecosystem and
distributed systems concepts, which can be
challenging for some users.
2. Limited Query Language
• HBase has no SQL-like query language; the HBase Shell and client API offer basic operations such as get, put, and scan, which are not as feature-rich as SQL.
• This can make it difficult to perform complex queries and analyses.
3. No support for multi-row transactions
• HBase guarantees atomicity only for operations on a single row and does not support multi-row transactions, which can make it difficult to maintain data consistency in some use cases.
4. Not suitable for all use cases
• HBase is best suited for use cases where high throughput and low-latency access to large datasets are required.
• It may not be the best choice for applications that require complex queries, joins, or multi-row transactional guarantees.
Architecture of HBase
• HBase architecture comprises three main components:
• HMaster
• Region Server
• ZooKeeper
• HMaster
• HMaster is the implementation of the Master Server in HBase.
• It assigns regions to Region Servers and handles DDL operations (creating and deleting tables).
• It monitors all Region Server instances present in the cluster.
• In a distributed environment, the Master runs several background threads.
• HMaster also handles tasks such as load balancing and failover.
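The region assignment and failover duties listed above can be pictured with a deliberately simplified sketch. It is not the balancer HBase actually uses: the `assign` and `fail_over` functions, the "fewest regions first" rule, and the server and region names are all assumptions made for illustration.

```python
def assign(regions, servers):
    """Give each region to the server currently holding the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


def fail_over(assignment, dead_server):
    """Move the regions of a failed server onto the remaining servers."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


plan = assign(["r1", "r2", "r3", "r4", "r5", "r6"], ["rs1", "rs2", "rs3"])
print(plan)

# Simulate the master noticing that rs2 has died.
plan = fail_over(plan, "rs2")
print(plan)
```

After the failover every region is still hosted somewhere and the load stays even, which is the outcome the master aims for.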
• Region Server
• HBase tables are divided horizontally by row key range into Regions.
• Regions are the basic building blocks of an HBase cluster: tables are distributed across the cluster as regions, and each region is made up of column families.
• A Region Server typically runs alongside an HDFS DataNode in the Hadoop cluster.
• A Region Server is responsible for handling, managing, and executing read and write operations on the set of regions it hosts.
• The default maximum region size was 256 MB in early HBase releases; later releases raised the default (hbase.hregion.max.filesize) to 10 GB.
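Dividing a table by row key range can be sketched as a lookup over sorted region start keys. The region names, start keys, and the `locate` and `split` helpers below are hypothetical, and the split is triggered by hand here rather than by a size threshold.

```python
import bisect

# Each region covers [its start key, the next region's start key).
# The empty string stands for "from the beginning of the table".
region_starts = ["", "g", "n", "t"]
region_names = ["region-0", "region-1", "region-2", "region-3"]


def locate(row_key):
    """Return the name of the region whose key range contains row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    return region_names[index]


def split(region_index, split_key):
    """Split one region in two at split_key (a key inside that region's range)."""
    region_starts.insert(region_index + 1, split_key)
    region_names.insert(region_index + 1, region_names[region_index] + "-b")


print(locate("apple"), locate("grape"), locate("zebra"))

# Split region-1, which covers ["g", "n"), at "k".
split(1, "k")
print(locate("grape"), locate("kiwi"))
```

Keys below the split point stay in the original region and keys at or above it move to the new one, so a growing table spreads over more regions without rewriting the others.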
• Zookeeper
• ZooKeeper acts as a coordinator in HBase.
• It provides services such as maintaining configuration information, naming, distributed synchronization, and server failure notification.
• Clients use ZooKeeper to locate region servers, then communicate with them directly.
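How a client might use ZooKeeper to reach a region server can be sketched as follows, assuming the classic design in which ZooKeeper records where the catalog (meta) table lives and the client caches what it learns. Every name here (`zookeeper`, `meta_table`, `find_server`, the servers and rows) is a hypothetical stand-in, and the cache is keyed by row for simplicity.

```python
# Hypothetical stand-ins for cluster state; none of these are real HBase APIs.
zookeeper = {"meta-location": "rs1"}               # which server hosts the catalog
meta_table = {"rs1": [("", "rs2"), ("m", "rs3")]}  # (region start key, hosting server)
region_data = {
    "rs2": {"alice": "row for alice"},
    "rs3": {"zoe": "row for zoe"},
}
location_cache = {}
zk_lookups = 0


def find_server(row_key):
    """Resolve the server hosting row_key, asking ZooKeeper only on a cache miss."""
    global zk_lookups
    if row_key in location_cache:
        return location_cache[row_key]
    zk_lookups += 1
    meta_server = zookeeper["meta-location"]
    server = None
    for start_key, host in meta_table[meta_server]:  # entries sorted by start key
        if row_key >= start_key:
            server = host
    location_cache[row_key] = server
    return server


def get(row_key):
    # The data request itself goes straight to the region server.
    return region_data[find_server(row_key)].get(row_key)


print(get("alice"), "|", get("zoe"), "|", get("alice"))
print("ZooKeeper lookups:", zk_lookups)
```

The repeated read of the same row is served from the location cache, so the coordinator is consulted for discovery only and stays off the data path.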
[Figure: Architecture of HBase]
Features of HBase Architecture
• Distributed and Scalable
• HBase is designed to be distributed and scalable,
which means it can handle large datasets and can
scale out horizontally by adding more nodes to the
cluster.
• Column-oriented Storage
• HBase stores data in a column-oriented manner: data is grouped on disk by column family rather than stored as complete rows.
• This allows for efficient retrieval and aggregation of the columns a query needs.
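The benefit of column-oriented storage for aggregation can be shown with a toy comparison of two layouts. This is a sketch under simplifying assumptions: the sample records are invented, and plain Python lists stand in for the on-disk grouping of column data.

```python
# Row layout: each record is stored as a whole.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]

# Column layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating over rows has to visit every complete record.
total_from_rows = sum(row["amount"] for row in rows)

# Aggregating over the column layout reads only the one column it needs.
total_from_column = sum(columns["amount"])

print(total_from_rows, total_from_column)
```

Both totals agree; the difference is how much unrelated data has to be read to compute them.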
• Hadoop Integration
• HBase is built on top of HDFS, which means it can
leverage HDFS for storage and MapReduce for data
processing.
• Consistency and Replication
• HBase provides strong consistency guarantees for read and write operations, and supports replication of data across multiple nodes for fault tolerance.
• Built-in Caching
• HBase has a built-in caching mechanism that can
cache frequently accessed data in memory, which
can improve query performance.
• Compression
• HBase supports compression of data, which can
reduce storage requirements and improve query
performance.
• Flexible Schema
• HBase supports flexible schemas, which means the schema can be updated on the fly without requiring a database schema migration.
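The built-in caching feature listed above can be illustrated with a small least-recently-used cache. This is a generic sketch, not HBase's cache implementation: the `LruCache` class, the `read_block_from_disk` loader, and the block identifiers are hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Keep the most recently used blocks in memory, evicting the oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._items:
            # Cache hit: mark the block as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cache miss: load the block, then evict the oldest if over capacity.
        self.misses += 1
        value = load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
        return value


def read_block_from_disk(block_id):
    # Stand-in for a slow read from storage.
    return f"data-for-{block_id}"


cache = LruCache(capacity=2)
for block_id in ["b1", "b2", "b1", "b3", "b2"]:
    cache.get(block_id, read_block_from_disk)

print("hits:", cache.hits, "misses:", cache.misses)
```

Only reads that hit the cache avoid the storage layer, so the benefit depends on how often the same blocks are requested again.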
Difference between HBase and HDFS
HBase                                  | HDFS
HBase provides low-latency access      | HDFS provides high-latency operations
HBase supports random reads and writes | HDFS supports write once, read many times
HBase is accessed through shell commands, Java API, REST, Avro or Thrift API | HDFS is accessed through MapReduce jobs