Contents
Overview of HBASE
Column Oriented Database
HBASE Architecture
HBASE Features
HBASE COMPONENTS
Overview of HBASE
What is Apache HBase?
[Figure labels: App, MR (MapReduce), ZK (ZooKeeper), HDFS]
Apache HBase is an open source, distributed, column-oriented, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
Overview of HBASE
Production Apache HBase Applications
Inbox
Storage
Web
Search
Analytics
Monitoring
More case studies at http://www.hbasecon.com/agenda/
Overview of HBASE
Why HBase?
HBase is a Bigtable clone.
It is open source.
It has a good community and promise for the future.
It is developed on top of the Hadoop platform and integrates well with it.
Linear scalability.
Automatic failover.
Overview of HBASE
Why HBase?
Consistent reads and writes.
Sharding of tables.
Failover support.
Classes for backing Hadoop MapReduce jobs.
Java API for client access.
Thrift gateway and a RESTful web service.
Shell support.
Column Oriented Databases
A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data.
The goal of a columnar database is to write and read data to and from hard disk storage efficiently, in order to reduce the time it takes to return a query result.
A column-oriented database looks the same on the surface as a legacy row-based database, but stores data differently.
Column Oriented Databases
Column vs. row orientation
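The difference between the two orientations can be sketched in a few lines of Python. This is only an illustration with a made-up three-row table; the names `row_layout`, `column_layout`, and `column_slice` are hypothetical helpers, not part of any database API.

```python
# Toy table with three rows and three columns (made-up data).
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 35},
]
columns = ["id", "name", "age"]


def row_layout(rows, columns):
    """Row orientation: all values of one row are stored next to each other."""
    return [row[c] for row in rows for c in columns]


def column_layout(rows, columns):
    """Column orientation: all values of one column are stored next to each other."""
    return [row[c] for c in columns for row in rows]


def column_slice(col_store, columns, name, n_rows):
    """Read one column from the column layout as a single contiguous slice."""
    start = columns.index(name) * n_rows
    return col_store[start:start + n_rows]


row_store = row_layout(rows, columns)
col_store = column_layout(rows, columns)
ages = column_slice(col_store, columns, "age", len(rows))
```

In the row layout the three ages are interleaved with ids and names, so reading them touches every row; in the column layout they sit in one contiguous run.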
Column Oriented Databases
Advantages of Column Databases
One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations like MIN, MAX, SUM, COUNT and AVG to be performed very rapidly.
Another benefit is that because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
A column architecture does not read unnecessary columns.
It can avoid decompression costs and perform operations faster.
Compression schemes lower disk space requirements.
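One way to see how compression and fast aggregation can go together is run-length encoding, where a column is stored as (value, run length) pairs. The sketch below is an assumption chosen for illustration: the slides do not name a compression scheme, and the function names are hypothetical. It computes the aggregates directly on the compressed runs, without expanding them back into individual values.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_sum(runs):
    """SUM over the compressed column: each run contributes value * length."""
    return sum(v * n for v, n in runs)


def rle_count(runs):
    """COUNT over the compressed column: add up the run lengths."""
    return sum(n for _, n in runs)


def rle_min(runs):
    """MIN only needs one look at each distinct run value."""
    return min(v for v, _ in runs)


def rle_max(runs):
    """MAX only needs one look at each distinct run value."""
    return max(v for v, _ in runs)


def rle_avg(runs):
    """AVG is SUM divided by COUNT, both computed on the runs."""
    return rle_sum(runs) / rle_count(runs)


# A made-up column with long runs of repeated values.
column = [5, 5, 5, 5, 7, 7, 9, 9, 9, 9]
runs = rle_encode(column)
```

Here ten stored values shrink to three runs, and every aggregate is computed from those three pairs alone.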
HBASE Architecture
HBase Features
Auto sharding
HBase Features
Distribution
HBase Features
Auto sharding & Distribution
The unit of scalability in HBase is the region.
A region is a sorted, contiguous range of rows.
Regions are spread randomly across region servers.
They are moved around for load balancing and failover.
They are split automatically or manually to scale with growing data.
Capacity is solely a factor of cluster nodes vs. regions per node.
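The idea of regions as sorted, contiguous row ranges that can be located and split can be sketched as follows. This is a toy model, not HBase code: the `RegionMap` class, its methods, and the server names `rs1` to `rs3` are all hypothetical.

```python
import bisect


class RegionMap:
    """Toy model: regions are sorted, contiguous row-key ranges.

    Each region is identified by its start key and covers every key from
    that start key up to (not including) the next region's start key.
    """

    def __init__(self, first_server):
        self.start_keys = [""]         # a single region covering all keys
        self.servers = [first_server]  # server hosting each region

    def locate(self, row_key):
        """Return (start_key, server) of the region that holds row_key."""
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[i], self.servers[i]

    def split(self, split_key, new_server):
        """Split the region containing split_key at that key.

        The lower half stays where it is; the upper half, starting at
        split_key, is placed on new_server.
        """
        i = bisect.bisect_right(self.start_keys, split_key)
        if self.start_keys[i - 1] == split_key:
            raise ValueError("a region already starts at this key")
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)


regions = RegionMap("rs1")
regions.split("m", "rs2")   # keys >= "m" now live on rs2
regions.split("t", "rs3")   # keys >= "t" now live on rs3
```

Because the start keys stay sorted, finding the region for a row key is a binary search, and a split only inserts one new boundary.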
HBase Features
Storage Separation
HBase Features
Storage Separation
Column families allow for separation of data.
Columnar databases use such separation for fast analytical queries, but at the column level only.
Column families allow different or no compression depending on the content type.
They segregate information based on access pattern.
Data is stored in one or more storage files, called HFiles.
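Storage separation by column family can be sketched as one store per family, each with its own compression setting. The class below is a hypothetical in-memory stand-in, not the HBase API or the HFile format, and `zlib` is used only as an example compressor.

```python
import zlib


class ColumnFamilyStore:
    """Toy store for one column family with its own compression setting."""

    def __init__(self, name, compress):
        self.name = name
        self.compress = compress
        self.cells = {}  # (row_key, qualifier) -> stored bytes

    def put(self, row_key, qualifier, value):
        """Store a value, compressing it only if this family asks for it."""
        stored = zlib.compress(value) if self.compress else value
        self.cells[(row_key, qualifier)] = stored

    def get(self, row_key, qualifier):
        """Return the original value, decompressing if needed."""
        stored = self.cells[(row_key, qualifier)]
        return zlib.decompress(stored) if self.compress else stored

    def stored_bytes(self):
        """Total size of what this family actually keeps."""
        return sum(len(v) for v in self.cells.values())


# Two families of the same table: repetitive text is compressed,
# while the already dense binary data is stored as-is.
table = {
    "text": ColumnFamilyStore("text", compress=True),
    "image": ColumnFamilyStore("image", compress=False),
}
body = b"hello hbase " * 100
thumb = bytes(range(256))
table["text"].put("row1", "body", body)
table["image"].put("row1", "thumb", thumb)
```

A read that only needs the `text` family never touches the `image` store, which is the access-pattern separation the slide describes.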
HBase Components
HMaster
Responsible for monitoring region servers.
Redirects clients to the correct region servers.
The Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.
It is the interface for all metadata changes. It runs on the server that hosts the NameNode.
HBase Components
RegionServers
Responsible for serving and managing regions. A RegionServer is to HBase what a DataNode is to a Hadoop cluster.
It serves client requests for data.
It handles the actual data storage and requests.
It sends heartbeats to the Master.
It hosts regions, which are ranges of rows of tables.
RegionServers are usually configured to run on the same servers as HDFS DataNodes. Running a RegionServer on the DataNode server also has the advantage of data locality.
HBase Components
ZooKeeper
ZooKeeper is open source software providing a highly reliable, distributed coordination service.
It is the entry point for an HBase system.
Its duties include tracking region servers and where the root region is hosted.
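The role of ZooKeeper as entry point can be sketched as a two-step lookup: first ask the coordination service where the root region is hosted, then ask the root region which region server holds a given row. Everything below is a toy in-memory stand-in; the dictionary keys, server names, and the `locate` function are hypothetical and do not reflect the real ZooKeeper or HBase client APIs.

```python
# Toy stand-in for ZooKeeper: it tracks live region servers and
# which of them hosts the root region. All keys are hypothetical.
zookeeper = {
    "root-region-server": "rs1",
    "live-servers": {"rs1", "rs2", "rs3"},
}

# Toy stand-in for the root region, keyed by the server that hosts it.
# It lists (region start key, region server) pairs sorted by start key.
root_region = {
    "rs1": [("", "rs2"), ("m", "rs3")],
}


def locate(row_key):
    """Find the region server for row_key, starting from ZooKeeper."""
    # Step 1: ZooKeeper says where the root region is hosted.
    root_host = zookeeper["root-region-server"]
    if root_host not in zookeeper["live-servers"]:
        raise LookupError("root region host is not a live server")
    # Step 2: the root region maps the row key to a region server.
    # The last region whose start key is <= row_key is the owner.
    owner = None
    for start_key, region_server in root_region[root_host]:
        if row_key >= start_key:
            owner = region_server
    return owner
```

A client in this model needs only the ZooKeeper address to begin with; every other location is discovered from there.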
HBase Components
API
Interface to HBase.
Using these APIs we can access HBase and perform read/write and other operations on HBase.
REST, Thrift, and Avro.
Thrift is an API framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
Thank You