Unit V - HBase

Apache HBase

 Open-source, distributed Hadoop database.
 HBase’s data model is similar to Google’s Bigtable.
 Provides quick random access to huge amounts of structured data.

HBase Features
 Apache HBase is linearly scalable.
 It provides automatic failure support.
 It offers consistent reads and writes.
 We can integrate it with Hadoop, both as a source and as a destination.
 It has an easy Java API for the client (a short sketch follows this list).
 HBase also offers data replication across clusters.
 Other Features:
 Consistency - offers consistent reads and writes.
 Atomic Read and Write - during one read or write process, all other processes are prevented
from performing any read or write operation on the same row.
 Sharding - HBase splits regions into smaller subregions, automatically or manually, as soon as
a region reaches a threshold size.
 High availability - it supports failover and recovery across LAN and WAN. At the core there is
a master server which monitors the region servers and maintains all metadata for the cluster.
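As an illustration of the Java client API mentioned above, the following is a minimal sketch of creating a table with two column families using the HBase 2.x-style Admin API. The table name "employee" and the column family names "personal" and "professional" are placeholder values chosen for this example, and the cluster settings are assumed to come from an hbase-site.xml on the classpath.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws IOException {
        // Reads the ZooKeeper quorum and other settings from hbase-site.xml on the classpath
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // "employee", "personal" and "professional" are example names for this sketch
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("employee"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("professional"))
                    .build());
        }
    }
}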

Use cases of Apache HBase are:

 When we want random, real-time read/write access to Big Data, we use Apache HBase.
 Apache HBase makes it possible to host very large tables on top of clusters of commodity
hardware.
 HBase is a non-relational database modeled after Google’s Bigtable. Just as Bigtable works on
top of the Google File System, HBase works on top of Hadoop and HDFS.

HBase Architecture

HBase has three major components:

 The client library / ZooKeeper
 A master server (HBase Master)
 Region servers
A master server (HBase Master)
 Assigns regions to the region servers with the help of Apache ZooKeeper.
 Handles load balancing of the regions across region servers.
 It unloads the busy servers and shifts the regions to less occupied servers.
 Maintains the state of the cluster by negotiating the load balancing.
 Is responsible for schema changes and other metadata operations such as creation of tables
and column families.
ZooKeeper
 ZooKeeper is an open-source project that provides services like maintaining configuration
information, naming, providing distributed synchronization, etc.
 ZooKeeper provides distributed coordination services.
 ZooKeeper maintains which servers are alive and which are available.
 In addition to availability, the nodes are also used to track server failures or network
partitions.
 Clients use ZooKeeper to locate region servers before communicating with them.
 In pseudo-distributed and standalone modes, HBase itself will take care of ZooKeeper.

Region Server
 The region servers have regions that

◦ Communicate with the client and handle data-related operations.
◦ Handle read and write requests for all the regions under them.
◦ Decide the size of the region by following the region size thresholds.
 Regions are nothing but tables that are split up and spread across the region servers.
 Region servers are responsible for handling, managing, and executing read and write
operations on the set of regions they serve.
 The default size of a region is 256 MB, which we can configure as per requirement.
HBase META Table
 The META table is a special HBase catalog table.
 It holds the location of the regions in the HBase cluster.

HBase Data Model

 HBase is a column-oriented database and the tables in it are sorted by row key.
 The table schema defines only column families, which are the key-value pairs.
 A table can have multiple column families and each column family can have any number of
columns.
 Subsequent column values are stored contiguously on disk.
 Each cell value of the table has a timestamp.
Given below is an example schema of a table in HBase.

Rowid | Column Family    | Column Family    | Column Family    | Column Family
      | col1 col2 col3   | col1 col2 col3   | col1 col2 col3   | col1 col2 col3
  1   |                  |                  |                  |
  2   |                  |                  |                  |
  3   |                  |                  |                  |

 It is suitable for Online Analytical Processing (OLAP).
 Column-oriented databases are designed for huge tables.
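The following is a rough sketch of how the row key / column family / column / timestamp model above is used from the Java client API (HBase 2.x style). The table name "employee", the column family "personal" and the qualifier "name" are example values, and the table is assumed to already exist.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("employee"))) {
            // Write one cell: row id "1", column family "personal", column "name"
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Smitha"));
            table.put(put);

            // Random read of the same row; every returned cell carries a timestamp
            Result result = table.get(new Get(Bytes.toBytes("1")));
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}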

HBase and RDBMS

HBase                                                  RDBMS
HBase is schema-less; it does not have the concept     An RDBMS is governed by its schema, which
of a fixed-column schema and defines only column       describes the whole structure of its tables.
families.
HBase is built for wide tables and is horizontally     An RDBMS is thin and built for small tables.
scalable.                                              It is hard to scale.
There are no transactions in HBase.                    An RDBMS is transactional.
HBase has de-normalized data.                          An RDBMS will have normalized data.
HBase is good for semi-structured as well as           An RDBMS is good for structured data.
structured data.
Apache ZooKeeper
ZooKeeper is a distributed coordination service which also helps to manage a large set of
hosts.
The ZooKeeper framework was originally built at “Yahoo!” for accessing their applications in an
easy and robust manner. Later, Apache ZooKeeper became a standard for organized services used by
Hadoop, HBase, and other distributed frameworks.
 Apache HBase uses ZooKeeper to track the status of distributed data.
Benefits of ZooKeeper
Here are the benefits of using ZooKeeper −

 Simple distributed coordination process.

 Synchronization − mutual exclusion and co-operation between server processes. This helps
Apache HBase with configuration management.

 Ordered messages − ZooKeeper stamps each update with a number denoting its order, so it
keeps track of the order of messages.

 Serialization − ZooKeeper encodes the data according to specific rules, which ensures that
the application runs consistently. This approach can be used in MapReduce to coordinate
queues for executing running threads.

 Reliability − when one or more nodes fail, the system keeps performing.

 Atomicity − data transfer either succeeds or fails completely; no transaction is partial.

 Naming service − ZooKeeper attaches a unique identification to every node, quite similar to
the DNA that helps to identify it.

 Automatic failure recovery − ZooKeeper locks the data while it is being modified, so if a
failure occurs, the cluster can recover automatically.
Characteristics of ZooKeeper

• ZooKeeper is simple
◦ ZooKeeper is, at its core, a simple filesystem that exposes a few simple operations and
some extra abstractions such as ordering and notifications.
• ZooKeeper is expressive
◦ The ZooKeeper primitives are a rich set of building blocks that can be used to build a
large class of coordination data structures and protocols.
◦ Examples include distributed queues, distributed locks, and leader election among a
group of peers.
• ZooKeeper is highly available
◦ ZooKeeper runs on a collection of machines and is designed to be highly available, so
applications can depend on it.
• ZooKeeper facilitates loosely coupled interactions
◦ ZooKeeper interactions support participants that do not need to know about one another.
◦ For example, ZooKeeper can be used as a meeting mechanism so that processes that
otherwise don’t know of each other’s existence (or network details) can discover each
other and interact.
◦ Coordinating parties may not even be contemporaneous, since one process may leave a
message in ZooKeeper that is read by another after the first has shut down.
• ZooKeeper is a library
◦ ZooKeeper provides an open source, shared repository of implementations and recipes
of common coordination patterns.
◦ Individual programmers do not have the burden of writing common protocols
themselves.
◦ Over time the community can add to, and improve, the libraries, which is to everyone’s
benefit.
Architecture of ZooKeeper

 One ZooKeeper client is connected to one ZooKeeper server at any given time.
 It has a simple client-server model in which both clients and servers are nodes (i.e.
machines).
 As a function, ZooKeeper clients make use of the services and ZooKeeper servers provide the
services.
 Applications make calls to ZooKeeper through a client library.
 The client library handles the interaction with the ZooKeeper servers.
 The ZooKeeper architecture must be able to tolerate failures.
 Also, it must be able to recover from correlated recoverable failures (such as power
outages).
 Most importantly, it must be correct or easy to implement correctly.
 Additionally, it must be fast, with high throughput and low latency.
Part        Description

Client      Clients, the nodes in our distributed application cluster, access information from the
            server. At a particular time interval, every client sends a message to the server to
            let the server know that the client is alive. Similarly, the server sends an
            acknowledgement when a client connects. If there is no response from the connected
            server, the client automatically redirects the message to another server.

Server      A server, one of the nodes in our ZooKeeper ensemble, provides all the services to
            clients and gives an acknowledgement to the client to inform it that the server is
            alive.

Ensemble    Group of ZooKeeper servers. The minimum number of nodes required to form an
            ensemble is 3.

Leader      Server node which performs automatic recovery if any of the connected nodes fails.
            Leaders are elected on service startup.

Follower    Server node which follows the leader's instructions.
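A minimal sketch of a client joining a three-server ensemble through the standard ZooKeeper Java client follows. The host names and the 5-second session timeout are placeholder values for this example; the Watcher here is used only to wait until the session is connected.

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnectExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // The connect string lists the servers of the ensemble; the client connects to one of them
        ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 5000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();   // session established with one server
                }
            }
        });

        connected.await();
        System.out.println("Connected, session id: " + zk.getSessionId());
        zk.close();
    }
}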

Hierarchical Namespace
The following diagram depicts the tree structure of the ZooKeeper file system used for in-memory
representation. A ZooKeeper node is referred to as a znode. Every znode is identified by a name and
separated by a sequence of path separators (/).

 In the diagram, first you have a root znode separated by “/”. Under root, you have two
logical namespaces, config and workers.

 The config namespace is used for centralized configuration management and the workers
namespace is used for naming.

 Under the config namespace, each znode can store up to 1 MB of data. This is similar to the
UNIX file system, except that the parent znode can store data as well. The main purpose of
this structure is to store synchronized data and describe the metadata of the znode. This
structure is called the ZooKeeper Data Model.
Every znode in the ZooKeeper data model maintains a stat structure. A stat simply provides the
metadata of a znode. It consists of a version number, an Access Control List (ACL), a timestamp,
and the data length.

 Version number − Every znode has a version number, which means that every time the data
associated with the znode changes, its corresponding version number also increases. The
version number is important when multiple ZooKeeper clients are trying to perform
operations on the same znode.

 Access Control List (ACL) − An ACL is basically an authentication mechanism for accessing
the znode. It governs all the znode read and write operations.

 Timestamp − The timestamp represents the time elapsed since znode creation and
modification. It is usually represented in milliseconds. ZooKeeper identifies every change to
a znode by a transaction ID (zxid). The zxid is unique and records the time of each
transaction, so that you can easily identify the time elapsed from one request to another.

 Data length − The total amount of data stored in a znode is the data length. You can store a
maximum of 1 MB of data.
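These stat fields can be read through the Java client; for instance, exists() returns the znode's Stat. The sketch below assumes zk is an already connected ZooKeeper handle and that a znode at the example path /config/app exists.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZnodeStatExample {
    // zk is assumed to be an already connected ZooKeeper handle; "/config/app" is an example path
    static void printStat(ZooKeeper zk) throws KeeperException, InterruptedException {
        Stat stat = zk.exists("/config/app", false);   // false = do not set a watch
        if (stat != null) {
            System.out.println("version      : " + stat.getVersion());     // bumped on every setData
            System.out.println("created (ms) : " + stat.getCtime());       // creation timestamp
            System.out.println("modified (ms): " + stat.getMtime());       // last modification timestamp
            System.out.println("data length  : " + stat.getDataLength());  // bytes stored in the znode
            System.out.println("create zxid  : " + stat.getCzxid());       // transaction id of the create
        }
    }
}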

Types of Znodes
Znodes are categorized as persistent, sequential, and ephemeral.

 Persistent znode − A persistent znode stays alive even after the client which created that
particular znode is disconnected. By default, all znodes are persistent unless otherwise
specified.

 Ephemeral znode − Ephemeral znodes are active as long as the client is alive. When a client
gets disconnected from the ZooKeeper ensemble, its ephemeral znodes get deleted
automatically. For this reason, ephemeral znodes are not allowed to have children. If an
ephemeral znode is deleted, then the next suitable node will fill its position. Ephemeral
znodes play an important role in leader election.

 Sequential znode − Sequential znodes can be either persistent or ephemeral. When a new
znode is created as a sequential znode, ZooKeeper sets the path of the znode by attaching a
10-digit sequence number to the original name. For example, if a znode with path /myapp is
created as a sequential znode, ZooKeeper will change the path to /myapp0000000001 and set
the next sequence number as 0000000002. If two sequential znodes are created concurrently,
ZooKeeper never uses the same number for both znodes. Sequential znodes play an important
role in locking and synchronization.
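A short sketch of creating the three kinds of znodes with the Java client follows. Here zk is assumed to be an already connected ZooKeeper handle, the paths and data are example values, and OPEN_ACL_UNSAFE grants access to everyone.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTypesExample {
    static void createZnodes(ZooKeeper zk) throws KeeperException, InterruptedException {
        // Persistent znode: survives even after the creating client disconnects
        zk.create("/myapp", "config".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral znode: deleted automatically when this client's session ends
        zk.create("/myapp/worker", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Sequential znode: ZooKeeper appends a 10-digit counter, e.g. /myapp/task0000000001
        String path = zk.create("/myapp/task", new byte[0],
                                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("Created " + path);
    }
}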

Basic Operations in ZooKeeper

Operation            Description
create               Creates a znode (the parent znode must already exist)
delete               Deletes a znode (the znode must not have any children)
exists               Tests whether a znode exists and retrieves its metadata
getACL, setACL       Gets/sets the ACL for a znode
getChildren          Gets a list of the children of a znode
getData, setData     Gets/sets the data associated with a znode
sync                 Synchronizes a client’s view of a znode with ZooKeeper

• Update operations in ZooKeeper are conditional.
• A delete or setData operation has to specify the version number of the znode that is being
updated. If the version number does not match, the update will fail (see the sketch below).
• Writes in ZooKeeper are atomic. Successful write operations are saved permanently to the
ZooKeeper servers.
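A brief sketch of a conditional setData and delete using version numbers follows; zk is assumed to be an already connected ZooKeeper handle and /config/app is an example znode.

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConditionalUpdateExample {
    static void update(ZooKeeper zk) throws KeeperException, InterruptedException {
        // Read the current data; getData also fills in the znode's stat (including its version)
        Stat stat = new Stat();
        byte[] current = zk.getData("/config/app", false, stat);
        System.out.println("old value: " + new String(current));

        // Conditional update: succeeds only if the version on the server still equals stat.getVersion();
        // otherwise ZooKeeper rejects the operation with a BadVersion error
        zk.setData("/config/app", "new-value".getBytes(), stat.getVersion());

        // Passing -1 as the version skips the check (unconditional delete)
        zk.delete("/config/app", -1);
    }
}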
