HBase (Big Table): Row-Oriented vs. Column-Oriented Data Stores
The two layouts are compared in the lists below and contrasted in the sketch that
follows them.
Row-oriented data stores –
Data is stored and retrieved one row at a time, and hence unnecessary data may be
read if only some of the data in a row is required.
Easy to read and write records
Well suited for OLTP systems
Not efficient in performing operations applicable to the entire dataset and
hence aggregation is an expensive operation
Typical compression mechanisms provide less effective results than those
on column-oriented data stores
Column-oriented data stores –
Data is stored and retrieved in columns and hence can read only relevant
data if only some data is required
Read and Write are typically slower operations
Well suited for OLAP systems
Can efficiently perform operations applicable to the entire dataset and
hence enables aggregation over many rows and columns
Permits high compression rates due to few distinct values in columns
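To make the contrast above concrete, here is a minimal plain-Java sketch (no HBase API involved); the record fields and values are invented purely for illustration. Summing a single field in the row layout still walks over whole rows, while the column layout touches only the one array it needs.

```java
// Row layout vs. column layout of the same three records (illustrative only).
public class RowVsColumnLayout {

    public static void main(String[] args) {
        // Row-oriented: each record is stored and read as a whole.
        int[][] rows = {
            // {userId, age, loginCount}
            {1, 34, 120},
            {2, 28, 45},
            {3, 52, 300},
        };

        // Aggregating one field still pulls in every full row.
        long totalLoginsRowLayout = 0;
        for (int[] row : rows) {
            totalLoginsRowLayout += row[2]; // userId and age were read along with it
        }

        // Column-oriented: each column is stored contiguously on its own.
        int[] userIds     = {1, 2, 3};      // never touched by the aggregation below
        int[] ages        = {34, 28, 52};   // never touched by the aggregation below
        int[] loginCounts = {120, 45, 300};

        // The same aggregation scans only the column it needs; long runs of
        // similar values within one column are also what compress so well.
        long totalLoginsColumnLayout = 0;
        for (int count : loginCounts) {
            totalLoginsColumnLayout += count;
        }

        System.out.println(totalLoginsRowLayout + " == " + totalLoginsColumnLayout);
    }
}
```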
Relational Databases vs. HBase
When talking of data stores, we first think of Relational Databases with
structured data storage and a sophisticated query engine. However, a Relational
Database pays an increasingly heavy penalty to maintain performance as the data
size grows.
HBase, on the other hand, is designed from the ground up to provide scalability
and partitioning to enable efficient data structure serialization, storage and
retrieval. Broadly, the differences between a Relational Database and HBase
are:
Relational Database –
Has a fixed schema
Is a Row-oriented datastore
Is designed to store Normalized Data
Contains thin tables
Has no built-in support for partitioning
HBase –
Is Schema-less
Is a Column-oriented datastore
Is designed to store Denormalized Data
Contains wide and sparsely populated tables
Supports Automatic Partitioning
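As an illustration of the "schema-less" and "wide, sparsely populated" points in the list above, the sketch below writes two rows with different column sets using the HBase Java client (HBase 2.x API assumed). The table name "customers", the column family "profile", and all qualifiers and values are hypothetical; only the column family has to exist up front, and individual columns come into existence simply by being written.

```java
import java.util.Arrays;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemalessWriteSketch {
    public static void main(String[] args) throws Exception {
        // Assumes a table "customers" with column family "profile" already exists.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("customers"))) {

            byte[] profile = Bytes.toBytes("profile");

            // Row 1 stores one set of columns...
            Put p1 = new Put(Bytes.toBytes("customer-001"));
            p1.addColumn(profile, Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            p1.addColumn(profile, Bytes.toBytes("city"), Bytes.toBytes("London"));

            // ...while row 2 stores a different set. No schema change is needed:
            // a column exists only in the rows that actually hold a value for it,
            // which is what makes HBase tables wide and sparsely populated.
            Put p2 = new Put(Bytes.toBytes("customer-002"));
            p2.addColumn(profile, Bytes.toBytes("name"), Bytes.toBytes("Grace"));
            p2.addColumn(profile, Bytes.toBytes("loyalty_tier"), Bytes.toBytes("gold"));
            p2.addColumn(profile, Bytes.toBytes("newsletter"), Bytes.toBytes("yes"));

            table.put(Arrays.asList(p1, p2));
        }
    }
}
```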
HDFS vs. HBase
HDFS is a distributed file system that is well suited for storing large files. It’s
designed to support batch processing of data but doesn’t provide fast individual
record lookups. HBase is built on top of HDFS and is designed to provide fast,
random access to single rows of data in large tables. Overall, the differences
between HDFS and HBase are:
HDFS –
Is suited for high-latency batch processing operations
Data is primarily accessed through MapReduce
Is designed for batch processing and hence doesn’t have a concept of
random reads/writes
HBase –
Is built for low-latency operations
Provides fast, random read/write access to single rows among billions of records
Data is accessed through shell commands, client APIs in Java, REST or Thrift (a
single-row lookup is sketched below)
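As a sketch of what such a low-latency random read looks like, the snippet below fetches a single row by its rowkey with the HBase Java client (HBase 2.x API assumed); the table "customers" and column family "profile" carry over from the hypothetical example above.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("customers"))) {

            // A Get fetches exactly one row by its rowkey -- no scan over the
            // whole dataset and no MapReduce job, unlike a typical HDFS workflow.
            Get get = new Get(Bytes.toBytes("customer-002"));
            get.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"));

            Result result = table.get(get);
            byte[] name = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
            System.out.println("name = " + (name == null ? "<absent>" : Bytes.toString(name)));
        }
    }
}
```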
HBase Architecture
Just like in a Relational Database, data in HBase is stored in Tables and these
Tables are stored in Regions. When a Table becomes too big, the Table is
partitioned into multiple Regions. These Regions are assigned to Region
Servers across the cluster. Each Region Server hosts roughly the same number
of Regions.
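The mapping of Regions to Region Servers can be observed from the client side. Here is a small sketch (HBase 2.x client API assumed, hypothetical "customers" table again) that prints each Region of a table and the Region Server currently hosting it.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public class RegionAssignmentSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("customers"))) {

            // Each Region covers a contiguous range of rowkeys and is hosted by
            // exactly one Region Server at any given time.
            for (HRegionLocation location : locator.getAllRegionLocations()) {
                System.out.println(location.getRegion().getRegionNameAsString()
                        + " -> " + location.getServerName());
            }
        }
    }
}
```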
The HMaster in HBase is responsible for
Performing Administration
Managing and Monitoring the Cluster
Assigning Regions to the Region Servers
Controlling the Load Balancing and Failover
On the other hand, the HRegionServers perform the following work
Hosting and managing the Regions assigned to them
Splitting the Regions automatically as they grow
Handling read and write requests for the Regions they serve
Communicating with the clients directly
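One way to see this division of labour is to ask the cluster for its status. The sketch below (HBase 2.x Admin API assumed) prints the active HMaster and each live Region Server together with the number of Regions it currently hosts.

```java
import java.util.Map;

import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerMetrics;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClusterStatusSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            ClusterMetrics metrics = admin.getClusterMetrics();

            // The single active HMaster coordinates the cluster...
            System.out.println("Active master: " + metrics.getMasterName());

            // ...while each Region Server hosts its share of the Regions.
            for (Map.Entry<ServerName, ServerMetrics> entry
                    : metrics.getLiveServerMetrics().entrySet()) {
                System.out.println(entry.getKey() + " hosts "
                        + entry.getValue().getRegionMetrics().size() + " regions");
            }
        }
    }
}
```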
HBase Data Model
Tables – An HBase Table is a logical collection of rows stored in separate
partitions called Regions. Every Region is served by exactly one Region Server.
Rows – A row is one instance of data in a table and is identified by a rowkey.
Rowkeys are unique in a Table and are always treated as a byte[].
Version – The data stored in a cell is versioned, and versions are identified by
their timestamp. The number of versions retained per column family is
configurable; it defaults to 1 in current HBase releases (3 in older releases).
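As a sketch of how versioning surfaces in the client API (HBase 2.x assumed; the table "sensor_readings", family "d", column "temp" and rowkey are hypothetical, and the row is assumed to have been written several times beforehand), the snippet below creates a column family that retains three versions and then reads back every retained, timestamped version of one cell.

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsSketch {
    public static void main(String[] args) throws Exception {
        TableName tableName = TableName.valueOf("sensor_readings"); // hypothetical
        byte[] family = Bytes.toBytes("d");

        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            // Version retention is configured per column family.
            admin.createTable(TableDescriptorBuilder.newBuilder(tableName)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(family)
                            .setMaxVersions(3)   // keep up to three timestamped versions per cell
                            .build())
                    .build());

            try (Table table = conn.getTable(tableName)) {
                // Ask for every retained version of one cell; each version is
                // identified by its timestamp. The row is assumed to have been
                // written multiple times before this read.
                Get get = new Get(Bytes.toBytes("sensor-42"));
                get.addColumn(family, Bytes.toBytes("temp"));
                get.readVersions(3);

                Result result = table.get(get);
                for (Cell cell : result.getColumnCells(family, Bytes.toBytes("temp"))) {
                    System.out.println(cell.getTimestamp() + " -> "
                            + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }
}
```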