HBase: What Is HBase?

The document discusses HBase, a distributed column-oriented database built on Hadoop. HBase provides fast random access to large datasets stored in Hadoop and is modeled after Google's Bigtable. The document covers how HBase stores and accesses data, its features and applications, and its relationship with other Hadoop components such as HDFS and Hive.

Uploaded by

Muriel Sozim
Copyright: © All Rights Reserved

HBase

What is HBase?

It is a distributed column-oriented database built on top of the Hadoop Distributed File System (HDFS).
It is an open-source project and is horizontally scalable.

HBase's data model is similar to Google's Bigtable, and it is designed to provide quick
random access to huge amounts of structured data.

It is a part of the Hadoop ecosystem that provides random real-time read/write access to data
in the Hadoop File System.

HBase and HDFS

HDFS | HBase
Is a distributed file system suitable for storing large files. | Is a database built on top of HDFS.
Doesn't support fast individual record lookups. | Provides fast lookups for larger tables.
Provides high-latency batch processing. | Provides low-latency access to single rows from billions of records (random access).
Provides only sequential access to data. | Internally uses hash tables and provides random access; stores the data in indexed HDFS files for faster lookups.
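
The contrast between sequential access and indexed random access can be illustrated with a small sketch in plain Python. This is not how HDFS or HBase is implemented; the record layout and the function names sequential_find and indexed_find are made up for illustration. The first function reads records in order until it reaches the key, and the second uses a sorted list of row keys with binary search as a stand-in for an index.

```python
from bisect import bisect_left

# Hypothetical records: (row_key, value) pairs, sorted by row key.
records = [(f"row{i:05d}", i * i) for i in range(10_000)]


def sequential_find(records, key):
    """Scan records in order until the key is found (sequential access)."""
    steps = 0
    for row_key, value in records:
        steps += 1
        if row_key == key:
            return value, steps
    return None, steps


def indexed_find(keys, records, key):
    """Binary-search a sorted key index, then read one record (random access)."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return records[pos][1]
    return None


# The "index" in this sketch is simply the sorted list of row keys.
keys = [row_key for row_key, _ in records]

value, steps = sequential_find(records, "row09999")
print(value, steps)                               # value and number of records read
print(indexed_find(keys, records, "row09999"))    # same value, found by binary search
```

For the last key, the sequential scan has to read every record, while the binary search needs only about log2(10,000), roughly 14, comparisons. The gap widens as the table grows.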

Storage Mechanism in HBase

In short, in HBase:
• A table is a collection of rows
• A row is a collection of column families
• A column family is a collection of columns
• A column is a collection of key-value pairs
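
The hierarchy above can be pictured as nested dictionaries. The following is a minimal in-memory sketch in Python, not the HBase client API: the helper names new_table, put and get, and the sample row key, column families and values, are all hypothetical. For simplicity each column holds a single value.

```python
from collections import defaultdict


def new_table():
    """Table: row key -> column family -> column name -> value."""
    return defaultdict(lambda: defaultdict(dict))


def put(table, row_key, family, column, value):
    """Store one value under (row, column family, column)."""
    table[row_key][family][column] = value


def get(table, row_key, family, column):
    """Return the stored value, or None if any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)


# Hypothetical sample data: one row with two column families.
employees = new_table()
put(employees, "emp1", "personal", "name", "Ravi")
put(employees, "emp1", "personal", "city", "Pune")
put(employees, "emp1", "professional", "role", "Engineer")

print(get(employees, "emp1", "personal", "name"))
print(get(employees, "emp2", "personal", "name"))  # missing row
```

Addressing a value by row key, then column family, then column mirrors the order of the bullets above. A lookup for a missing row returns None without creating the row.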

Features of HBase

• HBase is linearly scalable.
• It has automatic failure support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It has an easy-to-use Java API for clients.
• It provides data replication across clusters.

Where to Use HBase

• Apache HBase is used when you need random, real-time read/write access to Big Data.
• It hosts very large tables on top of clusters of commodity hardware.
• Apache HBase is a non-relational database modeled after Google's Bigtable. Bigtable
runs on top of the Google File System; likewise, Apache HBase works on top of
Hadoop and HDFS.

Applications of HBase

• It is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available data.
• Companies such as Facebook, Twitter, Yahoo and Adobe use HBase internally.

Does HBase support SQL?

Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based
on MapReduce, which is generally not suitable for low-latency requests.

Lookup → a procedure in which a table of values stored in a computer is searched until a specified
value is found.

Google’s Bigtable → a distributed storage system for managing structured data that is designed
to scale to a very large size: petabytes of data across thousands of commodity
servers. Used in Google Earth and Google Finance.

Hive → The Apache Hive™ data warehouse software facilitates reading, writing, and managing
large datasets residing in distributed storage using SQL. Structure can be projected onto
data already in storage. A command line tool and JDBC driver are provided to connect
users to Hive.

MapReduce → a programming model and an associated implementation for processing and
generating big data sets with a parallel, distributed algorithm on a cluster.
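
The MapReduce programming model can be illustrated with the classic word-count example. The sketch below runs in a single Python process, so it shows only the shape of the model (map, group by key, reduce), not the parallel, distributed execution that a real cluster provides. The function names and the sample documents are chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map on every document, group by word, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two tiny "documents".
docs = ["HBase runs on HDFS", "Hive runs on MapReduce"]
print(word_count(docs))
```

Because each call to map_phase depends only on its own document, and each call to reduce_phase only on its own key, both phases could in principle be spread across many machines. That independence is what makes the model suitable for a cluster.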
