
Hbase PDF

This document provides an introduction to HBase, an open source, distributed, sorted key-value database modeled after Google's Bigtable. It discusses what HBase is, how to install it, compares it to relational databases, describes its architecture and components, and how to interface with it using Java, Thrift, and the HBase shell. Key aspects covered include its column-oriented data model, use of HDFS for storage, and architecture consisting of a master server and region servers.

Uploaded by

Sathya Ch
Copyright © All Rights Reserved

Introduction to HBase

Gkavresis Giorgos 1470

Agenda

What is HBase
Installation
About RDBMS
Overview of HBase
Why HBase instead of RDBMS
Architecture of HBase
HBase interface
Summary

What is HBase

HBase is an open-source, distributed, sorted map modeled after Google's Bigtable.

Open Source

Apache 2.0 License
Committers and contributors from diverse organizations like Facebook, Trend Micro, etc.

Installation

Download link: http://www.apache.org/dyn/closer.cgi/hbase/
Before starting it, you might want to edit conf/hbase-site.xml and set hbase.rootdir, the directory you want HBase to write to.
HBase can run in standalone, pseudo-distributed, or fully distributed mode.
Start HBase via $ ./bin/start-hbase.sh

About Relational Database Management Systems

Have a lot of limitations:
High read and high write throughput at the same time is not possible (transactional databases)
Specialized hardware is quite expensive

Background

Google releases paper on Bigtable: 2006
First usable HBase: 2007
HBase becomes an Apache top-level project: 2010
HBase 0.26.5 released.

Overview of HBase

HBase is part of the Hadoop ecosystem
Apache Hadoop is an open-source system to reliably store and process data across many commodity computers

HBase and Hadoop are written in Java

Hadoop provides:
Fault tolerance
Scalability

Hadoop advantages

Data parallel or compute parallel. For example:
Extensive machine learning on <100 GB of image data
Simple SQL queries on >100 TB of clickstream data

Hadoop's components

MapReduce (process)
Fault-tolerant distributed processing

HDFS (store)
Self-healing
High-bandwidth
Clustered storage
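The MapReduce model can be illustrated with the classic word-count job. The following is a single-process sketch in plain Java, not the Hadoop API: the class WordCountSketch and its map, shuffle, and reduce methods are hypothetical names. A real Hadoop job would run many map and reduce tasks in parallel across the cluster and read its input from HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the MapReduce model (not the Hadoop API).
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key, with keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hbase=1, hdfs=2, uses=2}
        System.out.println(run(List.of("HBase uses HDFS", "Hadoop uses HDFS")));
    }
}
```

Each phase only needs its own slice of the data, which is what lets Hadoop spread map and reduce tasks over many commodity machines and rerun a failed task elsewhere.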

Difference Between Hadoop/HDFS and HBase

HDFS is a distributed file system that is well suited for the storage of large files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. HDFS is based on Google's GFS file system.

HBase is

Distributed: uses HDFS for storage
Column-oriented
Multi-dimensional (versions)
Storage system

HBase is NOT

A SQL database: no joins, no query engine, no data types, no (damn) SQL
No fixed schema
No DBA needed

Storage Model

Column-oriented database (column families)
A table consists of rows, each of which has a primary key (row key)
Each row may have any number of columns
The table schema only defines column families (a column family can have any number of columns)
Each cell value has a timestamp

Static Columns
[Figure: a relational table in which every row has the same fixed, typed columns (int, varchar)]

Something different

Row1: ColA = Value, ColB = Value, ColC = Value
Row2: ColX = Value, ColY = Value, ColZ = Value

A Big Map
Row Key + Column Key + Timestamp => Value

Row Key   Column Key   Timestamp       Value
          Info:name    1273516197868   Sakis
          Info:age     1273871824184   21
          Info:sex     1273746281432   Male
          Info:name    1273863723227   Themis
          Info:name    1273973134238   Andreas
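The "big map" view can be written down almost literally as one sorted map whose key is the triple (row key, column key, timestamp). The plain-Java sketch below assumes string row and column keys, a hypothetical record named CellKey, and newest-first ordering of timestamps (the order in which HBase returns versions of a cell). It is an illustration of the data model, not the HBase client API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: HBase's logical model as ONE sorted map,
// (row key, column key, timestamp) -> value. Not the HBase client API.
class BigMapSketch {

    record CellKey(String row, String column, long timestamp) {}

    // Rows and columns sort ascending; timestamps sort descending,
    // so the newest version of a cell is always met first.
    static final Comparator<CellKey> ORDER =
            Comparator.comparing(CellKey::row)
                    .thenComparing(CellKey::column)
                    .thenComparing((a, b) -> Long.compare(b.timestamp(), a.timestamp()));

    final NavigableMap<CellKey, String> cells = new TreeMap<>(ORDER);

    void put(String row, String column, long timestamp, String value) {
        cells.put(new CellKey(row, column, timestamp), value);
    }

    // Newest value of (row, column): the first key at or after (row, column, MAX timestamp).
    String latest(String row, String column) {
        Map.Entry<CellKey, String> e = cells.ceilingEntry(new CellKey(row, column, Long.MAX_VALUE));
        if (e == null || !e.getKey().row().equals(row) || !e.getKey().column().equals(column)) {
            return null; // no version of this cell exists
        }
        return e.getValue();
    }

    // Every stored key in map order, formatted as "row/column@timestamp".
    List<String> keysInOrder() {
        List<String> out = new ArrayList<>();
        for (CellKey k : cells.keySet()) {
            out.add(k.row() + "/" + k.column() + "@" + k.timestamp());
        }
        return out;
    }
}
```

After two puts to the same row and column, latest(row, column) returns the value with the larger timestamp, because that key sorts first; a column that was never written returns null.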

One more example

Row Key   Data
cutting   Info: {'height': '9ft', 'state': 'CA'}
          Roles: {'ASF': 'Director', 'Hadoop': 'Founder'}
tlipcon   Info: {'height': '5ft7', 'state': 'CA'}
          Roles: {'Hadoop': 'Committer'@ts=2010,
                  'Hadoop': 'PMC'@ts=2011,
                  'Hive': 'Contributor'}

Column Families

Different sets of columns may have different priorities
CFs are stored separately on disk: access one without wasting I/O on the other
Configurable by column family:
Compression (none, gzip, LZO)
Version retention policies
Cache priority
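Version retention is one of the per-family settings listed above. The plain-Java sketch below models it for tlipcon's Roles:Hadoop cell: a hypothetical ColumnFamilySketch keeps at most maxVersions timestamped values per column and drops the oldest ones on every write. The class, its maxVersions field, and the eager trimming are assumptions for illustration; HBase itself hides excess versions on reads and physically removes them during compactions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one column family with a version-retention policy.
class ColumnFamilySketch {
    private final int maxVersions;
    // qualifier -> (timestamp -> value), newest timestamp first
    private final Map<String, NavigableMap<Long, String>> cells = new HashMap<>();

    ColumnFamilySketch(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    void put(String qualifier, long timestamp, String value) {
        NavigableMap<Long, String> byTs = cells.computeIfAbsent(
                qualifier, q -> new TreeMap<>(Comparator.<Long>reverseOrder()));
        byTs.put(timestamp, value);
        // Retention policy: keep only the newest maxVersions entries.
        while (byTs.size() > maxVersions) {
            byTs.pollLastEntry(); // last entry = oldest, because of the reverse order
        }
    }

    // Newest value, or null if the column was never written.
    String latest(String qualifier) {
        NavigableMap<Long, String> byTs = cells.get(qualifier);
        return byTs == null ? null : byTs.firstEntry().getValue();
    }

    // All retained values, newest first.
    List<String> versions(String qualifier) {
        NavigableMap<Long, String> byTs = cells.get(qualifier);
        return byTs == null ? List.of() : new ArrayList<>(byTs.values());
    }
}
```

With maxVersions = 1, writing 'Committer' at ts=2010 and then 'PMC' at ts=2011 leaves only 'PMC'; with maxVersions = 3, versions("Hadoop") returns 'PMC' followed by 'Committer'.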

HBase vs RDBMS

                               RDBMS                          HBase
Data layout                    Row-oriented                   Column-family-oriented
Query language                 SQL                            get/put/scan/etc. *
Security                       Authentication/Authorization   Work in progress
Max data size                  TBs                            Hundreds of PBs
Read/write throughput limits   1000s of queries/second        Millions of queries/second

Terms and Daemons

Region
A contiguous range of a table's rows (sorted by row key)

RegionServer (slave)
Serves data for reads and writes

Master
Responsible for coordinating the slaves
Assigns regions, detects failures of RegionServers
Controls some admin functions
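Because a region is a contiguous range of row keys, finding the RegionServer for a row amounts to a sorted lookup on region start keys. The plain-Java sketch below shows that idea with TreeMap.floorEntry; the class name, region boundaries, and server names are hypothetical. Real HBase clients look region locations up in a catalog table (hbase:meta in current releases, -ROOT- and .META. in older ones) and cache them.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a table split into regions, each a contiguous row-key range.
class RegionLocatorSketch {
    // region start key -> RegionServer hosting that region
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // The first region starts at the empty key "", so every row key is covered.
    RegionLocatorSketch(String firstRegionServer) {
        serverByStartKey.put("", firstRegionServer);
    }

    // Records that the region starting at startKey lives on server
    // (in HBase, deciding this assignment is the Master's job).
    void assign(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String locate(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        return region.getValue(); // never null: "" sorts before every key
    }
}
```

With regions starting at "", "g" and "p", the row key "cutting" falls in the first region and "tlipcon" in the third; a region boundary key such as "g" belongs to the region it starts.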

Distributed coordination

To manage master election and server availability we use ZooKeeper
ZooKeeper sets up a cluster and provides distributed coordination primitives
An excellent tool for building cluster management systems
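Master election relies on ZooKeeper's ephemeral nodes, which disappear automatically when the session that created them ends. The sketch below simulates this in memory, assuming the simple scheme in which the first candidate to create a single ephemeral master node becomes the active master and the others remain backups, roughly how HBase elects its master. FakeZooKeeper, its methods, and the "/hbase/master" path are assumptions for illustration; this is not the ZooKeeper client API.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for ZooKeeper (hypothetical; not the real client API).
class FakeZooKeeper {
    // path -> id of the session that created the node; every node here is ephemeral
    private final Map<String, String> ephemeralNodes = new HashMap<>();

    // Atomic create: succeeds only if the node does not exist yet.
    synchronized boolean createEphemeral(String path, String sessionId) {
        return ephemeralNodes.putIfAbsent(path, sessionId) == null;
    }

    // Current owner of a node, or null if it does not exist.
    synchronized String owner(String path) {
        return ephemeralNodes.get(path);
    }

    // Session ends (crash, network partition): its ephemeral nodes vanish.
    synchronized void expireSession(String sessionId) {
        ephemeralNodes.values().removeIf(sessionId::equals);
    }
}

// Master election as a race to create one ephemeral node.
class MasterElectionSketch {
    static final String MASTER_PATH = "/hbase/master"; // assumed node path

    // Returns true if this candidate became the active master.
    static boolean tryBecomeMaster(FakeZooKeeper zk, String candidateSession) {
        return zk.createEphemeral(MASTER_PATH, candidateSession);
    }
}
```

With two candidates, the first tryBecomeMaster call wins and the second returns false; once the winner's session expires, the backup's next attempt succeeds. A real backup would set a watch on the node and retry when notified of its deletion, rather than polling.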

HBase Architecture


HBase Interface

Java
Thrift (Ruby, PHP, Python, Perl, C++, ...)
HBase Shell

HBase API

get(row)
put(row, Map<column, value>)
scan(key range, filter)
increment(row, columns)
checkAndPut, delete, etc.
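To make the meaning of these operations concrete, here is a toy, single-node table in plain Java offering get, put, scan over a row-key range, increment, checkAndPut, and delete. It is a hypothetical sketch with simplified signatures (string row keys, "family:qualifier" column names, no timestamps or filters), not the HBase client API. It also stores counters as decimal strings, whereas HBase stores counter cells as 8-byte long values.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

// Toy single-node table mirroring the operation list above (hypothetical, not the HBase API).
class ToyTable {
    // row key -> (column "family:qualifier" -> value); both levels sorted
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // get(row): copy of all columns in the row (empty if the row is absent)
    synchronized NavigableMap<String, String> get(String row) {
        NavigableMap<String, String> cols = rows.get(row);
        return cols == null ? new TreeMap<>() : new TreeMap<>(cols);
    }

    // put(row, Map<column, value>): write several columns of one row
    synchronized void put(String row, Map<String, String> columns) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).putAll(columns);
    }

    // scan(startRow, stopRow): copies of the rows in [startRow, stopRow), in row-key order
    synchronized NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        NavigableMap<String, NavigableMap<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<String, String>> e
                : rows.subMap(startRow, true, stopRow, false).entrySet()) {
            out.put(e.getKey(), new TreeMap<>(e.getValue()));
        }
        return out;
    }

    // increment(row, column, delta): atomic read-modify-write of a counter cell
    synchronized long increment(String row, String column, long delta) {
        NavigableMap<String, String> cols = rows.computeIfAbsent(row, r -> new TreeMap<>());
        long next = Long.parseLong(cols.getOrDefault(column, "0")) + delta;
        cols.put(column, Long.toString(next));
        return next;
    }

    // checkAndPut: write only if the cell currently equals `expected` (null = cell absent)
    synchronized boolean checkAndPut(String row, String column, String expected, String value) {
        NavigableMap<String, String> cols = rows.get(row);
        String current = (cols == null) ? null : cols.get(column);
        if (!Objects.equals(current, expected)) {
            return false;
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
        return true;
    }

    // delete(row): remove the whole row
    synchronized void delete(String row) {
        rows.remove(row);
    }
}
```

Every method is synchronized, so increment and checkAndPut behave atomically in this toy; that atomicity is what lets many clients update the same counter or flag without losing each other's writes.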

HBase shell

hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds

HBase shell (cont.)

hbase(main):007:0> scan 'test'
ROW        COLUMN+CELL
 row1      column=cf:a, timestamp=1288380727188, value=value1
 row2      column=cf:b, timestamp=1288380738440, value=value2
 row3      column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds

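HBase in Java
(This is the HBase 0.19 client API; HBase 0.20 replaced BatchUpdate with Put and Delete, and RowResult with Result.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

// Load the cluster settings from hbase-site.xml
HBaseConfiguration conf = new HBaseConfiguration();
conf.addResource(new Path("/opt/hbase-0.19.3/conf/hbase-site.xml"));

// Queue changes to row "test_row1"; columns are named "family:qualifier"
HTable table = new HTable(conf, "test_table");
BatchUpdate batchUpdate = new BatchUpdate("test_row1");
batchUpdate.put("columnfamily1:column1", Bytes.toBytes("some value"));
batchUpdate.delete("columnfamily1:column2"); // column2: illustrative column to remove
table.commit(batchUpdate); // sends the queued put and delete together

Get Data
Read one column value from a row:
Cell cell = table.get("test_row1", "columnfamily1:column1");
To read one row with given columns, use the HTable#getRow() method:
RowResult singleRow = table.getRow(Bytes.toBytes("test_row1"));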

A tough Facebook application

Real-time counters of URLs shared, links liked, impressions generated
20 billion events/day (200K events/sec)
~30 sec latency from click to count
Heavy use of the incrementColumnValue API
Tried MySQL and Cassandra, settled on HBase

Use HBase if

You need random write, random read, or both (but not neither)
You need to do many thousands of operations per second on multiple TB of data
Your access patterns are simple

Thank you \../
