Big Data Analytics & Technologies: HBase
CT047-3-M
HBase
Topic & Structure of The Lesson
• HBase
– Conceptual data model
– Physical data storage
– Data operations
– Architecture
• Key
– Byte array
– Serves as the primary key for the table
– Indexed for fast lookup
• Column Family
– Has a name (string)
– Contains one or more related columns
• Column
– Belongs to one column family
– Included inside the row
– Named as familyName:columnName

Example table (“anchor:apache.com” is the column named “apache.com” in the “anchor:” family):

Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” = “CNN”
                   t13                               “anchor:my.look.ca” = “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
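The data model above can be pictured as a nested map: row key, then column (familyName:columnName), then timestamp, then value. The following is a minimal in-memory sketch of that idea in Python, not the HBase client API. The helper names (make_table, put, columns_in_family) are hypothetical, and the integer timestamps stand in for t6, t10, t13 and t15.

```python
# Toy in-memory model of an HBase-style table (NOT the HBase client API).
# Layout: row key (bytes) -> "familyName:columnName" -> {timestamp: value}
from collections import defaultdict


def make_table():
    """Create an empty table as a nested dictionary."""
    return defaultdict(lambda: defaultdict(dict))


def put(table, row_key, column, timestamp, value):
    """Store one cell version; column must be 'familyName:columnName'."""
    if ":" not in column or column.startswith(":"):
        raise ValueError("column must start with a column family name and ':'")
    table[row_key][column][timestamp] = value


def columns_in_family(table, row_key, family):
    """Return the sorted column names of one family that exist in a row."""
    row = table.get(row_key, {})
    return sorted(c for c in row if c.split(":", 1)[0] == family)


table = make_table()
# Integer timestamps are assumed stand-ins for t6, t15, t13 and t10.
put(table, b"com.cnn.www", "contents:", 6, "<html>...")
put(table, b"com.cnn.www", "anchor:cnnsi.com", 15, "CNN")
put(table, b"com.cnn.www", "anchor:my.look.ca", 13, "CNN.com")
put(table, b"com.apache.www", "anchor:apache.com", 10, "APACHE")

print(columns_in_family(table, b"com.cnn.www", "anchor"))
# → ['anchor:cnnsi.com', 'anchor:my.look.ca']
print(sorted(table))
# → [b'com.apache.www', b'com.cnn.www']
```

Row keys are bytes here because the key is a byte array. Sorting them shows the byte order in which rows would be kept.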
• Version Number

Row key   Time Stamp   Column “contents:”   Column “anchor:”
          t5           “<html>…”
          t3           “<html>…”
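Because each cell can hold several values distinguished by their time stamp, a read has to choose which versions to return. The sketch below models one cell as a map from timestamp to value and returns the newest versions first. The function name get_versions, the integer timestamps (standing in for t3 and t5) and the cell values are illustrative assumptions, not part of HBase.

```python
# Toy model of cell versions (hypothetical helper, NOT the HBase API).
# A cell is a dict mapping timestamp -> value.


def get_versions(cell, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    newest_first = sorted(cell.items(), key=lambda item: item[0], reverse=True)
    return newest_first[:max_versions]


# Placeholder values standing in for the two "<html>..." versions at t3 and t5.
contents = {3: "<html>v3", 5: "<html>v5"}

print(get_versions(contents))
# → [(5, '<html>v5')]
print(get_versions(contents, max_versions=2))
# → [(5, '<html>v5'), (3, '<html>v3')]
```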
• The HBaseMaster
– One master
• The HRegionServer
– Many region servers
• Region
– A subset of a table’s rows, similar to horizontal range partitioning
– Partitioning into regions is done automatically
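Range partitioning means each region owns a contiguous range of row keys, so finding the region for a key is a search over the sorted region start keys. The following is a minimal sketch under assumed split points. The start keys, region names and the function find_region are made up for illustration, and real HBase chooses split points itself.

```python
import bisect

# Hypothetical split points: region i covers [start_keys[i], start_keys[i + 1]).
# The first region starts at the empty key, so every row key falls in a region.
region_start_keys = [b"", b"com.cnn", b"org"]
region_names = ["region-1", "region-2", "region-3"]


def find_region(row_key):
    """Return the name of the region whose key range contains row_key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_names[index]


print(find_region(b"com.apache.www"))  # → region-1
print(find_region(b"com.cnn.www"))     # → region-2
print(find_region(b"org.example"))     # → region-3
```

Keeping the start keys sorted makes each lookup a binary search, which is one reason range partitioning pairs well with byte-ordered row keys.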
• RegionServer (many slaves)
– Manages data regions
– Serves data for reads and writes (using a write-ahead log)
• Master
– Responsible for coordinating the slaves
– Assigns regions, detects failures
– Admin functions
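The division of work above, where the master assigns regions and reacts to failures while region servers hold the regions, can be illustrated with a small simulation. This is a toy sketch only. The class ToyMaster, the server and region names, and the least-loaded assignment policy are assumptions made for the example and do not describe HBase's actual balancing logic.

```python
class ToyMaster:
    """Toy coordinator: tracks which region server holds which regions."""

    def __init__(self, servers):
        self.assignments = {server: [] for server in servers}

    def assign(self, region):
        """Assign a region to the least-loaded server (assumed policy)."""
        # Ties are broken by server name so the result is deterministic.
        server = min(self.assignments,
                     key=lambda s: (len(self.assignments[s]), s))
        self.assignments[server].append(region)
        return server

    def handle_failure(self, failed_server):
        """Remove a failed server and reassign the regions it held."""
        orphaned = self.assignments.pop(failed_server)
        for region in orphaned:
            self.assign(region)
        return orphaned


master = ToyMaster(["rs1", "rs2", "rs3"])
for region in ["r1", "r2", "r3", "r4"]:
    master.assign(region)
print(master.assignments)
# → {'rs1': ['r1', 'r4'], 'rs2': ['r2'], 'rs3': ['r3']}

master.handle_failure("rs1")
print(master.assignments)
# → {'rs2': ['r2', 'r1'], 'rs3': ['r3', 'r4']}
```

The sketch assumes at least one server survives a failure. It leaves out how failures are detected and how data is recovered from the log.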
• Spark Architecture
– Spark Core
– Resilient Distributed Dataset (RDD)
– Programming Languages Supported by Spark