HBase Lab Manual 3.0 - Update
1. General commands
In HBase, the general commands are the following:
1. status      Ex- hbase(main):001:0> status   (version, whoami and table_help are entered the same way)
2. version
3. table_help (help on table commands such as scan, drop, get, put, disable, etc.)
4. whoami
5. disable_all This command disables all the tables matching the given regex.
It works the same way as the disable command, except that it takes a regex for matching table names.
Once a table is disabled, the user can delete it from HBase.
A table must be disabled before it is deleted or dropped.
Syntax: disable_all '<matching regex>'
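The regex matching used by disable_all (and drop_all) can be pictured with a small Python sketch. This is a toy model, not HBase code: the table names are hypothetical, and it assumes the regex must match the whole table name, which should be checked against your HBase version.

```python
import re

def matching_tables(tables, pattern):
    """Return the table names that the regex matches in full (assumed semantics)."""
    rx = re.compile(pattern)
    return [name for name in tables if rx.fullmatch(name)]

# Hypothetical table names, for illustration only.
tables = ["raj_2019", "raj_2020", "education", "student"]

# Tables that a command such as disable_all 'raj.*' would act on in this model.
print(matching_tables(tables, "raj.*"))
```

Listing the matches first, before disabling or dropping anything, is a cheap way to check that a regex is not broader than intended.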
7. show_filters This command displays all the filters present in HBase, such as ColumnPrefixFilter,
TimestampsFilter, PageFilter, FamilyFilter, etc.
Syntax: show_filters
8. drop To drop (delete) a table present in HBase, it must first be disabled
using the disable command.
Syntax: drop '<table name>'
9. drop_all This command drops all the tables matching the given regex.
Syntax: drop_all '<regex>'
10. is_enabled This command verifies whether the named table is enabled or not.
If a table is disabled, it must be enabled with the enable command before it can be used;
is_enabled checks whether the table is currently enabled.
Syntax: is_enabled '<table name>'
Ex- is_enabled 'education'
11. alter Changing the maximum number of cell versions kept in a column family
Syntax: alter '<tablename>', NAME => '<column family name>', VERSIONS => 5
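What VERSIONS => 5 means for a single cell can be sketched in Python. This is a toy model under stated assumptions, not the HBase implementation: the class name VersionedCell is hypothetical, and old versions are dropped at write time here, whereas a real store may remove them later.

```python
class VersionedCell:
    """Toy cell that keeps at most max_versions timestamped values (max_versions >= 1)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._by_ts = {}  # timestamp -> value

    def put(self, timestamp, value):
        # A put at an existing timestamp overwrites that version.
        self._by_ts[timestamp] = value
        # Drop the oldest timestamps beyond the limit.
        for old in sorted(self._by_ts)[:-self.max_versions]:
            del self._by_ts[old]

    def versions(self):
        """Return (timestamp, value) pairs, newest first."""
        return sorted(self._by_ts.items(), reverse=True)

cell = VersionedCell(max_versions=5)
for ts in range(1, 8):          # seven writes at timestamps 1..7
    cell.put(ts, f"v{ts}")
print(cell.versions())          # only the five newest versions remain
```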
Altering one or more column families
Syntax: alter '<table_name>', '<reference of column_family>', {NAME => '<new column_family name>', IN_MEMORY => true, VERSIONS => 5}
Ex- alter 'student', 'Personal_details', {NAME => 'Admission_details', IN_MEMORY => true, VERSIONS => 5}
Table structure in HBase before the alter command:
  Column family:  Personal_details      Education_details
  Columns:        Name, Age, Address    Course, Year, Grade
Table structure after the alter command:
  Column family:  Personal_details      Admission_details    Education_details
  Columns:        Name, Age, Address    (none yet)           Course, Year, Grade
Deleting a column family from a table
alter 'sun', 'delete' => 'weather'   # only a column family can be deleted this way
alter 'education', 'delete' => 'weather'
12. alter_status Gets the status of the alter command: it indicates the number of regions
of the table that have received the updated schema. Pass the table name.
Syntax: alter_status '<table name>'
Ex- alter_status 'education'
2. Put - Puts a cell value at the specified table, row and column coordinates,
optionally with a timestamp.
Syntax: put <'tablename'>, <'rowname'>, <'column'>, <'value'>
Ex- hbase(main):018:0> put 'cdac1', 1, 'course:online', 'type:free'
Example: Here we are placing a value into table 'guru99' under row r1 and column c1, with timestamp 10
hbase> put 'guru99', 'r1', 'c1', 'value', 10
3. Get - This command gets the contents of a row or cell present in the table.
Syntax: get <'tablename'>, <'rowname'>, {<Additional parameters>}
Ex- hbase> get 'guru99', 'r1', {COLUMN => 'c1'}
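The put and get commands address a cell by row, column and timestamp. A minimal Python sketch of that idea follows. It is a toy in-memory model, not the HBase client API; the class ToyTable and its methods are hypothetical names chosen for illustration.

```python
import time

class ToyTable:
    """Toy model: row -> column -> {timestamp: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value, timestamp=None):
        # Without an explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column=None):
        """Return the newest (timestamp, value) of each requested column."""
        cols = self.rows.get(row, {})
        if column is not None:
            cols = {column: cols[column]} if column in cols else {}
        return {c: max(versions.items()) for c, versions in cols.items()}

t = ToyTable()
t.put("r1", "c1", "value", 10)   # mirrors: put 'guru99', 'r1', 'c1', 'value', 10
t.put("r1", "c1", "newer", 20)
print(t.get("r1", column="c1"))  # the version with the highest timestamp wins
```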
4. Delete This command deletes a cell value at the specified table, row and column.
The delete must match the coordinates of the cell to be deleted exactly.
When scanning, a delete cell suppresses older versions of values.
Ex- hbase(main):020:0> delete 'guru99', 'r1', 'c1'
The above command deletes the cell at row r1, column c1 in table 'guru99'.
5. deleteall This command deletes all cells in a given row.
Syntax: deleteall <'tablename'>, <'rowname'>
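The difference between delete (one cell) and deleteall (a whole row) can be sketched with plain Python dictionaries. This is a toy model that ignores versions and timestamps; the sample rows and the two helper functions are hypothetical.

```python
# Toy table: row -> {column: value}. Sample data is made up for illustration.
table = {
    "r1": {"c1": "value", "c2": "other"},
    "r2": {"c1": "kept"},
}

def delete(table, row, column):
    """Remove one cell; row and column must match exactly."""
    cells = table.get(row, {})
    cells.pop(column, None)
    if row in table and not cells:
        del table[row]          # a row with no cells left no longer exists

def deleteall(table, row):
    """Remove every cell of the given row."""
    table.pop(row, None)

delete(table, "r1", "c1")       # only cell r1/c1 goes away
deleteall(table, "r2")          # the whole row r2 goes away
print(table)
```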
6. Truncate
Syntax: truncate <'tablename'>
After an HBase table is truncated, the schema is still present but the records are not. This command
performs three functions, listed below:
Disables the table if it already exists
Drops the table if it already exists
Recreates the mentioned table
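The three steps of truncate, together with the rule that a table must be disabled before it is dropped, can be sketched as a toy Python model. ToyCluster and its methods are hypothetical names; this is not how HBase is implemented, only the sequence described above.

```python
class ToyCluster:
    """Toy catalog: name -> {"families": [...], "enabled": bool, "rows": {}}."""

    def __init__(self):
        self.tables = {}

    def create(self, name, families):
        if name in self.tables:
            raise ValueError(f"table already exists: {name}")
        self.tables[name] = {"families": list(families), "enabled": True, "rows": {}}

    def disable(self, name):
        self.tables[name]["enabled"] = False

    def drop(self, name):
        if self.tables[name]["enabled"]:
            raise RuntimeError("table must be disabled before it is dropped")
        del self.tables[name]

    def truncate(self, name):
        families = self.tables[name]["families"]  # remember the schema
        self.disable(name)                        # step 1: disable
        self.drop(name)                           # step 2: drop
        self.create(name, families)               # step 3: recreate, now empty

c = ToyCluster()
c.create("guru99", ["c1"])
c.tables["guru99"]["rows"]["r1"] = {"c1:x": "value"}
c.truncate("guru99")
print(c.tables["guru99"])   # schema kept, rows gone
```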
7. Scan - Displays the contents of an HBase table
Syntax: scan <'tablename'>, {Optional parameters}
Scanner specifications may include one or more of the following attributes:
TIMERANGE, FILTER, TIMESTAMP, LIMIT, MAXLENGTH, COLUMNS, CACHE, STARTROW and
STOPROW.
Ex- scan 'guru99'
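How STARTROW, STOPROW and LIMIT narrow a scan can be sketched in Python. This is a toy model with hypothetical row keys; it assumes rows are visited in sorted key order, with the start row included and the stop row excluded.

```python
from bisect import bisect_left

def scan(rows, startrow=None, stoprow=None, limit=None):
    """Return (row key, columns) pairs in sorted key order.

    startrow is inclusive, stoprow is exclusive, limit caps the row count.
    """
    keys = sorted(rows)
    lo = bisect_left(keys, startrow) if startrow is not None else 0
    hi = bisect_left(keys, stoprow) if stoprow is not None else len(keys)
    selected = keys[lo:hi]
    if limit is not None:
        selected = selected[:limit]
    return [(k, rows[k]) for k in selected]

# Hypothetical rows. Keys sort as strings, so "r10" comes before "r2".
rows = {
    "r3": {"c1": "c"},
    "r1": {"c1": "a"},
    "r10": {"c1": "j"},
    "r2": {"c1": "b"},
}

print([k for k, _ in scan(rows, startrow="r1", stoprow="r3")])
print([k for k, _ in scan(rows, limit=2)])
```

The string ordering of keys is the reason numeric row keys are often zero-padded (for example '001', '002') when range scans matter.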
Create a table that has more than one column family
1. Create a table with two column families, CF1 and CF2.
* A put adds one column at a time; multiple columns cannot be added in the same put.
This table has two column families, CF1 and CF2. Under CF1 there are two columns, name and gender;
under CF2 there are two columns, chinese and math.
* In {NAME=>'cf1'}, the key must be written in capitals, i.e. NAME.
Example:
hbase(main):041:0> create 'hbase_1102', {NAME=>'cf1'}, {NAME=>'cf2'}
2. Add data to the table. To add data to an HBase table, use put, one column at a time:
hbase(main):042:0> put 'hbase_1102', '001', 'cf1:name', 'Sumit'
hbase(main):043:0> put 'hbase_1102', '001', 'cf1:gender', 'male'
hbase(main):044:0> put 'hbase_1102', '001', 'cf2:chinese', '90'
hbase(main):045:0> put 'hbase_1102', '001', 'cf2:math', '91'
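The resulting row can be pictured with a toy Python model. It is only a sketch under stated assumptions: ToyTable is a hypothetical class, column families are fixed when the table is created, and column qualifiers such as name or math appear the first time a value is put.

```python
class ToyTable:
    """Toy table whose column families are fixed at creation time."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Each put writes exactly one column of one row.
        self.rows.setdefault(row, {})[column] = value

t = ToyTable("cf1", "cf2")       # mirrors: create 'hbase_1102', {NAME=>'cf1'}, {NAME=>'cf2'}
t.put("001", "cf1:name", "Sumit")
t.put("001", "cf1:gender", "male")
t.put("001", "cf2:chinese", "90")
t.put("001", "cf2:math", "91")
print(t.rows["001"])
```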
Row | Grade | Attendance | Essay writing | Computer proficiency | Name  | Mob No. | Height | Weight | Company | Year | Package
1   | First | Good       | Average       | Yes                  | Tarun | 8589    | 5"     | 58     | HCL     | 2018 | 6 LPA
The HBase architecture has a "single point of failure", and there is no exception handling
mechanism associated with it.
Performance Bottlenecks in HBase
In a production environment, HBase may run on a cluster of more than 5000 nodes, but only the
HMaster acts as master to all the region servers. If the HMaster goes down, it can be
recovered only after a long time, even though clients are still able to connect to the region servers.
Having another master is possible, but only one is active at a time, and it takes a long time to
activate the second HMaster if the main HMaster goes down. So the HMaster is a performance bottleneck.
HBase does not support cross-data operations or join operations directly. Joins can be
implemented using MapReduce, but that takes a lot of time to design and develop, so table join
operations are difficult to perform in HBase. In some use cases, it is impossible
to create join operations across the tables present in HBase.
Migrating data from external RDBMS sources to HBase servers requires a new design,
and this process takes a lot of time.
HBase is hard to query. We may have to integrate HBase with a SQL layer
such as Apache Phoenix, where we can write queries against the data in HBase. It is useful to
have Apache Phoenix on top of HBase.
Another drawback of HBase is that a table cannot have more than one index;
only the row key acts as a primary key. So performance is slow
when we want to search on more than one field, or on a field other than the row key. This problem
can be overcome by writing MapReduce code, or by integrating with Apache Solr or
Apache Phoenix.
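Why a search on a non-key field is slow, and what an index buys, can be sketched in Python. Note that this is not one of the approaches named above (MapReduce, Solr, Phoenix): it is a hand-built lookup structure used purely as an illustration, and the sample rows beyond 'Sumit' are hypothetical.

```python
from collections import defaultdict

# Toy data: row key -> columns. Rows 002 and 003 are made up for illustration.
data = {
    "001": {"cf1:name": "Sumit", "cf1:gender": "male"},
    "002": {"cf1:name": "Asha", "cf1:gender": "female"},
    "003": {"cf1:name": "Ravi", "cf1:gender": "male"},
}

def full_scan(data, column, value):
    """Without an index, every row must be read to search on a non-key field."""
    return [key for key, cols in sorted(data.items()) if cols.get(column) == value]

def build_index(data, column):
    """Build a value -> row keys lookup, which must then be kept in sync by hand."""
    index = defaultdict(list)
    for key, cols in sorted(data.items()):
        if column in cols:
            index[cols[column]].append(key)
    return index

gender_index = build_index(data, "cf1:gender")
print(full_scan(data, "cf1:gender", "male"))   # reads all rows
print(gender_index["male"])                    # one lookup, same answer
```

The cost of such an index is that every write to the data must also update the index, which is the bookkeeping that tools like Phoenix take over.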
Security for controlling different users' access to HBase data has been slow to improve.
HBase doesn’t support partial keys completely
HBase allows only one default sort per table
It’s very difficult to store large binary files in HBase
The storage of HBase limits real-time queries and sorting
Searching table contents is limited to key lookup and range lookup on key values, which limits
the queries that can be performed in real time
Default indexing is not present in HBase. Programmers have to write several lines of code or script
to perform indexing functionality in HBase
Expensive in terms of hardware requirements and memory block allocations.
More servers must be installed for distributed cluster environments (such as a server each for
NameNode, DataNodes, ZooKeeper, and Region Servers)
Performance-wise, it requires high-memory machines
Cost and maintenance are also higher
Advantages of HBase
The benefits of HBase are:
Can store large data sets on top of HDFS file storage, and can aggregate and analyze billions of rows
present in HBase tables
In HBase, the database can be shared
Operations such as data reading and processing take a small amount of time compared to
traditional relational models
Random read and write operations
HBase is used extensively for online analytical operations.
For example: In banking applications, HBase can be used for real-time data updates in ATM machines.