Big Data Assignment Nov 18 HBase

The document provides instructions for loading sales data into Hive and HBase. It details steps to: 1) Load a CSV file into Hive external and internal tables; 2) Insert records from the external table into the internal ORC table; 3) Create an HBase table and Hive table mapped to HBase; 4) Insert records from the Hive ORC table into the HBase table; and 5) Run queries on the HBase table including scans, gets, and filters.

Uploaded by

inder saini

N01346254 Inderjit Singh

1. Load sales data


a. First, load the product.csv file into the Hadoop /user/maria_dev/ directory

2. Then create a Hive external table and load the CSV file
a. CREATE EXTERNAL TABLE IF NOT EXISTS product_external(id int, item string, fullname string, quantity int, price int, item_type string)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY ','
   STORED AS TEXTFILE
   LOCATION '/tmp/product';
3. Then let's load data into the external Hive table
a. LOAD DATA INPATH '/tmp/lab/product.csv' OVERWRITE INTO TABLE product_external;
b. If you get an error saying the MoveTask failed, try selecting from product_external to see whether the data has loaded. If it has, move on, as there could be an issue with the sandbox.
4. Then create a Hive internal table
a. CREATE TABLE IF NOT EXISTS EXISTS product_ORC(id int, item string,fullname string,
quantity int, price int,item_type string) STORED AS ORC;
b. If you get an error – remove the redundant word 😊

5. Then load data from the external table into the internal ORC table

a. INSERT INTO TABLE product_orc SELECT * FROM product_external; - screen print results
6. Select from both tables to see the data - screen print results

select * from product_external;


select * from product_ORC;

7. Log in to HBase and create an HBase table

a. create 'Product', 'details'
8. Create a table in Hive that maps directly to the HBase table
a. Please review the previous class notes to do this. Create the table with the name ext_hbase_product and print its definition.

Table definition:
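One possible shape for this definition is sketched below. It is an illustration, not the class-notes answer. It assumes the standard Hive HBase storage handler is available in the sandbox, that id serves as the HBase row key, that every other column lives in the 'details' column family created in step 7, and that the HBase qualifiers reuse the Hive column names.

```sql
-- Sketch only: maps a Hive table onto the existing HBase table 'Product'.
-- Assumption: id becomes the HBase row key; all other columns go to the
-- 'details' column family under qualifiers named after the Hive columns.
CREATE EXTERNAL TABLE IF NOT EXISTS ext_hbase_product(
  id int, item string, fullname string,
  quantity int, price int, item_type string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,details:item,details:fullname,details:quantity,details:price,details:item_type")
TBLPROPERTIES ("hbase.table.name" = "Product");

-- Print the resulting definition.
SHOW CREATE TABLE ext_hbase_product;
```

The mapping string lists one entry per Hive column, in order and without spaces, with :key marking the row key.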
9. From Hive, insert records into the HBase table
a. INSERT INTO TABLE ext_hbase_product SELECT * FROM product_orc;
b. SELECT * FROM ext_hbase_product; - screen print first page only
10. From HBase
a. scan 'Product' - screen print the last page from HBase

Last page of screen

b. get 'Product', '1' - screen print


c. Write a filter command to show just the fullname

d. Write a filter command where item_type = 'paper' - screen print
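As a cross-check on the two filter results, the same rows can be read from the Hive side. This is a sketch that assumes ext_hbase_product from step 8 exposes fullname and item_type under those column names. It complements the HBase shell filters and does not replace them.

```sql
-- Should list the same values as the HBase filter on fullname (10c).
SELECT fullname FROM ext_hbase_product;

-- Should return the same rows as the HBase filter on item_type (10d).
SELECT * FROM ext_hbase_product WHERE item_type = 'paper';
```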
