
DSBDA Assignment 3

1. The document discusses installing Apache Hadoop and Hive, and setting up environment variables and the hive-site.xml file. It also provides examples of creating, dropping, and altering databases and tables in Hive and HBase.
2. Steps shown include loading a table with data, inserting new values and fields, and joining tables with Hive. An index is also created on a flight information table.
3. Examples demonstrate finding the average departure delay per day from a flight table for the year 2008 using Hive queries.

Uploaded by Clash Cinema

HADOOP HBASE USING HIVE

Installing Apache Hadoop and Hive


$ mkdir hadoop; cp hadoop-1.2.1.tar.gz hadoop; cd hadoop
$ gunzip hadoop-1.2.1.tar.gz
$ tar xvf *.tar
$ mkdir hive; cp hive-0.11.0.tar.gz hive; cd hive
$ gunzip hive-0.11.0.tar.gz
$ tar xvf *.tar
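The separate gunzip and tar steps above can each be collapsed into one command with tar's `-z` flag. A minimal sketch using a throwaway stand-in archive (`pkg-1.0.tar.gz` is hypothetical; substitute the real Hadoop and Hive tarballs):

```shell
# Build a throwaway archive to demonstrate with (stand-in for hadoop-1.2.1.tar.gz)
mkdir -p /tmp/untar-demo && cd /tmp/untar-demo
mkdir -p pkg-1.0 && echo "sample" > pkg-1.0/README
tar czf pkg-1.0.tar.gz pkg-1.0 && rm -r pkg-1.0

# One command replaces the separate gunzip + tar xvf pair
tar xzf pkg-1.0.tar.gz
cat pkg-1.0/README
```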

Setting Up Apache Hive Environment Variables in .bashrc


export HADOOP_HOME=/home/user/Hive/hadoop/hadoop-1.2.1
export JAVA_HOME=/opt/jdk
export HIVE_HOME=/home/user/Hive/hive-0.11.0
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$JAVA_HOME/bin:$PATH
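The export lines above prepend three bin directories to PATH. A quick way to confirm the resulting ordering after sourcing .bashrc (the paths are the document's; they need not exist for this check):

```shell
# Same exports as in .bashrc
export HADOOP_HOME=/home/user/Hive/hadoop/hadoop-1.2.1
export HIVE_HOME=/home/user/Hive/hive-0.11.0
export JAVA_HOME=/opt/jdk
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$JAVA_HOME/bin:$PATH

# The three new entries should now lead the search path
echo "$PATH" | tr ':' '\n' | head -3
```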

Setting Up the hive-site.xml File


$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/biadmin/Hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
</configuration>
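Hive's first CREATE TABLE will fail if the warehouse directory named in hive-site.xml does not exist or is not writable, so it is worth pre-creating it. A sketch using a stand-in path rather than /home/biadmin/Hive/warehouse:

```shell
WAREHOUSE=/tmp/hive-demo/warehouse   # stand-in for the hive-site.xml value
mkdir -p "$WAREHOUSE"
chmod g+w "$WAREHOUSE"               # group-writable, as the Hive getting-started docs suggest
ls -ld "$WAREHOUSE"
```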

Create Database Statement In Hive


hive> CREATE DATABASE userdb;

Create Table Statement


hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
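Because the table declares tab-delimited fields and newline-terminated lines, a data file for it is just plain text. A sketch that writes two sample rows (the values are invented), which could then be loaded with `LOAD DATA LOCAL INPATH '/tmp/employee.txt' INTO TABLE employee;`:

```shell
# Two sample rows matching the schema (eid, name, salary, destination), tab-separated
printf '1\tAnil\t45000\tManager\n'  >  /tmp/employee.txt
printf '2\tSunita\t40000\tAnalyst\n' >> /tmp/employee.txt
cat /tmp/employee.txt
```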

Alter Table Statement


hive> ALTER TABLE employee RENAME TO emp;

Drop Table Statement


hive> DROP TABLE IF EXISTS employee;

1) Creating, dropping, and altering database tables


hbase(main):001:0> create 'flight','finfo','fsch'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/suruchi/hbase/lib/slf4jlog4j12-
1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.2.1/lib/slf4jlog4j12-1.4.3.jar!/org/
slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 1.6440 seconds
=> Hbase::Table - flight
hbase(main):014:0> disable 'tb1'
0 row(s) in 1.4940 seconds
hbase(main):015:0> drop 'tb1'
0 row(s) in 0.2540 seconds
2) Load table with data, insert new values and fields, and join tables with Hive
hbase(main):002:0> put 'flight',1,'finfo:dest','mumbai'
0 row(s) in 0.1400 seconds
hbase(main):003:0> put 'flight',1,'finfo:source','pune'
0 row(s) in 0.0070 seconds
hbase(main):004:0> put 'flight',1,'fsch:at','10.25am'
0 row(s) in 0.0120 seconds
hbase(main):005:0> put 'flight',1,'fsch:dt','11.25am'
0 row(s) in 0.0100 seconds
hbase(main):006:0> scan 'flight'
ROW COLUMN+CELL
1 column=finfo:dest, timestamp=1554629442188, value=mumbai
1 column=finfo:source, timestamp=1554629455512, value=pune
1 column=fsch:at, timestamp=1554629478320, value=10.25am
1 column=fsch:dt, timestamp=1554629491414, value=11.25am
1 row(s) in 0.0450 seconds
hbase(main):007:0> alter 'flight', NAME => 'revenue'
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 2.3720 seconds
hbase(main):008:0> put 'flight',1,'revenue',10000
0 row(s) in 0.0110 seconds
hbase(main):016:0> get 'flight',1
COLUMN CELL
finfo:dest timestamp=1554629442188, value=mumbai
finfo:source timestamp=1554629455512, value=pune
fsch:at timestamp=1554629478320, value=10.25am
fsch:dt timestamp=1554629491414, value=11.25am
revenue: timestamp=1554629582539, value=10000
5 row(s) in 0.0310 seconds
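The section heading mentions joining tables with Hive, which first requires exposing the HBase flight table to Hive through the HBase storage handler. A sketch of such a mapping statement, with column names taken from the scans above (the Hive table name `hflight` is invented; the statement is written to a script file here for illustration):

```shell
cat > /tmp/flight_mapping.hql <<'EOF'
-- Map the existing HBase 'flight' table into Hive so it can be queried and joined
CREATE EXTERNAL TABLE hflight (key int, source string, dest string, at string, dt string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,finfo:source,finfo:dest,fsch:at,fsch:dt")
TBLPROPERTIES ("hbase.table.name" = "flight");
EOF
cat /tmp/flight_mapping.hql
```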

3) Create an index on the flight information table


hive> CREATE INDEX ine ON TABLE FLIGHT(source) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
OK
Time taken: 1.841 seconds
hive> SHOW INDEX ON FLIGHT;
OK
Time taken: 0.126 seconds, Fetched: 1 row(s)
Because the index is created WITH DEFERRED REBUILD, it remains empty until `ALTER INDEX ine ON FLIGHT REBUILD;` is run.
4) Find the average departure delay per day in 2008
hive> SELECT day, AVG(delay) FROM flight WHERE year = 2008 GROUP BY day;
(A `day` column is assumed here; without the GROUP BY, the query returns a single overall average for the year rather than a per-day figure.)
