1. The document covers installing Apache Hadoop and Hive and setting up the environment variables and the hive-site.xml file. It also gives examples of creating, dropping, and altering databases and tables in Hive and HBase.
2. The steps shown include loading a table with data, inserting new values and fields, and joining tables with Hive. An index is also created on a flight information table.
3. The examples demonstrate finding the average departure delay per day from a flight table for the year 2008 using Hive queries.
Dsbda Ass 3
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/biadmin/Hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
</configuration>
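Before starting Hive, it can help to confirm that the edited hive-site.xml is well-formed and carries the intended warehouse path. The following is a minimal Python sketch using only the standard library. The helper name read_properties is hypothetical, and the XML is inlined from the file above instead of being read from $HIVE_HOME/conf.

```python
import xml.etree.ElementTree as ET

# Same content as the hive-site.xml above, inlined so the sketch is self-contained.
HIVE_SITE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/biadmin/Hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = read_properties(HIVE_SITE)
print(props["hive.metastore.warehouse.dir"])
```

To check a real file, the same function could be given the text of hive-site.xml read from disk.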
Create Database Statement In Hive
hive> CREATE DATABASE userdb;
Create Table Statement
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
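The row format above means each record is one line and the fields are separated by tabs. As an illustration of what a matching data file looks like, here is a small Python sketch that builds such text. The function name to_hive_text and the sample rows are made up for this example and do not come from the assignment.

```python
# Column order follows the CREATE TABLE statement: eid, name, salary, destination.
# The rows below are made-up sample values, for illustration only.
rows = [
    (1, "Asha", "45000", "Technical manager"),
    (2, "Ravi", "40000", "Proof reader"),
]


def to_hive_text(records):
    """Format records as tab-separated fields, one record per line (hypothetical helper)."""
    lines = []
    for record in records:
        fields = [str(field) for field in record]
        for field in fields:
            # A delimited text table has no quoting, so a tab or newline
            # inside a field would shift or split the columns.
            if "\t" in field or "\n" in field:
                raise ValueError("field contains a delimiter: %r" % field)
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


text = to_hive_text(rows)
print(text, end="")
```

Text in this layout, saved to a local file, is the kind of input that a LOAD DATA LOCAL INPATH ... INTO TABLE employee statement would read.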
Alter Table Statement
hive> ALTER TABLE employee RENAME TO emp;
Drop Table Statement
hive> DROP TABLE IF EXISTS employee;
1) Creating, Dropping, and altering Database tables
hbase(main):001:0> create 'flight','finfo','fsch'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/suruchi/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.2.1/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 1.6440 seconds
=> Hbase::Table - flight

hbase(main):014:0> disable 'tb1'
0 row(s) in 1.4940 seconds
hbase(main):015:0> drop 'tb1'
0 row(s) in 0.2540 seconds

2) Load table with data, insert new values and field in the table, Join tables with Hive

hbase(main):002:0> put 'flight',1,'finfo:dest','mumbai'
0 row(s) in 0.1400 seconds
hbase(main):003:0> put 'flight',1,'finfo:source','pune'
0 row(s) in 0.0070 seconds
hbase(main):004:0> put 'flight',1,'fsch:at','10.25am'
0 row(s) in 0.0120 seconds
hbase(main):005:0> put 'flight',1,'fsch:dt','11.25am'
0 row(s) in 0.0100 seconds

hbase(main):006:0> scan 'flight'
ROW  COLUMN+CELL
 1   column=finfo:dest, timestamp=1554629442188, value=mumbai
 1   column=finfo:source, timestamp=1554629455512, value=pune
 1   column=fsch:at, timestamp=1554629478320, value=10.25am
 1   column=fsch:dt, timestamp=1554629491414, value=11.25am
1 row(s) in 0.0450 seconds

hbase(main):007:0> alter 'flight', NAME => 'revenue'
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 2.3720 seconds

hbase(main):008:0> put 'flight',1,'revenue',10000
0 row(s) in 0.0110 seconds

hbase(main):016:0> get 'flight',1
COLUMN        CELL
 finfo:dest   timestamp=1554629442188, value=mumbai
 finfo:source timestamp=1554629455512, value=pune
 fsch:at      timestamp=1554629478320, value=10.25am
 fsch:dt      timestamp=1554629491414, value=11.25am
 revenue:     timestamp=1554629582539, value=10000
5 row(s) in 0.0310 seconds
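The put, alter, and get steps above rely on the HBase layout of row key, column family, and qualifier. To make that layout concrete, here is a toy in-memory model in Python. It is not the HBase client API: the class ToyTable and its methods are hypothetical, and timestamps and cell versions are left out.

```python
class ToyTable:
    """Toy stand-in for an HBase table: row key -> 'family:qualifier' -> value."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}

    def add_family(self, family):
        # Plays the role of the shell's alter command that adds a column family.
        self.families.add(family)

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        # A bare family name gets an empty qualifier, displayed as 'family:'.
        key = column if ":" in column else column + ":"
        self.rows.setdefault(str(row), {})[key] = str(value)

    def get(self, row):
        # Return the cells of one row, sorted by column name.
        return dict(sorted(self.rows.get(str(row), {}).items()))


flight = ToyTable("finfo", "fsch")
flight.put(1, "finfo:dest", "mumbai")
flight.put(1, "finfo:source", "pune")
flight.put(1, "fsch:at", "10.25am")
flight.put(1, "fsch:dt", "11.25am")
flight.add_family("revenue")
flight.put(1, "revenue", 10000)
print(flight.get(1))
```

As in the shell session, writing to a family that was never declared fails, which is why the revenue family is added before the last put.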
3) Create index on Flight information Table
hive> CREATE INDEX ine ON TABLE FLIGHT(source) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
OK
Time taken: 1.841 seconds
hive> SHOW INDEX ON FLIGHT;
OK
Time taken: 0.126 seconds, Fetched: 1 row(s)

4) Find the average departure delay per day in 2008.

hive> select avg(delay) from flight where year = 2008;
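As written, the query above returns a single average over all of 2008. A per-day figure needs a GROUP BY on the columns that identify the day. The sketch below shows that grouping on a tiny made-up table, using Python's built-in sqlite3 module as a stand-in for Hive. The columns month and day_of_month and all data values are assumptions, since the assignment does not show the flight table's schema.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight (year INT, month INT, day_of_month INT, delay REAL)"
)
conn.executemany(
    "INSERT INTO flight VALUES (?, ?, ?, ?)",
    [
        (2008, 1, 1, 10.0),  # made-up rows, for illustration only
        (2008, 1, 1, 20.0),
        (2008, 1, 2, 5.0),
        (2007, 1, 1, 99.0),  # filtered out by the WHERE clause
    ],
)

# One output row per calendar day of 2008, each with that day's mean delay.
PER_DAY_AVG = """
SELECT month, day_of_month, AVG(delay) AS avg_delay
FROM flight
WHERE year = 2008
GROUP BY month, day_of_month
ORDER BY month, day_of_month
"""

result = conn.execute(PER_DAY_AVG).fetchall()
print(result)  # [(1, 1, 15.0), (1, 2, 5.0)]
```

The SELECT statement uses only standard SQL (AVG, WHERE, GROUP BY, ORDER BY), so the same shape should carry over to a Hive query once the real column names are substituted.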