Day 02
day02.md - 2025-05-27
Apache Hive
Hive Documentation
https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
Hive History
Hive advantages
Hive Limitations
Hive Applications
Hive Installation
Install Hadoop.
Install Hive.
hive-site.xml
set PATH in ~/.bashrc
Start metastore service.
Start hive CLI.
Start hiveserver2 service.
Start hive beeline.
Hive QL
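When data is ingested into a Hive table using the "LOAD DATA" command, the data file is copied as-is into the table's directory. Internally, this is an HDFS put operation (with LOAD DATA LOCAL INPATH; without LOCAL, the file is moved within HDFS).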
Author: Nilesh Ghule 3/6
Sunbeam Institute of Information Technology, Pune & Karad day02.md - 2025-05-27
While the file is uploaded, no checks are performed on the data against the schema of the table.
However, when the data is read/processed record by record, each field is verified against the schema. If a value is not compatible with the column's data type, it is treated as NULL. If a value exceeds the declared length/size of the column, it is truncated.
This feature is called "Schema-on-Read". Because validation is deferred until read time, data ingestion is very fast.
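The behavior described above can be sketched in HiveQL as follows. This is only an illustration: the table name `emp_staging`, its columns, and the file `/tmp/emp.csv` are hypothetical and do not come from the course material.

```sql
-- Hypothetical staging table (name, columns, and delimiter are assumptions).
CREATE TABLE emp_staging (
  id   INT,
  name VARCHAR(5),
  sal  DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Assume the local file /tmp/emp.csv contains these two lines:
--   1,Amit,1000.0
--   abc,Nilesh,xyz

-- The file is copied into the table's directory without any validation,
-- so the malformed second line is accepted at load time.
LOAD DATA LOCAL INPATH '/tmp/emp.csv' INTO TABLE emp_staging;

-- The schema is applied only now, while the records are read.
SELECT id, name, sal FROM emp_staging;
```

Going by the rules above, the second record should come back with `id` and `sal` as NULL (the text `abc` and `xyz` is not numeric) and `name` truncated to the five characters allowed by `VARCHAR(5)`. The load itself should succeed regardless of the bad record.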
Hive INSERT
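As a minimal sketch of the two common forms of INSERT in HiveQL, the statements below first add literal rows and then populate a table from a query. The tables `emp_raw` and `emp_high_paid` are hypothetical names used only for illustration.

```sql
-- Hypothetical tables (names and columns are assumptions).
CREATE TABLE emp_raw (id INT, name STRING, sal DOUBLE);
CREATE TABLE emp_high_paid (id INT, name STRING, sal DOUBLE);

-- Form 1: append literal rows to a table.
INSERT INTO TABLE emp_raw VALUES
  (1, 'Amit', 1000.0),
  (2, 'Nilesh', 2500.0);

-- Form 2: replace the contents of a table with the result of a query.
INSERT OVERWRITE TABLE emp_high_paid
SELECT id, name, sal
FROM emp_raw
WHERE sal > 2000.0;
```

Unlike LOAD DATA, which only places a file in the table's directory, an INSERT is executed as a job by the execution engine, so it is usually much slower for bulk ingestion.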
Assignments
1. Assignment 1 -
-- Customers table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT,
LocationID INT
);
-- Products table
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);
-- Sales table
CREATE TABLE Sales (
-- Locations table
CREATE TABLE Locations (
LocationID INT PRIMARY KEY,
City VARCHAR(50),
State VARCHAR(50)
);
-- Customers
INSERT INTO Customers VALUES (1, 'John Doe', 30, 1);
INSERT INTO Customers VALUES (2, 'Jane Smith', 25, 2);
INSERT INTO Customers VALUES (3, 'Bob Johnson', 35, 1);
INSERT INTO Customers VALUES (4, 'Alice Brown', 28, 3);
INSERT INTO Customers VALUES (5, 'Charlie Davis', 32, 2);
-- Products
INSERT INTO Products VALUES (1, 'Laptop', 'Electronics', 800.00);
INSERT INTO Products VALUES (2, 'Smartphone', 'Electronics', 400.00);
INSERT INTO Products VALUES (3, 'T-shirt', 'Clothing', 20.00);
INSERT INTO Products VALUES (4, 'Shoes', 'Footwear', 50.00);
INSERT INTO Products VALUES (5, 'Bookshelf', 'Furniture', 150.00);
-- Sales
INSERT INTO Sales VALUES (1, 1, 1, '2023-01-01', 2, 1600.00);
INSERT INTO Sales VALUES (2, 2, 3, '2023-01-02', 3, 60.00);
INSERT INTO Sales VALUES (3, 3, 2, '2023-01-03', 1, 400.00);
INSERT INTO Sales VALUES (4, 4, 4, '2023-02-01', 2, 100.00);
INSERT INTO Sales VALUES (5, 5, 5, '2023-02-02', 1, 150.00);
-- Locations
INSERT INTO Locations VALUES (1, 'Pune', 'Maharashtra');
INSERT INTO Locations VALUES (2, 'Mumbai', 'Maharashtra');
INSERT INTO Locations VALUES (3, 'Bangalore', 'Karnataka');
INSERT INTO Locations VALUES (4, 'Delhi', 'Delhi');
INSERT INTO Locations VALUES (5, 'Chennai', 'Tamil Nadu');
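The table definitions above are written in generic RDBMS style. As a hedged sketch of how one of them might be adapted for Hive, the Customers table is shown below. It assumes Hive 2.1.0 or later, where primary key constraints are accepted only as non-validated (informational) constraints. The text storage format and comma delimiter are assumptions and are not part of the assignment.

```sql
-- Customers table adapted for Hive (a sketch, not the official solution).
-- Hive does not enforce the key; DISABLE NOVALIDATE declares it as informational only.
CREATE TABLE IF NOT EXISTS Customers (
  CustomerID INT,
  Name       VARCHAR(100),
  Age        INT,
  LocationID INT,
  PRIMARY KEY (CustomerID) DISABLE NOVALIDATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','   -- assumed delimiter
STORED AS TEXTFILE;        -- assumed storage format

-- Same row as in the assignment data.
INSERT INTO Customers VALUES (1, 'John Doe', 30, 1);

-- Quick check of the inserted data.
SELECT CustomerID, Name, Age, LocationID FROM Customers;
```

Because the constraint is not validated, inserting a second row with the same CustomerID would not be rejected by Hive. Uniqueness has to be ensured by the data pipeline itself.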