
Sunbeam Institute of Information Technology, Pune & Karad
day02.md - 2025-05-27

Big Data Technologies


Agenda
Hadoop Fundamentals
Hadoop Ecosystems
Hadoop Distributions
Hive

Apache Hive

Hive is a data warehouse (for OLAP) built on the Hadoop framework.


Data warehouse = (huge) RDBMS for analytical processing.
Hive manages "structured data".
Hive is client software that converts Hive QL (SQL-like) queries into MR jobs.
Hive QL is similar to SQL with many extended features.
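
For example, a query like the one below is written as ordinary SQL and Hive compiles it into one or more MR jobs (emp and deptno are hypothetical names used only for illustration):

-- Hive compiles this into an MR job: mappers read the table files from HDFS,
-- reducers aggregate the counts per deptno
SELECT deptno, COUNT(*) AS emp_count
FROM emp
GROUP BY deptno;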

Hive Documentation

https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Hive History

Facebook data ingestion into Hadoop:


10s of GB/day – 2006
1 TB/day – 2007
MySQL/Oracle databases could not keep up with this volume.
Processing Hadoop data directly with MR is complex.
Facebook developed Hive to convert SQL queries into MR jobs.
Open sourced under the Apache license; became a top-level Apache project in 2010.


Hive advantages

Data warehouse for data analysis.


Supports long-running analytical queries.
Fault-tolerant execution environment (inherited from Hadoop).

Hive Limitations

Slower response time (queries run as batch MR jobs).


Data manipulation (UPDATE/DELETE) is not fully supported.

Hive Applications

Batch processing (SQL-based)


ETL jobs
Business Intelligence (Reports)
Predictive Modeling
Data mining
Log processing

Hive Installation

Install Hadoop.
Install Hive.
Configure hive-site.xml.
Set PATH in ~/.bashrc.
Start the metastore service.
Start the hive CLI.
Start the hiveserver2 service.
Connect using the beeline client.

Hive QL


Hive QL is extended SQL.


Supports DQL, DML, DDL and DCL.
DQL supports filtering, ordering, grouping, joins, etc.
Data is read from HDFS (default location: /user/hive/warehouse).
Query results can be stored into HDFS (or in another table), e.g.
INSERT INTO atable SELECT * FROM another_table;
Supports views; indexes are supported only up to Hive 2.x (removed in Hive 3.0).
Managed tables and External tables.
Partitioning & Bucketing.
Various Hive data types (see the table sketch after this list):
Primitive Types:
BOOLEAN (1 byte)
Integers (size in bytes): TINYINT(1), SMALLINT(2), INT(4), BIGINT(8)
Floating Point: FLOAT (single precision), DOUBLE (double precision), DECIMAL(precision, scale)
Characters: CHAR(n), VARCHAR(n), STRING
Date & Time: DATE, TIMESTAMP
Collection Types:
ARRAY: collection of values of the same type
e.g. emails (separated by |)
STRUCT: collection of values of different types
e.g. addr (area STRING | dist STRING | pin INT)
MAP: collection of key-value pairs
e.g. phones (type:phone)
UNION
Follows Schema-on-Read for better ingestion performance:
While loading the data (LOAD DATA) the schema is not verified.
While processing individual records the schema is verified (SELECT, INSERT, UPDATE).
If a value is not compatible with its declared type, it is treated as NULL.
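
A sketch of a table definition using the types above; the table name, columns and delimiters are illustrative assumptions that mirror the examples in the list (emails separated by |, addr as area|dist|pin, phones as type:number):

-- illustrative table mixing primitive and collection types
CREATE TABLE contacts (
  id     INT,
  name   STRING,
  emails ARRAY<STRING>,                             -- e.g. a@x.com|b@x.com
  addr   STRUCT<area:STRING, dist:STRING, pin:INT>, -- e.g. Kothrud|Pune|411038
  phones MAP<STRING,STRING>                         -- e.g. home:9800000000|work:9811111111
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;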

Hive QL - Schema on Read

When data is ingested into Hive tables using the "LOAD DATA" command, the data file is uploaded into the table's HDFS directory directly. Internally, it performs an HDFS put operation.

While uploading the file, no checks are performed on the data against the schema of the table.
However, when the data is read or processed record by record, it is verified against the schema. If a value is not compatible with its data type, it is treated
as NULL. If the length/size of the data does not match, it is truncated.
This feature is called "Schema-on-Read". It enables high-speed data ingestion; see the sketch below.
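
A minimal sketch of schema-on-read in action; the table, file path, and data are assumptions used only to illustrate the behaviour described above:

-- table with a numeric salary column
CREATE TABLE emp (id INT, name STRING, salary DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- ingestion: the file is copied into /user/hive/warehouse/emp as-is, no schema checks
LOAD DATA LOCAL INPATH '/home/sunbeam/emp.csv' INTO TABLE emp;

-- read: each record is checked against the schema now;
-- a row whose salary field is not a valid number comes back with salary = NULL
SELECT id, name, salary FROM emp;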

Hive INSERT

Inserts new records into a Hive table.


Internally creates new files under HDFS (the table's directory).
Produces an MR job to insert the data.
During INSERT, Hive follows schema-on-write (values are validated against the schema when written); see the sketch below.
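
A short sketch, reusing the hypothetical emp table from the previous example (emp_backup is also an assumption): each statement launches an MR job and writes new files under the table's warehouse directory, with values checked against the schema at write time.

-- insert literal rows (supported since Hive 0.14)
INSERT INTO TABLE emp VALUES (101, 'Amit', 55000.0);

-- insert the result of a query into another existing table
INSERT INTO TABLE emp_backup SELECT id, name, salary FROM emp;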

Assignments
1. Assignment 1

-- Customers table
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT,
LocationID INT
);

-- Products table
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Sales table
CREATE TABLE Sales (
SaleID INT PRIMARY KEY,
CustomerID INT,
ProductID INT,
SaleDate DATE,
Quantity INT,
TotalAmount DECIMAL(10, 2)
);

-- Locations table
CREATE TABLE Locations (
LocationID INT PRIMARY KEY,
City VARCHAR(50),
State VARCHAR(50)
);

-- Customers
INSERT INTO Customers VALUES (1, 'John Doe', 30, 1);
INSERT INTO Customers VALUES (2, 'Jane Smith', 25, 2);
INSERT INTO Customers VALUES (3, 'Bob Johnson', 35, 1);
INSERT INTO Customers VALUES (4, 'Alice Brown', 28, 3);
INSERT INTO Customers VALUES (5, 'Charlie Davis', 32, 2);

-- Products
INSERT INTO Products VALUES (1, 'Laptop', 'Electronics', 800.00);
INSERT INTO Products VALUES (2, 'Smartphone', 'Electronics', 400.00);
INSERT INTO Products VALUES (3, 'T-shirt', 'Clothing', 20.00);
INSERT INTO Products VALUES (4, 'Shoes', 'Footwear', 50.00);
INSERT INTO Products VALUES (5, 'Bookshelf', 'Furniture', 150.00);

-- Sales
INSERT INTO Sales VALUES (1, 1, 1, '2023-01-01', 2, 1600.00);
INSERT INTO Sales VALUES (2, 2, 3, '2023-01-02', 3, 60.00);
INSERT INTO Sales VALUES (3, 3, 2, '2023-01-03', 1, 400.00);
INSERT INTO Sales VALUES (4, 4, 4, '2023-02-01', 2, 100.00);
INSERT INTO Sales VALUES (5, 5, 5, '2023-02-02', 1, 150.00);


-- Locations
INSERT INTO Locations VALUES (1, 'Pune', 'Maharashtra');
INSERT INTO Locations VALUES (2, 'Mumbai', 'Maharashtra');
INSERT INTO Locations VALUES (3, 'Bangalore', 'Karnataka');
INSERT INTO Locations VALUES (4, 'Delhi', 'Delhi');
INSERT INTO Locations VALUES (5, 'Chennai', 'Tamil Nadu');
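
Note: the DDL above is written in generic RDBMS SQL. In Hive, PRIMARY KEY is only an informational constraint and must be declared with DISABLE NOVALIDATE (or omitted entirely). One possible Hive-compatible version of the Customers table, with an assumed text delimiter, is:

-- sketch of a Hive-compatible Customers table (delimiter clause is an assumption)
CREATE TABLE Customers (
CustomerID INT,
Name VARCHAR(100),
Age INT,
LocationID INT,
PRIMARY KEY (CustomerID) DISABLE NOVALIDATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';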

A. Retrieve the names of all customers who made a purchase.
B. List the products and their total sales amounts for a given date range.
C. Find the total sales amount for each product category.
D. Identify the customers who made purchases in a specific city.
E. Calculate the average age of customers who bought products in the 'Electronics' category.
F. List the top 3 products based on total sales amount.
G. Find the total sales amount for each month.
H. Identify the products with no sales.
I. Calculate the total sales amount for each state.
J. Retrieve the customer names and their highest purchase amount.

Author: Nilesh Ghule
