Cheat Sheet: Hive Basics

Hive functions perform operations within HQL queries. User-defined functions (UDFs) take columns as arguments and return a single value, while user-defined table-generating functions (UDTFs) split columns into multiple rows. Partitioning divides tables into parts based on values such as date or department. HCatalog is a metadata and table management system that allows data in any format to be stored on Hadoop. The Hive SELECT command retrieves data specified by columns, tables, and optional clauses for filtering, sorting, grouping, aggregation, and counting.
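The function kinds differ only in how many rows go in and how many values come out. The minimal Python model below is a sketch of those semantics, not Hive's Java UDF/UDTF API; `apply_udf`, `apply_udtf`, `apply_udaf` and the `orders` table are hypothetical names. In HQL, the built-ins `upper`, `explode` and `sum` are examples of a UDF, a UDTF and an aggregate function respectively.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]  # a table row modelled as column name -> value


def apply_udf(rows: List[Row], udf: Callable[..., Any], *cols: str) -> List[Any]:
    """UDF: for every input row, take one or more columns and return ONE value."""
    return [udf(*(row[c] for c in cols)) for row in rows]


def apply_udtf(rows: List[Row], udtf: Callable[[Any], Iterable[Row]], col: str) -> List[Row]:
    """UDTF: one input value may produce zero, one or many output rows."""
    out: List[Row] = []
    for row in rows:
        out.extend(udtf(row[col]))
    return out


def apply_udaf(rows: List[Row], udaf: Callable[[Iterable[Any]], Any], col: str) -> Any:
    """Aggregate function: many input rows are folded into ONE value."""
    return udaf(row[col] for row in rows)


# Hypothetical sample table.
orders = [
    {"customer": "ann", "items": ["pen", "ink"], "amount": 3},
    {"customer": "bob", "items": [], "amount": 4},
]

print(apply_udf(orders, str.upper, "customer"))  # ['ANN', 'BOB']
print(apply_udtf(orders, lambda xs: [{"item": x} for x in xs], "items"))
# [{'item': 'pen'}, {'item': 'ink'}]
print(apply_udaf(orders, sum, "amount"))  # 7
```

Note that the row for `bob` contributes no output rows to the UDTF result, which is exactly how a table-generating function such as `explode` treats an empty array.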

Uploaded by

Travis Scott


Hive Cheat Sheet

Hive Basics

Apache Hive: A data warehouse infrastructure built on the Hadoop framework, well suited to data summarization, analysis and querying. It uses an SQL-like language called HQL (Hive Query Language).

HQL: Hive's SQL-like query language. Hive compiles HQL queries into MapReduce jobs, and HQL also lets users plug in custom map and reduce scripts to perform more sophisticated analysis of the data.

Table: A table in Hive contains logically stored data.

Hive Interfaces:
• Web UI
• Hive command line
• HDInsight (Windows Server)

Components of Hive:
• Metastore: Stores the schemas of the Hive tables, i.e. the information about the tables and partitions in the warehouse.
• SerDe (Serializer/Deserializer): Tells Hive how to process records.

Thrift: A Thrift service provides remote access to Hive from other processes.

Meta Store: A service that stores metadata such as table schemas.

Indexes: Created to speed up access to columns in the database.
Syntax: CREATE INDEX <INDEX_NAME> ON TABLE <TABLE_NAME> (<col_name>, ...) AS <index_type>;
(Index support was removed in Hive 3.0.)

[Figure: Hive architecture. User interface (Web UI, Hive command line, HDInsight) -> Meta Store and HiveQL process engine -> execution engine (MapReduce) -> HDFS or HBase data storage]

Hive Functions
Hive functions perform operations on HQL queries.
• UDF (user-defined function): Takes one or more columns from a row as arguments and returns a single value.
• UDTF (user-defined table-generating function): Takes zero or more inputs, such as a column from a single record, and splits them into multiple rows or columns of output.
• User-defined aggregate function (UDAF): Takes multiple rows or columns and returns an aggregation of the data.
• Macros: Functions built from other Hive functions.

Partitioner: Controls how the keys of the intermediate map outputs are partitioned, typically with a hash function. The number of partitions equals the number of reduce tasks for the job.

Partitioning: Distributes load horizontally by dividing a table into related parts based on column values such as date, city or department.

Bucketing: A technique for decomposing datasets into more manageable parts (buckets).

HCatalog: A metadata and table management system for the Hadoop platform that enables storage of data in any format.

Hive Data Types
• Integral types: TINYINT, SMALLINT, INT, BIGINT
• String types: VARCHAR (length 1 to 65535), CHAR (length up to 255)
• Timestamp: Supports the traditional Unix timestamp with optional nanosecond precision
• Dates
• Decimals
• Union type: A collection of heterogeneous data types. Syntax: UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
• Complex types:
  - Arrays: ARRAY<data_type>
  - Maps: MAP<primitive_type, data_type>
  - Structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>

Hive Function Meta Commands
• SHOW FUNCTIONS: Lists Hive functions and operators
• DESCRIBE FUNCTION [function name]: Displays a short description of the function
• DESCRIBE FUNCTION EXTENDED [function name]: Displays an extended description of the function

Hive SELECT Command
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number]
;
• SELECT: The projection operator in HiveQL; it scans the table specified by the FROM clause.
• WHERE: A condition that specifies which rows to keep.
• GROUP BY: A list of columns that specifies how to aggregate the records.
• CLUSTER BY, DISTRIBUTE BY, SORT BY: DISTRIBUTE BY chooses the columns used to route rows to reducers, SORT BY sorts rows within each reducer, and CLUSTER BY is shorthand for DISTRIBUTE BY and SORT BY on the same columns.
• LIMIT: Specifies how many records to retrieve.

Hive Commands in HQL

Data Definition Language (DDL): Used to build or modify tables and other objects stored in a database. Some DDL commands:
• To create a database in Hive: CREATE DATABASE <database name>;
• To list the databases in a Hive warehouse: SHOW DATABASES;
• To use a database: USE <database name>;
• To describe a database from its metadata: DESCRIBE DATABASE <database name>;
• To alter a database: ALTER DATABASE <database name> SET DBPROPERTIES (<property_name>=<property_value>, ...);

Data Manipulation Language (DML): Statements used to retrieve, store, modify, delete, insert and update data in a database.
• Inserting data: LOAD moves data into a particular Hive table:
  LOAD DATA [LOCAL] INPATH '<file path>' INTO TABLE <tablename>;
• Drop table: Deletes the table's data and metadata (for an external table, only the metadata is deleted): DROP TABLE <table name>;
• Aggregation: Counts the distinct categories in a table: SELECT COUNT(DISTINCT category) FROM tablename;
• Grouping: GROUP BY groups the result set so that an aggregate is computed per group: SELECT <category>, SUM(amount) FROM <txt records> GROUP BY <category>;
• To exit the Hive shell: quit

Retrieving Information
Function | Hive command
To retrieve information | SELECT from_columns FROM table WHERE conditions;
To select all values | SELECT * FROM table;
To select values of a particular category | SELECT * FROM table WHERE rec_name = "value";
To select with multiple criteria | SELECT * FROM table WHERE rec1 = "value1" AND rec2 = "value2";
To select specific columns | SELECT column_name FROM table;
To retrieve unique output records | SELECT DISTINCT column_name FROM table;
For sorting | SELECT col1, col2 FROM table ORDER BY col2;
For sorting backwards | SELECT col1, col2 FROM table ORDER BY col2 DESC;
For counting rows in a table | SELECT COUNT(*) FROM table;
For grouping with counting | SELECT owner, COUNT(*) FROM table GROUP BY owner;
For selecting maximum values | SELECT MAX(col_name) AS label FROM table;
To select from multiple tables (join) | SELECT pet.name, comment FROM pet JOIN event ON (pet.name = event.name);

Command Line Statements
Function | Hive command
To run a query | hive -e 'select a.col from tab1 a'
To run a query in silent mode | hive -S -e 'select a.col from tab1 a'
To set Hive configuration variables | hive -e 'select a.col from tab1 a' --hiveconf hive.root.logger=DEBUG,console
To use an initialization script | hive -i initialize.sql
To run a non-interactive script | hive -f script.sql
To run a script inside the shell | source file_name
To run the dfs list command | dfs -ls /user
To run ls (bash command) from the shell | !ls
To set configuration variables | set mapred.reduce.tasks=32
Tab auto-completion | set hive.<TAB>
To display configuration variables overridden by the user or Hive | set
To revert all variables | reset
To add JAR files to the distributed cache | add jar jar_path
To display all JARs in the distributed cache | list jars
To delete JARs from the distributed cache | delete jar jar_name

Metadata Functions and Queries
Function | Hive command
Selecting a database | USE database;
Listing databases | SHOW DATABASES;
Listing tables in a database | SHOW TABLES;
Describing the format of a table | DESCRIBE [FORMATTED|EXTENDED] table;
Creating a database | CREATE DATABASE db_name;
Dropping a database | DROP DATABASE db_name [CASCADE];
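The Partitioner entry can be illustrated with the rule used by Hadoop's default `HashPartitioner`: `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below assumes string keys hashed with Java's `String.hashCode()` algorithm; Hive computes its own hash codes for its keys, so real reducer assignments may differ. The same "hash mod N" idea underlies bucketing (bucket = hash of the bucketing column mod the number of buckets) and `DISTRIBUTE BY ... SORT BY`, which the hypothetical helper `distribute_and_sort` models.

```python
from collections import defaultdict
from typing import Dict, List


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + char, wrapped to a signed 32-bit int.
    Assumes characters in the Basic Multilingual Plane (one UTF-16 unit each)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """HashPartitioner rule: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def distribute_and_sort(keys: List[str], num_reduce_tasks: int) -> Dict[int, List[str]]:
    """Model of DISTRIBUTE BY key SORT BY key: route each key to a reducer,
    then sort only within each reducer (no global order across reducers)."""
    reducers: Dict[int, List[str]] = defaultdict(list)
    for k in keys:
        reducers[hash_partition(k, num_reduce_tasks)].append(k)
    return {r: sorted(ks) for r, ks in reducers.items()}


print(java_string_hash("hello"))  # 99162322
print(hash_partition("a", 4))     # 1  ('a' hashes to 97, and 97 % 4 == 1)
print(distribute_and_sort(["pear", "fig", "apple", "fig"], 2))
```

Because the partition depends only on the key, every row with the same key reaches the same reducer, which is what makes per-key aggregation possible.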

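The Partitioning entry says a table is divided into related parts by column values. Hive stores each partition in its own subdirectory named `column=value` (for example `.../sales/dt=2020-01-01/city=Pune`), so a query whose WHERE clause fixes the partition columns only has to read the matching directories. The sketch below models that layout and the pruning step; the table location, column names and helper functions are hypothetical, and real Hive also escapes special characters in partition values, which this sketch skips.

```python
from typing import Dict, List


def partition_dir(table_location: str, spec: Dict[str, str]) -> str:
    """Build a partition directory path: <table>/<col1>=<v1>/<col2>=<v2>.
    Column order follows the dict order, i.e. the PARTITIONED BY order."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_location.rstrip("/")] + parts)


def prune(partitions: List[Dict[str, str]], where: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only partitions matching every equality predicate in `where`;
    the remaining partitions are never scanned."""
    return [p for p in partitions if all(p.get(c) == v for c, v in where.items())]


# Hypothetical table partitioned by date (dt) and city.
location = "/user/hive/warehouse/sales"
partitions = [
    {"dt": "2020-01-01", "city": "Pune"},
    {"dt": "2020-01-01", "city": "Delhi"},
    {"dt": "2020-01-02", "city": "Pune"},
]

# Models: SELECT ... FROM sales WHERE dt = '2020-01-01';
for p in prune(partitions, {"dt": "2020-01-01"}):
    print(partition_dir(location, p))
# /user/hive/warehouse/sales/dt=2020-01-01/city=Pune
# /user/hive/warehouse/sales/dt=2020-01-01/city=Delhi
```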

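The order in which the SELECT clauses take effect is easy to misremember: WHERE filters individual rows before grouping, HAVING filters whole groups after aggregation, and ORDER BY and LIMIT apply last. The Python sketch below models the cheat sheet's `SELECT owner, COUNT(*) FROM table GROUP BY owner;` entry under that reading; `select_count_by` and the `pet` rows are hypothetical sample names, not part of Hive.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def select_count_by(
    rows: List[Row],
    group_col: str,
    where: Callable[[Row], bool] = lambda r: True,
    having: Optional[Callable[[Any, int], bool]] = None,
    descending: bool = False,
    limit: Optional[int] = None,
) -> List[Tuple[Any, int]]:
    """Models: SELECT group_col, COUNT(*) FROM t [WHERE ...] GROUP BY group_col
               [HAVING ...] ORDER BY group_col [DESC] [LIMIT n]"""
    counts: Dict[Any, int] = defaultdict(int)
    for row in rows:                      # FROM: scan every row
        if where(row):                    # WHERE: filter individual rows
            counts[row[group_col]] += 1   # GROUP BY + COUNT(*): one counter per group
    groups = [(k, n) for k, n in counts.items()
              if having is None or having(k, n)]      # HAVING: filter whole groups
    groups.sort(key=lambda kn: kn[0], reverse=descending)  # ORDER BY
    return groups if limit is None else groups[:limit]      # LIMIT


# Hypothetical pet table.
pet = [
    {"name": "Fluffy", "owner": "Harold", "species": "cat"},
    {"name": "Claws", "owner": "Gwen", "species": "cat"},
    {"name": "Buffy", "owner": "Harold", "species": "dog"},
    {"name": "Fang", "owner": "Benny", "species": "dog"},
]

print(select_count_by(pet, "owner"))
# [('Benny', 1), ('Gwen', 1), ('Harold', 2)]
print(select_count_by(pet, "owner", having=lambda owner, n: n > 1))
# [('Harold', 2)]
```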