Apache Hive



See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/376086576

HIVE (APACHE HIVE)

Presentation · December 2023


1 author:

Nilu Singh
Koneru Lakshmaiah Education Foundation

All content following this page was uploaded by Nilu Singh on 01 December 2023.


HIVE (APACHE HIVE)

Dr. Nilu Singh

Department of Computer Science & Engineering,
Koneru Lakshmaiah Education Foundation
(K.L. University, Vijayawada).
INTRODUCTION
• Hive, originally developed at Facebook and later donated to the Apache Software Foundation, is a data warehouse system built to analyze structured data.
• Hive is a data warehouse and SQL-like querying tool built on the Hadoop ecosystem.
• Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.
Cont.

• Hive Metastore (HMS) provides a central repository of metadata that can be readily analyzed to support informed, data-driven decisions, which makes it a critical component of many data lake architectures.
• Hive is built on top of Apache Hadoop and, through HDFS, supports storage on S3, ADLS, GS, and similar systems. Hive allows users to read, write, and manage petabytes of data using SQL.
Apache HIVE Architecture

• Apache Hive is an effective tool for analyzing big data.
• It is data warehouse software that supports routine analysis of large datasets; the data warehouse concept is widely used in the technology sector.
• Data is stored in the Hadoop Distributed File System (HDFS), where it is also processed. Apache Hive helps process and analyze that data and surface data-driven patterns and trends.
Cont.

➢ HiveQL is a SQL-like language that organizations use to query the Hive database and analyze the required data in a structured format.
Cont.

Hive chiefly consists of three core parts:

Hive Clients: Hive offers a variety of drivers for communicating with different kinds of applications. For example, Hive provides Thrift clients for Thrift-based applications.
Hive Services: Hive services handle client interactions with Hive. For example, a client that wants to run a query must go through Hive services.
Hive Storage and Computing: Hive services such as the metastore, the file system, and the job client then communicate with Hive storage, where table metadata and query results are kept.
Need of HIVE
• Hive was a milestone in big data technology that helped make large-scale data analysis practical.
• Large organizations accumulate big data as they record information over time.
• To produce data-driven analysis, organizations collect this data and analyze it with software such as Hive.
• Data managed by Apache Hive can be read, written, and managed in an organized way.
• Doing this at scale called for more capable tooling, which is probably why software like Apache Hive was needed.
Characteristics of Hive
•SQL-like Interface: Hive's familiar SQL-like interface makes it simple for users to query and analyze big datasets without programming experience.
•Scalability: Hive can handle massive amounts of data stored in HDFS and other data stores compatible with Hadoop.
•Flexibility: Hive supports various data serialization formats, including Avro, Parquet, and ORC, so it can handle a wide range of use cases and data formats.
•Integration: Hive interoperates with other Hadoop ecosystem tools such as Pig, Sqoop, and Flume, allowing users to build data analysis jobs and workflows.
Cont.

•External tables: Hive supports external tables, which let users access data stored in other storage systems such as HBase, Cassandra, and Amazon S3.
•Partitioning: Hive offers partitioning, which lets users split huge datasets by values such as date, location, or user ID. Because a query then scans only the relevant partitions, less data is read and query performance improves.
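The effect of partitioning on the amount of data scanned can be illustrated with a minimal Python sketch. This is not Hive's implementation: the `dt` partition column, the sample rows, and the helper names `build_partitions` and `scan` are all assumptions made for illustration, with a dictionary standing in for one storage directory per partition value.

```python
from collections import defaultdict


def build_partitions(rows, key):
    """Group rows by a partition key, like one directory per key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def scan(partitions, wanted=None):
    """Return (matching rows, number of rows read).

    With `wanted` set, only that partition is read (partition pruning);
    without it, every partition is read.
    """
    if wanted is not None:
        rows = partitions.get(wanted, [])
        return list(rows), len(rows)
    all_rows = [row for part in partitions.values() for row in part]
    return all_rows, len(all_rows)


# Hypothetical event log partitioned by date.
events = [
    {"dt": "2023-12-01", "user_id": 1},
    {"dt": "2023-12-01", "user_id": 2},
    {"dt": "2023-12-02", "user_id": 1},
    {"dt": "2023-12-03", "user_id": 3},
]
by_date = build_partitions(events, "dt")
pruned_rows, pruned_read = scan(by_date, wanted="2023-12-01")
full_rows, full_read = scan(by_date)
```

Here the filtered scan touches only the rows in one partition, while the unfiltered scan reads every row, which is the saving the partitioning bullet describes.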
Advantages of Hive
Fast: Hive quickly processes enormous amounts of data.
Familiar: Hive offers a familiar SQL-like interface.
Scalable: Hive can handle massive amounts of data stored in HDFS and other compatible data stores.
Hive Optimization Techniques
•Partition your data so that queries read only the relevant directories; otherwise every query scans all of the data.
•Use appropriate file formats such as Optimized Row Columnar (ORC) to increase query performance. ORC reduces the original data size by up to 75 percent.
•Divide tables into more manageable parts by employing bucketing.
•Improve aggregations, filters, scans, and joins by vectorizing your queries, which performs these operations in batches of 1,024 rows rather than one row at a time.
•Create a separate index table that functions as a quick reference for the original table (note that built-in indexing was removed in Hive 3.0).
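Bucketing and vectorized execution from the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Hive's code: a plain modulo stands in for Hive's bucketing hash function, the `user_id` column and all function names are hypothetical, and only the batch size of 1,024 comes from the slide.

```python
def bucket_for(key, num_buckets):
    """Assign an integer key to a bucket (modulo stands in for a real hash)."""
    return key % num_buckets


def bucketize(rows, column, num_buckets):
    """Split rows into a fixed number of buckets based on one column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


def batches(values, batch_size=1024):
    """Yield consecutive slices of at most `batch_size` values."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def batched_sum(values, batch_size=1024):
    """Aggregate batch by batch instead of value by value.

    Returns (total, number of batches processed).
    """
    total = 0
    batch_count = 0
    for batch in batches(values, batch_size):
        total += sum(batch)
        batch_count += 1
    return total, batch_count


# Ten hypothetical rows spread over four buckets.
rows = [{"user_id": i} for i in range(10)]
buckets = bucketize(rows, "user_id", num_buckets=4)

# 3,000 values are processed as three batches (1,024 + 1,024 + 952).
total, batch_count = batched_sum(list(range(3000)))
```

Rows with the same key always land in the same bucket, which is what makes bucketed tables useful for sampling and joins, and the batched aggregation shows the idea of working on blocks of rows at a time.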
Components of Hive
• Shell
• Driver
• Compiler
• Metastore
• Execution Engine
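One way to picture how these components cooperate is the toy Python sketch below. It is an assumption-laden illustration rather than Hive's real architecture or API: every class and method name is hypothetical, the "compiler" understands only `SELECT <col> FROM <table>`, and an in-memory dictionary stands in for files in storage. The shell would be whatever passes the query string to the driver.

```python
class Metastore:
    """Holds table metadata: column names per table (a toy stand-in)."""

    def __init__(self):
        self._tables = {}

    def register(self, name, columns):
        self._tables[name] = list(columns)

    def columns(self, name):
        return self._tables[name]


class Compiler:
    """Turns 'SELECT <col> FROM <table>' into a plan after checking metadata."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, query):
        tokens = query.strip().rstrip(";").split()
        if (len(tokens) != 4 or tokens[0].upper() != "SELECT"
                or tokens[2].upper() != "FROM"):
            raise ValueError("toy compiler only handles SELECT <col> FROM <table>")
        column, table = tokens[1], tokens[3]
        if column not in self.metastore.columns(table):
            raise ValueError(f"unknown column {column!r} in table {table!r}")
        return {"table": table, "column": column}


class ExecutionEngine:
    """Runs a plan against in-memory rows standing in for files in storage."""

    def __init__(self, storage):
        self.storage = storage

    def run(self, plan):
        return [row[plan["column"]] for row in self.storage[plan["table"]]]


class Driver:
    """Receives a query from the shell, then coordinates compile and execute."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def execute(self, query):
        plan = self.compiler.compile(query)
        return self.engine.run(plan)


metastore = Metastore()
metastore.register("users", ["id", "name"])
storage = {"users": [{"id": 1, "name": "ada"}, {"id": 2, "name": "lin"}]}
driver = Driver(Compiler(metastore), ExecutionEngine(storage))
names = driver.execute("SELECT name FROM users;")
```

The point of the sketch is the division of labor: the compiler consults the metastore before any data is touched, so a query against an unknown column fails at compile time rather than during execution.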
Applications of Hive
•Data Mining
•Log Processing
•Document Indexing
•Customer-Facing Business Intelligence
•Predictive Modelling
•Hypothesis Testing
EXAMPLES

➢ Airbnb connects people with places to stay and things to do around the world, with 2.9 million hosts listed and 800k nightly stays supported. Airbnb uses Amazon EMR to run Apache Hive on an S3 data lake. Running Hive on EMR clusters enables Airbnb analysts to run ad hoc SQL queries on data stored in the S3 data lake. Its Apache Spark jobs reportedly run at three times their original speed.
➢ Guardian gives 27 million members the security they deserve through insurance and wealth management products and services. Guardian uses Amazon EMR to run Apache Hive on an S3 data lake, where Hive is used for batch processing. The S3 data lake powers Guardian Direct, a digital platform that allows consumers to research and purchase both Guardian products and third-party products in the insurance sector.
Important Points

•Hive is a Hadoop-based data warehouse and SQL-style querying tool.
•It enables users to run ad hoc queries and analyses on big datasets without writing MapReduce programs or Pig scripts.
•Hive supports external tables, partitioning, and data serialization formats such as Avro and Parquet.
•Hive's architecture comprises four major components: Hive User Interface, Metastore, HiveQL Process Engine, and Execution Engine.
•Hive has several benefits for big data analysis, including ease of use, scalability, flexibility, integration, and cost-effectiveness.
