Hive
What is Hive
Hive is a data warehouse infrastructure tool for processing data in Hadoop. It resides on top of
Hadoop to summarize Big Data and makes querying and analyzing it easy.
Hive was initially developed by Facebook; later, the Apache Software Foundation took it up
and developed it further as an open-source project under the name Apache Hive. It is used by
many different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
• A relational database
• A design for OnLine Transaction Processing (OLTP)
• A language for real-time queries and row-level updates
Features of Hive
• It stores the schema in a database (the metastore) and the processed data in HDFS.
• It is designed for OLAP.
• It provides an SQL-like query language called HiveQL or HQL (see the short example after this list).
• It is familiar, fast, scalable, and extensible.
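As a quick, minimal sketch of the HiveQL mentioned above, the statements below create a simple table and query it. The table and column names are hypothetical and chosen only for illustration.

    -- Define a table; the schema goes to the metastore,
    -- while the data files live in HDFS.
    CREATE TABLE employees (
      id     INT,
      name   STRING,
      salary DOUBLE
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',';

    -- Query it with familiar SQL-style syntax.
    SELECT name, salary
    FROM employees
    WHERE salary > 50000;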
Architecture of Hive
The architecture of Hive is made up of the following main units:
• User Interface: The interfaces through which users submit queries, such as the Hive command line, the Hive Web UI, and JDBC/ODBC clients.
• Metastore: A database that stores the schema or metadata of tables, databases, columns and their data types, and the HDFS mapping of the data.
• HiveQL Process Engine (Compiler): Parses HiveQL queries, checks them against the metadata, and produces a query plan.
• Execution Engine: Runs the query plan, traditionally as MapReduce jobs on the Hadoop cluster.
• HDFS or HBase: The underlying storage where the actual data resides.
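As a small illustration of the metastore's role, the following HiveQL statements (using the hypothetical table from the earlier example) inspect schema information; Hive answers them from the metastore rather than by scanning the data in HDFS.

    -- List the tables whose definitions the metastore holds.
    SHOW TABLES;

    -- Show the columns, data types, HDFS location, and storage format
    -- recorded in the metastore for a hypothetical table.
    DESCRIBE FORMATTED employees;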
Working of Hive
The following steps describe the workflow between Hive and the Hadoop framework; a short example of inspecting the resulting query plan follows the steps.
1. Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the driver (using a database driver such as JDBC or ODBC) for execution.
2. Get Plan: The driver takes the help of the query compiler, which parses the query to check its syntax and build the query plan.
3. Get Metadata: The compiler sends a metadata request to the metastore (any database).
4. Send Metadata: The metastore sends the metadata as a response to the compiler.
5. Send Plan: The compiler checks the requirement and resends the plan to the driver. Up to this point, the parsing and compiling of the query is complete.
6. Execute Plan: The driver sends the execution plan to the execution engine.
7. Execute Job: Internally, the execution of the plan is a MapReduce job. The execution engine sends the job to the JobTracker, which resides on the Name node, and the JobTracker assigns the job to the TaskTracker, which resides on a Data node. Here, the query runs as a MapReduce job.
7.1 Metadata Ops: Meanwhile, during execution, the execution engine can perform metadata operations with the metastore.
8. Fetch Result: The execution engine receives the results from the Data nodes.
9. Send Results: The execution engine sends those results to the driver.
10. Send Results: The driver sends the results to the Hive interfaces.
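To make steps 2 to 5 concrete, HiveQL's EXPLAIN statement can be used to look at the plan the compiler produces without actually running the job. The table and column names below are hypothetical.

    -- Ask the compiler for the query plan instead of executing the query.
    EXPLAIN
    SELECT page_url, COUNT(*) AS views
    FROM page_view_logs
    WHERE view_date = '2023-01-15'
    GROUP BY page_url;

The output lists the stages that the execution engine would then run in steps 6 and 7.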
Hive in big data is a milestone innovation that has eventually led to data analysis on a large
scale. Big organizations need big data to record the information that is collected over time.
To produce data-driven analysis, organizations gather data and use software applications to
analyze it. With Apache Hive, this data can be read, written, and managed. Ever since data
analytics came into being, the amount of data that has to be stored has kept growing.
Even though small-scale organizations were able to manage medium-sized data and analyze it
with traditional data analytics tools, big data could not be managed with such applications.
As data collection became a daily task and organizations expanded in all aspects, the volume
of collected data grew exponentially. Furthermore, data began to be dealt with in petabytes,
which traditional tools could not handle.
For this, organizations needed hefty equipment, and perhaps that is the reason why the release
of a software tool like Apache Hive was necessary. Thus, Apache Hive was released with the
goal of making large-scale data on Hadoop easy to query and analyze.
Here are two case studies, of Airbnb and The Guardian, that can help you understand the use of
Hive in big data:
"Airbnb connects people with places to stay and things to do around the world with 2.9
million hosts listed, supporting 800k nightly stays. Airbnb uses Amazon EMR to run Apache
Hive on a S3 data lake. Running Hive on the EMR clusters enables Airbnb analysts to
perform ad hoc SQL queries on data stored in the S3 data lake. By migrating to a S3 data
lake, Airbnb reduced expenses, can now do cost attribution, and increased the speed of
"Guardian gives 27 million members the security they deserve through insurance and wealth
management products and services. Guardian uses Amazon EMR to run Apache Hive on a S3
data lake. Apache Hive is used for batch processing. The S3 data lake fuels Guardian Direct,
a digital platform that allows consumers to research and purchase both Guardian products and
third party products in the insurance sector."
Benefits of Hive
Hive in Big Data is extremely beneficial. While it has its own cons, the pros of Hive make it
well worth using for large-scale data analysis.
The USP of Apache Hive can be summed up in its benefits, which have been highly helpful in
big data analysis over time. Here are a few benefits that will help you understand the
concept better.
1. Easy-to-use
Hive in Big Data is an easy-to-use software application that lets one analyze large-scale
data through the batch-processing technique. It uses HiveQL, a language that is very similar
to SQL (Structured Query Language), which makes it a very accessible and easy-to-use
application for converting petabytes of data into useful insights.
This is one of the biggest benefits of Apache Hive and has made it a popular choice for
big data analysis.
2. Fast
The technique of batch processing refers to analyzing data in bits and parts that are later
clubbed together, with the analyzed data then stored through Apache Hadoop.
Batch processing makes Apache Hive a fast tool that conducts the analysis of data in a rapid
manner. In addition, Apache Hive is an advanced data warehouse system built on top of Hadoop.
Thus, this particular software can handle big loads of data in one go, as opposed to
traditional software that could only filter moderate-sized data in one go.
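As a minimal sketch of what such a batch job can look like in HiveQL, the statement below processes one whole day of raw data in a single run and writes the aggregated result into a partition of a summary table. The table names, columns, and partition values are hypothetical.

    -- Aggregate one day of raw clicks in a single batch
    -- and overwrite the matching partition of the summary table.
    INSERT OVERWRITE TABLE daily_click_summary
    PARTITION (dt = '2023-01-15')
    SELECT user_id,
           COUNT(*) AS clicks
    FROM raw_clicks
    WHERE dt = '2023-01-15'
    GROUP BY user_id;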
3. Fault-tolerant Software
In most of the software used to handle Big Data today, fault tolerance is a rare feature.
However, Apache Hive and the HDFS file system together work in a fault-tolerant manner.
This means that as soon as data is written for analysis in Hive, HDFS replicates it to
other machines. This is done in order to prevent the loss of data or schemas in case a
node fails.
Fault tolerance in Hadoop (Hive) is one of the biggest benefits of Hive, as it beats other
competitors like Impala and makes Hive unique in its own way.
4. Cheaper Option
Compared with many of its alternatives, Apache Hive is a cheaper option. For large
organizations, profit is key, yet with technologically advanced tools and software that are
expensive to operate, profit margins can shrink. Therefore, it is necessary for organizations
to look out for cheaper options that can help them achieve the same goals with cost-effective
measures. When it comes to big data and data analysis, Apache Hive is one of the best tools
to use and operate. Fast and familiar, it is highly efficient and also relies on fault
tolerance to produce better results.
5. Productive Software
Apache Hive is a productive software. Why? Well, the answer lies in its other benefits.
Apache Hive not only analyzes data, but also enables its users to read and write it with ease.
What's more, this software defines specific schemas related to data analysis and stores them
in the Hadoop Distributed File System (HDFS), which helps in future analysis.
Hence, Hive in Big Data is quite productive and enables large organizations to make the best
use of the data collected and generated over a long period of time to drive better decisions.
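As a hedged sketch of how a schema can be kept alongside data that already sits in HDFS, the hypothetical example below defines an external table over an existing directory; the path, table, and column names are assumptions made for illustration.

    -- The schema is recorded in the metastore; the data stays where it is
    -- in HDFS and can be re-analyzed later without reloading it.
    CREATE EXTERNAL TABLE sales_archive (
      order_id   BIGINT,
      amount     DOUBLE,
      order_date STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION '/data/warehouse/sales_archive';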