Assignment 4-Gcc: Hive Is Not

Hive is a data warehouse infrastructure tool that allows users to query and analyze large datasets stored in Hadoop using SQL-like queries. It resides on top of Hadoop, keeping schema information in a metastore database and processed data in HDFS. The metastore holds metadata about tables and columns, and a query compiler generates execution plans for queries, which are then executed using Hadoop's MapReduce framework. This allows users to analyze structured big data using familiar SQL-like queries without writing MapReduce programs.


ASSIGNMENT 4-GCC

MAP REDUCE TECHNIQUES AND HIVE CONCEPTS

Hadoop
Hadoop is an open-source framework for storing and processing Big Data in a distributed environment. It contains two modules: MapReduce and the Hadoop Distributed File System (HDFS).
• MapReduce: A parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware.
• HDFS: The Hadoop Distributed File System is the part of the Hadoop framework that stores the datasets. It provides a fault-tolerant file system that runs on commodity hardware.
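The MapReduce model described above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for illustration, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input lines, used only to exercise the sketch.
    print(word_count(["Hive runs on Hadoop", "Hadoop stores data in HDFS"]))
```

Each phase only sees its own slice of the data (one line in map, one key in reduce), which is what lets a framework such as Hadoop run the phases in parallel.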
The Hadoop ecosystem contains sub-projects (tools) such as Sqoop, Pig, and Hive that complement the Hadoop modules.
• Sqoop: Used to import and export data between HDFS and an RDBMS.
• Pig: A procedural language platform used to develop scripts for MapReduce operations.
• Hive: A platform used to develop SQL-type scripts that perform MapReduce operations.
Note: There are several ways to execute MapReduce operations:

• The traditional approach: a Java MapReduce program for structured, semi-structured, and unstructured data.
• The scripting approach: Pig, for processing structured and semi-structured data with MapReduce.
• The query approach: the Hive Query Language (HiveQL or HQL), for processing structured data with MapReduce using Hive.
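To make the contrast between a hand-written job and a query concrete, the sketch below computes the same per-department total twice: once with explicit map-and-reduce style code, and once with a single declarative SQL statement. Python's built-in sqlite3 module is used purely as a local stand-in; it is not Hive and does not run MapReduce. The employee table, its columns, and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (department, salary).
ROWS = [("sales", 100), ("sales", 150), ("ops", 200)]


def total_by_dept_handwritten(rows):
    """Hand-written aggregation, in the spirit of a custom map and reduce."""
    totals = defaultdict(int)
    for dept, salary in rows:   # map: emit (dept, salary)
        totals[dept] += salary  # reduce: sum the values for each key
    return dict(totals)


def total_by_dept_sql(rows):
    """The same aggregation expressed as one declarative query.

    sqlite3 is only a stand-in for a SQL-like engine here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    result = dict(conn.execute("SELECT dept, SUM(salary) FROM employee GROUP BY dept"))
    conn.close()
    return result


if __name__ == "__main__":
    print(total_by_dept_handwritten(ROWS))
    print(total_by_dept_sql(ROWS))
```

The query version states what result is wanted and leaves the how to the engine, which is the convenience the query approach offers over writing the job by hand.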

What is Hive
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analysis easy.
Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open-source software under the name Apache Hive. It is used by various companies. For example, Amazon uses it in Amazon Elastic MapReduce.

Hive is not

• A relational database
• A design for Online Transaction Processing (OLTP)
• A language for real-time queries and row-level updates

Features of Hive

• It stores the schema in a database and the processed data in HDFS.
• It is designed for OLAP.
• It provides an SQL-type query language called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
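The first feature, keeping the schema apart from the data, can be sketched as follows. This is an illustrative model only, not Hive's metastore API: the METASTORE dictionary, the read_table function, the employee table, and the storage path are all hypothetical names chosen for the example.

```python
import csv
import io

# Hypothetical metastore entry: the schema and storage location are kept
# separately from the raw data files.
METASTORE = {
    "employee": {
        "columns": [("id", int), ("name", str), ("salary", float)],
        "location": "/user/hive/warehouse/employee",  # illustrative path only
        "delimiter": ",",
    }
}


def read_table(table, raw_text):
    """Apply the stored schema to raw delimited text at read time."""
    meta = METASTORE[table]
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        # Pair each raw field with its column name and cast it to the column type.
        rows.append({name: cast(value)
                     for (name, cast), value in zip(meta["columns"], fields)})
    return rows


if __name__ == "__main__":
    # Stand-in for the contents of a data file stored in HDFS.
    print(read_table("employee", "1,Ann,1000.5\n2,Bob,2000\n"))
```

Because the data file itself carries no column names or types, the same raw text stays untouched while the schema entry determines how it is interpreted.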

Architecture of Hive
The following component diagram depicts the architecture of Hive:
This component diagram contains different units. The following table describes each unit:

Unit Name | Operation

User Interface | Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).

Metastore | Hive uses a database server of choice to store the schema, or metadata, of tables, databases, and columns in a table, along with their data types and HDFS mapping.

HiveQL Process Engine | HiveQL is similar to SQL and queries against the schema information in the Metastore. It is one replacement for the traditional approach of writing a MapReduce program: instead of writing a MapReduce program in Java, we can write a query for a MapReduce job and process it.

Execution Engine | The Hive Execution Engine is the link between the HiveQL Process Engine and MapReduce. It processes the query and generates the same results that MapReduce would. It uses the flavor of MapReduce.

HDFS or HBase | The Hadoop Distributed File System or HBase is the storage layer used to store the data.

Working of Hive
The following diagram depicts the workflow between Hive and Hadoop.
Step No. | Operation

1 | Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the Driver (any database driver such as JDBC or ODBC) to execute.

2 | Get Plan: The driver uses the query compiler, which parses the query to check its syntax and to work out the query plan, that is, the requirements of the query.

3 | Get Metadata: The compiler sends a metadata request to the Metastore (any database).

4 | Send Metadata: The Metastore sends the metadata to the compiler as a response.

5 | Send Plan: The compiler checks the requirements and sends the plan back to the driver. At this point, the parsing and compiling of the query is complete.
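The five steps above can be sketched as a small message flow between three objects. This is a toy model under stated assumptions, not Hive's implementation: the Metastore, Compiler, and Driver classes are hypothetical, the parser accepts only queries of the form SELECT <column> FROM <table>, and the sketch stops where the steps above stop, with the plan returned to the driver.

```python
class Metastore:
    """Holds table metadata (here, just a list of column names per table)."""

    def __init__(self, tables):
        self._tables = tables

    def get_metadata(self, table):
        # Steps 3 and 4: receive a metadata request and send the metadata back.
        return self._tables[table]


class Compiler:
    """Parses a query and builds a plan, consulting the metastore."""

    def __init__(self, metastore):
        self._metastore = metastore

    def compile(self, query):
        # Step 2: parse the query and check its syntax (deliberately naive).
        tokens = query.strip().rstrip(";").split()
        if len(tokens) != 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
            raise ValueError("unsupported query: " + query)
        column, table = tokens[1], tokens[3]
        # Steps 3 and 4: fetch the table's metadata from the metastore.
        columns = self._metastore.get_metadata(table)
        if column != "*" and column not in columns:
            raise ValueError("unknown column: " + column)
        # Step 5: hand the finished plan back to the driver.
        return {"table": table, "columns": list(columns) if column == "*" else [column]}


class Driver:
    """Receives a query from the user interface and asks the compiler for a plan."""

    def __init__(self, compiler):
        self._compiler = compiler

    def execute(self, query):
        # Step 1: the interface sends the query to the driver.
        return self._compiler.compile(query)


if __name__ == "__main__":
    driver = Driver(Compiler(Metastore({"employee": ["id", "name"]})))
    print(driver.execute("SELECT name FROM employee"))
```

Keeping the metastore lookup inside the compiler mirrors the order of the steps: the driver never talks to the metastore directly.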
