Hive

 Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes.
 Hadoop is an open source distributed processing
framework that manages data processing and
storage for big data applications in scalable clusters
of computer servers. It's at the center of an
ecosystem of big data technologies that are primarily
used to support advanced analytics initiatives, data
mining and machine learning. Hadoop systems can
handle various forms of structured and unstructured
data, giving users more flexibility for collecting,
processing, analyzing and managing data than
relational databases and data warehouses provide.
 Hadoop's ability to process and store different types of data makes it a particularly good fit for big data environments. These environments typically involve not only large amounts of data, but also a mix of structured transaction data and semi-structured and unstructured information, such as internet clickstream records, web server and mobile application logs, social media posts, customer emails and sensor data from the internet of things (IoT).
 Formally known as Apache Hadoop, the technology is developed as part of an open source project within the Apache Software Foundation. Multiple vendors offer commercial Hadoop distributions, although the number of Hadoop vendors has declined because of an overcrowded market and competitive pressures driven by the increased deployment of big data systems in the cloud. The shift to the cloud also enables users to store data in lower-cost cloud object storage services instead of Hadoop's namesake file system; as a result, Hadoop's role is being reduced in some big data architectures.
Hive
Motivation
 Yahoo worked on Pig to facilitate application deployment on Hadoop.
◦ Their need was mainly focused on unstructured data
 Simultaneously, Facebook started working on deploying warehouse solutions on Hadoop, which resulted in Hive.
◦ The size of data being collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive.

05/31/2024 7
Hive architecture (from the paper)

Data model
 Hive structures data into well-understood database concepts such as tables, rows, columns, and partitions
 It supports primitive types: integers, floats, doubles, and strings
 Hive also supports:
◦ Associative arrays: map<key-type, value-type>
◦ Lists: list<element-type>
◦ Structs: struct<field-name: field-type, …>
 SerDe: a serialization/deserialization API used to move data in and out of tables
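
As a sketch of how these types appear in practice (the table and column names here are illustrative, not from the slides), a table combining primitives, a list, a map, and a struct might be declared as:

```sql
-- Hypothetical table illustrating Hive's primitive and complex types.
-- ROW FORMAT DELIMITED uses Hive's built-in default SerDe.
CREATE TABLE user_events (
  user_id    INT,
  score      DOUBLE,
  tags       ARRAY<STRING>,                    -- list type
  properties MAP<STRING, STRING>,              -- associative array
  device     STRUCT<os: STRING, version: INT>  -- struct type
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```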
Query Language (HiveQL)
 Subset of SQL
 Meta-data queries
 No inserts into existing tables
◦ Can overwrite an entire table

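The overwrite-only semantics look like this in HiveQL (the table names `daily_summary` and `sales` are illustrative):

```sql
-- Hive, as described here, has no row-level INSERT or UPDATE on an
-- existing table; instead the whole table (or a partition) is replaced:
INSERT OVERWRITE TABLE daily_summary
SELECT ds, COUNT(*)
FROM sales
GROUP BY ds;
```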
Data Model
 Tables
 Basic type columns (int, float, boolean)
 Complex types: list / map (associative array)
 Partitions
 Buckets
 CREATE TABLE sales (
    id INT,
    items ARRAY<STRUCT<id:INT, name:STRING>>
  )
  PARTITIONED BY (ds STRING)
  CLUSTERED BY (id) INTO 32 BUCKETS;

 SELECT id FROM sales
  TABLESAMPLE (BUCKET 1 OUT OF 32);
Introduction to Hive
Apache Hive
 Run HiveQL, an SQL-like language, to interact with data stored in Hadoop.
 Demo: create a table, load the wordcount results from the Pig script into it, and retrieve the data.
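
The demo could be sketched as follows. This is an assumption about the setup, not the actual demo script: the HDFS path is hypothetical, and the Pig wordcount output is assumed to be tab-separated (word, count) pairs.

```sql
-- Expose the Pig wordcount output directory as an external Hive table.
-- Path and column names are illustrative assumptions.
CREATE EXTERNAL TABLE wordcount (
  word STRING,
  cnt  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/demo/wordcount-output';

-- Retrieve the ten most frequent words.
SELECT word, cnt
FROM wordcount
ORDER BY cnt DESC
LIMIT 10;
```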
