
L6F - Creating Hive Table with Complex Data Type

Outlines • Concept
• Scenario 1 - Creating a table with a String Array data type and loading data into the table
• Scenario 2 - Creating a table with a Map data type and loading data into the table
• Scenario 3 - Creating a table with a Struct data type and loading data into the table
• Scenario 4 - Processing values from an Array data type
• Scenario 5 - Creating a table with a Struct data type, loading data into the table, and performing a calculation

concept Hive Data Types

These can be classified into two categories:

1) Primitive data types
2) Collection data types

Reference
• https://www.educba.com/hive-data-types/
• https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes
• https://impala.apache.org/docs/build/html/topics/impala_array.html#array

Primitive data types These include the numeric types (tinyint, smallint, int, bigint, float, double, decimal), string types (string, varchar, char), date/time types (date, timestamp), boolean, and binary.

Collection data types There are:

1) Array - a sequence of elements of a common type that can be accessed by index, where the index starts from zero

2) Map - a set of key-value pair elements

3) Struct - a data type that comprises a set of named fields, which may be of different data types

4) Uniontype - can hold a value of any one of its specified data types (beyond the scope of this lab)
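
To get a feel for these collection types before creating any tables, Hive's built-in constructor functions array(), map(), and named_struct() can be used directly in a query; a minimal sketch (all literal values here are illustrative):

select arr[0] as first_item, mp['2015'] as value_for_2015, st.city as city
from (select array('scifi','drama') as arr,
             map('2015', 120) as mp,
             named_struct('city','Shah Alam','state','Selangor') as st) t;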
Scenario 1 • Creating a table with a String Array data type and loading data into the table

reference:
• https://stackoverflow.com/questions/33984794/loading-csv-file-on-hive-table-with-string-array

Steps • understand data and plan for the table structure


• create a directory and upload the file into HDFS
• construct and execute a command to create an external table
• check the output

data understanding and table planning The file: scenario1.txt

The data:

Plan for the table structure:

1) determine the datatypes


• the primitive data type
o id as int
o title as string
o author as string
• the collection data type
o genre as array<string>

2) determine the delimiter
• we need to separate each field by ','
• we need to separate each collection by '|'

upload into HDFS Create a directory and then upload the file, as follows: (*Note: replace student30 with your student access number)
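
For example, a minimal sketch assuming scenario1.txt is in your current local working directory:

hdfs dfs -mkdir -p /user/student30/scenario1
hdfs dfs -put scenario1.txt /user/student30/scenario1/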

create the table Note:

• We can use either an internal or an external table
• For this exercise, an external table is chosen
• make sure you have selected your database, such as: use student30;

Run this command:

create external table if not exists article (
id int,
title string,
author string,
genre array<string> )
row format delimited
fields terminated by ','
collection items terminated by '|'
location '/user/student30/scenario1';

check the output run this command to check the table structure:
• describe article;
run this command to check the location of the data:
• show create table article;

• notice that the location is the HDFS directory you specified, not the default Hive warehouse directory

run this command to check the data:


• select * from article;

what if you want to list only the genre column?


• select genre from article;

What if you want to retrieve the first genre of each record?


• select genre[0] as first_genre from article;

Exploration - What if you decide to create an internal table?


internal table
Steps:

1) you need to create the table

create table if not exists article_int (
id int,
title string,
author string,
genre array<string> )
row format delimited
fields terminated by ','
collection items terminated by '|';

2) check out the expected location of the data:


• show create table article_int;

3) you need to load the data into the table, which moves the file into the Hive warehouse directory
• load data inpath '/user/student30/scenario1/scenario1.txt' overwrite into table article_int;

4) check out the data


• select * from article_int;

Exploration 1) As you use the same data source for the internal table, what happens to the data of the previously created external
table?
2) How do you address that problem?
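
Hint: load data inpath moves the file within HDFS rather than copying it, so the external table's directory is left empty and select * from article; returns no rows. Assuming the original file is still on your local machine, one way to restore the external table's data:

hdfs dfs -put scenario1.txt /user/student30/scenario1/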

Scenario 2 • Creating a table with a Map data type and loading data into the table

reference:
• https://acadgild.com/blog/hive-complex-data-types-with-examples

Steps • understand data and plan for the table structure


• create a directory and upload the file into HDFS
• construct and execute a command to create an internal table
• load the data into the table
• check the output

data understanding the file: scenario2.txt

the data:

Plan for the table structure:

1) determine the datatypes


• the primitive data type
o school level as string
o state as string
o gender as string
• the collection data type
o total student as map<int,int> where:
▪ year as the key
▪ total as the value

2) determine the delimiter
• we need to separate each field by space
• we need to separate each collection by ','
• we need to separate each map by ':'

upload into HDFS Create a directory and then upload the file, as follows:
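
For example, assuming scenario2.txt is in your current local working directory:

hdfs dfs -mkdir -p /user/student30/scenario2
hdfs dfs -put scenario2.txt /user/student30/scenario2/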

create the table Run this command:

create table if not exists school_info (
school_level string,
state string,
gender string,
total_stud map<int,int> )
row format delimited
fields terminated by ' '
collection items terminated by ','
map keys terminated by ':';

check the created table structure:
• describe school_info;


load the data run this command:

load data inpath '/user/student30/scenario2/scenario2.txt' overwrite into table school_info;

Note:
• you will need to adjust the path according to your own directory

check the output run this command to check the location of data:
• show create table school_info;

run this command to check the data:


• select * from school_info;

what if you want to list only the total_stud column?


• select total_stud from school_info;

What if you want to retrieve the 2015 value from each record?


• select total_stud[2015] as total from school_info;
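
What if the years are not known in advance? Hive's built-in map_keys and map_values functions list a map's keys and values; for example:

• select map_keys(total_stud) as years from school_info;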

Exploration What if we want to count the total number of students for each year?

1) identify how to access a key in the map

• mapname[key]
2) construct and execute this command
• select sum(total_stud[2015]) as total_2015, sum(total_stud[2016]) as total_2016, sum(total_stud[2017]) as total_2017 from
school_info;
3) you should get the following output:
Scenario 3 • Creating a table with a Struct data type and loading data into the table

references:
• https://acadgild.com/blog/hive-complex-data-types-with-examples
• http://myitlearnings.com/complex-data-type-in-hive-struct/

Steps • understand data and plan for the table structure


• create a directory and upload the file into HDFS
• construct and execute a command to create an internal table
• load the data into the table
• check the output

data understanding the file: scenario3.txt

the data:

Plan for the table structure:

1) determine the datatypes


• the primitive data type
o firstname as string
o lastname as string
• the collection data type
o address as struct where:
▪ house number as int
▪ road name as string
▪ city as string
▪ state as string

2) determine the delimiter
• we need to separate each field by '\t'
• we need to separate each collection by ','

upload into HDFS Create a directory and then upload the file, as follows:
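
For example, assuming scenario3.txt is in your current local working directory:

hdfs dfs -mkdir -p /user/student30/scenario3
hdfs dfs -put scenario3.txt /user/student30/scenario3/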

create the table Run this command:

create table if not exists address_info (
firstname string,
lastname string,
address struct<num:int, road:string, city:string, state:string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';

check the created table structure:
• describe address_info;


load the data run this command:

load data inpath '/user/student30/scenario3/scenario3.txt' overwrite into table address_info;

Note:
• you will need to adjust the path according to your own directory

check the output run this command to check the location of data:
• show create table address_info;

run this command to check the data:


• select * from address_info;

what if you want to list the address only?


• select address from address_info;

What if you want to retrieve the city only from each record?
• select address.city as city from address_info;
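
Struct fields can also be combined with the primitive columns in a single query; for example:

• select firstname, lastname, address.road as road, address.state as state from address_info;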

Scenario 4 • Processing the values of an Array data type

Steps • understand data and plan for the table structure


• create a directory and upload the file into HDFS
• construct and execute a command to create an internal table
• load the data into the table
• check the output

the dataset The file: scenario4.txt

The data:

Plan for the table structure:


1) determine the datatypes
• the primitive data type
o class name as string
• the collection data type
o mark as array<int>
2) determine the delimiter
• we need to separate each field by '\t'
• we need to separate each collection by ','

upload into HDFS Create a directory and then upload the file, as follows:
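
For example, assuming scenario4.txt is in your current local working directory:

hdfs dfs -mkdir -p /user/student30/scenario4
hdfs dfs -put scenario4.txt /user/student30/scenario4/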

create the table Run this command:

create table if not exists classmark (
classname string,
mark array<int> )
row format delimited
fields terminated by '\t'
collection items terminated by ',';

check the created table structure:
• describe classmark;

load the data run this command:

load data inpath '/user/student30/scenario4/scenario4.txt' overwrite into table classmark;

Note:
• you will need to adjust the path according to your own directory

check the output run this command to check the location of data:
• show create table classmark;

run this command to check the data:


• select * from classmark;

what if you want to list the mark only?


• select mark from classmark;
What if you want to count the size of the array?
• select classname, size(mark) as num_of_mark from classmark;

What if you want to sum the total mark for index 0?


• select sum(mark[0]) as total_index0 from classmark;
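
What if you want to process every mark instead of a single index? Hive's lateral view with the explode function turns each array element into its own row, which can then be aggregated; a minimal sketch:

• select classname, avg(m) as avg_mark from classmark lateral view explode(mark) marks as m group by classname;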

Scenario 5 • Creating a table with a Struct data type, loading data into the table, and performing a calculation

Steps • understand data and plan for the table structure


• create a directory and upload the file into HDFS
• construct and execute a command to create an internal table
• load the data into the table
• check the output

the dataset The file: region.csv

The data:

Plan for the table structure:

1) determine the datatypes


• the primitive data type
o r_regionkey as smallint
o r_name as string
o r_comment as string
• the collection data type
o r_nations as struct where:
▪ n_nationkey as smallint
▪ n_name as string
▪ n_comment as string

2) determine the delimiter
• we need to separate each field by '|'
• we need to separate each collection by ','

upload into HDFS Create a directory and then upload the file, as follows:
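
For example, assuming region.csv is in your current local working directory:

hdfs dfs -mkdir -p /user/student30/scenario5
hdfs dfs -put region.csv /user/student30/scenario5/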
create the table Run this command:

create table if not exists region (
r_regionkey smallint,
r_name string,
r_comment string,
r_nations struct<n_nationkey:smallint, n_name:string, n_comment:string> )
row format delimited
fields terminated by '|'
collection items terminated by ','
tblproperties("skip.header.line.count"="1");

Note:
• we need skip.header.line.count because the dataset contains a header row
• alternatively, we can manually delete the header row from the dataset

check the created table structure:
• describe region;

load the data run this command:

load data inpath '/user/student30/scenario5/region.csv' overwrite into table region;

Note:
• you will need to adjust the path according to your own directory

check the output run this command to check the location of data:
• show create table region;

run this command to check the data:


• select * from region;

What if you want to count the total number of nation keys, grouped by the region name?
• select r_name, count(r_nations.n_nationkey) as nation_num from region group by r_name;
Accessing HUE • to access HUE, go to https://bigdatalab-rm-en1.uitm.edu.my:8889/hue/accounts/login?next=/
• then log in using the given account

Accessing Hive • to access Hive, execute the following command:


o beeline -u jdbc:hive2://bigdatalab-cdh-mn1.uitm.edu.my:10000 -n yourusername -p yourpassword
• then type in:
o use yourdatabasename;
• then, you can browse the available tables by typing in:
o show tables;

Accessing MariaDB • Type in the following:


o mysql -ustudent -pp@ssw0rd retail_db

YARN monitoring tools • To view the monitored applications (Note: you must access this within the UiTM network), go to http://10.5.19.231:8088/cluster/apps
• To view the monitored jobs (Note: you must access this within the UiTM network), go to http://10.5.19.231:19888/jobhistory/app
