Lab6F_Creating Hive Table with Complex Data Type
Outline
• Concept
• Scenario 1 - Creating a table with a String Array data type and loading data into the table
• Scenario 2 - Creating a table with a Map data type and loading data into the table
• Scenario 3 - Creating a table with a Struct data type and loading data into the table
• Scenario 4 - Processing values from an Array data type
• Scenario 5 - Creating a table with a Struct data type, loading data into the table, and performing a calculation
Reference
• https://www.educba.com/hive-data-types/
• https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes
• https://impala.apache.org/docs/build/html/topics/impala_array.html#array
Concept
1) Array - an ordered sequence of elements of a common type, accessed by an index that starts from zero
2) Map - a collection of key-value pairs, where each value is accessed by its key
3) Struct - a data type comprising a set of named fields, each of which may have a different data type
4) Uniontype - can hold a value of exactly one of its specified data types (beyond the scope of this lab)
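To make the type definitions above concrete, the following is a minimal HiveQL sketch that declares one column of each type and reads a single value from each. The table name complex_demo, its columns, and the delimiters are hypothetical and are not part of the lab data.

```sql
-- Hypothetical table for illustration only; not one of the lab tables.
CREATE TABLE complex_demo (
  id      INT,
  tags    ARRAY<STRING>,                 -- ordered elements of one type, index starts at 0
  scores  MAP<STRING, INT>,              -- key-value pairs
  contact STRUCT<city:STRING, zip:INT>   -- named fields of different types
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Each complex type has its own access syntax.
SELECT
  tags[0]        AS first_tag,    -- array: index
  scores['math'] AS math_score,   -- map: key
  contact.city   AS city          -- struct: dot notation
FROM complex_demo;
```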
Scenario 1 • Creating a table with a String Array data type and loading data into the table
reference:
• https://stackoverflow.com/questions/33984794/loading-csv-file-on-hive-table-with-string-array
The data:
upload into HDFS
Create a directory and then upload the file, as follows (note: replace student30 with your own student access number):
check the output
Run this command to check the table structure:
• describe article;
Run this command to check the location of the data:
• show create table article;
• notice that the location is your own HDFS directory, not the Hive warehouse directory, because the table is external
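For orientation, an external table whose location is the student's HDFS directory could be declared roughly as follows. This is only a sketch: the table name article and the directory /user/student30/scenario1 come from this lab, but the column names and both delimiters are assumptions and must be adapted to the actual file.

```sql
-- Sketch only: column names and delimiters are assumed, not taken from the lab file.
CREATE EXTERNAL TABLE article (
  id      INT,
  title   STRING,
  authors ARRAY<STRING>               -- the String Array column
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','            -- assumed field separator
  COLLECTION ITEMS TERMINATED BY ':'  -- assumed separator between array elements
STORED AS TEXTFILE
LOCATION '/user/student30/scenario1'; -- a directory, not the file itself
```

With EXTERNAL and LOCATION, Hive reads whatever files are in that directory and leaves them in place when the table is dropped.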
3) Load the data into the internal table article_int, whose files Hive manages in its warehouse directory:
• load data inpath '/user/student30/scenario1/scenario1.txt' overwrite into table article_int;
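The load statement above expects the internal table article_int to exist already. A minimal sketch of such a table is shown here, reusing the same assumed columns and delimiters as a hypothetical article table; only the name article_int comes from this lab.

```sql
-- Sketch only: columns and delimiters are assumptions.
-- No EXTERNAL keyword and no LOCATION clause, so Hive manages the table's storage.
CREATE TABLE article_int (
  id      INT,
  title   STRING,
  authors ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY ':'
STORED AS TEXTFILE;

-- After running the load statement, inspect the table type and storage location.
DESCRIBE FORMATTED article_int;
```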
Exploration
1) Because you used the same data source for the internal table, what happened to the data of the previously created external table?
2) How can that problem be addressed?
Scenario 2 • Creating a table with a Map data type and loading data into the table
reference:
• https://acadgild.com/blog/hive-complex-data-types-with-examples
the data:
upload into HDFS
Create a directory and then upload the file, as follows:
Note:
• adjust the path to match your own HDFS directory
check the output
Run this command to check the location of the data:
• show create table school_info;
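As a rough illustration of a Map column, the sketch below declares school_info with a map from year to student count and looks up a single key. Only the table name comes from this lab; the columns, the delimiters, the directory /user/student30/scenario2, and the choice of an external table are all assumptions.

```sql
-- Sketch only: every column, delimiter, and path here is an assumption.
CREATE EXTERNAL TABLE school_info (
  school_type STRING,
  state       STRING,
  gender      STRING,
  total       MAP<INT, INT>           -- assumed meaning: year -> number of students
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'           -- assumed field separator
  COLLECTION ITEMS TERMINATED BY ','  -- separates one key:value pair from the next
  MAP KEYS TERMINATED BY ':'          -- separates a key from its value
LOCATION '/user/student30/scenario2'; -- hypothetical directory

-- Look up one key; a key that is absent from a row returns NULL.
SELECT state, total[2016] AS students_2016
FROM school_info;
```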
Exploration
What if we want to count the total number of students for each year?
Scenario 3 • Creating a table with a Struct data type and loading data into the table
references:
• https://acadgild.com/blog/hive-complex-data-types-with-examples
• http://myitlearnings.com/complex-data-type-in-hive-struct/
the data:
upload into HDFS
Create a directory and then upload the file, as follows:
Note:
• adjust the path to match your own HDFS directory
check the output
Run this command to check the location of the data:
• show create table address_info;
What if you want to retrieve only the city from each record?
• select address.city as city from address_info;
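The query above relies on address being a Struct column with a field named city. A table of that shape could be declared along these lines. The table name, the column name address, and the field city come from this lab; the remaining fields, the delimiters, and the directory are assumptions.

```sql
-- Sketch only: fields other than city, the delimiters, and the path are assumed.
CREATE EXTERNAL TABLE address_info (
  name    STRING,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'           -- assumed field separator
  COLLECTION ITEMS TERMINATED BY ','  -- separates the struct's fields in the text file
LOCATION '/user/student30/scenario3'; -- hypothetical directory

-- Struct fields are read with dot notation and can be combined freely.
SELECT name, address.city AS city, address.zip AS zip
FROM address_info;
```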
Scenario 4 • Processing values from an Array data type
The data:
upload into HDFS
Create a directory and then upload the file, as follows:
Note:
• adjust the path to match your own HDFS directory
check the output
Run this command to check the location of the data:
• show create table classmark;
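Processing array values usually means indexing single elements, measuring the array's length, or expanding the array into rows so that aggregates can be applied. The sketch below shows all three on classmark. Only the table name comes from this lab; the columns student and marks, the delimiters, and the directory are assumptions.

```sql
-- Sketch only: columns, delimiters, and path are assumed.
CREATE EXTERNAL TABLE classmark (
  student STRING,
  marks   ARRAY<INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
LOCATION '/user/student30/scenario4'; -- hypothetical directory

-- Index one element (zero-based) and count the elements.
SELECT student, marks[0] AS first_mark, size(marks) AS n_marks
FROM classmark;

-- Turn each array element into its own row, then aggregate per student.
SELECT student, sum(m) AS total_mark, avg(m) AS average_mark
FROM classmark
LATERAL VIEW explode(marks) exploded AS m
GROUP BY student;
```

explode() on its own cannot be selected alongside other columns, which is why the second query uses LATERAL VIEW.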
Scenario 5 • Creating a table with a Struct data type, loading data into the table, and performing a calculation
The data:
upload into HDFS
Create a directory and then upload the file, as follows:
create the table
Run this command:
Note:
• we need the skip.header.line.count table property because the dataset contains a header row
• alternatively, we can manually delete the header row from the dataset
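The header-skipping property mentioned above is set in the TBLPROPERTIES clause of the table definition. The following sketch shows where it goes. The table name region and the names r_name, r_nations, and n_nationkey appear in this lab; the other columns and fields, the delimiters, and the directory are assumptions.

```sql
-- Sketch only: extra columns, delimiters, and path are assumed.
CREATE EXTERNAL TABLE region (
  r_regionkey INT,
  r_name      STRING,
  r_nations   STRUCT<n_nationkey:INT, n_name:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'            -- assumed field separator
  COLLECTION ITEMS TERMINATED BY ','  -- separates the struct's fields
STORED AS TEXTFILE
LOCATION '/user/student30/scenario5'  -- hypothetical directory
TBLPROPERTIES ('skip.header.line.count' = '1');  -- ignore the first line of each file
```

The value is the number of leading lines to skip in each data file, so '1' matches a dataset with a single header row.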
Note:
• adjust the path to match your own HDFS directory
check the output
Run this command to check the location of the data:
• show create table region;
What if you want to count the nation keys for each region name?
• select r_name, count(r_nations.n_nationkey) as nation_num from region group by r_name;
Accessing HUE • to access HUE, go to https://bigdatalab-rm-en1.uitm.edu.my:8889/hue/accounts/login?next=/
• then log in using the given account
YARN monitoring tools • To view the monitored applications (note: you must be connected to the UiTM network), go to http://10.5.19.231:8088/cluster/apps
• To view the monitored jobs (note: you must be connected to the UiTM network), go to http://10.5.19.231:19888/jobhistory/app