
L6H_Processing Data using Impala

The document outlines three scenarios for processing data using Impala with a focus on grouping orders by date and year. It details methods for executing group by commands, including the use of substr for extracting years and subqueries for aliasing. Additionally, it provides instructions for accessing Impala and managing table visibility in the Hive metastore.

Uploaded by

2024740897
Copyright
© All Rights Reserved

L6H - Processing Data using Impala

Outline
• Scenario 1 - Group by
• Scenario 2 - Group by with substr
• Scenario 3 - Group by with alias through subqueries

Scenario 1: You want to know the number of orders, grouped by order date.

Concept - Group By: https://www.tutorialspoint.com/hive/hiveql_group_by.htm

Additional note: The alias limitation with group by:

https://stackoverflow.com/questions/3841295/sql-using-alias-in-group-by

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy

Method
• data understanding
• construct and execute the group by command
• check the results

Data understanding

Note:
• Make sure you already have the orders table (not a partitioned table).
• If the table is already in your Hive metastore but does not appear in Impala, run the invalidate metadata command.
• If the table is not yet in your Hive metastore, import it from MariaDB into Hive with Sqoop first. After that, run the invalidate metadata command.

The table schema:


Sample data:

Construct and execute:
select order_date, count(order_id) from orders group by order_date;

Check the result
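The Scenario 1 query can be tried without a cluster. The following is a minimal sketch that runs the same GROUP BY against an in-memory SQLite database from Python. SQLite is only a stand-in for Impala here, the sample rows and the two-column schema are hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample rows (order_id, order_date); the real orders table
# in the lab has more rows and more columns.
SAMPLE_ORDERS = [
    (1, "2013-07-25 00:00:00"),
    (2, "2013-07-25 00:00:00"),
    (3, "2013-07-26 00:00:00"),
    (4, "2014-01-01 00:00:00"),
]


def count_orders_by_date(rows):
    """Return (order_date, order_count) pairs, one per distinct order_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT order_date, count(order_id) FROM orders "
        "GROUP BY order_date ORDER BY order_date"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for order_date, order_count in count_orders_by_date(SAMPLE_ORDERS):
        print(order_date, order_count)
```

Each distinct order_date yields one output row, and count(order_id) counts the orders that share that date.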

Scenario 2: This time, you want to know the number of orders, grouped by order year.

Method
• data understanding (same as above)
• extract the years
• construct and execute the group by command
• check the results

Extracting the year using substr: http://www.hplsql.org/substr


Construct and execute the command:
select substr(order_date,1,4) as years, count(*) from orders group by substr(order_date,1,4);

check the results
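As a rough illustration of Scenario 2, the sketch below runs the substr-based query against an in-memory SQLite database. It assumes order_date is stored as text beginning with a four-digit year (which is what substr(order_date,1,4) relies on); the sample rows are hypothetical, SQLite stands in for Impala, and the ORDER BY is added only for a predictable output order.

```python
import sqlite3

# Hypothetical sample rows (order_id, order_date); order_date is assumed
# to be text that starts with the four-digit year.
SAMPLE_ORDERS = [
    (1, "2013-07-25 00:00:00"),
    (2, "2013-07-25 00:00:00"),
    (3, "2013-07-26 00:00:00"),
    (4, "2014-01-01 00:00:00"),
]


def count_orders_by_year(rows):
    """Return (year, order_count) pairs using substr in SELECT and GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT substr(order_date, 1, 4) AS years, count(*) FROM orders "
        "GROUP BY substr(order_date, 1, 4) ORDER BY years"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for year, order_count in count_orders_by_year(SAMPLE_ORDERS):
        print(year, order_count)
```

The substr expression is repeated in the GROUP BY clause instead of the alias, which is the form the handout uses to avoid the alias limitation noted earlier.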

Scenario 3: This time, you want the number of orders grouped by order year, using an alias defined in a subquery.

Subqueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries#:~:text=Subqueries%20in%20the%20FROM%20Clause,-%3F&text=Hive%20supports%20subqueries%20only%20in,list%20must%20have%20unique%20names.

Method
• data understanding (same as above)
• extract the years (same as above)
• construct and execute the group by command
• check the results

Construct and execute the command:
SELECT years, count(*)
FROM (
    SELECT substr(order_date, 1, 4) as years
    FROM orders
) a
GROUP BY years;

Note: years is an alias defined in the subquery. The outer query sees it as an ordinary column, so it can be used in GROUP BY.

check the results
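To see that the subquery form gives the same counts as repeating the substr expression, the sketch below runs both queries against an in-memory SQLite database. SQLite is a stand-in for Impala, the sample rows are hypothetical, and the helper name run_on_sample_orders is made up for this illustration.

```python
import sqlite3

# Hypothetical sample rows (order_id, order_date).
SAMPLE_ORDERS = [
    (1, "2013-07-25 00:00:00"),
    (2, "2013-07-25 00:00:00"),
    (3, "2013-07-26 00:00:00"),
    (4, "2014-01-01 00:00:00"),
]

# Scenario 2 form: the substr expression is repeated in GROUP BY.
DIRECT_QUERY = (
    "SELECT substr(order_date, 1, 4) AS years, count(*) FROM orders "
    "GROUP BY substr(order_date, 1, 4) ORDER BY 1"
)

# Scenario 3 form: the alias is defined in a subquery named a,
# and the outer query groups by that alias.
SUBQUERY_QUERY = (
    "SELECT years, count(*) "
    "FROM (SELECT substr(order_date, 1, 4) AS years FROM orders) a "
    "GROUP BY years ORDER BY 1"
)


def run_on_sample_orders(sql, rows=SAMPLE_ORDERS):
    """Load rows into a fresh in-memory orders table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_on_sample_orders(DIRECT_QUERY))
    print(run_on_sample_orders(SUBQUERY_QUERY))
```

Both queries return one row per year; the subquery version simply moves the substr expression into the inner SELECT so that the outer GROUP BY can refer to years by name.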

Accessing Impala: type the following:
• impala-shell
• connect bigdatalab-cdh-dn2.uitm.edu.my;
• use student30;
• show tables;

Note:
• If a required table (e.g. the customers table) is already in the Hive metastore but is not listed yet, run the following:
  o invalidate metadata customers;
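The interactive steps above could also be run non-interactively. The following is a sketch only: it builds an impala-shell command line using the -i (host), -d (database) and -q (query) options, and prints it instead of executing it, since running it needs access to the lab cluster. The helper name impala_shell_command is hypothetical.

```python
import shlex


def impala_shell_command(host, database, query):
    """Build the argument list for a one-shot impala-shell invocation."""
    return ["impala-shell", "-i", host, "-d", database, "-q", query]


if __name__ == "__main__":
    host = "bigdatalab-cdh-dn2.uitm.edu.my"
    database = "student30"
    # Refresh the metadata for one table, then list the tables again.
    for query in ("invalidate metadata customers", "show tables"):
        print(shlex.join(impala_shell_command(host, database, query)))
```

Passing the resulting list to subprocess.run on a machine where impala-shell is installed would execute the statement; the host and database shown are the ones used in the steps above.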
