Lab02 Hive1
Objectives
Learn the advantages of Hive over traditional MapReduce
This week's lab, and those following it, should be executed in the provided docker container using
the run.sh script in local mode. See LMS for instructions for installing Docker and getting access
to a Linux command line. Depending on your setup, docker may be installed through WSL on your
own computer, on a VM via a web interface or in the lab, or maybe you have Linux and docker
installed on your computer already.
1. Ensure that you have the lab files downloaded in the VM and stored in the directory you
specified when starting run.sh.
2. Within the docker container started by run.sh, navigate to the same directory as the .hql
lab files. These .hql files contain HiveQL code, which is a dialect of SQL used by Hive.
3. Execute the word count script:
$ hive -f t1-wordcount.hql
Note that in Hive, we are using hard-coded directory paths rather than specifying command-line
arguments, so be sure not to modify the path of the input data files without modifying
the HiveQL file as well.
4. The program may take a short while to complete, so while you wait, take this opportunity
to look over the code of the t1-wordcount.hql file. The program creates the following three
tables:
(a) The myinput table. This table stores lines of text from a directory of text files. Each
row of the table stores an entire line of text.
(b) The mywords table. This table stores individual words extracted from the myinput
table. The table is created by expanding (using the EXPLODE function) each row of
the myinput table into multiple rows, where each row contains a single word. The
CREATE TABLE command also strips the input of punctuation and control characters using
a regular expression.
(c) The wordcount table gets each word from the mywords table and counts the number of
occurrences of each unique word using COUNT(1). It also removes any blank words using
the WHERE clause, by keeping only words that are not blank.
Finally, this data is then written to an output file using the last two lines of the code.
5. Take a look at the files inside the input directory Input_data/1/. Notice there are many
different files in it. When you give Hive an input directory, it takes all the files in it together
as the input.
6. Once the program has completed successfully, the tables will be stored in Hive. You can start
an extra Hive interpreter: open a new terminal, then run the following command to start a
new bash shell in the same docker container that is running the Hive scripts. Inside that new
bash shell, type hive to start the Hive interpreter.
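(The container name below is a placeholder; use docker ps to find the name of the container started by run.sh.)
$ docker exec -it <container_name> bash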
You can keep this terminal open to use the Hive interpreter at any time.
7. In the Hive interpreter, type the following to list the existing tables:
SHOW TABLES;
You should see the three tables created by the script listed.
8. Type in the following to see the first 10 rows of the myinput table:
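For example:
SELECT * FROM myinput LIMIT 10;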
9. Repeat the above for the mywords and wordcount tables.
10. Swap to your original bash prompt, leaving the new Hive prompt open.
11. Try to run the Hive script again using the same command and see what happens.
$ hive -f t1-wordcount.hql
12. This time the script fails with an AlreadyExistsException. This is because the system still
contains the old tables you created. You need to drop the three tables at the beginning of the
script before recreating them again. Insert the following three lines at the top of the script:
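One likely form (IF EXISTS keeps the script safe to run even when a table does not exist yet):
DROP TABLE IF EXISTS myinput;
DROP TABLE IF EXISTS mywords;
DROP TABLE IF EXISTS wordcount;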
13. Once the program has completed successfully, browse to the task1-out folder and view the
generated output in a text editor. If everything went well, you should see a whole lot of
words and numbers in no particular order. The columns are separated by the \001 character
(rendered as SOH in Sublime), which is the default Hive field delimiter.
14. You probably do not like having the output columns separated by \001. You can change
the separator to anything you want. Do the following in order to make the output columns
separated by the tab character \t instead. Insert the following commands just before the
SELECT * FROM wordcount; line:
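One way to do this, assuming the script ends with an INSERT OVERWRITE LOCAL DIRECTORY ... SELECT statement, is to add a ROW FORMAT clause between the directory and the SELECT:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'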
Task 2: Word count with a subquery
3. In the command to create the wordcount table (which begins with the line CREATE TABLE
wordcount AS), replace the line
FROM mywords
with
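FROM (<subquery>) splitwords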
where <subquery> is the second and third line from the command to create the mywords table
(SELECT EXPLODE ... FROM myinput). This modification makes the wordcount table take
its input from the result of a subquery instead of from the table mywords. splitwords
is just a name we give to the table created by the subquery; it can be any valid name.
4. Modify the t2-wordcount.hql script so that the output is saved to task2-out instead of
task1-out.
5. Execute t2-wordcount.hql and check that you get the same output as for Task 1.
7. Now let's do a small experiment comparing the efficiency of t1-wordcount.hql and t2-wordcount.hql.
While keeping the job browser open, first run t1-wordcount.hql and then t2-wordcount.hql.
What do you notice?
(a) The processing for t1-wordcount.hql uses three separate MapReduce jobs:
i. MapReduce job 1: Create the mywords table from the myinput table.
ii. MapReduce job 2: Create the wordcount table from the mywords table.
iii. MapReduce job 3: Output the wordcount table to the local directory.
(b) The processing for t2-wordcount.hql uses two separate MapReduce jobs:
i. MapReduce job 1: Create the wordcount table directly from the myinput table, with
the subquery and the aggregation handled in a single job.
ii. MapReduce job 2: Output the wordcount table to the local directory.
(c) If you add up the total time taken by the jobs for each of the two scripts, you should see
that t2-wordcount.hql is faster. By having one less MapReduce job,
t2-wordcount.hql performs roughly 1/3 less disk IO: at the start of
each MapReduce job all of the data needs to be loaded from disk, and the results
need to be written to disk again afterwards.
8. It's all good and well to be able to count a bunch of words, but the data right now is not
presented in any useful order. Modify the program so that it presents the wordcount data
ordered by the count in descending order (that is, the words with the most occurrences will
appear first). As a secondary order, make the words appear in ascending order.
Hint: You may want to use the familiar SQL syntax shown below at the end of the query
which creates the wordcount table (don't forget to delete the semi-colon at the end of the
GROUP BY):
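A sketch of the clause, assuming the wordcount columns are named word and count:
ORDER BY count DESC, word ASC;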
9. Execute the script again and verify the output. You should now have an output dataset with
the most frequently occurring words at the top and, in order to break ties (words with the same
frequency), words listed in alphabetical order.
You have now modified and executed a basic word count example in Hive that also orders its output,
with a program that is only about 20 lines long. But don't stop there; there are many other things
we can do in Hive, just as easily!
Exercise 1. Modify the t2-wordcount.hql script again, this time so that it only outputs the top
10 most frequently occurring words. Hint: this is similar to how you added the ORDER BY clause
earlier, but this time use LIMIT (see the LIMIT clause documentation).
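A sketch, assuming the same column names as before:
SELECT * FROM wordcount ORDER BY count DESC, word ASC LIMIT 10;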
Task 3: Stop list and joins
With just one line you can change the ordering of the output, and with another you can modify
how many rows to select. If you were to write MapReduce code for this directly, it would take a
lot more effort (trust us on this one!). But the SQL-like nature of Hive provides a lot more than
just this; one of the most powerful tools in your arsenal is being able to use joins.
The join operation returns combinations of records from two tables. For example, if you have
a table of students and a table of classes, you could use a join to obtain a list of student-class
combinations.
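A minimal HiveQL sketch of the idea, using hypothetical students and classes tables:
SELECT students.name, classes.title
FROM students JOIN classes
ON (students.class_id = classes.class_id);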
Stop lists
A stop list is a list of words that we want to filter out of a data set. Typically stop lists include
words which don't carry much meaning, like 'a', 'the', 'in', etc. Therefore we want to look inside
a data set and then discard every word in it that appears in the stop list.
1. We will start with the Hive script file t3-stoplist.hql. This file currently just creates
the two tables myinput and mywords from task 1 and dumps the output to the directory
task3-out. You will modify this file in order to filter out a set of stop list words from the
mywords table.
2. Create another single-column table called stopwords. Then read the stoplist.txt file
(Input_data/3/stoplist.txt) into your newly created stopwords table. Refer to how the myinput
table was created if you are having trouble. Don't forget to put DROP TABLE at the beginning
of the script for the added table, since we will be rerunning the script many times.
Note: you can assume each separate word in the stop list is on a different line, so there is no
need to split lines. Take a look at the file to verify it.
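One possible shape for these lines (a sketch; mirror however myinput is actually created in your script):
DROP TABLE IF EXISTS stopwords;
CREATE TABLE stopwords (sword STRING);
LOAD DATA LOCAL INPATH 'Input_data/3/stoplist.txt' OVERWRITE INTO TABLE stopwords;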
3. Execute the script, and then go into the Hive interpreter to execute SELECT * ... LIMIT
10 on the stopwords table, to make sure it has the correct data.
4. Next, create an interim (temporary) table called stopjoin that contains two columns. The
first column is called mword and the second column is called sword. Take a look at Table 1
below for an example of what the stopjoin table should look like. The first column (mword
column) of the stopjoin table just contains all the words inside the mywords table. For each
mword, the second column, sword, contains the matching stop word. If an mword does not
exist in the stopwords table, then its corresponding sword is NULL.
Your job now is to create the stopjoin table. You will need to JOIN the mywords table with
the stopwords table to create the stopjoin table. Since we want to keep all the words in the
mywords table, even the ones that do not match the stopwords, you need to use a LEFT OUTER
JOIN. See below for example join syntax (note: you need to substitute the right names for
col1, col2, table1 and table2):
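-- generic left outer join shape; substitute your own table and column names
SELECT table1.col1, table2.col2
FROM table1 LEFT OUTER JOIN table2
ON (table1.col1 = table2.col2);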
                                     stopjoin
mywords      stopwords      mword         sword
the          a              the           the
treasure     is             treasure      NULL
is           the            is            is
my                          my            NULL
treasure                    treasure      NULL
Table 1: Example mywords and stopwords tables, and the expected stopjoin table.
5. Execute the script, and again check the table contains the correct information by using the
Hive interpreter to do SELECT * ... LIMIT 10 on the stopjoin table.
6. Currently the stopjoin table contains rows for blank words (empty strings). We can count
how many of these rows there are by running the following query in the Hive interpreter:
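For example (assuming the blank words appear in the mword column):
SELECT COUNT(1) FROM stopjoin WHERE mword = '';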
You should see that there are over 50,000 rows for useless blank words! We will now prevent
your script from adding these rows to the stopjoin table. To do this, add a WHERE clause to
the end of the CREATE stopjoin ... query that only keeps the words that do not match the
empty string, "":
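A minimal sketch, where word is an assumed name for the column in mywords (use the actual column name from your script):
WHERE mywords.word != '';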
Run your updated script, then use the Hive interpreter to count the rows with blank words
again. This time the count should be 0.
7. To get a better idea of what is in the stopjoin table, do the following in the Hive interpreter
(example queries are sketched after this list):
(a) Select the first 10 lines where mword is 'the'. The result should be 10 rows where both
columns have the word 'the', since 'the' is a stop word.
(b) Select the first 10 lines where mword is 'help'. The result should be 10 rows where the
first column has 'help' and the second column has NULL, since 'help' is not a stop word.
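Queries along these lines should work:
SELECT * FROM stopjoin WHERE mword = 'the' LIMIT 10;
SELECT * FROM stopjoin WHERE mword = 'help' LIMIT 10;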
8. Next, create a new table called stoplistOut, which contains only the rows in the stopjoin
table where the second column (sword) is NULL. These are the words that we want to keep,
since they are not stop words. The syntax for selecting null values is as follows: WHERE <col>
IS NULL. The stoplistOut table should only contain a single column, which includes
each kept mword. See Table 2 for the contents of the stoplistOut table for our running
example. Take a look at the contents of the table in the Hive interpreter to make sure it
contains the correct information. Again try looking for the words 'the' and then 'help' and see
if the result is what you expect.
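A minimal sketch of this step:
CREATE TABLE stoplistOut AS
SELECT mword FROM stopjoin
WHERE sword IS NULL;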
stoplistOut
mword sword
treasure NULL
my NULL
treasure NULL
Table 2: The stoplistOut table only considers rows from stopjoin where sword is NULL.
Exercise 2. Extend the stoplistOut query to produce word counts for each unique word. Table
3 shows the contents of the new stoplistOut table for our running example.
1. The stoplistOut table should have two columns, mword and count. You can obtain the
count by using COUNT(1) and GROUP BY. Refer to the Hive documentation on GROUP BY if
you need to.
2. Sort the data in descending order according to count and ascending order in terms of mword.
This is very similar to task 2.
4. Edit the Dump output to file part of the script to save the table stoplistOut instead of
mywords.
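Putting the pieces together, the updated query might look something like this (a sketch using the column names above):
CREATE TABLE stoplistOut AS
SELECT mword, COUNT(1) AS count
FROM stopjoin
WHERE sword IS NULL
GROUP BY mword
ORDER BY count DESC, mword ASC;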
Run the program and compare the output with that of task 2. You should see many of the top
words from task 2 are absent from the output of task 3, as these were included in the stop list file.
stoplistOut
mword count
treasure 2
my 1
Table 3: The updated stoplistOut, which contains word counts instead of duplicates.
Task 4: Include list
An include list is the inverse of a stop list. That is, an include list contains all of the words that
we want to keep rather than all of the words that we want to remove. For example, if the include
list contained only the word treasure, then for our running example the output would contain just
the single row treasure 2.
Exercise 3. Copy your completed t3-stoplist.hql from Exercise 2 to a new file called t4-includelist.hql.
Modify the script so that it now loads data from the file located at Input_data/4/includelist.txt
and saves output to task4-out. Now it is your turn to show what you are capable of. Modify the
program so that it now produces the desired output. Remember the output needs to include the
count of the included words and be sorted according to the same criteria as tasks 2 and 3. Hint: if you
change the left outer join to an inner join, you will not need to check for null values.
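As a hedged sketch of the hint, assuming the include words are loaded into a table called includewords with a single column iword, and that the mywords column is called word (all three names are illustrative):
-- an inner join keeps only the words that appear in the include list,
-- so no IS NULL check is required
SELECT mywords.word AS mword
FROM mywords JOIN includewords
ON (mywords.word = includewords.iword);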
Task 5: Sorting
SORT BY, like ORDER BY, is a clause used in queries to tell Hive that it should perform some sorting
on the data collection. However, the difference between the two is that ORDER BY guarantees total
global order in the output by enforcing only one reducer, while SORT BY only guarantees ordering
of the rows within each reducer, as it uses multiple reducers. While this may not order the data
perfectly, it is generally more efficient. You will now experiment with this.
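For example, assuming the wordcount table from the earlier tasks:
-- one reducer, globally sorted output
SELECT * FROM wordcount ORDER BY count;
-- several reducers, each emitting its own internally sorted run
SELECT * FROM wordcount SORT BY count;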
1. In order to see a difference between SORT BY and ORDER BY we need to have multiple reducers.
By default Hive uses just one reducer. Look at t5-orderBy.hql to find the line where we set
the number of reducers to 2.
set mapred.reduce.tasks = 2;
2. Currently, the t5-orderBy.hql script uses ORDER BY to perform sorting. Run the script and
take a look at the output le. You should notice that all of the data is globally sorted by
ascending count order.
3. Now copy t5-orderBy.hql into a new file called t5-sortBy.hql. Modify t5-sortBy.hql
so that it uses SORT BY instead of ORDER BY. Also modify the output directory name to
task5sortBy-out.
4. Execute t5-sortBy.hql and look at the output. You should see that the output is effectively
two sorted lists concatenated one after the other. This is because SORT BY only sorts
internally to each reducer, and in this script we set two reducers. SORT BY allows us to achieve
more parallelism during reduction (and is therefore faster), but does not produce a globally
sorted order. ORDER BY, on the other hand, sorts the data using a single reducer (regardless of how
mapred.reduce.tasks is set), hence it can produce a globally sorted order.
5. Modify the number of reducers to 5 for t5-sortBy.hql and run it again to see what happens.