Lab Assignment #1: Getting Started with Hadoop

Debpriyo Roy

Northeastern University

17/01/2020

Instructor: Daya Rudhramoorthi

Data Management and Big Data

This study source was downloaded by 100000841344363 from CourseHero.com on 06-12-2023 17:50:40 GMT -05:00

https://www.coursehero.com/file/56453733/Debpriyo-Roy-Lab-1docx/
Step 1: Setting up the Cloudera environment

After starting Cloudera, go to Cloudera Manager and start the following services:

Step 2: Ingest and Query Structured Relational data

Enter the following commands in the terminal window:
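The commands themselves did not survive in this copy of the report. As a rough sketch of what this ingest step typically looks like, the snippet below assembles a Sqoop import-all-tables invocation of the kind described in the Cloudera "Getting Started With Hadoop" tutorial. The JDBC URL, user name, password placeholder and warehouse directory are assumptions for illustration, not values taken from this report.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password, warehouse_dir, mappers=1):
    """Return the argv list for importing every table of a relational DB into Hive."""
    return [
        "sqoop", "import-all-tables",
        "-m", str(mappers),                  # number of parallel map tasks
        "--connect", jdbc_url,               # source relational database
        "--username", username,
        "--password", password,
        "--compression-codec", "snappy",     # compress the imported files
        "--as-parquetfile",                  # store the data as Parquet
        "--warehouse-dir", warehouse_dir,    # HDFS parent directory for the tables
        "--hive-import",                     # register the tables in the Hive metastore
    ]


# All values below are assumed placeholders, not taken from the report.
argv = build_sqoop_import(
    jdbc_url="jdbc:mysql://quickstart:3306/retail_db",
    username="retail_dba",
    password="<password>",
    warehouse_dir="/user/hive/warehouse",
)
print(shlex.join(argv))
```

Printing the command instead of running it keeps the sketch safe to try on a machine without a cluster; the printed line can be pasted into the terminal once the placeholders are replaced.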

Step 3: Query the data using Hue's Impala app

In Hue, select Impala as the query editor and run the following queries against the data.
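The queries are not reproduced in this copy. As an illustration of the kind of revenue query this step calls for, the sketch below runs a "top products by revenue" join on a tiny in-memory SQLite database. SQLite stands in for Impala here, and the table names, column names and rows are assumptions modeled on a typical retail schema, not data from the lab.

```python
import sqlite3

# Toy stand-in for the retail tables; all names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_status TEXT);
CREATE TABLE order_items (
    order_item_order_id INTEGER,
    order_item_product_id INTEGER,
    order_item_subtotal REAL
);
INSERT INTO products VALUES (1, 'Product A'), (2, 'Product B'), (3, 'Product C');
INSERT INTO orders VALUES (10, 'COMPLETE'), (11, 'CANCELED'), (12, 'COMPLETE');
INSERT INTO order_items VALUES
    (10, 1, 100.0), (10, 2, 40.0), (11, 2, 500.0), (12, 1, 50.0), (12, 2, 30.0);
""")

# Sum the subtotals per product, skipping canceled orders, then rank by revenue.
TOP_REVENUE_SQL = """
SELECT p.product_id, p.product_name, r.revenue
FROM products p
INNER JOIN (
    SELECT oi.order_item_product_id, SUM(oi.order_item_subtotal) AS revenue
    FROM order_items oi
    INNER JOIN orders o ON oi.order_item_order_id = o.order_id
    WHERE o.order_status <> 'CANCELED'
    GROUP BY oi.order_item_product_id
) r ON p.product_id = r.order_item_product_id
ORDER BY r.revenue DESC
LIMIT 10
"""

top_products = conn.execute(TOP_REVENUE_SQL).fetchall()
for row in top_products:
    print(row)
```

In this toy data, Product C never appears in the result because it has no order items, which is the same effect that hides a never-purchased product from a revenue ranking.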

Step 4: Correlate Structured and Unstructured Data

The following steps can be used to correlate structured and unstructured data.

 Bulk Upload Data

Use the terminal to execute the following commands from the manager node.
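The upload commands are missing from this copy. A minimal sketch of a bulk upload into HDFS, assuming a hypothetical local log file and target directory, might build the two hadoop fs calls like this:

```python
import shlex


def bulk_upload_commands(local_file, hdfs_dir):
    """Return the commands that create an HDFS directory and copy a local file into it."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],               # create the target directory
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir],  # upload the file
    ]


# Both paths are hypothetical placeholders, not taken from the report.
commands = bulk_upload_commands(
    "/path/to/access.log",
    "/user/hive/warehouse/original_access_logs",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Depending on how the cluster's HDFS permissions are set up, the commands may need to be run as a user with write access to the warehouse directory.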

 Build a table using Hive.
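The table definition is not shown in this copy. Building a table over raw web logs usually means splitting each log line into named fields with a regular expression. The sketch below does that tokenization in plain Python for one line in the common Apache access log layout. The pattern, the field names and the sample line are assumptions for illustration, not the definition used in the lab.

```python
import re

# One named group per field of a combined-format access log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def tokenize(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line, not taken from the lab's log files.
sample = (
    '10.0.0.1 - - [14/Jun/2014:10:30:13 -0400] '
    '"GET /department/fitness/products HTTP/1.1" 200 1120 '
    '"-" "Mozilla/5.0"'
)
fields = tokenize(sample)
print(fields["ip"], fields["url"], fields["status"])
```

Once each line is split into columns this way, the log data can be queried with SQL alongside the structured tables.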

 Query the data using Impala

Questions/Answers

Q.1. What is the fifth-highest revenue-generating product?

A.1. Perfect Fitness Perfect Rip Deck

Q.2. How much revenue does the Nike Men's Dri-FIT polo earn?

A.2. 48185600

Q.3. There is one product that did not show up in the previous result. It seems to be viewed a lot, but never purchased. Why?

A.3. The price labelling had a typo, which was preventing its sale. Once the error was fixed, sales picked up rapidly.
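The reasoning behind Q.3 can be sketched as a comparison between page views taken from the web logs and the set of products that actually sold. The snippet below is a minimal illustration in plain Python; the URLs and counts are made up, and the "/product/" path prefix is an assumption about how product pages are named.

```python
from collections import Counter


def viewed_but_never_purchased(viewed_urls, purchased_urls):
    """Return (url, view_count) pairs for product pages with views but no sales."""
    # Count only product pages, ignoring other site traffic.
    views = Counter(url for url in viewed_urls if "/product/" in url)
    purchased = set(purchased_urls)
    # most_common() yields the pages ordered from most to least viewed.
    return [(url, n) for url, n in views.most_common() if url not in purchased]


# Made-up data for illustration only.
viewed = [
    "/product/1", "/product/2", "/product/2",
    "/product/3", "/product/3", "/product/3", "/home",
]
purchased = ["/product/1", "/product/2"]

suspects = viewed_but_never_purchased(viewed, purchased)
print(suspects)
```

A heavily viewed product with no sales is a signal worth investigating, as with the pricing typo described in A.3.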

References:

Hadoop, A., Cloudera, Inc, & Apache Software Foundation. (2018, October 11). The "Getting Started With Hadoop" tutorial, Exercise 1. Retrieved from https://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-1.html

Hadoop, A., Cloudera, Inc, & Apache Software Foundation. (2018, October 11). The "Getting Started With Hadoop" tutorial, Exercise 2. Retrieved from https://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-2.html
