Lab Assignment #1: Getting Started with Hadoop
Debpriyo Roy
Northeastern University
17/01/2020
Instructor: Daya Rudhramoorthi
Data Management and Big Data
This study source was downloaded by 100000841344363 from CourseHero.com on 06-12-2023 17:50:40 GMT -05:00
https://www.coursehero.com/file/56453733/Debpriyo-Roy-Lab-1docx/
Step 1: Setting up the Cloudera environment
After starting Cloudera, open Cloudera Manager and start the following services:
Step 2: Ingest and Query Structured Relational Data
Enter the following commands in the terminal window:
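The following is a minimal sketch of the kind of terminal command this step runs. It assumes that the step follows the Cloudera tutorial's Sqoop import of the MySQL `retail_db` database into Hive as Parquet files. The host name, credentials, mapper count and warehouse path are assumptions modeled on a QuickStart-style setup, not values recorded in this report. By default the script only prints the command.

```python
import shlex
import subprocess


def build_sqoop_import(
    host: str = "quickstart.cloudera",   # assumed MySQL host name
    database: str = "retail_db",
    username: str = "retail_dba",        # assumed credentials
    password: str = "cloudera",
    mappers: int = 1,
    warehouse_dir: str = "/user/hive/warehouse",
) -> list[str]:
    """Return the argv for importing every table of a MySQL database into Hive.

    All default values are assumptions, not settings taken from the lab.
    """
    return [
        "sqoop", "import-all-tables",
        "-m", str(mappers),                                    # number of parallel map tasks
        "--connect", f"jdbc:mysql://{host}:3306/{database}",   # JDBC URL of the source database
        "--username", username,
        "--password", password,
        "--compression-codec", "snappy",                       # compress the imported files
        "--as-parquetfile",                                    # write tables in Parquet format
        "--warehouse-dir", warehouse_dir,                      # HDFS parent directory for the tables
        "--hive-import",                                       # create matching tables in Hive
    ]


def run(argv: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    print(run(build_sqoop_import()))
```

Passing `--password` on the command line exposes it in the process list. Sqoop's `-P` option prompts for the password instead.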
Step 3: Query the data using Hue’s Impala app
Select Impala as the query editor and use the following commands to query the data.
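The revenue questions at the end of this report depend on ranking products by their summed order-item subtotals, excluding canceled or fraudulent orders. The sketch below illustrates that ranking logic against an in-memory SQLite database. It assumes the `retail_db` table and column names used in the Cloudera tutorial. The rows are invented sample data, not the lab's results. Impala's SQL dialect also differs from SQLite's in some details.

```python
import sqlite3

# Hypothetical sample rows; NOT data from the lab.
PRODUCTS = [(1, "Product A"), (2, "Product B"), (3, "Product C")]
ORDERS = [(10, "COMPLETE"), (11, "CANCELED"), (12, "CLOSED")]
ORDER_ITEMS = [
    # (order_id, product_id, subtotal)
    (10, 1, 100.0),
    (10, 2, 250.0),
    (11, 3, 999.0),   # excluded below: this order was canceled
    (12, 1, 200.0),
]

REVENUE_QUERY = """
SELECT p.product_id, p.product_name, r.revenue
FROM products p
JOIN (
    SELECT oi.order_item_product_id,
           SUM(CAST(oi.order_item_subtotal AS REAL)) AS revenue
    FROM order_items oi
    JOIN orders o ON oi.order_item_order_id = o.order_id
    WHERE o.order_status NOT IN ('CANCELED', 'SUSPECTED_FRAUD')
    GROUP BY oi.order_item_product_id
) r ON p.product_id = r.order_item_product_id
ORDER BY r.revenue DESC
LIMIT ?
"""


def top_products(limit: int = 10) -> list[tuple[int, str, float]]:
    """Load the sample rows and return the highest-revenue products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER, product_name TEXT);
        CREATE TABLE orders (order_id INTEGER, order_status TEXT);
        CREATE TABLE order_items (order_item_order_id INTEGER,
                                  order_item_product_id INTEGER,
                                  order_item_subtotal REAL);
    """)
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", ORDER_ITEMS)
    rows = conn.execute(REVENUE_QUERY, (limit,)).fetchall()
    conn.close()
    return rows


for rank, (pid, name, revenue) in enumerate(top_products(), start=1):
    print(rank, name, revenue)
```

For a question such as Q.1, the n-th ranked product is row n of this result. You can also select it directly by replacing the limit with `LIMIT 1 OFFSET n-1`.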
Step 4: Correlate Structured and Unstructured Data
The following steps can be used to correlate structured and unstructured data.
Bulk Upload Data
Use the terminal to execute the following commands from the manager node.
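A minimal sketch of the bulk upload follows. It assumes the layout used in the Cloudera tutorial: a raw web-server log at `/opt/examples/log_files/access.log.2` is copied, as the `hdfs` user, into the HDFS directory `/user/hive/warehouse/original_access_logs`. Both paths are assumptions. The script prints the commands and runs them only when `dry_run` is turned off.

```python
import shlex
import subprocess

LOCAL_LOG = "/opt/examples/log_files/access.log.2"        # assumed local path
HDFS_DIR = "/user/hive/warehouse/original_access_logs"    # assumed HDFS target


def upload_commands(local_file: str = LOCAL_LOG, hdfs_dir: str = HDFS_DIR) -> list[list[str]]:
    """Return the HDFS commands for the bulk upload, in execution order."""
    return [
        # create the target directory in HDFS as the hdfs superuser
        ["sudo", "-u", "hdfs", "hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        # copy the local log file into that directory
        ["sudo", "-u", "hdfs", "hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir],
        # list the directory to confirm that the file arrived
        ["hadoop", "fs", "-ls", hdfs_dir],
    ]


def main(dry_run: bool = True) -> None:
    for argv in upload_commands():
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    main()
```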
Build a table using Hive.
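Building the Hive table over the uploaded logs usually means declaring an external table with a regular-expression SerDe, which splits each log line into columns. The Python sketch below mirrors that parsing step on one hypothetical line in the Apache combined log format. The regex and the column names (`ip`, `date`, `method`, `url`, and so on) are assumptions, not the exact table definition used in this lab. `fullmatch` is used because a regex SerDe requires the whole line to match.

```python
import re
from typing import Optional

# Nine capture groups, one per column of the hypothetical Hive table.
LOG_PATTERN = re.compile(
    r'([^ ]*) - - \[([^\]]*)\] "([^ ]*) ([^ ]*) ([^ ]*)" (\d*) (\d*) "([^"]*)" "([^"]*)"'
)
COLUMNS = ["ip", "date", "method", "url", "http_version",
           "status", "size", "referer", "user_agent"]


def parse_line(line: str) -> Optional[dict[str, str]]:
    """Split one access-log line into named columns, or return None if it does not match."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None   # a regex SerDe would produce NULL columns for such a row
    return dict(zip(COLUMNS, match.groups()))


# Hypothetical log line, NOT taken from the lab data.
sample = ('192.0.2.10 - - [14/Jun/2014:10:30:13 -0400] '
          '"GET /department/fitness/product/Example%20Item HTTP/1.1" '
          '200 1175 "-" "Mozilla/5.0"')
print(parse_line(sample))
```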
Query the data using Impala
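This query counts how often each product page was requested. The Python sketch below performs the same aggregation, counting URLs that contain `/product/` with `collections.Counter`. The URL list is hypothetical. In Impala, the equivalent is assumed here to be a `GROUP BY url ... ORDER BY count(*) DESC` query over the parsed log table.

```python
from collections import Counter


def product_views(urls: list[str]) -> list[tuple[str, int]]:
    """Count requests per product URL, most viewed first."""
    counts = Counter(url for url in urls if "/product/" in url)
    return counts.most_common()


# Hypothetical request URLs, NOT data from the lab.
urls = [
    "/department/fitness/product/Item%20A",
    "/department/fitness/product/Item%20B",
    "/department/fitness/product/Item%20A",
    "/department/fitness",                  # not a product page, so it is ignored
]
for url, views in product_views(urls):
    print(views, url)
```

Tables created through Hive generally become visible in Impala only after an `INVALIDATE METADATA` statement.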
Questions/Answers
Q.1. What is the 5th most revenue generating product?
A.1. Perfect Fitness Perfect Rip Deck
Q.2. How much revenue does the Nike Men's Dri-FIT polo earn?
A.2. 48185600
Q.3. There is one product that did not show up in the previous result. It seems to be viewed a lot,
but never purchased. Why?
A.3. The product's price label contained a typo that was preventing its sale. Once the error was
fixed, sales picked up rapidly.
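Q.3 comes down to comparing two query results: the most-viewed product pages and the revenue ranking. A minimal sketch of that comparison follows. The product names are hypothetical stand-ins for the two result sets.

```python
def viewed_not_purchased(viewed: list[str], purchased: list[str]) -> list[str]:
    """Return viewed products that never appear among purchases, in view order."""
    bought = set(purchased)   # set membership makes each lookup O(1)
    return [name for name in viewed if name not in bought]


# Hypothetical stand-ins for the two query results.
top_viewed = ["Item A", "Item B", "Item C"]
top_revenue = ["Item A", "Item C"]
print(viewed_not_purchased(top_viewed, top_revenue))
```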
References:
Apache Hadoop, Cloudera, Inc., & Apache Software Foundation. (2018, October 11). The "Getting Started With Hadoop" tutorial, Exercise 1. Retrieved from https://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-1.html
Apache Hadoop, Cloudera, Inc., & Apache Software Foundation. (2018, October 11). The "Getting Started With Hadoop" tutorial, Exercise 2. Retrieved from https://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-2.html