Answers

The document provides a step-by-step guide for setting up a Hadoop environment using Docker on macOS, including pulling the Docker image, running the container, and accessing its shell. It outlines tasks such as HDFS operations, installing and configuring Hive and HBase, running MapReduce jobs, and executing Pig scripts for data analysis. Each task includes specific commands and sub-steps to ensure successful execution within the Docker container environment.

Alright, let's go step-by-step, from pulling and running the Docker container for Hadoop to executing all 8 questions. I'll guide you through every step, explaining which directory you need to be in and the exact commands to run.

1. Pull and Run the Docker Container for Hadoop

First, let's get your Hadoop environment running in Docker (since we’re doing this on macOS).

Step 1.1: Pull the Docker Image

You can build a Hadoop image yourself, but if you're new to Hadoop setup, I recommend using a pre-configured community image such as sequenceiq/hadoop-docker; it will save you time.

docker pull sequenceiq/hadoop-docker:2.7.1

Step 1.2: Run the Docker Container

Now, let's run the container, which will automatically set up a small Hadoop cluster with HDFS.

docker run -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -d

Step 1.3: Access the Container’s Shell

Once the container is up, open a new terminal on your Mac and attach a shell to the running container:

docker exec -it <container_id> bash
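
If you don't have the container ID handy, list the running containers first; the value in the CONTAINER ID column is what you pass to docker exec:

docker ps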

2. Perform the Hadoop Tasks (Questions)

Let’s break down each task one by one now that your Docker container is up and running.

Question 1: HDFS Operations

Sub-steps:

1. Create directories in HDFS: Use the following command to create a directory in HDFS:

hdfs dfs -mkdir /user/<username>/mydir

2. Upload files from local to HDFS: To upload a file from your local system to HDFS:

hdfs dfs -put /localpath/filename /user/<username>/mydir

3. Display the file: To display the content of the file you uploaded:

hdfs dfs -cat /user/<username>/mydir/filename

4. Delete the file: To delete the file from HDFS:

hdfs dfs -rm /user/<username>/mydir/filename
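
As a quick sanity check, here is the same sequence with concrete made-up names; the user root and the local file /tmp/sample.txt are placeholders, so substitute whatever actually exists in your container:

hdfs dfs -mkdir -p /user/root/mydir      # -p also creates missing parent directories

echo "hello hdfs" > /tmp/sample.txt      # create a small local test file

hdfs dfs -put /tmp/sample.txt /user/root/mydir

hdfs dfs -cat /user/root/mydir/sample.txt

hdfs dfs -rm /user/root/mydir/sample.txt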

Question 2: Install Hive and Configure with Hadoop


Hive allows you to query HDFS data using SQL-like syntax. In your Docker container, you may need to
install Hive and configure it.

Sub-steps:

1. Install Hive (if not installed):

If Hive is not already installed in your Docker container, follow the installation steps:

# Download Hive

wget http://apache.mirror.digitalpacific.com.au/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz

tar -xvzf apache-hive-2.3.7-bin.tar.gz

mv apache-hive-2.3.7-bin /opt/hive

2. Configure Hive with Hadoop: Set HADOOP_HOME and HIVE_HOME (adjust HADOOP_HOME to wherever Hadoop actually lives in your container; run which hadoop or echo $HADOOP_HOME to check):

export HADOOP_HOME=/opt/hadoop

export HIVE_HOME=/opt/hive

export PATH=$HIVE_HOME/bin:$PATH
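
With Hive 2.x you also need to initialize the metastore schema once before the first start. Assuming the default embedded Derby metastore (no external database configured), run:

schematool -dbType derby -initSchema

Derby creates a metastore_db folder in the current working directory, so run this (and later the hive command) from a directory where that is acceptable.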

3. Start Hive:

To start Hive in the container:

hive

Then create a Hive table, query it, and calculate an average.

CREATE TABLE my_table (id INT, name STRING);

INSERT INTO my_table VALUES (1, 'Hemanth'), (2, 'John');

SELECT * FROM my_table;

SELECT AVG(id) FROM my_table;
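
If you also want to query a file that is already sitting in HDFS (usually the point of pairing Hive with Hadoop), a minimal sketch looks like this. The table name people, the path /user/root/mydir, and the comma-separated id,name layout are all assumptions to adjust to your own data:

hive -e "
CREATE EXTERNAL TABLE people (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/root/mydir';
SELECT AVG(id) FROM people;
"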

Question 3: Word Count Mapper Program

The code I gave you earlier in WordCountMapper.java, WordCountReducer.java, and WordCountDriver.java should be compiled and executed.

Sub-steps:

1. Compile Java files:

javac -classpath `hadoop classpath` -d /path/to/output/ WordCountMapper.java WordCountReducer.java WordCountDriver.java

2. Create JAR file:

jar -cvf wordcount.jar -C /path/to/output/ .

3. Run the MapReduce job (assuming your input file is in HDFS):

hadoop jar wordcount.jar WordCountDriver /input_path /output_path
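
Two practical details: MapReduce refuses to start if the output directory already exists, and if your driver class is declared inside a package you must pass its fully qualified name to hadoop jar. A rough end-to-end sketch, assuming the three .java files are in your current directory inside the container (the classes directory name is an arbitrary choice):

mkdir -p classes

javac -classpath `hadoop classpath` -d classes WordCountMapper.java WordCountReducer.java WordCountDriver.java

jar -cvf wordcount.jar -C classes .

hdfs dfs -rm -r -f /output_path      # clear any previous output so the job can start

hadoop jar wordcount.jar WordCountDriver /input_path /output_path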

Question 4: Run Word Count MapReduce (Steps)

This is more about the process than just the code. Follow these steps:

1. Format HDFS:

Format HDFS before running MapReduce. Note: in the pre-configured sequenceiq container, HDFS is already formatted and the daemons are already running, so you can normally skip steps 1 and 2 there; they apply when you have set Hadoop up yourself.

hdfs namenode -format

2. Start Hadoop Services:

Start the services necessary for Hadoop and HDFS:

start-dfs.sh

start-yarn.sh

3. Create an Input Folder in HDFS:

Create the directory where your input data will reside:

hdfs dfs -mkdir /input
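
The job needs something to count, so upload at least one text file into that folder before running it; the local path /tmp/sample.txt below is just a placeholder for any file in the container:

hdfs dfs -put /tmp/sample.txt /input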

4. Compile and Execute the Job:

The compilation and execution steps are the same as in Question 3:

hadoop jar wordcount.jar WordCountDriver /input /output

5. View Results:

After running the job, view the output:

hdfs dfs -cat /output/part-r-00000

6. Clean Up (Remove Directories):

Delete the input and output directories:

hdfs dfs -rm -r /input /output

Question 5: Upload File to HDFS and Run WordCount MapReduce

1. Upload File to HDFS:

Same as in Question 1:

hdfs dfs -put /localpath/filename /user/<username>/input

2. Compile and Execute WordCount Program:

Again, follow the steps in Question 4 to run the job, making sure the input path you pass to hadoop jar matches the directory you uploaded the file to.

3. Verify Output:

hdfs dfs -cat /output/part-r-00000

Question 6: Install HBase and Create Table

HBase is a distributed NoSQL database that runs on top of HDFS.

1. Install HBase:

If HBase is not installed, install it like this:

wget https://downloads.apache.org/hbase/2.4.9/hbase-2.4.9-bin.tar.gz

tar -xvzf hbase-2.4.9-bin.tar.gz

mv hbase-2.4.9 /opt/hbase

2. Configure HBase:

Set up HBase by configuring hbase-site.xml.
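
As a minimal sketch of that configuration, assuming a simple standalone setup that keeps HBase data on the container's local filesystem (the paths below are arbitrary choices, not requirements), you could write the file and start HBase like this:

cat > /opt/hbase/conf/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hbase/zookeeper</value>
  </property>
</configuration>
EOF

/opt/hbase/bin/start-hbase.sh      # requires JAVA_HOME to be set (export it or set it in conf/hbase-env.sh)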

3. Create a Table and Insert Data:

hbase shell

create 'employee', 'info'

put 'employee', '1', 'info:name', 'John'

put 'employee', '2', 'info:name', 'Alice'

scan 'employee'

4. Update and Delete Data (deleteall removes the whole row; delete would require naming a specific column):

put 'employee', '1', 'info:name', 'Updated John'

deleteall 'employee', '2'

scan 'employee'

Question 7: Pig Latin Script for Max Temperature

Pig is a high-level platform for Hadoop. Here's a simple Pig script to find the maximum temperature
per year:

1. Create Pig Script (max_temp.pig):

data = LOAD '/data/temperature.csv' USING PigStorage(',') AS (year:int, temp:int);

grouped = GROUP data BY year;

max_temp = FOREACH grouped GENERATE group AS year, MAX(data.temp) AS max_temp;

DUMP max_temp;

2. Run the Script:

pig max_temp.pig
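
If the input file does not exist yet, create a tiny sample and upload it to HDFS before running the script; the values below are made up to match the (year, temp) schema in the LOAD statement:

printf "1950,38\n1950,41\n1951,36\n" > /tmp/temperature.csv

hdfs dfs -mkdir -p /data

hdfs dfs -put /tmp/temperature.csv /data/temperature.csv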

Question 8: Pig Latin Script for Filtering, Grouping, and Revenue Calculation

1. Create Pig Script (revenue_analysis.pig):

data = LOAD '/data/sales.csv' USING PigStorage(',') AS (product:chararray, revenue:int);

filtered = FILTER data BY revenue > 1000;

grouped = GROUP filtered BY product;

revenue = FOREACH grouped GENERATE group AS product, SUM(filtered.revenue) AS total_revenue;

DUMP revenue;

2. Run the Script:

pig revenue_analysis.pig
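
If you just want to test the script logic without touching HDFS, Pig's local mode reads the same paths from the local filesystem instead:

pig -x local revenue_analysis.pig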

Directory Information:

• You don't need to change directories for each of these commands in HDFS. You'll mostly be using absolute paths or relative paths under /user/<username>/.

• Just ensure you're in the right local directory, where your input files are located and where you want to store outputs.

Final Thoughts

This should give you step-by-step guidance for the 8 questions, starting from Docker setup to
Hadoop commands. You can execute all these tasks in the Docker container environment.
