Lab Programs on HDFS and MapReduce

Uploaded by mithun d'souza

II MCA

BIG DATA ANALYTICS LAB


Part B LAB-EXERCISES

I. Basic HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Create a directory in HDFS and verify its creation
• Upload a PDF file and a text file to the folder without using the -put command
• Read the first few lines and the last few lines of the text file
• Edit the text file and display all the contents
• Copy the file to a different location within HDFS
• Download the text file from HDFS folder to your local directory
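Assuming a running Hadoop installation with the `hdfs` client on the PATH, the tasks above could be carried out roughly as follows (all directory and file names are placeholders):

```shell
# Create a directory in HDFS and verify
hdfs dfs -mkdir -p /hdfs_folder
hdfs dfs -ls /

# Upload a PDF and a text file without -put
# (-copyFromLocal is the usual alternative)
hdfs dfs -copyFromLocal notes.pdf sample.txt /hdfs_folder/

# First and last few lines of the text file
# (on older Hadoop releases without -head, use: hdfs dfs -cat ... | head)
hdfs dfs -head /hdfs_folder/sample.txt
hdfs dfs -tail /hdfs_folder/sample.txt

# HDFS files are not edited in place: edit locally, re-upload, display
nano sample.txt
hdfs dfs -copyFromLocal -f sample.txt /hdfs_folder/
hdfs dfs -cat /hdfs_folder/sample.txt

# Copy within HDFS
hdfs dfs -cp /hdfs_folder/sample.txt /backup/sample.txt

# Download to the local directory
hdfs dfs -get /hdfs_folder/sample.txt ./sample_local.txt
```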

II. Advanced HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Upload a folder with multiple files to HDFS and verify
• Display the space used by all files in a directory
• Append data to an existing text file
• Upload files to demonstrate the usage of the -put and -copyToLocal commands
• Move a file to a different location within HDFS
• Download a file from HDFS folder to your local directory
• Delete the downloaded file from HDFS
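Under the same assumption of a working cluster, a sketch of the advanced operations (again with placeholder names):

```shell
# Upload a whole folder and verify recursively
hdfs dfs -copyFromLocal mydir /mydir
hdfs dfs -ls -R /mydir

# Space used by each file (human-readable)
hdfs dfs -du -h /mydir

# Append local data to an existing HDFS file
hdfs dfs -appendToFile extra.txt /mydir/file1.txt

# -put (local -> HDFS) and -copyToLocal (HDFS -> local)
hdfs dfs -put report.txt /mydir/
hdfs dfs -copyToLocal /mydir/report.txt ./report_copy.txt

# Move within HDFS
hdfs dfs -mv /mydir/file1.txt /archive/file1.txt

# Download a file, then delete the HDFS copy
hdfs dfs -get /archive/file1.txt .
hdfs dfs -rm /archive/file1.txt
```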

III. Word Count

Count the frequency of each word in a text file.

• Prepare a text input file with sample data (Input.txt)
• Upload the file to HDFS in the /input/ directory
• Write a Mapper function to split lines into words and emit each word with a count of 1
• Write a Reducer function to sum up all counts for each word
• Package the MapReduce code into a JAR file
• Run the program using the Hadoop command
• Save the output to a specified HDFS directory
• View the results in the output directory and on the web interface
• Download the output from HDFS to the local system
• Verify the correctness of the word counts by displaying the contents of the output file
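Before packaging the Java job, the Mapper/Reducer logic can be previewed locally with a shell pipeline: `tr` plays the Mapper (one word per line), `sort` stands in for the shuffle, and `uniq -c` is the Reducer (summing counts). This is only an illustration of the data flow, not the Hadoop job itself; the jar and class names in the comment are placeholders.

```shell
# Sample input, as in the exercise
printf 'hello world\nhello hadoop\n' > Input.txt

# map (split into words) -> shuffle (sort) -> reduce (sum per word)
tr -s ' ' '\n' < Input.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# hadoop  1
# hello   2
# world   1

# The packaged job would then run on the cluster along these lines:
# hdfs dfs -put Input.txt /input/
# hadoop jar wc.jar WordCount /input /output
# hdfs dfs -cat /output/part-r-00000
```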
IV. Temperature Analysis

Find the maximum temperature for each year in a weather dataset.

• Prepare a dataset with weather records in the format Year Temperature
• Upload the dataset to HDFS in the /input/ directory
• Write a Mapper function to extract the year and temperature from each record
• Emit the year as the key and the temperature as the value
• Write a Reducer function to calculate the maximum temperature for each year
• Package the MapReduce code into a JAR file
• Execute the program on the Hadoop cluster
• Save the results to an HDFS directory (e.g., /output/temp_analysis)
• Download the output from HDFS to the local system
• Verify the correctness by displaying the contents of the output file
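The per-year maximum logic can likewise be checked locally before writing the Java job; here `awk` keeps a running maximum per year, exactly the work the Reducer would do after grouping by year. The sample records are illustrative only.

```shell
# Records in "Year Temperature" format
printf '1950 22\n1950 34\n1951 28\n1951 19\n' > weather.txt

# Reducer preview: maximum temperature per year
awk '{ if (!($1 in max) || $2+0 > max[$1]) max[$1] = $2 } END { for (y in max) print y, max[y] }' weather.txt | sort
# 1950 34
# 1951 28
```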

V. Character Frequency Count using MapReduce

Count the frequency of each character in a text file using the Hadoop MapReduce framework.

• Prepare Input Data: Create a text file with sample content (e.g., input.txt)
• Upload Input to HDFS: Upload the input file to HDFS
• Write Mapper Class: Implement a Mapper that reads characters and emits each character with
a count of 1
• Write Reducer Class: Implement a Reducer that sums up counts for each character
• Write Driver Class: Set up the job, define input/output paths, and specify Mapper/Reducer
classes
• Compile Java Code: Compile the Mapper, Reducer, and Driver classes into a JAR file
• Run the MapReduce Job: Execute the job with hadoop command on the input file
• Download Output: Download the result from HDFS to the local system and verify
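As with word count, the character-frequency data flow can be previewed locally: `fold -w1` emits one character per line (the Mapper's output), `sort` groups, and `uniq -c` sums (the Reducer). The sample content is a placeholder.

```shell
# Sample input
printf 'abba\n' > input.txt

# map (one char per line) -> shuffle (sort) -> reduce (count per char)
fold -w1 < input.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# a  2
# b  2
```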
