Lab Programs on HDFS and MapReduce

Uploaded by mithun d'souza

II MCA

BIG DATA ANALYTICS LAB

Part B: LAB EXERCISES

I. Basic HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Create a directory in HDFS (for example, HDFS_folder) and verify its creation
• Upload a PDF file and a text file to the folder without using the put command
• Read the first few lines and the last few lines of the text file
• Edit the text file and display all its contents
• Copy the file to a different location within HDFS
• Download the text file from the HDFS folder to your local directory
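The tasks above can be sketched as a command sequence (all paths and file names here are illustrative; `-head` requires Hadoop 3.x, on older versions pipe `-cat` through `head`):

```shell
# 1. Create a directory and verify
hdfs dfs -mkdir -p /user/student/HDFS_folder
hdfs dfs -ls /user/student

# 2. Upload a PDF and a text file without using -put
hdfs dfs -copyFromLocal notes.pdf sample.txt /user/student/HDFS_folder/

# 3. First and last few lines of the text file
hdfs dfs -head /user/student/HDFS_folder/sample.txt
hdfs dfs -tail /user/student/HDFS_folder/sample.txt

# 4. HDFS files cannot be edited in place: edit the local copy,
#    re-upload with -f (overwrite), then display everything
hdfs dfs -copyFromLocal -f sample.txt /user/student/HDFS_folder/sample.txt
hdfs dfs -cat /user/student/HDFS_folder/sample.txt

# 5. Copy within HDFS
hdfs dfs -cp /user/student/HDFS_folder/sample.txt /user/student/backup_sample.txt

# 6. Download to the local directory
hdfs dfs -get /user/student/HDFS_folder/sample.txt ./sample_local.txt
```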

II. Advanced HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Upload a folder with multiple files to HDFS and verify
• Display the space used by all files in a directory
• Append data to an existing text file
• Upload and retrieve files to demonstrate the -put and -copyToLocal commands
• Move a file to a different location within HDFS
• Download a file from the HDFS folder to your local directory
• After downloading, delete the file from HDFS
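One possible sequence for these tasks (paths and file names are illustrative; `-appendToFile` requires append support, which is enabled by default on recent Hadoop versions):

```shell
# 1. Upload a whole local folder and verify recursively
hdfs dfs -put ./dataset /user/student/dataset
hdfs dfs -ls -R /user/student/dataset

# 2. Space used by each file in the directory (add -s for a single total)
hdfs dfs -du -h /user/student/dataset

# 3. Append local data to an existing HDFS file
hdfs dfs -appendToFile more.txt /user/student/dataset/file1.txt

# 4. -put uploads local -> HDFS; -copyToLocal downloads HDFS -> local
hdfs dfs -put report.txt /user/student/dataset/
hdfs dfs -copyToLocal /user/student/dataset/report.txt ./report_copy.txt

# 5. Move within HDFS (destination directory must exist)
hdfs dfs -mkdir -p /user/student/archive
hdfs dfs -mv /user/student/dataset/file1.txt /user/student/archive/file1.txt

# 6. Download, then delete the HDFS copy
hdfs dfs -get /user/student/archive/file1.txt .
hdfs dfs -rm /user/student/archive/file1.txt
```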

III. Word Count

Count the frequency of each word in a text file.

• Prepare a text input file with sample data (Input.txt)
• Upload the file to HDFS in the /input/ directory
• Write a Mapper function to split lines into words and emit each word with a count of 1
• Write a Reducer function to sum the counts for each word
• Package the MapReduce code into a JAR file
• Run the program using the Hadoop command
• Save the output to a specified HDFS directory
• View the results in the output directory and on the web interface
• Download the output from HDFS to the local system
• Verify the correctness of the word counts by displaying the contents of the output file
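The program itself is normally written as Java classes packaged into a JAR. Before writing the Java version, the map/reduce logic can be sanity-checked with a Hadoop Streaming sketch in Python (file name, HDFS paths, and the streaming jar location are assumptions that vary by installation):

```python
#!/usr/bin/env python3
"""Word count mapper/reducer logic in one file (wordcount.py).

Illustrative cluster invocation via Hadoop Streaming:
  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /input/Input.txt -output /output/wordcount \
      -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"
"""
import sys
from itertools import groupby

def mapper(lines):
    # Split each line into words and emit (word, 1) for every word.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Streaming delivers pairs sorted by key; group by word and sum counts.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for w, c in mapper(sys.stdin):
            print(f"{w}\t{c}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for w, c in reducer((k, int(v)) for k, v in rows):
            print(f"{w}\t{c}")
```

Chaining `reducer(mapper(...))` locally on a few sample lines gives the same counts the cluster job should produce, which makes step-by-step verification easy.
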
IV. Temperature Analysis

Find the maximum temperature for each year in a weather dataset.

• Prepare a dataset with weather records in the format "Year Temperature"
• Upload the dataset to HDFS in the /input/ directory
• Write a Mapper function to extract the year and temperature from each record
• Emit the year as the key and the temperature as the value
• Write a Reducer function to calculate the maximum temperature for each year
• Package the MapReduce code into a JAR file
• Execute the program on the Hadoop cluster
• Save the results to an HDFS directory (e.g., /output/temp_analysis)
• Download the output from HDFS to the local system
• Verify the correctness by displaying the contents of the output file
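As with word count, the mapper/reducer logic can be checked in a Streaming-style Python sketch before packaging the Java version (record format follows the "Year Temperature" layout above; file names are illustrative):

```python
#!/usr/bin/env python3
"""Max-temperature per year: emit (year, temp), reduce with max()."""
import sys
from itertools import groupby

def mapper(lines):
    # Each record: "Year Temperature", e.g. "1950 22". Skip malformed rows.
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            year, temp = parts
            try:
                yield year, int(temp)
            except ValueError:
                continue

def reducer(pairs):
    # Group the sorted (year, temp) pairs and keep the maximum per year.
    for year, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield year, max(temp for _, temp in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for y, t in mapper(sys.stdin):
            print(f"{y}\t{t}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for y, t in reducer((k, int(v)) for k, v in rows):
            print(f"{y}\t{t}")
```
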

V. Character Frequency Count using MapReduce

Count the frequency of each character in a text file using the Hadoop MapReduce framework.

• Prepare Input Data: Create a text file with sample content (e.g., input.txt)
• Upload Input to HDFS: Upload the input file to HDFS
• Write Mapper Class: Implement a Mapper that reads characters and emits each character with
a count of 1
• Write Reducer Class: Implement a Reducer that sums up counts for each character
• Write Driver Class: Set up the job, define input/output paths, and specify Mapper/Reducer
classes
• Compile Java Code: Compile the Mapper, Reducer, and Driver classes into a JAR file
• Run the MapReduce Job: Execute the job with the hadoop command on the input file
• Download Output: Download the result from HDFS to the local system and verify
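The exercise calls for Java Mapper, Reducer, and Driver classes compiled into a JAR; a compact Python sketch of the same map/reduce logic (names illustrative) is useful for predicting the expected character counts before writing the Java code:

```python
#!/usr/bin/env python3
"""Character-frequency map/reduce logic, Streaming-style sketch."""
import sys
from itertools import groupby

def mapper(lines):
    # Emit (character, 1) for every non-whitespace character.
    for line in lines:
        for ch in line:
            if not ch.isspace():
                yield ch, 1

def reducer(pairs):
    # Group the sorted (char, 1) pairs and sum the counts per character.
    for ch, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield ch, sum(count for _, count in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for ch, c in mapper(sys.stdin):
            print(f"{ch}\t{c}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for ch, c in reducer((k, int(v)) for k, v in rows):
            print(f"{ch}\t{c}")
```
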
