Lab Programs on HDFS and MapReduce

Uploaded by mithun d'souza

II MCA

BIG DATA ANALYTICS LAB

Part B: LAB EXERCISES

I. Basic HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Create a directory in HDFS (for example, HDFS_folder) and verify its creation
• Upload a PDF file and a text file to the folder without using the put command
• Read the first few lines and the last few lines of the text file
• Edit the text file and display all its contents
• Copy the file to a different location within HDFS
• Download the text file from the HDFS folder to your local directory
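The tasks above can be sketched as a command sequence (all paths and file names here are illustrative; `-head` requires Hadoop 3.x, on older versions pipe `-cat` through `head`):

```shell
# 1. Create a directory and verify
hdfs dfs -mkdir -p /user/student/HDFS_folder
hdfs dfs -ls /user/student

# 2. Upload a PDF and a text file without using -put
hdfs dfs -copyFromLocal notes.pdf sample.txt /user/student/HDFS_folder/

# 3. First and last few lines of the text file
hdfs dfs -head /user/student/HDFS_folder/sample.txt
hdfs dfs -tail /user/student/HDFS_folder/sample.txt

# 4. HDFS files cannot be edited in place: edit the local copy,
#    re-upload with -f (overwrite), then display everything
hdfs dfs -copyFromLocal -f sample.txt /user/student/HDFS_folder/sample.txt
hdfs dfs -cat /user/student/HDFS_folder/sample.txt

# 5. Copy within HDFS
hdfs dfs -cp /user/student/HDFS_folder/sample.txt /user/student/backup_sample.txt

# 6. Download to the local directory
hdfs dfs -get /user/student/HDFS_folder/sample.txt ./sample_local.txt
```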

II. Advanced HDFS Operations

Perform the following tasks by interacting with Hadoop Distributed File System (HDFS).

• Upload a folder with multiple files to HDFS and verify
• Display the space used by all files in a directory
• Append data to an existing text file
• Upload and retrieve files to demonstrate the -put and -copyToLocal commands
• Move a file to a different location within HDFS
• Download a file from the HDFS folder to your local directory
• After downloading, delete the file from HDFS
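One possible sequence for these tasks (paths and file names are illustrative; `-appendToFile` requires append support, which is enabled by default on recent Hadoop versions):

```shell
# 1. Upload a whole local folder and verify recursively
hdfs dfs -put ./dataset /user/student/dataset
hdfs dfs -ls -R /user/student/dataset

# 2. Space used by each file in the directory (add -s for a single total)
hdfs dfs -du -h /user/student/dataset

# 3. Append local data to an existing HDFS file
hdfs dfs -appendToFile more.txt /user/student/dataset/file1.txt

# 4. -put uploads local -> HDFS; -copyToLocal downloads HDFS -> local
hdfs dfs -put report.txt /user/student/dataset/
hdfs dfs -copyToLocal /user/student/dataset/report.txt ./report_copy.txt

# 5. Move within HDFS (destination directory must exist)
hdfs dfs -mkdir -p /user/student/archive
hdfs dfs -mv /user/student/dataset/file1.txt /user/student/archive/file1.txt

# 6. Download, then delete the HDFS copy
hdfs dfs -get /user/student/archive/file1.txt .
hdfs dfs -rm /user/student/archive/file1.txt
```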

III. Word Count

Count the frequency of each word in a text file.

• Prepare a text input file with sample data (Input.txt)
• Upload the file to HDFS in the /input/ directory
• Write a Mapper function to split lines into words and emit each word with a count of 1
• Write a Reducer function to sum the counts for each word
• Package the MapReduce code into a JAR file
• Run the program using the Hadoop command
• Save the output to a specified HDFS directory
• View the results in the output directory and on the web interface
• Download the output from HDFS to the local system
• Verify the correctness of the word counts by displaying the contents of the output file
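The program itself is normally written as Java classes packaged into a JAR. Before writing the Java version, the map/reduce logic can be sanity-checked with a Hadoop Streaming sketch in Python (file name, HDFS paths, and the streaming jar location are assumptions that vary by installation):

```python
#!/usr/bin/env python3
"""Word count mapper/reducer logic in one file (wordcount.py).

Illustrative cluster invocation via Hadoop Streaming:
  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /input/Input.txt -output /output/wordcount \
      -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"
"""
import sys
from itertools import groupby

def mapper(lines):
    # Split each line into words and emit (word, 1) for every word.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Streaming delivers pairs sorted by key; group by word and sum counts.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for w, c in mapper(sys.stdin):
            print(f"{w}\t{c}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for w, c in reducer((k, int(v)) for k, v in rows):
            print(f"{w}\t{c}")
```

Chaining `reducer(mapper(...))` locally on a few sample lines gives the same counts the cluster job should produce, which makes step-by-step verification easy.
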
IV. Temperature Analysis

Find the maximum temperature for each year in a weather dataset.

• Prepare a dataset with weather records in the format "Year Temperature"
• Upload the dataset to HDFS in the /input/ directory
• Write a Mapper function to extract the year and temperature from each record
• Emit the year as the key and the temperature as the value
• Write a Reducer function to calculate the maximum temperature for each year
• Package the MapReduce code into a JAR file
• Execute the program on the Hadoop cluster
• Save the results to an HDFS directory (e.g., /output/temp_analysis)
• Download the output from HDFS to the local system
• Verify the correctness by displaying the contents of the output file
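As with word count, the mapper/reducer logic can be checked in a Streaming-style Python sketch before packaging the Java version (record format follows the "Year Temperature" layout above; file names are illustrative):

```python
#!/usr/bin/env python3
"""Max-temperature per year: emit (year, temp), reduce with max()."""
import sys
from itertools import groupby

def mapper(lines):
    # Each record: "Year Temperature", e.g. "1950 22". Skip malformed rows.
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            year, temp = parts
            try:
                yield year, int(temp)
            except ValueError:
                continue

def reducer(pairs):
    # Group the sorted (year, temp) pairs and keep the maximum per year.
    for year, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield year, max(temp for _, temp in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for y, t in mapper(sys.stdin):
            print(f"{y}\t{t}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for y, t in reducer((k, int(v)) for k, v in rows):
            print(f"{y}\t{t}")
```
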

V. Character Frequency Count using MapReduce

Count the frequency of each character in a text file using the Hadoop MapReduce framework.

• Prepare Input Data: Create a text file with sample content (e.g., input.txt)
• Upload Input to HDFS: Upload the input file to HDFS
• Write Mapper Class: Implement a Mapper that reads characters and emits each character with
a count of 1
• Write Reducer Class: Implement a Reducer that sums up counts for each character
• Write Driver Class: Set up the job, define input/output paths, and specify Mapper/Reducer
classes
• Compile Java Code: Compile the Mapper, Reducer, and Driver classes into a JAR file
• Run the MapReduce Job: Execute the job with the hadoop command on the input file
• Download Output: Download the result from HDFS to the local system and verify
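The exercise calls for Java Mapper, Reducer, and Driver classes compiled into a JAR; a compact Python sketch of the same map/reduce logic (names illustrative) is useful for predicting the expected character counts before writing the Java code:

```python
#!/usr/bin/env python3
"""Character-frequency map/reduce logic, Streaming-style sketch."""
import sys
from itertools import groupby

def mapper(lines):
    # Emit (character, 1) for every non-whitespace character.
    for line in lines:
        for ch in line:
            if not ch.isspace():
                yield ch, 1

def reducer(pairs):
    # Group the sorted (char, 1) pairs and sum the counts per character.
    for ch, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield ch, sum(count for _, count in group)

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for ch, c in mapper(sys.stdin):
            print(f"{ch}\t{c}")
    elif sys.argv[1] == "reduce":
        rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for ch, c in reducer((k, int(v)) for k, v in rows):
            print(f"{ch}\t{c}")
```
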
