
MAPREDUCE TRAINING

1. Introduction
The objective of this practice is to give the trainees a broad overview of the MapReduce
paradigm. Because installing and setting up a Spark infrastructure is difficult, we provide
the trainees with a connection to a preconfigured server.

2. PuTTY Installation
Download the latest version of PuTTY:

https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
Run the installer:

Follow the instructions:


3. PuTTY Configuration
Open the PuTTY configuration:
Set the following value in the Host Name field:

spark.autoritas.net
Select Connection / SSH / Auth in the left menu:

Browse for the spark.ppk key, and accept the security alert:
Once connected to the server, log in using the user: ubuntu

4. Generating the user space


Invent a username by combining words and numbers, without any special characters or
spaces. For example, my name is Kico Rangel and I was born in 1977, so my username
could be: kicorangel77

Enter the following commands in the command line:

./start.sh [your_username]
cd [your_username]

5. Practice with MapReduce


The objective of this practice is to obtain the list of the words contained in the novel, together
with their frequency of occurrence. To do so, we follow the steps below, executing commands
on the command line.

All code and data can be downloaded from the following URLs:

https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/mapreduce/mapper.py
https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/mapreduce/reducer.py
https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/mapreduce/wuthering-heights.txt
https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/mapreduce/wuthering-heights.words.txt
Mapper.py
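
The actual mapper.py is downloaded from the URL above. As a hint of what it does, a minimal
word-count mapper in the Hadoop-streaming style could look like the sketch below (illustrative
only; it may differ from the downloaded file):

#!/usr/bin/env python
# Minimal word-count mapper (sketch): read lines from standard input,
# split them into words and emit one "word<TAB>1" pair per word.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))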

Reducer.py
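
Likewise, the downloaded reducer.py groups the [word, 1] pairs and sums the counts. A
minimal sketch could be the following (illustrative only; it uses a dictionary, so it does not
require sorted input, which matches the pipelines used below):

#!/usr/bin/env python
# Minimal word-count reducer (sketch): read "word<TAB>count" pairs from
# standard input, group them by word and print the summed frequency.
import sys

counts = {}
for line in sys.stdin:
    parts = line.strip().split('\t')
    if len(parts) != 2:
        continue
    word, value = parts
    counts[word] = counts.get(word, 0) + int(value)

for word, total in counts.items():
    print('%s\t%d' % (word, total))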
Let’s run them:

● Showing the wuthering-heights.txt file, which contains a plain-text version of the novel:

cat wuthering-heights.txt

● Counting the number of lines, words and characters:

cat wuthering-heights.txt | wc

The output should be something like the following:

Lines    Words     Characters
4283     118903    684482

● Mapping the job: it decomposes the novel into its words, creating one [word, 1] pair per word occurrence:

cat wuthering-heights.txt | ./mapper.py

● Sorting the mapper output: we can see that the same word appears many times, always
accompanied by the number 1:

cat wuthering-heights.txt | ./mapper.py | sort

● Reducing the job: it reduces the list of words by grouping by word and summing up
the 1s, obtaining the frequency of occurrence of each word:

cat wuthering-heights.txt | ./mapper.py | ./reducer.py

● Sorting the output: we can see the frequency of occurrence of each word:

cat wuthering-heights.txt | ./mapper.py | ./reducer.py | sort

● Redirecting the output to a file and exploring the file:

cat wuthering-heights.txt | ./mapper.py | ./reducer.py | sort > wuthering-heights.words.txt

nano wuthering-heights.words.txt

● Exiting from the editor:

[Ctrl] + X
6. Conclusions
The MapReduce paradigm is based on the divide-and-conquer philosophy: a set of
mappers decomposes the data into small groups and applies a simple operation (e.g. extracting
words and creating [word, 1] pairs), and the reducer regroups the data by performing a simple
joining task on the output of the mappers (e.g. summing up the frequencies).
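
To make the idea concrete, the toy Python snippet below (not part of the training files)
reproduces both stages in memory on a single sentence:

# Illustrative only: the map/reduce idea on a tiny in-memory example.
text = "to be or not to be"

# Map: emit a [word, 1] pair for every word occurrence.
pairs = [(word, 1) for word in text.split()]

# Reduce: group the pairs by word and sum the 1s to obtain frequencies.
frequencies = {}
for word, one in pairs:
    frequencies[word] = frequencies.get(word, 0) + one

print(frequencies)  # e.g. {'to': 2, 'be': 2, 'or': 1, 'not': 1}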
