Analyzing Data with Hadoop
A Weather Dataset
• The data we will use is from the National Climatic Data Center (NCDC, http://www.ncdc.noaa.gov/).
• Weather sensors collect data every hour at many locations
across the globe and gather a large volume of log data, which
is a good candidate for analysis with MapReduce because we
want to process all the data, and the data is semi-structured
and record-oriented.
• The data is stored using a line-oriented ASCII format, in which
each line is a record.
• The format supports a rich set of meteorological elements,
many of which are optional or with variable data lengths.
Analyzing the Data with Hadoop
Map and Reduce
MapReduce works by breaking the processing into two phases: the map phase and the reduce phase.
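The mapper and reducer for the maximum-temperature example are referred to later (as MaxTemperatureMapper and MaxTemperatureReducer) but are not shown on these slides. The following is a minimal sketch of what they might look like, assuming the fixed-width NCDC record layout commonly used in this example (year in columns 15-19, air temperature in columns 87-92, quality code in column 92); treat the offsets and the quality-code filter as assumptions.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
// Mapper: extracts a (year, temperature) pair from each NCDC record line.
// (Each class goes in its own source file in a real project.)
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {  // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
// Reducer: picks the maximum temperature for each year.
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}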
• The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers. It determines the location of the data by communicating with the namenode and uses this to choose a suitable tasktracker for each task.
• Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. The tasktracker carries out the map, shuffle, and reduce operations on the data.
• Each tasktracker continuously reports its status to the jobtracker, including the number of free task slots it has available. If a tasktracker becomes unresponsive, the jobtracker reassigns its work to other nodes.
Daemons of Hadoop
Distributed computing system – MapReduce Framework
Job Tracker:
• Centrally monitors the submitted jobs and controls all processes running on the nodes of the cluster.
Task Tracker:
• Constantly communicates with the Job Tracker, reporting task progress.
Daemons Architecture
Hadoop Server Roles
• Hadoop divides the input to a Map Reduce job into fixed-size
pieces called input splits, or just splits.
• Hadoop creates one map task for each split, which runs the
user defined map function for each record in the split.
• Having many splits means the time taken to process each
split is small compared to the time to process the whole
input.
• So if we are processing the splits in parallel, the processing is
better load-balanced if the splits are small, since a faster
machine will be able to process proportionally more splits
over the course of the job than a slower machine.
• If splits are too small, the overhead of managing the splits and of map task creation begins to dominate the total job execution time.
• For most jobs, a good split size tends to be the size of an HDFS block, 128 MB by default, although this can be changed for the cluster (for all newly created files) or specified when each file is created (see the configuration sketch below).
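The following is a minimal configuration sketch, not taken from these slides, showing where these sizes can be adjusted. It assumes a Hadoop 2.x client and the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat; the class name SplitSizeExample and the 256 MB figure are illustrative assumptions, not recommendations.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Block size applies to files newly created through this configuration;
        // the cluster-wide default is the dfs.blocksize property.
        conf.setLong("dfs.blocksize", 256 * 1024 * 1024L);
        Job job = Job.getInstance(conf, "split size example");
        // Bound the split size seen by FileInputFormat-based jobs.
        FileInputFormat.setMinInputSplitSize(job, 128 * 1024 * 1024L);
        FileInputFormat.setMaxInputSplitSize(job, 256 * 1024 * 1024L);
    }
}
In practice most jobs simply leave the split size equal to the HDFS block size, as noted above.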
• Hadoop does its best to run the map task on a node where the
input data resides in HDFS. This is called the data locality
optimization since it doesn’t use valuable cluster bandwidth.
• Sometimes, however, all three nodes hosting the HDFS block
replicas for a map task’s input split are running other map tasks
so the job scheduler will look for a free map slot on a node in
the same rack as one of the blocks.
• Map tasks write their output to the local disk, not to HDFS.
• This is because map output is intermediate output: it’s processed by reduce tasks to produce the final output, and once the job is complete, the map output can be thrown away.
• So storing it in HDFS with replication would be overkill.
• If the node running the map task fails before the map output
has been consumed by the reduce task, then Hadoop will
automatically rerun the map task on another node to re-create
the map output.
• Reduce tasks don’t have the advantage of data locality; the
input to a single reduce task is normally the output from all
mappers.
• In the present example, we have a single reduce task that is fed
by all of the map tasks. Therefore, the sorted map outputs have
to be transferred across the network to the node where the
reduce task is running, where they are merged and then passed
to the user-defined reduce function.
• The output of the reduce is normally stored in HDFS for
reliability.
The whole data flow with a single reduce task is as follows:
Map Reduce data flow diagram
• The number of reduce tasks is not governed by the size of the
input, but instead is specified independently.
• When there are multiple reducers, the map tasks partition their
output, each creating one partition for each reduce task. There
can be many keys (and their associated values) in each partition,
but the records for any given key are all in a single partition.
• The partitioning can be controlled by a user-defined partitioning function, but normally the default partitioner, which buckets keys using a hash function, works very well (a sketch of a custom partitioner and of setting the number of reducers follows below).
Map Reduce data flow with multiple reduce tasks
• It’s also possible to have zero reduce tasks.
• This can be appropriate when we don’t need the shuffle because the processing can be carried out entirely in parallel.
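As a minimal sketch (not from these slides) of the two points above: the number of reduce tasks is set explicitly on the Job, and a user-defined partitioner can replace the default hash-based one. The class name YearPartitioner is an illustrative assumption; its logic simply mirrors the default HashPartitioner.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
// Illustrative custom partitioner: sends all records for a given key
// (here, a year) to the same reduce task, just as the default does.
public class YearPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
// In the driver:
//   job.setNumReduceTasks(4);                       // number of reducers is chosen explicitly
//   job.setPartitionerClass(YearPartitioner.class); // optional; the default HashPartitioner behaves the same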
Combiner Functions
• Many MapReduce jobs are limited by the bandwidth available on the
cluster, so it pays to minimize the data transferred between map and
reduce tasks.
• Hadoop allows the user to specify a combiner function to be run on the map output, and the combiner function’s output forms the input to the reduce function.
• Because the combiner function is an optimization, Hadoop does not provide a guarantee of how many times it will call it for a particular map output record.
• The contract for the combiner function constrains the type of function
that may be used.
• Suppose that for the maximum temperature example, readings for the year 1950 were
processed by two maps (because they were in different splits). Imagine the first map
produced the output:
(1950, 0)
(1950, 20)
(1950, 10)
and the second produced:
(1950, 25)
(1950, 15)
The reduce function would be called with a list of all the values:
(1950, [0, 20, 10, 25, 15])
with output:
(1950, 25)
• since 25 is the maximum value in the list. We could use a combiner function
that, just like the reduce function, finds the maximum temperature for each
map output.
• The reduce function would then be called with:
(1950, [20, 25])
and would produce the same output as before. More succinctly, we may
express the function calls on the temperature values in this case as follows:
• max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
• Not all functions possess this property. For example, if we were calculating mean temperatures, we couldn’t use the mean as our combiner function, because:
• mean(0, 20, 10, 25, 15) = 14
• but:
• mean(mean(0, 20, 10), mean(25, 15)) = mean(10, 20) = 15
• The combiner function doesn’t replace the reduce function.
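A common workaround, sketched below as an assumption rather than anything shown on these slides, is to make the intermediate values combinable by carrying partial sums and counts instead of means. The class name MeanCombiner and the "sum,count" text encoding are illustrative choices; the final reduce function would divide the accumulated sum by the count.
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
// Combiner that merges partial (sum, count) pairs.
// Each value is assumed to be encoded as the text "partialSum,partialCount".
public class MeanCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (Text value : values) {
            String[] parts = value.toString().split(",");
            sum += Long.parseLong(parts[0]);
            count += Long.parseLong(parts[1]);
        }
        // Emit another (sum, count) pair so the reduce function can still compute the true mean.
        context.write(key, new Text(sum + "," + count));
    }
}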
Specifying a combiner function
• The combiner has the same implementation as the reduce function in MaxTemperatureReducer.
• The only change we need to make is to set the combiner class on the Job.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperatureWithCombiner {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
                    "<output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperatureWithCombiner.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        // The reducer doubles as the combiner, since max is commutative and associative.
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Hadoop Streaming
• Hadoop provides an API to MapReduce that allows us to write our
map and reduce functions in languages other than Java.
• Hadoop Streaming uses Unix standard streams as the interface
between Hadoop and our program, so we can use any language that
can read standard input and write to standard output to write our
MapReduce program.
Word count using Python: Mapper code
#!/usr/bin/env python
import sys

# read lines from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace and split the line into words
    words = line.strip().split()
    # loop over the words array and print each word with the count of 1 to STDOUT;
    # what we output here will be the input for the Reduce step, i.e. the input for reducer.py
    for word in words:
        print '%s\t%s' % (word, 1)
Word count using Python: Reducer code
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# read the entire line from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # splitting the data on the basis of tab we have provided in mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print '%s\t%s' % (current_word, current_count)