MapReduce Types and Formats
By
Dr. K. Venkateswara Rao
Professor CSE
Prepared using O’Reilly – Hadoop: The Definitive Guide; some slides are adapted from Taikyoung Kim’s presentation
MapReduce Types
• MapReduce has a simple model of data processing: inputs and outputs for the
map and reduce functions are key-value pairs.
• The map and reduce functions in Hadoop MapReduce have the following
general form:
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
• In general, the map input key and value types (K1 and V1) are different from
the map output types (K2 and V2).
• The reduce input must have the same types as the map output, although the
reduce output types may be different again (K3 and V3).
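• As a concrete illustration (a word-count sketch, not code from these slides), the generic parameters of the Mapper and Reducer classes follow exactly this form: Mapper<K1, V1, K2, V2> and Reducer<K2, V2, K3, V3>.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map: (K1=LongWritable, V1=Text) -> list(K2=Text, V2=IntWritable)
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String word : value.toString().split("\\s+")) {
      if (!word.isEmpty()) {
        context.write(new Text(word), ONE); // emit (K2, V2)
      }
    }
  }
}

// reduce: (K2=Text, list(V2=IntWritable)) -> list(K3=Text, V3=IntWritable)
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum)); // emit (K3, V3)
  }
}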
MapReduce Types
• Combiner Function
map: (K1, V1) → list(K2, V2)
combiner: (K2, list(V2)) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
It has the same form as the reduce function, except that its output types are the intermediate types (K2 and V2), so they can feed the reduce function
Often the combiner and reduce functions are the same
• Partition Function
partition: (K2, V2) → integer
Operates on the intermediate key and value types (K2 and V2)
Returns the partition index
In practice, the partition is determined solely by the key (the value is
ignored)
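• For illustration, the default HashPartitioner does exactly this: it hashes only the key and ignores the value. A minimal sketch of the same behaviour:
import org.apache.hadoop.mapreduce.Partitioner;

// Partitions on the key's hash code alone; the value is ignored.
public class KeyHashPartitioner<K2, V2> extends Partitioner<K2, V2> {
  @Override
  public int getPartition(K2 key, V2 value, int numPartitions) {
    // Mask off the sign bit so the partition index is non-negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}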
MapReduce Types
• Input types are set by the input format.
• Ex:- setInputFormatClass(TextInputFormat.class)
Generates keys of type LongWritable and values of type Text
• The other types are set explicitly by calling methods on the Job
Ex:- Job job; job.setMapOutputKeyClass(Text.class);
• If the intermediate (map output) types are not set explicitly, they default to the final output types.
The default MapReduce Job
@Override
public int run(String[] args) throws Exception {
  Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);

  job.setInputFormatClass(TextInputFormat.class);

  job.setMapperClass(Mapper.class);
  job.setMapOutputKeyClass(LongWritable.class);
  job.setMapOutputValueClass(Text.class);

  job.setPartitionerClass(HashPartitioner.class);
  job.setNumReduceTasks(1);
  job.setReducerClass(Reducer.class);

  job.setOutputKeyClass(LongWritable.class);
  job.setOutputValueClass(Text.class);

  job.setOutputFormatClass(TextOutputFormat.class);

  return job.waitForCompletion(true) ? 0 : 1;
}
MapReduce Types
• Number of Reducers:
• Choosing the number of reducers for a job is more of an art than a science.
Increasing the number of reducers makes the reduce phase shorter, since you
get more parallelism. However, if you take this too far, you can have lots of
small files, which is suboptimal. One rule of thumb is to aim for reducers that
each run for five minutes or so, and which produce at least one HDFS block’s
worth of output.
• The number of map tasks is equal to the number of splits that the input is turned into. The number of reducers, by contrast, is chosen independently of the input; the total number of reduce slots in the cluster is the number of nodes multiplied by the slots per node (mapred.tasktracker.reduce.tasks.maximum).
• It is good to have slightly fewer reducers than the total number of slots.
• By default, there is a single reducer
Hadoop Streaming
• Hadoop Streaming uses Unix standard streams as the interface between Hadoop and the user’s program.
• Difference between Streaming and the Java MapReduce API:
• The Java API is geared toward processing your map function one record at a
time. The framework calls the map() method on your Mapper for each record in
the input, whereas
• With Streaming the map program can decide how to process the input
• for example, it could easily read and process multiple lines at a time since
it’s in control of the reading.
The relationship of the Streaming executable to the node
manager and the task container
• Streaming runs special map and reduce tasks for the purpose of launching the user-supplied executable and communicating with it.
• The Streaming task communicates with the
streaming process (which may be written in any
language) using standard input and output streams.
• During execution of the task, the Java process passes input key-value pairs to the external process, which runs them through the user-defined map or reduce function and passes the output key-value pairs back to the Java process.
MapReduce Types: The default Streaming Job
• $ hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop*-streaming.jar \
-input input/sample.txt -output output -mapper /bin/cat
• There is no default identity mapper, so it must be set explicitly. Streaming output keys and values are always Text. Usually the key (the line offset) is not passed to the mapper; only the value is. Spelling out the defaults explicitly, the command is:
• $ hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop*-streaming.jar \
-input input/sample.txt -output output \
-inputformat org.apache.hadoop.mapred.TextInputFormat -mapper /bin/cat \
-partitioner org.apache.hadoop.mapred.lib.HashPartitioner -numReduceTasks 1 \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
MapReduce Types: Keys and values in Streaming
• A Streaming application can control the separator that is used when
a key-value pair is turned into a series of bytes and sent to the map or reduce
process over standard input.
• The separator can be configured independently for maps and reducers
• Furthermore, the key from the output can be composed of more than the first field: it can be made up of the first n fields (defined by stream.num.map.output.key.fields or stream.num.reduce.output.key.fields), with the value being the remaining fields after those n fields.
For example, if the output from a Streaming process was a,b,c (with a comma as
the separator), and n was 2, the key would be parsed as a,b and the value as c.
MapReduce Types: Keys and values in Streaming
• The separator can be configured independently for maps and reducers.
• These settings do not have any bearing on the input and output formats.
Use of separators in a Streaming MapReduce job
Input Formats
Input Formats: Input Splits and Records
• Splits and records are logical concepts; they are not necessarily tied to files.
Input Formats
• Hadoop can process many different types of data formats, from flat text files to
databases.
Input Formats: Input Splits and Records
• Input splits are represented by the Java class InputSplit (which is in the
org.apache.hadoop.mapreduce package)
• An InputSplit has a length in bytes and a set of storage locations which are just hostname
strings.
• A split doesn’t contain the input data; it is just a reference to the data.
• The storage locations are used by the MapReduce system to place map tasks as close to the
split’s data as possible
• The size is used to order the splits so that the largest get processed first, in an attempt to
minimize the job runtime.
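• An abridged view of org.apache.hadoop.mapreduce.InputSplit (restated here as a sketch) shows the two pieces of information a split carries:
import java.io.IOException;

// Abridged restatement of the InputSplit abstraction.
public abstract class InputSplitSketch {
  // Size of the split in bytes; used to process the largest splits first.
  public abstract long getLength() throws IOException, InterruptedException;

  // Hostnames where the split's data is stored; used for data-local scheduling.
  public abstract String[] getLocations() throws IOException, InterruptedException;
}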
Input Formats: Input Splits and Records
• MapReduce application developers need not deal with InputSplits directly, as they are created by an InputFormat.
• The client running the job calculates the splits for the job by calling getSplits(), then sends
them to the application master, which uses their storage locations to schedule map tasks that
will process them on the cluster.
• The map task passes the split to the createRecordReader() method on InputFormat to obtain a RecordReader for that split. The RecordReader iterates over the records in the split.
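• Abridged, the InputFormat contract that ties splits and record readers together looks like this (a restated sketch of org.apache.hadoop.mapreduce.InputFormat):
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Abridged restatement of the InputFormat abstraction.
public abstract class InputFormatSketch<K, V> {
  // Called on the client to compute the splits for the job.
  public abstract List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException;

  // Called by each map task to obtain a RecordReader for its split.
  public abstract RecordReader<K, V> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException;
}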
Input Formats: Input Splits and Records
• The map task uses the RecordReader to generate record key-value pairs, which it passes to the map function; the Mapper’s run() method, sketched after this list, drives that loop.
• After running setup(), the nextKeyValue() is called repeatedly on the Context to populate the
key and value objects for the mapper.
• The key and value are retrieved from the RecordReader by way of the Context and are
passed to the map() method for it to do its work.
• When the reader gets to the end of the stream, the nextKeyValue() method returns false, and
the map task runs its cleanup() method and then completes.
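• A simplified sketch of that run() method (recent Hadoop releases additionally wrap the loop in try/finally so cleanup() always runs):
import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;

public class RunLoopSketch<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // nextKeyValue() returns false at the end of the split's records.
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}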
Input Formats: FileInputFormat
• FileInputFormat is the base class for all implementations of InputFormat that
use files as their data source
• It provides two things:
1. A place to define which files are included as the input to a job,
2. An implementation for generating splits for the input files.
• The job of dividing splits into records is performed by subclasses.
• FileInputFormat offers four static convenience methods for setting a Job’s input
paths:
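• In sketch form, the four methods (from org.apache.hadoop.mapreduce.lib.input.FileInputFormat; the paths below are placeholders) are used like this:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputPathExamples {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    // Add a single path to the list of inputs.
    FileInputFormat.addInputPath(job, new Path("/data/one"));
    // Add several paths given as a comma-separated string.
    FileInputFormat.addInputPaths(job, "/data/two,/data/three");
    // Replace the input list with the given Path objects...
    FileInputFormat.setInputPaths(job, new Path("/data/a"), new Path("/data/b"));
    // ...or with a comma-separated string of paths.
    FileInputFormat.setInputPaths(job, "/data/c,/data/d");
  }
}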
Input Formats: FileInputFormat input splits
• FileInputFormat splits only large files—here, “large” means larger than an
HDFS block.
Input Formats: FileInputFormat input splits
• The split size is calculated by the following formula (see the computeSplitSize() method in
FileInputFormat):
• max(minimumSize, min(maximumSize, blockSize))
• and by default: minimumSize < blockSize < maximumSize
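• A small sketch of that calculation with the usual defaults (minimum size 1 byte, maximum size Long.MAX_VALUE), showing why a split normally ends up the size of an HDFS block:
public class SplitSizeDemo {
  // Mirrors the formula used by FileInputFormat.computeSplitSize().
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // a 128 MB HDFS block
    long minSize = 1L;                   // default minimum split size
    long maxSize = Long.MAX_VALUE;       // default maximum split size
    // With the defaults, the split size equals the block size (134217728 bytes).
    System.out.println(computeSplitSize(blockSize, minSize, maxSize));
  }
}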
Small Files and CombineFileInputFormat
Input Formats: TextInputFormat
Input Formats: NLineInputFormat
Input Formats: Binary Input
Input Formats: Multiple Inputs
Output Formats
Output Formats: Text Output
Output Formats: Binary Output
Output Formats: Multiple Outputs
Output Formats: Lazy Output
Sorting
• Types of samplers:
1. RandomSampler
2. SplitSampler
3. IntervalSampler
• RandomSampler chooses keys with a uniform probability. It takes the following parameters:
1. the sampling frequency (probability),
2. the maximum number of samples to take,
3. the maximum number of splits to sample.
Sorting
• SplitSampler samples only the first n records in a split. It is not so good for
sorted data because it doesn’t select keys from throughout the split.
• IntervalSampler chooses keys at regular intervals through the split and makes
a better choice for sorted data.
• RandomSampler is a good general-purpose sampler.
• If none of these suits the application, users can write their own implementation of the Sampler interface.
• The objective of sampling is to produce partitions that are approximately equal
in size.
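• A minimal sketch of wiring a RandomSampler to a TotalOrderPartitioner (the key/value types and the partition-file path are hypothetical, and the job would also need its input, output, and map output key type configured):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setPartitionerClass(TotalOrderPartitioner.class);

    // Sample keys with probability 0.1, up to 10,000 samples from at most 10 splits.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);

    // Tell the partitioner where its partition file lives (path is hypothetical),
    // then write the sampled partition boundaries to it.
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/_partitions"));
    InputSampler.writePartitionFile(job, sampler);
  }
}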
Secondary Sort
• The MapReduce framework sorts the records by key before they reach the reducers.
For any particular key, however, the values are not sorted.
• The order in which the values appear is not even stable from one run to the next,
because they come from different map tasks, which may finish at different times
from run to run.
• It is possible to impose an order on the values by sorting and grouping the keys in a
particular way.
• Do the following to get the effect of sorting by value (a configuration sketch follows this list):
Make the key a composite of the natural key and the natural value.
The sort comparator should order by the composite key (i.e., the natural key and
natural value).
The partitioner and grouping comparator for the composite key should consider
only the natural key for partitioning and grouping.
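• One minimal way to wire this up (a sketch, not from these slides): use a tab-separated Text composite key of the form "naturalKey\tnaturalValue", so the default Text byte ordering sorts by natural key and then by natural value (as strings). All class names below are illustrative.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortSketch {
  // Extract the natural key (the part before the first tab).
  static String naturalKey(Text composite) {
    String s = composite.toString();
    int tab = s.indexOf('\t');
    return tab < 0 ? s : s.substring(0, tab);
  }

  // Partition on the natural key only, so all values for a key reach one reducer.
  public static class NaturalKeyPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
      return (naturalKey(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // Group on the natural key only, so one reduce() call sees every composite key
  // sharing that natural key, already sorted by natural value.
  public static class NaturalKeyGroupingComparator extends WritableComparator {
    protected NaturalKeyGroupingComparator() {
      super(Text.class, true);
    }
    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
      return naturalKey((Text) a).compareTo(naturalKey((Text) b));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setMapOutputKeyClass(Text.class);  // composite "naturalKey\tnaturalValue"
    job.setMapOutputValueClass(Text.class);
    job.setPartitionerClass(NaturalKeyPartitioner.class);
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
    // The default Text sort order handles the composite-key ordering itself.
  }
}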
Joins
• MapReduce can perform joins between large datasets
• How the join can be implemented depends on how large the
datasets are and how they are partitioned. If one dataset is large (the
weather records) but the other one is small enough to be distributed
to each node in the cluster (as the station metadata is), the join can
be effected by a MapReduce job that brings the records for each
station together.
• The mapper or reducer uses the smaller dataset to look up the
station metadata for a station ID.
• If the join is performed by the mapper, it is called a map-side join.
• If the join is performed by the reducer, it is called a reduce-side join.
Inner join of two datasets
Map-Side Joins
• If both datasets are too large for either to be copied to each node in the cluster, they
can still be joined using MapReduce with a map-side or reduce-side join, depending
on how the data is structured.
• A map-side join works by performing the join before the data reaches the map
function
• Requirements
• Each input dataset must be divided into the same number of partitions
• It must be sorted by the same key (the join key) in each source
• All the records for a particular key must reside in the same partition
• The above requirements fit the description of the output of a MapReduce job.
• A map-side join can therefore be used to join the outputs of several jobs that had the same number of reducers, the same keys, and output files that are not splittable.
Map-Side Joins
• Use a CompositeInputFormat from the org.apache.hadoop.mapred.join
package to run a map-side join
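• A rough sketch of such a job (assuming the newer org.apache.hadoop.mapreduce.lib.join package; the paths are hypothetical and the exact compose() overloads should be checked against your Hadoop version):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;

public class MapSideJoinSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();

    // Both inputs must already be partitioned and sorted identically on the join key.
    String joinExpr = CompositeInputFormat.compose("inner",
        KeyValueTextInputFormat.class,
        new Path("/data/records"), new Path("/data/stations"));

    job.getConfiguration().set(CompositeInputFormat.JOIN_EXPR, joinExpr);
    job.setInputFormatClass(CompositeInputFormat.class);
    // The mapper then receives the join key and a TupleWritable of joined values.
  }
}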
Reduce-Side Joins
• More general than a map-side join
• Input datasets don’t have to be structured in any particular way
• Less efficient as both datasets have to go through the MapReduce shuffle
• Idea
• The mapper tags each record with its source
• Uses the join key as the map output key so that the records with the same key are
brought together in the reducer
• Multiple inputs
• The input sources for the datasets have different formats
• Use the MultipleInputs class to separate the logic for parsing and tagging each source (see the sketch after this list).
• Secondary sort
• To perform the join, it is important that the reducer sees the data from one source before the other, which is arranged with a secondary sort.
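• As a sketch of the MultipleInputs idea mentioned above (the paths and the two tagging mappers are hypothetical placeholders):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ReduceSideJoinInputs {
  // Hypothetical placeholder mappers; in a real job each would parse its own
  // format and emit (join key, source-tagged record).
  public static class StationMapper extends Mapper<LongWritable, Text, Text, Text> { }
  public static class RecordMapper extends Mapper<LongWritable, Text, Text, Text> { }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    // Each source gets its own input format and mapper; both use the join key
    // as the map output key so matching records meet in the same reduce call.
    MultipleInputs.addInputPath(job, new Path("/data/stations"),
        TextInputFormat.class, StationMapper.class);
    MultipleInputs.addInputPath(job, new Path("/data/records"),
        TextInputFormat.class, RecordMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
  }
}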
Joins - Map-Side vs Reduce-Side
Side Data Distribution
• Side data can be defined as extra read-only data needed by a job to process the main dataset.
• The challenge is to make side data available to all the map or reduce tasks (which are spread across
the cluster) in a convenient and efficient fashion.
• Using the Job Configuration
• Set arbitrary key-value pairs in the job configuration using the various setter methods on Configuration (JobConf in the old API)
• Useful if one needs to pass a small piece of metadata to tasks
• Don’t use this mechanism for transferring more than a few kilobytes of data
• The job configuration is read by the jobtracker, the tasktracker, and the child JVM, and
each time the configuration is read, all of its entries are read into memory, even if they
are not used
• Distributed Cache
• Instead of serializing side data in the job config, it is preferred to distribute the datasets using
Hadoop’s distributed cache
• Provides a service for copying files and archives to the task nodes in time for the tasks to
use them when they run
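• A minimal sketch of using the distributed cache through the new API (the file path is hypothetical):
import java.net.URI;
import org.apache.hadoop.mapreduce.Job;

public class DistributedCacheSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    // Copy a small lookup file to every task node before the tasks run.
    job.addCacheFile(new URI("/lookup/stations.txt"));
    // Inside a task, setup() can locate the cached files via context.getCacheFiles().
  }
}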
MapReduce Library Classes
• Hadoop comes with a library of mappers and reducers for commonly used functions.
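• For example, a complete word count can be assembled almost entirely from library classes such as TokenCounterMapper and IntSumReducer (a sketch; the input and output paths are placeholders):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class LibraryWordCount {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJarByClass(LibraryWordCount.class);
    job.setMapperClass(TokenCounterMapper.class); // emits (word, 1) for each token
    job.setCombinerClass(IntSumReducer.class);    // sums counts locally on the map side
    job.setReducerClass(IntSumReducer.class);     // sums counts for each word
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}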