S MapReduce Types Formats Features 06
Input Format
● An input format determines how Hadoop splits and reads an input
file.
● It is specified through the InputFormat interface, and TextInputFormat is the
default.
● Each input file is broken into splits, and each map task processes a
single split. Each split is further divided into records of
key/value pairs, which the map task processes one record
at a time.
● The record reader creates key/value pairs from an input split and
exposes them through the context, which is passed to the Mapper class.
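The split and record behaviour described above can be sketched in plain Python. This is a conceptual model, not Hadoop code: the function names (compute_splits, read_records, run_map_tasks) are hypothetical, and the rule that a line belongs to the split containing its first byte is an assumption made here to mimic line-oriented input such as TextInputFormat, where the key is the byte offset and the value is the line.

```python
def compute_splits(total_bytes, split_size):
    """Return (start, length) pairs that cover the file, one per logical split."""
    return [(start, min(split_size, total_bytes - start))
            for start in range(0, total_bytes, split_size)]


def read_records(data, start, length):
    """Yield (byte_offset, line) records for one split.

    Assumption of this sketch: a line belongs to the split that contains its
    first byte, so a line crossing a split boundary is read in full by the
    earlier split and skipped by the later one.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip ahead to the first line that starts inside this split.
        newline = data.find(b"\n", start - 1)
        pos = len(data) if newline == -1 else newline + 1
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield pos, data[pos:stop].decode("utf-8")
        pos = stop + 1


def run_map_tasks(data, split_size):
    """Simulate one map task per split; each task sees one record at a time."""
    output = []
    for start, length in compute_splits(len(data), split_size):
        for offset, line in read_records(data, start, length):
            # A real mapper would transform the record; here it is passed through.
            output.append((offset, line))
    return output


sample = b"alpha\nbeta\ngamma\n"
records = run_map_tasks(sample, split_size=8)
```

Whatever split size is chosen, every line is emitted exactly once, even though split boundaries fall in the middle of lines.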
Types of Input File Format
● SequenceFileAsTextInputFormat: Similar to
SequenceFileInputFormat. It converts sequence file keys and
values to Text objects.
● SequenceFileAsBinaryInputFormat: Reads any
sequence file. It extracts the sequence file's keys and
values as opaque binary objects.
● NLineInputFormat: Similar to TextInputFormat, but each
split contains exactly N lines of input (the last split may have fewer).
● DBInputFormat: Reads data from a relational database (RDBMS). Keys are
LongWritable and values are DBWritable.
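To make the NLineInputFormat idea concrete, here is a minimal Python sketch of grouping input lines into splits of N lines each. It is only an illustration of the splitting rule: the helper n_line_splits is hypothetical and is not part of Hadoop.

```python
def n_line_splits(lines, n):
    """Group input lines into consecutive splits of at most n lines each."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [lines[i:i + n] for i in range(0, len(lines), n)]


lines = ["l1", "l2", "l3", "l4", "l5"]
splits = n_line_splits(lines, 2)  # each inner list would go to one map task
```

With five lines and N = 2, the first two splits hold two lines each and the final split holds the remaining line.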
Output Formats
Counters
● Counters provide a way to measure the progress of a MapReduce job or the number of operations that occur
within it.
● There are two basic types of MapReduce counters:
■ Built-in counters: Hadoop maintains these for every job, and they report various
metrics. For example, counters for the number of bytes and records allow us to
confirm that the expected amount of input was consumed and the expected amount of
output was produced.
■ User-defined counters: in addition to the built-in counters, MapReduce allows user code to define its own set of
counters.
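The two kinds of counters can be illustrated with a small Python sketch. It is a simulation, not the Hadoop counter API: the run_mapper function and the MALFORMED_RECORDS counter are hypothetical, and the other two counter names are merely modelled on Hadoop's built-in record counters.

```python
from collections import Counter


def run_mapper(lines):
    """Parse 'name,number' lines while updating counters along the way."""
    counters = Counter()
    output = []
    for line in lines:
        counters["MAP_INPUT_RECORDS"] += 1  # built-in style: every record read
        fields = line.split(",")
        if len(fields) != 2 or not fields[1].strip().isdigit():
            # User-defined counter (hypothetical name) for records we reject.
            counters["MALFORMED_RECORDS"] += 1
            continue
        output.append((fields[0].strip(), int(fields[1])))
        counters["MAP_OUTPUT_RECORDS"] += 1  # built-in style: every record emitted
    return output, counters


output, counters = run_mapper(["a,1", "b,x", "c,3", "broken"])
```

Comparing the input and output record counts afterwards is the kind of sanity check the built-in counters support, while the user-defined counter reports something specific to the application.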
○ one-to-one
Map-side join
● When the join is performed by the mapper, it is called a map-side
join.
● A map-side join imposes strict prerequisites on the input data before it can be joined on the map side. The
prerequisites are:
○ All the records for a particular key must reside in the same
partition.
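The partitioning prerequisite can be sketched in Python as follows. This is a conceptual model under stated assumptions: both datasets are split with the same hypothetical partitioner (a CRC32 hash of the key), and map_side_join stands in for a single map task that joins one pair of matching partitions. None of these names come from Hadoop.

```python
import zlib


def partition(records, num_partitions):
    """Place each (key, value) record in a partition chosen by a hash of its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        parts[index].append((key, value))
    return parts


def map_side_join(left_part, right_part):
    """Join one pair of matching partitions, as a single map task would."""
    lookup = {}
    for key, value in right_part:
        lookup.setdefault(key, []).append(value)
    return [(key, left_value, right_value)
            for key, left_value in left_part
            for right_value in lookup.get(key, [])]


users = [("u1", "Ann"), ("u2", "Bob"), ("u3", "Cy")]
orders = [("u1", "book"), ("u3", "pen"), ("u1", "lamp")]

# Same partitioner and same partition count on both sides, so every key
# lands in the same partition index in both datasets.
left_parts = partition(users, 2)
right_parts = partition(orders, 2)
joined = sorted(row
                for left, right in zip(left_parts, right_parts)
                for row in map_side_join(left, right))
```

Because all records for a key sit in the same partition on both sides, each simulated map task can complete its part of the join without seeing any other partition.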
Conclusion
● Topics covered:
○ MapReduce
○ Input Formats
○ Output Formats
○ Sorting
○ Counters
○ Conclusion
Thank you