
HKBK COLLEGE OF ENGINEERING

DEPARTMENT OF ISE
BIG DATA ANALYTICS [18CS72]
SEMINAR ON THE TOPIC KEY-VALUE PAIRS

GUIDED BY
PROFESSOR PRIYA J

PRESENTED BY
AKHILA R[1HK18IS005]
CONTENTS

 KEY-VALUE PAIR MEANING

 KEY-VALUE PAIR GENERATION IN MAPREDUCE

 EXAMPLE

 ADVANTAGES

 DISADVANTAGES
KEY-VALUE PAIR

 A key-value pair is two pieces of data associated with each other. The key
is a unique identifier that points to its associated value, and the value is either
the data being identified or a pointer to that data.

 A key-value pair is the fundamental data structure of a key-value store or
key-value database, but key-value pairs have existed outside of software for
much longer.
 A key-value pair in MapReduce is the record entity that Hadoop accepts
for execution.

 We use Hadoop mainly for data analysis. It deals with structured,
unstructured and semi-structured data. With Hadoop, if the schema is static
we can work directly on the columns; if the schema is not static, we work
on key-value pairs.
 Keys and values are not intrinsic properties of the data; they are chosen
by the user analyzing the data.

 MapReduce is the core component of Hadoop that provides data
processing. It performs processing by breaking a job into two phases: the
Map phase and the Reduce phase. Each phase has key-value pairs as input
and output.
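As a minimal illustration of the key-value idea itself (a plain Python sketch, not Hadoop code; the phone-book data is invented for the example):

```python
# A key-value store at its simplest: each unique key points to one value.
phone_book = {
    "Akhila": "555-0101",  # key -> value
    "Priya": "555-0102",
}

# Lookup by key is the only access pattern a key-value store needs.
print(phone_book["Akhila"])  # 555-0101

# Writing to an existing key replaces its value, so keys stay unique.
phone_book["Akhila"] = "555-0199"
print(phone_book["Akhila"])  # 555-0199
```

The same contract (unique key, single associated value, lookup by key) is what MapReduce builds on, as the following sections describe.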
MapReduce Key-Value Pair Generation in Hadoop

 In MapReduce job execution, before sending data to the mapper, the
framework first converts it into key-value pairs, because the mapper
accepts only key-value pairs of data.

 Key-value pairs in MapReduce are generated as follows:

 InputSplit – the logical representation of data that the InputFormat
generates. In a MapReduce program, it describes a unit of work that
contains a single map task.

 RecordReader – it communicates with the InputSplit and converts the
data into key-value pairs suitable for reading by the Mapper. By default,
the RecordReader uses TextInputFormat to convert data into key-value
pairs.
 In MapReduce job execution, the map function processes a key-value pair
and emits a number of key-value pairs. The reduce function processes the
values grouped by the same key and emits another set of key-value pairs as
the output. The Map output types should match the input types of the
Reduce, as shown below:

 Map: (K1, V1) -> list(K2, V2)

 Reduce: (K2, list(V2)) -> list(K3, V3)
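These signatures can be imitated outside Hadoop. The sketch below is plain Python (word count is an assumed example job, not from the slides): `map_fn` turns (K1, V1) into list(K2, V2), a grouping step stands in for Hadoop's shuffle/sort, and `reduce_fn` consumes (K2, list(V2)):

```python
from collections import defaultdict

def map_fn(offset, line):
    """Map: (K1, V1) -> list(K2, V2). K1 = line offset, V1 = line text."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: (K2, list(V2)) -> list(K3, V3)."""
    return [(word, sum(counts))]

# Map input: line offset -> line content (hypothetical file contents).
lines = {0: "Chandler is Joey", 17: "Mark is John"}

# Shuffle/sort stand-in: group all map-output values by key.
grouped = defaultdict(list)
for offset, line in lines.items():
    for word, count in map_fn(offset, line):
        grouped[word].append(count)

# Reduce: one call per distinct key, with all its values.
output = []
for word, counts in sorted(grouped.items()):
    output.extend(reduce_fn(word, counts))

print(output)
# [('Chandler', 1), ('Joey', 1), ('John', 1), ('Mark', 1), ('is', 2)]
```

Note how the types line up: the map output key (a word) becomes the reduce input key, exactly as the signatures above require.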
On what basis is a key-value pair generated
in Hadoop?
 MapReduce key-value pair generation depends entirely on the data set and
on the required output. The framework specifies key-value pairs in four
places: Map input, Map output, Reduce input and Reduce output.
 Map Input – by default, the line offset is the key and the content of the
line is the value (of type Text). These defaults can be changed by using a
custom InputFormat.
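The default can be mimicked in plain Python (a sketch of what the default record reader produces for map input; the file content is hypothetical and assumed to be ASCII, so character offsets equal byte offsets):

```python
def to_key_value_pairs(file_content):
    """Emit (offset of line start, line text) pairs, imitating the
    default TextInputFormat behavior for map input."""
    pairs = []
    offset = 0
    for line in file_content.splitlines(keepends=True):
        pairs.append((offset, line.rstrip("\n")))  # value excludes the terminator
        offset += len(line)                        # key is the offset of the next line
    return pairs

content = "Chandler is Joey\nMark is John\n"
print(to_key_value_pairs(content))
# [(0, 'Chandler is Joey'), (17, 'Mark is John')]
```

The second key is 17 because the first line plus its newline occupies bytes 0–16.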
Map Output

 The Map phase is responsible for filtering the data. It also provides the
environment to group the data on the basis of a key.

 Key – the field/text/object on which the data is grouped and aggregated at
the reducer.
 Value – the field/text/object that each individual reduce method handles.

 Reduce Input
 The Map output is the input to Reduce, so it is the same as the Map output.

 Reduce Output
 It depends entirely on the required output.
MapReduce Key-value Pair Example
 For example, suppose the content of the file that HDFS stores is "Chandler
is Joey Mark is John". Using the InputFormat, we define how this file will
be split and read. By default, the RecordReader uses TextInputFormat to
convert this file into a key-value pair.

• Key – the offset of the beginning of the line within the file.
• Value – the content of the line, excluding line terminators.

 Here, the key is 0 and the value is "Chandler is Joey Mark is John".
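This single-line case can be checked with a short Python sketch (again imitating the default, not real Hadoop code):

```python
content = "Chandler is Joey Mark is John"

# Default map-input pair for a one-line file:
key = 0                       # offset of the beginning of the line
value = content.rstrip("\n")  # line content, excluding line terminators

print((key, value))  # (0, 'Chandler is Joey Mark is John')
```

Since the file has only one line starting at offset 0, exactly one key-value pair is produced.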


ADVANTAGES
 Good for simple data models, if all you need is a simple lookup.

 Easy to get started. There are many free key-value store solutions that are
quick to set up.
DISADVANTAGES

 Does not scale well as data models grow in complexity.

 Performance penalties for complex queries.

 Value type conversion penalties.
