Data Types in Hadoop
While programming for a distributed system, we cannot use the standard Java data types directly. This is
because they do not know how to write themselves to, or read themselves from, a byte stream, i.e. they
are not serializable in the form Hadoop requires.
Hadoop provides data types based on the Writable interface for the serialization and de-serialization of
data stored in HDFS and exchanged during MapReduce computations.
Serialization is the process of converting object data into a byte stream, either for transmission over
the network between nodes in a cluster or for persistent storage.
De-serialization is the reverse process: it converts byte-stream data back into object data, for
example when reading data from HDFS.
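To make this concrete, here is a minimal sketch (assuming Hadoop's client libraries are on the classpath) that serializes an IntWritable into a byte stream and de-serializes it back, using the write() and readFields() methods that every Writable must implement:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialization: the object writes its fields to a byte stream
        IntWritable original = new IntWritable(42);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        original.write(new DataOutputStream(out));

        // De-serialization: a fresh object rebuilds itself from the bytes
        IntWritable restored = new IntWritable();
        restored.readFields(
                new DataInputStream(new ByteArrayInputStream(out.toByteArray())));

        System.out.println(restored.get()); // prints 42
    }
}
```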
All types in Hadoop must implement the Writable interface. Hadoop provides Writable wrappers for
almost all Java primitive types and some other types. However, we might sometimes need to create our
own wrappers for custom objects.
All the Writable wrapper classes have a get() and a set() method for retrieving and storing the wrapped
value.
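For instance, IntWritable wraps a Java int; a minimal sketch:

```java
import org.apache.hadoop.io.IntWritable;

public class GetSetExample {
    public static void main(String[] args) {
        IntWritable count = new IntWritable();
        count.set(163);                  // store a primitive int in the wrapper
        System.out.println(count.get()); // retrieve the wrapped value: prints 163
    }
}
```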
Hadoop also provides another interface called WritableComparable. The WritableComparable interface
is a sub-interface of Hadoop’s Writable and Java’s Comparable interfaces.
Note that the Writable and WritableComparable interfaces are provided in the org.apache.hadoop.io
package.
As we know, data flows from mappers to reducers in the form of (key, value) pairs. It is important to
note that any data type used for a key must implement the WritableComparable interface (which itself
extends Writable), so that keys of that type can be compared with each other for sorting purposes; any
data type used for a value only needs to implement the Writable interface.
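To make the rule concrete, here is a word-count style mapper, a sketch against the org.apache.hadoop.mapreduce API with illustrative class and field names. The output key type, Text, implements WritableComparable, so the framework can sort the keys between the map and reduce phases; the output value type, IntWritable, only needs Writable:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every token; Text keys are sortable,
        // IntWritable values are merely serializable.
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```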
Writable Classes
These are Writable wrappers for Java primitive data types and they hold a single primitive value. Below
is the list of primitive writable data types available in Hadoop:
BooleanWritable
ByteWritable
IntWritable
VIntWritable
FloatWritable
LongWritable
VLongWritable
DoubleWritable
Note that the serialized sizes of the fixed-length Writable types above are the same as the sizes of the
corresponding Java primitives. VIntWritable and VLongWritable, however, use a variable-length encoding,
so their serialized size depends on the value stored (between 1 and 5 bytes for VIntWritable, and 1 and
9 bytes for VLongWritable).
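The difference is easy to verify. The sketch below (the helper method serializedSize is illustrative, not part of Hadoop) serializes a fixed-length IntWritable and a variable-length VIntWritable and prints the resulting byte counts:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.VIntWritable;
import org.apache.hadoop.io.Writable;

public class SerializedSizes {
    // Serialize any Writable into a buffer and report its size in bytes.
    static int serializedSize(Writable w) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        w.write(new DataOutputStream(out));
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serializedSize(new IntWritable(1)));        // 4 bytes, always
        System.out.println(serializedSize(new IntWritable(1 << 30)));  // 4 bytes, always
        System.out.println(serializedSize(new VIntWritable(1)));       // 1 byte for small values
        System.out.println(serializedSize(new VIntWritable(1 << 30))); // 5 bytes for large values
    }
}
```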
Hadoop provides two types of array Writable classes: one for single-dimensional and another for two-
dimensional arrays:
ArrayWritable
TwoDArrayWritable
The elements of these arrays must themselves be Writable objects, such as IntWritable or FloatWritable,
not Java native data types like int or float.
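For example, a minimal sketch (the class name is illustrative) building an ArrayWritable of IntWritable elements:

```java
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class IntArrayExample {
    public static void main(String[] args) {
        // The element class must be declared up front: readFields() needs it
        // to instantiate the elements during de-serialization.
        ArrayWritable scores = new ArrayWritable(IntWritable.class);
        scores.set(new Writable[] { new IntWritable(10), new IntWritable(20) });

        for (Writable w : scores.get()) {
            System.out.println(((IntWritable) w).get()); // prints 10, then 20
        }
    }
}
```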
Hadoop provides the following map Writable data types, which implement the java.util.Map interface:
AbstractMapWritable: this is the abstract base class for the other MapWritable classes
MapWritable: this is a general-purpose map, mapping Writable keys to Writable values (see the
sketch after this list)
SortedMapWritable: this is a specialization of the MapWritable class that also implements the
SortedMap interface
Hadoop also provides a few other frequently used Writable classes:
NullWritable: a special type of Writable representing a null value. No bytes are read or
written when a data type is specified as NullWritable. So, in MapReduce, a key or a value can be
declared as a NullWritable when we don't need to use that field
ObjectWritable: a general-purpose wrapper that can store objects such as Java primitives,
String, Enum, Writable, null, or arrays of these types
Text: the Writable equivalent of java.lang.String, with a maximum size of 2 GB. Unlike
Java's String data type, Text is mutable in Hadoop
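The following sketch (illustrative class name, assuming Hadoop's client libraries are on the classpath) populates a MapWritable and also demonstrates the NullWritable singleton and the mutability of Text:

```java
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class MapWritableExample {
    public static void main(String[] args) {
        // A general-purpose map from Writable keys to Writable values.
        MapWritable counts = new MapWritable();
        counts.put(new Text("apples"), new IntWritable(3));
        counts.put(new Text("pears"), new IntWritable(7));

        for (Map.Entry<Writable, Writable> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // NullWritable is a singleton obtained via get(); no new instances exist.
        System.out.println(NullWritable.get() == NullWritable.get()); // true

        // Text is mutable, unlike java.lang.String: the same instance
        // can be reused with a new value.
        Text t = new Text("old");
        t.set("new");
        System.out.println(t); // prints "new"
    }
}
```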
As discussed earlier, Hadoop already has Writable wrappers for most primitive Java data types.
However, if we need to use a data type for which Hadoop doesn't already have a Writable wrapper, we
can do one of the following: