UNIT-4 Introduction To Hadoop
Unit-IV
Prof. P. R. Gadekar
INTRODUCING HADOOP
• Data: The Treasure Trove
• Hardware Failure, which HDFS handles through:
  1. Data Replication (see the sketch below)
  2. Data Pipeline
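The following is a minimal sketch of working with the replication factor through the HDFS Java client. It assumes a reachable HDFS cluster configured via the usual core-site.xml and hdfs-site.xml; the path /demo/sample.txt and the replication factors are illustrative values, not part of these notes. When the file is written, HDFS forwards each block along a pipeline of DataNodes until the requested number of replicas exists.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
  public static void main(String[] args) throws Exception {
    // Assumption: an HDFS cluster reachable through the default config files.
    Configuration conf = new Configuration();
    conf.set("dfs.replication", "3");           // replication factor for new files
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/demo/sample.txt");   // hypothetical path
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeUTF("blocks of this file are replicated across DataNodes");
    }

    // Raise the replication factor of the existing file; HDFS re-replicates
    // its blocks in the background to satisfy the new target.
    fs.setReplication(file, (short) 4);
    System.out.println("Replication: " + fs.getFileStatus(file).getReplication());
  }
}
```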
Processing data with HADOOP
• MapReduce is a software programming framework.
• It helps you process massive volumes of data in parallel.
• The MapReduce algorithm contains two important tasks, namely Map and Reduce (illustrated below).
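A minimal sketch of the two tasks in action, following the canonical word-count example from the Hadoop documentation; the input and output directories come from the command line (args[0], args[1]). Map tasks tokenize their input split in parallel and emit (word, 1) pairs; reduce tasks receive all pairs sharing a word and sum the counts.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, it would run as, for example: hadoop jar wc.jar WordCount /input /output.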
MapReduce Daemons
1. JobTracker
2. TaskTracker
1. JobTracker:
• It provides connectivity between Hadoop and your application.
• When you submit code to the cluster, the JobTracker creates the execution plan by deciding which task to assign to which node.
• It also monitors all running tasks.
• When a task fails, it automatically reschedules the task on a different node after a predefined number of retries (see the sketch after this list).
• JobTracker is the master daemon responsible for executing the overall MapReduce job.
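The "predefined number of retries" is a configurable per-task attempt limit. Below is a minimal sketch using the JobTracker-era (Hadoop 1.x) JobConf API; the class name RetryConfig is hypothetical, and 4 is the stock default.

```java
import org.apache.hadoop.mapred.JobConf;

public class RetryConfig {
  public static void main(String[] args) {
    // Cap how many attempts a task gets before the JobTracker
    // gives up on it and fails the whole job.
    JobConf conf = new JobConf(RetryConfig.class);
    conf.setMaxMapAttempts(4);      // sets mapred.map.max.attempts (default 4)
    conf.setMaxReduceAttempts(4);   // sets mapred.reduce.max.attempts (default 4)
    System.out.println("Max map attempts: " + conf.getMaxMapAttempts());
  }
}
```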