
ECS640U/ECS765P Big Data Processing

Hadoop Principles and Components


Lecturer: Ahmed M. A. Sayed
School of Electronic Engineering and Computer Science

Credit: Joseph Doyle, Jesus Carrion, Felix Cuadrado, …


Contents

● Introduction to Apache Hadoop


● HDFS
● YARN
● The Apache Hadoop Ecosystem
Map/Reduce job
Map/Reduce framework roles
Parallelising the problem
Hadoop is a MapReduce framework which executes on a cluster of networked PCs
● Each node runs a set of daemons, or services, to facilitate the execution of MapReduce jobs
● YARN daemons:
    ResourceManager
    NodeManager (as many as worker nodes in the cluster)
    JobHistoryServer
● HDFS (Hadoop Distributed File System) daemons:
    NameNode
    DataNode (as many as worker nodes in the cluster)
    SecondaryNameNode
Nodes vs Daemons
● A node is a (virtual or physical) machine, or a container, on which Hadoop processes run
● A daemon is a process which runs in the background rather than being directly controlled by a user
Hadoop Architecture
Leader-Follower Architecture (also known as master-slave)
Leader (1)
● Is aware of all the follower nodes
● Receives external requests
● Decides who executes what, and when
● Speaks with the followers

Follower (1..*)
● Worker node
● Executes the tasks the leader tells it to do
Hadoop Leader-Follower architecture, from the daemons' point of view:
Contents

● Introduction to Apache Hadoop


● HDFS (Hadoop Distributed File System)
● YARN
● The Apache Hadoop Ecosystem
HDFS
Hadoop Distributed File System (HDFS)
● Shared distributed storage among the nodes of the Hadoop cluster
    Storage for input and output of MapReduce jobs
HDFS is tailored for MapReduce jobs
● Large block size (128 MB default)
    But not too large: blocks define the minimum parallelisation unit
    Trade-offs for improving data processing throughput
HDFS Namenode (Master Node)

● Manages the file system namespace


● Maintains the filesystem tree and metadata for all files and directories in the tree
● Namespace data is stored persistently in two files:
    Namespace image file
    Edit log file
● Also knows which Datanodes possess the blocks for a given file (not persistently)

The NameNode maintains information about the DataNodes, such as which block is mapped to which
DataNode (this information is called metadata), and also executes operations like the renaming of files.
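
To make the NameNode's role concrete, here is a minimal sketch (not from the slides) that uses the HDFS Java client to ask which DataNodes hold each block of a file; the path /data/input.txt is a hypothetical example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical file; in a real cluster this must already exist in HDFS
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));
        // The NameNode answers this from its in-memory block map
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
          System.out.println(b.getOffset() + " -> " + String.join(",", b.getHosts()));
        }
      }
    }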
HDFS Datanode (Slave Node)

● Workhorse of the filesystem


● Stores and retrieves blocks when instructed
● Reports to Namenode periodically with a list of blocks that it is storing
● Implements block caching for blocks which are frequently accessed
    Blocks are cached in the Datanode's memory
    By default a block is only cached in one Datanode's memory, but this is configurable

DataNodes store the actual data and also perform tasks like replication and deletion of data as instructed by
the NameNode. DataNodes also communicate with each other.
Hadoop Nodes Daemons
● DataNode (1..* per cluster)
Stores blocks from the HDFS
Report periodically to NameNode list of stored blocks
● NameNode (1 per cluster)
    Keeps the index table with (all) the locations of each block
    Heavy task, but no computation responsibilities (the daemons have no computation responsibilities!)
    Single point of failure


● Secondary Namenode (1 per cluster)
Communicates periodically with NameNode
Stores backup copy of index table
HDFS Data Distribution
Data distribution is a key element of the MapReduce model and architecture
“Move computation to data” principle
Blocks are replicated over the cluster for fault-tolerance purposes
Default number of replicas is three
Spread replicas among different physical locations
Improves reliability
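
As an illustration (not from the slides), the replication factor can also be changed per file through the HDFS Java client; the path and factor below are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Ask HDFS to keep 5 replicas of this (hypothetical) file instead of the default 3
        fs.setReplication(new Path("/data/hot-dataset.csv"), (short) 5);
      }
    }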
Data Replication
HDFS Usage

Note: the Client is an interface that communicates with the NameNode for metadata and with the DataNodes
for read and write operations.
HDFS File Read operation
HDFS File Write operation
    (when writing, the client only contacts one Datanode, which forwards the data along the replication
    pipeline; the final step of the write acknowledges completion)
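
Both paths can be exercised from the client side with the FileSystem API. A minimal sketch follows (illustrative, not the lecture's code; the path is an assumption): the client just opens streams, and the HDFS client library handles the block-by-block communication with the NameNode and DataNodes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadWriteSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Write: bytes stream to the first Datanode of the pipeline, which
        // forwards them to the replicas; completion is acknowledged at the end
        try (FSDataOutputStream out = fs.create(new Path("/data/out.txt"))) {
          out.writeUTF("hello HDFS");
        }

        // Read: the NameNode supplies block locations; data comes from Datanodes
        try (FSDataInputStream in = fs.open(new Path("/data/out.txt"))) {
          System.out.println(in.readUTF());
        }
      }
    }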
Contents

● Introduction to Apache Hadoop


● HDFS
● YARN (Yet Another Resource Negotiator)
● The Apache Hadoop Ecosystem

Quiz and Break


MapReduce Classic JobTracker Architecture
Competing demand for resources and execution cycles arising from the single point of control in the design

Reliability, Availability and Utilization issues


Scalability issues: clusters of 10,000 nodes and/or 200,000 cores
Unpredictable latency, a major customer concern
Job Execution Architecture (YARN)
The fundamental idea of YARN is to split up the two major functionalities of the JobTracker into separate
processes (daemons):
    (1) Resource Management
    (2) Job Scheduling & Monitoring

Hadoop computation tasks
Resource Management (ResourceManager, NodeManager)
● Being aware of what resources are in the cluster
● Which resources are available/used/failed now

Job Allocation (ResourceManager, ApplicationMaster)
● How many resources are needed to compute the job
● Which nodes should execute each of the tasks

Job Execution/Monitoring (ApplicationMaster, NodeManager)
● Coordinate task execution from workers
● Make sure the job completes, deal with failures
Hadoop job allocation
Resource management needs to estimate how many Map and Reduce tasks are needed for a given job
● Based on input dataset
● Based on job definition
Ideally, a single node (physical node, VM, or container) will be allocated for each different Map/Reduce task
● Otherwise, multiple tasks can be allocated to the same node (physical node, VM, or container)
Job Execution: complete MapReduce job flow
● Split (logically) input data into computing chunks
● Assign one chunk to a (co-located) NodeManager
● Run 1..* Mappers
● Shuffle and Sort
● Run 1..* Reducers
● Results from the Reducers create the job output
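
As a concrete illustration of this flow, here is a minimal word-count job in the classic Hadoop Java API (a sketch, not the lecture's code; class and path names are illustrative, with input and output paths taken from the command line):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: one instance runs per input split; emits (word, 1) pairs
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reducer: after shuffle and sort, receives all counts for one key
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input, split into chunks
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // Reducers write the job output here
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }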
How many Mappers are needed?
Mapper parallelisation:
● Each Mapper processes a different input split
● Input dataset size is known
Number of mappers = input size / split size
● If input has multiple small files, more Mappers can be invoked (Hadoop inefficiency)
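
For example (illustrative numbers): a 1 GB input with the default 128 MB split size yields 1024 / 128 = 8 Mappers, whereas 1000 files of 1 MB each yield 1000 Mappers, one per file, which is why many small files are inefficient in Hadoop.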
How many Reducers are needed?
Reducer parallelisation
● Keys are partitioned across the reducers
● Hard to automatically estimate what is the right number
● Too many Reducers can result in too much shuffle and sort.
Number of reducers = User defined parameter
● (in MapReduce job definition)
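
Continuing the driver sketch above, this is a single call in the job definition (4 is an arbitrary illustrative value):

    // In the driver, before job submission; partitions the key space across 4 Reducers
    job.setNumReduceTasks(4);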
Hadoop Execution daemons
ResourceManager (1 per cluster)
● Receives job requests from Hadoop Clients
● Creates one ApplicationMaster per job to manage it
● Allocates Containers in slave nodes, with assigned/dedicated resources
● Keeps track of the health of NodeManager nodes
NodeManager (1..* per cluster)
● Coordinates execution of Map and Reduce tasks at the node
● Sends heartbeat messages to the ResourceManager
    (each Container runs one task on its assigned chunk)

ApplicationMaster (runs in a Container on a worker node; this is where most of the computation takes place)

One per job. Implements the specific computing framework
● After creation, negotiates with ResourceManager how many resources will be required for the job
● Decides which nodes will run Map and Reduce jobs among the Containers given by the
ResourceManager
● Reports to the ResourceManager about the progress and completion of the whole job
● Is destroyed when the job is completed
● Job outcome recorded in the JobHistoryServer
Responsibilities on computation tasks
● Resource management

● Job allocation

● Job execution
Three different schedulers available in YARN
● FIFO
● Capacity
● Fair
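
The scheduler is selected cluster-wide in yarn-site.xml. As an illustrative configuration sketch (this is the standard property name in Hadoop 2+, but verify against your distribution), choosing the Fair Scheduler looks like:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>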
FIFO Scheduler
First in, first out. Requests for the first application in the queue are allocated first; once its requests have
been satisfied, the next application in the queue is served, and so on.
● Easy to understand
● No configuration necessary
● Not suitable for shared clusters
● Large applications will use all resources in the cluster so each application will have to wait its turn
FIFO Scheduler
Capacity Scheduler
A separate dedicated queue allows a small job to start as soon as it is submitted
● Large jobs finish later
● Smaller jobs get results back in reasonable time
● The overall cluster utilization can be low since the queue capacity is reserved for jobs in that queue
Capacity Scheduler
Fair Scheduler
Cluster will dynamically balance resources between all running jobs. Just after the first (large) job starts, it
is the only job running, so it gets all the resources in the cluster. When the second (small) job starts, it is
allocated half of the cluster resources so that each job is using its fair share of resources.
● Lag between the time the second job starts and when it receives its fair share, since it has to wait for
resources to free up as containers used by the first job complete.
● High cluster utilization
● Timely small job completion
Fair Scheduler
    (figure: note the lag after job 2 is submitted, before it receives its fair share)

Contents

● Introduction to Apache Hadoop


● HDFS
● YARN
● The Apache Hadoop Ecosystem

Quiz!
The Apache Hadoop Ecosystem

Several other components can be used in the Hadoop Ecosystem


Three of the important ones are:
● Security (Kerberos)
● Distributed Coordination Service (ZooKeeper)
● Data Ingestion from event-based data (Flume)
Kerberos
By default, security in Hadoop is set to "simple", which uses a simple authentication mechanism
However, malicious users could assume the root's identity to access or delete any data in the cluster
Kerberos prevents this by introducing a three-step process to gain access to a service:
● Authentication: The client authenticates itself to the Authentication Server and receives a
timestamped Ticket-Granting Ticket (TGT).
● Authorization: The client uses the TGT to request a service ticket from the Ticket-Granting Server.
● Service request: The client uses the service ticket to authenticate itself to the server that is providing
the service. In the case of Hadoop, this might be the namenode or the resource manager.

TGTs last 10 hours by default, so the user will only need to go through this process every 10 hours (this is
also configurable). It is similar in spirit to Single Sign-On (SSO).
Kerberos
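
From a client's perspective, a Kerberos-secured cluster is typically accessed by logging in from a keytab before using any Hadoop API. A minimal sketch (the principal and keytab path are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos"); // enable Kerberos auth
        UserGroupInformation.setConfiguration(conf);
        // Hypothetical principal and keytab path; this obtains and caches the TGT
        UserGroupInformation.loginUserFromKeytab("alice@EXAMPLE.COM",
                                                 "/etc/security/alice.keytab");
        // ... subsequent FileSystem/YARN calls now authenticate via Kerberos
      }
    }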
Zookeeper
● Distributed, open-source coordination service for distributed applications.
keeps the distributed system functioning together as a single unit via synchronization and coordination
● Quorum algorithms for selecting leaders, agreeing on shared state
The minimum number of servers required to run ZooKeeper is called the quorum. ZooKeeper replicates the
whole data tree to all of the quorum servers.

https://medium.com/@akashsingla19/zookeeper-quorum-44906bb17d74
Hadoop with Automated failover

ZKFailoverController (ZKFC) is a ZooKeeper client which implements automated failover for the
NameNodes (active and standby) by:

● Failure detection - each NameNode maintains a persistent session in ZooKeeper. If the machine
crashes, the ZooKeeper session will expire, notifying the other NameNodes that a failover should be
triggered.
● Active NameNode election - If the active NameNode crashes, another node may take a special
exclusive lock in ZooKeeper indicating that it should become the next active.
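
The "special exclusive lock" can be pictured with ZooKeeper's ephemeral znodes. Below is a minimal illustrative sketch, not the actual ZKFC code; the ensemble address, znode path and payload are assumptions:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LeaderElectionSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical quorum of three ZooKeeper servers
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        try {
          // An EPHEMERAL znode vanishes when the creator's session expires,
          // so a crashed NameNode automatically releases the lock
          zk.create("/election/active-lock", "nn1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          System.out.println("Became active");
        } catch (KeeperException.NodeExistsException e) {
          System.out.println("Another node is active; staying standby");
        }
      }
    }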

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.0/fault-tolerance/content/configuring_and_deploying_namenode_automatic_failover.html
Flume

● Flume runs agents, which are long-lived Java processes that run sources and sinks, connected by channels
● A source in Flume produces events and delivers them to the channel
● The channel stores the events until they are forwarded to the sink
● The Flume installation is made up of a collection of connected agents running in a distributed topology

https://flume.apache.org/FlumeUserGuide.html
Flume

● A Flume event is defined as a unit of data flow having a byte payload and an optional set of string
attributes
● A Flume agent is a (JVM) process that hosts the components through which events flow from an
external source to the next destination
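
A single agent wiring a source to a sink through a channel is declared in a properties file. The sketch below follows the style of the Flume User Guide cited above; the agent and component names (a1, r1, c1, k1) are illustrative:

    # Hypothetical agent "a1": netcat source -> memory channel -> logger sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: produces one event per line received on localhost:44444
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Channel: buffers events in memory until the sink takes them
    a1.channels.c1.type = memory

    # Sink: writes each event to the log
    a1.sinks.k1.type = logger

    # Wiring: connect the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1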
Contents

● Introduction to Apache Hadoop


● HDFS
● YARN
● The Apache Hadoop Ecosystem

Quiz and End!
