Module 03 MapReduce - Distributed Off-Line Batch Processing and Yarn - Resource Negotiator

The document provides an overview of MapReduce and YARN, detailing their concepts, architectures, and functionalities. It explains the working processes of MapReduce, including data processing phases and the shuffle mechanism, as well as YARN's resource management and scheduling capabilities. Additionally, it discusses the Capacity Scheduler's features for resource allocation in a multi-tenant environment.

Uploaded by

Lucas Oliveira


Technical Principles of MapReduce and YARN

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.

Objectives
• Upon completion of this course, you will understand:
  - Concepts of MapReduce and YARN
  - Application scenarios and principles of MapReduce
  - Functions and architectures of MapReduce and YARN
  - New features of YARN

Contents
1. Introduction to MapReduce and YARN

2. Functions and Architectures of MapReduce and YARN

3. Resource Management and Task Scheduling of YARN

4. Enhanced Features

MapReduce Overview
• MapReduce is based on the MapReduce paper published by Google and is used for
  parallel computing over massive data sets (larger than 1 TB). Its highlights are:
  - Easy to program: Programmers only need to describe what to do, and the
    execution framework carries out the job accordingly.
  - Outstanding scalability: Cluster capacity can be increased by adding nodes.
  - High fault tolerance: Cluster availability and fault tolerance are improved
    by policies such as computing migration or data migration.

YARN Overview
• Apache Hadoop YARN (Yet Another Resource Negotiator) is a new Hadoop resource
  manager. It provides unified resource management and scheduling for upper-layer
  applications, which improves cluster resource utilization, unified resource
  management, and data sharing.

Position of YARN in FusionInsight
[Figure: FusionInsight architecture. Application service layer (OpenAPI/SDK,
REST/SNMP/Syslog); Data, Information, Knowledge, Wisdom; DataFarm (Porter, Miner,
Farmer); Hadoop API and Plugin API; Hadoop (Hive, M/R, Spark, Streaming, Flink,
YARN/ZooKeeper, HDFS/HBase); LibrA; Manager (system management, service governance,
security management).]

YARN is the resource management system of Hadoop 2.0. It is a general resource management module
that manages and schedules resources for applications.

Working Process of MapReduce (1)
[Figure: pipeline stages Commit (Job.jar, Job.split, Job.xml) → Split → Map →
Buffer in memory → Partition → Sort]

Before starting MapReduce, make sure that the files to be processed are stored in
HDFS.

MapReduce submits a request to ResourceManager, and ResourceManager then creates a
job. One application maps to one job (example job ID: job_201431281420_0001).

Before a job is submitted, the files to be processed are split. By default, the
MapReduce framework treats one block as one split. Client applications can redefine
the mapping between blocks and splits.

After the job is submitted to ResourceManager, ResourceManager selects an
appropriate NodeManager in the cluster, based on the workloads of the NodeManagers,
to run the ApplicationMaster. The ApplicationMaster initializes the job and requests
resources from ResourceManager. ResourceManager then selects an appropriate
NodeManager to start the container in which the task runs.

The outputs of Map are placed in an in-memory buffer. When the buffer overflows,
the data in the buffer is written to local disks. Before that, the following steps
must be completed:

1. Partition: By default, a hash algorithm is used for partitioning. The MapReduce
framework determines the number of partitions from the number of Reduce tasks.
Records with the same key are sent to the same Reduce task for processing.

2. Sort: The outputs of Map are sorted. For example, ('Hi','1'),('Hello','1') are
reordered as ('Hello','1'),('Hi','1').
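
The Partition and Sort steps above can be sketched in a few lines of Python. This is an illustration only, not the framework's code: `partition_for` and `partition_and_sort` are hypothetical names, and `zlib.crc32` stands in for whatever hash function the framework actually uses.

```python
import zlib


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """Map a key to a partition index in [0, num_reduce_tasks)."""
    # Assumption: crc32 is a stand-in hash. It is stable across processes,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition_and_sort(records, num_reduce_tasks):
    """Group Map output by partition, then sort each partition by key."""
    partitions = {p: [] for p in range(num_reduce_tasks)}
    for key, value in records:
        partitions[partition_for(key, num_reduce_tasks)].append((key, value))
    for bucket in partitions.values():
        bucket.sort(key=lambda kv: kv[0])
    return partitions


# With a single Reduce task there is one partition, so only sorting is visible.
print(partition_and_sort([("Hi", "1"), ("Hello", "1")], num_reduce_tasks=1))
# {0: [('Hello', '1'), ('Hi', '1')]}
```

Because the partition index depends only on the key, every record with the same key lands in the same partition, whichever Map task produced it.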

Working Process of MapReduce (2)
3. Combine: This operation is optional. For example, ('Hi','1'),('Hi','1'),
('Hello','1'),('Hello','1') are combined into ('Hi','2'),('Hello','2').

4. Spill/Merge: By the time a Map task finishes, many spill files have been
generated. These spill files must be merged into a single map output file (MOF,
MapOutFile) that is partitioned and sorted. To reduce the amount of data written to
disks, MapReduce allows MOFs to be compressed before they are written.

Copy: When the MOF output progress of the Map tasks reaches 3%, the Reduce tasks
start and obtain MOF files from each Map task. The number of Reduce tasks is
determined by the client, and the number of MOF partitions is determined by the
number of Reduce tasks. For this reason, the partitions of the MOF files output by
the Map tasks map to the Reduce tasks.

Sort/Merge (in memory or on disk): The MOF files need to be sorted. If the amount
of data received by a Reduce task is small, the data is stored directly in the
buffer. As the number of files in the buffer increases, a MapReduce background
thread merges the files into a large one. Many intermediate files are generated
during the merge operation. The last merge result is passed directly to the Reduce
function defined by the user.

[Figure: Combine → Spill/Merge → Copy → Sort/Merge → Reduce]

Shuffle is the data transfer process between the Map phase and the Reduce phase.
It involves the Reduce tasks obtaining MOF files from the Map tasks, and then
sorting and merging the MOF files.
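
As a rough illustration of the Combine and Merge steps, the sketch below combines sorted records and then merges several sorted spill runs into one sorted run. The function names are hypothetical, counts are plain integers rather than the string values shown on the slide, and real spill files live on disk rather than in Python lists.

```python
import heapq
from itertools import groupby


def combine(sorted_records):
    """Sum the counts of adjacent records that share a key (input sorted by key)."""
    return [(key, sum(count for _, count in group))
            for key, group in groupby(sorted_records, key=lambda kv: kv[0])]


def merge_spills(spills):
    """K-way merge of spill runs that are each already sorted by key."""
    return list(heapq.merge(*spills, key=lambda kv: kv[0]))


spill_1 = combine(sorted([("Hi", 1), ("Hi", 1), ("Hello", 1), ("Hello", 1)]))
spill_2 = combine(sorted([("Bye", 1), ("Hi", 1)]))
print(spill_1)  # [('Hello', 2), ('Hi', 2)]

# Merging keeps the result sorted, so a second combine pass can fold equal keys.
print(combine(merge_spills([spill_1, spill_2])))
# [('Bye', 1), ('Hello', 2), ('Hi', 3)]
```

The merge never re-sorts the data; it only interleaves runs that are already sorted, which is why each spill has to be sorted before it is written.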

Example: Typical Program WordCount
[Figure: the WordCount app, a master node with the ResourceManager and the NameNode,
and three slave nodes (Slaver #1 to Slaver #3), each with a NodeManager, a DataNode
(Data#1 to Data#3), and a container (A.1 to A.3).]

Input (file that contains words):
  Hello World Bye World
  Hello Hadoop Bye Hadoop
  Bye Hadoop Hello Hadoop

Output after MapReduce (number of times that each word occurs):
  Bye     3
  Hadoop  4
  Hello   3
  World   2

Input → Map → Output:
  1. "Hello World Bye World"      <Hello,1> <World,1> <Bye,1> <World,1>
  2. "Hello Hadoop Bye Hadoop"    <Hello,1> <Hadoop,1> <Bye,1> <Hadoop,1>
  3. "Bye Hadoop Hello Hadoop"    <Bye,1> <Hadoop,1> <Hello,1> <Hadoop,1>

Reduce Process of WordCount
Map Output → Combine → Map Output:
  Map 1:  <Hello,1> <World,1> <Bye,1> <World,1>     →  <Hello,1> <World,2> <Bye,1>
  Map 2:  <Hello,1> <Hadoop,1> <Bye,1> <Hadoop,1>   →  <Hello,1> <Hadoop,2> <Bye,1>
  Map 3:  <Bye,1> <Hadoop,1> <Hello,1> <Hadoop,1>   →  <Bye,1> <Hadoop,2> <Hello,1>

Shuffle → Reduce Input → Reduce → Reduce Output:
  <Bye,1 1 1>      →  Bye 3
  <Hadoop,2 2>     →  Hadoop 4
  <Hello,1 1 1>    →  Hello 3
  <World,2>        →  World 2
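
The whole WordCount flow (Map, Combine, Shuffle, Reduce) can be reproduced on one machine with a short Python sketch. It only mirrors the data flow of the example; the function names are hypothetical and nothing here is distributed or uses the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine_phase(pairs):
    """Pre-aggregate the output of one Map task into (word, local_count)."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def shuffle_phase(map_outputs):
    """Group the counts from all Map tasks by word, sorted by word."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for word, count in output:
            grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(word, counts):
    """Sum all partial counts for one word."""
    return word, sum(counts)


lines = [
    "Hello World Bye World",
    "Hello Hadoop Bye Hadoop",
    "Bye Hadoop Hello Hadoop",
]
combined = [combine_phase(map_phase(line)) for line in lines]
reduce_input = shuffle_phase(combined)
result = [reduce_phase(word, counts) for word, counts in reduce_input.items()]

print(reduce_input)  # {'Bye': [1, 1, 1], 'Hadoop': [2, 2], 'Hello': [1, 1, 1], 'World': [2]}
print(result)        # [('Bye', 3), ('Hadoop', 4), ('Hello', 3), ('World', 2)]
```

The combine step is what turns the two <Hadoop,1> pairs of one Map task into a single <Hadoop,2>, so less data crosses the network during Shuffle.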

[Figure: YARN architecture. Clients, the ResourceManager, and NodeManagers that host
App Mstr (ApplicationMaster) instances and containers. Arrow legend: MapReduce
Status, Job Submission, Node Status, Resource Request.]
YARN HA Solution
ResourceManager in YARN manages resources and schedules tasks in the cluster. The
YARN HA solution uses redundant ResourceManager nodes to eliminate the single point
of failure of ResourceManager.

[Figure: an active ResourceManager and a standby ResourceManager backed by a
ZooKeeper cluster of three nodes. 1. The active RM writes its state into ZooKeeper.
2. If the active RM fails, failover to the standby RM is automatic.]

[Figure labels: Container, AM-1, Restart/Failure.]

Resource Management
• YARN manages and allocates memory and CPU resources.
• The memory and CPU resources of each NodeManager can be configured (on the YARN
  service configuration page):
  - yarn.nodemanager.resource.memory-mb
  - yarn.nodemanager.vmem-pmem-ratio
  - yarn.nodemanager.resource.cpu-vcores

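
To make the three NodeManager parameters concrete, the sketch below does some back-of-the-envelope arithmetic with them. The values are assumptions chosen for illustration, the helper names are hypothetical, and the container count ignores scheduler settings such as minimum allocation. The CPU property is spelled `yarn.nodemanager.resource.cpu-vcores`, as in Apache Hadoop.

```python
# Hypothetical NodeManager settings (illustrative values, not recommendations).
node_manager_conf = {
    "yarn.nodemanager.resource.memory-mb": 8192,  # physical memory for containers, MB
    "yarn.nodemanager.vmem-pmem-ratio": 2.1,      # virtual-to-physical memory ratio
    "yarn.nodemanager.resource.cpu-vcores": 8,    # virtual cores for containers
}


def container_vmem_limit_mb(container_pmem_mb, conf):
    """Virtual memory a container may use, given its physical memory allocation."""
    return container_pmem_mb * conf["yarn.nodemanager.vmem-pmem-ratio"]


def max_containers(container_pmem_mb, container_vcores, conf):
    """Upper bound on same-sized containers that fit on one NodeManager."""
    by_memory = conf["yarn.nodemanager.resource.memory-mb"] // container_pmem_mb
    by_cpu = conf["yarn.nodemanager.resource.cpu-vcores"] // container_vcores
    return min(by_memory, by_cpu)


print(max_containers(1024, 1, node_manager_conf))  # 8
print(max_containers(2048, 1, node_manager_conf))  # 4
```

Whichever resource runs out first, memory or vcores, bounds the number of containers.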
Resource Allocation Model
The scheduler allocates resources in three steps:
1. Selects a queue.
2. Selects an application from the queue.
3. Matches requested resources on the application.

[Figure: a queue tree (Root, Parent, Leaf); App 1, App 2, ..., App N in a leaf
queue; resource requests that name Server A, Server B, Rack A, Rack B, or any
resources.]

Capacity Scheduler Overview
• Capacity Scheduler enables Hadoop applications to run in a shared, multi-tenant
  cluster while maximizing the throughput and utilization of the cluster.
• Capacity Scheduler allocates resources by queue. Users can set upper and lower
  limits for the resource usage of each queue. Administrators can restrict the
  resources used by a queue, user, or job. Job priorities can be set, but resource
  preemption is not supported.

Highlights of Capacity Scheduler
• Capacity assurance: Administrators can set upper and lower limits for the resource
  usage of each queue. All applications submitted to a queue share its resources.
• Flexibility: The remaining resources of a queue can be used by other queues that
  require resources. If a new application is submitted to the queue, the other
  queues release the resources and return them to the queue.
• Priority: Priority queuing is supported (FIFO by default).
• Multi-leasing: Multiple users can share a cluster, and multiple applications can
  run concurrently. Administrators can add multiple restrictions to prevent cluster
  resources from being exclusively occupied by an application, user, or queue.
• Dynamic update of configuration files: Administrators can dynamically modify
  configuration parameters to manage clusters online.

• During scheduling, an appropriate queue is selected first, based on the following
  policies:
  - The queue with the lowest resource usage is allocated resources first. For
    example, suppose two queues, Q1 and Q2, both have a capacity of 30. If the used
    capacity of Q1 is 10 and that of Q2 is 12, resources are allocated to Q1 first.
  - The queue at the shallowest level of the queue hierarchy is allocated resources
    first. For example, between QueueA and QueueB.childQueueB, resources are
    allocated to QueueA first.
  - Resources are allocated to the resource reclamation request queue first.
• A task is then selected from the queue based on the following policy:
  - The task is selected based on the task priority and submission sequence as well
    as the limits of user resources and memory.
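
The first two queue selection policies can be sketched as a sort key. This is a simplification under stated assumptions: the `Queue` class and `select_queue` are hypothetical, the text does not say how the policies rank against each other (here usage is compared first and hierarchy depth breaks ties), and the reclamation policy is left out.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    path: str        # dot-separated queue path, e.g. "QueueB.childQueueB"
    capacity: float  # configured capacity
    used: float      # capacity currently in use

    @property
    def usage(self) -> float:
        """Fraction of the queue's capacity that is in use."""
        return self.used / self.capacity

    @property
    def depth(self) -> int:
        """Number of levels below the top-level queue."""
        return self.path.count(".")


def select_queue(queues):
    """Pick the queue with the lowest usage; prefer the shallower queue on ties."""
    # Assumption: this ordering of the two policies is chosen for illustration.
    return min(queues, key=lambda q: (q.usage, q.depth))


print(select_queue([Queue("Q1", 30, 10), Queue("Q2", 30, 12)]).path)  # Q1
print(select_queue([Queue("QueueB.childQueueB", 30, 10),
                    Queue("QueueA", 30, 10)]).path)                   # QueueA
```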

Queue Resource Limitation (1)
Queues are created on the Tenant page. After a tenant is created and
associated with YARN, a queue with the same name as the tenant is created.
For example, if tenants QueueA and QueueB are created, two YARN queues
QueueA and QueueB are created.

Queue Resource Limitation (2)
• Queue resource capacity (percentage): Suppose there are three queues, default,
  QueueA, and QueueB, and each has a [queue name].capacity configuration:
  - The capacity of the default queue is 20% of the total cluster resources.
  - The capacity of the QueueA queue is 10% of the total cluster resources.
  - The capacity of the QueueB queue is 10% of the total cluster resources.
  - The capacity of the root-default shadow queue in the background is 60% of the
    total cluster resources.

[Figure: Resource Allocation table with the columns Tenant (Queue) and Resource
Capacity.]
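
The four percentages above add up to 100% of the cluster. A small Python check makes that explicit and converts a percentage into an absolute amount; the helper names and the cluster size of 102400 MB are assumptions for illustration.

```python
# Capacity of each queue as a percentage of the total cluster resources,
# i.e. the value of each queue's [queue name].capacity setting.
queue_capacity = {
    "default": 20,
    "QueueA": 10,
    "QueueB": 10,
    "root-default": 60,  # shadow queue in the background
}


def validate_capacities(capacities):
    """Return the total, raising if the capacities do not add up to 100%."""
    total = sum(capacities.values())
    if total != 100:
        raise ValueError(f"queue capacities sum to {total}%, expected 100%")
    return total


def guaranteed_memory_mb(cluster_memory_mb, capacity_percent):
    """Memory that a queue's capacity corresponds to on a cluster of the given size."""
    return cluster_memory_mb * capacity_percent // 100


print(validate_capacities(queue_capacity))                    # 100
print(guaranteed_memory_mb(102400, queue_capacity["QueueA"]))  # 10240
```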

Queue Resource Limitation (3)
• Sharing idle resources
  - Because resources are shared, the resources used by a queue may exceed its
    capacity (for example, QueueA.capacity). The maximum resource usage can be
    limited by the maximum-capacity parameter.
  - If only a few tasks are running in a queue, the remaining resources of the
    queue can be shared with other queues. For example, if maximum-capacity of
    QueueA is set to 100 and tasks are running in QueueA only, QueueA can
    theoretically use all the cluster resources.
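
The interplay between capacity, maximum-capacity, and idle resources can be sketched as follows. All quantities are percentages of the cluster, `queue_usage_percent` is a hypothetical helper, and the model ignores user limits and timing.

```python
def queue_usage_percent(demand, capacity, maximum_capacity, idle_in_other_queues):
    """Resources a queue can use, as a percentage of the cluster.

    The queue first uses its own capacity, then borrows idle resources from
    other queues, but never goes beyond maximum_capacity.
    """
    guaranteed = min(demand, capacity)
    borrowed = min(demand - guaranteed,
                   maximum_capacity - capacity,
                   idle_in_other_queues)
    return guaranteed + borrowed


# QueueA has capacity 10; all other queues are idle (90% of the cluster).
print(queue_usage_percent(100, 10, 100, 90))  # 100: QueueA can use the whole cluster
print(queue_usage_percent(100, 10, 30, 90))   # 30: capped by maximum-capacity
print(queue_usage_percent(5, 10, 100, 90))    # 5: demand is below the capacity
```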

User and Task Limitation
Log in to FusionInsight Manager and choose Tenant > Dynamic Resource
Plan > Queue Config to configure user and task limitation parameters.

User Limitation (1)
• Minimum resource assurance (percentage) of a user:
  - The resources available to each user in a queue are limited at all times. If
    tasks of multiple users are running at the same time in a queue, the resource
    usage of each user fluctuates between a minimum value and a maximum value. The
    maximum value is determined by the number of running tasks, while the minimum
    value is determined by minimum-user-limit-percent.
  - For example, if yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent=25,
    the queue resources are adjusted as follows as the number of users who submit
    tasks increases:

  Event                                       Result
  The first user submits tasks to QueueA      The user obtains 100% of QueueA resources.
  The second user submits tasks to QueueA     Each user obtains 50% of QueueA resources at most.
  The third user submits tasks to QueueA      Each user obtains 33.33% of QueueA resources at most.
  The fourth user submits tasks to QueueA     Each user obtains 25% of QueueA resources at most.
  The fifth user submits tasks to QueueA      To ensure that each user can obtain at least 25% of the
                                              resources, the fifth user cannot obtain any resources and
                                              must wait for them to be released by other users.
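
The table follows from a simple rule: the queue is divided evenly among the users being served, and no more users are served than the minimum percentage allows. The sketch below reproduces the numbers; `user_share` is a hypothetical helper that assumes at least one user and ignores the other limits a real scheduler applies.

```python
def user_share(active_users, minimum_user_limit_percent):
    """Return (share per served user in %, number of users left waiting)."""
    # At most this many users can each be guaranteed the minimum percentage.
    max_served = 100 // minimum_user_limit_percent
    served = min(active_users, max_served)
    waiting = active_users - served
    return round(100 / served, 2), waiting


for users in range(1, 6):
    print(users, user_share(users, 25))
# 1 (100.0, 0)
# 2 (50.0, 0)
# 3 (33.33, 0)
# 4 (25.0, 0)
# 5 (25.0, 1)
```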

User Limitation (2)
• Maximum resource usage of a user (as a multiple of queue capacity):
  - This parameter sets the maximum resources that a user can obtain, expressed as
    a multiple of the queue capacity. The default value is 1:
    yarn.scheduler.capacity.root.QueueD.user-limit-factor = 1 means that the
    resources obtained by a user cannot exceed the queue capacity. No matter how
    many free resources the cluster has, the resources obtained by a user cannot
    exceed maximum-capacity.
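
A minimal sketch of the rule above, with all quantities as percentages of the cluster. The function name and the example values are assumptions for illustration.

```python
def max_user_resources(queue_capacity, user_limit_factor, maximum_capacity):
    """Upper bound on what one user can obtain, as a percentage of the cluster."""
    # The user may take a multiple of the queue capacity, but never more than
    # the queue's maximum-capacity.
    return min(queue_capacity * user_limit_factor, maximum_capacity)


print(max_user_resources(10, 1, 100))   # 10: default factor, bounded by queue capacity
print(max_user_resources(10, 2, 100))   # 20: the user may use twice the queue capacity
print(max_user_resources(10, 20, 100))  # 100: bounded by maximum-capacity
```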

Task Limitation
• Maximum number of active tasks:
  - Indicates the maximum number of active tasks in the cluster, including running
    and suspended tasks. When the number of submitted task requests reaches the
    limit, new tasks are rejected. The default value is 10000.
• Maximum number of tasks in a queue:
  - Indicates the maximum number of tasks submitted to a queue. If the parameter
    value is set to 1000 for QueueA, QueueA allows a maximum of 1000 active tasks.
• Maximum number of tasks submitted by a user:
  - Depends on the maximum number of tasks in the queue. If QueueA allows a maximum
    of 1000 tasks, the maximum number of tasks that each user can submit is:
    1000 * yarn.scheduler.capacity.root.QueueA.minimum-user-limit-percent (assume 25%)
    * yarn.scheduler.capacity.root.QueueA.user-limit-factor (assume 1).
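
Worked through with the assumed values, the formula gives 250 tasks per user. The helper name below is hypothetical; the arithmetic is the formula from the text.

```python
def max_tasks_per_user(queue_max_tasks, minimum_user_limit_percent, user_limit_factor):
    """Queue limit * minimum-user-limit-percent * user-limit-factor."""
    return int(queue_max_tasks * minimum_user_limit_percent / 100 * user_limit_factor)


print(max_tasks_per_user(1000, 25, 1))  # 250
```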


[Figure: flowchart with two decisions. First: does the memory usage exceed the
container threshold (No/Yes)? Second: does the total memory usage exceed the memory
threshold set for NodeManager? If yes, containers with excessive memory usage
cannot run.]

NM MEM Threshold = yarn.nodemanager.resource.memory-mb * 1024 * 1024
                   * yarn.nodemanager.dynamic.memory.usage.threshold
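
One reading of the flowchart and the threshold formula is sketched below. The usage threshold of 0.5 and the container sizes are hypothetical values, the function names are invented for illustration, and the actual NodeManager behavior may differ in detail.

```python
GIB = 1024 ** 3


def nm_memory_threshold_bytes(memory_mb, usage_threshold):
    """memory-mb * 1024 * 1024 * dynamic.memory.usage.threshold, in bytes."""
    return int(memory_mb * 1024 * 1024 * usage_threshold)


def containers_over_limit(usage, limits, nm_threshold):
    """Containers that cannot keep running under the two checks in the flowchart."""
    over = [cid for cid in usage if usage[cid] > limits[cid]]
    # A container above its own threshold is only a problem once the total
    # usage on the node also exceeds the NodeManager threshold.
    if sum(usage.values()) > nm_threshold:
        return over
    return []


threshold = nm_memory_threshold_bytes(8192, 0.5)  # hypothetical settings
limits = {"c1": 2 * GIB, "c2": 2 * GIB}

print(threshold)  # 4294967296
print(containers_over_limit({"c1": 3 * GIB, "c2": GIB // 2}, limits, threshold))      # []
print(containers_over_limit({"c1": 3 * GIB, "c2": 3 * GIB // 2}, limits, threshold))  # ['c1']
```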

Enhanced Features - YARN Label-based Scheduling
[Figure: tasks in a queue are scheduled to NodeManagers on different kinds of
servers.]

  Applications                                            NodeManager servers
  Applications that have common resource requirements     Servers with standard performance
  Applications that have demanding memory requirements    Servers with large memory
  Applications that have demanding I/O requirements       Servers with high I/Os

Summary
• This module describes the application scenarios and architectures of MapReduce
  and YARN, resource management and task scheduling of YARN, and the enhanced
  features of YARN in FusionInsight HD.

2. What is the working principle of Yarn?

A. Easy to program

B. Outstanding scalability

C. Real-time computing

D. High fault tolerance

A. Memory

B. CPU

C. Container

D. Disk space

A. Iterative computing

B. Offline computing

C. Real-time interactive computing

D. Stream computing

A. Capacity assurance

B. Flexibility

C. Multi-leasing

D. Dynamic update of configuration files

More Information
• Training materials:
  - http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
• Exam outline:
  - http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
• Mock exam:
  - http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
• Authentication process:
  - http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
