CH 2
-Data engineers - responsible for finding connections or relations within the data
and for correcting and cleaning the data.
-Data modelers - focus on model generation; make use of machine learning.
-Subject matter experts - provide feedback on the model so it can be revised.
2 components
HDFS - Hadoop Distributed File System
-interacts with the storage components on all computers in the distributed network.
YARN - Yet Another Resource Negotiator
-interacts with the compute resources (memory and processor cores) on all computers
in the distributed network.
Architecture
- Data is distributed immediately when it is added to the cluster and stored on
multiple nodes.
- Nodes process data that is stored locally in order to minimize traffic across the
network.
- Data is stored in blocks of a fixed size (usually 128 MB) and each block is
duplicated multiple times across the system to provide redundancy and data safety.
- A computation is referred to as a job. Jobs are broken into tasks, where each
individual node performs the task on a single block of data (a toy sketch of blocks
and tasks follows right after this list).
- Jobs are written at a high level of abstraction, without concern for network
programming, timing, or the low-level infrastructure of each individual node (HDFS
and YARN handle this), allowing developers to focus on the data and the computation
rather than on distributed programming details.
- Reading from and writing to disk during computation is kept to a minimum to speed
up processing.
- The amount of network traffic between nodes should be minimized transparently by
the system. Each task should be independent, and nodes should not have to
communicate with each other during processing, to ensure that there are no
interprocess dependencies that could lead to deadlock.
- Jobs are fault tolerant usually through task redundancy, such that if a single
task fails, the final computation is not incorrect or incomplete
- Master programs allocate work to worker nodes such that many worker nodes can
operate in parallel, each on their own portion of the larger dataset.
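A minimal, pure-Python sketch of the blocks-and-tasks idea above (all names here are
illustrative; this is not a Hadoop API):

```python
# Illustrative only: simulate splitting a dataset into fixed-size blocks and
# running one independent task per block, then combining the partial results.

BLOCK_SIZE = 8  # stand-in for the "128 MB" block size, measured in items

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide the dataset into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def task(block):
    """An independent task: operates only on its own (local) block."""
    return sum(block)

if __name__ == "__main__":
    dataset = list(range(100))            # the "large" dataset
    blocks = split_into_blocks(dataset)   # blocks would live on different nodes
    partials = [task(b) for b in blocks]  # each node works on its local block
    print("job result:", sum(partials))   # 4950, same as sum(dataset)
```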
HADOOP CLUSTERS
HDFS and YARN expose an application programming interface (API) that abstracts
developers from low-level cluster administration details.
A set of machines that is running HDFS and YARN is known as a cluster, and the
individual machines are called nodes.
A cluster can have a single node, or many thousands of nodes, but all clusters
scale horizontally, meaning as you add more nodes,
the cluster increases in both capacity and performance in a linear fashion.
Daemon processes - background processes of HDFS and YARN that make any computation
on big data successful! They include the master programs that allocate work to
worker nodes, allowing them to work in parallel, each on its own portion of the
larger dataset.
MapReduce computations, by contrast, run in parallel as batch jobs on top of these
daemons.
YARN and HDFS are implemented by several daemon processes—that is, software that
runs in the background and does not require user input.
Hadoop processes are services, meaning they run all the time on a cluster node and
accept input and deliver output through the network, similar to how an HTTP server
works.
Each of these processes runs inside of its own Java Virtual Machine (JVM) so each
daemon has its own system resource allocation and is managed independently by the
operating system.
Each node in the cluster is identified by the type of process or processes that it
runs:
Master nodes
These nodes run coordinating services for Hadoop workers and are usually the entry
points for user access to the cluster.
Without masters, coordination would fall apart, and distributed storage or
computation would not be possible.
Worker nodes
These nodes are the majority of the computers in the cluster. Worker nodes run
services that accept tasks from master nodes
—either to store or retrieve data or to run a particular application.
A distributed computation is run by parallelizing the analysis across worker nodes.
HDFS and YARN work in concert to minimize the amount of network traffic in the
cluster primarily by ensuring that data is local to the required computation.
Duplication of both data and tasks ensures fault tolerance, recoverability, and
consistency.
Moreover, the cluster is centrally managed to provide scalability and to abstract
low-level clustering programming details.
Together, HDFS and YARN are a platform upon which big data applications are built;
perhaps more than just a platform, they provide an operating system for big data.
HDFS
NameNode (Master)
-Stores the directory tree of the file system, file metadata, and the locations of
each file in the cluster.
-Clients wanting to access HDFS must first locate the appropriate storage nodes by
requesting information from the NameNode.
-The master NameNode keeps track of what blocks make up a file and where those
blocks are located.
-The NameNode communicates with the DataNodes, the processes that actually hold the
blocks in the cluster.
-Metadata associated with each file is stored in the memory of the NameNode master
for quick lookups, and if the NameNode stops or fails,
the entire cluster will become inaccessible!
-When a client application wants access to read a file, it first requests the
metadata from the NameNode to locate the blocks that make up the file,
as well as the locations of the DataNodes that store the blocks. The application
then communicates directly with the DataNodes to read the data.
Therefore, the NameNode simply acts like a journal or a lookup table and is not a
bottleneck to simultaneous reads.
-HDFS manages metadata about each block so that the blocks of a file can be
retrieved in order.
-NameNode does not process or store the data itself.
-To upload a file, a client first asks the NameNode where the blocks should go. The
client then divides the file into blocks and writes those blocks directly to the
DataNodes the NameNode assigned (see the toy sketch below).
Info about chunks of a file is stored through Hash3
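The toy sketch below mimics the read path described above in plain Python (none of
these classes are real Hadoop APIs): the NameNode answers only metadata lookups, and
the client pulls the blocks directly from the DataNodes, in order.

```python
# Toy simulation of the HDFS read path: the NameNode holds only metadata,
# while the DataNodes hold the actual block contents.

class NameNode:
    def __init__(self):
        self.metadata = {}  # file name -> ordered list of (block_id, datanode_id)

    def lookup(self, filename):
        """Return block locations for a file (metadata only, no data)."""
        return self.metadata[filename]

class DataNode:
    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

# Build a tiny "cluster" by hand.
namenode = NameNode()
datanodes = {"dn1": DataNode(), "dn2": DataNode()}
datanodes["dn1"].blocks["blk_0"] = b"hello "
datanodes["dn2"].blocks["blk_1"] = b"world"
namenode.metadata["greeting.txt"] = [("blk_0", "dn1"), ("blk_1", "dn2")]

# Client read: ask the NameNode where the blocks live, then read them
# directly from the DataNodes, in order.
locations = namenode.lookup("greeting.txt")
data = b"".join(datanodes[dn].read_block(blk) for blk, dn in locations)
print(data.decode())  # -> "hello world"
```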
DataNode (Worker)
-Stores and manages HDFS blocks on the local disk.
-Reports health and status of individual data stores back to the NameNode.
-HDFS files are split into blocks, usually of either 64 MB or 128 MB, although this
is configurable at runtime and high-performance systems
typically select block sizes of 256 MB. The block size is the minimum amount of
data that can be read or written to in HDFS.
However, unlike blocks on a single disk, files that are smaller than the block size
do not occupy a full block's worth of space on the actual file system.
Additionally, blocks will be replicated across the DataNodes. By default, the
replication is three-fold, but this is also configurable at runtime.
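As a quick back-of-the-envelope check of those numbers (the 1 GB file is just an
example):

```python
import math

file_size_mb = 1024   # example: a 1 GB file
block_size_mb = 128   # a common HDFS block size
replication = 3       # HDFS default replication factor

num_blocks = math.ceil(file_size_mb / block_size_mb)   # 8 blocks
raw_storage_mb = file_size_mb * replication            # 3072 MB across the cluster

print(num_blocks, raw_storage_mb)
```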
HDFS performs best with a moderate number of very large files; millions of small
files put pressure on the NameNode, which must hold all file metadata in memory.
Storage Pattern - HDFS implements WORM (Write Once, Read Many) - No random writes
or appends to files
HDFS is optimized for large, streaming reads of files, not for random reads or
selective lookups.
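As a sketch of what a large streaming read looks like from client code, assuming the
third-party Python package `hdfs` (HdfsCLI) is installed and that WebHDFS is
reachable at the hypothetical address and path below:

```python
# Sketch only: the package, URL, user, and file path are assumptions.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:9870", user="analyst")

# Stream the file in 1 MB chunks instead of loading it all into memory,
# which matches HDFS's strength: large sequential reads.
line_count = 0
with client.read("/data/logs/access.log", chunk_size=1024 * 1024) as reader:
    for chunk in reader:
        line_count += chunk.count(b"\n")

print("lines:", line_count)
```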
What kinds of applications work well with HDFS?
It is not a good fit as a data backend for applications that require updates in
real-time, interactive data analysis, or record-based transactional support.
It does not work well with transactional applications, as they require real-time
updates with each transaction to maintain consistency and integrity.
Instead, by writing data only once and reading many times, HDFS users tend to
create large stores of heterogeneous data to aid in a variety of different
computations and analytics.
These stores are sometimes called “data lakes” because they simply hold all data
about a known problem in a recoverable and fault-tolerant manner.
YARN Nodes
ResourceManager (Master)
-Allocates and monitors available cluster resources (e.g., physical assets like
memory and processor cores) to applications as well as
handling scheduling of jobs on the cluster.
ApplicationMaster (Master)
-Coordinates a particular application being run on the cluster as scheduled by the
ResourceManager.
NodeManager (Worker)
-Runs and manages processing tasks on an individual node as well as reports the
health and status of tasks as they’re running.
-clients that wish to execute a job must first request resources from the
ResourceManager, which assigns an application-specific ApplicationMaster for the
duration of the job.
The ApplicationMaster tracks the execution of the job, while the ResourceManager
tracks the status of the nodes,
and each individual NodeManager creates containers and executes tasks within them.
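A toy walkthrough of that flow in plain Python (the classes below are illustrative
stand-ins, not YARN APIs):

```python
# Toy simulation of the YARN submission flow: client -> ResourceManager ->
# ApplicationMaster -> NodeManagers running tasks in containers.

class NodeManager:
    def run_container(self, task):
        """Execute one task inside a "container" on this worker node."""
        return task()

class ApplicationMaster:
    """Coordinates a single application: hands tasks to NodeManagers."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def run_job(self, tasks):
        return [self.node_managers[i % len(self.node_managers)].run_container(t)
                for i, t in enumerate(tasks)]

class ResourceManager:
    """Allocates cluster resources and starts one ApplicationMaster per job."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit(self, tasks):
        app_master = ApplicationMaster(self.node_managers)
        return app_master.run_job(tasks)

rm = ResourceManager([NodeManager(), NodeManager(), NodeManager()])
tasks = [lambda i=i: i * i for i in range(6)]  # six independent tasks
print(rm.submit(tasks))                        # [0, 1, 4, 9, 16, 25]
```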
Note that there may be other processes running on the Hadoop cluster as well—for
example,
JobHistory servers or ZooKeeper coordinators, but these services are the primary
software running in a Hadoop cluster.
The moment the ApplicationMaster stops receiving a liveness signal from a worker
node, it contacts the ResourceManager. This ensures fault tolerance, as another
worker node can take over the failed work.
Master processes are so important that they usually are run on their own node so
they don’t compete for resources and present a bottleneck.
However, in smaller clusters, the master daemons may all run on a single node.
MAPREDUCE
MapReduce is a simple but very powerful computational framework specifically
designed to enable fault-tolerant distributed computation across
a cluster of centrally managed machines. It does this by employing a “functional”
programming style that is inherently parallelizable—by allowing multiple
independent tasks to execute a function on local chunks of data and aggregating the
results after processing.
Multiple tasks are carried out across the nodes in a cluster with no need to share
state; i.e., no task carried out by one node depends on any other node, other than
through the sequence of outputs passed from map to reduce.
Map applies a function to many items independently, before sending its many results
off to the reducer as a sequence.
Reduce takes two arguments: a sequence, and a function that it uses to operate over
the sequence and reduce it to a single thing.
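In Python terms, the built-in `map` and `functools.reduce` show exactly this split:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Map: apply a function to each element independently (parallelizable in principle).
squares = list(map(lambda x: x * x, numbers))       # [1, 4, 9, 16, 25]

# Reduce: fold the sequence down to a single value with a combining function.
total = reduce(lambda acc, x: acc + x, squares, 0)  # 55

print(squares, total)
```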
MapReduce is stateless: variables are given as input and carried forward via
output. As such, any node that has the input data can complete the computation and
get the same result. By chaining the outputs of functions to the inputs of other
functions, we can guarantee that we will reach a final computation across the
entire dataset.
The MapReduce model consists of two main phases: the Map phase and the Reduce
phase.
Map Phase:
-In this phase, the input data is divided into smaller chunks, and a function
called the "mapper" is applied to each chunk independently.
-The mapper function takes the input data and transforms it into a set of key-value
pairs, where the key represents
some attribute of the data and the value represents the processed data itself.
-Each mapper operates independently and processes its chunk of data in parallel
across multiple machines in a distributed environment.
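For example, a word-count mapper written for Hadoop Streaming (which lets the mapper
and reducer be ordinary scripts that read records from stdin and write tab-separated
key-value pairs to stdout) could look like this sketch:

```python
#!/usr/bin/env python
# mapper.py - emits one (word, 1) pair per word, tab-separated,
# in the style expected by Hadoop Streaming.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```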
Reduce Phase:
-In this phase, the intermediate key-value pairs generated by the mappers are input
to a function called the "reducer."
-The reducer function takes the key and the corresponding list of values (grouped
by the key) and performs
some aggregation or computation on those values to produce the final output.
-Like the mapper phase, the reduce phase operates independently and in parallel
across multiple machines.
-The output of the reducer is typically written to a file or some other storage
system.
-The reducer is intended to aggregate the many values that are output from the map
phase in order
to transform a large volume of data into a smaller, more manageable set of summary
data, but has many other uses as well.
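The matching word-count reducer then sums the values for each key. Between the two
phases, Hadoop sorts the mapper output by key, so lines with the same word arrive
consecutively:

```python
#!/usr/bin/env python
# reducer.py - sums the counts per word, assuming the input is sorted by key
# (as Hadoop Streaming guarantees between the map and reduce phases).
import sys

current_word, current_count = None, 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The pair can be tested locally with an ordinary shell pipeline, e.g.
`cat input.txt | python mapper.py | sort | python reducer.py`, before submitting it
to a cluster.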
In fact, the use of multiple MapReduce jobs to perform a single computation is how
more complex applications are constructed, through a process called “job chaining.”
By creating data flows through a system of intermediate MapReduce jobs, we can
create a pipeline of analytical steps that lead us to our end result.
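As a toy illustration of job chaining (pure Python, not a Hadoop API), the key-value
output of one map/reduce pass below becomes the input of the next:

```python
from itertools import groupby
from operator import itemgetter

def run_job(records, mapper, reducer):
    """Minimal in-memory MapReduce pass: map, group by key, then reduce."""
    mapped = [kv for record in records for kv in mapper(record)]
    mapped.sort(key=itemgetter(0))
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(mapped, key=itemgetter(0))]

lines = ["to be or not to be", "to see or not to see"]

# Job 1: word count.
counts = run_job(
    lines,
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda word, ones: (word, sum(ones)),
)

# Job 2 (chained): group words by how often they occur, using Job 1's output.
by_count = run_job(
    counts,
    mapper=lambda wc: [(wc[1], wc[0])],
    reducer=lambda count, words: (count, sorted(words)),
)

print(counts)    # [('be', 2), ('not', 2), ('or', 2), ('see', 2), ('to', 4)]
print(by_count)  # [(2, ['be', 'not', 'or', 'see']), (4, ['to'])]
```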
The MapReduce API is written in Java, and therefore MapReduce jobs submitted to the
cluster are going to be compiled Java Archive (JAR) files.
Hadoop will transmit the JAR files across the network to each node that will run a
task (either a mapper or reducer) and the individual tasks of the MapReduce job are
executed.