Mapreduce and Hadoop Distributed File System: K. Madurai and B. Ramamurthy

The document introduces MapReduce and Hadoop Distributed File System (HDFS). It provides context on the growth of big data. It then describes MapReduce as a programming model for processing large datasets in parallel across clusters of machines. HDFS is introduced as a file system that supports MapReduce and handles failures, communications and performance. The outline discusses how MapReduce works and its relevance to undergraduate curriculums.


MapReduce and Hadoop Distributed File System
K. MADURAI AND B. RAMAMURTHY
Contact:
Dr. Bina Ramamurthy
CSE Department
University at Buffalo (SUNY)
[email protected]
http://www.cse.buffalo.edu/faculty/bina
Partially Supported by
NSF DUE Grant: 0737243

10/6/2019
The Context: Big-data
• Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009)
• Google collects 270PB of data in a month (2007) and processes about 20PB a day (2008)
• 2010 census data is expected to be a huge gold mine of information
• Data mining of huge amounts of data collected in a wide range of domains, from astronomy to healthcare, has become essential for planning and performance.
• We are in a knowledge economy.
◦ Data is an important asset to any organization
◦ Discovery of knowledge; enabling discovery; annotation of data
• We are looking at newer
◦ programming models, and
◦ supporting algorithms and data structures.
• NSF refers to it as “data-intensive computing” and industry calls it “big-data” and “cloud computing”.
Purpose of this talk
• To provide a simple introduction to:
◦ “The big-data computing”: an important advancement that has the potential to significantly impact the CS and undergraduate curriculum.
◦ A programming model called MapReduce for processing “big-data”
◦ A supporting file system called Hadoop Distributed File System (HDFS)
• To encourage educators to explore ways to infuse relevant concepts of this emerging area into their curriculum.
The Outline
Introduction to MapReduce
From CS Foundation to MapReduce
MapReduce programming model
Hadoop Distributed File System
Relevance to Undergraduate Curriculum
Demo (Internet access needed)
Our experience with the framework
Summary
References

MapReduce

What is MapReduce?
• MapReduce is a programming model Google has used successfully in processing its “big-data” sets (~20 petabytes per day)
◦ Users specify the computation in terms of a map and a reduce function,
◦ The underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, and
◦ The underlying system also handles machine failures, efficient communications, and performance issues.
-- Reference: Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Communications of the ACM 51, 1 (Jan. 2008), 107-113.
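The division of labor above can be sketched in a few lines of Python. This is a minimal single-process illustration under our own assumptions, not Hadoop's API: the user writes only `word_map` and `word_reduce`, and the hypothetical `run_mapreduce` function stands in for the runtime by grouping intermediate pairs by key. It performs none of the parallelization or failure handling that the real runtime provides.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

def run_mapreduce(records: Iterable[str],
                  map_fn: Callable[[str], Iterator[tuple[str, int]]],
                  reduce_fn: Callable[[str, list[int]], int]) -> dict[str, int]:
    """Toy stand-in for the MapReduce runtime (sequential, in memory)."""
    groups: dict[str, list[int]] = defaultdict(list)
    # Map phase: apply the user's map function to every input record.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # group intermediate values by key
    # Reduce phase: apply the user's reduce function once per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# User code: only these two functions are specific to the word-count problem.
def word_map(line: str) -> Iterator[tuple[str, int]]:
    for word in line.split():
        yield word, 1

def word_reduce(word: str, counts: list[int]) -> int:
    return sum(counts)

result = run_mapreduce(["web weed green", "sun moon land part web green"],
                       word_map, word_reduce)
print(result)
# {'web': 2, 'weed': 1, 'green': 2, 'sun': 1, 'moon': 1, 'land': 1, 'part': 1}
```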
From CS Foundations to MapReduce
Consider a large data collection:
{web, weed, green, sun, moon, land, part, web, green,…}
Problem: Count the occurrences of the different words in the collection.

Let's design a solution for this problem:
• We will start from scratch
• We will add and relax constraints
• We will do incremental design, improving the solution for performance and scalability
Word Counter and Result Table
[Figure: Main runs a single WordCounter, with operations parse( ) and count( ), over the DataCollection {web, weed, green, sun, moon, land, part, web, green,…} and records the counts in a ResultTable.]
ResultTable: web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1
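The single-counter design in the figure might look like this, assuming the data collection arrives as one in-memory string written in the slide's `{a, b, …}` notation. The class and method names mirror the slide's WordCounter, parse( ) and count( ); the bodies are our own illustration.

```python
from collections import Counter

class WordCounter:
    """Single-threaded word counter mirroring the slide's parse()/count() design."""

    def __init__(self, data_collection: str) -> None:
        self.data_collection = data_collection
        self.result_table: Counter[str] = Counter()

    def parse(self) -> list[str]:
        # Turn "{web, weed, ...}" into ["web", "weed", ...].
        cleaned = self.data_collection.replace(",", " ").strip("{}")
        return cleaned.split()

    def count(self) -> Counter[str]:
        # One increment in the result table per word occurrence.
        for word in self.parse():
            self.result_table[word] += 1
        return self.result_table

counter = WordCounter("{web, weed, green, sun, moon, land, part, web, green}")
print(dict(counter.count()))
# {'web': 2, 'weed': 1, 'green': 2, 'sun': 1, 'moon': 1, 'land': 1, 'part': 1}
```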
Multiple Instances of Word Counter
[Figure: Main starts 1..* threads, each running a WordCounter (parse( ), count( )) over the DataCollection; all threads update the same ResultTable (web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1).]
Observe:
• Multi-thread
• Lock on shared data
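A sketch of the multi-threaded version using Python's `threading` module, assuming the collection has already been divided into per-thread chunks (our assumption; the slide does not say how the work is divided). Because every thread updates the one shared result table, each increment must hold the lock, which serializes the threads at exactly the point where they do their work.

```python
import threading
from collections import Counter

def count_with_shared_table(chunks: list[list[str]]) -> Counter:
    result_table: Counter = Counter()   # shared by all threads
    lock = threading.Lock()             # protects result_table

    def word_counter(chunk: list[str]) -> None:
        for word in chunk:
            with lock:                  # only one thread updates the table at a time
                result_table[word] += 1

    threads = [threading.Thread(target=word_counter, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # wait until every thread has finished
    return result_table

chunks = [["web", "weed", "green"], ["sun", "moon", "land"], ["part", "web", "green"]]
print(count_with_shared_table(chunks)["web"])  # 2
```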
Improve Word Counter for Performance
[Figure: Main starts 1..* Parser threads and 1..* Counter threads. The Parsers read the DataCollection and write words to a WordList; separate Counters read the WordList and fill the ResultTable (web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1).]
• No need for lock
• Separate counters
WordList:
KEY: web weed green sun moon land part web green …….
VALUE:
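One way to get "separate counters, no lock" is sketched below under our own assumptions. Parser threads each fill a private word list, and every Counter thread owns a disjoint set of keys, chosen here by the hypothetical rule `zlib.crc32(word) % num_counters`, so no two threads ever write the same table entry and no lock is needed. For simplicity every counter scans every word list and skips words it does not own; the private tables are merged only after all threads have finished.

```python
import threading
import zlib
from collections import Counter

def owner(word: str, num_counters: int) -> int:
    """Hypothetical rule assigning each word to exactly one counter thread."""
    return zlib.crc32(word.encode("utf-8")) % num_counters

def count_without_lock(chunks: list[str], num_counters: int = 2) -> Counter:
    # Phase 1: one Parser thread per chunk, each writing only its own WordList slot.
    word_lists: list[list[str]] = [[] for _ in chunks]

    def parser(i: int) -> None:
        word_lists[i] = chunks[i].split()

    parsers = [threading.Thread(target=parser, args=(i,)) for i in range(len(chunks))]
    for t in parsers:
        t.start()
    for t in parsers:
        t.join()

    # Phase 2: each Counter thread owns a disjoint set of keys and a private table.
    tables = [Counter() for _ in range(num_counters)]

    def counter(c: int) -> None:
        for words in word_lists:
            for word in words:
                if owner(word, num_counters) == c:
                    tables[c][word] += 1

    counters = [threading.Thread(target=counter, args=(c,)) for c in range(num_counters)]
    for t in counters:
        t.start()
    for t in counters:
        t.join()

    # The key sets are disjoint, so merging the private tables is conflict-free.
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged

print(count_without_lock(["web weed green sun", "moon land part web green"])["green"])  # 2
```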
Peta-scale Data
[Figure: the same Parser/Counter design (Main, 1..* Parser and Counter threads, DataCollection, WordList, ResultTable), now applied to a peta-scale data collection.]
Addressing the Scale Issue
• A single machine cannot serve all the data: you need a special distributed (file) system
• Large number of commodity hardware disks: say, 1000 disks of 1TB each
◦ Issue: with a failure rate of 1/1000 per disk, we expect about 1 of the above 1000 disks to be down at any given time.
◦ Thus failure is the norm and not an exception.
◦ The file system has to be fault-tolerant: replication, checksums
◦ Data transfer bandwidth is critical (location of data)
• Critical aspects: fault tolerance + replication + load balancing, monitoring
• Exploit the parallelism afforded by splitting parsing and counting
• Provision and locate computing at data locations
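The failure estimate can be checked with a little arithmetic. The sketch assumes, as a simplification, that each disk is independently down with probability 1/1000 at any given time; the last line additionally assumes 3 replicas per block, which is HDFS's default replication factor.

```python
import math

n_disks = 1000
p_down = 1 / 1000   # assumed per-disk probability of being down at a given time

expected_down = n_disks * p_down                # mean number of failed disks
p_at_least_one = 1 - (1 - p_down) ** n_disks    # chance that some disk is down
p_block_lost = p_down ** 3                      # all 3 replicas of one block down (independence assumed)

print(f"expected disks down: {expected_down:.2f}")            # 1.00
print(f"P(at least one disk down): {p_at_least_one:.3f}")     # 0.632
print(f"P(all 3 replicas down): {p_block_lost:.0e}")          # 1e-09
```

On average one disk is down and the chance that at least one is down is about 63%, so the system must treat failure as routine; under the independence assumption, replication makes losing every copy of a given block far less likely.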
Peta-scale Data is Commonly Distributed
[Figure: the same Parser/Counter design, but the input is now many separate data collections rather than one.]
Issue: managing the large scale data
Write Once Read Many (WORM) data
[Figure: the distributed data collections feeding the Parser/Counter design are written once and then read many times.]
WORM Data is Amenable to Parallelism
[Figure: the distributed data collections feeding 1..* Parser and Counter threads.]
1. Data with WORM characteristics lends itself to parallel processing;
2. Data without dependencies lends itself to out-of-order processing.
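Why WORM data without dependencies tolerates out-of-order processing can be demonstrated with a small check, assuming read-only splits of words: each split is counted independently, and the partial counts are then combined in every possible order. Since the data never changes while it is being read, and adding counts is commutative and associative, every order yields the same totals.

```python
import itertools
from collections import Counter

# Written once, then only read: tuples cannot be modified.
splits = (("web", "weed", "green"), ("sun", "moon", "land"), ("part", "web", "green"))

partials = [Counter(split) for split in splits]    # each split counted independently

outcomes = set()
for order in itertools.permutations(partials):     # all 6 processing orders
    total = Counter()
    for partial in order:
        total += partial
    outcomes.add(tuple(sorted(total.items())))

print(len(outcomes))  # 1: every order gives the same counts
```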
Divide and Conquer: Provision Computing at Data Location
For our example,
#1: Schedule parallel parse tasks
#2: Schedule parallel count tasks
[Figure: on one node, Main runs 1..* Parser threads and 1..* Counter threads over the DataCollection, WordList and ResultTable.]
This is a particular solution; let's generalize it:
• Our parse is a mapping operation:
MAP: input → <key, value> pairs
• Our count is a reduce operation:
REDUCE: <key, value> pairs reduced
• Map/Reduce originated from Lisp, but has a different meaning here.
• The runtime adds distribution + fault tolerance + replication + monitoring + load balancing to your base application!
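The difference between the Lisp heritage and the MapReduce meaning can be illustrated with Python's built-in `map` and `functools.reduce`, which follow the functional tradition. In the functional version, reduce folds a whole list into one value; in the MapReduce version, map emits <key, value> pairs and reduce runs once per key after the pairs are grouped. The `group_by_key` helper is a hypothetical stand-in for the grouping the runtime performs.

```python
from collections import defaultdict
from functools import reduce

words = ["web", "weed", "green", "sun", "moon", "land", "part", "web", "green"]

# Functional (Lisp-style): map transforms each element, reduce folds the list to one value.
total_words = reduce(lambda acc, n: acc + n, map(lambda w: 1, words), 0)

# MapReduce style: map emits <key, value> pairs ...
pairs = [(w, 1) for w in words]

# ... the runtime groups them by key (done by hand here) ...
def group_by_key(kv_pairs):
    groups = defaultdict(list)
    for key, value in kv_pairs:
        groups[key].append(value)
    return groups

# ... and reduce is applied separately to each key's list of values.
counts = {key: reduce(lambda a, b: a + b, values)
          for key, values in group_by_key(pairs).items()}

print(total_words)      # 9
print(counts["green"])  # 2
```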
Mapper and Reducer
[Figure: class diagram. A MapReduceTask uses a Mapper and a Reducer; YourMapper (the Parser in our example) extends Mapper, and YourReducer (the Counter) extends Reducer.]
Remember: MapReduce is simplified processing for larger data sets.
MapReduce Version of WordCount Source code
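The class structure in the figure can be sketched in Python with abstract base classes. The names `Mapper`, `Reducer` and `MapReduceTask` follow the diagram, but the method signatures are hypothetical and this is not Hadoop's Java API (where the real base classes are `org.apache.hadoop.mapreduce.Mapper` and `org.apache.hadoop.mapreduce.Reducer`). The point is that the user's classes plug into a fixed framework that drives them.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Iterable, Iterator

class Mapper(ABC):
    @abstractmethod
    def map(self, key: int, value: str) -> Iterator[tuple[str, int]]:
        """Turn one input record into zero or more <key, value> pairs."""

class Reducer(ABC):
    @abstractmethod
    def reduce(self, key: str, values: Iterable[int]) -> Iterator[tuple[str, int]]:
        """Turn one key and all of its values into output pairs."""

class ParserMapper(Mapper):
    """Plays the slide's Parser ("YourMapper"): emits <word, 1> per word."""
    def map(self, key, value):
        for word in value.split():
            yield word, 1

class CounterReducer(Reducer):
    """Plays the slide's Counter ("YourReducer"): sums the 1s for one word."""
    def reduce(self, key, values):
        yield key, sum(values)

class MapReduceTask:
    """Wires one Mapper and one Reducer together (sequential, in memory)."""
    def __init__(self, mapper: Mapper, reducer: Reducer) -> None:
        self.mapper = mapper
        self.reducer = reducer

    def run(self, lines: list[str]) -> dict[str, int]:
        groups = defaultdict(list)
        for offset, line in enumerate(lines):   # key = line number, value = line text
            for k, v in self.mapper.map(offset, line):
                groups[k].append(v)
        output = {}
        for k in sorted(groups):                # process keys in sorted order
            for out_k, out_v in self.reducer.reduce(k, groups[k]):
                output[out_k] = out_v
        return output

task = MapReduceTask(ParserMapper(), CounterReducer())
print(task.run(["web weed green sun", "moon land part web green"]))
# {'green': 2, 'land': 1, 'moon': 1, 'part': 1, 'sun': 1, 'web': 2, 'weed': 1}
```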
Map Operation
MAP: Input data → <key, value> pair
[Figure: the data collection is split (split 1, split 2, …, split n) to supply multiple processors; each Map task turns its split into a KEY/VALUE list such as web 1, weed 1, green 1, sun 1, moon 1, land 1, part 1, web 1, green 1, … 1.]
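The split-and-map step can be sketched as follows, assuming the input is a list of text lines and that a split is simply a contiguous slice of those lines (a simplification of how a real runtime divides files). Each split gets its own map task, which emits <word, 1> pairs without looking at any other split; a thread pool stands in for the multiple processors.

```python
from concurrent.futures import ThreadPoolExecutor

def make_splits(lines: list[str], n: int) -> list[list[str]]:
    """Divide the input into at most n contiguous, roughly equal splits."""
    size = -(-len(lines) // n)                     # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_task(split: list[str]) -> list[tuple[str, int]]:
    """MAP: input data -> <key, value> pairs, here <word, 1>."""
    return [(word, 1) for line in split for word in line.split()]

lines = ["web weed", "green sun", "moon land", "part web", "green"]
splits = make_splits(lines, 3)
with ThreadPoolExecutor() as pool:                 # one map task per split
    map_outputs = list(pool.map(map_task, splits))

for i, pairs in enumerate(map_outputs, start=1):
    print(f"split {i}: {pairs}")
```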
Reduce Operation
MAP: Input data → <key, value> pair
REDUCE: <key, value> pair → <result>
[Figure: the data collection is split (split 1, split 2, …, split n) to supply multiple processors; each split feeds a Map task, and the map outputs feed multiple Reduce tasks.]
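Between the two phases, the runtime gathers every pair with the same key (often called the shuffle), and each reduce call turns one key plus its list of values into a result. A minimal in-memory sketch, where `shuffle` and `reduce_task` are illustrative names of our own and the map outputs are hard-coded for three splits:

```python
from collections import defaultdict

def shuffle(map_outputs: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group the <key, value> pairs from every map task by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """REDUCE: <key, list of values> -> <result>, here the word's total count."""
    return key, sum(values)

map_outputs = [
    [("web", 1), ("weed", 1), ("green", 1), ("sun", 1)],   # from split 1
    [("moon", 1), ("land", 1), ("part", 1), ("web", 1)],   # from split 2
    [("green", 1)],                                        # from split 3
]
results = dict(reduce_task(k, v) for k, v in shuffle(map_outputs).items())
print(results)
# {'web': 2, 'weed': 1, 'green': 2, 'sun': 1, 'moon': 1, 'land': 1, 'part': 1}
```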
Large scale data splits
[Figure: each split goes to a Map task (parse-hash) that emits <key, 1> pairs; hashing routes each key to one of the Reducers (say, Count), which write the output partitions P-0000 (count1), P-0001 (count2) and P-0002 (count3).]
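The "parse-hash" step decides which reducer receives each key. A minimal sketch assuming three reducers, as on the slide, and `zlib.crc32` as the hash; Python's built-in `hash()` is randomized per process for strings, so it would not agree across machines. Hadoop's default `HashPartitioner` applies the same idea in Java, computing `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Because every map task uses the same function, all pairs for a given word reach the same reducer, so each output partition holds complete counts for its words.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 3  # assumed, matching the three partitions on the slide

def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Route a key to a reducer; identical keys always get the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_partitioned(map_outputs: list[list[tuple[str, int]]]) -> list[Counter]:
    # One bucket of incoming pairs per reducer.
    buckets: list[list[tuple[str, int]]] = [[] for _ in range(NUM_REDUCERS)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key)].append((key, value))
    # Each reducer ("Count") sums the values for the keys it received.
    outputs = []
    for bucket in buckets:
        counts = Counter()
        for key, value in bucket:
            counts[key] += value
        outputs.append(counts)
    return outputs

map_outputs = [[("web", 1), ("weed", 1), ("green", 1)], [("web", 1), ("green", 1), ("sun", 1)]]
for i, counts in enumerate(run_partitioned(map_outputs)):
    print(f"P-{i:04d}: {dict(counts)}")
```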
MapReduce Example in my operating systems class
[Figure: a word collection (Cat, Bat, Dog, other words; size: TByte) is divided into splits; each split passes through map and combine, and reduce tasks write the output files part0, part1 and part2.]
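The combine stage runs reduce-like logic locally on each map task's output before anything is sent to the reducers. A sketch for word counting, where combining is just local summation (safe here because addition is associative and commutative); it compares how many pairs would be shuffled with and without the combiner.

```python
from collections import Counter

def map_task(split: list[str]) -> list[tuple[str, int]]:
    return [(word, 1) for word in split]

def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Local, per-map-task reduction: one <word, partial count> pair per distinct word."""
    partial = Counter()
    for word, n in pairs:
        partial[word] += n
    return list(partial.items())

splits = [["cat", "bat", "cat", "dog", "cat"], ["dog", "dog", "bat", "cat"]]
raw = [map_task(s) for s in splits]
combined = [combine(p) for p in raw]

print(sum(len(p) for p in raw))       # 9 pairs shuffled without a combiner
print(sum(len(p) for p in combined))  # 6 pairs shuffled with a combiner
```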
MapReduce Programming Model
MapReduce programming model
• Determine if the problem is parallelizable and solvable using MapReduce (e.g., is the data WORM? is the data set large?).
• Design and implement the solution as Mapper and Reducer classes.
• Compile the source code with the Hadoop core library.
• Package the code as a jar executable.
• Configure the application (job): the number of mappers and reducers (tasks), and the input and output streams.
• Load the data (or use previously loaded data).
• Launch the job and monitor it.
• Study the result.
• Detailed steps.
MapReduce Characteristics
• Very large scale data: petabytes, exabytes
• Write once and read many data: allows for parallelism without mutexes
• Map and Reduce are the main operations: simple code
• There are other supporting operations such as combine and partition (out of the scope of this talk).
• All map tasks must complete before the reduce operation starts.
• Map and reduce operations are typically performed on the same physical machines.
• The number of map tasks and reduce tasks is configurable.
• Operations are provisioned near the data.
• Commodity hardware and storage.
• The runtime takes care of splitting and moving data for operations.
• Special distributed file system. Example: Hadoop Distributed File System and Hadoop Runtime.
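The rule that all map tasks finish before reducing starts can be sketched with `concurrent.futures`, assuming in-memory splits: `wait(..., return_when=ALL_COMPLETED)` acts as the barrier, and only then are the intermediate pairs grouped and reduced. The barrier matters because any map task may still emit a value for any key, so a reducer that started early could report an incomplete total. The `num_workers` parameter is a hypothetical knob standing in for the configurable number of tasks.

```python
from collections import defaultdict
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait

def map_task(split: list[str]) -> list[tuple[str, int]]:
    return [(word, 1) for word in split]

def run_job(splits: list[list[str]], num_workers: int = 4) -> dict[str, int]:
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(map_task, s) for s in splits]
        wait(futures, return_when=ALL_COMPLETED)   # barrier: every map task has finished
        grouped = defaultdict(list)
        for f in futures:
            for key, value in f.result():
                grouped[key].append(value)
        # The reduce phase starts only after the barrier, one call per key.
        reduce_futures = {k: pool.submit(sum, v) for k, v in grouped.items()}
        return {k: f.result() for k, f in reduce_futures.items()}

print(run_job([["web", "green"], ["web", "sun"], ["green", "web"]]))
# {'web': 3, 'green': 2, 'sun': 1}
```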
Classes of problems “mapreducable”
• Benchmark for comparing: Jim Gray’s challenge on data-intensive computing. Ex: “Sort”
• Google uses it (we think) for wordcount, adwords, pagerank, indexing data.
• Simple algorithms such as grep, text-indexing, reverse indexing
• Bayesian classification: data mining domain
• Facebook uses it for various operations: demographics
• Financial services use it for analytics
• Astronomy: Gaussian analysis for locating extra-terrestrial objects.
• Expected to play a critical role in the semantic web and Web 3.0
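As an instance of the "simple algorithms" above, reverse (inverted) indexing fits the same pattern: map emits <word, document id> pairs and reduce collects a sorted, de-duplicated list of documents for each word. A sequential sketch with made-up document ids:

```python
from collections import defaultdict

def index_map(doc_id: str, text: str):
    """Emit <word, doc_id> for every word in the document."""
    for word in text.lower().split():
        yield word, doc_id

def index_reduce(word: str, doc_ids: list[str]) -> list[str]:
    """Posting list: every document containing the word, sorted, without duplicates."""
    return sorted(set(doc_ids))

docs = {"d1": "web green web", "d2": "green sun", "d3": "moon web"}
grouped = defaultdict(list)
for doc_id, text in docs.items():
    for word, d in index_map(doc_id, text):
        grouped[word].append(d)
index = {word: index_reduce(word, ids) for word, ids in grouped.items()}

print(index["web"])    # ['d1', 'd3']
print(index["green"])  # ['d1', 'd2']
```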
Scope of MapReduce
Data size: small
Pipelined: instruction level
Concurrent: thread level
Service: object level
Indexed: file level
Mega: block level
Virtual: system level
Data size: large
Hadoop

What is Hadoop?
• At Google, MapReduce operations are run on a special file system called Google File System (GFS) that is highly optimized for this purpose.
• GFS is not open source.
• Doug Cutting and Yahoo! built an open-source file system modeled on the published GFS design and called it Hadoop Distributed File System (HDFS).
• The software framework that supports HDFS, MapReduce and other related entities is called the Hadoop project, or simply Hadoop.
• It is open source and distributed by Apache.
Basic Features: HDFS
Highly fault-tolerant
High throughput
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware

Hadoop Distributed File System
[Figure: an HDFS client application, which also has its local file system (block size: 2K), talks to the HDFS server, a master node running the name node; file data is stored as replicated blocks (block size: 128M).]
More details: We discuss this in great detail in my Operating Systems course.
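A worked example shows why HDFS uses a block size so much larger than a local file system's. The sketch assumes a 1 TB file (taken as 2^40 bytes), the 128M and 2K block sizes from the figure, and a replication factor of 3 (the HDFS default). Fewer, larger blocks keep the name node's metadata small and favor long sequential (streaming) reads.

```python
import math

file_size = 2**40            # 1 TB file (binary units assumed)
hdfs_block = 128 * 2**20     # 128 MB HDFS block, as in the figure
local_block = 2 * 2**10      # 2 KB local file-system block, as in the figure
replication = 3              # assumed replication factor (HDFS default)

hdfs_blocks = math.ceil(file_size / hdfs_block)
local_blocks = math.ceil(file_size / local_block)

print(hdfs_blocks)                 # 8192 blocks for the name node to track
print(hdfs_blocks * replication)   # 24576 block replicas stored on data nodes
print(local_blocks)                # 536870912 blocks at the local block size
```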
Hadoop Distributed File System
[Figure: the HDFS architecture (client application with a local file system of 2K blocks; master node running the name node; replicated 128M blocks), now also showing the blockmap kept by the master node and heartbeat messages.]
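The blockmap and heartbeat can be illustrated with a toy name node, built entirely on our own simplifying assumptions: the blockmap maps each block id to the set of data nodes holding a replica, a data node whose last heartbeat is older than a timeout is treated as dead, and any block with fewer live replicas than the target is reported as under-replicated. The class, its methods, the block ids and the timeout are hypothetical; this is not HDFS code.

```python
class NameNode:
    """Toy name node: tracks replica locations (blockmap) and data-node liveness."""

    def __init__(self, replication: int = 3, timeout: float = 10.0) -> None:
        self.replication = replication
        self.timeout = timeout
        self.blockmap: dict[str, set[str]] = {}
        self.last_heartbeat: dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float, blocks: list[str]) -> None:
        # A heartbeat says "I am alive"; here it also reports the blocks the node holds.
        self.last_heartbeat[datanode] = now
        for block in blocks:
            self.blockmap.setdefault(block, set()).add(datanode)

    def live_nodes(self, now: float) -> set[str]:
        return {dn for dn, t in self.last_heartbeat.items() if now - t <= self.timeout}

    def under_replicated(self, now: float) -> dict[str, int]:
        # Blocks whose live replica count is below target, with the live count.
        live = self.live_nodes(now)
        return {b: len(nodes & live) for b, nodes in self.blockmap.items()
                if len(nodes & live) < self.replication}

nn = NameNode()
nn.heartbeat("hermes", now=0.0, blocks=["blk_1", "blk_2"])
nn.heartbeat("dionysus", now=0.0, blocks=["blk_1", "blk_2"])
nn.heartbeat("aphrodite", now=0.0, blocks=["blk_1"])
nn.heartbeat("athena", now=0.0, blocks=["blk_2"])
for dn in ["hermes", "dionysus", "athena"]:   # aphrodite stops sending heartbeats
    nn.heartbeat(dn, now=12.0, blocks=[])
print(nn.under_replicated(now=15.0))  # {'blk_1': 2}
```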
Relevance and Impact on Undergraduate courses
• Data structures and algorithms: a new look at traditional algorithms such as sort: Quicksort may not be your choice! It is not easily parallelizable. Merge sort is better.
• You can identify mappers and reducers among your algorithms. Mappers and reducers are simply placeholders for algorithms relevant to your applications.
• Large-scale data and analytics are concepts to reckon with, much as we addressed “programming in the large” with OO concepts.
• While a full course on MR/HDFS may not be warranted, the concepts can perhaps be woven into most courses in our CS curriculum.
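The merge-sort remark can be made concrete: sort independent chunks in parallel (map-like), then merge the sorted runs (reduce-like) with `heapq.merge`. A sketch assuming in-memory data; with CPython threads the chunk sorts do not truly run in parallel because of the global interpreter lock, so the code shows the structure a cluster would distribute rather than a speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_merge_sort(data: list[int], num_chunks: int = 4) -> list[int]:
    size = max(1, -(-len(data) // num_chunks))        # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Phase 1 (map-like): each chunk is sorted independently of the others.
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(sorted, chunks))
    # Phase 2 (reduce-like): k-way merge of the sorted runs.
    return list(heapq.merge(*runs))

print(parallel_merge_sort([9, 3, 7, 1, 8, 2, 6, 5, 4, 0]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```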
Demo
• VMware simulated Hadoop and MapReduce demo
• Remote access to the NEXOS system at my Buffalo office
• 5-node HDFS cluster running on Ubuntu 8.04
◦ 1 name node and 4 data nodes
◦ Each is an old commodity PC with 512 MB RAM and 120GB-160GB of external memory
◦ Zeus (name node); data nodes: hermes, dionysus, aphrodite, athena
Summary
• We introduced the MapReduce programming model for processing large-scale data
• We discussed the supporting Hadoop Distributed File System
• The concepts were illustrated using a simple example
• We reviewed some important parts of the source code for the example.
• Relationship to Cloud Computing
References
1. Apache Hadoop Tutorial: http://hadoop.apache.org
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
2. Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Communications of the ACM 51, 1 (Jan. 2008), 107-113.
3. Cloudera Videos by Aaron Kimball: http://www.cloudera.com/hadoop-training-basic
4. http://www.cse.buffalo.edu/faculty/bina/mapreduce.html
