
MapReduce

What it is, and why it is so popular

Luigi Laura

Dipartimento di Informatica e Sistemistica


“Sapienza” Università di Roma

Rome, May 9th and 11th, 2012


Motivations: From the description of this course...

...This is a tentative list of questions that are likely to be covered in
the class:

- The running times obtained in practice by scanning a moderately large
  matrix by row or by column may be very different: what is the reason?
  Is the assumption that memory access times are constant realistic?
  (See the first sketch after this list.)
- How would you sort 1TB of data? How would you measure the performance
  of algorithms in applications that need to process massive data sets
  stored in secondary memory? (A single-machine baseline sketch follows
  the sorting records below.)
- Do memory allocation and free operations really require constant time?
  How do real memory allocators work? (A simplified allocator sketch
  follows below.)
- ...
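
A minimal sketch of what the first question is getting at (not from the
slides; the matrix size N and the timing harness are illustrative
assumptions). A row-major matrix scanned row by row touches memory with
unit stride, so most accesses hit the cache; scanned column by column,
each access jumps N doubles ahead and misses almost every time:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 4096   /* 4096 x 4096 doubles = 128 MB, far larger than any cache */

    int main(void) {
        /* one contiguous row-major allocation: a[i][j] lives at a[i*N + j] */
        double *a = malloc((size_t)N * N * sizeof *a);
        double sum = 0.0;
        clock_t t;
        if (!a) return 1;

        for (size_t k = 0; k < (size_t)N * N; k++) a[k] = 1.0;

        t = clock();   /* row by row: unit stride, cache friendly */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[(size_t)i * N + j];
        printf("by row:    %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        t = clock();   /* column by column: stride of N doubles, so nearly
                          every access is a cache miss */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[(size_t)i * N + j];
        printf("by column: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        printf("checksum: %.0f\n", sum);   /* keeps the loops from being optimized away */
        free(a);
        return 0;
    }

Both loops perform exactly the same N^2 additions, yet on typical
hardware the column scan is several times slower: the constant-time
memory access assumption of the RAM model is unrealistic at this scale.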
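For the third question, a first-fit free-list allocator is a useful
mental model. This is a deliberately simplified sketch, not how any
particular libc allocator works; the arena size, the alignment policy
and the my_malloc/my_free names are assumptions. Note that allocation
scans the free list, so it is not constant time in general:

    #include <stddef.h>

    #define ARENA_SIZE (1 << 20)   /* illustrative 1 MB arena */

    typedef struct Block {         /* header placed in front of each payload */
        size_t size;               /* payload size in bytes */
        int free;
        struct Block *next;
    } Block;

    static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
    static Block *head = NULL;

    void *my_malloc(size_t n) {
        if (!head) {                           /* lazy init: one big free block */
            head = (Block *)arena;
            head->size = ARENA_SIZE - sizeof(Block);
            head->free = 1;
            head->next = NULL;
        }
        n = (n + 15) & ~(size_t)15;            /* simplified 16-byte rounding */
        for (Block *b = head; b; b = b->next) {  /* first fit: a linear scan */
            if (b->free && b->size >= n) {
                if (b->size >= n + sizeof(Block) + 16) {   /* split the block */
                    Block *rest = (Block *)((unsigned char *)(b + 1) + n);
                    rest->size = b->size - n - sizeof(Block);
                    rest->free = 1;
                    rest->next = b->next;
                    b->next = rest;
                    b->size = n;
                }
                b->free = 0;
                return b + 1;                  /* payload follows the header */
            }
        }
        return NULL;                           /* arena exhausted */
    }

    void my_free(void *p) {
        if (!p) return;
        Block *b = (Block *)p - 1;
        b->free = 1;
        if (b->next && b->next->free) {        /* coalesce with the next block;
                                                  real allocators also merge backwards */
            b->size += sizeof(Block) + b->next->size;
            b->next = b->next->next;
        }
    }

Real allocators (dlmalloc, jemalloc, tcmalloc) add size-class bins,
per-thread caches and backward coalescing precisely to make the common
cases close to constant time, but the worst case still is not.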
Motivations: sorting one Petabyte
Motivations: sorting...

- Nov. 2008: 1TB, 1000 computers, 68 seconds.
  The previous record was 910 computers, 209 seconds.
- Nov. 2008: 1PB, 4000 computers, 6 hours; 48k hard disks...
- Sept. 2011: 1PB, 8000 computers, 33 minutes.
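
For scale, the classic single-machine answer to "how would you sort
1TB?" is a two-phase external merge sort: sort memory-sized chunks into
runs on disk, then merge the runs. A sketch under stated assumptions
(fixed 100-byte records as in the sort benchmarks; the file names and
the REC/MEM_RECS constants are made up for illustration; error handling
omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define REC 100          /* record size in bytes */
    #define MEM_RECS 1000000 /* records sorted in memory at once (~100 MB) */

    static int cmp(const void *a, const void *b) {
        return memcmp(a, b, REC);
    }

    /* Phase 1: read memory-sized chunks, sort each, write it as a run. */
    static int make_runs(FILE *in) {
        char *buf = malloc((size_t)MEM_RECS * REC);
        int nruns = 0;
        size_t got;
        while ((got = fread(buf, REC, MEM_RECS, in)) > 0) {
            char name[32];
            qsort(buf, got, REC, cmp);
            snprintf(name, sizeof name, "run%d.tmp", nruns++);
            FILE *out = fopen(name, "wb");
            fwrite(buf, REC, got, out);
            fclose(out);
        }
        free(buf);
        return nruns;
    }

    /* Phase 2: k-way merge, scanning the run heads for the minimum
       (a heap would replace the scan for large k). */
    static void merge_runs(int nruns, FILE *out) {
        FILE **run = malloc(nruns * sizeof *run);
        char (*head)[REC] = malloc((size_t)nruns * REC);
        int *alive = calloc(nruns, sizeof *alive);
        for (int i = 0; i < nruns; i++) {
            char name[32];
            snprintf(name, sizeof name, "run%d.tmp", i);
            run[i] = fopen(name, "rb");
            alive[i] = fread(head[i], REC, 1, run[i]) == 1;
        }
        for (;;) {
            int min = -1;
            for (int i = 0; i < nruns; i++)
                if (alive[i] && (min < 0 || cmp(head[i], head[min]) < 0))
                    min = i;
            if (min < 0) break;            /* every run is exhausted */
            fwrite(head[min], REC, 1, out);
            alive[min] = fread(head[min], REC, 1, run[min]) == 1;
        }
        for (int i = 0; i < nruns; i++) fclose(run[i]);
        free(run); free(head); free(alive);
    }

    int main(void) {
        FILE *in = fopen("input.bin", "rb");    /* hypothetical input file */
        FILE *out = fopen("sorted.bin", "wb");
        merge_runs(make_runs(in), out);
        fclose(in); fclose(out);
        return 0;
    }

Every byte is read and written twice, so a single machine is limited by
the bandwidth of its disks; the cluster results above attack exactly
that bottleneck by spreading the data over thousands of machines and
disks (48,000 of them in the 2008 petabyte run).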
