MapReduce
What it is, and why it is so popular
Luigi Laura
Dipartimento di Informatica e Sistemistica
“Sapienza” Università di Roma
Rome, May 9th and 11th, 2012
Motivations: From the description of this course...

...This is a tentative list of questions that are likely to be covered in the class:

- The running times obtained in practice by scanning a moderately large matrix by row or by column may be very different: what is the reason? Is the assumption that memory access times are constant realistic? (See the sketch after this list.)
- How would you sort 1TB of data? How would you measure the performance of algorithms in applications that need to process massive data sets stored in secondary memory?
- Do memory allocation and free operations really require constant time? How do real memory allocators work?
- ...
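The first question refers to cache behaviour: a row-wise scan of a matrix stored in row-major order touches memory sequentially, while a column-wise scan jumps a full row ahead at every step and misses the cache on almost every access. The following C++ micro-benchmark is not part of the slides; it is a minimal sketch (the matrix size and timing code are illustrative assumptions) that makes the difference visible:

// Micro-benchmark sketch: scan the same N x N row-major matrix by rows and by
// columns. The row-wise loop accesses memory sequentially; the column-wise loop
// uses a stride of N elements and is typically several times slower.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int N = 4096;  // ~128 MB of doubles; adjust to the machine at hand
    std::vector<double> a(static_cast<size_t>(N) * N, 1.0);

    auto time_scan = [&](bool by_rows) {
        auto start = std::chrono::steady_clock::now();
        double sum = 0.0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                sum += by_rows ? a[static_cast<size_t>(i) * N + j]   // contiguous access
                               : a[static_cast<size_t>(j) * N + i];  // stride-N access
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: sum=%.0f, %.3f s\n", by_rows ? "rows" : "cols", sum,
                    std::chrono::duration<double>(end - start).count());
    };

    time_scan(true);   // scan by rows
    time_scan(false);  // scan by columns
    return 0;
}

Both loops perform exactly the same N^2 additions, yet on most machines the column-wise scan is noticeably slower, which is why the constant-memory-access-time assumption breaks down for large data.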
Motivations: sorting one Petabyte

- Nov. 2008: 1TB, 1000 computers, 68 seconds. The previous record was 910 computers, 209 seconds.
- Nov. 2008: 1PB, 4000 computers, 6 hours; 48k hard disks...
- Sept. 2011: 1PB, 8000 computers, 33 minutes.
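These sorting records were obtained by distributing the work over thousands of machines with MapReduce. To make the connection with the programming model concrete, here is a minimal, single-process C++ sketch of the map/reduce structure using the canonical word-count example. This code is not from the slides: in a real MapReduce run the framework distributes the two user-supplied functions across the cluster and performs the grouping ("shuffle") itself.

// Toy, single-machine illustration of the map/reduce structure: map emits
// (word, 1) pairs, a grouping step collects the values for each key, and
// reduce sums the counts for one key.
#include <cstdio>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// map: one input record (a line of text) -> a list of intermediate pairs
std::vector<KV> map_fn(const std::string& line) {
    std::vector<KV> out;
    std::istringstream words(line);
    std::string w;
    while (words >> w) out.push_back({w, 1});
    return out;
}

// reduce: one key and all of its intermediate values -> an output value
int reduce_fn(const std::string& /*word*/, const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
}

int main() {
    std::vector<std::string> input = {"the quick brown fox", "the lazy dog", "the fox"};

    // "shuffle": group intermediate values by key (done by the framework in practice)
    std::map<std::string, std::vector<int>> grouped;
    for (const auto& line : input)
        for (const auto& kv : map_fn(line))
            grouped[kv.first].push_back(kv.second);

    for (const auto& [word, counts] : grouped)
        std::printf("%s\t%d\n", word.c_str(), reduce_fn(word, counts));
    return 0;
}

The appeal of the model is that the user writes only map_fn and reduce_fn; partitioning the input, scheduling, grouping by key, and recovering from machine failures are handled by the runtime, which is what makes runs on thousands of computers, like the sorting benchmarks above, practical.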