
MapReduce & Pig & Spark

Ioanna Miliou
Giuseppe Attardi

Advanced Programming
Università di Pisa
Hadoop

• The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

• It is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

• It is designed to detect and handle failures at the application layer.

The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.
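
To make the processing part concrete, the sketch below shows the canonical word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names WordCount, TokenizerMapper, and IntSumReducer and the use of a combiner are illustrative choices, not something taken from these slides: the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configure and submit the job to the cluster.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Running the job amounts to packaging this class into a jar and submitting it with hadoop jar, passing HDFS input and output directories as arguments.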
Hadoop

• The project includes these modules:

– Hadoop Common: The common utilities that support the other Hadoop modules.

– Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data (see the sketch after this list).

– Hadoop YARN: A framework for job scheduling and cluster resource management.

– Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
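
As a small illustration of the HDFS module, the following sketch writes a file and reads it back through Hadoop's Java FileSystem API. The NameNode address hdfs://namenode:9000 and the path /user/demo/hello.txt are made-up placeholders, not values from these slides.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS points the client at the NameNode; host and port are placeholders.
    conf.set("fs.defaultFS", "hdfs://namenode:9000");
    FileSystem fs = FileSystem.get(conf);

    // Write a file; HDFS splits it into blocks replicated across DataNodes.
    Path path = new Path("/user/demo/hello.txt");
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through the same API.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
  }
}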
