Bda 2 - Hadoop
Hadoop
Dr. Shivangi Shukla
Assistant Professor
Computer Science and Engineering
IIIT Pune
Contents
• Introduction
• History
• The Apache Hadoop Project
• HDFS
HDFS (Hadoop Distributed File System)
• Distributed Filesystems
• Filesystems that manage storage across a network of machines are called distributed filesystems.
• Distributed filesystems are more complex than regular disk filesystems because they inherit the complications of network programming.
• One of the biggest challenges is tolerating node failure without suffering data loss.
• Commodity Hardware
• Hadoop is designed to run on clusters of commodity hardware.
• With commodity hardware, the chance that some node in the cluster fails is high, especially in large clusters.
• HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failures.
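Why large clusters make node failure so likely can be illustrated with a little arithmetic. The sketch below is not taken from Hadoop. It assumes that each node fails independently with the same probability over some time window, and both the function name prob_any_node_fails and the 0.001 per-node figure are hypothetical values chosen only for illustration. Under those assumptions, the chance that at least one of n nodes fails is 1 - (1 - p)^n, which grows quickly with n.

```python
def prob_any_node_fails(per_node_failure_prob: float, num_nodes: int) -> float:
    """Probability that at least one of num_nodes nodes fails.

    Assumes failures are independent and equally likely on every node,
    which is a simplification made for this illustration.
    """
    if not 0.0 <= per_node_failure_prob <= 1.0:
        raise ValueError("per_node_failure_prob must be between 0 and 1")
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    # P(no node fails) = (1 - p)^n, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - per_node_failure_prob) ** num_nodes


# Hypothetical per-node failure probability for one time window.
ASSUMED_PER_NODE_FAILURE_PROB = 0.001

for cluster_size in (10, 100, 1000):
    chance = prob_any_node_fails(ASSUMED_PER_NODE_FAILURE_PROB, cluster_size)
    print(f"{cluster_size:>5} nodes: P(at least one failure) = {chance:.3f}")
```

Even when a single node is very reliable, the probability of seeing some failure rises steadily as the cluster grows, which is why HDFS treats node failure as a normal event rather than an exception.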