HDFS Presentation Kunal Yadav
• Purpose of HDFS
• - Handles large data sets with high fault tolerance.
• - Supports distributed data storage.
Architecture of HDFS
• Master-Slave Architecture
• Replication Factor
• - Data split into blocks, each replicated across nodes for redundancy.
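A minimal Java sketch of what the master-slave split looks like from the client side: the client is configured with the Namenode (master) address and can override the per-file replication factor and block size. The address, port, and values below are assumptions for illustration, not taken from the slides.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnect {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical Namenode (master) address; adjust to the actual cluster.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");
        // Keep 3 replicas per block and use 128 MB blocks for files this client writes.
        conf.set("dfs.replication", "3");
        conf.set("dfs.blocksize", "134217728");

        // The client talks to the Namenode only for metadata; block data itself
        // flows between the client and the Datanodes the Namenode points it at.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}
```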
Key Concepts in HDFS
• Blocks
• Replication
• - Default replication factor of 3 per block.
• Fault Tolerance
• - Data remains available despite node failures.
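To make blocks and replication concrete, here is a small sketch that reads back the block size and replication factor the Namenode has recorded for an existing file; the path is hypothetical and the cluster settings come from the default configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events.log");   // hypothetical file

        // Block size and replication factor are stored per file by the Namenode.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize() + " bytes");
        System.out.println("Replication: " + status.getReplication() + " copies per block");
        fs.close();
    }
}
```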
Namenode & Datanode Roles
• Namenode (Master)
• - Manages file-system metadata and the mapping of blocks to Datanodes.
• Datanodes (Slaves)
• - Store and retrieve data blocks.
• - Send regular status updates to Namenode.
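The division of labour can be seen from a client: it asks the Namenode which Datanodes hold each block of a file, then reads the blocks from those Datanodes directly. A sketch under the assumption of a hypothetical file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events.log");   // hypothetical file
        FileStatus status = fs.getFileStatus(file);

        // This is a metadata query answered by the Namenode; no block data moves yet.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d datanodes=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```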
Data Storage Process in HDFS
• Data Write Process
• - Client asks the Namenode where to place blocks, then streams them to the chosen Datanodes (sketched below).
• Heartbeat Mechanism
• - Datanodes send regular "heartbeat" signals to the Namenode.
• - The Namenode re-replicates data if a Datanode fails.
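A hedged sketch of the client side of the write path: the file is created through the Namenode, bytes are streamed, and the framework splits them into blocks and pipelines each block to the replica Datanodes. Heartbeats and re-replication happen inside the cluster and never appear in client code. The path and contents below are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/hello.txt");   // hypothetical target path

        // create() registers the new file with the Namenode; the returned stream
        // pipelines each block to the Datanodes chosen for its replicas.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello, hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```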
Advantages of HDFS
• Scalable
• Cost-Effective
• - Uses commodity hardware.
• High Availability
• - Replication ensures data remains accessible.
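Replication is also adjustable per file, which is one way availability can be traded against storage cost. A brief sketch, assuming a hypothetical frequently-read file whose replication is raised above the default of 3:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path hotFile = new Path("/data/reference/lookup.dat");   // hypothetical file

        // Ask the Namenode to keep 5 copies of each block instead of 3;
        // the extra copies are scheduled onto Datanodes in the background.
        boolean accepted = fs.setReplication(hotFile, (short) 5);
        System.out.println("Replication change accepted: " + accepted);
        fs.close();
    }
}
```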
Limitations of HDFS
• Not for Small Files
• - Large numbers of small files strain the Namenode, which keeps all metadata in memory.
• Latency
• - Slower for real-time processing.
Use Cases of HDFS
• Data Processing
• - Supports batch-processing workloads such as log analysis.
• Data Backup
• - Secure, distributed storage for large datasets.
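As a minimal flavour of batch log analysis over HDFS, the sketch below streams a hypothetical log file and counts lines containing "ERROR"; a real job of this kind would usually run under MapReduce or Spark rather than in a single JVM.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ErrorCount {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/logs/app/2024-01-01.log");   // hypothetical log file

        long errors = 0;
        // open() streams block data from whichever Datanodes hold the file.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(log), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("ERROR")) {
                    errors++;
                }
            }
        }
        System.out.println("ERROR lines: " + errors);
        fs.close();
    }
}
```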
Conclusion
• Summary
• - HDFS provides scalable, fault-tolerant, distributed storage on commodity hardware.
• Future of HDFS
• - Continued advancements in scalability and resilience.