MODULE-2 HDFS (HADOOP DISTRIBUTED FILE SYSTEM)

1. Introduction to Distributed Filesystems

When a dataset exceeds the storage capacity of a single physical machine, it becomes necessary to partition
and distribute it across multiple machines. Distributed filesystems manage this distributed storage across a
network of machines, allowing for scalability and fault tolerance.

Key Points:

• Definition: Distributed filesystems are filesystems that operate over a network, managing storage
across multiple machines.
• Complexity: They are more complex than traditional disk filesystems due to the challenges of network
programming and ensuring data integrity across nodes.
• Challenges:
o Node Failure: Ensuring the filesystem can tolerate the failure of nodes without data loss.
o Network Issues: Handling latency, bandwidth limitations, and reliability of the network.

2. Hadoop Distributed Filesystem (HDFS)

HDFS is a distributed filesystem designed to store and manage large datasets across a cluster of machines. It
is a core component of the Hadoop ecosystem.

Key Features:

• Scalability: HDFS is designed to scale out by adding more nodes to the cluster, which increases
storage capacity and computational power.
• Fault Tolerance: It replicates data across multiple nodes to ensure data is not lost in case of hardware
failures.
• High Throughput: Optimized for high throughput rather than low latency, making it suitable for
processing large files.

Key Components:

• NameNode: Manages the metadata of the filesystem, such as file names, directory structure, and file-to-block mapping. Without a high-availability configuration it is a single point of failure, so it is crucial to the filesystem's operation.
• DataNode: Stores the actual data blocks. Data is replicated across multiple DataNodes for fault
tolerance.
• Secondary NameNode: Performs periodic checkpoints of the NameNode's metadata to provide a
backup in case of failures.
HDFS Operation:

• Data Storage: Files are split into blocks (default size 128 MB, often configured larger, e.g., 256 MB) and distributed across the
cluster. Each block is replicated multiple times (default replication factor is 3) to ensure data reliability.
• Access: HDFS is designed for large-scale data processing tasks and provides write-once, read-many
access patterns. It is not optimized for small file storage or frequent updates.

Integration with Other Storage Systems

Hadoop’s filesystem abstraction allows it to integrate with various storage systems beyond HDFS:

• Local Filesystem: Hadoop can use the local filesystem for development or smaller datasets.
• Amazon S3: Hadoop can interact with Amazon S3 as a storage backend, allowing for scalable storage in the cloud. This integration uses the S3A filesystem client to read and write data (a minimal sketch follows this list).
• RDBMS, data lakes, data warehouses, streaming systems, cloud systems, and so on
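The S3A integration uses the same FileSystem API as HDFS, so S3 objects can be read with ordinary Hadoop client code. Below is a minimal sketch, assuming the hadoop-aws module is on the classpath and that credentials are supplied through the usual fs.s3a.* properties; the bucket and key names are hypothetical.

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class S3AReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical bucket and key; credentials come from fs.s3a.* settings or the environment.
        Path path = new Path("s3a://example-bucket/data/events.log");
        try (FileSystem fs = FileSystem.get(path.toUri(), conf);
             InputStream in = fs.open(path)) {
            IOUtils.copyBytes(in, System.out, 4096, false);  // stream the object to stdout
        }
    }
}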

Key Considerations:

• Data Consistency: When integrating with other storage systems, ensure that the data consistency
models are compatible with your use case.
• Performance: Consider the performance implications of different storage backends, particularly in
terms of latency and throughput.
Design and Use Cases of HDFS

1. Overview of HDFS Design

HDFS (Hadoop Distributed File System) is specifically designed to address the needs of storing and processing
very large files in a distributed computing environment. Here’s a breakdown of its key design principles:

Key Features:

• Very Large Files:


o Definition: In HDFS, "very large" refers to files that range from hundreds of megabytes to
terabytes, and even petabytes in some cases.
o Purpose: It is optimized for storing such large datasets efficiently, which is essential for big
data applications and analytics.
• Streaming Data Access:
o Access Pattern: HDFS is built around a write-once, read-many-times model. This model suits
scenarios where data is initially ingested and then analyzed multiple times.
o Performance: The design prioritizes high throughput for reading large datasets over low-
latency access. The time to read the entire dataset is more critical than the time to read the first
record.
• Commodity Hardware:
o Cost Efficiency: HDFS is designed to run on clusters of inexpensive, commonly available
hardware. This design choice makes it cost-effective and scalable.
o Fault Tolerance: It anticipates hardware failures and is designed to continue operating with
minimal user impact, leveraging the redundancy built into the system.

2. Challenges and Limitations

While HDFS is powerful for many use cases, there are specific scenarios where it may not be the best fit:

Low-Latency Data Access:

• Latency Constraints: Applications requiring quick, sub-second data access (in the tens of
milliseconds range) may not perform well with HDFS. Its optimization for high throughput rather than
low latency means it may not meet the performance needs of such applications.
• Alternative Solutions: HBase, which is built on top of HDFS, is often used for scenarios requiring
low-latency data access. HBase provides capabilities for fast random access to data.
Handling Lots of Small Files:

• Metadata Management: HDFS stores metadata (such as file and directory information) in memory
on the NameNode. This means the scalability of HDFS in terms of the number of files is limited by the
memory capacity of the NameNode.
• Scalability Constraints: Each file, directory, and block takes up about 150 bytes of Namenode memory. For example, one million files, each occupying one block, requires at least 300 MB of memory (one entry for the file plus one for its block). While managing millions of files is feasible, managing billions of files exceeds the memory of typical hardware.

Multiple Writers and Arbitrary Modifications:

• Write Restrictions: HDFS supports a single writer per file, with data being appended to the end of the
file. It does not support multiple writers or modifications at arbitrary file offsets.
• Future Considerations: Although support for multiple writers and arbitrary modifications might be
introduced in the future, such features are likely to be less efficient compared to the current append-
only model.

HDFS Concepts
Blocks

1. Understanding Blocks in Filesystems

In both traditional filesystems and HDFS, the concept of "blocks" is fundamental. Here's a detailed look at
what blocks are and their significance:

Filesystem Blocks vs. HDFS Blocks:

• Traditional Filesystems:
o Disk Blocks: The smallest unit of storage on a disk, usually 512 bytes.
o Filesystem Blocks: Typically larger, a few kilobytes in size, used by the filesystem to manage
data. The filesystem block size is an integral multiple of the disk block size.
o Tools: Commands like df and fsck operate at the filesystem block level for maintenance and
checking.
• HDFS Blocks:
o Size: Much larger than traditional filesystem blocks, with a default size of 128 MB.
o Function: Files in HDFS are divided into blocks of this size, which are stored independently
across the cluster.
o Storage Efficiency: If a file is smaller than a block, only the space needed for the file is used
(e.g., a 1 MB file on a 128 MB block uses only 1 MB of disk space).
2. Why Are HDFS Blocks So Large?

The choice of large block sizes in HDFS is driven by several practical considerations:

• Minimizing Seek Time:


o Seek Time vs. Transfer Rate: The time to seek to the start of a block can be minimized relative
to the time taken to transfer the data. Large blocks ensure that the transfer time dominates over
the seek time.
• MapReduce Considerations:
o Map Tasks: In MapReduce, tasks typically process one block at a time. Having too few blocks
compared to the number of nodes can lead to slower job execution due to insufficient
parallelism.

3. Benefits of Block Abstraction

The block abstraction in HDFS provides several advantages:

• Handling Large Files:


o Scalability: HDFS allows files to exceed the size of any single disk in the cluster. Blocks can
be distributed across all available disks, enabling the storage of very large files.
• Simplified Storage Management:
o Fixed Size: Blocks are of a fixed size, making it straightforward to calculate storage
requirements and manage space on disks.
o Metadata Management: Since blocks are only chunks of data, file metadata (like permissions)
is managed separately from the blocks, simplifying the storage subsystem.
• Replication and Fault Tolerance:
o Replication: To ensure fault tolerance, each block is replicated across multiple machines
(typically three). This redundancy allows for recovery from disk or machine failures.
o Automatic Recovery: If a block becomes unavailable due to corruption or a failure, it can be
replicated from other available copies to restore the required replication factor.
• Load Distribution:
o Read Load: Applications may set a higher replication factor for blocks in frequently accessed files to distribute the read load across the cluster (see the sketch after this list).
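To make the per-file block size and replication settings concrete, here is a hedged Java sketch that creates a file with an explicit block size and replication factor, and then raises the replication factor of an existing, frequently read file; the paths and values are illustrative, not prescribed defaults.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettingsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Create a file with a 256 MB block size and a replication factor of 3 (illustrative values).
        long blockSize = 256L * 1024 * 1024;
        try (FSDataOutputStream out =
                 fs.create(new Path("/user/hadoop/big-dataset.bin"), true, 4096, (short) 3, blockSize)) {
            out.writeUTF("example payload");
        }

        // Raise the replication factor of a hot lookup file to spread its read load across the cluster.
        fs.setReplication(new Path("/user/hadoop/lookup-table.dat"), (short) 5);

        fs.close();
    }
}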

4. Filesystem Check (fsck) with HDFS

HDFS provides a command to understand and manage blocks:

• Command: hdfs fsck / -files -blocks


o Function: Lists the blocks that make up each file in the HDFS filesystem.
o Purpose: Useful for checking the health and integrity of the filesystem and its blocks.

Namenodes and Datanodes in HDFS

HDFS (Hadoop Distributed File System) operates with a master-worker architecture involving two main types
of nodes: Namenodes and Datanodes. Understanding their roles and mechanisms is essential for maintaining
the health and performance of an HDFS cluster.

1. Namenode

The Namenode is the master node in the HDFS architecture, responsible for managing the filesystem
namespace and metadata.

Responsibilities:

• Filesystem Namespace Management:


o Maintains the directory tree of the filesystem and metadata for all files and directories.
o Stores metadata persistently on local disks in two files: the namespace image and the edit log.
• Block Management:
o Keeps track of which Datanodes store the blocks for each file.
o Does not persistently store block locations; instead, this information is reconstructed from
Datanodes during system startup.
• Client Interaction:
o Clients interact with the Namenode to perform filesystem operations.
o Provides a filesystem interface, abstracting the complexities of the Namenode and Datanodes
from the user.

Failure and Recovery:

• Critical Role:
o If the Namenode fails, the filesystem cannot be used, and data loss can occur because the system
would not know how to reconstruct files from the blocks on Datanodes.
• Resilience Mechanisms:
o Backup:
▪ Namenode's persistent state is backed up to multiple filesystems. This includes
synchronous and atomic writes to local disks and remote NFS mounts.
o Secondary Namenode:
▪ Role: Periodically merges the namespace image with the edit log to prevent the edit log
from becoming too large.
▪ Operation: Runs on a separate physical machine with sufficient CPU and memory. It
keeps a copy of the merged namespace image.
▪ Limitations: The state of the Secondary Namenode lags behind the primary, so there is
a risk of data loss if the primary fails. In such cases, the primary's metadata files are
copied to the Secondary Namenode, which then acts as the new primary.
▪ Alternative: A hot standby Namenode can be used for high availability, which provides
a more robust failover solution.

2. Datanodes

Datanodes are the worker nodes in the HDFS architecture, responsible for storing and managing the data
blocks.

Responsibilities:

• Block Storage and Retrieval:


o Store and retrieve blocks as instructed by clients or the Namenode.
o Report periodically to the Namenode with lists of blocks they are storing.

Operation:

• Data Management:
o Datanodes handle the actual data blocks, and their efficiency directly impacts the performance
of HDFS.
o They ensure data redundancy and availability by storing multiple replicas of each block.

3. Interaction Between Namenode and Datanodes

• Client Operations:
o Clients interact with the Namenode to get information about where to find the blocks of a file.
o The Namenode directs clients to the appropriate Datanodes for block retrieval or storage.
• Datanode Reporting:
o Datanodes regularly send heartbeat signals and block reports to the Namenode.
o This reporting helps the Namenode monitor the health of the Datanodes and manage data
replication and block recovery.
Block Caching

Purpose:

• Block caching improves read performance by storing frequently accessed blocks in the memory of
Datanodes.
• It allows quick access to data without having to read from disk repeatedly.

Mechanics:

• Caching Location: Blocks are cached in an off-heap memory area on the Datanodes. Off-heap
memory is used to avoid garbage collection overhead.
• Cache Scope: By default, each block is cached in one Datanode’s memory. This is configurable on a
per-file basis, allowing multiple Datanodes to cache the same block if needed.
• Cache Management: Administrators can specify which files should be cached, and for how long, by adding cache directives to a cache pool (example commands follow this list).
• Cache Pools: Cache pools are used to manage cache permissions and resource usage. They help in
organizing and controlling access to cached data.
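For example, an administrator could create a cache pool and add a directive for a small lookup table used in joins. The commands below are a sketch using the hdfs cacheadmin tool; the pool name, path, and replication count are illustrative.

hdfs cacheadmin -addPool lookup-pool
hdfs cacheadmin -addDirective -path /user/hadoop/lookup-table.dat -pool lookup-pool -replication 2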
Benefits:

• Performance Improvement: Increases read performance by reducing disk I/O. For instance, job
schedulers in frameworks like MapReduce or Spark can schedule tasks on Datanodes that have relevant
blocks cached, leading to faster data access.
• Use Cases: Ideal for use cases such as small lookup tables used in joins, where data is frequently
accessed.

HDFS Federation

Purpose:

• HDFS Federation enhances the scalability of HDFS by allowing the use of multiple Namenodes.

Mechanics:

• Namespace Management: In HDFS Federation, the filesystem namespace is split across multiple
Namenodes. Each Namenode manages a portion of the namespace and its associated block pool.
• Namespace Volumes: Each Namenode is responsible for a namespace volume, which includes
metadata for its portion of the namespace. These volumes are independent, meaning that the failure of
one Namenode does not affect others.
• Block Pool Storage: Datanodes register with all Namenodes in the cluster and store blocks from
multiple block pools. This ensures that blocks from different namespaces can be stored and managed
efficiently.

Access:

• Client Interaction: Clients use client-side mount tables to map file paths to the appropriate Namenodes. Configuration is managed using ViewFileSystem and viewfs:// URIs (a minimal configuration sketch follows).
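A minimal client-side mount table sketch, using the fs.viewfs.mounttable property pattern; the mount table name, hosts, and paths are illustrative:

fs.defaultFS = viewfs://ClusterX
fs.viewfs.mounttable.ClusterX.link./user = hdfs://nn1.example.com:8020/user
fs.viewfs.mounttable.ClusterX.link./data = hdfs://nn2.example.com:8020/data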

Benefits:

• Scalability: Allows the HDFS cluster to scale beyond the limitations of a single Namenode's memory
by distributing the namespace management.
• Fault Tolerance: Improves fault tolerance by isolating namespace management across multiple
Namenodes.
HDFS High Availability (HA)

Purpose:

• HDFS HA addresses the single point of failure issue associated with the Namenode by providing a pair
of Namenodes in an active-standby configuration.

Mechanics:

• Active-Standby Configuration: One Namenode acts as the active Namenode, handling client
requests, while the other acts as the standby, ready to take over if the active Namenode fails.
• Shared Storage: Namenodes use highly available shared storage to keep the edit log. Two main
choices for this storage are:
o NFS Filer: Traditional Network File System for shared storage.
o Quorum Journal Manager (QJM): A specialized HDFS component designed to provide
highly available edit logs. It uses a group of journal nodes where each edit must be written to a
majority of nodes (e.g., three nodes, allowing for one node failure).
• Datanode Reporting: Datanodes must report block information to both the active and standby
Namenodes.
• Client Configuration: Clients must be configured to handle Namenode failover transparently.

Failover Process:

• Quick Failover: The standby Namenode can take over quickly (within seconds) as it maintains up-to-
date state in memory, including the latest edit log and block mappings.
• Recovery: In case the standby Namenode is down when the active fails, the administrator can start the
standby from cold. While this process is better than the non-HA scenario, it still requires standard
operational procedures.

Advantages:

• Reduced Downtime: Provides high availability and reduces downtime by enabling a rapid failover
mechanism.
• Operational Efficiency: Standardizes the failover process, making it more predictable and
manageable.

Failover Process

Failover Controller:
• Role: Manages the transition between the active and standby Namenodes. It ensures that only one
Namenode is active at a time.
• Default Implementation: Uses ZooKeeper, which coordinates the failover process by monitoring the
health of the Namenodes and triggering failover if needed.
• Process:
o Heartbeat Mechanism: Each Namenode runs a lightweight failover controller that sends
heartbeats to check the status of the other Namenode.
o Graceful Failover: Can be initiated manually by an administrator, such as during routine
maintenance. This involves an orderly transition where both Namenodes switch roles smoothly.
o Ungraceful Failover: Occurs automatically if the active Namenode fails unexpectedly. This
can happen due to issues like network partitions or slow networks, where the active Namenode
might still be running but is unreachable.

Client Failover Handling:

• Transparent to Clients: The client library manages failover transparently. Clients are configured with a logical hostname that maps to a pair of Namenode addresses (see the configuration sketch after this list).
• Failover Mechanism: The client library attempts connections to each Namenode address in turn until
it succeeds. This ensures continuous service availability even during failover.
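An illustrative sketch of the client-side properties for a logical nameservice; the nameservice name and hosts are hypothetical:

fs.defaultFS = hdfs://mycluster
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1 = nn1.example.com:8020
dfs.namenode.rpc-address.mycluster.nn2 = nn2.example.com:8020
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider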

Fencing Mechanisms

Purpose:

• Prevent Data Corruption: Ensures that the previously active Namenode, which might still be running
or reachable, does not interfere with the cluster operations or cause data corruption.

Fencing Techniques:

• SSH Fencing Command: A simple method where an SSH command is used to kill the process of the
previously active Namenode. This is effective if the failover controller is unsure if the old Namenode
has stopped completely.
• NFS Filer: When using an NFS filer for shared edit logs, stronger fencing methods are required
because NFS filers do not enforce exclusive write access as effectively as QJM.
o Revoking Access: Commands to revoke the Namenode’s access to the shared storage directory
can be used.
o Disabling Network Port: The Namenode’s network port can be disabled via remote
management commands to prevent it from accepting requests.
o STONITH (Shoot The Other Node In The Head): This drastic measure involves using a
specialized power distribution unit (PDU) to forcibly power down the host machine of the failed
Namenode, ensuring it can no longer affect the system.
Hadoop Filesystems: Overview and Usage

Hadoop’s flexible filesystem abstraction allows it to interact with various storage systems, each designed for
specific use cases. Understanding these filesystems can help in choosing the right one for your needs, whether
for local testing, distributed processing, or cloud integration.

1. Local FileSystem (fs.LocalFileSystem)

• URI Scheme: file:///


• Description: Represents local disk storage. Ideal for small-scale testing or development on a single
machine.
• Purpose:
o Testing and development in a local environment.
o When data integrity through client-side checksums is needed.
• How to Use:
o Access local files directly using the file:/// scheme.
o For environments where checksums are not required, use RawLocalFileSystem (see the fragment below).
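A brief Java fragment showing both variants; the path is illustrative:

Configuration conf = new Configuration();
LocalFileSystem localFs = FileSystem.getLocal(conf);   // local filesystem with client-side checksums
FileSystem rawFs = localFs.getRawFileSystem();         // underlying RawLocalFileSystem, no checksum verification
FSDataInputStream in = rawFs.open(new Path("file:///tmp/example.txt"));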

2. Hadoop Distributed File System (HDFS) (hdfs.DistributedFileSystem)

• URI Scheme: hdfs://


• Description: A distributed storage system designed for high-throughput access to large datasets.
Provides fault tolerance through data replication.
• Purpose:
o Handling large-scale data storage and processing.
o Optimized for use with MapReduce and other Hadoop processing frameworks.
o Ensures data reliability and fault tolerance.
• How to Use:
o Access files using the hdfs:// scheme.
o Ideal for processing large volumes of data across multiple nodes.
3. WebHDFS (hdfs.web.WebHdfsFileSystem)

• URI Scheme: webhdfs://


• Description: Allows access to HDFS over HTTP, enabling interaction through web interfaces and
REST APIs.
• Purpose:
o Web-based access to HDFS.
o Integration with web applications and services.
• How to Use:
o Interact with HDFS via HTTP requests using the webhdfs:// scheme.

4. Secure WebHDFS (hdfs.web.SWebHdfsFileSystem)

• URI Scheme: swebhdfs://


• Description: Secure version of WebHDFS using HTTPS for encrypted data transfers.
• Purpose:
o Secure data access over the web.
o Ensures encrypted communication with HDFS.
• How to Use:
o Access HDFS with secure HTTPS connections using the swebhdfs:// scheme.

5. Hadoop ARchives (HAR) (fs.HarFileSystem)

• URI Scheme: har://


• Description: Archives multiple small files into a single file to reduce memory usage on the namenode.
• Purpose:
o Reduces namenode memory overhead by consolidating small files.
o Useful for managing a large number of small files.
• How to Use:
o Create archives with the hadoop archive tool and access the archived files using the har:// scheme (illustrative commands below).
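For example, a directory of small files can be archived and then listed through the har:// scheme; the paths below are illustrative:

hadoop archive -archiveName logs.har -p /user/hadoop/input /user/hadoop/archives
hdfs dfs -ls har:///user/hadoop/archives/logs.har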

6. ViewFS (viewfs.ViewFileSystem)

• URI Scheme: viewfs://


• Description: Provides a unified view of multiple filesystems, often used to create mount points for
federated namenodes.
• Purpose:
o Aggregates multiple HDFS clusters or filesystems into a single namespace.
o Facilitates data access across federated systems.
• How to Use:
o Access a combined view of filesystems using the viewfs:// scheme.

7. FTP (fs.ftp.FTPFileSystem)

• URI Scheme: ftp://


• Description: Interacts with FTP servers, allowing file operations over FTP.
• Purpose:
o Integration with FTP servers for data access and manipulation.
• How to Use:
o Read and write files over FTP using the ftp:// scheme.

8. Amazon S3 (fs.s3a.S3AFileSystem)

• URI Scheme: s3a://


• Description: Accesses Amazon S3 storage, supporting larger file sizes and better performance than
the older s3n implementation.
• Purpose:
o Integration with Amazon S3 cloud storage.
o Suitable for large-scale cloud storage requirements.
• How to Use:
o Access files in S3 using the s3a:// scheme.
9. Microsoft Azure Blob Storage (fs.azure.NativeAzureFileSystem)

• URI Scheme: wasb://


• Description: Provides access to Azure Blob Storage, integrating with Hadoop for cloud-based storage.
• Purpose:
o Storage integration with Microsoft Azure.
o Suitable for cloud storage needs with Hadoop.
• How to Use:
o Access Azure Blob Storage using the wasb:// scheme.

10. OpenStack Swift (fs.swift.snative.SwiftNativeFileSystem)

• URI Scheme: swift://


• Description: Accesses OpenStack Swift, an object storage service.
• Purpose:
o Integration with OpenStack Swift for object storage.
o Suitable for cloud storage with OpenStack.
• How to Use:
o Access files in Swift using the swift:// scheme.

Choosing a Filesystem

• HDFS is recommended for large-scale data processing due to its optimization for data locality and
fault tolerance.
• Cloud-based filesystems (S3, Azure, Swift) are used for integrating with cloud storage platforms and
can be suitable depending on specific storage and processing requirements.
• Local filesystems are best for small-scale or development testing.

Each filesystem has distinct use cases depending on your needs for data volume, processing, and integration
with other storage systems.

Interface

Hadoop provides several interfaces to interact with its filesystems, catering to different programming
languages, environments, and access needs. Here’s an overview of the key interfaces:

1. Java API

• Purpose: Hadoop is written in Java, so its primary filesystem interactions are handled through Java
APIs.
• Implementation: The org.apache.hadoop.fs.FileSystem class provides methods for filesystem
operations such as reading, writing, and listing files.
• Usage: Commonly used for Java applications and for running Hadoop commands (a short example follows).
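A small, self-contained sketch of the Java API in use, listing the contents of a directory (the path is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListFilesExample {
    public static void main(String[] args) throws Exception {
        // FileSystem.get() returns whichever implementation fs.defaultFS points at (HDFS, local, S3A, ...).
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}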

2. HTTP (WebHDFS)

• Purpose: Provides access to HDFS via HTTP, enabling non-Java applications to interact with Hadoop.
• Implementation: The WebHDFS protocol exposes an HTTP REST API that allows file operations over HTTP or HTTPS (example requests are shown after this list).
• Access Methods:
o Direct Access: Clients interact directly with HDFS daemons (namenodes and datanodes) via
embedded web servers.
o Proxy Access: Clients connect through an HttpFS proxy server, which forwards requests to
HDFS. This approach can help with firewall and bandwidth management, and is useful for
cross-data-center or cloud access.
• Performance: WebHDFS is slower than the native Java client, so it’s less suitable for very large data
transfers.
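Two illustrative REST calls; the hostname is hypothetical, and the port assumes the Hadoop 3 default of 9870 (older releases used 50070):

curl -i "http://namenode.example.com:9870/webhdfs/v1/user/hadoop?op=LISTSTATUS"
curl -i -L "http://namenode.example.com:9870/webhdfs/v1/user/hadoop/file.txt?op=OPEN"

The OPEN operation responds with a redirect to a datanode, which then streams the file content; the -L flag tells curl to follow that redirect.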

3. C API

• Purpose: Provides a C library for interacting with Hadoop filesystems.


• Implementation:
o libhdfs: Mirrors the Java FileSystem interface using Java Native Interface (JNI). It can access
any Hadoop filesystem.
o libwebhdfs: Uses the WebHDFS interface for accessing HDFS.
• Usage: Allows C applications to interact with Hadoop filesystems. Prebuilt binaries are available for
64-bit Linux, with other platforms requiring custom builds.

4. NFS (Network File System)

• Purpose: Allows HDFS to be mounted on a local client’s filesystem, enabling interaction using
standard Unix utilities.
• Implementation: Hadoop’s NFSv3 gateway makes HDFS appear as a local filesystem.
• Usage: Useful for Unix-like environments where standard tools and libraries are used for file
operations.
5. FUSE (Filesystem in Userspace)

• Purpose: Integrates filesystems implemented in user space as standard Unix filesystems.


• Implementation: Hadoop’s Fuse-DFS module allows HDFS (or any Hadoop filesystem) to be
mounted as a local filesystem.
• Usage: Provides a standard interface for interacting with Hadoop filesystems from Unix-like systems.

These interfaces provide flexibility for accessing Hadoop filesystems from various programming
environments and tools:

• Java API: Native and preferred for Hadoop’s Java-based ecosystem.


• HTTP (WebHDFS): Provides web-based access and integration with non-Java applications.
• C API: Allows C applications to interact with Hadoop.
• NFS: Enables integration with Unix utilities and libraries via network file system access.
• FUSE: Integrates Hadoop filesystems with Unix-like systems as a local filesystem.

Each interface serves different use cases and provides different levels of access, performance, and
compatibility.

DATA FLOW

Anatomy of a File Read

Components Involved

1. Client: The user or application requesting to read a file from HDFS.


2. Namenode: The master server that manages the metadata and namespace of the file system.
3. Datanodes: The servers where the actual data blocks are stored.

Sequence of Events

1. Client Opens the File:


o Action: The client calls open() on the FileSystem object.
o Instance: For HDFS, this object is an instance of DistributedFileSystem.
o Step: (Step 1 in Figure )

FileSystem fs = FileSystem.get(new Configuration());               // returns a DistributedFileSystem for hdfs:// URIs
FSDataInputStream in = fs.open(new Path("/user/hadoop/file.txt")); // triggers the namenode lookup for block locations
2. Namenode Determines Block Locations:
o Action: DistributedFileSystem contacts the namenode using remote procedure calls (RPCs).
o Purpose: To determine the locations of the first few blocks of the file.
o Response: The namenode returns the addresses of the datanodes that have copies of these
blocks, sorted by their proximity to the client.
o Step: (Step 2 in Figure)

Namenode returns datanodes for block 1: /d1/r1/n1, /d1/r1/n2

Namenode returns datanodes for block 2: /d1/r2/n3, /d1/r2/n4


3. Client Receives FSDataInputStream:
o Action: DistributedFileSystem returns an FSDataInputStream to the client.
o Role: An input stream that supports file seeks, wrapping a DFSInputStream.
o Function: DFSInputStream manages I/O with datanodes and the namenode.

FSDataInputStream in = fs.open(new Path("/user/hadoop/file.txt"));

4. Client Reads Data from Stream:


o Action: The client calls read() on the FSDataInputStream.
o Function: DFSInputStream connects to the first (closest) datanode holding the first block.
o Data Transfer: Data is streamed from the datanode to the client.
o Step: (Step 3 in Figure)
5. Transition Between Blocks:
o Action: When the end of a block is reached, DFSInputStream closes the connection to the
current datanode.
o Next Block: DFSInputStream finds the best datanode for the next block and opens a new
connection.
o Transparency: This process is transparent to the client.
o Step: (Step 4 and Step 5 in Figure)

Example:

o End of block 1 reached, switch from /d1/r1/n1 to /d1/r2/n3 for block 2.


6. Namenode Communication for Next Blocks:
o Action: DFSInputStream calls the namenode to retrieve locations for the next batch of blocks
as needed.
o Request more block locations as the client continues to read.
7. Client Finishes Reading:
o Action: The client calls close() on the FSDataInputStream when done.
o Step: (Step 6 in Figure; a consolidated code sketch of steps 1-6 follows)
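The whole sequence can be driven with a few lines of client code. The sketch below, with an illustrative path, corresponds to steps 1 through 6 above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());                       // step 1: DistributedFileSystem
        try (FSDataInputStream in = fs.open(new Path("/user/hadoop/file.txt"))) {  // steps 1-3: open, get block locations
            IOUtils.copyBytes(in, System.out, 4096, false);                        // steps 4-5: stream block by block
        }                                                                          // step 6: close()
        fs.close();
    }
}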

Error Handling and Data Integrity

• Datanode Failures:
o Retry: If an error occurs while communicating with a datanode, DFSInputStream will try the
next closest datanode.
o Tracking Failures: It remembers failed datanodes to avoid retrying them unnecessarily.
Example:

o If /d1/r1/n1 fails, switch to /d1/r1/n2.


• Checksum Verification:
o Integrity Check: DFSInputStream verifies checksums for data transferred from the datanode.
o Corrupted Blocks:
▪ Action: Attempts to read a replica from another datanode.
▪ Reporting: Reports the corrupted block to the namenode.

Example:

o If checksum verification fails for block 1 from /d1/r1/n1, read block 1 from /d1/r1/n2.

Design Benefits

• Direct Data Retrieval:


o Client-Datanode Interaction: Clients contact datanodes directly for data retrieval, guided by
the namenode.
o Scalability: Allows HDFS to handle many concurrent clients, distributing data traffic across
all datanodes.
• Namenode Efficiency:
o Light Load: The namenode only services block location requests, which are stored in
memory for efficiency.
o Avoids Bottlenecks: The namenode does not serve data directly, preventing it from becoming
a bottleneck.
