HDFS 3

Uploaded by

himavamsi19

HDFS (Hadoop Distributed File System)

What is HDFS?
• HDFS is a distributed file system designed for storing large data sets across clusters of computers.
• It is a core component of the Apache Hadoop ecosystem.
• HDFS provides a reliable, scalable, and efficient storage solution for big data applications.
Purpose of HDFS
• HDFS is designed to store large datasets across multiple machines, and to support processing them, in a reliable and scalable manner.
Key Features of HDFS
• Fault tolerance: HDFS is designed to handle hardware failures by replicating data across multiple nodes.
• Scalability: HDFS can scale horizontally by adding more nodes to the cluster.
• High throughput: HDFS is optimized for sequential data access, making it suitable for big data processing.
NameNode: The NameNode is the master node that manages the file system namespace and regulates access to files. It stores the metadata about files and directories, including the file hierarchy, permissions, and block locations.
DataNode: DataNodes are the worker nodes that store the actual data blocks of files. They are responsible for reading and writing data to the local file system and replicating data blocks to ensure fault tolerance.
Data Replication
• Ensures data availability and fault tolerance.
• Each data block is replicated across multiple DataNodes.
• Replication factor: a configurable parameter that determines the number of copies of each block.
• Improves data availability: if a DataNode fails, data can still be accessed from other replicas.
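As a rough illustration of the trade-off behind the replication factor, the sketch below computes the raw storage a dataset consumes and how many replicas of a block can be lost before it becomes unreadable. The function names are hypothetical; the default of 3 matches the stock value of the `dfs.replication` setting.

```python
def raw_storage_bytes(logical_bytes: int, replication_factor: int = 3) -> int:
    """Raw cluster storage used when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_bytes * replication_factor


def tolerated_replica_losses(replication_factor: int = 3) -> int:
    """Number of replicas of one block that can be lost while one copy remains."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


if __name__ == "__main__":
    one_tb = 1024 ** 4
    print(raw_storage_bytes(one_tb) // one_tb, "TB of raw storage per 1 TB of data")
    print(tolerated_replica_losses(), "replica losses tolerated per block")
```

A higher factor buys more tolerance to DataNode failures at a proportional cost in disk space.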
Client
• Applications interact with HDFS through the HDFS client.
• Available as a command-line tool (hdfs dfs) and programmatic APIs.
• Performs file system operations on behalf of users.
• Communicates with the NameNode for metadata management.
• Interacts directly with DataNodes for data transfer (read/write).
Block
• The fundamental unit of data storage in HDFS.
• Files are divided into fixed-size blocks (configurable); the last block of a file may be smaller.
• Default block size: 128 MB (can be adjusted based on workload).
• Each block is replicated across multiple DataNodes.
• Block size impacts performance and storage efficiency.
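To make the block arithmetic concrete, here is a minimal sketch of how a file of a given size divides into blocks. `split_into_blocks` is a hypothetical helper written for illustration, not part of any HDFS API.

```python
from typing import List

DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block of a file that is file_size bytes long."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The final block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    mb = 1024 * 1024
    # A 300 MB file needs two full 128 MB blocks plus one 44 MB block.
    print([size // mb for size in split_into_blocks(300 * mb)])
```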
Advantages of Block
• Fault tolerance
• Parallel processing
• Scalability
• Data locality
• Ease of data management
Rack
• A group of DataNodes physically located together in a data center.
• Connected by a high-bandwidth network switch.
• Improves data locality and network performance.
• Enables rack awareness for data placement strategies.
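Rack awareness can be sketched as a placement rule. The example below assumes the commonly described default policy for a replication factor of 3: the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The topology layout and the function name are hypothetical, and the real policy picks among candidates randomly and also weighs node load and free space, which this sketch ignores.

```python
from typing import Dict, List


def place_replicas(writer: str, topology: Dict[str, List[str]]) -> List[str]:
    """Pick three DataNodes for one block; topology maps rack name -> node names."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    if writer not in rack_of:
        raise ValueError(f"unknown writer node: {writer}")
    local_rack = rack_of[writer]

    # Find a different rack that can host the second and third replicas.
    remote_rack = next(
        (rack for rack, nodes in topology.items()
         if rack != local_rack and len(nodes) >= 2),
        None,
    )
    if remote_rack is None:
        raise ValueError("need a second rack with at least two DataNodes")

    second, third = topology[remote_rack][:2]
    return [writer, second, third]


if __name__ == "__main__":
    racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas("dn1", racks))
```

Spreading replicas over two racks lets a block survive the loss of a whole rack, while keeping two replicas on one rack limits cross-rack write traffic.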
READ OPERATION
• Client Request
• NameNode Lookup and Block Locations
• Client Reads Data Blocks
• Data Streaming and Block-by-Block Processing
• Stream Management and Next Block Lookups
• Delivery to Application
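The read steps above can be sketched with a toy in-memory model. `ToyNameNode`, `read_file`, and the dictionary standing in for DataNodes are all hypothetical names for illustration; a real client streams data over the network and talks to the NameNode through RPC.

```python
from typing import Dict, List, Tuple


class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where replicas live."""

    def __init__(self) -> None:
        # path -> ordered list of (block_id, names of DataNodes holding a replica)
        self.block_map: Dict[str, List[Tuple[str, List[str]]]] = {}

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        return self.block_map[path]


def read_file(path: str, namenode: ToyNameNode,
              datanodes: Dict[str, Dict[str, bytes]]) -> bytes:
    """Read a file block by block, falling back to another replica on failure."""
    data = bytearray()
    # Steps 1-2: the client asks the NameNode for the block locations.
    for block_id, replicas in namenode.get_block_locations(path):
        # Steps 3-5: read each block directly from a DataNode, in order.
        for node in replicas:
            store = datanodes.get(node)  # None models a failed DataNode
            if store is not None and block_id in store:
                data += store[block_id]
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    # Step 6: hand the assembled bytes to the application.
    return bytes(data)


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.block_map["/logs/app.log"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]
    live = {"dn2": {"blk_1": b"hello ", "blk_2": b"world"}}  # dn1 and dn3 are down
    print(read_file("/logs/app.log", nn, live))
```

Note that file data never passes through the NameNode; it only answers the location lookup.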
Write Operation
• Client Initiates Write Request
• NameNode Interaction and File Creation
• Client Writes Data and Packet Creation
• DataStreamer, Block Allocation, and Pipeline Creation
• Data Replication Pipeline
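The packet and pipeline steps of the write path can be sketched as follows. This is a simplified model under stated assumptions: the 64 KB packet size is assumed for illustration, the function names are hypothetical, and forwarding between DataNodes is modeled as a loop rather than as network transfers with acknowledgements flowing back upstream.

```python
from typing import Dict, Iterator, List

PACKET_SIZE = 64 * 1024  # assumed packet size for this illustration


def make_packets(data: bytes, packet_size: int = PACKET_SIZE) -> Iterator[bytes]:
    """Split the client's data into packets of at most packet_size bytes."""
    for offset in range(0, len(data), packet_size):
        yield data[offset:offset + packet_size]


def write_block(data: bytes, pipeline: List[str],
                stores: Dict[str, bytearray],
                packet_size: int = PACKET_SIZE) -> int:
    """Send each packet down the DataNode pipeline; return the packets acknowledged."""
    acked = 0
    for packet in make_packets(data, packet_size):
        # The first DataNode stores the packet and forwards it to the next, and so on.
        for node in pipeline:
            stores.setdefault(node, bytearray()).extend(packet)
        # The packet counts as acknowledged once every node in the pipeline has it.
        acked += 1
    return acked


if __name__ == "__main__":
    disks: Dict[str, bytearray] = {}
    count = write_block(b"abcdefghij", ["dn1", "dn2", "dn3"], disks, packet_size=4)
    print(count, {node: bytes(buf) for node, buf in disks.items()})
```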
What is HDFS Federation
•HDFS Federation is an extension of the Hadoop Distributed File System (HDFS) that allows a cluster to run multiple independent NameNodes, each managing its own namespace, while all of them share the same pool of DataNodes for block storage. This overcomes the limitations of a single NameNode, whose memory bounds how many files and blocks one namespace can hold.
Features of HDFS Federation
•Overcoming Limitations
•Scalability
•Improved Manageability
•Isolation between namespaces
HDFS Federation
• Supports multiple NameNodes for scalability and isolation
• Namespaces are federated, but not shared
• Useful for large clusters or multiple organizations
HDFS High Availability
• Provides NameNode redundancy
• Active NameNode and Standby NameNode
• Automatic failover in case of NameNode failure
HDFS Permissions
• Supports POSIX-like permissions (user, group, other)
• Permissions are managed by the NameNode
• Useful for securing sensitive data
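The user, group, other model can be illustrated with a small permission check. This is a sketch of the POSIX-style rule only: `is_allowed` is a hypothetical helper, and it leaves out the superuser bypass and ACLs.

```python
from typing import Set

ACCESS_BITS = {"r": 4, "w": 2, "x": 1}


def is_allowed(mode: int, owner: str, group: str,
               user: str, user_groups: Set[str], access: str) -> bool:
    """Check one access type ('r', 'w' or 'x') against an octal mode such as 0o640."""
    bit = ACCESS_BITS[access]
    if user == owner:
        shift = 6  # owner bits apply
    elif group in user_groups:
        shift = 3  # group bits apply
    else:
        shift = 0  # other bits apply
    return bool((mode >> shift) & bit)


if __name__ == "__main__":
    # rw-r----- : the owner can read and write, the group can read, others get nothing.
    print(is_allowed(0o640, "alice", "analysts", "alice", {"staff"}, "w"))
    print(is_allowed(0o640, "alice", "analysts", "bob", {"analysts"}, "w"))
```

Exactly one class of bits is consulted per request: an owner is never granted access through the group or other bits.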
HDFS Snapshots
• Provides point-in-time backups of the file system
• Efficient storage utilization by storing only differences
• Useful for data recovery or auditing purposes
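The idea of storing only differences can be sketched with a toy in-memory file system. `ToySnapshotFS` is a hypothetical class for illustration and does not mirror the actual HDFS implementation: creating a snapshot copies nothing, and the old value of a path is saved only the first time that path changes afterwards.

```python
from typing import Dict, Optional

_MISSING = object()  # marks "path did not exist when the snapshot was taken"


class ToySnapshotFS:
    def __init__(self) -> None:
        self.current: Dict[str, bytes] = {}
        # snapshot name -> {path: value at snapshot time}, filled lazily on change
        self.snapshots: Dict[str, Dict[str, object]] = {}

    def create_snapshot(self, name: str) -> None:
        self.snapshots[name] = {}  # no data is copied at creation time

    def _record_old(self, path: str) -> None:
        old = self.current.get(path, _MISSING)
        for diff in self.snapshots.values():
            diff.setdefault(path, old)  # keep only the first pre-change value

    def write(self, path: str, data: bytes) -> None:
        self._record_old(path)
        self.current[path] = data

    def delete(self, path: str) -> None:
        self._record_old(path)
        self.current.pop(path, None)

    def read_snapshot(self, name: str, path: str) -> Optional[bytes]:
        diff = self.snapshots[name]
        value = diff[path] if path in diff else self.current.get(path, _MISSING)
        return None if value is _MISSING else value


if __name__ == "__main__":
    fs = ToySnapshotFS()
    fs.write("/a", b"v1")
    fs.create_snapshot("s1")
    fs.write("/a", b"v2")
    print(fs.read_snapshot("s1", "/a"), fs.current["/a"])
```

Unchanged files cost a snapshot nothing, which is why frequent snapshots of mostly static data stay cheap.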
HDFS File Operations
• Supports basic file operations (create, read, write, delete, rename, etc.)
• Optimized for streaming data access patterns
• Not recommended for large numbers of small files, because every file and block adds metadata that the NameNode must hold in memory
HDFS Command Line Interface
• The hdfs dfs command line interface provides a way to interact with HDFS from the command line.
• Common commands include ls (list files/directories), put (upload files), get (download files), rm (remove files/directories), mkdir (create directories), chmod (change permissions), and others.
• The command line interface is useful for managing and administering HDFS clusters.
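For scripting, the commands listed above can be wrapped in a thin helper. The sketch below is one possible approach rather than an official client: the function names and the example paths are hypothetical, and actually running a command assumes the `hdfs` executable is on the PATH and a cluster is reachable.

```python
import subprocess
from typing import List

# The subcommands named on this slide; hdfs dfs offers others as well.
SUPPORTED = {"ls", "put", "get", "rm", "mkdir", "chmod"}


def hdfs_dfs_command(operation: str, *args: str) -> List[str]:
    """Build the argument list for one 'hdfs dfs' call, for example -ls /data."""
    if operation not in SUPPORTED:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hdfs", "dfs", f"-{operation}", *args]


def run_hdfs_dfs(operation: str, *args: str) -> str:
    """Run the command and return its standard output (needs a working Hadoop setup)."""
    result = subprocess.run(
        hdfs_dfs_command(operation, *args),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands, so it works without a cluster.
    print(hdfs_dfs_command("mkdir", "/user/demo"))
    print(hdfs_dfs_command("put", "local.csv", "/user/demo/"))
    print(hdfs_dfs_command("chmod", "640", "/user/demo/local.csv"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.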
