Chapter 3 HDFS: Hadoop Distributed File System
Foreword
⚫ This chapter describes the HDFS concept, advantages and
disadvantages, architecture, read and write processes, basic
commands, and use cases.

Objectives
⚫ Upon completion of this course, you will understand:
 HDFS advantages and disadvantages
 HDFS architecture and key features
 HDFS read and write processes
 Common HDFS commands and basic operations

Contents
1. HDFS Overview

2. HDFS Basic Components

3. HDFS Key Features

4. HDFS Read and Write Processes

5. HDFS Use Cases

HDFS Overview
⚫ HDFS is the bottom-layer core of the Hadoop big data ecosystem
and supports distributed storage of big data. It is designed and
developed to process large data sets, facilitating high-throughput,
large-scale file operations.

HDFS Advantages
⚫ High fault tolerance
 Multi-replica mechanism
 Automatic replica restoration
⚫ Efficiency
 Parallel computing, with applications moved to the data nodes
⚫ Streaming data ingestion
 Streaming data ingestion mode
 Processing by block in batches
⚫ Cross-platform compatibility
 Java as the programming language
 Strong portability
⚫ Simple file model
 Written once and read multiple times
 Files cannot be modified in place but can be appended to
⚫ Suitable for big data processing
 Support for thousands of nodes in a cluster
 PB-level data processing

HDFS Disadvantages
⚫ HDFS has the following disadvantages:
 Low-latency data access is restricted: HDFS is optimized for moving vast amounts of data in a given period of time (high throughput), which increases the latency of obtaining data.
 Not suitable for small file storage: HDFS files are stored as blocks, and the block metadata is held in NameNode memory. Large numbers of small files therefore occupy a large amount of NameNode memory, which is a limited resource.
 Not suitable for concurrent data writes: A file can be written by only one user at a time, not by multiple users concurrently.
 No support for random file modification: Data can only be appended to a file. It cannot be modified at arbitrary positions.
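The small-file limitation can be made concrete with a little arithmetic. The following is a minimal sketch, not an HDFS API: it counts the file and block entries a NameNode would have to track for the same volume of data stored as one large file or as many small files. The function names are hypothetical, and `bytes_per_object=150` is an assumed rule-of-thumb figure used only for illustration.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # default block size in Hadoop 2.0 or later

def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count the file and block entries the NameNode must track."""
    files = len(file_sizes)
    # Every non-empty file occupies at least one block, however small it is.
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return files + blocks

def estimated_heap_bytes(file_sizes, bytes_per_object=150):
    # bytes_per_object is an assumed rule of thumb, not a measured value.
    return namenode_objects(file_sizes) * bytes_per_object

one_large_file = [1024 ** 3]               # a single 1 GiB file
many_small_files = [1024 ** 2] * 1024      # the same 1 GiB as 1,024 files of 1 MiB

print(namenode_objects(one_large_file))    # 1 file + 8 blocks = 9
print(namenode_objects(many_small_files))  # 1024 files + 1024 blocks = 2048
```

The same amount of data costs the NameNode roughly 200 times more metadata entries when it is stored as small files.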

HDFS Architecture
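[Figure: HDFS architecture. Clients send metadata operations to the NameNode, which holds the metadata (name, replicas, ...; for example /home/foo/data, 3, ...) and sends block operations to the DataNodes. Clients read blocks directly from DataNodes. DataNodes store blocks, are grouped into racks (Rack 1, Rack 2), and replicate blocks to one another.]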

Block
⚫ The default HDFS block size is 64 MB in versions earlier than Hadoop 2.0 and
128 MB in Hadoop 2.0 or later. A file is divided into multiple blocks. A block
is the storage unit.
⚫ The block size is much larger than that of a common file system, minimizing
the addressing overhead.
⚫ A block has the following benefits:
 Large-scale file storage
 Simplified system design
 Applicable to data backup
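How a file maps onto blocks can be sketched in a few lines. This is an illustration only, assuming the 128 MB default; `split_into_blocks` is a hypothetical helper, not part of any Hadoop library.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return (offset, length) pairs for the blocks of a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

mib = 1024 * 1024
for offset, length in split_into_blocks(300 * mib):
    print(offset // mib, length // mib)
# prints three lines: "0 128", "128 128", "256 44"
```

A 300 MB file therefore occupies two full blocks and one 44 MB block. The last block uses only as much space as it needs.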

Client
⚫ Clients are the most common way of using HDFS. HDFS provides a client during deployment.
⚫ It is a library that contains HDFS interfaces that hide most of the complexity in HDFS
implementation.
⚫ It supports common operations such as opening, reading, and writing, and provides a shell-
like command line mode to access data in HDFS.
⚫ HDFS also provides Java APIs that serve as client programming interfaces for applications to
access the file system.
⚫ Strictly speaking, the client is not part of the HDFS architecture. It is a built-in HDFS library
of Hadoop and is external to HDFS.

NameNode Functions
⚫ In HDFS, the NameNode manages the namespace of the distributed file system and
stores two core data structures: FsImage and EditLog.

 FsImage (metadata mirroring): The FsImage file contains a serialized form of all the directory and file inodes in the file system. It maintains the metadata of the file system tree and of all the files and folders in that tree.
 EditLog (operation log): The EditLog file records update operations such as file creation, deletion, and renaming. When a NameNode is started, FsImage is loaded into memory, and then the operations in EditLog are replayed to synchronize the metadata in memory with the actual metadata. The metadata in memory serves read operations from clients.

[Figure: file system tree maintained by the NameNode. The root directory contains subdirectories, subdirectories contain files, and each file consists of blocks.]
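The startup sequence (load FsImage, then replay EditLog) can be sketched as follows. This is a simplified model, assuming the namespace is just a set of paths and each log entry is a tuple; the real on-disk formats are different, and all names here are hypothetical.

```python
def apply_edit(namespace, op):
    """Apply one logged operation to the in-memory namespace (a set of paths)."""
    kind = op[0]
    if kind == "create":
        namespace.add(op[1])
    elif kind == "delete":
        namespace.discard(op[1])
    elif kind == "rename":
        namespace.discard(op[1])
        namespace.add(op[2])
    else:
        raise ValueError(f"unknown operation: {kind}")

def load_namespace(fsimage, editlog):
    """Rebuild the in-memory metadata: load FsImage, then replay EditLog in order."""
    namespace = set(fsimage)      # state as of the last saved image
    for op in editlog:            # changes recorded since that image
        apply_edit(namespace, op)
    return namespace

fsimage = {"/user/inputs/a.txt", "/user/inputs/b.txt"}
editlog = [
    ("create", "/user/inputs/c.txt"),
    ("rename", "/user/inputs/a.txt", "/user/inputs/a_old.txt"),
    ("delete", "/user/inputs/b.txt"),
]
print(sorted(load_namespace(fsimage, editlog)))
# ['/user/inputs/a_old.txt', '/user/inputs/c.txt']
```

The longer the EditLog, the more operations must be replayed at startup, which is why the log is periodically merged into a new FsImage.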

Functions of DataNodes
⚫ A DataNode is a worker node of HDFS. It stores and retrieves data based on
Client requests or NameNode scheduling, and periodically sends the list of
blocks stored on it to NameNodes.
 A DataNode is a place where data is stored in a file system.
 A Client or NameNode may request to write or read blocks to or from a DataNode, and
the DataNode periodically returns information about the blocks stored in it to the
NameNode.

SecondaryNameNode Functions
⚫ SecondaryNameNode is a component of the HDFS architecture. It keeps a copy of the HDFS metadata of the NameNode and reduces the restart time of the NameNode. Its main function is to periodically merge the FsImage file and the EditLog file of the NameNode to prevent the log file from growing too large. Typically, SecondaryNameNode runs on a separate node.
⚫ SecondaryNameNode is not a standby node that takes over when a NameNode is faulty. It plays a different role from the NameNode.
⚫ The merge proceeds as follows (steps from the figure):
1. SecondaryNameNode sends a notification to the NameNode, which records new operations in EditLog.new from then on.
2. SecondaryNameNode obtains EditLog and FsImage from the active NameNode.
3. SecondaryNameNode merges the FsImage and EditLog files into Fsimage.ckpt.
4. SecondaryNameNode uploads the newly generated FsImage file (Fsimage.ckpt) to the active NameNode.
5. The NameNode rolls the FsImage: Fsimage.ckpt replaces the old FsImage, and EditLog.new replaces the old EditLog.
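The periodic merge can be modelled in a few lines. This is a toy sketch under stated assumptions: the namespace is a set of paths, only "create" and "delete" operations exist, and the reading of the final step (the merged image and the new edit log replace the old files) is an interpretation of the figure. The class and method names are hypothetical.

```python
class NameNode:
    def __init__(self, fsimage, editlog):
        self.fsimage = set(fsimage)
        self.editlog = list(editlog)
        self.editlog_new = None   # used only while a merge is in progress

    def log(self, op):
        # While a merge is running, new operations go to EditLog.new.
        target = self.editlog if self.editlog_new is None else self.editlog_new
        target.append(op)

    def start_checkpoint(self):
        """Switch to EditLog.new and hand out the current FsImage and EditLog."""
        self.editlog_new = []
        return set(self.fsimage), list(self.editlog)

    def install_checkpoint(self, fsimage_ckpt):
        """Replace the old FsImage and EditLog with the merged image and EditLog.new."""
        self.fsimage = fsimage_ckpt
        self.editlog = self.editlog_new
        self.editlog_new = None

def merge(fsimage, editlog):
    """The merge step, performed on the SecondaryNameNode."""
    merged = set(fsimage)
    for kind, path in editlog:
        if kind == "create":
            merged.add(path)
        elif kind == "delete":
            merged.discard(path)
    return merged

nn = NameNode({"/a"}, [("create", "/b"), ("delete", "/a")])
image, edits = nn.start_checkpoint()
nn.log(("create", "/c"))                    # arrives while the merge is running
nn.install_checkpoint(merge(image, edits))
print(sorted(nn.fsimage), nn.editlog)
# ['/b'] [('create', '/c')]
```

After the merge, the EditLog holds only the operations that arrived during the merge, so a restart has far less to replay.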

Block Replication
⚫ HDFS stores very large files across machines in a large cluster. It stores each
file as a sequence of blocks; all blocks in a file except the last block are the
same size.
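⚫ By default, each block has three replicas. The default HDFS placement policy is to store the first replica on a node in the local rack (the writer's own node if the client runs on a DataNode), the second replica on a node in a different rack, and the third replica on a different node in the same rack as the second. In this way, blocks are replicated across two racks and survive the failure of a single node or of an entire rack.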
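A placement of three replicas over two racks can be sketched as below. The sketch assumes the placement described in the current Apache Hadoop documentation (first replica on the writer's node, second on a node in a different rack, third on another node in that second rack) and a toy topology. The function, rack names, and node names are hypothetical, and the real policy also weighs load and free space.

```python
import random

def place_replicas(writer, topology, rng=random):
    """Choose three DataNodes for a block.

    topology maps a rack name to the list of DataNodes in that rack;
    writer is the DataNode on which the client runs.
    """
    local_rack = next(r for r, nodes in topology.items() if writer in nodes)
    # Only racks with at least two nodes can hold the second and third replica.
    remote_racks = [r for r, nodes in topology.items()
                    if r != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two DataNodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]

topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
print(place_replicas("dn1", topology))   # for example ['dn1', 'dn5', 'dn4']
```

Whichever nodes are drawn, the block ends up on three distinct nodes in exactly two racks.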

Rack Awareness
⚫ Rack awareness: In an HDFS cluster, two nodes on different racks communicate with each other through switches. The NameNode knows which rack each DataNode belongs to and uses this information to place the replicas of a block on DataNodes in different racks. This is rack awareness.

⚫ Advantages: This prevents data loss when a rack fails and allows the bandwidth of multiple
racks to be fully utilized when data is read. In this way, replicas can be evenly distributed in
the cluster, implementing load balancing.

⚫ Disadvantage: A write operation of this policy needs to transmit blocks to multiple racks,
which increases the write cost.

Cluster Balancing Policy
⚫ After Hadoop starts a balancer task, the cluster automatically reads the disk
usage of each node and replicates data from the node whose space usage is
far greater than the average value to the node whose space usage is lower
than the average value based on the configured host space usage difference.
After the replication is complete, the original node data is deleted. In this
way, the cluster load is balanced.
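The selection step of a balancer run can be sketched as follows. This is an illustration only: it picks source nodes whose usage exceeds the cluster average by more than a configured difference and target nodes whose usage is below the average. The function name is hypothetical, and the threshold of 10 percentage points is an assumed example value.

```python
def plan_balancing(usage, threshold=10.0):
    """Classify nodes by disk usage (in percent) relative to the cluster average.

    Returns (sources, targets): nodes more than `threshold` points above the
    average, and nodes below the average.
    """
    average = sum(usage.values()) / len(usage)
    sources = sorted(n for n, u in usage.items() if u > average + threshold)
    targets = sorted(n for n, u in usage.items() if u < average)
    return sources, targets

usage = {"dn1": 90.0, "dn2": 55.0, "dn3": 50.0, "dn4": 25.0}
print(plan_balancing(usage))   # (['dn1'], ['dn3', 'dn4'])
```

Here the average usage is 55%, so only dn1 is far enough above it to give up blocks, and dn3 and dn4 can receive them.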

Data Integrity
⚫ HDFS uses checksums to verify data integrity. When an HDFS file is created, the client calculates a checksum for each block of the file and stores the checksums in a separate hidden file in the same HDFS namespace. When the file is read, the client checks whether the data obtained from a DataNode matches the checksum in the checksum file. If they do not match, the client can obtain a replica of the block from another DataNode to ensure that the obtained data is intact.
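The verify-then-fall-back behaviour can be sketched as follows. This is a minimal illustration, not the HDFS client: `zlib.crc32` stands in for whatever checksum algorithm the cluster is configured with, and the replica list and node names are hypothetical.

```python
import zlib

def checksum(data: bytes) -> int:
    # CRC-32 from zlib is a stand-in; the algorithm used by a given cluster may differ.
    return zlib.crc32(data)

def read_verified(replicas, expected):
    """Return the first replica whose checksum matches, trying DataNodes in order."""
    for datanode, data in replicas:
        if checksum(data) == expected:
            return datanode, data
    raise IOError("no intact replica found")

block = b"student records"
expected = checksum(block)          # computed by the client when the file is written
replicas = [
    ("dn1", b"student recorts"),    # corrupted copy
    ("dn2", block),                 # intact copy
]
print(read_verified(replicas, expected))   # ('dn2', b'student records')
```

The corrupted copy on dn1 fails the comparison, so the data is taken from the replica on dn2 instead.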

Snapshot Principle
⚫ A snapshot is a copy of specified files or directories in HDFS at a certain point in time. In other words, a snapshot is an image of a file or a directory at a specific time.
⚫ An HDFS snapshot creates an index of the file system and allocates new space to store files that are modified afterwards. Once a snapshot is created, the files and the directory structure as of that point in time can be restored from the snapshot, regardless of later changes. Snapshots are read-only and can be used to restore important data and to recover from misoperations.

HDFS Data Write Process

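[Figure: HDFS data write process, involving the HDFS client, the distributed file system, and the FSData output stream on the client node, the NameNode, and a pipeline of three DataNodes.]

1. The HDFS client sends a request to the distributed file system to create a file.
2. The distributed file system asks the NameNode to create the file metadata.
3. The HDFS client writes data to the FSData output stream.
4. The FSData output stream writes data packets to the first DataNode, and each DataNode forwards the packets to the next DataNode.
5. Acknowledgment packets are returned from DataNode to DataNode and then to the FSData output stream.
6. The HDFS client closes the file.
7. The distributed file system notifies the NameNode that the write operation is complete.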
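The packet and acknowledgment flow through the DataNodes can be sketched as follows. This is a toy model, assuming packets are forwarded from one DataNode to the next and acknowledgments travel back in the reverse order. The function names, node names, and the 4-byte packet size are hypothetical.

```python
def write_packet(pipeline, packet, stores):
    """Forward one packet down the DataNode pipeline and return the acknowledgments."""
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    stores.setdefault(head, []).append(packet)       # store locally, then forward
    downstream_acks = write_packet(rest, packet, stores)
    return downstream_acks + [head]                  # acks travel back upstream

def write_file(data, pipeline, packet_size=4):
    """Split data into packets and push each one through the pipeline."""
    stores = {}
    for i in range(0, len(data), packet_size):
        packet = data[i:i + packet_size]
        acks = write_packet(pipeline, packet, stores)
        # A packet counts as written only when every DataNode has acknowledged it.
        assert acks == list(reversed(pipeline))
    return stores

stores = write_file(b"hello hdfs", ["dn1", "dn2", "dn3"])
print(b"".join(stores["dn3"]))   # b'hello hdfs'
```

After the write, each of the three DataNodes holds the same sequence of packets.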

HDFS Data Read Process

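[Figure: HDFS data read process, involving the HDFS client, the distributed file system, and the FSData input stream on the client node, the NameNode, and three DataNodes.]

1. The HDFS client opens the file through the distributed file system.
2. The distributed file system obtains the block information from the NameNode.
3. The HDFS client issues a read request on the FSData input stream.
4. The FSData input stream reads data from the DataNode that holds the first block.
5. The FSData input stream reads data from the DataNodes that hold the following blocks.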
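The read path can be sketched in the same style. This is a simplified model, assuming the NameNode is a dictionary from path to block locations and each DataNode is a dictionary from block ID to bytes. The block IDs, node names, and the fallback to another replica when a DataNode is unavailable are illustrative assumptions.

```python
def read_file(path, namenode, datanodes):
    """Read a file block by block, following the steps of the read process."""
    # Ask the NameNode which blocks make up the file
    # and which DataNodes hold a replica of each block.
    block_locations = namenode[path]
    data = b""
    for block_id, holders in block_locations:
        # Read each block in turn from a DataNode that has it.
        for dn in holders:
            if block_id in datanodes.get(dn, {}):
                data += datanodes[dn][block_id]
                break
        else:
            raise IOError(f"block {block_id} is unavailable")
    return data

namenode = {"/user/inputs/student.txt": [("blk_1", ["dn1", "dn2"]),
                                          ("blk_2", ["dn3", "dn2"])]}
datanodes = {
    "dn1": {"blk_1": b"id,name\n"},
    "dn2": {"blk_1": b"id,name\n", "blk_2": b"1,Li\n"},
    # dn3 is down, so blk_2 is served by dn2
}
print(read_file("/user/inputs/student.txt", namenode, datanodes))
# b'id,name\n1,Li\n'
```

The NameNode supplies only the block locations. The file data itself flows directly between the client and the DataNodes.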

Common HDFS Commands (1)

Command Format                                              Command Function

hdfs dfs -cat <hdfs file>                                   Views the content of a specified file in HDFS.
hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...   Modifies the permission on a file.
hdfs dfs -count <hdfs path>                                 Collects statistics on the number of directories, files, and total file bytes in a specified directory in HDFS.
hdfs dfs -ls <hdfs path>                                    Lists directories and files in a specified directory in HDFS.
hdfs dfs -mkdir <hdfs path>                                 Creates a subdirectory in a specified directory in HDFS.
hdfs dfs -get <hdfs file> <local file or dir>               Downloads a specified file from HDFS to a local file or directory.
Common HDFS Commands (2)

Command Format                            Command Function

hdfs dfs -put <local file> <hdfs file>    Uploads a file to HDFS.
hdfs dfs -rm <hdfs file>                  Deletes a file from HDFS.
hdfs dfs -rm -r <hdfs dir>                Deletes a specified directory and the directories and files in it from HDFS.
hdfs dfs -cp <path/file> <path/file>      Copies a file in HDFS.
hdfs dfs -mv <hdfs file> <hdfs file>      Moves a file in HDFS, which is equivalent to cutting or renaming a file.
hdfs dfs -tail <hdfs file>                Displays the content at the end of a file.
hdfs dfs -text <hdfs file>                Displays the file content in text format.
Uploading Data (Write Operation)
⚫ Uploading a local file to HDFS:
[root@localhost ~]# hdfs dfs -put /root/student.txt /user/inputs

⚫ This command uploads the student.txt file in the local /root directory to the /user/inputs directory in HDFS.

Downloading Data (Read Operation)
⚫ Downloading a file from HDFS to the local host:
[root@localhost ~]# hdfs dfs -get /user/inputs/student.txt /root/inputs

⚫ This command downloads the student.txt file from HDFS to the /root/inputs directory on the local host.
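When uploads and downloads need to be scripted, the same commands can be assembled programmatically. The following is a minimal sketch: `hdfs_dfs` and `run` are hypothetical helpers, the paths are the ones from the upload and download examples, and the commands are executed only if an `hdfs` client is found on the local PATH.

```python
import shutil
import subprocess

def hdfs_dfs(*args):
    """Build the argument list for an `hdfs dfs` command."""
    return ["hdfs", "dfs", *args]

def run(argv):
    # Only attempt to run when the hdfs client is installed on this host.
    if shutil.which(argv[0]) is None:
        print("hdfs not found; would run:", " ".join(argv))
        return None
    return subprocess.run(argv, check=True)

upload = hdfs_dfs("-put", "/root/student.txt", "/user/inputs")
download = hdfs_dfs("-get", "/user/inputs/student.txt", "/root/inputs")
print(" ".join(upload))     # hdfs dfs -put /root/student.txt /user/inputs
print(" ".join(download))   # hdfs dfs -get /user/inputs/student.txt /root/inputs
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. Calling `run(upload)` would execute the upload on a host where the client is installed.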

Section Summary
⚫ This chapter described the HDFS concept, advantages and
disadvantages, architecture, read and write processes, basic
commands, and use cases.

Q&A
1. What are the common commands for adding, deleting,
modifying, and querying HDFS data?

2. Describe the HDFS file read process.

3. Describe the HDFS file write process.

Recommendations
⚫ Huawei Cloud websites
 Official website: https://www.huaweicloud.com/intl/en-us/
 Developer Institute: https://edu.huaweicloud.com/intl/en-us/

Thank You.
Copyright© 2023 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including,
without limitation, statements regarding the future financial and operating results,
future product portfolio, new technology, etc. There are a number of factors that
could cause actual results and developments to differ materially from those
expressed or implied in the predictive statements. Therefore, such information is
provided for reference purpose only and constitutes neither an offer nor an
acceptance. Huawei may change the information at any time without notice.
