
CSC 369 Distributed Computing Alexander Dekhtyar



Hadoop File System

HDFS Basics

Hadoop File System, or HDFS, is a distributed file system that resides on top
of the file systems of the compute nodes forming a Hadoop cluster.
HDFS has the following properties:

1. Distributed file storage. All data stored on HDFS is accessible from all
Hadoop nodes.

2. Optimized for very large file storage. HDFS stores files in blocks of 64
MB. This means that a single disk read operation can bring 64 MB of
data directly to a compute node.

3. Support for write-once, read-many access pattern. HDFS is largely
designed for the following pattern of use:

• A data file is uploaded to HDFS once.


• A large number of analytical tasks (individual MapReduce jobs)
are performed using this file as input data.

4. Support for commodity hardware. HDFS assumes that it runs on
commodity hardware, which has a high probability of failure. To
compensate, HDFS supports a variety of data replication and recovery
protocols that prevent data loss in case of hard disk failures.

5. Standard POSIX interface. HDFS supports the standard POSIX file
system interface. This essentially means that, with minor exceptions
(HDFS needs commands for transferring files to and from other file
systems, which are not present in regular file systems), standard
UNIX/Linux file system commands are supported on HDFS.
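The block-size arithmetic behind item 2 can be sketched with a little shell arithmetic. (The 64 MB figure used here follows this handout; newer Hadoop releases default to 128 MB. The file size is a made-up example.)

```shell
# How many 64 MB blocks does a 1 GB file occupy on HDFS?
# HDFS splits a file into fixed-size blocks and scatters them
# (together with their replicas) across the cluster's data nodes.
BLOCK_MB=64
FILE_MB=1024

# Ceiling division: a partially filled last block still
# counts as its own block.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "$BLOCKS"   # 16
```

Reading a whole block per disk operation is what makes large sequential scans cheap, and is also why many tiny files (item 1 of the next list) are a poor fit.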

HDFS is not very good at dealing with:

1. Large numbers of small files. Each file, however small, occupies its
own block (64 MB by default) and its own metadata entry, so storing
many small files is wasteful.

2. Multiple active writes to HDFS files. Data files on HDFS are assumed
to be static. HDFS is not very good at supporting active modification
of data files.

HDFS organization. By default, HDFS is organized in a way similar to
how a Linux file system is organized. Each Hadoop user receives their own
directory:

hdfs:///user/<loginId>

or, simply

/user/<loginId>

This is the default location for all file transfers and file operations on HDFS
for user <loginId>. For example, the

$ hdfs dfs -ls

command, which I run as user dekhtyar, is equivalent to running

$ hdfs dfs -ls /user/dekhtyar

or

$ hdfs dfs -ls hdfs:///user/dekhtyar

Permissions. HDFS supports the standard user-group-others POSIX file
access model. By default, the group is set to supergroup, and all Hadoop
users are usually members of it.
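Since the model is plain POSIX user-group-others, the familiar octal mode notation applies unchanged. The sketch below exercises the same mode bits on a local Linux file purely as an analogy; on the cluster the corresponding command would be hdfs dfs -chmod (stat -c is GNU coreutils syntax):

```shell
# Restrict a scratch file to: owner read/write, group read,
# others nothing -- octal mode 640.
f=$(mktemp)
chmod 640 "$f"

# stat -c %a prints the file's octal permission bits.
mode=$(stat -c %a "$f")
echo "$mode"    # 640
rm -f "$f"
```

On HDFS the same change would read hdfs dfs -chmod 640 <hdfsFile>, with -chgrp and -chown covering group and owner changes.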

Working with HDFS

Hadoop provides three command-line methods for accessing HDFS:

• hadoop fs command

• hadoop dfs command

• hdfs dfs command

hadoop dfs and hdfs dfs commands. The hadoop dfs and hdfs dfs
commands provide command-line access to HDFS and the files stored on it.
The hadoop dfs command is deprecated in newer versions of Hadoop; you
must use the hdfs dfs command now.
hadoop fs command. The hadoop fs command provides an interface to
any file system reachable from the node on which the command is run.
Specifically, in addition to HDFS, hadoop fs can access files on the local
file system.
Below, we use hadoop fs to represent the syntax of HDFS commands.
The syntax of the other two commands is similar.

General file system access command format. The general format of
an HDFS access command is:

$ hadoop fs -<command> [<arguments>]

Here, <command> is the file system access command, and <arguments> are
the optional arguments to each command.

File system access commands.

HDFS supports the following file system access commands. (This is not a
full list, but rather a list of the most important commands.)
Command                   Meaning
-help                     print help message, instructions on use of commands
-usage                    display information about the usage of a specific command
-ls                       display the list of files/directories
-put, -copyFromLocal      copy a file from the local file system to HDFS
-get, -copyToLocal        copy a file from HDFS to the local file system
-moveFromLocal            move a file from the local file system to HDFS
-moveToLocal              move a file from HDFS to the local file system
-mkdir                    create a directory
-rmdir                    remove a directory
-cp                       copy files
-mv                       move files
-rm                       delete (remove) files
-touchz                   create a zero-length file
-chmod                    change file access permissions
-chgrp                    change file group
-chown                    change file owner
-cat                      display the contents of file(s)
-text                     output the contents of a file as text
-tail                     display the last 1 KB of a file
-du                       show file system usage statistics
-df                       show free space on the file system

Viewing directory structure and files. To see what is in a specific
HDFS directory, use the -ls command.

$ hadoop fs -ls <hdfsPath>

For example,

$ hadoop fs -ls test/

shows the list of files and directories in the test directory located in the
home directory of the current user.
A sample output may be:
dekhtyar@cslvm31:~/369/lab6$ hadoop fs -ls test/
Found 5 items
-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data
drwxr-xr-x - dekhtyar supergroup 0 2016-02-05 12:03 test/grep
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:33 test/out01
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:09 test/output
-rw-r--r-- 2 dekhtyar supergroup 3302 2016-02-04 20:00 test/wc.jar

HDFS supports the -ls -R flag, which recursively lists all subdirectories.

dekhtyar@cslvm31:~/369/lab6$ hadoop fs -ls -R test/


-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data
drwxr-xr-x - dekhtyar supergroup 0 2016-02-05 12:03 test/grep
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-05 12:03 test/grep/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 7 2016-02-05 12:03 test/grep/part-r-00000
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:33 test/out01
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-09 17:33 test/out01/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 114 2016-02-09 17:33 test/out01/part-r-00000
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:09 test/output
-rw-r--r-- 2 dekhtyar supergroup 0 2016-02-09 17:09 test/output/_SUCCESS
-rw-r--r-- 2 dekhtyar supergroup 94 2016-02-09 17:09 test/output/part-r-00000
-rw-r--r-- 2 dekhtyar supergroup 3302 2016-02-04 20:00 test/wc.jar

To view the contents of a file, you can issue one of the following
commands:

$ hadoop fs -cat <hdfsFile>

or

$ hadoop fs -text <hdfsFile>

To view only the end of a large file, use

$ hadoop fs -tail <hdfsFile>

Copying files. To put a file (or files) onto HDFS from a local system, use
-put:

$ hadoop fs -put <localSource> <hdfsDestination>

Here, <localSource> is the file access path/pattern (which can include
wildcards) on the local system, and <hdfsDestination> is a destination on
HDFS (it must be a directory if <localSource> matches multiple files) to
which the file(s) will be uploaded.
For example,

$ hadoop fs -put data .

copies the file data from the current directory of the local file system to
the HDFS home directory of the current user.
To copy a file (or files) from HDFS to a local file system use -get:

$ hadoop fs -get <hdfsSource> <localDestination>

Here, <hdfsSource> is the file access path/pattern (which can include
wildcards) on HDFS, and <localDestination> is a destination on the local
file system (it must be a directory if <hdfsSource> matches multiple files)
to which the file(s) will be downloaded.
For example,

$ hadoop fs -get test/output/part-r-00000 .

copies the file part-r-00000 residing in the /user/<loginId>/test/output
directory into the current directory on the local file system.
Using -moveFromLocal instead of -put and -moveToLocal instead of -get
erases the source file/files after they have been successfully transferred to
the new destination.
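The move variants thus behave like their copy counterparts followed by removal of the source. A local-filesystem analogy of -moveFromLocal (cp and rm here stand in for the HDFS upload and the local cleanup; no cluster is needed to run this sketch):

```shell
# Simulate -moveFromLocal: copy the file, then delete the source.
src=$(mktemp)
echo "payload" > "$src"
dstdir=$(mktemp -d)

cp "$src" "$dstdir/"     # the "-put" step of the transfer
rm "$src"                # the source cleanup that -moveFromLocal adds

# The source is gone; the copy survives at the destination.
[ ! -e "$src" ] && echo "source removed"
cat "$dstdir/$(basename "$src")"   # payload
```

On the cluster, the single command hdfs dfs -moveFromLocal <localSource> <hdfsDestination> performs both steps atomically from the user's point of view.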
hadoop fs -cp can be used to copy files within HDFS, as well as copy
files between different file systems.

$ hadoop fs -cp foo bar

copies the file foo on HDFS (/user/<loginId>/foo) to a new file in the same
directory named bar.

$ hadoop fs -cp file:///home/<loginId>/foo hdfs:///user/<loginId>/

copies the file foo from the home directory of user <loginId> on the local
file system to that user's HDFS home directory. The inverse can be done
using the following command:

$ hadoop fs -cp hdfs:///user/<loginId>/foo file:///home/<loginId>/

hadoop fs -mv works the same way, except that it removes the source file
after a successful transfer.

Directory operations. Simple directory management is the same as in
Linux.
To create a new HDFS directory:

$ hadoop fs -mkdir <hdfsDirectory>

To remove an empty HDFS directory:

$ hadoop fs -rmdir <hdfsDirectory>
