Hadoop File System: CSC 369 Distributed Computing Alexander Dekhtyar
HDFS Basics
Hadoop File System, or HDFS, is a distributed file system that resides on top
of the file systems of the compute nodes forming a Hadoop cluster.
HDFS has the following properties:
1. Distributed file storage. All data stored on HDFS is accessible from all
Hadoop nodes.
2. Optimized for very large file storage. HDFS stores files in blocks of 64
MB. This means that a single disk read operation can bring 64 MB of
data directly to a compute node.
HDFS is poorly suited for the following:
1. Large numbers of small files. Each file, however small, is stored in its
own 64 MB block.
2. Multiple active writes to HDFS files. Data files on HDFS are assumed
to be static. HDFS is not very good at supporting active modification
of data files.
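The block arithmetic behind these points can be sketched in a few lines of shell. This is only an illustration, assuming the 64 MB block size stated above; the helper name blocks_needed is hypothetical, and the result counts the blocks HDFS has to keep track of, not the disk space consumed.

```shell
# Hypothetical helper: number of HDFS blocks needed for a file of the
# given size in bytes, assuming the 64 MB block size from the notes.
BLOCK_SIZE=$((64 * 1024 * 1024))

blocks_needed() {
    size=$1
    # Integer ceiling of size / BLOCK_SIZE.
    echo $(( (size + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# One 1 GB file is split into 16 blocks.
blocks_needed $((1024 * 1024 * 1024))

# 1000 files of 1 MB each hold roughly the same amount of data,
# but every file needs a block of its own: 1000 blocks in total.
total=$(( 1000 * $(blocks_needed $((1024 * 1024))) ))
echo "$total"
```

The second case tracks over sixty times as many blocks for about the same volume of data, which is why many small files are a poor fit.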
Each user has a home directory on HDFS:
hdfs:///user/<loginId>
or, simply
/user/<loginId>
This is the default location for all HDFS file transfers and file operations
performed by user <loginId>. For example
or
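HDFS is accessed from the command line using one of three commands:
• hadoop dfs command
• hdfs dfs command
• hadoop fs command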
hadoop dfs and hdfs dfs commands. The hadoop dfs and hdfs dfs
commands provide command-line access to HDFS and the files stored on it.
The hadoop dfs command is deprecated in newer versions of Hadoop; use
the hdfs dfs command instead.
hadoop fs command. The hadoop fs command provides an interface to
any file system reachable from the node on which the command is run.
Specifically, in addition to HDFS, hadoop fs can access files on the local
file system.
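As a sketch of this point, the same -ls command can be aimed at either file system by giving a full URI. The file:// and hdfs:// schemes are the usual way Hadoop paths name a file system; the wrapper name list_dir and the directories in the usage lines are illustrative assumptions.

```shell
# Hypothetical wrapper: list a directory on the local file system or on
# HDFS by choosing the URI scheme. Assumes the hadoop command is on the PATH.
list_dir() {
    where=$1
    path=$2     # absolute path, e.g. /tmp or /user/<loginId>
    case "$where" in
        local) hadoop fs -ls "file://$path" ;;
        hdfs)  hadoop fs -ls "hdfs://$path" ;;
        *)     echo "usage: list_dir local|hdfs /absolute/path" >&2
               return 1 ;;
    esac
}

# Intended use (not run here):
#   list_dir local /tmp
#   list_dir hdfs  /user/<loginId>
```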
Below, we use hadoop fs to represent the syntax of HDFS commands.
The syntax of the other two commands is similar. The general form is
hadoop fs <command> <arguments>
Here, <command> is the file system access command, and <arguments> are
the optional arguments to each command.
HDFS supports the following file system access commands. (This is not a
full list, but rather a list of the most important commands.)
Command                 Meaning
-help                   help message, instructions on use of commands
-usage                  display information about the usage of a specific command
-ls                     display the list of files/directories
-put, -copyFromLocal    copy file from local file system to HDFS
-get, -copyToLocal      copy file from HDFS to local file system
-moveFromLocal          move file from local file system to HDFS
-moveToLocal            move file from HDFS to local file system
-mkdir                  create a directory
-rmdir                  remove a directory
-cp                     copy files
-mv                     move files
-rm                     delete (remove) files
-touchz                 create a zero-length file
-chmod                  change file access permissions
-chgrp                  change file group
-chown                  change file owner
-cat                    display contents of file(s)
-text                   output the contents of a file as text
-tail                   display the last 1 KB of the file
-du                     show the space used by files and directories
-df                     show free space on the file system
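A short walk-through of several commands from the table might look as follows. This is a sketch under assumptions: the function name hdfs_tour and the directory name scratch are hypothetical, and paths without a leading slash are taken relative to the user's HDFS home directory.

```shell
# Hypothetical walk-through of commands from the table above.
# Assumes the hadoop command is on the PATH and that no directory
# named "scratch" exists yet in the user's HDFS home directory.
hdfs_tour() {
    hadoop fs -mkdir scratch           # create a directory
    hadoop fs -touchz scratch/empty    # create a zero-length file in it
    hadoop fs -ls scratch              # list the directory
    hadoop fs -du scratch              # show the space used by its contents
    hadoop fs -rm scratch/empty        # delete the file
    hadoop fs -rmdir scratch           # remove the now-empty directory
}

# Intended use (not run here):
#   hdfs_tour
```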
For example,
hadoop fs -ls test/
shows the list of files and directories in the test directory located in the
home directory of the current user.
A sample output may be:
dekhtyar@cslvm31:~/369/lab6$ hadoop fs -ls test/
Found 5 items
-rw-r--r-- 2 dekhtyar supergroup 83 2016-02-04 14:59 test/data
drwxr-xr-x - dekhtyar supergroup 0 2016-02-05 12:03 test/grep
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:33 test/out01
drwxr-xr-x - dekhtyar supergroup 0 2016-02-09 17:09 test/output
-rw-r--r-- 2 dekhtyar supergroup 3302 2016-02-04 20:00 test/wc.jar
The -ls command supports the -R flag, which lists all subdirectories recursively.
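The listing can be picked apart with standard text tools. The sketch below assumes the eight-column layout of the sample output (permissions, replication factor, owner, group, size in bytes, date, time, path) and paths without spaces; the filter names only_dirs and only_files are hypothetical.

```shell
# Hypothetical filters for the output of "hadoop fs -ls".
# Lines with fewer than eight fields, such as "Found 5 items", are skipped.

# Print the paths of directories (permission string starts with "d").
only_dirs() {
    awk 'NF == 8 && $1 ~ /^d/ { print $8 }'
}

# Print the path and size in bytes of regular files
# (permission string starts with "-").
only_files() {
    awk 'NF == 8 && $1 ~ /^-/ { print $8, $5 }'
}

# Intended use (not run here):
#   hadoop fs -ls test/ | only_dirs
#   hadoop fs -ls -R test/ | only_files
```

Applied to the sample listing above, only_dirs prints test/grep, test/out01 and test/output, while only_files prints test/data with size 83 and test/wc.jar with size 3302.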
To view the contents of a file, you can issue one of the following
commands:
or
Copying files. To put a file (or files) onto HDFS from a local system, use
-put:
copies the file data from the current directory of the local file system to
the home directory of the current user on HDFS.
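A minimal sketch of such a transfer follows. The local file name data comes from the description above; the wrapper name put_to_hdfs and the explicit destination argument are assumptions, with "." standing for the current user's HDFS home directory.

```shell
# Hypothetical wrapper around -put: copy a local file into an HDFS directory.
# With no second argument the destination is ".", which HDFS resolves
# relative to the current user's home directory (/user/<loginId>).
put_to_hdfs() {
    src=$1
    dest=${2:-.}
    hadoop fs -put "$src" "$dest"
}

# Intended use (not run here):
#   put_to_hdfs data          # data -> /user/<loginId>/data
#   put_to_hdfs data test/    # data -> /user/<loginId>/test/data,
#                             # assuming the directory test already exists
```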
To copy a file (or files) from HDFS to a local file system, use -get:
copies file foo from the home directory of the user <loginId> on the local file
system to HDFS. The inverse can be done using the following command:
hadoop fs -mv works the same way, except that it removes the source file
after a successful transfer.
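Transfers in both directions can be sketched as follows. The file name foo comes from the text above; the helper names, the use of "." as the local destination, and the use of -cp with explicit file:// and hdfs:// URIs are assumptions made for illustration, not a reconstruction of the original commands.

```shell
# Hypothetical helpers for copying a file between HDFS and the local disk.
# Both assume the hadoop command is on the PATH.

# HDFS -> local, using -get. A relative source path is resolved against
# the HDFS home directory; the default destination is the current
# local directory.
fetch_from_hdfs() {
    hadoop fs -get "$1" "${2:-.}"
}

# local -> HDFS, using -cp with explicit URI schemes.
# Both arguments are absolute paths.
copy_local_to_hdfs() {
    hadoop fs -cp "file://$1" "hdfs://$2"
}

# Intended use (not run here):
#   fetch_from_hdfs foo
#   copy_local_to_hdfs /home/<loginId>/foo /user/<loginId>/
```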