HDFS Commands

The document provides detailed definitions and purposes for various HDFS commands, categorized into file and directory operations, file transfer, manipulation, content viewing, information gathering, permissions, advanced operations, and administrative tasks. Each command includes its syntax, options, and intended use cases, facilitating effective management of the Hadoop Distributed File System. The summary also highlights the importance of these commands in data organization, integrity, and security within HDFS.

Uploaded by

telebe3450
Copyright
© All Rights Reserved
HDFS Commands - Detailed Definitions

1. File and Directory Listing Commands

ls - List Directory Contents

Definition: Lists files and directories in the specified HDFS path, similar to the Unix ls command.

hdfs dfs -ls [options] <path>

Purpose:

 View contents of HDFS directories

 Check file permissions, ownership, and timestamps

 Verify file existence and properties

Options:

 -d: List directories as plain entries instead of their contents

 -R: Recursive listing of subdirectories

 -t: Sort output by modification time (most recent first)

 -h: Human-readable file sizes

mkdir - Make Directory

Definition: Creates one or more directories in the HDFS file system.

hdfs dfs -mkdir [options] <path1> [path2] ...

Purpose:

 Create directory structure for data organization

 Establish folder hierarchy for different data types

 Prepare storage locations before data ingestion

Options:
 -p: Create parent directories if they don't exist

rmdir - Remove Directory

Definition: Removes empty directories from HDFS.

hdfs dfs -rmdir <path1> [path2] ...

Purpose:

 Clean up empty directory structures

 Remove unused organizational folders

 Maintain clean file system hierarchy

Note: Only works on empty directories. Use rm -r for non-empty directories.
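As a rough illustration of how ls, mkdir, and rmdir fit together, the sketch below builds a small directory layout, lists it, and removes an empty leftover. The /data/sales paths, the prepare_landing_area helper, and the HDFS_BIN variable are hypothetical. HDFS_BIN defaults to "echo hdfs", so the commands are only printed unless it is set to hdfs on a machine with a configured Hadoop client.

```shell
# Dry run by default: each command is printed instead of executed.
# Set HDFS_BIN=hdfs on a configured Hadoop client to run it for real.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: build a small landing area under the given base path.
prepare_landing_area() {
  base="$1"
  # -p creates missing parents, so one call builds the whole hierarchy.
  $HDFS_BIN dfs -mkdir -p "$base/raw/2024" "$base/staged"
  # Recursive listing with human-readable sizes.
  $HDFS_BIN dfs -ls -R -h "$base"
  # rmdir succeeds only if the directory is empty.
  $HDFS_BIN dfs -rmdir "$base/staged"
}

prepare_landing_area /data/sales
```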

2. File Transfer Commands

put - Upload Files to HDFS

Definition: Copies files from the local file system to HDFS.

hdfs dfs -put [options] <local_src> ... <hdfs_dest>

Purpose:

 Upload data files from local system to Hadoop cluster

 Initial data ingestion into HDFS

 Transfer processed results back to HDFS

Options:

 -f: Force overwrite if destination exists

 -p: Preserve file attributes (timestamps, ownership, permissions)


get - Download Files from HDFS

Definition: Copies files from HDFS to the local file system.

hdfs dfs -get [options] <hdfs_src> ... <local_dest>

Purpose:

 Download processed results from Hadoop cluster

 Extract data for local analysis

 Create local backups of HDFS data

Options:

 -ignoreCrc: Skip CRC checksum verification

 -crc: Also copy CRC files

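copyFromLocal - Copy from Local

Definition: Copies files from the local file system to HDFS. Works like put, except that the source must be a local file reference. As with put, the copy fails if the destination already exists unless -f is given.

hdfs dfs -copyFromLocal [options] <local_src> ... <hdfs_dest>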

Purpose:

 Safe file upload that prevents accidental overwrites

 Initial data loading with protection against duplicates

 Batch uploads where file uniqueness is important

copyToLocal - Copy to Local

Definition: Copies files from HDFS to the local file system. Works like get, except that the destination must be a local file reference.

hdfs dfs -copyToLocal <hdfs_src> ... <local_dest>


Purpose:

 Extract specific files for local processing

 Create local copies while keeping HDFS originals

 Download configuration or result files

moveFromLocal - Move from Local

Definition: Moves files from the local file system to HDFS (the local copy is deleted after it has been copied).

hdfs dfs -moveFromLocal <local_src> ... <hdfs_dest>

Purpose:

 Transfer files while saving local disk space

 One-time data migration to HDFS

 Move temporary files after processing
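The transfer commands above can be combined into a simple round trip. This is only a sketch: the round_trip helper, the file name events.csv, and the /data/raw directory are made up for illustration, and HDFS_BIN defaults to "echo hdfs" so that nothing is executed unless it is set to hdfs.

```shell
# Dry run by default: each command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical round trip: upload a local file, then fetch it back under a new name.
round_trip() {
  local_file="$1"
  hdfs_dir="$2"
  # -f overwrites an existing destination; without it, put fails if the file exists.
  $HDFS_BIN dfs -put -f "$local_file" "$hdfs_dir/"
  # Download a copy; the HDFS original stays in place.
  $HDFS_BIN dfs -get "$hdfs_dir/$(basename "$local_file")" "$local_file.copy"
}

round_trip ./events.csv /data/raw
```

Replacing -put with -moveFromLocal in the first step would delete the local file after the upload.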

3. File Manipulation Commands

cp - Copy Files within HDFS

Definition: Copies files or directories from one HDFS location to another.

hdfs dfs -cp [options] <src> ... <dest>

Purpose:

 Create backups within HDFS

 Duplicate data for different processing pipelines


 Reorganize data across different directories

Options:

 -p: Preserve file attributes

mv - Move/Rename Files

Definition: Moves or renames files and directories within HDFS.

hdfs dfs -mv <src> ... <dest>

Purpose:

 Reorganize data structure

 Rename files with better naming conventions

 Move data between different organizational hierarchies

rm - Remove Files and Directories

Definition: Deletes files and directories from HDFS.

hdfs dfs -rm [options] <path> ...

Purpose:

 Clean up unnecessary files

 Remove temporary processing files

 Delete outdated or corrupted data

Options:

 -r or -R: Recursive deletion for directories

 -f: Do not print an error or return a failure exit code if the path does not exist

 -skipTrash: Permanent deletion bypassing trash (when trash is enabled)


4. File Content Viewing Commands

cat - Concatenate and Display Files

Definition: Displays the entire content of one or more files to stdout.

hdfs dfs -cat <path> ...

Purpose:

 View small file contents

 Combine multiple files for display

 Quick content verification

Note: Not suitable for large files, because it prints the entire content.

head - Display Beginning of File

Definition: Shows the first 1KB of a file's content.

hdfs dfs -head <path>

Purpose:

 Preview file structure and format

 Check file headers

 Verify data format without downloading entire file

Note: Unlike Unix head, it returns a fixed number of bytes (1KB), not a number of lines.

tail - Display End of File

Definition: Shows the last 1KB of a file's content.

hdfs dfs -tail [options] <path>


Purpose:

 View latest entries in log files

 Check file endings

 Monitor ongoing data writes

Options:

 -f: Follow file (continuously display new content)

text - Display File as Text

Definition: Displays file content as text, automatically decompressing compressed files.

hdfs dfs -text <path> ...

Purpose:

 View compressed files without manual decompression

 Display various file formats as readable text

 Handle different compression formats automatically

5. File Information Commands

stat - Display File Statistics

Definition: Shows specific statistics about files or directories using format specifiers.

hdfs dfs -stat [format] <path> ...


Purpose:

 Get specific file properties

 Programmatically extract file metadata

 Monitor file characteristics

Format Specifiers:

 %b: File size in bytes

 %o: Block size

 %n: File name

 %r: Replication factor

 %y: Modification time
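A format string can combine several specifiers in one call, which is handy for scripted metadata checks. In the Hadoop FileSystem shell documentation, %b is the file size in bytes and %o is the block size. The describe_file helper and the path below are hypothetical, and HDFS_BIN defaults to a dry run that only prints the command.

```shell
# Dry run by default: the command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: print one line of metadata for a path.
describe_file() {
  # %n name, %b size in bytes, %o block size, %r replication, %y modification time
  $HDFS_BIN dfs -stat '%n %b %o %r %y' "$1"
}

describe_file /data/raw/events.csv
```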

du - Disk Usage

Definition: Shows space consumed by files and directories.

hdfs dfs -du [options] <path> ...

Purpose:

 Monitor storage consumption

 Identify large files or directories

 Plan storage capacity

Options:

 -h: Human-readable format (KB, MB, GB)

 -s: Summary (total size only)

df - Display File System Information

Definition: Shows HDFS file system capacity, used space, and available space.

hdfs dfs -df [options] [path]


Purpose:

 Monitor overall cluster storage

 Check available disk space

 Plan data ingestion based on capacity

Options:

 -h: Human-readable format

count - Count Files, Directories, and Bytes

Definition: Counts directories, files, and content size for specified paths.

hdfs dfs -count [options] <path> ...

Purpose:

 Inventory data organization

 Monitor data growth

 Generate usage reports

Options:

 -h: Human-readable sizes

 -q: Show quota information

 -u: Show quota usage
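For capacity checks, du, count, and df are often run together. The following is a minimal sketch of such a report. The storage_report helper and the /data/sales path are assumptions, and HDFS_BIN defaults to "echo hdfs" so the commands are only printed.

```shell
# Dry run by default: each command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: three views of storage for one directory.
storage_report() {
  # Total size of the directory as a single summary line.
  $HDFS_BIN dfs -du -s -h "$1"
  # Directory count, file count, and content size, plus quota columns.
  $HDFS_BIN dfs -count -q -h "$1"
  # Capacity, used, and available space of the file system holding the path.
  $HDFS_BIN dfs -df -h "$1"
}

storage_report /data/sales
```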

6. File Integrity and Testing Commands

checksum - Calculate File Checksum

Definition: Computes and displays checksums for file integrity verification.

hdfs dfs -checksum <path> ...


Purpose:

 Verify file integrity after transfers

 Detect data corruption

 Compare file versions

test - Test File Properties

Definition: Tests a property of a file or directory and reports the result through the exit code (0 if the test is true).

hdfs dfs -test <flag> <path>

Purpose:

 Script-friendly file existence checking

 Conditional operations based on file properties

 Automated file validation

Flags:

 -e: File exists

 -f: Is a file

 -d: Is a directory

 -z: File is empty

 -s: File is not empty
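Because -test communicates only through its exit code, it slots directly into a shell if statement. The sketch below uploads a file only when the target is absent. The upload_if_absent helper and the paths are hypothetical. HDFS_BIN defaults to "echo hdfs", and since echo always exits with 0, the dry run always takes the "already exists" branch. Set HDFS_BIN=hdfs to get a real check.

```shell
# Dry run by default (echo always succeeds, so the existence test appears true).
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical guard: upload only when the target does not exist yet.
upload_if_absent() {
  src="$1"
  dest="$2"
  # -test prints nothing; exit code 0 means the path exists.
  if $HDFS_BIN dfs -test -e "$dest" >/dev/null; then
    echo "skip: $dest already exists"
  else
    $HDFS_BIN dfs -put "$src" "$dest"
  fi
}

upload_if_absent ./report.csv /data/reports/report.csv
```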

7. Permission and Ownership Commands

chmod - Change File Permissions

Definition: Modifies access permissions for files and directories.

hdfs dfs -chmod [options] <mode> <path> ...


Purpose:

 Control file access security

 Set appropriate read/write permissions

 Implement data governance policies

Options:

 -R: Recursive permission change

Modes: Octal (755) or symbolic (u+x, g-w, o=r)

chown - Change Ownership

Definition: Changes the owner and/or group of files and directories.

hdfs dfs -chown [options] [owner][:group] <path> ...

Purpose:

 Transfer file ownership

 Assign data to appropriate teams

 Implement organizational data structure

Options:

 -R: Recursive ownership change

chgrp - Change Group Ownership

Definition: Changes only the group ownership of files and directories.

hdfs dfs -chgrp [options] <group> <path> ...


Purpose:

 Modify group access without changing owner

 Reorganize team-based access controls

 Implement departmental data sharing

Options:

 -R: Recursive group change
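The three permission commands are typically applied in sequence when handing a directory to a team. This sketch assumes a hypothetical share_with_team helper, user etl, and group analytics. Changing the owner normally requires HDFS superuser rights. HDFS_BIN defaults to a dry run that only prints the commands.

```shell
# Dry run by default: each command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: hand a directory tree to an owner and group.
share_with_team() {
  dir="$1"
  owner="$2"
  group="$3"
  # Set owner and group in one step (chgrp would change the group alone).
  $HDFS_BIN dfs -chown -R "$owner:$group" "$dir"
  # Octal mode: 750 = rwx for owner, r-x for group, nothing for others.
  $HDFS_BIN dfs -chmod -R 750 "$dir"
  # Symbolic mode: additionally allow the group to write.
  $HDFS_BIN dfs -chmod -R g+w "$dir"
}

share_with_team /data/sales etl analytics
```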

8. Advanced File Operations

appendToFile - Append to Existing File

Definition: Appends content from local files to an existing HDFS file.

hdfs dfs -appendToFile <local_src> ... <hdfs_dest>

Purpose:

 Add data to existing files

 Implement incremental data loading

 Append log entries to existing log files

touchz - Create Empty File

Definition: Creates empty files in HDFS (zero-length files).

hdfs dfs -touchz <path> ...

Purpose:

 Create placeholder files

 Mark completion of processes

 Initialize files for later appending


getmerge - Merge and Download

Definition: Merges multiple HDFS files into a single local file.

hdfs dfs -getmerge [options] <src> <local_dest>

Purpose:

 Combine distributed processing results

 Create single output file from multiple parts

 Consolidate data for external systems

Options:

 -nl: Add newline between merged files

 -skip-empty-file: Skip empty files during merge
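A common use of getmerge is to collapse the part files of a job's output directory into one local file. The sketch below assumes a hypothetical output directory /output/wordcount and a merge_job_output helper. HDFS_BIN defaults to "echo hdfs", so the command is only printed.

```shell
# Dry run by default: the command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: merge every file in an HDFS directory into one local file.
merge_job_output() {
  # -nl appends a newline after each file so records from adjacent parts do not run together;
  # -skip-empty-file avoids adding blank lines for empty parts.
  $HDFS_BIN dfs -getmerge -nl -skip-empty-file "$1" "$2"
}

merge_job_output /output/wordcount ./wordcount.txt
```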

9. Replication Management

setrep - Set Replication Factor

Definition: Changes the replication factor of a file. If the path is a directory, the change is applied recursively to every file under it.

hdfs dfs -setrep [options] <rep> <path> ...

Purpose:

 Adjust data redundancy levels

 Optimize storage usage

 Improve data availability

Options:

 -R: Accepted for backward compatibility; has no effect, because directories are always processed recursively

 -w: Wait for replication to complete


10. Administrative Commands

fsck - File System Check

Definition: Checks HDFS file system health and reports issues.

hdfs fsck <path> [options]

Purpose:

 Diagnose file system problems

 Identify corrupted files

 Monitor cluster health

Options:

 -files: Show file information

 -blocks: Show block information

 -locations: Show block locations

 -list-corruptfileblocks: List corrupted files

find - Find Files and Directories

Definition: Searches for files and directories based on various criteria.

hdfs dfs -find <path> <expression>

Purpose:

 Locate files by name patterns

 Match names with or without case sensitivity

 Search directory structures

Expressions:

 -name pattern: Match the base name of a file against a glob pattern

 -iname pattern: Same as -name, but case-insensitive

 -print: Print each matching path (the default action)

 -print0: Like -print, but terminate each path with a NUL character
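A name-based search might look like the following sketch. The find_by_name helper and the /data path are hypothetical. The pattern is quoted so that the local shell does not expand it before HDFS sees it. HDFS_BIN defaults to a dry run that prints the command.

```shell
# Dry run by default: the command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: list every path under a directory whose name matches a glob.
find_by_name() {
  # "$2" stays quoted so the glob reaches HDFS unexpanded.
  $HDFS_BIN dfs -find "$1" -name "$2" -print
}

find_by_name /data '*.csv'
```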

11. Access Control Lists (ACLs)

getfacl - Get Access Control List

Definition: Displays Access Control List information for files and directories.

hdfs dfs -getfacl [options] <path> ...

Purpose:

 View detailed permission settings

 Audit access controls

 Understand current security configuration

Options:

 -R: Recursive ACL display

setfacl - Set Access Control List

Definition: Sets or modifies Access Control Lists for fine-grained permissions.

hdfs dfs -setfacl [options] <acl_spec> <path> ...

Purpose:

 Implement complex permission schemes

 Grant specific user/group access

 Override default permission model


Options:

 -m: Modify ACL

 -x: Remove ACL entries

 -b: Remove all ACLs

 -R: Recursive ACL setting
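As an illustration of fine-grained access, the sketch below grants one extra user read access to a directory and later revokes it, without touching the owner or group. It assumes ACLs are enabled on the NameNode. The user alice, the path, and both helpers are hypothetical. HDFS_BIN defaults to "echo hdfs", so the commands are only printed.

```shell
# Dry run by default: each command is printed instead of executed.
HDFS_BIN="${HDFS_BIN:-echo hdfs}"

# Hypothetical helper: give one user read access and show the result.
grant_read() {
  # -m adds or updates an entry; r-x allows listing and reading a directory.
  $HDFS_BIN dfs -setfacl -m "user:$1:r-x" "$2"
  # Display the resulting ACL for review.
  $HDFS_BIN dfs -getfacl "$2"
}

# Hypothetical helper: take that access away again.
revoke_read() {
  # -x removes only the named entry, so the spec carries no permissions.
  $HDFS_BIN dfs -setfacl -x "user:$1" "$2"
}

grant_read alice /data/shared
revoke_read alice /data/shared
```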

12. Data Transfer Between Clusters

distcp - Distributed Copy

Definition: Efficiently copies large amounts of data within or between Hadoop clusters.

hadoop distcp [options] <source> <destination>

Purpose:

 Transfer data between clusters

 Perform large-scale data migrations

 Synchronize data across environments

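Options:

 -update: Copy only files that are missing at the destination or differ from the source

 -delete: Delete destination files that are not in the source (requires -update or -overwrite)

 -overwrite: Overwrite existing files unconditionally

 -m <num>: Maximum number of simultaneous copies (map tasks)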
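An incremental sync between two clusters could be sketched as follows. The NameNode addresses nn1:8020 and nn2:8020, the path, the mapper count, and the sync_clusters helper are all placeholders. HADOOP_BIN defaults to "echo hadoop", so the command is printed rather than submitted as a job.

```shell
# Dry run by default: the command is printed instead of executed.
# Set HADOOP_BIN=hadoop on a configured client to submit the copy job.
HADOOP_BIN="${HADOOP_BIN:-echo hadoop}"

# Hypothetical helper: make the destination tree mirror the source tree.
sync_clusters() {
  # -update copies only missing or changed files;
  # -delete (valid only with -update or -overwrite) removes destination files absent from the source;
  # -m caps the number of simultaneous copy tasks.
  $HADOOP_BIN distcp -update -delete -m 20 "$1" "$2"
}

sync_clusters hdfs://nn1:8020/data/sales hdfs://nn2:8020/data/sales
```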

Summary of Command Categories

1. File Operations: put, get, cp, mv, rm

2. Directory Operations: ls, mkdir, rmdir

3. Content Viewing: cat, head, tail, text

4. Information Gathering: stat, du, df, count, checksum

5. Permissions: chmod, chown, chgrp, getfacl, setfacl

6. File Manipulation: appendToFile, touchz, getmerge


7. System Administration: fsck, setrep, distcp

8. Search and Test: find, test

Each command serves specific purposes in managing the Hadoop Distributed File System,
from basic file operations to advanced administrative tasks and security management.
