
Hadoop - getmerge Command

Last Updated : 04 Aug, 2025

The hdfs dfs -getmerge command merges one or more files stored in HDFS (Hadoop Distributed File System) into a single output file and writes that file to the local file system. This is useful when:

  • You have multiple small files in HDFS and want to combine them into one.
  • You want to fetch processed output files from HDFS to your local system in a single file for further use.

Example: Suppose we have two files that we will store in HDFS:

  • file1.txt
  • file2.txt

We want to merge them into a single file named output.txt in our local file system.

Step 1: Check Content of Files

Before merging, let's look at the content of both files on the local system. They are copied to HDFS in Step 3.

[Screenshot: content of file1.txt]

[Screenshot: content of file2.txt]

We will merge these two files into one.

Step 2: Create a Directory in HDFS

First, create a directory in HDFS (e.g., /Hadoop_File) where we will store our files:

hdfs dfs -mkdir /Hadoop_File

Step 3: Copy Files from Local to HDFS

Copy both file1.txt and file2.txt from the local system to HDFS:

hdfs dfs -copyFromLocal /home/dikshant/Documents/hadoop_file/file1.txt /Hadoop_File
hdfs dfs -copyFromLocal /home/dikshant/Documents/hadoop_file/file2.txt /Hadoop_File


Now both files are inside the /Hadoop_File directory in HDFS. You can verify this by listing the files:

hdfs dfs -ls /Hadoop_File


Step 4: Syntax of -getmerge Command

hdfs dfs -getmerge [-nl] <source_path> [<source_path> ...] <local_destination_file>

  • -nl: Adds a newline character (LF) at the end of each file's content, so consecutive files are separated by a line break.
  • <source_path>: One or more HDFS files or directories to merge.
  • <local_destination_file>: Path in the local file system where the merged file will be created.

Step 5: Merge Files Using -getmerge

Now merge file1.txt and file2.txt from HDFS into a single file output.txt in the local system:

hdfs dfs -getmerge -nl /Hadoop_File/file1.txt /Hadoop_File/file2.txt /home/dikshant/Documents/hadoop_file/output.txt
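If you run this merge often, the command above can be wrapped in a small shell function. The following is only a sketch: the function name getmerge_to_local and the DRY_RUN variable are hypothetical names introduced for illustration and are not part of Hadoop. With DRY_RUN=1 the function prints the command it would run instead of contacting HDFS.

```shell
# Hypothetical helper (not part of Hadoop): merge HDFS sources into one local file.
# Usage: getmerge_to_local <local_dst> <hdfs_src> [<hdfs_src> ...]
getmerge_to_local() {
  if [ "$#" -lt 2 ]; then
    echo "usage: getmerge_to_local <local_dst> <hdfs_src> [<hdfs_src> ...]" >&2
    return 2
  fi
  dst=$1
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    # Note: paths containing spaces are not quoted in this printed form.
    echo "hdfs dfs -getmerge -nl $* $dst"
    return 0
  fi
  # Real run: requires the hdfs client on PATH and a reachable cluster.
  hdfs dfs -getmerge -nl "$@" "$dst"
}

# Dry run with the paths from this example; only prints the command.
DRY_RUN=1 getmerge_to_local /home/dikshant/Documents/hadoop_file/output.txt \
  /Hadoop_File/file1.txt /Hadoop_File/file2.txt
```

The destination comes first in this helper so that any number of HDFS sources can follow it; the function then reorders the arguments into the form getmerge expects.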

Step 6: Verify the Output

Check whether the files have been merged successfully:

cd /home/dikshant/Documents/hadoop_file
ls
cat output.txt

You should now see the combined content of both files, with a newline separating them (because we used -nl).
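Beyond reading the file by eye, one rough sanity check is to compare sizes: with -nl, the merged file should contain every source byte plus one newline per source file. The sketch below assumes the original local copies are still available; expected_merged_size is a hypothetical helper name, and the temporary files only stand in for file1.txt and file2.txt.

```shell
# Hypothetical helper: expected size in bytes of a -nl merge of the given local files.
expected_merged_size() {
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f")
    total=$((total + size + 1))   # +1 for the newline that -nl adds after each file
  done
  echo "$total"
}

# Demo with throwaway stand-in files (3 bytes each).
demo_dir=$(mktemp -d)
printf 'abc'  > "$demo_dir/file1.txt"
printf 'de\n' > "$demo_dir/file2.txt"
expected=$(expected_merged_size "$demo_dir/file1.txt" "$demo_dir/file2.txt")
echo "expected merged size: $expected bytes"
```

In the example from this article, you would pass the two local source files to the function and compare the result with the output of wc -c < output.txt.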


Key Points to Remember

  • If you omit -nl, the files are concatenated byte for byte. When a file does not end with a newline, its last line runs directly into the first line of the next file.
  • You can merge an entire directory instead of individual files:

hdfs dfs -getmerge -nl /Hadoop_File /home/dikshant/Documents/hadoop_file/output.txt

  • This will merge all files inside /Hadoop_File into output.txt.
  • The merged file is always stored in the local file system, not back in HDFS.
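The effect of omitting -nl can be illustrated without a cluster. The following is a local simulation only: plain cat and printf stand in for getmerge, no HDFS is touched, and the file names part1.txt and part2.txt are hypothetical.

```shell
# Local simulation only: cat/printf imitate getmerge; no HDFS is involved.
workdir=$(mktemp -d)
printf 'alpha' > "$workdir/part1.txt"   # no trailing newline
printf 'beta'  > "$workdir/part2.txt"   # no trailing newline

# Like getmerge without -nl: contents are joined byte for byte.
cat "$workdir/part1.txt" "$workdir/part2.txt" > "$workdir/merged_plain.txt"

# Like getmerge with -nl: a newline follows each file's content.
for f in "$workdir/part1.txt" "$workdir/part2.txt"; do
  cat "$f"
  printf '\n'
done > "$workdir/merged_nl.txt"

echo "without -nl:"
cat "$workdir/merged_plain.txt"; echo
echo "with -nl:"
cat "$workdir/merged_nl.txt"
```

In the first result the two records run together on one line, while the second keeps each file's content on its own line.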
