Importing and Exporting Files in Hadoop Distributed File System
1. Importing Files: a. Task 1: Import a text file from the local file system into HDFS using the
Hadoop command-line tool. Ensure that the file is correctly replicated across the HDFS data
nodes. b. Task 2: Import a CSV file into HDFS, considering the file format and data structure.
Validate the successful import by checking the file location and size in HDFS.
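Assuming a running cluster such as the Cloudera sandbox used later in this report, the two import tasks might look like the following sketch; the file names are illustrative:

```shell
# Task 1: import a local text file into the HDFS root directory
hadoop fs -put test2.txt /

# Show the replication factor (%r) and size in bytes (%b) recorded for the file
hadoop fs -stat "%r %b" /test2.txt

# Task 2: import a CSV file, then validate its location and size
hadoop fs -put movies.csv /
hadoop fs -ls /movies.csv
```

For a deeper look at block placement across data nodes, `hdfs fsck /test2.txt -files -blocks -locations` reports each block and the nodes holding its replicas.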
2. Exporting Files: a. Task 1: Export a text file from HDFS to the local file system. Use the
appropriate Hadoop command-line tool to ensure a seamless export operation. b. Task 2: Export
a CSV file from HDFS to the local file system. Validate the export by checking the file integrity and
verifying its compatibility with the Parquet file format.
3. Advanced Import/Export Operations: a. Task 1: Import a directory containing multiple files from
the local file system into HDFS. Ensure that the entire directory structure is preserved during the
import process. b. Task 2: Export a directory from HDFS to the local file system, including all its
subdirectories and files. Verify the exported directory structure and file contents.
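One way to confirm that the directory structure survived the round trip, assuming the directory names used later in this report, is to list both trees recursively:

```shell
# Recursively list the imported directory inside HDFS
hadoop fs -ls -R /testy

# Recursively list the exported copy on the local file system
ls -R /home/cloudera/testy
```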
4. Documentation and Reflection: Write a detailed report documenting the steps you followed,
including the commands used for importing and exporting files. Reflect on the challenges you
encountered, the efficiency of the import/export operations, and the benefits of using HDFS for
data storage and retrieval.
First, I imported test2.txt into HDFS using the hadoop fs -put test2.txt / command. To export the
file from HDFS, hadoop fs -get /test2.txt /home/cloudera was used. The movies.csv file was
imported and exported in the same way. I then created a directory containing multiple files and
imported and exported it similarly.
hadoop fs -put testy / was used to import the ‘testy’ directory with multiple files in it inside
Hadoop.
hadoop fs -get /testy /home/cloudera was used to export ‘testy’ directory with multiple files in it
from Hadoop to local system.
Any files or directories that already existed in the local file system or in HDFS had to be removed
before the demonstration could be repeated. Apart from that, no challenges were encountered,
and the operations executed smoothly.
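The cleanup step mentioned above can be sketched as follows; the paths mirror the ones used in this report and would need adjusting for another environment:

```shell
# Remove pre-existing copies in HDFS before re-importing
hadoop fs -rm /test2.txt
hadoop fs -rm -r /testy

# Remove pre-existing local copies before re-exporting
rm /home/cloudera/test2.txt
rm -r /home/cloudera/testy
```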