
L2 Accessing HDFS On Cloudera Distribution

The document outlines how to browse directories and copy files between the local filesystem and HDFS. It provides scenarios for listing directories in the local home and HDFS, creating files and directories, and copying a file from the local home to HDFS using either the command line, WinSCP, or HUE. The document also describes how to view file parameters like replication, locations, and blocks in HDFS using fsck and how to remove directories in HDFS. Exercises are provided to practice additional HDFS file operations.

Uploaded by

Ahmad Hazzeem

L2: Accessing HDFS

Outlines

Scenario 1 - browsing directories of the cloudera home
Scenario 2 - copying a file from home to HDFS
Troubleshooting
Exercise

Scenario 1

One of the important steps toward copying a file to HDFS is to become familiar with browsing through the directories within cloudera's home.

1) Open a terminal (via putty)

2) View the current directory on the local system

3) List the current directory on the local system

4) List the current directory in hdfs

*you may see some files if they exist; otherwise, just an empty hdfs directory

5) List files from a specific directory

6) Create a file using cat

*to exit the cat command in the terminal, press keys CTRL & D

You can check the created file via the ls command.

7) Create a directory in hdfs

You can check the created directory as follows.
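The steps above can be sketched as a terminal session. This is a hedged sketch: the names myfile.txt and input are placeholder examples, and the hdfs commands assume a configured cluster, so they are guarded in case the hdfs client is not installed.

```shell
# Sketch of the Scenario 1 commands. File/directory names are examples only.
pwd                        # view the current directory on the local system
ls                         # list the current directory on the local system
cat > myfile.txt <<'EOF'
hello hdfs
EOF
# (interactively, you would type the content and press CTRL & D to finish)
ls -l myfile.txt           # check the created file via ls
# HDFS commands need a configured cluster; skip them if no hdfs client exists.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls             # list the current directory in hdfs
  hdfs dfs -mkdir input    # create a directory in hdfs
  hdfs dfs -ls input       # check the created directory
fi
```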

Scenario 2

To copy a text file from the local home directory into HDFS via the terminal.

*Note: Take note of the different paths of the local directory vs HDFS:

local home directory -> /home/XX (depending on user profile, eg. student_sa1)
HDFS -> /user/XX (depending on user profile, eg. student_sa1)

1) Transfer the file into hdfs.

2) You can check the transferred file as follows:
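A minimal terminal sketch of this transfer. Assumptions: the local file is named student_record.csv as in the handout (a stand-in file is created here so the sketch runs anywhere), and the HDFS part is guarded in case the hdfs client is absent.

```shell
# Create a stand-in local file; on the lab machine you would already have one.
printf 'id,name\n1,Ali\n' > student_record.csv
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p input                  # destination directory under /user/<you>
  hdfs dfs -put student_record.csv input    # /home/<you> -> /user/<you>/input
  hdfs dfs -ls input                        # check the transferred file
fi
```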

3) (optional) View the created directory via HUE.

Transfer the file into hdfs (using winscp+ssh+commands)

Note: you will need to install and set up WinSCP.

WinSCP
1) launch your winscp and login to the remote machine. You should see both sides such as:

2) browse and select the file to be transferred on the left side

3) click upload

4) you should see the uploaded file in the right panel

SSH
5) launch putty and access the cluster via SSH using the given account

6) list the transferred files in your home directory by typing: ls

7) (optional) you can check the existing files and directories in HDFS:

hdfs dfs -ls

8) create a directory in HDFS to store the file to be transferred. The command:

hdfs dfs -mkdir input

9) transfer the file into hdfs using this command:

hdfs dfs -put student_record.csv input

(Alternatively you could use the -copyFromLocal command)

10) check the transferred file in HDFS using this command:

hdfs dfs -ls input

Transfer the file into hdfs (using HUE)

1) login to HUE using the given accounts - https://bigdatalab-rm-en1.uitm.edu.my:8889/

2) click on the file browser

3) click upload

4) browse and select the file

5) you should get the following:

Observe several parameters

Pre-requisite: In this example, you should have created the directory and stored the respective file. The sample dataset is less than 128MB.

Run the following command:

hdfs fsck input/student_record.csv -files -blocks -locations


where

input is the directory


student_record.csv is the sample file

Observe the following (within the configured cluster):

replication factors
locations
number of blocks
number of data-nodes
number of racks
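Because the sample dataset is under the default 128MB block size, fsck should report exactly one block: the block count is blocks = ceil(file_size / block_size). A quick arithmetic sketch (the 5MB figure is an assumed example size, and 128MB is the Hadoop default for dfs.blocksize):

```shell
file_size=$((5 * 1024 * 1024))      # assumed 5 MB sample file
block_size=$((128 * 1024 * 1024))   # HDFS default block size (dfs.blocksize)
# integer ceiling division: any non-empty file up to 128 MB occupies one block
blocks=$(( (file_size + block_size - 1) / block_size ))
echo "$blocks"   # prints 1
```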

Remove a directory in HDFS

If you need to remove a directory in HDFS which is not empty, you can use this command:

hdfs dfs -rm -r input

where input is the name of the directory or folder


Explore the cat command...
https://www.tecmint.com/13-basic-cat-command-examples-in-linux/
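A few basic cat usages in the spirit of the linked tutorial (demo.txt is a throwaway example file created here for illustration):

```shell
printf 'line1\nline2\n' > demo.txt
cat demo.txt                        # print file contents
cat -n demo.txt                     # print with line numbers
cat demo.txt demo.txt > both.txt    # concatenate two files into one
wc -l < both.txt                    # both.txt now has 4 lines
```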

Exercises

Create copy of existing file in HDFS

$ hdfs dfs -cp id.txt copy_of_id.txt

Create copy of existing directory in HDFS

$ hdfs dfs -cp my_new_dir/ copy_of_my_new_dir

Move an existing file into a directory and rename it to id.txt

$ hdfs dfs -mv copy_of_id.txt my_new_dir/id.txt

Removing file

$ hdfs dfs -rm copy_of_id.txt

Create an empty file

$ hdfs dfs -touchz new_empty_id.txt

Copy a file from the Local FileSystem to the HDFS FileSystem:

$ hdfs dfs -copyFromLocal [local-source-path] [dest-hdfs-path]

Copy a file from the HDFS FileSystem to the Local FileSystem:

$ hdfs dfs -copyToLocal [hdfs-source-path] [local-dest-path]

(Alternatively you could use the -get <source-path> <dest-path> command)
