TPhadoop
Hadoop Distribution
• Hadoop distribution: Cloudera QuickStart
• Platform: VirtualBox
• System Requirements
– 64-bit host OS and virtualization software that supports a 64-bit guest OS
– RAM for VM: 4 GB
– HDD: 20 GB
Installing Cloudera QuickStart
• Download size: ~5.5 GB
• Download links
– https://www.virtualbox.org/wiki/Downloads
Select package corresponding to your host system
– https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip
VirtualBox Download
Installing Cloudera QuickStart
• Install VirtualBox
• Unzip Cloudera VM
• Start VirtualBox
• Import Appliance (Virtual Machine)
• Launch Cloudera VM
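The import and launch steps can also be done from a terminal with VBoxManage, the command-line tool that ships with VirtualBox. This is only a sketch: import_vm and start_vm are hypothetical helper names, and the .ovf path and VM name depend on what the unzipped archive and VirtualBox actually show on your machine.

```shell
# Hypothetical helper names; VBoxManage is VirtualBox's command-line tool.

# $1: path to the .ovf file extracted from the Cloudera zip archive
import_vm() {
  # Import the appliance, then list registered VMs to see its name.
  VBoxManage import "$1" && VBoxManage list vms
}

# $1: VM name exactly as printed by "VBoxManage list vms"
start_vm() {
  VBoxManage startvm "$1"
}
```

Usage: call import_vm with the path to the extracted .ovf file, then start_vm with the name printed by the listing.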
Start VirtualBox
Import Appliance
Setting Up the VM
Set Shared Clipboard to Bidirectional to share the clipboard between host and VM
Setting Up the VM
8 GB of RAM is recommended
Setting Up the VM
At least 2 CPUs are recommended
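The recommended memory and CPU settings can also be applied from a terminal while the VM is powered off. A minimal sketch, assuming VBoxManage is on the PATH; configure_vm is a hypothetical helper name and the VM name is whatever "VBoxManage list vms" reports.

```shell
# Hypothetical helper: apply the recommended RAM and CPU settings.
# $1: VM name, $2: RAM in MB (default 8192), $3: CPU count (default 2)
configure_vm() {
  VBoxManage modifyvm "$1" --memory "${2:-8192}" --cpus "${3:-2}"
}
```

Usage: configure_vm followed by the VM name uses 8192 MB and 2 CPUs; pass a smaller memory value as the second argument on hosts with less RAM.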
Launch Cloudera VM
Troubleshooting
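• The VM does not start and reports:
AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED).
– Enable hardware virtualization (AMD-V or Intel VT-x) in the BIOS/UEFI settings of the host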
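On a Linux host, a quick way to see whether the CPU advertises hardware virtualization at all is to look for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. This is a rough sketch that applies to Linux hosts only; count_virt_cpus is a hypothetical helper name, and a non-zero count shows CPU support, not that the feature is switched on in the BIOS.

```shell
# Count CPU entries that advertise hardware virtualization
# (vmx = Intel VT-x, svm = AMD-V). Linux hosts only.
# $1: cpuinfo file to read (defaults to /proc/cpuinfo)
count_virt_cpus() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -E -c -w 'vmx|svm' "${1:-/proc/cpuinfo}" || true
}
```

A result of 0 suggests the CPU (or an outer virtualization layer) does not expose the feature, in which case changing the BIOS setting alone is unlikely to help.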
Let’s check if we can run Hadoop
• Open terminal
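One simple check, sketched here as an assumption about what to type in the terminal, is to confirm that the hadoop command is on the PATH and ask it for its version. check_hadoop is a hypothetical helper name.

```shell
# Hypothetical helper: confirm the hadoop client is installed, then print its version.
check_hadoop() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop command not found" >&2
    return 1
  fi
  hadoop version
}
```

Usage: run check_hadoop in the VM's terminal; if it prints a version banner, the client is installed and callable.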
Download and Save
• Open web browser
Download and Save
• After the page has loaded, save the file
• The default destination is ~/Downloads
Let’s count the words
• Open a terminal and type
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount big.txt out
• It will fail, because the job looks for big.txt in HDFS, where it does not exist yet
InvalidInputException: Input path does not exist:
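The job resolves big.txt against HDFS, not the local file system, which is why the path is reported as missing. A possible guard, sketched with the hypothetical helper name run_wordcount, checks for the input with hadoop fs -test -e before submitting the job.

```shell
# Path of the examples jar as given on the slide.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Hypothetical helper: run wordcount only if the input already exists in HDFS.
# $1: input path in HDFS, $2: output directory in HDFS
run_wordcount() {
  input=$1
  output=$2
  if ! hadoop fs -test -e "$input"; then
    echo "Input $input is not in HDFS yet; copy it there first" >&2
    return 1
  fi
  hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}
```

Usage: run_wordcount big.txt out. Note that the job also refuses to start if the output directory already exists in HDFS, so remove or rename out before re-running.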
Copy the data into HDFS
• Open the terminal and go to the Downloads directory
cd Downloads/
Copy the data into HDFS
• Copy the file from the local file system to HDFS
hadoop fs -copyFromLocal big.txt
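To confirm the copy worked, the file can be listed in HDFS right afterwards. A minimal sketch: put_and_verify is a hypothetical helper name, and the explicit "." destination is an assumption that stands for the current user's HDFS home directory.

```shell
# Hypothetical helper: copy a local file into the HDFS home directory,
# then list it there to confirm the copy.
# $1: path of the local file
put_and_verify() {
  file=$1
  hadoop fs -copyFromLocal "$file" . && hadoop fs -ls "$(basename "$file")"
}
```

Usage: put_and_verify big.txt from inside the Downloads directory; the listing should show big.txt with its size.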
Other HDFS Command Options
• List the files in the current HDFS directory (the user's HDFS home directory by default)
hadoop fs -ls
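A few more hadoop fs subcommands follow the same pattern as their Unix counterparts. The sketch below wraps them in a hypothetical function, hdfs_examples, so nothing runs until it is called; the directory name demo is an assumption chosen for illustration, and big.txt is assumed to be in HDFS already.

```shell
# Hypothetical helper showing a few other hadoop fs subcommands.
# "demo" is an example directory name, not something from the slides.
hdfs_examples() {
  hadoop fs -mkdir -p demo        # create a directory in HDFS
  hadoop fs -cp big.txt demo/     # copy a file within HDFS
  hadoop fs -ls demo              # list the directory
  hadoop fs -du -h demo           # show sizes in human-readable units
  hadoop fs -rm -r demo           # remove the directory and its contents
}
```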
Copy the result to local FS
• The output is stored in the directory out in HDFS
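One way to bring the result back, sketched with the hypothetical helper name fetch_result, is to copy the whole output directory into the current local directory with hadoop fs -copyToLocal and then list it. The result files are typically named part-r-NNNNN, but that depends on the job.

```shell
# Hypothetical helper: copy a job's output directory from HDFS into the
# current local directory, then list the copied files.
# $1: output directory in HDFS (defaults to "out", as on the slides)
fetch_result() {
  hdfs_dir=${1:-out}
  hadoop fs -copyToLocal "$hdfs_dir" . && ls "$(basename "$hdfs_dir")"
}
```

Usage: fetch_result out, then open the listed part files with less or head to inspect the word counts.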