TPhadoop

This document walks through setting up the Cloudera QuickStart virtual machine to run Hadoop and MapReduce programs. It covers downloading and installing VirtualBox and the Cloudera QuickStart VM, importing and configuring the VM, and running the Hadoop word-count example on a text file, which first requires copying the file from the local file system into HDFS. It closes with a summary of the key steps needed to run a MapReduce job on Hadoop using the Cloudera QuickStart VM.

Big Data Analytics

Hadoop Distribution
• Hadoop distribution: Cloudera QuickStart
• Platform: VirtualBox
• System Requirements
– A 64-bit host OS with virtualization support for
64-bit guest OSes
– RAM for VM: 4 GB
– HDD: 20 GB

Installing Cloudera QuickStart
• Download size: ~5.5 GB
• Download links
– https://www.virtualbox.org/wiki/Downloads
Select the package corresponding to your host system

– https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip

VirtualBox Download
Installing Cloudera QuickStart
• Install VirtualBox
• Unzip the Cloudera VM archive
• Start VirtualBox
• Import Appliance (Virtual Machine)
• Launch Cloudera VM
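• The unzip and import steps can also be scripted from a host terminal. A minimal sketch, assuming the archive name from the download link above and the .ovf name it typically contains (check the unpacked directory for the exact name):

# unpack the downloaded archive
unzip cloudera-quickstart-vm-5.13.0-0-virtualbox.zip
# import the appliance into VirtualBox; adjust the .ovf name if it differs
VBoxManage import cloudera-quickstart-vm-5.13.0-0-virtualbox.ovf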

Start VirtualBox

Import Appliance
Setting Up the VM

• Select Bidirectional to share the clipboard
• 8 GB of RAM is recommended
• At least 2 CPUs are recommended
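• The same settings can be applied from a host terminal with VBoxManage while the VM is powered off. A sketch, assuming the VM kept its default import name (check yours with VBoxManage list vms):

# share the clipboard in both directions
VBoxManage modifyvm "cloudera-quickstart-vm-5.13.0-0-virtualbox" --clipboard bidirectional
# give the VM 8 GB of RAM and 2 CPUs
VBoxManage modifyvm "cloudera-quickstart-vm-5.13.0-0-virtualbox" --memory 8192
VBoxManage modifyvm "cloudera-quickstart-vm-5.13.0-0-virtualbox" --cpus 2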
Launch Cloudera VM

• Login: cloudera
• Password: cloudera
Troubleshooting
• The VM does not start, with the error:
AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)
– Make sure that virtualization (AMD-V or VT-x) is enabled in your BIOS

• The VM seems to freeze when starting:
– It has not frozen; the first boot is slow, so wait until it finishes loading
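• On a Linux host you can quickly check whether the CPU exposes hardware virtualization at all; a count of 0 means it is unsupported or hidden by the BIOS:

# count CPU flags for Intel VT-x (vmx) or AMD-V (svm)
grep -Ec '(vmx|svm)' /proc/cpuinfo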

Let’s check if we can run Hadoop
• Open a terminal

• Type in the following command:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

• It should list the available example programs


Word Count
• Now let's try

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount

• Result:

Usage: wordcount <in> [<in>...] <out>
[cloudera@quickstart ~]$

• This is the word-counting example: it takes one or more input paths and one output path
• Let's count some words
Word Files
• The Complete Works of William Shakespeare
https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt

• The Project Gutenberg EBook of The Adventures of Sherlock Holmes
http://norvig.com/big.txt
Download and Save
• Open a web browser
• Type in or paste the URL
• After the page has loaded, save the file
• The default destination is ~/Downloads
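• Alternatively, both files can be fetched from a terminal inside the VM; this assumes wget is installed, which it normally is on CentOS-based images like the QuickStart VM:

wget https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
wget http://norvig.com/big.txt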
Let’s count the words
• Open a terminal and type:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount big.txt out

• It will fail with:

InvalidInputException: Input path does not exist

• This is because the file is not yet in HDFS!
Local File System and HDFS
• Hadoop does not store everything in HDFS
• Map results are normally stored in the nodes' local file systems
– Map results are intermediate results that will be sent to reduce tasks later
– They do not need the redundancy provided by HDFS
– If a map node fails, the Hadoop task manager simply resends the task to another node
• HDFS stores
– Input data: we must put our data into HDFS first
– Reduce output data: the result of the entire process
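• You can see this split in the VM itself; the example jar lives on the local file system, while the job's input and output are HDFS paths (paths as used elsewhere in this document):

# the program sits on the VM's local file system
ls /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
# the data must sit in HDFS (the listing stays empty until we copy something in)
hadoop fs -ls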

Copy the data into HDFS
• Open the terminal and go to the Downloads directory

cd Downloads/

• List the files with ls or ls -al

• You should see your downloaded files


[cloudera@quickstart Downloads]$ ls
big.txt t8.shakespeare.txt

Copy the data into HDFS
• Copy the file from the local file system to HDFS:

hadoop fs -copyFromLocal big.txt

– hadoop fs invokes the file system commands; the -copyFromLocal option copies a file from the local file system into HDFS

• Check whether the file was copied correctly:

hadoop fs -ls

• Now, let's try to copy big.txt to HDFS again: this time the copy is refused, because the file already exists in HDFS
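• If you really do want to replace a file that is already in HDFS, -copyFromLocal accepts a -f flag that overwrites the destination:

# overwrite big.txt in HDFS with the local copy
hadoop fs -copyFromLocal -f big.txt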

Other HDFS Command Options
• List the files in the current directory:

hadoop fs -ls

• Copy files within HDFS:

hadoop fs -cp big.txt big2.txt

• Copy files back to the local file system:

hadoop fs -copyToLocal big2.txt

• Remove files in HDFS:

hadoop fs -rm big2.txt

• Show all command options:

hadoop fs
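• Each option also has its own help text, for example:

# show detailed usage for a single command option
hadoop fs -help copyFromLocal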
Let’s count the words (again)
• Open a terminal and type:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount big.txt out

• This time it should run
• While it is running, Hadoop will show progress, including completed map and reduce tasks

Copy the result to local FS
• The output is stored in the directory out in HDFS

• You can list the contents of the directory with:

hadoop fs -ls out

• Then copy the result file back with:

hadoop fs -copyToLocal out/part-r-00000

• Now see the contents of the result:

more part-r-00000
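• You can also peek at the result directly in HDFS, without copying it back first:

# print the first lines of the result file straight from HDFS
hadoop fs -cat out/part-r-00000 | head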
What have we done so far?
• We copied files to and from HDFS
• We ran some HDFS file commands
• We executed a MapReduce program
– The data it operates on is in HDFS
– But the program itself is on the local file system
– WordCount is written in Java, but a MapReduce program can be written in any language (see the streaming sketch below)
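• As a taste of the "any language" point, Hadoop ships a streaming jar that lets ordinary executables act as mapper and reducer. A sketch using the classic cat/wc example from the Hadoop documentation, assuming the streaming jar sits at the path below in the QuickStart VM (the exact path may differ):

# /bin/cat passes each input line through as the map output;
# /usr/bin/wc then reports line/word/character totals as the reduce output
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-input big.txt -output out2 \
-mapper /bin/cat \
-reducer /usr/bin/wc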
