TPhadoop

This document describes setting up the Cloudera QuickStart virtual machine to run Hadoop and MapReduce programs. It covers downloading and installing VirtualBox and the Cloudera QuickStart VM, then importing and configuring the VM. It demonstrates running the Hadoop word count example on a text file, which first requires copying the file from the local file system to HDFS. Finally, it summarizes the key steps taken to run a MapReduce job on Hadoop using the Cloudera QuickStart VM.

Uploaded by

Abdou garba Hamissou

Big Data Analytics

Hadoop Distribution
• Hadoop distribution: Cloudera QuickStart
• Platform: VirtualBox
• System Requirements
– 64-bit host OS and virtualization software that
supports a 64-bit guest OS
– RAM for VM: 4 GB
– HDD: 20 GB

Installing Cloudera QuickStart
• Download size: ~5.5 GB
• Download links
– https://www.virtualbox.org/wiki/Downloads
Select the package corresponding to your host system

– https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip

Installing Cloudera QuickStart
• Install VirtualBox
• Unzip Cloudera VM
• Start VirtualBox
• Import Appliance (Virtual Machine)
• Launch Cloudera VM

• Select Bidirectional to share the clipboard

• 8 GB of RAM is recommended

• At least 2 CPUs are recommended

Login: cloudera Password: cloudera

Make sure that your BIOS allows virtualization

• The VM appears to freeze when starting:

It has not frozen; just wait until it finishes
loading

• Type in the following command

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

• It should list the available example programs

Word Count
• Now let’s try
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount
• Result
Usage: wordcount <in> [<in>...] <out>
[cloudera@quickstart ~]$
• This is the word-counting example; it needs at least one input path and an output path
• Let’s count some words
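
To make concrete what the example computes, here is a minimal local sketch in Python. It is not the Hadoop implementation: it assumes words are separated by whitespace only (so punctuation stays attached and case matters), and the function name count_words is made up for illustration.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated tokens.

    Assumption: tokens are split on whitespace only, so "The" and "the"
    (or "end" and "end.") are counted as different words.
    """
    return Counter(text.split())


if __name__ == "__main__":
    sample = "to be or not to be"
    # Print one "word count" pair per line, in alphabetical order.
    for word, count in sorted(count_words(sample).items()):
        print(word, count)
```

The Hadoop job produces the same kind of word/count pairs, but spreads the work over map and reduce tasks.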

Word Files
• The Complete Works of William Shakespeare
https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt

• The Project Gutenberg EBook of The Adventures of Sherlock Holmes
http://norvig.com/big.txt

• Type in or paste the URL

Let’s count the words
• Open terminal and type
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount big.txt out

• It will fail
InvalidInputException: Input path does not exist:

• This is because the file is not yet in HDFS!

Local File System and HDFS
• Hadoop does not store everything in HDFS
• Map results are normally stored in the nodes’ local
file systems
– Map results are intermediate results that will be
sent to the reduce tasks later
– They do not need the redundancy provided by HDFS
– If a map node fails, Hadoop simply reassigns the
task to another node
• HDFS stores
– Input data: we must put our data into HDFS first
– Reduce output data: the result of the entire process
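
The split between intermediate and final data can be pictured with a small in-memory simulation in Python. This is only a sketch, not Hadoop's API: the function names and the two sample input splits are made up, and nothing here touches HDFS or a local disk.

```python
from collections import defaultdict


def map_phase(split):
    """Map task: emit one (word, 1) pair per token in an input split."""
    return [(word, 1) for word in split.split()]


def shuffle(map_outputs):
    """Group the intermediate pairs from all map tasks by word."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for word, one in pairs:
            grouped[word].append(one)
    return grouped


def reduce_phase(grouped):
    """Reduce task: sum the counts collected for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}


if __name__ == "__main__":
    # Hypothetical data; the comments mark where each piece would live.
    splits = ["big data big", "data lab"]           # input data: in HDFS
    intermediate = [map_phase(s) for s in splits]   # map results: node-local
    result = reduce_phase(shuffle(intermediate))    # reduce output: in HDFS
    print(result)
```

Only the first and last values correspond to data kept in HDFS; the intermediate list can be recomputed by rerunning a map task, which is why it needs no replication.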

• In the Downloads directory, list the files with ls or ls -al

• You should see your downloaded files

[cloudera@quickstart Downloads]$ ls
big.txt t8.shakespeare.txt

• Copy the file from the local file system to HDFS
– Command: hadoop fs (file system commands)
– Command option: copy file from local FS to HDFS

• Check whether the file is copied correctly

hadoop fs -ls

• Now, let’s try to copy big.txt to HDFS again

• Copy files within HDFS

hadoop fs -cp big.txt big2.txt

• Copy files back to local file system

hadoop fs -copyToLocal big2.txt

• Remove files in HDFS

hadoop fs -rm big2.txt

• Show all command options

hadoop fs

Let’s count the words (again)
• Open terminal and type
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount big.txt out
• This time it should run
• While it is running, Hadoop will show progress
including completed map and reduce tasks

• You can list the contents of the output directory with:

hadoop fs -ls out

• Then copy the result file back to the local file system with

hadoop fs -copyToLocal out/part-r-00000

• Now see the contents of the result:

more part-r-00000
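
If you would rather inspect the result with a program than page through it, a short Python sketch like the following could be run on the local copy. It assumes each line of part-r-00000 holds a word and its count separated by a tab; the function name top_words and the script name top_words.py are made up for illustration.

```python
import os
import sys


def top_words(lines, n=10):
    """Return the n most frequent (word, count) pairs.

    Assumption: every non-empty line looks like "word<TAB>count".
    """
    counts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the count is always the final field.
        word, _, count = line.rpartition("\t")
        counts.append((word, int(count)))
    # Highest count first; ties are ordered alphabetically.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


if __name__ == "__main__":
    if len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], encoding="utf-8", errors="replace") as result_file:
            for word, count in top_words(result_file):
                print(f"{count:>8}  {word}")
    else:
        print("usage: python top_words.py part-r-00000")
```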

What have we done so far?
• We copied files to and from HDFS
• We have run some HDFS file commands
• We have executed a MapReduce program
– The data to be processed is in HDFS
– But the program is on the local file system
– WordCount is written in Java, but a MapReduce program can
be written in other languages
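
One common route to other languages is Hadoop Streaming, where the mapper and the reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The Python sketch below shows the two roles under that assumption. The script name wc_stream.py and its map/reduce argument are made up for illustration, and the command that submits such a script to the cluster is left out because it depends on the installation.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit "word<TAB>1" for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word.

    Assumption: the input lines are already sorted by word, as they are
    after the shuffle step (or after a local `sort`).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif role == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        print("usage: python wc_stream.py map|reduce < input")
```

The logic can be tried locally without Hadoop by piping a text file through the map role, then sort, then the reduce role.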
