
Hadoop Training
Hands On Exercise

1. Getting started:
Step 1: Download and install VMware Player
- Download VMware-player-5.0.1-894247.zip and unzip it on your
Windows machine
- Click the .exe and install VMware Player

Step 2: Download and install the VMware image
- Download Hadoop Training - Distribution.zip and unzip it on your
Windows machine
- Click on centos-6.3-x86_64-server.vmx to start the virtual machine

Step 3: Login and a quick check
- Once the VM starts, use the following credentials:
Username: training
Password: training
- Quickly check that Eclipse and MySQL Workbench are installed



2. Installing Hadoop in pseudo-distributed mode:
Step 1: Run the following command to install Hadoop from the yum
repository in pseudo-distributed mode (already done for you,
please don't run this command)
sudo yum install hadoop-0.20-conf-pseudo

Step 2: Verify if the packages are installed properly


rpm -ql hadoop-0.20-conf-pseudo

Step 3: Format the namenode

sudo -u hdfs hdfs namenode -format


Step 4: Stop existing services (As Hadoop was already installed for
you, there might be some services running)
$ for service in /etc/init.d/hadoop*
> do
> sudo $service stop
> done

Step 5: Start HDFS
$ for service in /etc/init.d/hadoop-hdfs-*
> do
> sudo $service start
> done



Step 6: Verify if HDFS has started properly (In the browser)
http://localhost:50070
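
Optionally, you can also confirm this from the command line. For example (assuming the JDK's jps utility is on your PATH), list the running Java processes and check that NameNode, SecondaryNameNode and DataNode appear:

$ sudo jps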

Step 7: Create the /tmp directory


$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp


Step 8: Create MapReduce-specific directories

sudo -u hdfs hadoop fs -mkdir /var
sudo -u hdfs hadoop fs -mkdir /var/lib
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
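
As a side note, newer versions of the fs shell accept a -p flag on -mkdir that creates parent directories in one go; if your version supports it, the chain of mkdir commands above can be shortened (the chmod and chown steps are still required), for example:

sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging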

Step 9: Verify the directory structure


$ sudo -u hdfs hadoop fs -ls -R /

Output should be

drwxrwxrwt   - hdfs   supergroup   0 2012-04-19 15:14 /tmp
drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var
drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib
drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   supergroup   0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred supergroup   0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred supergroup   0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred supergroup   0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging


Step 10: Start MapReduce
$ for service in /etc/init.d/hadoop-0.20-mapreduce-*
> do
> sudo $service start
> done

Step 11: Verify if MapReduce has started properly (In Browser)
http://localhost:50030
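
You can also check the daemons directly; most init scripts accept a status argument, so a loop in the same style as the start/stop steps works as a quick check:

$ for service in /etc/init.d/hadoop-0.20-mapreduce-*
> do
> sudo $service status
> done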


Step 12: Verify that the installation went well by running a sample program

Step 12.1: Create a home directory on HDFS for the user



sudo -u hdfs hadoop fs -mkdir /user/training
sudo -u hdfs hadoop fs -chown training /user/training

Step 12.2: Make a directory in HDFS called input and copy some XML files
into it by running the following commands

$ hadoop fs -mkdir input


$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml

Step 12.3: Run an example Hadoop job to grep with a regular expression in
your input data.

$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar \
  grep input output 'dfs[a-z.]+'
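
This job greps the XML files in input for strings matching the regular expression dfs[a-z.]+ and writes the match counts to output. While it runs, you can optionally watch it from another terminal with the MRv1 job client, in addition to the JobTracker web UI:

$ hadoop job -list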

Step 12.4: After the job completes, you can find the output in the HDFS
directory named output because you specified that output directory to
Hadoop.

$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - joe supergroup   0 2009-08-18 18:36 /user/joe/input
drwxr-xr-x   - joe supergroup   0 2009-08-18 18:38 /user/joe/output





Step 12.5: List the output files



$ hadoop fs -ls output

Found 3 items

drwxr-xr-x   - joe supergroup      0 2009-02-25 10:33 /user/joe/output/_logs
-rw-r--r--   1 joe supergroup   1068 2009-02-25 10:33 /user/joe/output/part-00000
-rw-r--r--   1 joe supergroup      0 2009-02-25 10:33 /user/joe/output/_SUCCESS




Step 12.6: Read the output


$ hadoop fs -cat output/part-00000 | head

1    dfs.datanode.data.dir
1    dfs.namenode.checkpoint.dir
1    dfs.namenode.name.dir
1    dfs.replication
1    dfs.safemode.extension
1    dfs.safemode.min.datanodes








3. Accessing HDFS from the command line:


This exercise is just to get you familiar with HDFS. Run the following commands:

Command 1: List the files in the /user/training directory
$> hadoop fs -ls

Command 2: List the files in the root directory


$> hadoop fs -ls /

Command 3: Push a file to HDFS

$> hadoop fs -put test.txt /user/training/test.txt
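
Command 3 assumes a local file named test.txt in your current directory; if you don't have one, create a small sample file first, for example:

$> echo "this is a test file for HDFS" > test.txt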






Command 4: View the contents of the file
$> hadoop fs -cat /user/training/test.txt

Command 5: Delete a file


$> hadoop fs -rm /user/training/test.txt


4. Running the WordCount MapReduce job


Step 1: Put the data into HDFS
hadoop fs -mkdir /user/training/wordcountinput
hadoop fs -put wordcount.txt /user/training/wordcountinput



Step 2: Create a new project in Eclipse called wordcount

1. cp -r /home/training/exercises/wordcount
/home/training/workspace/wordcount
2. Open Eclipse -> New Project -> wordcount -> location
/home/training/workspace
3. Right-click the wordcount project -> Properties -> Java
Build Path -> Libraries -> Add External Jars -> select all jars
from /usr/lib/hadoop and /usr/lib/hadoop-0.20-mapreduce -> OK
4. Make sure that there are no compilation errors






Step 3: Create a jar file

1. Right-click the project -> Export -> Java -> JAR -> select the location as
/home/training -> make sure wordcount is checked -> Finish


Step 4: Run the jar file
hadoop jar wordcount.jar WordCount wordcountinput wordcountoutput
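
Once the job finishes, you can inspect the result in HDFS. The exact part-file name depends on which MapReduce API the WordCount class uses (e.g. part-00000 or part-r-00000), so a glob is the safest way to read it:

hadoop fs -ls wordcountoutput
hadoop fs -cat wordcountoutput/part* | head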

5. Mini Project: Importing MySQL Data Using Sqoop and Querying It Using Hive

5.1 Setting up Sqoop
Step 1: Install Sqoop (already done for you, please don't run
this command)

$> sudo yum install sqoop



Step 2: View list of databases

$> sqoop list-databases \
--connect jdbc:mysql://localhost/training_db \
--username root --password root

Step 3: View list of tables




$> sqoop list-tables \

--connect jdbc:mysql://localhost/training_db \

--username root --password root





Step 4: Import data to HDFS



$> sqoop import \

--connect jdbc:mysql://localhost/training_db \
--table user_log --fields-terminated-by '\t' \
-m 1 --username root --password root
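
To confirm the import worked, list the target directory and peek at the data. With -m 1 Sqoop writes a single file; by default the import lands under your HDFS home directory, i.e. /user/training/user_log, and that single file is the part-m-00000 file the Hive LOAD step below refers to:

$> hadoop fs -ls /user/training/user_log
$> hadoop fs -cat /user/training/user_log/part-m-00000 | head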

5.2 Setting up Hive


Step 1: Install Hive


$> sudo yum install hive (already done for you, don't
run this command)
$> sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
$> hadoop fs -chmod g+w /tmp
$> sudo -u hdfs hadoop fs -chmod g+w /user/hive/warehouse
$> sudo -u hdfs hadoop fs -chown -R training /user/hive/warehouse
$> sudo chmod 777 /var/lib/hive/metastore
$> hive
hive> show tables;



Step 2: Create table

hive> create table user_log (country
STRING,ip_address STRING) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t' STORED AS TEXTFILE;

Step 3: Load Data




hive> LOAD DATA INPATH "/user/training/user_log/part-m-00000"
INTO TABLE user_log;
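
Note that LOAD DATA INPATH moves the file from /user/training/user_log into Hive's warehouse directory rather than copying it. Before running the aggregate query, a quick sanity check that the rows landed in the table:

hive> select * from user_log limit 10;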



Step 4: Run the query



hive> select country, count(1) from user_log group by country;

6. Setting up Flume
Step 1: Install Flume
$> sudo yum install flume-ng (already done for you, please
don't run this command)
$> sudo -u hdfs hadoop fs -chmod 1777 /user/training

Step 2: Copy the configuration file


$> sudo cp /home/training/exercises/flume-config/flume.conf /usr/lib/flume-ng/conf


Step 3: Start the flume agent
$> flume-ng agent --conf-file /usr/lib/flume-ng/conf/flume.conf \
--name agent -Dflume.root.logger=INFO,console


Step 4: Push the file in a different terminal
$> sudo cp /home/training/exercises/log.txt /home/training


Step 5: View the output
$> hadoop fs -ls logs


7. Setting up a multi-node cluster
Step 1: To convert from pseudo-distributed mode to fully distributed
mode, the first step is to stop the existing services (To be done on all
nodes)
$> for service in /etc/init.d/hadoop*
> do
> sudo $service stop
> done

Step 2: Create a new set of blank configuration files. The conf.empty
directory contains blank files, so we will copy those to a new
directory (To be done on all nodes)

$> sudo cp -r /etc/hadoop/conf.empty \
> /etc/hadoop/conf.class

Step 3: Point the Hadoop configuration to the new configuration (To be
done on all nodes)

$> sudo /usr/sbin/alternatives --install \
> /etc/hadoop/conf hadoop-conf \
> /etc/hadoop/conf.class 99


Step 4: Verify Alternatives (To be done on all nodes)

$> /usr/sbin/update-alternatives \
> --display hadoop-conf

Step 5: Setting up the hosts (To be done on all nodes)


Step 5.1: Find the IP address of your machine



$> /sbin/ifconfig

Step 5.2: List all the IP addresses in your cluster setup, i.e.
the ones that will belong to your cluster, and decide a name for
each one. In our example, let's say we are setting up a 3-node
cluster, so we fetch the IP address of each node and name it
namenode or datanode<n>.
Update the /etc/hosts file with the IP addresses as shown, so the
/etc/hosts file on each node should look something like this:





192.168.1.12 namenode
192.168.1.21 datanode1
192.168.1.22 datanode2


Step 5.3: Update the /etc/sysconfig/network file with the hostname

Open /etc/sysconfig/network on your local box and make
sure that your hostname is namenode or datanode<n>.
For example, assuming your node is datanode1, i.e.
192.168.1.21, the entry should be
HOSTNAME=datanode1
In general: HOSTNAME=<your node, i.e. namenode or datanode<n>>


Step 5.4: Restart your machine and try pinging the other machines

$> ping namenode



Step 6: Changing configuration files (To be done on all nodes)
The format for adding a configuration parameter is
<property>
<name>property_name</name>
<value>property_value</value>
</property>

Add the following configurations in the following files


Filename: /etc/hadoop/conf.class/core-site.xml
  Name: fs.default.name
  Value: hdfs://<namenode>:8020

Filename: /etc/hadoop/conf.class/hdfs-site.xml
  Name: dfs.name.dir
  Value: /home/disk1/dfs/nn,/home/disk2/dfs/nn
  Name: dfs.data.dir
  Value: /home/disk1/dfs/dn,/home/disk2/dfs/dn
  Name: dfs.http.address
  Value: namenode:50070

Filename: /etc/hadoop/conf.class/mapred-site.xml
  Name: mapred.local.dir
  Value: /home/disk1/mapred/local,/home/disk2/mapred/local
  Name: mapred.job.tracker
  Value: namenode:8021
  Name: mapred.jobtracker.staging.root.dir
  Value: /user
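
For example, using the property format shown above and the hostname namenode from the /etc/hosts example in Step 5.2, the entry in /etc/hadoop/conf.class/core-site.xml would be:

<property>
<name>fs.default.name</name>
<value>hdfs://namenode:8020</value>
</property>

The other properties in the table are added to their respective files in the same way.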


Step 7: Create necessary directories (To be done on all nodes)
$> sudo mkdir -p /home/disk1/dfs/nn
$> sudo mkdir -p /home/disk2/dfs/nn
$> sudo mkdir -p /home/disk1/dfs/dn
$> sudo mkdir -p /home/disk2/dfs/dn
$> sudo mkdir -p /home/disk1/mapred/local
$> sudo mkdir -p /home/disk2/mapred/local



Step 8: Manage permissions (To be done on all nodes)

$> sudo chown -R hdfs:hadoop /home/disk1/dfs/nn
$> sudo chown -R hdfs:hadoop /home/disk2/dfs/nn
$> sudo chown -R hdfs:hadoop /home/disk1/dfs/dn
$> sudo chown -R hdfs:hadoop /home/disk2/dfs/dn
$> sudo chown -R mapred:hadoop /home/disk1/mapred/local
$> sudo chown -R mapred:hadoop /home/disk2/mapred/local



Step 9: Reduce Hadoop Heapsize (To be done on all nodes)


$> export HADOOP_HEAPSIZE=200
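
Note that export only affects the current shell session. To make the setting persistent (assuming your configuration directory contains a hadoop-env.sh, which is where Hadoop picks up HADOOP_HEAPSIZE), append it there instead:

$> echo "export HADOOP_HEAPSIZE=200" | sudo tee -a /etc/hadoop/conf.class/hadoop-env.sh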




Step 10: Format the namenode (Only on Namenode)
$> sudo -u hdfs hadoop namenode -format

Step 11: Start the HDFS processes

On Namenode
$> sudo /etc/init.d/hadoop-hdfs-namenode start
$> sudo /etc/init.d/hadoop-hdfs-secondarynamenode start

On Datanode
$> sudo /etc/init.d/hadoop-hdfs-datanode start

Step 12: Create directories in HDFS (Only one member should do this)
$> sudo -u hdfs hadoop fs -mkdir /user/training
$> sudo -u hdfs hadoop fs -chown training /user/training

Step 13: Create directories for MapReduce (Only one member should do this)
$> sudo -u hdfs hadoop fs -mkdir /mapred/system
$> sudo -u hdfs hadoop fs -chown mapred:hadoop \
> /mapred/system

Step 14: Start the MapReduce processes


On Namenode
$> sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start

On Slave nodes
$> sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start



Step 15: Verify the cluster
Visit http://namenode:50070 and check the number of live nodes
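
The same check can be done from the command line; the HDFS admin report lists every datanode and its status (run it as the hdfs user on the namenode):

$> sudo -u hdfs hadoop dfsadmin -report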
