Unix Commands Part 2
Documentation
Release 1.0.1
Oshin Prem
Contents
1 HADOOP INSTALLATION
1.1 SINGLE-NODE INSTALLATION
1.2 MULTI-NODE INSTALLATION
2 HIVE INSTALLATION
2.1 INTRODUCTION
2.2 Hive Installation
3 SQOOP INSTALLATION
3.1 INTRODUCTION
3.2 Stable release and Download
3.3 Prerequisites
3.4 Installation
HADOOP INSTALLATION
This section refers to the installation settings of Hadoop on a standalone system as well as on a system existing as a
node in a cluster.
SINGLE-NODE INSTALLATION
The report here will describe the required steps for setting up a single-node Hadoop cluster backed by the Hadoop
Distributed File System, running on Ubuntu Linux. Hadoop is a framework written in Java for running applications on
large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and
of the MapReduce computing paradigm. Hadoop’s HDFS is a highly fault-tolerant distributed file system and, like
Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application
data and is suitable for applications that have large data sets.
Before we start, let us understand the meaning of the following terms:
DataNode:
A DataNode stores data in the Hadoop File System. A functional file system has more than one DataNode, with the
data replicated across them.
NameNode:
The NameNode is the centrepiece of an HDFS file system. It keeps the directory of all files in the file system, and
tracks where across the cluster the file data is kept. It does not store the data of these files itself.
JobTracker:
The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the
nodes that have the data, or at least nodes that are in the same rack.
TaskTracker:
A TaskTracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker.
Secondary Namenode:
The whole purpose of the Secondary NameNode is to have a checkpoint in HDFS. It is just a helper node for the NameNode.
Prerequisites
Java 6 JDK
Install the OpenJDK 6 JDK, or alternatively install the Sun Java 6 JDK.
Note:
If you already have a Java JDK installed on your system, then you need not run the installation command.
To install the OpenJDK package, run the following command.
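A hedged sketch of the installation on Ubuntu, assuming the OpenJDK 6 package (adjust if you install the Sun/Oracle JDK instead):
user@ubuntu:~$ sudo apt-get update
user@ubuntu:~$ sudo apt-get install openjdk-6-jdk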
The full JDK will be placed in /usr/lib/jvm/java-6-openjdk-amd64. After installation, check whether the Java JDK
is correctly installed or not, with the following command.
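A sketch of the check (the version string printed depends on the installed JDK build):
user@ubuntu:~$ java -version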
Next, add a dedicated Hadoop system user. The commands sketched below add the user hduser1 and the group hadoop_group to the local machine, and then add hduser1 to the sudo group.
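A minimal sketch of these commands, using the standard Ubuntu addgroup/adduser tools:
user@ubuntu:~$ sudo addgroup hadoop_group
user@ubuntu:~$ sudo adduser --ingroup hadoop_group hduser1
user@ubuntu:~$ sudo adduser hduser1 sudo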
Configuring SSH
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping
and starting all the daemons in the cluster. To work seamlessly, SSH needs to be set up to allow password-less login
for the Hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key
pair that will be shared across the cluster.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine. For our single-node
setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser1 user we created earlier.
We have to generate an SSH key for the hduser1 user.
user@ubuntu:~$ su - hduser1
hduser1@ubuntu:~$ ssh-keygen -t rsa -P ""
The second line will create an RSA key pair with an empty password.
Note:
The final step is to test the SSH setup by connecting to the local machine with the hduser1 user. The step is also needed
to save your local machine's host key fingerprint to the hduser1 user's known_hosts file.
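A sketch of enabling and testing the password-less login, assuming the key pair generated above sits in the default location:
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser1@ubuntu:~$ ssh localhost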
INSTALLATION
Main Installation
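A rough sketch of a typical sequence for this step, assuming a downloaded Hadoop tarball (the archive name is an assumption) and the /usr/local/hadoop install path used in the rest of this chapter:
user@ubuntu:~$ sudo tar -xzf hadoop-2.2.0.tar.gz -C /usr/local
user@ubuntu:~$ cd /usr/local
user@ubuntu:/usr/local$ sudo mv hadoop-2.2.0 hadoop
user@ubuntu:/usr/local$ sudo chown -R hduser1:hadoop_group hadoop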
Configuration
hadoop-env.sh
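The change required here is typically setting JAVA_HOME; a sketch, assuming the OpenJDK path mentioned in the prerequisites (add this line to conf/hadoop-env.sh):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64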
conf/*-site.xml
Now we create the directory and set the required ownerships and permissions
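A sketch of the typical commands, reusing the hduser1 user and hadoop_group group created earlier (the exact permission bits are an assumption):
user@ubuntu:~$ sudo mkdir -p /app/hadoop/tmp
user@ubuntu:~$ sudo chown hduser1:hadoop_group /app/hadoop/tmp
user@ubuntu:~$ sudo chmod 750 /app/hadoop/tmp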
The last line gives reading and writing permissions to the /app/hadoop/tmp directory
• Error: If you forget to set the required ownerships and permissions, you will see a java.io.IOException when
you try to format the NameNode.
Paste the following between the <configuration> and </configuration> tags.
• In file conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
• In file conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
• In file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the
following command.
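A sketch of the format command, assuming the /usr/local/hadoop install path used throughout this chapter:
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format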
Before starting the cluster, we need to give the required permissions to the directory with the following command
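A hedged guess at the intended command, making hduser1 the owner of the Hadoop installation directory:
hduser@ubuntu:~$ sudo chown -R hduser1:hadoop_group /usr/local/hadoop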
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
This will start up a NameNode, DataNode, JobTracker and a TaskTracker on your machine.
hduser@ubuntu:/usr/local/hadoop$ jps
Errors:
1. If by chance your DataNode is not starting, then you have to erase the contents of the folder /app/hadoop/tmp. A
command that can be used is sketched below.
2. You can also check with netstat whether Hadoop is listening on the configured ports; a command that can be
used is also sketched below.
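Hedged sketches of these commands (the tmp path follows the hadoop.tmp.dir value configured above):
hduser@ubuntu:~$ sudo rm -rf /app/hadoop/tmp/*
hduser@ubuntu:~$ sudo netstat -plten | grep java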
Run the command to stop all the daemons running on your machine.
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
ERROR POINTS:
If the DataNode is not starting, then clear the tmp folder before formatting the NameNode, using the rm command shown under Errors above.
MULTI-NODE INSTALLATION
We will build a multi-node cluster by merging two or more single-node clusters into one multi-node cluster, in which one
Ubuntu box will become the designated master (but also act as a slave), and the other box will become only a slave.
Prerequisites
Configure single-node clusters first; here we have used two single-node clusters. Shut down each single-node cluster
with the following command
user@ubuntu:~$ bin/stop-all.sh
Networking
• The easiest way is to put both machines in the same network with regard to hardware and software configuration.
• Update /etc/hosts on both machines. Put aliases for the IP addresses of all the machines. Here we are creating a
cluster of 2 machines; one is the master and the other is slave1.
hduser@master:~$ sudo gedit /etc/hosts
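A sketch of the resulting /etc/hosts entries (the IP addresses are placeholders; use the actual addresses of your two machines):
192.168.0.1    master
192.168.0.2    slave1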
SSH access
The hduser user on the master (aka hduser@master) must be able to connect:
1. to its own user account on the master - i.e. ssh master in this context.
2. to the hduser user account on the slave (i.e. hduser@slave1) via a password-less SSH login.
• Add the hduser@master public SSH key to the hduser account on the slave, using the command sketched after this list.
• Connect with user hduser from the master to the user account hduser on the slave.
1. From master to master
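Hedged sketches of these steps (ssh-copy-id is one common way to distribute the key; hostnames follow the /etc/hosts entries above):
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
hduser@master:~$ ssh master
hduser@master:~$ ssh slave1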
Hadoop
Cluster Overview
This will describe how to configure one Ubuntu box as a master node and the other Ubuntu box as a slave node.
Configuration
conf/masters
The machine on which bin/start-dfs.sh is running will become the primary NameNode. This file should be updated on
all the nodes. Open the masters file in the conf directory
master
conf/slaves
This file should be updated on all the nodes as master is also a slave. Open the slaves file in the conf directory
master
slave1
Change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host
and port.
conf/core-site.xml (ALL machines, i.e. master as well as slave)
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
conf/mapred-site.xml
hduser@master:~$ cd /usr/local/hadoop/conf
hduser@master:~$ sudo gedit mapred-site.xml
Change the mapred.job.tracker parameter (in conf/mapred-site.xml), which specifies the JobTracker (MapReduce master) host and port.
conf/mapred-site.xml (ALL machines)
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
conf/hdfs-site.xml
hduser@master:~$ cd /usr/local/hadoop/conf
hduser@master:~$ sudo gedit hdfs-site.xml
Change the dfs.replication parameter (in conf/hdfs-site.xml) which specifies the default block replication. We have
two nodes available, so we set dfs.replication to 2.
Changes to be made
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
hduser@master:~$ cd /usr/local/hadoop
hduser@master:~$ bin/start-all.sh
By this command:
• The NameNode daemon is started on master, and DataNode daemons are started on all slaves (here: master and
slave).
• The JobTracker is started on master, and TaskTracker daemons are started on all slaves (here: master and slave)
To check the daemons that are running, run the following commands
hduser@master:~$ jps
hduser@slave:/usr/local/hadoop$ jps
hduser@master:~$ cd /usr/local/hadoop
hduser@master:~/usr/local/hadoop$ bin/stop-all.sh
ERROR POINTS:
1. The number of slaves must equal the number of replications (dfs.replication) in hdfs-site.xml; here, the number of
slaves = all slaves + the master (if the master is also considered to be a slave).
2. When you start the cluster, clear the tmp directory on all the nodes (master + slaves) using the command sketched after this list.
3. The configuration of the /etc/hosts, masters and slaves files should be the same on both the master and the slave
nodes.
4. If the NameNode is not getting started, run the commands sketched after this list:
• one to give all permissions of the hadoop folder to hduser, and
• one to delete the junk files which get stored in the tmp folder of hadoop.
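Hedged sketches of the commands referred to above (the ownership command is a guess at the original; adjust user, group and paths to your setup). The first grants hduser full rights on the hadoop folder, the second deletes the junk files in hadoop's tmp folder:
hduser@master:~$ sudo chown -R hduser1:hadoop_group /usr/local/hadoop
hduser@master:~$ sudo rm -rf /app/hadoop/tmp/*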
HIVE INSTALLATION
This section refers to the installation settings of Hive on a standalone system as well as on a system existing as a node
in a cluster.
INTRODUCTION
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization,
query, and analysis. Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible
file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL (Hive
Query Language) while maintaining full support for map/reduce.
Hive Installation
Installing HIVE:
user@ubuntu:~$ cd /usr/lib/
user@ubuntu:~$ sudo mkdir hive
user@ubuntu:~$ cd ~/Downloads
user@ubuntu:~$ sudo mv apache-hive-0.13.0-bin /usr/lib/hive
Commands
user@ubuntu:~$ cd
user@ubuntu:~$ sudo gedit ~/.bashrc
# Set HIVE_HOME
export HIVE_HOME="/usr/lib/hive/apache-hive-0.13.0-bin"
PATH=$PATH:$HIVE_HOME/bin
export PATH
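After saving ~/.bashrc, reload it so that the new variables take effect in the current shell; a small sketch:
user@ubuntu:~$ source ~/.bashrc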
Commands
user@ubuntu:~$ cd /usr/lib/hive/apache-hive-0.13.0-bin/bin
user@ubuntu:~$ sudo gedit hive-config.sh
Command
Command
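A hedged sketch of the usual edit to hive-config.sh, assuming the Hadoop install path used in the previous chapter (append this line to the file):
export HADOOP_HOME=/usr/local/hadoop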
HIVE launch
Command
user@ubuntu:~$ hive
OUTPUT
hive>
Creating a database
Command
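A minimal sketch of such a statement, with a hypothetical database name:
hive> create database demo;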
OUTPUT
OK
Time taken: 0.369 seconds
hive>
Configuring hive-site.xml:
<property>
<name>hive.metastore.local</name>
<value>TRUE</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://usr/lib/hive/apache-hive-0.13.0-bin/metastore_db?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Writing a Script
describe product;
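A minimal sketch of running such a statement from a script file (the file name sample.hql is hypothetical); put the HiveQL statements, for example describe product;, into the file and execute it with hive -f:
user@ubuntu:~$ sudo gedit sample.hql
user@ubuntu:~$ hive -f sample.hql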
SQOOP INSTALLATION
INTRODUCTION
• Sqoop is a tool designed to transfer data between Hadoop and relational databases.
• You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL
or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and
then export the data back into an RDBMS. Sqoop automates most of this process, relying on the database to
describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which
provides parallel operation as well as fault tolerance. This document describes how to get started using Sqoop
to move data between databases and Hadoop and provides reference information for the operation of the Sqoop
command-line tool suite.
Sqoop is an open source software product of the Apache Software Foundation. Sqoop source code is held in the
Apache Git repository.
Prerequisites
Before we can use Sqoop, a release of Hadoop must be installed and configured. Sqoop currently supports 4 major
Hadoop releases - 0.20, 0.23, 1.0 and 2.0. We have installed Hadoop 2.2.0 and it is compatible with Sqoop 1.4.4. We
are using a Linux environment (Ubuntu 12.04) to install and run Sqoop. Basic familiarity with the purpose and
operation of Hadoop is required to use this product.
Installation
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
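A rough sketch of the typical earlier steps, assuming a downloaded Sqoop 1.4.4 tarball (the archive name is an assumption) and the /usr/lib/sqoop path used in SQOOP_HOME above; the two export lines are then added to ~/.bashrc and the file is reloaded:
user@ubuntu:~/Downloads$ tar -xzf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
user@ubuntu:~/Downloads$ sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
user@ubuntu:~$ sudo gedit ~/.bashrc
user@ubuntu:~$ source ~/.bashrc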
9. To check if Sqoop has been installed successfully, type the command
sqoop version
• Run the command: sudo apt-get install mysql-server, and give an appropriate username and password.
The command sketched below imports the employees table from the sqoop database of MySQL to HDFS.
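A sketch of a typical Sqoop import invocation matching that description (the MySQL username and password are placeholders):
sqoop import --connect jdbc:mysql://localhost/sqoop --username <mysql-user> --password <mysql-password> --table employees -m 1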
Error points