
Faculty: Sana Shaikh Lab Manual - BDA 2024-2025

Experiment No: 2
Name : Jess John Roll No : 32

Batch : B

Topic: Use of the Sqoop tool to transfer data between Hadoop and relational
database servers.
a. Sqoop and MySQL - Installation.
To execute basic commands of Hadoop eco system component Sqoop.
Prerequisites:
o Familiarity with command-line interfaces such as bash
o Basic knowledge of relational database management systems, e.g. MySQL
o Basic familiarity with the purpose and operation of Hadoop

Mapping with COs: CSL704.3

Objective: Ingest data using Sqoop.

Outcome: Students will be able to use the Sqoop tool for transferring data between Hadoop & relational databases.

Instructions: This experiment is a compulsory experiment. All students are required to perform this experiment individually.
Deliverables: SQOOP INSTALLATION

Sqoop is a tool designed to transfer data between Hadoop and relational
database servers. It is used to import data from relational databases such
as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file
system to relational databases. The traditional application management
system, that is, the interaction of applications with relational databases
using an RDBMS, is one of the sources that generate Big Data. Such Big
Data, generated by RDBMS, is stored in Relational Database Servers in
the relational database structure.

When Big Data storages and analyzers such as MapReduce, Hive, HBase,
Cassandra, Pig, etc. of the Hadoop ecosystem came into the picture, they required a tool
to interact with the relational database servers for importing and
exporting the Big Data residing in them. Here, Sqoop occupies a place
in the Hadoop ecosystem to provide feasible interaction between
the relational database server and Hadoop's HDFS.

Sqoop: “SQL to Hadoop and Hadoop to SQL”


The following image describes the workflow of Sqoop.

Sqoop Import
The import tool imports individual tables from an RDBMS into HDFS. Each
row in a table is treated as a record in HDFS. All records are stored as
text data in text files, or as binary data in Avro and Sequence files.

Sqoop Export
The export tool exports a set of files from HDFS back to an RDBMS.
The files given as input to Sqoop contain records, which are called
rows in a table. These are read and parsed into a set of records,
delimited with a user-specified delimiter.

STEPS to install Sqoop:


1. Extract the Sqoop package from the tar file placed on the Desktop. The
extracted package can be seen listed among the files and folders of the
Desktop using the ls command.
ls Desktop

2. Move this extracted folder (sqoop-1.4.6.bin__hadoop-2.0.4-alpha)
from the Desktop to the directory /usr/lib/sqoop using the sudo mv
command.

3. The Sqoop environment can be set up by appending the
following lines to ~/.bashrc; open the file by executing the nano ~/.bashrc command.

Append the following lines to this file:

export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

Then press Ctrl+X, Y, Enter to save and exit.

4. Now reload the .bashrc file so the changes take effect, using the command source ~/.bashrc

5. To configure Sqoop with Hadoop we need to edit the file sqoop-env.sh,
which is present in the directory $SQOOP_HOME/conf.

Rename the template file sqoop-env-template.sh to sqoop-env.sh using the mv command:
mv sqoop-env-template.sh sqoop-env.sh

To add contents to the sqoop-env.sh file, use the command: nano sqoop-env.sh
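The sqoop-env.sh file typically only needs to point Sqoop at the local Hadoop installation. A minimal sketch of its contents, assuming Hadoop is installed under /usr/local/hadoop (an assumed path; substitute your actual Hadoop directory):

```shell
# sqoop-env.sh — tell Sqoop where Hadoop lives.
# /usr/local/hadoop is an assumed path; use your actual HADOOP_HOME.
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
```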

Then press Ctrl+X, Y, Enter to save and exit.

6. Now copy, add or download the mysql-connector-java-5.1.36.tar.gz file onto the
Desktop. Extract this file in the same location.

7. Move the extracted connector jar to the location /usr/lib/sqoop/lib using the mv
command.

cd Desktop
ls
cd mysql-connector-java-5.1.36
ls
mv mysql-connector-java-5.1.36-bin.jar /usr/lib/sqoop/lib
ls /usr/lib/sqoop/lib

8. To check if Sqoop has been installed correctly, move to the directory
$SQOOP_HOME/bin and use the command sqoop version.

cd /usr/lib/sqoop/bin   (equivalently: cd $SQOOP_HOME/bin)
sqoop version

STEPS for MYSQL INSTALLATION

1. After the Sqoop installation, MySQL has to be installed as well. First,
install all the required packages using the command sudo apt-get install
mysql-server.

2. To log in to MySQL as the root user, use the following command: mysql -u root -p
You will be asked to enter the password for the corresponding user. Enter the
password; the MySQL shell will start and the user will be logged in. This
verifies the successful completion of the MySQL installation on the system.

IMPORT/EXPORT
1. We check if all the services are running using the jps command.

2. Then we start the mysql shell with: mysql -u root -p

3. We see the list of databases with: show databases;

4. We create a new database or use an existing database according to the need.

5. Now we create a table in MySQL which we will import into HDFS:
create table Faculty (id int primary key, name varchar(10), city varchar(10), salary bigint);

mysql> Insert into Faculty values(1, 'Sana', 'Mumbai', 95000);


mysql> Insert into Faculty values(2, 'Riya', 'Pune', 85000);
mysql> Insert into Faculty values(3, 'Karan', 'Jaipur', 55000);
mysql> Insert into Faculty values(4, 'Rahul', 'Delhi', 78000);
mysql> Insert into Faculty values(5, 'Bush', 'Mumbai', 75000);
mysql> Insert into Faculty values(6, 'Ram', 'Delhi', 66000);
mysql> Insert into Faculty values(7, 'Slade', 'Pune', 71000);

6. We can output the entries of the table as follows:

8. Now, we grant privileges to the user so that we can perform the import function.
grant all privileges on *.* to 'root'@'localhost';

9. After that we quit the mysql shell.



TRANSFERRING AN ENTIRE TABLE INTO HADOOP:


10. The command is as follows: sqoop import --connect
jdbc:mysql://127.0.0.1:3306/emp --username root --password mySQL12345 --table
Faculty -m 1

hadoop fs -ls Faculty

We can check the output using the cat command, or through the NameNode web UI:

http://localhost:50070/explorer.html#/user/slade/Faculty
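The cat check can be sketched as follows; part-m-00000 is the conventional output file name of a single-mapper import (an assumption — list the directory first to confirm the actual file name):

```shell
# List the imported directory, then print the imported records.
hadoop fs -ls Faculty
hadoop fs -cat Faculty/part-m-00000
```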

The same import shown again, this time on the database Emp; note the role of the -m option:

sqoop import --connect jdbc:mysql://127.0.0.1:3306/Emp --username root --password root --table Faculty -m 1

If -m 1 is not used, the output is not saved in a single partition; Sqoop creates more
than one partition (output file) in HDFS.

Exercises for Students:


SPECIFYING A TARGET DIRECTORY:
1. Specify the target directory in HDFS into which we want the
output to be saved.
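One possible approach can be sketched as follows (not a prescribed answer; the database name, credentials, and directory below are assumptions):

```shell
# Import the Faculty table into an explicit HDFS directory.
# Note: the directory given to --target-dir must not already exist.
sqoop import --connect jdbc:mysql://127.0.0.1:3306/emp \
  --username root -P \
  --table Faculty -m 1 \
  --target-dir /user/slade/faculty_out
```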

IMPORTING ONLY A SUBSET OF DATA:


2. We will now import only a part of the table Faculty:
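One way to import a subset is Sqoop's --where option, sketched below; the condition and target directory are illustrative assumptions:

```shell
# Import only rows matching a SQL condition, using --where.
sqoop import --connect jdbc:mysql://127.0.0.1:3306/emp \
  --username root -P \
  --table Faculty -m 1 \
  --where "salary > 70000" \
  --target-dir /user/slade/faculty_subset
```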

PROTECTING YOUR PASSWORD:

3. There are two ways of specifying the password securely. The first is to write -P in
the command, which prompts for the password at run time.

Implement the second way of specifying the password: write the password in a
file and then specify that file in the command.
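The file-based approach can be sketched as follows. The file location is an assumption; note that Sqoop resolves --password-file paths on HDFS by default, and the file must contain only the password with no trailing newline (hence echo -n):

```shell
# Store the password in a file readable only by the owner,
# then copy it into HDFS where Sqoop will look for it.
echo -n "mySQL12345" > ~/.mysql.password
chmod 400 ~/.mysql.password
hadoop fs -put ~/.mysql.password /user/slade/.mysql.password

# Point Sqoop at the file instead of typing the password.
sqoop import --connect jdbc:mysql://127.0.0.1:3306/emp \
  --username root \
  --password-file /user/slade/.mysql.password \
  --table Faculty -m 1
```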

COMPRESSING IMPORTED DATA:


4. Use a command to compress the imported data.
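Compression is enabled with Sqoop's --compress flag, sketched below; the database name, credentials, and target directory are assumptions:

```shell
# Compress the imported files with gzip (the default codec for --compress).
sqoop import --connect jdbc:mysql://127.0.0.1:3306/emp \
  --username root -P \
  --table Faculty -m 1 \
  --compress \
  --target-dir /user/slade/faculty_gz
# A different codec can be selected with --compression-codec,
# e.g. org.apache.hadoop.io.compress.BZip2Codec
```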


EXPORTING DATA FROM HDFS TO A RELATIONAL DATABASE:

5. Export the data. First create a database:
create database engineer;
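The export itself can be sketched as follows; the target table in the engineer database must already exist with a schema matching the HDFS records, and the HDFS path is an assumption:

```shell
# Export HDFS records back into a MySQL table.
sqoop export --connect jdbc:mysql://127.0.0.1:3306/engineer \
  --username root -P \
  --table Faculty \
  --export-dir /user/slade/Faculty \
  -m 1
```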

Check the table contents.

Create a new table for the export.



Conclusion: Students will be able to use the Sqoop tool for transferring data between
Hadoop & relational databases.
References:
http://moodle.dbit.in/
https://www.edureka.co/blog/apache-sqoop-tutorial/
https://dwgeek.com/sqoop-command-with-secure-password.html/
