Sqoop v1.1

Sqoop is a tool used for transferring bulk data between HDFS and relational databases. It can import data from a database into HDFS or export data from HDFS to a database. Sqoop uses MapReduce jobs and JDBC to connect to databases. It supports importing/exporting to file formats like delimited text and SequenceFiles. Common databases supported include MySQL, SQL Server, PostgreSQL and Oracle. Sqoop also integrates with Hive for importing into tables and HBase for importing into column families.

SQOOP

Karthikeyan
Sanjai

[email protected]

INTRODUCTION

Bulk data transfer tool

Imports and exports data between a relational database and Hadoop for
processing

Runs as a map-only MapReduce job (no reduce phase)

Command-line tool

Integrates with Hive & HBase

Supports plugins via a connector-based architecture


SQOOP ARCHITECTURE


FILE FORMATS

Two file formats:


Delimited text

SequenceFiles


Delimited Text File

Default import format; can be requested explicitly with --as-textfile

Default delimiters are a comma (,) for fields and a newline (\n) for records

Sequence File: --as-sequencefile
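Because of those defaults, an imported file is plain line-oriented text. A minimal local sketch, with hand-written sample rows standing in for a real part-00000 file (no cluster is assumed here; the rows are the patient data used in later slides):

```shell
# Hand-made sample of Sqoop's default delimited output:
# comma between fields, newline between records.
printf '1,saravanan,avil,100\n2,senthil,metacin,200\n3,Gowtham,paracetamol,300\n' > part-00000

# Any line-oriented tool can read it back, e.g. summing the tot_amt column:
total=$(awk -F',' '{ sum += $4 } END { print sum }' part-00000)
echo "$total"   # 600
```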

SUPPORTED DATABASES

Uses a JDBC compatibility layer to talk to the databases

Sample List of DBs


MySQL

MS SQL

PostgreSQL

Oracle

Vendor-specific JDBC drivers need to be added to $SQOOP_HOME/lib
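That step is just a file copy. A self-contained sketch, simulated in a temp directory so it runs anywhere; the jar name and version are placeholders, and in practice SQOOP_HOME points at your real installation:

```shell
# Simulated Sqoop layout (assumption: Sqoop 1.x directory structure).
SQOOP_HOME=$(mktemp -d)
mkdir -p "$SQOOP_HOME/lib"

# Stand-in for the downloaded vendor driver, e.g. MySQL Connector/J
# (hypothetical file name and version).
touch mysql-connector-java-8.0.33.jar

# The actual step: drop the driver jar into Sqoop's lib directory.
cp mysql-connector-java-*.jar "$SQOOP_HOME/lib/"
ls "$SQOOP_HOME/lib/"
```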


INSTALLATION
Download sqoop-*.tar.gz

tar -xvf sqoop-*.tar.gz

export HADOOP_HOME=/some/path/hadoop-dir

Add the vendor-specific JDBC jar to $SQOOP_HOME/lib

Change to the Sqoop bin folder and run:

./sqoop help

SQOOP COMMANDS

sqoop help
Or: sqoop COMMAND [ARGS]

Available commands:

codegen            Generate code to interact with database records
create-hive-table  Import a table definition into Hive
eval               Evaluate a SQL statement and display the results
export             Export an HDFS directory to a database table
help               List available commands
import             Import a table from a database to HDFS
import-all-tables  Import tables from a database to HDFS
list-databases     List available databases on a server
list-tables        List available tables in a database
version            Display version information

MYSQL CONNECTIVITY

mysql -u root -p
Enter password: root

mysql> show databases;

mysql> use test;

mysql> CREATE TABLE patient(pid INT(10), name VARCHAR(20), drug VARCHAR(20), tot_amt INT(10));

mysql> INSERT INTO patient VALUES(1,'saravanan','avil',100);

mysql> INSERT INTO patient VALUES(2,'senthil','metacin',200);

mysql> INSERT INTO patient VALUES(3,'Gowtham','paracetamol',300);

mysql> SELECT * FROM patient;

SQOOP EVALUATE
Evaluate a SQL statement:

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient"

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "SELECT * FROM patient LIMIT 2"

bin/sqoop eval --connect jdbc:mysql://localhost/test --username root --password root --query "INSERT INTO patient VALUES(4,'amudhan','avil',400)"

SQOOP LIST
sqoop-list-databases:

bin/sqoop list-databases --connect jdbc:mysql://localhost/ --username root --password root

(lists databases such as information_schema and test)

sqoop-list-tables:

bin/sqoop list-tables --connect jdbc:mysql://localhost/test --username root --password root

SQOOP IMPORT - MYSQL TO HDFS

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient -m 1

Imports the patient table into an HDFS directory

Sqoop generates a Java file (patient.java) for our use; alternatively, we can use codegen

Data is imported as text or SequenceFiles

bin/hadoop dfs -cat /user/username/patient/part-00000

All values are displayed

These files can be used as input to MR jobs.

SQOOP IMPORT (CONTD)

Increasing parallelism (number of mappers):

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --split-by pid -m 2

Target directory:

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --target-dir /user/output -m 1

MySQL to HDFS, import-all-tables:

bin/sqoop import-all-tables --connect jdbc:mysql://localhost/test --username root --password root -m 1

HIVE INTEGRATION

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hive-table patienthive --create-hive-table --hive-import -m 1

SQOOP - HIVE IMPORT


HBASE

bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table patient --hbase-table patienthbase2 --column-family datasqoop --hbase-row-key pid --hbase-create-table -m 1

SQOOP HBASE IMPORT


SQOOP EXPORT

Exports a set of files from HDFS back to an RDBMS

The target table must already exist in the database

The input files are read and parsed into a set of records according to the user-specified delimiters

Does not export from HBase

HDFS to MySQL:

bin/sqoop export --connect jdbc:mysql://localhost/test --username root --password root --table patient --export-dir /user/amudhan/patient

Hive to MySQL:

bin/sqoop export --connect jdbc:mysql://localhost/test --username root --password root --table patient --export-dir /user/hive/warehouse/patient -m 1
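Since export parses input by delimiter, each record must line up with the target table's columns. A local sketch of a valid export input file for the patient table, using the default comma delimiter (the rows here are made-up sample records):

```shell
# Export input for patient(pid, name, drug, tot_amt):
# one comma-delimited record per line, fields in table-column order.
printf '4,amudhan,avil,400\n5,kumar,metacin,250\n' > patient-export.txt

# Sanity-check that every record has exactly 4 fields before pointing
# sqoop export at the directory containing this file:
bad=$(awk -F',' 'NF != 4 { n++ } END { print n+0 }' patient-export.txt)
echo "$bad"   # 0
```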

THANK YOU!

