How Sqoop Works: Relational Database Servers in the Relational Database Structure
When Big Data storage and analysis tools such as MapReduce, Hive, HBase, Cassandra, and
Pig in the Hadoop ecosystem came into the picture, they required a tool to interact
with relational database servers for importing and exporting the Big Data residing in
them. Sqoop occupies this place in the Hadoop ecosystem, providing feasible
interaction between relational database servers and Hadoop’s HDFS.
Sqoop is a tool designed to transfer data between Hadoop and relational database
servers. It is used to import data from relational databases such as MySQL and Oracle
into Hadoop HDFS, and to export data from the Hadoop file system back to relational
databases. It is provided by the Apache Software Foundation.
Sqoop Export
The export tool exports a set of files from HDFS back to an RDBMS. The files given
as input to Sqoop contain records, which become rows in the target table. These files
are read and parsed into a set of records, delimited with a user-specified delimiter.
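For reference, the export tool follows the same command pattern as the import tool described next; its syntax is:
$ sqoop export (generic-args) (export-args)
A typical export reads the delimited files under an HDFS directory and inserts them into an existing target table, for example (the table name and HDFS directory here are illustrative, not taken from the sample schema):
sqoop export --connect jdbc:mysql://192.168.80.132:3306/books --username root --table authors_export --export-dir /sqoop/mysqlstage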
Syntax
The following syntax is used to import data into HDFS
$ sqoop import (generic-args) (import-args)
Example in MySQL
list-databases
Lists the databases available on your MySQL server.
$ sqoop list-databases --connect jdbc:mysql://192.168.80.134:3306/employees --username root
information_schema
employees
test
list-tables
Lists the tables in your MySQL database.
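A minimal sketch of the command, assuming the same connection parameters as the list-databases example above:
$ sqoop list-tables --connect jdbc:mysql://192.168.80.134:3306/employees --username root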
departments
dept_emp
dept_manager
employees
employees_exp_stg
employees_export
salaries
titles
Commonly used arguments of the import tool include:
Argument                        Description
--append                        Append data to an existing dataset in HDFS
--as-avrodatafile               Imports data to Avro Data Files
--as-textfile                   Imports data as plain text (default)
--boundary-query <statement>    Boundary query to use for creating splits
--columns <col,col,col…>        Columns to import from table
--direct                        Use direct import fast path
--direct-split-size <n>         Split the input stream every n bytes when importing in direct mode
-m,--num-mappers <n>            Use n map tasks to import in parallel
-e,--query <statement>          Import the results of statement
--split-by <column-name>        Column of the table used to split work units
--table <table-name>            Table to read
--target-dir <dir>              HDFS destination directory
--where <where clause>          WHERE clause to use during import
-z,--compress                   Enable compression
--compression-codec <c>         Use Hadoop codec (default gzip)
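As an illustration of how several of these arguments combine, the following sketch imports only selected columns and rows from a MySQL table and compresses the output (the column names and WHERE clause are assumptions, not taken from the sample schema):
sqoop import --connect jdbc:mysql://192.168.80.132:3306/books --username root \
  --table authors --columns "id,name,email" --where "id > 100" \
  --target-dir /sqoop/authors_filtered -m 1 -z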
Importing a Table
The Sqoop ‘import’ tool is used to import data from an RDBMS table into the Hadoop file
system as a text file or a binary file.
The following command is used to import the authors table from the MySQL database
server to HDFS.
sqoop import --connect jdbc:mysql://192.168.80.132:3306/books --username root
--table authors --target-dir /sqoop/mysqlstage -m 1
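After the import finishes, the records are written as comma-delimited text under the target directory; with a single mapper (-m 1) they end up in one part file, which can be inspected with:
$ hdfs dfs -cat /sqoop/mysqlstage/part-m-00000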
Incremental Import
Incremental import is a technique that imports only the newly added rows of a
table. The ‘incremental’, ‘check-column’, and ‘last-value’ options are required to
perform an incremental import.
Argument                Description
--check-column (col)    Specifies the column to be examined when determining which rows to import.
--incremental (mode)    Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)    Specifies the maximum value of the check column from the previous import.
You should specify append mode when importing a table where new
rows are continually being added with increasing row id values. You
specify the column containing the row's id with --check-column. Sqoop
imports rows where the check column has a value greater than the
one specified with --last-value.
--incremental <mode> --check-column <column name> --last-value <last check column value>
The result is:
1,Vivek,[email protected]
2,Priya,[email protected]
3,Tom,[email protected]
The above options import all the new rows added since the last value.
In HDFS, a new file containing all the new records is created in the same target directory.
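For concreteness, a full incremental append command for the authors table imported earlier might look like the following sketch (the id check column and the starting last value of 0 are assumptions about the schema):
sqoop import --connect jdbc:mysql://192.168.80.132:3306/books --username root \
  --table authors --target-dir /sqoop/mysqlstage -m 1 \
  --incremental append --check-column id --last-value 0
On later runs, --last-value is set to the highest id already imported, so only rows added after that value are fetched.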
Now consider a second case, where existing rows have been updated:
+------+------------+----------+------+------------+
| sid | city | state | rank | rDate |
+------+------------+----------+------+------------+
| 101 | Chicago | Illinois | 1 | 2015-01-01 |
| 101 | Schaumburg | Illinois | 3 | 2014-01-25 |
| 101 | Columbus | Ohio | 7 | 2014-01-25 |
| 103 | Charlotte | NC | 9 | 2013-04-22 |
| 103 | Greenville | SC | 9 | 2013-05-12 |
| 103 | Atlanta | GA | 11 | 2013-08-21 |
| 104 | Dallas | Texas | 4 | 2015-02-02 |
| 105 | Phoenix | Arzona | 17 | 2015-02-24 |
+------+------------+----------+------+------------+
Here we use incremental lastmodified mode, which fetches all the rows updated after a
given date (based on the rDate column), as sketched below.
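A sketch of such a lastmodified import, assuming the rows above live in a table named city_ranks and that the previous import ran on 2014-01-25 (both assumptions); note that if the target directory already exists, recent Sqoop versions may additionally require --append or a --merge-key column to reconcile the updated rows:
sqoop import --connect jdbc:mysql://192.168.80.134:3306/employees --username root \
  --table city_ranks --target-dir /sqoop/city_ranks_updates -m 1 \
  --incremental lastmodified --check-column rDate --last-value "2014-01-25"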
Sqoop Job
The Sqoop job tool creates and saves import and export commands. It specifies
parameters to identify and recall the saved job. This re-calling or re-execution is
used in incremental import, which can import the updated rows from an RDBMS
table to HDFS.
Syntax
The following is the syntax for creating a Sqoop job
$ sqoop job (generic-args) (job-args)
[-- [subtool-name] (subtool-args)]
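As a sketch, the incremental import shown earlier can be saved as a job and re-executed on demand (the job name is arbitrary; connection details are carried over from the earlier examples):
$ sqoop job --create authors_incremental -- import --connect jdbc:mysql://192.168.80.132:3306/books \
  --username root --table authors --target-dir /sqoop/mysqlstage -m 1 \
  --incremental append --check-column id --last-value 0
$ sqoop job --list
$ sqoop job --show authors_incremental
$ sqoop job --exec authors_incremental
Because a saved job records the new last value after each run, re-executing it picks up only the rows added since the previous execution.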