
Apache Sqoop: Vasanth B 2019202060

Apache Sqoop is a tool that transfers data between Hadoop and relational databases. It efficiently moves large amounts of data between HDFS and external data stores like databases. Sqoop uses SQL to import data from databases like MySQL and Oracle into HDFS and to export data from HDFS to databases. It fills the need for transferring data between relational databases and Hadoop systems.


APACHE SQOOP

VASANTH B
2019202060
SQOOP

Apache Sqoop is a tool in the Hadoop ecosystem that
is designed to transfer data from an RDBMS to HDFS
and vice versa.

It efficiently transfers bulk data between Hadoop and
external data stores such as enterprise data
warehouses, relational databases, etc.

SQOOP – SQL to Hadoop & Hadoop to SQL.

Sqoop transfers data between Hadoop and
relational DB servers.

Sqoop is used to import data from relational DBs
such as MySQL and Oracle into HDFS.

Sqoop is used to export data from HDFS to
relational DB.
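
As a rough illustration of the two directions, the sketch below builds the
argument lists for a basic `sqoop import` and `sqoop export` call. The JDBC URL,
user, password file, table and HDFS paths are made-up placeholders, and the
helper function names are hypothetical; only the `sqoop` flags themselves come
from the Sqoop 1 command line.

```python
import shlex


def sqoop_import_args(jdbc_url, user, password_file, table, target_dir):
    """Argument list for a basic import: RDBMS table -> HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,            # source table in the RDBMS
        "--target-dir", target_dir,  # destination directory in HDFS
    ]


def sqoop_export_args(jdbc_url, user, password_file, table, export_dir):
    """Argument list for a basic export: HDFS directory -> RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,            # destination table in the RDBMS
        "--export-dir", export_dir,  # source directory in HDFS
    ]


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    url = "jdbc:mysql://db.example.com/shop"
    print(shlex.join(sqoop_import_args(url, "etl", "/user/etl/.pw", "orders", "/data/orders")))
    print(shlex.join(sqoop_export_args(url, "etl", "/user/etl/.pw", "orders_out", "/data/orders")))
```

The printed lines can be pasted into a shell on a machine where Sqoop is
installed, or the lists can be passed to `subprocess.run`.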
Sqoop Uses:

Before Sqoop, developers had to write their own code to import
and export data between Hadoop and an RDBMS, so a tool was
needed to do the same job.

Sqoop uses the MapReduce mechanism for operations such as
import and export, which gives it parallel execution as
well as fault tolerance.
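
To make the parallel part concrete: an import is usually divided among several
map tasks by splitting the table on the value range of one column (the
`--split-by` column, with `--num-mappers` tasks). The sketch below is a
simplified illustration of that idea for an integer column, not Sqoop's actual
splitter code; the function name `split_ranges` is hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into contiguous,
    near-equal chunks, one per mapper."""
    if hi < lo or num_mappers < 1:
        raise ValueError("need lo <= hi and at least one mapper")
    total = hi - lo + 1
    num = min(num_mappers, total)    # never more chunks than values
    base, extra = divmod(total, num)
    ranges = []
    start = lo
    for i in range(num):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Each mapper would read only the rows whose key falls in its range.
    for first, last in split_ranges(1, 1000, 4):
        print(f"WHERE id >= {first} AND id <= {last}")
```

Each range becomes an independent query, so a failed map task can be retried
without repeating the work of the others.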

Sqoop filled this gap in data transfer between
relational databases and Hadoop systems.
Sqoop Architecture

Sqoop sits between the two systems:
Import: RDBMS → HDFS
Export: HDFS → RDBMS
Limitations of Sqoop

A Sqoop job cannot be paused and resumed. If it fails, we need
to clean things up and start again.

Sqoop export performance also depends on the hardware
configuration of the RDBMS server.

Sqoop can be slow because it still uses MapReduce for backend
processing.

Failures need special handling in case of partial import and
export.
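
One common way to handle partial results is sketched below, under assumptions:
for an export, write to a staging table first (`--staging-table`, with
`--clear-staging-table` to empty it before the run) so the real table is only
touched if the whole job succeeds; for an import, remove leftover output with
`--delete-target-dir` before rerunning. The helper names and all values are
hypothetical placeholders; only the flags come from the Sqoop 1 command line.

```python
def safe_export_args(jdbc_url, table, export_dir, staging_table):
    """Export through a staging table so a failed job does not leave
    the destination table half-filled."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
        "--staging-table", staging_table,  # rows land here first
        "--clear-staging-table",           # drop leftovers from a failed run
    ]


def rerun_import_args(jdbc_url, table, target_dir):
    """Rerun an import from scratch, deleting partial output first."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--delete-target-dir",             # remove the old HDFS directory
    ]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/shop"  # placeholder
    print(" ".join(safe_export_args(url, "orders_out", "/data/orders", "orders_out_stage")))
    print(" ".join(rerun_import_args(url, "orders", "/data/orders")))
```

The staging table is assumed to exist already and to have the same structure as
the destination table.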
Over to hands on..
