Apache Sqoop: Vasanth B 2019202060
SQOOP
●
Apache Sqoop is a tool in the Hadoop ecosystem that
is designed to transfer data between HDFS and an
RDBMS, in both directions.
●
It efficiently transfers bulk data between Hadoop and
external data stores such as enterprise data
warehouses and relational databases.
●
SQOOP – SQL to Hadoop & Hadoop to SQL.
●
Sqoop transfers data between Hadoop and
relational database servers.
●
Sqoop is used to import data from relational databases
such as MySQL and Oracle into HDFS.
●
Sqoop is used to export data from HDFS to
a relational database.
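The import and export directions described above map onto the `sqoop import` and `sqoop export` commands. The following is a minimal Python sketch that only builds the command lines and does not run them. The JDBC URL, user name, table name, and HDFS paths are made-up placeholders, and the helper function names are hypothetical.

```python
import shlex


def sqoop_import_cmd(jdbc_url, username, table, target_dir, num_mappers=4):
    """Build the argument list for a `sqoop import` run (RDBMS -> HDFS)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source database
        "--username", username,
        "-P",                           # prompt for the password at run time
        "--table", table,               # source table to read
        "--target-dir", target_dir,     # HDFS directory to write into
        "--num-mappers", str(num_mappers),
    ]


def sqoop_export_cmd(jdbc_url, username, table, export_dir):
    """Build the argument list for a `sqoop export` run (HDFS -> RDBMS)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,               # destination table, which must already exist
        "--export-dir", export_dir,     # HDFS directory holding the data to export
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    url = "jdbc:mysql://dbhost/sales"
    print(shlex.join(sqoop_import_cmd(url, "etl_user", "orders", "/data/orders")))
    print(shlex.join(sqoop_export_cmd(url, "etl_user", "orders_summary", "/data/summary")))
```

The printed lines can be pasted into a shell on a machine where Sqoop is installed, or the lists can be passed to `subprocess.run`.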
Sqoop Uses:
●
Before Sqoop, developers had to write their own code to import
and export data between Hadoop and an RDBMS, so a tool was
needed to do the same job.
●
Sqoop uses the MapReduce mechanism for its import and
export operations, so the work runs in parallel and
is fault tolerant.
●
Sqoop filled the gap in transferring data
between relational databases and the Hadoop system.
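To make the parallel mechanism concrete: on import, Sqoop by default looks up the minimum and maximum of a split column (usually the primary key) and gives each map task one slice of that range. The sketch below is a simplified illustration of that idea, not Sqoop's actual code. The function names, table name, and column name are hypothetical, and the exact boundaries Sqoop chooses may differ.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous
    half-open ranges [lo, hi), one per mapper."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    parts = min(num_mappers, total)      # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    lo = min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first ranges
        ranges.append((lo, lo + size))
        lo += size
    return ranges


def mapper_queries(table, column, min_id, max_id, num_mappers):
    """Return one bounded SELECT per mapper (illustrative SQL only)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in split_ranges(min_id, max_id, num_mappers)
    ]


if __name__ == "__main__":
    for query in mapper_queries("orders", "id", 0, 1000, 4):
        print(query)
```

Each query can be run by a separate map task, which is why an evenly distributed split column matters for balanced work.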
Sqoop Architecture
RDBMS --(Import via Sqoop)--> HDFS
HDFS --(Export via Sqoop)--> RDBMS
Limitations of Sqoop
●
A Sqoop job cannot be paused and resumed. If it fails, we need
to clean things up and start again.
●
Sqoop export performance also depends on the hardware
configuration of the RDBMS server.
●
Sqoop is slow because it still uses MapReduce for its backend
processing.
●
Failures need special handling in the case of a partial import or
export.
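One way to reduce the impact of a partial export is Sqoop's staging table option: rows are first written to an auxiliary table and moved to the destination table only after the whole export succeeds. The sketch below only builds such a command line in Python. The connection details, table names, and the helper function name are hypothetical, and it assumes a staging table with the same structure as the destination already exists.

```python
import shlex


def sqoop_export_with_staging(jdbc_url, username, table, export_dir, staging_table):
    """Build a `sqoop export` argument list that stages rows before the final move."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                               # prompt for the password at run time
        "--table", table,                   # final destination table
        "--export-dir", export_dir,         # HDFS directory to read from
        "--staging-table", staging_table,   # auxiliary table that receives the rows first
        "--clear-staging-table",            # empty leftovers from an earlier failed run
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    cmd = sqoop_export_with_staging(
        "jdbc:mysql://dbhost/sales", "etl_user",
        "orders_summary", "/data/summary", "orders_summary_stage",
    )
    print(shlex.join(cmd))
```

For a failed import, the usual cleanup is to remove the partially written HDFS target directory before running the job again.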
Over to the hands-on...