Practice Assignment

Sqoop is a tool for transferring bulk data between Hadoop and structured data stores like relational databases. It allows importing data from databases into HDFS and exporting data from HDFS to databases in either full loads of entire tables or incremental loads of updated data. Sqoop's key features include robustness, support for full and incremental loads, parallel import/export using YARN, importing SQL query results, compression, connectors for major databases, Kerberos security, and loading data directly into Hive or HBase.

Uploaded by

hitaarnav
Copyright
© All Rights Reserved

SQOOP AND ITS CHARACTERISTICS

Submitted By:
Aman Bhatia
Sap ID: 500075254
Roll no: R172219010
CSE Big Data (BATCH1)
Write a description about Sqoop and its characteristics.

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache
Hadoop and structured data stores such as relational databases.
Traditional application management systems, in which applications interact with
relational databases through an RDBMS, are one source of Big Data. The data they
generate is stored on relational database servers in relational form.

When Big Data storage and analysis tools of the Hadoop ecosystem, such as MapReduce,
Hive, HBase, Cassandra, and Pig, came into the picture, they needed a tool that could
interact with relational database servers to import and export the Big Data residing
in them. Sqoop fills this place in the Hadoop ecosystem: it provides the interaction
between relational database servers and Hadoop's HDFS.

Sqoop − “SQL to Hadoop and Hadoop to SQL”

Sqoop is a tool designed to transfer data between Hadoop and relational database
servers. It is used to import data from relational databases such as MySQL and Oracle
into Hadoop HDFS, and to export data from the Hadoop file system back to relational
databases. It is provided by the Apache Software Foundation.
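
The two directions of transfer can be illustrated with a minimal Python sketch that only assembles the command lines for `sqoop import` and `sqoop export`; it does not run Sqoop. The JDBC URL, user name, table name, and HDFS paths are hypothetical placeholders, and a real invocation would also need a password option and a working Hadoop installation.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir):
    """Build the argument list for a basic import (RDBMS table -> HDFS directory)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC connection string of the source database
        "--username", username,
        "--table", table,            # table to read from
        "--target-dir", target_dir,  # HDFS directory to write into
    ]


def sqoop_export_args(jdbc_url, username, table, export_dir):
    """Build the argument list for a basic export (HDFS directory -> RDBMS table)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,            # existing table to write into
        "--export-dir", export_dir,  # HDFS directory holding the data to export
    ]


# Hypothetical values, for illustration only.
import_cmd = sqoop_import_args(
    "jdbc:mysql://db.example.com/shop", "etl_user", "orders", "/data/orders"
)
export_cmd = sqoop_export_args(
    "jdbc:mysql://db.example.com/shop", "etl_user", "orders_summary", "/data/summary"
)
print(shlex.join(import_cmd))
print(shlex.join(export_cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so values containing spaces are quoted correctly when the command is printed or passed to a process launcher.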
Characteristics of Apache Sqoop
The various key features of Apache Sqoop are:

1. Robust: Apache Sqoop is highly robust. It is backed by community support and
contributions and is easy to use.

2. Full Load: Using Sqoop, we can load a whole table with a single Sqoop command.
Sqoop also allows us to load all the tables of a database with a single Sqoop
command.

3. Incremental Load: Sqoop supports incremental load functionality. Using Sqoop, we
can load only the new or changed parts of a table whenever it is updated.

4. Parallel import/export: Apache Sqoop uses the YARN framework to import and
export data. This provides fault tolerance on top of parallelism.

5. Import results of SQL query: Sqoop also allows us to import the result returned by
a SQL query into the Hadoop Distributed File System.

6. Compression: We can compress our data either by using the deflate (gzip) algorithm
with the --compress argument or by specifying a codec with the --compression-codec
argument. We can also load a compressed table into Apache Hive.

7. Connectors for all the major RDBMS Databases: Sqoop provides connectors for
various RDBMS databases, covering nearly all of the widely used ones.

8. Kerberos Security Integration: Kerberos is a computer network authentication
protocol that uses 'tickets' to allow nodes communicating over a non-secure network
to prove their identity to each other. Apache Sqoop provides support for Kerberos
authentication.

9. Load data directly into Hive/HBase: Using Sqoop, we can load data directly into
Hive for data analysis. We can also load our data into HBase, a NoSQL database.

10. Support for Accumulo: We can instruct Apache Sqoop to import a table into
Accumulo instead of importing it into a directory in HDFS.
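
Several of the features above correspond to command-line arguments of `sqoop import`. The following is a minimal Python sketch that builds those arguments without running Sqoop. The helper names, column names, query, and paths are hypothetical and chosen only for illustration. The flags themselves (`--incremental`, `--check-column`, `--last-value`, `--query`, `--split-by`, `--num-mappers`, `--compress`, `--compression-codec`) are standard Sqoop import options.

```python
def incremental_args(check_column, last_value, mode="append"):
    """Arguments for an incremental load (feature 3)."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "--incremental", mode,
        "--check-column", check_column,  # column examined to find new rows
        "--last-value", str(last_value), # largest value seen by the previous import
    ]


def parallel_args(num_mappers):
    """Arguments controlling the number of parallel map tasks (feature 4)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return ["--num-mappers", str(num_mappers)]


def query_args(query, split_by, target_dir):
    """Arguments for importing the result of a SQL query (feature 5)."""
    # Sqoop replaces the $CONDITIONS token with a per-task range predicate.
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query must contain $CONDITIONS")
    return ["--query", query, "--split-by", split_by, "--target-dir", target_dir]


def compression_args(codec=None):
    """Arguments enabling compression (feature 6); gzip is used when no codec is given."""
    args = ["--compress"]
    if codec is not None:
        args += ["--compression-codec", codec]
    return args


# Hypothetical example: import rows added since id 1000, using 4 map tasks, compressed.
base = ["sqoop", "import", "--connect", "jdbc:mysql://db.example.com/shop",
        "--table", "orders", "--target-dir", "/data/orders"]
cmd = base + incremental_args("id", 1000) + parallel_args(4) + compression_args()
print(" ".join(cmd))
```

Keeping each feature in its own small helper mirrors the list above: the options can be combined freely, and invalid combinations (such as a query without `$CONDITIONS`) are rejected before any command is launched.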
