Practice Assignment
CHARACTERISTICS
Submitted By:
Aman Bhatia
SAP ID: 500075254
Roll No.: R172219010
CSE Big Data (BATCH1)
Write a description about Sqoop and its characteristics.
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache
Hadoop and structured data stores such as relational databases.
The traditional application management system, that is, applications interacting with a
relational database through an RDBMS, is one of the sources that generate Big Data.
Such Big Data, generated by the RDBMS, is stored in relational database servers in the
relational database structure.
When the Big Data stores and analysis tools of the Hadoop ecosystem, such as MapReduce,
Hive, HBase, Cassandra and Pig, came into the picture, they required a tool to interact
with relational database servers for importing and exporting the Big Data residing in
them. Sqoop occupies this place in the Hadoop ecosystem, providing the interaction
between relational database servers and Hadoop's HDFS.
Sqoop is a tool designed to transfer data between Hadoop and relational database
servers. It is used to import data from relational databases such as MySQL and Oracle
into Hadoop HDFS, and to export data from the Hadoop file system back to relational
databases. It is provided by the Apache Software Foundation. A basic import and export
are sketched below.
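As a minimal sketch of both directions (the connection URL, user name, table names and
directory paths below are placeholders, not values from this assignment):

  # Import a MySQL table into HDFS
  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --target-dir /user/hadoop/employees

  # Export the imported files back into a relational table
  sqoop export \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees_copy \
    --export-dir /user/hadoop/employees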
Characteristics of Apache Sqoop
The key features of Apache Sqoop are listed below; example commands illustrating
several of them are sketched after the list.
1. Robust: Apache Sqoop is highly robust. It has active community support and
contribution and is easy to use.
2. Full Load: Using Sqoop, we can load a whole table with a single Sqoop command.
Sqoop also allows us to load all the tables of a database with a single Sqoop
command.
3. Parallel import/export: Apache Sqoop uses the YARN framework for importing and
exporting data, which provides fault tolerance on top of parallelism.
4. Import results of an SQL query: Sqoop also allows us to import the result set
returned by an SQL query into the Hadoop Distributed File System.
5. Compression: We can compress our data either by using the deflate (gzip) algorithm
with the --compress argument or by specifying a codec with the --compression-codec
argument. We can then load the compressed table into Apache Hive.
6. Connectors for all major RDBMS databases: Sqoop provides connectors for multiple
RDBMS databases, covering almost all of the commonly used ones.
7. Load data directly into Hive/HBase: Using Sqoop, we can load data directly into
Apache Hive for analysis. We can also dump our data into HBase, which is a NoSQL
database.
8. Support for Accumulo: We can instruct Apache Sqoop to import a table into
Accumulo rather than into a directory in HDFS.
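For the full-load feature (item 2), a minimal sketch that imports every table of a
database in one command (host, db and the user name are placeholders):

  sqoop import-all-tables \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --warehouse-dir /user/hadoop/db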
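For parallel import/export (item 3), the degree of parallelism is set by the number of
map tasks; a sketch, assuming the table has a numeric id column to split on:

  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --split-by id \
    --num-mappers 8 \
    --target-dir /user/hadoop/employees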
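For importing the result of an SQL query (item 4), Sqoop requires the literal token
$CONDITIONS in the WHERE clause so that it can partition the query across map tasks; a
sketch:

  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --query 'SELECT e.id, e.name FROM employees e WHERE $CONDITIONS' \
    --split-by e.id \
    --target-dir /user/hadoop/employee_names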
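For compression (item 5), --compress enables the default deflate (gzip) codec and
--compression-codec selects another one; a sketch using the Snappy codec:

  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --target-dir /user/hadoop/employees_compressed \
    --compress \
    --compression-codec org.apache.hadoop.io.compress.SnappyCodec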
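For loading directly into Hive or HBase (item 7), a sketch (the Hive table name, HBase
table, column family and row key column are placeholders):

  # Straight into a Hive table
  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --hive-import \
    --hive-table employees

  # Into an HBase table instead
  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --hbase-table employees \
    --column-family info \
    --hbase-row-key id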
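For Accumulo support (item 8), a sketch; the Accumulo instance, ZooKeeper address,
credentials, table and column family are placeholders, and the exact set of required
flags may vary with the Sqoop version:

  sqoop import \
    --connect jdbc:mysql://host/db \
    --username user -P \
    --table employees \
    --accumulo-table employees \
    --accumulo-column-family info \
    --accumulo-user auser \
    --accumulo-password apass \
    --accumulo-instance accumulo_instance \
    --accumulo-zookeepers zkhost:2181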