
APACHE SQOOP

Introduction to Apache Sqoop


• Most applications interact with their data through a relational database (RDBMS), which makes relational databases one of the most important sources of Big Data. Such data is stored in relational database servers in a relational (table) structure. Apache Sqoop plays an important role in the Hadoop ecosystem by providing a practical bridge between relational database servers and HDFS.
Advantages of Apache Sqoop
• Apache Sqoop is a tool in the Hadoop ecosystem designed to transfer data between HDFS (Hadoop storage) and relational database servers such as MySQL, Oracle, SQLite, Teradata, Netezza, PostgreSQL, etc. Apache Sqoop imports data from relational databases into HDFS and exports data from HDFS back to relational databases. It efficiently transfers bulk data between Hadoop and external data stores such as enterprise data warehouses and relational databases.
• This is how Sqoop got its name – “SQL to Hadoop & Hadoop to SQL”.
• Additionally, Sqoop is used to import data from external datastores into
Hadoop ecosystem’s tools like Hive & HBase.
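The two directions correspond to two Sqoop subcommands (detailed options appear on later slides):

  sqoop import    # RDBMS table   -> files in HDFS
  sqoop export    # files in HDFS -> RDBMS table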
Why Sqoop?
• For Hadoop developers, the real work starts after the data is loaded into HDFS. They explore this data to uncover the insights hidden in it.

• For this analysis, the data residing in relational database management systems needs to be transferred to HDFS. Writing MapReduce code to import and export data between a relational database and HDFS is uninteresting and tedious. This is where Apache Sqoop comes to the rescue and removes that pain: it automates the process of importing and exporting the data.
Why Sqoop?
• Sqoop makes developers' lives easier by providing a CLI for importing and exporting data. They only have to supply basic information such as database authentication details, the source, the destination, and the operation; Sqoop takes care of the rest.

• Sqoop internally converts the command into MapReduce tasks, which are then executed on the Hadoop cluster over HDFS. It uses the YARN framework to import and export the data, which provides fault tolerance on top of parallelism.
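As a rough sketch, a typical import needs little more than the connection details, the source table, and the destination directory; the JDBC URL, credentials, and names below are placeholders:

  sqoop import \
      --connect jdbc:mysql://dbserver:3306/salesdb \
      --username dbuser -P \
      --table customers \
      --target-dir /user/hadoop/customers

Here -P prompts for the password interactively; Sqoop turns the command into map-only MapReduce tasks that write the table's rows into the HDFS directory.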
Key Features of Sqoop
Sqoop provides many salient features (command-line sketches follow this list):
1. Full Load: Apache Sqoop can load a whole table with a single command. You can also load all the tables from a database with a single command.
2. Incremental Load: Apache Sqoop also provides the facility of incremental load, where you can load just the parts of a table that have been updated.
3. Parallel import/export: Sqoop uses the YARN framework to import and export the data, which provides fault tolerance on top of parallelism.
4. Import results of SQL query: You can also import the result returned by an SQL query into HDFS.
5. Compression: You can compress your data using the deflate (gzip) algorithm with the --compress argument, or by specifying the --compression-codec argument. You can also load a compressed table into Apache Hive.
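As rough command-line sketches of the features above (database, table, and column names are placeholders):

  # 1 & 3: full load of one table, split across 4 parallel mappers
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders --target-dir /data/orders -m 4

  # 2: incremental load that appends only rows whose id exceeds the last imported value
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders --target-dir /data/orders \
      --incremental append --check-column id --last-value 100000

  # 4: import the result of a free-form SQL query (Sqoop requires the $CONDITIONS token)
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --query 'SELECT id, amount FROM orders WHERE $CONDITIONS' \
      --split-by id --target-dir /data/order_amounts

  # 5: compress the imported files (gzip/deflate by default; --compression-codec picks another codec)
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders --target-dir /data/orders_gz --compress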
Key Features of Sqoop
• Connectors for all major RDBMS databases: Apache Sqoop provides connectors for multiple RDBMS databases, covering almost all of the commonly used relational databases.
• Kerberos Security Integration: Kerberos is a computer network
authentication protocol which works on the basis of ‘tickets’ to allow
nodes communicating over a non-secure network to prove their
identity to one another in a secure manner. Sqoop supports Kerberos
authentication.
• Load data directly into HIVE/HBase: You can load data directly into
Apache Hive for analysis and also dump your data in HBase, which is a
NoSQL database.
• Support for Accumulo: You can also instruct Sqoop to import a table into Accumulo rather than a directory in HDFS.
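For example, assuming a hypothetical customers table, the same import can be directed into Hive or HBase instead of a plain HDFS directory:

  # import straight into a Hive table
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table customers --hive-import --hive-table customers

  # import into an HBase table, keyed on the customer id
  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table customers --hbase-table customers --column-family info \
      --hbase-row-key id --hbase-create-table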
Sqoop Architecture & Working
Sqoop Architecture
• The import tool imports individual tables from RDBMS to HDFS. Each row in a table
is treated as a record in HDFS.
• When we submit a Sqoop command, the main task is divided into subtasks, each handled internally by an individual map task. Each map task imports a part of the data into the Hadoop ecosystem; collectively, all the map tasks import the whole data set.
How Sqoop Works?
Sqoop Import
• The import tool imports individual tables from RDBMS to HDFS. Each row in a table
is treated as a record in HDFS. All records are stored as text data in text files or as
binary data in Avro and Sequence files.
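The storage format is chosen with a flag on the import command; a sketch with placeholder names:

  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders --target-dir /data/orders_avro --as-avrodatafile
  # alternatives: --as-textfile (the default), --as-sequencefile, --as-parquetfile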
Sqoop Export
• The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table. They are read and parsed into a set of records using the user-specified delimiter.
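A minimal export sketch, assuming the HDFS files are comma-delimited and the target table already exists in the database:

  sqoop export --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders_summary --export-dir /data/orders_summary \
      --input-fields-terminated-by ','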
Apache Sqoop - Working
• Export also works in a similar manner.
• The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table.
• When we submit an export job, it is mapped into map tasks, each of which brings a chunk of data from HDFS. These chunks are exported to a structured data destination. Combining all the exported chunks, we receive the whole data set at the destination, which in most cases is an RDBMS (MySQL/Oracle/SQL Server).
Apache Sqoop - Working
• A reduce phase is required only when aggregations are performed. Apache Sqoop just imports and exports the data; it does not perform any aggregations, so its jobs are map-only. The map job launches multiple mappers, depending on the number defined by the user.
• For a Sqoop import, each mapper task is assigned a part of the data to be imported. Sqoop distributes the input data evenly among the mappers to get high performance.
• Each mapper then creates a connection to the database using JDBC, fetches the part of the data assigned to it by Sqoop, and writes it into HDFS, Hive, or HBase based on the arguments provided on the CLI.
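The degree of parallelism and the column used to split rows among mappers can be set explicitly; a sketch with placeholder names:

  sqoop import --connect jdbc:mysql://dbserver/salesdb --username dbuser -P \
      --table orders --target-dir /data/orders \
      --num-mappers 8 --split-by order_id
  # Sqoop computes the MIN/MAX of the split column, gives each mapper one range,
  # and each mapper fetches its range over its own JDBC connection.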
Flume vs Sqoop
The major difference between Flume and Sqoop is that:
•Flume only ingests unstructured or semi-structured data into HDFS.
•Sqoop, on the other hand, can both import and export structured data between an RDBMS or enterprise data warehouse and HDFS.
THANK YOU
