
Apache Sqoop
Apache Sqoop is a command-line interface application for transferring data between
relational databases and Hadoop.

1)What is the default file format to import data using Apache Sqoop?

Answer)Sqoop allows data to be imported using two file formats


i) Delimited Text File Format
This is the default file format for importing data using Sqoop. It can be specified explicitly with the
--as-textfile argument to the Sqoop import command. Passing this argument produces a string-based
representation of every record in the output files, with delimiter characters between fields and
newlines between records.
ii) Sequence File Format
This is a binary file format in which records are stored in custom record-specific data types that are
exposed as Java classes. Sqoop automatically creates these data types and manifests them as Java
classes.
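As a minimal sketch (the connection string, table name, and target directories here are placeholder assumptions), the format can be selected explicitly on the command line:
$ sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --as-textfile --target-dir /data/customers_text
$ sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --as-sequencefile --target-dir /data/customers_seq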

2)How do I resolve a Communications Link Failure when connecting to MySQL?

Answer)Verify that you can connect to the database from the node where you are running Sqoop:
$ mysql --host=<IP address> --database=test --user=<username> --password=<password>
Add the network port for the server to your my.cnf file.
Set up a user account to connect via Sqoop and grant that user permission to access the database
over the network:
Log into MySQL as root: mysql -u root -p
Issue the following command: mysql> grant all privileges on test.* to 'testuser'@'%' identified by
'testpassword';

3)How do I resolve an IllegalArgumentException when connecting to Oracle?

Answer)This could be caused by a non-owner trying to connect to the table, so prefix the table name
with the schema, for example SchemaName.OracleTableName.

4)What's causing this "Exception in thread main java.lang.IncompatibleClassChangeError" when
running non-CDH Hadoop with Sqoop?

Answer)Try building Sqoop 1.4.1-incubating with the command line property -Dhadoopversion=20.

5)How do I resolve an ORA-00933 (SQL command not properly ended) error when connecting
to Oracle?

Answer)Omit the option --driver oracle.jdbc.driver.OracleDriver and then re-run the Sqoop
command.

6)I have around 300 tables in a database. I want to import all the tables from the database
except the tables named Table298, Table123, and Table299. How can I do this without
having to import the tables one by one?

Answer)This can be accomplished using the import-all-tables command in Sqoop and by
specifying the --exclude-tables option with it, as follows:

sqoop import-all-tables --connect <jdbc-uri> --username <user> --password <password> \
  --exclude-tables Table298,Table123,Table299

7)Does Apache Sqoop have a default database?

Answer)Yes, MySQL is the default database.

8)How can I import large objects (BLOB and CLOB objects) in Apache Sqoop?

Answer)The Apache Sqoop import command does not support direct import of BLOB and CLOB large
objects. To import large objects in Sqoop, JDBC-based imports have to be used, without the --direct
argument to the import utility.

9)How can you execute a free form SQL query in Sqoop to import the rows in a sequential
manner?

Answer)This can be accomplished using the -m 1 option in the Sqoop import command. It will
create only one MapReduce task, which will then import the rows serially.

10)How will you list all the columns of a table using Apache Sqoop?

Answer)Unlike sqoop-list-tables and sqoop-list-databases, there is no direct command such as
sqoop-list-columns to list all the columns of a table. The indirect way of achieving this is to query the
column metadata of the desired table and redirect it to a file, which can then be viewed manually to
see the column names of that table.

sqoop import --m 1 \
  --connect 'jdbc:sqlserver://nameofmyserver;databaseName=nameofmydatabase;user=DeZyre;password=mypassword' \
  --query "SELECT column_name, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'mytableofinterest' AND \$CONDITIONS" \
  --target-dir mytableofinterest_column_name

11)What is the difference between Sqoop and DistCP command in Hadoop?

Answer)Both distCP (Distributed Copy in Hadoop) and Sqoop transfer data in parallel but the only
difference is that distCP command can transfer any kind of data from one Hadoop cluster to
another whereas Sqoop transfers data between RDBMS and other components in the Hadoop
ecosystem like HBase, Hive, HDFS, etc.

12)What is Sqoop metastore?

Answer)Sqoop metastore is a shared metadata repository that lets remote users define and execute
saved jobs (created using the sqoop job command) stored in the metastore. The sqoop-site.xml file
should be configured to connect to the metastore.

13)What is the significance of using the --split-by clause for running parallel import tasks in
Apache Sqoop?

Answer)The --split-by clause is used to specify the column of the table that is used to generate
splits for data imports. This clause specifies the column that will be used for splitting when
importing the data into the Hadoop cluster. The --split-by clause helps achieve improved
performance through greater parallelism. Apache Sqoop will create splits based on the values
present in the column specified in the --split-by clause of the import command. If the --split-by
clause is not specified, then the primary key of the table is used to create the splits during data
import. At times the primary key of the table might not have evenly distributed values between
the minimum and maximum range. Under such circumstances the --split-by clause can be used to
specify some other column that has an even distribution of data, so that the splits make the data
import efficient.

14)You use the --split-by clause but it still does not give optimal performance; how will you
improve the performance further?

Answer)Use the --boundary-query clause. Generally, Sqoop uses the query
select min(<split-by column>), max(<split-by column>) from <table>
to find out the boundary values for creating splits. However, if this query is not optimal, the
--boundary-query argument lets you supply any arbitrary query that returns two numeric columns
to be used as the boundaries.
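A minimal sketch, assuming a hypothetical employees table with a numeric id column and an active flag (the connection string and paths are also placeholders):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table employees \
  --split-by id \
  --boundary-query "SELECT MIN(id), MAX(id) FROM employees WHERE active = 1" \
  --target-dir /data/employees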


15)During sqoop import, you use the clause -m or --num-mappers to specify the number
of mappers as 8 so that it can run eight parallel MapReduce tasks; however, Sqoop runs
only four parallel MapReduce tasks. Why?

Answer)The Hadoop MapReduce cluster is configured to run a maximum of 4 parallel MapReduce
tasks, so the Sqoop import can only be configured with a number of parallel tasks less than or equal
to 4, not more.

16)You successfully imported a table using Apache Sqoop to HBase but when you query the
table it is found that the number of rows is less than expected. What could be the likely
reason?

Answer)If the imported records have rows that contain null values for all the columns, then
those records were probably dropped during the import, because HBase does not allow null
values in all the columns of a record.

17)The incoming value from HDFS for a particular column is NULL. How will you load that
row into an RDBMS in which the columns are defined as NOT NULL?

Answer)Using the --input-null-string parameter, a default value can be specified so that the row
gets inserted with the default value for the column that has a NULL value in HDFS.
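A minimal sketch showing where this parameter appears on an export command line (the connection string, table, directory, and marker strings are placeholder assumptions; the values passed tell Sqoop which string in the HDFS files stands for a null field):
sqoop export --connect jdbc:mysql://dbhost/salesdb --table customers \
  --export-dir /data/customers \
  --input-null-string '\\N' --input-null-non-string '\\N'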
18)If the source data gets updated every now and then, how will you synchronise the data
in HDFS that is imported by Sqoop?

Answer)Data can be synchronised using the --incremental parameter with the data import.
The --incremental parameter can be used with one of two options:
i) append - If the table is getting updated continuously with new rows and increasing row id values,
then incremental import with the append option should be used. The values of a check column
(specified using --check-column) are examined, and only rows with values greater than the previous
maximum are imported as new rows.
ii) lastmodified - In this kind of incremental import, the source has a date column which is checked.
Any records that have been updated after the last import, based on the lastmodified column in the
source, are imported.
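A minimal sketch of an append-mode incremental import (the connection string, table, check column, and last value are placeholder assumptions):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table orders \
  --incremental append --check-column order_id --last-value 10000 \
  --target-dir /data/orders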

19)The below command is used to specify the connect string that contains the hostname to
connect to MySQL with localhost and the database name test_db:
--connect jdbc:mysql://localhost/test_db
Is the above command the best way to specify the connect string in case I want to use
Apache Sqoop with a distributed hadoop cluster?

Answer)When using Sqoop with a distributed Hadoop cluster, the URL should not be specified
with localhost in the connect string, because the connect string will be applied on all the
DataNodes in the Hadoop cluster. So, if the literal name localhost is mentioned instead of the
IP address or the complete hostname, then each node will connect to a different database on
its own localhost. It is always suggested to specify a hostname that can be seen by all remote
nodes.

20)What are the relational databases supported in Sqoop?

Answer)Below is the list of RDBMSs that are currently supported by Sqoop:
MySQL
PostgreSQL
Oracle
Microsoft SQL Server
IBM Netezza
Teradata

21)What are the destination types allowed in Sqoop Import command?

Answer)Currently Sqoop supports data import into the below services:


HDFS
Hive
HBase
HCatalog
Accumulo

22)Is Sqoop similar to distcp in hadoop?

Answer)Partially yes, Hadoop's distcp command is similar to the Sqoop import command. Both
submit parallel map-only jobs, but distcp is used to copy any type of files from Local FS/HDFS to
HDFS, whereas Sqoop transfers data records only between an RDBMS and Hadoop ecosystem
services such as HDFS, Hive and HBase.

23)What are the majorly used commands in Sqoop?

Answer)In Sqoop, the import and export commands are the most frequently used, but the below
commands are also useful at times:
codegen
eval
import-all-tables
job
list-databases
list-tables
merge
metastore

24)While loading tables from MySQL into HDFS, if we need to copy tables with maximum
possible speed, what can you do?

Answer)We need to use the --direct argument in the import command to use the direct import fast
path; this --direct option can be used only with MySQL and PostgreSQL as of now.
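A minimal sketch of a direct-mode import (the connection string, table, and directory are placeholder assumptions):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --direct --target-dir /data/customers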

25)While connecting to MySQL through Sqoop, I am getting a Connection Failure exception.
What might be the root cause and fix for this error scenario?

Answer)This might be due to insufficient permissions to access your MySQL database over the
network. To confirm this, we can try the below command to connect to the MySQL database from
Sqoop's client machine:
$ mysql --host=<MySQL node> --database=test --user=<username> --password=<password>
If this is the case, then we need to grant permissions to the user at the Sqoop client machine, as per
the answer to Question 2 above.
26)What is the importance of the eval tool?

Answer)It allows users to run sample SQL queries against the database and preview the results on
the console.
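A minimal sketch (the connection string, user, and query are placeholder assumptions):
sqoop eval --connect jdbc:mysql://dbhost/salesdb --username dbuser -P \
  --query "SELECT * FROM customers LIMIT 10"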

27)What is the process to perform an incremental data load in Sqoop?

Answer)The process of performing an incremental data load in Sqoop is to synchronize the modified
or updated data (often referred to as delta data) from the RDBMS to Hadoop. The delta data can be
facilitated through the incremental load command in Sqoop.
Incremental load can be performed by using the Sqoop import command or by loading the data into
Hive without overwriting it. The different attributes that need to be specified during an incremental
load in Sqoop are:
1)Mode (--incremental) - The mode defines how Sqoop will determine what the new rows are. The
mode can have the value append or lastmodified.
2)Col (--check-column) - This attribute specifies the column that should be examined to find out the
rows to be imported.
3)Value (--last-value) - This denotes the maximum value of the check column from the previous
import operation.

28)What is the significance of using the --compression-codec parameter?

Answer)To get the output file of a Sqoop import in a format other than .gz, such as .bz2, we use the
--compression-codec parameter.
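A minimal sketch producing .bz2 output (the connection string, table, and directory are placeholder assumptions):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --compress --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
  --target-dir /data/customers_bz2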

29)Can free form SQL queries be used with Sqoop import command? If yes, then how can
they be used?

Answer)Sqoop allows us to use free form SQL queries with the import command. The import
command should be used with the -e or --query option to execute free form SQL queries.
When using the -e or --query option with the import command, the --target-dir value must be
specified.
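A minimal sketch of a free-form query import (the connection string, query, split column, and directory are placeholder assumptions; the literal $CONDITIONS token is required so that Sqoop can inject its split predicates):
sqoop import --connect jdbc:mysql://dbhost/salesdb \
  --query "SELECT o.id, o.total, c.name FROM orders o JOIN customers c ON o.cust_id = c.id WHERE \$CONDITIONS" \
  --split-by o.id --target-dir /data/order_report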

30)What is the purpose of sqoop-merge?

Answer)The merge tool combines two datasets, where entries in one dataset overwrite entries of an
older dataset, preserving only the newest version of the records between the two datasets.
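A minimal sketch (the directories, key column, jar, and class name are placeholder assumptions; the jar and class would normally be the ones generated by Sqoop during the original import):
sqoop merge --new-data /data/customers_new --onto /data/customers_old \
  --target-dir /data/customers_merged \
  --jar-file customers.jar --class-name customers --merge-key id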

31)How do you clear the data in a staging table before loading it by Sqoop?

Answer)By specifying the --clear-staging-table option we can clear the staging table before it is
loaded. This can be repeated until the staging table contains the proper data.

32)How will you update the rows that are already exported?

Answer)The parameter --update-key can be used to update existing rows. It takes a
comma-separated list of columns which uniquely identify a row. All of these columns are
used in the WHERE clause of the generated UPDATE query. All other table columns will be used in
the SET part of the query.
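A minimal sketch of an update export (the connection string, table, directory, and key column are placeholder assumptions):
sqoop export --connect jdbc:mysql://dbhost/salesdb --table customers \
  --export-dir /data/customer_updates \
  --update-key id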

33)What is the role of the JDBC driver in a Sqoop set up?

Answer)To connect to different relational databases Sqoop needs a connector. Almost every DB
vendor makes this connector available as a JDBC driver which is specific to that DB. So Sqoop
needs the JDBC driver of each of the databases it needs to interact with.

34)When to use --target-dir and when to use --warehouse-dir while importing data?

Answer)To specify a particular directory in HDFS use --target-dir, but to specify the parent
directory of all the Sqoop jobs use --warehouse-dir. In this case, under the parent directory Sqoop
will create a directory with the same name as the table.
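A minimal sketch of both variants (the connection string, table, and directories are placeholder assumptions; the second command ends up writing to /warehouse/customers):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --target-dir /data/customers
sqoop import --connect jdbc:mysql://dbhost/salesdb --table customers \
  --warehouse-dir /warehouse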

35)When the source data keeps getting updated frequently, what is the approach to keep
it in sync with the data in HDFS imported by Sqoop?

Answer)Sqoop offers two approaches:
a - Use the --incremental parameter with the append option, where the value of a check column is
examined and only rows with new (greater) values are imported as new rows.
b - Use the --incremental parameter with the lastmodified option, where a date column in the
source is checked for records which have been updated after the last import.
36)Is it possible to add a parameter while running a saved job?

Answer)Yes, we can add an argument to a saved job at runtime by using the --exec option:
sqoop job --exec jobname -- --newparameter
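A minimal sketch of defining a saved job and then passing an extra argument at execution time (the job name, connection string, table, and directory are placeholder assumptions):
sqoop job --create orders_import -- import \
  --connect jdbc:mysql://dbhost/salesdb --table orders --target-dir /data/orders
sqoop job --exec orders_import -- --num-mappers 2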
37)Before starting the data transfer using a MapReduce job, Sqoop takes a long time to
retrieve the minimum and maximum values of the column mentioned in the --split-by parameter.
How can we make it efficient?

Answer)We can use the --boundary-query parameter, in which we specify the min and max values
for the column on which the split will happen across multiple MapReduce tasks. This makes it
faster, as the query inside the --boundary-query parameter is executed first and the job is ready
with the information on how many MapReduce tasks to create before executing the main query.

38)How will you implement an all-or-nothing load using Sqoop?

Answer)Using the --staging-table option, we first load the data into a staging table and then load it
to the final target table only if the staging load is successful.
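A minimal sketch (the connection string, tables, and directory are placeholder assumptions; the staging table must have the same structure as the target table):
sqoop export --connect jdbc:mysql://dbhost/salesdb --table orders \
  --export-dir /data/orders_out \
  --staging-table orders_stage --clear-staging-table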

39)How will you update the rows that are already exported?

Answer)The parameter --update-key can be used to update existing rows. It takes a
comma-separated list of columns which uniquely identify a row. All of these columns are
used in the WHERE clause of the generated UPDATE query. All other table columns will be used in
the SET part of the query.


40)How can you sync an exported table with HDFS data in which some rows are deleted?

Answer)Truncate the target table and load it again.

41)How can we load a value into a NOT NULL column of a relational table when the incoming
value from HDFS is null?

Answer)By using the --input-null-string parameter we can specify a default value, and that will
allow the row to be inserted into the target table.

42)How can you schedule a sqoop job using Oozie?

Answer)Oozie has an in-built Sqoop action inside which we can mention the Sqoop commands to be
executed.

43)Sqoop imported a table successfully to HBase but it is found that the number of rows is
fewer than expected. What can be the cause?

Answer)Some of the imported records might have null values in all the columns. As HBase does
not allow all-null values in a row, those rows get dropped.

44)How can you force Sqoop to execute a free form SQL query only once and import the
rows serially?

Answer)By using the -m 1 clause in the import command, Sqoop creates only one MapReduce
task, which will import the rows sequentially.

45)In a sqoop import command you have mentioned to run 8 parallel MapReduce tasks but
Sqoop runs only 4. What can be the reason?

Answer)The MapReduce cluster is configured to run 4 parallel tasks, so the Sqoop command must
have a number of parallel tasks less than or equal to that of the MapReduce cluster.

46)What happens when a table is imported into an HDFS directory which already exists,
using the --append parameter?

Answer)Using the --append argument, Sqoop will import data to a temporary directory and then
rename the files into the normal target directory in a manner that does not conflict with existing
filenames in that directory.

47)How do you import only the updated rows from a table into HDFS using Sqoop, assuming the
source has last-update timestamp details for each row?

Answer)By using the lastmodified mode. Rows where the check column holds a timestamp more
recent than the timestamp specified with --last-value are imported.
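A minimal sketch of a lastmodified incremental import (the connection string, table, check column, timestamp, and directory are placeholder assumptions):
sqoop import --connect jdbc:mysql://dbhost/salesdb --table orders \
  --incremental lastmodified --check-column updated_at \
  --last-value "2012-11-09 00:00:00" \
  --target-dir /data/orders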

48)What does the following query do?
$ sqoop import --connect jdbc:mysql://host/dbname --table EMPLOYEES \
  --where "start_date > '2012-11-09'"

Answer)It imports the employees who have joined after 9-Nov-2012.
49)Give a Sqoop command to import all the records from the employee table divided into
groups of records by the values in the column department_id.

Answer)$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
  --split-by dept_id

50)What does the following query do?
$ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
  --where "id > 1000" --target-dir /incremental_dataset --append

Answer)It performs an incremental import of new data, after having already imported the first
1000 rows of the table.
