Sqoop
a. ValidationThreshold
• Determines whether the error margin between the source and target row
counts is acceptable. Implementations include Absolute and Percentage
Tolerant; the default implementation is AbsoluteValidationThreshold.
• The default ensures that the row counts from the source and the target
are exactly the same.
b. ValidationFailureHandler
• This interface is responsible for handling validation failures, for
example by logging an error or warning, or aborting the job. The default
implementation is LogOnFailureHandler, which logs a warning message to
the configured logger.
c. Validator
• Validator drives the validation logic and delegates failure handling
to ValidationFailureHandler. The default implementation is
RowCountValidator.
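The three components above correspond to Sqoop's validation flags. A minimal sketch of how they fit into an import command, assuming the local MySQL userdb database and emp table used elsewhere in these notes (the validation class names are Sqoop's documented defaults; the command is only assembled and printed here, since running it needs a live Sqoop/Hadoop setup):

```shell
# Placeholder connection details (localhost/userdb, user root);
# the three --validation-* classes are Sqoop's default implementations.
VALIDATE_CMD="sqoop import \
  --connect jdbc:mysql://localhost/userdb \
  --username root \
  --table emp \
  --m 1 \
  --validate \
  --validator org.apache.sqoop.validation.RowCountValidator \
  --validation-threshold org.apache.sqoop.validation.AbsoluteValidationThreshold \
  --validation-failurehandler org.apache.sqoop.validation.LogOnFailureHandler"

# Show the assembled command.
echo "$VALIDATE_CMD"
```

With `--validate` alone, Sqoop already uses these three defaults; the explicit class flags only matter when swapping in a different threshold or failure handler.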
COMMANDS
Syntax to import data into HDFS (this is
also the syntax used with Sqoop validation).
• $ sqoop import (generic-args) (import-args)
• $ sqoop-import (generic-args) (import-args)
Incremental Import
• Incremental import is a technique that imports only
the newly added rows of a table. The ‘--incremental’,
‘--check-column’, and ‘--last-value’ options must be
added to perform an incremental import.
• The following syntax is used for the incremental
option in Sqoop import command.
--incremental <mode>
--check-column <column name>
--last-value <last check column value>
• Let us assume the newly added data in the emp
table is as follows −
1206, satish p, grp des, 20000, GR
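Putting the three options together, an append-mode incremental import that picks up only this new row might look as follows. This is a sketch under two assumptions: the key column is named id, and the previous import ended at id 1205 (so only row 1206 is new); the command is assembled and printed rather than executed:

```shell
# Placeholder connection details; --check-column id and --last-value 1205
# are assumptions based on the sample row 1206 being the only new data.
INCR_CMD="sqoop import \
  --connect jdbc:mysql://localhost/userdb \
  --username root \
  --table emp \
  --m 1 \
  --incremental append \
  --check-column id \
  --last-value 1205"

echo "$INCR_CMD"
```

On the next run, --last-value would be advanced to 1206, the highest id seen so far.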
COMMANDS
• The following command is used to import
the emp table from the MySQL database server to
HDFS.
• $ sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp --m 1
• To verify the imported data in HDFS, use the
following command.
• $ $HADOOP_HOME/bin/hadoop fs -cat
/emp/part-m-*
Import tables
• Sqoop can import all tables from the RDBMS
database server to HDFS. Each table's data is
stored in a separate directory, and the
directory name is the same as the table name.
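Sqoop exposes this as the import-all-tables tool. A sketch using the same placeholder connection details as above (note that --table is omitted, since every table in userdb is imported; the command is assembled and printed, not run):

```shell
# Placeholder connection details; no --table flag, because
# import-all-tables imports every table in the userdb database,
# creating one HDFS directory per table.
ALL_TABLES_CMD="sqoop import-all-tables \
  --connect jdbc:mysql://localhost/userdb \
  --username root"

echo "$ALL_TABLES_CMD"
```

Each imported table can then be inspected in HDFS with the same `hadoop fs -cat` pattern shown earlier, substituting the table's directory name.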