Sqoop
==============Hadoop Commands==============================
hadoop namenode -format
hadoop fs -mkdir <path>
hadoop fs -copyFromLocal <local file path> <hadoop destination path>
hadoop fs -copyToLocal <hadoop destination path> <local file path>
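a quick usage sketch (the local and HDFS paths are just examples):
hadoop fs -mkdir /data/sqoop
hadoop fs -copyFromLocal /home/user/employees.csv /data/sqoop/employees.csv
hadoop fs -copyToLocal /data/sqoop/employees.csv /home/user/employees_copy.csv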
input
split
map
shuffle/sort
reduce
output
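a rough word-count trace through these stages (the input is made up):
input        : "a b a"
split        : the input file is split into records, one per line
map          : (a,1) (b,1) (a,1)
shuffle/sort : (a,[1,1]) (b,[1])
reduce       : (a,2) (b,1)
output       : a 2, b 1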
Hadoop will not overwrite an existing file; this behaviour is disabled by the
developers because the file system is shared and an overwrite could destroy
another user's file.
We need to modify the hadoop-site.xml file to change the directory where Hadoop
saves its data. Once changed, there is no need to format the namenode every
time we start Hadoop.
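a minimal sketch of that change in hadoop-site.xml, assuming hadoop.tmp.dir is
the property that controls where Hadoop stores its data (the path is only an
example):
<property>
<name>hadoop.tmp.dir</name>
<value>/home/user/hadoop_data</value>
</property>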
==========mysql COMMANDS======================================
show databases;
use <database name>;
show tables;
show columns from <table name>;
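for the database used in the import examples below, this looks like:
use vaibhav;
show tables;
show columns from employees;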
===============SQOOP IMPORT==================================
import the whole employees table (by default Sqoop uses 4 mappers):
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--username root \
--password hr \
--target-dir /data/sqoop/example_1
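to inspect the imported data (Sqoop writes map-output files named
part-m-00000, part-m-00001, ...):
hadoop fs -ls /data/sqoop/example_1
hadoop fs -cat /data/sqoop/example_1/part-m-00000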
import with a single mapper:
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--username root \
--password hr \
--target-dir /data/sqoop/example_2 \
-m 1
(here the number of mappers cannot be more than 1; we set it to 1 because
there is no primary key on the table, so Sqoop cannot divide the data among
4 mappers)
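(alternatively, --split-by <column> tells Sqoop which column to split on, so
more than one mapper can be used even without a primary key; see the free-form
query section below)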
import only selected columns:
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--columns 'FIRST_NAME,LAST_NAME,EMPLOYEE_ID,SALARY' \
--username root \
--password hr \
--target-dir /data/sqoop/example_3
import selected columns, filtered by a where clause:
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--columns 'FIRST_NAME,LAST_NAME,EMPLOYEE_ID,SALARY' \
--where 'SALARY>5000' \
--username root \
--password hr \
--target-dir /data/sqoop/example_4
import rows whose first name starts with A:
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--columns 'FIRST_NAME,LAST_NAME,EMPLOYEE_ID,SALARY' \
--where 'FIRST_NAME LIKE "A%"' \
--username root \
--password hr \
--target-dir /data/sqoop/example_5
=======================================================================
SQOOP IMPORT (Free-Form query import) (split by clause and $CONDITIONS)
=======================================================================
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--query 'select * from employees join DEPT on (employees.DEPARTMENT_ID = DEPT.DID) WHERE $CONDITIONS' \
--username root \
--password hr \
--target-dir /data/sqoop/example_6 \
--split-by 'EMPLOYEE_ID'
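(Sqoop replaces $CONDITIONS with a different range condition for each mapper,
derived from the --split-by column, so each mapper imports its own slice of
the query result)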
To troubleshoot a problem we always refer to the logs; read the message in the
error log to determine the issue.
=============HOMEWORK=========================
password file : instead of specifying the password on the command line, place
it in a file and point the command at that file (see the sketch below)
incremental import : import only the new rows into HDFS instead of the whole
table again (see the sketch below)
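a sketch of the password file approach (the file names and paths are examples);
--password-file reads the password from a file, by default a path on HDFS:
echo -n 'hr' > /home/user/.sqoop_pass
hadoop fs -put /home/user/.sqoop_pass /user/hadoop/.sqoop_pass
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--username root \
--password-file /user/hadoop/.sqoop_pass \
--target-dir /data/sqoop/example_7

a sketch of an incremental append import (the --last-value of 100 is only an
example; on the next run, pass the last value printed by the previous run):
sqoop import \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--username root \
--password hr \
--target-dir /data/sqoop/example_8 \
--incremental append \
--check-column EMPLOYEE_ID \
--last-value 100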
==================SQOOP EXPORT==================
while exporting, the data in HDFS should be in the same format as the employees
table, i.e. all the constraints should match; the target table must already
exist in MySQL before the export runs
sqoop export \
--connect jdbc:mysql://localhost:3306/vaibhav \
--table employees \
--username root \
--password hr \
--export-dir /data/sqoop/example_1
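a quick sanity check after the export, run from the mysql client (the count is
whatever your table holds):
select count(*) from employees;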
==============================================================