
Week 3

The document discusses using Sqoop to import and export data between databases and HDFS.
Some key points covered include:
- Sqoop Import is used to transfer data from a database to HDFS, while Sqoop Export moves data from HDFS to a database.
- Imports use MapReduce jobs with mappers that divide the work based on the table's primary key by default.
- Imports and exports can be customized through options like compression, column selection, and partitioning.
- Staging tables are used during exports to avoid partial data transfers if a job fails.
- Incremental imports allow importing only new or updated records over time rather than reprocessing the full table each time.


# SQOOP IMPORT EXERCISE

=======================

SESSION - 1
============

Sqoop Import - Databases to HDFS (the more frequently used direction)

Sqoop Export - HDFS to Databases

Sqoop Eval - to run queries on the database

sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera

sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera

sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select * from retail_db.customers limit 10"

SESSION - 2
============

INSERT INTO people VALUES (101,'Raj','Pali','Itwara chowk','Yavatmal');

Sqoop import
=============

(transfers data from your relational DB to HDFS)

It runs as a MapReduce job - only mappers do the work, there is no reducer.

By default there are 4 mappers, and yes, we can change the number of mappers.

These mappers divide the work based on the primary key.

If there is no primary key, then what will happen? You have two options:

1. you change the number of mappers to 1, or

2. you specify a split-by column.

sqoop-eval \
--connect "jdbc:mysql://10.0.2.15:3306" \
--username retail_dba \
--password cloudera \
--query "describe retail_db.orders"

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--target-dir /queryresult

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \
--target-dir peopleresult

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \   {the people table doesn't contain a P.K., therefore we set the mappers to 1}
-m 1 \             {if you don't set the mappers to 1 it will give an error}
--target-dir peopleresult

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \
-m 1 \
--warehouse-dir peopleresult1

Now my path will be: peopleresult1/people

Target dir vs. Warehouse dir
=============================
Say there is an employee table that you are importing from mysql.

In case of target directory, the directory path mentioned is the final path where data is copied:
/data

In case of warehouse directory, the system will create a subdirectory with the table name:
/data/employee
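
A quick way to see the difference is to list both locations after an import. This is a
minimal sketch; the part-file names and count depend on the number of mappers:

hadoop fs -ls /data            {target-dir: the part-m-* files land directly under /data}
hadoop fs -ls /data/employee   {warehouse-dir: Sqoop first creates the employee subdirectory}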

sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--as-sequencefile \
-m 4 \
--warehouse-dir /user/cloudera/sqoopdir

SESSION - 3
============
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera

sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
-P   {your password will not be shown on the console; you are prompted for it instead}

How to redirect the logs for later use?
----------------------------------------

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /queryresult4 1>query.out 2>query.err

1>query.out will mostly contain the output content (e.g. in the case of an eval command).
2>query.err will contain all the other logs and errors. (You can choose any name for these
files; they are created in the cwd from where the command is run.)
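
For example, after the job finishes you can inspect both files from the directory where the
command was run (file names as chosen above):

cat query.out        {query/eval output, if any}
tail -50 query.err   {MapReduce progress, warnings and errors}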

Boundary query
===============

In sqoop import the work is divided among the mappers based on the primary key.

Employee table
===============
empId, empname, age, salary (empId is the primary key)
0
1
2
3
4
5
6
.
.
100000

the mappers by default will be 4.

Question - how will the mappers distribute the work on the basis of the P.K.?

Sqoop finds the max of the primary key and the min of the primary key, then:

split size = (max_of_pk - min_of_pk) / num_mappers

(100000 - 0)/4
100000/4 = 25000

split size = 25000


mapper1 0 - 25000
mapper2 25001 - 50000
mapper3 50001 - 75000
mapper4 75001 - 100000
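
Under the hood, Sqoop first runs a boundary value query to get the min and max of the split
column, and then gives each mapper its own range. A rough sketch of the generated SQL (not
the exact statements Sqoop emits):

-- boundary value query, run once
SELECT MIN(empId), MAX(empId) FROM Employee;

-- each mapper then reads one range, e.g. mapper1 roughly runs
SELECT * FROM Employee WHERE empId >= 0 AND empId <= 25000;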

SESSION - 4
============

sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--compress \
--warehouse-dir /user/cloudera/compressresult

sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--compression-codec BZip2Codec \
--warehouse-dir /user/cloudera/bzipcompresult

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--columns order_id,order_customer_id,order_status \
--where "order_status in ('complete','closed')" \   {the where clause is also applied to the boundary value query}
--warehouse-dir /user/cloudera/customimportresult

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--boundary-query "SELECT 1, 68883" {Here we are hardcoding the min & max for
BVQ due to outlier}
--warehouse-dir /user/cloudera/ordersboundval

SESSION - 5
============

sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--columns order_id,order_customer_id,order_status \
--where "order_status in ('processing')" \   {the where clause is internally added to the boundary value query as well, no matter what}
--warehouse-dir /user/cloudera/whereclauseresult

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table order_no_pk \   {this will fail because the table has no P.K., so the mappers don't know how to divide the work among themselves}
--warehouse-dir /ordersnopk

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table order_no_pk \
--split-by order_id \
--target-dir /ordersnopk

sqoop import-all-tables \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--warehouse-dir /user/cloudera/autoreset1mresult \
--autoreset-to-one-mapper \   {uses one mapper if a table with no P.K. is encountered}
--num-mappers 2

{If you have 100 tables and 98 of them have a P.K. while the remaining 2 don't, then for the
tables with a P.K. 2 mappers will work, and for the tables without a P.K. it will
automatically fall back to 1 mapper.}

SESSION - 6
============

sqoop create-hive-table \   {creates an empty table in hive based on the metadata in mysql}
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \   {by default the hive table gets the same name as the source table, but we can change it}
--hive-table emps \   {the table in hive should be named emps, carrying the metadata of the orders table present in mysql}
--fields-terminated-by ','
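
To confirm that the empty emps table was created with the orders metadata, you can describe
it from the hive CLI (a quick check, assuming the table was created in the default database):

hive -e "DESCRIBE emps;"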

# SQOOP EXPORT EXERCISE


=======================

SESSION - 1
============

SQOOP EXPORT

IS USED TO TRANSFER DATA FROM HDFS TO RDBMS.

CREATE TABLE card_transactions (
transaction_id INT,
card_id BIGINT,
member_id BIGINT,
amount INT,
postcode INT,
pos_id BIGINT,
transaction_dt varchar(255),
status varchar(255),
PRIMARY KEY(transaction_id)
);

WE HAVE CARD_TRANS.CSV ON THE DESKTOP LOCALLY IN CLOUDERA.

WE SHOULD BE MOVING THIS FILE FROM LOCAL TO HDFS

hadoop fs -mkdir /data

hadoop fs -put Desktop/card_trans.csv /data

sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--export-dir /data/card_trans.csv \
--fields-terminated-by ","

2 IMPORTANT THINGS:

1. Why did the job fail? {check your job tracking URL}

2. If a job fails, how do we make sure that the target table is not impacted?
{that means nothing should be transferred if the job fails, i.e. there should be no partial transfer}

Caused by:
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
Duplicate entry '345925144288000-10-10-2017 18:02:40' for key 'PRIMARY'

>>Concept: a staging table comes into play to avoid a partial transfer of data.

>>First, create a table with the same schema in the mysql database, with "stage" attached to the name:

CREATE TABLE card_transactions_stage (
card_id BIGINT,
member_id BIGINT,
amount INT(10),
postcode INT(10),
pos_id BIGINT,
transaction_dt varchar(255),
status varchar(255),
PRIMARY KEY (card_id, transaction_dt)
);

>>Now, run the export command with --staging-table <table name>:

sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /data/card_transactions.csv \
--fields-terminated-by ','

>>If only partial records get transferred, the partial records are kept in the stage table
and are not transferred to the main table.
>>If the data is successfully transferred to the staging table, then MySQL migrates the data
from the stage table to the main table, and the stage table becomes empty because the data
has been migrated.
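
You can verify this behaviour with sqoop-eval after the export finishes - the stage count
should be 0 and the main count should match the number of exported records (a sketch using
the table names created above):

sqoop-eval \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--query "select (select count(*) from card_transactions) as main_cnt, (select count(*) from card_transactions_stage) as stage_cnt"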

SESSION - 8
============

sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /user/cloudera/data/card_transactions_new.csv \
--fields-terminated-by ','

SESSION - 9
============

Incremental Import

Say there is an orders table in mysql with 50000 records, and order_id is the primary key.

100 new orders are coming tomorrow in the orders table.

You have already imported the 50000 records using sqoop import, so reprocessing the full
table again would be wasteful. In such a case you should go with incremental import.

2 choices
==========

1. append mode - used when there are no updates in the data, just new inserts.

2. lastmodified mode - used when we need to capture the updates also. In this case we use a
date column on the basis of which we try to fetch the data.

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /data \
--incremental append \
--check-column order_id \
--last-value 0   {meaning: import every record whose order_id is > 0}

insert into orders values(68884,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68885,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68886,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68887,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68888,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68889,'2014-07-23 00:00:00',5522,'COMPLETE');

>>commit

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /data \
--incremental append \
--check-column order_id \
--last-value 68883 \
--append

SESSION - 10
=============

incremental import using append mode - only inserts, no updates.

incremental import using lastmodified mode - when there are updates as well.

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \   {here we specify a timestamp/date column}
--last-value 0 \   {basically a date should go here, but in the first load I want to consider everything}
--append

>> '2023-02-07 22:35:59'   {the next time I run this I have to replace 0 with this value, which is why I am noting it down}
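
One simple way to capture the value to plug into --last-value for the next run is to ask the
source table for its latest timestamp (a sketch using sqoop-eval; you can also pick the value
up from the import log):

sqoop-eval \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--query "select max(order_date) from orders"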

insert into orders values(68890,current_timestamp,5523,'COMPLETE');
insert into orders values(68891,current_timestamp,5523,'COMPLETE');
insert into orders values(68892,current_timestamp,5523,'COMPLETE');
insert into orders values(68893,current_timestamp,5523,'COMPLETE');
insert into orders values(68894,current_timestamp,5523,'COMPLETE');

update orders set order_status='COMPLETE', order_date=current_timestamp where order_id=68862;
commit;

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \
--last-value '2023-02-07 22:35:59' \   {you just have to save this date for the next import}
--append   {once we have done one import and we run an incremental import again over the same
output dir, we have to choose either --append or --merge-key based on the requirement}

If a record is updated in your table and we then use incremental import with lastmodified,
we will get the updated record as well.

5000 oldtimestamp in hdfs
5000 newtimestamp in hdfs   {it means that in hdfs you will have 2 records for the same key,
one with the old timestamp and one with the new timestamp, because we are using the --append
parameter}

You want the hdfs file to always be in sync with the table.
{e.g. if you have 1000 records in your MySQL table, there should be 1000 records in hdfs}
{i.e. we only want the latest updated records, not the old ones}

{that means if 5000 is a primary key that has 2 records, only the record with the latest
timestamp should be kept in hdfs, so that there is no duplicate entry in HDFS. For that we
use --merge-key.}

sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \
--last-value '2023-02-07 22:35:59' \
--merge-key order_id   {if I use merge-key instead of append, it makes sure that each key
(order_id) has only one record in hdfs, and the record with the latest timestamp is the one
that is kept}

{After running the above import command it brings in the new records that were added and the
old records that were updated in the table. After receiving those records it starts merging
the duplicate records on the basis of the --merge-key parameter, and once merged it produces
only one part-r file in the output dir, because merging is a reduce activity.}
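
For example, listing the output directory after the merge should show a single reducer output
file instead of the multiple mapper files (a sketch; the exact path and file name may differ):

hadoop fs -ls /user/cloudera/data/orders
{expect a single part-r-00000 file - the merged result written by the reduce phase}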

2 modes
========
1. append - we talk only about new inserts

--incremental append
--check-column order_id
--last-value 0 {any order_id greater than 0 should be imported}

2. lastmodified - when we have updates as well

--incremental lastmodified
--check-column order_date {It should be some date column}
--last-value previousdate {this is a date after which all the records entered should be imported}

>>After the 1st incremental import you have to give one of the two parameters, --append or
--merge-key, otherwise it will show an error that the output dir already exists:

--append   {will create duplicates if old records and their updated versions both land in hdfs}

--merge-key order_id   {will merge the duplicates with the help of a reduce activity on the
basis of the P.K., and we usually keep the new record over the old record on the basis of the
timestamp}

SESSION - 11
=============

incremental import

In this session we will talk about

1. sqoop job

2. password management.

sqoop job \
--create job_orders \ {job name should be unique}
-- import \   {there must be a space after the two hyphens here}
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental append \
--check-column order_id \
--last-value 0

sqoop job --list : This command will show us all the created sqoop jobs.

sqoop job --exec job_orders

sqoop job --show job_orders : To see all the parameters that were saved or stored.

sqoop job --delete job_orders : Deleting a sqoop job

echo -n "cloudera" >> .password.file , it's is created in local cloudera

sqoop job \
--create job_orders \
-- import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password-file file:///home/cloudera/.password.file \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental append \
--check-column order_id \
--last-value 0

We expect the above job to run fully automatically, without prompting for a password.

We have successfully created the job.

sqoop job --exec job_orders
