Session 9 - Data Ingestion - SQOOP

PHASE-1

DATA LAKE --> holds all kinds of data --> structured (SD), semi-structured (SSD), unstructured (USD)

Storage layer --> HDFS (on-prem), S3, ADLS

HADOOP CLUSTER --> set up and managed by Hadoop admins

DEVELOPERS --> work from the EDGE NODE --> Linux terminal

LINUX --> ~20 commands covered
HDFS --> ~15 commands covered (a few samples below)
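
A few of the HDFS commands from the earlier sessions, as a quick refresher (paths are placeholders):

hdfs dfs -ls /user/cloudera                       # list a directory
hdfs dfs -mkdir /user/cloudera/demo               # create a directory
hdfs dfs -put localfile.txt /user/cloudera/demo   # copy a local file into HDFS
hdfs dfs -cat /user/cloudera/demo/localfile.txt   # print a file's contents
hdfs dfs -rm -r /user/cloudera/demo               # remove a directory recursively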

===========================================
ETL or ELT (in Big Data projects)

E --> EXTRACT --> fairly simple --> mostly SQL logic against the source
L --> LOAD --> fairly simple --> copy the extracted data into the data lake
T --> TRANSFORM --> where the real struggle is --> PySpark, Hive

ETL or ELT --> same three steps, only the order of Load and Transform differs (a small Hive sketch of the T step follows)
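
As an illustration of the T step only, a minimal Hive sketch; the raw_customers and curated_customers table names and columns are hypothetical, and the raw data is assumed to be already loaded into the lake:

-- T step: clean the raw loaded data into a curated table
CREATE TABLE curated_customers AS
SELECT customer_id,
       TRIM(UPPER(customer_city)) AS customer_city,
       customer_state
FROM raw_customers
WHERE customer_id IS NOT NULL;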

============================================
EXTRACT TERMINOLOGY -->

Typical sources (e.g. in a banking project):

RDBMS --> MySQL, Oracle, PostgreSQL
SOCIAL MEDIA --> LinkedIn, Facebook, WhatsApp
MAINFRAMES
SENSORS
REST APIs

INGESTION PHASE --> taking the data from the source systems and keeping it in the DATA LAKE

COMMON TOOLS AND FRAMEWORKS for the ingestion phase:

TALEND
INFORMATICA
SQOOP --> the one used here --> simple --> common in BD projects --> data ingestion tool (practised on CLOUDERA, CLOUDXLAB)
SPARK SQL (pull-based ingestion)
KAFKA (BD and real-time)
SSIS
AWS GLUE
AZURE ADF
AZURE SYNAPSE ANALYTICS
APACHE FLINK

============================

SQOOP --> SQL + HADOOP --> tool that takes the data from an RDBMS to HDFS

Two options for analysing data spread across systems:
1) Write queries directly on each SQL RDBMS
2) Take the data from each and every system, dump it into a different (central) store, then do the analysis

Quiz: Sqoop pulls from the RDBMS and keeps the data in HDFS -- where does it run?
a) EDGE NODE
b) HDFS
c) BOTH
d) NONE

Can Sqoop go from an RDBMS to S3? YES ...

MySQL server --> retail_db --> customers table --> HDFS

Details needed for the import (mapped to the sqoop options below):

1) Host ID
2) Username
3) Password
4) Database name
5) Table name
6) Target name --> target HDFS directory name
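
A minimal sketch of how those six details map onto sqoop import options (all values in angle brackets are placeholders):

# 1) host ID + 4) database name --> --connect
# 2) username                   --> --username
# 3) password                   --> --password
# 5) table name                 --> --table
# 6) target HDFS directory      --> --target-dir
sqoop import --connect jdbc:mysql://<host>/<database> --username <username> --password <password> --table <table_name> --target-dir <hdfs_target_dir>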

MySQL --> S3 via Sqoop (sample command below):

1) Host ID
2) Username
3) Password
4) Database
5) Table name
6) Access key
7) S3 bucket name
Example: customers table (12435 rows) --> HDFS

1) Host ID
2) Username
3) Password
4) Database: retail_db
5) Table: customers
6) Target HDFS directory

SQOOP -->
1) IMPORT --> RDBMS to HDFS
2) EXPORT --> HDFS to RDBMS (export sketch below)
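
The session only runs imports; for completeness, a minimal sqoop export sketch (customers_copy is a placeholder target table that must already exist in MySQL):

sqoop export --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customers_copy --export-dir /user/cloudera/Sqoop_B17_SAMPLE -m 1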

CLOUDERA -->

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customers -m 1 --target-dir /user/cloudera/Sqoop_B17_SAMPLE

================= CLOUDERA ===============================

1) Open your Cloudera VM in PuTTY
2) Check whether MySQL is present on your Cloudera VM:
   mysql -u root -pcloudera
3) show databases;   (you should see retail_db)
4) use retail_db;
5) show tables;   (check that customers is there)
6) select count(*) from customers;
   (the full check is sketched below)
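
Put together, the MySQL check looks roughly like this:

mysql -u root -pcloudera
mysql> show databases;                     -- retail_db should be listed
mysql> use retail_db;
mysql> show tables;                        -- customers should be listed
mysql> select count(*) from customers;     -- a non-zero row count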
============================================
Open a new terminal in PuTTY.

HIT THE BELOW COMMAND:

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customers -m 1 --target-dir /user/cloudera/Sqoop_B17_SAMPLE

Then list the target directory:

hdfs dfs -ls /user/cloudera/Sqoop_B17_SAMPLE

You need to see the part file ...
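
To peek at the imported rows, cat the part file; with -m 1 there is a single part-m-00000 (the exact name comes from the -ls listing above):

hdfs dfs -cat /user/cloudera/Sqoop_B17_SAMPLE/part-m-00000 | head -5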

============================================

CLOUDXLAB -->
1) Open MySQL:
   mysql -h cxln2.c.thelab-240901.internal -u sqoopuser -pNHkkP876rp
2) Go inside retail_db:
   use retail_db;
3) select count(*) from customers;
   --> 128
4) Run the import (note the double dashes on --username and --table):
   sqoop import --connect jdbc:mysql://cxln2.c.thelab-240901.internal/retail_db --username sqoopuser --password NHkkP876rp --table customers -m 1 --target-dir /user/gadirajumidhun2082/Sqoop_B17_MIDHUN
   (then verify the target directory as shown below)
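
As on Cloudera, verify by listing the target directory (your own HDFS user path will differ):

hdfs dfs -ls /user/gadirajumidhun2082/Sqoop_B17_MIDHUN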

==============================================

NUMBER OF MAPPERS --> default is 4

--split-by

customers (has a PK) --> HDFS --> no mappers specified --> runs fine with 4 mappers
customer_Midhun (created by batch12, no PK) --> HDFS --> no mappers specified --> error (workaround sketched below)
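
For a table without a primary key, either force a single mapper or give Sqoop an explicit split column (a sketch; customer_Midhun and its cust_id column are example names):

# option 1: single mapper, no split needed
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customer_Midhun -m 1 --target-dir /user/cloudera/customer_Midhun

# option 2: keep 4 mappers, but name the split column explicitly
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customer_Midhun --split-by cust_id -m 4 --target-dir /user/cloudera/customer_Midhun_split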

==============================================

1) SQOOP COMMAND
2) SQOOP IMPORT ARCHITECTURE

=============================================

1) SQOOP --> RDBMS to HDFS (customers)

RDBMS --> SQL
HDFS / Hadoop --> Java

Sqoop first generates a RECORD CONTAINER CLASS --> a per-table Java class that handles the SQL-to-Java DATA TYPE MATCHING (codegen sketch below)
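
You can look at this generated class yourself with sqoop codegen, which produces the Java record class without running an import (a sketch on the Cloudera VM):

sqoop codegen --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table customers
# the output directory printed at the end contains customers.java, the record container class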

2) BOUNDARY QUERY -->

Say the customer table has 100000 rows --> Sqoop runs select min(custid), max(custid) on the table and splits that range across the mappers (m = 4):

m1 --> 1 to 25000      --> part-m-00000
m2 --> 25001 to 50000  --> part-m-00001
m3 --> 50001 to 75000  --> part-m-00002
m4 --> 75001 to 100000 --> part-m-00003
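
A sketch of the boundary query and the split arithmetic (custid is the split column from the example above):

-- what Sqoop issues against the source table (the "boundary query")
SELECT MIN(custid), MAX(custid) FROM customer;
-- with min = 1, max = 100000 and 4 mappers, each mapper gets a range of
-- roughly (100000 - 1) / 4 = 25000 ids, matching m1..m4 above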

3) DATA IMPORT -->

Each mapper (m1, m2, m3, m4) pulls its own range of rows in parallel and writes its part file to HDFS (listing sketch below)
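
Reusing the earlier Cloudera directory name for illustration (that run used -m 1, so it produced only one part file), a 4-mapper import would leave the target directory looking like this:

hdfs dfs -ls /user/cloudera/Sqoop_B17_SAMPLE
# _SUCCESS
# part-m-00000
# part-m-00001
# part-m-00002
# part-m-00003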

============================================
