EDA_SQL_Document

EDA

Uploaded by

Vitor Hugo Ferreira

Exploratory Data Analysis (EDA) Using SQL

1. Understanding the Dataset

- Data Overview:

Use SQL queries like

SELECT TOP (5) * FROM table_name;

or

SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'table_name';

to quickly understand the structure of your dataset. Identify whether the data types are appropriate (categorical, numerical, dates). TOP is SQL Server (T-SQL) syntax; systems such as MySQL, PostgreSQL, and SQLite use LIMIT 5 at the end of the query instead.
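As a minimal sketch of the overview step, the batch below creates a small throwaway table, lists its columns with their types and nullability, and counts its rows. The table name dbo.eda_demo and its columns are hypothetical and exist only for illustration; the sketch assumes SQL Server and permission to create a table in the dbo schema.

```sql
-- Hypothetical demo table, created only so the queries have something to inspect.
CREATE TABLE dbo.eda_demo (
    id         INT           NOT NULL,
    category   VARCHAR(20)   NULL,
    amount     DECIMAL(10,2) NULL,
    created_on DATE          NULL
);

-- Column names, types, nullability, and string lengths, in table order.
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'eda_demo'
ORDER BY ORDINAL_POSITION;

-- Row count gives a first sense of the dataset size (0 here, the table is empty).
SELECT COUNT(*) AS row_count FROM dbo.eda_demo;

-- Clean up the demo table.
DROP TABLE dbo.eda_demo;
```

Filtering on TABLE_SCHEMA as well as TABLE_NAME avoids mixing up tables that share a name across schemas.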

2. Data Cleaning

- Check for Missing Values:

Identify missing values with a query like:

SELECT COUNT(*) AS missing_count FROM table_name WHERE column_name IS NULL;
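When several columns need checking, one pass over the table can report all of them. The sketch below relies on the fact that COUNT(column) skips NULLs while COUNT(*) does not; the table variable @orders and its columns are hypothetical sample data.

```sql
-- Hypothetical sample data with some NULLs.
DECLARE @orders TABLE (order_id INT, category VARCHAR(20), amount DECIMAL(10,2));
INSERT INTO @orders (order_id, category, amount)
VALUES (1, 'books', 12.50), (2, NULL, 8.00), (3, 'toys', NULL), (4, NULL, NULL);

-- COUNT(*) counts every row; COUNT(col) counts only non-NULL values,
-- so the difference is the number of missing values in that column.
SELECT COUNT(*)                   AS total_rows,
       COUNT(*) - COUNT(category) AS category_missing,
       COUNT(*) - COUNT(amount)   AS amount_missing
FROM @orders;
-- Result: total_rows = 4, category_missing = 2, amount_missing = 2
```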

- Check for Duplicates:

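Find values that occur more than once in a column using:

SELECT column_name, COUNT(*) AS occurrences FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;

To find fully duplicated records, list every column that defines a record in both the SELECT and the GROUP BY.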
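GROUP BY reports which values repeat, but not which rows are the extra copies. One common alternative, sketched here with hypothetical data, numbers the rows inside each group with ROW_NUMBER() and returns every row after the first. The choice of customer and amount as the columns that define a duplicate is an assumption made for the example.

```sql
-- Hypothetical sample data: orders 1, 2 and 4 share the same customer and amount.
DECLARE @orders TABLE (order_id INT, customer VARCHAR(20), amount DECIMAL(10,2));
INSERT INTO @orders (order_id, customer, amount)
VALUES (1, 'ana', 10.00), (2, 'ana', 10.00), (3, 'bob', 5.00), (4, 'ana', 10.00);

-- Number the rows within each (customer, amount) group; copy_number 1 is the
-- row to keep, anything above 1 is a repeated copy.
WITH ranked AS (
    SELECT order_id, customer, amount,
           ROW_NUMBER() OVER (PARTITION BY customer, amount ORDER BY order_id) AS copy_number
    FROM @orders
)
SELECT order_id, customer, amount
FROM ranked
WHERE copy_number > 1;
-- Returns order_id 2 and 4
```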

- Outlier Detection:

For numerical data, outliers can be detected using standard deviation:

SELECT AVG(column_name), STDEV(column_name) FROM table_name;

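Select the rows whose values fall beyond a chosen threshold, for example more than three standard deviations above the mean. Aggregate functions cannot appear directly in a WHERE clause, so the threshold is computed in a subquery:

SELECT * FROM table_name WHERE column_name > (SELECT AVG(column_name) + 3 * STDEV(column_name) FROM table_name);

- Data Type Corrections: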

Use ALTER TABLE statements to ensure columns have the correct data type, e.g.:

ALTER TABLE table_name ALTER COLUMN column_name INT;
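ALTER COLUMN fails if any existing value cannot be converted, so it can help to look for such values first. The sketch below uses TRY_CAST (available in SQL Server 2012 and later), which returns NULL instead of raising an error; the table variable @raw and the column quantity_text are hypothetical.

```sql
-- Hypothetical sample data: a text column that is supposed to hold integers.
DECLARE @raw TABLE (id INT, quantity_text VARCHAR(20));
INSERT INTO @raw (id, quantity_text)
VALUES (1, '10'), (2, 'abc'), (3, '7'), (4, '12.5'), (5, NULL);

-- Rows that have a value, but one that cannot be converted to INT.
-- Genuine NULLs are excluded because they convert without error.
SELECT id, quantity_text
FROM @raw
WHERE quantity_text IS NOT NULL
  AND TRY_CAST(quantity_text AS INT) IS NULL;
-- Returns id 2 ('abc') and id 4 ('12.5')
```

Once this query returns no rows, the ALTER TABLE statement should be able to convert the existing values.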

3. Descriptive Statistics

- Summary Statistics:

Get summary statistics (mean, min, max, etc.) with:

SELECT MIN(column_name), MAX(column_name), AVG(column_name), COUNT(*) FROM table_name;
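The mean is sensitive to extreme values, so the median and quartiles are a useful complement. In T-SQL, PERCENTILE_CONT is an analytic function that needs an OVER clause, which is why the sketch below uses SELECT DISTINCT to collapse the result to one row. The table variable @orders is hypothetical sample data.

```sql
-- Hypothetical sample data with one large value.
DECLARE @orders TABLE (order_id INT, amount FLOAT);
INSERT INTO @orders (order_id, amount)
VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 100);

-- The window functions repeat the same values on every row,
-- so DISTINCT reduces the output to a single summary row.
SELECT DISTINCT
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY amount) OVER () AS q1,
       PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY amount) OVER () AS median,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY amount) OVER () AS q3
FROM @orders;
-- Result: q1 = 20, median = 30, q3 = 40 (the mean of the same data is 40)
```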

- For categorical data, get frequency distribution:

SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name;
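Raw counts are easier to compare when each category also shows its share of the total. The sketch below adds a percentage by dividing each group count by the sum of all group counts, computed with a window function over the grouped result. The table variable @orders is hypothetical sample data.

```sql
-- Hypothetical sample data; note the NULL category.
DECLARE @orders TABLE (order_id INT, category VARCHAR(20));
INSERT INTO @orders (order_id, category)
VALUES (1, 'books'), (2, 'books'), (3, 'toys'), (4, NULL);

-- SUM(COUNT(*)) OVER () adds up the per-group counts, giving the table total.
-- Multiplying by 100.0 forces decimal rather than integer division.
SELECT category,
       COUNT(*) AS row_count,
       CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5,2)) AS pct_of_rows
FROM @orders
GROUP BY category
ORDER BY row_count DESC;
-- books: 2 rows, 50.00; toys: 1 row, 25.00; NULL: 1 row, 25.00
```

GROUP BY places NULLs in their own group, so missing categories show up in the distribution instead of disappearing.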

4. Data Relationships

- Correlation Analysis:

T-SQL has no built-in correlation function, so compute the Pearson correlation coefficient from aggregate functions such as AVG and STDEVP.
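One way to do this, sketched under the assumption that both columns are numeric, is the population form of Pearson's r: the mean of the products minus the product of the means, divided by the product of the population standard deviations. The table variable @m and the columns x and y are hypothetical.

```sql
-- Hypothetical sample data where y is exactly 2 * x.
DECLARE @m TABLE (x FLOAT, y FLOAT);
INSERT INTO @m (x, y)
VALUES (1, 2), (2, 4), (3, 6), (4, 8);

-- Covariance = AVG(x * y) - AVG(x) * AVG(y).
-- NULLIF guards against division by zero when a column is constant.
SELECT (AVG(x * y) - AVG(x) * AVG(y))
       / NULLIF(STDEVP(x) * STDEVP(y), 0) AS pearson_r
FROM @m
WHERE x IS NOT NULL AND y IS NOT NULL;  -- use only complete pairs
-- For this perfectly linear data the result is approximately 1
```

The WHERE clause matters: without it, each aggregate would skip NULLs independently and the terms would be computed over different sets of rows.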

- Scatter Plots:

SQL cannot directly create plots, but you can retrieve the necessary data for visualization.

5. Data Visualization:

SQL doesn't produce charts directly. Export results for visualization using external tools.

6. Handling Categorical Variables

- Encoding:

Use CASE expressions to manually assign numerical values to categories.
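As an illustration, the sketch below maps an ordered category to integer codes and also derives a 0/1 indicator column. The table variable @customers, the tier values, and the chosen codes are all hypothetical.

```sql
-- Hypothetical sample data, including an unexpected value and a NULL.
DECLARE @customers TABLE (customer_id INT, tier VARCHAR(10));
INSERT INTO @customers (customer_id, tier)
VALUES (1, 'bronze'), (2, 'silver'), (3, 'gold'), (4, 'platinum'), (5, NULL);

SELECT customer_id,
       tier,
       -- Ordinal encoding: unknown or missing tiers stay NULL instead of
       -- silently receiving a code.
       CASE tier
           WHEN 'bronze' THEN 1
           WHEN 'silver' THEN 2
           WHEN 'gold'   THEN 3
           ELSE NULL
       END AS tier_code,
       -- Indicator (one-hot style) column for a single category.
       CASE WHEN tier = 'gold' THEN 1 ELSE 0 END AS is_gold
FROM @customers;
```

Leaving unmapped values as NULL makes categories that the encoding did not anticipate, such as 'platinum' here, easy to spot afterwards.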

- Frequency Analysis:

Analyze category frequency distribution with GROUP BY.

7. Feature Engineering:

Use SQL to create new features, bins, or calculated columns.
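The sketch below shows the three ideas together on hypothetical data: a calculated column (order total), a bin built with CASE, and date parts extracted from a date column. The table variable @orders and the bin boundaries of 50 and 200 are assumptions made for the example.

```sql
-- Hypothetical sample data.
DECLARE @orders TABLE (order_id INT, quantity INT, unit_price DECIMAL(10,2), order_date DATE);
INSERT INTO @orders (order_id, quantity, unit_price, order_date)
VALUES (1, 2, 9.99, '20240115'), (2, 10, 12.00, '20240302'), (3, 1, 250.00, '20240330');

SELECT order_id,
       quantity * unit_price AS order_total,          -- calculated column
       CASE                                           -- binning
           WHEN quantity * unit_price < 50  THEN 'small'
           WHEN quantity * unit_price < 200 THEN 'medium'
           ELSE 'large'
       END AS size_bin,
       YEAR(order_date)  AS order_year,               -- date parts
       MONTH(order_date) AS order_month
FROM @orders;
-- order 1: 19.98, small; order 2: 120.00, medium; order 3: 250.00, large
```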

8. Outlier Treatment:

Identify and manage outliers using threshold-based queries.
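One possible treatment is capping: values beyond the threshold are replaced by the threshold itself instead of being deleted. The sketch below flags and caps values more than @k standard deviations from the mean. The table variable @readings is hypothetical, and @k is set to 2 only because the sample is small; the multiplier is a judgment call for each dataset.

```sql
-- Threshold multiplier (an assumption for this small sample).
DECLARE @k FLOAT = 2.0;

-- Hypothetical sample data with one extreme value (95).
DECLARE @readings TABLE (id INT, amount FLOAT);
INSERT INTO @readings (id, amount)
VALUES (1, 10), (2, 12), (3, 11), (4, 13), (5, 12),
       (6, 11), (7, 10), (8, 12), (9, 13), (10, 95);

-- Compute the bounds once, then compare every row against them.
WITH bounds AS (
    SELECT AVG(amount) - @k * STDEV(amount) AS lower_bound,
           AVG(amount) + @k * STDEV(amount) AS upper_bound
    FROM @readings
)
SELECT r.id,
       r.amount,
       CASE WHEN r.amount NOT BETWEEN b.lower_bound AND b.upper_bound
            THEN 1 ELSE 0 END AS is_outlier,
       CASE WHEN r.amount > b.upper_bound THEN b.upper_bound
            WHEN r.amount < b.lower_bound THEN b.lower_bound
            ELSE r.amount END AS amount_capped
FROM @readings AS r
CROSS JOIN bounds AS b;
-- Only id 10 is flagged; its amount_capped equals the upper bound.
```

Keeping the original column next to the capped one leaves the treatment reversible and easy to audit.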

9. Dimensionality Reduction:

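SQL itself offers no dimensionality reduction algorithms; what it supports is column filtering: list only the relevant columns in SELECT instead of using SELECT *.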

10. Summarizing Findings:

Use GROUP BY and aggregations to reveal trends.
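For example, a trend over time can be summarized by grouping rows by month. The sketch below uses DATEFROMPARTS (SQL Server 2012 and later) to map each date to the first day of its month; the table variable @orders is hypothetical sample data.

```sql
-- Hypothetical sample data spanning three months.
DECLARE @orders TABLE (order_date DATE, amount DECIMAL(10,2));
INSERT INTO @orders (order_date, amount)
VALUES ('20240105', 100.00), ('20240120', 50.00), ('20240210', 80.00),
       ('20240303', 40.00), ('20240315', 60.00);

-- One row per month with the order count, total, and average amount.
SELECT DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1) AS month_start,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount,
       AVG(amount) AS avg_amount
FROM @orders
GROUP BY DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1)
ORDER BY month_start;
-- 2024-01-01: 2 orders, total 150, average 75
-- 2024-02-01: 1 order,  total 80,  average 80
-- 2024-03-01: 2 orders, total 100, average 50
```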
