Data Science - Data Prep with SQL - Quick Reference

DATASET PROFILING

Volume
    SELECT COUNT(*) FROM t;

Velocity
    SELECT t.date1, COUNT(*)
    FROM t GROUP BY t.date1
    ORDER BY t.date1 DESC;

Attribute Selection
    SELECT attr1, attr2, attr3, attr4
    FROM t;

Incomplete Records
    SELECT * FROM t
    WHERE t.attr1 IS NULL
    AND t.attr2 IS NULL;

CLEAN ATTRIBUTES

Outliers (Quantitative)
    SELECT CASE WHEN attr1 < 0 THEN 0
                WHEN attr1 > 1000 THEN 1000
                ELSE attr1 END AS attr1
    FROM t;

Missing Values (At Random)
    SELECT COALESCE(attr1, AVG(attr1) OVER ()),
           COALESCE(attr1, 'Unknown')
    FROM t;

Missing Values (Not at Random)
    SELECT COALESCE(attr1, 0)
    FROM t;

Incorrect Values
    SELECT REPLACE(attr1, 'bad', 'good')
    FROM t;
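The outlier-clamping and mean-imputation entries under CLEAN ATTRIBUTES can be tried on a throwaway table. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later), and the `id` column and the five sample rows are made up for the demo.

```python
import sqlite3

# In-memory database with a small, made-up sample table (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, attr1 REAL)")
conn.executemany(
    "INSERT INTO t (id, attr1) VALUES (?, ?)",
    [(1, -5.0), (2, 200.0), (3, None), (4, 4000.0), (5, 400.0)],
)

rows = conn.execute(
    """
    SELECT id,
           -- Outliers (Quantitative): clamp values into the range 0..1000.
           CASE WHEN attr1 < 0 THEN 0
                WHEN attr1 > 1000 THEN 1000
                ELSE attr1 END AS attr1_clamped,
           -- Missing Values (At Random): fill NULLs with the column mean.
           COALESCE(attr1, AVG(attr1) OVER ()) AS attr1_imputed
    FROM t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the clamping CASE leaves a NULL as NULL, and that AVG ignores NULLs, so the imputed value is the mean of the non-missing rows only. That mean is taken over the raw values, so an unclamped outlier still pulls it upward.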

VALIDATE ATTRIBUTES

Domain
    SELECT DISTINCT attr1 FROM t;

Missing Values
    SELECT * FROM t
    WHERE t.attr1 IS NULL;

Range
    SELECT MIN(attr1), MAX(attr1),
           AVG(attr1) FROM t;

Data Type
    SELECT * FROM information_schema.columns
    WHERE table_name = 't';

Outliers (95% confidence)
    WITH dev_cte AS (
        SELECT AVG(attr1) AS mean, STDDEV(attr1) AS sdev FROM t)
    SELECT attr1, attr2 FROM t
    CROSS JOIN dev_cte c
    WHERE ABS(t.attr1 - c.mean) > c.sdev * 2;

Distribution
    SELECT attr1,
           WIDTH_BUCKET(attr1, 100, 500, 5)
    FROM t;

STANDARDIZE ATTRIBUTES

Data Types
    SELECT CAST(attr1 AS DATE),
           CAST(attr2 AS INT) FROM t;

Patterns
    SELECT CASE WHEN attr1 = …,
           REPLACE(attr2, 'Street', 'St') FROM t;

Formatting
    SELECT UPPER(attr1), REPLACE(attr2, '-', '') FROM t;

Scaling
    SELECT attr1, attr2 / (MAX(attr2) OVER
           (PARTITION BY attr1)) FROM t;

DERIVE ATTRIBUTES

Buckets / Binning
    SELECT attr1, CASE WHEN attr1 <= 50 THEN 'bin1'
                       WHEN attr1 > 50 THEN 'bin2'
                       ELSE 'bin3' END AS attr1_bin
    FROM t;

Date Parts
    SELECT DAYOFMONTH(date1),
           MONTH(date1) FROM t;

Date Difference
    SELECT DATEDIFF(date1, date2) FROM t;

Last Period
    SELECT DATEADD(year, -1, date1) FROM t;

Dummy Encoding (One Hot)
    SELECT attr1, CASE WHEN attr1 = 'Male'
                       THEN 1 ELSE 0 END AS male_gender
    FROM t;

COMBINE DATASETS

Join Horizontally (Full Match)
    SELECT t1.attr1, t2.attr2 FROM t1
    INNER JOIN t2 ON t1.ID = t2.ID;

Join Horizontally (Optional Match)
    SELECT t1.attr1, t2.attr2 FROM t1
    LEFT JOIN t2 ON t1.ID = t2.ID;

Union Vertically (Deduplicate)
    SELECT attr1, attr2 FROM t1
    UNION SELECT attr1, attr2 FROM t2

Union Vertically (No Deduplicate)
    SELECT attr1, attr2 FROM t1
    UNION ALL SELECT attr1, attr2 FROM t2

SPLIT DATASETS

Simple Filter
    SELECT attr1, attr2 FROM t
    WHERE attr1 IS NOT NULL;

Filter Based on Aggregation
    SELECT attr1, SUM(attr2)
    FROM t GROUP BY attr1
    HAVING SUM(attr2) > 10;

Sampling (Random)
    SELECT attr1, ROW_NUMBER() OVER
           (ORDER BY RANDOM()) AS random FROM t;

Sampling (Non-Random)
    SELECT attr1, NTILE(4) OVER
           (ORDER BY date1) AS quartile FROM t;

CREATE INTERFACE

Create view
    CREATE VIEW AS SELECT…
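To see how the two Union Vertically entries differ, and how the Filter Based on Aggregation entry behaves on the combined rows, here is a minimal sketch using Python's built-in sqlite3 module. The contents of `t1` and `t2` are invented for illustration, and wrapping the UNION ALL in a subquery before the HAVING filter is one possible way to chain the two steps, not something the sheet prescribes.

```python
import sqlite3

# Two small, made-up tables that share one identical row ('a', 6).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (attr1 TEXT, attr2 INTEGER);
    CREATE TABLE t2 (attr1 TEXT, attr2 INTEGER);
    INSERT INTO t1 VALUES ('a', 6), ('b', 3);
    INSERT INTO t2 VALUES ('a', 6), ('c', 9);
    """
)

# UNION removes the duplicate row; UNION ALL keeps every row.
dedup = conn.execute(
    "SELECT attr1, attr2 FROM t1 UNION SELECT attr1, attr2 FROM t2"
).fetchall()
keep_all = conn.execute(
    "SELECT attr1, attr2 FROM t1 UNION ALL SELECT attr1, attr2 FROM t2"
).fetchall()

# Aggregate the stacked rows, then keep only groups whose total exceeds 10.
big_groups = conn.execute(
    """
    SELECT attr1, SUM(attr2)
    FROM (SELECT attr1, attr2 FROM t1
          UNION ALL
          SELECT attr1, attr2 FROM t2) AS stacked
    GROUP BY attr1
    HAVING SUM(attr2) > 10
    """
).fetchall()

print(len(dedup), len(keep_all), big_groups)
```

The choice matters for aggregates: with UNION the shared row would be counted once, so group 'a' would total 6 and drop out of the HAVING filter.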
Pugsley 2021
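The Buckets / Binning and Dummy Encoding (One Hot) entries under DERIVE ATTRIBUTES can be checked the same way. This is a sketch under assumptions: it uses Python's built-in sqlite3 module, the sample rows are made up, and a separate `gender` column (hypothetical, not in the sheet) holds the categorical value so that `attr1` can stay numeric.

```python
import sqlite3

# Made-up sample data; `gender` is a hypothetical column added for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (attr1 INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO t (attr1, gender) VALUES (?, ?)",
    [(10, "Male"), (50, "Female"), (75, "Male"), (None, None)],
)

rows = conn.execute(
    """
    SELECT attr1,
           -- Binning: the ELSE branch is reached only when attr1 is NULL,
           -- because every non-NULL value satisfies one of the two tests.
           CASE WHEN attr1 <= 50 THEN 'bin1'
                WHEN attr1 > 50 THEN 'bin2'
                ELSE 'bin3' END AS attr1_bin,
           -- Dummy encoding: 1 for 'Male', 0 for anything else (including NULL).
           CASE WHEN gender = 'Male' THEN 1 ELSE 0 END AS male_gender
    FROM t
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

With these rows, 'bin3' effectively acts as a "missing" bucket, and a NULL gender is encoded as 0, the same as a non-male value. If missing values need to stay distinguishable, they have to be handled before encoding.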
