SQL for Aspiring Data Scientists

Uploaded by

Himanshu Patidar

Data Science

Part 2 - SQL

CANTILEVER LABS IS THE OFFICIAL TRAINING PARTNER OF

IIT BOMBAY | IIT MADRAS | IIT KHARAGPUR | IIT HYDERABAD

BITS PILANI | BITS HYDERABAD | NIT ROURKELA | SYMBIOSIS

JNTU HYDERABAD | SREENIDHI | MAHINDRA UNIVERSITY | GITAM | SRM CHENNAI

GNITS | CMR-CET | GEETHANJALI | CHITKARA

& many more

Index
Part 2/4 - SQL
Part 1 - Python | Part 3 - P&S | Part 4 - Deep into Data Science

Let's first understand: as a data scientist, why do we need SQL?

Case study 1: (PAYPAL interview)

Case study 2: Window functions

Case study 3: Joins

Case study 4: Analyzing telecom data

GENERAL SQL QUESTIONS

Let's first understand: as a data scientist, why do we need SQL?

When we learn machine learning academically, we use datasets from Kaggle or similar
websites. Many of those datasets are ready-made (we directly get a CSV file with x
rows and y columns). In an industrial setting, we have to prepare datasets from multiple
data sources (tables), build hypotheses and test them. Several SQL queries may run in
the backend to prepare just one column (feature) of your dataset, involving
aggregation, ordering, windowing, joins and many other SQL operations.

Since a machine learning model's performance depends largely on the quality of the
features it is trained on, the development phase of a project involves a lot of
experimentation with hyperparameters and features. Trying out new sets of features
for improvement is where at least a basic understanding of SQL comes in handy. Later,
when pilots are successful, data engineers automate these data extraction pipelines;
at that stage, advanced SQL knowledge is needed to optimize the workflows.

SQL is also needed for general-purpose analytics to gain more insights from data,
because Excel has its limitations when it comes to big data (a worksheet holds at most
1,048,576 rows).

Some examples of features that you might use as inputs to a machine learning model in
various industries:

Telecom: how many times a customer recharged after the expiry of their prepaid plan;
average MRP of the last 3 recharges

Finance: sum/average of a customer's top 3 highest-value transactions; days since the
most recent transaction; creditworthiness

Manufacturing: number of times maintenance was performed; days between consecutive
maintenance activities or machine breakdowns

Ecommerce: tag customers whose 2nd transaction was a purchase of more than $50
(high-value repeat customers)
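As an illustration of the Ecommerce feature above, here is a minimal sketch run through Python's built-in sqlite3 module. The `orders` table, its columns and the sample rows are hypothetical; the pattern (number each customer's transactions with ROW_NUMBER(), then test the 2nd one) should carry over to most SQL dialects. Window functions need SQLite 3.25 or later, which recent Python builds bundle.

```python
import sqlite3

# Hypothetical orders table: one row per purchase.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2023-01-05', 20.0), (1, '2023-02-10', 75.0),
  (2, '2023-01-07', 90.0), (2, '2023-03-01', 30.0),
  (3, '2023-02-02', 60.0);
""")

# Number each customer's orders by date (1 = first purchase),
# then flag customers whose 2nd purchase exceeded $50.
query = """
WITH numbered AS (
    SELECT customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS txn_no
    FROM orders
)
SELECT customer_id,
       CASE WHEN amount > 50 THEN 1 ELSE 0 END AS high_value_repeat
FROM numbered
WHERE txn_no = 2
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (2, 0)]
```

Customers with only one order (customer 3 here) drop out of the result; a LEFT JOIN back to a customer list would give them a 0 flag instead.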

In the following sections you will find two types of question sets: case-study-based
questions, which are commonly asked in interviews, and fundamental questions. At the
end we also include an interview checklist for SQL and some useful links for learning
and practicing SQL.

01 Data Science | Part 2 - SQL

Case study 1: (PAYPAL interview)

Table 1 (daily transaction data) columns: Pymt_ID, Pymt_Date, Sndr_ID, Rcvr_ID, Amt

Table 2 columns: Rcvr_ID, Rcvr_name, Rcvr_Industry

Table 3 columns: Sndr_id, Sndr_name, Sndr_age

Q. Which industry has the 3rd highest total receiving amount?
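One way to answer it, sketched with Python's sqlite3 module (SQLite 3.25+ for window functions). The table names `transactions` (Table 1) and `receivers` (Table 2) and the sample rows are hypothetical; the column names come from the case study. Table 3 is not needed because the question concerns receivers only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (Pymt_ID INTEGER, Pymt_Date TEXT, Sndr_ID INTEGER, Rcvr_ID INTEGER, Amt REAL);
CREATE TABLE receivers (Rcvr_ID INTEGER, Rcvr_name TEXT, Rcvr_Industry TEXT);
INSERT INTO receivers VALUES
  (10, 'ShopCo', 'Retail'), (11, 'FlyFast', 'Travel'),
  (12, 'MediPlus', 'Health'), (13, 'GameHub', 'Gaming');
INSERT INTO transactions VALUES
  (1, '2023-05-01', 100, 10, 500.0), (2, '2023-05-01', 101, 11, 300.0),
  (3, '2023-05-01', 102, 12, 200.0), (4, '2023-05-01', 103, 13, 100.0),
  (5, '2023-05-01', 104, 10, 50.0);
""")

query = """
WITH industry_totals AS (
    -- total amount received per industry
    SELECT r.Rcvr_Industry AS industry, SUM(t.Amt) AS total_received
    FROM transactions t
    JOIN receivers r ON r.Rcvr_ID = t.Rcvr_ID
    GROUP BY r.Rcvr_Industry
),
ranked AS (
    -- DENSE_RANK lets industries with equal totals share a rank
    SELECT industry, total_received,
           DENSE_RANK() OVER (ORDER BY total_received DESC) AS rnk
    FROM industry_totals
)
SELECT industry, total_received FROM ranked WHERE rnk = 3
"""
third = conn.execute(query).fetchall()
print(third)  # [('Health', 200.0)]
```

DENSE_RANK() is a judgment call: it returns every industry tied for 3rd place, whereas ROW_NUMBER() would pick one of them arbitrarily.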

Case study 2: Window functions

Table name: Employee_MST (keeps records of active employees' salary and department)

Table name: Employee_DTL (keeps records of all employees associated with the company)

Q. Referring to the tables above, write a query that produces the output below.


A. The output table contains only the most recently joined employee of each department.
The concept used: first rank employees within each department with a window function
(ROW_NUMBER()) ordered by joining date in descending order, use that result as a
subquery with an alias, and then keep only the rows with rank 1, i.e. the most recently
joined employee.
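The approach described above can be sketched as follows with Python's sqlite3 module (SQLite 3.25+). The columns `Emp_ID`, `Dept`, `Salary`, `Emp_Name` and `Join_Date` and all sample rows are hypothetical. The inner join keeps only active employees, i.e. those present in Employee_MST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee_MST (Emp_ID INTEGER, Dept TEXT, Salary REAL);
CREATE TABLE Employee_DTL (Emp_ID INTEGER, Emp_Name TEXT, Join_Date TEXT);
INSERT INTO Employee_MST VALUES (1, 'IT', 50000), (2, 'IT', 60000), (3, 'HR', 40000), (4, 'HR', 45000);
INSERT INTO Employee_DTL VALUES
  (1, 'Asha', '2021-03-01'), (2, 'Ravi', '2022-07-15'),
  (3, 'Meena', '2020-01-10'), (4, 'John', '2023-02-20'),
  (5, 'Former', '2024-01-01');  -- in DTL only, i.e. not an active employee
""")

query = """
SELECT Dept, Emp_Name, Join_Date
FROM (
    -- rank employees within each department, newest joiner first
    SELECT m.Dept, d.Emp_Name, d.Join_Date,
           ROW_NUMBER() OVER (PARTITION BY m.Dept ORDER BY d.Join_Date DESC) AS rn
    FROM Employee_MST m
    JOIN Employee_DTL d ON d.Emp_ID = m.Emp_ID
) AS ranked
WHERE rn = 1
ORDER BY Dept
"""
recent = conn.execute(query).fetchall()
print(recent)  # [('HR', 'John', '2023-02-20'), ('IT', 'Ravi', '2022-07-15')]
```

ISO-formatted date strings sort correctly as text, which is why ORDER BY Join_Date DESC works here; with a real DATE column the query is unchanged.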

Case study 3: Joins

Tables: Employee_name, Employee_dtl

Q. Write a query that produces the output below.


Case study 4: Analyzing telecom data

Table: one year of recharge data for all subscribers

Q1. How many recharges has each subscriber done in June?

Q2. Which recharge plan (MRP) is subscribed to the most?

Q3. Extract the customers who have done more than 15 recharges.
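A sketch of Q1 to Q3, assuming a hypothetical table `recharges(subscriber_id, recharge_date, mrp)` with ISO-formatted dates, run through Python's sqlite3 module. `strftime('%m', ...)` is SQLite's way to extract the month; other databases use functions such as EXTRACT(MONTH FROM ...) or MONTH().

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recharges (subscriber_id INTEGER, recharge_date TEXT, mrp INTEGER)")

# Hypothetical data: subscriber 1 recharges 199 every 10 days from 1 Feb 2023 (16 recharges),
# subscriber 2 recharges 3 times.
rows = [(1, (date(2023, 2, 1) + timedelta(days=10 * i)).isoformat(), 199) for i in range(16)]
rows += [(2, "2023-06-05", 299), (2, "2023-06-20", 299), (2, "2023-08-01", 499)]
conn.executemany("INSERT INTO recharges VALUES (?, ?, ?)", rows)

# Q1: recharges per subscriber in June
june_counts = conn.execute("""
    SELECT subscriber_id, COUNT(*) AS june_recharges
    FROM recharges
    WHERE strftime('%m', recharge_date) = '06'   -- table holds one year, so month alone is enough
    GROUP BY subscriber_id
    ORDER BY subscriber_id
""").fetchall()

# Q2: the MRP that appears most often
top_mrp = conn.execute("""
    SELECT mrp, COUNT(*) AS times_subscribed
    FROM recharges
    GROUP BY mrp
    ORDER BY times_subscribed DESC
    LIMIT 1
""").fetchone()

# Q3: HAVING filters groups after aggregation
heavy_users = conn.execute("""
    SELECT subscriber_id, COUNT(*) AS n_recharges
    FROM recharges
    GROUP BY subscriber_id
    HAVING COUNT(*) > 15
""").fetchall()

print(june_counts, top_mrp, heavy_users)  # [(1, 3), (2, 2)] (199, 16) [(1, 16)]
```

In Q2, LIMIT 1 returns a single plan even if two MRPs tie; filtering on RANK() = 1 instead would return all tied plans.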


Q4. Get the total, average and maximum of each subscriber's 3 most recent recharge
amounts.

Q5. How many customers in the system have not done any recharge in the last 35 days?

Q6. Get the most recent recharge of each subscriber.
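A sketch of Q4 to Q6 on a hypothetical `recharges(subscriber_id, recharge_date, mrp)` table, using Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER()). The reference date for Q5 is an assumption chosen to keep the example reproducible; `DATE(?, '-35 days')` is SQLite date arithmetic, and other databases have their own equivalents (e.g. DATEADD or INTERVAL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recharges (subscriber_id INTEGER, recharge_date TEXT, mrp INTEGER);
INSERT INTO recharges VALUES
  (1, '2023-07-01', 199), (1, '2023-07-20', 299), (1, '2023-08-05', 199), (1, '2023-08-28', 499),
  (2, '2023-05-10', 149), (2, '2023-06-15', 199),
  (3, '2023-08-20', 299);
""")

# Number each subscriber's recharges from newest (rn = 1) to oldest; reused by Q4 and Q6.
RANKED = """
    SELECT subscriber_id, recharge_date, mrp,
           ROW_NUMBER() OVER (PARTITION BY subscriber_id ORDER BY recharge_date DESC) AS rn
    FROM recharges
"""

# Q4: total, average and maximum of the 3 most recent recharges per subscriber
recent3 = conn.execute(f"""
    SELECT subscriber_id, SUM(mrp), AVG(mrp), MAX(mrp)
    FROM ({RANKED}) AS r
    WHERE rn <= 3
    GROUP BY subscriber_id
    ORDER BY subscriber_id
""").fetchall()

# Q5: subscribers whose latest recharge is more than 35 days before the reference date
REF_DATE = "2023-09-01"  # hypothetical "today"; in practice DATE('now') could be used
inactive = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT subscriber_id, MAX(recharge_date) AS last_recharge
        FROM recharges
        GROUP BY subscriber_id
    ) AS last_per_sub
    WHERE last_recharge < DATE(?, '-35 days')
""", (REF_DATE,)).fetchone()[0]

# Q6: the most recent recharge of each subscriber
latest = conn.execute(f"""
    SELECT subscriber_id, recharge_date, mrp
    FROM ({RANKED}) AS r
    WHERE rn = 1
    ORDER BY subscriber_id
""").fetchall()

print(recent3, inactive, latest)
```

Using ROW_NUMBER() for Q6, rather than MAX(recharge_date) as in Q5, returns the whole row (date and MRP) of the latest recharge.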

GENERAL SQL QUESTIONS

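Q1. What is the difference between ISNULL and COALESCE?

ISNULL(expression, replacement) takes exactly two arguments and returns the replacement
value when the expression is NULL; we use it to impute NULL values with a value we
specify in the final table. It is specific to SQL Server (MySQL's ISNULL() is a different,
one-argument function that only tests for NULL).

COALESCE(value1, value2, ...) accepts two or more arguments and returns the first
non-NULL one. It is part of the SQL standard, so it works across databases.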
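A small illustration using Python's sqlite3 module. SQLite has no ISNULL() function; its two-argument IFNULL() plays the same role as SQL Server's ISNULL(), so it stands in for it here. The `contacts` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT, email TEXT);
INSERT INTO contacts VALUES
  ('A', '98765', NULL, 'a@x.com'),
  ('B', NULL, '02212', NULL),
  ('C', NULL, NULL, 'c@x.com'),
  ('D', NULL, NULL, NULL);
""")

rows = conn.execute("""
    SELECT name,
           IFNULL(mobile, 'unknown')                    AS mobile_filled,  -- exactly 2 arguments
           COALESCE(mobile, landline, email, 'unknown') AS first_contact   -- any number of arguments
    FROM contacts
    ORDER BY name
""").fetchall()
print(rows)
```

IFNULL/ISNULL can only fall back to one value, while COALESCE walks the whole list, which is handy when several columns may hold the information you need.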

Q2. What are the different types of SQL commands? (As data scientists, we mostly deal
with DDL, DML and DQL.)


Q3. Types of joins in SQL:
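As a hedged illustration of the common join types, here is how they behave on two hypothetical tables, `emp` and `dept`, using Python's sqlite3 module. RIGHT JOIN and FULL OUTER JOIN are supported natively only from SQLite 3.39 onward (and in most other databases), so the sketch emulates a right join by swapping the tables in a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp  (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
INSERT INTO emp  VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meena', NULL);
INSERT INTO dept VALUES (10, 'IT'), (20, 'HR'), (30, 'Finance');
""")

# INNER JOIN: only rows with a match in both tables
inner = conn.execute("""
    SELECT e.name, d.dept_name FROM emp e
    INNER JOIN dept d ON d.dept_id = e.dept_id ORDER BY e.emp_id
""").fetchall()

# LEFT JOIN: every employee; dept columns are NULL (None) when there is no match
left = conn.execute("""
    SELECT e.name, d.dept_name FROM emp e
    LEFT JOIN dept d ON d.dept_id = e.dept_id ORDER BY e.emp_id
""").fetchall()

# Keep every department (what emp RIGHT JOIN dept returns) by swapping the tables
right_like = conn.execute("""
    SELECT d.dept_name, e.name FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.dept_id ORDER BY d.dept_id
""").fetchall()

# CROSS JOIN: every combination of rows (3 x 3 = 9)
n_cross = conn.execute("SELECT COUNT(*) FROM emp CROSS JOIN dept").fetchone()[0]

print(inner, left, right_like, n_cross, sep="\n")
```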

Q4. Data types in SQL:

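Q5. What is the difference between DELETE, TRUNCATE and DROP?

DELETE: removes all rows, or only the targeted rows that match a WHERE condition; the
table structure remains.

TRUNCATE: removes all rows from a table at once (no WHERE clause) and is typically faster
than DELETE; the table structure remains.

DROP: removes the entire table (structure and data) from the database.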
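A sketch of the difference using Python's sqlite3 module and a hypothetical `logs` table. SQLite has no TRUNCATE statement; a DELETE without a WHERE clause is its closest equivalent, so that stands in for TRUNCATE here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (id INTEGER, level TEXT);
INSERT INTO logs VALUES (1, 'INFO'), (2, 'ERROR'), (3, 'INFO');
""")

# DELETE with a condition: removes only the targeted rows
conn.execute("DELETE FROM logs WHERE level = 'INFO'")
after_targeted = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DELETE without WHERE (SQLite's stand-in for TRUNCATE): all rows go, the table stays
conn.execute("DELETE FROM logs")
after_all = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DROP: the table itself disappears from the schema
conn.execute("DROP TABLE logs")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'logs'"
).fetchall()

print(after_targeted, after_all, tables_left)  # 1 0 []
```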


Q6. How is “PARTITION BY” different from “GROUP BY”?
PARTITION BY (used inside the OVER() clause of a window function) returns the aggregated
value alongside each record of the table. If the table has 15 records, a query using
PARTITION BY also returns 15 rows. GROUP BY, on the other hand, returns one row per group
in the result set.

E.g.: Suppose we have the table below of student heights in classes A and B.

We want to know the average height of students in class A and class B.

The GROUP BY clause gives the output below.

But if we now want to see each student's height compared with their class's average
height, we use the PARTITION BY clause as below.

Output:

Now it is more informative: we can see each student's height as well as the class average.
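The student-height example can be reproduced with Python's sqlite3 module (SQLite 3.25+ for window functions); the table name and heights below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE heights (student TEXT, class TEXT, height_cm REAL);
INSERT INTO heights VALUES
  ('S1', 'A', 150.0), ('S2', 'A', 160.0), ('S3', 'A', 170.0),
  ('S4', 'B', 155.0), ('S5', 'B', 175.0);
""")

# GROUP BY: collapses the table to one row per class
grouped = conn.execute("""
    SELECT class, AVG(height_cm) FROM heights GROUP BY class ORDER BY class
""").fetchall()

# PARTITION BY: keeps all 5 rows and attaches the class average to each one
partitioned = conn.execute("""
    SELECT student, class, height_cm,
           AVG(height_cm) OVER (PARTITION BY class) AS class_avg
    FROM heights
    ORDER BY student
""").fetchall()

print(grouped)      # [('A', 160.0), ('B', 165.0)]
print(partitioned)  # 5 rows, e.g. ('S1', 'A', 150.0, 160.0)
```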

Q7. What is the order of SQL clauses?
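A sketch that uses the main clauses together, with Python's sqlite3 module and a hypothetical `sales` table. Clauses must be written in the order SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, but they are conceptually evaluated in a different order, noted in the comments. LIMIT is SQLite/PostgreSQL/MySQL syntax; SQL Server uses TOP or OFFSET ... FETCH instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('North', 100.0), ('North', 250.0), ('South', 80.0),
  ('South', 40.0), ('East', 300.0), ('East', 20.0), ('West', 10.0);
""")

# Written order:  SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
# Logical order:  FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
rows = conn.execute("""
    SELECT region, SUM(amount) AS total   -- 5. compute the output columns
    FROM sales                            -- 1. pick the source table
    WHERE amount >= 20                    -- 2. filter individual rows
    GROUP BY region                       -- 3. form one group per region
    HAVING SUM(amount) > 100              -- 4. filter whole groups
    ORDER BY total DESC                   -- 6. sort the result
    LIMIT 2                               -- 7. keep the first rows
""").fetchall()
print(rows)  # [('North', 350.0), ('East', 320.0)]
```

The logical order explains common interview gotchas: WHERE cannot filter on an aggregate (groups do not exist yet, hence HAVING), while ORDER BY can use a SELECT alias such as `total`.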


Q8. What is the difference between RANK(), ROW_NUMBER() and DENSE_RANK()?
RANK(): a window function that ranks rows according to the ORDER BY given in the window.
Rows with equal values get the same rank, and the ranks after a tie are skipped
(e.g. 1, 2, 2, 4).

DENSE_RANK(): works like RANK(), but does not skip ranks after a tie (e.g. 1, 2, 2, 3).

ROW_NUMBER(): simply returns the sequential number of each row in the window
(e.g. 1, 2, 3, 4), even when values are tied.
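The difference is easiest to see on tied values. A minimal sketch with Python's sqlite3 module (SQLite 3.25+) and a hypothetical `scores` table in which two players share a score of 85:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (player TEXT, score INTEGER);
INSERT INTO scores VALUES ('P1', 90), ('P2', 85), ('P3', 85), ('P4', 70);
""")

rows = conn.execute("""
    SELECT player, score,
           RANK()       OVER (ORDER BY score DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)         AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, player) AS row_num  -- player breaks ties
    FROM scores
    ORDER BY score DESC, player
""").fetchall()

for row in rows:
    print(row)
# ('P1', 90, 1, 1, 1)
# ('P2', 85, 2, 2, 2)
# ('P3', 85, 2, 2, 3)
# ('P4', 70, 4, 3, 4)
```

Adding `player` to the ROW_NUMBER() ordering makes the numbering deterministic; without a tie-breaker, the database may number tied rows in any order.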

Q9.
Grouping Data and Using Aggregate Functions

Ordering Data Results

Selecting Data from Multiple Tables (Joins)

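Q10. What are the different types of aggregate functions?

COUNT()
SUM()
MIN()
MAX()
AVG()
STDEV() (SQL Server name; PostgreSQL and MySQL use STDDEV())
VAR() (SQL Server name; PostgreSQL and MySQL use VARIANCE())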
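A minimal sketch of the portable aggregates using Python's sqlite3 module and a hypothetical `marks` table with one NULL score. COUNT(*) counts rows, while COUNT(score) and the other aggregates skip NULLs. SQLite's core has no variance function, so the sketch computes the population variance by hand; SQL Server's VARP() is the population version, whereas its VAR() uses the sample (n - 1) formula and would return about 166.67 on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (student TEXT, score REAL);
INSERT INTO marks VALUES ('A', 60.0), ('B', 70.0), ('C', 80.0), ('D', 90.0), ('E', NULL);
""")

row = conn.execute("""
    SELECT COUNT(*), COUNT(score), SUM(score), MIN(score), MAX(score), AVG(score),
           -- population variance = E[x^2] - (E[x])^2, computed over non-NULL scores
           AVG(score * score) - AVG(score) * AVG(score)
    FROM marks
""").fetchone()
print(row)  # (5, 4, 300.0, 60.0, 90.0, 75.0, 125.0)
```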


Q11. What are constraints in SQL?
NOT NULL - Prevents NULL values from being inserted into a column.
CHECK - Verifies that all values in a field satisfy a condition.
DEFAULT - Automatically assigns a default value if no value has been specified for the field.
UNIQUE - Ensures that only unique values are inserted into the field.
INDEX - Indexes a field for faster retrieval of records (strictly speaking not a constraint, but usually listed alongside them).
PRIMARY KEY - Uniquely identifies each record in a table.
FOREIGN KEY - Ensures referential integrity: a value must match a key in the referenced table.
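The constraints above can be tried out with Python's sqlite3 module; the `dept` and `emp` tables and the helper `violates()` are hypothetical. SQLite only enforces FOREIGN KEY constraints after `PRAGMA foreign_keys = ON`, and each rejected insert raises `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when this is on
conn.executescript("""
CREATE TABLE dept (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL CHECK (salary > 0),
    country TEXT DEFAULT 'India',
    dept_id INTEGER REFERENCES dept(dept_id)
);
CREATE INDEX idx_emp_name ON emp(name);  -- speeds up lookups by name
INSERT INTO dept VALUES (10, 'IT');
INSERT INTO emp (emp_id, name, salary, dept_id) VALUES (1, 'Asha', 50000, 10);
""")

def violates(sql):
    """Hypothetical helper: True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "NOT NULL": violates("INSERT INTO emp (emp_id, name, salary) VALUES (2, NULL, 100)"),
    "CHECK": violates("INSERT INTO emp (emp_id, name, salary) VALUES (3, 'Ravi', -5)"),
    "UNIQUE": violates("INSERT INTO dept VALUES (20, 'IT')"),
    "PRIMARY KEY": violates("INSERT INTO dept VALUES (10, 'HR')"),
    "FOREIGN KEY": violates("INSERT INTO emp (emp_id, name, salary, dept_id) VALUES (4, 'Meena', 100, 99)"),
}
# DEFAULT: country was not supplied for Asha, so the default was filled in
default_country = conn.execute("SELECT country FROM emp WHERE emp_id = 1").fetchone()[0]

print(checks, default_country)
```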

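Q12. What are ACID properties?

Atomicity: This property ensures that a transaction completes in an all-or-nothing way:
either all of its operations are applied or none are.
Consistency: This ensures that a transaction takes the database from one valid state to
another, following all defined rules and constraints.
Isolation: This property ensures that concurrently running transactions do not see each
other's intermediate (uncommitted) changes, so the result is as if they ran one after
another (the exact guarantee depends on the isolation level).
Durability: This property ensures that committed transactions are stored permanently in
the database, even if the system crashes afterwards.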
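Atomicity is the easiest of the four to demonstrate. A minimal sketch with Python's sqlite3 module and a hypothetical `accounts` table and `transfer()` helper: the connection's context manager commits when the block succeeds and rolls back when an exception escapes, so a transfer that violates the CHECK constraint leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0));
INSERT INTO accounts VALUES ('alice', 100.0), ('bob', 50.0);
""")

def transfer(amount):
    """Hypothetical helper: move money from alice to bob as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(30)       # succeeds: alice 70, bob 80
failed = transfer(500)  # alice would go negative, CHECK fails, bob's credit is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)  # True False {'alice': 70.0, 'bob': 80.0}
```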

Q13. How do you find the 5th highest salary in SQL?
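Two common approaches, sketched with Python's sqlite3 module on a hypothetical `employee` table that contains a duplicate salary. Both treat equal salaries as one value; whether duplicates should count separately is worth clarifying with the interviewer. The LIMIT ... OFFSET form is SQLite/PostgreSQL/MySQL syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("A", 90000), ("B", 80000), ("C", 80000), ("D", 70000),
    ("E", 60000), ("F", 50000), ("G", 40000),
])

# Approach 1: DENSE_RANK gives tied salaries the same rank without gaps
by_rank = conn.execute("""
    SELECT DISTINCT salary FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employee
    ) AS ranked
    WHERE rnk = 5
""").fetchall()

# Approach 2: skip the 4 highest distinct salaries and take the next one
by_offset = conn.execute("""
    SELECT DISTINCT salary FROM employee
    ORDER BY salary DESC
    LIMIT 1 OFFSET 4
""").fetchall()

print(by_rank, by_offset)  # [(50000,)] [(50000,)]
```

The DENSE_RANK() version generalizes more easily, for example to the 5th highest salary per department by adding PARTITION BY.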

Q14. What is a CTE in SQL?

CTEs (Common Table Expressions) are temporary, named result sets defined with a WITH
clause. They exist only for the duration of a single query, which can then read from them
like a table.
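A minimal sketch with Python's sqlite3 module and a hypothetical `orders` table: the CTE `customer_totals` is defined once with WITH and then referenced twice in the same query, which a plain subquery would have to repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 40.0), (3, 300.0), (3, 10.0);
""")

rows = conn.execute("""
    WITH customer_totals AS (      -- temporary named result set
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total
    FROM customer_totals
    WHERE total > (SELECT AVG(total) FROM customer_totals)   -- CTE referenced a second time
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 200.0), (3, 310.0)]
```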

Interview checklist for SQL:

Before an interview, you should have solved at least a few problems that use the
following SQL clauses and functions:

GROUP BY, ORDER BY, HAVING, window functions, IS NULL / ISNULL, RANK, DENSE_RANK,
ROW_NUMBER, MIN, MAX, AVG, STDEV, COUNT, all types of joins, LIKE and wildcards.


Useful links:

[Link]

[Link]
intro\
[Link]


Part 1/4 - Python

Part 2 - SQL

Next Part 3 - P&S

Part 4 - Deep into Data Science

@cantilever_labs

@cantilever labs

[Link]
