SQL For Data Scientist

A database schema outlines the structure of a database, detailing tables, columns, data types, relationships, primary and foreign keys, indexes, and constraints. It also covers SQL queries for data manipulation, including selection, filtering, sorting, and joining tables, as well as using aggregation functions and subqueries. The document provides syntax examples for various SQL operations essential for data science applications.

Uploaded by

sci.mointariq

A database schema is the blueprint or structure of a database, defining how data is organized and related. It includes:

1. Tables (or relations): Drawn as rectangles in a schema diagram, tables store data in rows and columns.
2. Columns (or attributes): The named fields of a table; each column has a data type and optional constraints.
3. Data Types: Define the type of data stored in each column (e.g., integer, string, date).
4. Relationships: Drawn as lines connecting tables, relationships define how tables are linked (e.g., one-to-one, one-to-many, many-to-many).
5. Primary Keys (PK): A column (or set of columns) that uniquely identifies each row in a table, ensuring data consistency and integrity.
6. Foreign Keys (FK): Columns that reference primary keys in other tables, establishing relationships.
7. Indexes: Data structures that improve query performance by providing quick access to specific data.
8. Constraints: Rules that enforce data consistency and integrity (e.g., NOT NULL, UNIQUE, CHECK).
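
The schema elements listed above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical two-table schema (customers and orders, with made-up column names); other databases use slightly different type names and enforce foreign keys by default.

```python
import sqlite3

# Hypothetical schema for illustration only: customers and orders.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,           -- primary key
    email       TEXT NOT NULL UNIQUE,          -- NOT NULL and UNIQUE constraints
    age         INTEGER CHECK (age >= 0)       -- CHECK constraint
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    amount      NUMERIC NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);  -- index for faster lookups
""")

con.execute("INSERT INTO customers VALUES (1, 'a@example.com', 30)")
con.execute("INSERT INTO orders VALUES (10, 1, 25.5)")

def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

fk_rejected = rejected("INSERT INTO orders VALUES (11, 99, 5)")  # no customer 99
unique_rejected = rejected("INSERT INTO customers VALUES (2, 'a@example.com', 40)")
check_rejected = rejected("INSERT INTO customers VALUES (3, 'c@example.com', -1)")
print(fk_rejected, unique_rejected, check_rejected)  # True True True

con.close()
```

Each of the three bad inserts is refused by a different rule: the foreign key, the UNIQUE constraint, and the CHECK constraint.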

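• NUMERIC (also called DECIMAL) is the usual exact decimal type in SQL; many systems allow up to 38 digits of precision. FLOAT and REAL store approximate floating-point values.
• VARCHAR is the most common string data type in SQL.
• <> means "not equal to"; most databases also accept !=.
• SELECT (x/y) returns a truncated integer (not the nearest integer) when x and y are both integers in many databases (e.g., PostgreSQL, SQL Server, SQLite), and a decimal result when either operand is a float, i.e. 2/10 = 0 and 2.0/10 = 0.2.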
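
The division and comparison notes above can be checked with a short sketch. It assumes SQLite (through Python's sqlite3 module); some databases, such as MySQL, return a decimal from `/` even for two integers.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Integer / integer truncates; a float operand gives a float result.
int_div, truncated, float_div = con.execute(
    "SELECT 2 / 10, 9 / 10, 2.0 / 10"
).fetchone()

# CAST forces float division when both columns are integers.
cast_div = con.execute("SELECT CAST(2 AS REAL) / 10").fetchone()[0]

# <> and != are equivalent; SQLite returns 1 for true.
ne_a, ne_b = con.execute("SELECT 1 <> 2, 1 != 2").fetchone()

print(int_div, truncated, float_div, cast_div, ne_a, ne_b)

con.close()
```

Here 9 / 10 yields 0 rather than 1, which shows that the result is truncated and not rounded.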

SQL Queries for Data Science:

1. Select All Columns from a Table: SELECT * FROM TableName;


2. Select Specific Columns from a Table: SELECT column1, column2 FROM TableName;
3. Select with Filter (WHERE clause): SELECT * FROM TableName WHERE condition;
4. Select with Multiple Conditions: SELECT * FROM TableName WHERE condition1 AND/OR condition2;
Note: BETWEEN tests a range, i.e. x BETWEEN y AND z (y and z are both inclusive).
Note: IN tests membership in a list, i.e. x IN (y, z, r).
5. Select with Sorting (ORDER BY clause): SELECT * FROM TableName ORDER BY column ASC/DESC;
6. Select with Limiting Rows (LIMIT clause): SELECT * FROM TableName LIMIT n;
7. Select with Aliases (AS keyword for renaming columns or tables): SELECT column1 AS alias1,
column2 AS alias2 FROM TableName;
8. Select Distinct Values (DISTINCT keyword): SELECT DISTINCT column FROM TableName;
9. Select with Pattern Matching (LIKE operator for text matching): SELECT * FROM TableName
WHERE column LIKE 'pattern';
Patterns: A% matches all values that start with A, %A matches all values that end with A, _A% matches all values with A as the 2nd character, %A__ matches all values with A as the 3rd-last character.
10. Saving a query as a View (Virtual Table): CREATE VIEW view_name AS SELECT ... (complete query);
Note: The view can then be queried like any other table.
11. Select with Aggregation Functions (COUNT, SUM, AVG, MAX, MIN):
i. SELECT COUNT(column) FROM TableName; #Counts the non-null values in column
ii. SELECT COUNT(*) FROM TableName; #Counts all rows in TableName, including rows that contain nulls
iii. SELECT COUNT(*) FROM TableName WHERE column IS NULL; #Counts the rows where column is null
iv. SELECT SUM(column) FROM TableName WHERE condition;
v. SELECT group_column, AVG(column) FROM TableName GROUP BY group_column;
vi. SELECT MAX(column) FROM TableName;
vii. SELECT MIN(column) FROM TableName;
12. SELECT ROUND(column, n) FROM TableName; #Rounds numerical values to n decimal places
Note: SUM, AVG, MAX and MIN are applied in the same way as COUNT(column) and likewise skip null values.
13. Select with Grouping (GROUP BY clause): SELECT column1, SUM(column2) FROM TableName GROUP BY column1;
Note: To filter on an aggregated result we cannot use WHERE, because WHERE is evaluated before GROUP BY; use HAVING instead, which is applied after grouping.
14. con.close() closes the connection to the DB. This is a call on the client-side connection object (e.g., in Python), not an SQL statement.
15. ORDER OF SYNTAX:
SELECT x, Agg_func(y) AS alias_y
FROM table
WHERE conditions
GROUP BY x
HAVING conditions
ORDER BY x ASC/DESC
LIMIT n;
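
The clause order above, together with the COUNT and HAVING notes, can be sketched as follows. The sales table and its rows are made up for illustration, and the sketch assumes SQLite through Python's sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 50), ("south", 30),
     ("south", None), ("east", 5), ("west", 200)],
)

# COUNT(*) counts every row; COUNT(amount) skips the NULL amount.
all_rows, non_null = con.execute("SELECT COUNT(*), COUNT(amount) FROM sales").fetchone()
null_rows = con.execute("SELECT COUNT(*) FROM sales WHERE amount IS NULL").fetchone()[0]

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
top_regions = con.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 10
    GROUP BY region
    HAVING SUM(amount) >= 100
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

print(all_rows, non_null, null_rows)  # 6 5 1
print(top_regions)                    # [('west', 200), ('north', 150)]

con.close()
```

The south group survives the WHERE filter with a total of 30 but is then dropped by HAVING, which is the case that WHERE alone cannot express.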
16. INNER JOIN (returns only the rows whose keys match in both tables, i.e. the intersection)
SELECT alias1.col1, alias1.col2, …, alias2.col1, …, right_col, left_col
FROM left_table AS alias1
INNER JOIN right_table AS alias2
ON alias1.key1 = alias2.key2
#Write alias.col for columns whose names exist in both tables to avoid a mix-up; columns with unique names need no prefix. Aliases defined in FROM can be used in SELECT because FROM is evaluated before SELECT.
#Here the keys are named differently in the two tables; when the key has the same name in both, simply use USING(key) instead of ON.
>>Multiple Join: Same as above +
INNER JOIN right_table2 AS alias3
ON alias2.keyx = alias3.key3
>>On Multiple Keys: Same as above +
ON alias1.keyx = alias2.keyx
AND alias1.keyy = alias2.keyy
17. SELF JOIN: An INNER JOIN in which the left and right tables are the same table, given two different aliases.
18. LEFT/RIGHT/FULL JOIN: keeps all rows from the left/right/both table(s) and adds values from the other table only where the keys match (NULL otherwise).
Syntax is the same as INNER JOIN; just replace INNER with LEFT, RIGHT or FULL.
19. CROSS JOIN: A cross join returns the Cartesian product of two tables. It combines each row of one table with each row of the other, so the result has (rows in table1 × rows in table2) rows and can become very large. The syntax for a cross join is:
SELECT *
FROM table1
CROSS JOIN table2;
20. SET Operations: UNION / UNION ALL stack the rows of two queries; UNION removes duplicate rows and UNION ALL keeps them. Both queries must return the same number of columns, and corresponding columns must have compatible data types. Syntax:
SELECT col1, col2
FROM Table1
UNION (or UNION ALL)
SELECT col1, col2
FROM Table2

#The same rules apply to INTERSECT, which keeps only the rows present in both results, and EXCEPT, which keeps the rows of the first result that do not appear in the second.
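
The join types and set operations described above can be compared side by side in a short sketch. The employees and departments tables are hypothetical, and the sketch assumes SQLite through Python's sqlite3 module; RIGHT and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', NULL);
INSERT INTO departments VALUES (10, 'Data'), (20, 'Ops'), (30, 'Legal');
""")

def q(sql):
    """Run a query and return all rows as a list of tuples."""
    return con.execute(sql).fetchall()

join_sql = """
    SELECT e.name, d.dept_name
    FROM employees AS e
    {kind} JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
"""
inner_rows = q(join_sql.format(kind="INNER"))  # only matching keys
left_rows = q(join_sql.format(kind="LEFT"))    # every employee, NULL if no match
cross_count = q("SELECT COUNT(*) FROM employees CROSS JOIN departments")[0][0]

emp_ids = "SELECT dept_id FROM employees WHERE dept_id IS NOT NULL"
dep_ids = "SELECT dept_id FROM departments"
union_rows = q(f"{emp_ids} UNION {dep_ids} ORDER BY dept_id")          # duplicates removed
union_all_count = len(q(f"{emp_ids} UNION ALL {dep_ids}"))             # duplicates kept
intersect_rows = q(f"{emp_ids} INTERSECT {dep_ids} ORDER BY dept_id")  # in both
except_rows = q(f"{dep_ids} EXCEPT {emp_ids} ORDER BY dept_id")        # only in first

print(inner_rows)   # [('Ann', 'Data'), ('Bob', 'Ops')]
print(left_rows)    # [('Ann', 'Data'), ('Bob', 'Ops'), ('Cy', None)]
print(cross_count)  # 9
print(union_rows, union_all_count, intersect_rows, except_rows)

con.close()
```

EXCEPT is order-sensitive: swapping the two queries would return the department ids used by employees but missing from departments, which is an empty result for this data.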

21. Filter with Subqueries (Nested Queries): SELECT * FROM TableName WHERE column IN (SELECT column FROM AnotherTable WHERE condition);
22. Subquery in SELECT: SELECT col1, (SELECT col FROM Table2) AS col2 FROM Table1 #The subquery must return a single value, e.g. an aggregate
23. Subquery in FROM:
The FROM clause can list two tables, where the 2nd table is a temporary result produced by a subquery
SELECT table1.col1, table2.col2
FROM table1, table2
WHERE table1.col1 = table2.col1 (an example join condition)
#In place of table2 we can use (subquery) AS alias, as in SELECT
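
The three subquery positions above (WHERE, SELECT and FROM) can be sketched as follows. The orders and vip tables are made up for illustration, and the sketch assumes SQLite through Python's sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
CREATE TABLE vip (customer TEXT);
INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('cy', 50);
INSERT INTO vip VALUES ('ann'), ('cy');
""")

# Subquery in WHERE: keep only orders placed by customers listed in vip.
in_where = con.execute("""
    SELECT customer, amount
    FROM orders
    WHERE customer IN (SELECT customer FROM vip)
    ORDER BY amount
""").fetchall()

# Subquery in SELECT: a scalar subquery adds one value to every output row.
in_select = con.execute("""
    SELECT customer, amount, (SELECT MAX(amount) FROM orders) AS max_amount
    FROM orders
    WHERE customer = 'bob'
""").fetchall()

# Subquery in FROM: the aliased subquery acts as a temporary table.
in_from = con.execute("""
    SELECT t.customer, t.total
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders
          GROUP BY customer) AS t
    WHERE t.total > 20
    ORDER BY t.total
""").fetchall()

print(in_where)   # [('ann', 10), ('ann', 30), ('cy', 50)]
print(in_select)  # [('bob', 5, 50)]
print(in_from)    # [('ann', 40), ('cy', 50)]

con.close()
```

The FROM version filters on a per-customer total in an outer WHERE, which gives the same rows as GROUP BY with HAVING SUM(amount) > 20.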
