
SQL for Data Science

The document provides an overview of databases, including types such as relational and non-relational databases, and key concepts like DBMS, RDBMS, SQL commands, and normalization. It explains the importance of data integrity, constraints, and various SQL components such as Data Definition Language, Data Manipulation Language, and transaction control. Additionally, it covers the significance of keys in database design and the process of creating efficient database structures to minimize redundancy and ensure data accuracy.

Uploaded by mukesh kumar
Copyright © All Rights Reserved

Data Blogs Follow Tajamul Khan

Database - Types
DBMS vs RDBMS
ER Diagram
Relational Database Schema
SQL - ACID Properties, Commands
Data Types
Constraints
Errors in SQL
Keys
Normalisation - 1NF, 2NF, 3NF, BCNF
Operators
Clauses
Alias
Case Statement

Data refers to raw, unprocessed facts.

Information is processed and organized data that is meaningful and useful.


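Data Redundancy: the same piece of data exists in multiple places in a database.

Valid data meets the applicable standards; invalid data (for example, a value of the wrong data type) does not.

Data Not Found: a query that matches nothing returns "No Rows Selected".

Data Integrity: data is valid and consistent. Integrity rules restrict invalid data from entering a table. Data integrity can be achieved by:
Data Types
Constraints

Example of invalid data (the second row has its values in the wrong columns, so the data types do not match):
Emp ID | Name
101    | Zac
Tom    | 23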


Database (Data Container)

A database is a storage container in which we store and organise data.

Databases help us store, access, and manipulate data efficiently.

Types of Databases: Relational and Non-Relational

Flat File Database (1960)

The data and information are stored in numerous physical files with no relation among each other.
Data redundancy
Time-consuming to search for data or information
Recommended only if the data to be maintained is minimal

Hierarchical Database

The data is stored in a tree-like structure organised in a parent-child relationship; a parent record can have multiple child records.
Due to these relationships, data redundancy was reduced
Searching was faster compared to flat files

Network Database

Just like in a hierarchical database, records are linked in parent-child relationships, except that a child can connect to multiple parent records (a graph rather than a strict tree).
Due to these relationships, redundancy was reduced further, and searching was faster than in flat-file and hierarchical databases.
Complex design; if one node fails, the entire model shuts down.

Non-Relational (NoSQL) Database: suited to Big Data due to its dynamic schema

A non-relational database, often referred to as NoSQL, is a type of database that doesn't store data in tabular form. Instead, it stores data as key-value pairs, documents (JSON), graphs, wide columns, time series, or search-engine indexes.
These databases are designed to handle large volumes of data efficiently and are highly scalable.

Relational Database (E. F. Codd, 1970)

The data is stored in the form of tables. These tables are connected to each other with the help of primary and foreign keys, which greatly reduces data duplication.
This model is more effective and efficient than the earlier database models.

DBMS (Database Management System)

Software used to manage a database.

It is used to create, manage, and organize data in a database.
Data is stored in the form of files, with no relations between them and no normalisation.

RDBMS (Relational Database Management System)

An advanced version of DBMS that allows data to be accessed more efficiently.

Unlike in a DBMS, data is stored in the form of tables.
It offers enhanced security features and good performance, and it can store huge volumes of data.

ER Diagram: the first step in designing a relational database

An Entity Relationship Diagram (ER Diagram) pictorially represents the relationships among the entities to be stored in a database.

Entity

An entity may be any object, class, person, or place.
It is represented as a rectangle.
An entity can be independent (strong) or dependent (weak).

Attribute

An attribute describes a property of an entity.
An ellipse is used to represent an attribute.

Relationship

A relationship describes the relation between entities.
A diamond is used to represent a relationship.

SQL (Structured Query Language, pronounced "sequel")

SQL is a programming language used to interact with relational databases (RDBMS). SQL enables a user to create, read, update, and delete data in relational databases, tables, and rows. SQL keywords are case-insensitive.

ACID Properties

The ACID properties (Atomicity, Consistency, Isolation, Durability) ensure reliable and secure processing of database transactions.

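A minimal sketch of atomicity, the "A" in ACID, using Python's built-in sqlite3 module as a stand-in database. The `accounts` table, its CHECK rule, and the `transfer` helper are hypothetical: if either UPDATE fails, the whole transfer is rolled back.

```python
import sqlite3

# Hypothetical table: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (all or nothing)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) failed
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


failed = transfer(conn, 1, 2, 500)    # would overdraw account 1
after_failure = balances(conn)        # the +500 deposit was rolled back as well
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
```

Because the deposit is applied first, a non-atomic system would have left account 2 at 550; the rollback restores both rows.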
DDL (Data Definition Language)

Used to define the structure of databases and their objects (CREATE, DROP, ALTER, RENAME, TRUNCATE).

Table Commands

A NOT NULL constraint cannot be added to a column that already contains NULL values (for example, a new column in a table that already has rows, unless a default is supplied); it can be added to a column in an empty table.

DML (Data Manipulation Language)

Used to manipulate data within a database (INSERT, UPDATE, DELETE).

DCL (Data Control Language)

DCL plays an important role in ensuring database security by controlling access and permissions (GRANT, REVOKE).

TCL (Transaction Control Language)

A transaction in SQL is a sequence of one or more SQL statements that are executed as a single unit of work. TCL commands are used to manage transactions: to save changes (COMMIT), undo them (ROLLBACK), and mark points for a partial rollback (SAVEPOINT).

DQL (Data Query Language)

Used to retrieve data from databases (SELECT).

SELECT is often used with a WHERE clause, which uses operators to filter records:

SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;

Data Types

A data type defines what type of data a column can hold:
Character
Numeric
Date and Time
Boolean

VARCHAR is more memory-efficient than fixed-length CHAR because it stores only the characters actually entered.

Boolean values are commonly stored as:
1 = True
0 = False

Precision is the total number of digits in a number. Scale is the number of digits to the right of the decimal point. For example, the number 123.45 has a precision of 5 and a scale of 2.

FLOAT: gives an approximate value of the stored number (it may be rounded).
DECIMAL: has a fixed number of digits after the decimal point (exact).

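A small illustration, assuming Python's standard `decimal` module as a stand-in for SQL DECIMAL (both store exact base-10 digits). The `precision_and_scale` helper is hypothetical; the last two lines contrast exact decimal arithmetic with approximate binary floating point, which is how FLOAT behaves.

```python
from decimal import Decimal


def precision_and_scale(literal: str) -> tuple:
    """Return (precision, scale) for a decimal literal such as '123.45'."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    scale = max(0, -exponent)             # digits to the right of the point
    precision = max(len(digits), scale)   # total digits stored
    return precision, scale


p_s = precision_and_scale("123.45")

# FLOAT is approximate: 0.1 and 0.2 have no exact binary representation.
float_sum_exact = (0.1 + 0.2 == 0.3)
# DECIMAL is exact for base-10 values.
decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```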
Identity Column (SERIAL in PostgreSQL)

An identity column is a column whose value increases automatically. It can be used to uniquely identify the rows in a table.

With a seed value of 1 and an increment of 1 (for example, IDENTITY(1,1) in SQL Server), the ID column starts from 1 and is incremented by 1 for each new row.

Constraints

Constraints are rules used to limit the type of data entering the columns. They ensure the accuracy and reliability of the data.

NOT NULL: ensures that a column cannot have a NULL value.
DEFAULT: provides a default value for a column when none is specified.
UNIQUE: ensures that all values in a column are different (it can contain NULL values).
PRIMARY KEY: uniquely identifies each row/record in a database table (it can't contain NULL values).
FOREIGN KEY: ensures that the values in a column match the primary key of the referenced table.
CHECK: ensures that all the values in a column satisfy a certain condition.

CREATE TABLE Customers (
    ID INT NOT NULL,
    Salary DECIMAL(7,2) DEFAULT 5000,
    FingerprintID INT UNIQUE,
    SID INT,
    Age INT CHECK (Age > 18),
    PRIMARY KEY (ID),
    FOREIGN KEY (SID) REFERENCES ZIP(ID)
);

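A sketch of each constraint being enforced, using SQLite through Python's sqlite3 module. The table mirrors the Customers example with hypothetical lowercase column names plus a `name` column for the NOT NULL case. Assumptions worth stating: SQLite enforces FOREIGN KEY only after `PRAGMA foreign_keys = ON`, reports a duplicate primary key as a UNIQUE failure, and does not reject wrong data types the way most other databases do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.execute("CREATE TABLE zip (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE customers (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        salary         NUMERIC DEFAULT 5000,
        fingerprint_id INTEGER UNIQUE,
        sid            INTEGER REFERENCES zip(id),
        age            INTEGER CHECK (age > 18)
    )""")
conn.execute("INSERT INTO zip (id) VALUES (10)")
conn.execute("INSERT INTO customers (id, name, fingerprint_id, sid, age) "
             "VALUES (1, 'Zac', 111, 10, 30)")


def violation(sql):
    """Run an INSERT; return the constraint error message, or None if it succeeded."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


errors = {
    "not_null": violation("INSERT INTO customers (id, name, age) VALUES (2, NULL, 30)"),
    "unique": violation("INSERT INTO customers (id, name, fingerprint_id, age) "
                        "VALUES (3, 'Tom', 111, 30)"),
    "primary_key": violation("INSERT INTO customers (id, name, age) VALUES (1, 'Ann', 30)"),
    "foreign_key": violation("INSERT INTO customers (id, name, sid, age) "
                             "VALUES (4, 'Bob', 99, 30)"),
    "check": violation("INSERT INTO customers (id, name, age) VALUES (5, 'Cid', 15)"),
}
# The first row never specified salary, so DEFAULT filled it in.
default_salary = conn.execute("SELECT salary FROM customers WHERE id = 1").fetchone()[0]
```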
Errors in SQL

Syntax error: occurs when SQL statements do not follow the correct syntax and structure of the language.
Constraint violation: occurs when a statement violates one or more constraints on the database.
Data type mismatch: occurs when trying to insert data that does not match the column's data type.
Transaction error: a problem that occurs during the execution of a transaction. A transaction in SQL is a sequence of one or more SQL statements that are treated as a single unit of work.

Keys

A "key" refers to a column or set of columns in a table that uniquely identifies each row within that table. Keys are used:
To uniquely identify a row
To enforce data integrity and constraints
To establish relationships between multiple tables in the database

Types of keys:
Primary Key
Foreign Key
Unique Key
Composite Key
Alternate Key
Candidate Key
Super Key

Primary Key: A primary key uniquely identifies each record in a table.
It must contain unique values and cannot contain NULL values.
Only one primary key is allowed per table.
CREATE TABLE employees (employee_id INT PRIMARY KEY);

Foreign Key: A foreign key establishes a link between two tables by referencing the primary key of another table.
It ensures referential integrity by enforcing a relationship between the tables.
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id));

Unique Key: A unique key ensures that all values in a column or a group of columns are unique.
Unlike primary keys, unique keys can contain NULL values. Because two NULLs are not considered equal, most databases allow several NULLs in a UNIQUE column (SQL Server allows only one).
CREATE TABLE students (student_id INT UNIQUE, ...);

Composite Key: A composite key consists of multiple columns that together uniquely identify a record in a table.
It's useful when a single column cannot uniquely identify records, but a combination of columns can.
CREATE TABLE orders (order_id INT, product_id INT,
PRIMARY KEY (order_id, product_id));

Alternate Key: An alternate key is a candidate key that is not selected as the primary key.
It could serve as a unique identifier if the primary key didn't exist.
CREATE TABLE students (student_id INT PRIMARY KEY,
email VARCHAR(50) UNIQUE, ...);

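A sketch, again in SQLite via Python's sqlite3, of how a composite key and an alternate (UNIQUE) key behave; the `order_items` and `students` tables are hypothetical. It also shows the NULL rule for unique keys: SQLite, like most databases, accepts several NULLs in a UNIQUE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite key: the pair (order_id, product_id) must be unique, not each column alone.
conn.execute("""CREATE TABLE order_items (
    order_id INTEGER, product_id INTEGER, quantity INTEGER,
    PRIMARY KEY (order_id, product_id))""")
# Alternate key: email is UNIQUE but was not chosen as the primary key.
conn.execute("""CREATE TABLE students (
    student_id INTEGER PRIMARY KEY, email VARCHAR(50) UNIQUE)""")


def accepted(sql, params):
    """Return True if the INSERT succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


item_sql = "INSERT INTO order_items VALUES (?, ?, ?)"
composite_results = [
    accepted(item_sql, (1, 10, 2)),
    accepted(item_sql, (1, 11, 5)),   # same order, different product: allowed
    accepted(item_sql, (1, 10, 9)),   # same (order_id, product_id) pair: rejected
]

student_sql = "INSERT INTO students VALUES (?, ?)"
unique_results = [
    accepted(student_sql, (1, "a@x.com")),
    accepted(student_sql, (2, "a@x.com")),  # duplicate email: rejected
    accepted(student_sql, (3, None)),       # NULL email: allowed
    accepted(student_sql, (4, None)),       # second NULL: also allowed in SQLite
]
```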
Candidate Key (minimal)

A minimal set of columns that uniquely identifies each row in a table.
If EmployeeID and Email each uniquely identify an employee, then {EmployeeID} and {Email} are both candidate keys.

Super Key (not necessarily minimal)

A set of columns that uniquely identifies each row in a table. It may contain more columns than necessary for uniqueness.
{EmployeeID, Email, SSN} also uniquely identifies each employee. This set is a super key because it goes beyond the minimal requirement for uniqueness.

In summary, while both candidate keys and super keys ensure uniqueness in a table, candidate keys
are minimal sets of columns fulfilling this requirement, while super keys can include more columns
than necessary.
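The definitions above can be checked mechanically against sample rows. This is a hypothetical helper on made-up data: keys are really a property of the schema (every valid row), so a sample can disprove a key but never prove one.

```python
from itertools import combinations

# Hypothetical sample rows for the EmployeeID / Email / SSN example above.
employees = [
    {"EmployeeID": 1, "Email": "ann@corp.com", "SSN": "111", "Dept": "HR"},
    {"EmployeeID": 2, "Email": "bob@corp.com", "SSN": "222", "Dept": "HR"},
    {"EmployeeID": 3, "Email": "cid@corp.com", "SSN": "333", "Dept": "IT"},
]


def is_super_key(rows, cols):
    """True if no two rows share the same values in `cols`."""
    return len({tuple(r[c] for c in cols) for r in rows}) == len(rows)


def is_candidate_key(rows, cols):
    """A super key from which no column can be removed (i.e. minimal)."""
    if not is_super_key(rows, cols):
        return False
    # Adding columns never breaks uniqueness, so it is enough to check
    # every subset that drops exactly one column.
    return not any(is_super_key(rows, sub) for sub in combinations(cols, len(cols) - 1))
```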

Anomalies

Deficiencies in database design that lead to data redundancy and loss of data integrity.

Normalisation

Normalisation improves database design by following normal forms (NFs), which help to:
reduce redundancy,
ensure data integrity,
prevent anomalies.

1NF
2NF
3NF
BCNF (every determinant is a super key)
4NF
DKNF

1NF: Atomic Values

In 1NF, every value in a column must be atomic (indivisible), and all values in a column must have a consistent data type.

2NF: No Partial Dependency

In 2NF, the table must be in 1NF first.
In addition, if a table has a composite primary key, each non-prime attribute must depend on the entire composite key, not just on part of it.

Example: table R(A, B, C, D) with composite key AB. A and B are prime attributes; C and D are non-prime attributes.
D depends on the whole candidate key AB ✅
C depends only on B, which is a subset of the candidate key. This is known as a partial dependency, so R is not in 2NF.

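A sketch of spotting the partial dependency behind a 2NF violation, using a hypothetical `enrollments` table whose key is (student_id, course_id). `fd_holds` tests a functional dependency on sample rows only; the fix moves `course_name` into its own table keyed by `course_id`.

```python
# Hypothetical enrollment rows: the key is (student_id, course_id).
enrollments = [
    {"student_id": 1, "course_id": "DB", "course_name": "Databases", "grade": "A"},
    {"student_id": 2, "course_id": "DB", "course_name": "Databases", "grade": "B"},
    {"student_id": 1, "course_id": "ML", "course_name": "Machine Learning", "grade": "B"},
]


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in `rows`: rows that agree on lhs agree on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        value = tuple(r[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


key = ["student_id", "course_id"]
# course_name depends on course_id alone, only part of the key: a partial dependency.
partial = fd_holds(enrollments, ["course_id"], ["course_name"])

# 2NF fix: store each course name once, in a table keyed by course_id.
courses = sorted({(r["course_id"], r["course_name"]) for r in enrollments})
enrollments_2nf = [(r["student_id"], r["course_id"], r["grade"]) for r in enrollments]
```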
3NF: No Transitive Dependency

In 3NF, the table must be in 2NF first.
Every non-prime attribute must be non-transitively dependent on the primary key. This means that no non-key column should depend on another non-key column.

Example: table R(A, B, C, D) with composite key AB. A and B are prime attributes; C and D are non-prime attributes.
C depends on the candidate key AB ✅
But D depends on C, which is also a non-prime attribute. This is known as a transitive dependency (AB → C → D), so R is not in 3NF.

BCNF: every determinant is a super key

Boyce-Codd Normal Form (BCNF), AKA 3.5NF

BCNF is an advanced, stricter version of 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y (Y depends on X), X is a super key of the table.
In simple words, every piece of information in a table should depend only on a key. If it depends on anything else, it needs its own table. This keeps the database organized and prevents data duplication.

Example: table R(A, B, C) with candidate key AB, so AB → C. Suppose also C → B (α → β, with α = C and β = B). C is a non-prime attribute and B is a prime attribute. Because C is not a super key, C → B violates BCNF, even though it is allowed in 3NF (B is a prime attribute).

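The BCNF test above can be automated with the standard attribute-closure algorithm: X is a super key exactly when its closure contains every attribute. This sketch encodes the R(A, B, C) example with FDs AB → C and C → B; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the left side, we also determine the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y whose left side X is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]


# The R(A, B, C) example above: AB -> C and C -> B.
R = {"A", "B", "C"}
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
violations = bcnf_violations(R, FDS)
```

The closure of {A, C} is also all of R, so AC is a second candidate key; C → B still violates BCNF because C alone is not a super key.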
Advantages of normalisation: efficient database design

reduce redundancy,
ensure data integrity,
prevent anomalies.

Disadvantages of normalisation

Performance degrades when relations are normalized to higher normal forms, i.e., 4NF, 5NF, DKNF (more tables mean more joins).
It is time-consuming and difficult to normalize relations to a higher degree.
Careless decomposition may lead to a bad database design, causing serious problems.

Operators

Operators are the SQL reserved words and characters used in a WHERE clause of a SQL query (e.g., =, <>, >, <, AND, OR, NOT, IN, BETWEEN, LIKE).

Clauses

SQL clauses are the foundational elements of a SQL query (e.g., SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY).

Alias

Aliases in SQL are used to give tables or columns temporary names for the duration of a query, using AS (e.g., SELECT first_name AS name FROM customers AS c).

Case Statement

A CASE statement is like an if-elif-else statement in programming, allowing different results to be returned based on different conditions.
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ...
    WHEN conditionN THEN resultN
    ELSE default_result
END
More concise than procedural IF...ELSE logic ✅
Can cause problems when dealing with NULL: in a simple CASE (CASE amount WHEN ...), WHEN NULL never matches, because NULL = NULL is not true; use a searched CASE with IS NULL instead.

SELECT customer_id,
    CASE amount
        WHEN 500 THEN 'Prime Customer'
        WHEN 100 THEN 'Plus Customer'
        ELSE 'Regular Customer'
    END AS CustomerStatus
FROM payment;

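A runnable sketch of both CASE forms in SQLite via Python's sqlite3, using a hypothetical `payment` table. It shows the NULL pitfall: a simple CASE compares with =, so `WHEN NULL` never matches, while a searched CASE can test `IS NULL`. The thresholds in the searched CASE are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO payment VALUES (?, ?)",
                 [(1, 500), (2, 100), (3, 40), (4, None)])

rows = conn.execute("""
    SELECT customer_id,
        CASE amount                        -- simple CASE: compares amount = value
            WHEN 500 THEN 'Prime Customer'
            WHEN 100 THEN 'Plus Customer'
            WHEN NULL THEN 'Unknown'       -- never matches: NULL = NULL is not true
            ELSE 'Regular Customer'
        END AS simple_status,
        CASE                               -- searched CASE: IS NULL works
            WHEN amount IS NULL THEN 'Unknown'
            WHEN amount >= 500 THEN 'Prime Customer'
            WHEN amount >= 100 THEN 'Plus Customer'
            ELSE 'Regular Customer'
        END AS searched_status
    FROM payment
    ORDER BY customer_id
""").fetchall()
```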
Joins
Set Operations
Group By and Having clause
Order of Execution
Functions - Aggregate, Datetime, String
Window Functions
Sub-Query
CTE (Common Table Expressions)
In-built functions
Views
Indexes
Stored Procedures
Triggers
Temporary Tables

Joins

A join is an operation that combines rows from two or more tables based on a related column between them.

INNER JOIN: Returns only the matching rows from both tables.

Note: NULL values are not considered equal, so rows with NULL in the join column are never matched with each other and are not included in the result set of an INNER JOIN.

SELECT *
FROM customer AS c
INNER JOIN payment AS p
ON c.customer_id = p.customer_id

LEFT JOIN: Returns all rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for the right table's columns.
SELECT *
FROM customer AS c
LEFT JOIN payment AS p
ON c.customer_id = p.customer_id
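A sketch contrasting INNER and LEFT JOIN on tiny hypothetical `customer` and `payment` tables, run through Python's sqlite3. Only INNER and LEFT are used because RIGHT and FULL OUTER JOIN need SQLite 3.39 or newer. Note how the payment with a NULL customer_id never matches anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO payment VALUES (101, 1, 50), (102, 1, 20), (103, NULL, 10);
""")

# Only customers with at least one payment appear.
inner = conn.execute("""
    SELECT c.name, p.amount
    FROM customer AS c
    INNER JOIN payment AS p ON c.customer_id = p.customer_id
    ORDER BY p.payment_id
""").fetchall()

# Every customer appears; Bob has no payment, so his amount is NULL (None).
left = conn.execute("""
    SELECT c.name, p.amount
    FROM customer AS c
    LEFT JOIN payment AS p ON c.customer_id = p.customer_id
    ORDER BY c.customer_id, p.payment_id
""").fetchall()
```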


RIGHT JOIN: Returns all rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for the left table's columns.
SELECT *
FROM customer AS c
RIGHT JOIN payment AS p
ON c.customer_id = p.customer_id

FULL OUTER JOIN: Returns all rows from both tables, matching rows where possible. If there is no match, NULL values are returned for the missing values in the corresponding columns.

SELECT *
FROM customer AS c
FULL OUTER JOIN payment AS p
ON c.customer_id = p.customer_id

CROSS JOIN: Returns the Cartesian product of the two tables involved: every row of the first table is paired with every row of the second table, so a table of m rows cross joined with a table of n rows gives m × n rows.

Use Case: get all possible combinations

SELECT *
FROM customers
CROSS JOIN orders;

SELF JOIN: A join in which a table is joined to itself.

Self joins are powerful for comparing rows within the same table based on specified conditions.

They are generally used when evaluating a hierarchy, such as team members and their team leads.

SELECT
    member.Id,
    member.FullName,
    member.teamleadId,
    teamlead.FullName AS teamleadName
FROM members AS member
JOIN members AS teamlead
    ON member.teamleadId = teamlead.Id;

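A sketch of the self join on a hypothetical `members` table, run through Python's sqlite3. With a plain (inner) JOIN, the top of the hierarchy disappears because its teamleadId is NULL; a LEFT JOIN keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (Id INTEGER PRIMARY KEY, FullName TEXT, teamleadId INTEGER);
    INSERT INTO members VALUES
        (1, 'Asha', NULL),   -- top of the hierarchy: no team lead
        (2, 'Ben', 1),
        (3, 'Chen', 1),
        (4, 'Dana', 2);
""")

# The same table appears twice under two aliases: member and teamlead.
query = """
    SELECT member.FullName, teamlead.FullName AS teamleadName
    FROM members AS member
    {join} members AS teamlead
        ON member.teamleadId = teamlead.Id
    ORDER BY member.Id
"""
inner = conn.execute(query.format(join="JOIN")).fetchall()
left = conn.execute(query.format(join="LEFT JOIN")).fetchall()
```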
Set Operations

Set operations are used to combine or manipulate the result sets of multiple SELECT queries. The queries must return the same number of columns with compatible data types.

Set operations compare entire result sets (whole rows), while joins compare specific columns based on relationships between tables.

UNION ALL: combines all the rows, including duplicates, from the result sets of two or more SELECT queries.

SELECT CustomerName FROM Customers
UNION ALL
SELECT SupplierName FROM Suppliers;

UNION: combines all the rows, excluding duplicates, from the result sets of two or more SELECT queries.

SELECT CustomerName FROM Customers
UNION
SELECT SupplierName FROM Suppliers;

INTERSECT: returns the common rows that exist in the result sets of two or more SELECT queries. It returns only distinct rows that appear in all result sets.
SELECT CustomerName FROM Customers
INTERSECT
SELECT SupplierName FROM Suppliers;

EXCEPT: returns the distinct rows that are present in the result set of the first SELECT query but not in the result set of the second SELECT query.
SELECT CustomerName FROM Customers
EXCEPT
SELECT SupplierName FROM Suppliers;

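The four set operations side by side, using hypothetical `Customers` and `Suppliers` rows in SQLite via Python's sqlite3. Results are sorted in Python because a set operation does not guarantee row order without ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT);
    CREATE TABLE Suppliers (SupplierName TEXT);
    INSERT INTO Customers VALUES ('Acme'), ('Birch'), ('Birch'), ('Cobalt');
    INSERT INTO Suppliers VALUES ('Birch'), ('Delta');
""")


def run(op):
    """Combine the two single-column queries with the given set operator."""
    sql = f"SELECT CustomerName FROM Customers {op} SELECT SupplierName FROM Suppliers"
    return sorted(row[0] for row in conn.execute(sql))


union_all = run("UNION ALL")   # keeps every duplicate
union = run("UNION")           # duplicates removed
intersect = run("INTERSECT")   # names in both tables
except_ = run("EXCEPT")        # customers that are not suppliers
```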
GROUP BY

GROUP BY groups rows that have the same values in one or more columns.
It is mostly used with aggregate functions.
SELECT customer_id, SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_id;

HAVING

The HAVING clause is used to filter the results of GROUP BY.
It works like a WHERE clause for groups: WHERE filters rows before grouping, while HAVING filters groups after aggregation.
SELECT customer_id, SUM(quantity)
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) > 10;

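A sketch of GROUP BY with WHERE and HAVING together, on a hypothetical `orders` table in SQLite via Python's sqlite3, showing that WHERE removes rows before grouping and HAVING removes whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 4), (1, 9), (2, 3), (2, 2), (3, 12)])

totals = conn.execute("""
    SELECT customer_id, SUM(quantity) AS total_quantity
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

big_buyers = conn.execute("""
    SELECT customer_id, SUM(quantity)
    FROM orders
    WHERE quantity > 2            -- row filter, applied before grouping
    GROUP BY customer_id
    HAVING SUM(quantity) > 10     -- group filter, applied after aggregation
    ORDER BY customer_id
""").fetchall()
```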
Order of Execution

FROM (including JOIN) → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT

Aggregate Functions

An aggregate function performs a calculation on multiple values and returns a single value (e.g., COUNT, SUM, AVG, MIN, MAX).

They are often used with GROUP BY in a SELECT statement.

Date/Time Functions

The EXTRACT() function extracts a part (such as YEAR, MONTH, or DAY) from a given date value.

Window Functions

A window function is a function applied over a window frame.

It applies aggregate, ranking, analytic (value), or distribution functions over a window frame.
Window Frame = a set of rows, or a subset of a partition
OVER = defines the window frame by partitioning, ordering, and framing the rows

[Figure: GROUP BY collapses rows into one row per group, while a window function keeps every row]

GROUP BY gives one output row per aggregated group, whereas with a window function the rows maintain their separate identities.

Note: in a window function the OVER clause is mandatory, whereas PARTITION BY and ORDER BY are optional.

Types of window functions: Aggregate functions, Ranking functions, Value functions, Statistic functions

Parts of the OVER clause: PARTITION BY, ORDER BY, ROWS or RANGE (frame)

Partitioning: The first step in using a window function is to partition the data into groups based on certain criteria. This is done using the optional PARTITION BY clause. Rows within each partition are treated as a separate group for the window function.
Ordering: Once the data is partitioned, it's often helpful to order the rows within each partition. This is done using the ORDER BY clause, which is optional for aggregate window functions but needed for ranking and offset functions (such as RANK, LEAD, and LAG) to produce meaningful results. It determines the order in which the window function processes the rows within each partition.
Window Frame: The window frame defines the subset of rows within each partition that the window function operates on. It is set with a ROWS or RANGE clause: the frame can include all rows in the partition, a fixed number of rows preceding or following the current row, or rows within a specified range. When ORDER BY is present and no frame is given, the default frame runs from the start of the partition up to the current row (and its peers).
Applying the Function: Once the window frame is defined, the window function is applied to the rows within the frame. The function calculates a result for each row based on the values within its window frame.
Result: Finally, the result of the window function is returned for each row in the query result set, typically displayed as an additional column alongside the original data.

Aggregate window functions: MIN, MAX, SUM, AVG, COUNT

Frame boundaries (used with ROWS or RANGE):
UNBOUNDED PRECEDING: the frame BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows from the beginning of the partition up to and including the current row.
UNBOUNDED FOLLOWING: the frame BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING includes all rows from the current row up to the end of the partition.

Running total (all rows from the beginning of the partition up to and including the current row):
SELECT
    employee_id,
    salary,
    SUM(salary) OVER (ORDER BY employee_id
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;

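A sketch of both frame boundaries in SQLite (3.25 or newer, for window functions) via Python's sqlite3, using a hypothetical four-row `employees` table: a running total and the mirror-image "remaining total".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, 300), (4, 400)])

rows = conn.execute("""
    SELECT employee_id,
        SUM(salary) OVER (ORDER BY employee_id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
        SUM(salary) OVER (ORDER BY employee_id
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
    FROM employees
    ORDER BY employee_id
""").fetchall()
```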
Aggregate window functions per partition:
SELECT new_id, new_cat,
SUM(new_id) OVER( PARTITION BY new_cat ORDER BY new_id ) AS "Total",
AVG(new_id) OVER( PARTITION BY new_cat ORDER BY new_id ) AS "Average",
COUNT(new_id) OVER( PARTITION BY new_cat ORDER BY new_id ) AS "Count",
MIN(new_id) OVER( PARTITION BY new_cat ORDER BY new_id ) AS "Min",
MAX(new_id) OVER( PARTITION BY new_cat ORDER BY new_id ) AS "Max"
FROM test_data

Ranking window functions:

SELECT new_id,
ROW_NUMBER() OVER(ORDER BY new_id) AS "ROW_NUMBER",
RANK() OVER(ORDER BY new_id) AS "RANK",
DENSE_RANK() OVER(ORDER BY new_id) AS "DENSE_RANK",
PERCENT_RANK() OVER(ORDER BY new_id) AS "PERCENT_RANK"
FROM test_data
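A sketch showing how the ranking functions differ on ties, using hypothetical `test_data` values with a duplicate 20 (SQLite via Python's sqlite3). PERCENT_RANK is computed as (rank - 1) / (rows - 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (new_id INTEGER)")
conn.executemany("INSERT INTO test_data VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute("""
    SELECT new_id,
        ROW_NUMBER()   OVER (ORDER BY new_id) AS rn,    -- always unique
        RANK()         OVER (ORDER BY new_id) AS rnk,   -- ties share a rank, then a gap
        DENSE_RANK()   OVER (ORDER BY new_id) AS drnk,  -- ties share a rank, no gap
        PERCENT_RANK() OVER (ORDER BY new_id) AS pr
    FROM test_data
    ORDER BY new_id, rn
""").fetchall()
```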

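Value window functions (LAST_VALUE needs an explicit frame; with only ORDER BY, the default frame ends at the current row, so it would just return the current row's value):
SELECT new_id,
FIRST_VALUE(new_id) OVER( ORDER BY new_id) AS "FIRST_VALUE",
LAST_VALUE(new_id) OVER( ORDER BY new_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "LAST_VALUE",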
LEAD(new_id) OVER( ORDER BY new_id) AS "LEAD",
LAG(new_id) OVER( ORDER BY new_id) AS "LAG"
FROM test_data

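A sketch of the value functions on three hypothetical rows (SQLite via Python's sqlite3), including LAST_VALUE with and without an explicit frame to show why the frame matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (new_id INTEGER)")
conn.executemany("INSERT INTO test_data VALUES (?)", [(10,), (20,), (30,)])

rows = conn.execute("""
    SELECT new_id,
        FIRST_VALUE(new_id) OVER (ORDER BY new_id) AS first_val,
        LAST_VALUE(new_id)  OVER (ORDER BY new_id) AS last_default_frame,
        LAST_VALUE(new_id)  OVER (ORDER BY new_id
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame,
        LEAD(new_id) OVER (ORDER BY new_id) AS next_id,   -- following row, NULL at the end
        LAG(new_id)  OVER (ORDER BY new_id) AS prev_id    -- preceding row, NULL at the start
    FROM test_data
    ORDER BY new_id
""").fetchall()
```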
CAST and CONVERT

CAST and CONVERT are both functions that can be used to change the data type of a value in SQL Server. CAST is usually preferable since it is more concise and is part of standard SQL. However, there are some situations where CONVERT may be necessary, such as when you need to use the style argument to control the output format.

[Table: CONVERT style codes for date formats]

Variables

In SQL Server (T-SQL), variables are defined using the DECLARE statement, e.g. DECLARE @count INT = 0;

Sub-Query (Nested Query)

A subquery, also known as a nested query, is a query nested within another query. It allows us to retrieve data based on the results of another query.
The syntax involves two SELECT statements: an outer query and an inner query in parentheses.
It can be added after keywords like WHERE or ON, with comparison operators (>, <, =, <=).

SELECT *
FROM table_name
WHERE column_name <=
(SELECT column_name FROM table_name WHERE ...);

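A sketch of a scalar and a correlated subquery on a hypothetical `employees` table (SQLite via Python's sqlite3). The scalar subquery runs once; the correlated one references the outer row, so it is logically re-evaluated per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ann", "IT", 90), ("Bob", "IT", 70),
    ("Cid", "HR", 60), ("Dee", "HR", 40),
])

# Scalar subquery: one value for the whole query (overall average = 65).
above_company_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Correlated subquery: the inner query uses e1.department from the outer row.
above_dept_avg = conn.execute("""
    SELECT e1.name FROM employees AS e1
    WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees AS e2
                       WHERE e2.department = e1.department)
    ORDER BY e1.name
""").fetchall()
```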
CTE (Common Table Expression): WITH

A Common Table Expression is a named temporary result set that can be referenced, even multiple times, within a single query.
Unlike a temporary table, a CTE is not stored: it exists only for the duration of the query.
It allows us to break down complex queries into smaller, more manageable parts.

WITH Sales_CTE AS (
SELECT
customer_id,
SUM(amount) AS total_sales
FROM
sales
GROUP BY
customer_id
)
SELECT customer_id, total_sales FROM Sales_CTE WHERE total_sales > 1000;

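A sketch that runs the Sales_CTE query above (plus an ORDER BY for stable output) on hypothetical sales rows, followed by a small recursive CTE; SQLite via Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 700), (1, 600), (2, 300), (3, 1500)])

big_customers = conn.execute("""
    WITH Sales_CTE AS (
        SELECT customer_id, SUM(amount) AS total_sales
        FROM sales
        GROUP BY customer_id
    )
    SELECT customer_id, total_sales FROM Sales_CTE
    WHERE total_sales > 1000
    ORDER BY customer_id
""").fetchall()

# Recursive CTE: an anchor row, then UNION ALL with a step that reads the CTE itself.
numbers = [row[0] for row in conn.execute("""
    WITH RECURSIVE Numbers AS (
        SELECT 1 AS num
        UNION ALL
        SELECT num + 1 FROM Numbers WHERE num < 5
    )
    SELECT num FROM Numbers ORDER BY num
""")]
```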
Views

A view is like a virtual table that is based on the result set of a SELECT query.
A view does not store any data: the difference between a table and a view is that a table stores data, while a view only stores its query.
Dropping or altering a view does not change the main table. However, inserting or updating rows through an updatable view does modify the underlying table.

CREATE VIEW my_view AS
SELECT column1, column2
FROM table_name
WHERE condition;

SELECT * FROM my_view;

Why use views:
Data integrity and security
To simplify complex SQL queries

Limitations when redefining a view with CREATE OR REPLACE VIEW:
Cannot change a column name
Cannot change a column data type
Cannot change the order of columns
But we can add new columns at the end
If the main table's structure changes, the view may need to be refreshed to see the changes

CREATE VIEW vs CREATE OR REPLACE VIEW:
CREATE VIEW: once executed, re-executing it shows an error because the view already exists.
CREATE OR REPLACE VIEW: can be re-executed multiple times without any error.

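A sketch of view behaviour in SQLite via Python's sqlite3, with a hypothetical `products` table: the view reflects base-table changes, re-running a plain CREATE VIEW fails, and dropping the view leaves the table intact. SQLite has no CREATE OR REPLACE VIEW (it offers CREATE VIEW IF NOT EXISTS), so only the error case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [("pen", 2), ("lamp", 40)])

create_view = "CREATE VIEW expensive AS SELECT name FROM products WHERE price > 10"
conn.execute(create_view)

before = conn.execute("SELECT name FROM expensive").fetchall()
conn.execute("INSERT INTO products VALUES ('desk', 150)")   # change the base table only
after = conn.execute("SELECT name FROM expensive ORDER BY name").fetchall()

try:
    conn.execute(create_view)          # plain CREATE VIEW cannot be re-run
    rerun_error = None
except sqlite3.OperationalError as exc:
    rerun_error = str(exc)

conn.execute("DROP VIEW expensive")    # dropping the view leaves the table intact
remaining = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```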
Indexes (Lookup)

Indexing in SQL organizes data, typically using a B-tree data structure, to swiftly locate rows in tables. This improves performance for read-intensive operations, at the cost of extra storage and slower writes.

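A sketch of an index changing the query plan, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 on a hypothetical `employees` table. The exact wording of the plan varies between SQLite versions, so only a substring is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees (department, salary) VALUES (?, ?)",
                 [("IT" if i % 2 else "HR", 1000 + i) for i in range(1000)])

query = "SELECT * FROM employees WHERE department = 'IT'"


def plan(sql):
    """Return the 'detail' text of each step SQLite plans for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)   # full table scan
conn.execute("CREATE INDEX idx_department ON employees(department)")
plan_after = plan(query)    # lookup through the B-tree index
```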
Stored Procedures

A stored procedure is a precompiled collection of SQL statements stored in the database and executed as a single unit.
Improved Performance: precompiled and stored on the server, they reduce parsing time and enhance execution speed.
Enhanced Security: users execute procedures without direct table access, and parameterized procedures reduce SQL injection risks.
Code Reusability: encapsulate business logic for reuse across applications.
Reduced Network Traffic: only the procedure name and parameters are sent to the server.

CREATE PROCEDURE sp_GetEmployeeByID @EmployeeID INT
AS
BEGIN
    -- SQL statements inside the stored procedure
    SELECT * FROM Employees
    WHERE EmployeeID = @EmployeeID;
END;
GO
-- Execute the stored procedure
EXEC sp_GetEmployeeByID @EmployeeID = 1;

Triggers

A trigger is a database object that automatically executes in response to specified events (INSERT, UPDATE, DELETE) on a table.
Purpose: automate actions based on data changes, ensuring consistency and integrity, and tracking operations within the database.

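A sketch of an AFTER UPDATE audit trigger in SQLite via Python's sqlite3, with hypothetical `employees` and `audit_log` tables. SQLite requires the trigger body inside BEGIN ... END.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE audit_log (emp_id INTEGER, old_salary INTEGER, new_salary INTEGER);

    -- Fires once per updated row; OLD and NEW hold the row before and after.
    CREATE TRIGGER log_updates
    AFTER UPDATE OF salary ON employees
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.salary, NEW.salary);
    END;

    INSERT INTO employees VALUES (1, 500), (2, 700);
""")

conn.execute("UPDATE employees SET salary = salary + 100")   # touches both rows
log = conn.execute("SELECT * FROM audit_log ORDER BY emp_id").fetchall()
```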
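Temporary Tables

Temporary tables (temp tables) are on-the-fly tables used for intermediate results in complex calculations without permanently storing the data in the database. In SQL Server they are stored in tempdb: #local temp tables are visible only to the current session, while ##global temp tables are visible to all sessions.

SELECT * INTO #localtb FROM source_table;  -- source_table is a hypothetical placeholder
DELETE FROM #localtb
WHERE ID = 1;
DROP TABLE #localtb;

CREATE TABLE ##Globaltb
(
    RollNo INT,
    City VARCHAR(20)
);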

CREATE TABLE: Creates a new table.
CREATE TABLE table_name (id INT PRIMARY KEY, name VARCHAR(50));
ALTER TABLE: Modifies an existing table.
ALTER TABLE table_name ADD column2 INT;
DROP TABLE: Deletes a table.
DROP TABLE table_name;
CREATE INDEX: Creates an index on a table.
CREATE INDEX idx_name ON table_name (column1);
DROP INDEX: Removes an index (SQL Server / MySQL syntax).
DROP INDEX idx_name ON table_name;
CREATE VIEW: Creates a virtual table based on a query.
CREATE VIEW view_name AS SELECT column1, column2 FROM table_name;
DROP VIEW: Deletes a view.
DROP VIEW view_name;
RENAME TABLE: Renames an existing table (MySQL syntax).
RENAME TABLE old_table_name TO new_table_name;

SELECT: Retrieves specific columns from a table.
SELECT column1, column2 FROM table_name;
DISTINCT: Removes duplicate rows from the result.
SELECT DISTINCT column1 FROM table_name;
WHERE: Filters rows based on a condition.
SELECT * FROM table_name WHERE column1 = 'v1';
ORDER BY: Sorts the result set by one or more columns.
SELECT * FROM table_name ORDER BY column1 ASC;
LIMIT / FETCH: Limits the number of rows returned (LIMIT in MySQL/PostgreSQL; FETCH FIRST n ROWS ONLY in standard SQL).
SELECT * FROM table_name LIMIT 10;
LIKE: Searches for patterns in text columns.
SELECT * FROM table_name WHERE col1 LIKE 'A%';
IN: Filters rows with specific values.
SELECT * FROM table_name WHERE col1 IN ('v1', 'v2');
BETWEEN: Filters rows within a range of values (inclusive).
SELECT * FROM table_name WHERE c1 BETWEEN 1 AND 20;

@Tajamulkhann

COUNT(): Returns the number of rows.
SELECT COUNT(*) FROM table_name;
SUM(): Calculates the sum of a numeric column.
SELECT SUM(column1) FROM table_name;
AVG(): Calculates the average of a numeric column.
SELECT AVG(column1) FROM table_name;
MIN(): Returns the smallest value in a column.
SELECT MIN(column1) FROM table_name;
MAX(): Returns the largest value in a column.
SELECT MAX(column1) FROM table_name;
GROUP BY: Groups rows for aggregation.
SELECT col1, COUNT(*) FROM t1 GROUP BY col1;
HAVING: Filters grouped rows based on a condition.
SELECT column1, COUNT(*) FROM t1 GROUP BY column1 HAVING COUNT(*) > 5;
DISTINCT COUNT(): Counts unique values in a column.
SELECT COUNT(DISTINCT col1) FROM table_name;

INSERT INTO: Adds new rows to a table.
INSERT INTO table_name (column1, column2) VALUES ('value1', 'value2');
UPDATE: Updates existing rows in a table.
UPDATE table_name SET col1 = 'value' WHERE id = 1;
DELETE: Removes rows from a table.
DELETE FROM table_name WHERE column1 = 'value';
MERGE: Combines INSERT, UPDATE, and DELETE based on a condition.
MERGE INTO table_name USING source_table ON condition
WHEN MATCHED THEN UPDATE SET column1 = value
WHEN NOT MATCHED THEN INSERT (columns) VALUES (values);
TRUNCATE: Removes all rows from a table quickly, with minimal logging (data pages are deallocated instead of rows being deleted one by one).
TRUNCATE TABLE table_name;
REPLACE: Deletes an existing row with the same key and inserts the new row (MySQL-specific; SQLite also accepts it).
REPLACE INTO table_name VALUES (value1, value2);

Commit Transaction: Finalizes changes when all operations succeed.
START TRANSACTION;
UPDATE accounts SET balance = 1000 WHERE id = 1;
UPDATE accounts SET balance = ... WHERE id = 2;
COMMIT;
Rollback Transaction: Undoes changes if an error occurs or the transaction is not committed.
START TRANSACTION;
UPDATE accounts SET balance = 1000 WHERE id = 1;
ROLLBACK;
Using Savepoints: Set a rollback point within a transaction, allowing a partial rollback without affecting the whole transaction.
START TRANSACTION;
UPDATE accounts SET balance = 1000 WHERE id = 1;
SAVEPOINT sp1;
UPDATE accounts SET balance = 2000 WHERE id = 3;
-- Simulate failure
ROLLBACK TO SAVEPOINT sp1;
UPDATE accounts SET balance = 1000 WHERE id = 2;
COMMIT;

UNION: Combines results from two queries, removing duplicates.
SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
UNION ALL: Combines results from two queries, including duplicates.
SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
INTERSECT: Returns common rows from both queries.
SELECT column1 FROM table1 INTERSECT SELECT column1 FROM table2;
EXCEPT (or MINUS): Returns rows from the first query that are not in the second query.
SELECT column1 FROM table1 EXCEPT SELECT column1 FROM table2;

INNER JOIN: Returns rows with matching values in both tables.
SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;
LEFT JOIN: Returns all rows from the left table and matching rows from the right table.
SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id;
RIGHT JOIN: Returns all rows from the right table and matching rows from the left table.
SELECT * FROM table1 RIGHT JOIN table2 ON table1.id = table2.id;
FULL OUTER JOIN: Returns all rows from both tables, matched where possible.
SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.id = table2.id;
CROSS JOIN: Cartesian product of both tables.
SELECT * FROM table1 CROSS JOIN table2;
SELF JOIN: Joins a table with itself.
SELECT a.column1, b.column1 FROM table_name a, table_name b WHERE a.id = b.parent_id;

CONCAT(): Concatenates strings.
SELECT CONCAT(first_name, ' ', last_name) FROM table_name;
SUBSTRING(): Extracts a substring from a string.
SELECT SUBSTRING(column1, 1, 5) FROM table_name;
LENGTH(): Returns the length of a string.
SELECT LENGTH(column1) FROM table_name;
ROUND(): Rounds a number to a specified number of decimal places.
SELECT ROUND(column1, 2) FROM table_name;
NOW(): Returns the current timestamp.
SELECT NOW();
DATE_ADD(): Adds a time interval to a date (MySQL).
SELECT DATE_ADD(NOW(), INTERVAL 7 DAY);
COALESCE(): Returns the first non-null value.
SELECT COALESCE(column1, column2) FROM table_name;
IFNULL(): Replaces NULL values with a desired value (MySQL).
SELECT IFNULL(col1, 'default') FROM table_name;

ROW_NUMBER: Assigns a unique number to each row in a result set.
SELECT ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;
RANK: Assigns a rank to each row, with gaps for ties.
SELECT RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank FROM employees;
DENSE_RANK: Assigns a rank to each row without gaps for ties.
SELECT DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_dense_rank FROM employees;
NTILE: Divides rows into a given number of roughly equal groups.
SELECT NTILE(4) OVER (ORDER BY salary) AS quartile FROM employees;
LEAD(): Accesses the following row's data.
SELECT name, salary, LEAD(salary) OVER (ORDER BY salary) AS next_salary FROM employees;
LAG(): Accesses the preceding row's data.
SELECT name, salary, LAG(salary) OVER (ORDER BY salary) AS previous_salary FROM employees;

Create a Stored Procedure (SQL Server):
CREATE PROCEDURE sp_GetEmployeeByID @EmployeeID INT
AS
BEGIN
    -- SQL statements inside the stored procedure
    SELECT * FROM Employees WHERE EmployeeID = @EmployeeID;
END;
Execute a Stored Procedure:
EXEC sp_GetEmployeeByID @EmployeeID = 1;
Stored Procedure with OUT Parameter (MySQL):
CREATE PROCEDURE GetEmployeeCount (OUT emp_count INT)
BEGIN
    SELECT COUNT(*) INTO emp_count FROM employees;
END;
Drop a Stored Procedure:
DROP PROCEDURE GetEmployeeCount;

Create a Trigger (Before Insert, MySQL syntax):
CREATE TRIGGER set_created_at
BEFORE INSERT ON employees
FOR EACH ROW
SET NEW.created_at = NOW();
After Update Trigger:
CREATE TRIGGER log_updates
AFTER UPDATE ON employees
FOR EACH ROW
INSERT INTO audit_log(emp_id, old_salary, new_salary, updated_at)
VALUES (OLD.id, OLD.salary, NEW.salary, NOW());
After Delete Trigger:
CREATE TRIGGER log_deletes
AFTER DELETE ON employees
FOR EACH ROW
INSERT INTO audit_log(emp_id, old_salary, new_salary, deleted_at)
VALUES (OLD.id, OLD.salary, NULL, NOW());

Scalar Subquery: Returns a single value.
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Correlated Subquery: The inner query references the outer query, so it is evaluated for each outer row.
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2 WHERE e1.department = e2.department);

With a Single CTE:
WITH DepartmentSalary AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT *
FROM DepartmentSalary
WHERE avg_salary > 50000;
Recursive CTE (PostgreSQL, MySQL, SQLite; SQL Server writes WITH without RECURSIVE):
WITH RECURSIVE Numbers AS (
    SELECT 1 AS num
    UNION ALL
    SELECT num + 1
    FROM Numbers
    WHERE num < 10
)
SELECT * FROM Numbers;

Create an Index:
CREATE INDEX idx_department ON employees(department);
Unique Index:
CREATE UNIQUE INDEX idx_unique_email ON employees(email);
Drop an Index:
DROP INDEX idx_department;
Clustered Index (SQL Server):
CREATE CLUSTERED INDEX idx_salary ON employees(salary);
Using EXPLAIN to Optimize:
EXPLAIN SELECT * FROM employees WHERE salary > 50000;
