SQL Notes
SQL commands are mainly categorized into four categories:
1) DDL – Data Definition Language
DDL (Data Definition Language) consists of the SQL commands used to define the database
schema. It deals with descriptions of the schema and is used to create, modify, and delete the
structure of database objects, but not the data stored in them. These commands are normally
not used by a general user, who should be accessing the database through an application.
List of DDL commands:
• CREATE: This command is used to create the database or its objects (like tables,
indexes, functions, views, stored procedures, and triggers).
• DROP: This command is used to delete objects from the database.
• ALTER: This is used to alter the structure of the database.
• TRUNCATE: This is used to remove all records from a table and release the space
allocated for those records.
• COMMENT: This is used to add comments to the data dictionary.
• RENAME: This is used to rename an object existing in the database.
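A minimal sketch of several DDL commands, assuming SQLite through Python's built-in sqlite3 module; the table and column names are hypothetical. SQLite expresses RENAME as ALTER TABLE ... RENAME TO, has no TRUNCATE (DELETE without a WHERE clause is the closest stand-in, although strictly that is a DML command), and has no COMMENT command, so COMMENT is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE: define a new table (hypothetical schema)
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing table by adding a column
conn.execute("ALTER TABLE employees ADD COLUMN salary INTEGER")

# RENAME: SQLite spells this as ALTER TABLE ... RENAME TO
conn.execute("ALTER TABLE employees RENAME TO staff")

# TRUNCATE stand-in: SQLite has no TRUNCATE, so DELETE with no WHERE empties the table
conn.execute("INSERT INTO staff (name, salary) VALUES ('Ann', 3000)")
conn.execute("DELETE FROM staff")
row_count = conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]

# PRAGMA table_info returns one row per column; field 1 is the column name
columns = [row[1] for row in conn.execute("PRAGMA table_info(staff)")]

# DROP: remove the table, structure and data alike
conn.execute("DROP TABLE staff")
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

print(columns, row_count, tables)  # ['id', 'name', 'salary'] 0 []
```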
2. Order of Execution
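The clauses of a query are logically evaluated in the following order, which differs from
the order in which they are written:
• FROM
• WHERE
• GROUP BY
• HAVING
• SELECT
• ORDER BY
• LIMIT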
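The query below is a sketch, assuming SQLite via Python's sqlite3 and a hypothetical employees table, that uses every clause from the list. The comments number each clause by when it is logically evaluated. ORDER BY can refer to the avg_salary alias because it runs after SELECT. A database engine may physically execute a query differently, as long as the result matches this logical order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("A", "Sales", 3000), ("B", "Sales", 2000), ("C", "IT", 5000),
     ("D", "IT", 4000), ("E", "HR", 6000), ("F", "IT", 500)],
)

query = """
SELECT dept, AVG(salary) AS avg_salary   -- 5. SELECT computes columns and the alias
FROM employees                           -- 1. FROM picks the source rows
WHERE salary > 1000                      -- 2. WHERE drops row F before grouping
GROUP BY dept                            -- 3. GROUP BY forms the Sales, IT and HR groups
HAVING COUNT(*) >= 2                     -- 4. HAVING drops the one-row HR group
ORDER BY avg_salary DESC                 -- 6. ORDER BY can use the alias from SELECT
LIMIT 1                                  -- 7. LIMIT keeps only the first sorted row
"""
result = conn.execute(query).fetchall()
print(result)  # [('IT', 4500.0)]
```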
4. Types of Constraints
i. NOT NULL Constraint:
The NOT NULL constraint ensures that a column cannot contain NULL values. For
example, in a table "employees", a "name" column declared NOT NULL rejects any row without a name.
ii. UNIQUE Constraint: The UNIQUE constraint ensures that each value in a column is
unique. For example, in a table "students", an "email" column declared UNIQUE prevents
two students from having the same email.
iii. PRIMARY KEY Constraint:
The PRIMARY KEY constraint ensures that each row in a table is uniquely identified
by a specific column or set of columns. For example, in a table "books", a column "id"
declared as the PRIMARY KEY uniquely identifies each row, so no two books can share an id.
iv. FOREIGN KEY Constraint: The FOREIGN KEY constraint is used to link two tables
together by creating a relationship between a column in one table and the primary key
column in another table. For example, a table "orders" can have a column "customer_id"
that references the "id" column in the "customers" table, so an order cannot point to a
customer that does not exist.
v. CHECK Constraint: The CHECK constraint is used to ensure that values in a column
meet a specific condition. For example, a table "employees" can have a CHECK constraint
requiring its "age" column to be greater than or equal to 18.
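A sketch of the five constraint examples above, assuming SQLite via Python's sqlite3; the column types and sample rows are illustrative assumptions. Each bad INSERT is expected to fail with sqlite3.IntegrityError. Note that SQLite only enforces FOREIGN KEY constraints after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
CREATE TABLE employees (name TEXT NOT NULL, age INTEGER CHECK (age >= 18));
CREATE TABLE students  (email TEXT UNIQUE);
CREATE TABLE books     (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers (id));
INSERT INTO students  VALUES ('a@example.com');
INSERT INTO books     VALUES (1, 'SQL Basics');
INSERT INTO customers VALUES (1);
""")

def rejected(sql):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "NOT NULL":    rejected("INSERT INTO employees VALUES (NULL, 30)"),
    "UNIQUE":      rejected("INSERT INTO students VALUES ('a@example.com')"),
    "PRIMARY KEY": rejected("INSERT INTO books VALUES (1, 'Duplicate id')"),
    "FOREIGN KEY": rejected("INSERT INTO orders VALUES (1, 99)"),  # no customer 99
    "CHECK":       rejected("INSERT INTO employees VALUES ('Ann', 16)"),
}
valid_order_rejected = rejected("INSERT INTO orders VALUES (1, 1)")  # customer 1 exists
print(checks, valid_order_rejected)
```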
5. Types of Keys
1) Primary Key
The primary key is the key chosen to identify each instance of an entity uniquely. An
entity can have several keys that could do this (for example, an ID, an SSN, or a passport
number); the most suitable of them becomes the primary key.
2) Foreign key
• A foreign key is a column (or set of columns) in one table that points to the primary key
of another table.
• For example, every employee works in a specific department, and employee and department
are two different entities, so the department's details are not stored in the employee
table. Instead, the two tables are linked by storing the department's primary key in the
employee table as a foreign key.
3) Candidate key
• A candidate key is an attribute or set of attributes that can uniquely identify a
tuple.
• The primary key is chosen from among the candidate keys; the remaining candidate keys
can identify a tuple just as well as the primary key does.
• In an EMPLOYEE table, id is best suited for the primary key. The other attributes that
are also unique, like SSN, Passport_Number, and License_Number, are candidate keys.
4) Super Key
• A super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.
• In the EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME): two employees can have the
same name, but their EMPLOYEE_ID can't be the same. Hence, this combination is also a
super key.
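A sketch of the EMPLOYEE/department example, assuming SQLite via Python's sqlite3. The table layout, column names, and sample rows are hypothetical. employee_id is the primary key, ssn is another candidate key (enforced with UNIQUE NOT NULL), and dept_id is a foreign key. The helper unique_in_data only checks whether a set of columns happens to be unique in the sample rows; a real key is a rule declared in the schema, not just a property of the current data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,      -- the chosen candidate key
    employee_name TEXT NOT NULL,
    ssn           TEXT NOT NULL UNIQUE,     -- another candidate key
    dept_id       INTEGER REFERENCES department (dept_id)  -- foreign key
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employee VALUES
    (101, 'Sam', '111-11-1111', 1),
    (102, 'Sam', '222-22-2222', 2),
    (103, 'Ria', '333-33-3333', 2);
""")

def unique_in_data(*columns):
    """True if no two rows share the same values in these columns (sample data only)."""
    cols = ", ".join(columns)  # trusted, hard-coded names; never pass user input here
    total = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM employee)"
    ).fetchone()[0]
    return total == distinct

print(unique_in_data("employee_name"))                 # False: two employees named Sam
print(unique_in_data("employee_id"))                   # True: the primary key
print(unique_in_data("ssn"))                           # True: a candidate key
print(unique_in_data("employee_id", "employee_name"))  # True: a super key

# The foreign key links each employee to a department row
rows = conn.execute("""
    SELECT e.employee_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.employee_id
""").fetchall()
```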
6. Aggregate Functions
An aggregate function in SQL performs a calculation on the values of a column across
multiple rows and returns a single value. Aggregate functions are often used with the GROUP BY
and HAVING clauses of the SELECT statement. Common SQL aggregate functions are:
• COUNT()
• SUM()
• AVG()
• MIN()
• MAX()
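A short sketch of the five aggregates, assuming SQLite via Python's sqlite3 and a hypothetical employees table. The first query collapses the whole table into one row; the second computes the aggregates once per GROUP BY group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 3000), ("Bob", "Sales", 2000),
     ("Cy", "IT", 5000), ("Dee", "IT", 4000)],
)

# Each aggregate reduces many rows to a single value
totals = conn.execute(
    "SELECT COUNT(*), SUM(salary), AVG(salary), MIN(salary), MAX(salary) FROM employees"
).fetchone()

# With GROUP BY, the aggregates are computed once per group instead
per_dept = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

print(totals)    # (4, 14000, 3500.0, 2000, 5000)
print(per_dept)  # [('IT', 2, 4500.0), ('Sales', 2, 2500.0)]
```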
7. What is the difference between WHERE and HAVING clauses in SQL?
• A HAVING clause is like a WHERE clause, but applies only to groups as a whole
(that is, to the rows in the result set representing groups), whereas the WHERE clause
applies to individual rows. Because WHERE is evaluated before GROUP BY and HAVING after
it, aggregate functions can be used in HAVING but not in WHERE. A query can contain both a
WHERE clause and a HAVING clause.
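A sketch contrasting the two clauses, assuming SQLite via Python's sqlite3 and a hypothetical orders table. In the first query, WHERE removes small individual orders before grouping, which drops Ann's total below the HAVING threshold; in the second, every order counts toward the totals. The last part shows that putting an aggregate in WHERE is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ann", 50), ("Ann", 200), ("Bob", 300), ("Bob", 20), ("Cy", 500)],
)

# WHERE filters rows first; HAVING then filters the resulting groups
with_where = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    WHERE amount >= 100
    GROUP BY customer
    HAVING SUM(amount) >= 250
    ORDER BY customer
""").fetchall()

# Without WHERE, every order contributes to the group totals
having_only = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 250
    ORDER BY customer
""").fetchall()

# An aggregate cannot appear in WHERE, because groups do not exist yet at that point
try:
    conn.execute("SELECT customer FROM orders WHERE SUM(amount) >= 250")
    aggregate_in_where_ok = True
except sqlite3.Error:
    aggregate_in_where_ok = False

print(with_where)   # [('Bob', 300), ('Cy', 500)]
print(having_only)  # [('Ann', 250), ('Bob', 320), ('Cy', 500)]
```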
8. Window functions
Window functions apply aggregate and ranking functions over a particular window (set of
rows). The OVER clause is used with a window function to define that window. The OVER
clause does two things:
• Partitions rows into sets of rows (using the PARTITION BY clause).
• Orders rows within those partitions into a particular order (using the ORDER BY clause).
If PARTITION BY is omitted, the whole result set is treated as one partition, so ORDER BY
orders all of its rows.
For example, AVG(Salary) OVER (PARTITION BY Department) AS Avg_Salary shows, on every
employee's row, the average salary of that employee's department; unlike GROUP BY, the
individual rows are kept. Rows within each department can also be ordered, for example by
age, but note that adding ORDER BY inside the OVER clause of an aggregate such as AVG turns
it into a running average up to the current row.
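A sketch of the department-average example, assuming SQLite 3.25 or later (the first version with window functions) via Python's sqlite3; the employee rows are hypothetical. The first query repeats each department's average on every row. The second query shows the running-average effect of putting ORDER BY inside OVER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Ann", "Sales", 25, 3000), ("Bob", "Sales", 35, 5000),
     ("Cy", "IT", 30, 6000), ("Dee", "IT", 40, 4000), ("Eve", "IT", 28, 5000)],
)

# One average per department, shown on every row of that department
rows = conn.execute("""
    SELECT name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_salary
    FROM employee
    ORDER BY department, age   -- display order only; it does not change the window
""").fetchall()

# ORDER BY inside OVER makes AVG a running average (rows up to the current one)
running = conn.execute("""
    SELECT name,
           AVG(salary) OVER (PARTITION BY department ORDER BY age) AS running_avg
    FROM employee
    WHERE department = 'IT'
    ORDER BY age
""").fetchall()

for row in rows:
    print(row)
print(running)  # [('Eve', 5000.0), ('Cy', 5500.0), ('Dee', 5000.0)]
```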
1) RANK()
As the name suggests, the RANK function assigns a rank to every row within each partition.
Rank 1 is given to the first row, and rows with the same value are assigned the same rank.
After a tie, the following rank values are skipped, so two rows tied at rank 1 are followed
by rank 3.
2) DENSE_RANK()
It assigns a rank to each row within a partition. Just like RANK(), the first row is assigned
rank 1 and rows with the same value get the same rank. The difference is that in
DENSE_RANK(), the rank after a tie is the next consecutive integer, so no rank is skipped.
3) ROW_NUMBER()
It assigns consecutive integers to all the rows within a partition. Within a partition, no
two rows can have the same row number, even if their values are tied.
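A sketch comparing the three ranking functions on one tie, assuming SQLite 3.25 or later via Python's sqlite3 and a hypothetical scores table. With no PARTITION BY, the whole table is one partition. The extra name tie-breaker in ROW_NUMBER is an assumption added only to make the numbering of tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("A", 90), ("B", 85), ("C", 85), ("D", 80)])

rows = conn.execute("""
    SELECT name, score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,        -- skips after ties
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,  -- no gaps
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num     -- always unique
    FROM scores
    ORDER BY score DESC, name
""").fetchall()

for row in rows:
    print(row)
# ('A', 90, 1, 1, 1)
# ('B', 85, 2, 2, 2)
# ('C', 85, 2, 2, 3)
# ('D', 80, 4, 3, 4)
```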
• Indexes speed up data retrieval, but updating a table with indexes takes more time than
updating a table without them, because the indexes also need to be updated. So, only create
indexes on columns that will be frequently searched against.
• Syntax:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
• The following guidelines indicate when the use of an index should be reconsidered:
  - Small tables.
  - Tables that have frequent, large batch updates or insert operations.
  - Columns that contain a high number of NULL values.
  - Columns that are frequently manipulated.
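A sketch of an index changing how a query is executed, assuming SQLite via Python's sqlite3; the customers table and the index name idx_customers_city are hypothetical. EXPLAIN QUERY PLAN describes SQLite's plan; its exact wording varies between SQLite versions, so only the presence of the index name is meaningful here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ann", "London"), ("Bob", "Paris"), ("Cy", "London")],
)

def plan(sql):
    """Return SQLite's query-plan description; the detail text is the last column."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM customers WHERE city = 'London'"
before = plan(query)  # a full table scan, e.g. "SCAN customers"

# CREATE INDEX index_name ON table_name (column1, ...)
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
after = plan(query)   # now a search that goes through idx_customers_city

print(before)
print(after)
```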
11. Joins
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
1) Inner Join
• The INNER JOIN keyword selects records that have matching values in both
tables.
• Rows without a match in the other table are left out. For example, if there are records
in the "Orders" table that do not have matches in "Customers", these orders will not be
shown.
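A sketch of the Orders/Customers inner join, assuming SQLite via Python's sqlite3; the sample rows are hypothetical. Order 12 points to a customer that does not exist, and customer Cy has no orders, so neither appears in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders    (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');  -- Cy has no orders
INSERT INTO Orders    VALUES (10, 1), (11, 2), (12, 99);         -- order 12 has no customer
""")

# Only rows with a matching CustomerID on both sides are returned
rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()
print(rows)  # [(10, 'Ann'), (11, 'Bob')]
```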
3) Right Join
• The RIGHT JOIN keyword returns all records from the right table (table2) and the
matching records from the left table (table1). If there is no match, the columns from the
left table are NULL in the result.
• The RIGHT JOIN keyword returns all records from the right table (Employees),
even if there are no matches in the left table (Orders).
• Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
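A sketch reproducing the RIGHT JOIN result, assuming SQLite via Python's sqlite3 with hypothetical sample rows. SQLite only added RIGHT JOIN in version 3.39.0, so this sketch uses the equivalent LEFT JOIN with the two tables swapped, which returns the same rows on any version. Bob has taken no orders, so his OrderID comes back as NULL (None in Python); SQLite sorts NULL first in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Orders    (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob'), (3, 'Lee', 'Cy');
INSERT INTO Orders    VALUES (10, 1), (11, 1), (12, 3);  -- Bob has taken no orders
""")

# Same rows as: FROM Orders RIGHT JOIN Employees ON ...
rows = conn.execute("""
    SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
    FROM Employees
    LEFT JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID
    ORDER BY Orders.OrderID
""").fetchall()
print(rows)
# [(None, 'Jones', 'Bob'), (10, 'Smith', 'Ann'), (11, 'Smith', 'Ann'), (12, 'Lee', 'Cy')]
```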
4) Self-Join
• A self-join is a regular join, but the table is joined with itself.
• Self-joins can be useful in many scenarios, including hierarchical structures, such
as organizational charts, where employees report to other employees, or in
product categories, where a product may belong to multiple categories. They can
also be used to identify patterns or relationships within a single table.
• Example (pairs of different customers who live in the same city):
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
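A sketch of the organizational-chart use case mentioned above, assuming SQLite via Python's sqlite3 and a hypothetical employees table where manager_id refers back to id in the same table. The two aliases e and m make one table play both roles. Ava has no manager, so the inner join leaves her out as an employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ava", None), (2, "Ben", 1), (3, "Cal", 1), (4, "Dan", 2)],
)

# e is the employee, m is the same table viewed as that employee's manager
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()
print(rows)  # [('Ben', 'Ava'), ('Cal', 'Ava'), ('Dan', 'Ben')]
```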
Return all customers from a city that starts with 'L' followed by one
wildcard character, then 'nd' and then two wildcard characters:
SELECT * FROM Customers
WHERE city LIKE 'L_nd__';
Return all customers from a city that contains the letter 'L':
SELECT * FROM Customers
WHERE city LIKE '%L%';
Return all customers whose name starts with 'a' or 'b':
SELECT * FROM Customers
WHERE CustomerName LIKE 'a%' OR CustomerName LIKE 'b%';
Return all customers whose name starts with 'b' and ends with 's':
SELECT * FROM Customers
WHERE CustomerName LIKE 'b%s';
Return all customers whose name starts with 'a' and is at least 3 characters
long:
SELECT * FROM Customers
WHERE CustomerName LIKE 'a__%';
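A sketch that runs the five LIKE patterns above against a small hypothetical Customers table, assuming SQLite via Python's sqlite3. In a pattern, _ matches exactly one character and % matches any run of characters, including none. SQLite's LIKE ignores case for ASCII letters by default, which is why '%L%' also matches 'Berlin'; other databases and collations may behave differently. The helper names() pastes a hard-coded WHERE clause into the query and is not meant for user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Alfreds", "Berlin"), ("Ana", "Madrid"), ("Around the Horn", "London"),
     ("Berglunds", "Lund"), ("Bs Beverages", "London"), ("Al", "Paris"),
     ("Cactus", "Buenos Aires")],
)

def names(where):
    """Run SELECT with a trusted, hard-coded WHERE clause; return the matching names."""
    sql = "SELECT CustomerName FROM Customers WHERE " + where + " ORDER BY CustomerName"
    return [row[0] for row in conn.execute(sql)]

city_l_nd = names("City LIKE 'L_nd__'")      # 'London' fits; 'Lund' is too short
city_has_l = names("City LIKE '%L%'")        # also 'Berlin', since LIKE ignores case here
a_or_b = names("CustomerName LIKE 'a%' OR CustomerName LIKE 'b%'")
b_to_s = names("CustomerName LIKE 'b%s'")
a_min_3 = names("CustomerName LIKE 'a__%'")  # 'Al' has only 2 characters

print(city_l_nd)  # ['Around the Horn', 'Bs Beverages']
print(a_min_3)    # ['Alfreds', 'Ana', 'Around the Horn']
```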
1. Extraction:
The first step of the ETL process is extraction. In this step, data is
extracted from various source systems, which can be in formats such as
relational databases, NoSQL, XML, and flat files, into a staging area. It
is important to extract the data into the staging area first rather than
directly into the data warehouse, because the extracted data comes in
various formats and may also be corrupted. Loading it directly into the
data warehouse could damage the warehouse, and rolling back would be much
more difficult. This makes extraction one of the most important steps of
the ETL process.
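A minimal sketch of extracting a flat file into a staging table, assuming SQLite via Python's sqlite3 as the staging area and a small in-memory CSV as the source. The file contents, the staging_customers table, and the 'customers.csv' source label are hypothetical. Staging columns are all TEXT so that malformed values such as 'three' are kept as they arrived, to be dealt with in the transformation step rather than loaded into the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; in practice this would be read from disk or another system
SOURCE_CSV = """id,name,country
1,Ann,U.S.A
2,Bob,
three,Cy,America
"""

conn = sqlite3.connect(":memory:")
# All-TEXT staging columns accept the raw data without validating it
conn.execute(
    "CREATE TABLE staging_customers (id TEXT, name TEXT, country TEXT, source TEXT)"
)

reader = csv.DictReader(io.StringIO(SOURCE_CSV))
with conn:  # one transaction: either the whole extract lands in staging or none of it
    conn.executemany(
        "INSERT INTO staging_customers VALUES (:id, :name, :country, 'customers.csv')",
        reader,
    )

staged = conn.execute(
    "SELECT id, name, country FROM staging_customers ORDER BY rowid"
).fetchall()
print(staged)  # [('1', 'Ann', 'U.S.A'), ('2', 'Bob', ''), ('three', 'Cy', 'America')]
```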
2. Transformation:
The second step of the ETL process is transformation. In this step, a set
of rules or functions is applied to the extracted data to convert it into a
single standard format. It may involve the following tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling NULL values with default values, mapping U.S.A,
United States, and America to USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the
key attribute).
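A sketch of the five transformation tasks as a single INSERT ... SELECT, assuming SQLite via Python's sqlite3; the staging_customers and dw_customers tables, their columns, and the 'Unknown' default are hypothetical. Filtering drops internal_note, cleaning maps the country variants to USA and fills NULL with a default, joining concatenates the name parts, splitting breaks the signup date into year and month, and sorting feeds rows in key order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_customers (
    id INTEGER, first_name TEXT, last_name TEXT,
    country TEXT, signup TEXT, internal_note TEXT
);
INSERT INTO staging_customers VALUES
    (3, 'Cy',  'Lee',   'America', '2023-05-01', 'x'),
    (1, 'Ann', 'Smith', 'U.S.A',   '2022-11-15', 'y'),
    (2, 'Bob', 'Jones', NULL,      '2024-01-20', 'z');
CREATE TABLE dw_customers (
    id INTEGER PRIMARY KEY, full_name TEXT, country TEXT,
    signup_year INTEGER, signup_month INTEGER
);
""")

conn.execute("""
INSERT INTO dw_customers (id, full_name, country, signup_year, signup_month)
SELECT
    id,
    first_name || ' ' || last_name,                   -- joining: two attributes into one
    CASE WHEN country IN ('U.S.A', 'United States', 'America') THEN 'USA'
         ELSE COALESCE(country, 'Unknown') END,        -- cleaning: variants and NULLs
    CAST(substr(signup, 1, 4) AS INTEGER),             -- splitting: year ...
    CAST(substr(signup, 6, 2) AS INTEGER)              -- ... and month
FROM staging_customers                                 -- filtering: internal_note is dropped
ORDER BY id                                            -- sorting: by the key attribute
""")
conn.commit()

rows = conn.execute("SELECT * FROM dw_customers ORDER BY id").fetchall()
for row in rows:
    print(row)
# (1, 'Ann Smith', 'USA', 2022, 11)
# (2, 'Bob Jones', 'Unknown', 2024, 1)
# (3, 'Cy Lee', 'USA', 2023, 5)
```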
3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. Sometimes the
data is loaded into the data warehouse very frequently, and sometimes it is
loaded at longer but regular intervals. The rate and period of loading
depend on the requirements and vary from system to system.
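A minimal sketch of repeated, periodic loads, assuming SQLite via Python's sqlite3 and a hypothetical dw_customers table. Each batch is loaded in one transaction so that a failing batch is rolled back as a whole. INSERT OR REPLACE is one possible choice (an assumption, not the only strategy): re-loading a row with an existing key replaces it instead of duplicating it, so the same load can run on any schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customers (id INTEGER PRIMARY KEY, full_name TEXT, country TEXT)")

def load(conn, rows):
    """Load one batch of transformed rows; a row with an existing id replaces the old one."""
    with conn:  # commits on success, rolls back the whole batch on an exception
        conn.executemany(
            "INSERT OR REPLACE INTO dw_customers (id, full_name, country) VALUES (?, ?, ?)",
            rows,
        )

# First scheduled run
load(conn, [(1, "Ann Smith", "USA"), (2, "Bob Jones", "Unknown")])
# A later run: Bob's country was corrected at the source, and a new customer arrived
load(conn, [(2, "Bob Jones", "USA"), (3, "Cy Lee", "USA")])

warehouse = conn.execute("SELECT * FROM dw_customers ORDER BY id").fetchall()
print(warehouse)  # [(1, 'Ann Smith', 'USA'), (2, 'Bob Jones', 'USA'), (3, 'Cy Lee', 'USA')]
```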