
SQL Notes

The document provides an overview of SQL commands categorized into DDL, DML, DCL, and TCL, detailing their functions and examples. It also discusses the order of execution, differences between various commands, types of constraints and keys, aggregate functions, window functions, joins, and other SQL concepts. Additionally, it touches on data warehousing and the ETL process for data extraction, transformation, and loading.

Uploaded by

sunnu

1. SQL uses commands such as CREATE, DROP, and INSERT to carry out the required tasks.

These SQL commands fall into four main categories:
1) DDL – Data Definition Language
DDL consists of the SQL commands that define the database schema. These commands create,
modify, and delete database structures, but not the data stored in them. They are normally
not used by a general user, who should be accessing the database via an application.
List of DDL commands:
• CREATE: Creates the database or its objects (such as tables, indexes, functions,
views, stored procedures, and triggers).
• DROP: Deletes objects from the database.
• ALTER: Alters the structure of an existing database object.
• TRUNCATE: Removes all records from a table and releases the space allocated
for those records.
• COMMENT: Adds comments to the data dictionary.
• RENAME: Renames an object existing in the database.

2) DML – Data Manipulation Language
The SQL commands that deal with the manipulation of data present in the database
belong to DML, and this includes most SQL statements. DML is the part of SQL that
retrieves, stores, modifies, and deletes data. (Some references place SELECT in a
separate category, DQL or Data Query Language.) List of DML commands:
• SELECT: Retrieves data from the database.
• INSERT: Inserts data into a table.
• UPDATE: Updates existing data within a table.
• DELETE: Deletes records from a database table.
• LOCK: Controls concurrent access to a table.
• CALL: Calls a PL/SQL or Java subprogram (Oracle).
• EXPLAIN PLAN: Describes the access path to data (Oracle).

3) DCL – Data Control Language
DCL includes commands such as GRANT and REVOKE, which mainly deal with the
rights, permissions, and other controls of the database system. List of DCL commands:
• GRANT: Gives users access privileges to the database.
• REVOKE: Withdraws access privileges that were given to a user with
the GRANT command.
4) TCL – Transaction Control Language
Transactions group a set of tasks into a single execution unit. Each transaction begins
with a specific task and ends when all the tasks in the group successfully complete. If any
of the tasks fail, the transaction fails. Therefore, a transaction has only two results:
success or failure. List of TCL commands:
• COMMIT: Makes the changes of a transaction permanent.
• ROLLBACK: Undoes the changes of a transaction, for example when an error occurs.
• SAVEPOINT: Sets a savepoint within a transaction, to which it can later be rolled back.
• SET TRANSACTION: Specifies characteristics for the transaction.
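The transaction commands above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical accounts table; SQLite spells the commands BEGIN, SAVEPOINT, ROLLBACK TO, ROLLBACK, and COMMIT, and other databases may differ slightly.

```python
import sqlite3

# isolation_level=None keeps Python from opening transactions on its own,
# so the BEGIN / SAVEPOINT / ROLLBACK / COMMIT statements run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# Transaction 1: move 30 from account 1 to account 2, fixing a mistake on the way.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("SAVEPOINT after_debit")
conn.execute("UPDATE accounts SET balance = balance + 999 WHERE id = 2")  # wrong amount
conn.execute("ROLLBACK TO after_debit")  # undo only the work done after the savepoint
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")  # make the transfer permanent

# Transaction 2: a destructive change that is abandoned as a whole.
conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
conn.execute("ROLLBACK")

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```

The committed transfer survives, while the rolled-back DELETE leaves no trace.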

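2. Order of Execution
The clauses of a SELECT query are logically evaluated in the following order:
• FROM (including any JOINs)
• WHERE
• GROUP BY
• HAVING
• SELECT
• ORDER BY
• LIMIT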
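As an illustration of this order, the sketch below runs one query that uses every clause, with comments numbering the logical steps. The sales table and its rows are made up for the example, and Python's sqlite3 module is used only as a convenient way to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 50),
     ("South", 20), ("East", 500), ("West", 5)],
)

query = """
SELECT region, SUM(amount) AS total   -- 5. compute the output columns
FROM sales                            -- 1. pick the source rows
WHERE amount >= 10                    -- 2. filter individual rows
GROUP BY region                       -- 3. form groups
HAVING SUM(amount) > 60               -- 4. filter whole groups
ORDER BY total DESC                   -- 6. sort the result
LIMIT 2                               -- 7. keep only the first rows
"""
top_regions = conn.execute(query).fetchall()
print(top_regions)  # [('East', 500), ('North', 400)]
```

The West row is removed by WHERE before any grouping happens, and South (total 70) passes HAVING but is cut off by LIMIT after sorting.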

3. What is the difference between DELETE, DROP, and TRUNCATE?
• The DELETE command deletes one or more existing records from a table in the
database, optionally restricted by a WHERE clause.
• The DROP command removes the complete table, both its structure and its data,
from the database.
• The TRUNCATE command deletes all the rows from an existing table, leaving the
empty table with its column definitions in place.
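A minimal sketch of the three behaviours follows, using Python's sqlite3 module and a hypothetical logs table. One assumption to note: SQLite has no TRUNCATE statement, so a DELETE without a WHERE clause stands in for it here; in databases such as MySQL, PostgreSQL, Oracle, or SQL Server the statement would be TRUNCATE TABLE logs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany("INSERT INTO logs (level) VALUES (?)",
                 [("info",), ("error",), ("info",)])

def table_exists(name):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None

def row_count():
    return conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DELETE with WHERE removes only the matching rows.
conn.execute("DELETE FROM logs WHERE level = 'error'")
after_delete = row_count()

# Stand-in for TRUNCATE: every row goes, the table definition stays.
conn.execute("DELETE FROM logs")
after_truncate = row_count()
still_there = table_exists("logs")

# DROP removes the table itself.
conn.execute("DROP TABLE logs")
after_drop = table_exists("logs")

print(after_delete, after_truncate, still_there, after_drop)  # 2 0 True False
```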

4. Types of Constraints
i. NOT NULL Constraint: The NOT NULL constraint ensures that a column cannot contain
NULL values. For example, in a table "employees", the column "name" can be declared
NOT NULL so that every employee row has a name.
ii. UNIQUE Constraint: The UNIQUE constraint ensures that each value in a column is
unique. For example, in a table "students", the column "email" can be declared
UNIQUE so that no two students share an email address.
iii. PRIMARY KEY Constraint: The PRIMARY KEY constraint ensures that each row in a
table is uniquely identified by a specific column or set of columns. For example, in
a table "books", the column "id" can uniquely identify each row.
iv. FOREIGN KEY Constraint: The FOREIGN KEY constraint is used to link two tables
together by creating a relationship between a column in one table and the primary key
column in another table. For example, in a table "orders", the column
"customer_id" can reference the "id" column in the "customers" table.
v. CHECK Constraint: The CHECK constraint is used to ensure that values in a column
meet a specific condition. For example, in a table "employees", the column
"age" can be required to be greater than or equal to 18.

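The five constraints can be seen rejecting bad rows in the sketch below. It is a compressed illustration rather than the exact tables described above: the UNIQUE email column is placed on customers instead of a separate students table, and the helper violates() is a name invented for this example. It assumes Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT UNIQUE
);
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER CHECK (age >= 18)
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers (id)
);
INSERT INTO customers VALUES (1, 'a@example.com');
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "NOT NULL":    violates("INSERT INTO employees VALUES (1, NULL, 30)"),
    "CHECK":       violates("INSERT INTO employees VALUES (2, 'Ann', 17)"),
    "UNIQUE":      violates("INSERT INTO customers VALUES (2, 'a@example.com')"),
    "PRIMARY KEY": violates("INSERT INTO customers VALUES (1, 'b@example.com')"),
    "FOREIGN KEY": violates("INSERT INTO orders VALUES (1, 99)"),
    "valid row":   violates("INSERT INTO employees VALUES (3, 'Bob', 18)"),
}
for label, rejected in results.items():
    print(label, "rejected" if rejected else "accepted")
```

Every row that breaks a constraint is rejected, and only the last, valid row is accepted.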
5. Types of Keys
1) Primary Key
The primary key is the key chosen to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys; in a PERSON table, for instance, several
attributes may each identify a person. The key which is most suitable from that list
becomes the primary key.

2) Foreign key
• A foreign key is a column of a table that points to the primary key of
another table.
• Every employee works in a specific department in a company, and employee and
department are two different entities. So, we can't store the department's
information in the employee table. That's why we link these two tables through
the primary key of one table: the department's primary key is stored as a
foreign key column in the employee table.

3) Candidate key
• A candidate key is a minimal attribute or set of attributes that can uniquely
identify a tuple.
• The primary key is chosen from the candidate keys. The remaining candidate keys
are as strong as the primary key.
• In the EMPLOYEE table, id is best suited for the primary key. The other uniquely
identifying attributes, like SSN, Passport_Number, License_Number, etc., are
also candidate keys.

4) Super Key
• A super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.
• In the EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the
names of two employees can be the same, but their EMPLOYEE_ID can't be the
same. Hence, this combination can also be a key.

6. Aggregate Functions
An aggregate function in SQL returns one value calculated from multiple values of a column.
We often use aggregate functions with the GROUP BY and HAVING clauses of the SELECT
statement. The most common SQL aggregate functions are:
• COUNT(): the number of rows (or of non-NULL values of a column)
• SUM(): the total of the values
• AVG(): the average of the values
• MIN(): the smallest value
• MAX(): the largest value
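The sketch below applies these functions to a hypothetical employees table, first over the whole table and then per department with GROUP BY. It assumes Python's sqlite3 module; the row with a NULL salary shows that aggregates other than COUNT(*) skip NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ann", "Sales", 4000),
    ("Bob", "Sales", 6000),
    ("Cid", "IT", 8000),
    ("Dee", "IT", None),   # salary not yet known
])

# One summary row for the whole table.
overall = conn.execute("""
    SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary), MIN(salary), MAX(salary)
    FROM employees
""").fetchone()
print(overall)  # (4, 3, 18000, 6000.0, 4000, 8000)

# One summary row per department.
per_department = conn.execute("""
    SELECT department, COUNT(*), AVG(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()
print(per_department)  # [('IT', 2, 8000.0), ('Sales', 2, 5000.0)]
```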

7. What is the difference between the WHERE and HAVING clauses in SQL?
• A HAVING clause is like a WHERE clause, but applies only to groups as a whole
(that is, to the rows in the result set representing groups), whereas the WHERE clause
applies to individual rows before they are grouped. A query can contain both a WHERE
clause and a HAVING clause.
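To make the difference concrete, the sketch below applies the same threshold of 50 once in WHERE and once in HAVING. The orders table and its rows are hypothetical, and Python's sqlite3 module is assumed for running the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("Ann", 40), ("Ann", 70), ("Bob", 120), ("Cid", 30), ("Cid", 30),
])

# WHERE: drop individual orders of 50 or less, then total what is left.
where_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    WHERE amount > 50
    GROUP BY customer ORDER BY customer
""").fetchall()
print(where_rows)   # [('Ann', 70), ('Bob', 120)]

# HAVING: total all orders first, then drop customers whose total is 50 or less.
having_rows = conn.execute("""
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 50
    ORDER BY customer
""").fetchall()
print(having_rows)  # [('Ann', 110), ('Bob', 120), ('Cid', 60)]
```

Cid has no single order above 50 and so disappears under WHERE, but Cid's total of 60 passes the HAVING test.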

8. Window functions
Window functions apply aggregate and ranking functions over a particular window (set of
rows). The OVER clause is used with window functions to define that window. The OVER
clause does two things:
• Partitions rows to form sets of rows. (The PARTITION BY clause is used.)
• Orders rows within those partitions into a particular order. (The ORDER BY clause is
used.)
• If no partitioning is done, then ORDER BY orders all rows of the table.
For example, AVG(salary) OVER (PARTITION BY department) calculates the average salary
within each department and displays it on every employee row, in a column such as
Avg_Salary. Adding ORDER BY age inside OVER orders the employees within each department
by their age; note that for aggregates this also changes the default window to a running
calculation up to the current row.

1) RANK()
As the name suggests, the RANK() function assigns a rank to every row within each
partition. Rank 1 is given to the first row, and rows having the same value are assigned
the same rank. After tied rows, ranks are skipped: if two rows share rank 2, the next
row gets rank 4.

2) DENSE_RANK()
It assigns a rank to each row within a partition. Just like RANK(), the first row is
assigned rank 1 and rows having the same value have the same rank. The difference between
RANK() and DENSE_RANK() is that DENSE_RANK() uses the next consecutive integer after
tied rows, so no rank is skipped.

3) ROW_NUMBER()
It assigns consecutive integers to all the rows within a partition. Within a partition,
no two rows can have the same row number.
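The three ranking functions can be compared side by side on a hypothetical scores table with one tie. This sketch assumes Python's sqlite3 module linked against SQLite 3.25 or later, the first version with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Ann", 90), ("Bob", 85), ("Cid", 85), ("Dee", 70)])

ranked = conn.execute("""
    SELECT name,
           score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
""").fetchall()
for row in ranked:
    print(row)
# ('Ann', 90, 1, 1, 1)
# ('Bob', 85, 2, 2, 2)
# ('Cid', 85, 2, 2, 3)
# ('Dee', 70, 4, 3, 4)
```

Bob and Cid tie on 85: RANK() skips to 4 for Dee, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a number.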

4) LEAD & LAG
• LAG() and LEAD() are positional window functions. They are very useful in creating
reports, because they can refer to data from rows above or below the current row.
• The LAG() function allows access to a value stored in a different row above the
current row. The row above may be adjacent or some number of rows above, as
sorted by a specified column or set of columns.
• Syntax: LAG(expression [, offset [, default_value]]) OVER (ORDER BY columns)
• The simplest use of LAG() displays the value from the adjacent row above. For
example, in a sales report where Stef's row comes directly before Alice's, the second
record displays Alice's sale amount ($12,000) alongside Stef's ($7,000) from the row
above, in columns sale_value and previous_sale_value, respectively. The first row
does not have an adjacent row above, and consequently its previous_sale_value field
is empty (NULL). If you specify only the required argument (the name of the column
or other expression), the offset argument defaults to 1 and the third argument
defaults to NULL, so every other row shows the value from the row immediately above.
• The LEAD() function works the same way in the opposite direction: it accesses a
value from a row below the current row.
• Syntax: LEAD(expression [, offset [, default_value]]) OVER (ORDER BY columns)
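The following sketch reproduces the Stef and Alice rows described above and adds a LEAD() column. The third seller (Bo), the sale_id ordering column, and the default of 0 passed to LEAD() are assumptions made for the illustration; it uses Python's sqlite3 module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, seller TEXT, sale_value INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "Stef", 7000), (2, "Alice", 12000), (3, "Bo", 9000)])

report = conn.execute("""
    SELECT seller,
           sale_value,
           LAG(sale_value)        OVER (ORDER BY sale_id) AS previous_sale_value,
           LEAD(sale_value, 1, 0) OVER (ORDER BY sale_id) AS next_sale_value
    FROM sales
    ORDER BY sale_id
""").fetchall()
for row in report:
    print(row)
# ('Stef', 7000, None, 12000)
# ('Alice', 12000, 7000, 9000)
# ('Bo', 9000, 12000, 0)
```

LAG() with only one argument yields NULL (None in Python) on the first row, while LEAD() with an explicit default yields 0 on the last row.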

9. Indexes
• Indexes are used to retrieve data from the database more quickly than otherwise.
Users cannot see the indexes; they are just used to speed up searches/queries.

• Updating a table with indexes takes more time than updating a table without (because the
indexes also need an update). So, only create indexes on columns that will be frequently
searched against.

• Syntax:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

• The following guidelines indicate when the use of an index should be reconsidered:
• Indexes should not be used on small tables.
• Indexes should be used sparingly on tables that have frequent, large batch update
or insert operations.
• Indexes should not be used on columns that contain a high number of NULL
values.
• Columns that are frequently manipulated should not be indexed.
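One way to see an index being used is to compare the query plan before and after creating it. The sketch below assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement (other databases have their own EXPLAIN variants); the customers table, its generated rows, and the helper plan() are made up for the example. The exact plan wording varies between SQLite versions, so no literal output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [(f"customer{i}", "London" if i % 10 == 0 else "Paris") for i in range(1000)],
)

def plan(sql):
    """Join the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name FROM customers WHERE city = 'London'"

plan_before = plan(query)   # a full scan of the table
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
plan_after = plan(query)    # a search that names idx_customers_city

print("before:", plan_before)
print("after: ", plan_after)
```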

10. Difference between GROUP BY and window functions
Both GROUP BY and window functions are used to perform aggregate operations on data in
SQL, but they differ in their capabilities and how they operate on the data.
• GROUP BY collapses rows into separate summary rows, while window functions calculate
new column values for each individual row.
• GROUP BY returns a summary table, while window functions return a new column with
calculated values for each row.
• GROUP BY is used with aggregate functions, while window functions can be
aggregate functions or non-aggregate functions such as ranking and positional functions.
• In a GROUP BY query, the SELECT list can reference only the grouped columns and
aggregate functions, while a query with window functions can still select any column
of each row.
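The contrast is easiest to see by computing the same department average both ways. The employees table below is hypothetical, and the sketch assumes Python's sqlite3 module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ann", "Sales", 4000), ("Bob", "Sales", 6000), ("Cid", "IT", 8000)])

# GROUP BY: one summary row per department, individual employees are gone.
grouped = conn.execute("""
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()
print(grouped)   # [('IT', 8000.0), ('Sales', 5000.0)]

# Window function: every employee row is kept and gains an extra column.
windowed = conn.execute("""
    SELECT name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS avg_salary
    FROM employees
    ORDER BY name
""").fetchall()
print(windowed)
# [('Ann', 'Sales', 4000, 5000.0), ('Bob', 'Sales', 6000, 5000.0), ('Cid', 'IT', 8000, 8000.0)]
```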

11. Joins
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.

1) Inner Join
• The INNER JOIN keyword selects records that have matching values in both
tables.
• In the example below, rows are returned only where the join columns match. If there
are records in the "Orders" table that do not have matches in "Customers", these
orders will not be shown.
• Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

2) Left Join
• The LEFT JOIN keyword returns all records from the left table (table1), and the
matching records from the right table (table2). If there is no match, the columns
from the right side contain NULL.
• In the example below, LEFT JOIN returns all records from the left table (Customers),
even if there are no matches in the right table (Orders).
• Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
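To see what the inner and left joins return, the sketch below runs both on a few made-up rows: one customer (Centro) has no orders, and one order (13) points at a customer that does not exist. Table and column names follow the queries above, the data is hypothetical, and Python's sqlite3 module is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Alfreds'), (2, 'Berglund'), (3, 'Centro');
INSERT INTO Orders VALUES
    (10, 1, '2024-01-05'), (11, 1, '2024-02-10'),
    (12, 2, '2024-03-01'), (13, 9, '2024-03-07');
""")

# INNER JOIN: only orders with a matching customer; order 13 is dropped.
inner_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()
print(inner_rows)  # [(10, 'Alfreds'), (11, 'Alfreds'), (12, 'Berglund')]

# LEFT JOIN: every customer appears; Centro has no order, so OrderID is NULL.
left_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName, Orders.OrderID
""").fetchall()
print(left_rows)
# [('Alfreds', 10), ('Alfreds', 11), ('Berglund', 12), ('Centro', None)]
```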

3) Right Join
• The RIGHT JOIN keyword returns all records from the right table (table2), and
the matching records from the left table (table1). If there is no match, the columns
from the left side contain NULL.
• In the example below, RIGHT JOIN returns all records from the right table
(Employees), even if there are no matches in the left table (Orders).
• Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

4) Self-Join
• A self-join is a regular join, but the table is joined with itself.
• Self-joins can be useful in many scenarios, including hierarchical structures, such
as organizational charts, where employees report to other employees, or
product categories, where a category may be nested under a parent category. They can
also be used to identify patterns or relationships within a single table.
• Example (pairs of customers from the same city)
SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
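The self-join above can be run on a few hypothetical customers as sketched below, assuming Python's sqlite3 module. The extra ORDER BY column and the second query, which uses < instead of <> so that each pair is listed once, are additions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
INSERT INTO Customers VALUES (1, 'Ann', 'Berlin'), (2, 'Bob', 'Berlin'), (3, 'Cid', 'Madrid');
""")

# A and B are two aliases for the same table.
pairs = conn.execute("""
    SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
    FROM Customers A, Customers B
    WHERE A.CustomerID <> B.CustomerID
      AND A.City = B.City
    ORDER BY A.City, A.CustomerName
""").fetchall()
print(pairs)         # [('Ann', 'Bob', 'Berlin'), ('Bob', 'Ann', 'Berlin')]

# With < instead of <>, each pair of customers is reported only once.
unique_pairs = conn.execute("""
    SELECT A.CustomerName, B.CustomerName, A.City
    FROM Customers A
    JOIN Customers B ON A.City = B.City AND A.CustomerID < B.CustomerID
    ORDER BY A.City
""").fetchall()
print(unique_pairs)  # [('Ann', 'Bob', 'Berlin')]
```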

12. Union Operation
The UNION operator combines the result sets of two or more SELECT statements.
• Every SELECT statement within UNION must have the same number of columns.
• The columns must also have similar data types.
• The columns in every SELECT statement must also be in the same order.
• The UNION operator selects only distinct values by default. To allow duplicate
values, use UNION ALL.
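A short sketch of UNION versus UNION ALL follows, using two hypothetical tables that share one city. It assumes Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (name TEXT, city TEXT);
CREATE TABLE suppliers (name TEXT, city TEXT);
INSERT INTO customers VALUES ('Ann', 'Berlin'), ('Bob', 'Madrid');
INSERT INTO suppliers VALUES ('Acme', 'Berlin'), ('Nord', 'Oslo');
""")

# UNION removes the duplicate 'Berlin'.
distinct_cities = conn.execute("""
    SELECT city FROM customers
    UNION
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()
print(distinct_cities)  # [('Berlin',), ('Madrid',), ('Oslo',)]

# UNION ALL keeps every row from both SELECT statements.
all_cities = conn.execute("""
    SELECT city FROM customers
    UNION ALL
    SELECT city FROM suppliers
    ORDER BY city
""").fetchall()
print(all_cities)       # [('Berlin',), ('Berlin',), ('Madrid',), ('Oslo',)]
```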

13. Wildcard Characters
Wildcard characters are used with the LIKE operator: % represents zero or more
characters, and _ represents exactly one character.

Select all customers whose name starts with the letter "a":
SELECT * FROM Customers
WHERE CustomerName LIKE 'a%';

Return all customers from a city that starts with 'L' followed by one
wildcard character, then 'nd' and then two wildcard characters:
SELECT * FROM Customers
WHERE city LIKE 'L_nd__';

Return all customers from a city that contains the letter 'L':
SELECT * FROM Customers
WHERE city LIKE '%L%';

Return all customers whose name starts with 'a' or starts with 'b':
SELECT * FROM Customers
WHERE CustomerName LIKE 'a%' OR CustomerName LIKE 'b%';

Return all customers whose name ends with 'a':
SELECT * FROM Customers
WHERE CustomerName LIKE '%a';

Return all customers whose name starts with "b" and ends with "s":
SELECT * FROM Customers
WHERE CustomerName LIKE 'b%s';

Return all customers whose name starts with "a" and is at least 3 characters
in length:
SELECT * FROM Customers
WHERE CustomerName LIKE 'a__%';

Return all customers from Spain (without a wildcard, the pattern must match the
whole value):
SELECT * FROM Customers
WHERE Country LIKE 'Spain';
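Several of the patterns above are checked against a handful of made-up customers in the sketch below. It assumes Python's sqlite3 module; the helper names_like() is invented for the example. One caveat: SQLite's LIKE ignores case for ASCII letters by default, which is why 'a%' matches "Alfreds" here, whereas case sensitivity in other databases depends on the collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    ("Alfreds", "Berlin"), ("Ana", "London"), ("Al", "Lisbon"),
    ("Berglunds", "Madrid"), ("Maria", "London"),
])

def names_like(column, pattern):
    """Names of customers whose given column matches the LIKE pattern."""
    # The column name is interpolated only because it comes from this script,
    # never from user input; the pattern itself is passed as a bound parameter.
    sql = f"SELECT CustomerName FROM Customers WHERE {column} LIKE ? ORDER BY CustomerName"
    return [name for (name,) in conn.execute(sql, (pattern,))]

starts_with_a = names_like("CustomerName", "a%")      # ['Al', 'Alfreds', 'Ana']
ends_with_a = names_like("CustomerName", "%a")        # ['Ana', 'Maria']
b_to_s = names_like("CustomerName", "b%s")            # ['Berglunds']
a_min_three = names_like("CustomerName", "a__%")      # ['Alfreds', 'Ana']
city_l_nd = names_like("City", "L_nd__")              # ['Ana', 'Maria'] (London)

print(starts_with_a, ends_with_a, b_to_s, a_min_three, city_l_nd)
```

"Al" matches 'a%' but not 'a__%', because each underscore demands exactly one more character.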

14. SQL NULL Functions
• MySQL: The IFNULL() function lets you return an alternative value if an
expression is NULL. The COALESCE() function can be used as well.

• SQL Server: The ISNULL() function serves the same purpose, and COALESCE() is
also available.

• MS Access: The IsNull() function returns TRUE (-1) if the expression is a
null value, otherwise FALSE (0).

• Oracle: The NVL() function achieves the same result, and COALESCE() is also
available.
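The effect of these functions can be sketched with a hypothetical products table in which one product has no units on order. The example assumes Python's sqlite3 module; SQLite happens to support both COALESCE() and IFNULL(), so the two are shown together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        name TEXT, price INTEGER, units_in_stock INTEGER, units_on_order INTEGER
    )
""")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    ("Jarlsberg", 10, 16, 15),
    ("Mascarpone", 32, 23, None),   # nothing on order: stored as NULL
])

stock_values = conn.execute("""
    SELECT name,
           price * (units_in_stock + units_on_order)              AS raw_value,
           price * (units_in_stock + COALESCE(units_on_order, 0)) AS with_coalesce,
           price * (units_in_stock + IFNULL(units_on_order, 0))   AS with_ifnull
    FROM products
    ORDER BY name
""").fetchall()
for row in stock_values:
    print(row)
# ('Jarlsberg', 310, 310, 310)
# ('Mascarpone', None, 736, 736)
```

Any arithmetic with NULL yields NULL, so without a replacement value the whole result for Mascarpone is lost.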

15. Stored Procedure
• A stored procedure is prepared SQL code that you can save, so the code can be
reused over and over again.
• So, if you have an SQL query that you write over and over again, save it as a stored
procedure, and then just call it to execute it.
• You can also pass parameters to a stored procedure, so that the stored procedure can
act based on the parameter value(s) that are passed.

Data Warehousing

ETL is a process in Data Warehousing, and it stands for Extract, Transform and
Load. It is a process in which an ETL tool extracts the data from various data
source systems, transforms it in the staging area, and then finally loads it
into the Data Warehouse system.

1. Extraction:
The first step of the ETL process is extraction. In this step, data is
extracted from various source systems, which can be in various formats like
relational databases, NoSQL, XML, and flat files, into the staging area. It
is important to store the extracted data in the staging area first and not
directly in the data warehouse, because the extracted data is in various
formats and can also be corrupted. Loading it directly into the data
warehouse may damage the warehouse, and rollback will be much more difficult.
Therefore, this is one of the most important steps of the ETL process.
2. Transformation:
The second step of the ETL process is transformation. In this step, a set
of rules or functions is applied to the extracted data to convert it into a
single standard format. It may involve the following processes/tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling up the NULL values with some default values,
mapping U.S.A, United States, and America into USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the
key attribute).
3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. Sometimes
the data is loaded into the data warehouse very frequently, and sometimes
it is loaded after longer but regular intervals. The rate and period of
loading depend solely on the requirements and vary from system to system.
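The three ETL steps can be sketched end to end in a few lines. Everything concrete here is an assumption made for illustration: the source rows, the COUNTRY_ALIASES mapping, the transform() helper, and the customer_dim table in an in-memory SQLite database standing in for the data warehouse. Real ETL tools do far more; this only shows the cleaning, splitting, and sorting tasks described above.

```python
import sqlite3

# Extract: rows as they might arrive from different source systems (made-up data).
extracted = [
    {"name": "Ann Lee", "country": "U.S.A", "revenue": "120"},
    {"name": "Bob Roy", "country": "United States", "revenue": None},
    {"name": "Cid Paz", "country": "America", "revenue": "80"},
    {"name": "Dee Kim", "country": "India", "revenue": "45"},
]

COUNTRY_ALIASES = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(row):
    """Convert one extracted row into the warehouse's standard format."""
    first, last = row["name"].split(" ", 1)          # splitting one attribute into two
    return {
        "first_name": first,
        "last_name": last,
        # cleaning: map country spellings onto one standard value
        "country": COUNTRY_ALIASES.get(row["country"], row["country"]),
        # cleaning: replace a missing (NULL) revenue with a default of 0
        "revenue": int(row["revenue"]) if row["revenue"] is not None else 0,
    }

# Transform in the staging area, sorted on an attribute.
staged = sorted((transform(r) for r in extracted), key=lambda r: r["last_name"])

# Load: write the staged rows into the warehouse table in one transaction.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_dim (first_name TEXT, last_name TEXT, country TEXT, revenue INTEGER)"
)
with warehouse:
    warehouse.executemany(
        "INSERT INTO customer_dim VALUES (:first_name, :last_name, :country, :revenue)",
        staged,
    )

by_country = warehouse.execute(
    "SELECT country, COUNT(*), SUM(revenue) FROM customer_dim GROUP BY country ORDER BY country"
).fetchall()
print(by_country)  # [('India', 1, 45), ('USA', 3, 200)]
```

Because the three spellings were cleaned before loading, the warehouse query sees a single USA group.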
