SQL Fundamentals
SQL Fundamentals:
UPDATE table_name SET age = 18, name = 'john' WHERE name = 'khan';
1. INT
2. BIGINT
3. SMALLINT
4. TINYINT
5. DECIMAL or NUMERIC
6. FLOAT
7. DOUBLE
8. REAL
10. VARCHAR: a variable-length character string; generally used when the length of the values can vary.
11. TEXT
12. DATE
13. TIME
30. SET
o Operators (arithmetic, comparison, logical)
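Arithmetic operators are + - * / and %. Comparison operators are = != <> < > <= >= (!= and <> both mean "not equal"). Logical
operators are AND, OR and NOT. UNION and UNION ALL are set operators that combine the results of two queries; they are not
logical operators.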
NULL is special in SQL: it does not mean 0 or an empty string; it indicates the absence
of a value altogether.
NULL cannot be compared with the normal comparison operators, because col = NULL never evaluates to TRUE. Instead we use IS:
IS NULL checks that a value is NULL, and IS NOT NULL checks that a value is not NULL.
With arithmetic operators, any operation that involves NULL returns NULL.
Example:- 5 + NULL = NULL
With aggregate functions, SUM(), AVG(), MIN(), MAX() and COUNT(col) ignore NULL values. The exception is
COUNT(*), which counts every row, including rows that contain NULL.
With logical operators, NULL is neither TRUE nor FALSE; it is unknown, so the result is NULL unless the other
operand decides the outcome. Example:- NULL AND TRUE = NULL, but NULL AND FALSE = FALSE and NULL OR TRUE = TRUE.
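The NULL rules above can be checked with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical table t; SQLite returns NULL as Python None and uses 1 and 0 for TRUE and FALSE, and other databases may display these values differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t (val) VALUES (?)", [(10,), (None,), (30,)])

# Arithmetic with NULL yields NULL (None in Python).
arith = conn.execute("SELECT 5 + NULL").fetchone()[0]

# "= NULL" never matches a row; IS NULL does.
eq_rows = conn.execute("SELECT COUNT(*) FROM t WHERE val = NULL").fetchone()[0]
is_rows = conn.execute("SELECT COUNT(*) FROM t WHERE val IS NULL").fetchone()[0]

# COUNT(*) counts every row; COUNT(val), SUM and AVG skip the NULL.
count_all, count_val, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(val), SUM(val), AVG(val) FROM t"
).fetchone()

# Three-valued logic: the result is NULL unless the other operand decides it.
and_true, and_false, or_true = conn.execute(
    "SELECT NULL AND 1, NULL AND 0, NULL OR 1"
).fetchone()

print(arith, eq_rows, is_rows)
print(count_all, count_val, total, average)
print(and_true, and_false, or_true)
```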
NULLIF():- NULLIF(a, b) returns NULL if a and b are equal; otherwise it returns a. It is
generally used to avoid division-by-zero errors.
Example:- NULLIF(col1, 0) returns NULL if col1 is equal to 0; otherwise it returns
the value in col1.
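A minimal sketch of the NULLIF pattern, assuming Python's sqlite3 module and a hypothetical sales table. SQLite happens to return NULL for a division by zero anyway, so the sketch only shows the values NULLIF produces; in engines such as SQL Server or PostgreSQL the same expression is what prevents the runtime error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL, units INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(100.0, 4), (50.0, 0)])

# NULLIF(units, 0) turns a zero divisor into NULL, so the division yields NULL.
rows = conn.execute(
    """
    SELECT units,
           NULLIF(units, 0)           AS safe_units,
           revenue / NULLIF(units, 0) AS price_per_unit
    FROM sales
    ORDER BY units DESC
    """
).fetchall()

for row in rows:
    print(row)
```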
2. Database Objects:
Tables are the foundation of any relational database. Data is stored in a structured form in
tables, and tables are connected to each other through relationships. A table consists
of rows and columns. Rows are horizontal and are also called records or tuples.
Columns are vertical and are also known as attributes or fields.
Primary Key is a column or set of columns used to uniquely identify each
record of a table. Primary key values are unique and cannot be NULL.
Foreign Key is a column or set of columns that refers to the primary key (or a unique key) of another table;
it is used to link the two tables.
Unique Key is similar to a primary key, but it can contain NULL. SQL Server allows only one NULL in a unique
column, while most other databases allow several.
Composite Key is a combination of two or more columns that together can be used as a primary key,
foreign key or unique key.
Alternate Key is a candidate key that was not chosen as the primary key; its values are unique, so it
could have served as the primary key.
Surrogate Key is an artificial, usually auto-incremented key that uniquely
identifies the record. It is generated by the system rather than taken from the data; it is not present in the data naturally.
Candidate Key is a minimal super key: it uniquely identifies the record, and no column can be removed from
it without losing uniqueness.
Super Key is any combination of columns that uniquely identifies a record. Every candidate key is a
super key, and a super key may contain extra columns.
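The key types above can be tried out with a short sketch. It assumes Python's sqlite3 module; the tables departments, employees and enrollments and the helper rejected() are hypothetical names introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,      -- primary key (here a surrogate id)
    dept_name TEXT NOT NULL UNIQUE      -- unique / alternate key
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    email   TEXT UNIQUE,
    dept_id INTEGER NOT NULL REFERENCES departments (dept_id)  -- foreign key
);
CREATE TABLE enrollments (
    student_id INTEGER,
    course_id  INTEGER,
    PRIMARY KEY (student_id, course_id) -- composite primary key
);
INSERT INTO departments VALUES (1, 'Engineering');
INSERT INTO employees VALUES (1, 'a@example.com', 1);
INSERT INTO enrollments VALUES (1, 10);
""")


def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pk = rejected("INSERT INTO departments VALUES (1, 'Sales')")
duplicate_unique = rejected("INSERT INTO employees VALUES (2, 'a@example.com', 1)")
missing_parent = rejected("INSERT INTO employees VALUES (3, 'b@example.com', 99)")
duplicate_composite = rejected("INSERT INTO enrollments VALUES (1, 10)")
new_composite = rejected("INSERT INTO enrollments VALUES (1, 11)")  # same student, new course

print(duplicate_pk, duplicate_unique, missing_parent, duplicate_composite, new_composite)
```

Only the last insert is accepted: the pair (1, 11) is new even though student_id 1 already exists, which is what makes the key composite.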
There are many types of indexes in SQL, such as clustered, non-clustered, bitmap,
full-text, hash and unique indexes.
Clustered:- the table rows themselves are stored physically sorted by the index key. It is generally created on the primary
key, the leaf nodes contain the data itself, and there can be only one per table. It is usually faster than a non-clustered
index, especially for range scans, and it needs no separate copy of the data.
Non-clustered:- it is created on one or more columns and stored separately from the table, so it takes additional
storage. Its leaf nodes hold pointers to the data rows, and it is generally stored as a B+ tree. A table can have
several non-clustered indexes.
Views are virtual tables defined by a stored query. They are created on top of tables to give limited access and
to display limited data, which helps with data security and privacy.
Materialized Views:- Unlike a normal view, the result is physically stored on disk, which helps
in retrieving frequently accessed data. The stored result has to be refreshed when the underlying tables change.
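Example:-
CREATE SEQUENCE seq_name -- seq_name is an illustrative placeholder; the first line of the original statement is missing
START WITH 1
INCREMENT BY 1;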
Identity columns:-
Example:-
Schema is the blueprint of a table: it shows which attributes the table will contain, which column is the
primary key, which columns are foreign keys, and other aspects.
Updating:-
o SELECT statement and its clauses (WHERE, GROUP BY, HAVING, ORDER BY)
The SELECT statement is used to retrieve data; we use different clauses to retrieve exactly the
data we want.
WHERE:- This clause filters rows by a condition before any grouping takes place. It cannot contain aggregate
functions.
GROUP BY:-
GROUP BY groups records on the basis of a column or a set of columns. It is
generally used with aggregate functions.
HAVING:-
The HAVING clause filters groups by a condition on an aggregate result; it is generally used with GROUP BY.
ORDER BY:-
ORDER BY sorts the records on the basis of a column or a group of columns, ascending (ASC, the default) or descending (DESC).
FROM:-
FROM specifies the table or tables from which the data is read.
LIMIT/OFFSET:-
LIMIT restricts the number of rows returned, and OFFSET skips the given number of rows before rows are returned.
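The following sketch puts all of these clauses into one query. It assumes Python's sqlite3 module and a hypothetical orders table; the comments inside the query mark what each clause does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")  # hypothetical
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "paid"),
        ("alice", 80.0, "paid"),
        ("bob", 40.0, "paid"),
        ("bob", 500.0, "cancelled"),
        ("carol", 300.0, "paid"),
        ("dave", 250.0, "paid"),
    ],
)

query = """
SELECT customer, SUM(amount) AS total   -- columns and aggregates to return
FROM orders                             -- source table
WHERE status = 'paid'                   -- filter rows before grouping
GROUP BY customer                       -- one group per customer
HAVING SUM(amount) >= 100               -- filter groups after aggregation
ORDER BY total DESC                     -- sort the remaining groups
LIMIT 2 OFFSET 1                        -- skip 1 row, then return 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Bob's cancelled order is removed by WHERE, his remaining total of 40 is removed by HAVING, and OFFSET skips the top customer (carol) before LIMIT takes the next two.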
o Joining tables (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, SELF JOIN)
INNER JOIN:- Inner join joins two tables on a common key; it retrieves only the
records that have a match in both tables.
LEFT JOIN:- Left join takes all the data from the left table and the matching data from the right
table; wherever it cannot find a match in the right table, it fills the right table's columns with NULL.
RIGHT JOIN:- Right join is the mirror image of left join: it takes all the data from the right table and the matching
data from the left table, and fills the left table's columns with NULL where there is no match.
FULL OUTER JOIN:- Full outer join takes all the data from both tables; it is the union of
the right and left joins.
SELF JOIN:- Joining a table with itself is called a self join; it is generally used when records
of the same table are related to each other.
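A small sketch of the join types, assuming Python's sqlite3 module and hypothetical customers and orders tables. RIGHT JOIN and FULL OUTER JOIN are left out only because SQLite supports them from version 3.39 onward; the idea is the same with the table roles swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, referred_by INTEGER);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'alice', NULL), (2, 'bob', 1), (3, 'carol', 1);
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 75.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; carol has no order, so amount is NULL (None).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# SELF JOIN: pair each customer with the customer who referred them.
self_rows = conn.execute("""
    SELECT c.name, r.name
    FROM customers c
    JOIN customers r ON c.referred_by = r.id
    ORDER BY c.id
""").fetchall()

print(inner_rows)
print(left_rows)
print(self_rows)
```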
A subquery is a query nested inside another query. The subquery retrieves data, and the main (outer) query then runs
over the data retrieved by the subquery.
Non-Correlated Subquery:- This subquery is independent: it does not depend on the outer query
for its execution and can be evaluated once.
DENSE_RANK() ranks rows without skipping any number after a tie, so
repeated values give ranks like 1,1,2,2,3,4,5
RANK() ranks rows but skips ranks after a tie, like:-
1,1,3,3,5
LEAD() returns the value from a following row. For example, if I have salary day by day, with LEAD() I
can see tomorrow's salary on today's row.
FIRST_VALUE() gives the first value of the window frame in each partition.
LAST_VALUE() gives the last value of the window frame in each partition. With ORDER BY and the default frame, the frame ends at the current row, so use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last value of the whole partition.
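The ranking and offset functions above can be compared side by side. This is a sketch that assumes Python's sqlite3 module linked against SQLite 3.25 or newer (the first version with window functions) and a hypothetical salaries table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (day INTEGER, salary INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 200), (4, 300)],
)

rows = conn.execute("""
    SELECT day,
           salary,
           RANK()       OVER (ORDER BY salary) AS rnk,        -- skips a rank after a tie
           DENSE_RANK() OVER (ORDER BY salary) AS dense_rnk,  -- never skips
           LEAD(salary) OVER (ORDER BY day)    AS next_salary,
           FIRST_VALUE(salary) OVER (ORDER BY day) AS first_salary,
           LAST_VALUE(salary)  OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_salary
    FROM salaries
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

The two tied salaries share rank 1; RANK() then jumps to 3 while DENSE_RANK() continues with 2, and LEAD() returns NULL on the last day because there is no following row.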
CTEs (Common Table Expressions) are named temporary result sets that exist only for the duration of a single query; they are
generally used to prepare intermediate data and run operations on top of it. They use the WITH keyword.
Example:-
o Recursive queries
A recursive query uses its own output as input: a recursive CTE references itself, so the query keeps calling itself until no new rows are produced.
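A minimal sketch of a recursive CTE, assuming Python's sqlite3 module and a hypothetical employees table in which manager_id points at another row of the same table. The first SELECT is the starting point (the employee with no manager), and the second SELECT is repeated for every row found so far.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'ceo', NULL), (2, 'vp', 1), (3, 'engineer', 2), (4, 'analyst', 2);
""")

query = """
WITH RECURSIVE chain (id, name, level) AS (
    SELECT id, name, 0                    -- anchor: the top of the hierarchy
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, c.level + 1      -- recursive step: direct reports
    FROM employees e
    JOIN chain c ON e.manager_id = c.id
)
SELECT name, level FROM chain ORDER BY level, id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The recursion stops on its own once the join finds no further reports.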
Creating:-
Col_3 CHAR,
Altering:-
Dropping:-
CHECK:- A CHECK constraint defines a condition that every row must satisfy; the database evaluates the
condition whenever data is inserted or updated and rejects the change if it fails.
The DEFAULT keyword sets a default value for a column. The default is used when an INSERT
does not supply a value for that column, and the DEFAULT keyword can also be written in place of a value
while inserting.
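A short sketch of CHECK and DEFAULT working together, assuming Python's sqlite3 module and a hypothetical accounts table. The insert supplies only the owner, so the defaults fill in the rest, and the CHECK constraint then blocks both an invalid insert and an invalid update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (               -- hypothetical table
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance REAL NOT NULL DEFAULT 0 CHECK (balance >= 0),
        status  TEXT NOT NULL DEFAULT 'active'
    )
""")

# No balance or status supplied: the DEFAULT values are used.
conn.execute("INSERT INTO accounts (owner) VALUES ('alice')")
row = conn.execute("SELECT owner, balance, status FROM accounts").fetchone()


def violates_check(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


bad_insert = violates_check("INSERT INTO accounts (owner, balance) VALUES ('bob', -10)")
bad_update = violates_check("UPDATE accounts SET balance = -5 WHERE owner = 'alice'")

print(row, bad_insert, bad_update)
```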
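Temporary tables are created to store data temporarily, within a session. In SQL Server we prefix the table name with "#" for
a local temporary table (visible only to the current session) and "##" for a global temporary table (visible to all sessions).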
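CREATE TABLE #temp_employees ( -- table name is illustrative; the first line of the original statement is missing
employee_id INT,
name VARCHAR(100),
department VARCHAR(50)
);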
Table variables:-
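A table variable is similar to a temporary table, but it is declared with DECLARE like any other variable and is scoped
to the batch, stored procedure or function in which it is declared. Table variables are generally used in stored procedures and user-defined functions.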
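DECLARE @products TABLE ( -- variable name is illustrative; the first line of the original statement is missing
product_id INT,
product_name VARCHAR(100),
price DECIMAL(10, 2)
);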
The GRANT statement is used to give specific privileges to users or roles on database
objects.
The REVOKE statement is used to remove previously granted privileges from users or roles.
-- GRANT
-- REVOKE
A USER is an account used to connect to and interact with the database and to perform the actions it has been granted.
A user is created with CREATE USER, changed with ALTER USER and deleted
with DROP USER.
A ROLE is a collection of privileges that can be granted to users or to other roles.
It is created with CREATE ROLE, changed with ALTER ROLE and deleted with DROP
ROLE.
o Privileges
Privileges are the actions that a user or role can perform on the database
ACID stands for Atomicity, Consistency, Isolation, Durability. These properties are essential for any
transactional database; let's go through them one by one.
1. Atomicity:- A transaction is a single unit of work: either all of its operations
succeed together, or none of them takes effect.
2. Consistency:- A transaction takes the database from one valid state to another;
all constraints and rules still hold after the transaction.
3. Isolation:- Concurrent transactions run as if they were alone; one
transaction should not see or affect the intermediate state of another.
4. Durability:- Once a transaction is committed, its changes are permanent, even after a crash.
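Atomicity is the easiest of the four properties to demonstrate. The sketch below assumes Python's sqlite3 module, a hypothetical accounts table and a hypothetical transfer() helper; "with conn" commits when the block succeeds and rolls back when an exception escapes, which corresponds to COMMIT and ROLLBACK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer("alice", "bob", 30)    # both updates succeed
second_ok = transfer("alice", "bob", 500)  # debit violates CHECK, credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))

print(first_ok, second_ok, balances)
```

In the second transfer the credit to bob has already run when the debit fails, yet bob's balance is unchanged afterwards, because the whole unit of work was rolled back.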
o Transaction control statements (BEGIN, COMMIT, ROLLBACK, SAVEPOINT)
Choosing:
Transactional databases use a locking system, which controls how
transactions that access the same data at the same time interact with each other.
Different locks have different responsibilities: a shared lock allows reading only, an exclusive lock allows
both reading and writing by its holder and blocks other transactions, an update lock is taken on data that is about to be modified, and an intent lock signals at a higher level of the lock hierarchy (for example the table) that locks are held at a lower level (page or row).
A deadlock occurs when two or more transactions each hold a lock that another one needs, so all of them wait for each other forever. There
are different mechanisms to detect a deadlock,
Like:- Wait-for Graph and Timeout
Indexes help us retrieve data from the database faster. We create indexes on the columns of a
table. There are mainly 2 types of indexes.
Clustered:- It defines the physical order in which the table rows are stored on disk, so there can be only 1 clustered index per table,
and it gives direct access to the data.
Non Clustered:- It is built on one or more columns and uses the B+ tree
data structure to store the index. It is a bit slower than the clustered index because it contains
pointers to the data rather than the data itself.
o Covering indexes
A covering index contains every column that a query needs, either as key columns or as extra included columns, so the query can
be answered from the index alone without reading the main table.
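Whether an index covers a query can be read from the execution plan. The sketch below assumes Python's sqlite3 module, a hypothetical orders table and a hypothetical plan() helper; SQLite's EXPLAIN QUERY PLAN labels the access as a COVERING INDEX when the table itself does not have to be read. The exact wording of the plan text can differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, note TEXT)"
)
# The index holds customer_id and amount, but not note.
conn.execute("CREATE INDEX idx_orders_customer_amount ON orders (customer_id, amount)")


def plan(sql):
    """Return the plan description text for a query (last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Needs only customer_id and amount: answered from the index alone.
covered = plan("SELECT amount FROM orders WHERE customer_id = 7")

# Needs note as well: the index finds the rows, then the table must be read.
not_covered = plan("SELECT note FROM orders WHERE customer_id = 7")

print(covered)
print(not_covered)
```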
1. Indexing on database
3. Statistical Analysis
Indexing: creating indexes on tables helps queries retrieve data
faster. Indexes take extra space, but retrieval becomes much faster.
Sometimes we have to rewrite queries, optimize the joins and restrict data
movement as much as possible. Use efficient joins: prefer an inner join over a left
join where applicable, and consider a join instead of a subquery, which often performs better.
Also, don't blindly use SELECT *; select only the columns you need.
Statistical analysis means the DBMS collects statistics about how data is distributed in the database.
On the basis of these statistics the query optimizer chooses the execution plan, so keeping
statistics up to date and giving hints where needed can improve performance.
Table design at the start is important: choosing the correct data types matters,
and normalization helps in reducing redundancy.
Execution plan and profiling: with EXPLAIN we can check the execution plan of the whole query
and optimize the bottlenecks.
Stored procedures are precompiled SQL statements that are stored and executed in the
database. They run faster on repeated calls and reduce network traffic, because only the procedure call travels over the network.
A temporary table holds frequently used or intermediate data, which helps in retrieving it quickly.
With EXPLAIN we can see the whole execution plan that the DBMS has prepared, including the joins and their
costs; with its help we optimize the query.
Cardinality estimation is the process of predicting the number of rows that will be returned by a
particular step of a query plan. The query optimizer uses these estimates to choose the most
efficient way to execute the query.
Partitioning divides the data on the basis of some key; it creates chunks of data, which
helps with parallel processing.
Sharding is horizontal scaling: the data is distributed across several database servers, each of which is called
a shard.
12. Stored Procedures and Functions:
AS
sql_statement
GO;
AS
GO;
Input parameters pass values into a stored procedure; output parameters (declared with OUTPUT)
return values from the procedure to the caller.
User-defined functions, or UDFs, are custom functions created by the user. We can use the
built-in functions, but with the help of UDFs we can extend the
power of SQL.
There are two types of UDFs:-
Scalar: this function returns a single value of any data type.
RETURNS INT
AS
BEGIN
END;
Table-valued functions return a table as their output. They are similar to views but can take
parameters.
RETURNS TABLE
AS
RETURN
);
Error handling is important in SQL. In SQL Server, errors can be handled in a TRY...CATCH block:
BEGIN TRY
END TRY
BEGIN CATCH
SELECT
ERROR_NUMBER() AS ErrorNumber,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_STATE() AS ErrorState,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_LINE() AS ErrorLine,
ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
Triggers
What they are: Special stored procedures that automatically execute in response to
specific events on a table or view (e.g., INSERT, UPDATE, DELETE).
Types of Triggers
DML Triggers:
o FOR (or AFTER): Executes after the triggering event (INSERT, UPDATE,
DELETE). You can access the affected rows through special tables like
INSERTED (new values) and DELETED (old values).
o INSTEAD OF: Executes in place of the triggering event. Useful for customizing
the behavior of views or tables with constraints.
DDL Triggers: Triggered by data definition language events (e.g., CREATE, ALTER,
DROP). Often used for auditing database schema changes.
AFTER UPDATE
AS
BEGIN
SELECT d.CustomerID, d.Name + ' ' + d.Email, i.Name + ' ' + i.Email, SYSTEM_USER, GETDATE()
FROM DELETED d
END;
SQL Server Agent: A component of SQL Server that lets you schedule and automate
administrative tasks.
Jobs: A series of steps that can include SQL scripts, operating system commands, or
integration services packages.
Schedules: Define when jobs should run (e.g., daily, weekly, on a specific date/time).
Benefits:
Important Notes:
Recursion: Be careful to avoid infinite loops if triggers can modify data that then
triggers other triggers.
Error Handling: Implement robust error handling within triggers to prevent issues.
Star Schema: The foundational design for data warehouses. It consists of:
o Fact Table: Stores the core measurements or metrics of your business (e.g.,
sales amount, quantity sold).
Snowflake Schema: An extension of the star schema where dimension tables are
normalized (broken down into smaller tables) to reduce redundancy. This can
improve data quality and save storage space, but it can also make queries more
complex.
o Numerical Facts: The measurable values you want to analyze (e.g., sales,
costs, inventory).
o Foreign Keys: References to dimension tables that provide context for the
facts.
o Grain: The level of detail captured in the fact table (e.g., daily sales, individual
transactions).
The Problem: How to track changes in dimension attributes over time (e.g., a
customer's address or a product's price).
SCD Types:
o Type 1: Overwrite the old value with the new one (no history).
o Type 2: Create a new row for the changed dimension member with a new
start date and a version number.
o Type 3: Keep the current value and a previous value in separate columns.
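The Type 2 approach can be sketched as follows, assuming Python's sqlite3 module. The dim_customer table, the apply_change() helper and the end_date and is_current columns are hypothetical additions for illustration; the notes above only call for a new row with a new start date and a version number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (           -- hypothetical dimension table
        sk          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id INTEGER,                            -- business key
        city        TEXT,
        version     INTEGER,
        start_date  TEXT,
        end_date    TEXT,      -- assumption: NULL while the row is current
        is_current  INTEGER    -- assumption: 1 for the current row, else 0
    )
""")


def apply_change(customer_id, city, change_date):
    """SCD Type 2: close the current row and insert a new version."""
    current = conn.execute(
        "SELECT sk, city, version FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current and current[1] == city:
        return  # attribute unchanged, nothing to record
    with conn:  # both statements succeed together or not at all
        version = 1
        if current:
            conn.execute(
                "UPDATE dim_customer SET end_date = ?, is_current = 0 WHERE sk = ?",
                (change_date, current[0]),
            )
            version = current[2] + 1
        conn.execute(
            "INSERT INTO dim_customer (customer_id, city, version, start_date, end_date, is_current) "
            "VALUES (?, ?, ?, ?, NULL, 1)",
            (customer_id, city, version, change_date),
        )


apply_change(1, "Pune", "2023-01-01")
apply_change(1, "Delhi", "2024-06-01")
history = conn.execute(
    "SELECT city, version, start_date, end_date, is_current FROM dim_customer ORDER BY version"
).fetchall()
print(history)
```

Type 1 would instead run a single UPDATE on the city column, and Type 3 would keep a previous_city column next to the current one.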
ETL: Extract, Transform, Load. The process of getting data from source systems,
cleaning and transforming it, and loading it into the data warehouse.
ETL Tools: Help automate and streamline the ETL process (e.g., SQL Server
Integration Services (SSIS), Informatica PowerCenter, Talend Open Studio).
OLAP and Data Cube Operations
OLAP Operations:
o Slice: Select a specific subset of data based on dimension values (e.g., sales
for a particular product in a specific region).
o Roll-up: Aggregate data along one or more dimensions (e.g., total sales across
all regions).
o Pivot: Rotate the data cube to view data from different perspectives.
What they are: Functions that operate on a set of rows (a "window") relative to the
current row within a query result. They allow you to perform calculations across rows
without grouping them.
Benefits:
SELECT
EmployeeID,
Salary,
FROM Employees;
What they are: Queries that deal with data organized in a hierarchical or tree-like
structure (e.g., employee hierarchies, product categories).
Recursive CTEs: A type of Common Table Expression (CTE) that references itself to
iterate through the hierarchical data.
Benefits:
WITH EmployeeHierarchy AS
(
SELECT EmployeeID, ManagerID, Name
FROM Employees
UNION ALL
FROM Employees e
Pivoting: Rotating rows into columns, turning unique values in a column into new
column headers.
Benefits:
SELECT *
FROM Sales
PIVOT
SUM(Amount)
SQL Injection: A security vulnerability where malicious code is inserted into SQL
statements.
Prevention:
What it is: Specialized indexing and search engine capabilities within a database.
Benefits:
Available in: SQL Server (Full-Text Search), PostgreSQL (full-text search), MySQL
(FULLTEXT indexes).
o Methods:
o Types:
Data Masking: Obfuscating sensitive data (e.g., credit card numbers, social security
numbers) by replacing it with realistic but fictitious values.
SQL Injection: A malicious attack where attackers inject harmful SQL code into input
fields, potentially exposing or modifying sensitive data.
Prevention:
o Input Validation: Validate and sanitize all user input to ensure it conforms to
expected patterns.
o Least Privilege: Grant users and applications only the minimum permissions
they need.
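The attack and one widely used defence can be shown in a few lines. Parameterized queries are not named in the list above, so treat this as a complementary sketch; it assumes Python's sqlite3 module and a hypothetical users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")  # hypothetical table
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

malicious = "nobody' OR '1'='1"  # input crafted by an attacker

# Unsafe: the input is pasted into the SQL text, so the quote characters
# change the structure of the statement and the condition is always true.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is passed as a parameter and is only ever treated as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), safe_rows)
```

The concatenated query returns every user, while the parameterized query returns nothing because no user has that literal name.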
Auditing: The process of tracking and recording database activity for security,
compliance, or troubleshooting purposes.
o SQL Server Audit: Built-in feature in SQL Server to track events like logins,
schema changes, and data access.
o Third-Party Tools: Use specialized tools for more comprehensive auditing and
reporting.
Regularly Update and Patch: Keep your database software and operating system up
to date with the latest security patches.
Strong Passwords: Enforce strong password policies for all database users.
Limit Network Access: Restrict access to the database server from external
networks.
Firewalls: Use firewalls to control incoming and outgoing traffic to the database
server.
Regular Backups: Maintain regular backups of your data to protect against data loss
due to attacks or system failures.
Security Training: Educate users and developers about security risks and best
practices.
FROM addresses a1