SQL Deep Guide
● About
● Chapter 1: Getting started with SQL
○ Section 1.1: Overview
● Chapter 2: Identifier
○ Section 2.1: Unquoted identifiers
● Chapter 3: Data Types
○ Section 3.1: DECIMAL and NUMERIC
○ Section 3.2: FLOAT and REAL
○ Section 3.3: Integers
○ Section 3.4: MONEY and SMALLMONEY
○ Section 3.5: BINARY and VARBINARY
○ Section 3.6: CHAR and VARCHAR
○ Section 3.7: NCHAR and NVARCHAR
○ Section 3.8: UNIQUEIDENTIFIER
● Chapter 4: NULL
○ Section 4.1: Filtering for NULL in queries
○ Section 4.2: Nullable columns in tables
○ Section 4.3: Updating fields to NULL
○ Section 4.4: Inserting rows with NULL fields
● Chapter 5: Example Databases and Tables
○ Section 5.1: Auto Shop Database
○ Section 5.2: Library Database
○ Section 5.3: Countries Table
● Chapter 6: SELECT
○ Section 6.1: Using the wildcard character to select all columns in a query
○ Section 6.2: SELECT Using Column Aliases
○ Section 6.3: Select Individual Columns
○ Section 6.4: Selecting specified number of records
○ Section 6.5: Selecting with Condition
○ Section 6.6: Selecting with CASE
○ Section 6.7: Select columns which are named after reserved keywords
○ Section 6.8: Selecting with table alias
○ Section 6.9: Selecting with more than 1 condition
○ Section 6.10: Selecting without Locking the table
○ Section 6.11: Selecting with Aggregate functions
○ Section 6.12: Select with condition of multiple values from column
○ Section 6.13: Get aggregated result for row groups
○ Section 6.14: Selection with sorted Results
○ Section 6.15: Selecting with null
○ Section 6.16: Select distinct (unique values only)
○ Section 6.17: Select rows from multiple tables
● Chapter 7: GROUP BY
○ Section 7.1: Basic GROUP BY example
○ Section 7.2: Filter GROUP BY results using a HAVING clause
○ Section 7.3: USE GROUP BY to COUNT the number of rows for each unique entry in a given column
○ Section 7.4: ROLAP aggregation (Data Mining)
● Chapter 8: ORDER BY
○ Section 8.1: Sorting by column number (instead of name)
○ Section 8.2: Use ORDER BY with TOP to return the top x rows based on a column's value
○ Section 8.3: Customized sorting order
○ Section 8.4: Order by Alias
○ Section 8.5: Sorting by multiple columns
● Chapter 9: AND & OR Operators
○ Section 9.1: AND OR Example
● Chapter 10: CASE
○ Section 10.1: Use CASE to COUNT the number of rows in a column match a condition
○ Section 10.2: Searched CASE in SELECT (Matches a boolean expression)
○ Section 10.3: CASE in a clause ORDER BY
○ Section 10.4: Shorthand CASE in SELECT
○ Section 10.5: Using CASE in UPDATE
○ Section 10.6: CASE use for NULL values ordered last
○ Section 10.7: CASE in ORDER BY clause to sort records by lowest value of 2 columns
● Chapter 11: LIKE operator
○ Section 11.1: Match open-ended pattern
○ Section 11.2: Single character match
○ Section 11.3: ESCAPE statement in the LIKE-query
○ Section 11.4: Search for a range of characters
○ Section 11.5: Match by range or set
○ Section 11.6: Wildcard characters
● Chapter 12: IN clause
○ Section 12.1: Simple IN clause
○ Section 12.2: Using IN clause with a subquery
● Chapter 13: Filter results using WHERE and HAVING
○ Section 13.1: Use BETWEEN to Filter Results
○ Section 13.2: Use HAVING with Aggregate Functions
○ Section 13.3: WHERE clause with NULL/NOT NULL values
○ Section 13.4: Equality
○ Section 13.5: The WHERE clause only returns rows that match its criteria
○ Section 13.6: AND and OR
○ Section 13.7: Use IN to return rows with a value contained in a list
○ Section 13.8: Use LIKE to find matching strings and substrings
○ Section 13.9: Where EXISTS
○ Section 13.10: Use HAVING to check for multiple conditions in a group
● Chapter 14: SKIP TAKE (Pagination)
○ Section 14.1: Limiting amount of results
○ Section 14.2: Skipping then taking some results (Pagination)
○ Section 14.3: Skipping some rows from result
● Chapter 15: EXCEPT
○ Section 15.1: Select dataset except where values are in this other dataset
● Chapter 16: EXPLAIN and DESCRIBE
○ Section 16.1: EXPLAIN Select query
○ Section 16.2: DESCRIBE tablename;
● Chapter 17: EXISTS CLAUSE
○ Section 17.1: EXISTS CLAUSE
● Chapter 18: JOIN
○ Section 18.1: Self Join
○ Section 18.2: Differences between inner/outer joins
○ Section 18.3: JOIN Terminology: Inner, Outer, Semi, Anti..
○ Section 18.4: Left Outer Join
○ Section 18.5: Implicit Join
○ Section 18.6: CROSS JOIN
○ Section 18.7: CROSS APPLY & LATERAL JOIN
○ Section 18.8: FULL JOIN
○ Section 18.9: Recursive JOINs
○ Section 18.10: Basic explicit inner join
○ Section 18.11: Joining on a Subquery
● Chapter 19: UPDATE
○ Section 19.1: UPDATE with data from another table
○ Section 19.2: Modifying existing values
○ Section 19.3: Updating Specified Rows
○ Section 19.4: Updating All Rows
○ Section 19.5: Capturing Updated records
● Chapter 20: CREATE Database
○ Section 20.1: CREATE Database
● Chapter 21: CREATE TABLE
○ Section 21.1: Create Table From Select
○ Section 21.2: Create a New Table
○ Section 21.3: CREATE TABLE With FOREIGN KEY
○ Section 21.4: Duplicate a table
○ Section 21.5: Create a Temporary or In-Memory Table
● Chapter 22: CREATE FUNCTION
○ Section 22.1: Create a new Function
● Chapter 23: TRY/CATCH
○ Section 23.1: Transaction In a TRY/CATCH
● Chapter 24: UNION / UNION ALL
○ Section 24.1: Basic UNION ALL query
○ Section 24.2: Simple explanation and Example
● Chapter 25: ALTER TABLE
○ Section 25.1: Add Column(s)
○ Section 25.2: Drop Column
○ Section 25.3: Add Primary Key
○ Section 25.4: Alter Column
○ Section 25.5: Drop Constraint
● Chapter 26: INSERT
○ Section 26.1: INSERT data from another table using SELECT
○ Section 26.2: Insert New Row
○ Section 26.3: Insert Only Specified Columns
○ Section 26.4: Insert multiple rows at once
● Chapter 27: MERGE
○ Section 27.1: MERGE to make Target match Source
○ Section 27.2: MySQL: counting users by name
○ Section 27.3: PostgreSQL: counting users by name
● Chapter 28: cross apply, outer apply
○ Section 28.1: CROSS APPLY and OUTER APPLY basics
● Chapter 29: DELETE
○ Section 29.1: DELETE all rows
○ Section 29.2: DELETE certain rows with WHERE
○ Section 29.3: TRUNCATE clause
○ Section 29.4: DELETE certain rows based upon comparisons with other tables
● Chapter 30: TRUNCATE
○ Section 30.1: Removing all rows from the Employee table
● Chapter 31: DROP Table
○ Section 31.1: Check for existence before dropping
○ Section 31.2: Simple drop
● Chapter 32: DROP or DELETE Database
○ Section 32.1: DROP Database
● Chapter 33: Cascading Delete
○ Section 33.1: ON DELETE CASCADE
● Chapter 34: GRANT and REVOKE
○ Section 34.1: Grant/revoke privileges
● Chapter 35: XML
○ Section 35.1: Query from XML Data Type
● Chapter 36: Primary Keys
○ Section 36.1: Creating a Primary Key
○ Section 36.2: Using Auto Increment
● Chapter 37: Indexes
○ Section 37.1: Sorted Index
○ Section 37.2: Partial or Filtered Index
○ Section 37.3: Creating an Index
○ Section 37.4: Dropping an Index, or Disabling and Rebuilding it
○ Section 37.5: Clustered, Unique, and Sorted Indexes
○ Section 37.6: Rebuild index
○ Section 37.7: Inserting with a Unique Index
● Chapter 38: Row number
○ Section 38.1: Delete All But Last Record (1 to Many Table)
○ Section 38.2: Row numbers without partitions
○ Section 38.3: Row numbers with partitions
● Chapter 39: SQL Group By vs Distinct
○ Section 39.1: Difference between GROUP BY and DISTINCT
● Chapter 40: Finding Duplicates on a Column Subset with Detail
○ Section 40.1: Students with same name and date of birth
● Chapter 41: String Functions
○ Section 41.1: Concatenate
○ Section 41.2: Length
○ Section 41.3: Trim empty spaces
○ Section 41.4: Upper & lower case
○ Section 41.5: Split
○ Section 41.6: Replace
○ Section 41.7: REGEXP
○ Section 41.8: Substring
○ Section 41.9: Stuff
○ Section 41.10: LEFT - RIGHT
○ Section 41.11: REVERSE
○ Section 41.12: REPLICATE
○ Section 41.13: Replace function in sql Select and Update query
○ Section 41.14: INSTR
○ Section 41.15: PARSENAME
● Chapter 42: Functions (Aggregate)
○ Section 42.1: Conditional aggregation
○ Section 42.2: List Concatenation
○ Section 42.3: SUM
○ Section 42.4: AVG()
○ Section 42.5: Count
○ Section 42.6: Min
○ Section 42.7: Max
● Chapter 43: Functions (Scalar/Single Row)
○ Section 43.1: Date And Time
○ Section 43.2: Character modifications
○ Section 43.3: Configuration and Conversion Function
○ Section 43.4: Logical and Mathematical Function
● Chapter 44: Functions (Analytic)
○ Section 44.1: LAG and LEAD
○ Section 44.2: PERCENTILE_DISC and PERCENTILE_CONT
○ Section 44.3: FIRST_VALUE
○ Section 44.4: LAST_VALUE
○ Section 44.5: PERCENT_RANK and CUME_DIST
● Chapter 45: Window Functions
○ Section 45.1: Setting up a flag if other rows have a common property
○ Section 45.2: Finding "out-of-sequence" records using the LAG() function
○ Section 45.3: Getting a running total
○ Section 45.4: Adding the total rows selected to every row
○ Section 45.5: Getting the N most recent rows over multiple grouping
● Chapter 46: Common Table Expressions
○ Section 46.1: generating values
○ Section 46.2: recursively enumerating a subtree
○ Section 46.3: Temporary query
○ Section 46.4: recursively going up in a tree
○ Section 46.5: Recursively generate dates, extended to include team rostering as example
○ Section 46.6: Oracle CONNECT BY functionality with recursive CTEs
● Chapter 47: Views
○ Section 47.1: Simple views
○ Section 47.2: Complex views
● Chapter 48: Materialized Views
○ Section 48.1: PostgreSQL example
● Chapter 49: Comments
○ Section 49.1: Single-line comments
○ Section 49.2: Multi-line comments
● Chapter 50: Foreign Keys
○ Section 50.1: Foreign Keys explained
○ Section 50.2: Creating a table with a foreign key
● Chapter 51: Sequence
○ Section 51.1: Create Sequence
○ Section 51.2: Using Sequences
● Chapter 52: Subqueries
○ Section 52.1: Subquery in FROM clause
○ Section 52.2: Subquery in SELECT clause
○ Section 52.3: Subquery in WHERE clause
○ Section 52.4: Correlated Subqueries
○ Section 52.5: Filter query results using query on different table
○ Section 52.6: Subqueries in FROM clause
○ Section 52.7: Subqueries in WHERE clause
● Chapter 53: Execution blocks
○ Section 53.1: Using BEGIN ... END
● Chapter 54: Stored Procedures
○ Section 54.1: Create and call a stored procedure
● Chapter 55: Triggers
○ Section 55.1: CREATE TRIGGER
○ Section 55.2: Use Trigger to manage a "Recycle Bin" for deleted items
● Chapter 56: Transactions
○ Section 56.1: Simple Transaction
○ Section 56.2: Rollback Transaction
● Chapter 57: Table Design
○ Section 57.1: Properties of a well designed table
● Chapter 58: Synonyms
○ Section 58.1: Create Synonym
● Chapter 59: Information Schema
○ Section 59.1: Basic Information Schema Search
● Chapter 60: Order of Execution
○ Section 60.1: Logical Order of Query Processing in SQL
● Chapter 61: Clean Code in SQL
○ Section 61.1: Formatting and Spelling of Keywords and Names
○ Section 61.2: Indenting
○ Section 61.3: SELECT *
○ Section 61.4: Joins
● Chapter 62: SQL Injection
○ Section 62.1: SQL injection sample
○ Section 62.2: simple injection sample
1. Data Definition Language (DDL): to create and modify the structure of the database;
2. Data Manipulation Language (DML): to perform Read, Insert, Update and Delete operations
on the data of the database;
3. Data Control Language (DCL): to control the access of the data stored in the database.
The core DML operations are Create, Read, Update and Delete (CRUD for short) which are
performed by the statements INSERT, SELECT, UPDATE and DELETE.
There is also a (recently added) MERGE statement which can perform all 3 write operations
(INSERT, UPDATE, DELETE).
Chapter 2: Identifier
This topic is about identifiers, i.e. syntax rules for names of tables, columns, and other database
objects.
Where appropriate, the examples should cover variations used by different SQL implementations, or
identify the SQL implementation of the example.
Depending on SQL implementation, and/or database settings, other characters may be allowed,
some even as the first character, e.g.
Unquoted identifiers are case-insensitive. How this is handled depends greatly on SQL
implementation:
Syntax:
Examples:
Syntax:
BINARY [ ( n_bytes ) ]
VARBINARY [ ( n_bytes | max ) ]
n_bytes can be any number from 1 to 8000. max indicates that the maximum storage size is
2^31-1 bytes.
Syntax:
CHAR [ ( n_chars ) ]
VARCHAR [ ( n_chars ) ]
Examples:
SELECT CAST('ABC' AS CHAR(10)) -- 'ABC       ' (padded with spaces on the right)
SELECT CAST('ABC' AS VARCHAR(10)) -- 'ABC' (no padding due to variable character)
SELECT CAST('ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS CHAR(10)) -- 'ABCDEFGHIJ' (truncated to 10 characters)
Use MAX for very long strings that may exceed 8000 characters.
Chapter 4: NULL
NULL in SQL, as well as programming in general, means literally "nothing". In SQL, it is easier to
understand as "the absence of any value".
It is important to distinguish it from seemingly empty values, such as the empty string '' or the
number 0, neither of which is actually NULL.
It is also important to be careful not to enclose NULL in quotes, like 'NULL', which is allowed in
columns that accept text, but is not NULL and can cause errors and incorrect data sets.
Note that because NULL is not equal to anything, not even to itself, using equality operators = NULL
or <> NULL (or != NULL) will always yield the truth value UNKNOWN, which will be rejected by
WHERE.
WHERE filters out all rows for which the condition is FALSE or UNKNOWN and keeps only rows for which
the condition is TRUE.
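The behaviour described above can be tried out with a minimal sketch. It assumes SQLite driven through Python's standard sqlite3 module (other engines may differ in details), and the Employees rows are illustrative data rather than the guide's own.

```python
import sqlite3

# In-memory database with one nullable column (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, ManagerId INTEGER)")
conn.executemany(
    "INSERT INTO Employees (Id, ManagerId) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1)],
)

def ids(condition):
    """Return the Ids of all rows for which the WHERE condition is TRUE."""
    sql = "SELECT Id FROM Employees WHERE " + condition + " ORDER BY Id"
    return [row[0] for row in conn.execute(sql)]

equals_null = ids("ManagerId = NULL")       # UNKNOWN for every row
not_equals_null = ids("ManagerId <> NULL")  # UNKNOWN for every row as well
is_null = ids("ManagerId IS NULL")
is_not_null = ids("ManagerId IS NOT NULL")

print(equals_null, not_equals_null, is_null, is_not_null)
```

Both comparisons with = and <> return no rows at all, while IS NULL and IS NOT NULL split the table as expected.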
By default, every column (except those in a primary key constraint) is nullable unless we explicitly
set a NOT NULL constraint.
Attempting to assign NULL to a non-nullable column will result in an error.
INSERT INTO MyTable (MyCol1, MyCol2) VALUES (1, NULL) ; -- works fine
UPDATE Employees
SET ManagerId = NULL
WHERE Id = 4
Departments
Id Name
1 HR
2 Sales
3 Tech
Employees
Customers
Id FName LName Email PhoneNumber PreferredContact
Cars
Id CustomerId EmployeeId Model Status Total Cost
Authors and Books are known as base tables, since they contain column definition and data for the
actual entities in the relational model. BooksAuthors is known as the relationship table, since this
table defines the relationship between the Books and Authors table.
Authors
Id Name Country
2 F. Scott. Fitzgerald USA
3 Jane Austen UK
Books
Id Title
2 Nine Stories
BooksAuthors
BookId AuthorId
1 1
2 1
3 1
4 2
5 2
6 3
7 4
7 5
7 6
7 7
7 8
SQL to create the table:
Examples
SELECT
ba.AuthorId,
a.Name AuthorName,
ba.BookId,
b.Title BookTitle
FROM BooksAuthors ba
INNER JOIN Authors a ON a.id = ba.authorid
INNER JOIN Books b ON b.id = ba.bookid
;
Some market data software applications like Bloomberg and Reuters require you to give their API
either a 2- or 3-character country code along with the currency code. Hence this example table has
both the 2-character ISO code column and the 3-character ISO3 code column.
Countries
Chapter 6: SELECT
The SELECT statement is at the heart of most SQL queries. It defines what result set should be
returned by the query, and is almost always used in conjunction with the FROM clause, which defines
what part(s) of the database should be queried.
Section 6.1: Using the wildcard character to select all columns in a query
Consider a database with the following two tables.
Employees table:
1 James Smith 3
2 John Johnson 4
Departments table:
Id Name
1 Sales
2 Marketing
3 Finance
4 IT
When used as a substitute for explicit column names, the wildcard character * returns all columns in
all tables that a query is selecting FROM. This effect applies to all tables the query accesses
through its JOIN clauses.
1 James Smith 3
2 John Johnson 4
Dot notation
To select all values from a specific table, the wildcard character can be applied to the table with dot
notation.
SELECT
Employees.*,
Departments.Name
FROM
Employees
JOIN
Departments
ON Departments.Id = Employees.DeptId;
This will return a data set with all fields of the Employees table, followed by just the Name field of
the Departments table:
2 John Johnson 4 IT
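The dot-notation query can be reproduced with a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The column names FName, LName and DeptId are assumptions here, since the Employees header row did not survive in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employees (Id INTEGER PRIMARY KEY, FName TEXT, LName TEXT, DeptId INTEGER);
INSERT INTO Departments VALUES (1, 'Sales'), (2, 'Marketing'), (3, 'Finance'), (4, 'IT');
INSERT INTO Employees VALUES (1, 'James', 'Smith', 3), (2, 'John', 'Johnson', 4);
""")

# Employees.* expands to every Employees column; only Name is taken from Departments.
cur = conn.execute("""
    SELECT Employees.*, Departments.Name
    FROM Employees
    JOIN Departments ON Departments.Id = Employees.DeptId
    ORDER BY Employees.Id
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)
for row in rows:
    print(row)
```

Each result row carries the four Employees columns followed by the department name, so adding a column to Employees later would silently widen this result set.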
It is generally advised that using * is avoided in production code where possible, as it can cause a
number of potential problems including:
1. Excess IO, network load, memory use, and so on, due to the database engine reading data
that is not needed and transmitting it to the front-end code. This is particularly a concern
where there might be large fields such as those used to store long notes or attached files.
2. Further excess IO load if the database needs to spool internal results to disk as part of the
processing for a query more complex than SELECT <columns> FROM <table>.
3. Extra processing (and/or even more IO) if some of the unneeded columns are:
○ computed columns in databases that support them
○ in the case of selecting from a view, columns from a table/view that the query
optimiser could otherwise optimise out
4. The potential for unexpected errors if columns are added to tables and views later that
result in ambiguous column names. For example SELECT * FROM orders JOIN people
ON people.id = orders.personid ORDER BY displayname - if a column
called displayname is added to the orders table to allow users to give their orders
meaningful names for future reference, then the column name will appear twice in the output,
so the ORDER BY clause will be ambiguous, which may cause errors ("ambiguous column
name" in recent MS SQL Server versions). Even if it causes no error in this example, your
application code might start displaying the order name where the person name is intended because
the new column is the first of that name returned, and so on.
While best avoided in production code, using * is fine as a shorthand when performing manual
queries against the database for investigation or prototype work.
Sometimes design decisions in your application make it unavoidable (in such circumstances, prefer
tablealias.* over just * where possible).
When using EXISTS, such as SELECT A.col1, A.Col2 FROM A WHERE EXISTS (SELECT *
FROM B where A.ID = B.A_ID), we are not returning any data from B. Thus a join is
unnecessary, and the engine knows no values from B are to be returned, thus no performance hit for
using *. Similarly COUNT(*) is fine as it also doesn't actually return any of the columns, so only
needs to read and process those that are used for filtering purposes.
Section 6.2: SELECT Using Column Aliases
Column aliases are used mainly to shorten code and make column names more readable.
Code becomes shorter as long table names and unnecessary identification of columns (e.g., there
may be 2 IDs in the table, but only one is used in the statement) can be avoided. Along with table
aliases this allows you to use longer descriptive names in your database structure while keeping
queries upon that structure concise.
Furthermore they are sometimes required, for instance in views, in order to name computed outputs.
All versions of SQL
Aliases can be created in all versions of SQL using double quotes (").
SELECT
FName AS "First Name",
MName AS "Middle Name",
LName AS "Last Name"
FROM Employees;
You can use single quotes ('), double quotes (") and square brackets ([]) to create an alias in
Microsoft SQL Server.
SELECT
FName AS "First Name",
MName AS 'Middle Name',
LName AS [Last Name]
FROM Employees;
SELECT
FName "First Name",
MName "Middle Name",
LName "Last Name"
FROM Employees;
However, the explicit version (i.e., using the AS operator) is more readable.
If the alias has a single word that is not a reserved word, we can write it without single quotes,
double quotes or brackets:
SELECT
FName AS FirstName,
LName AS LastName
FROM Employees;
FirstName LastName
James Smith
John Johnson
Michael Williams
Some find using = instead of AS easier to read, though many recommend against this format, mainly
because it is not standard and so not supported by all databases. It may also cause confusion with
other uses of the = character.
Also, if you need to use reserved words, you can use brackets or quotes to escape:
SELECT
FName as "SELECT",
MName as "FROM",
LName as "WHERE"
FROM Employees;
Different Versions of SQL
Likewise, you can escape keywords in MSSQL with all different approaches:
SELECT
FName AS "SELECT",
MName AS 'FROM',
LName AS [WHERE]
FROM Employees;
Also, a column alias may be used in any of the final clauses of the same query, such as an ORDER BY:
SELECT
FName AS FirstName,
LName AS LastName
FROM
Employees
ORDER BY
LastName DESC;
-- Not valid: SELECT and FROM are reserved words and cannot be used as unquoted aliases.
SELECT
FName AS SELECT,
LName AS FROM
FROM
Employees
ORDER BY
LastName DESC;
SELECT
PhoneNumber,
Email,
PreferredContact
FROM Customers;
This statement will return the columns PhoneNumber, Email, and PreferredContact from all rows
of the Customers table. Also the columns will be returned in the sequence in which they appear in
the SELECT clause.
The result will be:
If multiple tables are joined together, you can select columns from specific tables by specifying the
table name before the column name: [table_name].[column_name]
SELECT
Customers.PhoneNumber,
Customers.Email,
Customers.PreferredContact,
Orders.Id AS OrderId
FROM
Customers
LEFT JOIN
Orders ON Orders.CustomerId = Customers.Id;
*AS OrderId means that the Id field of Orders table will be returned as a column named OrderId.
See selecting with column alias for further information.
To avoid using long table names, you can use table aliases. This mitigates the pain of writing long
table names for each field that you select in the joins. If you are performing a self join (a join
between two instances of the same table), then you must use table aliases to distinguish your tables.
We can write a table alias like Customers c or Customers AS c. Here c works as an alias for
Customers and we can select let's say Email like this: c.Email.
SELECT
c.PhoneNumber,
c.Email,
c.PreferredContact,
o.Id AS OrderId
FROM
Customers c
LEFT JOIN
Orders o ON o.CustomerId = c.Id
This standard is only supported in recent versions of some RDBMSs. Vendor-specific non-standard
syntax is provided in other systems. Progress OpenEdge 11.x also supports the FETCH FIRST <n>
ROWS ONLY syntax.
Additionally, OFFSET <m> ROWS before FETCH FIRST <n> ROWS ONLY allows skipping rows before
fetching rows.
Results: 10 records.
Vendor Nuances:
It is important to note that the TOP in Microsoft SQL operates after the WHERE clause and will return
the specified number of results if they exist anywhere in the table, while ROWNUM works as part of the
WHERE clause so if other conditions do not exist in the specified number of rows at the beginning of
the table, you will get zero results when there could be others to be found.
The [condition] can be any SQL expression, specified using comparison or logical operators like
>, <, =, <>, >=, <=, LIKE, NOT, IN, BETWEEN etc.
The following statement returns all columns from the table 'Cars' where the status column is
'READY':
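The statement described above can be sketched as follows, assuming SQLite through Python's standard sqlite3 module. The column name TotalCost and all of the rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Cars (
        Id INTEGER PRIMARY KEY,
        CustomerId INTEGER,
        EmployeeId INTEGER,
        Model TEXT,
        Status TEXT,
        TotalCost INTEGER
    )
""")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO Cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2, "Ford F-150", "READY", 230),
        (2, 1, 2, "Ford F-150", "READY", 200),
        (3, 2, 1, "Ford Mustang", "WAITING", 100),
        (4, 3, 3, "Toyota Prius", "WORKING", 1254),
    ],
)

# Only rows for which the condition evaluates to TRUE are returned.
ready_cars = conn.execute(
    "SELECT * FROM Cars WHERE Status = 'READY' ORDER BY Id"
).fetchall()
ready_ids = [row[0] for row in ready_cars]
print(ready_ids)
```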
SELECT CASE WHEN Col1 < 50 THEN 'under' ELSE 'over' END threshold
FROM TableName;
SELECT
CASE WHEN Col1 < 50 THEN 'under'
WHEN Col1 >= 50 AND Col1 < 100 THEN 'between'
ELSE 'over'
END threshold
FROM TableName;
SELECT
CASE WHEN Col1 < 50 THEN 'under'
ELSE
CASE WHEN Col1 >= 50 AND Col1 < 100 THEN Col1
ELSE 'over' END
END threshold
FROM TableName;
SELECT
"ORDER",
ID
FROM ORDERS;
Note that it makes the column name case-sensitive.
Some DBMSes have proprietary ways of quoting names. For example, SQL Server uses square
brackets for this purpose:
SELECT
[Order],
ID
FROM ORDERS;
SELECT
`Order`,
id
FROM orders;
The Employees table is given the alias 'e' directly after the table name. This helps remove ambiguity
in scenarios where multiple tables have the same field name and you need to be specific as to which
table you want to return data from.
Note that once you define an alias, you can't use the canonical table name anymore. i.e.,
It is worth noting table aliases -- more formally 'range variables' -- were introduced into the SQL
language to solve the problem of duplicate columns caused by INNER JOIN. The 1992 SQL
standard corrected this earlier design flaw by introducing NATURAL JOIN (implemented in MySQL,
PostgreSQL and Oracle but not yet in SQL Server), the result of which never has duplicate column
names. The above example is interesting in that the tables are joined on columns with different
names (Id and ManagerId) but are not supposed to be joined on the columns with the same name
(LName, FName), requiring the renaming of the columns to be performed before the join:
Note that although an alias/range variable must be declared for the derived table (otherwise SQL will
throw an error), it never makes sense to actually use it in the query.
Sam 18 M
John 21 M
Bob 22 M
Mary 23 F
SELECT name FROM persons WHERE gender = 'M' AND age > 20;
Name
John
Bob
using OR keyword
SELECT name FROM persons WHERE gender = 'M' OR age < 20;
name
Sam
John
Bob
These keywords can be combined to allow for more complex criteria combinations:
SELECT name
FROM persons
WHERE (gender = 'M' AND age < 20)
OR (gender = 'F' AND age > 20);
name
Sam
Mary
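The three queries above can be checked against the sample rows with a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The column names name, age and gender are taken from the queries, since the table header is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Sam", 18, "M"), ("John", 21, "M"), ("Bob", 22, "M"), ("Mary", 23, "F")],
)

def names(condition):
    """Return the names matching the condition, in order of age."""
    sql = "SELECT name FROM persons WHERE " + condition + " ORDER BY age"
    return [row[0] for row in conn.execute(sql)]

with_and = names("gender = 'M' AND age > 20")
with_or = names("gender = 'M' OR age < 20")
# Parentheses make the intended grouping explicit.
combined = names("(gender = 'M' AND age < 20) OR (gender = 'F' AND age > 20)")

print(with_and, with_or, combined)
```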
SQL Server
MySQL
Oracle
DB2
If used on a table that has record modifications going on, selecting without locking might give
unpredictable results.
If employees are categorized into multiple departments and we want to find the average salary for
every department, then we can use the following query.
Minimum
The MIN() aggregate function will return the minimum of values selected.
Maximum
The MAX() aggregate function will return the maximum of values selected.
Count
The COUNT() aggregate function will return the count of values selected.
Specific columns can also be specified to get the number of values in the column. Note that NULL
values are not counted.
Count can also be combined with the distinct keyword for a distinct count.
Sum
The SUM() aggregate function returns the sum of the values selected for all rows.
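The aggregate functions listed above can be compared side by side with a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The Salary values are illustrative, and one of them is NULL to show how COUNT treats it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees (Id, Salary) VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 200), (4, None)],  # illustrative values
)

stats = conn.execute("""
    SELECT
        MIN(Salary),             -- smallest non-NULL value
        MAX(Salary),             -- largest non-NULL value
        COUNT(*),                -- all rows, including the NULL one
        COUNT(Salary),           -- NULL values are not counted
        COUNT(DISTINCT Salary),  -- distinct non-NULL values
        SUM(Salary)              -- NULL values are skipped
    FROM Employees
""").fetchone()

print(stats)
```

COUNT(*) counts four rows, while COUNT(Salary) counts three and COUNT(DISTINCT Salary) counts two.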
The important thing is to select only columns specified in the GROUP BY clause or used with
aggregate functions.
The WHERE clause can also be used with GROUP BY, but WHERE filters out records before any
grouping is done:
If you need to filter the results after the grouping has been done, e.g., to see only departments whose
average income is larger than 1000, you need to use the HAVING clause:
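The difference between filtering before and after grouping can be seen in a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table layout and salaries are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, DeptId INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (DeptId, Salary) VALUES (?, ?)",
    [(1, 900), (1, 1500), (2, 800), (2, 1000), (3, 2000)],
)

# WHERE removes individual rows first; the averages are computed on what is left.
where_rows = conn.execute(
    "SELECT DeptId, AVG(Salary) FROM Employees "
    "WHERE Salary > 900 GROUP BY DeptId ORDER BY DeptId"
).fetchall()

# HAVING filters whole groups after the averages have been computed.
having_rows = conn.execute(
    "SELECT DeptId, AVG(Salary) FROM Employees "
    "GROUP BY DeptId HAVING AVG(Salary) > 1000 ORDER BY DeptId"
).fetchall()

print(where_rows)
print(having_rows)
```

With WHERE, department 2 is still reported, but its average is computed from the single remaining row. With HAVING, department 2 disappears because its average over all rows is 900.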
This statement will return all the columns from the table Employees.
Or
This example will sort the results first by LName and then, for records that have the same LName,
sort by FName. This will give you a result similar to what you would find in a telephone book.
In order to save retyping the column name in the ORDER BY clause, it is possible to use instead the
column's number. Note that column numbers start from 1.
This will sort your results to have all records with the LName of "Jones" at the top.
Selecting with NULLs takes a different syntax. Don't use =; use IS NULL or IS NOT NULL instead.
This query will return all DISTINCT (unique, different) values from the ContinentCode column of the
Countries table:
ContinentCode
OC
EU
AS
NA
AF
Section 6.17: Select rows from multiple tables
SELECT *
FROM
table1,
table2;
SELECT
table1.column1,
table1.column2,
table2.column1
FROM
table1,
table2;
These statements return the selected columns from multiple tables in one query.
There is no specific relationship between the columns returned from each table.
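Listing tables with a comma and no join condition pairs every row of one table with every row of the other. A minimal sketch, assuming SQLite through Python's standard sqlite3 module and illustrative contents for table1 and table2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
CREATE TABLE table2 (column1 TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
INSERT INTO table2 VALUES ('x'), ('y'), ('z');
""")

# No join condition: the result is the Cartesian product of the two tables.
rows = conn.execute("""
    SELECT table1.column1, table1.column2, table2.column1
    FROM table1, table2
""").fetchall()

print(len(rows))
for row in sorted(rows):
    print(row)
```

Two rows combined with three rows give six result rows, which is why such queries usually carry a WHERE or JOIN condition relating the tables.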
Chapter 7: GROUP BY
Results of a SELECT query can be grouped by one or more columns using the GROUP BY statement:
all results with the same value in the grouped columns are aggregated together. This generates a
table of partial results, instead of one result. GROUP BY is typically used in conjunction with
aggregate functions, which define how non-grouped columns are aggregated; the HAVING clause can then
filter the resulting groups.
is saying:
+-----+-------------+
|EmpID|MonthlySalary|
+-----+-------------+
|1 |200 |
+-----+-------------+
|2 |300 |
+-----+-------------+
Result:
+-+---+
|1|200|
+-+---+
|2|300|
+-+---+
Sum wouldn't appear to do anything because the sum of one number is that number. On the other
hand if it looked like this:
+-----+-------------+
|EmpID|MonthlySalary|
+-----+-------------+
|1 |200 |
+-----+-------------+
|1 |300 |
+-----+-------------+
|2 |300 |
+-----+-------------+
Result:
+-+---+
|1|500|
+-+---+
|2|300|
+-+---+
Then it would because there are two EmpID 1's to sum together.
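The second table above can be reproduced with a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table name Employee is an assumption, since the original query is not shown in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, MonthlySalary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 200), (1, 300), (2, 300)],
)

# One output row per distinct EmpID; SUM adds up the salaries inside each group.
totals = conn.execute("""
    SELECT EmpID, SUM(MonthlySalary)
    FROM Employee
    GROUP BY EmpID
    ORDER BY EmpID
""").fetchall()

print(totals)
```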
Examples:
Return all authors that wrote more than one book.
SELECT
a.Id,
a.Name,
COUNT(*) BooksWritten
FROM BooksAuthors ba
INNER JOIN Authors a ON a.id = ba.authorid
GROUP BY
a.Id,
a.Name
HAVING COUNT(*) > 1; -- equivalent to HAVING BooksWritten > 1
Return all books that have more than three authors.
SELECT
b.Id,
b.Title,
COUNT(*) NumberOfAuthors
FROM BooksAuthors ba
INNER JOIN Books b ON b.id = ba.bookid
GROUP BY
b.Id,
b.Title
HAVING COUNT(*) > 3; -- equivalent to HAVING NumberOfAuthors > 3
Name GreatHouseAllegience
Arya Stark
Cercei Lannister
Myrcella Lannister
Yara Greyjoy
Catelyn Stark
Sansa Stark
Without GROUP BY, COUNT will simply return a total number of rows:
Returns…
Number_of_Westerosians
But by adding GROUP BY, we can COUNT the users for each value in a given column, to return the
number of people in a given Great House, say:
returns...
House Number_of_Westerosians
Stark 3
Greyjoy 1
Lannister 2
It's common to combine GROUP BY with ORDER BY to sort results by largest or smallest category:
returns...
House Number_of_Westerosians
Stark 3
Lannister 2
Greyjoy 1
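The grouped and sorted count can be sketched as follows, assuming SQLite through Python's standard sqlite3 module. The table name Westerosians is an assumption, as the original queries are not shown in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Westerosians (Name TEXT, GreatHouseAllegience TEXT)")
conn.executemany(
    "INSERT INTO Westerosians VALUES (?, ?)",
    [
        ("Arya", "Stark"),
        ("Cercei", "Lannister"),
        ("Myrcella", "Lannister"),
        ("Yara", "Greyjoy"),
        ("Catelyn", "Stark"),
        ("Sansa", "Stark"),
    ],
)

# Without GROUP BY, COUNT returns a single total for the whole table.
total = conn.execute("SELECT COUNT(*) FROM Westerosians").fetchone()[0]

# With GROUP BY, COUNT is evaluated once per house; ORDER BY sorts by that count.
per_house = conn.execute("""
    SELECT GreatHouseAllegience AS House, COUNT(*) AS Number_of_Westerosians
    FROM Westerosians
    GROUP BY GreatHouseAllegience
    ORDER BY Number_of_Westerosians DESC
""").fetchall()

print(total)
print(per_house)
```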
The SQL standard provides two additional aggregate operators. These use the polymorphic value
"ALL" to denote the set of all values that an attribute can take. The two operators are:
● with cube, which provides aggregates for all possible combinations of the argument attributes of
the clause.
● with rollup, which provides the aggregates obtained by considering the attributes in
order from left to right, as they are listed in the argument of the clause.
Examples
With cube
select Food,Brand,Total_amount
from Table
group by Food,Brand,Total_amount with cube;
With rollup
select Food,Brand,Total_amount
from Table
group by Food,Brand,Total_amount with rollup;
Pro: If you think it's likely you might change column names later, doing so won't break this code.
Con: This will generally reduce readability of the query (It's instantly clear what ' ORDER BY
Reputation' means, while 'ORDER BY 14' requires some counting, probably with a finger on the
screen.)
This query sorts result by the info in relative column position 3 from select statement instead of
column name Reputation.
SELECT DisplayName, JoinDate, Reputation FROM Users ORDER BY 3;
Community 2008-09-15 1
Section 8.2: Use ORDER BY with TOP to return the top x rows based on a column's value
In this example, ORDER BY determines not only the sort order of the rows returned, but also
which rows are returned, since we're using TOP to limit the result set.
Let's say we want to return the top 5 highest reputation users from an unnamed popular Q&A site.
Without ORDER BY
This query returns the Top 5 rows ordered by the default, which in this case is "Id", the first column
in the table (even though it's not a column shown in the results).
returns...
DisplayName Reputation
Community 1
With ORDER BY
Returns…
DisplayName Reputation
JonSkeet 865023
BalusC 650237
Remarks
Some versions of SQL (such as MySQL) use a LIMIT clause at the end of a SELECT, instead of TOP
at the beginning, for example:
SELECT DisplayName, Reputation
FROM Users
ORDER BY Reputation DESC
LIMIT 5;
Name Department
Hasan IT
Yusuf HR
Hillary HR
Joe IT
Merry HR
Ken Accountant
SELECT *
FROM Employee
ORDER BY CASE Department
WHEN 'HR' THEN 1
WHEN 'Accountant' THEN 2
ELSE 3
END;
Name Department
Yusuf HR
Hillary HR
Merry HR
Ken Accountant
Hasan IT
Joe IT
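The custom sort order can be tried with a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The extra sort key Name is an addition made here so that the order within each department is deterministic; the original query leaves it unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [
        ("Hasan", "IT"),
        ("Yusuf", "HR"),
        ("Hillary", "HR"),
        ("Joe", "IT"),
        ("Merry", "HR"),
        ("Ken", "Accountant"),
    ],
)

# CASE maps each department to a sort rank: HR first, Accountant second, the rest last.
rows = conn.execute("""
    SELECT Name, Department
    FROM Employee
    ORDER BY CASE Department
                 WHEN 'HR' THEN 1
                 WHEN 'Accountant' THEN 2
                 ELSE 3
             END,
             Name
""").fetchall()
departments = [dept for _, dept in rows]

print(rows)
```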
You can also use the relative position of the columns in the SELECT statement. Consider the same
example as above: instead of using an alias, use the relative position, e.g. 1 for the display name,
2 for Jd, and so on.
Community 2008-09-15 1
Bob 10 Paris
Mat 20 Berlin
Mary 24 Prague
Name
Mary
Gives
Name
Bob
Mary
CASE can be used in conjunction with SUM to return a count of only those items matching a pre-
defined condition. (This is similar to COUNTIF in Excel.)
The trick is to return binary results indicating matches, so the "1"s returned for matching entries can
be summed for a count of the total number of matches.
Given this table ItemSales, let's say you want to learn the total number of items that have been
categorized as "Expensive":
5 145 10 AFFORDABLE
Query:
SELECT
COUNT(Id) AS ItemsCount,
SUM ( CASE
WHEN PriceRating = 'Expensive' THEN 1
ELSE 0
END
) AS ExpensiveItemsCount FROM ItemSales;
Results:
ItemsCount ExpensiveItemsCount
5 3
Alternative:
SELECT
COUNT(Id) as ItemsCount,
SUM (
CASE PriceRating
WHEN 'Expensive' THEN 1
ELSE 0
END
) AS ExpensiveItemsCount
FROM ItemSales;
(This differs from the simple case, which can only check for equivalency with an input.)
5 145 10 AFFORDABLE
A word of caution: when using the short variant, the input expression is evaluated again at each
WHEN. Therefore the following statement:
SELECT
CASE ABS(CHECKSUM(NEWID())) % 4
WHEN 0 THEN 'Dr'
WHEN 1 THEN 'Master'
WHEN 2 THEN 'Mr'
WHEN 3 THEN 'Mrs'
END;
may produce a NULL result. That is because NEWID() is called again at each WHEN, with a new result
each time. It is equivalent to:
SELECT
CASE
WHEN ABS(CHECKSUM(NEWID())) % 4 = 0 THEN 'Dr'
WHEN ABS(CHECKSUM(NEWID())) % 4 = 1 THEN 'Master'
WHEN ABS(CHECKSUM(NEWID())) % 4 = 2 THEN 'Mr'
WHEN ABS(CHECKSUM(NEWID())) % 4 = 3 THEN 'Mrs'
END;
Therefore it can miss all the WHEN cases and return NULL.
SELECT ID
,REGION
,CITY
,DEPARTMENT
,EMPLOYEES_NUMBER
FROM DEPT
ORDER BY
CASE WHEN REGION IS NULL THEN 1
ELSE 0
END,
REGION;
The CASE expression in the query below looks at the Date1 and Date2 columns, checks which
column has the lower value, and sorts the records depending on this value.
Sample data
Id Date1 Date2
1 2017-01-01 2017-01-31
2 2017-01-31 2017-01-03
3 2017-01-31 2017-01-02
4 2017-01-06 2017-01-31
5 2017-01-31 2017-01-05
6 2017-01-04 2017-01-31
Query
SELECT Id, Date1, Date2
FROM YourTable
ORDER BY CASE
WHEN COALESCE(Date1, '1753-01-01') < COALESCE(Date2, '1753-01-01') THEN Date1
ELSE Date2
END;
Results
Id Date1 Date2
1 2017-01-01 2017-01-31
3 2017-01-31 2017-01-02
2 2017-01-31 2017-01-03
6 2017-01-04 2017-01-31
5 2017-01-31 2017-01-05
4 2017-01-06 2017-01-31
Explanation
As you can see, the row with Id = 1 comes first because its Date1 holds the lowest value in the
entire table, 2017-01-01. The row with Id = 3 comes second because its Date2, 2017-01-02, is the
second-lowest value in the table, and so on.
So the records are sorted in ascending order from 2017-01-01 to 2017-01-06, regardless of whether
the value sits in Date1 or in Date2.
The following statement matches all records in the Employees table whose FName contains the
string 'on'.
The following statement matches all records in Employees whose PhoneNumber starts with the
string '246'.
SELECT * FROM Employees WHERE PhoneNumber LIKE '246%';
The following statement matches all records in Employees whose PhoneNumber ends with the
string '11'.
The _ (underscore) character can be used as a wildcard for any single character in a pattern match.
Find all employees whose FName starts with 'j', ends with 'n' and has exactly 3 characters.
The _ (underscore) character can also be used more than once as a wildcard to match patterns.
For example, this pattern would match "jon", "jan", "jen", etc.
These names will not be shown: "jn", "john", "jordan", "justin", "jason", "julian", "jillian", "joann",
because the query uses a single underscore, which stands for exactly one character, so a matching
FName must be exactly 3 characters long.
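The pattern queries described above are missing from the text. As a sketch, they might look like this; the cut-down Employees table and its rows are hypothetical, and whether 'j' also matches 'J' depends on the collation in use:

```sql
-- Hypothetical sample data for trying out the patterns.
CREATE TABLE Employees (
    Id          INT PRIMARY KEY,
    FName       VARCHAR(35),
    PhoneNumber VARCHAR(11)
);
INSERT INTO Employees (Id, FName, PhoneNumber) VALUES
    (1, 'jon',       '2468101214'),
    (2, 'john',      '1357911131'),
    (3, 'Johnathon', '1212121211');

-- FName contains 'on' anywhere.
SELECT * FROM Employees WHERE FName LIKE '%on%';

-- PhoneNumber ends with '11'.
SELECT * FROM Employees WHERE PhoneNumber LIKE '%11';

-- FName is exactly three characters: 'j', any single character, 'n'.
SELECT * FROM Employees WHERE FName LIKE 'j_n';
```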
For example, this pattern would match "LaSt", "LoSt", "HaLt", etc.
SELECT *
FROM T_Whatever
WHERE SomeField LIKE CONCAT('%', @in_SearchText, '%');
However (apart from the fact that you shouldn't necessarily use LIKE when you can use full-text
search), this creates a problem when somebody inputs text like "50%" or "a_b".
So (instead of switching to full-text search), you can solve that problem using the LIKE escape
clause:
SELECT *
FROM T_Whatever
WHERE SomeField LIKE CONCAT('%', @in_SearchText, '%') ESCAPE '\';
That means \ is now treated as the escape character. You can now prepend \ to each wildcard
character (%, _, [ and \ itself) in the string you search for, and the results will be correct even
when the user enters a special character like % or _.
e.g.
sqlCmd.Parameters.Add("@in_SearchText", newString);
// instead of sqlCmd.Parameters.Add("@in_SearchText", stringToSearch);
Note: The above algorithm is for demonstration purposes only. It will not work in cases where one
grapheme consists of several characters (UTF-8), e.g. string stringToSearch =
"Les Mise\u0301rables"; you would need to do this for each grapheme, not for each character. You
should not use the above algorithm if you're dealing with Asian/East-Asian/South-Asian languages. Or
rather, if you want correct code to begin with, you should do this for each grapheme cluster.
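The effect of the ESCAPE clause described above can be seen directly in SQL. This is only a sketch: the sample rows are hypothetical, and it assumes a dialect in which the backslash has no special meaning inside a string literal (for example SQL Server or SQLite; MySQL's default settings treat it differently):

```sql
-- Hypothetical sample rows.
CREATE TABLE T_Whatever (SomeField VARCHAR(50));
INSERT INTO T_Whatever (SomeField) VALUES
    ('50% off'), ('500 items'), ('a_b'), ('axb');

-- Unescaped: the % after 50 is a wildcard, so '50% off' and '500 items' both match.
SELECT SomeField FROM T_Whatever WHERE SomeField LIKE '%50%%';

-- Escaped: \% is a literal percent sign, so only '50% off' matches.
SELECT SomeField FROM T_Whatever WHERE SomeField LIKE '%50\%%' ESCAPE '\';

-- Escaped: \_ is a literal underscore, so only 'a_b' matches.
SELECT SomeField FROM T_Whatever WHERE SomeField LIKE '%a\_b%' ESCAPE '\';
```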
The range or set can also be negated by placing the ^ caret before the range or set:
This range pattern would not match "gary" but will match "mary":
This set pattern would not match "mary" but will match "gary":
Eg: //selects all customers with a City starting with any character, followed by "erlin"
SELECT * FROM Customers
WHERE City LIKE '_erlin';
Eg: //selects all customers with a City starting with "a", "d", or "l"
SELECT * FROM Customers
WHERE City LIKE '[adl]%';
//selects all customers with a City starting with "a", "b", or "c"
SELECT * FROM Customers
WHERE City LIKE '[a-c]%';
[^charlist] - Matches only a character NOT specified within the brackets
Eg: //selects all customers with a City starting with a character that is not "a", "p", or "l"
SELECT * FROM Customers
WHERE City LIKE '[^apl]%';
or
select *
from products
where id in (1,8,3)
SELECT *
FROM customers
WHERE id IN (
SELECT DISTINCT customer_id
FROM orders
);
The above will give you all the customers that have orders in the system.
Using the BETWEEN operator with numbers
SELECT * From ItemSales
WHERE Quantity BETWEEN 10 AND 17
This query will return all ItemSales records that have a quantity that is greater or equal to 10 and
less than or equal to 17. The results will look like:
Using the BETWEEN operator with date values
This query will return all ItemSales records with a SaleDate that is greater than or equal to July
11, 2013 and less than or equal to July 24, 2013.
When comparing datetime values instead of dates, you may need to convert the datetime
values into date values, or add or subtract 24 hours, to get the correct results.
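The query this passage describes is not shown. A sketch of it, together with a half-open range that avoids the datetime pitfall mentioned in the note; it assumes the ItemSales table used in this chapter has a SaleDate column and that the dialect accepts ISO date literals:

```sql
-- Date-only column: both end points are included.
SELECT *
FROM ItemSales
WHERE SaleDate BETWEEN '2013-07-11' AND '2013-07-24';

-- Datetime column: '2013-07-24' means midnight, so sales made later
-- that day would be missed. A half-open range keeps all of July 24.
SELECT *
FROM ItemSales
WHERE SaleDate >= '2013-07-11'
  AND SaleDate <  '2013-07-25';
```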
Using the BETWEEN operator with text values
This query will return all customers whose name alphabetically falls between the letters 'D' and 'L'. In
this case, Customer #1 and #3 will be returned. Customer #2, whose name begins with a 'M' will not
be included.
Id FName LName
1 William Jones
3 Richard Davis
An aggregate function is a function where the values of multiple rows are grouped
together as input on certain criteria to form a single value of more significant meaning or
measurement (Wikipedia).
This query will return the CustomerId and Number of Cars count of any customer who has more
than one car. In this case, the only customer who has more than one car is Customer #1.
CustomerId Number of Cars
1 2
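The query behind this result is missing. A sketch that reproduces it, assuming a Cars table with Id and CustomerId columns; the four sample rows are hypothetical:

```sql
-- Hypothetical sample data: customer 1 owns two cars, customers 2 and 3 one each.
CREATE TABLE Cars (
    Id         INT PRIMARY KEY,
    CustomerId INT
);
INSERT INTO Cars (Id, CustomerId) VALUES (1, 1), (2, 1), (3, 2), (4, 3);

-- GROUP BY builds one group per customer; HAVING then filters the groups,
-- keeping only customers with more than one car.
SELECT CustomerId,
       COUNT(Id) AS "Number of Cars"
FROM Cars
GROUP BY CustomerId
HAVING COUNT(Id) > 1;
```

With the sample rows this returns a single row: CustomerId 1 with a count of 2. WHERE could not be used here, because it is evaluated before the groups and their counts exist.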
SELECT *
FROM Employees
WHERE ManagerId IS NULL
This statement will return all Employee records where the value of the ManagerId column is NULL.
SELECT *
FROM Employees
WHERE ManagerId IS NOT NULL
This statement will return all Employee records where the value of the ManagerId is not NULL.
Note: The same query will not return results if you change the WHERE clause to WHERE
ManagerId = NULL or WHERE ManagerId <> NULL.
Using a WHERE at the end of your SELECT statement allows you to limit the returned rows to a
condition. In this case, where there is an exact match using the = sign:
Section 13.5: The WHERE clause only returns rows that match
its criteria
Steam has a games under $10 section of their store page. Somewhere deep in the heart of their
systems, there's probably a query that looks something like:
SELECT *
FROM Items
WHERE Price < 10;
AND
Will return:
OR
Will return:
SELECT *
FROM Cars
WHERE TotalCost IN (100, 200, 300);
This query will return Car #2 which costs 200 and Car #3 which costs 100. Note that this is
equivalent to using multiple clauses with OR, e.g.:
SELECT *
FROM Cars
WHERE TotalCost = 100 OR TotalCost = 200 OR TotalCost = 300;
This example uses the Employees Table from the Example Databases.
SELECT *
FROM Employees
WHERE FName LIKE 'John'
This query will only return Employee #2 whose first name matches 'John' exactly.
SELECT *
FROM Employees
WHERE FName like 'John%';
● John% - will return any Employee whose name begins with 'John', followed by any amount of
characters
● %John - will return any Employee whose name ends with 'John', preceded by any amount
of characters
● %John% - will return any Employee whose name contains 'John' anywhere within the value
In this case, the query will return Employee #2 whose name is 'John' as well as Employee #4 whose
name is 'Johnathon'.
1 2 5 100
1 3 2 200
1 4 1 500
2 1 4 50
3 5 6 700
To check for customers who have ordered both - ProductID 2 and 3, HAVING can be used
select customerId
from orders
where productID in (2,3)
group by customerId
having count(distinct productID) = 2;
Return value:
customerId
The query selects only records with the productIDs in question, and the HAVING clause checks
for groups having 2 distinct productIDs and not just one.
select customerId
from orders
group by customerId
having sum(case when productID = 2 then 1 else 0 end) > 0
and sum(case when productID = 3 then 1 else 0 end) > 0;
This query selects only groups having at least one record with productID 2 and at least one with
productID 3.
Chapter 14: SKIP TAKE (Pagination)
Section 14.1: Limiting amount of results
ISO/ANSI SQL:
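The statement for this label appears to be missing. As a sketch, the standard way to cap a result at 20 rows is the FETCH FIRST clause (supported by PostgreSQL and Oracle 12c, among others, but not by MySQL); TableName, Id and Col1 are the placeholder names used in the Oracle example below:

```sql
-- Return at most 20 rows; the ORDER BY makes "first" well defined.
SELECT Id, Col1
FROM TableName
ORDER BY Id
FETCH FIRST 20 ROWS ONLY;
```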
Oracle:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName)
WHERE RowNumber <= 20
SQL Server:
SELECT TOP 20 *
FROM dbo.[Sale]
Section 14.2: Skipping then taking some results (Pagination)
ISO/ANSI SQL:
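The statement for this label appears to be missing. A sketch using the standard OFFSET ... FETCH clause, which SQL Server 2012+, Oracle 12c+ and PostgreSQL accept; TableName, Id and Col1 are the placeholder names used throughout this chapter:

```sql
-- Skip the first 20 rows, then return the next 20 (rows 21 to 40).
SELECT Id, Col1
FROM TableName
ORDER BY Id
OFFSET 20 ROWS
FETCH NEXT 20 ROWS ONLY;
```

Paging is only stable when the ORDER BY is deterministic, so it helps to sort on a unique column such as Id.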
MySQL:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName) t
WHERE RowNumber BETWEEN 21 AND 40
PostgreSQL; SQLite:
MySQL:
SELECT Id,
Col1
FROM (SELECT Id,
Col1,
row_number() over (order by Id) RowNumber
FROM TableName) t
WHERE RowNumber > 20
PostgreSQL:
SQLite:
Example query:
Example result:
id select_type table type possible_keys key key_len ref rows Extra
Under type you see whether an index was used. The column possible_keys shows whether the
execution plan can choose from different indexes, or whether none exists. key tells you the index
actually used. key_len shows the size in bytes of one index item; the lower this value, the more
index items fit into the same amount of memory and the faster they can be processed. rows shows
the expected number of rows the query needs to scan; the lower the better.
DESCRIBE tablename;
Example Result:
Customer Table
Id FirstName LastName
1 Ozgur Ozturk
2 Youssef Medi
3 Henry Tai
Order Table
Id CustomerId Amount
1 2 123.50
2 3 14.80
Result
Id FirstName LastName
2 Youssef Medi
3 Henry Tai
Result
Id FirstName LastName
1 Ozgur Ozturk
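The queries that produce the two results above are not shown. A sketch that reproduces them with EXISTS and NOT EXISTS, using the sample rows from the tables above; the column types are assumptions, and "Order" is quoted because ORDER is a reserved word (MySQL would need backticks or ANSI_QUOTES mode):

```sql
CREATE TABLE Customer (
    Id        INT PRIMARY KEY,
    FirstName VARCHAR(30),
    LastName  VARCHAR(30)
);
CREATE TABLE "Order" (
    Id         INT PRIMARY KEY,
    CustomerId INT,
    Amount     DECIMAL(10, 2)
);
INSERT INTO Customer (Id, FirstName, LastName) VALUES
    (1, 'Ozgur', 'Ozturk'), (2, 'Youssef', 'Medi'), (3, 'Henry', 'Tai');
INSERT INTO "Order" (Id, CustomerId, Amount) VALUES
    (1, 2, 123.50), (2, 3, 14.80);

-- Customers with at least one order: Youssef and Henry.
SELECT *
FROM Customer c
WHERE EXISTS (SELECT 1 FROM "Order" o WHERE o.CustomerId = c.Id);

-- Customers with no order at all: Ozgur.
SELECT *
FROM Customer c
WHERE NOT EXISTS (SELECT 1 FROM "Order" o WHERE o.CustomerId = c.Id);
```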
Purpose
EXISTS, IN and JOIN can sometimes be used to get the same result; however, they are not equivalent:
A table may be joined to itself or to any other table. If information from more than two tables needs to
be accessed, multiple joins can be specified in a FROM clause.
In the below example, for each Employee in the example database Employees table, a record is
returned containing the employee's first name together with the corresponding first name of the
employee's manager. Since managers are also employees, the table is joined with itself:
SELECT
e.FName AS "Employee",
m.FName AS "Manager"
FROM
Employees e
JOIN
Employees m
ON e.ManagerId = m.Id;
Employee Manager
John James
Michael James
Johnathon John
The first action is to create a Cartesian product of all records in the tables used in the FROM clause.
In this case it's the Employees table twice, so the intermediate table will look like this (I've removed
any fields not used in this example):
e.Id e.FName e.ManagerId m.Id m.FName m.ManagerId
2 John 1 2 John 1
2 John 1 3 Michael 1
2 John 1 4 Johnathon 2
3 Michael 1 2 John 1
3 Michael 1 3 Michael 1
3 Michael 1 4 Johnathon 2
4 Johnathon 2 1 James NULL
4 Johnathon 2 2 John 1
4 Johnathon 2 3 Michael 1
4 Johnathon 2 4 Johnathon 2
The next action is to only keep the records that meet the JOIN criteria, so any records where the
aliased e table ManagerId equals the aliased m table Id:
4 Johnathon 2 2 John 1
Then, each expression used within the SELECT clause is evaluated to return this table:
e.FName m.FName
John James
Michael James
Johnathon John
Finally, column names e.FName and m.FName are replaced by their alias column names, assigned
with the AS operator:
Employee Manager
John James
Michael James
Johnathon John
In a CROSS JOIN, each row from the first table is joined with every row of the other table.
If the first table contains x rows and the second one y rows, the result set will have x * y rows.
Table A
1
2
3
4
Table B
3
4
5
6
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
Inner Join
An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the
two rows they have in common:
a | b
--+--
3 | 3
4 | 4
A left outer join will give all rows in A, plus any common rows in B:
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
Similarly, a right outer join will give all rows in B, plus any common rows in A:
a    | b
-----+----
3    | 3
4    | 4
null | 5
null | 6
A full outer join will give you the union of A and B, i.e., all the rows in A and all the rows in B. If
something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
a    | b
-----+-----
1    | null
2    | null
3    | 3
4    | 4
null | 6
null | 5
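The join queries themselves are missing from this section. A sketch that reproduces the four result sets above, assuming each table has a single integer column named after the table (a.a and b.b, as the result headers suggest):

```sql
CREATE TABLE a (a INT);
CREATE TABLE b (b INT);
INSERT INTO a (a) VALUES (1), (2), (3), (4);
INSERT INTO b (b) VALUES (3), (4), (5), (6);

-- Inner join: only the rows the two tables have in common (3 and 4).
SELECT * FROM a INNER JOIN b ON a.a = b.b;

-- Left outer join: every row of a; b is NULL where nothing matches.
SELECT * FROM a LEFT OUTER JOIN b ON a.a = b.b;

-- Right outer join: every row of b; a is NULL where nothing matches.
SELECT * FROM a RIGHT OUTER JOIN b ON a.a = b.b;

-- Full outer join: every row of both tables.
SELECT * FROM a FULL OUTER JOIN b ON a.a = b.b;
```

MySQL has no FULL OUTER JOIN; there the last query is usually emulated with a UNION of the left and right outer joins.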
CREATE TABLE A (
X varchar(255) PRIMARY KEY
);
CREATE TABLE B (
Y varchar(255) PRIMARY KEY
);
Inner Join
Combines left and right rows that match.
X Y
Lisa Lisa
Marco Marco
Phil Phil
Left Outer Join
Sometimes abbreviated to "left join". Combines left and right rows that match, and includes non-
matching left rows.
X Y
Amy NULL
John NULL
Lisa Lisa
Marco Marco
Phil Phil
Right Outer Join
X Y
Lisa Lisa
Marco Marco
Phil Phil
NULL Tim
NULL Vincent
Full Outer Join
X Y
Amy NULL
John NULL
Lisa Lisa
Marco Marco
Phil Phil
NULL Tim
NULL Vincent
Lisa
Marco
Phil
Lisa
Marco
Phil
As you can see, there is no dedicated IN syntax for left vs. right semi join - we achieve the effect
simply by switching the table positions within SQL text.
Amy
John
WARNING: Be careful if you happen to be using NOT IN on a NULL-able column! More details here.
Tim
Vincent
As you can see, there is no dedicated NOT IN syntax for left vs. right anti semi join - we achieve the
effect simply by switching the table positions within SQL text.
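The semi join and anti semi join queries are not shown in this section. A sketch of how they might be written against tables A and B created earlier in this section; the INSERT rows are inferred from the result tables and are otherwise an assumption:

```sql
-- Sample rows inferred from the result tables in this section.
INSERT INTO A (X) VALUES ('Amy'), ('John'), ('Lisa'), ('Marco'), ('Phil');
INSERT INTO B (Y) VALUES ('Lisa'), ('Marco'), ('Phil'), ('Tim'), ('Vincent');

-- Left semi join: rows of A that have a match in B (Lisa, Marco, Phil).
SELECT * FROM A WHERE X IN (SELECT Y FROM B);

-- Left anti semi join: rows of A with no match in B (Amy, John).
SELECT * FROM A WHERE X NOT IN (SELECT Y FROM B);

-- The same anti join with NOT EXISTS, which keeps working
-- even when the compared column can contain NULL.
SELECT * FROM A WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.Y = A.X);

-- Right anti semi join: swap the tables (Tim, Vincent).
SELECT * FROM B WHERE Y NOT IN (SELECT X FROM A);
```

NOT IN is safe here only because Y is a primary key and therefore never NULL; a single NULL in the subquery would make NOT IN return no rows at all.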
Cross Join
A Cartesian product of all left with all right rows.
X Y
Amy Lisa
John Lisa
Lisa Lisa
Marco Lisa
Phil Lisa
Amy Marco
John Marco
Lisa Marco
Marco Marco
Phil Marco
Amy Phil
John Phil
Lisa Phil
Marco Phil
Phil Phil
Amy Tim
John Tim
Lisa Tim
Marco Tim
Phil Tim
Amy Vincent
John Vincent
Lisa Vincent
Marco Vincent
Phil Vincent
Cross join is equivalent to an inner join with join condition which always matches, so the following
query would have returned the same result:
X X
Amy John
Amy Lisa
Amy Marco
John Marco
Lisa Marco
Phil Marco
Amy Phil
The following example will select all departments and the first name of employees that work in that
department. Departments with no employees are still returned in the results, but will have NULL for
the employee name:
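The query itself is missing here. A sketch matching the description, assuming the Departments and Employees tables of the example database and the join criteria quoted later in this section (Departments.Id = Employees.DepartmentId):

```sql
-- Every department appears at least once; FName is NULL
-- for a department that has no employees.
SELECT Departments.Name,
       Employees.FName
FROM Departments
LEFT OUTER JOIN Employees
    ON Departments.Id = Employees.DepartmentId;
```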
Departments.Name Employees.FName
HR James
HR John
HR Johnathon
Sales Michael
Tech NULL
and
Id Name
1 HR
2 Sales
3 Tech
First a Cartesian product is created from the two tables giving an intermediate table.
The records that meet the join criteria (Departments.Id = Employees.DepartmentId) are
highlighted in bold; these are passed to the next stage of the query.
As this is a LEFT OUTER JOIN all records are returned from the LEFT side of the join (Departments),
while any records on the RIGHT side are given a NULL marker if they do not match the join criteria. In
the table below this will return Tech with NULL.
Finally each expression used within the SELECT clause is evaluated to return our final table:
Departments.Name Employees.FName
HR James
HR John
Sales Richard
Tech NULL
All RDBMSs support it, but the syntax is usually advised against. The reasons why it is a bad idea to
use this syntax are:
● It is possible to get accidental cross joins which then return incorrect results, especially if you
have a lot of joins in the query.
● If you intended a cross join, then it is not clear from the syntax (write out CROSS JOIN
instead), and someone is likely to change it during maintenance.
The following example will select employee's first names and the name of the departments they work
for:
e.FName d.Name
James HR
John HR
Richard Sales
d.Name e.FName
HR James
HR John
HR Michael
HR Johnathon
Sales James
Sales John
Sales Michael
Sales Johnathon
Tech James
Tech John
Tech Michael
Tech Johnathon
It is recommended to write an explicit CROSS JOIN if you want to do a cartesian join, to highlight that
this is what you want.
The basic idea is that a table-valued function (or inline subquery) gets applied for every row you join.
This makes it possible to, for example, only join the first matching entry in another table.
The difference between a normal and a lateral join lies in the fact that you can use a column that you
previously joined in the subquery that you "CROSS APPLY".
Syntax:
PostgreSQL 9.3+
SQL-Server:
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
A FULL OUTER JOIN returns all rows from the left table, and all rows from the right table.
If there are rows in the left table that do not have matches in the right table, or if there are rows in
right table that do not have matches in the left table, then those rows will be listed, too.
Example 1 :
SELECT
COALESCE(T_Budget.Year, tYear.Year) AS RPT_BudgetInYear
,COALESCE(T_Budget.Value, 0.0) AS RPT_Value
FROM T_Budget
Note that if you're using soft-deletes, you'll have to check the soft-delete status again in the WHERE-
clause (because FULL JOIN behaves kind-of like a UNION);
It's easy to overlook this little fact, since you put AP_SoftDeleteStatus = 1 in the join clause.
Also, if you are doing a FULL JOIN, you'll usually have to allow NULL in the WHERE-clause; forgetting
to allow NULL on a value will have the same effects as an INNER join, which is something you don't
want if you're doing a FULL JOIN.
Example:
SELECT
T_AccountPlan.AP_UID
,T_AccountPlan.AP_Code
,T_AccountPlan.AP_Lang_EN
,T_BudgetPositions.BUP_Budget
,T_BudgetPositions.BUP_UID
,T_BudgetPositions.BUP_Jahr
FROM T_BudgetPositions
WHERE (1=1)
AND (T_BudgetPositions.BUP_SoftDeleteStatus = 1 OR T_BudgetPositions.BUP_SoftDeleteStatus IS NULL)
AND (T_AccountPlan.AP_SoftDeleteStatus = 1 OR T_AccountPlan.AP_SoftDeleteStatus IS NULL)
UNION ALL
SELECT People.Name
FROM People
JOIN MyDescendants ON People.Name = MyDescendants.Parent
)
SELECT * FROM MyDescendants;
Section 18.10: Basic explicit inner join
A basic join (also called "inner join") queries data from two tables, with their relationship defined in a
join clause.
The following example will select employees' first names (FName) from the Employees table and the
name of the department they work for (Name) from the Departments table:
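The query itself is missing here. A sketch matching the description, assuming the example database's Employees table has a DepartmentId column that refers to Departments.Id:

```sql
-- Only employees with a matching department are returned;
-- departments without employees do not appear.
SELECT Employees.FName,
       Departments.Name
FROM Employees
JOIN Departments
    ON Employees.DepartmentId = Departments.Id;
```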
Employees.FName Departments.Name
James HR
John HR
Richard Sales
(These examples use the Employees and Customers tables from the Example Databases.)
Standard SQL
UPDATE
Employees
SET PhoneNumber =
(SELECT
c.PhoneNumber
FROM
Customers c
WHERE
c.FName = Employees.FName
AND c.LName = Employees.LName)
WHERE Employees.PhoneNumber IS NULL;
SQL:2003
MERGE INTO
Employees e
USING
Customers c
ON
e.FName = c.Fname
AND e.LName = c.LName
AND e.PhoneNumber IS NULL
WHEN MATCHED THEN
UPDATE
SET PhoneNumber = c.PhoneNumber;
SQL Server
UPDATE
e
SET
PhoneNumber = c.PhoneNumber
FROM
Employees e
INNER JOIN Customers c
ON e.FName = c.FName
AND e.LName = c.LName
WHERE
e.PhoneNumber IS NULL;
Update operations can include current values in the updated row. In this simple example the
TotalCost is incremented by 100 for two rows:
UPDATE Cars
SET TotalCost = TotalCost + 100
WHERE Id = 3 OR Id = 4;
A column's new value may be derived from its previous value or from any other column's value in the
same table or a joined table.
UPDATE
Cars
SET
Status = 'READY'
WHERE
Id = 4;
This statement will set the status of the row of 'Cars' with id 4 to "READY".
The WHERE clause contains a logical expression which is evaluated for each row. If a row fulfills
the criteria, its value is updated. Otherwise, the row remains unchanged.
UPDATE Cars
SET Status = 'READY';
This statement will set the 'status' column of all rows of the 'Cars' table to "READY" because it does
not have a WHERE clause to filter the set of rows.
This would create an empty database named myDatabase where you can create tables.
The CREATE TABLE statement is used to create a new table in the database. A table definition
consists of a list of columns, their types, and any integrity constraints.
You can use any of the other features of a SELECT statement to modify the data before passing it to
the new table. The columns of the new table are automatically created according to the selected
rows.
This is then followed by the list of column names and their properties, such as the ID
Value Meaning
identity(1,1) states that the column will have auto-generated values starting at 1 and incrementing by 1 for each new row
primary key states that all values in this column will be unique
not null states that this column cannot have null values
Value Meaning
Important: You cannot reference a table that does not exist in the database. Be sure to create the
Cities table first and the Employees table second. If you do it the other way round, it will throw an
error.
SQL Server
To create a temporary table local to the session:
BEGIN TRANSACTION
BEGIN TRY
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, 'not a date', 1)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
THROW;
END CATCH;
BEGIN TRANSACTION
BEGIN TRY
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
INSERT INTO dbo.Sale(Price, SaleDate, Quantity)
VALUES (5.2, GETDATE(), 1)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
THROW;
END CATCH;
Let's say we want to extract the names of all the managers from our departments. Using a UNION we
can get all the employees from both the HR and Finance departments who hold the position of
manager:
SELECT
FirstName, LastName
FROM
HR_EMPLOYEES
WHERE
Position = 'manager'
UNION ALL
SELECT
FirstName, LastName
FROM
FINANCE_EMPLOYEES
WHERE
Position = 'manager';
The UNION statement removes duplicate rows from the query results. Since it is possible to have
people having the same Name and position in both departments we are using UNION ALL, in order
not to remove duplicates.
If you want to use an alias for each output column, you can just put them in the first select statement,
as follows:
SELECT
FirstName as 'First Name', LastName as 'Last Name'
FROM
HR_EMPLOYEES
WHERE
Position = 'manager'
UNION ALL
SELECT
FirstName, LastName
FROM
FINANCE_EMPLOYEES
WHERE
Position = 'manager';
● UNION joins 2 result sets while removing duplicates from the result set
● UNION ALL joins 2 result sets without attempting to remove duplicates
One mistake many people make is to use a UNION when they do not need to have the
duplicates removed. The additional performance cost against large results sets can be
very significant.
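The difference between the two operators is easy to see on a small data set. A sketch with hypothetical rows in which the same manager appears in both departments; the column types are assumptions:

```sql
-- Hypothetical sample data: Ann Lee is listed in both tables.
CREATE TABLE HR_EMPLOYEES (FirstName VARCHAR(30), LastName VARCHAR(30), Position VARCHAR(30));
CREATE TABLE FINANCE_EMPLOYEES (FirstName VARCHAR(30), LastName VARCHAR(30), Position VARCHAR(30));
INSERT INTO HR_EMPLOYEES VALUES ('Ann', 'Lee', 'manager');
INSERT INTO FINANCE_EMPLOYEES VALUES ('Ann', 'Lee', 'manager'), ('Raj', 'Patel', 'manager');

-- UNION removes the duplicate Ann Lee row: 2 rows.
SELECT FirstName, LastName FROM HR_EMPLOYEES WHERE Position = 'manager'
UNION
SELECT FirstName, LastName FROM FINANCE_EMPLOYEES WHERE Position = 'manager';

-- UNION ALL keeps every row from both sides: 3 rows.
SELECT FirstName, LastName FROM HR_EMPLOYEES WHERE Position = 'manager'
UNION ALL
SELECT FirstName, LastName FROM FINANCE_EMPLOYEES WHERE Position = 'manager';
```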
When you might need
Suppose you need to filter a table against 2 different attributes, and you have created separate non-
clustered indexes for each column. A UNION enables you to leverage both indexes while still
preventing duplicates.
This simplifies your performance tuning since only simple indexes are needed to perform these
queries optimally. You may even be able to get by with quite a bit fewer non-clustered indexes
improving overall write performance against the source table as well.
Suppose you still need to filter a table against 2 attributes, but you do not need to filter duplicate
records (either because it doesn't matter or your data wouldn't produce any duplicates during the
union due to your data model design).
This is especially useful when creating views that combine data which is designed to be physically
partitioned across multiple tables (maybe for performance reasons) but which still needs to be
rolled up into one set of records. Since the data is already split, having the database engine
remove duplicates adds no value and just adds processing time to the queries.
Chapter 25: ALTER TABLE
The ALTER command in SQL is used to modify a column or constraint in a table.
The above statement would add a column named StartingDate, which cannot be NULL and defaults
to the current date, and a column named DateOfBirth, which can be NULL, to the Employees table.
This will not only delete the information in that column, but will drop the column salary from the
table employees (the column will no longer exist).
This query will alter the column datatype of StartingDate, changing it from a simple date to
datetime, and set the default to the current date.
This drops a constraint called DefaultSalary from the employees table definition.
Note: Ensure that constraints of the column are dropped before dropping a column.
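The ALTER TABLE statements these descriptions refer to are missing. A sketch of how the same operations might be written, assuming SQL Server syntax; the starting table definition and the constraint name DF_StartingDate are hypothetical. The script drops each constraint before touching its column, in line with the note above:

```sql
-- Hypothetical starting point: a table with a salary column and a named default.
CREATE TABLE Employees (
    Id     INT PRIMARY KEY,
    salary INT CONSTRAINT DefaultSalary DEFAULT 0
);

-- Add a NOT NULL column with a default and a nullable column.
ALTER TABLE Employees
ADD StartingDate DATE NOT NULL CONSTRAINT DF_StartingDate DEFAULT GETDATE(),
    DateOfBirth  DATE NULL;

-- Drop the constraint first, then the column it belongs to.
ALTER TABLE Employees DROP CONSTRAINT DefaultSalary;
ALTER TABLE Employees DROP COLUMN salary;

-- Change the data type: remove the default, alter the column, re-create the default.
ALTER TABLE Employees DROP CONSTRAINT DF_StartingDate;
ALTER TABLE Employees ALTER COLUMN StartingDate DATETIME NOT NULL;
ALTER TABLE Employees ADD CONSTRAINT DF_StartingDate DEFAULT GETDATE() FOR StartingDate;
```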
This example will insert all Employees into the Customers table. Since the two tables have different
fields and you don't want to move all the fields over, you need to specify which fields to insert into
and which fields to select. The corresponding field names don't need to match, but they do need to
have the same data type. This example assumes that the Id field has an Identity Specification set
and will auto-increment.
If you have two tables that have exactly the same field names and just want to move all the records
over you can use:
This statement will insert a new row into the Customers table. Note that a value was not specified
for the Id column, as it will be added automatically. However, all other column values must be
specified.
This statement will insert a new row into the Customers table. Data will only be inserted into the
columns specified - note that no value was provided for the PhoneNumber column. Note, however,
that all columns marked as not null must be included.
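The INSERT statements described in this chapter are not shown. A sketch of the four forms, assuming SQL Server and a Customers table whose Id column is an identity; the Email and PreferredContact columns, the sample values and the names Table1 and Table2 are hypothetical:

```sql
-- Copy selected columns from Employees into Customers; Id is generated automatically.
INSERT INTO Customers (FName, LName, PhoneNumber)
SELECT FName, LName, PhoneNumber
FROM Employees;

-- Two tables with exactly the same columns: copy every row.
INSERT INTO Table1
SELECT * FROM Table2;

-- Insert one row without a column list: every column except the
-- identity column must be given a value, in table order.
INSERT INTO Customers
VALUES ('Zack', 'Smith', 'zack@example.com', '7049989942', 'EMAIL');

-- Insert one row into named columns only; PhoneNumber is left out and stays NULL.
INSERT INTO Customers (FName, LName, Email, PreferredContact)
VALUES ('Zack', 'Smith', 'zack@example.com', 'EMAIL');
```

Naming the target columns is usually the safer habit, since the statement keeps working when a column is later added to the table.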
Note: The AND NOT EXISTS portion prevents updating records that haven't changed. Using the
INTERSECT construct allows nullable columns to be compared without special handling.
Section 27.2: MySQL: counting users by name
Suppose we want to know how many users have the same name. Let us create table users as
follows:
Now, we just discovered a new user named Joe and would like to take him into account. To achieve
that, we need to determine whether there is an existing row with his name, and if so, update it to
increment count; on the other hand, if there is no existing row, we should create it.
MySQL uses the following syntax : insert ... on duplicate key update .... In this case:
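Neither the table definition nor the upsert statement is shown. A sketch for MySQL; the column sizes are assumptions, and the unique key on name is what lets the second form of the statement detect the existing row:

```sql
CREATE TABLE users (
    id    INT PRIMARY KEY AUTO_INCREMENT,
    name  VARCHAR(8),
    count INT,
    UNIQUE KEY name (name)
);

-- First run: no row for 'Joe' exists, so one is inserted with count = 1.
-- Later runs: the unique key on name collides, so count is incremented instead.
INSERT INTO users (name, count)
VALUES ('Joe', 1)
ON DUPLICATE KEY UPDATE count = count + 1;
```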
Now, we just discovered a new user named Joe and would like to take him into account. To achieve
that, we need to determine whether there is an existing row with his name, and if so, update it to
increment count; on the other hand, if there is no existing row, we should create it.
PostgreSQL uses the following syntax : insert ... on conflict ... do update .... In this
case:
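Neither the table definition nor the upsert statement is shown. A sketch for PostgreSQL 9.5 or later; the column sizes are assumptions, and the UNIQUE constraint on name is the conflict target:

```sql
CREATE TABLE users (
    id    SERIAL PRIMARY KEY,
    name  VARCHAR(8) UNIQUE,
    count INT
);

-- First run: no row for 'Joe' exists, so one is inserted with count = 1.
-- Later runs: the insert conflicts on name, so the existing row is updated.
-- users.count is the stored value; the proposed row is available as EXCLUDED.
INSERT INTO users (name, count)
VALUES ('Joe', 1)
ON CONFLICT (name) DO UPDATE
SET count = users.count + 1;
```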
create a Department table to hold information about departments. Then create an Employee table
which hold information about the employees. Please note, each employee belongs to a department,
hence the Employee table has referential integrity with the Department table.
First query selects data from Department table and uses CROSS APPLY to evaluate the Employee
table for each record of the Department table. Second query simply joins the Department table
with the Employee table and all the matching records are produced.
SELECT *
FROM Department D
CROSS APPLY (
SELECT *
FROM Employee E
WHERE E.DepartmentID = D.DepartmentID
)A
GO
SELECT *
FROM Department D
INNER JOIN Employee E
ON D.DepartmentID = E.DepartmentID;
If you look at the results they produced, it is the exact same result set. So how does APPLY differ
from a JOIN, and how does it help in writing more efficient queries?
The first query in Script #2 selects data from the Department table and uses OUTER APPLY to
evaluate the Employee table for each record of the Department table. Rows for which there is no
match in the Employee table contain NULL values, as you can see in the case of rows 5 and 6.
The second query simply uses a LEFT OUTER JOIN between the Department table and the
Employee table. As expected, the query returns all rows from the Department table, even those
rows for which there is no match in the Employee table.
SELECT *
FROM Department D
OUTER APPLY (
SELECT *
FROM Employee E
WHERE E.DepartmentID = D.DepartmentID
)A
GO
SELECT *
FROM Department D
LEFT OUTER JOIN Employee E
ON D.DepartmentID = E.DepartmentID;
GO
Even though the above two queries return the same information, their execution plans will be
slightly different. Cost-wise, however, there will not be much difference.
Now comes the time to see where the APPLY operator is really required. In Script #3, I am creating a
table-valued function which accepts DepartmentID as its parameter and returns all the employees
who belong to this department. The next query selects data from Department table and uses CROSS
APPLY to join with the function we created. It passes the DepartmentID for each row from the outer
table expression (in our case Department table) and evaluates the function for each row similar to a
correlated subquery. The next query uses the OUTER APPLY in place of CROSS APPLY and hence
unlike CROSS APPLY which returned only correlated data, the OUTER APPLY returns non-correlated
data as well, placing NULLs into the missing columns.
CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment (@DeptID AS int)
RETURNS TABLE
AS
RETURN
(
SELECT
*
FROM Employee E
WHERE E.DepartmentID = @DeptID
)
GO
SELECT
*
FROM Department D
CROSS APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID)
GO
SELECT
*
FROM Department D
OUTER APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID)
GO
So if you are wondering whether a simple join can be used in place of the above queries, the
answer is no. If you replace CROSS/OUTER APPLY in the above queries with INNER JOIN/LEFT
OUTER JOIN, specify an ON clause (something like 1=1) and run the query, you will get the error
"The multi-part identifier "D.DepartmentID" could not be bound." This is because with JOINs the
execution context of the outer query is different from the execution context of the function (or a
derived table), and you cannot bind a value or variable from the outer query to the function as a
parameter. Hence the APPLY operator is required for such queries.
Chapter 29: DELETE
The DELETE statement is used to delete records from a table.
See TRUNCATE documentation for details on how TRUNCATE performance can be better because it
ignores triggers and indexes and logs to just delete the data.
Let's assume we want to DELETE data from Source once it's loaded into Target.
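The statement for this scenario is not shown. A sketch using a correlated EXISTS subquery, which most engines accept; the single ID column and the sample rows are assumptions:

```sql
-- Hypothetical sample data: IDs 1 and 2 have already been loaded into Target.
CREATE TABLE Source (ID INT PRIMARY KEY);
CREATE TABLE Target (ID INT PRIMARY KEY);
INSERT INTO Source (ID) VALUES (1), (2), (3);
INSERT INTO Target (ID) VALUES (1), (2);

-- Delete every Source row whose ID already exists in Target.
DELETE FROM Source
WHERE EXISTS (
    SELECT 1
    FROM Target
    WHERE Target.ID = Source.ID
);

-- Only ID 3, which has not been loaded yet, remains in Source.
SELECT ID FROM Source;
```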
Most common RDBMS implementations (e.g. MySQL, Oracle, PostgreSQL, Teradata) allow tables
to be joined during DELETE, allowing more complex comparisons in a compact syntax.
Adding complexity to original scenario, let's assume Aggregate is built from Target once a day and
does not contain the same ID but contains the same date. Let us also assume that we want to delete
data from Source only after the aggregate is populated for the day.
This essentially results in INNER JOINs between Source, Target and Aggregate. The deletion is
performed on Source when the same IDs exist in Target AND date present in Target for those IDs
also exists in Aggregate.
Same query may also be written (on MySQL, Oracle, Teradata) as:
DELETE Source
FROM Source, TargetSchema.Target, AggregateSchema.Aggregate
WHERE Source.ID = TargetSchema.Target.ID
AND TargetSchema.Target.DataDate = AggregateSchema.Aggregate.AggDate;
Explicit joins may be used in DELETE statements on some RDBMS implementations (e.g.
Oracle, MySQL) but are not supported on all platforms (e.g. Teradata does not support them).
Comparisons can be designed to check mismatch scenarios instead of matching ones with all syntax
styles (observe NOT EXISTS below).
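A mismatch check could be sketched as follows. It reuses the table and column names from the
comma-join query above. The rule itself (delete Source rows that have no Target row with an
aggregated date) is only an illustrative assumption:

```sql
-- Delete Source rows for which no matching Target row has a date
-- that also appears in Aggregate.
DELETE FROM Source
WHERE NOT EXISTS (
    SELECT 1
    FROM TargetSchema.Target t
    JOIN AggregateSchema.Aggregate a
      ON a.AggDate = t.DataDate
    WHERE t.ID = Source.ID
);
```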
Using TRUNCATE TABLE is often better than using DELETE when every row must go, as it ignores all
the indexes and triggers and just removes everything.
DELETE is a row-based operation: each row is deleted individually. TRUNCATE TABLE is a data
page operation: entire data pages are deallocated. If you have a table with a million rows, it will be
much faster to truncate the table than to use a DELETE statement.
Though we can delete specific rows with DELETE, we cannot TRUNCATE specific rows; we can only
TRUNCATE all the records at once. Deleting all rows and then inserting a new record continues the
auto-incremented primary key from the previously inserted value, whereas TRUNCATE also resets the
auto-increment value, so numbering starts again from its initial value (typically 1).
Note that a table referenced by a foreign key cannot be truncated; attempting it raises an error.
Chapter 31: DROP Table
Section 31.1: Check for existence before dropping
MySQL Version ≥ 3.19
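A short sketch of the existence check, using a hypothetical table name MyTable:

```sql
-- Drops the table only when it exists; a missing table does not raise an error.
DROP TABLE IF EXISTS MyTable;
```

The IF EXISTS form is also accepted by PostgreSQL and SQLite, and by SQL Server from version 2016
onward.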
This should mean that you have a foreign key on your room table, referencing the client table.
Assuming a client moves on to some other software, you'll have to delete his data in your software.
But if you simply delete the client's row from the client table, you'll get a foreign key violation,
because you can't delete the client while he still has rooms.
Now you'd have to write code in your application that deletes the client's rooms before it deletes the
client. Assume further that in the future, many more foreign key dependencies will be added in your
database, because your application's functionality expands. Horrible. For every modification in your
database, you'll have to adapt your application's code in N places. Possibly you'll have to adapt code
in other applications as well (e.g. interfaces to other systems).
ALTER TABLE dbo.T_Room -- WITH CHECK -- SQL-Server can specify WITH CHECK/WITH NOCHECK
ADD CONSTRAINT FK_T_Room_T_Client FOREIGN KEY(RM_CLI_ID)
REFERENCES dbo.T_Client (CLI_ID)
ON DELETE CASCADE;
With this constraint in place, the rooms are automatically deleted when the client is deleted.
Problem solved - with no application code changes.
One word of caution: In Microsoft SQL Server, this won't work if you have a table that references
itself, for example when you try to define a delete cascade on a recursive tree structure.
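As an illustration, here is a sketch with a hypothetical tree table dbo.T_Node (all names are made
up for this example). On SQL Server the ALTER TABLE statement is expected to be rejected with a
"may cause cycles or multiple cascade paths" error. Other systems may accept it.

```sql
-- Hypothetical self-referencing tree table.
CREATE TABLE dbo.T_Node
(
    NODE_ID        int NOT NULL PRIMARY KEY,
    NODE_Parent_ID int NULL,              -- NULL marks a root node
    NODE_Name      nvarchar(100) NOT NULL
);

-- Self-referencing cascade: expected to fail on SQL Server.
ALTER TABLE dbo.T_Node
ADD CONSTRAINT FK_T_Node_T_Node FOREIGN KEY (NODE_Parent_ID)
REFERENCES dbo.T_Node (NODE_ID)
ON DELETE CASCADE;
```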
A word of caution:
This means you can't simply delete and re-insert the client table anymore, because if you do this, it
will delete all entries in "T_Room"... (no non-delta updates anymore)
Chapter 34: GRANT and REVOKE
Section 34.1: Grant/revoke privileges
Grant User1 and User2 permission to perform SELECT and UPDATE operations on table Employees.
Revoke from User1 and User2 the permission to perform SELECT and UPDATE operations on table
Employees.
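The two operations described above can be sketched as follows. Exact privilege handling differs
between database systems, so treat this as the common form and not a vendor-specific reference:

```sql
-- Allow User1 and User2 to read and update the Employees table.
GRANT SELECT, UPDATE ON Employees TO User1, User2;

-- Take the same privileges away again.
REVOKE SELECT, UPDATE ON Employees FROM User1, User2;
```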
Results
First a 1
First b 2
First c 3
Second a 3
Second b 4
Second c 5
Third a 10
Third b 20
Third c 30
Chapter 36: Primary Keys
Section 36.1: Creating a Primary Key
This will create the Employees table with 'Id' as its primary key. The primary key can be used to
uniquely identify the rows of a table. Only one primary key is allowed per table.
A key can also be composed of two or more fields, a so-called composite key, with the following
syntax:
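A minimal sketch of both forms. The Employees table with an Id column follows the description
above. The other columns and the EmployeeProjects table are hypothetical.

```sql
-- Single-column primary key.
CREATE TABLE Employees (
    Id   int NOT NULL,
    Name varchar(100),            -- hypothetical column
    PRIMARY KEY (Id)
);

-- Composite primary key: the pair (EmployeeId, ProjectId) must be unique.
CREATE TABLE EmployeeProjects (   -- hypothetical table
    EmployeeId int NOT NULL,
    ProjectId  int NOT NULL,
    PRIMARY KEY (EmployeeId, ProjectId)
);
```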
MySQL
SQL Server
SQLite
Several types of indexes exist, and can be created on a table. When an index exists on the columns
used in a query's WHERE clause, JOIN clause, or ORDER BY clause, it can substantially improve
query performance.
The database system would not do additional sorting, since it can do an index-lookup in that order.
Consider a constantly growing number of orders with order_state_id equal to finished (2), and a
stable number of orders with order_state_id equal to started (1).
If your business makes use of queries that only look at the started orders,
partial indexing allows you to limit the index to only the unfinished orders.
This index will be smaller than an unfiltered index, which saves space and reduces the cost of
updating the index.
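The partial index described above could look like this in PostgreSQL syntax (SQLite accepts the same
form, and SQL Server calls the feature a filtered index). The orders table and order_state_id come
from the text. The id and product_id columns and the index name are assumptions.

```sql
-- A query that only cares about started orders.
SELECT id, product_id
FROM orders
WHERE order_state_id = 1;

-- Index only the rows such queries can ever touch.
CREATE INDEX ix_orders_started
    ON orders (product_id)
    WHERE order_state_id = 1;
```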
This will create an index for the column EmployeeId in the table Cars. This index will improve the
speed of queries asking the server to sort or select by values in EmployeeId, such as the following:
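A sketch of such an index and a query that can use it. The index name ix_cars_employee_id is the
one this chapter uses when dropping the index. The exact statement is an assumption.

```sql
-- Single-column index on Cars.EmployeeId.
CREATE INDEX ix_cars_employee_id ON Cars (EmployeeId);

-- A lookup by EmployeeId can now be answered through the index.
SELECT * FROM Cars WHERE EmployeeId = 1;
```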
In this case, the index would be useful for queries asking to sort or select by all included columns, if
the set of conditions is ordered in the same way. That means that when retrieving the data, it can
find the rows to retrieve using the index, instead of looking through the full table.
For example, the following query would utilize the second index:
SELECT * FROM Cars WHERE EmployeeId = 1 ORDER BY CarId DESC;
If the order differs, however, the index does not have the same advantages, as in the following:
The index is not as helpful because the database must retrieve the entire index, across all values of
EmployeeId and CarID, in order to find which items have OwnerId = 17.
(The index may still be used; it may be the case that the query optimizer finds that retrieving the
index and filtering on the OwnerId, then retrieving only the needed rows is faster than retrieving the
full table, especially if the table is large.)
We can use command DROP to delete our index. In this example we will DROP the index called
ix_cars_employee_id on the table Cars.
This deletes the index entirely, and if the index is clustered, will remove any clustering. It cannot be
rebuilt without recreating the index, which can be slow and computationally expensive. As an
alternative, the index can be disabled:
This allows the table to retain the structure, along with the metadata about the index.
Critically, this retains the index statistics, so that it is possible to easily evaluate the change. If
warranted, the index can then later be rebuilt, instead of being recreated completely;
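The three operations discussed above, sketched in SQL Server syntax (other systems use different
statements). They are alternatives, not a script to run top to bottom:

```sql
-- Option 1: remove the index entirely.
DROP INDEX ix_cars_employee_id ON Cars;

-- Option 2: keep the definition but stop maintaining and using the index.
ALTER INDEX ix_cars_employee_id ON Cars DISABLE;

-- Later, bring a disabled index back without redefining it.
ALTER INDEX ix_cars_employee_id ON Cars REBUILD;
```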
The above SQL statement creates a new clustered index on Employees. Clustered indexes are
indexes that dictate the actual structure of the table; the table itself is sorted to match the structure of
the index. That means there can be at most one clustered index on a table. If a clustered index
already exists on the table, the above statement will fail. (Tables with no clustered indexes are also
called heaps.)
This will create a unique index for the column Email in the table Customers. This index, along with
speeding up queries like a normal index, will also force every email address in that column to be
unique. If a row is inserted or updated with a non-unique Email value, the insertion or update will, by
default, fail.
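A unique index of this kind could be created as follows. The index name uq_customers_email is a
hypothetical choice.

```sql
-- Enforces uniqueness of Email and speeds up lookups by Email.
CREATE UNIQUE INDEX uq_customers_email ON Customers (Email);
```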
This creates an index on Customers which also creates a table constraint that the EmployeeID
must be unique. (This will fail if the column is not currently unique - in this case, if there are
employees who share an ID.)
This creates an index that is sorted in descending order. By default, indexes (in MSSQL server, at
least) are ascending, but that can be changed.
By default, rebuilding an index is an offline operation which locks the table and prevents DML
against it, but many RDBMS allow online rebuilding. Also, some DB vendors offer alternatives to index
rebuilding such as REORGANIZE (SQL Server) or COALESCE/SHRINK SPACE (Oracle).
This will fail if a unique index is set on the Email column of Customers. However, alternate
behavior can be defined for this case:
WITH cte AS (
SELECT ProjectID,
ROW_NUMBER() OVER (PARTITION BY ProjectID ORDER BY InsertDate DESC) AS rn
FROM ProjectNotes
)
DELETE FROM cte WHERE rn > 1;
SELECT
ROW_NUMBER() OVER(ORDER BY Fname ASC) AS RowNumber,
Fname,
LName
FROM Employees;
SELECT
ROW_NUMBER() OVER(PARTITION BY DepartmentId ORDER BY DepartmentId ASC) AS RowNumber,
DepartmentId, Fname, LName
FROM Employees;
Chapter 39: SQL Group By vs Distinct
Section 39.1: Difference between GROUP BY and DISTINCT
GROUP BY is used in combination with aggregation functions. Consider the following table:
orderId userId storeName orderValue orderDate
1 43 Store A 25 20-03-2016
2 57 Store B 50 22-03-2016
3 43 Store A 30 25-03-2016
4 82 Store C 10 26-03-2016
5 21 Store A 45 29-03-2016
SELECT
storeName,
COUNT(*) AS total_nr_orders,
COUNT(DISTINCT userId) AS nr_unique_customers,
AVG(orderValue) AS average_order_value,
MIN(orderDate) AS first_order,
MAX(orderDate) AS lastOrder
FROM
orders
GROUP BY
storeName;
DISTINCT, by contrast, is used to list the unique combinations of values for the specified columns.
SELECT DISTINCT
storeName,
userId
FROM
orders;
storeName userId
Store A 43
Store B 57
Store C 82
Store A 21
Chapter 40: Finding Duplicates on a Column Subset with
Detail
Section 40.1: Students with same name and date of birth
This example uses a Common Table Expression and a Window Function to show all duplicate rows
(on a subset of columns) side by side.
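A minimal sketch of the technique, assuming a hypothetical Students table with StudentId,
FirstName, LastName and DateOfBirth columns:

```sql
WITH Counted AS (
    SELECT StudentId, FirstName, LastName, DateOfBirth,
           -- number of rows sharing the same name and date of birth
           COUNT(*) OVER (PARTITION BY FirstName, LastName, DateOfBirth) AS RowCnt
    FROM Students
)
SELECT *
FROM Counted
WHERE RowCnt > 1                  -- keep only rows that have at least one duplicate
ORDER BY LastName, FirstName, DateOfBirth;
```

Unlike a GROUP BY ... HAVING COUNT(*) > 1 query, this keeps every duplicate row with all of its
columns, so the duplicates appear next to each other in the output.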
Chapter 41: String Functions
String functions perform operations on string values and return either numeric or string values.
Using string functions, you can, for example, combine data, extract a substring, compare strings, or
convert a string to all uppercase or lowercase characters.
Some databases support using CONCAT to join more than two strings (Oracle does not):
Some databases (e.g., Oracle) perform implicit lossless conversions. For example, a CONCAT on a
CLOB and NCLOB yields a NCLOB. A CONCAT on a number and a varchar2 results in a varchar2,
etc.:
SELECT CONCAT(CONCAT('Foo', 42), 'Bar') FROM dual; --returns Foo42Bar
Some databases can use the non-standard + operator (but in most, + works only for numbers):
On SQL Server < 2012, where CONCAT is not supported, + is the only way to join strings.
It should be noted though, that DATALENGTH returns the length of the underlying byte representation
of the string, which depends, i.a., on the charset used to store the string.
DECLARE @str varchar(100) = 'Hello ' --varchar is usually an ASCII string, occupying 1 byte per char
SELECT DATALENGTH(@str) -- returns 6
DECLARE @nstr nvarchar(100) = 'Hello ' --nvarchar is a unicode string, occupying 2 bytes per char
SELECT DATALENGTH(@nstr) -- returns 12
Oracle
Examples:
SELECT value FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ');
Result:
value
Lorem
ipsum
dolor
sit
amet.
REPLACE( String to search , String to search for and replace , String to place into the original string )
Example:
SELECT REPLACE( 'Peter Steve Tom', 'Steve', 'Billy' ) --Return Values: Peter Billy Tom
This is often used in conjunction with the LEN() function to get the last n characters of a string of
unknown length.
Syntax:
Example:
Oracle SQL doesn't have LEFT and RIGHT functions. They can be emulated with SUBSTR and
LENGTH.
SUBSTR ( string-expression, 1, integer )
SUBSTR ( string-expression, length(string-expression)-integer+1, integer)
The following example replaces occurrences of South with Southern in Employees table:
FirstName Address
Select Statement :
SELECT
FirstName,
REPLACE (Address, 'South', 'Southern') Address
FROM Employees
ORDER BY FirstName;
Result:
FirstName Address
Update Statement :
We can use the REPLACE function to make permanent changes in our table through the following approach.
Update Employees
Set Address = REPLACE (Address, 'South', 'Southern');
A more common approach is to use this in conjunction with a WHERE clause like this:
Update Employees
Set Address = REPLACE (Address, 'South', 'Southern')
Where Address LIKE 'South%';
The PARSENAME function returns a specific part of a given string (an object name). The object name
may contain parts such as the object name, owner name, database name and server name.
Syntax
PARSENAME('NameOfStringToParse',PartIndex)
Example
PARSENAME returns NULL if the specified part is not present in the given object name string.
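A short sketch for SQL Server. The dotted name is a made-up placeholder. Parts are numbered from
right to left, so 1 is the object name and 4 is the server name.

```sql
SELECT PARSENAME('ServerName.DatabaseName.SchemaName.ObjectName', 1) AS ObjectPart,   -- ObjectName
       PARSENAME('ServerName.DatabaseName.SchemaName.ObjectName', 2) AS SchemaPart,   -- SchemaName
       PARSENAME('ServerName.DatabaseName.SchemaName.ObjectName', 3) AS DatabasePart, -- DatabaseName
       PARSENAME('ServerName.DatabaseName.SchemaName.ObjectName', 4) AS ServerPart;   -- ServerName

-- Only two parts are present, so asking for part 4 yields NULL.
SELECT PARSENAME('SchemaName.ObjectName', 4) AS MissingPart;
```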
select customer,
sum(case when payment_type = 'credit' then amount else 0 end) as credit,
sum(case when payment_type = 'debit' then amount else 0 end) as debit
from payments
group by customer;
Result:
Peter 400 0
select customer,
sum(case when payment_type = 'credit' then 1 else 0 end) as credit_transaction_count,
sum(case when payment_type = 'debit' then 1 else 0 end) as debit_transaction_count
from payments
group by customer;
Result:
Customer credit_transaction_count debit_transaction_count
Peter 2 0
John 1 1
List Concatenation aggregates a column or expression by combining the values into a single string
for each group. A string to delimit each value (either blank or a comma when omitted) and the order
of the values in the result can be specified. While it is not part of the SQL standard, every major
relational database vendor supports it in their own way.
MySQL
SELECT ColumnA
, GROUP_CONCAT(ColumnB ORDER BY ColumnB SEPARATOR ',') AS ColumnBs
FROM TableName
GROUP BY ColumnA
ORDER BY ColumnA;
Oracle
SELECT ColumnA
, LISTAGG(ColumnB, ',') WITHIN GROUP (ORDER BY ColumnB) AS ColumnBs
FROM TableName
GROUP BY ColumnA
ORDER BY ColumnA;
PostgreSQL
SELECT ColumnA
, STRING_AGG(ColumnB, ',' ORDER BY ColumnB) AS ColumnBs
FROM TableName
GROUP BY ColumnA
ORDER BY ColumnA;
SQL Server
SQL Server 2016 and earlier
WITH CTE_TableName AS (
SELECT ColumnA, ColumnB
FROM TableName)
SELECT t0.ColumnA
, STUFF((
SELECT ',' + t1.ColumnB
FROM CTE_TableName t1
WHERE t1.ColumnA = t0.ColumnA
ORDER BY t1.ColumnB
FOR XML PATH('')), 1, 1, '') AS ColumnBs
FROM CTE_TableName t0
GROUP BY t0.ColumnA
ORDER BY ColumnA;
SQL Server 2017 and later
SELECT ColumnA
, STRING_AGG(ColumnB, ',') WITHIN GROUP (ORDER BY ColumnB) AS ColumnBs
FROM TableName
GROUP BY ColumnA
ORDER BY ColumnA;
SQLite
without ordering:
SELECT ColumnA
, GROUP_CONCAT(ColumnB, ',') AS ColumnBs
FROM TableName
GROUP BY ColumnA
ORDER BY ColumnA;
TotalSalary
2500
DepartmentId TotalSalary
1 2000
2 500
EXAMPLE TABLE
To select the average population of New York City, USA from a table containing city names,
population measurements, and measurement years for the last ten years:
QUERY
Notice how measurement year is absent from the query since population is being averaged over
time.
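A query of the kind described could be sketched as follows. The table name city_population and
the population column are assumptions. city_name and avg_population match the result header.

```sql
-- The measurement year is deliberately not selected or grouped on:
-- all yearly measurements for the city are averaged together.
SELECT city_name,
       AVG(population) AS avg_population
FROM city_population
WHERE city_name = 'NEW YORK CITY'
GROUP BY city_name;
```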
RESULTS
city_name avg_population
Note: The AVG() function will convert values to numeric types. This is especially important
to keep in mind when working with dates.
Section 42.5: Count
You can count the number of rows:
TotalRows
DepartmentId NumEmployees
1 3
2 1
You can also count over a column or expression, in which case NULL values are not counted:
mgr
For example:
Counting a column with and without DISTINCT will return different values. The SingleCount will only
count individual continents once, while the AllCount will include duplicates.
ContinentCode
OC
EU
AS
NA
NA
AF
AF
AllCount: 7 SingleCount: 5
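The two counts above could be produced by a query such as this one, assuming the ContinentCode
values live in a table called Countries (the table name is an assumption):

```sql
SELECT COUNT(ContinentCode)          AS AllCount,    -- every non-NULL value, duplicates included
       COUNT(DISTINCT ContinentCode) AS SingleCount  -- each continent counted once
FROM Countries;
```

For the seven values listed (OC, EU, AS, NA, NA, AF, AF) this gives 7 and 5.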
Above example will return smallest value for column age of employee table.
Syntax:
SELECT MIN(column_name) FROM table_name;
Above example will return largest value for column age of employee table.
Syntax:
SELECT MAX(column_name) FROM table_name;
You use scalar functions wherever an expression is allowed within a T-SQL statement.
time hh:mm:ss[.nnnnnnn]
date YYYY-MM-DD
The DATENAME function returns the name or value of a specific part of the date.
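A small T-SQL sketch. The date literal is an arbitrary example, and the returned name depends on
the session language.

```sql
-- Name of the weekday for 14 January 2017 (a Saturday).
SELECT DATENAME(weekday, '2017-01-14') AS Datename;
```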
Datename
Saturday
You use the GETDATE function to determine the current date and time of the computer running the
current SQL instance. This function doesn't include the time zone difference.
SELECT GETDATE() as Systemdate;
Systemdate
2017-01-14 11:11:47.7230728
In the syntax, datepart is the parameter that specifies which part of the date you want to use to
calculate difference. The datepart can be year, month, week, day, hour, minute, second, or
millisecond. You then specify the start date in the startdate parameter and the end date in the
enddate parameter for which you want to find the difference.
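A sketch of DATEDIFF, assuming the AdventureWorks sample database used elsewhere in this chapter
and its Sales.SalesOrderHeader table with OrderDate and ShipDate columns:

```sql
-- Days between ordering and shipping for each order.
SELECT SalesOrderID,
       DATEDIFF(day, OrderDate, ShipDate) AS [Processing time]
FROM Sales.SalesOrderHeader;
```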
SalesOrderID Processing time
43659 7
43660 7
43661 7
43662 7
The DATEADD function enables you to add an interval to part of a specific date.
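A minimal T-SQL sketch. The start date is chosen for illustration.

```sql
-- Add 20 days to 14 January 2017.
SELECT DATEADD(day, 20, '2017-01-14') AS Added20MoreDays;
```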
Added20MoreDays
2017-02-03 00:00:00.000
The lower(char) function converts the given character parameter to be lower-cased characters.
would return the customer's last name changed from "SMITH" to "smith".
Server
SQL064
In SQL, most data conversions occur implicitly, without any user intervention.
To perform any conversions that can't be completed implicitly, you can use the CAST or CONVERT
functions.
The CAST function syntax is simpler than the CONVERT function syntax, but is limited in what it can
do.
Here, we use both the CAST and CONVERT functions to convert the datetime data type to the
varchar data type.
The CAST function always uses the default style setting. For example, it will represent dates and
times using the format YYYY-MM-DD.
The CONVERT function uses the date and time style you specify. In this case, 3 specifies the date
format dd/mm/yy.
USE AdventureWorks2012
GO
SELECT FirstName + ' ' + LastName + ' was hired on ' +
CAST(HireDate AS varchar(20)) AS 'Cast',
FirstName + ' ' + LastName + ' was hired on ' +
CONVERT(varchar, HireDate, 3) AS 'Convert'
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
ON p.BusinessEntityID = e.BusinessEntityID
GO
Cast Convert
David Hamiltion was hired on 2003-02-04 David Hamiltion was hired on 04/02/03
Another example of a conversion function is the PARSE function. This function converts a string to a
specified data type.
In the syntax for the function, you specify the string that must be converted, the AS keyword, and
then the required data type. Optionally, you can also specify the culture in which the string value
should be formatted. If you don't specify this, the language for the session is used.
If the string value can't be converted to a numeric, date, or time format, it will result in an error. You'll
then need to use CAST or CONVERT for the conversion.
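A short sketch of PARSE in T-SQL (available from SQL Server 2012). The input string and the
culture are illustrative.

```sql
-- Parse an English-formatted date string into datetime2.
SELECT PARSE('Monday, 13 August 2012' AS datetime2 USING 'en-US') AS [Date in English];
```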
Date in English
2012-08-13 00:00:00.0000000
The CHOOSE function returns an item from a list of values, based on its position in the list. This
position is specified by the index.
In the syntax, the index parameter specifies the item and is a whole number, or integer. The val_1
… val_n parameter identifies the list of values.
Result
Sales
In this example, you use the CHOOSE function to return the second entry in a list of departments.
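Such a call could look like this. The department list is made up for illustration, with Sales
placed second so that index 2 selects it.

```sql
-- Index 2 picks the second value in the list.
SELECT CHOOSE(2, 'Human Resources', 'Sales', 'Admin', 'Marketing') AS Result;
```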
The IIF function returns one of two values, based on a particular condition. If the condition is true, it
returns the true value; otherwise it returns the false value.
In the syntax, the boolean_expression parameter specifies the Boolean expression. The
true_value parameter specifies the value that should be returned if the boolean_expression
evaluates to true and the false_value parameter specifies the value that should be returned if the
boolean_expression evaluates to false.
In this example, you use the IIF function to return one of two values. If a sales person's year-to-date
sales are above 200,000, this person will be eligible for a bonus. Values below 200,000 mean that
employees don't qualify for bonuses.
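A sketch of the bonus check, assuming the AdventureWorks Sales.SalesPerson table with a SalesYTD
column. The output labels are illustrative.

```sql
SELECT BusinessEntityID,
       SalesYTD,
       -- above 200,000 year-to-date qualifies for a bonus
       IIF(SalesYTD > 200000, 'Bonus', 'No Bonus') AS [Bonus?]
FROM Sales.SalesPerson;
```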
SQL includes several mathematical functions that you can use to perform calculations on input
values and return numeric results. One example is the SIGN function, which returns a value indicating
the sign of an expression. The value of -1 indicates a negative expression, the value of +1 indicates
a positive expression, and 0 indicates zero.
Sign
-1
In the example, the input is a negative number, so the Results pane lists the result -1.
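A one-line sketch with an arbitrary negative input:

```sql
SELECT SIGN(-20) AS Sign;   -- negative input, so the result is -1
```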
Another mathematical function is the POWER function. This function provides the value of an
expression raised to a specified power.
In the syntax, the float_expression parameter specifies the expression, and the y parameter
specifies the power to which you want to raise the expression.
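For instance, raising 50 to the third power (the operands are chosen for illustration):

```sql
SELECT POWER(50, 3) AS Result;   -- 50 * 50 * 50 = 125000
```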
Result
125000
You use a scalar expression to specify the values that should be compared. The offset parameter is
the number of rows before the current row that will be used in the comparison. If you don't specify
the number of rows, the default value of one row is used.
The default parameter specifies the value that should be returned when the expression at offset has
a NULL value. If you don't specify a value, a value of NULL is returned.
The LEAD function provides data on rows after the current row in the row set. For example, in a
SELECT statement, you can compare values in the current row with values in the following row.
You specify the values that should be compared using a scalar expression. The offset parameter is
the number of rows after the current row that will be used in the comparison.
You specify the value that should be returned when the expression at offset has a NULL value using
the default parameter. If you don't specify these parameters, the default of one row is used and a
value of NULL is returned.
This example uses the LEAD and LAG functions to compare the sales values for each employee to
date with those of the employees listed above and below, with records ordered based on the
BusinessEntityID column.
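A sketch of such a comparison, assuming the AdventureWorks Sales.SalesPerson table. Here an offset
of 1 and a default of 0 are passed explicitly.

```sql
SELECT BusinessEntityID,
       SalesYTD,
       LEAD(SalesYTD, 1, 0) OVER (ORDER BY BusinessEntityID) AS [Lead value],  -- next row
       LAG(SalesYTD, 1, 0)  OVER (ORDER BY BusinessEntityID) AS [Lag value]    -- previous row
FROM Sales.SalesPerson;
```

The first row has no predecessor and the last row has no successor, so those positions show the
default of 0 instead of NULL.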
The values are ordered as specified by the WITHIN GROUP clause and grouped by rowset or partition
as specified by the OVER clause.
To find the exact value from the row that matches or exceeds the 0.5 percentile, you pass the
percentile as the numeric literal in the PERCENTILE_DISC function. The Percentile Discrete column
in a result set lists the value of the first row at which the cumulative distribution is greater than or
equal to the specified percentile.
267 Application Specialist 57 1 56
The PERCENTILE_CONT function is similar to the PERCENTILE_DISC function, but it interpolates: when
the requested percentile falls between two entries, it returns a value between the first matching
entry and the next entry (for the 0.5 percentile, their average), so the result need not occur in the set.
To base the calculation on a continuous distribution of the values, you use the PERCENTILE_CONT
function. The "Percentile Continuous" column in the results lists the interpolated value between the
result value and the next highest matching value.
267 Application Specialist 57 1 56 56
In this example, the FIRST_VALUE function is used to return the ID of the state or province with the
lowest tax rate. The OVER clause is used to order the tax rates to obtain the lowest rate.
This example uses the LAST_VALUE function to return the last value for each rowset in the ordered
values.
The first value in the result set always has a percent rank of zero. The value for the highest-ranked –
or last – value in the set is always one.
The CUME_DIST function calculates the relative position of a specified value in a group of values, by
determining the percentage of values less than or equal to that value. This is called the cumulative
distribution.
In this example, you use a PARTITION BY clause to partition (or group) the rows retrieved by the
SELECT statement based on employees' job titles, and an ORDER BY clause to sort the results in each
group by the number of sick leave hours that employees have used.
272 Application Specialist 55 1 1
262 Assistant to the Chief Financial Officer 48 0 1
239 Benefits Specialist 45 0 1
The PERCENT_RANK function ranks the entries within each group. For each entry, it returns the
percentage of entries in the same group that have lower values.
The CUME_DIST function is similar, except that it returns the percentage of values less than or equal
to the current value.
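Both functions can be sketched side by side, assuming the AdventureWorks HumanResources.Employee
table with JobTitle and SickLeaveHours columns. The descending sort order is an assumption made
for this sketch.

```sql
SELECT BusinessEntityID,
       JobTitle,
       SickLeaveHours,
       PERCENT_RANK() OVER (PARTITION BY JobTitle
                            ORDER BY SickLeaveHours DESC) AS [Percent Rank],
       CUME_DIST()    OVER (PARTITION BY JobTitle
                            ORDER BY SickLeaveHours DESC) AS [Cumulative Distribution]
FROM HumanResources.Employee;
```

A job title held by a single employee forms a one-row partition, for which PERCENT_RANK returns 0
and CUME_DIST returns 1.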
Chapter 45: Window Functions
Section 45.1: Setting up a flag if other rows have a common
property
Let's say I have this data:
Table items
id name tag
1 example unique_tag
2 foo simple
42 bar simple
3 baz hello
51 quux world
I'd like to get all those lines and know whether a tag is used by other lines:
SELECT id, name, tag, COUNT(*) OVER (PARTITION BY tag) > 1 AS flag FROM items
In case your database doesn't have OVER and PARTITION you can use this to produce the same
result:
SELECT id, name, tag, (SELECT COUNT(tag) FROM items B WHERE tag = A.tag) > 1 AS flag FROM
items A;
Items identified by ID values must move from STATUS 'ONE' to 'TWO' to 'THREE' in sequence,
without skipping statuses. The problem is to find the users (STATUS_BY values) who violate the rule
and move an item from 'ONE' immediately to 'THREE'.
The LAG() analytical function helps to solve the problem by returning for each row the value in the
preceding row:
SELECT * FROM (
SELECT
t.*,
LAG(status) OVER (PARTITION BY id ORDER BY status_time) AS prev_status
FROM test t
) t1 WHERE status = 'THREE' AND prev_status != 'TWO';
In case your database doesn't have LAG() you can use this to produce the same result:
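One possible LAG-free formulation is a self join that picks, for each row, the latest earlier row
of the same item. This is a sketch, and it assumes status_time values are unique per id.

```sql
SELECT A.id,
       A.status,
       B.status      AS prev_status,
       A.status_time,
       B.status_time AS prev_status_time
FROM test A
JOIN test B
  ON B.id = A.id
 AND B.status_time = (SELECT MAX(status_time)      -- latest row before A
                      FROM test
                      WHERE id = A.id
                        AND status_time < A.status_time)
WHERE A.status = 'THREE'
  AND B.status <> 'TWO';
```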
date amount
2016-03-12 200
2016-03-11 -50
2016-03-14 100
2016-03-15 100
2016-03-10 -250
2016-03-14 100 0
id name Ttl_Rows
1 example 5
2 foo 5
3 bar 5
4 baz 5
5 quux 5
Instead of using two queries to get a count then the line, you can use an aggregate as a window
function and use the full result set as the window. This can be used as a base for further calculation
without the complexity of extra self joins.
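A sketch of this idea, assuming a hypothetical table named items with id and name columns:

```sql
-- An empty OVER () makes the whole result set the window,
-- so every row carries the total row count.
SELECT id,
       name,
       COUNT(*) OVER () AS Ttl_Rows
FROM items;
```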
1 2016-07-20
1 2016-07-21
2 2016-07-20
2 2016-07-21
2 2016-07-22
with CTE as
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY User_ID
ORDER BY Completion_Date DESC) Row_Num
FROM Data)
SELECT * FROM CTE WHERE Row_Num <= n;
Using n=1, you'll get the one most recent row per user_id:
1 2016-07-21 1
2 2016-07-22 1
The following example generates a common table expression called Numbers with a column i which
has a row for numbers 1-5:
--Give a table name `Numbers` and a column `i` to hold the numbers
WITH Numbers(i) AS (
--Starting number/index
SELECT 1
--Top-level UNION ALL operator required for recursion
UNION ALL
--Iteration expression:
SELECT i + 1
--Table expression we first declared used as source for recursion
FROM Numbers
--Clause to define the end of the recursion
WHERE i < 5
)
--Use the generated table expression like a regular table
SELECT i FROM Numbers;
This method can be used with any number interval, as well as other types of data.
UNION ALL
-- get employees that have any of the previously selected rows as manager
SELECT ManagedByJames.Level + 1,
Employees.ID,
Employees.FName,
Employees.LName
FROM Employees
JOIN ManagedByJames
ON Employees.ManagerID = ManagedByJames.ID
ORDER BY 1 DESC -- depth-first search
)
SELECT * FROM ManagedByJames;
1 1 James Smith
2 2 John Johnson
3 4 Johnathon Smith
2 3 Michael Williams
WITH ReadyCars AS (
SELECT *
FROM Cars
WHERE Status = 'READY'
)
SELECT ID, Model, TotalCost
FROM ReadyCars
ORDER BY TotalCost;
ID Model TotalCost
1 Ford F-150 200
2 Ford F-150 230
UNION ALL
-- Transition Sequence = Rest & Relax into Day Shift into Night Shift
-- RR (Rest & Relax) = 1
-- DS (Day Shift) = 2
-- NS (Night Shift) = 3
;WITH roster AS
(
SELECT @DateFrom AS RosterStart, 1 AS TeamA, 2 AS TeamB, 3 AS TeamC
UNION ALL
SELECT DATEADD(d, @IntervalDays, RosterStart),
CASE TeamA WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamA,
CASE TeamB WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamB,
CASE TeamC WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 1 END AS TeamC
FROM roster WHERE RosterStart < DATEADD(d, -@IntervalDays, @DateTo)
)
SELECT RosterStart,
ISNULL(LEAD(RosterStart) OVER (ORDER BY RosterStart), RosterStart + @IntervalDays) AS RosterEnd,
CASE TeamA WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamA,
CASE TeamB WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamB,
CASE TeamC WHEN 1 THEN 'RR' WHEN 2 THEN 'DS' WHEN 3 THEN 'NS' END AS TeamC
FROM roster;
Result
I.e. For Week 1 TeamA is on R&R, TeamB is on Day Shift and TeamC is on Night Shift.
RosterStart RosterEnd
2016-06-01 06:00:00.000 2016-06-08 06:00:00.000
2016-06-08 06:00:00.000 2016-06-15 06:00:00.000
2016-06-15 06:00:00.000 2016-06-22 06:00:00.000
2016-06-22 06:00:00.000 2016-06-29 06:00:00.000
2016-06-29 06:00:00.000 2016-07-06 06:00:00.000
WITH tbl AS (
SELECT id, name, parent_id
FROM mytable)
, tbl_hierarchy AS (
/* Anchor */
SELECT 1 AS "LEVEL"
--, 1 AS CONNECT_BY_ISROOT
--, 0 AS CONNECT_BY_ISBRANCH
, CASE WHEN t.id IN (SELECT parent_id FROM tbl) THEN 0 ELSE 1 END AS CONNECT_BY_ISLEAF
, 0 AS CONNECT_BY_ISCYCLE
, '/' + CAST(t.id AS VARCHAR(MAX)) + '/' AS SYS_CONNECT_BY_PATH_id
, '/' + CAST(t.name AS VARCHAR(MAX)) + '/' AS SYS_CONNECT_BY_PATH_name
, t.id AS root_id
, t.*
FROM tbl t
WHERE t.parent_id IS NULL -- START WITH parent_id IS NULL
UNION ALL
/* Recursive */
SELECT th."LEVEL" + 1 AS "LEVEL"
--, 0 AS CONNECT_BY_ISROOT
--, CASE WHEN t.id IN (SELECT parent_id FROM tbl) THEN 1 ELSE 0 END AS CONNECT_BY_ISBRANCH
, CASE WHEN t.id IN (SELECT parent_id FROM tbl) THEN 0 ELSE 1 END AS CONNECT_BY_ISLEAF
, CASE WHEN th.SYS_CONNECT_BY_PATH_id LIKE '%/' + CAST(t.id AS VARCHAR(MAX)) + '/%' THEN 1 ELSE 0 END AS CONNECT_BY_ISCYCLE
, th.SYS_CONNECT_BY_PATH_id + CAST(t.id AS VARCHAR(MAX)) + '/' AS SYS_CONNECT_BY_PATH_id
, th.SYS_CONNECT_BY_PATH_name + CAST(t.name AS VARCHAR(MAX)) + '/' AS SYS_CONNECT_BY_PATH_name
, th.root_id
, t.*
FROM tbl t
JOIN tbl_hierarchy th ON (th.id = t.parent_id) -- CONNECT BY PRIOR id = parent_id
WHERE th.CONNECT_BY_ISCYCLE = 0) -- NOCYCLE
SELECT th.*
--, REPLICATE(' ', (th."LEVEL" - 1) * 3) + th.name AS tbl_hierarchy
FROM tbl_hierarchy th
JOIN tbl CONNECT_BY_ROOT ON (CONNECT_BY_ROOT.id = th.root_id)
ORDER BY th.SYS_CONNECT_BY_PATH_name; -- ORDER SIBLINGS BY name;
● Clauses
○ CONNECT BY: Specifies the relationship that defines the hierarchy.
○ START WITH: Specifies the root nodes.
○ ORDER SIBLINGS BY: Orders results properly.
● Parameters
○ NOCYCLE: Stops processing a branch when a loop is detected. Valid hierarchies are
Directed Acyclic Graphs, and circular references violate this construct.
● Operators
○ PRIOR: Obtains data from the node's parent.
○ CONNECT_BY_ROOT: Obtains data from the node's root.
● Pseudocolumns
○ LEVEL: Indicates the node's distance from its root.
○ CONNECT_BY_ISLEAF: Indicates a node without children.
○ CONNECT_BY_ISCYCLE: Indicates a node with a circular reference.
● Functions
○ SYS_CONNECT_BY_PATH: Returns a flattened/concatenated representation of the
path to the node from its root.
Chapter 47: Views
Section 47.1: Simple views
A view can filter some rows from the base table or project only some columns from it:
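A minimal sketch of such a view. The Employees table appears elsewhere in this guide, but the view
name, the HireDate column and the cutoff date are assumptions made for illustration.

```sql
-- Projects four columns and keeps only recently hired employees.
CREATE VIEW new_employees_details AS
SELECT Id, FName, Salary, HireDate
FROM Employees
WHERE HireDate > '2015-01-01';

-- The view is then queried like a table.
SELECT * FROM new_employees_details;
```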
SELECT *
FROM dept_income;
DepartmentName TotalSalary
HR 1900
Sales 600
Chapter 48: Materialized Views
A materialized view is a view whose results are physically stored and must be periodically refreshed
in order to remain current. They are therefore useful for storing the results of complex, long-running
queries when realtime results are not required. Materialized views can be created in Oracle and
PostgreSQL. Other database systems offer similar functionality, such as SQL Server's indexed
views or DB2's materialized query tables.
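The refresh behaviour can be sketched in PostgreSQL syntax as follows. The table and view names
are hypothetical.

```sql
CREATE TABLE mytable (number INT);
INSERT INTO mytable VALUES (1);

CREATE MATERIALIZED VIEW myview AS
SELECT * FROM mytable;

SELECT * FROM myview;               -- one row: 1

INSERT INTO mytable VALUES (2);
SELECT * FROM myview;               -- still one row: the stored result is stale

REFRESH MATERIALIZED VIEW myview;
SELECT * FROM myview;               -- two rows: 1 and 2
```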
number
--------
1
(1 row)
number
--------
1
(1 row)
number
--------
1
2
(2 rows)
SELECT *
FROM Employees -- this is a comment
WHERE FName = 'John';
An example of where a foreign key is required: in a university, a course must belong to a
department. Code for this scenario is:
The following table will contain the information of the subjects offered by the Computer science
branch:
(The data type of the Foreign Key must match the datatype of the referenced key.)
The Foreign Key constraint on the column Dept_Code allows values only if they already exist in the
referenced table, Department. This means that if you try to insert the following values:
the database will raise a Foreign Key violation error, because CS300 does not exist in the
Department table. But when you try a key value that exists:
● A Foreign Key must reference a UNIQUE (or PRIMARY) key in the parent table.
● Entering a NULL value in a Foreign Key column does not raise an error.
● Foreign Key constraints can reference tables within the same database.
● Foreign Key constraints can refer to another column in the same table (self-reference).
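The scenario above can be sketched as follows. The Department table, the Dept_Code column and
the CS300 value come from the text. Everything else (Programming_Courses, its columns, the sample
rows) is assumed for illustration.

```sql
CREATE TABLE Department (
    Dept_Code CHAR(5) PRIMARY KEY,
    Dept_Name VARCHAR(20) UNIQUE
);

INSERT INTO Department VALUES ('CS205', 'Computer Science');

CREATE TABLE Programming_Courses (
    Dept_Code CHAR(5),
    Prg_Code  CHAR(9) PRIMARY KEY,
    Prg_Name  VARCHAR(50) UNIQUE,
    FOREIGN KEY (Dept_Code) REFERENCES Department (Dept_Code)
);

-- Rejected: CS300 does not exist in Department.
INSERT INTO Programming_Courses VALUES ('CS300', 'FDB-DB001', 'Database Systems');

-- Accepted: CS205 exists in Department.
INSERT INTO Programming_Courses VALUES ('CS205', 'FDB-DB001', 'Database Systems');
```

Note that some systems need foreign key enforcement switched on explicitly (SQLite, for example,
via PRAGMA foreign_keys = ON).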
We will add a new table in order to store the powers of each super hero:
UPDATE Orders
SET Order_UID = orders_seq.NEXTVAL
WHERE Customer = 581;
SELECT *
FROM Employees
WHERE Salary = (SELECT MAX(Salary) FROM Employees);
SELECT EmployeeId
FROM Employee AS eOuter
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Employee eInner
    WHERE eInner.DepartmentId = eOuter.DepartmentId
);
The subquery SELECT AVG(Salary) ... is correlated because it refers to the Employee row eOuter
from its outer query.
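To see the correlation at work, here is a small runnable sketch using SQLite through Python's sqlite3 module. The Employee rows are invented for illustration; the query is the one shown above with an ORDER BY added so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY,
                           DepartmentId INTEGER, Salary INTEGER);
    -- Made-up data: department 10 averages 1500, department 20 averages 700.
    INSERT INTO Employee VALUES
        (1, 10, 1000), (2, 10, 2000),
        (3, 20, 500), (4, 20, 700), (5, 20, 900);
""")

# The inner query is evaluated against the department of each outer row.
above_avg = [row[0] for row in conn.execute("""
    SELECT EmployeeId
    FROM Employee AS eOuter
    WHERE Salary > (
        SELECT AVG(Salary)
        FROM Employee AS eInner
        WHERE eInner.DepartmentId = eOuter.DepartmentId
    )
    ORDER BY EmployeeId
""")]
print(above_avg)
```

Each employee is compared with the average of their own department, so an employee earning 900 can qualify in one department while an employee earning 1000 does not in another.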
SELECT *
FROM Employees
WHERE EmployeeID NOT IN (SELECT EmployeeID
                         FROM Supervisors);
SELECT *
FROM Employees AS e
LEFT JOIN Supervisors AS s ON s.EmployeeID = e.EmployeeID
WHERE s.EmployeeID IS NULL;
The above finds cities from the weather table whose daily temperature variation is greater than 20.
The result is:
city        temp_var
ST LOUIS    21
LOS ANGELES 31
LOS ANGELES 23
LOS ANGELES 31
LOS ANGELES 27
LOS ANGELES 28
LOS ANGELES 28
LOS ANGELES 32
Here, the subquery (SELECT avg(pop2000) FROM cities) is used to specify a condition in the
WHERE clause. The result is:
name pop2000
ST LOUIS 348189
BEGIN
UPDATE Employees SET PhoneNumber = '5551234567' WHERE Id = 1;
UPDATE Employees SET Salary = 650 WHERE Id = 3;
END;
-- Or
EXEC Northwind.getEmployee @LastName = N'Ackerman', @FirstName = N'Pilar';
GO
-- Or
EXECUTE Northwind.getEmployee @FirstName = N'Pilar', @LastName = N'Ackerman';
GO
BEGIN TRANSACTION;
    -- User is a reserved keyword in T-SQL, so the column name is bracketed
    INSERT INTO DeletedEmployees (EmployeeID, DateDeleted, [User])
        (SELECT 123, GETDATE(), CURRENT_USER);
    DELETE FROM Employees WHERE EmployeeID = 123;
COMMIT TRANSACTION;
BEGIN TRY
    BEGIN TRANSACTION
        INSERT INTO Users(ID, Name, Age)
        VALUES(1, 'Bob', 24)
        DELETE FROM Users WHERE Name = 'Todd'
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION
END CATCH;
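The same commit-or-rollback pattern can be sketched outside T-SQL. The example below is an illustration only, using SQLite through Python's sqlite3 module: the run_as_one_unit() helper and the sample rows are hypothetical, and a Python try/except takes the place of BEGIN TRY / BEGIN CATCH.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions
# implicitly, so BEGIN / COMMIT / ROLLBACK below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
conn.execute("INSERT INTO Users VALUES (1, 'Todd', 30)")

def run_as_one_unit(statements):
    """Run all statements in one transaction; undo everything if one fails."""
    try:
        conn.execute("BEGIN")
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False

committed = run_as_one_unit([
    "INSERT INTO Users (ID, Name, Age) VALUES (2, 'Bob', 24)",  # succeeds
    "INSERT INTO Users (ID, Name, Age) VALUES (1, 'Eve', 31)",  # duplicate ID
])
names = [name for (name,) in
         conn.execute("SELECT Name FROM Users ORDER BY ID")]
print(committed, names)
```

The first insert succeeds on its own, but because the second statement fails, the rollback removes it again and the table is left as it was before the transaction started.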
At best, a badly designed table structure will slow the execution of queries; at worst, it could
make it impossible for the database to function as intended.
A database table should not be considered as just another table; it has to follow a set of rules to be
considered truly relational. Academically it is referred to as a 'relation' to make the distinction.
1. Each value is atomic; the value in each field in each row must be a single value.
2. Each field contains values that are of the same data type.
3. Each field heading has a unique name.
4. Each row in the table must have at least one value that makes it unique amongst the other
records in the table.
5. The order of the rows and columns has no significance.
Id Name DOB        Manager
1  Fred 11/02/1971 3
2  Fred 11/02/1971 3
3  Sue  08/07/1975 2
● Rule 1: Each value is atomic. Id, Name, DOB and Manager only contain a single value.
● Rule 2: Id contains only integers, Name contains text (we could add that it's text of four
characters or fewer), DOB contains dates of a valid type and Manager contains integers (we
could add that it corresponds to a Primary Key field in a managers table).
● Rule 3: Id, Name, DOB and Manager are unique heading names within the table.
● Rule 4: The inclusion of the Id field ensures that each record is distinct from any other
record within the table.
1 Fred 11/02/1971 3
1 Fred 11/02/1971 3
● Rule 1: The second name field contains two values - 2 and 1.
● Rule 2: The DOB field contains dates and text.
● Rule 3: There are two fields called 'name'.
● Rule 4: The first and second records are exactly the same.
● Rule 5: This rule isn't broken.
Such a query allows users to rapidly find database tables containing columns of interest, such as
when attempting to relate data from two tables indirectly through a third table, without existing
knowledge of which tables may contain keys or other useful columns in common with the target
tables.
Using T-SQL for this example, a database's information schema may be searched as follows:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Institution%';
The result contains a list of matching columns, their tables' names, and other useful information.
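Not every engine exposes INFORMATION_SCHEMA. As a hedged illustration of the same idea on SQLite, which instead offers the sqlite_master catalog and the pragma_table_info() table-valued function, the sketch below searches all tables for columns whose names contain 'Institution'. The three tables are hypothetical and exist only to give the search something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables to search through.
    CREATE TABLE Students (StudentID INTEGER, InstitutionID INTEGER);
    CREATE TABLE Institutions (InstitutionID INTEGER, InstitutionName TEXT);
    CREATE TABLE Courses (CourseID INTEGER, Title TEXT);
""")

# For every table in the catalog, list its columns and keep the matches.
matches = conn.execute("""
    SELECT m.name AS table_name, c.name AS column_name
    FROM sqlite_master AS m
    JOIN pragma_table_info(m.name) AS c
    WHERE m.type = 'table'
      AND c.name LIKE '%Institution%'
    ORDER BY m.name, c.cid
""").fetchall()

for table_name, column_name in matches:
    print(table_name, column_name)
```

Students and Institutions both turn up because they share an InstitutionID column, which is exactly the kind of common key the search is meant to reveal.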
VT stands for 'Virtual Table' and shows how various data is produced as the query is processed:
1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM
clause, and as a result, virtual table VT1 is generated.
2. ON: The ON filter is applied to VT1. Only rows for which the join condition is TRUE are inserted into VT2.
3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER
JOIN), rows from the preserved table or tables for which a match was not found are added to
the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the
FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and
the next table in the FROM clause until all tables are processed.
4. WHERE: The WHERE filter is applied to VT3. Only rows for which the WHERE condition is TRUE are
inserted into VT4.
5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified
in the GROUP BY clause. VT5 is generated.
6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5,
generating VT6.
7. HAVING: The HAVING filter is applied to VT6. Only groups for which the HAVING condition is TRUE
are inserted into VT7.
8. SELECT: The SELECT list is processed, generating VT8.
9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the
ORDER BY clause. A cursor is generated (VC10).
11. TOP: The specified number or percentage of rows is selected from the beginning of VC10.
Table VT11 is generated and returned to the caller. LIMIT has the same functionality as TOP
in some SQL dialects such as Postgres and Netezza.
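The order of the steps above matters in practice: WHERE removes rows before groups are formed, while HAVING removes whole groups afterwards. The sketch below illustrates this on SQLite through Python's sqlite3 module, with a hypothetical Sales table and made-up amounts; the numbers in the SQL comments refer to the steps in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Region TEXT, Amount INTEGER);
    INSERT INTO Sales VALUES
        ('North', 5), ('North', 50), ('North', 60),
        ('South', 5), ('South', 70),
        ('West', 80), ('West', 90), ('West', 5);
""")

result = conn.execute("""
    SELECT Region, COUNT(*) AS BigSales   -- step 8: SELECT
    FROM Sales                            -- step 1: FROM
    WHERE Amount >= 10                    -- step 4: rows with Amount 5 are dropped
    GROUP BY Region                       -- step 5: one group per region
    HAVING COUNT(*) >= 2                  -- step 7: South has one row left, dropped
    ORDER BY Region DESC                  -- step 10: West sorts before North
    LIMIT 1                               -- step 11: LIMIT plays the role of TOP
""").fetchall()
print(result)
```

Moving the Amount test from WHERE to HAVING would not be valid here, because Amount is no longer an individual value once the rows have been grouped.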
Chapter 61: Clean Code in SQL
How to write good, readable SQL queries, with examples of good practices.
Two common ways of formatting table/column names are CamelCase and snake_case:
Adding prefixes or suffixes like tbl or col reduces readability, so avoid them. However, they are
sometimes used to avoid conflicts with SQL keywords, and often used with triggers and indexes
(whose names are usually not mentioned in queries).
Keywords
SQL keywords are not case sensitive. However, it is common practice to write them in upper case.
At the minimum, put every clause into a new line, and split lines if they would become too long
otherwise:
SELECT d.Name,
       COUNT(*) AS Employees
FROM Departments AS d
JOIN Employees AS e ON d.ID = e.DepartmentID
WHERE d.Name != 'HR'
GROUP BY d.Name
HAVING COUNT(*) > 10
ORDER BY COUNT(*) DESC;
Sometimes, everything after the SQL keyword introducing a clause is indented to the same column:
SELECT   d.Name,
         COUNT(*) AS Employees
FROM     Departments AS d
JOIN     Employees AS e ON d.ID = e.DepartmentID
WHERE    d.Name != 'HR'
GROUP BY d.Name
HAVING   COUNT(*) > 10;
(This can also be done while aligning the SQL keywords right.)
SELECT
    d.Name,
    COUNT(*) AS Employees
FROM
    Departments AS d
JOIN
    Employees AS e
    ON d.ID = e.DepartmentID
WHERE
    d.Name != 'HR'
GROUP BY
    d.Name
HAVING
    COUNT(*) > 10
ORDER BY
    COUNT(*) DESC;
SELECT Model,
       EmployeeID
FROM Cars
WHERE CustomerID = 42
  AND Status = 'READY';
Using multiple lines makes it harder to embed SQL commands into other programming languages.
However, many languages have a mechanism for multi-line strings, e.g., @"..." in C#,
"""...""" in Python, or R"(...)" in C++.
When using SELECT *, the data returned by a query can change whenever the table definition
changes. This increases the risk that different versions of your application or your database are
incompatible with each other.
Furthermore, reading more columns than necessary can increase the amount of disk and network
I/O.
So you should always explicitly specify the column(s) you actually want to retrieve:
--SELECT *                            don't
SELECT ID, FName, LName, PhoneNumber  -- do
FROM Employees;
However, SELECT * does not hurt in the subquery of an EXISTS operator, because EXISTS ignores
the actual data anyway (it checks only if at least one row has been found). For the same reason, it is
not meaningful to list any specific column(s) for EXISTS, so SELECT * actually makes more sense:
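As an illustration, the sketch below lists departments that have no employees by using NOT EXISTS with SELECT * in the subquery. It runs on SQLite through Python's sqlite3 module; the table layouts and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FName TEXT,
                            DepartmentID INTEGER);
    INSERT INTO Departments VALUES (1, 'HR'), (2, 'Sales'), (3, 'Tech');
    INSERT INTO Employees VALUES (1, 'James', 1), (2, 'John', 2);
""")

# EXISTS only checks whether the subquery returns a row; the column list
# of the subquery is never read, so SELECT * is harmless here.
empty_departments = [name for (name,) in conn.execute("""
    SELECT d.Name
    FROM Departments AS d
    WHERE NOT EXISTS (SELECT *
                      FROM Employees AS e
                      WHERE e.DepartmentID = d.ID)
    ORDER BY d.Name
""")]
print(empty_departments)
```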
● The join condition is somewhere in the WHERE clause, mixed up with any other filter
conditions. This makes it harder to see which tables are joined, and how.
● Due to the above, there is a higher risk of mistakes, and it is more likely that they are found
later.
● In standard SQL, explicit joins are the only way to use outer joins:
SELECT d.Name,
       e.Fname || e.LName AS EmpName
FROM Departments AS d
LEFT JOIN Employees AS e ON d.ID = e.DepartmentID;
SELECT RecipeID,
       Recipes.Name,
       COUNT(*) AS NumberOfIngredients
FROM Recipes
LEFT JOIN Ingredients USING (RecipeID)
GROUP BY RecipeID, Recipes.Name;
(This requires that both tables use the same column name.
USING automatically removes the duplicate column from the result, e.g., the join in this query returns
a single RecipeID column.)
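The effect of USING on the result columns can be checked with a short sketch. It assumes hypothetical Recipes and Ingredients tables that share a RecipeID column and runs on SQLite through Python's sqlite3 module, comparing the column list of a USING join with that of the equivalent ON join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Recipes (RecipeID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ingredients (RecipeID INTEGER, Ingredient TEXT);
    INSERT INTO Recipes VALUES (1, 'Pancakes');
    INSERT INTO Ingredients VALUES (1, 'Flour'), (1, 'Milk');
""")

def column_names(sql):
    """Return the names of the result columns of a query."""
    return [col[0] for col in conn.execute(sql).description]

using_columns = column_names(
    "SELECT * FROM Recipes LEFT JOIN Ingredients USING (RecipeID)")
on_columns = column_names(
    "SELECT * FROM Recipes AS r LEFT JOIN Ingredients AS i"
    " ON r.RecipeID = i.RecipeID")
print(using_columns)
print(on_columns)
```

With USING, the shared column appears once; with ON, each table contributes its own copy, so the ON version returns one more column.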
https://somepage.com/ajax/login.ashx?username=admin&password=123
strUserName = getHttpsRequestParameterString("username");
strPassword = getHttpsRequestParameterString("password");
and query your database to determine whether a user with that password exists.
txtSQL = "SELECT * FROM Users WHERE username = '" + strUserName + "' AND password = '"+
strPassword +"'";
This will work if the username and password do not contain a quote.
However, if one of the parameters does contain a quote, the SQL that gets sent to the database will
look like this:
-- strUserName = "d'Alambert";
txtSQL = "SELECT * FROM Users WHERE username = 'd'Alambert' AND password = '123'";
This will result in a syntax error, because the quote after the d in d'Alambert ends the SQL string.
You could correct this by escaping quotes in username and password, e.g.:
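In standard SQL, a single quote inside a string literal is escaped by doubling it. A minimal sketch of that correction follows, written in Python rather than the pseudo-code used above; escape_sql_string() is a hypothetical helper, and SQLite is used only to confirm that the escaped literal is read back as the original name.

```python
import sqlite3

def escape_sql_string(value: str) -> str:
    """Double every single quote so it stays inside the SQL string literal."""
    return value.replace("'", "''")

user_name = "d'Alambert"
password = "123"

txt_sql = ("SELECT * FROM Users WHERE username = '"
           + escape_sql_string(user_name)
           + "' AND password = '" + escape_sql_string(password) + "'")
print(txt_sql)

# The database parses the doubled quote back into a single one.
round_trip = sqlite3.connect(":memory:").execute(
    "SELECT '" + escape_sql_string(user_name) + "'").fetchone()[0]
print(round_trip)
```

Manual escaping works only if it is applied to every value without exception, which is why it is fragile in practice.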
If you do not use parameters and forget to escape the quotes in even one of the values, a
malicious user (a hacker) can use this to execute SQL commands on your database.
"SELECT * FROM Users WHERE username = 'somebody' AND password = 'lol'; DROP DATABASE
master; --'";
Unfortunately for you, this is valid SQL, and the DB will execute this!
There are many other things a malicious user could do, such as stealing every user's email address,
everyone's password, credit card numbers, or any other data in your database.
SQL = "SELECT * FROM Users WHERE username = '" + user + "' AND password ='" + pw + "'";
db.execute(SQL);
Then a hacker could retrieve your data by giving a password like pw' or '1'='1; the resulting SQL
statement will be:
SELECT * FROM Users WHERE username = 'somebody' AND password ='pw' or '1'='1';
This one will pass the password check for all rows in the Users table because '1'='1' is always
true.
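The usual defence is to pass user input as query parameters instead of concatenating it into the SQL text. The sketch below contrasts the two approaches on SQLite through Python's sqlite3 module. The Users row, the function names and the plain-text password column are assumptions made to mirror the example above, not a recommendation for storing passwords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (username TEXT, password TEXT);
    INSERT INTO Users VALUES ('somebody', 's3cret');
""")

def login_concatenated(user, pw):
    # Vulnerable: the input becomes part of the SQL text itself.
    sql = ("SELECT * FROM Users WHERE username = '" + user
           + "' AND password ='" + pw + "'")
    return conn.execute(sql).fetchall()

def login_parameterized(user, pw):
    # Safer: the ? placeholders send the values separately from the SQL text,
    # so quotes in the input are treated as data, not as syntax.
    return conn.execute(
        "SELECT * FROM Users WHERE username = ? AND password = ?",
        (user, pw)).fetchall()

attack = "pw' or '1'='1"
leaked = login_concatenated("somebody", attack)    # returns the user's row
blocked = login_parameterized("somebody", attack)  # returns no rows
print(leaked, blocked)
```

With parameters, the attack string is simply compared with the stored password as a whole and matches nothing.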