SQL Cheatsheet
SQL is a standard language that was designed to query and manage data
in relational database management systems (RDBMSs). An RDBMS is a
database management system based on the relational model (a semantic
model for representing data).
In any SQL dialect, SQL statements are grouped into several types:
• Data Manipulation Language (DML) is the set of SQL statements that focuses on
querying and modifying data. DML statements include SELECT, the primary focus
of this training, and modification statements such as INSERT, UPDATE, and DELETE.
• Data Definition Language (DDL) is the set of SQL statements that handles the
definition and life cycle of database objects, such as tables, views, and procedures.
DDL includes statements such as CREATE, ALTER, and DROP.
• Data Control Language (DCL) is the set of SQL statements used to manage
security permissions for users and objects. DCL includes statements such as
GRANT, REVOKE, and DENY.
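A short sketch shows the DDL and DML categories side by side. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. The Employees table is hypothetical. DCL is omitted because SQLite has no GRANT, REVOKE, or DENY statements.

```python
import sqlite3

# In-memory SQLite database standing in for an RDBMS such as SQL Server.
conn = sqlite3.connect(":memory:")

# DDL: create a database object.
conn.execute("CREATE TABLE Employees (empid INT NOT NULL PRIMARY KEY, lastname TEXT NOT NULL)")

# DML: modify data...
conn.execute("INSERT INTO Employees (empid, lastname) VALUES (1, 'Davis'), (2, 'Funk')")
conn.execute("UPDATE Employees SET lastname = 'Lew' WHERE empid = 2")
conn.execute("DELETE FROM Employees WHERE empid = 1")

# ...and query it.
rows = conn.execute("SELECT empid, lastname FROM Employees").fetchall()
print(rows)  # [(2, 'Lew')]

# DDL again: drop the object at the end of its life cycle.
conn.execute("DROP TABLE Employees")
remaining_tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
```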
Clauses
• The SELECT clause determines which columns and expressions appear in the
query results.
• The FROM clause identifies which table is the source of the rows for the
query.
• The WHERE clause filters rows, keeping only those that satisfy the
specified condition.
• The GROUP BY clause takes the rows that met the filter condition and
groups them.
• The HAVING clause filters the groups based on its own predicate.
• The ORDER BY clause sorts the rows of the result.
1. The FROM clause is evaluated first, to provide the source rows for the rest of the
statement. A virtual table is created and passed to the next step.
2. The WHERE clause is next to be evaluated, filtering those rows from the source
table that match a predicate. The filtered virtual table is passed to the next step.
3. GROUP BY is next, organizing the rows in the virtual table according to unique
values found in the GROUP BY list. A new virtual table is created, containing the
list of groups, and is passed to the next step. From this point in the flow of
operations, only columns in the GROUP BY list or aggregate functions may be
referenced by other elements.
4. The HAVING clause is evaluated next, filtering out entire groups based on its
predicate. The virtual table created in step 3 is filtered and passed to the next step.
5. The SELECT clause executes next, determining which columns will appear in the
query results. Because the SELECT clause is evaluated after the FROM, WHERE,
GROUP BY, and HAVING clauses, column aliases created there cannot be used in the
WHERE, GROUP BY, or HAVING clause.
6. The ORDER BY clause is the last to execute, sorting the rows as determined by its
column list. Because it runs after SELECT, it can refer to column aliases.
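One query can trace all six steps. This sketch runs against SQLite through Python's sqlite3 module, an assumed stand-in for SQL Server, with a hypothetical Orders table. The comments number each clause by its logical evaluation order, not its written order. SQLite is more permissive than SQL Server about where SELECT aliases may appear, so the alias numorders is used only in ORDER BY, where both products accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, custid INT, empid INT, orderyear INT)")
conn.executemany(
    "INSERT INTO Orders (custid, empid, orderyear) VALUES (?, ?, ?)",
    [(71, 1, 2007), (71, 1, 2007), (71, 1, 2008), (71, 2, 2007),
     (71, 2, 2008), (71, 2, 2008), (85, 1, 2007)],
)

# Numbers in the SQL comments give the logical evaluation order.
rows = conn.execute("""
    SELECT empid, orderyear, COUNT(*) AS numorders  -- 5. choose output columns
    FROM Orders                                     -- 1. source rows
    WHERE custid = 71                               -- 2. filter rows
    GROUP BY empid, orderyear                       -- 3. form groups
    HAVING COUNT(*) > 1                             -- 4. filter groups
    ORDER BY numorders DESC, empid                  -- 6. sort; the alias is visible here
""").fetchall()
print(rows)  # [(1, 2007, 2), (2, 2008, 2)]
```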
When multiple operators appear in the same expression, SQL Server
evaluates them based on operator precedence rules. The following list
describes the precedence among operators, from highest to lowest:
1. ( ) (Parentheses)
2. * (Multiplication), / (Division), % (Modulo)
3. + (Positive), - (Negative), + (Addition), + (Concatenation), -
(Subtraction)
4. =, >, <, >=, <=, <>, !=, !>, !< (Comparison operators)
5. NOT
6. AND
7. BETWEEN, IN, LIKE, OR
8. = (Assignment)
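A short sketch demonstrates two of these rules: multiplication before addition, and AND before OR. It again assumes SQLite through Python's sqlite3 module. SQLite's full precedence table differs from SQL Server's in places, but it agrees on these two rules. The Orders table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# * binds tighter than +; parentheses override the default order.
no_parens = conn.execute("SELECT 10 + 2 * 3").fetchone()[0]
with_parens = conn.execute("SELECT (10 + 2) * 3").fetchone()[0]
print(no_parens, with_parens)  # 16 36

conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT, empid INT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 2, 3), (3, 2, 7), (4, 3, 3)])

# AND is evaluated first: custid = 1 OR (custid = 2 AND empid = 3)
implicit = conn.execute(
    "SELECT orderid FROM Orders WHERE custid = 1 OR custid = 2 AND empid = 3 ORDER BY orderid"
).fetchall()
# Parentheses force OR first: (custid = 1 OR custid = 2) AND empid = 3
explicit = conn.execute(
    "SELECT orderid FROM Orders WHERE (custid = 1 OR custid = 2) AND empid = 3 ORDER BY orderid"
).fetchall()
print(implicit, explicit)  # [(1,), (2,)] [(2,)]
```

Adding parentheses even where precedence already produces the intended result makes the logic explicit to readers.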
Tips
• Capitalize T-SQL keywords, like SELECT, FROM, AS, and so on.
Capitalizing keywords is a commonly used convention that makes it
easier to find each clause of a complex statement.
• Start a new line for each major clause of a statement.
• If the SELECT list contains more than a few columns, expressions, or
aliases, consider listing each column on its own line.
• Indent lines containing subclauses or columns to make it clear which
code belongs to each major clause.
Constraints
One of the greatest benefits of the relational model is the ability to define
data integrity as part of the model. Data integrity is achieved through rules
called constraints that are defined in the data model and enforced by the
RDBMS.
Primary Key Constraints
A primary key constraint enforces uniqueness of rows and also disallows
NULL marks in the constraint attributes. Each unique set of values in the
constraint attributes can appear only once in the table—in other words,
only in one row. An attempt to define a primary key constraint on a column
that allows NULL marks will be rejected by the RDBMS. Each table can have
only one primary key.
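A sketch can check a primary key's two guarantees: uniqueness and no NULL marks. It assumes SQLite through Python's sqlite3 module. One caveat applies: for historical reasons, SQLite does not make most primary key columns NOT NULL automatically, so the sketch declares NOT NULL explicitly to match SQL Server's behavior. The Employees table, the PK_Employees name, and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        empid    INT  NOT NULL,  -- explicit NOT NULL: SQLite does not imply it here
        lastname TEXT NOT NULL,
        CONSTRAINT PK_Employees PRIMARY KEY (empid)
    )
""")

def try_insert(empid, lastname):
    """Insert a row; return None on success or the error text if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employees (empid, lastname) VALUES (?, ?)", (empid, lastname))
        return None
    except sqlite3.IntegrityError as e:
        return str(e)

first = try_insert(1, "Davis")      # accepted
duplicate = try_insert(1, "Funk")   # same key value twice: rejected
null_key = try_insert(None, "Lew")  # NULL in the key column: rejected
print(duplicate)  # reports the uniqueness violation
print(null_key)   # reports the NOT NULL violation
```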
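Unique Constraints
A unique constraint enforces the uniqueness of rows, which lets you implement
alternate keys. Unlike a primary key, a table can have several unique
constraints, and the constrained columns are allowed to contain NULL marks.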
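A unique constraint can serve as an alternate key alongside the primary key. This sketch assumes SQLite through Python's sqlite3 module and a hypothetical ssn column. It also shows a point where products differ. SQLite accepts several NULL marks in a unique column. SQL Server's unique constraint treats two NULL marks as duplicates and accepts only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        empid INT NOT NULL,
        ssn   TEXT,
        CONSTRAINT PK_Employees PRIMARY KEY (empid),
        CONSTRAINT UNQ_Employees_ssn UNIQUE (ssn)  -- alternate key
    )
""")
conn.execute("INSERT INTO Employees VALUES (1, '111-11-1111')")

try:
    conn.execute("INSERT INTO Employees VALUES (2, '111-11-1111')")  # duplicate ssn
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# SQLite accepts both NULL rows; SQL Server would reject the second one.
conn.execute("INSERT INTO Employees VALUES (3, NULL)")
conn.execute("INSERT INTO Employees VALUES (4, NULL)")
null_rows = conn.execute("SELECT COUNT(*) FROM Employees WHERE ssn IS NULL").fetchone()[0]
print(duplicate_rejected, null_rows)  # True 2
```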
Check Constraints
A check constraint allows you to define a predicate that a row must meet
to be entered into the table or to be modified. For example, a check
constraint can ensure that the salary column in the Employees table accepts
only positive values.
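The salary rule can be written as a CHECK constraint. This sketch assumes SQLite through Python's sqlite3 module. The constraint name CHK_Employees_salary and the try_insert helper are hypothetical. The NULL insert shows that a CHECK constraint rejects only FALSE. A NULL salary makes the predicate UNKNOWN, so the row is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        empid  INT NOT NULL PRIMARY KEY,
        salary NUMERIC,
        CONSTRAINT CHK_Employees_salary CHECK (salary > 0.00)
    )
""")

def try_insert(empid, salary):
    """Insert a row; return None on success or the error text if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employees (empid, salary) VALUES (?, ?)", (empid, salary))
        return None
    except sqlite3.IntegrityError as e:
        return str(e)

positive = try_insert(1, 50000)  # predicate is TRUE: accepted
negative = try_insert(2, -10)    # predicate is FALSE: rejected
unknown = try_insert(3, None)    # NULL > 0.00 is UNKNOWN: accepted
```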
Default Constraints
A default constraint is associated with a particular attribute. It is an
expression that is used as the default value when an explicit value is not
specified for the attribute when you insert a row.
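Here is a default constraint in action, sketched with SQLite through Python's sqlite3 module as an assumed stand-in for SQL Server. The Orders table and its status column with the default value 'new' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        orderid INT  NOT NULL PRIMARY KEY,
        status  TEXT NOT NULL DEFAULT 'new'  -- used when no value is supplied
    )
""")
conn.execute("INSERT INTO Orders (orderid) VALUES (1)")                     # status omitted
conn.execute("INSERT INTO Orders (orderid, status) VALUES (2, 'shipped')")  # explicit value wins
rows = conn.execute("SELECT orderid, status FROM Orders ORDER BY orderid").fetchall()
print(rows)  # [(1, 'new'), (2, 'shipped')]
```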
INDEX Constraint
Indexes are created to speed up data retrieval from the database. An index
can be created on a single column or on a group of columns in a table. A
table can have only one PRIMARY KEY but can have multiple indexes, and an
index can be unique or non-unique, depending on requirements.
The SQL Indexes
SQL Indexes are special lookup tables that are used to speed up the
process of data retrieval. They hold pointers that refer to the data stored in
a database, which makes it easier to locate the required data records in a
database table.
Types of Indexes:
• Unique Index
• Single-Column Index
• Composite Index
• Implicit Index
Unique indexes are used not only for performance, but also for data
integrity. A unique index does not allow any duplicate values to be inserted
into the table. It is created automatically by PRIMARY KEY and UNIQUE
constraints when they are applied to a database table, to prevent users from
inserting duplicate values into the indexed column(s).
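The four index types can be created and inspected in SQLite through Python's sqlite3 module. This is an assumption: SQL Server uses similar CREATE INDEX syntax but exposes index metadata through catalog views such as sys.indexes. PRAGMA index_list is SQLite-specific, and the Customers table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        custid  INT  NOT NULL PRIMARY KEY,  -- implicit index created for the primary key
        email   TEXT NOT NULL,
        country TEXT,
        city    TEXT
    )
""")
conn.execute("CREATE INDEX idx_Customers_country ON Customers (country)")             # single-column
conn.execute("CREATE INDEX idx_Customers_country_city ON Customers (country, city)")  # composite
conn.execute("CREATE UNIQUE INDEX idx_Customers_email ON Customers (email)")          # unique

# Each PRAGMA index_list row is (seq, name, unique, origin, partial);
# origin is 'c' for CREATE INDEX and 'pk' for the implicit primary key index.
indexes = {row[1]: (row[2], row[3]) for row in conn.execute("PRAGMA index_list(Customers)")}

conn.execute("INSERT INTO Customers VALUES (1, 'ana@example.com', 'USA', 'Seattle')")
try:
    conn.execute("INSERT INTO Customers VALUES (2, 'ana@example.com', 'UK', 'London')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the unique index blocked the duplicate email
```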
Filters
The TOP option is a proprietary T-SQL feature that allows you to limit the number or
percentage of rows that your query returns.
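SQLite has no TOP option. This sketch therefore uses SQLite through Python's sqlite3 module and expresses the same row limit with SQLite's LIMIT clause, with the equivalent T-SQL shown in a comment. The Orders data is hypothetical. There is no direct LIMIT counterpart to TOP (n) PERCENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INT NOT NULL PRIMARY KEY, orderdate TEXT NOT NULL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [
    (1, "2008-01-05"), (2, "2008-03-17"), (3, "2008-02-11"), (4, "2008-04-30"), (5, "2008-04-02"),
])

# T-SQL:  SELECT TOP (3) orderid, orderdate FROM Orders ORDER BY orderdate DESC;
# SQLite equivalent using LIMIT:
latest = conn.execute(
    "SELECT orderid, orderdate FROM Orders ORDER BY orderdate DESC LIMIT 3"
).fetchall()
print(latest)  # [(4, '2008-04-30'), (5, '2008-04-02'), (2, '2008-03-17')]
```

The ORDER BY clause determines which rows qualify. Without it, both TOP and LIMIT return an arbitrary set of rows.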
Predicates
Normalization
Normalization is a formal mathematical process to guarantee that each
entity will be represented by a single relation.
1NF The first normal form says that the tuples (rows) in the relation
(table) must be unique, and attributes should be atomic.
2NF The second normal form involves two rules. One rule is that the
data must meet the first normal form. The other rule addresses the
relationship between non-key and candidate key attributes. For every
candidate key, every non-key attribute has to be fully functionally
dependent on the entire candidate key.
3NF The third normal form also has two rules. The data must meet
the second normal form. Also, all non-key attributes must be
dependent on candidate keys non-transitively. Informally, this rule
means that all non-key attributes must be mutually independent.
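A small decomposition illustrates the 3NF rule. In this sketch (SQLite through Python's sqlite3 module, with hypothetical table names and data), custname depends on custid, which in turn depends on orderid: a transitive dependency. Splitting the data into Orders and Customers removes the dependency, and joining the two tables back reproduces the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Not in 3NF: orderid -> custid -> custname, so names repeat for every order.
conn.execute("CREATE TABLE OrdersFlat (orderid INT PRIMARY KEY, custid INT, custname TEXT)")
conn.executemany("INSERT INTO OrdersFlat VALUES (?, ?, ?)",
                 [(1, 71, "Cust A"), (2, 71, "Cust A"), (3, 85, "Cust B")])

# 3NF decomposition: each non-key attribute depends only on its own table's key.
conn.execute("CREATE TABLE Customers (custid INT PRIMARY KEY, custname TEXT)")
conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT REFERENCES Customers (custid))")
conn.execute("INSERT INTO Customers SELECT DISTINCT custid, custname FROM OrdersFlat")
conn.execute("INSERT INTO Orders SELECT orderid, custid FROM OrdersFlat")

# Joining the two relations reproduces the original rows without loss.
rejoined = conn.execute("""
    SELECT o.orderid, o.custid, c.custname
    FROM Orders AS o
    JOIN Customers AS c ON c.custid = o.custid
    ORDER BY o.orderid
""").fetchall()
original = conn.execute("SELECT orderid, custid, custname FROM OrdersFlat ORDER BY orderid").fetchall()
```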
Data Warehouse
A data warehouse (DW) is an environment designed for data retrieval and
reporting purposes. When it serves an entire organization, such an
environment is called a data warehouse; when it serves only part of the
organization (such as a specific department) or a subject matter area in
the organization, it is called a data mart.
The simplest data warehouse design is called a star schema. The star
schema includes several dimension tables and a fact table. Each
dimension table represents a subject by which you want to analyze the
data.
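Below is a minimal star schema sketch, assuming SQLite through Python's sqlite3 module and hypothetical table names. Two dimension tables (customer and product) surround one fact table that holds a key to each dimension plus numeric measures. A typical analysis query joins the fact table to its dimensions and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension tables: one per subject the data is analyzed by.
conn.execute("CREATE TABLE DimCustomer (custkey INT PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE DimProduct (prodkey INT PRIMARY KEY, category TEXT)")
# Fact table: a key to each dimension plus numeric measures.
conn.execute("""
    CREATE TABLE FactSales (
        custkey INT REFERENCES DimCustomer (custkey),
        prodkey INT REFERENCES DimProduct (prodkey),
        qty     INT,
        amount  NUMERIC
    )
""")
conn.executemany("INSERT INTO DimCustomer VALUES (?, ?)", [(1, "USA"), (2, "UK")])
conn.executemany("INSERT INTO DimProduct VALUES (?, ?)", [(10, "Beverages"), (20, "Produce")])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?)",
                 [(1, 10, 5, 50), (1, 20, 2, 30), (2, 10, 1, 10), (2, 10, 3, 30)])

# Join the fact table to its dimensions, then aggregate the measure.
totals = conn.execute("""
    SELECT c.country, p.category, SUM(f.amount) AS total
    FROM FactSales AS f
    JOIN DimCustomer AS c ON c.custkey = f.custkey
    JOIN DimProduct AS p ON p.prodkey = f.prodkey
    GROUP BY c.country, p.category
    ORDER BY c.country, p.category
""").fetchall()
print(totals)  # [('UK', 'Beverages', 40), ('USA', 'Beverages', 50), ('USA', 'Produce', 30)]
```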
Databases
You can think of a database as a container of objects such as
tables, views, stored procedures, and other objects.
CASE Expressions
A CASE expression is a scalar expression that returns a value based on
conditional logic.
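This sketch shows both forms of CASE, using SQLite through Python's sqlite3 module. The Products table, category names, and price bands are hypothetical. The simple form compares one expression with a list of values. The searched form evaluates a separate predicate in each WHEN clause. If no WHEN clause matches and there is no ELSE, the expression returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (productid INT PRIMARY KEY, categoryid INT, unitprice NUMERIC)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [(1, 1, 18), (2, 2, 60), (3, 3, 35)])

rows = conn.execute("""
    SELECT productid,
           -- simple form: compares categoryid with a list of values
           CASE categoryid
               WHEN 1 THEN 'Beverages'
               WHEN 2 THEN 'Condiments'
               ELSE 'Unknown category'
           END AS categoryname,
           -- searched form: evaluates a predicate in each WHEN clause
           CASE
               WHEN unitprice < 20 THEN 'Low'
               WHEN unitprice < 50 THEN 'Medium'
               ELSE 'High'
           END AS pricerange
    FROM Products
    ORDER BY productid
""").fetchall()
print(rows)
# [(1, 'Beverages', 'Low'), (2, 'Condiments', 'High'), (3, 'Unknown category', 'Medium')]
```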
NULL Marks
A NULL value means no value or unknown. It does not mean zero or blank,
or even an empty string. Those values are not unknown.
Always keep in mind that T-SQL uses three-valued predicate logic, where
logical expressions can evaluate to TRUE, FALSE, or UNKNOWN.
Note that all aggregate functions ignore NULL marks, with one
exception: COUNT(*).
Query filters (ON, WHERE, and HAVING) treat predicates as “accept TRUE,”
meaning that both FALSE and UNKNOWN are filtered out.
Conversely, CHECK constraints treat predicates as “reject FALSE,” meaning
that both TRUE and UNKNOWN are accepted.
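A small sketch shows the effects of three-valued logic. It assumes SQLite through Python's sqlite3 module and a hypothetical Customers table in which two rows have a NULL region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (custid INT PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "WA"), (2, None), (3, "SP"), (4, None)])

# region = NULL is UNKNOWN for every row, and a filter accepts only TRUE: no rows.
eq_null = conn.execute("SELECT custid FROM Customers WHERE region = NULL").fetchall()

# IS NULL is the correct test for a NULL mark.
is_null = conn.execute("SELECT custid FROM Customers WHERE region IS NULL ORDER BY custid").fetchall()

# region <> 'WA' is UNKNOWN where region is NULL, so rows 2 and 4 are filtered out too.
not_wa = conn.execute("SELECT custid FROM Customers WHERE region <> 'WA' ORDER BY custid").fetchall()

# COUNT(*) counts rows; COUNT(region), like other aggregates, ignores NULL marks.
count_all, count_region = conn.execute("SELECT COUNT(*), COUNT(region) FROM Customers").fetchone()

print(eq_null, is_null, not_wa, count_all, count_region)
# [] [(2,), (4,)] [(3,)] 4 2
```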
Working with Character Data
Any data type without the VAR element in its name (CHAR, NCHAR) has a
fixed length, which means that SQL Server reserves space in the row
based on the column’s defined size and not on the actual number of
characters in the character string.
A data type with the VAR element in its name (VARCHAR, NVARCHAR) has
a variable length, which means that SQL Server uses as much storage space
in the row as required to store the characters that appear in the character
string, plus two extra bytes for offset data.
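The storage rule reduces to simple arithmetic, sketched here as a hypothetical Python helper, storage_bytes. It counts only the character data. It rests on two assumptions, also stated in the code: CHAR and VARCHAR use one byte per character (a single-byte code page), and NCHAR and NVARCHAR use two bytes per character (no supplementary characters). Row overhead is ignored.

```python
def storage_bytes(data_type: str, defined_size: int, value: str) -> int:
    """Estimate the bytes a character value occupies in the row (hypothetical helper).

    Assumptions: 1 byte per character for CHAR/VARCHAR, 2 bytes per character
    for NCHAR/NVARCHAR, and no row overhead.
    """
    if len(value) > defined_size:
        raise ValueError("value is longer than the column's defined size")
    kind = data_type.upper()
    bytes_per_char = 2 if kind.startswith("N") else 1
    if "VAR" in kind:
        # Variable length: only the characters present, plus 2 bytes of offset data.
        return len(value) * bytes_per_char + 2
    # Fixed length: space for the full defined size is reserved regardless of the value.
    return defined_size * bytes_per_char


for data_type in ("CHAR", "VARCHAR", "NCHAR", "NVARCHAR"):
    print(f"{data_type}(30) storing 'Davis': {storage_bytes(data_type, 30, 'Davis')} bytes")
# CHAR(30) storing 'Davis': 30 bytes
# VARCHAR(30) storing 'Davis': 7 bytes
# NCHAR(30) storing 'Davis': 60 bytes
# NVARCHAR(30) storing 'Davis': 12 bytes
```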
JOINS
A JOIN table operator operates on two input tables. The three fundamental
types of joins are cross joins, inner joins, and outer joins. These three
types of joins differ in how they apply their logical query processing
phases; each type applies a different set of phases. A cross join applies only
one phase—Cartesian Product. An inner join applies two phases—Cartesian
Product and Filter. An outer join applies three phases—Cartesian Product,
Filter, and Add Outer Rows.
A cross join is the simplest type of join. It implements only one
logical query processing phase, a Cartesian product. This phase operates
on the two tables provided as inputs to the join and produces their
Cartesian product: each row from one input is matched with every row
from the other.
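A cross join returns as many rows as the product of its inputs' row counts. This sketch uses SQLite through Python's sqlite3 module with hypothetical Colors and Sizes tables. It pairs two colors with three sizes to produce six rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Colors (color TEXT)")
conn.execute("CREATE TABLE Sizes (size TEXT)")
conn.executemany("INSERT INTO Colors VALUES (?)", [("red",), ("blue",)])
conn.executemany("INSERT INTO Sizes VALUES (?)", [("S",), ("M",), ("L",)])

# Cartesian product: every Colors row paired with every Sizes row (2 x 3 = 6 rows).
rows = conn.execute("""
    SELECT c.color, s.size
    FROM Colors AS c
    CROSS JOIN Sizes AS s
    ORDER BY c.color, s.size
""").fetchall()
print(len(rows))  # 6
```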
When a join condition involves only an equality operator, the join is said to
be an equi join. When a join condition involves any operator besides
equality, the join is said to be a non-equi join.
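One sketch can compare the inner, outer, equi, and non-equi variants. It assumes SQLite through Python's sqlite3 module and hypothetical Customers and Orders tables. The inner join and the left outer join both use an equality condition, so both are equi joins. The outer join adds back customer 3, who has no orders, with NULL in the order column. The last query uses < as its condition, a non-equi join that produces each unique pair of customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (custid INT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ana"), (2, "Bo"), (3, "Cy")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Inner equi join: Cartesian product, then keep rows where the ON predicate is TRUE.
inner = conn.execute("""
    SELECT c.custid, o.orderid
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.custid = c.custid
    ORDER BY c.custid, o.orderid
""").fetchall()

# Left outer join: the inner join's rows plus customers with no match (outer rows).
outer = conn.execute("""
    SELECT c.custid, o.orderid
    FROM Customers AS c
    LEFT OUTER JOIN Orders AS o ON o.custid = c.custid
    ORDER BY c.custid, o.orderid
""").fetchall()

# Non-equi join: the < operator yields each unique pair of distinct customers once.
pairs = conn.execute("""
    SELECT c1.custid, c2.custid
    FROM Customers AS c1
    INNER JOIN Customers AS c2 ON c1.custid < c2.custid
    ORDER BY c1.custid, c2.custid
""").fetchall()

print(inner)  # [(1, 10), (1, 11), (2, 12)]
print(outer)  # [(1, 10), (1, 11), (2, 12), (3, None)]
print(pairs)  # [(1, 2), (1, 3), (2, 3)]
```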
Subqueries
SQL supports writing queries within queries, or nesting queries. The
outermost query is a query whose result set is returned to the caller and is
known as the outer query. The inner query is a query whose result is used
by the outer query and is known as a subquery.
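Here is a self-contained subquery sketch, assuming SQLite through Python's sqlite3 module and a hypothetical Orders table. The inner query computes the latest order date on its own, and the outer query returns every order placed on that date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT, orderdate TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 71, "2008-01-05"), (2, 85, "2008-05-06"), (3, 71, "2008-05-06"), (4, 85, "2008-02-11"),
])

# The subquery returns a single value that the outer query uses as its filter.
latest = conn.execute("""
    SELECT orderid, custid
    FROM Orders
    WHERE orderdate = (SELECT MAX(orderdate) FROM Orders)
    ORDER BY orderid
""").fetchall()
print(latest)  # [(2, 85), (3, 71)]
```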
Table Expressions
A table expression is a named query expression that represents a valid
relational table.
Derived tables (also known as table subqueries) are defined in the FROM
clause of an outer query. Their scope of existence is the outer query. As
soon as the outer query is finished, the derived table is gone.
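A derived table is handy for naming a computed column once and then using that name in the outer query's GROUP BY. SQL Server does not allow a SELECT alias there within the same query. This sketch uses SQLite through Python's sqlite3 module as an assumed stand-in, with hypothetical Orders data. Here strftime is SQLite's function for extracting the year; SQL Server would use YEAR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT, orderdate TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 71, "2007-08-25"), (2, 71, "2008-01-15"), (3, 85, "2008-03-04"), (4, 71, "2008-06-30"),
])

# D exists only while the outer query runs; its orderyear column is visible to GROUP BY.
rows = conn.execute("""
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM (SELECT CAST(strftime('%Y', orderdate) AS INTEGER) AS orderyear, custid
          FROM Orders) AS D
    GROUP BY orderyear
    ORDER BY orderyear
""").fetchall()
print(rows)  # [(2007, 1), (2008, 2)]
```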
Common table expressions (CTEs) are another standard form of table
expression very similar to derived tables, yet with a couple of important
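advantages. CTEs are defined by using a WITH clause, which has the
following general form: the CTE name (optionally followed by a target
column list), the AS keyword, the inner query that defines the CTE in
parentheses, and finally the outer query that refers to the CTE.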
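This CTE sketch assumes SQLite through Python's sqlite3 module, which accepts the same basic WITH ... AS (...) shape, and a hypothetical Orders table. It shows one advantage over a derived table: the outer query can refer to the CTE name more than once, here both in FROM and inside a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INT PRIMARY KEY, custid INT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 71), (2, 71), (3, 85), (4, 71)])

rows = conn.execute("""
    WITH CustOrders AS (            -- CTE name
        SELECT custid, COUNT(*) AS numorders
        FROM Orders
        GROUP BY custid
    )                               -- inner query ends; the outer query follows
    SELECT custid, numorders
    FROM CustOrders
    WHERE numorders = (SELECT MAX(numorders) FROM CustOrders)  -- second reference
""").fetchall()
print(rows)  # [(71, 3)]
```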
Inline TVFs are reusable table expressions that support input parameters.
In all respects except for the support for input parameters, inline TVFs are
similar to views.