Data Selection Queries

Objectives
• Understand the origins of the SQL language.
• Learn how to formulate SELECT statements.
• Use WHERE clauses to filter result sets.
• Use ORDER BY clauses to sort result sets.
• Use GROUP BY clauses to aggregate result sets.
• Understand the principles of joining.
• Learn how to join tables using inner and outer joins.
• Learn how a self-join joins a table to itself.

The files associated with this chapter are located in the following folders:
{Install Folder}\Selecting
{Install Folder}\SelectingDataLab

Microsoft SQL Server 2005 4-1


Copyright © by Application Developers Training Company
All rights reserved. Reproduction is strictly prohibited.

Understanding Transact-SQL
Transact-SQL is based on the Structured Query Language (SQL) core, which
is used to select and manipulate data as well as define data structures and
implement security.

In the early days of relational databases, IBM developed Structured Query
Language (SQL) as a language to retrieve and modify data for the first
generation of IBM relational databases. At the time, it was not the only
querying language in existence; each of the database vendors had its own
language. In the 1980s, the American National Standards Institute (ANSI)
created a committee to develop an industry standard querying language and the
committee ultimately selected IBM’s SQL.

The original standard, SQL86, was simply IBM’s implementation of the query
language. In 1989, the SQL89 standard was the first enhancement of SQL that
reflected the influence of a variety of vendors. The SQL language was further
expanded in 1992 to the SQL92 standard and then to SQL99. Transact-SQL is
the SQL Server-specific implementation. SQL Server 2005 conforms to the Entry
level of SQL92 and implements several enhancements from later standards,
including common table expressions (SQL99) and ranking functions (SQL:2003).

NOTE SQL92 included three levels of conformance: Entry, Intermediate,
and Full. Almost all current SQL dialects conform to the Entry
level; most conform to Intermediate; and not one comes very close
to Full conformance. No vendor has fully implemented every part
of the ANSI/ISO standard, and each vendor has added its own
proprietary extensions to the language.

By definition, a querying language is able to retrieve data from the database
(after all, that’s what “query” means). However, in addition to querying, it also
needs to have:

• The ability to add, delete, and modify data in the database.
• The ability to create database elements, including databases, tables,
indexes, and other elements.
• The ability to control access to database elements.
• The ability to protect and enforce rules about the data.
We can condense these capabilities into two main categories:

• Data Definition Language (DDL): Enables you to create and modify
objects in your database; the main commands are CREATE, ALTER, and
DROP. SQL Server Management Studio uses DDL behind the scenes
to create and modify objects. Additional commands (GRANT, REVOKE, and
DENY, sometimes classified separately as Data Control Language) support
granting, revoking, or denying permissions on objects.
• Data Manipulation Language (DML): Enables you to work with your
data. SELECT, INSERT, UPDATE, and DELETE are the core DML statements.
TRUNCATE TABLE also removes data and is grouped with DML here, although
it is often classified as DDL.

In addition, Transact-SQL lets you work with variables, temporary objects, and
cursors. It also contains its own built-in system functions, which you can use
to aggregate data, to work with numbers, strings, and dates, and to retrieve
information from the system. Transact-SQL contains many features common
to all programming languages, such as branching, looping, and error handling.
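
As a rough illustration of these procedural features, the following sketch declares a variable, loops with WHILE, branches with IF...ELSE, and traps a runtime error with TRY...CATCH (available starting with SQL Server 2005). The variable name @Counter and the column alias WillFail are made up for the example, and nothing here depends on a particular database.

```sql
-- Declare and initialize a local variable
-- (SQL Server 2005 needs a separate SET statement).
DECLARE @Counter int;
SET @Counter = 1;

-- Loop three times, branching on whether the counter is even or odd.
WHILE @Counter <= 3
BEGIN
    IF @Counter % 2 = 0
    BEGIN
        PRINT 'Even: ' + CAST(@Counter AS varchar(10));
    END
    ELSE
    BEGIN
        PRINT 'Odd: ' + CAST(@Counter AS varchar(10));
    END;
    SET @Counter = @Counter + 1;
END;

-- Error handling: the divide-by-zero error transfers control to the CATCH block.
BEGIN TRY
    SELECT 1 / 0 AS WillFail;
END TRY
BEGIN CATCH
    PRINT 'Error ' + CAST(ERROR_NUMBER() AS varchar(10)) + ': ' + ERROR_MESSAGE();
END CATCH;
```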

Schemas and Naming in SQL Server 2005
In SQL Server 2005, database object names use a convention that can contain
four parts, any of which can be blank, except the object name itself:

[ server_name.[database_name].[schema_name]. |
  database_name.[schema_name]. |
  schema_name. ] object_name

• The server_name specifies a linked server name or remote server name.
A blank implies the current server.
• The database_name specifies the database name. A blank implies the
current database context.
• The schema_name specifies the name of the schema that contains the
object. A blank implies the default schema for the current user, or the
dbo schema if no other default schema is assigned to the current user.
• The object_name specifies the name of the object.
In most situations, it is not necessary to use all four parts. However, the
recommendation is to use the schema name with the object name, as shown in
the following two examples. The first example, using Northwind, works
with or without dbo because the server falls back to dbo when no schema is
specified and no other default schema is assigned to the current user. The
second example, using AdventureWorks, fails if the schema name Sales is
omitted, because the Store table was created in the Sales schema. Without the
schema name, that query works only for a user whose default schema is Sales.


-- USE Northwind;
SELECT CompanyName FROM dbo.Customers;
-- "SELECT CompanyName FROM Customers;" will also work.

-- USE AdventureWorks;
SELECT Name FROM Sales.Store;
-- "SELECT Name FROM Store;" will fail.
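
The multi-part naming convention can be sketched a little further. The queries below assume that both the Northwind and AdventureWorks sample databases are installed on the current server. They show a three-part name, a name with the schema part left blank, and a way to check which default schema applies to the current user.

```sql
-- Three-part name: works regardless of the current database context.
SELECT Name FROM AdventureWorks.Sales.Store;

-- Blank schema part: SQL Server tries the caller's default schema, then dbo.
SELECT CompanyName FROM Northwind..Customers;

-- Report the default schema of the current user in the current database.
SELECT SCHEMA_NAME() AS DefaultSchema;
```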

The easiest way to deal with schemas is to keep all database objects assigned
to dbo and to avoid creating or assigning any other schemas. However,
schemas can be a useful way of creating multiple namespaces in a database,
just as namespaces make it easier for .NET programmers to keep track of
classes. The AdventureWorks database provides a good example of using
schemas as namespaces.

NOTE In previous versions of SQL Server, a schema was created
automatically for each database user. When a user created an
object, the object was automatically created in that user’s
schema—unless the user was a database owner, in which case the
object was created in the dbo schema. In SQL Server 2005, each
user does not automatically have a schema. Schemas are created
independently of users; users must explicitly be assigned rights to
a schema and can be assigned a default schema. Many users can
have rights to use any schema and many users can have the same
schema as their default.
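
As a minimal sketch of the behavior described in this note, the statements below create a schema independently of any user, then assign it as a user's default schema and grant rights on it. The names Reporting and ReportUser are hypothetical; ReportUser is assumed to be an existing database user.

```sql
-- CREATE SCHEMA must be the only statement in its batch, hence the GO.
CREATE SCHEMA Reporting AUTHORIZATION dbo;
GO

-- Make Reporting the default schema for an existing (hypothetical) user.
ALTER USER ReportUser WITH DEFAULT_SCHEMA = Reporting;

-- Rights on the schema are granted explicitly; they are not automatic.
GRANT SELECT ON SCHEMA::Reporting TO ReportUser;
```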


The SELECT Statement


See SelectingData.sql

There are many elements of the SELECT statement, but the most basic
SELECT syntax looks like this:

SELECT <col1, col2, ...> FROM <table>

Using this basic syntax, a SQL statement that retrieves the first name and last
name from the Employees table looks like this:

SELECT FirstName, LastName FROM dbo.Employees;

The first few rows of the output are shown in Figure 1.

Figure 1. The result set from a simple SELECT statement.

TIP: The query processor ignores tabs, carriage returns, and extra spaces. In
addition, Transact-SQL statements do not have to be typed on a single line.
The statement:

SELECT
LastName
FROM
dbo.Employees;

is equivalent to:

SELECT LastName FROM dbo.Employees;

In a single SQL statement, the semicolon at the end of the statement is
optional.


Semicolon vs. GO

GO is not a Transact-SQL statement. It signals the end of a batch so that all of the
preceding statements execute together. A batch is not the same as a script, which can
contain multiple batches. GO is supported by Microsoft SQL Server Management
Studio (SSMS), the sqlcmd utility, and the osql utility. In SSMS, you can even set an
option to define your own batch terminator, rather than using GO.

The semicolon character is a statement terminator and is a part of the ANSI SQL92
standard, but was never widely used within Transact-SQL. Although it is not required,
it is considered good programming practice to use it at the end of each statement.

The main advantage of using a semicolon instead of GO is that a semicolon does not
end the batch. Local variables are scoped to the batch, so when GO terminates a batch,
all of its variables go out of scope.

However, some situations require GO. Statements such as CREATE VIEW, CREATE
PROCEDURE, and CREATE FUNCTION must be the first statement in a batch, so you need
GO before them and again after them, before any statements that work with the new
object.
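
The following sketch, intended for a tool that understands GO (such as SSMS or sqlcmd) and assuming the Northwind database, shows both effects: a variable that does not survive the end of its batch, and a CREATE VIEW that needs its own batch. The view name dbo.SeattleEmployees is hypothetical.

```sql
-- USE Northwind;
DECLARE @City nvarchar(15);
SET @City = N'Seattle';

-- Same batch: the variable is still in scope.
SELECT LastName FROM dbo.Employees WHERE City = @City;
GO

-- New batch: @City no longer exists. Uncommenting the next line
-- raises a "Must declare the scalar variable" error.
-- SELECT LastName FROM dbo.Employees WHERE City = @City;

-- CREATE VIEW must be the first statement in its batch (comments are allowed).
CREATE VIEW dbo.SeattleEmployees
AS
SELECT LastName, FirstName
FROM dbo.Employees
WHERE City = N'Seattle';
GO

-- The view can be used once the batch that created it has ended.
SELECT LastName FROM dbo.SeattleEmployees;

-- Clean up the example object.
DROP VIEW dbo.SeattleEmployees;
```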

Selecting All Columns


You can use the asterisk symbol in place of a field name to select all columns
in a table:

SELECT * FROM dbo.Employees;

This SQL statement retrieves all fields for the table, as shown in Figure 2 (note
that not all the columns are visible in this figure).

Figure 2. Returning all columns in a SELECT query.


TIP: It’s best to avoid using the asterisk in your SQL statements. You’ll almost
never need every column in a table, and every column you retrieve adds time
to your query, adds overhead to the server, and eats up more bandwidth on
the network. It’s simply good SQL writing practice to be explicit about which
columns you need.

Sometimes, however, you actually don’t know which columns may be added
later and you want to write a query that is sure to retrieve all the columns. In
such cases, the asterisk is appropriate.

Concatenating Columns
Not every field specified in a SELECT statement has to be a column in a
database. You can create your own columns using expressions. For example,
you can concatenate values from multiple columns to create a new column.

In the Employees table, the first name and last name of each employee are in two
separate fields. You can combine these fields using the string concatenation (+)
operator, adding a comma and a space in the middle:

SELECT LastName + ', ' + FirstName FROM dbo.Employees;

When SQL Server receives this query, it evaluates the expression, and the
result set contains a single constructed column that contains both the last and
first names of the employee, separated by a comma and space, as shown in
Figure 3.

Figure 3. Concatenating two columns to create a new column.
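
Concatenation with + expects string operands. As a small sketch (assuming the Northwind Employees table, where EmployeeID is an int), a numeric column can be converted with CAST before it is combined with text:

```sql
-- USE Northwind;
-- EmployeeID is an int, so convert it to a string before concatenating;
-- otherwise SQL Server tries to convert the string to a number and fails.
SELECT CAST(EmployeeID AS varchar(10)) + ': ' + LastName + ', ' + FirstName
FROM dbo.Employees;
```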


Naming Columns
Notice in Figure 3 that the column has no name. Because the column is an
expression rather than a column in the database, SQL Server has no name to
give it unless you supply one. You can do that by using the AS keyword:

SELECT LastName + ', ' + FirstName AS FullName
FROM dbo.Employees;

The query itself is essentially the same as the previous example; all that’s
changed is that the column now has a name, FullName, as shown in Figure 4.
This name is also called an alias. Aliases are used routinely in complex SQL
and are necessary for certain types of joins, like self-joins. To include a space
within an alias, surround the alias with square brackets:

SELECT LastName + ', ' + FirstName AS [Full Name]
FROM dbo.Employees;

Figure 4. Giving a name to the computed column.

However, the AS keyword is optional. You can name a column just by placing the
alias after the expression, separated by a space:

SELECT LastName + ', ' + FirstName FullName
FROM dbo.Employees;

This query functions in the same way as the query with the AS clause, but
makes your SQL statement harder to read. Is FullName a column in the
database, or an alias? It is better to be explicit with the AS clause. The AS
clause is the most explicit way to alias a column and is supported by the ANSI
standard.


Another supported option is to use an equal sign for defining a column alias:

SELECT FullName = LastName + ', ' + FirstName
FROM dbo.Employees;

Deprecated Syntax

One method of defining column aliases using the equal sign (=) has been deprecated,
which means it will not be supported in future versions of SQL Server. This method
uses a string expression to define the alias:
-- Deprecated Syntax
SELECT 'FullName' = LastName + ', ' + FirstName
FROM dbo.Employees

You can, however, still use a string expression to define the column alias if you use the
AS syntax (even if you leave out the word AS).

Using DISTINCT to Limit Values


There are certain times when you want to retrieve only unique instances of a
column. For example, suppose you want a list of every job title. You might
write the SQL statement like this:

SELECT Title FROM dbo.Employees;

However, this would return one row for every employee. You only want a list
of each distinct title; that is, you want each title listed once in the result set.
The DISTINCT keyword limits a query to unique rows only:

SELECT DISTINCT Title FROM dbo.Employees;

Now instead of having a row for every employee, you have one row for each
unique title, as shown in Figure 5.


Figure 5. Using the DISTINCT keyword to limit a record set.


The WHERE Clause


You will rarely want SQL Server to return every row in a table. The most
efficient queries retrieve only the data you will actually use—no more, no less.
To limit the rows returned, you have to specify the subset of records that you
want. The WHERE clause is the primary row filter of a SELECT statement.

You use the WHERE clause to specify the search conditions that SQL Server
should use to identify rows that should or shouldn’t be included in the result
set. The simplest WHERE clauses check for equality. For example, if you want
a list of all customers in the city of Paris, write the following query:

SELECT CompanyName, City
FROM dbo.Customers
WHERE City = 'Paris';

Figure 6 shows the result set for this query.

Figure 6. Using the WHERE clause to retrieve only customers from Paris


Transact-SQL Comparison Operators


Transact-SQL comparison operators allow you to build queries that search for
records other than those that are exactly equal. Table 1 lists the Transact-SQL
Comparison Operators. The symbol-based operators are self-explanatory;
however, some of the text-based operators, such as the LIKE operator, are
easier to understand through example.

Operator                  Description

=                         Equal to
<>                        Not equal to
>                         Greater than
<                         Less than
>=                        Greater than or equal to
<=                        Less than or equal to
!=                        Not equal to
!<                        Not less than
!>                        Not greater than
[NOT] LIKE                Pattern matching a string with wildcards
BETWEEN expr1 AND expr2   An inclusive range of values
IS [NOT] NULL             Check for null value
[NOT] IN (val1, val2…)    List matching
  -or-
[NOT] IN (subquery)
ANY (SOME)                Tests whether one or more rows in the result set of
                          a subquery meet the specified condition
ALL                       Tests whether all rows in the result set of a
                          subquery meet the specified condition
[NOT] EXISTS              Tests whether a subquery returns any results
Table 1. Transact-SQL comparison operators.


The LIKE Operator


The LIKE operator uses wildcards to match string values. Table 2 shows a list
of wildcard characters and their meanings.

Wildcard   Meaning

%          Any string of zero or more characters.
_          Any single character.
[]         Any single character within the specified range or set
           (LIKE '[a-f]' or LIKE '[abcdef]').
[^]        Any single character not within the specified range or set
           (LIKE '[^a-f]' or LIKE '[^abcdef]').
Table 2. Wildcard characters used with the LIKE operator.

Write the following query to show a list of customers whose names start with
“S.”

SELECT CompanyName
FROM dbo.Customers
WHERE CompanyName LIKE 'S%';

The result set is shown in Figure 7.

Figure 7. Selecting companies whose names start with “S.”

This query will display companies whose names end in “S”:

SELECT CompanyName
FROM dbo.Customers
WHERE CompanyName LIKE '%S';


And this one will display customers who have an “S” anywhere in their names:

SELECT CompanyName
FROM dbo.Customers
WHERE CompanyName LIKE '%S%'

The most commonly used wildcard characters for the LIKE operator are the
percent symbol (%) and the underscore (_). Their behavior corresponds to the
asterisk (*) and question mark (?) in DOS search strings. So the expression
‘S%’ means an ‘S’ character followed by any number of other characters
(including none), and the expression ‘S_’ means an ‘S’ character followed by
exactly one other character.

Use the underscore to write a query that shows all of the CustomerID values
that start with “B”, end with “P”, and have any three characters in
the middle (indicated by three underscore characters):

SELECT CustomerID
FROM dbo.Customers
WHERE CustomerID LIKE 'B___P';

The result set is shown in Figure 8.

Figure 8. Using single-character pattern matching.

Use square brackets to delimit a list of values to match. The following query
will return any CustomerID values that begin with “FRAN” and end with
either “R” or “K”:

SELECT CustomerID
FROM dbo.Customers
WHERE CustomerID LIKE 'FRAN[RK]';

You can also use square brackets to specify a consecutive range of characters
to match:


SELECT CustomerID
FROM dbo.Customers
WHERE CustomerID LIKE 'FRAN[A-S]';

Use the caret character inside square brackets to indicate negation. For
example, the following query will return CustomerID values that begin with
“FRAN” and end with any character except “R”:

SELECT CustomerID
FROM dbo.Customers
WHERE CustomerID LIKE 'FRAN[^R]';
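
To experiment with these patterns without touching a table, one option is to test them against literal values. The sketch below uses a derived table of five sample strings (chosen to resemble Northwind CustomerID values) and reports which patterns each one matches; the column aliases are made up for the example.

```sql
-- Test several LIKE patterns against literal values; no tables are required.
SELECT v.Val,
       CASE WHEN v.Val LIKE 'B___P'    THEN 'yes' ELSE 'no' END AS MatchesB3P,
       CASE WHEN v.Val LIKE 'FRAN[RK]' THEN 'yes' ELSE 'no' END AS EndsInRorK,
       CASE WHEN v.Val LIKE 'FRAN[^R]' THEN 'yes' ELSE 'no' END AS EndsInNotR
FROM (SELECT 'BLONP' AS Val
      UNION ALL SELECT 'BOTTM'
      UNION ALL SELECT 'FRANK'
      UNION ALL SELECT 'FRANR'
      UNION ALL SELECT 'FRANS') AS v;
```

Only BLONP matches the first pattern, FRANK and FRANR match the second, and FRANK and FRANS match the third.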

The BETWEEN Operator


Use the BETWEEN operator when you want to specify an inclusive range of
values as a search condition. The two values are separated by the AND
keyword:

SELECT LastName, FirstName, PostalCode
FROM dbo.Employees
WHERE PostalCode BETWEEN '98103' AND '98999';

The result set shown in Figure 9 includes all employees who have a postal code
between 98103 and 98999, inclusive. Because PostalCode is a character column,
the values are compared as strings rather than as numbers.

Figure 9. Using the BETWEEN operator to filter ZIP codes.

TIP: The BETWEEN operator can be used to express date ranges as well as string
or numeric ranges.
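
As a sketch of a date range, the following queries assume the Northwind Orders table and its OrderDate column. Because BETWEEN is inclusive and a date literal without a time means midnight, the second form is a common way to cover a whole month when the column may carry a time of day.

```sql
-- USE Northwind;
-- Orders placed in January 1997. The end point '19970131' means midnight
-- at the start of January 31, so later times on that day would be missed.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate BETWEEN '19970101' AND '19970131';

-- A range with an exclusive upper bound covers all of January 31 as well.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '19970101'
  AND OrderDate <  '19970201';
```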


Using IS NULL to Test for Nulls


Use IS NULL to test for null values in expressions. Any expression that
contains a null value always resolves as null, because as long as there is an
unknown value, how could the expression resolve to a known value?

As a consequence of the behavior of nulls, a special operator exists to test for
the existence of a null:

SELECT LastName, FirstName, Region
FROM dbo.Employees
WHERE Region IS NULL;

The result set is shown in Figure 10.

Figure 10. Using IS NULL to return employees with a null region.


Beware! Three-Valued Logic Lies Ahead

A database null value is not the same as a null in a programming language. Null in
SQL Server means an unknown value, and is not equivalent to zero. The possible
values of a SQL expression can be TRUE, FALSE, and NULL (UNKNOWN),
whereas in a programming language they are simply true or false. In SQL, any
comparison or operation involving a null value results in null. For example, x + null =
null, or to rephrase it, x + unknown = unknown. The reason is that an unknown value
cannot be compared logically against any other value. This occurs if an expression is
compared to the literal NULL, or if two expressions are compared and one of them
evaluates to NULL.

When testing for nulls, always use IS NULL or IS NOT NULL. An attempt to test for
nulls using the equality operator (=) may or may not work, depending on the
ANSI_NULLS setting. When SET ANSI_NULLS is ON, a comparison in which
one or more of the expressions is NULL does not yield either TRUE or FALSE; it
yields UNKNOWN, so the row is not returned. Note that the session-level setting
overrides the default database setting, and the default for ODBC and OLE DB
connections is SET ANSI_NULLS ON.
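
The three possible outcomes can be observed directly. This sketch needs no database objects; the variable @x and the column aliases are made up for the example, and ANSI_NULLS is switched on explicitly so the result does not depend on connection defaults.

```sql
SET ANSI_NULLS ON;

DECLARE @x int;   -- a variable that is never assigned holds NULL

SELECT CASE WHEN @x = NULL       THEN 'TRUE'
            WHEN NOT (@x = NULL) THEN 'FALSE'
            ELSE 'UNKNOWN'
       END AS EqualsNullTest,      -- neither branch is true, so 'UNKNOWN'
       CASE WHEN @x IS NULL THEN 'TRUE' ELSE 'FALSE' END AS IsNullTest,
       1 + @x AS ArithmeticWithNull;   -- an operation on NULL yields NULL
```

The query returns UNKNOWN, TRUE, and NULL: the equality test is neither true nor false, while IS NULL gives a definite answer.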

Multiple Conditions with AND, OR, and NOT
Use the Boolean AND, OR, and NOT operators to combine multiple
conditions, creating more complex search conditions.

The AND operator requires both expressions to be true for the condition to be
true:

SELECT LastName, City, PostalCode
FROM dbo.Employees
WHERE City = 'Seattle' AND PostalCode LIKE '9%';

The results display all employees who live in Seattle and whose ZIP code
starts with “9”, as shown in Figure 11.

Figure 11. Using the AND operator to filter result sets.


The OR operator requires only one expression to be true, although both can be
true. The preceding query can be modified to use the OR operator:

SELECT LastName, City, PostalCode
FROM dbo.Employees
WHERE City = 'Seattle' OR PostalCode LIKE '9%';

This query shows all of the employees in Seattle in addition to all employees
who have a ZIP code starting with 9. Figure 12 shows the result set.

Figure 12. Using the OR operator to filter result sets.

The NOT operator negates a single expression: it reverses the result of
the expression that follows it, which can make a condition harder to read. The
following query returns all employees except those who live in Seattle:

SELECT LastName, City, PostalCode
FROM dbo.Employees
WHERE City NOT LIKE 'Seattle';

Figure 13 shows the result set of all employees who don’t live in Seattle.

Figure 13. Using the NOT operator.


Operator Precedence
When you combine operators in expressions, the precedence of the operator
determines how the conditions are evaluated. The order of precedence is:

1. NOT

2. AND

3. OR

An operator that’s higher in this list is evaluated before an operator that’s
lower in the list. Use parentheses to override the default order or to make it
explicit.

For example, in the following query NOT is applied to the LIKE test on City
before AND combines the two conditions, so the result excludes every employee
from Seattle, regardless of name, and keeps only the remaining employees whose
LastName contains “S”:

SELECT LastName, FirstName, City
FROM dbo.Employees
WHERE LastName LIKE '%S%'
AND City NOT LIKE 'Seattle';

Figure 14 shows the result set.

Figure 14. Filtering on city, then name.
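
Precedence matters most when AND and OR appear in the same WHERE clause. The two queries below, a sketch that assumes the Northwind Employees table and its Title column, differ only in their parentheses and can return different rows.

```sql
-- USE Northwind;
-- Without parentheses, AND is evaluated before OR, so this returns every
-- London employee plus only those Seattle employees who are sales representatives.
SELECT LastName, City, Title
FROM dbo.Employees
WHERE City = 'London'
   OR City = 'Seattle' AND Title = 'Sales Representative';

-- Parentheses change the grouping: now the Title test applies to both cities.
SELECT LastName, City, Title
FROM dbo.Employees
WHERE (City = 'London' OR City = 'Seattle')
  AND Title = 'Sales Representative';
```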

Using the IN Operator


The OR operator can be used to filter on a list of values, as shown in the
following query, which returns customers in France or Spain:

SELECT CustomerID, Country
FROM dbo.Customers
WHERE Country = 'France'
OR Country = 'Spain';


However, a more concise and readable way is to use the IN operator. The
IN operator compares a field against a list of values; SQL Server typically
evaluates it the same way as the equivalent series of OR conditions. The
following query returns the same list of all customers in France and Spain:

SELECT CustomerID, Country
FROM dbo.Customers
WHERE Country IN ('France', 'Spain');

Figure 15 shows the result set:

Figure 15. Using the IN operator to include customers from multiple countries.

TIP: You can also use the IN operator with subqueries. For example, to find all
customers who have not placed orders, you could use this:

SELECT CustomerID
FROM dbo.Customers
WHERE CustomerID NOT IN(SELECT CustomerID
FROM dbo.Orders);

Later in this chapter you’ll see how to create this query by using an outer join.
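
One caution with NOT IN: if the subquery returns even one null, the NOT IN test can never be true and the query returns no rows. A commonly used alternative is NOT EXISTS, sketched here against the same Northwind tables; the aliases c and o are arbitrary.

```sql
-- USE Northwind;
-- Customers with no matching row in Orders. Unlike NOT IN, this form is
-- not affected by null CustomerID values in the Orders table.
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE NOT EXISTS (SELECT *
                  FROM dbo.Orders AS o
                  WHERE o.CustomerID = c.CustomerID);
```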


Using ORDER BY to Sort Data


Use the ORDER BY clause to specify the order in which rows are returned in
the result set. Because SQL Server uses several different algorithms for
fetching data depending on the type of query and the system statistics, you
cannot guarantee any order for the return of rows without an ORDER BY
clause.

Sorting on a Single Column


The ORDER BY clause appears at the end of a basic SELECT statement, after
the WHERE clause (if any). You can order by any column in the tables that
you use in the query. The column that you use in the ORDER BY clause does
not have to appear in the SELECT list. The most basic ORDER BY is a single
column:

SELECT LastName, City
FROM dbo.Employees
ORDER BY City;

The default sort order for an ORDER BY clause is in ascending order: 0-9,
A-Z. The results of sorting on the City name are shown in Figure 16.

Figure 16. A basic ORDER BY, sorting by city in ascending order.

To sort in descending order, you add the DESC keyword after the column:


SELECT LastName, City
FROM dbo.Employees
ORDER BY City DESC;

The results display the cities sorted in descending alphabetical order.

Sorting by Multiple Columns


You can sort on multiple columns by separating the columns with
commas. You can specify the sort order for each column separately. You can
specify ascending order by using the optional ASC keyword, which makes the
following query easier to read:

SELECT LastName, City
FROM dbo.Employees
ORDER BY City DESC, LastName ASC;

The result set in Figure 17 displays the cities sorted in descending order first,
and then by last name in ascending order where the cities are identical.

Figure 17. Sorting by city and last name.


Sorting with Expressions


Column names are not the only elements that you can use to sort a result set.
You can create an expression in the ORDER BY clause for sorting:

SELECT LastName
FROM dbo.Employees
ORDER BY LEN(LastName);

Figure 18 shows that by using the LEN function in an expression, you can sort
employees by the length of their last names.

Figure 18. Using an expression to sort by the length of the last name.
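
An expression used for sorting can also be given an alias in the SELECT list and then referenced by that alias in the ORDER BY clause. The following sketch assumes the Northwind Employees table; the aliases FullName and NameLength are made up for the example.

```sql
-- USE Northwind;
-- Sort by the length of the last name (longest first), then alphabetically
-- by the concatenated name. Both sort keys are column aliases.
SELECT LastName + ', ' + FirstName AS FullName,
       LEN(LastName) AS NameLength
FROM dbo.Employees
ORDER BY NameLength DESC, FullName;
```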


The GROUP BY Clause


One of the primary purposes of a relational database is to summarize
information. Other than for data entry, you will rarely look at individual
orders. Instead, using the database, you will summarize all the orders by week,
month, or year. SQL Server performs these sorts of summaries using
aggregate functions. These aggregate functions, combined with the GROUP
BY clause, can combine a tremendous amount of data into a clear and concise
summary.

Aggregate Functions
Table 3 lists some of the most commonly used aggregate functions. Of these
functions, COUNT is used most often, providing the count of the
rows in the query. SUM and AVG summarize numeric values; MIN and MAX
also work with character and date values.

Function    Description

COUNT       Counts the number of rows in the aggregate query.
COUNT_BIG   Same as COUNT, but returns a bigint data type.
SUM         Adds up (sums) column values. Only works with numeric columns.
AVG         Averages column values. Only works with numeric columns.
MIN         Returns the lowest column value in an aggregate query.
MAX         Returns the largest column value in an aggregate query.
Table 3. The most commonly used aggregate functions in SQL Server.

TIP: You can view a complete list of aggregate functions by searching in SQL
Server Books Online for Aggregate Functions.
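
As a quick sketch of several of these functions working together, the query below summarizes the Northwind Products table (assumed to be available, with its UnitPrice, UnitsInStock, and ProductName columns) in a single row. The column aliases are made up for the example.

```sql
-- USE Northwind;
SELECT COUNT(*)          AS NumProducts,
       SUM(UnitsInStock) AS TotalUnitsInStock,
       AVG(UnitPrice)    AS AvgPrice,
       MIN(UnitPrice)    AS LowestPrice,
       MAX(UnitPrice)    AS HighestPrice,
       MIN(ProductName)  AS FirstProductByName   -- MIN also works on strings
FROM dbo.Products;
```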

Counting Rows
Use the COUNT function to return the number of rows in a table, as shown in
the following query, which uses the asterisk (*) to count all of the rows:


SELECT COUNT(*)
FROM dbo.Employees

TIP: Your aggregate queries may execute faster if you use the asterisk in the
COUNT function instead of a column name. The asterisk instructs SQL
Server to count only the number of rows. Using the name of a nullable column
forces SQL Server to check each value for nulls, which
aren’t included in the count. If all you’re doing is counting the rows, the
asterisk is the better choice.

Counting Columns
When you specify a column name in the COUNT function, any null values in
the column are excluded from the count. The following query counts the total
number of rows in the Employees table, and then counts the non-null values in
the Region column:

SELECT COUNT(*) AS NumEmployees,
       COUNT(Region) AS NumRegion
FROM dbo.Employees;

Because there are null Region values, the result set in Figure 19 shows a
discrepancy between the number of employees and the number of non-null
Region values.

Figure 19. Null values are not included in COUNT aggregate functions.
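
The difference between the forms of COUNT is easy to see with a few rows of made-up data. This sketch uses a table variable, so it does not depend on any sample database; the name @Demo and its contents are purely illustrative.

```sql
-- Three rows: two with the same region and one with a null region.
DECLARE @Demo TABLE (Region nvarchar(15) NULL);
INSERT INTO @Demo (Region) VALUES (N'WA');
INSERT INTO @Demo (Region) VALUES (N'WA');
INSERT INTO @Demo (Region) VALUES (NULL);

SELECT COUNT(*)               AS NumRows,        -- 3: every row
       COUNT(Region)          AS NumNonNull,     -- 2: nulls are skipped
       COUNT(DISTINCT Region) AS NumDistinct     -- 1: unique non-null values
FROM @Demo;
```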

Counting with a WHERE Clause


When you use a WHERE clause in an aggregate query, only the rows that
pass the search conditions of the WHERE clause are included in the
aggregate:


SELECT COUNT(*) AS NumEmployeeSeattle
FROM dbo.Employees
WHERE City = 'Seattle';

Figure 20 shows the results of the restricted result set.

Figure 20. The count of the number of employees in Seattle.

While this query returned the number of employees in Seattle, by using the
GROUP BY clause you can return a list of every city and the number of
employees in each city.

Using GROUP BY
Use a GROUP BY clause to return a list of every city where employees live
and the number of employees in each city. The GROUP BY clause combines
aggregate columns and regular columns in the same query. The queries in the
previous examples included only aggregate columns, so only aggregates were
returned. You might think that simply listing the City and the COUNT of rows
might work, as in the following query, but you would be wrong:

SELECT City, COUNT(*) AS NumEmployees
FROM dbo.Employees;

Unfortunately, this query generates the following error, which tells you exactly
what’s going on:

Msg 8120, Level 16, State 1, Line 1
Column 'dbo.Employees.City' is invalid in the select list
because it is not contained in either an aggregate
function or the GROUP BY clause.


Add a GROUP BY clause to fix the query. The GROUP BY clause must
include all of the non-aggregate columns in the select list, in this case the City:

SELECT City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City

The result set in Figure 21 lists all the cities in the Employees table, plus the
count of the employees from those cities. The rows happen to appear in
alphabetical order by City, the column used for the grouping, but GROUP BY
does not guarantee any order; add an ORDER BY clause whenever the order
matters.

Figure 21. A basic GROUP BY showing cities and the number of employees in
each city.
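
The rule that every non-aggregate column in the select list must appear in the GROUP BY clause also applies when there is more than one such column. The following sketch assumes a hypothetical table variable, @Staff, with made-up rows, and groups by two columns at once:

```sql
-- Hypothetical sample data (not from Northwind).
DECLARE @Staff TABLE (LastName varchar(20) NOT NULL,
                      City     varchar(20) NOT NULL,
                      Title    varchar(30) NOT NULL);

INSERT INTO @Staff (LastName, City, Title)
SELECT 'Adams', 'London',  'Sales Representative' UNION ALL
SELECT 'Baker', 'London',  'Sales Representative' UNION ALL
SELECT 'Clark', 'London',  'Sales Manager'        UNION ALL
SELECT 'Davis', 'Seattle', 'Sales Representative' UNION ALL
SELECT 'Evans', 'Tacoma',  'Sales Manager';

-- City and Title are both non-aggregate columns, so both must be listed
-- in GROUP BY. Each distinct (City, Title) pair becomes one output row.
SELECT City, Title, COUNT(*) AS NumStaff
FROM @Staff
GROUP BY City, Title
ORDER BY City, Title;
-- Returns 4 rows; the London / Sales Representative row has NumStaff = 2.
```

Removing Title from the GROUP BY clause while leaving it in the select list would raise the same Msg 8120 error shown earlier.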

Using ORDER BY with GROUP BY


Use the ORDER BY clause with the GROUP BY clause to order the data so
that the cities with the most employees come first:

SELECT City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City
ORDER BY COUNT(*) DESC, City

Notice that aggregate functions can be used in the ORDER BY clause, and
using the DESC keyword shows the cities with the most employees first (see
Figure 22). Adding City to the ORDER BY clause puts all cities with the same
number of employees in alphabetical order.


Figure 22. Ordering the cities by employee count.

Using HAVING with GROUP BY


When you use a WHERE clause in an aggregate query, the WHERE clause
excludes rows before the grouping takes place. Use the HAVING clause to
limit the result set after the aggregate value is calculated (see Figure 23). For
example, the preceding query calculates the number of employees and displays
them in descending order. The following query does the same, but eliminates
cities that have only one employee by applying the HAVING criteria after the
aggregate is calculated:

SELECT City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City
HAVING COUNT(*) > 1
ORDER BY NumEmployees DESC, City;

Figure 23. Filtering the result set using the HAVING clause.

WARNING! Note that the last query used the alias NumEmployees in the
ORDER BY clause, but not in the HAVING clause. Even when you
define an alias for an aggregate expression, you must use the full
aggregate expression in the HAVING clause. Using the alias there
will cause an error.
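
WHERE and HAVING can appear in the same query, and seeing them together makes the difference in timing clearer. The sketch below assumes a hypothetical table variable, @Staff, with made-up rows; it counts only sales representatives per city and then keeps only the cities with more than one:

```sql
-- Hypothetical sample data (not from Northwind).
DECLARE @Staff TABLE (LastName varchar(20) NOT NULL,
                      City     varchar(20) NOT NULL,
                      Title    varchar(30) NOT NULL);

INSERT INTO @Staff (LastName, City, Title)
SELECT 'Adams', 'London',  'Sales Representative' UNION ALL
SELECT 'Baker', 'London',  'Sales Representative' UNION ALL
SELECT 'Clark', 'London',  'Sales Manager'        UNION ALL
SELECT 'Davis', 'Seattle', 'Sales Representative' UNION ALL
SELECT 'Evans', 'Tacoma',  'Sales Manager';

SELECT City, COUNT(*) AS NumReps
FROM @Staff
WHERE Title = 'Sales Representative'  -- filters rows before grouping
GROUP BY City
HAVING COUNT(*) > 1                   -- filters groups after counting
ORDER BY NumReps DESC, City;          -- the alias is allowed in ORDER BY
-- Returns one row: London, 2
```

Seattle is counted (one representative) but then removed by HAVING, while Tacoma never reaches the grouping step because its only row fails the WHERE test.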


TOP Values Queries


Use the TOP clause to limit the number of rows in a result set. For example,
the following query selects the three cities with the highest number of
employees:

SELECT TOP 3
City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City
ORDER BY COUNT(*) DESC;

Figure 24 shows the result set from the query. A key element is the ORDER
BY clause, which sorts the result set in descending order before the TOP
clause is applied. A sort in ascending order would instead return the three
cities with the fewest employees.

Figure 24. Using TOP to show the top three cities with the most employees.

However, one problem with a TOP query is the matter of ties. Aren’t there
many cities with only one employee? If you want to see the ties for last place,
you need to use the WITH TIES clause:

SELECT TOP 3 WITH TIES
City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City
ORDER BY COUNT(*) DESC;

Figure 25 shows the result set. All five cities that tied for last place are
returned. For the data in this table, only a TOP 2 query will return unique
values.


Figure 25. The top three cities WITH TIES shows all of the ties for last place.

TOP also enables you to specify a percent value rather than an absolute
number. Here’s an example of using TOP with PERCENT:

SELECT TOP 25 PERCENT WITH TIES
City, COUNT(*) AS NumEmployees
FROM dbo.Employees
GROUP BY City
ORDER BY COUNT(*) DESC;

TIP: SQL Server 2005 added several enhancements to TOP. You can now use any
numeric expression, even a variable, to specify the number; you can also now
use TOP in INSERT, UPDATE, and DELETE statements.
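
As a brief illustration of the first enhancement, the sketch below passes a variable to TOP. When the number is an expression or a variable, it must be enclosed in parentheses. The table variable @Staff, its rows, and the variable name @n are assumptions made up for this example:

```sql
-- Hypothetical sample data (not from Northwind).
DECLARE @Staff TABLE (LastName varchar(20) NOT NULL,
                      City     varchar(20) NOT NULL);

INSERT INTO @Staff (LastName, City)
SELECT 'Adams', 'London'  UNION ALL
SELECT 'Baker', 'London'  UNION ALL
SELECT 'Clark', 'London'  UNION ALL
SELECT 'Davis', 'Seattle' UNION ALL
SELECT 'Evans', 'Seattle' UNION ALL
SELECT 'Frank', 'Tacoma';

-- The row limit comes from a variable instead of a literal.
DECLARE @n int;
SET @n = 2;

SELECT TOP (@n) City, COUNT(*) AS NumStaff
FROM @Staff
GROUP BY City
ORDER BY COUNT(*) DESC, City;
-- Returns London (3) and Seattle (2).
```

Changing the value assigned to @n changes how many cities come back without editing the query text, which is convenient inside stored procedures.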


Joining Tables
One of the fundamental concepts of relational databases is tables, or sets, of
data. Different data elements are grouped into separate tables. Data about
employees is in the dbo.Employees table, data about orders is in the
dbo.Orders table, and so forth. The process of organizing the various elements
of data into tables is called normalization. The key measure of successful
normalization is the ability to join tables effectively.

The advantage of normalization is that each discrete bit of information (a
customer address, for instance) is stored in only one place. The disadvantage
of normalization is the need for joining. To assemble the information you need
in a query, you often must join two or more tables.

Cross Joins (Cartesian Products)


The following query performs a cross join between the Products and
Categories tables. Because it has no restrictions, the result contains every
possible combination of a row from the Products table and a row from the
Categories table.

SELECT ProductName, CategoryName
FROM dbo.Products, dbo.Categories

The result of this query is a Cartesian product (named after the French
mathematician René Descartes, who developed the concept). All possible
combinations of ProductName and CategoryName are returned. Consider what
would happen if each table had thousands of rows. A multimillion-row result
set is not something you want your SQL Server to process, much less to send
down your network connection and into your workstation.

Not every Cartesian product is bad. It’s only when they are created
unintentionally that Cartesian products cause problems with a database.
Cartesian products are often used to generate large amounts of data rapidly,
which is useful for generating sample data, for instance. Certain scientific and
mathematical tasks require the creation of sets that combine every element of
one set with another set, which is exactly what a Cartesian product is.
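
When a Cartesian product is what you actually want, the CROSS JOIN keyword states that intent explicitly, so a reader does not mistake it for a forgotten join condition. The following sketch generates every size and color combination from two hypothetical table variables; the table names and values are made up for illustration:

```sql
-- Hypothetical lookup data used to generate sample combinations.
DECLARE @Sizes  TABLE (Size  varchar(10) NOT NULL);
DECLARE @Colors TABLE (Color varchar(10) NOT NULL);

INSERT INTO @Sizes (Size)
SELECT 'Small' UNION ALL SELECT 'Medium' UNION ALL SELECT 'Large';

INSERT INTO @Colors (Color)
SELECT 'Red' UNION ALL SELECT 'Blue';

-- CROSS JOIN takes no ON clause: every row of @Sizes is paired with
-- every row of @Colors, so 3 sizes x 2 colors = 6 rows.
SELECT S.Size, C.Color
FROM @Sizes AS S CROSS JOIN @Colors AS C
ORDER BY S.Size, C.Color;
```

The row count is always the product of the two input row counts, which is why the same pattern becomes a problem when it happens by accident on large tables.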

The Use of Keys in Joining


The connection between the Products table and Categories table is the
CategoryID field. In the Categories table, CategoryID is the primary key. It is
an identity column, and whenever a new category is added, SQL Server
automatically generates the next ID number. Each number is unique and
cannot be null; these are standard requirements for a primary key.

The Products table also has a primary key, the ProductID. But the connection
to the Categories table is the CategoryID, which is a foreign key in the
Products table.

By adding a WHERE clause, you make an intelligent join between the two
tables, as shown here where the CategoryID in the Products table is joined to
the CategoryID in the Categories table:

SELECT ProductName, CategoryName
FROM dbo.Products, dbo.Categories
WHERE Products.CategoryID = Categories.CategoryID;

Figure 26 shows the results of this query. Each product is listed once, with its
appropriate category.

Figure 26. The result set from a join based on the CategoryID of both tables.

TIP: SQL Server performs best when indexed columns are used for joins. You can
write join queries using any column in a table that matches another column in
another table. You can even build matching expressions for joins. But the
more complex the joining criteria, the slower the join will execute. The long
integers used in identity field primary keys are the perfect join column.

Join Notation
As the SQL standard has evolved, so has the join notation. The earliest join
notation uses the WHERE clause to enforce the joining criteria, as shown in
the previous example.


The problem with this notation is that it overloads the WHERE clause; in
essence, the WHERE clause is pulling double duty by restricting the rows of
the result set based on search criteria and enforcing joins between different
tables. This causes a variety of problems:

• Because nothing requires you to supply joining criteria, complex
queries can easily produce unintended Cartesian products.
• Mixing search criteria and join criteria in one clause can cause SQL
Server to perform the query inefficiently.
• As queries get complex, the WHERE clause becomes extremely
difficult to read.
The SQL92 standard introduced a new notation for joining, using the JOIN
keyword with an ON condition:

SELECT dbo.Products.ProductName,
dbo.Categories.CategoryName
FROM dbo.Products JOIN dbo.Categories
ON dbo.Products.CategoryID = dbo.Categories.CategoryID;

The results of this query are identical to the previous query; only the notation
changed. The join notation puts joining criteria into the FROM clause,
independent of the WHERE clause. This makes the query easier to read, and
SQL Server can decipher and execute it more efficiently (SQL Server
translates WHERE-clause joins into FROM-clause joins).

Whenever possible, use the JOIN condition notation instead of the WHERE
notation, but at the same time, be aware that the older style is still used.

TIP: Another notation convention for joining is the use of table names with their
field names, separated by a dot. Table names are required only when a field is
referenced that exists in both tables, such as the CategoryID field. Using table
names for all field names not only looks better, but it makes the query self-
documenting and easier to read. Anyone who reads the query (including you a
day later) can quickly determine which table each field comes from without
having to look it up elsewhere. To reduce the amount of typing, short aliases
are often used for the table names:

SELECT P.ProductName, C.CategoryName
FROM dbo.Products AS P JOIN dbo.Categories AS C
ON P.CategoryID = C.CategoryID;


Inner Joins
Inner joins are the original joins developed in the SQL language, and are
certainly the most common. The joins you’ve already seen in this chapter are
all inner joins. The principle behind the inner join is that the result set includes
only rows that have matching joining criteria.

The word INNER is always optional, because the inner join is the default. Here
is an example of adding the optional INNER keyword:

SELECT Products.ProductName, Categories.CategoryName
FROM dbo.Products INNER JOIN dbo.Categories
ON Products.CategoryID = Categories.CategoryID

In this query all the data in both tables is included in the query, because all
products have categories assigned to them. However, it is important to
remember that data can be excluded from an inner join. CategoryID is allowed
to be NULL in the Northwind database, and the inner join wouldn’t return any
products with a NULL CategoryID.
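
Because Northwind has no such products, the exclusion is easier to see with a small stand-in. The sketch below assumes two hypothetical table variables that mimic the shape of Products and Categories; the 'Mystery Item' row is made up and has no category:

```sql
-- Hypothetical stand-ins for the Categories and Products tables.
DECLARE @Categories TABLE (CategoryID   int NOT NULL PRIMARY KEY,
                           CategoryName varchar(20) NOT NULL);
DECLARE @Products   TABLE (ProductName  varchar(20) NOT NULL,
                           CategoryID   int NULL);

INSERT INTO @Categories (CategoryID, CategoryName)
SELECT 1, 'Beverages' UNION ALL
SELECT 2, 'Condiments';

INSERT INTO @Products (ProductName, CategoryID)
SELECT 'Chai', 1          UNION ALL
SELECT 'Aniseed Syrup', 2 UNION ALL
SELECT 'Mystery Item', NULL;   -- no category assigned

-- NULL never equals anything, so the ON condition is not satisfied for
-- 'Mystery Item' and the inner join leaves that row out.
SELECT P.ProductName, C.CategoryName
FROM @Products AS P INNER JOIN @Categories AS C
ON P.CategoryID = C.CategoryID
ORDER BY P.ProductName;
-- Returns 2 rows: Aniseed Syrup and Chai.
```

Three products go in but only two come out, which is the kind of silent omission to watch for whenever a joining column allows NULL.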

Simple Inner Join


Once joins are introduced into the SELECT query, the query becomes longer
and more complex. A simple inner join has a single join, but when combined
with other clauses like WHERE and ORDER BY, the query grows to
substantial size:

SELECT dbo.Products.ProductName,
dbo.Categories.CategoryName,
dbo.Products.UnitPrice
FROM dbo.Products INNER JOIN dbo.Categories
ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE UnitPrice > 50
ORDER BY ProductName;

The result set, shown in Figure 27, lists every product priced above fifty
dollars, sorted alphabetically, along with its category and price. The only
role of the join in this query is to bring the category name into the result
set. The WHERE clause limits the products to those above fifty dollars, and
the ORDER BY clause sorts the products alphabetically.


Figure 27. A simple join brings the product data together with the category name.

Multiple Inner Joins


More than two tables can be joined in a given query. When connecting
multiple tables in a join, each join expression treats the previous expression as
if it were a single table. The Orders table first joins to the Customers table
using the CustomerID column in both tables. Then the Employees table is
joined to the results of that join, using the EmployeeID from the Orders table
and the Employees table.

SELECT OrderID, CONVERT(varchar(10), OrderDate, 101) AS Date,
CompanyName, LastName
FROM dbo.Orders
INNER JOIN dbo.Customers
ON Orders.CustomerID = Customers.CustomerID
INNER JOIN dbo.Employees
ON Orders.EmployeeID = Employees.EmployeeID
WHERE OrderDate BETWEEN '9/1/1996' AND '9/10/1996'
ORDER BY OrderDate;

The first few rows of the result set are shown in Figure 28, displaying the order
number and date from the Orders table, as well as the company name from the
Customers table and employee name from the Employees table.

Figure 28. Joining multiple tables yields a list of orders that includes customer and
employee names.


Aggregates and Grouping with Joins


For grouping, the data in the join is treated as if it all came from the same
table. Non-aggregate columns have to be included in the GROUP BY and the
aggregate columns do not.

A typical group join, which also demonstrates a sequential join, determines the
total sales per customer. Three tables are involved:

• The Customers table provides the CustomerID and CompanyName.
• The Order Details table provides the UnitPrice and Quantity.
• The Orders table connects the tables by OrderID and CustomerID.
There is no direct relationship between Customers and Order Details. Orders
sits between them, so the sequential join is Customers to Orders to Order
Details, even if no columns from Orders are displayed in the result set.

This query selects the CompanyName, joins the Customers table to Orders,
and then to Order Details in order to derive a total for each customer:

SELECT CompanyName,
SUM([Order Details].UnitPrice * [Order Details].Quantity)
AS TotalSold
FROM dbo.Customers INNER JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
INNER JOIN [Order Details]
ON Orders.OrderID = [Order Details].OrderID
WHERE Orders.OrderDate BETWEEN '9/1/1996' AND '9/10/1996'
GROUP BY CompanyName
ORDER BY TotalSold DESC;

The GROUP BY clause includes CompanyName, the only column in the SELECT
list that is not part of an aggregate. The aggregate in this query is the sum
of the unit price multiplied by the quantity, and the results are sorted in
descending order by that total. The first few rows of the result set are
shown in Figure 29.


Figure 29. A grouped join showing the total sales for each customer.

TIP: You may have noticed the square brackets around the name of the Order
Details table. Those brackets are required because the table name includes a
space character. Brackets are required if an object name includes a character
that isn’t otherwise allowed. Another example would be names that include
hyphens. In general, it is best to avoid using names that require square
brackets.

Outer Joins
There are three types of outer joins: left, right, and full. A left join includes all
rows of the first table, a right join includes all rows of the second table, and a
full join includes all rows of both tables. The challenge of using outer joins is
knowing when they are appropriate.

Write the following query using an inner join to generate a list of all customers
and the dates of their first order:

SELECT CompanyName, MIN(Orders.OrderDate) AS FirstOrder
FROM dbo.Customers INNER JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
GROUP BY CompanyName
ORDER BY CompanyName


The Customers table is joined to the Orders table and grouped by company
name. The MIN aggregate function returns the lowest order date for each
customer. The first few rows of the result set shown in Figure 30 display each
customer and the date of their first order.

Figure 30. Using an inner join to return the first order of each customer.

But check the record counts—the list may not include every customer. The
behavior of the inner join will exclude any customers who have not placed any
orders. If you need to see a list of all customers regardless of whether they’ve
placed an order, as well as the date of the first order for customers who have,
you need to use an outer join.

Left Outer Joins


A left outer join includes all rows in the first table specified in the join
expression. The result set generated by a left join contains all the rows that the
inner join includes, plus any other rows in the first table of the join that would
ordinarily be excluded by an inner join. Those rows have NULL values for any
columns from the joined table.

To change the previous inner join query, replace the inner join with a left join:

SELECT CompanyName, MIN(Orders.OrderDate) AS FirstOrder
FROM dbo.Customers LEFT JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
GROUP BY CompanyName
ORDER BY CompanyName

Figure 31 shows one of the two Northwind customers that were excluded
before but show up in these query results. This customer and Paris spécialités
have no orders in the Northwind database.


Figure 31. The result set of the left join, showing null values.

This is a key purpose of outer joins: making certain that a given result set
includes all rows from one table, whether or not they have matches in the
other.

This example also points out how you can use outer joins to find unmatched
values. To find all customers who don’t have any orders, you can use the
following outer join, which relies on the fact that the OrderID column will
contain a NULL value for those customers. Figure 32 shows the result set.

SELECT CompanyName AS [No Orders]
FROM dbo.Customers LEFT JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
WHERE Orders.OrderID IS NULL
ORDER BY CompanyName;

Figure 32. Customers with no orders.

Right Outer Joins


Right outer joins function identically to left joins, except that the second table
of the join has all rows included, instead of the first table.

It is important to note that the only difference between a right outer join and a
left outer join is the order in which the tables are listed in the FROM clause.
The following query uses a right outer join to return customers without orders:


SELECT CompanyName AS [No Orders]
FROM dbo.Orders RIGHT JOIN dbo.Customers
ON Customers.CustomerID = Orders.CustomerID
WHERE Orders.OrderID IS NULL
ORDER BY CompanyName;

This query returns the same results shown in Figure 32.

Full Outer Join


A full outer join combines the effects of the left and right joins. It is not a
Cartesian Product, however, because it has a joining condition.

A Cartesian Product returns all the rows of the first table for each of the rows
of the second table. A full outer join returns all rows that meet the join criteria
(the inner join), as well as rows in the first table that did not meet the criteria
(the left join) and rows in the second table that did not meet the criteria (the
right join). For a primary key/foreign key join such as the one between
Products and Categories, the number of rows returned will be no more than
the sum of the number of rows in each table, not the product of those numbers.
If all the rows have matches in the two tables, the number of rows will be
equal to the number in the table on the foreign key side, which is normally
the larger of the two.

The following query will show you all products, including any without
categories, and all categories, including any without products:

SELECT ProductName, CategoryName
FROM dbo.Products FULL JOIN dbo.Categories
ON dbo.Products.CategoryID = dbo.Categories.CategoryID
ORDER BY ProductName;

Northwind contains no unmatched products and categories. If you enter a new
product without a category and a category without a product, then you’ll see
that those new entries will still be returned by the query, with NULL values in
the columns that have missing data.
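
To see unmatched rows from both sides without modifying Northwind, you can try a full join against small stand-in tables. The sketch below assumes two hypothetical table variables; 'Mystery Item' has no category and 'Produce' has no products, and both rows are made up for illustration:

```sql
-- Hypothetical stand-ins for the Categories and Products tables.
DECLARE @Categories TABLE (CategoryID   int NOT NULL PRIMARY KEY,
                           CategoryName varchar(20) NOT NULL);
DECLARE @Products   TABLE (ProductName  varchar(20) NOT NULL,
                           CategoryID   int NULL);

INSERT INTO @Categories (CategoryID, CategoryName)
SELECT 1, 'Beverages'  UNION ALL
SELECT 2, 'Condiments' UNION ALL
SELECT 3, 'Produce';            -- category with no products

INSERT INTO @Products (ProductName, CategoryID)
SELECT 'Chai', 1          UNION ALL
SELECT 'Aniseed Syrup', 2 UNION ALL
SELECT 'Mystery Item', NULL;    -- product with no category

-- Matched pairs plus the unmatched row from each side:
-- 2 matches + 1 unmatched product + 1 unmatched category = 4 rows.
SELECT P.ProductName, C.CategoryName
FROM @Products AS P FULL JOIN @Categories AS C
ON P.CategoryID = C.CategoryID
ORDER BY P.ProductName;
```

The 'Produce' row comes back with a NULL ProductName and sorts first, because SQL Server places NULL before other values in an ascending sort; the 'Mystery Item' row comes back with a NULL CategoryName.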

Combining Inner and Outer Joins


When you combine inner and outer joins in one query, the joins are performed
in the order that you specify, and you need to understand the consequences of
that order. Normally, all inner joins should come first in the expression.
Disregarding this rule will lead to queries that seem to behave correctly, but
actually don’t. An example demonstrates the difficulties of using inner and
outer joins together.


If you want to see all of the customers, whether they have orders or not, and
the total sales for customers who do have orders, you could try using a left join
on the Customers-Orders join, and an inner join on the Orders-OrderDetails
join. On the surface, this seems to make sense because you want all of the
customers regardless of whether they have orders, and you know that every
Order Detail will have an OrderID that matches one in the Orders table:

SELECT CompanyName,
SUM([Order Details].UnitPrice * [Order Details].Quantity)
AS TotalSold
FROM dbo.Customers LEFT JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
INNER JOIN dbo.[Order Details]
ON Orders.OrderID = [Order Details].OrderID
GROUP BY CompanyName
ORDER BY CompanyName;

However, the results of the query don’t show the customers that you know
haven’t placed an order yet.

The problem is that the left join precedes the inner join in the query. The inner
join, which is executed second, excludes all the null rows generated by the left
join, so you might as well have used two inner joins. You need to revise the
query so that the inner join comes first. Join the Orders to Order Details first
and then use a right join on Customers for the correct result:

SELECT CompanyName,
SUM([Order Details].UnitPrice *
[Order Details].Quantity) AS TotalSold
FROM dbo.Orders INNER JOIN dbo.[Order Details]
ON Orders.OrderID = [Order Details].OrderID
RIGHT JOIN dbo.Customers
ON Orders.CustomerID = Customers.CustomerID
GROUP BY CompanyName
ORDER BY CompanyName;

Scrolling through the results, you can see that the customers without orders are
now showing up as expected.

Another way of solving this problem is to use only left joins:


SELECT CompanyName, SUM([Order Details].UnitPrice *
[Order Details].Quantity) AS TotalSold
FROM dbo.Customers LEFT JOIN dbo.Orders
ON Orders.CustomerID = Customers.CustomerID
LEFT JOIN dbo.[Order Details]
ON Orders.OrderID = dbo.[Order Details].OrderID
GROUP BY CompanyName
ORDER BY CompanyName;

Even though you know that every Order has at least one matching Order
Detail, and that you shouldn’t ordinarily need an outer join between Orders
and Order Details, using the outer join here ensures that results from the first
join that have a NULL OrderID won’t be excluded.
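
The difference between the two join orders can be checked on a tiny data set. The sketch below assumes three hypothetical table variables that mimic Customers, Orders, and Order Details; the company names and amounts are made up. One customer has an order and the other has none:

```sql
-- Hypothetical stand-ins for Customers, Orders, and Order Details.
DECLARE @Customers    TABLE (CustomerID  char(5) NOT NULL PRIMARY KEY,
                             CompanyName varchar(40) NOT NULL);
DECLARE @Orders       TABLE (OrderID     int NOT NULL PRIMARY KEY,
                             CustomerID  char(5) NOT NULL);
DECLARE @OrderDetails TABLE (OrderID     int NOT NULL,
                             UnitPrice   money NOT NULL,
                             Quantity    smallint NOT NULL);

INSERT INTO @Customers (CustomerID, CompanyName)
SELECT 'AAAAA', 'Company With Orders' UNION ALL
SELECT 'BBBBB', 'Company Without Orders';

INSERT INTO @Orders (OrderID, CustomerID)
SELECT 1, 'AAAAA';

INSERT INTO @OrderDetails (OrderID, UnitPrice, Quantity)
SELECT 1, 10.00, 2 UNION ALL
SELECT 1, 5.00, 1;

-- LEFT JOIN followed by INNER JOIN: the NULL OrderID produced for the
-- customer without orders cannot match any detail row, so that customer
-- is dropped. Returns 1 row.
SELECT C.CompanyName, SUM(D.UnitPrice * D.Quantity) AS TotalSold
FROM @Customers AS C LEFT JOIN @Orders AS O
ON C.CustomerID = O.CustomerID
INNER JOIN @OrderDetails AS D
ON O.OrderID = D.OrderID
GROUP BY C.CompanyName
ORDER BY C.CompanyName;

-- LEFT JOIN followed by LEFT JOIN: the customer without orders survives
-- with a NULL total. Returns 2 rows.
SELECT C.CompanyName, SUM(D.UnitPrice * D.Quantity) AS TotalSold
FROM @Customers AS C LEFT JOIN @Orders AS O
ON C.CustomerID = O.CustomerID
LEFT JOIN @OrderDetails AS D
ON O.OrderID = D.OrderID
GROUP BY C.CompanyName
ORDER BY C.CompanyName;
```

In both queries the customer with an order shows a total of 25.00 (10.00 x 2 plus 5.00 x 1); only the second query also lists the customer without orders.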

TIP: When it comes to performance, inner joins usually win. Use outer joins
only when you need the unmatched rows. With an inner join, SQL Server can
discard unmatched rows early and is free to choose which table to read first;
an outer join must preserve every row of the outer table, which often means
reading that whole table. However, if you use a WHERE clause that restricts
the rows returned in the outer join, SQL Server may elect to use an index on
the outer join table, speeding up data retrieval.

Self Joins
A self join is counter-intuitive: why join a table to itself? But self joins are the
best solution to certain kinds of problems.

An example of an appropriate use for a self join is to track the manager each
employee reports to. In the Employees table in the Northwind database, the
ReportsTo column contains the EmployeeID of the person to whom each
employee reports. Employees who don't report to anyone (presumably the top
people) have a null value in that field.

In a self join, both sides of the join are the same table, but each side is
referenced separately, using an alias to distinguish them. An alias allows you
to assign a custom name to one or both of the instances of the table, which
allows the query processor to treat them as separate tables that happen to
contain the same data. In this query, the first reference to the Employees table
uses the default table name. The second reference to the Employees table uses
the alias of Managers. The Employees table is joined to the virtual Managers
table on ReportsTo in Employees and EmployeeID in Managers:


SELECT Employees.FirstName + ' ' + Employees.LastName
AS EmployeeName, Managers.FirstName + ' ' +
Managers.LastName AS ManagerName
FROM dbo.Employees INNER JOIN dbo.Employees AS Managers
ON Employees.ReportsTo = Managers.EmployeeID

The result set shown in Figure 33 displays only employees with managers.

Figure 33. A self join query using an inner join returns only rows with data.

To return all employees, use a left join between Employees and the alias table
Managers instead of an inner join:

SELECT Employees.FirstName + ' ' + Employees.LastName
AS EmployeeName, Managers.FirstName + ' ' +
Managers.LastName AS ManagerName
FROM dbo.Employees LEFT JOIN dbo.Employees AS Managers
ON Employees.ReportsTo = Managers.EmployeeID

The result, shown in Figure 34, lists all employees, including the big boss.


Figure 34. A self join query using a left join returns all rows.

TIP: Self joins are useful when the data you want to join exists in two different
columns in a single table. You can also use self joins to join each row in a
table to the row that comes before or after it in a particular order. For
example, if your rows all have consecutive dates, you could join the date
column to an expression that adds one day to that date. This would allow you
to use expressions that compare or perform calculations on the values in
adjoining rows.
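
The adjoining-row technique described in the tip can be sketched as follows. The table variable @DailySales, its columns, and its values are hypothetical; the aliases Cur and Prev are simply two references to the same table:

```sql
-- Hypothetical table with one row per consecutive day.
DECLARE @DailySales TABLE (SaleDate datetime NOT NULL PRIMARY KEY,
                           Amount   money NOT NULL);

INSERT INTO @DailySales (SaleDate, Amount)
SELECT '19960901', 100.00 UNION ALL
SELECT '19960902', 120.00 UNION ALL
SELECT '19960903', 90.00;

-- Join each day (Cur) to the day before it (Prev) by adding one day to
-- the earlier date, then compare the two amounts.
SELECT Cur.SaleDate,
       Cur.Amount,
       Cur.Amount - Prev.Amount AS ChangeFromPriorDay
FROM @DailySales AS Cur INNER JOIN @DailySales AS Prev
ON Cur.SaleDate = DATEADD(day, 1, Prev.SaleDate)
ORDER BY Cur.SaleDate;
-- Returns 2 rows: September 2 (+20.00) and September 3 (-30.00).
```

The first day has no prior row, so the inner join leaves it out; switching to a left join would keep it with a NULL change.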


Summary
• SQL is a comprehensive querying language for creating database
objects and for retrieving and modifying data.
• The basic SELECT statement includes a list of columns and a table to
get the columns from.
• You can restrict the rows that the query returns by using the WHERE
clause.
• You can order the rows that the query returns using the ORDER BY
clause.
• You can summarize data in tables by using aggregate functions.
• The GROUP BY clause creates sophisticated aggregate queries.
• Joining is a necessity with relational databases, because related data is
separated into tables (called normalization).
• A join with no joining criteria results in a Cartesian product, in which
the rows of the first table are combined with all the rows of the second
table.
• Primary key/foreign key joins are the most efficient way to join tables.
• There are several joining notations. The SQL92 standard, which uses
the FROM clause, is the preferred joining notation.
• Inner joins combine rows based on the joining criteria to the exclusion
of all other rows.
• Left outer joins combine rows based on the joining criteria and also
include all rows of the first table in the join.
• Right outer joins combine rows based on the joining criteria and also
include all rows of the second table in the join.
• Full outer joins combine rows based on the joining criteria and also
include all rows from both tables in the join.
• A self join is a special form of inner or outer join that involves only
one table, joined to another aliased instance of itself.


Questions
1. How can you retrieve the last name and first name of all employees in the
Employees table?

2. Which wildcard character do you use with the LIKE keyword to indicate
“all characters”?

3. How can you count the number of employees from each region in the
Employees table?

4. Why are FROM clause joins preferable to WHERE clause joins?

5. What sort of join would you use to return the rows from the Order Details
table for a specific order?

6. What kind of join would you use to return the names of managers where
the Manager ID is stored in the Employees table for each employee?


Answers
1. How can you retrieve the last name and first name of all employees in the
Employees table?
SELECT LastName, FirstName FROM dbo.Employees

2. Which wildcard character do you use with the LIKE keyword to indicate
“all characters”?
The percent sign, or '%'

3. How can you count the number of employees from each region in the
Employees table?
SELECT Region, COUNT(*)
FROM dbo.Employees
GROUP BY Region

4. Why are FROM clause joins preferable to WHERE clause joins?
FROM clause joins use the explicit JOIN keyword and require a
join criterion, making Cartesian Products less likely.

5. What sort of join would you use to return the rows from the Order Details
table for a specific order?
An inner join between the Orders table and Order Details table

6. What kind of join would you use to return the names of managers where
the manager ID is stored in the Employees table for each employee?
A self-join on the Employees table between the Manager ID and
the employee ID
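
The answers to questions 2, 4, 5, and 6 can be sketched as queries. This is an illustration only. It assumes the standard Northwind sample schema, in which the manager's ID is stored in the ReportsTo column of the Employees table and order 10248 exists. The search pattern 'D%' and the table aliases are arbitrary choices, not part of the original answers.

```sql
USE Northwind;

-- Answer 2: '%' matches any run of characters (including none),
-- so this returns employees whose last name starts with D.
SELECT LastName, FirstName
FROM dbo.Employees
WHERE LastName LIKE 'D%';

-- Answer 4: a WHERE clause join. If the join condition in the WHERE
-- clause were left out, the query would still run and would return
-- a Cartesian product of the two tables.
SELECT o.OrderID, od.ProductID
FROM dbo.Orders AS o, dbo.[Order Details] AS od
WHERE o.OrderID = od.OrderID;

-- Answers 4 and 5: the same join in the FROM clause, limited to one
-- order. INNER JOIN requires an ON clause, so the join criterion
-- cannot be forgotten.
SELECT o.OrderID, o.OrderDate, od.ProductID, od.Quantity
FROM dbo.Orders AS o
INNER JOIN dbo.[Order Details] AS od
    ON o.OrderID = od.OrderID
WHERE o.OrderID = 10248;  -- assumed sample order number

-- Answer 6: a self-join. Employees appears twice under different
-- aliases: e is the employee, m is that employee's manager.
SELECT e.LastName AS Employee, m.LastName AS Manager
FROM dbo.Employees AS e
INNER JOIN dbo.Employees AS m
    ON e.ReportsTo = m.EmployeeID;
```

Because the self-join is an inner join, an employee with no manager (a NULL ReportsTo value) does not appear in the last result. Changing INNER JOIN to LEFT JOIN would list that employee with a NULL manager.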


Lab 4: Data Selection Queries


Lab 4 Overview
In this lab you’ll learn the fundamentals of building SELECT queries.

To complete this lab, you’ll need to work through five exercises:

• Simple Select Query
• Aggregate Query
• Joining Tables with an Inner Join
• Aggregate Query with Multiple Inner Joins
• Aggregate Query with Inner and Outer Joins

Each exercise includes an “Objective” section that describes the purpose of the
exercise. You are encouraged to try to complete the exercise from the
information given in the Objective section. If you require more information to
complete the exercise, the Objective section is followed by detailed step-by-
step instructions.


Simple Select Query

Objective
In this exercise, you’ll build a simple select query to retrieve a list of products,
the quantity per unit for each product, and the unit price.

Things to Consider
• Which table in the database contains the data you want to retrieve?
• Which fields are you going to retrieve in the query?
• Are there any expressions that you could use to improve the query?
• In which order should the results appear?

Step-by-Step Instructions
1. Open a query window and type the following statements to use the
Northwind database and to see the column names in the Products table.
Because the condition 1=0 is never true, the query returns the column
headings without any rows. Note the names of the columns.

USE Northwind;
SELECT * FROM dbo.Products WHERE 1=0;

2. Using the column names returned by the previous query, you're ready to
create the query. Concatenate the product name and quantity per unit, and
select the unit price. Sort by the product name. The following query will
produce the results shown in Figure 35, which displays the first few rows
of the result set.

SELECT ProductName + ' sold in ' +
    QuantityPerUnit AS Packaging, UnitPrice
FROM dbo.Products
ORDER BY ProductName


Figure 35. Products sorted by name.

3. Execute the query by pressing the F5 key, clicking the Execute button, or
choosing Query|Execute from the menu.

4. Change the ORDER BY clause to sort by UnitPrice in descending order.
The following query will produce the results shown in Figure 36, which
displays the first few rows of the result set.

SELECT ProductName + ' sold in ' +
    QuantityPerUnit AS Packaging, UnitPrice
FROM dbo.Products
ORDER BY UnitPrice DESC

Figure 36. Products sorted by price in descending order.

5. Execute the query.
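
Two variations on this exercise are sketched below as optional illustrations; they are not part of the original lab. The first lists the column names through the INFORMATION_SCHEMA.COLUMNS view instead of the WHERE 1=0 technique. The second folds the price into the concatenated string. It assumes that UnitPrice is a money column, as in the standard Northwind schema, so it has to be converted to a character type before it can be concatenated. The ' at ' text and the 'unknown quantity' placeholder are arbitrary choices.

```sql
USE Northwind;

-- List the columns of dbo.Products with their data types.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'Products'
ORDER BY ORDINAL_POSITION;

-- Concatenating a string with a money value directly would normally
-- fail with a conversion error, so CAST turns the price into text first.
-- ISNULL guards against a NULL QuantityPerUnit, which would otherwise
-- make the whole concatenated expression NULL.
SELECT ProductName + ' sold in ' +
    ISNULL(QuantityPerUnit, 'unknown quantity') +
    ' at ' + CAST(UnitPrice AS varchar(20)) AS Packaging
FROM dbo.Products
ORDER BY ProductName;
```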


Aggregate Query

Objective
In this exercise, you’ll build an aggregate query to find out how many units are
in stock and how many are on order for products where the UnitPrice is greater
than 30 and there is at least 1 unit on order. The query will then calculate and
display the total units by adding the units in stock to the units on order. You'll
then revise the query so that you only display rows where there are more than
40 total units.

Things to Consider
• Which fields should you use to create the query?
• What kind of aggregate function are you going to need?
• Which fields should you include in the GROUP BY clause?
• When do you use a HAVING clause?

Step-by-Step Instructions
1. To determine the total units, you’ll need to sum the units in stock and the
units on order. Because ProductName, UnitsInStock, and UnitsOnOrder are not
part of the aggregate function, they have to be included in the GROUP BY
clause. For clarity, order the query by the summed total in descending order.

SELECT ProductName, UnitsInStock, UnitsOnOrder,
    SUM(UnitsInStock + UnitsOnOrder) AS TotalUnits
FROM dbo.Products
WHERE UnitPrice > 30 AND UnitsOnOrder > 0
GROUP BY ProductName, UnitsInStock, UnitsOnOrder
ORDER BY TotalUnits DESC

2. Execute the query and view the results as shown in Figure 37.

Figure 37. Total units for products with a unit price greater than 30 and at least 1
unit on order.

3. To display only rows where the TotalUnits value is greater than 40, add the
HAVING clause shown below to the query. Execute the query. The results
should look like those shown in Figure 38.

SELECT ProductName, UnitsInStock, UnitsOnOrder,
    SUM(UnitsInStock + UnitsOnOrder) AS TotalUnits
FROM dbo.Products
WHERE UnitPrice > 30 AND UnitsOnOrder > 0
GROUP BY ProductName, UnitsInStock, UnitsOnOrder
HAVING SUM(UnitsInStock + UnitsOnOrder) > 40
ORDER BY TotalUnits DESC

Figure 38. Displaying only rows where the total units are more than 40.
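
The difference between WHERE and HAVING can be sketched with two further queries, offered as illustrations rather than as part of the original lab. They assume that product names are unique in dbo.Products, which is true of the sample data but is not confirmed by the lab text. Under that assumption every group in the lab query holds exactly one row, so the same result can be produced without GROUP BY. The second query groups by CategoryID instead, so that SUM really does combine several rows per group.

```sql
USE Northwind;

-- Row-level equivalent of the lab query: each product is a single row,
-- so the total is a plain expression and the "more than 40" test can
-- go in the WHERE clause.
SELECT ProductName, UnitsInStock, UnitsOnOrder,
    UnitsInStock + UnitsOnOrder AS TotalUnits
FROM dbo.Products
WHERE UnitPrice > 30
  AND UnitsOnOrder > 0
  AND UnitsInStock + UnitsOnOrder > 40
ORDER BY TotalUnits DESC;

-- A multi-row aggregate: WHERE filters individual rows before grouping,
-- while HAVING filters the groups after SUM has been calculated.
SELECT CategoryID,
    SUM(UnitsInStock + UnitsOnOrder) AS TotalUnits
FROM dbo.Products
WHERE UnitPrice > 30
GROUP BY CategoryID
HAVING SUM(UnitsInStock + UnitsOnOrder) > 40
ORDER BY TotalUnits DESC;
```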


Joining Tables with an Inner Join

Objective
In this exercise, you’ll create a query that lists all the products and the category
that each product is in. The data for this query is stored in two tables: Products
and Categories. You’ll insert a new category named All Diet into the
Categories table. You’ll first create a query using an inner join to display
only the categories that have at least one product, along with those products.
You’ll then create a query using a left join that displays all categories,
including any that have no products.

Things to Consider
• What are the names of the tables and fields for the query?
• Which field in each table will you use for the join?
• What type of join should you use for the query?

Step-by-Step Instructions
1. Use a SELECT statement to determine the column names you need to use
to insert a row in the Categories table.

SELECT * FROM dbo.Categories WHERE 1=0;

2. Execute this statement to insert a new row for the All Diet category into
the Categories table.

INSERT INTO dbo.Categories(CategoryName, Description)
VALUES ('All Diet', 'Low carb, low protein, all types of dieting items')

3. The CategoryID field is common to both the Products and Categories
tables. It is the primary key of Categories and a foreign key in Products,
making it the ideal joining field.

4. Make sure you specify both the table names and field names every time a
field is referenced in the query:


SELECT Categories.CategoryName, Products.ProductName
FROM dbo.Categories INNER JOIN dbo.Products
    ON Categories.CategoryID = Products.CategoryID
ORDER BY Categories.CategoryName

5. Execute the query and view the results as shown in Figure 39, which
displays the first few rows of the result set.

Figure 39. Joining Categories and Products.

6. Next, display all of the categories regardless of whether they have any
products associated with them. Use the LEFT JOIN syntax shown here.

SELECT Categories.CategoryName, Products.ProductName
FROM dbo.Categories LEFT JOIN dbo.Products
    ON Categories.CategoryID = Products.CategoryID
ORDER BY Categories.CategoryName

7. Execute the query. The results are displayed in Figure 40.

Figure 40. All categories and products.
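
To confirm which categories appear only because of the left join, the query can be narrowed to the rows that found no match. This is a sketch rather than part of the original lab, and the table aliases c and p are arbitrary. If the All Diet row was inserted as described above and no product references it, that category should be among the rows returned.

```sql
USE Northwind;

-- A left join keeps every category. Where no product matches, the
-- Products columns come back as NULL, so testing the Products key
-- for NULL isolates the categories that have no products.
SELECT c.CategoryID, c.CategoryName
FROM dbo.Categories AS c
LEFT JOIN dbo.Products AS p
    ON c.CategoryID = p.CategoryID
WHERE p.ProductID IS NULL
ORDER BY c.CategoryName;
```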


Aggregate Query with Multiple Inner Joins

Objective
In this exercise, you’ll modify the query from the previous exercise to turn it
into an aggregate query with multiple inner joins. Instead of listing only the
product name and category, you’ll also include the total sales for that product.

Things to Consider
• Which tables are needed to retrieve the data for the query? Which
fields?
• Which fields should you use to join the tables together?
• What type of join should you use?
• Which fields should be aggregated? What aggregate function should
you use?
• Which fields should you include in the GROUP BY clause?

Step-by-Step Instructions
1. The Order Details table is required in order to aggregate the data.

2. The ProductID field is the common field between the Products table and
the Order Details table. CategoryID remains the common field between
the Products table and the Categories table.

3. To get total sales, the Quantity must be multiplied by the UnitPrice in the
Order Details table. To aggregate the results together, you must use the
SUM aggregate function.

4. Since the ProductName and CategoryName are going to be displayed and
are not included in the aggregate function, they must appear in the
GROUP BY clause. Sort by CategoryName and ProductName.


SELECT Categories.CategoryName, Products.ProductName,
    SUM([Order Details].UnitPrice*[Order Details].Quantity)
    AS Total
FROM dbo.Categories INNER JOIN dbo.Products
    ON Categories.CategoryID = Products.CategoryID
INNER JOIN dbo.[Order Details]
    ON Products.ProductID = [Order Details].ProductID
GROUP BY Categories.CategoryName, Products.ProductName
ORDER BY Categories.CategoryName, Products.ProductName

5. Execute the query and view the results as shown in Figure 41.

Figure 41. An inner join showing categories, products, and totals.
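
The lab query totals UnitPrice times Quantity and ignores discounts. As a hedged variation, the sketch below assumes the standard Northwind schema, in which Order Details also has a Discount column holding a fraction between 0 and 1. The column alias NetTotal and the table aliases c, p, and od are illustrative names, not part of the original lab.

```sql
USE Northwind;

-- Same joins and grouping as the lab query, with shorter aliases.
-- Multiplying by (1 - Discount) reduces each line by its discount
-- before the lines are summed.
SELECT c.CategoryName, p.ProductName,
    SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS NetTotal
FROM dbo.Categories AS c
INNER JOIN dbo.Products AS p
    ON c.CategoryID = p.CategoryID
INNER JOIN dbo.[Order Details] AS od
    ON p.ProductID = od.ProductID
GROUP BY c.CategoryName, p.ProductName
ORDER BY c.CategoryName, p.ProductName;
```

Table aliases also avoid repeating the bracketed [Order Details] name in every column reference.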


Aggregate Query with Inner and Outer Joins

Objective
In this exercise, you’ll modify the query from the previous exercise to use an
outer join. The goal is to make certain that all products are included in the
query.

Things to Consider
• Which join should you change to an outer join?
• What kind of outer join should you use?
• In what order should the joins occur?

Step-by-Step Instructions
1. There are only two joins in the query, the inner join between Categories
and Products, and the inner join between Products and Order Details.

2. Because the previous query used only inner joins, it is possible that
joining the Products table excluded some categories that did not have
matching products, and that joining the Order Details table as an inner
join excluded some products that have no matching order rows.

3. The simplest solution is to change all of the inner joins to left joins. The
modified query is as follows:


SELECT Categories.CategoryName, Products.ProductName,
    SUM([Order Details].UnitPrice*[Order Details].Quantity)
    AS Total
FROM dbo.Categories LEFT JOIN dbo.Products
    ON Categories.CategoryID = Products.CategoryID
LEFT JOIN dbo.[Order Details]
    ON Products.ProductID = [Order Details].ProductID
GROUP BY Categories.CategoryName, Products.ProductName
ORDER BY Categories.CategoryName, Products.ProductName

4. Execute the query and view the results, shown in Figure 42.

Figure 42. Using left joins to display all rows.
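
With left joins, a category that has no products, or a product that has no order rows, produces a NULL Total, because SUM over a group with no non-NULL values returns NULL. The following sketch shows one way to display 0 instead. The use of ISNULL and the table aliases are choices made for this illustration, not part of the original lab.

```sql
USE Northwind;

-- ISNULL replaces the NULL that SUM returns for unmatched rows with 0.
-- ProductName is still NULL for a category that has no products.
SELECT c.CategoryName, p.ProductName,
    ISNULL(SUM(od.UnitPrice * od.Quantity), 0) AS Total
FROM dbo.Categories AS c
LEFT JOIN dbo.Products AS p
    ON c.CategoryID = p.CategoryID
LEFT JOIN dbo.[Order Details] AS od
    ON p.ProductID = od.ProductID
GROUP BY c.CategoryName, p.ProductName
ORDER BY c.CategoryName, p.ProductName;
```

Because the query starts from Categories, a product whose CategoryID is NULL would still be left out. Starting the FROM clause with dbo.Products and left joining the other two tables is one way to guarantee that every product appears.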
