SQL Server 2005

Chapter 1 provides an overview of SQL Server 2005, emphasizing its role as a secure and efficient database engine for managing business applications and supporting Business Intelligence (BI) solutions. It discusses the architecture of database servers, including single-tier, two-tier, three-tier, and n-tier architectures, and outlines the core components of SQL Server 2005 such as the Database Engine, Integration Services, Analysis Services, and Reporting Services. The chapter highlights the importance of SQL and the tools available in SQL Server 2005 for database developers to enhance productivity.


Chapter 1

Overview of SQL Server 2005


In today’s competitive environment, an organization needs a secure, reliable, and productive data
platform for its business applications. The SQL Server 2005 database engine provides a platform
to build and manage data applications. In addition, it combines data analysis, reporting,
integration, and notification services that enable organizations to build and deploy efficient
Business Intelligence (BI) solutions.

This chapter discusses the importance of a database server. It then provides an overview
of SQL Server 2005, its components, and its features. It also introduces the
Structured Query Language (SQL), which is used to manipulate data in a database server. Lastly, it
discusses the tools provided by SQL Server 2005 to improve the productivity of database
developers and to manage the server.

Objectives
In this chapter, you will learn to:
Appreciate SQL Server 2005 as a database server
Identify the SQL Server 2005 tools

Introduction to SQL Server 2005


Every organization needs to maintain information related to employees, customers, business
partners, or business transactions. Organizations build business applications with a user-friendly
interface to store and manipulate this information and to generate reports. In addition, they need
a platform to store and maintain this information in an efficient way. Various database
management systems (DBMS) and relational database management systems (RDBMS), such as
SQL Server 2005, Oracle, and Sybase, can be used to maintain this information.

SQL Server 2005 is a database engine introduced by Microsoft. It provides an environment used to
create databases. It allows secure and efficient storage and management of data. In addition, it
provides other components and services that support the business intelligence platform to generate
reports and help in data analysis.

As a database developer, it is important for you to identify the role of a database server in an
organization. You can design a database effectively if you know all the components and services
of SQL Server 2005. In addition, you need to understand the basics of SQL, the language that is
used to query and manage data.

Role of a Database Server


Organizations have always stored and managed business data. Earlier, organizations stored
data on paper. With the increase in the use of computers, organizations started
maintaining the same information on computers. Data was stored in an organized way, and it
could also be retrieved faster than before. As data retrieval became easy and fast, organizations
started using business applications to support their business operations.
Business applications accept data as input, process the data based on business requirements, and
provide data or information as output. For example, an application maintains the details of the
number of cars sold for a particular brand, such as Ferrari. Each car has a unique identification
number that is already stored in the application. Whenever a sale happens, the application checks
whether the unique identification number provided for the car is correct. If it is, the sale details
are updated in the application. The data is saved, and an output message confirming that the data
has been saved is displayed to the user. This process of checking whether the unique identification
number exists in the system is called a business rule.

Consider another example. The Human Resource department of an organization uses an
application to manage employee data. The users need to add the details of new employees. For
this, the application provides an interface to enter the employee details. These details are validated
for accuracy based on business rules. For example, a business rule is defined to check that the date
of joining of a new employee is less than or equal to the current date. If the data meets the
requirements, it is saved in the data store.
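In SQL Server, a business rule like the joining-date check above could be enforced directly in the database with a CHECK constraint. The sketch below is illustrative; the table and column names are assumed, not taken from the scenario:

```sql
-- Hypothetical Employee table; GETDATE() returns the current date and time.
CREATE TABLE Employee
(
    EmployeeID    int         NOT NULL PRIMARY KEY,
    EmployeeName  varchar(50) NOT NULL,
    DateOfJoining datetime    NOT NULL,
    -- Business rule: the date of joining cannot be later than the current date.
    CONSTRAINT CK_Employee_DateOfJoining CHECK (DateOfJoining <= GETDATE())
);
```

An INSERT with a future DateOfJoining would then be rejected by the database engine itself, regardless of which application tier submits it.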

Based on the preceding scenario, a business application can have the following elements:

The user interface (UI) or the presentation element, through which data is entered.
The application logic or the business rule element, which helps in validating the entered
data.
The data storage or the data management element, which manages the storage and retrieval
of data.

These elements form the base of the models or architectures used in application development. All
these elements can exist on the same computer as a single process or on different computers as
different processes. Depending on the placement of these elements, the application architecture
can be categorized as:

Single-tier architecture
Two-tier architecture
Three-tier architecture
N-tier architecture

Single-Tier Architecture

In a single-tier architecture, all elements of a business application are combined in a single
process. If multiple users need to work on such an application, it needs to be installed on the
computer of every user. This type of architecture has a disadvantage: if errors are identified
in the application, then after rectifying them, the application has to be installed again on the
system of every user, which is a time-consuming process.

Two-Tier Architecture
To solve the problems of the single-tier architecture, the two-tier architecture was introduced. In
this architecture, the application is divided into two parts. One part handles the data, while the
other provides the user interface. Therefore, this architecture is called two-tier architecture. These two
parts can be located on a single computer or on separate computers over a network.
The part that handles the UI is called the client tier. The part that implements the application logic
and validates the input data based on the business rules is called the server tier, as shown in the
following figure.

Two-Tier Architecture

In this architecture, the maintenance, upgrade, and general administration of data is easier, as it
exists only on the server and not on all the clients.

Two-tier architecture is also called the client-server architecture. A client sends the request for a
service and a server provides that service. Most RDBMSs, such as Microsoft Access, SQL Server,
and Oracle, support the client-server architecture. RDBMS provides centralized functionality
while supporting many users.

Three-Tier Architecture

When implementing complex business solutions in a two-tier architecture, the tier on which the
business logic is implemented becomes overloaded. As a result, it takes more time to execute.
Therefore, to provide further flexibility, the two-tier architecture can be split into three tiers. In
three-tier architecture, the first tier is the client tier. The second or middle tier is called the business
tier. The third tier is called the server tier. The server tier contains a database server that manages
the data.

In this architecture, an additional tier called a business tier has been added between the client tier
and the server tier, as shown in the following figure.

Three-Tier Client/Server Architecture


The business tier contains all the business rules. It consists of the application logic that
implements the business rules and validates the data. The advantage of a three-tier application is
that it allows you to change the business rules without affecting the other two tiers.

For example, in a banking application for loans, the user tier is the front-end used by the customer
to specify the loan details. The server tier can consist of an RDBMS in which the data is stored.
The business tier lies between the other two tiers and consists of business rules, such as the loan
limit and the interest rate charged to a customer. If there is a change in the rate of interest, only
the middle tier component needs to be modified.

N-Tier Architecture

As business complexities increased, the business tier became larger and unmanageable. This
led to the evolution of the n-tier architecture, in which the business services model was divided into
smaller, manageable units. N-tier architecture is also called multi-tier architecture.

In this architecture, one component near the client tier is responsible for client-side validation
and for sending the data to the presentation tier. Therefore, it is possible to keep the UI-centric
processing component on a computer near the client. The UI-centric processing component is
responsible for processing the data retrieved from and sent to the presentation tier. In addition,
you may have another component near the database server to perform data manipulation and
validation. You can keep the data-centric processing components on another computer near the
database server, so that you gain significant performance benefits. Data-centric processing
components are responsible for accessing the data tier to retrieve, modify, and delete data from
the database server.

The n-tier architecture consists of the following layers:

Client tier
UI-centric processing components
Data-centric processing objects
Database server

The banking application, when further expanded, can represent an example of n-tier architecture.
The client tier would consist of the user interface, which would include the user interface controls,
such as forms, menus, and toolbars. The server tier would consist of data handling, including
saving data to the database server.

The business logic would include the rules and guidelines for different types of accounts, interest
rates, fixed deposits, ATMs, and loan rules. All these would form the middle tier. However, there
would be some rules that need to be implemented on the user interface and on the database. You
can place these rules either on the UI-centric processing components or data-centric processing
components, based on the functionality.

Applications that follow multi-tier architecture can be used across various locations. For example,
in Web applications, the application is stored on the Web server. The clients access the application
from any location through a browser. The clients make requests to and receive responses from the
Web server.
The Web server transfers the request for data to a database server, as shown in the following
figure.

Architecture of Web Applications

Depending on the type of the business rules, they can be implemented on any of the tiers, such as
Web clients, Web server, or the database server.

To support applications where users can send requests simultaneously, the database
server needs to be fast, reliable, and secure. SQL Server 2005 is one such complete database
platform that provides a fast, reliable, and secure RDBMS. It also helps in data analysis with
integrated BI tools.

A BI application is an application that is used by the top management of an organization to
analyze data and make future decisions. BI tools help in creating reports that enable data analysis.

SQL Server 2005 Components


SQL Server 2005 contains a number of components. Each component provides specific services
and support to the clients connected to the server.

The following figure displays the components of SQL Server 2005.


Components of SQL Server 2005

As shown in the preceding figure, SQL Server 2005 consists of the following core components:

Database Engine
Integration Services
Analysis Services
Reporting Services

Database Engine

The database engine provides support to store, query, process, and secure data on the database
server. The database engine allows you to create and manage database objects, such as tables.
Apart from providing support for data management, the database engine also provides the
following background services:

Service Broker: Provides support for asynchronous communication between clients and the
database server, enabling reliable query processing. The client application sends a request
to the database server and continues to work. The requests sent by the client are queued up
at the server if the server is not available to process them immediately. Service Broker
ensures that each request is processed whenever the server becomes available.

The following figure shows an example of an order processing system.

Service Broker
The preceding figure depicts an order processing system. The client applications send orders
to the database server to enter the order details. All these orders are placed in a queue, which
is managed by Service Broker.
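A minimal sketch of such a queue, using Service Broker DDL (the object names here are made up, and a full two-way conversation would also define message types and contracts):

```sql
-- Queue that holds incoming order messages until the server can process them.
CREATE QUEUE OrderQueue;

-- Service that sending applications address their order messages to.
CREATE SERVICE OrderService ON QUEUE OrderQueue;
```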

Replication: Allows you to copy and distribute data and database objects from one database
server to another. These servers can be located at remote locations to provide fast access to
users at widely distributed locations. After replicating data, SQL Server allows you to
synchronize different databases to maintain data consistency. For example, the database
servers for your organization might be located at different locations around the world. All
the servers store common data. To ensure that the data in all the servers is synchronized, you
can implement data replication. Replication follows the publisher/subscriber model. In this
model, the changes are sent out by one database server (“publisher”) and are received by
others (“subscribers”).

The following figure depicts the process of replication.

Replication Process

In the preceding figure, articles are the database objects to be replicated. These articles are
contained in a publication, which is the collection of database data that is replicated. The
distributor takes the publications from the publisher and distributes them to the subscribers.

Full-text search: Allows you to implement fast and intelligent search in large databases. It
allows you to search records containing certain words and phrases. You can search for
different forms of a specific word, such as ‘produce’, ‘produces’, or ‘production’. In
addition, you can search for synonyms of a given word, such as ‘garment’, ‘cloth’, or
‘fabric’.
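Assuming a full-text index already exists on the column, such searches can be expressed with the CONTAINS predicate and its FORMSOF generation term. The Products table below is hypothetical:

```sql
-- Match inflectional forms of 'produce' (produces, production, and so on).
SELECT ProductID, Description
FROM Products
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, produce)');

-- Match 'garment' and the synonyms listed for it in the thesaurus files.
SELECT ProductID, Description
FROM Products
WHERE CONTAINS(Description, 'FORMSOF(THESAURUS, garment)');
```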
Notification services: Allow you to generate and send notification messages to users
or administrators about any event. For example, the database administrator should be
notified when a table is created or deleted. The notification messages can be sent to a variety
of devices, such as computers or mobile devices. Notification services are a platform for
developing and deploying highly scalable notification applications. They allow developers to
build notification applications that send timely, personalized information updates, helping
to enhance customer relationships. For example, a brokerage firm can send stock and fund
prices based on each customer's preferences.

Integration Services

Data in different sources might be stored in different formats and structures. Integration services
allow you to gather and integrate this varied data in a consistent format in a common database
called the data warehouse. A data warehouse consists of integrated databases, which can be a
DBMS, text files, or flat files. A data warehouse is similar to a physical warehouse that stores raw
material or products for further distribution. The organization does not store useless materials or
products in its warehouse because it costs money and affects the ability to get products in and out
of the warehouse. Similarly, a data warehouse should not contain useless data. The data should be
meaningful and should be processed quickly. Therefore, a data warehouse is a large central
repository of data that helps in decision-making.

Consider a telecommunications company where the CEO notices that in the past one year, the
frequency of cancellation of services by its customers has increased considerably. The company
is unable to analyze the service preferences of the customers because data is scattered across
disparate data sources. These data sources contain data, spanning two decades. In such a case, a
data warehouse can be implemented to integrate two decades of historical data from disparate data
sources. The integrated data will provide a holistic view of the customers to the CEO.

The SQL Server 2005 Integration Services (SSIS) Import and Export Wizard provides a series of
dialog boxes that help you select the data source, the destination, and
the objects that will be transferred to create a data warehouse.

Analysis Services

Data warehouses are designed to facilitate reporting and analysis. Enterprises are increasingly
using data stored in data warehouses for analytical purposes to assist them in making quick
business decisions. The applications used for such analysis are termed business intelligence
(BI) applications. Data analysis assists in determining past trends and formulating future business
decisions. This type of analysis requires a large volume of data to reach a consistent level of
sampling.

In the telecommunications company scenario, with the help of the analysis tools querying on the
data warehouse, the CEO can identify the customers who are canceling their services. The
company can then use this information to provide attractive offers to the identified customers and
build loyalty. This kind of information analysis proves to be beneficial to the enterprise in the long
run. The enterprise can retain its customers by offering loyalty programs and schemes on the basis
of analysis on the historical data.

Consider another example of a soft drink manufacturer that uses data of the past few years to
forecast the quantity of bottles to be manufactured in the current month. These forecasts are based
on various parameters, such as the average temperature during the last few years, purchasing
capacity of the customers, age group of customers, and past trends of consumption. The
requirements for such an analysis include:
A large volume of data
Historical data, that is, data stored over a period of time

Therefore, analysis services help in data analysis in a BI application. Microsoft SQL Server 2005
Analysis Services (SSAS) provides Online Analytical Processing (OLAP) for BI applications.
OLAP arranges the data in the data warehouse in an easily accessible format. This technology
enables the data warehouse to support online analysis of the data.

Reporting Services

Reporting services provide support to generate complete reports on data in the database engine or
in the data warehouse. These services provide a set of tools that help in creating and managing
different types of reports in different formats. Using these services, you can create centralized
reports that can be saved to a common server. Reporting services provide secure and restricted
access to these reports.

Microsoft SQL Server 2005 Reporting Services (SSRS) helps in creating Web-based reports that
are based on content stored in a variety of data sources. You can also publish these reports in
different formats.

The following figure shows the usage of the various SQL Server core components in a BI
application:

Core Components of SQL Server 2005

SQL Server Integration with the .NET Framework


Unlike the earlier versions of SQL Server, Microsoft SQL Server 2005 is integrated with the
.NET Framework, as shown in the following figure.
Integration of SQL Server 2005 with the .NET Framework

The .NET Framework is an environment used to build, deploy, and run business applications.
These applications can be built in various programming languages supported by the .NET
Framework. It has its own collection of services and classes. It exists as a layer between the .NET
applications and the underlying operating system.

SQL Server 2005 uses various services provided by the .NET Framework. For example, the
notification services component is based on the .NET Framework. This component uses the .NET
Framework services to generate and send notification messages.

The .NET Framework is also designed to make improvements in code reuse, code specialization,
resource management, multilanguage development, security, deployment, and administration.
Therefore, it helps bridge the gap of interoperability between different applications.

The .NET Framework consists of the following components:

Development tools and languages
Base Class Library
Common Language Runtime (CLR)

Development Tools and Languages

The development tools and languages are used to create the interface for Windows Forms, Web
Forms, and console applications. The development tools include Visual Studio 2005 and Visual
C# Developer. The languages that can be used are Visual Basic .NET, C#, and J#. These components
are based on the .NET Framework base classes.

Base Class Library

The .NET Framework consists of a class library that acts as a base for any .NET language, such
as Visual Basic .NET and C#. This class library is built on the object-oriented nature of the
runtime. It provides classes that can be used in code to accomplish a range of common
programming tasks, such as string management, data collection, database connectivity, and file
access.

Common Language Runtime (CLR)


CLR is one of the most essential components of the .NET Framework. It provides an environment
for the application to run. CLR or the runtime provides functionalities, such as exception handling,
security, debugging, and versioning support to the applications.

Some of the features provided by CLR are:

Automatic memory management: Allocates and deallocates memory to the application as
and when required.
Standard type system: Provides a set of common data types in the form of the Common Type
System (CTS). This means that the size of integer and long variables is the same across all
programming languages.
Language interoperability: Provides the ability of an application to interact with another
application written in a different programming language. This also helps maximize code
reuse.
Platform independence: Allows execution of a code from any platform that supports the
.NET CLR.
Security management: Applies restrictions on the code to access the resources of a
computer.

CLR can host a variety of languages. It offers a common set of tools across these languages,
ensuring interoperability between code written in different languages. Code developed with a
language compiler that targets the CLR is called managed code.

Alternatively, the code that is developed without considering the rules and requirements of the
common language runtime is called unmanaged code. Unmanaged code executes in the common
language runtime environment with minimal services. For example, unmanaged code may run
with limited debugging and without the garbage collection process.

The components of SQL Server 2005 and the .NET Framework provide various features to the
database server. These features help developers manage data in an efficient way.

Features of SQL Server 2005


The components of SQL Server 2005 help improve the database management and developer
productivity. These benefits are provided by the following features:

Built-in support for Extensible Markup Language (XML) data: Allows you to store and
manage XML data in variables or columns of the XML data type. The XML data is the data
stored in a structured format. This format can be used across different platforms and
applications built by using different languages.
CLR integration: Allows you to implement programming logic in any language supported
by the .NET Framework.
Scalability: Allows partitioning of database tables to help in parallel processing of queries.
This makes the database scalable and improves the performance of queries.
Service-oriented architecture: Provides a distributed, asynchronous application
framework for large-scale applications. This allows the database clients to send requests to
the database server even if the server is not available to process the request immediately.
Support for Web services: Allows you to provide direct access to the data from the Web
services by implementing the HTTP endpoints.
High level of security: Implements high security by enforcing policies for logon passwords.
Administrators can also manage permissions on database objects granted to different users.
High availability: Ensures that the database server is available to all users at all times. This
reduces the downtime of the server. In SQL Server 2005, high availability is implemented
with the help of database mirroring, failover clustering, and database snapshots.
Support for data migration and analysis: Provides tools to migrate data from different
data sources to a common database. In addition, it allows building the data warehouse on
this data that can support BI applications for data analysis and decision-making.
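For example, the built-in XML support means a column can be declared with the xml data type and queried with XQuery. The table and document below are illustrative only:

```sql
-- Hypothetical table with a column of the built-in xml data type.
CREATE TABLE PurchaseOrders
(
    OrderID      int NOT NULL PRIMARY KEY,
    OrderDetails xml NOT NULL
);

INSERT INTO PurchaseOrders (OrderID, OrderDetails)
VALUES (1, '<Order><Item Qty="2">Widget</Item></Order>');

-- XQuery against the stored document: read the Qty attribute of the first Item.
SELECT OrderDetails.value('(/Order/Item/@Qty)[1]', 'int') AS Quantity
FROM PurchaseOrders;
```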

SQL
As a database developer, you need to manage the database to store, access, and modify data. SQL
is the core language used to perform these operations on the data. SQL, pronounced "sequel",
is a language that is used to manage data in an RDBMS. The language was developed by IBM in
the 1970s. It follows the International Organization for Standardization (ISO) and American
National Standards Institute (ANSI) standards.

Most database systems have created customized versions of the SQL language. For example,
Transact-SQL (T-SQL) is the scripting language used for programming on SQL Server, while
PL/SQL is used for programming in Oracle. T-SQL conforms to the ANSI SQL-92 standard
published by ANSI and ISO in 1992.

The SQL statements can be categorized as:


Data Definition Language (DDL): Is used to define the database, data types, structures,
and constraints on the data. Some of the DDL statements are:
 CREATE: Used to create a new database object, such as a table.
 ALTER: Used to modify the database objects.
 DROP: Used to delete the objects.

Data Manipulation Language (DML): Is used to manipulate the data in the database
objects. Some of the DML statements are:
 INSERT: Used to insert a new data record in a table.
 UPDATE: Used to modify an existing record in a table.
 DELETE: Used to delete a record from a table.

Data Control Language (DCL): Is used to control the data access in the database. Some of
the DCL statements are:
 GRANT: Used to assign permissions to users to access a database object.
 REVOKE: Used to deny permissions to users to access a database object.

Data Query Language (DQL): Is used to query data from the database objects. SELECT
is the DQL statement that is used to select data from the database in different ways and
formats.
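The four categories can be illustrated with one short, self-contained statement each (the table, column, and user names are invented for the example):

```sql
-- DDL: create a table.
CREATE TABLE Customers (CustomerID int PRIMARY KEY, CustomerName varchar(50));

-- DML: add a row, then modify it.
INSERT INTO Customers (CustomerID, CustomerName) VALUES (1, 'John');
UPDATE Customers SET CustomerName = 'John Smith' WHERE CustomerID = 1;

-- DCL: allow a (hypothetical) user to read the table.
GRANT SELECT ON Customers TO SalesUser;

-- DQL: query the data back.
SELECT CustomerID, CustomerName FROM Customers;
```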

SQL is not a case-sensitive language. Therefore, you can write the statements in any case,
lowercase or uppercase. For example, you can use the SELECT statement in lowercase as ‘select’
or in title case as ‘Select’.
Just a minute:
Which of the following features of SQL Server 2005 allows the developers to implement
their programming logic in any language supported by the .NET Framework?

1. Support for data migration
2. High availability
3. CLR integration
4. Scalability

Answer:
3. CLR integration

Identifying SQL Server 2005 Tools


SQL Server 2005 provides various tools that help improve the efficiency of database developers.
SQL Server Management Studio is one such tool that helps in creating and maintaining database
objects. SQL Server Business Intelligence Development Studio is another tool that helps in
creating and implementing BI solutions. The server also provides tools, such as the Database Engine
Tuning Advisor and SQL Server Configuration Manager, that help the database administrator
configure the server and optimize its performance.

Before you start working with SQL Server 2005, it is important to identify the various tools
provided by the server and their features.

SQL Server Management Studio


SQL Server Management Studio is a powerful tool associated with SQL Server 2005. It provides
a simple and integrated environment for developing and managing the SQL Server database
objects. The following figure shows the SQL Server Management Studio interface.
SQL Server Management Studio Interface

The following table lists the main components of the SQL Server Management Studio interface.

Object Explorer: Provides the ability to register, browse, and manage servers. Using Object
Explorer, you can also create, browse, and manage server components. The Explorer allows you
to configure the following components:
  Security: Used to create logon IDs and users and to assign permissions.
  Notification Services: Used to generate and send notifications to users.
  Replication: Used to create and manage publishers and subscribers.
  SQL Server Agent: Used to automate administrative tasks by creating and managing jobs,
  alerts, and operators.
  Management Services: Used to configure the Distributed Transaction Coordinator, Full-Text
  Search, the Database Mail service, or SQL Server logs.
  Server Objects: Used to create and manage backups, endpoints, and triggers.

Registered Servers: Displays all the servers registered with Management Studio. It also helps
record connection information for each registered server, including the authentication type,
default database, network protocol characteristics, encryption, and time-out parameters.

Solution Explorer: Provides an organized view of your projects and files. In this explorer, you
can right-click a project or file to manage it or set its properties.

Query Editor: Provides the ability to execute queries written in T-SQL. It can be invoked by
selecting the New Query option from the File menu or the New Query button on the Standard
toolbar.

Template Explorer: Provides a set of templates of SQL queries that perform standard database
operations. You can use these templates to reduce the time spent in creating queries.

Dynamic Help: Available from the Help menu of SQL Server Management Studio. This tool
automatically displays links to relevant information while users work in the Management Studio
environment.

Components of the SQL Server Management Studio Interface

SQL Server Business Intelligence Development Studio


SQL Server Business Intelligence Development Studio is a tool that provides an environment to
develop business intelligence solutions. These solutions are based on the data generated in the
organization and help in business forecasting and in making strategic decisions and future
plans.

Business Intelligence Development Studio helps build the following types of solutions:

Data integration: The integration services allow you to build solutions that integrate data
from various data sources and store them in a common data warehouse.
Data analysis: The analysis services help to analyze the data stored in the data warehouse.
Data reporting: The reporting services allow you to build reports in different formats that
are based on the data warehouse.

Business Intelligence Development Studio contains templates, tools, and wizards to work with
objects that you can use to create business intelligence solutions.

The following figure shows the SQL Server Business Intelligence Development Studio interface.
SQL Server Business Intelligence Development Studio Interface

The preceding figure displays the different project templates that you can use in the SQL Server
Business Intelligence Development Studio interface.

Database Engine Tuning Advisor


Database Engine Tuning Advisor helps database administrators to analyze and tune the
performance of the server. To analyze the performance of the server, the administrator can execute
a set of T-SQL statements against a database. After analyzing the performance of these statements,
the tool provides recommendations to add, remove, or modify database objects, such as indexes
or indexed views to improve performance. These recommendations help in executing the given
T-SQL statements in the minimum possible time.

SQL Server Configuration Manager


SQL Server Configuration Manager helps the database administrators to manage the services
associated with SQL Server. By default, in SQL Server 2005, some services, such as SQL Server
Agent and integration services are not enabled. Administrators can start, pause, resume, or stop
these services by using this tool.

In addition, the tool allows you to manage the network connectivity configuration from the SQL
Server client computers. It allows you to specify the protocols through which the client computers
can connect to the server.

The following figure shows the SQL Server Configuration Manager interface.
SQL Server Configuration Manager Interface

The preceding figure displays the various configuration options through which you can
configure SQL Server.

Just a minute:
Which of the following tools of SQL Server 2005 allows starting and stopping the full-
text search?

1. SQL Server Management Studio
2. Business Intelligence Development Studio
3. Database Engine Tuning Advisor
4. SQL Server Configuration Manager

Answer:
4. SQL Server Configuration Manager

Summary
In this chapter, you learned that:

A business application can have three elements: user interface, business logic, and data
storage.
A database server is used to store and manage the database in a business application.
SQL Server 2005 consists of the four core components: database engine, integration
services, analysis services, and reporting services.
The database engine provides support to store, query, process, and secure data on the
database server.
Integration services allow you to gather varied data and integrate it in a consistent format
into a common database called the data warehouse.
Analysis services assist in determining past trends and formulating future business decisions.
Reporting services provide support to generate comprehensive reports on the data stored in
the database engine or the data warehouse.
Microsoft SQL Server 2005 is integrated with the .NET Framework.
The .NET Framework is an environment used to build, deploy, and run business applications
through various programming languages.
The .NET Framework consists of three components: development tools and languages, base
class library, and CLR.
SQL Server 2005 provides the following benefits:
 Built-in support for XML data
 CLR integration
 Scalability
 Service-oriented architecture
 Support for Web services
 High level of security
 High availability
 Support for data migration and analysis

SQL includes:
 DDL: To create and manage database objects
 DML: To store and manage data in database objects
 DCL: To allow or deny access to database objects
 DQL: To query data from the database objects

SQL Server 2005 provides the following tools to improve the efficiency of the database
developers and manage the server:
 SQL Server Management Studio
 SQL Server Business Intelligence Development Studio
 Database Engine Tuning Advisor
 SQL Server Configuration Manager

Chapter 2
Querying Data
As a database developer, you need to regularly retrieve data for various purposes, such as creating
reports and manipulating data. You can retrieve data from the database server by using SQL
queries.

This chapter explains how to retrieve selected data from database tables by executing the SQL
queries. Further, it discusses how to incorporate functions to customize the data values returned
by the queries. In addition, the chapter explains how to retrieve summarized and grouped data
from the database tables.
Objectives
In this chapter, you will learn to:
Retrieve data
Use functions to customize the result set
Summarize and group data

Retrieving Data
At times, the database developers might need to retrieve complete or selected data from a table.
Depending on the requirements, you might need to extract only selected columns or rows from a
table. For example, an organization stores the employee data in the SQL Server database. At times,
the users might need to extract only selected information such as name, date of birth, and address
details of all the employees. At other times, the users might need to retrieve all the details of the
employees in the Sales and Marketing department.

Depending on these requirements, you will need to run different SQL queries. These queries
specify the criteria for selection of data from the tables. Therefore, it is important for you to learn
how to query databases to retrieve the required information. Databases can contain different types
of data. Therefore, before querying the data, it is important to identify the various types of data.

Identifying Data Types


Data type represents the type of data that an object can contain, such as character data or integer
data. SQL Server 2005 supports various data types. Data types can be associated with each
column, local variable, or an expression defined in the database.

You need to specify the data type of a column according to the data to be stored. For example,
you can specify character as the data type to store the employee name, datetime as the data type
to store the hire date of employees. Similarly, you can specify money as a data type to store the
salary of the employees. SQL Server 2005 supports the following data types.
SQL Server 2005 supports the following data types:
 int: Integer data (whole numbers). Range: –2^31 (–2,147,483,648) to 2^31–1 (2,147,483,647).
 smallint: Integer data. Range: –2^15 (–32,768) to 2^15–1 (32,767).
 tinyint: Integer data. Range: 0 to 255.
 bigint: Integer data. Range: –2^63 (–9,223,372,036,854,775,808) to 2^63–1 (9,223,372,036,854,775,807).
 decimal: Numeric data with a fixed precision and scale. Range: –10^38+1 through 10^38–1.
 numeric: Numeric data with a fixed precision and scale. Range: –10^38+1 through 10^38–1.
 float: Floating precision data. Range: –1.79E+308 to –2.23E–308, 0, and 2.23E–308 to 1.79E+308.
 money: Monetary data. Range: –922,337,203,685,477.5808 to 922,337,203,685,477.5807.
 smallmoney: Monetary data. Range: –214,748.3648 to 214,748.3647.
 datetime: Date and time data. Range: January 1, 1753, through December 31, 9999.
 smalldatetime: Date and time data. Range: January 1, 1900, through June 6, 2079.
 char(n): Fixed length character data of n characters, where n can be 1 to 8000.
 varchar(n): Variable length character data of n characters, where n can be 1 to 8000.
 text: Character string with a maximum length of 2^31–1 (2,147,483,647) characters.
 ntext: Variable length Unicode data with a maximum length of 2^30–1 (1,073,741,823) characters.
 bit: Integer data with the value 0 or 1.
 image: Variable length binary data with a maximum length of 2^31–1 (2,147,483,647) bytes, used to store images.
 real: Floating precision number. Range: –3.40E+38 to –1.18E–38, 0, and 1.18E–38 to 3.40E+38.
 binary: Fixed length binary data with a maximum length of 8000 bytes.
 varbinary: Variable length binary data with a maximum length of 8000 bytes.
 cursor: A cursor reference, used in variables or stored procedure OUTPUT parameters.
 nchar: Fixed length Unicode data with a maximum length of 4000 characters.
 nvarchar: Variable length Unicode data with a maximum length of 4000 characters.
 sql_variant: Values of different data types, except text, ntext, image, timestamp, and sql_variant, with a maximum length of 8016 bytes.
 timestamp: A unique number in a database that is updated every time a row that contains a timestamp column is inserted or updated. Maximum storage size of 8 bytes.
 uniqueidentifier: A 16-byte GUID. A column or local variable of the uniqueidentifier data type can be initialized by using the NEWID function, or by converting from a string constant in the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where each x is a hexadecimal digit in the range 0-9 or a-f. For example, 6F9619FF-8B86-D011-B42D-00C04FC964FF is a unique identifier value.
 table: A temporary set of rows returned as the result set of a table-valued function, to be processed later.
 xml: Stores and returns xml values; used for xml instances and xml type variables.

Data Types Supported by SQL Server 2005
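To relate these data types to table design, the following CREATE TABLE statement is a minimal sketch showing how several of them might be assigned to columns. The table and column names are illustrative only and are not part of the AdventureWorks database:

```sql
-- Hypothetical table illustrating common SQL Server 2005 data types
CREATE TABLE dbo.EmployeeInfo
(
    EmployeeID  int IDENTITY(1,1) PRIMARY KEY,   -- integer data (whole numbers)
    FullName    varchar(50) NOT NULL,            -- variable length character data
    Gender      char(1),                         -- fixed length character data
    HireDate    datetime,                        -- date and time data
    HourlyRate  money,                           -- monetary data
    IsActive    bit,                             -- integer data with 0 or 1
    RowGuid     uniqueidentifier DEFAULT NEWID() -- 16-byte GUID
)
```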

Retrieving Specific Attributes


While retrieving data from tables, you can display one or more columns. For example, the
AdventureWorks database stores the employee details, such as EmployeeId, ManagerID, Title,
HireDate, and BirthDate in the Employee table. Users might want to view all the details of the
Employee table or might want to view few columns such as EmployeeID and ManagerID. You
can retrieve the required data from the database tables by using the SELECT statement.

The SELECT statement is used for accessing and retrieving data from a database. The syntax of
the SELECT statement is:
SELECT [ALL | DISTINCT] select_column_list
[INTO [new_table_name]]
[FROM {table_name | view_name}]
[WHERE search_condition]

where,
ALL specifies that all rows, including duplicates, can appear in the result set. An asterisk (*) in the column list displays all the columns of the table.
DISTINCT specifies that only the unique rows can appear in the result set.
select_column_list is the list of columns or aggregate columns for which data is to be listed.
INTO creates a new table and inserts the resulting rows from the query into it.
new_table_name is the name of the new table to be created.
FROM table_name is the name of the table from which data is to be retrieved.
WHERE specifies the search condition for the rows returned by the query.
search_condition specifies the condition to be satisfied to return the selected rows.

The SELECT statement contains some more arguments such as WHERE, GROUP BY, COMPUTE,
and ORDER BY that will be explained later in this chapter.

All the examples in this book are based on the AdventureWorks, Inc. case study given in the
Appendix.

For example, the Employee table is stored in the HumanResources schema of the
AdventureWorks database. To display all the details of employees, you can use the following
query:
SELECT * FROM HumanResources.Employee

SQL Server will display the output of the query, as shown in the following figure.

Retrieving All Columns

The result set displays the records in the order in which they are stored in the table.

The number of rows in the output window may vary depending on the modifications done on the
database.
The machine name, SQLServer01, displayed at the bottom of the output window may vary
depending upon your machine name. In addition, the user name, Robert, may vary depending upon
the user name you choose to log on to SQL Server.

In the previous versions of SQL Server, database users and schemas were conceptually the
same object. The behavior of schemas changed in SQL Server 2005. Schemas are no longer
equivalent to database users. Schema is a namespace that acts as a container of objects. A
schema can be owned by any user, and its ownership is transferable.
A single schema can contain objects owned by multiple database users.

If you need to retrieve specific columns, you can specify the column names in the SELECT
statement. For example, to view specific details, such as EmployeeID, ContactID, LoginID, and
Title, of the employees of AdventureWorks, you can specify the column names in the SELECT
statement, as shown in the following query:
SELECT EmployeeID, ContactID, LoginID, Title FROM
HumanResources.Employee

SQL Server will display the output of the preceding query, as shown in the following figure.

Specific Columns Retrieved from the Employee table

In the output, the result set shows the column names the way they are present in the table
definition. You can customize these column names, if required.

Customizing the Display

Sometimes, you may want to change the way the data is displayed. For example, if the names of
columns are not descriptive, you might need to change the default column headings by creating
user-defined headings.

Consider the following example that displays the Department ID and Department Names from the
Department table of the AdventureWorks database. The report should contain column headings
different from those given in the table, as shown in the following figure.
Department Number Department Name

You can write the query in the following ways:

1. SELECT 'Department Number' = DepartmentID, 'Department Name' = Name FROM
HumanResources.Department
2. SELECT DepartmentID 'Department Number', Name 'Department Name' FROM
HumanResources.Department
3. SELECT DepartmentID AS 'Department Number', Name AS 'Department Name' FROM
HumanResources.Department

SQL Server will display the same output for all the preceding queries, as shown in the following
figure.

Retrieving Specific Columns with User-Defined Headings

In the preceding figure, the columns are displayed with different headings, but the original
column names in the database table remain unchanged.

Similarly, you might be required to make the results more explanatory. In such a case, you can
add more text to the values displayed by the columns by using literals. Literals are string values
that are enclosed in single quotes and added to the SELECT statement. The literal value is printed
in a separate column exactly as it is written in the SELECT list. Therefore, literals are used for
display purposes.

The following SQL query retrieves the employee ID and their titles from the Employee table along
with a literal ‘Designation’:
SELECT EmployeeID, 'Designation: ', Title
FROM HumanResources.Employee

SQL Server will display the output of the preceding query, as shown in the following figure.
Retrieving Specific Columns by Using Literals

In the preceding figure, the result set displays a virtual column with Designation as a value in
each row. This column does not physically exist in the database table.

Concatenating the Text Values in the Output

Concatenation is the operation where two strings are joined to make one string. For example, the
strings, snow and ball can be concatenated to display the output, snowball.

As a database developer, you have to manage the requirements of various users, who might
want to view results in different ways. You may be required to display the values of multiple
columns in a single column and also add a description to the column value. In this case, you can
use the concatenation operator. The concatenation operator is used to concatenate string
expressions. It is represented by the + sign. To concatenate two strings, you can use the
following query:
SELECT 'snow' + 'ball'

The preceding query will display snowball as the output.

The following SQL query concatenates the data of the Name and GroupName columns of the
Department table into a single column:
SELECT Name + ' department comes under ' + GroupName + ' group' AS Department FROM
HumanResources.Department

In the preceding query, literals, such as “department comes under” and “group”, are concatenated
to increase the readability of the output. SQL Server will display the output of the preceding query,
as shown in the following figure.
Retrieving Specific Columns by Using the Concatenation Operator
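Note that the + operator concatenates character data; to include a numeric column in the concatenated string, the value must first be converted to character data with CAST or CONVERT. A sketch, assuming the same Department table:

```sql
-- DepartmentID is numeric, so convert it before concatenating
SELECT 'Department ' + CAST(DepartmentID AS varchar(10)) + ': ' + Name AS DepartmentLabel
FROM HumanResources.Department
```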

Calculating Column Values

Sometimes, you might also need to show calculated values for the columns. For example, the
Orders table stores the order details such as OrderID, ProductID, OrderDate, UnitPrice, and Units.
To find the total amount of an order, you need to multiply the UnitPrice of the product with the
Units. In such cases, you can apply arithmetic operators. Arithmetic operators are used to perform
mathematical operations, such as addition, subtraction, division, and multiplication, on numeric
columns or on numeric constants.

SQL Server supports the following arithmetic operations:

+ (for addition)
- (for subtraction)
/ (for division)
* (for multiplication)
% (for modulo - the modulo arithmetic operator is used to obtain the remainder of two
divisible numeric integer values)

All arithmetic operators can be used in the SELECT statement with column names and numeric
constants in any combination.

When multiple arithmetic operators are used in a single query, the processing of the operation
takes place according to the precedence of the arithmetic operators. The precedence level of
arithmetic operators in an expression is multiplication (*), division (/), modulo (%), subtraction (-
), and addition (+). You can change the precedence of the operators by using parentheses (()).
When an arithmetic expression uses the same level of precedence, the order of execution is from
left to right.
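The effect of precedence and parentheses can be verified with constant expressions; the following queries use only literals, so they can be run against any database:

```sql
SELECT 2 + 3 * 4   -- multiplication is evaluated first: returns 14
SELECT (2 + 3) * 4 -- parentheses change the order: returns 20
SELECT 17 % 5      -- modulo returns the remainder: returns 2
```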

The EmployeePayHistory table in the HumanResources schema contains the hourly rate of the
employees. The following SQL query retrieves the per day rate of the employees from the
EmployeePayHistory table:
SELECT EmployeeID, Rate, Per_Day_Rate = 8 * Rate FROM
HumanResources.EmployeePayHistory

In the preceding query, Rate is multiplied by 8, assuming that an employee works for eight hours
in a day.
Retrieving Selected Rows
In a given table, a column can contain different values in different records. At times, you might
need to view only those records that match a condition. For example, in a manufacturing
organization, an employee wants to view a list of products from the Products table that are priced
between $100 and $200. Consider another example where a teacher wants to see the names and
the scores of the students who scored more than 80%. The query must then select the names and
the scores from the table, with a condition added to the score column.

To retrieve selected rows based on a specific condition, you need to use the WHERE clause in the
SELECT statement. Using the WHERE clause selects the rows that satisfy the condition.

The following SQL query retrieves the department details from the Department table, where the
group name is Research and Development:
SELECT * FROM HumanResources.Department WHERE GroupName = 'Research and Development'

SQL Server will display the output of the preceding query, as shown in the following figure.

Retrieving Selected Rows

In the preceding query, rows containing the Research and Development group name are retrieved.

Using Comparison Operators to Specify Conditions


You can specify conditions in the SELECT statement to retrieve selected rows by using various
comparison operators. Comparison operators test for similarity between two expressions.
Comparison operators allow row retrieval from a table based on the condition specified in the
WHERE clause. Comparison operators cannot be used on text, ntext, or image data type
expressions. The syntax for using the comparison operator in the SELECT statement is:
SELECT column_list
FROM table_name
WHERE expression1 comparison_operator expression2

where,
expression1 and expression2 are any valid combination of a constant, a variable, a function,
or a column-based expression.

In the WHERE clause, you can use a comparison operator to specify a condition. The following
SQL query retrieves records from the Employee table where the vacation hour is less than 5:
SELECT EmployeeID, NationalIDNumber, Title, VacationHours
FROM HumanResources.Employee WHERE VacationHours < 5

The preceding query retrieves all the rows that satisfy the specified condition by using the
comparison operator, as shown in the following figure.

Retrieving Selected Rows

The following table lists the comparison operators provided by SQL Server.
Operator Description
= Equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to
!= Not equal to
!< Not less than
!> Not greater than

Comparison Operators

Sometimes, you might need to view records for which one or more conditions hold true.
Depending on the requirements, you can retrieve records based on the following conditions:
Records that match one or more conditions
Records that contain values in a given range
Records that contain any value from a given set of values
Records that match a pattern
Records that contain NULL values
Records to be displayed in a sequence
Records from the top of a table
Records without duplication of values

Retrieving Records That Match One or More Conditions

Logical operators are used in the SELECT statement to retrieve records based on one or more
conditions. While querying data, you can combine more than one logical operator to apply
multiple search conditions. In a SELECT statement, the conditions specified by the logical
operators are connected with the WHERE clause. The syntax for using the logical operators in the
SELECT statement is:
SELECT column_list
FROM table_name
WHERE conditional_expression1 {AND/OR} [NOT]
conditional_expression2

where,
conditional_expression1 and conditional_expression2 are any conditional expressions.

There are three types of logical operators. They are:

OR: Returns a true value when at least one condition is satisfied. For example, the following
SQL query retrieves records from the Department table when the GroupName is either
Manufacturing or Quality Assurance:

SELECT * FROM HumanResources.Department WHERE GroupName = 'Manufacturing' OR
GroupName = 'Quality Assurance'

AND: Is used to join two conditions and returns a true value when both the conditions are
satisfied. To view the details of all the employees of AdventureWorks who are married and
working as a Production Technician – WC60, you can use the AND logical operator, as shown
in the following query:

SELECT * FROM HumanResources.Employee WHERE Title = 'Production Technician - WC60'
AND MaritalStatus = 'M'

NOT: Reverses the result of the search condition. The following SQL query retrieves records
from the Department table when the GroupName is not Quality Assurance:

SELECT * FROM HumanResources.Department WHERE NOT GroupName = 'Quality Assurance'

The preceding query retrieves all the rows, except the rows that match the condition specified
after the NOT conditional expression.
Retrieving Records That Contain Values in a Given Range
The range operator retrieves data based on a range. The syntax for using the range operator in the
SELECT statement is:
SELECT column_list
FROM table_name
WHERE expression1 range_operator expression2 AND expression3

where,
expression1, expression2, and expression3 are any valid combination of constants, variables,
functions, or column-based expressions.
range_operator is any valid range operator.

Range operators are of the following types:

BETWEEN: Specifies an inclusive range to search.

The following SQL query retrieves records from the Employee table when the vacation hour is
between 20 and 50:
SELECT EmployeeID, VacationHours FROM HumanResources.Employee WHERE VacationHours
BETWEEN 20 AND 50

NOT BETWEEN: Excludes the specified range from the result set.

The following SQL query retrieves records from the Employee table when the vacation hour is
not between 40 and 50:
SELECT EmployeeID,VacationHours FROM HumanResources.Employee WHERE VacationHours NOT
BETWEEN 40 AND 50

Just a minute:
Which of the following operators are logical operators?
1. BETWEEN and NOT BETWEEN
2. AND, OR, and NOT
3. + and %
4. > and <

Answer:
2. AND, OR, and NOT

Retrieving Records That Contain Any Value from a Given Set of Values

Sometimes, you might want to retrieve data after specifying a set of values to check whether the
specified value matches any data of the table. This type of operation is performed by using the IN
and NOT IN keywords. The syntax of using the IN and NOT IN keywords in the SELECT
statement is:
SELECT column_list
FROM table_name
WHERE expression list_operator ('value_list')

where,
expression is any valid combination of constants, variables, functions, or column-based
expressions.
list_operator is any valid list operator, IN or NOT IN.
value_list is the list of values to be included or excluded in the condition.

The IN keyword selects values that match any one of the values given in a list. The following
SQL query retrieves records of employees who are Recruiter or Stocker from the Employee table:
SELECT EmployeeID, Title, LoginID FROM HumanResources.Employee WHERE Title IN
('Recruiter', 'Stocker')

Alternatively, the NOT IN keyword restricts the selection of values that match any one of the
values in a list. The following SQL query retrieves records of employees whose designation is not
Recruiter or Stocker:
SELECT EmployeeID, Title, LoginID FROM HumanResources.Employee WHERE Title NOT IN
('Recruiter', 'Stocker')

Retrieving Records That Match a Pattern

When retrieving data, you can view selected rows that match a specific pattern. For example, you
are asked to create a report that displays all the product names of AdventureWorks beginning with
the letter P. You can do this by using the LIKE keyword. The LIKE keyword is used to search a
string by using wildcards. Wildcards are special characters, such as % and _. These characters are
used to match patterns.

The LIKE keyword matches the given character string with the specified pattern. The pattern can
include combination of wildcard characters and regular characters. While performing a pattern
match, regular characters must match the characters specified in the character string. However,
wildcard characters are matched with fragments of the character string.

For example, you want to retrieve records from the Department table where the values in the Name
column begin with 'Pro'. You need to use the '%' wildcard character, as shown in the following
query:
SELECT * FROM HumanResources.Department WHERE Name LIKE 'Pro%'

Consider another example, where you want to retrieve the rows from the Department table in
which the department name is five characters long and begins with 'Sale', whereas the fifth
character can be anything. For this, you need to use the '_' wildcard character, as shown in the
following query:
SELECT * FROM HumanResources.Department WHERE Name LIKE 'Sale_'

The following table describes the wildcard characters that are used with the LIKE keyword in
SQL Server.
Wildcard Description
% Represents any string of zero or more character(s)
_ Represents any single character
[] Represents any single character within the specified range
[^] Represents any single character not within the specified range

Wildcard Characters Supported by SQL Server

Wildcard characters can be combined into a single expression with the LIKE keyword. Wildcard
characters themselves can be searched using the LIKE keyword by putting them into square
brackets ([]).

The following table describes the use of wildcard characters with the LIKE keyword.
Expression Returns
LIKE 'LO%' All names that begin with "LO"
LIKE '%ion' All names that end with "ion"
LIKE '%rt%' All names that have the letters "rt" in them
LIKE '_rt' All three letter names ending with "rt"
LIKE '[DK]%' All names that begin with "D" or "K"
LIKE '[A-D]ear' All four letter names that end with "ear" and begin with any letter from "A" through "D"
LIKE 'D[^c]%' All names beginning with "D" and not having "c" as the second letter

Use of Wildcard Characters with the LIKE Keyword

The LIKE operator is not case-sensitive. For example, LIKE 'LO%' and LIKE 'lo%' will return the
same result.

Retrieving Records That Contain NULL Values

A NULL value in a column implies that the data value for the column is not available. You might
be required to find records that contain null values or records that do not contain NULL values in
a particular column. In such a case, you can use the unknown_value_operator in your queries.

The syntax of using the unknown_value_operator in the SELECT statement is:
SELECT column_list
FROM table_name
WHERE column_name unknown_value_operator
where,
unknown_value_operator is either the keyword IS NULL or IS NOT NULL.

The following SQL query retrieves only those rows from the EmployeeDepartmentHistory table
for which value in the EndDate column is NULL:
SELECT EmployeeID, EndDate FROM HumanResources.EmployeeDepartmentHistory WHERE
EndDate IS NULL

No two NULL values are equal. You cannot compare one NULL value with another.
NULL values are always the first items to be displayed in output that is sorted in ascending
order.
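Conversely, the IS NOT NULL operator retrieves only the rows in which a value is present. A sketch against the same table:

```sql
-- Rows where the department assignment has an end date recorded
SELECT EmployeeID, EndDate
FROM HumanResources.EmployeeDepartmentHistory
WHERE EndDate IS NOT NULL
```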

Retrieving Records to be Displayed in a Sequence

You can use the ORDER BY clause of the SELECT statement to display the data in a specific
order. Data can be displayed in the ascending and descending order of values in a given column.

The syntax of the ORDER BY clause in the SELECT statement is:
SELECT select_list
FROM table_name
[ORDER BY order_by_expression [ASC|DESC]
[, order_by_expression [ASC|DESC]...]]

where,
order_by_expression is the column name on which the sort is to be performed.
ASC specifies that the values need to be sorted in ascending order.
DESC specifies that the values need to be sorted in descending order.

The following SQL query retrieves the records from the Department table in ascending order of
the Name column:
SELECT DepartmentID, Name FROM HumanResources.Department ORDER BY Name ASC
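To display the same records in reverse alphabetical order, you can use the DESC keyword instead:

```sql
SELECT DepartmentID, Name FROM HumanResources.Department ORDER BY Name DESC
```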

Optionally, you can sort the result set based on more than one column. For this, you need to
specify the sequence of the sort columns in the ORDER BY clause, as shown in the following
query:
SELECT GroupName, DepartmentID, Name FROM HumanResources.Department ORDER BY
GroupName, DepartmentID

The preceding query sorts the Department table in ascending order of GroupName, and then
ascending order of DepartmentID, as shown in the following figure.
Sorting On Two Columns

If you do not specify the ASC or DESC keywords with the column name in the ORDER BY
clause, the records are sorted in ascending order.
The ORDER BY clause does not sort the table physically.

Retrieving Records from the Top of a Table

You can use the TOP keyword to retrieve only the first set of rows from the top of a table. This
set of records can be either a number of records or a percent of rows that will be returned from a
query result.

For example, you want to view the product details from the Products table, where the product price
is more than $50. There might be many records in the table, but you want to see only the top 10
records that satisfy the condition. In such a case, you can use the TOP keyword.

The syntax of using the TOP keyword in the SELECT statement is:
SELECT [TOP n [PERCENT]] column_name [,column_name...]
FROM table_name
WHERE search_conditions
[ORDER BY column_name[,column_name...]]

where,
n is the number of rows that you want to retrieve.

If the PERCENT keyword is used, then n percent of the rows are returned.

The following query retrieves the top 10 rows of the Employee table:
SELECT TOP 10 * FROM HumanResources.Employee

The following query retrieves the top 10% rows of the Employee table:
SELECT TOP 10 PERCENT * FROM HumanResources.Employee

In the output of the preceding query, 29 rows are returned because the total number of rows in
the Employee table is 290.
If the SELECT statement including TOP has an ORDER BY clause, then the rows to be returned
are selected after the ORDER BY clause has been applied.

For example, you want to retrieve the top three records from the Employee table where the
HireDate is greater than or equal to 1/1/98 and less than or equal to 12/31/98. Further, the record
should be displayed in the ascending order based on the SickLeaveHours column. To accomplish
this task, you can use the following query:
SELECT TOP 3 * FROM HumanResources.Employee WHERE HireDate >= '1/1/98' AND HireDate
<= '12/31/98' ORDER BY SickLeaveHours ASC

Retrieving Records Without Duplication of Values

When there is a requirement to eliminate rows with duplicate values in a column, the DISTINCT
keyword is used. The DISTINCT keyword eliminates the duplicate rows from the result set. The
syntax of the DISTINCT keyword is:
SELECT [ALL|DISTINCT] column_names
FROM table_name
WHERE search_condition

where,
DISTINCT keyword specifies that only the records containing non-duplicated values in the
specified column are displayed.

The following SQL query retrieves all the Titles beginning with PR from the Employee table:
SELECT DISTINCT Title FROM HumanResources.Employee WHERE Title LIKE 'PR%'

The execution of the preceding query displays a title only once.

If the DISTINCT keyword is followed by more than one column name, it is applied to the
combination of all those columns. You can specify the DISTINCT keyword only once,
immediately before the select list.
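For example, the following query (a sketch against the Employee table, assuming its Title and Gender columns) returns each unique combination of title and gender exactly once:

```sql
-- DISTINCT applies to the Title and Gender columns together, so a
-- combination of values is displayed only once
SELECT DISTINCT Title, Gender
FROM HumanResources.Employee
```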

Just a minute:
Write a query to display all the records of the ProductModel table where the product name
begins with HL.

Answer:
SELECT * FROM Production.ProductModel WHERE Name LIKE 'HL%'

Activity: Retrieving Data

Problem Statement

You are a database developer of AdventureWorks, Inc. The AdventureWorks database is stored
in the SQLSERVER01 database server. The details of the sales persons are stored in the
SalesPerson table. The management wants to view the details of the top three sales persons who
have earned bonus money between $4,000 and $6,000.
How will you generate this report?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a query.
2. Execute the query to generate the report.

Task 1: Creating a Query

To display the top three sales person records from the SalesPerson table, you need to use the TOP
keyword. In addition, to specify the range of bonus earned, you need to use the BETWEEN range
operator.

To create the query by using SQL Server Management Studio, you need to perform the following
steps:

1. Select Start→Programs→Microsoft SQL Server 2005→SQL Server Management Studio to
display the Microsoft SQL Server Management Studio window. The Connect to Server
dialog box is displayed.
2. Select the server name from the Server name drop-down list.

The server name is computer-specific. In this example, the name of the server is
SQLSERVER01. You can also use the DEFAULT server as the database server.
3. Select authentication as SQL Server Authentication from the Authentication drop-down
list in the Connect to Server dialog box.
4. Enter the login name and password in the Login and Password text boxes, as shown in the
following figure.

Connect to Server Dialog Box

5. Click the Connect button. The Microsoft SQL Server Management Studio window is
displayed, as shown in the following figure.
Microsoft SQL Server Management Studio Window

6. Click the New Query button on the Standard Toolbar.


The SQLSERVER01.Master – SQLQuery1.sql window is displayed.

7. Select AdventureWorks from the Available Databases drop-down list of the SQL Editor
toolbar to set the database to AdventureWorks. The Query Editor window changes to
SQLSERVER01.AdventureWorks – SQLQuery1.sql.

The name of the Query Editor window is machine-specific.


8. Type the following query in the Query Editor window:
SELECT TOP 3 * FROM Sales.SalesPerson WHERE Bonus BETWEEN 4000 AND 6000

Task 2: Executing the Query to Generate the Report

Select the query and press the F5 key or click the Execute button to execute the query to generate
the report. The following figure displays the output of the preceding query.
Generating the Report

Using Functions to Customize the Result Set


While querying data from SQL Server 2005, you can use various in-built functions to customize
the result set. These in-built functions are provided by SQL Server 2005. Customization includes
changing the format of the string or date values or performing calculations on the numeric values
in the result set. For example, if you need to display all the text values in uppercase, you can use
the upper() string function. Similarly, if you need to calculate the square of integer values, you
can use the power() mathematical function.

Depending on their utility, the in-built functions provided by SQL Server 2005 are categorized
as string functions, date functions, mathematical functions, ranking functions, and system
functions.

Using String Functions


You can use the string functions to manipulate the string values in the result set. For example, to
display only the first eight characters of the values in a column, you can use the left() string
function.

String functions are used with the char and varchar data types. SQL Server provides string
functions that can be used as a part of any character expression. These functions are used for
various operations on strings.

The syntax of using a function in the SELECT statement is:


SELECT function_name (parameters)

where,
function_name is the name of the function.
parameters are the required parameters for the string function.

For example, you want to retrieve the Name, DepartmentID, and GroupName columns from the
Department table and the data of the Name column should be displayed in uppercase with a user-
defined heading, Department Name. For this, you can use the upper() string function, as shown in
the following query:
SELECT 'Department Name' = upper(Name), DepartmentID, GroupName FROM
HumanResources.Department
The following SQL query uses the left() string function to extract the specified characters from
the left side of a string:
SELECT Name = Title + ' ' + left(FirstName, 1) + '. ' + LastName, EmailAddress FROM
Person.Contact

The execution of the preceding query will display the following output.

Left String Function

The following table lists the string functions provided by SQL Server 2005.
String Functions
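As an illustration of a few commonly used string functions, the following query is a sketch against the same Person.Contact table used earlier; it combines the len(), substring(), and lower() functions:

```sql
-- len() returns the length of the string, substring() extracts a part of
-- the string, and lower() converts the extracted part to lowercase
SELECT FirstName,
       len(FirstName) AS 'Length',
       lower(substring(FirstName, 1, 3)) AS 'First Three Letters'
FROM Person.Contact
```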

Using Date Functions


You can use the date functions of SQL Server to manipulate datetime values. You can either
perform arithmetic operations on date values or parse the date values. Date parsing includes
extracting components, such as the day, the month, and the year from a date value.

You can also retrieve the system date and use the value in the date manipulation operations. To
retrieve the current system date, you can use the getdate() function. The following query displays
the current date:
SELECT getdate()

The datediff() function is used to calculate the difference between two dates. For example, the
following SQL query uses the datediff() function to calculate the age of the employees:
SELECT datediff (yy, BirthDate, getdate()) AS 'Age'
FROM HumanResources.Employee

The preceding query calculates the difference between the current date and the date of birth of
the employees. The date of birth is stored in the BirthDate column of the Employee table in the
AdventureWorks database.

The following table lists the date functions provided by SQL Server 2005.

Date Functions
To parse date values, you can use the datepart() function. For example, the following query uses
the datepart() function to retrieve the year in which each employee was hired, along with the
employee title, from the Employee table:
SELECT Title, datepart (yy, HireDate) AS 'Year of Joining'
FROM HumanResources.Employee

The abbreviations and value ranges of the date parts that can be used with the datepart()
function are shown in the following table.
Date part Abbreviation Values
year yy, yyyy 1753-9999
quarter qq, q 1-4
month mm, m 1-12
day of year dy, y 1-366
day dd, d 1-31
week wk, ww 0-51
weekday dw 1-7 (1 is Sunday)
hour hh 0-23
minute mi, n 0-59
second ss, s 0-59
millisecond ms 0-999

Abbreviations Used to Extract Different Parts of a Date

The following SQL query uses the datename() and datepart() functions to retrieve the month
name and year from a given date:
SELECT EmployeeID, datename(mm, HireDate) + ', ' + convert(varchar, datepart(yyyy,
HireDate)) AS 'Joining'
FROM HumanResources.Employee

The execution of the preceding query will display the output, as shown in the following figure.
Date Functions

SQL Server also provides the convert() function to change the data type of an expression into
another data type. The syntax of the convert() function is:
convert (datatype [(length)], expression [, style])
where,
datatype is the system-defined data type. (User-defined data types cannot be used.)
length is the optional parameter of char, varchar, or binary data types.
expression is any valid expression to be converted from one data type to another.
style is the method of representing the date when converting the date data type into the
character data type.
For example, you need to display the title and hire date from the Employee table. For this, you
need to convert the hire date from the date data type to the character data type and then
display it in the yy.mm.dd format. To perform this task, you can write the following query:
SELECT Title, convert(char(10), HireDate, 2) AS 'Hire Date' FROM
HumanResources.Employee

Using Mathematical Functions


You can use mathematical functions to manipulate the numeric values in a result set. You can
perform various numeric and arithmetic operations on the numeric values. For example, you can
calculate the absolute value of a number or you can calculate the square or square root of a value.

The following table lists the mathematical functions provided by SQL Server 2005.
Mathematical Functions
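As a quick illustration of two of these functions, the following sketch uses the power() and sqrt() mathematical functions to calculate a square and a square root:

```sql
-- power(5, 2) raises 5 to the power 2; sqrt(25.0) returns the square root
SELECT power(5, 2) AS 'Square', sqrt(25.0) AS 'Square Root'
```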

For example, to round off a numeric value, you can use the round() mathematical function. The
round() function returns a numeric value rounded off to the specified length or precision.

The syntax of the round function is:


round(numeric_expression, length)

where,
numeric_expression is the numeric expression to be rounded off.
length is the precision to which the expression is to be rounded off.

The following SQL query retrieves the EmployeeID and the rounded pay Rate of the employees
from the EmployeePayHistory table:
SELECT EmployeeID, 'Hourly Pay Rate' = round(Rate, 2)
FROM HumanResources.EmployeePayHistory

The preceding query displays the output, as shown in the following figure.

round() Function

In the preceding figure, the value of the Hourly Pay Rate column is rounded off to two decimal
places.

While using the round() function, if the length is positive, then the expression is rounded to the
right of the decimal point. If the length is negative, then the expression is rounded to the left of
the decimal point.

The following table lists the usage of the round() function provided by SQL Server 2005.
Function Output
round (1234.567,2) 1234.570
round (1234.567,1) 1234.600
round (1234.567,0) 1235.000
round (1234.567,-1) 1230.000
round (1234.567,-2) 1200.000
round (1234.567,-3) 1000.000

Usage of the round() Function

Just a minute:
Identify the utility of the datepart function.

Answer:
The datepart function is used to extract different parts of a date value.
Just a minute:
The management of AdventureWorks wants to increase the shift time from 8 hours to 10 hours.
Calculate the end time of the shifts based on their start time.

Answer:
SELECT ShiftID, StartTime, 'EndTime' = dateadd(hh, 10, StartTime) FROM
HumanResources.Shift

Using Ranking Functions


You can use ranking functions to generate sequential numbers for each row or to give a rank based
on specific criteria. For example, in a manufacturing organization, the management wants to rank
the employees based on their salary. To rank the employees, you can use the rank() function.

Consider another example, where a teacher wants to see the names and scores of all students
according to their ranks. Ranking functions return a ranking value for each row. However, based
on the criteria, more than one row can get the same rank. You can use the following functions to
rank the records:

row_number()
rank()
dense_rank()

All these functions make use of the OVER clause. The ORDER BY keyword in this clause
determines the ascending or descending sequence in which rows are assigned a rank.

row_number() Function

The row_number() function returns the sequential numbers, starting at 1, for the rows in a result
set based on a column.

For example, the following SQL query assigns a sequential number to each row of the result set
by using the row_number() function:
SELECT EmployeeID, Rate, row_number() OVER (ORDER BY Rate DESC) AS RANK FROM
HumanResources.EmployeePayHistory

The following figure displays the output of the preceding query.


Output of the row_number Function

The EmployeeID and Rate column are retrieved from the EmployeePayHistory table, where the
Rate column is ranked by using the row_number() function. The ORDER BY keyword in the
OVER clause specifies that the result set will appear in the descending order of the Rate column.

rank() Function

The rank() function returns the rank of each row in a result set based on specified criteria. For
example, you want to rank the products based on the sales made during a year. For this, you can
use the rank() function. This function considers the ORDER BY keyword of the OVER clause;
if the order is desc, the record with the maximum value gets the top rank, 1.

For example, you want to create a report of all the employees with their salary rates, along with
the rank of each salary. The highest-paid employee should be given the rank 1. In addition, if
two employees have the same salary rate, they should be given the same rank.

To perform this task, you need to write the following query:


SELECT EmployeeID, Rate, rank() OVER (ORDER BY Rate DESC) AS rank FROM
HumanResources.EmployeePayHistory

The following figure displays the output of the preceding query.

Output of the rank Function

In the preceding output, the salary rates for the employee IDs 158 and 42 are the same.

Therefore, both of them have been ranked 6, but the rank for the employee ID 140 is 8, not 7. If
you want consecutive ranks to be assigned instead, you need to use the dense_rank() function.

dense_rank() Function

The dense_rank() function is used where consecutive ranking values need to be given based on
specified criteria. It performs the same ranking task as the rank() function, but provides
consecutive ranking values in the output. For example, you want to rank the products based on
the sales made for each product during a year. If two products, A and B, have the same sales
values, both will be assigned a common rank. The next product in the order of sales values will
be assigned the very next rank value.
If, in the preceding example of the rank() function, you need to give the same rank to the
employees with the same salary rate and the consecutive rank to the next one, you can write the
following query:
SELECT EmployeeID, Rate, dense_rank() OVER (ORDER BY Rate DESC) AS rank FROM
HumanResources.EmployeePayHistory

The following figure displays the output of the preceding query.

Output of the dense_rank Function

Using System Functions


The system functions are used to query the system tables. System tables are a set of tables that are
used by SQL Server to store information about users, databases, tables, and security. The system
functions are used to access the SQL Server databases or user-related information. For example,
to view the host ID of the terminal on which you are logged on, you can use the following query:
SELECT host_id() AS 'HostID'

The preceding query displays the output, as shown in the following figure.

Host_id() Function

The system functions may display different output on different machines.

The following table lists the system functions provided by SQL Server 2005.
Function Definition
host_id() Returns the current host process ID number of a client process
host_name() Returns the current host computer name of a client process
suser_sid(['login_name']) Returns the security identification (SID) number corresponding to the log on name of the user
suser_id(['login_name']) Returns the log on identification (ID) number corresponding to the log on name of the user
suser_sname([server_user_id]) Returns the log on name of the user corresponding to the security identification number
user_id(['name_in_db']) Returns the database identification number corresponding to the user name
user_name([user_id]) Returns the user name corresponding to the database identification number
db_id(['db_name']) Returns the database identification number of the database
db_name([db_id]) Returns the database name
object_id('objname') Returns the database object ID number
object_name('obj_id') Returns the database object name

System Functions
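Two of these functions can be combined in a single query. The following sketch displays the name of the current database and the host computer together; db_name() without an argument returns the name of the current database:

```sql
-- db_name() with no argument returns the current database name;
-- host_name() returns the name of the client computer
SELECT db_name() AS 'Current Database', host_name() AS 'Host Name'
```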

Activity: Customizing the Result Set

Problem Statement

The management at AdventureWorks, Inc. wants to view a report that displays the employee ID,
designation, and age of the employees who are working as a marketing manager or a marketing
specialist. The data should be displayed in uppercase.

The employee details are stored in the Employee table in the AdventureWorks database. How will
you display the required data?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a query.
2. Execute the query to display the data.

Task 1: Creating a Query

To display the age of the employees, you need to use the datediff() function to calculate the
difference between their birth date and the current date. In addition, to retrieve the data for a
marketing manager or a marketing specialist, you need to use the OR logical operator.

Type the following query in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SELECT EmployeeID, upper(Title) AS Designation, datediff(yy, BirthDate, getdate()) AS
Age
FROM HumanResources.Employee WHERE Title = 'Marketing Manager' OR Title = 'Marketing
Specialist'

Task 2: Executing the Query to Display the Data

Press the F5 key or click the Execute button to execute the query and view the result set. The
following figure displays the output.

Result Set Displaying the Employee Details

Summarizing and Grouping Data


At times, the users need to view a summary of the data. Summary of the data contains aggregated
values that help in data analysis at a broader level. For example, to analyze the sales, the users
might want to view the average sales or total sales for a specified time period. SQL Server provides
aggregate functions to generate summarized data.

The users might also want to view the summarized data in different groups based on specific
criteria. For example, the users want to view the average sales data region-wise or product-wise.
In such a case, the sales data of each region will be displayed together. You can group the data by
using the GROUP BY clause of the SELECT statement. You can also use aggregate functions to
summarize data when grouping it.

Summarizing Data by Using Aggregate Functions


At times, you need to calculate the summarized values of a column based on a set of rows. For
example, the salary of employees is stored in the Rate column of the EmployeePayHistory table
and you need to calculate the average salary earned by the employees.

The aggregate functions, on execution, summarize the values for a column or a group of columns,
and produce a single value. The syntax of an aggregated function is:
SELECT aggregate_function([ALL|DISTINCT] expression)
FROM table_name

where,
ALL specifies that the aggregate function is applied to all the values in the specified column.
DISTINCT specifies that the aggregate function is applied to only unique values in the specified
column.
expression specifies a column or an expression with operators.
You can calculate summary values by using the following aggregate functions:

Avg(): Returns the average of values in a numeric expression, either all or distinct.

The following SQL query retrieves the average value from the Rate column of the
EmployeePayHistory table with a user-defined heading:
SELECT 'Average Rate' = avg(Rate) FROM HumanResources.EmployeePayHistory

Count(): Returns the number of values in an expression, either all or distinct. The following
SQL query retrieves the unique rate values from the EmployeePayHistory table with a user-
defined heading:

SELECT 'Unique Rate' = count(DISTINCT Rate)


FROM HumanResources.EmployeePayHistory

The count() function also accepts (*) as its parameter, in which case it counts the total number
of rows returned by the query.
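For example, the following query, a sketch against the same table, counts all the rows in the EmployeePayHistory table:

```sql
-- count(*) counts rows, regardless of NULL values in any column
SELECT 'Total Rows' = count(*)
FROM HumanResources.EmployeePayHistory
```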

Min(): Returns the lowest value in the expression. The following SQL query retrieves the
minimum value from the Rate column of the EmployeePayHistory table with a user-defined
heading:

SELECT 'Minimum Rate' = min(Rate)


FROM HumanResources.EmployeePayHistory

Max(): Returns the highest value in the expression. The following SQL query retrieves the
maximum value from the Rate column of the EmployeePayHistory table with a user-defined
heading:

SELECT 'Maximum Rate' = max(Rate)


FROM HumanResources.EmployeePayHistory

Sum(): Returns the sum total of values in a numeric expression, either all or distinct. The
following SQL query retrieves the sum value of all the unique Rate values from the
EmployeePayHistory table with a user-defined heading:

SELECT 'Sum' = sum(DISTINCT Rate) FROM HumanResources.EmployeePayHistory

Just a minute:
What would be the output of the following query?
SELECT 'Maximum Rate' = max(UnitPrice)
FROM Sales.SalesOrderDetail

Answer:
The query displays the maximum unit price from the SalesOrderDetail table.

Grouping Data
At times, you need to view data matching specific criteria to be displayed together in the result
set. For example, you want to view a list of all the employees with details of employees of each
department displayed together.

You can group the data by using the GROUP BY, COMPUTE, COMPUTE BY, and PIVOT
clauses of the SELECT statement.

GROUP BY

The GROUP BY clause summarizes the result set into groups as defined in the SELECT statement
by using aggregate functions. The HAVING clause further restricts the result set to produce the
data based on a condition. The syntax of the GROUP BY clause is:
SELECT column_list
FROM table_name
WHERE condition
[GROUP BY [ALL] expression [, expression...]]
[HAVING search_condition]

where,
ALL is a keyword used to include those groups that do not meet the search condition.
expression specifies the column name(s) or expression(s) on which the result set of the
SELECT statement is to be grouped.
search_condition is the conditional expression on which the result is to be produced.

The following SQL query returns the minimum and maximum values of vacation hours for the
different types of titles when the vacation hours are greater than 80:
SELECT Title, Minimum = min(VacationHours), Maximum = max(VacationHours) FROM
HumanResources.Employee
WHERE VacationHours > 80 GROUP BY Title

The preceding query will display the following output.

Output of the GROUP BY Clause

The HAVING clause works with the GROUP BY clause in the same way as the WHERE clause
works with the SELECT statement. The GROUP BY clause collects the rows that match the
WHERE condition and summarizes them into groups, producing a single value for each group.
The HAVING clause then eliminates all those groups that do not match its condition.
The following query retrieves all the titles along with their average vacation hours when the
vacation hours are more than 30 and the group average value is greater than 55:
SELECT Title, 'Average Vacation Hours' = avg(VacationHours) FROM
HumanResources.Employee WHERE VacationHours > 30 GROUP BY Title HAVING
avg(VacationHours) > 55

The GROUP BY clause can be applied on multiple fields. You can use the following query to
retrieve the average value of the vacation hours that is grouped by Title and ManagerID in the
Employee table:
SELECT Title, 'Manager ID' = ManagerID, Average = avg(VacationHours) FROM
HumanResources.Employee
GROUP BY Title, ManagerID

The preceding query will display the output, as shown in the following figure.

Output of GROUP BY Clause Applied On Multiple Fields

The ALL keyword of the GROUP BY clause is used to display all groups, including those
excluded by the WHERE clause. The ALL keyword is meaningful only in queries that contain a
WHERE clause. If ALL is not used, the GROUP BY clause does not show the groups for which
there are no matching rows. However, GROUP BY ALL shows all the groups, even those with
no rows meeting the search conditions.

The following query retrieves the records for the employee titles that are eliminated in the
WHERE condition:
SELECT Title, VacationHours = sum(VacationHours) FROM HumanResources.Employee WHERE
Title IN ('Recruiter', 'Stocker', 'Design Engineer') GROUP BY ALL Title
ORDER BY sum(VacationHours) DESC

The following figure displays the output of the preceding query.

Output of the GROUP BY ALL Clause


If you write the preceding query using the GROUP BY clause instead of GROUP BY ALL, it
will display only three records.

COMPUTE and COMPUTE BY

The COMPUTE clause, with the SELECT statement, is used to generate summary rows by using
aggregate functions in the query results. The COMPUTE BY clause can be used to calculate
summary values of the result set on a group of data. The column on which the data is to be grouped
is mentioned after the BY keyword.

The GROUP BY clause is used to generate a group summary report and does not produce
individual table rows in the result set, whereas the COMPUTE and COMPUTE BY clauses
generate the summary report with individual data rows from the table. In other words, the
COMPUTE clause is used for control-break summary reporting applications that generate detailed
information in the result set.

The syntax of the COMPUTE and COMPUTE BY clause is:


SELECT column_list
FROM table_name
ORDER BY column_name
COMPUTE aggregate_function (column_name) [, aggregate_function (column_name)…]
[BY column_name [, column_name]…]

where,
ORDER BY column_name specifies the name of the column(s) by which data in the result is to be
sorted.
COMPUTE aggregate_function specifies any row aggregate function from the aggregate
function list.
column_name specifies the name of the column(s) for which the summary report is to be
displayed.
BY column_name specifies the name of the column(s) by which data is to be grouped.

The following SQL query uses the COMPUTE clause to calculate the totals of the
VacationHours and SickLeaveHours columns of the Employee table, with the detail rows sorted
by Title, VacationHours, and SickLeaveHours:
SELECT Title, 'Total VacationHours' = VacationHours, 'Total SickLeaveHours' =
SickLeaveHours
FROM HumanResources.Employee
WHERE Title IN ('Recruiter', 'Stocker')
ORDER BY Title, VacationHours, SickLeaveHours
COMPUTE sum(VacationHours), sum(SickLeaveHours)

In the preceding query, the rows of the VacationHours and SickLeaveHours columns are listed
for the recruiters and the stockers, and the sum of the vacation hours and sick leave hours of
both is displayed at the end, as shown in the following figure.
Output of the COMPUTE Clause

Consider another example, where you need to use the COMPUTE BY clause to calculate the
subtotals of VacationHours and SickLeaveHours for each value in the Title column. The
COMPUTE clause calculates the grand total of VacationHours and SickLeaveHours, as shown
in the following query:
SELECT Title, 'Total VacationHours' = VacationHours, 'Total SickLeaveHours' =
SickLeaveHours FROM HumanResources.Employee WHERE Title IN ('Recruiter',
'Stocker') ORDER BY Title, VacationHours, SickLeaveHours
COMPUTE sum(VacationHours), sum(SickLeaveHours) BY Title COMPUTE sum(VacationHours),
sum(SickLeaveHours)

In the preceding query, the data of the VacationHours and SickLeaveHours columns is first
grouped for the recruiters and the stockers. The COMPUTE BY clause computes the subtotals
of vacation hours and sick leave hours for each title, and the COMPUTE clause computes their
grand totals, as shown in the following figure.

Output of the COMPUTE BY Clause

PIVOT
The database users might need to view data in a user-defined format. Such reports might involve
summarizing data on the basis of various criteria. SQL Server 2005 allows you to generate
summarized data reports by using the PIVOT clause of the SELECT statement.

The PIVOT clause is used to transform a set of rows into columns. PIVOT rotates a table-valued
expression by turning the unique values from one column in the expression into multiple
columns in the output. In addition, it performs aggregations on the remaining column values, if
required, in the output. The syntax of the PIVOT operator is:
SELECT * FROM table_name
PIVOT (aggregation_function (value_column)
FOR pivot_column
IN (column_list)
) table_alias

Consider an example where you want to display the number of purchase orders placed by
certain employees with each vendor. The following query provides this report:
SELECT VendorID, [164] AS Emp1, [198] AS Emp2, [223] AS Emp3, [231] AS Emp4, [233]
AS Emp5
FROM
(SELECT PurchaseOrderID, EmployeeID, VendorID
FROM Purchasing.PurchaseOrderHeader) p
PIVOT
(
COUNT (PurchaseOrderID)
FOR EmployeeID IN
( [164], [198], [223], [231], [233] )
) AS pvt
ORDER BY VendorID

The preceding query displays the output, as shown in the following figure.

Output of the PIVOT Clause

Just a minute:
When grouping data, which of the following clauses helps eliminate the groups that do not
match the condition specified?

1. NOT IN
2. HAVING
3. WHERE
4. COMPUTE

Answer:
2. HAVING

Just a minute:
Match the column A with column B.
Column A Column B
ALL Used by aggregate functions
PIVOT Returns zero or more values
IN Used for modifying comparison operator
> Relational operator
Answer:
Column A Column B
ALL Used for modifying comparison operator
PIVOT Used by aggregate functions
IN Returns zero or more values
> Relational operator

Activity: Summarizing and Grouping Data

Problem Statement

You are a database developer of AdventureWorks, Inc. The management wants to view the
average quantity ordered for each product group. The data should be displayed in the descending
order of ProductID.

The sales details are stored in the SalesOrderHeader and SalesOrderDetail tables in the
AdventureWorks database. How will you generate this report?
Solution
To solve the preceding problem, you need to perform the following tasks:

1. Create a query.
2. Execute the query to verify the result.

Task 1: Creating a Query

To display the average order quantity, you need to use the avg() aggregate function. In addition,
to group the products, you need to use the GROUP BY clause.

Type the following query in the Query Editor window of the Microsoft SQL Server
Management Studio window to display the result:
SELECT ProductID, 'Average Order Quantity' = avg(OrderQty) FROM
Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY ProductID DESC

Task 2: Executing the Query to Verify the Result

Select the query and press the F5 key or click the Execute button to execute the query and view
the result set. The following figure displays the output.

Generating the Report

Summary
In this chapter, you learned that:

Data can be retrieved from a database by using the SELECT statement.


Data of all the columns of a table can be retrieved by specifying * in the SELECT statement.
Data that has to be retrieved based on a condition is specified by adding the WHERE clause.
Literals and user-defined headings are added to change the display.
The concatenation operator is used to concatenate a string expression.
Arithmetic operators are used to perform mathematical operations.
Comparison operators test the similarity between two expressions.
Logical operators are used in the SELECT statement to retrieve records based on one or
more matching conditions. The logical operators are AND, OR, and NOT.
The range operators retrieve data based on a range. There are two types of range
operators, BETWEEN and NOT BETWEEN.
The IN keyword allows the selection of values that match any one of the values in a list.
The NOT IN keyword restricts the selection of values that match any one of the values in a
list.
The LIKE keyword is used to specify the pattern search.
The IS NULL keyword is used to retrieve missing values.
The ORDER BY clause is used to retrieve data in a specific order.
The TOP keyword retrieves only the first set of rows, which can either be a number or a
percent of rows that will be returned from a query result.
The DISTINCT keyword eliminates duplicate rows.
The string functions are used to format data in the result set.
The date functions are used to manipulate date values.
The mathematical functions are used to perform numerical operations.
The ranking functions are used to generate sequential numbers for each row or to give a rank
based on specific criteria.
The system functions are used to query system tables.
The aggregate functions, such as avg, count, min, max, and sum, are used to retrieve
summarized data.
The GROUP BY, GROUP BY ALL, and PIVOT clauses are used to group the result set.
The COMPUTE and COMPUTE BY clauses are used to calculate summarized values in a
grouped result set.
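Several of these clauses are commonly combined in a single statement. The following query is an illustrative sketch against the AdventureWorks database (the $10,000 threshold is arbitrary), showing WHERE, GROUP BY, HAVING, ORDER BY, and TOP working together:

```sql
-- Filter rows, group by product, keep only the large groups,
-- and return the five products with the highest totals.
SELECT TOP 5 ProductID, SUM(LineTotal) AS 'Total Value'
FROM Sales.SalesOrderDetail
WHERE OrderQty > 1
GROUP BY ProductID
HAVING SUM(LineTotal) > 10000
ORDER BY SUM(LineTotal) DESC
```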

Exercises
Connect to SQL Server and use the AdventureWorks database.

Exercise 1

Display the details of all the customers.

Exercise 2

Display the ID, type, number, and expiry year of all the credit cards in the following format.

Exercise 3

Display the customer ID and the account number of all the customers who live in the Territory ID
4.

Exercise 4
Display all the details of the sales orders that have a cost exceeding $2,000.

Exercise 5

Display the sales order details of the product named 'Cable Lock'.

Hint: The Product ID for Cable Lock is 843.

Exercise 6

Display the list of all the orders placed on June 06, 2004.

Exercise 7

Display a report of all the orders in the following format.

Hint: Total Cost = Order Quantity * Unit Price

Exercise 8

Display a list of all the sales orders in the price range of $2,000 to $2,100.

Hint: LineTotal stores the subtotal of each product, computed as UnitPrice * (1 - UnitPriceDiscount) * OrderQty.

Exercise 9
Display the name, country region code, and sales year to date for the territory with Territory ID
as 1.

Exercise 10

Display the details of the orders that have a tax amount of more than $10,000.

Exercise 11

Display the sales territory details of Canada, France, and Germany.

Exercise 12

Generate a report that contains the IDs of sales persons living in the territory with TerritoryID as
2 or 4. The report is required in the following format.
Sales Person ID Territory ID

Exercise 13

Display the details of the Vista credit cards that are expiring in the year 2006.

Exercise 14

Display the details of all the orders that were shipped after July 12, 2004.

Exercise 15

Display the orders placed on July 01, 2001 that have a total cost of more than $10,000 in the
following format.

Exercise 16

Display the details of the orders that have been placed by customers online.

Exercise 17

Display the order ID and the total amount due of all the sales orders in the following format.
Ensure that the order with the highest price is at the top of the list.

Exercise 18

Display the order ID and the tax amount for the sales orders that are less than $2,000. The data
should be displayed in the ascending order.

Exercise 19
Display the order number and the total value of the order in ascending order of the total value.

Exercise 20
Display the maximum, minimum, and the average rate of sales orders.

Exercise 21
Display the total value of all the orders put together.

Exercise 22

Display the Order ID of the top five orders based on the total amount due in the year 2001.

Hint: You can extract the year part from a date by using the DATEPART function.

Exercise 23

Display the details of all the currencies that have the word 'Dollar' in their name.

Exercise 24

Display all territories whose names begin with 'N'.

Exercise 25

Display the SalesPersonID, the TerritoryID, and the sales quota for those sales persons who have
been assigned a sales quota. The data should be displayed in the following format.

Exercise 26

What will be the output of the following code written to display the total order value for each
order?
SELECT SalesOrderID,ProductID,sum(LineTotal) FROM Sales.SalesOrderDetail GROUP BY
SalesOrderID

Exercise 27
You can place an order for more than one product. Display a report containing the product ID and
the total cost of products for the product ID whose total cost is more than $10000.

Exercise 28

The following SQL query containing the COMPUTE BY clause generates errors. What are the
possible causes of such errors?
SELECT ProductID, LineTotal AS 'Total' FROM Sales.SalesOrderDetail
COMPUTE sum(LineTotal) BY ProductID

Exercise 29

Display the top three sales persons based on the bonus.


Exercise 30
Display the details of those stores, which have Bike in their name.

Exercise 31

Display the total amount collected from the orders for each order date.

Exercise 32

Display the total unit price and the total amount collected after selling the products, 774 and 777.
In addition, calculate the total amount collected from these two products.

Exercise 33

Display the sales order ID and the maximum and minimum values of the order based on the sales
order ID. In addition, ensure that the order amount is greater than $5,000.

Exercise 34
A report containing the sales order ID and the average value of the total amount, which is greater
than $5,000 is required in the following format.

Exercise 35

Display the different types of credit cards used for purchasing products.

Exercise 36

Display the customer ID, name, and sales person ID for all the stores. According to the
requirement, only first 15 letters of the name should be displayed.

Exercise 37

Display all orders in the following format.

Exercise 38

Display SalesOrderID, OrderQty, and UnitPrice from the SalesOrderDetail table where a similar
unit price needs to be marked with an identical value.
Exercise 39
Display the EmployeeID and the HireDate of the employees from the Employee table. The month
and the year need to be displayed.

Chapter 3
Querying Data by Using Joins and Subqueries
In a normalized database, the data can be stored in multiple tables. When you need to view data
from related tables together, you can join the tables with the help of common attributes. You can
also use subqueries where the result of a query is used as an input for the condition of another
query.

This chapter discusses how to query data from multiple tables by applying various types of joins,
such as an inner join, outer join, cross join, equi join, or self join. Further, it explains how to use
subqueries.

Objectives
In this chapter, you will learn to:
Query data by using joins
Query data by using subqueries

Querying Data by Using Joins


When a SQL query executes, it returns a result set. A result set is nothing but a set of rows retrieved
from a table. As a database developer, you may need to retrieve data from more than one table.
The retrieved data needs to be displayed in a single result set. In such a case, different columns in
the result set can obtain data from different tables. To retrieve data from multiple tables, SQL
Server allows you to apply joins. Joins allow you to view data from related tables in a single result
set. You can join more than one table based on a common attribute.

Depending on the requirements to view data from multiple tables, you can apply different types
of joins, such as inner join, outer join, cross join, equi join, or self join.

Using an Inner Join


An inner join retrieves records from multiple tables after comparing values present in a common
column. When an inner join is applied, only rows with values satisfying the join condition in the
common column are displayed. Rows in both the tables that do not satisfy the join condition are
not displayed.

For example, the details of the films are stored in the following Films table.
FilmID FilmName YearMade
1 My Fair Lady 1964
2 Unforgiven 1992
Films Table

The details of the actors are stored in the following Actors table.
FilmID FirstName LastName
1 Rex Harrison
1 Audrey Hepburn
2 Clint Eastwood
5 Humphrey Bogart

Actors Table

If you join the Films and Actors tables using the inner join, the result set will be retrieved, as
shown in the following table.

FilmID FilmName YearMade FirstName LastName
1 My Fair Lady 1964 Rex Harrison
1 My Fair Lady 1964 Audrey Hepburn
2 Unforgiven 1992 Clint Eastwood

Result Set of an Inner Join

Notice that the row with the last name ‘Bogart’ is not present in the result set as the corresponding
matching record is not found in the Films table.

Consider another example, where you have created two tables, Customer and Orders. The
Customer table is used to store the details of the customers. The Orders table stores the order
details. The CustomerID is the common column between these two tables. The following figure
depicts how inner join produces the set of records that match both the tables, Customer and Orders.

Inner Join

In the preceding figure, all the rows in the Customer and Orders tables that are not matching the
join condition are excluded from the result set.

A join is implemented by using the SELECT statement, where the SELECT list contains the name
of the columns to be retrieved from the tables. The FROM clause contains the names of the tables
from which combined data is to be retrieved. The WHERE clause specifies the condition, with a
comparison operator, based on which the tables will be joined.

The syntax of applying an inner join in the SELECT statement is:


SELECT column_name, column_name [,column_name]
FROM table1_name JOIN table2_name
ON table1_name.ref_column_name join_operator table2_name.ref_column_name

where,
table1_name and table2_name are the names of the tables that are joined.
join_operator is the comparison operator based on which the join is applied.
table1_name.ref_column_name and table2_name.ref_column_name are the names of the
columns on which the join is applied.

An inner join is the default join type. Therefore, you can apply an inner join by using just the
JOIN keyword. Alternatively, you can use the INNER JOIN keywords to make the join explicit.
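For example, the following two statements are equivalent; both apply an inner join between the Employee and EmployeePayHistory tables:

```sql
-- The JOIN keyword alone implies an inner join.
SELECT e.EmployeeID, eph.Rate
FROM HumanResources.Employee e
JOIN HumanResources.EmployeePayHistory eph
ON e.EmployeeID = eph.EmployeeID

-- The INNER keyword makes the same join explicit.
SELECT e.EmployeeID, eph.Rate
FROM HumanResources.Employee e
INNER JOIN HumanResources.EmployeePayHistory eph
ON e.EmployeeID = eph.EmployeeID
```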

Whenever a column is mentioned in a join condition, the column should be referred by prefixing
it with the table name to which it belongs or with a table alias. A table alias is used to refer to the
table with another name or to uniquely identify the table. It is defined in the FROM clause of the
SELECT statement.

When listing the column names in the SELECT statement, it is mandatory to use a table name or
a table alias if an ambiguity arises due to duplicate column names in multiple tables.

The following query displays the Employee ID and Title for each employee from the Employee
table, and the Rate and PayFrequency columns from the EmployeePayHistory table:
SELECT e.EmployeeID,e.Title, eph.Rate,eph.PayFrequency
FROM HumanResources.Employee e JOIN HumanResources.EmployeePayHistory eph ON
e.EmployeeID = eph.EmployeeID

In the preceding query, the Employee and EmployeePayHistory tables are joined on the common
column, EmployeeID. The query also assigns e as the alias of the Employee table and eph as the
alias of the EmployeePayHistory table. The column names are also listed with the table alias
names.

The following figure displays the data of the Employee table.


Employee Table

The following figure displays the data of the EmployeePayHistory table.

EmployeePayHistory Table

The following figure displays the output of the preceding inner join query.

Result Set of an Inner Join


Based on the relationship between tables, you need to select the common column to set the join
condition. In the preceding inner join query, the common column is EmployeeID, which is the
primary key of the Employee table and the foreign key of the EmployeePayHistory table.

Observe the result set after the join. The record of the employee with EmployeeID as 1 from the
Employee table is joined with the record of the employee with EmployeeID as 1 from the
EmployeePayHistory table.

While applying joins, you can also check for other conditions. The following query retrieves the
employee ID and the designation from the Employee table for all the employees whose pay rate
is greater than 40:
SELECT 'Employee ID' = e.EmployeeID, 'Designation' = e.Title
FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeePayHistory eph ON
e.EmployeeID = eph.EmployeeID WHERE eph.Rate > 40

In the preceding query, the tables are joined based on the EmployeeID column, which is common
in both the tables. In addition, only those records are selected where the value in the Rate column
of the EmployeePayHistory table is greater than 40.

Just a minute:
Why do you need a table alias with the column name?

Answer:
A table alias is required to uniquely identify the columns in a SELECT query and to avoid the
ambiguity that arises when the same column name exists in multiple tables.

Using an Outer Join


In comparison to an inner join, an outer join displays the result set containing all the rows from
one table and the matching rows from another table. For example, if you create an outer join on
table A and table B, it will show you all the records of table A and only those records from table
B for which the condition on the common column holds true.

An outer join displays NULL for the columns of the related table where it does not find matching
records. The syntax of applying an outer join is:
SELECT column_name, column_name [,column_name]
FROM table1_name [LEFT | RIGHT| FULL] OUTER JOIN table2_name
ON table1_name.ref_column_name join_operator table2_name.ref_column_name

An outer join is of the following types:

Left outer join


Right outer join
Full outer join

Using a Left Outer Join


A left outer join returns all rows from the table specified on the left side of the LEFT OUTER
JOIN keyword and the matching rows from the table specified on the right side. It displays NULL
for the columns of the table specified on the right side where it does not find matching records.

The following figure depicts how the left outer join produces a complete set of records from the
Customer table, with matching records (where available) in the Orders table.

Left Outer Join

In the preceding figure, all the non-matching rows of the Orders table are excluded from the result
set.

Consider an example. The SpecialOfferProduct table contains a list of products that are on special
offer. The SalesOrderDetail table stores the details of all the sales transactions. The users at
AdventureWorks, Inc. need to view the transaction details of these products. In addition, they
want to view the ProductID of the special offer products for which no transaction has been done.

To perform this task, you can use the LEFT OUTER JOIN keyword, as shown in the following
query:
SELECT p.ProductID, p1.SalesOrderID, p1.UnitPrice FROM Sales.SpecialOfferProduct p
LEFT OUTER JOIN
Sales.SalesOrderDetail p1 ON p.ProductID = p1.ProductID WHERE p1.SalesOrderID IS
NULL

The following figure displays the output of the preceding query.

Result Set of a Left Outer Join

The preceding figure displays NULL in the SalesOrderID and UnitPrice columns for the products
for which no transaction was performed.
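To also view the transaction details of the special offer products that were sold, you can drop the WHERE clause from the preceding query. This sketch then returns every product on special offer, with NULL in the SalesOrderID and UnitPrice columns only for the products that have no matching sales transaction:

```sql
-- Without the IS NULL filter, both matched and unmatched
-- special offer products appear in the result set.
SELECT p.ProductID, p1.SalesOrderID, p1.UnitPrice
FROM Sales.SpecialOfferProduct p
LEFT OUTER JOIN Sales.SalesOrderDetail p1
ON p.ProductID = p1.ProductID
```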

Using a Right Outer Join


A right outer join returns all the rows from the table specified on the right side of the RIGHT
OUTER JOIN keyword and the matching rows from the table specified on the left side.

The following figure depicts how the right outer join produces a complete set of records from the
Orders table, with matching records (where available) in the Customer table.

Right Outer Join

In the preceding figure, all the non-matching rows of the Customer table are excluded from the
result set.

Consider the example of AdventureWorks, Inc. The JobCandidate table stores the details of all
the job candidates. You need to retrieve a list of all the job candidates. In addition, you need to
find which candidate has been employed in AdventureWorks, Inc. To perform this task, you can
apply a right outer join between the Employee and JobCandidate tables, as shown in the following
query:
SELECT e.Title, d.JobCandidateID FROM HumanResources.Employee e
RIGHT OUTER JOIN HumanResources.JobCandidate d ON e.EmployeeID=d.EmployeeID

The following figure displays the output of the preceding query.

Result Set of a Right Outer Join

In the preceding figure, the result set displays the JobCandidateID column from the JobCandidate
table and the Title column from the matching rows of the Employee table.

Using a Full Outer Join

A full outer join is a combination of left outer join and right outer join. This join returns all the
matching and non-matching rows from both the tables. However, the matching records are
displayed only once. In case of non-matching rows, a NULL value is displayed for the columns
for which data is not available.

The following figure depicts how the result set is produced using the full outer join.

Full Outer Join

In the preceding figure, all the rows from both the tables are included in the result set, regardless
of whether there are matching rows in the tables.

Consider an example where the HR department of an organization stores the details of an


employee in the Employee table. The Employee table contains records, as shown in the following
figure.

Records of the Employee Table

In addition, the educational details are stored in a master table named Education. The Education
table contains records, as shown in the following figure.
Records of the Education Table

You need to generate a report displaying the list of all the employees with their highest educational
qualification details. To perform this task, you can use the FULL OUTER JOIN keyword, as
shown in the following query:
SELECT e.EmployeeID, e.EmployeeName, ed.EmployeeEducationCode,
ed.Education
FROM Employee e FULL OUTER JOIN Education ed
ON e.EmployeeEducationCode = ed.EmployeeEducationCode

The following figure displays the output of the preceding query.

Output of Full Outer Join

In the preceding figure, the employee details and their highest educational qualification is
displayed. For non-matching values, NULL is displayed.

Just a minute:
When do you use the right outer join?

Answer:
You can use the right outer join when you need all the records from the table on the right side of
the outer join and only the matching records from the table on the left side.
Using a Cross Join
A cross join is also known as the Cartesian Product. It joins each row from one table with each
row of the other table. The number of rows in the result set is the number of rows in the first table
multiplied by the number of rows in the second table. This implies that if table A has 10 rows and
table B has 5 rows, then all the 10 rows of table A are joined with all the 5 rows of table B.
Therefore, the result set will contain 50 rows.

For example, table A contains three shapes: Circle, Rectangle, and Line. Table B contains three
colors: Red, Blue, and Green. If you join these two tables using the cross join, the result set will
contain nine rows, as shown in the following figure.

Using the Cross Join

Consider another example of a storehouse that sells computers. As a database developer, you have
saved the configuration and price details of computers in the ComputerDetails table, as shown in
the following figure.

ComputerDetails Table

The store also sells peripheral devices. You have saved the details of these devices in the
AddOnDetails table, as shown in the following figure.
AddOnDetails Table

To identify the total price of a computer with all the combinations of add-on devices, you can use
the CROSS JOIN keyword, as shown in the following query:
SELECT A.CompDescription, B.AddOnDescription, A.Price + B.Price AS 'Total Cost' FROM
ComputerDetails A CROSS JOIN AddOnDetails B

The preceding query combines the records of both the tables to display the total price of a
computer with all the possible combinations, as shown in the following figure.

Cross Join Between the ComputerDetails and the AddOnDetails Tables

Using an Equi Join


An equi join is the same as an inner join and joins tables with the help of a foreign key. However,
an equi join is used to display all the columns from both the tables. As a result, the common
column from each of the joining tables appears in the result set.

Consider an example where you apply an equi join between the EmployeeDepartmentHistory,
Employee, and Department tables by using a common column, EmployeeID. To perform this task,
you can use the following query:
SELECT * FROM HumanResources.EmployeeDepartmentHistory d JOIN
HumanResources.Employee e ON d.EmployeeID = e.EmployeeID JOIN
HumanResources.Department p ON p.DepartmentID = d.DepartmentID

The output of the preceding query displays the EmployeeID column from all the tables, as shown
in the following figure.
Output of an Equi Join

Just a minute:
What is the difference between an equi join and an inner join?

Answer:
An equi join is used to retrieve all the columns from both the tables. An inner join is used to retrieve
selected columns from the tables.

Using a Self Join


In a self join, a table is joined with itself. As a result, one row in a table correlates with other rows
in the same table. In a self join, a table name is used twice in the query. Therefore, to differentiate
the two instances of a single table, the table is given two alias names.

Consider the example of an Employee table. The Employee table contains records, as shown in
the following figure.

Employee Table

You want to display the employee details along with their manager details, such as manager id
and manager designation. However, in the preceding table, designation of the manager is not
displayed. Here, the manager is also an employee. Therefore, the designation of the manager can
be retrieved from the title column, which stores the designation of an employee. Therefore, you
need to join the table with itself to obtain the required result.
To perform a self join, you need to treat the physical table as two logical tables, as shown in
the following figure.

Two Logical Tables

In the preceding figure, the Employee (emp) table represents the employeeid, title and managerid
of the employee, whereas the Employee (mgr) table represents the employeeid and title of the
managers. Using the preceding figure, you can easily identify the designations of the managers.

The following query joins the Employee table with itself to display the EmployeeID attributes and
the designations of all the employees with the designations of their managers:
SELECT emp.EmployeeID, emp.Title AS Employee_Designation, emp.ManagerID, mgr.Title
AS Manager_Designation FROM HumanResources.Employee emp, HumanResources.Employee mgr
WHERE emp.ManagerID = mgr.EmployeeID

The output of the preceding query is shown in the following figure.

Result Set of a Self Join

Activity: Using Joins

Problem Statement

The HR manager of AdventureWorks, Inc. requires a report containing the following details:
Employee ID
Employee Name
Department Name
Date of Joining
EmployeeAddress

How will you generate this report?

Solution
To solve the preceding problem, you need to perform the following tasks:

1. Identify the join.


2. Create a query based on join.
3. Execute the query to verify the result.

Task 1: Identifying the Join

The required details are available in different tables of the AdventureWorks database. Therefore,
you need to join the Employee, Department, EmployeeDepartmentHistory, Contact, and Address
tables.

You need to display the details of all the employees. Therefore, you need to retrieve the
EmployeeID column of all the employees from the Employee table and obtain the employee name,
department name, hire date, and address for each employee from the other tables. To perform this
task, you need to use an inner join.

Task 2: Creating a Query Based on Join

In the Microsoft SQL Server Management Studio window, type the following query in the
Query Editor window:
SELECT e.EmployeeID AS 'Employee ID',
h.FirstName AS 'Employee Name', g.Name AS 'Department Name',
e.HireDate AS 'Date of Joining', j.AddressLine1 AS 'Employee Address' FROM
HumanResources.Employee AS e
JOIN HumanResources.EmployeeDepartmentHistory AS f ON
e.EmployeeID = f.EmployeeID JOIN HumanResources.Department AS g
ON f.DepartmentID = g.DepartmentID
JOIN Person.Contact AS h ON e.ContactID = h.ContactID
JOIN HumanResources.EmployeeAddress AS i ON
e.EmployeeID = i.EmployeeID JOIN Person.Address AS j
ON i.AddressID = j.AddressID
Task 3: Executing the Query to Verify the Result

Press the F5 key to execute the query and view the result set. The following figure displays the
output.

Result Set of the Join

The query retrieves different columns from five different tables based on the common column.

Querying Data by Using Subqueries


While querying data from multiple tables, you might need to use the result of one query as an
input for the condition of another query. For example, in the AdventureWorks database, you need
to view the designation of all the employees who earn more than the average salary. Here, you
have to perform two steps to find the employees who earn more than the average salary. First, you
have to find the average salary. Second, you have to find the employee who earns more than the
average salary. In such cases, you can combine more than one statement in a subquery.

A subquery is an SQL statement that is used within another SQL statement. Subqueries are nested
inside the WHERE or HAVING clause of the SELECT, INSERT, UPDATE, and DELETE
statements. The query that represents the parent query is called an outer query, and the query that
represents the subquery is called an inner query. The database engine executes the inner query
first and returns the result to the outer query to calculate the result set.

For example, the EmployeeDetails table contains five columns: EmployeeID, EmpName,
Designation, Salary, and DeptNo. The following figure represents the EmployeeDetails table.

EmployeeDetails Table

You want to find all employees who have the same designation as John. Here, the designation of
John is not known in advance. Therefore, you have to execute the following query to find the
designation of John:
SELECT Designation FROM EmployeeDetails
WHERE EmpName = 'John'

Consider that the output of the preceding query is 'Executive'. Next, you have to execute the
following query to get the desired output:
SELECT * from EmployeeDetails
WHERE Designation = 'Executive'

Instead of executing two queries, you can combine these two queries to get the desired output, as
shown in the following query:
SELECT * FROM EmployeeDetails
WHERE Designation = (SELECT Designation FROM EmployeeDetails
WHERE EmpName = 'John')

In the preceding query, the inner query is executed first and returns the designation of John to the
outer query. Then, the outer query is executed using the return value and displays the desired
output, as shown in the following figure.

Output of the SubQuery

Depending on the output generated by the subquery and the purpose for which it is to be used in
the outer query, you can use different keywords, operators, and functions in the subqueries.

Using the IN and EXISTS Keywords


A subquery returns values that are used by the outer query. A subquery can return one or more
values. Depending on the requirement, you can use these values in different ways in the outer
query.

For example, in the AdventureWorks database, you need to display the department name for an
employee whose EmployeeID is 46. To perform this task, you can use the following query:
SELECT Name FROM HumanResources.Department
WHERE DepartmentID =
(SELECT DepartmentID FROM HumanResources.EmployeeDepartmentHistory
WHERE EmployeeID = 46 AND EndDate IS NULL)

In the preceding query, the inner subquery returns the DepartmentID column of the employee with
EmployeeID as 46. Using this DepartmentID, the outer query returns the name of the department
from the Department table.
In the preceding query, the condition EndDate IS NULL signifies that you need to extract the ID
of the department where the employee is currently working.

In the preceding query, the subquery returns a single value. However, at times, you need to return
more than one value from the subquery. In addition, you might need to use a subquery only to
check the existence of some records and based on that you need to execute the outer query.

You can specify different kinds of conditions on subqueries by using the following keywords:

IN
EXISTS

Using IN Keyword
If a subquery returns more than one value, you might need to match a column value with any of
the values in the list returned by the inner query. To perform this task, you need to use the IN
keyword.

The syntax of using the IN keyword is:


SELECT column, column [,column]
FROM table_name
WHERE column [ NOT ] IN
( SELECT column FROM table_name [WHERE conditional_expression] )

For example, you need to retrieve the EmployeeID attribute of all the employees, who live in
Bothell, from the EmployeeAddress table. To perform this task, you need to use a query to obtain
the AddressID of all the addresses that contain the word Bothell. You can then obtain the
EmployeeID from the Employee table where the AddressID matches any of the AddressIDs
returned by the previous query.

To perform this task, you can use the following query:


SELECT EmployeeID FROM HumanResources.EmployeeAddress WHERE AddressID IN (SELECT
AddressID FROM Person.Address WHERE City = 'Bothell')

The output of the preceding query is shown in the following figure.

Result Set of a Subquery Using the IN Keyword

Using EXISTS Keyword


You can also use a subquery to check if a set of records exist. For this, you need to use the EXISTS
clause with a subquery. The EXISTS keyword always returns a TRUE or FALSE value.

The EXISTS clause checks for the existence of rows according to the condition specified in the
inner query and passes the existence status to the outer query. The EXISTS keyword returns a
TRUE value if the result of the inner query contains any row.

A query introduced with the EXISTS keyword differs from other queries. The EXISTS keyword
is not preceded by any column name, constant, or other expression, and the SELECT list of the
inner query typically contains an asterisk (*).

The syntax of the EXISTS keyword in the SELECT statement is:


SELECT column, column [,column]
FROM table_name
WHERE EXISTS ( SELECT column FROM table_name [WHERE conditional_expression] )

For example, the users of AdventureWorks, Inc. need a list containing the employee Id and title
of all the employees who have worked in the marketing department at any point of time. The
department Id of the marketing department is 4.

To generate the required list, you can write the following query by using the EXISTS keyword:
SELECT EmployeeID, Title FROM HumanResources.Employee
WHERE EXISTS
(SELECT * FROM HumanResources.EmployeeDepartmentHistory WHERE EmployeeID =
HumanResources.Employee.EmployeeID AND DepartmentID = 4)

The following figure displays the output generated by the preceding query.

Result Set of a Subquery Using the EXISTS Keyword

Consider another example, where the DeptDetails table stores the details of the departments, as
shown in the following figure.
DeptDetails Table

The EmployeeDetails table stores the details of the employees, as shown in the following figure.

EmployeeDetails Table

You want to display the department details where no employee exists. To perform this task, you
have to execute the following query:
SELECT * FROM DeptDetails d
WHERE NOT EXISTS
(SELECT * FROM EmployeeDetails e
WHERE e.DeptNo = d.DeptNo)

The preceding query will display the output, as shown in the following figure.

Result Set of a Subquery Using the NOT EXISTS Keyword

A subquery must be enclosed within parentheses and cannot use the ORDER BY or the COMPUTE
BY clause.

Using Modified Comparison Operators


While using subqueries, you can use the =, >, and < comparison operators to create a condition
that checks the value returned by the subquery. When a subquery returns more than one value,
you might need to apply the operators to all the values returned by the subquery. To perform this
task, you can modify the comparison operators in the subquery. SQL Server provides the ALL
and ANY keywords that can be used to modify the existing comparison operators.

The ALL keyword returns a TRUE value if all the values retrieved by the subquery satisfy the
comparison operator, or if the subquery does not return any rows to the outer statement. It returns
a FALSE value if any retrieved value fails the comparison.
The ANY keyword returns a TRUE value if at least one value retrieved by the subquery satisfies
the comparison operator. It returns a FALSE value if no value in the subquery satisfies the
comparison operator, or if the subquery does not return any rows to the outer statement.

The following table shows the operators that can be used with the ALL and ANY keywords:

Operator   Description
>ALL       Means greater than the maximum value in the list. The expression
           column_name >ALL (10, 20, 30) means 'greater than 30'.
>ANY       Means greater than the minimum value in the list. The expression
           column_name >ANY (10, 20, 30) means 'greater than 10'.
=ANY       Means equal to any of the values in the list. It acts in the same way as the IN
           clause. The expression column_name =ANY (10, 20, 30) means 'equal to either
           10 or 20 or 30'.
<>ANY      Means not equal to at least one value in the list. The expression
           column_name <>ANY (10, 20, 30) means 'not equal to 10, or not equal to 20, or
           not equal to 30'; note that this condition is true for every value when the list
           contains more than one distinct value.
<>ALL      Means not equal to all the values in the list. It acts in the same way as the NOT IN
           clause. The expression column_name <>ALL (10, 20, 30) means 'not equal to 10
           and 20 and 30'.

The ALL and ANY Keywords

The following query displays the employee Id and the title of all the employees whose vacation
hours are more than the vacation hours of employees designated as Recruiter:
SELECT EmployeeID, Title
FROM HumanResources.Employee
WHERE VacationHours >ALL (SELECT VacationHours
FROM HumanResources.Employee WHERE Title ='Recruiter')

In the preceding query, the inner query returns the vacation hours of all the employees titled
Recruiter. The outer query uses the >ALL comparison operator to retrieve the details of those
employees whose vacation hours are greater than those of every employee titled Recruiter.

The output of the preceding query is displayed in the following figure.


Result Set of the Subquery Using the Modified Comparison Operator
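A >ALL comparison can often be expressed with an aggregate instead. As a sketch against the standard AdventureWorks schema, the preceding query can be rewritten with MAX; the two forms behave differently only when the inner query returns no rows:

```sql
-- Equivalent rewrite of the >ALL query using MAX:
-- greater than ALL recruiter vacation hours means greater than the maximum.
SELECT EmployeeID, Title
FROM HumanResources.Employee
WHERE VacationHours > (SELECT MAX(VacationHours)
                       FROM HumanResources.Employee
                       WHERE Title = 'Recruiter')
```

If no employee is titled Recruiter, MAX returns NULL and this form returns no rows, whereas >ALL over an empty set evaluates to TRUE.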

Consider another example of the EmployeeDetails table. You want to display the details of
employees whose salary is more than the lowest salary of department 10. To perform this task,
you have to use the following query:
SELECT * FROM EmployeeDetails
WHERE Salary > ANY ( SELECT DISTINCT Salary FROM EmployeeDetails
WHERE DeptNo = 10)

In the preceding query, the ANY operator compares the salary of each employee to each value
returned by the subquery. The following figure displays the output of the preceding query.

Output of Using ANY Operator

Just a minute:
What is the use of the EXISTS keyword in a subquery?

Answer:
The EXISTS keyword is used to check whether the inner query returns any rows. It passes TRUE
to the outer query if at least one row satisfies the condition specified in the inner query.

Using Aggregate Functions


While using subqueries, you can also use aggregate functions in the subqueries to generate
aggregated values from the inner query. For example, in a manufacturing organization, the
management wants to view the sales records of those items whose sales records are higher than
the average sales record of a particular product. Therefore, the user first needs to obtain the
average sales record of a particular product, and then find all the records whose sales records
exceed the average value. For this, you can use aggregate functions inside the subquery.

The following query displays the employee Id of those employees whose vacation hours are
greater than the average vacation hours of employees with title as Marketing Assistant:
SELECT EmployeeId FROM HumanResources.Employee
WHERE VacationHours >(SELECT AVG(VacationHours) FROM HumanResources.Employee
WHERE Title = 'Marketing Assistant')

In the preceding query, the inner query returns the average vacation hours of all the employees
titled Marketing Assistant. The outer query uses the > comparison operator to retrieve the
employee Id of all those employees who have vacation hours more than the average vacation
hours of a Marketing Assistant.

The output of the preceding query is shown in the following figure.

Result Set of the Subquery Using the Aggregate Function

Using Nested Subqueries


A subquery can contain one or more subqueries. Subqueries are used when the condition of a
query is dependent on the result of another query, which in turn is dependent on the result of
another subquery.

For example, you need to view the department Id of an employee whose e-mail address is
taylor0@adventure-works.com. To perform this task, you can use the following query:
SELECT DepartmentID FROM HumanResources.EmployeeDepartmentHistory
WHERE EmployeeID = /* Level 1 inner query */
(SELECT EmployeeID FROM HumanResources.Employee
WHERE ContactID = /* Level 2 inner query */
(SELECT ContactID FROM Person.Contact WHERE EmailAddress = 'taylor0@adventure-works.com')
)

In the preceding query, two queries are nested within another query. The level 2 inner query
returns the contact Id of an employee based on the e-mail address of the employee from the Person
table.
The level 1 inner query uses this contact Id to search for the employee Id of the employee with
the given e-mail address. The main query uses the employee Id returned by level 1 inner query to
search for the department Id from the EmployeeDepartmentHistory table.

The output of the preceding query is shown in the following figure.

Result Set of a Nested Subquery

You can nest subqueries up to 32 levels. However, the actual number of levels that can be used
depends on the memory available on the database server.

Consider another example where you want to display the employee details whose salary is greater
than the highest salary in the Admin department. To perform this task, you need to execute the
following query:
SELECT * FROM EmployeeDetails
WHERE Salary > ( SELECT max(Salary) FROM EmployeeDetails
WHERE DeptNo = ( SELECT DeptNo from DeptDetails
WHERE DeptName = 'Admin'))

In the preceding query, the level 2 inner query returns the department number of the admin
department. The level 1 inner query uses this department number to retrieve the highest salary in
this department. The main query uses the highest salary returned by level 1 inner query to display
the employee details. The preceding query displays the output, as shown in the following figure.

Result Set Displaying Highest Salary in the Admin Department</

Using Correlated Subqueries


A correlated subquery can be defined as a query that depends on the outer query for its evaluation.
In a normal nested subquery, the inner query executes only once. The main query is executed
using the value returned by the inner query. On the other hand, in a correlated subquery, the inner
query is driven by the outer query. Here, the inner query executes once for each row returned by
the outer query. For example, if the outer query returns 10 rows, the inner query will be executed
10 times.

In a correlated subquery, the WHERE clause references a table in the FROM clause. This means
that the inner query is evaluated for each row of the table specified in the outer query.
For example, you can use the following query to find the employees who earn the highest salary
in their department:
SELECT * FROM EmployeeDetails e
WHERE Salary = (SELECT max(Salary) FROM EmployeeDetails
WHERE DeptNo = e.DeptNo)

The preceding query can be identified as a correlated subquery since the e.DeptNo column in the
WHERE clause of the inner query is referencing the table specified in the FROM clause of the
outer SELECT statement. The preceding query will display the output, as shown in the following
figure.

Result Set of Correlated Subqueries

Consider another example of AdventureWorks database. The following query displays the
employee Id, designation, and number of hours spent on vacation for all the employees whose
vacation hours are greater than the average vacation hours identified for their title:
SELECT EmployeeID, Title, VacationHours
FROM HumanResources.Employee e1 WHERE e1.VacationHours >
(SELECT AVG(e2.VacationHours)
FROM HumanResources.Employee e2 WHERE e1.Title = e2.Title)

In the preceding query, the inner query returns the title wise average vacation hours from the
Employee table. The outer query retrieves the employee Id, title, and vacation hours of all the
employees whose vacation hours are greater than the average vacation hours retrieved by the inner
query. The output of the preceding query is shown in the following figure.

Result Set of Correlated Subqueries
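Correlated subqueries are also commonly written with the EXISTS keyword. As a sketch using the DeptDetails and EmployeeDetails tables shown earlier, the following query lists the departments that have at least one employee:

```sql
-- A correlated EXISTS form: for each department row d, the inner query
-- runs once and checks whether any employee belongs to that department.
SELECT d.DeptNo, d.DeptName
FROM DeptDetails d
WHERE EXISTS (SELECT * FROM EmployeeDetails e
              WHERE e.DeptNo = d.DeptNo)
```

This is the positive counterpart of the NOT EXISTS query used earlier to find departments with no employees.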

Just a minute:
Write a query to determine the employee Id and the department Id of all the employees whose
manager Id is 12 from the AdventureWorks database.

Answer:

SELECT EmployeeID, DepartmentID FROM HumanResources.EmployeeDepartmentHistory
WHERE EmployeeID IN (SELECT EmployeeID FROM HumanResources.Employee WHERE ManagerID = 12)

The IN keyword is used because more than one employee can have 12 as their manager Id.

Activity: Using Subqueries

Problem Statement

The management of AdventureWorks, Inc. is planning to revise the pay rate of the employees.
For this, they want a report containing the EmployeeID and designation of those employees whose
present pay rate is more than $40.

How will you generate this report?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a query.
2. Execute the query to verify the result.

Task 1: Creating a Query

Type the following query in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SELECT EmployeeID, Title FROM HumanResources.Employee WHERE
EmployeeID IN (SELECT EmployeeID FROM HumanResources.EmployeePayHistory WHERE
Rate>40)

In the preceding query, the subquery returns the employee Ids of all the employees whose pay rate
is greater than 40.

Task 2: Executing the Query to Verify the Result

Press the F5 key to execute the query and view the result set. The following figure displays the
output.
Output of the Preceding Query

Summary
In this chapter, you learned that:

Joins and subqueries are used to retrieve data from multiple tables.
An inner join combines records from multiple tables by using a comparison operator on a
common column.
A left outer join returns all the rows from the left table and the matching rows from the right
table.
A right outer join returns all the rows from the right table and the matching rows from the
left table.
A full outer join returns all the matching and non-matching rows from both the tables on
which join is applied.
A cross join returns each row from the first table joined with each row from the second table.
An equi join joins tables by using an equality operator and lists all the columns from the joining tables, including the duplicate join column.
A self join correlates one row in a table with other rows in the same table.
A subquery introduced with the IN clause returns zero or more values.
The EXISTS clause in a subquery returns data in terms of a TRUE or FALSE value.
The ALL and ANY keywords are used in a subquery to modify the existing comparison operators.
Aggregate functions can also be used in subqueries to generate aggregated values from the
inner query.
Subqueries that contain one or more subqueries are called nested subqueries.
A correlated subquery can be defined as a query that depends on the outer query for its
evaluation.

Exercises

Exercise 1
Write a query to display the sales person ID of all the sales persons and name of the territories to
which they belong.
Exercise 2
Write a query to display the sales person ID, territory ID, and territory name of all sales persons
in the following format.

Exercise 3

Write a query to display the sales order ID, the product ID, and order date for all products in the
following format.

Exercise 4

Write a query to display the sales person ID and territory names for all sales persons. If a sales
person does not belong to any territory, NULL should be displayed.

Exercise 5

Write a query to display the sales order ID, territory ID, month, and year of order in the following
format.

Exercise 6

Write a query to display the order number, territory name, order date, and the quarter in which
each order was placed, in the following format. (Hint: Use the QQ datepart to calculate the
quarter.)

Exercise 7
Write a query to display the total amount due of all the sales orders rounded off to a whole number.
In addition, display the sales order ID and the type of credit card through which the payment was
made.
Exercise 8
Write a query to display all the country region codes along with the corresponding territory IDs.

Exercise 9

Write a query to display the total amount due of all the orders in the following format.

Exercise 10

Write a query to display the order date along with the sales order ID and territory name. The order
date should be displayed in the dd/mm/yyyy format.

Exercise 11

Write a query to display the order ID and the territory name of the orders where the month of
order is May and year is 2004.

Exercise 12
Write a query to display the contact ID of the customers that have the 'Vista' credit card.

Exercise 13

Write a query to display the sales order IDs of the orders received from the Northeast territory.

Exercise 14
A report containing the sales order ID of those orders where the total value is greater than the
average of the total value of all the orders is required.

Exercise 15

Write a query to display the order ID, the order detail ID, and the total value of those orders where
the total value is greater than the maximum of the total value of order ID 43662.

Exercise 16

Write a query to display the order IDs and the credit card IDs of those cards which are expiring
in the year 2007.

Exercise 17
Write a query to display the credit card number of Catherine Abel.

Exercise 18

Write a query to display the details of those orders for which no discount was offered.

Exercise 19

Write a query to display the order IDs and the order detail IDs along with the total value of those
orders that have a total value greater than the average of the total value for the order ID.

Exercise 20

Write a query to display the sales order IDs of the orders that have been paid through Superior
Card.

Exercise 21

Write a query to display the average rate of the Australian Dollar, where the currency rate date is
1st July, 2004.

You need to use the AdventureWorks database to solve these exercises. You can view the details of
tables in the AdventureWorks database in the Appendix.

Chapter 4
Managing Databases and Tables
As a database developer, you are responsible for creating and managing databases and tables.
While creating tables, it is important for you to maintain data integrity. This implies that the data
in the tables is accurate, consistent, and reliable. SQL Server provides various checks that you can
apply on tables to enforce data integrity.

This chapter introduces different types of system databases. It also explains how to create and
drop user-defined databases. Further, it explains how to create and manage user-defined tables by
using DDL statements. In addition, the chapter focuses on various checks and rules that you can
apply to tables to ensure data integrity.

Objectives
In this chapter, you will learn to:
Manage databases
Manage tables
Managing Databases
A database is a collection of tables and objects such as views, indexes, stored procedures, and
triggers. The data stored in a database may be related to a process, such as an inventory or a
payroll. SQL Server can support many databases.

As a database developer, you might need to create databases to store information. At times, you
might also need to delete a database, if it is not required. Therefore, it is essential to know how to
create and delete a database.

SQL Server contains some standard system databases. Before creating a database, it is important
to identify the system databases supported by SQL Server 2005 and their importance.

A view is a virtual table, which gives access to a subset of columns from one or more tables.
An index is an internal table structure that SQL Server uses to provide quick access to the
rows of a table based on the values of one or more columns.
A stored procedure is a collection or batch of T-SQL statements and control-of-flow language
that is stored under one name and executed as a single unit.
A trigger is a block of code that constitutes a set of T-SQL statements. These statements are
activated in response to certain actions.

Identifying System Databases in SQL Server 2005


System databases are the standard databases that exist in every instance of SQL Server 2005.
These databases contain a specific set of tables that are used to store server-specific
configurations, and templates for other databases. In addition, these databases contain a temporary
storage area required to query the database.

SQL Server 2005 contains the following system databases:

master
tempdb
model
msdb
Resource

You can view the system databases in the Object Explorer window of SQL Server Management
Studio, as shown in the following figure.
System Databases

The Object Explorer window does not display the Resource database.
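As a sketch, you can also list the databases on an instance by querying the sys.databases catalog view; the Resource database does not appear in this list either:

```sql
-- Lists all databases known to the instance, with their Ids and creation dates.
SELECT name, database_id, create_date
FROM sys.databases
ORDER BY database_id
```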

The Master Database

The master database consists of system tables that keep track of the server installation as a whole
and all the other databases. It records all the server-specific configuration information, including
authorized users, databases, system configuration settings, and remote servers. In addition, it
records the instance-wide metadata, such as logon accounts, endpoints, and system configuration
settings.

The master database contains critical data that controls the SQL Server operations. It is advisable
not to give any permission to users on the master database. It is also important to update the
backups of the master database to reflect the changes that take place in the database as the master
database records the existence of all other databases and the location of those database files.

The master database also stores the initialization information of SQL Server. Therefore, if the
master database is unavailable, the SQL Server database engine will not be started.

The SQL Server Management Studio query window defaults to the master database context. Any
queries executed from the query window will execute in the master database unless you change the
context.

The tempdb Database

The tempdb database is a temporary database that holds all temporary tables and stored
procedures. It is automatically used by the server to resolve large or nested queries or to sort data
before displaying results to the user.

All the temporary tables and results generated by the GROUP BY, ORDER BY, and DISTINCT
clauses are stored in the tempdb database. The tempdb database is re-created every time SQL
Server is started so that the system always starts with a clean copy of the database. Temporary
tables and stored procedures are dropped automatically when the connection that created them is
closed or when the system is shut down. Backup and restore operations are not allowed on the
tempdb database. Therefore, you should not save any database object in the tempdb database;
because the database is re-created every time SQL Server starts, any objects saved there are lost.

Stored procedures will be discussed later in Chapter 7. Endpoints will be discussed later in Chapter
10.

The model Database

The model database acts as a template or a prototype for the new databases. Whenever a database
is created, the contents of the model database are copied to the new database.

If you modify the model database, all databases created afterward will inherit those changes. The
changes include setting permissions or database options, or adding objects such as tables,
functions, or stored procedures. For example, if you want every new database to contain a
particular database object, you can add the object to the model database. After this, whenever you
create a new database, the object will also be added to the database.

The msdb Database

The msdb database supports the SQL Server Agent. SQL Server Agent is a tool that schedules
periodic activities of SQL Server, such as backup and database mailing. It can run a job on a
schedule in response to a specific event, or on demand. For example, if you want to back up all
the company servers every weekday, you can automate this task. Schedule the backup to run after
12.00 P.M., Monday through Friday. If the backup encounters a problem, the SQL Server Agent
can record the event and notify you. The msdb database contains task scheduling, exception
handling, alert management, and system operator information needed for the SQL Executive
Service. The msdb database contains a few system-defined tables that are specific to the database.

As a database developer, you can query this database to retrieve information on alerts, exceptions,
and schedules. For example, you can query this database to know the schedule for the next backup
and to know the history of previously scheduled backups. You can also query this database to
know how many database e-mail messages have been sent to the administrator.

The Resource Database

The Resource database is a read-only database that contains all the system objects, such as system-
defined procedures and views, included with SQL Server 2005. The Resource database does not
contain user data or user metadata. The Resource database makes upgrading to a new version of
SQL Server easier and faster.

Just a minute:
What is the utility of the model database?

Answer:
The model database acts as a template or a prototype for the new databases.

Identifying the Database Files


SQL Server maps a database over a set of operating-system files. Each database is stored as a set
of files on the hard disk of the computer. These files include:

Primary data file: The primary data file contains database objects. It can be used for the
system tables and objects. It is the starting point of the database and points to other files in
the database. Every database has one primary data file. It has a .mdf extension.
Secondary data file: The secondary data file is used to store user-defined database objects.
Very large databases may need multiple secondary data files spread across multiple disks.
Databases need not have secondary data files, if the primary data file is large enough to hold
all the data in the database. The secondary data file has a .ndf extension.
Transaction log file: The transaction log file records all modifications that have occurred in
the database and the transactions that caused those modifications. The transaction log files
hold all the transaction information and can be used to recover a database. At least, one
transaction log file must exist for a database. There can be more than one transaction log file.
The minimum size of a transaction log file is 512 KB. As a rule of thumb, the size of the transaction
log file should be 25 - 40 percent of the size of the database. The log files have an .ldf extension.

A database must consist of a primary data file and one transaction log file. In SQL Server, the
locations of all the files in a database are recorded in the primary data file of the database and in
the master database.

What are Filegroups?

The database files are grouped together in filegroups for allocation and administration purposes.
A filegroup is a collection of files. A database comprises a primary filegroup and any user-defined
filegroup. A primary filegroup contains the primary data file and any other files that are not put
into any other filegroup. It also contains the system tables. When objects are created in the
database without specifying the filegroup, they are assigned to the default filegroup. Only one
filegroup in a database can be the default filegroup.

A user-defined filegroup is a filegroup that is created by users. You can create filegroups to
distribute the data among multiple filegroups to improve the performance of the
database.
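As a sketch, a user-defined filegroup can also be added to an existing database with ALTER DATABASE; the database, filegroup, and file names here are hypothetical:

```sql
-- Adds a new filegroup to an existing database, then adds a data file to it.
ALTER DATABASE Personnel
ADD FILEGROUP PersonnelHistory
GO
ALTER DATABASE Personnel
ADD FILE
( NAME = 'Personnel_History1',
  FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\Personnel_History1.ndf',
  SIZE = 2MB,
  FILEGROWTH = 1MB )
TO FILEGROUP PersonnelHistory
GO
```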

Creating a User-Defined Database


In addition to system databases, SQL Server also contains user-defined databases where the users
store and manage their information. When the users create a database, it is stored as a set of files
on the hard disk of the computer.
To create a user-defined database, you can use the CREATE DATABASE statement. The syntax
of the CREATE DATABASE statement is:
CREATE DATABASE database_name
[ ON [ PRIMARY ] [ < filespec >]]
[ LOG ON [ < filespec > ]]

< filespec > ::=


( [ NAME = logical_file_name , ]
FILENAME = 'os_file_name'
[ , SIZE = size ]
[ , MAXSIZE = { max_size | UNLIMITED } ]
[ , FILEGROWTH = growth_increment ] ) [ ,...n ]

where,

database_name is the name of the new database.

ON specifies the disk files used to store the data portion of the database (data files).

PRIMARY specifies the associated <filespec> list that defines files in the primary filegroup.

LOG ON specifies the disk files used to store the log files.

NAME=logical_file_name specifies the logical name for the file.

FILENAME=os_file_name specifies the operating-system file name for the file.

SIZE=size specifies the initial size of the file defined in the <filespec>list.

MAXSIZE=max_size specifies the maximum size to which the file defined in the <filespec> list can
grow.

FILEGROWTH=growth_increment specifies the growth increment of the file defined in the <filespec>
list. The FILEGROWTH setting for a file cannot exceed the MAXSIZE setting.

To create a database, you must be a member of the dbcreator fixed server role or have the
CREATE DATABASE, CREATE ANY DATABASE, or ALTER ANY DATABASE
permission.

The following SQL statement creates a database named Personnel to store the data related to all
the employees:
CREATE DATABASE Personnel

The preceding statement creates a database named Personnel in the C:\Program Files\Microsoft
SQL Server\MSSQL.1\MSSQL\Data folder. The data file name of the database is Personnel.mdf
and the log file name is Personnel_Log.ldf.

You can also use the following statements to create a database:


USE master
GO
CREATE DATABASE MyDB
ON PRIMARY
( NAME='MyDB_Primary',
FILENAME=
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_Prm.mdf',
SIZE=4MB,
MAXSIZE=10MB,
FILEGROWTH=1MB),
FILEGROUP MyDB_FG1
( NAME = 'MyDB_FG1_Dat1',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_1.ndf',
SIZE = 1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB),
( NAME = 'MyDB_FG1_Dat2',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_2.ndf',
SIZE = 1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB)
LOG ON
( NAME='MyDB_log',
FILENAME =
'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB.ldf',
SIZE=1MB,
MAXSIZE=10MB,
FILEGROWTH=1MB)
GO

The preceding statements create a database MyDB where the primary data file is stored in the
primary filegroup, which is the default file group. The two secondary data files are stored in the
user-defined filegroup named MyDB_FG1. It also creates a log file by the name MyDB.ldf.

The following figure represents the creation of primary, log and secondary files of the MyDB
database.

MyDB Database
You can change the default filegroup by using the following statement:
ALTER DATABASE MyDB
MODIFY FILEGROUP MyDB_FG1 DEFAULT
GO

The preceding statement makes MyDB_FG1 the default filegroup for the MyDB database.

You can also create a database by using the Object Explorer window. To create a database using
Object Explorer window, you need to perform the following steps:
1. Right-click the Databases folder in the Object Explorer window.
2. Select the New Database option from the pop-up menu. The New Database window is
displayed, as shown in the following figure.

New Database Window

3. Type the database name in the Database name text box, and click the OK button. The
database is created, and the user, who creates the database, automatically becomes the owner
of the database. The owner of the database is called dbo.

After a database is created, you may also need to see the details of the database. For this
purpose, you can use the sp_helpdb system stored procedure. The syntax of sp_helpdb is:
sp_helpdb [database_name]
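For example, the following statements display the details of all databases on the instance and of the Personnel database created earlier; the output includes columns such as the database name, size, owner, and creation date:

```sql
-- Without an argument, sp_helpdb lists all databases on the instance.
EXEC sp_helpdb

-- With a database name, it also lists the data and log files of that database.
EXEC sp_helpdb Personnel
```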
Just a minute:
Which statement is used to create a database?

Answer:
The CREATE DATABASE statement

Renaming a User-Defined Database


You can rename a database whenever required. Only a system administrator or the database owner
can rename a database. The sp_renamedb stored procedure is used to rename a database. The
syntax of the sp_renamedb statement is:
sp_renamedb old_database_name, new_database_name

where,

old_database_name is the current name of the database.

new_database_name is the new name of the database.

For example, the following statement renames the Personnel database to the Employee database:
sp_renamedb Personnel, Employee

Dropping a User-Defined Database


You can delete a database when it is no longer required. This causes all the database files and data
to be deleted. Only members of the sysadmin role and the database owner have the permissions to
delete a database. The DROP DATABASE statement is used to delete a database. The syntax of
the DROP DATABASE statement is:
DROP DATABASE database_name

where,

database_name is the name of the database.

The following statement deletes the Employee database:


DROP DATABASE Employee

You cannot delete a system-defined database.

You can rename or delete a database by right-clicking the database in the Object Explorer
window, and then selecting the Rename or Delete option from the pop-up menu.

Managing Tables
A table is a database object used to store data. Data in a table is organized in rows and columns.
Each row in a table represents a unique record and each column represents an attribute of the
record. The column names within a table must be unique, but the same column name can be used
in different tables within a database.

As a database developer, you need to create tables to store data. While creating tables in a
relational database, you must ensure that no one should enter invalid data in it. Therefore, you
need to apply certain rules and constraints for columns that specify the kind of data to be stored.
In addition, you need to specify the relationships between various tables.

If you want to store a large volume of data in a table, you can create a partitioned table. This helps
in improving the performance of the queries.

In addition to creating tables, you are responsible for managing tables. The management of tables
involves modifying tables to add columns or to change the rules imposed on the table. It also
involves deleting tables when they are not required.

Creating a Table
In SQL Server 2005, you can create a table by using the CREATE TABLE statement. The syntax
of the CREATE TABLE statement is:
CREATE TABLE
[ database_name . [ schema_name ] .] table_name
( { <column_definition> | <computed_column_definition> }
[IDENTITY (SEED, INCREMENT)]
[ <table_constraint> ] [ ,...n ] )
[ ON { partition_scheme_name ( partition_column_name ) | filegroup
| "default" } ]
[ TEXTIMAGE_ON { filegroup | "default" } ]
[ ; ]

where,

database_name specifies the name of the database where the table is created. If you do not specify
a database name, the table is created in the current database.

schema_name specifies the schema name where the new table belongs. Schema is a logical group
of database objects in a database. Schemas help in improving manageability of objects in a
database.

table_name specifies the new table name. The table name can be a maximum of 128 characters.

column_name specifies the name of the column and must be unique in the table. It can be a
maximum of 128 characters.

computed_column_definition specifies the expression that produces the value of the computed
column. A computed column is not physically stored in the table; its value is generated from the
expression.

IDENTITY is used for those columns that need automatically generated unique system values. This
property can be used to generate sequential numbers.

SEED is the starting or initial value for the IDENTITY column.

INCREMENT is the step value used to generate the next value for the column. This value can also be
negative.

table_constraint is an optional keyword that specifies the PRIMARY KEY, NOT NULL,
UNIQUE, FOREIGN KEY, or CHECK constraint.

partition_scheme_name specifies the partition scheme name that defines the filegroups on which
the partitions of a table are mapped. Ensure that the partition scheme exists within the database.

partition_column_name specifies the column name on which a partitioned table will be partitioned.

TEXTIMAGE_ON { filegroup | “default” } are keywords that specify that the text, ntext, image,
xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined type columns are
stored on the specified filegroup. If there are no large value columns in the table, TEXTIMAGE_ON is
not allowed.

You need to have the CREATE TABLE permissions to create a table.

Consider the following example. The management of AdventureWorks, Inc. needs to maintain
the leave details of the employees. For this, you need to create a table named EmployeeLeave in
the HumanResources schema.

The structure of the EmployeeLeave table is given in the following table.


Columns Data Type Checks
EmployeeID int NOT NULL
LeaveStartDate datetime NOT NULL
LeaveEndDate datetime NOT NULL
LeaveReason varchar(100) NOT NULL
LeaveType char(2) NOT NULL
EmployeeLeave Table Details

You can use the following statement to create the table:


CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int NOT NULL,
LeaveStartDate datetime NOT NULL,
LeaveEndDate datetime NOT NULL,
LeaveReason varchar(100) NOT NULL,
LeaveType char(2) NOT NULL
)

You can use the system stored procedure named sp_help to view a table structure. To view the
structure of the HumanResources.EmployeeLeave table, you can use the following statement:
sp_help 'HumanResources.EmployeeLeave'
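The effect of the NOT NULL checks above can be demonstrated even without a SQL Server 2005 instance at hand. The sketch below uses Python's built-in sqlite3 module as a stand-in engine — the table mirrors the EmployeeLeave example, but the type names are simplified to SQLite's, so treat it as an illustration rather than T-SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeLeave (
        EmployeeID     INTEGER NOT NULL,
        LeaveStartDate TEXT    NOT NULL,
        LeaveEndDate   TEXT    NOT NULL,
        LeaveReason    TEXT    NOT NULL,
        LeaveType      TEXT    NOT NULL
    )
""")

# A complete row satisfies every NOT NULL check and is accepted.
conn.execute(
    "INSERT INTO EmployeeLeave VALUES (101, '2005-06-01', '2005-06-03', 'Fever', 'SL')"
)

# A row that omits NOT NULL columns is rejected by the engine.
try:
    conn.execute("INSERT INTO EmployeeLeave (EmployeeID) VALUES (102)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
```

SQL Server behaves the same way: an INSERT that leaves a NOT NULL column without a value (and without a default) fails with a constraint error.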

Consider another example, where the management of AdventureWorks, Inc. needs to maintain
the project details assigned to their employees. For this, you need to create a Project table in the
HumanResources schema. The structure of the Project table is given in the following table.
Columns Data Type
ProjectCode int
ProjectManagerID int
Description varchar(50)
StartDate datetime
EndDate datetime

Project Table Details

You can use the following statement to create the table:


CREATE TABLE HumanResources.Project
(
ProjectCode int,
ProjectManagerID int,
Description varchar(50),
StartDate datetime,
EndDate datetime
)

The preceding statement creates a table named Project in the HumanResources schema where all
the columns can store null values.

If you assign an IDENTITY property to a column, SQL Server automatically generates sequential
numbers for new rows inserted in the table containing the IDENTITY column.You can use the
following statement to create the IDENTITY column in a table:
CREATE TABLE Emp
(EmpCode int IDENTITY(100,1),
EmpName char(25) NOT NULL,
DeptName char(25) NOT NULL)

The preceding statement creates a table Emp with an IDENTITY column. The EmpCode column
of the Emp table is the IDENTITY column with the starting value (SEED) as 100 and the step
value (INCREMENT) as 1. Therefore, the EmpCode Column will store values as 100, 101, 102,
and so on.
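The SEED and INCREMENT arithmetic is easy to model outside the database. The following Python sketch — an illustration, not SQL Server code — generates the same sequence that the IDENTITY(100, 1) column above would produce:

```python
import itertools

def identity(seed, increment):
    """Mimic IDENTITY(SEED, INCREMENT): yield seed, seed + increment, ..."""
    return itertools.count(seed, increment)

gen = identity(100, 1)
codes = [next(gen) for _ in range(3)]
print(codes)  # [100, 101, 102]

# The increment can also be negative, producing a descending sequence.
descending = list(itertools.islice(identity(10, -2), 3))
print(descending)  # [10, 8, 6]
```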
You can use the following statement to view the structure of the HumanResources.Project table:
sp_help 'HumanResources.Project'

Guidelines to Create Tables

While creating tables, you need to consider the following guidelines:

The column names within a table must be unique, but the same column name can be used in
different tables within a database.
The table name can be a maximum of 128 characters.

You can also create a table by right-clicking the Tables folder under the database folder in the
Object Explorer window, and then selecting the New Table option from the pop-up menu.

Implementing Data Integrity


If checks are not applied while defining and creating tables, the data stored in the tables can
become incomplete or inconsistent. For example, if you do not store complete address details for
all the employees, the data would not be useful.

Similarly, if a database used by the Human Resource department stores employee contact details
in two separate tables, the details of the employees might not match. This would result in
inconsistency and confusion.

Therefore, it is important to ensure that the data stored in tables is complete and consistent. The
concept of maintaining consistency and completeness of data is called data integrity. Data
integrity is enforced to ensure that the data in a database is accurate, consistent, and reliable. It is
broadly classified into the following categories:

Entity integrity: Ensures that each row can be uniquely identified by an attribute called the
primary key. The primary key column contains a unique value in each row. In addition, this
column cannot be NULL. Consider a situation where there are two candidates for an
interview with the same name, 'Jack'. By enforcing entity integrity, the two candidates can be
distinguished by the unique codes assigned to them. For example, one candidate can have
the code 001 and the other the code 002.
Domain integrity: Ensures that only a valid range of values is stored in a column. It can be
enforced by restricting the type of data, the range of values, and the format of the data. For
example, you have a table called BranchOffice with a column called City that stores the name
of the cities, where the branch offices are located. The offices are located in ‘Beijing’,
‘Nanjing’, ‘Hangzhou’, ‘Dalian’, ‘Suzhou’, ‘Chengdu’ and ‘Guangzhou’. By enforcing domain
integrity, you can ensure that only valid values (as per the list specified) are entered in the
City column of the BranchOffice table. Therefore, the user will not be allowed to store any
other city names like ‘New York’ or ‘London’ in the City column of the BranchOffice table.
Referential integrity: Ensures that the values of the foreign key match the value of the
corresponding primary key. For example, if a bicycle has been ordered and an entry is to be
made in the OrderDetail table, then that bicycle code should exist in the Product table. This
ensures that an order is placed only for the bicycle that is available.
User-defined integrity: Refers to a set of rules specified by a user, which do not belong to the
entity, domain, and referential integrity categories.
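Entity and referential integrity can be seen concretely in any relational engine. The sketch below uses Python's sqlite3 module as a stand-in; the Product/OrderDetail names follow the bicycle example above, and the PRAGMA is a SQLite-specific switch (SQLite leaves foreign key checks off by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.execute("CREATE TABLE Product (ProductCode INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE OrderDetail (
        OrderID     INTEGER PRIMARY KEY,
        ProductCode INTEGER REFERENCES Product(ProductCode)
    )
""")
conn.execute("INSERT INTO Product VALUES (1, 'Bicycle')")

violations = []

# Entity integrity: a second row with the same primary key is rejected.
try:
    conn.execute("INSERT INTO Product VALUES (1, 'Tricycle')")
except sqlite3.IntegrityError:
    violations.append("duplicate primary key")

# Referential integrity: an order for a product that does not exist is rejected.
try:
    conn.execute("INSERT INTO OrderDetail VALUES (1, 99)")
except sqlite3.IntegrityError:
    violations.append("unknown product code")

print(violations)
```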

When creating tables, SQL Server allows you to maintain integrity by:

Applying constraints
Applying rules
Using user-defined types

Applying Constraints

Consider an example where a user entered a duplicate value in the EmployeeID column of the
Employee table. This would mean that the two employees have same employee ID. This would
further result in erroneous results when anybody queries the table. As a database developer, you
can prevent this by enforcing data integrity on the table by using constraints.

Constraints define rules that must be followed to maintain consistency and correctness of the data.
A constraint can be either created while creating a table or added later. When a constraint is added
after the table is created, it checks the existing data. If there is any violation, then the constraint is
rejected.

A constraint can be created by using either of the following statements:

CREATE TABLE statement


ALTER TABLE statement

A constraint can be defined on a column while creating a table. It can be created with the CREATE
TABLE statement. The syntax of adding a constraint at the time of table creation is:
CREATE TABLE table_name
(
column_name CONSTRAINT constraint_name constraint_type [,CONSTRAINT constraint_name
constraint_type]
)

where,

column_name is the name of the column on which the constraint is to be defined.

constraint_name is the name of the constraint to be created and must follow the rules for the
identifier.

constraint_type is the type of the constraint to be added.

Constraints can be divided into the following types:

Primary key constraint
Unique constraint
Foreign key constraint
Check constraint
Default constraint

Primary Key Constraint

A primary key constraint is defined on a column or a set of columns whose values uniquely
identify all the rows in a table. These columns are referred to as the primary key columns. A
primary key column cannot contain NULL values since it is used to uniquely identify rows in a
table. The primary key constraint ensures entity integrity.

You can define a primary key constraint while creating the table, or you can add it later by altering
the table. However, if you define the primary key constraint after inserting rows, SQL Server will
give an error if the rows contain duplicate values in the column. While defining a primary key
constraint, you need to specify a name for the constraint. If a name is not specified, SQL Server
automatically assigns a name to the constraint.

If a primary key constraint is defined on a column that already contains data, then the existing
data in the column is screened. If any duplicate values are found, then the primary key constraint
is rejected. The syntax of applying the primary key constraint while creating a table is:
CREATE TABLE table_name
(
col_name [CONSTRAINT constraint_name PRIMARY KEY [CLUSTERED|NONCLUSTERED]]
col_name [, col_name [, col_name [, ...]]]
)

where,

constraint_name specifies the name of the constraint to be created.

CLUSTERED | NONCLUSTERED are keywords that specify whether a clustered or a nonclustered index is
to be created for the primary key constraint.

col_name specifies the name of the column(s) on which the primary key constraint is to be defined.

You will learn more about indexes in Chapter 6.

In the preceding example of the Project table, you can add a primary key constraint while creating
the table. You can use the following statement to apply the primary key constraint:
CREATE TABLE HumanResources.Project
(
ProjectCode int CONSTRAINT pkProjectCode PRIMARY KEY,
...
...
)

The preceding statement will create the Project table with a primary key column, ProjectCode.
You can create primary key using more than one column. For example, you can set the
EmployeeID and the LeaveStartDate columns of the EmployeeLeave table as a composite primary
key. You can use the following statement to apply the composite primary key constraint:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int,
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
...
...
...
)

The preceding statement creates the EmployeeLeave table with a composite primary key
constraint on EmployeeID and LeaveStartDate. The name of the constraint is cpkLeaveStartDate.
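A composite key rejects only rows that duplicate the entire column combination; repeating either column alone is allowed. The sqlite3 sketch below illustrates this (column types simplified; an illustration, not T-SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeLeave (
        EmployeeID     INTEGER,
        LeaveStartDate TEXT,
        PRIMARY KEY (EmployeeID, LeaveStartDate)
    )
""")

conn.execute("INSERT INTO EmployeeLeave VALUES (1, '2005-01-10')")
conn.execute("INSERT INTO EmployeeLeave VALUES (1, '2005-02-15')")  # same ID, new date: OK
conn.execute("INSERT INTO EmployeeLeave VALUES (2, '2005-01-10')")  # same date, new ID: OK

# Only the exact (EmployeeID, LeaveStartDate) pair must be unique.
try:
    conn.execute("INSERT INTO EmployeeLeave VALUES (1, '2005-01-10')")
except sqlite3.IntegrityError:
    print("duplicate composite key rejected")
```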

Unique Constraint

The unique constraint is used to enforce uniqueness on non-primary key columns. A primary key
constraint column automatically includes a restriction for uniqueness. The unique constraint is
similar to the primary key constraint, except that it allows one row with a NULL value. Multiple
unique constraints can be created on a table. The syntax of applying the unique constraint when
creating a table is:
CREATE TABLE table_name
(
col_name [CONSTRAINT constraint_name UNIQUE [CLUSTERED | NONCLUSTERED]]
col_name [, col_name [, col_name [, ...]]]
)

where,

constraint_name specifies the name of the constraint to be created.

CLUSTERED | NONCLUSTERED are keywords that specify whether a clustered or a nonclustered index is
to be created for the unique constraint.

col_name specifies the name of the column(s) on which the unique constraint is to be defined.

You can use the following statement to enforce unique constraint on the Description column of
the Project table:
CREATE TABLE HumanResources.Project
(
ProjectCode int CONSTRAINT pkProjectCode PRIMARY KEY,
Description varchar(50) CONSTRAINT unDesc UNIQUE,
...
...
)

Foreign Key Constraint

You can use the foreign key constraint to remove the inconsistency in two tables when the data in
one table depends on the data in another table. A foreign key always refers to the primary key column
of another table, as shown in the following figure.

Primary Key and Foreign Key Relationship

The preceding figure contains two tables, Dept and Emp. The DeptNo column is the primary key
in the Dept table and foreign key in the Emp table.

Consider the example of Customers and Orders table. The following figure represents the primary
key and foreign key relationship between the Customers and the Orders table.

Primary Key and Foreign Key Relationship

In the preceding figure, ID is the primary key column of the Customers table and Cust_ID is the
foreign key column in the Orders table.

A foreign key constraint associates one or more columns (the foreign key) of a table with an
identical set of columns (a primary key column) in another table on which a primary key constraint
has been defined. The syntax of applying the foreign key constraint when creating a table is:
CREATE TABLE table_name
(
col_name [CONSTRAINT constraint_name FOREIGN KEY (col_name [, col_name [, ...]])
REFERENCES table_name (column_name [, column_name [, ...]])]
col_name [, col_name [, col_name [, ...]]]
)

where,

constraint_name is the name of the constraint on which the foreign key constraint is to be defined.

col_name is the name of the column on which the foreign key constraint is to be enforced.

table_name is the name of the related table in which the primary key constraint has been specified.

column_name is the name of the primary key column of the related table on which the primary key
constraint has been defined.

For example, in the EmployeeLeave table of the HumanResources schema, you need to add the
foreign key constraint to enforce referential integrity. The EmployeeID column is set as a primary
key in the Employee table of the HumanResources schema. Therefore, you need to set
EmployeeID in the EmployeeLeave table as a foreign key. You can use the following statement
to apply the foreign key constraint in the EmployeeLeave table:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int CONSTRAINT fkEmployeeID REFERENCES
HumanResources.Employee(EmployeeID),
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
...
...
...
)

You can also apply the foreign key constraint in the EmployeeLeave table by using the following
statement:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int,
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
CONSTRAINT fkEmployeeID FOREIGN KEY(EmployeeID) REFERENCES
HumanResources.Employee(EmployeeID)
...
...
)

The preceding statement creates the EmployeeLeave table with a foreign key constraint on the
EmployeeID column. The name of the constraint is fkEmployeeID.
Check Constraint

A check constraint enforces domain integrity by restricting the values to be inserted in a column.
It is possible to define multiple check constraints on a single column. These are evaluated in the
order in which they are defined. The syntax of applying the check constraint is:
CREATE TABLE table_name
(
col_name [CONSTRAINT constraint_name] CHECK (expression)
col_name [, col_name [, ...]]
...
)

where,

constraint_name specifies the name of the constraint to be created.

expression specifies the conditions that define the check to be made on the column. It can include
elements, such as arithmetic operators, relational operators, or keywords, such as IN, LIKE, and
BETWEEN.

A single check constraint can be applied to multiple columns when it is defined at the table level.
For example, while entering project details, you want to ensure that the start date of the project
must be less than or equal to the end date.

You can use the following statement to apply the check constraint on the Project table:
CREATE TABLE HumanResources.Project
(
ProjectCode int,
EmployeeID int,
Description varchar(50),
StartDate datetime,
EndDate datetime,
Constraint chkDate CHECK (StartDate <= EndDate)
)

A check constraint can be specified by using the following keywords:

IN: To ensure that the values entered are from a list of constant expressions. The following
statement creates a check constraint, chkLeave on the LeaveType column of the
HumanResources.EmployeeLeave table, thereby restricting the entries to valid leave types:

CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int CONSTRAINT fkEmployeeID FOREIGN KEY REFERENCES
HumanResources.Employee(EmployeeID),
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
LeaveEndDate datetime NOT NULL,
LeaveReason varchar(100),
LeaveType char(2) CONSTRAINT chkLeave CHECK(LeaveType IN('CL','SL','PL'))
)

The preceding statement ensures that the leave type can be any one of the three values: CL, PL,
or SL. Here, CL stands for Casual Leave, SL stands for Sick leave, and PL stands for Privileged
Leave.

LIKE: To ensure that the values entered in specific columns are of a certain pattern. This can
be achieved by using wildcards. For example, the following statement creates a check
constraint on DeptCode column of the Emp table:

CREATE TABLE Emp
(
DeptCode char(4) CHECK (DeptCode LIKE '[0-9][0-9][0-9][0-9]')
)

In the preceding statement, the check constraint specifies that the DeptCode column can contain
only four-digit numbers.

BETWEEN: To specify a range of constant expressions by using the BETWEEN keyword. The
upper and lower boundary values are included in the range. For example, the following
statement creates a check constraint on Salary column of the Emp table:

CREATE TABLE Emp
(
Salary int CHECK (Salary BETWEEN 20000 AND 80000)
)

In the preceding statement, the check constraint specifies that the Salary column can have a value
only between 20000 and 80000.
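At run time, the IN and BETWEEN checks behave the same way: any row whose value fails the expression is refused. The sqlite3 sketch below illustrates both cases (SQLite supports this subset of CHECK syntax; an illustration, not T-SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Emp (
        Salary    INTEGER CHECK (Salary BETWEEN 20000 AND 80000),
        LeaveType TEXT    CHECK (LeaveType IN ('CL', 'SL', 'PL'))
    )
""")

# Boundary values are included in BETWEEN, so 20000 passes.
conn.execute("INSERT INTO Emp VALUES (20000, 'CL')")

try:
    conn.execute("INSERT INTO Emp VALUES (19999, 'CL')")  # below the range
except sqlite3.IntegrityError:
    print("salary below range rejected")

try:
    conn.execute("INSERT INTO Emp VALUES (50000, 'XX')")  # not in the list
except sqlite3.IntegrityError:
    print("invalid leave type rejected")
```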

The rules regarding the creation of the check constraint are:

It can be created at the column level.
It can contain user-specified search conditions.
It cannot contain subqueries.
It does not check the existing data in the table if created with the WITH NOCHECK option.
It can reference other columns of the same table.

Default Constraint

A default constraint can be used to assign a constant value to a column, and the user need not
insert values for such a column. Only one default constraint can be created for a column but the
column cannot be an IDENTITY column. The system-supplied values, such as USER,
CURRENT_USER, and user-defined values can be assigned as defaults.

The syntax of applying the default constraint while creating a table is:
CREATE TABLE table_name
(
col_name [CONSTRAINT constraint_name] DEFAULT (constant_expression | NULL)
(col_name [, col_name [, …]])
...
)

where,

constraint_name specifies the name of the constraint to be created.

constant_expression specifies an expression that contains only constant values. This can contain
a NULL value.

For example, while creating the EmployeeLeave table, you can insert a default constraint to add
a default value for the LeaveType column. You can set the default leave type as PL.

You can use the following statement to create the default constraint:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int CONSTRAINT fkEmployeeID FOREIGN KEY REFERENCES
HumanResources.Employee(EmployeeID),
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
LeaveEndDate datetime NOT NULL,
LeaveReason varchar(100),
LeaveType char(2) CONSTRAINT chkLeave CHECK(LeaveType IN('CL','SL','PL')) CONSTRAINT
chkDefLeave DEFAULT 'PL'
)

The preceding statement creates the EmployeeLeave table with a default constraint on the
LeaveType column, where the default value is specified as PL. The name of the constraint is
chkDefLeave.
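The default takes effect whenever an INSERT omits the column. The sqlite3 sketch below illustrates this (the DEFAULT clause works analogously here; an illustration, not T-SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeLeave (
        EmployeeID INTEGER,
        LeaveType  TEXT DEFAULT 'PL'
    )
""")

# LeaveType is omitted, so the engine supplies the default value.
conn.execute("INSERT INTO EmployeeLeave (EmployeeID) VALUES (101)")

leave_type = conn.execute("SELECT LeaveType FROM EmployeeLeave").fetchone()[0]
print(leave_type)  # PL
```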

You can also use the DEFAULT database objects to create a default constraint that can be applied
to columns across tables within the same database. For this, you can create database objects by
using the CREATE DEFAULT statement.

Just a minute:
Which keyword is used to specify a check constraint?
Answer:
A check constraint can be specified by using the LIKE, IN, and BETWEEN keywords.

Applying Rules
A rule enforces domain integrity for columns or user-defined data types. The rule is applied to the
column or the user-defined data type before an INSERT or UPDATE statement is issued. In other
words, a rule specifies a restriction on the values of a column or a user-defined data type. Rules
are used to implement business-related restrictions or limitations. A rule can be created by using
the CREATE RULE statement. The syntax of the CREATE RULE statement is:
CREATE RULE rule_name AS conditional_expression

where,

rule_name specifies the name of the new rule that must conform to rules for identifiers.

conditional_expression specifies the condition(s) that defines the rule. It can be any expression
that is valid in a WHERE clause and can include elements, such as arithmetic operators, relational
operators, IN, LIKE, and BETWEEN.

The variable specified in the conditional expression must be prefixed with the @ symbol. The
expression refers to the value that is being specified with the INSERT or UPDATE statement.

In the preceding example of the EmployeeLeave table, you applied the check constraint on the
LeaveType column to accept only three values: CL, SL, and PL. You can perform the same task
by creating a rule, as shown in the following statement:
CREATE RULE rulType
AS @LeaveType IN ('CL', 'SL', 'PL')

The following table lists some examples of creating rules.

Example: CREATE RULE dept_name_rule AS @deptname NOT IN ('accounts', 'stores')
Description: Rejects the values accounts and stores from being inserted into the column to which
the rule is bound.

Example: CREATE RULE min_price_rule AS @minprice >= $5000
Description: Allows only a value of $5000 or more to be inserted in the column to which the rule
is bound.

Example: CREATE RULE emp_code_rule AS @empcode LIKE '[F-M][A-Z][0-9][0-9][0-9]'
Description: Allows only character strings that follow the pattern specified in the LIKE clause to
be inserted in the column to which the rule is bound: the first character can be any value from F
to M, the second character can be any value from A to Z, and the third character onwards must be
a numeric value from 0 to 9.

Examples of Creating Rules
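The pattern in emp_code_rule maps directly onto a regular expression, which makes the rule easy to reason about. The hypothetical Python check below is only an analogy — character classes in a regex replace the T-SQL LIKE wildcards:

```python
import re

# [F-M][A-Z][0-9][0-9][0-9] from emp_code_rule, expressed as a regular expression.
emp_code = re.compile(r"^[F-M][A-Z][0-9]{3}$")

print(bool(emp_code.match("GA123")))  # True: G is in F-M, A is in A-Z, then three digits
print(bool(emp_code.match("ZA123")))  # False: Z falls outside F-M
print(bool(emp_code.match("GA12")))   # False: only two trailing digits
```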

After you create the rule, you need to activate the rule by using a stored procedure, sp_bindrule.

The syntax of sp_bindrule is:
sp_bindrule <'rule'>, <'object_name'>, [<'futureonly_flag'>]

where,

rule specifies the name of the rule that you want to bind.

object_name specifies the object on which you want to bind the rule.

futureonly_flag applies only when you want to bind the rule to a user-defined data type.

Consider the following example where the rulType rule created for the LeaveType column of the
EmployeeLeave table is bound by using the sp_bindrule stored procedure. You can use the
following statement to bind the rule:
sp_bindrule 'rulType', 'HumanResources.EmployeeLeave.LeaveType'

Similarly, when you want to remove the rule, the sp_unbindrule stored procedure is used. For
example, to remove the rule from the EmployeeLeave table, you can use the following statement
to unbind the rule:
sp_unbindrule 'HumanResources.EmployeeLeave.LeaveType'

The rule can be deleted by using the DROP RULE statement. The syntax for the DROP RULE
statement is:
DROP RULE rule_name

where,

rule_name is the name of the rule to be dropped.

For example, you can use the following statement to delete the rule, rulType:
DROP RULE rulType

Using a User-Defined Data Type

User-defined data types are custom data types defined by the users with a custom name. The user-
defined data types are based on the system data types. A user-defined data type is basically a
named object with the following additional features:

Defined data type and length
Defined nullability
Predefined rules that may be bound to the user-defined data types
Predefined default value that may be bound to the user-defined data types

You can create user-defined data types by using the CREATE TYPE statement. The syntax of the
CREATE TYPE statement is:
CREATE TYPE [ schema_name. ] type_name { FROM base_type [ ( precision [, scale ] ) ]
[ NULL | NOT NULL ] } [ ; ]
where,

schema_name specifies the name of the schema to which the alias data type or the user-defined data
type belongs.

type_name specifies the name of the alias data type or the user-defined data type.

base_type specifies the SQL Server-supplied data type on which the alias data type is based.

precision specifies the maximum total number of decimal digits that can be stored, both to the
left and to the right of the decimal point. It applies only to the decimal and numeric base types
and must be a non-negative integer.

scale specifies the maximum number of decimal digits that can be stored to the right of the
decimal point.

NULL | NOT NULL specifies whether the data type can hold a null value. If not specified, NULL is
the default value.

The following statement creates a user-defined data type for descriptive columns:
CREATE TYPE DSCRP
FROM varchar(100) NOT NULL ;

In the preceding statement, a user-defined data type, DSCRP, is created based on the varchar data
type with a size limit of 100 characters. Further, it also specifies NOT NULL. Therefore, you
can use this data type for columns that store descriptions, addresses, and reasons.

For example, you can use the DSCRP data type to store the data of the LeaveReason column of
the EmployeeLeave table, as shown in the following statement:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int CONSTRAINT fkEmployeeID FOREIGN KEY REFERENCES
HumanResources.Employee(EmployeeID),
LeaveStartDate datetime,
CONSTRAINT cpkLeaveStartDate PRIMARY KEY(EmployeeID, LeaveStartDate),
LeaveEndDate datetime NOT NULL,
LeaveReason DSCRP,
LeaveType char(2) CONSTRAINT chkLeave CHECK(LeaveType IN('CL','SL','PL')) CONSTRAINT
chkDefLeave DEFAULT 'PL'
)

Just a minute:
You want to create a rule, rule1, which allows the user to enter any of the four values: Tea,
Coffee, Soup, or Miranda in a column. Which statement should you execute?

Answer:
CREATE RULE rule1
AS @TypeRule IN ('Tea', 'Coffee', 'Soup', 'Miranda')
Creating a Partitioned Table
When the volume of data in a table increases, it takes time to query the data. You can partition
such tables and store different parts of the tables in multiple physical locations based on a range
of values for a specific column. This helps in managing the data and improving the query
performance.

Consider the example of a manufacturing organization. The details of inventory movements are
stored in the InventoryIssue table. The table contains a large volume of data. Therefore, the queries
take a lot of time to execute, thereby slowing the report generation process.

To improve the query performance, you can partition the table to divide the data based on a
condition and store different parts of the data in different locations. For example, the condition
can be based on the transaction date, with the data pertaining to each five-year period saved at
one location. After partitioning the table, data can be retrieved directly from a particular
partition by specifying the partition number in the query.

In the preceding example, the partitioned tables were created after the database had been designed.
You can also create partitioned tables while designing the database and creating tables. You can
plan to create a partitioned table when you know that the data to be stored in the table will be
large. For example, if you are creating a database for a banking application and you know that the
transaction details will be huge, you can create the transaction tables with partitions.

Partitioning a table is only allowed in Enterprise Edition of SQL Server 2005.

To create a partitioned table, you need to perform the following tasks:

1. Create a partition function.
2. Create a partition scheme.
3. Create a table by using the partition scheme.

For example, an organization stores the sales data of all the products for the last 11 years.
However, this results in generation of a large volume of data, which adversely affects the query
performance.

To improve the query performance, the database developer needs to partition the table based on a
condition, as shown in the following figure.
Partitioning the Table

In the preceding figure, the sales data has been physically stored in different filegroups, on a
yearly basis.

Now, consider a scenario of AdventureWorks database, which stores the data of all the employees
for the last 11 years. This data includes the personal details of the employees and their payment
rates. Whenever there is a change in the payment rate of an employee, it is recorded in a separate
record. However, this results in generation of a large volume of data and adversely affects the
query performance. To improve the query performance, the database developer needs to partition
the table storing the changes in payment rate.

Creating a Partition Function

A partition function specifies how a table should be partitioned. It specifies the range of values on
a particular column, based on which the table is partitioned. The syntax of the CREATE
PARTITION FUNCTION statement is:
CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,…n ] ] )

where,

partition_function_name specifies the name of the partition function.

input_parameter_type specifies the data type of the column used for partitioning.

boundary_value specifies the boundary values on which a table is to be partitioned.

…n specifies the number of boundary values, which should not exceed 999. The number of
partitions created is equal to n + 1.
LEFT | RIGHT specifies the side of each boundary value interval to which the boundary_value [ ,…n ]
belongs. LEFT is the default.

For example, the following statement creates a partition function to partition a table or index into
four partitions:
CREATE PARTITION FUNCTION pfrange(int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);

In the preceding statement, partition 1 will store the column values less than or equal to 1. The
column values from 2 through 100 will be stored in partition 2. The column values from 101 through
1000 will be stored in partition 3. Partition 4 will store the column values greater than 1000. The
following table lists the values in each partition.

Partition Values
1 col <= 1
2 col > 1 AND col <= 100
3 col > 100 AND col <= 1000
4 col > 1000

Range LEFT

In the preceding statement, if you specify RIGHT instead of LEFT, partition 1 will store the column
values less than 1. The column values from 1 through 99 will be stored in partition 2. The column
values from 100 through 999 will be stored in partition 3. Partition 4 will store the column values
greater than or equal to 1000. The following table lists the values in each partition.

Range RIGHT

Consider the scenario of Adventure Works, where you can partition the data based on years. The
following statement creates a partition function for the same:
CREATE PARTITION FUNCTION RateChngDate ( datetime )
AS RANGE RIGHT FOR VALUES ('1996-01-01', '2000-01-01', '2004-01-01', '2008-01-01')

The preceding statement creates a partition function named RateChngDate. It specifies that the
data pertaining to the changes in the payment rate will be partitioned based on the years.
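With RANGE RIGHT, each boundary value belongs to the partition on its right. You can verify how a
given date maps to a partition by using the $PARTITION function, which is discussed later in this
section. The following is a sketch that assumes the RateChngDate partition function above has been
created:

```sql
-- Assumes the RateChngDate partition function has been created.
SELECT $PARTITION.RateChngDate('1998-06-15') AS PartitionNumber
-- A date in 1998 lies between the '1996-01-01' and '2000-01-01'
-- boundaries, so this query should return 2.
```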

Creating a Partition Scheme


After setting the partition function, you need to create the partition scheme. A partition scheme
associates a partition function with various filegroups resulting in the physical layout of the data.
The syntax of the CREATE PARTITION SCHEME statement is:
CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,…n ] )

where,

partition_scheme_name specifies the name of the partition scheme.


partition_function_name specifies the name of the partition function. Partition functions are
linked to the filegroups specified in the partition scheme.

ALL specifies that all partitions map to a single filegroup provided in the file_group_name.

file_group_name | [ PRIMARY ] [ ,…n] specifies the list of the filegroup names. If [PRIMARY]
is specified, the partition is stored on the primary filegroup. Partitions are assigned to filegroups
starting with partition 1, in the order in which the filegroups are listed in [,…n]. n is the number
of partitions.

Therefore, before creating a partition scheme, you need to create filegroups.

To create partition filegroups, you need to perform the following steps:

1. Expand the Databases folder in the Object Explorer window and right-click the
AdventureWorks database.
2. Select the Properties option from the pop-up menu to display the Database Properties –
AdventureWorks window, as shown in the following figure.

Database Properties – AdventureWorks Window

3. Select the Filegroups folder from Select a page pane to display the list of all the filegroups
in AdventureWorks.
4. Click the Add button to add a filegroup. Specify the name of the filegroup in the Name text
box as Old, as shown in the following figure.
Adding a Filegroup in the Database

5. Repeat step 4 to add four more filegroups named First, Second, Third, and Fourth, as
shown in the following figure.
Adding All Filegroups in the Database

6. Select the Files folder from Select a page pane to display the list of all the files.
7. Click the Add button and type the name of the file as OldFile in the Logical Name text box,
and select Old from the Filegroup drop-down list, as shown in the following figure.
Creating a File Related to a Filegroup

8. Repeat step 7 to create four files named File1, File2, File3, and File4, and then select
filegroup as First, Second, Third, and Fourth for the files, respectively, as shown in the
following figure.
Files Added to the Filegroups

9. Click the OK button to close the Database Properties – AdventureWorks window.


10. Execute the following statement in the Microsoft SQL Server Management Studio window
to create the partition scheme:

CREATE PARTITION SCHEME RateChngDate


AS PARTITION RateChngDate
TO (Old, First, Second, Third, Fourth)
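The filegroups and files created through the GUI in steps 1 through 9 can also be scripted with the
ALTER DATABASE statement. The following is a sketch; the physical file path is illustrative and
should be adjusted to your server:

```sql
-- Scripted equivalent of steps 1 through 9 (file path is illustrative).
ALTER DATABASE AdventureWorks ADD FILEGROUP [Old]
ALTER DATABASE AdventureWorks ADD FILEGROUP [First]
ALTER DATABASE AdventureWorks ADD FILEGROUP [Second]
ALTER DATABASE AdventureWorks ADD FILEGROUP [Third]
ALTER DATABASE AdventureWorks ADD FILEGROUP [Fourth]
ALTER DATABASE AdventureWorks
ADD FILE (NAME = OldFile, FILENAME = 'C:\Data\OldFile.ndf')
TO FILEGROUP [Old]
-- Repeat the ADD FILE statement for File1 through File4, assigning
-- them to the First, Second, Third, and Fourth filegroups, respectively.
```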

Creating a Table by Using the Partition Scheme

After you create a partition function and a partition scheme, you need to create a table that will
store the partition records. You can use the following statement to create the partitioned table:
CREATE TABLE EmpPayHistPart
(
EmployeeID int,
RateChangeDate datetime,
Rate money,
PayFrequency tinyint,
ModifiedDate datetime
) ON RateChngDate(RateChangeDate)
In the preceding statement, the RateChngDate refers to the partition scheme that is applied to the
RateChangeDate column. The records entered in the EmpPayHistPart table will be stored based
on the condition specified in the partition function.

If you want to display records from a partition instead of displaying the entire table, you can use
the following query:
SELECT * FROM EmpPayHistPart WHERE $PARTITION.RateChngDate(RateChangeDate) = 2

In the preceding query, the $PARTITION.RateChngDate(RateChangeDate) function returns the
partition number of each row. Therefore, the preceding statement will retrieve records from
partition 2.
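You can also use $PARTITION to see how the rows of a partitioned table are distributed. The
following sketch groups the rows of the EmpPayHistPart table by partition number:

```sql
-- A sketch: count the rows stored in each partition of EmpPayHistPart.
SELECT $PARTITION.RateChngDate(RateChangeDate) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM EmpPayHistPart
GROUP BY $PARTITION.RateChngDate(RateChangeDate)
```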

Modifying a Table
You need to modify tables when there is a requirement to add a new column, alter the data type
of a column, or add or remove constraints on the existing columns. For example, AdventureWorks
stores the leave details of all the employees in the EmployeeLeave table. According to the
requirements, you need to add another column named ApprovedBy in the table to store the name
of the supervisor who approved the leave of the employee. To implement this change, you can
use the ALTER TABLE statement.

The syntax of the ALTER TABLE statement is:


ALTER TABLE [ database_name . [ schema_name ] . ] table_name
{
ALTER COLUMN column_name
{ new_data_type [ NULL | NOT NULL ] }
| ADD <column_definition>
| [ WITH { CHECK | NOCHECK } ] ADD CONSTRAINT constraint_name constraint_type
}

where,

database_name specifies the name of the database in which the table is created.

schema_name specifies the name of the schema to which the table belongs.

table_name is the name of the table that is to be altered. If the table is not in the current database,
then the user needs to specify the database name and the schema name explicitly.

ALTER COLUMN specifies the name of the column to be altered.

ADD specifies the column or constraint to be added.

column_definition specifies the new column definition.

WITH CHECK | WITH NOCHECK specifies whether the existing data is to be checked for a newly added
constraint or a re-enabled constraint.
constraint_name specifies the name of the constraint to be created and must follow the rules for
the identifier.

constraint_type specifies the type of the constraint.

The following statement adds a column named ApprovedBy to the EmployeeLeave table:
ALTER TABLE HumanResources.EmployeeLeave
ADD ApprovedBy VARCHAR(30) NOT NULL

In the preceding statement, the ApprovedBy column is added that can store string values.

The following statement modifies the Description column of the HumanResources.Project table:
ALTER TABLE HumanResources.Project
ALTER COLUMN Description varchar(100)

In the preceding statement, the size of the Description column is increased to varchar(100).

The following statement adds a constraint called chkDeptName to the Emp table:
ALTER TABLE Emp
ADD CONSTRAINT chkDeptName CHECK(DeptName IN ('Admin', 'System', 'HR', 'Sales'))

In the preceding statement, a CHECK constraint is added on the DeptName column.

While modifying a table, you can drop a constraint when it is not required. You can perform this
task by altering the table by using the ALTER TABLE statement. The syntax to drop a constraint
is:
ALTER TABLE [ database_name . [ schema_name ] . | schema_name . ] table_name DROP
CONSTRAINT constraint_name

where,

database_name specifies the name of the database in which the table is created.

schema_name specifies the name of the schema to which the table belongs.

table_name specifies the name of the table that contains the constraint to be dropped.

constraint_name specifies the name of the constraint to be dropped.

The following statement drops the default constraint, chkDefLeave of the EmployeeLeave table:
ALTER TABLE HumanResources.EmployeeLeave DROP CONSTRAINT chkDefLeave

In the preceding statement, the chkDefLeave constraint is dropped from the EmployeeLeave table.

All constraints defined on a table are dropped automatically when the table is dropped.
If the column to which you are trying to add a constraint already contains data, you can use WITH
CHECK | WITH NOCHECK to specify whether the existing data will be checked against the added
constraint.
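For example, the following sketch adds a CHECK constraint to the Emp table without validating the
rows that already exist in it. The constraint name is illustrative:

```sql
-- With NOCHECK, existing rows are not validated; only future
-- inserts and updates are checked against the constraint.
ALTER TABLE Emp WITH NOCHECK
ADD CONSTRAINT chkDeptNameNew CHECK (DeptName IN ('Admin', 'System', 'HR', 'Sales'))
```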

Renaming a Table
You can rename a table whenever required. The sp_rename stored procedure is used to rename
the table. You can use sp_rename to rename any database object, such as table, view, stored
procedure, or function. The syntax of the sp_rename stored procedure is:
sp_rename old_name, new_name

where,

old_name is the current name of the object.

new_name is the new name of the object.

For example, the following statement renames the EmployeeLeave table:


sp_rename 'HumanResources.EmployeeLeave', 'EmployeeVacation'

You can also rename a table by right-clicking the table under the Tables folder in the Object
Explorer window, and then selecting the Rename option from the pop-up menu.

After a table is created, you may need to see the details of the table. Details of the table
include the column names and the constraints. For this purpose, you can use the sp_help
statement. The syntax of the sp_help statement is:
sp_help [ table_name ]
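For example, the following sketch displays the column and constraint details of the renamed
EmployeeVacation table:

```sql
-- A sketch: view the columns and constraints of a table.
EXEC sp_help 'HumanResources.EmployeeVacation'
```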

Just a minute:
You are managing a large table. You want to improve the performance of the table and want
the table to be more manageable. Which strategy can you use?

Answer:
Partition the table.

Dropping a Table
At times, when a table is not required, you need to delete a table. A table can be deleted along
with all the associated database objects, such as its index, triggers, constraints, and permissions.
You can delete a table by using the DROP TABLE statement. The syntax of the DROP TABLE
statement is:
DROP TABLE [ database_name . [ schema_name ] .] table_name
where,

database_name specifies the name of the database where the table is created.

schema_name specifies the name of the schema to which the table belongs.

table_name specifies the name of the table that needs to be dropped.

When a table is to be deleted, any other database object that references the table needs to be
deleted explicitly before the table itself is deleted. This is because if deleting a table would
violate a referential integrity rule, an error occurs and the deletion is restricted. Therefore,
if your table is referenced by another table, you must first delete the referencing table or the
foreign key constraint, and then delete the table.

For example, the Employee table in the HumanResource schema contains EmployeeID as its
primary key. The EmployeeLeave table under the same schema contains EmployeeID as its
foreign key and is referenced with the EmployeeID column of the Employee table. Therefore,
when you want to delete the Employee table, you first need to delete the EmployeeLeave table.
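The deletion order described in this scenario can be sketched as follows:

```sql
-- Drop the referencing table (foreign key side) first,
-- then the referenced table (primary key side).
DROP TABLE HumanResources.EmployeeLeave
DROP TABLE HumanResources.Employee
```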

The following statement deletes the EmployeeVacation table:


DROP TABLE HumanResources.EmployeeVacation

You can also delete a table by right-clicking the table under the Tables folder in the Object
Explorer window, and then selecting the Delete option from the pop-up menu.

Activity: Managing Tables

Problem Statement

The management of AdventureWorks, Inc. has decided to provide travel and medical
reimbursements to the employees. They want to store the details of these reimbursements in the
database. For this, you need to create a database table, EmployeeReimbursements. The details of
the EmployeeReimbursements table are shown in the following table.
Columns        Data Type and Size   Constraints
RimID          int                  Primary key
EmployeeID     int                  Foreign key references the EmployeeID of the
                                    Employee table, NOT NULL
Amount         money                Amount > 0
RimType        varchar(20)          RimType should be Medical, Cash, or Local
Pending_with   varchar(30)          NOT NULL

Details of the EmployeeReimbursements Table


How will you create the table?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Write the query to create a table.


2. Execute the statement to verify the result.

Task 1: Writing the Query to Create a Table

To store the travel and medical reimbursement details of the employees of AdventureWorks, you
need to create the EmployeeReimbursements table under the HumanResource schema. The table
needs to include the necessary constraints to maintain data accuracy and consistency. Type the
following statement in the Query Editor window:
CREATE TABLE HumanResources.EmployeeReimbursements
(
RimID int CONSTRAINT pkRim PRIMARY KEY,
EmployeeID int NOT NULL CONSTRAINT fkEid FOREIGN KEY REFERENCES
HumanResources.Employee(EmployeeID),
Amount money CONSTRAINT ckMon CHECK (Amount > 0),
RimType varchar(20) CONSTRAINT ckType CHECK (RimType IN
('Medical', 'Cash', 'Local')),
Pending_with varchar(30) NOT NULL
)

Task 2: Executing the Statement to Verify the Result

Press the F5 key to execute the query and view the result set. The following message is displayed
as the output:
Command(s) completed successfully.

Summary
In this chapter, you learned that:

A database is a repository of information that contains data in an organized way.


The master database records all the server-specific configuration information, including
authorized users, databases, system configuration settings, and remote servers.
The tempdb database is a temporary database that holds all the temporary tables and stored
procedures.
The model database acts as a template or a prototype for the new databases.
The msdb database supports the SQL Server Agent. SQL Server Agent includes features that
schedule periodic activities of SQL Server.
The Resource database is a read-only database that contains all the system objects that are
included with SQL Server 2005.
The user-defined databases are created by the users to store data for client-server
applications.
A database consists of the following types of files:
 Primary data file
 Secondary data file
 Transaction log file
A database must consist of a primary data file and one transaction log file.
The CREATE DATABASE statement is used to create a database, which also includes
determining the name of the database, the size of the database, and the files used to store
data in the database.
The DROP DATABASE statement is used to delete a database.
Tables are used to store data.
The CREATE TABLE statement is used to create a table.
Data integrity is enforced to keep the data in a database accurate, consistent, and reliable. It is
broadly classified into the following categories:
 Entity integrity: Ensures that each row can be uniquely identified by an attribute called
the primary key.
 Domain integrity: Ensures that only a valid range of values is allowed to be stored in a
column.
 Referential integrity: Ensures that the values of the foreign key match the value of the
corresponding primary key.
 User-defined integrity: Refers to a set of rules specified by a user, which do not belong to
the entity, domain, and referential integrity categories.
Constraints define rules that must be followed to maintain consistency and correctness of
data.
A primary key constraint is defined on a column or a set of columns whose values uniquely
identify rows in a table.
The unique constraint is used to enforce uniqueness on non-primary key columns.
A foreign key constraint associates one or more columns of a table (the foreign key) with an
identical set of columns on which a primary key constraint has been defined (a primary key
column in another table).
A check constraint enforces domain integrity by restricting the values to be inserted in a
column. The IN, LIKE, and BETWEEN keywords are used to define the check constraint.
A default constraint can be used to assign a constant value to a column, and the user need
not insert values for such a column.
A rule provides a mechanism for enforcing domain integrity for columns or user-defined data
types.
User-defined data types are custom data types defined by the users with a custom name.
A partitioned table is created to manage the data and improve the query performance.
The ALTER TABLE statement is used to modify a table.
The DROP TABLE statement is used to delete a table.

Exercises

Exercise 1
Create a table named Recipient to store the details of recipients. The following table provides the
attribute name and the data type.
Attribute Name Data type
OrderNumber char(6)
FirstName varchar(20)
LastName varchar(20)
Address varchar(50)
City char(15)
State char(15)
CountryCode char(3)
ZipCode char(10)
Phone char(15)

Exercise 2
Create a table named Country to store the country details. The following table provides the
attribute name and the data type.
Attribute Name Data type
CountryID varchar(2)
Country char(25)

Exercise 3

The Recipient table and the Country table do not have the same data type for the CountryId
attribute. Following is the sample structure of the two tables.
Alter the Recipient or the Country table so that they have the same data type for the CountryId
attribute.

Exercise 4

Delete the Recipient table.

Exercise 5

Consider the following table structures.

Refer to these table structures for the following problems:


1. Create the Category table. Enforce the following data integrity rules while creating the table:
 The category id should be the primary key.
 The Category attribute should be unique but not the primary key.
 The description of the categories attribute can allow storage of NULL values.
2. Create the ProductBrand table. Enforce the following data integrity rules while creating the
table:
 The brand id should be the primary key.
 The brand name should be unique but not the primary key.
3. Create the NewProduct table with the following data integrity rules:
 The product id should be the primary key.
 The quantity on hand (QoH) of the product should be between 0 and 200.
 The Photo and ProductImgPath attributes can allow storage of NULL values.
 The ProductName and ProductDescription attributes should not allow NULL values.
 The values of the CategoryId attribute should be present in the Category table.
4. Modify the NewProduct table to enforce the following data integrity rule:
 The values entered in the BrandId attribute should be present in the ProductBrand
table.

Exercise 6
The following statement was used to remove the table called Category:
DELETE TABLE Category

The preceding statement displays an error and aborts. Identify the error and rectify it.

Chapter 5
Manipulating Data in Tables
After creating a database and the tables, the next step is to store data in the database. As a database
developer, you will be required to modify or delete data. You can perform these data
manipulations by using the Data Manipulation Language (DML) statements of Transact-SQL.

The data stored in the database can be used by different types of client applications, such as mobile
devices or Web applications. Therefore, data should be stored in a format that can be interpreted
by any application. For this, SQL Server allows you to store data in the Extensible Markup
Language (XML) format that can be read by any application.

This chapter discusses how to use the DML statements to manipulate data in the tables. In
addition, it explains how to manipulate the XML data in the database tables.

Objectives
In this chapter, you will learn to:
Manipulate data by using Data Manipulation Language statements
Manipulate the Extensible Markup Language data

Manipulating Data by Using DML Statements


As a database developer, you need to regularly insert, modify, or delete data. These operations
ensure that the data is up-to-date. For example, you need to insert new records in the Employee
table, whenever a new employee joins the organization.

Similarly, if the details of an employee change, you need to update the existing records. For
example, if the salary of any employee increases, you need to update the existing records to reflect
this change.

You can use the DML statements to manipulate the data in any table.

Storing Data in a Table


The smallest unit of data that you can add in a table is a row. You can add a row by using the
INSERT statement. The syntax of the INSERT statement is:
INSERT [INTO]{table_name} [(column_list)]
VALUES {DEFAULT | value_list | select_statement}

where,

table_name specifies the name of the table into which the data is to be inserted. The INTO keyword
is optional.
column_list specifies an optional parameter. You can use it when partial data is to be inserted in
a table or when the columns to be inserted are defined in a different order.

DEFAULT specifies the clause that you can use to insert the default value specified for the column.
If a default value is not specified for a column and the column property is specified as NULL,
NULL is inserted in the column. If the column does not have any default constraint attached to it
and does not allow NULL as the column value, SQL Server returns an error message and the insert
operation is rejected.

value_list specifies the list of values for the table columns that have to be inserted as a row in
the table. If a column has to contain a default value, you can use the DEFAULT keyword instead
of a column value. The column value can also be an expression.

select_statement specifies a nested SELECT statement that you can use to insert rows into the
table.

Guidelines for Inserting Rows

While inserting rows into a table, you need to consider the following guidelines:

The number of data values must be the same as the number of attributes in the table or
column list.
The order of inserting the information must be the same as the order in which attributes are
listed for insertion.
The values clause need not contain the column with the IDENTITY property.
The data types of the information must match the data types of the columns of the table.

Consider an example of the EmpData table that is used to store the details of the employees.

The following table describes the structure of the EmpData table.


Column Name Data Type Constraint
EmpName varchar(20) NULL
EmpNo int NOT NULL
EmpAddress varchar(60) NULL
Salary int NULL

Structure of the EmpData Table

To insert a row into the EmpData table with all the column values, you can use any one of the
following statements:
INSERT EmpData
VALUES ('Yang Kan', 101, '123 Nanjing Lu', 2500)
Or
INSERT EmpData (EmpName, EmpAddress, EmpNo, Salary)
VALUES ('Yang Kan', '123 Nanjing Lu', 101, 2500)
Or
INSERT EmpData (EmpName, EmpAddress, Salary, EmpNo)
VALUES ('Yang Kan', '123 Nanjing Lu', 2500, 101)

Inserting Partial Data


Depending on the constraints applied to the columns of the tables, you can insert partial data into
the database tables. This means that while performing an insert operation, you can insert data for
selective columns in a table. It is not necessary that you have to insert values for all the columns
in the table. SQL Server allows you to insert partial data for a column that allows NULL or has a
default constraint assigned to it. The INSERT clause lists the columns for which data is to be
inserted, except those columns that allow NULL or have a default constraint. The VALUES clause
provides values for the specified columns.

In the previous example of EmpData table, the EmpAddress column allows you to enter a NULL
value in a row. Therefore, you can use the following statements to insert partial data into the table:
INSERT EmpData
VALUES ('Yang Kan', 101, NULL, 2500)
Or
INSERT EmpData (EmpName, EmpNo, Salary)
VALUES ('Yang Kan', 101, 2500)

Inserting Data in Related Tables

Data related to an entity can be stored in more than one table. Therefore, while adding information
for a new entity, you need to insert new rows in all the related tables. In such a case, you need to
first insert a row in the table that contains the primary key. Next, you can insert a row in the related
table containing the foreign key.

For example, in the AdventureWorks database, the employee details are stored in Person.Contact,
HumanResources.Employee, HumanResources.EmployeeDepartmentHistory, and
HumanResources.EmployeePayHistory tables. To save the details for a new employee, you need
to insert data in all these tables.

The following statements insert data of a new employee into the database:
-- inserting records in the Person.Contact table.
INSERT INTO Person.Contact VALUES (0, 'Mr.', 'Steven', NULL, 'Fleming', NULL,
'[email protected]', 1, '951-667-2401',
'B4802B37F8F077A6C1F2C3F50F6CD6C5379E9C79', '3sa+edf=', NULL, DEFAULT, DEFAULT)
-- inserting records in the HumanResources.Employee table.
INSERT INTO HumanResources.Employee VALUES ('45879632', 19978,
'adventure-works\steven', 185, 'Tool Designer', '1967-06-03 00:00:00.000', 'M', 'M',
'2006-08-01 00:00:00.000', 1, 0, 0, 1, DEFAULT, DEFAULT)
-- inserting records in the HumanResources.EmployeeDepartmentHistory table.
INSERT INTO HumanResources.EmployeeDepartmentHistory VALUES (291, 2, 1, '2006-08-01
00:00:00.000', NULL, DEFAULT)
-- inserting records in the HumanResources.EmployeePayHistory table.
INSERT INTO HumanResources.EmployeePayHistory VALUES (291,
'2006-08-01 00:00:00.000', 23.0769, 2, DEFAULT)

Copying Data from an Existing Table into a New Table


While inserting data in table, you might need to copy rows from an existing table to another table.
You can do this by using the SELECT statement.

For example, in the AdventureWorks database, data for the employees with a payment rate of 35
or above is to be copied into a new table called PreferredEmployee from the EmployeePayHistory
table.

The following statement copies the values from the EmployeePayHistory table into the
PreferredEmployee table:
SELECT * INTO PreferredEmployee
FROM HumanResources.EmployeePayHistory
WHERE Rate >= 35

The preceding statement will create a new table named PreferredEmployee. The table will have
the same structure as HumanResources.EmployeePayHistory.

You can also copy the values from an existing table to another existing table. For example, you
want to copy data for the employees with a payment rate of 10 or less from the
EmployeePayHistory table into the PreferredEmployee table. The PreferredEmployee table exists
in the database. You can perform this task by using the following statement:
INSERT INTO PreferredEmployee
SELECT * FROM HumanResources.EmployeePayHistory
WHERE Rate <=10

Whenever you copy data to another existing table, the source and the target table structures must
be the same.

Just a minute:
Which statement allows you to insert data in a table?

Answer:
INSERT INTO

Just a minute:
Which statement allows you to copy contents of one table into another table?
Answer:
SELECT INTO

Updating Data in a Table


You need to modify the data in the database when the specifications of a customer, a client, a
transaction, or any other data maintained by the organization undergo a change.

For example, if a client changes his address or if the quantity of a product ordered is changed, the
required changes need to be made to the respective rows in the tables. You can use the UPDATE
statement to make the changes. Updating ensures that the latest and correct information is
available at any point of time. You can update one or more than one column of a row.

You can update data in a table by using the UPDATE DML statement. The syntax of the UPDATE
statement is:
UPDATE table_name
SET column_name = value [, column_name = value]
[FROM table_name]
[WHERE condition]

where,

table_name specifies the name of the table you have to modify.

column_name specifies the columns you have to modify in the specified table.

value specifies the value(s) with which you have to update the column(s) of the table. Some valid
values include an expression, a column name, and a variable name. The DEFAULT and NULL
keywords can also be supplied.

FROM table_name specifies the table(s) that are used in the UPDATE statement.

condition specifies the rows that you have to update.

Guidelines for Updating Data

You need to consider the following guidelines while updating data:

An update can be done on only one table at a time.


If an update violates integrity constraints, then the entire update is rolled back.

For example, the following table displays sample data from the EmpData table.

Values of the EmpData Table


The following statement updates the address of the employee having EmpNo 201:
UPDATE EmpData
SET EmpAddress = 'Shi Lu'
WHERE EmpNo = 201

Consider another example, where you need to update the title of an employee named Lynn
Tsoflias to Sales Executive in the Employee table. To perform this task, you need to refer to the
Contact table to obtain the Contact ID. You can update the employee details by using the following
statement:
UPDATE HumanResources.Employee SET Title = 'Sales Executive'
FROM HumanResources.Employee e, Person.Contact c
WHERE e.ContactID = c.ContactID
AND c.FirstName = 'Lynn' AND c.LastName = 'Tsoflias'

Just a minute:
Which statement allows you to modify data in a database?

Answer:
UPDATE

Deleting Data from a Table


You need to delete data from the database when it is no longer required. The smallest unit that
can be deleted from a database is a row.

You can delete a row from a table by using the DELETE DML statement. The syntax of the
DELETE statement is:
DELETE [FROM] table_name
[FROM table(s)]
[WHERE condition]

where,

table_name specifies the name of the table from which you have to delete rows.

table(s) specifies the name of the table(s) required to set the condition for deletion.

condition specifies the condition that identifies the row(s) to be deleted.

For example, the following statement deletes the address details of AddressID 104 from the
Address table:
DELETE Address
WHERE AddressID = 104

Deleting Data from Related Tables


While deleting records from the related tables, you first need to delete the records from the table
that contains the foreign key. After that, you can delete the record from the table that contains the
primary key.

Consider the example of the AdventureWorks database. The Employee table contains data of
those employees who have retired from the company. This data is not required anymore. This
increases the size of the database. You are required to ensure that this old data is removed from
the Employee table.

You can delete this data by using the following statement:


DELETE FROM HumanResources.Employee
WHERE BirthDate < dateadd(yy, -60, getdate())

The database contains tables related to the Employee table. The related tables are
HumanResources.EmployeeAddress, HumanResources.EmployeeDepartmentHistory,
HumanResources.EmployeePayHistory, and HumanResources.JobCandidate. The EmployeeID
attribute in these tables is a foreign key to the EmployeeID attribute of the Employee table.
Therefore, the query results in an error. You need to delete data from the related tables before
executing the preceding DELETE statement.
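Based on this scenario, the required order of deletions can be sketched as follows. Only one of the
related tables is shown; the same pattern applies to the other referencing tables:

```sql
-- Delete the referencing rows first...
DELETE FROM HumanResources.EmployeePayHistory
WHERE EmployeeID IN (SELECT EmployeeID
                     FROM HumanResources.Employee
                     WHERE BirthDate < DATEADD(yy, -60, GETDATE()))
-- ...and only then delete the corresponding Employee rows.
```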

Deleting All the Records from a Table

As a database developer, you might need to delete all the records from a table. You can do this by
using the following DELETE statement:
DELETE table_name

You can also use the TRUNCATE TABLE statement. The syntax of the TRUNCATE TABLE statement is:
TRUNCATE TABLE table_name

where,

table_name specifies the name of the table from which you have to delete rows.

However, the TRUNCATE TABLE statement executes faster.

The TRUNCATE TABLE statement does not support the WHERE clause. In addition, the TRUNCATE
TABLE statement does not fire a trigger. When a table is truncated, the individual row deletions
are not entered in the transaction log.

For example, the following statement deletes all the records from the Address table:
TRUNCATE TABLE Address

Just a minute:
Which statement allows you to delete a single row from a table?
Answer:
The DELETE statement

Activity: Manipulating Data in Tables

Problem Statement

You are the database developer at AdventureWorks, Inc. As a part of the regular database
operations, you need to implement the following changes in the AdventureWorks database:

1. The management has decided to create a new department named Inventory Control under
the Inventory Management group. You need to add the details of this new department into
the Department table. The details should match the structure of the Department table.
2. Change the department names for the following employees to the Inventory Control
department:
a. Vamsi N. Kuppa
b. Susan W. Eaton

The Department ID of the Inventory Control department is 17.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Insert a new record in the Department table.


2. Update the employee details to change the department.
3. Verify that data is inserted and modified.

Task 1: Inserting a New Record in the Department Table

To add the details of the Inventory Control department into the Department table, you need to
perform the following steps:
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
INSERT INTO HumanResources.Department VALUES ('Inventory Control', 'Inventory Management', DEFAULT)

2. Press the F5 key to execute the statement.

Task 2: Updating the Employee Details to Change the Department

The tables required to update the department ID of the two employees are
HumanResources.EmployeeDepartmentHistory, Person.Contact, and
HumanResources.Employee. Therefore, to update the details of the employees, you need to
perform the following steps:
1. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:
UPDATE d
SET DepartmentID = 17
FROM HumanResources.EmployeeDepartmentHistory d, HumanResources.Employee e,
Person.Contact c
WHERE d.EmployeeID = e.EmployeeID AND e.ContactID = c.ContactID
AND c.FirstName = 'Vamsi' AND c.MiddleName = 'N' AND c.LastName = 'Kuppa'
UPDATE d
SET DepartmentID = 17
FROM HumanResources.EmployeeDepartmentHistory d, HumanResources.Employee e,
Person.Contact c
WHERE d.EmployeeID = e.EmployeeID AND e.ContactID = c.ContactID
AND c.FirstName = 'Susan' AND c.MiddleName = 'W' AND c.LastName = 'Eaton'

2. Press the F5 key to execute the statements.

Task 3: Verifying that Data is Inserted and Modified

To verify that the data is inserted and modified, you need to perform the following steps:
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window to verify that the details of the new department are added:
SELECT * FROM HumanResources.Department
2. Press the F5 key to execute the statement.

The result set will display a list of all the departments. In this list, the details of the new
department will also appear.
3. Type the following queries in the Query Editor window of the Microsoft SQL Server
Management Studio window to verify that the employee details have been updated:
SELECT DepartmentID FROM HumanResources.EmployeeDepartmentHistory d
JOIN HumanResources.Employee e
ON d.EmployeeID = e.EmployeeID
JOIN Person.Contact c
ON e.ContactID = c.ContactID
WHERE FirstName = 'Vamsi' AND MiddleName = 'N' AND LastName = 'Kuppa'
SELECT DepartmentID FROM HumanResources.EmployeeDepartmentHistory d
JOIN HumanResources.Employee e
ON d.EmployeeID = e.EmployeeID
JOIN Person.Contact c
ON e.Contactid = c.Contactid
WHERE FirstName = 'Susan' AND MiddleName = 'W' AND LastName = 'Eaton'

The preceding queries display the department ID of the employees whose names are
specified in the WHERE clauses.
4. Press the F5 key to execute the preceding queries.

In the result set that is displayed, the department ID of the two employees will be 17.

Manipulating the XML Data


With the growth in clients accessing data through heterogeneous hardware and software platforms,
a language was needed that could be interpreted by any environment. This need resulted in the evolution
of a language called XML. XML is a markup language that is used to describe the structure of
data in a standard hierarchical manner. The structure of the documents containing the data is
described with the help of tags contained in the document. Therefore, various business
applications store their data in XML documents.

SQL Server 2005 allows you to save or retrieve data in the XML format. This enables SQL Server
to provide database support to various kinds of applications. As a database developer, it is
important for you to learn to manipulate the XML data by using SQL Server 2005.

The structure of the XML data can be defined in the form of a supporting Document Type Definition
(DTD) or schema. You can read more about XML in the Appendix.

Storing XML Data in a Table


The XML data is available in the form of XML fragments or complete XML documents. An XML
fragment contains XML data without a top-level element.

The following example displays a complete XML document:


<customerdata>
<customer ID="C001">
<custname>John</custname>
</customer>
<customer ID="C002">
<custname>Peter</custname>
</customer>
</customerdata>

The following example displays XML data in the form of XML fragment:
<customer ID="C001"><custname>John</custname></customer>

SQL Server 2005 provides the XML data type to save the XML data in its original state. You can
create tables or variables with this data type to store the XML data. Alternatively, you can
shred the XML data and store the values in different columns of a rowset. This process of
transforming XML data into a rowset is called shredding.

In SQL Server 2005, you can store the XML data in the following ways:

A rowset
An XML column
Storing the XML Data in a Rowset
Consider an example. You have received order details from a vendor. The order details are
generated by the application used by the vendor in an XML document. You need to store this data
in a database table. For this, you need to shred the XML data. SQL Server allows you to shred the
XML data by using the OPENXML function and its related stored procedures.

Shredding an XML document involves the following tasks:

1. Parse the XML document


2. Retrieve a rowset from the tree
3. Store the data from the rowset
4. Clear the memory

The following figure represents the shredding of an XML document.

Shredding an XML document

In the preceding figure, you must first call the sp_xml_preparedocument procedure to parse the
XML document. This procedure returns a handle to the parsed document that is ready for
consumption. The parsed document is a document object model (DOM) tree representation of
various nodes in the XML document. The document handle is passed to OPENXML. OPENXML
then provides a rowset view of the document based on the parameters passed to it. The internal
representation of an XML document must be removed from memory by calling the
sp_xml_removedocument system stored procedure.

Parsing the XML Document

SQL Server 2005 provides the sp_xml_preparedocument stored procedure to parse the XML
document. This stored procedure reads the XML document and parses it with the MSXML parser.
Parsing an XML document involves validating the XML data with the structure defined in the
DTD or schema. The parsed document is an internal tree representation of various nodes in the
XML document, such as elements, attributes, text, and comments.
sp_xml_preparedocument returns a handle or pointer that can be used to access the newly created
internal representation of the XML document. This handle is valid for the duration of the session
or until the handle is destroyed by executing the sp_xml_removedocument stored procedure.

Retrieving a Rowset from the Tree

After verifying the accuracy of the structure and completeness of data, you need to extract the
data from the available XML data. For this, you can use the OPENXML function to generate an
in-memory rowset from the parsed data.

The syntax of the OPENXML function is:


openxml( idoc int [ in], rowpattern nvarchar [ in ], [ flags byte [ in ] ] )
[ WITH ( SchemaDeclaration | TableName ) ]

where,

idoc specifies the document handle of the internal tree representation of an XML document.

rowpattern specifies the XPath pattern used to identify the nodes (in the XML document whose
handle is passed in the idoc parameter) to be processed as rows. The row pattern parameter of
OPENXML allows you to traverse the XML hierarchy.

flags specifies the mapping that should be used between the XML data and the relational rowset,
and how the spill-over column should be filled. It is an optional parameter and can have the
following values:

0 - to use the default mapping (attributes)

1 - to retrieve attribute values

2 - to retrieve element values

3 - to retrieve both attribute and element values

SchemaDeclaration specifies the rowset schema declaration for the columns to be returned by
using a combination of column names, data types, and patterns.

TableName specifies the table name that can be given instead of SchemaDeclaration, if a table with
the desired schema already exists and no column patterns are required.

Storing the Data from the Rowset

You can use the rowset created by openxml to store the data, in the same way that you would use
any other rowset. You can insert the rowset data into permanent tables in a database. For example,
you can insert the data received by a supplier in the XML format into the SalesOrderHeader and
SalesOrderDetail tables.
Clearing the Memory

After saving the data permanently in the database, you need to release the memory where you
stored the rowset. For this, you can use the sp_xml_removedocument stored procedure.

Consider an example, where customers shop online and the orders given by the customers are
transferred to the supplier in the form of an XML document. The following data is available in
the XML document:
DECLARE @Doc int
DECLARE @XMLDoc nvarchar(1000)
SET @XMLDoc = N'<ROOT>
<Customer CustomerID="JH01" ContactName="John Henriot">
<Order OrderID="1001" CustomerID="JH01"
OrderDate="2006-07-04T00:00:00">
<OrderDetail ProductID="11" Quantity="12"/>
<OrderDetail ProductID="22" Quantity="10"/>
</Order>
</Customer>
<Customer CustomerID="SG01" ContactName="Steve Gonzlez">
<Order OrderID="1002" CustomerID="SG01"
OrderDate="2006-08-16T00:00:00">
<OrderDetail ProductID="32" Quantity="3"/>
</Order>
</Customer>
</ROOT>'

To view the preceding XML data in a rowset, you need to perform the following steps:
1. Create an internal representation of the XML document by executing the following
statement:
EXEC sp_xml_preparedocument @Doc OUTPUT, @XMLDoc

2. Execute the following statement to store the data in an existing table, CustomerDetails by
using the OPENXML function:
INSERT INTO CustomerDetails
SELECT *
FROM openxml (@Doc, '/ROOT/Customer', 1)
WITH (CustomerID varchar(10),
ContactName varchar(20))

In the preceding statement, the OPENXML function takes three parameters. The first parameter,
@Doc, is the document handle that stores the internal representation of the XML document.
The second parameter specifies that the Customer element is to be processed. The third parameter,
1, specifies that the attribute values of the Customer element are to be retrieved. The WITH clause
specifies the column names, along with the data types, in which the attribute values will be stored.
When the preceding query is executed, the data will be displayed, as shown in the following table.
CustomerID ContactName
JH01 John Henriot
SG01 Steve Gonzlez
Representation of the XML Data in a Tabular Format

3. Remove the internal tree from the memory by executing the following statement:
EXEC sp_xml_removedocument @Doc

You can also specify the column pattern to map the rowset columns and the XML attributes and
elements. For example, consider the following statements:
DECLARE @Doc int
DECLARE @XMLDoc nvarchar(1000)
SET @XMLDoc = N'<ROOT>
<Customer CustomerID="JH01" ContactName="John Henriot">
<Order OrderID="1001" CustomerID="JH01"
OrderDate="2006-07-04T00:00:00">
<OrderDetail ProductID="11" Quantity="12"/>
<OrderDetail ProductID="22" Quantity="10"/>
</Order>
</Customer>
<Customer CustomerID="SG01" ContactName="Steve Gonzlez">
<Order OrderID="1002" CustomerID="SG01"
OrderDate="2006-08-16T00:00:00">
<OrderDetail ProductID="32" Quantity="3"/>
</Order>
</Customer>
</ROOT>'

EXEC sp_xml_preparedocument @Doc OUTPUT, @XMLDoc

SELECT *
FROM openxml (@Doc, '/ROOT/Customer/Order/OrderDetail',1)
WITH (CustomerID varchar(10) '../../@CustomerID',
ContactName varchar(20) '../../@ContactName',
OrderID int '../@OrderID',
OrderDate datetime '../@OrderDate',
ProdID int '@ProductID',
Quantity int)

EXEC sp_xml_removedocument @Doc

In the preceding statements, the second parameter (/ROOT/Customer/Order/OrderDetail) of the


openxml function specifies that the current node is OrderDetail, which needs to be processed. The
WITH clause specifies the column names, data types, and patterns. Here, the @ symbol specifies
an attribute, and the double dot (..) symbol represents the parent node of the current node.
Therefore, the expression ../../@CustomerID in the WITH clause represents the CustomerID
attribute of the Customer element.

The output of the preceding statements is displayed in the following table.


Representation of the XML Data in a Tabular Format

Storing XML Data in XML Columns

At times, you need to store the XML data in its original state in a column of a database table. For
example, you need to save the details of customers in the database. The details of individual
customers are maintained by a website. The website saves the details of each customer in an XML
file. As a database developer, you need to save this data in SQL Server. For this, you can create
the CustDetails table to store the customer details, as shown in the following statement:
CREATE TABLE CustDetails
(
CUST_ID int,
CUST_DETAILS XML
)

You can save the following types of data in the columns with the XML data types:

Untyped XML data: Is well-formed XML data that is not associated with a schema. SQL
Server does not validate this data, but it ensures that the data being saved with the XML data
type is well-formed.
Typed XML data: Is well-formed XML data that is associated with a schema defining the elements
and their attributes. The schema also specifies a namespace for the data. When you save typed XML
data in a table, SQL Server validates the data against the schema and assigns the appropriate
data type to the data based on the data types defined in the schema. This helps in saving
storage space.

As a database developer, you should know how to store both types of data on SQL Server.

Storing Untyped XML Data

To store the untyped XML data, you can use columns or variables with the XML data type. For
example, to store customer data in the CustDetails table, you can use the following INSERT INTO
statement:
INSERT INTO CustDetails VALUES (2, '<Customer Name="Abrahim Jones" City="Selina"
/>')

In the preceding statement, the string value that contains an XML fragment is implicitly converted
to XML. However, you can also convert a string value to XML by using the CONVERT or CAST
functions. In this example, you can use the following statement to convert the data type of the
string value to XML before inserting the record into the table:
INSERT INTO CustDetails VALUES(2, convert(XML, '<Customer Name="Abrahim Jones"
City="Selina" />'))

Similarly, you can also use the CAST function, as shown in the following statement:
INSERT INTO CustDetails VALUES(4, cast('<Customer Name="Abrahim Jones" City="Selina"
/>' as XML))

Storing Typed XML Data

To store the typed XML data, you need to first register an XML schema in the XML schema
collection objects in the database. The XML schema collection is an object on SQL Server that is
used to save one or more XML schemas. You can create an XML schema collection object by
using the following statement:
CREATE XML SCHEMA COLLECTION <Name> AS Expression

where,

Name specifies an identifier name with which SQL Server will identify the schema collection.

Expression specifies an XML value that contains one or more XML schema documents.

For example, the customer details are associated with the following schema:
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<element name="CustomerName" type="string"/>
<element name="City" type="string"/>
</schema>

You can use the following statement to register the preceding schema with the database:
CREATE XML SCHEMA COLLECTION CustomerSchemaCollection AS
'<schema xmlns="http://www.w3.org/2001/XMLSchema">
<element name="CustomerName" type="string"/>
<element name="City" type="string"/>
</schema>'

The preceding statement creates an XML schema collection object named


CustomerSchemaCollection.

You can view information about the registered schemas in a database by querying the
sys.XML_schema_collections catalog view, as shown in the following query:
SELECT * FROM sys.XML_schema_collections

To drop an XML Schema Collection from the database, you can use the following statement:
DROP XML SCHEMA COLLECTION CustomerSchemaCollection
After registering the XML schema, you can use the schemas to validate typed XML values while
inserting records into the tables. You need to specify this while creating a table that will store the
XML data. In the preceding example, if you need to validate the customer details with the
CustomerSchemaCollection schema, you need to create the CustDetails table by using the
following statement:
CREATE TABLE CustDetails
(
CustID int,
CustDetail XML (CustomerSchemaCollection)
)

You can insert the data in the CustDetails table by using the following statement:
INSERT INTO CustDetails
VALUES(2, '<CustomerName>Abrahim Jones</CustomerName><City>Selina</City>')

While executing the preceding statement, SQL Server will validate the value of the CustDetail
column against the CustomerSchemaCollection schema.

Insert another record using the following statement:


INSERT INTO CustDetails
VALUES(2, '<Name>John</Name><City>New York</City>')

In the preceding statement, the value of the CustDetail column does not follow the schema
definition. Therefore, the execution of the preceding statement produces the following error:
Msg 6913, Level 16, State 1, Line 1
XML Validation: Declaration not found for element 'Name'. Location: /*:Name[1]

Retrieving Table Data into XML Format


At times, you need to retrieve the relational data from a table into the XML format for reporting
purposes or to share the data across different applications. This involves extracting data from a
table in the form of well-formed XML fragments. You can retrieve the data in XML format by
using:

The FOR XML clause in the SELECT statement


XQuery

Using the FOR XML Clause in the SELECT Statement

SQL Server 2005 allows you to extract data from relational tables into an XML format by using
the SELECT statement with the FOR XML clause. You can use the FOR XML clause to retrieve
the XML data by using the following modes:

RAW
AUTO
PATH
EXPLICIT
Using the RAW Mode

The RAW mode is used to return an XML file with each row representing an XML element. The
RAW mode transforms each row in the query result set into an XML element with the element
name, row. Each column value that is not NULL is mapped to an attribute with the same name as
the column name.

The following query displays the details of employees with employee ID as 1 or 2:


SELECT EmployeeID, ContactID, LoginID, Title
FROM HumanResources.Employee
WHERE EmployeeID=1 OR EmployeeID=2
FOR XML RAW

The preceding query displays the employee details in the following format:
<row EmployeeID="1" ContactID="1209" LoginID="adventure-works\guy1"
Title="Production Technician - WC60" />
<row EmployeeID="2" ContactID="1030" LoginID="adventure-works\kevin0"
Title="Marketing Assistant" />

If the ELEMENTS directive is specified with the FOR XML clause, each column value is mapped
to a subelement of the <row> element, as shown in the following query:
SELECT EmployeeID, ContactID, LoginID, Title
FROM HumanResources.Employee
WHERE EmployeeID=1 OR EmployeeID=2
FOR XML RAW, ELEMENTS

The preceding query displays the employee details in the following format:
<row>
<EmployeeID>1</EmployeeID>
<ContactID>1209</ContactID>
<LoginID>adventure-works\guy1</LoginID>
<Title>Production Technician - WC60</Title>
</row>
<row>
<EmployeeID>2</EmployeeID>
<ContactID>1030</ContactID>
<LoginID>adventure-works\kevin0</LoginID>
<Title>Marketing Assistant</Title>
</row>

When element-centric XML is returned, null columns are omitted in the results. However, you
can specify that null columns should yield empty elements with the xsi:nil attribute instead of
being omitted. For this, you can use the XSINIL option with ELEMENTS directive in AUTO,
RAW and PATH mode queries. For example, the following query displays the product details:
SELECT ProductID, Name, Color
FROM Production.Product Product
WHERE ProductID = 1 OR ProductID = 317
FOR XML RAW, ELEMENTS XSINIL
In the preceding query, the value of Color column for ProductID 1 is NULL. When you execute
the preceding query, the output will be displayed in the following format:
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>1</ProductID>
<Name>Adjustable Race</Name>
<Color xsi:nil="true" />
</row>
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ProductID>317</ProductID>
<Name>LL Crankarm</Name>
<Color>Black</Color>
</row>

In the preceding format, for ProductID 1, the xsi:nil attribute is added for the NULL value in the
Color column.

In EXPLICIT mode queries, you can use the elementxsinil column mode to yield empty elements
for the columns having NULL.

Using the AUTO Mode

The AUTO mode is used to return query results as nested XML elements. Similar to the RAW
mode, each column value that is not NULL is mapped to an attribute that is named after either the
column name or the column alias. The element that these attributes belong to is named after the
table that they belong to, or after the table alias that is used in the SELECT statement, as shown in the
following query:
SELECT EmployeeID, ContactID, LoginID, Title
FROM HumanResources.Employee Employee
WHERE EmployeeID=1 OR EmployeeID=2
FOR XML AUTO

The preceding query displays employee details in the following format:


<Employee EmployeeID="1" ContactID="1209" LoginID="adventure-works\guy1"
Title="Production Technician - WC60" />
<Employee EmployeeID="2" ContactID="1030" LoginID="adventure-works\kevin0"
Title="Marketing Assistant" />

If the optional ELEMENTS directive is specified in the FOR XML clause, the columns listed in
the SELECT clause are mapped to subelements, as shown in the following query:
SELECT EmployeeID, ContactID, LoginID, Title
FROM HumanResources.Employee Employee
WHERE EmployeeID=1 OR EmployeeID=2
FOR XML AUTO, ELEMENTS

The output of the preceding query is displayed in the following format:


<Employee>
<EmployeeID>1</EmployeeID>
<ContactID>1209</ContactID>
<LoginID>adventure-works\guy1</LoginID>
<Title>Production Technician - WC60</Title>
</Employee>
<Employee>
<EmployeeID>2</EmployeeID>
<ContactID>1030</ContactID>
<LoginID>adventure-works\kevin0</LoginID>
<Title>Marketing Assistant</Title>
</Employee>

Using the PATH Mode

The PATH mode is used to return specific values by indicating the column names for which you
need to retrieve the data, as shown in the following query:
SELECT EmployeeID "@EmpID",
FirstName "EmpName/First",
MiddleName "EmpName/Middle",
LastName "EmpName/Last"
FROM HumanResources.Employee e JOIN Person.Contact c
ON e.ContactID = c.ContactID
AND e.EmployeeID=1
FOR XML PATH

The preceding query displays the output in the following format:


<row EmpID="1">
<EmpName>
<First>Guy</First>
<Middle>R</Middle>
<Last>Gilbert</Last>
</EmpName>
</row>

In the preceding format, the EmployeeID column is mapped to the EmpID attribute, and the
FirstName, MiddleName, and LastName columns are mapped as subelements of the EmpName
element. A node preceded by the @ symbol represents an attribute. Subelements are preceded by
the parent element name followed by a slash (/).

You can also use the optional ElementName argument with the PATH mode query to modify the
name of the default row element, as shown in the following query:
SELECT EmployeeID "@EmpID",
FirstName "EmpName/First",
MiddleName "EmpName/Middle",
LastName "EmpName/Last"
FROM HumanResources.Employee e JOIN Person.Contact c
ON e.ContactID = c.ContactID
AND e.EmployeeID=1
FOR XML PATH('Employee')

The preceding query displays the output in the following format:


<Employee EmpID="1">
<EmpName>
<First>Guy</First>
<Middle>R</Middle>
<Last>Gilbert</Last>
</EmpName>
</Employee>

In the preceding format, the Employee element becomes the root element.

Using the EXPLICIT Mode

The EXPLICIT mode is used to return an XML file in the format specified in the
SELECT statement. Separate SELECT statements can be combined with the UNION ALL
statement to generate each level/element in the resulting XML output. Each of these SELECT
statements requires the first two columns to be named Tag and Parent. The Parent column is used to
control the nesting of elements. It contains the tag number of the parent element of the current
element. The top-level element in the document should have the Parent value set to 0 or NULL.

While writing EXPLICIT mode queries, column names in the resulting rowset must be specified
in the following format:
ElementName!TagNumber!AttributeName!Directive

where,

ElementName specifies the name of the element.

TagNumber specifies the unique tag value assigned to an element. It, along with the value in the
Parent tag, determines the nesting of the elements in the resulting XML.

AttributeName specifies the name of the attribute. This attribute will be created in the element
specified by the ElementName, if the directive is not specified.

Directive specifies the type of AttributeName. It is used to provide additional information for
construction of the XML. It is optional and can have values, such as xml, cdata, or element. If you
specify the element, it will generate a subelement instead of an attribute.

For example, the managers of AdventureWorks want to access the information regarding products
through their mobile devices. These devices cannot directly connect to SQL Server, but can read
the data provided in the XML format. Therefore, you need to convert the details of the products
from the Product table into an XML document. To perform this task, you need to create an XML
document with <Product> as the parent tag. The <Product> tag will contain ProductID as an
attribute, and <ProductName> and <Color> as child elements.

To perform this task, you can create the following query:


SELECT 1 AS Tag,
NULL AS Parent,
ProductID AS [Product!1!ProductID],
Name AS [Product!1!ProductName!element],
Color AS [Product!1!Color!elementxsinil]
FROM Production.Product
Where ProductID = 1 OR ProductID = 317
FOR XML EXPLICIT

The preceding query assigns 1 as the Tag value for the <Product> element and NULL as the Parent,
because <Product> is the top-level element.

The Product!1!ProductID column specifies that ProductID will be an attribute of the Product
element. If the type of the node is not specified, it is an attribute by default.

The Product!1!ProductName!element column specifies that the ProductName will be the child
element of Product, as the type is specified as element.

The Product!1!Color!elementxsinil column specifies that the Color element will be the child
element of Product, and it also generates the element for null values as the type is specified as
elementxsinil.

The execution of the preceding query generates the output in the following format:
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ProductID="1">
<ProductName>Adjustable Race</ProductName>
<Color xsi:nil="true" />
</Product>
<Product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ProductID="317">
<ProductName>LL Crankarm</ProductName>
<Color>Black</Color>
</Product>

Using XQuery

In addition to FOR XML, SQL Server 2005 allows you to extract data stored in variables or
columns with the XML data type by using XQuery. XQuery is a language that uses a set of
statements and functions provided by the XML data type to extract data. As compared to the FOR
XML clause of the SELECT statement, the XQuery statements allow you to extract specific parts
of the XML data.

Each XQuery statement consists of two parts, prolog and body. In the prolog section, you declare
the namespaces. In addition, schemas can be imported in the prolog. The body part specifies the
XML nodes to be retrieved. It contains query expressions that define the result of the query. The
XQuery language supports the following expressions:

for: Used to iterate through a set of nodes in an XML document.
let: Used to declare variables and assign values.
order by: Used to specify a sequence.
where: Used to specify criteria for the data to be extracted.
return: Used to specify the XML returned from a statement.
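
For example, the following sketch combines the for, where, and return expressions to extract
specific nodes from an XML variable. The @OrderData variable and the XML content used here
are hypothetical:
DECLARE @OrderData XML
SET @OrderData = '<Items><Item ID="1" Qty="12"/><Item ID="2" Qty="3"/></Items>'
SELECT @OrderData.query('for $i in /Items/Item
where $i/@Qty > 5
return $i')

The preceding query returns only the <Item> elements whose Qty attribute value is greater than 5.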

The preceding expressions are identified by the acronym FLWOR (pronounced "flower"). The
XQuery statements also use the following functions provided by the XML data type:
query(): Used to extract XML from an XML data type. The XML to be retrieved is specified by
the XQuery expression passed as a parameter. For example, the following DeliverySchedule
table in the Sales schema stores the delivery details of the sales orders.

Column Name Data Type


ScheduleID int IDENTITY PRIMARY KEY
ScheduleDate DateTime
DeliveryRoute int
DeliveryDriver nvarchar(20)
DeliveryList Xml

DeliverySchedule Table

A record is inserted in the DeliverySchedule table using the following statement:


INSERT INTO Sales.DeliverySchedule
VALUES
(GetDate(), 3, 'Bill',
'<?xml version="1.0" ?>
<DeliveryList xmlns="http://schemas.adventure-works.com/DeliverySchedule">
<Delivery SalesOrderID="43659">
<CustomerName>Steve Schmidt</CustomerName>
<Address>6126 North Sixth Street, Rockhampton</Address>
</Delivery>
<Delivery SalesOrderID="43660">
<CustomerName>Tony Lopez</CustomerName>
<Address>6445 Cashew Street, Rockhampton</Address>
</Delivery>
</DeliveryList>')

The following query retrieves the delivery driver and the name of customers from the
DeliverySchedule table:
SELECT DeliveryDriver, DeliveryList.query
('declare namespace ns="http://schemas.adventure-works.com/DeliverySchedule";
ns:DeliveryList/ns:Delivery/ns:CustomerName') as 'Customer Names'
FROM Sales.DeliverySchedule

In the preceding query, the query() function is used to retrieve the customer names. This function
takes the path of the CustomerName element as a parameter. Here, ns is the prefix of the
namespace used in the XML document.

value(): Used to return a single value from an XML document. To extract a single value, you
need to specify an XQuery expression that identifies a single node and the data type of the
value to be retrieved.

For example, the following query retrieves the address of the first delivery:
SELECT DeliveryList.value
('declare namespace ns="http://schemas.adventure-works.com/DeliverySchedule";
(ns:DeliveryList/ns:Delivery/ns:Address)[1]', 'nvarchar(100)') DeliveryAddress
FROM Sales.DeliverySchedule

In the preceding query, the value() function takes two parameters: the path of the Address element
and its data type. Here, (ns:DeliveryList/ns:Delivery/ns:Address)[1] represents the index of the
first address.

exist(): Used to check the existence of a node in XML data. The function returns 1 if the
specified node exists; otherwise, it returns 0. The following query finds the driver for a specific order:

SELECT DeliveryDriver
FROM Sales.DeliverySchedule
WHERE DeliveryList.exist
('declare namespace ns="http://schemas.adventure-works.com/DeliverySchedule";
/ns:DeliveryList/ns:Delivery[@SalesOrderID=43659]') = 1

In the preceding query, the driver is returned for the order whose sales order ID is 43659.

Just a minute:
Which catalog view can be used to view the information about the registered schema
collections in a database?

Answer:
sys.XML_schema_collections

Just a minute:
Which clause is used to extract data from a table in the XML format?

Answer:
FOR XML clause

Modifying XML Data


Similar to any other type of data, you might also need to modify the XML data. To modify data,
you can use the modify function provided by the XML data type of SQL Server. The modify
function accepts an XML DML statement, which combines an XQuery expression with a keyword
that specifies the kind of modification that needs to be done. This function allows you to perform
the following modifications:

Insert: Used to add nodes to XML in an XML column or variable. For example, in the
Adventureworks database, the customer details are stored in the following CustomDetails
table.
CustomDetails Table

In the preceding table, the customer details are stored in an XML format in the Cust_Details
column. The management of AdventureWorks wants to store the type of the customer in the
CustomDetails table. This can be done by adding an attribute, Type in the XML data of the
Cust_Details column. The default value of the Type attribute should be ‘Credit’. To perform the
required task, you can create the following statement:
UPDATE CustomDetails SET Cust_Details.modify('
insert attribute Type{"Credit"} as first
into (/Customer)[1]')

The execution of the preceding statement adds the Type attribute with a default value, Credit. The
'as first' clause specifies that the attribute will be inserted at the beginning of all the attributes. After
the addition of the attribute, the CustomDetails table will contain the data, as shown in the
following figure.

CustomDetails Table After Adding a New Attribute

Replace: Used to update the XML data. For example, James Stephen, one of the customers
of AdventureWorks, has decided to change his customer type from Credit to Cash. As a
database developer, you can create the following statement to reflect this change:

UPDATE CustomDetails SET Cust_Details.modify('
replace value of (Customer/@Type)[1] with "Cash"')
WHERE Cust_ID = 3

In the preceding statement, the value of the Type attribute is replaced with Cash for the
customer whose customer id is 3, as shown in the following figure.
CustomDetail Table After Changing the Attribute Value

Delete: Used to remove a node from the XML data. For example, the management of
AdventureWorks has decided to remove the details of the city from the customer details. You
can write the following statement to remove the City attribute:

UPDATE CustomDetails SET Cust_Details.modify(
'delete (/Customer/@City)[1]')

The preceding statement deletes the City attribute from the Cust_Details column in the
CustomDetails table, as shown in the following figure.

CustomDetails Table After Deleting the Attribute

Activity: Working with XML Data

Problem Statement

The Human Resource department of AdventureWorks, Inc. has decided to maintain the previous
employment details of all the new employees. For this, you need to create the following table in
the HumanResources schema of the database.
Column Name Data Type
EmployeeID int
EmploymentHistory XML

EmployeeHistoryDetails

The EmployeeHistoryDetails table will store the following set of details data in the XML format:

PreviousEmploymentOrg
PreviousEmploymentAddress
PreviousEmploymentDesig
PreviousEmploymentDuration

While storing the data, you need to ensure that the data is as per the required structure.

In addition, you need to insert the following details in the EmployeeHistoryDetails table.
EmployeeHistoryDetails Table

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a schema.
2. Create the EmployeeHistoryDetails table by using the schema.
3. Insert records in the EmployeeHistoryDetails table.
4. Verify the insertion of records.

Task 1: Creating a Schema

To store the details of an employee in the XML format, you need to create an XML schema
collection. The schema collection will be based on the following conditions:

EmployeeDetails should be the parent tag.


PreviousEmploymentOrg should be the child tag. This tag should occur at least once and a
maximum of five times.
PreviousEmploymentAddress, PreviousEmploymentDesig, and
PreviousEmploymentDuration should be the attributes of the PreviousEmploymentOrg tag.

To create the schema collection, you need to perform the following steps:
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window to create an XML Schema Collection:
CREATE XML SCHEMA COLLECTION EmployeeSchemaCollection as
'<?xml version="1.0"?>
<xsd:schema
targetNamespace="http://schemas.adventure-works.com/Employees"
xmlns="http://schemas.adventure-works.com/Employees"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<xsd:element name="EmployeeDetails">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="PreviousEmploymentOrg" minOccurs="1" maxOccurs="5">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="PreviousEmploymentAddress"
type="xsd:string" />
<xsd:attribute name="PreviousEmploymentDesig" type="xsd:string" />
<xsd:attribute name="PreviousEmploymentDuration" type="xsd:string" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'

2. Press the F5 key to execute the statement.

Task 2: Creating the EmployeeHistoryDetails Table by Using the Schema

The EmployeeHistoryDetails table contains the EmployeeID and EmploymentHistory columns. The
EmployeeID column is of the int data type and the EmploymentHistory column should contain
XML data. The EmploymentHistory column should be associated with the
EmployeeSchemaCollection XML schema collection.
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window, to create the EmployeeHistoryDetails table:
CREATE TABLE EmployeeHistoryDetails
(
EmployeeID int,
EmploymentHistory XML(EmployeeSchemaCollection)
)

2. Press the F5 key to execute the statement.

Task 3: Inserting Records in the EmployeeHistoryDetails Table

After the EmployeeHistoryDetails table has been created, you need to insert the data in the table.
For this, you need to perform the following steps:
1. Type the following statement in the Query Editor window of the SQL Server Management
Studio window, to insert the records in the table by using the following statements:
INSERT INTO EmployeeHistoryDetails VALUES (1001,
'<?xml version="1.0"?>
<EmployeeDetails xmlns="http://schemas.adventure-works.com/Employees">
<PreviousEmploymentOrg
PreviousEmploymentAddress = "New Jersey"
PreviousEmploymentDesig = "Software Developer"
PreviousEmploymentDuration = "3 Years"> HP </PreviousEmploymentOrg>
</EmployeeDetails>')

INSERT INTO EmployeeHistoryDetails VALUES (1002,


'<?xml version="1.0"?>
<EmployeeDetails xmlns="http://schemas.adventure-works.com/Employees">
<PreviousEmploymentOrg
PreviousEmploymentAddress = "New York"
PreviousEmploymentDesig = "Project Manager"
PreviousEmploymentDuration = "2 Years"> IBM </PreviousEmploymentOrg>
</EmployeeDetails>')
2. Press the F5 key to execute the statements.

Task 4: Verifying the Insertion of Records

To verify the insertion of data, retrieve data from the EmployeeHistoryDetails table. For this, you
need to perform the following steps:
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SELECT * FROM EmployeeHistoryDetails

2. Press the F5 key to execute the statement.

The result set will display the two records that were added in the previous task.
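If you also want to confirm the XML content itself, you can extract individual values from the EmploymentHistory column with the value() function. The following query is a sketch based on the schema created in Task 1; the column aliases are introduced here for illustration:

```sql
-- Hypothetical verification query: extracts the previous organization and
-- designation of each employee from the typed XML column.
SELECT EmployeeID,
       EmploymentHistory.value
       ('declare namespace ns="http://schemas.adventure-works.com/Employees";
        (ns:EmployeeDetails/ns:PreviousEmploymentOrg)[1]', 'nvarchar(100)')
       AS PreviousOrg,
       EmploymentHistory.value
       ('declare namespace ns="http://schemas.adventure-works.com/Employees";
        (ns:EmployeeDetails/ns:PreviousEmploymentOrg/@PreviousEmploymentDesig)[1]',
        'nvarchar(100)') AS PreviousDesignation
FROM EmployeeHistoryDetails
```

Note that the attribute is accessed without a namespace prefix because the schema declares attributeFormDefault="unqualified".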

Summary
In this chapter, you learned that:

The INSERT statement is used to insert data into the table.


While inserting data into a table, the data type of the information must match the data types
of the columns of the table.
It is not essential to insert data into a column that allows NULL or has a default constraint
assigned to it.
You can copy contents from one table into another table by using the SELECT INTO statement.
SQL Server provides a row update statement called UPDATE to modify values within tables.
You can delete a row from a table by using the DELETE statement.
You use the TRUNCATE TABLE statement to remove all the rows from a table.
SQL Server 2005 uses XML as a data type to save the XML data in its original state.
SQL Server allows you to shred the XML data by using the OPENXML function.
Untyped XML data is a well-formed data, but is not associated with a schema.
Typed XML data is a well-formed data that is associated with a schema.
You can create an XML schema collection object by using the CREATE XML SCHEMA
COLLECTION statement.
You can use the FOR XML clause of the SELECT statement to retrieve the XML data in different
ways by using the RAW, AUTO, PATH, and EXPLICIT modes.
You can use the XQuery functions to extract the XML data stored in a column with the XML
data type.
The XQuery statement uses the Query, Value, and Exist functions to query the XML data in
the table.
You can modify the XML data by using the modify function provided by the XML data type.
Using the modify function, you can insert, update, or remove a node from the XML data.

Exercises

Exercise 1

Insert the following data in the ProductBrand table of the AdventureWorks database.
BrandID BrandName
B01 Lee
B02 Nike
B03 Reebok

Exercise 2
AdventureWorks, Inc. has set up a new store. Insert the following data into the database:

Store Name – Car Store

Sales Person ID – 283


Demographics - <StoreSurvey
xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">
<AnnualSales>350000</AnnualSales>
<AnnualRevenue>35000</AnnualRevenue>
<BankName>International Bank</BankName>
<BusinessType>BM</BusinessType>
<YearOpened>1980</YearOpened>
<Specialty>Road</Specialty>
<SquareFeet>7500</SquareFeet>
<Brands>AW</Brands>
<Internet>T1</Internet>
<NumberEmployees>7</NumberEmployees>
</StoreSurvey>

Tip
A store in AdventureWorks is treated like a customer. Therefore, you need to first create a record in
the customer table by storing the territory id and specifying the customer type as 'S'. Then, you need
to add the store details in the Store table.

Exercise 3

The address of a vendor, Comfort Road Bicycles, has changed. You need to update the following
data in the AdventureWorks database.
Address 4151 Olivera
City Atlanta
StateProvinceID 17
PostalCode 30308
Exercise 4
Delete all the records from the ProductBrand table. Ensure that you do not delete the table.

Exercise 5

The users of AdventureWorks need to publish the details of all the customers and their address on
the organizations website. To perform this task, you need to retrieve the data in the XML format.

Exercise 6

The management of AdventureWorks requires a list containing the skills of all the candidates who
have applied for a vacancy. The details of all the candidates are stored in the XML format in the
HumanResources.JobCandidate table.

Exercise 7

The production of a bicycle at AdventureWorks involves a number of phases. In each phase, the
bicycle is moved to a different work center. The details of all the work centers are stored in the
Production.ProductModel table. Bicycles of different types pass through different work centers,
depending on the components that need to be fitted. The management wants a list of all the types
of bicycles that go through work center 10. How will you generate this list?

Exercise 8

There is a change in the production process of the bicycle with the product model id 7. Due to this
change, the bicycle will not be going to work center 10. You need to update this change in the
database. How will you perform this task?

Chapter 6
Implementing Indexes, Views, and Full-Text Search
A database developer is often required to improve the performance of queries. SQL Server 2005
allows you to reduce the execution time of queries by implementing indexes. In addition, you can
restrict the view of data to different users by implementing views.

SQL Server also provides an in-built full-text search capability to allow fast searching of data.

This chapter discusses how to create and manage indexes and views. In addition, it discusses about
implementing full-text search capability.

Objectives
In this chapter, you will learn to:
Create and manage indexes
Create and manage views
Implement a full-text search

Creating and Managing Indexes


When a user queries data from a table based on conditions, the server scans all the data stored in
the database table. With an increasing volume of data, the execution time for queries also
increases. As a database developer, you need to ensure that the users are able to access data in the
least possible time.

SQL Server 2005 allows you to create several indexes on a table. In SQL Server 2005, it is the
job of the query optimizer to determine which indexes will be the most useful in processing a
specific query. In addition, SQL Server 2005 allows you to create XML indexes for columns that
store XML data. Although indexes may speed up queries on large tables, indexes slow down data
modification operations (insert, update, and delete). This is because every modification of indexed
data must also be applied to the associated indexes.

At times, the table that you need to search contains a huge amount of data. In such cases, it is
advisable to create partitioned indexes. A partitioned index is more manageable and scalable
because each index partition stores the data of a particular table partition only.

As a database developer, you need to create and manage indexes. Before creating an index, it is
important to identify different types of indexes.</

Identifying the Types of Indexes


Before identifying the types of indexes, it is important to understand the need to implement an
index.

The data in the database tables is stored in the form of data pages. Each data page is 8 KB in size.
Therefore, entire data of the table is stored in multiple data pages. When a user queries a data
value from the table, the query processor searches for the data value in all the data pages. When
it finds the value, it returns the result set. As the data in the table increases, this process of querying
data takes time.

To reduce the data query time, SQL Server allows you to implement indexes on tables. An index
is a data structure associated with a table that enables fast searching of data. Indexes in SQL Server
are like the indexes at the back of a book that you can use to locate text in the book.

Indexes provide the following benefits:

Accelerate queries that join tables, and perform sorting and grouping
Enforce uniqueness of rows (if configured to do so)

An index contains a collection of keys and pointers. Keys are the values built from one or more
columns in the table. The column on which the key is built is the one on which the data is
frequently searched. Pointers store the address of the location where a data page is stored in the
memory, as depicted in the following figure.
Structure of an Index

When the users query data with conditions based on the key columns, the query processor scans
the indexes, retrieves the address of the data page where the required data is stored, and accesses
the information. The query processor does not need to search for data in all the data pages.
Therefore, the query execution time is reduced. When you modify the data of an indexed column,
the associated indexes are updated automatically.

SQL Server allows you to create the following types of indexes:

Clustered index
Nonclustered index

Clustered Index

A clustered index is an index that sorts and stores the data rows in the table based on their key
values. Therefore, the data is physically sorted in the table when a clustered index is defined on
it. Only one clustered index can be created per table. Therefore, you should build the clustered
index on attributes that have a high percentage of unique values and are not modified often.

Consider the following figure where the rows of the Employee table are sorted according to the
Eid attribute.
Working of a Clustered Index

The preceding figure displays a clustered index on the Employee table. To search for any record,
SQL Server would start at the root page. For example, if the row containing Eid E006 was to be
searched by using a clustered index (refer to the preceding figure), SQL Server performs the
following steps:

1. SQL Server starts from page 603, the root page.


2. SQL Server searches for the highest key value on the page, which is less than or equal to
the search value. The result of this search is the page containing the pointer to Eid, E005.
3. The search continues from page 602. There, Eid E005 is found and the search continues to
page 203.
4. Page 203 is searched to find the required row.

A clustered index determines the order in which the rows are actually stored. Therefore, you can
define only one clustered index on a table.

Nonclustered Index

Similar to the clustered index, a nonclustered index also contains the index key values and the
row locators that point to the storage location of the data in a table. However, in a nonclustered
index, the physical order of the rows is not the same as the index order.

Nonclustered indexes are typically created on columns used in joins and the WHERE clause.
These indexes can also be created on columns where the values are modified frequently. SQL
Server creates nonclustered indexes by default when the CREATE INDEX statement is given.
There can be as many as 249 nonclustered indexes per table.

The data rows of the underlying table are present in a random order, but the logical ordering is
specified by the index. The data rows may be randomly spread throughout the table.
The following figure represents the working of a nonclustered index.

Working of a NonClustered Index

The preceding figure displays a nonclustered index present on the Eid attribute of the Employee
table. To search for any value, SQL Server would start from the root page and move down until
it reaches a leaf page that contains a pointer to the required record. It would then use this pointer
to access the record in the table. For example, to search for the record containing Eid, E006 by
using the nonclustered index, SQL Server performs the following steps:

1. SQL Server starts from page 603, which is the root page.
2. It searches for the highest key value less than or equal to the search value, that is, to the
page containing the pointer to Eid, E005.
3. The search continues from page 602.
4. Eid, E005 is found and the search continues to page 203.
5. Page 203 is searched to find a pointer to the actual row. Page 203 is the last page, or the
leaf page of the index.
6. The search then moves to page 302 of the table to find the actual row.

In an index, more than one row can contain duplicate key values. However, if you configure an index
to contain unique values, every key value in the index must be unique. Such an index is called a unique
index. You can create a unique index on columns that contain unique values, such as the primary
key columns. A unique index can be clustered or nonclustered, depending on the nature of the
column on which it is built.

Creating Indexes
You should create indexes on the most frequently queried columns in a table. However, at times,
you might need to create an index based on a combination of columns. An index based on more
than one column is called a composite index. A composite index can be based on a maximum of
16 columns. However, indexes with fewer columns use less disk space and involve fewer
resources when compared to indexes based on more columns.

To create an index, you can use the CREATE INDEX statement. The syntax of the CREATE
INDEX statement is:
CREATE [UNIQUE][CLUSTERED | NONCLUSTERED] INDEX index_name
ON [{database_name.[schema_name]. | schema_name.}]
{table_or_view_name}(column [ASC | DESC][,…n])
[WITH(<relational_index_option>[,…n])]
[ON {partition_scheme_name(column_name[,…n])
| filegroup_name | DEFAULT}]
<relation_index_option>::=
{PAD_INDEX = {ON | OFF}
| FILLFACTOR = fillfactor
| ONLINE = {ON | OFF}

where,
UNIQUE creates an index where each row should contain a different index value.
CLUSTERED specifies a clustered index where data is sorted on the index attribute.
NONCLUSTERED specifies a nonclustered index that organizes data logically. The data is not
sorted physically.
index_name specifies the name of the index.
table_name specifies the name of the table that contains the attributes on which the index is to
be created.
column specifies the column or columns on which the index will be created.
ON partition_scheme_name ( column_name ) specifies the partition scheme that defines the
filegroups onto which the partitions of the partitioned index will be mapped.
ON filegroup_name specifies the filegroup on which the index is created.
ON DEFAULT specifies that the specified index will be created on the default filegroup.
PAD_INDEX = { ON | OFF } specifies the index padding, which is OFF, by default.
FILLFACTOR = 1 to 100 specifies a percentage that indicates how full the leaf level of each
index page should become during index creation or rebuild. The default value is 0.
ONLINE = { ON | OFF } checks whether the underlying tables and associated indexes are
available to query and modify the data during the index operation.

You can create online indexes only in SQL Server 2005 Enterprise Edition.

Consider an example of an organization that maintains employee details in the Employee table.
You can create a clustered index on the EmployeeID attribute of the Employee table by using the
following statement:
CREATE CLUSTERED INDEX IX_EmployeeID
ON Employee (EmployeeID)
WITH FILLFACTOR = 10

In the preceding statement, the FILLFACTOR value of 10 has been specified to reserve a
percentage of free space on each data page of the index to accommodate future expansion.

The following statement creates a nonclustered index on the ManagerID attribute of the Employee
table:
CREATE NONCLUSTERED INDEX IDX_Employee_ManagerID
ON Employee (ManagerID)

When a PRIMARY KEY or UNIQUE constraint is created on a table, an index is created
automatically with the same name as the constraint.
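To verify which indexes exist on a table, including those created automatically for constraints, you can query the sys.indexes catalog view. The following query is a sketch using the Employee table from the preceding examples:

```sql
-- Lists the name, type, and uniqueness of every index defined on the
-- Employee table (type_desc shows CLUSTERED, NONCLUSTERED, or HEAP).
SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('Employee')
```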

Guidelines for Creating Indexes

You need to consider the following guidelines while creating indexes on a table:

Create clustered indexes on columns that have unique or non-null values.
Do not create an index that is not used frequently. You require time and resources to
maintain indexes.
Create a clustered index before creating a nonclustered index. A clustered index changes the
order of rows. A nonclustered index would need to be rebuilt if it is built before a clustered
index.
Create nonclustered indexes on all columns that are frequently used in predicates and join
conditions in queries.

Creating XML Indexes


When a query is based on an XML column, the query processor needs to parse the XML data each
time the query is executed. In SQL Server 2005, an XML data value can be of a maximum of two
gigabytes (GB). Therefore, the XML values can be very large and the server might take time to
generate the result set. To speed up the execution of the query based on the XML data type, SQL
Server 2005 allows you to create an index that is based on columns storing XML data values.
Such indexes are called XML indexes.

You need to consider the following guidelines while creating an XML index:

XML indexes can be created only on XML columns.


XML indexes only support indexing a single XML column.
XML indexes cannot be created on an XML column in a view, on a table-valued variable with
XML columns, or on XML type variables.

XML indexes created on a table do not allow you to modify the primary key. To do so, you first
need to drop all the XML indexes on the table.
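Dropping an XML index uses the regular DROP INDEX syntax. The statement below is a sketch; the index and table names are illustrative:

```sql
-- Hypothetical example: removes an XML index so that the primary key of
-- the underlying table can then be modified.
DROP INDEX PXML_ProductModel_CatalogDescription
ON Production.ProductModel
```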

XML indexes are of the following types:


Primary XML index
Secondary XML index

Primary XML Index

This is a clustered B-Tree representation of the nodes in the XML data. When an index is created
on a column with the XML data type, an entry will be created for all the nodes in the XML data.
Therefore, the index creates several rows of data for each XML value in the column. Each row
stores the following information:

The name of each tag, such as an element or attribute name, in the XML
The value of each node
The type of the node, such as an element node, attribute node, or text node
Path from each node to the root of the XML tree. This column is searched for path expressions
in the query
The primary key for that table

The preceding information is used to evaluate and construct XML results for a specified query.

The table on which you want to create an index on an XML data type column must have a
clustered index on the primary key. You can create XML indexes on XML columns by using the
CREATE PRIMARY XML INDEX and CREATE XML INDEX T-SQL statements. For example, the
ProductModel table contains the CatalogDescription column that stores XML values. You can
create a primary XML index on this column by using the following statement:
CREATE PRIMARY XML INDEX PXML_ProductModel_CatalogDescription ON
Production.ProductModel (CatalogDescription)

The preceding statement will create an index for all the nodes in the XML data stored in the
CatalogDescription column. The name of the primary XML index created on the ProductModel
table is PXML_ProductModel_CatalogDescription.

The first index on an XML type column must be the primary XML index.

Secondary XML Index

This is a nonclustered index of the primary XML index. A primary XML index must exist before
any secondary index can be created on a column with the XML data type. After you have created
the primary XML index, an additional three kinds of secondary XML indexes can be defined on
each XML column. The secondary XML indexes can be created only on columns that already
have a primary XML index. These indexes assist in the processing of XQuery expressions.

The secondary XML indexes are of the following types:

Path indexes
Value indexes
Property indexes
Path Indexes

The path secondary XML index is used to speed up the execution of queries that use XML path
expressions. It is built on the path id and value columns of the primary XML indexes. This index
improves the performance of queries that use paths and values to select data.

For example, if you execute a query that checks for the existence of a product model Id using an
XQuery expression as /PD:ProductDescription/@ProductModelID[.="19"], you can create a path
secondary index on the CatalogDescription column of the ProductModel table. In this path index,
you can use the primary index created previously.

The following statement creates a path index on the CatalogDescription column:


CREATE XML INDEX PIdx_ProductModel_CatalogDescription_PATH ON
Production.ProductModel (CatalogDescription)
USING XML INDEX PXML_ProductModel_CatalogDescription
FOR PATH

The preceding statement creates a path index, PIdx_ProductModel_CatalogDescription_PATH


on the CatalogDescription column of the table.

Value Indexes

The value index contains the same items as the path index, but in the reverse order. It contains
the value of the node first and then the path id. This index improves the performance of queries
that search for values anywhere in the XML document.

For example, if you execute a query that checks the existence of a node in an XQuery expression
such as //Item[@ProductID="1"], you can create a value secondary index by using the primary
index created previously.

The following statement creates a value index on the CatalogDescription column:


CREATE XML INDEX PIdx_ProductModel_CatalogDescription_VALUE ON
Production.ProductModel (CatalogDescription)
USING XML INDEX PXML_ProductModel_CatalogDescription
FOR VALUE

The preceding statement creates a value index,


PIdx_ProductModel_CatalogDescription_VALUE on the CatalogDescription column of the
table.

Property Indexes

The property index contains the primary key of the base table, path id, and value columns of the
primary XML index. This index improves the performance of queries that retrieve particular
object properties from within an XML document by using the value() method.

For example, if you execute a query that returns a value of the node in an XQuery expression,
such as (/ItemList/Item/@ProductID)[1], you can create a property secondary index on the
CatalogDescription column of the ProductModel table by using the following statement:
CREATE XML INDEX PIdx_ProductModel_CatalogDescription_PROPERTY ON
Production.ProductModel (CatalogDescription)
USING XML INDEX PXML_ProductModel_CatalogDescription
FOR PROPERTY

The preceding statement creates a property index,


PIdx_ProductModel_CatalogDescription_PROPERTY, on the CatalogDescription column of the
table.

Just a minute:
Which type of index implements physical sorting of data?

Answer:
Clustered Index

Just a minute:
Which type of an XML index is created first on the table?

Answer:
Primary XML index</

Creating Partitioned Indexes


In SQL Server 2005, indexes can also be partitioned based on the value ranges. Similar to the
partitioned tables, the partitioned indexes also improve query performance. Partitioning enables
you to manage and access subsets of data quickly and efficiently. When indexes become very
large, you can partition the data into smaller, more manageable sections.

Partitioning an index will distribute the table data into multiple filegroups, thereby partitioning
the table. This will enable the database engine to read or write data quickly. This also helps in
maintaining the data efficiently.

For example, the SalesOrderHeader table contains the details about the orders received by
AdventureWorks, Inc. As the data in this table is large, the query takes a long time to execute. To
solve this problem, you can create a partitioned index on the table.

Partitioning is allowed only in the Enterprise Edition of SQL Server 2005.

To create a partitioned index, you need to perform the following tasks:

1. Create a partition function.


2. Create a partition scheme.
3. Create a clustered index.

Creating a Partition Function

Similar to creating a partitioned table, you need to create a partition function to create a partitioned
index. The partition function will determine the boundary values for creating partitions.

For example, the queries on the SalesOrderHeader table are mostly based on the OrderDate
column. The sales manager of AdventureWorks requires the details of the orders received, on a
yearly basis. The table contains the details of orders for the last five years beginning from 2001.
Based on this information, you can create a partition function, as shown in the following
statement:
CREATE PARTITION FUNCTION PFOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01')

The preceding statement creates a partition function, PFOrderDate, by using the datetime data
type. It specifies four boundary values. Therefore, there will be five partitions. As RANGE RIGHT
is specified for partitioning, the first partition will contain data less than the first boundary value,
2002-01-01. The second partition will contain data greater than or equal to 2002-01-01 but less
than 2003-01-01. Other partitions will store data similarly.
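To check which partition a given date would map to, you can use the $PARTITION function together with the partition function name. The following query is a sketch based on the PFOrderDate function defined above; the sample date is arbitrary:

```sql
-- Returns the partition number that an order dated 2003-06-15 would be
-- placed in; with RANGE RIGHT boundaries of 2002-01-01, 2003-01-01,
-- 2004-01-01, and 2005-01-01, this date falls in partition 3.
SELECT $PARTITION.PFOrderDate('2003-06-15') AS PartitionNumber
```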

Creating a Partition Scheme


After creating the partition function, you need to create a partition scheme to associate it with the
partition function. Based on the boundary values defined in the partition function, there will be
five partitions. The data of each partition is stored in a filegroup. You should have the same
number of filegroups as partitions. If there are five partitions, you need to create five filegroups:
fg1, fg2, fg3, fg4, and fg5.

The following statement creates the PSOrderDate partition scheme, associating it with the
PFOrderDate partition function:
CREATE PARTITION SCHEME PSOrderDate
AS PARTITION PFOrderDate
TO (fg1, fg2, fg3, fg4, fg5)

The partition scheme, PSOrderDate, created in the preceding statement directs each partition to a
separate filegroup.

Creating a Clustered Index

After creating the partition scheme, you need to associate it with a clustered index. The clustered
index is created on the attribute having unique and non-null values. Therefore, you can create the
index on the SalesOrderID column of the SalesOrderHeader table. To create the partitioned index,
you need to associate the clustered index with the partition scheme, as shown in the following
statement:
CREATE CLUSTERED INDEX ix_SalesOrderID
ON Sales.MySalesOrderHeader (SalesOrderID)
ON PSOrderDate (OrderDate)
The preceding statement will distribute the table data into five filegroups based on the yearly data
of orders stored in the OrderDate column.
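As a quick check, the $PARTITION function can be used to see which partition a given value maps to. The following sketch assumes the PFOrderDate partition function created above exists in the current database:

```sql
-- Returns the 1-based partition number that a given date maps to.
-- With RANGE RIGHT and the boundary values above, an order dated
-- 2003-06-15 falls in partition 3 (>= 2003-01-01 and < 2004-01-01).
SELECT $PARTITION.PFOrderDate('2003-06-15') AS PartitionNumber
```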

Just a minute:
Which of the following options is used to specify the percentage of space to be used for each
index page?

1. Fill Factor
2. Pad Index
3. Path Index
4. Value Index

Answer:
1. Fill Factor

Managing Indexes
In addition to creating indexes, you also need to maintain them to ensure their continued optimal
performance. The common index maintenance tasks include disabling, enabling, renaming, and
dropping an index. As a database developer, you need to regularly monitor the performance of the
index and optimize it.

Disabling Indexes

When an index is disabled, the user is not able to access the index. If a clustered index is disabled,
the table data is not accessible to the user. The data still remains in the table, but is
unavailable for Data Manipulation Language (DML) operations until the index is dropped or
rebuilt.

To rebuild and enable a disabled index, use the ALTER INDEX REBUILD statement or the
CREATE INDEX WITH DROP_EXISTING statement.

The following statement disables a clustered index, IX_EmployeeID, on the Employee table.
ALTER INDEX IX_EmployeeID
ON Employee DISABLE
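To confirm the state of an index, you can query the is_disabled column of the sys.indexes catalog view. A sketch, assuming an Employee table exists in the current database:

```sql
-- Lists each index on the Employee table along with its disabled state
-- (is_disabled = 1 means the index is currently disabled).
SELECT name, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID('Employee')
```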

Enabling Indexes

After an index is disabled, it remains in the disabled state until it is rebuilt or dropped. You can
enable a disabled index by rebuilding it through one of the following methods:

Using the ALTER INDEX statement with the REBUILD clause


Using the CREATE INDEX statement with the DROP_EXISTING clause
Using DBCC DBREINDEX
By using one of the preceding statements, the index is rebuilt and the index status is set to enabled.
Note that you cannot rebuild a disabled clustered index when the ONLINE option is set to ON.

For example, you can enable the clustered index of the Employee table by using the following
statement:
ALTER INDEX IX_EmployeeID
ON Employee REBUILD

The preceding statement will rebuild the clustered index on the Employee table. This allows you
to view the data in the Employee table.

Renaming Indexes

You can rename the current index with the help of the sp_rename system stored procedure.

The following statement renames the IX_JobCandidate_EmployeeID index on the JobCandidate


table to IX_EmployeeID.
EXEC sp_rename 'HumanResources.JobCandidate.IX_JobCandidate_EmployeeID',
'IX_EmployeeID', 'index'

Dropping Indexes
When you no longer need an index, you can remove it from a database. You cannot drop an index
used by either a PRIMARY KEY or UNIQUE constraint, except by dropping the constraint.
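For example, if an index enforces a PRIMARY KEY constraint, you remove it by dropping the constraint itself. The constraint name below is hypothetical:

```sql
-- Dropping the constraint also removes the index that enforces it.
ALTER TABLE Employee
DROP CONSTRAINT PK_Employee_EmployeeID
```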

The following statement drops the IX_EmployeeID index on the Employee table:
DROP INDEX IX_EmployeeID
ON Employee

Activity: Creating Indexes

Problem Statement
The production manager of AdventureWorks, Inc. needs to frequently view data from the
Product table in the Production schema. He needs to frequently search for data based on the
product number.

The Product table contains a large volume of data. Therefore, the query takes time to execute. To
reduce the time taken in the execution of the query, you need to suggest a solution to improve
performance. For this, you need to check the performance of the query before and after applying
the suggested solution.

Tip
You can check the performance of a query by using the execution plan of that query.
Before performing this activity, it is essential to ensure that the Production.Product table does not
contain any index on the column based on which the queries are executed frequently.
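In addition to the graphical execution plan, the SET STATISTICS IO option gives a text-based measure of the I/O performed by a query. A sketch of this alternative check:

```sql
-- Reports the number of logical and physical reads for the query in the
-- Messages tab; compare the counts before and after creating the index.
SET STATISTICS IO ON
SELECT * FROM Production.Product
WHERE ProductNumber = 'RA-2345'
SET STATISTICS IO OFF
```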

Solution

To solve the preceding problem, you can apply an index on the column on which the data is
frequently searched. To apply an index, you need to perform the following tasks:

1. Identify the column to be indexed.


2. Enable the display of the query execution plan.
3. Check the I/O cost of the query.
4. Create an index to improve performance.
5. Verify the improvement in query execution.

Task 1: Identifying the Column to be Indexed

The queries are based on the ProductNumber column of the Product table. Therefore, the index
must be created on the ProductNumber column.

Task 2: Enabling the Display of the Query Execution Plan

To verify the difference in the query performance, you need to view the execution plan before and
after creating the index. To enable SQL Server to display the execution plan, select Query →
Include Actual Execution Plan.

To view the execution plan for a query, you need to have the SHOWPLAN permission on the
database.

Task 3: Checking the I/O Cost of the Query

Before suggesting a solution to improve performance, you need to know the current I/O cost. To
know the current I/O cost, perform the following steps:
1. Type the following query in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SELECT * FROM Production.Product
WHERE ProductNumber = 'RA-2345'
2. Press the F5 key to execute the preceding query.

The output of the preceding query appears, as shown in the following figure.
Output of the Query

Note that the Output pane contains the Execution plan tab that includes the execution plan
in the result set.
3. Click the Execution plan tab in the Output pane to view the execution plan, as shown in
the following figure.

Execution Plan of the Query

4. Move the mouse pointer over the Clustered Index Scan object.

The Clustered Index Scan page appears. Note the Estimated I/O Cost, as shown in the
following figure.

I/O Cost of the Query


Task 4: Creating an Index to Improve Performance

The queries are frequently based on the ProductNumber column of the Product table in the
Production schema. All the values in the ProductNumber column are unique and do not contain
Null values. In addition, the table already contains a clustered index on the ProductID column.
Therefore, you can create a unique nonclustered index on this column to improve performance.

To create the index, you need to perform the following steps:


1. Type the following statement to create an index on the ProductNumber column of the
Production.Product table:
CREATE UNIQUE NONCLUSTERED INDEX Product_ProductNumber
ON Production.Product (ProductNumber)

2. Press the F5 key to execute the statement.

Task 5: Verifying the Improvement in Query Execution

To verify the improvement in the query execution, you need to perform the following steps:
1. Execute the following query to check the Estimated I/O Cost:
SELECT * FROM Production.Product
WHERE ProductNumber = 'RA-2345'

2. Compare the Estimated I/O Cost with the previous value. Note that the Estimated I/O Cost
decreases after creating the index. This proves that the speed of the query execution has
improved after creating the index.

Creating and Managing Views


At times, the database administrator might want to restrict access to data for different users. For
example, some users may be allowed to access all the columns of a table, whereas others may be
allowed to access only selected columns. SQL Server allows you to create views to restrict user
access to the data. Views also help in simplifying query execution when the query involves retrieving
data from multiple tables by applying joins.

A view is a virtual table, which provides access to a subset of columns from one or more tables.
It is a query stored as an object in the database, which does not have its own data. A view can
derive its data from one or more tables called the base tables or underlying tables.

Views provide the following advantages:

Providing relevant data for users: A view is generally used to focus, simplify, and customize
each user’s perception of the data. It can be used as a security mechanism by allowing users
to access and manipulate data through the view. Data that is unnecessary, confidential, or
inappropriate can be excluded from a view definition.
Hiding data complexity: Views hide the complexity of the database design from the user. This
enables developers to change the database design without affecting the user interaction with
the database. In addition, users can view data by using names that are easier to understand
than the cryptic names often used in databases.
Retrieving specific rows and columns of a table: Views can be based on a complex query that
joins two or more tables, which appears to the user as a single table. A view that combines the
rows of similar tables by using the UNION operator is called a partitioned view. For example, if
one table contains the salary details for the employees in the USA, and another table contains
salary details for the employees in the UK, a view could be created from the UNION of these
tables. The view would represent the salary details of employees in both the countries.
Reducing the object size: Views do not contain data. SQL Server stores only the definition of
the view in the database.

Depending on the volume of the data, you can create a view with or without an index. As a
database developer, it is important for you to learn how to create and manage views.

Creating Views
In SQL, a view is a way to get a restricted subset of data. It is a database object that is used to
view data from the tables in the database. A view has a structure similar to a table. It does not
contain any data, but derives its data from the underlying tables.

You can use views to encapsulate complex queries. After a view on a set of data has been created,
you can treat that view as another table. When the data in a table changes, the view also displays the
updated data. Views do not take up physical space in the database as tables do.

Views ensure security of data by restricting access to:

Specific rows of a table.


Specific columns of a table.
Specific rows and columns of a table.
Rows fetched by using joins.
Statistical summary of data in a given table.
Subsets of another view or a subset of views and tables.

Apart from restricting access, views can also be used to create and save queries based on multiple
tables. To view data from multiple tables, you can create a query that includes various joins. If
you need to frequently execute this query, you can create a view that executes this query. You can
access data from this view every time you need to execute the query.

You can create a view by using the CREATE VIEW statement. The syntax of the CREATE VIEW
statement is:
CREATE VIEW view_name
[(column_name [, column_name]…)]
[WITH ENCRYPTION [, SCHEMABINDING]]
AS select_statement [WITH CHECK OPTION]

where,
view_name specifies the name of the view.
column_name specifies the name of the column(s) to be used in a view.
WITH ENCRYPTION specifies that the text of the view will be encrypted in the syscomments
view.
SCHEMABINDING binds the view to the schema of the underlying table or tables.
AS specifies the actions to be performed by the view.
select_statement specifies the SELECT statement that defines a view. The view may use the
data contained in other views and tables.
WITH CHECK OPTION forces the data modification statements to meet the criteria given in the
SELECT statement defining the view. The data is visible through the view after the
modifications have been made permanent.

Syscomments view is a system-defined view stored in the database. It contains entries for each view,
rule, default, CHECK constraint, DEFAULT constraint, and stored procedure within the database.
This view contains a text column that stores the original SQL definition statements.
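The effect of WITH CHECK OPTION can be illustrated with a hypothetical view that exposes only the married employees:

```sql
-- The view exposes only rows with MaritalStatus = 'M'. Because of
-- WITH CHECK OPTION, an UPDATE through the view that sets
-- MaritalStatus to 'S' fails, as the modified row would no longer
-- satisfy the view's WHERE clause.
CREATE VIEW HumanResources.vwMarriedEmployees
AS
SELECT EmployeeID, MaritalStatus
FROM HumanResources.Employee
WHERE MaritalStatus = 'M'
WITH CHECK OPTION
```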

Guidelines for Creating Views

While creating views, you should consider the following guidelines:

The name of a view must follow the rules for identifiers and must not be the same as that of
the table on which it is based.
A view can be created only if there is a SELECT permission on its base table.
A view cannot derive its data from temporary tables.
In a view, ORDER BY cannot be used in the SELECT statement.

For example, to provide access only to the employee ID, marital status, and department ID for all
the employees, you can use the following statement to create the view:
CREATE VIEW HumanResources.vwEmployeeDepData
AS
SELECT e.EmployeeID, MaritalStatus, DepartmentID
FROM HumanResources.Employee e JOIN HumanResources.EmployeeDepartmentHistory d
ON e.EmployeeID = d.EmployeeID

The preceding statement creates the vwEmployeeDepData view. This view contains the selected
columns from the Employee and EmployeeDepartmentHistory tables, as shown in the following
figure.
Selected Data Displayed by View

Restrictions at the Time of Modifying Data Through Views

Views do not maintain a separate copy of the data, but only display the data present in the base
tables. Therefore, you can modify the base tables by modifying the data in the view. However, the
following restrictions exist while inserting, updating, or deleting data through views:

You cannot modify data in a view if the modification affects more than one underlying table.
However, you can modify data in a view if the modification affects only one table at a time.
You cannot change a column that is the result of a calculation, such as a computed column
or an aggregate function.

For example, a view displaying employee id, manager id, and rate of the employees has been
defined using the following statement:
CREATE VIEW vwSal AS
SELECT i.EmployeeID, i.ManagerID, j.Rate FROM HumanResources.Employee AS i
JOIN HumanResources.EmployeePayHistory AS j ON
i.EmployeeID = j.EmployeeID

After creating the view, if you try to execute the following UPDATE statement, it generates an
error:
UPDATE vwSal
SET ManagerID = 2, Rate = 12.45
WHERE EmployeeID = 1

The preceding statement generates the following error.


Msg 4405, Level 16, State 1, Line 1
View or function ‘vwSal’ is not updatable because the modification affects multiple
base tables.

The preceding error is generated because the data is being modified in two tables through a single
UPDATE statement. Therefore, instead of a single UPDATE statement, you need to execute two
UPDATE statements for each table.

The following statement updates the ManagerID attribute in the Employee table:
UPDATE vwSal
SET ManagerID = 2
WHERE EmployeeID = 1

You can update the Rate attribute in the EmployeePayHistory table using the following statement:
UPDATE vwSal
SET Rate = 12.45
WHERE EmployeeID = 1

Therefore, to modify the data in two or more underlying tables through a view, you need to execute
separate UPDATE statements for each table.
You can create an INSTEAD OF trigger on the view to modify data in a view if the modification
affects more than one underlying table. You will learn about triggers in Chapter 8.
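As a preview, such a trigger intercepts the single UPDATE against the view and issues one UPDATE per base table. A minimal sketch for the vwSal view (the trigger name is hypothetical; triggers are covered in Chapter 8):

```sql
CREATE TRIGGER trgUpdatevwSal ON vwSal
INSTEAD OF UPDATE
AS
BEGIN
  -- The inserted pseudo-table holds the new values supplied to the view.
  UPDATE e SET e.ManagerID = i.ManagerID
  FROM HumanResources.Employee e
  JOIN inserted i ON e.EmployeeID = i.EmployeeID

  UPDATE p SET p.Rate = i.Rate
  FROM HumanResources.EmployeePayHistory p
  JOIN inserted i ON p.EmployeeID = i.EmployeeID
END
```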

Indexing Views
Similar to tables, you can create indexes on views. By default, views created on a table are
nonindexed. However, you can index the views when the volume of data in the underlying tables
is large and not frequently updated. Indexing a view helps in improving the query performance.

Another benefit of creating an indexed view is that the optimizer starts using the view index in
queries that do not directly name the view in the FROM clause. If the query contains references
to columns that are also present in the indexed view, and the query optimizer estimates that using
the indexed view offers the lowest cost access mechanism, the query optimizer selects the indexed
view.

When indexing a view, you need to first create a unique clustered index on a view. After you have
defined a unique clustered index on a view, you can create additional nonclustered indexes. When
a view is indexed, the rows of the view are stored in the database in the same format as a table.

Guidelines for Creating an Indexed View

You should consider the following guidelines while creating an indexed view:

A unique clustered index must be the first index to be created on a view.


The view must not reference any other views. It can reference only base tables.
All base tables referenced by the view must be in the same database and have the same
owner as the view.
The view must be created with the SCHEMABINDING option. Schema binding binds the view
to the schema of the underlying base tables.

Creating an Indexed View by Using the CREATE INDEX Statement

You can create indexes on views by using the CREATE INDEX statement. For example, you can
use the following statement for creating a unique clustered index on the vwEmployeeDepData
view:
CREATE UNIQUE CLUSTERED INDEX idx_vwEmployeeDepData
ON HumanResources.vwEmployeeDepData (EmployeeID, DepartmentID)

When the preceding statement is executed, it generates an error, “Cannot create index on view
‘vwEmployeeDepData’ because the view is not schema bound”. This error is generated because
the vwEmployeeDepData view was not bound to the schema at the time of its creation. Therefore,
before executing the preceding statement, you need to bind the vwEmployeeDepData view to the
schema using the following statement:
ALTER VIEW HumanResources.vwEmployeeDepData WITH SCHEMABINDING
AS
SELECT e.EmployeeID, MaritalStatus, DepartmentID
FROM HumanResources.Employee e JOIN HumanResources.EmployeeDepartmentHistory d
ON e.EmployeeID = d.EmployeeID

The preceding statement alters the existing view, vwEmployeeDepData, and binds it with the
schema of the underlying tables. You can then create a unique clustered index on the view.

Just a minute:
In which of the following conditions will you NOT create an indexed view?

1. When the data is large


2. When the data is regularly updated
3. When you need to improve the performance of the view

Answer:
2. When the data is regularly updated

Managing Views
In addition to creating views, you also need to manage them. Management of a view includes
altering, dropping, or renaming views.

Altering Views

If you define a view with a SELECT * statement, and then alter the structure of the underlying
tables by adding columns, the new columns do not appear in the view. Similarly, when you select
all the columns in a CREATE VIEW statement, the columns list is interpreted only when you first
create the view. To add new columns in the view, you must alter the view.

You can modify a view without dropping it. This ensures that permissions on the view are not
lost. You can modify a view without affecting its dependent objects.

To modify a view, you need to use the ALTER VIEW statement. The syntax of the ALTER VIEW
statement is:
ALTER VIEW view_name [(column_name)]
[WITH ENCRYPTION]
AS select_statement
[WITH CHECK OPTION]

where,
view_name specifies the view to be altered.
column_name specifies the name of the column(s) to be used in a view.
WITH ENCRYPTION option encrypts the text of the view in the syscomments view.
AS specifies the actions to be performed by the view.
select_statement specifies the SELECT statement that defines a view.
WITH CHECK OPTION forces the data modification statements to follow the criteria given in the
SELECT statement.

For example, you have created a view to retrieve selected data from the Employee and
EmployeeDepartmentHistory tables. You need to alter the view definition by including the
LoginID attribute from the Employee table.

To modify the definition, you can write the following statement:


ALTER VIEW HumanResources.vwEmployeeDepData
AS
SELECT e.EmployeeID, LoginID, MaritalStatus, DepartmentID
FROM HumanResources.Employee e JOIN HumanResources.EmployeeDepartmentHistory d
ON e.EmployeeID = d.EmployeeID

The preceding statement alters the view definition by including the LoginID attribute from the
Employee table.

Dropping Views
You need to drop a view when it is no longer required. You can drop a view from a database by
using the DROP VIEW statement. When a view is dropped, it has no effect on the underlying
table(s). Dropping a view removes its definition and all the permissions assigned to it.

Further, if you query any view that references a dropped table, you receive an error message.
Dropping a table that is referenced by a view does not drop the view automatically. You have to
use the DROP VIEW statement explicitly.

The syntax of the DROP VIEW statement is:


DROP VIEW view_name

where,
view_name is the name of the view to be dropped.

For example, you can use the following statement to remove the vwEmployeeDepData view:
DROP VIEW HumanResources.vwEmployeeDepData

The preceding statement will drop the vwEmployeeDepData view from the database.

You can drop multiple views with a single DROP VIEW statement. The names of the views that
need to be dropped are separated by commas in the DROP VIEW statement.
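For example, the following statement drops two views in one statement (assuming both exist in the current database):

```sql
-- A single DROP VIEW statement can remove several views at once.
DROP VIEW HumanResources.vwEmployeeDepData, vwSal
```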

Renaming Views

At times, you might need to change the name of a view. You can rename a view without dropping
it. This ensures that permissions on the view are not lost. A view can be renamed by using the
sp_rename system stored procedure.

The syntax of the sp_rename procedure is:


sp_rename old_viewname, new_viewname
where,
old_viewname is the view that needs to be renamed.
new_viewname is the new name of the view.

For example, you can use the following statement to rename the vwSal view:
sp_rename vwSal, vwSalary

The preceding statement renames the vwSal view as vwSalary.

While renaming views, you must conform to the following guidelines:

The view must be in the current database.


The new name for the view must follow the rules for identifiers.
The view can only be renamed by its owner.
The owner of the database can also rename the view.

Activity: Creating Views

Problem Statement

You are a database developer at AdventureWorks, Inc. You need to frequently generate a report
containing the following details of the employees:

Employee ID
Employee First Name
Employee Last Name
Title
Manager First Name
Manager Last Name

To retrieve this data, you need to execute the following query on the database:
SELECT e1.EmployeeID, c1.FirstName, c1.LastName, e1.Title,
c2.FirstName AS [Manager First Name], c2.LastName AS [Manager Last Name]
FROM HumanResources.Employee e1 INNER JOIN Person.Contact c1
ON e1.ContactID = c1.ContactID INNER JOIN HumanResources.Employee AS e2
ON e1.ManagerID = e2.EmployeeID INNER JOIN Person.Contact AS c2
ON e2.ContactID = c2.ContactID

As a database developer, you need to simplify the execution of the preceding query so that you
do not need to send such a large query to the database engine every time the report is required.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a view.
2. Verify the simplification of the query execution.
Task 1: Creating a View

To retrieve the employee details, you need to query two tables, HumanResources.Employee and
Person.Contact. As you need to access multiple columns from these tables, you can simplify the
query by creating a view.

To create a view, you need to perform the following steps:

1. Identify the tables from where you need to retrieve the data as
HumanResources.Employee and Person.Contact.
2. Write the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:

CREATE VIEW vw_Emp_Details


AS
SELECT e1.EmployeeID, c1.FirstName,
c1.LastName, e1.Title, c2.FirstName AS [Manager First Name],
c2.LastName AS [Manager Last Name]
FROM HumanResources.Employee e1 INNER JOIN Person.Contact c1
ON e1.ContactID = c1.ContactID INNER JOIN
HumanResources.Employee AS e2
ON e1.ManagerID = e2.EmployeeID
INNER JOIN Person.Contact AS c2
ON e2.ContactID = c2.ContactID

Task 2: Verifying the Simplification of the Query Execution

To verify the simplification of the query execution, you need to perform the following steps:

1. Type the following query in the Query Editor window:

SELECT EmployeeID, FirstName, LastName, Title, [Manager First Name], [Manager


Last Name] FROM vw_Emp_Details

2. Press the F5 key to execute the query.

The preceding query displays the output, as shown in the following figure.
Output of the View

Implementing a Full-Text Search


While querying data, you can use the LIKE operator to search for a text value in a column.
However, at times, you might need to perform complex search on the data. For example, you need
to search for synonyms or antonyms of a particular word. SQL Server allows you to improve data
search by configuring the full-text search feature. The full-text search feature helps you to search
for complex strings in the database.

In SQL Server 2005, the full-text search is disabled by default. As a database developer, you
should know how to configure and search data by using full-text search.

Configuring Full-Text Search


The full-text query feature in SQL Server enables users to search for a wide range of text in SQL
tables. For example, the sales management team of AdventureWorks, Inc. makes frequent
searches on the ProductDescription table to develop marketing strategies. The search is based on
the data stored in the Description column of the table.

A bike racing competition is scheduled to begin in Beijing. The sales manager of AdventureWorks
wants to see the details of all the bikes that are related to racing, so that a marketing strategy can
be designed to increase the sale of these bikes. Specifically, he wants a list of all the bikes that
have the keyword ‘race winner’ in the description.

As the data is large, the search query takes a long time to retrieve data from the table. In this
scenario, you can apply a full-text index on the Description column of the ProductDescription
table to improve the speed of searching.

To retrieve the required details by using full-text search, you need to configure full-text search on
the database. For this, you need to perform the following tasks:

1. Enable full-text search in the database.


2. Create a full-text catalog.
3. Create a unique index.
4. Create a full-text index.
5. Populate the full-text index.
You need to be a member of the sysadmin role to enable full-text search, create the full-text catalog,
and create a full-text index.

Enabling Full-Text Search in the Database

Before using the full-text search feature of SQL Server, you need to enable it for the database by
using the following statements:
USE AdventureWorks
GO
sp_fulltext_database enable
GO

Creating a Full-Text Catalog

After enabling the full-text search, you need to create a full-text catalog. A full-text catalog
serves as a container to store full-text indexes; a single catalog may have multiple full-text
indexes. The syntax of the CREATE FULLTEXT CATALOG statement is:
CREATE FULLTEXT CATALOG catalog_name
[ON FILEGROUP filegroup ]
[IN PATH 'rootpath']
[WITH ACCENT_SENSITIVITY = {ON|OFF}]
[AS DEFAULT]
[AUTHORIZATION owner_name ]

where,
catalog_name specifies the name of the new catalog.
filegroup specifies the name of the filegroup where the new catalog will be added. If the
filegroup is not specified, the new catalog will be added in the default filegroup. The default
full-text filegroup is the primary filegroup.
rootpath specifies the root directory for the catalog. If the rootpath is not specified, the new
catalog will be located in the default directory.
AS DEFAULT specifies that the catalog is the default catalog. When full-text indexes are created
without specifying a catalog, the default catalog is used.
ACCENT_SENSITIVITY = {ON|OFF} specifies whether the catalog is accent sensitive or not for
full-text indexing.
owner_name specifies the owner of the full-text catalog. It can be the name of a database user
or role.

For example, you can create a full-text catalog by using the following statement:
CREATE FULLTEXT CATALOG Cat1 AS DEFAULT
The preceding statement creates a full-text catalog by the name Cat1 and makes it the default
catalog.

Creating a Unique Index

After creating the full-text catalog, you need to identify a unique index on the table. The full-text
engine requires this unique index to map each row in the table to a unique key. You can use an
existing unique index defined on the table, or create a new one. For example, you can create a
unique index on the Production.ProductDescription table, as shown in the following statement:
CREATE UNIQUE INDEX Ix_Desc ON Production.ProductDescription (ProductDescriptionID)

Creating a Full-Text Index

After you have created the full-text catalog and a unique index, you can create a full-text index
on the table. A full-text index stores information about significant words and their location within
a given column. You can use this information to compute full-text queries that search for rows
with particular words or combinations of words. Full-text indexes can be created on the base tables
but not on the views or the system tables.

There are certain words that are used often and may hinder a query. These words are called noise
words and are excluded from the search string. For example, if your search string is “Who is the
governor of California”, a full-text search will not look for words, such as ‘is’ and ‘the’. Some
noise words are a, an, the, and are.

The syntax of the CREATE FULLTEXT INDEX statement is:


CREATE FULLTEXT INDEX ON table_name
[(column_name [TYPE COLUMN type_column_name]
[LANGUAGE language_term] [,…n])]
KEY INDEX index_name
[ON fulltext_catalog_name]
[WITH
{CHANGE_TRACKING {MANUAL | AUTO | OFF [, NO POPULATION]}}
]

where,
table_name specifies the name of the table or indexed view, which contains the column or
columns included in the full-text index.
column_name specifies the name of the column or columns included in the full-text index. Only
columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, and varbinary can be
used for full-text indexing.
TYPE COLUMN type_column_name specifies the name of the column that holds the document type
of the column. It must be specified only if the columns are of type varbinary(max) or image.
LANGUAGE language_term specifies the language of the data stored in the column or columns
included in the full-text index.
,… n specifies that multiple comma separated columns can be used in the full-text index.
KEY INDEX index_name specifies the name of the unique key index in the table.
ON fulltext_catalog_name specifies the full-text catalog used for the full-text index. If the
catalog_name is not specified, the default catalog is used.
WITH CHANGE_TRACKING {MANUAL | AUTO | OFF [ , NO POPULATION]} specifies whether SQL
Server maintains a list of all the changes to the indexed data.
MANUAL specifies that the change-tracking log will be propagated either on a schedule using
SQL Server Agent, or manually by the user.
AUTO specifies that SQL Server automatically updates the full-text index whenever the table is
modified. This is the default option.
OFF [ , NO POPULATION] specifies that SQL Server does not keep a list of changes to the
indexed data.

The NO POPULATION option can be used only when CHANGE_TRACKING is OFF. When
NO POPULATION is specified, SQL Server does not populate an index after it is created. The
index is only populated after the user executes the ALTER FULLTEXT INDEX statement with
the START FULL, or INCREMENTAL POPULATION clause. When NO POPULATION is not
specified, SQL Server populates the index fully after it is created.

Based on the preceding scenario, you can create a full-text index on the Description column, as
shown in the following statement:
CREATE FULLTEXT INDEX ON Production.ProductDescription (Description) KEY INDEX
Ix_Desc

The preceding statement will create a full-text index on the Description column of the
ProductDescription table. This index is based on the Ix_Desc unique index created earlier on
the ProductDescriptionID column of the table.

You can also create the full-text index in the Object Explorer window by right-clicking the table, on
which you need to create the full-text index, and selecting Full-Text index → Define Full-Text
Index.

Populating the Full-Text Index

After creating the full-text index, you need to populate it with the data in the columns enabled for
full-text support. SQL Server full-text search engine populates the full-text index through a
process called population. Population involves filling the index with words and their location in
the data page. When a full-text index is created, it is populated by default. In addition, SQL Server
automatically updates the full-text index as the data is modified in the associated tables.

However, SQL Server does not keep a list of changes made to the indexed data when the
CHANGE_TRACKING option is off. This option is specified while creating the full-text index
by using the CREATE FULLTEXT INDEX statement.

If you do not want the full-text index to be populated when it is created using the CREATE
FULLTEXT INDEX statement, then you must specify NO POPULATION along with the
CHANGE TRACKING OFF option. To populate the index, you need to execute the ALTER
FULLTEXT INDEX statement along with the START FULL, INCREMENTAL, or UPDATE
POPULATION clause.

For example, to create an empty full-text index on the ProductDescription table, you can execute
the following statement:
CREATE FULLTEXT INDEX ON Production.ProductDescription (Description)
KEY INDEX PK_ProductDescription_ProductDescriptionID
WITH CHANGE_TRACKING OFF, NO POPULATION

To populate the index, you need to execute the following statement:


ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION

The preceding statement will populate the full-text index created on the ProductDescription table.

Similar to regular SQL indexes, full-text indexes can also be updated automatically as the data is
modified in the associated tables. This repopulation can be time-consuming and adversely affect
the usage of resources of the database server during periods of high database activity. Therefore,
it is better to schedule repopulation of full-text indexes during periods of low database activity.
You can specify the following types of full-text index population methods to repopulate the index:

Full population
Change tracking-based population
Incremental timestamp-based population

Full Population

You can use this method when you need to populate the full-text catalog or the full-text index for
the first time. After that, you can maintain the indexes by using change tracking or incremental
populations.

During a full population of a full-text catalog, index entries are built for all the rows in all the
tables covered by the catalog. If a full population is requested for a table, index entries are built
for all the rows in that table.

Change Tracking-Based Population

SQL Server maintains a record of the rows that have been modified in a table set up for full-text
indexing. These changes are propagated to the full-text index.

Incremental Timestamp-Based Population

The incremental population method updates the full-text index with the data that has been changed
since the last time the index was refreshed. For an incremental population refresh to work, the
indexed table must have a column of the timestamp data type. If a table does not have a column
of the timestamp data type, then only a full population refresh can be done.
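
As an illustration, the sketch below uses a hypothetical dbo.ChangeLog table whose timestamp column makes it eligible for incremental population; it assumes a full-text catalog and a full-text index on the LogText column have already been set up:

```sql
-- Hypothetical table: the timestamp column enables incremental population
CREATE TABLE dbo.ChangeLog
(
    LogID int NOT NULL CONSTRAINT PK_ChangeLog PRIMARY KEY,
    LogText varchar(max) NOT NULL,
    LastModified timestamp NOT NULL  -- required for incremental population
)
GO
-- Assuming a full-text index already exists on LogText,
-- refresh the index with only the rows changed since the last refresh
ALTER FULLTEXT INDEX ON dbo.ChangeLog START INCREMENTAL POPULATION
```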

Searching Data by Using a Full-Text Search


After the full-text index has been created on a table, you can query the table by using the full-text
predicates. The full-text predicates are used to specify how the search string should be searched
in the table. The following predicates can be used while performing the full-text search.

FREETEXT: In SQL Server, you can use the LIKE operator for searching a pattern. However, if
you are not looking for an exact match, you need to go beyond the standard SQL predicates and
use SQL Server’s full-text search capabilities. FREETEXT searches for any variation of a word
or a group of words given in the search column. For example, considering the previous
scenario of the bike racing competition, you can use the FREETEXT predicate to obtain the
desired output, as shown in the following statement:

SELECT Description FROM Production.ProductDescription
WHERE FREETEXT (Description, 'race winners')

The preceding statement on execution will display the rows that contain words related to race and
winners.

CONTAINS: Full-text queries using CONTAINS are more precise than the full-text queries using
FREETEXT. This predicate is used in queries when you want to search for a specific phrase or
for the exact match. It also searches for the proximity of words within a text. For example,
you can use the following statement to search for the word ‘Ride’ near the word ‘Bike’ in the
ProductDescription table:

SELECT Description FROM Production.ProductDescription
WHERE CONTAINS (Description, 'ride NEAR bike')

You can also use the CONTAINS predicate to search for the inflectional forms of a specific word.
For example, you can search for the inflectional forms of the word 'ride'. If various rows in the
table include the words 'ride', 'rides', 'rode', 'riding', and 'ridden', all would be in the result set
because each of these can be inflectionally generated from the word 'ride'. You can use the
following statement to search for the inflectional forms of the word 'ride':
SELECT Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, ride)');

When you use the CONTAINS predicate, SQL Server discards noise words from the search criteria.
Noise words are words, such as "a", "is", or "the", which occur frequently but do not help when
searching for specific text.

Just a minute:
List the types of full-text index population methods.

Answer:
The three types of full-text index population methods are:

1. Full population
2. Change tracking based population
3. Incremental timestamp based population

Just a minute:
Which predicate is used to search for a specific phrase or for an exact match?

Answer:
CONTAINS

Activity: Implementing a Full-Text Search

Problem Statement

The users at AdventureWorks, Inc. need to frequently search for employees, customers, or vendors
based on the location. The location details of all these entities are stored in the Address table in
the Person schema. The users want to search for addresses with different combinations of the
words specified in the search criteria. For example, they need to search for locations that contain
the words ‘Santa’ and ‘drive’ in the AddressLine1 column. Similarly, they might need to search
for the locations that contain the words ‘Santa’ and ‘Street’ in the AddressLine1 column.

How will you enable the users to perform such a data search?

The Address table in the Person schema contains a unique index AK_Address_rowguid on the
rowguid column.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Enable the full-text search on the AdventureWorks database.


2. Create a default full-text catalog.
3. Create a full-text index.
4. Search the data by using the CONTAINS predicate

Task 1: Enabling the Full-Text Search on the AdventureWorks Database

The full-text search is disabled by default. Execute the following statement to enable the full-text
feature on the AdventureWorks database:
sp_fulltext_database enable
Task 2: Creating a Default Full-Text Catalog

You can create the full-text catalog in the AdventureWorks database by using the following
statement:
CREATE FULLTEXT CATALOG Cat2 AS DEFAULT

Task 3: Creating a Full-Text Index

You can create a full-text index on the Address table as shown in the following statement:
CREATE FULLTEXT INDEX ON Person.Address (AddressLine1) KEY INDEX AK_Address_rowguid

Task 4: Searching the Data by Using the CONTAINS Predicate

Execute the following statement to retrieve the locations that contain the words ‘Santa’ and
‘drive’:
SELECT * FROM Person.Address WHERE CONTAINS (AddressLine1, 'Santa NEAR drive')

Summary
In this chapter, you learned that:

Indexes are created to enhance the performance of queries.


There are two types of indexes, clustered and nonclustered.
Indexes are created by using the CREATE INDEX statement.
Clustered indexes should be built on an attribute whose values are unique and do not change
often. Data is physically sorted in a clustered index.
In a nonclustered index, the physical order of rows is not the same as that of the index order.
A nonclustered index is the default index that is created with the CREATE INDEX statement.
An XML index is built on columns with the XML data type.
Indexes can also be partitioned based on the value ranges.
The common index maintenance tasks include disabling, enabling, renaming, and dropping
an index.
A view is a virtual table, which derives its data from one or more tables known as the base or
underlying tables.
Views serve as security mechanisms, thereby protecting data in the base tables.
SQL Server allows data to be modified only in one of the underlying tables when using views,
even if the view is derived from multiple underlying tables.
SQL Server enables users to search for a wide range of text in the SQL tables through a full-
text query feature.
You can enable the full-text search by using the statement, sp_fulltext_database enable.
A full-text catalog can be created by using the CREATE FULLTEXT CATALOG statement.
A full-text index can be created by using the CREATE FULLTEXT INDEX statement.
The types of full-text population methods are full population, change tracking-based
population, and incremental timestamp-based population.
Full-text predicates that can be used to perform full-text search are CONTAINS and FREETEXT.
A FREETEXT predicate searches for the word or the words given in the search column.
The CONTAINS predicate searches for a specific phrase or for the exact match.
Exercises
The following exercises are based on the AdventureWorks database.

Exercise 1

The SalesOrderDetail and SalesOrderHeader tables store the details of the sales orders. To
generate a report displaying the sales order id and the total amount of all the products purchased
against an order, you are using the following query:
SELECT sd.SalesOrderID, sum(LineTotal) AS [Total Amount]
FROM Sales.SalesOrderDetail sd JOIN Sales.SalesOrderHeader sh
ON sd.SalesOrderID = sh.SalesOrderID
GROUP BY sd.SalesOrderID

The table contains a large amount of data. Create an appropriate index to optimize the execution
of this query.

Exercise 2

The Store table is often queried. The queries are based on the CustomerID attribute and take long
time to execute. Optimize the execution of the queries. In addition, ensure that the CustomerID
attribute does not contain duplicate values.

Exercise 3

The SalesOrderDetail table is often queried. The queries are based on the SalesOrderDetailID and
SalesOrderID attributes. The execution of the queries takes a long time. Optimize the execution
of the queries.

Exercise 4
A view has been defined as shown in the following statement:
CREATE VIEW vwSalesOrderDetail
AS
SELECT oh.SalesOrderID, TerritoryID, TotalDue, OrderQty, ProductID
FROM Sales.SalesOrderHeader oh JOIN Sales.SalesOrderDetail od
ON oh.SalesOrderID = od.SalesOrderID

The following UPDATE statement gives an error when you update the OrderQty and TerritoryID
attributes:
UPDATE vwSalesOrderDetail
SET OrderQty = 2, TerritoryID = 4
FROM vwSalesOrderDetail
WHERE SalesOrderID = 43659

Identify the problem and provide the solution.


Exercise 5
The Store table contains the details of all the stores. The HR Manager of AdventureWorks, Inc.
frequently queries the Store table based on the names of the stores. He wants to create the
following reports:

A report containing the details of all the stores that contain the word ‘bike’ in their names.
A report displaying the names of all the stores containing the phrase ‘Bike Store'.

Write the query so that the result set is retrieved very promptly.

Exercise 6

Display the details of all the credit cards that are of type ‘SuperiorCard'. The CreditCard table
contains a large amount of data. Therefore, the query takes a long time to retrieve the details of
the credit card. You need to optimize the execution of the query so that the result set does not take
time to be retrieved.

Exercise 7

Display the details of all the currencies that contain the words ‘New’ and ‘Dollar’ in their names.
These words can be included in any order. In addition, you need to make sure that the query does
not take time to execute.

Exercise 8
The manager of the production department wants to analyze the products, which contain the exact
word ‘road’ in their description. Write the query so that the result set does not take a long time to
execute.

Exercise 9

You need to create a report displaying the details of all the products, which contain the word ‘top’
near the word ‘line’ in their description. Write the query to retrieve the desired output. Write the
query such that it does not take a long time to execute.

Exercise 10

Display the details of all the stores having the word ‘bike’ in their name. In addition, the report
should contain the details of those stores that have the sales person ID as 277. You need to write
the query so that the result set does not take a long time to be retrieved.

Chapter 7
Implementing Stored Procedures and Functions
As a database developer, you might need to execute a set of SQL statements together. SQL Server
allows you to create batches with multiple statements that can be executed together. These batches
can also contain programming constructs that include conditional logic to examine conditions
before executing the statements.

At times, it might be required to execute a batch repeatedly. In such a case, a batch can be saved
as database objects called stored procedures and functions. These database objects contain a
precompiled batch that can be executed many times without recompilation.

This chapter explains how to create batches to execute multiple SQL statements. Further, it
explains how to implement stored procedures and functions in SQL Server 2005.

Objectives
In this chapter, you will learn to:
Implement batches
Implement stored procedures
Implement functions

Implementing Batches
As a database developer, you might need to execute more than one SQL statement to perform a
task. For example, when a new employee joins AdventureWorks, Inc., you need to insert the
employee details in the database. The details of the employees are stored in more than one table.
Therefore, you need to execute multiple insert statements to store the details in each table. In such
a case, you can send all the SQL statements together to SQL Server to be executed as a unit. This
helps in reducing the network traffic.

At times, you might also need to check conditions before executing the SQL statements. For
example, in a manufacturing unit, the InventoryIssue table stores the details of an item issued for
the manufacturing process. When you insert a record in this table, you need to check that the
quantity on hand is more than or equal to the quantity issued. In such a case, you can create
conditional constructs that check for a condition before executing a statement.

Creating Batches
A batch is a group of SQL statements submitted together to SQL Server for execution. While
executing batches, SQL Server compiles the statements of a batch into a single executable unit
called an execution plan. This helps in saving execution time.

For example, suppose you have to execute 10 statements one by one by sending 10 separate
requests. This process takes time if your queries are waiting in a queue, and the statements might
not be executed together. Instead, if you execute all 10 statements together as a batch, the
execution process becomes faster because all the statements are sent to the server together.

To create a batch, you can write multiple SQL statements followed by the keyword GO at the end.
The syntax of creating a batch is:
<T-SQL Statement1>
< T-SQL Statement2>
< T-SQL Statement3>

GO

GO is a command that specifies the end of the batch and sends the SQL statements for execution.

For example, if you want to store the details of new employees in the AdventureWorks database,
you can write multiple INSERT statements in a batch, as shown in the following statements:
INSERT INTO [AdventureWorks].[Person].[Contact]
VALUES (0, null, 'Robert', 'J', 'Langdon', NULL, '[email protected]', 0, '1
(11) 500 555-0172', '9E685955-ACD0-4218-AD7F-60DDF224C452', '2a31OEw=', NULL,
newid(), getdate())
INSERT INTO [AdventureWorks].[HumanResources].[Employee]
VALUES ('AS01AS25R2E365W', 19978, 'robertl', 16, 'Tool Designer', '1972-05-15', 'S',
'M', '1996-07-31', 0, 16, 20, 1, newid(), getdate())
GO

When a batch is submitted to SQL Server, it is compiled to create an execution plan. If any
compilation error occurs, such as a syntax error, the execution plan is not created. Therefore, none
of the statements in the batch is executed. However, if a run-time error occurs after the execution
plan is created, the execution of the batch stops. In such a case, the statements executed before the
statement that encountered the run-time error are not affected.

Using Variables

While creating batches, you might need to store some values temporarily during the execution
time. For example, you might need to store some intermediate values while performing
calculations. To store the intermediate values, you can declare variables and assign values to them.
You can declare a variable by using the DECLARE statement. A variable name is always preceded
by the ‘@’ symbol. The syntax of the DECLARE statement is:
DECLARE @variable_name data_type

Variables that are declared in a batch and can be used in any statement inside the batch are called
local variables.

The following statements declare a variable, @Rate, and assign the maximum value of the Rate
column from the EmployeePayHistory table to the variable:
DECLARE @Rate int
SELECT @Rate = max(Rate)
FROM HumanResources.EmployeePayHistory
GO

In the preceding statements, the max aggregate function is used to retrieve the maximum pay rate
from the EmployeePayHistory table.

Displaying User-Defined Messages

At times, you need to display values of variables or user-defined messages when the batch is
executed. For this, you can use the PRINT statement.
The following statements display the value of the @rate variable by using the PRINT statement:
DECLARE @Rate int
SELECT @Rate = max(Rate)
FROM HumanResources.EmployeePayHistory
PRINT @Rate
GO

You can also use comment entries in batches to write a description of the code. This will help
understand the purpose of the code. A comment entry can be written in the following ways:

Multiple line comment entries enclosed within /* and */


Single line comment entries starting with -- (two hyphens)
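
Both comment styles can be sketched as follows; the variable name and message are only illustrative:

```sql
/* This batch computes an intermediate total.
   A multiple line comment can span several lines. */
DECLARE @Total int
-- Single line comment: initialize the counter
SET @Total = 0
PRINT @Total
GO
```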

Guidelines to Create Batches

While creating batches, you need to consider the following guidelines:

You cannot combine statements, such as CREATE DEFAULT, CREATE FUNCTION, CREATE
PROCEDURE, CREATE RULE, CREATE TRIGGER, and CREATE VIEW with other statements while
creating a batch. Any statement that follows the create statement is interpreted as part of
the definition.
You must use the EXECUTE keyword to run a stored procedure when the call is not the first
statement of the batch; when it is the first statement, the EXECUTE keyword is optional.

In addition, you need to consider the following restrictions:

You cannot bind rules and defaults to columns and use them in the same batch.
You cannot define and use the CHECK constraint in the same batch.
You cannot drop objects and recreate them in the same batch.
You cannot alter a table by adding a column and then refer to the new columns in the batch
created earlier.

Just a minute:
Which of the following statements cannot be combined with other statements within a batch?

1. CREATE FUNCTION
2. CREATE RULE
3. DECLARE

Answer:
1. CREATE FUNCTION and 2. CREATE RULE. (A DECLARE statement can be combined with other
statements in a batch.)

Using Constructs
SQL Server allows you to use programming constructs in batches for conditional execution of
statements. For example, you need to retrieve data based on a condition. If the condition is not
satisfied, a message should be displayed.

SQL Server allows you to use the following constructs to control the flow of statements:

IF…ELSE statement
CASE statement
WHILE statement

Using the IF…ELSE Statement

You can use the IF…ELSE statement for conditional execution of SQL statements. A particular
action is performed when the given condition evaluates to TRUE, and another action is performed
when the given condition evaluates to FALSE.

The syntax of the IF…ELSE statement is:


IF boolean_expression
{sql_statement | statement_block}
[ELSE
{sql_statement | statement_block}]

where,
boolean_expression specifies the condition that evaluates to either TRUE or FALSE.
sql_statement specifies a T-SQL statement.
statement_block is a collection of T-SQL statements.

For example, you can retrieve the pay rate of an employee from the EmployeePayHistory table to
a variable, @Rate. The value of the @Rate variable is compared with the value 15 by using the <
(less than) comparison operator. Based on the condition, different messages are displayed, as
shown in the following statements:
DECLARE @Rate money
SELECT @Rate = Rate FROM HumanResources.EmployeePayHistory
WHERE EmployeeID = 23
IF @Rate < 15
PRINT 'Review of the rate is required'
ELSE
BEGIN
PRINT 'Review of the rate is not required'
PRINT 'Rate ='
PRINT @Rate
END
GO

In the preceding statements, the IF statement checks whether the @Rate variable is storing a value
less than 15. If the result is true, the PRINT statement displays “Review of the rate is required”
else it displays “Review of the rate is not required”. Further, the next PRINT statement displays
the value of the rate.

Consider another example, where a check is performed to see the existence of the Sales
department, as shown in the following statement:
IF EXISTS (SELECT * FROM HumanResources.Department WHERE Name = 'Sales')
BEGIN
SELECT * FROM HumanResources.Department WHERE Name = 'Sales'
END
ELSE
PRINT 'Department details not available'
GO

In the preceding statement, if the Sales department exists, all the details are displayed; otherwise,
a user-defined message is displayed.

Using the CASE Statement

You can use the CASE statement in situations where several conditions need to be evaluated. The
CASE statement evaluates a list of conditions and returns one of the possible results. You can use
the IF statement to do the same task. However, you can use a CASE statement when there are
more than two conditions that check a common variable for different values. The syntax of the
CASE statement is:
CASE input_expression
WHEN when_expression THEN result_expression
[WHEN when_expression THEN result_expression [ ...n ]]
[ELSE else_result_expression]
END

where,
input_expression specifies the input expression that is evaluated. The input_expression is any
valid expression.
when_expression is the expression that is compared with the input_expression.
result_expression is the expression returned when the comparison of input_expression with
when_expression evaluates to TRUE. This can be a constant, a column name, a function, a
query, or any combination of arithmetic, bit-wise, and string operators.
else_result_expression is the expression returned if no comparison operation evaluates to
TRUE. If this argument is omitted and no comparison operation evaluates to TRUE, the result
will be NULL.

In a simple CASE construct, a variable or an expression is compared with the expression in each
WHEN clause. If any of these expressions evaluate to TRUE, then the expression specified with
the THEN clause is executed. If the expression does not evaluate to TRUE, the expression with
the ELSE statement is executed.

Consider the following statements, where a CASE construct is included in the SELECT statement
to display the marital status as ‘Married’ or ‘Single’:
SELECT EmployeeID, 'Marital Status' =
CASE MaritalStatus
WHEN 'M' THEN 'Married'
WHEN 'S' THEN 'Single'
ELSE 'Not specified'
END
FROM HumanResources.Employee
GO
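
T-SQL also supports a searched CASE form, in which each WHEN clause carries its own Boolean condition instead of comparing a single input expression. The following sketch uses the VacationHours column of the Employee table; the status labels are only illustrative:

```sql
-- Searched CASE: each WHEN clause has an independent condition
SELECT EmployeeID, 'Leave Status' =
CASE
WHEN VacationHours = 0 THEN 'No leave balance'
WHEN VacationHours < 20 THEN 'Low leave balance'
ELSE 'Sufficient leave balance'
END
FROM HumanResources.Employee
GO
```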

Using the WHILE Statement


You can use the WHILE statement in a batch to allow a set of T-SQL statements to execute
repeatedly as long as the given condition holds true. The syntax of the WHILE statement is:
WHILE boolean_expression
{sql_statement | statement_block}
[BREAK]
{sql_statement | statement_block}
[CONTINUE]

where,
boolean_expression is an expression that evaluates to TRUE or FALSE.
sql_statement is any SQL statement.
statement_block is a group of SQL statements.
BREAK causes the control to exit from the WHILE loop.
CONTINUE causes the WHILE loop to restart, skipping all the statements after the CONTINUE
keyword.

SQL Server provides the BREAK and CONTINUE statements to control the statements within
the WHILE loop. The BREAK statement causes an exit from the WHILE loop. Any statements
that appear after the END keyword, which marks the end of the loop, are executed after the
BREAK statement is executed. The CONTINUE statement causes the WHILE loop to restart,
skipping any statements after this statement inside the loop.

For example, the HR department of AdventureWorks, Inc. has decided to review the salary of all
the employees. As per the current HR policy, the average hourly salary rate of all the employees
should be approximately $20. You need to increase the hourly salary of all the employees until
the average hourly salary reaches near $20. In addition, you need to ensure that the maximum
hourly salary should not exceed $127. To accomplish this task, you can use the following
statement:
WHILE (SELECT AVG(Rate)+1 FROM HumanResources.EmployeePayHistory) < 20
BEGIN
UPDATE HumanResources.EmployeePayHistory
SET Rate = Rate + 1
FROM HumanResources.EmployeePayHistory
IF (SELECT max(Rate)+1 FROM
HumanResources.EmployeePayHistory)>127
BREAK
ELSE
CONTINUE
END

Handling Errors and Exceptions


When you execute a query, it is parsed for syntactical errors before execution. If the syntax is
correct, it is compiled and executed. Sometimes, due to factors, such as incorrect data, an error
can occur during execution even if the query is syntactically correct. The errors that occur at run
time are known as exceptions.

For example, there is a primary key constraint applied on the EmployeeID attribute of the
Employee table. When you try to insert an employee ID that already exists in the table, an error
occurs while executing the INSERT statement.

Exceptions can be handled in the following ways:

By using the TRY-CATCH construct


By using the RAISERROR statement and handling the error in the application

Using TRY-CATCH

A TRY-CATCH construct includes a TRY block followed by a CATCH block. A TRY block is
a group of SQL statements enclosed in a batch, stored procedure, trigger, or function. If an error
occurs in any statement of the TRY block, the control is passed to another group of statements
that are enclosed in a CATCH block.

A CATCH block contains SQL statements that perform some operations when an error occurs.
Therefore, an associated CATCH block must immediately follow a TRY block, as shown in the
following syntax:
BEGIN TRY
<SQL statements>
END TRY
BEGIN CATCH
<SQL statements>
END CATCH

If there are no errors in the code that is enclosed in a TRY block, the control is passed to the
statement immediately after the associated END CATCH statement. In this case, statements
enclosed in the CATCH block are not executed.

The TRY…CATCH constructs can be nested. Either a TRY block or a CATCH block can contain
nested TRY…CATCH constructs. A CATCH block can contain an embedded TRY…CATCH
construct to handle errors encountered by the CATCH block.
In the CATCH block, you can use the following system functions to determine the information
about errors:

ERROR_LINE(): Returns the line number at which the error occurred.


ERROR_MESSAGE(): Specifies the text of the message that would be returned to the
application.
ERROR_NUMBER(): Returns the error number.
ERROR_PROCEDURE(): Returns the name of the stored procedure or trigger in which the
error occurred. This function returns NULL if the error did not occur within a stored procedure
or trigger.
ERROR_SEVERITY(): Returns the severity.
ERROR_STATE(): Returns the state of the error.

For example, the EmployeeID attribute of the Employee table in the AdventureWorks database is
an IDENTITY column and its value cannot be specified while inserting a new record. However,
if you specify the value for the EmployeeID in the INSERT statement, an error will occur.

To handle such run-time errors, you can include the insert statement in a TRY block and send the
control to the CATCH block where the error information is displayed, as shown in the following
statements:
BEGIN TRY
INSERT INTO [AdventureWorks].[Person].[Contact]
VALUES (0, null, 'Robert', 'J', 'Langdon', NULL
,'[email protected]', 0, '1 (11) 500 555-0172'
,'9E685955-ACD0-4218-AD7F-60DDF224C452', '2a31OEw=', NULL, newid(), getdate())

INSERT INTO [AdventureWorks].[HumanResources].[Employee]


VALUES ('AS01AS25R2E365W', 19979, 'robertl', 16, 'Tool Designer', '1972-05-15', 'S',
'M', '1996-07-31', 0, 16, 20, 1, newid(), getdate())

END TRY
BEGIN CATCH
SELECT 'There was an error! ' + ERROR_MESSAGE() AS ErrorMessage,
ERROR_LINE() AS ErrorLine,
ERROR_NUMBER() AS ErrorNumber,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_STATE() AS ErrorState
END CATCH
GO

Using RAISERROR

A RAISERROR statement is used to return messages to the business applications that are
executing the SQL statements. This statement uses the same format as a system error or warning
message generated by the database engine. For example, consider an application that is executing
a batch. If an error occurs while executing this batch, an error message will be raised and sent to
the application. The application, in turn, will include the code to handle the error.
You can also return user-defined error messages by using the RAISERROR statement. The syntax
of the RAISERROR statement is:
RAISERROR('Message', Severity, State)

where,
Message is the text that you want to display.
Severity is the user-defined severity level associated with the message. It represents how
serious the error is. Severity level can range from 0 to 25. The levels from 0 through 18 can
be specified by any user. The preferable value used by users is 10, which is for displaying
informational error messages. Severity levels from 20 through 25 are considered fatal.
State is an integer value from 0 through 255. If the same user-defined error is raised at
multiple locations, using a unique state number for each location can help find which section
of the code is raising the errors.

A RAISERROR severity of 11 to 19 executed in the TRY block causes the control to be


transferred to the associated CATCH block. For example, you have to update the Shift table to
store the details of the shift in which the employees work. While updating the shift details, you
need to ensure that the difference between the start time and the end time is eight hours, as shown
in the following statements:
BEGIN TRY
DECLARE @Start datetime
DECLARE @End datetime
DECLARE @Date_diff int

SELECT @Start = '1900-01-01 23:00:00.000', @End = '1900-01-02 06:00:00.000'

SELECT @Date_diff = datediff(hh, @Start, @End)


IF (@Date_diff != 8)
RAISERROR('Error Raised', 16, 1)
ELSE
BEGIN
UPDATE HumanResources.Shift
SET StartTime = @Start, EndTime = @End
WHERE ShiftID = 3
END
END TRY
BEGIN CATCH
PRINT 'The difference between the Start and End time should be 8 hours'
END CATCH
GO

In the preceding statements, if the difference between the start time and the end time is less than
eight hours, an error is raised, and the update process is stopped.

Just a minute:
Which system function returns the text of the error message when used in the CATCH block?
Answer:
ERROR_MESSAGE()
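
For example, the following illustrative batch uses the ERROR_NUMBER() and ERROR_MESSAGE()
functions inside a CATCH block to report the details of a divide-by-zero error:
BEGIN TRY
SELECT 10 / 0
END TRY
BEGIN CATCH
PRINT 'Error ' + CONVERT(varchar(10), ERROR_NUMBER()) + ': ' + ERROR_MESSAGE()
END CATCH

Related system functions, such as ERROR_SEVERITY(), ERROR_STATE(), and ERROR_LINE(), can be
used in the CATCH block in the same way.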

Just a minute:
How can you return a user-defined error message in a batch?

Answer:
Using the RAISERROR statement

Implementing Stored Procedures


Batches are temporary in nature. To execute a batch more than once, you need to recreate SQL
statements and submit them to the server. This leads to an increase in the overhead, as the server
needs to compile and create the execution plan for these statements again. Therefore, if you need
to execute a batch multiple times, you can save it within a stored procedure. A stored procedure
is a precompiled object stored in the database.

Stored procedures can invoke the Data Definition Language (DDL) and Data Manipulation
Language (DML) statements and can return values. If you need to assign values to the variables
declared in the procedures at run time, you can pass parameters while executing them. You can
also execute a procedure from another procedure. This helps in using the functionality of the called
procedure within the calling procedure.

As a database developer, it is important for you to learn how to implement procedures.

Creating Stored Procedures


You can create a stored procedure by using the CREATE PROCEDURE statement.

The syntax of the CREATE PROCEDURE statement is:


CREATE PROCEDURE proc_name
AS
BEGIN
sql_statement1
sql_statement2
END

where,
proc_name specifies the name of the stored procedure.

The following statement creates a stored procedure to view the department names from the
Department table:
CREATE PROCEDURE prcDept
AS
BEGIN
SELECT Name FROM HumanResources.Department
END

When the CREATE PROCEDURE statement is executed, the server compiles the procedure and
saves it as a database object. The procedure is then available for various applications to execute.

The process of compiling a stored procedure involves the following steps:

1. The procedure is compiled and its components are broken into various pieces. This process
is known as parsing.
2. The existence of the referred objects, such as tables and views, are checked. This process
is known as resolving.
3. The name of the procedure is stored in the sysobjects table and the code that creates the
stored procedure is stored in the syscomments table.
4. The procedure is compiled and a blueprint for how the query will run is created. This
blueprint is specified as an execution plan. The execution plan is saved in the procedure
cache.
5. When the procedure is executed for the first time, the execution plan will be read, fully
optimized, and then run. When the procedure is executed again in the same session, it will
be read directly from the cache. This increases performance, as there is no repeated
compilation.

After creating the stored procedure, you can view the code of the procedure by using the sp_helptext
statement.
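
For example, the following statement displays the text of the prcDept procedure created earlier:
EXEC sp_helptext 'prcDept'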

Guidelines to Create a Stored Procedure

The following points need to be considered before creating a stored procedure:

You cannot combine the CREATE PROCEDURE statement with other SQL statements in a
single batch.
You must have the CREATE PROCEDURE permission to create a procedure in the database
and the ALTER permission on the schema, where the procedure is being created.
You can create a stored procedure only in the current database.

After creating a stored procedure, you can execute the procedure. You can also alter the procedure
definition or drop it, if the existing procedure is not required.

Executing a Stored Procedure

A procedure can be executed by using the EXECUTE or EXEC statement. The syntax of the
EXECUTE statement is:
EXEC | EXECUTE proc_name

where,
proc_name, is the name of the procedure that you need to execute.
You can execute the stored procedure, prcDept, as shown in the following statement:
EXEC prcDept

Altering a Stored Procedure

A stored procedure can be modified by using the ALTER PROCEDURE statement. The syntax
of the ALTER PROCEDURE statement is:
ALTER PROCEDURE proc_name

You can alter the stored procedure by using the following statement:
ALTER PROCEDURE prcDept
AS
BEGIN
SELECT DepartmentID, Name FROM HumanResources.Department
END

In the preceding statement, the prcDept stored procedure will be modified to display department
Ids along with the department name.

Dropping a Stored Procedure

You can drop a stored procedure from the database by using the DROP PROCEDURE statement.
The syntax of the DROP PROCEDURE statement is:
DROP PROCEDURE proc_name

You cannot retrieve a procedure once it is dropped.

You can drop the prcDept stored procedure by using the following statement:
DROP PROCEDURE prcDept

Just a minute:
Which statement will you use to modify the stored procedure?

Answer:
ALTER PROCEDURE

Just a minute:
Which system-defined table stores the names of all the stored procedures?

Answer:
sysobjects

Creating Parameterized Stored Procedures


At times, you need to execute a procedure for different values of a variable that are provided at
run time. For this, you can create a parameterized stored procedure. Parameters are used to pass
values to the stored procedure during run time. These values can be passed by using standard
variables. The parameter that passes the value to the stored procedure is defined as an input
parameter. A stored procedure has the capability of using a maximum of 2100 parameters. Each
parameter has a name, data type, direction, and a default value.

Direction represents whether a parameter is an input parameter or an output parameter. By default,
the direction of a parameter is input.
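
A parameter can also be assigned a default value, which is used when no value is passed during
execution. For example, the following statements sketch a procedure, prcListTitle (a hypothetical
name), with a default value for its input parameter:
CREATE PROCEDURE prcListTitle @title char(50) = 'Tool Designer'
AS
BEGIN
SELECT EmployeeID, LoginID, Title
FROM HumanResources.Employee
WHERE Title = @title
END

EXEC prcListTitle -- uses the default value, 'Tool Designer'
EXEC prcListTitle 'Design Engineer' -- overrides the default value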

The following statement creates a stored procedure displaying the employee ID, the login ID, and
the title of employees that have the same title provided as an input during execution:
CREATE PROC prcListEmployee @title char(50)
AS
BEGIN
PRINT 'List of Employees'
SELECT EmployeeID, LoginID, Title
FROM HumanResources.Employee
WHERE Title = @title
END

You can execute the stored procedure, prcListEmployee, by using the following statement:
EXECUTE prcListEmployee 'Tool Designer'

While executing stored procedures, you can also provide the values for the parameters by
explicitly specifying the name and value of the parameter. In the previous example, you can also
pass the parameter value by using the name of variable, as shown in the following statement:
EXECUTE prcListEmployee @title = 'Tool Designer'

Returning Values from Stored Procedures


Similar to providing input values to the procedures at run time, you can also return values as an
output from the procedures. The values can be returned to the calling application through output
parameters. To specify a parameter as the output parameter, you can use the OUTPUT keyword.

The OUTPUT keyword has to be specified in both the CREATE PROCEDURE and the
EXECUTE statement. If the OUTPUT keyword is omitted, the procedure will be executed but
will not return any value.

The syntax of declaring an output parameter using the OUTPUT keyword is:
CREATE PROCEDURE procedure_name
[
{@parameter data_type} [OUTPUT]
]
AS
sql_statement […n]

where,
@parameter data_type [OUTPUT] allows the stored procedure to pass a data value to the calling
procedure. If the OUTPUT keyword is not used, then the parameter is treated as an input
parameter.

You can also return values from the stored procedure by using the RETURN statement. The
RETURN statement allows the stored procedure to return only an integer value to the calling
application. The syntax of the RETURN statement is:
RETURN value

where,
value is any integer.

If a value is not specified, the stored procedure returns 0 by default. Conventionally, a return
value of 0 specifies success and a nonzero value specifies failure.

For example, you need to display the details of an employee whose employee ID has been
provided as an input. For this, you need to create a procedure prcGetEmployeeDetail that will
accept employee ID as an input and return the department name and ID of the shift in which the
employee works. You can create the procedure, as shown in the following statement:
CREATE PROCEDURE prcGetEmployeeDetail @EmpId int, @DepName char(50) OUTPUT, @ShiftId
int OUTPUT
AS
BEGIN
IF EXISTS(SELECT * FROM HumanResources.Employee WHERE EmployeeID = @EmpId)
BEGIN
SELECT @DepName = d.Name, @ShiftId = h.ShiftID
FROM HumanResources.Department d JOIN
HumanResources.EmployeeDepartmentHistory h
ON d.DepartmentID = h.DepartmentID
WHERE EmployeeID = @EmpId AND h.Enddate IS NULL
RETURN 0
END
ELSE
RETURN 1
END

In the preceding statement, the prcGetEmployeeDetail procedure accepts the employee ID as an
input parameter and returns the department name and shift ID as the output parameters. The
procedure first checks the existence of the given employee ID. If it exists, the procedure returns
an integer value 0 along with the required details.

Calling a Procedure from Another Procedure


At times, you might need to use the values returned by a procedure in another procedure. For this,
you can execute or call one procedure from another procedure. A procedure that calls or executes
another procedure is known as the calling procedure, and the procedure that is called or executed
by the calling procedure is termed as the called procedure. You can also execute a procedure from
another procedure if you need to use the functionality provided by one into another.

Consider the previous example where the prcGetEmployeeDetail procedure returns the employee
details for a given employee ID. You can create the prcDisplayEmployeeStatus procedure, which
accepts the employee ID of an employee as an input and displays the department name and shift
ID where the employee is working along with the manager ID and the title of the employee. To
perform this task, you need to call the prcGetEmployeeDetail procedure from the
prcDisplayEmployeeStatus procedure, as shown in the following statement:
CREATE PROCEDURE prcDisplayEmployeeStatus @EmpId int
AS
BEGIN
DECLARE @DepName char(50)
DECLARE @ShiftId int
DECLARE @ReturnValue int
EXEC @ReturnValue = prcGetEmployeeDetail @EmpId,
@DepName OUTPUT,
@ShiftId OUTPUT
IF (@ReturnValue = 0)
BEGIN
PRINT 'The details of an employee with ID: ' + convert(char(10),
@EmpId)
PRINT 'Department Name: ' + @DepName
PRINT 'Shift ID: ' + convert(char(1), @ShiftId)
SELECT ManagerID, Title FROM HumanResources.Employee
WHERE EmployeeID = @EmpID
END
ELSE
PRINT 'No records found for the given employee'
END

To execute the prcDisplayEmployeeStatus procedure, you need to execute the following
statement:
EXEC prcDisplayEmployeeStatus 2

SQL Server provides a function, @@ROWCOUNT, which returns the number of rows affected by
the last statement. You can use this statement in the IF construct to check the result of the last
executed statement.
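
For example, the following illustrative statements use @@ROWCOUNT to check whether an UPDATE
statement affected any rows (the department ID used here is only an assumed value):
UPDATE HumanResources.Department
SET Name = 'Tool Design'
WHERE DepartmentID = 100
IF (@@ROWCOUNT = 0)
PRINT 'No rows were updated'

If no row matches the given department ID, @@ROWCOUNT returns 0 and the message is printed.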

Activity: Creating Stored Procedures

Problem Statement
You are a database developer of AdventureWorks, Inc. The Human Resource department needs
to revise the payment details of the employees. You need to create a procedure that obtains the
percentage value by which you need to increase the pay rate. In addition, you need to ensure that
the pay is revised for only those employees whose pay rate was not revised in the last six months.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a stored procedure.
2. Execute the stored procedure.
3. Verify the result.

Task 1: Creating a Stored Procedure

To create a stored procedure, you need to perform the following steps:


1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
CREATE PROC PayRateIncrease @EmpID int, @percent float
AS
BEGIN
DECLARE @maxRate float
DECLARE @RevisedRate float
DECLARE @PayFre int
IF EXISTS (SELECT *
FROM HumanResources.EmployeePayHistory
WHERE Datediff(mm, RateChangeDate, getdate()) > 6 AND EmployeeID = @EmpID)
BEGIN
SELECT @maxRate = Rate
FROM HumanResources.EmployeePayHistory
WHERE EmployeeID = @EmpID
IF (@maxRate * (1 + @percent / 100) > 200.00)
BEGIN
PRINT 'Rate of an employee cannot be greater than 200.00'
END
ELSE
BEGIN
SELECT @RevisedRate = Rate, @PayFre = PayFrequency
FROM HumanResources.EmployeePayHistory
WHERE EmployeeID = @EmpID
SET @RevisedRate = @RevisedRate * (1 + @percent / 100)
INSERT INTO HumanResources.EmployeePayHistory
VALUES (@EmpID, getdate(), @RevisedRate, @PayFre,
getdate())
END
END
END

2. Press the F5 key to compile the stored procedure.

Task 2: Executing the Stored Procedure

Execute the following statement to run the stored procedure:


EXEC PayRateIncrease 6, 2

Task 3: Verifying the Result

Execute the following query to verify the result of the stored procedure:
SELECT * FROM HumanResources.EmployeePayHistory WHERE EmployeeID = 6 ORDER BY
ModifiedDate desc

The preceding query displays the output, as shown in the following figure.

Output of the Query

In the preceding figure, you can see that the rate of the employee has been updated by two percent
on the current date.

Implementing Functions
Similar to stored procedures, you can also create functions to store a set of T-SQL statements
permanently. These functions are also referred to as user-defined functions (UDFs). A UDF is a
database object that contains a set of T-SQL statements, accepts parameters, performs an action,
and returns the result of that action as a value. The return value can be either a single scalar value
or a result set.

UDFs have a limited scope as compared to stored procedures. You can create functions in
situations when you need to implement a programming logic that does not involve any permanent
changes to the database objects outside the function. For example, you cannot modify a database
table from a function.

UDFs are of different types, scalar functions and table-valued functions. As a database developer,
you must learn to create and manage different types of UDFs.

Creating UDFs
A UDF contains the following components:

Function name with optional schema/owner name
Input parameter name and data type
Options applicable to the input parameter
Return parameter data type and optional name
Options applicable to the return parameter
One or more T-SQL statements

To create a function, you can use the CREATE FUNCTION statement. The syntax of the
CREATE FUNCTION statement is:
CREATE FUNCTION [ schema_name. ] function_name
( [ { @parameter_name [ AS ][ type_schema_name. ] parameter_data_type
[ = default ] }
[ ,...n ]
]
)
RETURNS return_data_type
[ WITH <function_option> [ ,...n ] ]
[ AS ]
BEGIN
function_body
RETURN expression
END
[ ; ]

where,
schema_name is the name of the schema to which the UDF belongs.
function_name is the name of the UDF. Function names must comply with the rules for
identifiers and must be unique within the database and to its schema.
@parameter_name is a parameter in the UDF. One or more parameters can be declared.
[ type_schema_name. ] parameter_data_type is the data type of the parameter, and optionally
the schema to which it belongs.
[ = default ] is a default value for the parameter.
return_data_type is the return value of a scalar user-defined function.
function_body specifies a series of T-SQL statements.

UDFs can be of two types, scalar and table-valued functions. The definition of each function is
different. Therefore, it is important to learn how to create each type of function.

Creating Scalar Functions

Scalar functions accept zero or more parameters and return a single data value of the type
specified in the RETURNS clause. A scalar function can return any data type except text, ntext, image, cursor,
and timestamp. Some scalar functions, such as current_timestamp, do not require any arguments.

A function contains a series of T-SQL statements defined in a BEGIN…END block of the function
body that returns a single value.
For example, consider a scalar function that calculates the monthly salary of employees. This
function accepts the pay rate and returns a single value after multiplying the rate with the number
of hours and number of days. You can create this function by using the following statement:
CREATE FUNCTION HumanResources.MonthlySal (@PayRate float)
RETURNS float
AS
BEGIN
RETURN (@PayRate * 8 * 30)
END

You can execute the preceding function by using the following statements:
DECLARE @PayRate float
SET @PayRate = HumanResources.MonthlySal(12.25)
PRINT @PayRate

In the preceding statements, @PayRate is a variable that will store a value returned by the
MonthlySal function.
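
Because a scalar function returns a single value, it can also be used directly in the select list
of a query. For example, the following illustrative query evaluates the function once for each
row, passing the Rate column as the input value:
SELECT EmployeeID, Rate, HumanResources.MonthlySal(Rate) AS MonthlySalary
FROM HumanResources.EmployeePayHistory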

Creating Table-Valued Functions

A table-valued function returns a table as an output, which can be derived as part of a SELECT
statement. Table-valued functions return the output as a table data type. The table data type is a
special data type used to store a set of rows. Table-valued functions are of the following types:

Inline table-valued function
Multistatement table-valued function

Inline Table-Valued Function

An inline table-valued function returns a variable of a table data type from the result set of a single
SELECT statement. An inline function does not contain a function body within the BEGIN and
END statements.

For example, the inline table-valued function, fx_Department_GName, accepts a group name as
a parameter and returns the details of the departments that belong to the group from the
Department table. You can create the function by using the following statement:
CREATE FUNCTION fx_Department_GName( @GrName nvarchar(20) )
RETURNS table
AS
RETURN (
SELECT *
FROM HumanResources.Department
WHERE GroupName=@GrName
)
GO
You can use the following query to execute the fx_Department_GName function with a specified
argument:
SELECT * FROM fx_Department_GName('Manufacturing')

The preceding query will return a result set, as shown in the following figure.

Output of the Function

Consider another example of an inline function that accepts rate as a parameter and returns all the
records that have a rate greater than the parameter value. You can create this function, as shown
in the following statement:
CREATE FUNCTION HumanResources.Emp_Pay(@Rate int)
RETURNS table
AS
RETURN (
SELECT e.EmployeeID, e.Title, er.Rate
FROM HumanResources.Employee AS e
JOIN HumanResources.EmployeePayHistory AS er
ON e.EmployeeID=er.EmployeeID WHERE er.Rate>@Rate
)
GO

In the preceding statement, the Emp_Pay function will return a result set that displays all the
records of the employees who have a pay rate greater than the parameter value. You can execute the
preceding function by using the following query:
SELECT * FROM HumanResources.Emp_Pay(50)

Multistatement Table-Valued Function

A multistatement table-valued function uses multiple statements to build the table that is returned
to the calling statement. The function body contains a BEGIN…END block, which holds a series
of T-SQL statements to build and insert rows into a temporary table. The temporary table is
returned in the result set and is created based on the specification mentioned in the function.

For example, the multistatement table-valued function, PayRate, is created to return a set of
records from the EmployeePayHistory table. You can create this function by using the following
statement:
CREATE FUNCTION PayRate (@rate money)
RETURNS @table TABLE
(EmployeeID int NOT NULL,
RateChangeDate datetime NOT NULL,
Rate money NOT NULL,
PayFrequency tinyint NOT NULL,
ModifiedDate datetime NOT NULL)
AS
BEGIN
INSERT @table
SELECT *
FROM HumanResources.EmployeePayHistory
WHERE Rate > @rate
RETURN
END

In the preceding statement, the function returns a result set in the form of a table variable, @table,
created within the function. You can execute the function by using the following query:
SELECT * FROM PayRate(45)

Depending on the result set returned by a function, a function can be categorized as
deterministic or nondeterministic. Deterministic functions always return the same result
whenever they are called with a specific set of input values. However, nondeterministic
functions may return different results each time they are called with a specific set of input
values.
An example of a deterministic function is dateadd(), which returns the same result for any given set
of argument values for its three parameters. getdate() is a nondeterministic function because it is
always invoked without any argument, but the return value changes on every execution.
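
The following statements illustrate the difference:
SELECT dateadd(dd, 1, '2005-01-01') -- always returns the same date for these arguments
SELECT getdate() -- returns a different value on each execution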

Just a minute:
Which type of functions returns a single value?

Answer:
Scalar functions

Activity: Creating Functions

Problem Statement

As a database developer at AdventureWorks, Inc., you need to create a function that accepts the
employee ID of an employee and returns the following details:

Employee ID
Name of the employee
Title of the employee
Number of other employees working under the employee

How will you create the function?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a function.
2. Execute the function to verify the result.

Task 1: Creating a Function

To create a function, you need to perform the following steps:


1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
CREATE FUNCTION EmployeeDetails (@Eid AS int)
RETURNS TABLE
AS
RETURN
(
SELECT t1.EmployeeID, t2.FirstName AS 'Name',
t1.Title, (SELECT count(Employeeid) FROM HumanResources.Employee
WHERE Managerid = @Eid) AS 'Number of Employees' FROM
HumanResources.Employee t1
JOIN Person.Contact t2 ON t1.ContactID = t2.ContactID WHERE
t1.EmployeeID = @Eid
GROUP BY t2.FirstName, t1.EmployeeID, t1.title
)

2. Press the F5 key to create the function.

Task 2: Executing the Function to Verify the Result

To verify the result of the function, execute the following query:


SELECT * FROM EmployeeDetails(16)

The preceding query displays the output, as shown in the following figure.

Output of the Function


The preceding figure displays the employee ID, name, title, and number of employees working
under the employee with employee ID as 16.

Summary
In this chapter, you learned that:

A batch is a set of SQL statements submitted together to the server for execution.
You can use a variable to store a temporary value.
You can use the PRINT statement to display a user-defined message or the content of a
variable on the screen.
You can use the comment entries in batches to write a description of the code.
You can use the IF…ELSE statement for conditional execution of the SQL statements.
The CASE statement evaluates a list of conditions and returns one of the various possible
results.
You can use the WHILE statement in a batch to allow a set of T-SQL statements to execute
repeatedly as long as the given condition holds true.
The BREAK statement causes an exit from the WHILE loop.
The CONTINUE statement causes the WHILE loop to restart, skipping any statements after
the CONTINUE statement within the loop.
Two ways of handling errors in a batch are:
 TRY…CATCH
 RAISERROR
A stored procedure is a collection of various T-SQL statements that are stored under one
name and are executed as a single unit.
A stored procedure can be created by using the CREATE PROCEDURE statement.
A stored procedure allows you to declare parameters, variables, and use T-SQL statements
and programming logic.
A stored procedure provides better performance, security, and accuracy, and reduces the
network congestion.
A stored procedure accepts data through input parameters.
A stored procedure returns data through the output parameters or return statements.
A stored procedure is executed by using the EXECUTE statement.
A stored procedure can be altered by using the ALTER PROCEDURE statement.
A user-defined function is a database object that contains a set of T-SQL statements.
The user-defined functions can return either a single scalar value or a result set.
UDFs are of two types: scalar functions and table-valued functions.
A scalar function accepts a single value and returns a single value.
A table-valued function returns a table as an output, which can be derived as part of a SELECT
statement.

Exercises

Exercise 1
Create a batch that finds the average pay rate of the employees and then lists the details of
employees who have a pay rate less than the average pay rate.

Exercise 2

Create a function that returns the shipment date of a particular order.

Exercise 3

Create a function that returns the credit card number for a particular order.

Exercise 4

Create a function that returns a table containing the ID and the name of the customers who are
categorized as individual customers (CustomerType = 'I'). The function will take one parameter.
The parameter value can be either Shortname or Longname. If the parameter value is Shortname,
only the last name of the customer will be retrieved. If the value is Longname, then the full name
will be retrieved.

Chapter 8
Working with Triggers and Transactions
In a relational database, data in a table is related to other tables. Therefore, while manipulating
data in one table, you need to verify and validate its effect on data in the related tables. In addition,
after inserting or updating data in the table, you might need to manipulate data in another table.
You also need to ensure that if an error occurs while updating the data in a table, the changes are
reverted. This helps in maintaining data integrity. SQL Server allows you to implement triggers
and transactions to maintain data integrity.

This chapter explains different types of triggers that can be created in SQL Server 2005. Next, it
discusses how to implement triggers to enforce data integrity. Further, it explains how to
implement transactions.

Objectives
In this chapter, you will learn to:
Implement triggers
Implement transactions

Implementing Triggers
At times, while performing data manipulation on a database object, you might also need to
perform manipulation on another object. For example, in an organization, the employees use the
Online Leave Approval system to apply for leaves. When an employee applies for a leave, the
leave details are stored in the LeaveDetails table. In addition, a new record is added to the
LeavesForApproval table. When the supervisors log on to the system, all the leaves pending for
their approval are retrieved from the LeavesForApproval table and displayed to them.
To perform such operations, SQL Server allows you to implement triggers. A trigger consists of
a set of T-SQL statements activated in response to certain actions, such as insert or delete. Triggers
are used to ensure data integrity before or after performing data manipulations. Therefore, a trigger
is a special kind of stored procedure that executes in response to specific events.

Before you implement a trigger, it is important to know different types of triggers that can be
created by using SQL Server 2005.

Identifying Types of Triggers


In SQL Server, various kinds of triggers can be used for different types of data manipulation
operations. Triggers can be nested and fired recursively. SQL Server supports the following types
of triggers:

Data Manipulation Language (DML) triggers
Data Definition Language (DDL) triggers

DML Triggers

A DML trigger is fired when data in the underlying table is affected by DML statements, such as
INSERT, UPDATE, or DELETE. These triggers help in maintaining consistent, reliable, and
correct data in tables. They enable the process of reflecting the changes made in a table to other
related tables.

The DML triggers have the following characteristics:

Fired automatically by SQL Server whenever any data modification statement is issued.
Cannot be explicitly invoked or executed, as in the case of the stored procedures.
Prevents incorrect, unauthorized, and inconsistent changes in data.
Cannot return data to the user.
Can be nested up to 32 levels. The nesting of triggers occurs when a trigger performs an
action that initiates another trigger.

Whenever a trigger is fired in response to the INSERT, DELETE, or UPDATE statement, SQL
Server creates two temporary tables, called magic tables. The magic tables are called Inserted and
Deleted. These are logical tables and are similar in structure to the table on which the trigger is
defined.

The Inserted table contains a copy of all the records that are inserted in the trigger table. A trigger
table refers to the table on which the trigger is defined. The Deleted table contains all records that
have been deleted from the trigger table. Whenever you update data in a table, the trigger uses
both the inserted and the deleted tables.

Depending on the operation that is performed, DML triggers can be further categorized as:

Insert trigger: Is fired whenever an attempt is made to insert a row in the trigger table. When
an INSERT statement is executed, the newly inserted rows are added to the Inserted table.
Delete trigger: Is fired whenever an attempt is made to delete a row from the trigger table.
When a DELETE statement is executed, deleted rows are added to the Deleted table. The
Deleted table and the trigger table do not have any rows in common, unlike the Inserted table,
whose rows also exist in the trigger table.
There are three ways of implementing referential integrity by using a delete trigger. These are:
 The cascade method: Deletes records from the dependent tables whenever a record is
deleted from the master table.
 The restrict method: Restricts the deletion of records from the master table if the related
records are present in the dependent tables.
 The nullify method: Nullifies the values in the specified columns of the dependent tables
whenever a record is deleted from the master table.
Update trigger: Is fired when an UPDATE statement is executed in the trigger table. It uses
the two logical tables for its operations. The deleted table contains the original rows (the
rows with the values before updating) and the inserted table stores the new rows (the
modified rows). After all the rows are updated, the Deleted and Inserted tables are populated
and the trigger is fired.

For example, you have a table with three columns. The table stores the details of hardware devices.
You updated a value in Column2 from ‘Printer’ to ‘Lex New Printer’. During the update process,
the Deleted table holds the original row (the row with the values before updating) with the value
‘Printer’ in Column2. The Inserted table stores the new row (the modified row) with the value
‘Lex New Printer’ in Column2.

The following figure illustrates the functioning of the update trigger.

Functioning of the Update Trigger

Depending on the way the triggers are fired, the triggers can be further categorized as:

After Triggers
Instead of Triggers
After Triggers

The after trigger can be created on any table for the insert, update or delete operation just like
other triggers. The main difference in the functionality of an after trigger is that it is fired after the
execution of the DML operation for which it has been defined. The after trigger is executed when
all the constraints and triggers defined on the table are successfully executed.

If more than one after trigger is created on a table, then the sequence of execution is the order in
which they were created.

For example, the EmpSalary table stores the salary and tax details for all the employees in an
organization. You need to ensure that after the salary details of an employee are updated in the
EmpSalary table, the tax details are also recalculated and updated. In such a scenario, you can
implement an after trigger to update the tax details when the salary details are updated.
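
The following statements sketch such a trigger, assuming that the EmpSalary table contains the
EmployeeID, Salary, and Tax columns and that tax is calculated at a flat 30 percent of the salary
(the trigger name, table structure, and tax rate are all illustrative):
CREATE TRIGGER trgUpdateTax
ON EmpSalary
AFTER UPDATE
AS
BEGIN
UPDATE e
SET Tax = i.Salary * 0.30
FROM EmpSalary e
JOIN inserted i ON e.EmployeeID = i.EmployeeID
END

The trigger uses the inserted logical table to identify the rows whose salary was modified and
recalculates the tax for only those rows.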

You can have multiple after triggers for any single DML operation.

Instead of Triggers

The instead of triggers can be primarily used to perform an action, such as a DML operation on
another table or view. This type of trigger can be created on both, a table as well as a view. For
example, if a view is created with multiple columns from two or more tables, then an insert
operation on the view is only possible if the primary key fields from all the base tables are used
in the query. Alternatively, if you use an instead of trigger, you can insert data in the base tables
individually. This makes the view logically updateable.

An instead of trigger can be used for the following actions:

Ignoring parts of a batch.


Not processing a part of a batch and logging the problem rows.
Taking an alternative action when an error condition is encountered.

You can even create an instead of trigger to restrict deletion in a master table. For example, you
can display a message “Master records cannot be deleted” if a delete statement is executed on the
Employee table of the AdventureWorks database.

Unlike after triggers, you cannot create more than one instead of trigger for a DML operation on
the same table or view.

DDL Triggers

A DDL trigger is fired in response to DDL statements, such as CREATE TABLE or ALTER
TABLE. DDL triggers can be used to perform administrative tasks, such as database auditing.
Database auditing helps in monitoring DDL operations on a database. DDL operations include
the creation of a table or view, and the modification of a table or procedure. Consider
an example, where you want the database administrator to be notified whenever a table is created
in the Master database. For this purpose, you can create a DDL trigger.
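
A minimal sketch of such a trigger is shown below. Here, a simple PRINT statement stands in for the actual notification mechanism, which would depend on how the administrator is to be alerted:
CREATE TRIGGER trgNotifyCreateTable
ON DATABASE
FOR CREATE_TABLE
AS
PRINT 'A new table has been created in this database.'

Creating this trigger in the Master database causes it to fire whenever a CREATE TABLE statement is executed there.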

Nested Triggers
Nested triggers are fired due to actions of other triggers. Both DML and DDL triggers can be
nested when a trigger performs an action that initiates another trigger. For example, a delete trigger
on the Department table deletes the corresponding employee records in the Employee table. A
delete trigger on the Employee table inserts the deleted employee records in the EmployeeHistory
table. Therefore, the delete trigger on the Department table initiates the delete trigger on the
Employee table, as shown in the following figure.

Nested Trigger
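
The chain of triggers described above can be sketched as follows. The simplified Department, Employee, and EmployeeHistory tables, with a DepartmentID column linking the first two, are assumptions for illustration:
CREATE TRIGGER trgDeleteDept ON Department
AFTER DELETE
AS
DELETE FROM Employee
WHERE DepartmentID IN (SELECT DepartmentID FROM Deleted)
GO
CREATE TRIGGER trgDeleteEmp ON Employee
AFTER DELETE
AS
INSERT INTO EmployeeHistory
SELECT * FROM Deleted
GO

Deleting a department row fires trgDeleteDept, whose DELETE statement in turn fires trgDeleteEmp on the Employee table.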

Recursive Triggers

Recursive triggers are a special case of nested triggers. Unlike nested triggers, support for
recursive triggers is at the database level. As the name implies, a recursive trigger eventually calls
itself. Recursive triggers are of the following types:

Direct
Indirect

Direct Recursive Trigger

When a trigger fires and performs an action that causes the same trigger to fire again, the trigger
is called a direct recursive trigger. For example, an application updates the table A, which causes trigger
T1 to fire. T1 updates the table A again, which causes the trigger T1 to fire again.

The following figure depicts the execution of the direct recursive trigger.

Direct Recursive Trigger
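
Direct recursion occurs only when the RECURSIVE_TRIGGERS database option is enabled; it is off by default. You can turn it on for a database with the following statement:
ALTER DATABASE AdventureWorks
SET RECURSIVE_TRIGGERS ON

With this option off, a trigger that modifies its own table does not fire itself again.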


Indirect Recursive Trigger

An indirect recursive trigger fires a trigger on another table and eventually the nested trigger ends
up firing the first trigger again. For instance, an UPDATE statement on table A fires a trigger that
in turn fires an update on table B. The update on table B fires another trigger that performs an
update on table C. Table C has a trigger that causes an update on table A again. The update trigger
of table A is fired again. The following figure depicts the execution of the indirect recursive
trigger.

Indirect Recursive Trigger

Just a minute:
You want to make changes in another database object whenever any new database object is
created. Which of the following triggers will you use?

1. DML Trigger
2. Instead of Trigger
3. DDL Trigger
4. Nested Trigger

Answer:
3. DDL Trigger

Creating Triggers
You can use the CREATE TRIGGER statement to create triggers. The syntax of the CREATE
TRIGGER statement is:
CREATE TRIGGER trigger_name
ON { table_name | view_name | DATABASE }
{ FOR | AFTER | INSTEAD OF } { event_type [ ,...n ] | DDL_DATABASE_LEVEL_EVENTS }
{ AS
{ sql_statement [ ...n ] }
}

where,
trigger_name specifies the name of the trigger to be created.
table_name or view_name specifies the table or view on which a DML trigger is to be created; DATABASE scopes a DDL trigger to the current database.
FOR | AFTER | INSTEAD OF specifies the precedence and execution context of a trigger.
AS sql_statements specifies the trigger conditions and actions. A trigger can contain any
number of T-SQL statements, provided these are enclosed within the BEGIN and END
keywords.

For example, the following statement creates a trigger on the Department table of the
AdventureWorks database:
CREATE TRIGGER [HumanResources].[trgDepartment] ON [HumanResources].[Department]
AFTER UPDATE AS
BEGIN
UPDATE [HumanResources].[Department]
SET [HumanResources].[Department].[ModifiedDate] = GETDATE()
FROM Inserted
WHERE Inserted.[DepartmentID] = [HumanResources].[Department].[DepartmentID];
END;

The preceding statement creates a trigger named trgDepartment. This trigger is fired on every
successfully executed UPDATE statement on the HumanResources.Department table. The trigger
updates the ModifiedDate column of every updated value with the current date.

The following statement creates a trigger to display the data that is inserted in the magic tables:
CREATE TRIGGER trgMagic ON EmpDeptHistory
AFTER UPDATE AS
BEGIN
SELECT * FROM Deleted
SELECT * FROM Inserted
END;

The preceding statement creates an after update trigger on the EmpDeptHistory table. Whenever
any UPDATE statement is fired on the EmpDeptHistory table, the trgMagic trigger is executed
and shows you the previous value in the table as well as the updated value. For example, you
execute the following UPDATE statement on the EmpDeptHistory table:
UPDATE EmpDeptHistory SET DepartmentID = 16
WHERE EmployeeID = 4

When the preceding statement is executed, the trgMagic trigger is fired displaying the output, as
shown in the following figure.
Output of the trgMagic Trigger

In the preceding figure, the result set on the top shows the values before the execution of the
UPDATE statement. The result set at the bottom shows the updated values.

Use the following statement to create the EmpDeptHistory table:


SELECT * INTO EmpDeptHistory
FROM HumanResources.EmployeeDepartmentHistory

Creating an Insert Trigger

An insert trigger gets fired at the time of adding new rows in the trigger table. For example, users
at AdventureWorks, Inc. want the modified date to be set to the current date whenever a new
record is entered in the Shift table. To perform this task, you can use the following statement:
CREATE TRIGGER HumanResources.trgInsertShift
ON HumanResources.Shift
FOR INSERT
AS
DECLARE @ModifiedDate datetime
SELECT @ModifiedDate = ModifiedDate FROM Inserted
IF (@ModifiedDate != getdate())
BEGIN
PRINT 'The modified date should be the current date.
Hence, cannot insert.'
ROLLBACK TRANSACTION
END
RETURN

The ROLLBACK TRANSACTION statement is used to roll back transactions. The ROLLBACK
TRANSACTION statement in the trgInsertShift trigger is used to undo the INSERT operation.

Creating a Delete Trigger

A delete trigger gets fired at the time of deleting rows from the trigger table. For example, the
following statement creates a trigger to disable the deletion of rows from the Department table:
CREATE TRIGGER trgDeleteDepartment
ON HumanResources.Department
FOR DELETE
AS
PRINT 'Deletion of Department is not allowed'
ROLLBACK TRANSACTION
RETURN

Creating an Update Trigger


An update trigger gets fired at the time of updating records in the trigger table. For example, you
need to create a trigger to ensure that the average of the values in the Rate column of the
EmployeePayHistory table should not be more than 20 when the value of Rate is increased. To
perform this task, you can use the following statement:
CREATE TRIGGER trgUpdateEmployeePayHistory
ON HumanResources.EmployeePayHistory
FOR UPDATE
AS
IF UPDATE (Rate)
BEGIN

DECLARE @AvgRate float


SELECT @AvgRate = AVG(Rate)
FROM HumanResources.EmployeePayHistory
IF(@AvgRate > 20)
BEGIN
PRINT 'The average value of rate cannot be more than 20'
ROLLBACK TRANSACTION
END
END

Creating an After Trigger

An after trigger gets fired after the execution of a DML statement. For example, you need to
display a message after a record is deleted from the Employee table. To perform this task, you can
write the following statement:
CREATE TRIGGER trgDeleteShift ON HumanResources.Shift
AFTER
DELETE
AS
PRINT 'Deletion successful'

If there are multiple after triggers for a single DML operation, you can change the sequence of
execution of these triggers by using the sp_settriggerorder system stored procedure. The syntax
of the sp_settriggerorder stored procedure is:
sp_settriggerorder <triggername>, <order-value>, <DML-operation>

where,
triggername specifies the name of the trigger whose order of execution needs to be changed.
order-value specifies the order in which the trigger needs to be executed. The values that can
be entered are FIRST, LAST, and NONE. If FIRST is specified, the trigger is the first to be
executed; if LAST is specified, the trigger is the last to be executed. If NONE is specified,
the trigger is executed in an undefined order.
DML-operation specifies the DML operation for which the trigger was created. This should
match the DML operation associated with the trigger. For example, if UPDATE is specified
for a trigger that is created for the INSERT operation, the sp_settriggerorder stored procedure
will generate an error.
For example, you have created another after trigger, trgDeleteShift1 on the Shift table. By default,
triggers are executed in the sequence of their creation. However, if you need to execute the trigger
named trgDeleteShift1 before the trgDeleteShift trigger, you can execute the following statement:
sp_settriggerorder 'HumanResources.trgDeleteShift1', 'FIRST', 'DELETE'

Creating an Instead of Trigger

Instead of triggers are executed in place of the events that cause the trigger to fire. For example,
if you create an INSTEAD OF UPDATE trigger on a table, the statements specified in the trigger
will be executed instead of the UPDATE statement that caused the trigger to fire.

These triggers are executed after the inserted and deleted tables reflecting the changes to the base
table are created, but before any other action is taken. They are executed before any constraint,
and therefore, supplement the action performed by a constraint. You can create the following
instead of trigger to restrict the deletion of records in the Project table:
CREATE TRIGGER trgDelete ON HumanResources.Project
INSTEAD OF DELETE
AS
PRINT 'Project records cannot be deleted'

Creating a DDL Trigger

DDL triggers are a special kind of triggers that fire in response to the DDL statements. They can
be used to perform administrative tasks in the database such as auditing and regulating database
operations. The following statement creates a DDL trigger:
CREATE TRIGGER safety
ON DATABASE
FOR DROP_TABLE, ALTER_TABLE
AS
PRINT 'You must disable Trigger "safety" to drop or alter tables!'
ROLLBACK

In the preceding statement, the safety trigger will fire whenever a DROP_TABLE or
ALTER_TABLE event occurs in the database.

To check the existence of a trigger, use the following syntax: sp_help <trigger_name>

Managing Triggers
While managing triggers, you can perform the following operations on a trigger:

Alter a trigger
Delete a trigger

Altering a Trigger
As a database developer, you might need to modify the logic or code behind a trigger. For
example, a trigger is used to calculate a 10 percent discount on every item sold. With the new
management policy, the discount rate has been increased to 15 percent. To reflect this change in
the trigger, you need to change the code in the trigger. You can use the ALTER TRIGGER
statement to modify the trigger. The syntax of the ALTER TRIGGER statement is:
ALTER TRIGGER trigger_name
ON { table_name | view_name | DATABASE }
{ FOR | AFTER | INSTEAD OF } { event_type [ ,...n ] | DDL_DATABASE_LEVEL_EVENTS }
{ AS
{ sql_statement [ ...n ] }
}

For example, when an employee resigns or is transferred from one department to another, the end
date is updated in the EmployeeDepartmentHistory table. After the end date is updated, the
ModifiedDate attribute of the EmployeeDepartmentHistory table should be updated to the current
date.

You can modify the trgInsertShift trigger that was created earlier to check whether the
ModifiedDate attribute is the current date or not. If the ModifiedDate attribute is not the current
date, the trigger should display a message, “The modified date is not the current date. The
transaction cannot be processed.” To modify the trgInsertShift trigger, you need to execute the
following statement:
ALTER TRIGGER HumanResources.trgInsertShift
ON HumanResources.Shift
FOR INSERT
AS
DECLARE @ModifiedDate datetime
SELECT @ModifiedDate = ModifiedDate FROM Inserted
IF (@ModifiedDate != getdate())
BEGIN
RAISERROR ('The modified date is not the current date. The transaction cannot be
processed.',10, 1)
ROLLBACK TRANSACTION
END
RETURN

Deleting a Trigger

As the requirements change, you may need to delete some triggers. For example, you have a
trigger that updates the salaries of the employees in their pay slips. Now, this function is performed
by a front-end application in your organization. Therefore, you need to remove the trigger.

To delete a trigger, you can use the DROP TRIGGER statement. The syntax of the DROP
TRIGGER statement is:
DROP TRIGGER { trigger }

where,
trigger is the name of the trigger you want to drop.

The following statement drops the HumanResources.trgMagic trigger:


DROP TRIGGER HumanResources.trgMagic
Just a minute:
Name the tables that are created when a trigger is fired in response to the INSERT, DELETE
or UPDATE statements.

Answer:
Magic tables, Inserted and Deleted

Activity: Implementing Triggers

Problem Statement

In AdventureWorks, Inc., you have created the following view, vwEmployee to view the
employee details:
CREATE VIEW vwEmployee AS
SELECT e.EmployeeID AS 'Employee ID',
h.FirstName as 'Employee Name', g.Name AS 'Department Name',
e.HireDate AS 'Date of Joining', j.AddressLine1 AS 'Employee Address'
FROM HumanResources.Employee AS e
JOIN HumanResources.EmployeeDepartmentHistory AS f ON
e.EmployeeID = f.EmployeeID JOIN HumanResources.Department AS g
ON f.DepartmentID = g.DepartmentID
JOIN Person.Contact AS h ON e.ContactID = h.ContactID
JOIN HumanResources.EmployeeAddress AS i ON
e.EmployeeID = i.EmployeeID JOIN Person.Address AS j
ON i.AddressID = j.AddressID

You have identified that you are not able to modify data by using this view because it is based on
multiple tables. How can you make the view updateable?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create an instead of trigger on the view.


2. Verify the functionality.

Task 1: Creating an Instead Of Trigger on the View

If a view is based on multiple tables, you cannot modify data in all the base tables by using the
view. To do this, you need to use an instead of trigger. To create this trigger, you need to perform
the following steps:
1. Type the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
CREATE TRIGGER trgEmployee ON vwEmployee
INSTEAD OF
UPDATE AS
BEGIN
UPDATE Person.Contact SET FirstName = (SELECT [Employee Name] FROM Inserted)
WHERE ContactID = (SELECT ContactID FROM HumanResources.Employee
WHERE EmployeeID = (SELECT [Employee ID] FROM Inserted))
UPDATE HumanResources.EmployeeDepartmentHistory SET DepartmentID = (
SELECT DepartmentID FROM HumanResources.Department WHERE Name = (SELECT
[Department Name] FROM Inserted)) WHERE
EmployeeID = (SELECT [Employee ID] FROM Inserted)
END

2. Press the F5 key to execute the statement.

Task 2: Verifying the Functionality

To verify the functionality of the instead of trigger, perform the following steps:

1. Execute the following statement to update the Name and Department Name of an
employee:

UPDATE vwEmployee
SET [Employee Name] = 'Ron', [Department Name] = 'Sales'
WHERE [Employee ID] = 51

2. Execute the following query to view the result:

SELECT * FROM vwEmployee WHERE [Employee ID] = 51


Both the values are updated in the result set, as shown in the following figure.

Output of the View with Updated Values

Implementing Transactions
At times, you are required to execute a sequence of statements as a single logical unit of work.
For example, whenever you transfer money from one bank account to another account, the amount
is debited from one account and credited to another account. In such a situation, you need either
all the statements to be executed successfully or none of them to be executed. This helps in
ensuring data integrity.

In SQL Server 2005, you can implement transactions to ensure data integrity. In a multi-user
environment, there can be multiple transactions accessing the same resource at the same time. To
prevent errors that could occur due to transactions accessing the same resource, you can use locks.
Locks provide a mechanism to secure a resource for the duration of a transaction so that only one
transaction can work on a database resource at a time.

Creating Transactions
A transaction can be defined as a sequence of operations performed together as a single logical
unit of work. A single unit of work must possess the following properties called ACID (Atomicity,
Consistency, Isolation, and Durability).

Atomicity: This states that either all the data modifications are performed or none of them
are performed.
Consistency: This states that all data is in a consistent state after a transaction is completed
successfully. All rules in a relational database must be applied to the modifications in a
transaction to maintain complete data integrity.
Isolation: This states that any data modification made by concurrent transactions must be
isolated from the modifications made by other concurrent transactions. In simpler words, a
transaction either accesses data in the state in which it was before a concurrent transaction
modified it, or accesses the data after the second transaction has been completed. There is
no scope for the transaction to see an intermediate state.
Durability: This states that any change in data by a completed transaction remains
permanently in effect in the system. Therefore, any change in data due to a completed
transaction persists even in the event of a system failure. This is ensured by the concept of
backing up and restoring transaction logs.

It is important that a database system provides mechanisms to ensure the physical integrity of each
transaction. To fulfill the requirements of the ACID properties, SQL Server provides the following
features:

Transaction management: Ensures the atomicity and consistency of all transactions. A


transaction must be successfully completed after it has started, or SQL Server undoes all the
data modifications made since the start of the transaction.
Locking: Preserves transaction durability and isolation.

SQL Server allows implementing transactions in the following ways:

Autocommit transaction
Implicit transaction
Explicit transaction

Autocommit Transaction

The autocommit transaction is the default transaction management mode of SQL Server. Based
on the completeness of every T-SQL statement, transactions are automatically committed or rolled
back. A statement is committed if it is completed successfully, and it is rolled back if it encounters
an error.

Implicit Transaction
An implicit transaction is one where you do not need to define the start of the transaction. You
are only required to commit or roll back the transaction. You also need to turn on the implicit
transaction mode to specify the implicit transaction. After you have turned on the implicit
transaction mode, SQL Server starts the transaction when it executes any of the statements listed
in the following table.
ALTER TABLE      CREATE            DELETE
DROP             FETCH             GRANT
INSERT           OPEN              REVOKE
SELECT           TRUNCATE TABLE    UPDATE

Implicit Transaction Statements

The transaction remains in effect until you issue a COMMIT or ROLLBACK statement. After the
first transaction is committed or rolled back, SQL Server starts a new transaction the next time
any of the preceding statements is executed. SQL Server keeps generating a chain of implicit
transactions until you turn off the implicit transaction mode.

You can turn on the implicit transaction mode by using the following statement:
SET IMPLICIT_TRANSACTIONS ON;

For example, consider the following statements:


SET IMPLICIT_TRANSACTIONS ON;

INSERT INTO Emp VALUES ('Jack', 'Marketing');


INSERT INTO Emp VALUES ('Robert', 'Finance');

-- Commit first transaction.


COMMIT TRANSACTION;

-- Second implicit transaction started by a SELECT statement.


SELECT COUNT(*) FROM Emp;
INSERT INTO Emp VALUES ('Peter', 'Sales');
SELECT * FROM Emp;

-- Commit second transaction.


COMMIT TRANSACTION;

In the preceding statements, the implicit transaction mode is turned on. After the first transaction
is committed, the second implicit transaction is started as soon as the SELECT statement is
executed.

You can turn off the implicit transaction mode by using the following statement:
SET IMPLICIT_TRANSACTIONS OFF;

Explicit Transaction
An explicit transaction is one where both the start and the end of the transaction are defined
explicitly. Explicit transactions were called user-defined or user-specified transactions in earlier
versions of SQL Server. Explicit transactions are specified by using the following statements:

BEGIN TRANSACTION: Is used to set the starting point of a transaction.


COMMIT TRANSACTION: Is used to save the changes permanently in the database.
ROLLBACK TRANSACTION: Is used to undo the changes.
SAVE TRANSACTION: Is used to establish save points that allow partial rollback of a
transaction.

You can use the BEGIN TRANSACTION statement to override the default autocommit mode. SQL
Server returns to the autocommit mode when the explicit transaction is committed or rolled back.

Beginning a Transaction

The BEGIN TRANSACTION statement marks the start of a transaction. The syntax of the BEGIN
TRANSACTION statement is:
BEGIN TRAN[SACTION] [transaction_name | @tran_name_variable]

where,
transaction_name is the name assigned to the transaction. This parameter must conform to the
rules for identifiers and should not be more than 32 characters.
@tran_name_variable is the name of a user-defined variable that contains a valid transaction
name. This variable must be declared with a char, varchar, nchar, or nvarchar data type.

Committing a Transaction

The COMMIT TRANSACTION or COMMIT WORK statement marks the end of an explicit
transaction. This statement is used to end a transaction for which no errors were encountered
during the transaction. The syntax of the COMMIT TRANSACTION statement is:
COMMIT [ TRAN[SACTION] [transaction_name | @tran_name_variable] ]

where,
transaction_name is ignored by SQL Server. This parameter specifies a transaction name
assigned by a previous BEGIN TRANSACTION statement. The transaction_name parameter
can be used to indicate the nested BEGIN TRANSACTION statement that is associated with
the COMMIT TRANSACTION statement.
@tran_name_variable is the name of a user-defined variable that contains a valid transaction
name. This variable must be declared with a char, varchar, nchar, or nvarchar data type.

Consider a scenario where a banker named Sally wants to transfer $25,000 from a fixed deposit
account to a savings account. To perform this transaction, the following statements need to be
executed:
UPDATE FixedDepositAccount
SET Balance = Balance - 25000
WHERE AccountName = 'Sally'
UPDATE SavingsAccount
SET Balance = Balance + 25000
WHERE AccountName = 'Sally'

For the preceding statements, either both should be executed successfully or none of them should
be executed. If any of the statements fails to execute, the entire transaction should be rolled back.
Therefore, you need to define the beginning and end of a transaction, as shown in the following
statements:
BEGIN TRAN myTran
UPDATE FixedDepositAccount
SET Balance = Balance - 25000
WHERE AccountName = 'Sally'

UPDATE SavingsAccount
SET Balance = Balance + 25000
WHERE AccountName = 'Sally'
COMMIT TRAN myTran

The preceding statements create a transaction named myTran, which updates the
FixedDepositAccount and the SavingsAccount tables.

Just a minute:
Which of the following properties does a transaction NOT possess?

1. Atomicity
2. Consistency
3. Isolation
4. Separation

Answer:
4. Separation

Reverting Transactions
Sometimes, the statements of a transaction do not all execute successfully. For example, in case
of a power failure between two statements, one statement will be executed and the other will not.
This leaves the transaction in an invalid state. In such a case, you need to undo the statements
that have already executed in order to maintain consistency.

The ROLLBACK TRANSACTION statement rolls back an explicit or implicit transaction to the
beginning of the transaction, or to a save-point within a transaction. A save-point is used to divide
a transaction into smaller parts. Save-points allow you to discard parts of a long transaction instead
of rolling back the entire transaction.

The syntax of the ROLLBACK TRANSACTION statement is:


ROLLBACK [TRAN[SACTION] [transaction_name |@tran_name_variable |savepoint_name |
@savepoint_variable] ]

where,
transaction_name is the name assigned to the transaction. This parameter must conform to the
rules for identifiers and should not be more than 32 characters.
@tran_name_variable is the name of a user-defined variable that contains a valid transaction
name. This variable must be declared with a char, varchar, nchar, or nvarchar datatype.
savepoint_name is the name assigned to the save-point.
@savepoint_variable is the name of a user-defined variable containing a valid save-point
name. This variable must be declared with a char, varchar, nchar, or nvarchar datatype.

For example, while updating the personal details of an employee, you want all the statements to
be rolled back if any query fails. To ensure this, you can write the following statements:
BEGIN TRANSACTION TR1
BEGIN TRY
UPDATE Person.Contact
SET EmailAddress='[email protected]'
WHERE ContactID = 1070
--Statement 1
UPDATE HumanResources.EmployeeAddress SET AddressID = 32533
WHERE EmployeeID = 1
COMMIT TRANSACTION TR1
--Statement 2
SELECT 'Transaction Executed'
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION TR1
SELECT 'Transaction Rollbacked'
END CATCH

In the preceding statements, the TR1 transaction updates the e-mail address of an employee. In
addition, this transaction updates the address of the employee. In the query, the first statement is
executed, but the second statement gives an error due to which the whole transaction is rolled
back.
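
The following sketch shows how a save-point allows a partial rollback. The Emp table and its values are assumptions for illustration:
BEGIN TRANSACTION TR2
INSERT INTO Emp VALUES ('Mary', 'HR')
SAVE TRANSACTION SavePoint1
INSERT INTO Emp VALUES ('John', 'HR')
ROLLBACK TRANSACTION SavePoint1
COMMIT TRANSACTION TR2

The rollback undoes only the second INSERT; the COMMIT then makes the first INSERT permanent.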

Implementing Transactional Integrity


In a multi-user environment, multiple users can access the database server at a single point of
time. Multiple users might also execute the UPDATE or SELECT statements on the same data.
This may lead to data redundancy or incorrectness in the database.

Consider the example of AdventureWorks, Inc. A user sends a request to update all the records in
a table. At the same time, another user might send a query to update data in only selected records
in the same table. In such a case, there are chances of losing information that result in inconsistency
in the database. Such problems can be resolved by using locks.

SQL Server uses the concept of locking to ensure transactional integrity. In a multi-user
environment, locking prevents users from changing the same data at the same time. In SQL Server,
locking is implemented automatically. However, you can also explicitly use locks. Locks are used
to guarantee that the current user of a resource has a consistent view of that resource, from the
beginning to the end of an operation. This ensures completeness of data modifications.

For a transaction-processing database, a DBMS resolves potential conflicts between two different
processes that are attempting to change the same information at the same time.

Transactional concurrency is the ability of multiple transactions to access or change the shared
data at the same time. Transactional concurrency is impacted when a transaction trying to modify
the data prevents other transactions from reading the data.

Need for Locking

In the absence of locking, problems may occur if more than one transaction uses the same data
from a database at the same time. These problems include:

Lost updates
Uncommitted dependency (Dirty Read)
Inconsistent analysis
Phantom reads

Lost Updates

A lost update occurs when two or more transactions try to modify the same row. In this case, each
transaction is unaware of the other transaction. The last update in the transaction queue overwrites
the updates made by the previous transactions. This leads to loss of data manipulation performed
by the previous transactions.

For example, in a banking application, two transactions are simultaneously trying to update the
balance details for a particular account. Both the transactions select data from the table
simultaneously and get the same value for the current balance. When one transaction is committed,
the balance is updated, but the second transaction does not get the updated value. Therefore, when
the second transaction is committed, the changes done by the previous transaction will be lost.
This results in the loss of an update.

Uncommitted Dependency (Dirty Read)

An uncommitted dependency is also known as a dirty read. This problem occurs when a
transaction queries data from a table when the other transaction is in the process of modifying
data.

For example, the details of all the products are stored in the Products table in a database. A user
is executing a query to update the price for all the products. While the changes are being made,
another user generates a report from the same table and distributes it to the intended audience.
The update query finally commits and the table is updated now. In this case, the distributed report
contains data that no longer exists and should be treated as redundant.
This problem is known as the problem of dirty read. To avoid such a problem, you should not
allow any user to read the table until the changes are finalized.

Inconsistent Analysis

An inconsistent analysis problem is also known as a non-repeatable read. This problem arises
when the data changes between two successive reads by the same user.

For example, in a banking application, a user generates a report to display the balance of all the
accounts. The user uses the result set to update data. Next, the user again retrieves the same result
set to reflect the changes. Between the executions of the two queries, another user updates the
original table. When the first user queries the table for the second time, the data has changed. This
leads to inconsistent results.

Phantom Reads

A phantom read is also known as the phantom problem. This problem occurs when rows inserted
by one user become visible to a transaction that started before the INSERT statement. For
example, in a ticket reservation application, a user, User 1, starts a transaction and queries the
number of available seats. The query returns a value X. Simultaneously, another user, User 2,
reserves two of those seats. When the transaction of User 1 then tries to reserve X seats, the
required number of seats is no longer available.

Locking in SQL Server

SQL Server implements multi-granular locking, which allows transactions to lock different types
of resources at different levels. To minimize the effort on locking, SQL Server automatically locks
resources at a level appropriate to the transaction, for example, row level or data page level. For
transactions to access resources, SQL Server resolves a conflict between the concurrent
transactions by using lock modes. SQL Server uses the following lock modes:

Shared locks
Exclusive locks
Update locks
Intent locks
Schema locks
Bulk update locks

Shared Locks

Shared (S) locks allow concurrent transactions to read a resource. While there are shared locks
on a resource, no other transaction can modify the data on that resource. A shared lock is
released as soon as the transaction has read the data, unless the transaction isolation level is
set to REPEATABLE READ or higher.
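The release behavior of shared locks can be observed with the sys.dm_tran_locks view. The following is a minimal sketch, assuming the AdventureWorks sample database; the HOLDLOCK hint is used here only to keep the shared lock visible until the transaction ends:

```sql
USE AdventureWorks
GO
BEGIN TRANSACTION
-- The HOLDLOCK hint keeps the shared lock until the transaction ends
SELECT Name FROM HumanResources.Department WITH (HOLDLOCK)
WHERE DepartmentID = 1

-- List the locks held by this session; the Department rows appear
-- with request_mode 'S' (shared)
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID

COMMIT TRANSACTION
```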
Exclusive Locks

Exclusive (X) locks restrict concurrent transactions from accessing a resource. No other
transaction can read or modify the data locked with an exclusive lock.

Update Locks

An update (U) lock falls in between a shared and an exclusive lock. For example, to update all the
products with a price of more than $10, you can run an UPDATE statement on the table. To determine
the records that need to be updated, the query first acquires a shared lock on the table.

When the physical updates occur, the query acquires an exclusive lock. In the time gap between the
shared and the exclusive lock, another transaction might change the data that you are going to
update. Therefore, an update lock is acquired instead. An update lock is applied to the table along
with a shared lock, and it prevents other transactions from updating the table until the update is
complete.
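You can also request this behavior explicitly with the UPDLOCK table hint when you read rows that you intend to modify later in the same transaction. A minimal sketch, assuming the AdventureWorks Production.Product table:

```sql
BEGIN TRANSACTION
-- Read the rows with an update (U) lock instead of a shared lock;
-- only one transaction at a time can hold a U lock on a resource
SELECT ProductID, ListPrice
FROM Production.Product WITH (UPDLOCK)
WHERE ListPrice > 10

-- The U locks are converted to exclusive (X) locks when the
-- physical update occurs
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ListPrice > 10
COMMIT TRANSACTION
```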

Intent Locks

An intent (I) lock indicates that SQL Server wants to acquire a shared or exclusive lock on some
of the resources lower in the hierarchy. For example, when a shared intent lock is placed at the
table level, the transaction intends to place shared locks on pages or rows within that table.

Placing an intent lock at the table level ensures that no other transaction can subsequently
acquire an exclusive lock on the table containing the locked pages or rows. Intent locks improve
the performance of SQL Server because SQL Server needs to examine only the intent locks at the
table level to determine whether a transaction can safely lock the entire table. Without them,
SQL Server would have to examine every row or page lock on the table to make that determination.

Intent locks include intent shared (IS), intent exclusive (IX), and shared with intent
exclusive (SIX) locks.
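The lock hierarchy can be seen by querying sys.dm_tran_locks while an update is pending. A minimal sketch, assuming the AdventureWorks database:

```sql
BEGIN TRANSACTION
UPDATE HumanResources.Department
SET Name = Name
WHERE DepartmentID = 1

-- The modified row holds an exclusive (X) lock, while the page and
-- the table hold intent exclusive (IX) locks higher in the hierarchy
SELECT resource_type, request_mode
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID

ROLLBACK TRANSACTION
```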

Schema Locks

SQL Server acquires schema modification (Sch-M) locks when any DDL operation is performed
on a table. SQL Server acquires schema stability (Sch-S) locks while compiling queries. An Sch-
S lock does not block any other locks, including exclusive (X) locks. Therefore, other transactions,
including those with exclusive (X) locks on a table, can run while a query is being compiled.

Bulk Update Locks

A bulk update (BU) lock secures a table from any other normal T-SQL statement, but multiple
BULK INSERT statements or bulk copy programs can load the table at the same time.
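A bulk update lock is requested by specifying the TABLOCK hint during a bulk load. A sketch with a hypothetical data file path:

```sql
-- The TABLOCK hint requests a bulk update (BU) lock, so several
-- concurrent BULK INSERT statements can load the same table while
-- normal T-SQL statements are kept out
BULK INSERT Sales.StoreContact
FROM 'C:\Data\StoreContact.dat'  -- hypothetical file path
WITH (TABLOCK)
```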

Controlling Locks
Locks are implemented automatically in SQL Server. By default, SQL Server locks every row
that you query. Sometimes, when you query a large record set, locks can escalate from rows to data
pages and further to the table level. If the query you are executing takes a long time, it
prevents other users from accessing the database objects. This results in a lack of concurrency in
the database. In addition, you might need to change the lock mode from a normal shared lock to
an exclusive lock. To resolve such problems, you can use isolation levels.

You can use isolation levels to specify the rights other transactions will have on the data being
modified by a transaction. SQL Server 2005 supports the following types of isolation levels:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE

READ UNCOMMITTED

The READ UNCOMMITTED isolation level specifies that a transaction can read data that has been
modified by other transactions but not yet committed. Transactions running with this isolation
level do not place shared locks on the database objects, enabling other transactions to modify the
data being read by the current transaction. The database objects are also not blocked by exclusive
locks, enabling the current transaction to read data that is being modified but has not been
committed by other transactions.

When this level is set, a transaction can read uncommitted data, resulting in the dirty read
problem. It is the least safe isolation level.
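A minimal sketch of this isolation level, assuming the AdventureWorks database:

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION
-- This read takes no shared locks and is never blocked by writers,
-- but it may return uncommitted (dirty) data
SELECT EmployeeID, Title FROM HumanResources.Employee
COMMIT TRANSACTION
```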

READ COMMITTED

The READ COMMITTED isolation level specifies that a transaction cannot read data that has been
modified but not yet committed by other transactions. This prevents the dirty read problem.

This isolation level places an exclusive lock on the data modified by each UPDATE statement in the
current transaction. When this isolation level is set, other transactions can update the data that
has been read by the current transaction. This can result in non-repeatable reads and phantom
reads.

It is the default isolation level in SQL Server.

REPEATABLE READ

The REPEATABLE READ isolation level specifies that a transaction cannot read data that is
being modified by other transactions. In addition, no other transaction can update the data
that has been read by the current transaction until the current transaction completes.

This isolation level places an exclusive lock on the data modified by each UPDATE statement within
the current transaction. In addition, it places a shared lock on the data read by each SELECT
statement and holds it until the transaction completes.
When this isolation level is set, other transactions can still insert new rows, which can result
in phantom reads.

SNAPSHOT

The SNAPSHOT isolation level provides every transaction with a snapshot of the current data. Every
transaction works on and makes changes to its own copy of the data. When a transaction is ready to
commit the changes, it checks whether the data has been modified since the transaction started
working on the data and decides whether to update the data.

The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start
a transaction that uses the SNAPSHOT isolation level. If a transaction using the SNAPSHOT
isolation level accesses data in multiple databases, the ALLOW_SNAPSHOT_ISOLATION must
be set to ON in each database.
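A sketch of enabling and using snapshot isolation on the AdventureWorks database:

```sql
-- Enable snapshot isolation once per database
ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRANSACTION
-- Reads see the data as it existed when the transaction started,
-- even if other transactions commit changes in the meantime
SELECT DepartmentID, Name FROM HumanResources.Department
COMMIT TRANSACTION
```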

SERIALIZABLE

The SERIALIZABLE isolation level specifies that no transaction can read, modify, or insert new
data while the data is being read or updated by the current transaction.

This is the safest isolation level provided by SQL Server because it locks the data read or
modified by each statement of the transaction until the transaction completes. At this isolation
level, concurrency is at its lowest.

Implementing Isolation Levels

You can implement isolation levels in your transactions by using the SET TRANSACTION
ISOLATION LEVEL statement before beginning a transaction.

The syntax of the SET TRANSACTION ISOLATION LEVEL statement is:


SET TRANSACTION ISOLATION LEVEL { READ UNCOMMITTED | READ COMMITTED |
REPEATABLE READ | SNAPSHOT | SERIALIZABLE } [ ; ]
BEGIN TRANSACTION
………
………
COMMIT TRANSACTION

For example, while updating the records of an employee, you do not want any other transaction
to read the uncommitted records. For this, you can create the following statements:
SET TRANSACTION ISOLATION LEVEL
READ COMMITTED
BEGIN TRANSACTION TR
BEGIN TRY
UPDATE Person.Contact
SET EmailAddress='[email protected]'
WHERE ContactID = 1070

UPDATE HumanResources.EmployeeAddress SET AddressID = 32533


WHERE EmployeeID = 1
COMMIT TRANSACTION TR
PRINT 'Transaction Executed'
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION TR
PRINT 'Transaction Rollbacked'
END CATCH

The preceding statements set the isolation level of transaction TR to READ COMMITTED. This
prevents the transaction from reading updates that other transactions have not yet committed.

Consider another example. While reading records from the EmployeePayHistory table, you do
not want any other transaction to update the records until the current transaction completes its
execution. For this, you can write the following statements:
USE AdventureWorks
GO
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
GO
BEGIN TRANSACTION
GO
SELECT * FROM HumanResources.EmployeePayHistory
GO
SELECT * FROM HumanResources.Department;
GO
COMMIT TRANSACTION
GO

The preceding statements set the isolation level of the transaction to REPEATABLE READ. This
prevents other transactions from updating the records until the current transaction completes.
However, new records can still be inserted.

Resolving Deadlocks
A deadlock is a situation where two or more transactions have locks on separate objects, and each
transaction waits for a lock on the other object to be released. Deadlocks are dangerous because
they decrease the concurrency and availability of the database and the database objects.

The following figure displays the deadlock between two objects.

Deadlock Between Two Objects

In the preceding figure, transaction A has locked the DISTRIBUTOR table and wants to lock the
PRODUCTS table. Transaction B has locked the PRODUCTS table and wants to lock the
DISTRIBUTOR table. Each transaction waits for the other to release its table, and since neither
lock will be released, a deadlock results.
Setting Deadlock Priority
SQL Server provides the SET DEADLOCK_PRIORITY statement to customize deadlocking.
Setting DEADLOCK_PRIORITY as LOW for a session causes a particular session to be chosen
as the deadlock victim. The DEADLOCK_PRIORITY option controls how the particular session
reacts in a deadlock.

The syntax of the DEADLOCK_PRIORITY statement is:


SET DEADLOCK_PRIORITY {LOW | NORMAL | @deadlock_var}

where,
LOW is used to specify that the current session is the preferred deadlock victim.
NORMAL is used to specify that the session returns to the default deadlock-handling method.
@deadlock_var is a character variable that specifies the deadlock-handling method; it holds
the string 'LOW' (three characters) or 'NORMAL' (six characters).

The periodic detection mechanism of SQL Server reduces the overhead of deadlock detection
because deadlocks affect only a small number of transactions.
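For example, a low-priority reporting session can volunteer to be the deadlock victim so that business transactions survive. A minimal sketch:

```sql
-- If this session is involved in a deadlock, SQL Server chooses it
-- as the victim and rolls it back with error 1205
SET DEADLOCK_PRIORITY LOW
BEGIN TRANSACTION
SELECT * FROM Production.Product
COMMIT TRANSACTION
```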

Customizing Lock Timeout

When a transaction attempts to lock a resource that is already held by another transaction, SQL
Server informs the first transaction of the current availability status of the resource. If the
resource is locked, the first transaction is blocked, waiting for that resource. If a deadlock
occurs, SQL Server terminates one of the participating transactions. If a deadlock does not occur,
the requesting transaction is blocked until the other transaction releases the lock. By default,
SQL Server does not enforce a lock timeout period.

The SET LOCK_TIMEOUT statement can be used to set the maximum time that a statement
waits on a blocked resource. After LOCK_TIMEOUT is set, when a statement has waited longer
than the LOCK_TIMEOUT setting, SQL Server automatically cancels the waiting transaction.
The syntax of LOCK_TIMEOUT is:
SET LOCK_TIMEOUT [timeout_period]

where,
timeout_period is represented in milliseconds and is the time that will pass before SQL
Server returns a locking error for a blocked transaction. You can specify a value of -1 to
restore the default behavior of waiting indefinitely.
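For example, to give up after five seconds instead of waiting indefinitely on a blocked resource:

```sql
-- Wait at most 5000 milliseconds for a lock; after that, the
-- statement fails with a lock request time out error (error 1222)
SET LOCK_TIMEOUT 5000
SELECT * FROM HumanResources.Employee
```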

Detecting Deadlocks

In order to detect deadlock situations, SQL Server scans for sessions that are waiting for a lock
request. SQL Server, during the initial scan, marks a flag for all the waiting sessions. When SQL
Server scans the sessions for the second time, a recursive deadlock search begins. If any circular
chain of lock requests is found, SQL Server cancels the least expensive transaction and marks that
transaction as the deadlock victim.

With the intelligent use of a deadlock scan for sessions, SQL Server ends a deadlock by
automatically selecting the user who can break the deadlock as the deadlock victim. In addition,
after rolling back the deadlock victim’s transaction, SQL Server notifies the user’s application
through error message number 1205.

Using sys.dm_exec_requests

The sys.dm_exec_requests system view returns information about each request that is
executing within SQL Server. This view can be used to detect the transaction that is causing the
blocking or the deadlock. You can query this view by using the following query:
SELECT * FROM sys.dm_exec_requests

The output of the preceding query is shown in the following figure.

Output of the sys.dm_exec_requests View

The output shows the list of all the transactions currently running on the server. In addition, it
shows the status of all the transactions, which includes the session_id, status, and the owner.

Avoiding Deadlocks by Using Update Locks


When two concurrent transactions acquire shared mode locks on a resource, and then attempt to
update the data concurrently, one transaction attempts conversion of the lock to an exclusive lock.
In this scenario, the conversion from a shared mode to an exclusive lock must wait. This is because
the exclusive lock for one transaction is not compatible with the shared mode lock of the other
transaction. Therefore, a lock wait occurs.

The second transaction modifies the data and attempts to acquire an exclusive lock. Under these
circumstances, when both the transactions are converting from shared to exclusive locks, a
deadlock occurs, because each transaction is waiting for the other transaction to release its shared
mode lock. Therefore, update locks are used to avoid this potential deadlock problem. SQL Server
allows only one transaction at a time to obtain an update lock on a resource. The update lock is
converted to an exclusive lock if the transaction modifies the resource. Otherwise, the lock is
converted to a shared mode lock.
Just a minute:
Which of the following concurrency problems is also known as DIRTY READ?

1. Uncommitted dependency
2. Phantom problem
3. Inconsistent analysis

Answer:
1. Uncommitted dependency

Just a minute:
Which of the following locks enables others to view the data being modified by the current
transaction?

1. Shared lock
2. Exclusive lock
3. Update lock
4. Intent lock
5. Bulk Update lock

Answer:
1. Shared lock

Just a minute:
Which of the following locks helps prevent deadlocks in your database?

1. Intent lock
2. Update lock
3. Shared lock

Answer:
2. Update lock

Activity: Implementing Transactions

Problem Statement
At AdventureWorks, Inc., an employee named Sidney Higa, who is currently working as
Production Technician – WC10 has been promoted as Marketing Manager. The employee ID of
Sidney is 13. As a database developer, you need to update his records. This involves updating the
title in the Employee table and updating the department history details.

You need to ensure that all the changes take effect. In addition, you need to ensure that no other
transaction should be able to view the data being modified by the current transaction.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a transaction.
2. Verify the output.

Task 1: Creating a Transaction

As changes need to be made to two tables, you can create a transaction to ensure that all the
changes take effect. In addition, you need to set the isolation level for the transaction to ensure
that no other transaction can read the data being modified. For this, you need to perform the
following steps:
1. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED

BEGIN TRANSACTION

BEGIN TRY

UPDATE HumanResources.Employee
SET Title = 'Marketing Manager'
WHERE EmployeeID = 13

UPDATE HumanResources.EmployeeDepartmentHistory
SET EndDate = getDate()
WHERE EmployeeID = 13 AND EndDate IS NULL

INSERT INTO HumanResources.EmployeeDepartmentHistory


VALUES(13, 4, 1, getDate(), NULL, getDate())

COMMIT TRANSACTION
PRINT 'TRANSACTION COMMITTED'
END TRY

BEGIN CATCH
ROLLBACK TRANSACTION

END CATCH

2. Press the F5 key to execute the statements.


Task 2: Verifying the Output

To verify the output, you can check the record of Sidney Higa in the EmployeeDepartmentHistory
and Employee tables. For this, you need to perform the following steps:
1. Type the following queries in the Query Editor window of the Microsoft SQL Server
Management Studio window:
SELECT * FROM HumanResources.EmployeeDepartmentHistory WHERE EmployeeID = 13
SELECT * FROM HumanResources.Employee WHERE EmployeeID = 13
2. Press the F5 key to execute the queries. The result set should show the updated data for
Sidney Higa.

Summary
In this chapter, you learned that:

A trigger is a block of code that constitutes a set of T-SQL statements that are activated in
response to certain actions.
SQL Server supports the following triggers:
 DML triggers
 DDL triggers
You can alter and delete a trigger.
Transactions are used to execute a sequence of statements together as a single logical unit
of work.
Every transaction possesses the ACID property.
SQL Server supports the following transactions:
 Autocommit transaction
 Implicit transaction
 Explicit transaction
Locks are used to maintain transactional integrity.
In the absence of locking, the following problems may occur if transactions use the same data
from a database at the same time:
 Lost updates
 Uncommitted dependency
 Inconsistent analysis
 Phantom reads
SQL Server supports the following lock modes:
 Shared locks
 Exclusive locks
 Update locks
 Intent locks
 Schema locks
 Bulk update locks
A deadlock is a situation where two users (or transactions) have locks on separate objects,
and each user wants to acquire a lock on the other user’s object.
Exercises

Exercise 1

The management of AdventureWorks, Inc. has decided that no user should be able to change the
prices of the products. In addition, the management wants all attempts to change the price to be
saved in a temporary table, Temp. John, the database developer, has been asked to make the
necessary changes in the database to implement this policy. What can John do to achieve
this?

Exercise 2

The management of AdventureWorks, Inc. wants that whenever the pay rate of an employee is
modified, its effect on the monthly salary of the employee should be displayed. John, a database
developer at AdventureWorks, has been asked to resolve this problem. Help John find an
appropriate solution.

Tip
Monthly Salary = Rate * PayFrequency * 30

Chapter 9
Implementing Managed Code
Managed code runs in the .NET CLR. SQL Server 2005 integrates the Common Language Runtime
(CLR) to allow the execution of managed code within the SQL Server environment. This provides
flexibility in writing the database code in multiple languages supported by .NET. Managed code
also takes advantage of programming languages to implement complex programming logic in
database objects, such as stored procedures and triggers.

This chapter introduces SQLCLR, which is the CLR hosted on SQL Server. Further, it explains
how to implement stored procedures, user-defined functions, triggers, and user-defined types by
using managed code.

Objectives
In this chapter, you will learn to:
Understand managed code
Implement managed database objects

Understanding Managed Code


As a database developer, you create database objects, such as procedures, functions, and triggers,
to implement programming logic by using T-SQL. However, in some situations it is not possible
to implement the required functionality by using the T-SQL code. For example, you need to store
the credit card number in a table in an encrypted format so that it cannot be tampered with. For
this, you need to apply various string manipulations and mathematical calculations, which involve
the usage of arrays and programming constructs. It is a very complex process to do this by using
the T-SQL code.

With CLR integration in SQL Server 2005, you can create programs in any of the .NET-supported
languages to implement enhanced programming logic that cannot be implemented by using T-
SQL. Code that is written in any of the .NET supported languages and runs in the CLR is called
managed code. You can embed these programs in your database so that they can run in the same
environment in which the database exists.

CLR is the heart of the Microsoft .NET Framework and provides the execution environment for all
the .NET programming languages.

To implement managed code on the database server, it is important for you to know the details of
CLR integration with SQL Server. It is also important to identify situations when it is better to
implement the managed code than the T-SQL code.

Introduction to SQL Server CLR Integration


Database developers can use T-SQL to write code in the form of procedures and functions, but
T-SQL is not a complete and comprehensive programming language. Unlike other programming
languages, T-SQL does not support object-orientation, arrays, collections, for-each loops, or usage
of external resources and classes.

To implement programming logic that involves complex operations in the database, SQL Server
2005 is integrated with the .NET Framework. CLR is an environment that executes codes written
in any .NET programming language. CLR integration allows database developers to create objects
in any of the .NET-supported languages and embed the objects in the database. Such a database
object is called a managed database object.

CLR integration provides the following benefits:

Better programming model: The .NET programming languages provide a set of programming
constructs that are not available in T-SQL. In addition, .NET provides a set of classes that can
be used to implement predefined functionality. For example, you need to save the data in
a file in a compressed format. For this, you can use stream classes, such as GZipStream,
provided by the .NET Framework base class library.
Common development environment: Application developers can use the same tool, Visual
Studio 2005, to create database objects and scripts as they use to create a client-tier.
Ability to define data types: Database developers can create user-defined data types to
expand the storage capabilities of SQL Server. For example, you need to store a set of values
in a single column or variable in a pre-defined format. For this, you need to use an array. In
such a case, you can create a new data type using a .NET programming language.

Identifying the Need for Managed Code


You can use either managed code or T-SQL to implement the database programming logic.
However, it is not always required to create managed code to implement a programming logic. It
is essential to identify situations where creating a managed code is a better option.

You should use T-SQL statements when you need to:

Perform data access and manipulation operations that can be done using T-SQL statements.
Create database objects, such as procedures, functions, or triggers, requiring basic
programming logic that can be implemented by using the programming constructs provided
by T-SQL.

For example, T-SQL provides efficiency and ease to perform pure database oriented operations,
such as joining two result sets. In comparison, creating a CLR object to apply a join involves
writing code to join the data rows in one result set with the rows in another result set. In such a
situation, using T-SQL is a better decision.

You should use managed database objects when you need to:

Implement complicated programming logic for which you can reuse the functionality
provided by the .NET base class libraries.
Access external resources, such as calling a Web service or accessing the file system.
Implement CPU-intensive functionality that runs more efficiently as managed code than as
T-SQL.

For example, you need to extract data from a database table and store the result set in a file stored
in the local file system in the XML format. You will not be able to implement this functionality
by using T-SQL. Instead, you can create a managed database object by using a .NET language to
store the data in an XML file.

In addition to all the preceding points, T-SQL is interpreted at the time of execution and is,
therefore, slow as compared to managed code. In contrast, managed code is pre-compiled
and, therefore, faster to execute within the CLR.

Just a minute:
Which of the following options is supported by .NET and not by T-SQL?

1. Writing Queries
2. Creating Procedures
3. Object-Orientation
4. Writing Triggers

Answer:
3. Object-Orientation

Implementing Managed Database Objects


To create managed database objects in SQL Server, you need to first create a managed code in
any of the .NET programming languages. A managed code contains classes and methods that
provide a desired functionality. Next, you need to compile the code to create an assembly. An
assembly can be a .dll or .exe file that contains compiled managed code.

SQL Server cannot directly execute the assemblies. Therefore, before using the assemblies, you
need to import and configure the assemblies in the database engine. Further, you need to create a
database object based on the imported assembly.

Importing and Configuring Assemblies


You can import a .NET assembly, which is a .dll or an .exe, in SQL Server database engine using
the CREATE ASSEMBLY statement. The syntax of the CREATE ASSEMBLY statement is:
CREATE ASSEMBLY assembly_name
[ AUTHORIZATION owner_name ]
FROM { <client_assembly_specifier> | <assembly_bits>
[ ,...n ] }
[ WITH PERMISSION_SET =
{ SAFE | EXTERNAL_ACCESS | UNSAFE } ]

where,
assembly_name is the name of the assembly that you need to create in SQL Server. This name
will be further used by the database objects to refer to the assembly.
AUTHORIZATION owner_name specifies the name of a user or role as the owner of the assembly.
client_assembly_specifier specifies the local or network path of the .NET assembly that is
being imported.
PERMISSION_SET specifies the permissions that are granted to the assembly when it is accessed
by SQL Server. This parameter can accept any of the following values:

SAFE: Is the most secure permission as the code will not be able to access any external
resource, such as files, networks, environment variables, or registry. If no value is specified,
the default value is SAFE.
EXTERNAL_ACCESS: Enables the .NET code to access some external resources, such as files,
networks, environmental variables, and registry.
UNSAFE: Enables the .NET code to access any resource within or outside SQL Server. For
example, it allows a .NET code to call unmanaged code.

Consider an example. You have created a managed code and created the .NET assembly as
CLRIntegration.dll. To import this assembly in SQL Server, you can write the following
statement:
CREATE ASSEMBLY CLRIntegration FROM
'C:\CLRIntegration.dll' WITH PERMISSION_SET = SAFE

In the preceding statement, the database engine imports the CLRIntegration.dll assembly file from
the C: drive of the local computer and creates an assembly object named CLRIntegration.
Whenever you import an assembly, its details are added to the sys.assemblies system table.

You can update an assembly to refer to the most recent version of the assembly file or to update
the permissions. For this, you can use the ALTER ASSEMBLY statement. Altering an assembly
object does not affect the processes that are currently using it. The following statement changes
the permission set of the CLRIntegration assembly from SAFE to EXTERNAL_ACCESS:
ALTER ASSEMBLY CLRIntegration WITH PERMISSION_SET = EXTERNAL_ACCESS

You can use the DROP ASSEMBLY statement to delete an assembly. To drop the CLRIntegration
assembly, you can use the following statement:
DROP ASSEMBLY CLRIntegration

Just a minute:
Which of the following permissions will you use to call unmanaged code?

1. SAFE
2. EXTERNAL_ACCESS
3. UNSAFE

Answer:
3. UNSAFE

Creating Managed Database Objects


After importing assemblies in SQL Server, you can create managed database objects that use the
managed code provided in the assembly. By default, SQL Server does not allow the running of
the managed code on the server. Therefore, before creating a managed database object in your
database, you need to enable the CLR integration feature in your database. To enable CLR, you
need to use the following statements:
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

You need to be a member of the sysadmin fixed server role to enable CLR.

While developing managed database objects, you will have to use the System.Data.SqlClient,
System.Data.SqlTypes, and Microsoft.SqlServer.Server namespaces found in the .NET base class
libraries. These namespaces contain several key classes that you will use to create the managed
database object. The main purpose of these classes is to give you faster access to the database.
The following classes, found in the Microsoft.SqlServer.Server and System.Data.SqlClient
namespaces, are used to access a database from a managed code:
SqlContext: Represents the context under which the assembly is running. It provides several
properties, such as Pipe, TriggerContext, and WindowsIdentity, which you can use to access
the SqlPipe, SqlTriggerContext, and WindowsIdentity objects, respectively.
SqlPipe: Allows you to send results or messages directly to the client application. For
example, if you are using SQL Server Management Studio to display a message in the
Messages tab, you can use the following code:

SqlPipe pipe = SqlContext.Pipe;
pipe.Send("This is a test message");

In the preceding code, the Send() method is used to display the message.

SqlTriggerContext: Allows you to access information about the event that fired the trigger
during a managed trigger operation. Consider the following code:

SqlTriggerContext tr = SqlContext.TriggerContext;
SqlPipe pipe = SqlContext.Pipe;
if ( tr.TriggerAction == TriggerAction.Insert )
pipe.Send("A record inserted");

In the preceding code, you will be able to find the type of the trigger that has been fired. If the
type of the trigger is insert, a message will be displayed.

SqlConnection: Allows you to connect a database before querying any data. You can use the
following code to connect to the database:

SqlConnection con = new SqlConnection("context connection=true");

SqlCommand: Allows you to send T-SQL commands to the database server. The following
code displays how to delete a row from a table in the database:

SqlCommand cmd = new SqlCommand();
cmd.CommandText = "DELETE FROM Production.Product";
cmd.Connection = con;
int rows = cmd.ExecuteNonQuery();

In the preceding code, the ExecuteNonQuery() method sends a T-SQL statement to the database
server and returns the number of records deleted by the statement.
SqlDataReader: Allows you to work with the results of a query. You can use this class to
retrieve records, as shown in the following code:
SqlCommand cmd = new SqlCommand("SELECT LastName FROM Person.Contact", con);
SqlPipe pipe = SqlContext.Pipe;
SqlDataReader dr = cmd.ExecuteReader();
while ( dr.Read ())
{
pipe.Send ( dr["LastName"]);
}
dr.Close();
In the preceding code, the ExecuteReader() method executes the SELECT statement and
returns the SqlDataReader object. The SqlDataReader object, dr, points to the beginning of the
data in the result set. Therefore, the Read() method is called to move the pointer to the first
row. To iterate through all the records, Read() method is called until it returns false. The Read()
method returns a false value if no more records are found.

The System.Transactions namespace contains the Transaction class that allows you to manipulate
the data in a database within a transaction. For example, you can use the following code to roll
back the current transaction:
Transaction.Current.Rollback();

In the preceding code, the Rollback() method is used to roll back the current transaction. To use
the System.Transactions namespace, you must add a reference to the System.Transactions.dll
file.

To add the reference of the System.Transactions.dll, you have to perform the following steps in
Microsoft Visual Studio:

1. Select Project→Add Reference. The Add Reference dialog box will be displayed.
2. Select System.Transactions from the Component Name list displayed in the Add Reference
dialog box.
3. Click the OK button.

The System.Data.SqlTypes namespace contains classes that are equivalent to the SQL native
data types. It allows you to avoid the overhead of implicit type conversions in SQL Server
assemblies. The following table lists the classes found in the System.Data.SqlTypes namespace
along with their corresponding CLR SQL Server Integration data types.
SQL Server Native Data Type CLR SQL Server Integration Data Type
Bigint SqlInt64
Binary SqlBytes
VarBinary SqlBinary
SmallInt SqlInt16
Int SqlInt32
Money, SmallMoney SqlMoney
Float SqlDouble
Decimal SqlDecimal
Bit SqlBoolean
NVarchar, Varchar SqlString
Xml SqlXml
DateTime SqlDateTime

CLR SQL Server Integration Data Types
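As a sketch of how these types appear in practice (the class and function names below are hypothetical, not part of the chapter's examples), a scalar function can accept and return SqlTypes directly, so no implicit conversion occurs at the T-SQL boundary:

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class PriceUtilities
{
    // Hypothetical scalar function: SqlMoney and SqlBoolean map directly
    // to the SQL Server money and bit types.
    [SqlFunction]
    public static SqlBoolean IsDiscounted(SqlMoney listPrice, SqlMoney salePrice)
    {
        // SqlTypes carry NULL explicitly; honor SQL NULL semantics.
        if (listPrice.IsNull || salePrice.IsNull)
            return SqlBoolean.Null;
        return salePrice < listPrice;  // comparison yields a SqlBoolean
    }
}
```

Using SqlTypes in the signature also preserves three-valued NULL logic, which the plain .NET decimal and bool types cannot represent.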

Depending on the requirements, the database developer can create the following types of database
objects:

Stored procedures
Functions
Triggers
User-Defined Types

To create different types of managed objects, you need to use the various attributes provided by
the Microsoft.SqlServer.Server namespace. The following table lists these attributes.
Attribute Allows you to create
SqlProcedure Managed stored procedure
SqlFunction Managed user-defined function
SqlTrigger Managed trigger
SqlUserDefinedType Managed user-defined data type

Attributes in the Microsoft.SqlServer.Server Namespace

Creating Managed Stored Procedures

Stored procedures are the most common and useful database code blocks. With CLR integration,
you can write managed code that executes as a stored procedure. For this, you need to create a
procedure that refers to an imported assembly.

To create a stored procedure using the managed code, you have to perform the following steps:

1. Create a .NET class that implements the functionality of the stored procedure. Then,
compile that class to produce a .NET assembly.
2. Register that assembly in SQL Server using the CREATE ASSEMBLY statement.
3. Create a stored procedure and associate the stored procedure with the actual methods of
the assembly.

After completing the preceding steps, the stored procedure is configured as a managed stored
procedure, and you can execute it just like any other stored procedure.

To create a managed stored procedure by using an imported assembly, you can use the CREATE
PROCEDURE statement. The syntax of the CREATE PROCEDURE statement is:
CREATE PROCEDURE <Procedure Name>
AS EXTERNAL NAME <Assembly Identifier>.<Type Name>.<Method Name>

where,
Procedure Name is the name of the procedure you want to create.
Assembly Identifier is the name of the imported assembly.
Type Name is the name of the class that contains the method that will be executed through the
procedure.
Method Name is the name of the method that will be executed through the procedure.

For example, you need to create a procedure that will read the data from a table and write it into
an XML file. The procedure will take two parameters, the SQL query and the output filename.
For this, you need to create a managed procedure to access an external file.

To create a managed stored procedure, first you have to create an assembly using the .NET
framework. For this, you have to perform the following steps:

1. Select Start→All Programs→Microsoft Visual Studio 2005→Microsoft Visual Studio 2005.


The Start Page - Microsoft Visual Studio window is displayed.
2. Select File→New→Project. The New Project dialog box is displayed.
3. Select the project type as Visual C# from the Project types pane.
4. Select Class Library from the Templates pane.
5. Type ConvertXML in the Name text box.
6. Type C:\ in the Location combo box.
7. Ensure that the Create directory for solution option is selected.
8. Click the OK button. The Class1.cs file is opened.
9. Replace the existing code in the Class1.cs file with the following code:
using System;
using System.Xml;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;

//Declare a namespace in which the class will be stored


namespace CLRStoredProcedure
{
//Declare a class name XMLProc to define the functionality
public class XMLProc
{
// Marks that the code will be used as a stored procedure
[Microsoft.SqlServer.Server.SqlProcedure]

//Define the function that provides functionality


public static void SaveXML(SqlXml XmlData, SqlString FileName)
{
SqlPipe pipe = SqlContext.Pipe;
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(XmlData.Value);
xmlDoc.Save(FileName.Value);
pipe.Send("Data saved in an XML file");
}
}
}

10. Press the F7 key to build the application. When this application is built, the output file will
be a DLL file named ConvertXML.dll. This DLL file is created in the
C:\ConvertXML\ConvertXML\bin\Debug folder.

Using the ConvertXML.dll assembly, you need to create a managed procedure. For this, you need
to perform the following steps:

1. Execute the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window to create an assembly named ConvertXMLAssembly that will
refer to the ConvertXML.dll file:

CREATE ASSEMBLY ConvertXMLAssembly FROM


'C:\ConvertXML\ConvertXML\bin\Debug\ConvertXML.dll' WITH PERMISSION_SET =
EXTERNAL_ACCESS

While creating assembly using EXTERNAL_ACCESS or UNSAFE permission set, you may get
the following error message:
"CREATE ASSEMBLY for assembly 'CLRStoredProcedure' failed because assembly
'CLRStoredProcedure' is not authorized for PERMISSION_SET = UNSAFE. The assembly is
authorized when either of the following is true: the database owner (DBO) has UNSAFE
ASSEMBLY permission and the database has the TRUSTWORTHY database property on; or
the assembly is signed with a certificate or an asymmetric key that has a corresponding login
with UNSAFE ASSEMBLY permission." To solve this problem, you need to execute the
following statement:
ALTER DATABASE AdventureWorks SET TRUSTWORTHY ON

While creating assembly using EXTERNAL_ACCESS or UNSAFE permission set, you may get
the following error message:
"The database owner SID recorded in the master database differs from the database owner
SID recorded in database 'AdventureWorks'. You should correct this situation by resetting the
owner of database 'AdventureWorks' using the ALTER AUTHORIZATION statement."
To solve this problem, you need to execute the following statement:
EXEC dbo.sp_changedbowner @loginame = N'sa', @map = false

2. Create a managed stored procedure by executing the following statement:


CREATE PROCEDURE clrproc(@XmlData as XML, @filename as nvarchar(30))
AS EXTERNAL NAME
ConvertXMLAssembly.[CLRStoredProcedure.XMLProc].SaveXML

3. Execute the following statements to run the managed procedure:


DECLARE @p xml
SET @p = (SELECT ProductID, Name, ListPrice FROM Production.Product FOR XML AUTO,
ELEMENTS, ROOT('Catalog'))
EXEC clrproc @p, 'D:\Catalog.xml'

When the procedure clrproc will be executed, it will create a file named Catalog.xml. This
procedure reads the data from the Product table and writes all the rows in the XML format in the
Catalog.xml file.

Creating Managed Functions

To create a managed function, you need to perform the following steps:

1. Create a .NET class that implements the functionality of the user-defined function. Then,
compile that class to produce a .NET assembly.
2. Register that assembly in SQL Server using the CREATE ASSEMBLY statement.
3. Create a user-defined function and associate it with the actual methods of the assembly.

After completing the preceding steps, the function is configured as a managed function, and you
can call it just like any other user-defined function.

To create a managed function using an imported assembly, you can use the CREATE FUNCTION
statement. The syntax of the CREATE FUNCTION statement is:
CREATE FUNCTION <Function Name>
(
< Parameter List>
)
RETURNS <Return Type>
AS EXTERNAL NAME <Assembly Identifier>.<Type Name>.<Method Name>

where,
<Function Name> is the name of the function.
<Parameter List> is the list of parameters accepted by the function.
<Return Type> is the type of the value that is to be returned by the function.
<Assembly Identifier> is the name of the imported assembly.
<Type Name> is the name of the class that contains the method that will be executed through
the procedure.
<Method Name> is the name of the method that will be executed through the procedure.

For example, you need a function that accepts a datetime value as a parameter and converts it into
a LongDateString format. For this, you want to create an assembly named
GetLongDateAssembly.dll by using the following code:
using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace DateUtilities
{
public class UserDefinedFunctions
{
[SqlFunction(Name = "GetLongDate",
DataAccess = DataAccessKind.None)]
public static SqlString GetLongDate(SqlDateTime DateVal)
{
return DateVal.Value.ToLongDateString();
}
};
}

In the preceding code, the GetLongDate() function converts a SqlDateTime value into the
LongDateString format. To create the managed function using the GetLongDateAssembly.dll file,
you need to perform the following steps:
1. Execute the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio interface to create an assembly named GetLongDateAssembly that
will refer to the GetLongDateAssembly.dll file:
CREATE ASSEMBLY GetLongDateAssembly FROM
'C:\GetLongDateAssembly\GetLongDateAssembly\bin\Debug\GetLongDateAssembly.dll'
WITH PERMISSION_SET = UNSAFE
2. Execute the following statement to create a managed function named GetLongDate() from
the GetLongDateAssembly assembly:
CREATE FUNCTION GetLongDate(@d as DateTime)
RETURNS nVarchar(50)
AS EXTERNAL NAME GetLongDateAssembly.
[DateUtilities.UserDefinedFunctions].GetLongDate
3. You can execute the following query to execute the managed function:
SELECT EmployeeID, dbo.GetLongDate(birthdate) as 'BirthDate'
FROM HumanResources.Employee
4. The preceding query displays the output, as shown in the following figure.

Output of the Managed Function

Creating Managed Triggers

Managed triggers help in implementing advanced trigger logic that cannot be done using T-SQL.
To create a managed trigger using a managed code, you have to perform the following steps:
1. Create a .NET class that implements the functionality of the trigger. Then, compile that
class to produce a .NET assembly.
2. Register that assembly in SQL Server using the CREATE ASSEMBLY statement.
3. Create a trigger and associate it with the actual methods of the assembly.

After completing the preceding steps, the trigger is configured as a managed trigger, and it can be
fired just like any other trigger in response to an event.

You can create a managed trigger by using the CREATE TRIGGER statement. The syntax of the
CREATE TRIGGER statement is:
CREATE TRIGGER <TriggerName>
ON <Table or View> <FOR|INSTEAD OF|AFTER>
<INSERT|UPDATE|DELETE>
AS EXTERNAL NAME <Assembly Identifier>.<Type Name>.<Method Name>

where,
<TriggerName> is the name of the trigger.
<Table or View> is the name of the table or view on which you want to create the trigger.
<FOR|INSTEAD OF|AFTER> <INSERT|UPDATE|DELETE> specifies the type of trigger you want to
create.
<Assembly Identifier> is the name of the imported assembly.
<Type Name> is the name of the class that contains the method that will be executed through
the procedure.
<Method Name> is the name of the method that will be executed through the procedure.

Consider an example. The database in an organization maintains the user details in the
UserNameAudit table. The table stores various details of the users including the e-mail address.
When inserting the details of users, you need to validate the e-mail address to ensure that it is a
valid e-mail address. For this, you have decided to create a trigger that gets executed every time
a new row is inserted.

To implement the logic of validating the e-mail address, you need to create a managed trigger.
You have created the ValidateMail.dll assembly using the following code:
using System;
using System.Data;
using System.Data.Sql;
using Microsoft.SqlServer.Server;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Xml;
using System.Text.RegularExpressions;
using System.Transactions;

public class trgMail


{
[SqlTrigger(Name = @"EmailAudit", Target = "[dbo].[Users]", Event
= "FOR INSERT")]
// Specifies that the code will be used as a SQL Trigger
public static void MailValid()
{
string userName;
string realName;
SqlCommand command;
SqlTriggerContext triggContext = SqlContext.TriggerContext;
SqlPipe pipe = SqlContext.Pipe;
SqlDataReader reader;
switch (triggContext.TriggerAction)
{
case TriggerAction.Insert:
using (SqlConnection connection
= new SqlConnection(@"context connection=true"))
{
connection.Open();// Opens the SQL Connection
command = new SqlCommand(@"SELECT * FROM INSERTED;", connection);
reader = command.ExecuteReader();
reader.Read();
userName = (string)reader[0];
realName = (string)reader[1];
reader.Close();
if (IsValidEMailAddress(userName))
{
command = new SqlCommand(
@"INSERT [dbo].[UserNameAudit] VALUES ('"
+ userName + @"', '" + realName + @"');",
connection);
pipe.Send(command.CommandText);
command.ExecuteNonQuery();
pipe.Send("You inserted: " + userName);
}
else
{
try
{
pipe.Send("Invalid EmailID");
Transaction trans = Transaction.Current;
trans.Rollback();

}
catch (SqlException ex)
{

}
}
}
break;
}
}
public static bool IsValidEMailAddress(string email)
{
return Regex.IsMatch(email, @"^([\w-]+\.)*?[\w-]+@[\w-]+\.([\w-]+\.)*?[\w]+$");
}
}

In the preceding code, MailValid() function defines the logic of validating the e-mail address and
inserting a row in the table.

To create a managed trigger using the ValidateMail.dll assembly, you need to perform the
following steps:
1. Execute the following statement to create the ValidateEmailAssembly assembly from the
ValidateMail.dll file:
CREATE ASSEMBLY ValidateEmailAssembly
FROM 'C:\ValidateMail\ValidateMail\bin\Debug\ValidateMail.dll'
WITH PERMISSION_SET = UNSAFE
2. Create two tables named Users and UserNameAudit using the following statements:
CREATE TABLE Users
(
UserName nvarchar(200) NOT NULL,
RealName nvarchar(200) NOT NULL
)
CREATE TABLE UserNameAudit
(
UserName nvarchar(200) NOT NULL,
RealName nvarchar(200) NOT NULL
)
3. Execute the following statement to create a managed trigger:
CREATE TRIGGER EmailAudit
ON Users FOR INSERT
AS EXTERNAL NAME ValidateEmailAssembly.trgMail.MailValid
4. Execute the following statements to insert records in the Users table:
INSERT INTO Users VALUES('peter@yahoo.com', 'Peter')
INSERT INTO Users VALUES('johnyahoo.com', 'John')

When you insert a record in the Users table, the EmailAudit trigger will be fired to validate
the e-mail address. The first INSERT statement contains a valid e-mail address. Therefore, the
first record will be stored in the UserNameAudit table. However, the second INSERT statement
contains an invalid email address. As a result, the second INSERT statement will be rolled back.

Creating Managed User-Defined Types


You can create a data type definition in any of the .NET-supported languages and use it as a data
type within SQL Server. This data type can be a combination of any other existing data type with
more modifications applied on it. To create a user-defined data type using a managed code such
as C# .NET, you have to perform the following steps:

1. Create a .NET class that implements the functionality of the user-defined data type. Then,
compile that class to produce a .NET assembly.
2. Register that assembly in SQL Server using the CREATE ASSEMBLY statement.
3. Create a user-defined data type that refers to the registered assembly.

After completing the preceding steps, the data type is configured as a managed user-defined data
type, and you can use it just like any other data type in SQL Server 2005.

You can create a managed user-defined type by using the CREATE TYPE statement. The syntax
of the CREATE TYPE statement is:
CREATE TYPE [ schema_name. ] type_name
{
EXTERNAL NAME assembly_name [ .class_name ]
}

where,
schema_name is the name of the schema to which the alias data type or user-defined type
belongs.
type_name is the name of the alias data type or the user-defined type.
assembly_name specifies the SQL Server assembly that references the implementation of the
user-defined type in CLR. It should match an existing assembly in SQL Server within the
current database.
[ .class_name ] specifies the class within the assembly that implements the user-defined
type. The class name must be a valid identifier and must exist as a class in the assembly with
assembly visibility. It is case-sensitive, regardless of the database collation, and must match
the class name in the corresponding assembly.

For example, an enterprise has employees in two states of US, California and New York. As a
database developer, you want that whenever the details of a new employee are inserted, the zip
code of the state should be converted to the state name. For example, if the zip code is “CA” then
it should be converted into “State: California”. For this, you need to create a managed user-defined
type. To create the user-defined type, you have the ZipCode.dll assembly file created by the
development team using the following code:
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined, MaxByteSize =
8000)] // Specifying that the code will be used as User-Defined data type.
public struct ZipCode : IBinarySerialize, INullable // Implementing the required
interface
{
private string zip_code; //Declaring the private members
private bool isNull; //Declaring the private members

public bool IsNull //The IsNull property of the INullable interface


{
get
{
return this.isNull; // Return that the data is NULL
}
}
public static ZipCode Null // The ZipCode property
{
get
{
ZipCode sd = new ZipCode();
sd.isNull = true;
return sd;
}
}
public ZipCode(string s) // The constructor of the structure
{
isNull = false;
zip_code = s;
}

public static ZipCode Parse(SqlString s) // The parser


{
string value = s.Value;
if (s.IsNull || value.Trim() == "") return Null;
string zip = value;
return new ZipCode(zip);
}
public override String ToString() // Overriding the default ToString function
{
return zip_code;
}
public void Read(BinaryReader r) // Displaying the data to the user
{
zip_code = r.ReadString();
}
public void Write(BinaryWriter w) // Inserting the data in the table
{
if (zip_code == "CA")
{
w.Write("State : California");
}
else if (zip_code == "NY")
{
w.Write("State : New York");
}
else
{
w.Write("State : Invalid, " + zip_code);
}
}
}

To create the managed user-defined data type using the ZipCode.dll assembly file, you can
perform the following steps:
1. Execute the following statement to create an assembly named ZipAssembly from the
ZipCode.dll file:
CREATE ASSEMBLY ZipAssembly
FROM ‘C:\ZipCode\ZipCode\bin\Debug\ZipCode.dll’
WITH PERMISSION_SET = UNSAFE
2. Execute the following statement to create a managed user-defined type named ZipCode:
CREATE TYPE ZipCode
EXTERNAL NAME ZipAssembly.ZipCode

The data type will be associated with the default dbo.

You can execute the following statement to create a table using the ZipCode user-defined type:
CREATE TABLE ManagedEmployee
(
Name nvarchar(20),
Zip ZipCode
)

You can insert a new record in the ManagedEmployee table using the following statement:
INSERT INTO ManagedEmployee
VALUES ( 'Peter', 'NY')
INSERT INTO ManagedEmployee
VALUES ( 'John', 'CA')

You can display the records from the ManagedEmployee table using the following query:
SELECT Name, convert(nvarchar(20), Zip) as 'Zip Code'
From ManagedEmployee

The preceding query will display the output, as shown in the following figure.

Output of the Managed User-Defined Type

You can also use Zipcode as a data type to declare a variable. The following statements can be
used to create a batch that uses a variable named @zip of user-defined data type, ZipCode:
DECLARE @zip ZipCode
SET @zip = 'CA'
PRINT convert(NVARCHAR(20), @zip)
GO

The execution of the preceding statements will display the following output:
State : California
Just a minute:
When will you use managed code instead of T-SQL?

1. When you need to write queries.


2. When you need to access external resources.
3. When you need to perform an administrative task on the database.

Answer:
2. When you need to access external resources.

Activity: Implementing Managed User-Defined Types

Problem Statement

The management of AdventureWorks, Inc. has decided to include the details of the spouse of
employees in the database. The application used to enter the employee detail will accept the name
and date of birth of the spouse of an employee. In addition, it will concatenate the two values
separated by a ‘;’. As a database developer, you need to store the spouse details in the following
format:

Spouse Name: <name of the spouse> ; Spouse Date of Birth : <date of birth>

To implement this, you have decided to create a managed user-defined data type. How will you
create this data type?

For this activity, the SpouseDetails.dll assembly file will be provided. This file is created using the
code given in task 2 of the solution.

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Enable CLR in the database.


2. Create an assembly.
3. Create a managed user-defined data type.
4. Create a table that implements the user-defined data type.
5. Verify the output.

Task 1: Enabling CLR in the Database

To enable CLR in the AdventureWorks database, you need to perform the following steps:
1. Select Start→Programs→SQL Server 2005→SQL Server 2005 Management Studio to open
the Microsoft SQL Server 2005 Management Studio window.
2. Change the database context to AdventureWorks, by typing the following statement:
USE AdventureWorks
GO

3. Press the F5 key to execute the statement.


4. Enable CLR by executing the following statements:
sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO

5. Press the F5 key to execute the statements.

Task 2: Creating an Assembly

Create the SpouseDetails.dll assembly file by using the following code:


using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined,
MaxByteSize = 8000)] /* Specifying that the code will be used as a user-defined data
type. */

public struct SpouseDetails : IBinarySerialize, INullable


// Implementing the required interface
{
private string sp_name; //Declaring the private members
private string sp_age; //Declaring the private members
private bool isNull; //Declaring the private members

public bool IsNull //The IsNull property of the INullable interface


{
get
{
return this.isNull;
}
}
public static SpouseDetails Null //The static property
{
get
{
SpouseDetails sd = new SpouseDetails();
sd.isNull = true;
return sd;
}
}
public SpouseDetails(string s, string s1) // The constructor
{
isNull = false;
sp_name = s;
sp_age = s1;
}

public static SpouseDetails Parse(SqlString s) // The parser


{
string value = s.Value;
if (s.IsNull || value.Trim() == "") return Null;
string name = value.Substring(0, value.IndexOf(';'));
string dob = value.Substring(value.IndexOf(';') + 1,
value.Length - name.Length - 1);
return new SpouseDetails(name, dob);
}

public override String ToString() // The overridden ToString method


{
return sp_name;
}

public void Read(BinaryReader r) // Displaying the data to the user on a SELECT query
{
sp_name = r.ReadString();
}
public void Write(BinaryWriter w) /* Inserting the data in the table */
{
w.Write("Spouse Name : " + sp_name + "; Spouse Date of Birth:"
+ sp_age + ".");
}
}

To create an assembly using the SpouseDetails.dll file, you need to perform the following steps
in the Microsoft SQL Server Management Studio window:
1. Type the following statement to create an assembly named SpouseDet from the
SpouseDetails.dll file:
CREATE ASSEMBLY SpouseDet
FROM 'C:\SpouseDetails\SpouseDetails\bin\Debug\SpouseDetails.dll'

2. Press the F5 key to execute the statement.


Task 3: Creating a Managed User-Defined Data Type

To create a managed user-defined data type, you need to perform the following steps:
1. Type the following statement to create the managed user-defined type named
Spouse_Details:
CREATE TYPE Spouse_Details EXTERNAL NAME
SpouseDet.[SpouseDetails]

2. Press the F5 key to execute the statement.

Task 4: Creating a Table that Implements the User-Defined Data Type

To create a table that implements the user-defined data type, you need to perform the following
steps:
1. Type the following statement to create a table named Spouse_Det in the AdventureWorks
database, which will implement the managed user-defined data type:
CREATE TABLE Spouse_Det
(
Employee_id int,
SpouseDetails Spouse_Details
)

2. Press the F5 key to execute the statement.

Task 5: Verifying the Output

To verify the output, you need to perform the following steps:


1. Type the following statement to insert a record in the Spouse_Det table:
INSERT INTO Spouse_Det VALUES (1,'Samantha;12/12/1980')

2. Press the F5 key to execute the statement. One row will be inserted in the Spouse_Det
table.
3. Type the following statement to display records from the Spouse_Det table:
SELECT Employee_id, convert(nvarchar(60), SpouseDetails)
AS 'Spouse Details' FROM Spouse_Det
4. Press the F5 key to execute the statement. It will display the output, as shown in the
following figure.

Output of Managed User-Defined Type

Summary
In this chapter, you learned that:
The database objects created in any of the .NET supported languages are called managed
database objects.
CLR integration provides the following benefits:
 Better programming model
 Common development environment
 Ability to define data types
T-SQL can be used to perform data access and manipulation operations that can be
implemented using the programming constructs provided by T-SQL.
Managed database objects can be used in the following situations:
 To implement complicated programming logic for which you can reuse the functionality
provided by the .NET base class libraries.
 To access external resources, such as calling a Web service or accessing the file system.
 To implement CPU-intensive functionality that runs more efficiently as managed code than
as interpreted T-SQL.
The .NET code that is used to create the managed database objects is compiled into .NET
assemblies (.dll or .exe files).
To create a managed database object, first the .NET assemblies are imported in the database
engine.
The assemblies in the database engine can be given any of the following permissions:
 SAFE
 EXTERNAL_ACCESS
 UNSAFE
By default, SQL Server does not allow running managed code on the server.
Before creating a managed database object in your database, the CLR integration feature
should be enabled in the database using the sp_configure stored procedure.
While developing managed database objects, you will have to use the System.Data.SqlClient,
System.Data.SqlTypes, and Microsoft.SqlServer.Server namespaces found in the .NET base
class libraries.
The following classes, found in the System.Data.SqlClient and Microsoft.SqlServer.Server
namespaces, are used to access a database from managed code:
 SqlContext
 SqlPipe
 SqlTriggerContext
 SqlConnection
 SqlCommand
 SqlDataReader
A managed stored procedure can be created using the CREATE PROCEDURE statement.
A managed function can be created using the CREATE FUNCTION statement.
A managed trigger can be created using the CREATE TRIGGER statement.
A managed data type can be created using the CREATE TYPE statement.

Exercises

Exercise 1
The AdventureWorks database maintains the sales details in the SalesOrderHeader and
SalesOrderDetail tables. As a database developer of AdventureWorks, Inc., you need to retrieve
all the order numbers for the sales accounts in the following format.
AccountNumber OrderNumbers
10-4020-000676 SO43659 SO44305 SO45061 SO45779
10-4020-000117 SO43660 SO47660 SO49857 SO51086

How will you retrieve the data in the given format?

To complete this exercise, a DLL will be provided to you that will have the code for the required
functionality.

Chapter 10
Implementing HTTP Endpoints
When organizations need to make the data available to users spread across various locations, they
implement Web applications. These Web applications make use of certain services, known as
Web services that provide programming support and help the applications to interact with the
database server. To allow communication between the Web services and the database server, you
need to provide access through additional ports on the Internet firewall. This increases the security
threat to the organization.

SQL Server 2005 provides native Hypertext Transfer Protocol (HTTP) support that allows you to
create Web services on the database server. These Web services expose Web methods that the
Web applications can access by using the HTTP endpoints.

This chapter introduces you to Web services. Next, it discusses the role of the HTTP endpoints in
the Web service architecture. Further, it explains how to implement HTTP endpoints.

Objectives
In this chapter, you will learn to:
Define the HTTP endpoints
Implement the HTTP endpoints for Web services

Introduction to HTTP Endpoints


Consider a scenario of a large-scale enterprise that has a number of sales officers. The
management has decided to provide Personal Desktop Assistant (PDA) devices to the sales
officers so that they can log the sales details online. In addition, the executives need to connect
the PDA device to the database server. However, the management does not want to allow direct
access to the secured database. Further, the architecture of PDA and the database server are
entirely different. Therefore, providing direct access from a PDA device to the database server
involves a high cost. As a result, there is an urgent need of a technology through which
heterogeneous applications and devices can easily communicate with the database. Here, the
management can implement Web services through which each sales executive can log the sales
details online from anywhere using any device.

A Web service is a collection of Web methods or functions, accessible over the Internet or an
intranet. Each of the Web methods provides specific business functionality and serves as a
building block of creating distributed application. Web services encapsulate the implementation
of business logic and provide an interface for accessing the data stored in the database.

SQL Server 2005 provides native HTTP support for Web services. HTTP is the protocol used for
communication over the Internet. Therefore, you can interact with the database over the Internet
by using HTTP. With the help of native HTTP support, you can create Web services in SQL
Server. Further, you can create HTTP endpoints to expose these services to the clients, who can
directly access these services over the Internet.

SQL Server 2005 supports the concept of an endpoint. An endpoint is a service that listens for
requests natively within the server. Each endpoint supports a protocol, which can be TCP or
HTTP, and a payload type, which can include support for database mirroring, service broker, T-
SQL, or SOAP.

The payload references the type of traffic that the endpoint permits.
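To make the concept concrete, the following is a hedged sketch of how an HTTP endpoint with a SOAP payload could be declared; the endpoint name, virtual path, and the stored procedure it exposes are all assumptions for illustration, not examples from this chapter:

```sql
CREATE ENDPOINT GetOrdersEndpoint        -- assumed endpoint name
STATE = STARTED
AS HTTP (
    PATH = '/sql/orders',                -- assumed virtual path on the server
    AUTHENTICATION = (INTEGRATED),       -- Windows authentication
    PORTS = (CLEAR),                     -- plain HTTP (SSL would use PORTS = (SSL))
    SITE = 'localhost'
)
FOR SOAP (
    WEBMETHOD 'GetOrders'                -- exposes an assumed stored procedure
        (NAME = 'AdventureWorks.dbo.uspGetOrders'),
    WSDL = DEFAULT,                      -- let SQL Server generate the WSDL
    DATABASE = 'AdventureWorks',
    NAMESPACE = 'http://AdventureWorks/Orders'
);
```

The AS HTTP clause configures the transport, while the FOR SOAP clause configures the payload, mirroring the protocol/payload split described above.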

The Web services are based on the Service-Oriented Architecture (SOA). Therefore, before you
understand how the Web services are implemented in SQL Server, it is essential to know about
SOA.

Introduction to SOA
SOA is an extension of distributed computing based on the request/response design pattern. In the
distributed application, the client interacts with the middle-tier by using HTTP, and the middle-
tier interacts with the database. It follows the n-tier architecture that is essentially a collection of
services communicating with each other.

SOA involves two types of entities, service provider and service consumer. A service provider
implements a set of services in the form of functions. In addition, the service provider exposes its
functionality through an interface. A service consumer uses the interface to make requests to the
service provider to use the services. The service provider processes the request and sends a reply
to the consumer, as shown in the following figure.

Service-Oriented Architecture
Consider an example of a large business organization that has a number of clients using diverse
hardware and software platforms, such as a PDA, a laptop, a Tablet PC, a mobile phone, or a Linux system. In
such a scenario, an architecture that enables communication between different types of clients and
servers is required. SOA allows creating interoperable services that can be accessed from
heterogeneous systems. Interoperability enables a service provider to host a service on any
hardware or software platform that can be different from the platform on the consumer end.

In addition, SOA provides the following benefits:

1. Enables you to create applications that reuse the existing business logic. For example, a
service provides logic to validate the identity of a customer. If you need to build two
different business applications that need to validate the identity of a customer, you can
use the same service for that purpose.
2. Enables changes to applications while keeping clients or service consumers isolated from
the changes. Consider the preceding example, where you have created a service to validate
the identity of a customer. If you need to modify the logic of this service, you are only
required to update the service. You do not need to make any changes in the business
applications that use the service.

Introduction to Web Services


A Web service is a set of functions that provide programmable logic used by client applications
over the Internet. These services are based on SOA and communicate through the Simple Object
Access Protocol (SOAP). SOAP is a protocol used for the exchange of information in a distributed
environment. It defines an XML-based message envelope that is typically transported over HTTP.
The clients send and receive SOAP messages in the XML format and communicate with the
service by using HTTP.

For example, a Web service provides the latest information about stocks. Various websites that
need to display the stock details can use this service to get the latest updates. This service will use
SOAP to communicate with the clients.

A Web service is implemented in the distributed environment, as shown in the following figure.
Implementation of a Web Service

A Web service encapsulates the implementation of the functions and provides an interface through
which the clients can call these functions. This interface is provided in the XML format in an
industry standard called Web Services Definition Language (WSDL). WSDL is a document that
is used by Web clients to retrieve information about the names of the methods and their
parameters.

Each Web service needs to be published by using a protocol called Universal Description,
Discovery and Integration (UDDI). UDDI helps the client applications locate the published Web
services.

Web services provide the following benefits:

Interoperability: A Web service communicates by using the HTTP protocol, which is the
standard protocol followed by the industry. Therefore, a Web service can be used by any
client application that can communicate through HTTP.
Multilanguage support: A Web service can be created by using any programming language.
Reusability: A Web service created for a particular application can be reused in other
applications as it follows the industry standards for implementation and communication.
Use of industry-supported standards: All of the major vendors support Web service-related
technologies, such as HTTP, XML, and SOAP. This enables heterogeneous applications to access
Web services easily.

Web Services in SQL Server

As a database developer, you have created a stored procedure that retrieves region-wise sales data
of the organization. You want to ensure that heterogeneous applications can use this data online.
For this, you can expose this stored procedure as a Web service in SQL Server. A Web service
provides a programming logic to implement a business rule. It can also manipulate data in a
database server. SQL Server 2005 provides the native HTTP support within the database engine
that allows database developers to create Web services. This allows the database users to interact
with the database over the Internet by using a Web service.

When you use the native Web services of SQL Server, you can send SOAP messaging requests
to an instance of SQL Server over HTTP to run the following:

Transact-SQL batch statements, with or without parameters.


Stored procedures and scalar-valued user-defined functions.

Just a minute:
Which of the following options describes a Web service?

1. WSDL
2. SOAP
3. UDDI
Answer:
1. WSDL

Just a minute:
Which of the following options helps in finding a Web service?

1. WSDL
2. SOAP
3. UDDI

Answer:
3. UDDI

Identifying the Role of HTTP Endpoints in a Native Web Service


Architecture
To use the native XML Web services of SQL Server, you need to establish an HTTP endpoint at
the server. This endpoint is the gateway through which HTTP-based clients can send queries to
the server. An HTTP endpoint listens for and receives client requests on port 80. The Http.sys
listener process receives these requests and routes them to the endpoint.

The following figure shows how the HTTP endpoint allows users to communicate with the Web
services implemented on SQL Server.

HTTP Endpoint Architecture

After establishing an HTTP endpoint, you can create stored procedures or user-defined functions
that can be made available to endpoint users. These procedures and functions are also called
Web methods. Together, the Web methods form a Web service. A SQL Server instance
provides a WSDL generator that helps generate the description of a Web service in the WSDL
format, which is used by the clients to send requests.

Just a minute:
On which of the following ports does SQL Server listen for HTTP requests?
1. 80
2. 90
3. 70

Answer:
1. 80

Implementing HTTP Endpoints for Web Services


As a database developer, you can configure an instance of SQL Server 2005 as a Web service that
can listen natively for HTTP SOAP requests. To perform this task, you need to create HTTP
endpoints and define the properties and methods that an endpoint exposes.

An HTTP endpoint opens the database to all the trusted users of your data. At times, you might
need to restrict access to the data to selected users only. Therefore, it is important to secure
HTTP endpoints by granting only selected users the permission to access them.

Creating HTTP Endpoints


As a database developer, you need to create HTTP endpoints to allow users to access a Web
service implemented in the database engine. For example, for an organization, you need to allow
the sales executives to add or update order details in a database. For this, you can create a Web
service and an HTTP endpoint through which the users will execute the service.

Before implementing an HTTP endpoint, you need to first create the database code that provides
the functionality to the client applications.

To create the database code, you need to determine which objects you want to expose from SQL
Server. You can expose stored procedures or user-defined functions as Web methods but cannot
expose tables or views directly.

After defining the object you want to expose, you need to run the T-SQL statements that expose
and launch the endpoints.

Creating the Required Database Code

To provide data access to various users on the Web, you need to create stored procedures or
functions, which will later be exposed as Web methods. The code allows users to perform
data manipulations or generate reports that are then accessed from the Web.

Consider an example, where the database users need to frequently generate reports. These reports
display the aggregated sales data or the region-wise customer details of an organization. You can
create stored procedures to generate the desired results.
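As a sketch, such a report-generating procedure might look like the following statement. The table name Sales.RegionSales and its columns are hypothetical names used only for illustration:

```sql
-- Sketch only: Sales.RegionSales and its columns are hypothetical names.
CREATE PROCEDURE GetRegionSales
    @Region NVARCHAR(50)
AS
BEGIN
    -- Aggregate the sales figures for the requested region
    SELECT Region, SUM(TotalDue) AS TotalSales
    FROM Sales.RegionSales
    WHERE Region = @Region
    GROUP BY Region
END
```

After such a procedure is created, it can be exposed as a Web method through an HTTP endpoint.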

Creating an HTTP Endpoint Object


After creating the database objects, you need to create an HTTP endpoint. This object provides
the users with a connecting point through which they can access the implemented functions.
When you create an HTTP endpoint, SQL Server converts the database objects into Web methods.
These Web methods return data that is compatible with the Web standards. These Web methods
can be called from any client applications, regardless of the platform of the client applications, to
access data.

You can use the CREATE ENDPOINT statement to create an HTTP endpoint. The syntax of the
CREATE ENDPOINT statement is:
CREATE ENDPOINT endpoint_name
STATE = { STARTED | STOPPED | DISABLED }
AS HTTP (
    AUTHENTICATION = ( { BASIC | DIGEST | INTEGRATED | NTLM | KERBEROS } ),
    PATH = 'url',
    PORTS = ( { CLEAR | SSL } [ ,...n ] )
    [ , SITE = { '*' | '+' | 'webSite' } ]
)
FOR SOAP (
    [ { WEBMETHOD [ 'namespace' . ] 'method_alias'
        ( NAME = 'database.schema.name'
          [ , SCHEMA = { NONE | STANDARD | DEFAULT } ]
          [ , FORMAT = { ALL_RESULTS | ROWSETS_ONLY } ] )
      } [ ,...n ] ]
    [ BATCHES = { ENABLED | DISABLED } ]
    [ , DATABASE = { 'database_name' | DEFAULT } ]
    [ , WSDL = { NONE | DEFAULT | 'sp_name' } ]
    [ , NAMESPACE = { 'namespace' | DEFAULT } ]
)

where,
endpoint_name is the name of the endpoint that you want to create.
STATE = { STARTED | STOPPED | DISABLED } specifies the state of the endpoint when it is
created. It can have the following values:

STARTED: Specifies that the endpoint is started and the server is actively listening to client
requests.
DISABLED: Specifies that the endpoint is disabled. In this state, the server neither listens nor
responds to client requests.
STOPPED: Specifies that the endpoint is stopped. In this state, the server listens to client
requests but returns errors to the clients.

AUTHENTICATION = ( { BASIC | DIGEST | NTLM | KERBEROS | INTEGRATED } ) specifies the
authentication method that will be used to verify the clients accessing the endpoint. This
parameter accepts any of the following values:

BASIC: Basic authentication sends the user name and password in an encoded format,
separated by a colon. It can be useful when the endpoint needs to be accessed locally, on
the server itself.
DIGEST: In digest authentication, the user name and password are hashed by using MD5, which
is a one-way hashing algorithm. The user name and password sent by using digest
authentication must be mapped to a Windows account.
NTLM: This protocol uses encryption for secure transmission of passwords. It provides more
security than basic or digest authentication.
KERBEROS: Kerberos authentication is an Internet standard authentication mechanism. While
using this mechanism, SQL Server must associate a Service Principal Name (SPN) with the
account accessing the endpoint.
INTEGRATED: Integrated authentication can use either NTLM or Kerberos authentication to
authenticate the client, depending on which type the client requests.

PATH = 'url' specifies the Uniform Resource Locator (URL) path, which is the location
where the endpoint will be hosted on the host computer. The name of the host computer is
specified by the SITE parameter.
PORTS = ( { CLEAR | SSL } [ ,...n ] ) specifies one or more listening port types that are
associated with the endpoint. This parameter can accept CLEAR or SSL. CLEAR specifies that
the incoming requests must use HTTP, whereas SSL specifies that the requests must use
secure HTTP (HTTPS).
[ SITE = { '*' | '+' | 'webSite' } ] specifies the name of the host computer. It can have
the following values:

* (asterisk): Is used to listen on all the possible host names for the computer that are not
otherwise reserved. It is the default option.
+ (plus sign): Is used to listen on all the possible host names for the computer.
webSite: Is used to specify a particular host name for the computer.

[ WEBMETHOD [ 'namespace' . ] 'method_alias' ] specifies the method that will be used to
accept the requests from the HTTP client. method_alias specifies an alternate name for a Web
method that a client can use to access the method.
NAME = 'database.schema.name' specifies the name of the stored procedure or user-defined
function that corresponds to the SOAP method specified in WEBMETHOD.
[ SCHEMA = { NONE | STANDARD | DEFAULT } ] determines whether an inline XSD (XML
Schema Definition) schema is returned for the current Web method in SOAP responses.
You can use any of the following values with the SCHEMA parameter:

NONE: The XSD schema is not returned for the SELECT statement results sent through SOAP.
STANDARD: The XSD schema is returned for the SELECT statement results sent through SOAP.
DEFAULT: Defaults to the endpoint SCHEMA option setting.

[ FORMAT = { ALL_RESULTS | ROWSETS_ONLY } ] specifies what will be returned by the Web
method. The default is ALL_RESULTS. You can use any of the following values with the
FORMAT parameter:

ALL_RESULTS: Returns a result set, a row count, error messages, and warnings in the SOAP
response.
ROWSETS_ONLY: Returns only the result sets.

BATCHES = { ENABLED | DISABLED } specifies whether the endpoint will process ad hoc
T-SQL requests.
DATABASE = { 'database_name' | DEFAULT } specifies the database in which the requested
operation is executed. If database_name is not specified or DEFAULT is specified, the
default database for the login is used.
WSDL = { NONE | DEFAULT | 'sp_name' } indicates whether WSDL document generation
is supported for this endpoint. If NONE, no WSDL response is generated or returned for the
WSDL queries submitted to the endpoint. If DEFAULT, a default WSDL response is
generated and returned for the WSDL queries submitted to the endpoint. In exceptional cases,
where you are implementing custom WSDL support for the endpoint, you can also specify by
name a stored procedure that will return a modified WSDL document.
NAMESPACE = { 'namespace' | DEFAULT } specifies the XML namespace for the endpoint.
If DEFAULT is specified or the option is omitted, the namespace http://tempuri.org is used.

You need to be a member of the sysadmin role to create an endpoint.

Consider an example, where you want to create an endpoint called sqlEndpoint with two Web
methods, GetSqlInfo and getSalesDetails. These are methods for which a client can send SOAP
requests to the endpoint. For this, you need to use the following statement in SQL Server:
CREATE ENDPOINT sqlEndpoint
STATE = STARTED AS HTTP(
PATH = '/sql',
AUTHENTICATION = (INTEGRATED ),
PORTS = ( CLEAR ), SITE = 'localhost' )
FOR SOAP (
WEBMETHOD 'getSqlInfo' (name='master.dbo.xp_msver', SCHEMA=STANDARD ),
WEBMETHOD 'getSalesDetails' (name='master.sys.fn_MSdayasnumber'),
WSDL = DEFAULT, SCHEMA = STANDARD, DATABASE = 'master', NAMESPACE =
'http://tempUri.org/' );

In the preceding statement, sqlEndpoint is the name of the endpoint. This endpoint will listen to
the HTTP request. The SITE specifies the database server name. The database server is using
integrated authentication. Here, the database object, xp_msver, in the master database is exposed
as a Web method called getSqlInfo(), and another database object, fn_MSdayasnumber, of the
master database is exposed as a Web method named getSalesDetails() in the Web service. The
server listens for client requests on a clear port. The default port number is 80. The path of the
service is /sql. Therefore, clients can access the Web service by using the URL
"http://localhost:80/sql?wsdl". The WSDL option is set to DEFAULT. Therefore, if a client
requests the Web service by using this URL, the server generates and returns a WSDL response
to the client. The SCHEMA option is set to STANDARD for the endpoint. Therefore, by default,
inline schemas are returned in SOAP responses.

You can use the following statement to drop an endpoint:


DROP ENDPOINT sqlEndpoint

While creating the endpoint, you may get the following error:
Msg 7890, Level 16, State 1, Line 1
An error occurred while attempting to register the endpoint, 'sqlEndpoint'. One or
more of the ports specified in the CREATE ENDPOINT statement may be bound to another
process. Attempt the statement again with a different port or use netstat to find
the application currently using the port and resolve the conflict.
The preceding error occurs when the requested port is already being used by another service
and is not free to host the endpoint. To resolve the error, you can specify an alternate clear
port in the AS HTTP clause, as follows:
PORTS = ( CLEAR ), CLEAR_PORT = 8080, SITE = 'localhost' )
The CLEAR_PORT option makes the endpoint listen on port 8080 instead of the default port 80.

Just a minute:
While creating an HTTP endpoint, which of the following FORMAT values will you use to return
only the result sets to the user?

1. ROWSETS_ONLY
2. ALL_RESULTS
3. NONE

Answer:
1. ROWSETS_ONLY

Securing HTTP Endpoints


A Web service exposes data on the Internet to trusted suppliers, customers, or business partners.
However, you need to ensure that only appropriate people gain access to critical data. Therefore,
it is important to secure the HTTP endpoint.

To secure endpoints, you need to perform the following steps:

1. Create appropriate user accounts within the database: You need to create appropriate
user accounts for the users who need to access the database.
2. Grant permissions for any stored procedure or user-defined functions that the user or
roles need to access by using the Web service: After allowing the users to access the
database, you need to provide the execute permissions on the database objects that will
be accessed by the endpoint.
3. Grant permissions to allow users or roles to connect to the HTTP endpoints: To allow the
users to execute the database objects over the Internet, you need to allow the users to
connect to the endpoint through which the users will connect to the Web service.
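The preceding steps can be sketched in T-SQL as follows. The login name SalesUser, the procedure dbo.hrDetails, and the endpoint name hrDetails are assumed names used only for illustration:

```sql
-- Step 1: Create a login and a matching database user (names are assumptions)
CREATE LOGIN SalesUser WITH PASSWORD = 'Str0ngP@ssw0rd'
GO
USE AdventureWorks
GO
CREATE USER SalesUser FOR LOGIN SalesUser
GO
-- Step 2: Grant execute permission on the procedure exposed by the Web service
GRANT EXECUTE ON dbo.hrDetails TO SalesUser
GO
-- Step 3: Allow the login to connect to the HTTP endpoint
GRANT CONNECT ON ENDPOINT::hrDetails TO SalesUser
```

Note that the CONNECT permission on an endpoint is granted at the server level, to the login rather than to the database user.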

Activity: Implementing HTTP Endpoints

Problem Statement
The database server of AdventureWorks, Inc. is located at Beijing. The organization has
offices at various locations spread across the globe.

According to the requirements, the users need to access the data of all the employees at various
locations. Users might also need to use PDAs or mobile phones to access these details. As a
database developer, you have decided to implement a Web service that allows the users to access
the data using the Internet.

How will you implement this service inside the AdventureWorks database?

Solution

To solve the preceding problem, you need to perform the following tasks:

1. Create a procedure.
2. Create an HTTP endpoint for SOAP.
3. Verify the creation of the HTTP endpoint.
4. Create a client application to access the endpoint.

Task 1: Creating a Procedure

To create a procedure that will display the details of all the employees, you need to perform the
following steps:
1. Write the following statement in the Query Editor window of the Microsoft SQL Server
Management Studio window:
CREATE PROCEDURE hrDetails AS
SELECT ContactID, Title + ' ' + FirstName + ' ' + LastName AS Name, EmailAddress
FROM Person.Contact

2. Press the F5 key to execute the statement.

Task 2: Creating an HTTP Endpoint for SOAP

To create an HTTP endpoint that will access the hrDetails procedure, you need to perform the
following steps:
1. Type the following statement to create an HTTP endpoint in the Query Editor window:
CREATE ENDPOINT hrDetails
STATE = STARTED AS HTTP(
PATH = '/AdventureWorks',
AUTHENTICATION = (INTEGRATED ),
PORTS = ( CLEAR ), CLEAR_PORT = 8088, SITE = 'localhost' )
FOR SOAP (
WEBMETHOD 'hrDetails' (name='AdventureWorks.dbo.hrDetails', FORMAT = ROWSETS_ONLY),
WSDL = DEFAULT, SCHEMA = STANDARD,
DATABASE = 'AdventureWorks', NAMESPACE = 'http://tempUri.org/' );

2. Press the F5 key to execute the statement.


Task 3: Verifying the Creation of the HTTP Endpoint

To verify the creation of an HTTP endpoint, you can view the details of the Web method in the
Microsoft SQL Server Management Studio window. For this, you need to expand the Server
Objects→Endpoints→SOAP nodes in the Object Explorer window.

You can view the hrDetails endpoint listed under the SOAP node, as shown in the following
figure.

hrDetails Endpoint Listed in the Object Explorer Window

If you are unable to view the endpoint in the Object Explorer window, you need to refresh the
Object Explorer window.

Task 4: Creating a Client Application to Access the Endpoint

To verify the data being accessed by the endpoint, you need to create a client application. You
can write the client application in any of the .NET supported languages.

For example, if you create the client application using C#, you need to perform the following
steps:

1. Create a Windows application named WebServiceClient. Add a DataGridView control named
dataGridView1 and a button named button1 to the Windows form, as shown in the
following figure.
Windows Form

2. Select Project→Add Web Reference to display the Add Web Reference dialog box.
3. Type http://localhost:8088/AdventureWorks?wsdl in the URL combo box.
4. Click the GO button. The hrDetails service will be displayed in the left pane of the Add
Web Reference dialog box, as shown in the following figure.

Add Web Reference Dialog Box

5. Type AdventureWorks in the Web reference name text box.


6. Click the Add Reference button. The Web service reference will be added in the client
application.
7. Write the following code in the click event of button1:
WebServiceClient.AdventureWorks.hrDetails
hr = new WebServiceClient.AdventureWorks.hrDetails();
// makes the object of the class
hr.Credentials = System.Net.CredentialCache.DefaultCredentials;
// uses the default username and password to execute the code
DataSet ds = hr.CallhrDetails();
//executes the Web method and stores the result in ds dataset
dataGridView1.DataSource = ds.Tables[0].DefaultView;
//displays the dataset to the user

8. Press the F5 key to execute the client application. The Form1 window will be displayed.
9. Click the Call Web Service button. The data will be displayed in the datagrid, as shown in
the following figure.

Result Displayed in the Datagrid

Summary
In this chapter, you learned that:

A Web service is a piece of code that is exposed over the Internet.
Web services have the following advantages:
 Interoperability
 Multilanguage support
 Reusability
 Use of industry-supported standards
SOAP is a standard communication protocol to interchange information in a structured
format in a distributed environment.
WSDL is a markup language that describes a Web service.
UDDI provides a standard mechanism to register and discover a Web service.
HTTP endpoints allow you to create and use Web services within SQL Server.
Before creating an HTTP endpoint, you need to first create stored procedures or functions
that form a Web service.
HTTP endpoints provide the users with a connecting point through which they can access the
implemented functions.
You can create HTTP endpoints by using the CREATE ENDPOINT statement.
To secure an HTTP endpoint, you can create users and assign permissions to the users to
execute a Web method or to connect to the endpoint.

Exercises

Exercise 1

Multiple business partners of AdventureWorks need to use the details of all the sales stores stored
in the AdventureWorks database server. The details include the name of the store, name of the
store owner, and e-mail address of the owner. As per the company’s security policy, access to
company’s databases cannot be provided to any external entity.

As a database developer, how will you make the list available to other organizations without any
additional overhead?

Chapter 11
Implementing Services for Message-Based Communication
In n-tier architecture, the clients and servers can be placed at distant locations. The server that
implements business logic can provide services to various clients. In such a scenario, it is essential
to ensure asynchronous communication. This allows the client applications to send requests even
if the server is not available. SQL Server 2005 allows you to implement this by using Service
Broker.

Service Broker provides a platform on which the services interact by sending and receiving
messages. Services are self-contained components that provide the required functionality.

This chapter provides an overview of the message-based communication. Further, it explains how
to implement services for a message-based communication.

Objectives
In this chapter, you will learn to:
Appreciate message-based communication
Implement Service Broker

Message-Based Communication
Consider a scenario of credit card services. When a person makes a purchase through a credit
card, the credit card details are validated and the purchase transaction is completed. In addition,
an entry is made in the database of the credit card issuing bank for the amount that the person has
to pay for the transaction. At the end of the billing period, a consolidated bill is sent to the person
to recover the amount credited.

In this scenario, there is also a possibility that the client application is not able to connect to the
database. Therefore, it is required to build a reliable system to ensure that all the requests sent to
the server will be processed. This can be done with the help of the Service Broker feature provided
by SQL Server 2005.
Service Broker is a message-based communication platform that helps in maintaining reliable
query processing and asynchronous communication.

Introduction to Service Broker


Service Broker allows the database developers to create services that converse with each other by
sending and receiving messages. A service is a database object that provides an endpoint for a
conversation. It sends a request message to another service to utilize the provided functionality.

In the previous example of the credit card system, the client application sends a message to a
service on the database server to enter the transaction details. This service places the message in
a queue and commits the transaction of the client application. When multiple users send requests,
all the messages are placed in the queue. The server processes these messages in the order in
which they are received, as shown in the following figure.

Asynchronous Processing of Messages

If a message cannot be processed in the database server, the message is returned to its original
position in the queue. Whenever the server is available, the message is again sent for processing.
After successful processing, a reply is sent to the client. This helps in performing asynchronous
transactions and reliable query processing.

Service Broker is also useful for large-scale distributed applications that need to process data in
multiple database servers located at different locations. For example, a Web application accepts
data from a user through a Web form. The data needs to be saved on two different database servers.
In addition, the application needs to update the data in another remote database server.

Such an application can implement Service Broker because it provides message queuing and
reliable message delivery. The application can continue to accept information from the clients
even if one of the servers is not available.

The following list represents the usage of Service Broker in business processing:

Asynchronous triggers: Triggers always execute synchronously in the context of a
transaction. However, you can create a trigger that invokes an asynchronous process by using
Service Broker. The asynchronous trigger queues a message to perform another job through
Service Broker. This job is performed in a separate transaction, thereby allowing the original
transaction to commit immediately. Therefore, applications implementing asynchronous
triggers can avoid system slowdowns that result from keeping the original transaction open
while performing the job.
Reliable query processing: Some applications must reliably process queries, without
interruptions from system failures, power outages, or similar problems. Such applications can
submit queries by sending messages to a Service Broker service. The application that
implements the service reads the message, runs the query, and returns the results. All three
of these operations take place in the same transaction. If a failure occurs, the entire
transaction rolls back and the message returns to the queue. When the system recovers from
the failure, the application restarts and processes the message again.
Reliable data collection: Some applications collect data from different servers. Such
applications can take advantage of Service Broker to collect data reliably. For example,
consider a retail application hosted on multiple websites. This application can use Service
Broker to send transaction information to a central data server. Due to the reliable, asynchronous
message delivery provided by Service Broker, each website can continue to process
transactions even if it loses connection to the central data server. This helps
protect the data and ensures that messages are not misdirected.
Distributed server-side processing for client applications: Service Broker can help
applications that access multiple databases for information. For example, an order processing
application can use Service Broker to exchange the customer, credit, and inventory
information between different databases. Service Broker can provide message queuing,
therefore, the application can accept orders even when one of the databases is unavailable.
When the server is available again, the data is retrieved from the queue for further
processing.
Data consolidation for client applications: Any client application that uses and displays
data simultaneously from multiple servers can use Service Broker. For example, an
application consolidates data from different databases located at multiple servers and
displays it on one screen. Here, the application can send multiple requests to
different services in parallel by using Service Broker. When the services respond to the
requests, the application can collect and display the results. Therefore, the parallel
processing offered by Service Broker helps in reducing the response time significantly.
Large-scale batch processing: Applications that handle large volumes of data and perform
large-scale batch processing can also use the parallel processing and queuing offered by Service
Broker. Here, the data to be processed can be stored in a queue. A program can periodically
read and process the data from the queue.

Service Broker Architecture

Service Broker is based on the Service Broker architecture. This architecture consists of the
following database objects:

Message: Is the data exchanged between services.


Service: Is an addressable endpoint for the conversations. The Service Broker messages are
sent from one service to another. The two types of services that take part in a conversation
are initiator and processing services. The initiator services initiate the conversation and send
a message to the processing service.
Message type: Defines the content of the messages exchanged between the participants in
a conversation. A message type object defines the name of the message type and the type of
content the message can contain.
Contract: Is an agreement between the two services about the messages exchanged between
them. The same contract must be created on each participant database that is involved in
the conversation.
Queue: Is a container that stores messages. Each service is associated with a queue. When a
message is sent for a service, Service Broker places the message in the queue. A queue is
represented in the form of a table where each message is placed in a row. Each row contains
the message and its information, such as the message type, the initiator, and the target
service.
Service program: Is a program that provides logic to the service. When a message is received
for a service, Service Broker automatically initiates the service program and forwards the
message to the program.

Introduction to Service Broker Conversation Process


In the Service Broker architecture, various services converse with each other by sending and
receiving messages. The following figure explains the conversation process between two services.

Service Broker Conversation Process

The Service Broker applications send and receive messages across services. A message is sent
from one service to another for processing. When a service receives a message, it verifies that the
message is of the same type as specified in the contract. After verification, the message is added
to the queue. Every service has a service program attached to it. The service program receives the
top-most message in the queue and processes it.

After the processing is complete, a response or acknowledgement can also be sent back to the
initiator service.

The conversation process between two services in Service Broker includes the following objects:

Dialog: A dialog is a two-way conversation between two Service Broker services. The sending
service is referred to as the initiator service, and the receiving service is referred to as the
target service. A dialog defines the contract that will be used for the conversation. It also
defines the encryption options and the lifetime of a single conversation.

Applications exchange messages as part of the dialog. When SQL Server receives a message,
it places the message in the queue for the dialog. The application receives the message from
the queue and processes the message.

Conversation group: The conversation group is a collection of related dialog conversations.


You can consider the example of an Airlines Reservation System. A conversation group is like
a family that needs to check in together for a flight. The same concept also applies to
messages related to each other. These messages must be received and processed in order.
An application can use conversation groups to identify messages related to the same business
task, and process those messages together.
End points: SQL Server 2005 uses end points to communicate with Service Brokers on
different SQL Server instances. An end point allows a Service Broker to communicate over
the network by using a transport protocol, such as TCP. An end point for the
transport protocol has to be defined by using the CREATE ENDPOINT statement.

By default, SQL Server does not contain any end points. Therefore, end points must be created first
to enable communication. The security between SQL Server instances must be enabled for such
communication to be allowed.
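As a minimal sketch, an end point for Service Broker communication over TCP could be created as follows; the end point name and port number are illustrative:

```sql
-- Creates a Service Broker end point that listens on TCP port 4022 and
-- authenticates the remote instance by using Windows authentication.
CREATE ENDPOINT BrokerEndpoint
    STATE = STARTED
    AS TCP ( LISTENER_PORT = 4022 )
    FOR SERVICE_BROKER ( AUTHENTICATION = WINDOWS );
```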

Routes: A Service Broker uses routes to determine the destination of a message. While
creating routes, you specify the service that the route points to. You also specify the network
address, and the protocol for the route. By default, each database has an AutoCreatedLocal
route that delivers messages to services in the local instance of the database.
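A route to a service on a remote instance could be created as in the following sketch; the route, service, and server names are illustrative:

```sql
-- Messages addressed to ExpenseService are forwarded to the Service Broker
-- end point listening on remoteserver, TCP port 4022.
CREATE ROUTE ExpenseRoute
    WITH SERVICE_NAME = 'ExpenseService',
         ADDRESS = 'TCP://remoteserver:4022';
```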

Just a minute:
Which of the following objects processes a message from a queue?

1. Service
2. Service program
3. Contract

Answer:
2. Service program

Implementing Service Broker


While implementing Service Broker, you need to first create the Service Broker objects, such as
messages, queues, contracts, and services. Next, you can begin a conversation. After the
conversation has started, the objects can communicate with each other by sending and receiving
messages.
To create any of the Service Broker objects, you need to be a member of the db_ddladmin role on
the specific database. In addition, you need to enable Service Broker in the database by using
the following statement:
ALTER DATABASE <database name> SET ENABLE_BROKER
where,
<database name> is the name of the database for which the Service Broker needs to be
enabled.
Only users with the sysadmin privileges can enable the Service Broker.

Creating Messages
A message is an entity that is exchanged between Service Broker services. A message requires
a name to participate in a conversation, and it can specify validation for the content it carries. As
part of a conversation, each message has a unique identifier as well as a unique sequence number
that enforces the ordering of messages in the queue.

You can use the CREATE MESSAGE TYPE statement to create a new message. The syntax of
the CREATE MESSAGE TYPE statement is:
CREATE MESSAGE TYPE message_type_name
[ VALIDATION = { NONE | EMPTY | WELL_FORMED_XML | VALID_XML WITH SCHEMA COLLECTION
schema_collection_name } ] [ ; ]

where,
message_type_name is the name of the message type that you want to create.
VALIDATION specifies how the message should be validated before sending. The default value
is NONE. The validation clause can take any of the following values:

NONE: Specifies that no validation is performed. The message body may contain any data, or
may be NULL.
EMPTY: Specifies that the message body must be NULL.
WELL_FORMED_XML: Specifies that the message must be a well-formed XML snippet.
VALID_XML WITH SCHEMA COLLECTION: Specifies that the message present in the XML snippet
must comply with a schema in the specified schema collection.

schema_collection_name specifies the name of an existing XML schema collection.

For example, the following statement creates a message type named sendMessage:
CREATE MESSAGE TYPE
sendMessage
VALIDATION = WELL_FORMED_XML

The preceding statement creates a message type named sendMessage, which ensures that the
message is a well-formed XML snippet.
Consider another example, where a message present in the XML snippet must be validated with
an existing schema. You can register a schema in the schema collection by using the following
statement:
CREATE XML SCHEMA COLLECTION ExpenseSchema AS
N'<?xml version="1.0" encoding="UTF-16" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://Adventure-Works.com/schemas/expenseReport"
xmlns:expense="http://Adventure-Works.com/schemas/expenseReport"
elementFormDefault="qualified"
>
<xsd:complexType name="expenseReportType">
<xsd:sequence>
<xsd:element name="EmployeeName" type="xsd:string"/>
<xsd:element name="EmployeeID" type="xsd:string"/>
<xsd:element name="ItemDetail"
type="expense:ItemDetailType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>

<xsd:complexType name="ItemDetailType">
<xsd:sequence>
<xsd:element name="Date" type="xsd:date"/>
<xsd:element name="CostCenter" type="xsd:string"/>
<xsd:element name="Total" type="xsd:decimal"/>
<xsd:element name="Currency" type="xsd:string"/>
<xsd:element name="Description" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="ExpenseReport" type="expense:expenseReportType"/>

</xsd:schema>'

The preceding statement creates an XML schema collection that holds the schema for a simple
expense report.

You can use the following statement to create a message type using the preceding schema:
CREATE MESSAGE TYPE
sendResponse
VALIDATION = VALID_XML WITH SCHEMA COLLECTION ExpenseSchema

The preceding statement creates a new message type by the name sendResponse that validates the
messages against the schema.

Creating Queues
A queue is an object that stores the messages. In other words, a queue is the primary storage for
messages transferred between two services. A queue can be viewed as a pipeline for messages.
When the queue receives a message, it calls an associated stored procedure to process the message.
This stored procedure is the service program that provides the required functionality.
You can create a queue by using the CREATE QUEUE statement. The syntax of the CREATE
QUEUE statement is:
CREATE QUEUE [ database_name. [ schema_name. ] ] queue_name
[ WITH
  [ STATUS = { ON | OFF } [ , ] ]
  [ RETENTION = { ON | OFF } [ , ] ]
  [ ACTIVATION (
      [ STATUS = { ON | OFF } , ]
      PROCEDURE_NAME = <procedure> ,
      MAX_QUEUE_READERS = max_readers ,
      EXECUTE AS { SELF | 'user_name' | OWNER } ) ] ]
[ ON { filegroup | [ DEFAULT ] } ]

where,
queue_name is the name of the queue.
STATUS (Queue) specifies the state of the queue. The state of the queue can be ON or OFF. ON
signifies that the queue is available to receive messages. Alternatively, OFF specifies that the
queue is not available and no message can be added or removed from the queue.
RETENTION specifies whether the messages sent or received within a particular conversation are
retained in the queue. The default value is OFF.
ACTIVATION specifies the information about the stored procedure that has to process messages
in the queue.
STATUS (Activation) specifies whether the queue should execute the activation stored procedure
when it receives a message. The default value is ON.
PROCEDURE_NAME = <procedure> specifies the name of the stored procedure to start to process
messages in the queue.
MAX_QUEUE_READERS = max_readers specifies the maximum number of instances of the stored
procedure that the queue starts at the same time. It must be a value between 0 and 32767.
EXECUTE AS specifies the SQL Server database user account under which the activation stored
procedure runs. It can have the following values:

SELF specifies that the stored procedure executes as the current user.
‘user_name’ is the name of the user with whose credentials the stored procedure will be
executed.
OWNER specifies that the stored procedure executes as the owner of the queue.

ON filegroup | [ DEFAULT ] specifies the name of the filegroup where the queue will be
created. Here, DEFAULT is not a keyword. When no filegroup is specified, the queue uses
the default filegroup of the database.

For example, the following statement creates a queue named sendQueue:


CREATE QUEUE sendQueue WITH STATUS = ON,
ACTIVATION (
PROCEDURE_NAME = sendProc,
MAX_QUEUE_READERS = 5,
EXECUTE AS SELF )

The preceding statement creates a queue named sendQueue that is available to receive messages.
The queue starts the stored procedure named sendProc when a message enters the queue. The
stored procedure executes as the current user. The queue starts a maximum of five instances of
the stored procedure.
The sendProc stored procedure must exist in the database for the preceding statement to execute.

The following statement creates a queue that is unavailable to receive messages:


CREATE QUEUE ExpenseQueue WITH STATUS=OFF

You can use the ALTER QUEUE statement to make the queue available to receive messages, as
shown in the following statement:
ALTER QUEUE ExpenseQueue WITH STATUS=ON

Creating Contracts
A contract is an agreement between two services that need to communicate with each other. It
also specifies the type of message that will be used in a conversation between the two services.
You can create a contract by using the CREATE CONTRACT statement.

The syntax of the CREATE CONTRACT statement is:


CREATE CONTRACT contract_name
[ AUTHORIZATION owner_name ]
( { { message_type_name | [ DEFAULT ] }
SENT BY { INITIATOR | TARGET | ANY } } [ ,...n ] ) [ ; ]

where,
contract_name is the name of the contract to be created.
AUTHORIZATION owner_name sets the owner of the contract to the specified database user or role.
message_type_name is the name of the message type to be included in the contract.
SENT BY specifies which endpoint can send messages of the given type. It may take the
following values:

INITIATOR: Indicates that only the endpoint that started the conversation will be able to send
this type of message.
TARGET: Indicates that only the endpoint that is the target of the conversation will be able to
send this type of message.
ANY: Indicates that messages of this type can be sent by both the initiator and the target.

For example, the following statement creates a contract named sendContract:


CREATE CONTRACT sendContract
(
sendMessage SENT BY INITIATOR
);

In the preceding statement, a new contract with the name sendContract is created. The contract
uses the sendMessage message type for communication.

Creating Services
A service is used by Service Broker to deliver messages to the correct queue within a database or
to route messages to another database. In addition, it is used to enforce the contract for a
conversation.

You can create a service by using the CREATE SERVICE statement. The syntax of the CREATE
SERVICE statement is:
CREATE SERVICE service_name
[ AUTHORIZATION owner_name ]
ON QUEUE [ schema_name. ] queue_name
[ ( contract_name | [ DEFAULT ] [ ,...n ] ) ] [ ; ]

where,
service_name is the name of the service to be created. You cannot use the name of a server,
database, or schema as the name of a service.
AUTHORIZATION owner_name sets the owner of the service to the specified database user or role.
ON QUEUE [ schema_name . ] queue_name specifies the queue in which all the messages sent to
this service will be stored.
contract_name specifies a contract for which this service may be a target. If no contract is
specified, the service may only initiate conversations.

For example, the following statement creates a service named sendService on the queue,
sendQueue:
CREATE SERVICE sendService
ON QUEUE
[dbo].[sendQueue] (sendContract)

The preceding statement specifies that the service must follow the contract named sendContract.
You can also create a service that can only initiate a conversation by using the following
statement:
CREATE SERVICE initiatorService
ON QUEUE [dbo].[sendQueue]

The preceding statement creates a service that has no contract information. Therefore, the service
can only be the initiator of the conversation.

Beginning a Conversation
In Service Broker, two services communicate with each other through a dialog. A dialog allows
bi-directional communication between two services. The dialog ensures that messages are
received in the same order in which they were sent. Before sending any message, it is important
to begin a dialog conversation between two services.

You can begin a conversation by using the BEGIN DIALOG statement. The syntax of the BEGIN
DIALOG statement is:
BEGIN DIALOG [ CONVERSATION ] @dialog_handle
FROM SERVICE initiator_service_name
TO SERVICE 'target_service_name'
[ ON CONTRACT contract_name ]
where,
@dialog_handle is a variable name that stores the system created handle of the particular
conversation. This variable must be of uniqueidentifier data type.
FROM SERVICE initiator_service_name specifies the service that initiates the dialog.
TO SERVICE 'target_service_name' specifies the target service with which to initiate the
dialog.
ON CONTRACT contract_name is the name of the contract that will be followed in this
conversation between services.

For example, the following statements begin a conversation between the sendService and
receiveService services by using the sendContract contract:
DECLARE @dialog_handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @dialog_handle FROM
SERVICE [sendService] TO SERVICE 'receiveService' ON CONTRACT [sendContract];

The preceding statements begin a dialog conversation and store an identifier for the dialog in a
variable named @dialog_handle. The sendService is the initiator for the dialog, and the
receiveService is the target of the dialog. The dialog follows the contract named sendContract.

Sending and Receiving Messages


After you have created all the required broker objects and begun the conversation, you are ready
to send and receive messages.

To send a message, you need to use the SEND statement. The syntax of the SEND statement is:
SEND ON CONVERSATION conversation_handle [ MESSAGE TYPE message_type_name ] [ (
message_body_expression ) ] [ ; ]

where,
ON CONVERSATION conversation_handle specifies the conversation that the message belongs to.
The conversation_handle contains the valid conversation identifier.
MESSAGE TYPE message_type_name specifies the message type of the sent message.
message_body_expression is the message that is required to be sent.

For example, the following statement sends a message named sendMessage:


SEND ON CONVERSATION
@dialog_handle MESSAGE TYPE [sendMessage]
('<name>John</name>')

To receive a message, you need to use the RECEIVE statement, and perform the following steps:

1. Declare variables for storing the message details.


2. Call the RECEIVE statement.
3. Process the message.
4. If the conversation ends, call the END CONVERSATION statement.

The syntax of the RECEIVE statement is:


[ WAITFOR ( ]
RECEIVE [ TOP ( n ) ]
<column_specifier> [ ,...n ]
FROM queue_name
[ ) ] [ , TIMEOUT timeout ]

where,
WAITFOR specifies that the RECEIVE statement waits for a message to arrive on the queue if
no messages are currently present. If you specify the TIMEOUT value, the RECEIVE
statement will wait for the given duration.

TOP (n) specifies how many messages to receive from the queue.

column_specifier [ ,...n ] specifies the list of columns that you want to retrieve from the queue.

queue_name specifies the name of the queue.

TIMEOUT timeout specifies the amount of time, in milliseconds, for the statement to wait for a
message.

For example, you can use the following statement to receive message from a queue:
-- Assumes that the variable @message_body has already been declared
WAITFOR (
RECEIVE TOP(1)
@message_body = message_body
FROM ExpenseQueue
), TIMEOUT 3000

The preceding statement is receiving one message from a queue named ExpenseQueue, and
storing the message in a variable named @message_body. If the message is yet to arrive in the
queue, the processing will wait for three seconds.

If the conversation is over, you can call the END CONVERSATION statement to end the
conversation. The syntax of the END CONVERSATION statement is:
END CONVERSATION conversation_handle

where,

conversation_handle specifies the conversation handle for a conversation to end.

For example, you can end a conversation using the following statement:
END CONVERSATION @dialog_handle

The execution of the preceding statement ends the dialog conversation specified by
@dialog_handle.
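Putting the four steps together, the body of a simple receiving program might look like the following sketch. The queue and message type names assume the objects created earlier in this chapter; the EndDialog name is the built-in system message type:

```sql
-- Step 1: declare variables for the message details.
DECLARE @dialog    UNIQUEIDENTIFIER,
        @msg_type  SYSNAME,
        @msg_body  NVARCHAR(MAX);

-- Step 2: receive the top-most message, waiting up to three seconds.
WAITFOR (
    RECEIVE TOP (1)
        @dialog   = conversation_handle,
        @msg_type = message_type_name,
        @msg_body = CAST(message_body AS NVARCHAR(MAX))
    FROM sendQueue
), TIMEOUT 3000;

-- Step 3: process the message.
IF (@msg_type = 'sendMessage')
    PRINT @msg_body;
-- Step 4: end the conversation when the other side has ended it.
ELSE IF (@msg_type = 'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
    END CONVERSATION @dialog;
```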

Activity: Implementing Service Broker

Problem Statement
The management of AdventureWorks, Inc. wants to know the exact yearly sales at any point of
the year to help them plan future strategies. The aggregated yearly sales data is maintained in the
SalesDetails table of the SalesDB database.

The sales transaction details are stored in the SalesOrderHeader and SalesOrderDetails tables in
the AdventureWorks database. To keep the yearly sales data updated, you need to ensure that
whenever any order is processed and its shipping date is updated in the AdventureWorks database,
the total monetary value of that order, stored in the SubTotal column of the table, should be added
to the total yearly sales in the SalesDB database.

For this activity, the SalesDB database should be present on the server. This database will be used
for generating reports.

Solution

To solve the preceding problem, you need to implement Service Broker in the database. To
implement Service Broker, you need to perform the following tasks:

1. Create a service program.


2. Create message type, contract, queue, and service objects.
3. Create a trigger on the SalesOrderHeader table.
4. Verify the functionality.

Task 1: Creating a Service Program

Before you create a Service Broker solution, you need to create a service program. To create a
service program in the SalesDB database, you need to perform the following steps:

1. Open the Microsoft SQL Server Management Studio window.


2. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:

USE SalesDB
GO
CREATE PROCEDURE [dbo].[OnReceiveMessage]
AS
DECLARE @message_type int
DECLARE @dialog uniqueidentifier,
@ErrorSave int,
@ErrorDesc nvarchar(100),
@message_body int;
WHILE (1 = 1)
BEGIN
BEGIN TRANSACTION
WAITFOR (
RECEIVE TOP(1)
@message_type=message_type_id,
@message_body=message_body,
@dialog = conversation_handle
FROM SalesQueue
), TIMEOUT 3000

IF (@@ROWCOUNT = 0)
BEGIN
ROLLBACK TRANSACTION
BREAK
END

SET @ErrorSave = @@ERROR;
IF (@ErrorSave <> 0)
BEGIN
ROLLBACK TRANSACTION;
SET @ErrorDesc = N'An error has occurred.';
END CONVERSATION @dialog
WITH ERROR = @ErrorSave DESCRIPTION = @ErrorDesc;
END
ELSE
IF (@message_type <> 2)
BEGIN
UPDATE SalesDetails SET TotalSales = TotalSales + @message_body;
END
ELSE
BEGIN
END CONVERSATION @dialog
END
COMMIT TRANSACTION
END

3. Press the F5 key to execute the statements.

Task 2: Creating Message Type, Contract, Queue, and Service Objects

After the service program has been created, you need to create message type, contract, queue, and
service objects in the AdventureWorks and SalesDB databases.

To create these objects, you need to perform the following steps:

1. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:

USE AdventureWorks
GO
CREATE MESSAGE TYPE SendMessage
VALIDATION = NONE

CREATE MESSAGE TYPE AcknowledgeMessage


VALIDATION = NONE

CREATE CONTRACT MyContract


(SendMessage SENT BY INITIATOR,
AcknowledgeMessage SENT BY TARGET)

CREATE QUEUE AdvQueue;

CREATE SERVICE SalesService


ON QUEUE AdvQueue (MyContract)

2. Press the F5 key to execute the statements.


3. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:

USE SalesDB
GO

CREATE MESSAGE TYPE SendMessage


VALIDATION = NONE

CREATE MESSAGE TYPE AcknowledgeMessage


VALIDATION = NONE

CREATE CONTRACT MyContract


(SendMessage SENT BY INITIATOR,
AcknowledgeMessage SENT BY INITIATOR)

CREATE QUEUE SalesQueue


WITH STATUS=ON,
ACTIVATION (
PROCEDURE_NAME = OnReceiveMessage,
MAX_QUEUE_READERS = 5,
Execute AS SELF) ;

CREATE SERVICE RecieveService


ON QUEUE SalesQueue (MyContract)

4. Press the F5 key to execute the statements.

Task 3: Creating a Trigger on the SalesOrderHeader Table

To update the SalesDB database when any sale is completed, you need to create an insert trigger
on the Sales.SalesOrderHeader table. To accomplish this task, you need to perform the following
steps:

1. Type the following statements in the Query Editor window of the Microsoft SQL Server
Management Studio window:

USE AdventureWorks
GO
CREATE TRIGGER SendTrigger ON Sales.SalesOrderHeader FOR UPDATE AS
DECLARE @amt AS int
SELECT @amt = SubTotal FROM Inserted
DECLARE @dialog_handle UNIQUEIDENTIFIER ;
BEGIN DIALOG CONVERSATION @dialog_handle FROM
SERVICE [SalesService] TO SERVICE 'RecieveService' ON CONTRACT [MyContract] ;
SEND ON CONVERSATION
@dialog_handle MESSAGE TYPE [SendMessage]
(@amt)

2. Press the F5 key to execute the statements.

Task 4: Verifying the Functionality

1. To verify the functionality, update a row in the Sales.SalesOrderHeader table of the


AdventureWorks database by executing the following statement:

UPDATE Sales.SalesOrderHeader
SET ShipDate = getdate()
WHERE SalesOrderID = 43692

Verify that the respective amount is added to the SalesDetails table of the SalesDB database.

To verify the reliable query processing feature of Service Broker, detach the SalesDB database
from SQL Server by executing the following statement:
EXEC sp_detach_db 'SalesDB', 'true'

Then, execute the following UPDATE statement to update the Sales.SalesOrderHeader table of
the AdventureWorks database:
UPDATE Sales.SalesOrderHeader
SET ShipDate = getdate()
WHERE SalesOrderID = 43692

The SendTrigger update trigger of the Sales.SalesOrderHeader table sends messages to the
ReceiveService service in the SalesDB database. However, neither the service nor the database is
available. As a result, the message will remain in the transmission queue until the ReceiveService
service of the SalesDB database is made available.

You can execute the following query to view the message present in the transmission queue:
SELECT * FROM sys.transmission_queue

The following figure displays the output of the preceding query.

Message Present in the Transmission Queue

Next, you can attach the SalesDB database by executing the following statement:
EXEC sp_attach_db @dbname = N'SalesDB',
@filename1 = N'C:\Program Files\Microsoft SQL
Server\MSSQL.1\MSSQL\Data\SalesDB.mdf',
@filename2 = N'C:\Program Files\Microsoft SQL
Server\MSSQL.1\MSSQL\Data\SalesDB_Log.ldf'

After you have attached the SalesDB database, you need to enable Service Broker in the SalesDB
database by using the following statement:
ALTER DATABASE SalesDB SET ENABLE_BROKER

Next, you need to verify that the SalesDetails table of the SalesDB database has been updated
with the amount sent in the message. The table is now updated.

If you query the transmission queue now, you will find that the message has been removed from
the queue, as shown in the following figure.

Transmission Queue After all the Messages are Processed

Summary
In this chapter, you learned that:

Service Broker provides a platform that allows the developers to create asynchronous and
reliable query processing.
In Service Broker, developers create services that converse with each other by sending and
receiving messages.
The following list represents the usage of Service Broker in business processing:
 Asynchronous triggers
 Reliable query processing
 Reliable data collection
 Distributed server-side processing for client applications
 Data consolidation for client applications
 Large-scale batch processing
A message is the data exchanged between services.
Each message is of a specific message type.
A service is a database object that provides an endpoint for a conversation.
A contract is an agreement between the services that participate in a conversation.
Each service is associated with a queue that acts as a container that stores messages.
A service program provides the required service to which a message is forwarded by the
queue for processing.
The conversation process between two services in Service Broker includes the following
objects:
 Dialog
 Conversation Groups
 End Point
 Routes
When implementing Service Broker, you need to create message, queue, contract, service,
and conversation database objects.
A message can be created by using the CREATE MESSAGE TYPE statement.
A queue can be created by using the CREATE QUEUE statement.
A contract can be created by using the CREATE CONTRACT statement.
A service can be created by using the CREATE SERVICE statement.
A conversation can be started by using the BEGIN DIALOG statement.
Messages can be sent by using the SEND ON CONVERSATION statement.
If the conversation is over, you can call the END CONVERSATION statement to end the
conversation.

Exercises

Exercise 1

In the AdventureWorks database, the details of the vendors are stored in the Vendors table. In
addition, the names of the vendors are saved in the VList table of the Vendor database used by
another application.

According to the requirements, whenever details of new vendors are added to the Vendors table
of the AdventureWorks database, the name of new vendors should also be added to the VList table
of the Vendor database.

How will you solve this problem?

For this exercise, the Vendor database will be provided to you. The vendor list is present in the
VList table of the Vendor database.
