Implementing Indexes, Views
At times, the table that you need to search contains a huge amount of data. In such cases, it
is advisable to create partitioned indexes. A partitioned index is more manageable and
scalable because each of its partitions stores the data of a particular partition only.
As a database developer, you need to create and manage indexes. Before creating an
index, it is important to identify different types of indexes.
The data in the database tables is stored in the form of data pages. Each data page is 8
KB in size. Therefore, the entire data of the table is stored in multiple data pages. When a
user queries a data value from the table, the query processor searches for the data value
in all the data pages. When it finds the value, it returns the result set. As the data in the
table increases, this process of querying data takes time.
To reduce the data query time, SQL Server allows you to implement indexes on tables. An
index is a data structure associated with a table that enables fast searching of data.
Indexes in SQL Server are like the indexes at the back of a book that you can use to
locate text in the book. Indexes provide the following benefits:
Accelerate queries that join tables, and perform sorting and grouping
Enforce uniqueness of rows (if configured for that)
An index contains a collection of keys and pointers. Keys are the values built from one or
more columns in the table. The column on which the key is built is the one on which the
data is frequently searched. Pointers store the address of the location where a data page
is stored in the memory. The following figure depicts the structure of an index.
When the users query data with conditions based on the key columns, the query processor
scans the indexes, retrieves the address of the data page where the required data is
stored, and accesses the information. The query processor does not need to search for
data in all the data pages. Therefore, the query execution time is reduced. The keys in the
indexes are stored in a B-Tree structure in the memory. A B-Tree is a data-indexing
method that organizes the index into a multilevel set of nodes. Each page in an index
B-Tree is called an index node. Each index contains a single root page at the top of the tree.
This root page, or root node, branches out into n number of pages at each intermediate
level until it reaches the bottom, or leaf level of the index. The index tree is traversed by
following pointers from the upper-level pages down through the lower-level pages.
The key values in the root page and the intermediate pages are sorted in ascending
order. Therefore, in the B-Tree structure, the set of nodes on which the server will
search for data values is reduced. This enables SQL Server to find the records associated
with the key values quickly and efficiently. When you modify the data of an indexed
column, the associated indexes are updated automatically. SQL Server allows you to create
the following types of indexes:
Clustered index
Nonclustered index
Clustered Index
A clustered index is an index that sorts and stores the data rows in the table based on
their key values. Therefore, the data is physically sorted in the table when a clustered
index is defined on it. Only one clustered index can be created per table. Therefore, you
should build the clustered index on attributes that have a high percentage of unique values
and are not modified often. In a clustered index, data is stored at the leaf level of the
B-Tree. SQL Server performs the following steps when it uses a clustered index to search
for a value:
1. SQL Server obtains the address of the root page from the sysindexes table, which is a
system table containing the details of all the indexes in the database.
2. The search value is compared with the key values on the root page.
3. The page with the highest key value less than or equal to the search value is found.
4. The page pointer is followed to the next lower level in the index.
5. Steps 3 and 4 are repeated until the data page is reached.
6. The rows of data are searched on the data page until the search value is found. If the
search value is not found on the data page, no rows are returned by the query.
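The steps above can be tried out with a small sketch. It assumes an Employee table with an Eid key column, as in the example that follows; the index name IX_Employee_Eid is hypothetical.

```sql
-- Assumption: an Employee table with an Eid column exists.
-- IX_Employee_Eid is a hypothetical index name.
CREATE CLUSTERED INDEX IX_Employee_Eid
ON Employee (Eid);

-- A search on the key column can now descend the B-Tree
-- from the root page instead of scanning every data page.
SELECT *
FROM Employee
WHERE Eid = 'E006';
```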
For example, the rows of the Employee table are sorted according to the Eid attribute.
The following figure shows the working of a clustered index for the Employee table.
The preceding figure displays a clustered index on the Employee table. To search for any
record, SQL Server would start at the root page. It would then move down the B-Tree and
the data values would be found on the leaf pages of the B-Tree. For example, if the row
containing Eid E006 was to be searched by using a clustered index (refer to the preceding
figure), SQL Server performs the following steps:
2. SQL Server searches for the highest key value on the page, which is less than or equal
to the search value. The result of this search is the page containing the pointer to Eid,
E005.
3. The search continues from page 602. There, Eid E005 is found and the search continues to
page 203.
Nonclustered Index
Nonclustered indexes are typically created on columns used in joins and the WHERE
clause. These indexes can also be created on columns where the values are modified
frequently. SQL Server creates nonclustered indexes by default when the CREATE
INDEX command is given. There can be as many as 999 nonclustered indexes per table.
The data rows are not physically sorted by the nonclustered index key; the logical ordering is
specified by the index. The data rows may be randomly spread throughout the table. The
nonclustered index tree contains the index keys in sorted order, with the leaf level of
the index containing a pointer to the data page.
SQL Server performs the following steps when it uses a nonclustered index to search for
a value:
1. SQL Server obtains the address of the root page from the sysindexes table.
2. The search value is compared with the key values on the root page.
3. The page with the highest key value less than or equal to the search value is
found.
4. The page pointer is followed to the next lower level in the index.
5. Steps 3 and 4 are repeated until the leaf level of the index is reached.
6. The rows are searched on the leaf page for the specified value. If a match is not
found, the table contains no matching rows. If a match is found, the pointer is
1. SQL Server starts from page 603, which is the root page.
2. It searches for the highest key value less than or equal to the search value, that is, the
5. Page 203 is searched to find a pointer to the actual row. Page 203 is the last page, or
6. The search then moves to page 302 of the table to find the actual row.
Creating Indexes
You should create indexes on the most frequently queried columns in a table.
However, at times, you might need to create an index based on a combination of
two or more columns. An index based on two or more columns is called a composite
index. A composite index can be based on a maximum of 16 columns. The maximum
allowable size of the combined index values is 900 bytes. Columns that are of the
Large Object (LOB) data types, such as ntext, text, varchar(max), nvarchar(max),
varbinary(max), xml, or image, cannot be specified as key columns for an index.
Note that indexes with fewer columns use less disk space and involve fewer
resources than indexes based on more columns.
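As a minimal sketch of a composite index, the following statement builds an index on two columns and sets a fill factor. The index name and the choice of the ManagerID and Title columns of the Employee table are assumptions made for illustration.

```sql
-- Assumption: Employee has ManagerID and Title columns.
-- IX_Employee_ManagerID_Title is a hypothetical index name.
CREATE NONCLUSTERED INDEX IX_Employee_ManagerID_Title
ON Employee (ManagerID, Title)
WITH (FILLFACTOR = 10);
```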
In the preceding code, the FILLFACTOR value of 10 fills each leaf-level page of the
index to only 10 percent and reserves the remaining space on the page to accommodate
future expansion. The following example creates a nonclustered index on the ManagerID
attribute of the Employee table:
CREATE NONCLUSTERED INDEX IDX_Employee_ManagerID ON Employee
(ManagerID)
Consider another example, where users frequently query the Document table that
contains the following three columns:
Title nvarchar(50)
FileName nvarchar(400)
FileExtension nvarchar(8)
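An nvarchar column uses 2 bytes per character, so the three columns can need up to 100 + 800 + 16 = 916 bytes, which exceeds the 900-byte limit for index keys. One way around this is to keep Title and FileExtension as key columns (116 bytes) and add FileName as a non-key column with the INCLUDE clause. A sketch of such a statement, with a hypothetical index name, is:

```sql
-- IX_Document_Title_FileExtension is a hypothetical index name.
-- Key columns: Title, FileExtension (100 + 16 = 116 bytes at most).
-- FileName is stored only at the leaf level as a non-key column.
CREATE NONCLUSTERED INDEX IX_Document_Title_FileExtension
ON Document (Title, FileExtension)
INCLUDE (FileName);
```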
The preceding statement creates a nonclustered index on the Document table. This
index contains two key columns, Title and FileExtension, and one non-key column,
FileName. Therefore, this index covers all the required columns with index key size
still under the limit. This way, you can create nonclustered indexes that cover all
the columns in the query. This enhances the performance of the query as all the
columns included in the query are covered within the index either as key or non-key
columns.
Create clustered indexes on columns that have unique or not null values.
Do not create an index that is not used frequently. Maintaining indexes requires
time and resources.
Create a clustered index before creating a nonclustered index. A clustered index
changes the order of rows. A nonclustered index would need to be rebuilt if it is
built before a clustered index.
Create nonclustered indexes on all columns that are frequently used in predicates
and join conditions in queries.
You can create a filtered index based on a condition specified by the WHERE clause.
Filtered indexes do not allow the IGNORE_DUP_KEY option. Only nonclustered filtered
indexes can be created on a table. For example, you need to create a filtered index for
those records in the Employee table, where the value in the Title column is Tool Manager.
Execute the following statement to create the specified index:
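A statement of the following form would create such an index. The index name and the choice of EmployeeID as the key column are assumptions; only the filter condition comes from the description above.

```sql
-- FX_Employee_ToolManager is a hypothetical index name;
-- EmployeeID as the key column is an assumption.
CREATE NONCLUSTERED INDEX FX_Employee_ToolManager
ON Employee (EmployeeID)
WHERE Title = 'Tool Manager';
```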
Managing Indexes
In addition to creating indexes, you also need to maintain them to ensure their continued
optimal performance. The common index maintenance tasks include disabling, enabling,
renaming, and dropping an index. As a database developer, you need to regularly monitor
the performance of the index and optimize it.
Disabling Indexes
When an index is disabled, the user is not able to access the index. If a clustered index is
disabled, the table data is not accessible to the user. The data still remains
in the table, but is unavailable for Data Manipulation Language (DML) operations until the
index is dropped or rebuilt.
To rebuild and enable a disabled index, use the ALTER INDEX REBUILD statement or the
CREATE INDEX WITH DROP_EXISTING statement.
The following query disables a nonclustered index, IX_EmployeeID, on the Employee table:
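A statement of this form, assuming the Employee table is in the default schema, would be:

```sql
ALTER INDEX IX_EmployeeID ON Employee DISABLE;
```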
Enabling Indexes
After an index is disabled, it remains in the disabled state until it is rebuilt or dropped.
You can enable a disabled index by rebuilding it through one of the following methods:
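The two statements named earlier, ALTER INDEX REBUILD and CREATE INDEX WITH DROP_EXISTING, can be sketched as follows for the IX_EmployeeID index. The key column EmployeeID in the second statement is an assumption.

```sql
-- Method 1: rebuild the disabled index in place.
ALTER INDEX IX_EmployeeID ON Employee REBUILD;

-- Method 2: re-create the index under the same name.
-- Assumption: the index is built on the EmployeeID column.
CREATE NONCLUSTERED INDEX IX_EmployeeID
ON Employee (EmployeeID)
WITH (DROP_EXISTING = ON);
```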
By using one of the preceding statements, the index is rebuilt and the index status is set
to enabled. You cannot rebuild a disabled clustered index when the ONLINE option is set to
ON. The DROP_EXISTING clause drops the existing clustered or nonclustered index and
rebuilds it. The new index must have the same name as the
existing index. However, you can modify the index definition by specifying different
columns, sorting order, or partitioning scheme while creating the new index. The default
value of the DROP_EXISTING option is OFF.
Renaming Indexes
You can rename the current index with the help of the sp_rename system stored
procedure. The following statement renames the IX_JobCandidate_EmployeeID index on
the JobCandidate table to IX_EmployeeID:
EXEC sp_rename
'HumanResources.JobCandidate.IX_JobCandidate_EmployeeID',
'IX_EmployeeID','index'
Dropping Indexes
When you no longer need an index, you can remove it from a database. You cannot drop an
index used by either a PRIMARY KEY or UNIQUE constraint, except by dropping the
constraint. The following statement drops the IDX_Employee_ManagerID index on the
Employee table:
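A statement of this form, assuming the Employee table is in the default schema, would be:

```sql
DROP INDEX IDX_Employee_ManagerID ON Employee;
```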
A SELECT statement is constructed with one or more clauses, such as FROM and WHERE.
These clauses are executed in different phases. Each phase executes a clause and
generates a virtual table that is fed as an input to the next phase. The phases and their
order are:
1. FROM: If more than one table is listed in the FROM clause, a cartesian product
of the first two tables listed in the FROM clause is generated. This result is
stored in a virtual table, for example VT1. If there is only one table listed in the
FROM clause, all the records are extracted from the table and the result is stored
in a virtual table.
2. ON: The ON condition is applied on the virtual table, VT1. This generates a new
virtual table, VT2, containing the rows that satisfy the join condition.
3. JOIN: The rows are retrieved from the virtual table, VT2, according to the join
type and stored in a new virtual table, VT3.
4. If more than two tables are specified in the FROM clause, steps 1 to 3 are
repeated between VT3 and the next table specified in the FROM clause.
5. WHERE: This condition is applied on VT3. The rows that satisfy the WHERE
condition are stored in a new virtual table, VT4.
6. GROUP BY: The rows of the virtual table, VT4, are arranged according to the
column listed in the GROUP BY clause and stored in a new virtual table, VT5.
7. WITH CUBE or WITH ROLLUP: The ROLLUP and CUBE operators are used to
apply multiple levels of aggregation on the result retrieved in the virtual table,
VT5, using the GROUP BY clause. The result is stored in a new virtual table, VT6.
8. HAVING: This clause retrieves the groups that match the condition. The result
is stored in a new virtual table, VT7.
9. SELECT: The columns specified in the SELECT list in the query are retrieved
from the virtual table, VT7, and stored in a new virtual table, VT8.
10. DISTINCT: The duplicate rows from the virtual table, VT8, are removed and a
new virtual table, VT9, is generated.
11. ORDER BY: The rows of the virtual table, VT9, are arranged in the order
specified in the ORDER BY clause and a new virtual table, VT10, is generated.
12. TOP: The first set of rows from the top of the virtual table, VT10, is retrieved
and stored in a new virtual table, VT11. The rows of this table are returned as a
result to the caller. A caller may be a client application or the outer query.
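The logical order of operations in a SELECT statement can be shown by numbering the
clauses in the syntax of the SELECT statement as:
(8)SELECT (9)DISTINCT (11)TOP top_specification select_list
(1)FROM left_table
(3)join_type JOIN right_table
(2)ON join_condition
(4)WHERE where_condition
(5)GROUP BY group_by_list
(6)WITH {CUBE | ROLLUP}
(7)HAVING having_condition
(10)ORDER BY order_by_list
For example, consider the following query:
SELECT C.CustomerID, COUNT(O.SalesOrderID) AS Numorders
FROM Customer AS C LEFT OUTER JOIN SalesOrderHeader AS O
ON C.CustomerID = O.CustomerID
WHERE C.CustomerType = 'S'
GROUP BY C.CustomerID
HAVING COUNT(O.SalesOrderID) < 3
ORDER BY Numorders;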
In the preceding query, the cartesian product of the Customer and the
SalesOrderHeader tables is generated and stored in a virtual table,
VT1. Next, the rows in which C.CustomerID is equal to
O.CustomerID are returned and stored in a new virtual table, VT2.
Then, the rows from VT2, together with the rows from the Customer table for
which no match was found in the previous step, are returned and stored in a
new virtual table, VT3. The rows from VT3 whose CustomerType is S are
returned and stored in a new virtual table, VT4. The rows of VT4 are
arranged in groups based on the C.CustomerID column and stored in a
new virtual table, VT5. Next, the groups in which the count of
O.SalesOrderID values is less than three are returned and stored in a
new virtual table, VT6. Then, the CustomerID column and the count of the
SalesOrderID column, aliased as Numorders, are retrieved and stored in a new
virtual table, VT7. Finally, the rows from VT7 are sorted in ascending
order based on the Numorders column and stored in a new virtual table,
VT8. The virtual table, VT8, is returned as a result to the user.
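One practical consequence of this logical order is that a column alias defined in the SELECT list can be used in ORDER BY, which runs after SELECT, but not in WHERE or HAVING, which run before it. The following sketch illustrates this. It assumes Customer and SalesOrderHeader tables in the default schema with the columns described above.

```sql
-- Assumption: Customer and SalesOrderHeader exist in the default schema.
SELECT C.CustomerID, COUNT(O.SalesOrderID) AS Numorders
FROM Customer AS C
LEFT OUTER JOIN SalesOrderHeader AS O
    ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID
-- Writing "HAVING Numorders < 3" here would fail with an invalid
-- column name error, because HAVING is processed before SELECT.
HAVING COUNT(O.SalesOrderID) < 3
-- The alias is visible here, because ORDER BY is processed after SELECT.
ORDER BY Numorders;
```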
Creating and Managing Views
At times, the database administrator might want to restrict access of
data to different users. They might want some users to be able to
access all the columns of a table whereas other users to be able to
access only selected columns. SQL Server allows you to create views to
restrict user access to the data. Views also help in simplifying query
execution when the query involves retrieving data from multiple tables
by applying joins. A view is a virtual table, which provides access to a
subset of columns from one or more tables. It is a query stored as an
object in the database, which does not have its own data.
A view can derive its data from one or more tables, called the base
tables or underlying tables. Depending on the volume of data, you can
create a view with or without an index. As a database developer, it is
important for you to learn to create and manage views.
Creating Views
A view is a database object that is used to view data from the tables in
the database. A view has a structure similar to a table. It does not
contain any data, but derives its data from the underlying tables. Views
ensure security of data by restricting access to:
Specific rows of a table.
Specific columns of a table.
Specific rows and columns of a table.
Rows fetched by using joins.
Statistical summary of data in a given table.
Subsets of another view or a subset of views and tables.
Apart from restricting access, views can also be used to create and
save queries based on multiple tables. To view data from multiple
tables, you can create a query that includes various joins. If you need
to frequently execute this query, you can create a view that executes
this query. You can access data from this view every time you need to
execute the query. You can create a view by using the CREATE VIEW
statement. The syntax of the CREATE VIEW statement is:
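In simplified form, the general shape of the statement is sketched below. Square brackets mark optional parts and lowercase names are placeholders, not literal keywords.

```sql
CREATE VIEW [ schema_name. ] view_name [ ( column_name [ ,...n ] ) ]
[ WITH { ENCRYPTION | SCHEMABINDING | VIEW_METADATA } [ ,...n ] ]
AS select_statement
[ WITH CHECK OPTION ]
```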
Guidelines for Creating Views
While creating views, you should consider
the following guidelines:
The name of a view must follow the rules for identifiers and must
not be the same as that of the table on which it is based.
A view can be created only if there is a SELECT permission on its
base table.
A view cannot derive its data from temporary tables.
For example, to provide access only to the employee ID, marital status,
and department ID for all the employees, you can write the following
code:
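A view of this kind could be sketched as follows. The view name vwEmployeeDepData is hypothetical, and the sketch assumes that the marital status is stored in the Employee table and the department ID in the EmployeeDepartmentHistory table.

```sql
-- vwEmployeeDepData is a hypothetical view name.
-- Assumption: MaritalStatus is in Employee and DepartmentID is in
-- EmployeeDepartmentHistory, joined on EmployeeID.
CREATE VIEW vwEmployeeDepData
AS
SELECT e.EmployeeID, e.MaritalStatus, d.DepartmentID
FROM Employee AS e
JOIN EmployeeDepartmentHistory AS d
    ON e.EmployeeID = d.EmployeeID;
```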
Views do not maintain a separate copy of the data, but only display the
data present in the base tables. Therefore, you can modify the base
tables by modifying the data in the view. However, the following
restrictions exist while inserting, updating, or deleting data through
views:
For example, a view displaying the employee id, manager id, and rate of
the employees has been defined by using the following statement:
UPDATE vwSal
WHERE EmployeeID = 1
UPDATE vwSal
UPDATE vwSal
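One restriction that such a view illustrates is that a single data modification statement through a view can affect columns of only one base table. The sketch below assumes that vwSal takes EmployeeID and ManagerID from the Employee table and Rate from a pay-history table. The EmployeePayHistory table name and the values used are assumptions made for illustration.

```sql
-- Hypothetical definition: EmployeePayHistory is an assumed table name.
CREATE VIEW vwSal
AS
SELECT e.EmployeeID, e.ManagerID, p.Rate
FROM Employee AS e
JOIN EmployeePayHistory AS p
    ON e.EmployeeID = p.EmployeeID;
GO

-- Fails: the statement modifies columns from two base tables.
UPDATE vwSal
SET ManagerID = 2, Rate = 12.45
WHERE EmployeeID = 1;

-- Works: each statement modifies columns of one base table only.
UPDATE vwSal SET ManagerID = 2 WHERE EmployeeID = 1;
UPDATE vwSal SET Rate = 12.45 WHERE EmployeeID = 1;
```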
Managing Views
Altering Views
If you define a view with a SELECT * statement and then alter the
structure of the underlying tables by adding columns, the new columns
do not appear in the view. Similarly, when you select all the columns in a
CREATE VIEW statement, the columns list is interpreted only when
you first create the view. To add new columns in the view, you must
alter the view. You can modify a view without dropping it. This ensures
that permissions on the view are not lost. You can modify a view
without affecting its dependent objects. To modify a view, you need to
use the ALTER VIEW statement.
For example, you created a view to retrieve selected data from the
Employee and EmployeeDepartmentHistory tables.
You need to alter the view definition by including the LoginID attribute
from the Employee table. To modify the definition, you can write the
following statement:
AS
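A complete statement of this kind might read as follows. The view name vwEmployeeDepData and the columns other than LoginID are assumptions made for illustration.

```sql
-- vwEmployeeDepData is a hypothetical view name; MaritalStatus and
-- DepartmentID are assumed to be the columns already in the view.
ALTER VIEW vwEmployeeDepData
AS
SELECT e.EmployeeID, e.LoginID, e.MaritalStatus, d.DepartmentID
FROM Employee AS e
JOIN EmployeeDepartmentHistory AS d
    ON e.EmployeeID = d.EmployeeID;
```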
Renaming Views
At times, you might need to change the name of a view. You can rename
a view without dropping it. This ensures that permissions on the view
are not lost. A view can be renamed by using the sp_rename system
stored procedure.
For example, you can use the following statement to rename the vwSal
view:
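A call of the following form renames the view. The new name vwSalary is an assumption, chosen to match the view that is dropped later in this section.

```sql
-- Assumption: vwSalary is the intended new name.
EXEC sp_rename 'vwSal', 'vwSalary';
```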
The new name for the view must follow the rules for identifiers.
Dropping Views
You need to drop a view when it is no longer required. You can drop a
view from a database by using the DROP VIEW statement. When a
view is dropped, it has no effect on the underlying table(s). Dropping a
view removes its definition and all the permissions assigned to it.
Further, if you query any view that references a dropped table, you
receive an error message. Dropping a table that is referenced by a view does
not drop the view automatically.
You have to use the DROP VIEW statement explicitly. The syntax of
the DROP VIEW statement is:
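In simplified form, the statement has the following shape, where square brackets mark optional parts and lowercase names are placeholders:

```sql
DROP VIEW [ schema_name. ] view_name [ ,...n ]
```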
For example, you can use the following statement to remove the
vwSalary view:
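Assuming the view is in the default schema, the statement would be:

```sql
DROP VIEW vwSalary;
```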
The preceding statement will drop the vwSalary view from the
database. You can drop multiple views with a single DROP VIEW
statement. The names of the views that need to be dropped are
separated by commas in the DROP VIEW statement.