Week 11 Reading
o Subqueries: A query that is embedded (or nested) inside another query. Also known as a
nested query or an inner query.
The use of joins in a relational database allows you to get information from two or more tables.
For example, joining the CUSTOMER and INVOICE tables on their shared CUS_CODE attribute returns
customer data together with each customer's invoices. In such a join, the data from both tables
(CUSTOMER and INVOICE) is processed at once, matching rows with shared CUS_CODE values.
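The following is a minimal sketch of such a join. It uses Python's built-in sqlite3 module, with SQLite standing in for the DBMS. The table and column names come from the reading; the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in DBMS; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_LNAME TEXT);
CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY,
                      CUS_CODE INTEGER REFERENCES CUSTOMER (CUS_CODE));
INSERT INTO CUSTOMER VALUES (10011, 'Adams'), (10012, 'Baker'), (10013, 'Clark');
INSERT INTO INVOICE VALUES (1001, 10011), (1002, 10011), (1003, 10012);
""")

# Both tables are processed at once: each INVOICE row is paired with the
# CUSTOMER row that has the same CUS_CODE. Customer 10013 has no invoice,
# so it does not appear in the result.
rows = conn.execute("""
    SELECT CUSTOMER.CUS_CODE, CUS_LNAME, INV_NUMBER
    FROM CUSTOMER JOIN INVOICE ON CUSTOMER.CUS_CODE = INVOICE.CUS_CODE
    ORDER BY INV_NUMBER
""").fetchall()
for row in rows:
    print(row)
```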
However, it is often necessary to process data based on other processed data. For
example, suppose that you want to generate a list of vendors who do not provide products.
(Recall that not all vendors in the VENDOR table have provided products—some
are only potential vendors.) Previously, you learned that you could generate such a list with a
different query. However, this result can also be found by using a subquery, such as:
SELECT V_CODE, V_NAME
FROM VENDOR
WHERE V_CODE NOT IN (SELECT V_CODE
FROM PRODUCT
WHERE V_CODE IS NOT NULL);
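The sketch below uses hypothetical rows, with SQLite (through Python's sqlite3 module) standing in for the DBMS. It runs three queries:

- the NOT IN subquery above;
- an outer-join formulation that should return the same vendors;
- a variant without the WHERE V_CODE IS NOT NULL filter.

The third query shows why that filter matters. When the subquery's list contains a NULL, V_CODE NOT IN (...) can never evaluate to true, so no vendors are returned.

```python
import sqlite3

# Hypothetical rows: vendor 21226 supplies nothing, and product 'X-1'
# has no vendor, so PRODUCT.V_CODE contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_NAME TEXT);
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER);
INSERT INTO VENDOR VALUES (21225, 'Vendor A'), (21226, 'Vendor B'), (21231, 'Vendor C');
INSERT INTO PRODUCT VALUES ('11QER/31', 21225), ('13-Q2/P2', 21231), ('X-1', NULL);
""")

# Subquery version from the reading.
with_filter = conn.execute("""
    SELECT V_CODE, V_NAME FROM VENDOR
    WHERE V_CODE NOT IN (SELECT V_CODE FROM PRODUCT WHERE V_CODE IS NOT NULL)
""").fetchall()

# Same subquery without the NULL filter: the NULL in the list makes every
# NOT IN test either false or unknown, never true, so nothing is returned.
without_filter = conn.execute("""
    SELECT V_CODE, V_NAME FROM VENDOR
    WHERE V_CODE NOT IN (SELECT V_CODE FROM PRODUCT)
""").fetchall()

# Outer-join formulation: vendors with no matching PRODUCT row get NULLs
# in the PRODUCT columns, and the WHERE clause keeps exactly those rows.
outer_join = conn.execute("""
    SELECT VENDOR.V_CODE, V_NAME
    FROM VENDOR LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE
    WHERE PRODUCT.P_CODE IS NULL
""").fetchall()

print(with_filter, without_filter, outer_join)
```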
Similarly, you can use a subquery to generate a list of all products with a price greater than or
equal to the average product price.
In both queries, you needed to get information that was not previously known:
o What vendors provide products?
o What is the average price of all products?
In both cases, you used a subquery to generate the required information, which
could then be used as input for the originating query. There are key characteristics that
you should remember for subqueries:
o A subquery is a query (SELECT statement) inside another query.
o A subquery is normally expressed inside parentheses.
o The first query in the SQL statement is known as the outer query.
o The query inside the SQL statement is known as the inner query.
o The inner query is executed first.
o The output of an inner query is used as the input for the outer query.
o The entire SQL statement is sometimes referred to as a nested query.
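The following sketch makes the "inner query first, its output becomes the outer query's input" behavior concrete. It runs the average-price subquery on its own, then passes the result to the outer query as a parameter. Finally, it checks that the nested form returns the same products. The PRODUCT rows are hypothetical, and SQLite stands in for the DBMS.

```python
import sqlite3

# Hypothetical PRODUCT rows; the average price is (10 + 20 + 30 + 60) / 4 = 30.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL);
INSERT INTO PRODUCT VALUES ('A', 10.0), ('B', 20.0), ('C', 30.0), ('D', 60.0);
""")

# Step 1: run the inner query by itself; it yields a single value.
avg_price = conn.execute("SELECT AVG(P_PRICE) FROM PRODUCT").fetchone()[0]

# Step 2: feed that value to the outer query as an ordinary parameter.
two_step = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE P_PRICE >= ? ORDER BY P_CODE", (avg_price,)
).fetchall()

# Nested form: the subquery's output is used directly in the comparison.
nested = conn.execute("""
    SELECT P_CODE FROM PRODUCT
    WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT)
    ORDER BY P_CODE
""").fetchall()
print(avg_price, two_step, nested)
```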
Subqueries have a wide range of uses. For example, you can use a subquery within a SQL data
manipulation language (DML) statement such as INSERT, UPDATE, or DELETE, wherever a value or
a list of values (such as multiple vendor codes or a table) is expected.
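As one illustration of a subquery inside a DML statement, the UPDATE below doubles the price of every product supplied by a vendor in a given state. The IN subquery supplies the list of vendor codes. The V_STATE column, the rows, and the doubling rule are assumptions made for this example.

```python
import sqlite3

# Hypothetical VENDOR.V_STATE column and rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY, V_STATE TEXT);
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
INSERT INTO VENDOR VALUES (1, 'TN'), (2, 'FL');
INSERT INTO PRODUCT VALUES ('A', 10.0, 1), ('B', 20.0, 2), ('C', 40.0, 1);
""")

# The IN subquery produces the list of vendor codes the UPDATE should touch.
conn.execute("""
    UPDATE PRODUCT SET P_PRICE = P_PRICE * 2
    WHERE V_CODE IN (SELECT V_CODE FROM VENDOR WHERE V_STATE = 'TN')
""")
prices = dict(conn.execute("SELECT P_CODE, P_PRICE FROM PRODUCT").fetchall())
print(prices)
```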
1. WHERE Subqueries:
The most common type of subquery uses an inner SELECT subquery on the right side of a
WHERE comparison expression. For example, to find all products with a price greater than or
equal to the average product price, you compare P_PRICE with the subquery
(SELECT AVG(P_PRICE) FROM PRODUCT) using the >= operator.
Note that this type of query, when used in a >, <, =, >=, or <= conditional expression, requires a
subquery that returns only one value (one column, one row). The value generated by the
subquery must be of a comparable data type; if the attribute to the left of the comparison symbol
is a character type, the subquery must return a character string. Also, if the subquery returns more
than a single value, most DBMSs will generate an error.
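Subqueries can also be used in combination with joins. For example, a query that joins several
tables can find the product code it needs with the subquery SELECT P_CODE FROM PRODUCT WHERE
P_DESCRIPT = 'Claw hammer'. If that subquery finds the "Claw hammer" string in more than one
product description, it returns more than one value and you get an error message. To compare one
value to a list of values, you must use the IN operator.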
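The following sketch combines a WHERE subquery with joins. It assumes minimal CUSTOMER, INVOICE, LINE, and PRODUCT tables with hypothetical rows, and lists the customers who bought a claw hammer.

One caveat about the stand-in DBMS: SQLite does not raise an error when a scalar subquery returns several rows. It silently uses the first row instead. The multiple-value error described above will therefore not appear when this runs under SQLite.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_LNAME TEXT);
CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY, CUS_CODE INTEGER);
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT);
CREATE TABLE LINE (INV_NUMBER INTEGER, P_CODE TEXT);
INSERT INTO CUSTOMER VALUES (10011, 'Adams'), (10012, 'Baker');
INSERT INTO INVOICE VALUES (1001, 10011), (1002, 10012);
INSERT INTO PRODUCT VALUES ('P1', 'Claw hammer'), ('P2', 'Screwdriver');
INSERT INTO LINE VALUES (1001, 'P1'), (1002, 'P2');
""")

# The joins link customers to the products on their invoice lines; the
# single-value subquery supplies the P_CODE to compare with "=".
rows = conn.execute("""
    SELECT DISTINCT CUSTOMER.CUS_CODE, CUS_LNAME
    FROM CUSTOMER
    JOIN INVOICE ON CUSTOMER.CUS_CODE = INVOICE.CUS_CODE
    JOIN LINE ON INVOICE.INV_NUMBER = LINE.INV_NUMBER
    WHERE LINE.P_CODE = (SELECT P_CODE FROM PRODUCT
                         WHERE P_DESCRIPT = 'Claw hammer')
""").fetchall()
print(rows)
```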
2. IN Subqueries:
When you want to compare a single attribute to a list of values, you use the IN operator. When
the P_CODE values are not known beforehand but can be derived using a query, you use an IN
subquery. For example, to list all customers who have purchased hammers, saws, or saw blades,
an IN subquery can return the P_CODE values of those products, and the outer query keeps the
customers whose purchases match any value in that list.
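Here is a minimal sketch of the hammer/saw IN subquery, using hypothetical tables and rows. The LIKE patterns are an assumption about how the product descriptions are worded. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so '%saw%' also matches 'Saw blade'.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_LNAME TEXT);
CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY, CUS_CODE INTEGER);
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT);
CREATE TABLE LINE (INV_NUMBER INTEGER, P_CODE TEXT);
INSERT INTO CUSTOMER VALUES (10011, 'Adams'), (10012, 'Baker'), (10013, 'Clark');
INSERT INTO INVOICE VALUES (1001, 10011), (1002, 10012), (1003, 10013);
INSERT INTO PRODUCT VALUES ('P1', 'Claw hammer'), ('P2', 'Screwdriver'), ('P3', 'Saw blade');
INSERT INTO LINE VALUES (1001, 'P1'), (1002, 'P2'), (1003, 'P3');
""")

# The subquery returns a list of P_CODE values, so IN (not =) is required.
rows = conn.execute("""
    SELECT DISTINCT CUSTOMER.CUS_CODE, CUS_LNAME
    FROM CUSTOMER
    JOIN INVOICE ON CUSTOMER.CUS_CODE = INVOICE.CUS_CODE
    JOIN LINE ON INVOICE.INV_NUMBER = LINE.INV_NUMBER
    WHERE LINE.P_CODE IN (SELECT P_CODE FROM PRODUCT
                          WHERE P_DESCRIPT LIKE '%hammer%'
                             OR P_DESCRIPT LIKE '%saw%')
    ORDER BY CUSTOMER.CUS_CODE
""").fetchall()
print(rows)
```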
3. HAVING Subqueries:
Just as you can use subqueries with the WHERE clause, you can use a subquery with a HAVING
clause. The HAVING clause is used to restrict the output of a GROUP BY query by applying
conditional criteria to the group rows. For example, to list all products with a total quantity sold
greater than the average quantity sold, you group the LINE rows by product and compare each
group's total with an average computed by a subquery in the HAVING clause.
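The HAVING subquery described above might look like the following sketch, which uses hypothetical LINE rows with SQLite as the stand-in DBMS. Here the subquery's AVG(LINE_UNITS) is the average quantity per LINE row, and each product's SUM(LINE_UNITS) is compared against it.

```python
import sqlite3

# Hypothetical LINE rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LINE (INV_NUMBER INTEGER, P_CODE TEXT, LINE_UNITS INTEGER);
INSERT INTO LINE VALUES (1001, 'P1', 5), (1002, 'P1', 3), (1002, 'P2', 1),
                        (1003, 'P3', 2), (1004, 'P3', 2);
""")

# AVG(LINE_UNITS) = (5 + 3 + 1 + 2 + 2) / 5 = 2.6 units per LINE row;
# group totals are P1 = 8, P2 = 1, P3 = 4.
rows = conn.execute("""
    SELECT P_CODE, SUM(LINE_UNITS) AS TOTAL_UNITS
    FROM LINE
    GROUP BY P_CODE
    HAVING SUM(LINE_UNITS) > (SELECT AVG(LINE_UNITS) FROM LINE)
    ORDER BY P_CODE
""").fetchall()
print(rows)
```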
4. FROM Subqueries:
The FROM clause specifies the table(s) from which the data will be drawn. Because the output of
a SELECT statement is another table (or more precisely, a “virtual” table), you could use a
SELECT subquery in the FROM clause. For example, assume that you want to know all
customers who have purchased products 13-Q2/P2 and 23109-HB. All product purchases are
stored in the LINE table, so you can easily find out who purchased any given product by
searching the P_CODE attribute in the LINE table. In this case, however, you want to know all
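customers who purchased both products, not just one. You can do this with two subqueries in the
FROM clause, each returning the CUS_CODE values of the customers who purchased one of the
products, and then join the two results on CUS_CODE.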
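Here is a sketch of the two-subquery FROM approach. The aliases CP1 and CP2 are illustrative names. The invoice rows are made up so that only customer 10011 bought both products.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_LNAME TEXT);
CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY, CUS_CODE INTEGER);
CREATE TABLE LINE (INV_NUMBER INTEGER, P_CODE TEXT);
INSERT INTO CUSTOMER VALUES (10011, 'Adams'), (10012, 'Baker'), (10013, 'Clark');
INSERT INTO INVOICE VALUES (1001, 10011), (1002, 10011), (1003, 10012), (1004, 10013);
INSERT INTO LINE VALUES (1001, '13-Q2/P2'), (1002, '23109-HB'),
                        (1003, '13-Q2/P2'), (1004, '23109-HB');
""")

# Each FROM subquery is a virtual table of buyers of one product;
# joining both on CUS_CODE keeps only customers present in both.
rows = conn.execute("""
    SELECT DISTINCT CUSTOMER.CUS_CODE, CUS_LNAME
    FROM CUSTOMER,
         (SELECT INVOICE.CUS_CODE AS CUS_CODE
            FROM INVOICE JOIN LINE ON INVOICE.INV_NUMBER = LINE.INV_NUMBER
           WHERE LINE.P_CODE = '13-Q2/P2') AS CP1,
         (SELECT INVOICE.CUS_CODE AS CUS_CODE
            FROM INVOICE JOIN LINE ON INVOICE.INV_NUMBER = LINE.INV_NUMBER
           WHERE LINE.P_CODE = '23109-HB') AS CP2
    WHERE CUSTOMER.CUS_CODE = CP1.CUS_CODE
      AND CP1.CUS_CODE = CP2.CUS_CODE
""").fetchall()
print(rows)
```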
If you need to run the same query regularly, for example at the end of each day, wouldn't it be
better to save that query permanently in the database? That is the function of a relational view.
A view is a virtual table based on a SELECT query. The query can contain columns, computed
columns, aliases, and aggregate functions from one or more tables. The tables on which the view
is based are called base tables.
The CREATE VIEW statement is a data definition command that stores the subquery
specification—the SELECT statement used to generate the virtual table—in the data
dictionary.
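Here is a minimal sketch of a view. It assumes a PRICEGT50 view that lists the products priced above $50.00; the column list is an assumption. Querying the view again after inserting a new qualifying product shows that the stored SELECT is re-run each time the view is invoked.

```python
import sqlite3

# Hypothetical PRODUCT rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT, P_PRICE REAL);
INSERT INTO PRODUCT VALUES ('A', 'Claw hammer', 12.00), ('B', 'Power drill', 109.99);
-- Only the SELECT text is stored; no rows are copied into the view.
CREATE VIEW PRICEGT50 AS
    SELECT P_DESCRIPT, P_PRICE FROM PRODUCT WHERE P_PRICE > 50.00;
""")

def priced_over_50():
    # The view name is used exactly where a table name would be.
    return conn.execute("SELECT P_DESCRIPT FROM PRICEGT50 ORDER BY P_DESCRIPT").fetchall()

before = priced_over_50()
conn.execute("INSERT INTO PRODUCT VALUES ('C', 'Table saw', 249.00)")
after = priced_over_50()
print(before, after)
```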
o You can use the name of a view anywhere a table name is expected in a SQL statement.
o Views are dynamically updated. That is, the view is re-created on demand each time
it is invoked. Therefore, if products that meet the criterion P_PRICE > 50.00 are added or
deleted, they will automatically appear in or disappear from a view based on that criterion
(such as a PRICEGT50 view) the next time the view is invoked.
o Views provide a level of security in the database because they can restrict users to seeing
only specified columns and rows in a table. For example, if you have a company with
hundreds of employees in several departments, you could give each department
administrative assistant a view of certain attributes only for the employees who belong to
that assistant’s department.
o Views may also be used as the basis for reports. For example, if you need a report that
shows a summary of total product cost and quantity-on-hand statistics grouped by vendor,
you could create a PROD_STATS view whose SELECT groups the PRODUCT rows by V_CODE.
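One possible definition of the PROD_STATS view is sketched below. The following are all assumptions:

- the statistics chosen: total cost as P_QOH * P_PRICE, plus the maximum, minimum, and average quantity on hand;
- the alias names;
- the rows.

```python
import sqlite3

# Hypothetical PRODUCT rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, V_CODE INTEGER,
                      P_QOH INTEGER, P_PRICE REAL);
INSERT INTO PRODUCT VALUES ('A', 1, 10, 2.0), ('B', 1, 30, 1.0), ('C', 2, 5, 4.0);
CREATE VIEW PROD_STATS AS
    SELECT V_CODE,
           SUM(P_QOH * P_PRICE) AS TOTCOST,
           MAX(P_QOH) AS MAXQTY,
           MIN(P_QOH) AS MINQTY,
           AVG(P_QOH) AS AVGQTY
    FROM PRODUCT
    GROUP BY V_CODE;
""")

# A report can now read the per-vendor summary like any table.
report = conn.execute("SELECT * FROM PROD_STATS ORDER BY V_CODE").fetchall()
for row in report:
    print(row)
```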
Using the CREATE INDEX command, SQL indexes can be created on the
basis of any selected attribute. The syntax is:
CREATE [UNIQUE] INDEX indexname ON tablename(column1 [, column2]);
A common practice is to create an index on any field that is used as a search key, in
comparison operations in a conditional expression, or when you want to list rows in a
specific order. For example, if you want to create a report of all products by vendor, it
would be useful to create an index on the V_CODE attribute in the PRODUCT table.
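To check that a DBMS actually uses such an index, you can inspect its query plan. This sketch creates an index on PRODUCT(V_CODE) in SQLite and reads the output of SQLite's EXPLAIN QUERY PLAN. The index name PROD_VCODE_IDX is hypothetical. The exact wording of the plan text varies between SQLite versions, so only the index name is checked.

```python
import sqlite3

# Hypothetical PRODUCT rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT, V_CODE INTEGER);
INSERT INTO PRODUCT VALUES ('A', 'Hammer', 21344), ('B', 'Saw', 23119), ('C', 'Drill', 21344);
-- Hypothetical index name; V_CODE is the search key.
CREATE INDEX PROD_VCODE_IDX ON PRODUCT (V_CODE);
""")

# Ask SQLite how it would run a search on V_CODE; the last column of each
# plan row is a human-readable description of that step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT P_CODE, P_DESCRIPT FROM PRODUCT WHERE V_CODE = 21344"
).fetchall()
details = [row[-1] for row in plan]
print(details)

rows = conn.execute(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE = 21344 ORDER BY P_CODE"
).fetchall()
```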
Unique composite indexes are often used to prevent data duplication. For example,
consider the case illustrated in Table 8.3, in which required employee test scores are
stored. (An employee can take a test only once on a given date.) Given the structure of
Table 8.3, the PK is EMP_NUM + TEST_NUM. The third test entry for employee 111
meets entity integrity requirements—the combination 111,3 is unique—yet the WEA
test entry is clearly duplicated.
Such duplication could have been avoided through the use of a unique composite
index, using the attributes EMP_NUM, TEST_CODE, and TEST_DATE:
CREATE UNIQUE INDEX EMP_TESTDEX ON TEST(EMP_NUM, TEST_CODE,
TEST_DATE);
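The effect of EMP_TESTDEX can be sketched in SQLite. With the unique composite index in place, a repeated (EMP_NUM, TEST_CODE, TEST_DATE) combination is rejected even though it carries a new TEST_NUM. The rows below are hypothetical and are not the contents of Table 8.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TEST (
    EMP_NUM    INTEGER,
    TEST_NUM   INTEGER,
    TEST_CODE  TEXT,
    TEST_DATE  TEXT,
    TEST_SCORE INTEGER,
    PRIMARY KEY (EMP_NUM, TEST_NUM)
);
CREATE UNIQUE INDEX EMP_TESTDEX ON TEST(EMP_NUM, TEST_CODE, TEST_DATE);
INSERT INTO TEST VALUES (110, 1, 'WEA', '2024-05-15', 93);
INSERT INTO TEST VALUES (111, 1, 'HAZ', '2024-06-14', 87);
INSERT INTO TEST VALUES (111, 2, 'WEA', '2024-08-18', 95);
""")

try:
    # New PK value (111, 3), so entity integrity alone would accept it,
    # but EMP_NUM, TEST_CODE and TEST_DATE repeat the previous row.
    conn.execute("INSERT INTO TEST VALUES (111, 3, 'WEA', '2024-08-18', 95)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM TEST").fetchone()[0]
print(rejected, row_count)  # True 3
```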
Indexes improve data access speed because an index is an ordered set of values that contains
the index key and pointers. The pointers are the row IDs for the actual table rows.
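The idea of an index as an ordered set of (key, row ID) pairs can be sketched in plain Python. The pairs are kept sorted by key, and binary search finds the matching row IDs instead of scanning every row. The table contents are hypothetical. Real DBMS indexes are disk-based structures rather than in-memory lists.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: row ID -> (P_CODE, V_CODE)
table = {
    1: ("A", 21344),
    2: ("B", 23119),
    3: ("C", 21344),
    4: ("D", 21225),
}

# The index: (key, row_id) pairs kept in sorted key order.
index = sorted((v_code, row_id) for row_id, (_, v_code) in table.items())
keys = [key for key, _ in index]

def lookup(v_code):
    """Return the row IDs whose V_CODE equals v_code, found by binary search."""
    lo = bisect_left(keys, v_code)
    hi = bisect_right(keys, v_code)
    return [row_id for _, row_id in index[lo:hi]]

# Follow the pointers (row IDs) back to the actual rows.
rows = [table[row_id] for row_id in lookup(21344)]
print(rows)
```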
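One measure that determines the need for an index is the data sparsity of the column you want to
index. Data sparsity refers to the number of different values a column could have: a column with
only a few possible values has low sparsity, and a column with many possible values has high
sparsity.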
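As a rough way to gauge sparsity, you can count the distinct values currently stored in each candidate column. This counts observed values rather than every value a column could have. The STUDENT table here is invented for the example.

```python
import sqlite3

# Hypothetical STUDENT rows; SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (STU_NUM INTEGER PRIMARY KEY, STU_SEX TEXT, STU_LNAME TEXT);
INSERT INTO STUDENT VALUES (1, 'F', 'Smith'), (2, 'M', 'Jones'), (3, 'F', 'Brown'),
                           (4, 'M', 'Lee'), (5, 'F', 'Smith');
""")

# Column names come from this fixed tuple, never from user input,
# so building the SQL text with an f-string is safe here.
distinct_counts = {}
for column in ("STU_NUM", "STU_SEX", "STU_LNAME"):
    sql = f"SELECT COUNT(DISTINCT {column}) FROM STUDENT"
    distinct_counts[column] = conn.execute(sql).fetchone()[0]
print(distinct_counts)
```

In this toy table, STU_SEX has only two distinct values (low sparsity), while STU_NUM has a different value in every row (high sparsity).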
Most DBMSs implement indexes using one of the following data structures:
Hash index. A hash index is based on an ordered list of hash values. A hash algorithm
is used to create a hash value from a key column. This value points to an entry in a hash table,
which in turn points to the actual location of the data row. This type of index is good for simple
and fast lookup operations based on equality conditions, for example, LNAME = 'Scott' AND
FNAME = 'Shannon'.
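Here is a toy hash index in Python, sketched under simplifying assumptions:

- a fixed number of buckets;
- Python's built-in hash() as the hash algorithm;
- an in-memory dict standing in for the hash table.

An equality lookup on (LNAME, FNAME) examines only the one bucket the key hashes to.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE rows: row ID -> (LNAME, FNAME)
rows = {
    101: ("Scott", "Shannon"),
    102: ("Scott", "Ron"),
    103: ("Lee", "Shannon"),
}

NUM_BUCKETS = 8  # assumption: tiny fixed-size hash table

def bucket_of(key):
    # Hash the key column value(s) to pick a bucket.
    return hash(key) % NUM_BUCKETS

# Each bucket holds (key, row_id) entries; the row_id points at the data row.
hash_table = defaultdict(list)
for row_id, (lname, fname) in rows.items():
    key = (lname, fname)
    hash_table[bucket_of(key)].append((key, row_id))

def lookup(lname, fname):
    key = (lname, fname)
    # Only one bucket is examined; comparing keys resolves hash collisions.
    return [rid for k, rid in hash_table[bucket_of(key)] if k == key]

print(lookup("Scott", "Shannon"))  # [101]
```

Because hashing scatters keys, this structure helps with equality tests but not with range conditions such as LNAME > 'M'.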
B-tree index. The B-tree index is an ordered data structure organized as an upside-down tree. (See
Figure 11.4.) The index tree is stored separately from the data. The lower-level leaves of the B-
tree index contain the pointers to the actual data rows. B-tree indexes are “self-balanced,” which
means that it takes approximately the same amount of time to access any given row in the index.
This is the default and most common type of index used in databases. The B-tree index is
used mainly in tables in which column values repeat a relatively small number of times.
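The self-balancing property can be sketched with a small classic B-tree of minimum degree t = 2. It uses the top-down insertion found in standard algorithms textbooks, which splits full nodes on the way down.

As a simplification, this sketch stores a row ID next to every key in every node. The B+-tree variant used by most DBMSs keeps the row pointers in the leaves, as the paragraph above describes. Because the tree grows taller only when the root splits, every leaf stays at the same depth.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # sorted index keys
        self.row_ids = []   # row pointer stored next to each key (simplification)
        self.children = []  # len(keys) + 1 children when not a leaf

class BTree:
    """Classic B-tree of minimum degree t: each node holds at most 2t - 1 keys."""

    def __init__(self, t=2):
        self.t = t
        self.root = Node(leaf=True)

    def search(self, key):
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return node.row_ids[i]
            if node.leaf:
                return None
            node = node.children[i]

    def insert(self, key, row_id):
        if len(self.root.keys) == 2 * self.t - 1:
            # Splitting a full root is the only way the tree gets taller.
            old_root = self.root
            self.root = Node(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        i = bisect_right(node.keys, key)
        node.keys.insert(i, key)
        node.row_ids.insert(i, row_id)

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        right.keys, right.row_ids = full.keys[t:], full.row_ids[t:]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        # The median key moves up into the parent.
        parent.keys.insert(i, full.keys[t - 1])
        parent.row_ids.insert(i, full.row_ids[t - 1])
        parent.children.insert(i + 1, right)
        full.keys, full.row_ids = full.keys[:t - 1], full.row_ids[:t - 1]

    def leaf_depths(self):
        depths, stack = set(), [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if node.leaf:
                depths.add(depth)
            else:
                stack.extend((child, depth + 1) for child in node.children)
        return depths

tree = BTree(t=2)
for key in range(1, 101):
    tree.insert(key, f"row{key}")
print(tree.search(42), tree.leaf_depths())
```

Even with keys inserted in ascending order, the worst case for an unbalanced tree, leaf_depths() reports a single depth.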
Bitmap index. A bitmap index uses a bit array (0s and 1s) to represent the existence of a value or
condition. These indexes are used mostly in data warehouse applications in tables with a
large number of rows in which a small number of column values repeat many times. (See
Figure 11.4.) Bitmap indexes tend to use less space than B-tree indexes because they use bits
instead of bytes to store their data.
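A bitmap index can be sketched with Python integers used as bit arrays. Each distinct value gets one bitmap, and bit i is 1 when row i holds that value. A condition on two columns then reduces to a single bitwise AND. The CUSTOMER columns and rows are hypothetical.

```python
# Hypothetical CUSTOMER rows: row number -> (CUS_STATE, CUS_TYPE)
rows = [
    ("TN", "retail"),
    ("FL", "retail"),
    ("TN", "wholesale"),
    ("GA", "retail"),
    ("TN", "retail"),
]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i has that value."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

state_idx = build_bitmap_index([state for state, _ in rows])
type_idx = build_bitmap_index([ctype for _, ctype in rows])

# CUS_STATE = 'TN' AND CUS_TYPE = 'retail' becomes one bitwise AND.
match = state_idx["TN"] & type_idx["retail"]
matching_rows = [i for i in range(len(rows)) if (match >> i) & 1]
print(matching_rows)  # [0, 4]
```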