Delphi Magazine - Data Processing With Subqueries


Surviving Client/Server:

Data Processing With Subqueries


by Steve Troxell

Database programmers just coming into the world of SQL and client/server may overlook the subtle power of the SQL language. You may be tempted to look on the SELECT statement as a simple “record read” I/O function which will return a specified set of columns and rows to be further manipulated by the Delphi client program. In reality, SELECT packs a pretty powerful data processing punch of its own. We skimmed the surface of what SELECT can do back in Issue 4 (November 1995). This month we’re going to expand our knowledge further and take a look at just how we can leverage the power of SELECT by making use of subqueries.

Subqueries, sometimes referred to as nested queries, can be thought of as chaining two or more queries together with the results of one query feeding into or participating in the processing of the other query. I should say at the outset that the 16-bit BDE with Delphi 1.0 does not support subqueries in local SQL (ie, with Paradox and dBase tables etc), only with database servers.

The usefulness of subqueries might best be explained with an example. Suppose you want a list of stores having above average sales. Let’s say we have a table called SalesByStore containing one row per store with the total sales for that store. First, we must obtain the average store sales as shown in Figure 1. Then, we would use this value to create the list of above average stores in Figure 2. We would be tempted to have a Delphi client program run the first query, then substitute the average value obtained into the second query. However, with a little ingenuity, we can accomplish the same thing by changing Figure 2 to use a subquery as shown in Figure 3. In this case, the inner query (SELECT AVG) is run first to obtain the average value which is then substituted into the outer query. The outer query then runs with a constant value in place of the inner query. All of this happens within the context of a single SQL query sent to the server and a single result set passed back to the client, instead of the two-stage process we had before.

SalesByStore contains:

Store_ID  TotalSales
––––––––  ––––––––––
6380          132.80
7066        1,821.25
7067        1,486.30
7131        1,400.15
7896          604.40
8042        1,232.00

SELECT Average = AVG(TotalSales) FROM SalesByStore

Average
––––––––––
  1,112.82

➤ Figure 1: Obtain the average sales figure

SELECT Store_ID, TotalSales FROM SalesByStore
WHERE TotalSales > 1112.82
ORDER BY TotalSales DESC

Store_ID  TotalSales
––––––––  ––––––––––
7066        1,821.25
7067        1,486.30
7131        1,400.15
8042        1,232.00

➤ Figure 2: Stores with above average sales

SELECT * FROM SalesByStore
WHERE TotalSales > (SELECT AVG(TotalSales)
                    FROM SalesByStore)
ORDER BY TotalSales DESC

Store_ID  TotalSales
––––––––  ––––––––––
7066        1,821.25
7067        1,486.30
7131        1,400.15
8042        1,232.00

➤ Figure 3: Above average sales with a subquery

Subqueries don’t have to return just a single value either. Suppose you wanted a list of stores with any employee making under $5.00 an hour? Figure 4 shows how to do this. The inner query finds all employees making less than $5.00 an hour and returns a list of Store_IDs where those employees work. Notice that we use the DISTINCT keyword to get a non-duplicating list of store numbers. Otherwise, if a store had more than one employee matching the criteria we would get duplicate Store_IDs in our return set.
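The Figure 4 query can be tried out on a handful of rows. The script below is only a sketch: the column definitions, the Emp_ID column, the store 'Main Street Video', the pairing of store numbers with store names and all of the Employees rows are invented for illustration and do not come from the article's database. It sticks to plain SQL that most servers should accept.

```sql
-- Hypothetical sample data, invented for illustration only.
CREATE TABLE Stores (
  Store_ID  INTEGER PRIMARY KEY,
  StoreName VARCHAR(40)
);

CREATE TABLE Employees (
  Emp_ID   INTEGER PRIMARY KEY,   -- assumed key column
  Store_ID INTEGER,
  PayRate  DECIMAL(6,2)
);

INSERT INTO Stores VALUES (6380, 'Videos R Us');
INSERT INTO Stores VALUES (7066, 'Videos To Go');
INSERT INTO Stores VALUES (7067, 'Main Street Video');

INSERT INTO Employees VALUES (1, 6380, 4.75);
INSERT INTO Employees VALUES (2, 6380, 4.50);  -- second low-paid employee, same store
INSERT INTO Employees VALUES (3, 7066, 4.90);
INSERT INTO Employees VALUES (4, 7067, 6.25);  -- nobody under 5.00 at this store

-- The inner query on its own, without DISTINCT:
-- store 6380 comes back twice, 7066 once.
SELECT Store_ID FROM Employees
WHERE PayRate < 5.00;

-- The inner query with DISTINCT, as in Figure 4:
-- stores 6380 and 7066, once each.
SELECT DISTINCT Store_ID FROM Employees
WHERE PayRate < 5.00;

-- The complete query: returns 'Videos R Us' and 'Videos To Go'.
SELECT StoreName FROM Stores
WHERE Store_ID IN (SELECT DISTINCT Store_ID FROM Employees
                   WHERE PayRate < 5.00);
```

Since IN only tests whether a value appears in the list, the outer query should return each store once whether or not the inner query uses DISTINCT. The keyword affects the list handed to IN, not the final result set.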

10 The Delphi Magazine Issue 14


SELECT StoreName FROM Stores
WHERE Store_ID IN (SELECT DISTINCT Store_ID FROM Employees
                   WHERE PayRate < 5.00)

StoreName
––––––––––––––––––––––––
Videos R Us
Big Bag O’ Videos
Videos To Go
Superstar Video Rental

➤ Figure 4

Correlated Subqueries

What if you wanted to find all the stores whose sales are above the average sales for the same city? This is a little different because it’s not a simple two-stage process where we can obtain the result of one query and plug that directly into a second query. In this case, the query that determines the average for the city changes as we examine stores in different cities.

Figure 5 shows how to make a correlated subquery where the values from the outer query affect the execution of the inner query. In this query we cannot evaluate the inner query first and then use its results to evaluate the outer query. Here the outer query executes and, as it examines each row in SalesByStore, the inner query executes once per row using the City value for the current store.

SELECT Store_ID, TotalSales FROM SalesByStore S1
WHERE TotalSales > (SELECT AVG(TotalSales) FROM SalesByStore
                    WHERE City = S1.City)
ORDER BY Store_ID

Store_ID  TotalSales
––––––––  ––––––––––
7066        1,821.25
7131        1,400.15

➤ Figure 5

Notice that we must use a table alias on the outer query in order to reference it from within the inner query. In its entirety, this query logically states, “For each row in SalesByStore (S1), calculate the average for all stores in the same city as the current store (from S1), and return only those stores whose TotalSales are greater than the average.”

Ranking Data

Let’s use a real world example to see how much more we can do with subqueries. Every time someone accesses a page on TurboPower’s web site, a “hit” record is written to a transaction table that posts, amongst other things, the date and time of the hit and the page that was hit. In addition a summary file of total hits by page (see Figure 6) is updated for each month via an insert trigger on the transaction table.

CREATE TABLE WebMonthly(
  Page varchar(50),
  YearMonth char(6), /* YYYYMM */
  TotalHits int,
  PRIMARY KEY (Page, YearMonth))

➤ Figure 6

Since analysis by month is one of our key purposes in collecting this data, and since the size of the table is relatively small (50 or so web pages by 12 months for a year’s worth of data), it is worthwhile to make a separate table for these summary statistics. This also gives us the flexibility to purge the transaction table periodically while still retaining the distilled historical statistics. Retaining historical information like this may not be as important for web page hits, but for product sales, customer order activity, and other similar events, historical summary data may be very important.

Obviously, our marketing folks would very much like to evaluate the effectiveness of the web site by seeing an ordered list of “Top Page Hits” for any given month. With web page hits summarized by month, it is very straightforward to produce this list. We simply run the query shown in Figure 7 and print off the output for the Marketing Guy.

SELECT TotalHits, Page FROM WebMonthly
WHERE YearMonth = '199607' /* July 1996 */
ORDER BY TotalHits DESC

TotalHits    Page
–––––––––––  ––––––––––––
       2143  default.htm
        773  download.htm
        436  products.htm
        368  pressrel.htm
        296  apd.htm
        233  sleuth.htm
        199  ordering.htm
        185  orpheus.htm
        140  systinfo.htm
        134  orp21.htm
        112  newsletr.htm
         85  about.htm
         85  orderup.htm
         81  order.htm

➤ Figure 7

Unfortunately, Marketing Guy returns and asks “How simple is it to put a ranking on these from 1 to whatever so I can easily pull off the top 20?” We can actually get the ranking within the result set itself by using a subquery as shown in Figure 8.

This is a bit different from the subqueries we’ve looked at so far.



Here we are not using a subquery to restrict the rows returned by the outer query, but instead we are using the subquery to compute one of the columns in the result set. This is still a correlated subquery, so the subquery will run for each row processed by the outer query. It is critical that a subquery used in this sense return exactly one row and exactly one column or it wouldn’t make sense in the main result set.

SELECT Rank = (SELECT COUNT(TotalHits) FROM WebMonthly
               WHERE TotalHits >= W.TotalHits AND
                     YearMonth = W.YearMonth),
       TotalHits, Page
FROM WebMonthly W
WHERE YearMonth = '199607' /* July 1996 */
ORDER BY 1

Rank         TotalHits    Page
–––––––––––  –––––––––––  ––––––––––––
          1         2143  default.htm
          2          773  download.htm
          3          436  products.htm
          4          368  pressrel.htm
          5          296  apd.htm
          6          233  sleuth.htm
          7          199  ordering.htm
          8          185  orpheus.htm
          9          140  systinfo.htm
         10          134  orp21.htm
         11          112  newsletr.htm
         13           85  about.htm
         13           85  orderup.htm
         14           81  order.htm

➤ Figure 8

The outer query retrieves all rows from the WebMonthly table for July 1996, which equates to all the web pages hit in that month. For each row retrieved, the inner query (SELECT COUNT(TotalHits)) counts the number of rows in the same month (YearMonth = W.YearMonth) having TotalHits the same as or higher than the TotalHits of the current row of the outer query (TotalHits >= W.TotalHits). Therefore, the row with the highest TotalHits gets ranked as number 1 because there is only one row with the same or greater TotalHits (the row itself). The row with the second highest TotalHits gets ranked as number 2 because there are exactly two rows with the same or greater TotalHits (the row itself is equal, and the row ranked number 1 is higher).

Finally, we order the list by specifying the column number to order by rather than the column name. Some database server vendors do not allow ordering by a computed column name, but for some reason, it’s still legal to order by that same column using the column number. Go figure...

Duplicate Values In Rankings

In Figure 8, note that ABOUT.HTM and ORDERUP.HTM have the same hit count, and therefore have the same ranking (there is a tie for this position). However, the fact that we skip from rank 11 to rank 13 is very disheartening. Why this happens is straightforward. When the query processes the row for ABOUT.HTM, there are 11 rows with a higher TotalHits and 2 rows with the same TotalHits (ABOUT.HTM and ORDERUP.HTM). Therefore, the rank is 13. The exact same logic applies when ORDERUP.HTM is processed. Again, the rank is 13.

How do we correct this and achieve a ranking list with no gaps? All we need to do is add the DISTINCT keyword within the inner query as shown in Figure 9. Why does DISTINCT correct this? Now, all rows having the same value for TotalHits are counted only once as a whole. Now when processing ABOUT.HTM, there are 11 rows with a higher TotalHits and one distinct match with the same value. Therefore, the ranking is 12. Since rows with the same value are counted only once, you are guaranteed to have an unbroken chain of consecutive rankings.

SELECT Rank = (SELECT COUNT(DISTINCT TotalHits) FROM WebMonthly
               WHERE TotalHits >= W.TotalHits AND
                     YearMonth = W.YearMonth),
       TotalHits, Page
FROM WebMonthly W
WHERE YearMonth = '199607' /* July 1996 */
ORDER BY 1

Rank         TotalHits    Page
–––––––––––  –––––––––––  ––––––––––––
          1         2143  default.htm
          2          773  download.htm
          3          436  products.htm
          4          368  pressrel.htm
          5          296  apd.htm
          6          233  sleuth.htm
          7          199  ordering.htm
          8          185  orpheus.htm
          9          140  systinfo.htm
         10          134  orp21.htm
         11          112  newsletr.htm
         12           85  about.htm
         12           85  orderup.htm
         13           81  order.htm

➤ Figure 9

Ranking Multiple Sequences

So you hand off your new statistical report and lean back in satisfaction thinking “what else could our Marketing Guy possibly want?”

“Great!” he says. “Can you stick the ranking for last month in there too? And show the number of hits from last month as well, so I can chart the up or down movement.”

Figure 10 shows just how to do this. Since this is a fairly involved query, I’ve used variables to hold the current and last month values to make it easier to follow. This script is written for Microsoft SQL Server; there are subtle differences in how other servers handle variables.
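Since the query in Figure 10 builds directly on the DISTINCT ranking of Figure 9, it may help to first confirm on a few rows how the two counts differ. The sketch below loads four of the July 1996 rows from Figure 8 into a scratch copy of WebMonthly and computes both rankings side by side. The column names RankWithGaps and RankNoGaps are made up for this example, and the standard AS form of column alias is used in place of the Name = expression form so that the script has a better chance of running unchanged on servers other than Microsoft SQL Server.

```sql
-- Scratch copy of the Figure 6 table, for experimentation only.
CREATE TABLE WebMonthly (
  Page      VARCHAR(50),
  YearMonth CHAR(6),      /* YYYYMM */
  TotalHits INTEGER,
  PRIMARY KEY (Page, YearMonth)
);

-- Four rows taken from the Figure 8 result set.
INSERT INTO WebMonthly VALUES ('newsletr.htm', '199607', 112);
INSERT INTO WebMonthly VALUES ('about.htm',    '199607', 85);
INSERT INTO WebMonthly VALUES ('orderup.htm',  '199607', 85);
INSERT INTO WebMonthly VALUES ('order.htm',    '199607', 81);

SELECT
  -- Figure 8 style: counts every row with the same or greater TotalHits
  (SELECT COUNT(TotalHits) FROM WebMonthly
   WHERE TotalHits >= W.TotalHits AND
         YearMonth = W.YearMonth) AS RankWithGaps,
  -- Figure 9 style: counts each TotalHits value only once
  (SELECT COUNT(DISTINCT TotalHits) FROM WebMonthly
   WHERE TotalHits >= W.TotalHits AND
         YearMonth = W.YearMonth) AS RankNoGaps,
  TotalHits, Page
FROM WebMonthly W
WHERE YearMonth = '199607'
ORDER BY 1, 2, 4;

-- Result with only these four rows loaded:
--   RankWithGaps  RankNoGaps  TotalHits  Page
--              1           1        112  newsletr.htm
--              3           2         85  about.htm
--              3           2         85  orderup.htm
--              4           3         81  order.htm
```

With only four rows loaded the rank numbers naturally differ from those in the figures, but the pattern is the same: the plain COUNT skips a rank after the tie, while COUNT(DISTINCT) does not.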



DECLARE @vThisMonth char(6)
DECLARE @vLastMonth char(6)
SELECT @vThisMonth = '199607' /* July 1996 */
SELECT @vLastMonth = '199606' /* June 1996 */
SELECT
  RankThisMonth = (SELECT COUNT(DISTINCT TotalHits)
                   FROM WebMonthly
                   WHERE YearMonth = M1.YearMonth AND
                         TotalHits >= M1.TotalHits),
  RankLastMonth = (SELECT (SELECT COUNT(DISTINCT TotalHits)
                           FROM WebMonthly
                           WHERE YearMonth = M2.YearMonth AND
                                 TotalHits >= M2.TotalHits)
                   FROM WebMonthly M2
                   WHERE YearMonth = @vLastMonth AND
                         Page = M1.Page),
  HitsThisMonth = M1.TotalHits,
  HitsLastMonth = (SELECT TotalHits FROM WebMonthly
                   WHERE YearMonth = @vLastMonth AND
                         Page = M1.Page),
  M1.Page
FROM WebMonthly M1
WHERE YearMonth = @vThisMonth
ORDER BY 1, 2

RankThisMonth  RankLastMonth  HitsThisMonth  HitsLastMonth  Page
–––––––––––––  –––––––––––––  –––––––––––––  –––––––––––––  ––––––––––––
            1              1           2143           2066  default.htm
            2              2            773            874  download.htm
            3              3            436            526  products.htm
            4              4            368            509  pressrel.htm
            5              5            296            341  apd.htm
            6         (null)            233         (null)  sleuth.htm
            7              7            199            200  ordering.htm
            8              6            185            207  orpheus.htm
            9              8            140            185  systinfo.htm
           10              9            134            183  orp21.htm
           11             10            112            125  newsletr.htm
           12             11             85            111  orderup.htm
           12             12             85             92  about.htm
           13             13             81             77  order.htm
           14         (null)             71         (null)  sl-scrn1.htm
           15             15             53             54  delphi32.htm
           15             15             53             54  gi.htm
           16             14             50             61  btree.htm

➤ Figure 10

At first this query looks very intimidating, but fear not! It’s actually more of the same thing we’ve been doing, just a whole lot of it in the same query.

Our main outer query is the same as before, selecting all the web pages for the given month. Only now we’re assigning a table alias of M1 to this set of rows.

The RankThisMonth column is the same subquery as in Figure 9. It is a correlated subquery tied to the current row in M1 which computes the ranking for the current month. The logic is therefore “for every web page in the current month (M1), count the distinct TotalHits values in that month (WHERE YearMonth = M1.YearMonth) that are greater than or equal to this page’s (TotalHits >= M1.TotalHits).”

The RankLastMonth column is a subquery that itself contains a subquery. The outer query (on M2) retrieves last month’s WebMonthly record for the current web page. The inner query then uses that to compute last month’s ranking. This is essentially the same as the RankThisMonth subquery, except it is correlated to last month’s web page row (M2) instead of this month’s (M1). The net result of these two nested queries is a single value representing the ranking of that web page for last month. Note that when a particular page was not present last month, we conveniently get a null value for its ranking and hit count.

The HitsThisMonth column is simply copied from the TotalHits from the current month’s web page row (M1). The HitsLastMonth column requires a subquery to retrieve last month’s row for the current web page.

To recap, for every row in M1:
➣ A query is launched to count the distinct TotalHits values that are the same or greater for the same month (RankThisMonth);
➣ A query is launched to retrieve the matching web page row for the previous month and then another query is launched to count the distinct TotalHits values that are the same or greater for last month (RankLastMonth);
➣ Finally, a fourth query is run to retrieve the matching web page row for the previous month (again) in order to get last month’s total hits.

Obviously, a query such as this is a good candidate for a stored procedure, with parameters for the two months to compare. Note this algorithm is not limited to comparing consecutive months: any two points in time can be compared (for example, current month and same month last year).

Complicated queries such as this may perform slowly for large sets of data depending on how widely the values are distributed and how well you indexed the tables. But for the type of data we’ve examined here and similar data summaries (product sales by product and month, for example), performance should not be much of a problem. Also, bear in mind that these types of query are not run very frequently.

Conclusion

We’ve seen how to use subqueries to add another dimension of processing power to our SQL queries. Subqueries can generally be used anywhere an expression compatible with the result of the subquery can be used. Next month, we’ll continue the theme by seeing how to work with case functions, running totals and cross tabulations.

Steve Troxell is a software engineer with TurboPower Software where he is developing Delphi client/server applications for the casino industry. Steve can be contacted at [email protected] or on CompuServe at 74071,2207

October 1996 The Delphi Magazine 15
