coming into the world of SQL and client/server may overlook the subtle power of the SQL language. You may be tempted to look on the SELECT statement as a simple “record read” I/O function which will return a specified set of columns and rows to be further manipulated by the Delphi client program. In reality, SELECT packs a pretty powerful data processing punch of its own. We skimmed the surface of what SELECT can do back in Issue 4 (November 1995). This month we’re going to expand our knowledge further and take a look at just how we can leverage the power of SELECT by making use of subqueries.

Subqueries, sometimes referred to as nested queries, can be thought of as chaining two or more queries together with the results of one query feeding into or participating in the processing of the other query. I should say at the outset that the 16-bit BDE with Delphi 1.0 does not support subqueries in local SQL (i.e. with Paradox and dBase tables, etc.), only with database servers.

The usefulness of subqueries might best be explained with an example. Suppose you want a list of stores having above average sales. Let’s say we have a table called SalesByStore containing one row per store with the total sales for that store. First, we must obtain the average store sales as shown in Figure 1. Then, we would use this value to create the list of above average stores in Figure 2. We would be tempted to have a Delphi client program run the first query, then substitute the average value obtained into the second query.

    SalesByStore contains:

    Store_ID  TotalSales
    ––––––––  ––––––––––
    6380          132.80
    7066        1,821.25
    7067        1,486.30
    7131        1,400.15
    7896          604.40
    8042        1,232.00

    SELECT Average = AVG(TotalSales)
    FROM SalesByStore

    Average
    ––––––––––
      1,112.82

➤ Figure 1: Obtain the average sales figure

    SELECT Store_ID, TotalSales
    FROM SalesByStore
    WHERE TotalSales > 1112.82
    ORDER BY TotalSales DESC

    Store_ID  TotalSales
    ––––––––  ––––––––––
    7066        1,821.25
    7067        1,486.30
    7131        1,400.15
    8042        1,232.00

➤ Figure 2: Stores with above average sales

However, with a little ingenuity, we can accomplish the same thing by changing Figure 2 to use a subquery as shown in Figure 3. In this case, the inner query (SELECT AVG) is run first to obtain the average value which is then substituted into the outer query. The outer query then runs with a constant value in place of the inner query. All of this happens within the context of a single SQL query sent to the server and a single result set passed back to the client, instead of the two-stage process we had before.

    SELECT *
    FROM SalesByStore
    WHERE TotalSales > (SELECT AVG(TotalSales)
                        FROM SalesByStore)
    ORDER BY TotalSales DESC

    Store_ID  TotalSales
    ––––––––  ––––––––––
    7066        1,821.25
    7067        1,486.30
    7131        1,400.15
    8042        1,232.00

➤ Figure 3: Above average sales with a subquery

Subqueries don’t have to return just a single value either. Suppose you wanted a list of stores with any employee making under $5.00 an hour? Figure 4 shows how to do this. The inner query finds all employees making less than $5.00 an hour and returns a list of Store_IDs where those employees work. Notice that we use the DISTINCT keyword to get a non-duplicating list of store numbers. Otherwise, if a store had more than one employee matching the criteria we would get duplicate Store_IDs in our return set.
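The two-stage process and its single-query replacement can be compared side by side with a small sketch. It is not the setup used in the article: it assumes SQLite, driven through Python's sqlite3 module, standing in for both the database server and the Delphi client, and it loads only the six rows listed in Figure 1. The variable names (average, two_stage, one_query) are made up for this illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesByStore (Store_ID INTEGER PRIMARY KEY, TotalSales REAL)"
)
# The six rows listed in Figure 1.
conn.executemany(
    "INSERT INTO SalesByStore VALUES (?, ?)",
    [
        (6380, 132.80), (7066, 1821.25), (7067, 1486.30),
        (7131, 1400.15), (7896, 604.40), (8042, 1232.00),
    ],
)

# Two-stage process: the client fetches the average, then feeds it back
# into a second query as a parameter (two round trips).
(average,) = conn.execute(
    "SELECT AVG(TotalSales) FROM SalesByStore"
).fetchone()
two_stage = [
    row[0]
    for row in conn.execute(
        "SELECT Store_ID FROM SalesByStore "
        "WHERE TotalSales > ? ORDER BY TotalSales DESC",
        (average,),
    )
]

# Single query: the subquery supplies the average inside the database,
# so only one statement is sent and one result set comes back.
one_query = [
    row[0]
    for row in conn.execute(
        """
        SELECT Store_ID
        FROM SalesByStore
        WHERE TotalSales > (SELECT AVG(TotalSales) FROM SalesByStore)
        ORDER BY TotalSales DESC
        """
    )
]

print(round(average, 2))
print(two_stage)
print(one_query)
```

Both lists come out the same; the difference is that the subquery version needs one statement where the two-stage version needs two, with the intermediate value passing through the client.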
10 The Delphi Magazine Issue 14
    SELECT StoreName
    FROM Stores
    WHERE Store_ID IN (SELECT DISTINCT Store_ID
                       FROM Employees
                       WHERE PayRate < 5.00)

    StoreName
    ––––––––––––––––––––––––
    Videos R Us
    Big Bag O’ Videos
    Videos To Go
    Superstar Video Rental

➤ Figure 4

Correlated Subqueries

What if you wanted to find all the stores whose sales are above the average sales for the same city? This is a little different because it’s not a simple two-stage process where we can obtain the result of one query and plug that directly into a second query. In this case, the query that determines the average for the city changes as we examine stores in different cities. Figure 5 shows how to make a correlated subquery where the values from the outer query affect the execution of the inner query. In this query we cannot evaluate the inner query first and then use its results to evaluate the outer query. Here the outer query executes and, as it examines each row in SalesByStore, the inner query executes once per row using the City value for the current store.

    SELECT Store_ID, TotalSales
    FROM SalesByStore S1
    WHERE TotalSales > (SELECT AVG(TotalSales)
                        FROM SalesByStore
                        WHERE City = S1.City)
    ORDER BY Store_ID

    Store_ID  TotalSales
    ––––––––  ––––––––––
    7066        1,821.25
    7131        1,400.15

➤ Figure 5

Notice that we must use a table alias on the outer query in order to reference it from within the inner query. In its entirety, this query logically states, “For each row in SalesByStore (S1), calculate the average for all stores in the same city as the current store (from S1), and return only those stores whose TotalSales are greater than the average.”

Ranking Data

Let’s use a real world example to see how much more we can do with subqueries. Every time someone accesses a page on TurboPower’s web site, a “hit” record is written to a transaction table that posts, amongst other things, the date and time of the hit and the page that was hit. In addition a summary file of total hits by page (see Figure 6) is updated for each month via an insert trigger on the transaction table.

    CREATE TABLE WebMonthly(
      Page varchar(50),
      YearMonth char(6),   /* YYYYMM */
      TotalHits int,
      PRIMARY KEY (Page, YearMonth))

➤ Figure 6

Since analysis by month is one of our key purposes in collecting this data, and since the size of the table is relatively small (50 or so web pages by 12 months for a year’s worth of data), it is worthwhile to make a separate table for these summary statistics. This also gives us the flexibility to purge the transaction table periodically while still retaining the distilled historical statistics. Retaining historical information like this may not be as important for web page hits, but for product sales, customer order activity, and other similar events, historical summary data may be very important.

Obviously, our marketing folks would very much like to evaluate the effectiveness of the web site by seeing an ordered list of “Top Page Hits” for any given month. With web page hits summarized by month, it is very straightforward to produce this list. We simply run the query shown in Figure 7 and print off the output for the Marketing Guy.

    SELECT TotalHits, Page
    FROM WebMonthly
    WHERE YearMonth = '199607'   /* July 1996 */
    ORDER BY TotalHits DESC

    TotalHits    Page
    –––––––––––  ––––––––––––
           2143  default.htm
            773  download.htm
            436  products.htm
            368  pressrel.htm
            296  apd.htm
            233  sleuth.htm
            199  ordering.htm
            185  orpheus.htm
            140  systinfo.htm
            134  orp21.htm
            112  newsletr.htm
             85  about.htm
             85  orderup.htm
             81  order.htm

➤ Figure 7

Unfortunately, Marketing Guy returns and asks “How simple is it to put a ranking on these from 1 to whatever so I can easily pull off the top 20?” We can actually get the ranking within the result set itself by using a subquery as shown in Figure 8. This is a bit different from the subqueries we’ve looked at so far.
Here we are not using a subquery to restrict the rows returned by the outer query, but instead we are using the subquery to compute one of the columns in the result set. This is still a correlated subquery, so the subquery will run for each row processed by the outer query. It is critical that a subquery used in this sense return exactly one row and exactly one column or it wouldn’t make sense in the main result set.

    SELECT Rank = (SELECT COUNT(TotalHits)
                   FROM WebMonthly
                   WHERE TotalHits >= W.TotalHits AND
                         YearMonth = W.YearMonth),
           TotalHits, Page
    FROM WebMonthly W
    WHERE YearMonth = '199607'   /* July 1996 */
    ORDER BY 1

    Rank         TotalHits    Page
    –––––––––––  –––––––––––  ––––––––––––
              1         2143  default.htm
              2          773  download.htm
              3          436  products.htm
              4          368  pressrel.htm
              5          296  apd.htm
              6          233  sleuth.htm
              7          199  ordering.htm
              8          185  orpheus.htm
              9          140  systinfo.htm
             10          134  orp21.htm
             11          112  newsletr.htm
             13           85  about.htm
             13           85  orderup.htm
             14           81  order.htm

➤ Figure 8

The outer query retrieves all rows from the WebMonthly table for July 1996, which equates to all the web pages hit in that month. For each row retrieved, the inner query (SELECT COUNT(TotalHits)) counts the number of rows in the same month (YearMonth = W.YearMonth) having TotalHits the same as or higher than the TotalHits of the current row of the outer query (TotalHits >= W.TotalHits). Therefore, the row with the highest TotalHits gets ranked as number 1 because there is only one row with the same or greater TotalHits (the row itself). The row with the second highest TotalHits gets ranked as number 2 because there are exactly two rows with the same or greater TotalHits (the row itself is equal, and the row ranked number 1 is higher).

Finally, we order the list by specifying the column number to order by rather than the column name. Some database server vendors do not allow ordering by a computed column name, but for some reason, it’s still legal to order by that same column using the column number. Go figure...

Duplicate Values In Rankings

In Figure 8, note that ABOUT.HTM and ORDERUP.HTM have the same hit count, and therefore have the same ranking (there is a tie for this position). However, the fact that we skip from rank 11 to rank 13 is very disheartening. Why this happens is straightforward. When the query processes the row for ABOUT.HTM, there are 11 rows with a higher TotalHits and 2 rows with the same TotalHits (ABOUT.HTM and ORDERUP.HTM). Therefore, the rank is 13. The exact same logic applies when ORDERUP.HTM is processed. Again, the rank is 13.

How do we correct this and achieve a ranking list with no gaps? All we need to do is add the DISTINCT keyword within the inner query as shown in Figure 9. Why does DISTINCT correct this? Now, all rows having the same value for TotalHits are counted only once as a whole. When processing ABOUT.HTM, there are 11 rows with a higher TotalHits and one distinct match with the same value. Therefore, the ranking is 12. Since rows with the same value are counted only once, you are guaranteed to have an unbroken chain of consecutive rankings.

    SELECT Rank = (SELECT COUNT(DISTINCT TotalHits)
                   FROM WebMonthly
                   WHERE TotalHits >= W.TotalHits AND
                         YearMonth = W.YearMonth),
           TotalHits, Page
    FROM WebMonthly W
    WHERE YearMonth = '199607'   /* July 1996 */
    ORDER BY 1

    Rank         TotalHits    Page
    –––––––––––  –––––––––––  ––––––––––––
              1         2143  default.htm
              2          773  download.htm
              3          436  products.htm
              4          368  pressrel.htm
              5          296  apd.htm
              6          233  sleuth.htm
              7          199  ordering.htm
              8          185  orpheus.htm
              9          140  systinfo.htm
             10          134  orp21.htm
             11          112  newsletr.htm
             12           85  about.htm
             12           85  orderup.htm
             13           81  order.htm

➤ Figure 9

Ranking Multiple Sequences

So you hand off your new statistical report and lean back in satisfaction thinking “what else could our Marketing Guy possibly want?”

“Great!” he says. “Can you stick the ranking for last month in there too? And show the number of hits from last month as well, so I can chart the up or down movement.”

Figure 10 shows just how to do this. Since this is a fairly involved query, I’ve used variables to hold the current and last month values to make it easier to follow. This script is written for Microsoft SQL Server; there are subtle differences in how other servers handle variables.
    DECLARE @vThisMonth char(6)
    DECLARE @vLastMonth char(6)
    SELECT @vThisMonth = '199607'   /* July 1996 */
    SELECT @vLastMonth = '199606'   /* June 1996 */

    SELECT
      RankThisMonth = (SELECT COUNT(DISTINCT TotalHits)
                       FROM WebMonthly
                       WHERE YearMonth = M1.YearMonth AND
                             TotalHits >= M1.TotalHits),
      RankLastMonth = (SELECT (SELECT COUNT(DISTINCT TotalHits)
                               FROM WebMonthly
                               WHERE YearMonth = M2.YearMonth AND
                                     TotalHits >= M2.TotalHits)
                       FROM WebMonthly M2
                       WHERE YearMonth = @vLastMonth AND
                             Page = M1.Page),
      HitsThisMonth = M1.TotalHits,
      HitsLastMonth = (SELECT TotalHits
                       FROM WebMonthly
                       WHERE YearMonth = @vLastMonth AND
                             Page = M1.Page),
      M1.Page
    FROM WebMonthly M1
    WHERE YearMonth = @vThisMonth
    ORDER BY 1, 2

    RankThisMonth  RankLastMonth  HitsThisMonth  HitsLastMonth  Page
    –––––––––––––  –––––––––––––  –––––––––––––  –––––––––––––  ––––––––––––
                1              1           2143           2066  default.htm
                2              2            773            874  download.htm
                3              3            436            526  products.htm
                4              4            368            509  pressrel.htm
                5              5            296            341  apd.htm
                6         (null)            233         (null)  sleuth.htm
                7              7            199            200  ordering.htm
                8              6            185            207  orpheus.htm
                9              8            140            185  systinfo.htm
               10              9            134            183  orp21.htm
               11             10            112            125  newsletr.htm
               12             11             85            111  orderup.htm
               12             12             85             92  about.htm
               13             13             81             77  order.htm
               14         (null)             71         (null)  sl-scrn1.htm
               15             15             53             54  delphi32.htm
               15             15             53             54  gi.htm
               16             14             50             61  btree.htm

➤ Figure 10

At first this query looks very intimidating, but fear not! It’s actually more of the same thing we’ve been doing, just a whole lot of it in the same query.

Our main outer query is the same as before, selecting all the web pages for the given month. Only now we’re assigning a table alias of M1 to this set of rows.

The RankThisMonth column is the same subquery as in Figure 9. It is a correlated subquery tied to the current row in M1 which computes the ranking for the current month. The logic is therefore “for every web page in the current month (M1), count the distinct TotalHits values in that month (WHERE YearMonth = M1.YearMonth) that are greater than or equal to this page’s (TotalHits >= M1.TotalHits).” The count includes the page’s own value, which is why the top page is ranked 1.

The RankLastMonth column is a subquery that itself contains a subquery. The outer query retrieves last month’s WebMonthly record for the current web page. The inner query then uses that to compute last month’s ranking. This is essentially the same as the RankThisMonth subquery, except it is correlated to last month’s web page row (M2) instead of this month’s (M1). The net result of these two nested queries is a single value representing the ranking of that web page for last month. Note that when a particular page was not present last month, we conveniently get a null value for its ranking and hit count.

The HitsThisMonth column is simply copied from the TotalHits from the current month’s web page row (M1). The HitsLastMonth column requires a subquery to retrieve last month’s row for the current web page.

To recap, for every row in M1:
➣ A query is launched to count the distinct TotalHits values that are the same or greater for the same month (RankThisMonth);
➣ A query is launched to retrieve the matching web page row for the previous month and then another query is launched to count the distinct TotalHits values that are the same or greater for last month (RankLastMonth);
➣ Finally, a fourth query is run to retrieve the matching web page row for the previous month (again) in order to get last month’s total hits.

Obviously, a query such as this is a good candidate for a stored procedure, with parameters for the two months to compare. Note this algorithm is not limited to comparing consecutive months: any two points in time can be compared (for example, current month and same month last year).

Complicated queries such as this may perform slowly for large sets of data depending on how widely the values are distributed and how well you indexed the tables. But for the type of data we’ve examined here and similar data summaries (product sales by product and month, for example), performance should not be much of a problem. Also, bear in mind that these types of query are not run very frequently.

Conclusion

We’ve seen how to use subqueries to add another dimension of processing power to our SQL queries. Subqueries can generally be used anywhere an expression compatible with the result of the subquery can be used. Next month, we’ll continue the theme by seeing how to work with case functions, running totals and cross tabulations.

Steve Troxell is a software engineer with TurboPower Software where he is developing Delphi client/server applications for the casino industry. Steve can be contacted at [email protected] or on CompuServe at 74071,2207