Stix
Problem
Write and test functions to compute the following statistics for a nonempty list of
numeric values:
• The mean, or average value, is computed by dividing the sum of the values by
the number of values in the list. In mathematical notation, this is represented as
\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \]
where n is the number of values in the list—n is at least 1—and x1, x2, …, xn are
the values in the list.
• The median is the middle value in the list when the list’s values are arranged in
numerical order. If the list has an even number of values, either of the two mid-
dle values may be chosen as the median. (This may differ from conventions you
have used in other contexts.)
• The range of values in the list is a pair of values from the list, namely the mini-
mum and maximum values.
• The standard deviation, a measure of how the values are spread out, is computed
using one of the two formulas described below.
Let the variables m and sqm represent the following quantities:
m = the mean value in the list;
sqm = the mean of the squares of values in the list, that is,
\[ \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} \]
in mathematical notation.
One formula for the standard deviation is represented by the mathematical
expression
\[ \sqrt{\mathit{sqm} - m^2} \]
The other formula is
\[ \sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2}{n}} \]
Intuitively, this is measuring how far, on the average, the values are from the
mean.
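As a small worked illustration, both formulas can be evaluated by hand for the values 2, 4, 6. These values are chosen here for convenience and do not come from the problem statement.

```latex
\begin{align*}
m &= \frac{2 + 4 + 6}{3} = 4, \qquad
\mathit{sqm} = \frac{2^2 + 4^2 + 6^2}{3} = \frac{56}{3} \\
\sqrt{\mathit{sqm} - m^2} &= \sqrt{\tfrac{56}{3} - 16} = \sqrt{\tfrac{8}{3}} \approx 1.633 \\
\sqrt{\frac{(2 - 4)^2 + (4 - 4)^2 + (6 - 4)^2}{3}} &= \sqrt{\tfrac{8}{3}} \approx 1.633
\end{align*}
```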
Preparation
The reader should have experience with defining Scheme functions and constants
and with using the following Scheme constructs: conditional expressions, lists, the
built-in functions first, second, member, and assoc, the lambda special form, and
the functionals map, apply, accumulate, find-if, remove-if, and keep-if.
Exercises
Application 1. Compute the mean, median, and standard deviation for
the following set of values:
5 1 4 2 3
Use both formulas to compute the standard deviation.
Analysis 3. Devise two sets of five values with the same mean, one hav-
ing a very small standard deviation, the other having a
very large standard deviation.
Analysis 5. Convince someone else that the two formulas for the stan-
dard deviation compute the same result. Why is your argu-
ment convincing?
Analysis 7. Give a set of values for which the average of the squares of
the values is different from the squared average of the val-
ues. Give a set of values for which the two quantities are
equal.
\[ \sum_{1 \le k \le n} x_k \]
Analysis 9. Discuss, for each statistic, the accuracy of the information
it conveys about the list of values, and describe lists of val-
ues for which the statistic may be misleading.
Analysis 10. Give a reason for squaring the distance each value is from
the mean in the second formula for standard deviation.
Hint: Compute
\[ \frac{(x_1 - m) + (x_2 - m) + (x_3 - m)}{3} \]
for the list 1, 2, 3.
(max 3 -1 51 48 2)
Stop and predict: Give an expression that computes the sum of the values in a list L of
numbers.
How is the mean computed?
We move to the mean. As described in the problem
statement, the mean is the sum of the values divided
by the number of values. The sum of the values can
be computed with apply in the same way as the
maximum was:
(apply + '(3 -1 51 48 2))
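Putting this together, a minimal sketch of mean might look as follows. The helper list-sum matches the definition shown in Appendix A; the body of mean is an assumption, since this excerpt does not reproduce it.

```scheme
; Return the sum of values in the list of numbers L (as in Appendix A).
(define (list-sum L)
  (apply + L))

; Assumed definition: the sum of the values divided by how many there are.
; L must be nonempty, as the problem statement guarantees.
(define (mean L)
  (/ (list-sum L) (length L)))

(mean '(3 -1 51 48 2))   ; 103 divided by 5
(mean '(2 4 6))          ; 4
```

In Scheme implementations with exact rational arithmetic, the first result is the exact fraction 103/5 rather than a decimal approximation.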
How should these functions be tested?
All these functions are short. Even so, we may have
made typing errors, so we stop to test them here.
We test mean on a “typical” list for which the answer
is easy to compute by hand, and test list-max and
list-min in the same way. We also check what happens for
all the functions in the “extreme” situation where
the list of values contains only one element. (The
problem statement guarantees that the list of values
contains at least one element.)
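The tests just described might be written as the following sketch. The text names list-max and list-min, but this excerpt does not show their bodies, so the definitions via apply are assumptions, and the test lists are illustrative choices.

```scheme
(define (mean L) (/ (apply + L) (length L)))   ; assumed definition
(define (list-max L) (apply max L))            ; assumed definition
(define (list-min L) (apply min L))            ; assumed definition

; A "typical" list whose answers are easy to check by hand.
(mean '(1 2 3 4 5))       ; 3
(list-max '(1 2 3 4 5))   ; 5
(list-min '(1 2 3 4 5))   ; 1

; The "extreme" case: a list containing only one element.
(mean '(7))               ; 7
(list-max '(7))           ; 7
(list-min '(7))           ; 7
```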
Stop and consider: For which list of ten values is the mean most easily computed by hand?
Stop and consider: What does mean return, if anything, when given an empty list as
argument?
What is the next step?
Appendix A contains the code that computes the
mean and the range. Looking down the list of
remaining functions, we note that the standard
deviation merely involves arithmetic on means. We
already have a function to compute the mean, so we
work on standard deviation next.
How should the standard deviation be computed?
The problem statement gives two formulas for the
standard deviation. The second formula appears to
be long and complicated. The first formula, however,
is the square root of the difference of two
quantities:
• the mean of the squared values, and
• the square of the mean of the values.
In Scheme, we have the following:
(define (std-dev values)
(sqrt
(- (mean-of-squared values)
(square (mean values)) ) ) )
[Diagrams: values → ? → average of squared values; values → ? → mean → average of squared values]
(define (squared values)
(map square values) )
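The helper mean-of-squared used by std-dev is not shown in this excerpt. One plausible way to assemble the pieces, assuming that mean-of-squared is simply the mean of the list produced by squared, is sketched below; the bodies of square, mean, and mean-of-squared are assumptions.

```scheme
(define (square x) (* x x))                 ; assumed helper

(define (squared values)                    ; as defined in the text
  (map square values))

(define (mean values)                       ; assumed definition
  (/ (apply + values) (length values)))

; Assumed definition: the mean of the squares of the values.
(define (mean-of-squared values)
  (mean (squared values)))

(define (std-dev values)                    ; as defined in the text
  (sqrt
    (- (mean-of-squared values)
       (square (mean values)))))

(squared '(1 2 3))           ; (1 4 9)
(mean-of-squared '(1 2 3))   ; 14 divided by 3
```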
How should std-dev be tested?
There are two keys to testing a complicated function
like std-dev. The first is to test the parts individually
before testing them in combination. Just as a
contractor or engineer requires reliable parts to
build a house or a bridge, so should a programmer
develop reliable components of a program before
assembling them. The second key is to select test
values with easily checked answers.
To test the parts individually, we test the square func-
tion and the squared function by themselves. (We’ve
already tested mean.)
For std-dev, an easy-to-compute case is an extreme
case. The problem statement notes that the stan-
dard deviation is a measure of how much the values
are spread out, so it should be 0 when all the values
are the same. We should check two or three such
cases, including a case with a value list of just one
value.
Next we consider value lists whose standard devia-
tion should be 1. Such lists are more easily found by
using the second formula from the problem state-
ment. After a bit of algebraic manipulation, we note
that when each value differs from the average by 1
or –1, the standard deviation should be 1. The val-
ues 1, 1, 3, 3 make up one such set.
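These easy cases can be run directly. The sketch below restates std-dev compactly so that it is self-contained; the helper definitions are assumptions, and the lists (4 4 4) and (7) are illustrative choices for the "all values the same" case.

```scheme
(define (square x) (* x x))                     ; assumed helper
(define (mean L) (/ (apply + L) (length L)))    ; assumed definition

; First formula: square root of (mean of squares minus square of mean).
(define (std-dev L)
  (sqrt (- (mean (map square L)) (square (mean L)))))

; All values the same: no spread, so the result should be 0.
(std-dev '(4 4 4))     ; 0 (some implementations print 0.0)
(std-dev '(7))         ; 0 (some implementations print 0.0)

; Every value differs from the mean, 2, by exactly 1.
(std-dev '(1 1 3 3))   ; 1 (some implementations print 1.0)
```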
Finally, we consider lists of values whose standard
deviation is not an integer. We use the second formula
to compute the answers by hand, since it provides
an independent way to check the answer (the program
implements the first formula, so redoing
its computation by hand would provide somewhat
less convincing evidence for correctness). We use
small lists of values (three or four values each) to
minimize the pain of the hand computation.
Stop and help: Compute the standard deviation for the values 1, 4, 1, then use these
values as test data for the std-dev function.
Note that correct results on the test data do not prove
that the function is indeed correct. Chosen system-
atically, however, with both extreme cases and “typi-
cal” cases, they can provide substantial evidence.
Exercises
Analysis 11. Suppose that three values each differ from their mean by
the same amount. Show that they must all be identical.
Application 13. Write a set of functions that computes the standard devia-
tion using the second formula given in the problem state-
ment:
\[ \sqrt{\frac{(x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2}{n}} \]
Analysis 14. How can the functions from the previous exercise help in
testing the code designed so far in the case study?
One way to compute the median
How should the median be computed?
The median was defined as the middle value when
the values are arranged in order. In a list without
duplicate values, it’s the value that’s greater than
half the other values in the list. Each definition
suggests an approach to computing the median.
Stop and predict: Which of the two definitions of the median is likely to be easier to
implement in Scheme?
Stop and predict: Write a function called half-length to find the number of values that
will be less than the median.
The first definition requires putting the values in
order, while the second definition does not. That
suggests that a solution based on the second defini-
tion is likely to be somewhat easier. Thus, even
though the solution will only work with a list with no
duplicate values, we will still start with that
approach. We are confident that the solution will be
easy to modify to work with lists with duplicates as
well.
How can the value that’s larger than exactly half the other values in the list be found?
“Finding the value that …” suggests using the find-if
functional:
(define (median values)
(find-if
_________________
values))
(define (median values)
(find-if
(lambda (x)
(possible-median? x values) )
values) )
How is the number of list elements less than a given value found?
To find the number of list elements less than a
given value, we first define a function smaller-vals
that selects those elements from the list. It uses
keep-if as follows:
(define (smaller-vals x values)
(keep-if
(lambda (y) (< y x))
values) )
How should all these functions work together?
Summarizing, we have the following. Finding the
median consists of using find-if to find a possible
median. The function possible-median? checks for a
possible median, which is an element for which the
number of smaller elements in the list is equal to
half the length of the list. The function how-many-<
finds the number of elements in the list smaller
than a given value, and the function half-length
returns half the length of a list. Appendix C contains
the code.
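Appendix C is not reproduced in this excerpt, so the following is only a sketch of how the summary above might translate into code. keep-if and find-if are not standard Scheme; the definitions here are assumed stand-ins, including the choice that find-if returns #f when no element qualifies. The bodies of how-many-<, half-length, and possible-median? are likewise assumptions based on the description.

```scheme
; Assumed stand-in: keep the elements of L that satisfy pred.
(define (keep-if pred L)
  (cond ((null? L) '())
        ((pred (car L)) (cons (car L) (keep-if pred (cdr L))))
        (else (keep-if pred (cdr L)))))

; Assumed stand-in: return the first element of L that satisfies pred,
; or #f if there is none.
(define (find-if pred L)
  (cond ((null? L) #f)
        ((pred (car L)) (car L))
        (else (find-if pred (cdr L)))))

(define (smaller-vals x values)              ; as defined in the text
  (keep-if (lambda (y) (< y x)) values))

; Number of elements of values that are smaller than x.
(define (how-many-< x values)
  (length (smaller-vals x values)))

; Half the length of the list, using integer division.
(define (half-length values)
  (quotient (length values) 2))

; x is a possible median if exactly half-length elements are smaller.
(define (possible-median? x values)
  (= (how-many-< x values) (half-length values)))

(define (median values)                      ; as defined in the text
  (find-if
    (lambda (x) (possible-median? x values))
    values))

(median '(5 1 4 3 2))   ; 3
```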
We check this by hand on an example list, big
enough to involve a significant amount of computa-
tion, but not so big that we lose track of what’s
going on. The list (5 1 4 3 2) seems large enough;
since its median is the fourth element, that should
provide enough evidence that our approach at least
is on the right track. Here are the steps of the “desk
check”. Indentation shows calls of subsidiary func-
tions.
Evaluate (median '(5 1 4 3 2)):
Find a possible median by checking first 5, then 1,
and so on until one is found.
1. Evaluate (possible-median? 5 '(5 1 4 3 2)):
Check how many elements in the list are less than 5, and
see if that’s half the number of elements.
• Evaluate (how-many-< 5 '(5 1 4 3 2)):
Using smaller-vals, construct the list (1 4 3 2), and
return its length, 4.
• Evaluate (half-length '(5 1 4 3 2)):
Return 2.
• 4 ≠ 2, so 5 isn’t a possible median.
2. Evaluate (possible-median? 1 '(5 1 4 3 2)):
Check how many elements in the list are less than 1, using
how-many-<. How-many-< returns 0; half-length returns
the same thing it did before, which was 2. (Details are
omitted.) 0 ≠ 2, so 1 isn’t a possible median.
3. Evaluate (possible-median? 4 '(5 1 4 3 2)):
There are 3 elements in (5 1 4 3 2) that are less than 4.
3 ≠ 2, so 4 isn’t a possible median.
4. Evaluate (possible-median? 3 '(5 1 4 3 2)):
There are 2 elements in (5 1 4 3 2) that are less than 3, so
possible-median? returns true, and median correspondingly
returns 3.
Stop and help: Trace through the application of median to the list (4 1 3 2).
How should median be tested?
So far, so good. We must now thoroughly test median
online. Several aspects of the code suggest possibilities
for error:
• The integer division in half-length provides the possibility
of an “off-by-one” error. Testing this involves using value
lists both of odd and of even length.
• Another source of off-by-one errors is the comparison in
smaller-vals. Sometimes programmers say “<” when they
mean “<=” or vice-versa. Guarding against this error
requires devising boundary test values.
• Finally, programmers (even experts!) sometimes get con-
fused, and accidentally reverse the sense of a comparison,
using “<” when they mean “>” or vice-versa. Test data that
displays this error will be easy to devise, but we must be
sure to test the functions individually to ensure that we
find the error quickly.
several ways in which a value can be “extreme” in a
list: it can be at the beginning or end of the list, or it
can be the largest or smallest value of the list. We
test median with value lists containing the median at
various positions.
Stop and predict: Is it true that the median can’t also be the largest value? Explain.
From the principles above, we generate the follow-
ing test arguments for the various functions:
How can map be used to test median more easily?
A good way to test possible-median? is to wrap a map
around it. The resulting function will look just like
median, except for having map in place of find-if:
(define (test values)
(map
(lambda (k)
(possible-median? k values) )
values) )
Thus
(test '(5 1 4 3 2))
should return
(#f #f #f #t #f)
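To try this out in isolation, the sketch below pairs test with a compact, assumed version of possible-median? (the count of smaller elements compared against half the list length). keep-if is an assumed stand-in, since it is not standard Scheme.

```scheme
; Assumed stand-in: keep the elements of L that satisfy pred.
(define (keep-if pred L)
  (cond ((null? L) '())
        ((pred (car L)) (cons (car L) (keep-if pred (cdr L))))
        (else (keep-if pred (cdr L)))))

; Assumed definition: x is a possible median if the number of smaller
; elements equals half the length of the list (integer division).
(define (possible-median? x values)
  (= (length (keep-if (lambda (y) (< y x)) values))
     (quotient (length values) 2)))

(define (test values)                        ; as defined in the text
  (map
    (lambda (k)
      (possible-median? k values))
    values))

(test '(5 1 4 3 2))   ; (#f #f #f #t #f)
```

Seeing the whole list of results at once shows which elements qualify, not just the first one that find-if would return.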
Stop and help: Design test data for the various functions used to compute the median,
and test the functions.
Now we try the function on a list that contains dupli-
cate values. The extreme case for this situation is
where the values are all the same, and for this the
median fails miserably. The fix is not too difficult,
however; it is left as a study question.
Exercises
Analysis 16. Why does median not work on some lists that contain
duplicate values?
Testing, analysis 17. Provide an argument to median that does contain dupli-
cate values but for which the correct median value is
returned.
Debugging 19. Fix median. Hint: Think about using both a “greater than
or equal to” and a “less than or equal to” test in the argu-
ment to find-if.
Analysis 20. If there are an even number of values in the list passed to
median, which of the two “middle” elements gets returned
as the median?
Application 22. Use map to test the mean function on all of the lists
(1 2 3), (4 5), and (2) at once.
Why even worry about finding another way to compute the median?
One might ask what good another method of computing
the median would do. There are several reasons
to look for an alternative implementation:
a. One reason is merely to acquire more experience with
Scheme, and with functionals in particular. The functionals
are so powerful and flexible that even experts find new
uses for them after playing around for a while.
b. We saw in computing the standard deviation that two
implementations could be used to check each other’s
results.
c. Finally, we should always aim for code that’s easier to test
and understand. The median-by-sorting function looks
promising in this respect, as long as the sort function isn’t
too complicated.
How should the list be sorted?
Sorting can be done recursively. To gain more practice
with functionals, however, we choose to explore
ways to use them to sort, and survey the functionals
to see which ones might be helpful. Find-if returns a
single element of a list. It could be given a list of all
orderings of values and return one in which all the
values are in order, but generating the list of all
orderings seems like too much trouble. Remove-if
and keep-if return only parts of a list, and thus seem
inappropriate. Map returns a list of the correct
length; however, it involves the application of a one-
argument function to each element, and the sorting
process will involve comparison of pairs of elements.
How can accumulate be used to help sort the list?
There are many ways to sort a list (entire books have
been written on the subject!). One way is as an
accumulation.
How can the computation of the mean, maximum, and minimum be viewed as accumulations?
An accumulation in real life is a collection of things,
amassed one by one. Raindrops accumulate into a
puddle. Blocks accumulate into a stack, as shown
below.
ments one by one. The accumulate functional per-
forms that one-by-one accumulation. (We noted
before that apply essentially adds the numbers all at
once.) For example, (accumulate + '(1 5 4 3 2)) per-
forms the computation ((((1+5)+4)+3)+2), which,
expressed in words, is
Add 1 and 5.
Add 4 to the result (the accumulated sum so far).
Add 3 to that result.
Add 2 to that result.
The maximum (or minimum) of a list of values can
also be an accumulation, built by successively com-
paring each value to the maximum (or minimum)
found so far. To compute the maximum value in the
list, we might do the following:
Compare 1 and 5.
Compare 4 with 5 (the largest value so far).
Compare 3 with 5 (the result of the second comparison).
Compare 2 with 5 (the result of the third comparison).
That’s just what (accumulate max '(1 5 4 3 2)) does.
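The accumulate functional is not part of standard Scheme. A minimal sketch consistent with the behavior described above (a left-to-right accumulation over a nonempty list, with the value accumulated so far passed as the first argument) might be defined as follows; this definition is an assumption, not the one the text relies on.

```scheme
; Assumed definition. L must be nonempty.
; Combine the first two elements with f, then continue with that result
; standing in for them, until a single accumulated value remains.
(define (accumulate f L)
  (if (null? (cdr L))
      (car L)
      (accumulate f (cons (f (car L) (cadr L)) (cddr L)))))

(accumulate + '(1 5 4 3 2))     ; 15, computed as ((((1+5)+4)+3)+2)
(accumulate max '(1 5 4 3 2))   ; 5
(accumulate min '(1 5 4 3 2))   ; 1
```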
How can sorting be viewed as an accumulation?
To see how this approach applies to sorting, we consider
how a person might arrange a hand of cards.
He or she would pick them up, one by one, and
insert each one into the hand. The cards are thus
being accumulated into the hand in order. When all
cards are picked up, the entire hand is sorted.
Applying this technique to a list of numbers, say 3,
1, 9, 5, and 4, yields the following steps:
Start out with 3.
Insert 1 into the list (3), giving (1 3).
Insert 9 into (1 3), giving (1 3 9).
Insert 5 into (1 3 9), giving (1 3 5 9).
Insert 4 into (1 3 5 9), giving (1 3 4 5 9).
(We arbitrarily assume that the list will be sorted in
increasing order.)
We just saw accumulate used to find the sum of values
and the maximum value in a list. Accumulate here is
used somewhat differently. The accumulation isn’t a
number, it’s a list. Thus the order of arguments to
the accumulating function is important (the value
accumulated so far comes first). Also, the first element
of the list argument must itself be an accumulated
value, in this case a list. Here’s the code:
(define (insert L k)
________________ )
(define (sort L)
(accumulate
insert
(cons
(list (first L))
(rest L) ) ) )
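[Figure: accumulating the sorted list. Starting from (3) with 1 9 5 4 remaining, the
accumulated list becomes (1 3), then (1 3 9), then (1 3 5 9), and finally (1 3 4 5 9).]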
How is insert coded?
Insert, given a list and a number, should return the
result of inserting the number into the list in the
correct position. Again, we could code it recursively
but choose for this problem to find a way to use
functionals. The insertion can be done by splitting
the list into two pieces, the small elements and the
large elements, and appending them around the
number to insert, as in the diagram below.
[Diagram: a sorted list of numbers split into two pieces, with the new number placed between them.]
Stop and predict: Have we forgotten anything? Hint: think back to the bug in the first
version of median.
A problem, the same one we encountered in the
first version of median, is that there may be duplicate
values in the list. Using “≤” for “less than” or “≥” for
“greater than” (but not both!) solves the problem.
To break apart the list as just described, we use the
function smaller-vals designed for the first version of
median—slightly modified so that the order of argu-
ments is consistent with that of insert—and a similar
function derived from it:
(define (smaller+equal-vals L x)
(keep-if
(lambda (y) (<= y x))
L))
(define (larger-vals L x)
(keep-if
(lambda (y) (> y x))
L))
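Assembled into one runnable sketch, the sorting functions might look like this. The insert, smaller+equal-vals, and larger-vals definitions follow the text and Appendix D. keep-if and accumulate are assumed stand-ins because they are not standard Scheme, and the sort function is renamed insertion-sort here (a hypothetical name) only to avoid clashing with a sort procedure that some Scheme implementations already provide.

```scheme
; Assumed stand-in: keep the elements of L that satisfy pred.
(define (keep-if pred L)
  (cond ((null? L) '())
        ((pred (car L)) (cons (car L) (keep-if pred (cdr L))))
        (else (keep-if pred (cdr L)))))

; Assumed stand-in: left-to-right accumulation over a nonempty list.
(define (accumulate f L)
  (if (null? (cdr L))
      (car L)
      (accumulate f (cons (f (car L) (cadr L)) (cddr L)))))

(define (smaller+equal-vals L x)
  (keep-if (lambda (y) (<= y x)) L))

(define (larger-vals L x)
  (keep-if (lambda (y) (> y x)) L))

; L is a sorted list. Return the result of inserting x in proper order.
(define (insert L x)
  (append (smaller+equal-vals L x)
          (list x)
          (larger-vals L x)))

; The text's sort, under a hypothetical name. L must be nonempty.
(define (insertion-sort L)
  (accumulate insert
              (cons (list (car L)) (cdr L))))

(insert '(1 3 5 9) 4)             ; (1 3 4 5 9)
(insertion-sort '(3 1 9 5 4))     ; (1 3 4 5 9)
(insertion-sort '(2 1 2))         ; (1 2 2)
```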
How should sort and its components be tested?
Appendix D contains the code for the rewritten
median computation. We test in pieces, following
good programming practice. By now, we’ve collected
quite a number of “extreme test data” categories:
relevant element at the beginning;
relevant element at the end;
all elements identical;
element is the largest in the list;
element is the smallest in the list.
These should suggest test data for median-by-sorting,
sort, insert, smaller+equal-vals, and larger-vals.
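The body of median-by-sorting is not shown in this excerpt. The sketch below assumes it sorts the list and selects the element at position half-length (counting from 0); that choice, the stand-in definitions of keep-if and accumulate, and the hypothetical name insertion-sort (used in place of sort to avoid clashing with a built-in) are all assumptions. The sample calls follow two of the "extreme" categories listed above.

```scheme
; Assumed stand-ins for the non-standard functionals.
(define (keep-if pred L)
  (cond ((null? L) '())
        ((pred (car L)) (cons (car L) (keep-if pred (cdr L))))
        (else (keep-if pred (cdr L)))))

(define (accumulate f L)
  (if (null? (cdr L))
      (car L)
      (accumulate f (cons (f (car L) (cadr L)) (cddr L)))))

; Insertion into a sorted list, as in Appendix D.
(define (insert L x)
  (append (keep-if (lambda (y) (<= y x)) L)
          (list x)
          (keep-if (lambda (y) (> y x)) L)))

; The text's sort, under a hypothetical name. L must be nonempty.
(define (insertion-sort L)
  (accumulate insert (cons (list (car L)) (cdr L))))

; Assumed definition: the middle element of the sorted list.
(define (median-by-sorting values)
  (list-ref (insertion-sort values)
            (quotient (length values) 2)))

(median-by-sorting '(5 1 4 3 2))   ; 3
(median-by-sorting '(7 7 7))       ; 7 (all elements identical)
(median-by-sorting '(3 9 1))       ; 3 (median at the beginning)
```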
Stop and help: Why is it necessary to specify (list x) rather than x as an argument to
append in insert?
Stop and help: Design test data for all the functions just designed.
Exercises
Analysis 23. What would be the undesired result of using “≤” for “less
than” and “≥” for “greater than” in the insertion?
Analysis 24. Another way to solve the problem is to break the list into
three parts: the values less than k, those greater than k,
and those equal to k. These three lists can then be
appended to form the proper result. Compare this
approach to the one we took.
Modification 26. Rewrite the insert function to work with the following ver-
sion of sort:
(define (sort L)
(accumulate insert L) )
An alternate way to compute the median
How can the other approach to finding the median be imple-
mented in Scheme?
Why even worry about finding another way to compute
the median?
How should the list be sorted?
How can accumulate be used to help sort the list?
How can the computation of the mean, maximum,
and minimum be viewed as accumulations?
How can sorting be viewed as an accumulation?
How is insert coded?
How should sort and its components be tested?
Exercises
Modification 27. Suppose the values list provided to the various statistics
functions is a list of pairs, each containing a name and a
value. Here’s an example:
((clancy 100)
(linn 100)
(wirth 65))
Modify the program(s) to return the desired statistics for
the new argument format.
Modification 29. Modify the mean function to return the average of all but
the largest and smallest scores in its argument list. (This is
done in some athletic competitions.) Thus the revised
function will return 2 for the list (8 1 2 3 -16) as well as
for the list (3 2 1 3 -7).
Testing, reflection 32. What “rules of thumb” help programmers design test
data?
Application 33. The mode of a list of values is the value that occurs most
often in the list. How might techniques described in this
case study help in designing a function to compute the
mode?
Analysis 34. It has been said that “Liars often figure and figures often
lie.” How might a liar choose among the mean, the mode,
and the median to communicate false information?
Modification 35. Explain how the program might be modified to alert users
to potentially misleading information in the statistics that
are computed.
Appendix A—functions to compute the mean and the range
; Return the sum of values in the list of numbers L.
(define (list-sum L)
(apply + L) )
Appendix D
Functions to compute the median, using a sorting function
; Return the list of numbers in L that are less than
; or equal to x. x is a number, L is a list of numbers.
(define (smaller+equal-vals L x)
(keep-if (lambda (y) (<= y x)) L) )
; L is a sorted list.
; Return the result of inserting x into L in proper order.
(define (insert L x)
(append
(smaller+equal-vals L x)
(list x)
(larger-vals L x) ) )