Statistical Literacy and Mathematical Thinking
ICME-9
2000ICME9d.doc
Page 1
Milo Schield
07/21/00 1:53 PM
In analyzing data, many students fail to distinguish "the chance of this distribution of data in a sample given that the population is normal," P(this distribution | population is normal), from "the chance the population distribution is normal given this distribution of sample data," P(population is normal | this sample). Both of these probabilities involve conditional reasoning. Consider another pair of claims involving conditional reasoning: students fail to distinguish "rejecting the null hypothesis when the null hypothesis is true" from "finding the null hypothesis is true when the null has been rejected." This confusion on conditionality is a big problem. This confusion was the basis for David Moore's argument that we should not teach Bayesian statistics. This confusion was the basis for the MSMESB [1] recommendation for a "de-emphasis of mathematical formalism (probability, hypothesis testing)" (see Love and Hildebrand, 1999, Slide 10). As a result, one statistics text eliminated hypothesis testing.
4. CHANCE, BIAS AND CONFOUNDING
In traditional statistics, students have problems with statistical explanations of variation: chance, bias and confounding. [2] Students are taught that the sources of variation can be systematic or non-systematic (chance). In traditional statistics, the primary focus is statistical inference: deductive inferences about variability resulting from random chance. Differences between population parameters are discussed in the context of the t-test, regression and ANOVA. Regression models are introduced to determine the variability due entirely to chance in the slope and intercept. In each case, the focus is on the sampling variability expected if due entirely to random chance. This over-emphasis on chance makes it appear as if chance is the dominant "cause" of variability, as if chance has a much wider scope than do systematic causes. Systematic causes are marginalized by saying, "There is no test for bias."
Although it is true that "there is no test for bias," this statement is misleading. First, it implies that bias consists solely of errors such as measurement bias. It blurs any distinction between bias and confounding. Second, it implies that since there is no mathematical test, there is no mathematical tool that can be used. It ignores the Cornfield conditions for confounding. The Cornfield conditions are what give Statistical Literacy mathematical credentials (Schield, 1999). De-emphasizing confounding has grave consequences:
1. Observational studies are de-emphasized. In traditional statistics, studies are often treated as experiments (where confounding is all but irrelevant) rather than observational studies (where confounding is most problematic). There is little mention of the relation between confounding and statistical significance. There is no mention that the t-value of an association between two variables could go from being significant to being insignificant (or vice versa) after taking into account a confounding factor in an observational study.
2. Observational studies are dismissed as giving no evidence for causation. Teachers of traditional statistics say, "Association is not causation" without realizing how ambiguous this claim is. Of course it is true that, in observational studies, association does not prove causation. But in a great number of cases, association is an effect of direct causation. Students may nevertheless conclude that "not" means "almost never," if not "never."
3. When observational data is statistically analyzed by the social sciences, it receives an uncritical credibility. When the humanities try to deal with the human condition, with the causes of human action, their thinking is seen as being subjective and unscientific. But when economists predict the future or when experimental psychologists explain human behavior using statistical inference, their claims are seen as being objective and scientific.
5. UNDERLYING THEMES
The cumulative effect of these two misunderstandings is to leave many students in a state of confusion. They may know how to manipulate the formulas, but they don't really understand what statistics is saying. They don't really understand conditional probabilities
[1] MSMESB: Making Statistics More Effective in Schools of Business.
[2] Students often fail to understand that the binomial distribution is a mathematical model of pure chance. Students may view chance as being indeterminate and as lacking a nature. So the idea of determining chance, of giving chance a definite nature, may seem totally inappropriate. This reflects their difficulty in seeing a "hidden order" at the macro level to that which is "indeterminate" at the micro level.
or the relation between chance and confounding. They presume that if an association is statistically significant, then the association must be both real and important.
These two problem areas have some underlying similarities involving mathematical thinking: (A) ordered relationships and (B) contextual relationships.
A. Problems with ordered relationships. These problems begin in elementary school with subtraction and division and are repeated with ratios, fractions and percentages. These problems are encountered in logic with conditional logic: "If A then B." Given the truth or existence of B, students presume this proves the truth or existence of A. These problems are encountered in critical thinking where students confuse to and from, premise and conclusion, if and then, and cause and effect. These problems are encountered in algebra or probability when dealing with inequalities. For example, for positive quantities, if N > n and D > d, then N+D > n+d and N*D > n*d. But it does not follow that N-D > n-d or that N/D > n/d. Similarly, suppose R1 = N1/D1, R2 = N2/D2, r1 = n1/d1, and r2 = n2/d2, where R1 > r1 and R2 > r2. Now suppose that R = (N1+N2)/(D1+D2) and r = (n1+n2)/(d1+d2). Does it follow that R > r? No, it does not. It could be that R < r. Such a reversal in an association is known as Simpson's Paradox (see Schield, 1999). The concept of order is so basic to mathematical thinking that it must be given a very high priority in our teaching.
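The inequality claims above can be checked with a short calculation. The following is a minimal sketch in Python; all of the numbers are hypothetical values chosen only to satisfy the stated premises, and the variable names simply mirror the symbols used in the text.

```python
from fractions import Fraction

# Hypothetical positive values with N > n and D > d.
N, n, D, d = 10, 9, 8, 2

sum_holds = N + D > n + d                        # 18 > 11
product_holds = N * D > n * d                    # 80 > 18
difference_holds = N - D > n - d                 # 2 > 7 fails
ratio_holds = Fraction(N, D) > Fraction(n, d)    # 5/4 > 9/2 fails

# Hypothetical counts for a Simpson's Paradox reversal.
# Group 1: R1 = N1/D1 and r1 = n1/d1. Group 2: R2 = N2/D2 and r2 = n2/d2.
N1, D1 = 9, 10
n1, d1 = 80, 100
N2, D2 = 30, 100
n2, d2 = 2, 10

R1, r1 = Fraction(N1, D1), Fraction(n1, d1)
R2, r2 = Fraction(N2, D2), Fraction(n2, d2)

# Combined ratios pool the numerators and the denominators.
R = Fraction(N1 + N2, D1 + D2)
r = Fraction(n1 + n2, d1 + d2)

print("R1 > r1:", R1 > r1)
print("R2 > r2:", R2 > r2)
print("R > r:", R > r)
```

In this sketch R exceeds r within each group, yet the pooled R is smaller than the pooled r, because the two pooled ratios weight the groups very differently. Exact fractions are used so that the comparisons are not affected by floating-point rounding.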
Statistical Inference (Chance): Central Limit Theorem; implications of chance; conditional probability in explaining variability
Ordered Relations
  Arithmetic
    Subtraction: A - B vs. B - A
    Division: A / B vs. B / A
  Probability/Percentage
    Conditional probability: P(A | B) vs. P(B | A)
    Percentage of A who are B vs. percentage of A among B
  Logic
    If P then Q. So, if P, then Q.
    If P then Q. So, if Q, then P.
B. Problems with contextual relationships. Most students never think of an association as being contextual. Perhaps their experience with arithmetic taught them to regard all numeric measures as absolute and unchanging in the absence of error. They can't imagine that an arithmetic sum, difference, product or ratio might vary depending on what one takes into account. Instead students act as though binary variables were independent and continuous variables were orthogonal. This kind of error in statistics is similar to treating total derivatives as always being the same as partial derivatives. This is a very difficult idea for students when applied to arithmetic measures of relationships. Students never think of arithmetic differences or ratios as being contextual. The idea of a relationship is one of the most fundamental concepts in all of human thought. This is what ties statistical literacy to critical thinking.
To the extent statistical education has not emphasized these two underlying mathematical themes, we may not be teaching students what they really need to learn. We may have confused level of mathematics (algebra versus calculus) with level of mathematical thinking (associations as ordered, and associations as contextual). This paper argues that we need to increase the level of mathematical thinking considerably even though we may not use a very high level of mathematics in doing this. We need more mathematical thinking even though we may need less advanced mathematics. To do this, a new course is proposed.
6. STATISTICAL LITERACY: A PROPOSAL
This paper proposes the development of a new course: Statistical Literacy, the study of statistics as evidence in arguments. Although it can function as a stand-alone course, Statistical Literacy is viewed here as a support course for traditional statistics.
As a support course (a pre-stats bridging course), Statistical Literacy has a strong focus on ordered relationships to bring students up to speed to handle the conditional probabilities involved in confidence intervals and hypothesis tests. It has a strong focus on contextual relationships to help offset the idea that chance is the primary cause of variability.
                    TRADITIONAL STATISTICS     STATISTICAL LITERACY
Primary Focus       Inference                  Modeling
Variability         Chance                     Confounding
Mathematics         Central Limit Theorem      Cornfield Conditions*
Kind of Study       Experiment                 Observational Study
Kind of Control     Control Of / Physical      Control For / Mental
Size of Data Set    Small                      Large/Population
Although the course content involves descriptive statistics and modeling, this course is not a "baby stats" course. In its own unique way, it is as demanding as is a course in statistical inference. Rather than demoting conditional probability (per the MSMESB recommendation), this course makes conditional thinking and conditional probability a central feature. But it does de-emphasize "mathematical formalisms" (following the MSMESB recommendation) by using ordinary language rather than algebra. Rather than use computers to analyze micro-data for statistical significance, the course focuses on each student's ability to analyze, evaluate and express arguments involving statistics and to do so using a grammar that is, in certain respects, as precise and demanding as is algebra. Mathematically, this course has two themes: ordered relationships and associations as contextual. The critical thinking aspects of Statistical Literacy are discussed in Schield, 1999. Some elements of each theme are shown as follows:
MATHEMATICAL THINKING
  Ordered Relationships: describing and comparing rates, percentages & probabilities; reading tables and graphs; selecting parts and wholes
  Association as Contextual: multivariate models; Simpson's Paradox; medical tests; association as spurious; slope, r, and R^2 as spurious
CRITICAL THINKING
  Concepts and Constructs: experiment vs. observational study; control of vs. control for; reliability, validity; correlation; constructs: social, biological, etc.
The following details the two themes involving mathematical thinking.
7. TEACHING ORDERED RELATIONS
Ordered relations and conditional thinking can be taught by teaching the reading and comparison of rates and percentages in tables and graphs. Rates and percentages are the best known examples of named ratios: ratios having proper names. The relation of these named ratios to the different kinds of arithmetic comparisons is shown as follows:
Arithmetic Comparisons of Numbers
  Simple Difference: [Test - Base]
  Simple Ratio: [Test / Base]
  Relative Difference: [(Test - Base) / Base]
Named Ratios
  Ratio Family: Mixture
  Rate Family: Prevalence, Incidence
  Percentage Family: Share, Fraction, Proportion
In describing these named ratios, the ideas of part and whole, together with the devices used by the English language to indicate part and whole, force students to become aware of order in these ratios. Consider the relation between part and whole in the following sets of phrases.
1. "X% of {whole} who are {Whole}" versus "The percentage of {whole} who are {part}."
2. "percentage of {whole} who are {part}" versus "percentage of {part} among {whole}."
3. "men's rate of death" versus "death rate of men"
In 1, the phrase "who are" introduces a whole in the first phrase, but a part in the second phrase. In 2, the phrase "percentage of" introduces a whole in the first phrase, but a part in the second phrase. In 3, men is whole and death is part in both phrases despite their reversal of positions.
The attention to order is even more important in comparing named ratios. The concepts of test and base in a comparison, mixed together with the concepts of part and whole in rates or percentages, force students to think carefully about ordered relations. Consider:
1. "Men are more likely to drink than women."
2. "Men are more likely to drink than smoke."
3. "Smokers are more likely among men than women."
4. "Smokers are more likely among men than drinkers."
In 1, men and women are wholes, drink is the part. In 2, drink and smoke are parts, while men is whole. The ambiguity occurs because keywords were omitted ("than are women" versus "than to smoke"). In 3, smokers is part while men and women are wholes; in 4, smokers and drinkers are part while men is whole. Again, these forms are ambiguous because keywords were omitted ("than among women" versus "than are drinkers").
Describing and comparing named ratios has a very practical byproduct: students learn to express complex relationships clearly and concisely. This requires considerable attention to grammar and is definitely language dependent. (For more details, see Schield, 2000.)
8. TEACHING CONTEXTUAL ASSOCIATIONS
Once students realize that some relationships are contextual and can change depending on what else one takes into account (controls for), they conclude that any relationship could be nullified or reversed, so one has no reason to base any argument on the strength of any association. This transition from naivete to skepticism overlooks the possibility that associations are contextual in specific ways. Cornfield, et al., identified mathematical requirements for an association to reverse in sign. (See Schield, 1998.)
As an example, consider the death rates at two hospitals: City and Rural. Patients are 50% more likely to die at City than at Rural. Is this justification to close up City, sack the staff and start over? Not necessarily, if the association is spurious.
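One way to see how such an association could be spurious is to ask what mix of patients would reproduce each hospital's death rate if death rates depended only on patient condition. The following is a minimal sketch in Python under that assumption. The rates are those reported for this example (3.0% at City, 2.0% at Rural, 3.8% for patients in poor condition, and a gap of 2.6 percentage points between poor and good condition). The function name and the resulting patient mixes are hypothetical illustrations, not figures from the paper.

```python
def poor_share_needed(hospital_rate, poor_rate, good_rate):
    """Share of patients in poor condition that reproduces hospital_rate
    when the death rate depends only on patient condition (an assumption)."""
    return (hospital_rate - good_rate) / (poor_rate - good_rate)


# Death rates in percent, taken from the example.
poor_rate = 3.8
good_rate = poor_rate - 2.6   # implied by the 2.6 percentage-point gap
city_rate, rural_rate = 3.0, 2.0

# Relative risks: City versus Rural, and poor versus good condition.
rr_hospital = city_rate / rural_rate
rr_condition = poor_rate / good_rate

# Hypothetical patient mixes that would fully account for the hospital gap.
city_poor_share = poor_share_needed(city_rate, poor_rate, good_rate)
rural_poor_share = poor_share_needed(rural_rate, poor_rate, good_rate)

# The comparison described in the text: the confounder's association
# must be stronger than the association it is supposed to explain.
confounder_is_stronger = rr_condition > rr_hospital

print(f"RR by hospital:  {rr_hospital:.1f}")
print(f"RR by condition: {rr_condition:.1f}")
print(f"Poor-condition share needed at City:  {city_poor_share:.0%}")
print(f"Poor-condition share needed at Rural: {rural_poor_share:.0%}")
print("Confounder association is stronger:", confounder_is_stronger)
```

Under these assumptions, a City hospital with roughly 69% of its patients in poor condition and a Rural hospital with roughly 31% would show the observed rates even if the two hospitals were equally good. Both shares fall between 0 and 1 only because the association with patient condition is the stronger one; this is a sketch of the idea, not a statement of the full Cornfield conditions.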
Death Rates
  By patient condition: Poor 3.8% versus Good; difference 2.6 Pct. Pts; RR 3.2
  Overall: 2.7%
  By hospital: City 3.0% versus Rural 2.0%; difference 1 Pct. Pt.; RR 1.5
In this case, the association between location and death rate (higher in the city hospital than in the rural hospital) may be spurious after taking into account the condition of the patient. Per Cornfield's condition, a confounder can make an unconditioned association spurious only if its own association is stronger. The fact that there is a stronger association between patient condition (poor versus good) and death rates than there is between the hospital and death rates weakens any argument about causes based on the hospital statistic.
Students are surprised to find there are many relations between two variables depending on what else we take into account. Consider a group of houses. Per additional bedroom, the price of a house increases as shown below:
Price increase      After                          p-value
per extra bed       controlling for                (t-value)
$39,000             (nothing)                      .000 (6.3)
$16,000             land                           .001 (3.3)
$ 9,000             land and house size            .01 (2.4)
$ 5,000             land, house size & baths       .10 (1.3)
So how much is the increase in price for an additional bedroom? Is bedroom statistically significant in determining price? Once again, students are confronted with the fact that associations are contextual.
Mathematically, a spurious association is just an example of the difference between a total derivative and a partial derivative. In mathematics, this difference is not regarded as remarkable. There is no reason to expect that the total and partial derivatives will have the same sign; there is no special concern when they have different signs. But in statistics, students expect both the sign and the magnitude of the association to remain invariant regardless of what other factors are taken into account. They fail to see associations as being contextual.
Mathematically, the key to this course is the necessary condition for Simpson's Paradox. This little-known condition (the Cornfield condition) is as fundamental to statistical literacy and observational statistics as is the Central Limit Theorem to statistical inference. (See Schield, 1998.)
9. BENEFITS
There are some indirect benefits in teaching a course in statistical literacy. In order to learn how to describe and compare ratios and percentages, students must deal with summary data such as that provided in the US Statistical Abstract. This kind of data underlies many of the issues involving social statistics.
Mathematics is often described as a liberal art. But not all aspects of mathematics are equally related to the liberal arts. Statistics, by its focus on the properties and behaviors of real entities, brings mathematics into direct contact with the world of substantial matter. By focusing on how people think about quantitative measures and relationships, statistics is perhaps the portion of mathematics that is most deserving of supporting the mathematical claim to being a liberal art.
By supporting a pre-stats statistical literacy course, the mathematical community could become leaders in building a bridge to the liberal arts. If traditional statistics is the bridge to the experimental social sciences (such as psychology and health), then statistical literacy could be the bridge to the observational social sciences (such as sociology and political science) and to the humanities (such as history, literature, and philosophy). Statistical Literacy could have much broader social implications as indicated in the figure below.
MATHEMATICAL THINKING
  Remedial: Arithmetic, Algebra; Geometry, Trigonom.; Quantitative Literacy; Simple Probability
  Vocational: Counts, measures; Rates, percents; Tables, graphs; Comparisons
  Professional: Modeling: Linear & Logistic; Classifying, Discriminating
CRITICAL THINKING
  Critical Thinking: claims, arguments; premise, conclusion; deductive, inductive; valid, strong
  Decision Making: Cost of ignorance; Value of knowing; Maximizing value, minimizing costs
  Communications: Describe ratios & associations; Check premises; Check context
10. WHY MATHEMATICIANS?
Whether to introduce Statistical Literacy as a new course might seem to be a rather local issue, one that statistical educators can evaluate. So, why should mathematics educators take any special role in this matter? Mathematicians are needed for several reasons.
(1) More students study statistics within departments of mathematics than in any other single department. Systematic change will not occur without leadership from mathematics departments.
(2) Statisticians may need an OK from mathematicians before agreeing to introduce a statistics course that doesn't involve chance. Statisticians are very conscious of their applied status within mathematics, a discipline that often assigns priorities or status based on theoretical abstractness. As such, statisticians may not want to propose anything that might appear to lessen their allegiance to higher mathematics.
(3) Statistics teachers at two-year colleges are located primarily in mathematics departments and would look to their colleagues in four-year colleges for direction. Those mathematicians teaching at two-year colleges are conscious of their relative status within the mathematics community. They may see great benefits in teaching this material to their students, but they would want and need the sanction of the mathematics community before they would initiate the process.
(4) Faculty in the humanities, who need this most, are least able to identify exactly what they want. Mathematicians must be willing to speak for them. The social sciences rely on statistical inference; the humanities need comparable tools.
(5) Mathematicians can give even-handed support to confounding (bias), inductive inference, observational studies and philosophical inquiries. Statisticians attach a high importance to chance, statistical inference, controlled experiments and the Scientific Method. Mathematicians have no such allegiances.
(6) Mathematicians are needed to bring about needed changes in the current guidelines for teaching mathematics and statistics. Those interested in teaching these materials might hesitate until they are added to the curriculum guidelines. Mathematicians are in an excellent position to request such changes.
11. SUMMARY
If students are to appreciate the role of statistics in their lives, they must be able to assess the strength of a statistic in supporting the truth of a claim. To do this, they must understand how an association can be true but irrelevant or spurious. They must understand how the assumptions they make may determine the conclusions they reach.
Mathematics is in a position to make a major contribution to students' ability to think critically about non-mathematical topics. Mathematics educators need to investigate new ways of helping students understand fundamental mathematical ideas (e.g., ordered relationships and contextual relationships) that are intimately related to critical thinking in everyday life.
12. NEXT STEPS
If this new course, Statistical Literacy, is to become an accepted course that is sanctioned by Departments of Mathematics and Statistics, it will require a great deal of work and support.
The first step is dialog: talking about these ideas and giving feedback. Then, a group of educators, data producers and data consumers is needed to review the goals, assess the materials and evaluate the approaches used to achieve these goals. Educators in this group should include mathematicians and statisticians, teachers of journalism and communications, as well as teachers in the humanities and the professions who are interested in this critical thinking approach to statistical education. Data producers in this group should include those from government statistics organizations, from polling organizations and from the centers that compile and disseminate large data sets. Data consumers in this group should include those who expect claims to have clarity, arguments to have credibility and conclusions to be given no more support than the evidence warrants.
Materials are needed to help teachers teach these concepts. Conferences are needed to train educators on how to teach statistical literacy. Educators are needed to test the effectiveness of this program in their classrooms. Evaluators are needed to design assessment tools and to monitor outcomes. Examples, problems, sample quizzes and tests are also needed for teachers to use in teaching this new subject.
A support system is necessary to help new teachers master these new materials, to develop better quizzes, tests and projects, to obtain better examples, and to provide a forum to exchange ideas. This is an extremely ambitious effort, but the goal of making a substantial improvement in the understanding of, and appreciation for, statistics is certainly worth the effort required.
REFERENCES
Loftsgaarden, Don and Ann Watkins (1998). Statistics Teaching in Colleges and Universities: Courses, Instructors, and Degrees in Fall 1995. The American Statistician, November 1998, p. 308.
Love, Thomas E. and David K. Hildebrand (1999). Recommendations from the Making Statistics More Effective in Schools of Business (MSMESB) Conferences. http://weatherhead.cwru.edu/msmesb.
Schield, Milo (1998). Statistical Literacy and Evidential Statistics. ASA 1998 Proceedings of the Section on Statistical Education, p. 187-192.
Schield, Milo (1999a). Simpson's Paradox and Cornfield's Conditions. ASA 1999 Proceedings of the Section on Statistical Education, p. 106-111.
Schield, Milo (1999b). Statistical Literacy: Thinking Critically about Statistics. Of Significance, published by APDU: The Association of Public Data Users, Volume 1, Number 1.
Schield, Milo (1999c). Common Errors in Forming Arithmetic Comparisons. Of Significance, published by APDU: The Association of Public Data Users, Volume 1, Number 1.
Schield, Milo (2000). Statistical Literacy: Describing and Comparing Percentages, Rates and Probabilities. ASA 2000 Proceedings of the Section on Statistical Education.
Steen, Lynn Arthur (6/20/99). Quantitative Literacy Guide. www.stolaf.edu/other/ql/ql.html
Steen, Lynn Arthur, editor (1997). Why Numbers Count: Quantitative Literacy for Tomorrow's America. The College Board.
Acknowledgments: To Working Group 5 on "Mathematics Education in Universities" at the ICME-9: International Congress on Mathematics Education conference for the opportunity to present this paper. To Linda Schield and Thomas V.V. Burnham for their criticisms, comments and support.
Dr. Schield can be reached at schield@augsburg.edu. For this paper and related materials, see Dr. Schield's homepage at www.augsburg.edu/ppages/schield.
This data must be estimated since the American Statistical Association only tabulates those statistics courses taught in Departments of Mathematics and Statistics. The percentages were estimated by treating entire majors as requiring a specific math course. Engineering, Biological Sciences, Physical Sciences and Mathematics were treated as requiring calculus. Business, Social Sciences, Education, Health Sciences and Psychology were treated as requiring statistics. Data on US four-year college graduates by major was obtained for 1995 from the 1996 US Statistical Abstract, Table 325. On this basis, about 167,000 graduates (14%) had studied calculus while about 620,000 (54%) had studied statistics. In 1995, graduating seniors from US four-year colleges were 3.7 times as likely to have taken statistics as to have taken calculus.
[2] In the US in fall 1995, elementary statistics was taught by Departments of Mathematics or Statistics to 164,000 students at four-year colleges and to 72,000 students at two-year colleges (Loftsgaarden and Watkins, 1998). Assuming that fall enrollments are half of the yearly totals, mathematics and statistics departments taught about 328,000 students a year at four-year colleges. At two-year colleges, department chairs estimated that about 9,000 students were taught statistics outside Mathematics Departments. There was no estimate in this article of the number taught outside Departments of Mathematics and Statistics at four-year colleges. Based on the estimate in the previous footnote, about 50% (328,000/640,000) of the students taking statistics at four-year colleges took their statistics in a course offered by a Department of Mathematics or Statistics.