To pool or not to pool: That is the confusion

Thomas R. Knapp
© 2013

Prologue

Isn't the English language strange? Consider the word "pool". I go swimming in
a pool. I shoot pool at the local billiards parlor. I obtain the services of someone
in the secretarial pool to type a manuscript for me. I participate in a pool to try to
predict the winners of football games. I join a car pool to save on gasoline. You
and I pool our resources.

And now here I am talking about whether or not to pool data?! With 26 letters in
our alphabet I wouldn't think we'd need to use the word "pool" in so many
different ways. (The Hawaiian alphabet has only 12 letters...the five vowels and
seven consonants H, K, L, M, N, P, and W; they just string lots of the same letters
together to make new words.)

What is the meaning of the term "pooling data"?

There are several situations in which the term "pooling data" arises. Here are
most of them:

1. Pooling variances

Let's start with the most familiar context for pooling data (at least to students in
introductory courses in statistics), viz., the pooling of sample variances in a t test
of the significance of the difference between two independent sample means.
The null hypothesis to be tested is that the means of two populations are equal
(the populations from which the respective samples have been randomly
sampled). We almost never know what the population variances are (if we did
we'd undoubtedly also know what the population means are, and there would be
no need to test the hypothesis), but we often assume that they are equal, so we
need to have some way of estimating from the sample data the variance that the
two populations have in common. I won't bore you with the formula (you can look
it up in almost any statistics textbook), but it involves, not surprisingly, the two
sample variances and the two sample sizes. You should also test the
"poolability" of the sample variances before doing the pooling, by using Bartlett's
test or Levene's test, but almost nobody does; neither test has much power.
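
For readers who would rather see the arithmetic than look it up, here is a minimal sketch of the usual textbook pooled-variance estimate, in which each sample variance is weighted by its degrees of freedom. The function names and the two small samples are hypothetical, made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(x, y):
    """Weight each sample variance by its degrees of freedom (n - 1)."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


def pooled_t(x, y):
    """Two-sample t statistic under the equal-variance assumption."""
    n1, n2 = len(x), len(y)
    sp2 = pooled_variance(x, y)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and its degrees of freedom


# Hypothetical scores, for illustration only.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8, 10]

t_stat, df = pooled_t(group_a, group_b)
print(f"pooled variance = {pooled_variance(group_a, group_b):.3f}")
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The pooled estimate always falls between the two sample variances, and it sits closer to the variance of the larger sample.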

[Note: There is another t test for which you don't assume the population
variances to be equal, and there's no pooling. It's variously called the
Welch-Satterthwaite test or the Behrens-Fisher test. It is the default t test in
Minitab. If you want the pooled test you have to explicitly request it.]
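
As a rough sketch of the unpooled alternative, the statistic below keeps the two sample variances separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. The function name and the data are hypothetical; this is an illustration, not a description of how any particular package implements the test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Unpooled two-sample t statistic with approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    a = variance(x) / n1  # squared standard error contributed by sample 1
    b = variance(y) / n2  # squared standard error contributed by sample 2
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores, for illustration only.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8, 10]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f} on about {df:.1f} degrees of freedom")
```

The approximate degrees of freedom are generally not a whole number and never exceed the n1 + n2 - 2 of the pooled test.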

2. Pooling within-group regression slopes

One of the assumptions for the appropriate use of the analysis of covariance
(ANCOVA) for two independent samples is that the regression of Y (the
dependent variable) on X (the covariate) is the same in the two populations that
have been sampled. If a test of the significance of the difference between the
two within-group slopes is "passed" (the null hypothesis of equality of slopes is
not rejected), those sample slopes can be pooled together for the adjustment of
the means on the dependent variable. If that test is "failed" (the null hypothesis
of equality of slopes is rejected) the traditional ANCOVA is not appropriate and
the Johnson-Neyman technique (Johnson & Neyman, 1936) must be used in its
place.

3. Pooling raw data across two (or more) subgroups

This is the kind of pooling people often do without thinking through the
ramifications. For example, suppose you were interested in the relationship
between height and weight for adults, and you had a random sample of 50 males
and a random sample of 50 females. Should you pool the data for the two sexes
and calculate one correlation coefficient, or should you get two correlation
coefficients (one for the males and one for the females)? Does it matter?

The answer to the first question is a resounding "no" to the pooling. The answer
to the second question is a resounding "yes". Here's why. In almost every
population of adults the males are both taller and heavier than the females, on
the average. If you pool the data and create a scatterplot, it will be longer and
skinnier than the scatterplots for the two sexes treated separately, thereby
producing a spuriously high correlation between height and weight. Try it. You'll
see what I mean. And read the section in David Howell's (2007) statistics
textbook (page 265) regarding this problem. He provides an example of real
data for a sample of 92 college students (57 males, 35 females) in which the
correlation between height and weight is .60 for the males, .49 for the females,
and .78 for the two sexes pooled together.
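
If you would like to "try it" without collecting data, the following sketch uses made-up numbers (not Howell's data). The hypothetical male sample is simply the female sample shifted up by 6 inches and 50 pounds, so the within-sex correlations are identical by construction and any change in the pooled correlation is due entirely to the difference in group means.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation, written out from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds), for illustration only.
female_height = [60, 62, 64, 66, 68]
female_weight = [120, 135, 115, 140, 130]
male_height = [h + 6 for h in female_height]   # same pattern, shifted up
male_weight = [w + 50 for w in female_weight]  # same pattern, shifted up

r_female = pearson_r(female_height, female_weight)
r_male = pearson_r(male_height, male_weight)
r_pooled = pearson_r(female_height + male_height, female_weight + male_weight)

print(f"females: {r_female:.2f}  males: {r_male:.2f}  pooled: {r_pooled:.2f}")
```

With these numbers the pooled correlation comes out roughly twice as large as either within-sex correlation, even though nothing about the height-weight relationship within a sex has changed.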

4. Pooling raw data across research sites

This is the kind of pooling that goes on all the time (often unnoticed) in
randomized clinical trials. The typical researcher often runs into practical
difficulties in obtaining a sufficient number of subjects at a single site and "pads"
the sample size by gathering data from two or more sites. In the analysis
he(she) almost never tests the treatment-by-site interaction, which might "be
there" and would constrain the generalizability of the findings.
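
To make the idea of a treatment-by-site interaction concrete, here is a descriptive sketch with invented outcome scores for two hypothetical sites. It only compares cell means; it is not a significance test of the interaction, which would need the appropriate two-way analysis.

```python
from statistics import mean

# Hypothetical outcome scores keyed by (site, arm), for illustration only.
data = {
    ("site_1", "treatment"): [8, 9, 10, 9],
    ("site_1", "control"): [5, 6, 5, 4],
    ("site_2", "treatment"): [6, 5, 7, 6],
    ("site_2", "control"): [6, 5, 6, 7],
}
sites = ("site_1", "site_2")


def effect(site):
    """Treatment mean minus control mean within one site."""
    return mean(data[(site, "treatment")]) - mean(data[(site, "control")])


effects = {site: effect(site) for site in sites}

# The interaction contrast is the difference between the site-specific effects.
interaction_contrast = effects["site_1"] - effects["site_2"]

# The pooled effect ignores site altogether.
all_treatment = [v for s in sites for v in data[(s, "treatment")]]
all_control = [v for s in sites for v in data[(s, "control")]]
pooled_effect = mean(all_treatment) - mean(all_control)

print("effects by site:", effects)
print("interaction contrast:", interaction_contrast)
print("pooled effect:", pooled_effect)
```

In this made-up example the pooled effect describes neither site: the treatment appears to work at one site and not at all at the other.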

5. Pooling data across time

There is a subtle version of this kind of pooling and a not-so-subtle version.

Researchers often want to combine data for various years or minutes or
whatever, for each unit of analysis (a person, a school, a hospital, etc.), usually
by averaging, in order to get a better indicator of a "typical" measurement. They
(the researchers) usually explain why and how they do that, so that's the not-so-
subtle version. The subtle version is less common but more dangerous. Here the
mistake is occasionally made of treating the Time 2 data for the same people as
though they were different people from the Time 1 people. The sample size
accordingly looks larger than it is, and the "correlatedness" of the data at
the two points in time is ignored, often resulting in a less sensitive
analysis. (Compare, for example, data that should be treated using McNemar's
test for correlated samples with data that are appropriately handled by the
traditional chi-square test of the independence of two categorical variables.)
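
A small sketch of McNemar's test may help show what "correlated samples" means in practice. It uses only the two discordant counts from a paired 2x2 table (people who changed their answer from Time 1 to Time 2), here without a continuity correction. The counts are hypothetical.

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction).

    b = number who changed from "yes" at Time 1 to "no" at Time 2
    c = number who changed from "no" at Time 1 to "yes" at Time 2
    """
    stat = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical paired data for 50 people measured twice:
# 20 said yes both times, 10 said no both times, 5 went yes -> no, 15 went no -> yes.
stat, p_value = mcnemar(b=5, c=15)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Note that the sample size here is 50 people, not 100 observations; the concordant pairs do not enter the statistic at all.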

6. Pooling data across scale categories

This is commonly known as "collapsing" and is frequently done with Likert-type
scales. Instead of distinguishing between those who say "strongly agree" and
those who say "agree", the data for those two scale points are combined into
one over-all "agree" designation. Likewise for "strongly disagree" and "disagree".
This can result in a loss of information, so it should be used as a last resort.
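
Collapsing amounts to a simple recoding, sketched below for a hypothetical four-point scale with invented responses. Comparing the two frequency tables shows exactly what information is given up.

```python
from collections import Counter

# Map each original scale point to its collapsed category.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}

# Hypothetical responses, for illustration only.
responses = [
    "strongly agree", "agree", "agree", "disagree",
    "strongly disagree", "agree", "strongly agree", "disagree",
]

full_counts = Counter(responses)
collapsed_counts = Counter(COLLAPSE[r] for r in responses)

print("original: ", dict(full_counts))
print("collapsed:", dict(collapsed_counts))
```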

7. Pooling "scores" on different variables

There are two different ways that data can be pooled across variables. The first
way is straightforward and easy. Suppose you were interested in the trend of
average (mean) monthly temperatures for a particular year in a particular city.
For some months you have temperatures in degrees Fahrenheit and for other
months you have temperatures in degrees Celsius. (Why that might have
happened is not relevant here.) No problem. You can convert the Celsius
temperatures to Fahrenheit by the formula F = (9/5)C + 32; or you can convert
the Fahrenheit temperatures to Celsius by using the formula C = (5/9) (F - 32).
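
The two formulas translate directly into code. In this sketch the monthly readings and their layout are hypothetical; the point is only that every value ends up on one scale before the trend is examined.

```python
def c_to_f(c):
    """Celsius to Fahrenheit: F = (9/5)C + 32."""
    return 9 / 5 * c + 32


def f_to_c(f):
    """Fahrenheit to Celsius: C = (5/9)(F - 32)."""
    return 5 / 9 * (f - 32)


# Hypothetical monthly means recorded on mixed scales, for illustration only.
readings = [("Jan", 30.0, "F"), ("Feb", 1.0, "C"), ("Mar", 41.0, "F"), ("Apr", 10.0, "C")]

# Put everything on the Fahrenheit scale.
in_fahrenheit = [
    (month, value if scale == "F" else c_to_f(value))
    for month, value, scale in readings
]

for month, value in in_fahrenheit:
    print(f"{month}: {value:.1f} F")
```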

The second way is complicated and not easy. Suppose you were interested in
determining the relationship between mathematical aptitude and mathematical
achievement for the students in your particular secondary school, but some of
the students had taken the Smith Aptitude Test and other students had taken the
Jones Aptitude Test. The problem is to estimate what score on the Smith test is
equivalent to what score on the Jones test. This problem can be at least
approximately solved if there is a normative group of students who have taken
both the Smith test and the Jones test, you have access to such data, and you
have for each test the percentile equivalent to each raw score on each test. For
each student in your school who took Smith you use this "equipercentile method"
to estimate what he(she) "might have gotten" on Jones. Assign to him(her) the
Jones raw score equivalent to the percentile rank that he(she) obtained on
Smith. Got it? Whew!
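
Here is a bare-bones sketch of that equipercentile idea. It assumes a hypothetical normative group of ten students who took both tests, defines percentile rank as the percentage of the normative group scoring at or below a given score, and does no smoothing or interpolation, which an operational equating study would normally add. All names and scores are invented.

```python
from bisect import bisect_right


def percentile_rank(score, norm_scores):
    """Percentage of the normative group scoring at or below `score`."""
    ordered = sorted(norm_scores)
    return 100 * bisect_right(ordered, score) / len(ordered)


def score_at_percentile(pct, norm_scores):
    """Lowest normative score whose percentile rank is at least `pct`."""
    ordered = sorted(norm_scores)
    for s in ordered:
        if percentile_rank(s, ordered) >= pct:
            return s
    return ordered[-1]


def smith_to_jones(smith_score, norm_smith, norm_jones):
    """Jones raw score at the same percentile rank as the Smith raw score."""
    pct = percentile_rank(smith_score, norm_smith)
    return score_at_percentile(pct, norm_jones)


# Hypothetical normative data, for illustration only.
norm_smith = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
norm_jones = [12, 14, 15, 17, 18, 20, 21, 23, 24, 26]

estimated_jones = smith_to_jones(60, norm_smith, norm_jones)
print("Smith 60 is treated as Jones", estimated_jones)
```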

8. Pooling data from the individual level to the group level

This is usually referred to as "data aggregation". Suppose you were interested in
the relationship between secondary school teachers' numbers of years of
experience and the mathematical achievement of their students. You can't use
the individual student as the unit of analysis, because each student doesn't have
a different teacher (except in certain tutoring or home-school situations). But you
can, and should, pool the mathematical achievement scores across students in
their respective classrooms in order to get the correlation between teacher years
of experience and student mathematical achievement.
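
A minimal sketch of that aggregation, with four hypothetical classrooms: student scores are averaged within each classroom, and the correlation is then computed across classrooms, so the sample size is the number of teachers, not the number of students.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation, written out from its definition."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical classrooms: teacher's years of experience and students' scores.
classrooms = {
    "teacher_A": {"years": 2, "scores": [60, 66, 63]},
    "teacher_B": {"years": 5, "scores": [70, 68, 72]},
    "teacher_C": {"years": 10, "scores": [75, 71, 73]},
    "teacher_D": {"years": 15, "scores": [80, 78, 76]},
}

# Aggregate: one (years, class mean) pair per teacher.
years = [c["years"] for c in classrooms.values()]
class_means = [mean(c["scores"]) for c in classrooms.values()]

r = pearson_r(years, class_means)
print(f"n = {len(years)} classrooms, r = {r:.2f}")
```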

9. Pooling cross-sectional data to approximate panel data

Cross-sectional data are relatively easy to obtain. Panel (longitudinal) data are
not. Why? The principal reason is that the latter require that the same people
be measured on each of the occasions of interest, and life is such that people
often refuse to participate on every occasion or they are unable to participate on
every occasion (some even die). And you might not even want to measure the
same people time after time, because they might get bored with the task and just
"parrot back" their responses, thereby artificially inflating the correlations
between time points.

What has been suggested is to take a random sample of the population at Time
1, a different random sample at Time 2, etc., and compare the findings across
time. You lose the usual sensitivity provided by having repeated measurements
on the same people, but you gain some practical advantages.

There is a more complicated approach called a cross-sectional-sequential
design, whereby random samples are taken from two or more cohorts at various
time points. Here is an example (see Table 1, below) taken from an article that
Chris Kovach and I wrote several years ago (Kovach & Knapp, 1989, p. 26). You
get data for five different ages (60, 62, 64, 66, and 68) for a three-year study
(1988, 1990, 1992). Nice, huh?
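
To see how three measurement years can yield five ages, here is one hypothetical layout that is consistent with the numbers just quoted. The three birth cohorts (1928, 1926, 1924) are an assumption made for this sketch; they are not taken from the published table.

```python
# Assumed birth cohorts (hypothetical) and the three measurement years.
birth_years = [1928, 1926, 1924]
waves = [1988, 1990, 1992]

# Age of each cohort at each wave.
age_table = {born: [wave - born for wave in waves] for born in birth_years}

print("cohort", *waves)
for born, ages in age_table.items():
    print(born, *ages)

# Distinct ages covered by the design as a whole.
ages_covered = sorted({age for ages in age_table.values() for age in ages})
print("ages covered:", ages_covered)
```

Under this assumed layout each age from 62 to 66 is observed in more than one cohort, which is what allows age differences to be partially separated from cohort differences.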

10. Pooling findings across similar studies

This very popular approach is technically called "meta-analysis" (the term is due
to Glass, 1976), but it should be called "meta-synthesis" (some people do use
that term), because it involves the combining of results, not the breaking-down of
results. I facetiously refer to it occasionally as "a statistical review of related
literature", because it has come to replace almost all narrative reviews in certain
disciplines. I avoid it like the plague; it's much too hard to cope with the problems
involved. For example, what studies (published only? published and
unpublished?) do you include? How do you determine their "poolability"? What
statistical analysis(es) do you employ in combining the results?
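
As one possible answer to the last of those questions, a common (but by no means the only) choice is a fixed-effect, inverse-variance weighted average of the study estimates. The sketch below assumes each study supplies an effect estimate and its standard error; the three studies are invented, and the approach presumes the studies are estimating the same underlying effect, which is precisely the poolability question.

```python
from math import sqrt


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted mean of study estimates and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three studies.
estimates = [0.30, 0.50, 0.10]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

More precise studies (smaller standard errors) get more weight, so the imprecise second study pulls the pooled estimate toward its value only slightly.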

Summary

So, should you pool or not? Or, putting it somewhat differently, when should you
pool and when should you not? The answer depends upon the following
considerations, in approximately decreasing order of importance:

1. The research question(s). Some things are obvious. For example, if you are
concerned with the question "What is the relationship between height and weight
for adult females?" you wouldn't want to toss in any height and weight data for adult
males. But you might want to pool the data for Black adult females with the data
for White adult females, or the data for older adult females with the data for
younger adult females. It would be best to test the poolability before you do so,
but if your sample is a simple random sample drawn from a well-defined
population of adult females you might not know or care who's Black and who's
White. On the other hand, you might have to pool if you don't have an adequate
number of both Blacks and Whites to warrant a separate analysis for each.

2. Sample size. Reference was made in the previous paragraph to the situation
where there is an inadequate number of observations in each of two (or more)
subgroups, which would usually necessitate pooling (of, one hopes, poolable entities).

3. Convenience, common sense, necessity. In order to carry out an
independent-samples t test when you assume equal
population variances, you must pool. If you want to pool across subgroups, be
careful; you probably don't want to do so, as the height and weight example (see
above) illustrates. When collapsing Likert-type scale categories you might not
have enough raw frequencies (like none?) for each scale point, which would
prompt you to want to pool. For data aggregation you pool data at a lower level
to produce data at a higher level. And for meta-analysis you must pool; that's
what meta-analysis is all about.

A final caution

Just as "acceptance" of a null hypothesis does not mean it is necessarily true,
"acceptance" in a poolability test does not mean that poolability is necessarily
justified.

References

Glass, G. V (1976). Primary, secondary, and meta-analysis of research.
Educational Researcher, 5, 3-8.

Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA:
Thomson.

Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and
their applications to some educational problems. Statistical Research Memoirs,
1, 57-93.

Kovach, C. R., & Knapp, T. R. (1989). Age, cohort, and time-period confounds in
research on aging. Journal of Gerontological Nursing, 15(3), 11-15.
