Statistical Research Methods A Guide for Non Statisticians


Contents

1 Introduction
1.1 Statistical Methods as a Part of the Research Process
1.1.1 Populations and Samples
1.1.2 Parameters and Statistics
1.2 The Statistical Method
1.2.1 Research Question and Hypothesis Generation
1.2.2 Statistical Assumptions
1.2.3 Statistical Method
1.3 Writing in the IMRaD Format
1.4 The R Statistical Software Package
1.4.1 Getting Started
1.4.2 Loading Data
1.4.3 Working with Data

2 One-Sample Proportions
2.1 Introduction: Qualitative Data
2.2 Establishing Hypotheses
2.3 Summarizing Categorical Data (with R Code)
2.4 Assessing Assumptions
2.5 Hypothesis Test for Comparing a Population Proportion to a Hypothesized Value
2.5.1 Behavior of the Sample Proportion
2.5.2 Decision Making
2.5.3 Standard Normal Distribution
2.6 Performing the Test and Decision Making (with R Code)
2.6.1 Test Statistic
2.7 Formal Decision Making
2.7.1 Critical Value Method
2.7.2 p-value Method
2.7.3 Conclusion
2.7.4 Confidence Intervals
2.8 Contingency Methods (with R Code)
2.9 Communicating the Results (IMRaD Write-Up)
2.10 Process
2.11 Exercises

3 Two-Sample Proportions
3.1 Summarizing Categorical Data with Contingency Tables (with R Code)
3.2 Hypothesis Test for Comparing Two Population Proportions
3.2.1 Generating Hypotheses About Two Proportions
3.2.2 Statistical Assumptions
3.3 Performing the Test and Decision Making (with R Code)
3.3.1 Critical Value Method
3.3.2 p-Value Method
3.3.3 Confidence Intervals
3.3.4 Chi-Square Test
3.4 Contingency Methods (with R Code)
3.5 Odds Ratio (with R Code)
3.6 Communicating the Results (IMRaD Write-Up)
3.7 Process
3.8 Exercises

4 Multi-category Data
4.1 Introduction: Types of Multi-categorical Data
4.2 Summarizing Categorical Data (with R Code)
4.3 Establishing Hypotheses: Difference Between Comparisons and Association
4.4 Assessing Assumptions (with R Code)
4.5 Performing the Test and Decision Making (with R Code)
4.5.1 Critical Value Method
4.5.2 p-Value Method
4.5.3 Interpretation of Results
4.6 Contingency Methods (with R Code)
4.7 Communicating the Results (IMRaD Write-Up)
4.8 Process
4.9 Exercises

5 Summarizing Continuous Data
5.1 Representative Values (with R Code)
5.1.1 Mean
5.1.2 Median
5.1.3 Other Measures
5.2 Measures of Variability (with R Code)
5.2.1 Standard Deviation
5.2.2 Range Measures
5.2.3 Empirical Rule
5.3 Assessing Normality (with R Code)
5.3.1 Histogram
5.3.2 Box Plot
5.3.3 QQ Plot
5.3.4 Outliers
5.4 Rounding and Reporting Conventions
5.4.1 Rounding
5.4.2 Reporting Based on Distribution
5.4.3 Standard Error
5.5 Exercises

6 One-Sample Means
6.1 Behavior of the Sample Mean
6.2 Establishing Hypotheses
6.3 Assessing Assumptions (with R Code)
6.4 Summarizing Data (with R Code)
6.5 Performing the Test and Decision Making (with R Code)
6.5.1 Critical Value Method
6.5.2 p-Value Method
6.5.3 Confidence Intervals
6.6 Contingency Methods (with R Code)
6.7 Communicating the Results
6.8 Process
6.9 Exercises

7 Two-Sample Means
7.1 Introduction: Independent Groups or Paired Measurements
7.2 Independent Groups
7.2.1 Establishing Hypotheses: Independent Groups
7.2.2 Assessing Assumptions (with R Code)
7.2.3 Summarizing Data (with R Code)
7.2.4 Performing the Test and Decision Making (with R Code)
7.2.5 Contingency Methods (with R Code)
7.2.6 Communicating the Results
7.2.7 Process for Two-Sample t-Test
7.3 Paired Measurements
7.3.1 Establishing Hypotheses: Independent Groups
7.3.2 Assessing Assumptions (with R Code)
7.3.3 Summarizing Data (with R Code)
7.3.4 Performing the Test and Decision Making (with R Code)
7.3.5 Communicating the Results
7.3.6 Process for Paired t-Test
7.3.7 Exercises

8 Analysis of Variance
8.1 Establishing Hypotheses
8.2 Assessing Assumptions (with R Code)
8.3 Summarizing Data (with R Code)
8.4 Performing the Test and Decision Making (with R Code)
8.4.1 Post-hoc Multiple Comparisons (with R Code)
8.5 Contingency Methods (with R Code)
8.6 Communicating the Results
8.7 Process
8.8 Exercises

9 Power
9.1 Making Mistakes with Statistical Tests
9.2 Determinants of Sample Size
9.3 Categorical Outcomes
9.3.1 One-Sample Case
9.3.2 Two-Sample Case (with R Code)
9.4 Continuous Outcomes
9.4.1 One-Sample Case (with R Code)
9.4.2 Two-Sample Case (with R Code)
9.4.3 Multi-sample Case (with R Code)
9.5 Post-hoc Power Analysis
9.6 Exercises

10 Association and Regression
10.1 Introduction
10.1.1 Association Between Measurements
10.1.2 Scatter Plots (with R Code)
10.2 Correlation Coefficients
10.2.1 Establishing Hypotheses
10.2.2 Assessing Assumptions (with R Code)
10.2.3 Summarizing Data
10.2.4 Estimating Correlation, Performing the Test, and Decision Making (with R Code)
10.2.5 Contingency Methods (with R Code)
10.2.6 Communicating the Results (with IMRaD Write-Up)
10.2.7 Process for Estimating Correlation
10.3 Simple Linear Regression
10.3.1 Establishing Hypotheses
10.3.2 Assessing Assumptions (with R Code)
10.3.3 Summarizing Data
10.3.4 Estimating the Regression, Performing the Test, and Decision Making (with R Code)
10.3.5 Establishing the Worth of the Regression
10.3.6 Communicating the Results (with IMRaD Write-Up)
10.3.7 Process for Simple Linear Regression
10.4 Exercises

Bibliography

Chapter 1

Introduction

1.1 Statistical Methods as a Part of the Research Process

1.1.1 Populations and Samples
The impetus for conducting research which utilizes statistical analyses is
the desire to better understand some population of interest. A population is
defined as the totality of any group of subjects sharing some characteristic(s).
The characteristics that such groups of subjects share can be generally defined
(e.g. nationality, ethnicity, gender) or specifically defined (e.g. low-income
patients under 40 years of age with type II diabetes). Researchers typically
study such populations because some feature of that group is unknown or
under question (e.g. what is the success rate of patients surviving a particular
treatment for a particular disease).
Though the research focus is on the population level, the use of an entire
population is impractical for many reasons, each of which can be summarized
in one word: resources. The resources needed to measure or examine
the members of a population include money for research materials (drugs,
laboratory space, recruitment, etc.) and the time needed to conduct the
study. If a population is too large, then a great deal of money is needed
to examine every subject within that population. Likewise, if members of
a population are spread over a large area (e.g. the contiguous U.S.), the
money and time required to reach them all will again be great. Importantly,
the resources available to conduct research are usually constrained by factors
external to the research. For instance, federal or industrial agencies
sponsoring such research only have so much funding to offer, so population-level
studies are usually out of the question. In other cases, such as in drug
development, it would be unwise to test new treatments on large populations of
subjects, especially when the risks of such treatments are severe or unknown.

R. Sabo and E. Boone, Statistical Research Methods: A Guide for
Non-Statisticians, DOI 10.1007/978-1-4614-8708-1_1,
© Springer Science+Business Media New York 2013

Due to constrained resources, studies focus on subgroups of populations,
which we refer to as samples. Samples are – by definition – smaller than the
populations from which they are drawn and are thus more manageable, both
from a resource-expenditure point of view as well as a conduct-of-research
point of view. In sampling from a population we hope to capture the
characteristics of the entire population in the smaller sample. For instance, if
57% of all undergraduate college students throughout the U.S. are female
(and 43% male), then we would hope that a smaller sample drawn from this
population would maintain a similar gender breakdown.
But therein lies one of the underlying facets of statistical theory and its
applications: how do we know that a sample resembles the population from
which it is drawn? The short answer is that we rarely know how closely
a sample represents its parent population, or – in other words – how “good”
the sample is. But we can take measures to help ensure that our samples
are of the highest possible quality, none more important than our sampling
technique. In order to obtain a sample, it must be taken from the population,
meaning that certain subjects – but not all – from the population must be
identified as also belonging to the sample. How these subjects are identified
is essential to sound statistical practice. If we “draw” certain subjects from
a population as opposed to others simply because it is easy for us to do so
(e.g. we take those closest to us; we take those willing to participate without
compensation; etc.), then we will have drawn what’s called a biased sample
(these particular examples are also called convenience samples), meaning that
the reason we selected certain subjects has caused our sample to somehow
not reflect the parent population. Except in certain cases (clinical trials for
example), we try to avoid these convenience samples.
Collecting a simple random sample is the surest way we have of capturing
the important characteristics from a population. The term “simple random
sample” describes the sampling process more than the resulting group: it
means that the process used to identify subjects ensured that every subject
(or most subjects) in a population had an equal chance of being selected into
the sample. The phrase “equal chance” implies that we are probabilistically
selecting subjects into the sample, and there are many ways of doing this
(e.g. flipping a coin,
picking numbers randomly from a phone book) that we won’t get into. If we
know or if we can reasonably assume that this type of process was followed,
and provided the sample is itself not too small, then we can typically expect
our sample to be a microcosm of the population. If that is the case, then an
analysis of our sample should mimic an analysis of our population, and the
results we would get from both cases should be similar.
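
The idea can be sketched in R under some loudly stated assumptions: the population below is hypothetical (100,000 students, 57% of them female, as in the earlier example), the sample size of 500 is arbitrary, and all object names are ours. The base R function sample() draws without replacement and with equal probability by default, which is one way of drawing a simple random sample. For contrast, the sketch also takes a convenience sample by grabbing the first 500 subjects from a list that happens to be ordered by gender.

```r
# Hypothetical population: 100,000 students, 57% female (illustrative only)
set.seed(2013)  # fix the random seed so the sketch is reproducible
population <- rep(c("female", "male"), times = c(57000, 43000))

# Simple random sample: sample() without replacement gives every subject
# the same chance of being selected
srs <- sample(population, size = 500)

# Convenience sample: take whoever comes first in the list, which here
# happens to be ordered by gender
convenience <- population[1:500]

pop_prop_female  <- mean(population == "female")
srs_prop_female  <- mean(srs == "female")
conv_prop_female <- mean(convenience == "female")

print(c(population = pop_prop_female,
        random_sample = srs_prop_female,
        convenience_sample = conv_prop_female))
```

The random sample should land near the population proportion, while the convenience sample contains only females, because the reason those subjects were selected is tied to the characteristic being measured.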

1.1.2 Parameters and Statistics


The populations in which we are interested often consist of many subjects
(consider the number of citizens in the United States, or the number of
diabetes sufferers worldwide), each having many individual characteristics.
Naturally, a comprehensive understanding of all facets of the entire
population is typically unattainable – aside from the fact that the population
itself is usually unattainable. Thus, we focus on parameters that adequately
summarize certain mathematical characteristics of the population. These
parameters often take the form of proportions or means, and in most cases
reflect the characteristic we would expect to observe in a typical subject from
that population.
However, since we cannot collect populations, we must focus on the
properties of the samples to which we have access. Any property of a sample that
we measure (such as proportion or mean) is called a statistic, and is thus
distinguished from its population counterpart, the parameter. As we will see
in subsequent chapters, we can use a few statistics to summarize our entire
sample (this is especially helpful if samples are large), and we can also use
them to test hypotheses about the population in question. For instance, if
we want to know something about a population parameter (say the success
rate for a certain type of experimental cancer treatment), then we can use
the sample statistic (say the success rate for 20 patients who underwent that
treatment) to provide information on that population parameter. The most
popular statistical method that turns a sample statistic into inference on
a population parameter is called hypothesis testing, which will be the main
focus of our foray into biostatistical methodology.
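
As a minimal illustration of a statistic, the R sketch below computes the sample success rate for 20 patients. The outcomes are made up for this example and come from no actual study, and the object names are ours. The population success rate (the parameter) remains unknown; the sample proportion is only our information about it.

```r
# Made-up outcomes for 20 treated patients (1 = success, 0 = failure)
outcome <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1,
             1, 0, 1, 1, 1, 0, 1, 1, 0, 1)

n         <- length(outcome)  # sample size
successes <- sum(outcome)     # number of successes observed in the sample
p_hat     <- mean(outcome)    # sample proportion: a statistic

print(c(n = n, successes = successes, sample_proportion = p_hat))
```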

1.2 The Statistical Method


1.2.1 Research Question and Hypothesis Generation
A hypothesis test (note it’s “a hypothesis”, not “an hypothesis”; you’ve been
warned) is the process of using sample data to provide evidence toward some
statement about a population parameter. Such a statement originally occurs
in the form of a research question, where we boldly and unequivocally state
what we feel or think about some parameter. As statisticians and
biostatisticians (or the hopeful users of statistics), our first job is to
translate this research question into a parametric or symbolic form that lends
itself to being measured. For example, stating that you want to make cancer
victims better doesn’t make good science, but saying you want to increase the
median survival time of cancer victims by 10 months through a particular
treatment works well.
Once we have determined what population parameter we are interested in,
we need to then turn the research question into a set of testable hypotheses.
These competing hypotheses must be such that only one can be true at a
time. Given that we have defined such hypotheses, we can then use our
sample data to provide evidence for or against those hypotheses; naturally,
the hypothesis that the evidence more closely supports becomes the “winner”.
It is this process of using sample data to support a set of hypotheses about
a population parameter that we are referring to in hypothesis testing.

1.2.2 Statistical Assumptions


In order to conduct a hypothesis test, several characteristics of our sample
must be in order for us to place any stock in the worth of such a test.
The characteristics we require of a sample are: that it is representative, that
the subjects within that sample are independently measured, and that our
sample is large enough for the planned statistical method to work correctly.
Representative Samples: A sample is representative of the population from
which it is drawn if the sample is somehow a microcosm of that population,
in that it maintains the important characteristics (e.g. gender or race
proportions, disease susceptibility) of the population even though it only contains
a fraction – often a small fraction – of its members. This is an important
characteristic, the utility of which is easily observed through the unfortunate
instance of an unrepresentative sample – if a sample is not representative of
the population from which it was drawn, then what good is it? The idea
is that if a sample is representative of a population, the numeric or
mathematical characteristics of that population will be present in the sample. This
attribute will ensure that statistical analysis of the sample would yield similar
results to a (hypothetical) statistical analysis of the population.
Independent Measurements: The concepts of dependence and independence
are somewhat difficult to explain without some basic foundation in
statistical language, so we will save some of this discussion for later.
However, it should suffice to say that we would not want a sample where the
measurements or values we observe for some subjects are influenced by – or
depend upon – the measurements or values for other subjects. This may at
first seem like a weird phenomenon – in simple random samples this rarely
happens – but examples are easy to imagine. For instance, if we are
conducting a study where we are measuring the presence or absence of a certain
gene, and we unknowingly sampled measurements from members of the same
family, then the outcomes for those subjects within the same family will be
related due to genetic inheritance. This is bad because measurements that
are related – or dependent – make the sample measurements seem closer
together than they actually may be in the grand population (the spread of
measurements is called variability and will be discussed later). Regardless,
we would like our sample measurements or values to be independent of one
another, and if we responsibly sample from the parent population (i.e. create
a simple random sample), then we can usually assume that this is the case.
Adequate Sample Size: Ask any statistician their biggest pet peeve, and
one of the most popular responses will be analyzing samples that are too
small. This happens for many reasons, such as small or esoteric populations,
limited resources, etc., but it happens most often due to poor planning. The
reason why this is a problem is that small samples can in no way represent the
population from which they were drawn, and thus any statistical methodology
dependent upon the sample’s representativeness will break down (i.e. not
work). Thus, we need our samples large enough to adequately reflect the
populations from which they are drawn (if you ask a statistician, no sample
is large enough), yet manageable enough to be cost effective. For many of
the procedures we will discuss throughout this text we will have rules for
determining how large a sample we need. We will also focus – in Chapter 9 –
on performing a sample size or power analysis, which helps us determine the
sample size we need to collect before we conduct the study.
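
Two of the assumptions above can be illustrated with a small simulation in R. Everything in this sketch is an assumption made for illustration: the true proportion of gene carriers (30%), the number of simulated studies, and the extreme simplification that all five members of a family share the same gene status. The first comparison shows that 50 measurements taken from 10 families give a sample proportion that varies more from study to study than 50 independent measurements, so treating them as independent understates the uncertainty. The second shows how the spread of the sample proportion shrinks as the sample size grows.

```r
set.seed(42)   # fix the random seed so the sketch is reproducible
p    <- 0.30   # assumed true proportion carrying the gene (hypothetical)
reps <- 2000   # number of simulated studies

# Independent sampling: 50 unrelated subjects per study
indep <- replicate(reps, mean(rbinom(50, size = 1, prob = p)))

# Dependent sampling: 10 families of 5 subjects; as an extreme simplification,
# every member of a family shares the same gene status
clustered <- replicate(reps, mean(rep(rbinom(10, size = 1, prob = p), each = 5)))

# Spread of the sample proportion across the simulated studies
sd_indep     <- sd(indep)
sd_clustered <- sd(clustered)

# Sample size: spread of the sample proportion for increasingly large samples
sizes <- c(10, 100, 1000)
sd_by_size <- sapply(sizes, function(n) {
  sd(replicate(reps, mean(rbinom(n, size = 1, prob = p))))
})
names(sd_by_size) <- sizes

print(c(independent = sd_indep, clustered = sd_clustered))
print(sd_by_size)
```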

1.2.3 Statistical Method


We will follow a formal method for conducting statistical analyses that
consists of several parts: statement of the research question, determining what
method to use, assessing our statistical assumptions, summarizing the data,
performing the test, and interpreting the results. These parts are designed
for several reasons: so that we can be sure we are taking the correct steps
for the analysis, so that we can easily communicate our methods and results,
and so that our methods can be easily reproduced.
Statement of Research Question: Before we know what statistical
procedure we’re going to use (the statistical method provides the answer we’re
looking for), we have to know what question we are asking. We do this by
taking our research question – which must be explicitly stated – and turning
it into a set of testable hypotheses. We will spend a lot of time doing this
throughout the text. At the end of the day, you cannot provide an answer if
you don’t know what the question is.
Determination of Statistical Method: Once we know our question, we can
figure out how best to answer it. The remaining chapters in this text are
arranged to provide different types of methods we can use to answer different
types of questions we could potentially face. We determine what statistical
method to use by looking at how measurements were observed, and the types
of measurements we can come across vary considerably.
Assess Statistical Assumptions: Once we’ve identified the type of
measurement we have, and what kind of statistical method we would like to use
to analyze those measurements, we need to determine whether or not it is
appropriate to use that method. In general, this is done by assessing whether
our sample is representative, whether our measurements are independent,
and whether we have a large enough sample, though on occasion there will
be other considerations.
Summarize the Data: Provided our assumptions are met, we can then
summarize the sample data with statistics that represent all of the important
details of that sample. We will focus a great deal on how to appropriately
summarize a sample, given the type of measurements we have and what
assumptions are met.
Perform the Test: Once our data are appropriately summarized, we can
then perform the statistical hypothesis test or use the desired statistical
technique. Again, we will learn various methods throughout the semester, with
each chapter presenting a new class of methods.
State the Result: Once we have conducted the statistical analysis, we will
need to make sense of our results. As mentioned earlier, we do this by stating
which hypothesis the evidence supports. Recall that though we are trying to
learn some characteristic about some population, that characteristic exists
or is true; we simply don’t know what it is. So when we state our result, we
can make one of two decisions: the evidence supports the first hypothesis, or
the evidence supports the second hypothesis. Since the conditions stated in
one of the two hypotheses we’ve created must be true, we can make two types
of mistakes called Type I and Type II errors. We will focus on errors of the
first type throughout this text, and we will cover errors of the second type in
Chapter 9. In practice, if we have set the table correctly by following sound
scientific methods in our data collection and sampling methodology, these
types of errors are of little concern and we can put a great deal of faith in
our statistical conclusions. Of course, the key aspect of any statistical result
lies in translating it into a meaningful statement that can be understood by
curious and critical readers.
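
The parts described above can be sketched end to end in R for one simple case. This is an illustration under stated assumptions, not the procedure developed in later chapters: the data (14 successes among 20 patients), the hypothesized success rate of 50%, the 5% threshold named alpha, and the choice of the exact binomial test via the base R function binom.test() are all ours.

```r
# Research question (hypothetical): does the treatment success rate differ
# from an assumed historical rate of 50%?
p0    <- 0.50  # hypothesized value of the population proportion (assumption)
alpha <- 0.05  # accepted chance of a Type I error (a conventional choice)

# Method: one yes/no measurement per subject in a single sample, so we use a
# one-sample test of a proportion (here the exact binomial test)

# Assumptions: representativeness and independence come from how the data
# were collected and cannot be verified by code; we only record the sample size
successes <- 14  # made-up data
n         <- 20

# Summarize the data
p_hat <- successes / n

# Perform the test
result <- binom.test(successes, n, p = p0, alternative = "two.sided")

# State the result
decision <- if (result$p.value < alpha) {
  "the evidence supports a success rate different from the hypothesized value"
} else {
  "the evidence is not strong enough to rule out the hypothesized value"
}

print(c(sample_proportion = p_hat, p_value = result$p.value))
print(decision)
```

With these made-up numbers the p-value exceeds the threshold, so the sketch reports that the hypothesized value cannot be ruled out; a different sample or a different hypothesized value could lead to the other decision.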

1.3 Writing in the IMRaD Format


While we will spend a great deal of time performing statistical analyses, we
must also learn how to communicate these results to the scientific community.
We will spend a lot of time focusing on the “write-up” of our methods and
results. This is not because we don’t like you, or because we take peculiar
pleasure in torturing students, but rather because the results from statistical
analysis – regardless of how fancy or sophisticated – are useless unless they
can be understood by those not involved in the study. This is not only true of
statistical methods and results, but of science in general. If a scientific or
statistical method is unclear, then readers of your research will not understand
what you have done and will ultimately reject your work via the following,
well-established assumption: “if I can’t understand what the author is saying,
then it must not be any good, for I am smarter than the author”.
A standard write-up that describes the key points of research that any
sophisticated reader would need to know to make an informed decision –
are the results from this research reliable enough for me to use or believe? –
would then be a necessity for translating and communicating our results. The
IMRaD style goes a long way toward providing such a standard format, and
has been accepted by virtually all credible scientific research journals (in a
strange irony, methodological research in statistics generally does not adhere
to IMRaD, though all of the pieces are still there).
The IMRaD format consists of four main parts: the Introduction, the
Methods section, reporting of Results, and a Discussion. Each part serves
its own purpose – briefly described below – and contains specific information
that matches up perfectly with the process we will follow for conducting
statistical analyses. Sophisticated readers become accustomed to this format
and its placement of material, so much so that they often skip to the parts they
are interested in to glean information quickly. We will spend a great deal
of effort understanding this method and its pieces, as well as practicing how
they apply to specific statistical methods.
Introduction: Here we provide details on the scientific problem in which we
are interested, and then describe the populations of interest. The treatment
or intervention specific to the current study is introduced, and the scientific
research question is stated unequivocally (i.e. in the form from which
you will create your testable hypotheses).
Methods: In an actual publication, this is the section where you would
describe the setting of your sample, including such details as where and when
subjects were observed. A thorough description of what was measured and
the process under which those measurements were taken would then be
provided. Any technological processes specific to the particular science and
measurements in question would be described here. Generally, a description of
the statistical methods used to analyze the sample measurements in light of
the research questions and hypotheses would be placed in the last sub-section
of the Methods Section (and often in small font to indicate its importance).
Here you will state how you summarized your data, how you analyzed the
data, and how you will make your decisions based on that analysis. You
must also specify any details that aid in that process, such as the statistical
software used for analysis.
Results: The details of the statistical analyses are presented here,
starting with a summary of the sample data (including any tabular or graphical
representations), continuing with the results from the analysis of the primary
research question, and ending with any secondary or sub-analyses not
specified in the primary research question. An unequivocal answer to the primary
research question must be provided in this section.
Discussion: A brief summary of the results is provided at the beginning of
this section, where the results are described in words (no statistics). The
scientific or clinical implications of these results are then expounded upon,
specifically with regard to how these results compare to those from previous
research studies. Any study limitations – there will always be something –
must be identified and described in this section, as should a justification as
to how they do or do not affect your results. This section often ends with a
prognostication of what these results mean for future research or what steps
need to be taken to continue this research.

1.4 The R Statistical Software Package


While some common statistical procedures are simple enough to compute by
hand, many are computationally intensive enough that we would not wish
to do so. Further, modern data sets can be so large that calculation by
hand is typically prohibitive. Fortunately, there are many statistical software
packages available to perform these computations; some popular packages
are SPSS, SAS, Minitab, Stata, JMP, etc. In this text we will focus on
using the statistical package R, which is an open-source (read: free) software
program that is continually updated with new and improved packages by
its users. R can be downloaded at no cost from: cran.r-project.org. Once
on that page simply select your operating system (Windows, MacOS X or
Linux) and download the base package.

Figure 1.1: The initial screen in R showing the R Console.

Once R is installed, you will have an extremely powerful piece of statistical
software at your fingertips. The main drawback to using R is that it is
a language, which means you will need to program the analyses yourself
