Clinical Research 5: Quantitative Data Collection and Analysis
a Epworth Hospital/Deakin University Clinical Nursing Research Centre, The Epworth Hospital,
Melbourne, Australia
b LaTrobe University/The Alfred Hospital Clinical School of Nursing, The Alfred Hospital, Melbourne,
Australia
KEYWORDS
Descriptive statistics; Inferential statistics; Data analysis; Data management; Parametric statistics; Non-parametric statistics; Measurement

Summary This six-part research series is aimed at clinicians who wish to develop research skills, or who have a particular clinical problem that they think could be addressed through research. The series aims to provide insight into the decisions that researchers make in the course of their work, and to also provide a foundation for decisions that nurses may make in applying the findings of a study to practice in their own Unit or Department. The series emphasises the practical issues encountered when undertaking research in critical care settings; readers are encouraged to source research methodology textbooks for more detailed guidance on specific aspects of the research process.

A couple of points:
1. It is artificial to describe research as qualitative or
quantitative. Studies often include both dimensions.
However, for the purposes of this paper/series, this
distinction is drawn for clarity of writing.
2. It is common practice for quantitative studies to refer
to study ‘subjects’ and qualitative studies to refer to
study ‘participants’. For ease of reading, the latter
term will be used throughout this series.
© 2005 Elsevier Ltd. All rights reserved.
Introduction
0964-3397/$ — see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.iccn.2005.02.005
188 M. Botti, R. Endacott
outcome is a function of the variable itself (true differences) and not a function of differences in the way the variable is measured.

Selection or development of a measurement instrument is guided by two principles: ensuring that the instrument measures what it purports to measure (validity), and that the instrument does not allow systematic or unsystematic bias in the data collected (reliability).

An important outcome of the operationalisation of variables and the choice of measurement instrument is the type of numerical data that is collected. The type of data derived can be classified into levels of quality that have implications for the type of analyses that can be applied.

Levels of data

There are four levels at which variables can be measured:

1. Nominal (or categorical): At this level, the data collected represent categories of a particular variable. For example, sex of participants is a nominal variable because the data collected, whether participants are male or female, represent a category of the variable of interest. These categories are assigned a number for the purpose of analysis, for example males may be categorised as "1" and females "2". These numbers have no numerical significance.
2. Ordinal: At this level, data are ranked or ordered from lowest to highest score. Ranked data provide more information than nominal level data but, with ordinal level data, it is not possible to know what the differences are between the first and second scores and so on. We only know that a higher ranking means that there is more of a particular characteristic than a lower ranking. For example, patients' sedation scores can be considered ordinal level data. Level of sedation is ranked from fully alert to heavily sedated, but a score of two does not necessarily mean that a patient has twice the level of sedation of a patient with a score of one.
3. Interval: At this level, data occur on a scale where the differences between the intervals of that scale are assumed to be equal. For example, when pain intensity is measured on a numerical scale of 0 to 10, the difference between a pain rating of 4 and 5 is assumed to be the same as the difference between a rating of 9 and 10.
4. Ratio: This level of data measurement is an extension of interval level data where not only are there equal intervals between different points of a scale but the ratios of the values are also meaningful. This means that on a ratio scale, a score of 10, for example, represents a value that is twice the value of 5.

In general, data collected at the nominal and ordinal level are considered discrete variables (Polgar and Thomas, 1995) because measurement classifies data into discrete (not overlapping) categories (male or female; high, moderate or low sedation). Data measured at interval and ratio level are considered continuous variables because the data represent an underlying continuum where there are potentially an infinite number of values. Sometimes we can choose the level at which data are collected. Participants' age, for example, can be collected as a continuous variable by simply asking for participants' age, or as a discrete variable by asking participants to nominate which age category they belong to (for example, 65 to 75 years).

Quantitative analysis

Once data are collected, these "raw data" need to be organised and interpreted. We can see from the discussion above that in quantitative research, all data are given a numerical value, even nominal data where the number assigned has no numerical meaning. The interpretation of data in quantitative research requires the use of statistics. Statistics are a way of organising and making sense of data obtained by measurement (Polgar and Thomas, 1995). Descriptive statistics include the principles for organising and summarising raw data. Inferential statistics include the principles for deciding whether the data collected show the expected differences and patterns. The central consideration when selecting appropriate descriptive and inferential statistics is whether variables are discrete or continuous. The level of data available for analysis determines whether the statistical tests to be applied are selected from the parametric or non-parametric group of tests, a fundamental distinction between statistical tests.

Parametric statistics

Parametric statistics use the arithmetic mean (average), so data must be measured at the interval or ratio level. In addition, the use of these tests is based on a key assumption about the way the data are distributed. This assumption is that the data have come from a population that
has a normal distribution (Story, 2004). Many variables measured in biological and behavioural sciences have an approximately normal distribution (Polgar and Thomas, 1995). Fig. 1 shows an example of normally distributed data (baseline systolic BP readings from 101 patients). The mean (average) BP is 136.4 mmHg and the standard deviation (a measure of the typical deviation of scores from the mean) is 20.8 mmHg. A normal distribution has the following properties:

• It represents the theoretical distribution of population scores.
• It is a bell-shaped curve.
• It has well-defined properties that are based on the mean and variation from the mean (standard deviation).
• It is symmetrical around the mean, so that equal numbers of cases fall on either side of the mean. Most of the cases fall close to the mean; very few cases fall at the extreme arms of the curve.
• The area under the distribution curve represents a probability about the distribution of scores:
  ◦ 68.3% of values are within one standard deviation (S.D.) of the mean;
  ◦ 95.4% of values are within two S.D. of the mean; and
  ◦ 99.7% of values are within three S.D. of the mean (Sim and Wright, 2000).

Using the example in Fig. 1, it is possible to summarise the distribution of scores in our sample using the standard deviation and our knowledge of the characteristics of the normal curve. For the sample in Fig. 1, approximately 69 patients had a BP between 116 and 157 mmHg, 97 patients had a BP between 95 and 176 mmHg, and all patients had a BP between 76 and 198 mmHg.

Parametric statistical tests are preferred for two reasons:

1. There is a greater variety of parametric tests available.
2. Parametric tests generally are more powerful, i.e. they are more likely to detect differences between group means if differences exist.

Non-parametric statistics

Non-parametric statistics do not rely on the assumptions about the distribution of data discussed above, so they are often referred to as assumption-free tests. These tests do not use the raw data; instead, the data are ranked from the lowest to the highest score and the analysis is carried out on the ranked data.
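
The ranking step described above can be illustrated with a minimal Python sketch. The readings are invented for illustration and the helper name `average_ranks` is hypothetical; the sketch assumes the common convention that tied scores share the average of the ranks they occupy, which the text itself does not specify.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) to highest; tied scores share the mean rank."""
    # Indices of the scores, ordered from the lowest to the highest score.
    ordered = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(ordered):
        # Extend 'end' over the run of scores tied with the one at 'pos'.
        end = pos
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[pos]]:
            end += 1
        # Mean of the 1-based ranks occupied by the tied run.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[ordered[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical systolic BP readings (mmHg); the last reading is an extreme score.
readings = [118, 124, 124, 131, 210]
print(average_ranks(readings))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

In this sketch the reading of 210 mmHg receives rank 5, exactly as a reading of 132 mmHg would: once the data are ranked, only the order of the scores is retained.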
Table 1 Descriptive statistics: summarising and presenting nominal and ordinal data.
Nominal and ordinal data

Summary statistics: Description

Counts and percentages: Data are summarised by counting the number of participants within each category and the corresponding percentage of the total number of participants. For example, if there were 30 participants in a study and 12 were male and 18 were female then both the number and percentage are reported [Males = 12 (40%)].

Data presentation: When the number of categories is greater than two, a visual display of the data in charts can help the understanding and interpretation of data.

Pie charts: Circles divided into segments. The total area of the circle represents all cases, and the size of each segment is proportional to the frequency of cases in that category.

Bar charts: Comprise a series of bars of equal width, separated by gaps. Each category of the variable is represented by one bar. The height of the bar is proportional to the frequency of cases in each category.

Tables: Summary tables are useful ways of displaying the frequency of responses to individual variables and their categories. The columns in the tables show the frequency of responses and the rows represent the individual variables.
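
The counts-and-percentages summary in Table 1 can be reproduced with a short Python sketch. The function name `summarise_categories` is hypothetical, and rounding percentages to one decimal place is an assumption made for the example; the 12 male and 18 female participants are the figures used in the table.

```python
from collections import Counter


def summarise_categories(values):
    """Return {category: (count, percentage of total)} for nominal or ordinal data."""
    counts = Counter(values)
    total = len(values)
    # Keep the count next to the percentage so the two are always reported together.
    return {category: (n, round(100 * n / total, 1)) for category, n in counts.items()}


# The example from Table 1: 30 participants, 12 male and 18 female.
sex = ["male"] * 12 + ["female"] * 18
for category, (n, pct) in summarise_categories(sex).items():
    print(f"{category} = {n} ({pct}%)")
```

Returning the count and the percentage as a pair is a deliberate choice here: the same percentage can arise from very different numbers of participants, so the sketch never reports one without the other.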
By ranking data, important information about the magnitude of the data is lost. That is why non-parametric analyses are considered less powerful (less likely to find a significant effect if one exists). The reduced power increases the likelihood of Type 2 error. Refer to the third article in this series for a definition of Type 1 and Type 2 error (Endacott and Botti, 2005).

Descriptive statistics

Descriptive statistics have two main purposes (Sim and Wright, 2000). The first is to organise, summarise and present numerical data. In some descriptive studies this may be the full extent of the analysis of the data. The second purpose of descriptive statistics is to describe the structure of the data collected (the distribution) in order to inform more complex inferential analyses of data, in particular, whether or not to use parametric statistics.

Table 1 presents ways of summarising and presenting data measured at nominal or ordinal levels. An important consideration when summarising and displaying data is to ensure that the data are not distorted and that they are represented in a useful way. This is especially important in small samples where, for example, reporting percentages without the corresponding number can misrepresent the importance of a finding (25% can refer to 1 out of 4 participants or to 30 out of 120 participants).

The distribution of variables that are measured at an interval or ratio level can take many forms. The first analysis of this level of data is to determine if the distribution is normal. This is done by graphing the data onto a frequency histogram (Fig. 1) to look for the characteristics of a normal curve. If the distribution is normal, then the best summary statistics are the mean and standard deviation. If the distribution is not normal then non-parametric statistics are used. The measures of central tendency (middle scores) in this situation are the median and the mode. Refer to Table 2 for a definition of these terms. When the distribution is normal, the mean, the median and the mode should be approximately the same. When the distribution is not normal, for example when it is skewed, these statistics can be quite different.

Inferential statistics

When we collect data from a sample in order to understand a particular phenomenon, we are seeking to draw inferences about a population's characteristics from a sample that is representative of that population (refer to the third article in this series for a discussion of representative sampling, Endacott and Botti, 2005). The mean derived from the sample may or may not reflect the "true" mean, i.e. the population mean. We use inferential statistics to determine the probability that the obtained results were due to chance, and therefore whether they are significant at a given level of probability (Polgar and Thomas, 1995). Inferential statistics can be used to test the probability that a sample mean is representative of the target population (discussed in detail in Endacott and Botti, 2005) or to test experimental hypotheses. In the latter, experimental
designs may be used to test hypotheses about predicted outcomes. Two or more groups of participants are exposed to different conditions and data are collected and analysed. If there are differences between the mean scores of the groups, there are two possible explanations:

1. the experimental manipulation caused a change in the phenomenon of interest and the samples come from different populations, or
2. the samples come from the same population and the differences in the means occurred by chance.

By applying inferential statistical analyses, it is possible to calculate the probability of observing differences this large if the samples came from the same population, and so to judge whether the experimental manipulation did have an effect. The convention in clinical research is to accept a probability of .05 or .01 that the effect observed occurred by chance. In other words, there is a 5% or 1% probability that the observed differences occurred by chance alone and not as a result of the experimental condition. The statistic that provides this information is the p-value. If the p-value is less than .05 or .01 then the observed difference between groups is considered to be statistically significant and the experimental hypothesis is accepted. If the p-value is greater than .05 or .01, then the experimental hypothesis is not supported. The significance level (.05 or .01) is chosen in the design stage.

There are many statistical tests available. Selection of the appropriate statistical test to determine the p-value is based on the following considerations:

1. The level of data.
2. The number of groups in the investigation (one or more).
3. Whether the data were collected from independent groups (two or more separate groups of participants) or from related groups (for example, repeated measures in a pre and post test design).
4. The characteristics of the data, i.e. the distribution of the data.

Essentially, once the criteria for decision making are known, selection of statistical tests is relatively straightforward.

Conclusions

This paper addressed the key principles related to the collection and analysis of data in quantitative research. The levels at which variables are measured determine the methods of analysis. This concept is central to almost all decisions regarding statistical analysis. Consultation with a statistician during the design phase of research will help ensure that both the design of the study and the way
data are collected enable the use of more powerful statistical analyses. Careful and explicit design of research investigations is required to meet the objectives of replicability, objectivity and measurement underpinning quantitative research.

References

Endacott R, Botti M. Clinical research 3: sample selection. Intensive Crit Care Nurs 2005;21:51-5.
Polgar S, Thomas SA. Introduction to research in the health sciences. 3rd ed. Melbourne: Churchill Livingstone; 1995.
Sim J, Wright C. Research in health care. Concepts, designs and methods. London: Stanley Thornes (Publishers) Ltd; 2000.
Story I. A brief survey of univariate group comparison procedures. In: Minichiello V, Sullivan G, Greenwood K, Axford R, editors. Handbook research methods for nursing and health science. 2nd ed. Frenchs Forest, NSW: Pearson Education Australia; 2004.
Wynne R, Botti M, Steadman H, Holdsworth L. The effect of three wound dressings on infection, healing, comfort and cost in patients with sternotomy wounds: a randomized trial. Chest 2004;125:43-9.