
UU-PhD-801

Quantitative and Qualitative Research Methods

Lecturer’s Notes: Quantitative Research Data Analysis

Further understanding will be gained on the following:

• Understand statistical terms related to quantitative research (confidence intervals and p-values)
• Learn the use of basic statistical tests
• Define sampling, data distribution and randomization
• Explain probability and non-probability sampling and describe the different types of each

Quantitative Data Analysis:

Statistics and Descriptive statistical analysis:


Statistics is concerned with the systematic collection of numerical data and its interpretation.
Burns & Grove (2005:752) refer to a statistic as simply 'a numerical value obtained from a sample
that is used to estimate the parameters of a population' (e.g. the number of people living in a
particular town).
Descriptive statistics are used to describe the basic features of the data that have been collected in a
study. They provide simple summaries about the sample and the measures (e.g. mean, median,
standard deviation etc). Together with simple graphics analysis, they form the basis of virtually
every quantitative analysis of data. It should be noted that with descriptive statistics, no conclusions
can be extended beyond the immediate group from which the data were gathered.
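As a minimal sketch (not from the source), the basic summary measures can be computed in Python with NumPy; the sample values below are invented for illustration:

```python
import numpy as np

# Hypothetical sample of ten test scores (invented for illustration)
scores = np.array([12, 15, 11, 14, 13, 18, 12, 16, 14, 15])

print("Mean:   ", np.mean(scores))         # arithmetic average
print("Median: ", np.median(scores))       # middle value when sorted
print("Std dev:", np.std(scores, ddof=1))  # sample standard deviation (n - 1 denominator)
```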
Data presentation:
A set of data on its own is very hard to interpret. There is a lot of information contained in the data,
but it is hard to see. Eyeballing your data using graphs and exploratory data analysis is necessary for
understanding important features of the data, detecting outliers, and identifying data that have been
recorded incorrectly. Outliers are extreme observations that are inconsistent with the rest of the data.
Because outliers can significantly distort some of the more formal statistical techniques, preliminary
detection and correction (or accommodation) of such observations is needed before further analysis takes
place. Usually, a straight line fits the data well; however, an outlier "pulls" the line in its direction,
as demonstrated in the lower graph in Figure 2. When the line is dragged towards the outlier, the rest of
the points fall farther from the line than they would otherwise. The "fit" is thereby reduced; thus, the
correlation is weaker. Outliers typically result from an error such as a mismarked answer paper, a mistake
in entering a score into a database, or a subject who misunderstood the directions. The researcher should
always seek to understand the cause of an outlying score.

If the cause is not legitimate, the researcher should eliminate the outlying score from the analysis to
avoid distorting the results.
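The "pull" that a single outlier exerts on a fitted line can be demonstrated numerically. The sketch below (an illustration, not from the source; all data invented) compares the Pearson correlation for a clean data set with the correlation after one score is corrupted, e.g. by a data-entry error:

```python
import numpy as np

# Hypothetical data lying close to a straight line
x = np.arange(1.0, 11.0)  # 1, 2, ..., 10
y = 2.0 * x + np.array([0.2, -0.1, 0.3, -0.2, 0.1,
                        0.0, -0.3, 0.2, -0.1, 0.1])

print("r without outlier:", np.corrcoef(x, y)[0, 1])  # close to +1

# Simulate a data-entry error: the last score is recorded far too low
y_bad = y.copy()
y_bad[-1] = 2.0
print("r with outlier:   ", np.corrcoef(x, y_bad)[0, 1])  # noticeably weaker
```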

Statistical Analysis (Burns & Grove, 2005):

One-tailed test: A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling
distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or
equal to 10. The alternative hypothesis would be that the mean is greater than 10.
Two-tailed test: When using a two-tailed test, regardless of the direction of the relationship you
hypothesize, you are testing for the possibility of the relationship in both directions. For example,
we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that
the mean is equal to x; the alternative is that the mean differs from x in either direction.
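A sketch of the distinction in code (not from the source; assumes SciPy 1.6 or later for the alternative argument, and the sample data are invented): scipy.stats.ttest_1samp tests a sample mean against a given value, and the alternative argument selects a one- or two-tailed rejection region.

```python
import numpy as np
from scipy import stats

sample = np.array([10.8, 11.2, 10.5, 11.6, 10.9, 11.3, 10.7, 11.1])

# Two-tailed: H0: mean = 10 vs H1: mean != 10 (rejection region on both sides)
t2, p2 = stats.ttest_1samp(sample, popmean=10.0, alternative='two-sided')

# One-tailed: H0: mean <= 10 vs H1: mean > 10 (rejection region on one side only)
t1, p1 = stats.ttest_1samp(sample, popmean=10.0, alternative='greater')

print(f"two-tailed p = {p2:.4f}; one-tailed p = {p1:.4f}")
```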


Alpha level (p value): In statistical analysis the researcher examines whether there is any significance in the
results.

The acceptance or rejection of a hypothesis is based upon a level of significance – the alpha (α) level

This is typically set at the 5% (0.05) α level, followed in popularity by the 1% (0.01) α level

These are usually designated as p, i.e. p = 0.05 or p = 0.01

So, what do we mean by the levels of significance that the 'p' value gives us?

The p value is concerned with confidence levels. This states the threshold at which you are prepared to accept the
possibility of a Type I Error – otherwise known as a false positive – rejecting a null hypothesis that is actually true.

The question that significance levels answer is 'How confident can the researcher be that the results have not arisen
by chance?'

Note: The confidence levels are expressed as a percentage.

So if we had a result of:

p = 1.00, results as extreme as these would be certain to arise by chance alone.
p = 0.50, there would be a 50% probability that the results arose by chance alone.
p = 0.05, there is only a 5% probability that the results arose by chance alone, so we can be 95% confident that they did not.
p = 0.01, there is only a 1% probability that the results arose by chance alone, so we can be 99% confident that they did not.

Clearly, we want our results to be as accurate as possible, so we set our significance levels as low as
possible – usually at 5% (p = 0.05) or, better still, at 1% (p = 0.01).
Anything above these thresholds is considered not accurate enough; in other words, the results are not significant.

Now, you may be thinking that if an effect could not have arisen by chance 90 times out of 100 (p = 0.1), then that
is pretty significant.

However, what we are determining with our levels of significance is 'statistical significance'; hence we are
much stricter with it, and would usually not accept values greater than p = 0.05.
So when looking at the statistics in a research paper, it is important to check the 'p' values to find
out whether the results are statistically significant or not.


Table 1.
Statistical Symbols

Accessed: https://www.statisticshowto.com/statistics-symbols/


Statistical tests (Field, 2013):


There are a number of tests that can be used to analyse quantitative data, depending on what the
researcher is looking for, what data were collected and how the data were collected.

Below are a few of the more common tests used to analyse quantitative data:

T-test
The t-test is used to assess whether the means of two groups differ statistically from each other.
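A minimal sketch with SciPy (not from the source; the scores are invented for illustration):

```python
from scipy import stats

# Hypothetical scores for two independent groups
group_a = [23, 25, 28, 30, 26, 27, 24]
group_b = [31, 29, 33, 35, 30, 32, 34]

t_stat, p_value = stats.ttest_ind(group_a, group_b)  # independent-samples t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")        # p < 0.05 suggests the means differ
```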

Pearson Correlation
Pearson's correlation is used to test the strength of the linear relationship between two continuous
variables. The value of Pearson's correlation coefficient lies between −1.00 (perfect negative correlation)
and +1.00 (perfect positive correlation), with 0.00 indicating no correlation.
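For example, a minimal sketch with SciPy (not from the source; data invented for illustration):

```python
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical
exam_score    = [52, 55, 61, 60, 68, 70, 75, 79]  # hypothetical

r, p = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.2f}, p = {p:.4f}")  # r near +1 indicates a strong positive correlation
```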

ANOVA (Analysis of Variance)


ANOVA is one of a family of tests (which also includes ANCOVA – analysis of covariance – and MANOVA –
multivariate analysis of variance) that are used to compare the means across a number of groups.
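A one-way ANOVA sketch with SciPy (not from the source; group scores invented):

```python
from scipy import stats

# Hypothetical scores from three independent groups
group_1 = [4, 5, 6, 5, 4]
group_2 = [7, 8, 6, 7, 9]
group_3 = [5, 6, 5, 7, 6]

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p: at least one group mean differs
```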

Mann-Whitney U-test
The Mann-Whitney U-test is used to test for differences between two independent groups on a continuous
measure, e.g. do males and females differ in terms of their levels of anxiety?
This test requires one categorical variable with two groups (e.g. gender: male/female) and one continuous
variable (e.g. anxiety level).
Basically, the Mann-Whitney U-test converts the scores on the continuous variable to ranks across the two
groups, then calculates and compares the medians of the two groups and evaluates whether they differ
significantly.
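A sketch with SciPy (not from the source; the anxiety scores are invented):

```python
from scipy import stats

# Hypothetical anxiety scores for two independent groups
males   = [12, 15, 11, 18, 14, 13]
females = [16, 19, 17, 21, 15, 20]

u_stat, p_value = stats.mannwhitneyu(males, females, alternative='two-sided')
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```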

Spearman rank correlation test


The Spearman rank correlation test is used to demonstrate the association between two ranked variables (X
and Y) which are not normally distributed. It is frequently used to compare the scores of a group of
subjects on two measures (i.e. it is a correlation coefficient based on ranks).
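A sketch with SciPy (not from the source; the paired scores are invented):

```python
from scipy import stats

# Hypothetical scores of the same subjects on two measures
measure_x = [3, 1, 4, 2, 6, 5, 8, 7]
measure_y = [4, 2, 3, 1, 7, 5, 8, 6]

rho, p = stats.spearmanr(measure_x, measure_y)  # correlation computed on ranks
print(f"rho = {rho:.2f}, p = {p:.4f}")
```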

Wilcoxon signed-rank test


The Wilcoxon signed-rank test (also known as the Wilcoxon matched-pairs test) is the most common
nonparametric test for the two-sample repeated-measures research design.
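A sketch with SciPy (not from the source; the repeated-measures scores are invented):

```python
from scipy import stats

# Hypothetical scores for the same subjects measured on two occasions
before = [85, 90, 78, 92, 88, 76, 95, 81]
after  = [80, 88, 80, 85, 84, 74, 90, 79]

w_stat, p_value = stats.wilcoxon(before, after)  # paired, nonparametric
print(f"W = {w_stat:.1f}, p = {p_value:.4f}")
```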
Kruskal-Wallis test
The Kruskal-Wallis test is used to compare scores across more than two independent samples, when either the
data are ordinal or the distribution is not normal. When there are only two groups, it is equivalent to the
Mann-Whitney U-test.
This test is typically used to determine whether three or more groups differ significantly.
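A sketch with SciPy (not from the source; the ordinal ratings are invented):

```python
from scipy import stats

# Hypothetical ordinal ratings from three independent groups
group_1 = [1, 2, 2, 3, 1]
group_2 = [3, 4, 4, 5, 4]
group_3 = [2, 3, 2, 4, 3]

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```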


Chi-square test
There are two different types of chi-square test, both of which involve categorical data.
One type, the chi-square goodness-of-fit test, compares the frequency counts expected in theory against
those actually observed.
The second type is known as the chi-square test with two variables, or the chi-square test for
independence.
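Both types can be sketched with SciPy (not from the source; all frequency counts invented):

```python
from scipy import stats

# Type 1 - goodness of fit: observed counts vs theoretically expected counts
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")

# Type 2 - independence: a 2x2 contingency table of two categorical variables
table = [[30, 10],
         [20, 40]]
chi2, p, dof, expected_table = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, p = {p:.4f}")
```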

Parametric and Nonparametric Tests (Frost, 2015):


A parametric statistical test makes assumptions about the parameters (defining properties) of the
population distribution(s) from which one's data are drawn, whereas a non-parametric test makes no
such assumptions. Nonparametric tests are also called distribution-free tests because they do not
assume that your data follow a specific distribution.

It is argued that nonparametric tests should be used when the data do not meet the assumptions of the parametric
test, particularly the assumption about normally distributed data. However, there are additional considerations
when deciding whether a parametric or nonparametric test should be used.
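One practical input to this decision is a formal normality test such as Shapiro-Wilk, sketched below (an illustration, not from the source; the skewed sample is simulated), keeping in mind that, as noted later, such tests have little power in very small samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=30)  # deliberately skewed data

w_stat, p_value = stats.shapiro(sample)       # Shapiro-Wilk test of normality
if p_value < 0.05:
    print("Normality rejected - consider a nonparametric test")
else:
    print("No evidence against normality - a parametric test may be appropriate")
```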

Reasons to Use Parametric Tests

Reason 1: Parametric tests can perform well with skewed and nonnormal distributions
Parametric tests can perform well with continuous data that are not normally distributed, provided the
sample size guidelines below (as reported in the source, Frost, 2015) are satisfied:

One-sample t-test: sample size greater than 20
Two-sample t-test: each group greater than 15
One-way ANOVA (2-9 groups): each group greater than 15
One-way ANOVA (10-12 groups): each group greater than 20


*Note: These guidelines are based on simulation studies conducted by statisticians at Minitab.

Reason 2: Parametric tests can perform well when the spread of each group is different
While nonparametric tests do not assume that your data are normally distributed, they do have other assumptions
that can be hard to satisfy. For example, when using nonparametric tests that compare groups, a common
assumption is that the data for all groups have the same spread (dispersion). If the groups have a different spread,
then the results from nonparametric tests might be invalid.

Reason 3: Statistical power


Parametric tests usually have more statistical power compared to nonparametric tests. Hence, they
are more likely to detect a significant effect when one truly exists.

Reasons to Use Nonparametric Tests


Reason 1: Your area of study is better represented by the median

The fact that a parametric test can be performed with nonnormal data does not imply that the mean is the best
measure of the central tendency for your data.
For example, the center of a skewed distribution (e.g. income) can be better measured by the median, the
point below which 50% of values fall and above which the other 50% fall. However, if you add a few
billionaires to a sample, the mean increases greatly, although the income of the typical person does not change.


When the distribution is skewed enough, the mean is strongly influenced by changes far out in the distribution’s
tail, whereas the median continues to more closely represent the center of the distribution.

Reason 2: You have a very small sample size


If the data are not normally distributed and do not meet the sample size guidelines for the parametric tests, then a
nonparametric test should be used. In addition, when you have a very small sample, it might be difficult to
ascertain the distribution of your data as the distribution tests will lack sufficient power to provide meaningful
results.

Reason 3: You have ordinal data, ranked data, or outliers that you cannot remove
Typical parametric tests can only assess continuous data, and their results can be seriously affected by
outliers. Conversely, some nonparametric tests can handle ordinal and ranked data without being
significantly affected by outliers.

Power of the study:


There is increasing criticism of the lack of statistical power of published research in sports and exercise
science and psychology. Statistical power is defined as the probability of rejecting the null hypothesis
when it is false; that is, the probability that the study will lead to significant results when a true
effect exists. If the null hypothesis is false but not rejected, a type 2 error is incurred. Cohen suggested
that a power of 0.80 is satisfactory when alpha is set at 0.05 – that is, when the risk of a type 1 error
(rejection of the null hypothesis when it is true) is 0.05. This means that the accepted risk of a type 2
error is 0.20.
The magnitude of the relation or treatment effect (known as the effect size) is a factor that must receive
close attention when considering the statistical power of a study. When calculated in advance, it can be
used as an indicator of the degree to which the researcher believes the null hypothesis to be false. Each
statistical test has an effect size index that ranges from zero upwards and is scale free. For instance, the
effect size index for a correlation test is r itself, so no conversion is required. For assessing the
difference between two sample means, Cohen's d, Hedges' g, or Glass's Δ can be used; these divide the
difference between the two means by a standard deviation. Formulae are available for converting other
statistical test results (e.g. t-test, one-way analysis of variance, and χ2 results) into effect size
indexes (see Rosenthal, 1991).
Effect sizes are typically described as small, medium, and large. Correlations of 0.1, 0.3, and 0.5, and
Cohen's d values of 0.2, 0.5, and 0.8, equate to small, medium, and large effect sizes respectively. It is
important to note that the power of a study is linked to the sample size: the smaller the expected effect
size, the larger the sample size required to have sufficient power to detect that effect size.
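As a sketch of both calculations (not from the source; it assumes the statsmodels package is available, and the sample values are invented), Cohen's d can be computed directly, and the sample size needed for a given power can be solved for:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Cohen's d: difference between two means divided by the pooled standard deviation
a = np.array([5.1, 5.9, 6.2, 5.5, 6.0, 5.7])  # hypothetical group A
b = np.array([6.4, 7.0, 6.8, 7.3, 6.6, 7.1])  # hypothetical group B
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")

# Sample size per group for power = 0.80 and alpha = 0.05 at a medium effect (d = 0.5)
n_per_group = TTestIndPower().solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"required n per group = {n_per_group:.0f}")  # about 64 per group
```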

For example, a study that assesses the effects of habitual physical activity on body fat in children might
have a medium effect size (e.g. see Rowlands et al., 1999). In that study, there was a moderate correlation
between habitual physical activity and body fat, with a medium effect size. A large effect size may be
anticipated in a study that assesses the effects of a very low energy diet on body fat in overweight women
(e.g. see Eston et al., 1995). In Eston et al.'s study, a significant reduction in total energy intake
resulted in a substantial decrease in total body mass and percentage body fat.
The effect size should be estimated during the design stage of a study, as this will allow the researcher to
determine the sample size required to give adequate power for a given alpha (i.e. p value). The study can
therefore be designed to ensure that there is sufficient power to detect the effect of interest, thus
minimising the possibility of a type 2 error.

Table 3.
Small, medium and large effect sizes as defined by Cohen

Effect size index        Small    Medium    Large
Correlation (r)          0.1      0.3       0.5
Cohen's d                0.2      0.5       0.8

When empirical data are available, they can be used to assess the effect size for a study. However,
for some research questions it is difficult to find enough information (e.g. there is limited empirical
information on the topic or insufficient detail provided in the results of the relevant studies) to
estimate the expected effect size. In order to compare effect sizes of studies that differ in sample
size, it is recommended that, in addition to reporting the test statistic and p value, the appropriate
effect size index is also reported.

Quantitative Software for Data Analysis (Blaikie, 2003):


Quantitative studies often result in large numerical data sets that would be difficult to analyse without
the help of computer software packages. Programs such as Excel are available to most researchers and are
relatively straightforward. These programs can be very useful for descriptive statistics and less
complicated analyses. However, sometimes the data require more sophisticated software. There are a number of
excellent statistical software packages, including:

SPSS – The Statistical Package for the Social Sciences (SPSS) is one of the most popular software packages
in social science research. SPSS is comprehensive and compatible with almost any type of data and can be
used to run both descriptive statistics and other more complicated analyses, as well as to generate reports,
graphs, plots and trend lines based on data analyses.

Stata – This is an interactive program that can be used for both simple and complex analyses. It can also
generate charts, graphs and plots of data and results. The program can seem a bit more complicated than
other programs because it uses four different windows: the command window, the review window, the results
window and the variables window.

SAS – The Statistical Analysis System (SAS) is another very good statistical software package that can be useful
with very large data sets. It has additional capabilities that make it very popular in the business world because it
can address issues such as business forecasting, quality improvement, planning, and so forth. However, some
knowledge of its programming language is necessary to use the software, making it a less appealing option
for some researchers.

R programming – R is an open source programming language and software environment for statistical computing
and graphics that is supported by the R Foundation for Statistical Computing. The R language is commonly used
among statisticians and data miners for developing statistical software and data analysis.

Data distribution (Langley & Perrie, 2014):


Data can be "distributed" (spread out) in different ways:

The Normal Curve:


The graph of the normal distribution depends on two factors, i.e. the mean (M) and the standard
deviation (SD). The location of the center of the graph is determined by the mean of the
distribution, and the height and width of the graph are determined by the standard deviation. When
the standard deviation is large, the curve is short and wide; when the standard deviation is small, the
curve is tall and narrow. The normal distribution graph is a symmetric, bell-shaped curve. When measuring
attributes such as people's height or weight, the graph of the results very often approximates a normal
curve.
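The dependence of the curve's shape on the standard deviation can be sketched numerically with SciPy (an illustration, not from the source):

```python
from scipy import stats

# Height of the normal curve at its centre (the mean) for different SDs:
# a larger SD gives a shorter, wider curve; a smaller SD a taller, narrower one.
for sd in (0.5, 1.0, 2.0):
    peak = stats.norm.pdf(0.0, loc=0.0, scale=sd)
    print(f"SD = {sd}: peak height = {peak:.3f}")
```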


References

Abramson, J. H., & Abramson, Z. H. (2008). Scales of measurement. In Research Methods in Community Medicine:
Surveys, Epidemiological Research, Programme Evaluation, Clinical Trials (6th Ed., pp. 125-132).
Blaikie, N. (2003). Analyzing quantitative data: From description to explanation. Sage
Publications.

Burns, N., & Grove, S. K. (2005). The Practice of Nursing Research: Conduct, Critique, and Utilization
(5th Ed.). St. Louis: Elsevier Saunders.

Creswell, J. W. (2013). Research design: Qualitative, quantitative, and mixed methods approaches. Sage
Publications, Incorporated.
Eston, R. G., Fu, F., & Fung, L. (1995). Validity of conventional anthropometric techniques for estimating
body composition in Chinese adults. Br J Sports Med, 29, 52-56.

Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th Ed.). Sage Publications Ltd.

Frost, J. (2015). Choosing Between a Nonparametric Test and a Parametric Test. Retrieved from
http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test

Langley, C., & Perrie, Y. (2014). Maths Skills for Pharmacy: Unlocking Pharmaceutical Calculations. Oxford
University Press.

Lyons, R. (2010). Best Practices in Graphical Data Presentation, Ohio, USA.

Rosenthal, R. (1991). Meta-Analytic Procedures for Social Research (revised edition). Newbury Park, CA: Sage.

Rowlands, A. V., Eston, R. G., & Ingledew, D. K. (1999). The relationship between activity levels, body fat
and aerobic fitness in 8-10 year old children. J Appl Physiol, 86, 1428-1435.
