Research Methods Lesson Notes
DESCRIPTIVE STATISTICS
Introduction
Research and innovation are essential to the enrichment, progress and development of
any field of knowledge. A great deal of research work is carried out, and statistical
methods help researchers carry out these studies successfully.
Statistics is the branch of mathematics that deals with the collection, organization, and
analysis of numerical data and with such problems as experiment design and decision
making.
Statistics is divided into two major divisions: descriptive and inferential statistics. In this
chapter we will confine ourselves to descriptive statistics.
Descriptive statistics is the use of statistics to describe, summarize, and explain or make
sense of a given set of data.
The word data is the plural of datum. It means the evidence or facts for describing a group
or situation. Examples of data are the heights of students, intelligence scores and the
weights of learners.
To understand the meaning of data and derive useful conclusions, the data have to be
organised or arranged in some systematic way. Organisation of data is the arrangement
of original data in a proper way for deriving useful interpretation (Mangal, 1990; Kasomo,
2007). It involves the following ways:
Statistical Tables
Here, you tabulate or arrange the data in some properly selected classes and the
arrangement is described by titles and sub-titles. Such tables can list original raw data as
well as percentages, means, standard deviations (Mangal, 1990). The following are
general rules for constructing tables:
➢ The title should be simple, concise and unambiguous. The title is usually placed at
the top of the table.
➢ The table should be suitably divided into columns and rows according to the nature
of data and purpose. These rows and columns should be arranged in a logical
order to facilitate comparisons.
➢ The heading of each column or row should be as brief as possible. Two or more
columns or rows with similar headings may be grouped under a common heading
to avoid repetition.
➢ Sub-totals for each separate classification, and a general total for all combined
classes, are to be given. These totals should be given at the bottom or to the right
of the items concerned.
➢ The units in which the data are given must be mentioned, preferably, in the
headings of columns or rows.
➢ Necessary footnotes providing essential explanations of points open to ambiguous
interpretation of the tabulated data must be given at the bottom of the table.
➢ Provide the source of data at the end of the table.
➢ The table should be simple to allow easy interpretation.
Rank Order
Example: Fifty students obtained the following scores on their psychology test,
tabulate these data in the form of rank order:
63, 22, 27, 33, 57, 37, 38, 40, 54, 41, 55, 43, 45, 62, 69, 29, 34, 57, 58, 38, 53, 40, 46,
48, 49, 59, 46, 47, 48, 41, 54, 43, 44, 64, 31, 35, 59, 36, 39, 53, 42, 52, 45, 42, 43, 44,
46, 47, 51, 39
Solution: The rank order tabulation of these data is given below (each pair of columns
lists the rank, then the score):

Rank Score Rank Score
11 54 31 43
12 53 32 42
13 53 33 42
14 52 34 41
15 51 35 41
16 50 36 40
17 49 37 40
18 49 38 39
19 48 39 39
20 47 40 38
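Rank ordering simply lists the scores from highest to lowest. As a minimal sketch, the tabulation can be generated in Python from the fifty scores given above:

```python
# Rank-order the 50 psychology test scores from the example above.
scores = [63, 22, 27, 33, 57, 37, 38, 40, 54, 41, 55, 43, 45, 62, 69,
          29, 34, 57, 58, 38, 53, 40, 46, 48, 49, 59, 46, 47, 48, 41,
          54, 43, 44, 64, 31, 35, 59, 36, 39, 53, 42, 52, 45, 42, 43,
          44, 46, 47, 51, 39]

ranked = sorted(scores, reverse=True)   # the highest score receives rank 1
for rank, score in enumerate(ranked, start=1):
    print(rank, score)
```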
Frequency Distributions
One useful way to view the data of a variable is to construct a frequency distribution
(i.e., an arrangement in which the frequencies, and sometimes percentages, of the
occurrence of each unique data value are shown). By frequency of a datum, we mean the
number of times the datum is repeated in a given series.
In this form of organisation and arrangement of data, you group the numerical data into
some suitably chosen classes or groups. The data are distributed into classes, and each
datum is allotted a place in its respective class. It is also noted how many times a
particular datum, or group of data, occurs in the given data. In this way, a frequency
distribution may be considered a method of presenting data by showing the number of
observations that fall in each group or class.
While grouping the raw scores into a frequency distribution, we assume that the mid-point
of the class interval is the score obtained by each of the individuals represented by that
interval. Notice, for example, that the class interval 35-39 had original values of 39, 39,
38, 38, 37, 36 and 35. In grouping, we assume that these measures have the value of the
midpoint of their respective class or group, i.e. 37, and they are represented by this single
measure. This assumption leads to grouping error.
A Frequency Distribution Table tells you the way in which frequencies are distributed over
the various class intervals but it does not tell you the total number of cases or the
percentage of cases lying below or above a class interval. You can perform this task with
the help of cumulative frequency and cumulative percentage frequency distribution.
The following table on the above data will illustrate how this is done:
Cumulative frequencies are, thus, obtained by adding successively, starting from the
bottom, the individual frequencies. In the above table when we start from the bottom, the
first cumulative frequency is written as 1 against the lowest class interval, i.e. 20-24, the
next cumulative frequency is 3 because 1 is added to the frequency of the second class
interval which is 2, then 6 by adding 3 to the frequency of the third class interval which is
3 and so on.
To convert the cumulative frequencies into cumulative percentage frequencies, you just
take the particular cumulative frequency and multiply it by 100/N, where N is the total
number of cases.
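The procedure can be sketched in Python. The class intervals and frequencies below were tallied from the 50 scores in the rank-order example; the lowest intervals have frequencies 1, 2 and 3, matching the cumulative frequencies 1, 3 and 6 described above:

```python
# Cumulative frequencies and cumulative percentage frequencies,
# tallied from the 50 scores in the rank-order example.
intervals = ["20-24", "25-29", "30-34", "35-39", "40-44",
             "45-49", "50-54", "55-59", "60-64", "65-69"]
freqs = [1, 2, 3, 7, 11, 10, 6, 6, 3, 1]

N = sum(freqs)                  # total number of cases (50)
cum = 0
for interval, f in zip(intervals, freqs):
    cum += f                    # add successively, starting from the bottom
    cum_pct = cum * 100 / N     # multiply by 100/N
    print(interval, f, cum, round(cum_pct, 1))
```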
Some common graphical representations are bar graphs, histograms, line graphs, and
scatter plots.
Bar Graphs
A bar graph uses vertical bars to represent the data.
• The height of each bar usually represents the frequency for the category that sits on
the X axis.
• Note that, by tradition, the X axis is the horizontal axis and the Y axis is the vertical
axis.
Histograms
A histogram is a graphic that shows the frequencies and shape that characterize a
quantitative variable.
Line Graphs
A line graph uses one or more lines to depict information about one or more variables.
• A simple line graph might be used to show a trend over time (e.g., with the years on
the X axis and the population sizes on the Y axis).
• Line graphs are used for many different purposes in research. Line graphs are used
in factorial experimental designs to depict the relationship between two categorical
independent variables and the dependent variable.
• A line graph can be used, for example, to show that the "sampling distribution of the
mean" is normally distributed.
• Line graphs have in common their use of one or more lines within the graph (to depict
the levels or characteristics of a variable or to depict the relationships among
variables).
Scatterplots
A scatterplot is used to depict the relationship between two quantitative variables.
Measures of Central Tendency
• Three common measures of central tendency are the mode, the median, and the
mean.
• To find the median, you first order the numbers from lowest to highest. Then you
check to see which of the following two rules applies:
• Rule One. If you have an odd number of numbers, the median is the centre
number (e.g., three is the median for the numbers 1, 1, 3, 4, 9).
• Rule Two. If you have an even number of numbers, the median is the average of
the two innermost numbers (e.g., 2.5 is the median for the numbers 1, 2, 3, 7).
The mean is the arithmetic average (e.g., the average of the numbers 2, 3, 3, and 4 is
equal to 3).
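The three measures can be computed with Python's standard statistics module, using the small example sets from the rules above:

```python
from statistics import mean, median, mode

odd_set = [1, 1, 3, 4, 9]      # odd count: the median is the centre number
even_set = [1, 2, 3, 7]        # even count: average of the two innermost numbers

print(median(odd_set))         # 3
print(median(even_set))        # 2.5
print(mode([1, 1, 3, 4, 9]))   # 1, the most frequently occurring value
print(mean([2, 3, 3, 4]))      # 3, the arithmetic average
```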
Activity 16
1. Which measure of central tendency would be most suitable for each of the
following sets of data?
i. 1, 23, 25, 26, 27, 23, 29, 30
ii. 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 50
iii. 1, 1, 2, 3, 4, 1, 2, 6, 6, 8, 3, 4, 5, 6, 7
iv. 1, 101, 104, 106, 111, 108, 109, 200
Measures of Variability
Measures of variability tell you how "spread out" or how much variability is present in a
set of numbers. They tell you how different your numbers tend to be. Note that measures
of variability should be reported along with measures of central tendency because they
provide very different but complementary and important information. To fully interpret one
(e.g., a mean), it is helpful to know about the other (e.g., a standard deviation).
An easy way to get the idea of variability is to look at two sets of data, one that is highly
variable and one that is not very variable. For example, which of these two sets of
numbers appears to be the most spread out, Set A or Set B?
Range
A relatively crude indicator of variability is the range (i.e., the difference between the
highest and lowest numbers).
• For example, the range in Set A shown above is 7, and the range in Set B shown
above is 90.
Variance and Standard Deviation
• Higher values for both of these indicators indicate a larger amount of variability than
do lower values.
• Zero stands for no variability at all (e.g., for the data 3, 3, 3, 3, 3, 3, the variance and
standard deviation will equal zero).
• When you have no variability, the numbers are a constant (i.e., the same number).
• (Basically, you set up the three columns shown, get the sum of the third column, and
then plug the relevant numbers into the variance formula.)
• The variance tells you (exactly) the average squared deviation from the mean, in
"squared units."
• The standard deviation is just the square root of the variance (i.e., it brings the
"squared units" back to regular units).
• The standard deviation tells you (approximately) how far the numbers tend to vary
from the mean. (If the standard deviation is 7, then the numbers tend to be about
7 units from the mean. If the standard deviation is 1500, then the numbers tend to
be about 1500 units from the mean.)
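As a sketch of these ideas in Python (the five-score data set is made up for illustration):

```python
from statistics import pvariance, pstdev

data = [3, 3, 3, 3, 3, 3]       # a constant: no variability at all
print(pvariance(data))          # 0
print(pstdev(data))             # 0.0

scores = [1, 3, 5, 7, 9]        # hypothetical data for illustration
mean_score = sum(scores) / len(scores)
# Variance: the average squared deviation from the mean.
var = sum((x - mean_score) ** 2 for x in scores) / len(scores)
print(var)                      # 8.0
# Standard deviation: the square root of the variance.
print(var ** 0.5)               # roughly 2.83
print(pvariance(scores), pstdev(scores))   # the library gives the same results
```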
If data are normally distributed, then an easy rule to apply to the data is what we call “the
68, 95, 99.7 percent rule." That is:
• Approximately 68% of the cases will fall within one standard deviation of the mean.
• Approximately 95% of the cases will fall within two standard deviations of the mean.
• Approximately 99.7% of the cases will fall within three standard deviations of the
mean.
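The rule can be checked empirically by simulating normally distributed data (here with a hypothetical mean of 100 and SD of 15, like IQ scores):

```python
import random

random.seed(0)
# Draw a large normal sample and check the share of cases falling
# within one, two, and three standard deviations of the mean.
sample = [random.gauss(100, 15) for _ in range(100_000)]
for k in (1, 2, 3):
    within = sum(1 for x in sample if abs(x - 100) <= k * 15) / len(sample)
    print(k, round(within, 3))   # expect roughly .683, .954, .997
```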
You can determine the mean of the type of standard scores below by simply looking under
Mean. You can determine the standard deviation by looking at how much the scores
increase as you move from the mean to 1 SD.
• Note: percentile ranks are a different type of score; because they have only ordinal
measurement properties, the concept of standard deviation is not relevant.
Activity 18
If you have a negative z-score, does it fall above or below the mean?
With a negative z-score do the majority of the population score higher or
lower than you?
Percentile Ranks
A percentile rank tells you the percentage of scores in a reference group (i.e., in the
norming group) that fall below a particular raw score.
• For example, if your percentile rank is 93 then you know that 93 percent of the scores
in the reference group fall below your score.
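A minimal sketch of this definition in Python (the norming group here is hypothetical, and note that some textbooks handle tied scores slightly differently):

```python
def percentile_rank(score, reference_group):
    """Percentage of scores in the reference (norming) group below `score`."""
    below = sum(1 for s in reference_group if s < score)
    return 100 * below / len(reference_group)

# Hypothetical norming group of 20 scores: 1, 2, ..., 20.
norms = list(range(1, 21))
print(percentile_rank(15, norms))   # 70.0: fourteen of the twenty scores fall below
```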
Z-Scores
A z-score tells you how many standard deviations (SD) a raw score falls from the mean.
• A z-score of -3.5 says the score falls three and a half standard deviations below the
mean.
To transform a raw score into z-score units, just use the following formula:
Z-score = (Raw score − Mean) / Standard Deviation
For example, you know that the mean for IQ scores is 100 and the standard deviation for
IQ scores is 15. To convert a z-score back into a raw score, reverse the formula:
• New score = 3(15) + 100 (remember, the mean of IQ scores is 100 and the standard
deviation of IQ scores is 15). Therefore, the new score (i.e., the IQ score converted
from the z-score of 3 using the formula just provided) is equal to 145 (3 times 15
is 45, and when 100 is added you get 145).
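The formula and its reverse, as used in the IQ example above, can be sketched as:

```python
def z_score(raw, mean, sd):
    """How many standard deviations a raw score falls from the mean."""
    return (raw - mean) / sd

def raw_score(z, mean, sd):
    """Reverse the formula: multiply by the SD and add the mean."""
    return z * sd + mean

# IQ scores: mean 100, standard deviation 15 (from the example above).
print(z_score(145, 100, 15))    # 3.0
print(raw_score(3, 100, 15))    # 145: 3 times 15 is 45, plus 100
print(z_score(47.5, 100, 15))   # -3.5: three and a half SDs below the mean
```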
Activity 19
Suppose that your marks in Mathematics and English are 65% and 71%
respectively. Which is your better subject in comparison with the rest of
the group, if the group means and SDs are 60 and 5 (Mathematics) and 65
and 7 (English)?
We have been talking about relationships among variables throughout this textbook. For
example, we have already talked about correlation, partial correlation, analysis of
variance (which is used for factorial designs), and analysis of covariance.
At this point in this chapter on descriptive statistics, we introduce two additional
techniques that you also can use for examining relationships among variables:
contingency tables and regression analysis.
Contingency Tables
When all of your variables are categorical, you can use contingency tables to see if your
variables are related.
When interpreting a contingency table, remember to use the following two rules:
• Rule One. If the percentages are calculated down the columns, compare across the
rows.
• Rule Two. If the percentages are calculated across the rows, compare down the
columns.
• When you follow these rules you will be comparing the appropriate rates (a rate is
the percentage of people in a group who have a specific characteristic).
• When you listen to the local and national news, you will often hear the announcers
compare rates.
• The failure of some researchers to follow the two rules just provided has resulted in
misleading statements about how categorical variables are related; so be careful.
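As a small sketch in Python (the groups and counts are hypothetical), rates can be computed across the rows so that they can be compared down the columns:

```python
# A hypothetical contingency table: counts of people in two groups,
# by whether they have some characteristic.
table = {
    "Group A": {"yes": 30, "no": 70},
    "Group B": {"yes": 10, "no": 90},
}

# Percentages calculated across the rows: compare them down the columns.
for group, counts in table.items():
    total = sum(counts.values())
    rate_yes = 100 * counts["yes"] / total
    print(group, f"{rate_yes:.0f}% yes")
```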
Regression Analysis
Regression analysis is a set of statistical procedures used to explain or predict the values
of a quantitative dependent variable based on the values of one or more independent
variables.
• In multiple regression, there is one quantitative dependent variable and two or more
independent variables.
The regression equation in this example is: predicted starting salary = 9,234.56 +
7,638.85 × GPA. Here are the important definitions of its components (the Y-intercept
and the regression coefficient):
• The 9,234.56 is the Y intercept (the regression line crosses the Y axis a little below
$10,000; specifically, it crosses the Y axis at $9,234.56).
• The 7,638.85 is the simple regression coefficient, which tells you the average amount
of increase in starting salary that occurs when GPA increases by one unit (it is also
the slope or the rise over the run).
• Now, you can plug in a value for X (i.e., GPA) and easily get the predicted starting
salary.
• If you put in a 3.00 for GPA in the above equation and solve it, you will see that the
predicted starting salary is $32,151.11
• Now plug in another number within the range of the data (how about 3.5) and see
what the predicted starting salary is (check your work: it is $35,970.54).
• The main difference is that in multiple regression, the regression coefficient is now
called a partial regression coefficient, and this coefficient provides the predicted
change in the dependent variable given a one unit change in the independent
variable, controlling for the other independent variables in the equation. In other
words, you can use multiple regression to control for other variables (i.e., for what
we called in earlier chapters statistical control).
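The prediction step described above can be sketched using the two components of the example's simple regression equation:

```python
# The regression equation from the example: predicted starting salary
# = 9,234.56 (the Y intercept) + 7,638.85 (the regression coefficient) * GPA.
def predicted_salary(gpa):
    return 9234.56 + 7638.85 * gpa

print(round(predicted_salary(3.0), 2))   # 32151.11, as in the text
print(predicted_salary(3.5))             # about 35970.54, the text's check figure
```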
CHAPTER NINE
INFERENTIAL STATISTICS
Introduction
• Inferential statistics is divided into estimation and hypothesis testing, and estimation
is further divided into point and interval estimation.
Inferential statistics is defined as the branch of statistics that is used to make inferences
about the characteristics of a population based on sample data.
• The goal is to go beyond the data at hand and make inferences about population
parameters.
Sampling Distributions
One of the most important concepts in inferential statistics is that of the sampling
distribution. That's because the use of sampling distributions is what allows us to make
"probability" statements in inferential statistics.
• One specific type of sampling distribution is called the sampling distribution of the
mean. If you wanted to generate this distribution through the laborious process of
doing it by hand (which you would NOT need to do in practice), you would randomly
select a sample, calculate the mean, randomly select another sample, calculate
the mean, and continue this process until you have calculated the means for all
possible samples. This process will give you a lot of means, and you can construct
a line graph to depict your sampling distribution of the mean
• The sampling distribution of the mean is normally distributed (as long as your sample
size is about 30 or more).
• Also, note that the mean of the sampling distribution of the mean is equal to the
population mean! That tells you that repeated sampling will, over the long run,
produce the correct mean. The spread or variance shows you that sample means
will tend to be somewhat different from the true population mean in most particular
samples.
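The laborious repeated-sampling process can be simulated (a sketch with a made-up population; in practice the computer programme handles this for you):

```python
import random
from statistics import mean, pstdev

random.seed(1)
# A made-up population with mean roughly 50 and standard deviation roughly 10.
population = [random.gauss(50, 10) for _ in range(10_000)]
pop_mean = mean(population)

# Repeatedly draw samples of size 30 and record each sample mean.
sample_means = [mean(random.sample(population, 30)) for _ in range(5_000)]

print(round(mean(sample_means), 1))    # close to the population mean
print(round(pstdev(sample_means), 2))  # the standard error: much smaller than 10
```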
The standard deviation of a sampling distribution is called the standard error. In other
words, the standard error is just a special kind of standard deviation and you learned what
a standard deviation was in the last chapter.
• The smaller the standard error, the less the amount of variability present in a
sampling distribution.
• The computer programme that a researcher uses (e.g., SPSS or SAS) applies the
appropriate sampling distribution for you.
• The computer programme will look at the type of statistical analysis you select (and
also consider certain additional information that you have provided, such as the
sample size in your study), and then the statistical programme selects the
appropriate sampling distribution.
• (It's kind of like the Greyhound Bus analogy: Leave the driving to us...SPSS will take
care of generating the appropriate sampling distribution for you if you give it the
information it needs.)
So please remember that the idea of sampling distributions (i.e., the idea of probability
distributions obtained from repeated sampling) underlies our ability to make probability
statements in inferential statistics.
Estimation
The key estimation question is "Based on our random sample, what is our estimate of the
population parameter?"
• The basic idea is that you are going to use your sample data to provide information
about the population.
• Point and interval estimates can first be understood through the following analogy:
Let's say that you take
your car to your local car dealer's service department and you ask the service
manager how much it will cost to repair your car. If the manager says it will cost
you $500 then she is providing a point estimate. If the manager says it will cost
somewhere between $400 and $600 then she is providing an interval estimate.
In other words, a point estimate is a single number, and an interval estimate is a range of
numbers.
• A point estimate is the value of your sample statistic (e.g., your sample mean or
sample correlation), and it is used to estimate the population parameter (e.g., the
population mean or the population correlation).
• For example, if you take a random sample from adults living in the United States
and you find that the average income for the people in your sample is $45,000,
then your best guess, or your point estimate, for the population of adults in the U.S.
will be $45,000.
In the above example, you used the value of the sample mean as the estimate of the
population mean.
• Again, whenever you engage in point estimation, all you need to do is to use the
value of your sample statistic as your "best guess" (i.e., as your estimate) of the
(unknown) population parameter. Often, we like to put an interval around our point
estimate because the actual population value may be somewhat different from our
point estimate, since sampling error is always present in sampling.
• An interval estimate (also called a confidence interval) is a range of numbers inferred
from the sample that has a known probability of capturing the population parameter
over the long run (i.e., over repeated sampling).
• The "beauty" of confidence intervals is that we know their probability (over the long
run) of including the true population parameter (you can't do this with a point
estimate).
• Specifically, if you have the computer provide you with a 95 percent confidence
interval (based on your data), then you will be able to be "95% confident" that it will
include the population parameter. That is, your “level of confidence” is 95%.
• For example, you might take the point estimate of annual income of U.S. adults of
$45,000 (used earlier as a point estimate) and surround it by a 95% confidence
interval. You might find that the confidence interval is $43,000 to $47,000. In this
case, you can be "95% confident" that the average income is somewhere between
$43,000 and $47,000.
• If you have the computer programme give you a 99% confidence interval, then you
can be "99% confident" that the confidence interval provided will include the
population parameter (i.e., it will capture the true parameter 99% of the time in the
long run).
You might ask: So why don’t we just use 99% confidence intervals rather than 95%
intervals, since you will make fewer mistakes?
• The answer is that for a given sample size, the 99% confidence interval will be wider
(i.e., less precise) than a 95% confidence interval. For example, the interval
$40,000 to $50,000 is wider than the interval $43,000 to $47,000.
• 95% confidence intervals are popular with many researchers. However, you may, at
times, want to use other confidence intervals (e.g., 90% confidence intervals or
99% confidence intervals).
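A sketch of an approximate 95% confidence interval for a mean (the income sample is made up for illustration, and the 1.96 multiplier assumes a large-sample normal approximation; statistical software would use the exact t value):

```python
from statistics import mean, stdev

# Hypothetical sample of incomes (in dollars); in practice this would be
# a random sample of adults.
incomes = [41000, 43500, 44800, 45200, 46100, 44000, 47300, 45900,
           43200, 46800, 44500, 45700]

n = len(incomes)
point_estimate = mean(incomes)
standard_error = stdev(incomes) / n ** 0.5

# Approximate 95% confidence interval: point estimate +/- 1.96 * SE.
lower = point_estimate - 1.96 * standard_error
upper = point_estimate + 1.96 * standard_error
print(round(lower), round(upper))
```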
Hypothesis Testing
Hypothesis testing is the branch of inferential statistics that is concerned with how well
the sample data support a null hypothesis and when the null hypothesis can be rejected
in favour of the alternative hypothesis.
• First note that the null hypothesis is usually the prediction that there is no relationship
in the population.
• The alternative hypothesis is the logical opposite of the null hypothesis and says
there is a relationship in the population.
• Note that it is the null hypothesis that is directly tested in hypothesis testing (not the
alternative hypothesis).
The Kenyan criminal justice system operates on the assumption that the defendant is
innocent until proven guilty beyond a reasonable doubt. In hypothesis testing, this
assumption is called the null hypothesis. That is, researchers assume that the null
hypothesis is true until the evidence suggests that it is not likely to be true. The
researcher's null hypothesis might be that a technique of counseling does not work any
better than no counseling. The researcher is kind of like a prosecuting attorney. The
prosecuting attorney brings someone to trial when he or she believes there is some
evidence against the accused, and the researcher brings a null hypothesis to "trial" when
he or she believes there is some evidence against the null hypothesis (i.e., the researcher
actually believes that the counseling technique does work better than no counseling). In
the courtroom, the jury decides what constitutes reasonable doubt, and they make a
decision about guilt or innocence. The researcher uses inferential statistics to determine
the probability of the evidence under the assumption that the null hypothesis is true. If this
probability is low, the researcher is able to reject the null hypothesis and accept the
alternative hypothesis. If this probability is not low, the researcher is not able to reject the
null hypothesis. No matter what decision is made, things are still not completely settled
because a mistake could have been made. In the courtroom, decisions of guilt or
innocence are sometimes overturned or found to be incorrect. Similarly, in research, the
decision to reject or not reject the null hypothesis is based on probability, so researchers
sometimes make a mistake. However, inferential statistics gives researchers the
probability of their making a mistake.
• In jurisprudence, the jury rejects the claim of innocence (rejects the null) in the face
of strong evidence to the contrary and makes the opposite conclusion that the
defendant is guilty. Likewise, in hypothesis testing, the researcher rejects the null
hypothesis in the face of strong evidence to the contrary.
• In short, in the procedure called hypothesis testing, the researcher states the null
and alternative hypotheses. Then if the probability value is small, the researcher
rejects the null hypothesis and goes with the alternative hypothesis and makes the
claim that statistical significance has been found.
• When you look at the table, be sure to notice that the null hypothesis has the equality
sign in it and the alternative hypothesis has the "not equals" sign in it.
You may be wondering, when do you actually reject the null hypothesis and make the
decision to tentatively accept the alternative hypothesis?
• Earlier we mentioned that you reject the null hypothesis when the probability of your
result assuming a true null is very small. That is, you reject the null when the
evidence would be unlikely under the assumption of the null.
• In particular, you set a significance level (also called the alpha level) to use in your
research study, which is the point at which you would consider a result to be very
unlikely. Then, if your probability value is less than or equal to your significance
level, you reject the null hypothesis.
• It is essential that you understand the difference between the probability value (also
called the p-value) and the significance level (also called the alpha level).
• The probability value is a number that is obtained from the SPSS computer printout.
It is based on your empirical data, and it tells you the probability of your result or a
more extreme result when it is assumed that there is no relationship in the
population (i.e., when you are assuming that the null hypothesis is true which is
what we do in hypothesis testing and in jurisprudence).
• The significance level is just that point at which you would consider a result to be
"rare." You are the one who decides on the significance level to use in your
research study. A significance level is not an empirical result; it is the level that you
set so that you will know what probability value will be small enough for you to
reject the null hypothesis.
• You must memorize the definitions of probability value and significance level right
away because they are at the heart of hypothesis testing. At the most simple level,
the process just boils down to seeing whether your probability value is less than
(or equal to) your significance level. If it is, you are happy because you can reject
the null hypothesis and make the claim of statistical significance (still don’t forget
the last step of determining practical significance).
• As the final step after conducting a hypothesis test, you must interpret your results,
make a substantive, real-world decision, and determine the practical significance
of your result.
• Statistical significance does not tell you whether you have practical significance.
• If a finding is statistically significant then you can claim that the evidence suggests
that the observed result (e.g., your observed correlation or your observed
difference between two means) was probably not just due to chance. That is, there
probably is some non-zero relation present in the population.
• An effect size indicator can aid in your determination of practical significance and
should always be examined to help interpret the strength of a statistically
significant relationship. An effect size indicator is defined as a measure of the
strength of a relationship.
• A finding is practically significant when the difference between the means or the size
of the correlation is big enough, in your opinion, to be of practical use. For example,
a correlation of .15 would probably not be practically significant, even if it was
statistically significant. On the other hand, a correlation of .85 would probably be
practically significant.
The next idea is for you to realize that you will either make a correct decision about
statistical significance or you will make an error whenever you conduct a hypothesis test.
• When the null hypothesis is true you can make the correct decision (i.e., fail to reject
the null) or you can make the incorrect decision (rejecting the true null). The
incorrect decision is called a Type I error or a "false positive" because you have
erroneously concluded that there is an effect or relationship in the population.
• When the null hypothesis is false you can also make the correct decision (i.e.,
rejecting the false null) or you can make the incorrect decision (failure to reject the
false null). The incorrect decision is called a Type II error or a "false negative"
because you have erroneously concluded that there is no effect or relationship in
the population.
• You need to memorize the definitions of Type I and Type II errors, and after working
with many examples of hypothesis testing they will become easier to ponder.
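The decision rule described above boils down to one comparison; a minimal sketch (following the convention that a p-value equal to the significance level also rejects):

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when p is less than or equal to alpha."""
    return "reject the null" if p_value <= alpha else "fail to reject the null"

print(decide(0.001))   # reject the null
print(decide(0.233))   # fail to reject the null
```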
In this last section of the chapter, I apply the process of hypothesis testing (which is also
called "significance testing") to the data set.
• Since we are now using this data set for inferential statistics, we will assume that the
25 people were randomly selected.
• Note that there are three quantitative variables and two categorical variables (can
you list them?).
• Also note that I will use the significance level of .05 for all of my statistical tests below.
(The answers to the earlier questions about the two types of errors are in the first case a
Type I error was made and in the second case a Type II error was made.)
• Before we test some hypotheses, we want to point out the reason WHY we use
hypothesis or significance testing: We do it because researchers do not want to
interpret findings that are not statistically significant because these findings are
probably nothing but a reflection of chance fluctuations.
Note that in all of the following examples we will be doing the same thing: we will get
the p-value and compare it to our preset significance level of .05 to see if the
relationship is statistically significant. Then we will also interpret the results by
looking at the data, looking at an effect size indicator, and thinking about the
practical importance of the result.
• Again, after practice, significance testing becomes very easy because you do the
same procedure every single time. Determining the practical significance is
probably the hardest part.
t-Test for Independent Samples
One frequently used statistical test is called the t-test for independent samples. We use
it when we want to determine whether the difference between the means of two groups
is statistically significant.
Here is an example of the t-test for independent samples using our recent college
graduate data set:
• Research Question: Is the average starting salary for males significantly different
from the average starting salary for females?
• Here are the hypotheses (note that they are stated in terms of population parameters):
• Null Hypothesis Ho: μM = μF (i.e., the population mean for males equals the population
mean for females)
• Alternative Hypothesis H1: μM ≠ μF (i.e., the population mean for males does not equal
the population mean for females)
The probability value was .048 (we got this off of the SPSS printout).
• Since the probability value of .048 is less than the significance level of .05, I reject
the null hypothesis and accept the alternative.
• I conclude that the difference between the two means is statistically significant.
• Now I would need to look at the actual means and interpret them for substantive and
practical significance.
• I can simply look at these means and see how different they are.
• To help in judging how different the means are, I also calculated an effect size
indicator called eta-squared which was equal to .16. This tells me that gender
explains 16% of the variance in starting salary in my data set.
• I conclude that males earn more than females, and because this is an important
issue in society, I also conclude that this difference is practically significant.
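A sketch of the computations behind such a t-test (the salary figures below are made up for illustration, not the chapter's actual data set; the critical value 2.179 is for df = 12 at the .05 level, two-tailed):

```python
from statistics import mean, variance

# Hypothetical salary samples, made up for illustration.
males = [36000, 34500, 38000, 33000, 37500, 35800, 39000]
females = [31000, 32500, 30000, 33500, 29500, 31800, 32000]

n1, n2 = len(males), len(females)
# Pooled variance for the independent-samples t-test.
sp2 = ((n1 - 1) * variance(males) + (n2 - 1) * variance(females)) / (n1 + n2 - 2)
t = (mean(males) - mean(females)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

print(round(t, 2))   # well above the critical value for these made-up data
print("significant" if abs(t) > 2.179 else "not significant")
```

In practice a statistical package (e.g., SPSS) reports the exact p-value, which is then compared to the significance level as described above.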
• Here are the hypotheses for the next test, a one-way analysis of variance comparing
mean starting salaries across three college majors (note that they are stated in
terms of population parameters):
• Null Hypothesis. Ho: μE = μA&S = μB (i.e., the population means for education students,
arts and sciences students, and business students are all the same)
• Alternative Hypothesis. H1: Not all equal (i.e., the population means are not all the
same)
The probability value was .001 (I got this from my SPSS printout).
• Since .001 is less than .05, I reject the null hypothesis and accept the alternative. I
conclude that at least two of the means are significantly different.
• The effect size indicator, eta-squared, was equal to .467, which says that almost 47
percent of the variance in starting salary was explained or accounted for by
differences in college major.
• Now we need to find out which of the three means are different.
• In order to decide which of these three means are significantly different, I must follow
the “post hoc testing” procedure explained next. Notice that if I had done an
ANOVA with an independent variable composed of only two groups, I would not
need follow-up tests (which are only needed when there are three or
more groups).
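The F statistic and eta-squared for a one-way ANOVA can be computed by hand, which makes the logic of "between-groups versus within-groups variance" concrete. The three small groups below are hypothetical, not the graduate data set; the p-value for F would again come from software or an F table.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way ANOVA over a list of groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, ss_between / (ss_between + ss_within)

education = [28, 30, 32]   # hypothetical starting salaries in $1000s
arts_sci = [31, 33, 35]
business = [35, 37, 39]
f, eta2 = one_way_anova([education, arts_sci, business])
```

For these toy groups F is 9.25 and eta-squared is about .76; a significant F would then be followed by the post hoc tests described next.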
• Education: $29,500
• Business: $36,714.29
The question in post hoc testing is "Which pairs of means are significantly different?"
In this case that results in three post hoc tests that need to be conducted:
1. First, is the difference between education and arts and sciences significantly
different?
• Here are the null and alternative hypotheses for this first post hoc test:
• Null Hypothesis Ho: μE = μA&S (i.e., the population mean for education majors
equals the population mean for arts and sciences majors)
• Alternative Hypothesis H1: μE ≠ μA&S (i.e., the population mean for education
majors does not equal the population mean for arts and sciences majors)
• The Bonferroni "adjusted" p-value, which I got off the SPSS printout, was .233.
• Since .233 is > .05, I fail to reject the null that the population means for
education and arts and sciences are equal.
2. Second, is the difference between education and business significantly different?
• Here are the null and alternative hypotheses for this second post hoc test:
• Null Hypothesis Ho: μE = μB (i.e., the population mean for education majors
equals the population mean for business majors)
• Alternative Hypothesis H1: μE ≠ μB (i.e., the population mean for education
majors does not equal the population mean for business majors)
• Since .001 is < .05, we reject the null hypothesis that the two population
means are equal.
• We make the claim that the difference between the means is statistically
significant.
• We also claim that the salaries are higher for business than for education
students in the populations from which they were randomly selected.
• Because this finding could affect many students’ choices about majors and
because it may also reflect the nature of salary setting by the private versus
public sectors, we also conclude that this difference is practically
significant.
3. Third, is the difference between arts and sciences and business significantly
different?
• Here are the null and alternative hypotheses for this third post hoc test:
• Null Hypothesis Ho: μB = μA&S (i.e., the population mean for business majors
equals the population mean for arts and sciences majors)
• Alternative Hypothesis H1: μB ≠ μA&S (i.e., the population mean for business
majors does not equal the population mean for arts and sciences majors)
• Since .031 is < .05, we reject the null hypothesis that the two population
means are equal.
• We make the claim that this difference between the means is statistically
significant.
• We also claim that the salaries are higher for business than for arts and
sciences students in the populations from which they were randomly
selected.
• Because this finding could affect students’ choices about majoring in business
versus arts and sciences, I believe that this finding is practically significant.
In short, based on my post hoc tests, I have found that two of the differences in starting
salary were statistically significant, and, in my view, these differences were also
practically significant.
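The Bonferroni adjustment used in these post hoc tests is simple arithmetic: multiply each raw pairwise p-value by the number of comparisons, capping the result at 1.0, and then compare the adjusted value against alpha as usual. The raw p-values below are invented for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each raw p times the number of tests, capped at 1."""
    k = len(p_values)
    return [min(p * k, 1.0) for p in p_values]

raw = [0.078, 0.0003, 0.010]   # hypothetical raw p-values for the three pairwise tests
adjusted = bonferroni(raw)     # each adjusted value is compared against alpha = .05
```

The adjustment makes each individual test more conservative so that the chance of at least one false rejection across all three comparisons stays near the nominal .05 level.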
t-Test for Correlation Coefficients
This test is used to determine whether a correlation coefficient is statistically significant.
• We conclude that GPA and starting salary are correlated in the population.
• If you square the correlation coefficient you obtain a “variance accounted for” effect
size indicator: .63 squared is .397, which means that almost 40 percent of the
variance in starting salary is explained or accounted for by GPA.
• Because the effect size is large and because GPA is something that students can
control through studying, we conclude that this statistically significant correlation is
also practically significant.
t-Test for Regression Coefficients
This test is used to determine whether a regression coefficient is statistically significant.
• Null Hypothesis. H0: βYX1.X2 = 0 (i.e., the population regression coefficient expressing
the relationship between starting salary and GPA, controlling for GRE Verbal, is
equal to zero; that is, there is no relationship)
• Alternative Hypothesis. H1: βYX1.X2 ≠ 0 (i.e., the population regression coefficient
expressing the relationship between starting salary and GPA, controlling for GRE
Verbal, is NOT equal to zero; that is, there IS a relationship)
• Since .035 is < .05, I conclude that the relationship expressed by this regression
coefficient is statistically significant.
• A good measure of effect size for regression coefficients is the semi-partial
correlation squared (sr²). In this case it is equal to .10, which means that 10% of
the variance in starting salary is uniquely explained by GPA.
• Because GPA is something we can control and because it explains a good
amount of variance in starting salary, we conclude that the relationship expressed
by this regression coefficient is practically significant.
• Null Hypothesis. H0: βYX2.X1 = 0 (i.e., the population regression coefficient expressing
the relationship between starting salary and GRE Verbal, controlling for GPA is
equal to zero; that is, there is no relationship)
• Alternative Hypothesis. H1 : βYX2.X1 ≠ 0 (i.e., the population regression coefficient
expressing the relationship between starting salary and GRE Verbal, controlling
for GPA is NOT equal to zero; that is, there IS a relationship)
• Since .014 is < .05, we conclude that the relationship expressed by this regression
coefficient is statistically significant.
• Because GRE Verbal is also something we can work at (as well as take preparation
programmes for) and because it explains 15% of the variance in starting
salary, we conclude that the relationship expressed by this regression coefficient
is practically significant.
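A squared semi-partial correlation like the sr² values reported above can be computed from the three pairwise correlations among Y, X1, and X2 using the standard two-predictor formula. The correlations below are toy numbers, not the chapter's GPA/GRE figures.

```python
import math

def squared_semipartial(r_y1, r_y2, r_12):
    """Squared semi-partial correlation of X1 with Y, with X1 residualized on X2.

    Standard two-predictor formula: sr1 = (r_y1 - r_y2 * r_12) / sqrt(1 - r_12**2).
    The squared value is the proportion of Y's variance uniquely explained by X1.
    """
    sr1 = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)
    return sr1 ** 2

# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2
sr2 = squared_semipartial(0.9, 0.9, 0.8)
```

Note how the formula removes from X1's correlation with Y the part it shares with X2; when the two predictors are uncorrelated (r_12 = 0), sr² reduces to the ordinary squared correlation.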
The Chi-Square Test for Contingency Tables
This test is used to determine whether a relationship observed in a contingency table is
statistically significant.
• Research Question: Is the observed relationship between college major and gender
statistically significant?
• Because the effect size indicator suggested a moderately large relationship and
because of the importance of these variables in real world politics, we would also
conclude that this relationship is practically significant.
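The chi-square statistic compares each observed cell count with the count expected if the two variables were unrelated. The 2x3 table below (gender by college major) is invented for illustration; Cramér's V is a common effect size indicator for such tables.

```python
import math

def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V effect size, ranging from 0 (no relationship) to 1."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

observed = [[10, 20, 30],   # hypothetical counts: males in education, arts & sciences, business
            [30, 20, 10]]   # females
chi2 = chi_square(observed)
v = cramers_v(chi2, 120, 2, 3)
```

For this toy table chi-square is 20.0 and Cramér's V is about .41, a moderately large relationship; the p-value for the chi-square statistic would come from software or a chi-square table with (rows − 1)(columns − 1) degrees of freedom.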
CHAPTER TEN
WRITING THE RESEARCH REPORT
Introduction
The purpose of this final chapter is to provide useful advice on how to organize and write
a research paper that has the potential for publication.
There are four main sections in this chapter:
1. General Principles Related to Writing the Research Report.
2. Writing Quantitative Research Reports Using the APA Style.
3. Writing Qualitative Research Reports.
4. Writing Mixed Research Reports.
• Simple, clear, and direct communication should be your most important goal when
you write a research report.
Language The following three guidelines will help you select appropriate language in
your report:
1. Choose accurate and clear words that are free from bias. One way to do this is to be
very specific rather than less specific.
2. Avoid labelling people whenever possible.
3. Write about your research participants in a way that acknowledges their participation.
• For example, avoid the impersonal terms "subject" or "subjects"; words such as
"research participants," "children," or "adults" are preferable.
Keeping in mind the above guidelines, you should give special attention to the following
issues which are explained more fully in our chapter and, especially, in the APA
Publication Manual:
• Sexual Orientation. Terms such as homosexual should be replaced with terms such
as lesbians, gay men, and bisexual women or men. Specific instances of sexual
behavior should be referred to with terms such as same gender, male-male,
female-female, and male-female.
• Racial and Ethnic Identity. Ask participants about their preferred designations and
use them. Capitalize racial and ethnic group designations (e.g., African American).
• Disabilities. Do not equate people with their disability. For example, refer to a
participant as a person who has cancer rather than as a cancer victim.
• Age. Acceptable terms are boy and girl, young man and young woman, male
adolescent and female adolescent. Older person is preferred to elderly. Refer to
people eighteen and older as men and women.
Editorial Style
Italics.
• As a general rule, use italics infrequently. If you are submitting a paper for
publication, you can now use italics directly rather than using underlines to signal
what is to be italicized.
Abbreviations
• Use abbreviations sparingly, and try to use conventional abbreviations (such as IQ,
e.g., cf., i.e., etc.).
Headings
• The APA Manual and our chapter specify five different levels of headings and the
combinations in which they are to be used in your report.
• If you are using two levels of headings, centre the first level and use upper and lower
case letters (i.e., do not use all caps), and place the second heading on the left
side in upper and lower case letters and in italics. Here is an example:
Method
Procedure
• If you are using three levels of headings, do the first two levels as just shown for two
levels. The third level should be in upper and lowercase letters, italicized, indented,
and ending with a period.
Method
Procedure
Instruments. (Start the text on this same line)
Quotations
• Quotations of fewer than 40 words should be inserted into the text and enclosed in
double quotation marks. Quotations of 40 or more words should be displayed in a
free standing block of lines without quotation marks. The author, year, and specific
page from which the quote is taken should always be included.
Numbers
• Use words for numbers that begin a sentence and for numbers that are below ten.
• See the APA Publication Manual for exceptions to this rule.
Physical Measurements
• APA recommends using metric units for all physical measurements. You can also
use other units, as long as you include the metric equivalent in parentheses.
Activity 29
What are some of the commonly used metric units?
• Provide enough information to allow the reader to corroborate the results. See your
book and the APA manual for specifics (e.g., an analysis of variance significance
test of four group means would be presented like this: F(3, 32) = 8.79, p = .03).
• Note that the use of an equal sign is preferred when reporting probability values.
• If a probability value is less than .001, then use p < .001 rather than p = .000
• APA format is an author-date citation method. The text shows the specifics.
• Frequently you will put references at the end of sentences. Here is an example,
“Mastery motivation has been found to affect achievement with very young children
(Turner & Johnson, 2003).”
Reference List
• Use only one space between the end of a sentence and the beginning of the next
sentence.
• Discussion of author notes, footnotes, tables, figure captions, and figures is only in
the textbook (and, of course, in the APA Publication Manual).
1. Title Page
• Your paper title should summarize the main topic of the paper in about 10 to 12
words.
2. Abstract
3. Introduction
• This section is not labelled. It should present the research problem and place it in
the context of other research literature in the area.
4. Method
• This section does not start on a separate page in a manuscript being submitted for
review.
• The most common subsections are Participants (e.g., list the number of participants,
their characteristics, and how they were selected), Apparatus or Materials or
Instruments (e.g., list materials used and how they can be obtained), and
Procedure (e.g., provide a step-by-step account of what the researcher and
participants did during the study so that someone could replicate it).
5. Results
• This is where you report the results of your data analysis and statistical significance
testing.
• Be sure to report the significance level that you are using (e.g., "An alpha level of .05
was used in this study") and report your observed effect sizes along with the tests
of statistical significance.
• Tables and figures are expensive but can be used when they effectively illustrate
your ideas.
6. Discussion
• This is where you interpret and evaluate your results presented in the previous
section.
7. References
• Centre the word References at the top of the page and double-space all entries.
Writing Qualitative Research Reports
• Title Page and Abstract. The goals are exactly the same as before. You should
provide a clear and descriptive title. The abstract should describe the key focus of
the study, its key methodological features, and the most important findings.
• Introduction. Clearly explain the purpose of your study and situate it in any research
literature that is relevant to your study. In qualitative research, research questions
will typically be stated in more open-ended and general forms such as the
researcher hopes to "discover," "explore a process," "explain or understand," or
"describe the experiences."
• Method. It is important that qualitative researchers always include this section in their
reports. This section includes information telling how the study was done, where it
was done, with whom it was done, why the study was designed as it was, how the
data were collected and analyzed, and what procedures were carried out to ensure
the validity of the arguments and conclusions made in the report.
• Results. The overriding concern when writing the results section is to provide
sufficient and convincing evidence. Remember that assertions must be backed up
with empirical data. The bottom line is this: It's about evidence.
i. You will need to find an appropriate balance between description and
interpretation in order to write a useful and convincing results section.
ii. Several specific strategies are discussed in the chapter (e.g., providing quotes,
following interpretative statements with examples, etc.).
iii. We state that regardless of the specific format of your results section, you must
always provide data (i.e., descriptions, quotes, data from multiple sources, and
so forth) that back up your assertions.
iv. Effective ways to organize the results section are organizing the content around
the research questions, a typology created in the study, the key themes, or
around a conceptual scheme used in the study.
v. It can also be very helpful to use diagrams, matrices, tables, figures, etc. to
help communicate your ideas in a qualitative research report.
• Discussion. You should state your overall conclusions and offer additional
interpretations in this section of the report. Even if your research is exploratory, it
is important to fit your findings back into the relevant research literature. You may
also make suggestions for future research here.
Writing Mixed Research Reports
• First, know your audience and write in a manner that clearly communicates.
• The suggestions already discussed in this chapter for quantitative and qualitative
also apply for mixed research.
3. Write essentially two separate subreports (one for the qualitative part and
one for the quantitative part).
4. NOTE: in all cases, if you are writing a mixed research report, mixing must
take place somewhere (e.g., at a minimum the findings must be related and
“mixed” in the discussion section).