Research Methodology and Statistical Reasoning

The document provides introductory notes on research methodology and statistical reasoning, covering essential topics such as the importance of statistics, research design, and the role of variables. It outlines different research methods, including experimental and non-experimental designs, and discusses key concepts like population, sample, hypothesis, and data analysis. Additionally, it addresses common errors, biases, and software tools used in statistical analysis.

See discussions, stats, and author profiles for this publication at:
https://www.researchgate.net/publication/336374790

Research Methodology and Statistical Reasoning Introductory Notes

Article · October 2019

Author: Johar M. Ashfaque

All content following this page was uploaded by Johar M. Ashfaque on 10 October 2019.


Research Methodology and Statistical Reasoning
Introductory Notes

Johar M. Ashfaque

Contents

1 Importance of Statistics
1.1 Introduction to Research
1.2 Population and Sample
1.3 Research Problem and Hypothesis
1.4 Variables

2 Research Methods and Design
2.1 The Experimental Method
2.2 Between Subjects or Independent Groups Design
2.3 Repeated Measures or Within Subjects Design
2.4 Complex/Factorial Designs
2.5 Non-Experimental Design
2.6 Quasi-Experimental or Natural Groups Design
2.7 Data Analyses of Observational and Descriptive Data
2.8 Case Study
2.9 Survey Research
2.10 Choosing the Right Research Method

3 The Normal Curve and its Importance in Choosing a Statistic
3.1 The Normal Curve and its Properties
3.2 Skewness, Kurtosis and Tests of Normality

4 NOIR
4.1 Nominal Scale
4.2 Ordinal Scale
4.3 Interval Scale
4.4 Ratio Scale
4.5 Concluding Remarks

5 Descriptive Statistics
5.1 Measures of Central Tendency: Mean
5.2 Measures of Central Tendency: Median, Mode
5.3 Measures of Variability: Range
5.4 Measures of Variability: Quartile Deviation
5.5 Measures of Variability: Variance
5.6 Measures of Variability: Standard Deviation

6 Inferential Statistics

7 Interpreting a Statistic
7.1 Factors related to Statistics
7.2 Effect Size and Practical Significance

8 Common Errors and Biases
8.1 Sources of Bias
8.2 Errors in Methodology

9 Software
9.1 SPSS
9.2 R
9.3 PSPP

1 Importance of Statistics

1.1 Introduction to Research

This course focuses on the basic components of research and aims at improving one's
knowledge of how to carry out research.

1.2 Population and Sample

Consider a study where participants have to learn two lists of words: words denoting
emotion and words having a non-emotional or neutral meaning. You want to study if
participants remember one category of words better than the other.
• Population: This is all the data you are interested in collecting; a population can
be large or small, as long as it covers all of your data. In the above example, your
population can be defined as any person between the ages of 16 and 40.
• Sample: This is a subset of the population you want to study; a sample must be
adequate and representative of the population it is drawn from. Using the example
above, your sample can be 300 individuals who have volunteered to be a part of
your study.
• The characteristics of a population are called parameters. E.g., standard deviation
of the population.
• Characteristics of a sample are called statistics. For instance, inferential statistics
are calculated for the sample in order to estimate population parameters.
• Sampling can be done in two ways: Probability Sampling and Non-probability sam-
pling.
• In probability sampling, every unit of the population has equal probability (or
chance) of being selected in the sample. For example, random sampling, strati-
fied sampling, and cluster sampling.

• In non-probability sampling, not all units of the population have an equal proba-
bility (or chance) of being selected in the sample. For example, quota sampling,
purposive sampling, and convenience sampling.
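The two families of sampling can be sketched in Python. The population below is purely hypothetical (1,000 people aged 16–40, three invented age bands, a 30% sampling fraction); it only illustrates the mechanics:

```python
import random

random.seed(42)  # for reproducibility

# Hypothetical population: 1000 people aged 16-40 (illustrative only)
population = [{"id": i, "age": random.randint(16, 40)} for i in range(1000)]

# Probability sampling: a simple random sample, where every unit
# has an equal chance of selection
simple_sample = random.sample(population, 300)

# Stratified sampling: split the population into strata, then sample
# the same fraction from each stratum
strata = {"16-24": [], "25-32": [], "33-40": []}
for person in population:
    if person["age"] <= 24:
        strata["16-24"].append(person)
    elif person["age"] <= 32:
        strata["25-32"].append(person)
    else:
        strata["33-40"].append(person)

stratified_sample = []
for group in strata.values():
    # take 30% of each stratum so the sample mirrors the population's structure
    k = round(len(group) * 0.3)
    stratified_sample.extend(random.sample(group, k))

print(len(simple_sample))      # 300
print(len(stratified_sample))  # roughly 300, spread across the strata
```

Both samples have about the same size, but only the stratified one is guaranteed to reflect the population's age structure.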

1.3 Research Problem and Hypothesis

• One of the first steps in research is identifying a research problem and a hypothesis.
• Hypothesis: This is a tentative explanation of a problem, which is expressed in the
form of a prediction of some outcome. For instance, it is hypothesized that words
with an emotional connotation will be better recalled than neutral words.
• Research Problem: This is framed as a question or statement that speculates about
the relationship between two variables. Unlike a hypothesis, a research problem is
exploratory in nature and does not give a prediction for the variables.

1.4 Variables

• Variable: In research, a variable is a psychological construct that takes on different
values.
• Independent Variable (IV): The variable that the researcher manipulates or selects
to determine its effect on the Dependent Variable (DV). In the above example, the
IV is the category of the words: emotional or neutral.
• In its simplest form, the IV has at least two levels. One level, where there is some
form of treatment, and another where the same treatment is absent. The levels of
the IV in our example are 2: Emotional and Neutral.
• Dependent Variable (DV): This is the measure of behaviour that is used to observe
the effects of the IV. Its outcome depends on the independent variable. Research
aims to determine whether the levels of the IV cause any difference in the DV. The
DV in our example is participants' performance on a recall test.
• Sometimes, the DV can be explained by some variable other than the IV. This
other variable is called a confounding variable. A possible confound in our example
is a participant who has already undergone a similar experiment in the past, or who
is a psychology student and hence is acquainted with the experimental hypothesis.
• When such a variable is present, it is difficult to tell whether the change in the DV
is brought about by the IV or by the confounding variable.
• The effects of these confounding variables can be managed by introducing Controls
in the experiment.
• One such control is the technique of counter-balancing, which corrects for order
effects by presenting the levels of the IV in all possible orders. In our example, we
would present half the participants with the emotional list first followed by the
neutral list, whereas the other half would get the neutral list followed by the
emotional list.
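Counter-balancing can be sketched in a few lines; the participant IDs below are hypothetical:

```python
from itertools import permutations

# The two list conditions from the running example
conditions = ["emotional", "neutral"]

# All possible presentation orders (n! orders for n conditions)
orders = list(permutations(conditions))
# [('emotional', 'neutral'), ('neutral', 'emotional')]

# Hypothetical participants, assigned to orders in rotation so that
# each order is used equally often
participants = [f"P{i}" for i in range(1, 7)]
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}
print(assignment["P1"])  # ('emotional', 'neutral')
print(assignment["P2"])  # ('neutral', 'emotional')
```

With more than two conditions the number of orders grows factorially, which is why researchers often fall back on a subset of orders (e.g., a Latin square) rather than the full set.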

2 Research Methods and Design

2.1 The Experimental Method

• An experiment involves the manipulation of one or more IVs in order to observe
their effects on one or more DVs. An experiment allows us to:
– Test a hypothesis: By allowing us to exercise controls, experiments can attempt
to eliminate extraneous factors and test a hypothesized relationship between
the IV and the DV.
– Make causal inferences between the IV and DV: Experiments also allow us to
state with a high degree of confidence that changes in the IV cause changes in
the DV.
• Three conditions are necessary in order to establish a causal link between the IV
and the DV:
– Covariation: There is an observed relationship between the IV and the DV.
– Time-order relationship: The change in the DV is observed after the IV is ma-
nipulated, implying that the changes are contingent on the manipulations of
the IV.
– Elimination of plausible alternative causes: Accomplished by the use of controls
and counter-balancing.

2.2 Between Subjects or Independent Groups Design

• Main characteristics: Each participant undergoes a different level of the IV.


• Limitation: Individual differences among participants in the different groups might
confound the results.
• For this reason the different groups are matched, i.e., the researcher ensures that the
groups are similar to each other in important characteristics that might confound
the results.
• For instance, if socio-economic status is a possible confounding factor between dif-
ferent groups, the researcher can divide the participants into low, middle, and high
income groups on the basis of reported characteristics. Each experimental group
will then consist of respondents from each of these levels, so that the groups under-
going different conditions of the IV are comparable on socio-economic status.
• These matched groups are then randomly assigned to one of the conditions or levels
of the IV. The rationale is that random assignment will neutralize or balance out
the individual differences between participants so that they are similar in all other
characteristics except for the condition or level of the IV they undergo.
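Random assignment itself can be sketched as follows; the participant IDs and condition names are hypothetical:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical participant pool and two conditions (orders of the word lists)
participants = [f"P{i}" for i in range(1, 31)]
conditions = ["emotional_first", "neutral_first"]

# Shuffle, then deal participants out to the conditions in turn, so each
# condition receives an equal-sized, randomly composed group
random.shuffle(participants)
groups = {c: [] for c in conditions}
for i, p in enumerate(participants):
    groups[conditions[i % len(conditions)]].append(p)

print({c: len(g) for c, g in groups.items()})  # 15 participants per condition
```

In practice this shuffle-and-deal step would be applied within each matched stratum, preserving the matching while still randomizing assignment.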

2.3 Repeated Measures or Within Subjects Design

• Main characteristics: Each participant undergoes all levels of the IV.


• When is it used: This design is suitable when the sample is small or specialized or
to study changes in behaviour over time (e.g., learning research). Compared to the
Between Subjects Design, individual differences do not have a confounding effect on
the DV in this design.
• Limitations: Due to repeated testing in this design, participants might show practice
effects. Their performance can either get better from one condition to the next, or
worse across conditions due to fatigue effects.
• Practice effects can be controlled by counter-balancing the order of presentation of
the IV across various conditions.

2.4 Complex/Factorial Designs

• Often, research involves more than one IV in which case a complex or factorial
design would be used.
• In this, each level of IV1 is paired with each level of IV2, so that we can observe each
IV independently (main effect) as well as in interaction with other IVs (interaction
effect).
• For example: We hypothesize that the audio-visual method of teaching leads to
better performance on a class test than the black-board method, in low ability
students. Here, we have 2 IVs: Method of teaching (manipulated IV1): Audio-
visual and black board; and Ability of the student (selected IV2): Low, Middle or
High.
• An example of a main effect would be higher than baseline test scores by audio-visual
method of teaching for all three groups of students. An interaction effect would be
a specific improvement in performance when the low ability group is taught using
the audiovisual method.
• The number of conditions in a complex design is obtained by multiplying the number
of levels of all IVs involved. Taking the above example, since IV1 has two levels and
IV2 has three levels, this design is represented as a 2 × 3 factorial design.
• Factorial designs can be of three types: completely between subjects factorial de-
signs, completely within subjects factorial designs, and mixed factorial designs.
• In a completely between subjects factorial design, different sub-groups are randomly
assigned to undergo different levels of the independent variables. For example, in a
2 × 2 factorial design, there will be 4 subgroups, one for each combination of levels.
Thus, both IVs will be between groups IVs. For instance, IV1 is gender (male,
female), and IV2 is location (urban, rural).
• In a completely within subjects factorial design, the same group of participants
will go through all the levels of the IVs. For example, in a 2 × 2 factorial design
where IV1 is stimulus (word, image), and IV2 is valence (positive, negative), all

participants will be presented with positive words, positive images, negative words,
and negative images.
• In a mixed factorial design, one IV is a between subjects variable, and the other
IV is a within subjects variable. For example, in a 2 × 2 factorial design where
IV1 is gender (male, female) and IV2 is stimulus (word, image), both male and
female participants will be presented with images and words. Thus, participants
belong to different levels of one independent variable and participate in all the
levels of the other variable. It is a combination of a completely between subjects
design and a completely within subjects design.

2.5 Non-Experimental Design

It is often not practical to observe behaviour in the strictly controlled conditions that
experimental research demands. A quasi-experimental design is one that applies an
experimental interpretation to results that do not meet all the requirements of a true
experiment. It lacks the degree of control found in true experiments: it controls some
but not all of the confounding variables, which poses a threat to its internal validity.

2.6 Quasi-Experimental or Natural Groups Design

• Main characteristic: This uses existing individual differences among participants.
Contrary to a Between-Subjects Design, where the aim is to neutralize these dif-
ferences across different conditions, the Natural Groups Design maximizes one or a
few differences among its participants.
• For example, research on brain lesions would require a select sample of participants
who already have brain lesions. Similarly, to study the effects of divorce, one would
require a group of divorced persons.

2.7 Data Analyses of Observational and Descriptive Data

• Data analyses of observational and descriptive data can involve the following:
– Data Reduction Procedures: These procedures help abstract and summarize
the data. An example of such a procedure is coding, in which the data are
broken down into smaller units and classified as per pre-decided categories.
– Content Analysis: Such an analysis can be done with archival records, written
communications, film and other media, online material like tweets or blog en-
tries. Once the source of the content analysis is decided, the next step involves
sample selection from the source (for instance, 10 random blog entries made by a
celebrity per month, over a period of 6 months). Next, the content will be coded
in order to further interpret emerging patterns.
– Descriptive Statistics can also be computed for such data on the basis of fre-
quency counts, timing, and ratings.

– When data are independently observed, rated, or analyzed by two or more
judges/raters, the inter-rater agreement can be calculated on the basis of the
percentage of agreement between the raters.
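Percentage agreement between two raters is a one-line calculation; the codes below are invented for illustration:

```python
# Hypothetical codes assigned by two raters to the same ten observations
rater_a = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
rater_b = ["pos", "neg", "neu", "neu", "pos", "neg", "neu", "pos", "pos", "pos"]

# Percentage agreement: the share of observations both raters coded alike
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(percent_agreement)  # 80.0
```

Note that raw percentage agreement does not correct for agreement expected by chance; chance-corrected indices such as Cohen's kappa exist for that purpose.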

2.8 Case Study

• Case studies are commonly used in fields such as clinical psychology, neurology,
anthropology, and criminal psychology. They describe and analyze one individual.
Case studies lack the level of control used in experimental research.
• Scales may be used in such research to measure participants’ judgements or their
relative standing on a personality trait.

2.9 Survey Research

• Survey data can be obtained through personal interviews, telephone interviews, or
internet surveys.
• Most surveys use questionnaires as a means to measure the variables of study. Ques-
tionnaires mainly provide a measure of the demographic details of the participants
along with their attitudes and preferences.
• The accuracy of a questionnaire depends on the care taken in framing its contents
and on the specificity of its language, which helps elicit unambiguous responses from
participants.

2.10 Choosing the Right Research Method

• Existing literature: review the methodology used in similar studies in the past.
• Goal of the study: an observational study would require a more qualitative method-
ology than a purely experimental one.
• Nature of the data collected: the level of measurement of the data will determine
the analyses to be performed.

3 The Normal Curve and its Importance in Choosing a Statistic

3.1 The Normal Curve and its Properties

• An important aspect of the description of a variable is the shape of its frequency
distribution, which describes the frequency of values from different ranges of the
variable. Data are normally distributed when the distribution's mean, median, and
mode are equal.

• A normally distributed curve is also called the bell curve because of its shape, where
50% of the values are below the mean and 50% of the values are above the mean.
• The exact shape of the normal distribution is defined by a function, which has two
parameters: mean and standard deviation. The standard deviation is a measure of
dispersion that demonstrates how spread out the data is.
• A characteristic property of the normal distribution is that 68% of observations fall
within a range of ±1 standard deviation from the mean; a range of ±2 standard
deviations includes 95% of the scores; and 99.7% of the observations fall within ±3
standard deviations of the mean. A standard normal distribution has mean = 0 and
standard deviation = 1.
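The 68–95–99.7 rule can be checked numerically, since for a standard normal distribution P(|Z| < k) = erf(k/√2):

```python
import math

def proportion_within(k):
    """Proportion of normally distributed values within k standard
    deviations of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {proportion_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```

The quoted 68%, 95%, and 99.7% figures are thus rounded values of these exact proportions.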

3.2 Skewness, Kurtosis and Tests of Normality

• Significant skewness and kurtosis clearly indicate that data are not normal. Skew-
ness is a measure of the lack of symmetry. Kurtosis is a measure of whether the
data are peaked or flat relative to a normal distribution.
• The skewness of the normal distribution is 0 and its kurtosis is 3 (equivalently, its
excess kurtosis is 0). If the skewness is markedly different from zero, the data are
asymmetrically distributed; if the kurtosis differs markedly from 3, the data are
more peaked or flatter than a normal distribution.
• The histogram is an effective graphical technique for showing both the skewness and
kurtosis of a data set. In statistics, normality tests are used to determine if a data
set has a normal distribution; in descriptive statistics, a goodness-of-fit test is used.
• All parametric tests follow the assumption of normal distribution while non-parametric
tests do not require data to be normally distributed. Normal distribution is also a
prerequisite for calculating confidence intervals.
• For inferential statistics, commonly used tests of normality include the Kolmogorov-
Smirnov test (N > 50) and the Shapiro-Wilk W test (N < 50).
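Skewness and kurtosis can be computed directly. A pure-Python sketch using the population formulas (formal tests such as Shapiro-Wilk live in statistical packages, not in the standard library):

```python
import statistics

def skewness(data):
    """Skewness (population formula): mean cubed deviation over SD cubed."""
    mean, sd = statistics.fmean(data), statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (len(data) * sd ** 3)

def kurtosis(data):
    """Kurtosis (population formula); equals 3 for a normal distribution."""
    mean, sd = statistics.fmean(data), statistics.pstdev(data)
    return sum((x - mean) ** 4 for x in data) / (len(data) * sd ** 4)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(skewness(symmetric))         # 0.0 — perfectly symmetric
right_skewed = [1, 1, 2, 2, 3, 10]
print(skewness(right_skewed) > 0)  # True — a long right tail
```

The symmetric data set also has kurtosis below 3, as expected for a flat, uniform-like distribution.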

4 NOIR
A factor that determines the amount of information that can be provided by a variable is
its scale of measurement. There are four basic levels of measurement, namely: nominal,
ordinal, interval, and ratio.

4.1 Nominal Scale

• The scale involves categorizing an event into one of a number of discrete categories.
This scale of measurement is used when behaviours and events are classified into
mutually exclusive categories. The variable may also be referred to as a categorical
variable.

• One observation cannot fall under more than one category. A very common way of
summarizing nominal data is by reporting the frequency in the form of proportion
or percentage in each of the categories.
• Examples of variables that would typically fall under this category are gender and
marital status. The attributes of a variable are only named, and the measure of
central tendency that can be used is the mode.

4.2 Ordinal Scale

• The second level of measurement is called the ordinal scale. This involves ranking
the events that need to be measured.
• When using an ordinal scale, the central tendency of a group of items can be de-
scribed by using the group’s mode or median.
• The interval between values is not to be interpreted in an ordinal measure.
• Other statistical ways of analyzing an ordinal scale are percentiles and rank order
correlations, but not the mean. Here the attributes of the variable can only be
ordered.
• Variables that are ordinal would be ranking of scores, brands, or products. These
rankings only give the direction of difference between variables, not the degree of
difference.

4.3 Interval Scale

• The third level of measurement is an interval scale that involves specifying how far
apart two events are on a given dimension. Interval scales specify relative size or
degree of difference between the items measured.
• In an interval measurement, the distances between attributes have meaning. An
interval scale does not have an absolute meaningful zero point.
• Intervals between categories are usually equal. Product moment correlations can be
used to measure such scales.

4.4 Ratio Scale

• The fourth level of measurement is called a ratio scale. A ratio scale has all the
properties of an interval scale, but a ratio scale also has an absolute zero point.
• In terms of arithmetic operations, a zero point makes the ratio of scale values mean-
ingful.
• Examples of variables that are commonly considered to be ratio are weight, length,
and reaction time, where zero is absolute and meaningful. (Temperature in degrees
Celsius, by contrast, is an interval variable, since its zero point is arbitrary.)

• The scale that contains the richest information about an object is ratio scaling. The
ratio scale also contains all of the information of the previous three levels.

4.5 Concluding Remarks

The four levels of measurement are very important for analyzing the results of a research
study. At the lower levels of measurement, assumptions tend to be less restrictive and
data analyses tend to be less sensitive. At each level up the hierarchy, the current level
includes all of the qualities of the one below it and adds something new. In general, it
is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a
lower one (nominal or ordinal).

5 Descriptive Statistics
Descriptive statistics allow us to (i) describe and (ii) summarize raw data. They help
make sense of vast quantities of raw data by revealing emerging patterns and trends.
These statistics do not allow us to (i) draw conclusions beyond the available data about
the population or (ii) test hypotheses. Descriptive statistics provide us with the statistics
on the basis of which inferential procedures can be conducted.

5.1 Measures of Central Tendency: Mean

These measures represent a single value that describes the data by identifying the central
position within the data.
Arithmetic Mean
• The most frequently used measure of central tendency, the mean, is the sum of all
measurements divided by the number of observations in the data set.
• The mean includes every value in the data set in its calculation. This makes the
mean a more sensitive measure than the median or mode, as even small changes in
the data set are reflected in it.
• However, the mean is susceptible to the presence of outliers in the data set. The
presence of outliers can artificially increase or decrease the value of the mean. In
such cases, the mean can be used after using an appropriate method to treat outliers,
else the median would be more representative.
• Another situation where the mean may not be the best measure to use is when the
data distribution is significantly skewed. In a normal distribution the mean, median,
and mode are the same. However, skewness drags the mean away from its central
location, making it unrepresentative of the data. In such situations, the median is a
better measure as it is less affected by skewness.

5.2 Measures of Central Tendency: Median, Mode

Median
• The median is the middle value or score in an ordered data set.
• Unlike the mean, it is less affected by outliers or skewness. As a rule of thumb, when
the data set is skewed, the median is more representative of the central tendency
than the mean.
Mode
• Mode is the most frequent value or score in a data set. It represents the most
popular alternative chosen by the sample, and is often used with categorical data.
• However, the mode is not unique to one category or option. When two or more
values or options share the same frequency the data set will have multiple modes.
In such a situation it is difficult to decide which mode is the most representative of
the data.
• The mode is also not the most representative measure when the most common score
is far away from the rest of the data.
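The three measures, and the mean's sensitivity to outliers, can be seen with Python's `statistics` module (the scores are invented):

```python
import statistics

scores = [4, 7, 7, 8, 9, 10, 55]  # invented data with one outlier, 55

print(statistics.mean(scores))    # ~14.29 — dragged upward by the outlier
print(statistics.median(scores))  # 8 — the middle of the ordered scores
print(statistics.mode(scores))    # 7 — the most frequent score
```

Without the outlier the mean would be 7.5, close to the median; the single extreme score moves it to about 14.3 while leaving the median and mode untouched.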

5.3 Measures of Variability: Range

These measures describe the spread of the data set. For instance, if the mean score of
a data set of 150 students is 60 out of 100, not all students will have scored around
this number. A measure of variability will tell us how the scores are spread across the
150 students. Measures of central tendency are usually reported along with measures of
variability to give a holistic picture of the data set. Thus, if the spread of the data
is large, it implies that the mean does not adequately represent the data. A large
spread means that there are larger differences between individual scores. Moreover, from
a research perspective a small spread is more desirable as it implies smaller variability in
our sample.
Range
• Difference between the highest and lowest score.

5.4 Measures of Variability: Quartile Deviation

• Quartiles describe the spread of the data by dividing the data set into quarters.
• Quartiles are less affected by skewness and outliers and are often reported with the
median in such cases.
• The second quartile is equivalent to the median.
• Quartiles are often reported as the interquartile range, which is the difference be-
tween the first and the third quartile.

• However, the quartiles do not take into account every score in the data set. Thus,
measures like the variance are more representative of the spread.
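Quartiles and the interquartile range can be obtained with `statistics.quantiles` (its default "exclusive" method is used here; the data are invented):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# statistics.quantiles with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 3.0 6.0 9.0

# The second quartile is the median
print(q2 == statistics.median(data))  # True

# Interquartile range: spread of the middle 50% of the data
iqr = q3 - q1
print(iqr)  # 6.0
```

Different quantile conventions (e.g., `method="inclusive"`) can give slightly different cut points on small data sets, which is worth remembering when comparing software output.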

5.5 Measures of Variability: Variance

• Variance is the average squared deviation of the scores from the mean. The devia-
tions are squared to remove any negative values.
• If the data is mostly centred around the mean, the variance will be small. Large
variance implies more spread in the data.
• Disadvantages: Because variance squares the deviations from the mean, it can give
undue weight to outliers. Variance is expressed in square units and not the direct
unit of measurement of the data set. This makes it difficult to relate the variance
to our data set.

5.6 Measures of Variability: Standard Deviation

• The SD describes how much the data set deviates from the mean. It is calculated
as the square root of the variance and symbolized by σ for the population SD and
s for the sample SD.
• Like the mean, the SD is used with continuous data and is appropriate only when
the distribution is not skewed and does not contain outliers.
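The `statistics` module distinguishes the population and sample versions of both measures; the scores below are invented:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data; the mean is 5

# Population variance and SD: divide by N (describing a whole population)
print(statistics.pvariance(scores))  # 4
print(statistics.pstdev(scores))     # 2.0

# Sample variance and SD: divide by N - 1 (estimating from a sample)
print(statistics.variance(scores))   # ~4.57
print(statistics.stdev(scores))      # ~2.14
```

The N − 1 denominator (Bessel's correction) makes the sample variance an unbiased estimate of the population variance, which is why the sample values come out slightly larger.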

6 Inferential Statistics
• Inferential statistics are used to draw conclusions and test hypotheses. When using
inferential statistics, we analyze the sample to draw relationships, causal links, or
predictions about the population we are interested in.
• Thus, an important precondition when using inferential statistics is having a repre-
sentative sample. This is achieved by using the right sampling technique. However,
sampling naturally involves sampling error, so a sample cannot be expected to rep-
resent the population perfectly.

7 Interpreting a Statistic
After choosing and applying an appropriate statistical procedure, the next step is under-
standing what the statistic is telling us about our data. Correct interpretation of statistics
takes them from being just numbers and gives them meaning and relevance in research.

7.1 Factors related to Statistics

• Magnitude or Size of the relationship: For instance, if we find that audio visual
methods of teaching are more effective in low performing students as compared
to high performing ones, the value of this relationship (as given by the chosen
statistic) is its magnitude or size. When this is large, it means that we can predict
our dependent variable based on the independent variable (at least among members
of our sample).
• Reliability or significance of the relationship: This tells us how representative our
results are for the entire population. Statistics are based on the rationale of gener-
alizing to the population, based on a representative sample. The p-value provides
this significance.

7.2 Effect Size and Practical Significance

• The effect size describes how much of the change in the DV resulted from the IV.
• An important feature of the effect size is that it is not affected by the sample size.
Thus, on finding a statistically significant relationship, checking for the effect size
confirms if the results are an artifact of a large sample size or due to a genuine
relationship.
• The effect size is expressed in terms of Cohen's d. In the research literature, effect
sizes of .20, .50 and .80 represent small, medium and large effects of the IV on the DV.
• Effect size is also expressed in terms of ‘eta-squared’. It tells us how much variation
in the DV can be attributed to the IV.
• The effect size is a measure of whether the results have practical significance beyond
being statistically significant.
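Cohen's d is the mean difference divided by the pooled standard deviation. A sketch with invented recall scores for the running word-list example:

```python
import statistics

# Hypothetical recall scores (number of words remembered) for the two lists
emotional = [14, 15, 13, 16, 15, 14, 17, 15]
neutral   = [12, 13, 11, 14, 12, 13, 12, 13]

def cohens_d(group1, group2):
    """Cohen's d: difference of means over the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

d = cohens_d(emotional, neutral)
print(round(d, 2))  # about 2.16 — well past the .80 "large" benchmark
```

Because d is scaled by the standard deviation rather than the sample size, it stays comparable across studies of different sizes, which is exactly what the bullet points above rely on.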

8 Common Errors and Biases

8.1 Sources of Bias

These refer to conditions that limit the external validity of our findings based on statistics.
• Lack of Representative Sampling: Ideally, in a representative sample, each individual
in the population has an equal chance of being selected. However, this may not
always be the case in practice.
• Assumptions of Normality: Valid interpretations can be made as long as your data
fulfil the basic assumptions of a particular statistic. One common assumption, often
disregarded, is that of normality. Even if the statistic is a robust one, it is necessary
to ensure that the data are normally distributed.
• Assumption of Independence of Observations: Most statistical tests assume that
each participant has individually undergone the IV, without suggestion or influence

of other participants or the researcher. However, when the sample is a concentrated
one, say like a class of second year psychology students, it is difficult to maintain
independence of observations.

8.2 Errors in Methodology

• Statistical Power: The power of a statistical test depends on four factors: sample
size, effect size, type I error rate (alpha value), and variability in the sample. These
factors can be used to calculate the power of the statistic being used. A low power
level should prompt you to reconsider the methodology and analyses to be used.
• Multiple Comparisons: Errors can occur when one has several variables to compare,
and can be intensified if these comparisons are done in a haphazard manner. Thus,
if we calculate correlations between every combination of our several variables,
there is a probability that some of them will appear significant spuriously, just
because of the number of correlations computed.
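One common remedy (not discussed above) is the Bonferroni correction: with m comparisons at overall alpha, test each comparison at alpha/m. A sketch with illustrative numbers:

```python
from math import comb

# Suppose we correlate every pair of 6 variables (numbers are illustrative)
n_variables = 6
m = comb(n_variables, 2)  # number of pairwise correlations: 15
alpha = 0.05

# Without correction, the expected number of spurious "significant"
# correlations under the null hypothesis:
print(m * alpha)  # 0.75

# Bonferroni: test each correlation at alpha / m to keep the overall
# family-wise error rate at alpha
per_test_alpha = alpha / m
print(per_test_alpha)  # ~0.0033
```

Bonferroni is conservative; less strict alternatives (e.g., Holm's step-down procedure) exist, but the arithmetic above shows why uncorrected multiple comparisons inflate spurious findings.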
• Measurement Error: This refers to the measurement tools employed in research
such as rating scales, questionnaires, and psychometric tests. To reduce error due
to faulty construction of these tools, checking their reliability and validity becomes
essential.

9 Software

9.1 SPSS

SPSS (Statistical Package for the Social Sciences) is a software package used for statistical
analysis. The software is also used in the health sciences and in marketing. The statistics
included in this software are descriptive statistics (cross-tabulation, frequencies), bivariate
statistics (means, t-test, ANOVA, correlation and non-parametric tests), linear regression,
factor analysis, and cluster analysis. The package also provides graphical representation
of data.

9.2 R

Unlike SPSS, R is free, open-source software that performs similar analyses. R is a
programming language rather than a menu-driven package: analyses are run by writing
code, typically through an interface such as RStudio.

9.3 PSPP

Another available software package is PSPP. It provides similar statistical measurements
to the previous two, and was originally introduced with the aim of being a free, open-
source replacement for SPSS.
