MPC 006
SECTION A
Answer the following questions in about 1000 words (wherever applicable) each
15x3=45 Marks
1. Compare between parametric and nonparametric statistics. Discuss in detail any two
nonparametric techniques.
2. The scores obtained by three groups of students on Self Concept Scale are given below.
Compute ANOVA for the same.
Group A 34 43 22 66 44 34 44 77 77 33
Group B 34 22 34 44 65 67 43 35 57 87
Group C 34 33 22 58 56 54 56 66 77 78
SECTION B
Answer the following questions in about 400 words (wherever applicable) each
5x5=25 Marks
1. Describe the concept and importance of normal probability curve.
2. Compute Pearson's product moment correlation for the following data:
Data 1 10 7 5 6 3 6 8 2 9 10
Data 2 2 1 9 4 9 5 4 9 5 4
3. With the help of Mann Whitney U test, find whether a significant difference exists between the
scores on the Organisational Commitment Scale obtained by public and private bank
employees.
Public Bank Employees: 10, 12, 21, 23, 34, 45, 32, 23, 34, 23
Private Bank Employees: 34, 54, 56, 43, 32, 23, 34, 32, 33, 44, 32, 34, 32
4. Explain Two-way Analysis of Variance with a focus on its merits and demerits.
5. Compute Chi-square for the following data:
High Low
Early adolescents 34 43
Late adolescents 45 44
SECTION C
1. Tabulation
2. Interval estimation
3. Level of significance
4. Direction of correlation
5. Partial correlation
6. Multiple regression
ASSIGNMENT SOLUTIONS GUIDE (2020-21)
MPC-006
STATISTICS IN PSYCHOLOGY
CODE: MPC-006 /TMA/2020-21
Disclaimer/Special Note: These are just the sample of the Answers/Solutions to some of the Questions
given in the Assignments. These Sample Answers/Solutions are prepared by Private
Teacher/Tutors/Authors for the help and guidance of the student to get an idea of how he/she can
answer the Questions given the Assignments. We do not claim 100% accuracy of these sample
answers as these are based on the knowledge and capability of Private Teacher/Tutor. Sample
answers may be seen as the Guide/Help for the reference to prepare the answers of the Questions
given in the assignment. As these solutions and answers are prepared by the private teacher/tutor so
the chances of error or mistake cannot be denied. Any Omission or Error is highly regretted though
every care has been taken while preparing these Sample Answers/Solutions. Please consult your own
Teacher/Tutor before you prepare a Particular Answer and for up-to-date and exact information, data
and solution. Students must read and refer to the official study material provided by the
university.
SECTION-A
Answer the following questions in about 1000 words (wherever applicable) each.
Q1. Compare between parametric and nonparametric statistics. Discuss in detail any two
Nonparametric techniques.
Ans.
Q2. The scores obtained by three groups of students on Self-Concept Scale are given below.
Compute ANOVA for the same:
Ans.
Try to solve this yourself:
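The computation can be sketched in Python; the group scores below are taken from the question, and the partitioning of sums of squares is the standard one-way ANOVA procedure (the code is only an illustration, not part of the official solution).

```python
# One-way ANOVA for the three Self Concept Scale groups (scores from Q2).
group_a = [34, 43, 22, 66, 44, 34, 44, 77, 77, 33]
group_b = [34, 22, 34, 44, 65, 67, 43, 35, 57, 87]
group_c = [34, 33, 22, 58, 56, 54, 56, 66, 77, 78]
groups = [group_a, group_b, group_c]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups SS: squared deviations of group means, weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: squared deviations of scores from their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1                # k - 1 = 2
df_within = len(all_scores) - len(groups)   # N - k = 27

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"SS_between = {ss_between:.2f}, SS_within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

This yields F(2, 27) ≈ 0.27, well below the tabled critical value of about 3.35 at the .05 level, so the three groups do not differ significantly on the Self Concept Scale.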
The sample data must provide sufficient evidence to reject the null hypothesis and conclude that the
effect exists in the population. Ideally, a hypothesis test fails to reject the null hypothesis when the
effect is not present in the population, and it rejects the null hypothesis when the effect exists.
Statisticians define two types of errors in hypothesis testing. Creatively, they call these errors Type I
and Type II errors. Both types of error relate to incorrect conclusions about the null hypothesis.
The table summarizes the four possible outcomes for a hypothesis test.
                    Test Rejects Null       Test Fails to Reject Null
Null is True        Type I Error            Correct decision
                    (False Positive)        (No effect)
Null is False       Correct decision        Type II Error
                    (Effect exists)         (False Negative)
Fire alarm analogy for the types of errors: A fire alarm provides a good
analogy for the types of hypothesis testing errors. Preferably, the alarm rings when there is a fire and
does not ring in the absence of a fire. However, if the alarm rings when there is no fire, it is a false
positive, or a Type I error in statistical terms. Conversely, if the fire alarm fails to ring when there is a
fire, it is a false negative, or a Type II error.
Using hypothesis tests correctly improves your chances of drawing trustworthy conclusions.
However, errors are bound to occur.
Unlike the fire alarm analogy, there is no sure way to determine whether an error occurred after you
perform a hypothesis test. Typically, a clearer picture develops over time as other researchers conduct
similar studies and an overall pattern of results appears. Seeing how your results fit in with similar
studies is a crucial step in assessing your study’s findings.
Now, let’s take a look at each type of error in more depth.
Type I Errors: False Positives: When you see a p-value that is less than your significance level, you
get excited because your results are statistically significant. However, it could be a type I error. The
supposed effect might not exist in the population. Again, there is usually no warning when this
occurs.
It comes down to sample error. Your random sample has overestimated the effect by chance. It was
the luck of the draw. This type of error doesn’t indicate that the researchers did anything wrong. The
experimental design, data collection, data validity, and statistical analysis can all be correct, and yet
this type of error still occurs.
Even though we don’t know for sure which studies have false positive results, we do know their rate
of occurrence. The rate of occurrence for Type I errors equals the significance level of the hypothesis
test, which is also known as alpha (α).
The significance level is an evidentiary standard that you set to determine whether your sample data
are strong enough to reject the null hypothesis. Hypothesis tests define that standard using the
probability of rejecting a null hypothesis that is actually true. You set this value based on your
willingness to risk a false positive.
Using the significance level to set the Type I error rate: When the significance level is 0.05 and the
null hypothesis is true, there is a 5% chance that the test will reject the null hypothesis incorrectly. If
you set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1% seems even better, right?
As you’ll see, there is a tradeoff between Type I and Type II errors. If you hold everything else
constant, as you reduce the chance for a false positive, you increase the opportunity for a false
negative.
Type I errors are relatively straightforward. The math is beyond the scope of this article,
but statisticians designed hypothesis tests to incorporate everything that affects this error rate so that
you can specify it for your studies. As long as your experimental design is sound, you collect valid
data, and the data satisfy the assumptions of the hypothesis test, the Type I error rate equals the
significance level that you specify. However, if there is a problem in one of those areas, it can affect
the false positive rate.
Warning about a potential misinterpretation of Type I errors and the Significance Level: When the
null hypothesis is correct for the population, the probability that a test produces a false positive
equals the significance level. However, when you look at a statistically significant test result, you
cannot state that there is a 5% chance that it represents a false positive.
Why is that the case? Imagine that we perform 100 studies on a population where the null hypothesis
is true. If we use a significance level of 0.05, we’d expect that five of the studies will produce
statistically significant results—false positives. Afterward, when we go to look at those significant
studies, what is the probability that each one is a false positive? Not 5 percent but 100%!
That scenario also illustrates a point that I made earlier. The true picture becomes more evident after
repeated experimentation. Given the pattern of results that are predominantly not significant, it is
unlikely that an effect exists in the population.
Type II Errors: False Negatives: When you perform a hypothesis test and your p-value is greater than
your significance level, your results are not statistically significant. That’s disappointing because your
sample provides insufficient evidence for concluding that the effect you’re studying exists in the
population. However, there is a chance that the effect is present in the population even though the
test results don’t support it. If that’s the case, you’ve just experienced a Type II error. The probability
of making a Type II error is known as beta (β).
What causes Type II errors? Whereas Type I errors are caused by one thing, sample error, there are a
host of possible reasons for Type II errors—small effect sizes, small sample sizes, and high data
variability. Furthermore, unlike Type I errors, you can’t set the Type II error rate for your analysis.
Instead, the best that you can do is estimate it before you begin your study by approximating
properties of the alternative hypothesis that you’re studying. When you do this type of estimation, it’s
called power analysis.
To estimate the Type II error rate, you create a hypothetical probability distribution that represents
the properties of a true alternative hypothesis. However, when you’re performing a hypothesis test,
you typically don’t know which hypothesis is true, much less the specific properties of the
distribution for the alternative hypothesis. Consequently, the true Type II error rate is usually
unknown!
Type II errors and the power of the analysis: The Type II error rate (beta) is the probability of a false
negative. Therefore, the inverse of Type II errors is the probability of correctly detecting an effect.
Statisticians refer to this concept as the power of a hypothesis test. Consequently, 1 – β = the statistical
power. Analysts typically estimate power rather than beta directly.
If you read my post about power and sample size analysis, you know that the three factors that affect
power are sample size, variability in the population, and the effect size. As you design your
experiment, you can enter estimates of these three factors into statistical software and it calculates the
estimated power for your test.
Suppose you perform a power analysis for an upcoming study and calculate an estimated power of
90%. For this study, the estimated Type II error rate is 10% (1 – 0.9). Keep in mind that variability and
effect size are based on estimates and guesses. Consequently, power and the Type II error rate are just
estimates rather than something you set directly. These estimates are only as good as the inputs
into your power analysis. Of course, when you perform the hypothesis test, you don't know which
hypothesis is correct, and the properties of the distribution for the alternative hypothesis are
usually unknown. Keep the general nature of these two errors, and how they trade off against each
other, in mind when interpreting your results.
As you’ve seen, the nature of the two types of error, their causes, and the certainty of their rates of
occurrence are all very different.
A common question is whether one type of error is worse than the other. Statisticians designed
hypothesis tests to control Type I errors while Type II errors are much less defined. Consequently,
many statisticians state that it is better to fail to detect an effect when it exists than it is to conclude an
effect exists when it doesn’t. That is to say, there is a tendency to assume that Type I errors are worse.
However, reality is more complex than that. You should carefully consider the consequences of each
type of error for your specific test.
Suppose you are assessing the strength of a new jet engine part that is under consideration. People's
lives are riding on the part's strength. A false negative in this scenario merely means that the part is
strong enough but the test fails to detect it. This situation does not put anyone’s life at risk. On the
other hand, Type I errors are worse in this situation because they indicate the part is strong enough
when it is not.
Now suppose that the jet engine part is already in use but there are concerns about it failing. In this
case, you want the test to be more sensitive to detecting problems even at the risk of false positives.
Type II errors are worse in this scenario because the test fails to recognize the problem and leaves
these problematic parts in use for longer.
SECTION-B
Answer the following questions in about 400 words (wherever applicable) each
2. The normal curve is unimodal: Since there is only one point in the curve which has
maximum frequency, the normal probability curve is unimodal, i.e. it has only one mode.
3. Mean, median and mode coincide: The mean, median and mode of the normal distribution
are the same and they lie at the centre. They are represented by 0 (zero) along the base line.
[Mean = Median = Mode]
4. The maximum ordinate occurs at the centre: The maximum height of the ordinate always
occurs at the central point of the curve that is, at the mid-point. The ordinate at the mean is
the highest ordinate and it is denoted by Y0. (Y0 is the height of the curve at the mean or mid-
point of the base line).
5. The normal curve is asymptotic to the X-axis: The Normal Probability Curve approaches the
horizontal axis asymptotically i.e., the curve continues to decrease in height on both ends
away from the middle point (the maximum ordinate point); but it never touches the
horizontal axis.
It extends infinitely in both directions i.e. from minus infinity (-∞) to plus infinity (+∞) as
shown in Figure below. As the distance from the mean increases the curve approaches to the
base line more and more closely.
6. The height of the curve declines symmetrically: In the normal probability curve the height
declines symmetrically in either direction from the maximum point. Hence the ordinates for
values of X = µ ± K, where K is a real number, are equal.
For example: The heights of the curve or the ordinate at X = µ + σ and X = µ – σ are exactly
the same as shown in the following Figure:
m
.co
7. The points of inflexion occur at ±1 standard deviation (±1σ): The normal curve changes
its direction from convex to concave at points recognized as points of inflexion. If we draw
perpendiculars from these two points of inflexion of the curve to the horizontal axis, they will
touch the axis at a distance of one standard deviation unit above and below the mean (±1σ).
8. The total percentage of area of the normal curve within the two points of inflexion is
fixed: Approximately 68.26% of the area of the curve falls within the limits of ±1 standard
deviation from the mean.
9. Normal curve is a smooth curve: The normal curve is a smooth, bell-shaped curve, not a histogram.
10. The normal curve is bilateral: The 50% area of the curve lies to the left side of the maximum
central ordinate and 50% lies to the right side. Hence the curve is bilateral.
11. The normal curve is a mathematical model in behavioural sciences: The curve is used as a
measurement scale. The measurement unit of this scale is ± σ (the unit standard deviation).
12. Greater percentage of cases at the middle of the distribution: There is a greater percentage
of cases at the middle of the distribution. Between -1σ and +1σ lie 68.26% (34.13 + 34.13) of the
cases, nearly 2/3 of the total. To the right of +1σ lie 15.87% (13.59 + 2.14 + 0.14) of the cases,
and to the left of -1σ another 15.87% (13.59 + 2.14 + 0.14). Beyond +2σ lie 2.28% of the cases,
and beyond -2σ another 2.28%. Thus the majority of cases lie at the middle of the distribution,
and the number of cases tapers off gradually towards the two extremes.
Percentages of cases between the mean and different σ distances can be read from a normal-curve
area table.
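The area figures quoted above can be verified with Python's standard library; `NormalDist` here is the standard normal curve (mean 0, sd 1), so its cumulative areas correspond directly to the σ distances in the text.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sd 1

within_1sd = z.cdf(1) - z.cdf(-1)    # area between -1 sigma and +1 sigma
mean_to_1sd = z.cdf(1) - z.cdf(0)    # area between the mean and +1 sigma
beyond_2sd = 1 - z.cdf(2)            # area beyond +2 sigma

print(f"Within ±1σ     : {within_1sd:.4f}")    # ≈ 0.6827, i.e. about 68.26%
print(f"Mean to +1σ    : {mean_to_1sd:.4f}")   # ≈ 0.3413, i.e. 34.13%
print(f"Beyond +2σ     : {beyond_2sd:.4f}")    # ≈ 0.0228, i.e. 2.28%
```

These match the 34.13%, 68.26%, and 2.28% figures cited in the characteristics above.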
The equation of the normal probability curve is:
y = [N / (σ √(2π))] e^(−x² / 2σ²)
in which
x = scores (expressed as deviations from the mean) laid off along the base line or X-axis;
y = the height of the curve above the X-axis, i.e., the frequency of a given x-value.
The other terms in the equation are constants:
N = number of cases;
σ = standard deviation of the distribution;
π = 3.1416 (the ratio of the circumference of a circle to its diameter);
e = 2.7183 (the base of the Napierian system of logarithms).
15. The normal curve is based on elementary principles of probability and the other name of the
normal curve is the ‘normal probability curve’.
Q2. Compute Pearson's product moment correlation for the following data:
Ans.
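The data pairs are taken from the question; the computation below uses the standard raw-score (computational) formula for Pearson's r, sketched in Python as an illustration.

```python
import math

x = [10, 7, 5, 6, 3, 6, 8, 2, 9, 10]   # Data 1
y = [2, 1, 9, 4, 9, 5, 4, 9, 5, 4]     # Data 2
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Raw-score formula: r = (n·Σxy − Σx·Σy) / sqrt[(n·Σx² − (Σx)²)(n·Σy² − (Σy)²)]
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.3f}")
```

The result, r ≈ −0.77, indicates a strong negative correlation between the two sets of scores.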
Q3. With the help of Mann Whitney U test, find whether a significant difference exists between the
scores on the Organisational Commitment Scale obtained by public and private bank employees.
Ans. When you choose to analyse your data using a Mann-Whitney U test, part of the process
involves checking to make sure that the data you want to analyse can actually be analysed using a
Mann-Whitney U test. You need to do this because it is only appropriate to use a Mann-Whitney U
test if your data "passes" four assumptions that are required for a Mann-Whitney U test to give you a
valid result. In practice, checking for these four assumptions just adds a little bit more time to your
analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis,
as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these four assumptions, do not be surprised if, when analysing your own
data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not
uncommon when working with real-world data rather than textbook examples, which often only
show you how to carry out a Mann-Whitney U test when everything goes well! However, don’t
worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First,
let’s take a look at these four assumptions:
• Assumption #1: Your dependent variable should be measured at the ordinal or continuous
level. Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly
agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 5-
point scale explaining how much a customer liked a product, ranging from "Not very much"
to "Yes, a lot"). Examples of continuous variables include revision time (measured in hours),
intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight
(measured in kg), and so forth. You can learn more about ordinal and continuous variables in
our article: Types of Variable.
• Assumption #2: Your independent variable should consist of two categorical, independent
groups. Example independent variables that meet this criterion include gender (2 groups:
male or female), employment status (2 groups: employed or unemployed), smoker (2 groups:
yes or no), and so forth.
• Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. For
example, there must be different participants in each group with no participant being in more
than one group. This is more of a study design issue than something you can test for, but it is
an important assumption of the Mann-Whitney U test. If your study fails this assumption,
you will need to use another statistical test instead of the Mann-Whitney U test (e.g., a
Wilcoxon signed-rank test). If you are unsure whether your study meets this assumption, you
can use our Statistical Test Selector, which is part of our enhanced content.
• Assumption #4: A Mann-Whitney U test can be used when your two variables are not
normally distributed. However, in order to know how to interpret the results from a Mann-
Whitney U test, you have to determine whether your two distributions (i.e., the distribution
of scores for both groups of the independent variable; for example, 'males' and 'females' for
the independent variable, 'gender') have the same shape. To understand what this means,
take a look at the diagram below:
In the two diagrams above, the distributions of scores for 'males' and 'females' have the same shape. In
the diagram on the left, you cannot see the distribution of scores for 'males' (illustrated in blue on the
diagram on the right) because the two distributions are identical (i.e., both distributions are identical,
so they are 'on top of each other' in the diagram, with the blue-coloured male distribution underneath
the red-coloured female distribution). However, in the diagram on the right, even though both
distributions have the same shape, they have a different location (i.e., the distribution of one of the
groups of the independent variable has higher or lower values compared to the second distribution –
in our example, females have 'higher' values than males, overall).
When you analyse your own data, it is extremely unlikely that your two distributions will be
identical, but they may have the same (or a 'similar') shape. If they do have the same shape, you can
use SPSS Statistics to carry out a Mann-Whitney U test to compare the medians of your dependent
variable (e.g., engagement score) for the two groups (e.g., males and females) of the independent
variable (e.g., gender) you are interested in. However, if your two distributions have a different shape,
you can only use the Mann-Whitney U test to compare mean ranks.
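The U statistic for the bank data in this question can be computed directly; the code below is an illustrative sketch using the standard mid-rank procedure for ties (the significance decision still requires a tabled critical value).

```python
public  = [10, 12, 21, 23, 34, 45, 32, 23, 34, 23]
private = [34, 54, 56, 43, 32, 23, 34, 32, 33, 44, 32, 34, 32]

combined = sorted(public + private)

def mid_rank(value):
    """Average rank of `value` in the combined sample (tied scores share mid-ranks)."""
    positions = [i + 1 for i, v in enumerate(combined) if v == value]
    return sum(positions) / len(positions)

n1, n2 = len(public), len(private)
r1 = sum(mid_rank(v) for v in public)      # rank sum of the public-bank group

# U from the rank sum; U1 and U2 always add up to n1 * n2.
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 - u1
u = min(u1, u2)

print(f"R1 = {r1}, U = {u}")
```

This gives U = 30.5. Against the two-tailed .05 critical value for n1 = 10, n2 = 13 (commonly tabled as 33), the obtained U is smaller, suggesting a significant difference between public and private bank employees; check the value in your own U table before concluding.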
Q4. Explain Two-way Analysis of Variance with a focus on its merits and demerits.
Ans. ANOVA stands for analysis of variance and tests for differences in the effects of independent
variables on a dependent variable. A two-way ANOVA test is a statistical test used to determine the
effect of two nominal predictor variables on a continuous outcome variable.
A two-way ANOVA tests the effect of two independent variables on a dependent variable. A two-
way ANOVA test analyzes the effect of the independent variables on the expected outcome along
with their relationship to the outcome itself. Random factors would be considered to have no
statistical influence on a data set, while systematic factors would be considered to have statistical
significance.
By using ANOVA, a researcher is able to determine whether the variability of the outcomes is due to
chance or to the factors in the analysis. ANOVA has many applications in finance, economics, science,
medicine, and social science.
Understanding 2-Way ANOVA: An ANOVA test is the first step in identifying factors that influence
a given outcome. Once an ANOVA test is performed, a tester may be able to perform further analysis
on the systematic factors that are statistically contributing to the data set's variability.
A two-way ANOVA test reveals the results of two independent variables on a dependent variable.
ANOVA test results can then be used in an F-test, a statistical test used to determine whether two
populations with normal distributions share variances or a standard deviation, on the significance of
the regression formula overall.
Analysis of variances is helpful for testing the effects of variables on one another. It is similar to
multiple two-sample t-tests. However, it results in fewer type 1 errors and is appropriate for a range
of issues. An ANOVA test groups differences by comparing the means of each group and includes
spreading out the variance across diverse sources. It is employed with subjects, test groups, between
groups and within groups.
ANOVA vs. 2-Way ANOVA: There are two main types of analysis of variance: one-way (or
unidirectional) and two-way (bidirectional). One-way or two-way refers to the number of
independent variables in your analysis of variance test. A one-way ANOVA evaluates the impact of a
sole factor on a sole response variable. It determines whether the observed differences between the
means of independent (unrelated) groups are explainable by chance alone, or whether there are any
statistically significant differences between groups.
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one
independent variable affecting a dependent variable. With a two-way ANOVA, there are two
independent variables. For example, a two-way ANOVA allows a company to compare worker productivity
based on two independent variables, such as department and gender. It is utilized to observe the
interaction between the two factors. It tests the effect of two factors at the same time.
A three-way ANOVA, also known as three-factor ANOVA, is a statistical means of determining the
effect of three factors on an outcome.
Q5. Compute Chi-square for the given data.
Ans. Try to work this out yourself.
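A sketch of the computation for the 2×2 table given in the question paper (observed frequencies from the question; the procedure is the standard expected-frequency method):

```python
# Observed frequencies: rows = early/late adolescents, columns = high/low.
observed = [[34, 43],
            [45, 44]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n   # E = (row total x col total) / N
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)      # (rows - 1)(cols - 1) = 1
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

This gives χ² ≈ 0.68 with df = 1, far below the .05 critical value of 3.84, so the association between adolescent stage and the high/low classification is not significant.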
SECTION-C
Answer the following in about 50 words each
Q1. Tabulation
Ans. Tabulation is a systematic & logical presentation of numeric data in rows and columns to
facilitate comparison and statistical analysis. It facilitates comparison by bringing related information
close to each other and helps in further statistical analysis and interpretation.
In other words, the method of placing organised data into a tabular form is called tabulation. It
may be complex, double or simple depending upon the nature of categorisation.
5 Major Objectives of Tabulation:
(1) To Simplify the Complex Data:
• It reduces the bulk of information, i.e. raw data in a simplified and meaningful form
(4) To Facilitate Statistical Analysis:
• Tables serve as the best source of organised data for further statistical analysis.
• The task of computing average, dispersion, correlation, etc. becomes easier if data is
presented in the form of a table.
(5) Saving of Space:
• A table presents facts in a better way than the textual form.
• It saves space without sacrificing the quality and quantity of data.
The correlation requires two scores from the same individuals. These scores are normally identified as
X and Y. The pairs of scores can be listed in a table or presented in a scatterplot.
Example: We might be interested in the correlation between your SAT-M scores and your GPA at
UNC.
Here are the Math SAT scores and the GPA scores of 13 of the students in this class, and the
scatterplot for all 41 students:
The scatterplot has the X values (GPA) on the horizontal (X) axis, and the Y values (MathSAT) on the
vertical (Y) axis. Each individual is identified by a single point (dot) on the graph which is located so
that the coordinates of the point (the X and Y values) match the individual's X (GPA) and Y
(MathSAT) scores.
For example, the student named "Obs5" (in the sixth row of the datasheet) has GPA = 2.30 and
MathSAT = 710. This student is represented in the scatterplot by the highlighted and labelled ("5")
dot in the upper-left part of the scatterplot, level with MathSAT = 710 on the vertical axis and
above GPA = 2.30 on the horizontal axis.
This image shows the relationship between diet and weight loss. The shaded region would be the
correlation between these two variables.
This image shows the relationship between exercise and weight loss. The orange region represents the
correlation between the two variables.
This is the image that represents what we are after. Notice that the grey region represents the
overall correlation of the three variables. The orange and green shaded regions are still there;
these indicate the partial correlations that still explain some of the weight loss, while the grey
region represents what the two variables jointly explain about weight loss.
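The first-order partial correlation that this diagram describes has a simple formula: the correlation of X and Y with a third variable Z held constant. The r values below are hypothetical, used only to show the mechanics.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z: correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations among weight loss (X), diet (Y), exercise (Z).
r_xy, r_xz, r_yz = 0.60, 0.50, 0.40

print(f"r_xy.z = {partial_r(r_xy, r_xz, r_yz):.3f}")
```

Here the partial correlation (≈ 0.50) is smaller than the zero-order r of 0.60, because part of the apparent X–Y relationship was carried by the shared variable Z.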
Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of probability in
the tails. The value is often compared to the kurtosis of the normal distribution, which is equal to
3. If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more
in the tails). If the kurtosis is less than 3, then the dataset has lighter tails than a normal distribution
(less in the tails). Careful here. Kurtosis is sometimes reported as “excess kurtosis.” Excess kurtosis
is determined by subtracting 3 from the kurtosis. This makes the normal distribution kurtosis equal
0. Kurtosis originally was thought to measure the peakedness of a distribution. Though you will still
see this as part of the definition in many places, this is a misconception.
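The moment definition of kurtosis can be computed directly. The sample below is hypothetical, chosen only to illustrate a flat-ish distribution whose kurtosis falls below the normal value of 3.

```python
def kurtosis(data):
    """Population kurtosis m4 / m2^2 (equals 3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

sample = [1, 2, 3, 4, 5]   # hypothetical, evenly spread scores
k = kurtosis(sample)
print(f"kurtosis = {k}, excess kurtosis = {k - 3}")   # negative excess: lighter tails
```

The negative excess kurtosis confirms the point in the text: this flat distribution has less probability in its tails than a normal distribution.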
Q8. Levels of measurement
Ans. The level of measurement refers to the relationship among the values that are assigned to the
attributes for a variable. What does that mean? Begin with the idea of the variable, in this example
“party affiliation.”
That variable has a number of attributes. Let’s assume that in this particular election context the only
relevant attributes are “republican”, “democrat”, and “independent”. For purposes of analyzing the
results of this variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes. The level of
measurement describes the relationship among these three values. In this case, we simply are using the
numbers as shorter placeholders for the lengthier text terms. We don’t assume that higher values
mean “more” of something and lower numbers signify “less”. We don’t assume that the value
of 2 means that democrats are twice something that republicans are. We don’t assume that
republicans are in first place or have the highest priority just because they have the value of 1. In this
case, we only use the values as a shorter name for the attribute. Here, we would describe the level of
measurement as “nominal”.
ordinal scale. It is considered a nonparametric alternative to the Pearson’s product-moment
correlation when your data has failed one or more of the assumptions of this test. It is also considered
an alternative to the nonparametric Spearman rank-order correlation coefficient (especially when you
have a small sample size with many tied ranks). If you consider one of your variables as an
independent variable and the other as a dependent variable, you might consider running a Somers'
d test instead.
For example, you could use Kendall's tau-b to understand whether there is an association between
exam grade and time spent revising (i.e., where there were six possible exam grades – A, B, C, D, E
and F – and revision time was split into five categories: less than 5 hours, 5-9 hours, 10-14 hours, 15-19
hours, and 20 hours or more). Alternately, you could use Kendall's tau-b to understand whether there
is an association between customer satisfaction and delivery time (i.e., where delivery time had four
categories – next day, 2 working days, 3-5 working days, and more than 5 working days – and
customer satisfaction was measured in terms of the level of agreement customers had with the
following statement, "I am satisfied with the time it took for my parcel to be delivered", where the
level of agreement had five categories: strongly agree, agree, neither agree nor disagree, disagree and
strongly disagree).
This "quick start" guide shows you how to carry out Kendall's tau-b using SPSS Statistics. We show
you the main procedure to carry out Kendall's tau-b in the Procedure section. However, first we
introduce you to the assumptions that you must consider when carrying out Kendall's tau-b.
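Outside SPSS, the tau-b statistic itself is straightforward to compute. This sketch counts concordant and discordant pairs and applies the tie correction in the denominator; the grade and revision-time codings below are hypothetical ordinal data in the spirit of the examples above.

```python
from itertools import combinations
from collections import Counter
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b for two ordinal variables, correcting for tied ranks."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied on x or on y contribute to neither count

    n = len(x)
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in Counter(x).values())  # tied pairs in x
    n2 = sum(t * (t - 1) / 2 for t in Counter(y).values())  # tied pairs in y
    return (concordant - discordant) / sqrt((n0 - n1) * (n0 - n2))

# Hypothetical ordinal data: exam grade (A=5 ... E=1) vs revision-time category.
grades   = [5, 4, 4, 3, 2, 1]
revision = [5, 5, 3, 3, 2, 1]
print(f"tau-b = {kendall_tau_b(grades, revision):.3f}")
```

With many tied ranks, as here, the tie correction keeps the coefficient within the -1 to +1 range, which is exactly why tau-b is preferred over the simple tau-a for small ordinal samples.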