
STATISTICS IN PSYCHOLOGY (MPC-006)

TUTOR MARKED ASSIGNMENT (TMA)

Course Code: MPC-006


Assignment Code: MPC-006/AST/TMA/2020-2021
Marks: 100
NOTE: All Questions Are Compulsory.
The answers are to be written in your own words. Do not copy from the course material
or any other source.

SECTION A

Answer the following questions in about 1000 words (wherever applicable) each
15x3=45 Marks

1. Compare between parametric and nonparametric statistics. Discuss in detail any two
nonparametric techniques.
2. The scores obtained by three groups of students on Self Concept Scale are given below.
Compute ANOVA for the same.
Group A 34 43 22 66 44 34 44 77 77 33

Group B 34 22 34 44 65 67 43 35 57 87

Group C 34 33 22 58 56 54 56 66 77 78

3. Describe hypothesis testing with a focus on errors in hypothesis testing.

SECTION B

Answer the following questions in about 400 words (wherever applicable) each
5x5=25 Marks
1. Describe the concept and importance of normal probability curve.
2. Using Pearson’s product moment correlation for the following data:

Data 1 10 7 5 6 3 6 8 2 9 10

Data 2 2 1 9 4 9 5 4 9 5 4
3. With the help of Mann Whitney U test find if significant difference exists between the
scores obtained on Organisational Commitment Scale obtained by public and private bank
employees.

Scores on Organisational Commitment Scale

Public Bank Employees     10, 12, 21, 23, 34, 45, 32, 23, 34, 23

Private Bank Employees    34, 54, 56, 43, 32, 23, 34, 32, 33, 44, 32, 34, 32

4. Explain Two-way Analysis of Variance with a focus on its merits and demerits.
5. Compute Chi-square for the following data:

Phases of Adolescence      Achievement Motivation Scores
                           High     Low
Early adolescents           34       43
Late adolescents            45       44

SECTION C

Answer the following in about 50 words each 10x3=30 Marks

1. Tabulation
2. Interval estimation
3. Level of significance
4. Direction of correlation
5. Partial correlation
6. Multiple regression

7. Skewness and Kurtosis


8. Levels of measurement
9. Measures of dispersion
10. Kendall’s tau

ASSIGNMENT SOLUTIONS GUIDE (2020-21)

MPC-006
STATISTICS IN PSYCHOLOGY
CODE: MPC-006 /TMA/2020-21
Disclaimer/Special Note: These are only sample Answers/Solutions to some of the Questions given in
the Assignments. These Sample Answers/Solutions are prepared by private Teachers/Tutors/Authors
for the help and guidance of the student, to give an idea of how he/she can answer the Questions
given in the Assignments. We do not claim 100% accuracy of these sample answers, as these are
based on the knowledge and capability of the private Teacher/Tutor. Sample answers may be used as
a Guide/Help/Reference for preparing the answers to the Questions given in the assignment. As these
solutions and answers are prepared by a private teacher/tutor, the chance of error or mistake cannot
be denied. Any Omission or Error is highly regretted, though every care has been taken while
preparing these Sample Answers/Solutions. Please consult your own Teacher/Tutor before you
prepare a particular Answer, and for up-to-date and exact information, data and solutions, students
must read and refer to the official study material provided by the university.

SECTION-A
Answer the following questions in about 1000 words (wherever applicable) each.

Q1. Compare between parametric and nonparametric statistics. Discuss in detail any two
nonparametric techniques.
Ans.

Q2. The scores obtained by three groups of students on Self-Concept Scale are given below.
Compute ANOVA for the same:

Ans. Try to work this one out yourself, following the one-way ANOVA procedure; a computational check is sketched below.
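As a cross-check for the hand computation, the one-way ANOVA can also be run in Python. This is a minimal sketch, assuming scipy is installed; the group scores are taken from the question in Section A.

# Hedged sketch: one-way ANOVA on the three Self Concept Scale groups.
from scipy import stats

group_a = [34, 43, 22, 66, 44, 34, 44, 77, 77, 33]
group_b = [34, 22, 34, 44, 65, 67, 43, 35, 57, 87]
group_c = [34, 33, 22, 58, 56, 54, 56, 66, 77, 78]

# scipy.stats.f_oneway returns the F statistic and its p-value
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Decision rule: reject the null hypothesis of equal group means if p < 0.05.
if p_value < 0.05:
    print("Reject H0: at least one group mean differs.")
else:
    print("Fail to reject H0: no significant difference among group means.")

The F value obtained this way should agree with the F computed from the between-groups and within-groups sums of squares in the hand-worked ANOVA table.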

Q3. Describe hypothesis testing with a focus on errors in hypothesis testing.


Ans. Hypothesis tests use sample data to make inferences about the properties of a population. You
gain tremendous benefits by working with random samples because it is usually impossible to
measure the entire population.
However, there are tradeoffs when you use samples. The samples we use are typically a minuscule
percentage of the entire population. Consequently, they occasionally misrepresent the population
severely enough to cause hypothesis tests to make errors.
The two types of errors in hypothesis testing, their causes, and how to manage them are described below.


Potential Outcomes in Hypothesis Testing
Hypothesis testing is a procedure in inferential statistics that assesses two mutually exclusive theories
about the properties of a population. For a generic hypothesis test, the two hypotheses are as follows:
Null hypothesis: There is no effect.

Alternative hypothesis: There is an effect.
The sample data must provide sufficient evidence to reject the null hypothesis and conclude that the
effect exists in the population. Ideally, a hypothesis test fails to reject the null hypothesis when the
effect is not present in the population, and it rejects the null hypothesis when the effect exists.
Statisticians define two types of errors in hypothesis testing. Creatively, they call these errors Type I
and Type II errors. Both types of error relate to incorrect conclusions about the null hypothesis.
The table summarizes the four possible outcomes for a hypothesis test.
                     Test Rejects Null                    Test Fails to Reject Null
Null is True         Type I Error (False Positive)        Correct decision (No effect)
Null is False        Correct decision (Effect exists)     Type II Error (False Negative)
Fire alarm analogy for the types of errors: A fire alarm provides a good
analogy for the types of hypothesis testing errors. Preferably, the alarm rings when there is a fire and
does not ring in the absence of a fire. However, if the alarm rings when there is no fire, it is a false
positive, or a Type I error in statistical terms. Conversely, if the fire alarm fails to ring when there is a
fire, it is a false negative, or a Type II error.
Using hypothesis tests correctly improves your chances of drawing trustworthy conclusions.
However, errors are bound to occur.
Unlike the fire alarm analogy, there is no sure way to determine whether an error occurred after you
perform a hypothesis test. Typically, a clearer picture develops over time as other researchers conduct
similar studies and an overall pattern of results appears. Seeing how your results fit in with similar
studies is a crucial step in assessing your study’s findings.
Now, let’s take a look at each type of error in more depth.
Type I Errors: False Positives: When you see a p-value that is less than your significance level, you
get excited because your results are statistically significant. However, it could be a type I error. The
supposed effect might not exist in the population. Again, there is usually no warning when this
occurs.
It comes down to sampling error. Your random sample has overestimated the effect by chance. It was
the luck of the draw. This type of error doesn’t indicate that the researchers did anything wrong. The
experimental design, data collection, data validity, and statistical analysis can all be correct, and yet
this type of error still occurs.
Even though we don’t know for sure which studies have false positive results, we do know their rate
of occurrence. The rate of occurrence for Type I errors equals the significance level of the hypothesis
test, which is also known as alpha (α).
The significance level is an evidentiary standard that you set to determine whether your sample data
are strong enough to reject the null hypothesis. Hypothesis tests define that standard using the
probability of rejecting a null hypothesis that is actually true. You set this value based on your
willingness to risk a false positive.
Using the significance level to set the Type I error rate: When the significance level is 0.05 and the
null hypothesis is true, there is a 5% chance that the test will reject the null hypothesis incorrectly. If
you set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1% seems even better, right?
As you’ll see, there is a tradeoff between Type I and Type II errors. If you hold everything else
constant, as you reduce the chance for a false positive, you increase the opportunity for a false
negative.
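The point that the Type I error rate equals alpha can be illustrated with a small simulation. This is an illustrative sketch only, assuming numpy and scipy are available; the population values are made up. Both samples come from the same population, so the null hypothesis is true, and the long-run rejection rate should be close to alpha.

# Simulation: how often does a t-test reject a true null hypothesis?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_studies = 10_000
false_positives = 0

for _ in range(n_studies):
    # Both samples are drawn from the same population, so H0 (no difference) is true.
    sample1 = rng.normal(loc=100, scale=15, size=30)
    sample2 = rng.normal(loc=100, scale=15, size=30)
    _, p = stats.ttest_ind(sample1, sample2)
    if p < alpha:
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_studies:.3f} (expected ~{alpha})")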
Type I errors are relatively straightforward. The math is beyond the scope of this article,
but statisticians designed hypothesis tests to incorporate everything that affects this error rate so that
you can specify it for your studies. As long as your experimental design is sound, you collect valid
data, and the data satisfy the assumptions of the hypothesis test, the Type I error rate equals the
significance level that you specify. However, if there is a problem in one of those areas, it can affect
the false positive rate.
Warning about a potential misinterpretation of Type I errors and the Significance Level: When the
null hypothesis is correct for the population, the probability that a test produces a false positive
equals the significance level. However, when you look at a statistically significant test result, you
cannot state that there is a 5% chance that it represents a false positive.
Why is that the case? Imagine that we perform 100 studies on a population where the null hypothesis
is true. If we use a significance level of 0.05, we’d expect that five of the studies will produce
statistically significant results—false positives. Afterward, when we go to look at those significant
studies, what is the probability that each one is a false positive? Not 5 percent but 100%!
That scenario also illustrates a point that I made earlier. The true picture becomes more evident after
repeated experimentation. Given the pattern of results that are predominantly not significant, it is
unlikely that an effect exists in the population.
Type II Errors: False Negatives: When you perform a hypothesis test and your p-value is greater than
your significance level, your results are not statistically significant. That’s disappointing because your
sample provides insufficient evidence for concluding that the effect you’re studying exists in the
population. However, there is a chance that the effect is present in the population even though the
test results don’t support it. If that’s the case, you’ve just experienced a Type II error. The probability
of making a Type II error is known as beta (β).
What causes Type II errors? Whereas Type I errors are caused by one thing, sampling error, there are a
host of possible reasons for Type II errors—small effect sizes, small sample sizes, and high data
variability. Furthermore, unlike Type I errors, you can’t set the Type II error rate for your analysis.
Instead, the best that you can do is estimate it before you begin your study by approximating
properties of the alternative hypothesis that you’re studying. When you do this type of estimation, it’s
called power analysis.
To estimate the Type II error rate, you create a hypothetical probability distribution that represents
the properties of a true alternative hypothesis. However, when you’re performing a hypothesis test,
you typically don’t know which hypothesis is true, much less the specific properties of the
distribution for the alternative hypothesis. Consequently, the true Type II error rate is usually
unknown!
Type II errors and the power of the analysis: The Type II error rate (beta) is the probability of a false
negative. Therefore, the inverse of Type II errors is the probability of correctly detecting an effect.
Statisticians refer to this concept as the power of a hypothesis test. Consequently, 1 – β = the statistical
power. Analysts typically estimate power rather than beta directly.
In power and sample size analysis, the three factors that affect
power are sample size, variability in the population, and the effect size. As you design your
experiment, you can enter estimates of these three factors into statistical software and it calculates the
estimated power for your test.
Suppose you perform a power analysis for an upcoming study and calculate an estimated power of
90%. For this study, the estimated Type II error rate is 10% (1 – 0.9). Keep in mind that variability and
effect size are based on estimates and guesses. Consequently, power and the Type II error rate are just
estimates rather than something you set directly. These estimates are only as good as the inputs into
your power analysis.
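For illustration, a power analysis of this kind can be sketched with statsmodels. The effect size (Cohen's d) and per-group sample size below are assumed, made-up inputs, not values from any study here.

# Hedged sketch of a power analysis for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.solve_power(effect_size=0.5,   # assumed medium effect (Cohen's d)
                             nobs1=86,          # planned sample size per group (assumed)
                             alpha=0.05)        # Type I error rate
print(f"Estimated power: {power:.2f}")
print(f"Estimated Type II error rate (beta): {1 - power:.2f}")

# The same call can be inverted to ask how many participants per group are
# needed for a target power, e.g. 0.90:
n_needed = analysis.solve_power(effect_size=0.5, power=0.90, alpha=0.05)
print(f"Sample size per group for 90% power: {n_needed:.1f}")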


Low variability and larger effect sizes decrease the Type II error rate, which increases the statistical
power. However, researchers usually have less control over those aspects of a hypothesis test.
Typically, researchers have the most control over sample size, making it the critical way to manage
your Type II error rate. Holding everything else constant, increasing the sample size reduces the Type
II error rate and increases power.


Graphing Type I and Type II Errors: A graph of the two sampling distributions (one under the null
hypothesis and one under the alternative) illustrates the two types of errors. The critical region line
represents the point at which you reject or fail to reject the null hypothesis. Of course, when you
perform the hypothesis test, you don't know which hypothesis is correct. And, the properties of the
distribution for the alternative hypothesis are usually unknown. However, such a graph helps in
understanding the general nature of these errors and how they are related.
As you’ve seen, the nature of the two types of error, their causes, and the certainty of their rates of
occurrence are all very different.
A common question is whether one type of error is worse than the other. Statisticians designed
hypothesis tests to control Type I errors while Type II errors are much less defined. Consequently,
many statisticians state that it is better to fail to detect an effect when it exists than it is to conclude an
effect exists when it doesn’t. That is to say, there is a tendency to assume that Type I errors are worse.
However, reality is more complex than that. You should carefully consider the consequences of each
type of error for your specific test.
Suppose you are assessing the strength of a new jet engine part that is under consideration. People's
lives are riding on the part's strength. A false negative in this scenario merely means that the part is
strong enough but the test fails to detect it. This situation does not put anyone’s life at risk. On the
other hand, Type I errors are worse in this situation because they indicate the part is strong enough
when it is not.
Now suppose that the jet engine part is already in use but there are concerns about it failing. In this
case, you want the test to be more sensitive to detecting problems even at the risk of false positives.
Type II errors are worse in this scenario because the test fails to recognize the problem and leaves
these problematic parts in use for longer.

SECTION-B
Answer the following questions in about 400 words (wherever applicable) each

Q1. Describe the concept and importance of normal probability curve.


Ans. The normal probability curve has fifteen main properties. Among them are: 1. the normal curve
is symmetrical, 2. the normal curve is unimodal, 3. mean, median and mode coincide, 4. the maximum
ordinate occurs at the centre, 5. the normal curve is asymptotic to the X-axis, 6. the height of the curve
declines symmetrically, and others. These properties are described below.
1. The normal curve is symmetrical: The Normal Probability Curve (N.P.C.) is symmetrical
about the ordinate of the central point of the curve. It implies that the size, shape and slope of
the curve on one side of the curve is identical to that of the other.
That is, the normal curve has bilateral symmetry. If the figure were folded along its
vertical axis, the two halves would coincide. In other words, the values to the left and right of
the central point are mirror images.
2. The normal curve is unimodal: Since there is only one point in the curve which has
maximum frequency, the normal probability curve is unimodal, i.e. it has only one mode.
3. Mean, median and mode coincide: The mean, median and mode of the normal distribution
are the same and they lie at the centre. They are represented by 0 (zero) along the base line.
[Mean = Median = Mode]
4. The maximum ordinate occurs at the centre: The maximum height of the ordinate always
occurs at the central point of the curve that is, at the mid-point. The ordinate at the mean is
the highest ordinate and it is denoted by Y0. (Y0 is the height of the curve at the mean or mid-
point of the base line).

5. The normal curve is asymptotic to the X-axis: The Normal Probability Curve approaches the
horizontal axis asymptotically i.e., the curve continues to decrease in height on both ends
away from the middle point (the maximum ordinate point); but it never touches the
horizontal axis.
It extends infinitely in both directions, i.e. from minus infinity (-∞) to plus infinity (+∞). As the
distance from the mean increases, the curve approaches the base line more and more closely.

6. The height of the curve declines symmetrically: In the normal probability curve the height
declines symmetrically in either direction from the maximum point. Hence the ordinates for
values of X = µ ± K, where K is a real number, are equal.
For example: the heights of the curve (the ordinates) at X = µ + σ and X = µ – σ are exactly
the same.

7. The points of inflection occur at ± 1 Standard Deviation (± 1 σ): The normal curve changes
its direction from convex to concave at the points recognized as points of inflection. If we draw
perpendiculars from these two points of inflection of the curve to the horizontal axis, they will
touch the axis at a distance of one Standard Deviation unit above and below the mean (± 1 σ).
8. The total percentage of area of the normal curve within the two points of inflection is
fixed: Approximately 68.26% of the area of the curve falls within the limits of ±1 standard
deviation unit from the mean.

9. Normal curve is a smooth curve: The normal curve is a smooth curve, not a histogram. It is

moderately peaked. The kurtosis of the normal curve (the percentile coefficient of kurtosis, Ku) is 0.263.

10. The normal curve is bilateral: The 50% area of the curve lies to the left side of the maximum

central ordinate and 50% lies to the right side. Hence the curve is bilateral.

11. The normal curve is a mathematical model in behavioural sciences: The curve is used as a

measurement scale. The measurement unit of this scale is ± σ (the unit standard deviation).

12. Greater percentage of cases at the middle of the distribution: There is a greater percentage
of cases at the middle of the distribution. Between -1σ and +1σ, 68.26% (34.13 + 34.13) of cases,
nearly 2/3, lie. To the right of +1σ, 15.87% (13.59 + 2.14 + .14) of cases lie, and to the left of -1σ,
15.87% (13.59 + 2.14 + .14) of cases lie. Beyond +2σ, 2.28% of cases lie and beyond -2σ also
2.28% of cases lie.
Thus, the majority of cases lie at the middle of the distribution, and the number of cases on
either side gradually decreases in fixed proportions.

The percentage of cases between the Mean and different σ distances can be read from a table of
areas under the normal curve.

13. The scale of the X-axis in the normal curve is generalised by z deviates (standard scores).

14. The equation of the normal probability curve reads:

y = N / (σ√(2π)) × e^(−x² / 2σ²)

in which
x = scores (expressed as deviations from the mean) laid off along the base line or X-axis,
y = the height of the curve above the X-axis, i.e., the frequency of a given x-value.
The other terms in the equation are constants:
N = number of cases,
σ = standard deviation of the distribution,
π = 3.1416 (the ratio of the circumference of a circle to its diameter),
e = 2.7183 (the base of the Napierian system of logarithms).
15. The normal curve is based on elementary principles of probability and the other name of the
normal curve is the ‘normal probability curve’.
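The area figures quoted above (68.26% within ±1σ, 34.13% between the mean and +1σ, and so on) can be checked numerically. A small sketch, assuming scipy is available:

# Quick numerical check of the area properties of the normal curve.
from scipy.stats import norm

for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)   # area between -k*sigma and +k*sigma
    print(f"Area within ±{k} sigma: {area * 100:.2f}%")

# Area between the mean and +1 sigma (the 34.13% figure used above)
print(f"Mean to +1 sigma: {(norm.cdf(1) - norm.cdf(0)) * 100:.2f}%")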

Q2. Using Pearson’s product moment correlation for the following data:

Ans.
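A computational check for this question (a minimal sketch, assuming scipy is installed; the two data sets are the ones given in Section B, Q2):

# Pearson's product moment correlation for the assignment data.
from scipy import stats

data_1 = [10, 7, 5, 6, 3, 6, 8, 2, 9, 10]
data_2 = [2, 1, 9, 4, 9, 5, 4, 9, 5, 4]

r, p_value = stats.pearsonr(data_1, data_2)
print(f"Pearson r = {r:.3f}, p = {p_value:.3f}")

Compare the r obtained here with the value you compute by hand using the product-moment formula.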

Q3. With the help of Mann Whitney U test find if significant difference exists between the scores
obtained on Organisational Commitment Scale obtained by public and private bank employees.

Ans. When you choose to analyse your data using a Mann-Whitney U test, part of the process
involves checking to make sure that the data you want to analyse can actually be analysed using a
Mann-Whitney U test. You need to do this because it is only appropriate to use a Mann-Whitney U
test if your data "passes" four assumptions that are required for a Mann-Whitney U test to give you a
valid result. In practice, checking for these four assumptions just adds a little bit more time to your
analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis,
as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these four assumptions, do not be surprised if, when analysing your own
data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not
uncommon when working with real-world data rather than textbook examples, which often only
show you how to carry out a Mann-Whitney U test when everything goes well! However, don’t
worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First,
let’s take a look at these four assumptions:

• Assumption #1: Your dependent variable should be measured at the ordinal or continuous
level. Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly
agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 5-
point scale explaining how much a customer liked a product, ranging from "Not very much"
to "Yes, a lot"). Examples of continuous variables include revision time (measured in hours),
intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight
(measured in kg), and so forth. You can learn more about ordinal and continuous variables in
our article: Types of Variable.
• Assumption #2: Your independent variable should consist of two categorical, independent
groups. Example independent variables that meet this criterion include gender (2 groups:
male or female), employment status (2 groups: employed or unemployed), smoker (2 groups:
yes or no), and so forth.
• Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. For
example, there must be different participants in each group with no participant being in more
than one group. This is more of a study design issue than something you can test for, but it is
an important assumption of the Mann-Whitney U test. If your study fails this assumption,
you will need to use another statistical test instead of the Mann-Whitney U test (e.g., a
Wilcoxon signed-rank test). If you are unsure whether your study meets this assumption, you
can use our Statistical Test Selector, which is part of our enhanced content.
• Assumption #4: A Mann-Whitney U test can be used when your two variables are not
normally distributed. However, in order to know how to interpret the results from a Mann-
Whitney U test, you have to determine whether your two distributions (i.e., the distribution
of scores for both groups of the independent variable; for example, 'males' and 'females' for
the independent variable, 'gender') have the same shape. Two situations are typical: the two
distributions may be identical (one lying exactly on top of the other), or they may have the same
shape but a different location (i.e., the distribution of one of the groups of the independent variable
has higher or lower values compared to the second distribution – for example, females having
'higher' values than males, overall).
When you analyse your own data, it is extremely unlikely that your two distributions will be
identical, but they may have the same (or a 'similar') shape. If they do have the same shape, you can
use SPSS Statistics to carry out a Mann-Whitney U test to compare the medians of your dependent
variable (e.g., engagement score) for the two groups (e.g., males and females) of the independent
variable (e.g., gender) you are interested in. However, if your two distributions have a different shape,
you can only use the Mann-Whitney U test to compare mean ranks.
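Applied to the assignment data, the test can be sketched as follows (assuming scipy is available; the scores are the ones listed in the question):

# Mann-Whitney U test on the organisational commitment scores.
from scipy.stats import mannwhitneyu

public_bank = [10, 12, 21, 23, 34, 45, 32, 23, 34, 23]
private_bank = [34, 54, 56, 43, 32, 23, 34, 32, 33, 44, 32, 34, 32]

# Returns the U statistic for the first sample and a two-sided p-value.
u_stat, p_value = mannwhitneyu(public_bank, private_bank, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Significant difference in organisational commitment scores.")
else:
    print("No significant difference at the 0.05 level.")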

Q4. Explain Two-way Analysis of Variance with a focus on its merits and demerits.
Ans. ANOVA stands for analysis of variance and tests for differences in the effects of independent
variables on a dependent variable. A two-way ANOVA test is a statistical test used to determine the
effect of two nominal predictor variables on a continuous outcome variable.
A two-way ANOVA tests the effect of two independent variables on a dependent variable. A two-
way ANOVA test analyzes the effect of the independent variables on the expected outcome along
with their relationship to the outcome itself. Random factors would be considered to have no
statistical influence on a data set, while systematic factors would be considered to have statistical
significance.
By using ANOVA, a researcher is able to determine whether the variability of the outcomes is due to
chance or to the factors in the analysis. ANOVA has many applications in finance, economics, science,
medicine, and social science.
Understanding 2-Way ANOVA: An ANOVA test is the first step in identifying factors that influence
a given outcome. Once an ANOVA test is performed, a tester may be able to perform further analysis
on the systematic factors that are statistically contributing to the data set's variability.
A two-way ANOVA test reveals the results of two independent variables on a dependent variable.
ANOVA test results can then be used in an F-test, a statistical test used to determine whether two
populations with normal distributions share variances or a standard deviation, on the significance of
the regression formula overall.
Analysis of variances is helpful for testing the effects of variables on one another. It is similar to
multiple two-sample t-tests. However, it results in fewer type 1 errors and is appropriate for a range
of issues. An ANOVA test groups differences by comparing the means of each group and includes
spreading out the variance across diverse sources. It is employed with subjects, test groups, between
groups and within groups.
ANOVA vs. 2-Way ANOVA: There are two main types of analysis of variance: one-way (or
unidirectional) and two-way (bidirectional). One-way or two-way refers to the number of
independent variables in your analysis of variance test. A one-way ANOVA evaluates the impact of a
sole factor on a sole response variable. It determines whether the observed differences between the
means of independent (unrelated) groups are explainable by chance alone, or whether there are any
statistically significant differences between groups.
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one
independent variable affecting a dependent variable. With a two-way ANOVA, there are two
independent variables. For example, a two-way ANOVA allows a company to compare worker productivity
based on two independent variables, such as department and gender. It is utilized to observe the
interaction between the two factors. It tests the effect of two factors at the same time.
A three-way ANOVA, also known as three-factor ANOVA, is a statistical means of determining the
effect of three factors on an outcome.
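As an illustration of the worker-productivity example above, a two-way ANOVA can be sketched with statsmodels. The data frame below uses made-up productivity scores and the assumed factors 'department' and 'gender'; it is a sketch, not a worked example from the course material.

# Illustrative two-way ANOVA with main effects and an interaction term.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "productivity": [23, 25, 28, 30, 27, 26, 35, 33, 31, 29, 24, 22],
    "department":   ["sales", "sales", "sales", "hr", "hr", "hr",
                     "sales", "sales", "sales", "hr", "hr", "hr"],
    "gender":       ["m", "m", "m", "m", "m", "m",
                     "f", "f", "f", "f", "f", "f"],
})

# 'C()' marks categorical factors; '*' fits both main effects and the interaction.
model = ols("productivity ~ C(department) * C(gender)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)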

Q5. Compute Chi-square for the following data:


Ans. Try to work this out yourself using the chi-square test of independence; a computational sketch is given below.
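A computational sketch for this question, assuming scipy is installed; the observed frequencies are the ones given in the question. Note that scipy applies Yates' continuity correction to 2x2 tables by default, which a hand computation may not include.

# Chi-square test of independence for the 2x2 table in the question.
from scipy.stats import chi2_contingency

observed = [[34, 43],   # early adolescents: high, low
            [45, 44]]   # late adolescents:  high, low

# Pass correction=False to match the uncorrected hand formula for 2x2 tables.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
print("Expected frequencies:")
print(expected)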

SECTION-C
Answer the following in about 50 words each

Q1. Tabulation
Ans. Tabulation is a systematic & logical presentation of numeric data in rows and columns to
facilitate comparison and statistical analysis. It facilitates comparison by bringing related information
close to each other and helps in further statistical analysis and interpretation.
In other words, the method of placing organised data into a tabular form is called tabulation. It
may be complex, double or simple depending upon the nature of categorisation.
Five Major Objectives of Tabulation:
(1) To Simplify the Complex Data:
• It reduces the bulk of information, i.e. presents raw data in a simplified and meaningful form
so that it can be easily understood by a common person in less time.


(2) To Bring Out Essential Features of the Data:
• It brings out the chief/main characteristics of data.
• It presents facts clearly and precisely without textual explanation.
(3) To Facilitate Comparison:
• Presentation of data in row & column is helpful in simultaneous detailed comparison


on the basis of several parameters.

(4) To Facilitate Statistical Analysis:
• Tables serve as the best source of organised data for further statistical analysis.
• The task of computing average, dispersion, correlation, etc. becomes easier if data is
presented in the form of a table.
(5) Saving of Space:
• A table presents facts in a better way than the textual form.
• It saves space without sacrificing the quality and quantity of data.

Q2. Interval estimation


Ans. Interval estimation, in statistics, is the evaluation of a parameter—for example, the mean (average)—of a
population by computing an interval, or range of values, within which the parameter is most likely to be located.
Intervals are commonly chosen such that the parameter falls within them with a 95 or 99 percent
probability, called the confidence coefficient. Hence, the intervals are called confidence intervals; the
end points of such an interval are called upper and lower confidence limits.
The interval containing a population parameter is established by calculating that statistic from values
measured on a random sample taken from the population and by applying the knowledge (derived
from probability theory) of the fidelity with which the properties of a sample represent those of the
entire population.
The probability tells what percentage of the time the assignment of the interval will be correct but not
what the chances are that it is true for any given sample. Of the intervals computed from many
samples, a certain percentage will contain the true value of the parameter being sought.
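A minimal sketch of an interval estimate, assuming numpy and scipy are available; the sample scores are made-up illustrative values. It computes a 95% confidence interval for a population mean.

# 95% confidence interval for a mean from a small sample.
import numpy as np
from scipy import stats

sample = np.array([12, 15, 14, 10, 13, 17, 11, 16, 14, 13])  # made-up scores
n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)          # standard error of the mean

# t critical value for 95% confidence with n-1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")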

Q3. Level of significance


Ans. The significance level, also denoted as alpha or α, is a measure of the strength of the evidence
that must be present in your sample before you will reject the null hypothesis and conclude that
the effect is statistically significant. The researcher determines the significance level before conducting
the experiment.
The significance level is the probability of rejecting the null hypothesis when it is true. For example, a
significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no
actual difference. Lower significance levels indicate that you require stronger evidence before you
will reject the null hypothesis.
Use significance levels during hypothesis testing to help you determine which hypothesis the data
support. Compare your p-value to your significance level. If the p-value is less than your significance
level, you can reject the null hypothesis and conclude that the effect is statistically significant. In other
words, the evidence in your sample is strong enough to be able to reject the null hypothesis at
the population level.
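The decision rule described above can be illustrated with a short sketch (assuming scipy; the two score lists are made-up values, and alpha is fixed before looking at the data):

# Comparing a p-value to a pre-set significance level.
from scipy import stats

alpha = 0.05
group_1 = [12, 15, 14, 10, 13, 17, 11, 16]
group_2 = [18, 21, 16, 20, 19, 22, 17, 23]

t_stat, p_value = stats.ttest_ind(group_1, group_2)
print(f"p = {p_value:.4f}")
if p_value < alpha:
    print("p < alpha: reject the null hypothesis (statistically significant).")
else:
    print("p >= alpha: fail to reject the null hypothesis.")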

Q4. Direction of correlation


Ans. Correlation is a statistical technique that is used to measure and describe a relationship between
two variables. Usually the two variables are simply observed, not manipulated.
The correlation requires two scores from the same individuals. These scores are normally identified as
X and Y. The pairs of scores can be listed in a table or presented in a scatterplot.
Example: We might be interested in the correlation between students' Math SAT scores and their GPA.
The Math SAT and GPA scores of a group of students can be listed in a datasheet and displayed in a scatterplot.
The scatterplot has the X values (GPA) on the horizontal (X) axis, and the Y values (MathSAT) on the
vertical (Y) axis. Each individual is identified by a single point (dot) on the graph which is located so
that the coordinates of the point (the X and Y values) match the individual's X (GPA) and Y
(MathSAT) scores.
For example, a student with GPA = 2.30 and MathSAT = 710 would be represented by a highlighted
dot in the upper-left part of the scatterplot: the dot lies directly above 2.30 on the GPA (X) axis and
level with 710 on the MathSAT (Y) axis.

Q5. Partial correlation


Ans. The Partial Correlation: A partial correlation is basically the correlation between two variables
when a third variable is held constant. Now, that may be a little confusing, but we will delve into it a
little deeper with my diet-exercise routine.
If we look at the relationship between diet and weight loss, we see that there is a positive correlation.
What it means, in a practical sense, is that the better I diet, the more weight I will lose. Whew, it's nice
to know that I got something good for giving up ice cream. If we look at the relationship between
exercise and weight, we see a negative correlation, which sounds bad but isn't. It means that the
more I exercise, the more weight I lose.
But, if I want a complete picture of how both diet and exercise correlate to weight loss, I need to
consider the effect exercise has on dieting and weight loss. This is where the partial correlation is
useful.
Pictured as overlapping circles (a Venn-style diagram), one shaded region represents the correlation
between diet and weight loss, a second region represents the correlation between exercise and weight
loss, and a central region represents the overlap of all three variables – the part of weight loss that
diet and exercise explain jointly. The portions of the diet and exercise regions lying outside that joint
overlap correspond to the partial correlations: what each variable still explains about weight loss once
the other is held constant.
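Numerically, a partial correlation can be sketched by correlating residuals: regress each of the two variables on the control variable and correlate what is left over. This is a sketch assuming numpy; all three data columns below are made-up values for the diet-exercise-weight-loss example.

# Partial correlation of diet and weight loss, controlling for exercise.
import numpy as np

diet     = np.array([3, 5, 6, 2, 8, 7, 4, 9, 5, 6], dtype=float)
exercise = np.array([2, 4, 5, 1, 7, 6, 3, 8, 4, 5], dtype=float)
loss     = np.array([1, 3, 4, 1, 6, 5, 2, 7, 3, 4], dtype=float)

def residuals(y, z):
    """Residuals of y after removing the linear effect of z."""
    slope, intercept = np.polyfit(z, y, deg=1)
    return y - (slope * z + intercept)

r_partial = np.corrcoef(residuals(diet, exercise), residuals(loss, exercise))[0, 1]
print(f"Partial correlation of diet and weight loss, controlling exercise: {r_partial:.3f}")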

Q6. Multiple regression


Ans. Multiple regression generally explains the relationship between multiple independent or
predictor variables and one dependent or criterion variable. A dependent variable is modeled as a
function of several independent variables with corresponding coefficients, along with the constant
term. Multiple regression requires two or more predictor variables, and this is why it is called
multiple regression.
The multiple regression equation explained above takes the following form:
y = b1x1 + b2x2 + … + bnxn + c.
Here, the bi's (i = 1, 2, …, n) are the regression coefficients, which represent the amount by which the
criterion variable changes when the corresponding predictor variable changes by one unit.
As an example, let’s say that the test score of a student in an exam will be dependent on various
factors like his focus while attending the class, his intake of food before the exam and the amount of
sleep he gets before the exam. Using multiple regression, one can estimate the relationship between
these factors and the test score.
Multiple regression in SPSS is done by selecting “analyze” from the menu. Then, from analyze, select
“regression,” and from regression select “linear.”
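A minimal computational sketch of the test-score example above, assuming numpy; the predictor values and scores are made up, and the fitted coefficients correspond to the b's and c in the equation above.

# Multiple regression via least squares: score ~ focus + food + sleep.
import numpy as np

focus = np.array([5, 7, 6, 3, 8, 4, 9, 6])          # x1
food  = np.array([3, 4, 4, 2, 5, 3, 5, 4])          # x2
sleep = np.array([6, 8, 7, 5, 8, 6, 9, 7])          # x3
score = np.array([55, 72, 66, 40, 80, 50, 88, 68])  # y

# Design matrix with a column of ones for the constant term c
X = np.column_stack([focus, food, sleep, np.ones(len(focus))])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)
b1, b2, b3, c = coefs
print(f"score = {b1:.2f}*focus + {b2:.2f}*food + {b3:.2f}*sleep + {c:.2f}")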

Q7. Skewness and Kurtosis


Ans. Skewness and kurtosis are two commonly listed values when you run a software’s descriptive
statistics function. Many books say that these two statistics give you insights into the shape of the
distribution.
Skewness is a measure of the symmetry in a distribution. A symmetrical dataset will have a skewness
equal to 0. So, a normal distribution will have a skewness of 0. Skewness essentially measures the
relative size of the two tails.

Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of probability in
the tails. The value is often compared to the kurtosis of the normal distribution, which is equal to
3. If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more
in the tails). If the kurtosis is less than 3, then the dataset has lighter tails than a normal distribution
(less in the tails). Careful here. Kurtosis is sometimes reported as “excess kurtosis.” Excess kurtosis
is determined by subtracting 3 from the kurtosis. This makes the normal distribution kurtosis equal
0. Kurtosis originally was thought to measure the peakedness of a distribution. Though you will still
see this as part of the definition in many places, this is a misconception.
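A quick sketch, assuming numpy and scipy are available, showing both conventions described above (excess kurtosis, which is 0 for a normal curve, and plain kurtosis, which is 3):

# Skewness and kurtosis of a roughly normal sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=5000)

print(f"Skewness:        {stats.skew(sample):.3f}")                    # expected near 0
print(f"Excess kurtosis: {stats.kurtosis(sample):.3f}")                 # expected near 0
print(f"Kurtosis:        {stats.kurtosis(sample, fisher=False):.3f}")   # expected near 3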

Q8. Levels of measurement
Ans. The level of measurement refers to the relationship among the values that are assigned to the
attributes for a variable. What does that mean? Begin with the idea of the variable, in this example
“party affiliation.”

That variable has a number of attributes. Let’s assume that in this particular election context the only
relevant attributes are “republican”, “democrat”, and “independent”. For purposes of analyzing the
results of this variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes. The level of
measurement describes the relationship among these three values. In this case, we simply are using the
numbers as shorter placeholders for the lengthier text terms. We don’t assume that higher values
mean “more” of something and lower numbers signify “less”. We don’t assume that the value
of 2 means that democrats are twice something that republicans are. We don’t assume that
republicans are in first place or have the highest priority just because they have the value of 1. In this
case, we only use the values as a shorter name for the attribute. Here, we would describe the level of
measurement as “nominal”.

Q9. Measures of dispersion


Ans. The measure of dispersion indicates the scattering of data. It explains the disparity of data from
one another, delivering a precise view of the distribution of data. The measure of dispersion displays
and gives us an idea about the variation and central value of an individual item.
In other words, Dispersion is the extent to which values in a distribution differ from the average of
the distribution. It gives us an idea about the extent to which individual items vary from one another
and from the central value.
The variation can be measured in different numerical measures, namely:
(i) Range – It is the simplest method of measurement of dispersion and is defined as the difference
between the largest and the smallest item in a given distribution. If Ymax and Ymin are the
largest and smallest values, then
Range = Ymax – Ymin
(ii) Quartile Deviation – It is known as the Semi-Inter-Quartile Range, i.e. half of the difference
between the upper quartile and the lower quartile. The first quartile (Q1) is the value midway
between the smallest value and the median; the median itself is the second quartile (Q2); and
the third quartile (Q3) is the value midway between the median and the largest value. Quartile
deviation can be calculated as
Q = ½ × (Q3 – Q1)
(iii) Mean Deviation – Mean deviation is the arithmetic mean (average) of the absolute deviations |D|
of observations from a central value (Mean or Median).
Mean deviation can be evaluated by using the formula: MD = (1/n) Σ|xi – A|, where A is the central value.
(iv) Standard Deviation – Standard deviation is the square root of the arithmetic average of the
squared deviations measured from the mean. The standard deviation is given as
σ = [Σ(yi – ȳ)² ⁄ n]^½ = [(Σyi² ⁄ n) – ȳ²]^½
Apart from these numerical values, graphical methods are also applied for estimating dispersion.
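The four measures can be computed directly; a minimal sketch assuming numpy (the score list is a made-up example, and np.percentile's interpolation may differ slightly from textbook quartile formulas):

# Range, quartile deviation, mean deviation and standard deviation.
import numpy as np

scores = np.array([12, 15, 14, 10, 13, 17, 11, 16, 14, 18], dtype=float)

value_range = scores.max() - scores.min()
q1, q3 = np.percentile(scores, [25, 75])
quartile_deviation = (q3 - q1) / 2
mean_deviation = np.mean(np.abs(scores - scores.mean()))
standard_deviation = scores.std()          # population SD (divides by n)

print(f"Range:               {value_range}")
print(f"Quartile deviation:  {quartile_deviation}")
print(f"Mean deviation:      {mean_deviation:.2f}")
print(f"Standard deviation:  {standard_deviation:.2f}")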

Q10. Kendall’s tau


Ans. Kendall's tau-b (τb) correlation coefficient (Kendall's tau-b, for short) is a nonparametric measure
of the strength and direction of association that exists between two variables measured on at least an
ordinal scale. It is considered a nonparametric alternative to the Pearson’s product-moment
correlation when your data has failed one or more of the assumptions of this test. It is also considered
an alternative to the nonparametric Spearman rank-order correlation coefficient (especially when you
have a small sample size with many tied ranks). If you consider one of your variables as an
independent variable and the other as a dependent variable, you might consider running a Somers'
d test instead.
For example, you could use Kendall's tau-b to understand whether there is an association between
exam grade and time spent revising (i.e., where there were six possible exam grades – A, B, C, D, E
and F – and revision time was split into five categories: less than 5 hours, 5-9 hours, 10-14 hours, 15-19
hours, and 20 hours or more). Alternately, you could use Kendall's tau-b to understand whether there
is an association between customer satisfaction and delivery time (i.e., where delivery time had four
categories – next day, 2 working days, 3-5 working days, and more than 5 working days – and
customer satisfaction was measured in terms of the level of agreement customers had with the
following statement, "I am satisfied with the time it took for my parcel to be delivered", where the
level of agreement had five categories: strongly agree, agree, neither agree nor disagree, disagree and
strongly disagree).
Kendall's tau-b can be carried out in SPSS Statistics, after first checking the assumptions that must be
considered when using the test.
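Outside SPSS, Kendall's tau-b can also be sketched in Python (assuming scipy; the grade and revision-time codes below are made-up ordinal ranks echoing the example above):

# Kendall's tau-b for two ordinal variables.
from scipy.stats import kendalltau

exam_grade    = [1, 2, 2, 3, 4, 4, 5, 6, 6, 5]   # F=1 ... A=6 (hypothetical codes)
revision_time = [1, 1, 2, 2, 3, 4, 4, 5, 5, 3]   # <5h=1 ... 20h+=5 (hypothetical codes)

tau, p_value = kendalltau(exam_grade, revision_time)
print(f"Kendall's tau-b = {tau:.3f}, p = {p_value:.3f}")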
