1-PDP On Decoding Statistics For Data Analysis - Day 1 - Test of Normality

- The document discusses choosing appropriate statistical methods based on characteristics of the data such as type of variables, sample size, hypothesis being tested, and dependence/independence of variables.
- It provides examples of common statistical tests for different data types and hypotheses, such as parametric vs non-parametric tests, univariate vs bivariate vs multivariate analyses, and tests for differences vs tests for associations.
- Checking for normality of data distributions is discussed, including graphical and statistical tests of normality and how to interpret their results. Skewness and kurtosis are introduced as measures of normality.

Uploaded by

Dr.Shaifali Garg
Copyright
© All Rights Reserved

Day 1

• Choosing a statistical method for analysis based on type of data
• Rationale for checking normality
• Conversion to Z scores: logic, method and usefulness
• Tests/Methods of checking normality
• Practical problems
• Interpretation of the results
Choosing a statistical method

Choosing an appropriate statistical method depends upon:

• Number, distribution and level of measurement of variables
• Dependence and independence structure
• Nature of hypothesis
• Sample size
Choosing a statistical method-Variables

Distribution of underlying variables

• Numerical variables with data distributions: Parametric tests
• No distribution: Non-parametric tests
Choosing a statistical method-Variables

Number of variables in the hypothesis

• One: Univariate
• Two: Bivariate
• More than two: Multivariate

Level of measurement of variables

• Categorical: Nominal, Ordinal
• Numerical: Interval, Ratio
Choosing a statistical method-Dependence

Dependence/Independence structure

• Dependent variables
• Independent variables

Choosing a statistical method-Hypothesis

Nature of Hypothesis

• Hypothesis of Association/Causation: examining the relationship between variables
• Hypothesis of differences: examining the differences in populations
Univariate data

Description/Inference

• Descriptive Statistics
• Inference: Hypothesis of difference: One sample t-test
Bi-variate data
Nature of Hypothesis

• Hypothesis of difference
  • Parametric data
    • Independent samples: Independent sample t test
    • Paired samples: Paired sample t test
  • Non-parametric data: Mann Whitney test
• Hypothesis of association
  • Parametric data
    • Association: Correlation
    • Causation: Regression
  • Non-parametric data
    • Nominal: Chi Square
    • Ordinal: Spearman's Correlation
Multi-variate data
Nature of Hypothesis

• Hypothesis of differences
  • Parametric data: ANOVA
  • Non-parametric data: Kruskal Wallis
• Hypothesis of association (by type of DV)
  • Numerical DV with numerical IVs: Multiple Regression
  • Categorical DV with numerical IVs: Discriminant analysis
  • Categorical DV with numerical and categorical IVs: Logistic regression
Choosing a statistical method-Inter-relationships

• Grouping cases/respondents: Cluster analysis
• Variable/dimension reduction: Factor Analysis
• Multiple dependency and multiple relationships: Structural Equation Modelling
Real life examples of Normality
Relevance of Central limit theorem

• Normality is the basis for many statistical tests
• The sample should be representative of the population to ensure that we can generalise the findings from the research sample to the population as a whole.
Concept of Normal Distribution
Conversion to Z Scores
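
The conversion named in this slide title can be illustrated with a minimal sketch. It assumes the usual definition z = (x - mean) / standard deviation and uses the population standard deviation; the `scores` values and the `z_scores` helper are hypothetical and are not taken from the slides.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; use stdev() for the sample SD
    return [(x - mu) / sigma for x in values]

# Hypothetical exam scores, for illustration only.
scores = [52, 60, 65, 70, 78]
z = z_scores(scores)

for raw, standardised in zip(scores, z):
    print(f"{raw:>3} -> z = {standardised:+.2f}")
```

After conversion the values have mean 0 and standard deviation 1, which is what makes variables measured on different scales comparable.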
Is my data normal? Graphical and statistical data exploration

• Normality must be checked for each variable separately
• Data need not be perfectly normal; it can be approximately normal
• Garbage in, garbage out

• Look at the data graphically first
• Look for patterns, relationships and/or potential problems
• Then use tests to confirm/reject your initial observation

• Histograms
• Boxplots
• PP Plots and Q Q Plots
• Skewness
• Kurtosis
• Shapiro-Wilk and Kolmogorov-Smirnov tests
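
As a rough illustration of what a Q Q plot compares, the sketch below pairs each standardised observation with the standard normal quantile expected at its rank. The plotting position (i - 0.5) / n is one common convention among several, and both the `qq_points` helper and the `scores` data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(values):
    """Return (theoretical quantile, standardised observation) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    mu, sigma = mean(ordered), pstdev(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n                    # plotting position (assumed convention)
        theoretical = std_normal.inv_cdf(p)  # expected z at this rank
        observed = (x - mu) / sigma          # actual z of the observation
        points.append((theoretical, observed))
    return points

# Hypothetical scores with one large value in the right tail.
scores = [52, 55, 58, 60, 61, 63, 65, 68, 72, 90]
pts = qq_points(scores)
for theoretical, observed in pts:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the data are approximately normal, the pairs fall close to the line y = x; points that drift away at one end suggest a heavy or skewed tail.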
Box plot

• It is a simple way of representing statistical data on a plot in which a rectangle is drawn to represent the first and third quartiles, usually with a vertical line inside to indicate the median value. The lower and upper values are shown as horizontal lines on either side of the rectangle.
One simple problem
Understanding boxplots
• The reaction times (in milliseconds) of a group of 20-year-olds and a group of 30-year-olds were tested. The reaction times for the 20-year-olds have been plotted below:
• The reaction times for the 30-year-olds are as follows:
• 220, 252, 256, 312, 332, 332, 400
• Construct a box plot for this set of data and note two differences between the two groups.
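
The five values needed to draw the box plot for the 30-year-olds can be computed as in the sketch below. It relies on the default ("exclusive") method of `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3, and the `five_number_summary` helper is a hypothetical name.

```python
from statistics import quantiles

# Reaction times (ms) for the 30-year-olds, as listed in the problem.
reaction_times_30 = [220, 252, 256, 312, 332, 332, 400]

def five_number_summary(values):
    """Minimum, quartiles and maximum needed to draw a box plot."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)  # default method is "exclusive"
    return {
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
    }

summary = five_number_summary(reaction_times_30)
print(summary)
print("IQR:", summary["q3"] - summary["q1"])
```

For this data the box runs from 252 to 332 with the median at 312, and the whiskers reach 220 and 400.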
Skewness

• Skewness is a measure of the asymmetry of a distribution. This value can be positive or negative.
• Negative skew indicates that the tail is on the left side of the distribution, which extends towards more negative values.
• Positive skew indicates that the tail is on the right side of the distribution, which extends towards more positive values.
• A value of zero indicates that there is no skewness in the distribution at all, meaning the distribution is perfectly symmetrical.
• Different formulae exist: Pearson's, Bowley's, Kelly's and the moment-based measure
Interpretation

• If the skewness is less than -1 or greater than +1, the data distribution is highly skewed.
• If the skewness is between -1 and -0.5, or between +0.5 and +1, the data distribution is moderately skewed.
• If the skewness is between -0.5 and +0.5, the distribution is approximately symmetric.
Quick check

• In the graph of a negatively skewed bell curve, what do A, B and C represent, respectively?
• Mode, Median, Mean
• Mean, Median, Mode
• Median, Mode, Mean
• Mean, Mode, Median
One illustration
Solution:

Class interval  Frequency (F)  Midpoint (X)  XF    X-142.5  (X-142.5)^2  F*(X-142.5)^2  (X-142.5)^3   F*(X-142.5)^3
0-50            2              25            50    -117.5   13806.25     27612.5        -1622234.375  -3244468.75
50-100          3              75            225   -67.5    4556.25      13668.75       -307546.875   -922640.625
100-150         5              125           625   -17.5    306.25       1531.25        -5359.375     -26796.875
150-200         6              175           1050  32.5     1056.25      6337.5         34328.125     205968.75
200-250         4              225           900   82.5     6806.25      27225          561515.625    2246062.5
Sum             20                           2850  -87.5    26531.25     76375          -1339296.875  -1741875

Mean = 142.5
Standard deviation = 61.80
Skewness = -0.39

The skewness is between -0.5 and +0.5, so the distribution is approximately symmetric.
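
The grouped-data calculation above can be reproduced with the sketch below. One caveat: the reported -0.39 appears to come from dividing the sum of F*(X-142.5)^3 by n - 1 while using the population standard deviation (61.80). That is an inference from the numbers, not something the slides state. Dividing by n throughout gives about -0.37, and either value leads to the same interpretation. The `describe_skewness` helper is a hypothetical name.

```python
from math import sqrt

# Midpoints and frequencies from the table.
midpoints = [25, 75, 125, 175, 225]
freqs = [2, 3, 5, 6, 4]

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, midpoints))
sum_cubed = sum(f * (x - grouped_mean) ** 3 for f, x in zip(freqs, midpoints))

sd = sqrt(sum_sq / n)  # population standard deviation, as in the table

# Moment coefficient of skewness with n in every denominator.
skew_population = (sum_cubed / n) / sd ** 3
# Assumed variant that reproduces the slide's -0.39 (n - 1 in the third moment).
skew_slide = (sum_cubed / (n - 1)) / sd ** 3

def describe_skewness(g):
    """Apply the rule-of-thumb cut-offs from the Interpretation slide."""
    if g < -1 or g > 1:
        return "highly skewed"
    if g < -0.5 or g > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

print(f"mean = {grouped_mean}, sd = {sd:.2f}")
print(f"skewness (n) = {skew_population:.2f}, skewness (n - 1) = {skew_slide:.2f}")
print(describe_skewness(skew_slide))
```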
Kurtosis

• Kurtosis is a measure of whether or not a distribution is heavy-tailed or light-tailed relative to a normal distribution.
• The kurtosis of a normal distribution is 3.
• If a given distribution has a kurtosis less than 3, it is said to be platykurtic, which means it tends to produce fewer and less extreme outliers than the normal distribution.
• If a given distribution has a kurtosis greater than 3, it is said to be leptokurtic, which means it tends to produce more outliers than the normal distribution.
Kurtosis-Formula, interpretation and illustration

Interpreting excess kurtosis (kurtosis minus 3):

• If positive, then it is a leptokurtic distribution
• If zero, then it is a mesokurtic distribution
• If negative, then it is a platykurtic distribution

Calculate Kurtosis for the given data: 26, 12, 16, 56, 112, 24

Y     Y-41   (Y-41)^2   (Y-41)^4
26    -15    225        50625
12    -29    841        707281
16    -25    625        390625
56    15     225        50625
112   71     5041       25411681
24    -17    289        83521
Sum   0      7246       26694358

Mean = 41
Second moment or variance = 1207.667
Fourth moment = 4449060
Kurtosis = 3.050521

Since the kurtosis of the distribution (3.05) is slightly greater than 3, it is a leptokurtic distribution, although only marginally so.
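
The kurtosis calculation above can be checked with the short sketch below. It assumes the moment-based formula implied by the worked table, kurtosis = fourth moment / (second moment)^2, with n in the denominator of both moments; the variable names are illustrative.

```python
# Data from the worked example.
data = [26, 12, 16, 56, 112, 24]

n = len(data)
data_mean = sum(data) / n
second_moment = sum((y - data_mean) ** 2 for y in data) / n  # population variance
fourth_moment = sum((y - data_mean) ** 4 for y in data) / n

kurtosis = fourth_moment / second_moment ** 2
excess_kurtosis = kurtosis - 3  # 0 for a normal distribution

if excess_kurtosis > 0:
    shape = "leptokurtic"
elif excess_kurtosis < 0:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(f"mean = {data_mean}, variance = {second_moment:.3f}")
print(f"kurtosis = {kurtosis:.6f}, excess = {excess_kurtosis:.6f} ({shape})")
```

The excess kurtosis is only about 0.05, so the data are very close to mesokurtic even though the sign makes them formally leptokurtic.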
Broad guideline for reporting of normality

• A visual inspection of the histogram/box plot/Q Q plot shows that the data (e.g. exam scores) were approximately normally distributed for both Group A (males), with a skewness of ________ and a kurtosis of ________, and Group B (females), with a skewness of ________ and a kurtosis of ________.
