
Normal Probability Plot

Shibdas Bandyopadhyay
shibdas@isical.ac.in
Indian Statistical Institute

Abstract

Normal probability plots are used to verify graphically the normality assumption
for mutually independent and identically distributed data from a univariate
population. The normal probability plot is a standard option in most statistical
packages. In the context of design of experiments or regression, the observations
are assumed to be mutually independent and homoscedastic, but they have different
unknown expectations, so the raw data are inappropriate for a normality check. To
overcome the problem of unequal expectations, it is common to use the residuals of
a fitted regression model. The residuals have zero expectation, but they are
heteroscedastic and mutually dependent, so it is also inappropriate to use the
residuals for a normality check. In this study, mutually independent homoscedastic
components with zero mean are extracted from the residuals through principal
component analysis; these are then used for the normal probability plot. The
technique is illustrated with data.

Key words and phrases: Normal probability plot, principal component analysis.
AMS (1991) subject classification: 62P.

1. Introduction
Let $Y_1, Y_2, \ldots, Y_n$ be mutually independent with common mean $\mu$ and
standard deviation $\sigma$. To check graphically whether the data are from a
common normal distribution, one plots $Y_{(i)}$, the $i$th order statistic of
$Y_1, Y_2, \ldots, Y_n$, against $\Phi^{-1}(c_i)$, $i = 1, 2, \ldots, n$; if the
line plot is nearly linear, one is satisfied with the normality assumption. In the
plot, $\sigma$ happens to be the slope of the straight line of $Y_{(i)}$ on
$\Phi^{-1}(c_i)$; the $c_i$'s are chosen to estimate $\sigma$ 'efficiently' (David
and Nagaraja, 2003). The $c_i$'s currently used in statistical packages such as
Minitab (Blom, 1958) are

$$c_i = (i - \tfrac{3}{8})/(n + \tfrac{1}{4}), \quad i = 1, 2, \ldots, n. \qquad (1.1)$$

The line plot of $Y_{(i)}$ on $\Phi^{-1}(c_i)$ is called the Normal Probability Plot.
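
For concreteness, the plot is easy to produce outside a package such as Minitab.
The following is a minimal sketch in Python (the language and all names here are
our illustrative choices, not part of any cited package) that computes Blom's
plotting positions (1.1) and plots the order statistics against $\Phi^{-1}(c_i)$:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def normal_probability_plot(y):
    # Order statistics Y_(1) <= ... <= Y_(n)
    y_ordered = np.sort(np.asarray(y, dtype=float))
    n = len(y_ordered)
    i = np.arange(1, n + 1)
    c = (i - 3.0 / 8.0) / (n + 1.0 / 4.0)   # Blom's c_i, eq. (1.1)
    q = norm.ppf(c)                         # Phi^{-1}(c_i)
    plt.plot(q, y_ordered, "o-")
    plt.xlabel("Phi-inverse(c_i)")
    plt.ylabel("Ordered observations")
    plt.show()

If the points fall close to a straight line, its slope estimates $\sigma$ and its
intercept estimates $\mu$.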

While testing hypotheses about $\mu$, it is natural to check the normality
assumption using a normal probability plot. The use of the normal probability plot
to check the normality assumption has become common in other situations as well.
In this study, we consider the use of the normal probability plot to check the
normality assumption for the response in the context of regression and design of
experiments.

Consider the standard linear regression model:

$$Y = X\beta + \varepsilon \qquad (1.2)$$

where $Y$ is the $n \times 1$ response vector, $X$ is the $n \times p$ design
matrix of rank $r \le p$, $\beta$ is the $p \times 1$ vector of unknown parameters,
and $\varepsilon$ is the $n \times 1$ unobservable vector of error components; the
error components are assumed to be mutually independent and identically
distributed with zero mean and standard deviation $\sigma$.

Though the $n$ components of $Y$ are independently distributed with common
standard deviation $\sigma$, the components of $Y$ do not have a common mean: the
$i$th component $Y_i$ of $Y$ has mean $\mu_i = X_i'\beta$, where $X_i'$ is the
$i$th row of $X$, $i = 1, 2, \ldots, n$.
1
So, a line plot of Y(i ) on  (ci ) is not meaningful to check the normality of Y i ’s.
It has become a standard practice, as in Minitab, to work with ˆ , the n1 residuals:

ˆ = Y – X ̂ (1.3)
and make a line plot of ˆi , the i component of ˆ , on  (ci ) , ci ’s given by (1.1).
th 1

We use match factory data (Roy et al., 1959) for illustration. The data are the
scores of n = 25 workers on three psychological tests $U_1$, $U_2$, $U_3$,
together with their efficiency index $Y$. After fitting the regression

$$Y = \beta_1 + \beta_2 U_1 + \beta_3 U_2 + \beta_4 U_3 \qquad (1.4)$$

(with $X_1 \equiv 1$, $X_2 = U_1$, $X_3 = U_2$ and $X_4 = U_3$), the $1 \times 25$
vector $\hat{\varepsilon}'$ of residuals is:

(3.33, –0.18, –0.88, –3.62, –5.16, –2.24, 0.92, 3.42, –0.22, –0.52,
–1.61, –1.37, –1.27, 1.31, 0.12, 1.16, 2.17, 0.66, 0.88, –3.07,
0.055, –2.28, 0.69, 3.87, 3.84).

Fig. 1 is a line plot of $\hat{\varepsilon}_{(i)}$ on $\Phi^{-1}(c_i)$, with
$c_i = (i - \tfrac{3}{8})/25.25$, $i = 1, 2, \ldots, 25$.

[Figure: regression residuals plotted against Phi-inverse(c_i)]

Fig. 1: Normal probability plot with regression residuals

But this line plot of $\hat{\varepsilon}_{(i)}$ on $\Phi^{-1}(c_i)$, with
$c_i = (i - \tfrac{3}{8})/(n + \tfrac{1}{4})$, $i = 1, 2, \ldots, n$, is not
appropriate for checking the normality of the $Y_i$'s. It is true that, when the
mutually independent $Y_i$'s are normally distributed with mean
$\mu_i = X_i'\beta$ and common standard deviation $\sigma$, the
$\hat{\varepsilon}_i$'s are distributed as normal with mean zero; but their
standard deviations are different multiples (depending on $X$) of $\sigma$. Also,
the $\hat{\varepsilon}_i$'s are not mutually independent. So, one needs a
modification (Hocking, 2003).
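
Both defects can be seen numerically in a small sketch (continuing the
illustrative Python setup): the covariance matrix of $\hat{\varepsilon}$ is
$\sigma^2 H$ for the projection matrix $H$ computed below, whose diagonal entries
are unequal (heteroscedasticity) and whose off-diagonal entries are non-zero
(dependence).

import numpy as np

def residual_projection(X):
    # H = I_n - X (X'X)^- X'; np.linalg.pinv supplies a g-inverse of X'X
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T

# Cov(eps_hat) = sigma^2 * H: unequal diag(H) => heteroscedastic residuals,
# nonzero off-diagonal H[i, j] => mutually dependent residuals.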
This study suggests a natural modification: extracting independent and identically
distributed normal components from $\hat{\varepsilon} = Y - X\hat{\beta}$ using
principal component analysis. The suggested modification cannot be carried out
with statistical tables and a calculator; it is computer-intensive. One needs a
principal component analysis module, which is common in most statistical packages,
such as Eigen Analysis in Minitab.

2. Extraction of independent and identically distributed components using
principal component analysis

Consider the regression model $Y = X\beta + \varepsilon$ of (1.2), along with the
assumptions stated there. One may write $\hat{\varepsilon}$, with
$\hat{\beta} = (X'X)^- X'Y$, as

$$\hat{\varepsilon} = Y - X\hat{\beta} = (I_n - X(X'X)^- X')\,Y \equiv HY \qquad (2.1)$$

where $(X'X)^-$ is a g-inverse of $X'X$ and $H = I_n - X(X'X)^- X'$. It follows
that $\hat{\varepsilon}$ has a singular normal distribution, the mass of the joint
density of the $n$ components of $\hat{\varepsilon}$ lying in $(n - r)$
dimensions, with zero mean and covariance matrix $\sigma^2 H$,
rank$(H) = n - r$. Since $H$ is symmetric and idempotent of rank $(n - r)$, the
characteristic roots of $H$ are 1 with multiplicity $(n - r)$ and 0 with
multiplicity $r$. Using the spectral decomposition of $H$, we may write

$$H = P \begin{pmatrix} I_{n-r} & 0 \\ 0 & 0 \end{pmatrix} P', \qquad PP' = P'P = I_n.$$

$P$ is a non-stochastic orthogonal matrix and depends only on the design matrix
$X$. $P'\hat{\varepsilon}$ has a singular normal distribution, the mass of the
joint density of the $n$ components of $P'\hat{\varepsilon}$ lying in $(n - r)$
dimensions, with zero mean and covariance matrix
$\sigma^2 \begin{pmatrix} I_{n-r} & 0 \\ 0 & 0 \end{pmatrix}$. Thus, if we write
$P = (P^{(1)} \; P^{(2)})$, where $P^{(1)}$ consists of the first $(n - r)$
columns of $P$ (the characteristic vectors corresponding to the $(n - r)$
non-zero characteristic roots of $H$) and $P^{(2)}$ consists of the remaining $r$
columns of $P$, then the $(n - r)$ components of $P^{(1)\prime}\hat{\varepsilon}$
are independent and identically distributed normal with zero mean and standard
deviation $\sigma$, while the remaining $r$ components of
$P^{(2)\prime}\hat{\varepsilon}$ are identically zero (zero mean and zero
variance).
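
A sketch of this extraction in the illustrative Python setup: eigen-decompose
$H$, take the eigenvectors belonging to the unit characteristic roots as
$P^{(1)}$, and form $P^{(1)\prime}\hat{\varepsilon}$.

import numpy as np

def iid_components(X, y, tol=1e-8):
    n = X.shape[0]
    H = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    # H is symmetric idempotent, so its roots are (numerically) 0 or 1
    eigvals, eigvecs = np.linalg.eigh(H)
    P1 = eigvecs[:, eigvals > 1.0 - tol]   # P^(1): vectors for the (n - r) unit roots
    eps_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return P1.T @ eps_hat                  # (n - r) i.i.d. N(0, sigma^2) components

The returned components can then be passed to the earlier plotting sketch, with
$n$ replaced by $n - r$ in (1.1).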

For the match factory data, $(P'\hat{\varepsilon})'$ is the $1 \times 25$ vector

$(P'\hat{\varepsilon})'$ = (1.84, –0.36, 1.30, 1.40, 0.59, –4.58,
–2.55, –1.93, 0.26, –1.56, 0.82, 2.83,
0.68, 2.77, 3.97, –2.59, –3.63, 2.11,
5.54, –0.86, –0.061, 0, 0, 0, 0). (2.2)

Notice that each of the last four components of $P'\hat{\varepsilon}$, that is, of
$P^{(2)\prime}\hat{\varepsilon}$, is 0, as it should be.

Fig. 2 is a line plot of the $i$th order statistic of the 21 components of
$P^{(1)\prime}\hat{\varepsilon}$ on $\Phi^{-1}(c_i)$, with
$c_i = (i - \tfrac{3}{8})/21.25$ (since $r = p = 4$, $n - r = 21$),
$i = 1, 2, \ldots, 21$.

[Figure: principal components of the residuals plotted against Phi-inverse(c_i)]

Fig. 2: Normal probability plot with principal components of regression residuals

We do not wish to compare the two figures. We only want to point out that the
suggested analysis with principal components is an appropriate method and is not
difficult to implement in packages that have an eigen analysis module.

References

Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Wiley, New York.
David, H. A. and Nagaraja, H. N. (2003). Order Statistics. Wiley-Interscience.
Hocking, R. R. (2003). Methods and Applications of Linear Models. Wiley-Interscience.
Roy, J., Chakravarty, I. M. and Laha, R. G. (1959). Handbook of Methods of Applied
Statistics, Vol. 1. John Wiley & Sons, Inc.
