
IN THE CLASSROOM

RICK HESSE, Feature Editor, Pepperdine University

Normal Probability Plots

By Rick Hesse, Pepperdine University

Since this column has dealt with spreadsheet applications of decision sciences, readers may be interested to know that during June 27-30, 1998, the Tuck School at Dartmouth College will host an intensive, three-day workshop on Teaching Management Science with Spreadsheets. The workshop is intended for teachers who are new to teaching management science with spreadsheets, as well as those who are experienced. It will bring together a faculty of 15 experts in teaching with spreadsheets (including me) to work closely with a participant group of no more than 60 on a wide range of topics, from how to use spreadsheets effectively in the classroom to how to teach nonlinear programming using spreadsheets. The workshop is cosponsored by INFORMS, DSI, and IFORS. The cost is $695 before May 15, 1998, and $795 thereafter. Complete information and registration forms are available at: www.dartmouth.edu/tuck/tmss

As I have started to teach graduate and undergraduate statistics again these last two years, I have been impressed with how many simple, visual tools are available to look at the data. A few months ago I shared a Box and Whiskers plot and now would like to share a normal probability plot. These plots are a quick and dirty graphing technique to see if a data set exhibits the properties of a normal distribution. The idea is to rank the data set, change the ranks into percentiles, and convert the percentiles to z-scores. If the data is indeed approximately normally distributed, then the converted data points should lie in a straight line. Since the human eye can judge a straight line more easily than a curve, this plot is an easier visual test of normality than the cumulative probability plot. Shown in Figure 1 is the cumulative probability plot for the grades for 17 students of mine in the summer statistics class. The data to build the step function is shown along with the plot.

Score   Rank   (R-.5)/N
49      1      2.9%
57      2      8.8%
58      3      14.7%
61      4      20.6%
62      5      26.5%
64      6      32.4%
66      7      38.2%
67      8.5    47.1%
67      8.5    47.1%
73      10     55.9%
74      11.5   64.7%
74      11.5   64.7%
84      13     73.5%
87      14.5   82.4%
87      14.5   82.4%
89      16     91.2%
91      17     97.1%

[Plot: Empirical cdf. x-axis: score, 45 to 100; y-axis: cumulative percentage, 0% to 100%.]

Figure 1: Cumulative plot.

Rick Hesse is professor of quantitative methods at Pepperdine University in the Graziadio Graduate School of Business. He received his B.S., M.S., and D.Sc. at Washington University School of Engineering in applied math and computer science. Dr. Hesse is the author of Managerial Spreadsheet Modeling & Analysis and Applied Management Science: A Quick & Dirty Approach (with Gene Woolsey), articles in numerous journals, and software for personal computers. Rick was the first professor to be awarded the Outstanding Civilian Service Medal by the Department of the Army at West Point in 1982, and was the winner of the Decision Sciences Institute's Innovative Instructional Award in 1981.

Decision Line, December/January 1998

If the data approximates a normal distribution, the curve should look like an S, but this is a difficult visual to quantify for most people. How much does it have to look like an S for the data to approximate a normal curve? Human beings are better at identifying something that looks like a straight line. Therefore, a normal probability plot is an easier way to check for normality.

Excel 97, although it has many new features, does not have a Data Analysis program to convert a data set into a normal probability plot (or a cumulative step function, for that matter), and so this article gives a simple template to accomplish this. A special feature of this template is that the data does not have to be sorted, but can be in any order, as long as it is in one column.

Shown in Figure 2 is the data for the template, including the raw scores in column A (given the range name XS), the computed ranks (1 is lowest) in column B, the associated percentile with that rank in column C, and finally the z-scores for that percentile in column D (given the range name ZS). The important cell formulas are given below, with B4 being the most complicated, because Excel only gives the lowest rank for tied ranks.

B4: = (COUNT(XS)+1+RANK(A4,XS,1)-RANK(A4,XS,0))/2
C4: = (B4-0.5)/COUNT(XS)
D4: = NORMINV(C4,0,1)

Once the z-scores have been computed, columns A and D are graphed. This is an XY scatter plot that uses points only (otherwise the graph would look like Etch-A-Sketch!). After the graph is created, the trendline and the equation are added on the chart by clicking on the data points, then using the right mouse button to bring up the trendline menu. By checking the options to show the equation and R-squared, we have all the information shown in Figure 3. This graph allows us to check visually whether the points are indeed close to a straight line, or whether there is a pattern to the points being above or below the line.

If the data is somewhat close to being normally distributed, the points should lie approximately on the trend line, with the line crossing the x-axis at about the mean of the data, and the reciprocal of the slope should be close to the standard deviation of the data.
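The reason the trend line recovers the mean and standard deviation can be sketched in a few lines. The symbols below (\mu, \sigma, \Phi, a, b, and x_{(i)} for the i-th smallest score) are notation assumed here for illustration; they do not appear in the template itself.

```latex
\begin{align*}
p_i &= \frac{R_i - 0.5}{N}, \qquad z_i = \Phi^{-1}(p_i)
  && \text{(columns C and D)} \\
x_{(i)} &\approx \mu + \sigma z_i
  && \text{(if the data are roughly normal)} \\
z_i &\approx \frac{1}{\sigma}\, x_{(i)} - \frac{\mu}{\sigma}
  && \text{(solve for } z_i\text{)} \\
z &= b\,x + a \;\Longrightarrow\; b \approx \frac{1}{\sigma}, \quad a \approx -\frac{\mu}{\sigma}
  && \text{(fitted trend line)} \\
\mu &\approx -\frac{a}{b}, \qquad \sigma \approx \frac{1}{b}
\end{align*}
```

Setting z = 0 in the fitted line gives x = -a/b, which is why the x-axis crossing approximates the mean.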
Normal Probability Plot

Row   Score (A)   Rank (B)   (R-.5)/N (C)   z-score (D)
3     57          2          8.8%           -1.352
4     67          8.5        47.1%          -0.074
5     49          1          2.9%           -1.890
6     64          6          32.4%          -0.458
7     73          10         55.9%          0.148
8     91          17         97.1%          1.890
9     66          7          38.2%          -0.299
10    87          14.5       82.4%          0.929
11    74          11.5       64.7%          0.377
12    58          3          14.7%          -1.049
13    62          5          26.5%          -0.629
14    74          11.5       64.7%          0.377
15    84          13         73.5%          0.629
16    87          14.5       82.4%          0.929
17    67          8.5        47.1%          -0.074
18    61          4          20.6%          -0.821
19    89          16         91.2%          1.352

Figure 2: Data for normal probability plot.
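For readers who want to check the template outside Excel, the three columns of Figure 2 can be reproduced with a minimal Python sketch. The function name `average_rank` and the variable names are illustrative assumptions, and the standard library's `NormalDist().inv_cdf` stands in for NORMINV(p,0,1).

```python
from statistics import NormalDist

# Raw scores in the unsorted order shown in Figure 2 (column A, range XS).
scores = [57, 67, 49, 64, 73, 91, 66, 87, 74, 58, 62, 74, 84, 87, 67, 61, 89]


def average_rank(x, data):
    """Tie-averaged rank, mirroring (COUNT+1+RANK(x,,1)-RANK(x,,0))/2."""
    ascending = 1 + sum(v < x for v in data)   # like RANK(x, data, 1)
    descending = 1 + sum(v > x for v in data)  # like RANK(x, data, 0)
    return (len(data) + 1 + ascending - descending) / 2


# Column B: ranks with ties averaged (1 is lowest).
ranks = [average_rank(x, scores) for x in scores]
# Column C: percentile associated with each rank, (R - 0.5) / N.
percentiles = [(r - 0.5) / len(scores) for r in ranks]
# Column D: standard normal z-score for each percentile.
z_scores = [NormalDist().inv_cdf(p) for p in percentiles]

for x, r, p, z in zip(scores, ranks, percentiles, z_scores):
    print(f"{x:5d} {r:5.1f} {p:6.1%} {z:7.3f}")
```

The data can stay in any order because each rank is computed by counting the values below and above the score, which is the same trick the B4 formula uses to average tied ranks.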

[Chart: Normal Probability Plot. x-axis: Raw Score; y-axis: z-Score, -3 to 3. Trend line: y = 0.0764x - 5.4416, R-squared = 0.9535.]

Figure 3: Regression line added to probability plot.
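The trend line in Figure 3 is an ordinary least-squares fit of the z-scores on the raw scores. As a rough check, the sketch below refits the line from the values listed in Figure 2, using the textbook sums-of-squares formulas; the variable names are assumptions, and because the z-scores are typed in at three decimals the result should agree with the chart only to about that precision.

```python
# Raw scores (x) and z-scores (y) as listed in Figure 2.
scores = [57, 67, 49, 64, 73, 91, 66, 87, 74, 58, 62, 74, 84, 87, 67, 61, 89]
z_scores = [-1.352, -0.074, -1.890, -0.458, 0.148, 1.890, -0.299, 0.929, 0.377,
            -1.049, -0.629, 0.377, 0.629, 0.929, -0.074, -0.821, 1.352]

n = len(scores)
mean_x = sum(scores) / n
mean_z = sum(z_scores) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in scores)
szz = sum((z - mean_z) ** 2 for z in z_scores)
sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(scores, z_scores))

slope = sxz / sxx                       # coefficient of x in the trend line
intercept = mean_z - slope * mean_x     # constant term of the trend line
r_squared = sxz ** 2 / (sxx * szz)      # squared correlation of x and z

print(f"z = {slope:.4f} * x + ({intercept:.4f}), R^2 = {r_squared:.4f}")
```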

Excel gives the regression line coefficients on the chart, but they can also be computed using the formulas given for cells F18 and H18 (see Figure 5).

F18: = TREND(ZS,XS,1,1)-H18
H18: = TREND(ZS,XS,0,1)

For normal probability plots, the mean is approximated by the point where the straight line crosses the x-axis (-intercept/slope), and the approximate standard deviation is the reciprocal of the slope. These are computed in cells G3 and G4 and are also shown in Figure 5.

G3: = -H18/F18
G4: = ABS(1/F18)

Looking at Figure 3, it is pretty easy to see that these few data points seem to fall on a straight line and that the data distribution seems to be normal. This visual check is a lot easier than using Figure 1 to see if the plot looks like an S-shaped curve. If a box and whiskers plot is used (see Figure 4), the plot is inconclusive, because a normal box plot should have two long whiskers and equal rectangles in the box.

Figure 4: Box and whiskers plot.

Row   E          F (Calculated)   G (Estimated)   H (% difference)
3     Average    71.18            71.19           0.02%
4     Std Dev    12.66            13.08           3.37%
5     Median     67.00                            5.87%
6     P(<=avg)   52.94%                           5.88%
18    Slope      0.0764           Intercept       -5.4416

Figure 5: Calculated and estimated summary data.

When there are a lot of data points, not all of the data points need be plotted; perhaps every other or every third point is enough. This is true both for the normal probability plot and for box and whiskers. It is an easy matter to expand (or contract) the template in the middle (somewhere between rows 7-17). When all the appropriate formulas are copied, the graph is automatically redrawn and the trend line recomputed by Excel.

A final feature of the template is shown in Figure 5. The percent differences between the calculated and estimated mean and standard deviation are given in H3:H4. Also given is the median, for purposes of comparing how close it is to the mean (the two are theoretically equal for a normal distribution). The median is calculated and compared to the calculated mean, and then a simple logic check is used in K3:K19 (not shown) to compute the percentage of points at or below the mean in F6. The cell formulas are:

F3: = AVERAGE(XS)
F4: = STDEV(XS)
F5: = MEDIAN(XS)
F6: = AVERAGE(K3:K19)
K3: = (A3<=$F$3)*1 copy to K3:K19
H5: = ABS(F5/F3-1)
H6: = ABS(1-F6/.5)
H3: = ABS(G3/F3-1) copy to H4

This template, with both the visual plot and the actual versus estimated parameters, should be able to give the user a good idea if the data is normal without resorting to more difficult tests.
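The calculated-versus-estimated comparison of Figure 5 can also be sketched in Python as a cross-check on the template. This is an illustrative version, not the template itself: the helper `z_score` and all variable names are assumptions, `NormalDist().inv_cdf` stands in for NORMINV, and the line is fitted with the usual least-squares formulas instead of TREND.

```python
from statistics import NormalDist, mean, median, stdev

scores = [57, 67, 49, 64, 73, 91, 66, 87, 74, 58, 62, 74, 84, 87, 67, 61, 89]
n = len(scores)


def z_score(x):
    """z-score for the tie-averaged rank of x, using (R - 0.5) / N."""
    below = sum(s < x for s in scores)
    above = sum(s > x for s in scores)
    rank = (n + 1 + below - above) / 2
    return NormalDist().inv_cdf((rank - 0.5) / n)


z = [z_score(x) for x in scores]

# Calculated column (F3:F6).
avg = mean(scores)                            # AVERAGE(XS)
sd = stdev(scores)                            # STDEV(XS), sample standard deviation
med = median(scores)                          # MEDIAN(XS)
p_below = sum(x <= avg for x in scores) / n   # share of points at or below the mean

# Least-squares line z = slope * x + intercept (F18 and H18).
mean_z = mean(z)
sxx = sum((x - avg) ** 2 for x in scores)
sxz = sum((x - avg) * (zi - mean_z) for x, zi in zip(scores, z))
slope = sxz / sxx
intercept = mean_z - slope * avg

# Estimated column (G3:G4) and percent differences (H3:H6).
est_mean = -intercept / slope
est_sd = abs(1 / slope)
diff_mean = abs(est_mean / avg - 1)
diff_sd = abs(est_sd / sd - 1)
diff_median = abs(med / avg - 1)
diff_p = abs(1 - p_below / 0.5)

print(f"mean {avg:.2f} vs {est_mean:.2f} ({diff_mean:.2%})")
print(f"sd   {sd:.2f} vs {est_sd:.2f} ({diff_sd:.2%})")
print(f"median {med:.2f} ({diff_median:.2%}), P(<=avg) {p_below:.2%} ({diff_p:.2%})")
```

Small percent differences in all four rows are what the template treats as support for normality; here the median and P(<=avg) rows differ by nearly 6%, which is worth weighing alongside the visual plot.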

Dr. Rick Hesse, Graziadio Graduate School of Business, Pepperdine University, email: [email protected].
