Statistical Methodology Step of Scientific Research Important Parametric Tests Important Nonparametric Tests Example Using Excel Program Using Excel For Statistics in Gateway Cases - Office 2007
Dr. Atcharaporn Khoomtong
Elementary statistics 1
Introduction
Statistical methodology
Step of scientific research
Important parametric tests
Important nonparametric tests
Example using Excel program
Using Excel for Statistics in Gateway Cases - Office 2007
Most people become familiar with probability and
statistics through radio, television, newspapers and
magazines. For example, the following statements
were found in newspapers.
Eating 10 grams (g) of fiber a day reduces the risk
of heart attack by 14%.
Thirty minutes (of exercise) two or three times
each week can raise HDLs 10 to 15%.
What is it?
Flower
Statistics plays an important role in the
description of mass phenomena.
Data are organized and summarized for clear
presentation and ease of communication.
Data may come from studies of populations
or samples
It offers methods to summarize a collection
of data. These methods may be numerical or
graphical, both of which have their own
advantages and disadvantages.
A Population is the set of all possible states of a random
variable. The size of the population may be either infinite or
finite.
A Sample is a subset of the population; its size is always finite.
Descriptive Statistics
  Graphical
    Arrange data in tables
    Bar graphs and pie charts
  Numerical
    Percentages
    Averages
    Range
  Relationships
    Correlation coefficient
    Regression analysis
Inferential Statistics
  Confidence interval
  Compare means of two samples
    t Test
    F-Test
  Compare means from three samples
    Pre/post (LSD, DMRT)
    ANOVA = analysis of variance
    F-Test
Type of Scale: nominal scale
  Possible Statements: identity, countable
  Allowed Operators: =, ≠
  Examples: colors, phone numbers, feelings
Type of Scale: ordinal scale
  Possible Statements: identity, less than/greater than relations, countable
  Allowed Operators: =, ≠, <, >
  Examples: soccer league table, military ranks, energy efficiency classes
Type of Scale: interval scale
  Possible Statements: identity, less than/greater than relations, equality of differences
  Allowed Operators: =, ≠, <, >, +, -
  Examples: dates (years), temperature in Celsius, IQ scale
Type of Scale: ratio scale
  Possible Statements: identity, less than/greater than relations, equality of differences, equality of ratios, zero point
  Allowed Operators: =, ≠, <, >, +, -, *, /
  Examples: velocities, lengths, temperature in Kelvin, age
Collecting the necessary facts -> Analyzing the facts -> Making decisions
Descriptive Statistics; Inferential Statistics
Mode = The most frequent value
Median = The value of the middle point of the ordered measurements
Mean = The average (balancing point in the distribution)
Variance = The average of the squared deviations of all the population measurements from the population mean
Standard deviation = The square root of the variance
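Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Called the unbiased estimator of the population value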
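Population of profit margins for five companies:
8%, 10%, 15%, 12%, 5%
$\mu = \frac{8 + 10 + 15 + 12 + 5}{5} = \frac{50}{5} = 10\%$
$\sigma^2 = \frac{(8-10)^2 + (10-10)^2 + (15-10)^2 + (12-10)^2 + (5-10)^2}{5}$
$= \frac{(-2)^2 + 0^2 + 5^2 + 2^2 + (-5)^2}{5} = \frac{4 + 0 + 25 + 4 + 25}{5} = \frac{58}{5} = 11.6$
$\sigma = \sqrt{\sigma^2} = \sqrt{11.6} = 3.406\%$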
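The arithmetic for the five profit margins can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the slides. It also prints the sample variance (divisor n - 1) so the difference from the population variance (divisor N) is visible.

```python
import math
import statistics

# Profit margins (%) for the population of five companies, as listed above
margins = [8.0, 10.0, 15.0, 12.0, 5.0]

mu = statistics.mean(margins)            # population mean
sigma2 = statistics.pvariance(margins)   # population variance: divides by N
sigma = math.sqrt(sigma2)                # population standard deviation
s2 = statistics.variance(margins)        # sample variance: divides by n - 1

print(f"mean = {mu}, population variance = {sigma2}, "
      f"standard deviation = {sigma:.3f}, sample variance = {s2}")
```

Treating the same five values as a sample instead of a population changes only the divisor, which is why the sample variance comes out larger.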
Later.
H0: $\mu_1 = \mu_2$
Type I error (α):
reject H0 | H0 true
Type II error (β):
accept H0 | H1 true
If the calculated F value is greater than the critical F value,
the result is significant: reject H0.
                                  H0 true                 HA true
Decide H0 (fail to reject H0)     True Negative (1-α)     False Negative (β)
Decide HA (reject H0)             False Positive (α)      True Positive (1-β)
α = significance level
1-β = power
Z test
t test
F test
Z - test
is based on the normal probability distribution and is used for
judging the significance of several statistical measures, particularly
the mean. (n>30)
z-test is generally used for comparing the mean of a sample to
some hypothesized mean for the population in case of large sample
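As an illustration of the large-sample comparison described above, the following sketch computes a two-sided one-sample z statistic and its p value. The data, the hypothesized mean `mu0`, and the population standard deviation `sigma` are all made-up values chosen for the example; they do not come from the slides.

```python
import math
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z test.

    sigma is the population standard deviation, assumed known
    (or well estimated because the sample is large, n > 30).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample of 35 measurements whose mean is exactly 50
sample = [50 + (i % 7) - 3 for i in range(35)]

z, p_value = z_test(sample, mu0=49, sigma=2)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0 at the 5% level")
else:
    print("Fail to reject H0 at the 5% level")
```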
T test
is based on t-distribution and is considered an appropriate
test for judging the significance of a sample mean or for
judging the significance of difference between the means
of two samples in case of small sample(s) when population
variance is not known (in which case we use variance of
the sample as an estimate of the population variance).
Unknown variance
Under H0: $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{(n-1)}$
Critical values: statistics books or computer
t-distribution approximately normal for degrees of freedom (df) >30
F test
is based on F-distribution and is used to compare the variances of
two independent samples. This test is also used in the context
of analysis of variance (ANOVA) for judging the significance of
more than two sample means at one and the same time.
Anova tables:
for a 1-way anova with N observations and T treatments.
Source      df                SS      MS                      F
treatment   (T-1)             SStrt   MStrt = SStrt/(T-1)     MStrt/MSerr
error       by subtraction    SSerr   MSerr = SSerr/dferr
Total       (N-1)
Finally, you (or the PC) consult tables or otherwise obtain a probability of
obtaining this F value given df for treatment and error.
Here's the answer: linear models are linear
in the parameters which have to be
estimated, but not necessarily in the
independent variables.
This explains why the middle of the three
figures above shows a linear discrimination
line between the two classes, although the
line is not linear in the sense of a straight
line.
Thus the coefficient of determination specifies
the amount of sample variation in y explained
by x.
For simple linear regression the coefficient of
determination is simply the square of the
correlation coefficient between Y and X .
Correlation coefficient scale: -1, 0 (no linear relationship), +1
The correlation coefficient may take any value between -1.0 and +1.0.
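The statement that, for simple linear regression, the coefficient of determination equals the square of the correlation coefficient can be checked numerically. The sketch below uses made-up x and y values (not from the slides) and computes both quantities independently: r from the sums of cross products, and the coefficient of determination as 1 - SSE/SST from a least-squares line.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of the least-squares line: 1 - SSE/SST."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, 1 - SSE/SST = {r2:.4f}")
```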
χ² test
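is based on chi-square distribution and as a parametric test
is used for comparing a sample variance to a theoretical
population variance:
$\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2}$
where
$s^2$ = variance of the sample;
$\sigma^2$ = theoretical variance of the population;
$(n - 1)$ = degrees of freedom.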
EXAMPLE I
In quality control, there are situations when
we need to know whether a sample mean lies
within the confidence limits of the entire
population. This can be accomplished by
using t-distribution to determine confidence
limits for a population mean using a selected
probability.
We will use Excel function TINV( ) to obtain the critical value of the t-distribution.
Ten cans of sliced pineapple were removed at
random from a population of 1000 cans. The
drained weights of the contents were
measured as 410.5, 411.4, 410.4, 412.6,
411.9, 411.5, 412.5, 411.4, 411.5, 410.1 g.
Determine the 95% confidence limits for the
mean of the entire population.
Discussion:
The results show that the 95% confidence lower
and upper limits for the population mean are
410.78 and 411.98, respectively.
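The limits quoted in the discussion can be reproduced outside Excel. The sketch below is one way to do it, assuming the usual formula mean ± t × s/√n. The critical value 2.262 is the tabulated two-tailed t value for α = 0.05 with 9 degrees of freedom (the value TINV(0.05, 9) is expected to return); it is hard-coded here as an assumption so that no statistics package is needed.

```python
import math
import statistics

# Drained weights (g) of the ten cans from the example
weights = [410.5, 411.4, 410.4, 412.6, 411.9,
           411.5, 412.5, 411.4, 411.5, 410.1]

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)       # sample standard deviation (n - 1 divisor)
std_err = s / math.sqrt(n)          # standard error of the mean

# Tabulated two-tailed critical t for alpha = 0.05, df = n - 1 = 9 (assumed constant)
T_CRIT = 2.262

half_width = T_CRIT * std_err
lower, upper = mean - half_width, mean + half_width
print(f"mean = {mean:.2f} g, 95% limits: {lower:.2f} to {upper:.2f} g")
```

For a different sample size or confidence level, the constant `T_CRIT` would have to be replaced with the matching table value.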
EXAMPLE
When a sample is taken from a large
population and analyzed for selected DATA,
statistical analysis is helpful in obtaining
estimates for the total population from
which the sample was obtained. In this
worksheet.
Case study: Color Data
Click the Microsoft Office Button, and then click Excel Options.
Click Add-ins. In the Manage box, select Excel Add-ins.
Click Go.
In the Add-Ins Available box, select the Analysis ToolPak check box and click OK.
(If Analysis ToolPak is not listed, click Browse to locate it.)
Step 3 Choose the menu items Data, Data Analysis ....
A dialog box will open as shown.
Step 4 Double click on Descriptive Statistics.
Step 5 In the edit box for Input Range:, type the range of
cells as $A$2:$A$11.
Step 6 Select the radio button Columns.
Step 7 In output range type A13. Click OK.
Step 8 Excel will calculate the descriptive statistics and
display results in cells A13:B28
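For readers who want to cross-check the worksheet output, the sketch below computes a subset of the measures that a descriptive-statistics report typically lists (it omits skewness and kurtosis). The slides do not show which values are typed into A2:A11, so the ten drained weights from the pineapple example are used here purely as an assumed input, and the dictionary keys are illustrative labels.

```python
import math
import statistics

def describe(data):
    """Return summary measures similar to a descriptive-statistics report."""
    n = len(data)
    sd = statistics.stdev(data)                # sample SD (n - 1 divisor)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.multimode(data),    # a list, because ties are possible
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

# Assumed input: ten values standing in for cells A2:A11
weights = [410.5, 411.4, 410.4, 412.6, 411.9,
           411.5, 412.5, 411.4, 411.5, 410.1]

summary = describe(weights)
for name, value in summary.items():
    print(f"{name}: {value}")
```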
t = (difference between samples) / (variability)
Excel will automatically calculate t-values to
compare:
Means of two datasets with equal variances
Means of two datasets with unequal variances
Two sets of paired data
abs(t-score) < abs(t-critical): accept H0
Insufficient evidence to prove that observed
differences reflect real, significant differences
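The three comparisons listed above differ only in how the "variability" term is computed. The sketch below shows the standard formulas for each case; the function names and the before/after measurements are hypothetical and are not taken from the slides. Critical t values are not computed here and would still come from a table or from Excel.

```python
import math
from statistics import mean, variance

def t_equal_var(a, b):
    """Two-sample t assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def t_unequal_var(a, b):
    """Two-sample t assuming unequal variances (Welch approximation for df)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_paired(a, b):
    """Paired t computed on the differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1

# Hypothetical paired measurements, for illustration only
before = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
after = [12.9, 12.4, 13.1, 12.8, 12.2, 13.0]

t_eq, df_eq = t_equal_var(after, before)
t_welch, df_welch = t_unequal_var(after, before)
t_pair, df_pair = t_paired(after, before)
print(f"equal variances:   t = {t_eq:.3f}, df = {df_eq}")
print(f"unequal variances: t = {t_welch:.3f}, df = {df_welch:.2f}")
print(f"paired:            t = {t_pair:.3f}, df = {df_pair}")
```

With equal sample sizes the pooled and unequal-variance t statistics coincide; only the degrees of freedom differ.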
EXAMPLE III
A researcher wishes to test whether the mean
heavy metal content of soil differs after a war
threat versus before it. The hypothesis is that
the mean after the war threat will exceed the
mean before the war threat.
Use Excel to help test the hypothesis for the difference
in population means.
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells B5:C19, type the text labels and data values.
H0: $\mu_1 - \mu_2 = 0.0$
HA: $\mu_1 - \mu_2 \neq 0.0$
t > tcritical (two-tail), so the mean of sample #1 is significantly
different from the mean of sample #2.
The p value for the two-tail test is .007, which is less than .05, so
we reject the null hypothesis.
EXAMPLE IV
In hypothesis testing, it is sometimes not
possible to use the same judges for testing
different treatments, although it would be
desirable to use the same judges to evaluate
samples obtained from different treatments.
In such cases, we have a completely
randomized design. Using single-factor ANOVA,
we can test whether the treatments had any influence on the
judges' scores; in other words, does the mean of each treatment differ?
A B C
150 148 146
151 150 148
152 152 150
153 154 152
154 156 154
For each treatment, 5 samples were weighted by
5 times. Therefore, the design was completely
randomized. Calculate the F value to determine
whether the means of three treatments are
significantly different.
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A4:C8, type the text labels and data values.
The results show that the F value is 0.889. The critical F
value at the 5% level is F = 3.885.
This indicates that for the example problem the F value is lower than
the critical value at the 5% level. Thus, we can
say that there is no significant difference among the treatment means (P > 0.05).
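The F value for treatments A, B and C can be recomputed by following the one-way ANOVA table layout (treatment, error by subtraction, total). The sketch below is a plain-Python version of that calculation; the function name and the returned dictionary keys are illustrative. The critical value 3.885 is taken from the discussion above rather than computed.

```python
def one_way_anova(groups):
    """Single-factor ANOVA for a completely randomized design."""
    values = [v for group in groups for v in group]
    n_total = len(values)
    grand_mean = sum(values) / n_total

    # Treatment (between-group) sum of squares
    ss_trt = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_err = ss_total - ss_trt               # error obtained by subtraction

    df_trt = len(groups) - 1                 # T - 1
    df_err = (n_total - 1) - df_trt          # (N - 1) - (T - 1)
    ms_trt = ss_trt / df_trt
    ms_err = ss_err / df_err
    return {"df_trt": df_trt, "df_err": df_err,
            "SS_trt": ss_trt, "SS_err": ss_err,
            "MS_trt": ms_trt, "MS_err": ms_err,
            "F": ms_trt / ms_err}

# Data for treatments A, B and C from the example
a = [150, 151, 152, 153, 154]
b = [148, 150, 152, 154, 156]
c = [146, 148, 150, 152, 154]

result = one_way_anova([a, b, c])
F_CRIT_5PCT = 3.885   # critical F for (2, 12) df, as quoted in the discussion
print(f"F = {result['F']:.3f} with df = ({result['df_trt']}, {result['df_err']})")
print("significant" if result["F"] > F_CRIT_5PCT else "not significant at the 5% level")
```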
EXAMPLE V
When we are interested in evaluating samples
for sensory characteristics using the same judges
with samples obtained from multiple
treatments, analysis of variance for a two-
factor design without replication is useful.
This analysis helps in determining if there are
significant differences among the various
treatments as well as if any significant
differences exist among the judges themselves.
Three types of ice cream were evaluated by
11 judges. The judges assigned the following
scores.
Judge Ice Cream A Ice Cream B Ice Cream C
A 16 14 15
B 17 15 17
C 16 16 16
D 18 14 16
E 16 14 14
F 17 16 17
G 18 14 15
H 16 15 16
I 17 14 14
J 18 13 16
K 17 15 15
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A3:D13, type the text labels and
data values.
The difference among ice cream types is determined by examining the F
values. The F value is calculated as 19.73. This value is
greater than 3.49 for the 5% level.
For judges, the calculated F value is 1.36. This value is lower than
the critical F value of 2.35 at the 5% level.
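Both F values can be reproduced from the score table with a short two-factor ANOVA without replication. The sketch below treats judges as rows and ice cream types as columns; the function name is illustrative, and the critical values are the ones quoted above rather than computed.

```python
def two_factor_anova(rows):
    """Two-factor ANOVA without replication.

    rows: one sequence per judge, one score per treatment (column).
    Returns (F for rows, F for columns).
    """
    r, c = len(rows), len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (r * c)
    row_means = [sum(row) / c for row in rows]
    col_means = [sum(row[j] for row in rows) / r for j in range(c)]

    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((v - grand_mean) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols      # error obtained by subtraction

    df_rows, df_cols = r - 1, c - 1
    ms_err = ss_err / (df_rows * df_cols)
    return (ss_rows / df_rows) / ms_err, (ss_cols / df_cols) / ms_err

# Scores for ice creams A, B, C given by judges A to K (from the table)
scores = [
    (16, 14, 15), (17, 15, 17), (16, 16, 16), (18, 14, 16),
    (16, 14, 14), (17, 16, 17), (18, 14, 15), (16, 15, 16),
    (17, 14, 14), (18, 13, 16), (17, 15, 15),
]

f_judges, f_ice_cream = two_factor_anova(scores)
print(f"ice cream types: F = {f_ice_cream:.2f} (critical 3.49 at the 5% level)")
print(f"judges:          F = {f_judges:.2f} (critical 2.35 at the 5% level)")
```

The ice cream F value exceeds its critical value while the judges' F value does not, which matches the interpretation given in the slides.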
EXAMPLE
Simple regression analysis involves determining
the statistical relationship between two
variables. One of the uses of such analysis is in
predicting one variable on the basis of the
other.
We will use the package Regression available
as an Add-in item in Excel. We will use this
package to obtain required statistical
relationships. We assume that a linear
relationship exists between the off-flavor
score and time (in months) with the equation
y= mx+b,
where
y is off-flavor score, x is time in months, m is slope and
b is intercept.
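The slope m and intercept b of y = mx + b can also be obtained directly from the least-squares formulas. The sketch below is a minimal version of that calculation. The storage times and off-flavor scores are hypothetical, since the slides do not list the underlying data, so the fitted coefficients will not match the equation reported for the worksheet.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (v - mean_y) for a, v in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data: time in months (x) and off-flavor score (y)
months = [0, 1, 2, 3, 4, 5, 6]
off_flavor = [1.5, 2.0, 2.1, 2.6, 2.8, 3.1, 3.5]

m, b = fit_line(months, off_flavor)
predicted_at_8 = m * 8 + b                       # prediction beyond the data
residuals = [v - (m * a + b) for a, v in zip(months, off_flavor)]

print(f"y = {m:.3f} x + {b:.3f}")
print(f"predicted score at 8 months: {predicted_at_8:.2f}")
```

Predicting at 8 months extrapolates beyond the observed range, so it relies on the assumption stated above that the relationship stays linear.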
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will
open.
Step 4 Double click on Regression.
Step 5 A new dialog box will open. Enter the range of cells for Y and X as
shown. Check boxes for Residuals and Line Fit Plots. Click OK.
Ratio of variability explained by model to leftover
variability. High number means model explains most
variation in data.
Probability of getting this value of F by randomly
sampling from a normally distributed population. Low
value means model
y = 0.31 x + 1.58
Statistics
- Descriptive Statistics
- Histograms
- Hypothesis Testing
- Scatter Plots
- Regression Analysis
Click Data/Data Analysis/Histogram & OK.
Put Checkmarks on Chart Output & New Worksheet
Boxes.
Move Cursor to Input Range Window, Highlight Data
Going into Histogram.
Move Cursor to Input Bin Range, Highlight Data
Showing Upper Value of Each Bin & Click OK.
Histogram will be on New Worksheet. You May
Lengthen it by Clicking Blank Space in Window, Moving
Cursor to Window Bottom Line & Holding Down Mouse
Button as You Pull Down Window.
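The role of the bin range can be illustrated with a small counting routine. The sketch below assumes that a value is counted in the first bin whose upper value is greater than or equal to it, and that anything above the last bin goes into a final "More" category; this is an assumption about how the Histogram tool treats bin boundaries, and the data and bin values are made up.

```python
from bisect import bisect_left

def histogram_counts(data, bin_uppers):
    """Count values per bin, where each bin is defined by its upper value.

    A value v falls in the first bin whose upper value is >= v.
    The last count collects values above every bin ("More").
    """
    uppers = sorted(bin_uppers)
    counts = [0] * (len(uppers) + 1)
    for v in data:
        counts[bisect_left(uppers, v)] += 1
    return counts

# Hypothetical data and bin upper values
data = [1, 2, 2, 3, 5, 9]
bins = [2, 4, 6]

counts = histogram_counts(data, bins)
for label, count in zip([f"<= {u}" for u in bins] + ["More"], counts):
    print(f"{label}: {count}")
```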
Go to Sheet One.
Click Data/Data Analysis/ and the Appropriate
Statistical Test. Then Click OK.
On New Window Check Labels Box and Put
Cursor on Variable 1 Range.
Highlight Variable 1 Data Including Label.
Put Cursor on Variable 2 Range & Highlight
Variable 2 Data (Including Label). Then Click OK.
Click Home/Format/AutoFit Column Width.
Go to Sheet One.
Highlight Data (Be Sure X Values are in
Left Column and Y Values are in Right
Column).
Click Insert/Scatter. Pull down menu and
click Upper Left Icon.
Click a Datum Point on Chart with Right
Mouse Key, Add Trendline, & Click Linear.
Go to Sheet One.
Click Data/Data Analysis (On Far Right)
/Regression & Click OK.
On New Window Check Labels Box and Put
Cursor on X Range.
Highlight X Data Including Label.
Put Cursor on Y Range & Highlight Y Data
(Including Label), Then Click OK.
Click Home/Format/AutoFit Column Width.