
Course Content: St. Paul University Philippines

This document provides an overview of key concepts in statistics including measures of central tendency, levels of measurement, and descriptive versus inferential statistics. It describes a graduate level course in statistics that covers topics such as measures of central tendency, variability, hypothesis testing, correlation, regression analysis and exploring the statistical software SPSS. Course requirements include a reaction paper, problem sets, and a final exam.
St. Paul University Philippines


Graduate School

A Course Presentation in Statistics

Course Content
Basic Concepts in Statistics
Measures of Central Tendency
Measures of Variability
Correlation and Regression Analysis
Test of Hypothesis
Z Test
T Test
Chi Square Test
Analysis of Variance (ANOVA)
EXPLORING THE SPSS
Course Requirements

Reaction Paper/ Film Clip Analysis
Problem Set
Final Examination


Reaction Paper (Film Clip Analysis)


Lies, Damned Lies and Statistics: The Misapplication of Statistics in Everyday Life
Statistics defined . . .
STATISTICS is a collection of methods for
planning experiments, obtaining data, and
then organizing, summarizing, presenting,
analyzing, interpreting, and drawing
conclusions based on the data.

Main Divisions
Descriptive Statistics
- summarize or describe the important
characteristics of a known set of
population data
Inferential Statistics
- use sample data to make inferences (or
generalizations) about a population

Population vs. Sample

A POPULATION is the complete collection of
elements (scores, people, measurements, and so
on)

A SAMPLE is a portion / subset of elements
drawn from a population
Parameter vs. Statistic

A PARAMETER is a numerical measurement
describing some characteristic of a population

A STATISTIC is a numerical measurement
describing some characteristic of a sample
Qualitative vs. Quantitative Data

Qualitative (categorical or attribute) data
can be separated into different categories
that are distinguished by some nonnumerical
characteristic

Quantitative data consist of numbers
representing counts or measurements
Discrete vs Continuous Data
Discrete data result from either a finite number of
possible values or a countable number of possible
values (that is, the number of possible values is
0, or 1, or 2, and so on)

Continuous data result from infinitely many
possible values that can be associated with points
on a continuous scale in such a way that there are
no gaps or interruptions
Dependent vs Independent Variable
Dependent variable
- the variable that is being affected
- the variable that is being explained

Independent variable
- the variable that affects
- the variable that explains


Nominal Level of Measurement
The nominal level of measurement is
characterized by data that consist of names,
labels, or categories only. The data cannot be
arranged in an ordering scheme.

Examples:
gender of employees, civil status,
nationality, religion, etc.
Ordinal Level of Measurement
The ordinal level of measurement involves
data that may be arranged in some order, but
differences between data values are either
meaningless or cannot be determined.

Examples:
good, better or best speakers; 1 star, 2 star
or 3 star movie; rank of an employee
Interval Level of Measurement
The interval level of measurement is like the
ordinal level, with the additional property that
meaningful amounts of difference between data
values can be determined. However, there is no
inherent (natural) zero starting point.

Examples:
body temperature, year (2007, 2008, 2013, etc)
Ratio Level of Measurement
The ratio level of measurement is the
interval level modified to include an inherent
zero starting point. For values at this level,
differences and ratios are meaningful.

Examples:
weights, lengths, distance traveled
Visual Summary of the Scales of Measurement
1. Are there named categories only?
   YES: Nominal scale of measurement
   NO: go to question 2
2. Are the scores ranked only?
   YES: Ordinal scale of measurement
   NO: go to question 3
3. Are there equal intervals with a meaningful zero point?
   YES: Ratio scale of measurement
   NO: Interval scale of measurement
Measures of Central Tendency (UNGROUPED DATA)
Mean, Median, Mode
The Mean

Two Forms
Simple mean
Weighted mean

The mean takes the symbol \(\bar{X}\).
Arithmetic Mean (Mean)

balancing point of a set of scores
the average score
The Mean
If you have a population:
Total number of cases is N
Sum of the scores is \(\sum X\)
Compute the mean of the population
\[ \mu = \frac{\sum X}{N} \]
If you have a sample:
Total number of cases is n
Sum of the scores is \(\sum X\)
Compute the mean of the sample
\[ \bar{X} = \frac{\sum X}{n} \]

Simple Arithmetic Mean
\[ \bar{X} = \frac{\sum X}{n} \]

Where:
X = an individual score
n = the number of scores/cases
\(\sum X\) (sigma X) = sum of the individual score values
Example:
Consider the following data set:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Solution:
\[ \bar{X} = \frac{\sum X}{n} = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} \]
Mean = 5.5
Example:
The following data represents the ages of the mothers
of Paulinian Graders randomly selected from four
different grade levels who attended a session on
Counseling. What is the mean age of the mothers per
grade level?

Grade 1: 35, 37, 45, 54, 39, 48
Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Solution:
To obtain the mean age of the mothers of the Grade 1 pupils,
we have

\[ \bar{X} = \frac{35 + 37 + 45 + 54 + 39 + 48}{6} = \frac{258}{6} = 43 \]

**This means that the mothers of the Grade 1 pupils are relatively young.

Example:
Find the mean of the other grade levels. Round off
your answers to the nearest hundredth.

Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Answers:
Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
ANSWER: 53.73

Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
ANSWER: 50

Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
ANSWER: 52.44
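
The answers above can be checked with a few lines of Python. This is only a sketch: the dictionary name `ages` and the variable `means` are illustrative and not part of the course material.

```python
from statistics import mean

# Ages of the mothers, copied from the example above.
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}

# Simple mean = sum of the scores / number of scores,
# rounded to the nearest hundredth.
means = {grade: round(mean(values), 2) for grade, values in ages.items()}

for grade, value in means.items():
    print(grade, value)
```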

Weighted Mean
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \dots + w_n X_n}{\text{total number of weights}} \]
Where:
w = weight per item value
X = individual score values




Example:
The following are the responses of 30 randomly chosen
respondents to one item of a research questionnaire.

Verbal Description      Weight   No. of Responses
Very strongly agree     5        7
Strongly agree          4        11
Agree                   3        9
Disagree                2        2
Strongly disagree       1        1

** Find the weighted response of the respondents and
interpret the result.
Solution:
To obtain the weighted response, we have

\[ \bar{X}_w = \frac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \frac{111}{30} = 3.70 \]
A weighted mean of 3.70 is interpreted as "strongly agree".

Interpretation of Values
Range          Verbal Description
4.20 - 5.00    Very strongly agree
3.40 - 4.19    Strongly agree
2.60 - 3.39    Agree
1.80 - 2.59    Disagree
1.00 - 1.79    Strongly disagree
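
The weighted response and its verbal interpretation can be sketched in Python as follows. The helper names `weighted_mean` and `interpret` are hypothetical; the weights, response counts and ranges are taken from the example.

```python
# (verbal description, weight, number of responses) from the example.
responses = [
    ("Very strongly agree", 5, 7),
    ("Strongly agree", 4, 11),
    ("Agree", 3, 9),
    ("Disagree", 2, 2),
    ("Strongly disagree", 1, 1),
]

# Lower limit of each interpretation range, highest range first.
ranges = [
    (4.20, "Very strongly agree"),
    (3.40, "Strongly agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly disagree"),
]

def weighted_mean(rows):
    """Sum of weight x count divided by the total number of responses."""
    total = sum(weight * count for _, weight, count in rows)
    n = sum(count for _, _, count in rows)
    return total / n

def interpret(value):
    """Return the description of the first range whose lower limit is reached."""
    for lower, description in ranges:
        if value >= lower:
            return description
    raise ValueError("value is below the lowest range")

xw = weighted_mean(responses)
print(round(xw, 2), interpret(xw))
```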
Exercise:
Construct a Likert scale to interpret items of a
questionnaire with weights 1 to 4.

Assume the following descriptions were used:
4 - always
3 - sometimes
2 - seldom
1 - never

Example:
The following are the grades of one student in one
summer term.

Subject       No. of Units   Grade
Statistics    3              98
PE            2              90
Chemistry     5              93

** Find the weighted average of the student.
** What would the student's average have been if all
subjects had equal weights?
Characteristics of the Mean
an interval statistic
calculated average
affected by extreme values
most widely used
most sensitive measure
value is determined by every
case in the distribution
[Figure: points A to E on a number line from 3 to 9, with their deviations from the mean: (-1), (-2), (-2), (+1), (+4)]
(-1) + (-2) + (-2) + 1 + 4 = 0
sum of the deviations about the mean is zero
Median
the value that lies in the middle after ranking all
the scores
positional measure
the midpoint or the
50th percentile of a
distribution
Median
the value at which 1/2 of the ordered scores fall above
and 1/2 of the scores fall below


1 2 3 4 5    (n is odd)     Median = 3
1 2 3 4      (n is even)    Median = 2.5

Example:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40 (in order)
(even number of values: no exact middle, so the middle is
shared by two numbers)
\[ \frac{0.73 + 1.10}{2} = 0.915 \]
MEDIAN is 0.915

Example:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order - odd number of values)
The 4th observation, 0.73, is the exact middle.
MEDIAN is 0.73
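
The ranking procedure used in the two examples can be sketched in Python. The function name `median_by_rank` is illustrative; the standard library's `statistics.median` follows the same rule and is imported only for comparison.

```python
from statistics import median

even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]

def median_by_rank(values):
    """Sort the values, then take the middle one or average the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]          # odd n: exact middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n

med_even = median_by_rank(even_case)
med_odd = median_by_rank(odd_case)
print(round(med_even, 3), med_odd)
```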
Characteristics of the Median
an ordinal statistic
rank or position average
not affected by extreme values
can be subjected to a few
mathematical computations
less widely used than the mean
represents a typical score


Exercise
The following data represent the ages of the mothers
of Paulinian Graders randomly selected from four
different grade levels who attended a session on
Counseling. What is the median of the ages of the
mothers per grade level?

Grade 1: 35, 37, 45, 54, 39, 48
Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47
Mode
the value which occurs most frequently in a given data
set
does not involve any calculation or ordering of data


Example
Consider the following data set:
Observation   Value/Score
1             5
2             7
3             3
4             8
5             7
Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10        Mode is 1.10
b. 27 27 27 55 55 55 88 88 99           Bimodal - 27 & 55
c. 1 2 3 6 7 8 9 10                     No Mode
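
A minimal Python sketch of finding the mode by counting is shown below. The function name `modes` is hypothetical, and it adopts the convention of example c: when no value repeats, the data set is treated as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

table_scores = [5, 7, 3, 8, 7]
set_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
set_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
set_c = [1, 2, 3, 6, 7, 8, 9, 10]

for data in (table_scores, set_a, set_b, set_c):
    print(modes(data))
```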

Characteristics of the Mode
a nominal statistic
an inspection average
most frequently occurring value
cannot be manipulated mathematically
rarely used
most popular score
Which is best?
Mode
  Advantages: Quick and easy to calculate.
  Disadvantages: May not be representative of the whole sample.
Median
  Advantages: Fairly easy to calculate. Half of the scores lie above the median.
  Disadvantages: Tedious to find for a large set of numbers or for a set that is not in order.
Mean
  Advantages: Takes all numbers into account.
  Disadvantages: Can be affected by outliers.
When to use . . .
Mean
-an interval interpretation is needed
-the value of each score is desired
-further statistical computation is expected
Median
-an ordinal interpretation is needed
-the middle score is desired
-avoidance of the influence of extreme values is
needed
Mode
-a nominal interpretation is needed
-a quick approximation of a central tendency
measure is desired
-most frequently occurring score is needed
Measures of Central Tendency (GROUPED DATA)
Mean, Median, Mode
The Mean
i.) Class mark method
\[ \bar{X} = \frac{\sum f X_m}{n} \]
Where:
\(X_m\) - class mark / class midpoint
f - frequency
n - number of cases / observations
The Mean
ii.) Coded deviation method
\[ \bar{X} = AM + \left( \frac{\sum f d}{n} \right) i \]
Where:
AM - assumed mean (the \(X_m\) of the class where the zero deviation is set)
f - frequency
d - deviation
n - number of cases / observations
i - class size/width
Example
**Find the mean, median and mode of the following
data set:
X          F
24 - 26    3
21 - 23    12
18 - 20    10
15 - 17    6
12 - 14    6
9 - 11     5
6 - 8      5
3 - 5      3
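
Both methods for the grouped mean can be sketched in Python on the frequency distribution above. The choice of the 15 - 17 class as the assumed-mean class is an assumption made for illustration; any class may be used and the result should not change.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [
    (24, 26, 3), (21, 23, 12), (18, 20, 10), (15, 17, 6),
    (12, 14, 6), (9, 11, 5), (6, 8, 5), (3, 5, 3),
]

n = sum(f for _, _, f in classes)

def class_mark(lower, upper):
    """Midpoint of a class interval."""
    return (lower + upper) / 2

# i.) Class mark method: sum of f * Xm divided by n.
mean_class_mark = sum(f * class_mark(lo, hi) for lo, hi, f in classes) / n

# ii.) Coded deviation method.
# Assumption: zero deviation is set at the 15 - 17 class, so AM = 16.
assumed_mean = 16
width = 3  # class size i (15 - 17 covers 15, 16 and 17)
# d = number of class steps between each class and the assumed-mean class.
sum_fd = sum(f * (class_mark(lo, hi) - assumed_mean) / width
             for lo, hi, f in classes)
mean_coded = assumed_mean + (sum_fd / n) * width

print(n, mean_class_mark, mean_coded)
```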
The Median
\[ Md = X_{LB} + \left( \frac{\frac{n}{2} - cfp}{f} \right) i \]
Where:
\(X_{LB}\) - lower boundary of the median class
cfp - cumulative frequency preceding the median class
n - number of cases
f - frequency of the median class
i - class size/width
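
As a sketch, the grouped median formula can be applied to the example distribution (classes 3 - 5 up to 24 - 26) like this. The function name `grouped_median` is illustrative, and the code assumes integer class limits so that the lower boundary is the lower limit minus 0.5.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def grouped_median(rows):
    n = sum(f for _, _, f in rows)
    target = n / 2
    cumulative = 0  # cumulative frequency preceding the current class (cfp)
    for lower, upper, f in rows:
        if f > 0 and cumulative + f >= target:
            lower_boundary = lower - 0.5   # assumes integer class limits
            width = upper - lower + 1      # class size i
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")

md = grouped_median(classes)
print(md)
```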
The Mode
\[ Mo = X_{LB} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \]
Where:
\(X_{LB}\) - lower boundary of the modal class
\(\Delta_1\) - difference between the frequency of the modal class
and the frequency below it
\(\Delta_2\) - difference between the frequency of the modal class
and the frequency above it
i - class size/width
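
The grouped mode formula can be sketched in Python on the example distribution (classes 3 - 5 up to 24 - 26). Two assumptions are made here: "the frequency below it" is read as the frequency of the next lower class, and if several classes tie for the highest frequency the first one found is used.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def grouped_mode(rows):
    freqs = [f for _, _, f in rows]
    k = freqs.index(max(freqs))               # position of the modal class
    lower, upper, f_modal = rows[k]
    f_lower = freqs[k - 1] if k > 0 else 0    # next lower class
    f_upper = freqs[k + 1] if k + 1 < len(freqs) else 0  # next higher class
    delta1 = f_modal - f_lower
    delta2 = f_modal - f_upper
    lower_boundary = lower - 0.5              # assumes integer class limits
    width = upper - lower + 1                 # class size i
    # Note: fails with a division by zero if all three frequencies are equal.
    return lower_boundary + delta1 / (delta1 + delta2) * width

mo = grouped_mode(classes)
print(round(mo, 2))
```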
Exercise
**Find the mean, median and mode of the following
data set:
X          F
56 - 62    4
49 - 55    9
42 - 48    12
35 - 41    12
28 - 34    10
21 - 27    8
14 - 20    6
7 - 13     4
Other Measures of Position (QUANTILES)
1. Quartile (\(Q_k\)) - divides the distribution into 4
equal parts
2. Decile (\(D_k\)) - divides the distribution into 10
equal parts
3. Percentile (\(P_k\)) - divides the distribution into
100 equal parts


The Quartile
\[ Q_k = X_{LB} + \left( \frac{\frac{kn}{4} - cfp}{f} \right) i \]
Where:
\(X_{LB}\) - lower boundary of the quartile class
cfp - cumulative frequency preceding the quartile class
n - number of cases
f - frequency of the quartile class
i - class size/width
The Decile
\[ D_k = X_{LB} + \left( \frac{\frac{kn}{10} - cfp}{f} \right) i \]
Where:
\(X_{LB}\) - lower boundary of the decile class
cfp - cumulative frequency preceding the decile class
n - number of cases
f - frequency of the decile class
i - class size/width

The Percentile
\[ P_k = X_{LB} + \left( \frac{\frac{kn}{100} - cfp}{f} \right) i \]
Where:
\(X_{LB}\) - lower boundary of the percentile class
cfp - cumulative frequency preceding the percentile class
n - number of cases
f - frequency of the percentile class
i - class size/width
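
The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below uses the earlier example distribution (classes 3 - 5 up to 24 - 26); the function name `grouped_quantile` and its `parts` parameter are illustrative, and integer class limits are assumed.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def grouped_quantile(rows, k, parts):
    """k-th quantile when the distribution is cut into `parts` equal parts.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    n = sum(f for _, _, f in rows)
    target = k * n / parts          # kn/4, kn/10 or kn/100
    cumulative = 0                  # cfp for the current class
    for lower, upper, f in rows:
        if f > 0 and cumulative + f >= target:
            lower_boundary = lower - 0.5   # assumes integer class limits
            width = upper - lower + 1      # class size i
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k must not exceed parts")

q1 = grouped_quantile(classes, 1, 4)
d5 = grouped_quantile(classes, 5, 10)
p50 = grouped_quantile(classes, 50, 100)
p90 = grouped_quantile(classes, 90, 100)
print(q1, d5, p50, p90)
```

Here the fifth decile and the fiftieth percentile coincide with the median of the same distribution, which is a quick consistency check on the three formulas.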
Exercise
**Using the frequency distribution below, find:
1. \(Q_1\)    2. \(D_6\)    3. \(D_3\)    4. \(P_{78}\)    5. \(P_3\)

X          F
56 - 62    6
49 - 55    9
42 - 48    10
35 - 41    12
28 - 34    10
21 - 27    8
14 - 20    6
7 - 13     4
Measures of Variability
The statistical tool used to describe the degree to
which scores/ observations are scattered.
It is used to determine the degree of consistency /
homogeneity of scores.
1. range
2. mean absolute deviation
3. semi-interquartile range / quartile deviation
4. variance
5. standard deviation
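Formulas (Ungrouped Data)
1. Range
\[ R = HOV - LOV \]
(HOV and LOV are the highest and lowest observed values)

2. Mean absolute deviation
\[ MAD = \frac{\sum |X - \bar{X}|}{n} \]

3. Semi-interquartile range / quartile deviation
\[ QD = \frac{Q_3 - Q_1}{2} \]

4. Variance
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

5. Standard deviation
\[ s = \sqrt{s^2} \]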
Exercise:
Given the following data, find the range, MAD,
variance and the standard deviation.
20, 26, 40, 39, 35
Application:
Two seemingly equally excellent students are
vying for an academic honor for which only one
can be chosen. The following are their grades,
which are used as the basis for giving
the award.
Student A: 90, 92, 92, 94, 95
Student B: 90, 91, 93, 94, 95

Who do you think deserves the award? Why?
Guiding Principle
The lesser the value of the measure, the
more consistent, the more homogeneous and
the less scattered are the observations in the
set of data.
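
The ungrouped formulas and the guiding principle can be illustrated in Python with the grades of Student A and Student B from the application above. The function name `spread` is hypothetical, and the comparison is only one way of reading the data.

```python
from math import sqrt

def spread(scores):
    """Range, MAD, variance (n - 1 divisor) and standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    value_range = max(scores) - min(scores)             # HOV - LOV
    mad = sum(abs(x - mean) for x in scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return {"mean": mean, "range": value_range, "MAD": mad,
            "variance": variance, "sd": sqrt(variance)}

student_a = spread([90, 92, 92, 94, 95])
student_b = spread([90, 91, 93, 94, 95])

for name, result in (("A", student_a), ("B", student_b)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Both students have the same mean and range, so the deviation-based measures are what separate them: the student with the smaller variance and standard deviation has the more consistent grades under the guiding principle.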
Formulas (Grouped Data)
1. Range
\[ R = HOV - LOV \]

2. Mean absolute deviation
\[ MAD = \frac{\sum f |X_m - \bar{X}|}{n} \]

3. Semi-interquartile range / quartile deviation
\[ QD = \frac{Q_3 - Q_1}{2} \]

4. Variance
\[ s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1} \]

5. Standard deviation
\[ s = \sqrt{s^2} \]
Exercise:
**Using the frequency distribution below, find:
1. Range    2. MAD    3. QD    4. Variance    5. Standard Deviation

X          F
56 - 62    6
49 - 55    9
42 - 48    10
35 - 41    12
28 - 34    10
21 - 27    8
14 - 20    6
7 - 13     4
Tests of Hypothesis
Hypothesis
A statement or tentative theory which aims to
explain facts about the real world
An educated guess
It is subject to testing. If it is found to be
statistically true, it is accepted. Otherwise, it is
rejected.
Kinds of Hypotheses
1. Null Hypothesis (Ho)
It serves as the working hypothesis
It is that which one hopes to accept or reject
It must always express the idea of no
significant difference

2. Alternative Hypothesis (H1 or Ha)
It generally represents the hypothetical
statement that the researcher wants to prove.
Types of Alternative Hypotheses (Ha)
1. Directional hypothesis
expresses direction
one-tailed
uses the order relation "greater than" (>) or "less than" (<)

2. Non-directional hypothesis
does not express direction
two-tailed
uses "not equal to" (≠)
Type I and Type II Errors
When making a decision about a proposed
hypothesis based on the sample data, one runs the
risk of making an error. The following table on the
next slide summarizes the possibilities:
A Type I error is the mistake of rejecting the null
hypothesis when it is true.
The symbol α (alpha) is used to represent the probability
of a Type I error.
A Type II error is the mistake of failing to reject the null
hypothesis when it is false.
The symbol β (beta) is used to represent the probability of
a Type II error.

Level of Significance
The probability of making Type I error or alpha
error in a test is called the significance level of the
test. The significance level of a test is the maximum
value of the probability of rejecting the null
hypothesis (Ho) when in fact it is true.
Critical Region
The critical region (or rejection region) is the set of all values
of the test statistic that cause us to reject the null hypothesis.
[Figure: sampling distribution showing the region of acceptance and the region of rejection, with the critical value and the P-value marked]
Critical Value
A critical value is any value that separates the
critical region (where we reject the null
hypothesis) from the values of the test statistic
that do not lead to rejection of the null
hypothesis. Critical values depend on the sampling
distribution that applies and the significance level α.
P - Value
The P-value (probability value) is the probability of
getting a value of the test statistic that is at least as
extreme as the one representing the sample data,
assuming that the null hypothesis is true. The null
hypothesis is rejected if the P-value is very small,
such as 0.05 or less.
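Two-tailed, Right-tailed and
Left-tailed Tests
The tails in a distribution are the extreme
regions bounded by critical values.
Two-tailed Tests
Given: \(H_0: \mu = \mu_0\); \(H_1: \mu \neq \mu_0\)
Right-tailed Tests
Given: \(H_0: \mu = \mu_0\); \(H_1: \mu > \mu_0\)
Left-tailed Tests
Given: \(H_0: \mu = \mu_0\); \(H_1: \mu < \mu_0\)
(Here \(\mu_0\) stands for the hypothesized value of the parameter.)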
Steps in Hypothesis Testing
1. Formulate the null hypothesis (Ho) that there is no
significant difference between the items compared. State
the alternative hypothesis (Ha) which is used in case Ho
is rejected.

2. Set the level of significance of the test, α.

3. Determine the test to be used.
Z TEST - used if the population standard deviation
is given
T TEST - used if the sample standard deviation is
given
Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a Z test, the table below summarizes the
critical values at varying significance levels

Type of Test    Level of Significance
                0.10     0.05     0.025    0.01
One-Tailed      1.28     1.645    1.96     2.33
Two-Tailed      1.645    1.96     2.24     2.58
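
Critical values for a Z test can also be computed from the standard normal distribution rather than read from a table. The sketch below uses Python's standard library; the function name `critical_z` and its `tails` parameter are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, tails):
    """Positive critical value for a one-tailed (1) or two-tailed (2) Z test."""
    # A two-tailed test splits alpha equally between the two tails.
    return standard_normal.inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.025, 0.01):
    print(alpha, round(critical_z(alpha, 1), 3), round(critical_z(alpha, 2), 3))
```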

***For a T test, one must first compute the
degree/s of freedom (df), then look for the tabular
value in the table of Student's t distribution.

i. For a single sample
\( df = n - 1 \)
ii. For two samples
\( df = n_1 + n_2 - 2 \)
Steps in Hypothesis Testing
5. Compute for z or t as needed. Vary your solutions using
the formulas:

For z test
i. Sample mean compared with a population mean
ii. Comparing two sample means
iii. Comparing two sample proportions

For t test
i. Sample mean compared with a population mean
ii. Comparing two sample means
Steps in Hypothesis Testing
6. Compare the computed value with its
corresponding tabular value, then state your
conclusions based on the following guidelines:

Reject Ho if the absolute computed value is
equal to or greater than the absolute tabular value

Accept Ho if the absolute computed value is less
than the absolute tabular value
Decision Criterion
Traditional Method:

***Reject H0 (Accept H1) if the test
statistic falls within the critical region.
***Fail to reject H0 (Accept H0) if the
test statistic does not fall within the critical
region.
Decision Criterion
P-value method:

***Reject H0 (Accept H1) if P-value ≤ α
(where α is the significance level, such as
0.05)

***Fail to reject H0 (Accept H0) if
P-value > α
Decision Criterion
Another option:

Instead of using a significance level
such as 0.05, simply identify the P-value and
leave the decision to the reader.
Z - TEST
1. Sample Mean (\(\bar{X}\)) Compared with a Population Mean (\(\mu\))
\[ Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \]
Where:
\(\bar{X}\) - sample mean
\(\mu\) - population mean
n - number of items in the sample
\(\sigma\) - population standard deviation
Z - TEST
2. Comparing Two Sample Means (\(\bar{X}_1\) & \(\bar{X}_2\))
\[ Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma \sqrt{(1/n_1) + (1/n_2)}} \]
Where:
\(\bar{X}_1\) - mean of the first sample
\(\bar{X}_2\) - mean of the second sample
\(n_1\) - number of items in the first sample
\(n_2\) - number of items in the second sample
\(\sigma\) - population standard deviation


Z - TEST
3. Comparing Two Sample Proportions (\(p_1\) & \(p_2\))
\[ Z = \frac{p_1 - p_2}{\sqrt{(p_1 q_1 / n_1) + (p_2 q_2 / n_2)}} \]
Where:
\(p_1\) - proportion of the first sample
\(p_2\) - proportion of the second sample
\(n_1\) - number of items in the first sample
\(n_2\) - number of items in the second sample
\(q_1 = 1 - p_1\)
\(q_2 = 1 - p_2\)
T - TEST
4. Sample Mean (\(\bar{X}\)) Compared with a Population Mean (\(\mu\))
\[ t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s} \]
Where:
\(\bar{X}\) - sample mean
\(\mu\) - population mean
n - number of items in the sample
s - sample standard deviation
T - TEST
5. Comparing Two Sample Means (\(\bar{X}_1\) & \(\bar{X}_2\))
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[ \frac{(n_1 - 1)(s_1)^2 + (n_2 - 1)(s_2)^2}{n_1 + n_2 - 2} \right] \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}} \]
Where:
\(\bar{X}_1\) - mean of the first sample
\(\bar{X}_2\) - mean of the second sample
\(n_1\) - number of items in the first sample
\(n_2\) - number of items in the second sample
\(s_1\) - standard deviation of the first sample
\(s_2\) - standard deviation of the second sample

Example 1
Data from a school census show that the
mean weight of college students is 45 kilos with a
standard deviation of 3 kilos. A sample of 100
college students was found to have a mean weight of 47
kilos. Are the college students in the sample really heavier than
the rest, using the 0.05 level of significance?
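
A hedged sketch of how Example 1 could be worked with the one-sample Z formula is given below. It assumes a right-tailed test (the question asks whether the sample is heavier); the function name `z_one_sample` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_one_sample(sample_mean, population_mean, sigma, n):
    """Z = (sample mean - population mean) * sqrt(n) / sigma."""
    return (sample_mean - population_mean) * sqrt(n) / sigma

z = z_one_sample(sample_mean=47, population_mean=45, sigma=3, n=100)

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)            # right-tail probability
reject_h0 = z >= critical

print(round(z, 2), round(critical, 3), reject_h0)
```

Under these assumptions the computed value is well beyond the critical value, so the traditional method and the P-value method point to the same decision.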
Example 2
A researcher wishes to find out whether or not there
is a significant difference in the monthly allowance of
morning and afternoon students in his school. By random
sampling, he took a sample of 239 students in the morning
session. The students were found to have a mean monthly
allowance of P142.00. The researcher also took a sample of
209 students in the afternoon session. They were found to
have a mean monthly allowance of P148.00. The population
of students in that school has a standard deviation of
P40.00. Is there a significant difference between the two
samples at the 0.01 level?
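
Example 2 can be sketched with the two-sample Z formula. A two-tailed test is assumed because the question asks only whether there is a difference; the function name `z_two_means` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_two_means(mean1, mean2, sigma, n1, n2):
    """Z = (mean1 - mean2) / (sigma * sqrt(1/n1 + 1/n2))."""
    return (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))

z = z_two_means(mean1=142, mean2=148, sigma=40, n1=239, n2=209)

alpha = 0.01
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject_h0 = abs(z) >= critical

print(round(z, 3), round(critical, 2), reject_h0)
```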
Example 3
A sample survey of television programs in
Metro Manila shows that 80 out of 200 men and 75
out of 250 women dislike the program "May Bukas Pa".
One would like to know whether the difference
between the two sample proportions, 80/200 = 0.40
and 75/250 = 0.30, is significant at the 0.05
level.
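
Example 3 can be sketched with the formula for two sample proportions. A two-tailed test is assumed; the function name `z_two_proportions` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_two_proportions(x1, n1, x2, n2):
    """Z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with q = 1 - p."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = z_two_proportions(x1=80, n1=200, x2=75, n2=250)

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject_h0 = abs(z) >= critical

print(round(z, 3), round(critical, 2), reject_h0)
```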
Example 4
A researcher knows that the average height of
Filipino women is 1.525 meters. A random sample
of 26 women was taken and was found to have a
mean height of 1.56 meters, with a standard
deviation of 0.10 meters. Is there reason to believe
that the 26 women are significantly taller than the
rest using the 0.05 level of significance?
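
Example 4 can be sketched with the one-sample t formula given earlier. The tabular value 1.708 (one-tailed, 0.05 level, 25 degrees of freedom) is taken from a standard Student's t table and is not stated in the example, so it is an assumption of this sketch; the function name `t_one_sample` is illustrative.

```python
from math import sqrt

def t_one_sample(sample_mean, population_mean, s, n):
    """t = (sample mean - population mean) * sqrt(n - 1) / s."""
    return (sample_mean - population_mean) * sqrt(n - 1) / s

n = 26
t = t_one_sample(sample_mean=1.56, population_mean=1.525, s=0.10, n=n)
df = n - 1

# Assumed tabular value: one-tailed test, alpha = 0.05, df = 25.
tabular_t = 1.708
reject_h0 = abs(t) >= tabular_t

print(round(t, 2), df, reject_h0)
```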
Example 5
Beta company is manufacturing steel wire
with an average tensile strength of 50 kilos. The
laboratory tests 16 pieces and finds that the mean is
47 kilos with a standard deviation of 15 kilos. Are
the results in accordance with the hypothesis that
the population mean is 50 kilos?
Example 6
It is known from the records of the city
schools that the standard deviation of math test
scores on ABC test is 5. A sample of 200 students
from the system was taken and it was found out that
the sample mean is 75. Previous tests showed the
population mean to be 70. Is it safe to conclude that
the sample is significantly different from the
population at 0.01 level?
Example 7
Two types of rice varieties are being considered for
yield and a comparison is needed. Thirty hectares were
planted with the rice varieties exposed to fairly uniform
conditions. The results are tabulated below:
                   Variety A      Variety B
Average yield      80 sack/hec    85 sack/hec
Sample Variance    5.90           12.10

Is there a significant difference in the yield of the two
varieties at the 0.05 level of significance?
Example 8
A manufacturer of flashlight batteries claims
that the average life of his product will exceed 40
hours. A company is willing to buy a very large
shipment of batteries provided the claim is true. A
random sample of 36 batteries is tested, and it was
found out that the sample mean is 45 hours. If the
population of batteries has a standard deviation of 5
hours, is it likely that the batteries will be bought?
Example 9
A company is trying to decide which of two brands of
tires to buy for their trucks. They would like to adopt Brand
C unless there is some evidence that Brand D is better. An
experiment was conducted where 16 tires from each brand were
used. The tires were run under uniform conditions until they
wore out. The results are:
Brand C: \(\bar{X}_1\) = 40,000 km, \(s_1\) = 5,400 km
Brand D: \(\bar{X}_2\) = 38,000 km, \(s_2\) = 3,200 km

What conclusion can be drawn?
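
The two-sample t formula can be sketched on the figures of Example 9. The example does not state a significance level, so the sketch stops at the computed t and the degrees of freedom, which would then be compared with a tabular value; the function name `t_two_means` is illustrative.

```python
from math import sqrt

def t_two_means(mean1, mean2, s1, s2, n1, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df

t, df = t_two_means(mean1=40_000, mean2=38_000,
                    s1=5_400, s2=3_200, n1=16, n2=16)

print(round(t, 3), df)
```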
Example 10
All freshmen in a particular school were
found to have a variability in grades expressed as a
standard deviation of 3. Two samples among these
freshmen, made up of 20 and 50 students,
were found to have means of 88 and 85, respectively.
Based on their grades, is the first group really
brighter than the second group, using the 0.01 level of
significance?
Analysis of Variance (F - Test)
-A test that was developed by Ronald A. Fisher

-A technique in inferential statistics designed to test
whether or not more than two samples (or groups)
are significantly different from each other
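Analysis of Variance
Steps:
1. Compute for the sum of squares
\[ TSS = \sum x^2 - \frac{(\sum x)^2}{N} \]
\[ SSB = \frac{1}{r} \sum_j \left( \sum_i x_{ij} \right)^2 - \frac{(\sum x)^2}{N} \]
\[ SSW = TSS - SSB \]
Where:
\(x_{ij}\) - the i-th observation in column (group) j
r - number of observations per column
N - total number of observations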
Analysis of Variance
2. Compute degrees of freedom

dft = rk - 1 = N - 1

dfb = k - 1

dfw = dft - dfb

Analysis of Variance
3. Compute for the mean sum of squares
\[ MSSB = \frac{SSB}{dfb} \]
\[ MSSW = \frac{SSW}{dfw} \]
4. Compute for the F Ratio
\[ F = \frac{MSSB}{MSSW} \]
Contingency Table for ANOVA
Sources of Variation   Sum of Squares   Degree of Freedom (df)   Mean Sum of Squares   F Ratio
Between Column         SSB              dfb                      MSSB
Within Column          SSW              dfw                      MSSW
Total                  TSS              dft
Exercise
1. The weights in kilograms of three groups of 5 members
each are shown in the table below. Is there unusual
variation among the groups? (use α = 0.05)

Members    Group
           A     B     C
1          50    60    53
2          48    40    55
3          55    50    40
4          50    60    40
5          46    52    47
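
The four ANOVA steps can be sketched in Python on the data of this exercise. The function name `one_way_anova` is illustrative, and the code assumes equal group sizes, as in the formulas above. The resulting F ratio would still have to be compared with a tabular F value at the chosen significance level.

```python
groups = {
    "A": [50, 48, 55, 50, 46],
    "B": [60, 40, 50, 60, 52],
    "C": [53, 55, 40, 40, 47],
}

def one_way_anova(columns):
    values = [x for column in columns for x in column]
    N = len(values)                      # total number of observations
    k = len(columns)                     # number of columns (groups)
    r = len(columns[0])                  # observations per column (equal sizes)
    correction = sum(values) ** 2 / N    # (sum x)^2 / N

    # Step 1: sums of squares
    tss = sum(x * x for x in values) - correction
    ssb = sum(sum(column) ** 2 for column in columns) / r - correction
    ssw = tss - ssb

    # Step 2: degrees of freedom
    dft = N - 1
    dfb = k - 1
    dfw = dft - dfb

    # Steps 3 and 4: mean sums of squares and the F ratio
    mssb = ssb / dfb
    mssw = ssw / dfw
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "dfb": dfb, "dfw": dfw, "F": mssb / mssw}

result = one_way_anova(list(groups.values()))
print({key: round(value, 4) for key, value in result.items()})
```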
Exercise
2. The following are the mileages obtained after several road tests were
run using 5 different kinds of gasoline on a Toyota car.

Road Test    Type of Gasoline
             A     B     C     D     E
1st          35    61    38    65    56
2nd          31    63    54    60    69
3rd          42    50    47    57    70
4th          48    42    60    55    50
5th          40    49    55    60    48

Is there a significant difference among the mileage yields at the 1% level?
Exercise
3. Below are the bowling scores of four groups of four
members each. At the 5% significance level, find out if there
is unusual variation among the groups.

Members    Group
           A      B      C      D
1          98     100    87     90
2          78     95     92     93
3          95     90     105    95
4          110    85     88     97
Chi Square Test (\(\chi^2\))
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)

USES:
1. to test the goodness of fit of a normal curve; that is to
find out whether or not a sample distribution conforms
with the hypothetical normal distribution
2. to find out whether or not an observed proportion is
equal to some given ideal or expected proportion
3. to test the independence of one variable from another
variable.
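Formulas:
i. For a 2 x 2 table (with Yates' correction for continuity)
\[ \chi^2 = \sum \frac{(|OF - EF| - 0.5)^2}{EF} \]

ii. For a non 2 x 2 table
\[ \chi^2 = \sum \frac{(OF - EF)^2}{EF} \]

Where OF is the observed frequency and EF is the expected frequency.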
Exercise
1. Test the hypothesis that educational attainment does not
depend on socio-economic status for the following 100
persons in a particular community.

Socio-economic status    Educational Attainment
                         Finished College    Did Not Finish College
Poor                     18                  10
Middle Class             28                  25
Rich                     14                  5
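
A sketch of the chi-square computation for the table in Exercise 1 follows. Two standard conventions that the slides do not spell out are assumed: each expected frequency is (row total x column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The function name `chi_square` and its `yates` flag are illustrative.

```python
def chi_square(table, yates=False):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Assumed rule: EF = row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = abs(observed - expected)
            if yates:
                diff -= 0.5   # Yates' correction, intended for 2 x 2 tables
            statistic += diff ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Rows: Poor, Middle Class, Rich.
# Columns: Finished College, Did Not Finish College.
observed = [
    [18, 10],
    [28, 25],
    [14, 5],
]

chi2, df = chi_square(observed)
print(round(chi2, 3), df)
```

The computed value is then compared with the tabular chi-square value for the same degrees of freedom at the chosen significance level.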
Exercise
2. At the 1% significance level, does college academic grade
depend on the high school NSAT results for the following
200 students?

Academic Grade    NSAT Rating
                  Low    Average    High
Above 85          13     25         21
75 - 85           18     31         38
Below 75          14     20         20
Exercise
3. At ABC Company, there are 28 males and 32
females. Of the 28 males, 10 hold executive
posts and the others do clerical work. Of the 32
females, only 5 hold executive positions and the
others do clerical work. Prepare a contingency
table, then test the hypothesis that position is
independent of sex.
Exercise
4. To determine whether type of personality is related to
academic performance, a random sample of 180 high
school students from a certain college was taken and the
data are as follows:

             Low Average    Average    High Average
Introvert    35             30         25
Extrovert    31             23         36

Is there a significant relationship between personality type
and academic performance?
Correlation and Regression Analysis
Regression Analysis
- concerned with the problem of estimation and
forecasting
FORMULA:
y = a + bx
Where:
y - predicted score
a - y-intercept
b - slope of the line

Regression Analysis
\[ b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \]
\[ a = \bar{Y} - b\bar{X} \]
Where:
\(\bar{Y}\) - mean of the y values
\(\bar{X}\) - mean of the x values
Correlation Analysis
- Concerned with the relationship between the changes in
the variables
Formula: Pearson Product Moment Correlation (r)
\[ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} \]
Range of Values: -1 ≤ r ≤ 1

(+) r shows a direct positive relationship
(- ) r shows a negative or inverse relationship

r = 0 this indicates no relationship
r = 1 perfect positive relationship
r = -1 perfect negative relationship
Interpretation:
Pearson r      Qualitative Description
1              Perfect Correlation
0.91 - 0.99    Very High
0.71 - 0.90    High
0.41 - 0.70    Marked
0.21 - 0.40    Slight/Low
0 - 0.20       Negligible
Testing the Significance of r
\[ t = r \sqrt{\frac{n - 2}{1 - r^2}} \]
Exercise
1. It is generally known that the number of road accidents is inversely
proportional to road width. The following data show the result of
a study indicating the number of accidents occurring per hundred
thousand vehicles.

Road width (in feet) (x)    75   52   60   33   22
Number of accidents (y)     40   84   55   92   90

a. draw a scatter diagram
b. find the equation of the LSRL
c. predict accident frequency for a road whose width is 55 feet;
48 feet
d. find the degree of relationship between road width and
accident frequency.
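
The regression and correlation formulas can be sketched in Python on the road-width data of this exercise. The variable and function names (`b`, `a`, `r`, `t`, `predict`) are illustrative and simply mirror the symbols in the formulas.

```python
from math import sqrt

x = [75, 52, 60, 33, 22]   # road width in feet
y = [40, 84, 55, 92, 90]   # number of accidents
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Slope and intercept of the least-squares regression line y = a + bx.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

# Pearson product moment correlation and its t statistic.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
t = r * sqrt((n - 2) / (1 - r ** 2))

def predict(width):
    """Predicted accident frequency for a given road width."""
    return a + b * width

print(round(a, 3), round(b, 4), round(r, 4), round(t, 3))
print(round(predict(55), 2), round(predict(48), 2))
```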

Exercise
2. The following table shows the final grades of ten students
in Algebra and Statistics.

Algebra (x)       75   80   93   65   87   71
Statistics (y)    82   78   86   72   91   80

a. draw a scatter diagram
b. find the equation of the LSRL
c. predict grade in Statistics if grade in
Algebra is 78; 82; 89; 95; 100
d. find the degree of relationship between grades in
Algebra and Statistics
Pilar B. Acorda
Email Address : [email protected]
Mobile Number: 09359547319
