Compiled Notes


OTHER MEASURES OF LOCATION

1. Quartiles – are values that divide the set of data into 4 equal parts. These values are denoted by Q1, Q2, and Q3, in which 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.

2. Deciles – are values that divide the set of data into 10 equal parts. These values are denoted by D1, D2, D3, …, and D9, in which 10% of the data falls below D1, 20% falls below D2, 30% falls below D3, …, and 90% falls below D9.

3. Percentiles – are values that divide the set of data into 100 equal parts. These values are denoted by P1, P2, P3, …, and P99, in which 1% of the data falls below P1, 2% falls below P2, 3% falls below P3, …, and 99% falls below P99.

Equivalent Values:

a) Q1 = P25
b) Q2 = D5 = P50 = Median
c) Q3 = P75
d) D1 = P10, D2 = P20, D3 = P30, …, D9 = P90

Measures of dispersion

• Measures of how spread out or how compressed a certain distribution is

• Measures of the average distance of each observation from the center of the distribution

• Tells how homogeneous or how heterogeneous the observations in a particular distribution or data set are

e.g. Data set 1: 1, 2, 5, 9, 8
     Data set 2: 5, 4, 4, 6, 6

Some measures of variation or dispersion

1. Range – is the simplest measure of dispersion. It is the difference between the highest value and the lowest value in the given set of data. This is denoted by the capital letter R.

2. Mean Absolute Deviation – is the mean of the absolute deviations from the average score taken in a given set of data. It is denoted by MAD.

3. Standard Deviation – is the square root of the mean of squared deviations from the average score in a given distribution. It is considered the most important and reliable measure of dispersion and is denoted by SD or S.

4. Variance – is the square of the standard deviation. This is denoted by the capital letter V.

5. Coefficient of Variation – is the ratio between the standard deviation and the mean. It is denoted by CV.

TECHNIQUES AND USEFUL FORMULA

For Ungrouped Data (N ≤ 30)

1. Quartiles

Steps:

1.1 Arrange the raw scores from highest to lowest or vice-versa.

1.2 Use the formula: Qn = nN ÷ 4

where: n = quartile rank; N = total number of observations

1.3 Round off the result after using the formula in Step 1.2 to the nearest whole number. Locate the quartile in the array using the rounded-off value by starting to count from the lowest score going up.

1.4 The raw score being marked is the quartile.

2. Deciles

Steps:

2.1 Arrange the raw scores from highest to lowest or vice-versa.

2.2 Use the formula: Dn = nN ÷ 10

where: n = decile rank; N = total number of observations.

2.3 Round off the result after using the formula in Step 2.2 to the nearest whole number. Locate the decile in the array using the rounded-off value by starting to count from the lowest score going up.

2.4 The raw score being marked is the decile.

3. Percentiles

Steps:

3.1 Arrange the raw scores from highest to lowest or vice-versa.

3.2 Use the formula: Pn = nN ÷ 100
where: n = percentile rank; N = total number of observations.

3.3 Round off the result after using the formula in Step 3.2 to the nearest whole number. Locate the percentile in the array using the rounded-off value by starting to count from the lowest score going up.

3.4 The raw score being marked is the percentile.

4. Range

R = HV – LV

where: HV = highest value; LV = lowest value

5. Mean Absolute Deviation

$$MAD = \frac{\sum_{k=1}^{N} |X_k - \bar{X}|}{N}$$

where: $X_k$ = individual score or entry; $\bar{X}$ = mean; N = total number of observations

6. Standard Deviation

$$S = \sqrt{\frac{\sum_{k=1}^{N} |X_k - \bar{X}|^2}{N - 1}}$$

where: $X_k$ = individual score or entry; $\bar{X}$ = mean; N = total number of observations

7. Variance

$$V = S^2$$

where: S = standard deviation

8. Coefficient of Variation

$$CV = \frac{S}{\bar{X}}$$

where: S = standard deviation; $\bar{X}$ = mean

Example:

1. In a general inventory, the set of defective computers inspected by a technician was coded as follows: 31, 54, 85, 19, 27, 73, 88. Calculate the mean, median, mode, Q2, D7, P41, range, MAD, SD, V, and CV.

Solution:

• Mean

$$\bar{X} = \frac{\sum x}{n} = \frac{31+54+85+19+27+73+88}{7} = \frac{377}{7} = 53.86$$

• Median

19, 27, 31, 54, 73, 85, 88

$$\tilde{X} = 54$$

• Mode

The mode does not exist, since no score occurs more than once.

• Q2

19, 27, 31, 54, 73, 85, 88

nN/4 = 2(7)/4 = 14/4 = 3.5 ≈ 4

Q2 = 54

• D7

19, 27, 31, 54, 73, 85, 88

nN/10 = 7(7)/10 = 49/10 = 4.9 ≈ 5

D7 = 73

• P41

19, 27, 31, 54, 73, 85, 88

nN/100 = 41(7)/100 = 287/100 = 2.87 ≈ 3

P41 = 31

• Range

R = 88 − 19 = 69

• MAD

$$MAD = \frac{\sum_{k=1}^{N} |X_k - \bar{X}|}{N}, \qquad \bar{X} = 53.86$$
$$MAD = \frac{\sum_{k=1}^{N} |X_k - \bar{X}|}{N} = \frac{169.14}{7} = 24.16$$

• SD

$$S = \sqrt{\frac{\sum_{k=1}^{N} |X_k - \bar{X}|^2}{N - 1}}$$

$$\sum_{k=1}^{N} |X_k - \bar{X}|^2 = 4960.86$$

$$S = \sqrt{\frac{4960.86}{6}} = 28.75$$

• Variance

$$V = S^2 = 28.75^2 = 826.56$$

• CV

$$CV = \frac{S}{\bar{X}} = \frac{28.75}{53.86} = 0.53$$

Example:

2. The estimated radiation levels, in milliroentgens per hour, in the display areas of 10 computer stores in a certain city are as follows: 0.08, 0.22, 0.34, 0.13, 0.25, 0.31, 0.10, 0.13, 0.08, and 0.20. Compute the measures of central tendency, Q3, D6, P70, range, MAD, SD, V, and CV.

• Mean

$$\bar{X} = \frac{\sum x}{n} = \frac{0.08 + 0.22 + \dots + 0.20}{10} = \frac{1.84}{10} = 0.18$$

• Median

0.08, 0.08, 0.10, 0.13, 0.13, 0.20, 0.22, 0.25, 0.31, 0.34

$$\tilde{X} = \frac{0.13 + 0.20}{2} = 0.165 \approx 0.17$$

• Mode

$$\hat{X} = 0.08,\ 0.13$$

• Q3

0.08, 0.08, 0.10, 0.13, 0.13, 0.20, 0.22, 0.25, 0.31, 0.34

nN/4 = 3(10)/4 = 30/4 = 7.5 ≈ 8

Q3 = 0.25

• D6

0.08, 0.08, 0.10, 0.13, 0.13, 0.20, 0.22, 0.25, 0.31, 0.34

nN/10 = 6(10)/10 = 6

D6 = 0.20

• P70

0.08, 0.08, 0.10, 0.13, 0.13, 0.20, 0.22, 0.25, 0.31, 0.34

nN/100 = 70(10)/100 = 700/100 = 7

P70 = 0.22
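
The rank rule used for Q3, D6, and P70 (compute nN divided by the number of parts, round to the nearest whole number, then count up from the lowest score) can be sketched in Python. This is a minimal illustration and not part of the original notes: the function name `locate_quantile` is hypothetical, halves are assumed to round up (3.5 becomes 4, 7.5 becomes 8) as in the worked examples, and a computed rank of 0 is assumed to map to the lowest score.

```python
def locate_quantile(data, n, parts):
    """Return the n-th quantile of `data` when it is split into `parts` parts.

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    scores = sorted(data)            # arrange from lowest to highest
    N = len(scores)
    # Round n*N/parts to the nearest whole number, halves rounding up.
    # Integer arithmetic avoids floating-point rounding surprises.
    rank = (2 * n * N + parts) // (2 * parts)
    rank = min(max(rank, 1), N)      # assumption: keep the rank within 1..N
    return scores[rank - 1]          # count up from the lowest score


example1 = [31, 54, 85, 19, 27, 73, 88]
example2 = [0.08, 0.22, 0.34, 0.13, 0.25, 0.31, 0.10, 0.13, 0.08, 0.20]

print(locate_quantile(example1, 2, 4))      # Q2 of Example 1
print(locate_quantile(example2, 3, 4))      # Q3 of Example 2
print(locate_quantile(example2, 6, 10))     # D6 of Example 2
print(locate_quantile(example2, 70, 100))   # P70 of Example 2
```

Other textbooks and software packages use interpolation rules that can give slightly different quantiles; the sketch follows only the rounding rule stated in these notes.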
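• Range

R = 0.34 − 0.08 = 0.26

• MAD

$$MAD = \frac{\sum_{k=1}^{N} |X_k - \bar{X}|}{N}, \qquad \bar{X} = 0.18$$

$$MAD = \frac{0.80}{10} = 0.08$$

• SD

$$S = \sqrt{\frac{\sum_{k=1}^{N} |X_k - \bar{X}|^2}{N - 1}}$$

$$\sum_{k=1}^{N} |X_k - \bar{X}|^2 = 0.0808$$

$$S = \sqrt{\frac{0.0808}{9}} = 0.09$$

• Variance

$$V = S^2 = 0.09^2 = 8.1 \times 10^{-3}$$

• CV

$$CV = \frac{S}{\bar{X}} = \frac{0.09}{0.18} = 0.5$$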
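
The dispersion formulas (range, MAD, SD, V, CV) can be checked with a short Python sketch. The function name `dispersion` and the dictionary keys are illustrative choices, not part of the notes. The sketch carries full precision throughout, so its last digits can differ slightly from the hand computations, which round the mean and S before reusing them (for Example 1 it gives V of about 826.81 rather than 826.56).

```python
import math


def dispersion(data):
    """Compute the measures of dispersion defined in these notes."""
    N = len(data)
    mean = sum(data) / N
    value_range = max(data) - min(data)                       # R = HV - LV
    mad = sum(abs(x - mean) for x in data) / N                # mean absolute deviation
    variance = sum((x - mean) ** 2 for x in data) / (N - 1)   # V = S^2
    sd = math.sqrt(variance)                                  # S
    cv = sd / mean                                            # CV = S / mean
    return {"mean": mean, "R": value_range, "MAD": mad,
            "S": sd, "V": variance, "CV": cv}


example1 = [31, 54, 85, 19, 27, 73, 88]
example2 = [0.08, 0.22, 0.34, 0.13, 0.25, 0.31, 0.10, 0.13, 0.08, 0.20]

for name, data in (("Example 1", example1), ("Example 2", example2)):
    result = dispersion(data)
    print(name, {key: round(value, 4) for key, value in result.items()})
```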
HYPOTHESIS TESTING

Inferential statistics

• enables us to make estimates of population values, called parameters, and to make statements about computed statistics that are acceptable to some degree of confidence

Two areas of inferential statistics

1. estimation

2. hypothesis testing

Research Problem:

e.g. How effective is Minoxidil in treating male pattern baldness?

Specific Objectives:

1. To estimate the population proportion of patients who will show new hair growth after being treated with Minoxidil. (can be answered by estimation)

2. To determine whether treatment using Minoxidil is better than the existing treatment that is known to stimulate hair growth among 40% of patients with male pattern baldness. (answered by hypothesis testing)

What is Hypothesis Testing?

COMPONENTS OF A FORMAL HYPOTHESIS TEST

Null Hypothesis

• denoted by Ho

• the statement being tested

• it represents what the experimenter doubts to be true

• must contain the condition of equality and must be written with the symbol =

Example of null hypothesis: Ho: μ = some value

• The null hypothesis corresponding to the common belief that the mean body temperature is 37°C is expressed as Ho: μ = 37°C

Note: The hypothesized value can be obtained from previous studies or from knowledge of the population.

Alternative Hypothesis

• denoted by Ha

• is the statement that must be true if the null hypothesis is false

• the operational statement of the theory that the experimenter believes to be true and wishes to prove

• is sometimes referred to as the research hypothesis

For the mean, the alternative hypothesis will be stated in only one of three possible forms:

Ha: μ ≠ some value

Ha: μ > some value

Ha: μ < some value

Note: Ha is the opposite of Ho.

For example, if Ho is given as μ = 37.0, then it follows that the alternative hypothesis is given by any of the following: Ha: μ ≠ 37.0, μ < 37.0, or μ > 37.0.

What is a Test of Significance?

• A test of significance is a problem of deciding between the null and the alternative hypotheses on the basis of the information contained in a random sample.

• The goal will be to reject Ho in favor of Ha, because the alternative is the hypothesis that the researcher believes to be true. If we are successful in rejecting Ho, we then declare the results to be “significant”.

Two Types of Errors

Type I Error

• The mistake (error) of rejecting the null hypothesis when it is true.

• It is not a miscalculation or procedural misstep; it is an actual error that can occur when a rare event happens by chance.

• The probability of rejecting the null hypothesis when it is true is called the significance level (α).

• The value of α is typically predetermined, and very common choices are α = 0.05 and α = 0.01.

Example of Type I Error

• The mistake of rejecting the null hypothesis that the mean body temperature is 37.0 when that mean is really 37.0.
Example of situation resulting from Type I Error

• BFaD allows the release of an ineffective medicine.

Two Types of Errors

Type II Error

• The mistake of failing to reject the null hypothesis when it is false.

• The symbol β (beta) is used to represent the probability of a Type II error.

Example of Type II Error

• The mistake of failing to reject the null hypothesis (μ = 37.0) when it is actually false (that is, the mean is not 37.0).

Example of situation resulting from Type II Error

• BFaD does not allow the release of an effective drug.

TYPES OF TEST

Two-tailed test

• If we are primarily concerned with deciding whether the true value of a population parameter is different from a specified value, then the test should be two-tailed.

Left-tailed test

• If we are primarily concerned with deciding whether the true value of a parameter is less than a specified value, then the test should be left-tailed.

Right-tailed test

• If we are primarily concerned with deciding whether the true value of a parameter is greater than a specified value, then we should use the right-tailed test.

P-value

• This is the smallest level of significance at which Ho will be rejected based on the information contained in the sample.

• It is commonly generated by statistical software.

Decision rule: Reject Ho if the p-value is less than or equal to the level of significance (α).

Example of making decisions using the p-value

• If the level of significance α = 0.05 and the p-value is 0.01, reject Ho.

• If the level of significance α = 0.05 and the p-value is 0.05, reject Ho.

• If the level of significance α = 0.05 and the p-value is 0.10, do not reject Ho.

SUMMARY OF THE STEPS IN HYPOTHESIS TESTING

1. State the null and alternative hypotheses.

2. Decide on a level of significance. Determine the testing procedure and methods of analysis (responsibility of the statistician).

3. Decide on the type of data to be collected and choose an appropriate test statistic and testing procedure.

4. State the decision rule. Reject Ho if the p-value is less than or equal to the level of significance (alpha).

5. Collect the data and do the test of significance.

6. Determine the p-value. If the p-value is less than or equal to alpha, reject Ho.

7. Write the conclusion and interpret the results.
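
The p-value decision rule can be written as a few lines of Python. The function names `decide` and `normal_p_value` are hypothetical. The second helper is an added illustration that assumes the test statistic follows a standard normal distribution; the notes themselves only say that the p-value is usually produced by statistical software.

```python
import math


def decide(p_value, alpha=0.05):
    """Decision rule: reject Ho when the p-value is at most alpha."""
    return "reject Ho" if p_value <= alpha else "do not reject Ho"


def normal_p_value(z, tail="two"):
    """p-value of a standard normal statistic z (illustrative assumption)."""
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z)
    if tail == "right":
        return upper
    if tail == "left":
        return 1.0 - upper                      # P(Z < z)
    return 2.0 * min(upper, 1.0 - upper)        # two-tailed


# The three cases listed in the notes, all at alpha = 0.05.
for p in (0.01, 0.05, 0.10):
    print(p, decide(p, alpha=0.05))
```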
HYPOTHESIS TESTING: FOR ONE POPULATION CASE

Steps in Hypothesis Testing Using Critical Value

1. State the null and alternative hypotheses.

2. Decide on a level of significance. Determine the direction of the test.

3. Decide on the type of data to be collected and choose an appropriate test statistic and testing procedure.

4. State the decision rule.

5. Compute the test statistic and compare it with the critical value.

6. State the decision based on the computed value when compared to the critical value.

7. Write the conclusion for the given problem.

Summary of the Tests Concerning the Population Mean

One sample Z-Test

• A one-sample z-test is used to test whether a population parameter is significantly different from some hypothesized value, assuming that the entries are normally distributed and the variance is known.

$$Z_c = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}}$$

Example

A random sample of 100 students enrolled in Statistics under Professor J shows that the average grade in the midterm examination is 85%. Professor J claims that the average grade of the students in the midterm examination is at least 80% with a standard deviation of 16%. Is there evidence to say that the claim of the professor is correct at the 5% level of significance?

Solution:

1. Ho: The average grade of Professor J’s students is equal to 80%.
Ho: μ = 80%

Ha: The average grade of Professor J’s students is greater than 80%.
Ha: μ > 80%

2. α = 0.05, right-tailed test

3. $Z_c = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}}$, $Z_{0.05} = 1.645$

4. Reject Ho if $Z_c > Z_\alpha$. Otherwise, accept the null hypothesis.

5. $Z_c = \frac{85 - 80}{16 / \sqrt{100}} = 3.125$

6. Since $Z_c = 3.125 > Z_{0.05} = 1.645$, reject the null hypothesis in favor of the alternative hypothesis.

7. Thus we conclude that the average grade of Professor J’s students is greater than 80% at the 5% level of significance, which supports his claim.

One sample T-Test

• A one-sample t-test is used to test whether a population parameter is significantly different from some hypothesized value, assuming that the entries are normally distributed and the variance is unknown.

$$t_c = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$$

Example

A random sample of 25 female high school students shows that their average body mass index (BMI) is about 18 points with a standard deviation of 4.5 points. Test the hypothesis that the average BMI of the female high school students is lower than 19 points at the 5% level of significance.

Solution:

1. Ho: The average BMI of the female high school students is equal to 19 points.
Ho: μ = 19

Ha: The average BMI of the female high school students is lower than 19 points.
Ha: μ < 19

2. α = 0.05, left-tailed test


3. $t_c = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$, $t_{(0.05,\,24)} = 1.711$

4. Reject Ho if $t_c < -t_{(0.05,\,24)}$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{18 - 19}{4.5 / \sqrt{25}} = -1.11$

6. Since $t_c = -1.11 > -t_{(0.05,\,24)} = -1.711$, accept the null hypothesis.

7. Thus we conclude that the average BMI of the female high school students is about 19 points.

Example

The mean weight of the sample of 100 persons from the Honolulu Heart Study is 63 kg. If the ideal weight is known to be 60 kg, is the group significantly overweight? Assume σ = 10 kg and α = 0.05.

Solution:

1. Ho: The mean weight of the sample of 100 persons from the Honolulu Heart Study is equal to 60 kg.
Ho: μ = 60 kg

Ha: The mean weight of the sample of 100 persons from the Honolulu Heart Study is greater than 60 kg.
Ha: μ > 60 kg

2. α = 0.05, right-tailed test

3. $Z_c = \frac{\bar{X} - \mu_o}{\sigma / \sqrt{n}}$, $Z_{0.05} = 1.645$

4. Reject Ho if $Z_c > Z_\alpha$. Otherwise, accept the null hypothesis.

5. $Z_c = \frac{63 - 60}{10 / \sqrt{100}} = 3$

6. Since $Z_c = 3 > Z_{0.05} = 1.645$, reject the null hypothesis in favor of the alternative hypothesis.

7. Thus we conclude that the mean weight of the sample of 100 persons from the Honolulu Heart Study is greater than 60 kg at the 5% level of significance.

Example

A random sample of 100 students enrolled in Statistics under Professor J shows that the average grade in the midterm examination is 85% with a computed standard deviation of 25%. Professor J claims that the average grade of the students in the midterm examination is not equal to 80%. Test the claim of the professor at the 5% level of significance.

Solution:

1. Ho: The average grade of Professor J’s students is equal to 80%.
Ho: μ = 80%

Ha: The average grade of Professor J’s students is not equal to 80%.
Ha: μ ≠ 80%

2. α = 0.05, two-tailed test

3. $t_c = \frac{\bar{X} - \mu_o}{s / \sqrt{n}}$, $t_{(0.05/2,\,99)} = 1.96$

4. Reject Ho if $|t_c| > t_{(0.05/2,\,99)}$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{85 - 80}{25 / \sqrt{100}} = 2.00$

6. Since $|t_c| = 2.00 > t_{(0.05/2,\,99)} = 1.96$, reject the null hypothesis.

7. Thus we conclude that the average grade of Professor J’s students is not equal to 80%.
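
The one-sample z and t statistics share the same form, so the four worked examples can be reproduced with one small Python sketch. The function names `one_sample_statistic` and `reject_null` are hypothetical; the critical values are passed in as read from the notes (1.645, 1.711, 1.96) rather than computed, since the notes take them from tables.

```python
import math


def one_sample_statistic(sample_mean, mu0, sd, n):
    """Z_c or t_c = (sample mean - mu_o) / (sd / sqrt(n)).

    Pass sigma for the z-test or the sample s for the t-test.
    """
    return (sample_mean - mu0) / (sd / math.sqrt(n))


def reject_null(stat, critical, tail):
    """Critical-value decision rule; `critical` is the positive table value."""
    if tail == "right":
        return stat > critical
    if tail == "left":
        return stat < -critical
    return abs(stat) > critical      # two-tailed


# (label, sample mean, mu_o, sd, n, table critical value, tail)
examples = [
    ("Professor J, z-test", 85, 80, 16, 100, 1.645, "right"),
    ("BMI, t-test", 18, 19, 4.5, 25, 1.711, "left"),
    ("Honolulu, z-test", 63, 60, 10, 100, 1.645, "right"),
    ("Professor J, t-test", 85, 80, 25, 100, 1.96, "two"),
]

for label, mean, mu0, sd, n, critical, tail in examples:
    stat = one_sample_statistic(mean, mu0, sd, n)
    decision = "reject Ho" if reject_null(stat, critical, tail) else "accept Ho"
    print(f"{label}: statistic = {stat:.3f}, {decision}")
```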
HYPOTHESIS TESTING: DEPENDENT/PAIRED SAMPLE T-TEST

Paired sample t-test

• When a single sample is studied under 2 separate conditions, two sets of measurements are obtained. These measurements are correlated since both sets of data have been obtained from the same sample.

• A paired t-test is used when we are interested in the difference between two variables for the same subject. Often the two variables are separated by time.

• The test statistic for this kind of test is:

$$t_c = \frac{\bar{d}}{s_d / \sqrt{n}}$$

$$s_d = \sqrt{\frac{n \sum d^2 - \left(\sum d\right)^2}{n(n - 1)}}$$

• where: $\bar{d}$ is the average of the differences between the two variables

• $s_d$ is the standard deviation of the differences

• n is the number of pairs in the sample

Example

A group of high school students were given a pretest before undergoing an intensive review for the NSAT. The data obtained are given below. Determine whether the intensive review has a significant effect on the performance of the students. Let α = 0.05.

Solution:

$$\bar{d} = \frac{\sum d}{n} = \frac{33}{15} = 2.2$$

$$s_d = \sqrt{\frac{n \sum d^2 - \left(\sum d\right)^2}{n(n - 1)}} = \sqrt{\frac{15(135) - 33^2}{15(15 - 1)}} = 2.11$$

1. Ho: There is no significant difference between the pre-test and post-test scores of the students who reviewed for the NSAT.

Ha: There is a significant difference between the pre-test and post-test scores of the students who reviewed for the NSAT.

2. α = 0.05, two-tailed test

3. $t_c = \frac{\bar{d}}{s_d / \sqrt{n}}$, $t_{0.05/2\,(14)} = 2.145$

4. Reject Ho if $|t_c| > |t_{\alpha/2}|$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{2.2}{2.11 / \sqrt{15}} = 4.04$
6. Since $|t_c| = 4.04 > |t_{\alpha/2}| = 2.145$, reject the null hypothesis in favor of the alternative hypothesis.

7. Thus we conclude that there is a significant difference between the pre-test and post-test scores of the students who reviewed for the NSAT. The post-test score is significantly higher than the pre-test score; therefore, the review sessions have a significant effect on the post-test score.

Example

The following data are the scores of the students in the pre-test and post-test in Statistics. At the 5% level of significance, can we say that the students improved their scores after studying Statistics?

Solution:

$$\bar{d} = \frac{\sum d}{n} = \frac{38}{10} = 3.8$$

$$s_d = \sqrt{\frac{n \sum d^2 - \left(\sum d\right)^2}{n(n - 1)}} = \sqrt{\frac{10(186) - 38^2}{10(10 - 1)}} = 2.15$$

1. Ho: There is no significant difference between the pre-test and post-test scores of the students in Statistics.

Ha: There is a significant difference between the pre-test and post-test scores of the students in Statistics.

2. α = 0.05, two-tailed test

3. $t_c = \frac{\bar{d}}{s_d / \sqrt{n}}$, $t_{0.05/2\,(9)} = 2.262$

4. Reject Ho if $|t_c| > |t_{\alpha/2}|$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{3.8}{2.15 / \sqrt{10}} = 5.59$

6. Since $|t_c| = 5.59 > |t_{\alpha/2}| = 2.262$, reject the null hypothesis in favor of the alternative hypothesis.

7. Thus we conclude that there is a significant difference between the pre-test and post-test scores of the students in Statistics. The post-test score is significantly higher than the pre-test score; therefore, we say that the students improved their scores after studying Statistics.
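
The paired t computations can be checked from the summary sums alone. The sketch below is a minimal illustration with a hypothetical function name, `paired_t`. It takes Σd, Σd², and n as inputs because the raw pre-test and post-test tables are not reproduced here; the values 33, 135, 15 and 38, 186, 10 are the ones that appear in the two worked solutions.

```python
import math


def paired_t(sum_d, sum_d2, n):
    """Return (d_bar, s_d, t_c) for a paired-sample t-test.

    sum_d  = sum of the pairwise differences
    sum_d2 = sum of the squared differences
    n      = number of pairs
    """
    d_bar = sum_d / n
    s_d = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
    t_c = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t_c


for label, sum_d, sum_d2, n in (("NSAT review", 33, 135, 15),
                                ("Statistics pre/post-test", 38, 186, 10)):
    d_bar, s_d, t_c = paired_t(sum_d, sum_d2, n)
    print(f"{label}: d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t_c = {t_c:.2f}")
```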
HYPOTHESIS TESTING: INDEPENDENT SAMPLE T-TEST

Independent sample t-test

• Use this test to compare two small sets of quantitative data when samples are collected independently of one another. 'Student's' t Test is one of the most commonly used techniques for testing a hypothesis on the basis of a difference between sample means.

• The t-test determines a probability that two populations are the same with respect to the variable tested.

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$$

Example

A poultry grower would like to find out which of two kinds of feeds is better for the growth of chickens. He used one kind of feed for one group (group 1) of chickens (n1 = 25) and another kind of feed for a second group (group 2) of chickens (n2 = 27).

After 45 days, he computed the mean weights of the chickens in the two groups and obtained the following data.

$\bar{x}_1 = 1250$ g, $s_1 = 22$ g

$\bar{x}_2 = 1180$ g, $s_2 = 25$ g

The chickens in one group were independent of the other group. These were randomly chosen. The groups were subjected to the same living conditions except for the type of feed being used. Use the 1% level of significance.

Solution:

1. Ho: There is no significant difference in the mean weights of the chickens in group 1 and group 2.

Ha: There is a significant difference in the mean weights of the chickens in group 1 and group 2.

2. α = 0.01, two-tailed test

3. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$, $t_{0.01/2\,(50)} = 2.576$

4. Reject Ho if $|t_c| > |t_{\alpha/2}|$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$

6. $t_c = \frac{1250 - 1180}{\sqrt{\left[\frac{(25 - 1)22^2 + (27 - 1)25^2}{25 + 27 - 2}\right]\left[\frac{1}{25} + \frac{1}{27}\right]}} = 10.68$

7. Since $|t_c| = 10.68 > |t_{\alpha/2}| = 2.576$, reject the null hypothesis in favor of the alternative hypothesis.

8. Thus we conclude that there is a significant difference in the weights of the two groups of chickens at the 0.01 level of significance. The kind of feed given to group 1 is better than the feed given to group 2.

Example

A previous midterm examination in Statistics showed that the average scores obtained by students in Sections A and B are the same. The department chair randomly selects two sections of Statistics classes to record the performance of the students. The following statistics were obtained.

Is there evidence for the department chair to say that the scores in Sections A and B are the same? Use the 5% level of significance.

Solution:

1. Ho: There is no significant difference in the mean scores in the midterm examination of Section A and Section B.

Ha: There is a significant difference in the mean scores in the midterm examination of Section A and Section B.

2. α = 0.05, two-tailed test

3. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$, $t_{0.05/2\,(27)} = 2.052$

4. Reject Ho if $|t_c| > |t_{\alpha/2}|$. Otherwise, accept the null hypothesis.
5. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$

$t_c = \frac{85.2 - 86.1}{\sqrt{\left[\frac{(15 - 1)10.40 + (14 - 1)20.99}{15 + 14 - 2}\right]\left[\frac{1}{15} + \frac{1}{14}\right]}} = -0.6152$

6. Since $|t_c| = 0.6152 < |t_{\alpha/2}| = 2.052$, we accept the null hypothesis.

7. Thus, the mean scores of students from Sections A and B are the same.

Example

Given the data and some descriptive statistics below, can we say that the borrower’s transaction cost in pesos is higher in formal lending institutions than in non-formal lending institutions? Use the 1% level of significance.

Solution:

1. Ho: There is no significant difference in the borrower’s transaction cost in pesos between formal lending institutions and non-formal lending institutions.

Ha: The borrower’s transaction cost in pesos is higher in formal lending institutions than in non-formal lending institutions.

2. α = 0.01, right-tailed test

3. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$, $t_{0.01\,(47)} = 2.326$

4. Reject Ho if $t_c > t_\alpha$. Otherwise, accept the null hypothesis.

5. $t_c = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$

$t_c = \frac{127.50 - 61.90}{\sqrt{\left[\frac{(20 - 1)64.19 + (29 - 1)150.06}{20 + 29 - 2}\right]\left[\frac{1}{20} + \frac{1}{29}\right]}} = 21.014$

6. Since $t_c = 21.014 > t_{0.01\,(47)} = 2.326$, reject the null hypothesis in favor of the alternative hypothesis.

7. Thus, we can say that there is a significant difference in the borrower’s transaction cost in pesos between formal lending institutions and non-formal lending institutions. The borrower’s transaction cost in pesos is higher in formal lending institutions than in non-formal lending institutions.
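
The pooled-variance t statistic used in the three independent-sample examples can be sketched in Python as follows. The function name `pooled_t` is hypothetical, and the function takes variances (s²) as inputs. For the poultry example the variances are 22² and 25²; for the Sections A/B and lending examples, the numbers that appear in the worked computations (10.40 and 20.99, 64.19 and 150.06) are assumed to already be variances, since they enter the formula without being squared.

```python
import math


def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """t_c for two independent samples using the pooled variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error


examples = [
    ("Poultry feeds", 1250, 1180, 22 ** 2, 25 ** 2, 25, 27),
    ("Sections A and B", 85.2, 86.1, 10.40, 20.99, 15, 14),
    ("Lending institutions", 127.50, 61.90, 64.19, 150.06, 20, 29),
]

for label, m1, m2, v1, v2, n1, n2 in examples:
    t_c = pooled_t(m1, m2, v1, v2, n1, n2)
    print(f"{label}: t_c = {t_c:.4f}, df = {n1 + n2 - 2}")
```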
