Introduction To Statistical Analysis

The document provides an introduction to statistical analysis, defining statistics and its two main branches: descriptive and inferential statistics. It covers key concepts such as populations, samples, variables, measurement scales, data sources, and methods of data collection, as well as sampling methods and statistical measures. Additionally, it explains central tendency, variation, probability rules, and provides examples to illustrate these concepts.

Uploaded by

parim2004.sg
Copyright
© All Rights Reserved

Introduction to Statistical Analysis

Introduction:

Decision makers make better decisions when they use all available information in an effective and meaningful way. The primary role of statistics is to provide decision makers with methods for obtaining and analyzing information to help make these decisions. Statistics is also used to answer long-range planning questions, such as when and where to locate facilities to handle future sales.

Statistics is defined as the science of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of assisting in making more effective decisions.

Biostatistics is the application of statistics to biological and health data; it is concerned with the collection, presentation, analysis and interpretation of such data.

The body of knowledge of statistics has two areas:

1. Descriptive statistics consists of the collection, organization, summarization and presentation of data.

2. Inferential statistics consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions.

A population consists of all subjects (human or otherwise) that are being studied. A population may be finite or infinite.

A sample is a group of subjects selected from a population.

A variable is a characteristic or attribute that can assume different values.

1. A quantitative variable is one that can be measured in the usual sense: age, height, weight, distance, volume, etc.

2. A qualitative variable is one whose values are separated into categories: gender, occupation, type of disease, etc.

The measurement scales

Four common types of measurement scale are used for variables: nominal, ordinal, interval and ratio.

1. The nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order can be imposed on the data: gender, religion, blood group, occupation, etc.

2. The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist: attitude, grade (A, B+, B, C+, C, D+, D, F), etc.

3. The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero: temperature, exam score, etc.

4. The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. Because of the true zero, ratios are meaningful when the same variable is measured on two different members of the population (for example, one person can be twice as heavy as another): height, weight, time, income, age, etc.

Sources of Data:

1. Secondary data: data that are already available, for example hospital registration records. Advantage: less expensive. Disadvantage: may not satisfy your needs.

2. Primary data: data that must be collected, for example with a survey questionnaire.

Methods of Collecting Primary Data:

Some of the sources for collecting primary data are: 1. focus groups; 2. telephone interviews; 3. mail questionnaires; 4. door-to-door surveys; 5. mall intercepts; 6. new product registration; 7. personal interviews; and 8. experiments.

Sampling Methods:

There are many ways to collect a sample. The most commonly used methods are:

A. Probability Sampling:

1. Simple Random Sampling: a method of selecting items from a population such that every possible sample of a specific size has an equal chance of being selected. Sampling may be with or without replacement.

2. Stratified Random Sampling: a sample obtained by selecting simple random samples from strata (mutually exclusive subgroups of the population). Some criteria for dividing a population into strata are: sex (male, female); age (under 18, 18 to 28, 29 to 39); occupation (blue-collar, professional, other).

3. Cluster Sampling: a simple random sample of groups, or clusters, of elements. Cluster sampling is useful when it is difficult or costly to generate a simple random sample. For example, suppose we want to estimate the average annual household income in a large city. Simple random sampling would require a complete list of households in the city from which to sample, and so would stratified random sampling. A less expensive way is to let each block within the city represent a cluster. A sample of clusters could then be randomly selected, and every household within these clusters could be interviewed to find the average annual household income.
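The three probability sampling designs above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a survey tool: the helper names (`simple_random_sample`, `stratified_sample`, `cluster_sample`) are hypothetical, and the demo population of 100 numbered households grouped into city blocks of 10 is invented for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    # Sampling without replacement: every subset of size n is equally likely.
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Split the population into mutually exclusive strata,
    # then draw a simple random sample inside each stratum.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]

# Hypothetical demo: households 0..99, city block = household // 10.
rng = random.Random(2024)
households = list(range(100))
blocks = {b: [h for h in households if h // 10 == b] for b in range(10)}

srs = simple_random_sample(households, 10, rng)
strat = stratified_sample(households, lambda h: h % 2, 5, rng)  # hypothetical strata: even/odd ids
clus = cluster_sample(blocks, 2, rng)
print(srs, strat, clus, sep="\n")
```

Seeding `random.Random` makes the draw reproducible. Note the cost trade-off described in the text: the cluster design only needs the list of blocks, not a list of every household.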

B. Nonprobability Sampling:

1. Judgement Sampling: In this case, the person taking the sample has direct or
indirect control over which items are selected for the sample.

2. Convenience Sampling: In this method, the decision maker selects a sample from the population in a manner that is relatively easy and convenient.

3. Quota Sampling: In this method, the decision maker requires the sample to
contain a certain number of items with a given characteristic. Many political polls
are, in part, quota sampling.

A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using all the data values for a specific population.

Descriptive statistics are the methods used to present and summarize sample data:

- Frequency and percentage tabulation
- Rate and ratio
- Graphs: bar chart, pie chart, histogram, line graph, etc.
- Measures of center: mean, median and mode
- Measures of variation: range and standard deviation

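Measurement of Central Tendency
(Mean, Median and Mode)

The arithmetic mean, or mean, is the average value of all the data.

The population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \quad \text{(ungrouped data)}$$
$$\mu = \frac{\sum_{i=1}^{k} f_i X_i}{N} \quad \text{(grouped data)}$$

The sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{(ungrouped data)}$$
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n} \quad \text{(grouped data)}$$

where
μ = population mean
$\bar{X}$ = sample mean
$X_i$ = data value (for grouped data, the midpoint of class i)
N = population size
n = sample size
$f_i$ = frequency in class interval i
k = number of classes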

The median is the midpoint of the ordered data array: the middle value when n is odd, or the average of the two middle values when n is even.

The mode is the value that occurs most frequently. Some data sets have no mode, and some have more than one mode.

For data group I: 10, 21, 33, 53, 54

The mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2$$

The median: Med. = 33

The mode: no mode (every value occurs once)

For data group II: 10, 12, 34, 34, 34, 46, 55, 56, 60, 60

The mean: $\bar{X} = 401/10 = 40.1$

The median:
$$\text{Med.} = \frac{34 + 46}{2} = 40$$

The mode: Mode = 34

For data group III: 10, 12, 34, 34, 34, 46, 55, 56, 60, 60, 60, 65, 67

The mean: $\bar{X} = 593/13 \approx 45.6$

The median: Med. = 55

The mode: Mode = 34 and 60
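The three data groups can be checked with Python's standard `statistics` module. `mean` and `median` are standard-library functions; the small `modes` helper is our own (hypothetical) addition, written so that it returns an empty list for "no mode", matching the convention in these notes. The library's `multimode` would instead return every value when all values are distinct.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    # All values tied for the highest frequency; an empty list means "no mode"
    # (the convention used in the notes when every value occurs only once).
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

groups = {
    "I": [10, 21, 33, 53, 54],
    "II": [10, 12, 34, 34, 34, 46, 55, 56, 60, 60],
    "III": [10, 12, 34, 34, 34, 46, 55, 56, 60, 60, 60, 65, 67],
}
for name, data in groups.items():
    # median() sorts internally and averages the two middle values for even n
    print(name, round(mean(data), 1), median(data), modes(data))
```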

Measurement of Variation
- Range (R)
- Quartile deviation (Q.D.)
- Mean deviation (M.D.)
- Standard deviation

Range: $R = X_{\max} - X_{\min}$

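Quartile deviation:
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Mean deviation:
$$\text{M.D.} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \quad \text{(ungrouped data)}$$
$$\text{M.D.} = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{n} \quad \text{(grouped data)}$$

The standard deviation

For a population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \quad \text{(ungrouped data)}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}} \quad \text{(grouped data)}$$

For a sample:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{(ungrouped data)}$$
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}} \quad \text{(grouped data)}$$

The variance is the square of the standard deviation: $\sigma^2$ for a population, $S^2$ for a sample.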

Example: For the data 10, 21, 33, 53, 54

The mean: $\mu = \bar{X} = 34.2$, and the sum of squared deviations is $\sum_{i=1}^{5} (x_i - 34.2)^2 = 1506.8$.

For a population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5}} = \sqrt{\frac{1506.8}{5}} = 17.36$$

For a sample:
$$S = \sqrt{\frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{4}} = \sqrt{\frac{1506.8}{4}} = 19.41$$
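A short sketch using Python's standard `statistics` module reproduces the range, the mean deviation and both standard deviations for 10, 21, 33, 53, 54. `pstdev` divides by N (population formula) and `stdev` divides by n - 1 (sample formula). The quartile deviation is left out on purpose, because quartile conventions differ between textbooks and software.

```python
from statistics import mean, pstdev, stdev

data = [10, 21, 33, 53, 54]
x_bar = mean(data)

data_range = max(data) - min(data)                        # R = Xmax - Xmin
mean_dev = sum(abs(x - x_bar) for x in data) / len(data)  # M.D., ungrouped
sigma = pstdev(data)  # population SD: divide by N
s = stdev(data)       # sample SD: divide by n - 1

print(data_range, round(mean_dev, 2), round(sigma, 2), round(s, 2))
```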

Example: For the following 57 values

68 65 12 22 63 43 32 43 42 25
49 27 27 74 38 49 30 51 42 28
36 36 27 23 28 42 31 19 32 28
50 46 79 31 38 30 27 28 21 43
22 25 16 49 23 45 24 12 24 12
69 25 27 47 44 51 23

the mean is $\mu = \bar{X} = 36.37$.

For a population (N = 57):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{57} (x_i - 36.37)^2}{57}} = 15.45$$

For a sample (n = 57):
$$S = \sqrt{\frac{\sum_{i=1}^{57} (x_i - 36.37)^2}{56}} = 15.58$$

When n is large, $S \approx \sigma$.

To compare the variation between groups, we use the coefficient of variation.

The coefficient of variation (C.V.) is the ratio of the standard deviation to the mean, expressed as a percentage:
$$\text{C.V.} = \frac{S}{\bar{x}} \times 100\%$$

Example: Compare the variation in age and in cholesterol between males and females.

Gender    Age: x̄, S, C.V.                 Cholesterol: x̄, S, C.V.
Male      46, 13, (13/46) × 100 = 28.26%   100, 25, (25/100) × 100 = 25.00%
Female    40, 10, (10/40) × 100 = 25.00%   110, 30, (30/110) × 100 = 27.27%

Age varies more among males than among females, but cholesterol varies more among females than among males.

Within the male group, age varies more than cholesterol.

Within the female group, cholesterol varies more than age.
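The C.V. comparison in the table can be reproduced in a few lines. This is a minimal sketch; the function name `coefficient_of_variation` is our own, and the (mean, SD) pairs are the ones given in the example.

```python
def coefficient_of_variation(sd, mean):
    # C.V. = S / x̄ × 100%, a unit-free measure of relative spread
    return sd / mean * 100

# (mean, standard deviation) for each gender and variable, from the example
groups = {
    ("Male", "age"): (46, 13),
    ("Male", "cholesterol"): (100, 25),
    ("Female", "age"): (40, 10),
    ("Female", "cholesterol"): (110, 30),
}
cv = {key: round(coefficient_of_variation(s, m), 2) for key, (m, s) in groups.items()}
for key, value in cv.items():
    print(key, f"{value:.2f}%")
```

Because the C.V. has no units, it can compare age (years) with cholesterol (mg/dL) directly, which the raw standard deviations cannot.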

Probability

The Multiplication Rule for Counting

To determine the total number of outcomes in a sequence of events, the multiplication rule can be used.

Multiplication Rule

In a sequence of n events in which the first event has $k_1$ possibilities, the second event has $k_2$, the third has $k_3$, and so forth, the total number of possibilities for the sequence is $k_1 \cdot k_2 \cdot k_3 \cdots k_n$.

Factorial formulas: $5! = 1 \times 2 \times 3 \times 4 \times 5 = 120$

$n! = 1 \times 2 \times 3 \times \cdots \times n$

$0! = 1$

A permutation is an arrangement of n objects in a specific order.

Permutation rule: The arrangement of n objects in a specific order using r objects at a time is called a permutation of n objects taking r objects at a time. It is written as ${}_nP_r$, and the formula is
$${}_nP_r = \frac{n!}{(n - r)!}$$

A selection of distinct objects without regard to order is called a combination.

Combination rule: The number of combinations of r objects selected from n objects is denoted by ${}_nC_r$ and is given by the formula
$${}_nC_r = \frac{n!}{(n - r)!\, r!}$$
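Python's `math` module provides these counting rules directly (`perm`, `comb` and `prod` need Python 3.8 or later). In this sketch each library call is printed next to the factorial formula above, and the 3 × 4 × 2 sequence is a hypothetical example of the multiplication rule.

```python
from math import comb, factorial, perm, prod

def multiplication_rule(choices):
    # Total outcomes of a sequence of events = k1 * k2 * ... * kn
    return prod(choices)

print(factorial(5))  # 5! = 120
print(factorial(0))  # 0! = 1
print(perm(5, 2), factorial(5) // factorial(5 - 2))               # nPr = n!/(n-r)!
print(comb(5, 2), factorial(5) // (factorial(3) * factorial(2)))  # nCr = n!/((n-r)! r!)
print(multiplication_rule([3, 4, 2]))  # hypothetical: 3 x 4 x 2 choices
```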

A probability experiment is a chance process that leads to well-defined results called outcomes.

An outcome is the result of a single trial of a probability experiment.

A sample space is the set of all possible outcomes of a probability experiment.

The probability of any event E is
$$P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}} = \frac{n(E)}{n(S)}$$

Laws of probability

1. Addition Rule 1
When two events A and B are mutually exclusive, the probability that A or B will occur is
P(A or B) = P(A) + P(B)

2. Addition Rule 2
If A and B are not mutually exclusive, then
P(A or B) = P(A) + P(B) - P(A and B)

3. Multiplication Rule 1
When two events are independent, the probability of both occurring is
P(A and B) = P(A) · P(B)

4. Multiplication Rule 2
When two events are dependent, the probability of both occurring is
P(A and B) = P(A) · P(B/A)

5. Conditional Probability
The conditional probability of event B given event A, written P(B/A), is the probability that event B occurs given that event A has already occurred:
$$P(B/A) = \frac{P(A \text{ and } B)}{P(A)}$$

6. Complementary Event Rule
The complementary event of A (all outcomes not in A) is written A´:
P(A´) = 1 - P(A)

Example 1: In a survey of 500 normal persons, the distribution by blood type was:

Blood type    Number    Probability
O             225       225/500 = 0.45
A             205       205/500 = 0.41
B             50        50/500 = 0.10
AB            20        20/500 = 0.04
Total         500       1.00

The blood types are mutually exclusive, so

P(A or B) = P(A) + P(B) = 0.41 + 0.10 = 0.51

P(A and B) = 0

P(O´) = 1 - P(O) = 1 - 0.45 = 0.55

P(A or B or AB) = P(A) + P(B) + P(AB) = 0.41 + 0.10 + 0.04 = 0.55

Example 2: In a school health survey of eye and dental health problems among 200 primary school students, 30 students had an eye problem, 50 had a dental problem and 16 had both.

Eye problem A: P(A) = 30/200 = 0.15

Dental problem B: P(B) = 50/200 = 0.25

P(A and B) = 16/200 = 0.08

A and B are not mutually exclusive, so the probability that a randomly chosen student has an eye or dental problem is

P(A or B) = P(A) + P(B) - P(A and B)

= 0.15 + 0.25 - 0.08 = 0.32
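Examples 1 and 2 can be checked with Python's `fractions.Fraction`, which keeps every probability exact so that no rounding creeps into the sums. This is a minimal sketch of the two addition rules and the complement rule, using only the counts given in the examples.

```python
from fractions import Fraction

# Example 1: blood types are mutually exclusive, so probabilities add directly.
blood = {"O": 225, "A": 205, "B": 50, "AB": 20}
total = sum(blood.values())
p = {t: Fraction(c, total) for t, c in blood.items()}
p_a_or_b = p["A"] + p["B"]  # Addition Rule 1
p_not_o = 1 - p["O"]        # complementary event rule

# Example 2: eye and dental problems overlap, so subtract the joint probability.
n = 200
p_eye, p_dental, p_both = Fraction(30, n), Fraction(50, n), Fraction(16, n)
p_eye_or_dental = p_eye + p_dental - p_both  # Addition Rule 2

print(float(p_a_or_b), float(p_not_o), float(p_eye_or_dental))
```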

Example 3: A survey of smoking habit and lung cancer among 200 males.

Smoking habit     Lung cancer                Total
                  Have (B)    Not have (B´)
Smoke (A)         6           4               10
Nonsmoke (A´)     2           188             190
Total             8           192             200

Probability that a male smokes: P(A) = 10/200 = 0.05

Probability that a male has lung cancer: P(B) = 8/200 = 0.04

If smoking and lung cancer were independent, the probability of being a smoker with lung cancer would be P(A) · P(B) = 0.05 × 0.04 = 0.002.

In the survey, however, the observed probability that a male smokes and has lung cancer is

P(A and B) = 6/200 = 0.03

Since 0.03 ≠ 0.002, smoking and lung cancer are not independent in these data.

Probability of lung cancer among smokers:

P(B/A) = P(A and B)/P(A) = (6/200)/(10/200) = 6/10 = 0.6
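The calculations for Example 3 follow mechanically from the 2 × 2 table. In this sketch (exact fractions, counts taken from the table) the cell keys such as `("A", "B'")` are just our labels for the rows and columns.

```python
from fractions import Fraction

# 2x2 table from Example 3: rows = smoking (A / A'), columns = lung cancer (B / B')
table = {("A", "B"): 6, ("A", "B'"): 4, ("A'", "B"): 2, ("A'", "B'"): 188}
n = sum(table.values())

p_a = Fraction(table[("A", "B")] + table[("A", "B'")], n)  # P(A): row total / n
p_b = Fraction(table[("A", "B")] + table[("A'", "B")], n)  # P(B): column total / n
p_a_and_b = Fraction(table[("A", "B")], n)                 # P(A and B)
p_b_given_a = p_a_and_b / p_a                              # P(B/A) = P(A and B) / P(A)

# Independence would require P(A and B) == P(A) * P(B)
print(float(p_a * p_b), float(p_a_and_b), float(p_b_given_a))
```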

Probability Distribution

Probability: the ratio of the number of outcomes of interest (n) to the number of all possible outcomes (N):
$$P(E) = \frac{n}{N}$$

Sample space: all possible outcomes of the trial.

Random variable: a rule that assigns a number to each outcome in the sample space.

Let X be a random variable. For example, in a survey of families that have two children, let X be the number of boys in the family; then X = 0, 1, 2.

Types of random variable

- Discrete random variable: counts, frequencies
- Continuous random variable: age, height, weight, blood sugar, etc.

Probability distribution of a random variable

(1) Probability distribution of a discrete random variable

Let X be a discrete random variable with values $x_1, x_2, \ldots, x_n$. The probability of each value is $P(X = x_i)$, where $i = 1, 2, 3, \ldots, n$. We call $P(X = x_i)$ the probability function (probability mass function) of X.

The properties of the probability function

a. $P(x_i) \ge 0$ for all $x_i$

b. $\sum_{i=1}^{n} P(x_i) = 1$

(2) Probability distribution of a continuous random variable

If X is a continuous random variable with density p(x):

a. $p(x) \ge 0$ for all x

b. $\int_{-\infty}^{\infty} p(x)\,dx = 1$

The mean and the variance of a probability distribution

The mean of a discrete random variable:
$$\mu = E(X) = \sum_{\text{all } x} x_i\, p(x_i)$$

The variance of a discrete random variable:
$$\sigma^2 = V(X) = E\left[(X - E(X))^2\right] = E[X^2] - [E(X)]^2 = \sum_{\text{all } x} x_i^2\, p(x_i) - \mu^2$$

The mean of a continuous random variable:
$$\mu = E(X) = \int_{\text{all } x} x\, p(x)\,dx$$

The variance of a continuous random variable:
$$\sigma^2 = V(X) = \int_{\text{all } x} x^2\, p(x)\,dx - \mu^2$$
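The discrete formulas can be tried on the two-children example. The notes do not give the probabilities, so this sketch assumes (our assumption) that each child is a boy or a girl with probability 1/2, independently; it builds the sample space, derives the probability function of X = number of boys, and applies $E(X) = \sum x_i p(x_i)$ and $V(X) = \sum x_i^2 p(x_i) - \mu^2$.

```python
from fractions import Fraction
from itertools import product

# Assumption (not stated in the notes): each child is a boy (B) or girl (G)
# with probability 1/2, independently. X = number of boys in a two-child family.
outcomes = list(product("BG", repeat=2))  # sample space: BB, BG, GB, GG
pmf = {}
for outcome in outcomes:
    x = outcome.count("B")
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in pmf.items())                   # E(X) = sum x p(x)
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2  # E(X^2) - mu^2
print(sorted(pmf.items()), mean, variance)
```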

Probability Distribution
1. The Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four requirements:

1) Each trial can have only two outcomes, which can be considered either success (the outcome of interest) or failure.
2) There must be a fixed number of trials.
3) The outcomes of the trials must be independent of each other.
4) The probability of success must remain the same for each trial.

The probability of x successes is
$$P(x) = {}_nC_x\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

n = the number of trials
x = the number of successes
p = the probability of success on each trial
q = the probability of failure on each trial; q = 1 - p

The probability that x is between a and b:
$$P(a \le x \le b) = \sum_{x=a}^{b} {}_nC_x\, p^x q^{n-x}$$

The mean: $E(X) = \mu = np$

The variance: $V(X) = \sigma^2 = npq$

Example 2: In a clinical trial of a new treatment, the probability that each treated patient has a good result is 0.4. If 15 patients are treated, find the probability that:

a. 10 patients have a good result   b. 3 to 8 patients have a good result

c. more than 4 patients have a good result

x = the number of patients who have a good result

p = 0.4; q = 1 - p = 0.6

Binomial probability: $P(x) = {}_nC_x\, p^x q^{n-x}$; x = 0, 1, 2, 3, …, n

$P(x) = {}_{15}C_x (0.4)^x (0.6)^{15-x}$; x = 0, 1, 2, 3, …, 15

The probabilities:

a. 10 patients have a good result
$$P(X = 10) = {}_{15}C_{10} (0.4)^{10} (0.6)^{5} = 0.0245$$

b. 3 to 8 patients have a good result
$$P(3 \le x \le 8) = \sum_{x=3}^{8} {}_{15}C_x (0.4)^x (0.6)^{15-x}$$

= .0634 + .1268 + .1859 + .2066 + .1771 + .1181

= 0.8779

c. more than 4 patients have a good result
$$P(x > 4) = P(x \ge 5) = 1 - P(x \le 4) = 1 - \sum_{x=0}^{4} {}_{15}C_x (0.4)^x (0.6)^{15-x}$$

= 1 - [.0005 + .0047 + .0219 + .0634 + .1268]

= 1 - 0.2173 = 0.7827
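A minimal sketch of the binomial formulas, using `math.comb` for ${}_nC_x$; the helper names `binom_pmf` and `binom_range` are our own. It reproduces parts a to c of the example and the mean and variance formulas.

```python
from math import comb

def binom_pmf(x, n, p):
    # P(x) = nCx * p^x * q^(n-x), with q = 1 - p
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_range(a, b, n, p):
    # P(a <= X <= b): sum the pmf over x = a..b
    return sum(binom_pmf(x, n, p) for x in range(a, b + 1))

n, p = 15, 0.4
part_a = binom_pmf(10, n, p)          # P(X = 10)
part_b = binom_range(3, 8, n, p)      # P(3 <= X <= 8)
part_c = 1 - binom_range(0, 4, n, p)  # P(X > 4) = 1 - P(X <= 4)
print(round(part_a, 4), round(part_b, 4), round(part_c, 4))
print(round(n * p, 4), round(n * p * (1 - p), 4))  # mean np, variance npq
```

Summing unrounded terms gives 0.8778 for part b; the 0.8779 in the worked example comes from adding terms that were already rounded to four decimals.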

2. The Poisson Distribution

The Poisson distribution is the probability distribution of a discrete random variable whose number of possible outcomes has no fixed upper limit. It gives the probability of x occurrences in an interval of time, volume, area, etc. Examples:

- The number of car accidents on a highway per day
- The number of bacteria per bottle of water

The Poisson probability is
$$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

λ = the average number of occurrences per unit (area, time, volume, etc.)

x = the number of occurrences

e ≈ 2.71828

The mean: $E(X) = \mu = \lambda$

The variance: $V(X) = \sigma^2 = \lambda$

The probability that x is between a and b:
$$P(a \le x \le b) = \sum_{x=a}^{b} \frac{e^{-\lambda} \lambda^x}{x!}$$

Example 3: In the emergency unit of a hospital, an average of 3 emergency cases are admitted per day. Find the probability that:

a. no case is admitted   b. at most 3 cases are admitted

c. more than 3 cases are admitted   d. between 2 and 4 cases are admitted

x = the number of admitted cases

Poisson probability: $P(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$; x = 0, 1, 2, 3, …

An average of 3 emergency cases are admitted per day, so λ = 3:
$$P(x) = \frac{e^{-3}\, 3^x}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

The probabilities:

a. no case is admitted

P(X = 0) = 0.0498

b. at most 3 cases are admitted

P(X ≤ 3) = .0498 + .1494 + .2240 + .2240 = 0.6472

c. more than 3 cases are admitted

P(X > 3) = 1 - P(X ≤ 3) = 1 - .6472 = 0.3528

d. between 2 and 4 cases are admitted

P(2 ≤ X ≤ 4) = .2240 + .2240 + .1680 ≈ 0.616
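The Poisson calculations can be sketched directly from the formula with `math.exp` and `math.factorial`; the helper names are our own. This reproduces parts a to d with λ = 3.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

def poisson_range(a, b, lam):
    # P(a <= X <= b)
    return sum(poisson_pmf(x, lam) for x in range(a, b + 1))

lam = 3  # average admissions per day
p_none = poisson_pmf(0, lam)
p_at_most_3 = poisson_range(0, 3, lam)
p_more_than_3 = 1 - p_at_most_3  # complement rule
p_2_to_4 = poisson_range(2, 4, lam)
print([round(v, 4) for v in (p_none, p_at_most_3, p_more_than_3, p_2_to_4)])
```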

The Normal Distribution

The normal distribution is the distribution of a continuous random variable.

If the data $x_i$ are normally distributed with mean μ and standard deviation σ, the probability density function of x is
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$

where μ, σ, π and e are constants.

Any data value $x_i$ can be transformed to a standard score $Z_i$:
$$Z_i = \frac{x_i - \mu}{\sigma}$$

The mean of the standard scores is $\mu_z = 0$.

The standard deviation of the standard scores is $\sigma_z = 1$.

Example 4: The weights of normal persons are normally distributed with mean 50 kg and standard deviation 10 kg. What percentage of all normal persons weigh between 45 and 65 kg, and what percentage weigh more than 60 kg?

μ = 50, σ = 10
$$Z_i = \frac{x_i - \mu}{\sigma}$$

The probability that a person weighs between 45 and 65 kg:
$$P(45 < x < 65) = P\left(\frac{45 - 50}{10} < Z < \frac{65 - 50}{10}\right)$$

= P(-0.5 < Z < 1.5)

= 1 - 0.3085 - 0.0668   (since P(Z < -0.5) = 0.3085 and P(Z > 1.5) = 0.0668)

= 0.6247

So 62.47% of normal persons weigh between 45 and 65 kg.

The probability that a person weighs more than 60 kg:
$$P(x > 60) = P\left(Z > \frac{60 - 50}{10}\right) = P(Z > 1) = 0.1587$$

So 15.87% of normal persons weigh more than 60 kg.
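Python's standard `statistics.NormalDist` (Python 3.8+) can stand in for the Z table. This sketch computes Example 4 both directly on the kg scale and through standard scores, which should agree.

```python
from statistics import NormalDist

weight = NormalDist(mu=50, sigma=10)  # Example 4: weights in kg
p_45_65 = weight.cdf(65) - weight.cdf(45)
p_over_60 = 1 - weight.cdf(60)

# The same answer via standard scores Z = (x - mu) / sigma
z = NormalDist()  # standard normal: mean 0, SD 1
assert abs(p_45_65 - (z.cdf(1.5) - z.cdf(-0.5))) < 1e-12
print(round(p_45_65, 4), round(p_over_60, 4))
```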

Sampling distribution
When we draw samples of size n from a population, the statistic of interest is:

- Quantitative data: the sample mean ($\bar{X}$)
- Qualitative data: the sample proportion ($\hat{p}$)

1. Sampling distribution of the sample mean ($\bar{X}$)

Suppose we draw samples of n cases of quantitative data from a population with mean μ and standard deviation σ. If the population is normal (or, by the central limit theorem, if n is large), the sample means $\bar{X}$ are approximately normally distributed. The average of all possible sample means equals the population mean and, when sampling with replacement or from a large population, the variance of the sample means equals the population variance divided by the sample size:
$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$

1. Sampling with replacement

The mean of the sample means: $\mu_{\bar{x}} = \mu$

The variance of the sample means: $\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n}$

2. Sampling without replacement

The mean of the sample means: $\mu_{\bar{x}} = \mu$

The variance of the sample means: $\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n}\left(\dfrac{N - n}{N - 1}\right)$

When the population size N is large, $\dfrac{N - n}{N - 1} \approx 1$, so $\sigma^2_{\bar{x}} \approx \dfrac{\sigma^2}{n}$.
Example: A population of size 5 (N = 5) has the data 6, 8, 10, 12, 14.

The mean: μ = 10

The variance: σ² = 8

If we sample 2 cases per group (n = 2) with replacement, there are 5 × 5 = 25 possible samples. Each cell shows the sample and, in parentheses, its sample mean $\bar{X}$:

First draw    Second draw
              6           8           10           12           14
6             6,6 (6)     6,8 (7)     6,10 (8)     6,12 (9)     6,14 (10)
8             8,6 (7)     8,8 (8)     8,10 (9)     8,12 (10)    8,14 (11)
10            10,6 (8)    10,8 (9)    10,10 (10)   10,12 (11)   10,14 (12)
12            12,6 (9)    12,8 (10)   12,10 (11)   12,12 (12)   12,14 (13)
14            14,6 (10)   14,8 (11)   14,10 (12)   14,12 (13)   14,14 (14)

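Distribution of the sample means ($\bar{X}$):

Sample mean    Frequency (f)
6              1
7              2
8              3
9              4
10             5
11             4
12             3
13             2
14             1
Total          25

The distribution of the 25 sample means is symmetric about the population mean.

The mean of the sample means: $\mu_{\bar{x}} = 10$

The variance of the sample means: $\sigma^2_{\bar{x}} = 4$

$\mu_{\bar{x}} = 10 = \mu$

$\sigma^2_{\bar{x}} = 4 = \dfrac{8}{2} = \dfrac{\sigma^2}{n}$

So $\mu_{\bar{x}} = \mu$, $\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n}$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.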

If we sample 2 cases per group (n = 2) without replacement, there are 5 × 4 = 20 possible ordered samples (values in parentheses are the sample means $\bar{X}$):

First draw    Second draw
              6           8           10           12           14
6             -           6,8 (7)     6,10 (8)     6,12 (9)     6,14 (10)
8             8,6 (7)     -           8,10 (9)     8,12 (10)    8,14 (11)
10            10,6 (8)    10,8 (9)    -            10,12 (11)   10,14 (12)
12            12,6 (9)    12,8 (10)   12,10 (11)   -            12,14 (13)
14            14,6 (10)   14,8 (11)   14,10 (12)   14,12 (13)   -

The mean of the sample means: $\mu_{\bar{x}} = 10$

The variance of the sample means: $\sigma^2_{\bar{x}} = 3$

$\mu_{\bar{x}} = 10 = \mu$

$$\sigma^2_{\bar{x}} = 3 = \frac{8}{2}\left(\frac{5 - 2}{5 - 1}\right) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$

If N is large, $\dfrac{N - n}{N - 1} \approx 1$, so $\mu_{\bar{x}} = \mu$, $\sigma^2_{\bar{x}} \approx \dfrac{\sigma^2}{n}$ and $\sigma_{\bar{x}} \approx \dfrac{\sigma}{\sqrt{n}}$.
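Both enumeration tables can be generated rather than written out by hand. This sketch uses `itertools.product` for ordered samples with replacement and `itertools.permutations` for ordered samples without replacement, then compares the mean and variance of the sample means with $\sigma^2/n$ and with the finite population correction.

```python
from itertools import permutations, product
from statistics import mean, pvariance

population = [6, 8, 10, 12, 14]
N, n = len(population), 2
sigma2 = pvariance(population)  # population variance (divides by N)

# With replacement: all 5 x 5 = 25 ordered samples
with_repl = [mean(s) for s in product(population, repeat=n)]
# Without replacement: all 5 x 4 = 20 ordered samples of distinct units
without_repl = [mean(s) for s in permutations(population, n)]

print(len(with_repl), mean(with_repl), pvariance(with_repl), sigma2 / n)
print(len(without_repl), mean(without_repl), pvariance(without_repl),
      sigma2 / n * (N - n) / (N - 1))  # finite population correction
```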

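For sampling with replacement, or without replacement from a large population, we have
$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

When $\bar{X}$ is normally distributed:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad (\sigma^2 \text{ known})$$
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \quad df = n - 1 \quad (\sigma^2 \text{ unknown; population assumed normal})$$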

Example 5: The average height of the students in a school is 158 cm, with standard deviation 15 cm. If 100 students from this school are sampled, what is the probability that the sample mean height is between 155 and 160 cm?
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad (\sigma^2 \text{ known})$$

The probability that the sample mean is between 155 and 160 cm is
$$P(155 \le \bar{x} \le 160) = P\left(\frac{155 - 158}{15/\sqrt{100}} \le Z \le \frac{160 - 158}{15/\sqrt{100}}\right)$$

= P(-2.00 ≤ Z ≤ 1.33)

= 1 - .0228 - .0918

= 0.8854

Example 6: The average birthweight in a rural area is 2500 g. After a new public health program, a sample of 25 live births has a birthweight standard deviation of 1000 g. What is the probability that the sample mean birthweight is greater than 3000 g?
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \quad df = n - 1$$

The probability that the sample mean birthweight is greater than 3000 g:
$$P(\bar{x} > 3000) = P\left(t > \frac{3000 - 2500}{1000/\sqrt{25}}\right)$$

= P(t > 2.5)

≈ 0.01 (df = 24)
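Example 5 can be sketched with `statistics.NormalDist` applied to the sampling distribution of $\bar{x}$, which has standard error σ/√n. The sketch computes the probability both with the exact z-scores and with z rounded to two decimals, as a Z table requires. Example 6 is not included: the standard library has no t distribution, so it still needs a t table or a dedicated statistics package.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 158, 15, 100
se = sigma / sqrt(n)            # standard error of the mean = 1.5
xbar_dist = NormalDist(mu, se)  # sampling distribution of x̄
p_exact = xbar_dist.cdf(160) - xbar_dist.cdf(155)

# Table-style calculation: round z to two decimals first, as in the notes
z = NormalDist()
z_lo, z_hi = round((155 - mu) / se, 2), round((160 - mu) / se, 2)
p_table = z.cdf(z_hi) - z.cdf(z_lo)
print(z_lo, z_hi, round(p_exact, 4), round(p_table, 4))
```

Rounding z to 1.33 explains why the table answer (about 0.885) is slightly below the exact value (about 0.886).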

2. Sampling distribution of the sample proportion

When samples of size n are drawn from a population with population proportion P:

The average of all possible sample proportions: $\mu_{\hat{p}} = P$

The standard deviation of all possible sample proportions:
$$\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$$

For large n, $\hat{p}$ is approximately normal, and
$$Z = \frac{\hat{p} - P}{\sqrt{P(1 - P)/n}}$$

Example 7: Among preschool children, the proportion who have a dental health problem is 15%. This year, among 200 new preschool students, what is the probability that 40 or more students have a dental health problem? And what is the probability that the proportion with a dental health problem is not more than 12%?

P = 15% = 0.15

The probability that 40 or more students have a dental health problem:

n = 200, $\hat{p} = 40/200 = 0.2$
$$Z = \frac{\hat{p} - P}{\sqrt{P(1 - P)/n}} = \frac{0.2 - 0.15}{\sqrt{(0.15)(0.85)/200}} = 1.98$$

$P(\hat{p} \ge 0.2) = P(Z \ge 1.98) = 0.0239$

The probability that the proportion with a dental health problem is not more than 12%:

$\hat{p} = 0.12$
$$Z = \frac{0.12 - 0.15}{\sqrt{(0.15)(0.85)/200}} = -1.19$$

$P(\hat{p} \le 0.12) = P(Z \le -1.19) = 0.117$
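Example 7 uses the normal approximation to the sampling distribution of $\hat{p}$ without a continuity correction. This sketch follows the same approach, with `statistics.NormalDist` in place of the Z table.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.15, 200
se = sqrt(P * (1 - P) / n)  # standard deviation of p-hat
z = NormalDist()            # standard normal

z_high = (40 / n - P) / se  # p-hat = 0.20
z_low = (0.12 - P) / se     # p-hat = 0.12
p_at_least_40 = 1 - z.cdf(z_high)  # P(p-hat >= 0.20)
p_at_most_12 = z.cdf(z_low)        # P(p-hat <= 0.12)
print(round(z_high, 2), round(p_at_least_40, 4), round(z_low, 2), round(p_at_most_12, 4))
```

Small differences from the worked answers in the last decimal come from the table values being looked up at z rounded to two decimals.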

Estimation of parameters
Types of estimation: point estimation and interval estimation.

1. Estimating the population mean (μ)

The (1 - α)100% confidence interval is
$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known})$$
$$\bar{x} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}, \quad df = n - 1 \quad (\sigma \text{ unknown})$$

2. Estimating the population variance (σ²)

The (1 - α)100% confidence interval is
$$\frac{(n - 1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}}$$

Example 8: In a sample of 15 normal persons, the enzyme level was measured, giving $\bar{x} = 20$; the population variance is $\sigma^2 = 40$. Estimate the mean enzyme level of normal persons with a 95% confidence interval.

n = 15, $\bar{x} = 20$, $\sigma^2 = 40$, $\sigma = 6.32$

At α = 0.05, $Z_{.025} = 1.96$

Estimate the mean enzyme level of normal persons with a 95% confidence interval:
$$\mu = \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 20 \pm \frac{1.96(6.32)}{\sqrt{15}} = 20 \pm 3.20$$

16.80 < μ < 23.20 at 95% CI.

Example 9: In a study of the serum levels of 16 infants, the mean was 5.96 mg% and the standard deviation 3.5 mg%. Estimate the 95% confidence interval of the mean serum level.
$$\bar{x} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}, \quad df = n - 1 \quad (\sigma \text{ unknown})$$

n = 16, $\bar{x} = 5.96$, S = 3.5

For a 95% CI of μ: α = 0.05, $t_{\alpha/2} = 2.131$, df = 15

Estimate μ at 95% CI:
$$5.96 \pm 2.131\,\frac{3.5}{\sqrt{16}} = 5.96 \pm 1.86$$

4.10 < μ < 7.82 at 95% CI.

Estimate σ² at 95% confidence interval:
$$\frac{(n - 1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}}$$

n = 16, S = 3.5, S² = 12.25, α = 0.05, df = 15, $\chi^2_{.025} = 27.5$, $\chi^2_{.975} = 6.26$
$$\frac{(16 - 1)(12.25)}{27.5} < \sigma^2 < \frac{(16 - 1)(12.25)}{6.26}$$

6.6818 < σ² < 29.3530

or 2.585 < σ < 5.418
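The interval formulas for Examples 8 and 9 can be sketched as small functions; the names `z_interval`, `t_interval` and `variance_interval` are hypothetical. $Z_{\alpha/2}$ comes from `statistics.NormalDist`, but the standard library has no t or χ² quantiles, so this sketch passes in the table values quoted in the notes (2.131, 27.5 and 6.26).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    # x̄ ± Z_{α/2} σ/√n (σ known); Z_{α/2} from the standard normal
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

def t_interval(xbar, s, n, t_crit):
    # x̄ ± t_{α/2} S/√n; t_crit must come from a t table (df = n - 1)
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

def variance_interval(s, n, chi2_upper, chi2_lower):
    # (n-1)S²/χ²_{α/2} < σ² < (n-1)S²/χ²_{1-α/2}; critical values from a χ² table
    ss = (n - 1) * s**2
    return ss / chi2_upper, ss / chi2_lower

ex8 = z_interval(20, sqrt(40), 15)                  # Example 8
ex9_mean = t_interval(5.96, 3.5, 16, t_crit=2.131)  # Example 9, df = 15
ex9_var = variance_interval(3.5, 16, chi2_upper=27.5, chi2_lower=6.26)
ex9_sd = tuple(sqrt(v) for v in ex9_var)            # interval for σ
print([tuple(round(v, 2) for v in ci) for ci in (ex8, ex9_mean, ex9_var, ex9_sd)])
```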

3. Estimating the population proportion (P)

The (1 - α)100% confidence interval for P is
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Example 10: In a trial of a new treatment in 200 patients, 160 patients got well within 3 days. Estimate, with a 95% confidence interval, the proportion of patients who get well within 3 days after receiving the new treatment.

n = 200; $\hat{p} = 160/200 = 0.8$

Estimate P: $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$

At 95% confidence: α = 0.05, $Z_{\alpha/2} = Z_{.025} = 1.96$
$$P = 0.8 \pm 1.96\sqrt{\frac{0.8(0.2)}{200}} = 0.80 \pm 0.055$$

= 0.745, 0.855

0.745 < P < 0.855

or 74.5% < P < 85.5%
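The same normal-approximation interval for a proportion can be sketched as one function; `proportion_interval` is a hypothetical name, and $Z_{\alpha/2}$ is taken from `statistics.NormalDist` instead of the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf=0.95):
    # Normal-approximation interval: p̂ ± Z_{α/2} √(p̂(1 − p̂)/n)
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

low, high = proportion_interval(160, 200)  # Example 10
print(f"{low:.3f} < P < {high:.3f}")
```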

Hypothesis Testing

A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject statistical hypotheses.

Statistical Hypotheses

The best way to determine whether a statistical hypothesis is true would be to examine the entire population. Since that is often impractical, researchers typically examine a random sample from the population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.

There are two types of statistical hypotheses.

- Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.

- Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.

For example, suppose we want to determine whether the proportion of blood type O among cancer patients is 50%. Suppose we sample 200 cancer patients and check their blood type.

H0: P = 0.5
Ha: P ≠ 0.5

Can We Accept the Null Hypothesis?

Some researchers say that a hypothesis test can have one of two outcomes: you
accept the null hypothesis or you reject the null hypothesis. Many statisticians,
however, take issue with the notion of "accepting the null hypothesis." Instead,
they say: you reject the null hypothesis or you fail to reject the null hypothesis.

Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true. Failure to reject implies that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps.

- State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.

- Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses on a single test statistic.

- Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.

- Interpret results. Apply the decision rule described in the analysis plan. If the value of the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.

Decision Errors

Two types of errors can result from a hypothesis test.

- Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.

- Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error is called the power of the test.

Statistical decision          Null hypothesis H0 true     Null hypothesis H0 not true
Accept / fail to reject H0    No error (1 - α)            Type II error (β error)
Reject H0                     Type I error (α error)      No error (1 - β)

Decision Rules

The analysis plan includes decision rules for rejecting the null hypothesis. In practice, statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a region of acceptance.

- P-value. The strength of evidence against the null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.

- Region of acceptance. The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.

The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.

These approaches are equivalent. Some statistics texts use the P-value approach; others use the region of acceptance approach. In subsequent lessons, this tutorial will present examples that illustrate each approach.
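The equivalence of the two decision rules can be checked numerically for a two-tailed z-test. This sketch assumes the test statistic is standard normal under H0; the helper names and the test-statistic values are hypothetical. For each value, "P-value < α" and "statistic in the region of rejection" give the same decision.

```python
from statistics import NormalDist

Z = NormalDist()  # sampling distribution of the test statistic under H0

def two_tailed_p_value(z_stat):
    # Probability of a statistic at least as extreme as z_stat, in either tail
    return 2 * (1 - Z.cdf(abs(z_stat)))

def in_rejection_region(z_stat, alpha):
    # Reject when |z| exceeds the critical value Z_{α/2}
    return abs(z_stat) > Z.inv_cdf(1 - alpha / 2)

alpha = 0.05
for z_stat in (-2.5, -1.0, 0.3, 1.5, 2.1):  # hypothetical test statistics
    p = two_tailed_p_value(z_stat)
    print(z_stat, round(p, 4), p < alpha, in_rejection_region(z_stat, alpha))
```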

One-Tailed and Two-Tailed Tests

A test of a statistical hypothesis, where the region of rejection is on only one side
of the sampling distribution, is called a one-tailed test.

For example, suppose the null hypothesis states that mean blood sugar is
less than or equal to 100. The alternative hypothesis would be that mean
blood sugar is greater than 100. The region of rejection would consist of a range of
values located on the right side of the sampling distribution; that is, a set of values
sufficiently greater than 100 (beyond a critical value).

H0: μ ≤ 100
Ha: μ > 100

A test of a statistical hypothesis, where the region of rejection is on both sides of
the sampling distribution, is called a two-tailed test.

For example, suppose the null hypothesis states that mean blood sugar is
equal to 110. The alternative hypothesis would be that mean blood sugar is
less than 110 or greater than 110. The region of rejection would consist of a range
of values located on both sides of the sampling distribution; that is, the region of
rejection would consist partly of values sufficiently less than 110 and partly of
values sufficiently greater than 110.

H0: μ = 110
Ha: μ ≠ 110
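The difference between the two kinds of test can be sketched with a z statistic. This assumes a standardized test statistic z that is standard normal under H0 (the value z = 1.8 is hypothetical, not taken from blood sugar data). A one-tailed test puts all of α in one tail, so its critical value is smaller, and its P-value is half the two-tailed P-value when z lies in the hypothesized direction.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value(z, alternative):
    """P-value of a z statistic for the given alternative hypothesis."""
    if alternative == "greater":      # Ha: mu > mu0, rejection region in the right tail
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":         # Ha: mu < mu0, rejection region in the left tail
        return STD_NORMAL.cdf(z)
    if alternative == "two-sided":    # Ha: mu != mu0, rejection region in both tails
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative}")

alpha = 0.05
print("one-tailed critical value:", round(STD_NORMAL.inv_cdf(1 - alpha), 3))        # 1.645
print("two-tailed critical values: ±", round(STD_NORMAL.inv_cdf(1 - alpha / 2), 3))  # 1.96

z = 1.8
print("one-tailed p:", p_value(z, "greater"))    # about 0.036: reject at alpha = 0.05
print("two-tailed p:", p_value(z, "two-sided"))  # about 0.072: fail to reject
```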

Power of a Hypothesis Test

The probability of not committing a Type II error is called the power of a
hypothesis test; that is, power = 1 − β.

Effect Size

To compute the power of the test, one must posit an alternative "true"
value of the population parameter, assuming that the null hypothesis is false.
The effect size is the difference between the true value and the value specified in
the null hypothesis.

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to
100. A researcher might ask: What is the probability of rejecting the null
hypothesis if the true population mean is equal to 90? In this example, the effect
size would be 90 - 100, which equals -10.
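The power for this example can be computed once a few more details are fixed. The sketch below assumes a two-tailed z-test on a normal population with a known standard deviation σ = 15, a sample size of n = 25 and α = 0.05; these values are hypothetical, since the example above does not state them. Under the true mean, the z statistic is normal with mean equal to the effect size divided by the standard error, and the power is the probability that it lands in either rejection tail.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_tailed_z(mu0, mu_true, sigma, n, alpha):
    """Power of a two-tailed z-test of H0: mu = mu0 when the true mean is mu_true.

    Assumes a normal population with known standard deviation sigma.
    """
    effect_size = mu_true - mu0                   # True value - Hypothesized value
    shift = effect_size / (sigma / n ** 0.5)      # effect size in standard-error units
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)    # boundary of the region of acceptance
    # Under the true mean, z ~ N(shift, 1); power = P(z falls in either rejection tail)
    return STD_NORMAL.cdf(-z_crit - shift) + (1 - STD_NORMAL.cdf(z_crit - shift))

power = power_two_tailed_z(mu0=100, mu_true=90, sigma=15, n=25, alpha=0.05)
print(f"effect size = {90 - 100}, power = {power:.3f}, beta = {1 - power:.3f}")
```

With these assumed values the power comes out at roughly 0.92. Setting mu_true equal to mu0 returns α itself, since then every rejection is a Type I error.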

Factors That Affect Power

The power of a hypothesis test is affected by three factors.

• Sample size (n). Other things being equal, the greater the sample size, the
greater the power of the test.

• Significance level (α). The higher the significance level, the higher the
power of the test. If you increase the significance level, you reduce
the region of acceptance. As a result, you are more likely to reject the null
hypothesis. This means you are less likely to accept the null hypothesis
when it is false; i.e., less likely to make a Type II error. Hence, the power of
the test is increased.

• The "true" value of the parameter being tested. The greater the difference
between the "true" value of a parameter and the value specified in the null
hypothesis, the greater the power of the test. That is, the larger the effect
size in absolute value, the greater the power of the test.
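The three factors can be illustrated by varying one at a time. This sketch reuses the same two-tailed z-test power formula under the same assumptions (normal population, known σ), with a hypothetical baseline of H0: μ = 100, true μ = 95, σ = 15, n = 25 and α = 0.05. Each list of powers should increase from left to right.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_tailed_z(mu0, mu_true, sigma, n, alpha):
    """Power of a two-tailed z-test of H0: mu = mu0 (sigma assumed known)."""
    shift = (mu_true - mu0) / (sigma / n ** 0.5)   # effect size in standard-error units
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(-z_crit - shift) + (1 - STD_NORMAL.cdf(z_crit - shift))

# Hypothetical baseline: H0: mu = 100, true mu = 95, sigma = 15, n = 25, alpha = 0.05
base = {"mu0": 100, "mu_true": 95, "sigma": 15, "n": 25, "alpha": 0.05}

# Vary one factor at a time, holding the others at the baseline
power_by_n = [power_two_tailed_z(**{**base, "n": n}) for n in (10, 25, 50, 100)]
power_by_alpha = [power_two_tailed_z(**{**base, "alpha": a}) for a in (0.01, 0.05, 0.10)]
power_by_effect = [power_two_tailed_z(**{**base, "mu_true": m}) for m in (98, 95, 90)]

print("n = 10, 25, 50, 100:     ", [round(p, 3) for p in power_by_n])
print("alpha = 0.01, 0.05, 0.10:", [round(p, 3) for p in power_by_alpha])
print("effect = -2, -5, -10:    ", [round(p, 3) for p in power_by_effect])
```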

How to Conduct Hypothesis Tests

All hypothesis tests follow the same general procedure. The researcher states the
hypotheses to be tested, formulates an analysis plan, analyzes sample data
according to the plan, and accepts (fails to reject) or rejects the null hypothesis based on
the results of the analysis.

• State the hypotheses. Every hypothesis test requires the analyst to state
a null hypothesis and an alternative hypothesis. The hypotheses are stated
in such a way that they are mutually exclusive. That is, if one is true, the
other must be false, and vice versa.

• Formulate an analysis plan. The analysis plan describes how to use sample
data to accept or reject the null hypothesis. It should specify the following
elements.

  - Significance level. Often, researchers choose significance levels equal
to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.

  - Test method. Typically, the test method involves a test statistic and
a sampling distribution. Computed from sample data, the test statistic might be a
mean score, proportion, difference between means, difference between
proportions, z-score, t-score, chi-square, etc. Given a test statistic and its sampling
distribution, a researcher can assess probabilities associated with the test statistic.
If the P-value associated with the test statistic is less than the significance level,
the null hypothesis is rejected.

• Analyze sample data. Using sample data, perform the computations called for
in the analysis plan.

  - Test statistic. When the null hypothesis involves a mean or proportion,
use either of the following equations to compute the test statistic.

Test statistic = (Statistic - Parameter) / (Standard deviation of statistic)


Test statistic = (Statistic - Parameter) / (Standard error of statistic)

where Parameter is the value appearing in the null hypothesis, and Statistic is
the point estimate of Parameter. As part of the analysis, you may need to compute
the standard deviation or standard error of the statistic. Previously, we presented
common formulas for the standard deviation and standard error.
When the parameter in the null hypothesis involves categorical data, you may use a
chi-square statistic as the test statistic. Instructions for computing a chi-square test
statistic are presented in the lesson on the chi-square goodness of fit test.

  - P-value. The P-value is the probability of observing a test statistic at least as
extreme as the one computed from the sample, assuming the null hypothesis is true.

• Interpret the results. If the sample findings are unlikely, given the null
hypothesis, the researcher rejects the null hypothesis. Typically, this involves
comparing the P-value to the significance level, and rejecting the null hypothesis
when the P-value is less than the significance level.

Steps for hypothesis testing

1. State the hypotheses H0 and H1.

2. Select the test statistic.

3. Compute the test statistic from the sample data and obtain the p-value.

4. Set the significance level (error probability) α. In practice, α is chosen before the data are examined, as part of the analysis plan.

5. Draw the conclusion:

If p-value ≥ α: fail to reject H0.

If p-value < α: reject H0.
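The five steps can be strung together for a one-sample test of a mean. This is a minimal sketch, assuming a normal population with a known standard deviation σ = 10 so that a z-test applies; the blood sugar readings, σ and the hypothesized mean of 110 are hypothetical. α is passed in as a fixed argument, reflecting that it is chosen before the data are analyzed.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha):
    """Follow the five steps for H0: mu = mu0 vs H1: mu != mu0 (sigma assumed known)."""
    # Step 1 (stated by the caller): H0: mu = mu0, H1: mu != mu0
    # Step 2: test statistic z = (x-bar - mu0) / (sigma / sqrt(n))
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # Step 3: two-tailed p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Steps 4-5: compare with the pre-chosen alpha and draw the conclusion
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

readings = [118, 112, 125, 109, 121, 116, 119, 114, 123, 117]  # hypothetical blood sugar values
z, p, decision = one_sample_z_test(readings, mu0=110, sigma=10, alpha=0.05)
print(f"z = {z:.2f}, p-value = {p:.4f}: {decision}")
```

If σ were unknown, the usual choice would be a t-test with the sample standard deviation instead; the overall sequence of steps stays the same.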
