
Introduction to Statistical Inference

By

Erastus K Njeru
Introduction
• Foundation for statistical inference (for
means and proportions)
– Population distribution curve
– Probability & Probability distributions
– Normal distribution
– Sampling distribution of the mean
– Decision errors
From Sample to Population
• Recall: histogram for continuous data
– When constructing a histogram, we may use as many bars as we like without distorting the picture
– If the number of observations is infinite, we can have an infinite number of bars, and the histogram becomes a smooth curve
– This curve is called the population distribution curve
• If the curve is symmetric and bell-shaped, the population follows a normal distribution
Population Distribution Curve and Histogram
[Figure: histogram of a continuous variable with the smooth population distribution curve superimposed]
Normal Distribution
Characteristics of the Normal Curve
• The curve is bell-shaped and symmetrical.
• The mean, median, and mode are all equal.
• The highest frequency is in the middle of the curve.
• The frequency gradually tapers off as the scores approach the ends of the curve.
• The curve approaches, but never meets, the abscissa at both high and low ends (it extends from -∞ to +∞).
Example Normal Curves (Heights: males vs females)
• Women: µ = 63.6 in, σ = 2.5 in
• Men: µ = 69.0 in, σ = 2.8 in
[Figure: two overlapping normal curves centred at 63.6 and 69.0 on a height (inches) axis]
Normal Curve
• Mean ± 1 SD limits include 68.27% of observations
• Mean ± 1.96 SD limits include 95%
• Mean ± 2 SD limits include 95.45%
• Mean ± 2.58 SD limits include 99%
• Mean ± 3 SD limits include 99.73%
Normal Curve
Standardization
• For every population with mean μ and variance σ² there is a corresponding normal distribution
• To compare different populations, we define the standard normal distribution:

    Zi = (xi - μ) / σ

i.e. for every observation xi, subtract the population mean and divide by the standard deviation. The resulting Zi value is known as the z-score or standard normal deviate.
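A minimal sketch of this standardization in Python; the observation values, mean, and SD below are illustrative, not from the slides:

# Standardize each observation: z_i = (x_i - mu) / sigma
def z_score(x, mu, sigma):
    return (x - mu) / sigma

observations = [150, 155, 160, 165, 170]   # illustrative heights (cm)
mu, sigma = 160, 5                         # assumed population mean and SD
z_scores = [z_score(x, mu, sigma) for x in observations]
print(z_scores)                            # [-2.0, -1.0, 0.0, 1.0, 2.0]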
Standardizing the Normal Distribution

    Z = (X - μ) / σ

[Figure: a normal distribution with mean μ and standard deviation σ is transformed into the standardized normal distribution with mean 0 and standard deviation 1]
Standard Scores
To convert any value x to a z-score:

    z = (Value - Mean) / Standard deviation = (x - μ) / σ

A z-score measures the number of standard deviations that a value falls from the mean.
Standard Normal Distribution
[Figure: the standard normal curve with µ = 0 and σ = 1, showing the x and z axes]
Z-Score
• If an observed height is 165 cm, the mean is 160 cm and the SD is 5 cm, then Z = +1
• The proportion of observations above +1 SD is about 16% (as per the normal curve)
• Thus the probability of a height above 165 cm is about 0.16
• Z is tabulated in textbooks as the "Table of the Unit Normal Distribution" (normal probability integral)
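The same probability can be checked in software instead of the printed table; a short sketch using the numbers on this slide, assuming scipy is available:

from scipy.stats import norm

# Height example from the slide: x = 165 cm, mu = 160 cm, SD = 5 cm
z = (165 - 160) / 5           # z = +1
p_above = 1 - norm.cdf(z)     # P(Z > 1)
print(round(p_above, 4))      # 0.1587, i.e. about 16%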
The Empirical Rule
Standard Normal Distribution: µ = 0 and  = 1
• 68% of data are within 1 standard deviation of the mean
• 95% within 2 standard deviations
• 99.7% within 3 standard deviations
[Figure: standard normal curve with areas marked: 34% on each side within 1 SD, 13.5% between 1 and 2 SD, 2.4% between 2 and 3 SD, and 0.1% beyond 3 SD on each side; axis labelled from x - 3s to x + 3s]
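As a quick cross-check of the percentages quoted on the Normal Curve slide, a short sketch (assuming scipy is available) computes the area within k standard deviations of the mean:

from scipy.stats import norm

# P(-k < Z < k) for the multiples of the SD quoted earlier
for k in (1, 1.96, 2, 2.58, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(k, round(area * 100, 2))   # 68.27, 95.0, 95.45, 99.01, 99.73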
Standard Scores (z-score)
Once the mean (µ) and SD (σ) have been specified, finding probabilities for a normal random variable is a simple process:
convert the endpoints of the interval of interest to z-scores, then look up the probabilities associated with those z-scores.
Example: Standard Scores for Height
For a population of college women, the z-score corresponding to a height of 62 inches is

    z = (Value - Mean) / Standard deviation = (62 - 65) / 2.7 = -1.11

This z-score tells us that 62 inches is 1.11 standard deviations below the mean height for this population.
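Going one step beyond the slide, a small sketch (assuming scipy is available) computes both the z-score and the probability of a height below 62 inches for this population:

from scipy.stats import norm

# College women example: x = 62 in, mu = 65 in, SD = 2.7 in
z = (62 - 65) / 2.7
print(round(z, 2))            # -1.11
print(round(norm.cdf(z), 3))  # about 0.133 = P(height < 62 in)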
Finding Probabilities for z-scores
Standard Normal (z) Probabilities
• The body of the table contains P(0 ≤ Z ≤ z*), the area under the curve between 0 and z*; areas above or below a given z* are derived from it.
• The left-most column of the table shows the algebraic sign, the digit before the decimal place, and the first decimal place of z*.
• The second decimal place of z* is in the column heading.
• The values in the body of the table (next slide) refer to the region under the curve.
Standard Normal (z) Distribution (area under the curve between 0 and z)
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
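For readers working in software rather than from the printed table, a one-line check (assuming scipy is available) reproduces any entry in the body of the table, i.e. the area between 0 and z*:

from scipy.stats import norm

z_star = 1.96
area = norm.cdf(z_star) - 0.5   # area between 0 and z*
print(round(area, 4))           # 0.4750, matching row 1.9, column .06 above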
Example
1. The population of neonates in a certain hospital is known to have birth weights that are normally distributed with mean 3.2 kg and variance 2.5 kg².
   a) What is the probability of getting a neonate with BW > 3.6 kg?
   b) What is the probability of getting a neonate with BW between 2.0 and 2.5 kg?
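One possible worked solution in Python, assuming the stated variance of 2.5 is in kg² (so SD = √2.5 ≈ 1.58 kg) and that scipy is available; the printed answers are my own calculation, not given on the slides:

import math
from scipy.stats import norm

mu, var = 3.2, 2.5           # mean (kg) and variance (kg^2) from the example
sd = math.sqrt(var)          # about 1.58 kg

p_a = 1 - norm.cdf((3.6 - mu) / sd)                          # a) P(BW > 3.6 kg)
p_b = norm.cdf((2.5 - mu) / sd) - norm.cdf((2.0 - mu) / sd)  # b) P(2.0 < BW < 2.5 kg)
print(round(p_a, 3), round(p_b, 3))                          # about 0.400 and 0.105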
Bernoulli Trials
• Suppose an experiment (observing an individual) has two possible outcomes: Y/N; 0/1; +ve/-ve; present/absent; success/failure
• Such an experiment is referred to as a Bernoulli trial
Examples:
  Is the baby a boy?
  Does the patient have cholera?
  Is the client positive?
  Does the toss of a coin result in a 'head'?
Binomial Distribution
Note that it is possible to have a Bernoulli process from a non-binary variable:
  Does the patient have severe oedema?
  Does a roll of the die result in a '2'?
A series of Bernoulli trials (which can be viewed as a series of observations) gives rise to the Binomial distribution.
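A brief sketch of the Binomial distribution arising from repeated Bernoulli trials; the numbers (10 births, P(boy) = 0.5) are illustrative, and scipy is assumed available:

from scipy.stats import binom

# n independent Bernoulli trials with success probability p -> Binomial(n, p)
n, p = 10, 0.5
print(round(binom.pmf(6, n, p), 3))   # P(exactly 6 boys), about 0.205
print(round(binom.cdf(6, n, p), 3))   # P(6 or fewer boys), about 0.828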
Sampling Distribution of the Mean
• Suppose we have a population, normally distributed, with mean µ and variance σ²
• From this population we select a random sample of size n, and calculate the sample mean
• Repeat an infinite number of times
• Result: a population of sample means
Sampling Distribution of the Mean
1. It is normally distributed
2. The mean of the population of sample means will be the population mean µ
3. a) The variance of the population of sample means will be σ²/n
   b) The standard deviation of the sample means (the standard error) will be σ/√n
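A quick simulation sketch of these two properties (numpy assumed; the values µ = 50, σ = 10, n = 25 are illustrative):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 50, 10, 25

# Draw many samples of size n from a normal population and keep each sample mean
sample_means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

print(round(sample_means.mean(), 2))   # close to mu = 50
print(round(sample_means.std(), 2))    # close to sigma / sqrt(n) = 10 / 5 = 2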
Central Limit Theorem
The sampling distribution of the sample mean will be approximately normal even if the original population deviates from normality, as long as the sample size n is large enough.
• n ≥ 30 is often considered "large enough" (a rule of thumb only)
• Not to be confused with the sample size required to carry out statistical inference!
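A simulation sketch of the CLT (numpy assumed; the exponential population is chosen purely for illustration): even though the population is heavily skewed, the means of samples of size 30 are approximately normal.

import numpy as np

rng = np.random.default_rng(1)
n = 30

# Exponential population with mean 1 (strongly right-skewed)
sample_means = rng.exponential(scale=1.0, size=(100_000, n)).mean(axis=1)

# Sample means are roughly normal with mean ~ 1 and SD ~ 1/sqrt(30) ~ 0.18
print(round(sample_means.mean(), 3), round(sample_means.std(), 3))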
Statistical Inference
"In God we trust, others we investigate"
• Inference: drawing conclusions about a population (or populations) based on the observation of a sample (or samples)
• Conclusion: make a hypothesis and use statistical methods to decide whether the hypothesis is true or not
Hypothesis
Definition: Statement about (the distribution of) a
population or populations
Examples:
• Mean (number) of sex partners of HIV positive
clients is higher than that of HIV –ve clients
• Level of development is related to political
party affiliation
• Females have higher blood pressure than males

These statements are preceded by research questions!
Types of Hypothesis
Two types of hypothesis:
• H0 (null hypothesis): the hypothesis to be tested
• HA (alternative hypothesis): what is true if the null hypothesis is not true
• The necessity to test compels us to state the null hypothesis in a particular way, as a statement of "no difference" or "no effect" (hence "null")
Types of Errors
When a test is carried out, two types of errors may occur:

                                 True situation (in population)
                                 H0 True         H0 False
Test        H0 True (Accept)     OK              Type II error
decision    H0 False (Reject)    Type I error    OK

α = probability of making a Type I error = significance level
1 - α = confidence coefficient {(1 - α) × 100% = confidence level}
β = probability of making a Type II error
1 - β = power of the test
Notes
• α and β are probabilities of committing errors, not the errors themselves
• We would like both α and β to be small: researchers conventionally set α = 0.05 as reasonable, and β is then made smaller by using a larger sample size (see the sketch below)
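A hedged sketch of that last point, using a one-sided z-test for a mean with purely illustrative numbers (µ0 = 100, µ1 = 105, σ = 15, none of which come from the slides): with α fixed at 0.05, β falls and power rises as n grows.

import math
from scipy.stats import norm

mu0, mu1, sigma, alpha = 100, 105, 15, 0.05    # illustrative values
z_alpha = norm.ppf(1 - alpha)                  # one-sided critical z

for n in (10, 30, 50, 100):
    se = sigma / math.sqrt(n)
    critical = mu0 + z_alpha * se                  # reject H0 if sample mean exceeds this
    beta = norm.cdf(critical, loc=mu1, scale=se)   # P(fail to reject | true mean mu1)
    print(n, round(beta, 3), round(1 - beta, 3))   # beta shrinks, power grows with n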
