03 Estimation IITB PDF

This document discusses various topics related to interval estimation, including:
• Point estimation and interval estimation are two types of statistical inference.
• The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as sample size increases.
• Confidence intervals provide a range of values that will contain the population parameter with a certain level of confidence, as opposed to a single point estimate.
• For a mean, the margin of error depends on the population standard deviation and sample size. For a small sample, the t-distribution is used.
• Sample size determination ensures the desired level of precision in estimating a population parameter.

Uploaded by

Ninad Kale

Intervals and Estimation

Dr. Vinay Kulkarni


Estimation related Topics
• Sampling
• Point Estimation
• Sample Mean and Sample Variance
• Interval Estimation
• Confidence interval: one parameter
• Sample Size

Samples, Parameters & Statistics
• Sampling
– Allows us to make inferences about a population based on a
sample of that population
• Parameters
– Numerical characteristics about the population that are of
interest
• Statistics
– Parameters cannot be exactly determined
– They can only be estimated from samples
– These estimates or summaries, based on the sample, are known
as Statistics
• Major aspects of samples and statistics:
– How accurate are the estimators (statistics)?
– Is the sample truly representative of the population?

Statistics as Estimates for Parameters
• We use statistics to estimate parameters
– Proportions
– Arithmetic averages
– Ranges
– Quartiles
– Deciles
– Percentiles
– Variances
– Standard deviations

Sampling Methods
• Non-probability sampling
– Convenience sampling
• Example: picking up only the easily accessible apples
– Judgment or subjective sampling
– Volunteer sampling
• Especially used in clinical trials and research
• Probability sampling methods
– This involves the planned use of chance
– When carried out properly, it avoids selection bias
• Assignment:
– Review of sampling techniques and when they should be
used

Point Estimation
• Estimation
– First step of Inferential Statistics
• (Second step is Hypothesis Testing)

– Two types:
• Point estimation
• Interval estimation
• Point Estimate
– Value of a statistic
– Calculated from a sample
– Estimates the parameter of the population
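
A minimal sketch of point estimation in Python, assuming a small made-up sample (the five values below are illustrative and do not come from the slides). The sample mean and the sample variance serve as point estimates of the population mean and variance.

```python
import statistics

# Hypothetical sample of five measurements (illustrative values only).
sample = [12, 15, 11, 14, 13]

# Point estimate of the population mean: the sample mean.
mean_estimate = statistics.mean(sample)

# Point estimate of the population variance: the sample variance (divisor n - 1).
variance_estimate = statistics.variance(sample)

print(mean_estimate, variance_estimate)
```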

Discussion
• Different possible samples can be drawn from the
same population
• Each of those samples can yield a different value
of a statistic
– Example: Mean and Variance
• It becomes important to investigate the sampling
distributions of estimators
• Sampling Distribution for a given sample size, n:
– The distribution of the values taken by the estimator of a parameter
– Over all possible samples of size ‘n’ from the population
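
The idea of a sampling distribution can be sketched by listing every possible sample from a tiny population. The population of four values and the sample size of 2 are assumptions chosen only to keep the enumeration short; samples are drawn without replacement.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of four values (illustrative only).
population = [1, 2, 3, 4]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
all_samples = list(combinations(population, n))

# The sampling distribution of the mean: one sample mean per possible sample.
sample_means = [mean(s) for s in all_samples]

print(sorted(sample_means))
print(mean(sample_means), mean(population))
```

In this sketch the average of all the sample means equals the population mean, which is what makes the sample mean a reasonable estimator.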

Central Limit Theorem
• Statement
– The central limit theorem states that given a
distribution with a mean μ and variance σ², the
sampling distribution of the mean approaches a
normal distribution with a mean (μ) and a variance
σ²/N as N, the sample size, increases
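
The statement can be checked numerically with a small simulation. This is only a sketch under stated assumptions: the parent distribution is taken to be uniform on [0, 1) (mean 0.5, variance 1/12), and the sample size N = 30 and the number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 30              # size of each sample
num_samples = 5000  # number of samples drawn (arbitrary choice)

# Parent distribution: uniform on [0, 1), with mean 0.5 and variance 1/12.
mu, sigma_sq = 0.5, 1 / 12

# One sample mean per simulated sample.
sample_means = [
    sum(random.random() for _ in range(N)) / N
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print("mean of sample means:", mean_of_means, "expected:", mu)
print("variance of sample means:", var_of_means, "expected:", sigma_sq / N)
```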

Sampling distribution of the mean
[Figures: parent distribution and the corresponding sampling distribution of the mean, one slide each for the uniform, Poisson, binomial, triangular and normal distributions]

Mean and Variance / SD of the sampling distribution of the mean
[Figures not captured in the text extraction]

Sampling Distribution: Mean
• The sampling distribution of the Mean
– Becomes approximately normal as the size of the
sample ‘n’ increases
– Regardless of the shape of the population
• Standard deviation of the sampling distribution of
the mean: Also known as the standard error, σ / sqrt(n)
• Where σ is the standard deviation of the
population and n is the sample size

• http://onlinestatbook.com/

Normal Distribution Table
[Table: standard normal distribution table, spread over two slides; not captured in the text extraction]

Exercise
• Savings account balances in a bank are normally distributed
with mean Rs 2000 and standard deviation Rs 600
• The bank conducts a study by selecting 100
random accounts
• Find the probability that the mean of the sample
will lie between Rs 1900 and Rs 2050

• Answer: 0.7492
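
The answer can be reproduced with Python's standard library instead of the printed table; this is a sketch, not the method used on the slide. The standard error is 600 / sqrt(100) = 60. The exact computation gives a value close to 0.75; the slide's 0.7492 is what a table lookup yields when the z values are rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100

# Standard error of the mean: sigma / sqrt(n) = 60
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean (normal, since the population is normal).
sampling_dist = NormalDist(mu, standard_error)

# P(1900 < sample mean < 2050)
probability = sampling_dist.cdf(2050) - sampling_dist.cdf(1900)
print(round(probability, 4))
```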

Exercise
• Distribution of income of a certain category of
bank employees has a mean of Rs 1,50,000 and a
standard deviation of Rs 20000
• If a random sample of 30 is selected, what is the
probability that the mean salary of the sample will
exceed Rs 1,57,500?
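
One possible way to work this exercise, sketched in Python. The income distribution is not stated to be normal, so the sketch assumes that a sample of 30 is large enough for the central limit theorem to make the sample mean approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 150_000, 20_000, 30

# Standard error of the mean.
standard_error = sigma / sqrt(n)

# z value of the threshold Rs 1,57,500 on the sampling distribution of the mean.
z = (157_500 - mu) / standard_error

# Upper-tail probability under the standard normal distribution.
probability = 1 - NormalDist().cdf(z)
print(round(z, 2), round(probability, 3))
```

Under that assumption the probability comes out at roughly 0.02.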

Interval Estimation
• Definition:
– It is a range of values related to a parameter
– Calculated based on the sample
– Such that
• The parameter will be within that range
• With some degree of confidence
• Use
– A statistic, such as the mean, can be presented
• As a Point Estimate, X̄
• As an interval, X̄ ± E where E is the margin of error

Point v/s Interval Estimates
• Point estimate is often insufficient
– It is either right or wrong!
– It also does not indicate the confidence level in that
estimate

• Interval estimate
– Better option : report an interval estimate
– It provides the range, as well as the degree of
confidence

Confidence Interval : Mean
• Margin of error for the Mean
– Where the population SD (σ) is known: E = Zα/2 * σ / sqrt(n)
– Where sample size is large, and population SD is not known: E = Zα/2 * s / sqrt(n), with s the sample SD
– Where sample size is < 30, student t-distribution is used (degrees of freedom = n-1): E = tα/2 * s / sqrt(n)
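
A minimal sketch of the margin-of-error calculation in Python. The helper names `z_critical` and `margin_of_error` are hypothetical, and the numbers in the example (s = 12, n = 64) are made up for illustration. The standard library has no t-distribution, so for small samples the critical value would have to be read from a t table and passed in.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z(alpha/2) of the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(critical_value, sd, n):
    """E = critical value * SD / sqrt(n); the critical value may be a z or a t value."""
    return critical_value * sd / sqrt(n)

# Hypothetical large-sample case: s = 12, n = 64, 95% confidence.
z = z_critical(0.95)
E = margin_of_error(z, 12, 64)
print(round(z, 2), round(E, 2))
```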

The t-distribution

Exercise
• Mercury needs to be estimated in the water of a
certain area. 16 samples are collected and their
ppm values of mercury are as given below

• Manually calculate the mean, variance and
standard deviation
• Estimate the ppm levels of mercury as an interval
for 95% confidence level

Answer
• Sample size small
– n < 30
– Hence Student's t-distribution will be used
• n = 16
• Sample mean = 403.063
• Variance = 12.996
• Standard deviation = 3.605
• Confidence level 95%, DOF = 16-1 = 15
• From t charts t.025 = 2.131
• Margin of error = 2.131 * 3.605 / sqrt(16) = 1.92
• Based on this, the interval is (401.143, 404.983)
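
The arithmetic of this answer can be re-traced in a few lines of Python. The sketch starts from the summary values quoted on the slide (the 16 raw readings are not available here) and takes the t value 2.131 from the table rather than computing it.

```python
from math import sqrt

n = 16
sample_mean = 403.063
variance = 12.996
t_critical = 2.131  # t(0.025) for 15 degrees of freedom, read from the t table

sd = sqrt(variance)

# Margin of error: t * s / sqrt(n)
E = t_critical * sd / sqrt(n)

lower, upper = sample_mean - E, sample_mean + E
print(round(sd, 3), round(E, 2), round(lower, 3), round(upper, 3))
```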

Exercise
• In a factory that used coal as fuel, the
consumption was observed for 10 consecutive
weeks. It was found that an average value of
11400 tons of coal was consumed per day with a
standard deviation of 700 tons.
• From this data, the plant manager wants to
estimate an interval for the mean consumption
such that he can be 95% confident about the coal
requirement. Can you help him out?
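
One possible way to approach this exercise, sketched in Python. It assumes that the 10 weekly observations form the sample (n = 10) and that 700 tons is the sample standard deviation, so the t-distribution with 9 degrees of freedom applies; the exercise text leaves this open. The critical value 2.262 is the standard table value of t(0.025) for 9 degrees of freedom.

```python
from math import sqrt

n = 10               # assumption: one observation per week
sample_mean = 11400  # tons
sample_sd = 700      # tons
t_critical = 2.262   # t(0.025) for 9 degrees of freedom, from the t table

# Margin of error: t * s / sqrt(n)
E = t_critical * sample_sd / sqrt(n)

lower, upper = sample_mean - E, sample_mean + E
print(round(E, 1), round(lower, 1), round(upper, 1))
```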

Sample Size Determination
• Why determine sample size?
– To ensure that the error in estimating a population
parameter is less than a desired threshold
• When sample is too small:
– Required precision is not achieved
• When sample size is too large
– Wastage of resources required to estimate the
parameter

Sample Size Determination
• In the case of sampling distribution of the mean,
the margin of error is: E = Zα/2 * σ / sqrt(n)

• Therefore: n = (Zα/2 * σ / E)²

Exercise
• In a normal distribution with mean 375 and SD 48,
what should be the size of the sample to ensure
that the mean will be between 370 and 380 with a
probability of 0.95

• Answer: At least 355 (with Z = 1.96 the formula gives n ≥ 354.04)
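
The sample size can be computed directly from n = (Z * σ / E)². In this sketch the margin of error is taken as E = 5, the half-width of the interval from 370 to 380 around the mean 375, and the result is rounded up to the next whole number.

```python
from math import ceil
from statistics import NormalDist

sigma = 48
E = 5              # half-width of the interval (370, 380) around 375
confidence = 0.95

# Two-sided critical value Z(alpha/2).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

n_exact = (z * sigma / E) ** 2
n_required = ceil(n_exact)  # smallest whole sample size that meets the target
print(round(n_exact, 2), n_required)
```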

Confidence Interval: Variance and SD
• The Variance is a “sum of squares of items”
• The Chi-squared distribution best represents the
probability distribution of such a “sum of squared
items”
• Therefore the Chi-squared distribution is used to
derive the confidence interval for the
variance (and, hence, the standard
deviation)

Confidence Interval: Variance and SD
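• Given: (n-1) * S² / σ² follows a Chi-squared distribution with (n-1) degrees of freedom (for a sample from a normal population)
• Chi-squared CDF is given by:

• If (1-α) is the desired confidence level
• From the CDF Table, this will be the region
between (α/2) and (1-α/2)
• Hence: (n-1) * S² / χ²(1-α/2) < σ² < (n-1) * S² / χ²(α/2)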

Confidence Interval: Variance and SD
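• Confidence interval: Standard Deviation
sqrt((n-1) * S² / χ²(1-α/2)) < σ < sqrt((n-1) * S² / χ²(α/2)),
where χ²(p) is the Chi-squared table value at cumulative probability p for (n-1) degrees of freedom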

Exercise
• 100 healthy adults were subjected to driving hazards
• Their response times were measured, and the
variance calculated was 0.0196 seconds squared
• For 95% confidence level, find the interval within
which the population variance will lie

Solution
• n = 100; n-1 = 99
• S² = 0.0196; S = 0.14
• Confidence level = 95% = 0.95
• (1-α) = 0.95; α = 0.05;
• α/2 = 0.025; 1-α/2 = 0.975
• χ²(0.025) = 73.36; χ²(0.975) = 128.422
• (100-1) * 0.0196 / (128.422) = 0.0151
• (100-1) * 0.0196 / (73.36) = 0.02645
• 0.0151 < σ² < 0.0265
• 0.123 < σ < 0.1626
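
The numbers in this solution can be re-traced with a short Python sketch. The two Chi-squared values are taken from the slide's table lookup for 99 degrees of freedom; they are not computed here, because the standard library has no Chi-squared quantile function.

```python
from math import sqrt

n = 100
s_squared = 0.0196    # sample variance, in seconds squared

# Chi-squared table values for 99 degrees of freedom (from the slide).
chi2_lower = 73.36    # at cumulative probability 0.025
chi2_upper = 128.422  # at cumulative probability 0.975

# Interval for the variance: divide by the larger value for the lower bound.
var_low = (n - 1) * s_squared / chi2_upper
var_high = (n - 1) * s_squared / chi2_lower

# Interval for the standard deviation: square roots of the variance bounds.
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(round(var_low, 4), round(var_high, 4), round(sd_low, 3), round(sd_high, 4))
```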

Confidence Interval: Proportions
• Another important “statistic” is the “proportion”
• We are often interested in:
– Proportion of the population satisfying a certain
criterion
• Proportion of population above / below poverty line
• Proportion of travellers reporting sick on arrival
• Proportion of population using public transport
• Proportion of kids dropping out of school by the age of 15

Confidence Interval: Proportions
• Let ‘P’ be the TRUE value of the proportion in the
population
• Let ‘n’ be the size of the sample drawn from the
population
• Let ‘X’ be the number of elements in the sample
that exhibit the attribute under study
• The ‘estimated value’ of the TRUE proportion of
the population is given by : p^ = X/n

Confidence Interval: Proportions
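• It can be shown that Z (it is approximately normally distributed for large n) is
Z = (p^ - P) / sqrt(P * (1-P) / n)
• and the confidence interval for P is
p^ - Zα/2 * sqrt(p^ * (1-p^) / n) < P < p^ + Zα/2 * sqrt(p^ * (1-p^) / n)
• Thus, if number of elements ‘n’ in the sample and
the proportion p^ in the sample are known, one
can estimate the interval of the TRUE proportion
of the population at a given confidence level
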
Example
• In a sample of 500 employees, 160 preferred
taking training classes in the morning. What
would be the 95% confidence interval for the
TRUE proportion of employees preferring morning
classes?

Solution
• x = 160
• n = 500
• p^ = 160 / 500 = 0.32
• alpha (for 95%) = 0.05
• alpha/2 = 0.025
• Substituting in p^ ± Zα/2 * sqrt(p^ * (1-p^) / n)
• Z0.025 = 1.96
• 1.96 * sqrt(0.32 * (1-0.32)/500) = 0.041
• 0.32 - 0.041 < p < 0.32 + 0.041
• 0.279 < p < 0.361
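
The same interval can be obtained with a short Python sketch. It uses the normal approximation shown in the solution, with the critical value computed by the standard library instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist

x, n = 160, 500
p_hat = x / n  # estimated proportion

# Two-sided critical value for 95% confidence (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Margin of error for a proportion.
E = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - E, p_hat + E
print(round(E, 3), round(lower, 3), round(upper, 3))
```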

Estimation: Additional Exercises
[Five slides of exercises; their content was not captured in the text extraction]

ADDITIONAL SLIDES

Random Variables
• Random experiment
– Process of measurement or observation in which the
outcome cannot be completely determined in advance
• Sample space
– All possible outcomes of a random experiment
• Random Variable
– A real-valued quantity, or numerical measure, whose
value depends on the outcomes of a random
experiment
– Can be Discrete or Continuous

Random Variable
• The probability that a Random Variable may
assume a particular value is governed by a
Probability Function:
– For Discrete variables
• Probability Mass Function (PMF)
– For Continuous variables
• Probability Density Function (PDF)

– For Discrete / Continuous variables
• Cumulative Distribution Function (CDF)

Random Variable and Probabilities
• Let X be the Random Variable
• Let x be one of its possible values
• Let P(x) be the probability that X=x
• Then 0 ≤ P(x) ≤ 1 for every x, and the probabilities of all possible values add up to 1

Random Variable: Expected Value
• Expected value : E(X)
– Weighted average, of all possible values, considering
their probabilities

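• For Discrete random variable: E(X) = Σ x * P(x), summed over all possible values x

• For Continuous random variable: E(X) = ∫ x * f(x) dx, where f(x) is the probability density function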

Random Variable: Variance and Standard Dev
• Variance of a Random Variable: Var(X) = E[(X - μ)²]

• Standard Deviation of a Random Variable: SD(X) = sqrt(Var(X))

• Where μ = E(X)

• Variance and Standard Deviation reflect the extent to
which the values of the Random Variable are spread around its mean
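
As an illustration of expected value, variance and standard deviation, the sketch below works them out for a discrete random variable. The fair six-sided die is an assumed example and does not appear in the slides.

```python
from math import sqrt

# Assumed example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Expected value: probability-weighted average of the possible values.
expected = sum(x * p for x, p in zip(values, probabilities))

# Variance: expected squared deviation from the mean.
variance = sum((x - expected) ** 2 * p for x, p in zip(values, probabilities))

# Standard deviation: square root of the variance.
sd = sqrt(variance)
print(round(expected, 4), round(variance, 4), round(sd, 4))
```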

Sample – to – Population
• Goal of all these definitions:
– Given the nature of the phenomena
• Probability distributions
– And a sample with certain deductions
• Measured observations
– Predict additional properties and confidence levels
• We therefore need to
– Understand generic phenomena
– And the probabilistic nature of their events

Probability Distributions
• Binomial Distribution
• Poisson Distribution
• Student's t-Distribution
• Chi-Square Distribution
• F-Distribution
• Normal Distribution
• Log-normal Distribution
• Other Distributions
– Bernoulli
– Geometric
– Hypergeometric
– Multinomial
– Exponential
– Beta
– Gamma

Use of Probability Functions of “R”
• Here xxxx stands for the name of the distribution (for example norm, giving dnorm, pnorm, qnorm and rnorm)
• dxxxx
– Probability density function (probability mass function for a discrete distribution)
– Given a ‘number’ (quantile) this function returns the
probability density at that number (for a discrete distribution, the probability of getting that number)
• pxxxx
– Cumulative probability distribution function
– Given a ‘number’ (quantile) this function will tell you the
cumulative probability of getting all values up to that number
• qxxxx
– Given a cumulative probability, this function returns the
‘number’ (quantile) associated with that probability
• rxxxx
– This function generates the required number of random data points
conforming to the desired probability distribution (as per the
specified parameters)
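
The slide describes R's function families. As a rough analogue only, the sketch below shows the same four operations for the normal distribution using Python's standard library class `statistics.NormalDist`; the pairing with `dnorm`, `pnorm`, `qnorm` and `rnorm` in the comments is meant as a loose correspondence, not an exact equivalence.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

density = std_normal.pdf(0)            # density at 0, comparable to dnorm(0)
cumulative = std_normal.cdf(1.96)      # P(Z <= 1.96), comparable to pnorm(1.96)
quantile = std_normal.inv_cdf(0.975)   # quantile for probability 0.975, comparable to qnorm(0.975)
draws = std_normal.samples(5, seed=1)  # five random values, comparable to rnorm(5)

print(round(density, 4), round(cumulative, 4), round(quantile, 4))
print(draws)
```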

Normal Distribution
• The Cumulative Distribution Function for Normal
Distribution is available
– In the form of a Standard Normal Distribution Table
– Based on Z = (x-μ)/σ
• The Standard Normal Distribution table is used in
solving problems

Some properties of probability distributions
• Central Moments
– the expected value of a specified integer power of the
deviation of the random variable from the mean: E[(X - μ)^k]

• The moments and their interpretations
– Zeroth central moment (= 1)
– First central moment (= 0; the mean itself is the first raw moment)
– Second central moment (the variance)
– Third central moment (measure of skewness)
– Fourth central moment (measure of kurtosis)
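
A small sketch of central moments computed from data, treating the data as the whole population (divisor n). The function name `central_moment` and the eight data values are assumptions made for illustration.

```python
def central_moment(data, k):
    """k-th central moment of the data, treated as a complete population."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

# Hypothetical data set with mean 5 (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central moments of order 0 to 4.
moments = [central_moment(data, k) for k in range(5)]
print(moments)
```

For this data the zeroth moment is 1, the first is 0 and the second is the population variance, 4.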
