03 Estimation IITB PDF
2
Samples, Parameters & Statistics
• Sampling
– Allows us to make inferences about a population based on a
sample of that population
• Parameters
– Numerical characteristics about the population that are of
interest
• Statistics
– Parameters usually cannot be determined exactly, because the whole population is rarely observed
– They can only be estimated from samples
– These estimates or summaries, based on the sample, are known as Statistics
• Major aspects of samples and statistics:
– How accurate are the estimators (statistics)?
– Is the sample truly representative of the population?
3
Statistics as Estimates for Parameters
• We use statistics to estimate parameters
– Proportions
– Arithmetic averages
– Ranges
– Quartiles
– Deciles
– Percentiles
– Variances
– Standard deviations
4
Sampling Methods
• Non-probability sampling
– Convenience sampling
• Pick up the most easily accessible apples
– Judgment or subjective sampling
– Volunteer sampling
• Especially used in clinical trials and research
• Probability sampling methods
– This involves the planned use of chance
– Selection bias is avoided, because chance rather than the investigator decides which units are picked
• Assignment:
– Review of sampling techniques and when they should be
used
5
Point Estimation
• Estimation
– First step of Inferential Statistics
• (Second step is Hypothesis Testing)
– Two types:
• Point estimation
• Interval estimation
• Point Estimate
– Value of a statistic
– Calculated from a sample
– Estimates the parameter of the population
6
Discussion
• Different possible samples can be drawn from the
same population
• Each of those samples can yield a different value
of a statistic
– Example: Mean and Variance
• It becomes important to investigate the sampling
distributions of estimators
• Sampling Distribution for a given sample size, n:
– Distribution of the values taken by the estimator of that parameter
– Computed over all possible samples of size ‘n’ from the population
7
Central Limit Theorem
• Statement
– The central limit theorem states that given a
distribution with a mean μ and variance σ², the
sampling distribution of the mean approaches a
normal distribution with a mean (μ) and a variance
σ²/N as N, the sample size, increases
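The statement above can be checked with a small simulation. This is a minimal sketch, assuming a Uniform(0, 1) population (μ = 1/2, σ² = 1/12) and a sample size of N = 30; the function name, seed, and number of repetitions are illustrative choices, not part of the slides.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from Uniform(0, 1); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]


n = 30
means = simulate_sample_means(n=n, reps=20_000)

# Population values for Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12

observed_mean = statistics.fmean(means)
observed_var = statistics.pvariance(means)

print(f"mean of sample means:     {observed_mean:.4f} (theory mu = {mu:.4f})")
print(f"variance of sample means: {observed_var:.5f} (theory sigma^2/N = {sigma2 / n:.5f})")
```

The two printed pairs should agree closely, and a histogram of `means` would look bell-shaped even though the population itself is flat.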
8
Sampling distribution of the mean
[Figures: sampling distribution of the mean for samples drawn from uniform, Poisson, binomial, triangular, and normal distributions]
14
Mean: Sampling dist. of the mean
15
Variance / SD of sampling distribution of mean
16
Sampling Distribution: Mean
• The sampling distribution of the Mean
– Becomes approximately normal as the size of the
sample ‘n’ increases
– Regardless of the shape of the population
• Standard deviation of the sampling distribution of the mean, also known as the standard error: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
• Where $\sigma$ is the standard deviation of the population
• http://onlinestatbook.com/
17
Normal Distribution Table
19
Exercise
• Savings account balances in a bank are normally distributed
with mean Rs 2000 and standard deviation Rs 600
• The bank conducts a study by selecting 100
random accounts
• Find the probability that the mean of the sample
will lie between Rs 1900 and Rs 2050
• Answer: 0.7492
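The exercise can be worked numerically as a check. This is a sketch using Python's standard-library `NormalDist`; the variable names are illustrative. The sample mean is treated as normal with mean 2000 and standard error 600 / sqrt(100) = 60.

```python
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100

# Standard error of the mean: sigma / sqrt(n) = 600 / 10 = 60.
standard_error = sigma / n ** 0.5

# Sampling distribution of the mean.
sampling_dist = NormalDist(mu, standard_error)

# P(1900 < sample mean < 2050)
prob = sampling_dist.cdf(2050) - sampling_dist.cdf(1900)
print(round(prob, 4))
```

The result is very close to the stated 0.7492; that figure corresponds to reading the table at z = -1.67 and z = 0.83, so a small difference in the last digits comes from rounding z to two decimals.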
20
Exercise
• Distribution of income of a certain category of
bank employees has a mean of Rs 1,50,000 and a
standard deviation of Rs 20000
• If a random sample of 30 is selected, what is the
probability that the mean salary of the sample will
exceed Rs 1,57,500?
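The slide gives no answer for this exercise. One possible working is sketched below, assuming a sample of 30 is large enough for the sample mean to be treated as approximately normal (the income distribution itself is not stated to be normal); variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 150_000, 20_000, 30

# Standard error of the mean: sigma / sqrt(n).
standard_error = sigma / n ** 0.5

# Standardise the threshold of Rs 1,57,500.
z = (157_500 - mu) / standard_error

# Upper-tail probability P(sample mean > 157500) under the normal approximation.
prob_exceeds = 1 - NormalDist().cdf(z)
print(f"z = {z:.3f}, P(mean > 157500) = {prob_exceeds:.4f}")
```

Under this assumption z is about 2.05, so the probability is roughly 2%.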
21
Interval Estimation
• Definition:
– It is a range of values related to a parameter
– Calculated based on the sample
– Such that
• The parameter will be within that range
• With some degree of confidence
• Use
– A statistic, such as the mean, can be presented
• As a Point Estimate, $\bar{X}$
• As an interval, $\bar{X} \pm E$, where E is the margin of error
22
Point v/s Interval Estimates
• Point estimate is often insufficient
– It is either right or wrong!
– It also does not indicate the confidence level in that
estimate
• Interval estimate
– Better option : report an interval estimate
– It provides the range, as well as the degree of
confidence
23
Confidence Interval : Mean
• Margin of error for the Mean
– Where the population SD is known: $E = z_{\alpha/2} \, \sigma / \sqrt{n}$
24
The t-distribution
25
Exercise
• Mercury needs to be estimated in the water of a
certain area. 16 samples are collected and their
ppm values of mercury are as given below
26
Answer
• Sample size is small
– n <= 30, and the population SD is unknown
– Hence Student's t-distribution will be used
• n = 16
• Sample mean = 403.063
• Variance = 12.996
• Standard deviation = 3.605
• Confidence level 95%, DOF = 16-1 = 15
• From t charts $t_{0.025}$ = 2.131 (15 degrees of freedom)
• Margin of error = 2.131 * 3.605 / sqrt(16) = 1.92
• Based on this, the 95% confidence interval is (401.143, 404.983) ppm
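The arithmetic of this answer can be reproduced as follows. This is a sketch that takes the summary statistics and the t-table value directly from the slide (the 16 raw readings are not reproduced here); variable names are illustrative.

```python
import math

n = 16
sample_mean = 403.063   # ppm, from the slide
sample_sd = 3.605       # ppm, from the slide
t_crit = 2.131          # t-table value for 95% confidence, 15 degrees of freedom (from the slide)

# Margin of error for a small sample with unknown population SD.
margin = t_crit * sample_sd / math.sqrt(n)

lower = sample_mean - margin
upper = sample_mean + margin
print(f"margin = {margin:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```

The bounds can differ from the slide in the third decimal place, because the slide rounds the margin of error to 1.92 before adding and subtracting.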
27
Problem
28
Exercise
• In a factory that used coal as fuel, the
consumption was observed for 10 consecutive
weeks. It was found that an average value of
11400 tons of coal was consumed per day with a
standard deviation of 700 tons.
• From this data, the plant manager wants to
estimate an interval for the mean consumption
such that he can be 95% confident about the coal
requirement. Can you help him out?
29
Sample Size Determination
• Why determine sample size?
– To ensure that the error in estimating a population
parameter is less than a desired threshold
• When sample is too small:
– Required precision is not achieved
• When sample size is too large
– Wastage of resources required to estimate the
parameter
30
Sample Size Determination
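• In the case of sampling distribution of the mean, the margin of error is: $E = z_{\alpha/2} \, \sigma / \sqrt{n}$
• Therefore: $n = (z_{\alpha/2} \, \sigma / E)^2$, rounded up to the next whole number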
31
Exercise
• In a normal distribution with mean 375 and SD 48,
what should be the size of the sample to ensure
that the mean will be between 370 and 380 with a
probability of 0.95
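One way to work this exercise is sketched below, assuming the usual relation n = (z·σ / E)² with E = 5 (half the width of the interval 370 to 380) and the z value for 95% confidence. The function name `required_sample_size` is a hypothetical helper, not something from the slides.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n such that z * sigma / sqrt(n) <= margin at the given confidence."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, since n must be a whole number of observations.
    return math.ceil((z * sigma / margin) ** 2)


n_required = required_sample_size(sigma=48, margin=5, confidence=0.95)
print(n_required)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.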
32
Confidence Interval: Variance and SD
• The Variance is a “sum of squares of items”
• Hence, for a sample of size n from a normal population: $(n-1) S^2 / \sigma^2 \sim \chi^2_{n-1}$
36
Confidence Interval: Variance and SD
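• Confidence interval: Standard Deviation: $\sqrt{(n-1) S^2 / \chi^2_{1-\alpha/2}} < \sigma < \sqrt{(n-1) S^2 / \chi^2_{\alpha/2}}$, where $\chi^2_{p}$ is the chi-square value with cumulative probability p for n-1 degrees of freedom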
37
Exercise
• 100 healthy adults were subject to driving hazards
• Their response times were measured, and the sample variance was 0.0196 seconds squared
• For 95% confidence level, find the interval within which the population variance will lie
38
Solution
• n = 100; n-1 = 99
• $S^2$ = 0.0196; SD = 0.14
• Confidence level = 95% = 0.95
• $(1-\alpha)$ = 0.95; $\alpha$ = 0.05;
• $\alpha/2$ = 0.025; $1-\alpha/2$ = 0.975
• $\chi^2_{0.025}$ = 73.36; $\chi^2_{0.975}$ = 128.422 (99 degrees of freedom)
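The solution stops at the table look-up. The remaining step can be sketched as follows, assuming the two chi-square values quoted on the slide are the 2.5th and 97.5th percentiles for 99 degrees of freedom, and that the interval for the variance is (n-1)S² divided by each of them; variable names are illustrative.

```python
import math

n = 100
s2 = 0.0196                 # sample variance in seconds squared (from the slide)

# Chi-square values for 99 degrees of freedom, as quoted on the slide.
chi2_p025 = 73.36           # cumulative probability 0.025
chi2_p975 = 128.422         # cumulative probability 0.975

# The larger chi-square value gives the lower bound, and vice versa.
var_low = (n - 1) * s2 / chi2_p975
var_high = (n - 1) * s2 / chi2_p025

# Interval for the standard deviation: square roots of the variance bounds.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"variance: ({var_low:.4f}, {var_high:.4f}) seconds squared")
print(f"SD:       ({sd_low:.3f}, {sd_high:.3f}) seconds")
```

The interval is not symmetric about the sample variance, because the chi-square distribution is skewed.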
39
Confidence Interval: Proportions
• Another important “statistic” is the “proportion”
• We are often interested in:
– Proportion of the population satisfying a certain
criterion
• Proportion of population above / below poverty line
• Proportion of travellers reporting sick on arrival
• Proportion of population using public transport
• Proportion of kids dropping out of school by the age of 15
40
Confidence Interval: Proportions
• Let ‘P’ be the TRUE value of the proportion in the
population
• Let ‘n’ be the size of the sample drawn from the
population
• Let ‘X’ be the number of elements in the sample
that exhibit the attribute under study
• The ‘estimated value’ of the TRUE proportion of
the population is given by : $\hat{p} = X/n$
41
Confidence Interval: Proportions
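• It can be shown that $Z = (\hat{p} - P) / \sqrt{P(1-P)/n}$ is approximately standard normal for large n
• and the confidence interval for P is $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$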
43
Solution
• x = 160
• n = 500
• $\hat{p}$ = 160 / 500 = 0.32
• alpha (for 95%) = 0.05
• alpha/2 = 0.025
• $z_{0.025}$ = 1.96
• Substituting: margin of error = 1.96 * sqrt(0.32 * (1-0.32)/500) = 0.041
• 0.32 - 0.041 < P < 0.32 + 0.041
• 0.279 < P < 0.361
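The same calculation can be sketched in code, assuming the normal approximation to the sampling distribution of the sample proportion; variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 160, 500
p_hat = x / n                                   # 0.32

# Two-sided critical value for 95% confidence (about 1.96).
z = NormalDist().inv_cdf(1 - 0.05 / 2)

# Margin of error: z * sqrt(p_hat * (1 - p_hat) / n)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The approximation is generally considered reasonable here because both n·p̂ = 160 and n·(1 - p̂) = 340 are large.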
44
Estimation: Additional Exercises
49
ADDITIONAL SLIDES
50
Random Variables
• Random experiment
– Process of measurement or observation in which the
outcome cannot be completely determined in advance
• Sample space
– All possible outcomes of a random experiment
• Random Variable
– A real-valued quantity, or numerical measure, whose
value depends on the outcomes of a random
experiment
– Can be Discrete or Continuous
51
Random Variable
• The probability that a Random Variable may
assume a particular value is governed by a
Probability Function:
– For Discrete variables
• Probability Mass Function (PMF)
– For Continuous variables
• Probability Density Function (PDF)
52
Random Variable and Probabilities
• Let X be the Random Variable
• Let x be one of its possible values
• Let P(x) be the probability that X=x
• Then, for a discrete variable: $0 \le P(x) \le 1$ for every x, and $\sum_x P(x) = 1$
53
Random Variable: Expected Value
• Expected value : E(X)
– Weighted average of all possible values, weighted by their probabilities
– For a discrete variable: $E(X) = \sum_x x \, P(x)$
54
Random Variable: Variance and Standard Dev
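• Variance of a Random Variable: $\mathrm{Var}(X) = E[(X - \mu)^2]$
• Where $\mu = E(X)$; the standard deviation is $\sqrt{\mathrm{Var}(X)}$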
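For a discrete random variable, the expected value and variance can be computed directly from the probability mass function. This is a minimal sketch, assuming a fair six-sided die as the random experiment (a hypothetical example, not from the slides); exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair die: each face 1..6 has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF sums to 1.
total_probability = sum(pmf.values())

# E(X) = sum of x * P(x)
expected = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * P(x)
variance = sum((x - expected) ** 2 * p for x, p in pmf.items())

print(f"E(X) = {expected}, Var(X) = {variance}, SD = {float(variance) ** 0.5:.4f}")
```

Replacing `pmf` with any other dictionary of value-to-probability pairs that sums to 1 gives the corresponding mean and variance.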
55
Sample – to – Population
• Goal of all these definitions:
– Given the nature of the phenomena
• Probability distributions
– And a sample with certain deductions
• Measured observations
– Predict additional properties and confidence levels
• We therefore need to
– Understand generic phenomena
– And the probabilistic nature of their events
56
Probability Distributions
• Binomial Distribution
• Poisson Distribution
• Student's t-Distribution
• Chi-Square Distribution
• F-Distribution
• Normal Distribution
• Log-normal Distribution
• Other Distributions
– Bernoulli
– Geometric
– Hypergeometric
– Multinomial
– Exponential
– Beta
– Gamma
57
Use of Probability Functions in R
• dxxxx (xxxx is the distribution name, e.g. dnorm, dbinom)
– Probability density function (probability mass function for discrete distributions)
– Given a ‘number’ (quantile), this function returns the density at that number; for a discrete distribution this is the probability of getting exactly that number
• pxxxx
– Cumulative distribution function
– Given a ‘number’ (quantile), this function returns the cumulative probability of getting all values up to and including that number
• qxxxx
– Quantile function (inverse of the cumulative distribution function)
– Given a cumulative probability, this function returns the ‘number’ (quantile) associated with that probability
• rxxxx
– This function generates the required number of random data points conforming to the desired probability distribution (as per the specified parameters)
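The same four operations can be illustrated for the normal distribution outside R. This is a sketch using Python's standard-library `statistics.NormalDist` as a stand-in; the comments name the R functions each call roughly corresponds to, and the seed is an arbitrary choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Density at a point (roughly dnorm(0) in R). A density, not a probability.
density_at_0 = std_normal.pdf(0)

# Cumulative probability up to a value (roughly pnorm(1.96) in R).
cum_prob = std_normal.cdf(1.96)

# Quantile for a cumulative probability (roughly qnorm(0.975) in R).
quantile = std_normal.inv_cdf(0.975)

# Random draws from the distribution (roughly rnorm(5) in R).
draws = std_normal.samples(5, seed=1)

print(f"pdf(0) = {density_at_0:.4f}")
print(f"cdf(1.96) = {cum_prob:.4f}")
print(f"inv_cdf(0.975) = {quantile:.4f}")
print(f"{len(draws)} random draws generated")
```

Note that `cdf` and `inv_cdf` are inverses of each other, just as the p and q functions are in R.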
58
Normal Distribution
• The Cumulative Distribution Function for Normal
Distribution is available
– In the form of a Standard Normal Distribution Table
– Based on $Z = (x - \mu)/\sigma$
• The Standard Normal Distribution table is used in
solving problems
59
Some properties of probability distributions
• Central Moments
– the expected value of a specified integer power of the
deviation of the random variable from the mean
60