Sampling and Confidence Interval

Umesh Pathak
December 5, 2024

1 Inferential Statistics
1.1 Sampling and Confidence Intervals
Sampling and confidence intervals are fundamental concepts in statistics used to estimate population
parameters based on a sample and to quantify the uncertainty of these estimates.

1. Sampling
What is Sampling? Sampling involves selecting a subset (sample) of individuals or observations
from a larger population to make inferences about the population as a whole.
Types of Sampling
1. Random Sampling : Each individual in the population has an equal chance of being selected. Reduces bias and ensures representativeness.
2. Stratified Sampling : The population is divided into subgroups (strata) based on characteristics, and random samples are taken from each stratum. Ensures representation of key subgroups.
3. Systematic Sampling : Select every kth individual from a list of the population. Simpler than random sampling but can introduce bias if the population has a pattern.
4. Cluster Sampling : Divide the population into clusters (e.g., geographic regions), and randomly select entire clusters for the sample. Useful for large, dispersed populations.
5. Convenience Sampling : Use individuals who are easiest to access. May introduce significant bias.
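The first three schemes above can be sketched in a few lines of NumPy. This is a minimal illustration: the population of 1000 IDs, the sample size of 50, and the two strata are all made up for the demo.

```python
import numpy as np

# Hypothetical population of 1000 individual IDs (illustrative only)
rng = np.random.default_rng(42)
population = np.arange(1000)

# 1. Random sampling: each individual equally likely to be chosen
random_sample = rng.choice(population, size=50, replace=False)

# 3. Systematic sampling: every k-th individual, with k = N / n
k = len(population) // 50            # k = 20
systematic_sample = population[::k]

# 2. Stratified sampling: split into two strata, draw 25 from each
strata = [population[:500], population[500:]]
stratified_sample = np.concatenate(
    [rng.choice(stratum, size=25, replace=False) for stratum in strata]
)
```

Cluster sampling would instead pick whole strata (clusters) at random and keep every member of the chosen clusters.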

2. Confidence Interval
What is a Confidence Interval? A confidence interval (CI) provides a range of values that likely contains the true population parameter (e.g., mean, proportion) with a specified level of confidence (e.g., 95%).
Key Components of a Confidence Interval
1. Point Estimate : The sample statistic used to estimate the population parameter (e.g., sample mean).
2. Margin of Error (MOE) : Reflects the uncertainty in the estimate due to sampling variability. Larger sample sizes lead to smaller MOEs.
3. Confidence Level : The probability that the interval contains the true parameter. Common levels: 90%, 95%, and 99%.
Formula for Confidence Interval For the population mean (µ), when the population standard
deviation (σ) is known:
CI = X̄ ± Z · σ/√n

For the population mean (µ), when σ is unknown (using the sample standard deviation s):
CI = X̄ ± t · s/√n

Where:
- X̄: Sample mean
- Z: Critical value from the standard normal distribution (e.g., 1.96 for 95% confidence)
- t: Critical value from the t-distribution
- σ: Population standard deviation
- s: Sample standard deviation
- n: Sample size

3. Example of Confidence Interval
Scenario : You conduct a study to estimate the average height of students in a university. A
random sample of 50 students has: - Sample mean (X̄) = 170 cm - Sample standard deviation (s)
= 8 cm
Objective : Construct a 95% confidence interval for the average height.

Solution :
1. Identify Parameters : X̄ = 170, s = 8, n = 50, confidence level = 95%.
2. Find the Critical t-Value : Degrees of freedom (df) = n − 1 = 50 − 1 = 49. For 95% confidence and df = 49, t ≈ 2.009.
3. Compute the Margin of Error (MOE) :

MOE = t · s/√n

MOE = 2.009 · 8/√50 = 2.009 · 1.131 = 2.27
4. Calculate the Confidence Interval :

CI = X̄ ± MOE

CI = 170 ± 2.27 = [167.73, 172.27]


Interpretation : With 95% confidence, the true average height of students in the university lies between 167.73 cm and 172.27 cm.
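The arithmetic in this example can be reproduced with scipy.stats. A small sketch, assuming SciPy is available (note the critical value is ≈ 2.0096; the notes round it to 2.009):

```python
import numpy as np
from scipy import stats

x_bar, s, n = 170.0, 8.0, 50            # sample mean, SD, size from the example
confidence = 0.95

# Two-sided critical t-value for df = n - 1 = 49
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)   # ≈ 2.0096

moe = t_crit * s / np.sqrt(n)           # margin of error ≈ 2.27
ci = (x_bar - moe, x_bar + moe)         # ≈ (167.73, 172.27)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```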

4. Importance of Sampling and Confidence Intervals
1. Sampling : - Reduces costs and time compared to a full population census. - Enables
estimation of population parameters.
2. Confidence Intervals : - Provide a range of plausible values for the population parameter.
- Account for variability in sampling, ensuring more reliable conclusions.

Applications
- Polling : Estimating the proportion of voters favoring a candidate.
- Quality Control : Determining the mean weight of products in a factory.
- Medical Studies : Estimating the average effect of a treatment.

1.2 Inference and Significance in Statistics
In statistics, inference and significance are closely related concepts used in analyzing data, making
predictions, and drawing conclusions about a population based on sample data.

1. Statistical Inference
Statistical inference refers to the process of using sample data to draw conclusions about a
population. It involves estimating parameters, testing hypotheses, and predicting future outcomes.
Types of Inference
1. Estimation : - Point Estimate : A single value used to estimate a population parameter (e.g.,
sample mean for population mean).
- Confidence Interval : A range of values that likely contains the population parameter with a specified probability (e.g., a 95% confidence interval).
2. Hypothesis Testing : - Involves testing assumptions (null and alternative hypotheses)
about population parameters using sample data.
- Example: Testing if a new drug is more effective than a placebo.
3. Prediction : - Using models (e.g., regression) to predict future outcomes based on current
data.

2. Statistical Significance
Statistical significance is a measure of whether the results observed in a sample are unlikely to
have occurred by random chance. It helps determine if the observed effect is real or due to sampling
variability.
Key Concepts in Significance 1. Null Hypothesis (H0 ) : - The default assumption
that there is no effect or difference (e.g., no relationship between variables).
2. Alternative Hypothesis (Ha ) : - The assumption that there is an effect or difference
(e.g., a relationship exists).
3. p-Value : - The probability of observing results as extreme as the sample data, assuming
the null hypothesis is true. - Threshold : If p < α (e.g., 0.05), the result is considered statistically
significant, and H0 is rejected.
4. Significance Level (α) : - The pre-determined threshold for significance (e.g., 0.05, 0.01).
- It represents the probability of rejecting H0 when it is true (Type I error).

Example: Statistical Inference and Significance
Scenario : A company claims that the average weight of a bag of chips is 500 grams. A quality
control team takes a sample of 30 bags, which has: - Sample mean (X̄) = 495 grams. - Sample
standard deviation (s) = 10 grams.
Objective : Test whether the bags weigh less than 500 grams (Ha : µ < 500) at a significance
level of α = 0.05.

Step-by-Step Solution

1. Formulate Hypotheses : - Null Hypothesis (H0 ): µ = 500 (no difference). - Alternative


Hypothesis (Ha ): µ < 500 (mean weight is less).
2. Select the Test : - Since the population standard deviation is unknown, use a one-sample
t-test .

3. Calculate Test Statistic :

t = (X̄ − µ) / (s/√n)

Substituting values:

t = (495 − 500) / (10/√30) = −5 / 1.83 ≈ −2.73

4. Find the Critical Value : - Degrees of Freedom (df ) = n − 1 = 30 − 1 = 29. - For α = 0.05
(one-tailed test), critical t-value from the t-table is approximately -1.699 .
5. Compare Test Statistic and Critical Value : - t = −2.73 is less than −1.699.
6. Decision : - Reject H0 : There is significant evidence that the average weight is less than
500 grams.
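The test can be run from the summary statistics in a few lines of SciPy. A sketch, assuming SciPy is available; without the intermediate rounding above, the statistic comes out as ≈ −2.74 rather than −2.73:

```python
import numpy as np
from scipy import stats

x_bar, s, n, mu0 = 495.0, 10.0, 30, 500.0
alpha = 0.05

# One-sample t statistic computed from summary statistics
t_stat = (x_bar - mu0) / (s / np.sqrt(n))   # ≈ -2.74 (notes round to -2.73)

# One-tailed p-value and critical value for Ha: mu < 500
p_value = stats.t.cdf(t_stat, df=n - 1)     # ≈ 0.005
t_crit = stats.t.ppf(alpha, df=n - 1)       # ≈ -1.699

reject_h0 = t_stat < t_crit                 # True: reject H0
```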

Key Takeaways
1. Inference involves analyzing data to make decisions or predictions about the population.
2. Significance evaluates whether the results are due to chance or a true effect.
3. A statistically significant result (p < α) indicates strong evidence to reject the null hypothesis.

1.3 Estimation and Hypothesis Testing


Estimation and hypothesis testing are two core components of inferential statistics. They are used
to draw conclusions about a population based on sample data.

1. Estimation
Estimation is the process of inferring the value of a population parameter based on sample
data. It provides either a single value (point estimation) or a range of values (interval estimation)
to estimate the unknown parameter.
Types of Estimation
1. Point Estimation : - Provides a single value as an estimate of the population parameter. -
Example: Sample mean (X̄) is a point estimate of the population mean (µ).
Common Point Estimators : - Sample mean (X̄) for population mean (µ). - Sample proportion
(p̂) for population proportion (p). - Sample variance (s2 ) for population variance (σ 2 ).
2. Interval Estimation : - Provides a range of values (confidence interval) within which the population parameter is likely to lie. - Example: A 95% confidence interval for the population mean.
Confidence Interval Formula : For population mean:
CI = X̄ ± Z · σ/√n (if σ is known)

CI = X̄ ± t · s/√n (if σ is unknown)

2. Hypothesis Testing
Hypothesis testing is a formal procedure for comparing observed data with a claim or hypothesis
about a population parameter. It helps determine whether the observed data provides enough
evidence to reject the null hypothesis.

Steps in Hypothesis Testing
1. State the Hypotheses : - Null Hypothesis (H0 ) : Assumes no effect or no difference (e.g.,
µ = µ0 ). - Alternative Hypothesis (Ha ) : Represents the claim we aim to test (e.g., µ > µ0 ).
2. Choose the Significance Level (α) : - Typically, α = 0.05 (5%). - The probability of rejecting H0 when H0 is true (Type I error).
3. Select the Test and Calculate the Test Statistic : - Examples of test statistics: - z-test: Used
when the population standard deviation (σ) is known or the sample size is large. - t-test: Used
when σ is unknown and the sample size is small.
- For population mean :
z = (X̄ − µ0) / (σ/√n)

t = (X̄ − µ0) / (s/√n)

4. Determine the Critical Value or p-Value : - Compare the test statistic to the critical value
from the z- or t-distribution. - Alternatively, calculate the p-value and compare it to α.
5. Make a Decision : - If p ≤ α or the test statistic exceeds the critical value, reject H0 . -
Otherwise, fail to reject H0 .
6. State the Conclusion : - Clearly interpret the results in the context of the problem.

Example: Combining Estimation and Hypothesis Testing
Scenario : A factory claims that the average weight of its packets of rice is 1 kg. A quality
control officer collects a random sample of 30 packets and finds: - Sample mean (X̄) = 0.98 kg -
Sample standard deviation (s) = 0.05 kg
Objective : 1. Estimate the true mean weight using a 95% confidence interval. 2. Test if the average weight is less than 1 kg at α = 0.05.

Solution :
1. Confidence Interval (Estimation) : - n = 30, X̄ = 0.98, s = 0.05, t-value for df = 29 at 95% confidence ≈ 2.045.

CI = X̄ ± t · s/√n

CI = 0.98 ± 2.045 · 0.05/√30

CI = 0.98 ± 0.0187 = [0.9613, 0.9987]
Interpretation : The true mean weight is likely between 0.9613 kg and 0.9987 kg.

2. Hypothesis Testing :
Step 1 : Hypotheses:
H0 : µ = 1 vs. Ha : µ < 1
Step 2 : Test Statistic:

t = (X̄ − µ0) / (s/√n) = (0.98 − 1) / (0.05/√30) = −0.02 / 0.0091 ≈ −2.20

Step 3 : Critical Value: From the t-table for df = 29 and one-tailed test at α = 0.05, the critical
t-value = -1.699.
Step 4 : Decision: Since t = −2.20 < −1.699, reject H0 .
Step 5 : Conclusion: There is significant evidence to conclude that the average weight of the
packets is less than 1 kg.
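Both parts of the solution can be checked with SciPy. A sketch, assuming SciPy is available; unrounded arithmetic gives t ≈ −2.19, which the notes round to −2.20:

```python
import numpy as np
from scipy import stats

x_bar, s, n, mu0 = 0.98, 0.05, 30, 1.0

# 1. 95% confidence interval for the mean
t_ci = stats.t.ppf(0.975, df=n - 1)          # ≈ 2.045
moe = t_ci * s / np.sqrt(n)                  # ≈ 0.0187
ci = (x_bar - moe, x_bar + moe)              # ≈ (0.9613, 0.9987)

# 2. One-tailed t-test, Ha: mu < 1
t_stat = (x_bar - mu0) / (s / np.sqrt(n))    # ≈ -2.19
t_crit = stats.t.ppf(0.05, df=n - 1)         # ≈ -1.699
reject_h0 = t_stat < t_crit                  # True: reject H0
```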

Comparison of Estimation and Hypothesis Testing
Aspect    | Estimation                                    | Hypothesis Testing
Objective | Estimate the value of a population parameter. | Test a claim or hypothesis about a population parameter.
Output    | Confidence interval or point estimate.        | Decision: reject or fail to reject H0.
Focus     | Quantifying uncertainty of an estimate.       | Assessing evidence for/against a hypothesis.
Example   | Estimating the mean income of a population.   | Testing if the mean income differs from $50,000.

1.4 Goodness of Fit


Goodness of fit tests are statistical methods used to determine how well observed data match a
theoretical or expected distribution. They help evaluate whether a given dataset follows a specific
probability distribution (e.g., normal, uniform, binomial).

Common Goodness-of-Fit Tests
1. Chi-Square Goodness-of-Fit Test - Used for categorical data to compare observed frequencies
with expected frequencies. - Assumes a large sample size.
Test Statistic :
χ² = Σ (Oi − Ei)² / Ei
Where: - Oi : Observed frequency for category i. - Ei : Expected frequency for category i.
Hypotheses : - H0 : The observed data fit the expected distribution. - Ha : The observed data
do not fit the expected distribution.
Example : Suppose you roll a die 60 times and observe the following frequencies:

O = [8, 12, 10, 14, 9, 7]

The expected frequency for each side is E = 60/6 = 10. A chi-square test determines if the die is
fair.

2. Kolmogorov-Smirnov (K-S) Test - Used for continuous data to compare the empirical dis-
tribution function (EDF) of the sample to a theoretical distribution. - Suitable for small sample
sizes.
Test Statistic :
D = max |Fo(x) − Fe(x)|
Where: - Fo (x): Observed cumulative distribution function (CDF). - Fe (x): Expected CDF.

Hypotheses : - H0 : The sample comes from the specified distribution. - Ha : The sample does
not come from the specified distribution.
Example : Test if a sample of data follows a normal distribution using the K-S test.
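As an illustration, the K-S test applied to a simulated sample. The data here are synthetic (drawn from a standard normal with a fixed seed); with real data you would substitute your own array, and scipy.stats.kstest is assumed to be available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.standard_normal(200)      # synthetic sample, truly N(0, 1)

# Compare the sample's empirical CDF to the standard normal CDF
result = stats.kstest(data, "norm")
print(f"D = {result.statistic:.4f}, p = {result.pvalue:.4f}")

# A large p-value would mean no evidence against H0
# (the sample is consistent with the specified distribution)
```

Note that if the distribution's parameters are estimated from the same sample, the plain K-S p-value is no longer exact (a Lilliefors-type correction is needed).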

3. Anderson-Darling Test - Similar to the K-S test but gives more weight to the tails of the
distribution. - Used for testing if data follow a specific distribution, especially normality.
Test Statistic :
A² = −n − (1/n) Σᵢ₌₁ⁿ (2i − 1) [ln F(xᵢ) + ln(1 − F(xₙ₊₁₋ᵢ))]

Where: - n: Sample size. - F (xi ): CDF of the theoretical distribution.
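SciPy implements this as scipy.stats.anderson, which returns the A² statistic together with critical values at fixed significance levels rather than a p-value. A sketch on synthetic normal data (the sample and its parameters are made up for the demo):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=50, scale=5, size=100)   # synthetic, truly normal

# Returns the A^2 statistic plus critical values at fixed levels
result = stats.anderson(data, dist="norm")
for cv, sl in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > cv else "fail to reject"
    print(f"alpha = {sl / 100:.3f}: critical = {cv:.3f} -> {decision} H0")
```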



4. Shapiro-Wilk Test - Specifically designed to test for normality of data. - Works well for small
sample sizes.
Hypotheses : - H0 : The data follow a normal distribution. - Ha : The data do not follow a
normal distribution.
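The Shapiro-Wilk test is available as scipy.stats.shapiro. A minimal sketch on synthetic data (the sample is made up; with truly normal data the W statistic is close to 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=0, scale=1, size=50)   # synthetic, truly normal

# W close to 1 suggests normality; small p-values argue against it
stat, p = stats.shapiro(data)
print(f"W = {stat:.4f}, p = {p:.4f}")

# If p < 0.05, reject H0: the data do not appear normal
```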

Steps in Goodness-of-Fit Testing
1. Formulate Hypotheses : - Null hypothesis (H0 ): The data fit the specified distribution. -
Alternative hypothesis (Ha ): The data do not fit the specified distribution.
2. Determine Expected Values : - For categorical data, calculate expected frequencies based on
the theoretical distribution.
3. Compute the Test Statistic : - Use the appropriate formula (e.g., chi-square, K-S, etc.).
4. Compare to Critical Value or Compute p-value : - Compare the test statistic to the critical
value or use the p-value to determine significance.
5. Conclusion : - If the test statistic exceeds the critical value or p < α (e.g., 0.05), reject H0 .

Example: Chi-Square Goodness-of-Fit Test
Problem: A researcher wants to test if a die is fair. Observed frequencies of outcomes after 60
rolls are:
O = [8, 12, 10, 14, 9, 7]
The expected frequency for each side of a fair die is E = 60/6 = 10.

Solution:
1. Hypotheses : - H0 : The die is fair (observed frequencies match expected frequencies). - Ha :
The die is not fair.
2. Calculate Test Statistic : Using:
χ² = Σ (Oi − Ei)² / Ei

χ² = (8 − 10)²/10 + (12 − 10)²/10 + (10 − 10)²/10 + (14 − 10)²/10 + (9 − 10)²/10 + (7 − 10)²/10

χ² = 4/10 + 4/10 + 0/10 + 16/10 + 1/10 + 9/10 = 3.4

3. Compare to Critical Value : - Degrees of freedom (df ) = Number of categories - 1 = 6−1 = 5.
- Critical value for df = 5 and α = 0.05 from the chi-square table is 11.07.
4. Conclusion : - 3.4 < 11.07: Fail to reject H0 . There is no significant evidence to suggest the
die is unfair.
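This worked example maps directly onto scipy.stats.chisquare (assuming SciPy is available):

```python
from scipy import stats

observed = [8, 12, 10, 14, 9, 7]   # die outcomes from the example
expected = [10] * 6                # fair die: 60 rolls / 6 faces

chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")   # chi2 = 3.40

# p > 0.05, so we fail to reject H0: no evidence the die is unfair
```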

Applications of Goodness-of-Fit Tests
1. Quality Control : - Test if a manufacturing process produces items meeting a specified
distribution (e.g., defect rates).
2. Model Validation : - Check if data follow a theoretical model (e.g., normal distribution for
errors in regression).
3. Genetics : - Test if observed traits follow Mendelian inheritance ratios.
4. Market Research : - Compare observed consumer behaviour to expected proportions.

1.5 Test of Independence


The test of independence is used in statistics to determine whether two categorical variables are
independent of each other or if there is an association between them. It is commonly performed
using the Chi-Square Test of Independence .

1. What is a Test of Independence?
The test of independence examines the relationship between two categorical variables in a con-
tingency table. It answers the question:
”Is there an association between the two variables, or are they independent?”
Key Terms : - Null Hypothesis (H0 ) : The two variables are independent. - Alternative Hy-
pothesis (Ha ) : The two variables are not independent (there is an association).

2. Steps to Perform a Chi-Square Test of Independence
Step 1: Organize Data into a Contingency Table A contingency table displays the frequency
distribution of two categorical variables. For example:
           | Group 1 | Group 2 | Group 3 | Total
Category A | 15      | 20      | 30      | 65
Category B | 25      | 15      | 20      | 60
Total      | 40      | 35      | 50      | 125
Step 2: Compute Expected Frequencies The expected frequency for each cell is calculated under
the assumption that the two variables are independent:

Eij = (Row Total × Column Total) / (Grand Total)

For example, in the first cell (A1):

EA1 = (Row Total for A × Column Total for Group 1) / (Grand Total) = (65 × 40) / 125 = 20.8

Step 3: Compute the Chi-Square Statistic The test statistic is calculated as:
χ² = Σ (Oij − Eij)² / Eij

Where: - Oij : Observed frequency in cell (i, j) - Eij : Expected frequency in cell (i, j)
Step 4: Determine Degrees of Freedom The degrees of freedom (df ) for a contingency table is
given by:
df = (r − 1) × (c − 1)
Where r is the number of rows and c is the number of columns.
Step 5: Compare to the Critical Value or Compute p-Value - Use a Chi-Square distribution
table to find the critical value at a given significance level (α). - Alternatively, calculate the p-value
and compare it to α.
Step 6: Decision - If χ2 exceeds the critical value or p ≤ α, reject H0 . - Otherwise, fail to reject
H0 .

3. Example
Scenario : A researcher wants to determine if there is an association between gender (Male/Female)
and preference for a product (Product A/Product B). The data collected is as follows:

       | Product A | Product B | Total
Male   | 40        | 60        | 100
Female | 50        | 50        | 100
Total  | 90        | 110       | 200
Step-by-Step Solution :
Step 1: Compute Expected Frequencies For each cell:

Eij = (Row Total × Column Total) / (Grand Total)

For E(Male, Product A):

E = (100 × 90) / 200 = 45

For E(Male, Product B):

E = (100 × 110) / 200 = 55

For E(Female, Product A):

E = (100 × 90) / 200 = 45

For E(Female, Product B):

E = (100 × 110) / 200 = 55
Step 2: Construct Expected Table

       | Product A | Product B | Total
Male   | 45        | 55        | 100
Female | 45        | 55        | 100
Total  | 90        | 110       | 200

Step 3: Compute χ2
χ² = Σ (Oij − Eij)² / Eij

For each cell:
- Male, Product A: (40 − 45)²/45 = 25/45 ≈ 0.56
- Male, Product B: (60 − 55)²/55 = 25/55 ≈ 0.45
- Female, Product A: (50 − 45)²/45 = 25/45 ≈ 0.56
- Female, Product B: (50 − 55)²/55 = 25/55 ≈ 0.45

χ² = 0.56 + 0.45 + 0.56 + 0.45 = 2.02


Step 4: Degrees of Freedom

df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1

Step 5: Compare to Critical Value From the Chi-Square table, for df = 1 and α = 0.05, the
critical value is 3.841 .
Since χ2 = 2.02 < 3.841, we fail to reject H0 .

Conclusion : There is no significant association between gender and product preference at the 5% significance level.
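The whole procedure collapses to one call to scipy.stats.chi2_contingency, which also returns the expected table computed in Step 1 (correction=False matches the hand calculation, since SciPy applies Yates' continuity correction to 2×2 tables by default):

```python
from scipy import stats

observed = [[40, 60],    # Male:   Product A, Product B
            [50, 50]]    # Female: Product A, Product B

# correction=False matches the hand calculation (no Yates correction)
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")   # chi2 = 2.02, df = 1

# p > 0.05: fail to reject H0 (no association detected)
```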

Applications of Test of Independence
1. Marketing : Analyzing the relationship between customer demographics and product preference.
2. Healthcare : Evaluating the association between smoking and disease prevalence.
3. Education : Determining if there is a relationship between study methods and exam performance.
