BA2 - Statistical Sampling
www.viswamitra.org| 2
AGENDA
01 RECAP
02 SAMPLING DISTRIBUTION
03 SAMPLING METHODS
05 SAMPLING ERROR
SAMPLING DISTRIBUTION
STATISTICAL SAMPLING: METHODS
Subjective Sampling Methods
• Judgment sampling: Expert judgment is used to select the sample.
• Convenience sampling: Samples are selected based on ease of access.
Probabilistic Sampling Methods
• Simple random sampling: Each item in the population has an equal chance of being selected.
• Systematic sampling: Selects every nth item from the population.
• Stratified sampling: The population is divided into natural subsets (strata) based on characteristics (gender, age group, etc.).
• Cluster sampling: The population is divided into clusters, and a random sample of clusters is selected.
• Random time selection: Choose a random time and then select the next n items.
• Random time points: Choose n random times and select the next item after each.
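The four main probabilistic methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the population of 100 item IDs, and the odd/even strata are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, n, rng):
    """Every item has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Select every `step`-th item, starting from a random offset."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, n_per_stratum, rng):
    """Group items into strata by `key`, then sample randomly within each stratum."""
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every item in the chosen ones."""
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

rng = random.Random(42)           # fixed seed so the sketch is reproducible
population = list(range(1, 101))  # hypothetical population of 100 item IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
# Hypothetical strata: odd versus even IDs, five items from each
stratified = stratified_sample(population, lambda x: x % 2 == 0, 5, rng)
# Hypothetical clusters: five consecutive blocks of 20 items
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)
```

Stratified sampling draws from every stratum, whereas cluster sampling keeps all items from only a few randomly chosen clusters.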
ESTIMATING POPULATION PARAMETERS
Point Estimate:
A point estimate is a single value derived from sample data that is used to estimate an unknown population parameter. It is the most
commonly used form of estimation in statistics.
It provides a specific numerical value as an approximation of the population parameter.
Advantages: Easy to calculate. Limitations: Lacks precision, because a single value gives no indication of how far it may be from the true parameter.
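As a small illustration, the sample mean and sample standard deviation are point estimates of the corresponding population parameters. The data values below are hypothetical and chosen only for the example.

```python
import statistics

# Hypothetical sample of battery lifespans in hours (illustrative values only)
sample = [96.0, 101.5, 98.2, 103.1, 99.4, 97.8, 100.6, 102.3]

mean_estimate = statistics.mean(sample)    # point estimate of the population mean
stdev_estimate = statistics.stdev(sample)  # point estimate of the population standard deviation (n - 1 denominator)

print(f"Estimated mean: {mean_estimate:.2f} hours")
print(f"Estimated standard deviation: {stdev_estimate:.2f} hours")
```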
Interval Estimates:
An interval estimate provides a range of values within which a population parameter is expected to lie, based on sample data. It offers
more information than a point estimate by accounting for the variability and uncertainty inherent in the estimation process.
• Confidence Intervals: A confidence interval is a range of values between which the value of the population parameter is believed to be,
along with a probability that the interval correctly estimates the true (unknown) population parameter. This probability is called the
level of confidence, denoted by 1 - α, where α is a number between 0 and 1. The level of confidence is usually expressed as a percent;
common values are 90%, 95%, or 99%. (Note that if the level of confidence is 90%, then α = 0.1.) The margin of error depends on the
level of confidence and the sample size.
• Prediction Intervals: A prediction interval is one that provides a range for predicting the value of a new observation from the same
population. This is different from a confidence interval, which provides an interval estimate of a population parameter, such as the
mean or proportion. A confidence interval is associated with the sampling distribution of a statistic, but a prediction interval is
associated with the distribution of the random variable itself.
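The difference between the two kinds of interval can be sketched as follows. The sketch assumes a roughly normal population and uses the normal critical value z; with small samples, a t critical value would normally replace it. The summary statistics and function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval_mean(mean, sd, n, confidence=0.95):
    """Interval estimate for the population mean (normal approximation)."""
    margin = z_critical(confidence) * sd / math.sqrt(n)
    return mean - margin, mean + margin

def prediction_interval(mean, sd, n, confidence=0.95):
    """Interval for a single new observation from the same population."""
    margin = z_critical(confidence) * sd * math.sqrt(1 + 1 / n)
    return mean - margin, mean + margin

# Hypothetical summary statistics: sample mean, standard deviation, sample size
mean, sd, n = 50.0, 8.0, 64

ci = confidence_interval_mean(mean, sd, n)
pi = prediction_interval(mean, sd, n)
print("95% confidence interval for the mean:", ci)
print("95% prediction interval for a new observation:", pi)
```

The prediction interval is always wider, because it must cover the variability of an individual observation as well as the uncertainty in the estimated mean.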
SAMPLING ERROR
Sampling error is the difference between a sample statistic (e.g., sample mean) and the corresponding
population parameter (e.g., population mean) due to the fact that the sample is only a subset of the
population.
Causes of Sampling Error:
• Random Variation: Natural differences between samples due to randomness.
• Sample Size: Smaller samples tend to have larger sampling errors due to less data representing the
population.
• Sampling Method: Non-random sampling methods can introduce bias, increasing sampling error.
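The effect of sample size on sampling error can be seen in a short simulation. The population below is hypothetical (10,000 values generated from a normal distribution with mean 100 and standard deviation 15), and the helper name is an assumption made for this sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical population of 10,000 values
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def mean_abs_sampling_error(sample_size, repeats=500):
    """Average |sample mean - population mean| over many simple random samples."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

errors_by_size = {size: mean_abs_sampling_error(size) for size in (10, 100, 1000)}
for size, error in errors_by_size.items():
    print(f"n = {size:4d}: average sampling error = {error:.3f}")
```

Because every sample is drawn at random, the error never disappears entirely, but it shrinks as the sample size grows.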
RECAP
1. What is a sampling distribution?
A) A distribution of frequencies of a single sample
B) A probability distribution of a statistic obtained from multiple samples
C) A distribution of the entire population
D) A single value representing the population mean
6. Which sampling method involves dividing the population into strata based on shared characteristics?
A) Simple Random Sampling
B) Systematic Sampling
C) Stratified Sampling
D) Cluster Sampling
HYPOTHESIS TESTING
TYPES OF ERRORS
2. Non-Parametric Tests: Used when some assumptions about the population distribution are uncertain.
a. Chi-Square Test: Assumes simple random sampling, a sample size greater than 50, and independent samples.
HYPOTHESIS TESTING STEPS:
1. Formulate Hypotheses:
o Null Hypothesis (H0): A statement of no effect or no difference, which we aim to test.
o Alternative Hypothesis (H1 or Ha): A statement that there is an effect or a difference.
5. Make a Decision:
o Compare the p-value to the significance level (α).
o If p-value ≤ α, reject the null hypothesis.
o If p-value > α, do not reject the null hypothesis.
ONE-SAMPLE HYPOTHESIS TESTING
A one-sample hypothesis test is used to determine if a sample comes from a
population with a specific mean (or another parameter). It is useful when you
want to compare the sample mean to a known population mean or a
hypothesized value.
Alternative Hypothesis (H₁): The statement you want to test against the null
hypothesis. It suggests that the population mean is different from the
specified value.
Example: H₁: μ ≠ μ₀ (two-tailed test)
H₁: μ > μ₀ (right-tailed test)
H₁: μ < μ₀ (left-tailed test)
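The choice of alternative hypothesis determines which tail of the distribution the p-value is taken from. The sketch below assumes a standardized z statistic (known population standard deviation or a large sample); for a t-test, the t distribution would take the place of the normal. The function name and the `tail` labels are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value for a standardized test statistic under a standard normal reference distribution."""
    cdf = NormalDist().cdf
    if tail == "two":      # H1: mu != mu0
        return 2 * (1 - cdf(abs(z)))
    if tail == "right":    # H1: mu > mu0
        return 1 - cdf(z)
    if tail == "left":     # H1: mu < mu0
        return cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")

for tail in ("two", "right", "left"):
    print(tail, round(p_value(-1.5, tail), 4))
```

For the same test statistic, a one-tailed p-value in the direction of the observed effect is half the two-tailed p-value.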
TWO-TAILED TEST OF HYPOTHESIS FOR MEAN
ONE-SAMPLE HYPOTHESIS TESTING - EXAMPLE
Problem: An engineer wants to test if the average lifespan of a certain type of battery is 100 hours. A sample of 25 batteries has an average
lifespan of 95 hours with a standard deviation of 10 hours. The significance level is 0.05.
3. Collect Data: Sample mean = 95 hours, sample standard deviation = 10 hours, sample size = 25
6. Make a Decision:
1. The test statistic (-2.5) is outside the range of -2.064 to 2.064, the two-tailed critical t-values for α = 0.05 with 24 degrees of freedom.
2. Since the test statistic falls in the rejection region, the p-value < 0.05; reject H₀.
3. Conclusion: There is sufficient evidence to conclude that the average lifespan of the batteries is different from 100 hours.
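The numbers in this example can be checked with a short script. The sketch uses only the Python standard library, so it obtains the t-distribution p-value by numerically integrating the density with Simpson's rule; the helper names are hypothetical, and a statistics library would normally supply this calculation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

mu0 = 100                 # hypothesized mean (hours)
x_bar, s, n = 95, 10, 25  # sample mean, sample standard deviation, sample size
alpha = 0.05

t_stat = (x_bar - mu0) / (s / math.sqrt(n))  # (95 - 100) / (10 / 5) = -2.5
df = n - 1
p = two_tailed_p(t_stat, df)
reject_h0 = p <= alpha

print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}, reject H0: {reject_h0}")
```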
TWO-SAMPLE HYPOTHESIS TESTING
Two-sample hypothesis testing is used to compare the means (or other parameters) of two independent
groups to determine if there is a statistically significant difference between them. This type of test is often
used to compare experimental and control groups, different treatment groups, or any other two
independent samples.
TWO-SAMPLE HYPOTHESIS TESTING - EXAMPLE
Problem: Suppose an engineer wants to compare the mean lifespans of batteries from two different manufacturers. A sample of
30 batteries from Manufacturer A has a mean lifespan of 100 hours with a standard deviation of 5 hours. A sample of 25 batteries
from Manufacturer B has a mean lifespan of 98 hours with a standard deviation of 6 hours. The significance level is 0.05.
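One way to work this problem is Welch's t-test, which does not assume equal variances in the two groups. The slide does not say which variant was intended, so the choice of Welch's test is an assumption of this sketch, as are the helper names. The p-value is again computed by numerical integration of the t density using only the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Summary statistics from the problem statement
mean_a, sd_a, n_a = 100, 5, 30  # Manufacturer A
mean_b, sd_b, n_b = 98, 6, 25   # Manufacturer B
alpha = 0.05

var_a = sd_a ** 2 / n_a  # squared standard error contributed by sample A
var_b = sd_b ** 2 / n_b  # squared standard error contributed by sample B

t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
# Welch-Satterthwaite approximation for the degrees of freedom
df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
p = two_tailed_p(t_stat, df)
reject_h0 = p <= alpha

print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the p-value is well above 0.05, so the data do not show a statistically significant difference between the two manufacturers.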
RECAP
1. What is the null hypothesis (H0) in hypothesis testing?
A) A statement that there is an effect or a difference.
B) A statement that there is no effect or no difference.
C) A statement that the sample mean is different from the population mean.
D) A statement that the alternative hypothesis is true.
5. What is the significance level (alpha) commonly used in hypothesis testing?
A) 0.01
B) 0.05
C) 0.10
D) All of the above
ANALYSIS OF VARIANCE (ANOVA) - EXAMPLE
CHI-SQUARE TEST FOR INDEPENDENCE
The Chi-Square Test of Independence is a statistical method used to determine if there is a significant association between two categorical
variables. Essentially, it tests whether the distribution of one variable is independent of the distribution of the other variable.
CHI-SQUARE TEST FOR INDEPENDENCE - EXAMPLE
A researcher wants to determine if there is an association between smoking status (Smoker, Non-Smoker)
and having a chronic disease (Yes, No).
Data:
              Disease Yes   Disease No   Total
Smoker             30            70       100
Non-Smoker         10            90       100
Total              40           160       200
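The test statistic for this table can be computed directly from the observed counts. This is a minimal sketch without a continuity correction; the p-value shortcut applies only because a 2 × 2 table has one degree of freedom.

```python
import math

observed = [
    [30, 70],  # Smoker: disease yes, disease no
    [10, 90],  # Non-Smoker: disease yes, disease no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1, a chi-square variable is the square of a standard normal one,
# so the upper-tail probability is erfc(sqrt(chi2 / 2)). Valid only for df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.6f}")
```

Each expected count is 20 or 80, giving a chi-square statistic of 12.5. At the 0.05 significance level used elsewhere in these slides, that would lead to rejecting the hypothesis that smoking status and chronic disease are independent.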
RECAP
1. What is the main purpose of ANOVA?
A) To compare the means of two groups.
B) To compare the variances of two groups.
C) To compare the means of three or more groups.
D) To test for independence between categorical variables.
2. In ANOVA, what is the term for the variation within each group?
A) Total variation.
B) Between-group variation.
C) Within-group variation.
D) Residual variation.
3. What is the null hypothesis in a one-way ANOVA test?
A) All group variances are equal.
B) All group means are equal.
C) The samples are dependent.
D) The samples are normally distributed.
4. What is the alternative hypothesis in a one-way ANOVA test?
A) At least one group variance is different.
B) All group means are equal.
C) At least one group mean is different.
D) The samples are normally distributed.
5. What is the primary purpose of the Chi-square test?
A) To compare means between two groups.
B) To compare means between three or more groups.
C) To test for independence between categorical variables.
D) To test for normality of a distribution.
6. What is the null hypothesis in a Chi-square test of independence?
A) The variables are independent.
B) The variables are dependent.
C) The sample is normally distributed.
D) The group means are equal.
7. How is the total sum of squares (SST) calculated in ANOVA?
A) Sum of the squared differences between each group mean and the overall mean.
B) Sum of the squared differences between each data point and the overall mean.
C) Sum of the squared differences between each data point and its group mean.
D) Sum of the squared differences between the group variances.
8. What does the mean square between groups (MSB) represent in ANOVA?
A) Total variation within each group.
B) Total variation between the groups.
C) Average variation within each group.
D) Average variation between the groups.
9. What does the degrees of freedom (df) for the F-test in ANOVA depend on?
A) The number of groups and the total sample size.
B) Only the number of groups.
C) Only the total sample size.
D) The significance level.
10. What is the formula for the degrees of freedom in a Chi-square test of independence?
A) (Number of rows - 1) × (Number of columns - 1)
B) Number of rows × Number of columns
C) (Number of rows + 1) × (Number of columns + 1)
D) Number of rows + Number of columns - 1
THANK YOU
Disclaimer: Views expressed are personal; I do not represent the company I work for
or those I have worked for in the past.