BA2 - Statistical Sampling

Methods, Models, and Decisions,” James R. Evans, Pearson Publications, Second edition
BUSINESS ANALYTICS

Dr. Ramaraju Poosapati


July 2024
DISCLAIMER

COPYRIGHT
The presenter does not claim that any of the content discussed or presented is his own. The references are used for educational purposes only.

INDIVIDUAL VIEWS
The presenter does not represent any organization in which he is working or has worked in the past. The views expressed are personal views.

CREDITS
Picture credits are given to the respective owners. The presenter does not take any credit for the pictures used in this presentation.

JARGONS
Industry-specific jargon is used or will be used during the presentation. If anything is not clear, please interrupt.

www.viswamitra.org| 2
AGENDA
01 RECAP

02 SAMPLING DISTRIBUTION

03 SAMPLING METHODS

04 ESTIMATING POPULATION PARAMETERS

05 SAMPLING ERROR

SAMPLING DISTRIBUTION

A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It shows how often each possible value of the statistic would occur across those samples.

• A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population.
• It describes a range of possible outcomes for a statistic, such as the mean or mode of some variable, of a population.
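
The idea of repeated sampling can be made concrete with a small simulation. The sketch below is illustrative only: the population (a skewed exponential distribution with mean about 50), the sample size of 40, and the 2,000 repetitions are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical, deliberately skewed population (exponential, mean about 50).
population = [random.expovariate(1 / 50) for _ in range(100_000)]


def sample_means(pop, sample_size, num_samples):
    """Draw repeated samples and return the mean of each one."""
    return [
        statistics.fmean(random.sample(pop, sample_size))
        for _ in range(num_samples)
    ]


# The list of sample means approximates the sampling distribution of the mean.
means = sample_means(population, sample_size=40, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print(f"population mean:          {pop_mean:.2f}")
print(f"mean of the sample means: {statistics.fmean(means):.2f}")
print(f"sd of the sample means:   {statistics.stdev(means):.2f}")
print(f"sigma / sqrt(n):          {pop_sd / 40 ** 0.5:.2f}")
```

The sample means should cluster around the population mean, with a spread close to sigma / sqrt(n), even though the underlying population is far from symmetric.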

STATISTICAL SAMPLING: METHODS

Sampling Methods

Subjective Sampling Methods
• Judgment Sampling: Expert judgment is used to select the sample.
• Convenience Sampling: Samples are selected based on ease of access.

Probabilistic Sampling Methods
• Simple Random Sampling: Each item in the population has an equal chance of being selected.
• Systematic (Periodic) Sampling: Selects every nth item from the population.
• Stratified Sampling: The population is divided into natural subsets (strata) based on characteristics (gender, age group, etc.).
• Cluster Sampling: The population is divided into clusters, and a random sample of clusters is selected.
• Continuous Process Sampling:
  o Random Time Selection: Choose a random time and then select the next n items.
  o Random Time Points: Choose n random times and select the next item after each.
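
The probabilistic methods on this slide can be sketched in a few lines. This is a minimal illustration, not a production sampler: the customer records, the `age_group` and `store` fields, and all function names are hypothetical, and the stratified version assumes proportional allocation, which the slide does not specify.

```python
import random

random.seed(7)

# Hypothetical population: 1,000 customers tagged with an age group and a store.
population = [
    {"id": i, "age_group": random.choice(["18-30", "31-50", "51+"]), "store": i % 20}
    for i in range(1000)
]


def simple_random(pop, n):
    """Every item has the same chance of selection."""
    return random.sample(pop, n)


def systematic(pop, n):
    """Pick a random start, then take every step-th item."""
    step = len(pop) // n
    start = random.randrange(step)
    return pop[start::step][:n]


def stratified(pop, n, key):
    """Sample each stratum in proportion to its size (rounding may shift n by a few)."""
    strata = {}
    for item in pop:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, share))
    return sample


def cluster(pop, num_clusters, key):
    """Randomly choose whole clusters and keep every item inside them."""
    clusters = sorted({item[key] for item in pop})
    chosen = set(random.sample(clusters, num_clusters))
    return [item for item in pop if item[key] in chosen]


print(len(simple_random(population, 50)))
print(len(systematic(population, 50)))
print(len(stratified(population, 50, "age_group")))
print(len(cluster(population, 2, "store")))
```

Here cluster sampling takes a census of each chosen store; sampling within the chosen clusters is an equally common variant.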

ESTIMATING POPULATION PARAMETERS

Point Estimate:
A point estimate is a single value derived from sample data that is used to estimate an unknown population parameter. Point estimation is the most commonly used estimation method in statistics.
• Provides a specific numerical value as an approximation of the population parameter.
• Advantages: Easy to calculate.
• Limitations: Lack of precision; a single value says nothing about how far it may be from the true parameter.

Interval Estimates:
An interval estimate provides a range of values within which a population parameter is expected to lie, based on sample data. It offers more information than a point estimate by accounting for the variability and uncertainty inherent in the estimation process.
• Confidence Intervals: A confidence interval is a range of values between which the value of the population parameter is believed to be, along with a probability that the interval correctly estimates the true (unknown) population parameter. This probability is called the level of confidence, denoted by 1 - α, where α is a number between 0 and 1. The level of confidence is usually expressed as a percent; common values are 90%, 95%, or 99%. (Note that if the level of confidence is 90%, then α = 0.1.) The margin of error depends on the level of confidence and the sample size.

• Prediction Intervals: A prediction interval is one that provides a range for predicting the value of a new observation from the same population. This is different from a confidence interval, which provides an interval estimate of a population parameter, such as the mean or proportion. A confidence interval is associated with the sampling distribution of a statistic, but a prediction interval is associated with the distribution of the random variable itself.
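
A short sketch may help contrast the two kinds of interval. It assumes a large sample, so the standard normal quantile is used in place of the t quantile; the sample itself is simulated and hypothetical, and the function name `intervals` is made up for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)

# Hypothetical sample of 200 observations.
sample = [random.gauss(100, 15) for _ in range(200)]


def intervals(data, confidence=0.95):
    """Return (confidence interval for the mean, prediction interval for a new value).

    Large-sample approximation: uses the normal quantile z instead of t.
    """
    n = len(data)
    mean = fmean(data)
    s = stdev(data)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci_margin = z * s / math.sqrt(n)          # uncertainty about the mean
    pi_margin = z * s * math.sqrt(1 + 1 / n)  # uncertainty about one new observation
    return (mean - ci_margin, mean + ci_margin), (mean - pi_margin, mean + pi_margin)


ci, pi = intervals(sample)
print(f"95% confidence interval for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% prediction interval for a value:  ({pi[0]:.1f}, {pi[1]:.1f})")
```

The prediction interval is much wider because it has to cover the spread of individual values, not just the uncertainty in the estimated mean.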
SAMPLING ERROR

Sampling error is the difference between a sample statistic (e.g., sample mean) and the corresponding
population parameter (e.g., population mean) due to the fact that the sample is only a subset of the
population.
Causes of Sampling Error:
• Random Variation: Natural differences between samples due to randomness.
• Sample Size: Smaller samples tend to have larger sampling errors due to less data representing the
population.
• Sampling Method: Non-random sampling methods can introduce bias, increasing sampling error.
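
The effect of sample size on sampling error can be checked with a quick experiment. The population below is simulated and hypothetical, and the helper name `average_sampling_error` is an assumption made for this sketch.

```python
import random
from statistics import fmean

random.seed(11)

# Hypothetical population of 50,000 values (mean near 100, sd near 15).
population = [random.gauss(100, 15) for _ in range(50_000)]
true_mean = fmean(population)


def average_sampling_error(sample_size, repetitions=300):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = [
        abs(fmean(random.sample(population, sample_size)) - true_mean)
        for _ in range(repetitions)
    ]
    return fmean(gaps)


errors = {n: average_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:>4}: average |sample mean - population mean| = {err:.2f}")
```

The average gap should shrink roughly in proportion to 1 / sqrt(n) as the sample grows.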

RECAP

1. What is a sampling distribution?
A) A distribution of frequencies of a single sample
B) A probability distribution of a statistic obtained from multiple samples
C) A distribution of the entire population
D) A single value representing the population mean

2. Which of the following is true about a sampling distribution?
A) It is based on a single sample from the population.
B) It is obtained through repeated sampling of a specific population.
C) It represents the distribution of individual data points.
D) It is always normally distributed regardless of the sample size.

3. What is the main characteristic of subjective sampling methods?
A) Random selection of samples
B) Selection based on expert judgment or convenience
C) Division of the population into strata
D) Use of systematic procedures

4. Which of the following is an example of judgment sampling?
A) Selecting every 10th customer from a list
B) Using expert judgment to choose the "best" customers
C) Randomly selecting names from a database
D) Dividing the population into natural subsets

5. What is a key feature of probabilistic sampling methods?
A) Samples are selected based on ease of access
B) Samples are chosen using random procedures
C) Samples are selected based on expert judgment
D) Samples are selected based on specific criteria known to the expert

6. Which sampling method involves dividing the population into strata based on shared characteristics?
A) Simple Random Sampling
B) Systematic Sampling
C) Stratified Sampling
D) Cluster Sampling

7. In cluster sampling, what is typically done within each sampled cluster?
A) A random sample is taken
B) A complete census is conducted
C) Expert judgment is used to select samples
D) Convenience sampling is used

8. What is a challenge associated with cluster sampling?
A) It is difficult to implement
B) Clusters should be heterogeneous to represent the entire population accurately
C) It requires expert judgment
D) It does not ensure each stratum is represented proportionately

9. Which method involves choosing a random time and then selecting the next n items produced after that time?
A) Simple Random Sampling
B) Systematic Sampling
C) Random Time Selection
D) Random Time Points
HYPOTHESIS TESTING

1. What is Hypothesis Testing


2. Types of Errors
3. Hypothesis Testing Steps
4. Types of Testing and when to use what.

HYPOTHESIS TESTING

Hypothesis testing involves drawing inferences about two contrasting propositions (each called a hypothesis) relating to the value of one or more population parameters (such as the mean, proportion, standard deviation, or variance).
• Null Hypothesis (H0): Represents the existing theory or belief. Accepted as true unless disproven by statistical evidence.
• Alternative Hypothesis (H1): The complement of the null hypothesis. It is true if the null hypothesis is false.

TYPES OF ERRORS

Striking Balance


TYPES OF TESTING:

Types of Hypothesis Testing:


There are two types of hypothesis tests, based on the type of data being analysed:
1. Parametric Test: Used when you have information about the population and can make assumptions about its distribution.
a. T-Test: Small sample size (less than 30), population variance unknown
b. Z-Test: Large sample size (30 or more), population variance known
c. F-Test: Compares the variances of two populations
d. ANOVA: Extension of the t-test to three or more groups

2. Non-Parametric Test: Used when assumptions about the population distribution cannot be relied on.
a. Chi-Square Test: Simple random sample, sample size greater than 50, samples are independent.

HYPOTHESIS TESTING STEPS:

1. Formulate Hypotheses:
o Null Hypothesis (H0): A statement of no effect or no difference, which we aim to test.
o Alternative Hypothesis (H1 or Ha): A statement that there is an effect or a difference.

2. Choose Significance Level (α):


o Common values are 0.05, 0.01, or 0.10. This is the probability of rejecting the null hypothesis when it is true (Type I
error).

3. Select the Test Statistic:


o Depends on the nature of the data and the hypothesis (e.g., t-test, z-test, chi-square test).
o Establish criteria for rejecting or failing to reject H0.

4. Calculate the Test Statistic and p-Value:


o Use the sample data to compute the test statistic.
o The p-value is the probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis is true.

5. Make a Decision:
o Compare the p-value to the significance level (α).
o If p-value ≤ α, reject the null hypothesis.
o If p-value > α, do not reject the null hypothesis.
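
The steps above, and the meaning of α as the Type I error rate, can be sketched with a two-tailed z-test. Everything specific here is an assumption for illustration: the function name, the population (mean 50, known standard deviation 10), the sample size of 36, and the 4,000 simulated studies.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_two_tailed(data, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 when the population sd (sigma) is known."""
    z = (fmean(data) - mu0) / (sigma / math.sqrt(len(data)))  # step 4: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))              # step 4: p-value
    reject = p_value <= alpha                                 # step 5: decision
    return z, p_value, reject


random.seed(3)

# Simulate many studies in which H0 is actually true (the true mean is 50).
trials = 4000
rejections = sum(
    z_test_two_tailed([random.gauss(50, 10) for _ in range(36)], mu0=50, sigma=10)[2]
    for _ in range(trials)
)
type_i_rate = rejections / trials
print(f"share of true null hypotheses rejected: {type_i_rate:.3f}")
```

Because every simulated null hypothesis is true, each rejection is a Type I error, and the observed share should land close to the chosen α of 0.05.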
ONE-SAMPLE HYPOTHESIS TESTING
A one-sample hypothesis test is used to determine if a sample comes from a
population with a specific mean (or another parameter). It is useful when you
want to compare the sample mean to a known population mean or a
hypothesized value.

Null Hypothesis (H₀): The statement being tested, usually a statement of no effect or no difference. For one-sample tests, it often states that the population mean (μ) is equal to a specific value (μ₀).
Example: H₀: μ = μ₀

Alternative Hypothesis (H₁): The statement you want to test against the null
hypothesis. It suggests that the population mean is different from the
specified value.
Example: H₁: μ ≠ μ₀ (two-tailed test)
H₁: μ > μ₀ (right-tailed test)
H₁: μ < μ₀ (left-tailed test)
TWO-TAILED TEST OF HYPOTHESIS FOR MEAN

ONE-SAMPLE HYPOTHESIS TESTING - EXAMPLE
Problem: An engineer wants to test if the average lifespan of a certain type of battery is 100 hours. A sample of 25 batteries has an average
lifespan of 95 hours with a standard deviation of 10 hours. The significance level is 0.05.

1. State the Hypotheses:
   1. H₀: μ = 100 hours
   2. H₁: μ ≠ 100 hours (two-tailed test)

2. Choose the Significance Level: α = 0.05

3. Collect Data: Sample mean = 95 hours, sample standard deviation = 10 hours, sample size = 25

4. Calculate the Test Statistic:
   1. Use the t-test since the population standard deviation is unknown and n < 30.
   2. Formula: t = (x̄ - μ₀) / (s / √n) = (95 - 100) / (10 / √25) = -2.5

5. Determine the P-value or Critical Value:
   1. For a two-tailed test with α = 0.05 and df = n - 1 = 24, the critical t-values are approximately ±2.064.
   2. The p-value can be found using statistical software or tables.

6. Make a Decision:
   1. The test statistic (-2.5) is outside the range of -2.064 to 2.064.
   2. Since the p-value < 0.05, reject H₀.
   3. Conclusion: There is sufficient evidence to conclude that the average lifespan of the batteries is different from 100 hours.
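
The arithmetic of this example can be reproduced with the standard library alone. The sketch below uses the figures from the problem statement; the helper names `t_pdf` and `two_tailed_p` are made up here, and the p-value is obtained by numerically integrating the t density (Simpson's rule), which is an approximation rather than a table or library lookup.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central_area


# Values from the battery example.
sample_mean, mu0, s, n, alpha = 95, 100, 10, 25, 0.05

t_stat = (sample_mean - mu0) / (s / math.sqrt(n))
df = n - 1
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The computed p-value falls below 0.05, which agrees with the critical-value comparison in step 6.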

TWO-SAMPLE HYPOTHESIS TESTING

Two-sample hypothesis testing is used to compare the means (or other parameters) of two independent
groups to determine if there is a statistically significant difference between them. This type of test is often
used to compare experimental and control groups, different treatment groups, or any other two
independent samples.

TWO-SAMPLE HYPOTHESIS TESTING - EXAMPLE

Problem: Suppose an engineer wants to compare the mean lifespans of batteries from two different manufacturers. A sample of
30 batteries from Manufacturer A has a mean lifespan of 100 hours with a standard deviation of 5 hours. A sample of 25 batteries
from Manufacturer B has a mean lifespan of 98 hours with a standard deviation of 6 hours. The significance level is 0.05.
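
The worked solution for this problem is not part of the extracted text, so the following is only one plausible way to carry it out. It assumes a two-tailed test of H₀: μA = μB using Welch's t-test (unequal variances), which may differ from the method the slide used, for example a pooled-variance test. The helper names are hypothetical, and the p-value comes from numerical integration of the t density.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t-test from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_tailed_p(t, df)


# Summary statistics from the problem statement.
t_stat, df, p_value = welch_t_test(100, 5, 30, 98, 6, 25)

print(f"t = {t_stat:.3f}, df = {df:.1f}, p-value = {p_value:.3f}")
print("reject H0" if p_value <= 0.05 else "do not reject H0")
```

Under these assumptions the p-value is well above 0.05, so the sketch does not find a significant difference between the two manufacturers.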

RECAP

1. What is the null hypothesis (H0) in hypothesis testing?
A) A statement that there is an effect or a difference.
B) A statement that there is no effect or no difference.
C) A statement that the sample mean is different from the population mean.
D) A statement that the alternative hypothesis is true.

2. Which of the following is a type I error?
A) Failing to reject a false null hypothesis.
B) Rejecting a false null hypothesis.
C) Failing to reject a true null hypothesis.
D) Rejecting a true null hypothesis.

3. If the p-value is less than the significance level (alpha), what decision should be made?
A) Accept the null hypothesis.
B) Reject the null hypothesis.
C) Accept the alternative hypothesis.
D) Increase the sample size.

4. Which of the following statements is true about the alternative hypothesis (H1)?
A) It is the hypothesis that there is no effect or no difference.
B) It is always accepted when the p-value is high.
C) It is a statement that there is an effect or a difference.
D) It is tested directly during hypothesis testing.

5. What is the significance level (alpha) commonly used in hypothesis testing?
A) 0.01
B) 0.05
C) 0.10
D) All of the above

6. What does it mean if a test is said to be "two-tailed"?
A) The test considers only one direction of effect.
B) The test considers both directions of effect.
C) The test has a higher type I error rate.
D) The test has a higher type II error rate.

7. When should a one-tailed test be used instead of a two-tailed test?
A) When the effect could be in either direction.
B) When the effect is expected to be in a specific direction.
C) When you want to decrease the significance level.
D) When you want to increase the sample size.

8. Which of the following best describes a type II error?
A) Rejecting the null hypothesis when it is true.
B) Failing to reject the null hypothesis when it is false.
C) Rejecting the null hypothesis when it is false.
D) Failing to reject the null hypothesis when it is true.
ANALYSIS OF VARIANCE (ANOVA)
ANOVA is a statistical method used to compare the means of three or more groups to determine if there is a statistically significant
difference between them. Unlike a t-test, which compares the means of two groups, ANOVA can handle multiple groups simultaneously.

If the F-statistic is greater than the critical value, or if the P-value is less than the significance level (α), reject H0.
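
The F-statistic in this decision rule is the ratio of between-group to within-group variation. The sketch below computes it for a one-way ANOVA; the three groups of scores are invented for illustration and are not the data from the slides, and the critical value (about 3.885 for 2 and 12 degrees of freedom at α = 0.05) is taken from a standard F table.

```python
from statistics import fmean

# Hypothetical test scores for three groups (not the data from the slides).
groups = {
    "method_a": [78, 82, 80, 85, 75],
    "method_b": [88, 90, 85, 92, 90],
    "method_c": [70, 75, 72, 68, 75],
}


def one_way_anova(samples):
    """Return the sums of squares, degrees of freedom, and F-statistic."""
    all_values = [x for g in samples for x in g]
    grand_mean = fmean(all_values)
    k, n = len(samples), len(all_values)
    # Between-group variation: group means against the overall mean.
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in samples)
    # Within-group variation: data points against their own group mean.
    ssw = sum((x - fmean(g)) ** 2 for g in samples for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"ssb": ssb, "ssw": ssw, "sst": ssb + ssw, "df": (k - 1, n - k), "f": msb / msw}


result = one_way_anova(list(groups.values()))
F_CRITICAL = 3.885  # table value for alpha = 0.05 with (2, 12) degrees of freedom

print(f"SSB = {result['ssb']:.2f}, SSW = {result['ssw']:.2f}, SST = {result['sst']:.2f}")
print(f"F = {result['f']:.2f} with df = {result['df']}")
print("reject H0" if result["f"] > F_CRITICAL else "do not reject H0")
```

For these made-up scores the F-statistic is far above the critical value, so the null hypothesis of equal group means would be rejected.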

ANALYSIS OF VARIANCE (ANOVA) - EXAMPLE

Conclusion: The one-way ANOVA shows a significant difference between the means of the three teaching methods, indicating that at least one teaching method leads to different test scores compared to the others.

CHI-SQUARE TEST FOR INDEPENDENCE
The Chi-Square Test of Independence is a statistical method used to determine if there is a significant association between two categorical
variables. Essentially, it tests whether the distribution of one variable is independent of the distribution of another variable.

CHI-SQUARE TEST FOR INDEPENDENCE - EXAMPLE
A researcher wants to determine if there is an association between smoking status (Smoker, Non-Smoker)
and having a chronic disease (Yes, No).
Data:
              Disease Yes   Disease No   Total
Smoker             30           70        100
Non-Smoker         10           90        100
Total              40          160        200
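
The test statistic for this table can be computed directly from the observed counts. The sketch below uses the data above; the significance level of 0.05 is an assumption (the slide does not state one for this example), the function name is hypothetical, and the closed-form p-value applies only because a 2 × 2 table has one degree of freedom.

```python
import math

# Observed counts: rows are Smoker / Non-Smoker, columns are Disease Yes / No.
observed = [[30, 70], [10, 90]]


def chi_square_independence(table):
    """Return the chi-square statistic, degrees of freedom, and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


chi2, df, expected = chi_square_independence(observed)

# With df = 1 the statistic is a squared standard normal, so the p-value has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))
CRITICAL_05_DF1 = 3.841  # table value for alpha = 0.05 (assumed), df = 1

print(f"expected counts: {expected}")
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.5f}")
print("reject H0 (independence)" if chi2 > CRITICAL_05_DF1 else "do not reject H0")
```

The statistic exceeds the critical value, so under these assumptions the data point to an association between smoking status and chronic disease.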

CHI-SQUARE TEST FOR INDEPENDENCE - EXAMPLE

RECAP

1. What is the main purpose of ANOVA?
A) To compare the means of two groups.
B) To compare the variances of two groups.
C) To compare the means of three or more groups.
D) To test for independence between categorical variables.

2. In ANOVA, what is the term for the variation within each group?
A) Total variation.
B) Between-group variation.
C) Within-group variation.
D) Residual variation.

3. What is the null hypothesis in a one-way ANOVA test?
A) All group variances are equal.
B) All group means are equal.
C) The samples are dependent.
D) The samples are normally distributed.

4. What is the alternative hypothesis in a one-way ANOVA test?
A) At least one group variance is different.
B) All group means are equal.
C) At least one group mean is different.
D) The samples are normally distributed.

5. What is the primary purpose of the Chi-square test?
A) To compare means between two groups.
B) To compare means between three or more groups.
C) To test for independence between categorical variables.
D) To test for normality of a distribution.

6. What is the null hypothesis in a Chi-square test of independence?
A) The variables are independent.
B) The variables are dependent.
C) The sample is normally distributed.
D) The group means are equal.

7. How is the total sum of squares (SST) calculated in ANOVA?
A) Sum of the squared differences between each group mean and the overall mean.
B) Sum of the squared differences between each data point and the overall mean.
C) Sum of the squared differences between each data point and its group mean.
D) Sum of the squared differences between the group variances.

8. What does the mean square between groups (MSB) represent in ANOVA?
A) Total variation within each group.
B) Total variation between the groups.
C) Average variation within each group.
D) Average variation between the groups.

9. What does the degrees of freedom (df) for the F-test in ANOVA depend on?
A) The number of groups and the total sample size.
B) Only the number of groups.
C) Only the total sample size.
D) The significance level.

10. What is the formula for the degrees of freedom in a Chi-square test of independence?
A) (Number of rows - 1) × (Number of columns - 1)
B) Number of rows × Number of columns
C) (Number of rows + 1) × (Number of columns + 1)
D) Number of rows + Number of columns - 1
THANK YOU
Disclaimer: Views expressed are personal, and I don’t represent the company that I work for or those that I worked for in the past.

Dr. Ramaraju Poosapati


https://www.linkedin.com/in/rampoosapati
