Unit 1
1. A) “All the statistical data are numerical facts but all the numerical facts are not statistical data.” Comment on the above statement.
This statement is correct and highlights an important distinction in statistics:
All statistical data are indeed numerical facts. Statistics deals with collecting, analyzing, interpreting, and
presenting numerical data.
However, not all numerical facts are statistical data. Some numerical facts are simply individual
measurements or counts that don't necessarily represent a larger dataset or population.
For example:
Statistical data: The average height of students in a class (derived from multiple measurements).
Numerical fact (not statistical data): The height of a single student.
B) Discuss briefly the role of statistics in decision making
2. Explain any two of the following with examples: i) Retrospective study ii) Observational study iii) Designed experiment
i) Retrospective study: A retrospective study looks back in time, examining data from the past to investigate a
particular outcome or condition.
Example: Researchers want to study the relationship between smoking and lung cancer. They identify a group of lung
cancer patients and a control group without lung cancer, then look back at their smoking histories to compare the
prevalence of smoking in each group.
ii) Observational study: An observational study involves collecting data on a group of subjects without any
intervention or manipulation of variables by the researcher.
Example: A nutritionist wants to study the relationship between coffee consumption and sleep quality. They recruit
participants and ask them to record their daily coffee intake and sleep patterns over a month, without suggesting any
changes to their habits.
3) What is statistical theory? Explain with a practical quote how it is important in Industrial Engineering.
Statistical theory is the branch of applied mathematics that deals with the collection, analysis, interpretation, and
presentation of quantitative data. It provides the theoretical foundation for statistical methods used in various fields,
including industrial engineering.
Importance in Industrial Engineering - Practical Example:
Consider a manufacturing plant producing electronic components. The quality control department needs to ensure that
the diameter of a crucial part falls within specified tolerances. They use statistical theory in several ways:
1. Sampling Theory: Instead of inspecting every part (which would be time-consuming and costly), they use
sampling theory to determine how many parts to inspect to achieve a desired level of confidence in their
quality assessment.
2. Hypothesis Testing: They might test whether the mean diameter of the parts is significantly different from the
target specification.
3. Control Charts: Using principles from statistical theory, they implement control charts to monitor the production process over time, detecting when the process might be going out of control (a short sketch follows this answer).
4. Process Capability Analysis: They use statistical distributions to assess whether the manufacturing
process is capable of consistently producing parts within the specified tolerances.
5. Design of Experiments: When trying to optimize the manufacturing process, they use statistical
theory to design experiments that efficiently explore the effects of various factors on the part
diameter.
6. Reliability Analysis: They apply statistical theory to predict the failure rates and lifetimes of the
components.
Practical Quote: "By applying control charts based on statistical theory, we reduced our defect rate by 30%
and increased our production efficiency by 15% within six months."
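To make point 3 concrete, here is a minimal Python sketch (assuming NumPy is available) of how X̄ control-chart limits might be computed; the subgroup data are simulated purely for illustration, and A2 = 0.577 is the standard chart constant for subgroups of size 5.

```python
import numpy as np

# Simulated data: 20 subgroups of 5 diameter measurements (mm), illustrative only
rng = np.random.default_rng(42)
subgroups = rng.normal(loc=10.0, scale=0.05, size=(20, 5))

xbar = subgroups.mean(axis=1)            # subgroup means
rbar = np.ptp(subgroups, axis=1).mean()  # average subgroup range (R-bar)
grand_mean = xbar.mean()                 # center line (X-double-bar)

A2 = 0.577  # standard X-bar chart constant for subgroups of size n = 5
ucl = grand_mean + A2 * rbar  # upper control limit
lcl = grand_mean - A2 * rbar  # lower control limit

print(f"Center line: {grand_mean:.4f} mm, UCL: {ucl:.4f} mm, LCL: {lcl:.4f} mm")
# Subgroup means outside (lcl, ucl) signal the process may be out of control
print("Out-of-control subgroups:", np.where((xbar > ucl) | (xbar < lcl))[0])
```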
Unit 3
3. Define the following distributions: (i) Chi-Square distribution (ii) Weibull distribution
(i) Chi-Square distribution: The Chi-Square distribution is a continuous probability distribution of the sum of squares
of k independent standard normal random variables. It's often used in hypothesis testing and constructing confidence
intervals.
(ii) Weibull distribution: The Weibull distribution is a continuous probability distribution used to model the time until
an event occurs or the lifetime of a product. It's particularly useful in reliability engineering and survival analysis due
to its flexibility in representing different types of failure rates.
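A short Python sketch (assuming NumPy and SciPy are available; all numbers are illustrative) can make both definitions concrete: it simulates the sum of squares of k standard normals to recover the Chi-Square mean, and shows how the Weibull shape parameter produces decreasing, constant, or increasing failure rates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# (i) Chi-Square: sum of squares of k independent standard normal variables
k = 4
sums = (rng.standard_normal(size=(100_000, k)) ** 2).sum(axis=1)
print("Simulated mean:", sums.mean())                # close to k = 4
print("Theoretical mean:", stats.chi2(df=k).mean())  # exactly 4

# (ii) Weibull: the shape parameter c controls the failure (hazard) rate
t = np.array([0.5, 1.0, 2.0])
for c in (0.5, 1.0, 2.0):  # decreasing, constant, increasing hazard
    d = stats.weibull_min(c)
    print(f"c={c}: hazard f(t)/S(t) at t={t} ->", (d.pdf(t) / d.sf(t)).round(3))
```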
4. List reasons for having good estimators in statistics. Also explain any two properties.
Reasons for having good estimators:
1. Accurate parameter estimation
2. Reliable decision-making
3. Efficient use of data
4. Minimizing errors in inference
5. Consistency across different samples
Two important properties of good estimators:
a) Unbiasedness: An estimator is unbiased if its expected value equals the true population parameter. This means it
doesn't systematically over- or underestimate the parameter.
b) Efficiency: An efficient estimator has the smallest variance among all unbiased estimators for a given parameter.
This means it provides the most precise estimate possible with the available data.
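The unbiasedness property can be demonstrated with a small simulation; the sketch below (Python with NumPy, simulated data for illustration) compares the variance estimator with divisor n, which is biased, against the divisor n − 1 version, whose average over many samples matches the true variance.

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0  # population variance of N(0, 2^2)
n, trials = 10, 50_000

samples = rng.normal(0.0, 2.0, size=(trials, n))
biased = samples.var(axis=1, ddof=0)    # divisor n: biased estimator
unbiased = samples.var(axis=1, ddof=1)  # divisor n - 1: unbiased estimator

print("Mean of biased estimator:  ", biased.mean())    # ~ (n-1)/n * 4 = 3.6
print("Mean of unbiased estimator:", unbiased.mean())  # ~ 4.0 = true variance
```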
Unit 4
1) Distinguish between Point Estimate and Interval Estimate with Example.
Point Estimate: A point estimate is a single value used to estimate a population parameter.
Example: The sample mean (x̄) is used as a point estimate for the population mean (μ).
Interval Estimate: An interval estimate is a range of values likely to contain the population parameter.
Example: A 95% confidence interval for the population mean might be (x̄ - 1.96SE, x̄ + 1.96SE), where SE is the
standard error.
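The sketch below (Python with NumPy; the sample is simulated purely for illustration) computes both estimates from the same data using exactly the x̄ ± 1.96·SE formula above.

```python
import numpy as np

# Hypothetical sample of 40 height measurements (cm)
rng = np.random.default_rng(7)
x = rng.normal(loc=170, scale=8, size=40)

xbar = x.mean()                       # point estimate of mu
se = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se  # 95% interval estimate

print(f"Point estimate: {xbar:.2f} cm")
print(f"95% confidence interval: ({lo:.2f}, {hi:.2f}) cm")
```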
Key Differences:
1. Precision: Point estimates provide a precise value but don't convey uncertainty. Interval estimates show a
range, indicating the level of uncertainty.
2. Confidence: Interval estimates are associated with a confidence level, while point estimates are not.
3. Interpretation: A point estimate says "our best guess is this value," while an interval estimate says "we're X%
confident the true value lies in this range."
2. With suitable examples explain Type I error, Type II error, Null Hypothesis and Alternative Hypothesis.
Null Hypothesis (H0): A statement assuming no effect or no difference in the population. Example: The average height
of adult males is 170 cm.
Alternative Hypothesis (H1 or Ha): A statement that contradicts the null hypothesis. Example: The average height of
adult males is not 170 cm.
Type I error: Rejecting the null hypothesis when it is actually true. Example: Concluding that the average male height
is different from 170 cm when it actually is 170 cm.
Type II error: Failing to reject the null hypothesis when it is actually false. Example: Concluding that the average male
height is 170 cm when it is actually different.
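A simulation makes the Type I error rate tangible: if H0 is true (heights really are centered at 170 cm) and we test at α = 0.05, we should wrongly reject about 5% of the time. The sketch below assumes NumPy and SciPy are available; the data are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, trials, n = 0.05, 10_000, 30

# H0 is true: heights really come from N(170, 7^2)
rejections = 0
for _ in range(trials):
    sample = rng.normal(170, 7, size=n)
    _, p = stats.ttest_1samp(sample, popmean=170)
    if p < alpha:
        rejections += 1  # each rejection here is a Type I error

print("Observed Type I error rate:", rejections / trials)  # ~ 0.05 = alpha
```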
3. Discuss Null Hypothesis, One-sided Alternative Hypothesis, and Two-sided Alternative Hypothesis with examples.
1. Null Hypothesis (H0): The null hypothesis typically represents the status quo or no effect. It's the hypothesis
we aim to test against.
2. One-sided (or One-tailed) Alternative Hypothesis (H1): This type of alternative hypothesis specifies a
direction of the effect.
3. Two-sided (or Two-tailed) Alternative Hypothesis (H1): This type of alternative hypothesis doesn't specify a
direction, just that there is a difference.
Example: Let's consider a study on a new teaching method's effect on student test scores. The population mean score
with the traditional method is 70.
Null Hypothesis (H0): The new teaching method has no effect on test scores. H0: μ = 70
One-sided Alternative Hypothesis (H1): The new teaching method increases test scores. H1: μ > 70
Two-sided Alternative Hypothesis (H1): The new teaching method affects test scores (could increase or decrease). H1:
μ ≠ 70
Key Differences:
1. Direction: One-sided specifies a direction, two-sided doesn't.
2. Critical Region: One-sided tests have the critical region on only one side of the distribution, while two-sided
tests split it between both tails.
3. P-value: For the same data, a one-sided test has half the p-value of a two-sided test when the observed effect lies in the hypothesized direction (see the sketch after this list).
4. Power: One-sided tests are more powerful when the effect is in the hypothesized direction.
5. Usage: Two-sided tests are more common in research, as they're more conservative and don't assume a
direction of effect.
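The sketch below (Python with SciPy 1.6+ for the `alternative` argument; the scores are simulated for illustration) shows the halving of the p-value from point 3 directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
scores = rng.normal(73, 10, size=50)  # hypothetical scores under the new method

# Two-sided test: H1: mu != 70
_, p_two = stats.ttest_1samp(scores, popmean=70, alternative='two-sided')
# One-sided test: H1: mu > 70
_, p_one = stats.ttest_1samp(scores, popmean=70, alternative='greater')

print(f"two-sided p = {p_two:.4f}, one-sided p = {p_one:.4f}")
# With the sample mean above 70, the one-sided p-value is half the two-sided one
```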
Unit 5
1) Define the method of moments and the method of maximum likelihood used in estimation of parameters.
Method of Moments: The method of moments is a technique for constructing estimators of the parameters by equating
sample moments with unobservable population moments and then solving these equations for the parameters to be
estimated.
Process:
1. Express the population moments in terms of the parameters.
2. Equate these to the corresponding sample moments.
3. Solve the resulting equations for the parameters.
Method of Maximum Likelihood: The maximum likelihood method finds the values of the parameters that maximize
the likelihood function, which expresses the probability of observing the given sample as a function of the unknown
parameters.
Process:
1. Write the likelihood function based on the probability distribution and observed data.
2. Take the logarithm of the likelihood function (log-likelihood).
3. Find the parameter values that maximize this log-likelihood by setting its derivatives to zero.
Key Differences:
1. Efficiency: Maximum likelihood estimators are generally more efficient (have smaller variance).
2. Complexity: Method of moments is often simpler computationally, especially for complex distributions.
3. Assumptions: Maximum likelihood requires the full form of the underlying distribution to be specified, while the method of moments only requires expressions for its moments in terms of the parameters.
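A sketch comparing the two methods on the same simulated data follows (Python with NumPy and SciPy; the Gamma distribution and all numbers are illustrative, chosen because its method-of-moments estimates have closed form while its maximum-likelihood estimates need numerical optimization).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
data = rng.gamma(shape=2.0, scale=3.0, size=5_000)

# Method of moments for Gamma(shape k, scale theta):
#   E[X] = k*theta and Var[X] = k*theta^2  =>  k = xbar^2/s^2, theta = s^2/xbar
xbar, s2 = data.mean(), data.var(ddof=1)
k_mom, theta_mom = xbar**2 / s2, s2 / xbar

# Maximum likelihood: scipy maximizes the log-likelihood numerically
# (location parameter fixed at 0 so only shape and scale are estimated)
k_mle, _, theta_mle = stats.gamma.fit(data, floc=0)

print(f"MoM: shape = {k_mom:.3f}, scale = {theta_mom:.3f}")
print(f"MLE: shape = {k_mle:.3f}, scale = {theta_mle:.3f}")
```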
6. Define the following terms: i) Factor ii) ANOVA iii) Coefficient of determination iv) Types of data analytics
i) Factor: A variable in an experiment that can be manipulated or controlled to study its effect on the outcome.
ii) ANOVA (Analysis of Variance): A statistical method used to analyze the differences among group means in a
sample.
iii) Coefficient of determination (R²): A statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). A short computation sketch follows this list.
iv) Types of data analytics:
Descriptive analytics: Summarizing what happened
Diagnostic analytics: Explaining why it happened
Predictive analytics: Forecasting what might happen
Prescriptive analytics: Recommending actions to take
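For item (iii), the sketch below (Python with NumPy; data simulated for illustration) computes R² directly from its definition, R² = 1 − SS_res/SS_tot, for a least-squares line.

```python
import numpy as np

rng = np.random.default_rng(13)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2, size=50)  # linear trend plus noise

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
y_hat = slope * x + intercept

ss_res = ((y - y_hat) ** 2).sum()     # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares
r2 = 1 - ss_res / ss_tot              # coefficient of determination

print(f"R^2 = {r2:.3f}")  # proportion of variance in y explained by x
```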
Unit 2
1) With a practical quote, explain how a hypergeometric distribution can be used in statistics
The hypergeometric distribution is used in sampling without replacement from a finite population where each sample
element is classified into one of two mutually exclusive categories.
Practical Example: Imagine a quality control scenario in a factory producing electronic components. A batch of 1000
components contains 50 defective units. The quality control team randomly selects 100 components for inspection.
Question: What is the probability of finding exactly 5 defective components in this sample?
This scenario follows a hypergeometric distribution because:
1. The population (batch) is finite (1000 components).
2. Sampling is done without replacement (each selected component isn't returned before the next is drawn).
3. Each component is classified into one of two categories (defective or non-defective).
The probability can be calculated using the hypergeometric probability mass function:
P(X = k) = [C(K,k) * C(N-K,n-k)] / C(N,n)
Where:
N = total population size (1000)
K = number of success states in the population (50 defective)
n = number of draws (100)
k = number of observed successes (5)
C(a, b) = the number of ways to choose b items from a set of a items
This example demonstrates how the hypergeometric distribution can be applied in real-world quality control situations
to assess the likelihood of specific sampling outcomes.
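The probability itself can be evaluated with SciPy's hypergeometric distribution (assuming SciPy is available; note that SciPy's argument order differs from the N, K, n, k notation above).

```python
from scipy import stats

# N = 1000 components, K = 50 defective, n = 100 drawn, k = 5 defective observed
N, K, n, k = 1000, 50, 100, 5

# SciPy's hypergeom takes (total M, number of successes n, number of draws N)
p = stats.hypergeom.pmf(k, N, K, n)
print(f"P(X = 5) = {p:.4f}")
```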
B) With a good practical quote, discuss the relevance of the use of Confidence Intervals in Decision Making.
Confidence intervals are crucial in decision making as they provide a range of plausible values for a population
parameter, along with a measure of certainty.
Practical Example: A pharmaceutical company is developing a new drug to lower blood pressure. In a clinical trial
with 1000 patients, they find that the drug lowers systolic blood pressure by an average of 10 mmHg, with a 95%
confidence interval of (8 mmHg, 12 mmHg).
Relevance in Decision Making:
1. Precision Assessment: The width of the interval (4 mmHg) indicates the precision of the estimate. A narrower
interval would suggest more precise results.
2. Clinical Significance: If the lower bound (8 mmHg) is still considered clinically significant, decision-makers
can be more confident in the drug's effectiveness.
3. Risk Management: The upper bound (12 mmHg) helps assess the potential maximum effect, which is
important for safety considerations.
4. Comparison with Existing Treatments: If current treatments typically lower blood pressure by 7 mmHg, the
entire confidence interval being above this value strengthens the case for the new drug.
5. Future Research: If the interval is too wide for decisive action, it informs the need for larger studies.
6. Regulatory Decisions: Regulatory bodies might require the lower bound of the confidence interval to be above
a certain threshold for approval.
7. Investment Decisions: Investors can use this information to assess the potential market value and risks associated with the drug.