SML - Question Bank-20.2.25
Q. No. Question
1 Define and explain central tendency. Explain its importance in manufacturing. (8 marks)
Central Tendency Definition: A statistical measure that identifies the typical or
central value in a dataset, representing where data tends to cluster.
Three Main Measures:
1. Mean (Average)
Sum of values divided by count
Used for normally distributed data
Example: Average diameter of parts
2. Median (Middle)
Middle value when data is ordered
Best when outliers present
Example: Middle value of component weights
3. Mode (Most Frequent)
Most common value
Used for categorical data
Example: Most common defect type
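A small Python sketch of how the three measures behave. The component weights (in grams) and the defect labels are made-up values, not data from the question. Adding one faulty, much heavier part pulls the mean up, while the median barely moves. The mode simply picks the most frequent category.

```python
from statistics import mean, median, mode

# Hypothetical component weights in grams (illustrative, not from the source)
weights = [200, 202, 198, 201, 199]
# The same parts plus one faulty, much heavier part (an outlier)
with_outlier = weights + [260]

print(f"mean={mean(weights):.1f}, median={median(weights):.1f}")            # mean=200.0, median=200.0
print(f"mean={mean(with_outlier):.1f}, median={median(with_outlier):.1f}")  # mean=210.0, median=200.5

# Mode also works on categorical data such as defect types (hypothetical labels)
defects = ["scratch", "dent", "scratch", "crack", "scratch", "dent"]
print(mode(defects))  # scratch
```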
Importance in Manufacturing:
1. Quality Control
Setting specifications
Monitoring processes
Detecting deviations
Measuring consistency
2. Process Optimization
Setting target values
Adjusting machines
Reducing variation
Improving efficiency
3. Decision Making
Production planning
Maintenance scheduling
Resource allocation
Product specifications
4. Cost Management
Reducing waste
Controlling inventory
Optimizing materials
Planning production
Practical Benefits:
Improved product quality
Reduced defects
Lower costs
Better process control
Enhanced customer satisfaction
More efficient operations
This understanding helps manufacturers maintain quality while optimizing their
processes and reducing costs.
2 Calculate the mean, median, mode, and mid-range for the given dataset. (8 marks)
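The question's dataset is not reproduced here. This sketch therefore uses a hypothetical list of values purely to show the four calculations. The mid-range is the average of the smallest and largest values.

```python
from statistics import mean, median, mode


def central_summary(data):
    """Return mean, median, mode and mid-range of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        # Mid-range: average of the smallest and largest values
        "mid-range": (min(data) + max(data)) / 2,
    }


# Hypothetical dataset, used only to demonstrate the arithmetic
summary = central_summary([4, 8, 6, 5, 3, 8, 9])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For this list the sketch reports a mean of about 6.14 (43 / 7), a median of 6, a mode of 8 and a mid-range of 6.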
Descriptive vs. Inferential Statistics
Definition:
Descriptive: Summarizes and describes the main features of a dataset.
Inferential: Makes predictions or inferences about a population based on a sample.
Techniques:
Descriptive: Measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation).
Inferential: Hypothesis testing, confidence intervals, regression, ANOVA, t-tests.
T-Test vs. ANOVA
T-Test: Compares the means of two groups. Example: pre- and post-training test scores. Applications: clinical trials, education analysis.
ANOVA: Compares the means of multiple groups. Example: test scores of students across three schools. Applications: agriculture, education, quality control.
4. Interquartile Range (IQR): The range between the 25th percentile (Q1)
and the 75th percentile (Q3), capturing the spread of the middle 50% of
the data.
These measures reveal data variability and distribution, aiding analysis and
decision-making.
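The IQR can be computed with Python's standard `statistics.quantiles`. This is a sketch on hypothetical measurements. Textbooks use several quartile rules, so the `method="inclusive"` choice here is an assumption, and a hand calculation may give slightly different quartiles.

```python
from statistics import quantiles


def interquartile_range(data):
    """Return (Q1, Q3, IQR) using linear interpolation between sorted values."""
    # method="inclusive" treats the minimum and maximum as the 0th and
    # 100th percentiles and interpolates between neighbouring sorted values
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Hypothetical measurements
q1, q3, iqr = interquartile_range([2, 4, 4, 5, 7, 9, 10, 12])
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")  # Q1 = 4.0, Q3 = 9.25, IQR = 5.25
```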
13 What is hypothesis testing? Elaborate on the steps of hypothesis testing and the types of hypothesis tests. (7 marks)
Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to reject a hypothesis about a population (the null hypothesis) in favor of an alternative hypothesis. The process involves:
1. Stating Hypotheses: Define the null (H₀) and alternative (H₁)
hypotheses.
2. Choosing Significance Level (α): Set a threshold (e.g., 0.05) for
rejecting the null hypothesis.
3. Selecting the Test: Choose the appropriate test (e.g., t-test, chi-square) for the type of data and the hypotheses being tested.
4. Analyzing Data: Compute the test statistic and p-value.
5. Making a Decision: Reject the null hypothesis if p ≤ α, otherwise fail to
reject it.
6. Conclusion: Based on the test result, conclude whether there is
sufficient evidence to support the alternative hypothesis.
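The six steps can be traced in a short Python sketch of a one-sample, right-tailed z-test. It assumes hypothetical product lifetimes and a known population standard deviation (SIGMA = 5 hours, an assumption a z-test requires). The normal CDF comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist, mean

# Hypothetical product lifetimes in hours (illustrative data only)
lifetimes = [102, 105, 99, 108, 104, 101, 106, 103, 100, 107]

# Step 1: H0: mu = 100 hours, H1: mu > 100 hours
MU_0 = 100
# Step 2: significance level
ALPHA = 0.05
# Step 3: one-sample z-test, which needs a known population SD (ASSUMED here)
SIGMA = 5

# Step 4: test statistic and right-tailed p-value
n = len(lifetimes)
z = (mean(lifetimes) - MU_0) / (SIGMA / math.sqrt(n))
p = 1 - NormalDist().cdf(z)

# Step 5: decision rule, reject H0 if p <= alpha
decision = "reject H0" if p <= ALPHA else "fail to reject H0"

# Step 6: conclusion
print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```

With these numbers z is about 2.21 and p about 0.013, so H0 would be rejected at α = 0.05. If σ were unknown, a t-test on the sample standard deviation would be the usual choice instead.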
Types of Hypothesis Testing:
1. One-tailed vs. Two-tailed: A one-tailed test checks whether a parameter is greater than (or less than) a specified value. A two-tailed test checks whether it differs from that value in either direction.
E.g., One-tailed: A company claims that its new product lasts longer than 100 hours. A one-tailed test checks whether the mean lifetime is greater than 100 hours.
Two-tailed: A university claims that its new teaching method improves scores, but you want to test whether it produces an average score that is either higher or lower than the previous method's (i.e., different from 75%).
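The practical difference between the two cases shows up in the p-value. This is a minimal sketch for a z statistic. The value z = 1.8 is hypothetical, and the helper name `z_p_value` is illustrative.

```python
from statistics import NormalDist


def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic for a right-, left- or two-tailed test."""
    cdf = NormalDist().cdf
    if alternative == "greater":      # one-tailed, H1: parameter > value
        return 1 - cdf(z)
    if alternative == "less":         # one-tailed, H1: parameter < value
        return cdf(z)
    if alternative == "two-sided":    # two-tailed, H1: parameter != value
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.8  # hypothetical test statistic
one_tailed = z_p_value(z, "greater")
two_tailed = z_p_value(z, "two-sided")
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

At α = 0.05 the same statistic is significant one-tailed (p ≈ 0.036) but not two-tailed (p ≈ 0.072). This is why the direction of H1 must be fixed before looking at the data.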
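2. One-sample vs. Two-sample: A one-sample test compares a single sample with a hypothesized population value (e.g., whether the mean product life exceeds 100 hours). A two-sample test compares two groups with each other (e.g., average scores under the old and new teaching methods).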
The table includes all the necessary components for calculating the Z-score:
Data Point (X): The individual values in the dataset.
Mean (μ): The average of all the data points.
Deviation from Mean (X - μ): The difference between each data point
and the mean.
Standard Deviation (σ): A measure of the spread of the data.
Z-Score (Z): The standardized score representing how many standard deviations a data point is from the mean, Z = (X - μ) / σ.
Interpretation of Z-Scores:
A positive Z-score indicates that the data point is above the mean.
A negative Z-score indicates that the data point is below the mean.
The magnitude of the Z-score indicates how far the data point is from
the mean in terms of standard deviations.
Example:
The data point 3 has a Z-score of -1.43, meaning it is 1.43 standard
deviations below the mean.
The data point 21 has a Z-score of 1.38, meaning it is 1.38 standard deviations
above the mean.
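The table columns above translate directly into code. The original dataset is not reproduced here, so this sketch uses hypothetical values. It also assumes σ is the population standard deviation (`statistics.pstdev`). Using the sample SD (`stdev`) would give slightly smaller Z magnitudes.

```python
from statistics import mean, pstdev


def z_scores(data):
    """Z = (X - mu) / sigma for every point, using the population SD."""
    mu = mean(data)
    sigma = pstdev(data)
    return [(x - mu) / sigma for x in data]


data = [12, 15, 18, 20, 25, 30]  # hypothetical values, mean = 20
for x, z in zip(data, z_scores(data)):
    print(f"X = {x:>2}  Z = {z:+.2f}")
```

Points below the mean get negative scores and points above it positive ones. The Z-scores of a dataset always sum to zero.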
17 What is a Naive Bayes classifier? Explain it with an example and give its applications. (7 marks)
A Naive Bayes classifier is a probabilistic machine learning algorithm based on
Bayes' theorem with a strong (naive) independence assumption between the
features. It is a popular choice for classification tasks due to its simplicity and
effectiveness, especially in text classification.
1. Bayes' Theorem:
o The core of the algorithm lies in Bayes' theorem, which calculates the probability of a class given the observed features:
P(Class|Features) = (P(Features|Class) * P(Class)) / P(Features)
o Where:
P(Class|Features): Posterior probability (probability of the class given the features)
P(Features|Class): Likelihood (probability of the features given the class)
P(Class): Prior probability (probability of the class)
P(Features): Evidence (probability of the features)
2. Naive Independence Assumption:
o The "naive" part comes from the assumption that all features are independent of each other given the class. This simplifies the calculation of the likelihood:
P(Features|Class) = P(Feature1|Class) * P(Feature2|Class) * ... * P(FeatureN|Class)
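As an example, here is a spam filter sketched from scratch as a multinomial Naive Bayes with Laplace (add-one) smoothing, one common variant of the classifier. The six training messages are hypothetical. It works in log space, so the product of per-word likelihoods becomes a sum. P(Features) is dropped because it is the same for every class.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set (illustrative only): (message, label)
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a free prize", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the class maximizing log P(Class) + sum of log P(word|Class)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(likelihood)  # naive independence: add the logs
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("win free money", *model))           # spam
print(predict("meeting report tomorrow", *model))  # ham
```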
Applications:
Text Classification:
o Spam filtering
o Sentiment analysis
o Topic categorization
o Document classification
Image Recognition:
o Object recognition
o Facial recognition
Medical Diagnosis:
o Disease prediction
o Diagnosis of medical conditions
Advantages:
Simple and easy to implement
Fast training and prediction
Effective with high-dimensional data
Disadvantages:
Naive independence assumption may not always hold true
Can be sensitive to irrelevant features
18 Find the line of regression for the following data: X = [2, 4, 6, 8], Y = [3, 5, 7, 10]. (7 marks)
x y x^2 xy
2 3 4 6
4 5 16 20
6 7 36 42
8 10 64 80
Calculate the sums of each column:
∑x = 20
∑y = 25
∑x^2 = 120
∑xy = 148
Calculate the slope (m) and intercept (b) using the following formulas:
m = (n * ∑xy - ∑x * ∑y) / (n * ∑x^2 - (∑x)^2)
b = (∑y - m * ∑x) / n
Where n is the number of data points (in this case, n = 4).
Substitute the column sums into the formulas:
m = (4 * 148 - 20 * 25) / (4 * 120 - 20^2) = (592 - 500) / (480 - 400) = 92 / 80 = 1.15
b = (25 - 1.15 * 20) / 4 = (25 - 23) / 4 = 0.5
Write the equation of the regression line:
The equation of the regression line is y = 1.15x + 0.5.
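The same least-squares formulas can be checked with a few lines of Python. This is a minimal sketch that applies the m and b formulas above to the question's data.

```python
def least_squares(xs, ys):
    """Slope m and intercept b from the least-squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = least_squares([2, 4, 6, 8], [3, 5, 7, 10])
print(f"y = {m:.2f}x + {b:.2f}")  # y = 1.15x + 0.50
```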
19 Explain the significance of the Chi-Square test in industrial engineering applications. (6 marks)
Quality Control: A core function of industrial engineering is optimizing quality in manufacturing processes. The Chi-Square test of independence checks whether two categorical variables are related, for example, whether certain machines are associated with higher defect rates.
Process Improvement: If the test shows a relationship, it signals a need to
investigate why a specific machine might be producing more defects. This
could lead to:
Machine maintenance or adjustments
Operator training
Redesign of the manufacturing process
Resource Allocation: The results can inform decisions on how to best allocate
resources (maintenance, operator time, etc.) to different machines.
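The machine-versus-defect check can be sketched as a chi-square test of independence in plain Python. The counts below are hypothetical and not from the source. With three machines and two outcomes, df = (3 - 1)(2 - 1) = 2. For exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-χ²/2), so no statistics library is needed. Other df values would need a table or a library such as SciPy.

```python
import math

# Hypothetical weekly counts (not from the source): [defective, non-defective]
OBSERVED = {
    "Machine A": [10, 190],
    "Machine B": [25, 175],
    "Machine C": [15, 185],
}
CRITICAL_5PCT_DF2 = 5.991  # chi-square critical value, alpha = 0.05, df = 2


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand  # E = row total * column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence(OBSERVED)
print(f"chi2 = {chi2:.3f}, df = {df}")
if df == 2:
    # Only for df = 2: the upper-tail probability is exactly exp(-x / 2)
    print(f"p = {math.exp(-chi2 / 2):.4f}")
print("machine and defects related" if chi2 > CRITICAL_5PCT_DF2 else "no evidence of a relationship")
```

For these made-up counts χ² is about 7.64 (p ≈ 0.022). That would flag Machine B's higher defect count for investigation.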
20 Chi-Square test
A manufacturing company wants to assess whether there's a relationship
between the type of machine used in a production process (Machine A,
Machine B, Machine C) and the occurrence of defects. Data collected over a
week is as follows:
Worker Old Process New Process
1 12 14
2 10 13
3 11 12
4 13 15
5 10 11
6 12 13
Let's perform a paired t-test to determine if the new process results in a
significantly higher production rate than the old process.
Assumptions:
The data is paired, meaning the production rates for each worker are
measured under both the old and new processes.
The differences in production rates are normally distributed.
Hypothesis:
Null Hypothesis (H0): The mean difference in production rates
between the new and old processes is zero (i.e., no significant
difference).
Alternative Hypothesis (H1): The mean difference in production rates between the new and old processes is greater than zero (i.e., the new process yields a higher production rate).
Calculations:
1. Calculate the difference in production rates for each worker:
o Worker 1: 14 - 12 = 2
o Worker 2: 13 - 10 = 3
o Worker 3: 12 - 11 = 1
o Worker 4: 15 - 13 = 2
o Worker 5: 11 - 10 = 1
o Worker 6: 13 - 12 = 1
2. Calculate the mean and standard deviation of the differences:
o Mean: (2 + 3 + 1 + 2 + 1 + 1) / 6 = 1.67
3. Calculate the squared deviations from the mean:
Worker 1: (2 - 1.67)^2 = 0.1089
Worker 2: (3 - 1.67)^2 = 1.7689
Worker 3: (1 - 1.67)^2 = 0.4489
Worker 4: (2 - 1.67)^2 = 0.1089
Worker 5: (1 - 1.67)^2 = 0.4489
Worker 6: (1 - 1.67)^2 = 0.4489
4. Calculate the sum of squared deviations:
Sum = 0.1089 + 1.7689 + 0.4489 + 0.1089 + 0.4489 + 0.4489 = 3.3334
5. Calculate the sample variance:
Variance = Sum of squared deviations / (n - 1) = 3.3334 / (6 - 1) = 0.6667
6. Calculate the sample standard deviation:
Standard Deviation = √Variance = √0.6667 ≈ 0.8165 ≈ 0.82
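7. Calculate the t-statistic:
t = d̄ / (s / √n) = 1.667 / (0.8165 / √6) = 1.667 / 0.3333 ≈ 5.0
Here, d̄ is the mean difference, s is the standard deviation of the differences, and n = 6 is the number of workers, giving df = n - 1 = 5.
Decision: For df = 5 and α = 0.05 (one-tailed), the critical t-value is 2.015. Since 5.0 > 2.015, we reject the null hypothesis: the new process gives a significantly higher production rate.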
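The paired t-test on the worker data can be reproduced with the standard library. This is a minimal sketch. It assumes, as the worker-by-worker differences imply, that the second column is the old process and the third the new one. The critical value 2.015 is the usual one-tailed t-table entry for α = 0.05 and df = 5, hard-coded because the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Production rates: (worker, old process, new process)
DATA = [(1, 12, 14), (2, 10, 13), (3, 11, 12), (4, 13, 15), (5, 10, 11), (6, 12, 13)]
T_CRIT = 2.015  # one-tailed critical t for alpha = 0.05, df = 5 (from a t-table)


def paired_t(rows):
    """Paired t-statistic on the differences new - old."""
    diffs = [new - old for _, old, new in rows]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample SD, divides by n - 1
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


d_bar, s_d, t, df = paired_t(DATA)
print(f"mean diff = {d_bar:.4f}, s = {s_d:.4f}, t = {t:.2f}, df = {df}")
print("Reject H0" if t > T_CRIT else "Fail to reject H0")
```

This prints a mean difference of 1.6667, s = 0.8165, t = 5.00 with df = 5, and "Reject H0". Working with the exact mean 5/3 instead of 1.67 removes the small rounding drift in the hand calculation.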
F = MSB / MSW = 225.8 / 7.233 ≈ 31.2
Step 8: Critical Value and Decision
For dfB = 2, dfW = 12, and α = 0.05, the critical value from the F-distribution table is 3.89.
Since the calculated F-statistic (≈ 31.2) > critical value (3.89), we reject the null hypothesis.
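The original group data behind MSB and MSW are not shown here, so this sketch computes F = MSB / MSW for a one-way ANOVA on hypothetical data. It uses three groups of five observations, chosen so that dfB = 2 and dfW = 12 match the critical value 3.89 quoted above.

```python
from statistics import mean

# Hypothetical data: three groups of five (so df_B = 2 and df_W = 12)
GROUPS = {
    "A": [10, 12, 11, 13, 14],
    "B": [20, 22, 21, 19, 23],
    "C": [15, 16, 14, 17, 18],
}
F_CRIT = 3.89  # critical F for alpha = 0.05 with (2, 12) df


def one_way_anova(groups):
    """Return MSB, MSW and F = MSB / MSW for a one-way ANOVA."""
    all_obs = [x for xs in groups.values() for x in xs]
    grand = mean(all_obs)
    k, n = len(groups), len(all_obs)
    ssb = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in groups.values())
    ssw = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    msb = ssb / (k - 1)  # between-group mean square, df_B = k - 1
    msw = ssw / (n - k)  # within-group mean square, df_W = n - k
    return msb, msw, msb / msw


msb, msw, f_stat = one_way_anova(GROUPS)
print(f"MSB = {msb:.3f}, MSW = {msw:.3f}, F = {f_stat:.2f}")
print("Reject H0" if f_stat > F_CRIT else "Fail to reject H0")
```

For these made-up groups MSB ≈ 101.667, MSW = 2.5 and F ≈ 40.67, well above 3.89.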
23 Sample Problem for Two-Way ANOVA
Scenario:
A researcher is studying the effect of two factors, Teaching
Method and Gender, on student test scores. The researcher wants to determine
if there are significant differences in test scores based on the teaching method,
gender, and whether there is an interaction effect between teaching method and
gender.
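Effects to Test:
Main Effect of Teaching Method: Are there differences in test scores based on the teaching method?
Main Effect of Gender: Are there differences in test scores based on gender?
Interaction Effect: Does the effect of the teaching method on test scores depend on gender?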
Data:
The researcher collects the following test scores:
Teaching Method Gender Test Scores
Method 1 Male 85, 88, 82
Method 1 Female 90, 87, 91
Method 2 Male 78, 82, 75
Method 2 Female 80, 79, 83
Method 3 Male 92, 95, 89
Method 3 Female 91, 94, 93
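A minimal sketch of the two-way ANOVA (with interaction) for this table, written without external libraries. It assumes a balanced design (three scores in every cell), which holds here. It partitions the total variation into the teaching-method, gender, interaction and error sums of squares, then forms each F ratio against the error mean square.

```python
from statistics import mean

# Test scores from the table, keyed by (teaching method, gender)
CELLS = {
    ("Method 1", "Male"): [85, 88, 82],
    ("Method 1", "Female"): [90, 87, 91],
    ("Method 2", "Male"): [78, 82, 75],
    ("Method 2", "Female"): [80, 79, 83],
    ("Method 3", "Male"): [92, 95, 89],
    ("Method 3", "Female"): [91, 94, 93],
}


def two_way_anova(cells):
    """Sums of squares, df and F ratios for a balanced two-factor design."""
    methods = sorted({m for m, _ in cells})
    genders = sorted({g for _, g in cells})
    a, b = len(methods), len(genders)
    r = len(next(iter(cells.values())))  # replicates per cell (assumed equal)
    grand = mean(x for xs in cells.values() for x in xs)

    # Main effects: spread of each factor's level means around the grand mean
    ss_method = b * r * sum(
        (mean(x for g in genders for x in cells[(m, g)]) - grand) ** 2
        for m in methods)
    ss_gender = a * r * sum(
        (mean(x for m in methods for x in cells[(m, g)]) - grand) ** 2
        for g in genders)
    # Interaction: cell-mean variation not explained by the two main effects
    ss_cells = r * sum((mean(xs) - grand) ** 2 for xs in cells.values())
    ss_inter = ss_cells - ss_method - ss_gender
    # Error: variation of scores around their own cell mean
    ss_error = sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs)

    ss = {"method": ss_method, "gender": ss_gender,
          "interaction": ss_inter, "error": ss_error}
    df = {"method": a - 1, "gender": b - 1,
          "interaction": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss_error / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("method", "gender", "interaction")}
    return ss, df, f


ss, df, f = two_way_anova(CELLS)
for effect in ("method", "gender", "interaction"):
    print(f"{effect:<12} SS = {ss[effect]:7.3f}  df = {df[effect]}  F = {f[effect]:.3f}")
print(f"error        SS = {ss['error']:7.3f}  df = {df['error']}")
```

For this table the sketch gives F ≈ 36.3 for teaching method, F ≈ 3.90 for gender and F ≈ 0.73 for the interaction. Against 5% critical values of about 3.89 for (2, 12) df and 4.75 for (1, 12) df, only the teaching-method effect would be significant.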
24 ANCOVA (Analysis of Covariance)
https://colab.research.google.com/drive/1qqVeaQEUwQM7S67cH0glb2Ji6ecZN04z?usp=sharing
1 A 75 80
2 A 82 85
3 B 78 82
4 B 79 83
5 C 74 78
6 C 80 81
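The table above has no header row. This sketch therefore ASSUMES the third column is the covariate (for example, a pre-test score) and the fourth the outcome (for example, a post-test score). It is a minimal one-way ANCOVA, not a reproduction of the linked notebook. It fits one common within-group slope (assuming equal slopes across groups) and reports covariate-adjusted group means. It tests the group effect by comparing the residual sum of squares of a covariate-only model with that of a group-plus-covariate model.

```python
from statistics import mean

# Rows from the table: (id, group, col3, col4).
# ASSUMPTION: col3 is the covariate (e.g. a pre-test score) and col4 the outcome.
ROWS = [
    (1, "A", 75, 80), (2, "A", 82, 85),
    (3, "B", 78, 82), (4, "B", 79, 83),
    (5, "C", 74, 78), (6, "C", 80, 81),
]


def deviation_sums(pairs):
    """Return (Sxx, Sxy, Syy) about the means of the (x, y) pairs."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxx, sxy, syy


def one_way_ancova(rows):
    groups = {}
    for _, group, x, y in rows:
        groups.setdefault(group, []).append((x, y))

    # Pool within-group sums to get one common slope for all groups
    wxx = wxy = wyy = 0.0
    for pairs in groups.values():
        sxx, sxy, syy = deviation_sums(pairs)
        wxx += sxx
        wxy += sxy
        wyy += syy
    slope = wxy / wxx

    # Residual SS of the full (group + covariate) and reduced (covariate-only) models
    sse_full = wyy - wxy ** 2 / wxx
    txx, txy, tyy = deviation_sums([(x, y) for _, _, x, y in rows])
    sse_reduced = tyy - txy ** 2 / txx

    k, n = len(groups), len(rows)
    df_group, df_error = k - 1, n - k - 1
    f = ((sse_reduced - sse_full) / df_group) / (sse_full / df_error)

    # Group means of the outcome adjusted to the overall covariate mean
    grand_x = mean(x for _, _, x, _ in rows)
    adjusted = {
        g: mean(y for _, y in p) - slope * (mean(x for x, _ in p) - grand_x)
        for g, p in groups.items()
    }
    return slope, adjusted, f, (df_group, df_error)


slope, adjusted, f, dfs = one_way_ancova(ROWS)
print(f"common slope = {slope:.3f}, F = {f:.2f}, df = {dfs}")
for g, m in adjusted.items():
    print(f"adjusted mean {g}: {m:.2f}")
```

Under this reading of the columns the common slope is about 0.63 and F ≈ 9.66 with (2, 2) df. With only two error degrees of freedom the 5% critical value is 19.0, so the group effect would not be significant.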