
S&UL Subjective Question Bank

The document is a comprehensive question bank covering supervised and unsupervised learning, including short, medium, and long answer questions on various topics such as bias-variance trade-off, gradient descent, K-fold cross-validation, and hyperparameter tuning. It includes practical exercises, theoretical explanations, and numerical practice questions to reinforce understanding of machine learning concepts. The content is structured to facilitate learning and assessment in machine learning methodologies.

Uploaded by non746003
Copyright © All Rights Reserved

Question Bank for Supervised and Unsupervised Learning

Short Answer Questions (2 Marks Each)


1. Differentiate supervised and unsupervised learning.
2. Define Mean Squared Error (MSE) mathematically.
3. Explain briefly how mini-batch gradient descent works.
4. List two assumptions of Linear Regression.
5. What is the difference between loss function and cost function?
6. Briefly explain feature importance.
7. What are Eigenvectors and Eigenvalues in PCA?
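Mini-batch gradient descent (Q3) can be illustrated with a short NumPy sketch. It assumes a linear model ŷ = Xw + b trained on the MSE cost; the function name `minibatch_gd`, the toy data (generated from y = 2x + 1 without noise) and the hyperparameters are illustrative choices, not part of the question bank.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=2, epochs=1000, seed=0):
    """Fit y ≈ X @ w + b by mini-batch gradient descent on the MSE cost."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                # reshuffle the samples once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                 # residuals on this mini-batch only
            grad_w = 2 * Xb.T @ err / len(idx)    # gradient of the batch MSE w.r.t. w
            grad_b = 2 * err.mean()               # gradient of the batch MSE w.r.t. b
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical toy data generated from y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X[:, 0] + 1
w, b = minibatch_gd(X, y)
print(w, b)  # should be close to [2.] and 1.0
```

Setting `batch_size` equal to the number of samples turns this into batch gradient descent, while `batch_size=1` gives stochastic gradient descent; mini-batches sit between the two in gradient noise and cost per update.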
Medium Answer Questions (5 Marks Each)
1. Explain with a suitable example the concept of bias-variance trade-off.
2. Describe the steps involved in performing Grid Search Cross Validation.
3. Discuss how logistic regression can be used for binary classification.
4. Explain the concept of K-fold cross-validation and why it is important.
5. Write the step-by-step working of the K-Means clustering algorithm.
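For Q4, a minimal pure-Python sketch shows how K-fold cross-validation partitions the data so that every sample is used for validation exactly once. The helper name `kfold_indices` is hypothetical, and unlike scikit-learn's `KFold(shuffle=True)` it does not shuffle; in practice the model is trained k times and the k validation scores are averaged.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(10, 3))
for train_idx, val_idx in folds:
    print("validate on", val_idx)
# validate on [0, 1, 2, 3]
# validate on [4, 5, 6]
# validate on [7, 8, 9]
```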
Long Answer Questions (10 Marks Each)
1. Describe the entire process of hyperparameter tuning using Grid Search and Randomized Search
Cross Validation, including Python implementation details.
2. Explain the concepts of overfitting and underfitting using KNN as an example. Include how
complexity affects model performance and suggest techniques to mitigate these issues.
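One way to study complexity in KNN is to sweep K and compare training and validation accuracy. The sketch below uses scikit-learn's bundled Iris data; the split, the random seed and the list of K values are arbitrary assumptions, so the exact scores will vary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

results = {}
for k in [1, 3, 5, 11, 21, 51]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)   # accuracy on data the model has stored
    val_acc = model.score(X_val, y_val)         # accuracy on held-out data
    results[k] = (train_acc, val_acc)
    print(f"K={k:>2}  train={train_acc:.3f}  val={val_acc:.3f}")
```

Very small K (most complex model) tends to give near-perfect training accuracy with a larger train/validation gap (high variance), while very large K smooths the decision boundary and can underfit (high bias). Plotting both columns against K gives the usual graphical interpretation.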

Short Answer Questions (2 Marks Each)


1. Calculate RMSE given actual values: [4, 6, 8] and predicted values: [5, 7, 7].
2. Write a Python function to compute Mean Squared Error (MSE).
3. Briefly describe the working principle of mini-batch gradient descent.
4. What is collinearity, and why is it problematic in regression models?
5. Distinguish between precision and recall.
6. Explain the concept of dimensionality reduction.
7. Provide two assumptions behind Linear Regression.
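Q1 and Q2 can be checked together with a small pure-Python sketch; the function names `mse` and `rmse` are illustrative.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

actual = [4, 6, 8]
predicted = [5, 7, 7]
print(mse(actual, predicted))   # residuals -1, -1, +1 -> (1 + 1 + 1) / 3 = 1.0
print(rmse(actual, predicted))  # sqrt(1.0) = 1.0
```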
Medium Answer Questions (5 Marks Each)
1. Perform one iteration of stochastic gradient descent by hand for the linear regression problem y = 2x,
with initial weight w = 0.5, learning rate = 0.1, and the data point x = 1, y = 2.
2. Discuss and calculate bias-variance using a simple dataset example.
3. Explain K-fold Cross-validation clearly, including practical Python steps.
4. Illustrate the elbow method in K-Means clustering and explain its significance.
5. Write detailed Python pseudocode for Grid Search Cross Validation for a Decision Tree.
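A worked version of the single SGD step in Q1, assuming the model ŷ = wx (no bias term, since the question gives only w) and the squared loss L = (ŷ − y)². If the convention L = ½(ŷ − y)² is used instead, the gradient is −1.5 and the updated weight is 0.65.

```python
def sgd_step(w, x, y, lr):
    """One SGD update for the model y_hat = w * x under the squared loss (y_hat - y) ** 2."""
    y_hat = w * x                  # forward pass: 0.5 * 1 = 0.5
    grad = 2 * (y_hat - y) * x     # dL/dw = 2 (y_hat - y) x = 2 * (0.5 - 2) * 1 = -3.0
    return w - lr * grad           # 0.5 - 0.1 * (-3.0) = 0.8

w_new = sgd_step(w=0.5, x=1, y=2, lr=0.1)
print(round(w_new, 4))  # 0.8
```

The update moves w from 0.5 towards the true slope of 2.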
Long Answer Questions (10 Marks Each)
1. Using a real or hypothetical dataset, demonstrate step-by-step implementation of
hyperparameter tuning using Grid Search and Randomized Search Cross Validation in Python.
2. Explain the concepts of overfitting and underfitting with a detailed KNN example, including
practical Python implementation strategies and graphical interpretation.
Short Answer Questions (2 Marks Each)
1. Define supervised, unsupervised, and semi-supervised learning.
2. Explain MSE and RMSE with formulas.
3. How does Mini-Batch Gradient Descent differ from Batch Gradient Descent?
4. Define precision and recall.
5. What is cross-validation, and why is it used?
6. Briefly explain Eigenvalues and Eigenvectors.
7. Describe the Elbow Method used in clustering.
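The Elbow Method (Q7) can be sketched with scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squared distances. The synthetic data (three blobs with hand-picked centres) is an assumption for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [3, 5]],
                  cluster_std=0.8, random_state=0)

inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_   # within-cluster SSE for this K
    print(k, round(km.inertia_, 1))
```

Inertia keeps falling as K grows, so the aim is not its minimum; the "elbow" is the K after which the decrease flattens out (here expected around K = 3).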
Medium Answer Questions (5 Marks Each)
1. Perform a linear regression prediction for one data point (x = 2, y = 4) given weight w = 1 and bias b = 0.5,
and calculate the squared loss.
2. Explain how to handle categorical and missing values practically in a dataset.
3. Explain polynomial regression with a practical use case scenario.
4. Demonstrate dimensionality reduction using PCA with Python pseudocode.
5. Discuss Feature Importance and methods to calculate it.
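The calculation in Q1 is short enough to write out directly. It assumes the model ŷ = wx + b and the squared loss L = (y − ŷ)²; with the convention L = ½(y − ŷ)² the loss would be 1.125 instead.

```latex
\begin{aligned}
\hat{y} &= w x + b = 1 \cdot 2 + 0.5 = 2.5 \\
L &= (y - \hat{y})^2 = (4 - 2.5)^2 = 1.5^2 = 2.25
\end{aligned}
```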
Long Answer Questions (10 Marks Each)
1. Provide a complete example of implementing K-Fold Cross Validation along with Grid Search and
Randomized Search CV for hyperparameter tuning in Python for Decision Trees.
2. Illustrate the concepts of complexity, overfitting, and underfitting using KNN as an example,
including Python implementation, graphical interpretation, and strategies to mitigate these
issues.
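For Q1, a compact scikit-learn sketch combines K-Fold cross-validation with both Grid Search and Randomized Search for a Decision Tree. The dataset (scikit-learn's bundled breast cancer data), the parameter ranges and `n_iter=20` are illustrative assumptions rather than recommended settings.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # 5-fold CV shared by both searches

# Grid search: every combination (4 * 3 * 2 = 24 candidates) is scored with 5-fold CV
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, None],
                "min_samples_split": [2, 10, 20],
                "criterion": ["gini", "entropy"]},
    cv=cv, scoring="accuracy")

# Randomized search: 20 candidates sampled from distributions instead of a fixed grid
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": randint(2, 12),
                         "min_samples_split": randint(2, 30),
                         "criterion": ["gini", "entropy"]},
    n_iter=20, cv=cv, scoring="accuracy", random_state=0)

for name, search in [("grid", grid), ("random", rand)]:
    search.fit(X_train, y_train)   # refits the best candidate on all training data
    print(name, search.best_params_, round(search.best_score_, 3),
          "test acc:", round(search.score(X_test, y_test), 3))
```

The held-out test split is touched only once, after tuning, so the reported test accuracy is not biased by the search itself.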

1. Why is KNN considered a lazy learning algorithm? (2 Marks)


2. How does the value of K (number of neighbours) affect the performance of the KNN algorithm?
What might happen if K is too small or too large? (5 Marks)

Short Answer Questions (2 marks each)


6. Differentiate between Loss Function and Cost Function with one example each.

7. Explain the concept of Bias-Variance Trade-off with a diagram.

8. What is Random Search CV, and when is it preferred over Grid Search CV?

9. Write a short note on Feature Selection and its importance in machine learning.

10. How does data augmentation help improve the performance of image classification models?

11. What is overfitting in KNN and how can it be controlled?
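For Q10, a minimal NumPy sketch of common label-preserving image augmentations. The 4x4 random "image" and the noise level are hypothetical; which transformations are safe depends on the task (for example, a horizontal flip can change the meaning of digits or text).

```python
import numpy as np

def augment(image, rng):
    """Return simple augmented copies of a 2-D grayscale image with values in [0, 1]."""
    flipped = np.fliplr(image)                                            # horizontal flip
    shifted = np.roll(image, shift=1, axis=1)                             # 1-pixel shift (wraps around)
    noisy = np.clip(image + rng.normal(0, 0.05, image.shape), 0.0, 1.0)   # additive Gaussian noise
    return [flipped, shifted, noisy]

rng = np.random.default_rng(0)
image = rng.random((4, 4))    # hypothetical 4x4 image
augmented = augment(image, rng)
```

Each augmented copy keeps the original label, so the model sees more varied training examples and is less able to memorise exact pixel patterns.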

Long Answer Questions (5 marks each)


12.

a) Draw and explain the Sigmoid function with its equation.
b) Discuss how the Sigmoid function is used in Logistic Regression.
13.

a) Define Confusion Matrix and explain each component (TP, FP, TN, FN) with an example.
b) Derive formulas for Precision, Recall, and F1-score and explain when each is useful.
14.

a) Explain K-Fold Cross Validation with a diagram.


b) Describe how it is used with Grid Search to improve model performance.
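For Q12, the sigmoid σ(z) = 1 / (1 + e^(−z)) and its role in logistic regression can be sketched in pure Python. The weights, bias and input are hypothetical, and 0.5 is only the conventional default threshold.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z)); maps any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)              # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """Logistic regression: P(y = 1 | x) = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights and input, for illustration only
p = predict_proba([2.0, 1.0], w=[0.8, -0.5], b=-0.3)   # z = 1.6 - 0.5 - 0.3 = 0.8
label = 1 if p >= 0.5 else 0                            # p ≈ 0.690, so label = 1
```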

Short Questions (2 marks each)


6. Spam Detection Model
A spam detection system using Logistic Regression often classifies important emails as spam.
Q: Which metric should be improved: Precision or Recall? Justify your answer.
7. Bank Loan Approval Model
You are tuning a Random Forest model for loan approvals.
Q: Why would you prefer Random Search CV over Grid Search in this situation?
8. Medical Diagnosis App
A mobile app for disease detection from chest X-rays performs well on training data but poorly on
new unseen cases.
Q: How can data augmentation help in this scenario? Give one transformation example.
9. E-commerce Recommendation System
Your system uses many features, such as clicks, views, ratings, and time spent.
Q: Why is Feature Selection important here? How might you implement it?
10. KNN Model for Handwritten Digit Classification
The KNN model achieves 99% accuracy on training data but only 70% on test data.
Q: What kind of error is this? What strategy would you use to improve it?
11. Speech Recognition Tool
You apply time stretching and noise addition to speech audio.
Q: What kind of machine learning issue are you addressing with this augmentation?
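For the spam scenario (Q6), the precision/recall trade-off can be seen by moving the decision threshold. The scores and labels below are hypothetical, and `precision_recall_at` is an illustrative helper, not a library function.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when 'spam' is predicted for score >= threshold (label 1 = spam)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical spam probabilities and true labels (1 = spam, 0 = legitimate email)
scores = [0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for t in (0.5, 0.8):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.5: precision=0.60 recall=0.75
# threshold=0.8: precision=1.00 recall=0.50
```

Raising the threshold removes the legitimate emails that were flagged (false positives), increasing precision at the cost of missing some spam (lower recall).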

Long Questions (5 marks each)


12. Logistic Regression for Diabetes Prediction
A hospital develops a logistic regression model for early detection of diabetes.
a) Explain how the Sigmoid Function converts the model's linear output into a probability.
b) What are the loss function and the cost function in this context, and how are they optimized?
13. K-Fold CV in Insurance Risk Model
You are building a classification model to assess insurance risk.
a) Explain K-Fold Cross Validation with a diagram.
b) Describe how you would combine it with Grid Search CV to improve hyperparameter
selection.
14. Bias-Variance in House Price Estimator
A regression model overfits when using 20 features but underfits when using just 5.
a) Define the Bias-Variance Trade-off using a real-world analogy.
b) Explain how Regularization (L1/L2) can help find a balance.
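For Q14(b), a scikit-learn sketch contrasts ordinary least squares with L2 (Ridge) and L1 (Lasso) regularization on data where only some features matter. The synthetic data (20 features, 5 informative) and the `alpha` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Hypothetical data: 20 features, only 5 of which actually influence the target
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=10.0),
    "Lasso (L1)": Lasso(alpha=1.0),
}
coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_
    n_zero = int(np.sum(np.isclose(model.coef_, 0.0)))
    print(f"{name:10s} zero coefficients: {n_zero:2d}  "
          f"||w||_2 = {np.linalg.norm(model.coef_):.1f}")
```

L2 shrinks all coefficients towards zero (reducing variance), while L1 can set uninformative coefficients exactly to zero, acting as built-in feature selection; `alpha` controls how much bias is traded for lower variance.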
Numerical Practice Questions
Q1. You are given a dataset of 10 samples with 3 features X1, X2, X3 and a target variable Y. After fitting
a Multiple Linear Regression model, you obtained:
Total Sum of Squares (SST) = 180
Regression Sum of Squares (SSR) = 126
Mean Squared Error (MSE) = 6
a) Calculate R² and interpret the result.
b) Find the number of data samples.
c) Calculate the number of predictors.
d) Compute the Residual Sum of Squares (SSE).
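The quantities in Q1 that follow directly from the given sums can be computed as below. This assumes MSE means SSE divided by the residual degrees of freedom (n − p − 1); under that convention the given values fix only n − p − 1 = SSE / MSE, so parts (b) and (c) depend on which convention the question intends.

```python
SST = 180.0   # total sum of squares (given)
SSR = 126.0   # regression (explained) sum of squares (given)
MSE = 6.0     # mean squared error (given)

SSE = SST - SSR           # residual sum of squares: 180 - 126 = 54
r_squared = SSR / SST     # equivalently 1 - SSE / SST: 126 / 180 = 0.70

# Assumption: MSE = SSE / (n - p - 1)
residual_df = SSE / MSE   # 54 / 6 = 9

print(f"SSE = {SSE}, R^2 = {r_squared:.2f}, n - p - 1 = {residual_df:.0f}")
```

An R² of 0.70 means the model explains about 70% of the variation in Y around its mean.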

Q2. Given a linear regression cost function:

a) Perform two iterations of Batch Gradient Descent. Show step-by-step updates.


b) Evaluate whether the learning rate α = 0.01 is too small or too large in this case, with justification.

Q3. Given the following predictions and true labels:

Sample   y   ŷ
1        1   0.9
2        0   0.1
3        1   0.6
4        0   0.4

a) Compute total binary cross-entropy loss.


b) Comment on the model's confidence in predictions and its implications.
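Part (a) of Q3 can be checked with a short sketch, assuming ŷ is the predicted probability of class 1 and natural logarithms are used; the helper name `bce_terms` is illustrative.

```python
import math

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.6, 0.4]   # predicted P(y = 1) from the table

def bce_terms(y_true, y_prob):
    """Per-sample binary cross-entropy: -[y ln p + (1 - y) ln(1 - p)]."""
    return [-(y * math.log(p) + (1 - y) * math.log(1 - p))
            for y, p in zip(y_true, y_prob)]

terms = bce_terms(y_true, y_prob)
total = sum(terms)             # ≈ 0.105 + 0.105 + 0.511 + 0.511 ≈ 1.232
mean = total / len(terms)      # ≈ 0.308
```

Samples 1 and 2 are confident and correct (small loss), while samples 3 and 4 are correct but close to 0.5, which is where most of the loss comes from.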

Q4. You are fitting polynomial regression models of degree 1, 3, and 6. Given RMSE values:
Model   Training RMSE   Validation RMSE
Deg 1   14.2            14.5
Deg 3   7.1             7.5
Deg 6   2.3             12.1

a) Analyze the bias and variance behaviour of each model.


b) Identify the best model. Justify based on trade-off analysis.
c) Sketch the expected training and validation error curves against polynomial degree.

Q5. Given the following training dataset and K = 3, use Euclidean distance to classify the test point X = (5, 5):

X1   X2   Class
1    1    A
2    2    A
6    6    B
7    7    B
8    8    B

a) Show distance calculation for all training points.


b) Predict the class of the test point.
c) Explain why increasing K may change the result.
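A pure-Python sketch of Q5 using `math.dist` (Python 3.8+) for Euclidean distance; `knn_predict` is an illustrative helper. Note that (2, 2) and (8, 8) are tied at √18 ≈ 4.243, so the third neighbour depends on tie-breaking, but class B wins the K = 3 vote either way.

```python
import math
from collections import Counter

train = [((1, 1), "A"), ((2, 2), "A"), ((6, 6), "B"), ((7, 7), "B"), ((8, 8), "B")]
query = (5, 5)

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean distance)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

for point, label in train:
    print(point, label, round(math.dist(point, query), 3))
# (1, 1) A 5.657
# (2, 2) A 4.243
# (6, 6) B 1.414
# (7, 7) B 2.828
# (8, 8) B 4.243
print(knn_predict(train, query, k=3))  # B
```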
Q6. A dataset has the following 2D feature matrix X after centering:

$$X = \begin{bmatrix} 2 & 0 \\ 0 & 2 \\ -2 & 0 \\ 0 & -2 \end{bmatrix}$$

a) Compute covariance matrix.


b) Find eigenvalues and eigenvectors.
c) Project the data onto the principal component axis.
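A NumPy sketch of Q6, assuming the sample covariance (dividing by n − 1 = 3); dividing by n instead gives eigenvalues of 2. Because both eigenvalues are equal, every direction explains the same variance, so the principal axis is not unique and any unit vector is a valid answer.

```python
import numpy as np

X = np.array([[2, 0], [0, 2], [-2, 0], [0, -2]], dtype=float)   # already centered

cov = np.cov(X, rowvar=False)             # sample covariance: [[8/3, 0], [0, 8/3]]
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: symmetric input, eigenvalues in ascending order

order = np.argsort(eigvals)[::-1]         # sort components by explained variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]                       # first principal axis (unit vector)
projection = X @ pc1                      # 1-D coordinates of each sample on that axis
print(eigvals, pc1, projection)
```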

Q7. Given the following confusion matrix:


                  Predicted Positive   Predicted Negative
Actual Positive   45                   5
Actual Negative   15                   35

a) Calculate Precision, Recall, Accuracy, and F1-score.


b) Suppose a cost-sensitive problem penalizes false negatives heavily. Suggest how the model should be
adjusted.
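Part (a) of Q7 as a short sketch, reading TP = 45, FN = 5, FP = 15 and TN = 35 from the confusion matrix.

```python
TP, FN = 45, 5    # actual positives: predicted positive / predicted negative
FP, TN = 15, 35   # actual negatives: predicted positive / predicted negative

precision = TP / (TP + FP)                           # 45 / 60 = 0.75
recall = TP / (TP + FN)                              # 45 / 50 = 0.90
accuracy = (TP + TN) / (TP + TN + FP + FN)           # 80 / 100 = 0.80
f1 = 2 * precision * recall / (precision + recall)   # 1.35 / 1.65 ≈ 0.818
```

For part (b), lowering the decision threshold is one common way to reduce false negatives (raising recall) at the cost of more false positives.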

Q8. A dataset has 500 records. Column A has 20% missing values. You applied mean imputation. You
then fit a linear regression model using this feature.
a) Explain the statistical effect of mean imputation on variance.
b) Propose and justify an alternative method.
c) Calculate the expected MSE reduction if imputation is done via linear interpolation (assume the variance is
reduced by 30%).
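For Q8(a), a toy example shows why mean imputation understates variance: every imputed value sits exactly at the mean, adding zero squared deviation while increasing the count. The column values are hypothetical.

```python
import statistics

# Hypothetical column with one missing entry (None), for illustration only
column = [2.0, 4.0, 6.0, 8.0, None]

observed = [v for v in column if v is not None]
fill = statistics.mean(observed)                   # 5.0
imputed = [fill if v is None else v for v in column]

var_before = statistics.variance(observed)   # sample variance of observed values: 20 / 3 ≈ 6.67
var_after = statistics.variance(imputed)     # same squared deviations, one more sample: 20 / 4 = 5.0
```

The same effect also weakens the feature's correlation with the target, which biases the fitted regression coefficient towards zero.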

Q9. Given the following 2D points and K = 2:


Points: (2,3), (3,3), (6,7), (8,7)
Initial Centroids: C1 = (2,3), C2 = (8,7)
a) Perform one full iteration of K-means (assign points to clusters, recalculate centroids).
b) Compute SSE (Sum of Squared Errors) after 1 iteration.
c) Discuss how the Elbow method helps in determining K. (extra)
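Parts (a) and (b) of Q9 as a pure-Python sketch. SSE here is measured against the updated centroids; some texts measure it against the centroids used for assignment, which gives a different number. Empty clusters are not handled, since none occur in this example.

```python
points = [(2, 3), (3, 3), (6, 7), (8, 7)]
centroids = [(2, 3), (8, 7)]    # initial C1, C2

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Assignment step: each point joins its nearest centroid
clusters = [[] for _ in centroids]
for p in points:
    nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
    clusters[nearest].append(p)

# Update step: each centroid moves to the mean of its assigned points
centroids = [tuple(sum(c) / len(members) for c in zip(*members)) for members in clusters]

# SSE: squared distances of each point to its updated centroid
sse = sum(sq_dist(p, centroids[j]) for j, members in enumerate(clusters) for p in members)
print(clusters)   # [[(2, 3), (3, 3)], [(6, 7), (8, 7)]]
print(centroids)  # [(2.5, 3.0), (7.0, 7.0)]
print(sse)        # 0.25 + 0.25 + 1.0 + 1.0 = 2.5
```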
