S&UL Subjective Question Bank

The document is a comprehensive question bank covering supervised and unsupervised learning, including short, medium, and long answer questions on various topics such as bias-variance trade-off, gradient descent, K-fold cross-validation, and hyperparameter tuning. It includes practical exercises, theoretical explanations, and numerical practice questions to reinforce understanding of machine learning concepts. The content is structured to facilitate learning and assessment in machine learning methodologies.

Uploaded by non746003

Question Bank for Supervised and Unsupervised Learning

Short Answer Questions (2 Marks Each)

1. Differentiate between supervised and unsupervised learning.
2. Define Mean Squared Error (MSE) mathematically.
3. Explain briefly how mini-batch gradient descent works.
4. List two assumptions of Linear Regression.
5. What is the difference between a loss function and a cost function?
6. Briefly explain feature importance.
7. What are Eigenvectors and Eigenvalues in PCA?
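For the mini-batch gradient descent question above, here is a minimal sketch for a one-feature linear model ŷ = w·x + b. Each update uses the MSE gradient averaged over a small shuffled batch, rather than over the whole dataset (batch gradient descent) or a single sample (stochastic gradient descent). The noise-free data y = 2x, the learning rate, the batch size and the epoch count are illustrative assumptions, not values from the question bank.

```python
import random

def minibatch_gd(xs, ys, lr=0.02, batch_size=2, epochs=2000, seed=0):
    """Fit y ≈ w*x + b by mini-batch gradient descent on the MSE cost."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random order of samples every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            m = len(batch)
            # Gradients of (1/m) * sum((w*x + b - y)^2) over this mini-batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / m
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / m
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x for x in xs]  # hypothetical noise-free data following y = 2x
w, b = minibatch_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting `batch_size=len(xs)` turns this into batch gradient descent, and `batch_size=1` into SGD, which is a convenient way to compare the three variants on the same data.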
Medium Answer Questions (5 Marks Each)
1. Explain with a suitable example the concept of bias-variance trade-off.
2. Describe the steps involved in performing Grid Search Cross Validation.
3. Discuss how logistic regression can be used for binary classification.
4. Explain the concept of K-fold cross-validation and why it is important.
5. Write the step-by-step working of the K-Means clustering algorithm.
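For the K-fold cross-validation question above, a minimal sketch of how the sample indices are split, assuming no shuffling. Every sample lands in exactly one validation fold and is used for training in the other K − 1 folds; the fold sizes below follow the same rule as scikit-learn's `KFold` without shuffling (the first `n % k` folds get one extra sample).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation (no shuffling)."""
    # Spread the remainder over the first folds so fold sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"fold {fold}: validate on {val_idx}, train on {len(train_idx)} samples")
```

The model is trained and scored once per fold, and the K validation scores are averaged to give a less noisy estimate than a single train/validation split.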
Long Answer Questions (10 Marks Each)
1. Describe the entire process of hyperparameter tuning using Grid Search and Randomized Search
Cross Validation, including Python implementation details.
2. Explain the concepts of overfitting and underfitting using KNN as an example. Include how
complexity affects model performance and suggest techniques to mitigate these issues.
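To illustrate how K controls KNN complexity, here is a minimal sketch on a hypothetical two-class dataset (two overlapping Gaussian blobs generated by the made-up helper `make_data`). With k = 1 every training point is its own nearest neighbour, so training accuracy is 1.0 while validation accuracy lags (overfitting); with k close to the training-set size the vote approaches the overall class majority (underfitting). The specific k values and data sizes are assumptions for illustration.

```python
import math
import random
from collections import Counter

def make_data(n, seed):
    """Hypothetical 2-class data: two overlapping Gaussian blobs in 2D."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.2), rng.gauss(centre, 1.2)), label))
    return data

def knn_predict(train, x, k):
    # Sort training points by Euclidean distance to x and vote among the k closest
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

train, val = make_data(80, seed=1), make_data(80, seed=2)
for k in (1, 3, 7, 15, 31, 79):
    print(f"k={k:2d}  train acc={accuracy(train, train, k):.2f}  val acc={accuracy(train, val, k):.2f}")
```

Plotting the two accuracy columns against k gives the usual complexity curve; choosing k by cross-validation and scaling features are common mitigations.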

Short Answer Questions (2 Marks Each)

1. Calculate RMSE given actual values: [4, 6, 8] and predicted values: [5, 7, 7].
2. Write a Python function to compute Mean Squared Error (MSE).
3. Briefly describe the working principle of mini-batch gradient descent.
4. What is collinearity, and why is it problematic in regression models?
5. Distinguish between precision and recall.
6. Explain the concept of dimensionality reduction.
7. Provide two assumptions behind Linear Regression.
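For the RMSE calculation and the MSE function asked for above, a minimal sketch. For actual [4, 6, 8] and predicted [5, 7, 7], the residuals are −1, −1 and +1, so MSE = (1 + 1 + 1) / 3 = 1 and RMSE = √1 = 1.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of the squared residuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in the units of the target."""
    return math.sqrt(mse(actual, predicted))

print(mse([4, 6, 8], [5, 7, 7]), rmse([4, 6, 8], [5, 7, 7]))  # 1.0 1.0
```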
Medium Answer Questions (5 Marks Each)
1. Perform one iteration of stochastic gradient descent by hand for a linear regression problem with
y = 2x, initial weight w = 0.5, learning rate η = 0.1, using the data point (x = 1, y = 2).
2. Discuss bias and variance, and calculate them for a simple example dataset.
3. Explain K-fold Cross-validation clearly, including practical Python steps.
4. Illustrate the elbow method in K-Means clustering and explain its significance.
5. Write detailed Python pseudocode for Grid Search Cross Validation for a Decision Tree.
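For the single SGD iteration in question 1 above, a worked sketch. It assumes the model ŷ = w·x (no bias) and the per-sample loss L = (y − w·x)², so ∂L/∂w = −2x(y − w·x). If your course uses L = ½(y − w·x)² instead, the gradient halves to −1.5 and the updated weight becomes 0.65.

```python
# One SGD step for y_hat = w * x on a single point, with per-sample loss L = (y - w*x)^2
w, lr = 0.5, 0.1
x, y = 1.0, 2.0

y_hat = w * x                 # 0.5
loss = (y - y_hat) ** 2       # (2 - 0.5)^2 = 2.25
grad = -2 * x * (y - y_hat)   # dL/dw = -2 * 1 * 1.5 = -3.0
w_new = w - lr * grad         # 0.5 - 0.1 * (-3.0) = 0.8
print(y_hat, loss, grad, w_new)
```

The weight moves from 0.5 toward the true slope 2, and the step size is the learning rate times the gradient.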
Long Answer Questions (10 Marks Each)
1. Using a real or hypothetical dataset, demonstrate step-by-step implementation of
hyperparameter tuning using Grid Search and Randomized Search Cross Validation in Python.
2. Explain the concepts of overfitting and underfitting with a detailed KNN example, including
practical Python implementation strategies and graphical interpretation.
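For the Grid Search vs Randomized Search questions above, a minimal pure-Python sketch on a hypothetical dataset (`make_data` is a made-up helper). A KNN classifier stands in for the tuned model, with two hyperparameters: k and the Minkowski power p (1 = Manhattan, 2 = Euclidean). Grid search scores every combination with 5-fold cross-validation; randomized search scores only a random subset (here sampled from the same grid, whereas real randomized search often samples from distributions). In scikit-learn, these loops are what `GridSearchCV` and `RandomizedSearchCV` (with its `n_iter` parameter) automate.

```python
import itertools
import random
from collections import Counter

def make_data(n, seed=0):
    """Hypothetical 2-class dataset: two overlapping Gaussian blobs in 2D."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternating labels keep contiguous folds balanced
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def minkowski(a, b, p):
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

def knn_predict(train, x, k, p):
    nearest = sorted(train, key=lambda item: minkowski(item[0], x, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, p, n_folds=5):
    """Mean validation accuracy over n_folds contiguous folds."""
    fold = len(data) // n_folds
    scores = []
    for f in range(n_folds):
        val = data[f * fold:(f + 1) * fold]
        train = data[:f * fold] + data[(f + 1) * fold:]
        scores.append(sum(knn_predict(train, x, k, p) == y for x, y in val) / len(val))
    return sum(scores) / n_folds

data = make_data(100)
grid = {"k": [1, 3, 5, 9, 15], "p": [1, 2]}
all_combos = list(itertools.product(grid["k"], grid["p"]))

# Grid search: evaluate every (k, p) combination
grid_results = {combo: cv_accuracy(data, *combo) for combo in all_combos}
best_grid = max(grid_results, key=grid_results.get)

# Randomized search: evaluate only n_iter randomly chosen combinations
n_iter = 4
rng = random.Random(1)
random_results = {combo: cv_accuracy(data, *combo) for combo in rng.sample(all_combos, n_iter)}
best_random = max(random_results, key=random_results.get)

print("grid   best (k, p):", best_grid, round(grid_results[best_grid], 3))
print("random best (k, p):", best_random, round(random_results[best_random], 3))
```

Randomized search costs n_iter evaluations instead of the full grid, which is why it is preferred when the search space is large; the trade-off is that it may miss the best grid point.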
Short Answer Questions (2 Marks Each)
1. Define supervised, unsupervised, and semi-supervised learning.
2. Explain MSE and RMSE with formulas.
3. How does Mini-Batch Gradient Descent differ from Batch Gradient Descent?
4. Define precision and recall.
5. What is cross-validation, and why is it used?
6. Briefly explain Eigenvalues and Eigenvectors.
7. Describe the Elbow Method used in clustering.
Medium Answer Questions (5 Marks Each)
1. Compute the linear regression prediction for one data point (x = 2, y = 4) given weight w = 1 and
bias b = 0.5, and calculate the squared loss.
2. Explain practical ways to handle categorical features and missing values in a dataset.
3. Explain polynomial regression with a practical use case scenario.
4. Demonstrate dimensionality reduction using PCA with Python pseudocode.
5. Discuss Feature Importance and methods to calculate it.
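For the single-point linear regression calculation in question 1 above, a worked sketch, assuming the squared loss is (y − ŷ)² without a ½ factor.

```python
w, b = 1.0, 0.5
x, y = 2.0, 4.0

y_hat = w * x + b                 # 1 * 2 + 0.5 = 2.5
squared_loss = (y - y_hat) ** 2   # (4 - 2.5)^2 = 2.25
print(y_hat, squared_loss)
```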
Long Answer Questions (10 Marks Each)
1. Provide a complete example of implementing K-Fold Cross Validation along with Grid Search and
Randomized Search CV for hyperparameter tuning in Python for Decision Trees.
2. Illustrate the concepts of complexity, overfitting, and underfitting using KNN as an example,
including Python implementation, graphical interpretation, and strategies to mitigate these
issues.

1. Why is KNN considered a lazy learning algorithm? (2 Marks)

2. How does the value of K (number of neighbours) affect the performance of the KNN algorithm?
What might happen if K is too small or too large? (5 Marks)

Short Answer Questions (2 marks each)

6. Differentiate between Loss Function and Cost Function with one example each.

7. Explain the concept of Bias-Variance Trade-off with a diagram.

8. What is Random Search CV, and when is it preferred over Grid Search CV?

9. Write a short note on Feature Selection and its importance in machine learning.

10. How does data augmentation help improve the performance of image classification models?

11. What is overfitting in KNN and how can it be controlled?

Long Answer Questions (5 marks each)

12.
a) Draw and explain the Sigmoid function with its equation.
b) Discuss how the Sigmoid function is used in Logistic Regression.

13.
a) Define a Confusion Matrix and explain each component (TP, FP, TN, FN) with an example.
b) Derive formulas for Precision, Recall, and F1-score and explain when each is useful.

14.
a) Explain K-Fold Cross Validation with a diagram.
b) Describe how it is used with Grid Search to improve model performance.
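For the Sigmoid and Logistic Regression questions above, a minimal sketch. The sigmoid σ(z) = 1 / (1 + e^(−z)) maps any real score to (0, 1), with σ(0) = 0.5; logistic regression applies it to a linear score z = w·x + b and reads the result as P(y = 1 | x). The helper name `logistic_probability` and the weights in the usage line are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_probability(weights, bias, features):
    # Linear score z = w·x + b, then squashed into a probability by the sigmoid
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

for z in (-4, -1, 0, 1, 4):
    print(z, round(sigmoid(z), 4))

# Hypothetical weights and features: z = 0.8*1.0 - 0.5*2.0 + 0.1 = -0.1
p = logistic_probability([0.8, -0.5], 0.1, [1.0, 2.0])
print(round(p, 4), "class 1" if p >= 0.5 else "class 0")  # 0.5 is the default threshold
```

Note the symmetry σ(−z) = 1 − σ(z): a score of −1 gives the complement of the probability a score of +1 gives.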

Short Questions (2 marks each)

6. Spam Detection Model
A spam detection system using Logistic Regression often classifies important emails as spam.
Q: Which metric should be improved: Precision or Recall? Justify your answer.
7. Bank Loan Approval Model
You are tuning a Random Forest model for loan approvals.
Q: Why would you prefer Random Search CV over Grid Search in this situation?
8. Medical Diagnosis App
A mobile app for disease detection from chest X-rays performs well on training data but poorly on
new unseen cases.
Q: How can data augmentation help in this scenario? Give one transformation example.
9. E-commerce Recommendation System
Your system uses many features like clicks, views, ratings, time spent.
Q: Why is Feature Selection important here? How might you implement it?
10. KNN Model for Handwritten Digit Classification
The KNN model achieves 99% accuracy on training data but only 70% on test data.
Q: What kind of error is this? What strategy would you use to improve it?
11. Speech Recognition Tool
You apply time stretching and noise addition to speech audio.
Q: What kind of machine learning issue are you addressing with this augmentation?

Long Questions (5 marks each)

12. Logistic Regression for Diabetes Prediction
A hospital develops a logistic regression model for early detection of diabetes.
a) Explain how the Sigmoid function converts outputs into probabilities.
b) What are the loss function and cost function in this context, and how are they optimized?
13. K-Fold CV in Insurance Risk Model
You are building a classification model to assess insurance risk.
a) Explain K-Fold Cross Validation with a diagram.
b) Describe how you would combine it with Grid Search CV to improve hyperparameter
selection.
14. Bias-Variance in House Price Estimator
A regression model overfits when using 20 features but underfits when using just 5.
a) Define the Bias-Variance Tradeoff with a real-world analogy.
b) Explain how Regularization (L1/L2) can help find a balance.
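For the regularization part above, a minimal sketch of L2 (ridge) regularization using its closed form w = (XᵀX + λI)⁻¹Xᵀy. The data are hypothetical: 30 samples, 20 features of which only 5 actually influence y, echoing the 20-feature model in the question. As λ grows, the weight vector shrinks and training error rises, trading a little bias for lower variance. L1 (lasso) is not shown because it has no closed form and needs an iterative solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 20
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]       # hypothetical: only 5 informative features
y = X @ true_w + rng.normal(scale=1.0, size=n)

def ridge(X, y, lam):
    # Closed-form L2 solution (no intercept, for brevity): solve (XᵀX + λI) w = Xᵀy
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge(X, y, lam)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"lambda={lam:6.1f}  ||w||={np.linalg.norm(w):.3f}  train MSE={train_mse:.3f}")
```

In practice λ would itself be chosen by K-fold cross-validation, using validation error rather than the training error printed here.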
Numerical Practice Questions
Q1. You are given a dataset of 10 samples with 3 features X1, X2, X3 and a target variable Y. After fitting
a Multiple Linear Regression model, you obtained:
Total Sum of Squares (SST) = 180
Regression Sum of Squares (SSR) = 126
Mean Squared Error (MSE) = 6
a) Calculate R² and interpret the result.
b) Find the number of data samples.
c) Calculate the number of predictors.
d) Compute the Residual Sum of Squares (SSE).
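For Q1, a short sketch of the arithmetic. R² = SSR / SST = 126 / 180 = 0.70, so the model explains about 70% of the variance in Y, and SSE = SST − SSR = 54. Parts (b) and (c) depend on how MSE is defined; assuming MSE = SSE / (n − p − 1), the ratio SSE / MSE gives the residual degrees of freedom n − p − 1.

```python
SST, SSR, MSE = 180.0, 126.0, 6.0

r2 = SSR / SST          # fraction of the variance in Y explained by the model
SSE = SST - SSR         # residual sum of squares, since SST = SSR + SSE
dof_resid = SSE / MSE   # equals n - p - 1 if MSE = SSE / (n - p - 1)

print(f"R^2 = {r2:.2f}, SSE = {SSE:.0f}, n - p - 1 = {dof_resid:.0f}")
```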

Q2. Given a linear regression cost function:

a) Perform two iterations of Batch Gradient Descent. Show step-by-step updates.

b) Evaluate if α = 0.01 is too small or too large in this case with justification.

Q3. Given the following predictions and true labels:

Sample   y   ŷ
1        1   0.9
2        0   0.1
3        1   0.6
4        0   0.4

a) Compute total binary cross-entropy loss.

b) Comment on the model's confidence in predictions and its implications.
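For Q3, a minimal sketch of binary cross-entropy, assuming the natural logarithm. Each sample contributes −[y·ln(ŷ) + (1 − y)·ln(1 − ŷ)]: samples 1 and 2 are confident and correct (≈ 0.105 each), while samples 3 and 4 are only weakly correct (≈ 0.511 each), so they dominate the total of ≈ 1.232 (mean ≈ 0.308).

```python
import math

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.6, 0.4]

def bce_terms(y_true, y_prob):
    # Per-sample binary cross-entropy: -[y*ln(p) + (1 - y)*ln(1 - p)]
    return [-(y * math.log(p) + (1 - y) * math.log(1 - p)) for y, p in zip(y_true, y_prob)]

terms = bce_terms(y_true, y_prob)
total = sum(terms)
print([round(t, 4) for t in terms], round(total, 4), round(total / len(terms), 4))
```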

Q4. You are fitting polynomial regression models of degree 1, 3, and 6. Given RMSE values:

Model    Training RMSE   Validation RMSE
Deg 1    14.2            14.5
Deg 3    7.1             7.5
Deg 6    2.3             12.1

a) Analyze the bias and variance behaviour of each model.

b) Identify the best model. Justify based on trade-off analysis.
c) Sketch expected training vs validation error curve.
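For Q4, a short sketch of the comparison using the table's values. Degree 1 has high error on both sets with a small gap (high bias); degree 6 has very low training error but a large gap of 9.8 (high variance); degree 3 has the lowest validation RMSE with a small gap, which is the usual trade-off choice.

```python
rmse = {1: (14.2, 14.5), 3: (7.1, 7.5), 6: (2.3, 12.1)}  # degree: (training, validation)

for degree, (train, val) in rmse.items():
    print(f"degree {degree}: train={train}, val={val}, gap={val - train:.1f}")

best = min(rmse, key=lambda d: rmse[d][1])  # lowest validation RMSE
print("best degree:", best)
```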

Q5. Given the following training dataset and K = 3, use Euclidean distance to classify the test point X = (5, 5):

X1   X2   Class
1    1    A
2    2    A
6    6    B
7    7    B
8    8    B

a) Show distance calculation for all training points.

b) Predict the class of the test point.
c) Explain why increasing K may change the result.
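For Q5, a minimal sketch of the calculation. The distances from (5, 5) are √32 ≈ 5.657, √18 ≈ 4.243, √2 ≈ 1.414, √8 ≈ 2.828 and √18 ≈ 4.243. The two nearest points are (6, 6) and (7, 7), both class B; the third place is a tie between (2, 2) and (8, 8), but B already holds a 2-vote majority, so the prediction is B either way.

```python
import math
from collections import Counter

train = [((1, 1), "A"), ((2, 2), "A"), ((6, 6), "B"), ((7, 7), "B"), ((8, 8), "B")]
test_point = (5, 5)

# a) Euclidean distance from the test point to every training point (in input order)
dists = [(math.dist(x, test_point), label, x) for x, label in train]
for d, label, x in dists:
    print(f"{x} class {label}: distance = {d:.3f}")

def knn_classify(k):
    # Python's sort is stable, so points tied at the same distance keep their input order
    nearest = sorted(dists, key=lambda t: t[0])[:k]
    return Counter(label for _, label, _ in nearest).most_common(1)[0][0]

print("K=3 prediction:", knn_classify(3))
```

Calling `knn_classify` with other values of K shows how the vote changes as more distant points are included.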
Q6. A dataset has the following 2D feature matrix X after centering:

$$
X = \begin{bmatrix} 2 & 0 \\ 0 & 2 \\ -2 & 0 \\ 0 & -2 \end{bmatrix}
$$

a) Compute covariance matrix.

b) Find eigenvalues and eigenvectors.
c) Project the data onto the principal component axis.
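For Q6, a minimal NumPy sketch. Since X is already centred, the sample covariance is XᵀX / (n − 1) = (8/3)·I (or 2·I if you divide by n). Both eigenvalues are equal, so every direction carries the same variance and the principal axis is not unique; the sketch picks the x1-axis purely for illustration.

```python
import numpy as np

X = np.array([[2, 0], [0, 2], [-2, 0], [0, -2]], dtype=float)  # already centred

cov = np.cov(X, rowvar=False)           # sample covariance, divides by n - 1 = 3
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
print(cov)
print(eigvals)

# Equal eigenvalues: any unit vector is an eigenvector, so we choose (1, 0) as the axis
pc = np.array([1.0, 0.0])
projections = X @ pc
print(projections)
```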

Q7. Given the following confusion matrix:

                  Predicted Positive   Predicted Negative
Actual Positive   45                   5
Actual Negative   15                   35

a) Calculate Precision, Recall, Accuracy, and F1-score.

b) Suppose a cost-sensitive problem penalizes false negatives heavily. Suggest how the model should be
adjusted.
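For Q7(a), a short sketch of the metrics from the table: precision = 45/60 = 0.75, recall = 45/50 = 0.90, accuracy = 80/100 = 0.80 and F1 ≈ 0.818. For (b), heavily penalised false negatives call for higher recall, for example by lowering the decision threshold, at the cost of more false positives.

```python
TP, FN = 45, 5   # actual positives: predicted positive / predicted negative
FP, TN = 15, 35  # actual negatives: predicted positive / predicted negative

precision = TP / (TP + FP)                   # 45 / 60
recall = TP / (TP + FN)                      # 45 / 50
accuracy = (TP + TN) / (TP + FN + FP + TN)   # 80 / 100
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f} f1={f1:.3f}")
```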

Q8. A dataset has 500 records. Column A has 20% missing values. You applied mean imputation. You
then fit a linear regression model using this feature.
a) Explain the statistical effect of mean imputation on variance.
b) Propose and justify an alternative method.
c) Calculate expected MSE reduction if imputation is done via linear interpolation (assume variance is
reduced by 30%).
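For Q8(a), a minimal sketch of why mean imputation shrinks variance. Filling the 100 missing values with the observed mean adds rows with zero deviation, so the sum of squared deviations stays the same while n grows from 400 to 500; the population variance therefore drops to exactly 400/500 = 0.8 of the observed variance. The column values are hypothetical random draws.

```python
import numpy as np

rng = np.random.default_rng(0)
col = rng.normal(loc=50.0, scale=10.0, size=500)    # hypothetical Column A
missing = rng.choice(500, size=100, replace=False)  # 20% of rows marked missing
col[missing] = np.nan

observed = col[~np.isnan(col)]
imputed = np.where(np.isnan(col), observed.mean(), col)  # mean imputation

var_obs = observed.var()  # population variance (ddof=0) of the 400 observed values
var_imp = imputed.var()   # variance after filling 100 values with the mean
print(var_obs, var_imp, var_imp / var_obs)
```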

Q9. Given the following 2D points, K = 2, and initial centroids:

Points: (2,3), (3,3), (6,7), (8,7)
Initial Centroids: C1 = (2,3), C2 = (8,7)
a) Perform one full iteration of K-means (assign points to clusters, recalculate centroids).
b) Compute SSE (Sum of Squared Errors) after 1 iteration.
c) Discuss how Elbow method will help in determining K. (extra)
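For Q9, a minimal sketch of one K-means iteration. The assignment step puts (2, 3) and (3, 3) in C1 and (6, 7) and (8, 7) in C2; the update step moves the centroids to (2.5, 3) and (7, 7). Measured against the updated centroids, SSE = 0.25 + 0.25 + 1 + 1 = 2.5 (against the initial centroids it would be 0 + 1 + 4 + 0 = 5). The sketch assumes no cluster ends up empty.

```python
points = [(2, 3), (3, 3), (6, 7), (8, 7)]
centroids = [(2.0, 3.0), (8.0, 7.0)]  # initial C1, C2

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Assignment step: each point goes to its nearest centroid (index 0 = C1, 1 = C2)
assign = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]

# Update step: each centroid moves to the mean of its assigned points
new_centroids = []
for c in range(len(centroids)):
    members = [p for p, a in zip(points, assign) if a == c]
    new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))

# SSE after the iteration, measured against the updated centroids
sse = sum(sq_dist(p, new_centroids[a]) for p, a in zip(points, assign))
print(assign, new_centroids, sse)
```

For the Elbow method, the same SSE would be computed after convergence for K = 1, 2, 3, … and plotted against K, looking for the point where further increases give only small reductions.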
