
5-Mark Questions and Answers - Machine Learning Applications

1. Explain the concept of 'Least Squares' in Linear Regression.

Ans: The Least Squares method minimizes the sum of squared differences between observed values and predicted values. It finds the best-fitting line by minimizing the cost function J(θ) = (1/n) * Σ(yᵢ − ŷᵢ)².
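
As an illustration (not part of the original answer), a minimal NumPy sketch that fits a line with the closed-form least-squares solution; the toy data and variable names are made up:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

# Design matrix with a bias column so the fitted line has an intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution minimizes the sum of squared residuals
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", theta)

# Cost J(theta) = (1/n) * sum((y - y_hat)^2)
y_hat = X @ theta
print("MSE:", np.mean((y - y_hat) ** 2))
```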

2. Difference between L1 and L2 regularization; preference situations.

Ans: L1 (Lasso) adds λΣ|θⱼ| to the cost and can shrink some coefficients exactly to zero (built-in feature selection). L2 (Ridge) adds λΣθⱼ² and shrinks all coefficients toward zero without eliminating them. Use L1 when you want sparsity and L2 when all features are expected to matter.
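
A hedged scikit-learn sketch of the difference, using synthetic data where only two features matter (the data and alpha values are illustrative, not from the original text):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1: some coefficients driven exactly to zero
ridge = Ridge(alpha=0.1).fit(X, y)   # L2: all coefficients shrunk, none exactly zero
print("Lasso:", lasso.coef_)
print("Ridge:", ridge.coef_)
```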

3. How Logistic Regression predicts binary outcomes using sigmoid.

Ans: It applies the sigmoid function σ(z) = 1 / (1 + e^(−z)) to the linear output z, converting it into a probability. If the output is > 0.5, class 1 is predicted; otherwise class 0.
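
A small NumPy sketch of this prediction rule (the weights, bias, and inputs are made-up examples):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)) maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, weights, bias, threshold=0.5):
    """Linear score -> probability via sigmoid -> class label via threshold."""
    prob = sigmoid(X @ weights + bias)
    return (prob > threshold).astype(int), prob

X = np.array([[0.5, 1.2], [-1.0, 0.3]])
labels, probs = predict(X, weights=np.array([1.0, -0.5]), bias=0.1)
print(labels, probs)
```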

4. Two practical aspects before training a machine learning model.

Ans: 1. Data preprocessing (handling missing values, scaling, encoding). 2. Feature selection/engineering (remove irrelevant features to improve model performance).

5. How a Decision Tree makes a prediction.

Ans: The input data is evaluated from the root node by applying feature-based splits until it reaches a leaf node, which provides the predicted class or value.

6. What is pruning in Decision Trees and why it's important.

Ans: Pruning removes unnecessary branches to reduce overfitting. Pre-pruning stops growth early; post-pruning trims the tree after full growth for better generalization.
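
As an illustrative scikit-learn sketch (the dataset and parameter values are arbitrary), pre-pruning corresponds to limits like max_depth, while cost-complexity post-pruning is controlled by ccp_alpha:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via max_depth / min_samples_leaf
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_tr, y_tr)

# Post-pruning: grow fully, then prune using the cost-complexity parameter ccp_alpha
post = DecisionTreeClassifier(ccp_alpha=0.02).fit(X_tr, y_tr)

print("pre-pruned test accuracy: ", pre.score(X_te, y_te))
print("post-pruned test accuracy:", post.score(X_te, y_te))
```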

7. Concept of 'hyperplane' and 'margin' in SVMs.

Ans: The hyperplane separates classes in feature space. The margin is the distance to the closest data points (support vectors). SVM maximizes this margin for better generalization.

8. Bagging vs. Boosting.

Ans: Bagging trains multiple models independently to reduce variance (e.g., Random Forest). Boosting trains models sequentially to reduce bias (e.g., AdaBoost).
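
A hedged sketch comparing the two ensemble styles on synthetic data (dataset and estimator counts are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)   # independent trees, reduces variance
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)      # sequential weak learners, reduces bias

print("Random Forest CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("AdaBoost CV accuracy:     ", cross_val_score(boosting, X, y, cv=5).mean())
```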

9. Describe K-Means algorithm.

Ans: 1. Initialize K centroids. 2. Assign each point to the nearest centroid. 3. Update centroids. 4. Repeat until convergence. Challenge: sensitive to K and initialization.
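
A simplified from-scratch sketch of these four steps (it assumes no cluster ever becomes empty and uses made-up toy data):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize K centroids by picking K random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop when centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
print(centroids)
```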

10. Intuition behind PCA for dimensionality reduction.

Ans: PCA finds principal components (directions of max variance), reduces dimensions by projecting data
into fewer axes while retaining most information.
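
An illustrative scikit-learn sketch of that projection (the data and number of components are arbitrary choices for the example):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Project the 5-dimensional data onto the 2 directions of maximum variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # fraction of variance kept per component
```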

11. Importance of feature selection.

Ans: It removes irrelevant/noisy features, reduces overfitting, speeds up training, and improves model interpretability and accuracy.

12. How to debug a poorly performing model.

Ans: Check if it has high bias (underfitting) or variance (overfitting) using learning curves. Tune hyperparameters, improve data quality, or change the model.

13. Bias-Variance trade-off.

Ans: Bias is error from overly simplistic models; variance is error from sensitivity to data noise. High bias leads to underfitting, high variance to overfitting.

14. What is one-hot encoding and why it's used.

Ans: It converts categorical variables into binary vectors so they can be used by ML models. Each category is represented by a unique binary column.
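
A tiny pandas sketch with a made-up column name showing the binary columns produced:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each category becomes its own 0/1 indicator column
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
```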

15. Steps in setting up a supervised learning problem.

Ans: 1. Define the problem. 2. Collect and clean data. 3. Feature engineering. 4. Split into train/test. 5. Train model. 6. Evaluate. 7. Deploy.
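
As a hedged sketch of steps 4-6 only (the dataset, model, and scaling step are illustrative choices, not prescribed by the original answer):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 4. Split into train/test
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5. Train a model (with scaling as a simple preprocessing step)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

# 6. Evaluate on held-out data
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```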

16. How Decision Tree predicts unseen data (with example).

Ans: Follow feature-based decisions from root to leaf. Example: if outlook = sunny and humidity = high, predict 'no'.

17. Information Gain and Gini Impurity as split criteria.

Ans: Information Gain measures entropy reduction; Gini measures class impurity. Both help select the best features for splitting.
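
A small NumPy sketch of both criteria on a made-up perfect split, to make the formulas concrete:

```python
import numpy as np

def entropy(labels):
    """Entropy H = -sum(p * log2(p)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity G = 1 - sum(p^2) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])   # a perfect split
print(information_gain(parent, left, right))              # 1.0
print(gini(parent), gini(left))                            # 0.5 0.0
```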

18. Disadvantages of unpruned Decision Trees; how pruning helps.

Ans: Unpruned trees are complex and overfit. Pruning simplifies the tree, improving generalization and reducing overfitting.

19. Decision Trees vs. Linear Regression.

Ans: Linear Regression assumes linearity and works with numeric data. Decision Trees handle both numeric and categorical data as well as non-linear patterns, but overfit easily.

20. Pre-pruning vs. Post-pruning.

Ans: Pre-pruning stops tree growth early. Post-pruning grows the tree fully and removes unhelpful branches afterward.

21. Margin in SVMs and why maximize it.

Ans: Margin is the distance to the closest points. Maximizing it improves generalization and robustness of the model.

22. SVM and non-linearly separable data (kernel trick).

Ans: The kernel trick maps data to a higher-dimensional space where it becomes linearly separable. Example: RBF kernel.
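
An illustrative scikit-learn sketch on concentric circles, a classic non-linearly-separable toy dataset (the dataset choice is an example, not from the original answer):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit higher-dimensional mapping

print("linear kernel accuracy:", linear_svm.score(X, y))   # roughly chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))      # close to 1.0
```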

23. Hard margin vs. Soft margin SVM.

Ans: Hard margin allows no misclassification (requires perfect separation). Soft margin uses slack variables to allow some errors, better for noisy data.

24. Regularization parameter (C) in SVMs.

Ans: C balances margin maximization vs. classification error. Low C allows wider margins with some errors; high C focuses on minimizing errors.
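
A hedged sketch of how C changes behavior on noisy synthetic data (the specific C values and dataset are arbitrary; results will vary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# flip_y adds label noise so a wide, tolerant margin can help
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)

# Small C: wider margin, tolerates more training errors (stronger regularization)
# Large C: narrower margin, tries hard to classify every training point correctly
for C in (0.01, 1, 100):
    score = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5).mean()
    print(f"C={C}: CV accuracy = {score:.3f}")
```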

25. SVM vs. Logistic Regression.

Ans: SVM maximizes margin and handles non-linear data via kernels. Logistic Regression estimates probabilities and is more interpretable for linear cases.
