Interview Questions

1. Explain the difference between KNN and k-means clustering?

KNN is a supervised machine learning algorithm: we provide the model with labelled data, and it classifies a new point based on its distance to the nearest labelled points. K-means clustering, on the other hand, is an unsupervised machine learning algorithm: we provide the model with unlabelled data, and it groups the points into clusters based on their distances to the cluster means.
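The contrast above can be seen in a minimal sketch, assuming scikit-learn is available (the toy points are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Labelled 2-D points: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# KNN (supervised): needs the labels y to classify a new point
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.5, 0.5]]))   # nearest neighbours all belong to class 0

# k-means (unsupervised): sees only X and groups it into k clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                  # cluster ids, not the original labels
```

Note that k-means never sees `y`: its `labels_` are arbitrary cluster ids, which may or may not line up with the true classes.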

2. How do you ensure that your model is not overfitting?

Keep the design of the model simple, and try to reduce noise by using fewer variables and parameters. Techniques such as regularization and cross-validation also help keep the model from memorizing the training data.

3. Explain ensemble learning.

In ensemble learning, many base models, such as classifiers or regressors, are generated and combined so that together they give better results than any single one. It works best when the component models are accurate and independent of one another. There are sequential ensemble methods (e.g. boosting) as well as parallel ones (e.g. bagging).
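Both flavours can be sketched with scikit-learn (an assumption, since the answer names no library; the synthetic data is for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Parallel ensemble (bagging): many trees trained independently
# on bootstrap samples, predictions combined by voting
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Sequential ensemble (boosting): each learner is trained to
# focus on the examples the previous ones got wrong
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), ada.score(X_te, y_te))
```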

4. What is OpenCV?

OpenCV is an open-source computer vision library, originally released under the BSD license.

5. While working on a data set, how do you select important variables? Explain your methods.
Following are some variable-selection methods you can use:
1. Remove correlated variables before selecting important ones
2. Fit a linear regression and select variables based on p-values
3. Use forward selection, backward elimination, or stepwise selection
4. Fit a Random Forest or XGBoost model and plot the variable-importance chart
5. Use Lasso regression
6. Measure the information gain of the available features and select the top n features
accordingly.

6. When does regularization become necessary in machine learning?

Regularization becomes necessary when the model begins to overfit. The technique adds a cost term to the objective function that penalizes bringing in more (or larger) feature weights.
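The effect of that cost term can be seen by comparing plain least squares with ridge regression (a sketch using scikit-learn; the data is synthetic, with only feature 0 actually related to the target):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))            # few samples, many features
y = X[:, 0] + 0.1 * rng.normal(size=30)  # only feature 0 matters

ols = LinearRegression().fit(X, y)       # no penalty: fits the noise too
ridge = Ridge(alpha=10.0).fit(X, y)      # L2 cost term shrinks the weights

# The penalty pulls the coefficient vector towards zero
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

The ridge coefficient vector always has a smaller norm than the unpenalized one; that shrinkage is what tames overfitting.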

7. Explain dimension reduction in machine learning.

Dimension reduction is the process of reducing the size of the feature matrix. We try to reduce the number of columns so that we get a better feature set, either by combining existing columns or by removing redundant variables.
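The "combining columns" route is what PCA does; a minimal sketch with scikit-learn, on made-up data where two of the five columns are near-copies of the others:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
# 5 columns, but column 2 ~ column 0, column 3 = -column 1,
# and column 4 is tiny noise, so only 2 real dimensions exist
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.01 * rng.normal(size=100),
                     -base[:, 1],
                     0.01 * rng.normal(size=100)])

# PCA combines the correlated columns into 2 new axes
pca = PCA(n_components=2)
X_small = pca.fit_transform(X)
print(X.shape, "->", X_small.shape)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept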
8. What Are the Different Types of Machine Learning?
There are three types of machine learning:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning

9. What Is Overfitting, and How Can You Avoid It?

Overfitting occurs when a model learns the training set too well, picking up random fluctuations in the training data as if they were real concepts. These spurious patterns do not carry over to new data and hurt the model's ability to generalize.
Such a model shows near-perfect accuracy on the training data, but on test data the error can be large and the efficiency low. This gap between training and test performance is the hallmark of overfitting.
There are multiple ways of avoiding overfitting, such as:
 Regularization: it adds a cost term for the features to the objective function
 Making a simpler model: with fewer variables and parameters, the variance can be reduced
 Cross-validation methods such as k-fold can also be used
 If some model parameters are likely to cause overfitting, regularization techniques like
LASSO can be used to penalize those parameters
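The k-fold cross-validation bullet can be sketched in a few lines with scikit-learn (an assumption; the classifier and data set are just convenient stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, validate on the held-out fold, rotate.
# Each score is measured on data the model never trained on.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # estimate of generalization, not training, accuracy
```

A large gap between training accuracy and this cross-validated score is exactly the overfitting symptom described above.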

10. Explain the Confusion Matrix with Respect to Machine Learning Algorithms.

A confusion matrix (or error matrix) is a table used to measure the performance of a classification algorithm. It is mostly used in supervised learning; in unsupervised learning, the analogous table is called a matching matrix.
The confusion matrix has two axes:
 Actual
 Predicted
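A small example of those two axes with scikit-learn (the label vectors are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Actual vs predicted labels for a binary classifier
actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
# Rows are the actual class, columns the predicted class:
#   cm[0, 0] true negatives,  cm[0, 1] false positives,
#   cm[1, 0] false negatives, cm[1, 1] true positives
```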
