Machine Learning Insem-01 QP
5 The ______ processes are powerful, non-parametric tools that can be used in supervised [0.5 M]
learning, namely in regression but also in classification problems.
(a) Stochastic (b) Markov (c) Gaussian (d) Statistical
6 Machine Learning uses the theory of ___________ in building mathematical models, [0.5 M]
because the core task is making inference from a sample.
(a) Statistics (b) Mathematics (c) Optimization (d) Physics
8 If the values of two variables move in the opposite direction, the [0.5 M]
correlation is ___________
(a) Strong (b) Weak (c) Positive (d) Negative
9 The _________ is a model assessment technique used to evaluate a machine learning [0.5 M]
algorithm’s performance when making predictions on new datasets it has not been
trained on. This is done by partitioning a dataset and using a subset to train the algorithm
and the remaining data for testing.
(a) Correlation (b) Cross-Validation (c) Generalization (d) Normalization
10 The most important way to characterize a random variable is to associate probabilities [0.5 M]
with the values it can take. If the random variable is discrete, i.e., it takes on a finite
or countably infinite number of values, then this assignment of probabilities is called a
Probability Mass Function. By definition it must be non-negative and must sum to one.
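The two defining conditions of a PMF can be checked with a minimal sketch; the fair-die distribution below is an illustration, not part of the question:

```python
# A minimal sketch of a Probability Mass Function (PMF) for a fair
# six-sided die -- an illustrative example, not from the question paper.
pmf = {face: 1 / 6 for face in range(1, 7)}

# By definition, every probability is non-negative...
assert all(p >= 0 for p in pmf.values())
# ...and the probabilities sum to one (within floating-point tolerance).
assert abs(sum(pmf.values()) - 1.0) < 1e-9
```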
11 [3 M]
Consider the following data set, describing CARS:
Calculate the coefficient of correlation value between the attributes “Horse Power” and
“Cylinders” in the above dataset.
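The CARS table itself is not reproduced above, so the Horse Power and Cylinders values in the sketch below are hypothetical stand-ins, used only to show how the Pearson coefficient of correlation is computed:

```python
import math

# Hypothetical Horse Power / Cylinders columns (the original CARS table
# is not reproduced in the question paper, so these values are invented).
horse_power = [130, 165, 150, 140, 198, 220]
cylinders   = [4, 6, 6, 4, 8, 8]

def pearson_r(x, y):
    """Pearson coefficient of correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sdy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sdx * sdy)

r = pearson_r(horse_power, cylinders)
# r lies in [-1, 1]; here the two columns rise together, so r is positive.
```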
Page 2 of 6
12 What are outliers? Mention any two strategies to deal with outliers in a dataset. [3 M]
Ans Noise (or an outlier) is a random error or variance in a measured variable.
Strategies to deal with outliers include:
Rule of thumb
• A value more than 1.5 * IQR above Q3 or below Q1 is an outlier
• A value more than 2 standard deviations away from the mean is an outlier
Binning
• Smooths a sorted data value by consulting its neighborhood.
• The sorted values are distributed into a number of buckets or bins.
• Also called local smoothing.
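The 1.5 * IQR rule of thumb above can be sketched as follows; the sample list and the simple quartile convention are illustrative assumptions:

```python
# A minimal sketch of the 1.5*IQR rule of thumb; the sample data are
# invented for illustration, not taken from the question paper.
def iqr_outliers(values):
    data = sorted(values)
    n = len(data)
    # Simple index-based quartiles (several conventions exist; this one
    # is adequate for demonstrating the rule of thumb).
    q1 = data[n // 4]
    q3 = data[(3 * n) // 4]
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

sample = [12, 13, 14, 15, 15, 16, 17, 18, 95]
# 95 sits far above Q3 + 1.5*IQR and is flagged as an outlier.
```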
13 In real-world data, tuples with missing values for some attributes are a [2 M]
common occurrence. Describe various methods for handling this problem.
Ans
First, determine the pattern of your missing data.
(g) Replace missing values using imputation. Imputation is a way of using
features to model each other. That way, when one is missing, the others can be
used to fill in the blank in a reasonable way.
(h) Replace missing values with a dummy value and create an indicator variable
for "missing." When a missing value really means that the feature is not
applicable, then that fact can be highlighted. Filling in a dummy value that is
clearly different from actual values, such as a negative rank, is one way to do this.
Another is to create a new true/false feature tracking whether the original feature
is missing.
(i) Replace missing values with 0. A missing numerical value can mean zero.
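Strategy (h) above can be sketched in a few lines; the column of ranks below is invented purely for illustration:

```python
# A hedged sketch of strategy (h): replace missing values with a dummy
# value that is clearly different from real values, and add a true/false
# indicator column tracking where the original value was missing.
# The ranks column is a hypothetical example.
ranks = [3, None, 1, None, 5]

DUMMY = -1  # a negative rank can never occur among the real values
filled = [r if r is not None else DUMMY for r in ranks]
is_missing = [r is None for r in ranks]
```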
This cost function is also called the ‘Sum of squared residuals’ or ‘Sum of squared errors’.
A predictive model, when being trained, attempts to fit the data in a manner that
minimizes this cost function.
A model begins to overfit when it passes through all the data points.
In such instances, although the value of the cost function is equal to zero, the
model, having fitted the noise in the dataset, does not represent the actual
function.
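The sum of squared errors named above can be computed directly; the observed and predicted values here are illustrative only:

```python
# A minimal sketch of the cost function described above: the sum of
# squared residuals (errors) between observed targets and predictions.
# The y / y_hat values are invented for illustration.
y     = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 7.0]

sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
# An overfit model passing through every point would drive sse to 0.
```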
Essentially, a model overfits the data by employing highly complex curves whose
terms have many degrees of freedom, with a corresponding coefficient weighting
each term.
For higher degrees of freedom, the test-set error is large compared to the
training-set error.
As the value of the penalty increases, the coefficients shrink in value in order to
minimize the cost function.
Since these coefficients also act as weights for the polynomial terms, shrinking
them reduces the weight assigned to those terms and ultimately reduces their impact.
Therefore, in the case above, the coefficients assigned to higher-degree
polynomial terms have shrunk to the point where such terms no longer affect the
model as severely as they did before, and we are left with a simple curve.
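The shrinkage effect of an increasing penalty can be seen in a one-feature sketch; the closed-form ridge solution used here (no intercept) and the data are illustrative assumptions:

```python
# A hedged sketch of ridge shrinkage for a single feature without an
# intercept, where the closed-form solution is
#   w = sum(x*y) / (sum(x*x) + lam).
# The x / y data are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]

def ridge_coef(x, y, lam):
    """Ridge coefficient for one feature; lam is the penalty strength."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

coefs = [ridge_coef(x, y, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
# As the penalty grows, the coefficient shrinks monotonically toward 0.
```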
****************************
Regularization is an effective technique to prevent a model from overfitting.
This method allows us to develop a more generalized model even if only a few
data points are available in our dataset.
Ridge regression helps to shrink the coefficients of a model whose parameters
or determining features are already known.
Overall, it’s an important technique that can substantially improve the
performance of our model.
Regularization Definition – 0.5 M