Machine Learning Insem-01 QP
5 The ______ processes are powerful, non-parametric tools that can be used in supervised [0.5 M]
learning, namely in regression but also in classification problems.
(a) Stochastic (b) Markov (c) Gaussian (d) Statistical
6 Machine Learning uses the theory of ___________ in building mathematical models, [0.5 M]
because the core task is making inference from a sample.
(a) Statistics (b) Mathematics (c) Optimization (d) Physics
8 If the values of two variables move in the opposite direction, the [0.5 M]
correlation is ___________
(a) Strong (b) Weak (c) Positive (d) Negative
9 The _________ is a model assessment technique used to evaluate a machine learning [0.5 M]
algorithm’s performance when making predictions on new datasets it has not been
trained on. This is done by partitioning a dataset and using a subset to train the algorithm
and the remaining data for testing.
(a) Correlation (b) Cross-Validation (c) Generalization (d) Normalization
10 The most important way to characterize a random variable is to associate probabilities [0.5 M]
with the values it can take. If the random variable is discrete, i.e., it takes on a finite
or countably infinite number of values, then this assignment of probabilities is called a
Probability Mass Function. By definition it must be non-negative and must sum to one.
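The two defining conditions of a PMF can be checked with a minimal sketch; the fair-die distribution below is an illustration, not part of the question:

```python
# A minimal sketch of a Probability Mass Function (PMF) for a fair
# six-sided die -- an illustrative example, not from the question paper.
pmf = {face: 1 / 6 for face in range(1, 7)}

# By definition, every probability is non-negative...
assert all(p >= 0 for p in pmf.values())
# ...and the probabilities sum to one (within floating-point tolerance).
assert abs(sum(pmf.values()) - 1.0) < 1e-9
```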
11 [3 M]
Consider the following data set, describing CARS:
Calculate the coefficient of correlation value between the attributes “Horse Power” and
“Cylinders” in the above dataset.
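The CARS table itself is not reproduced above, so the Horse Power and Cylinders values in the sketch below are hypothetical stand-ins, used only to show how the Pearson coefficient of correlation is computed:

```python
import math

# Hypothetical Horse Power / Cylinders columns (the original CARS table
# is not reproduced in the question paper, so these values are invented).
horse_power = [130, 165, 150, 140, 198, 220]
cylinders   = [4, 6, 6, 4, 8, 8]

def pearson_r(x, y):
    """Pearson coefficient of correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sdy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sdx * sdy)

r = pearson_r(horse_power, cylinders)
# r lies in [-1, 1]; here the two columns rise together, so r is positive.
```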
Page 2 of 6
12 What are outliers? Mention any two strategies to deal with outliers in a dataset. [3 M]
Ans Noise (or an outlier) is a random error or variance in a measured variable.
Strategies to deal with outliers include:
Rule of thumb
• A value more than 1.5 * IQR above Q3 or below Q1 is an outlier
• A value more than 2 standard deviations away from the mean is an outlier
Binning
• Smooths a sorted data value by consulting its neighborhood.
• The sorted values are distributed into a number of buckets or bins.
• Also called local smoothing.
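The 1.5 * IQR rule of thumb above can be sketched as follows; the sample list and the simple quartile convention are illustrative assumptions:

```python
# A minimal sketch of the 1.5*IQR rule of thumb; the sample data are
# invented for illustration, not taken from the question paper.
def iqr_outliers(values):
    data = sorted(values)
    n = len(data)
    # Simple index-based quartiles (several conventions exist; this one
    # is adequate for demonstrating the rule of thumb).
    q1 = data[n // 4]
    q3 = data[(3 * n) // 4]
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

sample = [12, 13, 14, 15, 15, 16, 17, 18, 95]
# 95 sits far above Q3 + 1.5*IQR and is flagged as an outlier.
```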
13 In real-world data, tuples with missing values for some attributes are a [2 M]
common occurrence. Describe various methods for handling this problem.
Ans
First, determine the pattern of your missing data.
(g) Replace missing values using imputation. Imputation is a way of using
features to model each other. That way, when one is missing, the others can be
used to fill in the blank in a reasonable way.
(h) Replace missing values with a dummy value and create an indicator variable
for "missing." When a missing value really means that the feature is not
applicable, then that fact can be highlighted. Filling in a dummy value that is
clearly different from actual values, such as a negative rank, is one way to do this.
Another is to create a new true/false feature tracking whether the original feature
is missing.
(i) Replace missing values with 0. A missing numerical value can mean zero.
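Strategy (h) above can be sketched in a few lines; the column of ranks below is invented purely for illustration:

```python
# A hedged sketch of strategy (h): replace missing values with a dummy
# value that is clearly different from real values, and add a true/false
# indicator column tracking where the original value was missing.
# The ranks column is a hypothetical example.
ranks = [3, None, 1, None, 5]

DUMMY = -1  # a negative rank can never occur among the real values
filled = [r if r is not None else DUMMY for r in ranks]
is_missing = [r is None for r in ranks]
```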
This cost function is also called the ‘Sum of squared residuals’ or ‘Sum of squared errors’.
A predictive model, when being trained, attempts to fit the data in a manner that
minimizes this cost function.
A model begins to overfit when it passes through all the data points.
In such instances, although the value of the cost function is equal to zero, the
model, having fitted the noise in the dataset, does not represent the actual
function.
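The sum of squared errors named above can be computed directly; the observed and predicted values here are illustrative only:

```python
# A minimal sketch of the cost function described above: the sum of
# squared residuals (errors) between observed targets and predictions.
# The y / y_hat values are invented for illustration.
y     = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 7.0]

sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
# An overfit model passing through every point would drive sse to 0.
```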
Essentially, a model overfits the data by employing highly complex curves whose
terms have many degrees of freedom, with a corresponding coefficient weighting
each term.
For higher degrees of freedom, the test-set error is large compared to the
training-set error.
As the value of the penalty increases, the coefficients shrink in value in order to
minimize the cost function.
Since these coefficients also act as weights for the polynomial terms, shrinking
them reduces the weight assigned to those terms and ultimately reduces their impact.
Therefore, in the case above, the coefficients assigned to higher-degree
polynomial terms have shrunk to the point where such terms no longer affect the
model as severely as they did before, and we are left with a simple curve.
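The shrinkage effect of an increasing penalty can be seen in a one-feature sketch; the closed-form ridge solution used here (no intercept) and the data are illustrative assumptions:

```python
# A hedged sketch of ridge shrinkage for a single feature without an
# intercept, where the closed-form solution is
#   w = sum(x*y) / (sum(x*x) + lam).
# The x / y data are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.1]

def ridge_coef(x, y, lam):
    """Ridge coefficient for one feature; lam is the penalty strength."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

coefs = [ridge_coef(x, y, lam) for lam in (0.0, 1.0, 10.0, 100.0)]
# As the penalty grows, the coefficient shrinks monotonically toward 0.
```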
****************************
Regularization is an effective technique to prevent a model from overfitting.
This method allows us to develop a more generalized model even if only a few
data points are available in our dataset.
Ridge regression helps to shrink the coefficients of a model whose parameters
or determining features are already known.
Overall, it’s an important technique that can substantially improve the
performance of our model.
Regularization Definition – 0.5 M