Z.H. Sikder University of Science and Technology: Mid-Term Examination, Fall-2020
1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T? [1 mark]
None of these.
The process of the algorithm examining a large amount of historical weather data.
2. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem? [1 mark]
Regression
Classification
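The distinction can be illustrated with a minimal Python sketch (all values and function names below are invented for illustration): regression predicts a continuous number, classification a discrete label.

```python
# Illustrative sketch: regression outputs a continuous value,
# classification outputs a discrete label. Data are invented.

def predict_price(prev_price, slope=1.02):
    """Regression: tomorrow's stock price as a continuous dollar value."""
    return slope * prev_price

def predict_direction(prev_price, predicted_price):
    """Classification: a discrete label ('up' or 'down')."""
    return "up" if predicted_price > prev_price else "down"

price = predict_price(100.0)              # continuous output -> regression
label = predict_direction(100.0, price)   # discrete output   -> classification
print(price, label)
```

Because the target here is a dollar amount (a real number), the exam's correct choice is regression.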
3. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) [1 mark]
Given data on how 1000 medical patients respond to an experimental drug (such as
effectiveness of the treatment, side effects, etc.), discover whether there are different
categories or “types” of patients in terms of how they respond to the drug, and if so what these
categories are.
Given a large dataset of medical records from patients suffering from heart
disease, try to learn whether there might be different clusters of such patients for which we
might tailor separate treatments.
Have a computer examine an audio clip of a piece of music, and classify whether or
not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only
musical instruments (and no vocals).
Given genetic (DNA) data from a person, predict the odds of him/her developing
diabetes over the next 10 years.
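The difference between the options above comes down to the shape of the data: supervised learning gets labeled examples (x, y), unsupervised learning gets only the inputs x. A minimal sketch with made-up data:

```python
# Illustrative sketch (invented data): supervised learning uses labeled
# examples (x, y); unsupervised learning only gets the inputs x.

# Supervised: audio features paired with a vocals/no-vocals label.
labeled = [([0.9, 0.1], "vocals"), ([0.2, 0.8], "instrumental")]

# Unsupervised: patient response measurements with no labels at all;
# the goal is to discover clusters or "types" on our own.
unlabeled = [[0.9, 0.1], [0.85, 0.2], [0.1, 0.9]]

def nearest_label(x, examples):
    """A toy supervised predictor: copy the label of the closest example."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(x, ex[0]))[1]

print(nearest_label([0.8, 0.15], labeled))  # -> "vocals"
```

The audio-clip and diabetes problems come with a known target (vocals/no vocals, developed diabetes or not), so they are supervised; the two patient-grouping problems have no target labels, so they are unsupervised (clustering).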
4. Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be equal to the number of “A” grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year). [1 mark]
Here each row is one training example. Recall that in linear regression, our hypothesis is h_\theta(x) = \theta_0 + \theta_1 x, and we use m to denote the number of training examples.
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m?
4
4.5
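Since m is just the number of rows in the training set, it can be read off directly. A sketch (the exam's actual table does not survive in this copy; the rows below are placeholders, not the real data):

```python
# m is the number of training examples: the row count of the training set.
# The exam's actual table is not reproduced here; these rows are placeholders.
training_set = [(3, 4), (2, 1), (4, 3), (0, 1)]  # hypothetical (x, y) pairs

m = len(training_set)
print(m)  # a four-row table gives m = 4

def h(theta0, theta1, x):
    """Linear-regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x
```

Note that m counts rows, so it is always a whole number; a fractional option like 4.5 can be ruled out immediately.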
5. For this question, assume that we are using the training set from Question 4. Recall our definition of the cost function, J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2. What is the value of the cost function for this training set? [1 mark]
.5
1.5
.7
6. What is ?
2.5
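The squared-error cost function recalled in question 5 can be computed directly. A sketch (the data and parameter values below are illustrative, not the exam's):

```python
# Squared-error cost J(theta0, theta1) = (1/(2m)) * sum_i (h(x_i) - y_i)^2.
# The data below are illustrative, not the exam's training set.

def cost(theta0, theta1, data):
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)

data = [(0, 0), (1, 1), (2, 2)]   # points on the line y = x
print(cost(0, 1, data))           # perfect fit -> 0.0
print(cost(0, 0, data))           # (0 + 1 + 4) / (2*3) -> 0.8333...
```

Each residual h(x) - y is squared, summed over all m examples, and divided by 2m; a perfect fit gives J = 0.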
7. In the given figure, the cost function J(\theta_0, \theta_1) has been plotted against \theta_0 and \theta_1, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply). [1 mark]
If we start from point B, gradient descent with a well-chosen learning rate will
eventually help us reach at or near point A, as the value of cost function J(\theta_0, \theta_1) is
maximum at point A.
If we start from point B, gradient descent with a well-chosen learning rate will
eventually help us reach at or near point C, as the value of cost function J(\theta_0, \theta_1) is
minimum at point C.
If we start from point B, gradient descent with a well-chosen learning rate will
eventually help us reach at or near point A, as the value of cost function J(\theta_0, \theta_1) is
minimum at A.
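The behaviour described in question 7 is the standard simultaneous gradient-descent update. A minimal sketch on made-up data (the learning rate and points are illustrative):

```python
# Batch gradient descent for linear regression: starting from any point
# (like B in the plot), repeated updates with a well-chosen learning rate
# move theta toward a minimum of J (like point C), never toward a maximum.

def gradient_step(theta0, theta1, data, alpha):
    m = len(data)
    err = [theta0 + theta1 * x - y for x, y in data]
    g0 = sum(err) / m                                      # dJ/d theta0
    g1 = sum(e * x for e, (x, _) in zip(err, data)) / m    # dJ/d theta1
    return theta0 - alpha * g0, theta1 - alpha * g1        # simultaneous update

data = [(0, 1), (1, 3), (2, 5)]   # illustrative points on y = 2x + 1
t0, t1 = 5.0, -4.0                # an arbitrary starting point, like B
for _ in range(5000):
    t0, t1 = gradient_step(t0, t1, data, alpha=0.1)
print(round(t0, 3), round(t1, 3))  # converges near (1, 2), the minimum
```

Because each step moves against the gradient, the iterates descend toward the minimum of J; this is why only the option ending "minimum at point C" is correct.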
8. Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some \theta_0, \theta_1 such that J(\theta_0, \theta_1) = 0. Which of the statements below must then be true? [1 mark]
Gradient descent is likely to get stuck at a local minimum and fail to find the global
minimum.
Our training set can be fit perfectly by a straight line, i.e., all of our training
examples lie perfectly on some straight line.
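The perfect-fit claim can be checked numerically: J is a sum of squared terms divided by 2m, so J = 0 forces every term to zero, meaning every example lies exactly on the line. A sketch with illustrative data:

```python
# If J(theta0, theta1) = 0, every term (h(x_i) - y_i)^2 must itself be 0,
# so every training example lies exactly on the line theta0 + theta1 * x.

def cost(theta0, theta1, data):
    m = len(data)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in data) / (2 * m)

data = [(0, 1), (1, 3), (2, 5)]               # illustrative points on y = 2x + 1
zero_cost = cost(1, 2, data) == 0             # zero cost ...
perfect = all(1 + 2 * x == y for x, y in data)  # ... means a perfect fit
print(zero_cost, perfect)
```

Gradient descent getting stuck is not implied: the squared-error cost for linear regression is convex, so it has no bad local minima.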
9. … and … what is ?
-4
10. Which of the following are reasons for using feature scaling?
It prevents the matrix X^T X (used in the normal equation) from being non-invertible (singular/degenerate).
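Feature scaling itself is straightforward; it puts features on comparable scales so gradient descent converges faster, and contrary to the option above it has nothing to do with the invertibility of X^T X. A minimal mean-normalization sketch with made-up features:

```python
# Mean normalization: x' = (x - mean) / (max - min)   (or divide by the
# standard deviation). This puts features on comparable scales so gradient
# descent converges faster; it does not affect whether X^T X is invertible.

def scale(values):
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]

sizes = [1000.0, 2000.0, 3000.0]   # made-up house sizes (sq ft)
print(scale(sizes))                # -> [-0.5, 0.0, 0.5]
```

After scaling, every feature lies in a similar small range, so no single feature dominates the gradient updates.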
11. Viva-voce [10 marks]