Z.H. Sikder University of Science and Technology: Mid-Term Examination, Fall-2020

This document appears to be a midterm exam for a machine learning course, consisting of 10 multiple-choice questions worth 1 mark each and a 10-mark viva-voce (oral) examination. The questions cover topics such as supervised vs. unsupervised learning, linear regression, gradient descent, and feature scaling.


Z.H. Sikder University of Science and Technology
Department of Computer Science and Engineering
Mid-term Examination, Fall-2020

Course Title: Machine Learning        Program: B.Sc.
Course Code: CSE 4419                 Full Marks: 20
Batch: 17th                           Total Time: 1 Hour

Answer all of the following questions; marks are shown against each.

1. [1 mark] A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T?

The weather prediction task.

None of these.

The probability of it correctly predicting a future date’s weather.

The process of the algorithm examining a large amount of historical weather data.

2. [1 mark] Suppose you are working on stock-market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?

Regression
Classification

3. [1 mark] Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.)
Given data on how 1000 medical patients respond to an experimental drug (such as
effectiveness of the treatment, side effects, etc.), discover whether there are different
categories or “types” of patients in terms of how they respond to the drug, and if so what these
categories are.

Given a large dataset of medical records from patients suffering from heart
disease, try to learn whether there might be different clusters of such patients for which we
might tailor separate treatments.

Have a computer examine an audio clip of a piece of music, and classify whether or
not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only
musical instruments (and no vocals).

Given genetic (DNA) data from a person, predict the odds of him/her developing
diabetes over the next 10 years.

4. [1 mark] Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year. Specifically, let x be the number of “A” grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is h_θ(x) = θ₀ + θ₁x, and we use m to denote the number of training examples.

For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m?
4

4.5
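For reference, the hypothesis and the quantity m can be sketched in Python. The training-set values below are illustrative assumptions only, since the exam's table is not reproduced here:

```python
# Illustrative training set: x = first-year "A" count, y = second-year "A" count.
# These numbers are assumptions for demonstration, not the exam's actual table.
X = [3, 1, 0, 4]
Y = [2, 2, 1, 3]

m = len(X)  # m denotes the number of training examples

def h(theta0, theta1, x):
    """Linear regression hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(m)           # number of training examples in the sample data
print(h(0, 1, 3))  # prediction for x = 3 with theta0 = 0, theta1 = 1
```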

5. [1 mark] For this question, assume that we are using the training set from Question 4. Recall that our definition of the cost function was J(θ₀, θ₁) = (1/(2m)) Σ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², summed over i = 1, …, m. What is J(0, 1)?

0.5

1.5

0.7
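The squared-error cost function referenced here can be sketched directly; the data values below are again illustrative assumptions, not the exam's training set:

```python
def cost(theta0, theta1, X, Y):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum of squared residuals."""
    m = len(X)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(X, Y)) / (2 * m)

# Illustrative data (assumed for demonstration):
X = [3, 1, 0, 4]
Y = [2, 2, 1, 3]
print(cost(0, 1, X, Y))  # J(0, 1) for this sample data
```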

6. [1 mark] Suppose we set θ₀ and θ₁ to the values given in the linear regression hypothesis from Question 4. What is h_θ(x)?

2.5

7. [1 mark] In the given figure, the cost function J(θ₀, θ₁) has been plotted against θ₀ and θ₁, as shown in ‘Plot 2’. The contour plot for the same cost function is given in ‘Plot 1’. Based on the figure, choose the correct options (check all that apply).

If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of the cost function J(θ₀, θ₁) is maximum at point A.

If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point C, as the value of the cost function J(θ₀, θ₁) is minimum at point C.

Point P (the global minimum of Plot 2) corresponds to point A of Plot 1.

If we start from point B, gradient descent with a well-chosen learning rate will eventually help us reach at or near point A, as the value of the cost function J(θ₀, θ₁) is minimum at point A.

Point P (the global minimum of Plot 2) corresponds to point C of Plot 1.
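The behaviour described in these options can be sketched with batch gradient descent on the same kind of cost surface. The data, starting point, and learning rate below are assumptions chosen for illustration:

```python
def gradient_descent(X, Y, alpha=0.05, iters=2000):
    """Batch gradient descent on J(theta0, theta1); returns the fitted thetas."""
    m = len(X)
    t0, t1 = 5.0, -5.0  # an arbitrary starting point, like point B in the plot
    for _ in range(iters):
        err = [t0 + t1 * x - y for x, y in zip(X, Y)]
        g0 = sum(err) / m                               # dJ/dtheta0
        g1 = sum(e * x for e, x in zip(err, X)) / m     # dJ/dtheta1
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1       # simultaneous update
    return t0, t1

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]  # lies exactly on y = 2x (assumed data)
t0, t1 = gradient_descent(X, Y)
print(round(t0, 2), round(t1, 2))  # converges near theta0 = 0, theta1 = 2
```

With a well-chosen learning rate, the iterates descend from the starting point toward the cost function's minimum, which is the point the correct options describe.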

8. [1 mark] Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ₀, θ₁ such that J(θ₀, θ₁) = 0. Which of the statements below must then be true?

Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.

For this to be true, we must have θ₀ = 0 and θ₁ = 0, so that h_θ(x) = 0.

For this to be true, we must have y⁽ⁱ⁾ = 0 for every value of i = 1, 2, …, m.

Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.

9. [1 mark] Let u and v be 3-dimensional vectors, where u and v take the specific values given. What is uᵀv?

-4
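A transpose-vector product of this kind can be sketched as follows; the values of u and v below are illustrative assumptions, not the exam's vectors:

```python
def inner(u, v):
    """Compute the inner product u^T v of two equal-length vectors."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))

u = [1, 3, -1]  # assumed values for illustration
v = [2, 2, 4]
print(inner(u, v))  # 1*2 + 3*2 + (-1)*4 = 4
```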

10. [1 mark] Which of the following are reasons for using feature scaling?

It is necessary to prevent gradient descent from getting stuck in local optima.

It prevents the matrix XᵀX (used in the normal equation) from being non-invertible (singular/degenerate).

It speeds up gradient descent by making it require fewer iterations to get to a good solution.
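Mean normalization, one common form of feature scaling, can be sketched as follows. The feature values are assumptions chosen for illustration (e.g. house sizes in square feet):

```python
def scale(xs):
    """Mean-normalize a feature: subtract the mean, divide by the range."""
    mu = sum(xs) / len(xs)
    rng = max(xs) - min(xs)
    return [(x - mu) / rng for x in xs]

sizes = [2104, 1600, 2400, 1416]  # assumed feature values
scaled = scale(sizes)
print(scaled)  # values are centred near 0 and span a range of exactly 1
```

Bringing features onto comparable ranges like this makes the cost-function contours rounder, which is why gradient descent needs fewer iterations to converge.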

11. [10 marks] Viva-voce
