Tutorial 1 Machine Learning

Universiti Malaya

Uploaded by

BunnySha Land

WIA1006/WID3006, Semester II, Session 2022/2023

Tutorial 1

1.1 Intro to ML

1. A computer program is said to learn from experience E with respect to some task T and
some performance measure P if its performance on T, as measured by P, improves with
experience E. Suppose we feed a learning algorithm a lot of historical trajectory data of
vehicles, and have it learn to predict traffic speed. In this setting what is E?
2. Suppose you are working on traffic prediction, and you would like to predict whether traffic
will be heavy at 5pm tomorrow or not. You want to use a learning algorithm for this. Would
you treat this as a classification or a regression problem?

3. Some of the problems below are best addressed using a supervised learning algorithm, and
the others with an unsupervised learning algorithm. Which of the following would you
apply supervised learning to? (Select all that apply.) In each case, assume some appropriate
dataset is available for your algorithm to learn from.

For each problem, state whether it is supervised or unsupervised:

a. Take a collection of 1000 essays written on the US Economy, and find a way to
automatically group these essays into a small number of groups of essays that are somehow
"similar" or "related".

b. Given 50 articles written by male authors, and 50 articles written by female authors,
learn to predict the gender of a new manuscript's author (when the identity of this author
is unknown).

c. Examine a large collection of emails that are known to be spam email, to discover if
there are sub-types of spam mail.

d. Examine a web page, and classify whether the content on the web page should be
considered "child friendly" (e.g., non-pornographic, etc.) or "adult."

1.2 Linear Regression (Univariate)

1. Consider the problem of predicting the number of sunny days in each week of 2023 given the
number of sunny days in each week of 2022.
In this scenario, x represents the number of days in each week of 2022 on which the weather
was sunny. The value y is the number of sunny days in each week of 2023, which we want to
predict.
The following training set is a sample of a few weeks with the number of sunny days in each
of them.
Recall that in linear regression our hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, and
we use m to denote the number of training examples.

x y
3 2
2 3
4 5
1 1
5 4

For the training set given above, what is the value of m?

2. For this question, continue using the data provided in (1). Recall that the cost function
for linear regression is defined as

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

What is J(0,1)?
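
The cost definition above can be checked numerically. The following is a minimal Python sketch, assuming the usual univariate hypothesis (theta0 + theta1 * x) and the five training pairs from question 1. The function names `hypothesis` and `cost` are illustrative and do not come from the course material.

```python
# Training set from question 1: x = sunny days per week in 2022, y = in 2023.
xs = [3, 2, 4, 1, 5]
ys = [2, 3, 5, 1, 4]


def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis (assumed form): theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost: J = 1/(2m) * sum of squared prediction errors."""
    m = len(xs)  # number of training examples
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


print("m =", len(xs))
print("J(0, 1) =", cost(0, 1, xs, ys))
print("h(3) with theta0 = -1, theta1 = 0.5:", hypothesis(-1, 0.5, 3))
```

Running the script prints the quantities asked for in questions 1 to 3, so it can be used to check hand calculations.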

3. Suppose we set $\theta_0 = -1$ and $\theta_1 = 0.5$. What is $h_\theta(3)$?

4. Three different classifiers are trained on the same data. Their decision
boundaries are shown below. Which of the following statements are
true?

1. The leftmost classifier has high robustness, poor fit.
2. The leftmost classifier has poor robustness, high fit.
3. The rightmost classifier has poor robustness, high fit.
4. The rightmost classifier has high robustness, poor fit.

5. What is the difference between a local minimum and a global minimum in gradient descent?
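
The distinction in question 5 can be illustrated numerically. The sketch below is only an illustration: the function f(x) = x^4 - 3x^2 + x, the step size 0.01, and the two starting points are assumptions chosen for the example and are not part of the tutorial. Gradient descent started on different sides of the curve settles in different minima, and only one of them has the lowest value of f.

```python
def f(x):
    """Non-convex example function with two minima (chosen for illustration)."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, alpha=0.01, steps=2000):
    """Plain gradient descent on f, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= alpha * df(x)
    return x


left = gradient_descent(-2.0)   # starts on the left slope
right = gradient_descent(2.0)   # starts on the right slope

print("from x0 = -2:", left, "f =", f(left))
print("from x0 = +2:", right, "f =", f(right))
```

Both runs stop where the derivative is essentially zero, but they reach different values of f. The run that ends at the higher value has found a local minimum rather than the global one.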

1.3 Linear Regression (Multivariate)

1. Suppose we have m = 4 houses, and each house has an area and a number of bedrooms, which
can be used to predict the house price. The dataset is as follows:

No.  Bedrooms  Sqft_area  Price
1    1         880        490,000
2    3         1930       630,000
3    4         1940       640,000
4    3         1350       570,000

You’d like to use polynomial regression to predict a house price from its number of
bedrooms and sqft_area. Concretely, suppose you wish to fit a model of the form

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

where x1 is the number of bedrooms and x2 is sqft_area. Further, you plan to use both
feature scaling (dividing by the max-min, or range, of a feature) and mean normalization.

What is the normalized feature $x_2^{(4)}$ (i.e., $x_2$ for the fourth training example)?
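
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, assuming "mean normalization" means subtracting the feature's mean and "feature scaling" means dividing by its range (max minus min), as stated in the question. The helper name `normalize` is hypothetical.

```python
# Sqft_area column from the table above (x2 for the four training examples).
sqft_area = [880, 1930, 1940, 1350]


def normalize(values):
    """Subtract the mean, then divide by the range (max - min)."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]


normalized = normalize(sqft_area)
# Index 3 is the fourth training example (Python lists are zero-based).
x2_fourth = normalized[3]
print(round(x2_fourth, 3))
```

After this transformation the feature has zero mean and spans a range of exactly 1, which is a quick sanity check for a hand calculation.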

2. You run gradient descent for 12 iterations with α = 0.2 and compute J(θ) after each
iteration. You find that the value of J(θ) increases over time. What would you do to correct
this issue?
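
One way to see how J(θ) can rise from one iteration to the next is to run gradient descent with two different learning rates and compare the cost histories. The sketch below uses the five (x, y) pairs from section 1.2 as stand-in data, which is an assumption because this question does not specify a dataset. It also assumes a start at theta0 = theta1 = 0, and the second learning rate (0.05) is an arbitrary smaller value picked for comparison.

```python
# Stand-in data (assumption): the univariate training set from section 1.2.
xs = [3, 2, 4, 1, 5]
ys = [2, 3, 5, 1, 4]


def cost(theta0, theta1):
    """Squared-error cost J = 1/(2m) * sum of squared errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def run_gradient_descent(alpha, iterations=12):
    """Return the list of J values: the initial cost plus one per iteration."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    history = [cost(theta0, theta1)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1))
    return history


too_large = run_gradient_descent(alpha=0.2)
smaller = run_gradient_descent(alpha=0.05)

print("alpha = 0.2 :", [round(j, 3) for j in too_large])
print("alpha = 0.05:", [round(j, 3) for j in smaller])
```

On this data the larger step overshoots the minimum on every update, so the cost grows, while the smaller step produces a steadily decreasing cost.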

3. Suppose you have m = 23 training examples with n = 5 features (excluding the additional
all-ones feature for the intercept term, which you should add). The normal equation is
$\theta = (X^T X)^{-1} X^T y$. For the given values of m and n, what are the dimensions of
θ, X, and y in this equation?

1. X is 23 × 6, y is 23 × 6, θ is 6 × 6

2. X is 23 × 5, y is 23 × 1, θ is 5 × 1

3. X is 23 × 6, y is 23 × 1, θ is 6 × 1

4. X is 23 × 6, y is 23 × 1, θ is 5 × 5
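
The shapes in the normal equation can be checked by building the matrices explicitly. The sketch below is a pure-Python illustration under stated assumptions: the feature values and targets are random placeholders (the question gives no data), and the helpers `transpose`, `matmul`, and `solve` are hypothetical names written for this example. Instead of forming an explicit inverse, it solves the equivalent linear system (X^T X) θ = X^T y.

```python
import random

random.seed(0)  # fixed seed so the placeholder data is reproducible

m, n = 23, 5  # training examples and features, as in the question

# Placeholder data (assumption): each row starts with the all-ones intercept
# feature, followed by n random feature values.
X = [[1.0] + [random.random() for _ in range(n)] for _ in range(m)]
y = [random.random() for _ in range(m)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(A)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


Xt = transpose(X)                                          # (n + 1) x m
XtX = matmul(Xt, X)                                        # (n + 1) x (n + 1)
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]   # length n + 1
theta = solve(XtX, Xty)                                    # length n + 1

print("X:", len(X), "x", len(X[0]))
print("y:", len(y), "x 1")
print("theta:", len(theta), "x 1")
```

The printed shapes can be compared against the four options above.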

4. Suppose you have a dataset with m = 1,000,000 examples and n = 200,000 features for each
example. You want to use multivariate linear regression to fit the parameters θ to your data.
Should you prefer gradient descent or the normal equation?
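
A rough back-of-the-envelope calculation can inform this choice. The sketch below is only an order-of-magnitude estimate under stated assumptions: entries are stored as 8-byte floating-point numbers, inverting a k x k matrix costs on the order of k^3 operations, and one batch gradient-descent step costs on the order of m * n operations. None of these figures is a benchmark.

```python
m = 1_000_000   # training examples, from the question
n = 200_000     # features, from the question

# X^T X is (n + 1) x (n + 1) once the intercept column is added.
entries = (n + 1) ** 2
bytes_needed = entries * 8          # assuming 8-byte floating-point entries
gigabytes = bytes_needed / 1e9

# Rough operation counts (orders of magnitude only).
inversion_ops = (n + 1) ** 3            # inverting X^T X
gd_ops_per_iteration = m * (n + 1)      # one pass over X per gradient step

print(f"X^T X storage: about {gigabytes:.0f} GB")
print(f"inversion: about {inversion_ops:.1e} operations")
print(f"gradient descent: about {gd_ops_per_iteration:.1e} operations per iteration")
```

Comparing the storage and operation counts gives a concrete basis for the answer, independent of any particular hardware.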

5. Which of the following statements are true?


1. MAE doesn’t add any additional weight to the distance between points. The error
growth is linear.
2. MSE errors grow exponentially with larger values of distance. It’s a metric that adds a
massive penalty to points that are far away and a minimal penalty for points that are
close to the expected result.
3. Feature scaling is necessary to prevent the normal equation from getting stuck in local
optima.
4. It prevents the matrix $X^T X$ (used in the normal equation) from being non-invertible
(singular/degenerate).
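
To judge statements 1 and 2, it can help to compute both metrics for a single point at increasing distances from its target. The sketch below is a minimal illustration; the function names `mae` and `mse` and the sample distances are assumptions made for the example.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One target at 0 and a prediction that moves further away each time.
for distance in [1, 2, 4, 8]:
    print(distance, mae([0], [distance]), mse([0], [distance]))
```

In the output, doubling the distance doubles the MAE and quadruples the MSE, because the MSE squares each error before averaging.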
