
Homework Assignment 3

Due: Friday, October 4, 2024, 11:59 p.m. Mountain time


Total marks: 20

Policies:
For all multiple-choice questions, note that multiple correct answers may exist. However, selecting
an incorrect option will cancel out a correct one. For example, if you select two answers, one
correct and one incorrect, you will receive zero points for that question. Similarly, if the number
of incorrect answers selected exceeds the correct ones, your score for that question will be zero.
Please note that it is not possible to receive negative marks. You must select all the correct
options to get full marks for the question.
While the syllabus initially indicated the need to submit a paragraph explaining the use of AI or
other resources in your assignments, this requirement no longer applies as we are now utilizing
eClass quizzes instead of handwritten submissions. Therefore, you are not required to submit any
explanation regarding the tools or resources (such as online tools or AI) used in completing this
quiz.
This PDF version of the questions has been provided for your convenience should you wish to print
them and work offline.
Only answers submitted through the eClass quiz system will be graded. Please do not
submit a written copy of your responses.

Question 1. [1 mark]
Suppose you are tasked with predicting the age of a house based on the following information: the
size of the house in square feet, the number of bedrooms, and the year it was built. Which of the
following options correctly specifies the features and the label for this problem?

a. Features: size of the house, number of bedrooms, year built; Label: age of the house

b. Features: year built, number of bedrooms; Label: size of the house

c. Features: age of the house, size of the house; Label: number of bedrooms

d. Features: number of bedrooms, year built; Label: size of the house

Solution:
The correct answer is:

• a. Features: size of the house, number of bedrooms, year built; Label: age of the house

Question 2. [1 mark]
Imagine you want to predict the likelihood that a customer will purchase a product based on their
past purchase history, the amount of time spent on the website, and their demographic information
(e.g., age, gender, location). Which of the following options correctly specifies the features and the
label for this problem?

Fall 2024 CMPUT 267: Basics of Machine Learning

a. Features: past purchase history, time spent on the website, demographic info; Label: likelihood of purchase

b. Features: likelihood of purchase, past purchase history; Label: time spent on the website

c. Features: demographic info, likelihood of purchase; Label: past purchase history

d. Features: time spent on the website, past purchase history; Label: demographic info

Solution:
The correct answer is:
• a. Features: past purchase history, time spent on the website, demographic info; Label:
likelihood of purchase

Question 3. [1 mark]
You are provided with the daily temperatures, humidity levels, and wind speeds in a city. You
need to predict whether the city’s next recorded temperature will exceed the highest recorded
temperature for that day. Which of the following options correctly specifies the features and the
label for this problem?
a. Features: daily temperature, humidity levels, wind speeds; Label: whether the next recorded
temperature exceeds the record high

b. Features: wind speeds, daily temperature; Label: humidity levels

c. Features: whether the next recorded temperature exceeds the record high; Label: daily
temperature, humidity levels, wind speeds

d. Features: humidity levels, whether the next recorded temperature exceeds the record high;
Label: wind speeds

Solution:
The correct answer is:
• a. Features: daily temperature, humidity levels, wind speeds; Label: whether the next
recorded temperature exceeds the record high

Question 4. [1 mark]
Suppose you have a dataset where each instance represents a car. The features are: weight of the
car (in kg), engine power (in horsepower), and the number of seats. The label is the price of the
car in dollars. True or False: This is a regression problem.

Solution:
Answer: True.
Explanation: This is a regression problem because the label (car price) is a real-valued quantity.
In other words, the label has a notion of order and can be represented as a real number (an element of ℝ), making
it suitable for regression.


Question 5. [1 mark]
You are working on a dataset where each instance contains information about a flower. The features
are petal length, petal width, and sepal length (all in cm), and the label indicates the species of
the flower (one of three categories: species A, species B, or species C). True or False: This is a
classification problem.

Solution:
Answer: True.
Explanation: This is a classification problem because the label (flower species) has no notion of
order. The labels are drawn from a finite set of categories (species A, B, or C), and there is no
inherent ranking or numerical value associated with them, making this a classification problem.

Question 6. [1 mark]
Consider a dataset where each instance records a patient’s age, weight, and height. The label
represents the patient’s blood type: A, B, AB, or O. True or False: This is a classification problem.

Solution:
Answer: True.
Explanation: This is a classification problem because the label (blood type) has no notion of
order. The labels come from a finite set of categories (A, B, AB, or O), and there is no inherent
ranking or numerical relationship between them, which makes this a classification task.

Question 7. [1 mark]
Suppose you are working with a dataset where the features $X \in \mathcal{X} = \mathbb{R}$ and the labels $Y \in \mathcal{Y} = \mathbb{R}$.
You are using a predictor $f(X)$ and the absolute loss function $\ell(f(X), Y) = |f(X) - Y|$, where
$f(X)$ represents the prediction and $Y$ represents the label.
Which of the following expressions correctly represents the expected loss?

a. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} |f(x) - y| \, p(x, y) \, dy$

b. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x) - y| \, p(x, y) \, dy \, dx$

c. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x) - y| \, p(y \mid x) \, p(x) \, dy \, dx$

d. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} \log(1 + |f(x) - y|) \, p(x, y) \, dy \, dx$

Solution:
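The correct answers are:
• b. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x) - y| \, p(x, y) \, dy \, dx$

• c. $\mathbb{E}[\ell(f(X), Y)] = \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x) - y| \, p(y \mid x) \, p(x) \, dy \, dx$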
Explanation: Both expressions b and c are correct. In b, the expected loss is expressed directly
using the joint probability density p(x, y), where the integral covers both X and Y over their entire
domain R. In c, the joint probability p(x, y) is factorized using the product rule as p(y | x)p(x),
where p(y | x) is the conditional probability of Y given X, and p(x) is the marginal distribution of
X.


Question 8. [1 mark]
Suppose you flip a fair coin 5 times and observe the following outcomes: X1 = 1, X2 = 0, X3 = 1,
X4 = 0, and X5 = 1, where 1 represents heads and 0 represents tails. What is the sample mean of
these 5 coin flips?
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)

Solution:
The correct value is 0.6.
Explanation: The sample mean $\hat{\alpha}$ is calculated as the average of the observed outcomes:

$$\hat{\alpha} = \frac{1 + 0 + 1 + 0 + 1}{5} = \frac{3}{5} = 0.6$$

Therefore, the sample mean of the 5 coin flips is 0.6.
Note: The sample mean represents the proportion of heads observed in the 5 flips. Since 3 out of
5 flips resulted in heads, the sample mean is 0.6.

Question 9. [1 mark]
Suppose you are given a predictor f (x) = 10 − 0.2x, which models the relationship between the
age of a house x (in years) and its price y (in hundreds of thousands of dollars). You are provided
with the following dataset of 5 (x, y) pairs:

$$D = ((x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), (x_5, y_5)) = ((1, 9.5), (2, 9), (3, 8.7), (4, 8), (5, 7.8))$$

Let the loss function be the squared loss $\ell(f(x), y) = (f(x) - y)^2$. Calculate $\hat{L}(f) = \frac{1}{5} \sum_{i=1}^{5} \ell(f(x_i), y_i)$
(which is an estimate of the expected squared loss $L(f) = \mathbb{E}[\ell(f(x), y)]$ using the sample mean over
the dataset).
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)

Solution:
The correct value for the estimated expected squared loss is 0.764 (0.76 when rounded to two decimal places).
Explanation: The expected squared loss is estimated by calculating the average squared error
over the 5 data points. For each data point $(x_i, y_i)$, the squared loss is:

$$\ell(f(x_i), y_i) = (f(x_i) - y_i)^2$$

First, calculate the predictions for each $x_i$ using the predictor $f(x) = 10 - 0.2x$:

$$f(x_1) = 10 - 0.2(1) = 9.8, \quad f(x_2) = 10 - 0.2(2) = 9.6$$

$$f(x_3) = 10 - 0.2(3) = 9.4, \quad f(x_4) = 10 - 0.2(4) = 9.2, \quad f(x_5) = 10 - 0.2(5) = 9$$

Now, calculate the squared losses for each pair:

$$\ell(f(x_1), y_1) = (9.8 - 9.5)^2 = 0.09, \quad \ell(f(x_2), y_2) = (9.6 - 9)^2 = 0.36$$

$$\ell(f(x_3), y_3) = (9.4 - 8.7)^2 = 0.49, \quad \ell(f(x_4), y_4) = (9.2 - 8)^2 = 1.44, \quad \ell(f(x_5), y_5) = (9 - 7.8)^2 = 1.44$$

The total squared loss is:

$$0.09 + 0.36 + 0.49 + 1.44 + 1.44 = 3.82$$

The sample mean of the squared loss is:

$$\hat{L}(f) = \frac{3.82}{5} = 0.764$$

Therefore, the estimate of the expected squared loss using the sample mean is 0.764.

Question 10. [1 mark]


Suppose you are given the same setup as in the previous question, but the loss function is changed
to the absolute loss $\ell(f(x), y) = |f(x) - y|$. Calculate $\hat{L}(f) = \frac{1}{5} \sum_{i=1}^{5} \ell(f(x_i), y_i)$.
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)

Solution:
The correct value for the estimated expected absolute loss is 0.8.
Explanation: The expected absolute loss is estimated by calculating the average absolute error
over the 5 data points. For each data point $(x_i, y_i)$, the absolute loss is:

$$\ell(f(x_i), y_i) = |f(x_i) - y_i|$$

First, calculate the predictions for each $x_i$ using the predictor $f(x) = 10 - 0.2x$:

$$f(x_1) = 10 - 0.2(1) = 9.8, \quad f(x_2) = 10 - 0.2(2) = 9.6$$

$$f(x_3) = 10 - 0.2(3) = 9.4, \quad f(x_4) = 10 - 0.2(4) = 9.2, \quad f(x_5) = 10 - 0.2(5) = 9$$

Now, calculate the absolute losses for each pair:

$$\ell(f(x_1), y_1) = |9.8 - 9.5| = 0.3, \quad \ell(f(x_2), y_2) = |9.6 - 9| = 0.6$$

$$\ell(f(x_3), y_3) = |9.4 - 8.7| = 0.7, \quad \ell(f(x_4), y_4) = |9.2 - 8| = 1.2, \quad \ell(f(x_5), y_5) = |9 - 7.8| = 1.2$$

The total absolute loss is:

$$0.3 + 0.6 + 0.7 + 1.2 + 1.2 = 4.0$$

The sample mean of the absolute loss is:

$$\hat{L}(f) = \frac{4.0}{5} = 0.8$$

Therefore, the estimate of the expected absolute loss using the sample mean is 0.8.
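
The hand calculations for Questions 9 and 10 can be cross-checked with a few lines of Python. This is only a minimal sketch: the dataset and predictor are copied from the questions, while the helper names (`squared_loss`, `absolute_loss`, `empirical_loss`) are made up for illustration and are not part of the assignment.

```python
# Dataset and predictor copied from Questions 9 and 10.
data = [(1, 9.5), (2, 9.0), (3, 8.7), (4, 8.0), (5, 7.8)]


def f(x):
    """Predictor f(x) = 10 - 0.2x."""
    return 10 - 0.2 * x


def squared_loss(prediction, label):
    return (prediction - label) ** 2


def absolute_loss(prediction, label):
    return abs(prediction - label)


def empirical_loss(predictor, loss, dataset):
    """Sample mean of loss(predictor(x), y) over the dataset (hypothetical helper)."""
    return sum(loss(predictor(x), y) for x, y in dataset) / len(dataset)


# Rounding hides floating-point noise in the last digits.
print(round(empirical_loss(f, squared_loss, data), 3))   # 0.764
print(round(empirical_loss(f, absolute_loss, data), 3))  # 0.8
```

Passing the loss as an argument keeps the averaging step identical for both questions, which mirrors the fact that only the loss function changes between them.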

Question 11. [1 mark]


Suppose you are given a predictor

$$f(x) = \begin{cases} 1 & \text{if } x > 40 \\ 0 & \text{otherwise,} \end{cases}$$

which models the relationship between the length of an email $x$ (in words) and its classification $y$
(spam = 1, not spam = 0). You are provided with the following dataset of 5 $(x, y)$ pairs:

$$D = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), (x_5, y_5)\} = \{(20, 0), (50, 1), (35, 1), (60, 0), (45, 1)\}$$

Let the loss function be the 0-1 loss

$$\ell(f(x), y) = \begin{cases} 1 & \text{if } f(x) \neq y \\ 0 & \text{if } f(x) = y. \end{cases}$$

Calculate $\hat{L}(f) = \frac{1}{5} \sum_{i=1}^{5} \ell(f(x_i), y_i)$.
(Do not write your answer as a fraction. Instead, express it as a decimal number, rounded to two
decimal places if necessary. For example, write 0.33 instead of 1/3.)

Solution:
The correct value for the estimated expected loss is 0.4.
Explanation: The expected loss is estimated by calculating the average loss over the 5 data points.
For each data point $(x_i, y_i)$, the loss is:

$$\ell(f(x_i), y_i) = \begin{cases} 1 & \text{if } f(x_i) \neq y_i \\ 0 & \text{if } f(x_i) = y_i \end{cases}$$

First, apply the predictor $f(x)$ to each $x_i$ to obtain the predicted labels:

$f(x_1) = 0$, since $20 > 40$ is false
$f(x_2) = 1$, since $50 > 40$ is true
$f(x_3) = 0$, since $35 > 40$ is false
$f(x_4) = 1$, since $60 > 40$ is true
$f(x_5) = 1$, since $45 > 40$ is true

Next, compare the predicted labels $f(x_i)$ with the actual labels $y_i$ to determine the loss for each
data point:

$\ell(f(x_1), y_1) = 0$, since $f(x_1) = 0 = y_1$
$\ell(f(x_2), y_2) = 0$, since $f(x_2) = 1 = y_2$
$\ell(f(x_3), y_3) = 1$, since $f(x_3) = 0 \neq 1 = y_3$
$\ell(f(x_4), y_4) = 1$, since $f(x_4) = 1 \neq 0 = y_4$
$\ell(f(x_5), y_5) = 0$, since $f(x_5) = 1 = y_5$

The total loss is:

$$0 + 0 + 1 + 1 + 0 = 2$$

The sample mean of the loss is:

$$\hat{L}(f) = \frac{2}{5} = 0.4$$

Therefore, the estimate of the expected loss using the sample mean is 0.4.
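
The same bookkeeping can be sketched in Python as a sanity check. The dataset and the threshold of 40 words come from the question; the function name `zero_one_loss` is an illustrative choice, not something defined in the assignment.

```python
# Dataset copied from Question 11: (email length in words, spam label).
data = [(20, 0), (50, 1), (35, 1), (60, 0), (45, 1)]


def f(x):
    """Threshold predictor from the question: predict spam (1) when x > 40."""
    return 1 if x > 40 else 0


def zero_one_loss(prediction, label):
    """0-1 loss: 1 for a wrong prediction, 0 for a correct one."""
    return 0 if prediction == label else 1


predictions = [f(x) for x, _ in data]
losses = [zero_one_loss(f(x), y) for x, y in data]
empirical_loss = sum(losses) / len(losses)

print(predictions)     # [0, 1, 0, 1, 1]
print(losses)          # [0, 0, 1, 1, 0]
print(empirical_loss)  # 0.4
```

Under the 0-1 loss the sample mean is simply the fraction of misclassified points, here 2 out of 5.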

Question 12. [1 mark]


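Let $X$ be a normally distributed random variable with mean $\mu$ and variance $\sigma^2$, denoted as $X \sim \mathcal{N}(\mu, \sigma^2)$.
Consider a sample of $n$ independent and identically distributed (i.i.d.) random variables
$X_1, X_2, \ldots, X_n$, each following the same distribution as $X$. The sample mean $\hat{\mu}_n$ is defined as:

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

This sample mean $\hat{\mu}_n$ serves as an estimator for the true mean $\mu$. What are the expected values of
$\hat{\mu}_{10}$ and $\hat{\mu}_{100}$?

a. $\mathbb{E}[\hat{\mu}_{10}] = \mu$ and $\mathbb{E}[\hat{\mu}_{100}] = \mu$

b. $\mathbb{E}[\hat{\mu}_{10}] = \mu + \sigma$ and $\mathbb{E}[\hat{\mu}_{100}] = \mu + \sigma$

c. $\mathbb{E}[\hat{\mu}_{10}] = \mu - \sigma$ and $\mathbb{E}[\hat{\mu}_{100}] = \mu - \sigma$

d. $\mathbb{E}[\hat{\mu}_{10}] = \mu + \frac{\sigma}{\sqrt{10}}$ and $\mathbb{E}[\hat{\mu}_{100}] = \mu + \frac{\sigma}{\sqrt{100}}$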

Solution:
The correct answer is:
• a. $\mathbb{E}[\hat{\mu}_{10}] = \mu$ and $\mathbb{E}[\hat{\mu}_{100}] = \mu$
Explanation:
The sample mean $\hat{\mu}_n$ of $n$ independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_n$,
each with mean $\mu$ and variance $\sigma^2$, is defined as:

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

To find the expected value of $\hat{\mu}_n$, we use the linearity of expectation:

$$\mathbb{E}[\hat{\mu}_n] = \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{1}{n} \times n\mu = \mu$$

This result shows that the expected value of the sample mean $\hat{\mu}_n$ is equal to the true mean $\mu$,
regardless of the sample size $n$. Therefore:

$$\mathbb{E}[\hat{\mu}_{10}] = \mu \quad \text{and} \quad \mathbb{E}[\hat{\mu}_{100}] = \mu$$

Question 13. [1 mark]


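Consider the same setting as in the previous question. What are the variances of $\hat{\mu}_{10}$ and $\hat{\mu}_{100}$, and
which one is smaller?

a. $\operatorname{Var}[\hat{\mu}_{10}] = \frac{\sigma^2}{10}$ and $\operatorname{Var}[\hat{\mu}_{100}] = \frac{\sigma^2}{100}$. $\operatorname{Var}[\hat{\mu}_{100}] < \operatorname{Var}[\hat{\mu}_{10}]$

b. $\operatorname{Var}[\hat{\mu}_{10}] = \frac{\sigma^2}{100}$ and $\operatorname{Var}[\hat{\mu}_{100}] = \frac{\sigma^2}{10}$. $\operatorname{Var}[\hat{\mu}_{100}] > \operatorname{Var}[\hat{\mu}_{10}]$

c. $\operatorname{Var}[\hat{\mu}_{10}] = \sigma^2$ and $\operatorname{Var}[\hat{\mu}_{100}] = \sigma^2$. $\operatorname{Var}[\hat{\mu}_{100}] = \operatorname{Var}[\hat{\mu}_{10}]$

d. $\operatorname{Var}[\hat{\mu}_{10}] = \frac{\sigma}{\sqrt{10}}$ and $\operatorname{Var}[\hat{\mu}_{100}] = \frac{\sigma}{\sqrt{100}}$. $\operatorname{Var}[\hat{\mu}_{100}] < \operatorname{Var}[\hat{\mu}_{10}]$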

Solution:
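The correct answer is:
• a. $\operatorname{Var}[\hat{\mu}_{10}] = \frac{\sigma^2}{10}$ and $\operatorname{Var}[\hat{\mu}_{100}] = \frac{\sigma^2}{100}$. $\operatorname{Var}[\hat{\mu}_{100}] < \operatorname{Var}[\hat{\mu}_{10}]$
Explanation:
For a sample mean $\hat{\mu}_n$ of $n$ independent and identically distributed (i.i.d.) random variables
$X_1, X_2, \ldots, X_n$, each with mean $\mu$ and variance $\sigma^2$, the variance of a sum of independent
variables is the sum of their variances, and a constant factor $\frac{1}{n}$ comes out squared. The variance of the sample mean is therefore:

$$\operatorname{Var}[\hat{\mu}_n] = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}[X_i] = \frac{n\sigma^2}{n^2} = \frac{\operatorname{Var}[X]}{n} = \frac{\sigma^2}{n}$$

Therefore:

$$\operatorname{Var}[\hat{\mu}_{10}] = \frac{\sigma^2}{10}, \quad \operatorname{Var}[\hat{\mu}_{100}] = \frac{\sigma^2}{100}$$

Since $\frac{\sigma^2}{100} < \frac{\sigma^2}{10}$, it follows that:

$$\operatorname{Var}[\hat{\mu}_{100}] < \operatorname{Var}[\hat{\mu}_{10}]$$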
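
The results of Questions 12 and 13 can also be checked empirically. The sketch below repeatedly draws samples of size 10 and 100 from a normal distribution and looks at the mean and variance of the resulting sample means. The values μ = 2, σ = 1, the 2000 repetitions, and the seed are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import random
import statistics


def sample_mean(n, mu, sigma, rng):
    """Sample mean of n i.i.d. draws from N(mu, sigma^2)."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n


def simulate(n, mu=2.0, sigma=1.0, trials=2000, seed=0):
    """Return the average and the variance of `trials` independent sample means."""
    rng = random.Random(seed)
    means = [sample_mean(n, mu, sigma, rng) for _ in range(trials)]
    return statistics.mean(means), statistics.pvariance(means)


for n in (10, 100):
    mean_of_means, var_of_means = simulate(n)
    # Theory: the mean stays near mu, the variance is near sigma^2 / n.
    print(n, round(mean_of_means, 3), round(var_of_means, 4))
```

With these settings the averages should land close to 2 for both sample sizes, while the variances should be close to 0.1 and 0.01 respectively; the exact digits depend on the random draws.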

Question 14. [1 mark]


True or False: If w∗ is the point where g(w) attains its minimum value, then w∗ is also the point
where −g(w) attains its maximum value.

Solution:
Answer: True.
Explanation:
If w∗ is the point where g(w) attains its minimum value, then by definition, for all w,

g(w∗ ) ≤ g(w)

Multiplying both sides of the inequality by −1 (which reverses the inequality), we get:

−g(w∗ ) ≥ −g(w)


This implies that −g(w∗ ) is greater than or equal to −g(w) for all w. Therefore, w∗ is the point
where −g(w) attains its maximum value.
In other words, minimizing g(w) is equivalent to maximizing −g(w). Thus, the point w∗ that
minimizes g(w) also maximizes −g(w).

Question 15. [1 mark]


True or False: For any function g, the minimum value of g(w) is always equal to the maximum
value of −g(w).

Solution:
Answer: False.
Explanation:
Let m be the minimum value of g(w). Then, the maximum value of −g(w) is −m.
For m to be equal to −m, it must be that m = 0. However, in general, the minimum value m
of g(w) is not zero. Therefore, the minimum value of g(w) is not equal to the maximum value of
−g(w) unless m = 0.
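
Questions 14 and 15 are easy to confuse, so a small numerical illustration may help. The sketch below borrows the quadratic g(w) = 2w² − 4w + 3 (the function used in Question 18) purely as an example and searches a grid of candidate values; the grid range and spacing are arbitrary assumptions.

```python
def g(w):
    """Example function, borrowed from Question 18 for illustration."""
    return 2 * w**2 - 4 * w + 3


# Candidate points from -3.00 to 5.00 in steps of 0.01.
grid = [i / 100 for i in range(-300, 501)]

w_min = min(grid, key=g)                    # where g is smallest
w_max_neg = max(grid, key=lambda w: -g(w))  # where -g is largest

min_g = g(w_min)
max_neg_g = -g(w_max_neg)

print(w_min, w_max_neg)  # 1.0 1.0
print(min_g, max_neg_g)  # 1.0 -1.0
```

The two locations coincide (Question 14), but the two optimal values differ in sign (Question 15): the minimum of g is 1 while the maximum of −g is −1.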

Question 16. [1 mark]


Consider the function $g(w) = e^w + w^2$, where $w \in \mathbb{R}$. What is the second derivative of $g(w)$?

a. $g''(w) = e^w + 2w$

b. $g''(w) = e^w + 2$

c. $g''(w) = e^w$

d. $g''(w) = 2w$

Solution:
The correct answer is:

• b. $g''(w) = e^w + 2$

Explanation:
Given the function:

$$g(w) = e^w + w^2$$

Step 1: Compute the First Derivative
First, find the first derivative of $g(w)$ with respect to $w$:

$$g'(w) = \frac{d}{dw}\left(e^w + w^2\right) = e^w + 2w$$

Step 2: Compute the Second Derivative
Next, find the second derivative by differentiating $g'(w)$:

$$g''(w) = \frac{d}{dw}\left(e^w + 2w\right) = e^w + 2$$

Therefore, the second derivative of $g(w)$ is:

$$g''(w) = e^w + 2$$
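
One way to double-check a derivative computed by hand is to compare it with a finite-difference approximation. The sketch below uses the standard central second difference; the step size h = 1e-4 and the test points are arbitrary choices, and the helper name is hypothetical.

```python
import math


def g(w):
    return math.exp(w) + w**2


def g_second(w):
    """Second derivative derived above: g''(w) = e^w + 2."""
    return math.exp(w) + 2


def central_second_difference(func, w, h=1e-4):
    """Approximate func''(w) by (func(w+h) - 2 func(w) + func(w-h)) / h^2."""
    return (func(w + h) - 2 * func(w) + func(w - h)) / h**2


for w in (-1.0, 0.0, 1.5):
    # The two columns should agree to several decimal places.
    print(w, g_second(w), central_second_difference(g, w))
```

Agreement at a few points does not prove the formula, but a mismatch would immediately flag an algebra slip such as answer a (which is the first derivative).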

Question 17. [1 mark]


Consider the function $g(w) = w^4 - 4w^2 + 1$, where $w \in \mathbb{R}$.
Given that the second derivative of $g(w)$ is:

$$g''(w) = 12w^2 - 8$$

True or False: The function $g(w)$ is convex over its entire domain.

Solution:
Answer: False.
Explanation:
A function is convex over its entire domain if its second derivative is non-negative for all values of
$w$.
Given:

$$g''(w) = 12w^2 - 8$$

Let's analyze the second derivative:
1. Determine when $g''(w) \geq 0$:

$$12w^2 - 8 \geq 0 \implies 12w^2 \geq 8 \implies w^2 \geq \frac{8}{12} = \frac{2}{3} \implies |w| \geq \sqrt{\frac{2}{3}} \approx 0.816$$

2. Interpretation:
- For $|w| \geq \sqrt{\frac{2}{3}}$, $g''(w) \geq 0$, indicating that $g(w)$ is convex in these regions.
- For $|w| < \sqrt{\frac{2}{3}}$, $g''(w) < 0$, indicating that $g(w)$ is concave in this region.

Since there exist values of $w$ (specifically, $|w| < \sqrt{\frac{2}{3}}$) where $g''(w) < 0$, the function $g(w)$ is not
convex over its entire domain.
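
A concrete counterexample makes the conclusion tangible. The sketch below evaluates the second derivative at two points and then tests the defining inequality of convexity at a midpoint; the points w = −1, 0, 1 are chosen here for convenience and do not appear in the original solution.

```python
def g(w):
    return w**4 - 4 * w**2 + 1


def g_second(w):
    """Second derivative given in the question: g''(w) = 12w^2 - 8."""
    return 12 * w**2 - 8


print(g_second(0.0))  # -8.0  (negative, so g curves downward near 0)
print(g_second(1.0))  # 4.0   (positive, so g curves upward near 1)

# Convexity would require g(midpoint) <= average of the endpoint values.
a, b = -1.0, 1.0
midpoint = (a + b) / 2
print(g(midpoint), (g(a) + g(b)) / 2)  # 1.0 -2.0
```

Because g(0) = 1 lies above the chord value −2 between w = −1 and w = 1, the convexity inequality fails, which agrees with the sign analysis of g''.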

Question 18. [1 mark]


Consider the convex function $g(w) = 2w^2 - 4w + 3$, where $w \in \mathbb{R}$.
What is the minimum value of g(w) and at what w is it achieved?

a. The minimum value is 1 at w = 1.

b. The minimum value is 2 at w = 1.

c. The minimum value is 1 at w = 2.

d. The minimum value is 3 at w = 0.

Solution:
The correct answer is:


• a. The minimum value is 1 at w = 1.

Explanation:
To find the minimum value of the function $g(w) = 2w^2 - 4w + 3$, we need to determine the value
of $w$ that minimizes $g(w)$. This involves the following steps:
Step 1: Compute the First Derivative
First, find the first derivative of $g(w)$ with respect to $w$:

$$g'(w) = \frac{d}{dw}\left(2w^2 - 4w + 3\right) = 4w - 4$$

Step 2: Find the Critical Point
Set the first derivative equal to zero to find the critical point:

$$4w - 4 = 0 \implies 4w = 4 \implies w = 1$$

Step 3: Calculate the Minimum Value
Substitute $w = 1$ back into the original function to find the minimum value:

$$g(1) = 2(1)^2 - 4(1) + 3 = 2 - 4 + 3 = 1$$
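
As an independent cross-check that does not rely on derivatives, the same answer follows from completing the square. This derivation is an addition for illustration and is not part of the original solution:

```latex
\begin{aligned}
g(w) &= 2w^2 - 4w + 3 \\
     &= 2\left(w^2 - 2w + 1\right) - 2 + 3 \\
     &= 2(w - 1)^2 + 1 .
\end{aligned}
```

Since $2(w - 1)^2 \geq 0$ with equality only at $w = 1$, the minimum value is 1 and it is attained at $w = 1$, matching option a.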

Question 19. [1 mark]


Consider the convex function $g(w) = \sum_{x \in \mathcal{X}} (w - x)^2$, where $\mathcal{X} = \{1, 2, 3, 4, 5\}$ and $w \in \mathbb{R}$.

What is the minimum value of g(w) and at what w is it achieved?

a. The minimum value is 10 at w = 3.

b. The minimum value is 10 at w = 2.

c. The minimum value is 15 at w = 3.

d. The minimum value is 20 at w = 4.

Solution:
The correct answer is:

• a. The minimum value is 10 at w = 3.

Explanation:
Given the function:

$$g(w) = \sum_{x \in \mathcal{X}} (w - x)^2 = (w - 1)^2 + (w - 2)^2 + (w - 3)^2 + (w - 4)^2 + (w - 5)^2$$

Step 1: Expand the Function
First, expand each squared term:

$$(w - x)^2 = w^2 - 2wx + x^2$$

Summing over all $x \in \mathcal{X}$:

$$g(w) = \sum_{x=1}^{5} \left(w^2 - 2wx + x^2\right) = 5w^2 - 2w \sum_{x=1}^{5} x + \sum_{x=1}^{5} x^2$$

Calculate the sums:

$$\sum_{x=1}^{5} x = 1 + 2 + 3 + 4 + 5 = 15$$

$$\sum_{x=1}^{5} x^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$$

Substitute these into the function:

$$g(w) = 5w^2 - 2w(15) + 55 = 5w^2 - 30w + 55$$

Step 2: Compute the First Derivative
To find the minimum, compute the first derivative of $g(w)$ with respect to $w$:

$$g'(w) = \frac{d}{dw}\left(5w^2 - 30w + 55\right) = 10w - 30$$

Step 3: Find the Critical Point
Set the first derivative equal to zero to find the critical point:

$$10w - 30 = 0 \implies 10w = 30 \implies w = 3$$

Step 4: Calculate the Minimum Value
Substitute $w = 3$ back into the original function to find the minimum value:

$$g(3) = \sum_{x=1}^{5} (3 - x)^2 = (3 - 1)^2 + (3 - 2)^2 + (3 - 3)^2 + (3 - 4)^2 + (3 - 5)^2$$

$$g(3) = 2^2 + 1^2 + 0^2 + (-1)^2 + (-2)^2 = 4 + 1 + 0 + 1 + 4 = 10$$

Therefore, the minimum value of $g(w)$ is 10 at $w = 3$.
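
Note that the minimizer w = 3 is exactly the average of the elements of the set, which is a general property of sums of squared deviations. A short Python sketch confirms the numbers; the variable names and the nearby comparison points are illustrative assumptions.

```python
# The set from Question 19.
X = [1, 2, 3, 4, 5]


def g(w):
    """Sum of squared deviations of w from each element of X."""
    return sum((w - x) ** 2 for x in X)


# Setting g'(w) = 2 * sum(w - x) = 0 gives w = mean of X.
w_star = sum(X) / len(X)

print(w_star, g(w_star))  # 3.0 10.0

# Moving away from the mean in either direction increases g.
print(g(w_star - 0.5) > g(w_star), g(w_star + 0.5) > g(w_star))  # True True
```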

Question 20. [1 mark]


Consider the predictor $f(x) = xw$, where $w \in \mathbb{R}$ is a one-dimensional parameter, and $x$ represents
the feature with no bias term. Suppose you are given a dataset of $n$ data points
$D = ((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n))$, where each $y_i$ is the target variable corresponding to feature $x_i$.
Let the loss function be the squared loss $\ell(f(x), y) = (f(x) - y)^2$. The estimate of the expected
loss for a parameter $w \in \mathbb{R}$ is defined as the following convex function:

$$\hat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} (x_i w - y_i)^2$$

What is the formula for $\hat{w} = \arg\min_{w \in \mathbb{R}} \hat{L}(w)$?

a. $\hat{w} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$

b. $\hat{w} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$

c. $\hat{w} = \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i y_i}$

d. $\hat{w} = \frac{\sum_{i=1}^{n} (y_i - x_i)}{n}$

Solution:
The correct answer is:
• b. $\hat{w} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$

Explanation:
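To find the value of $w$ that minimizes $\hat{L}(w)$, we perform the following steps:
Step 1: Expand $\hat{L}(w)$

$$\hat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} \left(x_i^2 w^2 - 2 x_i y_i w + y_i^2\right) = w^2 \left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right) - 2w \left(\frac{1}{n} \sum_{i=1}^{n} x_i y_i\right) + \frac{1}{n} \sum_{i=1}^{n} y_i^2$$

Step 2: Take the Derivative with Respect to $w$

$$\frac{d\hat{L}(w)}{dw} = 2w \left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right) - 2 \left(\frac{1}{n} \sum_{i=1}^{n} x_i y_i\right)$$

Step 3: Set the Derivative to Zero to Find the Minimizer

$$2w \left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right) - 2 \left(\frac{1}{n} \sum_{i=1}^{n} x_i y_i\right) = 0$$

$$w \left(\sum_{i=1}^{n} x_i^2\right) = \sum_{i=1}^{n} x_i y_i$$

$$w = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

Therefore, the formula for $\hat{w}$ is:

$$\hat{w} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$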
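
The closed-form expression can be tried out on a small example. The sketch below is illustrative only: the three data points are made up, the function names `fit_slope` and `empirical_loss` are hypothetical, and the guard against a zero denominator covers a case the question does not discuss (all features equal to zero).

```python
def fit_slope(xs, ys):
    """Closed-form minimizer w_hat = sum(x_i * y_i) / sum(x_i^2) for f(x) = x * w."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        # Every w gives the same loss when all features are zero.
        raise ValueError("all features are zero; the minimizer is not unique")
    return sum_xy / sum_xx


def empirical_loss(w, xs, ys):
    """Average squared loss of the predictor f(x) = x * w on the data."""
    return sum((x * w - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_hat = fit_slope(xs, ys)
print(w_hat)

# Perturbing w_hat in either direction should not decrease the loss.
for w in (w_hat - 0.1, w_hat, w_hat + 0.1):
    print(round(w, 4), round(empirical_loss(w, xs, ys), 5))
```

For this data the formula gives 28.5 / 14, a slope slightly above 2, and the loss printed for the middle row should be the smallest of the three.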
