Assignment 1
X1     X2    X3      Y1     Y2      Y3
2.0    100   0.001   5.1    102.1   4.1
2.5    200   0.002   6.1    202.4   6.0
3.0    300   0.003   6.9    303.0   9.2
3.5    400   0.004   7.8    403.4   12.0
4.0    500   0.005   9.2    504.2   17.0
4.5    600   0.006   9.9    604.8   20.0
5.0    700   0.007   11.5   704.8   25.5
5.5    800   0.008   12.0   805.7   31.0
6.0    900   0.009   12.8   905.7   36.4
1. Do the following.
a. Model Y1 as a linear function of X1. Use linear regression to learn the model parameters.
Assuming θ0 to be zero and θ1 to vary from -10 to 15, here is the graph of the cost function J(θ):

[Figure: cost function J(θ) plotted against θ1]

%plot graph
plot(teta1_array, J)
After 6000 iterations, θ0 = 1.0556 and θ1 = 1.9943. Therefore, by knowing the values of θ0 and θ1 it is possible to plot the learned curve.

[Figure: learned curve plotted over the Y1 data]
for i = 1:m
    %cost function
    sum_J = sum_J + ( ( (teta0 + teta1*x1(i)) - y1(i) )^2 );
end
After knowing θ0 and θ1 it becomes easy to predict the values, as the learned curve equation becomes:

h(x) = θ0 + θ1x = 1.0556 + 1.9943x
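The learned parameters can be cross-checked outside Matlab. Below is a plain-Python sketch of the same batch gradient-descent update; α = 0.01 and the 6000 iterations are assumptions taken from the surrounding text, and the data comes from the data set above.

```python
# Plain-Python sketch of batch gradient descent for h(x) = teta0 + teta1*x.
# alpha = 0.01 and 6000 iterations are assumptions based on the text above.
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
y1 = [5.1, 6.1, 6.9, 7.8, 9.2, 9.9, 11.5, 12.0, 12.8]
m = len(x1)

teta0 = teta1 = 0.0
alpha = 0.01
for _ in range(6000):
    # simultaneous update: errors are computed before touching either teta
    err = [(teta0 + teta1 * x) - y for x, y in zip(x1, y1)]
    teta0 -= alpha * sum(err) / m
    teta1 -= alpha * sum(e * x for e, x in zip(err, x1)) / m

print(round(teta0, 4), round(teta1, 4))  # close to theta0 = 1.0556, theta1 = 1.9943
```

The result lands near the values reported above, with the small remaining gap due to the finite number of iterations.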
c. Repeat gradient descent learning for α = 0.01, 0.1, 1.0, and 100. Plot J for the learning duration.
[Figures: J(θ) over the learning duration for each value of α]
The cost function is convex with a very sharp bowl, which means that gradient descent converges rapidly to the minimum. However, with an α bigger than 0.1, the value of θ starts to jump from one side of the bowl to the other instead of converging to the minimum value.
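This overshooting can be reproduced numerically. The sketch below (plain Python, same data, starting from θ = 0; the 50-step run length is an arbitrary choice for illustration) shows that a small α lowers J while α = 1.0 makes J grow instead of shrink.

```python
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
y1 = [5.1, 6.1, 6.9, 7.8, 9.2, 9.9, 11.5, 12.0, 12.8]
m = len(x1)

def cost(t0, t1):
    # J(theta) = (1/2m) * sum((h(x) - y)^2)
    return sum(((t0 + t1 * x) - y) ** 2 for x, y in zip(x1, y1)) / (2 * m)

def descend(alpha, steps):
    t0 = t1 = 0.0
    for _ in range(steps):
        err = [(t0 + t1 * x) - y for x, y in zip(x1, y1)]
        t0 -= alpha * sum(err) / m
        t1 -= alpha * sum(e * x for e, x in zip(err, x1)) / m
    return cost(t0, t1)

j_start = cost(0.0, 0.0)
print(descend(0.01, 50) < j_start)  # True: small alpha, J decreases
print(descend(1.0, 50) > j_start)   # True: alpha = 1.0 overshoots and J blows up
```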
2. Do the following.
a. Model Y2 as a linear function of X1 and X2. Use linear regression to learn the model parameters without scaling features. Use an appropriate value for α.
Assuming θ0 to be zero and varying θ1 and θ2 from -10 to 15, the graph for the cost function will be in a three-dimensional space.
[Figure: cost function J(θ) with two independent variables, plotted as a 3-D surface]
By using the gradient descent method with α = 0.000001 and 6000000 iterations we have:

[Figures: convergence curves for θ0, θ1 and θ2]
After 6000000 iterations, θ0 = 0.4727, θ1 = 0.7141 and θ2 = 1.0014. Therefore, by knowing the values of θ0, θ1 and θ2 it is possible to plot the learned curve h(x) = θ0 + θ1X1 + θ2X2.
[Figure: learned curve from the data set, plotted over x1, x2 and y2]
    %cost function
    sum_J = sum_J + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )^2 );
    end
    J(k) = (1/(2*m))*sum_J;
    %calculating the next teta
    teta0 = teta0 - (alpha/m)*sum_teta0;
    teta1 = teta1 - (alpha/m)*sum_teta1;
    teta2 = teta2 - (alpha/m)*sum_teta2;
    %storing the values
    teta0_array(k) = teta0;
    teta1_array(k) = teta1;
    teta2_array(k) = teta2;
end
[Figure: J(θ) versus number of iterations]
By scaling features x1 and x2 it is possible to converge to values for θ0, θ1 and θ2 with a larger α and far fewer iterations. The scaling method chosen was to subtract the mean of the data set and divide by the range. The following Matlab code was added to calculate the new scaled data set:
for i = 1:m
    x1_scaled(i) = (x1(i) - mean(x1))/range(x1);
    x2_scaled(i) = (x2(i) - mean(x2))/range(x2);
end
The number of iterations was reduced to 300 and α was changed to 0.1. Below are the cost function and the convergence curves for θ0, θ1 and θ2.
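The scaled run can also be cross-checked in plain Python (a sketch rather than the original Matlab; α = 0.1 and the 300 iterations are taken from the text above). After 300 iterations the predictions already sit close to the Y2 data.

```python
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
x2 = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 900.0]
y2 = [102.1, 202.4, 303.0, 403.4, 504.2, 604.8, 704.8, 805.7, 905.7]
m = len(y2)

def scale(v):
    # subtract the mean and divide by the range, as in the Matlab code above
    mean = sum(v) / len(v)
    rng = max(v) - min(v)
    return [(x - mean) / rng for x in v]

s1, s2 = scale(x1), scale(x2)

t0 = t1 = t2 = 0.0
alpha = 0.1
for _ in range(300):
    err = [(t0 + t1 * a + t2 * b) - y for a, b, y in zip(s1, s2, y2)]
    t0 -= alpha * sum(err) / m
    t1 -= alpha * sum(e * a for e, a in zip(err, s1)) / m
    t2 -= alpha * sum(e * b for e, b in zip(err, s2)) / m

# worst-case prediction error over the training samples
worst = max(abs((t0 + t1 * a + t2 * b) - y) for a, b, y in zip(s1, s2, y2))
print(round(t0, 2), round(worst, 2))
```

Because the scaled features have zero mean, the intercept decouples and t0 converges to the mean of y2; the fit is tight after only 300 iterations, in contrast to the millions needed without scaling.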
[Figures: J(θ) and convergence curves for θ0, θ1 and θ2 with scaled features]
In order to compute the solution using the mathematical approach, the following formula was used:

θ = (XᵀX)⁻¹ XᵀY

where X is a matrix in which the first column is filled with ones and the subsequent columns are the training sample data, and Y is a column vector with all values of y from the training sample. This solution relies on XᵀX being invertible. It will not be invertible in case of redundant features or too many features for the size of the training sample data (n >> m).
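For the first data set the normal equation involves only a 2×2 matrix, so it can be checked by hand or with a few lines of plain Python (a sketch outside the original Matlab; the 2×2 inverse is written out explicitly):

```python
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
y1 = [5.1, 6.1, 6.9, 7.8, 9.2, 9.9, 11.5, 12.0, 12.8]
m = len(x1)

# entries of X^T X and X^T Y for X = [1, x1]
sx, sxx = sum(x1), sum(x * x for x in x1)
sy, sxy = sum(y1), sum(x * y for x, y in zip(x1, y1))

# theta = (X^T X)^-1 X^T Y, inverting the 2x2 matrix [[m, sx], [sx, sxx]] directly
det = m * sxx - sx * sx
teta0 = (sxx * sy - sx * sxy) / det
teta1 = (m * sxy - sx * sy) / det

print(round(teta0, 4), round(teta1, 4))  # 1.06 1.9933
```

The intermediate sums (81.3 and 355.1) match the B vector printed below, and the resulting θ matches the Matlab output.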
The following Matlab code solves the equation to find the θ matrix:

x_matrix = 0;
y_matrix = 0;
for i=1:m
    x_matrix(i,1) = 1;
    x_matrix(i,2) = x1(i);
    x_matrix(i,3) = x2(i);
    y_matrix(i,1) = y2(i);
end
A=inv(transpose(x_matrix)*x_matrix)
B=transpose(x_matrix)*y_matrix
theta_matrix = A*B
This is the output for the first problem (model Y1 as a linear function of X1):
x_matrix =
    1.0000    2.0000
    1.0000    2.5000
    1.0000    3.0000
    1.0000    3.5000
    1.0000    4.0000
    1.0000    4.5000
    1.0000    5.0000
    1.0000    5.5000
    1.0000    6.0000

y_matrix =
    5.1000
    6.1000
    6.9000
    7.8000
    9.2000
    9.9000
   11.5000
   12.0000
   12.8000

A =
    1.1778   -0.2667
   -0.2667    0.0667

B =
   81.3000
  355.1000

theta_matrix =
    1.0600
    1.9933
The computation gives the correct values of θ0 = 1.0600 and θ1 = 1.9933, which are similar to the values found using the gradient descent approach.
For the second problem, however, the data has redundant features (X1 and X2 are linearly dependent), which makes XᵀX not invertible. Therefore, the final result is wrong. Here is the output of the Matlab program when modeling Y2 as a linear function of X1 and X2:
x_matrix =
    1.0000    2.0000  100.0000
    1.0000    2.5000  200.0000
    1.0000    3.0000  300.0000
    1.0000    3.5000  400.0000
    1.0000    4.0000  500.0000
    1.0000    4.5000  600.0000
    1.0000    5.0000  700.0000
    1.0000    5.5000  800.0000
    1.0000    6.0000  900.0000

y_matrix =
  102.1000
  202.4000
  303.0000
  403.4000
  504.2000
  604.8000
  704.8000
  805.7000
  905.7000
B =
   1.0e+06 *
    0.0045
    0.0212
    2.8710

theta_matrix =
   -0.0075
    0.0050
   -0.0000
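The singularity can be verified directly: in this data set X2 is an exact affine function of X1 (X2 = 200·X1 − 300 for every sample), so the columns of X are linearly dependent and det(XᵀX) is exactly zero. A small Python check (all the arithmetic below is exact in floating point, so the determinant comes out as exactly zero):

```python
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
x2 = [100, 200, 300, 400, 500, 600, 700, 800, 900]
m = len(x1)

# X2 is an exact affine function of X1
assert all(b == 200 * a - 300 for a, b in zip(x1, x2))

# build the Gram matrix X^T X for X = [1, x1, x2]
s12 = sum(a * b for a, b in zip(x1, x2))
g = [[m,        sum(x1),                     sum(x2)],
     [sum(x1),  sum(a * a for a in x1),      s12],
     [sum(x2),  s12,                         sum(b * b for b in x2)]]

# expand the 3x3 determinant along the first row
det = (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
       - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
       + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
print(det)  # 0.0 -- X^T X is singular, so inv() cannot give a meaningful answer
```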
3. Do the following.
a. Model Y3 as a quadratic function of X1. Use regression to learn the model parameters.
Any polynomial hypothesis can be mapped to a linear regression problem in a higher-dimensional feature space. Therefore, by transforming h(x) = θ0 + θ1X1 + θ2X1² into h(Z) = θ0 + θ1Z1 + θ2Z2, where Z1 is equal to X1 and Z2 is equal to X1², the polynomial problem becomes a linear multivariable regression.
The same Matlab code for multiple linear regression was used, with the small addition of building the new features z1 = x1 and z2 = x1.^2 before the descent loop.
Here are the convergence curves for θ0, θ1 and θ2 and the learned curve:

[Figures: convergence curves for θ0, θ1 and θ2, J(θ) versus number of iterations, and the learned curve over the y3 data]
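The quadratic model can be cross-checked outside Matlab as well. The sketch below (plain Python, not the assignment's code) solves the normal equations for h(Z) = θ0 + θ1Z1 + θ2Z2 with Z1 = X1 and Z2 = X1², using a small Gaussian-elimination solver for the 3×3 system; the data happens to lie very close to y = x², so the fitted curve tracks Y3 tightly.

```python
x1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
y3 = [4.1, 6.0, 9.2, 12.0, 17.0, 20.0, 25.5, 31.0, 36.4]

# map the quadratic hypothesis to linear regression: z = (1, x, x^2)
Z = [[1.0, x, x * x] for x in x1]

# normal equations: (Z^T Z) theta = Z^T y
A = [[sum(r[i] * r[j] for r in Z) for j in range(3)] for i in range(3)]
b = [sum(r[i] * y for r, y in zip(Z, y3)) for i in range(3)]

# solve the 3x3 system by Gaussian elimination with partial pivoting
M = [row[:] + [v] for row, v in zip(A, b)]
for c in range(3):
    p = max(range(c, 3), key=lambda r: abs(M[r][c]))
    M[c], M[p] = M[p], M[c]
    for r in range(c + 1, 3):
        f = M[r][c] / M[c][c]
        M[r] = [a_ - f * b_ for a_, b_ in zip(M[r], M[c])]
theta = [0.0, 0.0, 0.0]
for i in (2, 1, 0):
    theta[i] = (M[i][3] - sum(M[i][j] * theta[j] for j in range(i + 1, 3))) / M[i][i]

# worst-case residual of the fitted quadratic over the training data
worst = max(abs(theta[0] + theta[1] * x + theta[2] * x * x - y)
            for x, y in zip(x1, y3))
print([round(t, 4) for t in theta], round(worst, 3))
```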