
Giffard-Roisin Sophie M1 2022-23

Introduction to Machine Learning: Application to geosciences


Sophie Giffard-Roisin

Practical Work n. 3 (travaux pratiques n. 3)


1 House consumption regression data set
Recall our first course, where we wanted to estimate the consumption of a house from some information
about the house.

1.1 Loading the dataset


import numpy as np
import pandas as pd
dataframe = pd.read_csv('dataset_train_consumption.csv')
print(dataframe)

Same as in TP2, let’s separate X and Y.


print(dataframe.keys())
name_featuresX = ['area', 'nb_windows']
X = dataframe[name_featuresX].copy()
Y = dataframe.consumption.copy()

1.2 Question 1: Plot the data


Let’s plot Y as a function of X by filling in the following lines. Note that here X is 2-dimensional,
i.e. we have 2 features.

import matplotlib.pyplot as plt


fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# ax.scatter(X['area'], X['nb_windows'], Y)
ax.scatter([TO FILL])
ax.set_xlabel('X1 (area)')
ax.set_ylabel('X2 (nb windows)')
ax.set_zlabel('Y (consumption)')
plt.show()

1.3 Data pre-processing


As there are no categorical features and no missing data, the only pre-processing needed is the standardization.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_transf = scaler.fit_transform(X)
scaler_Y = StandardScaler()
Y_transf = scaler_Y.fit_transform(Y.array.reshape(-1,1))
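
To make the effect of standardization concrete, here is a minimal sketch that reproduces it by hand with NumPy on a small made-up table. The array X_demo and its values are hypothetical and are not taken from the course dataset. The sketch assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np

# Hypothetical feature table (NOT the course dataset):
# columns are area (m^2) and number of windows.
X_demo = np.array([[50.0, 4.0],
                   [80.0, 6.0],
                   [120.0, 9.0],
                   [150.0, 10.0]])

# Column-wise mean and population standard deviation (ddof=0).
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)

# Standardize: subtract the mean and divide by the standard deviation,
# so that each column ends up with mean 0 and standard deviation 1.
X_demo_transf = (X_demo - mean) / std

# De-standardize: invert the operation to recover the original units.
X_demo_back = X_demo_transf * std + mean

print(X_demo_transf.mean(axis=0))  # close to 0 for each column
print(X_demo_transf.std(axis=0))   # close to 1 for each column
```

The last step plays the same role as scaler.inverse_transform, which is used later to bring predictions back to kWh.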


2 Training our first machine learning model


Now, let’s try to fit a linear regression to our data using gradient descent. This is the gradient descent
function in the case of linear regression $f_w(x) = w_1 x_1 + w_2 x_2$ (which we can also write $f_w(x) = x w^T$):
def gradient_descent(Xs, Ys, rate = 0.01, iterations = 100):
    w = np.zeros((Xs.shape[1], 1))
    for it in range(iterations):
        errors = Ys - Xs.dot(w)
        grad = - 2 * (Xs.T).dot(errors)
        w = w - rate*grad
    return w
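
One way to gain confidence in a gradient formula is to compare it with finite differences. The sketch below does this on synthetic data. It assumes the loss is the sum of squared errors, which the handout does not state explicitly but which is consistent with the gradient expression used in the function above. The data and the helper names loss, analytic_grad and numerical_grad are hypothetical.

```python
import numpy as np

# Synthetic data (hypothetical, not the course dataset).
rng = np.random.default_rng(0)
Xs = rng.normal(size=(20, 2))
Ys = Xs @ np.array([[1.5], [-0.5]]) + 0.1 * rng.normal(size=(20, 1))

def loss(w):
    # Assumed loss: sum of squared errors between Ys and Xs.dot(w).
    residuals = Ys - Xs.dot(w)
    return float(np.sum(residuals ** 2))

def analytic_grad(w):
    # Same expression as in the gradient descent function above.
    errors = Ys - Xs.dot(w)
    return -2 * (Xs.T).dot(errors)

def numerical_grad(w, eps=1e-6):
    # Central finite differences, one coordinate of w at a time.
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (loss(w + step) - loss(w - step)) / (2 * eps)
    return grad

w0 = np.array([[0.3], [-0.2]])
print(analytic_grad(w0).ravel())
print(numerical_grad(w0).ravel())
```

If the two printed vectors agree to several digits, the analytic expression matches the assumed loss.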

2.1 Question 2
Let’s first have a look at the code without running it.
a) Which lines correspond to the calculation of the loss gradient?
b) How are the parameters (or weights) initialized?
c) What is the size of the parameter vector w?
d) How many iterations of the for loop will be performed (if we use the default parameters of this gradient descent function)?

2.2 Question 3
Now call the function gradient_descent with inputs (X_transf, Y_transf) using the default rate and iterations
values. The estimated parameter vector will have the name w_estimated. Give in your report the values of
$w_{estimated,1}$ and $w_{estimated,2}$.

w_estimated = [TO FILL]

2.3 Question 4
Fill and launch the following code to plot the estimated linear model:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# first, plot the data points
ax.scatter(X_transf[:,0],X_transf[:,1], Y_transf)

x_data=np.arange(-1.5,1.5,0.1)
y_data=np.arange(-1.5,1.5,0.1)

# z_data = x_data*w_estimated[0] + y_data*w_estimated[1]


z_data= x_data*[TO FILL] + y_data*[TO FILL]

ax.plot(x_data, y_data, z_data)

ax.set_xlabel('X1 (area)')
ax.set_ylabel('X2 (nb windows)')
ax.set_zlabel('Y (consumption)')
plt.show()


2.4 Question 5
We will make a modified version of the gradient descent function by adding some lines in order to store
the values of w at each iteration (tip: fill in w_iterations using w_iterations[it] = ...). Fill in the following
function:

def gradient_descent_iterations(Xs, Ys, rate = 0.01, iterations = 100):
    w = np.zeros((Xs.shape[1], 1))
    w_iterations = np.zeros((iterations,Xs.shape[1],1)) # line added: initialization
    for it in range(iterations):
        errors = Ys - Xs.dot(w)
        grad = - 2 * (Xs.T).dot(errors)
        w = w - rate*grad
        [TO FILL]
    return w, w_iterations # second output added
Run the function gradient_descent_iterations on (X_transf, Y_transf), noting that there are now two
outputs:
w_estimated, w_estimated_iterations = gradient_descent_iterations(X_transf, Y_transf)
Then plot the variations of the parameters w_estimated_iterations_1 and w_estimated_iterations_2 as functions
of the iterations (which, in Python, are w_estimated_iterations[:,0] and w_estimated_iterations[:,1]). Add
the plot to your report. Are the final values in accordance with your answer to Question 3?

2.5 Question 6
Repeat Question 5 with a different rate value: rate = 0.001. Add the plots to the report. Do the
parameter values stabilize faster or slower (i.e., is convergence reached in a smaller or larger number
of iterations)? (Bonus: do the same with rate = 0.1: what happens?)
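
The influence of the rate can be explored on a toy problem before looking at the real data. The sketch below runs gradient descent on a one-dimensional quadratic loss, curvature * (w - w_star)**2. The function descend, the curvature value and the three rates are hypothetical choices for illustration. Which rates are too small or too large for the house dataset depends on the data itself, so the behaviour here should not be read as the answer to the question above.

```python
def descend(rate, iterations=100, curvature=10.0, w_star=1.0):
    """Gradient descent on the toy loss curvature * (w - w_star)**2."""
    w = 0.0
    history = []
    for _ in range(iterations):
        grad = 2 * curvature * (w - w_star)  # derivative of the toy loss
        w = w - rate * grad
        history.append(w)
    return history

# Each update multiplies the distance to w_star by (1 - 2 * curvature * rate).
slow = descend(rate=0.001)     # factor 0.98: approaches w_star slowly
fast = descend(rate=0.01)      # factor 0.8: approaches w_star quickly
unstable = descend(rate=0.11)  # factor -1.2: the distance grows at each step

for name, hist in [("slow", slow), ("fast", fast), ("unstable", unstable)]:
    print(name, "distance to w_star after 100 iterations:", abs(hist[-1] - 1.0))
```

On this toy loss, the iterates converge only when the absolute value of the factor is below 1, which gives a simple way to reason about why a rate can be too large.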

3 Testing
We will now estimate the consumption of 5 new houses using our trained model.
Load the test data set and transform it, in the same manner as in section 1:
dataframe_test = pd.read_csv('dataset_test_consumption.csv')
print(dataframe_test)
X_test = dataframe_test[name_featuresX].copy()
Y_test = dataframe_test.consumption.copy() # this is the ground truth to validate our predictions
X_test_transf = scaler.transform(X_test)
Y_test_transf = scaler_Y.transform(Y_test.array.reshape(-1,1))

3.1 Question 7
From the values of $w_1$ = w_estimated[0] and $w_2$ = w_estimated[1] estimated in question 5, calculate Y_test_transf_predicted
from X_test_transf. Remember that this is a linear regression with formula $f_w(x) = w_1 x_1 + w_2 x_2$. Write the
values of the 5 estimated Y_test_transf_predicted.

3.2 Question 8
Let’s now transform our predictions back into kWh (‘de-standardization’) and compare them with the ground
truth consumption values Y_test. Calculate the error (in kWh) for every sample, and then calculate the mean
absolute error.


Y_test_predicted = scaler_Y.inverse_transform(Y_test_transf_predicted).flatten()
errors = [TO FILL]
absolute_errors = np.abs(errors)
mean_error = [TO FILL]

Write the mean absolute error (in kWh) in your report.
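
As a reminder of why the absolute value matters, the sketch below compares the mean signed error and the mean absolute error on made-up numbers. The arrays Y_true_demo and Y_pred_demo are hypothetical and unrelated to the test set.

```python
import numpy as np

# Hypothetical consumptions in kWh (NOT the course test set).
Y_true_demo = np.array([100.0, 150.0, 200.0, 120.0, 180.0])
Y_pred_demo = np.array([110.0, 140.0, 205.0, 120.0, 170.0])

# Signed error per sample: positive means over-estimation.
errors_demo = Y_pred_demo - Y_true_demo

# Mean of the signed errors: positive and negative errors partly cancel.
mean_signed_error_demo = np.mean(errors_demo)

# Mean absolute error: every deviation counts, whatever its sign.
mean_absolute_error_demo = np.mean(np.abs(errors_demo))

print("mean signed error:", mean_signed_error_demo)
print("mean absolute error:", mean_absolute_error_demo)
```

In this toy case the signed errors mostly cancel out, while the mean absolute error still reflects the typical size of the deviations.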

4 Bonus
4.1 Question 9 (bonus)
Modify your function gradient_descent_iterations in order to calculate and store the mean squared error loss
value at each iteration. Plot it for rate = 0.001 with a sufficient number of iterations for it to reach
convergence. Add the plot to your report.

4.2 Question 10 (bonus)


Plot the loss as a function of $w_1$ and $w_2$ (a surface). For this, you can for example use the function plot_surface from
matplotlib (see https://matplotlib.org/stable/gallery/mplot3d/surface3d.html). Is there more than
one minimum?
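
A possible starting point, sketched here on synthetic data, is to evaluate the mean squared error on a regular grid of (w1, w2) values. The data, the grid bounds and all variable names below are hypothetical. The sketch only computes the grid and locates its lowest point, then compares it with the least-squares solution from np.linalg.lstsq. The plotting itself is left out.

```python
import numpy as np

# Synthetic standardized-like data (hypothetical, not the course dataset).
rng = np.random.default_rng(1)
Xs = rng.normal(size=(30, 2))
Ys = Xs @ np.array([[0.8], [0.3]]) + 0.05 * rng.normal(size=(30, 1))

# Regular grid of candidate weights.
w1_values = np.linspace(-1.0, 2.0, 61)
w2_values = np.linspace(-1.0, 2.0, 61)
W1, W2 = np.meshgrid(w1_values, w2_values)

# Predictions for every sample and every grid point: shape (30, 61, 61).
preds = Xs[:, 0, None, None] * W1 + Xs[:, 1, None, None] * W2

# Mean squared error at each grid point: shape (61, 61).
loss_grid = np.mean((Ys[:, 0, None, None] - preds) ** 2, axis=0)

# Grid point with the lowest loss.
i, j = np.unravel_index(np.argmin(loss_grid), loss_grid.shape)
w_grid = np.array([W1[i, j], W2[i, j]])

# Least-squares solution, for comparison.
w_lstsq, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)

print("lowest grid point:", w_grid)
print("least squares    :", w_lstsq.ravel())
```

The arrays W1, W2 and loss_grid have the shapes expected by ax.plot_surface(W1, W2, loss_grid) on a 3D axis, so the same grid can be reused for the plot.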
