
Lecture 5

The document outlines the process of executing an end-to-end machine learning project, including data acquisition, model selection, training, and evaluation. It discusses various loss functions, particularly Mean Squared Error and Mean Absolute Error, and their implications on model performance. Additionally, it covers techniques for model fine-tuning such as Grid Search and Randomized Search, as well as the importance of cross-validation in assessing model accuracy.


Machine Learning

Siraj



Lecture Content
• End to End ML Project



End-to-End ML Project
• Look at the big picture.
• Get the data.
• Discover and visualize the data to gain insights.
• Prepare the data for Machine Learning algorithms.
• Select a model and train it.
• Fine-tune your model.
• Present your solution.
• Launch, monitor, and maintain your system.



Select and Train a Model
• At last!
• You framed the problem
• You got the data and explored it
• You sampled a training set and a test set
• You wrote transformation pipelines to clean up and
prepare your data for Machine Learning algorithms
automatically.
• You are now ready to select and train a Machine Learning model.



Training and Evaluating on the Training Set
• Let's train a Linear Regression model.
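The code screenshot from the original slide is not reproduced here; the following is a minimal sketch, assuming housing_prepared and housing_labels are the prepared training features and labels produced by the earlier pipeline steps:

    from sklearn.linear_model import LinearRegression

    # Fit a plain Linear Regression model on the prepared training data.
    lin_reg = LinearRegression()
    lin_reg.fit(housing_prepared, housing_labels)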

• Done! You now have a working Linear Regression model.


Let’s try it out on a few instances from the training set:
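A sketch of trying the model on a few instances, assuming housing is the training feature set and full_pipeline is the preparation pipeline built earlier:

    # Take the first five training instances, prepare them with the same
    # pipeline, and compare the predictions with the actual labels.
    some_data = housing.iloc[:5]
    some_labels = housing_labels.iloc[:5]
    some_data_prepared = full_pipeline.transform(some_data)
    print("Predictions:", lin_reg.predict(some_data_prepared))
    print("Labels:", list(some_labels))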



• It works, although the predictions are not exactly
accurate (e.g., the second prediction is off by more than
50%!).
• Let’s measure this regression model’s RMSE on the
whole training set using Scikit-Learn’s
mean_squared_error function:
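A sketch of the RMSE computation, with the same assumed variable names as above:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    housing_predictions = lin_reg.predict(housing_prepared)
    lin_mse = mean_squared_error(housing_labels, housing_predictions)
    lin_rmse = np.sqrt(lin_mse)   # about $68,628, as discussed on the Building Intuition slide below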



Loss Function
• Machines learn by means of a loss function.
• It is a method of evaluating how well a specific algorithm models the given data.
• If the predictions deviate too much from the actual results, the loss function produces a very large number.
• Gradually, with the help of an optimization algorithm, the model learns to reduce the error in its predictions.



Which Loss Function to choose?
• There is no one-size-fits-all loss function for machine learning algorithms.
• Various factors are involved in choosing a loss function for a specific problem, such as:
• the type of machine learning algorithm chosen
• the ease of calculating the derivatives
• to some degree, the percentage of outliers in the data set



Categories of Loss Functions
• Broadly, loss functions can be classified into two major categories depending on the type of learning task: Regression losses and Classification losses.
• Regression deals with predicting a continuous value, for example: given the floor area, number of rooms, and room sizes, predict the price of the house.
• Classification deals with predicting an output from a finite set of categorical values, e.g., given a large data set of images of handwritten digits, categorize each image as one of the digits 0–9.
Regression Losses
• Mean Square Error/Quadratic Loss/L2 Loss

• Mean Absolute Error/L1 Loss
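The formula images from the slide are not reproduced; the standard definitions, with y_i the actual value, ŷ_i the prediction, and n the number of examples, are:

    \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2
    \qquad
    \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\bigl|y_i - \hat{y}_i\bigr|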



Mean Squared Error (MSE)
• What it is: The average of the squares of the
differences between predicted and actual values.
• Gradient behavior: The error increases faster as the
prediction moves further from the target because of the
square. That means:
• Small errors get smaller gradients.
• Large errors get very big gradients — this makes the
model learn faster from big mistakes.



MSE
• As the name suggests, Mean Squared Error is measured as the average of the squared differences between predictions and actual observations.
• It is only concerned with the average magnitude of the error, irrespective of its direction.
• However, due to squaring, predictions that are far away from the actual values are penalized heavily in comparison to less deviated predictions.
• MSE also has nice mathematical properties, which makes it easier to calculate gradients.
Mean Absolute Error (MAE)

• What it is: The average of the absolute values of the differences between predicted and actual values.
• Gradient behavior: The gradient is constant, either +1 or -1 depending on the direction of the error.
• All errors, big or small, have the same influence on the update.
• Makes learning more stable and robust to outliers, but can be slower to converge.
MAE
• MAE is measured as the average of the absolute differences between predictions and actual observations.
• Like MSE, it measures the magnitude of the error without considering its direction.
• Unlike MSE, MAE needs more complicated tools, such as linear programming, to compute the gradients.
• MAE is also more robust to outliers, since it does not use the square.



In short:
• MSE gradient gets bigger for bigger errors, so it reacts strongly to big mistakes.
• MAE gradient stays constant, so it treats all errors more equally.



Building Intuition
• This RMSE is better than nothing, but it is clearly not a great score: most districts' median_housing_values range between $120,000 and $265,000, so a typical prediction error of $68,628 is not very satisfying.
• This is an example of a model underfitting the training
data.
• When this happens it can mean that the features do not
provide enough information to make good predictions,
or that the model is not powerful enough.



Overcome Underfitting
• The main ways to fix underfitting are to select a more
powerful model, to feed the training algorithm with better
features, or to reduce the constraints on the model.
• You could try to add more features (e.g., the log of the
population), but first let’s try a more complex model to
see how it does.
• Let’s train a DecisionTreeRegressor.
• This is a powerful model, capable of finding complex
nonlinear relationships in the data.
• The code should look familiar by now:
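A minimal sketch, again assuming housing_prepared and housing_labels from the preparation steps:

    from sklearn.tree import DecisionTreeRegressor

    # Fit a Decision Tree regressor on the same prepared training data.
    tree_reg = DecisionTreeRegressor()
    tree_reg.fit(housing_prepared, housing_labels)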
• Now that the model is trained, let’s evaluate it on the training
set:
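A sketch of the training-set evaluation, reusing mean_squared_error and np from earlier:

    housing_predictions = tree_reg.predict(housing_prepared)
    tree_mse = mean_squared_error(housing_labels, housing_predictions)
    tree_rmse = np.sqrt(tree_mse)   # comes out as 0.0 on the training set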

• No error at all? Could this model really be absolutely perfect?


• Of course, it is much more likely that the model has badly
overfit the data.
Better Evaluation Using Cross-Validation
• One way to evaluate the Decision Tree model would be to use the train_test_split function to split the training set into a smaller training set and a validation set.
• Then train your models against the smaller training set
and evaluate them against the validation set.
• It’s a bit of work, but nothing too difficult and it would
work fairly well.



An alternative is to use Scikit-Learn's cross-validation feature
• K-fold cross-validation randomly splits the training set into K distinct subsets called folds (here, 10).
• It then trains and evaluates the Decision Tree model 10 times, picking a different fold for evaluation every time and training on the other 9 folds.
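A sketch of this with Scikit-Learn's cross_val_score (it expects a utility scoring function, so we use negative MSE and flip the sign before taking the square root):

    from sklearn.model_selection import cross_val_score

    scores = cross_val_score(tree_reg, housing_prepared, housing_labels,
                             scoring="neg_mean_squared_error", cv=10)
    tree_rmse_scores = np.sqrt(-scores)

    def display_scores(scores):
        # Helper to summarize the 10 fold scores.
        print("Scores:", scores)
        print("Mean:", scores.mean())
        print("Standard deviation:", scores.std())

    display_scores(tree_rmse_scores)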



• Now the Decision Tree doesn’t look as good as it did earlier. In
fact, it seems to perform worse than the Linear Regression
model!
• Notice that cross-validation allows you to get not only an
estimate of the performance of your model, but also a measure
of how precise this estimate is (i.e., its standard deviation).
• The Decision Tree has a score of approximately 71,200,
generally ±3,200.
• You would not have this information if you just used one
validation set. But cross-validation comes at the cost of
training the model several times, so it is not always possible.



Let's compute the same scores for the Linear Regression model just to be sure:
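A sketch, reusing the display_scores helper from the Decision Tree example:

    lin_scores = cross_val_score(lin_reg, housing_prepared, housing_labels,
                                 scoring="neg_mean_squared_error", cv=10)
    lin_rmse_scores = np.sqrt(-lin_scores)
    display_scores(lin_rmse_scores)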

Note: The Decision Tree model is overfitting so badly that it performs worse than the Linear
Regression model.



RandomForestRegressor
• Let's try another model:
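A sketch of training and cross-validating a Random Forest, with the same assumed variables:

    from sklearn.ensemble import RandomForestRegressor

    forest_reg = RandomForestRegressor()
    forest_reg.fit(housing_prepared, housing_labels)

    # Cross-validate the Random Forest the same way as the earlier models.
    forest_scores = cross_val_score(forest_reg, housing_prepared, housing_labels,
                                    scoring="neg_mean_squared_error", cv=10)
    forest_rmse_scores = np.sqrt(-forest_scores)
    display_scores(forest_rmse_scores)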



Fine-Tune Your Model
• Grid Search
• Randomized Search
• Ensemble Methods
• Analyze the best Models and their Errors
• Evaluate Your System on the Test Set



Grid Search
• One way to do that would be to fiddle with the
hyperparameters manually, until you find a great
combination of hyperparameter values, which is very
tedious of course!
• Instead you should get Scikit-Learn’s GridSearchCV to
search for you.
• All you need to do is tell it which hyperparameters you
want it to experiment with, and what values to try out,
and it will evaluate all the possible combinations of
hyperparameter values, using cross-validation.



For example, the following code searches for the best
combination of hyperparameter values for the
RandomForestRegressor:
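The code screenshot is not included; below is a sketch consistent with the 3 × 4 and 2 × 3 combinations described just after it (the specific candidate values are illustrative):

    from sklearn.model_selection import GridSearchCV

    param_grid = [
        # 3 x 4 = 12 combinations (bootstrap left at its default, True)
        {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
        # 2 x 3 = 6 combinations, this time with bootstrap set to False
        {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
    ]

    forest_reg = RandomForestRegressor()
    grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                               scoring='neg_mean_squared_error',
                               return_train_score=True)
    grid_search.fit(housing_prepared, housing_labels)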



• This param_grid tells Scikit-Learn to first evaluate all 3
× 4 = 12 combinations of n_estimators and
max_features hyperparameter values specified in the
first dict
• Then try all 2 × 3 = 6 combinations of hyperparameter
values in the second dict
• But this time with the bootstrap hyperparameter set to
False instead of True (which is the default value for this
hyperparameter).



• All in all, the grid search will explore 12 + 6 = 18
combinations of RandomForestRegressor
hyperparameter values, and it will train each model five
times (since we are using five-fold cross validation).
• In other words, all in all, there will be 18 × 5 = 90
rounds of training!
• It may take quite a long time, but when it is done you
can get the best combination of parameters like this:
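For example:

    grid_search.best_params_
    # e.g. {'max_features': 6, 'n_estimators': 30}, as noted on the Take Away slide below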



You can also get the best estimator:
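For example:

    grid_search.best_estimator_   # the best model, refit on the whole training set by default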



Take Away
• In this example, we obtain the best solution by setting
the max_features hyperparameter to 6, and the
n_estimators hyperparameter to 30.
• The RMSE score for this combination is 49,959, which is
slightly better than the score you got earlier using the
default hyperparameter values (which was 52,634).
• Congratulations, you have successfully fine-tuned
your best model!



Randomized Search
• The grid search approach is fine when you are exploring relatively few combinations, like in the previous example.
• When the hyperparameter search space is large, it is often preferable to use RandomizedSearchCV instead (sketched after the benefits below).
• This class can be used in much the same way as the GridSearchCV class, but instead of trying out all possible combinations, it evaluates a given number of random combinations by selecting a random value for each hyperparameter at every iteration.
• This approach has two main benefits:



• If you let the randomized search run for, say, 1,000
iterations, this approach will explore 1,000 different
values for each hyperparameter (instead of just a few
values per hyperparameter with the grid search
approach).
• You have more control over the computing budget you
want to allocate to hyperparameter search, simply by
setting the number of iterations.
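A sketch of what this might look like for the Random Forest; the search ranges and number of iterations are illustrative assumptions, not values from the lecture:

    from scipy.stats import randint
    from sklearn.model_selection import RandomizedSearchCV

    # Hypothetical search ranges -- adjust to your own computing budget.
    param_distribs = {
        'n_estimators': randint(low=1, high=200),
        'max_features': randint(low=1, high=8),
    }

    forest_reg = RandomForestRegressor()
    rnd_search = RandomizedSearchCV(forest_reg, param_distributions=param_distribs,
                                    n_iter=10, cv=5,
                                    scoring='neg_mean_squared_error',
                                    random_state=42)
    rnd_search.fit(housing_prepared, housing_labels)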



Ensemble Methods
• Another way to fine-tune your system is to try to
combine the models that perform best.
• The group (or “ensemble”) will often perform better
than the best individual model (just like Random Forests
perform better than the individual Decision Trees they
rely on), especially if the individual models make very
different types of errors.



Analyze the Best Models and Their Errors
• You will often gain good insights on the problem by
inspecting the best models.
• For example, the RandomForestRegressor can indicate
the relative importance of each attribute for making
accurate predictions.



Here are the importance scores, displayed next to their corresponding attribute names:
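A sketch; attributes is assumed to be the list of feature names produced by the preparation pipeline (numeric attributes, any engineered attributes, and the one-hot-encoded ocean_proximity categories):

    feature_importances = grid_search.best_estimator_.feature_importances_

    # Pair each importance score with its (assumed) attribute name and sort
    # from most to least important.
    sorted(zip(feature_importances, attributes), reverse=True)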
• With this information, you may want to try dropping some
of the less useful features (e.g., apparently only one
ocean_proximity category is really useful, so you could try
dropping the others).
• You should also look at the specific errors that your system
makes, then try to understand why it makes them and
what could fix the problem (adding extra features or, on
the contrary, getting rid of uninformative ones, cleaning up
outliers, etc.).



Evaluate Your System on the Test Set
• After tweaking your models for a while, you eventually
have a system that performs sufficiently well.
• Now is the time to evaluate the final model on the test
set.
• There is nothing special about this process; just get the predictors and the labels from your test set, run your full_pipeline to transform the data (call transform(), not fit_transform()!), and evaluate the final model on the test set:
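A sketch, assuming strat_test_set is the test set held out earlier and median_house_value is the label column:

    final_model = grid_search.best_estimator_

    X_test = strat_test_set.drop("median_house_value", axis=1)   # assumed label column name
    y_test = strat_test_set["median_house_value"].copy()

    X_test_prepared = full_pipeline.transform(X_test)   # transform(), NOT fit_transform()
    final_predictions = final_model.predict(X_test_prepared)

    final_mse = mean_squared_error(y_test, final_predictions)
    final_rmse = np.sqrt(final_mse)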



• The performance will usually be slightly worse than what you measured using cross-validation if you did a lot of hyperparameter tuning (because your system ends up fine-tuned to perform well on the validation data and will likely not perform as well on unknown datasets).
Try It Out!
