Unit - III

SKN Sinhgad College of Engineering, Korti, Pandharpur

Class: Bachelor of Engineering
Subject: Machine Learning

Department of Computer Science & Engineering

Presentation Prepared By
Mr. Subhash V. Pingale

Getting started with the Math Basics

Working with Data


Machine learning enables tasks such as:
1. Discovering criminal behavior and detecting criminals in action
2. Recommending the right product to the right person
3. Filtering and classifying data from the Internet at an enormous scale
4. Driving a car autonomously



The mathematical and statistical basis of machine learning makes outputting such useful results possible. Using math and statistics in this way enables the algorithms to understand anything with a numerical basis.
Consider, for example, a set of information useful for deciding whether to play tennis outside or not, something a machine can learn using the proper technique. The set of features is described as follows:
Outlook: Sunny, overcast, or rain
Temperature: Cool, mild, or hot
Humidity: High or normal
Windy: True or false
No matter what the information is, for a machine learning algorithm to correctly process it, it should always be transformed into a number.
Creating a Matrix
After you make all the data numeric, the machine learning algorithm requires that you turn the individual features into a matrix of features and the individual responses into a vector or a matrix (when there are multiple responses).
A matrix is a collection of numbers arranged in rows and columns, much like the squares on a chessboard. However, unlike a chessboard, which is always square, a matrix can have different numbers of rows and columns.
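To make the idea concrete, here is a minimal Python sketch (the numeric codes and the three sample records are illustrative assumptions, not a fixed convention) that turns the tennis features into a feature matrix X and a response vector y:

# Encode the categorical tennis features as numbers, then assemble
# the feature matrix X and the response vector y with NumPy.
import numpy as np

outlook = {"Sunny": 0, "Overcast": 1, "Rain": 2}
temperature = {"Cool": 0, "Mild": 1, "Hot": 2}
humidity = {"High": 0, "Normal": 1}
windy = {False: 0, True: 1}
play = {"No": 0, "Yes": 1}

# A few example observations: (outlook, temperature, humidity, windy, play)
records = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rain", "Mild", "Normal", True, "Yes"),
]

# Each row of X is one observation; each column is one feature.
X = np.array([[outlook[o], temperature[t], humidity[h], windy[w]]
              for o, t, h, w, _ in records])
y = np.array([play[p] for *_, p in records])

print(X)  # 3 x 4 feature matrix
print(y)  # response vector of length 3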



Matrix Operations
 Addition
 Multiplication
A common use of matrix multiplication in machine learning is the linear prediction Y = Xb, where X is the feature matrix and b is a vector of coefficients.
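A short NumPy sketch of these operations (the numbers are made up for illustration):

# Matrix addition, matrix multiplication, and the linear prediction Y = Xb.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)   # element-wise addition
print(A @ B)   # matrix multiplication (rows of A times columns of B)

X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])      # 3 observations, 2 features
b = np.array([0.5, 2.0])        # coefficient vector
Y = X @ b                       # one prediction per row of X
print(Y)                        # [4.5 6.5 8.5]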



Glancing at advanced matrix operations
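The slide's details are not reproduced here; as a hedged illustration, two advanced operations that appear constantly in machine learning are the transpose and the inverse:

# Transpose and inverse in NumPy (illustrative examples; the original
# slide's exact content is not shown in this text).
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

print(A.T)                 # transpose: rows become columns
A_inv = np.linalg.inv(A)   # inverse: A @ A_inv gives the identity
print(A @ A_inv)           # approximately [[1, 0], [0, 1]]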



Exploring the World of Probabilities

 Probability tells you the likelihood of an event, and you express it as a number. The probability of an event is measured in the range from 0 (no probability that an event occurs) to 1 (certainty that an event occurs). Intermediate values, such as 0.25, 0.5, and 0.75, say that the event will happen with a certain frequency when tried enough times. If you multiply the probability by an integer representing the number of trials, you get an estimate of how many times the event should happen on average across those trials.
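For instance, a fair coin with probability 0.5 tossed 1,000 times should come up heads about 500 times. A quick simulation (a minimal sketch using Python's random module) makes the point:

# Estimate of event count: probability times number of trials.
import random

p_heads = 0.5
trials = 1000

expected_heads = p_heads * trials          # 500 on average
observed_heads = sum(random.random() < p_heads for _ in range(trials))
print(expected_heads, observed_heads)      # observed varies around 500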



Exploring the World of Probabilities

 For example, when you toss a coin, if the coin is fair, the a priori probability of a head is 50 percent. No matter how many times you toss the coin, when faced with a new toss the probability for heads is still 50 percent. However, there are other situations in which, if you change the context, the a priori probability is not valid anymore, because something subtle happened and changed it. In this case, you can express this belief as an a posteriori probability, which is the a priori probability after something has happened to modify the count.



What is Probability? Probability can be defined as the ratio of the number of favorable outcomes to the total number of outcomes of an event.
Probability(Event) = Favorable Outcomes / Total Outcomes
Probability formula with the addition rule: Whenever an event is the union of two other events, say A and B, then
P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
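As a quick check of the addition rule (a minimal sketch; the single-die events A = "roll is even" and B = "roll is at least 4" are chosen purely for illustration):

# Verify P(A or B) = P(A) + P(B) - P(A and B) by enumerating the six
# equally likely outcomes of one fair die.
from fractions import Fraction

outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # even rolls: {2, 4, 6}
B = {n for n in outcomes if n >= 4}       # at least 4: {4, 5, 6}

def p(event):
    return Fraction(len(event), 6)

lhs = p(A | B)                 # P(A or B) = 4/6 = 2/3
rhs = p(A) + p(B) - p(A & B)   # 1/2 + 1/2 - 1/3 = 2/3
print(lhs, rhs, lhs == rhs)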



Probability formula with the complementary rule: Whenever an event is the complement of another event; specifically, if A is an event, then P(not A) = 1 - P(A), also written P(A') = 1 - P(A), so that
P(A) + P(A') = 1.
Probability formula with the conditional rule: When event A is already known to have occurred and the probability of event B is desired, the conditional probability is
P(B|A) = P(A ∩ B) / P(A)
and, vice versa, P(A|B) = P(A ∩ B) / P(B).
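Continuing the same illustrative die example, the conditional rule can be checked directly from the counts:

# Conditional probability P(B|A) = P(A and B) / P(A) on one fair die,
# with A = "roll is even" and B = "roll is greater than 3".
from fractions import Fraction

outcomes = range(1, 7)
A = {n for n in outcomes if n % 2 == 0}   # {2, 4, 6}
B = {n for n in outcomes if n > 3}        # {4, 5, 6}

p_A = Fraction(len(A), 6)                 # 1/2
p_A_and_B = Fraction(len(A & B), 6)       # {4, 6} gives 1/3
print(p_A_and_B / p_A)                    # P(B|A) = 2/3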
Example: Find the probability of getting a number less than 5 when a die is rolled, using the probability formula.



Probability of getting a number less than 5
Given: Sample space S = {1, 2, 3, 4, 5, 6}
Getting a number less than 5: A = {1, 2, 3, 4}
Therefore, n(S) = 6 and n(A) = 4
Using the probability formula,
P(A) = n(A) / n(S) = 4/6 = 2/3



What is the probability of getting a sum of 9 when two
dice are thrown?



There is a total of 36 possibilities when we throw two dice.
To get the desired outcome, i.e., a sum of 9, we have the following favorable outcomes: (4,5), (5,4), (6,3), (3,6). There are 4 favorable outcomes.
Probability of an event: P(E) = (Number of favorable outcomes) ÷ (Total outcomes in the sample space)
Probability of getting a sum of 9 = 4 ÷ 36 = 1/9
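Both worked examples can be verified by brute-force enumeration; here is a minimal sketch for the two-dice case:

# Enumerate all 36 equally likely two-dice outcomes and count sums of 9.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))      # 36 outcomes
favorable = [r for r in rolls if sum(r) == 9]     # (3,6), (4,5), (5,4), (6,3)
print(Fraction(len(favorable), len(rolls)))       # 1/9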



Interpreting Learning As Optimization

Supervised learning
Unsupervised learning
Reinforcement learning

The learning process



Loss Function
What’s a loss function?
At its core, a loss function is incredibly simple: It’s a
method of evaluating how well your algorithm models
your dataset. If your predictions are totally off, your
loss function will output a higher number. If they’re
pretty good, it’ll output a lower number. As you
change pieces of your algorithm to try and improve
your model, your loss function will tell you if you’re
getting anywhere.
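As a concrete illustration (a minimal sketch; squared error is just one common choice of loss), scoring two sets of predictions against the same targets shows how a loss function behaves:

# A loss function scores predictions against targets: worse predictions
# produce a higher number. Squared error is used here as an example.
import numpy as np

def squared_error_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
good = np.array([2.9, 5.1, 7.2])    # close to the targets
bad = np.array([8.0, 1.0, 0.0])     # far from the targets

print(squared_error_loss(y_true, good))  # small (0.02)
print(squared_error_loss(y_true, bad))   # large (30.0)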



Different types of loss functions
A lot of the loss functions that you see implemented
in machine learning can get complex and confusing.
But if you remember the end goal of all loss
functions—measuring how well your algorithm is
doing on your dataset—you can keep that
complexity in check.
We’ll run through a few of the most popular loss
functions currently being used, from simple to more
complex.



Exploring cost functions
A cost function is an important measure of how well a machine learning model performs for a given dataset.
It calculates the difference between the expected value and the predicted value and represents it as a single real number.



Why use a cost function?



Types of cost function
Regression Cost Function
Binary Classification Cost Function
Multi-class Classification Cost Function



Regression cost function
 Regression models are used to make predictions for continuous variables, such as house prices, weather, loan amounts, etc.
Error = Actual Output - Predicted Output
1. Mean Error
 In this cost function, the error for each training example is calculated, and then the mean of all these errors is derived.
 Calculating the mean of the errors is the simplest and most intuitive approach possible.
 The errors can be both negative and positive, so they can cancel each other out during summation, giving a zero mean error for the model.
2. Mean Squared Error (MSE)
This improves on the drawback we encountered in Mean Error above. Here the square of the difference between the actual and predicted value is calculated, which avoids any possibility of negative error.
It is measured as the average of the sum of squared differences between predictions and actual observations.

MSE = (sum of squared errors) / n



3. Mean Absolute Error (MAE)
This cost function addresses the shortcoming of mean error in a different way. Here the absolute difference between the actual and predicted value is calculated, which avoids any possibility of negative error.
So in this cost function, MAE is measured as the average of the sum of absolute differences between predictions and actual observations.

MAE = (sum of absolute errors) / n
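A minimal NumPy sketch of all three regression cost functions side by side (the sample values are made up for illustration):

# Mean Error, Mean Squared Error, and Mean Absolute Error on the same
# predictions. Note how positive and negative errors cancel in Mean Error.
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 30.0])
errors = y_true - y_pred                 # [-2, 2, 0]

me = np.mean(errors)                     # 0.0: cancellation hides the error
mse = np.mean(errors ** 2)               # (4 + 4 + 0) / 3 = 2.67
mae = np.mean(np.abs(errors))            # (2 + 2 + 0) / 3 = 1.33
print(me, mse, mae)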


 With respect to your target, a good practice is to define the cost
function that works the best in solving your problem, and then to figure
out which algorithms work best in optimizing it to define the
hypothesis space you want to test.
 When you work with algorithms that don’t allow the cost function you
want, you can still indirectly influence their optimization process by
fixing their hyper-parameters and selecting your input features with
respect to your cost function. Finally, when you’ve gathered all the
algorithm results, you evaluate them by using your chosen cost
function and then decide on the final hypothesis with the best result
from your chosen error function.



Gradient Descent Algorithm

 Gradient Descent is one of the most widely used optimization algorithms in machine learning.
What is a Cost Function?
It is a function that measures the performance of a model for any given data. A cost function quantifies the error between predicted values and expected values.


What is Gradient Descent?
Gradient descent is an iterative optimization algorithm for finding the local minimum of a function.
Let's say you are playing a game where the players are at the top of a mountain and are asked to reach the lowest point of the mountain (a lake at its base). Additionally, they are blindfolded. So, what approach do you think would make you reach the lake?



To find the local minimum of a function using gradient
descent, we must take steps proportional to the
negative of the gradient (move away from the
gradient) of the function at the current point. If we take
steps proportional to the positive of the gradient
(moving towards the gradient), we will approach a
local maximum of the function, and the procedure is
called Gradient Ascent.
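A minimal sketch of gradient descent on a one-dimensional function (the function, learning rate, and starting point are illustrative choices):

# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# Each step moves in the direction of the negative gradient.

def grad(x):
    return 2 * (x - 3)        # derivative of (x - 3)^2

x = 0.0                       # starting point
learning_rate = 0.1

for step in range(100):
    x = x - learning_rate * grad(x)   # step opposite to the gradient

print(x)                      # close to 3.0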



Batch Gradient Descent: This is a type of gradient descent which processes all the training examples for each iteration of gradient descent. If the number of training examples is large, batch gradient descent becomes computationally very expensive and is therefore not preferred; instead, we use stochastic gradient descent or mini-batch gradient descent.



Stochastic Gradient Descent: This is a type of gradient descent which processes one training example per iteration. The parameters are therefore updated after every single example, which makes it much faster than batch gradient descent. However, when the number of training examples is large, processing only one example at a time adds overhead for the system, because the number of iterations becomes quite large.



Mini-Batch Gradient Descent: This is a type of gradient descent which works faster than both batch gradient descent and stochastic gradient descent. Here b examples (where b < m, with m the total number of training examples) are processed per iteration. So even if the number of training examples is large, it is processed in batches of b training examples at a time. Thus, it works for large training sets, and with fewer parameter updates per pass than stochastic gradient descent. A sketch follows below.
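A minimal sketch of mini-batch gradient descent for a one-parameter linear model (the synthetic data, batch size b = 4, and learning rate are illustrative assumptions; setting b = 1 recovers stochastic gradient descent and b = m recovers batch gradient descent):

# Mini-batch gradient descent fitting w in y = w * x on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 2.0 * x + rng.normal(0, 0.01, size=20)   # true w is about 2.0

w, lr, b = 0.0, 0.5, 4
for epoch in range(200):
    order = rng.permutation(len(x))          # shuffle each epoch
    for start in range(0, len(x), b):
        idx = order[start:start + b]         # one mini-batch of b examples
        err = w * x[idx] - y[idx]
        w -= lr * np.mean(2 * err * x[idx])  # gradient of mean squared error
print(w)                                     # close to 2.0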



Learning Curves in Machine Learning
 A learning curve is just a plot showing the progress, over experience, of a specific metric related to learning during the training of a machine learning model. Learning curves are simply a mathematical representation of the learning process.
Typically, we have a measure of time or progress on the x-axis and a measure of error or performance on the y-axis.



Single Curves
The most popular example of a learning curve is loss
over time. Loss (or cost) measures our model error, or
“how bad our model is doing”. So, for now, the lower
our loss becomes, the better our model performance
will be.
In a typical plot of loss over time, we can see the expected behavior of the learning process.



Despite slight ups and downs, in the long term the loss decreases over time, so the model is learning.
Other examples of very popular learning curves are accuracy, precision, and recall. All of these capture model performance, so the higher they are, the better our model becomes.



Multiple Curves
One of the most widely used metrics combinations
is training loss + validation loss over time.
The training loss indicates how well the model is
fitting the training data, while the validation loss
indicates how well the model fits new data.
We will see this combination later on, but for now, the example below shows a typical way of plotting both metrics.
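This is a minimal sketch (the loss values are made-up placeholders; in practice they come from your training loop):

# Plot training loss and validation loss over epochs on one figure.
import matplotlib.pyplot as plt

epochs = range(1, 11)
train_loss = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33, 0.30, 0.28, 0.27, 0.26]
val_loss = [0.95, 0.78, 0.65, 0.58, 0.54, 0.52, 0.51, 0.51, 0.52, 0.53]

plt.plot(epochs, train_loss, label="training loss")
plt.plot(epochs, val_loss, label="validation loss")
plt.xlabel("epoch")            # time / experience on the x-axis
plt.ylabel("loss")             # error on the y-axis
plt.legend()
plt.show()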



Optimization Learning Curves: learning curves calculated on the metric by which the parameters of the model are being optimized, such as loss or Mean Squared Error.
Performance Learning Curves: learning curves calculated on the metric by which the model will be evaluated and selected, such as accuracy or precision.



Performance curves for two different models



Thank you
