INTRODUCTION TO MACHINE LEARNING

OVERVIEW OF ML
WHAT IS MACHINE LEARNING?

A branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.

As intelligence requires knowledge, it is necessary for computers to acquire knowledge.
MACHINE LEARNING PROCESS LIFECYCLE

DATA

FEATURES

ALGORITHMS
The success of a machine learning system also depends on the algorithms.

The algorithms control the search to find and build the knowledge structures.

The learning algorithms should extract useful information from training examples.
LEARNING DIMENSIONS

FLAVORS OF CLASSIFICATION
SOME USEFUL CLASSIFIERS

Example learned rules:
⚫ If patrons=full and day=Friday then wait (0.3/0.7)
⚫ If wait>60 and Reservation=no then wait (0.4/0.9)

Decision trees
⚫ Examples are used to learn the topology and the order of questions

K-nearest neighbors

Association rules
⚫ Examples are used to learn the support and confidence of association rules

SVMs

Neural nets
⚫ Examples are used to learn the topology and the edge weights

Naïve Bayes (Bayes net learning)
⚫ Examples are used to learn the topology and the CPTs
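A minimal sketch (assuming scikit-learn; the Iris dataset and depth limit are illustrative choices, not from the slides) of one of these classifiers: a decision tree learned from examples, printed as if-then rules.

```python
# Minimal sketch (assumes scikit-learn): a decision tree learned from examples.
# The examples determine the tree's structure and the order of the questions
# (splits); the Iris dataset and max_depth=2 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned if-then rules, similar in spirit to the example rules above.
print(export_text(tree))
```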
SUPERVISED LEARNING

UNSUPERVISED LEARNING

INSTANCE BASED LEARNING ALGORITHMS

Instances are nothing but subsets of datasets, and instance-based learning models work on an identified instance or groups of instances that are critical to the problem.
The results across instances are compared, which can include an instance of new data as well.
This comparison uses a particular similarity measure to find the best match and predict.
Instance-based methods are also called lazy learning algorithms.
Here the focus is on the representation of the instances and the similarity measures used for comparison between instances.
Basic idea:
⚫ If it walks like a duck and quacks like a duck, then it's probably a duck.
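A minimal k-nearest neighbors sketch of lazy, instance-based learning (assuming scikit-learn; the Iris data and k=3 are illustrative choices):

```python
# Minimal sketch (assumes scikit-learn): k-nearest neighbors as an
# instance-based (lazy) learner. "Training" merely stores the instances;
# at prediction time the similarity measure (Euclidean distance by default)
# finds the k closest stored instances, which vote on the label.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)          # stores the training instances
print("test accuracy:", knn.score(X_test, y_test))
```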
TRAINING AND TESTING

Universal set (unobserved)
⚫ Training set (observed): data acquisition
⚫ Testing set (unobserved): practical usage
TRAINING AND TESTING
Training is the process of making the system able to learn.

No free lunch rule:
⚫ Training set and testing set come from the same distribution
⚫ Need to make some assumptions or bias
MODEL EVALUATION

Confusion matrix:

                          PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL CLASS  Class=Yes     a           b
              Class=No      c           d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)


TRUE POSITIVE & FALSE POSITIVE
Consider a dataset of dog images where 6 images are 'dog' images and 4 images are 'no dog' images.
So we have a binary classification problem here: 'dog' vs. 'no dog'.
Dog -> Positive class
No dog -> Negative class
Suppose our model has the following predictions.
TRUE POSITIVE & FALSE POSITIVE
Dog -> Positive class
No dog -> Negative class
Let's first talk about the predictions which are 'Dog' and forget the 'No Dog' predictions for now.

There are 7 'Dog' predictions.

Out of these 7, there are 4 correct predictions.
Correct positive predictions are called True Positives (TP). TP tells how many samples from the positive class are predicted correctly.
The remaining 3 incorrect 'Dog' predictions are False Positives (FP): negative-class samples predicted as positive.
TRUE NEGATIVE & FALSE NEGATIVE
Dog -> Positive class
No dog -> Negative class
Let's now talk about the predictions which are 'No Dog' and forget the 'Dog' predictions.
There are 3 'No Dog' predictions. The correct ones are True Negatives (TN), and the 'No Dog' predictions that are actually dogs are False Negatives (FN).
ACCURACY
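Accuracy is the fraction of all predictions that are correct: Accuracy = (TP + TN) / (TP + TN + FP + FN). For the dog example above, the counts work out to TP = 4, FP = 3, FN = 2, TN = 1, so accuracy = (4 + 1) / 10 = 0.5.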
PRECISION AND RECALL

For PRECISION, think about predictions as the base

For RECALL, think about truth as the base

PRECISION and RECALL are for individual classes.


PRECISION – POSITIVE CLASS(DOG)
RECALL- POSITIVE CLASS(DOG)
PRECISION – NEGATIVE CLASS(NO DOG)
RECALL – NEGATIVE CLASS(NO DOG)
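Using the same counts derived from the dog example (TP = 4, FP = 3, FN = 2, TN = 1), the per-class values are:
⚫ Precision (Dog) = TP / (TP + FP) = 4/7
⚫ Recall (Dog) = TP / (TP + FN) = 4/6
⚫ Precision (No Dog) = TN / (TN + FN) = 1/3
⚫ Recall (No Dog) = TN / (TN + FP) = 1/4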
F1 SCORE
F1 score is the harmonic mean of precision and recall.

It is used to evaluate binary classification systems.

F1 score balances precision and recall on the positive class, while accuracy looks at correctly classified observations, both positive and negative.
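A minimal sketch (assuming scikit-learn): the label vectors below are one consistent arrangement of the dog example's counts (6 dogs, 4 no-dogs, 7 'dog' predictions, 4 of them correct), not taken verbatim from the slides.

```python
# Minimal sketch (assumes scikit-learn): metrics for the dog vs. no-dog example.
# 1 = dog (positive class), 0 = no dog (negative class).
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # 6 dogs, 4 no-dogs
y_pred = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]   # 7 'dog' predictions, 4 correct

print(confusion_matrix(y_true, y_pred))               # [[TN FP], [FN TP]] = [[1 3], [2 4]]
print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total = 0.5
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/7
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/6
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```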
PERFORMANCE
There are several factors affecting the performance:
⚫ Types of training provided
⚫ The form and extent of any initial background knowledge
⚫ The type of feedback provided
⚫ The learning algorithms used

Two important factors:
⚫ Modeling
⚫ Optimization
Under-fitting VS. Over-fitting (fixed N)

(Figure: error curves for under-fitting vs. over-fitting with fixed N; model = hypothesis + loss functions)
OVERFIT & UNDERFIT MODEL
Consider the example of house price prediction based on one feature: we are trying to predict the home price based on sq ft area.
The figure shows a scatter plot of all the samples present in the dataset.
To train a model, we split this dataset into train and test samples.
OVERFIT & UNDERFIT MODEL
Consider all blue dots as training samples and orange dots as test samples.
We can train a model that can fit the blue dots.
Our trained model can be an overfit model, an underfit model, or a balanced fit model.
OVER FIT MODEL

This figure shows an overfit model. An overfit model tries to exactly fit the training samples, and the training error becomes close to zero.
But the test error can become very high.
OVER FIT MODEL
The training and test samples are selected at random, usually in an 80/20 ratio.
The figure shows the same problem but with a different selection of train/test samples.
HIGH VARIANCE
High variance means there is high variability in the test error depending on the selection of training samples.
As samples are selected randomly, the test error varies randomly, which is not good and is a common issue in overfit models.
LOW VARIANCE
Let's consider a simple model, a linear model, which underfits the training samples.
Even if we change the training and testing samples, there is not a big difference in the train and test error.
LOW VARIANCE
Based on the samples selected for training and testing, the test error doesn't vary much.
Variance is all about the test error.
BIAS
Bias is a measurement of how accurately a model can capture the pattern in the training dataset.

Bias is all about the training error. When the training error is high, the model is said to have high bias.

In case of overfitting, bias will be low because the training error will be minimal.
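A minimal sketch (assuming scikit-learn and NumPy; the synthetic sq-ft/price data and polynomial degrees are illustrative) that contrasts a low-degree model, which tends to underfit here (high bias), with a very high-degree model, which tends to overfit (low train error, higher test error):

```python
# Minimal sketch (assumes scikit-learn/NumPy): synthetic "price vs. sq ft" data
# (sq ft in thousands). Compares train/test MSE of a degree-1 model, which
# tends to underfit this data, with a degree-12 model, which tends to overfit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
sqft = rng.uniform(0.5, 3.5, size=60)                                    # thousands of sq ft
price = 100 + 150 * sqft + 60 * np.sin(3 * sqft) + rng.normal(0, 10, 60)  # nonlinear + noise
X = sqft.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, price, test_size=0.2, random_state=0)

for degree in (1, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE = {mean_squared_error(y_tr, model.predict(X_tr)):.1f}, "
          f"test MSE = {mean_squared_error(y_te, model.predict(X_te)):.1f}")
```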
BALANCED FIT MODEL
BULLS EYE DIAGRAM
The inner circle of the bulls-eye diagram represents the ground truth.
WAYS TO GET A BALANCED FIT MODEL
Cross Validation
Regularization
Dimensionality Reduction
Ensemble Techniques
K FOLD CROSS VALIDATION
OPTIONS TO TRAIN A MODEL
Use all data to train the model and then use some of that data to test the model.
PROBLEMS WITH THIS APPROACH?
We are testing the model on the same data on which it was trained.
Split the available dataset into training and test sets.
PROBLEMS WITH THIS APPROACH?
This approach works well most of the time.
However, suppose a case where most of the training samples are from one class and only a few training samples are from the other class, and most of the test samples are from the other class.
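A minimal sketch (assuming scikit-learn; the Iris data is illustrative) of a random split, where stratify=y keeps the class proportions similar in the train and test sets and avoids the imbalanced situation described above:

```python
# Minimal sketch (assumes scikit-learn): splitting a dataset into train/test sets.
# stratify=y keeps class proportions similar in both splits, so neither split
# is dominated by a single class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print("train size:", len(X_train), " test size:", len(X_test))
```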
K FOLD CROSS VALIDATION
We divide the whole dataset into folds, let's say 5 folds, and then we run multiple iterations.
In the first iteration, the first fold is used to test the model and the remaining four folds are used to train it.
In the second iteration, the second fold is used to test the model and the remaining folds are used to train it.
This process is repeated until the last fold, where the last fold is used for testing and the remaining folds are used for training.
Lastly, the results from each iteration are averaged.
K FOLD CROSS VALIDATION
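A minimal sketch of 5-fold cross-validation (assuming scikit-learn; the logistic regression model and Iris data are illustrative choices):

```python
# Minimal sketch (assumes scikit-learn): 5-fold cross-validation.
# Each iteration trains on 4 folds and tests on the held-out fold;
# the 5 scores are then averaged.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print("fold scores  :", scores)
print("mean accuracy:", scores.mean())
```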
L1 AND L2 REGULARIZATION
L1 and L2 regularization are some of the techniques that can be used to address the overfitting issue.
Consider the equation for the overfit case: if I somehow make sure that theta 3 and theta 4 are almost close to 0, then the equation changes and the higher-order terms effectively drop out.
L1 AND L2 REGULARIZATION
The idea here is to shrink your parameters theta 0, theta 1, theta 2, theta 3, theta 4: if you can keep these parameters smaller, then you can get a better equation for your prediction function.

Now how do we do that?
L1 AND L2 REGULARIZATION
When we run training, we pass the first sample and calculate the predicted y using some randomly initialized weights, then we compare it with the true value; this is how we calculate the mean squared error (MSE).
L1 AND L2 REGULARIZATION
Here the predicted y is actually h_theta(x_i), where h_theta(x_i) could be a higher-order polynomial equation and x_1, x_2 are nothing but features, so MSE = (1/n) * sum over i of (y_i - h_theta(x_i))^2.
L2 REGULARIZATION
So in this equation, what if I add a penalty term?
There is lambda, which is a free parameter we can control, like a tuning knob, and we square each of the theta parameters, so the loss becomes MSE + lambda * sum of theta_j^2.
So now if a theta gets bigger, this penalty gets bigger and the error gets bigger, and the model will not converge to such a solution.
L2 REGULARIZATION
Essentially we are penalizing higher values of theta here.

So whenever the model tries to make a theta value higher, we are adding a penalty. By adding this penalty we make sure that the theta values don't go too high, so they remain very small.

We can fine-tune this using the parameter lambda.

This is called L2 regularization; it is called L2 because we are using the square of the parameters.
L1 REGULARIZATION
In L1 regularization, we use the absolute value. That is the only difference between L1 and L2: in L1 we use the absolute value of the theta parameters, so the loss becomes MSE + lambda * sum of |theta_j|.
Here again, if a theta is bigger, the overall error is bigger, and the term acts as a penalty so that during training the values of theta remain smaller.
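A minimal sketch (assuming scikit-learn and NumPy; the synthetic data, polynomial degree, and alpha values are illustrative): Ridge implements L2 regularization and Lasso implements L1, with alpha playing the role of lambda.

```python
# Minimal sketch (assumes scikit-learn/NumPy): L2 (Ridge) and L1 (Lasso)
# regularization on a high-degree polynomial fit. alpha plays the role of the
# lambda "tuning knob" above: larger alpha penalizes large theta values more.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(30, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.3, size=30)   # roughly linear target with noise

for name, reg in [("no regularization", LinearRegression()),
                  ("L2 (Ridge)", Ridge(alpha=1.0)),
                  ("L1 (Lasso)", Lasso(alpha=0.1, max_iter=50_000))]:
    model = make_pipeline(PolynomialFeatures(degree=8), reg)
    model.fit(X, y)
    print(f"{name:18s} max |theta| = {np.abs(model[-1].coef_).max():.2f}")
```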
DIMENSIONALITY REDUCTION
Some example dimensionality reduction methods are listed as
follows:
⚫ Multidimensional scaling (MDS)
⚫ Principal component analysis (PCA)
⚫ Projection pursuit (PP)
⚫ Partial least squares (PLS) regression

DIMENSIONALITY REDUCTION
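A minimal sketch (assuming scikit-learn; the Iris data and 2 components are illustrative) of PCA, one of the dimensionality reduction methods listed above:

```python
# Minimal sketch (assumes scikit-learn): PCA reduces the 4 Iris features to
# 2 principal components and reports how much variance they retain.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)          # (150, 4)
print("reduced shape :", X_reduced.shape)  # (150, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)
```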
ENSEMBLE METHODS
This is a very powerful and widely adopted class of techniques.
As the name suggests, ensemble methods encompass multiple models that are built independently, and the results of these models are combined and responsible for the overall prediction.
It is critical to identify which independent models are to be combined or included, how the results need to be combined, and in what way, to achieve the required result.
The subset of models that are combined is sometimes referred to as weaker models, as the results of these models need not completely fulfill the expected outcome in isolation.
ENSEMBLE METHODS
ENSEMBLE METHODS
The following are some of the ensemble method algorithms:
⚫ Random forest
⚫ Bagging (bootstrapped aggregation): sampling with replacement
⚫ Boosting (e.g. AdaBoost)
⚫ Stacked generalization (blending)
⚫ Gradient boosting machines (GBM)

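A minimal sketch (assuming scikit-learn; the breast-cancer dataset and default settings are illustrative) comparing two of the ensemble methods listed above:

```python
# Minimal sketch (assumes scikit-learn): two common ensembles.
# A random forest bags many decision trees (sampling with replacement);
# gradient boosting builds trees sequentially, each correcting its predecessors.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, "mean CV accuracy:", round(scores.mean(), 3))
```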
MACHINE LEARNING APPLICATIONS

MACHINE LEARNING TOOLS AND FRAMEWORKS

