Introduction To ML

Uploaded by

Karthik .k
Copyright © All Rights Reserved
ST JOSEPH’S UNIVERSITY

BENGALURU

INTRODUCTION TO
MACHINE LEARNING
PRESENTED BY:

AARAN DLIMA
INTRODUCTION:
Human beings learn from experience; we have the capacity to learn.

Computers and machines, by contrast, work on our instructions.

But can a machine also learn from experience or past data, as a human does?

This is where Machine Learning comes in.

HUMAN VS MACHINE
INTRODUCTION:
Machine learning is a subset of Artificial Intelligence.

Primary focus: creating algorithms that enable a computer to learn independently from data and previous experience.

These algorithms build a mathematical model from sample past data, known as training data, and use it to make predictions or decisions without being explicitly programmed.
EXAMPLE: WANT TO PREDICT CAT OR DOG

To develop predictive models, machine learning brings together statistics and computer science.

A machine is said to learn if its performance improves as it gains more data.
HOW DOES ML WORK?
ML builds prediction models, learns from the data, and predicts the output for new data when it receives it.

More data generally yields a better model, which in turn improves the accuracy of the predicted output.
HOW DOES ML WORK?
DO WE NEED ML?
Increased data

Solving complex problems (difficult for a human)

Decision making in various sectors

Finding hidden patterns and extracting useful information from data
CLASSIFICATION OF ML

SUPERVISED LEARNING

UNSUPERVISED LEARNING

REINFORCEMENT LEARNING
CLASSIFICATION OF ML
SUPERVISED LEARNING
REMEMBER: Labeled data is used here.

OBJECTIVE: To map the input data to the output data.

The system uses labeled data to build a model that learns the relationship between each input and its label.

The data is divided into a training set, used to build the model, and a testing set, used to check it.
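
The idea of learning from labeled data and then testing on unseen examples can be sketched with the cat/dog example from the earlier slide. This is a minimal illustration only: the measurements are invented, and the nearest-neighbour rule is one simple choice of model, not a method the slides prescribe.

```python
# Labeled training data: (height_cm, weight_kg) -> label. Values are invented.
train_X = [(25, 4), (30, 5), (60, 25), (70, 30)]
train_y = ["cat", "cat", "dog", "dog"]

def predict(x):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[nearest]

# Testing: held-out examples with known labels, never used for training.
test_X = [(28, 4.5), (65, 28)]
test_y = ["cat", "dog"]

predictions = [predict(x) for x in test_X]
correct = sum(p == t for p, t in zip(predictions, test_y))
print(predictions, f"{correct}/{len(test_y)} correct")
```

The labels in train_y are what make this supervised: the model is told the right answer for every training input.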


UNSUPERVISED LEARNING
REMEMBER: The data has not been labeled, classified, or categorized.

OBJECTIVE: To restructure the input data into new features or groups of objects with similar patterns.

It is a learning method in which a machine learns without any supervision.

There is no predetermined result.

The machine tries to find useful insights in a huge amount of data.
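
Grouping objects with similar patterns can be sketched with a tiny one-dimensional k-means loop. The slides do not name an algorithm, so k-means is an assumption here, as are the data values and the starting centers.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled data: no target column, only the values themselves (invented).
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge from the data alone.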
REINFORCEMENT LEARNING
Feedback-based learning method.

The agent receives a reward for each right action and a penalty for each wrong action.

The agent learns automatically from this feedback and improves its performance.

A robotic dog that automatically learns the movement of its arms is an example of reinforcement learning.
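
The reward-and-penalty loop can be sketched as below. This is a deliberately tiny illustration: the two action names, the reward values, and the learning rate are all hypothetical, and a real reinforcement learning problem would also involve states and randomized exploration.

```python
actions = ["move_forward", "fall_over"]                 # hypothetical actions
reward_for = {"move_forward": 1.0, "fall_over": -1.0}   # assumed feedback
value = {a: 0.0 for a in actions}                       # agent's estimate per action
learning_rate = 0.5

def choose(step):
    """Try every action once, then always pick the best current estimate."""
    if step < len(actions):
        return actions[step]
    return max(actions, key=lambda a: value[a])

for step in range(10):
    action = choose(step)
    reward = reward_for[action]
    # Nudge the estimate toward the reward that was just received.
    value[action] += learning_rate * (reward - value[action])

best_action = max(actions, key=lambda a: value[a])
print(best_action, value)
```

After the penalty for falling over, the agent stops choosing that action and its estimate for moving forward climbs toward the reward of 1.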
APPLICATIONS OF ML
Image recognition

Speech recognition

Prediction

Recommendation
APPLICATIONS OF ML
Voice Assistant

Fraud Detection

Language Translation

Stock market
TYPES OF DATA
Numerical data: such as house price, temperature, etc.

Categorical data: such as Yes/No, True/False, Blue/Green, etc.

Ordinal data: similar to categorical data, but the categories can be ranked by comparison (for example, Low/Medium/High).
TYPES OF DATASETS

IMAGE DATASETS
Contents: images.
Used for: image classification, object detection, and image segmentation.

TEXT DATASETS
Contents: textual information such as articles, books, etc.
Used for: sentiment analysis and text classification.

TABULAR DATASETS
Contents: rows and columns in a table format.
DATA PREPROCESSING:
Data preprocessing is the set of tasks required to clean the data and make it suitable for a machine learning model. It also improves the accuracy and efficiency of the model.

Raw data often contains noise, missing values, or an unusable format, so it cannot be fed directly to a machine learning model.
STEPS IN DATA
PREPROCESSING

01 SEARCH THE DATASET
Each dataset is different from the others.
Search for a dataset according to the needs of your problem statement.
The data we use is usually in CSV, text, or Excel file format.

02 IMPORTING LIBRARIES
Import some predefined Python libraries.
A few libraries we usually come across are numpy, pandas, matplotlib, seaborn, and scikit-learn.

03 IMPORT DATASETS
After importing the libraries, we need to import the data that we have collected.
Here we use functions such as read_csv, read_excel, and so on. (Learn the different ways of importing datasets.)
STEPS IN DATA
PREPROCESSING
Once the first three steps are done, explore the data. Identify your dependent and independent variables.

04 DATA CLEANING
Check for missing values.
How do you deal with missing values?
Deleting the affected rows or columns
Substituting values with the mean, median, or mode
Check whether any column names need to be renamed, and so on.

05 ENCODING
This step is for treating categorical variables.
Two common encoding techniques are the label encoder and the one-hot encoder, available in the scikit-learn package.

06 SPLITTING
Split the dataset into X_train, X_test, Y_train, and Y_test.
Decide on your test size (usually 20% to 30%).
Use the random state keyword to make the split reproducible.
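
The cleaning and splitting steps can be sketched in plain Python so the example runs without extra packages. The data and the helper name split_dataset are invented for illustration; in practice scikit-learn's train_test_split, with its test_size and random_state parameters, is the usual tool.

```python
import random
from statistics import mean

# Invented example column; None marks a missing value.
ages = [20, None, 30, 40, None]

# Data cleaning: substitute each missing value with the mean of the observed ones.
observed = [a for a in ages if a is not None]
fill_value = mean(observed)
ages_clean = [fill_value if a is None else a for a in ages]
print(ages_clean)

def split_dataset(X, y, test_size=0.2, random_state=0):
    """Shuffle row indices reproducibly, then cut off the test portion."""
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = list(range(10))        # independent variable (invented)
y = [2 * x for x in X]     # dependent variable (invented)
X_train, X_test, Y_train, Y_test = split_dataset(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```

Passing the same random_state again reproduces the same split, which is why the keyword matters when results need to be compared between runs.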
Dummy Variables:
Dummy variables take the value 0 or 1. For each row, the column of the category that is present holds 1, and the columns of the other categories hold 0. With dummy encoding, the number of columns equals the number of categories.
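
A minimal sketch of dummy (one-hot) encoding, written in plain Python with an invented colour column:

```python
colors = ["Blue", "Green", "Blue", "Red"]   # invented categorical column

# One output column per distinct category, in a fixed (sorted) order.
categories = sorted(set(colors))

# Each row gets a 1 in the column of its own category and 0 elsewhere.
dummies = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(categories)
for color, row in zip(colors, dummies):
    print(color, row)
```

Three categories produce three columns, and every row contains exactly one 1.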
STEPS IN DATA PREPROCESSING:
7. Feature Scaling

Final step in data preprocessing.

It is a technique to standardize the independent variables of the dataset within a specific range.

We put our variables in the same range and on the same scale so that no variable dominates another.

Two ways of feature scaling:
Standardization
Normalization

For feature scaling, we import the StandardScaler class from the sklearn.preprocessing module.
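
The two scaling methods can be sketched in plain Python. The salary values are invented. Standardization here uses the population standard deviation, which is also what StandardScaler uses; normalization is shown as min-max scaling to the range 0 to 1.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to mean 0 and standard deviation 1: (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Rescale to the range [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

salaries = [30000, 40000, 50000]   # invented feature column
standardized = standardize(salaries)
normalized = normalize(salaries)
print(standardized)
print(normalized)
```

Both functions assume the column is not constant; a constant column would cause a division by zero.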
STEPS IN DATA PREPROCESSING:
7. Feature Scaling
LIFE CYCLE OF ML

Problem Definition

Data Collection

Data Preparation

Model Building

Model Evaluation

Model Deployment
OVERFITTING & UNDERFITTING
These are the two main problems we encounter in Machine Learning, and both degrade the performance of machine learning models.

Few terms to keep in mind:

Bias: Bias is the prediction error introduced when the model oversimplifies the problem. It shows up as a systematic difference between the predicted values and the actual values.

Variance: Variance is the amount by which the predictions change when the model is trained on different data. When it is high, the model performs well on the training dataset but not on the test dataset.
OVERFITTING:
Occurs when our ML model tries to cover all the data points, or more than the required data points, present in the given dataset.

As a result, the model starts capturing the noise and inaccurate values present in the dataset, and these reduce the efficiency and accuracy of the model.

The overfitted model has low bias and high variance.

The chance of overfitting increases the longer we train the model on the same data.

Overfitting is the main problem that occurs in Supervised Learning.


LET US TRY TO UNDERSTAND OVERFITTING WITH EXAMPLE:

In the linear regression output graph, the model tries to cover all the data points present in the scatter plot.

It may look efficient, but in reality it is not.

The goal of a regression model is to find the best-fit line. Here no best fit has been found, so the model will generate prediction errors on new data.
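
The effect can be sketched numerically. In this invented example the true trend is y = 2x and the training targets carry alternating noise of +1 and -1. A model that memorizes every training point is compared with a least-squares best-fit line; both models and all values are illustrative assumptions.

```python
# Training data: true trend y = 2x plus alternating noise (invented).
train_x = [0, 1, 2, 3, 4]
train_y = [1, 1, 5, 5, 9]
# Test data lies on the true trend, between the training points.
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [1, 3, 5, 7]

def mse(model, xs, ys):
    """Mean squared error of a model over a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def memorizer(x):
    """Overfitted model: returns the target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Best-fit line by ordinary least squares.
x_mean = sum(train_x) / len(train_x)
y_mean = sum(train_y) / len(train_y)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(train_x, train_y))
         / sum((x - x_mean) ** 2 for x in train_x))
intercept = y_mean - slope * x_mean

def line(x):
    return slope * x + intercept

for name, model in [("memorizer", memorizer), ("best-fit line", line)]:
    print(name, "train MSE:", mse(model, train_x, train_y),
          "test MSE:", mse(model, test_x, test_y))
```

The memorizer covers every training point, so its training error is zero, yet its test error is far larger than that of the line: low bias, high variance.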
HOW TO AVOID OVERFITTING:

Cross Validation

Training with more data

Removing Features

Early Stopping

Regularization

Ensembling
UNDERFITTING:
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data.

It can happen when training is stopped too early in an attempt to avoid overfitting (early stopping), so the model does not learn enough from the training data.

As a result, it may fail to find the best fit of the dominant trend in the data.

This reduces the accuracy and produces unreliable predictions.

An underfitted model has high bias and low variance.


LET US TRY TO UNDERSTAND UNDERFITTING WITH EXAMPLE:

Here, the model is unable to capture the data points present in the plot.

How to avoid underfitting:


By increasing the training time of the model.
By increasing the number of features.
ERRORS IN MACHINE LEARNING

Reducible errors: These errors can be reduced to improve the model accuracy. They consist of bias and variance.

Irreducible errors: These errors will always be present in the model regardless of which algorithm has been used. They are caused by unknown variables whose influence cannot be removed.
BIAS
Low Bias: A low-bias model makes fewer assumptions about the form of the target function.

High Bias: A high-bias model makes more assumptions and becomes unable to capture the important features of our dataset. A high-bias model also cannot perform well on new data.

Examples of machine learning algorithms with low bias are Decision Trees, k-Nearest Neighbours, and Support Vector Machines.

Examples of algorithms with high bias are Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
VARIANCE
Low variance means there is a small variation in the prediction of the target function with changes in the training dataset.

High variance means there is a large variation in the prediction of the target function with changes in the training dataset.

A model with high variance learns the training dataset very closely: it gives good results on the training dataset but does not generalize well to unseen data.

Low variance: Linear Regression, Logistic Regression, and Linear Discriminant Analysis.

High variance: Decision Trees, Support Vector Machines, and k-Nearest Neighbours.
DIFFERENT COMBINATIONS OF BIAS-VARIANCE

Low-Bias, Low-Variance: This is the ideal machine learning model. However, it is not achievable in practice.

Low-Bias, High-Variance: Predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, and it leads to overfitting.

High-Bias, Low-Variance: Predictions are consistent but inaccurate on average. This case occurs when the model does not learn well from the training dataset or uses too few parameters. It leads to underfitting.

High-Bias, High-Variance: Predictions are inconsistent and also inaccurate on average.
BIAS-VARIANCE TRADE-OFF

While building a machine learning model, it is important to take care of bias and variance in order to avoid overfitting and underfitting.

If the model is very simple, with few parameters, it may have low variance and high bias.

If the model has a large number of parameters, it tends to have high variance and low bias.

We therefore need to strike a balance between the bias error and the variance error. This balance is known as the Bias-Variance trade-off.
BIAS-VARIANCE TRADE-OFF

For accurate predictions, an algorithm needs both low variance and low bias. In practice this is not fully achievable, because bias and variance are related to each other:
If we decrease the variance, the bias tends to increase.
If we decrease the bias, the variance tends to increase.

Bias-Variance trade-off is a central issue in supervised learning.

Hence, the Bias-Variance trade-off is about finding the sweet spot that balances the bias and variance errors.
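
The trade-off is commonly summarized by the standard decomposition of the expected squared error. The symbols are assumptions introduced here, not taken from the slides: y is the target, \hat{f}(x) is the model's prediction, and \sigma^2 is the variance of the noise in the data.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big]}_{\text{reducible error}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The first two terms are the ones a modelling choice can shift between; the last term stays whatever algorithm is used.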
CONFUSION MATRIX
The confusion matrix is a matrix used to determine the performance of classification models on a given set of test data.

The matrix itself is easy to understand, but the related terminology may be confusing.

It shows the errors in the model's performance in the form of a matrix.

It is also known as an error matrix.


CONFUSION MATRIX

For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table; and so on.

The matrix has two dimensions, predicted values and actual values, along with the total number of predictions.

Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
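
A 2×2 confusion matrix can be sketched by counting the four possible outcomes. The labels are invented, with 1 as the positive class. The layout chosen here puts actual values in rows and predicted values in columns, which is one common convention.

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # true labels (invented)
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # model outputs (invented)

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)   # true positives
tn = sum(a == 0 and p == 0 for a, p in pairs)   # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)   # false positives (Type I errors)
fn = sum(a == 1 and p == 0 for a, p in pairs)   # false negatives (Type II errors)

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
matrix = [[tn, fp],
          [fn, tp]]
for row in matrix:
    print(row)
```

The four cells always add up to the total number of predictions.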
NEED FOR CONFUSION MATRIX

It evaluates the performance of classification models when they make predictions on test data, and tells us how good our classification model is.

It shows not only the errors made by the classifier but also the type of each error: Type I (false positive) or Type II (false negative).

With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy, precision, etc.
PERFORMANCE METRICS FOR CLASSIFICATION

Confusion Matrix

Accuracy

Precision

Recall

F Score

AUC (Area Under the Curve) - ROC


PERFORMANCE METRICS FOR CLASSIFICATION
ACCURACY
Accuracy is the ratio of the number of correct predictions to the total number of predictions.

When to Use Accuracy?
Accuracy is a good metric when the target classes in the data are approximately balanced.

For example, suppose 60% of the images in a fruit dataset are of Apple and 40% are of Mango. If a model that predicts whether an image is of Apple or Mango reports 97% accuracy, that figure is meaningful, because always predicting Apple would score only 60%.
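
A minimal sketch of why class balance matters for accuracy. The first dataset mirrors the 60/40 fruit split from the slide; the second, heavily imbalanced dataset is invented for contrast.

```python
def accuracy(actual, predicted):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Roughly balanced classes: 60% Apple, 40% Mango.
balanced = ["Apple"] * 6 + ["Mango"] * 4
always_apple = ["Apple"] * 10
balanced_score = accuracy(balanced, always_apple)

# Heavily imbalanced classes: 95 negatives and 5 positives (invented).
imbalanced = [0] * 95 + [1] * 5
always_negative = [0] * 100
imbalanced_score = accuracy(imbalanced, always_negative)

print(balanced_score, imbalanced_score)
```

A model that never detects a single positive still scores 95% on the imbalanced data, which is why accuracy alone can mislead there.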
PERFORMANCE METRICS FOR CLASSIFICATION
PRECISION
Precision is the ratio of correctly classified positive samples (True Positives) to the total number of samples classified as positive (either correctly or incorrectly).

Precision tells us how reliable the machine learning model is when it classifies a sample as positive.

The precision metric is used to overcome the limitation of Accuracy.


PERFORMANCE METRICS FOR CLASSIFICATION
RECALL
Recall is the ratio of the number of positive samples correctly classified as positive to the total number of positive samples.

It is similar to the Precision metric.

It measures the proportion of actual positives that were identified correctly.

Recall measures the model's ability to detect positive samples.

The higher the recall, the more positive samples are detected.


PERFORMANCE METRICS FOR CLASSIFICATION
When to use Precision and Recall?

From the definitions of Precision and Recall, recall describes the performance of a classifier with respect to false negatives, whereas precision describes its performance with respect to false positives.

So, if we want to minimize false negatives, recall should be as close to 100% as possible, and if we want to minimize false positives, precision should be as close to 100% as possible.

In simple words, maximizing precision minimizes the FP errors, and maximizing recall minimizes the FN errors.
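
Both metrics follow directly from the confusion-matrix counts: precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch with invented counts:

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the model detected."""
    return tp / (tp + fn)

# Invented counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8
p = precision(tp, fp)
r = recall(tp, fn)
print("precision:", p)
print("recall:", r)
```

Here the model is fairly reliable when it says "positive" (few FP) but misses half of the actual positives (many FN), so precision is high while recall is low.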
PERFORMANCE METRICS FOR CLASSIFICATION
F SCORE
The F-score, or F1 Score, is a metric that evaluates a binary classification model on the basis of the predictions made for the positive class.

It is calculated from Precision and Recall: the F1 Score is the harmonic mean of precision and recall, assigning equal weight to each.

When to use F-Score?
Because the F-score uses both precision and recall, it should be used when both are important for evaluation. F1 weights them equally; when one is more important than the other, for example when false negatives matter more than false positives or vice versa, the weighted F-beta variant can be used instead.
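
A minimal sketch of the F-score as a harmonic mean. The function name f_beta and the input values are illustrative; the weighted form with a beta parameter is the standard generalization, where beta = 1 gives F1, beta > 1 favours recall, and beta < 1 favours precision.

```python
from statistics import harmonic_mean

def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 0.8, 0.5          # invented values
f1 = f_beta(precision, recall)            # equal weight
f2 = f_beta(precision, recall, beta=2.0)  # recall counts more
f_half = f_beta(precision, recall, beta=0.5)  # precision counts more

print(f1, f2, f_half)
print(harmonic_mean([precision, recall]))  # same value as f1
```

With recall as the weaker of the two, the recall-weighted score is the lowest and the precision-weighted score is the highest.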
PERFORMANCE METRICS FOR CLASSIFICATION
AUC (Area Under Curve) - ROC
It is one of the popular and important metrics for evaluating the performance of a classification model.

The ROC (Receiver Operating Characteristic) curve is a graph that shows the performance of a classification model at different threshold levels.

The curve is plotted between two parameters:
True Positive Rate
False Positive Rate

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$

To compute the points on a ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would not be efficient. An efficient alternative is AUC.
AUC: AREA UNDER THE ROC CURVE
AUC calculates the performance across all thresholds and provides an aggregate measure.

The value of AUC ranges from 0 to 1.

A model whose predictions are 100% wrong has an AUC of 0.0, a model whose predictions are 100% correct has an AUC of 1.0, and random guessing gives about 0.5.
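
One way to see what AUC measures is its ranking interpretation: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below uses that definition directly; the labels and scores are invented, and the pairwise loop is chosen for clarity, not efficiency.

```python
def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 1, 0, 0]                 # invented true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # invented model scores
print(auc(labels, scores))

print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))   # every positive ranked above every negative
print(auc([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9]))   # every positive ranked below every negative
```

Only the ordering of the scores matters, not their absolute values.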
AUC: AREA UNDER THE ROC CURVE
When to Use AUC?
AUC should be used to measure how well the predictions are ranked rather than their absolute values.

It measures the quality of the model's predictions without considering the classification threshold.

When not to use AUC?

AUC is scale-invariant, which is not always desirable: when we need well-calibrated probability outputs, AUC is not the preferred metric.

AUC is not a useful metric when there are wide disparities in the cost of false negatives vs. false positives and it is critical to minimize one type of classification error.
PERFORMANCE METRICS FOR REGRESSION
The metrics used for regression are different from the classification metrics.

We cannot use the Accuracy metric (explained above) to evaluate a regression model; instead, the performance of a regression model is reported as the error in its predictions.

Mean Absolute Error

Mean Squared Error

R2 Score

Adjusted R2
PERFORMANCE METRICS FOR REGRESSION
MEAN ABSOLUTE ERROR
Mean Absolute Error (MAE) measures the absolute difference between actual and predicted values, where absolute means taking the number as positive.

Take Linear Regression as an example, where the model draws a best-fit line between the dependent and independent variables. To measure the error in prediction, we calculate the difference between each actual value and the corresponding predicted value. To find the error for the complete dataset, we take the mean of these absolute differences.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - Y'_i \right|$$

Y is the actual value, Y' is the predicted value, and N is the total number of observations.
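
A minimal sketch of the MAE calculation, using invented actual and predicted values:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [3, 5, 2, 7]       # Y  (invented)
predicted = [2, 5, 4, 8]    # Y' (invented)

mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors are 1, 0, 2, 1, so the mean is 1.0
```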
PERFORMANCE METRICS FOR REGRESSION

MEAN ABSOLUTE ERROR
MAE is much more robust to outliers than MSE.

One limitation of MAE is that it is not differentiable at zero, which complicates gradient-based optimizers such as Gradient Descent.

To overcome this limitation, another metric can be used: Mean Squared Error, or MSE.
PERFORMANCE METRICS FOR REGRESSION
MEAN SQUARED ERROR
Mean Squared Error (MSE) is one of the most suitable metrics for regression evaluation. It measures the average of the squared differences between the values predicted by the model and the actual values.

MSE is never negative; it is zero only when every prediction is exact.

Because the differences are squared, large errors are penalized much more heavily than small ones, so a few outliers can make the model look worse than it is.

MSE is often preferred over other regression metrics because it is differentiable and hence easier to optimize.

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i - Y'_i \right)^2$$

Y is the actual value, Y' is the predicted value, and N is the total number of observations.
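
A minimal sketch of the MSE calculation, with a second invented prediction list that contains one large error to show how squaring amplifies it:

```python
def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3, 5, 2, 7]            # Y  (invented)
predicted = [2, 5, 4, 8]         # Y' (invented)
with_outlier = [2, 5, 4, 17]     # same predictions, but the last one is off by 10

mse = mean_squared_error(actual, predicted)
mse_outlier = mean_squared_error(actual, with_outlier)
print(mse)          # squared errors are 1, 0, 4, 1
print(mse_outlier)  # squared errors are 1, 0, 4, 100
```

A single bad prediction raises the score from 1.5 to 26.25, which illustrates the sensitivity to outliers.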
PERFORMANCE METRICS FOR REGRESSION
R2 SCORE
The R squared score, also known as the Coefficient of Determination, is another popular metric for regression model evaluation.

It determines the goodness of fit.

It expresses the strength of the relationship between the dependent and independent variables on a scale of 0-100%.

The R squared score is always less than or equal to 1, regardless of whether the values are large or small.
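
The usual definition is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below follows that definition; the function name r_squared and the data are invented.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3, 5, 2, 7]         # invented
predicted = [2, 5, 4, 8]      # invented

r2 = r_squared(actual, predicted)
print(r2)
print(r_squared(actual, actual))        # perfect predictions
print(r_squared(actual, [4.25] * 4))    # always predicting the mean of the actual values
```

Perfect predictions give exactly 1, and always predicting the mean gives 0. A model that does worse than predicting the mean produces a negative score, so the 0-100% reading applies to models that are at least that good.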
PERFORMANCE METRICS FOR REGRESSION
ADJUSTED R SQUARED
Adjusted R squared, as the name suggests, is an improved version of R squared.

R squared has a limitation: its score improves as more terms are added, even when the model is not really improving, which may mislead data scientists.

To overcome this issue, adjusted R squared is used. It always shows a lower value than R².

This is because it adjusts for the increasing number of predictors and only shows an improvement if there is a real improvement.

$$R_a^2 = 1 - \frac{(n - 1)}{(n - k - 1)} \left( 1 - R^2 \right)$$

n is the number of observations, k denotes the number of independent variables, and R_a^2 denotes the adjusted R².
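
A minimal sketch of the adjustment, using the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1). The R² value, n, and k below are invented to show how adding predictors without improving R² lowers the adjusted score.

```python
def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of independent variables k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.9    # invented R^2, held fixed
n = 11      # invented number of observations

few_predictors = adjusted_r_squared(r2, n, k=2)
many_predictors = adjusted_r_squared(r2, n, k=5)
print(few_predictors, many_predictors)
```

With the same R², the model that uses five predictors receives a lower adjusted score than the one that uses two, and both are below the unadjusted 0.9.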
QUESTIONS

1) Classifications of ML.

2) Steps involved in Data Preprocessing

3) Underfitting and Overfitting (When does it occur, Graph and how to avoid?)

4) Performance metrics for Classification

5) Performance metrics for Regression


THANK YOU
