
MANIPAL UNIVERSITY JAIPUR

AI3201: MACHINE LEARNING

Machine Learning Basics


Why Artificial Intelligence (AI)?

Human
 Computational power is limited
 Has limited working capacity

AI
 Has high computational power
 Can work 24x7 with the same efficiency

What is Artificial Intelligence (AI)?

 Artificial intelligence (AI) is a branch of computer science
dedicated to creating intelligent machines that can learn to
work and react like humans.

 The birth of artificial intelligence: 1952–1956
 In the 1940s and 50s, AI drew on mathematics, psychology,
engineering, economics and political science.
 In 1950, Alan Turing proposed the Turing test: if a machine could
pass it, it was reasonable to say that the machine was "thinking".
 In 1951: game AI
 In 1955: symbolic reasoning
 The golden years 1956–1974: programs developed in these years
included natural language processing (NLP).
 In 1967: robotics
 Boom 1980–1987: AI programs called "expert systems"
 Artificial neural networks (ANN)
 Machine Learning
 Deep Learning … and research on AI continues
[Diagram: AI as a field within computer science, overlapping image
processing, pattern recognition, symbolic learning, machine learning,
statistical learning, deep learning, neural networks (NN), robotics,
computer vision, speech recognition, and NLP.]
Application of AI
Learning
 Learning is the process of acquiring new
understanding, knowledge, behaviors, skills, values,
attitudes, and preferences. The ability to learn is
possessed by humans, animals, and some
machines; there is also evidence for some kind of
learning in certain plants.

Source: Wikipedia
Machine Learning
Types of Learning
Databases, Symbols and Computer Vision
 Machine learning is based on databases: data are required to train the
machine, and the data are analyzed using machine learning algorithms.
 Symbolic learning is based on symbols such as human-readable
characters, numbers and other symbols.
 Computer vision tasks include methods for acquiring, processing,
analyzing and understanding digital images, and extracting
high-dimensional data from the real world in order to produce
numerical or symbolic information, e.g. in the form of decisions.
What is Machine Learning?

Traditionally: give the computer "instructions", and it produces results.
Machine Learning: feed the system new data; it adapts by learning from
previous data, and produces results.

Machine Learning

 "The field of study that gives computers the ability to learn
without being explicitly programmed." - Arthur Samuel

 "A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E." - Tom Mitchell
Machine Learning (continued)

 It is a subset of Artificial Intelligence
 It is the brain of Artificial Intelligence
 It is a collection of algorithms
 It provides systems the ability to automatically learn
and improve from experience without being explicitly
programmed.
 Machine learning focuses on the development of
computer programs that can access data and use it
to learn for themselves.
What Machine Learning Models are Out There?
In general, there are 3 families of models:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Family of ML Models

 Supervised Learning: determine relationships through training
 Unsupervised Learning: discover new patterns
 Reinforcement Learning: learn by rewarding actions
Types of Machine Learning

Type of Learning

 Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs.
 Unsupervised learning is a type of machine learning algorithm used to
draw inferences from datasets consisting of input data without labeled
responses.
 Reinforcement learning (RL) is an area of machine learning concerned
with how software agents ought to take actions in an environment in
order to maximize the notion of cumulative reward.
What is Supervised Learning?

• Supervised learning is the algorithm learning
from known and labeled data
• When unknown data is input to the model, we
can get a response based on what it has learned

Real life examples include:
• Biometric attendance
• Fraud detection
Supervised Learning

 In supervised learning, the training set you feed to the algorithm
includes the desired solutions, called labels.
 Ex: the spam filter, car price prediction
What is Unsupervised Learning?

• Unsupervised learning is learning from unknown,
unlabeled data with no supervision
• The machine can identify patterns in the
unlabeled data and produce the appropriate
results

Real life examples include:
• Identifying accident prone areas
• Market basket analysis
Unsupervised Learning

 In unsupervised learning, the
training data is unlabeled.
 Ex: clustering a blog’s visitors
Semi-Supervised Learning

 We have plenty of unlabeled
instances and a few labeled
instances.
 Ex: a blog’s visitors

Semi-supervised learning with two
classes (triangles and squares):
Self-Supervised Learning

 Self-supervised learning involves
generating a fully labeled
dataset from a fully unlabeled
one.
 Ex: a pet classification model

Self-supervised learning example:
input (left) and target (right)
Reinforcement Learning

 The learning system, called an agent in this context, can observe the
environment, select and perform actions, and get rewards in return (or
penalties in the form of negative rewards, as shown in Figure 1-13). It
must then learn by itself what is the best strategy, called a policy, to
get the most reward over time. A policy defines what action the agent
should choose when it is in a given situation.

Figure 1-13. Reinforcement learning
Supervised Learning Common Algorithms Examples

 Classification (categorical data): e.g., decision trees.
 Regression (numerical data): e.g., linear regression.

[Figure: a decision tree splitting on variables 1, 2a/2b, 3a/3b and
4a/4b (classification), and a fitted line (regression).]
Source: Sok, K. (2021, December 14). Explaining Linear Regression with Hypothesis Testing and Confidence Interval. Medium. https://khemsok97.medium.com/explaining-linear-regression-with-hypothesis-testing-and-confidence-interval-aba454e4775e
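As a quick illustration, here is a minimal sketch of the two algorithm families named above, assuming scikit-learn is available (the iris and synthetic datasets are toy choices, not from the slides):

```python
# Sketch: one classification and one regression model (assumes scikit-learn).
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Classification (categorical target): a decision tree on the iris dataset.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("Predicted class for first flower:", clf.predict(X[:1]))

# Regression (numerical target): linear regression on synthetic data.
Xr, yr = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("Slope:", reg.coef_[0], "Intercept:", reg.intercept_)
```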
Unsupervised Learning Common Algorithms Examples

Clustering
 K-means Clustering
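A minimal k-means sketch, assuming scikit-learn is available (the two blobs of 2-D points are synthetic):

```python
# Sketch: k-means clustering of unlabeled 2-D points (assumes scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs of unlabeled data.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster centers:\n", kmeans.cluster_centers_)
print("First five cluster labels:", kmeans.labels_[:5])
```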
Machine Learning Process
Difference between Supervised and Unsupervised Learning

 Supervised learning algorithms are trained using labeled data;
unsupervised learning algorithms are trained using unlabeled data.
 A supervised learning model takes direct feedback to check if it is
predicting the correct output or not; an unsupervised learning model
does not take any feedback.
 A supervised learning model predicts the output; an unsupervised
learning model finds the hidden patterns in data.
 In supervised learning, input data is provided to the model along with
the output; in unsupervised learning, only input data is provided to the
model.
 The goal of supervised learning is to train the model so that it can
predict the output when it is given new data; the goal of unsupervised
learning is to find hidden patterns and useful insights in an unknown
dataset.
 Supervised learning needs supervision to train the model; unsupervised
learning does not need any supervision to train the model.
 Supervised learning can be categorized into classification and
regression problems; unsupervised learning can be categorized into
clustering and association problems.
 Supervised learning can be used where we know the inputs as well as
the corresponding outputs; unsupervised learning can be used where we
have only input data and no corresponding output data.
 A supervised learning model produces an accurate result; an
unsupervised learning model may give less accurate results by
comparison.
 Supervised learning is not close to true artificial intelligence, as we
first train the model for each data point and only then can it predict
the correct output; unsupervised learning is closer to true artificial
intelligence, as it learns the way a child learns daily routine things
from experience.
 Supervised learning includes algorithms such as linear regression,
logistic regression, support vector machines, multi-class classification,
decision trees and Bayesian logic; unsupervised learning includes
algorithms such as clustering, KNN and the Apriori algorithm.
Batch Versus Online Learning

 In batch learning, the system is incapable
of learning incrementally: it must be
trained using all the available data. This
will generally take a lot of time and
computing resources, so it is typically
done offline. First the system is trained,
and then it is launched into production
and runs without learning anymore; it just
applies what it has learned. This is called
offline learning.
Online Learning

 In online learning, you train the
system incrementally by feeding it
data instances sequentially, either
individually or in small groups
called mini-batches.

Figure: In online learning, a model
is trained and launched into
production, and then it keeps
learning as new data comes in.
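A minimal sketch of this incremental style, assuming scikit-learn's SGDRegressor (which exposes partial_fit for mini-batch updates); the data stream here is synthetic:

```python
# Sketch: online learning on a stream of mini-batches (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# Feed data in mini-batches; the model updates incrementally on each batch.
for _ in range(100):
    X_batch = rng.normal(size=(32, 3))
    y_batch = X_batch @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 32)
    model.partial_fit(X_batch, y_batch)

print("Learned coefficients:", model.coef_)  # should approach [1, -2, 0.5]
```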
Instance-Based Versus Model-Based Learning

 In instance-based learning, the system identifies similarity between
objects: it learns the examples and generalizes to new cases by
comparing them to known examples using a similarity measure.

Figure: Instance-based learning
Instance-Based Versus Model-Based Learning

 In model-based learning, we build a model from the examples and
then use that model to make predictions.

Figure: Model-based learning
Testing and Validating

 Split your data into two sets: the training set
and the test set.
 Train your model using the training set, and
test it using the test set.
 The error rate on new cases is called the
generalization error (or out-of-sample
error); by evaluating your model on the
test set, you get an estimate of this error.
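A minimal sketch of this split, assuming scikit-learn (the regression data is synthetic):

```python
# Sketch: estimating generalization error with a held-out test set
# (assumes scikit-learn).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
# Hold out 20% of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
# Error on unseen cases estimates the generalization (out-of-sample) error.
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```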
Overfitting

 If the training error is low (i.e., your
model makes few mistakes on the training
set) but the generalization error is high, it
means that your model is overfitting the
training data.
Underfitting

 If the model performs poorly on both the training
set and the test set, then it is underfitting.
Comparing the two errors is one way to tell
whether a model is too simple or too complex.
Validation Set / Development Set / Dev Set

 You simply hold out part of the training set to evaluate
several candidate models and select the best one. The
new held-out set is called the validation set (or the
development set, or dev set). More specifically, you train
multiple models with various hyperparameters on the
reduced training set (i.e., the full training set minus the
validation set), and you select the model that performs
best on the validation set. After this holdout validation
process, you train the best model on the full training set
(including the validation set), and this gives you the final
model. Lastly, you evaluate this final model on the test
set to get an estimate of the generalization error.

Figure: Model selection using holdout validation
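A minimal sketch of this holdout-validation procedure, assuming scikit-learn; the candidate models here are ridge regressions with different regularization strengths, an illustrative choice:

```python
# Sketch: holdout validation for model selection (assumes scikit-learn).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Hold out part of the training set as a validation (dev) set.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0)

# Train candidate models (various hyperparameters) on the reduced training set.
candidates = {alpha: Ridge(alpha=alpha).fit(X_train, y_train)
              for alpha in [0.01, 0.1, 1.0, 10.0]}
# Select the model that performs best on the validation set.
best_alpha = min(candidates,
                 key=lambda a: mean_squared_error(y_val, candidates[a].predict(X_val)))

# Retrain the best model on the full training set, then estimate its
# generalization error on the untouched test set.
final_model = Ridge(alpha=best_alpha).fit(X_train_full, y_train_full)
print("Best alpha:", best_alpha,
      "Test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))
```

Retraining on the full training set (including the validation set) uses all available data for the final model, while the test set remains untouched for an unbiased error estimate.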
MANIPAL UNIVERSITY JAIPUR

MACHINE LEARNING

Bias-Variance Trade Off – Machine Learning

 It is important to understand prediction errors (bias and variance) when it


comes to accuracy in any machine-learning algorithm.
 There is a tradeoff between a model’s ability to minimize bias and variance
which is referred to as the best solution for selecting a value
of Regularization constant.
 A proper understanding of these errors would help to avoid the overfitting
and underfitting of a data set while training the algorithm.
What is Bias?
 Bias is the difference between the predictions of the
machine learning model and the correct values.
 High bias gives a large error on training as well as testing data. It is
recommended that an algorithm be low-bias, to avoid the problem
of underfitting.
 With high bias, the predicted values follow a straight line that does
not fit the data in the data set accurately. Such fitting is known as
underfitting the data.
 This happens when the hypothesis is too simple or linear in nature,
e.g. h(x) = θ0 + θ1·x.

[Figure: example of such a high-bias (underfitting) situation, with the
corresponding simple hypothesis.]

A model has either of the two
situations:
• Low bias – Low bias value implies fewer assumptions have been made to build the
target function. In this scenario, the model will closely match the training dataset.
• High bias – High bias value implies more assumptions have been made to build the
target function. In this scenario, the model will not match the dataset closely.

A high-bias model will be unable to capture the dataset trend. It has a high error rate and
is considered an underfitting model. This happens because of a very simplified
algorithm. For instance, a linear regression model might be biased if the data has a
non-linear relationship.
Ways To Reduce High Bias

Since we have discussed some disadvantages of having high bias, here are some ways
to reduce high bias in machine learning.

• Use a complex model: The extremely simplified model is the main cause of high
bias. It is incapable of capturing the data complexity. In such scenarios, the model
can be made more complex.
• Increase the training data size: Increasing the training data size can help reduce
bias. This is because the model is being provided with more examples to learn
from the dataset.
• Increase the features: Increasing the number of features will increase the
complexity of the model. This improves the ability of the model to capture the
underlying data patterns.
• Reduce regularisation of the model: L1 and L2 regularisation can help
prevent overfitting and improve the model’s generalisation ability. Reducing
the regularisation or removing it completely can help improve the
performance.
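To illustrate the first and third points, a minimal sketch assuming scikit-learn (the quadratic data is synthetic):

```python
# Sketch: reducing high bias by adding polynomial features
# (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.2, 200)  # non-linear (quadratic) data

# A plain linear model is too simple for quadratic data: high bias, low R^2.
print("Linear R^2:", LinearRegression().fit(X, y).score(X, y))

# Adding polynomial features makes the model complex enough for the pattern.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Quadratic R^2:", poly_model.score(X, y))
```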
What is Variance?
 The variability of model prediction for a given data point, which tells
us the spread of our predictions, is called the variance of the model.
 A model with high variance fits the training data with a very complex
curve and thus is not able to fit accurately on data it hasn’t seen
before.
 As a result, such models perform very well on training data but have
high error rates on test data.
 When a model has high variance, it is said to overfit the data.
 Overfitting means fitting the training set accurately via a complex
curve and a high-order hypothesis, but this is not a good solution, as
the error on unseen data is high.
 While training a model, variance should be kept low.

[Figure: example of high-variance data and the corresponding complex,
high-order hypothesis curve.]
Variance error is either low or high:
• Low variance: Low variance implies that the ML model is less sensitive to
changes in the training data. The model will produce consistent
estimates of the target function from different data subsets of the same
distribution. Combined with high bias, this yields underfitting, where the
model can’t generalise on test or training data.
• High variance: High variance implies that the ML model is susceptible to
changes in the training data. When trained on various subsets of data from the
same distribution, the ML model can significantly change its target function
estimate. This scenario is known as overfitting: the ML model does well
on the training data but not on any new data.
Ways To Reduce High Variance
Here are some ways high variance can be reduced:
• Feature selection: The variance error of a model can be reduced by selecting
only the relevant features. This will decrease the complexity of the model.
• Cross-validation: By dividing the dataset into testing and training sets several
times, cross-validation can identify if a model is underfitting or overfitting. This
can be used for reducing variance by tuning the hyperparameters.
• Simplifying the model: Decreasing the number of parameters of neural
network layers can help reduce the complexity of the model. This, in turn, helps
in reducing the variance of the model.
• Ensemble methods: Boosting, stacking and bagging are common ensemble
techniques that can help reduce the variance of an ML model and improve the
generalisation performance.
• Early stopping: This is a technique used for preventing overfitting by putting a
stop to the deep learning model training when the validation set performance
stops improving.
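In the spirit of the cross-validation point above, a minimal sketch assuming scikit-learn (a simple and a complex decision tree compared by validation score):

```python
# Sketch: using cross-validation to detect high variance (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

for depth in [2, None]:  # shallow (simpler) vs. unrestricted (more complex) tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()
    # A large gap between training and cross-validated accuracy signals
    # high variance (overfitting).
    print(f"max_depth={depth}: train={train_acc:.2f}, cv={cv_acc:.2f}")
```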
Bias Variance Tradeoff
 If the algorithm is too simple (a hypothesis with a linear equation),
it may have high bias and low variance, and thus be error-prone.
 If the algorithm fits too complex a model (a hypothesis with a
high-degree equation), it may have high variance and low bias.
 In the latter condition, the model will not perform well on new
entries. There is something between both of these conditions,
known as a trade-off, or the bias-variance trade-off.
 This tradeoff in complexity is why there is a tradeoff between
bias and variance: an algorithm can’t be more complex and less
complex at the same time.
Graphically, the four combinations look like this:
1. Low-Bias, Low-Variance:
This combination is the ideal machine learning model. However, it is
not practically possible.
2. Low-Bias, High-Variance: This is a case of overfitting, where
model predictions are inconsistent but accurate on average. The
predicted values will be accurate on average but scattered.
3. High-Bias, Low-Variance: This is a case of underfitting, where
predictions are consistent but inaccurate on average. The predicted
values will be inaccurate but not scattered.
4. High-Bias, High-Variance:
With high bias and high variance, predictions are inconsistent and
also inaccurate on average.
Various Combinations of Bias-Variance

 High bias, low variance: This type of model is said to be underfitting.
 High variance, low bias: This type of model is said to be overfitting.
 High bias, high variance: This model cannot capture underlying
patterns in the dataset (high bias) and is too sensitive to changes in
the training data (high variance). Due to this, the model will mostly
give inaccurate and inconsistent predictions.
 Low bias, low variance: This model captures the dataset’s underlying
patterns (low bias) and isn’t very sensitive to changes in the training
data (low variance). This ML model is ideal, as it can produce accurate
and consistent predictions. However, it is not possible in practice.
Difference between bias and variance:

 Bias occurs in a machine learning model when an algorithm is used
but does not fit properly; variance is the amount by which the target
function estimate will change if different training data is used.
 Bias is the difference between the actual values and the predicted
values; variance describes how much a random variable deviates from
its expected value.
 With high bias, the model cannot find patterns in the training dataset
and fails on both unseen and seen data; with high variance, the model
can find most patterns in the dataset, but it also learns from noise or
unnecessary data.
Fig.: Illustrations of high-bias and high-variance models. A
toy dataset was generated from the polynomial y = 5 + 0.1x
+ 0.1x² + 0.1x³ + 0.002x⁴ + random noise. The fits in
(a) and (b) are both parameterizations of a model. Each
model (line) in both fits has approximately the same error
but does not accurately capture the behavior of the data
due to poor model assumptions; in this case, fitting a first-
order polynomial to a dataset generated from a fourth-order
polynomial. This is an example of high-bias models. A
twentieth-order polynomial was fit to a subset of the full
dataset in (c), shown in blue. While the model has very good
predictive error for the training dataset, it will not extrapolate
well to the data in the testing set; this is overfitting, or high
variance. A third-order polynomial was fit to the data in (d),
demonstrating a good balance between bias and variance.
The model accurately captures trends in the data while not
overfitting to noise.
Bias → underfitting → high train and test error
Variance → overfitting → high test error
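A small numpy sketch in the spirit of the figure above, using the same toy polynomial (the noise level and point counts are illustrative assumptions):

```python
# Sketch: high bias vs. high variance via polynomial degree (numpy only).
import numpy as np

def target(x, rng):
    # The toy polynomial from the figure caption, plus random noise.
    return 5 + 0.1*x + 0.1*x**2 + 0.1*x**3 + 0.002*x**4 + rng.normal(0, 2, x.size)

rng = np.random.default_rng(0)
x_train = np.linspace(-4, 4, 30)
y_train = target(x_train, rng)
x_test = np.linspace(-4, 4, 100)
y_test = target(x_test, rng)

for degree in [1, 3, 20]:  # underfit, balanced, overfit (degree 20 may warn)
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}: train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```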
Thank you
Supervised Learning

Regression
• Numerical value analysis
• Example: predicting the age of a person

Classification
• Categorical or discrete data analysis
• Example: classifying e-mail as “spam” or “not spam”
• Example: predicting whether the stock price of a company will
increase tomorrow (yes/no)
Machine Learning Algorithm

 Linear Regression.
 Logistic Regression.
 Decision Tree.
 SVM.
 Naive Bayes.
 kNN.
 K-Means.
 Random Forest.
Parametric Methods

 Parametric methods are statistical techniques that rely on specific
assumptions about the underlying distribution of the population being
studied.
 These methods typically assume that the data follows a known
probability distribution, such as the normal distribution, and estimate
the parameters of this distribution using the available data.
 The basic idea behind parametric methods is that there is a set of
fixed parameters that determine a probability model, which is used in
machine learning as well.
 Parametric methods are those for which we know a priori that the
population is normal, or if not, we can easily approximate it using a
normal distribution, which is possible by invoking the Central Limit
Theorem.
Parameters for using the normal distribution are as follows:

 Mean

 Standard Deviation
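A minimal sketch of estimating these two parameters from a sample, assuming numpy and scipy (the sample is synthetic):

```python
# Sketch: estimating the parameters of a normal distribution
# (assumes numpy and scipy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=1000)  # synthetic data

# The parametric model is fully specified by its mean and standard deviation.
mu, sigma = stats.norm.fit(sample)
print(f"Estimated mean={mu:.2f}, std={sigma:.2f}")  # close to 10 and 2
```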
Assumptions for Parametric
Methods
Parametric methods require several assumptions
about the data:
 Normality: The data follows a normal (Gaussian)
distribution.
 Homogeneity of variance: The variance of the
population is the same across all groups.
 Independence: Observations are independent of
each other.
What are Parametric Methods?

Statistical Tests:

 t-test: Tests for the difference between the means of
two independent groups.
 ANOVA: Tests for the difference between the means of
three or more groups.
 F-test: Compares the variances of two groups.
 Chi-square test: Tests for relationships between
categorical variables.
 Correlation analysis: Measures the strength and
direction of the linear relationship between two
continuous variables.
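For instance, a two-sample t-test sketch, assuming scipy (the group samples are synthetic):

```python
# Sketch: two-sample t-test for a difference in group means (assumes scipy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(5.0, 1.0, 50)  # synthetic group samples
group_b = rng.normal(5.5, 1.0, 50)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t={t_stat:.2f}, p={p_value:.4f}")  # a small p suggests the means differ
```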
Machine Learning Models:
• Linear regression: Predicts a continuous outcome based on a
linear relationship with one or more independent variables.
• Logistic regression: Predicts a binary outcome (e.g., yes/no)
based on a set of independent variables.
• Naive Bayes: Classifies data points based on Bayes’ theorem
and assuming independence between features.
• Hidden Markov Models: Models sequential data with hidden
states and observable outputs.
Difference Between Parametric and Non-Parametric

 Parametric methods use a fixed number of parameters to build the
model; non-parametric methods use a flexible number of parameters
to build the model.
 Parametric analysis tests group means; non-parametric analysis tests
medians.
 Parametric methods are applicable only to variables; non-parametric
methods are applicable to both variables and attributes.
 Parametric methods make strong assumptions about the data;
non-parametric methods generally make fewer assumptions about the
data.
 Parametric methods require less data than non-parametric methods;
non-parametric methods require much more data than parametric
methods.
 Parametric methods assume a normal distribution; non-parametric
methods assume no particular distribution.
 Parametric methods handle interval or ratio data; non-parametric
methods handle original data.
 Results from parametric methods can be easily affected by outliers;
results from non-parametric methods are not seriously affected by
outliers.
 Parametric methods have more statistical power than non-parametric
methods; non-parametric methods have less statistical power than
parametric methods.
 As far as computation is concerned, parametric methods are faster;
non-parametric methods are computationally slower.
 Examples of parametric methods: logistic regression, naïve Bayes, etc.
Examples of non-parametric methods: KNN, decision trees, etc.
Thank you
