
SARANATHAN COLLEGE OF ENGINEERING

(AUTONOMOUS)

DEPARTMENT OF COMPUTER SCIENCE AND BUSINESS SYSTEMS

COURSE CODE-NAME: AL3451 - MACHINE LEARNING

PREPARED BY
S. SENTHIL, M.E., (Ph.D.)
AL3451-MACHINE LEARNING

COURSE OBJECTIVES

 To understand the basic concepts of machine learning.

 To understand and build supervised learning models.

 To understand and build unsupervised learning models.

 To evaluate the algorithms based on the corresponding identified metrics.


COURSE OUTCOMES

• At the end of the course, the students should be able to:

• CO1: Explain the basic concepts of machine learning.

• CO2 : Construct supervised learning models.

• CO3 : Construct unsupervised learning algorithms.

• CO4: Evaluate and compare different models


TEXT BOOKS

• 1. Ethem Alpaydin, “Introduction to Machine Learning”, Fourth Edition, MIT Press, 2020.

• 2. Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, Second Edition, CRC Press, 2014.


UNIT I
INTRODUCTION TO MACHINE LEARNING

TOPICS TO BE COVERED:
Review of Linear Algebra for machine learning
Introduction and motivation for machine learning
Examples of machine learning applications
Vapnik-Chervonenkis (VC) dimension
Probably Approximately Correct (PAC) learning
Hypothesis spaces
Inductive bias
Generalization
Bias variance trade-off.
1.1 Review of Linear Algebra for machine learning
• Linear algebra helps solve and compute with large, complex datasets through matrix decomposition techniques.

• Linear algebra also helps build better supervised and unsupervised machine learning algorithms.

• Linear transformations, matrices, and vector spaces play a significant role in defining and solving problems in machine learning.

• They are used to model, analyze, and optimize complex relationships within data.

• Statistics is also an important concept for organizing and integrating data in machine learning.
Data in linear algebra
• Linear algebra is used in machine learning for the representation of data.

• Scalar and vector

Scalar

• It is a physical quantity, just a single number.

• It has only magnitude, not direction.

• Example: 20 and 80
Contd..
• Vector

• It is a geometric object having both magnitude and direction.

• It is an ordered array of numbers, always arranged as a row or a column.

• It has just one index, which can refer to a particular value within the vector.

v = [e1, e2, e3, e4]

A vector is a 1-dimensional array with a magnitude and a direction.


Contd.
• Matrix

• It is a 2-dimensional array: a rectangular array of numbers arranged in rows and columns.

• A matrix whose main-diagonal elements are all 1 and whose other elements are 0 is an identity matrix.

• Example: A = [[1, 2, 3], [4, 5, 6]], a 2×3 matrix.
Contd..
• Tensor
• It is an array of numbers arranged on a regular grid, generalizing vectors and matrices to three or more dimensions.

Operations
• Scalar-vector multiplication: scalar 2, vector p = [p1, p2, p3]; then 2*p = [2*p1, 2*p2, 2*p3].
• Scalar-matrix multiplication: scalar 2; each element of the matrix is doubled in the same way.
Contd..
• Matrix operations

• Scalar-matrix multiplication

• Matrix-matrix addition

• Matrix-matrix subtraction

• Matrix-matrix multiplication

• Vector-matrix operations

• Vector-matrix multiplication

• Transpose

• Inverse
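
As a quick illustration, here is a minimal NumPy sketch of the data representations and operations above (assuming NumPy is installed; the values are arbitrary examples):

import numpy as np

s = 24                                # scalar: a single number
v = np.array([2, -6, 9])              # vector: 1-D array
A = np.array([[1, 2], [3, 4]])        # matrix: 2-D array
T = np.ones((2, 3, 4))                # tensor: array on a regular 3-D grid

print(2 * v)                          # scalar-vector multiplication -> [4 -12 18]
print(2 * A)                          # scalar-matrix multiplication
B = np.array([[5, 6], [7, 8]])
print(A + B)                          # matrix-matrix addition
print(A - B)                          # matrix-matrix subtraction
print(A @ B)                          # matrix-matrix multiplication
print(A @ np.array([2, -6]))          # matrix-vector multiplication
print(A.T)                            # transpose
print(np.linalg.inv(A))               # inverse (square, non-singular matrices only)
print(np.eye(3))                      # 3x3 identity matrix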
Contd..
• Scalar: 24
• Vector: [2, -6, 9] (as a row or a column)
• Matrix: a rectangular array such as [[1, 2], [3, 4]]
Vapnik-Chervonenkis (VC) dimension
• In the context of data analysis and machine learning algorithms, "dimensions" refers to the features or attributes of the data.

• Example data set: house price, size, number of bedrooms, location.

• If we add more dimensions to a dataset, the volume of the space increases and the data become sparse.

• 1D: points along a line

• 2D: points spread over an area

• 3D: points spread through a volume, so many more points are needed to cover it
Introduction and motivation for machine learning
Examples of machine learning applications
• Prerequisites:
• Basics of probability and linear algebra
• Python language
• Derivatives of single-variable and multivariate functions
• Machine learning algorithms enable computers to learn from data, and even improve themselves, without being explicitly programmed.
• ML allows software applications to become more accurate in predicting outcomes.
• The aim is to build algorithms that can receive input data and use statistical analysis to predict an output.
• Machine learning is concerned with computer programs that automatically improve their performance through experience.
• Machine learning is influencing our day-to-day lives and touches many people.
• Machine learning is an application of artificial intelligence that allows machines to learn from data.
Process of machine learning
• Data collection: data is collected from various sources such as files, databases, the internet, or mobile devices.
• Data preparation
• Data exploration: find correlations, general trends, and outliers.
• Data pre-processing: transform the data into a proper format to make it more suitable for analysis.
• Data cleansing: handle missing values, duplicate data, invalid data, and noise.
• Build model: classification, regression, cluster analysis, association.
• Train model: use datasets to train the model.
• Test model: check the accuracy of the model by providing a test dataset to it.
• Deploy model: deploy the model in the real system.
Kaggle is a popular source for machine learning datasets. A minimal end-to-end sketch of this workflow follows.
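
As an illustration only, here is a minimal sketch of the collect/split/train/test workflow using scikit-learn (assuming it is installed); the bundled iris dataset and logistic regression model are placeholder choices:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection / preparation: load a ready-made labeled dataset
X, y = load_iris(return_X_y=True)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Build and train a classification model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Test the model: accuracy on data it has never seen
print(accuracy_score(y_test, model.predict(X_test)))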
Classification of machine learning

• Supervised learning

• Unsupervised learning

• Reinforcement learning
Supervised learning
• Labeled data are provided to the machine learning system for
training, and the system then predicts the output based on the
training data.
• Provide input data as well as correct output data.
• Objective
• To learn the mapping from the input data to the output data
• Types
• Classification
• Regression
• Example:
• Spam filtering, image classification, fraud detection.
Contd..
Unsupervised learning
• The training is provided to the machine learning with the set of data
that has not been labelled, classified, or categorized and the
algorithm needs to act on that data without any supervision.
• In unsupervised learning, we don’t have a predetermined result.
• Objective
• The machine tries to find useful insights from the huge amount of
data
• Types
• Clustering
• Association
• Ex: recommender systems, targeted marketing. A minimal clustering sketch follows.
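
Purely as an illustration (assuming scikit-learn; the synthetic blob data is made up), a k-means clustering sketch that groups unlabeled points:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: 200 points around 3 hidden centers
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# The algorithm groups the data without any labels or supervision
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # discovered group centers
print(kmeans.labels_[:10])       # cluster assignments for the first 10 points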
Reinforcement learning
• Reinforcement learning is a feedback-based learning method where the agent interacts with the environment and explores it.
• It learns by interacting with its environment.
• The agent receives rewards for performing correctly and penalties for performing incorrectly (a toy sketch follows).
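
To make the reward/penalty loop concrete, here is a toy sketch of an agent learning by trial and error on a two-armed bandit; the environment, the reward probabilities, and the incremental-average update are illustrative assumptions, not part of the slides:

import random

# Environment: two actions with hidden average rewards
true_reward = {0: 0.3, 1: 0.7}

q = {0: 0.0, 1: 0.0}       # agent's running reward estimates
counts = {0: 0, 1: 0}

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known action
    action = random.randrange(2) if random.random() < 0.1 else max(q, key=q.get)
    # Reward of 1 (correct) or 0 (penalty) from the environment
    reward = 1 if random.random() < true_reward[action] else 0
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # incremental average

print(q)   # the estimates approach the true averages; action 1 wins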
Examples of machine learning applications
• Spam mail filtering: developing a system that sorts mail automatically
• YouTube recommendations: discovering knowledge from large datasets
• Face recognition: smartphone unlocking
• Handwriting recognition
• Image recognition
• Banks: fraud detection
• Hospitals: detecting and diagnosing diseases
• Companies: supply chain and inventory control
• Chatbots
• Self-driving cars
• Music composition
Contd..
• Film writing
• Gaming
• Robots
• Virtual personal assistants: Amazon Alexa, Google Assistant, Apple Siri
Problems
• Data sparsity: makes clustering and classification challenging

• Increased computation: more resources and time are needed

• Overfitting: reduces the model's ability to generalize to new data

• Euclidean distance: the difference in distance between data points tends to become negligible

• Performance degradation: algorithms such as k-nearest neighbors can drop in performance

• Visualization challenges: high-dimensional data is hard to visualize, making EDA more difficult

Solution
• In high-dimensional data, data points sit at the edges or corners of the space, making the data sparse.

• "High dimensionality" refers to the challenges and complications that arise when analyzing and organizing data in high-dimensional spaces (100-1000 dimensions).

• The remedy is "dimensionality reduction".

• It reduces the number of random variables under consideration by obtaining a set of principal variables.

• By this we can retain the most important information in the dataset while discarding the redundant or less important features.
Dimensionality reduction methods

• Principal component analysis (PCA)

• Linear discriminant analysis (LDA)

• t-distributed stochastic neighbor embedding (t-SNE)
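
A minimal PCA sketch (illustrative, assuming scikit-learn; the 64-dimensional digits dataset stands in for any high-dimensional data):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)    # 1797 samples, 64 features each

# Project the 64-dimensional data down to 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)       # (1797, 64) -> (1797, 2)
print(pca.explained_variance_ratio_)   # variance retained per component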
Vapnik-Chervonenkis (VC) dimension - contd..
• The Vapnik-Chervonenkis (VC) dimension is a measure of the size of a class of sets.
• The VC dimension is a measure of the capacity of a hypothesis set to fit different data sets.
• The VC dimension is a measure of the complexity of a machine learning model.
• It is used to guide the model selection process while developing machine learning applications.
• The VC dimension is the cardinality of the largest set of points that the algorithm can shatter.
Contd..
• Shattering is the ability of a model to classify a set of points perfectly, considering all possible combinations of labels on those points.
• The VC dimension of a model is the size of the largest set of points that the model can shatter.
• If the VC dimension = 2, the model can divide the points into two segments: two points in the dataset are shattered.
• r = 1 if x is a positive example; r = 0 if x is a negative example.
• A dataset containing N points can be labeled in 2^N ways as positive and negative.
• For each such labeling, we try to find a hypothesis h ∈ H that separates the positives from the negatives (a small sketch follows).
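
As a toy illustration (not from the slides): for 1-D threshold classifiers h_t(x) = 1 iff x >= t, we can check by enumeration which point sets are shattered.

from itertools import product

def predictions(points, t):
    # hypothesis h_t: label 1 if x >= t, else 0
    return tuple(1 if x >= t else 0 for x in points)

def shattered(points):
    pts = sorted(points)
    # one threshold below all points, one between each adjacent pair, one above all
    thresholds = [pts[0] - 1.0]
    thresholds += [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    thresholds += [pts[-1] + 1.0]
    achievable = {predictions(points, t) for t in thresholds}
    # shattered means every one of the 2^N labelings is achievable
    return all(lab in achievable for lab in product([0, 1], repeat=len(points)))

print(shattered([3.0]))       # True:  any single point is shattered
print(shattered([1.0, 2.0]))  # False: the labeling (1, 0) is impossible
# So the VC dimension of 1-D threshold classifiers is 1.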
Contd..
• The maximum number of points that can be shattered by H is called the Vapnik-Chervonenkis (VC) dimension of H.

• The VC dimension of H is denoted VC(H); it measures the capacity of H.

• Capacity: the ability of a machine learning algorithm to learn from a given dataset.

• Accuracy: the ability to correctly identify labels for a given dataset.

• The VC dimension acts as a guiding light in model selection.

• Capacity of a classification model = complexity of the classification model.
Contd.
• E.g., after enough examples the learner will have learned the correct concept.

• "Correct" means it agrees with the target concept on the labels for all data.
Probably Approximately Correct (PAC) learning
• The goal of PAC learning is to find a hypothesis that performs well on unseen examples, given a sample of labeled data for training.

• The training data is drawn independently and identically from an unknown probability distribution.

Applications of PAC learning:

• Supervised learning

• Sample selection

• Model selection & evaluation

• Active learning

• Computational learning theory
PAC Learning
• PAC learning provides a theoretical framework for understanding sample complexity, generalization, performance, and guarantees in learning.

• PAC learning plays a role in shaping the design, evaluation, and analysis of machine learning algorithms. The key quantities are:

• Probability of successful learning

• Number of training examples

• Complexity of the hypothesis space

• Accuracy to which the target function is approximated

• Manner in which training examples are presented
Contd..
• Instances X (set of objects in the world)

• Target concept C (subset of the instance space)

• Hypotheses H (collection of concepts over X)

• Training data D (examples from the instance space)
Contd.
• PAC learning is used to analyze the efficiency of machine learning algorithms.

• The goal of PAC learning is to design algorithms that can learn a target concept with high probability and accuracy, given a finite amount of labeled training data.

• Hypothesis class (H): the set of possible hypotheses or classifiers available to the learning algorithm.

• Concept class (C): the set of all possible target concepts. The goal is for the learning algorithm to output a hypothesis that approximates the true concept.

• A concept is a function that maps instances to binary labels (0 or 1).
Contd.
• Sample complexity: the number of labeled examples needed for the learning algorithm to output a hypothesis that is probably approximately correct with high probability.
• Error & confidence:
Error (ε): the maximum allowable error rate for the learned hypothesis. The hypothesis is considered correct if its error is less than ε.
Confidence (δ): the desired confidence level. It represents the probability that the hypothesis is probably approximately correct.
A learning algorithm is PAC if, for any ε > 0 and δ > 0, with probability at least 1 − δ (the confidence level), the algorithm outputs a hypothesis h such that the error of h is at most ε.
• Theoretical results in PAC learning provide bounds on the number of training examples needed to achieve a certain level of confidence and error (a small numerical sketch follows).
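
For a finite hypothesis space and a consistent learner, a standard bound states that m >= (1/ε)(ln|H| + ln(1/δ)) examples suffice. A quick computation (the |H| value below is an arbitrary example):

import math

def pac_sample_bound(H_size, epsilon, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice for a
    # consistent learner over a finite hypothesis space
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

# e.g. |H| = 1000 hypotheses, error at most 5%, confidence 95%
print(pac_sample_bound(1000, epsilon=0.05, delta=0.05))  # 199 examples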

• Suppose we are doing classification with categorical inputs, where all inputs and outputs are binary (e.g., gender).

• There is a machine f(x, h) with a hypothesis parameter h; the candidate hypotheses are h1, h2, h3, ….

• Example hypothesis: citizenship, expressed as a conjunction such as x1 ∧ x2.

• If there are 3 binary attributes, the complete set of boolean functions over them contains 2^(2^3) = 256 hypotheses; restricting f to conjunctions gives a much smaller hypothesis space.

Contd.
• In PAC learning we are given a concept class c, an error rate (ε), and a confidence level (δ), and the goal is to achieve a low error rate with high confidence.

• Noise tolerance:

• PAC learning often assumes that the training data may contain some amount of noise or errors.

• Noise tolerance in PAC refers to the ability of a learning algorithm to still learn the underlying concept in the presence of such noise.
Contd.
• PAC learning provides a rigorous theoretical framework for analyzing
the performance of learning algorithms in terms of their ability to
generalize from limited data, control error rates, and achieve high
confidence in their predictions.
Hypothesis spaces
• The hypothesis space H is defined as the set of all possible legal hypotheses.

• H is used by supervised machine learning algorithms to determine the best possible hypothesis to describe the target function.

• H is constrained by the choice of framing of the problem and the choice of model.

• Learning concludes with a single hypothesis h that maps inputs to proper outputs.

• This hypothesis can be evaluated to make predictions.

• Example: y = mx + c
Contd..
• Where,

• y: range

• m: slope of the line that divides the data

• x: domain

• c: intercept (constant)

• Ex: we can understand the hypothesis (h) and the hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data.
Contd..
• H is the combination of all legal, best possible ways to divide the coordinate plane so that it best maps inputs to proper outputs (a small sketch follows).
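
As a toy illustration (the data and candidate lines are made up): each (m, c) pair is one hypothesis h in the hypothesis space H, and learning picks the hypothesis with the lowest error on the data.

# Toy data roughly following y = 2x + 1
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]

# A tiny hypothesis space H: each (m, c) pair is one hypothesis y = m*x + c
H = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]

def squared_error(m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in data)

# Search H for the single best hypothesis h
best = min(H, key=lambda h: squared_error(*h))
print(best)   # (2.0, 1.0) fits this data best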
Inductive bias
• (Statistical) bias is the average squared difference between predictions and true values; it is a measure of how well your model fits the data.
• Zero bias would mean that the model captures the true data-generating process perfectly: both your training and validation loss would go to zero. That is unrealistic.
• Inductive bias
• Every machine learning model requires some type of architecture design and some initial assumptions about the data in order to analyze it.
• Every belief that we make about the data is a form of inductive bias.
• Inductive biases play a role in the ability of machine learning models to generalize to unseen data.
Contd..
• Given a training dataset, we need some additional constraints or criteria to help us better fit the training samples, so that the trained model can make better predictions on unseen samples.
• These additional constraints or criteria are called the inductive bias.
• In traditional machine learning, every algorithm has its own inductive biases.
• Inductive bias refers to the set of assumptions that a learning algorithm makes to predict outputs for inputs it has never seen.
• It is the bias of a model towards making a particular kind of assumption in order to generalize from its training data to unseen situations.
Importance of inductive bias
• Learning from limited data: inductive bias helps models generalize to unseen data based on the assumptions they carry.

• Guiding learning: given a dataset, there can be countless hypotheses that fit the data. Inductive bias helps the algorithm choose one plausible hypothesis.

• Preventing overfitting: a model with no bias might fit the training data perfectly, capturing every minute detail, including noise. An inductive bias can prevent a model from overfitting by making it favor simpler hypotheses.
Types of inductive bias
• Preference bias: expresses a preference for some hypotheses over others. For example, in decision tree algorithms like ID3, the preference is for shorter trees over longer trees.

• Restriction bias: restricts the set of hypotheses considered by the algorithm. For instance, a linear regression algorithm restricts its hypotheses to linear relationships between variables.
Examples of inductive bias
• Decision trees: a bias towards shorter trees and splits that categorize the data most distinctly at each level.

• k-NN: the algorithm assumes that instances that are close to each other in the feature space have similar outputs.

• Neural networks: they have a bias towards smooth functions; the architecture itself (number of layers, number of neurons) can also impose bias.

• Linear regression: assumes a linear relationship between the input features and the output.
Generalization
• Generalization refers to your model's ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model.

• Supervised learning refers to a way for the model to learn and understand data; based on the training data, the model learns to make predictions.

• Generalization examines how well a model can digest new data and make accurate predictions.
Contd..
• A model's ability to generalize is the key to its success.
• If you train a model too well on the training data, it will be incapable of generalizing. In such cases, it will end up making erroneous predictions when it is given new data, which makes the model ineffective even though it is capable of making correct predictions for the training dataset. This is known as overfitting.
• The inverse (underfitting) happens when you train a model with inadequate data. In the case of underfitting, your model would fail to make accurate predictions even on the training data, which makes the model just as useless as an overfitted one.
• Generalization is a measure of how well your model performs at predicting unseen data, so it is important to come up with the best-generalized model to give better performance against future data.
Contd..
• Let us first understand what underfitting and overfitting are, and then see the best practices to train a generalized model.
What is underfitting?
• Underfitting is a state where the model cannot fit itself to the training data, and is also not able to generalize to new data.

• You can notice it with the help of the loss function during training.

• A simple rule of thumb: if both the training loss and the cross-validation loss are high, then your model is underfitting.

• Lack of data, not enough features, lack of variance in the training data, or a high regularization rate can cause underfitting.

• A simple solution is to add more shuffled data to your training.

• Increasing the training data tends to reduce the test error.
What is overfitting?
• Overfitting is a situation where your model force-learns the whole variance in the data.
• Experts describe it as the model starting to memorize all the noise instead of learning.
• A simple rule of thumb to identify overfitting: if your training loss is low and your cross-validation loss is high, then your model is overfitting (see the sketch below).
• Unclean data, training for too many steps, or higher complexity of the model can cause overfitting.
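
A minimal sketch of that rule of thumb (illustrative, assuming scikit-learn and NumPy; the noisy sine data and the deliberately overcomplex degree-15 polynomial are made-up choices):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)   # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

for degree in (1, 3, 15):   # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training loss
          mean_squared_error(y_te, model.predict(X_te)))   # validation loss
# degree 15: training loss near zero, validation loss high -> overfitting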
Best practices to get a generalized model
• Split the data into training and testing sets; the testing set is used to cross-validate the trained model.
• It is always good to ensure that the distribution across all the datasets is the same.
• To achieve this goal, you can track the performance of a machine learning algorithm over time as it works with a set of training data.
• We can plot both the skill on the training data and the skill on the testing dataset that was held back from the training process.
• Training the model for too long would cause a continual decrease in the error on the training dataset due to overfitting.
• At the same time, due to the model's decreasing ability to generalize, the error on the test set would start to increase again.
Contd..
• Regularization is a method to avoid high variance and overfitting and to increase generalization.
• Without getting into details, regularization aims to keep the coefficients close to zero.
• Low bias: the model makes fewer assumptions about the form of the target function.
• High bias: a model with high bias makes more assumptions and cannot capture the important features of our dataset. A high-bias model also cannot perform well on new data.
• Variance tells how much a random variable differs from its expected value.
• Low variance means there is a small variation in the prediction of the target function with changes in the training data set.
Contd..
• High variance shows a large variation in the prediction of the target function with changes in the training dataset.

• Ways to reduce high variance:

• Reduce the number of input features or parameters when a model is overfitted.

• Do not use an overly complex model.

• Increase the training data.

• Increase the regularization term.
Bias-variance trade-off
• While building a machine learning model, it is really important to take care of bias and variance in order to avoid overfitting and underfitting in the model.

• If the model is very simple, with few parameters, it may have low variance and high bias.

• Whereas if the model is complex and has a large number of parameters, it will have high variance and low bias.

• So it is required to strike a balance between the bias and variance errors; this balance between the bias error and the variance error is known as the bias-variance trade-off.
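
For reference, the standard decomposition of the expected squared error at a point x (a known result, written in LaTeX; \sigma^2 denotes irreducible noise):

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} + \sigma^2$$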
Properties of inductive bias
• The strength of an inductive bias describes how much it limits the size of the hypothesis space that the learner can search.
• A strong inductive bias gives the learner a relatively small search space, while a weak inductive bias provides a broader search space.
• How to measure it? VC dimension theory.
• Correctness: only a correct inductive bias can ensure that the learner successfully learns the target concept.
• Conversely, under an incorrect inductive bias, the learner cannot learn the correct target no matter how many training samples are used.
• How to measure it? PAC learning theory.

• Trade-off:
• While inductive bias helps models generalize from training data, there is a trade-off.
• A strong inductive bias means the model might not be flexible enough to capture all patterns in the data.
• On the other hand, a bias that is too weak could lead the model to overfit the training data.
• Inductive bias is the "background knowledge" or set of assumptions that guides a machine learning algorithm.
• It is essential for generalization, especially when the training data is noisy.
• However, choosing the right type and amount of inductive bias for a particular problem is an art and is crucial for the success of the model.
Variance
• A model is said to have high variance if its predictions are sensitive to small changes in the input.
• When a model does not perform as well on new data as it does on the training data set, there is a possibility the model has high variance.
• Variance basically tells how scattered the predicted values are from the actual values.
• Bias: error on the training data.
• Variance: error on the test data.
• Overfitting: a statistical model is said to be overfitted when it fits the training data more closely than necessary. Training data accuracy is high and test data accuracy is low.
• Underfitting: in order to avoid overfitting, we could stop the training at an earlier stage, but stopping too early leads to underfitting.
• Training data accuracy is low and test data accuracy is low. Underfitting implies that the model still has capacity to learn, so you would simply train for more iterations or collect more data.
Bias-variance trade-off
• It is a standalone theory that provides a different perspective on generalization.

• The bias-variance trade-off in machine learning involves a trade-off between approximation and generalization, aiming to minimize the error in learning.

• The bias term quantifies how well the best hypothesis performs in approximating the target function, taking into account the overall ability of the hypothesis set to approximate the function.

• The decomposition of the out-of-sample error into approximation and generalization components can help understand the behaviour of the hypothesis and its performance on different data sets.
Contd..
• The variance term in the bias-variance trade-off arises from the fact that we only have access to one dataset at a time, resulting in different outcomes for each dataset.
• The trade-off can be measured by comparing the squared difference between the predicted values and their expected value, which is the variance, with the difference between the expected predictions and the true values, which is the bias.
• Increasing the size of the hypothesis set reduces bias but increases variance, while decreasing the size of the hypothesis set increases bias but reduces variance.
• The bias-variance trade-off highlights the importance of finding the right balance between model complexity and data resources in a learning situation.
• Overfitting occurs when a complex model with many degrees of freedom fits the training set perfectly but fails to generalize well, resulting in a high out-of-sample error and no real learning.
• Ensemble learning methods, such as bagging, rely on the concept of reducing variance by averaging multiple models or predictions, leading to improved performance (a tiny simulation follows).
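
A tiny NumPy simulation of that last point (illustrative only; the quadratic data, linear models, and ensemble size are made up): averaging the predictions of several independently trained models lowers the variance of the final prediction.

import numpy as np

rng = np.random.default_rng(0)
x0 = 0.5                               # point at which we predict y = x^2

def fit_and_predict():
    # train a straight line on a fresh noisy sample (one "dataset")
    x = rng.uniform(0, 1, 20)
    y = x ** 2 + rng.normal(0, 0.1, 20)
    m, c = np.polyfit(x, y, 1)
    return m * x0 + c

# variance of a single model's prediction vs. an average of 10 models
single = np.array([fit_and_predict() for _ in range(500)])
bagged = np.array([np.mean([fit_and_predict() for _ in range(10)])
                   for _ in range(500)])
print(single.var(), bagged.var())      # the averaged prediction varies ~10x less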
