Unit 1 ML - Ver 2
Machine Learning
Subject code: AL3451
Regulations: 2021
[Figure: Traditional programming versus machine learning. Labels: Input Data, Program, Output (traditional programming); Input Data (Training Data, huge volume), Output, ML Algorithm (Program) (machine learning).]
Machine Learning (ML): An Introduction
• ML enables computers to learn from past data or past experience
without being explicitly programmed. ML depends heavily on models,
that is, algorithms or, in simple words, computer programs.
• ML Definitions:
The use and development of computer systems that are able to
learn and adapt without following explicit instructions, by using
algorithms and statistical models to analyse and draw inferences
from patterns in data.
With the help of sample historical data, known as training
data, machine learning algorithms build a mathematical model that
helps in making predictions or decisions without being explicitly
programmed.
• Example:
House Size (sq ft) | No. of Bedrooms | No. of Bathrooms | No. of years | Selling Price
2000               | 3               | 2                | 5            | 3 Lakhs
1500               | 2               | 1                | 7            | 2 Lakhs
2500               | 4               | 2                | 2            | 4 Lakhs
1200               | 2               | 1                | 9            | 1.5 Lakhs
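• A linear regression model might come up with an equation of this form:
$\text{Selling Price} = w_0 + (w_1 \times \text{House Size}) + (w_2 \times \text{No. of Bedrooms}) + (w_3 \times \text{No. of Bathrooms}) + (w_4 \times \text{No. of years})$
where $w_0, \dots, w_4$ are coefficients learned from the training data.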
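As a minimal sketch of how such an equation could be obtained from the example table, the snippet below fits a linear model with NumPy's least-squares solver. The variable names (`features`, `prices`, `weights`) and the use of an intercept term are assumptions for illustration, not part of the original slides. With only four rows and five unknowns the problem is underdetermined, so the result is one of many exact fits rather than a meaningful price model.

```python
import numpy as np

# Rows of the example table: size (sq ft), bedrooms, bathrooms, years.
features = np.array([
    [2000, 3, 2, 5],
    [1500, 2, 1, 7],
    [2500, 4, 2, 2],
    [1200, 2, 1, 9],
], dtype=float)
prices = np.array([3.0, 2.0, 4.0, 1.5])  # selling price in lakhs

# Prepend a column of ones so the first weight acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Least-squares fit. With 4 rows and 5 unknowns the system is
# underdetermined, so lstsq returns the minimum-norm exact fit.
weights, *_ = np.linalg.lstsq(X, prices, rcond=None)

# Predictions on the training rows reproduce the table prices.
predicted = X @ weights
print("weights:", weights)
print("predicted prices (lakhs):", predicted)
```

In practice many more rows than parameters would be needed before the learned weights say anything reliable about unseen houses.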
Supervised Learning:
• In supervised learning, a model is trained with a
labelled dataset.
• The algorithm is given input data together with the correct
output data, and its task is to find a mapping function that
maps each input to its output.
• Supervised learning is a fundamental paradigm in machine
learning where the model is trained on a labeled dataset.
• This means that for each input in the training set, the desired
output or "label" is provided.
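To make the idea of learning a mapping from labelled examples concrete, here is a small sketch using a 1-nearest-neighbour rule. The training points, the labels "A" and "B", and the function name `predict` are all hypothetical, and nearest neighbour is just one simple supervised method chosen for illustration.

```python
import math

# Hypothetical labelled training set: (input features, label).
training_data = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]

def predict(x):
    """Return the label of the training input closest to x."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], x))
    return label

# New, unlabelled inputs are mapped to the label of their nearest neighbour.
print(predict((1.2, 1.4)))
print(predict((5.5, 5.0)))
```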
• In all of these cases, the axis-aligned rectangle classifier realises
every possible labelling of the four data points, that is, it shatters
them, provided the points are suitably placed (for example in a diamond
arrangement; in particular, the 4 points must not be collinear).
Vapnik-Chervonenkis (VC) Dimension
• The VC dimension is used to reason about how well the training
accuracy of an ML model generalises.
• The VC dimension is a measure of the capacity or complexity
of a space of functions that can be learned by a
classification algorithm.
• The VC dimension is the maximum number of points that a model
can shatter, that is, classify correctly under every possible
labelling of those points.
Rectangle Classifier with Five data points
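• For any set of 5 data points, we can define an axis-aligned rectangle
bounded by the most extreme points (leftmost, rightmost, topmost and
bottommost).
• In the above cases, an axis-aligned rectangle classifier cannot
separate the points: any rectangle that contains the extreme points
also contains the remaining point, so the labelling with the extreme
points positive and the remaining point negative cannot be realised.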
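The four-point and five-point arguments can be checked by brute force. The sketch below assumes that points inside or on the rectangle are labelled 1 and all others 0; the function names and the two example point sets (a diamond, and the diamond plus its centre) are illustrative choices. A labelling is realisable exactly when the bounding box of the positive points contains no negative point.

```python
from itertools import product

def realisable(points, labels):
    """True if some axis-aligned rectangle contains exactly the points labelled 1."""
    positives = [p for p, lab in zip(points, labels) if lab == 1]
    if not positives:
        return True  # an empty rectangle labels every point 0
    x_min = min(x for x, _ in positives)
    x_max = max(x for x, _ in positives)
    y_min = min(y for _, y in positives)
    y_max = max(y for _, y in positives)
    # The bounding box of the positives is the smallest candidate rectangle;
    # the labelling fails if any negative point falls inside it.
    for (x, y), lab in zip(points, labels):
        if lab == 0 and x_min <= x <= x_max and y_min <= y <= y_max:
            return False
    return True

def shattered(points):
    """True if every one of the 2^N labellings is realisable."""
    return all(realisable(points, labels)
               for labels in product([0, 1], repeat=len(points)))

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
five_points = diamond + [(0, 0)]

print(shattered(diamond))      # all 16 labellings are realisable
print(shattered(five_points))  # centre negative, rest positive fails
```

This only tests one particular five-point set; the slide's argument is what shows that no five-point set can be shattered.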
VC Dimension: A Summary
• A data set containing N points can be labelled in $2^N$ ways.
• A linear classifier tries to find a line that can separate these two points based
on their classes. The hypothesis in this context can be represented by the
equation of the line:
$h(x) = w_0 + w_1 x_1 + w_2 x_2$
• For simplicity, let's consider a two-dimensional space (features $x_1$ and $x_2$) and two
data points that belong to either Class 0 or Class 1.
• Here, $x_1$ and $x_2$ are the features of the data points, and $w_0$, $w_1$ and $w_2$ are the parameters of
the model that define the decision boundary. The hypothesis predicts the class
of a data point based on the sign of the output:
Class 1 if $h(x) \ge 0$, Class 0 if $h(x) < 0$.
• For instance, a decision boundary might be given by the equation $x_2 = x_1 - 1$, which
can be rewritten in the form $w_0 + w_1 x_1 + w_2 x_2 = 0$ as
$-1 + x_1 - x_2 = 0$, with $w_0 = -1$, $w_1 = 1$, $w_2 = -1$.
• Imagine we have Point A (1, 2) in Class 0 and Point B (4, 2) in Class 1.
• For Point A, $h = -1 + 1 - 2 = -2 < 0$, so it would be classified as Class 0, and for
Point B, $h = -1 + 4 - 2 = 1 \ge 0$, so it would be classified as Class 1.
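The worked example can be reproduced in a few lines. This sketch assumes the decision boundary $x_2 = x_1 - 1$ and the parameter values -1, 1, -1 for the intercept and the two feature weights; the names `w0`, `w1`, `w2`, `hypothesis` and `classify` are assumptions made for illustration.

```python
def hypothesis(x1, x2, w0=-1.0, w1=1.0, w2=-1.0):
    """Signed output of the linear model w0 + w1*x1 + w2*x2."""
    return w0 + w1 * x1 + w2 * x2

def classify(x1, x2):
    """Class 1 if the output is non-negative, otherwise Class 0."""
    return 1 if hypothesis(x1, x2) >= 0 else 0

point_a = (1, 2)  # labelled Class 0 in the example
point_b = (4, 2)  # labelled Class 1 in the example

for name, point in [("A", point_a), ("B", point_b)]:
    print(name, hypothesis(*point), classify(*point))
```

Point A lies on the negative side of the line and Point B on the non-negative side, matching the class assignments in the text.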
• PAC learning offers insights into how well a learning algorithm can be
expected to perform given certain conditions, including the
complexity of the learning task, the amount of training data, and the
allowable error rate.
• Probably: Refers to the confidence (1 − δ, where δ is the allowed
probability of failure) with which we can expect the
learning algorithm to achieve its goal.