Module-4 ML Landscape

Introduction: Machine Learning Landscape (Module 4)

Sushma B.
Assistant Professor
Dept. of Electronics & Communication Engineering
CMRIT, Bangalore



Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
A computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself.
A well-defined learning problem has three features: the class of tasks, the measure of performance to be improved, and the source of experience.



A checkers learning problem:
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E: playing practice games against itself
A handwriting recognition learning problem:
- Task T: recognizing and classifying handwritten words within images
- Performance measure P: percent of words correctly classified
- Training experience E: a database of handwritten words with given classifications
A robot driving learning problem:
- Task T: driving on public four-lane highways using vision sensors
- Performance measure P: average distance travelled before an error
- Training experience E: a sequence of images and steering commands recorded while observing a human driver



Examples of machine learning

Learning to recognize spoken words.
Learning to drive an autonomous vehicle: machine learning methods have been used to train computer-controlled vehicles to steer correctly.
Learning to classify new astronomical structures: machine learning methods have been applied to a variety of large databases to learn general regularities implicit in the data.



Designing learning system
Steps
Choosing the Training Experience
Choosing the Target Function
Choosing a Representation for the Target Function
Choosing a Function Approximation Algorithm
The Final Design



1. Choosing the Training Experience:
The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm.
Training data or experience should be chosen wisely: the data or experience we feed to the algorithm has a significant impact on the success or failure of the model.
Attributes that determine whether a training experience is adequate:
- Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
- The degree to which the learner controls the sequence of training examples (e.g., training on the same data repeatedly until it gains the required experience).
- How well the training experience represents the distribution of examples over which the final system performance P must be measured. The algorithm gains experience by working through many different cases and examples.



A checkers learning problem:
- Task T: playing checkers
- Performance measure P: percent of games won in the world tournament
- Training experience E: games played against itself
In order to complete the design of the learning system, we must now choose:
1. the exact type of knowledge to be learned,
2. a representation for this target knowledge, and
3. a learning mechanism.



Choosing the Target Function
In machine learning, the target function is the method an algorithm uses to solve a problem by parsing training data; it is the formula the algorithm uses to calculate predictions.
The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.
Based on the knowledge fed to the algorithm, the learner chooses a NextMove function that describes which type of legal move should be taken.
For example: while playing chess against an opponent, after the opponent moves, the machine learning algorithm must predict which of the possible legal moves should be taken in order to win.



Choosing a Representation for the Target Function
Once the algorithm knows all possible legal moves, the next step is to choose an optimal move, which requires choosing a representation for the target function.
For any given board state, the function V will be calculated as a linear combination of the following board features:
- x1: the number of black pieces on the board
- x2: the number of red pieces on the board
- x3: the number of black kings on the board
- x4: the number of red kings on the board
- x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
- x6: the number of red pieces threatened by black
V(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
w0 through w6 are numerical coefficients, or weights, to be chosen by
the learning algorithm. The weight w0 will provide an additive
constant to the board value.
Learned values for the weights w1 through w6 will determine the
relative importance of the various board features in determining the
value of the board
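
A minimal Python sketch of how this linear evaluation function could be computed (the weights and board features below are illustrative values, not learned ones):

# Linear checkers evaluation V(b) = w0 + w1*x1 + ... + w6*x6
def evaluate_board(features, weights):
    # features: (x1, ..., x6) board features; weights: (w0, w1, ..., w6)
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

weights = (0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5)   # illustrative values only
features = (12, 12, 0, 0, 0, 0)                    # e.g., an opening position
print(evaluate_board(features, weights))           # prints 0.5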
Choosing a Function Approximation Algorithm
In order to learn the target function f we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
Each training example is an ordered pair of the form (b, Vtrain(b)).
Function approximation procedure:
- Derive training examples from the indirect training experience available to the learner (indirect training experience is information or feedback that isn't directly formatted as labeled training examples)
- Adjust the weights wi to best fit these training examples
- Estimating training values
- Adjusting the weights



Estimating training values
One approach is to assign the training value Vtrain(b) for any intermediate board state b to be V̂(successor(b)), the learned value of the board state that follows b.
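
A hedged sketch of the LMS weight-update rule used in this design (the learning rate eta and the sample numbers are illustrative assumptions; the training value would come from V̂(successor(b)) as described above):

# One LMS step: w_i <- w_i + eta * (Vtrain(b) - Vhat(b)) * x_i
def lms_update(weights, features, v_train, eta=0.01):
    x = (1.0,) + tuple(features)             # x0 = 1 pairs with the bias weight w0
    v_hat = sum(w * xi for w, xi in zip(weights, x))
    error = v_train - v_hat                  # training value minus current prediction
    return tuple(w + eta * error * xi for w, xi in zip(weights, x))

weights = (0.0,) * 7
weights = lms_update(weights, (12, 12, 0, 0, 0, 0), v_train=100.0)
print(weights)                               # weights nudged toward the training value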



Final Design
The final design emerges once the system has worked through many examples: failures and successes, correct and incorrect decisions.




Performance System
The Performance System is the module that must solve the given
performance task, in this case playing checkers, by using the learned
target function(s).
It takes an instance of a new problem (new game) as input and
produces a trace of its solution (game history) as output.
The next move at each step is selected based on the learned V̂ evaluation function.
With experience we expect the performance to improve, as the evaluation function becomes increasingly accurate.



The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.
The Generalizer takes the training examples as input and produces an output hypothesis: its estimate of the target function, mapping inputs to desired outputs.
A hypothesis in machine learning refers to the model or function that a learning algorithm generates to approximate the target function.
The Experiment Generator takes as input the current hypothesis (the currently learned function) and outputs a new problem (i.e., an initial board state) for the Performance System to explore.

Perspectives and issues in machine learning

Perspectives
The evolution of ML algorithms, such as deep learning and
reinforcement learning, has led to state-of-the-art performance in
tasks like image recognition, speech processing, and game-playing.
Automating decision-making processes
The ability to process large datasets allows ML systems to uncover
patterns and insights that are otherwise impossible to detect
manually, enabling data-driven decision-making.



Issues in machine learning
Machine learning involves searching a very large space of possible hypotheses to determine the one that best fits the observed data and any prior knowledge held by the learner.
Data plays a significant role in the machine learning process. One of
the significant issues that machine learning professionals face is the
absence of good quality data.
Many ML models, especially deep learning systems, function as black
boxes, making it difficult to interpret how decisions are made. This
lack of transparency raises concerns in high-stakes applications like
healthcare and law.
ML models can overfit training data, failing to generalize to unseen
data. Striking the right balance between model complexity and
performance is essential.



What is machine learning?

Machine Learning is the science (and art) of programming computers so they can learn from data.
Arthur Samuel, 1959: Machine Learning is the field of study that gives
computers the ability to learn without being explicitly programmed.
Tom Mitchell, 1997: A computer program is said to learn from
experience E with respect to some task T and some performance
measure P, if its performance on T, as measured by P, improves with
experience E.



The examples that the system uses to learn are called the training set.
Each training example is called a training instance (or sample).
Spam filtering: Task T is to flag spam for new emails, the experience
E is the training data.
The performance measure P : The ratio of correctly classified emails.
This particular performance measure is called accuracy and it is often
used in classification tasks.



Why Use Machine Learning?

Figure: Traditional approach

Machine learning is useful for:
Problems for which existing solutions require a lot of hand-tuning or long lists of rules: a Machine Learning algorithm can often simplify code and perform better.
Complex problems for which there is no good solution at all using a
traditional approach: the best Machine Learning techniques can find
a solution.
Fluctuating environments: a Machine Learning system can adapt to
new data.
Getting insights about complex problems and large amounts of data.



Types of Machine Learning Systems

Classification based on
Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
Whether or not they can learn incrementally on the fly (online versus
batch learning)
Whether they work by simply comparing new data points to known
data points, or instead detect patterns in the training data and build
a predictive model.
Supervised/unsupervised learning
Batch or online based learning
Model or instance based learning



Supervised/Unsupervised Learning

Machine Learning systems can be classified according to the amount and type of supervision they get during training.
Four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.
Supervised learning
- In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
- A typical supervised learning task is classification. The spam filter is a good example: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
- Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression.

In Machine Learning an attribute is a data type (e.g., "Mileage"), while a feature generally means an attribute plus its value, though the term has several meanings depending on the context.
Supervised learning algorithms: k-Nearest Neighbors, Linear
Regression, Logistic Regression, Support Vector Machines (SVMs),
Decision Trees and Random Forests, Neural networks
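
As an illustration, a minimal supervised-learning sketch (assuming the scikit-learn API) with one of the listed algorithms, k-Nearest Neighbors, on a bundled toy dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)              # labeled data: features X, labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)                      # learn from labeled examples
print(clf.score(X_test, y_test))               # accuracy on unseen instances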



Unsupervised learning
Training data is unlabelled. Model tries to learn without supervision.
Clustering: i) K-Means, ii) DBSCAN, iii) Hierarchical Cluster Analysis
(HCA)
Visualization and dimensionality reduction: Principal Component
Analysis (PCA), t-distributed Stochastic Neighbor Embedding
(t-SNE)
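
A short clustering sketch (assuming the scikit-learn API): K-Means groups unlabeled points into k clusters with no supervision; the two-blob data here is synthetic:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # blob around (0, 0)
               rng.normal(5, 1, (50, 2))])     # blob around (5, 5)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)                 # one center near each blob
print(kmeans.labels_[:10])                     # cluster assignment per point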



A related task is dimensionality reduction, in which the goal is to
simplify the data without losing too much information.
One way to do this is to merge several correlated features into one.
For example, a car’s mileage may be very correlated with its age, so
the dimensionality reduction algorithm will merge them into one
feature that represents the car’s wear and tear. This is called feature
extraction.
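
A brief feature-extraction sketch with PCA (assuming the scikit-learn API); the age/mileage data is synthetic and mirrors the car example above:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
age = rng.uniform(0, 15, 200)
mileage = 12_000 * age + rng.normal(0, 5_000, 200)   # strongly correlated with age
X = np.column_stack([age, mileage])

pca = PCA(n_components=1)                  # merge both into one "wear and tear" feature
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)       # close to 1.0: little information lost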



Anomaly detection

Detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm.
The system is shown mostly normal instances during training, so it learns to recognize them; when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly.
Association rule learning: the goal is to dig into large amounts of data and discover interesting relations between attributes.
For example, suppose you own a supermarket. Running an association
rule on your sales logs may reveal that people who purchase barbecue
sauce and potato chips also tend to buy steak. Thus, you may want
to place these items close to each other.



Semi-supervised learning

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data. This is called semisupervised learning.
Semisupervised learning algorithms are combinations of unsupervised
and supervised algorithms.
Train on labelled data, then use predictions on unlabelled data to
create new labelled points. These new points are added to the
training data, and the model is retrained iteratively.
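
A hedged self-training sketch of this loop (assuming the scikit-learn API). Real semisupervised methods usually keep only confident pseudo-labels; this simplification pseudo-labels everything:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
labeled = np.arange(len(y)) % 15 == 0          # pretend only ~7% of labels are known
X_lab, y_lab = X[labeled], y[labeled]
X_unlab = X[~labeled]

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
for _ in range(3):                             # a few self-training rounds
    pseudo = clf.predict(X_unlab)              # pseudo-label the unlabeled points
    clf = LogisticRegression(max_iter=1000).fit(
        np.vstack([X_lab, X_unlab]), np.concatenate([y_lab, pseudo]))
print(clf.score(X, y))                         # accuracy against all true labels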



Reinforcement Learning

The learning system, called an agent in this context, can observe the
environment, select and perform actions, and get rewards in return.
The agent learns a strategy, called a policy.
A policy defines what action the agent should choose when it is in a given situation.



2. Batch and Online Learning

i) Batch learning
In batch learning, the system is incapable of learning incrementally: it
must be trained using all the available data.
This will generally take a lot of time and computing resources, so it is
typically done offline.
First the system is trained, and then it is launched into production
and runs without learning anymore; it just applies what it has learned.
This is called offline learning.



Online Learning
In online learning, you train the system incrementally by feeding it
data instances sequentially, either individually or by small groups
called mini-batches.
Each learning step is fast and cheap, so the system can learn about
new data on the fly, as it arrives
Online learning is great for systems that receive data as a continuous
flow and need to adapt to change rapidly or autonomously.
Useful when there are limited computing resources:
Online learning algorithms can also be used to train systems on huge
datasets that cannot fit in one machine’s main memory (this is called
out-of-core learning).
The algorithm loads part of the data, runs a training step on that
data, and repeats the process until it has run on all of the data
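
An out-of-core / online-learning sketch (assuming scikit-learn's partial_fit API); the streamed mini-batches are synthetic stand-ins for chunks loaded from disk:

import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(random_state=42)
rng = np.random.default_rng(42)
true_coef = np.array([1.0, -2.0, 0.5])

for step in range(200):                         # each iteration = one mini-batch
    X_batch = rng.normal(size=(32, 3))
    y_batch = X_batch @ true_coef + rng.normal(0, 0.1, 32)
    model.partial_fit(X_batch, y_batch)         # one fast, cheap learning step

print(model.coef_)                              # approaches [1.0, -2.0, 0.5]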



Instance-Based Versus Model-Based Learning

ML systems can also be categorized by how they generalize.
Possibly the most trivial form of learning is simply to learn by heart.
Instance-based learning: the system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure.



Model-based learning
Build a model from a set of examples, then use that model to make predictions.

life_satisfaction = θ0 + θ1 × GDP_per_capita
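
A model-based learning sketch of this very model (assuming the scikit-learn API); the GDP and satisfaction numbers below are illustrative stand-ins, not real OECD figures:

import numpy as np
from sklearn.linear_model import LinearRegression

gdp_per_capita = np.array([[9_000], [22_000], [37_000], [50_000], [60_000]])
life_satisfaction = np.array([4.9, 5.8, 6.5, 7.0, 7.3])

model = LinearRegression().fit(gdp_per_capita, life_satisfaction)
print(model.intercept_, model.coef_[0])        # theta0 and theta1
print(model.predict([[31_000]]))               # prediction for a new country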

Which values are better?
Use a performance measure: a utility function (fitness function) or a cost function.
Utility function: measures how good the model is.
Cost function: measures how bad the model is.
Linear regression tasks: use a cost function that measures the distance between the linear model's predictions and the training examples; the objective is to minimize this distance.
Linear regression: feed in the training samples and find the parameter values that make the linear model best fit your data. This is called training the model.

Main Challenges of Machine Learning

Insufficient Quantity of Training Data


Non-representative Training Data: non-representative training data occurs when a training dataset doesn't accurately represent the real-world distribution of data. This can lead to models that perform well on the training data but don't generalize to new data.
- Sampling bias: a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.
Poor quality data: if training data is full of errors, outliers, and noise, it will be harder for the system to detect the underlying patterns, so your system is less likely to perform well. Solutions: clean the data; discard outliers; and for samples with missing features, either ignore them, fill in the missing values, or train the model both with and without the affected features.



Irrelevant features: the system is capable of learning only if the training data contains enough relevant features and not too many irrelevant ones.
A critical part of the success of a Machine Learning project is coming up with a good set of features to train on. This process, called feature engineering, involves:
- Feature selection: selecting the most useful features to train on among existing features.
- Feature extraction: combining existing features to produce a more useful one.
- Creating new features by gathering new data.



Overfitting the Training Data: the model performs well on the training data, but it does not generalize well.
The amount of regularization to apply during learning can be controlled by a hyperparameter.
A hyperparameter is a parameter of the learning algorithm itself, not of the model; it is set before training and remains constant during training.



Underfitting the Training Data
The model is too simple to learn the underlying structure of the data. Underfitting occurs when the data is too complex for a simple model; predictions will be inaccurate even on the training samples.
The main options to fix this problem are:
- Selecting a more powerful model, with more parameters
- Feeding better features to the learning algorithm (feature engineering)
- Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)



Summary

Machine Learning is about making machines get better at some task by learning from data, instead of having to explicitly code rules.
Different types of ML systems: supervised or not, batch or online, instance-based or model-based, and so on.
In ML we need to gather data and feed it to the model for training. If the algorithm is model-based, it tunes some parameters to fit the model to the training set.
If the algorithm is instance-based, it just learns the examples by heart and generalizes to new instances by comparing them to the learned instances using a similarity measure.
The system will not perform well if your training set is too small, or if the data is not representative, noisy, or polluted with irrelevant features.
Your model needs to be neither too simple (in which case it will underfit) nor too complex (in which case it will overfit).
Testing and Validating

Split the data into two sets: the training set and the test set.
The error rate on new cases is called the generalization error (or out-of-sample error), and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.
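
A short sketch of this procedure (assuming the scikit-learn API):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)      # hold out 20% as the test set

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(1 - test_accuracy)                       # estimate of the generalization error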



Hyper-parameter Tuning

Hyperparameter tuning is the process of selecting the optimal values for a machine learning model's hyperparameters.
Hyperparameters are settings that control the learning process of the
model, such as the learning rate, the number of neurons in a neural
network, or the kernel size in a support vector machine.
The goal of hyperparameter tuning is to find the values that lead to
the best performance on a given task.
Hyperparameters are configuration variables that control the learning
process of a machine learning model. They are distinct from model
parameters, which are the weights and biases that are learned from
the data.
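
A hyperparameter-tuning sketch (assuming the scikit-learn API): grid search with cross-validation over an SVM's C and kernel settings:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_)                       # best hyperparameter values found
print(search.best_score_)                        # their cross-validation score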



Data Mismatch

The most important rule to remember is that the validation set and the test set must be as representative as possible of the data you expect to use in production.
Train-dev set: hold out part of the training dataset.
Train the model on the training set, and evaluate it on the train-dev set.
If it performs well there, the model is not overfitting the training set; so if it then performs poorly on the validation set, the problem must come from the data mismatch.



End-to-End Machine Learning Project

The main steps of a Machine Learning project:
Frame the problem.
Get the data.
Discover and visualize the data to gain insights.
Prepare the data for Machine Learning algorithms.
Select a model and train it.
Fine-tune your model.
Present your solution.
Launch, monitor, and maintain your system.



Few dataset portals
Popular open data repositories:
- UC Irvine Machine Learning Repository
- Kaggle datasets
- Amazon's AWS datasets
Meta portals (they list open data repositories):
- http://dataportals.org/
- http://opendatamonitor.eu/
- http://quandl.com/



Task-1: Build a model of housing prices in California using the California census data.
This data has metrics such as the population, median income, median housing price, and so on for each block group (a district with a population of roughly 600 to 3,000) in California.
The model should learn from this data and predict the median housing price in any district given all the other metrics.



Frame the problem:
Know the objective (Application of the ML model you are going to
build)
Usage and benefit of the model?
Selection of the Algorithms, performance measure
Effort and time required
Upstream models: Data producers
Downstream models: Data receivers

Pipelines

A sequence of data processing components is called a data pipeline.


Pipelines are very common in Machine Learning systems, since there
is a lot of data to manipulate and many data transformations to apply.
Components in the system run asynchronously, each independently
processing large data sets and storing outputs in a data store for the
next component to use.
The modular design, with simple data store interfaces, allows teams
to work on separate components.
If one component fails, downstream components can often continue
temporarily by using previous outputs, ensuring robustness and
simplicity through a data flow graph.
If a failed component goes unnoticed and proper monitoring is not implemented, the data gets stale and the overall system's performance drops.



Frame the problem (continued)
Understand what the current solution looks like: it gives a reference performance, as well as insights on how to solve the problem.
The current solution is manual estimation: a team gathers up-to-date information about a district, and when they cannot get the median housing price, they estimate it using complex rules.
This is costly and time-consuming, and the resulting estimates often carry a large error.
Solution: train a model to predict a district's median housing price given other data about that district.
The census data is a great dataset to exploit for this purpose, since it includes the median housing prices of thousands of districts, as well as other data.



Is it supervised, unsupervised, or Reinforcement Learning? Clearly a typical supervised learning task, since you are given labeled training examples.
Is it a classification task, a regression task, or something else? A typical regression task, since you are asked to predict a value. It is a multiple regression problem, since the system will use multiple features to make a prediction.
Univariate or multivariate regression? It is a univariate regression problem, since we are only trying to predict a single value for each district.
Choose batch learning: there is no continuous flow of data coming into the system, there is no particular need to adjust to changing data rapidly, and the data is small enough to fit in memory, so plain batch learning should do just fine.



Select a performance measure

A typical performance measure for regression problems is the Root Mean Square Error (RMSE). For a model (hypothesis) h evaluated on a dataset X:

RMSE(X, h) = sqrt( (1/m) × Σ from i=1 to m of ( h(x^(i)) - y^(i) )² )


m is the number of instances (samples) in the dataset you are measuring the RMSE on. For example, if you are evaluating the RMSE on a validation set of 2,000 districts, then m = 2,000.
x^(i) is a vector of all the feature values (excluding the label) of the i-th instance in the dataset, and y^(i) is its label (the desired output value for that instance).
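
A small NumPy sketch computing the RMSE by hand from this formula (the prices are made-up values; h_x stands for the predictions h(x^(i)) and y for the labels y^(i)):

import numpy as np

y   = np.array([250_000, 320_000, 180_000, 410_000])   # true median prices
h_x = np.array([245_000, 300_000, 200_000, 400_000])   # model predictions

m = len(y)
rmse = np.sqrt(np.sum((h_x - y) ** 2) / m)
print(rmse)   # equivalently: np.sqrt(np.mean((h_x - y) ** 2))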

When outliers are exponentially rare (as in a bell-shaped curve), the RMSE performs very well and is generally preferred.



GitHub link: https://github.com/ageron/handson-ml2



Concept Learning

We learn about our surroundings through five senses: eyes, ears, nose, tongue, and skin. We learn many things over a lifetime; some are based on experience and some on memorization. On this basis we can divide learning methods into five types:
Rote Learning (memorization): memorizing things without knowing the concept or logic behind them.
Passive Learning (instructions): learning from a teacher or expert.
Analogy (experience): learning new things from our past experience.
Inductive Learning (experience): formulating a generalized concept on the basis of past experience.
Deductive Learning: deriving new facts from past facts.



Concept learning: the problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
The learning algorithm searches a hypothesis space for a candidate hypothesis that best fits the training examples.
The goal of concept learning is to find a hypothesis that best explains the relationship between the input and output of a target concept.
A hypothesis is a candidate function that represents the target concept. Each hypothesis is evaluated based on how well it fits the training examples.
The hypothesis that fits the training examples best is selected as the final concept.



The problem of inducing general functions from specific training examples is central to concept learning.
Days on which a friend enjoys his favorite water sport (the four EnjoySport training examples; these are the same examples used in the FIND-S trace later in this module):

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes


Target concept: the concept or function to be learned is called the target concept, denoted by c. It can be seen as a boolean-valued function defined over X and can be represented as c : X → {0, 1}.
Here "EnjoySport" is the target concept.
A hypothesis consists of a conjunction of constraints on the instance attributes.
In the hypothesis representation, each attribute constraint can be:
- "?": any value is acceptable for this attribute
- a single required value (e.g., Warm) for the attribute
- "φ": no value is acceptable
The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Hypothesis Space
Let X denote the set of instances and H the set of hypotheses in the EnjoySport learning task.
Compute the number of distinct instances in X and distinct hypotheses in H.
Hypothesis h is a vector of six constraints, specifying the values of the six attributes <Sky, AirTemp, Humidity, Wind, Water, Forecast>.
In this representation each attribute constraint can be "?" or "φ" (written 0) in addition to its defined values. Sky has 3 defined values and each of the other five attributes has 2, so H contains 5 × 4 × 4 × 4 × 4 × 4 = 5120 syntactically distinct hypotheses.
Many of these are syntactically distinct but not semantically: any hypothesis containing one or more 0 constraints classifies every instance as negative, for example
h1 = <Sky=0 AND Temp=Warm AND Humidity=? AND Wind=Strong AND Water=Warm AND Forecast=Same>,
h2 = <Sky=Sunny AND Temp=Warm AND Humidity=? AND Wind=Strong AND Water=0 AND Forecast=Same>.
Counting each such hypothesis once: 1 (for all hypotheses with one or more 0) + 4 × 3 × 3 × 3 × 3 × 3 (each attribute's defined values plus "?") = 973 semantically distinct hypotheses.
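
A quick sanity check of these counts in Python:

# Attribute value counts for EnjoySport: Sky has 3 values, the rest have 2.
values = [3, 2, 2, 2, 2, 2]

syntactic, semantic = 1, 1
for v in values:
    syntactic *= v + 2      # each attribute also allows "?" and "0" (phi)
    semantic  *= v + 1      # semantically, only "?" adds a distinct choice

print(syntactic)            # 5120 syntactically distinct hypotheses
print(semantic + 1)         # 973: +1 for the single all-phi hypothesis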
The concept learning can be viewed as the task of searching through
a large space of hypotheses.
The goal of this search is to find the hypothesis that best fits the
training examples.
By selecting a particular hypothesis representation, the designer of the
learning algorithm implicitly defines the space of all hypotheses that
the program can ever represent and therefore can ever learn.



General-to-Specific Ordering of Hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?) and h2 = (Sunny, ?, ?, ?, ?, ?)
The hypothesis h2 has fewer constraints on attributes than hypothesis h1.
The set of instances classified positive by h1 is a subset of the set classified positive by h2, since h2 imposes fewer constraints on the instance.
In other words, any instance classified positive by h1 will also be classified positive by h2.
Therefore we say that h2 is more general than h1; conversely, h1 is more specific than h2.



That hypotheses h1 and h2 classify an instance x as positive is written h1(x) = 1 and h2(x) = 1.
h2 is more general than h1, which can be written as: h1(x) = 1 implies h2(x) = 1, denoted h2 ≥g h1.

In the figure, the box on the left represents the set X of all instances, and the box on the right the set H of all hypotheses.
Each hypothesis corresponds to some subset of X: the subset of instances that it classifies positive. In the figure there are 2 instances, x1 and x2, and 3 hypotheses, h1, h2, and h3.
h1 covers x1; h2 covers x1 and x2; h3 covers x1. This indicates that h2 is more general than h1 and h3.
The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
Note that the subset of instances characterized by h2 subsumes the subset characterized by h1, hence h2 is more general than h1.



FIND-S: Finding a Maximally Specific Hypothesis

The steps of the FIND-S algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x: for each attribute constraint ai in h, if the constraint is satisfied by x, do nothing; otherwise replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.

The first step of FIND-S is to initialize h to the most specific hypothesis in H: h0 = <Ø, Ø, Ø, Ø, Ø, Ø>.
First training example: x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = +ve. Observing the first training example, it is clear that hypothesis h is too specific. None of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example: h1 = <Sunny, Warm, Normal, Strong, Warm, Same>.
Consider the second training example: x2 = <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = +ve. The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example. Now h2 = <Sunny, Warm, ?, Strong, Warm, Same>.



Consider the third training example: x3 = <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = -ve. The FIND-S algorithm simply ignores every negative example, so the hypothesis remains as before: h3 = <Sunny, Warm, ?, Strong, Warm, Same>.
Consider the fourth training example: x4 = <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = +ve. The fourth example leads to a further generalization of h: h4 = <Sunny, Warm, ?, Strong, ?, ?>.
So the final hypothesis is <Sunny, Warm, ?, Strong, ?, ?>.
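
A hedged Python sketch of FIND-S that reproduces this trace (using "0" for the maximally specific constraint Ø):

def find_s(examples):
    h = ["0"] * len(examples[0][0])        # most specific hypothesis
    for x, label in examples:
        if label != "yes":                 # FIND-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == "0":
                h[i] = value               # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"                 # generalize mismatching constraints
    return h

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "yes"),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "yes"),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "no"),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "yes")]
print(find_s(D))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']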



Key property of the FIND-S algorithm

FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
FIND-S's final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H and the training examples are correct.



Limitations of FIND-S

Although FIND-S will find a hypothesis consistent with the training data, it has no way to determine whether it has found the only hypothesis in H consistent with the data (i.e., the correct target concept), or whether there are many other consistent hypotheses as well.
In case there are multiple hypotheses consistent with the training examples, FIND-S will find the most specific. It is unclear whether we should prefer this hypothesis over, say, the most general, or some other hypothesis of intermediate generality.
In most practical learning problems there is some chance that the training examples will contain at least some errors or noise. Such inconsistent sets of training examples can severely mislead FIND-S, given that it ignores negative examples.
There can be several maximally specific hypotheses consistent with the data; FIND-S finds only one.



Consistent Hypotheses and the Version Space

A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D.
The version space, denoted VS(H, D), with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.



Let's say we have a hypothesis h1 = <?, ?, ?, Strong, ?, ?>. Is this hypothesis consistent with the set of training examples D?
For training example (3), h(x) != c(x), so hypothesis h1 is not consistent with D.
Let's say we have a hypothesis h2 = <?, Warm, ?, Strong, ?, ?>. Is this hypothesis consistent with the set of training examples D?
All the training examples satisfy h(x) = c(x), so hypothesis h2 is consistent with D.

The List-Then-Eliminate algorithm

One obvious way to represent the version space is simply to list all of its members.
This leads to a simple learning algorithm, which we might call the List-Then-Eliminate algorithm.
The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
In principle, the List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite. However, since it requires exhaustively enumerating all hypotheses, it is not feasible in practice.
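
A Python sketch of LIST-THEN-ELIMINATE on the EnjoySport task (the Cloudy and Weak attribute values are assumed from Mitchell's attribute definitions; they do not appear in the four examples):

from itertools import product

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
     (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True)]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, examples):
    return all(matches(h, x) == label for x, label in examples)

# Enumerate every conjunctive hypothesis ("?" or one concrete value per
# attribute; the all-phi hypothesis, which matches nothing, is omitted).
version_space = [h for h in product(*[("?",) + d for d in domains])
                 if consistent(h, D)]
for h in version_space:
    print(h)        # the 6 hypotheses that survive elimination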



Representation for Version Spaces
We can represent the version space in terms of its most specific and most general members.
For the EnjoySport training examples D above, we can output the list of hypotheses consistent with D; this list of hypotheses is the version space.
In the list of hypotheses there are two extremes, representing the most general (h1 and h2) and the most specific (h6) hypotheses. We define these two extremes as the general boundary G and the specific boundary S.



Definition - G : The general boundary G, with respect to hypothesis
space H and training data D, is the set of maximally general members
of H consistent with D.
Definition - S : The specific boundary S, with respect to hypothesis
space H and training data D, is the set of minimally general (i.e.,
maximally specific) members of H consistent with D.



Candidate Elimination algorithm

The Candidate-Elimination algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most general hypothesis in H, G0 ← <?, ?, ?, ?, ?, ?>, and initializing the S boundary set to contain the most specific hypothesis, S0 ← <0, 0, 0, 0, 0, 0>.
These two boundary sets delimit the entire hypothesis space, because
every other hypothesis in H is both more general than S0 and more
specific than G0.
As each training example is considered, the S and G boundary sets are
generalized and specialized, respectively, to eliminate from the version
space any hypotheses found inconsistent with the new training
example.


Candidate elimination algorithm with an example
Training examples: the four EnjoySport examples D given earlier.

The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H:
Initializing the G boundary set to contain the most general hypothesis in H: G0 = <?, ?, ?, ?, ?, ?>
Initializing the S boundary set to contain the most specific (least general) hypothesis: S0 = <0, 0, 0, 0, 0, 0>
First training example: it is a positive example, and when it is presented to the CANDIDATE-ELIMINATION algorithm, the algorithm checks the S boundary and finds that it is too specific: it fails to cover the positive example. The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.
No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.



When the second training example is presented, it has a similar effect, generalizing S further to S2 and leaving G again unchanged, i.e., G2 = G1 = G0.



The third training example: it is a negative example, and when it is presented to the CANDIDATE-ELIMINATION algorithm, it reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is positive. The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.

Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3? For example, the hypothesis h = <?, ?, Normal, ?, ?, ?> is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3. This hypothesis is excluded because it is inconsistent with the previously encountered positive examples.
The fourth training example: it is a positive example, and when it is presented to the CANDIDATE-ELIMINATION algorithm, it further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example.

After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples: the entire version space consists of the hypotheses bounded by S4 and G4, including those of intermediate generality.



This learned version space is independent of the sequence in which the
training examples are presented
As further training data is encountered, the S and G boundaries will move
monotonically closer to each other, delimiting a smaller and smaller version
space of candidate hypotheses.
Remarks on the Version Space and the Candidate elimination algorithm
The version space learned by the Candidate elimination algorithm will converge toward the hypothesis that correctly describes the target concept, provided that:
- there are no errors in the training examples, and
- there is some hypothesis in H that correctly describes the target concept.
