ML 1
The version space method is a concept learning approach that works by managing multiple
candidate models within a version space.
Fundamental Assumptions
1. The training data are correct; there are no erroneous (noisy) examples.
2. The target concept can be expressed in the given representation language.
Diagrammatical Guidelines
Nodes in the generalization tree are connected to a model that matches everything in its
subtree.
Nodes in the specialization tree are connected to a model that matches only one thing in its
subtree.
In the accompanying diagram (not reproduced here), the specialization tree is colored red,
and the generalization tree is colored green.
Generalization and Specialization Lead to Version Space Convergence
The key idea in version space learning is that specialization of the general models and
generalization of the specific models may ultimately lead to just one correct model that
matches all observed positive examples and does not match any negative examples.
That is, each time a negative example is used to specialize the general models, any specific
models that match the negative example are eliminated; and each time a positive example is
used to generalize the specific models, any general models that fail to match the positive
example are eliminated. Eventually, the positive and negative examples may be such that only
one general model and one identical specific model survive.
The version space method handles positive and negative examples symmetrically.
Given:
A representation language.
A set of positive and negative examples expressed in that language.
Compute: a concept description that is consistent with all the positive examples and none of
the negative examples.
Method:
1. Initialize G, the set of maximally general hypotheses, to contain one element: the null
   description (all features are variables).
2. Initialize S, the set of maximally specific hypotheses, to contain one element: the first
   positive example.
3. Accept a new training example.
   - If the example is positive:
     1. Generalize all the specific models to match the positive example, but ensure the
        following:
        - The new specific models involve minimal changes.
        - Each new specific model is a specialization of some general model.
        - No new specific model is a generalization of some other specific model.
     2. Prune away all the general models that fail to match the positive example.
   - If the example is negative:
     1. Specialize all the general models to prevent a match with the negative example, but
        ensure the following:
        - The new general models involve minimal changes.
        - Each new general model is a generalization of some specific model.
        - No new general model is a specialization of some other general model.
     2. Prune away all the specific models that match the negative example.
4. If S and G are both singleton sets, then:
   - if they are identical, output their value and halt;
   - if they are different, the training examples were inconsistent; output this result and halt;
   - otherwise, go back to step 3 and accept a new training example.
Advantages of the version space method:
- It can describe all the possible hypotheses in the language that are consistent with the data.
- It is fast (close to linear in the number of examples).
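To make the method concrete, here is a minimal Python sketch of candidate elimination. It
assumes conjunctive hypotheses over discrete attributes, where '?' matches any value; the
attributes and training data are illustrative, and S is kept as a single hypothesis (as in
Find-S) for simplicity.

def matches(hypothesis, example):
    # True if the hypothesis covers the example ('?' is a wildcard).
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def generalize(s, example):
    # Minimally generalize a specific hypothesis to cover a positive example.
    return tuple(si if si == ei else '?' for si, ei in zip(s, example))

def specialize(g, s, example):
    # Minimally specialize a general hypothesis so it rejects the negative
    # example, using values from the specific hypothesis s as constraints.
    return [g[:i] + (s[i],) + g[i + 1:]
            for i in range(len(g))
            if g[i] == '?' and s[i] != example[i]]

def candidate_elimination(examples):
    S = next(x for x, label in examples if label)   # first positive example
    G = [('?',) * len(S)]                           # the null description
    for x, label in examples:
        if label:                                   # positive example
            S = generalize(S, x)
            G = [g for g in G if matches(g, x)]     # prune failing general models
        else:                                       # negative example
            new_G = []
            for g in G:
                new_G.extend(specialize(g, S, x) if matches(g, x) else [g])
            G = new_G
    return S, G

# Toy training data: (sky, temperature) -> positive/negative.
data = [(('sunny', 'warm'), True),
        (('rainy', 'cold'), False),
        (('sunny', 'cold'), True)]
print(candidate_elimination(data))

On this toy data the sketch converges to S = ('sunny', '?') and G = [('sunny', '?')], i.e. a
single identical general and specific model, as described above.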
Inductive Bias
Inductive bias is the set of assumptions or preferences that a learning algorithm uses to make
predictions beyond the data it has been trained on. Without inductive bias, machine learning
algorithms would be unable to generalize from training data to unseen situations, because the
space of possible hypotheses or models could be infinite.
For instance, in a classification problem, if the model is trained on data that suggests a linear
relationship between features and outcomes, the inductive bias of the model might favor
a linear hypothesis. This preference guides the model to choose simpler, linear relationships
rather than complex, nonlinear ones, even if such relationships might exist in the data.
Examples:
Inductive bias in decision trees: A preference for shorter trees with fewer splits.
Inductive bias in linear regression: The assumption that the data follows a linear
trend.
These biases help the algorithm make predictions more efficiently, even in situations where
there is uncertainty.
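To make the linear regression example concrete, here is a minimal Python sketch with
illustrative synthetic data: because the hypothesis space contains only straight lines, the
model underfits a quadratic relationship, which is exactly the effect of its inductive bias.

# Language/inductive bias in action: a degree-1 (linear) fit cannot express
# a quadratic relationship, so it underfits. Data here is synthetic.
import numpy as np

x = np.linspace(-3, 3, 30)
y = x ** 2                                   # truly quadratic relationship
slope, intercept = np.polyfit(x, y, deg=1)   # hypothesis space: straight lines
y_hat = slope * x + intercept
print('linear fit: slope =', round(slope, 3), ', intercept =', round(intercept, 3))
print('mean squared error:', round(float(np.mean((y - y_hat) ** 2)), 3))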
Inductive bias can be categorized into different types based on the constraints or preferences
that guide a learning algorithm:
1. Language Bias
Language bias refers to the constraints placed on the hypothesis space, which defines the
types of models a learning algorithm can consider. For instance, linear regression models
assume a linear relationship between variables, thereby limiting the hypothesis space to linear
functions.
2. Search Bias
Search bias refers to the preferences that an algorithm has when selecting hypotheses from
the available options. For example, many algorithms prefer simpler models over complex
ones due to the principle of Occam’s Razor, which suggests that simpler models are more
likely to generalize well.
3. Algorithm-Specific Biases
Different machine learning algorithms incorporate distinct inductive biases that shape their
learning and prediction processes:
1. Bayesian Models
In Bayesian models, prior knowledge is treated as a form of inductive bias. This prior helps
the model make predictions even when the available data is limited. The model updates its
predictions as new data becomes available, balancing the prior with the likelihood of the
observed data.
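As a concrete illustration, here is a tiny beta-binomial sketch in which a Beta prior acts as
the inductive bias when estimating a coin's heads probability; the prior parameters and
observation counts are illustrative.

# A Beta(a, b) prior on a coin's heads probability acts as inductive bias:
# with little data the estimate is pulled toward the prior mean, and the
# pull fades as observations accumulate.
def posterior_mean(heads, tails, prior_a=2.0, prior_b=2.0):
    return (heads + prior_a) / (heads + tails + prior_a + prior_b)

print(posterior_mean(1, 0))    # one head seen: 0.6 rather than the MLE of 1.0
print(posterior_mean(90, 10))  # lots of data: ~0.885, close to the MLE of 0.9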
2. Linear Regression
The inductive bias in linear regression is the assumption that the relationship between input
variables and output is linear. This bias works well for datasets with linear patterns but may
fail to capture more complex, nonlinear relationships.
3. Logistic Regression
Logistic regression assumes a linear decision boundary between classes, which makes it
effective for binary classification tasks with linearly separable data.
Each of these algorithms leverages specific inductive biases to balance accuracy and
generalization, ensuring that the model doesn’t overfit or underfit the training data.
Inductive bias plays a critical role in ensuring that machine learning models can generalize
effectively from training data to unseen data. Without bias, a learning algorithm would have
to consider every possible hypothesis, which is computationally infeasible.
Inductive bias helps balance the bias-variance trade-off. A model with too much bias
may underfit the data, resulting in poor predictions on unseen data. Conversely, a model
with too little bias may overfit, capturing noise in the training data but failing to generalize.
The goal is to find the right balance: enough inductive bias to ensure generalization, but not
so much that the model becomes too rigid. This is especially important in real-world machine
learning tasks, where data is often noisy and incomplete, and making assumptions about the
data is necessary for the model to make reasonable predictions.
While inductive bias is essential for guiding machine learning models, it comes with
challenges:
Overfitting
When the inductive bias is too weak, the model may overfit the training data by learning
noise rather than meaningful patterns. Overfitting occurs when the model fits the training data
too closely, resulting in poor performance on unseen data.
Underfitting
Conversely, if the inductive bias is too strong, the model may underfit the data, failing to
capture important patterns. This can lead to overly simplistic models that don’t perform well
on either the training or test data.
Finding the optimal level of inductive bias requires tuning the model’s complexity and
flexibility. For instance, regularization techniques can help control the degree of bias by
penalizing overly complex models, thus encouraging generalization without overfitting.
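For instance, here is a minimal NumPy sketch of L2 regularization (ridge regression) on
synthetic data, showing how increasing the penalty strength alpha shrinks the fitted weights
and makes the model more rigid; the data and alpha values are illustrative.

# Ridge regression: the L2 penalty alpha controls the bias/flexibility
# trade-off; larger alpha -> smaller weights -> more rigid model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # synthetic features
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]                    # only 3 informative weights
y = X @ w_true + 0.1 * rng.normal(size=50)       # noisy linear target

for alpha in (0.0, 1.0, 100.0):
    # Closed-form ridge solution: w = (X^T X + alpha * I)^(-1) X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(10), X.T @ y)
    print(f'alpha = {alpha:6.1f}   ||w|| = {np.linalg.norm(w):.3f}')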
Machine learning practitioners must carefully consider the trade-off between bias and
flexibility to create models that are both accurate and generalizable.
Conclusion
Inductive bias is a fundamental concept in machine learning that guides models in making
predictions beyond the training data. By introducing assumptions about the data, inductive
bias allows algorithms to generalize and learn more efficiently. However, the strength of the
bias must be carefully balanced to avoid underfitting or overfitting the model. Understanding
the role of inductive bias in different machine learning algorithms is crucial for selecting the
right model for a given task. Further exploration of bias-variance trade-offs will lead to
better-performing models in real-world applications.
Performance Metrics
There are various metrics that we can use to evaluate the performance
of ML algorithms, for classification as well as regression. Let's
discuss these metrics for classification and regression problems
separately.
Confusion Matrix
Classification Accuracy
Classification Report
Precision
Recall or Sensitivity
Specificity
Support
F1 Score
ROC AUC Score
LOGLOSS (Logarithmic Loss)
Confusion Matrix
A confusion matrix tabulates predictions against actual labels, counting
true positives (TP), true negatives (TN), false positives (FP), and false
negatives (FN). The metrics below are all defined in terms of these four counts.
Classification Accuracy
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Classification Report
A classification report summarizes precision, recall, F1 score, and support
for each class.
Precision
Precision measures the proportion of predicted positive instances that are
actually positive. It is calculated as:
Precision = TP / (TP + FP)
Recall or Sensitivity
Recall measures the proportion of true positive instances out of all actual
positive instances. It is calculated as the number of true positive
instances divided by the sum of true positive and false negative instances.
Recall = TP / (TP + FN)
Specificity
Specificity measures the proportion of actual negative instances that are
correctly identified as negative:
Specificity = TN / (TN + FP)
Support
Support is the number of actual occurrences of each class in the dataset.
F1 Score
The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
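To tie the classification formulas above together, here is a small Python
sketch that computes them directly from binary labels (1 = positive,
0 = negative); the sample labels are illustrative.

# Compute the confusion-matrix-based metrics defined above.
def classification_metrics(actual, predicted):
    TP = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    TN = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    FP = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    FN = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    return {
        'accuracy':    (TP + TN) / (TP + TN + FP + FN),
        'precision':   precision,
        'recall':      recall,
        'specificity': TN / (TN + FP),
        'f1':          2 * precision * recall / (precision + recall),
    }

print(classification_metrics(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 0, 1, 1]))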
For regression problems, common metrics include the following.
Mean Absolute Error (MAE) is the average absolute difference between the
actual and predicted values:
MAE = (1/n) ∑ |Y − Ŷ|
Mean Squared Error (MSE) is like the MAE, but the difference is that it
squares the difference of actual and predicted output values before summing
them all, instead of using the absolute value. The difference can be noticed
in the following equation −
MSE = (1/n) ∑ (Y − Ŷ)²
R Squared (R²) measures the proportion of the variance in the actual values
that the model explains:
R² = 1 − [(1/n) ∑ (Yi − Ŷi)²] / [(1/n) ∑ (Yi − Ȳ)²]
where Ȳ is the mean of the actual values.
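Similarly, here is a small Python sketch of the regression metrics above,
with illustrative values:

# MAE, MSE, and R squared computed directly from their definitions.
def regression_metrics(y, y_hat):
    n = len(y)
    mae = sum(abs(a - p) for a, p in zip(y, y_hat)) / n
    mse = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / n
    y_mean = sum(y) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))   # residual variance
    ss_tot = sum((a - y_mean) ** 2 for a in y)             # total variance
    return mae, mse, 1 - ss_res / ss_tot

print(regression_metrics(y=[3.0, 5.0, 7.0], y_hat=[2.5, 5.5, 7.5]))
# -> (0.5, 0.25, 0.90625)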
Before delving into the intricacies of the ID3 algorithm, let's grasp the essence of decision
trees. Picture a tree-like structure where each internal node represents a test on an attribute,
each branch signifies an outcome of that test, and each leaf node denotes a class label or a
decision. Decision trees mimic human decision-making processes by recursively splitting
data based on different attributes to create a flowchart-like structure for classification or
regression.
ID3 Algorithm
A well-known decision tree approach for machine learning is the Iterative Dichotomiser
3 (ID3) algorithm. By choosing the best characteristic at each node to partition the data
depending on information gain, it recursively constructs a tree. The goal is to make the final
subsets as homogeneous as possible. By choosing features that offer the greatest reduction in
entropy or uncertainty, ID3 iteratively grows the tree. The procedure keeps going until a
halting requirement is satisfied, like a minimum subset size or a maximum tree depth.
Although ID3 is a fundamental method, later iterations such as C4.5 and CART have
addressed some of its limitations.
How ID3 Works
The ID3 algorithm is specifically designed for building decision trees from a given dataset.
Its primary objective is to construct a tree that best explains the relationship between
attributes in the data and their corresponding class labels.
1. Selecting the Best Attribute
ID3 employs the concepts of entropy and information gain to determine the attribute that best
separates the data. Entropy measures the impurity or randomness in the dataset. The algorithm
calculates the information gain of each candidate attribute and selects the one that yields the
most significant gain when used for splitting the data (a code sketch of these computations
appears after this list).
2. Splitting the Dataset
The chosen attribute is used to split the dataset into subsets based on its distinct values.
For each subset, ID3 recurses to find the next best attribute to further partition the data, forming
branches and new nodes accordingly.
3. Stopping Criteria
The recursion continues until one of the stopping criteria is met, such as when all instances in a
branch belong to the same class or when all attributes have been used for splitting.
4. Handling Missing Values
ID3 can handle missing attribute values by employing various strategies, such as substituting
the attribute's mean/mode or using majority class values.
5. Tree Pruning
Pruning is a technique to prevent overfitting. While not directly included in ID3, post-processing
techniques or variations like C4.5 incorporate pruning to improve the tree's generalization.
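As referenced in step 1 above, here is a minimal Python sketch of the entropy and
information-gain computations that drive ID3's attribute selection; the dataset, attribute
names, and labels are illustrative.

# Entropy and information gain as used by ID3. Each row maps
# attribute -> value, with the class label stored under 'class'.
from math import log2
from collections import Counter

def entropy(rows):
    # Impurity of the class distribution in these rows.
    counts = Counter(r['class'] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attribute):
    # Expected entropy reduction from splitting on the attribute.
    total = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:      # each distinct value
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

data = [{'outlook': 'sunny',    'windy': False, 'class': 'no'},
        {'outlook': 'sunny',    'windy': True,  'class': 'no'},
        {'outlook': 'rainy',    'windy': False, 'class': 'yes'},
        {'outlook': 'rainy',    'windy': True,  'class': 'no'},
        {'outlook': 'overcast', 'windy': False, 'class': 'yes'}]
print('gain(outlook):', round(information_gain(data, 'outlook'), 3))
print('gain(windy):  ', round(information_gain(data, 'windy'), 3))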
Overfitting happens when a model learns too much from the training data, including details that
don’t matter (like noise or outliers).
For example, imagine fitting a very complicated curve to a set of points. The curve will go through
every point, but it won’t represent the actual pattern.
As a result, the model works great on training data but fails when tested on new data.
Overfitting models are like students who memorize answers instead of understanding the topic.
They do well in practice tests (training) but struggle in real exams (testing).
Common ways to reduce overfitting include:
Improving the quality of the training data, which reduces overfitting by focusing the model
on meaningful patterns and mitigating the risk of fitting noise or irrelevant features.
Increasing the amount of training data, which can improve the model's ability to generalize to
unseen data and reduce the likelihood of overfitting.
Early stopping during the training phase: monitor the validation loss over the course of
training, and stop as soon as the loss begins to increase (a sketch is given below).
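Here is the early-stopping idea as a minimal sketch; train_one_epoch and validation_loss are
hypothetical placeholders for whatever training and evaluation routines the model uses.

# Early stopping sketch: halt when validation loss stops improving.
# NOTE: train_one_epoch and validation_loss are hypothetical placeholders.
def train_with_early_stopping(model, patience=3, max_epochs=100):
    best_loss, bad_epochs = float('inf'), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)            # one pass over the training data
        loss = validation_loss(model)     # loss on held-out validation data
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:    # loss has stopped improving: halt
                break
    return model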
In decision tree learning (like ID3, C4.5, CART), handling continuous (numerical) values is
crucial, because most real-world data includes attributes like age, income, and temperature.
The standard approach is to convert a continuous attribute into a binary test of the form
[attribute < threshold]. This makes the tree behave as if it were using a categorical
attribute, when it is really just applying a binary decision.
Step 1: Sort the training examples by the attribute's value.
Step 2: Generate candidate thresholds, typically the midpoints between adjacent distinct
values (in particular, where the class label changes).
Step 3: For each threshold, split the data into two groups and calculate the information gain
of the split; the threshold with the highest gain is chosen.
Example
Age Class
25 No
32 No
40 Yes
45 Yes
50 Yes
The class label changes between Age = 32 (No) and Age = 40 (Yes), so the candidate threshold
is (32 + 40) / 2 = 36, which separates the two classes perfectly:

[Age < 36]
      /        \
    Yes          No
(Class = No)  (Class = Yes)
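Putting the steps together, here is a minimal Python sketch of threshold selection for the
Age example above; it evaluates every midpoint between adjacent distinct values and returns
the one with the highest information gain.

# Find the best binary-split threshold for a continuous attribute.
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))          # Step 1: sort by value
    best = (None, -1.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2                        # Step 2: candidate midpoint
        left  = [c for v, c in pairs if v < t]   # Step 3: split and score
        right = [c for v, c in pairs if v >= t]
        gain = entropy(labels) - (len(left) / len(labels) * entropy(left)
                                  + len(right) / len(labels) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

ages    = [25, 32, 40, 45, 50]
classes = ['No', 'No', 'Yes', 'Yes', 'Yes']
print(best_threshold(ages, classes))   # -> (36.0, 0.971), the split shown above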