Hypothesis in ML
But note here that we could also have divided the coordinate plane in other ways.
The way in which the coordinate plane is divided depends on the data, the algorithm, and the constraints.
All the legal possible ways in which we can divide the coordinate plane to predict the outcome of the test data together make up the Hypothesis Space. Each individual possible way is known as a hypothesis.
Hence, in this example the hypothesis space is the collection of all such candidate boundaries.
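Since the original figures are not reproduced here, below is a minimal sketch of the same idea in Python. The toy points, labels, and the three candidate linear separators are all made up purely for illustration; each function in hypothesis_space is one hypothesis, and the list as a whole stands in for the hypothesis space.

```python
import numpy as np

# Toy 2-D points and binary labels (illustrative only).
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y = np.array([0, 0, 1, 1])

# Each hypothesis h is one possible way to divide the plane.
# Here we use simple linear separators: predict 1 if w.x + b > 0.
def make_hypothesis(w, b):
    return lambda x: int(np.dot(w, x) + b > 0)

# A (tiny) hypothesis space: a few candidate separators.
hypothesis_space = [
    make_hypothesis(np.array([1, 1]), -5),    # diagonal boundary
    make_hypothesis(np.array([1, 0]), -2.5),  # vertical boundary
    make_hypothesis(np.array([0, 1]), -2.5),  # horizontal boundary
]

# Check how each hypothesis classifies the training points.
for i, h in enumerate(hypothesis_space):
    preds = [h(x) for x in X]
    print(f"hypothesis {i}: predictions = {preds}, labels = {list(y)}")
```

Note that all three of these hypotheses classify the four training points correctly, which is exactly why an algorithm needs some extra criterion (an inductive bias, discussed later) to prefer one hypothesis over another.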
Hypothesis in Statistics
In statistics, a hypothesis refers to a statement or assumption about a population
parameter. It is a proposition or educated guess that helps guide statistical analyses.
There are two types of hypotheses: the null hypothesis (H0) and the alternative
hypothesis (H1 or Ha).
Null Hypothesis (H0): This hypothesis suggests that there is no significant difference or effect, and any observed results are due to chance. It often represents the status quo or a baseline assumption.
Alternative Hypothesis (H1 or Ha): This hypothesis contradicts the null hypothesis, proposing that there is a significant difference or effect in the population. It is what researchers aim to support with evidence.
INDUCTIVE BIAS
Definition
At its core, inductive bias refers to the set of assumptions that a learning algorithm makes
to predict outputs for inputs it has never seen. It’s the bias or inclination of a model
towards making a particular kind of assumption in order to generalize from its training
data to unseen situations.
Why is Inductive Bias Important?
Learning from Limited Data: In real-world scenarios, it’s practically impossible to have
training data for every possible input. Inductive bias helps models generalize to unseen
data based on the assumptions they carry.
Guiding Learning: Given a dataset, there can be countless hypotheses that fit the data.
Inductive bias helps the algorithm choose one plausible hypothesis over another.
Preventing Overfitting: A model with no bias or assumptions might fit the training data
perfectly, capturing every minute detail, including noise. This is known as overfitting. An
inductive bias can prevent a model from overfitting by making it favour simpler
hypotheses.
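One concrete way this shows up in practice is regularization. The sketch below (assuming scikit-learn and NumPy; the synthetic data and the alpha value are arbitrary choices) compares plain linear regression with ridge regression, whose penalty term acts as an explicit inductive bias toward small weights, i.e. simpler hypotheses:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic data: y depends only on the first feature, plus noise.
X = rng.normal(size=(30, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

# Unregularized model: free to use all 10 features to chase noise.
plain = LinearRegression().fit(X, y)

# Ridge adds an inductive bias: prefer hypotheses with small weights.
ridge = Ridge(alpha=5.0).fit(X, y)

print("unregularized weights:", np.round(plain.coef_, 2))
print("ridge weights:        ", np.round(ridge.coef_, 2))
# The ridge weights on the irrelevant features shrink toward zero,
# which typically generalizes better on unseen data.
```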
Types of Inductive Bias
Preference Bias: It expresses a preference for some hypotheses over others. For
example, in decision tree algorithms like ID3, the preference is for shorter trees over
longer trees.
Restriction Bias: It restricts the set of hypotheses considered by the algorithm. For
instance, a linear regression algorithm restricts its hypothesis to linear relationships
between variables.
Examples of Inductive Bias in Common Algorithms
Decision Trees: Decision tree algorithms, like ID3 or C4.5, have a bias towards shorter
trees and splits that categorize the data most distinctly at each level.
k-Nearest Neighbors (k-NN): The algorithm assumes that instances that are close to
each other in the feature space have similar outputs.
Neural Networks: They have a bias towards smooth functions. The architecture itself
(number of layers, number of neurons) can also impose bias.
Linear Regression: Assumes a linear relationship between the input features and the output, as sketched below.
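To make the contrast between these biases concrete, here is a small sketch (assuming scikit-learn; the 1-D synthetic data is only illustrative) that fits a linear model and a k-NN regressor to the same points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D data following a gentle curve (illustrative only).
X = np.linspace(0, 10, 20).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=20)

# Linear regression: restriction bias toward straight-line hypotheses.
linear = LinearRegression().fit(X, y)

# k-NN: assumes nearby points have similar outputs (locality bias).
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)

X_new = np.array([[2.5], [7.5]])
print("linear predictions:", linear.predict(X_new))
print("k-NN predictions:  ", knn.predict(X_new))
# The linear model can only describe the overall trend as a line,
# while k-NN follows the local shape of the data around each query.
```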
Trade-offs
While inductive bias helps models generalize from training data, there’s a trade-off. A
strong inductive bias means the model might not be flexible enough to capture all
patterns in the data. On the other hand, too weak a bias could lead the model to overfit
the training data.
What are the best practices to get a Generalized model?
It is important to have a training dataset with good variance (i.e. a shuffled data set). The
Best way to do this is computing the hash for an appropriate feature and split data into
training, evaluation and test sets based on the computed hash value. Here the evaluation
set is used to cross-validate the trained model. It is always good to ensure that the
distribution in all the dataset is stationary(same).
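A minimal sketch of such hash-based splitting (using Python's standard hashlib; the record IDs and the 80/10/10 ratios are just illustrative assumptions):

```python
import hashlib

def split_bucket(key, train_frac=0.8, eval_frac=0.1):
    """Deterministically assign a record to train/eval/test by hashing a feature."""
    # Hash the chosen feature (e.g. a user or record ID) to a stable integer.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100 / 100.0  # value in [0, 1)
    if bucket < train_frac:
        return "train"
    elif bucket < train_frac + eval_frac:
        return "eval"
    return "test"

# Example: split some hypothetical record IDs.
for record_id in (f"user_{i}" for i in range(10)):
    print(record_id, "->", split_bucket(record_id))
```

Because the assignment depends only on the hashed feature, the same record always lands in the same split, even if the dataset is regenerated or reordered.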
Handling outliers is also important, and how to handle them depends on the task you are working on. If you are training a model to detect anomalies, you should keep the outliers; in that case the anomalies may be the very labels you need to identify, so you cannot classify or detect them if the outliers are removed. On the other hand, if you are building a regression-based classification model, it is usually better to remove outliers.
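For that second case, one simple and common filter is the interquartile-range (IQR) rule. A sketch assuming NumPy and a single numeric feature with invented values:

```python
import numpy as np

# Hypothetical feature values with a couple of obvious outliers.
values = np.array([10, 12, 11, 13, 12, 11, 10, 95, 12, 13, 11, -40])

# IQR rule: keep points within 1.5 * IQR of the quartiles.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

filtered = values[(values >= lower) & (values <= upper)]
print("kept:", filtered)  # the outliers 95 and -40 are dropped
```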
Use resampling during training. Resampling lets you reconstruct your sample dataset in different ways for each iteration. One of the most popular resampling techniques is k-fold cross-validation: it trains and tests the model k times, each time holding out a different subset of the data for validation.
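A short sketch of k-fold cross-validation (assuming scikit-learn; the dataset and model here are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder dataset and model, used only to show the mechanics.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train/validate 5 times on different splits.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", np.round(scores, 3))
print("mean accuracy:  ", scores.mean().round(3))
```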
It is also important to know when to stop training. In practice this is often a judgment call: once you have reached a good training loss and the validation loss has stopped improving, stop training.
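A schematic early-stopping loop in Python; the train_one_epoch and evaluate callables are hypothetical stand-ins for whatever training and validation routines your framework provides, and the fake_* functions at the bottom merely simulate a validation loss that plateaus:

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=5):
    """Stop once the validation loss has not improved for `patience` epochs."""
    best_val_loss = float("inf")
    stale_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch()      # caller-supplied training step
        val_loss = evaluate()  # caller-supplied validation step
        if val_loss < best_val_loss:
            best_val_loss, stale_epochs = val_loss, 0
        else:
            stale_epochs += 1
        if stale_epochs >= patience:
            print(f"Stopping at epoch {epoch}; best validation loss {best_val_loss:.3f}")
            break

# Toy stand-ins: the simulated validation loss improves and then plateaus.
state = {"epoch": 0}
def fake_train():
    state["epoch"] += 1
def fake_eval():
    return max(1.0 - 0.05 * state["epoch"], 0.05)

train_with_early_stopping(fake_train, fake_eval)
```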
Learn to do some feature engineering when needed. In some cases your model may not be able to converge because there is no meaningful relationship in the raw features you have. Creating feature crosses and introducing new features that carry a meaningful relationship helps the model converge.
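A tiny sketch of a feature cross (assuming pandas; the column names and values are made up):

```python
import pandas as pd

# Hypothetical raw features: neither column alone separates the label well.
df = pd.DataFrame({
    "latitude_bucket":  ["north", "north", "south", "south"],
    "longitude_bucket": ["east",  "west",  "east",  "west"],
})

# Feature cross: combine the two buckets into a single categorical feature,
# then one-hot encode it so the model can learn one weight per region.
df["lat_x_long"] = df["latitude_bucket"] + "_" + df["longitude_bucket"]
crossed = pd.get_dummies(df["lat_x_long"], prefix="region")
print(crossed)
```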
Hope you all got a basic idea of generalization, underfitting, and overfitting. Use this as a base and keep exploring the subtopics for a deeper understanding.
Don't forget to applaud if you find this article useful. Your questions and feedback are always welcome.