MACHINE LEARNING - UNIT 1
REVIEW OF LINEAR ALGEBRA FOR MACHINE LEARNING
1. Basics:
•Scalars: Single values, denoted as lowercase letters (e.g., a).
•Vectors: Ordered lists of numbers, denoted as bold lowercase letters (e.g., v).
•Matrices: 2D arrays of numbers, denoted as bold uppercase letters (e.g., A).
•Tensors: Generalizations of vectors and matrices to higher dimensions.
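As a minimal sketch (assuming NumPy is available; variable names are illustrative), these objects map directly onto NumPy arrays of increasing rank:

```python
# Minimal NumPy sketch: scalars, vectors, matrices, and tensors are arrays
# of rank 0, 1, 2, and 3 or more, respectively.
import numpy as np

a = np.array(2.5)                      # scalar: rank-0 array
v = np.array([1.0, 2.0, 3.0])          # vector: rank-1 array
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])             # matrix: rank-2 array
T = np.zeros((2, 3, 4))                # tensor: rank-3 array

for name, arr in [("a", a), ("v", v), ("A", A), ("T", T)]:
    print(name, "has shape", arr.shape, "and rank", arr.ndim)
```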
2. Key Operations (see the sketch below):
•Vector addition and scalar multiplication.
•Dot (inner) product of two vectors.
•Matrix multiplication and transpose.
•Matrix inverse (for square, non-singular matrices).
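A hedged NumPy sketch of these operations (the specific values are arbitrary examples):

```python
# NumPy sketch of the key linear-algebra operations listed above.
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

w = u + 2.0 * v          # vector addition and scalar multiplication
d = u @ v                # dot (inner) product -> 11.0
B = A @ A                # matrix multiplication
At = A.T                 # transpose
Ainv = np.linalg.inv(A)  # inverse (A here is square and non-singular)

print(np.allclose(A @ Ainv, np.eye(2)))  # True: A times its inverse is I
```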
VC DIMENSION
The VC (Vapnik-Chervonenkis) dimension measures a model's capacity, or its ability to classify a variety of data patterns. It's defined as the maximum number of points a model can shatter, meaning the model can perfectly classify all possible labelings of those points. Higher VC dimensions imply more flexible models that may overfit, while lower VC dimensions suggest simpler models that may underfit. It's crucial for understanding a model's generalization ability.
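As an illustrative sketch (assuming scikit-learn; the point coordinates and the large C value are arbitrary choices), one can check that a linear classifier realizes all 2^3 = 8 labelings of 3 non-collinear points in the plane, which is why its VC dimension is at least 3:

```python
# Check that a linear classifier shatters 3 non-collinear points in 2D,
# i.e. perfectly fits every possible binary labeling of them.
import itertools
import numpy as np
from sklearn.svm import LinearSVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear

shattered = True
for labels in itertools.product([0, 1], repeat=3):
    if len(set(labels)) == 1:
        continue  # all-same labelings are trivially realizable
    clf = LinearSVC(C=1e6).fit(points, labels)  # large C ~ hard margin
    if clf.score(points, labels) < 1.0:
        shattered = False
print("3 points shattered by linear classifiers:", shattered)  # expected: True
```

Linear classifiers in 2D cannot shatter any set of 4 points (e.g., the XOR labeling fails), so their VC dimension is exactly 3.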
PROBABLY APPROXIMATELY CORRECT (PAC) LEARNING
Probably Approximately Correct (PAC) Learning is a framework in machine learning that quantifies a
model's ability to learn from data. In PAC learning, a model is considered successful if, with high
probability (the "Probably" part), it can learn a hypothesis that is approximately correct—that is, close
enough to the true function or distribution generating the data.
Key points:
•Probably: The model will produce an accurate hypothesis with a high probability (e.g., 95%).
•Approximately Correct: The hypothesis may not be perfect, but its error is within an acceptable margin
(ε).
•Efficiency: PAC learning also considers the computational efficiency of finding this hypothesis within a
reasonable amount of data and time.
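As a sketch of how these pieces fit together formally: for a finite hypothesis space H and a learner that outputs a hypothesis consistent with the training data, a standard PAC sample-complexity bound is

```latex
% With probability at least 1 - \delta, the learned hypothesis h has
% error(h) \le \varepsilon, provided the number of training examples m obeys:
m \;\ge\; \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

Here δ controls the "Probably" part and ε the "Approximately Correct" part; larger hypothesis spaces or stricter guarantees require more data.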
HYPOTHESIS SPACE
The hypothesis space in machine learning is the set of all possible models or
functions that a learning algorithm can choose from to fit a given dataset. It
includes every potential hypothesis (or function) that could map inputs to outputs
based on the training data.
Why It’s Essential
•Defines Learning Scope: The hypothesis space determines the complexity and
flexibility of the models, influencing what patterns or relationships the model can
learn from the data.
•Affects Generalization: A too-large hypothesis space can lead to overfitting,
where the model learns noise instead of patterns. A too-small hypothesis space may
underfit, missing important data relationships.
•Guides Model Selection: Choosing an appropriate hypothesis space helps balance model flexibility against the risk of overfitting, as the sketch below illustrates.
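As a hedged illustration (assuming scikit-learn; the data, noise level, and degrees are arbitrary choices), the degree of a polynomial model fixes the size of the hypothesis space, and train/test scores show under- and overfitting at the extremes:

```python
# Sketch: the polynomial degree selects the hypothesis space. A degree that
# is too low underfits; one that is too high fits noise and overfits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=40)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in (1, 4, 15):  # small, moderate, and large hypothesis spaces
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(f"degree {degree:2d}: train R^2 = {model.score(x_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(x_te, y_te):.2f}")
```

Typically degree 1 scores poorly everywhere (underfit), while degree 15 scores near 1.0 on training data but much worse on the test split (overfit).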
INDUCTIVE BIAS
Inductive bias is the set of assumptions a learning algorithm relies on to generalize from training data to unseen examples. The strength of these assumptions shows up in two related sources of error:
1. Bias:
Definition: Bias refers to the error introduced by overly simplistic assumptions made by the model.
Effect: High bias means the model is too simple to capture the underlying patterns
in the data, leading to underfitting.
Result: A model with high bias will have poor performance on both training and test
data because it cannot represent the complexity of the data.
2. Variance:
Definition: Variance refers to the error introduced by the model’s sensitivity to small
fluctuations or noise in the training data.
Effect: High variance means the model is too complex and overfits, learning the
noise and details of the training set rather than the underlying pattern.
Result: A model with high variance will perform well on training data but poorly on unseen test data.
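For squared-error loss, these two error sources combine with irreducible noise in the standard decomposition (f is the true function, \hat{f} the learned model, \sigma^2 the noise variance):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \sigma^2
```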
BIAS-VARIANCE TRADE-OFF VISUALIZATION
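Since the original figure is not reproduced here, a schematic sketch (assuming matplotlib; the curves are illustrative shapes, not measured values) conveys the usual picture: as model complexity grows, squared bias falls while variance rises, so total error is U-shaped.

```python
# Schematic bias-variance trade-off plot. The curve shapes are illustrative
# stand-ins for the qualitative behavior, not results from a fitted model.
import numpy as np
import matplotlib.pyplot as plt

complexity = np.linspace(0.1, 10, 200)
bias_sq = 1.0 / complexity          # schematic: bias shrinks with complexity
variance = 0.05 * complexity        # schematic: variance grows with complexity
total = bias_sq + variance + 0.1    # plus irreducible noise

plt.plot(complexity, bias_sq, label="bias$^2$")
plt.plot(complexity, variance, label="variance")
plt.plot(complexity, total, label="total error")
plt.xlabel("model complexity")
plt.ylabel("expected error")
plt.legend()
plt.show()
```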