Unit 4-1
Parametric Estimation
• A single model is defined for the entire input space (the domain of all
possible inputs).
• The parameters of this model are learned using the entire training dataset.
• Once the model is trained, the same set of parameters is applied to any
new test input, regardless of its specific location in the input space.
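As a concrete illustration, ordinary least-squares linear regression is parametric: one weight vector is fit once on all of the training data and then reused unchanged for every test input. A minimal sketch (the toy data below is made up for illustration):

import numpy as np

# Toy training data (illustrative only): y is roughly 2*x + 1 plus noise.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit ONE global model: add a bias column and solve least squares for w.
A = np.column_stack([X_train, np.ones(len(X_train))])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# The same learned parameters w are applied to any new test input,
# regardless of where it falls in the input space.
x_test = 2.5
print("parameters:", w, "prediction at x=2.5:", w[0] * x_test + w[1])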
Nonparametric Estimation
• The input space is divided into local regions based on a distance measure
(e.g., the Euclidean norm, which measures how far points are from each
other in space).
• For each test input, a local model is created using the training data within
its corresponding region.
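For example, a simple kernel (local-averaging) regressor forms its estimate around each test point from the nearby training samples; the Gaussian weighting and the width h below are arbitrary choices made for this sketch:

import numpy as np

def kernel_regress(x_test, X_train, y_train, h=0.5):
    # Weight each training target by how close its input is to the test
    # input (Euclidean distance), so only the local region really matters.
    d = np.linalg.norm(X_train - x_test, axis=1)
    weights = np.exp(-(d / h) ** 2)
    return np.sum(weights * y_train) / np.sum(weights)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(kernel_regress(np.array([1.5]), X_train, y_train))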
Instance-Based Models
• Nonparametric models that rely directly on the training data instances to make
predictions (e.g., k-nearest neighbors).
• Prediction requires calculating the distance from the test input to every training
instance, which is computationally expensive, with a time complexity of O(N),
where N is the number of training instances.
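A bare-bones k-nearest-neighbor classifier makes that O(N) cost explicit: every single prediction scans all N training instances to compute distances (the toy points and labels are invented for the example):

import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    # O(N): distance from the test input to every training instance.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k closest points
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(np.array([0.5, 0.5]), X_train, y_train))   # -> "A"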
Decision Tree
• applies a divide-and-conquer strategy to solve problems.
• Decision Nodes: Each node applies a test function to the input data.
• Leaf Nodes: The tree ends here, and the value stored in the leaf is the
final output for the input (see the tree sketch after this list).
• A learning algorithm takes a labeled training dataset (data where the
inputs and their corresponding outputs are known) to construct the
decision tree.
• Alternatively, a rule base can be learned directly from the data, bypassing the tree structure.
• if the data shows a complex relationship between input features and output
classes, the tree will add more splits, branches, and leaves to accurately
model those relationships.
• Entropy quantifies the amount of disorder or uncertainty in a dataset.
A lower entropy means the dataset is more homogeneous (e.g., all
data points belong to the same class), while higher entropy indicates
more heterogeneity (e.g., data points are mixed across multiple
classes).
• Information Gain (IG) is the difference between the entropy of the
dataset before the split and the weighted entropy of the subsets after
the split (a small worked computation follows this list).
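To make the node roles concrete, here is a hypothetical hand-built tree for 2-D inputs (the test thresholds and class labels are invented): each decision node applies its test function and routes the input to a branch, and a leaf returns the final output.

# Hypothetical hand-built tree for inputs x = (x0, x1).
tree = {"test": lambda x: x[0] < 2.0,              # decision node
        "left":  {"leaf": "class_A"},              # leaf node
        "right": {"test": lambda x: x[1] < 1.0,    # another decision node
                  "left":  {"leaf": "class_B"},
                  "right": {"leaf": "class_C"}}}

def predict(node, x):
    if "leaf" in node:                    # leaf reached: return its stored value
        return node["leaf"]
    branch = "left" if node["test"](x) else "right"
    return predict(node[branch], x)

print(predict(tree, (1.0, 3.0)))   # -> class_A
print(predict(tree, (3.0, 0.5)))   # -> class_B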
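The split criterion can also be written out directly: entropy is H = -Σ_i p_i log2(p_i) over the class proportions, and information gain is the parent's entropy minus the size-weighted entropy of the subsets. A small sketch with made-up class labels:

import math
from collections import Counter

def entropy(labels):
    # H = -sum_i p_i * log2(p_i) over the class proportions.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    # IG = H(parent) - sum_j (|subset_j| / |parent|) * H(subset_j)
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["+", "+", "+", "-", "-", "-"]          # mixed classes: high entropy
split  = [["+", "+", "+"], ["-", "-", "-"]]      # pure subsets after the split
print(entropy(parent))                   # 1.0 bit
print(information_gain(parent, split))   # 1.0 bit: a perfect split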
Linear Discrimination
• the fundamental assumption is that instances of different classes can
be separated by a linear decision boundary.
• Class likelihoods: p(x | C_i), which likelihood-based approaches estimate for each class.
• Posterior densities: P(C_i | x), obtained from the likelihoods and the priors via
Bayes' rule; the discriminant-based approach instead models the class boundaries directly.
• Before using complex models like neural networks or kernel-based SVMs, first try
a linear discriminant model to check if it performs well.
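For reference, a linear discriminant assigns x to the class whose discriminant g_i(x) = w_i^T x + w_i0 is largest; the weights below are placeholders chosen for the sketch, not learned values.

import numpy as np

# Placeholder weights for three classes; in practice they would be learned.
W  = np.array([[ 1.0,  0.0],    # w_1
               [ 0.0,  1.0],    # w_2
               [-1.0, -1.0]])   # w_3
w0 = np.array([0.0, 0.5, 1.0])  # biases w_i0

def classify(x):
    g = W @ x + w0             # g_i(x) = w_i^T x + w_i0 for each class i
    return int(np.argmax(g))   # choose the class with the largest discriminant

print(classify(np.array([2.0, 0.0])))   # -> 0 (class 1 wins)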
Quadratic Discriminant Function:
• Allows for more complex decision boundaries than a linear model.
Advantages:
• Can capture more complex decision boundaries
Disadvantages:
• High computational cost
• Risk of overfitting
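Concretely, a quadratic discriminant has the form g_i(x) = x^T W_i x + w_i^T x + w_i0, so for d-dimensional inputs the parameter count per class grows roughly as O(d^2), which is where the extra cost and overfitting risk come from. A minimal sketch with placeholder parameters:

import numpy as np

def quadratic_discriminant(x, Wmat, w, w0):
    # g(x) = x^T W x + w^T x + w0 for one class.
    return x @ Wmat @ x + w @ x + w0

# Placeholder parameters for a 2-D input; real values would be estimated
# from the training data.
Wmat = np.array([[1.0, 0.2],
                 [0.2, 0.5]])
w    = np.array([0.3, -0.7])
w0   = 0.1
print(quadratic_discriminant(np.array([1.0, 2.0]), Wmat, w, w0))
# Per class: d*(d+1)/2 quadratic terms + d linear terms + 1 bias.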
• Alternative: map the data to a higher-dimensional space and apply a linear model there (sketched after the pros and cons below).
• Instead of directly using a quadratic discriminant, preprocess the input by
adding higher-order features.
• Original input: x = (x1, x2); with higher-order features added: z = (x1, x2, x1^2, x2^2, x1*x2).
Advantages:
• computational complexity is low
• More interpretable
Disadvantages:
• risk of overfitting
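The preprocessing can be this simple: map the original input x = (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2) and use an ordinary linear discriminant on z; a quadratic boundary in x-space is then linear in z-space. The weights in the sketch are placeholders:

import numpy as np

def quad_features(x):
    # Map (x1, x2) to the higher-order feature vector (x1, x2, x1^2, x2^2, x1*x2).
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# Linear discriminant applied in the expanded space: g(x) = v^T z(x) + v0.
v  = np.array([0.5, -0.2, 1.0, 1.0, -0.3])   # placeholder weights
v0 = -1.0
x  = np.array([1.0, 2.0])
z  = quad_features(x)
print(z, v @ z + v0)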
Generalized approach:
• Basis functions: g_i(x) = Σ_j w_ij · φ_ij(x), where each φ_ij(x) is a (possibly nonlinear) basis function of the input.
Examples:
• Polynomial basis: products of input powers, e.g., φ(x) = x1^2 or φ(x) = x1·x2
• Trigonometric basis: sine and cosine terms of the inputs, e.g., φ(x) = sin(x1) or φ(x) = cos(x1)
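Under the same assumption of placeholder weights, the generalized form just swaps in whatever basis functions are useful; the sketch mixes a constant, polynomial, and trigonometric basis:

import numpy as np

# Generalized linear discriminant: g(x) = sum_j w_j * phi_j(x).
basis = [lambda x: 1.0,                # constant term
         lambda x: x[0],               # linear term
         lambda x: x[0] * x[1],        # polynomial basis function
         lambda x: np.sin(x[1]),       # trigonometric basis function
         lambda x: np.cos(x[1])]
w = np.array([0.1, 0.5, -0.2, 1.0, 0.3])   # placeholder weights

def g(x):
    return sum(wj * phi(x) for wj, phi in zip(w, basis))

print(g(np.array([1.0, 2.0])))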
Geometry of the Linear Discriminant:
• Two classes: define g(x) = g1(x) − g2(x) = w^T x + w0 and choose C1 if g(x) > 0, C2 otherwise.
The hyperplane g(x) = 0 is the decision boundary; w is normal to it, and the distance
from a point x to the boundary is |g(x)| / ||w||.
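Numerically, the sign of g(x) picks the class and |g(x)| / ||w|| gives how far x lies from the decision boundary; w and w0 below are placeholder values:

import numpy as np

# Two-class linear discriminant g(x) = w^T x + w0 (placeholder parameters).
w, w0 = np.array([3.0, 4.0]), -5.0

def g(x):
    return w @ x + w0

x = np.array([2.0, 1.0])
label = "C1" if g(x) > 0 else "C2"        # sign of g(x) selects the class
dist  = abs(g(x)) / np.linalg.norm(w)     # distance of x to the hyperplane g = 0
print(label, dist)                        # g(x) = 5.0, distance = 1.0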