Machine Learning
Table of Contents
1. Recap
Machine Perception
Pattern Recognition
Key ML Concepts
3. Supervised Learning
Classification
Regression
4. Nearest-Neighbor Classification
5. The Perceptron
6. Support Vector Machines (SVMs)
7. Regression
8. Evaluation of Hypotheses
Loss Functions
Overfitting
9. Cross-Validation and Regularization
1. Recap
Key Topics Covered in Previous Lectures
Search Problems: Strategies for state-space exploration.
Machine Perception and Pattern Recognition:
Examples: speech recognition, fingerprint identification.
Challenges:
Machine Learning Core
Definition: Deriving insights from raw data to make decisions or predictions.
3. Supervised Learning
Overview
Goal: Learn a function to predict outcomes from labeled input-output pairs.
Process
1. Training Phase: Learn a function f(x) from the labeled input-output pairs in the training set.
2. Testing Phase: Apply f(x) to unseen inputs and compare its predictions with the true labels (a minimal sketch of this workflow follows).
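As a concrete illustration of the two phases, here is a minimal sketch; scikit-learn, the iris dataset, and the k-NN classifier are illustrative assumptions, not something the notes prescribe.

```python
# Minimal sketch of the training/testing workflow (illustrative tools, not prescribed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # labeled input-output pairs

# Training phase: learn f(x) from labeled pairs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Testing phase: apply f(x) to unseen inputs and compare with the true labels.
print("Test accuracy:", model.score(X_test, y_test))
```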
Classification
Assigns a discrete category (class) to inputs.
Examples:
Spam detection (binary classification).
Features:
1. Histograms of Features
The histograms illustrate the distribution of features such as lightness and length for the fish dataset.
Figure: Histogram of Lightness.
Figure: Histogram of Length.
Figure: scatter plot of the training data in feature space, showing the axes (features), the individual data points, and the decision boundary.
Decision Boundary: A line or curve that separates the feature space into regions, one for each class.
1. Feature Vector: A combination of features that defines an object.
2. Features:
3. Classes (Labels):
4. Instance (Exemplar):
Classification Approaches:
1. Binary Classifiers:
2. Multi-class Classifiers:
3. Multi-label Classifiers: e.g., recognizing multiple objects in an image.
Unlike binary classification, where the task is to separate two classes, multi-
class classification must identify one out of several possible classes for a
given input.
Challenges:
2. Image Classification:
Output: Label corresponding to the type of animal (e.g., cat, dog, or bird).
3. Object Detection:
1. One-Versus-All (OvA):
Approach: Train one binary classifier per class, treating that class as positive and all other classes as negative; a minimal sketch follows this list.
Example:
Handwritten digits: Train 10 classifiers, one for each digit (0, 1, ..., 9). For a given input, each classifier outputs a score, and the class with the highest score is chosen.
Advantages:
Simple to implement.
Disadvantages:
2. One-Versus-One (OvO):
Approach: Train one binary classifier for each pair of classes, i.e. $\binom{n}{2} = \frac{n(n-1)}{2}$ classifiers for n classes, and combine their votes.
Example:
Advantages:
Each classifier focuses on distinguishing between two classes, making it potentially more accurate.
Disadvantages:
The number of classifiers grows quadratically with the number of classes.
Outputs probabilities for each class; the class with the highest probability is selected, rather than a sharp 0-or-1 vote. This makes the model more flexible.
5. Neural Networks:
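To make the one-versus-all strategy concrete, here is a minimal sketch on the handwritten-digit example; scikit-learn's OneVsRestClassifier and logistic regression are illustrative choices, not part of the notes.

```python
# One-versus-all sketch for the 10-digit example (scikit-learn is an assumed tool).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)  # 10 classes: digits 0..9
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trains one binary classifier per digit; the class with the highest score wins.
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
print("Binary classifiers trained:", len(ova.estimators_))  # 10
print("Test accuracy:", ova.score(X_test, y_test))
```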
Challenges in Multi-Class Classification
1. Class Imbalance:
Some classes may have significantly more samples than others, leading to
biased models.
2. Overlapping Classes:
Solution: Use advanced models like Support Vector Machines (SVMs) with
non-linear kernels or deep learning.
3. Scalability:
2. Multi-Label Classifiers:
Example: Identifying all objects in an image (e.g., both a cat and a dog).
Summary of Techniques
Technique           | Advantages                   | Disadvantages
Logistic Regression | Probabilistic interpretation | Assumes linear separability
4. Nearest-Neighbor Classification
Rule: Nearest-neighbor classification involves assigning a class label to a new data point based on the label of its closest point (prototype) in the feature space.
Example:
k-Nearest Neighbor:
Algorithm (a minimal sketch follows these steps):
1. Determine k: Choose the number of neighbors, k.
2. Compute Distance: Measure the distance between the test point and all training points using a distance metric.
3. Find Neighbors: Identify the k closest neighbors based on the computed distances.
4. Majority Voting: Determine the most common class among the k neighbors and assign it to the test point.
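A minimal NumPy sketch of the four steps above (the toy data and variable names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    # 2. Compute Euclidean distances from the test point to all training points.
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # 3. Find the indices of the k closest neighbors.
    neighbor_idx = np.argsort(distances)[:k]
    # 4. Majority voting among the neighbors' labels.
    labels, counts = np.unique(y_train[neighbor_idx], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two small clusters with labels 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3))  # -> 1
```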
Choosing k:
Small k: Sensitive to noise and outliers.
Large k: Smoother decision boundaries, but distant points from other classes can dilute the vote.
Distance Metrics (sketched in code below):
1. Euclidean Distance:
Formula: $d(x_1, x_2) = \sqrt{\sum_{i=1}^{d} (x_{1i} - x_{2i})^2}$
2. Manhattan Distance:
Formula: $d(x_1, x_2) = \sum_{i=1}^{d} |x_{1i} - x_{2i}|$
3. Other Metrics:
Minkowski Distance (a generalization of Euclidean and Manhattan distance).
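The three metrics as small NumPy functions (a sketch; the example vectors are made up):

```python
import numpy as np

def euclidean(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

def manhattan(x1, x2):
    return np.sum(np.abs(x1 - x2))

def minkowski(x1, x2, p=3):
    # p = 2 recovers Euclidean distance, p = 1 recovers Manhattan distance.
    return np.sum(np.abs(x1 - x2) ** p) ** (1.0 / p)

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))       # 5.0
print(manhattan(a, b))       # 7.0
print(minkowski(a, b, p=2))  # 5.0, same as Euclidean
```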
Strengths:
Simple to understand and implement.
Weaknesses:
Computationally expensive for large datasets.
Practical Application:
Recommender Systems:
Medical Diagnosis:
5. The Perceptron
The Perceptron rule updates weights $w_i$ as follows:
$w_i = w_i + \alpha (y - \hat{y}) x_i$
where:
y: True label.
$\hat{y}$: Predicted label.
$\alpha$: Learning rate.
Algorithm (a minimal sketch appears at the end of this section):
1. Initialize Weights:
2. Predict:
$\hat{y} = \begin{cases} 1 & \text{if } w \cdot x + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$
3. Update Weights: if $\hat{y} \ne y$, update
$w = w + \alpha (y - \hat{y}) x$
Features:
1. Threshold-Based Classification:
Example:
Dataset:
Points: (2,3) labeled 1, (1,1) labeled 0.
Iteration 1:
Predicted $\hat{y}$ for (2,3): 0 (incorrect).
Update:
Applications:
Optical character recognition (OCR).
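A minimal NumPy sketch of the perceptron training loop above; the initialization (w = 0, b = 0), learning rate, and epoch count are assumptions, since the worked example in the notes does not state them.

```python
import numpy as np

def train_perceptron(X, y, alpha=1.0, epochs=10):
    w, b = np.zeros(X.shape[1]), 0.0   # assumed initialization
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = 1 if np.dot(w, x_i) + b >= 0 else 0  # threshold-based prediction
            if y_hat != y_i:                             # update only on mistakes
                w += alpha * (y_i - y_hat) * x_i
                b += alpha * (y_i - y_hat)
    return w, b

# Toy dataset from the example: (2,3) labeled 1, (1,1) labeled 0.
X = np.array([[2.0, 3.0], [1.0, 1.0]])
y = np.array([1, 0])
print(train_perceptron(X, y))
```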
6. Support Vector Machines (SVMs)
Overview:
SVMs are supervised learning models that find the optimal hyperplane to separate classes in feature space.
Key Concepts:
1. Margin: Distance between the hyperplane and the nearest data points (support vectors) from each class.
2. Hyperplane: $w \cdot x + b = 0$
3. Kernel Trick: Implicitly maps the data into a higher-dimensional space where a linear separator can be found.
Common Kernels:
Linear.
Polynomial.
SVM Optimization:
1. Objective:
Minimize $\frac{1}{2} \|w\|^2$
subject to the constraints $y_i (w \cdot x_i + b) \ge 1 \quad \forall i$.
2. Slack Variables:
Strengths:
Effective for high-dimensional data.
Weaknesses:
Sensitive to parameter choices (e.g., C and kernel parameters).
Applications:
1. Text Classification:
2. Image Recognition:
Face detection.
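A minimal SVM sketch with scikit-learn; the synthetic data, the RBF kernel, and the C value are assumptions for illustration, echoing the note that results are sensitive to C and kernel parameters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)  # try kernel="linear" or "poly" as well
clf.fit(X_train, y_train)
print("Number of support vectors:", clf.support_vectors_.shape[0])
print("Test accuracy:", clf.score(X_test, y_test))
```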
7. Regression
Regression is a supervised learning technique used to predict a continuous output value based on input features. The goal is to find a function f(x) that maps inputs (x) to outputs (y).
Types of Regression:
1. Linear Regression:
Hypothesis Function:
$h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
where:
y: Actual value.
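A NumPy sketch of the hypothesis and cost function above; the toy house-size data is made up for illustration.

```python
import numpy as np

def predict(theta0, theta1, x):
    return theta0 + theta1 * x             # h_theta(x) = theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    m = len(x)
    errors = predict(theta0, theta1, x) - y
    return (errors ** 2).sum() / (2 * m)   # J(theta) = (1 / 2m) * sum of squared errors

x = np.array([50.0, 80.0, 120.0])          # e.g. house size
y = np.array([150.0, 240.0, 360.0])        # e.g. price
print(cost(0.0, 3.0, x, y))                # 0.0: theta = (0, 3) fits this toy data exactly
```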
2. Polynomial Regression:
Example:
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
3. Logistic Regression:
Hypothesis Function:
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
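A short NumPy sketch of the sigmoid hypothesis; the parameter values are illustrative.

```python
import numpy as np

def h_theta(theta, x):
    # Sigmoid of theta^T x, read as an estimate of P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

theta = np.array([0.5, -0.25])       # illustrative parameters
x = np.array([2.0, 4.0])
p = h_theta(theta, x)
print(p, "-> class", int(p >= 0.5))  # threshold at 0.5 for a hard label
```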
Steps (house-price example):
1. Collect training data of house sizes and their prices.
2. Fit the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ to the data.
3. Use the model to predict the price for a house of a given size.
In cases where linear regression gives high error, try to increase the model's complexity little by little (make it polynomial), as sketched below.
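One way to increase complexity gradually is to raise the polynomial degree and watch the training error; np.polyfit and the data below are illustrative choices.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 4.1, 8.9, 16.2, 24.8])  # roughly quadratic, so degree 1 underfits

for degree in (1, 2):                      # try linear first, then a polynomial
    coeffs = np.polyfit(x, y, deg=degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.3f}")
```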
8. Evaluation of Hypotheses
Evaluation metrics quantify how well a model's predictions match actual
outcomes.
Loss Functions:
1. 0-1 Loss:
Used in classification.
Formula: $L(y, \hat{y}) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{otherwise} \end{cases}$
Formula:
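A small sketch of the 0-1 loss averaged over a batch of predictions (this average is just the misclassification rate):

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    return np.mean(y_true != y_pred)  # 0 per correct prediction, 1 per mistake

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0])
print(zero_one_loss(y_true, y_pred))  # 0.5: two of four predictions are wrong
```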
Overfitting:
Model fits the training data too closely, failing to generalize to new data.
Underfitting:
Model is too simple to capture the underlying pattern, performing poorly even on the training data.
1. Train/Test Split:
Split the dataset into training (used for learning) and testing (used for evaluation).
2. Cross-Validation:
Divide the dataset into k subsets and train/test the model k times.
Each subset serves as the test set once (more on this below).
9. Cross-Validation and Regularization
Cross-Validation
1. Holdout Method:
2. K-Fold Cross-Validation:
Train the model k times, each time using a different subset as the test set and the rest as the training set (a minimal sketch follows this list).
3. Leave-One-Out Cross-Validation:
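A minimal k-fold cross-validation sketch; cross_val_score, the iris data, logistic regression, and k = 5 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each of the k = 5 folds serves as the test set exactly once.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```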
Regularization
Regularization prevents overfitting by penalizing overly complex models.
Techniques: (EXTRA)
1. L2 Regularization (Ridge Regression):
Modified Cost Function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2$
2. L1 Regularization (Lasso):
3. Elastic Net:
Applications:
Helps in feature selection by reducing less important features.
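A short ridge-regression sketch with scikit-learn; alpha plays the role of λ above, and both its value and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=30)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # larger alpha = stronger penalty on theta_j^2

print("Unregularized coefficients:", np.round(plain.coef_, 2))
print("Ridge coefficients:        ", np.round(ridge.coef_, 2))  # shrunk toward zero
```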
Gradient Descent
Key Steps:
1. Initialize Parameters:
2. Compute Gradients:
Calculate the partial derivatives of the cost function with respect to each parameter: $\frac{\partial J(\theta)}{\partial \theta_j}$
3. Update Parameters:
$\theta_j = \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$
Variants:
1. Batch Gradient Descent:
Choosing the Learning Rate α:
1. Too Small: Convergence is slow.
2. Too Large: Updates may overshoot the minimum and fail to converge.
Application Example:
Linear Regression with Gradient Descent:
Dataset: House prices.
Hypothesis:
$h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Update Rule:
$\theta_0 = \theta_0 - \alpha \frac{\partial J(\theta)}{\partial \theta_0}, \qquad \theta_1 = \theta_1 - \alpha \frac{\partial J(\theta)}{\partial \theta_1}$
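A NumPy sketch of batch gradient descent for this example; the house-price data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

x = np.array([50.0, 80.0, 120.0, 160.0])    # house size
y = np.array([155.0, 245.0, 355.0, 470.0])  # price

# Feature scaling (see Practical Challenges below) keeps the step size well-behaved.
x_scaled = (x - x.mean()) / x.std()

theta0, theta1, alpha, m = 0.0, 0.0, 0.1, len(x)
for _ in range(1000):
    errors = (theta0 + theta1 * x_scaled) - y   # h_theta(x) - y
    grad0 = errors.sum() / m                    # dJ/dtheta0
    grad1 = (errors * x_scaled).sum() / m       # dJ/dtheta1
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

final_cost = (((theta0 + theta1 * x_scaled) - y) ** 2).sum() / (2 * m)
print(f"theta0={theta0:.2f}, theta1={theta1:.2f}, J(theta)={final_cost:.2f}")
```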
Practical Challenges:
1. Local Minima: For non-convex cost functions, gradient descent can get stuck in a local minimum.
2. Feature Scaling: Features on very different scales slow convergence; standardize or normalize them before running gradient descent.