
Module B Handbook

Minor In AI
March 2, 2025

Introduction to Supervised Learning


Supervised learning is a type of machine learning where a model is trained on a labeled dataset, meaning
each training example is paired with the correct output. The goal is for the model to learn a mapping from
inputs to outputs so it can make accurate predictions on new, unseen data.

Let's dive into the key concepts of supervised learning:

1 Linear Regression
Linear Regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (features). It assumes a linear relationship between the variables.

1.1 Types of Linear Regression


• Simple Linear Regression – One independent variable.

• Multiple Linear Regression – Multiple independent variables.

1.2 Equation of Linear Regression


For Simple Linear Regression:

y = mx + c (1)

where:

• y = Dependent variable

• x = Independent variable

• m = Slope of the line (coefficient)

• c = Intercept

For Multiple Linear Regression:

y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n    (2)

where:

• b_0 is the intercept

• b_1, b_2, \dots, b_n are the coefficients of the independent variables


1.3 Objective Function


The goal is to minimize the Mean Squared Error (MSE):
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (3)

where:

• y_i is the actual value

• ŷ_i is the predicted value

• n is the number of observations
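As a quick numerical illustration, the MSE can be computed in a couple of lines of NumPy (a minimal sketch; y_true and y_pred are placeholder arrays):

```python
import numpy as np

# Actual and predicted values (placeholder data for illustration only)
y_true = np.array([3.0, 5.0, 7.1, 9.2])
y_pred = np.array([2.8, 5.3, 6.9, 9.5])

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(f"MSE: {mse:.4f}")
```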

1.4 Finding the Best Fit Line


1.4.1 Gradient Descent

b_j = b_j - \alpha \frac{\partial}{\partial b_j} J(b)    (4)
where:

• α is the learning rate

• J(b) is the cost function (MSE)

1.5 Assumptions of Linear Regression


• Linearity – The relationship between the independent and dependent variables is linear.

• Independence – Observations are independent of each other.

• Homoscedasticity – Constant variance of errors.

• Normality of Residuals – Residuals should be normally distributed.

• No Multicollinearity – Independent variables should not be highly correlated.

2 Polynomial Regression
2.1 Introduction
Polynomial Regression is an extension of Linear Regression in which the relationship between the variables is modeled as a polynomial of degree n. It helps in capturing non-linear relationships.

2.2 Equation of Polynomial Regression


y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n    (5)
where:

• x^n represents the polynomial terms

• b_0, b_1, \dots, b_n are the coefficients

2.3 Why Use Polynomial Regression?


• When data is non-linear and a straight line does not fit well.

• It allows flexibility in curve fitting by increasing the degree of the polynomial.

2.4 Transforming Data for Polynomial Regression


Since Polynomial Regression is still a form of linear regression (in terms of coefficients),
we transform the features:

• Convert x into x, x^2, x^3, \dots

• Then apply Linear Regression on the transformed data (see the sketch below).
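As an illustration, this transform-then-fit approach can be sketched with scikit-learn's PolynomialFeatures and LinearRegression (the synthetic data, degree, and noise level below are arbitrary choices):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic non-linear data: y = 1 + 2x - 3x^2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = 1 + 2 * X[:, 0] - 3 * X[:, 0] ** 2 + rng.normal(0, 0.5, size=100)

# Expand x into [x, x^2], then fit ordinary linear regression on the expanded features
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(X, y)

print(model.named_steps["linearregression"].coef_)       # approximately [2, -3]
print(model.named_steps["linearregression"].intercept_)  # approximately 1
```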

2.5 Choosing the Degree of the Polynomial


• Degree = 1 → Linear Regression

• Degree = 2 → Quadratic Regression

• Degree = 3 → Cubic Regression, etc.

Higher degrees may lead to overfitting.

2.6 Overfitting vs. Underfitting


• Underfitting: Model is too simple (high bias, low variance).

• Overfitting: Model is too complex (low bias, high variance).

• Choose an optimal degree using cross-validation (a short sketch follows below).
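One possible sketch of degree selection: score a few candidate degrees with k-fold cross-validation and keep the one with the lowest validation error (the synthetic data and the candidate range 1–5 are arbitrary choices):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = 1 + 2 * X[:, 0] - 3 * X[:, 0] ** 2 + rng.normal(0, 0.5, size=100)

# Evaluate degrees 1..5 with 5-fold CV; scoring is negative MSE (higher is better)
for degree in range(1, 6):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree={degree}: CV MSE = {-scores.mean():.3f}")
```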

2.7 Comparison: Linear vs. Polynomial Regression

Feature            | Linear Regression | Polynomial Regression
-------------------|-------------------|------------------------
Relationship       | Linear            | Non-linear
Model Complexity   | Simple            | Complex
Overfitting Risk   | Low               | High for large degrees
Interpretability   | Easy              | Harder for high degrees

Table 1: Comparison of Linear and Polynomial Regression



3 Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the cost function in
machine learning and deep learning models. It iteratively adjusts model parameters to
find the best fit for the data.

3.1 Why Gradient Descent?


• For simple models, we can use Ordinary Least Squares (OLS) to find the best-fit parameters directly.

• However, for complex models (high-dimensional data, deep learning), OLS is computationally expensive, so we use Gradient Descent.

3.2 How Gradient Descent Works?


1. Initialize parameters (θ) randomly or with zeros.

2. Compute the cost function J(θ).

3. Compute the gradient (derivative) of the cost function.

4. Update parameters using:

   \theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)

   where:

   • \alpha is the learning rate

   • \frac{\partial}{\partial \theta_j} J(\theta) is the gradient of the cost function

5. Repeat until convergence (cost function stops changing significantly).

3.3 Cost Function for Linear Regression


J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2

where:

• h_\theta(x) = \theta_0 + \theta_1 x (hypothesis function)

• m = number of training examples

3.4 Gradient Descent Update Rule


For Linear Regression, the gradient descent update rule is:

\theta_j = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i) \, x_{i,j}

where x_{i,j} is the j-th feature of the i-th example (with x_{i,0} = 1 for the intercept term).
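A minimal NumPy sketch of batch gradient descent for simple linear regression, following the cost function and update rule above (the synthetic data, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np

# Synthetic data roughly following y = 4 + 3x
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 0.3, size=100)

theta0, theta1 = 0.0, 0.0   # initialize parameters
alpha = 0.1                 # learning rate

for _ in range(2000):
    h = theta0 + theta1 * x          # hypothesis h_theta(x)
    error = h - y
    # Gradients of J(theta) = (1/2m) * sum((h - y)^2)
    grad0 = error.mean()
    grad1 = (error * x).mean()
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)  # should approach roughly 4 and 3
```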

3.5 Types of Gradient Descent


• Batch Gradient Descent (BGD): Uses all training examples in each iteration.

• Stochastic Gradient Descent (SGD): Updates parameters after each training example.

• Mini-Batch Gradient Descent: Uses a small batch of data in each iteration.

3.6 Challenges in Gradient Descent


• Choosing the Learning Rate (α): If too high, the algorithm diverges; if too low, convergence is slow.

• Local Minima and Saddle Points: Momentum-based optimizers like Adam can help.

• Feature Scaling: Gradient Descent converges faster when features are standardized.

4 Regularization
Regularization is a technique used to prevent overfitting in machine learning models
by adding a penalty to large coefficients.

4.1 Why Regularization?


• In high-dimensional models, some features may capture noise rather than actual patterns.

• Regularization shrinks coefficients, ensuring the model generalizes well to unseen data.

4.2 Types of Regularization


4.2.1 L1 Regularization (Lasso Regression)
L1 Regularization adds the absolute value of coefficients as a penalty:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2 + \lambda \sum_{j=1}^{n} |\theta_j|

• Encourages sparsity (some coefficients become exactly zero).

• Useful for feature selection.



4.2.2 L2 Regularization (Ridge Regression)


L2 Regularization adds the square of coefficients as a penalty:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2 + \lambda \sum_{j=1}^{n} \theta_j^2

• Does not eliminate coefficients but shrinks them towards zero.


• Useful when all features contribute but need regularization.

4.2.3 Elastic Net (Combination of L1 and L2)


Elastic Net combines L1 (Lasso) and L2 (Ridge) regularization:
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2 + \lambda_1 \sum_{j=1}^{n} |\theta_j| + \lambda_2 \sum_{j=1}^{n} \theta_j^2

• Helps when highly correlated features exist.

4.3 Choosing Between Ridge, Lasso, and Elastic Net

Method       | Effect on Coefficients                   | Use Case
-------------|------------------------------------------|--------------------------------
Lasso (L1)   | Some coefficients become exactly zero    | Feature selection
Ridge (L2)   | Shrinks coefficients, none become zero   | When all features are important
Elastic Net  | Mix of Ridge and Lasso                   | When features are correlated

Table 2: Comparison of Regularization Techniques
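All three penalties are available as estimators in scikit-learn; the sketch below compares how many coefficients each one drives to zero (the dataset and the alpha values, which play the role of λ, are placeholder choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.datasets import make_regression

# Synthetic regression problem with only a few informative features
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

for name, model in [("Lasso (L1)", Lasso(alpha=1.0)),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("Elastic Net", ElasticNet(alpha=1.0, l1_ratio=0.5))]:
    model.fit(X, y)
    n_zero = np.sum(np.isclose(model.coef_, 0.0))
    print(f"{name}: {n_zero} of {len(model.coef_)} coefficients are (near) zero")
```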

5 Classification
Classification is a supervised learning task where the goal is to assign a given input into
one of several predefined categories. The model learns from labeled training data and
predicts the category for unseen data.

5.1 Key Steps in Classification


1. Data Preprocessing
• Handling Missing Data
• Feature Scaling
• Encoding Categorical Variables
• Feature Selection
2. Splitting the Dataset (Train/Test Split)
3. Training the Model
4. Making Predictions
5. Evaluating the Model (a minimal sklearn sketch of these steps follows below)
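A minimal scikit-learn sketch of these five steps (the Iris dataset and the logistic-regression classifier are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load data and preprocess (feature scaling)
X, y = load_iris(return_X_y=True)

# 2. Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Train the model
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 4. Make predictions and 5. evaluate
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```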

5.2 Performance Evaluation Metrics


5.2.1 Confusion Matrix

Actual \ Predicted | Positive (1)        | Negative (0)
-------------------|---------------------|---------------------
Positive (1)       | True Positive (TP)  | False Negative (FN)
Negative (0)       | False Positive (FP) | True Negative (TN)

Table 3: Confusion Matrix

5.2.2 Accuracy
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (6)

5.2.3 Precision, Recall, and F1-Score

Precision = \frac{TP}{TP + FP}    (7)

Recall = \frac{TP}{TP + FN}    (8)

F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}    (9)
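These metrics can be computed from predicted labels with scikit-learn's metrics module (a minimal sketch on placeholder label arrays):

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Placeholder ground-truth labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))          # rows: actual, columns: predicted
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
```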

6 Support Vector Machines (SVM)


Support Vector Machines (SVM) is a supervised learning algorithm used for both
classification and regression tasks. It is particularly effective in high-dimensional
spaces and works well when the number of features is greater than the number of samples.

6.1 Why Use SVM?


• Works well for small datasets.

• Effective for high-dimensional data.

• Can model non-linear relationships using kernel functions.

• Finds an optimal decision boundary by maximizing the margin between classes.

6.2 Understanding the Hyperplane in SVM


A hyperplane is a decision boundary that separates different classes.

• For 2D Data → The hyperplane is a straight line.

• For 3D Data → The hyperplane is a plane.

• For higher dimensions → The hyperplane is a mathematical construct.

Goal of SVM: Find the hyperplane that maximizes the margin (distance between
the closest points from both classes, called support vectors).

6.3 Mathematical Formulation of SVM


Given a dataset with features X and labels y, where y_i ∈ {−1, 1}, SVM finds a hyperplane defined by:

f(x) = w^T x + b

where:
• w is the weight vector.
• b is the bias term.
• x is the input vector.

The objective of SVM is to maximize the margin while ensuring correct classification:

y_i (w^T x_i + b) ≥ 1, ∀i

6.4 Optimization Problem


\min_{w,b} \; \frac{1}{2} \|w\|^2

subject to:

y_i (w^T x_i + b) ≥ 1, ∀i

This quadratic optimization problem ensures that the margin is maximized while keeping misclassifications to a minimum.

6.5 Types of SVM


6.5.1 Hard Margin SVM (No Misclassification Allowed)
• Assumes data is perfectly linearly separable.
• Finds a hyperplane with maximum margin.
• Not robust to noise or outliers.

6.5.2 Soft Margin SVM (Allows Some Misclassification)


• Used when data is not perfectly separable.
• Introduces a regularization parameter C to balance margin width and misclassification.
• Higher C → Less tolerance for misclassification (more overfitting).
• Lower C → More tolerance for misclassification (better generalization).

Objective function with slack variables ξ:

\min_{w,b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i

subject to:

y_i (w^T x_i + b) ≥ 1 − \xi_i, ∀i

where ξ_i are slack variables that allow misclassification.

6.6 SVM with Non-Linearly Separable Data (Kernel Trick)


If data is not linearly separable, SVM uses the Kernel Trick to transform the input
space into a higher-dimensional space where it becomes separable.

6.6.1 Common Kernel Functions

Kernel                              | Formula                                  | Use Case
------------------------------------|------------------------------------------|----------------------------------
Linear Kernel                       | K(x_i, x_j) = x_i^T x_j                  | Linearly separable data
Polynomial Kernel                   | K(x_i, x_j) = (x_i^T x_j + c)^d          | Captures curved relationships
Radial Basis Function (RBF) Kernel  | K(x_i, x_j) = exp(−γ ||x_i − x_j||^2)    | Handles non-linear relationships
Sigmoid Kernel                      | K(x_i, x_j) = tanh(α x_i^T x_j + c)      | Similar to neural networks

RBF Kernel is the most commonly used because it can model complex
decision boundaries.
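A minimal scikit-learn sketch of an SVM with the RBF kernel on a toy non-linear dataset (the dataset and the C and gamma values are illustrative choices):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Non-linearly separable toy data
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF kernel; C balances margin width vs. misclassification, gamma controls kernel width
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
print("Support vectors per class:", clf.n_support_)
```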

6.7 Regularization in SVM (C Parameter)


• The C parameter controls the trade-off between margin width and misclassification errors.

• High C → Tries to classify all points correctly (can overfit).

• Low C → Allows some misclassifications but generalizes better.

6.8 Advantages and Disadvantages of SVM


6.8.1 Advantages
• Works well with high-dimensional data.

• Effective when the number of features > the number of samples.

• Uses kernels to model non-linear relationships.

6.8.2 Disadvantages
• Computationally expensive for large datasets.

• Performance depends on kernel choice and hyperparameter tuning.

• Sensitive to imbalanced data (use class_weight='balanced').


7 Logistic Regression
Logistic regression is used for binary classification problems. It models the probability
that a given input x belongs to class 1.

Model Equation
The model estimates the probability using the sigmoid function:
P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = w^T x + b

Loss Function (Binary Cross-Entropy)


L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
where:
• y^{(i)} is the true label
• ŷ^{(i)} is the predicted probability
• N is the number of training examples
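A minimal NumPy sketch of the sigmoid and the binary cross-entropy loss defined above (the weights, bias, and examples are placeholder values):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y_true, y_prob, eps=1e-12):
    # L = -(1/N) * sum[y log(y_hat) + (1 - y) log(1 - y_hat)]
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Placeholder weights, bias, and a small batch of examples
w, b = np.array([0.5, -0.25]), 0.1
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])
y = np.array([1, 0, 1])

y_prob = sigmoid(X @ w + b)   # P(y = 1 | x)
print("Predicted probabilities:", y_prob)
print("BCE loss:", bce_loss(y, y_prob))
```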

8 Principal Component Analysis (PCA)


PCA is a dimensionality reduction technique that projects data onto directions (principal
components) that maximize variance.

Steps
1. Center the data:

   X_{centered} = X − \bar{X}

2. Compute the covariance matrix:

   C = \frac{1}{n} X_{centered}^T X_{centered}

3. Compute the eigenvectors and eigenvalues of C.

4. Select the top k eigenvectors to form the projection matrix W_k.

5. Project the data onto the lower-dimensional space (a minimal NumPy sketch follows these steps):

   Z = X_{centered} W_k
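The five steps map almost line-for-line onto NumPy; a minimal sketch (the data matrix and k are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # placeholder data: 200 samples, 5 features
k = 2                           # number of principal components to keep

# 1. Center the data
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix C = (1/n) X_c^T X_c
C = (X_centered.T @ X_centered) / X.shape[0]

# 3. Eigen-decomposition (eigh works for symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Top-k eigenvectors (eigh returns ascending eigenvalues, so reverse and take the first k)
W_k = eigvecs[:, ::-1][:, :k]

# 5. Project the data
Z = X_centered @ W_k
print(Z.shape)  # (200, 2)
```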

9 PageRank Algorithm
PageRank measures the importance of a node (web page) in a directed graph based on
incoming links.

PageRank Formula

PR(A) = (1 − d) + d \sum_{i=1}^{n} \frac{PR(B_i)}{L(B_i)}
where:

• PR(A) is the PageRank of page A

• d is the damping factor (typically 0.85)

• B_i are the pages linking to A

• L(B_i) is the number of outbound links from page B_i
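A minimal sketch of iterative PageRank on a small made-up directed graph, applying the formula above to every node until the ranks stabilize:

```python
# Toy directed graph: each key links to the pages in its list (hypothetical example)
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

d = 0.85                         # damping factor
pages = list(links)
pr = {p: 1.0 for p in pages}     # initial ranks

for _ in range(50):              # iterate until (approximately) converged
    new_pr = {}
    for page in pages:
        # Sum PR(B_i) / L(B_i) over all pages B_i that link to this page
        incoming = sum(pr[src] / len(links[src]) for src in pages if page in links[src])
        new_pr[page] = (1 - d) + d * incoming
    pr = new_pr

print(pr)  # C should rank highest: it receives the most incoming links
```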
