UNIT3 Machine Learning

The document provides an overview of various machine learning algorithms, focusing on supervised and unsupervised learning techniques such as Linear Regression, Support Vector Machines (SVM), Decision Trees, and Random Forests. It explains key concepts, equations, and methodologies associated with these algorithms, including their advantages, disadvantages, and applications. Additionally, it discusses unsupervised learning methods like clustering and association rule learning, highlighting their characteristics and evaluation metrics.


Supervised Learning Algorithms

Unsupervised Learning Algorithms


Machine Learning

Prof. Purvi Patel


[email protected]
What is Linear Regression?
• Definition:
• Linear regression is a type of supervised machine learning algorithm that computes the
linear relationship between the dependent variable and one or more independent
features by fitting a linear equation to observed data.

• Types:
• - Simple Linear Regression: One independent variable
• - Multiple Linear Regression: More than one independent variable
• - Univariate Linear Regression: One dependent variable
• - Multivariate Regression: More than one dependent variable
Types of Linear Regression
• Simple Linear Regression:
  • Equation: y = β₀ + β₁X
  • Variables:
    o y: Dependent variable
    o X: Independent variable
    o β₀: Intercept
    o β₁: Slope
• Multiple Linear Regression:
  • Equation: y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ
  • Variables:
    o y: Dependent variable
    o X₁, X₂, ..., Xₙ: Independent variables
    o β₀: Intercept
    o β₁, β₂, ..., βₙ: Slopes
• Objective:
  • To locate the best-fit line minimizing the error between predicted and actual values

What is the Best Fit Line?
• Best Fit Line: Provides a straight line representing the relationship between the dependent and independent variables
• Slope: Indicates the change in the dependent variable for a unit change in the independent variable(s)
• Equation: y = β₀ + β₁X
• Assumption:
  • Linear relationship between X (experience) and Y (salary)
Hypothesis Function in Linear Regression
• Equation: Ŷ = θ₁ + θ₂X
• For each training example: ŷᵢ = θ₁ + θ₂xᵢ
• Variables:
  o yᵢ: True values (dependent variable)
  o xᵢ: Input independent training data (independent variable)
  o ŷᵢ: Predicted values
  o θ₁: Intercept
  o θ₂: Coefficient of x
Cost Function for Linear Regression
• Definition:
  • The error or difference between the predicted value Ŷ and the true value Y
• Mean Squared Error (MSE):
  • Equation: J(θ) = (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)²
• Variables:
  o J(θ): Cost function
  o n: Number of data points
  o ŷᵢ: Predicted values
  o yᵢ: Actual values
Minimizing the Cost Function
• Objective:
  • Update θ₁ and θ₂ to minimize the error between predicted and true values
• Gradient Descent:
  • Iterative process to update θ₁ and θ₂ based on gradients calculated from the MSE
  • Ensures the MSE value converges to the global minimum
Gradient Descent for Linear Regression
• Process:
  • Calculate gradients: ∂J(θ)/∂θ₁ and ∂J(θ)/∂θ₂
  • Update parameters:
    o θ₁ ← θ₁ − α ∂J(θ)/∂θ₁
    o θ₂ ← θ₂ − α ∂J(θ)/∂θ₂
• Learning Rate (α):
  • Controls the step size in gradient descent
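
The update rule above can be illustrated with a minimal NumPy sketch of gradient descent on the MSE cost, using the hypothesis ŷᵢ = θ₁ + θ₂xᵢ. The experience/salary values, learning rate, and iteration count are assumed for illustration only.

import numpy as np

# Illustrative data: years of experience (x) vs. salary in thousands (y); values are assumed.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

theta1, theta2 = 0.0, 0.0   # intercept and slope, initialised to zero
alpha = 0.01                # learning rate (assumed value)
n = len(x)

for _ in range(5000):
    y_hat = theta1 + theta2 * x               # hypothesis: ŷᵢ = θ₁ + θ₂xᵢ
    error = y_hat - y
    cost = np.mean(error ** 2)                # MSE: J(θ) = (1/n) Σ (ŷᵢ − yᵢ)²
    grad1 = (2.0 / n) * np.sum(error)         # ∂J/∂θ₁ (factor 2 from differentiating the square)
    grad2 = (2.0 / n) * np.sum(error * x)     # ∂J/∂θ₂
    theta1 -= alpha * grad1                   # θ₁ ← θ₁ − α ∂J/∂θ₁
    theta2 -= alpha * grad2                   # θ₂ ← θ₂ − α ∂J/∂θ₂

print(f"intercept={theta1:.3f}, slope={theta2:.3f}, final MSE={cost:.3f}")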
Data Relations
What is Polynomial Regression?
• Definition:
o Polynomial regression is a type of regression analysis used in statistics and machine learning when the
relationship between the independent variable (input) and the dependent variable (output) is not
linear.

• Non-linear Relationship:
o Allows for more flexibility by fitting a polynomial equation to the data.
Why Polynomial Regression?
• Curvilinear Relationships:
  o Suitable for relationships that are better represented by a curve rather than a straight line.
• Capture Non-linear Patterns
• Feature Engineering:
  • Add higher-order terms of the independent features to the feature space.
How Does Polynomial Regression Work?
• General Form:
  • Equation: y = β₀ + β₁x + β₂x² + ... + βₙxⁿ + ε
• Variables:
  • y: Dependent variable
  • x: Independent variable
  • β₀, β₁, ..., βₙ: Coefficients of the polynomial terms
  • n: Degree of the polynomial
  • ε: Error term
Choosing the Polynomial Degree
• Degree (n):
  • A crucial aspect of polynomial regression.
• Higher Degree:
  • Allows the model to fit the training data more closely but may lead to overfitting.
• Complexity:
  • The degree should be chosen based on the complexity of the underlying relationship in the data.
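
A short scikit-learn sketch of the general form y = β₀ + β₁x + ... + βₙxⁿ, comparing a few degrees to show the overfitting risk described above. The synthetic data and the degrees 1, 2, and 10 are assumptions for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative curvilinear data: y depends on x through a quadratic plus noise (assumed).
rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(scale=0.5, size=40)

for degree in (1, 2, 10):                        # candidate degrees (assumed)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)                              # fits y = β₀ + β₁x + ... + βₙxⁿ
    r2 = model.score(x, y)                       # training R²; rises as the degree grows
    print(f"degree={degree:2d}  training R^2={r2:.3f}")
# A very high degree fits the training data almost perfectly but tends to overfit;
# the appropriate degree depends on the complexity of the underlying relationship.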
What is Support Vector Machine (SVM)?
• Definition:
  • SVM is a supervised machine learning algorithm used for both classification and regression, though it is best suited for classification.
• Objective:
  • Find the optimal hyperplane in an N-dimensional space that separates the data points into different classes.
How SVM Works
• Hyperplane:
  • The decision boundary that separates data points of different classes.
• Margin:
  • The distance between the hyperplane and the nearest data points from each class.
• Maximum Margin Hyperplane:
  • The hyperplane with the largest margin, providing the best separation.
SVM with Outliers
• Outliers:
  • Data points that do not fit the general pattern.
• Soft Margin:
  • Allows some misclassifications to handle outliers.
• Hinge Loss:
  • Penalty for misclassified points, proportional to the distance from the margin.
Non-Linearly Separable Data
• Kernel Trick:
  • SVM uses kernel functions to map data to a higher-dimensional space where it can be linearly separable.
• Common Kernels:
  • Linear
  • Polynomial
  • Radial Basis Function (RBF)
  • Sigmoid
Support Vector Machine Terminology
• Hyperplane:
  • The decision boundary that separates data points of different classes in a feature space.
• Support Vectors:
  • The closest data points to the hyperplane, playing a critical role in defining the hyperplane and margin.
• Margin:
  • The distance between the support vectors and the hyperplane, which SVM aims to maximize.
SVM Kernel Functions
• Kernel Functions:
  • Linear: K(w, x) = wᵀx + b
  • Polynomial: K(w, x) = (γwᵀx + b)ⁿ
  • Gaussian RBF: K(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²)
  • Sigmoid: K(xᵢ, xⱼ) = tanh(αxᵢᵀxⱼ + b)
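
The kernels listed above can be written as plain NumPy functions, as in the minimal sketch below. The default values of γ, b, α, and the polynomial degree n are assumptions chosen only to make the example runnable.

import numpy as np

def linear_kernel(w, x, b=0.0):
    # Linear: K(w, x) = wᵀx + b
    return np.dot(w, x) + b

def polynomial_kernel(w, x, gamma=1.0, b=1.0, n=3):
    # Polynomial: K(w, x) = (γ wᵀx + b)ⁿ
    return (gamma * np.dot(w, x) + b) ** n

def rbf_kernel(xi, xj, gamma=0.5):
    # Gaussian RBF: K(xᵢ, xⱼ) = exp(−γ ‖xᵢ − xⱼ‖²)
    return np.exp(-gamma * np.linalg.norm(xi - xj) ** 2)

def sigmoid_kernel(xi, xj, alpha=0.1, b=0.0):
    # Sigmoid: K(xᵢ, xⱼ) = tanh(α xᵢᵀxⱼ + b)
    return np.tanh(alpha * np.dot(xi, xj) + b)

a = np.array([1.0, 2.0])
c = np.array([2.0, 0.5])
print(rbf_kernel(a, c), polynomial_kernel(a, c), sigmoid_kernel(a, c))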
Advantages of SVM
• Effective in high-dimensional spaces.
• Memory efficient as it uses a subset of training points (the support vectors).
• Different kernel functions can be specified for the decision function, with the option to define custom kernels.
Mathematical Intuition of SVM
• Binary Classification:
  • Consider a binary classification problem with two classes, labeled as +1 and −1.
• Hyperplane Equation:
  • wᵀx + b = 0
• Distance Calculation:
  • dᵢ = (wᵀxᵢ + b) / ‖w‖
Linear SVM Classifier
• Decision Rule:
  • ŷ = +1 if wᵀx + b ≥ 0, else −1 (consistent with the +1/−1 class labels)
• Optimization:
  • Hard Margin: Minimize (1/2)‖w‖² subject to yᵢ(wᵀxᵢ + b) ≥ 1
  • Soft Margin: Minimize (1/2)‖w‖² + C Σᵢ ζᵢ subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ζᵢ and ζᵢ ≥ 0
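
A from-scratch sketch of the soft-margin objective, minimized with a simple sub-gradient step on the hinge loss. The toy data, labels, learning rate, C, and epoch count are all assumptions; this is not a production solver, only an illustration of the optimization above.

import numpy as np

# Toy 2-D data with labels in {+1, -1} (values assumed for illustration).
X = np.array([[2.0, 3.0], [3.0, 3.5], [1.0, 1.0], [0.5, 0.2], [3.5, 2.5], [0.2, 1.2]])
y = np.array([1, 1, -1, -1, 1, -1])

w = np.zeros(X.shape[1])
b = 0.0
C = 1.0      # penalty for margin violations (assumed)
lr = 0.01    # learning rate (assumed)

for _ in range(1000):
    for xi, yi in zip(X, y):
        margin = yi * (np.dot(w, xi) + b)
        if margin >= 1:
            # Point is outside the margin: only the regulariser (1/2)||w||^2 contributes.
            w -= lr * w
        else:
            # Margin violation: the hinge-loss sub-gradient pushes w toward the correct side.
            w -= lr * (w - C * yi * xi)
            b += lr * C * yi

pred = np.sign(X @ w + b)        # decision rule: ŷ = sign(wᵀx + b)
print("weights:", w, "bias:", b, "training accuracy:", np.mean(pred == y))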
Types of Support Vector Machines
• Linear SVM:
  • Uses a linear decision boundary to separate data points of different classes.
• Non-Linear SVM:
  • Uses kernel functions to handle non-linearly separable data by transforming it into a higher-dimensional space.
What is a Decision Tree?
• A versatile, interpretable algorithm used for predictive modeling.
• Suitable for both classification and regression tasks.
• A visual representation of decisions and their possible consequences.
Decision Tree Structure
• Root Node: Represents the initial feature or decision.
• Internal Nodes: Test on attributes, leading to further branching.
• Leaf Nodes: Represent the final decision or prediction.
• Branches: Indicate the outcomes of decisions.
• Splitting: The process of dividing nodes based on decision criteria.
• Pruning: Removing unnecessary branches to improve accuracy.
Decision Tree Approach
Attribute Selection Measures
• Information Gain: Measures the change in entropy after a split.
• Gini Index: Measures the impurity of a node.
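
A small sketch of both measures: entropy-based information gain for a candidate split, and the Gini index of a node. The example class labels and the particular split are assumed for illustration.

import numpy as np

def entropy(labels):
    # H = -Σ p·log2(p) over the class proportions in a node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini = 1 - Σ p²; 0 means a perfectly pure node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    # Change in entropy after splitting the parent node into two children
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Assumed example: a binary-labelled parent node split into two child nodes.
parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])
left, right = parent[:4], parent[4:]
print("Gini(parent):", gini(parent))
print("Information gain of this split:", information_gain(parent, left, right))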
How Decision Trees Are Formed
• Recursive Partitioning: Splitting data based on attributes.
• Selecting Attributes: Use criteria like Information Gain or Gini Index.
• Stopping Criterion: Maximum depth or minimum instances in a leaf node.
A scikit-learn sketch of these steps follows below.
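
The sketch below ties the steps above to concrete scikit-learn parameters: the attribute-selection measure and the stopping criteria. The synthetic dataset and the parameter values are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic dataset; any labelled tabular data would work here.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",      # attribute-selection measure ("entropy" uses information gain)
    max_depth=4,           # stopping criterion: maximum depth (assumed value)
    min_samples_leaf=5,    # stopping criterion: minimum instances in a leaf (assumed value)
    random_state=0,
)
tree.fit(X_train, y_train)                 # recursive partitioning on the training data
print("test accuracy:", tree.score(X_test, y_test))
print("feature importances:", tree.feature_importances_)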
Advantages
• Interpretability: Easy to understand and visualize.
• Versatility: Handles both numerical and categorical data.
• Feature Importance: Provides insights into which features are most important.
• Handling Missing Data: Decision trees can manage missing values effectively.
Disadvantages
• Overfitting: Decision trees can be prone to overfitting, especially with small datasets.
• Data Sensitivity: Small changes in the data can lead to a completely different tree.
• Bias: Potential bias in the presence of imbalanced data.
What is Random Forest?
• A powerful ensemble learning technique.
• Combines multiple decision trees to enhance predictive accuracy.
• Introduced in 2001 by Leo Breiman.
• Widely used for both classification and regression tasks.
Fundamental Concepts
• Ensemble of Decision Trees: Multiple trees work together toward a common output.
• Randomness in Training: Random subsets of data and features reduce overfitting.
• Final Prediction: Aggregation of individual tree predictions (voting for classification, averaging for regression).

Random Forest Algorithm
• Training Phase: Builds multiple decision trees using random subsets of data and features.
• Prediction Phase: Aggregates the results from all trees for the final prediction.
• Advantages: Reduces overfitting, improves accuracy, handles complex data.
Ensemble Learning Models
• Concept: Combining multiple models to improve performance.
• Analogy: Like a team of experts collaborating on a problem.
• Examples: Random Forest, XGBoost, AdaBoost, LightGBM, Bagging
Bagging and Boosting
• Bagging: Training multiple weak models on different data subsets and averaging the results.
• Boosting: Sequential training where each model corrects the errors of the previous one, with weighted voting for the final prediction.
A short comparison of the two ideas follows below.
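
A brief sketch contrasting the two strategies with scikit-learn: BaggingClassifier trains trees independently on bootstrap subsets, while AdaBoostClassifier trains them sequentially, reweighting the errors of earlier models. The synthetic dataset, estimator counts, and cross-validation setup are assumed.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=1)  # assumed data

# Bagging: many trees trained independently on bootstrap samples, predictions averaged.
bagging = BaggingClassifier(n_estimators=50, random_state=1)

# Boosting: trees trained sequentially, each focusing on the errors of the previous one.
boosting = AdaBoostClassifier(n_estimators=50, random_state=1)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean cross-validation accuracy = {scores.mean():.3f}")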
How Random Forest Works
• Step 1: Select K random data points from the training set.
• Step 2: Build decision trees for the selected subsets.
• Step 3: Choose the number N of decision trees.
• Step 4: Repeat Steps 1 and 2 to build the forest.
• Step 5: For new data, aggregate the predictions from all trees (majority vote for classification, average for regression).
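
These steps map onto scikit-learn's RandomForestClassifier, as in the hedged sketch below; the dataset and parameter values are assumptions, and out-of-bag validation illustrates the built-in cross-validation mentioned later.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=2)  # assumed data

forest = RandomForestClassifier(
    n_estimators=100,      # Step 3: the number N of decision trees (assumed value)
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # Steps 1-2: each tree sees a random bootstrap sample
    oob_score=True,        # internal validation on out-of-bag samples
    random_state=2,
)
forest.fit(X, y)           # Step 4: builds the whole forest
print("out-of-bag accuracy:", forest.oob_score_)
print("predicted class (majority vote):", forest.predict(X[:1]))   # Step 5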
Random Forest Approach
Key Features of Random Forest
• High Predictive Accuracy: Collaborative decision-making leads to better predictions.
• Resistance to Overfitting: Randomness in training helps the model generalize better.
• Handling Large Datasets: Efficiently manages large and complex datasets.
• Variable Importance: Identifies and ranks the most important features.
• Built-in Cross-Validation: Out-of-bag samples are used for internal validation.
• Handling Missing Values: Robust against incomplete data.
• Parallelization: Trees can be trained simultaneously, speeding up the process.
Potential Drawbacks
• Complexity: More computationally intensive than single models.
• Interpretability: Less transparent than individual decision trees.
• Memory Usage: Requires more memory to store multiple trees.
Unsupervised Learning
• Learning from unlabeled data without predefined categories.
• Focus on discovering patterns and relationships autonomously.
How Unsupervised Learning Works
• Process Overview:
  – No explicit guidance or labeled data.
  – The model identifies hidden structures in the data.
• Example:
  – Distinguishing between different species of animals based on traits without prior labeling.
Key Characteristics of Unsupervised Learning
• Pattern Discovery: Models find patterns in data without labels.
• Clustering: Grouping similar data points together.
• Feature Extraction: Capturing essential information to differentiate data.
• Label Association: Assigning categories based on discovered patterns.
Example of Unsupervised Learning
• Scenario: A model is trained on unlabeled images of cows, elephants, and camels.
• It identifies and groups the images based on similarities, even without prior knowledge of what a cow, elephant, or camel looks like.
Unsupervised Learning
Types of Unsupervised Learning
• Clustering: Grouping similar data points together.
• Association: Identifying patterns and relationships between items in a dataset.
Clustering
• Types of Clustering (see the K-means sketch below):
  – Hierarchical Clustering
  – K-means Clustering
  – Principal Component Analysis (PCA)
  – Singular Value Decomposition (SVD)
  – Independent Component Analysis (ICA)
  – Gaussian Mixture Models (GMMs)
  – Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
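
A minimal K-means sketch on synthetic unlabeled data: similar points are grouped together without any labels being shown to the model. The number of clusters and the blob-generation parameters are assumptions.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data drawn from three assumed groups; the true labels are discarded.
X, _ = make_blobs(n_samples=300, centers=3, random_state=3)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=3)   # k = 3 is an assumption
labels = kmeans.fit_predict(X)                             # groups similar points together

print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("cluster centres:\n", kmeans.cluster_centers_)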
Association Rule Learning
• Definition:
  – Identifying patterns in data using association rules.
• Algorithms:
  – Apriori Algorithm
  – Eclat Algorithm
  – FP-Growth Algorithm
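
The algorithms listed above all reason in terms of itemset support and rule confidence; the minimal sketch below computes both for one candidate rule. The market-basket transactions and item names are assumed for illustration.

# Assumed market-basket transactions; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "eggs"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Confidence of the rule antecedent -> consequent
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_from, rule_to = {"bread"}, {"butter"}
print("support({bread, butter}) =", support(rule_from | rule_to, transactions))
print("confidence(bread -> butter) =", confidence(rule_from, rule_to, transactions))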
Evaluating Unsupervised Learning Models
• Evaluation Metrics:
  – Silhouette Score
  – Calinski-Harabasz Score
  – Adjusted Rand Index
  – Davies-Bouldin Index
  – F1 Score (adapted for clustering)
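
A hedged sketch using two of these metrics: the Silhouette Score, which needs no ground truth, and the Adjusted Rand Index, which compares against known labels when they happen to be available. The data and candidate cluster counts are assumptions.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, true_labels = make_blobs(n_samples=400, centers=4, random_state=4)  # assumed data

for k in (2, 3, 4, 5):                                   # candidate cluster counts (assumed)
    pred = KMeans(n_clusters=k, n_init=10, random_state=4).fit_predict(X)
    sil = silhouette_score(X, pred)                      # internal metric: no labels needed
    ari = adjusted_rand_score(true_labels, pred)         # external metric: uses true labels
    print(f"k={k}: silhouette={sil:.3f}, adjusted Rand index={ari:.3f}")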
Applications of Unsupervised Learning
• Areas of Application:
  – Anomaly Detection
  – Scientific Discovery
  – Recommendation Systems
  – Customer Segmentation
  – Image Analysis
Advantages of Unsupervised Learning
• No need for labeled training data.
• Effective for dimensionality reduction.
• Capable of finding unknown patterns.
• Provides insights from unlabeled data.
Disadvantages of Unsupervised Learning
• Hard to measure accuracy due to the lack of predefined answers.
• Typically lower accuracy compared to supervised learning.
• Requires manual interpretation and labeling post-classification.
• Sensitive to data quality, and performance is challenging to evaluate.
Thank You !!
