UNIT 3: Machine Learning
Types of Linear Regression
• Simple Linear Regression: One independent variable
• Multiple Linear Regression: More than one independent variable
• Univariate Linear Regression: One dependent variable
• Multivariate Regression: More than one dependent variable

Linear Regression
• Simple Linear Regression:
• Equation: y = β₀ + β₁X
• Variables:
o y: Dependent variable
o X: Independent variable
o β₀: Intercept
o β₁: Slope
• Assumption:
• Linear relationship between X (experience) and Y (salary)
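
A minimal fitting sketch for this equation, using NumPy; the experience/salary numbers below are made-up illustrative values, not data from the slides:

import numpy as np

# Made-up data: years of experience (X) vs. salary (y)
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([40000, 45000, 52000, 58000, 63000], dtype=float)

# Closed-form least-squares estimates for y = β₀ + β₁X
b1 = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = y.mean() - b1 * X.mean()

print(f"intercept β₀ ≈ {b0:.1f}, slope β₁ ≈ {b1:.1f}")
print("predicted salary at 6 years:", b0 + b1 * 6)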
Hypothesis Function in Linear Regression
• Equation: Ŷ = θ₁ + θ₂X
• ŷᵢ = θ₁ + θ₂xᵢ
• Variables:
o yᵢ: True values (dependent variable)
o xᵢ: Input training data (independent variable)
o ŷᵢ: Predicted values
o θ₁: Intercept
o θ₂: Coefficient of x
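
A one-line sketch of what the hypothesis does: map each input xᵢ to a prediction ŷᵢ for given parameters (the θ values here are arbitrary placeholders):

import numpy as np

theta1, theta2 = 2.0, 1.5      # placeholder intercept and coefficient
x = np.array([1.0, 2.0, 3.0])  # xᵢ: training inputs
y_hat = theta1 + theta2 * x    # ŷᵢ = θ₁ + θ₂xᵢ
print(y_hat)                   # [3.5 5.  6.5]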
Cost Function
• Definition:
• Error or difference between the predicted value Ŷ and the true value Y
• For linear regression the usual choice is the Mean Squared Error (MSE): J(θ₁, θ₂) = (1/n) Σᵢ (ŷᵢ − yᵢ)²

Gradient Descent
• Iterative process to update θ₁ and θ₂ based on gradients calculated from the MSE
• Ensures the MSE value converges to the global minimum
• Process:
• Calculate the gradients ∂J(θ)/∂θ₁ and ∂J(θ)/∂θ₂, then move each parameter a small step in the opposite direction
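
A minimal sketch of this loop, assuming the MSE cost above; the data, learning rate, and iteration count are arbitrary illustrative choices:

import numpy as np

# Made-up training data (roughly y = 2 + 2x)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([4.1, 6.0, 7.9, 10.1, 12.0])

theta1, theta2 = 0.0, 0.0   # intercept and coefficient, initialized at zero
lr, epochs = 0.01, 5000     # learning rate and iteration count (assumed)
n = len(x)

for _ in range(epochs):
    y_hat = theta1 + theta2 * x                   # hypothesis ŷᵢ = θ₁ + θ₂xᵢ
    d_theta1 = (2 / n) * np.sum(y_hat - y)        # ∂J/∂θ₁ for MSE
    d_theta2 = (2 / n) * np.sum((y_hat - y) * x)  # ∂J/∂θ₂ for MSE
    theta1 -= lr * d_theta1                       # step against the gradient
    theta2 -= lr * d_theta2

print(f"θ₁ ≈ {theta1:.2f}, θ₂ ≈ {theta2:.2f}")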
Why Does Polynomial Regression Work?
• Non-linear Relationship:
o Allows for more flexibility by fitting a polynomial equation to the data.
• Curvilinear Relationships:
o Suitable for relationships that are better represented by a curve than by a straight line.
• Equation: y = β₀ + β₁x + β₂x² + ... + βₙxⁿ + ε
• Variables:
- y: Dependent variable
- x: Independent variable
- β₀, β₁, ..., βₙ: Coefficients of the polynomial terms
- n: Degree of the polynomial
- ε: Error term
Degree of the Polynomial
• Degree (n):
• A crucial aspect of polynomial regression.
• Complexity:
• The degree should be chosen based on the complexity of the underlying relationship in the data.
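
A minimal sketch of how the chosen degree affects the fit, using np.polyfit on made-up data that follows a roughly quadratic trend:

import numpy as np

# Made-up data with an approximately quadratic relationship (≈ 2x² + 1)
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1.1, 2.9, 9.2, 19.1, 33.0, 51.2])

# np.polyfit returns coefficients [βₙ, ..., β₁, β₀] for the given degree
for n in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg=n)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {n}: training MSE = {mse:.3f}")  # drops sharply at n = 2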
What is Support Vector Machine?
• Definition:
• SVM is a supervised machine learning algorithm used for both classification and regression, though it is best suited for classification.
• Objective:
• Find the optimal hyperplane that separates the classes with the maximum margin.

Terminology
• Margin:
• The distance between the support vectors and the hyperplane, which SVM aims to maximize.
SVM Kernel Functions
• Kernel Functions:
- Linear: K(w, x) = wᵀx + b
- Polynomial: K(w, x) = (γwᵀx + b)ⁿ
- Gaussian RBF: K(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²)
- Sigmoid: K(xᵢ, xⱼ) = tanh(αxᵢᵀxⱼ + b)
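
Minimal sketches of these four kernels as plain functions; γ, α, b, and n are hyperparameters, and the values used below are arbitrary illustrations:

import numpy as np

def linear_kernel(w, x, b=0.0):
    # K(w, x) = wᵀx + b
    return np.dot(w, x) + b

def polynomial_kernel(w, x, gamma=1.0, b=1.0, n=3):
    # K(w, x) = (γwᵀx + b)ⁿ
    return (gamma * np.dot(w, x) + b) ** n

def rbf_kernel(xi, xj, gamma=0.5):
    # K(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, alpha=0.1, b=0.0):
    # K(xᵢ, xⱼ) = tanh(αxᵢᵀxⱼ + b)
    return np.tanh(alpha * np.dot(xi, xj) + b)

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(xi, xj), rbf_kernel(xi, xj))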
Advantages of SVM
• Effective in high-dimensional spaces.
• Memory efficient, as it uses a subset of training points (support vectors).
• Different kernel functions can be specified for decision functions, with the option to define custom kernels.
Mathematics of Support Vector Machines
• Binary Classification:
• Consider a binary classification problem with two classes, labeled as +1 and -1.
• Non-Linear SVM:
• Uses kernel functions to handle non-linearly separable data by transforming it into a higher-dimensional space.
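
A minimal sketch of a non-linear SVM, assuming scikit-learn is available: the RBF kernel lets the classifier separate concentric circles, a classic non-linearly separable problem (the gamma and C values are arbitrary choices):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf", gamma=2.0, C=1.0)  # kernel trick: implicit high-dim map
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)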
What is a Decision Tree?
• A versatile, interpretable algorithm used for predictive modeling.
• Suitable for both classification and regression tasks.
• A visual representation of decisions and their possible consequences.

Decision Tree Structure
• Root Node: Represents the initial feature or decision.
• Internal Nodes: Test on attributes, leading to further branching.
• Leaf Nodes: Represent the final decision or prediction.
• Branches: Indicate the outcomes of decisions.
• Splitting: The process of dividing nodes based on decision criteria.
• Pruning: Removing unnecessary branches to improve accuracy.
Decision Tree Approach
Attribute Selection Measures
• Information Gain: Measures the change in entropy after a split.
• Gini Index: Measures the impurity of a node.
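
Minimal sketches of both measures for a vector of class labels, plus information gain for a candidate split; the labels below are made up:

import numpy as np

def entropy(labels):
    # H = −Σ p·log₂(p) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity = 1 − Σ p²
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    # Parent entropy minus the size-weighted entropy of the two children
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]          # a perfectly pure split
print(information_gain(parent, left, right))  # 1.0 bit
print(gini(parent))                           # 0.5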
How Decision Trees Are Formed
• Recursive Partitioning: Splitting data based on attributes.
• Selecting Attributes: Use criteria like Information Gain or Gini Index.
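
A minimal sketch, assuming scikit-learn is available: the tree is grown by recursive partitioning, and the attribute-selection measure is chosen via the criterion parameter:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" selects splits by information gain; "gini" uses Gini index
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))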
Advantages
• Interpretability: Easy to understand and visualize.
• Versatility: Handles both numerical and categorical data.
• Feature Importance: Provides insights into which features are most important.
• Handling Missing Data: Decision trees can manage missing values effectively.

Disadvantages
• Overfitting: Decision trees can be prone to overfitting, especially with small datasets.
• Data Sensitivity: Small changes in data can lead to a completely different tree.
• Bias: Potential bias in the presence of imbalanced data.
What is Random Forest?
• A powerful ensemble learning technique.
• Combines multiple decision trees to enhance predictive accuracy.
• Introduced in 2001 by Leo Breiman.
• Widely used for both classification and regression tasks.

Fundamental Concepts
• Ensemble of Decision Trees: Multiple trees work together for a common output.
• Randomness in Training: Random subsets of data and features reduce overfitting.
• Final Prediction: Aggregation of individual tree predictions (voting for classification, averaging for regression); see the sketch below.
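
A minimal sketch, assuming scikit-learn is available: each tree trains on a bootstrapped sample with a random feature subset per split, and the forest aggregates their votes:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees whose votes are aggregated
    max_features="sqrt",  # random feature subset considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))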
Unsupervised Learning Models
• Evaluation metrics:
– Calinski-Harabasz Score
– Adjusted Rand Index
– Davies-Bouldin Index
– F1 Score (adapted for clustering)
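
A minimal sketch computing three of the listed metrics for a k-means clustering, assuming scikit-learn is available; the blob data are synthetic:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

# Synthetic data with known cluster assignments
X, true_labels = make_blobs(n_samples=300, centers=3, random_state=0)
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Calinski-Harabasz:", calinski_harabasz_score(X, pred))  # higher is better
print("Davies-Bouldin:", davies_bouldin_score(X, pred))        # lower is better
# The Adjusted Rand Index compares against ground-truth labels, known here
print("Adjusted Rand Index:", adjusted_rand_score(true_labels, pred))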
Applications of Unsupervised Learning
• Areas of Application:
– Anomaly Detection
– Scientific Discovery
– Recommendation Systems
– Customer Segmentation
– Image Analysis

Advantages of Unsupervised Learning
• No need for labeled training data.
• Effective for dimensionality reduction.
• Capable of finding unknown patterns.
• Provides insights from unlabeled data.

Disadvantages of Unsupervised Learning
• Hard to measure accuracy due to the lack of predefined answers.
• Typically lower accuracy compared to supervised learning.
• Requires manual interpretation and labeling post-classification.
• Sensitive to data quality, and performance is challenging to evaluate.
Thank You !!