
UNIT-3

Linear Regression
Definition:
Linear Regression is a supervised machine learning algorithm
used to predict a value (dependent variable) based on the value of
one or more input variables (independent variables). It shows the
linear relationship between the variables. For a single input, this relationship is written as Y = mX + c, where:

• Y = Predicted value (dependent variable)
• X = Input value (independent variable)
• m = Slope of the line (shows how much Y changes with X)
• c = Intercept (value of Y when X = 0)

Example:

Suppose you want to predict a student's marks (Y) based on hours of study (X). Linear regression helps you find a line that best fits the data points and can be used to predict future scores.

Types:

1. Simple Linear Regression – One independent variable
2. Multiple Linear Regression – More than one independent variable

1. Simple Linear Regression

• Uses: One independent variable (X) to predict one dependent variable (Y)
• Example: Predicting salary (Y) based on years of experience (X)
• Goal: Find the best straight line that fits the data points
• Formula: Y = mX + c
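
A minimal sketch of how m and c can be computed for simple linear regression, using NumPy and made-up hours-of-study data (the numbers are only for illustration):

import numpy as np

# Hypothetical data: hours of study (X) and marks obtained (Y)
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([35, 50, 55, 70, 80], dtype=float)

# Least-squares estimates of the slope (m) and intercept (c)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
c = Y.mean() - m * X.mean()

print(f"Best-fit line: Y = {m:.2f}X + {c:.2f}")
print("Predicted marks for 6 hours of study:", m * 6 + c)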

2. Multiple Linear Regression

• Uses: Two or more independent variables (X₁, X₂, ..., Xₙ) to predict one dependent variable (Y)
• Example: Predicting house price (Y) based on size (X₁), number of bedrooms (X₂), and location rating (X₃)
• Goal: Understand how several factors influence the outcome
• Formula: Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ

In short:

• Simple Linear Regression = One factor affecting the result
• Multiple Linear Regression = Many factors affecting the result
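
As a rough illustration of multiple linear regression, a scikit-learn fit on made-up house data (the feature values and prices below are hypothetical):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical houses: [size in sq. ft, bedrooms, location rating]
X = np.array([[1000, 2, 3],
              [1500, 3, 4],
              [2000, 3, 5],
              [2500, 4, 4]])
y = np.array([200000, 280000, 360000, 420000])  # made-up prices

model = LinearRegression().fit(X, y)
print("b0 (intercept):", model.intercept_)
print("b1..bn (coefficients):", model.coef_)
print("Predicted price:", model.predict([[1800, 3, 4]])[0])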

What is Logistic Regression?

Logistic Regression is a supervised machine learning algorithm used for classification problems. Unlike Linear Regression, which predicts continuous values, Logistic Regression is used to predict categorical outcomes – especially binary outcomes (like Yes/No, 0/1, True/False).
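
A minimal sketch of a binary logistic regression fit with scikit-learn, predicting pass/fail from hours studied (the data is made up for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> fail (0) or pass (1)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print("Predicted class for 3.5 hours:", clf.predict([[3.5]])[0])
print("Estimated probability of passing:", clf.predict_proba([[3.5]])[0][1])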

Decision Tree

A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It is called a "tree" because it resembles a flowchart-like structure where each internal node represents a test on an attribute (feature), each branch represents the outcome of the test, and each leaf node represents a class label (in classification) or a value (in regression).

Why It's Called a Decision Tree

The model makes decisions based on conditions in a hierarchical manner, similar to how a human might make choices.

Real-World Applications:

• Medical diagnosis (e.g., disease prediction)
• Customer segmentation
• Credit scoring
• Fraud detection
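
A short sketch of a classification tree in scikit-learn; the Iris dataset is used here only as a stand-in example, and export_text prints the learned test at each internal node:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Each internal node tests one feature; each leaf assigns a class label
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=iris.feature_names))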

What is a Random Forest?

A Random Forest is an ensemble machine learning algorithm based on Decision Trees. It combines the output of multiple decision trees to make a more accurate and stable prediction.

• For classification, it outputs the majority vote of the trees.
• For regression, it outputs the average of the outputs of the trees.

How Does Random Forest Work?

Let’s understand the process step-by-step:

Step 1: Bootstrapping (Sampling)

• From the training dataset, multiple random subsets (with replacement) are created.
• Each subset is used to train a separate decision tree.

Step 2: Growing Trees with Random Feature Selection

• For each decision tree:
  o A random subset of features is selected at each split (not all features).
  o This adds randomness and decorrelates the trees, reducing overfitting.

Step 3: Aggregating Predictions

• For classification → Majority vote.
• For regression → Mean prediction.
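
A compact sketch of these three steps as scikit-learn carries them out internally (the dataset choice is ours): n_estimators sets the number of bootstrapped trees and max_features the random subset of features considered at each split.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each grown on a bootstrap sample of the training data,
# with a random subset of features evaluated at every split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

# The final prediction is the majority vote across the 100 trees
print("Test accuracy:", rf.score(X_test, y_test))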

Key Concepts Behind Random Forest

🔸 Bagging (Bootstrap Aggregating)

• Technique to reduce variance and avoid overfitting.
• Multiple models trained on different subsets of data.
• Random Forest = Bagging + Decision Trees.

🔸 Decision Tree

• Base learner in Random Forest.
• Each tree learns a decision boundary, but individual trees may overfit.
• Combining many trees reduces this risk.

🔸 Out-of-Bag (OOB) Score

• Since trees are trained on bootstrap samples, about 1/3 of the data is left out (not seen by the tree).
• These left-out samples are used as a test set to estimate model accuracy without using a separate test set.
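
A sketch of how the OOB estimate can be requested in scikit-learn; oob_score=True evaluates every tree on the samples left out of its bootstrap sample:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each tree is scored on the roughly 1/3 of samples it never saw during training
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("Out-of-bag accuracy estimate:", rf.oob_score_)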

Advantages and Disadvantages

Pros:

• Handles both classification and regression tasks.
• Resistant to overfitting (due to averaging).
• Can handle missing data.
• Feature importance ranking.

Cons:

• Slower and more complex than individual decision trees.
• Less interpretable than a single decision tree.
• May not perform well on high-dimensional sparse data (like text).

What is a Support Vector Machine (SVM)?


Support Vector Machine (SVM) is a powerful supervised learning algorithm
used in classification, regression, and anomaly detection. Its main
objective is to find the optimal boundary (hyperplane) that separates
different classes in a dataset with the maximum possible margin.

The Main Idea Behind SVM

SVM tries to find the best decision boundary (also called a hyperplane) that
maximally separates the classes.

Key Concepts:

• Hyperplane: A line (in 2D), a plane (in 3D), or a flat decision boundary in higher dimensions that separates data into classes.
• Margin: Distance between the hyperplane and the nearest points (support vectors) from each class.
• Support Vectors: Data points closest to the hyperplane. They define the margin.

Goal of SVM

To maximize the margin between classes = better generalization.

Support Vector Machine (SVM) Terminology


• Hyperplane: A decision boundary separating different classes in
feature space, represented by the equation wx + b = 0 in linear
classification.
• Support Vectors: The closest data points to the hyperplane, crucial for
determining the hyperplane and margin in SVM.
• Margin: The distance between the hyperplane and the support vectors.
SVM aims to maximize this margin for better classification
performance.
• Kernel: A function that maps data to a higher-dimensional space,
enabling SVM to handle non-linearly separable data.
• Hard Margin: A maximum-margin hyperplane that perfectly separates
the data without misclassifications.
• Soft Margin: Allows some misclassifications by introducing slack
variables, balancing margin maximization and misclassification
penalties when data is not perfectly separable.
• C: A regularization term balancing margin maximization and
misclassification penalties. A higher C value enforces a stricter penalty
for misclassifications.
• Hinge Loss: A loss function penalizing misclassified points or margin
violations, combined with regularization in SVM.
• Dual Problem: Involves solving for Lagrange multipliers associated
with support vectors, facilitating the kernel trick and efficient
computation.
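
To make the C and kernel terms concrete, a hypothetical soft-margin fit with scikit-learn on non-linearly separable toy data; a smaller C tolerates more margin violations, while the RBF kernel maps the data so a non-linear boundary can be found:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy data that no straight line can separate cleanly
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

soft_linear = SVC(kernel="linear", C=0.1).fit(X, y)  # wider margin, more violations allowed
rbf = SVC(kernel="rbf", C=10.0).fit(X, y)            # kernel trick for a non-linear boundary

print("Linear SVM training accuracy:", soft_linear.score(X, y))
print("RBF SVM training accuracy:", rbf.score(X, y))
print("Support vectors used by the RBF model:", len(rbf.support_vectors_))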

Linear SVM Intuition

Imagine we have two classes that are linearly separable (you can draw a
straight line between them).
SVM finds the hyperplane with the largest margin.

Mathematical Computation: SVM


Consider a binary classification problem with two classes, labeled as +1 and
-1. We have a training dataset consisting of input feature vectors X and their
corresponding class labels Y.

The equation for the linear hyperplane can be written as:


wᵀx + b = 0

Where:

• w is the normal vector to the hyperplane (the direction perpendicular to it).
• b is the offset or bias term, representing the distance of the hyperplane from the origin along the normal vector w.
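
A short sketch showing how w and b of the fitted hyperplane can be read off a linear SVM in scikit-learn (the blob data below is made up for illustration):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable clusters, relabeled as -1 and +1
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
y = np.where(y == 0, -1, 1)

clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]        # normal vector to the hyperplane
b = clf.intercept_[0]   # bias / offset term
print(f"Hyperplane: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")
print("Support vectors:")
print(clf.support_vectors_)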

What is Naive Bayes Classifier?

Naive Bayes is a supervised learning algorithm based on Bayes' Theorem with a strong (naive) assumption of independence between features. Despite this "naive" assumption, it often performs exceptionally well in practice, especially in natural language processing (NLP) and classification problems.

Bayes’ Theorem (Foundation of Naive Bayes)
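
Bayes' Theorem gives the probability of a class y given the observed features X:

P(y | X) = P(X | y) · P(y) / P(X)

where P(y | X) is the posterior, P(X | y) is the likelihood, P(y) is the prior probability of the class, and P(X) is the evidence.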

Assumptions Made by Naïve Bayes

The fundamental Naïve Bayes assumption is that each feature makes an independent and equal contribution to the outcome.


Let us take an example to get some better intuition. Consider the car theft problem with attributes Color, Type, and Origin, and the target Stolen, which can be either Yes or No.

The "Naive" Assumption

Naive Bayes assumes that all features are independent given the class. That is:

P(x₁, x₂, ..., xₙ | y) = P(x₁ | y) · P(x₂ | y) · ... · P(xₙ | y)

This simplifies computation significantly!

Example:

Here in our dataset, we need to classify whether the car is stolen, given the features of the car. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the car is stolen if the Color is Red, the Type is Sports, and the Origin is Domestic. So we want to classify whether a Red Domestic SUV will be stolen or not. Note that there is no example of a Red Domestic SUV in our dataset.


According to this example, Bayes' theorem can be rewritten as:

P(y | X) = P(X | y) · P(y) / P(X)

The variable y is the class variable (Stolen?), which represents whether the car is stolen or not given the conditions. Variable X represents the parameters/features.

X is given as X = (x₁, x₂, ..., xₙ).

Here x₁, x₂, ..., xₙ represent the features, i.e. they can be mapped to Color, Type, and Origin. By substituting for X and expanding using the chain rule, we get:
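
P(y | x₁, ..., xₙ) = [ P(x₁ | y) · P(x₂ | y) · ... · P(xₙ | y) · P(y) ] / [ P(x₁) · P(x₂) · ... · P(xₙ) ]

Since the denominator does not depend on y, the predicted class is the one that maximizes the numerator:

ŷ = argmax_y P(y) · P(x₁ | y) · P(x₂ | y) · ... · P(xₙ | y)

A minimal sketch of this classifier on car-theft style data, with the categories encoded as integers; the rows and labels below are made up to mirror the example, not taken from the original table:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encoding: Color {Red: 0, Yellow: 1}, Type {Sports: 0, SUV: 1}, Origin {Domestic: 0, Imported: 1}
X = np.array([[0, 0, 0],   # Red, Sports, Domestic
              [0, 0, 0],
              [0, 0, 1],   # Red, Sports, Imported
              [1, 0, 1],   # Yellow, Sports, Imported
              [1, 0, 0],   # Yellow, Sports, Domestic
              [1, 1, 1],   # Yellow, SUV, Imported
              [1, 1, 0],   # Yellow, SUV, Domestic
              [1, 1, 0],
              [0, 1, 1],   # Red, SUV, Imported
              [0, 0, 1]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1])  # Stolen? No = 0, Yes = 1 (made-up labels)

nb = CategoricalNB().fit(X, y)
# Query: Red, SUV, Domestic -- a combination that never appears in the data
print("P(not stolen), P(stolen):", nb.predict_proba([[0, 1, 0]])[0])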
