
BCS602 | MACHINE LEARNING| VTU Belagavi

Module-3

Chapter – 01 - Similarity-based Learning

Nearest-Neighbor Learning

k- Nearest-Neighbors (k-NN) Learning

Definition:

o k-NN is a non-parametric, similarity-based algorithm used for both classification and regression.
o It predicts the class or value of a test instance based on the ‘K’ nearest neighbors in the training data.

Working:

o Classification:
 The algorithm determines the class of a test instance by considering the ‘K’
nearest neighbors and selecting the class with the majority vote.


o Regression:
 The output is the mean of the target variable values of the ‘K’ nearest
neighbors.

Assumption:

o k-NN relies on the assumption that similar objects are closer to each other in the feature
space.

Instance-Based Learning:

o Memory-Based: The algorithm does not build a prediction model ahead of time; it stores the training data and makes predictions only when a test instance is presented.
o Lazy Learning: No model is constructed during training; the learning process
happens only during testing when predictions are required.

Distance Metric:

o The most common distance metric used is Euclidean distance to measure the
closeness of training data instances to the test instance.

Choosing ‘K’:

o The value of ‘K’ determines how many neighbors should be considered for the prediction.
It is typically selected by experimenting with different values of K to find the optimal one
that produces the most accurate predictions.


Classification Process:

o For a discrete target variable (classification): The class of the test instance is
determined by the majority vote of the 'K' nearest neighbors.
o For a continuous target variable (regression): The output is the mean of the output
variable values of the ‘K’ nearest neighbors.
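To make the process above concrete, the following minimal NumPy sketch classifies a test instance by majority vote of its ‘K’ nearest neighbors under Euclidean distance; the toy feature values, labels and K = 3 are illustrative assumptions, not data from these notes.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Euclidean distance from the test instance to every training instance
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k closest training instances
    nearest = np.argsort(distances)[:k]
    # classification: majority vote among the k nearest labels
    # (for regression, one would instead return y_train[nearest].mean())
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy data (illustrative values only)
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))  # -> "A"
```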

Advantages:

o Simple and intuitive.


o Effective for small to medium-sized datasets.
o Can handle multi-class classification.

Disadvantages:

o Computationally expensive during prediction because it requires calculating distances to all training data instances.
o Performance may degrade with high-dimensional data (curse of dimensionality).

Weighted K-Nearest-Neighbor Algorithm Overview:

o Weighted k-NN is an extension of the k-NN algorithm.


o It improves upon k-NN by assigning weights to neighbors based on their distance from the
test instance.

Motivation:

o Traditional k-NN assigns equal importance to all the ‘k’ nearest neighbors, which can lead to
poor performance when:
 Neighbors are at varying distances.
 The nearest instances are more relevant than the farther ones.


o Weighted k-NN addresses this by making closer neighbors more influential.

Working Principle:

Weights are inversely proportional to distance:

 Closer neighbors get higher weights, while farther neighbors get lower
weights.

o The final prediction is based on the weighted majority vote (classification) or the
weighted average (regression) of the k nearest neighbors.

Weight Assignment:

o Uniform Weighting: All neighbors are given the same weight (as in standard k-NN).
o Distance-Based Weighting: Weights are computed based on the inverse distance, giving
closer neighbors more influence.
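A minimal sketch of distance-based weighting is given below; taking the weight as the inverse of the distance (with a small epsilon to avoid division by zero) is one common choice among several, and the data values are assumed for illustration.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=3, eps=1e-8):
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)   # closer neighbors get larger weights
    # classification: weighted vote per class
    scores = {}
    for label, w in zip(y_train[nearest], weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
    # regression variant: np.sum(weights * y_train[nearest]) / np.sum(weights)

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])
print(weighted_knn_predict(X_train, y_train, np.array([1.2, 1.9])))  # -> "A"
```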

Advantages:

o Addresses the limitations of standard k-NN by considering the relative importance of neighbors.
o Performs better in datasets where closer neighbors are more relevant to the prediction.

Applications:

o Classification: Predict the class of the test instance by weighted voting of the k nearest
neighbors.


o Regression: Predict the output value by computing the weighted mean of the k nearest
neighbors.

Limitations:

o Computational cost increases as distance calculations and weight assignments are performed
for each query.
o Sensitive to the choice of the distance metric (e.g., Euclidean, Manhattan, etc.).

Nearest Centroid Classifier

A simple alternative to the k-NN classifier for similarity-based classification is the Nearest Centroid Classifier, also called the Mean-Difference classifier.

The idea of this classifier is to assign a test instance to the class whose centroid (mean) is closest to that instance.

Algorithm

Inputs: Training dataset T, distance metric d, test instance t
Output: Predicted class or category

1. Compute the mean/centroid of each class.

2. Compute the distance between the test instance and mean/centroid of each class
(Euclidean Distance).

3. Predict the class by choosing the class with the smallest distance, as sketched below.
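The three steps above can be written directly in NumPy; the class names and feature values in this sketch are assumed for illustration only.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    classes = np.unique(y_train)
    # step 1: mean/centroid of each class
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    # step 2: Euclidean distance from the test instance to each centroid
    dists = {c: np.linalg.norm(x_test - m) for c, m in centroids.items()}
    # step 3: class with the smallest distance
    return min(dists, key=dists.get)

X_train = np.array([[3.0, 1.0], [5.0, 2.0], [4.0, 3.0],
                    [7.0, 6.0], [6.0, 7.0], [8.0, 5.0]])
y_train = np.array(["C1", "C1", "C1", "C2", "C2", "C2"])
print(nearest_centroid_predict(X_train, y_train, np.array([6.5, 6.0])))  # -> "C2"
```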


Locally Weighted Regression (LWR)

Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs local regression by combining a regression model with a nearest-neighbor model.

LWR is also referred to as a memory-based method, because it keeps the training data until prediction time but uses only the training instances that lie locally around the point of interest.


Using the nearest-neighbor algorithm, we find the instances closest to a test instance and fit a linear function to those ‘K’ nearest instances in the local regression model.
The key idea is to approximate linear functions over the ‘K’ neighbors so as to minimize the error, with the result that the overall prediction is no longer a straight line but a curve.
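One common way to realise this idea is to weight the training points with a Gaussian kernel centred on the query point and then solve a weighted least-squares problem locally. The sketch below follows that formulation; the bandwidth tau and the toy data are assumptions for illustration, not part of these notes.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression evaluated at a single query point."""
    Xb = np.c_[np.ones(len(X)), X]            # add a bias column
    xq = np.r_[1.0, np.atleast_1d(x_query)]   # query point with bias term
    # Gaussian kernel: nearby points get weights close to 1, far points near 0
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # weighted least squares: theta = (X^T W X)^(-1) X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

# toy 1-D data following a curve (illustrative only)
X = np.linspace(0, 6, 30).reshape(-1, 1)
y = np.sin(X).ravel()
print(lwr_predict(X, y, np.array([2.0])))   # close to sin(2) ≈ 0.909
```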


Chapter – 02

Regression Analysis

Introduction to Regression

Definition:
Regression analysis is a supervised learning technique used to model the relationship between
one or more independent variables (x) and a dependent variable (y).

Objective:
The goal is to predict or forecast the dependent variable (y) based on the independent variables (x), which are also called explanatory or predictor variables.

Mathematical Representation:
The relationship is represented by a function of the form y = f(x) + e, where f captures the relationship between the independent variable(s) x and the dependent variable y, and e is the random error term.

Purpose:

Regression analysis helps to determine how the dependent variable changes when an independent
variable is varied while others remain constant.

It answers key questions such as:

o What is the relationship between variables?


o What is the strength and nature (linear or non-linear) of the relationship?
o What is the relevance and contribution of each variable?


Applications:

 Sales forecasting
 Bond values in portfolio management
 Insurance premiums
 Agricultural yield predictions
 Real estate pricing

Prediction Focus:
Regression is primarily used for predicting continuous or quantitative variables, such as price,
revenue, and other measurable factors.

Introduction to Linear Regression

Definition:
Linear Regression is a fundamental supervised learning algorithm used to model the
relationship between one or more independent variables (predictors) and a dependent variable
(target).

It assumes a linear relationship between the variables.

Objective:
The primary goal of linear regression is to find a linear equation that best fits the data points.
This equation is used to predict the dependent variable based on the values of the independent
variables.

Mathematical Representation:
The relationship is represented as y = a0 + a1 x + e, where a0 is the intercept, a1 is the slope (regression coefficient), and e is the error term.

Assumptions:

 Linearity: The relationship between x and y is linear.


 Independence: Observations are independent of each other.


 Homoscedasticity: Constant variance of errors across all levels of x.


 Normality: The residuals (errors) are normally distributed.

Types of Linear Regression:

 Simple Linear Regression: Involves one independent variable.


 Multiple Linear Regression: Involves two or more independent variables.

Applications:

 Predicting house prices based on features like size and location.


 Estimating sales based on advertising expenditure.
 Forecasting stock prices or other financial metrics.
 Modeling growth trends in industries.

Advantages:

 Easy to implement and interpret.


 Efficient for linearly separable data.

Limitations:

 Struggles with non-linear relationships.


 Sensitive to outliers, which can distort predictions.
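As a minimal illustration of fitting the line y = a0 + a1 x by ordinary least squares, the closed-form estimates of the slope and intercept can be computed directly; the small dataset below is made up for illustration.

```python
import numpy as np

# illustrative data: y roughly follows y = 2x + 1 with some noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# ordinary least squares estimates:
#   a1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
#   a0 = y_mean - a1 * x_mean
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

print(f"fitted line: y = {a0:.2f} + {a1:.2f} x")
print("prediction at x = 6:", a0 + a1 * 6)
```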


Multiple Linear Regression

The multiple regression model involves multiple predictors (independent variables) and one dependent variable.

This is an extension of the simple linear regression problem. A basic assumption of multiple linear regression is that the independent variables are not highly correlated, so the multicollinearity problem does not arise.

Also, it is assumed that the residuals are normally distributed.


Definition:
Multiple Linear Regression (MLR) is an extension of simple linear regression, where multiple
independent variables (predictors) are used to model the relationship with a single dependent
variable (target).

Mathematical Representation:
The relationship is represented as y = a0 + a1 x1 + a2 x2 + … + ak xk + e, where x1, x2, …, xk are the independent variables, a0, a1, …, ak are the regression coefficients, and e is the error term.

Assumptions of Multiple Linear Regression:

o No Multicollinearity: The independent variables should not be highly correlated with each other. Multicollinearity can cause issues in estimating the coefficients accurately.
o Normality of Residuals: The residuals (errors) should be normally distributed for valid inference and hypothesis testing.
o Linearity: The relationship between each independent variable and the dependent variable should be linear.
o Independence of Errors: Observations should be independent of each other.
o Homoscedasticity: The variance of residuals should be constant across all levels of the independent variables.


Applications:

o Predicting house prices based on multiple features (size, location, number of rooms, etc.).
o Estimating the sales of a product based on various factors (price, advertising budget,
competition, etc.).
o Modeling health outcomes based on multiple risk factors (age, BMI, physical activity, etc.).

Advantages:

o Can model the relationship between multiple predictors and a single outcome.
o Provides insights into how different predictors influence the dependent variable.

Limitations:

o If multicollinearity exists (high correlation between predictors), it can affect the stability and interpretability of the model.
o Can be computationally complex with a large number of predictors.
o Sensitive to outliers, which can distort the relationship between variables.
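A minimal sketch of multiple linear regression using the normal equation, coefficients = (XᵀX)⁻¹Xᵀy, is shown below; the two-feature toy dataset is an assumption for illustration.

```python
import numpy as np

# illustrative data: two predictors x1, x2 and one target y
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([8.0, 7.5, 16.0, 15.0, 21.0])

Xb = np.c_[np.ones(len(X)), X]                 # prepend a column of 1s for a0
coeffs = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y  # [a0, a1, a2] via the normal equation

print("intercept and coefficients:", coeffs)
print("prediction for x1=6, x2=4:", np.r_[1.0, 6.0, 4.0] @ coeffs)
```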

Polynomial Regression

Introduction to Polynomial Regression

Definition:
Polynomial Regression is a form of regression analysis that models the relationship between the
independent variable(s) and the dependent variable as a polynomial function.

It is used when the relationship between variables is non-linear and cannot be effectively modeled
using linear regression.


Purpose:
When the data exhibits a non-linear trend, linear regression may result in large errors.
Polynomial regression overcomes this limitation by fitting a curved line to the data.

Approaches to Handle Non-Linearity:

Features of Polynomial Regression:

 Captures curved relationships between variables.


 Provides a more flexible model compared to linear regression.

Applications:

 Modeling growth trends in populations or markets.


 Predicting real-world phenomena such as temperature variations, physics
experiments, or chemical reactions.
 Engineering designs involving complex relationships.


Advantages:

 Capable of modeling non-linear relationships without transforming the data.


 Provides a better fit for datasets with curved trends.

Limitations:

 Increasing the polynomial degree can lead to overfitting the training data.
 Sensitive to outliers, which can significantly distort the fitted curve.
 May require careful tuning of the degree n to balance bias and variance.
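One standard way to fit a polynomial of degree n is to expand x into the features [1, x, x², …, xⁿ] and then apply ordinary linear least squares to the expanded features. The sketch below does this with NumPy; the degree and the toy data are illustrative assumptions.

```python
import numpy as np

# illustrative data following a roughly quadratic trend
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 5.1, 10.2, 17.0, 26.3])

degree = 2
X_poly = np.vander(x, degree + 1, increasing=True)   # columns: 1, x, x^2
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)  # least-squares fit

print("coefficients (a0, a1, a2):", coeffs)
print("prediction at x = 6:", np.vander([6.0], degree + 1, increasing=True) @ coeffs)
```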

Logistic Regression

Introduction to Logistic Regression

Definition:
Logistic Regression is a supervised learning algorithm used for classification problems,
particularly binary classification, where the output is a categorical variable with two possible
outcomes (e.g., yes/no, pass/fail, spam/not spam).

Purpose:
Logistic Regression predicts the probability of a categorical outcome and maps the
prediction to a value between 0 and 1. It works well when the dependent variable is binary.

Applications:

o Email classification: Is the email spam or not?


o Student admission prediction: Should a student be admitted or not based on scores?
o Exam result classification: Will the student pass or fail based on marks?

Core Concept:

o Logistic Regression models the probability of a particular response variable.


o For instance, if the predicted probability of an email being spam is 0.7, there is a 70% chance
the email is spam.

Challenges with Linear Regression for Classification:

o Linear regression can predict values outside the range of 0 to 1, which is unsuitable for
probabilities.
o Logistic Regression overcomes this by using a sigmoid function to map values to the range [0,
1].

Sigmoid Function:
The sigmoid function (also called the logistic function) is used to map any real number to the range [0, 1]. It is mathematically represented as sigmoid(z) = 1 / (1 + e^(−z)).

Difference between Odds and Probability:


For example:
If the probability of an event is 0.75, the odds are 0.75 / (1 − 0.75) = 3, i.e., 3 to 1.

Features of Logistic Regression:

 Logistic Regression predicts the probability of a class label.


 It applies a threshold (e.g., 0.5) to determine the class label.
 It is based on the log-odds transformation to linearize the relationship between
variables.

Advantages:

 Simple and efficient for binary classification.


 Works well when the relationship between the dependent and independent
variables is linear (in terms of log-odds).
 Outputs interpretable probabilities.

Limitations:

 Struggles with non-linear decision boundaries (can be addressed with extensions like
polynomial logistic regression).
 Sensitive to outliers in the dataset.
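A minimal sketch of binary logistic regression trained with batch gradient descent on the sigmoid output is given below; the learning rate, iteration count and toy marks/pass data are assumptions for illustration, and in practice a library implementation would typically be used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative data: exam marks (x) and pass/fail labels (y)
x = np.array([35.0, 40.0, 50.0, 55.0, 60.0, 70.0, 80.0, 90.0])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

X = np.c_[np.ones(len(x)), (x - x.mean()) / x.std()]   # bias term + standardised feature
w = np.zeros(2)
lr = 0.1

for _ in range(5000):                     # simple batch gradient descent
    p = sigmoid(X @ w)                    # predicted probabilities in [0, 1]
    w -= lr * X.T @ (p - y) / len(y)      # gradient of the log-loss

prob = sigmoid(np.r_[1.0, (58.0 - x.mean()) / x.std()] @ w)
print(f"P(pass | marks=58) = {prob:.2f}, predicted class = {int(prob >= 0.5)}")
```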


Chapter – 03

Decision Tree Learning

Introduction to Decision Tree Learning Model

Overview:

 Decision tree learning is a popular supervised predictive model for classification tasks.
 It performs inductive inference, generalizing from observed examples.
 It can classify both categorical and continuous target variables.
 The model is often used for solving complex classification problems with high
accuracy.

Structure of a Decision Tree:

 Root Node: The topmost node that represents the entire dataset.
 Internal/Decision Nodes: These are nodes that perform tests on input attributes and split
the dataset based on test outcomes.
 Branches: Represent the outcomes of a test condition at a decision node.


 Leaf Nodes/Terminal Nodes: Represent the target labels or output of the decision process.
 Path: A path from root to leaf node represents a logical rule for classification.

Process of Building a Decision Tree:

Goal: Construct a decision tree from the given training dataset.

Tree Construction:

o Start from the root and recursively find the best attribute for splitting.
o This process continues until the tree reaches leaf nodes that cannot be further
split.
o The tree represents all possible hypotheses about the data.

Output: A fully constructed decision tree that represents the learned model.

Inference or Classification:

Goal: For a given test instance, classify it into the correct target class.

Classification:

o Start at the root node and traverse the tree based on the test conditions for each
attribute.
o Continue evaluating test conditions until reaching a leaf node, which provides the
target class label for the instance.
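For reference, the short sketch below builds a decision tree and classifies a new instance with scikit-learn's DecisionTreeClassifier; the toy weather-style dataset, its integer encoding, and the max_depth setting are illustrative assumptions.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# toy dataset: [outlook, humidity] encoded as integers (illustrative only)
# outlook: 0 = sunny, 1 = overcast, 2 = rain; humidity: 0 = normal, 1 = high
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = ["no", "yes", "yes", "no", "yes", "yes"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# traverse the learned tree for a new instance: sunny outlook, normal humidity
print(tree.predict([[0, 0]]))                                     # predicted class label
print(export_text(tree, feature_names=["outlook", "humidity"]))   # learned rules
```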

Advantages of Decision Trees:

1. Easy to model and interpret.


2. Simple to understand.
3. Can handle both discrete and continuous predictor variables.
4. Can model non-linear relationships between variables.


5. Fast to train.

Disadvantages of Decision Trees:

1. It is difficult to determine how deep the tree should grow and when to stop.
2. Sensitive to errors and missing attribute values in training data.
3. Computational complexity in handling continuous attributes, requiring
discretization.
4. Risk of overfitting with complex trees.
5. Not suitable for classifying multiple output classes.
6. Learning an optimal decision tree is an NP-complete problem.

Decision Tree Induction Algorithms

Several decision tree algorithms are widely used in classification tasks, including ID3, C4.5, and
CART, among others.

These algorithms differ in their splitting criteria, handling of attributes, and robustness to data
characteristics.

Popular Decision Tree Algorithms:

ID3 (Iterative Dichotomizer 3):

o Developed by J.R. Quinlan in 1986.


o Constructs univariate decision trees (splits based on a single attribute).
o Uses Information Gain as the splitting criterion.
o Assumes attributes are discrete or categorical.
o Works well with large datasets but is prone to overfitting on small datasets.
o Cannot handle missing values or continuous attributes directly (requires
discretization).
o No pruning is performed, making it sensitive to outliers.
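To make the Information Gain criterion concrete, the short sketch below computes the entropy of a label set and the gain obtained from a candidate split; the small label counts used here are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy of a label set: -sum(p * log2(p)) over the class proportions."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Gain = entropy(parent) - weighted average entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["yes"] * 9 + ["no"] * 5                # 9 positive, 5 negative examples
split = [["yes"] * 6 + ["no"] * 2,               # subset for one attribute value
         ["yes"] * 3 + ["no"] * 3]               # subset for the other value
print(round(information_gain(parent, split), 3))
```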


C4.5:

o An extension of ID3 developed by J.R. Quinlan in 1993.


o Uses Gain Ratio as the splitting criterion, which normalizes Information Gain.
o Can handle both categorical and continuous attributes.
o Handles missing values by estimating the best split based on available data.
o Prone to outliers, which can affect the tree construction.

CART (Classification and Regression Trees):

o Developed by Breiman et al. in 1984.


o Can handle categorical and continuous-valued target variables.
o Uses the GINI Index as the splitting criterion for classification tasks.
o Builds binary decision trees (only two splits per node).
o Handles missing values and is robust to outliers.
o Can be used for regression tasks, making it versatile.
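For comparison with Information Gain, the GINI index used by CART can be sketched as follows; the label counts in the example are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """GINI index: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

left = ["yes"] * 6 + ["no"] * 2     # one side of a candidate binary split
right = ["yes"] * 3 + ["no"] * 3    # the other side
n = len(left) + len(right)
# CART prefers the split with the lowest weighted GINI of the two children
weighted_gini = len(left) / n * gini(left) + len(right) / n * gini(right)
print(round(weighted_gini, 3))
```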

Univariate vs. Multivariate Decision Trees:

Univariate Decision Trees:

o Split based on a single attribute at each decision node.


o Examples: ID3 and C4.5.
o Simple and axis-aligned splits.

Multivariate Decision Trees:

o Consider multiple attributes for splitting at a single decision node.


o Example: CART.
o More complex and better suited for non-linear relationships.


Features of Decision Tree Algorithms

Advantages and Limitations of ID3, C4.5, and CART:


Algorithm

