ML Module 3
MODULE 3
CHAPTER 4
SIMILARITY-BASED LEARNING
Similarity or Instance-based Learning
KNN
Variants of KNN
Locally weighted regression
Learning vector quantization
Self-organizing maps
RBF networks
Nearest-Neighbor Learning
A powerful classification algorithm used in pattern recognition.
K nearest neighbors (KNN) stores all available cases and classifies new cases based on a
similarity measure (e.g., a distance function).
One of the top data mining algorithms used today.
A non-parametric, lazy learning algorithm (an instance-based learning method).
Used for both classification and regression problems.
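The description above (store all cases, compare a new case by a distance function, use the neighbours for classification or regression) can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance, a simple majority vote for classification and a plain mean for regression; the function names and the toy points are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (point, target) pairs closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Majority vote among the k nearest neighbours.

    Counter.most_common keeps first-encountered order for equal counts, so a tie
    goes to the label that appears first in distance order.
    """
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Mean of the numeric targets of the k nearest neighbours."""
    targets = [value for _, value in k_nearest(train, query, k)]
    return sum(targets) / len(targets)

# Hypothetical toy data: two small clusters of 2-D points.
train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.2, 0.8), "B")]
print(knn_classify(train, (0.1, 0.1), k=3))  # prints A
```

Nothing is "trained" here: all work happens at query time, which is exactly what makes KNN a lazy learner.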
Purnima SM
Module 3- Machine Learning (BCS602)
4.5 Locally Weighted Regression (LWR)
Where τ is called the bandwidth parameter; it controls the rate at which the weight wi decays
to zero as xi moves farther from the query point.
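The weight formula that the "where" clause refers to did not survive in this copy. The sketch below assumes the Gaussian kernel commonly used for LWR, wi = exp(-(xi - x)^2 / (2τ^2)), where x is the query point and τ is the bandwidth. For each query it fits a straight line by weighted least squares and evaluates that line at the query. The function names and the toy data are hypothetical.

```python
import math

def gaussian_weight(xi, xq, tau):
    """Assumed kernel: exp(-(xi - xq)^2 / (2 * tau^2)); tau is the bandwidth."""
    return math.exp(-((xi - xq) ** 2) / (2 * tau ** 2))

def lwr_predict(xs, ys, xq, tau):
    """Fit a line by weighted least squares around xq, then evaluate it at xq."""
    w = [gaussian_weight(xi, xq, tau) for xi in xs]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    slope = sxy / sxx
    return y_bar + slope * (xq - x_bar)

# Hypothetical curved data (y = x^2): each query point gets its own local line.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 36]
for xq in (2.5, 4.5):
    print(xq, round(lwr_predict(xs, ys, xq, tau=0.8), 2))
```

A small τ makes the fit very local. If τ is so small that only one point keeps a non-negligible weight, `sxx` becomes zero and the local line is undefined. A very large τ gives every point weight close to 1, so the result approaches ordinary least squares.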
MODULE 3
CHAPTER 5
REGRESSION ANALYSIS
1.1 Introduction to Regression
Regression analysis is a fundamental concept that consists of a set of machine learning methods
that predict a continuous outcome variable (y) based on the value of one or multiple predictor
variables (x).
OR
Regression analysis is a statistical method to model the relationship between a dependent
(target) variable and one or more independent (predictor) variables.
Regression is a supervised learning technique which helps in finding the correlation between
variables.
It is mainly used for prediction, forecasting, time series modelling, and determining the
cause-and-effect relationship between variables.
Regression fits a line or curve through the data points on the target-predictor graph in such
a way that the vertical distances between the data points and the regression line are
minimized. These distances indicate whether the model has captured a strong relationship or not.
Positive Correlation: Two variables are said to be positively correlated when their values
move in the same direction. For example, in the image below, as the value for X increases, so
does the value for Y at a constant rate.
Negative Correlation: Variables X and Y are negatively correlated when their
values change in opposite directions, so here as the value for X increases, the value for Y
decreases at a constant rate.
Neutral Correlation: No relationship in the change of variables X and Y. In this case, the
values are completely random and do not show any sign of correlation, as shown in the
following image:
Causation
Causation describes a relationship between two variables in which x causes y. This is written as x implies y.
Regression is different from causation. Causation indicates that one event is the result of the
occurrence of the other event; i.e. there is a causal relationship between the two events.
Linear and Non-Linear Relationships
The relationship between input features (variables) and the output (target) variable is
fundamental. These concepts have significant implications for the choice of algorithms, model
complexity, and predictive performance.
A linear relationship creates a straight line when plotted on a graph, whereas a non-linear
relationship does not create a straight line but instead creates a curve.
Example:
Linear: the relationship between the hours spent studying and the grades obtained in a class.
Non-Linear:
Linearity:
Linear Relationship: A linear relationship between variables means that a change in one
variable is associated with a proportional change in another variable. Mathematically, it can be
represented as y = a * x + b, where y is the output, x is the input, and a and b are constants.
Linear Models: The goal is to find the best-fitting line (a plane or hyperplane in higher dimensions) to the data
points. Linear models are interpretable and work well when the relationship between variables
is close to being linear.
Limitations: Linear models may perform poorly when the relationship between variables is
non-linear. In such cases, they may underfit the data, meaning they are too simple to capture
the underlying patterns.
Non-Linearity:
Non-Linear Relationship: A non-linear relationship implies that the change in one variable is
not proportional to the change in another variable. Non-linear relationships can take various
forms, such as quadratic, exponential, logarithmic, or arbitrary shapes.
Non-Linear Models: Machine learning models like decision trees, random forests, support
vector machines with non-linear kernels, and neural networks can capture non-linear
relationships. These models are more flexible and can fit complex data patterns.
Benefits: Non-linear models can perform well when the underlying relationships in the data
are complex or when interactions between variables are non-linear. They have the capacity to
capture intricate patterns.
Types of Regression
Linear Regression:
Single Independent Variable: Linear regression, also known as simple linear regression, is
used when there is a single independent variable (predictor) and one dependent variable
(target).
Equation: The linear regression equation takes the form: Y = β0 + β1X + ε, where Y is the
dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope
(coefficient), and ε is the error term.
Purpose: Linear regression is used to establish a linear relationship between two variables and
make predictions based on this relationship. It's suitable for simple scenarios where there's only
one predictor.
Multiple Regression:
Multiple Independent Variables: Multiple regression, as the name suggests, is used when there
are two or more independent variables (predictors) and one dependent variable (target).
Equation: The multiple regression equation extends the concept to multiple predictors: Y = β0
+ β1X1 + β2X2 + ... + βnXn + ε, where Y is the dependent variable, X1, X2, ..., Xn are the
independent variables, β0 is the intercept, β1, β2, ..., βn are the coefficients, and ε is the error
term.
Purpose: Multiple regression allows you to model the relationship between the dependent
variable and multiple predictors simultaneously. It's used when there are multiple factors that
may influence the target variable, and you want to understand their combined effect and make
predictions based on all these factors.
Polynomial Regression:
Use: Polynomial regression is an extension of multiple regression used when the relationship
between the independent and dependent variables is non-linear.
Equation: The polynomial regression equation allows for higher-order terms, such as quadratic
or cubic terms: Y = β0 + β1X + β2X^2 + ... + βnX^n + ε. This allows the model to fit a curve
rather than a straight line.
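Polynomial regression is still linear in the coefficients β0 ... βn, so it can be fitted by ordinary least squares on the expanded features 1, X, X^2, ..., X^n. The sketch below builds those features, forms the normal equations and solves them with a small Gaussian-elimination helper. It is an illustrative implementation under that assumption; the helper names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., bn] for Y = b0 + b1*X + ... + bn*X^n."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]  # [1, x, x^2, ...]
    m = degree + 1
    # Normal equations: (R^T R) beta = R^T y
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve_linear_system(rtr, rty)

def polyval(coeffs, x):
    """Evaluate b0 + b1*x + ... + bn*x^n."""
    return sum(c * x ** p for p, c in enumerate(coeffs))

xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]   # hypothetical data on an exact quadratic
print(polyfit(xs, ys, 2))                   # close to [1, 2, 3]
```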
Logistic Regression:
Use: Logistic regression is used when the dependent variable is binary (0 or 1). It models the
probability of the dependent variable belonging to a particular class.
Equation: Logistic regression uses the logistic function (sigmoid function) to model
probabilities: P(Y=1) = 1 / (1 + e^(-z)), where z is a linear combination of the independent
variables: z = β0 + β1X1 + β2X2 + ... + βnXn. Applying a threshold to this probability (commonly
0.5) converts it into a binary outcome.
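The two steps described above can be sketched directly: compute z as a linear combination, pass it through the sigmoid, then threshold. This is a minimal sketch that assumes the coefficients are already known. The values in `betas` are hypothetical and not fitted to any data, and the 0.5 threshold is a common convention rather than a fixed rule.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(betas, features):
    """P(Y=1) where z = b0 + b1*x1 + ... + bn*xn and betas = [b0, b1, ..., bn]."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], features))
    return sigmoid(z)

def predict_class(betas, features, threshold=0.5):
    """Turn the probability into a 0/1 label."""
    return 1 if predict_proba(betas, features) >= threshold else 0

betas = [-4.0, 1.0]   # hypothetical coefficients, not fitted to any dataset
for x in (2.0, 4.0, 6.0):
    print(x, round(predict_proba(betas, [x]), 3), predict_class(betas, [x]))
```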
Lasso Regression (L1 Regularization):
Use: Lasso regression is used for feature selection and regularization. It penalizes the absolute
values of the coefficients, which encourages sparsity in the model.
Objective Function: Lasso regression adds an L1 penalty to the linear regression loss function:
Lasso = RSS + λΣ|βi|, where RSS is the residual sum of squares, λ is the regularization strength,
and |βi| represents the absolute values of the coefficients.
Ridge Regression (L2 Regularization):
Use: Ridge regression is used for regularization to prevent overfitting in multiple regression. It
penalizes the square of the coefficients.
Objective Function: Ridge regression adds an L2 penalty to the linear regression loss function:
Ridge = RSS + λΣ(βi^2), where RSS is the residual sum of squares, λ is the regularization
strength, and (βi^2) represents the square of the coefficients.
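The two objective functions above differ only in the penalty term. The sketch below writes them out literally and adds a closed-form ridge fit for the single-predictor case. It assumes the intercept is not penalized, which is the usual convention. With one centred predictor the ridge slope is Sxy / (Sxx + λ), so λ = 0 recovers ordinary least squares and larger λ shrinks the slope toward zero. The data are the week/hours values of Table 5.11 in this chapter. The function names are hypothetical.

```python
def rss(beta0, betas, X, y):
    """Residual sum of squares for predictions beta0 + sum_j betas[j] * x[j]."""
    total = 0.0
    for row, target in zip(X, y):
        pred = beta0 + sum(b * x for b, x in zip(betas, row))
        total += (target - pred) ** 2
    return total

def lasso_objective(beta0, betas, X, y, lam):
    """RSS + lambda * sum |beta_i| (intercept not penalized)."""
    return rss(beta0, betas, X, y) + lam * sum(abs(b) for b in betas)

def ridge_objective(beta0, betas, X, y, lam):
    """RSS + lambda * sum beta_i^2 (intercept not penalized)."""
    return rss(beta0, betas, X, y) + lam * sum(b ** 2 for b in betas)

def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge fit for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)   # lam = 0 gives the ordinary least-squares slope
    return y_bar - slope * x_bar, slope

xs = [1, 2, 3, 4, 5]          # week (Table 5.11)
ys = [12, 18, 22, 28, 35]     # hours spent
X = [[x] for x in xs]
for lam in (0, 10, 100):
    print(lam, ridge_fit_1d(xs, ys, lam))
```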
Limitations of Regression
A linear regression model can be created by fitting a line through the scattered data points. The
line is of the form:
y = a0 + a1x
where a0 is the intercept and a1 is the slope.
Ordinary Least Square Approach
The ordinary least squares (OLS) algorithm is a method for estimating the parameters of a
linear regression model. Aim: To find the values of the linear regression model's parameters
(i.e., the coefficients) that minimize the sum of the squared residuals.
In mathematical terms, this can be written as: Minimize ∑(yi – ŷi)^2
where yi is the actual value and ŷi is the predicted value.
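The OLS aim stated above can be checked numerically. The sketch computes the closed-form least-squares intercept and slope, then confirms that nudging either parameter away from that solution increases the sum of squared residuals. The toy data and function names are hypothetical.

```python
def fit_ols(xs, ys):
    """Return (a0, a1) minimising sum((y - (a0 + a1*x))**2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx
    return y_bar - a1 * x_bar, a1

def sse(a0, a1, xs, ys):
    """Sum of squared residuals for the line y = a0 + a1*x."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]            # hypothetical toy data
ys = [2.0, 4.1, 5.9, 8.2]
a0, a1 = fit_ols(xs, ys)
best = sse(a0, a1, xs, ys)
# Moving either parameter away from the OLS solution increases the SSE.
for da0, da1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(a0 + da0, a1 + da1, xs, ys) > best
print(round(a0, 4), round(a1, 4), round(best, 4))
```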
A linear regression model used for determining the value of the response variable, ŷ, can be
represented as the following equation.
Linear Regression in Matrix Form
The coefficient of determination r² is the ratio of the explained variation to the total variation (explained plus unexplained).
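As a quick illustration of r², the sketch fits a least-squares line to the week/hours data of Table 5.11 (later in this chapter), then computes r² as 1 - SSE/SST. For a least-squares line with an intercept, this equals explained variation divided by total variation. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a0 and slope a1 for y = a0 + a1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx
    return y_bar - a1 * x_bar, a1

def r_squared(xs, ys, a0, a1):
    """r^2 = 1 - SSE/SST (equal to explained/total for an OLS line with intercept)."""
    y_bar = sum(ys) / len(ys)
    preds = [a0 + a1 * x for x in xs]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]          # week (Table 5.11)
ys = [12, 18, 22, 28, 35]     # hours spent
a0, a1 = fit_line(xs, ys)
print(round(r_squared(xs, ys, a0, a1), 4))  # prints 0.9924
```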
2 Consider the following dataset in Table 5.11 where the week and number of working hours per
week spent by a research scholar in a library are tabulated. Based on the dataset, predict the
number of hours that will be spent by the research scholar in the 7th and 9th week. Apply the linear
regression model.
Table 5.11
xi (week)           1    2    3    4    5
yi (Hours Spent)   12   18   22   28   35
Solution
xi    yi    xi²    xi·yi
1     12    1      12
2     18    4      36
3     22    9      66
4     28    16     112
5     35    25     175
Sum = 15    Sum = 115    Sum = 55    Sum = 401
avg(xi) = 15/5 = 3    avg(yi) = 115/5 = 23    avg(xi²) = 55/5 = 11    avg(xi·yi) = 401/5 = 80.2
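$a_1 = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \dfrac{80.2 - 3 \times 23}{11 - 3^2} = \dfrac{11.2}{2} = 5.6$
$a_0 = \bar{y} - a_1\bar{x} = 23 - 5.6 \times 3 = 6.2$
The fitted line is y = 5.6x + 6.2.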
The prediction for the 7th week hours spent by the research scholar will be y = 5.6 × 7 + 6.2 = 45.4 hours.
The prediction for the 9th week hours spent by the research scholar will be y = 5.6 × 9 + 6.2 = 56.6 hours.
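The arithmetic of this solution can be reproduced with the same mean-based formulas, a1 = (avg(xy) - avg(x)·avg(y)) / (avg(x²) - avg(x)²) and a0 = avg(y) - a1·avg(x), applied to Table 5.11. This is a checking sketch. The helper `predict` is a hypothetical name.

```python
xs = [1, 2, 3, 4, 5]          # week (Table 5.11)
ys = [12, 18, 22, 28, 35]     # hours spent
n = len(xs)
mean_x = sum(xs) / n                                 # 15/5 = 3
mean_y = sum(ys) / n                                 # 115/5 = 23
mean_xx = sum(x * x for x in xs) / n                 # 55/5 = 11
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n     # 401/5 = 80.2

a1 = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
a0 = mean_y - a1 * mean_x

def predict(week):
    """Hours predicted by the fitted line y = a0 + a1*week."""
    return a0 + a1 * week

print(round(a1, 4), round(a0, 4))                   # prints 5.6 6.2
print(round(predict(7), 2), round(predict(9), 2))   # prints 45.4 56.6
```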
Height of Boys    65   70   75   78
Height of Girls   63   67   70   73
Fit a suitable line of best fit for the above data.
Solution
xi    yi    xi²     xi·yi
65    63    4225    4095
70    67    4900    4690
75    70    5625    5250
78    73    6084    5694
Sum = 288    Sum = 273    Sum = 20834    Sum = 19729
Mean(xi) = 288/4 = 72    Mean(yi) = 273/4 = 68.25    Avg(xi²) = 20834/4 = 5208.5    Avg(xi·yi) = 19729/4 = 4932.25
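$a_1 = \dfrac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \dfrac{4932.25 - 72(68.25)}{5208.5 - 72^2} = \dfrac{18.25}{24.5} = 0.7449$
$a_0 = \bar{y} - a_1\bar{x} = 68.25 - 0.7449(72) = 14.6172$
y = 0.7449x + 14.6172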
4 Using multiple regression, fit a line for the following dataset shown in Table 5.13.
Here, Z is the equity, X is the net sales and Y is the asset. Z is the dependent variable
and X and Y are independent variables. All the data is in million dollars.
Table 5.13
Z     X     Y
4     12    8
6     18    12
7     22    16
8     28    36
11    35    42
Solution
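The design matrix X (a column of 1s for the intercept, followed by the values of X and Y) and the vector of target values Z are:
$X = \begin{bmatrix} 1 & 12 & 8 \\ 1 & 18 & 12 \\ 1 & 22 & 16 \\ 1 & 28 & 36 \\ 1 & 35 & 42 \end{bmatrix}, \quad Z = \begin{bmatrix} 4 \\ 6 \\ 7 \\ 8 \\ 11 \end{bmatrix}$
The regression coefficients can be found as follows:
$\hat{a} = \left( (X^T X)^{-1} X^T \right) Z$, where $X^T X = \begin{bmatrix} 5 & 115 & 114 \\ 115 & 2961 & 3142 \\ 114 & 3142 & 3524 \end{bmatrix}$ and $X^T Z = \begin{bmatrix} 36 \\ 919 \\ 966 \end{bmatrix}$
$\hat{a} = \begin{bmatrix} -0.4135 \\ 0.39625 \\ -0.0658 \end{bmatrix}$
Hence, the fitted equation is Z = -0.4135 + 0.39625X - 0.0658Y.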
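The matrix computation for Table 5.13 can be reproduced in plain Python. Rather than inverting XᵀX explicitly, the sketch solves the normal equations (XᵀX)â = XᵀZ, which gives the same â and is numerically safer. The `solve` helper is a hypothetical small Gaussian-elimination routine, not a library function.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

Z = [4, 6, 7, 8, 11]        # equity (dependent variable)
X = [12, 18, 22, 28, 35]    # net sales
Y = [8, 12, 16, 36, 42]     # assets
design = [[1, x, y] for x, y in zip(X, Y)]   # leading 1 for the intercept
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
xtz = [sum(row[i] * z for row, z in zip(design, Z)) for i in range(3)]
a = solve(xtx, xtz)         # normal equations (X^T X) a = X^T Z
print([round(v, 4) for v in a])
```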
***
CHAPTER 6
DECISION TREE LEARNING
6.1 Introduction
The benefits of having a decision tree are as follows:
It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are simple and fast.
Example: Toll-free number
6.1.1 Structure of a Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf
node holds a class label. The topmost node in the tree is the root node.
Knowledge Inference or Classification
Entropy
Information gain
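The explanations under these two headings did not survive in this copy, so the sketch below uses the standard definitions, as used by ID3. Entropy(S) = -Σ p_i log2 p_i over the class proportions p_i. Information gain is the parent's entropy minus the size-weighted entropy of the partitions produced by splitting on an attribute. The labels and attribute values are hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy of the parent minus the weighted entropy of the partitions
    created by splitting on an attribute (one attribute value per example)."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    weighted = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no", "yes", "no"]        # hypothetical class labels
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]  # hypothetical attribute
print(round(entropy(labels), 3), round(information_gain(labels, outlook), 3))
```

A decision-tree learner of the ID3 family evaluates this gain for every candidate attribute and splits on the one with the highest value.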
Algorithm 6.1: General Algorithm for Decision Trees
6.2 DECISION TREE INDUCTION ALGORITHMS
6.2.2 C4.5 Construction
C4.5 is a widely used algorithm for constructing decision trees from a dataset.
The disadvantages of ID3 are: attributes must be nominal, the dataset must not include missing
data, and the algorithm tends to overfit.
To overcome these drawbacks, Ross Quinlan, the inventor of ID3, improved the algorithm and
created a new algorithm named C4.5. C4.5 can build more generalized models, handles continuous
as well as discrete attributes, copes with missing data, and supports post-pruning.
Dealing with Continuous Attributes in C4.5
6.2.3 Classification and Regression Trees Construction
Classification and Regression Trees (CART) is a widely used algorithm for
constructing decision trees that can be applied to both classification and regression
tasks. CART is similar to C4.5 but differs in its construction and splitting criteria.
For classification, CART constructs the decision tree based on Gini's
impurity index. It serves as an example of how the values of other variables can be used