CH 03 Regression Techniques
Regression
• Is regression supervised or unsupervised?
• What is regression?
1. Predicting the age of a person
2. Predicting the nationality of a person
3. Predicting whether the stock price of a company will increase tomorrow
4. Predicting whether a document is related to sightings of UFOs?
Regression
Regression can be of the following kinds:
• Linear regression
• Multiple linear regression
• Non-linear regression
Linear regression (single predictor variable)
y = α + βx
β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
α = ȳ − βx̄
x̄ → mean value of x
ȳ → mean value of y
Example 1
The table below shows the marks obtained by students in the midterm and
final-year exams.
Midterm (x)   Final year (y)
45            60
70            70
60            54
84            82
75            68
84            76
Find:
x̄ = 69.67, ȳ = 68.33, Σ(x − x̄)(y − ȳ) = 648.62, Σ(x − x̄)² = 1141.36
β = 648.62 / 1141.36 ≈ 0.568
α = ȳ − βx̄ ≈ 28.74
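The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `fit_simple_linear_regression` is an illustrative choice, not something from the slides.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_mean - beta * x_mean
    return alpha, beta


# Marks from Example 1
midterm = [45, 70, 60, 84, 75, 84]
final = [60, 70, 54, 82, 68, 76]

alpha, beta = fit_simple_linear_regression(midterm, final)
print(f"beta = {beta:.3f}, alpha = {alpha:.2f}")  # beta = 0.568, alpha = 28.74
```

Once α and β are known, a final-year mark for any midterm mark x is estimated as α + βx.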
• Now,
Example 2: Linear Regression
Solve using linear regression: for age = 55, what will be the glucose level?
SUBJECT   AGE X   GLUCOSE LEVEL Y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81
7         55      ?
Linear Regression – Step I
SUBJECT   AGE X   GLUCOSE LEVEL Y   XY   X²   Y²
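The Step I columns (XY, X², Y²) and the resulting prediction can be sketched in Python as follows. The code assumes the usual sum-based least-squares formulas, b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) and a = (ΣY − bΣX) / n; the variable names are illustrative.

```python
ages = [43, 21, 25, 42, 57, 59]      # AGE X for subjects 1-6
glucose = [99, 65, 79, 75, 87, 81]   # GLUCOSE LEVEL Y for subjects 1-6
n = len(ages)

# Step I: build the XY, X^2 and Y^2 columns for every subject
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, glucose)]
for subject, row in enumerate(rows, start=1):
    print(subject, *row)

# Column totals
sum_x = sum(r[0] for r in rows)
sum_y = sum(r[1] for r in rows)
sum_xy = sum(r[2] for r in rows)
sum_x2 = sum(r[3] for r in rows)
sum_y2 = sum(r[4] for r in rows)

# Slope b and intercept a of the fitted line y = a + b * x
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Subject 7: predicted glucose level at age 55
predicted = a + b * 55
print(f"a = {a:.2f}, b = {b:.4f}, prediction = {predicted:.2f}")
```

For this table the totals are ΣX = 247, ΣY = 486, ΣXY = 20485 and ΣX² = 11409, which gives a predicted glucose level of about 86.33 at age 55.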
• It handles overfitting pretty well using dimensionality reduction techniques, regularization, and cross-validation.
• Linear regression is quite sensitive to outliers. Hence, it should not be used in the case of big-size data.
Use Case – Implementing Linear Regression
1. Loading the Data
2. Exploring the Data
3. Slicing the Data
4. Splitting the Data into Training and Test Sets
5. Generating the Model
6. Evaluating the Accuracy
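The six steps above can be sketched end to end with the standard library alone. The dataset here is hypothetical: synthetic points generated from y = 3 + 2x plus noise, purely for illustration. The 80/20 split ratio and the use of R² as the accuracy measure are also assumptions, since the slides do not specify them.

```python
import random

# 1. Loading the data (hypothetical synthetic data: y = 3 + 2x + noise)
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 3 + 2 * x + rng.gauss(0, 0.5)))

# 2. Exploring the data
n_rows = len(data)
print("rows:", n_rows)
print("mean x:", sum(x for x, _ in data) / n_rows)
print("mean y:", sum(y for _, y in data) / n_rows)

# 3. Slicing the data into the feature column and the target column
xs = [x for x, _ in data]
ys = [y for _, y in data]

# 4. Splitting into training (80%) and test (20%) sets
indices = list(range(n_rows))
rng.shuffle(indices)
split = n_rows * 8 // 10
train_idx, test_idx = indices[:split], indices[split:]
x_train = [xs[i] for i in train_idx]
y_train = [ys[i] for i in train_idx]
x_test = [xs[i] for i in test_idx]
y_test = [ys[i] for i in test_idx]


# 5. Generating the model (least-squares line fitted on the training set)
def fit(x_values, y_values):
    x_mean = sum(x_values) / len(x_values)
    y_mean = sum(y_values) / len(y_values)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_mean) ** 2 for x in x_values)
    beta = s_xy / s_xx
    return y_mean - beta * x_mean, beta


alpha_fit, beta_fit = fit(x_train, y_train)


# 6. Evaluating the accuracy with R^2 on the held-out test set
def r_squared(x_values, y_values, alpha, beta):
    y_mean = sum(y_values) / len(y_values)
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(x_values, y_values))
    ss_tot = sum((y - y_mean) ** 2 for y in y_values)
    return 1 - ss_res / ss_tot


r2 = r_squared(x_test, y_test, alpha_fit, beta_fit)
print(f"alpha = {alpha_fit:.2f}, beta = {beta_fit:.2f}, R^2 = {r2:.3f}")
```

Evaluating on rows the model never saw during fitting is what makes the score a fair estimate of how the line generalizes.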
Multiple linear regression
• It is used to estimate the relationship between two or more
independent variables and one dependent variable
• Example:
• The selling price of a house can depend on the desirability of the
location, the number of bedrooms, the number of bathrooms, the
year the house was built, the square footage of the lot and a number
of other factors
• The height of a child can depend on the height of the mother, the
height of the father, nutrition, and environmental factors.
The simplest multiple regression model for two predictor variables is
y = β₀ + β₁x₁ + β₂x₂ + ε
Writing the coefficients as a, b₁, and b₂, the same model is
y = a + b₁x₁ + b₂x₂ + ε
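One way to estimate the coefficients of the two-predictor model is to solve the least-squares normal equations (XᵀX)b = Xᵀy. The sketch below does this with plain Python lists. The data points are hypothetical and generated without noise from y = 1 + 2x₁ + 3x₂, so the fit should recover those coefficients; the function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot candidate into place
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (x1, x2) pairs with y = 1 + 2*x1 + 3*x2 exactly
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

b0, b1, b2 = fit_multiple_linear_regression(predictors, targets)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With real, noisy data the recovered coefficients are estimates, and the leftover error corresponds to the ε term in the model.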
Polynomial Regression
where,
• m – number of features
• n – number of examples
• yᵢ – actual target value
• ŷᵢ – predicted target value
Ridge Regression
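A minimal sketch of ridge regression for a single feature (m = 1), assuming the commonly used cost Σᵢ (yᵢ − ŷᵢ)² + λw² with an unpenalized intercept. Under that assumption the slope has the closed form w = Σ(x − x̄)(y − ȳ) / (Σ(x − x̄)² + λ). The penalty strength `lam` and the function name are illustrative choices, not taken from the slides; the data are the marks from Example 1.

```python
def fit_ridge_single_feature(xs, ys, lam):
    """Return (intercept, w) minimizing sum((y - intercept - w*x)^2) + lam * w^2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / (s_xx + lam)           # the penalty shrinks the slope toward 0
    intercept = y_mean - w * x_mean   # the intercept is not penalized
    return intercept, w


midterm = [45, 70, 60, 84, 75, 84]
final = [60, 70, 54, 82, 68, 76]

_, w_ols = fit_ridge_single_feature(midterm, final, lam=0.0)      # ordinary least squares
_, w_ridge = fit_ridge_single_feature(midterm, final, lam=100.0)  # penalized fit
print(f"slope without penalty: {w_ols:.3f}, with lam=100: {w_ridge:.3f}")
```

Setting `lam` to 0 reproduces ordinary linear regression, and larger values pull the slope closer to zero.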
Note: Assume the model suggested by the optimizer for odds of passing the course is,
Since,
Difference between Linear Regression and Logistic Regression
Sr No | Linear Regression | Logistic Regression
1 | Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
2 | In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
3 | In linear regression, we find the best-fit line, by which we can easily predict the output. | In logistic regression, we find the S-curve, by which we can classify the samples.
4 | The least squares method is used to estimate the model parameters. | The maximum likelihood method is used to estimate the model parameters.
5 | The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value, such as 0 or 1, Yes or No, etc.
6 | In linear regression, the relationship between the dependent variable and the independent variable must be linear. | In logistic regression, a linear relationship between the dependent and independent variables is not required.
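The contrast between the best-fit line and the S-curve can be illustrated with a short sketch. The S-curve of logistic regression is the sigmoid function 1 / (1 + e^(−z)). The coefficients and the 0.5 threshold below are hypothetical values chosen only for demonstration.

```python
import math


def sigmoid(z):
    """S-shaped curve that maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_prediction(x, intercept, slope):
    """Linear regression output: an unbounded continuous value."""
    return intercept + slope * x


def logistic_prediction(x, intercept, slope, threshold=0.5):
    """Logistic regression output: a class label, 0 or 1."""
    probability = sigmoid(intercept + slope * x)
    return 1 if probability >= threshold else 0


# Hypothetical coefficients, for illustration only
intercept, slope = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    continuous = linear_prediction(x, intercept, slope)
    label = logistic_prediction(x, intercept, slope)
    print(x, continuous, round(sigmoid(continuous), 3), label)
```

The same linear combination feeds both models; logistic regression passes it through the sigmoid and thresholds the result to obtain a category.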
• The unsupervised k-means clustering algorithm assigns each data point a
membership of either 0 or 1 in a given cluster, i.e., the point either belongs to
the cluster or it does not. Fuzzy logic instead gives each data point a fuzzy
(fractional) membership value in each of the clusters.
• In fuzzy c-means clustering, we find the centroids of the clusters and then
calculate the distance of each data point from those centroids, repeating until
the clusters formed become constant.
Fuzzy Clustering
• Fuzzy Clustering is a type of clustering algorithm in machine learning
that allows a data point to belong to more than one cluster with
different degrees of membership.
• Step 1: Initialize the data points into the desired number of clusters
randomly.
• The table below shows the data points along with their membership
values (γ) in each cluster.
Cluster   (1, 3)   (2, 5)   (4, 8)   (7, 9)
1         0.8      0.7      0.2      0.1
2         0.2      0.3      0.8      0.9
• Step 3: Find the distance of each point from each centroid.
D11 = ((1 − 1.568)² + (3 − 4.051)²)^0.5 ≈ 1.2
Similarly, the distance of all other points is computed from both the centroids.
Similarly, compute all other membership values, and update the matrix.
Step 5: Repeat steps 2-4 until the membership values become constant or the
change between iterations is less than the tolerance value (a small value up to
which the difference between two consecutive updates is accepted).
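The iteration described above can be sketched in Python using the four data points and the initial membership table. Two assumptions are made here: the fuzzifier m = 2, which reproduces the centroid (1.568, 4.051) used in Step 3, and the standard fuzzy c-means update rules for centroids and memberships. The function names and the tolerance value are illustrative.

```python
import math

points = [(1, 3), (2, 5), (4, 8), (7, 9)]
initial_memberships = [
    [0.8, 0.7, 0.2, 0.1],  # cluster 1
    [0.2, 0.3, 0.8, 0.9],  # cluster 2
]
M = 2  # fuzzifier (assumed value)


def compute_centroids(pts, memberships, m=M):
    """Centroid of each cluster: mean of the points weighted by membership^m."""
    centroids = []
    for row in memberships:
        weights = [u ** m for u in row]
        total = sum(weights)
        cx = sum(w * p[0] for w, p in zip(weights, pts)) / total
        cy = sum(w * p[1] for w, p in zip(weights, pts)) / total
        centroids.append((cx, cy))
    return centroids


def compute_distances(pts, centroids):
    """Euclidean distance of every point from every centroid."""
    return [[math.hypot(p[0] - c[0], p[1] - c[1]) for p in pts] for c in centroids]


def update_memberships(distances, m=M):
    """Standard update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    exponent = 2 / (m - 1)
    n_clusters, n_points = len(distances), len(distances[0])
    updated = [[0.0] * n_points for _ in range(n_clusters)]
    for k in range(n_points):
        column = [distances[i][k] for i in range(n_clusters)]
        if 0.0 in column:
            # Point sits exactly on a centroid: give it full membership there
            zeros = column.count(0.0)
            for i in range(n_clusters):
                updated[i][k] = 1.0 / zeros if column[i] == 0.0 else 0.0
            continue
        for i in range(n_clusters):
            updated[i][k] = 1.0 / sum((column[i] / d) ** exponent for d in column)
    return updated


def fuzzy_c_means(pts, memberships, tolerance=1e-4, max_iter=100):
    """Repeat the centroid, distance and membership steps until the change is small."""
    for _ in range(max_iter):
        centroids = compute_centroids(pts, memberships)
        distances = compute_distances(pts, centroids)
        updated = update_memberships(distances)
        change = max(
            abs(a - b)
            for old_row, new_row in zip(memberships, updated)
            for a, b in zip(old_row, new_row)
        )
        memberships = updated
        if change < tolerance:
            break
    return compute_centroids(pts, memberships), memberships


# First pass, matching the worked numbers in the steps above
first_centroids = compute_centroids(points, initial_memberships)
d11 = compute_distances(points, first_centroids)[0][0]
print("first centroids:", first_centroids, "D11:", round(d11, 3))

final_centroids, final_memberships = fuzzy_c_means(points, initial_memberships)
print("final memberships:", final_memberships)
```

Each column of the membership matrix sums to 1, so every point distributes its full membership across the clusters rather than belonging to exactly one.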