Introduction To Clusterwise Regression
By
Eman Ismail
Introduction
Multiple regression is widely used in many fields.
Problem
DeSarbo and Cron (1988) noted that in many applications the estimation of a single regression line is inadequate or even misleading.
➢ They illustrated this with a synthetic data set, shown in the figure below.
[Figure 3: Heterogeneous data with a combined R² = 1.]
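The original figure is not reproduced here, but the phenomenon is easy to recreate. The sketch below (with assumed numbers, not DeSarbo and Cron's data) builds two segments that each lie exactly on their own regression line (R² = 1 within each segment), while a single pooled line fits poorly:

```python
import numpy as np

# Two hypothetical segments (illustrative numbers, not the paper's data):
# segment I lies exactly on y = 1 + 2x, segment II exactly on y = 8 - 2x.
x = np.linspace(0.0, 5.0, 20)
y1 = 1 + 2 * x
y2 = 8 - 2 * x
x_all = np.concatenate([x, x])
y_all = np.concatenate([y1, y2])

# Within segment I, a least-squares line fits perfectly (R^2 = 1).
X1 = np.column_stack([np.ones_like(x), x])
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
r2_seg = 1 - ((y1 - X1 @ b1) ** 2).sum() / ((y1 - y1.mean()) ** 2).sum()

# A single pooled line over both segments explains almost nothing.
X = np.column_stack([np.ones_like(x_all), x_all])
b, *_ = np.linalg.lstsq(X, y_all, rcond=None)
r2_pooled = 1 - ((y_all - X @ b) ** 2).sum() / ((y_all - y_all.mean()) ** 2).sum()
```

Here the two slopes cancel, so the pooled line is nearly flat and its R² is near zero even though each segment is perfectly linear: exactly the situation where a single regression line misleads.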
Solution
Find the values of C_{i1}, C_{i2}, α_0, β_0, α_j, β_j, ε_{i1} and ε_{i2}, i = 1, …, n, j = 1, …, J, which:

minimize Σ_{i=1}^{n} (C_{i1} ε_{i1}² + C_{i2} ε_{i2}²)   (4)

subject to

y_i = α_0 + Σ_{j=1}^{J} α_j x_{ij} + ε_{i1},  i = 1, …, n,   (5)

y_i = β_0 + Σ_{j=1}^{J} β_j x_{ij} + ε_{i2},  i = 1, …, n,   (6)

C_{i1} + C_{i2} = 1,  i = 1, …, n,   (7)

C_{i1}, C_{i2} ≥ 0;  ε_{i1}, ε_{i2}, α_0, α_j, β_0, β_j unrestricted,  i = 1, …, n, j = 1, …, J.   (8)
Model 1: NLP model
This model is defined by n observations, J explanatory variables x_{ij}, and one response variable y_i.
The indices are j ∈ {1, …, J} for the explanatory variables and i ∈ {1, …, n} for the observations.
Also, this model assumes that a sample of n observations is divided into two
mutually exclusive segments or clusters (I, II).
The model decision variables are:
➢ the regression coefficients of cluster I and cluster II: α_0, α_j and β_0, β_j, respectively;
➢ the deviations of observation i from the regression lines of cluster I and cluster II: ε_{i1} and ε_{i2}, respectively;
➢ binary decision variables indicating whether observation i belongs to cluster I or cluster II: C_{i1} and C_{i2}, respectively.
If observation i belongs to cluster I then C_{i1} = 1; otherwise C_{i1} = 0. Similarly, C_{i2} = 1 if observation i belongs to cluster II and C_{i2} = 0 otherwise.
The objective function (4) minimizes the total sum of squared errors.
Constraints (5) and (6) define and estimate the regression functions of cluster I and cluster II, respectively.
Constraint (7) ensures that each observation i is a member of either cluster I or cluster II, but not both.
We do not restrict C_{i1} and C_{i2} to be binary. The optimization with constraint (7) and C_{i1}, C_{i2} ≥ 0 will force them to be either zero or one: for each i the objective is linear in (C_{i1}, C_{i2}), so at the optimum all weight goes to the cluster with the smaller squared error.
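As a sketch of how Model 1's memberships and coefficients interact, here is a simple alternating heuristic in Python (my own illustration, not the slides' NLP solution method): fix the assignments, fit each cluster's line by least squares, then reassign each observation to the line with the smaller squared error, mirroring the logic that drives C_{i1}, C_{i2} to 0 or 1.

```python
import numpy as np

def fit_two_lines(X, y, n_iter=50, seed=0):
    """Alternating heuristic: fit a line per cluster, then move each point
    to the line with the smaller squared error (a local search, not the
    NLP solver the slides' model would be handed to)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    assign = rng.integers(0, 2, n)            # random initial C_i1 in {0, 1}
    for _ in range(n_iter):
        betas = []
        for k in (0, 1):
            mask = assign == k
            if mask.sum() < X.shape[1]:       # guard: cluster too small to fit
                return None, np.inf, assign
            b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
            betas.append(b)
        sq = np.column_stack([(y - X @ b) ** 2 for b in betas])
        new_assign = sq.argmin(axis=1)        # reassignment step
        if np.array_equal(new_assign, assign):
            break                             # converged
        assign = new_assign
    sse = sq[np.arange(n), assign].sum()
    return betas, sse, assign

# Hypothetical two-cluster data (values are illustrative assumptions).
rng = np.random.default_rng(42)
x = rng.uniform(0, 5, 120)
z = rng.integers(0, 2, 120)
y = np.where(z == 0, 1 + 2 * x, 8 - 2 * x) + rng.normal(0, 0.2, 120)
X = np.column_stack([np.ones_like(x), x])

# Baseline: one regression line for everything.
b_single, *_ = np.linalg.lstsq(X, y, rcond=None)
sse_single = ((y - X @ b_single) ** 2).sum()

# Several random restarts; keep the partition with the smallest total SSE.
betas, sse, assign = min((fit_two_lines(X, y, seed=s) for s in range(10)),
                         key=lambda r: r[1])
```

Like any local search, a single run can stall in a poor partition, hence the random restarts; the best two-line solution should beat the single pooled line by a wide margin on data like this.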
Model 2: NLP model
If the distribution of the error term is not known, most researchers appeal to robustness by minimizing the sum of absolute errors instead of the sum of squared errors in (4).
Find the values of C_{i1}, C_{i2}, α_0, β_0, α_j, β_j, ε⁺_{i1}, ε⁻_{i1}, ε⁺_{i2} and ε⁻_{i2}, i = 1, …, n, j = 1, …, J, which:

minimize Σ_{i=1}^{n} (C_{i1}(ε⁺_{i1} + ε⁻_{i1}) + C_{i2}(ε⁺_{i2} + ε⁻_{i2}))   (9)

subject to

y_i = α_0 + Σ_{j=1}^{J} α_j x_{ij} + ε⁺_{i1} − ε⁻_{i1},  i = 1, …, n,   (10)

y_i = β_0 + Σ_{j=1}^{J} β_j x_{ij} + ε⁺_{i2} − ε⁻_{i2},  i = 1, …, n,   (11)

C_{i1} + C_{i2} = 1,  i = 1, …, n,   (12)

C_{i1}, C_{i2}, ε⁺_{i1}, ε⁻_{i1}, ε⁺_{i2}, ε⁻_{i2} ≥ 0;  α_0, β_0, α_j, β_j unrestricted,  i = 1, …, n, j = 1, …, J.   (13)
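Splitting each deviation into nonnegative parts ε⁺ and ε⁻, as in (10) and (11), is the standard linear-programming device for absolute errors: at the optimum at most one of the pair is nonzero, so their sum equals |ε|. A minimal sketch (assuming SciPy is available; data and names are illustrative) fits a single cluster's least-absolute-deviations line this way:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative one-cluster data (assumed values): y = 2 + 3x + Laplace noise.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([2.0, 3.0]) + rng.laplace(0.0, 0.5, n)

p = X.shape[1]
# Variables: [beta (p, free), e_plus (n, >= 0), e_minus (n, >= 0)].
# Equality constraints in the style of (10): X beta + e_plus - e_minus = y.
c = np.concatenate([np.zeros(p), np.ones(2 * n)])   # minimize sum(e+ + e-)
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
beta_lad = res.x[:p]                                # LAD coefficient estimates
```

With the memberships C_{i1}, C_{i2} fixed, Model 2 decomposes into two such linear programs, one per cluster; it is the joint choice of memberships and coefficients that makes the full problem nonlinear.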