22 A Comparison of Some Multivariate Linear Regression Estimation Methods
ABSTRACT This study examined some statistical estimation methods for multivariate regression, namely: multivariate ordinary maximum likelihood estimation (MVN), maximum likelihood estimation via the ECM algorithm (ECM), and covariance-weighted least squares estimation (CWLS). The practical side relies on real data regarding asthma patients. The comparison criterion is the mean square error (MSE). The results of the comparison show that ECM is the best method when the response variables yi are independent, having the smallest MSE, while the performance of the other methods varies with the sample size of the dependent variables.
Keywords: Multivariate Analysis, Linear Regression, Maximum Likelihood, ECM Algorithm, Covariance-Weighted Least Squares.
I. INTRODUCTION
The analysis of the relationship between the dependent and independent variables is known as regression analysis; it shows how the dependent variable changes as one or more independent variables change. Most regression models assume that Yi is a function of Xi and εi, with εi denoting an additive error term that may reflect un-modeled determinants of Yi or random statistical noise [1].
The least squares method was the earliest form of regression; Legendre published it in 1805 and Gauss in 1809. Both Legendre and Gauss applied the method to the problem of determining, from astronomical observations, the orbits of bodies around the Sun. In 1821, Gauss published a further refinement of the theory of least squares. In the 1950s and 1960s, economists employed electromechanical desktop calculators to compute regressions. Regression methods remained an active research area after 1970, with new approaches created for robust regression, for regression with correlated responses such as time series and growth curves, and for regression in which curves, images, graphs, or other complex data objects serve as predictors (independent variables) or response variables. Below is a large group of studies that used multiple regression analysis in various fields and that were able, through this analysis, to reach important results with a significant impact on human life at the educational, environmental, health, scientific and other levels [2].
The study in [3] gave a multivariate statistical analysis of the various categories of students, through the creation of a multivariate regression model, in order to identify the important factors affecting the quality of teaching in universities and to develop solutions for them. The exploitation of woody plants has created the need for a quick and cost-effective estimation of carbon stocks in forests; the study in [4] therefore aimed to identify and evaluate the factors associated with the carbon stock in the Shorea forest of Nepal. The correlations between the variables were examined, and a positive correlation with the carbon stock was observed graphically, while height, ownership and geographical location showed no statistical significance [4]. Linear regression is one of the most important and most popular statistical and machine learning algorithms; it is used to find a linear relationship between one or more variables. In [5], the different work done by researchers on polynomial regression is discussed, and their performance is compared to improve prediction and accuracy.
The multivariate regression model extends the usual regression setting to the case of several dependent variables: the impacts on a set of dependent variables are modeled all at one time. The dependent variables may take the form of repeated measures, i.e. measurements conducted on the same person or statistical unit at different times and/or under different conditions. They can also represent completely different measurements, for example, measuring nutritional variables such as body mass, height, weight and cholesterol at the same time and linking them to the diet of individuals [6].
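To make this setup concrete, here is a minimal sketch of fitting several responses at once; the simulated data and variable names are our own illustrative assumptions, not from the paper:

```python
import numpy as np

# Illustrative example: n individuals, p predictors (e.g., diet variables),
# q responses measured on the same individual (e.g., mass, cholesterol).
rng = np.random.default_rng(1)
n, p, q = 100, 4, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B = rng.normal(size=(p, q))                   # one coefficient column per response
Y = X @ B + rng.normal(size=(n, q))           # all responses modeled at one time

# Ordinary least squares handles every response column simultaneously.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```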
For the multivariate linear model $Y = XB + e$, where $y_i'$ denotes the $i$th row of $Y$ and $x_i'$ the $i$th row of $X$, the log-likelihood is

$$\ell(XB, \Sigma) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(y_i - B'x_i)'\,\Sigma^{-1}(y_i - B'x_i). \qquad (4)$$

The least squares estimate of $XB$ is unaffected by the covariance matrix $\Sigma$; as a result, $X\hat{B} = MY$ optimizes the likelihood function for every value of $\Sigma$, where $M = X(X'X)^{-}X'$ is the perpendicular projection operator onto the column space of $X$. It is then just a matter of locating the MLE of $\Sigma$: when $B$ is replaced with a least squares estimate, the log-likelihood, and therefore the likelihood, is maximized. Write $\hat{B} = (X'X)^{-}X'Y$. For each $\Sigma$, we need to maximize

$$\ell(X\hat{B}, \Sigma) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}\bigl(y_i - Y'X(X'X)^{-}x_i\bigr)'\,\Sigma^{-1}\bigl(y_i - Y'X(X'X)^{-}x_i\bigr)$$

subject to the stipulation that $\Sigma$ be positive definite. The last term on the right-hand side can be condensed. Create an $n \times 1$ vector

$$\rho_i \equiv (0, \ldots, 0, 1, 0, \ldots, 0)'$$

with the 1 in the $i$th place, so that $y_i = Y'\rho_i$. Then

$$\sum_{i=1}^{n}\bigl(y_i - Y'X(X'X)^{-}x_i\bigr)'\,\Sigma^{-1}\bigl(y_i - Y'X(X'X)^{-}x_i\bigr) \qquad (5)$$
$$= \sum_{i=1}^{n}\rho_i'\bigl(Y - X(X'X)^{-}X'Y\bigr)\,\Sigma^{-1}\,\bigl(Y - X(X'X)^{-}X'Y\bigr)'\rho_i$$
$$= \sum_{i=1}^{n}\rho_i'(I - M)\,Y\,\Sigma^{-1}\,Y'(I - M)\,\rho_i$$
$$= \operatorname{tr}\bigl[(I - M)\,Y\,\Sigma^{-1}\,Y'(I - M)\bigr]$$
$$= \operatorname{tr}\bigl[\Sigma^{-1}\,Y'(I - M)\,Y\bigr].$$

As a result, our problem is to maximize
$$\ell(X\hat{B}, \Sigma) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\operatorname{tr}\bigl[\Sigma^{-1}Y'(I - M)Y\bigr]. \qquad (6)$$

Setting all partial derivatives (with respect to the $\sigma_{ij}$'s) to zero, we discover the maximizing value. First,

$$\frac{\partial}{\partial \sigma_{ij}}\log|\Sigma| = \operatorname{tr}\bigl[\Sigma^{-1}T_{ij}\bigr], \qquad (7)$$

where the symmetric $q \times q$ matrix $T_{ij}$ has ones in row $i$ column $j$ and row $j$ column $i$ and zeros everywhere else. Second,

$$\frac{\partial}{\partial \sigma_{ij}}\Sigma^{-1} = -\Sigma^{-1}T_{ij}\Sigma^{-1}. \qquad (8)$$

One last result involving the derivative of a trace is required. Let $A(s) = [a_{ij}(s)]$ be an $r \times r$ matrix function of the scalar $s$. Then

$$\frac{d}{ds}\operatorname{tr}[A(s)] = \frac{d}{ds}\bigl[a_{11}(s) + \cdots + a_{rr}(s)\bigr] = \sum_{i=1}^{r}\frac{d a_{ii}(s)}{ds} = \operatorname{tr}\!\left[\frac{dA(s)}{ds}\right]. \qquad (9)$$

From (8), (9), and the chain rule,

$$\frac{\partial}{\partial \sigma_{ij}}\operatorname{tr}\bigl[\Sigma^{-1}Y'(I - M)Y\bigr] = \operatorname{tr}\!\left[\frac{\partial \Sigma^{-1}}{\partial \sigma_{ij}}\,Y'(I - M)Y\right] = -\operatorname{tr}\bigl[\Sigma^{-1}T_{ij}\Sigma^{-1}Y'(I - M)Y\bigr]. \qquad (10)$$

Therefore

$$\frac{\partial}{\partial \sigma_{ij}}\ell(X\hat{B}, \Sigma) = -\frac{n}{2}\operatorname{tr}\bigl[\Sigma^{-1}T_{ij}\bigr] + \frac{1}{2}\operatorname{tr}\bigl[\Sigma^{-1}T_{ij}\Sigma^{-1}Y'(I - M)Y\bigr].$$

When the partial derivatives are set equal to zero, a positive definite matrix $\hat{\Sigma}$ that solves the equations maximizes the likelihood:

$$n\operatorname{tr}\bigl[\hat{\Sigma}^{-1}T_{ij}\bigr] = \operatorname{tr}\bigl[\hat{\Sigma}^{-1}T_{ij}\hat{\Sigma}^{-1}Y'(I - M)Y\bigr], \qquad (11)$$

that is,

$$\operatorname{tr}\bigl[\hat{\Sigma}^{-1}T_{ij}\bigr] = \operatorname{tr}\!\left[\hat{\Sigma}^{-1}T_{ij}\,\hat{\Sigma}^{-1}\,\frac{1}{n}Y'(I - M)Y\right],$$

which is satisfied by $\hat{\Sigma} = \frac{1}{n}Y'(I - M)Y$, since substituting it reduces the right-hand side to $\operatorname{tr}[\hat{\Sigma}^{-1}T_{ij}]$. This, of course, applies to all $i$ and $j$. Furthermore, under weak conditions $\hat{\Sigma}$ is positive definite with probability one [7].
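As a numerical illustration of these closed-form estimates, here is a minimal sketch; the simulated data and variable names are our own assumptions, not taken from the paper's application:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 3, 2                      # observations, predictors, responses
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B_true = rng.normal(size=(p, q))
Y = X @ B_true + rng.normal(size=(n, q))

# Least squares / ML estimate of the coefficients: B-hat = (X'X)^- X'Y
B_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y

# Perpendicular projection operator M = X(X'X)^- X'
M = X @ np.linalg.pinv(X.T @ X) @ X.T

# MLE of the covariance matrix: Sigma-hat = Y'(I - M)Y / n
Sigma_hat = Y.T @ (np.eye(n) - M) @ Y / n
print(B_hat, Sigma_hat, sep="\n")
```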
IV. MAXIMUM LIKELIHOOD ESTIMATION VIA THE ECM ALGORITHM
In modern statistics, the EM algorithm is a widely used tool. It is an iterative method for finding maximum likelihood estimates and posterior modes in incomplete-data problems, and it has several advantages over Newton-Raphson. First, each iteration's E-step only requires taking expectations over complete-data conditional distributions, while the M-step only requires complete-data maximum likelihood estimation, which is often available in simple closed form. Second, it is numerically stable: each iteration increases the likelihood or posterior density, and convergence is nearly always to a local maximum for problems of practical importance [8].
The ECM algorithm is more versatile than EM, yet it retains the same desirable convergence properties. Each M-step of the EM method is replaced by a sequence of S conditional maximization (CM) steps, each of which maximizes the Q function over θ but with a different vector function of θ held fixed. Under roughly the same conditions that guarantee EM convergence, ECM converges to a stationary point [9].
Despite the fact that the joint maximizing values of β and Σ are rarely available in closed form, we observe that if Σ is known, say $\Sigma = \Sigma^{(t)}$, the conditional maximum likelihood estimate of β is just the weighted least-squares estimate:

$$\beta^{(t+1)} = \left[\sum_{i=1}^{n} X_i'\,(\Sigma^{(t)})^{-1}X_i\right]^{-1}\sum_{i=1}^{n} X_i'\,(\Sigma^{(t)})^{-1}Y_i. \qquad (12)$$

The conditional maximum likelihood estimate of Σ, on the other hand, can be calculated simply from the cross-products of the residuals given $\beta = \beta^{(t+1)}$:

$$\Sigma^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - X_i\beta^{(t+1)}\bigr)\bigl(Y_i - X_i\beta^{(t+1)}\bigr)'. \qquad (13)$$

Each conditional maximization (12) and (13) clearly increases the log-likelihood function [8]:

$$\ell(\beta^{(t+1)}, \Sigma^{(t)} \mid Y) \ge \ell(\beta^{(t)}, \Sigma^{(t)} \mid Y),$$
$$\ell(\beta^{(t+1)}, \Sigma^{(t+1)} \mid Y) \ge \ell(\beta^{(t+1)}, \Sigma^{(t)} \mid Y).$$
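A minimal sketch of this ECM iteration under the model of (12) and (13); the simulated inputs and the stopping rule are our own assumptions:

```python
import numpy as np

def ecm_mvreg(X_list, Y_list, max_iter=200, tol=1e-8):
    """Alternate CM-steps (12) and (13): weighted least squares for beta
    given Sigma, then residual cross-products for Sigma given beta."""
    n = len(Y_list)
    q = Y_list[0].shape[0]
    p = X_list[0].shape[1]
    beta, Sigma = np.zeros(p), np.eye(q)
    for _ in range(max_iter):
        Si = np.linalg.inv(Sigma)
        A = sum(Xi.T @ Si @ Xi for Xi in X_list)
        b = sum(Xi.T @ Si @ Yi for Xi, Yi in zip(X_list, Y_list))
        beta_new = np.linalg.solve(A, b)                      # CM-step (12)
        resid = [Yi - Xi @ beta_new for Xi, Yi in zip(X_list, Y_list)]
        Sigma = sum(np.outer(r, r) for r in resid) / n        # CM-step (13)
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, Sigma

# Usage: each Y_i is a q-vector of responses with its own q x p design X_i.
rng = np.random.default_rng(3)
X_list = [rng.normal(size=(2, 4)) for _ in range(100)]
beta0 = np.array([1.0, -0.5, 0.3, 2.0])
Y_list = [Xi @ beta0 + rng.normal(size=2) for Xi in X_list]
print(ecm_mvreg(X_list, Y_list)[0])
```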
V. COVARIANCE WEIGHTED LEAST SQUARES ESTIMATION (CWLS)
This is mvregress's fourth output. The standard errors of the CWLS regression coefficients are the square roots of the diagonal of this variance-covariance matrix. If you only know the error covariance matrix up to a proportion, that is, $C = \sigma^2 C_0$, then, as mentioned for ordinary least squares, you can multiply the mvregress variance-covariance matrix by the MSE [10].
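To illustrate, here is a minimal sketch of a covariance-weighted least-squares fit with a known error covariance C (our own construction, not the mvregress implementation), including the standard errors described above:

```python
import numpy as np

def cwls(X_list, Y_list, C):
    """CWLS with known error covariance C:
    beta = (sum X_i' C^-1 X_i)^-1 (sum X_i' C^-1 Y_i)."""
    Ci = np.linalg.inv(C)
    A = sum(Xi.T @ Ci @ Xi for Xi in X_list)
    b = sum(Xi.T @ Ci @ Yi for Xi, Yi in zip(X_list, Y_list))
    beta = np.linalg.solve(A, b)
    cov_beta = np.linalg.inv(A)          # variance-covariance of the coefficients
    se = np.sqrt(np.diag(cov_beta))      # standard errors from its diagonal
    return beta, cov_beta, se
```

If C is known only up to a proportion ($C = \sigma^2 C_0$), the cov_beta computed with $C_0$ would be rescaled by the MSE, as noted above.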
VI. APPLICATION
In this section, real data on asthma patients are analyzed. The data cover five variables: age, gender, weight, the percentage of oxygen in the body, and whether the patient suffers from other diseases. A sample of 200 patients was collected from the Najaf Health Department / Al-Hakim General Hospital.
Before applying the statistical estimation methods, we examine the data using the SPSS program.
Table (1) Tests of Normality

                 Kolmogorov-Smirnov (a)          Shapiro-Wilk
                 Statistic   df    Sig.          Statistic   df    Sig.
O2                 .037      200   .200 (*)        .995      200   .802
Sugar_Rate         .027      200   .200 (*)        .997      200   .979
Pressure           .045      200   .200 (*)        .994      200   .599

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
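For readers without SPSS, the same two normality tests can be sketched in Python; we assume scipy and statsmodels here as stand-ins, with an illustrative column in place of the real O2 data:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(2)
o2 = rng.normal(95, 2, size=200)        # illustrative stand-in for the O2 column

w, p_sw = stats.shapiro(o2)             # Shapiro-Wilk
d, p_ks = lilliefors(o2, dist="norm")   # Kolmogorov-Smirnov, Lilliefors-corrected
print(f"Shapiro-Wilk:     W = {w:.3f}, p = {p_sw:.3f}")
print(f"K-S (Lilliefors): D = {d:.3f}, p = {p_ks:.3f}")
```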
Figure (5) Dependent Variable: Sugar_Rate.   Figure (6) Dependent Variable: Pressure.
Table (2) Results of the Performance of the Estimation Methods for Real Data

Method              MVN                              ECM                             CWLS
              ŷ1        ŷ2        ŷ3           ŷ1       ŷ2       ŷ3           ŷ1        ŷ2        ŷ3
MSE yi      299.2158   35.2838    1.6968       0.0248   0.1718   0.0254       0.2465    0.1711    0.0253
RAB         210.9082  230.4608  200.7815     211.1150 210.9843 200.7783     211.1467  200.8031  201.4161
MSE              120.7454                          327.3888                       41.7151
From Table (2) we notice that, after applying the three statistical methods to the real data of 200 samples, the MSE of the third statistical method (CWLS) is smaller than that of the other statistical methods (MVN and ECM), and by a large margin.
VIII. CONCLUSION
In this study, three statistical estimation methods were used: MVN, ECM and CWLS. Each method produced certain results, and a comparison was then made between these results.
The sample size for the real data is 200, and after applying the three statistical methods of this study and obtaining the results, it was concluded that the CWLS method is the best statistical method, having a very small MSE compared with the MSE of the other two methods.
REFERENCES
[1] M. Thakur, "Regression Analysis Formula," 2021. [Online]. Available: https://www.wallstreetmojo.com/regression-analysis-formula/.
[2] A. Dempster, "An Overview of Multivariate Data Analysis," Journal of Multivariate Analysis, vol. 1, pp. 316-346, 1971.
[3] X. Li, P. Zhao and Y. Yang, "Application of Multivariate Regression Analysis in Teaching Management," Atlantis Press, 2018.
[4] I. Sharma and S. Kakchapati, "Linear Regression Model to Identify the Factors Associated with," Hindawi, p. 8, 2018.
[5] D. H. Maulud and A. M. Abdulazeez, "A Review on Linear Regression Comprehensive in Machine Learning," Journal of Applied Science and Technology Trends, pp. 140-147, 2020.
[6] D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to Linear Regression Analysis, 2012.
[7] R. Christensen, Advanced Linear Modeling: Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization, 2001.
[8] X. L. Meng and D. B. Rubin, "Maximum likelihood estimation via the ECM algorithm: a general framework," Biometrika, pp. 267-278, 1993.
[9] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed. (Wiley Series in Probability and Statistics, W. A. Shewhart and S. S. Wilks, series eds.), 2002.
[10] N. Beck and J. N. Katz, "What to Do (and Not to Do) with Time-Series Cross-Section Data," The American Political Science Review, pp. 634-647, 1995.