A Comparative Analysis On Linear Regression and Support Vector Regression
for linear regression. The models are evaluated with metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) to estimate their performance for use in particular business applications.

III. SUPERVISED MACHINE LEARNING

Supervised machine learning, represented in Fig. 1, infers a function from labeled training data. The training data are mapped to new values from the input data to produce a result [7]. This method is fast and produces accurate results. The training dataset consists of a number of tuples (T). Each tuple is a vector which contains attribute values. The target data can have more than one possible outcome or a continuous value.

(T ∪ X) denotes T as the input attributes, containing n attributes, T = {a1, a2, a3, …, an}, and y as the target attribute. The attributes can be unordered set values or real numbers. The training sets are assumed to be generated randomly. Supervised learning algorithms can be used for classification and regression.

Fig. 2. Classification

B. Regression

Regression is a statistical analysis method to identify the relationship between variables. The relationship is identified between the dependent and independent variables, and it can be described using a probability distribution function, as represented in Eq. 1.

Y = f(X, β)    (1)

where Y is the dependent variable, X is the independent variable and β is an unknown parameter.

Fig. 3. Regression
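The regression model Y = f(X, β) in Eq. 1 can be illustrated with a minimal sketch. The data below are synthetic placeholders (not from the paper's dataset), and the fit uses ordinary least squares via NumPy as one concrete choice of estimator for β.

```python
import numpy as np

# Hypothetical data: one independent variable X and a dependent variable Y
# generated as Y = 3*X + 2 plus noise, so beta = (2, 3) is the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 3.0 * X + 2.0 + rng.normal(0, 0.5, size=50)

# Fit Y = f(X, beta) = beta0 + beta1 * X by ordinary least squares.
A = np.column_stack([np.ones_like(X), X])   # design matrix [1, X]
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

print(beta)   # close to [2.0, 3.0]
```

The same β could equally be estimated by any other regression procedure; least squares is used here only because it is the most common baseline.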
Authorized licensed use limited to: Polytechnic University of Bucharest. Downloaded on August 23,2023 at 10:54:32 UTC from IEEE Xplore. Restrictions apply.
2016 Online International Conference on Green Engineering and Technologies (IC-GET)
subject to

z_i − (w · x_i + b) ≤ ε + ξ_i
(w · x_i + b) − z_i ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0
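These constraints say that each target z_i may deviate from the prediction w·x_i + b by at most ε, with the slack variables ξ_i and ξ_i* absorbing any violation above or below the ε-tube. A small numeric sketch (with made-up values for w, b and the data, chosen only for illustration) shows how the slacks are computed:

```python
import numpy as np

# Hypothetical fixed model (w, b) and epsilon-tube width.
eps = 0.5
w, b = np.array([2.0]), 1.0
x = np.array([[0.0], [1.0], [2.0]])
z = np.array([1.2, 3.0, 6.0])               # target values z_i

f = x @ w + b                               # predictions w.x_i + b -> [1, 3, 5]
xi_upper = np.maximum(0.0, (z - f) - eps)   # slack xi_i: target above the tube
xi_lower = np.maximum(0.0, (f - z) - eps)   # slack xi_i*: target below the tube

print(xi_upper, xi_lower)   # only the third point leaves the tube, by 0.5
```

Points inside the ε-tube contribute zero slack and hence zero loss; only the third point, which misses the tube by 0.5, is penalized.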
No. of Attributes: 5
Attributes: id_assessment, id_student, date_submitted, is_banked, score
Attribute Characteristics: Integer
Test Mode: Evaluate on training set
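The five integer attributes above can be loaded as a flat table. A minimal sketch with pandas follows; the two rows are made-up placeholders, not values from the actual Open University dataset.

```python
import io
import pandas as pd

# Hypothetical sample with the paper's five attributes; values are invented.
csv = io.StringIO(
    "id_assessment,id_student,date_submitted,is_banked,score\n"
    "1752,11391,18,0,78\n"
    "1752,28400,22,0,70\n"
)
df = pd.read_csv(csv)

print(df.dtypes)            # all five attributes parse as integers
print(df["score"].mean())   # 74.0 for this toy sample
```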
Fig. 7. Non-Linear Support Vector Regression

After processing the training data into the feature space, standard support vector regression is applied using Eq. 8.

max over α, α*:
−(1/2) Σ_{i,j=1}^{n} (α_i − α_i*)(α_j − α_j*) k(x_i, x_j) − ε Σ_{i=1}^{n} (α_i + α_i*) + Σ_{i=1}^{n} y_i (α_i − α_i*)    (8)

subject to Σ_{i=1}^{n} (α_i − α_i*) = 0 and 0 ≤ α_i, α_i* ≤ C,

where α_i, α_i* are the Lagrange multipliers.

B. Experimental Results

Table II and Table III present the metrics obtained for the evaluation of the LeastMedSq function for the linear regression model and the SMOreg function, which uses SVM with a linear kernel, respectively.

TABLE II. LEASTMEDSQ FUNCTION (TRAINING SET)

METRICS                        OBSERVED VALUE
Time taken to build model      3.29 seconds
Correlation coefficient        -0.0038
Mean absolute error            9.6689
Root mean squared error        12.54
Relative absolute error        98.8263 %
Root relative squared error    100.9408 %

The evaluation metrics in the above table apply only to the linear regression model with the LeastMedSq function.

TABLE III. SMOREG FUNCTION (TRAINING SET)
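The paper's comparison (Weka's LeastMedSq vs. SMOreg with a linear kernel, evaluated on the training set) can be sketched with scikit-learn stand-ins. Note the substitutions: LinearRegression below is an ordinary least-squares fit, not the robust least-median-of-squares method LeastMedSq implements, and SVR(kernel="linear") plays the role of SMOreg; the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.svm import SVR

# Synthetic stand-in data (not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(80, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=80)

for name, model in [("linear regression", LinearRegression()),
                    ("svr, linear kernel", SVR(kernel="linear", C=1.0, epsilon=0.1))]:
    pred = model.fit(X, y).predict(X)    # evaluated on the training set, as in the paper
    mae = mean_absolute_error(y, pred)
    rmse = mean_squared_error(y, pred) ** 0.5
    print(f"{name}: MAE={mae:.3f} RMSE={rmse:.3f}")
```

The hyperparameters C and epsilon are arbitrary illustrative choices; in Weka they correspond to SMOreg's complexity and epsilon parameters.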
C. Performance Evaluation

The performance is evaluated with the metrics obtained. In this analysis we consider only two metrics, Mean Absolute Error and Root Mean Squared Error, for evaluation.

1) Evaluation Metrics

a) Mean absolute error: It calculates the mean of the absolute errors over all predicted values, as described in Eq. 9.

MAE = (1/n) Σ_{i=1}^{n} |y'_i − y_i|    (9)

b) Root mean squared error: It calculates the square root of the mean of the squared errors, as described in Eq. 10.

RMSE = √( (1/n) Σ_{i=1}^{n} (y'_i − y_i)² )    (10)

The above metrics are compared to identify the best model in terms of linear regression for multivariate data analysis. If the RMSE and MAE are minimal, then the model is the best fit for linear regression. The comparison is represented in Fig. 8.

Fig. 8. Comparison of LeastMedSq and SMOreg linear kernel functions

It is observed that the LeastMedSq function best fits linear regression, though the time taken to build the training model is longer than for the SMOreg function with a linear kernel.

VI. CONCLUSION AND FUTURE WORK

Data analytics and business intelligence play a major role in the current competitive market. When analyzing time series multivariate data, an efficient data model should be used for accurate results. If the linear regression model is used, there are several functions associated with it. In this paper we analyzed the linear regression model with the LeastMedSq function and the SMOreg function over a multivariate, time series data set. The analytical results concluded that LeastMedSq is the best model for linear regression. In future, a better model can be identified for linear regression, with minimized RMSE and MAE and less time taken for constructing the model on the training data.

REFERENCES

[1] Philip Russom, "Big Data Analytics", TDWI Best Practices Report, 2011.
[2] Alfredo Cuzzocrea, Il-Yeol Song, Karen C. Davis, "Analytics over Large-Scale Multidimensional Data: The Big Data Revolution!", DOLAP'11, ACM, October 28, 2011.
[3] James R. Evans, Carl H. Lindner, "Business Analytics: The Next Frontier for Decision Sciences", Decision Science Institute, March 2012.
[4] Surajit Chaudhuri, Umeshwar Dayal, and Vivek Narasayya, "An Overview of Business Intelligence Technology", Communications of the ACM, Vol. 54, No. 8, August 2011.
[5] Galit Shmueli, Otto R. Koppius, "Predictive Analytics in Information Systems Research", MIS Quarterly, Vol. 35, No. 3, pp. 553-572, September 2011.
[6] R. S. Michalski, J. G. Carbonell, T. M. Mitchell, "Machine Learning: An Artificial Intelligence Approach", Springer-Verlag, 2013.
[7] S. B. Kotsiantis, "Supervised Machine Learning: A Review of Classification Techniques", Informatica, Vol. 31, pp. 249-268, 2007.
[8] G. A. F. Seber, A. J. Lee, "Linear Regression Analysis", Wiley Series in Probability and Statistics, 2012.
[9] D. C. Montgomery, E. A. Peck, G. G. Vining, "Introduction to Linear Regression Analysis", Wiley Series in Probability and Statistics, 2015.
[10] Zhang Xuegong, "Introduction to Statistical Learning Theory and Support Vector Machines", Acta Automatica Sinica, 2000.
[11] G. F. Smits, E. M. Jordaan, "Improved SVM Regression Using Mixtures of Kernels", IJCNN '02, Proceedings of the International Joint Conference on Neural Networks, Vol. 3, 2002.
[12] Alex J. Smola and Bernhard Scholkopf, "A Tutorial on Support Vector Regression", Statistics and Computing, Springer, 2004.
[13] S. R. Gunn, "Support Vector Machines for Classification and Regression", ISIS Technical Report, 1998.
[14] C. Li, L. Jiang, "Using Locally Weighted Learning to Improve SMOreg for Regression", Trends in Artificial Intelligence, PRICAI 2006.
[15] http://crsouza.blogspot.in/2010/03/kernel-functions-for-machine-learning.html
[16] http://crsouza.blogspot.in/2010/04/kernel-support-vector-machines-for.html
[17] J. F. Hair, "Multivariate Data Analysis", Upper Saddle River, NJ: Pearson Prentice Hall, 2006.
[18] Mark Hall, "The Weka Data Mining Software: An Update", ACM SIGKDD Explorations, Vol. 11, No. 1, pp. 10-18, June 2009.
[19] Remco R. Bouckaert, "Weka Manual for Version 3-7-8", The University of Waikato, January 2013.
[20] UCI Machine Learning Repository, http://archieve.ics.uci.edu/ml/datasets/Open+University+/Learning+Analytical+dataset