0% found this document useful (0 votes)

69 views

Support Vector Regression

This document provides an overview of support vector regression (SVR). SVR is a generalization of support vector machines (SVM) that can be used for regression problems rather than just classification. It works by finding a tube that best approximates the function being estimated, while balancing model complexity and error. The width of this tube is determined by a parameter epsilon that defines the tolerance for error. SVR formulates the problem as a convex optimization that minimizes errors outside the epsilon-tube while keeping the tube as flat as possible. It uses loss functions like linear, quadratic or Huber loss to define the optimization objective.

Uploaded by

Nastaran Moosavi

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

69 views

Support Vector Regression

Uploaded by

Nastaran Moosavi

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/300719035

Support Vector Regression

Chapter · January 2015

DOI: 10.1007/978-1-4302-5990-9_4

CITATIONS READS

53 587

2 authors:

Mariette Awad Rahul Khanna

American University of Beirut Intel
123 PUBLICATIONS 833 CITATIONS 98 PUBLICATIONS 916 CITATIONS

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Datacenter load balancing of compute View project

Power and Machine Learning View project

All content following this page was uploaded by Mariette Awad on 30 July 2018.

The user has requested enhancement of the downloaded file.

Chapter 4

Support Vector Regression

The key to artificial intelligence has always been the representation.

—Jeff Hawkins

Rooted in statistical learning or Vapnik-Chervonenkis (VC) theory, support vector machines (SVMs) are
well positioned to generalize on yet-to-be-seen data. The SVM concepts presented in Chapter 3 can be
generalized to become applicable to regression problems. As in classification, support vector regression
(SVR) is characterized by the use of kernels, sparse solution, and VC control of the margin and the number
of support vectors. Although less popular than SVM, SVR has been proven to be an effective tool in real-value
function estimation. As a supervised-learning approach, SVR trains using a symmetrical loss function,
which equally penalizes high and low misestimates. Using Vapnik’s e -insensitive approach, a flexible tube
of minimal radius is formed symmetrically around the estimated function, such that the absolute values
of errors less than a certain threshold e are ignored both above and below the estimate. In this manner,
points outside the tube are penalized, but those within the tube, either above or below the function, receive
no penalty. One of the main advantages of SVR is that its computational complexity does not depend on
the dimensionality of the input space. Additionally, it has excellent generalization capability, with high
prediction accuracy.
This chapter is designed to provide an overview of SVR and Bayesian regression. It also presents a case
study of a modified SVR applicable to circumstances in which it is critically necessary to eliminate or strictly
limit underestimating a function.

SVR Overview
The regression problem is a generalization of the classification problem, in which the model returns a
continuous-valued output, as opposed to an output from a finite set. In other words, a regression model
estimates a continuous-valued multivariate function.
SVMs solve binary classification problems by formulating them as convex optimization problems
(Vapnik 1998). The optimization problem entails finding the maximum margin separating the hyperplane,
while correctly classifying as many training points as possible. SVMs represent this optimal hyperplane with
support vectors. The sparse solution and good generalization of the SVM lend themselves to adaptation to
regression problems. SVM generalization to SVR is accomplished by introducing an e-insensitive region
around the function, called the e-tube. This tube reformulates the optimization problem to find the tube that
best approximates the continuous-valued function, while balancing model complexity and prediction error.
More specifically, SVR is formulated as an optimization problem by first defining a convex e-insensitive loss
function to be minimized and finding the flattest tube that contains most of the training instances. Hence, a
multiobjective function is constructed from the loss function and the geometrical properties of the tube.

67
Chapter 4 ■ Support Vector Regression

Then, the convex optimization, which has a unique solution, is solved, using appropriate numerical
optimization algorithms. The hyperplane is represented in terms of support vectors, which are training
samples that lie outside the boundary of the tube. As in SVM, the support vectors in SVR are the most
influential instances that affect the shape of the tube, and the training and test data are assumed to
be independent and identically distributed (iid), drawn from the same fixed but unknown probability
distribution function in a supervised-learning context.

SVR: Concepts, Mathematical Model, and Graphical

Representation
SVR problem formulation is often best derived from a geometrical perspective, using the one-dimensional
example in Figure 4-1. The continuous-valued function being approximated can be written as in Equation 4-1.
For multidimensional data, you augment x by one and include b in the w vector to simply the mathematical
notation, and obtain the multivariate regression in Equation 4-2.

∑ w j x j + b, y , b ∈ , x , w ∈  M (4-1)
M
y = f (x) = < w , x > + b = j =1

T
w   x 
f ( x ) =     = w T x + b x , w ∈  M +1 (4-2)
b  1 

Figure 4-1. One-dimensional linear SVR

SVR formulates this function approximation problem as an optimization problem that attempts to find
the narrowest tube centered around the surface, while minimizing the prediction error, that is, the distance
between the predicted and the desired outputs. The former condition produces the objective function in
Equation 4-3, where  w  is the magnitude of the normal vector to the surface that is being approximated:
1
w . (4-3)
2
minw
2

68
Chapter 4 ■ Support Vector Regression

To visualize how the magnitude of the weights can be interpreted as a measure of flatness, consider the
following example:

f ( x , w ) = å i = 1 wi x i , x Î  , w Î  M .
M

Here, M is the order of the polynomial used to approximate a function. As the magnitude of the vector w
increases, a greater number of wi are nonzero, resulting in higher-order solutions, as shown in Figure 4-2.
The horizontal line is a 0th-order polynomial solution and has a very large deviation from the desired
outputs, and thus, a large error. The linear function, a 1st-order polynomial, produces better approximations
for a portion of the data but still underfits the training data. The 6th-order solution produces the best
tradeoff between function flatness and prediction error. The highest-order solution has zero error but a
high complexity and will most likely overfit the solution on yet to be seen data. The magnitude of w acts as a
regularizing term and provides optimization problem control over the flatness of the solution.

Figure 4-2. Solutions with various orders

The constraint is to minimize the error between the predicted value of the function for a given input
and the actual output. SVR adopts an e-insensitive loss function, penalizing predictions that are farther
than e from the desired output. The value of e determines the width of the tube; a smaller value indicates
a lower tolerance for error and also affects the number of support vectors and, consequently, the solution
sparsity. Intuitively, the latter can be visualized for Figure 4-1. If e is decreased, the boundary of the tube is
shifted inward. Therefore, more datapoints are around the boundary, which indicates more support vectors.
Similarly, increasing e will result in fewer points around the boundary.
Because it is less sensitive to noisy inputs, the e-insensitive region makes the model more robust. Several
loss functions can be adopted, including the linear, quadratic, and Huber e, as shown in Equations 4-4, 4-5,
and 4-6, respectively. As demonstrated in Figure 4-3, the Huber loss function is smoother than the linear
and quadratic functions, but it penalizes all deviations from the desired output, with greater penalty as the
error increases. The choice of loss function is influenced by a priori information about the noise distribution
affecting the data samples (Huber 1964), the model sparsity sought, and the training computational
complexity. The loss functions presented here are symmetrical and convex. Although asymmetrical loss
functions can be adopted to limit either underestimation or overestimation, the loss functions should be
convex to ensure that the optimization problem has a unique solution that can be found in a finite number of
steps. Throughout this chapter, the derivations will be based on the linear loss function of Equation 4-4.

69
Chapter 4 ■ Support Vector Regression

ïì 0 y - f ( x ,w ) £ e ;
Le ( y , f ( x ,w ) ) = í (4-4)
îï y - f ( x ,w ) - e otherwise ,

ì0
ï y - f ( x ,w ) £ e ; (4-5)
Le ( y , f ( x ,w ) ) = í
( )
2
y - f ( x ,w ) - e otherwise ,
îï
ì c 2

ïïc y - f ( x ,w ) - y - f ( x ,w ) > c
L ( y , f ( x ,w ) ) = í 2 (4-6)
1
ï y - f ( x ,w ) y - f ( x ,w ) £ c
2

ïî 2

Figure 4-3. Loss function types: (a) linear, (b) quadratic, and (c) Huber

ASYMMETRICAL LOSS FUNCTIONS

Some researchers have proposed modification to loss functions to make them asymmetrical. Shim,
Yong, and Hwang (2011) used an asymmetrical e-insensitive loss function in support vector quantile
regression (SVQR) in an attempt to decrease the number of support vectors. The authors altered the
insensitivity according to the quantile and achieved a sparser model. Schabe (1991) proposed a
two-sided quadratic loss function and a quasi-quadratic s-loss function for Bayes parameter estimation,
and Norstrom (1996) replaced the quadratic loss function with an asymmetrical loss function to
derive a general class of functions that approach infinity near the origin for Bayesian risk analysis.
Nath and Bhattacharyya (2007) presented a maximum margin classifier that bounds misclassification
for each class differently, thus allowing for different tolerances levels. Lee, Hsieh, and Wang (2005)
reformulated the typical SVR approach into a nonconstrained problem, thereby only solving a system
of linear equations rather than a convex quadratic one. Pan and Pan (2006) compared three* different
loss functions for economic tolerance design: Taguchi’s quadratic loss function, inverted normal loss
function, and revised inverted normal loss function.

Adopting a soft-margin approach similar to that employed in SVM, slack variables x, x* can be added
to guard against outliers. These variables determine how many points can be tolerated outside the tube
illustrated in Figure 4-1.
Based on Equations 4-3 and 4-4, the optimization problem in Equation 4-7 is obtained; C is a
regularization—thus, a tuneable parameter that gives more weight to minimizing the flatness, or the error, for
this multiobjective optimization problem. For example, a larger C gives more weight to minimizing the error.
This constrained quadratic optimization problem can be solved by finding the Lagrangian (see Equation 4-8).
The Lagrange multipliers, or dual variables, are l, l*, a, a* and are nonnegative real numbers.

70
Chapter 4 ■ Support Vector Regression

1
min  w 2 + C å i =1 xi + xi* ,
N
(4-7)
2
subject to
yi - w T xi £ e + xi* i = 1...N

w x i - yi £ e + xi
T
i = 1...N
xi ,x ³ 0 i = 1... N
i
*

1
 (w ,x * ,x , l , l * ,a ,a * ) =  w 2 +C å i =1 xi + xi* + å i =1a i* ( yi - w T xi - e - xi* )
N N

2 (4-8)
+ å i =1a i ( - yi + w T xi - e - xi ) - å i =1 lixi + li*xi*
N N

The minimum of Equation 4-8 is found by taking its partial derivatives with respect to the variables
and setting them equal to zero, based on the Karush-Kuhn-Tucker (KKT) conditions. The partial derivatives
with respect to the Lagrange multipliers return the constraints, which have to be less than or equal to zero,
as illustrated in Equation 4-9. The final KKT condition states that the product of the Lagrange multipliers
and the constraints is equal to zero (see Equation 4-10). The Lagrange multipliers that are equal to zero
correspond to data inside the tube, whereas the support vectors have nonzero-valued Lagrange multipliers.
The solution is written in terms of the support vector only—hence, the solution sparsity. The function
approximation is represented in Equation 4-12. By replacing Equation 4-9 in Equation 4-8, the dual form of
the optimization problem can be written as shown in Equation 4-13.

d
= w - å i =1 (a i* - a i ) xi = 0
N

dw
d
= C - li* - a i* = 0
dxi*
d
= C - li - a i = 0
dxi
d (4-9)
= å i =1 xi* £ 0
N

dli*
d
= å i =1 xi £ 0
N

dli
d
= yi - w T xi - e - xi* £ 0
da i*
d
= - yi + w T x i - e - xi £ 0
da i

a i ( - yi + w T x i - e - xi ) = 0
a i* ( yi - w T xi - e - xi* ) = 0 (4-10)
"i
li xi = 0 ,
li*xi* = 0

w = N SV (a * - a ) x
å i=1 i i i (4-11)

71
Chapter 4 ■ Support Vector Regression

f ( x ) = å i =1SV (a i* - a i ) xiT x ,a i ,a i* Î[0 ,C ]

N
(4-12)

1 N SV N SV *
max a ,a * - e å SV (a i + a i* ) + å SV (a i* - a i ) yi - å å (ai - ai ) (a *j - a j ) xiT x j ,
N N
(4-13)
i =1 i =1
2 j =1 i =1

subject to

å (a - a i ) = 0 , a i , a i* Î[0 ,C ]
N SV *
i =1 i

At the beginning of this section, the weights vector w was augmented with the scalar b, and the
derivation of the SVR’s mathematical formulation was carried out, disregarding the explicit computation of b
(see Equation 4-2). However, b could have been calculated from the KKT conditions, as shown next.
Training data that belong to the outside of the boundary of the tube will have nonzero ai or a i* ; they
cannot both be zero, because that would mean that the instance (xi, yi) belongs to the lower and upper
boundary, which is not possible. Therefore, the corresponding constraints will be satisfied with equality, as
demonstrated in Equation 4-14. Furthermore, because the point is not outside the tube, xi = 0 , leading to
the result in Equation 4-15 when a Î(0 ,C ) . Equation 4-16 computes b. Performing the same analysis for a i* ,
one gets Equations 4-17 and 4-18.
(4-14)
yi - w T xi - b - e - xi = 0
(4-15)
yi - w xi - b - e = 0
T

(4-16)
b = yi - w xi - e
T

(4-17)
- yi + w T xi - b - e = 0
(4-18)
b = - yi + w xi - e
T

Instead of using the KKT conditions, one could have also computed b, while solving the optimization
problem, using the interior-point method, which can converge to an optimal solution in logarithmic time by
navigating along the central path of the feasible region. The central path is determined by solving the primal
and dual optimization problems simultaneously.

Kernel SVR and Different Loss Functions: Mathematical

Model and Graphical Representation
The previous section dealt with data in the feature space, assuming f (x) is linear. For non linear functions,
the data can be mapped into a higher dimensional space, called kernel space, to achieve a higher accuracy,
using kernels that satisfy Mercer’s condition (see Figure 4-4), as discussed previously for classification.
Therefore, replacing all instances of x in Equations 4-1–4-18 with k(xi, xj) yields the primal formulation
shown in Equation 4-19, where j(.) is the transformation from feature to kernel space. Equation 4-20
describes the new weight vector in terms of the transformed input. The dual problem is represented in
Equation 4-21, and the function approximation f (x) is in Equation 4-22, where k(.,.), the kernel, is as
illustrated in Equation 4-23.

72
Chapter 4 ■ Support Vector Regression

1 (4-19)
min  w 2 +C å i =1 xi + xi* ,
N

subject to
yi - w T j ( xi ) £ e + xi* i = 1,..., N

w T j ( x i ) - y i £ e + xi i = 1,..., N

xi ,xi* ³ 0 i = 1, ..., N
N SV

w = å (a i - a i )j ( xi )
* (4-20)
i =1

N SV N SV
1 N SV N SV *
max a ,a * - e å (a i + a i ) + å (a i - a i ) yi -
*

i =1
*

i =1
å å (ai - ai ) (a *j - a j ) k ( xi , x j )
2 j =1 i =1
(4-21)

N SV
a i ,a i* Î [0 ,C ] ,i = 1,..., N SV , å (a i* - a i ) = 0
i =1

N SV

f ( x ) = å (a i* - a i ) k ( xi , x ) (4-22)
i =1

(4-23)
k ( xi , x ) = j ( xi ) .j ( x )

2.5

1.5

0.5

-0.5

-1

-1.5

-2

-2.5
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

Figure 4-4. Nonlinear regression

73
Chapter 4 ■ Support Vector Regression

Bayesian Linear Regression

Unlike SVR, Bayesian linear regression is a generative, as opposed to discriminant, method, that builds
linear regression models based on Bayesian inference. After specifying a model, the method computes the
posterior distribution of parameters and model predictions. This statistical analysis allows the method to
determine model complexity during training, which results in a model that is less likely to overfit.
For simplicity, assume that a single output y p Î are predicted using the model parameters w
learned from a set of predictor variables X sized k ´1 and observations Y sized n ´1 . The observations Y
are assumed to have the distribution in Equation 4-24, where s 2 is the variance of the uncertainty in the
observations:

P (Y | w ,s 2 , X ) ~  ( Xw ,s 2 I ) (4-24)

Once the model has been specified, the model parameters’ posterior distributions can be estimated.
This is done by first assuming a prior distribution of the model parameters (see Equation 4-25). Given the
model variance and observations, the posterior distribution of the model parameters (which is Gaussian)
is as shown in Equation 4-26, with the mean computed in Equation 4-27, and the standard deviation scale
factor, in Equation 4-28. The mean is simply the Moore-Penrose pseudoinverse of the predictive variables
multiplied by the observations. Given some observations, the posterior probability of the model variance is
computed, and an inverse chi-squared distribution (see Equation 4-29), with n - k degrees of freedom and
a scale factor s2 (see Equation 4-30), is obtained. The scale factor is the error between the model’s predicted
output and an observation.
1
P (w,s ) µ
2 (4-25)
s2

P (Y | w ,s 2 , X ) P (w|s 2 )
P (w|s 2 ,Y ) = ~  (w E ,vws 2 ) (4-26)
P (Y |s 2 )

w E = ( X T X ) X T Y
-1 (4-27)

vw = ( X X )
T -1 (4-28)

P (Y |s 2 ) P (s 2 )
P (s |Y ) = ~ inv -  2 ( n - k , s 2 )
2 (4-29)
P (Y )

(Y - XwE ) (Y - XwE )
T
(4-30)
s2 =
n-k
The marginal posterior distribution of the model parameters, given the observations, is a multivariate
Student’s t-distribution, shown in Equation 4-31 and computed in Equation 4-32, with n - k degrees of
freedom, wE mean, and s2 scale factor, as P (w|s 2 ,Y ) has a normal distribution, and P (s 2 |Y ) has an inverse
chi-squared distribution.
(4-31)
P ( w |Y ) ~ t ( n - k , w E , s 2 )

P (w|Y ) = ò P (w|s 2 ,Y ) P (s 2 |Y ) ds 2 (4-32)

74
Chapter 4 ■ Support Vector Regression

Given the model parameter probability distributions and a set of predictive variables Xp, the marginal
posterior predictive distribution Yp, which is a multivariate Student’s t-distribution (see Equation 4-33) can
be determined. The mean is computed in Equation 4-34, and the variance, in Equation 4-35. The predictive
distribution variance depends on the uncertainty in the observed data and the model parameters.

(
P (YP |Y ) ~ t n - k , E (Yp |Y ) ,var (Yp |s ,Y )
2
) (4-33)

E (Yp |Y ) = X pw E (4-34)

var (Yp |s ,Y ) = ( I + X pvw X p )s

2 T 2 (4-35)

The concept of Bayesian regression is displayed in Figure 4-5, in which the sample input data available
during training would have been generated by a Gaussian distribution. If these instances represent their
population well, the regression model is expected to generalize well.

Figure 4-5. One-dimensional regression example illustrating the Gaussian conditional probability
distributions of the output on the input and model parameters

DISCRIMINANT VS. GENERATIVE MODELS

A generative approach models the joint probability distribution of the data and class labels p(x, Ck ),
based on the prior probability distributions of the class labels p(Ck ) and the likelihood probability
distribution p ( x|C k ). The joint distribution computes the posterior probability distributions p (C k |k ) ,
which will be used to map datapoints to class labels.
A discriminant approach directly computes the posterior probability distributions p (C k |x ) without
computing the joint probability distribution p(x, Ck ). A discriminant approach produces a mapping from
the datapoints to the class labels without computing probability distributions. Therefore, this approach
performs the inference and decision stages in one step.

75
Chapter 4 ■ Support Vector Regression

Advantages Disadvantages
Generative • Robust to outliers • Computationally demanding
• Can easily update decision model • Requires a lot of training data
• Allows combination of classifiers trained • Suffers from the curse of
on different types of data by applying dimensionality
probability rules
• Can improve prediction accuracy by
measuring confidence in classification based
on posterior distributions and not making
predictions when confidence is low
Discriminant • Computationally less demanding • Sensitive to noisy data and outliers
• Simple to implement • Requires retraining for any changes
in the decision model

Asymmetrical SVR for Power Prediction: Case Study

Justification: In many instances of approximation, there is an uneven consequence of misprediction,
based on whether the error is above or below the target value (Stockman et al. 2012a, 2012b). For example,
in power prediction an incorrect low estimate may be of much more concern than an overestimate.
Underpredicting can lead to insufficient cooling of datacenters, inadequate uninterruptible power supply
(UPS), unavailable processor resources, needless powering down of chip components, and so on. In the case
of forest fire behavior prediction, a lower estimate of the threat can lead to greater property damage as well
as loss of life, owing to a lack of adequate supply of personnel and equipment.
In these instances, it is crucial to minimize misestimates on one side of a boundary, even at the risk of
reducing the accuracy of the entire estimation. It is necessary to restrict the loss function so that a minimal
number of under- or overestimates occur. This leads to an asymmetrical loss function for training, in which a
greater penalty is applied when the misestimate is on the wrong side of the boundary.
Approach: Asymmetrical and lower-bounded SVR (ALB-SVR) was proposed by Stockman, Awad, and
Khanna (2012a). This approach modifies the SVR loss functions and corresponding error functions, such
that the e-tube is only above the function, as demonstrated in Figure 4-6. The penalty parameter C is split
into C+ and C- so that different penalties can be applied to the upper and lower mispredictions.

Figure 4-6. (a) SVR and (b) ALB-SVR (Source: Intel, 2012)

76
Chapter 4 ■ Support Vector Regression

ALB-SVR uses the Huber insensitive loss function (Popov and Sautin 2008). This function is similar to
the e-insensitive loss function; however, it increases quadratically for small errors outside the e-bound but
below a certain threshold ¶ > e and then linearly beyond ¶ . This makes it robust with respect to outliers.
The Huber insensitive loss function is represented by:
ì0 if t - y £ e
ïï
L e¶HuberSVR ( t , y ) = í( t - y - e )
2
if e < t - y < ¶
ï
ïî( ¶ - e ) ( 2 t - y - ¶ - e ) if t - y ³ ¶.

ALB-SVR modifies the Huber insensitive loss function as follows:

ì0 if 0 ³ ( t - y ) £ e
ï
ïï( t - y ) if ( t - y ) < 0
2

L e¶HuberALB -SVR ( t , y ) = í
ï( (t - y ) - e )
2
if e < (t - y ) < ¶
ï
ïî( ¶ - e ) ( 2 t - y - ¶ - e ) if t - y ³ ¶.

Thus, the solution is:

é L + 1 L ù
ê å (a i - a i )t i - å
-
(ea i+ 2 - a i- 2 )ú
ê i =1 2C i= 1 ú,
max a + ,a - ê 1 ú
ê - å(a i+ - a i- )(a i+ - a i- ) xi × x j ú
êë 2 i , j úû

and the resulting optimization problem:

é L + 1 L -2 ù
ê -å(a i - a i )ti + 2C å(ea i - a i )ú
- +2

max a + ,a - ê i =1 i =1
ú
ê 1 ú
ê + å(a i - a i )(a i - a i )xi × x j
+ - + -
ú
ë 2 i,j û
-C £ (a i+ - a i- ) £ C i = 1..L
L

å(a1
i
+
- a i- ) = 0.

By substituting the new loss function, ALB-SVR’s empirical risk becomes

1 L
Remp ( y ) = åLe -ALB-SVR (ti , yi ).
L i =1

The maximum additional empirical risk for ALB-SVR can be computed as

L L

å (y -t ) + å
iÎ( y -t )£e
e.
iÎ( y -t )>e

Validation: ALB-SVR was tested on a dataset used by David et al. (2010) and Stockman et al. (2010)
that consists of 17,765 samples of five attributes of memory activity counters, with the actual corresponding
power consumed in watts, as measured directly by a memory power riser. The memory power model
attributes are activity, read, write, CKE = high, and CKE = low. ALB-SVR was implemented with a modified

77
Chapter 4 ■ Support Vector Regression

version of LIBSVM (Chang and Lin 2011) for ALB-SVR. Simulation results (see Figures 4-7 – 4-9) took the
average of ten runs of threefold cross-validation of a radial basis function (RBF) kernel, with a combination
of grid search and heuristic experimentation to find the best metaparameters e, g, C+, and C–.

% % Out of
Type
Error Bound

Huber
insensitive 512 – 128 0.1 1.0e-06 1.03 67.07
SVR

Huber
insensitive 10,000,000 1,000 128 0.1 1.0e- 06 1.50 0.24
ALB-SVR

Figure 4-7. Comparative results of SVR versus ALB-SVR (Source: Intel, 2012)

Figure 4-8. Power estimates for running average power limit (RAPL) data with Huber insensitive SVR
(Source: Intel, 2012)

78
Chapter 4 ■ Support Vector Regression

Figure 4-9. Power estimates for RAPL data with Huber insensitive ALB-SVR (Source: Intel, 2012)

In SVR, support vectors are those points that lie outside the e-tube. The smaller the value of e, the more
points that lie outside the tube and hence the greater the number of support vectors. With ALB-SVR the
e -tube is cut in half, and the lower e -bound is dropped. Therefore, for the same g and e parameters, more
points lie outside the tube, and there are a larger number of support vectors. This means that the number of
support vectors is greater for ALB-SVR than for SVR. This increase in the number of support vectors indicates
that using ALB-SVR has some negative effects on the complexity of the estimating function. Although the
percentage relative error data set was higher (5.06 percent), this is acceptable, because the main purpose
was to reduce the number of underestimates and this was achieved.

References
Chang, Chih-Chung, and Chih-Jen Lin. “LIBSVM: A Library for Support Vector Machines,” in “Large-Scale
Machine Learning,” edited by C. Ling, special issue, ACM Transactions on Intelligent Systems and Technology 2,
no. 3 (2011). www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
David, Howard, Eugene Gorbatov, Ulf R. Hanebutte, Rahul Khanna, and Christian Le. “RAPL: Memory Power
Estimation and Capping.” In Proceedings of the 2010 ACM/IEEE International Symposium on Low-Power
Electronics and Design (ISLPED), August 18–20, 2010, Austin, TX, 189–194. Piscataway, NJ: Institute for Electrical
and Electronics Engineers, 2010.
Huber, Peter J. “Robust Estimation of a Location Parameter.” Annals of Mathematical Statistics 35, no. 1
(1964): 73–101.
Lee, Yuh-Jye, Wen-Feng Hsieh, and Chien-Ming Huang. “e-SSVR: A Smooth Support Vector Machine for
e-Insensitive Regression.” IEEE Transactions on Knowledge and Data Engineering 17, no. 5 (2005): 678–685.

79
Chapter 4 ■ Support Vector Regression

Nath, J. Saketha, and Chiranjib Bhattacharyya. “Maximum Margin Classifiers with Specified False Positive
and False Negative Error Rates.” In Proceedings of the Seventh SIAM International Conference on Data
Mining, April 26–28, 2007, Minneapolis, MN, 35–46. 2007. http://dblp.uni-trier.de/rec/bibtex/conf/
sdm/NathB07.
Norstrom, Jan Gerhard. “The Use of Precautionary Loss Functions in Risk Analysis.” IEEE Transactions on
Reliability 45, no. 3 (1996): 400–403.
Pan, Jeh-Nan, and Jianbiao Pan. “A Comparative Study of Various Loss Functions in the Economic
Tolerance Design.” In Proceedings of the 2006 IEEE International Conference on Management of Innovation
and Technology, June 21–23, 2006, Singapore, China, 783–787. Piscataway, NJ: Institute of Electrical and
Electronics Engineers, 2006.
Popov, A. A, and A. S. Sautin. “Loss Functions Analysis in Support Vector Regression,” 9th International
Conference on Actual Problems of Electronic Instrument Engineering, September 23–25, 2008, Novosibirsk,
Russia, 198. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2008.
Schabe, H. “Bayes Estimates Under Asymmetric Loss.” IEEE Transactions on Reliability 40, no. 1 (1991): 63–67.
Shim, Joo Yong, and Chang Ha Hwang. “Support Vector Quantile Regression Using Asymmetric e-Insensitive
Loss Function.” Communications for Statistical Applications and Methods 18, no. 2 (2011): 165–170.
Stockman, Melissa, Mariette Awad, and Rahul Khanna. “Asymmetrical and Lower Bounded Support Vector
Regression for Power Prediction.” Intel Technology Journal 16, no. 2 (2012a).
Stockman, Melissa, Mariette Awad, Rahul Khanna, Christian Le, Howard David, Eugene Gorbatov, and Ulf
R. Hanebutte. “A Novel Approach to Memory Power Estimation Using Machine Learning.” In Proceedings of
the 2010 International Conference on Energy Aware Computing (ICEAC), December 16–18, 2010, Cairo, Egypt,
1–3. Piscataway, NJ: Institute for Electrical and Electronics Engineers, 2010.
Stockman, Melissa, Randa S. El Ramli, Mariette Awad, and Rabih Jabr. “An Asymmetrical and Quadratic
Support Vector Regression Loss Function for Beirut Short Term Load Forecast.” In2012 IEEE International
Conference on Systems, Man, and Cybernetics (SMC), October 14–17, 2012, Seoul, Korea, 651–656. Piscataway,
NJ: Institute of Electrical and Electronics Engineers, 2012b.
Vapnik, Vladimir N. Statistical Learning Theory. New York: Wiley, 1998.

View publication stats

Practical Introduction To Non-Linear FEA
No ratings yet
Practical Introduction To Non-Linear FEA
202 pages
Technical Specifications MM92
No ratings yet
Technical Specifications MM92
2 pages
Ucsp Q1 Mod 1 PDF
89% (9)
Ucsp Q1 Mod 1 PDF
26 pages
BS en 12150 - 1 2015 Thermally Toughened
No ratings yet
BS en 12150 - 1 2015 Thermally Toughened
42 pages
Balakotaiah V Ratnakar RR Applied Linear Analysis For Chemic
No ratings yet
Balakotaiah V Ratnakar RR Applied Linear Analysis For Chemic
783 pages
Introduction To Graph Theory. Implementation of The Graph Using The Python Language.
No ratings yet
Introduction To Graph Theory. Implementation of The Graph Using The Python Language.
18 pages
1963-Paris, P. ErdoganA Critical Analysis of Crack Propagation Laws
No ratings yet
1963-Paris, P. ErdoganA Critical Analysis of Crack Propagation Laws
6 pages
Risk Assessment of Shoe Manufacturing Process (Service Ind. LTD.) 123
No ratings yet
Risk Assessment of Shoe Manufacturing Process (Service Ind. LTD.) 123
6 pages
Notes MB
No ratings yet
Notes MB
102 pages
D Smith Thesis June 2022
No ratings yet
D Smith Thesis June 2022
285 pages
Anomaly Detection: A Tutorial: Arindam Banerjee, Varun Chandola, Vipin Kumar, Jaideep Srivastava
No ratings yet
Anomaly Detection: A Tutorial: Arindam Banerjee, Varun Chandola, Vipin Kumar, Jaideep Srivastava
101 pages
M.Sc. Graph - Theory 2024 24 1
No ratings yet
M.Sc. Graph - Theory 2024 24 1
165 pages
3 Matlab Anfis
No ratings yet
3 Matlab Anfis
28 pages
System Identi Cation Data-Driven Modelling of Dynamic Systems - Paul M.J. Van Den Hof
No ratings yet
System Identi Cation Data-Driven Modelling of Dynamic Systems - Paul M.J. Van Den Hof
305 pages
Introduction To Machine Learning: Dr.S.Sankar Ganesh Vellore Institute of Technology
No ratings yet
Introduction To Machine Learning: Dr.S.Sankar Ganesh Vellore Institute of Technology
132 pages
Lecture 3
No ratings yet
Lecture 3
14 pages
Lecture13 ANFIS
No ratings yet
Lecture13 ANFIS
43 pages
Tutorials Book on Deep Chem
No ratings yet
Tutorials Book on Deep Chem
654 pages
Lec-3 Modelling Examples
No ratings yet
Lec-3 Modelling Examples
79 pages
Fuzzy Logic (PDFDrive)
No ratings yet
Fuzzy Logic (PDFDrive)
125 pages
Thesis Chen Haoran G1200371L
100% (1)
Thesis Chen Haoran G1200371L
150 pages
Moseley 2022 Physics-Informed Machine Learning-1
No ratings yet
Moseley 2022 Physics-Informed Machine Learning-1
268 pages
Condition Monitoring of A Turbfan Engine - NCMAPSS
No ratings yet
Condition Monitoring of A Turbfan Engine - NCMAPSS
46 pages
A Gentle Introduction To Graph Neural Network
No ratings yet
A Gentle Introduction To Graph Neural Network
122 pages
Neural Approaches Im Engeineering Matlab Supplement
No ratings yet
Neural Approaches Im Engeineering Matlab Supplement
218 pages
Book ML in Python For PSE
No ratings yet
Book ML in Python For PSE
57 pages
Ammonia_Production_Plants_A_Review
No ratings yet
Ammonia_Production_Plants_A_Review
28 pages
101 B Primary Reformer: Ammonia Plant Process Flow Diagram
No ratings yet
101 B Primary Reformer: Ammonia Plant Process Flow Diagram
1 page
Notes On Time Series Analysis
No ratings yet
Notes On Time Series Analysis
111 pages
Extrafiles File 1210
No ratings yet
Extrafiles File 1210
140 pages
GOOD Notes For System Identification and Parameter Estimation
No ratings yet
GOOD Notes For System Identification and Parameter Estimation
103 pages
MPC
No ratings yet
MPC
250 pages
Basic Design of Primary Reforming of Methane Final Report Submitted
No ratings yet
Basic Design of Primary Reforming of Methane Final Report Submitted
26 pages
Fault Detection Classification
No ratings yet
Fault Detection Classification
210 pages
Industrial Statistics - A Computer Based Approach With Python
No ratings yet
Industrial Statistics - A Computer Based Approach With Python
140 pages
03 Plant Design and Economics Heuristics R02
No ratings yet
03 Plant Design and Economics Heuristics R02
124 pages
2.thermal Analysis of An
No ratings yet
2.thermal Analysis of An
5 pages
Support Vector Machines: Dominik Wisniewski Wojciech Wawrzyniak
No ratings yet
Support Vector Machines: Dominik Wisniewski Wojciech Wawrzyniak
16 pages
Machine Learning
No ratings yet
Machine Learning
135 pages
2022 - Chua Shi, Xiao Wang, Philip S. Yu - Heterogeneous Graph Representation Learning and Applications-Springer
No ratings yet
2022 - Chua Shi, Xiao Wang, Philip S. Yu - Heterogeneous Graph Representation Learning and Applications-Springer
329 pages
Coupled Tank - MPC
No ratings yet
Coupled Tank - MPC
6 pages
Introduction To Model Predictive Control (MPC) : Oscar Mauricio Agudelo Mañozca Bart de Moor
No ratings yet
Introduction To Model Predictive Control (MPC) : Oscar Mauricio Agudelo Mañozca Bart de Moor
17 pages
Signal Analysis Using MATLAB
No ratings yet
Signal Analysis Using MATLAB
8 pages
Fuzzy Inference System - Theory and Applications
No ratings yet
Fuzzy Inference System - Theory and Applications
518 pages
Dynamic Model of Non-Interacting Isothermal CSTR
100% (1)
Dynamic Model of Non-Interacting Isothermal CSTR
39 pages
A First Course in Fuzzy and Neural Control - 2003 - Hung T. Nguyen Et Al
No ratings yet
A First Course in Fuzzy and Neural Control - 2003 - Hung T. Nguyen Et Al
305 pages
21AML543 - Fundamentals of Data Science
No ratings yet
21AML543 - Fundamentals of Data Science
4 pages
Thoma J., Mocellin G. Simulation With Entropy in Engineering Thermodynamics.. Understanding Matter and Systems With Bondgraphs (Springer, 2006)
No ratings yet
Thoma J., Mocellin G. Simulation With Entropy in Engineering Thermodynamics.. Understanding Matter and Systems With Bondgraphs (Springer, 2006)
141 pages
6940 Dieck Measurement5 PDF
No ratings yet
6940 Dieck Measurement5 PDF
36 pages
Deep Learning Methods
No ratings yet
Deep Learning Methods
336 pages
Applications of DOE in Engineering and Science 2019 Revised-LYE Sept 17 PDF
No ratings yet
Applications of DOE in Engineering and Science 2019 Revised-LYE Sept 17 PDF
214 pages
Linear Control Systems, Modeling, Analysis and Design
100% (2)
Linear Control Systems, Modeling, Analysis and Design
262 pages
Aspen Plus
No ratings yet
Aspen Plus
214 pages
Integration of Demand Response into the Electricity Chain: Challenges, Opportunities, and Smart Grid Solutions
From Everand
Integration of Demand Response into the Electricity Chain: Challenges, Opportunities, and Smart Grid Solutions
Arturo Losi
No ratings yet
Introductory Applications of Partial Differential Equations: With Emphasis on Wave Propagation and Diffusion
From Everand
Introductory Applications of Partial Differential Equations: With Emphasis on Wave Propagation and Diffusion
G. L. Lamb, Jr.
No ratings yet
Support Vector Regression PDF
No ratings yet
Support Vector Regression PDF
15 pages
Support Vector Regression
No ratings yet
Support Vector Regression
14 pages
Another Introduction SVM
No ratings yet
Another Introduction SVM
4 pages
Support Vector Machines: Review and Applications in Civil: October 2011
No ratings yet
Support Vector Machines: Review and Applications in Civil: October 2011
15 pages
K-SVM: An Effective SVM Algorithm Based On K-Means Clustering
No ratings yet
K-SVM: An Effective SVM Algorithm Based On K-Means Clustering
8 pages
SVM Tutorial
No ratings yet
SVM Tutorial
28 pages
Deon Garrett Et Al - Comparison of Linear and Nonlinear Methods For EEG Signal Classification
No ratings yet
Deon Garrett Et Al - Comparison of Linear and Nonlinear Methods For EEG Signal Classification
7 pages
On The Feasibility of Portfolio Optimization Under Expected Shortfall 0606015
No ratings yet
On The Feasibility of Portfolio Optimization Under Expected Shortfall 0606015
9 pages
Test 1 - May 2023
No ratings yet
Test 1 - May 2023
17 pages
HKDSE 2013 Physics (English Paper) - PaperPapa
No ratings yet
HKDSE 2013 Physics (English Paper) - PaperPapa
1 page
INDOROCK 2017 Paper - Storage of Crude Oil in Solution Mined Salt Caverns
No ratings yet
INDOROCK 2017 Paper - Storage of Crude Oil in Solution Mined Salt Caverns
10 pages
XL184-313 Study Pharmacy Manual - Temperature Excursion Form - V1.0 05052022
No ratings yet
XL184-313 Study Pharmacy Manual - Temperature Excursion Form - V1.0 05052022
1 page
Effect of Management Strategies On Reducing Heat Stress of Feedlot Cattle: Feed and Water Intake
No ratings yet
Effect of Management Strategies On Reducing Heat Stress of Feedlot Cattle: Feed and Water Intake
11 pages
Sun Tracking Solar Panel Using Arduino: Presented by
No ratings yet
Sun Tracking Solar Panel Using Arduino: Presented by
11 pages
Three Decades of Strategic Management Research On M&As: Citations, Co-Citations, and Topics
No ratings yet
Three Decades of Strategic Management Research On M&As: Citations, Co-Citations, and Topics
12 pages
Nursing Prospectus
No ratings yet
Nursing Prospectus
2 pages
New Pangea - Homogenizing The Future Biosphere
No ratings yet
New Pangea - Homogenizing The Future Biosphere
10 pages
Alphabrain: How A Group of Iconoclasts Are Using Cognitive Science To Advance The Business of Alpha Generation Stephen Duneier
No ratings yet
Alphabrain: How A Group of Iconoclasts Are Using Cognitive Science To Advance The Business of Alpha Generation Stephen Duneier
62 pages
Instrumentation Design Engineering Oil Gas Brochure
100% (1)
Instrumentation Design Engineering Oil Gas Brochure
8 pages
Friction Revision Worksheet
No ratings yet
Friction Revision Worksheet
2 pages
Manoj Meena Proficiency
No ratings yet
Manoj Meena Proficiency
9 pages
Electromagnetic Spectrum Research Activity C12-2-01
No ratings yet
Electromagnetic Spectrum Research Activity C12-2-01
3 pages
Benefits of Time Study
No ratings yet
Benefits of Time Study
2 pages
Bow Catch Up Fridays g7
No ratings yet
Bow Catch Up Fridays g7
3 pages
Nid Mdes 2018
No ratings yet
Nid Mdes 2018
24 pages
Unit Test 2.1
No ratings yet
Unit Test 2.1
2 pages
5 0 Applications First - Order - Odes
No ratings yet
5 0 Applications First - Order - Odes
16 pages
Synopsis
No ratings yet
Synopsis
18 pages
Fiber Lasers: Fundamentals with MATLAB Modelling 1st Edition Johan Meyer (Editor) - Download the ebook today and experience the full content
100% (1)
Fiber Lasers: Fundamentals with MATLAB Modelling 1st Edition Johan Meyer (Editor) - Download the ebook today and experience the full content
65 pages
Genil - Yzabel - B - Activity - 2 - in - The - Contemporary - World
No ratings yet
Genil - Yzabel - B - Activity - 2 - in - The - Contemporary - World
2 pages
Master Merit List of - BS Information Technology
No ratings yet
Master Merit List of - BS Information Technology
268 pages
Test Kinematics
No ratings yet
Test Kinematics
9 pages
Dissertation On Corporate Social Responsibility PDF
100% (1)
Dissertation On Corporate Social Responsibility PDF
4 pages
Ar 3595
No ratings yet
Ar 3595
54 pages
JIS G 3141 Oos: Strips
No ratings yet
JIS G 3141 Oos: Strips
28 pages