
Article Kernal Model

This document compares kernel regression and polynomial regression models on financial data. It finds that kernel regression, a nonparametric approach, performed better than quadratic polynomial regression, a parametric approach, on share price data from Mastercard from 2019. The kernel regression model had a lower RMSE value of 16.00147 compared to the polynomial regression model and selected an optimal bandwidth of 25.64 for the data. Nonparametric models like kernel regression are useful when the relationship between variables does not follow a certain pattern that can be modeled parametrically.


Journal of Physics: Conference Series

PAPER • OPEN ACCESS

Comparison of Kernel regression model with a polynomial regression model on financial data

To cite this article: Nur’eni et al 2021 J. Phys.: Conf. Ser. 1763 012017


The 2-nd International Seminar on Science and Technology 2020 (ISST-2) 2020 IOP Publishing
Journal of Physics: Conference Series 1763 (2021) 012017 doi:10.1088/1742-6596/1763/1/012017

Comparison of Kernel regression model with a polynomial regression model on financial data

Nur’eni 1*, M Fajri 2 and S Astuti 3

1 Statistics Study Program, Tadulako University, Indonesia
2 Statistics Study Program, Tadulako University, Indonesia
3 BPS, Statistics of Sulawesi Tengah Province, Indonesia

*Email: [email protected]

Abstract: Regression analysis is used to determine the influence of independent variables on a dependent variable by examining the relationship between them. The task of approximating the mean function can be approached in essentially two ways: parametrically and nonparametrically. Kernel regression is a nonparametric model, and quadratic polynomial regression is a parametric model. This research aims to find the better regression model by comparing kernel regression with quadratic polynomial regression on financial data, using the RMSE criterion. The share data used are those of Mastercard Incorporated (MA) for the period 2 January 2019 to 31 December 2019. The results indicate that, for the MA data, the better model is kernel regression, with RMSE = 16.00147 and bandwidth (h) = 25.64.

1. Introduction
Regression analysis is a statistical method used to assess the effect of independent variables on a dependent variable, starting from the pattern of the relationship between them [1]. In many financial problems, the relationship between the dependent and independent variables does not follow a simple pattern. Such cases can be handled with a nonlinear regression approach, where the form of the data pattern is assumed to be known [5], or with a nonparametric approach, where the data are not assumed to follow any particular pattern [8]. Not all data follow a parametric form such as linear, quadratic or cubic, so other approaches, such as semiparametric or nonparametric regression, are needed [3]. One method that can be used in nonlinear regression is quadratic polynomial regression, while one method that can be used in nonparametric regression is kernel regression. Quadratic regression is an extension of linear regression in which the data form a quadratic pattern when plotted. Kernel regression, by contrast, estimates the regression curve with a smoothing technique based on the chosen kernel function [7].
This research aims to identify the better regression model, kernel regression or quadratic polynomial regression, for financial data, based on the RMSE value of each model and the bandwidth obtained for the kernel model.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd

2. Materials and methods


The data used in this study are published financial data from Yahoo Finance, covering January 1 to December 31, 2019 [10]. In practice, not all data can be modelled with a parametric regression approach, because complete information about the shape of the regression curve is often unavailable. In such cases, a nonparametric regression approach can be used [5]. For a sample of n observations, the relationship between the variables can be expressed by the following regression model:

$$Y_i = m(X_i) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, 2, \ldots, n \tag{1}$$

Widely used nonparametric estimators are smoothing estimators, of which kernel regression is one example. The purpose of smoothing is to remove variability in the data that carries no information, so that the characteristics of the data appear more clearly and the resulting curve is smooth.

2.1 Kernel functions


The kernel function, denoted K(x), is a function applied at each data point. A kernel function has the following characteristics [4]:

a. $\int_{-\infty}^{\infty} K(x)\,dx = 1$

b. $\int_{-\infty}^{\infty} x\,K(x)\,dx = 0$

c. $\int_{-\infty}^{\infty} x^2 K(x)\,dx = \mu_2(K) \neq 0$

d. $\int_{-\infty}^{\infty} [K(x)]^2\,dx = \int_{-\infty}^{\infty} K^2(x)\,dx = \|K\|_2^2$
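The four characteristics above can be checked numerically for a concrete kernel. The sketch below is illustrative only: it takes the standard Gaussian density as the example kernel and uses a simple trapezoidal rule on [-10, 10]; the helper names `gaussian_kernel` and `integrate` are hypothetical and not taken from the paper.

```python
import math

def gaussian_kernel(x):
    """Standard normal density, used here as the example kernel K."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(func, lower=-10.0, upper=10.0, steps=20000):
    """Composite trapezoidal rule; [-10, 10] holds essentially all Gaussian mass."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width

total_mass = integrate(gaussian_kernel)                          # property (a): 1
first_moment = integrate(lambda x: x * gaussian_kernel(x))       # property (b): 0
second_moment = integrate(lambda x: x * x * gaussian_kernel(x))  # property (c): mu_2(K)
squared_norm = integrate(lambda x: gaussian_kernel(x) ** 2)      # property (d): ||K||_2^2

print(total_mass, first_moment, second_moment, squared_norm)
```

For the Gaussian kernel, mu_2(K) equals 1 and the squared norm equals 1 / (2 * sqrt(pi)), which the numerical integrals should approach.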

In the kernel method, the density estimator at a point x is denoted $\hat{f}_h(x)$ and is given by the following formula:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \tag{2}$$

The general form of the kernel K(x) scaled by the bandwidth h is:

$$K_h(x) = \frac{1}{h} K\left(\frac{x}{h}\right) \tag{3}$$

Kernel density estimator theorem:

If the kernel function K is a density function, so that $\int_{-\infty}^{\infty} K(u)\,du = 1$, then the estimator built from this kernel is also a probability density function.
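A minimal sketch of the density estimator in (2) may help make the theorem concrete. The sample values and the bandwidth below are made up for illustration, and the Gaussian kernel is an assumed choice; the Riemann sum at the end checks that the estimate integrates to approximately one.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h), as in (2)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

sample = [1.0, 2.0, 2.5, 4.0, 7.0]  # made-up observations
h = 0.8                              # made-up bandwidth

# Riemann sum over a grid from -10 to 20, wide enough to cover all kernel mass.
grid_step = 0.01
grid = [-10.0 + k * grid_step for k in range(3001)]
area = sum(kde(x, sample, h) for x in grid) * grid_step

print(area)
```

Because each kernel term is nonnegative and integrates to one after scaling by 1/h, the average of n such terms does as well.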

2.2 Kernel density estimator


The kernel density estimator has two parameters [4], namely:
a. the bandwidth h, and
b. the kernel density function K.
To select h for a given kernel density function K, it is necessary to check the asymptotic unbiasedness of $\hat{f}_h(x)$, starting from its expectation:

$$E[\hat{f}_h(x)] = \int_{-\infty}^{\infty} \frac{1}{h} K\left(\frac{x - u}{h}\right) f(u)\,du \tag{4}$$

Based on the properties of the kernel function:

$$\mathrm{Bias}[\hat{f}_h(x)] = \frac{h^2}{2} f''(x)\,\mu_2(K) + o(h^2), \quad h \to 0 \tag{5}$$


As for the variance:

$$\mathrm{Var}[\hat{f}_h(x)] = n^{-1} h^{-1} \|K\|_2^2 f(x) + o((nh)^{-1}), \quad nh \to \infty \tag{6}$$

Once the bias and variance are obtained, the MSE of $\hat{f}_h(x)$, which combines the variance and the squared bias, is:

$$\mathrm{MSE}[\hat{f}_h(x)] = \frac{1}{nh} f(x) \|K\|_2^2 + \frac{h^4}{4} \left(f''(x)\,\mu_2(K)\right)^2 + o((nh)^{-1}) + o(h^4) \tag{7}$$

The MSE formula is difficult to use directly because it depends on the unknown density function f(x). For this reason the MISE (Mean Integrated Squared Error) is defined as follows:

$$\mathrm{MISE}[\hat{f}_h] = \frac{1}{nh} \|K\|_2^2 + \frac{h^4}{4} \mu_2(K)^2 \|f''\|_2^2 + o((nh)^{-1}) + o(h^4) \tag{8}$$
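The step from the pointwise MSE in (7) to the MISE in (8) is an integration over x. As a sketch, using only that f integrates to one and writing the integral of the squared second derivative as a squared norm:

```latex
\mathrm{MISE}[\hat{f}_h]
  = \int_{-\infty}^{\infty} \mathrm{MSE}[\hat{f}_h(x)]\,dx
  = \frac{\|K\|_2^2}{nh} \underbrace{\int_{-\infty}^{\infty} f(x)\,dx}_{=\,1}
  + \frac{h^4}{4}\,\mu_2(K)^2 \underbrace{\int_{-\infty}^{\infty} \bigl(f''(x)\bigr)^2 dx}_{=\,\|f''\|_2^2}
  + o((nh)^{-1}) + o(h^4)
```

The first term is the integrated variance and the second is the integrated squared bias, so the MISE no longer depends on the value of f at any single point.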

2.3 Kernel regression


Kernel regression is a nonparametric statistical technique used to estimate the conditional expectation of a random variable, denoted E(Y|X). Mathematically, for any value x, the smoothing estimator for m(x) can be expressed as follows:

$$\hat{m}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \omega_{hi}(x)\,Y_i \tag{9}$$

where $\omega_{hi}(x)$ is a weight function defined as:

$$\omega_{hi}(x) = \frac{K_h(x - X_i)}{\hat{f}_h(x)} \tag{10}$$

with:

$$\hat{f}_h(x) = \frac{1}{n} \sum_{j=1}^{n} K_h(x - X_j) \tag{11}$$

Substituting (11) into (10), and the result into (9), gives the Nadaraya-Watson kernel estimator $\hat{m}_h(x)$ of m(x) [2]:

$$\hat{m}_h^{NW}(x) = \frac{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) Y_i}{\sum_{j=1}^{n} K\left(\frac{x - X_j}{h}\right)} \tag{12}$$

The kernel function used in this study is the Gaussian kernel, namely:

$$K_G(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \tag{13}$$
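The estimator in (12) with the Gaussian kernel in (13) amounts to a kernel-weighted average of the responses. The following is a minimal sketch with made-up data; the function name `nadaraya_watson` and the example values are illustrative assumptions, not the authors' R code.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel from equation (13)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    """Weighted average of ys with weights K((x - X_i) / h), as in equation (12)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Made-up observations for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 3.0, 6.0, 5.0]

fitted_small_h = nadaraya_watson(3.0, xs, ys, h=0.05)   # nearly reproduces Y at X = 3
fitted_large_h = nadaraya_watson(3.0, xs, ys, h=100.0)  # close to the mean of ys

print(fitted_small_h, fitted_large_h)
```

The two calls show the role of h: a very small bandwidth puts almost all weight on the nearest observation, while a very large one weights all observations almost equally.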

2.4 Optimum bandwidth selection


A bandwidth that is too small produces a rough estimated curve, whereas a bandwidth that is too large produces an overly smooth one. There are several approaches to selecting the optimum bandwidth, one of which is the plug-in method. The plug-in method is based on the expansion of the Mean Integrated Squared Error (MISE) for kernel smoothing given in equation (8). The asymptotic MISE (A-MISE) is obtained by dropping the terms $o((nh)^{-1}) + o(h^4)$, so:


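$$\text{A-MISE} = \frac{1}{nh} \|K\|_2^2 + \frac{h^4}{4} \mu_2(K)^2 \|f''\|_2^2 \tag{14}$$

So that the optimum bandwidth size is:

$$h_{opt} = \left(\frac{\|K\|_2^2}{\|f''\|_2^2 \left(\mu_2(K)\right)^2 n}\right)^{1/5}$$

which, for a Gaussian kernel and a normal reference density with standard deviation $\sigma$, becomes:

$$h_{opt} = \frac{1.06}{n^{1/5}}\,\sigma \tag{15}$$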
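The step from (14) to (15) can be spelled out as a short derivation. The Gaussian kernel and the normal reference density with standard deviation sigma are the assumptions under which the constant 1.06 arises.

```latex
\frac{\partial}{\partial h}\,\text{A-MISE}
  = -\frac{\|K\|_2^2}{n h^2} + h^3\,\mu_2(K)^2\,\|f''\|_2^2 = 0
\;\Longrightarrow\;
h_{opt}^5 = \frac{\|K\|_2^2}{\|f''\|_2^2\,\mu_2(K)^2\,n}

% Gaussian kernel and normal reference density with standard deviation \sigma:
\|K\|_2^2 = \frac{1}{2\sqrt{\pi}}, \qquad
\mu_2(K) = 1, \qquad
\|f''\|_2^2 = \frac{3}{8\sqrt{\pi}\,\sigma^5}
\;\Longrightarrow\;
h_{opt} = \left(\frac{4}{3}\right)^{1/5} \sigma\,n^{-1/5} \approx 1.06\,\sigma\,n^{-1/5}
```

The first line balances the variance term, which shrinks as h grows, against the squared-bias term, which grows with h.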

2.5 Optimum selection of Kernel regression model with optimal h


In keeping with the objective of nonparametric regression, which is to obtain a smooth curve with an optimum h from n data points, a generally accepted measure of estimator performance is needed. The measure used here is the mean of the squared residuals (Mean Squared Error, MSE):

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{m}_h^{NW}(X_i) - Y_i\right)^2 \tag{16}$$

for $i = 1, 2, \ldots, n$.
This criterion is expected to be at a minimum for the kernel regression model to be regarded as having an optimal h.
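Criterion (16) can be evaluated directly once the fitted values at the observed X_i are available. The sketch below uses synthetic data (not the Mastercard series) and hypothetical helper names, and simply reports the MSE and RMSE at a few bandwidths. Note that the in-sample MSE tends to shrink as h shrinks, because the fit then follows each observation more closely, so this sketch evaluates given bandwidths rather than searching for a minimiser.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    """Nadaraya-Watson fit at x with a Gaussian kernel and bandwidth h."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def in_sample_mse(xs, ys, h):
    """Equation (16): mean squared residual of the fit evaluated at each X_i."""
    residuals = [nadaraya_watson(xi, xs, ys, h) - yi for xi, yi in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

# Synthetic illustration data: a wave plus a small alternating disturbance.
xs = [float(i) for i in range(1, 41)]
ys = [10.0 + 3.0 * math.sin(i / 4.0) + 0.5 * (-1) ** i for i in range(1, 41)]

for h in (0.5, 2.0, 8.0):
    mse = in_sample_mse(xs, ys, h)
    print(f"h = {h:4.1f}  MSE = {mse:.4f}  RMSE = {math.sqrt(mse):.4f}")
```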

2.6 Quadratic Polynomial Regression


The polynomial regression model describes the relationship between a dependent variable (Y) and an independent variable (X) with a curved rather than a straight line. The quadratic polynomial regression equation is:

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 X_i^2 + e_i \tag{17}$$

where Y and X are the statistical variables, $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ are estimators of $\beta_0$, $\beta_1$ and $\beta_2$, called the regression coefficients, and $e_i$ is the error component of the regression [9].
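One common way to obtain the estimators in (17) is ordinary least squares, which is assumed here since the text does not name the fitting method. The sketch below solves the 3x3 normal equations by Gauss-Jordan elimination on made-up data that lie exactly on a quadratic; `fit_quadratic` is a hypothetical helper name.

```python
def fit_quadratic(xs, ys):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    # Power sums that make up the normal equations (X'X) b = X'y.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented 3x4 matrix [X'X | X'y].
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))

# Made-up data generated from y = 2 + 1.5 x - 0.25 x^2, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 1.5 * x - 0.25 * x * x for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

For long series with large X values the normal equations can become ill-conditioned, so a library routine based on QR decomposition would usually be preferred in practice.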

2.7 Approach to Analysis of Variance in Quadratic Polynomial Nonlinear Regression


The ANOVA approach is based on decomposing the sum of squares and the degrees of freedom associated with the dependent variable Y. The decomposition of the total sum of squares and degrees of freedom is usually arranged in an ANOVA table as follows [6]:

Table 1. Analysis of variance (ANOVA)

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (DF)   Mean Square (MS)
Regression            SSR                   k                         MSR = SSR / k
Error                 SSE                   n - k - 1                 MSE = SSE / (n - k - 1)
Total                 SST                   n - 1

2.8 Parametric assumption test


2.8.1. Normality test Normality testing checks whether the data follow a normal distribution. It can be carried out using the Kolmogorov-Smirnov test, with the decision based on the probability value.


2.8.2. Linearity test The linearity test can be carried out using a plot. If the data are found to be linear, they are not suitable for the quadratic model; if the data are nonlinear, testing continues to the next stage, namely the quadratic polynomial regression coefficient test.

2.8.3. Quadratic polynomial regression coefficient test To determine whether the quadratic polynomial regression is significant, the significance of the regression parameters must be tested: a simultaneous test using the ANOVA table for the parameters jointly, and a partial test using the t-test for each parameter separately [1].

3. Result and discussion


The kernel function used is the Gaussian kernel, whose support is (−∞, ∞). For kernel nonparametric regression with the Nadaraya-Watson estimator, an optimum bandwidth is obtained that determines the shape of the estimated regression curve. The data used are the daily closing prices of the shares (the last traded price each day). The Mastercard Incorporated share price data are plotted in the following chart:

Figure 1. Mastercard Incorporated’s stock value

3.1. Kernel regression


From the calculation using the R program, RMSE = 16.00147 for a bandwidth of 25.64. The following graph shows the kernel regression results with the Nadaraya-Watson estimator:

Figure 2. Kernel Regression Graph with Nadaraya-Watson Estimation


In the figure above, the values produced by the Nadaraya-Watson kernel estimator follow the shape of the observed points, producing the optimum kernel nonparametric regression curve.
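The reported bandwidth can be related to the rule in equation (15). The sketch below rests on an assumption that the paper does not state: that X is the trading-day index 1, 2, ..., 253 (n = 253 follows from the total of 252 degrees of freedom in Table 3) and that sigma is the sample standard deviation of X. Under that assumption the rule gives roughly 25.65, in line with the reported 25.64.

```python
import math
import statistics

def rule_of_thumb_bandwidth(xs):
    """h = 1.06 * sigma * n^(-1/5), with sigma the sample standard deviation of xs."""
    n = len(xs)
    sigma = statistics.stdev(xs)  # uses the n - 1 denominator
    return 1.06 * sigma * n ** (-1.0 / 5.0)

# Assumption: X is the trading-day index 1..253, not taken from the paper's data file.
trading_day_index = list(range(1, 254))
h = rule_of_thumb_bandwidth(trading_day_index)

print(round(h, 2))
```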

3.2. Quadratic Polynomial Regression


The following is the quadratic polynomial regression graph:

Figure 3. Quadratic Polynomial Regression Graph

The figure shows that the fitted quadratic polynomial regression is parabolic, with the following regression equation:

$$\hat{Y} = 164.818 + 1.743X - 0.008X^2$$

However, no conclusion can yet be drawn from this regression, because the parametric assumption tests and the quadratic polynomial regression coefficient test have not been carried out. The next step is therefore to test the parametric assumptions.

3.3. Parametric assumption test


3.3.1. Normality test Normality was tested using the Kolmogorov-Smirnov test.

Table 2. One-sample Kolmogorov-Smirnov test

Normality Test            Stock Price
Kolmogorov-Smirnov Z      1.314
Asymp. Sig. (2-tailed)    0.063

Based on the table, the probability value (Asymp. Sig.) for the stock price variable is 0.063. Because this value is greater than 0.05, normality is not rejected and the data are treated as normally distributed.

3.3.2. Linearity test The linearity test checks whether the relationship between the independent and dependent variables is linear, as a linear regression equation requires. The outcome determines the type of estimating equation used.


Figure 4. Scatterplot between standardized residual with standardized predicted

The scatterplot above shows that the linearity assumption is not fulfilled: the points form a distinct pattern, which indicates a nonlinear relationship between the variables.

3.4. Quadratic polynomial regression coefficient test


This test is performed using the F-test through the ANOVA table. The hypotheses are:
$H_0: \beta_1 = \beta_2 = 0$, meaning that the quadratic polynomial regression model is not significant
$H_1$: at least one of $\beta_1, \beta_2$ is not equal to 0, meaning that the quadratic polynomial regression model is significant
Based on the results of data processing, the following ANOVA table is obtained:

Table 3. ANOVA results for the quadratic polynomial regression model test

Source of Variation   Sum of Squares   Df    Mean Square   F         Sig.
Regression            501437.947       2     250718.973    408.129   .000
Residual              153578.172       250   614.313
Total                 655016.119       252

Testing criteria:
$H_0$ is rejected if $F_{count} \geq F_{table}(1 - \alpha;\ k;\ n - k - 1)$
$H_0$ is not rejected if $F_{count} < F_{table}(1 - \alpha;\ k;\ n - k - 1)$
The value of $F_{table}$ is:
$F_{table}(1 - \alpha;\ k;\ n - k - 1) = F_{table}(1 - 0.05;\ 2;\ 250) = 3.035$
Since $F_{count} = 408.129 \geq 3.035$, $H_0$ is rejected and the quadratic polynomial regression model is significant.
From the mean square column of the residual row in the ANOVA table, MSE = 614.313, so the RMSE (Root Mean Squared Error) is:

$$RMSE = \sqrt{MSE} = \sqrt{614.313} = 24.785$$

Table 4. Summary of the quadratic polynomial regression model

R      R Square   Adjusted R Square   Std. Error of the Estimate
.875   .766       .764                24.785
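The entries of Tables 3 and 4 follow from the two reported sums of squares by the formulas in Table 1. As a cross-check, the sketch below recomputes them; only the sums of squares, k = 2 and n = 253 are taken from the tables, and the variable names are illustrative.

```python
import math

# Reported in Table 3.
ss_regression = 501437.947
ss_residual = 153578.172
n_obs = 253  # total degrees of freedom 252, plus 1
k = 2        # predictors X and X^2

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / k              # MSR = SSR / k
ms_residual = ss_residual / (n_obs - k - 1)    # MSE = SSE / (n - k - 1)
f_statistic = ms_regression / ms_residual
rmse = math.sqrt(ms_residual)                  # "Std. Error of the Estimate"
r_squared = ss_regression / ss_total
adjusted_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(f"F = {f_statistic:.3f}, RMSE = {rmse:.3f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}")
```

Note that this RMSE divides the residual sum of squares by n - k - 1, whereas the kernel criterion in (16) divides by n.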


3.5. Comparison of Kernel regression model with a quadratic polynomial regression model
From the two regression models that have been carried out, namely the kernel regression model with
Nadaraya-Watson estimation and the quadratic polynomial regression model, comparisons will be made
to determine which regression model is better. The comparison measure used is based on the RMSE
value of each model:

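Table 5. Model comparison based on RMSE value

Regression Model                                      RMSE
Nadaraya-Watson Kernel Regression (h_opt = 25.64)     16.00147
Quadratic Polynomial Regression Model                 24.785

The fitted Nadaraya-Watson kernel regression model is:

$$\hat{Y} = \frac{\sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\left((x - X_i)/25.64\right)^2}{2}\right) Y_i}{\sum_{j=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\left((x - X_j)/25.64\right)^2}{2}\right)}$$

The fitted quadratic polynomial regression model is:

$$\hat{Y} = 164.818 + 1.743X - 0.008X^2$$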

Table 5 shows that the Nadaraya-Watson kernel nonparametric regression model with a bandwidth of 25.64 gives a better estimate than the quadratic polynomial regression model, because it produces a smaller RMSE. The better regression model is therefore the kernel regression model with the Nadaraya-Watson estimator.
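The comparison procedure can be sketched end to end on synthetic data. Everything below is illustrative: the series is made up (it is not the Mastercard data), the bandwidth is chosen by hand, and the helper names are hypothetical. Following the text, the kernel RMSE divides by n as in (16), while the polynomial RMSE divides by n - k - 1 as in the ANOVA table.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    """Nadaraya-Watson fit at x with a Gaussian kernel and bandwidth h."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def fit_quadratic(xs, ys):
    """Solve the 3x3 normal equations for y = b0 + b1*x + b2*x^2."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))

# Synthetic series: an upward trend with a wave that no single parabola can follow.
xs = [float(i) for i in range(1, 61)]
ys = [50.0 + 0.5 * x + 8.0 * math.sin(x / 5.0) for x in xs]
n = len(xs)

h = 3.0  # hand-picked bandwidth for this illustration
kernel_sse = sum((nadaraya_watson(x, xs, ys, h) - y) ** 2 for x, y in zip(xs, ys))
kernel_rmse = math.sqrt(kernel_sse / n)

b0, b1, b2 = fit_quadratic(xs, ys)
poly_sse = sum((b0 + b1 * x + b2 * x * x - y) ** 2 for x, y in zip(xs, ys))
poly_rmse = math.sqrt(poly_sse / (n - 2 - 1))

print(f"kernel RMSE = {kernel_rmse:.3f}, quadratic RMSE = {poly_rmse:.3f}")
```

On data with this kind of wave-like pattern the kernel fit yields the smaller RMSE, which mirrors the direction of the result in Table 5 without reproducing its values.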

4. Conclusion
With the quadratic polynomial regression model, the RMSE (Root Mean Squared Error) is 24.785, while with the Nadaraya-Watson kernel regression model using the Gaussian kernel, the RMSE is 16.00147 at a bandwidth of 25.64. Comparing the two models, the Nadaraya-Watson kernel regression model with a bandwidth of 25.64 is better than the quadratic polynomial regression model for these financial data, because it yields the lower RMSE and therefore the better estimate.

Acknowledgement
We thank the Statistics Laboratory for assisting with the data processing, and the many academic colleagues who contributed to this research.

References
[1] Draper N R and Smith H 1998 Applied regression analysis 326 (New York: John Wiley & Sons)
[2] Eubank R L 1988 Spline smoothing and nonparametric regression 90 (New York: M Dekker)
[3] Fernandes A A R, Budiantara I N and Otok B W 2015 Journal of Mathematics and Statistics 11(2) 61
[4] Halim S and Bisono I 2006 Jurnal Teknik Industri 8(1) 73
[5] Härdle W 1990 Applied nonparametric regression 19 (Cambridge: Cambridge University Press)
[6] Nurgiyantoro B 2004 Statistik Terapan: Untuk Penelitian Ilmu-Ilmu Sosial (Yogyakarta: Gajah Mada University Press)
[7] Puspitasari I, Suparti S and Wilandari Y 2012 Jurnal Gaussian 1(1) 93-102
[8] Wahyuni S A, Ratnawati R, Indriyani I and Fajri M 2020 Natural Science: Journal of Science and Technology 9(2) 34-39
[9] Tiro M A 2008 Dasar-dasar statistika (Makassar: Andira Publisher)
[10] Yahoo Finance 2019 Data Saham Mastercard Incorporated (http://www.yahoofinance.com)
