Article Kernel Model
*Email: [email protected]
1. Introduction
Regression analysis is a statistical method used to assess the effect of independent variables on a dependent variable by first examining the pattern of their relationship [1]. In many financial problems the relationship between the dependent variable and the independent variable does not follow a definite pattern and cannot be resolved by a linear model. In such cases a nonlinear regression approach is used, in which the data pattern is assumed to be known [5], or a nonparametric approach, in which the data pattern is not assumed to follow any particular form [8]. Not all parametrically patterned data in regression analysis follow a definite form such as linear, quadratic, or cubic, so other approaches, such as semiparametric or nonparametric regression, are needed [3]. One method that can be used in nonlinear regression is quadratic polynomial regression, while a method that can be used in nonparametric regression is kernel regression. Quadratic regression analysis is a development of linear regression in which the modelled data form a quadratic pattern when visualized in a graph or diagram. The kernel regression estimate, meanwhile, is obtained by a smoothing technique based on the chosen kernel function [7].
This research aims to find the best regression model, comparing the kernel regression model with the quadratic polynomial regression model on financial data, based on the RMSE value and the bandwidth obtained for the two methods.
Widely used nonparametric estimators are smoothing estimators; one example of smoothing is kernel regression. The purpose of smoothing is to remove uninformative variability from the data, so that the characteristics of the data appear more clearly and the resulting curve is smooth.
In the kernel method, the density estimator for a value of $x$ is denoted by $\hat{f}_h(x)$, which is expressed by the following formula:

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \qquad (2)$$

$$K_h(x) = \frac{1}{h}\, K\left(\frac{x}{h}\right) \qquad (3)$$

$$\mathrm{Bias}\left[\hat{f}_h(x)\right] = \frac{h^2}{2}\, f''(x)\, \mu_2(K) + o(h^2), \quad h \to 0 \qquad (5)$$
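As a minimal sketch of estimator (2) with the Gaussian kernel used in this study, in Python (the function names, the sample, and the evaluation grid are our own illustrative assumptions):

    import numpy as np

    def gaussian_kernel(u):
        """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
        return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    def kde(x, x_obs, h):
        """Kernel density estimate f_hat_h(x) = (1/(nh)) * sum K((x - X_i)/h), eq. (2)."""
        x = np.asarray(x, dtype=float)[:, None]        # evaluation points as a column
        u = (x - np.asarray(x_obs, dtype=float)) / h   # scaled distances (x - X_i) / h
        return gaussian_kernel(u).sum(axis=1) / (len(x_obs) * h)

    # hypothetical sample and evaluation grid
    rng = np.random.default_rng(0)
    x_obs = rng.normal(size=200)
    f_hat = kde(np.linspace(-4, 4, 101), x_obs, h=0.4)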
After the bias and variance are obtained, the MSE, which combines the variance and the squared bias of $\hat{f}_h(x)$, is analyzed as follows:
$$\mathrm{MSE}\left[\hat{f}_h(x)\right] = \frac{1}{nh}\, f(x)\, \|K\|_2^2 + \frac{h^4}{4}\left(f''(x)\, \mu_2(K)\right)^2 + o\left((nh)^{-1}\right) + o(h^4) \qquad (7)$$
The MSE formula is difficult to use in practice because it contains the unknown density function $f(x)$. For this reason the MISE (Mean Integrated Squared Error) is defined as follows:
$$\mathrm{MISE}\left[\hat{f}_h\right] = \frac{1}{nh}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2(K)^2\, \|f''\|_2^2 + o\left((nh)^{-1}\right) + o(h^4) \qquad (8)$$
The Nadaraya-Watson estimator weights each observation with:

$$\omega_{hi}(x) = \frac{K_h(x - X_i)}{\hat{f}_h(x)} \qquad (10)$$
where:

$$\hat{f}_h(x) = \frac{1}{n}\sum_{j=1}^{n} K_h(x - X_j) \qquad (11)$$
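Combining the weights in (10) with the density estimate in (11) gives the Nadaraya-Watson regression estimate as a weighted average of the responses; a minimal sketch, again assuming the Gaussian kernel (all names are ours):

    import numpy as np

    def nadaraya_watson(x, X, Y, h):
        """Nadaraya-Watson estimate built from the weights (10) and density (11)."""
        x = np.asarray(x, dtype=float)[:, None]
        # K_h(x - X_i) with the Gaussian kernel, following eq. (3)
        K = np.exp(-0.5 * ((x - np.asarray(X)) / h) ** 2) / (np.sqrt(2 * np.pi) * h)
        f_hat = K.mean(axis=1)                   # eq. (11): density estimate at x
        w = K / f_hat[:, None]                   # eq. (10): omega_hi(x)
        return (w * np.asarray(Y)).mean(axis=1)  # m_hat(x) = (1/n) * sum omega_hi * Y_i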
The choice of kernel function to be used in this study is the Gaussian kernel, namely:
$$K_G(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \qquad (13)$$
Dropping the higher-order terms in (8) gives the asymptotic MISE, and minimizing it with respect to $h$ yields the optimal bandwidth:

$$\text{A-MISE} = \frac{1}{nh}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2(K)^2\, \|f''\|_2^2 \qquad (14)$$

$$h_{\mathrm{opt}} = \left(\frac{\|K\|_2^2}{\|f''\|_2^2\, \mu_2(K)^2\, n}\right)^{1/5}$$
Estimating the unknown $\|f''\|_2^2$ under a normal reference density gives the rule of thumb:

$$h_{\mathrm{opt}} = \frac{1.06\, \sigma}{n^{1/5}} \qquad (15)$$
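Formula (15) is the Gaussian rule-of-thumb bandwidth; a one-function sketch (the sample X is hypothetical):

    import numpy as np

    def silverman_bandwidth(X):
        """Rule-of-thumb bandwidth h_opt = 1.06 * sigma * n^(-1/5), eq. (15)."""
        X = np.asarray(X, dtype=float)
        return 1.06 * X.std(ddof=1) * len(X) ** (-1 / 5)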
where $Y$ and $X$ denote the statistical variables; $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ are estimators of the regression coefficients $\beta_0$, $\beta_1$, and $\beta_2$; and $e$ denotes the error component of the regression model [9].
2.8.2. Linearity test. The linearity test can be carried out with a plot. If the conclusion is that the data are linear, the data cannot be used for this model; if the data are nonlinear, testing continues to the next stage, the quadratic polynomial regression coefficient test.
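A rough numeric counterpart of the plot check, not the formal test used here, is to compare linear and quadratic fits; the sketch below assumes numpy arrays x and y:

    import numpy as np

    def linearity_check(x, y):
        """Compare residual sums of squares of degree-1 and degree-2 polynomial fits."""
        rss = {}
        for deg in (1, 2):
            coef = np.polyfit(x, y, deg)
            resid = np.asarray(y) - np.polyval(coef, x)
            rss[deg] = float(resid @ resid)
        return rss  # a much smaller rss[2] hints at a quadratic (nonlinear) pattern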
2.8.3. Quadratic polynomial regression coefficient test. To determine whether the quadratic polynomial regression is significant, the significance of the regression parameters must be tested: a simultaneous test with the ANOVA table tests the regression parameters together, and a partial test with the t-test tests them separately [1].
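As a hedged sketch of the simultaneous test, the ANOVA F statistic for the quadratic model can be computed directly (scipy is used for the p-value; function and data names are ours):

    import numpy as np
    from scipy import stats

    def quadratic_f_test(x, y):
        """Fit y = b0 + b1*x + b2*x^2 and run the simultaneous (ANOVA) F test."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        X = np.column_stack([np.ones_like(x), x, x**2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        fitted = X @ beta
        n, k = len(y), 2                        # k predictors: x and x^2
        ss_reg = np.sum((fitted - y.mean()) ** 2)
        ss_res = np.sum((y - fitted) ** 2)
        F = (ss_reg / k) / (ss_res / (n - k - 1))
        p = stats.f.sf(F, k, n - k - 1)         # reject H0 when p < alpha
        return beta, F, p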
In the figure above, it can be seen that the values produced by the Nadaraya-Watson kernel estimator follow the shape of each observation point, producing an optimum kernel nonparametric regression curve.
From the figure above, it can be seen that the graph of the quadratic polynomial regression is parabolic, with the following regression equation:
However, this regression cannot yet be taken as the conclusion, because the parametric assumption tests and the quadratic polynomial regression coefficient test have not been carried out. The next step is therefore to test the parametric assumptions.
Kolmogorov-Smirnov Z 1.314
Asymp. Sig. (2-tailed) 0.063
Based on the table, the probability value for the stock price variable is 0.063, read from the Asymp. Sig. row. Since the significance (p) > 0.05, the data used are normally distributed.
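A rough counterpart of this check in scipy (the file name and series are hypothetical; note that SPSS reports Z = sqrt(n)*D while scipy's kstest reports D, so the statistics differ in scale):

    import numpy as np
    from scipy import stats

    price = np.loadtxt("stock_price.csv")             # hypothetical data file
    z = (price - price.mean()) / price.std(ddof=1)    # standardize with sample mean and sd
    stat, p = stats.kstest(z, "norm")                 # one-sample KS against N(0, 1)
    print(f"D = {stat:.3f}, p = {p:.3f}")             # p > 0.05: no evidence against normality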
3.3.2. Linearity test. The linearity test checks that the relationship between the independent and dependent variables is linear, as the linear regression equation requires. This assumption determines the type of estimation equation to be used.
Based on the scatterplot above, the linearity assumption is not fulfilled: the scatterplot forms a certain pattern, so the relationship between the variables is nonlinear.
Table 3. ANOVA results for the quadratic polynomial regression model test

Source of Variation    Sum of Squares    Df     Mean Square    F          Sig.
Regression             501437.947        2      250718.973     408.129    .000
Residual               153578.172        250    614.313
Total                  655016.119        252
Testing criteria:
$H_0$ is rejected if $F_{count} \ge F_{table}(1 - \alpha;\, k;\, n - k - 1)$
$H_0$ is accepted if $F_{count} < F_{table}(1 - \alpha;\, k;\, n - k - 1)$
The value of $F_{table}$ obtained is:
$$F_{table}(1 - \alpha;\, k;\, n - k - 1) = F_{table}(1 - 0.05;\, 2;\, 250) = 3.035$$
Based on the ANOVA table, the mean square in the residual row gives MSE = 614.313, so the RMSE (Root Mean Squared Error) value is:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{614.313} = 24.785$$
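Both reference values can be checked numerically, for instance:

    import numpy as np
    from scipy import stats

    f_table = stats.f.ppf(1 - 0.05, 2, 250)   # critical value F_table(1 - 0.05; 2; 250)
    rmse = np.sqrt(614.313)                   # RMSE from the residual mean square
    print(round(f_table, 3), round(rmse, 3))  # approx. 3.03 (close to the tabulated 3.035) and 24.785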
3.5. Comparison of the kernel regression model with a quadratic polynomial regression model
From the two regression models that have been fitted, namely the kernel regression model with the Nadaraya-Watson estimator and the quadratic polynomial regression model, a comparison is made to determine which regression model is better. The comparison measure is the RMSE value of each model (Table 5).
From Table 5, it can be seen that the Nadaraya-Watson kernel nonparametric regression model with a bandwidth of h = 25.64 provides a better estimate than the quadratic polynomial regression model because it produces a smaller RMSE value; the best regression model is therefore the kernel regression model with the Nadaraya-Watson estimator.
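The comparison itself reduces to computing the RMSE of each model's fitted values on the same data; a generic sketch (the fitted-value arrays are hypothetical):

    import numpy as np

    def rmse(y_true, y_pred):
        """Root mean squared error between observations and fitted values."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    # pick the model with the smaller RMSE, as in Table 5, e.g.:
    # rmse(y, kernel_fit) vs rmse(y, quadratic_fit)   # kernel_fit, quadratic_fit hypothetical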
4. Conclusion
Using the quadratic polynomial regression model, the RMSE (Root Mean Squared Error) value is 24.785, while using the Nadaraya-Watson kernel regression model with a Gaussian kernel, the RMSE value is 16.00147 with a bandwidth of h = 25.64. Comparing the two models shows that the Nadaraya-Watson kernel regression model with h = 25.64 is better than the quadratic polynomial regression model for this financial data, because its lower error rate means it provides the better estimate.
Acknowledgement
Thanks to the Statistics Laboratory for assisting this research with the data processing, and to the many academic parties who contributed to this research.
References
[1] Draper N R & Smith H 1998 Applied regression analysis 326 (New York: John Wiley & Sons)
[2] Eubank R L 1988 Spline smoothing and nonparametric regression 90 (New York: M Dekker)
[3] Fernandes A A R, Budiantara I N and Otok B W 2015 Journal of Mathematics and Statistics 11(2)
61
[4] Halim S and Bisono I 2006 Jurnal Teknik Industri 8(1) 73
[5] Härdle W 1990 Applied nonparametric regression 19 (Cambridge: Cambridge University Press)
[6] Nurgiyantoro B 2004 Statistik Terapan: Untuk Penelitian Ilmu-Ilmu Sosial (Yogyakarta: Gajah
Mada University Press)
[7] Puspitasari I, Suparti S and Wilandari Y 2012 Jurnal Gaussian 1(1) 93-102
[8] Wahyuni S A, Ratnawati R, Indriyani I and Fajri M 2020 Natural Science: Journal of Science
and Technology 9(2) 34-39
[9] Tiro M A 2008 Dasar-dasar statistika (Makassar: Andira Publisher)
[10] Yahoo Finance 2019 Data Saham Mastercard Incorporated (http://www.yahoofinance.com)