Final Research Paper
Amit Agrawal
Project 2025
Shikhar
Analysis Of Machine Learning Algorithms For Solar Power
Forecasting
ABSTRACT
In the realm of energy generation, solar power stands out as one of the most promising clean energy alternatives to non-renewable sources. However, its reliance on environmental factors introduces uncertainty in energy output. In this context, solar power forecasting plays a crucial role in reducing this uncertainty and enhancing the overall stability of energy systems. Recently, machine learning (ML) models have been widely used for solar power forecasting. However, careful consideration of data pre-processing, forecast horizons, and the evaluation of ML model performance is necessary to identify the most accurate model. This paper offers a comparative analysis of various ML models for solar power forecasting, providing insights for future research and guiding the selection of appropriate methods based on the strengths and weaknesses of each ML model. An effective forecasting approach is determined by factors such as error performance, convergence time, and computational complexity. This study evaluates various ML models based on error metrics and convergence time. Additionally, cross-validation and hyperparameter tuning are explored for the top five performing models to provide a more thorough evaluation. This offers valuable insights to stakeholders involved in solar power plant modelling.
KEYWORDS: Bagging; linear regression; machine learning; solar power; regularization method.
ABBREVIATIONS
LR    Linear regression
SVM   Support vector machine
DT    Decision tree
SP    Solar power
ML    Machine learning
KNN   k-nearest neighbors regression
GBR   Gradient boosting regression
EN    Elastic net
MAE   Mean absolute error
LAR   Least angle regression
SPF   Solar power forecasting
OMP   Orthogonal matching pursuit
MA    Moving average
LASSO Lasso regression
1. INTRODUCTION
The global demand for electricity is rising due to increasing industrialization and population growth, making it a key factor in a nation's overall development. Between 2000 and 2018, energy generation increased by about 3% per year, reflecting worldwide economic growth during that period [1]. However, coal-based power plants produce greenhouse gas emissions and rely on non-renewable resources, which are finite. As a result, the focus is now shifting toward renewable energy systems [2]. At present, hydropower plays a significant role among renewable energy sources. However, solar power is more widely available and holds great potential to meet the increasing global energy demand [3]. In recent years, the contribution of solar power plants (SPPs) to the energy grid has been steadily growing. This rise in solar power brings economic, environmental, and installation advantages, especially in remote areas. However, the reliance on environmental factors makes solar power generation unpredictable, which can pose challenges to grid stability [4]. To address the stability issues caused by the unpredictability of solar power, a backup power source or battery storage system is needed [5]. However, these solutions are not cost-effective alternatives. Therefore, solar power forecasting is a crucial step in managing the unpredictable output of SPPs and increasing profitability [6,7].
Traditional statistical forecasting methods work well with a single input variable, but their complexity grows as more independent variables are added. Additionally, the limited availability of data from SPPs reduces the accuracy of solar power predictions [8]. These traditional models rely on past data to predict future solar power (SP) output for the next time interval [9]. In data-driven approaches, various inputs such as solar radiation, solar power, wind speed, humidity, sun angle, temperature, and weather conditions are provided to the algorithm. The algorithm then establishes a regression relationship between the input and output variables to predict SP [10]. The auto-regressive class of statistical methods analyses components such as auto-regressive (AR), integrated (I), and moving average (MA) terms from time-series data. These statistical variables are then used to create a relationship between the input and target variables. ARIMA and SARIMA models are commonly used for solar power forecasting in the literature [11-14].
While these methods are effective for short-term predictions, their performance tends to decline for long-term forecasts, and existing literature often presents a biased comparison of their effectiveness. Additionally, adjusting the statistical parameters can be quite challenging. Linear regression offers an advantage over AR models because it does not require parameter tuning. It seeks to establish a linear relationship between input and target variables. These methods are straightforward, easy to implement, and require fewer resources compared to advanced machine learning (ML) techniques, making them widely used in solar power forecasting [15-21]. Non-linear methods like support vector machines, random forests, ensemble learning, and gradient boosting offer advantages over the previously mentioned techniques by effectively capturing the non-linear patterns in solar power or radiation data [22].
The input parameters for predicting solar power are quite extensive and vary across different studies. Most research typically includes solar radiation, wind speed, and ambient temperature as key input factors for prediction models. The performance metrics used in these studies mainly include root mean square error (RMSE), mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE) to assess the proposed models.
The performance of the machine learning models is assessed using standard statistical indicators [34], such as mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE), R², and root mean squared logarithmic error (RMSLE). Here, S_i(t) is the actual solar power and Ŝ_i(t) is the predicted output for sample i, with N samples in total. MAE represents the average of the absolute prediction errors, as shown in Equation (3). RMSE is the square root of the mean squared error and, for an unbiased estimator, reflects the standard deviation of the prediction errors, as shown in Equation (4). R² indicates the proportion of the variation in the true values that is explained by the predictions, as shown in Equation (5). RMSLE measures the logarithmic difference between actual and predicted values and is useful when both values are large, as shown in Equation (6).

MAE = (1/N) Σ_{i=1}^{N} |S_i(t) − Ŝ_i(t)|                                  (3)
RMSE = √[ (1/N) Σ_{i=1}^{N} (S_i(t) − Ŝ_i(t))² ]                           (4)
R² = 1 − Σ_{i=1}^{N} (S_i(t) − Ŝ_i(t))² / Σ_{i=1}^{N} (S_i(t) − S̄(t))²    (5)
RMSLE = √[ (1/N) Σ_{i=1}^{N} (log(Ŝ_i(t)+1) − log(S_i(t)+1))² ]            (6)
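These metrics have standard definitions and can be sketched directly in NumPy; the array names y_true and y_pred below stand in for S_i(t) and Ŝ_i(t):

```python
import numpy as np

def mae(y_true, y_pred):
    # Equation (3): mean of the absolute prediction errors
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Equation (4): square root of the mean squared error
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Equation (5): proportion of variance in y_true explained by y_pred
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmsle(y_true, y_pred):
    # Equation (6): RMSE of the log-transformed values (log1p avoids log(0))
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
```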
3. RESULTS
Solar power is first predicted using the ARIMA model. The base parameters of the ARIMA model are evaluated using the Augmented Dickey-Fuller (ADF) test and Akaike's Information Criterion (AIC), as shown in Appendix B. The best model identified is ARIMA(0,1,1)(1,1,0)[12], which indicates a seasonal component in the dataset. This is also evident in the rolling mean estimation, as shown in Figure 4.
Figure 5 compares the actual solar power to the predicted solar power. The prediction errors, measured by MAPE and R², are 7.986 and 0.696, respectively.
In this section, the performance of the machine learning models is evaluated on the dataset for 6-hour predictions. The models are implemented in the Spyder IDE using the Python programming language, along with the NumPy, pandas, and PyCaret libraries [24]. The models analysed include Extra Trees (ET), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Random Forest (RF), Gradient Boosting Regression (GBR), Linear Regression (LR), Bayesian Ridge (BR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (RR), Decision Tree (DT), k-Nearest Neighbors (KNN), AdaBoost Regression (ABR), Elastic Net (EN), Orthogonal Matching Pursuit (OMP), Least Angle Regression (LAR), and Support Vector Machine with a linear kernel (SVML). No single machine learning model performs best across all types of time-series datasets. To allow for a more intuitive comparison, cross-validations with 5 and 10 folds are conducted on the top five machine learning models.
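The paper uses PyCaret's automated model comparison; a minimal scikit-learn sketch of the same leaderboard idea, on synthetic data and with the model list abbreviated to four representatives, is:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the solar power dataset
X, y = make_regression(n_samples=500, n_features=6, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "ET": ExtraTreesRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "LR": LinearRegression(),
    "LASSO": Lasso(),
}

# Rank the models by test-set MAE, ascending, as in Table 2
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_absolute_error(y_te, model.predict(X_te))
leaderboard = sorted(scores.items(), key=lambda kv: kv[1])
```

PyCaret's `compare_models` wraps this fit-score-rank loop (with cross-validation) for its full model library.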
Table 2 presents the performance metrics of the 16 machine learning regression models. The algorithms are compared using six different performance indicators: MAE, RMSE, R², RMSLE, MAPE, and computation time (in seconds). The table is sorted by MAE in ascending order, from the lowest to the highest values.
As shown in Table 3, the Extra Trees Regression (ETR) model outperforms the other machine learning models in terms of MAE, RMSE, R², and MAPE, with values of 1.89×10⁴ W, 4.12×10⁴ W, 0.9997, and 0.0136, respectively. Its RMSLE value is 0.1007; RMSLE measures the logarithmic difference between actual and predicted values and does not heavily penalize large differences, making it useful when both values are large. It is also evident that ensemble machine learning models perform better than the other models in terms of MAE, RMSE, RMSLE, and MAPE. The Support Vector Machine with a linear kernel (SVML) shows the poorest performance in terms of statistical errors.
Figure 6 illustrates the performance of the different categories of machine learning regression techniques in terms of MAE and RMSE. The figure shows that the average MAE and RMSE values for the ensemble methods (ETR, LightGBM, XGBoost, RF, AdaBoost, and GBR) are 61,237.33 W and 87,019.41 W, respectively, outperforming the other types of regression models.
Figure 6: Error performance of different ML models
Computational Models
Ensemble regression methods show better error performance, although their computational
time is higher compared to other regression methods due to the use of multiple tree-based
decisions for prediction, as shown in Figure 6. However, in terms of computational time,
regularization and linear regression methods outperform other machine learning models, with
average performance errors illustrated in Figure 7.
K-fold cross-validation involves splitting the dataset into K parts for training and validating machine learning models. Each part is used once for validation, with the remaining parts used for training. The method shuffles the dataset, divides it into K groups, and retains each group's validation score until all folds have been evaluated. This helps reduce bias in the performance estimate of the prediction method.
Residuals in regression indicate the vertical distance between the observed values and the regression line; they directly show the error between the observed and predicted values. Ideally, the residuals of any machine learning model should be normally distributed and independent. Additionally, the points should be densely packed around the origin and exhibit symmetry, or a normal distribution, around it. The R² value for the training data of the ensemble methods ranges from 1 to 0.979, while for the test data it ranges from 0.974 to 0.977. These R² values indicate the effectiveness of the top five ensemble regression techniques.
Table 5: Results of hyperparameter tuning of the top 5 ML models
The performance of machine learning models depends on the selection of model parameters and hyperparameters. The choice of hyperparameter values guides the training process on the dataset and influences how well the model converges to an optimal solution. Each machine learning model has different hyperparameters, and their selection can vary based on the characteristics of the dataset. In this study, the hyperparameters were evaluated using the PyCaret package. The random search cross-validation (CV) method was used to explore the default combinations of hyperparameters provided by the scikit-learn library. The default scikit-learn search space was chosen because it offers a wide range of hyperparameter options.
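The random-search CV described here can be sketched with scikit-learn's `RandomizedSearchCV`; the search space below is illustrative, not the exact default grid that PyCaret uses:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the solar power dataset
X, y = make_regression(n_samples=300, n_features=5, noise=3.0, random_state=0)

# Randomly sample hyperparameter combinations and score each by cross-validation
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 150),
        "learning_rate": uniform(0.01, 0.3),
        "max_depth": randint(2, 6),
    },
    n_iter=5,                              # number of sampled combinations
    cv=3,
    scoring="neg_mean_absolute_error",     # higher (less negative) is better
    random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

PyCaret's `tune_model` performs a similar randomized search internally and reports the best-scoring configuration.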
Table 5 presents the results of hyperparameter tuning for most of the ensemble methods. Only ensemble methods were selected for tuning due to their superior performance compared to the other machine learning models. The tuning process improved the results of Gradient Boosting Regression (GBR) and LightGBM by 50.47% and 1.75%, respectively, in terms of the loss function. In this context, "Ht" refers to the tuned model output, while "D" indicates the default parameters.
4. CONCLUSION
Choosing an accurate machine learning model for solar power forecasting is a challenging task, and many approaches have been explored in various studies using different ML models. This work offers a detailed review and empirical analysis of solar power forecasting techniques, including ARIMA-based classical time series methods and 16 different ML models. ARIMA-based models require relatively fewer data points, but their limitations in long-term prediction and parameter estimation reduce their effectiveness.
While machine learning models are already employed for solar power prediction, many advanced ML techniques have not been thoroughly investigated or compared in a single study. This work aims to provide a more comprehensive review of these techniques. The performance evaluation of the ML models indicates that ensemble models offer better forecasting accuracy than the other ML models. The ability to train multiple learners and combine their predictions gives ensemble methods an advantage over other approaches.
Among the ensemble methods, Extra Trees Regression has outperformed the other models. However, computational time is a significant limitation for the real-time implementation of machine learning models. Regularization ML models have the lowest computational time among all the ML models considered. While ensemble models provide accurate forecasting results, they require more computational time. This work offers a practical guide by providing a systematic review and highlighting future research opportunities for stakeholders choosing an accurate machine learning model for solar power prediction.
REFERENCES
1. IEA, International Renewable Energy Agency, United Nations Statistics Division, The World Bank, and World Health Organization, “The energy progress report,” Tracking SDG 7: The Energy Progress Report 2019, Washington, DC, 2019.
2. Z. Şen, “Solar energy in progress and future research trends,” Prog. Energy Combust. Sci., vol. 30, no. 4, pp. 367–416, 2004. DOI: 10.1016/j.pecs.2004.02.004.
3. IRENA, Renewable Capacity Statistics 2019. Abu Dhabi: International Renewable Energy Agency (IRENA), 2019. Available: https://www.irena.org/publications/2019/Mar/Renewable-Capacity-Statistics-2019.
4. S. B. Nam and J. Hur, “A hybrid spatio-temporal forecasting of solar generating resources for grid integration,” Energy, vol. 177, pp. 503–510, 2019. DOI: 10.1016/j.energy.2019.04.127.
5. V. Bagalini, B. Y. Zhao, R. Z. Wang, and U. Desideri, “Solar PV-battery-electric grid-based energy system for residential applications: system configuration and viability,” Research, vol. 2019, pp. 1–17, 2019. DOI: 10.34133/2019/3838603.
6. N. Dong, J. F. Chang, A. G. Wu, and Z. K. Gao, “A novel convolutional neural network framework based solar irradiance prediction method,” Int. J. Electr. Power Energy Syst., vol. 114, p. 105411, 2020. DOI: 10.1016/j.ijepes.2019.105411.
21. R. Amaro e Silva and M. C. Brito, “Impact of network layout and time resolution on spatio-temporal solar forecasting,” Sol. Energy, vol. 163, pp. 329–337, 2018. DOI: 10.1016/j.solener.2018.01.095.
22. M. H. D. M. Ribeiro and L. dos Santos Coelho, “Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series,” Appl. Soft Comput. J., vol. 86, p. 105837, 2020. DOI: 10.1016/j.asoc.2019.105837.
23. T. Huld, M. Šúri, E. D. Dunlop, M. Albuisson, and L. Wald, “Integration of HELIOCLIM-1 database into PVGIS to estimate solar electricity potential in Africa,” in 20th European Photovoltaic Solar Energy Conference and Exhibition, 2005, p. 2989.
24. M. Ali, “PyCaret.” 2020. Available: https://pycaret.org/.