https://doi.org/10.5194/essd-2024-229
22 Jul 2024
Status: this preprint is currently under review for the journal ESSD.

A high-quality gap-filled daily ETo dataset for China during 1951–2021 from synoptic stations using machine learning models

Ning Shan Zhou, Li Feng Wu, Qi Liang Yang, Jianhua Dong, Ling Yang, and Yue Li

Abstract. Reference evapotranspiration (ETo) is essential for research on agricultural water consumption and the land-water cycle. Synoptic data from meteorological stations provide reliable ground data for estimating ETo with the FAO-56 Penman-Monteith equation. However, the five primary variables this equation requires, namely maximum temperature (Tmax), minimum temperature (Tmin), sunshine duration (SSD), wind speed (Wind), and relative humidity (RH), often suffer severe data loss in synoptic records due to force majeure events. This loss directly introduces severe gaps into the resulting ETo records. Machine learning algorithms can fill various data gaps with low error rates; however, to achieve high data quality, the algorithms must be selected appropriately for each type of data loss and trained independently. Here, based on the characteristics of the data, we investigated the data gaps in the synoptic dataset and classified them into two major types: common, minor gaps (Tmax loss, Tmin loss, SSD loss, Wind loss, RH loss, Wind and SSD loss, and Wind and RH loss), and 19 other types of data loss that involve more severe information loss but rarely occur. Our results show that, for the common gap types, the XGBoost model achieved the best accuracy among the three machine learning models tested, with high statistical performance. For the other 19 types of data gaps, LSTM models were trained separately for each site and achieved average R², RMSE, and nRMSE of 0.9, 0.5 mm d⁻¹, and 38 %, respectively, across all 2419 stations. We thus present a high-quality, gap-filled daily ETo dataset for China during 1951–2021 in which the proportion of large errors (daily ETo errors greater than 1.5 mm d⁻¹) is below 0.2 %. Our results also reveal that the degree of entanglement between synoptic variables varies considerably from region to region in China. Although most studies report that wind speed is of limited importance for ETo estimation with machine learning models, our findings show that wind speed played a more significant role in ETo estimation in most areas of China before the 21st century. However, its influence on ETo has weakened in recent years. This ETo dataset for China is available online at https://doi.org/10.5281/zenodo.11496932 (Zhou et al., 2024).
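The abstract relies on the FAO-56 Penman-Monteith equation to turn the five station variables into daily ETo. The following minimal Python sketch of the daily form shows where Tmax, Tmin, SSD, Wind, and RH enter the calculation. It is not the authors' code. The following are assumptions:

- the FAO-56 default Angström coefficients (a_s = 0.25, b_s = 0.50);
- a 10 m anemometer height;
- deriving actual vapour pressure from mean RH;
- the hypothetical function name `fao56_eto`.

The sketch also assumes that the sun rises on the given day, which holds at China's latitudes, because daylight length appears in a denominator.

```python
import math

# Constants from FAO-56 (Allen et al., 1998)
SOLAR_CONSTANT = 0.0820      # Gsc, MJ m-2 min-1
STEFAN_BOLTZMANN = 4.903e-9  # sigma, MJ K-4 m-2 day-1
ALBEDO = 0.23                # albedo of the FAO-56 grass reference crop


def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure e°(T) in kPa for air temperature t_c in deg C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_eto(tmax: float, tmin: float, rh_mean: float, wind: float, ssd: float,
              lat_deg: float, elev_m: float, doy: int,
              wind_height: float = 10.0, a_s: float = 0.25, b_s: float = 0.50) -> float:
    """Daily ETo (mm d-1) from the FAO-56 Penman-Monteith equation (sketch).

    tmax, tmin in deg C; rh_mean in %; wind in m s-1 at wind_height (m, assumed 10);
    ssd = sunshine duration in h; lat_deg in degrees; elev_m in m; doy = day of year.
    a_s, b_s are the FAO-56 default Angstrom coefficients (assumed, not calibrated).
    """
    t_mean = (tmax + tmin) / 2.0

    # Psychrometric constant from elevation-based air pressure
    pressure = 101.3 * ((293.0 - 0.0065 * elev_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure

    # Saturation (es) and actual (ea) vapour pressure; ea from mean RH
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0
    ea = rh_mean / 100.0 * es

    # Slope of the saturation vapour pressure curve at t_mean
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2

    # Convert wind speed to 2 m height with the logarithmic wind profile
    u2 = wind * 4.87 / math.log(67.8 * wind_height - 5.42)

    # Extraterrestrial radiation Ra and maximum daylight hours N
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))  # keep acos in domain
    ws = math.acos(x)  # sunset hour angle
    ra = (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))
    n_max = 24.0 / math.pi * ws

    # Solar radiation from sunshine duration and clear-sky radiation
    rs = (a_s + b_s * ssd / n_max) * ra
    rso = (0.75 + 2e-5 * elev_m) * ra

    # Net radiation = net shortwave - net longwave; Rs/Rso limited to 1
    rns = (1.0 - ALBEDO) * rs
    ratio = min(rs / rso, 1.0)
    rnl = (STEFAN_BOLTZMANN
           * ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
           * (0.34 - 0.14 * math.sqrt(ea))
           * (1.35 * ratio - 0.35))
    rn = rns - rnl
    g = 0.0  # soil heat flux is neglected at the daily time step

    # FAO-56 Penman-Monteith equation
    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den
```

The sketch can be checked against the FAO-56 worked example for Brussels on 6 July, with these inputs:

- day 187, latitude 50.8° N, elevation 100 m;
- Tmax 21.5 °C, Tmin 12.3 °C;
- RH taken as the mean of 84 % and 63 %;
- 2.78 m s⁻¹ wind at 10 m;
- 9.25 h of sunshine.

With these inputs, `fao56_eto(21.5, 12.3, 73.5, 2.78, 9.25, 50.8, 100.0, 187)` returns roughly 3.8 mm d⁻¹. FAO-56 reports 3.9 mm d⁻¹ for this example. The small difference arises because FAO-56 derives actual vapour pressure there from RHmax and RHmin rather than from mean RH.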

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 28 Aug 2024)


Data sets

A high-quality gap-filled daily ETo dataset for China during 1951–2021 from synoptic stations, Ning Shan Zhou et al., https://doi.org/10.5281/zenodo.11496932


Latest update: 16 Aug 2024
Short summary
We created a highly precise dataset for daily water needs in China from 1951–2021, using machine learning to fill data gaps at 2419 weather stations. Independent models were trained for minor gaps, and LSTM models addressed severe gaps. Our research also examined the relationships between various weather parameters affecting water needs.
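The short summary distinguishes two kinds of gap. Minor gaps are handled by independently trained models, and severe gaps are handled by per-station LSTMs. The following minimal Python sketch shows that routing step. It assumes that each daily record is a mapping from variable name to value, with `None` or NaN marking a missing value. The function names `missing_pattern` and `route_gap` are hypothetical. The sketch only classifies records and does not train any XGBoost or LSTM model. The seven minor patterns are the ones listed in the abstract. Every other combination of missing variables is treated as severe.

```python
from typing import FrozenSet, Mapping, Optional

# The five FAO-56 Penman-Monteith inputs named in the abstract
VARIABLES = ("Tmax", "Tmin", "SSD", "Wind", "RH")

# The seven common, minor gap patterns listed in the abstract
MINOR_PATTERNS = {
    frozenset({"Tmax"}), frozenset({"Tmin"}), frozenset({"SSD"}),
    frozenset({"Wind"}), frozenset({"RH"}),
    frozenset({"Wind", "SSD"}), frozenset({"Wind", "RH"}),
}


def missing_pattern(record: Mapping[str, Optional[float]]) -> FrozenSet[str]:
    """Return the set of input variables missing (absent, None, or NaN) in a daily record."""
    missing = set()
    for name in VARIABLES:
        value = record.get(name)
        if value is None or value != value:  # value != value is True only for NaN
            missing.add(name)
    return frozenset(missing)


def route_gap(record: Mapping[str, Optional[float]]) -> str:
    """Classify a daily record as 'complete', 'minor', or 'severe' (hypothetical labels)."""
    pattern = missing_pattern(record)
    if not pattern:
        return "complete"
    if pattern in MINOR_PATTERNS:
        return "minor"   # filled by a per-pattern model (XGBoost in the abstract)
    return "severe"      # filled by a per-station model (LSTM in the abstract)
```

Frozensets are used because they are hashable. A day's missing-variable set can therefore be checked against the seven minor patterns with a single set lookup. Grouping records by `missing_pattern` would give the per-pattern subsets on which separate models could be trained.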