1. Introduction
The pumping station is an important facility within water conservancy projects, providing essential water resources for domestic use, agricultural irrigation, and industrial production. The safe operation of pumping stations is crucial for the consistent functioning of water conservancy projects and the protection of lives and property. However, during the operation of pumping station systems, factors such as equipment aging, environmental changes, and improper operation can result in equipment failures, system performance degradation, and ultimately accidents such as resource waste, equipment damage, and personnel injury. Therefore, it is necessary to monitor the operational status of pumping stations in real time, diagnose irregularities in the operation of pumping station units, and issue timely alerts. This approach ensures the continuous, safe, and stable operation of pump unit equipment within pumping stations, reduces equipment failure rates, enhances maintenance and inspection efficiency, and optimizes management strategies. To achieve these goals, the most widely used and effective approach is to develop models that predict the trends of pumping station unit parameters and alert management when units may operate abnormally.
The trend prediction of pumping station unit operating parameters is based on analyzing various real-time operational data of these units. This process involves the real-time analysis and trend prediction of key parameters that may affect the health and safe operation of pumping station units, identifying potential fault hazards in advance for equipment diagnosis and maintenance management [1,2]. Researchers have extensively studied monitoring the status of pumping station unit equipment and predicting the trends of operating parameters based on data analysis.
In terms of monitoring the status of pump station equipment, three classes of methods are widely used: model-based methods, signal analysis-based methods, and data-driven methods [3]. Model-based methods establish accurate physical models based on the operating mechanism of the unit equipment to reflect its operating status, which usually ensures high accuracy [4]. Data from the pumping station’s operation processes (such as electrical parameters, pressure or flow in pipelines, water levels in tanks, and changes in discrete states) represent a valuable resource for operation management. Marko highlighted the importance and advantages of employing hybrid models with suitable “data-driven” techniques for controlling water supply systems [5]. With the development of digital twin technology, many researchers have begun to construct digital twin models for pumping stations. Hallaji proposed a digital twin framework to extend the scope of predictive maintenance by leveraging building information modeling and deep learning [6]. Feng developed a high-precision digital twin modeling method tailored for pumping stations, which supports automatic inspection of the pumping station; scheduling optimization; prediction and regulation of energy consumption and carbon emissions; and visualization of results for display and other applications [7]. However, model-based methods also have drawbacks. For units operating in complex scenarios, it is difficult for such methods to establish accurate physical models [8]. In addition, due to wear and tear during use, the physical operating mechanism of the equipment may change, resulting in changes in the operating model parameters; operating models that cannot be updated synchronously may lead to false alarms or missed alarms. Therefore, some researchers use signal analysis-based methods for monitoring. Fu investigated a novel hybrid approach combining multiscale dominant ingredient chaotic analysis, a kernel extreme learning machine, and an adaptive mutation grey wolf optimizer for predicting vibration trends in hydropower generator units [9]. In Wang’s work on the real-time analysis and processing of data in pumping station operation and maintenance systems, a hybrid prediction method was proposed to predict the vibration responses of the pumping station, combining a single autoregressive integrated moving average (ARIMA) model with a combined model of the adaptive network-based fuzzy inference system (ANFIS) and the whale optimization algorithm (WOA) [10]. Signal analysis-based methods detect potential faults by processing and analyzing collected signal data, making them sensitive to signal quality and highly susceptible to noise and interference. Specialized sensors and devices are therefore required, which increases operating costs. Moreover, some key signals are difficult or impossible to collect, so the application range of signal analysis-based methods is limited [11]. Consequently, more researchers prefer data-driven methods for monitoring pumping station devices. Data-driven methods do not rely on physical models and can adapt to the differences between various systems and devices; model migration and deployment are convenient, reducing development and maintenance costs. Data-driven methods include methods based on statistical analysis [12] and those based on machine learning [13]. Statistical analysis-based methods rely on data distribution characteristics, monitoring the operating status through characteristic parameters calculated from established relationships between variables. This approach requires researchers to have rich experience and professional knowledge of pumping stations when judging the relationships between variables and screening useful features. Machine learning-based methods typically require the development of complex network structures and the tuning of numerous hyperparameters, making it challenging to explain and understand the model’s decision-making process. In addition, such methods require certain hardware facilities for computation. More importantly, when the number of training samples is insufficient, such methods are prone to overfitting, and their generalization ability is difficult to guarantee.
In the research on data analysis and trend prediction algorithms, methods and models from other fields have been extensively applied in water conservancy engineering. The main goal of analyzing the operating parameter data of pumping station equipment is to extract meaningful insights from real-time or historical data, explain and discover potential patterns or correlations between variables, obtain a deeper understanding of system operating characteristics, and predict the trend of parameter operation. Previous studies have mainly used traditional statistical methods, such as calculating the mean and analysis of variance, to analyze the operating data of pumping stations, but these methods have encountered difficulties in dealing with complex and time-varying pumping station systems. In recent years, machine learning (ML) methods have become an important tool for pump station data analysis. The application of methods such as support vector machines, random forests, and neural networks improves the accuracy of data analysis by effectively handling nonlinear relationships and complex correlations between multiple variables. For example, Surucu reviewed the recent literature on ML-driven condition monitoring systems that have been beneficial in many cases and provided insights into the underlying findings on successful, intelligent condition monitoring systems [14]. Eiben compared the effectiveness of random forest and k-means clustering models in predicting failures at pumping stations [15]. Khorsheed explored the integration of machine learning with decision-making techniques to predict potential bearing failures, thereby improving overall manufacturing operations by enabling timely maintenance actions [16]. At the same time, time series analysis methods such as ARIMA [17] and exponential smoothing [18] have been widely adopted to capture trends and seasonal changes in the time series data of pumping stations in order to achieve long-term operational prediction and analysis.
At present, there are several prominent trends in the development of analysis and trend prediction for pump station data. Firstly, the application of deep learning technology will be further expanded, and models such as deep neural networks [19], long short-term memory networks (LSTMs) [20], and convolutional neural networks (CNNs) [21,22] will improve modeling and prediction capabilities for complex systems. Secondly, the integration of edge computing and Internet of Things technology will become an important direction [23,24]. By enabling data processing at the device level, more efficient data management and analysis can be achieved. In addition, future research should focus more on the fusion and integration of multi-source data to establish a more comprehensive pump station information model. With the development of artificial intelligence technology, pump station data analysis will further move towards intelligent decision-making systems, achieving the automated operation and intelligent optimization of pump station systems.
The operational status of pumping stations is represented by multiple variables, necessitating the use of multivariate statistical analysis methods in the data analysis and modeling process. It is usually necessary to perform dimensionality reduction on the data to extract key features for monitoring and judging the performance of the water pump unit. Principal component analysis (PCA) is the most commonly used method; it relies on a large amount of data to ensure its statistical characteristics and accurately captures the main directions of data change [25]. The pump unit in a pumping station operates intermittently and changes gradually, with each operating period being considered as a separate task. As the operating years increase, the number of tasks grows, but the low frequency of data collection results in limited effective data for analysis within each task, presenting challenges for data analysis. Therefore, how to monitor the status of water pump units under the condition of multiple tasks and few samples has become a significant research focus. The machine learning paradigm of multi-task learning (MTL) improves model performance by simultaneously learning multiple related tasks, offering a solution to the problem of multiple tasks with few samples [26,27]. In traditional single-task learning, a model is trained to solve a specific task. In multi-task learning, models are designed to handle multiple tasks simultaneously, which can relate tasks and share certain features to improve the generalization performance of the model. Achille et al. [28] addressed the challenge of task description in multi-task learning, while Zhang et al. [29] explored the use of geometric reasoning, scene depth, and semantics to optimize the effectiveness of multi-task learning. In the analysis and prediction of operating parameters for pumping station equipment, correlations between different monitoring tasks (such as vibration and electrical parameters) can be analyzed to extract appropriate features that represent each task. Moreover, the characteristics of monitoring data for pump stations operating under different conditions can vary significantly. Therefore, uncertainty information can be considered to adjust the weights of model parameters to adapt to different situations, aiming to improve the generalization ability of the model and enable it to adapt to monitoring tasks under different operating conditions, pump station unit types, or environments. In the process of multi-task learning, each task utilizes all of the extracted features. However, due to factors such as seasonal changes and noise interference, features may shift. Moreover, not every feature exists in every task; in other words, some tasks may only have specific features. Therefore, how to select the features extracted by multi-task learning becomes another problem that needs to be solved. Introducing an attention mechanism (AM) into the process of multi-task learning is a feasible solution. The attention mechanism simulates the way humans allocate attention in information processing, allowing the model to assign different weights to different parts of the input when processing data, thereby dealing with complex tasks and data more flexibly [30]. The attention mechanism has been widely applied in fields such as image processing and speech recognition, but it is less commonly used in the field of pump station data analysis and processing. In pump station operation monitoring, different sensors and monitoring tasks provide information about different aspects of the pump station system, and it is difficult for a model to effectively focus on the key information within such large amounts of input data. Therefore, attention mechanisms need to be introduced to help models dynamically attend to the data of different monitoring tasks and adjust the task weights according to the current situation. This is beneficial for improving the adaptability of the monitoring system to multiple tasks, ensuring a more comprehensive understanding of the operating status of the water pump system.
Thus, against the above background, a trend prediction model based on PCA-based MTL and AM for the operating parameters of water pumping stations is proposed in this paper. The multi-task learning method based on PCA is used to process the operating data of the water pumping station so as to make full use of the historical data and extract the key common features reflecting the operating state of the devices. The attention mechanism is introduced to dynamically allocate the weight coefficients of the common feature mapping, highlighting the key common features and improving the prediction accuracy of the model when predicting the trend of data change under new working conditions.
2. Methods
The basic process of the trend prediction model based on the PCA-based multi-task learning and attention mechanism for operating parameters of the water pumping station in this article is as follows: Firstly, the multi-task learning method based on PCA is used to reduce the dimensionality of the monitoring data and filter out common features of the historical operating condition and tasks. Then, the attention mechanism is introduced to dynamically determine and adjust the weights of the prediction model in each common feature direction. Finally, the real-time operating condition data are mapped to the principal component direction of the common features, and the trend of changes is predicted based on the characteristics and variations in the principal component direction. Two statistical parameters are used to evaluate the predictive performance of the model in the process of training and testing, and they are also used as thresholds to determine whether the operating state of water pumping units is abnormal and what level of alert is issued when the status is abnormal.
2.1. PCA-Based Multi-Task Learning
The operation of the water pumping unit is not continuous, and each operating interval can be treated as a separate task. Therefore, large amounts of historical data can be fully utilized to enable the model to learn the characteristics of data changes during pump station operation. Multi-task learning can improve the performance of the predictive model by simultaneously learning multiple related tasks. At the same time, there are many types of variables in the operation data of pumping stations, and long-term operation and slow parameter changes cause data redundancy, so dimensionality reduction processing is needed. The historical operating data of the pumping units can be represented as $\{X_1, X_2, \ldots, X_N\}$, where $N$ represents the number of groups of data, each group of data $X_d$ is a matrix of size $n_d \times m$, $n_d$ is the sample size of the group, and $m$ is the number of features monitored for the device. The process of standardization is necessary to ensure that the results of the PCA accurately capture the main directions of data change and are not affected by different feature scales. The covariance matrix $C_d = \frac{1}{n_d - 1} X_d^{T} X_d$ in the PCA can represent the correlation between different features, thereby identifying the main direction of change in the data; it reflects the degree of linear relationship between the various features of the data. The covariance matrix $C_d$ can be expressed as $C_d = U_d \Sigma_d U_d^{T}$ through singular value decomposition, so $X_d$ can be written as the sum of the outer products of $k_d$ vectors:

$$X_d = \sum_{i=1}^{k_d} t_i p_i^{T} = T_d P_d^{T},$$
where $T_d = \left[t_1, t_2, \ldots, t_{k_d}\right]$ is called the score matrix and $P_d = \left[p_1, p_2, \ldots, p_{k_d}\right]$ is called the load matrix. By performing the above operation on the $N$ sets of historical operating data, we can obtain the set of principal component numbers $\{k_1, k_2, \ldots, k_N\}$, the set of score matrices $\{T_1, T_2, \ldots, T_N\}$, and the set of load matrices $\{P_1, P_2, \ldots, P_N\}$. In PCA, if two vectors have the same direction, it means that they have similar trends in the principal component direction of the data, that is, their directions of change with the data are similar. The cosine similarity can be used to measure the degree of similarity in direction between two vectors, so the degree of directional similarity between the principal component vectors in the set of load matrices can be calculated as:

$$\cos\theta_{ij}^{(d,t)} = \frac{\left(p_i^{(d)}\right)^{T} p_j^{(t)}}{\left\|p_i^{(d)}\right\| \left\|p_j^{(t)}\right\|},$$
where $p_i^{(d)}$ is the $i$-th principal component in the $d$-th load matrix and $p_j^{(t)}$ is the $j$-th principal component in the $t$-th load matrix. The closer the cosine similarity is to 1, the closer these two principal components are to each other. If $\cos\theta_{ij}^{(d,t)} \geq \delta$, where $\delta$ indicates the proportion of total variance that we wish to retain (usually taken as 90% or 95%), it means that their trends in the main direction of data change are sufficiently similar and contain similar information. Thus, we can obtain $q$ groups of columns with consistent directions, with $l_j$ vectors of consistent direction in each group. By splicing the vectors of each group into a matrix $G_j = \left[p_{j1}, p_{j2}, \ldots, p_{j l_j}\right]$ and conducting SVD decomposition, $G_j = U_j \Sigma_j V_j^{T}$. If the first column of the orthogonal matrix $U_j$ is treated as a new direction $u_j$, then $u_j$ represents the direction corresponding to the maximum singular value of $G_j$, which is the main direction of change in the data. In this way, $q$ new directions $U_c = \left[u_1, u_2, \ldots, u_q\right]$ can be obtained, which can be used as comprehensive features representing the main changing directions of the original data. $U_c$ can serve as the common features of these $N$ sets of historical operating data of the water pumping unit; these are the common features obtained by PCA-based multi-task learning and more comprehensively reflect the changing characteristics of the data.
In summary, the PCA algorithm based on multi-task learning is shown as follows:
- Step 1:
Define and standardize the N sets of historical operating condition data $\{X_1, X_2, \ldots, X_N\}$ for the equipment;
- Step 2:
Calculate the covariance matrix $C_d$ of each group;
- Step 3:
Calculate the score matrix $T_d$;
- Step 4:
Calculate the load matrix $P_d$;
- Step 5:
Calculate the cosine similarity values between the column vectors in $\{P_1, P_2, \ldots, P_N\}$ to obtain the vectors with $\cos\theta_{ij}^{(d,t)} \geq \delta$ and calculate the common features $U_c$.
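To make these steps concrete, the following Python sketch (a minimal illustration using NumPy, not the authors' implementation; the variable names and the variance/similarity thresholds are assumptions) extracts a load matrix per task, groups principal components whose pairwise cosine similarity exceeds the threshold, and fuses each group into one common direction via SVD:

```python
import numpy as np

def task_pca(X, var_ratio=0.95):
    """Standardize one task's data and return its load matrix (columns = PCs)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    C = np.cov(Xs, rowvar=False)                    # covariance matrix C_d
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]               # sort eigenpairs descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_ratio) + 1
    return eigvecs[:, :k]                           # load matrix P_d (m x k_d)

def common_directions(tasks, sim_threshold=0.95):
    """Group similar principal components across tasks and fuse each group by SVD."""
    all_pcs = [p for X in tasks for p in task_pca(X).T]   # each p is an m-vector
    groups, used = [], set()
    for i, p in enumerate(all_pcs):
        if i in used:
            continue
        group = [p]
        for j in range(i + 1, len(all_pcs)):
            cos = abs(np.dot(p, all_pcs[j])) / (
                np.linalg.norm(p) * np.linalg.norm(all_pcs[j]))
            if j not in used and cos >= sim_threshold:
                group.append(all_pcs[j])
                used.add(j)
        groups.append(np.column_stack(group))       # spliced matrix G_j
    # first left-singular vector of each group = common direction u_j
    return np.column_stack([np.linalg.svd(G)[0][:, 0] for G in groups])

# usage: tasks is a list of (n_d x m) arrays, one per historical operating period
# U_c = common_directions(tasks)  ->  (m x q) matrix of common features
```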
2.2. Weight Adjustment Based on the Attention Mechanism
After obtaining the common features $U_c$ from the PCA-based multi-task learning, the next step is to map the data of the new operating period of the device onto these common features to further analyze their changing trends in the common directions. Different sensors and monitoring tasks provide information about different aspects of the pump station system, and it is difficult for the model to effectively focus on the key information within the large amount of input data. Thus, introducing an attention mechanism is essential to enable the model to dynamically focus on the data from different monitoring tasks [31]. The process of weight generation based on the attention mechanism is shown in Figure 1. It comprises two parts: (1) the pre-trained model, which uses convolutional neural networks to analyze historical data and classify the existing categories; and (2) weight generation based on the attention mechanism, which generalizes the commonalities obtained from the training tasks and provides weight parameter values for new tasks.
The pre-trained model consists of three steps:
Step 1: Design the feature extractor:

$$z = f_{\theta}(X),$$

where $X$ is the training dataset of the tasks, $z$ is the feature set of the dataset $X$ output by the feature extractor, and $\theta$ are the trainable parameters of the feature extractor.
Step 2: Calculate the cosine similarity:

$$s_b = \tau \cdot \frac{z^{T} \bar{w}_b}{\left\|z\right\| \left\|\bar{w}_b\right\|}, \qquad b = 1, 2, \ldots, K,$$

where $\tau$ is a trainable constant parameter and $\bar{w}_b$ represents the $L_2$-normalized weight values of the last layer in the pre-trained model, each element of which represents the weight parameters of one of the $K$ basic categories. The basic categories refer to the categories included in the trained tasks.
Step 3: Calculate the probability $p$ of each basic category:

$$p = \mathrm{softmax}\left(\left[s_1, s_2, \ldots, s_K\right]\right),$$

where each element of $p$ represents the probability that $X$ belongs to the corresponding one of the $K$ basic categories.
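The following Python sketch illustrates the three steps of the pre-trained classifier described above (a minimal NumPy illustration with assumed names and shapes; the feature extractor itself is taken as given, and the scale parameter is fixed rather than trained):

```python
import numpy as np

def softmax(s, axis=-1):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def base_category_probs(z, W_last, tau=10.0):
    """Cosine-similarity classifier of the pre-trained model.

    z      : (n, d) features produced by the feature extractor for dataset X
    W_last : (K, d) last-layer weight vectors, one per basic category
    tau    : scale parameter (trainable in the paper, fixed here for illustration)
    """
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)            # L2-normalize features
    w_n = W_last / np.linalg.norm(W_last, axis=1, keepdims=True)  # L2-normalize weights
    scores = tau * z_n @ w_n.T        # scaled cosine similarities, shape (n, K)
    return softmax(scores, axis=1)    # probability of each of the K basic categories
```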
The weight generator based on the attention mechanism also includes three steps:
Step 1: Construct the memory module: In the pre-trained model, $\bar{w}_b$ is the weight parameter vector, which refers to the weight parameters of all connections of the $b$-th neuron in the last layer. After normalization using the $L_2$ norm of the last layer's weight values, these weight values are stored in the memory module, where the number of neurons in the last layer is $K$; each element $k_b$ in the feature set of the trained task dataset represents the corresponding feature of the $b$-th category dataset.
Step 2: In the memory module, use the attention mechanism to extract the weight values corresponding to the most relevant features and perform weighted averaging:

$$w_{att} = \frac{1}{m'} \sum_{i=1}^{m'} \sum_{b=1}^{K} \mathrm{Att}\left(\phi_q \bar{z}_i, k_b\right) \cdot \bar{w}_b,$$

where $m'$ is the number of sample points in the new task training set, $K$ is the number of categories in the trained tasks, and $\phi_q$ is a trainable parameter. $\bar{w}_b$ denotes all the connection weight parameters corresponding to the $b$-th class neuron in the last layer of the pre-trained deep network, $\bar{z}_i$ is the feature of the $i$-th sample point in the new task, $k_b$ is the feature of the $b$-th category dataset in the memory module, and $w_{att}$ is the weighted average of the weight parameters corresponding to the most relevant features extracted from the memory module by the attention mechanism. $\mathrm{Att}\left(\phi_q \bar{z}_i, k_b\right)$ represents the attention mechanism, which measures the similarity between the features of each sample in the new task and the features of each category in the memory module. The greater the similarity between them, the greater the value of $\mathrm{Att}\left(\phi_q \bar{z}_i, k_b\right)$, which indicates that the connection weight parameters of neurons of this category receive a greater weight. The value of $\mathrm{Att}\left(\phi_q \bar{z}_i, k_b\right)$ can be calculated using the following formula:

$$\mathrm{Att}\left(\phi_q \bar{z}_i, k_b\right) = \frac{\exp\left(\left\langle \phi_q \bar{z}_i, k_b \right\rangle\right)}{\sum_{b'=1}^{K} \exp\left(\left\langle \phi_q \bar{z}_i, k_{b'} \right\rangle\right)},$$

where the denominator of the softmax function is the sum over all pre-trained categories. The output of the softmax function is the probability value of the $b$-th pre-training category, ranging from 0 to 1, representing the coefficient of the weight parameter of the $b$-th pre-training category in the weighted average.
Step 3: Calculate the connection weight parameter value $w'$ for the new category of neurons:

$$w' = \phi_{avg} \odot w_{avg} + \phi_{att} \odot w_{att},$$

where $w_{avg}$ is the classifier weight generated based on the average value of the features, $\phi_{avg}$ and $\phi_{att}$ are the trainable parameters, $w_{att}$ is the classification weight generated based on the attention mechanism, and $\odot$ represents the Hadamard product calculation.
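A minimal NumPy sketch of the weight generator described in Steps 1-3 is given below (the array names, shapes, and the inner-product form of the attention score are assumptions made for illustration; this is not the authors' code):

```python
import numpy as np

def generate_new_class_weight(z_new, keys, W_base, phi_q, phi_avg, phi_att):
    """Attention-based weight generation for a new category (illustrative sketch).

    z_new  : (m', d) features of the m' training samples of the new task
    keys   : (K, d) per-category feature keys stored in the memory module
    W_base : (K, d) L2-normalized last-layer weights of the K basic categories
    phi_q  : (d, d) trainable query transform; phi_avg, phi_att : (d,) trainable gains
    """
    # attention coefficients Att(phi_q * z_i, k_b): softmax over basic categories
    queries = z_new @ phi_q.T                          # (m', d)
    logits = queries @ keys.T                          # (m', K) similarities
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)         # each row sums to 1

    # weighted average of base-class weights, averaged over the new-task samples
    w_att = (att @ W_base).mean(axis=0)                # (d,)

    # average-feature-based weight and final combination via Hadamard products
    w_avg = z_new.mean(axis=0)
    w_avg = w_avg / np.linalg.norm(w_avg)
    return phi_avg * w_avg + phi_att * w_att           # weight vector of the new class
```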
2.3. The Prediction Model Based on PCA-Based MTL and AM
When a new operating period of the device begins, the standardized data of the new samples can be recorded as a matrix $X_{new}$ of size $n_{new} \times m$, where $n_{new}$ is the number of samples and $m$ is the number of features of the monitored device. As previously noted, the direction of a principal component captures the maximum variance of the data, so the projection values in these directions represent the main changes. The projection value of a new sample along a principal component direction is the inner product between the new sample and the principal component, which indicates the relative position of the new sample in this direction. If the projection value of the new sample in the principal component direction is close to the mean of the historical data in the same direction, it indicates that the change in the new sample in this direction is similar to the average change trend of the historical data, which means that the device is operating normally. If the projection is far away from the historical data, it indicates that the new sample has undergone significant changes in this direction or differs from the direction of change in the historical data, which means that the device may be operating abnormally. The steps for the model to make trend predictions of data from a new operating period are as follows:
Step 1: Standardize the new sample data obtained from the new operating period to obtain $X_{new}$.
Step 2: Calculate the projection value of the new sample on the common features $U_c$, and apply the softmax function to obtain the weight coefficients $\alpha_c$.
Step 3: Perform dimensionality reduction decomposition on the new sample $X_{new}$ using PCA.
Step 4: Calculate the projection value of the new sample on its own load matrix $P_{new}$ with $k_{new}$ principal components, and apply the softmax function to obtain the coefficients $\alpha_{new}$ of its own characteristic directions.
Step 5: Calculate the cosine similarity between each column vector in $P_{new}$ and $U_c$ to obtain, for each column of $P_{new}$, the common direction with the highest cosine similarity, which is the direction in $U_c$ most consistent with that column. Extract these $k_{new}$ directions to form a new matrix $U_s = \left[u_{s1}, u_{s2}, \ldots, u_{s k_{new}}\right]$, and combine the coefficients corresponding to these directions to form a new coefficient matrix $\alpha_s$. These directions have features or changing trends similar to the principal component directions emphasized in the new sample, which has guiding significance for identifying future data trends and patterns. The common directions will have an impact on the feature weight allocation of the prediction model. Thus, it is important to reasonably allocate the weights of the common directions $U_s$ and the sample's own feature directions $P_{new}$, which is also the reason for introducing the AM.
Step 6: According to the coefficient matrices $\alpha_s$ and $\alpha_{new}$, calculate the weight coefficient $\beta_s$ of the common directions and the weight coefficient $\beta_{new}$ of the new sample's own feature directions and normalize them. Then, use the attention mechanism to adjust the weights: weight $U_s$ and $P_{new}$ using the formula $P_{mix} = \beta_s U_s + \beta_{new} P_{new}$ to obtain a matrix $P_{mix}$, and take the first $k_{new}$ columns of the orthogonal matrix of $P_{mix}$ as the adjusted eigenvectors $\tilde{P}$. The $\beta_s$ and $\beta_{new}$ can be calculated as:

$$\beta_s = \frac{\alpha_s}{\alpha_s + \alpha_{new}}, \qquad \beta_{new} = \frac{\alpha_{new}}{\alpha_s + \alpha_{new}}.$$
Step 7: Project the new sample onto the principal component subspace to obtain its principal component score $\hat{T} = X_{new}\tilde{P}$ and the residual $E$, where $E = X_{new} - \hat{T}\tilde{P}^{T}$.
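The following Python sketch illustrates Steps 5-7 under the notation introduced above (the blending of the common and sample-specific directions is a simplified assumption for illustration, not the authors' implementation):

```python
import numpy as np

def project_new_period(X_new, P_new, U_c, beta_s, beta_new):
    """Illustrative sketch of Steps 5-7.

    X_new  : (n, m) standardized data of the new operating period
    P_new  : (m, k) load matrix of the new period's own PCA
    U_c    : (m, q) common feature directions from PCA-based MTL
    beta_s, beta_new : normalized weight coefficients for the two sets of directions
    """
    # Step 5: for each new principal direction, find the most similar common direction
    cos = (P_new.T @ U_c) / (
        np.linalg.norm(P_new, axis=0)[:, None] * np.linalg.norm(U_c, axis=0)[None, :])
    U_s = U_c[:, np.argmax(np.abs(cos), axis=1)]          # (m, k), one match per column

    # Step 6: weight the two direction sets and re-orthogonalize by SVD
    P_mix = beta_s * U_s + beta_new * P_new
    P_adj = np.linalg.svd(P_mix, full_matrices=False)[0]  # first k orthonormal columns

    # Step 7: principal component scores and residuals of the new samples
    T_hat = X_new @ P_adj                                 # scores in the adjusted subspace
    E = X_new - T_hat @ P_adj.T                           # residual part
    return T_hat, E
```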
2.4. Model Monitoring and Evaluation Parameters
After the model training is completed, the operating data of new operating conditions are used as the input for the model to obtain the trend prediction results for the operating data of water pumping station units in the new operating stage. The performance of the model can be monitored and evaluated by Hotelling's $T^2$ statistic and the $Q$ statistic during the training and prediction process.
Hotelling's $T^2$ statistic measures the degree of deviation of a sample in the multivariate space. It is used to detect abnormal samples in multidimensional data, reflecting the stability of the model; the smaller the value, the more stable the model is. If the $T^2$ statistic value of a sample point exceeds the set threshold, the sample point may be considered abnormal. The $T^2$ statistic and its threshold $T^2_{\alpha}$ can be calculated by the following formulas:

$$T^2 = \sum_{i=1}^{A} \frac{t_i^2}{\lambda_i}, \qquad T^2_{\alpha} = \frac{A(n-1)}{n-A} F_{\alpha}(A, n-A),$$

where $t_i$ is the score of the sample in the $i$-th principal direction, $\lambda_i$ is the corresponding eigenvalue, $n$ is the number of new samples, $A$ is the number of principal components extracted by the PCA model, $\alpha$ is the significance level, and $F_{\alpha}(A, n-A)$ represents the upper limit of the $F$ distribution with $(A, n-A)$ degrees of freedom, corresponding to the critical value at the $100(1-\alpha)\%$ confidence level.
The $Q$ statistic, also known as the squared prediction error (SPE), measures the deviation of a sample in the residual space, which is the portion of the original variable space not explained by the model. The $Q$ statistic is used to monitor the covariance structure of the data. It is calculated through normalization based on the Mahalanobis distance between the sample data points and the sample mean. The $Q$ statistic reflects the predictive accuracy of the model, with smaller values indicating higher accuracy. The $Q$ statistic and its threshold $Q_{\alpha}$ can be calculated by the following formulas:

$$Q = \left\| x \left(I - \tilde{P}\tilde{P}^{T}\right) \right\|^{2}, \qquad Q_{\alpha} = \theta_1 \left[ \frac{c_{\alpha}\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0},$$

where $x$ is a row of the new sample matrix $X_{new}$, $\theta_r = \sum_{i=A+1}^{m} \lambda_i^{r}$ $(r = 1, 2, 3)$, $h_0 = 1 - 2\theta_1\theta_3 / \left(3\theta_2^2\right)$, $c_{\alpha}$ is the standard normal deviate corresponding to the upper $1-\alpha$ limit, and $\lambda_i$ is the $i$-th eigenvalue of the covariance matrix of the new sample.
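As an illustration, the following Python sketch computes the per-sample $T^2$ and $Q$ statistics together with their control limits following the formulas above (SciPy is assumed for the $F$ and normal quantiles; the function and variable names are illustrative, not the authors' code):

```python
import numpy as np
from scipy import stats

def t2_q_statistics(X, P, eigvals, alpha=0.01):
    """Hotelling's T^2 and Q (SPE) statistics with their control limits.

    X       : (n, m) standardized samples of the new operating period
    P       : (m, A) retained principal directions
    eigvals : (m,) eigenvalues of the covariance matrix, sorted in descending order
    alpha   : significance level (0.01 and 0.05 give the 99% and 95% limits)
    """
    n, m = X.shape
    A = P.shape[1]
    T = X @ P                                            # principal component scores

    # Hotelling's T^2 per sample and its F-distribution-based control limit
    t2 = np.sum(T ** 2 / eigvals[:A], axis=1)
    t2_lim = A * (n - 1) / (n - A) * stats.f.ppf(1 - alpha, A, n - A)

    # Q (SPE) per sample and its control limit (Jackson-Mudholkar approximation)
    E = X - T @ P.T
    q = np.sum(E ** 2, axis=1)
    theta = [np.sum(eigvals[A:] ** r) for r in (1, 2, 3)]
    h0 = 1 - 2 * theta[0] * theta[2] / (3 * theta[1] ** 2)
    c_alpha = stats.norm.ppf(1 - alpha)
    q_lim = theta[0] * (c_alpha * np.sqrt(2 * theta[1] * h0 ** 2) / theta[0]
                        + 1 + theta[1] * h0 * (h0 - 1) / theta[0] ** 2) ** (1 / h0)
    return t2, t2_lim, q, q_lim
```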
Therefore, the process of predicting the trend of operating parameters for pump units in pumping stations using the PCA-based MTL and AM model is summarized in Figure 2.
4. Results and Discussion
After the model was trained on the historical data and part of the data from the new operating periods, the operating data of the new operating conditions were put into the model to obtain the trend prediction results for the new operating stage. Hotelling's $T^2$ and $Q$ statistics were used to evaluate the model's performance, and control limits of 99% and 95% were chosen to determine the accuracy of the predictive results. For comparison, in addition to the monitoring results of the model based on PCA-based MTL and AM proposed in this paper (as shown in Figure 6), the analysis results of two other models were also presented, namely the single-task learning model based on PCA (as shown in Figure 7) and the PCA-based MTL model without the attention mechanism (as shown in Figure 8).
The comparison of the results in the figures shows that, when the MTL algorithm is not employed (as shown in Figure 7), the values of the monitoring statistics mostly exceed the control limits, indicating poor stability and prediction accuracy of the model. This result is easy to explain: because the single-task learning model based on PCA does not treat each operating stage of the historical data as an independent task, it cannot fully utilize the effective information in the historical data. For the PCA-based MTL model without the attention mechanism (as shown in Figure 8), part of the statistical values exceed the control limits, indicating that the model cannot fully match the data of the new operating period. This is because the common features extracted directly from the historical data are used in the data mapping, and no weight adjustments are performed based on the characteristics of the data from the new operating period. For the model based on PCA-based MTL and AM proposed in this paper (as shown in Figure 6), the weights of the common features were adjusted by the attention mechanism when conducting the data mapping, so the statistics never exceeded any control limit, which means that the model fits the new operating data very well and predicts the changing trend stably and accurately. By comparing the results, we can draw the following conclusions: the MTL algorithm can fully utilize the effective common features in the historical data to improve the stability of the model, and the introduction of the AM adjusts the weights in the data mapping process based on the characteristics of the new operating data to be predicted, thereby improving the prediction accuracy of the model. Additionally, this model can predict the variation trends of multiple parameters in real time, enabling anomaly detection and early warning from multiple perspectives. This significantly reduces issues such as delayed fault identification, missed alarms, and false alarms. In contrast, some existing models or systems tend to focus on the prediction and anomaly detection of single parameters. For example, Hao Zhang et al. achieved fault detection and early warning for units by predicting temperature changes [32], and Jiahao Zhu et al. used VMD and GRU to predict trends in unit vibration signals, enabling the detection and early warning of abnormal conditions [33]. While single-parameter anomaly detection is possible, predicting multiple parameters simultaneously provides a more accurate reflection of the unit's operating status, leading to the high-precision monitoring of abnormal conditions.