Wind Turbine Gearbox Multi-Scale Condition Monitor
REGULAR ARTICLE
Abstract. Since wind is expected to play a crucial role in the worldwide electricity production scenario, the
reliability of the turbines is attracting the attention of both industry and academia. New techniques for efficient
condition monitoring of key components can be fundamental to optimising the performance and maintenance of
a large fleet of turbines. The gearbox and bearings are the most critical mechanical components as they are
responsible for a large proportion of the downtime of a wind turbine over its lifetime. However, the monitoring of
wind turbine gearboxes is challenging due to the non-stationary nature of the operation and the lack of noise-free
vibration measurements. In the present work, a new approach for efficient long to short term monitoring of wind
turbine gearboxes has been developed based on real data. A turbine drivetrain failure was used as a test case to
develop a new approach based on the use of multi-scale data sources. On the one hand, SCADA (Supervisory
Control And Data Acquisition) data were used for general monitoring of the condition of the machine component
on long to medium term time scales, while on the other hand, high resolution, triggered event data collected by a
CMS (Condition Monitoring System) were used to refine the diagnosis and prognosis of the fault on a shorter
time scale. Even though triggered spot events are very difficult to manage, the results show that the use of multi-scale
high resolution CMS data can be fast and useful in fault diagnosis to classify a target machine against a
healthy reference one. In the present work, the one-class SVM (Support Vector Machine) was used for novelty
detection. The approach, when applied to all available time scales, can be very precise in detecting the faulty
machine and can therefore be proposed as a fast detection approach requiring less data compared to the classical
data-driven regression normal behaviour model developed with continuously available SCADA data.
Keywords: Condition monitoring / wind turbine / gearbox bearings / fault detection
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
2 F. Castellani et al.: Mechanics & Industry 25, 28 (2024)
Fig. 1. Estimates of failure rates and associated downtime for the most important wind turbine components (from [5]).
components. This is fundamentally one of the most important ways to improve the performance of wind farms [12,13] and to optimise the life cycle of the machines through optimal control of the ageing of components [14,15].
SCADA has given wind turbines a very practical and easy way of storing large amounts of data, but its main drawback is that the ten-minute averaging time is too long to capture the overall dynamics of the system. Consequently, research on wind turbine fault detection based on SCADA data analysis often relies on medium-term [16] to long-term [17] trends derived from meaningful measurements, typically temperatures collected at rotating sub-components. The underlying idea is that the initiation of a fault should be accompanied by an abnormal release of heat, observable in the form of trends, spikes or increased variability. In practice, a model representing normal behaviour, for example a regression [18] or a classification [19], is constructed using data assumed to describe healthy operation, and the statistical novelty of the measurements during the target period is then analysed.
For the reasons outlined above, there is a growing trend towards the use of high frequency data for fault diagnosis in wind turbines, as can be seen in [20,21]. However, a key challenge associated with such data is that industrial systems typically store it only when specific triggered events occur, making continuous monitoring impractical. Consequently, a recent trend in the literature is to combine detailed high-frequency analysis close to the fault with continuous monitoring based on SCADA, as done in [22,23]. Multi-scale data analysis for wind turbine fault diagnosis therefore represents an innovative research direction, and this study aims to contribute to the field through the discussion of a real-world test case. It examines a gear bearing failure in an industrial wind farm using the following multi-scale approach:
– Long term, through the analysis of hourly-averaged SCADA data for 6 years before the fault.
– Medium term, through the analysis of ten-minute-averaged SCADA data for about one year before the fault.
– Short term, through the analysis of spot events sampled at frequencies ranging from about 1 Hz to thousands of Hz.
Condition monitoring of wind turbine gearboxes can be challenging due to the large number of components and the wide range of speeds involved. In this context, the ability to use a large amount of data covering key parameters over a wide range of time scales can be fundamental.
The time horizon for the development of a mechanical failure in a wind turbine gearbox can vary from a few hours to many months depending on:
– The speed of rotation of the components and their design characteristics;
– The operating conditions in terms of variations in speed and loads;
– The nature and cause of faults.
In this context, all available data should be used for optimal monitoring of the components in order to improve the reliability and performance of the wind turbines; the monitoring analysis should therefore cover all possible time scales, requiring an integrated multi-time-scale approach. The merit of this approach lies in its ability to support a critical discussion of the identification of the onset of failure. In real test cases, the classification of data is complicated by the lack of a clear demarcation line between healthy and faulty measurements, and the use of multiple time scales proves valuable in clarifying patterns. The structure of the manuscript is as follows: Section 2 describes the test case and the dataset; Section 3 provides a brief overview of the methods used; Section 4 presents and discusses the results; and Section 5 draws conclusions.
2 Test case and dataset
The case analysed is a multi-MW horizontal axis wind turbine affected by a recently discovered fault in a drive train bearing.
Although standard SCADA (Supervisory Control And Data Acquisition) data analysis gave no clear indication of the fault, the final diagnosis was made by the discovery of metal debris in the gearbox oil.
In general, bearing failures can have very different prognoses and failure modes [24]. They can lead quickly
and dramatically to severe failures (as in the case of fatigue failures), or they can affect machine operation gradually, with a slow progression over time, as in the case of wear or overload failures. The latter case, analysed in the present work, is generally more difficult to diagnose accurately: the machine may continue to operate for a long time in faulty conditions, and the evolution of the damage is not clearly visible because the symptoms are weak. Data from industrial SCADA and condition monitoring systems were used simultaneously to provide a large amount of information on different time scales. The SCADA system collects statistics (min, max, avg, ...) on a 10-minute time scale; this time resolution is poor, but it guarantees continuous monitoring of a large number of machine parameters, which is ideal for long-term data sets. The condition monitoring system is capable of collecting high resolution data at frequencies from 1 Hz to 1 kHz. Due to the large amount of data, monitoring in this case is not continuous, and only selected events (not evenly spaced in time) with a limited number of parameters are available. Figure 2 summarises the main sources of data used in this work and some of the possible approaches for each of the time scales.
The events collected by the condition monitoring system provide a high-resolution picture of an abnormal event (i.e. an error or extreme condition) or of an external forced intervention, such as a manual stop or a remote external command, that causes a rapid dynamic change in the machine's state.
Table 1. Characteristics of the data from the CMS.
Frequency of acquisition   Number of points   Acquisition time (s)
1 kHz                      3000               3
100 Hz                     2000               20
20 Hz                      1000               50
10 Hz                      1000               100
2 Hz                       400                200
1 Hz                       400                400
In this case, the data are stored on different time scales to allow efficient monitoring of both the fast and slow components; the shorter the time scale (i.e. the higher the acquisition frequency), the shorter the total observation time, as detailed in Table 1.
The reference time for acquisition is the time of the event (t = 0); for each time scale, the system collects 70% of the data before and 30% of the data after the reference time. Since the 1 kHz and 100 Hz windows last only 3 s and 20 s, only the acquisitions from 1 Hz up to 20 Hz can give a good representation of the machine's transient (see Fig. 3).
It should also be noted that for the highest resolution time scales (1 kHz and 100 Hz) the system collects data for only a few parameters (mainly the gearbox speed). As a consequence, in the present work it was not possible to work directly in the frequency domain to analyse the fault.
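The sample counts in Table 1 follow from multiplying the acquisition frequency by the acquisition time, and the 70/30 split around the trigger fixes how much of each window precedes t = 0. The short Python sketch below reproduces those numbers; the dictionary `TABLE_1` and the helper `acquisition_window` are hypothetical illustrations, not part of the CMS software.

```python
# Table 1: acquisition frequency (Hz) -> acquisition time (s)
TABLE_1 = {1000: 3, 100: 20, 20: 50, 10: 100, 2: 200, 1: 400}
PRE_TRIGGER_FRACTION = 0.7  # 70% of the samples are recorded before t = 0


def acquisition_window(freq_hz, duration_s, pre_fraction=PRE_TRIGGER_FRACTION):
    """Return (points, points before t=0, points after t=0, t_start, t_end)."""
    n_points = round(freq_hz * duration_s)
    n_before = round(pre_fraction * n_points)
    n_after = n_points - n_before
    t_start = -pre_fraction * duration_s        # seconds before the trigger
    t_end = (1.0 - pre_fraction) * duration_s   # seconds after the trigger
    return n_points, n_before, n_after, t_start, t_end


for freq, duration in TABLE_1.items():
    n, n_b, n_a, t0, t1 = acquisition_window(freq, duration)
    print(f"{freq:>5} Hz: {n:>4} points ({n_b} before / {n_a} after t = 0), "
          f"window [{t0:.1f} s, {t1:.1f} s]")
```

For the 1 kHz row this gives a window from about -2.1 s to 0.9 s, while the 20 Hz row spans -35 s to +15 s, which is consistent with only the 1 to 20 Hz acquisitions covering a whole stop transient.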
Fig. 3. Time histories of the rotor speed in the time scales from 1 to 100 Hz for a “Manual Stop” event.
Among the different triggered events in the present work, only the "manual stop" events have been considered, as they produce similar transient time histories, making it easier to classify the machine health states.
The long-term trends of the gearbox component temperatures are summarised in Figure 4. The seasonal trend is clearly visible in the plots, but no abnormal behaviour can be inferred for the damaged turbine ($T_{tar}$) compared to the healthy reference one ($T_{ref}$). The picture presented by the roughly averaged SCADA temperature time histories of Figure 4 is indeed very noisy due to the effects of highly variable environmental and operating conditions. The alternating oscillations between $T_{tar}$ and $T_{ref}$ may also be caused by different maintenance levels of the cooling fans of each unit. In any case, there is no clear and stable difference between the two machines.
Figure 5 shows the trend on a shorter time scale (the raw 10-minute averages from the SCADA) for the temperature of gearbox bearing 1 (the damaged component) over the last year of operation. The scatter plot is opaque to show the wide scatter of the raw SCADA data, which vary over a wide range depending on operating and environmental conditions. Standard SCADA analysis alone does not provide a clear view of the fault, so new approaches were used to reveal the progression of the damage.
3 Methods
In this work, the increase in vibration level and the rise in temperature are used on different time scales to detect the mechanical fault. Specifically, three different approaches have been developed to investigate the evolution of the damage: the first two are based on SCADA data, while the last one is based on high-frequency triggered events.
3.1 Differential SCADA temperatures analysis
A differential analysis of the SCADA bearing temperatures was used to reveal possible heating caused by the evolution of the fault. In terms of SCADA analysis, the following main concepts should be highlighted:
– Data must be pre-filtered to obtain a long-term database with all turbines operating simultaneously.
– A large and long data set is required.
– The analysis cannot be applied to a single turbine, since a healthy reference is required.
The aim of the differential analysis was to investigate whether the turbine under consideration showed an increase in bearing temperature. The parameter studied over different time horizons is derived for each time step using equation (1), where $T_{tar,k}$ is the bearing temperature of the turbine under study at the kth time step and $T_{ref,k,i}$ is the temperature of the ith reference unit operating in a healthy state at the same time step:
$$\Delta T_k = T_{tar,k} - \frac{1}{N}\sum_{i=1}^{N} T_{ref,k,i} \qquad (1)$$
In equation (1), N is the total number of turbines considered for healthy behaviour.
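Equation (1) is a per-time-step subtraction, after which ΔT is aggregated over different horizons and, in Section 4.1, compared against the μ ± xσ alarm rule of equation (3). A minimal NumPy sketch of that chain follows, assuming the target and reference temperatures are already time-aligned arrays and that μ and σ are estimated from a reference (healthy) window of monthly means; the helper names, the reference window and the synthetic data are hypothetical.

```python
import numpy as np


def differential_temperature(t_tar, t_ref):
    """Equation (1): dT_k = T_tar,k - mean over i of T_ref,k,i.

    t_tar: shape (K,), bearing temperature of the target turbine.
    t_ref: shape (K, N), temperatures of N healthy reference turbines
           at the same K time steps.
    """
    return np.asarray(t_tar, dtype=float) - np.asarray(t_ref, dtype=float).mean(axis=1)


def monthly_means(times, values):
    """Average a time series over calendar months (hypothetical helper)."""
    months = np.asarray(times).astype("datetime64[M]")
    keys, idx = np.unique(months, return_inverse=True)
    return keys, np.bincount(idx, weights=values) / np.bincount(idx)


def alarm_flags(values, mu, sigma, x=4.0):
    """Equation (3): flag values outside the band mu +/- x * sigma."""
    return np.abs(np.asarray(values) - mu) > x * sigma


# Synthetic example: 15 months of hourly data, 2 reference turbines.
times = np.arange("2020-01-01T00", "2021-04-01T00", dtype="datetime64[h]")
months = times.astype("datetime64[M]")
month_no = (months - months[0]).astype(int)
rng = np.random.default_rng(0)
t_ref = 40.0 + rng.normal(0.0, 1.0, size=(times.size, 2))
# Target: alternating +/-0.1 K bias for 12 healthy months, then a 2 K offset.
bias = np.where(month_no < 12, 0.1 * (-1.0) ** month_no, 2.0)
t_tar = t_ref.mean(axis=1) + bias

d_t = differential_temperature(t_tar, t_ref)
month_keys, d_t_monthly = monthly_means(times, d_t)
mu, sigma = d_t_monthly[:12].mean(), d_t_monthly[:12].std()  # reference window
flags = alarm_flags(d_t_monthly, mu, sigma, x=4.0)
print(month_keys[flags])  # months that exceed the alarm band
```

Averaging before thresholding is what makes x = 4 usable here: the same band applied to the raw hourly ΔT would be crossed far more often by noise, mirroring the early false alarms reported for the hourly data.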
Fig. 4. Summary of the long-term trend for the gearbox temperature measurements within the SCADA database.
3.2 Regressive temperatures analysis
The use of data-driven models to detect temperature anomalies in wind turbine drive-train components can give useful results, especially when applied to SCADA data [25,26].
In the present work, a Random Forest Regression [27] data-driven model of all component temperatures was used to train a "Normal Behaviour Model" (NBM) to be used as a reference for each turbine. In this scenario, the reference is established by the predictions of a normal behaviour model trained on a dataset representing the healthy state of the same turbine (for all the turbines considered, the first year of operation was used). For this type of analysis, the input parameters listed in Table 2 were used. The different bearing temperatures were considered as outputs to calculate model residuals for each component (the difference between the actual temperature and the predicted temperature).
Table 2. Input parameters for the random forest regression model.
Symbol   Parameter             Units
V        Wind speed            m/s
P        Active power          kW
β        Blade angle           °
P_oil    Gear oil pressure     bar
ω        Rotor speed           rpm
T_amb    Ambient temperature   °C
3.3 Spot events classification
A feature classification approach using the One-Class Support Vector Machine [28,29] has been developed for the high resolution triggered events, based on the vibration of the tower and the drive train, to detect the faulty events. In this case, the selected feature classification parameters are:
– Drive train vibration.
– Rotor speed.
– Tower vibration components.
Torque is not used because it is derived from electrical parameters and is not a "pure" mechanical signal.
The SVM-based one-class classification method relies on identifying the smallest hypersphere (with radius r and centre c) that incorporates all the data points ($x_i$, ∀ i = 1, 2, …, n). This approach is referred to as Support Vector Data Description and can be expressed by equation (2) in the following constrained optimisation form:
$$\min_{r,c} \; r^2 \quad \text{subject to} \quad \lVert \Phi(x_i) - c \rVert^2 \le r^2 \quad \forall\, i = 1, 2, \ldots, n \qquad (2)$$
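With Φ taken as the identity map (an assumption made here purely for illustration; the method itself maps the data into a higher-dimensional space), equation (2) reduces to finding the minimum enclosing ball of the data. The sketch below approximates that ball with the Bădoiu–Clarkson iteration, a simple geometric algorithm substituted here for the quadratic-programming solver normally used for Support Vector Data Description; it also shows how a single outlier inflates the hard-constraint solution. The function name `min_enclosing_ball` is hypothetical.

```python
import numpy as np


def min_enclosing_ball(points, n_iter=2000):
    """Approximate the hard-constraint problem of eq. (2) with Phi = identity.

    Badoiu-Clarkson iteration: at step i the centre moves a fraction
    1/(i + 1) of the way towards the currently farthest point. The radius
    returned after k steps is at most about (1 + 1/sqrt(k)) times optimal.
    """
    pts = np.asarray(points, dtype=float)
    c = pts[0].copy()  # start from an arbitrary data point
    for i in range(1, n_iter + 1):
        farthest = pts[np.argmax(np.linalg.norm(pts - c, axis=1))]
        c += (farthest - c) / (i + 1)
    r = np.linalg.norm(pts - c, axis=1).max()  # every point satisfies ||x_i - c|| <= r
    return c, r


angles = np.arange(8) * np.pi / 4
healthy = np.column_stack([np.cos(angles), np.sin(angles)])  # 8 points on a unit circle
_, r_clean = min_enclosing_ball(healthy)
_, r_outlier = min_enclosing_ball(np.vstack([healthy, [[10.0, 0.0]]]))
print(f"radius without outlier: {r_clean:.3f}, with one outlier: {r_outlier:.3f}")
```

Adding the single point at (10, 0) raises the optimal radius from 1 to 5.5, because every constraint in equation (2) must hold; this is the restrictiveness that motivates the soft-margin formulations.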
where $\Phi(x_i)$ is a function that maps $x_i$ into a vector of higher dimensionality.
However, the above formulation is highly restrictive and can be strongly affected by outliers; more flexible formulations are therefore used in practical applications, and further details on the basis of the approach can be found in [30].
The reference classification model was trained on healthy events, and the features were computed by scaling the power spectral density of the parameters over different time scales from 1 to 20 Hz. The final monitored parameter was the percentage of outliers detected by the algorithm: an event was classified as faulty if a large number of the time steps within it were detected as outliers.
4 Results
In this section, the results of the test case are summarised and the main advantages and disadvantages of the different approaches are discussed. Finally, the evolution of the failure over the long-term horizon is analysed.
4.1 Multi-scale approach
The differential analysis of the SCADA data was performed using different time resolutions and horizons:
– A short-term analysis using 10-minute averages of SCADA parameters over a period of 18 months.
– A long-term analysis using hourly averages over a period of more than 6 years.
The manifestation of the fault only became clear in the long-term analysis. As shown in Figure 6, a trend in the differential temperature of gear bearing 1 was detected for the analysed turbine. The most pronounced evidence of damage was revealed when examining the long-term trend of the monthly averages of ΔT for bearing 1. The alarm level was defined as follows:
$$\mathrm{Thr} = \mu \pm x\,\sigma \qquad (3)$$
where μ is the mean of the reference parameter, x is a constant adapted to the time resolution and σ is the standard deviation. Applying equation (3) to the monthly averages in the long-term analysis with x = 4 led to an alarm at the beginning of May 2021, as shown in Figure 6. The hourly data are much noisier, and the same threshold leads to possible early false alarms.
Applying the same processing to the residuals of the Random Forest normal behaviour model produced similar results, triggering an alert in June 2021 (see Fig. 7). However, the same analysis on the short term failed to detect the fault, owing to the weak, slowly evolving nature of the damage on a high-speed shaft bearing.
The results of the one-class SVM classification showed that this method could only raise a general alarm, without formulating a specific prognosis or locating the damage. Optimal results were obtained using 20 Hz data while excluding the rotor speed from the analysis (Fig. 8). When rotor speed is also included, the differences between the faulty target and the healthy reference increase at 20 Hz, but at the other time scales the fault is no longer clearly discernible (Fig. 9).
Getting an idea of the location of the fault is indeed much more difficult with this approach.
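The outlier-percentage criterion can be sketched with scikit-learn's `OneClassSVM`, which implements the ν-formulation of [30] rather than the hypersphere form of equation (2); with an RBF kernel the two are closely related. The sketch assumes each event has already been turned into a matrix of feature vectors, one row per time step. The PSD-based feature extraction is not reproduced, and the value of `nu`, the 50% decision threshold and the synthetic data are all hypothetical choices.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_reference(healthy_events, nu=0.05):
    """Train the one-class model on the time steps of healthy events only."""
    X = np.vstack(healthy_events)  # rows = time steps, columns = features
    scaler = StandardScaler().fit(X)
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(scaler.transform(X))
    return scaler, model


def outlier_percentage(event, scaler, model):
    """Percentage of time steps in one event predicted as outliers (-1)."""
    labels = model.predict(scaler.transform(event))
    return 100.0 * np.mean(labels == -1)


rng = np.random.default_rng(0)
n_steps, n_features = 200, 3  # hypothetical event length and feature count
reference = [rng.normal(0.0, 1.0, (n_steps, n_features)) for _ in range(5)]
scaler, model = fit_reference(reference)

new_healthy = rng.normal(0.0, 1.0, (n_steps, n_features))
new_faulty = rng.normal(3.0, 1.0, (n_steps, n_features))  # shifted feature levels
THRESHOLD = 50.0  # hypothetical: flag the event if > 50% of steps are outliers
for name, event in [("healthy", new_healthy), ("faulty", new_faulty)]:
    pct = outlier_percentage(event, scaler, model)
    print(f"{name}: {pct:.1f}% outliers -> {'FAULTY' if pct > THRESHOLD else 'ok'}")
```

Because the model only answers "inside or outside the healthy region", the output is a single alarm level per event, which matches the observation that this approach flags the machine without locating the damage.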
4.2 Damage progression analysis
A final analysis of the damage progression was developed using the long-term time histories of the residuals of the Random Forest Regression NBM and the particle count signal, which detects metal debris in the gearbox oil. The particle counting system is very useful for detecting the damage in its final stage and following its final progression; this measurement provided the "ground truth" reference for the test case.
The trend of the residuals as well as the trend of the particle counts were organised on a daily basis as cumulative sums over a time horizon of 33 months before the final component failure (TTF = Time To Failure). The progression was described by the normalised cumulative integral according to Eq. (4):
$$P(t) = \frac{\int_0^{t} R(\tau)\,d\tau}{\int_0^{TTF} R(\tau)\,d\tau} \qquad (4)$$
where
– P(t) is the normalised progression at time t;
– TTF is the Time To Failure;
– R is the instantaneous progression, represented by the temperature residual.
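The thermal-progression curve combines the two SCADA-side ingredients: the residual of a random forest NBM trained on a healthy period (Section 3.2) and the normalised cumulative sum of Eq. (4), which on daily data replaces the integral by a running sum. The sketch below uses scikit-learn's `RandomForestRegressor` on synthetic daily data with three of the Table 2 inputs; the signal model, the injected drift and the helper names are hypothetical and only illustrate the computation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def nbm_residuals(X_healthy, y_healthy, X, y, seed=0):
    """Train the NBM on healthy data and return actual - predicted temperature."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_healthy, y_healthy)
    return np.asarray(y, dtype=float) - model.predict(X)


def normalised_progression(residual):
    """Discrete Eq. (4): running sum of R divided by its total at TTF."""
    cumulative = np.cumsum(residual)
    return cumulative / cumulative[-1]


# Synthetic daily data (~33 months); inputs: wind speed, active power, ambient temp.
rng = np.random.default_rng(1)
n_days, healthy_days, fault_start = 1000, 365, 700
X = np.column_stack([
    rng.uniform(3, 15, n_days),      # V [m/s]
    rng.uniform(0, 3000, n_days),    # P [kW]
    rng.uniform(-5, 30, n_days),     # T_amb [degC]
])
temperature = 30 + 0.005 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.5, n_days)
drift = np.clip(np.arange(n_days) - fault_start, 0, None) * 0.02  # slow heating
temperature += drift

# First year = healthy training period, as in Section 3.2.
residual = nbm_residuals(X[:healthy_days], temperature[:healthy_days], X, temperature)
progression = normalised_progression(residual)
```

In this synthetic case `progression` should stay close to zero until the injected drift begins and then rise towards 1 at the last day; a daily particle-count series could be passed through `normalised_progression` in the same way to compare the two curves, as in Fig. 10.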
Fig. 10. Fault progress analysis using the residual of the temperature estimated using the NBM (Thermal progression) and the oil
particle counting system (particle release).
[26] M. Jankauskas, A. Serackis, M. Šapurov, R. Pomarnacki, A. Baskys, V.K. Hyunh, T. Vaimann, J. Zakis, Exploring the limits of early predictive maintenance in wind turbines applying an anomaly detection technique, Sensors 23, 5695 (2023)
[27] P.F. Smith, S. Ganesh, P. Liu, A comparison of random forest regression and multiple linear regression for prediction in neuroscience, J. Neurosci. Methods 220, 85–91 (2013)
[28] H.J. Shin, D.H. Eom, S.S. Kim, One-class support vector machines—an application in machine fault detection and classification, Comput. Ind. Eng. 48, 395–408 (2008)
[29] J. Saari, D. Strömbergsson, J. Lundberg, A. Thomson, Detection and identification of windmill bearing faults using a one-class support vector machine (SVM), Measurement 137, 287–301 (2019)
[30] B. Schölkopf et al., Estimating the support of a high-dimensional distribution, Neural Comput. 13, 1443–1471 (2001)
[31] G. Dorcas Wambui, G.A. Waititu, A. Wanjoya, The power of the pruned exact linear time (PELT) test in multiple changepoint detection, Am. J. Theor. Appl. Stat. 4, 581 (2015)
[32] R. Killick, P. Fearnhead, I.A. Eckley, Optimal detection of changepoints with a linear computational cost, J. Am. Stat. Assoc. 107, 1590–1598 (2012)
Cite this article as: F. Castellani, M. Vedovelli, A. Canali, F. Belcastro, Wind turbine gearbox multi-scale condition monitoring
through operational data, Mechanics & Industry 25, 28 (2024)