Abstract
Machine learning algorithms enable failure prediction for
large-scale, distributed systems using historical time-series
datasets. Although unsupervised learning algorithms can detect an evolving variety of anomalies,
they do not provide links between detected data events and
system failures. Additional system knowledge is required
for machine learning algorithms to determine the nature of
detected anomalies, which may represent either healthy system behavior or failure precursors. However, knowledge on
failure behavior is expensive to obtain and might only be
available after anomalous system states have been pre-selected using unsupervised algorithms. Moreover, system knowledge
obtained from evaluation of system states needs to be appropriately provided to the algorithms to enable performance
improvements. In this paper, we present an approach
to efficiently configure the integration of system knowledge
into unsupervised anomaly detection algorithms for failure
prediction. The methodology is based on simulations of failure modes of electronic circuits. Triggering system failures
based on synthetically generated failure behaviors enables
analysis of the detectability of failures and generation of
different types of datasets containing system knowledge. In
this way, the requirements for the type and extent of system
knowledge from different sources can be determined, and
suitable algorithms allowing the integration of additional
data can be identified.