Enabling Low-Cost Automatic Water Leakage Detection A Semi-Supervised autoML-based Approach


Urban Water Journal

Journal homepage: www.tandfonline.com/journals/nurw20

Enabling low-cost automatic water leakage detection: a semi-supervised, autoML-based approach

Willian Muniz Do Nascimento & Luiz Gomes-Jr

To cite this article: Willian Muniz Do Nascimento & Luiz Gomes-Jr (2023) Enabling low-cost
automatic water leakage detection: a semi-supervised, autoML-based approach, Urban Water
Journal, 20:10, 1471-1481, DOI: 10.1080/1573062X.2022.2056710

To link to this article: https://doi.org/10.1080/1573062X.2022.2056710

Published online: 01 Apr 2022.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=nurw20
URBAN WATER JOURNAL
2023, VOL. 20, NO. 10, 1471–1481
https://doi.org/10.1080/1573062X.2022.2056710

RESEARCH ARTICLE

Enabling low-cost automatic water leakage detection: a semi-supervised, autoML-based approach
Willian Muniz Do Nascimento and Luiz Gomes-Jr
DAINF, Universidade Tecnológica Federal do Paraná, Paraná, Brasil

ABSTRACT
An important aspect of proper management of water resources is the reduction of losses in urban water distribution. Water loss is especially challenging in developing countries such as Brazil. The real-time monitoring of the distribution system followed by the application of outlier detection techniques on water flow data has been an effective strategy to reduce loss. However, these solutions require high investments in specialized personnel for building the models and in data collection for machine learning. This work presents a semi-supervised application of outlier detection techniques and Automated Machine Learning (AutoML) resources on water flow data from District Metering Areas (DMAs). The system requires neither experts for model configuration nor curated data for training. It aims at reducing the implementation and deployment costs related to (i) hiring machine learning experts for model configuration and (ii) curating data for model training, enabling a low-investment deployment suitable for low-income regions.

ARTICLE HISTORY
Compiled 3 February 2022

KEYWORDS
Leakage detection; model selection and optimization; AutoML; self-organizing maps; Local Outlier Factor

CONTACT Willian Muniz Do Nascimento [email protected]
© 2022 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

When a distribution company supplies water to a municipality, part of the water is not billed, i.e. the amount of water collected is not the same as the amount charged to the customers. Part of this water is lost in the distribution system. Water loss has multiple causes, including leaks, measurement errors and unauthorized consumption. Minimizing water leakage translates into efficiency gains, since less water has to be treated and pumped. This is becoming increasingly important given the growing value of water for humanity and the increasing environmental challenges, especially for low-income regions.

Water loss in the distribution system is currently a major problem for supply companies worldwide. According to the Trata Brasil Institute (TRATA BRASIL 2017), Brazil loses 38% of its water in the distribution system, a financial loss of over US$ 2 billion. In this paper we address the water distribution system of Curitiba, Brazil, which comprises approximately 130 pressure zones and has a distribution loss index of 26.16%. Curitiba is one of the largest cities in the country and has extended treated water and sewage services to 99% of its population.

To identify leaks, Curitiba relies on experts to analyze water flow data, a process that would benefit from automation. However, automatically identifying leaks faces several challenges, such as weather-dependent variation, wrong sensor readings, different consumption on holidays, and the large amount of data that is often not mapped. In practice, these challenges increase the complexity of choosing and tuning the most appropriate machine learning model. This complexity requires professionals to design and adjust the models, which is an expensive and time-consuming endeavor. This is therefore a scenario that could take advantage of recent advances in automated machine learning (AutoML) to reduce the costs and complexity associated with developing the models.

Another challenge in this context is obtaining reliable training data. Since actual leaks are often not recorded, or are recorded only indirectly, owing to the high cost of curating this information, we focus on data that is only partially correlated with anomalous events. More specifically, we use data from the services database (maintenance logs) as a starting point for developing the system. The strategy is, therefore, a form of semi-supervised training.

The objective of this work is to present a solution that automatically detects leaks or anomalous events in district metering areas without requiring specialists to analyze the values manually. The solution adapts to each district metering area using historical data. The system is designed to reduce costs and complexity in development and deployment, making it a suitable strategy for low-income regions with restricted budgets.

The solution considers the following requirements: (i) semi-supervised training, using records partially correlated with real anomalies (in the case treated in this paper, data from maintenance logs); (ii) use of AutoML techniques for model selection and optimization; (iii) use of algorithms designed for outlier detection (different from the more generic classification algorithms typically available in AutoML solutions).

The main contribution of this research is the adaptation of the problem and related algorithms for the use of AutoML techniques. Our tests demonstrate that our solution has

benefits in comparison with both traditional implementations without optimization and a non-adapted AutoML solution.

The rest of this work is organized as follows: Section 2 presents the fundamentals and related work, describing the water loss problem, outlier detection techniques and their application in distribution systems. Section 3 presents the origin and treatment of the data used in our test scenario. In Section 4 we describe three different implementations of a leak detection system, each considering a subset of our requirements: the application of the techniques without optimization (no requirements considered), the application of off-the-shelf AutoML (requirements i and ii), and the parameter optimization of the LOF and SOM techniques (all requirements).

2. Fundamentals and related work

2.1. Water loss

The uninvoiced water loss index is the difference between the total water collected and the total water charged to customers. Uninvoiced water can be the result of fraud, use of fire hydrants, measurement errors, or problems in the distribution system (TRATA BRASIL 2017).

Water loss in distribution systems falls into two categories: (i) unauthorized consumption and measurement errors, and (ii) physical loss. Physical loss can be due to leakage problems such as burst pipes or leaking pipe connections. Pipe bursts are usually easier and quicker to detect because the population helps by pointing out the leak location. However, leaks in sandy soil may not be obvious and can take days to repair. Moreover, leaks in connections or small cracks can take even longer to be detected.

In most cities, the distribution network is too large to manage as a whole. To reduce complexity, the distribution network is subdivided into District Metering Areas. A district metering area can be a specific supply area, a group of residences or companies, a neighborhood, a village, etc.

2.2. Water loss monitoring

In an attempt to reduce water loss, water companies are pursuing real-time network monitoring and automatic leakage detection in order to decrease the time between the detection of a leak and its repair. Supervisory Control And Data Acquisition (SCADA) systems monitor many or all of the District Metering Areas in real time. These systems generate a large amount of data, which are usually discarded after a certain time or simply forgotten, as they are only used for online monitoring.

In the database of SCADA systems it is possible to find several types of operational data, such as motor rotation, pressure in the pipes, water flow, open valve percentage, and others. In the analysis of DMAs, historical data is very important to distinguish different types of outliers and detect consumption patterns. For example, a DMA located in a more central part of the city generally has higher usage on weekdays, whereas more distant DMAs generally have higher consumption on weekends or holidays.

For the detection of problems in the distribution system based on water flow, a very important measure is the minimum night flow, which is generally the average or median water flow between midnight and 5 am. This measure is minimally affected by human activity, which makes it possible to identify whether there is an actual problem in the distribution system.

To identify leaks, experts observe the water flow values in the distribution system. Variations in these values can be due to several factors, which can make it difficult to identify a leak. Among the main factors are abnormal usage by customers (such as swimming pool filling on hot days, holidays, and others), operational maneuvers performed in the distribution network, power outages, defective sensors, loss of internet connectivity, and others. Such variety makes it difficult to distinguish real leaks from normal deviations in the patterns.

2.3. Outlier detection systems

Outlier detection systems seek to identify observations that deviate from the expected behavior (Chandola, Banerjee, and Kumar 2009). One of the simplest ways to identify outliers in a single-variable data set is a statistical method called the Standard Score or Z-Score. As described by Kreyszig (2009), the score $z$ of a value $x$ of a variable is calculated with the following formula:

$$z = \frac{x - \bar{x}}{s}$$

where $\bar{x}$ is the arithmetic mean and $s$ is the standard deviation of the variable. By applying a threshold to the Z-Score values, we can define a boundary to identify outliers. Usually, outlier detection tasks are more complex and require more advanced techniques.

Detection of outliers in time series is a vast field that has been studied in the context of large volumes of data (Gupta et al. 2014). Contextual outliers, which depend on and are oriented to the data type (such as time series data, spatial data, and graphs), are among the main types of outliers researched. Currently, outlier detection problems are mostly treated as machine learning tasks.

As described by Russell and Norvig (2016), there are four types of feedback that determine the main types of machine learning: unsupervised learning, reinforcement learning, supervised learning, and semi-supervised learning. In unsupervised learning, the agent learns patterns in the inputs even though no feedback is provided. The most common unsupervised learning task is clustering: detecting potentially useful groups of input examples. In reinforcement learning, the agent learns from punishments for wrong choices or rewards for correct choices. In supervised learning, the agent observes some example input-output pairs and learns a function that maps from input to output. Finally, semi-supervised learning lies between unsupervised and supervised learning. It is useful when mappings of inputs to outputs are scarce, or when the data are noisy or incorrect and lead to false mappings. In such cases, it is necessary to make the best possible prediction with the available input and output pairs.
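The Z-Score rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the mean and sample standard deviation are estimated on training readings and then applied to new readings, and a reading is flagged when its score exceeds 2, the threshold later used in Section 4.1. The one-sided test (only increases are flagged), the function names and the flow values are assumptions made for the example.

```python
from statistics import mean, stdev


def fit_z(train):
    """Estimate the arithmetic mean and the sample standard deviation s from training readings."""
    return mean(train), stdev(train)


def z_score(x, x_bar, s):
    """Standard score z = (x - x_bar) / s."""
    return (x - x_bar) / s


def flag_outliers(train, target, threshold=2.0):
    """Flag target readings whose z-score exceeds the threshold.

    One-sided (assumption): only readings above the mean can be flagged,
    since a leak shows up as an increase in flow.
    """
    x_bar, s = fit_z(train)
    return [z_score(x, x_bar, s) > threshold for x in target]


# Hypothetical minimum night flows (L/s) for one DMA.
train = [20.1, 20.4, 19.8, 20.2, 20.0, 20.3, 19.9, 20.1, 20.2, 20.0]
target = [20.2, 20.4, 23.0, 19.9]
print(flag_outliers(train, target))  # [False, False, True, False]
```

Using the sample standard deviation matches the $s$ in the formula; a two-sided variant would compare `abs(z)` with the threshold instead.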

In water leakage detection problems, reliable training data is usually not available. Therefore, unsupervised or semi-supervised techniques tend to be favored. Typical unsupervised techniques used in outlier detection tasks are SOM and LOF. Self-Organizing Maps (SOMs) (Kohonen and Honkela 2007) are a type of artificial neural network used for the analysis and visualization of multidimensional data. SOMs consist of a competitive learning grid that adapts to the data. Unlike other neural networks that learn by error correction, self-organizing maps compute the Euclidean distance between each input item and the weight vector of every map unit. Through competitive learning with adjustment of the weights, the map restructures itself according to the distribution of the data in the input set. Local Outlier Factor (LOF) (Chandola, Banerjee, and Kumar 2009) is a density-based technique also used in outlier detection tasks. For each dataset instance, the LOF value is given by the ratio of the average local density of its k-nearest neighbors to the local density of the instance. An instance whose local density is low compared with that of its neighbors is declared an anomaly, while an instance with a density similar to that of its neighbors is declared normal. The SOM and LOF algorithms were adapted in this work to be used in an AutoML setting (Section 4.3).

2.4. Outlier detection for water distribution

The use of technology for leakage detection in water distribution systems has been widely studied (Li et al. 2015; Puust et al. 2010; Wu and Liu 2017; Chan, Siong Chin, and Zhong 2018). The two main types of solutions to the problem are Leakage Assessment and Leakage Detection (also called Leak Control Model; Puust et al. 2010). The purpose of Leakage Assessment is to estimate the amount of water lost in the distribution system. In the Leakage Detection category, the objective is to detect and locate leaks, following one of two main models: active or passive. The passive model relies on vision and sensors, while the active model comprises transient-based, model-based and data-driven approaches.

Chan, Siong Chin, and Zhong (2018) compiled an overview of the different types of technologies currently available for leakage detection. The data-driven approach relies on collecting data through sensors, processing the signal, and performing statistical analysis for leakage detection. The advantage of this approach is that it does not require in-depth knowledge of the water distribution system, as the system learns from the history of collected data using statistical tools or pattern recognition. The main disadvantage of this approach is the amount of data required to build a predictive or classification model. This is the approach used in this work: we rely on historical data and maintenance logs to build the models.

As presented by Wu and Liu (2017), there are several data-driven techniques for identifying leaks. They can be classified into three categories: classification methods, prediction-classification methods, and statistical methods. A classification method can be constructed to distinguish a leak from normal data. Aksela, Aksela, and Vahala (2009) presented a method for detecting leaks in water distribution systems based on Self-Organizing Maps (SOM). The authors obtained data from three flow readers in the studied distribution network, knowing the physical location of eight leaks. From these data and hydraulic principles, they created a leak function to facilitate the SOM training. Also applying a SOM model, Aksela, Aksela, and Vahala (2009) used a hexagonal structure to avoid favoring horizontal or vertical directions in the map. The data from the three flow readers per day of the week were used as input vectors (21 dimensions) in the training, and leak function values were also added in the training process. The results showed that the trained model can detect leaks in a defined area of the distribution network. The method presented in the paper operates on its own after training; leakage information is only needed in the training phase.

Romano, Kapelan, and Savi (2010) presented an online methodology for the automatic detection of bursts and leaks by analyzing data collected from sensors. The case study used a District Metering Area (DMA) in the UK, and the methodology was validated by simulating leakage events. This methodology used several artificial intelligence techniques: wavelets to remove noise from the flow and pressure data, neural networks for short-term prediction of the pressure and flow values, and statistical process control to analyze the divergences between the predicted and observed data. Finally, the authors classified the divergences and alarms through an inference system based on Bayesian networks. The results were satisfactory, with no false alarms.

To detect anomalous events in water flow from four DMAs, Loureiro et al. (2016) created a practical approach using historical data from the SCADA system and a history of work orders. The authors proposed four new statistical methods for anomalous event detection based on outlier regions, defined by a lower and an upper limit of the flow values: if the input value is outside this region, it is considered an outlier. This region was previously determined from a training set in which the data were marked as outlier or not. To perform this flagging, the authors used work orders. The work orders may contain incomplete information, or an order may simply not have been recorded, so they did not specifically represent leaks (similar to the maintenance data used in this paper). The authors obtained satisfactory results in the mostly residential DMAs for detecting outlier events such as pipe bursts and others.

The main similarities between our work and those found in the literature are the use of the DMA context, water flow as input data, and contextual elements such as days of the month, days of the week, and minimum night flow. Our work also differs from these studies in some respects, such as the larger number of contextual elements (features) used in the models and the larger number of DMAs than in most other works (in addition to including DMAs with both pumped and gravity-based water distribution). The main differences, however, are the use of AutoML to select the model and optimize the hyperparameters, and the semi-supervised application of SOM and LOF with hyperparameter optimization for each DMA. In Section 4 we describe the characteristics of the implementation of this work in more detail.

2.5. Automated machine learning

One of the problems in applying outlier detection algorithms to water distribution systems is the complexity of finding the best model, features and parameters.

Currently, a solution to this type of problem is the use of Automated Machine Learning (AutoML). AutoML involves several aspects: feature engineering, combined model selection and hyperparameter optimization, and meta-learning. The concepts involved are described below.

Feature engineering aims to present the best features for a given model. To accomplish this, it requires preprocessing, representation learning, and then selection of key features (Tuggener et al. 2019). Options for implementing feature engineering include a regression-based feature learning algorithm called AutoLearn (Kaul, Maheshwary, and Pudi 2017) and a framework for automated feature generation called ExploreKit (Katz, Chul Richard Shin, and Song 2016). Both options have a similar goal: to ease the workload of feature engineering by automating the process. Both solutions can select the best features and also propose candidate features, either by modifying existing ones or by generating new ones.

Hutter, Kotthoff, and Vanschoren (2019) note that hyperparameters are found in most machine learning algorithms, especially in the newer deep learning approaches. These hyperparameters are set before execution and do not change during training. Thus, automating hyperparameter optimization becomes an essential task to reduce the time required to configure the models.

As presented by Yu and Zhu (2020), hyperparameter optimization is usually automated with search algorithms such as Grid Search, Random Search, Bayesian Optimization and its variants, and Tree Parzen Estimators. These search algorithms are implemented and available in AutoML toolkits.

In principle, every machine learning task applied to a dataset needs to define which algorithm it will use, how to set its hyperparameters, and how to process its features (Feurer et al. 2015). The combination of automatic algorithm and hyperparameter selection has been called the CASH problem (Combined Algorithm Selection and Hyperparameter optimization) (Thornton et al. 2013). This problem consists of automatically selecting the learning algorithm and, simultaneously, its hyperparameters in order to optimize performance on a given data set. One way to solve the CASH problem is to treat machine learning components as blocks assembled into a pipeline. A complete machine learning pipeline consists of the following blocks: data cleaning, feature engineering, model selection, hyperparameter optimization, and, at the end, an ensemble of the top models trained to perform well on the test data (Tuggener et al. 2019).

To automate the Python machine learning library scikit-learn, Feurer et al. (2015) presented their Auto-sklearn system. This system performs automated machine learning based on the CASH problem definition of Auto-WEKA. The authors proposed a robust and efficient tool for solving this problem. The machine learning pipeline of Auto-sklearn consists of three blocks: Meta-learning, Bayesian Optimizer and Ensemble Builder. First, meta-learning looks for learning frameworks that performed well on similar datasets to establish a good starting point for the Bayesian Optimizer; second, an ensemble is built automatically from the models evaluated by the Bayesian Optimizer; and finally, the highly parameterized machine learning framework is composed of high-performance classifiers and preprocessors from the scikit-learn library.

This research applied AutoML technologies to the problem of outlier detection in water distribution systems. In our work, the focus is on model selection and hyperparameter optimization. One of our contributions is the adaptation of the unsupervised SOM and LOF algorithms to a semi-supervised scheme with automatic hyperparameter optimization.

3. Data processing and methodology

In this section we present details about the origin of the data in the temporal database, the treatment performed for the relational database, and the strategy for obtaining the data indicating anomalous events.

3.1. Data and methods

Based on the water distribution system of Curitiba, 16 DMAs were defined in conjunction with professionals from the Control and Operation Center (CCO) to carry out this research. Data were obtained from a temporal database called Proficy Historian.¹ The date range used to apply the techniques was from September to December 2018.

Each type of equipment (pump, sensor, valve, etc.) produces a set of different numerical variables that are stored in the Historian database. These variables may refer to values of rotation speed, voltage, water flow, running time, pressure, among others. These numerical values are stored in a temporal database as ‘tags’. Each numerical value is recorded in a tag with its corresponding timestamp.

The Historian temporal database is generated by the real-time monitoring and control SCADA system. The data were queried to retrieve interpolated readings at 5-minute intervals. The query also retrieves metadata related to the quality of the readings. The temporal database data were then mapped and imported into a relational database using the model in Figure 1. After that, a data cleaning process is applied, in which readings with bad quality and values above the limit are replaced with the average of valid neighboring readings. There is no global normalization of the data, since the models are built for each DMA.

In Figure 1, a DMA (pressure_zone) is a specific region of the distribution network, which can be a neighborhood, a village, or just a set of network connections. An Equipment (equipment) may be a sensor that measures flow, volume or pressure, or even an engine. A Unity (unity) is a broader division (a municipality, a metropolitan area, or a collection of municipalities). A Tag (tag) is a single measurement of a piece of equipment, such as liters per second, cubic meters, parts per meter, etc. Tag Values (tag_value) are numerical values at given dates, with good or bad quality. The quality column (good or bad) and the limit column from the Tag table were used to clean the data. An Event (event) is a maintenance record in the water distribution network.

DMAs are related to tags in such a way that a DMA may be composed of several tags: some for water inflow, some for outflow. An inflow tag of one DMA may be an outflow tag of another DMA, i.e. a tag may be linked to several DMAs.
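The cleaning rule above and the daily features used later in Section 4.1 (daily average flow, minimum night flow, day of the week and day of the month) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline, and it rests on several assumptions: readings arrive as hypothetical `(timestamp, value, good)` tuples at 5-minute intervals, "neighbor readings" is read as the nearest valid reading before and after, and the minimum night flow is taken as the mean flow from 00:00 up to, but excluding, 05:00. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def clean(readings, limit):
    """Replace bad-quality or over-limit readings with the mean of the nearest valid neighbours.

    readings: list of (timestamp, value, good) tuples sorted by timestamp.
    Readings with no valid neighbour on either side are dropped.
    """
    valid = [good and value <= limit for _, value, good in readings]
    cleaned = []
    for i, (ts, value, _) in enumerate(readings):
        if valid[i]:
            cleaned.append((ts, value))
            continue
        # Nearest valid reading before and after position i, if any.
        prev = next((readings[j][1] for j in range(i - 1, -1, -1) if valid[j]), None)
        nxt = next((readings[j][1] for j in range(i + 1, len(readings)) if valid[j]), None)
        neighbours = [v for v in (prev, nxt) if v is not None]
        if neighbours:
            cleaned.append((ts, mean(neighbours)))
    return cleaned


def daily_features(cleaned, night_end_hour=5):
    """Per-day mean flow, minimum night flow (mean over 00:00-05:00) and calendar context."""
    by_day = defaultdict(list)
    for ts, value in cleaned:
        by_day[ts.date()].append((ts, value))
    features = {}
    for day, rows in sorted(by_day.items()):
        night = [v for ts, v in rows if ts.hour < night_end_hour]
        features[day] = {
            "daily_mean": mean(v for _, v in rows),
            "min_night_flow": mean(night) if night else None,
            "day_of_week": day.weekday(),  # 0 = Monday
            "day_of_month": day.day,
        }
    return features


# Hypothetical day of 5-minute readings: 20 L/s at night, 40 L/s otherwise.
start = datetime(2018, 12, 8)
readings = []
for k in range(288):
    ts = start + timedelta(minutes=5 * k)
    readings.append((ts, 20.0 if ts.hour < 5 else 40.0, True))
readings[10] = (readings[10][0], 999.0, True)   # spike above the tag limit
readings[100] = (readings[100][0], 0.0, False)  # reading flagged as bad quality

features = daily_features(clean(readings, limit=200.0))
print(features[start.date()])
```

Because the models are built per DMA without global normalization, these features would be computed separately for each DMA's flow series.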

Figure 1. Relational database model containing the tables that integrate the information used in this work.
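One possible concrete form of the model in Figure 1 is sketched below with Python's built-in `sqlite3` module. The table names follow the text (pressure_zone, equipment, unity, tag, tag_value, event), but every column name, the link table `pressure_zone_tag` and its `direction` column are assumptions made for illustration. The query shows how a single tag can count as inflow for one DMA and outflow for another; net flow (inflow minus outflow) is one plausible way to derive a DMA-level flow series, not necessarily the authors' mapping.

```python
import sqlite3

# Hypothetical schema: table names follow Figure 1; all columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE unity (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pressure_zone (id INTEGER PRIMARY KEY, name TEXT,
                            unity_id INTEGER REFERENCES unity(id));
CREATE TABLE equipment (id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, unit TEXT, value_limit REAL,
                  equipment_id INTEGER REFERENCES equipment(id));
CREATE TABLE tag_value (tag_id INTEGER REFERENCES tag(id), ts TEXT, value REAL,
                        quality TEXT CHECK (quality IN ('good', 'bad')),
                        PRIMARY KEY (tag_id, ts));
-- A tag may belong to several DMAs: inflow (+1) for one, outflow (-1) for another.
CREATE TABLE pressure_zone_tag (pressure_zone_id INTEGER REFERENCES pressure_zone(id),
                                tag_id INTEGER REFERENCES tag(id),
                                direction INTEGER CHECK (direction IN (1, -1)),
                                PRIMARY KEY (pressure_zone_id, tag_id));
CREATE TABLE event (id INTEGER PRIMARY KEY,
                    pressure_zone_id INTEGER REFERENCES pressure_zone(id),
                    kind TEXT, start_date TEXT, end_date TEXT, normalization_date TEXT);
"""

# Net flow per DMA and timestamp: inflow tags count positive, outflow tags negative.
NET_FLOW = """
SELECT pz.name, tv.ts, SUM(pzt.direction * tv.value) AS net_flow
FROM pressure_zone AS pz
JOIN pressure_zone_tag AS pzt ON pzt.pressure_zone_id = pz.id
JOIN tag_value AS tv ON tv.tag_id = pzt.tag_id
GROUP BY pz.name, tv.ts
ORDER BY pz.name, tv.ts
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy network: tag 1 feeds DMA A; tag 2 leaves A and feeds B.
conn.executemany("INSERT INTO pressure_zone (id, name) VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)", [(1, "in_A"), (2, "A_to_B")])
conn.executemany("INSERT INTO pressure_zone_tag VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, -1), (2, 2, 1)])
conn.executemany("INSERT INTO tag_value VALUES (?, ?, ?, ?)",
                 [(1, "2018-12-08 00:00", 50.0, "good"), (2, "2018-12-08 00:00", 30.0, "good")])
rows = conn.execute(NET_FLOW).fetchall()
print(rows)  # [('A', '2018-12-08 00:00', 20.0), ('B', '2018-12-08 00:00', 30.0)]
```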

DMAs require maintenance, which usually interrupts the water supply and/or affects the data. These maintenance interventions are called Events (event), and they are imported with the start date, end date and normalization date of the incident. An event may be a network repair, a broken network, an automation problem, a record of a power failure, among others. The data used as events were obtained from another system and mapped. These data do not necessarily represent a leak; however, they are used as proxies for anomalies (outliers) in this work. These events are used to compare the anomaly detection techniques in Section 4.

3.2. Methods

To demonstrate the effectiveness of AutoML applied to water leakage detection, we tested three different approaches for hyperparameter tuning: (i) traditional implementation without automatic optimization, (ii) optimization using an off-the-shelf AutoML tool, and (iii) automatic optimization over specific outlier detection algorithms. Approach (iii) is our main contribution, featuring algorithm adaptations that enable the use of traditional (unsupervised) outlier detection algorithms in a (supervised) AutoML context.

The algorithms used in implementations (i) and (iii) were: (a) a density-based technique called Local Outlier Factor (LOF), (b) a neural network-based technique called Self-Organizing Maps (SOM), and (c) a statistical approach called Standard Score (Z-Score). We added a fourth algorithm based on hard-coded rules that simulate the decision process currently used in the water supply company. This algorithm was named Specialist and served as a baseline to assess the performance of the other algorithms.

Implementation (ii) does not use the aforementioned algorithms, since it applies its own supervised algorithms in the optimization process.

The quality of the prediction for each implementation was assessed against the maintenance database using traditional machine learning metrics: Precision, Recall and F-Score. Precision is the ratio of correctly detected outliers to the total number of outliers detected. Recall is the ratio of correctly detected outliers to the total number of outliers in the dataset. The F-Score is the harmonic mean of Precision and Recall. Accuracy is not used in our discussion, since the classes are highly imbalanced.

4. Implementation

To demonstrate the applicability of AutoML techniques, we tested three approaches: (i) algorithm implementation without hyperparameter optimization; (ii) implementation of Automated Machine Learning (AutoML) using the Auto-Sklearn library; (iii) automatic hyperparameter optimization of the SOM and LOF techniques.
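Approach (iii) pairs an unsupervised detector with maintenance events used as proxy labels. The sketch below illustrates that general idea with LOF, following the definition in Section 2.3 (average local reachability density of the k nearest neighbors divided by the point's own), a contamination-based threshold as described in Section 4.1, and the F-Score from Section 3.2 as the selection criterion. It is not the authors' Section 4.3 procedure: the grid of k and contamination values, the function names and the synthetic data are assumptions, and a real setup would select hyperparameters on the training months and report scores on the target month. scikit-learn's `LocalOutlierFactor` (with `n_neighbors` and `contamination` parameters) is an off-the-shelf alternative to the hand-written `lof_scores`.

```python
import numpy as np


def lof_scores(X, k):
    """Local Outlier Factor of each row of X, using exactly k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                                   # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]                             # k nearest neighbours per point
    k_dist = d[np.arange(n), nn[:, -1]]                           # distance to the k-th neighbour
    reach = np.maximum(d[np.arange(n)[:, None], nn], k_dist[nn])  # reachability distances
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)                      # local reachability density
    return lrd[nn].mean(axis=1) / lrd                             # neighbours' density / own density


def flag_top(scores, contamination):
    """Flag the round(contamination * n) highest scores (at least one) as outliers."""
    n_out = max(1, int(round(contamination * len(scores))))
    flags = np.zeros(len(scores), dtype=bool)
    flags[np.argsort(scores)[-n_out:]] = True
    return flags


def f_score(pred, truth):
    """Harmonic mean of precision and recall against the event labels."""
    tp = int(np.sum(pred & truth))
    if tp == 0:
        return 0.0
    precision = tp / int(np.sum(pred))
    recall = tp / int(np.sum(truth))
    return 2 * precision * recall / (precision + recall)


def tune_lof(X, events, ks=(2, 5, 10), contaminations=(0.05, 0.10)):
    """Grid search over (k, contamination), keeping the pair with the highest F-Score."""
    best = {"f_score": -1.0}
    for k in ks:
        scores = lof_scores(X, k)
        for c in contaminations:
            f = f_score(flag_top(scores, c), events)
            if f > best["f_score"]:
                best = {"f_score": f, "k": k, "contamination": c}
    return best


# Synthetic DMA days: (daily mean flow, minimum night flow) plus three injected anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[35.0, 20.0], scale=1.0, size=(57, 2))
anomalies = np.array([[60.0, 20.0], [35.0, 45.0], [10.0, 20.0]])
X = np.vstack([normal, anomalies])
events = np.zeros(len(X), dtype=bool)
events[57:] = True  # proxy labels, e.g. days with a maintenance event
print(tune_lof(X, events))
```

Since maintenance events are only partially correlated with real anomalies, the chosen pair is simply the one that agrees best with those proxy labels; here the labels are synthetic and placed exactly on the injected anomalies.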

4.1. Baseline implementation without optimization

The first approach simulates a traditional implementation of an outlier detection algorithm with basic parameter tuning. In this setting, we employ a few different machine learning algorithms, which are compared with a rule-based algorithm that resembles the current decision process followed by experts. The tests performed in this first approach are used as a baseline to assess the other two approaches, which use AutoML.

This implementation is based on applying and evaluating several outlier detection algorithms on the data: SOM, LOF, Z-Score and the Specialist algorithm, described in the next subsections.

Outlier detection algorithms

The initial preparation for applying the techniques is a mapping of the data by District Metering Areas (DMAs) with their appropriate inputs and outputs. The data range spans from September to December 2018: the target month is December (for outliers to be detected), and the three previous months are used for training the algorithms. To this end, a database search is performed by selecting the values grouped by day of the month, computing the daily average flow and the minimum night flow by DMA. We also used the day of the week as a contextualization feature. At first, we applied the Z-Score, SOM and LOF algorithms for automatic outlier detection on the data of the 16 DMAs. The output of the algorithms is a boolean value (True or False) indicating whether the values read in the last 24 h represent an anomaly for a given DMA. In a production scenario, the analysis would run continuously, alerting the stakeholders on a daily basis.

The application of the Z-Score is composed of two steps: first, the score is calculated for the minimum night flow value using the day of the month and day of the week variables, and values with scores above 2 are considered outliers; the same is then applied to the average daily flow value. In other words, this technique used only two variables, as it did not take into account the day of the week or day of the month.

For the SOM and LOF techniques, the variables daily mean, minimum night flow, day of the week and day of the month are used. It is important to use both day of the month and day of the week to capture both types of cyclical variation for each DMA.

To serve as a baseline for the current leak detection procedures, we implemented an algorithm that simulates the criteria currently used by professionals. This algorithm is presented below.

Since leaks are associated with an increase in flow, the formula for the ratio function $r$ is defined so that it returns the ratio between the readings when the value of $x$ (current reading) is greater than or equal to the value of $y$ (reference reading). In the comparison between the two water flow variables, $r$ takes a value greater than zero only when there is an increase in flow, because a reduction in flow is desirable and does not constitute a leak.

Next, the score of the Specialist is defined by the function $s$, where $\alpha$ is the average of the flow values, $\beta$ is the minimum night flow, $i$ is the index of the day under analysis, $i-1$ is the index of the previous day and $i-30$ is the index of the corresponding day in the previous month. The vectors $\alpha$ and $\beta$ represent the time series of the readings.

The specialist algorithm (Specialist) is used in the same settings as the LOF, Z-Score and SOM techniques, with the same approach for quality assessment. Like the other algorithms, the specialist formula returns an outlier factor capturing the degree to which the reading is anomalous. Based on the contamination parameter, a threshold is defined so that all readings above that threshold are considered outliers.

Figure 2 shows an example of the use of the specialist algorithm (the shaded areas represent the confidence intervals of the previous months' averages, which are taken into account). In this case, the algorithm identified days 8 and 13 as outliers. For day 8 we have $\alpha_i = 39.89$, $\beta_i = 20.41$, $\alpha_{i-1} = 37.39$, $\alpha_{i-30} = 33.45$, $\beta_{i-1} = 20.35$ and $\beta_{i-30} = 16.95$, which give a score $s = 0.46$. For day 13 we have $\alpha_i = 44.47$, $\beta_i = 22.96$, $\alpha_{i-1} = 38.49$, $\alpha_{i-30} = 41.10$, $\beta_{i-1} = 20.95$ and $\beta_{i-30} = 19.93$, which give a score $s = 0.48$. In this case, the threshold was 0.45, and the scores above it were considered outliers.

Analysis of results

In this subsection we present quality results in terms of Precision, Recall and F-Score. In order to standardize the execution of the techniques, we define two hyperparameters: size and contamination, common among all the techniques described in the previous two subsections
sented in the next section.
(LOF, Z-Score, SOM and Specialist). For LOF, the size para­
meter is the number of neighbors, and for SOM, the number
of maps. The contamination parameter determines the per­
Specialist algorithm
centage of observations expected to be outliers. The SOM
In order to simulate the current leak detection strategy in the and LOF techniques use contamination, and the Z-Score
city, we present in this section an algorithm that replicates the and Specialist a threshold value.
procedure of the professionals when analyzing the water con­ In preparing the data for common execution between the
sumption graphs. The objective is to have a reference for techniques, we set up two vectors for each DMA: the prediction
comparing the automatic techniques against current practices. vector (containing the output of the algorithms) and the vali­
The algorithm is defined below. dation vector (containing maintenance data).
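The comparison described above reduces, for each DMA, to turning outlier scores into a boolean prediction vector and scoring it against the maintenance-based validation vector. The sketch below shows one way to do this, either with a contamination fraction (as for SOM and LOF) or with a fixed threshold (as for Z-Score and Specialist). The function names, the rounding of `contamination * n`, and the optional `beta` weighting (the usual F-beta form) are illustrative assumptions, not the authors' implementation; the scores and flags are made up.

```python
def flags_from_contamination(scores, contamination):
    """Flag roughly the top `contamination` fraction of scores as outliers.

    Ties at the cut-off score are all flagged, so slightly more than the
    requested fraction can be returned.
    """
    if not 0 < contamination < 1:
        raise ValueError("contamination must be in (0, 1)")
    k = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


def flags_from_threshold(scores, threshold):
    """Flag readings whose score is above a fixed threshold."""
    return [s > threshold for s in scores]


def precision_recall_fscore(predicted, actual, beta=1.0):
    """Compare a prediction vector with a validation vector (both booleans).

    beta > 1 weights recall more heavily, beta < 1 weights precision.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical daily scores for one DMA and the matching maintenance flags.
scores = [0.10, 0.46, 0.05, 0.48, 0.20, 0.12, 0.30, 0.02, 0.15, 0.50]
maintenance = [False, True, False, True, False, False, False, False, True, False]

predicted = flags_from_contamination(scores, 0.3)
print(precision_recall_fscore(predicted, maintenance))
```

Note that a technique that flags nothing gets a zero F-Score under this definition, which is the behavior the text reports for some Auto-Sklearn runs.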

Figure 2. Specialist algorithm applied in the GGRE DMA.
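The equation for s is not reproduced in this copy of the text. One form consistent with the description (only increases count, and day $i$ is compared with days $i-1$ and $i-30$ for both $\alpha$ and $\beta$) is the sum of four positive relative increases, $s_i = r(\alpha_i, \alpha_{i-1}) + r(\alpha_i, \alpha_{i-30}) + r(\beta_i, \beta_{i-1}) + r(\beta_i, \beta_{i-30})$ with $r(x, y) = \max(0, x/y - 1)$. The sketch below assumes exactly that form; it is a hypothetical reconstruction, not the authors' published formula. Under this assumption the Figure 2 values give about 0.466 for day 8 and 0.485 for day 13, which match the reported 0.46 and 0.48 when truncated to two decimals.

```python
def positive_increase(current, reference):
    """Relative increase of `current` over `reference`; drops count as 0,
    since a reduction in flow is not treated as a leak."""
    return max(0.0, current / reference - 1.0)


def specialist_score(alpha, beta, i):
    """Hypothetical Specialist score for day index i.

    alpha: daily mean flow series; beta: minimum night flow series.
    ASSUMPTION: s is the sum of the positive relative increases with respect
    to the previous day (i - 1) and the day one month earlier (i - 30).
    """
    if i < 30:
        raise IndexError("need 30 days of history before day i")
    return (positive_increase(alpha[i], alpha[i - 1])
            + positive_increase(alpha[i], alpha[i - 30])
            + positive_increase(beta[i], beta[i - 1])
            + positive_increase(beta[i], beta[i - 30]))


def example_series(a_i, a_prev, a_month, b_i, b_prev, b_month):
    """Build 31-day series holding only the values that s reads (i = 30)."""
    alpha, beta = [1.0] * 31, [1.0] * 31
    alpha[0], alpha[29], alpha[30] = a_month, a_prev, a_i
    beta[0], beta[29], beta[30] = b_month, b_prev, b_i
    return alpha, beta


# Values reported for the GGRE example (Figure 2); threshold 0.45.
day8 = example_series(39.89, 37.39, 33.45, 20.41, 20.35, 16.95)
day13 = example_series(44.47, 38.49, 41.10, 22.96, 20.95, 19.93)
for name, (alpha, beta) in (("day 8", day8), ("day 13", day13)):
    s = specialist_score(alpha, beta, 30)
    print(name, round(s, 3), "outlier" if s > 0.45 else "normal")
```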

Table 1. Techniques with the highest F-Score for each DMA.


DMA   Corr.  Accuracy  Precision  Recall  F-Score  Support  Technique   Contam.
GBAC  0.21   0.85      0.43       0.18    0.25     17       SOM         0.05
GBAL  0.22   0.80      0.26       0.43    0.32     14       SPECIALIST  0.5
GCOS  0.03   0.72      0.24       0.16    0.19     25       SPECIALIST  0.5
GGRE  0.08   0.86      0.18       0.14    0.15     22       Z-SCORE     1.5
GJMA  0.12   0.76      0.43       0.11    0.17     28       LOF         0.05
GMER  0.01   0.58      0.40       0.12    0.19     96       Z-SCORE     1.5
GPAR  0.09   0.82      0.29       0.11    0.15     19       LOF         0.05
GPAS  0.12   0.66      0.57       0.10    0.16     42       LOF         0.05
GRSC  0.00   0.94      0.00       0.00    0.00     0        SOM         0.05
RABC  0.13   0.86      0.29       0.14    0.19     14       SOM         0.05
RBAL  0.19   0.84      0.43       0.16    0.23     19       LOF         0.05
RBSC  0.13   0.83      0.19       0.27    0.22     22       Z-SCORE     1.5
RCOS  0.06   0.77      0.24       0.14    0.18     42       Z-SCORE     1.5
RGRE  0.34   0.80      1.00       0.14    0.25     28       SPECIALIST  0.6
RJMA  0.10   0.75      0.43       0.10    0.16     30       LOF         0.05
RPAR  0.06   0.70      0.32       0.16    0.22     62       Z-SCORE     1.5
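The two-step Z-Score baseline from subsection 4.1 (one score for the minimum night flow, one for the average daily flow, with a cut-off of 2 sigmas; 1.5 and 2.5 were also tested) can be sketched with the standard library. This sketch assumes the mean and standard deviation are estimated on the training months and applied to the target days, that only values above the mean are flagged, and that a day is reported when either variable is flagged; the text does not pin these choices down, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def zscore_flags(train, target, threshold=2.0):
    """Flag target values whose Z-Score, computed with the training mean
    and sample standard deviation, is above `threshold`."""
    mu, sigma = mean(train), stdev(train)
    return [(x - mu) / sigma > threshold for x in target]


def zscore_baseline(train_night, train_daily, target_night, target_daily,
                    threshold=2.0):
    """Score the minimum night flow and the daily mean flow separately.

    ASSUMPTION: a day is reported as anomalous if either variable is flagged.
    """
    night = zscore_flags(train_night, target_night, threshold)
    daily = zscore_flags(train_daily, target_daily, threshold)
    return [n or d for n, d in zip(night, daily)]


# Hypothetical values for one DMA (training months vs. target days).
train_night = [20.0, 21.0, 19.5, 20.5, 20.0, 21.5, 19.0, 20.5]
train_daily = [38.0, 39.0, 37.5, 38.5, 38.0, 39.5, 37.0, 38.5]
target_night = [20.2, 23.0, 20.8]
target_daily = [38.1, 38.9, 41.0]
print(zscore_baseline(train_night, train_daily, target_night, target_daily))
```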

The following values for the hyperparameters were tested: SOM map widths of 10, 20, 50 and 100; LOF with 2, 3, 4, 5, 6, 7, 8, 9 and 10 neighbors; Z-Score with thresholds of 1.5, 2 and 2.5 sigmas; and Specialist with thresholds of 0.5, 0.6 and 0.7.
Tables 1 and 2 present the results of the executions for the hyperparameter variations mentioned above. Table 1 presents the metrics for each of the sixteen DMAs; for each DMA, the technique with the highest F-Score is displayed. As can be seen in this table, no technique dominates. In Table 2 we present the same results grouped by technique, with their respective highest, lowest and mean F-Score. The maximum of the specialist algorithm was slightly above the others; however, on average, the techniques performed similarly.

Table 2. Comparison of F-Score values among techniques.
Technique   Max.  Min.  Mean
SOM         0.25  0.08  0.15
LOF         0.29  0.07  0.15
Z-SCORE     0.23  0.06  0.14
SPECIALIST  0.32  0.07  0.15

In the next section we present the application of AutoML for a more robust and automated selection and optimization of the algorithms. The objective is to determine whether there are practical advantages in adapting the outlier detection problem into a supervised task optimized by AutoML.

4.2. AutoML using auto-Sklearn
This section presents a first approach to the central proposal of this paper: automatically optimizing outlier detection algorithms for water distribution systems. Figure 3 shows the overall architecture of this solution.
The figure shows the temporal database, which is populated with data from water flow sensors installed in the network at real-time frequency. The flow data may contain noise, and due to the lack of relationships among the tags, a cleaning and mapping step is necessary. The maintenance records originate from a system on a mainframe platform operated by the technicians responsible for each DMA. For training and optimization of the model, data is selected from the relational database by DMA over a period of eight months. The techniques are applied to new data from the relational database and the results are presented to the professionals through reports (graphs).
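The cleaning and mapping step feeds the per-DMA daily variables used throughout this section: the daily mean flow, the minimum night flow and the calendar features (day of the week, day of the month). A minimal standard-library sketch of that aggregation follows. The `(dma, timestamp, flow)` reading format and the 00:00 to 04:59 night window are assumptions for illustration; the text does not state which window the authors used for the minimum night flow.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# ASSUMPTION: the night window used for the minimum night flow (00:00-04:59).
NIGHT_HOURS = range(0, 5)


def daily_features(readings):
    """Aggregate raw (dma, timestamp, flow) readings into one row per day.

    Returns {dma: [row, ...]}, rows sorted by date.
    """
    by_day = defaultdict(list)
    for dma, ts, flow in readings:
        by_day[(dma, ts.date())].append((ts, flow))

    result = defaultdict(list)
    for (dma, day), values in sorted(by_day.items()):
        night = [flow for ts, flow in values if ts.hour in NIGHT_HOURS]
        result[dma].append({
            "date": day,
            "daily_mean": mean(flow for _, flow in values),
            "min_night_flow": min(night) if night else None,
            "day_of_week": day.weekday(),  # 0 = Monday
            "day_of_month": day.day,
        })
    return dict(result)


# Hypothetical sensor readings for one DMA.
readings = [
    ("GGRE", datetime(2018, 12, 8, 3, 0), 20.4),
    ("GGRE", datetime(2018, 12, 8, 4, 30), 21.0),
    ("GGRE", datetime(2018, 12, 8, 12, 0), 60.0),
    ("GGRE", datetime(2018, 12, 9, 2, 0), 19.8),
    ("GGRE", datetime(2018, 12, 9, 14, 0), 55.0),
]
for row in daily_features(readings)["GGRE"]:
    print(row)
```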

Figure 3. Outlier detection architecture.
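The improved Auto-Sklearn setup in this subsection adds twelve context features: the minimum and the mean for the previous week, two weeks ago and three weeks ago, each paired with the difference between the current day and that value. The sketch below derives them from daily series such as those produced by the aggregation step. It assumes "k weeks ago" means the same weekday 7k days earlier and that each difference is the current value minus the lagged value; both are one reading of the text, not confirmed details of the authors' pipeline, and the feature names are hypothetical.

```python
def lag_features(min_night, daily_mean, i):
    """Twelve context features for day index i from two daily series.

    ASSUMPTIONS: "k weeks ago" is the same weekday 7*k days earlier, and
    each difference is the current value minus the lagged value.
    """
    if i < 21:
        # Guard explicitly: Python would silently wrap negative indices.
        raise IndexError("need at least 21 days of history before day i")
    features = {}
    for k in (1, 2, 3):
        j = i - 7 * k
        features[f"min_w{k}"] = min_night[j]
        features[f"diff_min_w{k}"] = min_night[i] - min_night[j]
        features[f"mean_w{k}"] = daily_mean[j]
        features[f"diff_mean_w{k}"] = daily_mean[i] - daily_mean[j]
    return features


# Hypothetical 22-day series: night flow repeats weekly, mean flow grows daily.
min_night = [20.0 + d % 7 for d in range(22)]
daily_mean = [40.0 + d for d in range(22)]
print(lag_features(min_night, daily_mean, 21))
```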

The techniques presented in subsection 4.1 were trained using all the flow data from the selected period. The techniques identified outliers based on the contamination hyperparameter (which specifies the percentage of data considered anomalous). The outputs of the algorithms (detected outliers) were compared with the maintenance data of the same period. However, for the application of AutoML's supervised learning techniques, we split the data into a training set (75%) and a testing set (25%).
For the application of AutoML, we use the Auto-Sklearn library (Feurer et al. 2015) described in subsection 2.5. Given the training set, we perform a direct application of AutoML for all DMAs. However, the initial results of applying AutoML were significantly worse than the baseline presented in the previous section. In order to improve the AutoML results, we changed our setup as described next.
We increased the training dataset to improve the quality of the model: we doubled the amount of data, from four to eight months (May to December 2018). To reduce the variation in the quality of the data selected for training, we implemented the K-Fold cross-validation method. K-Fold, through multiple runs, ensures that all data is used for both training and testing.
Furthermore, we added new features to provide context information. Following the way the specialist algorithm works, which calculates the proportion between two days and uses only positive proportions, we added new variables following the same approach, but calculating the difference instead of the proportion.
The added features refer to the day of the week and to the minimum and mean values. We use the minimum and the mean from the previous week, together with the respective differences. In the same way we calculate the values for two weeks ago and three weeks ago. This generated the following features: a) minimum of the previous week; b) difference to the minimum of the previous week; c) mean of the previous week; d) difference to the mean of the previous week; e) minimum of two weeks ago; f) difference to the minimum of two weeks ago; g) mean of two weeks ago; h) difference to the mean of two weeks ago; i) minimum of three weeks ago; j) difference to the minimum of three weeks ago; k) mean of three weeks ago; l) difference to the mean of three weeks ago.
We ran Auto-Sklearn with combinations of ensemble sizes of 1, 50, 100 and 200 and runtimes of 30, 60, 90, 180 and 300 seconds. Because of the larger time window, the larger number of features and the increased number of combinations of K-Fold and Auto-Sklearn hyperparameters, we reduced the runtime, as the tool's default of 1 hour became unfeasible.
For a fairer comparison with the results from the previous section (4.1), we re-ran those tests with the improved setup (new features, more training data and k-fold validation). The new results were much improved, and better than the ones using AutoML.

Table 3. Techniques with the highest F-Score for each DMA.
DMA   Technique   Size  Contam.  Max.  Min.  Mean
RABC  SPECIALIST  1     0.70     0.28  0.09  0.17
GBAC  SOM         10    0.05     0.31  0.18  0.24
GMER  Z-SCORE     1     2.00     0.67  0.45  0.54
RBSC  SPECIALIST  1     0.70     0.26  0.09  0.18
GBAL  SOM         10    0.05     0.24  0.13  0.19
RBAL  SOM         10    0.05     0.49  0.21  0.34
GGRE  SOM         10    0.05     0.20  0.15  0.18
RGRE  SOM         10    0.05     0.46  0.18  0.32
GJMA  SPECIALIST  1     0.50     0.48  0.23  0.38
RJMA  Z-SCORE     1     2.00     0.47  0.31  0.40
GPAR  SOM         10    0.05     0.31  0.21  0.27
RPAR  Z-SCORE     1     2.00     0.53  0.38  0.44
GCOS  SOM         10    0.05     0.45  0.20  0.32
RCOS  SOM         10    0.05     0.48  0.21  0.36
GPAS  SOM         10    0.05     0.58  0.44  0.52

In Table 3 we present, for each DMA, the technique with the highest average F-Score among all runs. The table also shows the size and contamination parameters used, as well as the highest and lowest F-Score values that the technique obtained for that DMA. With these results

we can observe that Auto-Sklearn still obtained a lower result than the other techniques, especially SOM. In most DMAs, SOM showed the best results.
In Table 4 we present the values grouped by technique: the highest and lowest value for each technique, plus the mean of all combinations for all DMAs. To present the results by technique in Table 4, zero F-Score values were excluded when identifying the minimums. When running some algorithms, Auto-Sklearn displayed a message indicating that the result found was lower than that of a random model. This and other issues with the tool are discussed in section 4.4.

Table 4. Comparison of F-Score values among techniques.
Technique    Max.  Min.  Mean
SOM          0.60  0.07  0.32
LOF          0.66  0.10  0.31
Z-SCORE      0.67  0.09  0.31
SPECIALIST   0.67  0.06  0.31
AUTOSKLEARN  0.53  0.07  0.26

These results indicate that the SOM and LOF algorithms are better able to capture the aspects that determine the abnormality of an observation. A likely reason for this ability is their emphasis on the spatial distribution of the densities of the observations, in contrast to the subdivisions of space performed by traditional classification algorithms. To combine the advantages of the SOM and LOF algorithms with the efficiency of AutoML, we then developed a semi-supervised scheme for hyperparameter optimization, described in the next section.

4.3. Hyperparameter Optimization for SOM and LOF in AutoML Style
As a consequence of the unsatisfactory results obtained with the classification techniques in Auto-Sklearn, we chose to apply hyperparameter optimization to the SOM and LOF algorithms. The objective is to devise a semi-supervised approach by selecting the best hyperparameters using the maintenance data.
To adapt the SOM and LOF algorithms to a semi-supervised setting, we used two different strategies: for LOF, the novelty detection function of the library itself was used; for SOM, we created a function based on the training threshold.
LOF training with novelty detection enabled receives the training data from the flow data, without using the maintenance training dataset (i.e. an unsupervised training). After training, the predict method is called on the test dataset from the flow data. The results are then compared with the test maintenance data (to calculate the F-Score used as the basis for optimization).
The adapted process for the SOM starts by applying the technique to obtain the outlier factor (outlier score) for each input of the training dataset. Based on the contamination, a threshold on the outlier factor is calculated and used to classify the test dataset inputs. The output for the test dataset is then used, together with the validation dataset (maintenance data), to calculate the F-Score.
To optimize the hyperparameters, we implemented a Grid Search procedure. The hyperparameters used are described in Table 5.

Table 5. Values used to perform Grid Search on the hyperparameters of the SOM and LOF techniques.
Technique  Parameter              Values                        Step
SOM        Map size               44 to 66                      2
SOM        Contamination          0.06 to 0.2                   0.02
SOM        Sigma                  1 to 3                        1
SOM        Neighborhood function  gaussian, bubble and triangle –
SOM        Learning rate          3 to 6                        1
LOF        Algorithm              Auto                          –
LOF        Leaf size              20                            1
LOF        Number of neighbors    12 to 50                      2
LOF        Contamination          0.05 to 0.2                   0.01
LOF        Metrics                cityblock, cosine, l1, Manhattan, Canberra, dice, Jaccard, rogerstanimoto, russellrao, sokalmichener, sokalsneath   –

The results are presented in Tables 6 and 7. Param1 refers to the SOM sigma and the LOF algorithm. Param2 refers to the SOM neighborhood function and the LOF leaf size. Param3 refers to the SOM learning rate and the LOF metric. The LOF leaf size parameter is always 20 and the LOF algorithm is always automatic.

Table 6. Techniques with the highest F-Score for each DMA.
DMA   Technique  Size  Contam.  Param1  Param2    Param3          Max.  Min.  Mean
RABC  SOM        52    0.06     1       gaussian  4               0.52  0.10  0.22
GBAC  SOM        54    0.18     1       gaussian  3               0.67  0.26  0.39
GMER  LOF        42    0.12     auto    20        russellrao      0.69  0.44  0.54
RBSC  LOF        14    0.2      auto    20        Canberra        0.30  0.15  0.23
GBAL  SOM        44    0.2      1       bubble    5               0.33  0.13  0.20
RBAL  SOM        54    0.1      1       bubble    5               0.51  0.21  0.37
GGRE  SOM        56    0.18     1       bubble    3               0.28  0.18  0.23
RGRE  LOF        28    0.2      auto    20        rogerstanimoto  0.50  0.19  0.36
GJMA  LOF        46    0.13     auto    20        russellrao      0.52  0.24  0.43
RJMA  LOF        44    0.2      auto    20        russellrao      0.51  0.32  0.43
GPAR  LOF        24    0.18     auto    20        rogerstanimoto  0.44  0.23  0.33
RPAR  SOM        46    0.16     1       bubble    6               0.53  0.40  0.47
GCOS  LOF        28    0.19     auto    20        dice            0.43  0.30  0.37
RCOS  SOM        50    0.18     1       gaussian  5               0.48  0.32  0.41
GPAS  LOF        20    0.13     auto    20        russellrao      0.58  0.44  0.52

In Table 7 the mean, minimum and maximum values were calculated considering only the winning executions of each DMA, due to the large variation in the results caused by the numerous combinations of hyperparameters during optimization. The change in approach was positive, as it made it practical to apply outlier detection to new data, which could be done in real time.

Table 7. Comparison of F-Score values between techniques.
Technique  Max.  Min.  Mean
SOM        0.67  0.10  0.36
LOF        0.69  0.10  0.35

Finally, Table 8 compares the average results from each of the implementations. The combination of the AutoML approach with the SOM and LOF algorithms achieved the best results.

Table 8. Comparison of mean F-Score across sections.
Technique  Section 4.1  Section 4.2  Section 4.3
SOM        0.15         0.32         0.36
LOF        0.15         0.31         0.35

4.4. Discussion
The main challenges in the implementation of our proposal derive from two central characteristics of the problem: (i) imprecise validation data and (ii) context dependence in determining outliers.
Imprecise validation data (maintenance logs only partially related to leaks) leads to problems distinguishing between classes (outlier or inlier) and to low F-Score values (between 0.3 and 0.7). The narrow margin for class separation meant that, in some cases, the Auto-Sklearn classification algorithms did not return any results as outliers, and as a consequence presented a zero F-Score.
The justification for using imprecise maintenance data is its applicability in a real-world scenario, since the data was extracted directly from the production system. This is an important characteristic, especially for low-income regions with restricted implementation budgets. Using this imprecise data is an improvement over the unsupervised models typical of outlier detection solutions. In a production environment, since the model is based on water flow data, outliers will be related to anomalous water flow variations, which limits the impact of the imprecise data used in parameter tuning. If more precise data are available, they can be used with no further changes in the system.
The second matter, context dependence in determining outliers, concerns the importance of analyzing the characteristics of similar observations. In the SOM and LOF algorithms, context is captured in the multidimensional space formed by the variables. In these algorithms, the item under analysis is characterized as an outlier depending on the relative density of the neighboring observations. This type of spatial analysis is not easily performed with the classification algorithms commonly used in AutoML. To solve this issue we added new context features and performed an integration of the SOM and LOF algorithms in Auto-Sklearn.
Integrating the SOM and LOF algorithms in an AutoML setting combines the benefits of these algorithms for outlier detection with the convenience of automatic hyperparameter optimization. AutoML simplifies the process of designing the system, therefore reducing some of the costs associated with hiring machine learning experts. This is also a critical issue in low-income regions, which frequently have a shortage of such professionals.
The proposed system is not yet implemented in a production environment; therefore, we were not able to assess its performance in a real scenario. In production, a main concern is to control for errors, whether false positives or false negatives. The rate of false negatives can be controlled by manually changing the contamination parameter of the models. The balance between false positives and false negatives can be adjusted in training by changing the alpha parameter in the F-score metric. Further adjustments to the model can be made by implementing incremental learning, which would tune the model based on feedback from experts. Costs of false positives should also be reduced through organizational procedures, for example having experts check the graphs provided by the system and coordinating with other departments to check whether an anomaly is the result of a programmed intervention.
Finally, our focus in this research is on assessing the feasibility and benefits of applying AutoML techniques. Therefore, we did not perform a comprehensive feature engineering phase. The models presented here are adequate for the target production environment, since they already consider more variables than the ones currently in use. However, many more variables could be added to the models, such as ones related to weather, geography, and socioeconomic factors in the DMA region.

5. Conclusion
The problem of water losses is important and should be taken seriously by water companies. Daily monitoring for the detection of outliers can be fundamental to reducing the loss index. The practical objective is to reduce the time needed to detect leaks and to detect small leaks that may exist in the network and have not yet been detected.
Our proposal addresses these issues by applying outlier detection with automatic hyperparameter optimization in a semi-supervised learning setting. We demonstrate that the unsupervised techniques SOM and LOF, applied in a semi-supervised manner, obtained better results than the model created with AutoML using the Auto-Sklearn tool.
This work can be applied in a practical way and has the potential to have a positive impact on the reduction of water losses in Curitiba and similar low-income regions. The proposal is also adaptable to other contexts, whether in different cities or similar problems (e.g. energy distribution, communication infrastructure, etc.). The proposed model is flexible and could easily accommodate more dimensions to capture more seasonal, weather-dependent and holiday patterns.

As future work we intend to apply automated feature engineering, implement more advanced search techniques for the hyperparameter space, and implement incremental learning with feedback from the experts in charge of the decisions.

Notes
1. https://www.ge.com/digital/applications/proficy-historian

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Willian Muniz Do Nascimento http://orcid.org/0000-0001-5121-6723
Luiz Gomes-Jr http://orcid.org/0000-0002-1534-9032

References
TRATA BRASIL. 2017. "Instituto Trata Brasil." Perdas de água (SNIS 2017): Desafios para Disponibilidade Hídrica e Avanço da Eficiência do Saneamento Básico (2019).
Aksela, K, M Aksela, and R Vahala. 2009. "Leakage Detection in a Real Distribution Network Using a SOM." Urban Water Journal 6 (4): 279–289. doi:10.1080/15730620802673079.
Chan, TK, Cheng Siong Chin, and Xionghu Zhong. 2018. "Review of Current Technologies and Proposed Intelligent Methodologies for Water Distributed Network Leakage Detection." IEEE Access 6: 78846–78867. doi:10.1109/ACCESS.2018.2885444.
Chandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. "Anomaly Detection: A Survey." ACM Computing Surveys (CSUR) 41 (3): 15. doi:10.1145/1541880.1541882.
Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. "Efficient and Robust Automated Machine Learning." In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, 2962–2970. New York: Curran Associates, Inc.
Gupta, Manish, Jing Gao, Charu Aggarwal, and Jiawei Han. 2014. "Outlier Detection for Temporal Data." Synthesis Lectures on Data Mining and Knowledge Discovery 5 (1): 1–129. doi:10.2200/S00573ED1V01Y201403DMK008.
Hutter, Frank, Lars Kotthoff, and Joaquin Vanschoren. 2019. Automated Machine Learning. Cham, Switzerland: Springer. doi:10.1007/978-3-030-05318-5.
Katz, Gilad, Eui Chul Richard Shin, and Dawn Song. 2016. "Explorekit: Automatic Feature Generation and Selection." In 2016 IEEE 16th International Conference on Data Mining (ICDM), 979–984. Barcelona, Spain: IEEE. doi:10.1109/ICDM.2016.0123.
Kaul, Ambika, Saket Maheshwary, and Vikram Pudi. 2017. "Autolearn—Automated Feature Generation and Selection." In 2017 IEEE International Conference on Data Mining (ICDM), 217–226. New Orleans, LA, USA: IEEE. doi:10.1109/ICDM.2017.31.
Kohonen, Teuvo, and Timo Honkela. 2007. "Kohonen Network." Scholarpedia 2 (1): 1568. doi:10.4249/scholarpedia.1568.
Kreyszig, Erwin. 2009. Advanced Engineering Mathematics. 10th ed. New Jersey, USA: John Wiley & Sons.
Li, Rui, Haidong Huang, Kunlun Xin, and Tao Tao. 2015. "A Review of Methods for Burst/leakage Detection and Location in Water Distribution Systems." Water Science and Technology: Water Supply 15 (3): 429–441.
Loureiro, Dália, Conceição Amado, André Martins, Diogo Vitorino, Aisha Mamade, and S T. Teixeira Coelho. 2016. "Water Distribution Systems Flow Monitoring and Anomalous Event Detection: A Practical Approach." Urban Water Journal 13 (3): 242–252. doi:10.1080/1573062X.2014.988733.
Puust, R, Z Kapelan, DA Savic, and T Koppel. 2010. "A Review of Methods for Leakage Management in Pipe Networks." Urban Water Journal 7 (1): 25–45. doi:10.1080/15730621003610878.
Romano, M, Z Kapelan, and DA Savic. 2010. "Real-time Leak Detection in Water Distribution Systems." In Water Distribution Systems Analysis 2010, 1074–1082. 12th Annual Conference on Water Distribution Systems Analysis (WDSA), Tucson, Arizona, United States. doi:10.1061/41203(425)97.
Russell, Stuart J, and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited.
Thornton, Chris, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. 2013. "Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms." In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 847–855. Chicago, Illinois, USA. doi:10.1145/2487575.2487629.
Tuggener, Lukas, Mohammadreza Amirian, Katharina Rombach, Stefan Lörwald, Anastasia Varlet, Christian Westermann, and Thilo Stadelmann. 2019. "Automated Machine Learning in Practice: State of the Art and Recent Results." In 2019 6th Swiss Conference on Data Science (SDS), 31–36. Bern, Switzerland: IEEE. doi:10.1109/SDS.2019.00-11.
Wu, Yipeng, and Shuming Liu. 2017. "A Review of Data-driven Approaches for Burst Detection in Water Distribution Systems." Urban Water Journal 14 (9): 972–983. doi:10.1080/1573062X.2017.1279191.
Yu, Tong, and Hong Zhu. 2020. "Hyper-Parameter Optimization: A Review of Algorithms and Applications." arXiv preprint arXiv:2003.05689.
