
Machine Learning-based Anomaly Detection in

Optical Fiber Monitoring


KHOULOUD ABDELLI,1,3,* JOO YEON CHO,1 FLORIAN AZENDORF,2 HELMUT GRIESSER,1 CARSTEN TROPSCHUG,2 STEPHAN PACHNICKE3
1ADVA Optical Networking SE, Fraunhoferstr. 9a, 82152 Munich/Martinsried, Germany
2ADVA Optical Networking SE, Märzenquelle 1-3, 98617 Meiningen, Germany
3Christian-Albrechts-Universität zu Kiel, Kaiserstr. 2, 24143 Kiel, Germany

*Corresponding author: [email protected]


Received XX Month XXXX; revised XX Month, XXXX; accepted XX Month XXXX; posted XX Month XXXX (Doc. ID XXXXX); published XX Month XXXX

Secure and reliable data communication in optical networks is critical for high-speed Internet. However, optical fibers, serving as the data transmission medium providing connectivity to billions of users worldwide, are prone to a variety of anomalies resulting from hard failures (e.g., fiber cuts) and malicious physical attacks (e.g., optical eavesdropping via fiber tapping). Such anomalies may cause network disruption, thereby inducing huge financial and data losses, compromise the confidentiality of optical networks through unauthorized access to the carried data, or gradually degrade the network operations. Therefore, it is essential to implement efficient anomaly detection, diagnosis, and localization schemes to enhance the availability and reliability of optical networks. In this paper, we propose a data-driven approach to accurately and quickly detect, diagnose, and localize fiber anomalies, including fiber cuts and optical eavesdropping attacks. The proposed method combines an autoencoder-based anomaly detection and an attention-based bidirectional gated recurrent unit algorithm, whereby the former is used for fault detection and the latter is adopted for fault diagnosis and localization once an anomaly is detected by the autoencoder. We verify the efficiency of our proposed approach by experiments under various anomaly scenarios using real operational data. The experimental results demonstrate that: (i) the autoencoder detects any fiber fault or anomaly with an F1 score of 96.86%; and (ii) the attention-based bidirectional gated recurrent unit algorithm identifies the detected anomalies with an average accuracy of 98.2%, and localizes the faults with an average root mean square error of 0.19 m.
http://dx.doi.org/10.1364/JOCN.99.099999

1. INTRODUCTION

Optical fiber is the essential medium for transporting a large amount of data through the aggregated Internet, mobile backhaul and core network. A single fiber link connects thousands of customers and enterprises, carrying a mixture of personal, business, and public data. Therefore, the impact of a broken fiber can be enormous and must be responded to immediately.

In general, optical fiber is vulnerable to different types of anomalies, including fiber cuts and fiber eavesdropping (fiber tapping). Such anomalies compromise the availability and the confidentiality of an optical network. Specifically, the manual discovery of incidents occurring in the fiber requires considerable expert knowledge and probing time until a fault (e.g., a broken fiber) is identified. Fiber monitoring aims at detecting anomalies in the optical layer by logging and analyzing the monitoring data. It has mainly been performed using optical time domain reflectometry (OTDR), a technique based on Rayleigh backscattering that is widely applied for measuring fiber characteristics and for fiber fault detection and localization [1]. OTDR operates like an optical radar: it sends a series of optical pulses into the fiber under test, and the backscattered signals are recorded as a function of time, which can be converted into the position on the optical fiber. As a result, a recorded OTDR trace illustrating the positions of faults along the fiber is generated and used for event analysis. However, OTDR traces are difficult to interpret even by highly experienced engineers, mainly due to the noise overwhelming the signals. Analyzing OTDR signals using conventional methods can be time consuming, as a lot of averaging of OTDR measurements is required to remove the noise and thereby achieve good event detection and localization accuracy. Therefore, it would be highly beneficial to develop a reliable automated diagnostic method that accurately and quickly detects, diagnoses and pinpoints fiber faults given the OTDR data, thereby reducing operation-and-maintenance expenses (OPEX) and eliminating the time needed to investigate the cause and determine a search area. Upon finding the fault location, appropriate action is taken to remedy the fault and restore service as quickly as possible.

Recently, machine learning (ML)-based approaches have shown great potential to tackle the problem of fiber event detection and localization [2]. In this respect, long short-term memory and convolutional neural networks (CNNs) have been proposed to detect and localize the reflective fiber events induced by connectors and mechanical splices [3-5]. A hybrid ML-based framework combining a bidirectional long short-term memory (BiLSTM) network and CNNs, called BiLSTM-CNN, has been presented for detecting, localizing, and discriminating between
reflective, non-reflective and merged events [6]. To tackle the challenge of fiber event analysis under very low SNR conditions (SNR levels ranging from -5 dB to 15 dB), a method combining a denoising convolutional autoencoder (DCAE) and a BiLSTM has been proposed, whereby the former is used for noise removal of OTDR signals and the latter, trained with noiseless OTDR signals, is adopted for fault detection, localization, and diagnosis with the signal denoised by the DCAE as input [7]. The ML models presented in the last two publications were trained using experimental data incorporating faults modeled using optical components such as connectors or reflectors. Therefore, the generalization and robustness of such models may severely degrade when tested with new unseen data including real induced faults with various patterns, such as fiber bend events generated for different bending radius values. Furthermore, although such methods distinguish the non-reflective events from the other events, they cannot easily discriminate the faults due to a bad splice or fiber tapping. Generally, the fiber fault data generated for training ML-based diagnostic methods is highly unbalanced (the instances of normal states outnumber the faulty instances), which inevitably raises the issues caused by unbalanced class distributions. Additionally, it can be prohibitively expensive and cumbersome to obtain accurately labeled data representing all types of faults or anomalies generated for different scenarios or settings. That is why unsupervised ML techniques, particularly reconstruction-based anomaly detection approaches, are frequently adopted.

In this paper, an unsupervised ML technique, an autoencoder, is proposed to quickly detect any anomaly or unexpected abnormal event pattern in optical fibers. Once an anomaly or fault is detected, a diagnostic ML model adopting attention mechanisms and a bidirectional gated recurrent unit network is used to diagnose and localize the fault. The proposed methods are applied to noisy OTDR data with SNR levels varying from 0 dB to 30 dB, including several real faults induced at different locations of an optical network. Our contributions can be summarized as follows:

• An autoencoder-based anomaly detection model is proposed for detecting any fault in fiber optics, including fiber cuts and fiber tapping attacks.
• An attention-based bidirectional gated recurrent unit model for fiber fault diagnosis and localization is presented.
• The efficiency of the proposed methods is validated using experimental monitoring data.

Throughout this article, "diagnose" refers to the operation of discriminating between the different types of faults, "localize" stands for the act of estimating the location of the fault, and "detect" refers to the operation of finding out any abnormal behavior in the optical fiber.

The rest of this paper is structured as follows: Section 2 gives some background information about the physical fiber attacks, the autoencoder, and the bidirectional gated recurrent unit algorithm. Section 3 presents the proposed framework for predictive fiber monitoring. Section 4 describes the experimental setup and the validation of the presented approach. Conclusions are drawn in Section 5.

2. BACKGROUND

2.1 Fiber anomalies

2.1.1 Fiber cut

A fiber cut is a disastrous physical failure in optical networks, capable of causing widespread disruption. In most cases fiber cuts are the result of accidental cable damage due to construction activities, ship anchors at cable landing points, or natural disasters like earthquakes or tornadoes, and only rarely of intentional damage such as sabotage with the malevolent intent to induce a denial of service. Fiber cuts are considered the single largest cause of service outages. As reported by the Federal Communications Commission (FCC), more than one-third of service disruptions are caused by fiber-cable breaks [8]. Any service outage due to a fiber cut results in massive data loss, network disruption, and huge financial loss [9]. In 1991, a severed fiber-optic cable shut down all New York airports and induced air traffic control problems [10]. It is time-consuming to locate and repair fiber cuts.

2.1.2 Optical eavesdropping

An optical eavesdropping attack permits the eavesdropper to gain unauthorized access to the carried data by directly accessing the optical channel via fiber tapping, for the purpose of stealing mission-critical and sensitive information. There are several fiber tapping techniques which can be adopted to launch an eavesdropping attack, such as fiber bending, optical splitting, evanescent coupling, and V-groove cut [11]. However, the easiest way to keep the eavesdropping intrusion undetected is micro-fiber bending using a commercially available clip-on coupler. Fiber-optic cable tapping incidents have been reported, such as the eavesdropping device discovered illegally installed on Verizon's optical network in 2003 to glean information from a mutual fund company regarding its quarterly statement prior to its public disclosure [12]. Although it is easy to perform an eavesdropping intrusion, it is challenging to detect such an intrusion using conventional intrusion detection methods such as OTDR-based techniques.

2.2 Autoencoder

An autoencoder (AE) is a type of artificial neural network seeking to learn a compressed representation of an input in an unsupervised manner [13]. As shown in Fig. 1, an AE is composed of two sub-models, namely the encoder $f_\theta$ and the decoder $g_\theta$. Generally, the encoder and the decoder are of symmetrical architecture, comprising several layers each followed by a nonlinear function, with shared parameters $\theta$. The encoder $f_\theta(\cdot)$ compresses an input $\boldsymbol{x}$ into a lower-dimensional latent-space representation $\boldsymbol{z}$, and the decoder $g_\theta(\cdot)$ maps the encoded representation back into the estimate $\hat{\boldsymbol{x}}$ of the original input vector. For an autoencoder composed of multiple layers, the nonlinear encoder and decoder mappings can be formulated as follows:

$f_\theta^l(\cdot) = \sigma^l(\boldsymbol{W}^l f_\theta^{l-1}(\cdot) + \boldsymbol{b}^l)$   (1)

$g_\theta^l(\cdot) = \sigma'^l(\boldsymbol{W'}^l g_\theta^{l-1}(\cdot) + \boldsymbol{b'}^l)$   (2)

where $l$ represents the number of hidden layers, $\sigma$ and $\sigma'$ denote the nonlinear activation functions, $\boldsymbol{W}$ and $\boldsymbol{W'}$ are weight matrices, $\boldsymbol{b}$ and $\boldsymbol{b'}$ represent the bias vectors, and $\theta$ denotes the learnable model parameters $\{\boldsymbol{W}, \boldsymbol{b}, \boldsymbol{W'}, \boldsymbol{b'}\}$, with $f_\theta^0(\cdot) = \boldsymbol{x}$ and $g_\theta^0(\cdot) = f_\theta^l(\cdot)$.
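To make the layer-wise mappings of Eqs. (1) and (2) concrete, the following minimal sketch runs a one-hidden-layer encoder/decoder forward pass in pure Python. This is our illustration, not the authors' code: the tiny hand-picked weights, the tanh activation, and the 3-2-3 layer sizes are arbitrary assumptions.

```python
import math

def layer(x, W, b, act):
    """One layer: act(W @ x + b), as in Eqs. (1) and (2)."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def autoencoder(x, enc, dec):
    """Apply the encoder layers f_theta, then the decoder layers g_theta."""
    z = x
    for W, b in enc:
        z = layer(z, W, b, math.tanh)       # latent representation z
    x_hat = z
    for W, b in dec:
        x_hat = layer(x_hat, W, b, math.tanh)  # reconstruction x_hat
    return z, x_hat

# Toy 3 -> 2 -> 3 autoencoder with hand-picked (untrained) weights.
enc = [([[0.5, 0.1, 0.0], [0.0, 0.2, 0.4]], [0.0, 0.0])]
dec = [([[0.7, 0.0], [0.1, 0.3], [0.0, 0.6]], [0.0, 0.0, 0.0])]
x = [1.0, 0.5, -0.5]
z, x_hat = autoencoder(x, enc, dec)
# Mean square reconstruction error, anticipating the loss of Eq. (3).
mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
print(len(z), len(x_hat), round(mse, 4))
```

With trained weights the reconstruction error would be small for inputs resembling the training distribution, which is what the anomaly score in Section 2.2 exploits.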
[Fig. 1 diagram: input, encoder, latent space, decoder, output.]

Fig. 1. Structure of a standard autoencoder composed of two nonlinear mappings (fully connected feedforward neural networks), namely the encoder and the decoder.

The training objective of the autoencoder is to minimize the reconstruction error between the output $\hat{\boldsymbol{x}}$ and the input $\boldsymbol{x}$, referred to as the loss function $\mathcal{L}(\theta)$, typically the mean square error (MSE), expressed as:

$\mathcal{L}(\theta) = \sum \|\boldsymbol{x} - \hat{\boldsymbol{x}}\|^2$   (3)

The AE has been widely used for anomaly detection by adopting the reconstruction error as the anomaly score. It is trained with only normal data representing the normal behavior. After training, the AE will reconstruct the normal instances very well, while it will fail to reproduce anomalous observations, yielding high reconstruction errors. The process of classifying an instance as anomalous/normal is shown in Alg. 1.

Algorithm 1: Autoencoder-based anomaly detection
Input: normal dataset $\boldsymbol{x}$, anomalous dataset $\boldsymbol{x}^{(i)}$, $i = 1, \dots, N$, threshold $\theta$
Output: reconstruction error $\|\boldsymbol{x} - \hat{\boldsymbol{x}}\|$
1: train an autoencoder given the normal data $\boldsymbol{x}$
2: for $i = 1$ to $N$ do
3:   reconstruction error$(i) = \|\boldsymbol{x}^{(i)} - g(f(\boldsymbol{x}^{(i)}))\|$
4:   if reconstruction error$(i) > \theta$ then
5:     $\boldsymbol{x}^{(i)}$ is anomalous
6:   else
7:     $\boldsymbol{x}^{(i)}$ is normal
8:   end if
9: end for

2.3 Bidirectional Gated Recurrent Unit (BiGRU)

The gated recurrent unit (GRU), recently proposed to solve the gradient vanishing problem [14], is an improved version of the standard recurrent neural network (RNN), used to process sequential data and to capture long-term dependencies. The typical structure of a GRU, shown in Fig. 2, contains two gates, namely the reset and update gates, controlling the flow of information. The update gate regulates the information that flows into the memory, while the reset gate controls the information flowing out of the memory. The GRU cell is updated at each time step $t$ by applying the following equations:

$\boldsymbol{z}_t = \sigma(\boldsymbol{W}_z \boldsymbol{x}_t + \boldsymbol{W}_z \boldsymbol{h}_{t-1} + \boldsymbol{b}_z)$   (4)

$\boldsymbol{r}_t = \sigma(\boldsymbol{W}_r \boldsymbol{x}_t + \boldsymbol{W}_r \boldsymbol{h}_{t-1} + \boldsymbol{b}_r)$   (5)

$\tilde{\boldsymbol{h}}_t = \tanh(\boldsymbol{W}_h \boldsymbol{x}_t + \boldsymbol{W}_h (\boldsymbol{r}_t \circ \boldsymbol{h}_{t-1}) + \boldsymbol{b}_h)$   (6)

$\boldsymbol{h}_t = \boldsymbol{z}_t \circ \boldsymbol{h}_{t-1} + (1 - \boldsymbol{z}_t) \circ \tilde{\boldsymbol{h}}_t$   (7)

where $\boldsymbol{z}$ denotes the update gate, $\boldsymbol{r}$ represents the reset gate, $\boldsymbol{x}$ is the input vector, $\boldsymbol{h}$ is the output vector, and $\boldsymbol{W}$ and $\boldsymbol{b}$ represent the weight matrices and bias vectors, respectively. $\sigma$ is the gate activation function and $\tanh$ represents the output activation function. The "$\circ$" operator represents the Hadamard product.

[Fig. 2 diagram: reset gate, update gate, candidate hidden state, hidden state, new hidden state.]

Fig. 2. Structure of the gated recurrent unit (GRU) cell.

The BiGRU is an extension of the GRU that helps to improve the performance of the model. It consists of two GRUs: one forward GRU that takes the input in the forward direction, and one backward GRU that learns the reversed input. The output $\boldsymbol{y}_t$ of the model is generated by combining the forward output $\vec{\boldsymbol{h}}_t$ and the backward output $\overleftarrow{\boldsymbol{h}}_t$, as described by the following equations:

$\vec{\boldsymbol{h}}_t = \mathrm{GRU}(\boldsymbol{x}_t, \vec{\boldsymbol{h}}_{t-1})$   (8)

$\overleftarrow{\boldsymbol{h}}_t = \mathrm{GRU}(\boldsymbol{x}_t, \overleftarrow{\boldsymbol{h}}_{t-1})$   (9)

$\boldsymbol{y}_t = \vec{\boldsymbol{h}}_t \oplus \overleftarrow{\boldsymbol{h}}_t$   (10)

where $\oplus$ denotes an element-wise sum.

2.4 Multi-task learning

Multi-task learning is a subfield of ML aiming at improving the overall performance of multiple tasks by learning them jointly while sharing knowledge across them. It has been widely adopted in various fields ranging from natural language processing to computer vision. Multi-task learning approaches can generally be categorized into two architectures: hard and soft parameter sharing. Hard parameter sharing shares the hidden layers among the different tasks (completely sharing the weights and parameters between all tasks) while preserving task-specific output layers learnt independently by each task. In soft parameter sharing, a model with its own parameters is learnt for each task, and the distance between the parameters of the models is then regularized to encourage similarities among related parameters [3].

3. PROPOSED APPROACH

As shown in Fig. 3, the proposed framework for fiber monitoring can be broken into five main stages: (1) optical fiber network monitoring and data collection, (2) data processing, (3) fiber anomaly detection, (4) fiber fault diagnosis and localization, and (5) mitigation and recovery from fiber failures. The optical fibers deployed in the network infrastructure are periodically
monitored using OTDR (i.e., a fiber monitoring unit). The generated OTDR traces (i.e., the monitoring data) are sent to the software-defined networking (SDN) controller managing the optical infrastructure. Then, the data is segmented into fixed-length sequences and normalized. Afterwards, the processed data is fed to the ML-based anomaly detection model for detecting fiber anomalies or faults. If a fiber anomaly is detected, an ML model for fault diagnosis and localization is adopted to diagnose and localize the fault. Based on the identified fault, a set of recovery rules is applied to mitigate it. The SDN controller notifies the network operation center in case of failure, which informs the customer about the type of detected fault and its location, and notifies the maintenance and repair service in case of a fiber cut.

For this work, we consider the fiber anomalies fiber cut and optical eavesdropping attack as examples of harmful or major fiber faults. Given that the patterns of the aforementioned faults and the anomalies bad splice and dirty connector are similar, particularly under very low SNR conditions, we include the latter during the training phase of the ML model for fault diagnosis to ensure reliable fault identification and reduce false alarms.

[Fig. 3 diagram: monitoring and data collection, processing, anomaly detection, diagnosis and localization, mitigation and recovery.]

Fig. 3. Overview of the ML-based fiber monitoring process.

3.1 Anomaly Detection Model

The autoencoder is trained with normal data only, representing the normal behavior, in order to learn the distribution characterizing the normal state. After the training of the autoencoder, in the inference phase the reconstruction error is adopted as an anomaly score to detect any potential fault. A well-trained autoencoder will reconstruct the normal instances very well, since they have the same pattern or distribution as the training data, while it will fail to reproduce anomalous observations, yielding high reconstruction errors. The process of classifying an instance as anomalous/normal is illustrated in Algorithm 1. If the computed anomaly score is higher than a set threshold $\theta$, the instance is classified as "anomalous"; otherwise it is assigned as "normal". $\theta$ is a hyperparameter optimized to ensure high detection accuracy and is adjusted by taking into consideration the degradation and aging effects of the optical fiber.

The architecture of the proposed autoencoder model for fiber anomaly detection is illustrated in Fig. 4. The model contains an encoder and a decoder sub-model with 4 layers. The encoder takes as input a 30-length sequence of an OTDR trace $[\boldsymbol{P}_1, \boldsymbol{P}_2, \dots, \boldsymbol{P}_{30}]$ representing the attenuation of the fiber along the distance, combined with the sequence's computed SNR ($\gamma$). Including the information about the sequence's SNR during the training phase helps the ML model to learn the behavior of the normal signal pattern for each input SNR level and thereby boost the performance [3]. The input fed to the encoder is then compressed into a low-dimensional representation by adopting 2 GRU layers composed of 64 and 32 cells, respectively, which capture the relevant sequential features modelling the normal state under different SNRs. Afterwards, the decoder reconstructs the output, given the compressed representation output by the encoder. The decoder is inversely symmetric to the encoder part. The exponential linear unit (ELU) is selected as the activation function for each hidden layer of the model. The cost function is set to the mean square error (MSE), which is minimized using the Adam optimizer.

[Fig. 4 diagram: input, GRU encoder, latent representation, GRU decoder, output.]

Fig. 4. Structure of the proposed model for fiber anomaly detection.

3.2 Fault Diagnosis and Localization Model

As the tasks, namely fault diagnosis $\boldsymbol{T}_1$ and fault position estimation $\boldsymbol{T}_2$, can benefit from feature-space sharing, the proposed model for fault diagnosis and localization is a multi-task learning framework with hard parameter sharing, adopted to learn the tasks simultaneously in order to enhance the generalization capability. The architecture of the proposed framework is composed of shared hidden layers distributing the knowledge across the tasks $\boldsymbol{T}_1$ and $\boldsymbol{T}_2$, followed by task-specific layers. The shared hidden layers adopt a combination of a bidirectional gated recurrent unit (BiGRU) network and attention mechanisms. The BiGRU is used to capture the sequential features characterizing each fault's pattern, whereas the attention mechanisms help the model concentrate on the most relevant features in order to improve the fault diagnosis and localization accuracy. As shown in Fig. 5, the input (i.e., the abnormal sample detected by the GRU-based autoencoder) is firstly fed to 2 BiGRU layers composed of 64 and 32 cells, respectively, to learn the relevant sequential features $[\boldsymbol{h}_1, \boldsymbol{h}_2, \dots, \boldsymbol{h}_{31}]$. Then, the attention layer assigns to each extracted feature $\boldsymbol{h}_i$ a weight (i.e., attention score) $\alpha_i$, which is calculated as follows:

$\boldsymbol{e}_i = \tanh(\boldsymbol{W}_h \boldsymbol{h}_i)$   (11)

$\alpha_i = \mathrm{softmax}(\boldsymbol{W}^T \boldsymbol{e}_i)$   (12)

where $\boldsymbol{W}_h$ and $\boldsymbol{W}$ denote weight matrices. The softmax is used to normalize $\alpha_i$ and to ensure that $\alpha_i \geq 0$ and $\sum_i \alpha_i = 1$. The computed weights $\alpha_i$ are aggregated to obtain a weighted feature vector (i.e., the attention context vector) $\boldsymbol{c}$, which captures the relevant information to improve the performance of the neural network model. $\boldsymbol{c}$ is computed as follows:

$\boldsymbol{c} = \sum_i \alpha_i \boldsymbol{h}_i$   (13)
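The attention computation of Eqs. (11)-(13) can be sketched in pure Python as follows. This is an illustrative toy, not the paper's implementation: the three feature vectors, the identity weight matrix, and the scalar scoring vector are made-up assumptions, and the softmax is applied across the sequence positions.

```python
import math

def attention(H, W_h, w):
    """Score each feature h_i (Eqs. 11-12), then build the context c (Eq. 13)."""
    # e_i = tanh(W_h h_i)
    E = [[math.tanh(sum(a * b for a, b in zip(row, h))) for row in W_h]
         for h in H]
    # scalar score w^T e_i, normalized with a softmax over positions i
    s = [sum(wi * ei for wi, ei in zip(w, e)) for e in E]
    m = max(s)                                  # subtract max for stability
    exp_s = [math.exp(v - m) for v in s]
    total = sum(exp_s)
    alpha = [v / total for v in exp_s]
    # c = sum_i alpha_i * h_i
    c = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return alpha, c

H = [[0.2, 0.1], [0.9, -0.3], [0.4, 0.4]]   # three toy BiGRU features h_1..h_3
W_h = [[1.0, 0.0], [0.0, 1.0]]              # toy weight matrix (identity)
w = [1.0, 0.5]                              # toy scoring vector
alpha, c = attention(H, W_h, w)
print([round(a, 3) for a in alpha], [round(v, 3) for v in c])
```

The weights are non-negative and sum to one, so the context vector is a convex combination of the BiGRU features, emphasizing the positions most indicative of a fault.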
Afterwards, $\boldsymbol{c}$ is transferred to two task-specific layers dedicated to solving the tasks of fault diagnosis ($T_1$) and fault localization ($T_2$), respectively, by leveraging the knowledge extracted by the attention-based BiGRU shared layers. The model is trained by minimizing the loss function formulated as:

$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{T_1} + \lambda_2 \mathcal{L}_{T_2}$   (14)

where $\mathcal{L}_{T_1}$ and $\mathcal{L}_{T_2}$ denote the losses of $T_1$ and $T_2$; the first one is the cross-entropy loss, whereas the second one is the regression loss (MSE). The loss weights $\lambda_1$ and $\lambda_2$ are hyperparameters to be tuned.

[Fig. 5 diagram: input layer, forward and backward BiGRU hidden layers, attention layer with context vector, task-specific layers, and output layer for fault type and fault location.]

Fig. 5. Structure of the proposed attention-based bidirectional gated recurrent unit model for fiber fault diagnosis and localization.

4. VALIDATION OF THE PROPOSED APPROACH

4.1 Experimental Data

To validate the proposed approach, the experimental setup shown in Fig. 6 is used. The setup records OTDR traces incorporating different types of fiber faults, namely fiber cut, fiber eavesdropping (fiber tapping), dirty connector and bad splice. To reproduce a real passive optical network environment, 4 couplers are employed. Optical components like connectors, a variable optical attenuator (VOA) and a reflector are utilized to model normal events in the fiber optic link. To vary the fiber bending pattern and thereby enhance the generalization capability of the ML model, the bend radius of the clip-on coupler is varied from 2.5 mm to 10 mm. Different bad splices with dissimilar losses are performed to create a varying bad-splicing fault pattern. The OTDR configuration parameters, namely the pulse width, the wavelength and the sampling time, are set to 10 ns, 1650 nm and 1 ns, respectively. From 62 up to 65,000 OTDR records are collected and averaged. Figure 7 shows an example of a recorded OTDR trace incorporating the different faults, whereas Fig. 8 illustrates the patterns of the investigated faults.

4.2 Data Preprocessing

The generated OTDR traces are segmented into sequences of length 30 and normalized. For each sequence, $\gamma$ is computed and assigned. For the training of the GRU-based autoencoder (GRU-AE), only the normal-state sequences, incorporating either no fault or normal events induced by the optical components, are considered, whereas for testing, both normal samples and faulty sequences incorporating an anomaly are used. For training GRU-AE, a dataset of 47,904 samples is built and split into a training (70%) and a test dataset (30%).

For training the attention-based BiGRU model, we consider only the faulty sequences. For each sequence, the fault type (fiber eavesdropping, bad splice, fiber cut, dirty connector) and the fault position, defined as the index within the sequence, are assigned. A dataset of 61,849 samples is used for training the fault diagnosis and localization ML model. This data is divided into a training (60%), a validation (20%) and a test dataset (20%).
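The segmentation and normalization steps of Section 4.2 can be sketched as follows. This is our simplified illustration: the min-max normalization and the synthetic trace are assumptions, since the paper only states that traces are "segmented into sequences of length 30 and normalized" without spelling out the normalization formula.

```python
def preprocess(trace, seq_len=30):
    """Segment an OTDR trace into fixed-length sequences and min-max
    normalize each one (assumed normalization scheme)."""
    sequences = []
    for start in range(0, len(trace) - seq_len + 1, seq_len):
        seq = trace[start:start + seq_len]
        lo, hi = min(seq), max(seq)
        span = (hi - lo) or 1.0    # avoid division by zero on flat segments
        sequences.append([(p - lo) / span for p in seq])
    return sequences

# Synthetic 90-point decaying "trace" standing in for real OTDR samples.
trace = [100.0 - 0.1 * i for i in range(90)]
seqs = preprocess(trace)
print(len(seqs), len(seqs[0]), min(seqs[0]), max(seqs[0]))
```

In the real pipeline each normalized sequence would additionally carry its computed SNR value γ, as described in Section 3.1.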

[Fig. 6 diagram: a passive optical network testbed built from couplers, a clip-on coupler (fiber tapping/bending), a variable optical attenuator (VOA), physical contact (PC), angled (APC) and open connectors, fiber patch cables, small form factor pluggable (SFP) transceivers and a reflector; a bad splice, a dirty connector and a fiber cut are induced at different distances along the link.]
Fig. 6. Experimental setup for generating OTDR data containing different faults induced at different locations in an optical network.
[Fig. 7 plot.]

Fig. 7. Example of an OTDR trace generated using the experimental setup shown in Fig. 6.

[Fig. 8 panels (a)-(d).]

Fig. 8. Patterns of the faults: (a) fiber tapping, (b) bad splice, (c) dirty connector (the second peak; the first and last peaks are induced by a PC connector and an open PC connector, respectively), (d) fiber cut.

4.3 Performance Assessment

4.3.1 Evaluation Metrics

The fault detection is modeled as a binary classification, whereby we distinguish between sequences with label "1: fault" (i.e., "positive") and "0: normal" (i.e., "negative"). We consider:

• true positives (TP): number of sequences of type "1" correctly classified with label "1";
• true negatives (TN): number of sequences of type "0" correctly classified with label "0";
• false positives (FP): number of sequences of type "0" misclassified with label "1";
• false negatives (FN): number of sequences of type "1" misclassified with label "0".

To assess the detection capability, the following metrics are adopted:

• Precision (P) quantifies the relevance of the predictions made by the ML model. It is expressed as:

$P = \frac{TP}{TP + FP}$

• Recall (R) provides the total relevant results correctly classified by the ML model. It is formulated as:

$R = \frac{TP}{TP + FN}$

• The F1 score is the harmonic mean of the precision and recall, calculated as:

$F1 = \frac{2PR}{P + R}$

4.3.2 Fault Detection Capability

The anomaly detection capability of GRU-AE is optimized by selecting an optimal threshold $\theta$. Figure 9 shows the precision, recall, and F1 score curves as functions of $\theta$. If the selected threshold is too low, many normal sequences will be classified as faulty, leading to a higher false positive ratio, whereas if the chosen threshold is too high, many faults will be classified as normal, resulting in a higher false negative ratio. Therefore, the threshold that ensures the best precision-recall tradeoff (i.e., maximizing the F1 score) is chosen. For the optimally selected threshold of 0.008, the precision, recall, and F1 scores are 96.9%, 96.86%, and 96.86%, respectively.

[Fig. 9 plot.]

Fig. 9. Optimal threshold selection based on the precision, recall and F1 scores yielded by GRU-AE.

The receiver operating characteristic (ROC) curve, illustrating the performance of the model at different threshold settings, shown in Fig. 10, proves that GRU-AE can distinguish very well between the normal and faulty classes by achieving a high area under the curve (AUC) (i.e., a measure of the degree of separability between the classes) of 0.98.
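The threshold selection described above can be sketched as a simple sweep over candidate values of θ, keeping the one that maximizes the F1 score. This is an illustrative sketch with made-up reconstruction errors and labels, not the authors' code; note the F1 definition 2PR/(P+R) from Section 4.3.1.

```python
def f1_at(threshold, scores, labels):
    """Precision, recall and F1 when flagging scores > threshold as faults."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Made-up reconstruction errors: normal samples (label 0) mostly score low,
# faults (label 1) mostly score high; one fault overlaps the normal range.
scores = [0.002, 0.004, 0.005, 0.007, 0.012, 0.020, 0.031, 0.006]
labels = [0,     0,     0,     0,     1,     1,     1,     1]
best_theta = max(sorted(set(scores)),
                 key=lambda t: f1_at(t, scores, labels)[2])
print(best_theta, f1_at(best_theta, scores, labels))
```

The sweep reproduces the tradeoff described in the text: lowering θ inflates false positives, raising it inflates false negatives, and the F1-maximizing value sits between the two score populations.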
[Fig. 10 plot.]

Fig. 10. The receiver operating characteristic curve of GRU-AE.

4.3.3 Optimization of GRU-AE

Network architectures of various sizes are evaluated to select the optimal one, achieving the best performance while ensuring a moderate complexity. The impact on the reconstruction error of GRU-AE of the parameters "depth" (i.e., the number of hidden layers), "width" (i.e., the number of cells in each hidden layer), and the activation function is investigated. The number of hidden layers with either 32 or 64 cells is firstly varied from 2 to 8. Different activation functions for the hidden layers, namely the rectified linear unit (ReLU), leaky ReLU, scaled exponential linear unit (SELU), and ELU, are analyzed. Figure 11 shows the output of each activation function for a given input. Several combinations of the number of cells for each hidden layer of the encoder network are tested.

[Fig. 11 plot.]

Fig. 11. The output of different activation functions.

As shown in Fig. 12(a), the reconstruction error of GRU-AE shows a decreasing trend with increasing depth for different numbers of cells per hidden layer, before reaching the optimum depth of 4. Increasing the depth helps the GRU-AE model capture more features modelling the normal behavior. However, increasing the depth beyond 6 can lead to overfitting and thus reduces the performance of GRU-AE. As illustrated in Fig. 12(b), adopting ELU as the activation function in the hidden layers achieves the smallest reconstruction error compared to the other functions. Figure 12(c) confirms that setting the number of cells for each layer of the encoder to 64 yields the best performance (lowest reconstruction error).

[Fig. 12 panels (a)-(c).]

Fig. 12. Optimization of the GRU-AE model: reconstruction errors (a) for different depths and different numbers of cells per layer, (b) for different activation functions in the hidden layers, (c) for several numbers of cells per hidden layer of the encoder.

4.3.4 Comparison of GRU-AE with other ML models

The GRU-AE model is compared to other unsupervised anomaly detection methods, namely isolation forest (IF), local outlier factor (LOF), and one-class support vector machine (OCSVM), in terms of the F1 score and the area under the precision-recall curve (AUPRC). IF isolates the outliers in the data by performing random partitions of the data observations, computing the split value between the maximum and minimum of the chosen instance; the path length, defined as the number of splits required to isolate each observation, is adopted as the anomaly score. LOF computes the anomaly score by measuring the local deviation of the density of a given instance with respect to its neighbors. OCSVM learns the decision boundary encompassing the normal data in the feature space; during the inference stage, any sample lying outside that boundary is considered an anomaly. IF, LOF, and OCSVM are trained with normal data, like the GRU-AE model, and tested with unseen data including both normal and abnormal samples. The results shown in Table 1 demonstrate that the GRU-AE model outperforms the other ML methods by yielding the highest F1 and AUPRC scores. Compared to the tested ML algorithms, GRU-AE provides significant improvements of more than 5.5% and 6.36% in AUPRC and F1 score, respectively.
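The AUPRC metric used in this comparison can be approximated from anomaly scores by sweeping the decision threshold over the ranked scores and summing precision weighted by recall increments. The sketch below is our own stdlib illustration under toy scores, not the authors' evaluation code; library implementations such as scikit-learn's `average_precision_score` use a similar step-wise sum.

```python
def auprc(scores, labels):
    """Area under the precision-recall curve via the step-wise sum
    AP = sum_n (R_n - R_{n-1}) * P_n, ranking high scores as anomalous."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:                      # lower the threshold one sample at a time
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Toy reconstruction errors: anomalies (1) mostly score higher than normals (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1,   1,   0,   1,   0,   0]
print(round(auprc(scores, labels), 3))
```

A perfect ranking (all anomalies above all normal samples) yields an AUPRC of 1.0, which is why the metric is a useful summary for the unbalanced datasets discussed in Section 1.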
TABLE I
COMPARISON OF DIFFERENT ANOMALY DETECTION ML METHODS IN TERMS
OF AREA UNDER THE PRECISION RECALL CURVE (AUPRC) AND F1 SCORE.
THE BEST RESULT IS SHOWN IN BOLD.

Method F1 score (%) AUPRC (%)


OCSVM 49.8 94.1
IF 86.9 71.5
LOF 90.5 61.7
GRU-AE 96.86 99.6

4.3.5 Fault Diagnosis Capability

The confusion matrix shown in Fig. 13 proves that the attention-based BiGRU model (A-BiGRU) diagnoses the different faults with an accuracy higher than 97%, and accurately distinguishes the physical fiber attack by achieving an accuracy of 98%. As the patterns of eavesdropping and bad splice faults look similar for low-SNR sequences, the ML model slightly misclassifies these two classes. The same applies to the dirty connector and fiber cut patterns under low SNR conditions, leading to low misclassification rates.

Fig. 13. The confusion matrix of the A-BiGRU model.

Figure 14 shows the effect of the SNR on the diagnosis accuracy of A-BiGRU. The accuracy increases with the SNR. For SNR values higher than 10 dB, the accuracy approaches 100%. For an SNR lower than 2 dB, the accuracy is worse, as it is very difficult to differentiate the faults: the noise adversely impacts the fault patterns, which may look similar under very low SNR levels. For input sequences with an SNR higher than 2 dB, A-BiGRU discriminates the different types of faults with a good accuracy higher than 90.5%.

Fig. 14. The diagnosis accuracy of the A-BiGRU model.

The feature learning ability of A-BiGRU under very low SNR conditions (SNR ≤ 5 dB) for solving the task 𝑇1 is visually investigated using the t-distributed stochastic neighbor embedding (t-SNE) technique [15]. Figure 15 shows that, first, the features learned at SNR levels lower than 1 dB separate very poorly, and A-BiGRU misclassifies most of the faults as fiber cut, mainly because the high noise overwhelms the patterns and makes the different fault patterns look alike; second, the extracted features become more and more discriminative as the SNR increases; and third, A-BiGRU can learn effective features for accurate fault diagnosis even for SNR conditions higher than 2 dB.

Fig. 15. Visualization of the feature learning under low SNR conditions: (a) 0 dB SNR, (b) 1 dB SNR, (c) 2 dB SNR, (d) 5 dB SNR.

Fig. 16. Histogram of position prediction errors yielded by the A-BiGRU model.
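The t-SNE inspection described above can be sketched as follows. The Gaussian stand-in features and the two noise levels are hypothetical substitutes for the A-BiGRU activations at low and high SNR; only the embedding step itself uses the real scikit-learn t-SNE implementation.

```python
# Sketch of the t-SNE inspection of learned features: embed high-dimensional
# feature vectors into 2-D and check class separability at different noise
# levels (synthetic stand-in features, not the A-BiGRU activations).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
n_classes, per_class, dim = 4, 50, 16

# Class-dependent feature centers; noise_std plays the role of low SNR.
centers = rng.standard_normal((n_classes, dim)) * 3.0
for noise_std in (3.0, 0.5):                     # heavy noise vs. mild noise
    feats = np.vstack([c + noise_std * rng.standard_normal((per_class, dim))
                       for c in centers])
    emb = TSNE(n_components=2, perplexity=30,
               init="pca", random_state=0).fit_transform(feats)
    print(noise_std, emb.shape)                  # one 2-D point per sample
```

Plotting `emb` colored by class would reproduce the qualitative picture in Fig. 15: overlapping clusters under heavy noise and well-separated clusters under mild noise.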
4.3.6 Fault Localization Capability

To assess the fault localization accuracy of the A-BiGRU model, two evaluation metrics are adopted: the prediction error, i.e., the difference between the predicted and the actual position of the fault, and the root mean square error (RMSE). Figure 16 shows that A-BiGRU achieves very small prediction errors, with a mean of 0.05 m and a standard deviation of 0.2 m, which proves that the ML model accurately localizes the faults. We also analyzed the fault localization performance of A-BiGRU as a function of the SNR. As depicted in Fig. 17, A-BiGRU localizes the faults accurately, achieving an average RMSE of 0.19 m, and the RMSE decreases with the SNR. For low SNR values (SNR ≤ 0 dB), the RMSE can be higher than 0.35 m, whereas for SNR values higher than 13 dB it is less than 0.2 m, and it is further reduced to less than 0.1 m for SNR values higher than 27 dB.

Fig. 17: Fault position estimation error (RMSE) for the ML model.

4.3.7 Comparison of BiGRU with other existing ML approaches

The BiGRU is compared to two baseline ML models recently proposed for fiber fault diagnosis and localization, namely BiLSTM-CNN [6] and BiLSTM [7]. For the sake of a fair comparison, BiLSTM-CNN and BiLSTM are retrained with the same training data as BiGRU, and their structures are adjusted to solve only the two learning tasks 𝑇1 and 𝑇2. As for the architectures, BiLSTM is composed of one BiLSTM layer with 32 cells followed by task-specific layers of 16 and 20 neurons, respectively. In contrast, the BiLSTM-CNN model consists of one BiLSTM layer with 32 cells followed by CNN layers (mainly one convolutional layer with 32 filters and a max pooling layer, succeeded by a dropout layer) and two task-specific layers of 16 and 20 neurons, respectively. The length of the input sequence of both models is set to 30. We compare the different models using two evaluation metrics: the average diagnostic accuracy to assess the fault diagnosis capability, and the average RMSE to evaluate the fault localization performance. Figure 18 (a) shows that the proposed model outperforms the existing methods, achieving an improvement of 8.8% in accuracy due to the inclusion of the SNR during the training phase and the adoption of the attention mechanism to capture the relevant features and thereby boost the performance. Figure 18 (b) shows that the proposed method achieves the lowest RMSE compared to the other models.
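For reference, the two localization metrics used above can be computed as below; the fault positions are made-up values, not the paper's measurements.

```python
# Minimal computation of the two localization metrics: the per-sample
# position prediction error and the RMSE (illustrative values only).
import numpy as np

true_pos = np.array([105.2, 2301.7, 480.0, 1750.3])    # metres (made up)
pred_pos = np.array([105.3, 2301.5, 480.2, 1750.1])

err = pred_pos - true_pos                               # prediction error
rmse = np.sqrt(np.mean(err ** 2))                       # root mean square error
print(f"mean error = {err.mean():.3f} m, "
      f"std = {err.std():.3f} m, RMSE = {rmse:.3f} m")
```

The RMSE-versus-SNR curve of Fig. 17 is obtained by grouping the test samples by SNR bin and applying the same RMSE formula per bin.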
Fig. 18: Comparison between the proposed model and existing ML methods in terms of: (a) average diagnostic accuracy and (b) average RMSE.

The computational inference times of the proposed model and of the existing methods are compared in Table 2. As can be seen, the proposed model consumes slightly more time than the existing methods due to its deeper architecture.

TABLE II
COMPUTATIONAL TIME OF THE PROPOSED MODEL AND EXISTING METHODS.
THE BEST RESULT IS SHOWN IN BOLD.

Method        Inference time (12,370 samples)
BiLSTM        1.06 ± 0.03 s
BiLSTM-CNN    1.18 ± 0.14 s
BiGRU         2.1 ± 0.13 s

4.3.8 Integrated learning approach GRU-AE-BiGRU

The performance of the integrated approach combining the GRU-AE and BiGRU models, called model A, is compared, in terms of the average accuracy metric, to a BiGRU model trained to discriminate between the normal state and the different types of faults without the autoencoder, denoted model B. Model B is trained with data including both normal and faulty samples covering the different fault classes. Figure 19 shows the confusion matrix obtained by model B. As can be noticed, it is hard for the model to distinguish between the normal class and some of the faults, namely bad splices and eavesdropping events, due to the similarity of their patterns under low SNR conditions.

Fig. 19. The confusion matrix of model B.

Table 3 proves that model A outperforms model B in terms of accuracy with an improvement of 5.1%, which demonstrates the importance of adopting the autoencoder to discriminate the normal class from the faulty classes, thereby enhancing the fault diagnosis capability of BiGRU and reducing the false alarm rate.

TABLE III
COMPARISON OF ML MODELS IN TERMS OF AVERAGE ACCURACY. THE BEST
RESULT IS SHOWN IN BOLD.

Method                       Average accuracy (%)
Model A (GRU-AE + BiGRU)     96.9
Model B (without GRU-AE)     91.8

4.3.9 Investigation of the Robustness of BiGRU

Given that the BiGRU model is trained with data incorporating faults induced at fixed locations of the network, we assess the robustness of the proposed method by modifying the locations of the different faults, as shown in Fig. 20, and testing the performance of BiGRU on the new data generated with this modified experimental setup.

Fig. 20: Modified experimental setup for testing the robustness of the BiGRU model. For a legend of symbols, please refer to Fig. 6.

Tested with that data, the BiGRU achieves a good fault diagnosis capability, yielding an accuracy of 96.8%, which proves that the ML model effectively learns the different types of faults and is thus capable of diagnosing them at different locations of the network.
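The two-stage idea of model A, an anomaly gate followed by a fault classifier, can be sketched as below. Here a linear principal-subspace reconstruction (in the spirit of the autoassociative networks of [13]) stands in for the GRU-AE, and a nearest-centroid rule stands in for the A-BiGRU; all data, thresholds, and centroids are illustrative assumptions, not the paper's models.

```python
# Sketch of the model-A pipeline: an autoencoder-style gate first flags a
# trace as anomalous (a linear PCA reconstruction stands in for the GRU-AE),
# and only flagged traces are passed to the fault classifier (a
# nearest-centroid stand-in for the A-BiGRU).
import numpy as np

rng = np.random.default_rng(2)
dim, n_train = 20, 300

normal = rng.standard_normal((n_train, dim)) * 0.1 + 1.0

# "Train" the gate: principal subspace of normal data + error threshold.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
basis = vt[:5]                                   # top-5 principal components

def recon_error(x):
    z = (x - mean) @ basis.T                     # project onto the subspace
    return np.linalg.norm(x - (mean + z @ basis), axis=-1)

thresh = np.quantile(recon_error(normal), 0.99)  # ~1% false alarms on normal

fault_centroids = {"fiber cut": np.full(dim, 3.0),
                   "tapping": np.full(dim, -2.0)}

def model_a(x):
    if recon_error(x[None])[0] <= thresh:
        return "normal"                          # gate: no anomaly detected
    # classify only anomalous traces, as in the integrated approach
    return min(fault_centroids,
               key=lambda k: np.linalg.norm(x - fault_centroids[k]))

print(model_a(normal[0]))                        # a training-like trace
print(model_a(np.full(dim, 3.1)))                # a cut-like trace
```

The gate keeps normal traffic away from the classifier, which is the mechanism the comparison with model B credits for the lower false alarm rate.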
5. CONCLUSION

An ML-based approach for fiber fault detection, identification, and localization has been proposed. The presented framework includes an autoencoder model to detect fiber anomalies and an attention-based bidirectional gated recurrent unit model to recognize the detected fiber faults and localize them. The effectiveness of the proposed approach is validated using OTDR data incorporating various faults, including fiber cuts and optical eavesdropping attacks. The experimental results prove that the presented framework achieves a good fault detection and diagnosis capability and a high localization accuracy. Our experiments show that ML techniques can enhance the performance of anomaly detection, fault diagnosis, and localization in fiber monitoring, and thereby minimize false positive alarms, saving time and maintenance costs. In our future work, we plan to extend the framework to a deployed optical network, which is usually operated in a very complex, sophisticated, intelligent, and autonomous environment.

Acknowledgements. This work has been performed in the framework of the CELTIC-NEXT project AI-NET-PROTECT (Project ID C2019/3-4), and it is partly funded by the German Federal Ministry of Education and Research (FKZ16KIS1279K).

REFERENCES
[1] W. et al., “Identification method of non-reflective faults based on index distribution of optical fibers,” Opt. Express, 22(1): 325–337, 2014.
[2] O. Nyarko-Boateng et al., “Predicting the actual location of faults in underground optical networks using linear regression,” Engineering Reports, 2021.
[3] K. Abdelli et al., “Reflective fiber fault detection and characterization using long short-term memory,” IEEE/OSA J. Opt. Commun. Netw., vol. 13, no. 10, October 2021.
[4] K. Abdelli et al., “Reflective Event Detection and Characterization in Fiber Optical Links Given Noisy OTDR Signals,” Photonic Networks; 22nd ITG Symposium, 2021.
[5] W. Zhang et al., “A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load,” Mechanical Systems and Signal Processing, vol. 100, 2018.
[6] K. Abdelli et al., “BiLSTM-CNN based Multitask Learning Approach for Fiber Fault Diagnosis,” OFC, 2021.
[7] K. Abdelli et al., “Optical Fiber Fault Detection and Localization in a Noisy OTDR Trace Based on Denoising Convolutional Autoencoder and Bidirectional Long Short-Term Memory,” Journal of Lightwave Technology, doi: 10.1109/JLT.2021.3138268.
[8] A. A. A. Bakar et al., “A new technique of real-time monitoring of fiber optic cable networks transmission,” Optics and Lasers in Engineering, 45: 126–130, 2007.
[9] C. K. Chan et al., “Fiber-fault identification for branched access networks using a wavelength-sweeping monitoring source,” IEEE Photon. Technol. Lett., 11: 614–616, 1999.
[10] V. Noronha, “Networks, Security and Complexity: The Role of Public Policy in Critical Infrastructure Protection - by Sean P. Gorman,” 2006.
[11] K. Shaneman and S. Gray, “Optical network security: technical analysis of fiber tapping mechanisms and methods for detection & prevention,” IEEE MILCOM 2004, pp. 711–716, Vol. 2, 2004.
[12] S. K. Miller, “Hacking at the Speed of Light,” Security Solutions Magazine, April 2006.
[13] M. A. Kramer, “Nonlinear principal component analysis using autoassociative neural networks,” AIChE Journal, 37(2): 233–243, 1991.
[14] K. Cho et al., “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation,” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1724–1734, 2014.
[15] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, 9: 2579–2605, 2008.