
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 39, NO. 5, MARCH 1, 2021

Model-Aware Deep Learning Method for Raman Amplification in Few-Mode Fibers

Gianluca Marcon, Andrea Galtarossa, Fellow, IEEE, Luca Palmieri, Member, IEEE, and Marco Santagiustina, Member, IEEE

Abstract—One of the most promising solutions to overcome the capacity limit of current optical fiber links is space-division multiplexing, which allows transmission on the various cores of multi-core fibers or the modes of few-mode fibers. In order to realize such systems, suitable optical fiber amplifiers must be designed. In single-mode fibers, Raman amplification has shown significant advantages over doped-fiber amplifiers thanks to its low noise and spectral flexibility. For these reasons, its use in next-generation space-division multiplexing transmission systems is being studied extensively. In this work, we propose a deep learning method that uses automatic differentiation to embed a complete few-mode Raman amplification model in the training process of a neural network, in order to identify the optimal pump wavelengths and power allocation scheme for designing both flat and tilted gain profiles. Compared to other machine learning methods, the proposed technique allows the neural network to be trained on ideal gain profiles, removing the need to compute a dataset that accurately covers the space of Raman gains of interest. The ability to directly target a selected region of the space of possible gains allows the method to be easily generalized to any type of Raman gain profile, while also being more robust when the number of pumps, the number of modes, or the amplification bandwidth increases. The approach is tested on a 70 km long 4-mode fiber transmitting over the C+L band with various numbers of Raman pumps in the counter-propagating scheme, targeting gain profiles with an average gain in the interval from 5 dB to 15 dB and a total tilt in the interval from −1.425 dB to 1.425 dB. We achieve wavelength- and mode-dependent gain fluctuations lower than 0.04 dB and 0.02 dB per dB of gain, respectively.

Index Terms—Raman amplification, space-division multiplexing, deep learning.

Manuscript received July 9, 2020; revised October 2, 2020; accepted October 25, 2020. Date of publication October 29, 2020; date of current version March 1, 2021. This work was supported in part by the Italian Ministry for Education, University and Research (MIUR) through law 232/2016—"Departments of Excellence" and PRIN 2017—project 2017HP5KH7: Fiber Infrastructure for Research on Space-division multiplexed Transmission (FIRST), and in part by the University of Padova through BIRD 2019—project MACFIBER. (Corresponding author: Gianluca Marcon.)

Gianluca Marcon is with the Department of Information Engineering, University of Padova, 35122 Padova, Italy (e-mail: [email protected]).

Andrea Galtarossa, Luca Palmieri, and Marco Santagiustina are with the Department of Information Engineering, University of Padova, 35122 Padova, Italy, and also with CNIT – National Inter-University Consortium for Telecommunications, 56127 Pisa, Italy (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JLT.2020.3034692

I. INTRODUCTION

Nonlinear phenomena arising in optical fibers impose an intrinsic limit to their information capacity [1], [2]. During the last three decades, the demand for internet traffic has increased exponentially at an annual rate of 40%, while current technologies are rapidly approaching the nonlinear Shannon limit (NSL) of single-mode fibers (SMFs) [3]. In order to avoid bringing the existing optical fiber infrastructure to a "capacity crunch" [4], space-division multiplexing (SDM) has been proposed as the key technology for future lightwave systems operating beyond the NSL [5].

A promising approach to implement SDM is to exploit the spatial diversity of modes in few-mode fibers (FMFs) to transmit independent data streams, thus realizing mode-division multiplexing (MDM) [6]. In order to benefit from the added capacity of spatially multiplexed transmissions, suitable network devices must be designed that are fully compatible with already well-established techniques such as wavelength-division multiplexing (WDM). To this end, the role of SDM-compatible amplifiers is of fundamental importance, with several experimental works demonstrating the effectiveness in MDM scenarios of both erbium-doped fiber amplifiers (EDFAs) [7], [8] and Raman amplifiers (RAs) [9], [10].

The compensation of link losses with minimal signal-to-noise ratio (SNR) reduction has always been a crucial aspect of optical communications, but additional care must be taken in SDM systems to minimize both mode-dependent gain (MDG) and wavelength-dependent gain (WDG), as both can be detrimental to the multiple-input multiple-output (MIMO) digital signal processing (DSP) algorithms that mitigate the effect of mode crosstalk to correctly recover the transmitted signals [11].

While the simplicity and power efficiency of EDFAs have made them appealing for commercial communication systems, their reduced gain bandwidth has made Raman amplification an attractive solution for wideband WDM schemes [12]. The spectral flexibility of RAs, together with suitable optimization techniques, enables the design of flat gain profiles over large bandwidths by means of multiple wavelength pumps [12]. In the context of SDM, the additional degrees of freedom can lead to higher control of WDG and MDG [13]. Additionally, RAs can offer distributed amplification, resulting in a reduced noise contribution compared to EDFAs [14].

In SMF systems, different approaches have been followed to correctly determine the pump parameters required to obtain pre-determined gain profiles. A recent publication [15] proposed a machine learning (ML) technique to solve this problem. Specifically, a neural network (NN) can be trained to learn the inverse relationship y = f⁻¹(G) between the vector y of pump

0733-8724 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

wavelengths and powers and the corresponding gain profile G, using a synthetic dataset D = {(y_i, G_i)} of thousands of gain curves generated with random pump parameters. The learned mapping is then used to compute the required pump parameters ỹ = f⁻¹(G_target) that approximate a given target gain profile. This eliminates the need to run complex iterative algorithms requiring multiple integrations of the propagation equations for every new target profile, making Raman amplification suitable for application in next-generation self-adaptive and autonomous optical networks, where low-latency automation is fundamental [15]. The authors of [15] used two additional techniques to refine the prediction of the NN. The first is model-averaging, which consists of training several NNs in parallel, each on a random permutation of the dataset, and finally averaging their outputs. This approach, while providing some improvement, is significantly heavier in terms of computational time, both for the training and the inference phase. This aspect, together with the memory needed to store hundreds of trained models, could pose a challenge for network controllers, where computational power may be limited. The second technique consists of a fine-tuning phase requiring an additional NN trained to learn the direct mapping G = f(y). The prediction error on the gain profile obtained with the approximate pump parameters ỹ is estimated using the learned direct mapping f̃ and minimized with an iterative gradient-descent algorithm, without integrating the propagation equations. Publication [15] showed promising results, demonstrating the feasibility of the method with flat and tilted gain profiles using a counter-propagating RA over the C and C+L bands, and achieving a maximum prediction error on the considered gain profiles well below 1 dB for different levels of amplification.

In the context of MDM, a similar approach to designing flat gain profiles for both 2-mode and 4-mode fibers has been demonstrated in [16]. That work uses neither model-averaging nor fine-tuning algorithms; therefore, memory and time requirements for both the training and the inference phase are substantially reduced. For the 4-mode FMF, [16] showed encouraging results in terms of MDG and gain flatness; nevertheless, the analysis is limited to the C band only.

The main drawback of both methods with respect to iterative optimization algorithms is that, while the latter specifically minimize a cost function C(G_target, G̃) between the desired and predicted gain profiles by taking the propagation model into account, the former are instead optimized to minimize a cost function C(y, ỹ) between pump parameters. The NNs are thus unaware of the underlying mathematical and physical relations between pump parameters and gain profile, which have to be learned from the available data. In order to approximate the inverse function y = f⁻¹(G) using a NN and generate flat gain profiles, the region of the space of approximately flat Raman gains must be properly sampled. This cannot be easily achieved, since the training dataset is generated by solving the Raman equations with randomly drawn pump powers and wavelengths, meaning that only the codomain of f⁻¹(·) is sampled with full control. As a result, only a minor part of the generated gains falls inside the region of interest, resulting in the NNs being trained to learn the inverse function on a much larger domain than required, potentially hindering their performance on flat/tilted gains. This aspect becomes even more problematic when the amplification bandwidth or the number of modes and pumps increases, as the dimensionality of the space to explore also grows. The choice of parameters for the generation of the dataset is also critical for the effectiveness of the methods presented in [15], [16]. For example, the powers and wavelengths of the pumps are selected a priori, which requires preliminary supervision and can ultimately mean that the trained NNs are not able to predict the optimal pump parameters.

Owing to automatic differentiation (AD) techniques [17], analytical or numerical models describing dynamical systems can be embedded in ML architectures [18]. By recording the series of elementary operations performed on the model input in a computational graph, AD libraries such as PyTorch [19] can compute the exact derivatives of the model output with respect to any parameter to be optimized [20]. In the context of optical communications, this approach has been shown to enable end-to-end (E2E) optimization of an intensity modulation/direct detection system by jointly optimizing the transmitter and receiver using NNs, outperforming classical feed-forward equalization [21]. The effectiveness of this technique has also been demonstrated for coherent transmissions [22], where probabilistic and geometric constellation shaping have proven fundamental for achieving record spectral efficiencies in short- and long-haul experiments [23].

In this work we propose an unsupervised ML method which employs AD to embed a differentiable FMF Raman amplification model in the training procedure of a NN, in order to predict the pump parameters able to generate flat and tilted gain profiles over a pre-determined range of amplification levels and gain tilts. The trained NN can then be used to obtain the required pump parameters for a desired gain profile with low time complexity. The presented method has the advantage of training the NN directly on the desired (e.g., flat and tilted) gain profiles, thereby directly sampling the selected region of the space of possible gains, instead of building a dataset by solving the Raman equations with random pump parameters. The supervised dataset-design phase, along with the issues related to it, is thus completely avoided, with the relationship between target gain and pump parameters being learned in the training phase of the NN through the differentiable Raman model. The ability to directly target an arbitrary region of the space of Raman gains makes this method easily generalizable to any type of gain profile, and more robust and scalable with respect to changes in the number of modes, the number of Raman pumps, and the fiber parameters. For all these reasons, this unsupervised method is expected to be more useful in self-adaptive networks. The method is validated on different 4-mode fibers using a counter-propagating scheme with various numbers of Raman pumps, up to 8, predicting the pump powers and wavelengths required to generate gain profiles in the C+L band with average gain and tilt in the intervals from 5 dB to 15 dB and from −1.425 dB to 1.425 dB, respectively; the results show MDG and gain flatness comparable to those reported in [16], but over a larger bandwidth, and quantify the advantage of a higher
MARCON et al.: MODEL-AWARE DEEP LEARNING METHOD FOR RAMAN AMPLIFICATION IN FMFs 1373

number of pumps also in terms of the achieved root-mean-square error (RMSE).

II. PROPOSED METHOD

A. Multi-Mode Raman Amplifier Equations

In a few-mode RA supporting M modes, N_s signal wavelengths, and N_p pump wavelengths, the power evolution of the i-th frequency propagating in the m-th mode is described by the following set of nonlinear ordinary differential equations [24], [25]:

$$\eta_i \frac{dP_i^m}{dz} = -\alpha_i P_i^m + P_i^m \sum_{j=i+1}^{N_s+N_p} \sum_{n=1}^{M} I_{m,n}\, g_R(|f_i - f_j|)\, P_j^n - P_i^m \sum_{j=1}^{i-1} \sum_{n=1}^{M} \frac{f_i}{f_j}\, I_{m,n}\, g_R(|f_i - f_j|)\, P_j^n, \qquad (1)$$

where P_i^m is the power in the m-th mode at the i-th frequency, with i ∈ {1, . . . , N_s + N_p}, m ∈ {1, . . . , M}, and the frequencies f_i assumed to be sorted in ascending order; α_i is the attenuation coefficient at the i-th frequency, g_R(Δf) is the Raman gain coefficient for the frequency difference Δf, and I_{m,n} is the overlap integral between mode m and mode n, defined by

$$I_{m,n} = \frac{\iint_{-\infty}^{+\infty} I_m(x,y)\, I_n(x,y)\, dx\, dy}{\iint_{-\infty}^{+\infty} I_m(x,y)\, dx\, dy \;\iint_{-\infty}^{+\infty} I_n(x,y)\, dx\, dy}, \qquad (2)$$

where I_k(x, y) is the intensity profile of the k-th mode. The overlap integrals are assumed to be wavelength independent. Finally, η_i determines the relative propagation direction of the i-th frequency: for the counter-propagating pumps η_i = −1, ∀i ∈ {N_s + 1, . . . , N_s + N_p}, whereas η_i = 1 for the first N_s frequencies.

Modes with similar propagation constants, i.e., those within the same mode group, exhibit high coupling efficiency, resulting in the equalization of the amplifier gain within that particular group. For the purposes of Raman amplification they can consequently be treated as a single mode [25], [26]. Conversely, linear mode coupling between different mode groups is weak and will be neglected here, as in [16], [25], [26].

For a fiber of length L, the Raman on-off gain G = [G_i^m] of the amplifier is defined by

$$G_i^m = \frac{P_i^m(z = L)\ \text{with pumps turned on}}{P_i^m(z = L)\ \text{with pumps turned off}}, \qquad (3)$$

where i = 1, . . . , N_s and m = 1, . . . , M.

B. Deep Learning Model Architecture

Many of the E2E learning methods in the literature are based on a deep learning (DL) architecture called the autoencoder (AE) [27]. An AE is composed of two main blocks: an encoder,

$$E(\,\cdot\,; \theta_e): \mathbb{R}^p \to \mathbb{R}^q, \qquad (4)$$

and a decoder,

$$D(\,\cdot\,; \theta_d): \mathbb{R}^q \to \mathbb{R}^p, \qquad (5)$$

where θ_e, θ_d are learnable parameters and q < p. The role of the encoder is to learn a lower-dimensional representation x̂ of its input data x in a way that enables the decoder to compute an estimate x̃ of the original data from x̂:

$$\tilde{x} = D(E(x; \theta_e); \theta_d). \qquad (6)$$

Typically, both E and D consist of NNs that are jointly trained to minimize the average of the cost function C_AE between the original and reconstructed samples of a dataset X = {x_i}:

$$\theta_e^*, \theta_d^* = \underset{\theta_e, \theta_d}{\arg\min}\; \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} C_{AE}\big(D(E(x; \theta_e); \theta_d),\, x\big). \qquad (7)$$

By replacing the decoder D with a differentiable Raman model R that maps a vector of pump powers and wavelengths

$$y = [\lambda \,|\, P] \in \mathbb{R}_+^{(M+1)N_p}, \qquad (8)$$

to the corresponding on-off gain, we can train the AE using (7) on a dataset X = {G_i} of gain curves, forcing the encoder NN to learn a low-dimensional representation that minimizes the reconstruction error through R. That is, the trained encoder approximates the inverse of the Raman model,

$$E(\,\cdot\,; \theta_e^*) \approx R^{-1}(\,\cdot\,), \qquad (9)$$

meaning that the lower-dimensional representation of the input gain G is the vector y of pump powers and wavelengths that approximates it.

While the numerical integration of the Raman model R is still required in the forward pass of the training process to compute (7), this computational cost is no longer needed to determine the pump parameters that approximate a target gain profile, which are directly obtained using E(·; θ_e*).

In this work, the encoder E is a feed-forward (FF) NN with N_h hidden, fully connected (FC) layers of N_n neurons and rectified linear unit (ReLU) activation functions. The input and output layers have sizes N_s × M and N_p × (M + 1), respectively.

In order to enforce a constraint on the predicted pump parameter vector y, a sigmoidal function

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (10)$$

is used to limit the output x of the last FC layer of the NN to the open interval (0, 1). The resulting normalized pump vector ŷ can then be linearly mapped to the desired interval of powers and wavelengths.

The decoder R consists of a fixed-step, fourth-order Runge-Kutta integrator that solves (1) to compute the on-off gain using the pump parameters generated by the encoder.
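To make the model concrete, the right-hand side of (1) and one fixed-step fourth-order Runge-Kutta step of the kind used by the decoder R might be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the (N, M) tensor layout, the function names, and the constant-g_R stand-in used in the usage note are ours.

```python
import torch

def raman_rhs(P, eta, alpha, f, I_mn, g_R):
    """Right-hand side of (1). P has shape (N, M) for N = Ns + Np
    frequencies (sorted ascending) and M modes; eta and alpha have
    shape (N,); I_mn has shape (M, M); g_R maps |f_i - f_j| to the
    Raman gain coefficient."""
    N, _ = P.shape
    G = g_R((f[None, :] - f[:, None]).abs())          # (N, N) gain coefficients
    up = torch.triu(torch.ones(N, N), diagonal=1)     # j > i: gain from higher frequencies
    down = torch.tril(torch.ones(N, N), diagonal=-1)  # j < i: depletion toward lower ones
    ratio = f[:, None] / f[None, :]                   # f_i / f_j quantum-defect factor
    K = G * (up - down * ratio)
    coupling = K @ (P @ I_mn.T)                       # sums over j and n of I_{m,n} g_R P_j^n
    return (-alpha[:, None] * P + P * coupling) / eta[:, None]

def rk4_step(P, h, *args):
    """One fixed-step fourth-order Runge-Kutta step, as in the decoder R."""
    k1 = raman_rhs(P, *args)
    k2 = raman_rhs(P + 0.5 * h * k1, *args)
    k3 = raman_rhs(P + 0.5 * h * k2, *args)
    k4 = raman_rhs(P + h * k3, *args)
    return P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

With a constant g_R stand-in and two co-propagating frequencies, the lower frequency gains power while the higher one is depleted by the extra f_i/f_j factor, consistent with photon-number conservation.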

C. Training Algorithm

The optimal encoder parameters, θ_e*, are found by solving (7) with an iterative training algorithm, using the RMSE between

target and approximated gain as a cost function:

$$C_{AE}(G, \tilde{G}) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{RMSE}_i\!\left(G_i^m, \tilde{G}_i^m\right), \qquad (11)$$

where the RMSE is computed over the signal indices i = 1, . . . , N_s. In the k-th iteration of the training algorithm, the AE reconstruction of each curve in the dataset X is computed as

$$\tilde{G} = R\big(E(G; \theta_e(k))\big) \quad \forall G \in \mathcal{X}, \qquad (12)$$

where θ_e(k) are the encoder parameters at the current iteration. The total cost function for the iteration is then evaluated by averaging (11) over X:

$$C(k) = \frac{1}{|\mathcal{X}|} \sum_{G \in \mathcal{X}} C_{AE}(G, \tilde{G}). \qquad (13)$$

Finally, the encoder parameters are updated with a gradient descent algorithm,

$$\theta_e(k+1) = \theta_e(k) - \epsilon\, \nabla_{\theta_e(k)} C(k), \qquad (14)$$

where ε > 0 is the learning rate (LR) of the algorithm. The exact computation of the gradients is performed by means of AD and backpropagation [27]. Advanced optimization algorithms such as the adaptive moment estimation (Adam) algorithm [28] are typically employed for the update step (14), as they offer robust convergence properties and adaptive LR schemes for each parameter.

During training, the relationship between input target gains and their respective pump powers and wavelengths is learned through the differentiable Raman solver R, meaning that the vector of pump parameters y_i associated with each gain profile G_i of the dataset is not needed. This fact can be exploited by completely bypassing the dataset generation phase and training the encoder on the targeted family of ideal gain profiles. In this paper we focus on flat and tilted gains, so in each training iteration k we generate a batch B_k = {G_i}_{i=1}^{B} of B ideal gain profiles with average gain level l_G and tilt t_G (gain variation per unit wavelength) randomly sampled from uniform distributions:

$$l_G \sim \mathcal{U}\!\left(l_G^{\min}, l_G^{\max}\right), \qquad (15)$$

$$t_G \sim \mathcal{U}\!\left(t_G^{\min}, t_G^{\max}\right). \qquad (16)$$

It is important to note that this approach is completely general and not limited to flat and tilted gains: it could be extended to other families of gain profiles by properly including them in the training data.
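Since only the target profiles are needed, the batch of (15)-(16) can be drawn directly; one possible sketch in PyTorch (the function name, the C+L band grid, and the choice of centering the tilt on the mean wavelength are our assumptions):

```python
import torch

def generate_batch(B, M, wavelengths, l_min=5.0, l_max=15.0,
                   t_min=-0.015, t_max=0.015):
    """Sample B ideal target gain profiles (in dB) per (15)-(16): average
    level l_G ~ U(l_min, l_max) in dB and tilt t_G ~ U(t_min, t_max) in
    dB/nm, with the same target applied to all M modes."""
    l_G = torch.empty(B, 1, 1).uniform_(l_min, l_max)
    t_G = torch.empty(B, 1, 1).uniform_(t_min, t_max)
    centered = wavelengths - wavelengths.mean()    # tilt around the band center
    profiles = l_G + t_G * centered[None, None, :]
    return profiles.expand(B, M, wavelengths.numel())

# 50 channels over the C+L band (grid values assumed for illustration)
wavelengths = torch.linspace(1530.0, 1625.0, 50)
batch = generate_batch(1024, 4, wavelengths)       # shape (1024, 4, 50)
```

Centering the tilt keeps l_G equal to the average gain of each profile, so the sampled batch covers exactly the targeted region of gains.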
As detailed in Section I, in supervised learning techniques such as those presented in [15] and [16], the underlying physical model is only described by and learned from the provided data, meaning that it is essential to use datasets that are representative of the problem. In the context of RA, this means that the dataset must properly sample the region of possible Raman gains containing approximately flat gain profiles in order for the NN to properly learn the inverse Raman model. This cannot be done efficiently or easily, as there is no direct control on which gain profiles are sampled, but only on the power and wavelength of each pump. The presented approach instead avoids this issue by directly sampling the selected space of Raman gains. Consequently, the problem of overfitting is completely avoided, and regularization techniques are not required.

D. Initial Conditions

When training the AE using the algorithm described above, we face the problem of local minima, which is common when dealing with the optimization of many parameters with complex cost functions. An important aspect to consider when dealing with local minima is the initial condition of the algorithm, which can significantly affect the outcome of the optimization problem.

Fig. 1. Mean (a) and variance (b) of the output of the NN during the first training iteration, as a function of the number of hidden layers and for different numbers of neurons per layer.

The parameters of the encoder's FC layers are initialized by sampling a uniform distribution on the interval [−√n, √n], where n is the inverse of the number of incoming connections to that layer [19]. This approach has been demonstrated to be effective in mitigating the problem of vanishing gradients when training multi-layer NNs [29]. While this random initialization strategy is beneficial in classic supervised learning models, it affects the initial condition of our AE, as it imposes a random value on the initial normalized pump parameter vector ŷ₀. We analyzed the statistical distribution of the output of the last FC layer, x₀, during the first training iteration for different numbers of hidden layers and neurons. We found that its elements follow a Gaussian-like distribution with zero mean and a variance that decreases as the number of layers and neurons increases. In Fig. 1(a) and (b) we show the mean and variance, respectively, of x₀ for the case of a 4-mode fiber with 50 wavelength channels and 8 pumps, resulting in an input layer of 200 neurons and an output layer of 40 neurons. For a sufficiently high number of hidden layers and neurons (a condition easily met in practice) we can then use the approximation x₀ ≈ 0, meaning that, by (10), in the first training iteration ŷ₀ ≈ σ(0) = 0.5, thus fixing the initial pump powers and wavelengths to the middle point of the

interval of allowed values. By introducing a centering vector α and subtracting it from the input of the sigmoids, we have

$$\hat{y} = \sigma(x - \alpha), \qquad (17)$$

which enables us to relate the initial pump parameters to α as follows:

$$\hat{y}_0 = \sigma(x_0 - \alpha) \approx \sigma(-\alpha) = \frac{1}{1 + e^{\alpha}}. \qquad (18)$$

We can use this result to force a more desirable initial condition on the pump parameters by computing the appropriate value of α through the inversion of (18).

E. Counter-Propagating Pumps

For the case of counter-propagating pumps, it is customary to implement a differential equation solver based on a shooting algorithm to determine the correct initial pump powers to be injected at z = L. This, however, would require significantly more computational resources, as the propagation equation would have to be solved several times for each training sample and, more importantly, could introduce convergence problems [30].

The method proposed here presents a particularly advantageous feature in this regard: the encoder E can directly predict the pump powers at z = 0, P̃_i^m(z = 0), eliminating the need to employ shooting algorithms. By solving (1) with initial (z = 0) conditions for pumps and signals, we obtain the predicted gain G̃ along with the pump powers at the end of the link, P̃_i^m(z = L), which are the values of interest. We therefore trade a significant computational cost in the training phase for a single integration of (1) in the inference phase.

Fig. 2. Diagram of the AE architecture for the design of the Raman gain profile in FMFs. Green arrows and boxes are related to the training phase of the AE.

The resulting AE-based system is represented in the diagram of Fig. 2, which highlights the various components of the architecture and its input-output relations; green boxes and arrows are related to the training phase of the AE, during which the encoder parameters θ_e are optimized. In order to compensate for the significantly higher sensitivity of the predicted gain to the optimization parameters and to avoid further problems with local minima, we introduce a modification to the training algorithm by multiplying the output of the last FC layer x by a mask H_k, where the subscript k indicates the k-th training iteration. The normalized pump parameters for the k-th training iteration are then determined by

$$\hat{y}_k = \sigma(x_k \odot H_k - \alpha), \qquad (19)$$

where ⊙ indicates the Hadamard (element-by-element) product. H_k can be suitably designed to "steer" the NN by weighting the computed gradients of the cost function with respect to the pump parameters during backpropagation. In our case, we set

$$H_k^i = \begin{cases} 0, & 1 \le i \le N_p,\ k < K \\ 1, & N_p + 1 \le i \le (M+1)N_p,\ k < K \\ 1, & 1 \le i \le (M+1)N_p,\ k \ge K, \end{cases} \qquad (20)$$

where the superscript i indicates the i-th element of the vector. Using this definition, the pump wavelengths are fixed to their initial conditions for the first K iterations, allowing the encoder to learn just the relationship between predicted pump power and generated gain profile, which is most critical during the first training iterations. The training algorithm is summarized in Algorithm 1 and is completely implemented using the PyTorch DL library [19], which enables us to leverage AD and graphics processing unit (GPU) acceleration.

Algorithm 1: AE Training Algorithm.
    Compute the centering vector α
    Initialize the encoder parameters: θ_e(0)
    for k = 0 to N_iter − 1 do
        Compute the mask H_k
        Generate a batch B_k = {G_i}_{i=1}^{B} of gain profiles
        Propagate the batch to obtain the pump parameters from the NN: Ŷ_k = {ŷ_k^i} = {σ(x̂_k^i ⊙ H_k − α)}, with x̂_k^i = NN(G_i; θ_e(k))
        Map the normalized parameters to the selected range: Y_k = {y_k^i} = Scale(Ŷ_k)
        Integrate (1) to compute the predicted gain profiles: B̃_k = {G̃_k^i} = {R(y_k^i)}
        Compute the cost function C(k) using (13)
        Compute the gradients with backpropagation: ∇_{θ_e(k)} C(k)
        Update the parameters to θ_e(k + 1) using (14)
    end for
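Algorithm 1, together with the centering vector of (18) and the mask of (20), might be realized in PyTorch roughly as follows. This is a sketch under stated assumptions, not the paper's code: the encoder, the differentiable solver R, and the batch generator are user-supplied placeholders, the Scale step is folded into raman_solver, and all names are ours.

```python
import torch

def centering_vector(y0_norm):
    # Invert (18): choose alpha so that sigmoid(-alpha) = y0_norm in (0, 1).
    return torch.log((1.0 - y0_norm) / y0_norm)

def mask(k, K, Np, M):
    # (20): zero the Np wavelength entries for the first K iterations,
    # so only the pump powers are learned at the start.
    H = torch.ones((M + 1) * Np)
    if k < K:
        H[:Np] = 0.0
    return H

def train(encoder, raman_solver, make_batch, y0_norm,
          n_iter=1000, K=100, lr=1e-4, Np=8, M=4):
    """Skeleton of Algorithm 1: raman_solver is assumed to scale the
    normalized pump vector and integrate (1); make_batch returns ideal
    flat/tilted targets; shapes are illustrative only."""
    alpha = centering_vector(y0_norm)
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for k in range(n_iter):
        H = mask(k, K, Np, M)
        G = make_batch()                                  # target gain profiles
        y_hat = torch.sigmoid(encoder(G) * H - alpha)     # (19)
        G_tilde = raman_solver(y_hat)                     # predicted gains via R
        # (11)+(13): batch average of the per-wavelength RMSE
        # (small eps for numerical stability of sqrt near zero error)
        loss = torch.sqrt(((G_tilde - G) ** 2).mean(dim=-1) + 1e-12).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder
```

The gradients of the cost with respect to θ_e flow through raman_solver via AD, which is exactly the mechanism that makes the dataset-free training possible.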

TABLE I
OVERLAP INTEGRALS OF THE FMFS USED FOR SIMULATION, IN UNITS OF 1 × 10⁹ m⁻²

III. RESULTS AND VALIDATION

We test the presented method using counter-propagating pumps and an L = 70 km long 4-mode step-index fiber (SIF) whose overlap integrals are calculated in [25] and reported in Table I. Hereinafter, we refer to this fiber as FMF1. The Raman gain spectrum is computed using the multi-vibrational-mode model of the Raman response function for silica fibers [31], whereas the peak value of the Raman gain coefficient was set to g_R = 7 × 10⁻¹⁴ W⁻¹ m [14]. The spectral attenuation coefficient of the fiber is obtained from a parabolic fit of attenuation data of a commercially available SMF, α(λ) = α₀ + α₁λ + α₂λ², with coefficients α₀ = 5.788 dB km⁻¹, α₁ = −7.1246 × 10⁻³ dB km⁻¹ nm⁻¹, and α₂ = 2.268 × 10⁻⁶ dB km⁻¹ nm⁻². This approximation is valid in the wavelength interval from 1385 nm to 1625 nm. As in [24], [25], we assume the absence of mode-dependent losses (MDL).

We consider transmission on N_s = 50 wavelengths in the C+L band, for a total number of spatial and wavelength channels equal to N_ch = M × N_s = 200. The input power for each channel is set to P_ch = −10 dBm.

The encoder NN is composed of N_h = 5 hidden layers of N_n = 1000 neurons each, and its parameters are optimized using the Adam algorithm with a LR ε = 1 × 10⁻⁴. The AE is trained for N_iter = 1000 iterations with batches of B = 1024 gain curves, which are sufficient to fill the GPU random access memory (RAM) and ensure 100% GPU clock utilization. Each batch is generated according to the strategy described in Section II, with the average gain level and tilt uniformly sampled from the intervals 5 dB to 15 dB and −0.015 dB nm⁻¹ to 0.015 dB nm⁻¹, respectively.

We map the output of the sigmoids to limit the predicted power at z = 0 and the wavelength of each pump to the intervals I_P(z=0) = [−60, 20] dBm and I_λ = [1410, 1520] nm, respectively.

Using (18), we set the initial power of each pump to P₀(z = 0) = 3 dBm, whereas the wavelengths are uniformly distributed over I_λ. Additionally, we use (20) to fix the pump wavelengths to their initial values for the first K = 100 iterations.

Once the AE is trained, the encoder is used to determine the pump wavelengths and powers at z = 0 that approximate a given target gain profile:

$$\tilde{y} = [\tilde{\lambda} \,|\, \tilde{P}(z=0)] = E(G; \theta_e^*), \qquad (21)$$

and the corresponding predicted gain G̃ and pump powers at z = L are obtained with a single integration of the Raman equations (1):

$$[\tilde{G} \,|\, \tilde{P}(z=L)] = R(\tilde{y}). \qquad (22)$$

The total training time for the employed NNs is approximately 45 minutes using an NVIDIA Quadro M4000 GPU. The computational time to perform a prediction for a single target gain profile on an Intel consumer laptop CPU is approximately 11 ms, of which 1 ms is required to compute the output of the encoder NN and the remaining 10 ms are needed to integrate (1).

A. Flat Gain Profiles

First, we assess the performance of the presented method using FMF1 for the case of flat target gain profiles, in terms of RMSE, gain flatness, and MDG, for different levels of amplification and varying numbers of Raman pumps. Given that the number of pumps determines the size of the input and output layers of the encoder NN, the training algorithm must be run for every value that this parameter assumes. For each target curve, we obtain the corresponding AE prediction using (21), (22) and compute the RMSE for each mode m as

$$\mathrm{RMSE}_m(G, \tilde{G}) = \sqrt{\frac{1}{N_s} \sum_{i=1}^{N_s} \left(G_i^m - \tilde{G}_i^m\right)^2}. \qquad (23)$$
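The per-mode metric (23), and the training cost (11) built from it, translate into a couple of tensor operations; a small sketch with our own function names:

```python
import torch

def rmse_per_mode(G, G_tilde):
    """Per-mode RMSE of (23); G and G_tilde have shape (M, Ns), in dB."""
    return torch.sqrt(((G - G_tilde) ** 2).mean(dim=1))

def cost(G, G_tilde):
    """Cost (11): the per-mode RMSEs averaged over the M modes."""
    return rmse_per_mode(G, G_tilde).mean()
```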
IP (z=0) = [−60, 20] dBm and Iλ = [1410, 1520] nm, respec-
tively. In Fig. 3 we report the RMSE in terms of percentage of the target
Using (18) we set the initial power on each pump to P0 (z = gain level, as a function of the amplification level and using 4, 5,
0) = 3 dBm, whereas the wavelengths are uniformly distributed 6, and 8 Raman pumps. Solid lines and shaded regions represent
over Iλ . Additionally, we use (20) to fix the pumps wavelengths the average RMSE and maximum to minimum RMSE variation
to their initial value for the first K = 100 iterations. over the modes, respectively. For gain levels inside the target
Once the AE is trained, the encoder is used determine pump interval of [5, 15] dB, the RMSE curves are almost constant,
wavelengths and powers at z = 0 to approximate a given target independently of the number of pumps used. Conversely, the
gain profile: RMSE rapidly grows outside the training interval, as the encoder
NN is not able to extrapolate the correct pump parameters. By
ỹ = [λ̃ | P̃(z = 0)] = E(G ; θe∗ ), (21) increasing the number of Raman pumps from 4 to 8 we improve
the RMSE, going from 3% to about 1% of the target gain.
and the corresponding predicted gain G̃ and pumps powers A clear picture on the improvements brought by an increased
at z = L are obtained with a single integration of the Raman number of pumps is given by the gain flatness or WDG, defined
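The sigmoid-based interval mapping described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation; the function names and sample values are assumptions:

```python
import numpy as np

def sigmoid(x):
    """Logistic squashing of unbounded NN outputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def map_to_interval(raw, lo, hi):
    """Map raw encoder outputs into the interval [lo, hi]."""
    return lo + (hi - lo) * sigmoid(raw)

raw = np.array([-3.0, 0.0, 3.0])                    # hypothetical raw encoder outputs
wavelengths = map_to_interval(raw, 1410.0, 1520.0)  # pump wavelengths in I_lambda [nm]
powers = map_to_interval(raw, -60.0, 20.0)          # pump powers at z = 0 in I_P [dBm]
# A raw output of 0 lands at the interval midpoint: 1465 nm and -20 dBm.
```

Because the sigmoid is bounded, the predicted pump parameters can never leave the chosen intervals, regardless of the raw network outputs.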
MARCON et al.: MODEL-AWARE DEEP LEARNING METHOD FOR RAMAN AMPLIFICATION IN FMFs 1377

Fig. 4. Gain flatness variation along the modes of FMF1, as a function of the target gain level, using different numbers of pumps.

Fig. 5. Target and predicted flat gain profiles for a 4-mode fiber over the C+L band, using 8 pumps.

Fig. 6. Relative MDG as a function of the gain level, for different numbers of pumps.

A clear picture of the improvements brought by an increased number of pumps is given by the gain flatness, or WDG, defined for each mode m as

    F_m(G) = max_i G_i^m − min_i G_i^m,    (24)

for i = 1, . . . , Ns and m = 1, . . . , M. We report the gain flatness as a percentage of the target gain level in Fig. 4, where the shaded areas represent the flatness variation over the modes. The most significant improvement is obtained in going from 4 to 5 pumps, which reduces the flatness from 15% to about 6%. For example, this means that for a 10 dB target gain, the total flatness would decrease from 1.5 dB to just 0.6 dB; this value is further decreased to 0.35 dB using 8 pumps. Moreover, we can observe that the flatness is practically constant among the modes, with fluctuations always lower than 0.5% of the target gain in the interval from 5 dB to 15 dB. An example of the achieved gain profiles for the case of 8 pumps is shown in Fig. 5, where we plot the flat target profiles and the predicted gain curves for different amplification levels inside the training interval. The gain profile for each mode is in fact the same, up to a residual MDG that increases with the gain level.

For a given gain profile G, we quantify its MDG as

    MDG(G) = max_i ( max_m G_i^m − min_m G_i^m ),    (25)

with i = 1, . . . , Ns and m = 1, . . . , M. In Fig. 6 we report the MDG as a percentage of the total gain using 4, 5, 6, and 8 Raman pumps. Differently from the case of gain flatness, the number of pumps does not influence the total MDG, which is practically constant inside the interval of gain levels on which the AE was trained, settling at about 2% of the target gain. This residual MDG is caused mainly by the fact that the LP01 and LP11 modes are systematically over-amplified with respect to the others. By inspecting the overlap integrals of FMF1 in Table I, we can observe that the sums of the off-diagonal entries in the columns/rows associated with LP01 and LP11 are the first and second largest, respectively, meaning that power is more efficiently coupled into these two modes by the nonlinear Raman interaction. In Fig. 7 we plot the total pump power at z = L in each mode of the FMF, as predicted by the AE, as a function of the target gain level; the cases of 4, 5, 6, and 8 pumps are considered, with solid lines representing the average power and shaded areas depicting the power variation obtained by employing different numbers of pumps. Independently of the amplification level, no power is launched in the LP01 and LP11 modes, with 70% of the total power assigned to LP21 and the remaining 30% to LP02, confirming the results of [24] and [25]. Even though no power is injected in LP01 and LP11, these two modes are the ones experiencing the highest amplification, predominantly contributing to the residual MDG of the system. In order to confirm the role of the overlap integrals in determining the MDG, we test two additional 4-mode fibers.
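The three metrics of (23)–(25) translate directly into array reductions. The following numpy sketch assumes an (M × Ns) gain-matrix layout with gains in dB; the layout and toy values are illustrative assumptions, not the paper's code:

```python
import numpy as np

def rmse_per_mode(G, G_pred):
    """Per-mode RMSE between target and predicted gains, Eq. (23)."""
    return np.sqrt(np.mean((G - G_pred) ** 2, axis=1))

def flatness_per_mode(G):
    """Per-mode gain flatness (WDG) F_m, Eq. (24)."""
    return G.max(axis=1) - G.min(axis=1)

def mdg(G):
    """Mode-dependent gain: worst spread over modes at any wavelength, Eq. (25)."""
    return np.max(G.max(axis=0) - G.min(axis=0))

G_target = np.full((4, 50), 10.0)   # flat 10 dB target over 4 modes, 50 channels
G_pred = G_target + 0.1             # toy prediction with a 0.1 dB uniform offset
errors = rmse_per_mode(G_target, G_pred)   # 0.1 dB RMSE for every mode
```

A perfectly flat, mode-equalized profile gives zero for both F_m and MDG, so either metric isolates one of the two residual impairments discussed above.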
1378 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 39, NO. 5, MARCH 1, 2021

Fig. 7. Total pump power at z = L in each mode of FMF1, as a function of the target gain level. Shaded areas indicate the variation using different numbers of pumps.

Fig. 8. Relative MDG for flat target gain profiles, as a function of the gain level, for three different fibers. The number of pumps is set to Np = 8.

Fig. 9. Target and predicted gain profiles for the tilted case, using 8 pumps and with a total tilt of 1.425 dB, i.e. the maximum tilt considered during training.

The first, which we label "FMF2," is a SIF with a core diameter of 18 μm, a core refractive index of 1.466, and a relative refractive index difference between core and cladding Δ = 0.4%, supporting the propagation of the LP01, LP11, LP02 and LP21 modes over the entire simulation bandwidth. Its overlap integrals are reported in Table I. The second fiber, which we refer to as "FMF3," is instead an ideal 4-mode fiber whose overlap integrals are all equal to 5.47 × 10−9 m−2, i.e. the overlap integral for the LP01-LP01 mode pair of FMF2. All the other simulation parameters, including the attenuation spectrum and Raman gain coefficient of the fiber, remain unchanged. Training the AE under the same conditions, we can observe the effect of the fiber design on the performance of the system in terms of residual MDG. For the case of 8 Raman pumps, we report the MDG for the three considered fibers as a function of the gain level in Fig. 8: FMF2 exhibits the highest MDG among the fibers, reaching a value of approximately 4% of the target gain inside the training interval of 5 dB to 15 dB, while for FMF3 the AE correctly predicts the power distribution among the modes that results in no MDG, launching power only in the LP01 mode.

B. Tilted Gain Profiles

In order to account for tilted gain profiles, the AE is trained using ideal gain profiles with average gain and tilt uniformly sampled from the two-dimensional training region T = [5, 15] dB × [−0.015, 0.015] dB nm−1, resulting in a maximum total tilt on the C+L band equal to Tmax = 0.015 dB nm−1 × 95 nm = 1.425 dB. In Fig. 9 we report the target gain profiles and the corresponding AE predictions using FMF1 and 8 pumps, for a total tilt equal to Tmax and for different average gain levels inside the training region. Results show good agreement between targets and predictions, with approximately the same gain profile on each mode, up to the residual MDG.

An analysis similar to that of the flat gain profiles is carried out for the case of tilted profiles, evaluating the metrics of interest for FMF1 and varying the number of employed Raman pumps, while keeping the other simulation parameters unchanged. We compute the RMSE, flatness, and MDG of the predicted gain profiles and visualize them in Fig. 10, representing the metrics as a function of the target gain level and of the total tilt on the C+L band. Each metric is reported as a percentage of the target gain level; for RMSE and flatness we consider the worst-case scenario among the modes, i.e. their maximum value. Fig. 10 is organized such that columns 1 through 4 of the grid correspond to the cases of 4, 5, 6, and 8 pumps, whereas rows 1, 2, and 3 correspond to RMSE, flatness and MDG, respectively. The color scale of each metric is saturated to a different level in order to improve the contrast of the color maps. In Fig. 10(a)–(d) we can appreciate the improvements in terms of RMSE obtained by using more pumps: the color map is increasingly darker inside and in the vicinity of the training region T, whose bounds are represented by a dashed rectangle. Additionally, by using 5 or more pumps, the level curves show that an RMSE lower than 3% of the target gain level is achieved for (practically) all the gain level-tilt combinations in T.
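The training-target generation for the tilted case can be sketched as follows. The wavelength grid and batch size below are illustrative assumptions (the paper specifies only the 95 nm C+L span and the sampling region T):

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.linspace(1530.0, 1625.0, 50)   # assumed 95 nm C+L wavelength grid [nm]

def sample_tilted_targets(batch, rng):
    """Draw ideal tilted gain profiles uniformly from T = [5,15] dB x [-0.015,0.015] dB/nm."""
    gain = rng.uniform(5.0, 15.0, size=(batch, 1))      # average gain level [dB]
    tilt = rng.uniform(-0.015, 0.015, size=(batch, 1))  # linear tilt [dB/nm]
    wl_centered = wl - wl.mean()                        # tilt pivots at band center
    return gain + tilt * wl_centered[None, :]           # (batch, Ns) profiles in dB

targets = sample_tilted_targets(1024, rng)              # one training batch
t_max = 0.015 * (wl[-1] - wl[0])                        # maximum total tilt: 1.425 dB
```

Centering the wavelength grid keeps the sampled average gain equal to the profile mean, so gain level and tilt can be swept independently over T.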

Fig. 10. Calculated metrics for the tilted gain case, varying the number of Raman pumps: RMSE (a)–(d), flatness (e)–(h), and MDG (i)–(l) as a function of the target gain level and target tilt. For RMSE and flatness their maximum value among the modes is reported. Columns 1 through 4 refer to the cases of 4, 5, 6, and 8 pumps, respectively.

Similar observations can be made for the flatness in Fig. 10(e)–(h), where a value of about 17% is reached for points inside the training region using 4 pumps; increasing the number of pumps leads to progressively lower flatness values, down to 5% inside T with 8 pumps. Similarly, for the MDG, Fig. 10(i)–(l) show that a higher number of pumps brings no significant changes, as the minimum achievable MDG is determined by the overlap integrals of the fiber. Its value stays in fact approximately constant inside the training region regardless of the pump count, with the level curves showing that MDG values lower than 4% are achieved over a region considerably wider than T.

IV. CONCLUSION

We have demonstrated an unsupervised machine learning method based on autoencoders to predict the pump parameters required to generate flat and tilted gain profiles using Raman amplification in few-mode fibers. Thanks to automatic differentiation, a numerical Raman model is embedded in the autoencoder, allowing us to train it directly on ideal gain profiles (e.g. flat or tilted) and obtaining a robust unsupervised learning method that does not rely on a pre-computed dataset to learn the inverse model. In fact, the relationship between the input target gain and the pump parameters that best approximate it is learned in the training phase from the embedded numerical model, allowing the targeted region of the space of possible gain profiles to be sampled accurately. As a result, this method scales well with respect to the number of fiber modes, the number of Raman pumps, and the amplification bandwidth. In this regard, the low root-mean-square error (quantified for various numbers of pump wavelengths) demonstrated the achievement of the target profile. Another key advantage of this scheme is that it does not require supervision in selecting simulation parameters (like power and wavelength ranges) that might also affect the quality of the results.

This approach was tested on a 4-mode fiber using the counter-pumping scheme, various numbers of pumps (up to 8), and the C+L band. The training process is further simplified by the fact that the autoencoder can directly predict the pump powers at z = 0, eliminating the need to employ the costly shooting algorithms that are typically required by counter-propagating Raman amplification models.
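For context, the shooting procedure that the autoencoder sidesteps can be illustrated on a toy problem: with a counter-propagating pump, the power is known at its launch end z = L, but the integration starts at z = 0, so the unknown P(0) must be found iteratively. The sketch below uses a bare exponential in place of the full coupled propagation equations; all names and values are hypothetical:

```python
import numpy as np

ALPHA, L = 0.06, 50.0   # toy attenuation [1/km] and fiber length [km]

def integrate_forward(P0, steps=1000):
    """Euler-integrate dP/dz = ALPHA * P from z = 0; the backward pump's
    power grows toward its launch end z = L in this parameterization."""
    P, dz = P0, L / steps
    for _ in range(steps):
        P += ALPHA * P * dz
    return P

def shoot(P_launch, lo=1e-6, hi=1e3, iters=60):
    """Bisect on the unknown boundary value P(0) until the integrated
    P(L) matches the known launch power (monotone in P0, so bisection works)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if integrate_forward(mid) < P_launch:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

P0 = shoot(P_launch=200.0)   # recovered pump power at z = 0 [mW, hypothetical]
```

Each shooting iteration requires a full integration of the model; with coupled multi-pump, multi-mode equations this cost multiplies, which is what the direct prediction of the z = 0 pump powers avoids.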

The pump powers to be injected in the fiber are in fact computed with a single integration of the propagation equations. We achieved very good results regarding flatness and mode-dependent gain over the entire C+L band and the considered interval of gain levels and tilts, reaching a gain flatness of 3% of the total gain using 8 pumps, and a residual mode-dependent gain of 2% of the total gain, independently of the number of Raman pumps. This method can be extended to the case of co-propagating pumps, and even to a mixture of co- and counter-propagating pumps. Finally, if the numerical model is substituted by an experiment (with automatic data acquisition), the encoder neural network could, in principle, be trained by the experiments. This will also require the definition of a proper algorithm to update the neural network parameters, to replace automatic differentiation.

REFERENCES

[1] P. P. Mitra and J. B. Stark, "Nonlinear limits to the information capacity of optical fibre communications," Nature, vol. 411, no. 6841, Jun. 2001, Art. no. 1027.
[2] R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, "Capacity limits of information transport in fiber-optic networks," Phys. Rev. Lett., vol. 101, no. 16, Oct. 2008, Art. no. 163901.
[3] A. D. Ellis, N. M. Suibhne, D. Saad, and D. N. Payne, "Communication networks beyond the capacity crunch," Philos. Trans. Roy. Soc. A: Math., Phys. Eng. Sci., vol. 374, no. 2062, Mar. 2016, Art. no. 20150191.
[4] A. Chralyvy, "The coming capacity crunch," in Proc. 35th Eur. Conf. Opt. Commun., Vienna, Austria, Sep. 2009, pp. 1–1.
[5] D. J. Richardson, J. M. Fini, and L. E. Nelson, "Space-division multiplexing in optical fibres," Nat. Photon., vol. 7, no. 5, pp. 354–362, May 2013.
[6] R. Ryf et al., "Mode-division multiplexing over 96 km of few-mode fiber using coherent 6 × 6 MIMO processing," J. Lightw. Technol., vol. 30, no. 4, pp. 521–531, Feb. 2012.
[7] V. Sleiffer et al., "737 Tb/s (96 × 3 × 256-Gb/s) mode-division-multiplexed DP-16QAM transmission with inline MM-EDFA," Opt. Express, vol. 20, no. 26, Dec. 2012, Art. no. B428.
[8] N. Bai et al., "Mode-division multiplexed transmission with inline few-mode fiber amplifier," Opt. Express, vol. 20, no. 3, Jan. 2012, Art. no. 2668.
[9] R. Ryf et al., "Mode-equalized distributed Raman amplification in 137-km few-mode fiber," in Proc. 37th Eur. Conf. Exhib. Opt. Commun., Geneva, Switzerland, Sep. 2011, pp. 1–3.
[10] M. Esmaeelpour et al., "Transmission over 1050-km few-mode fiber based on bidirectional distributed Raman amplification," J. Lightw. Technol., vol. 34, no. 8, pp. 1864–1871, Apr. 2016.
[11] J. Li et al., "Experimental demonstration of a few-mode Raman amplifier with a flat gain covering 1530–1605 nm," Opt. Lett., vol. 43, no. 18, Sep. 2018, Art. no. 4530.
[12] J. Bromage, "Raman amplification for fiber communications systems," J. Lightw. Technol., vol. 22, no. 1, pp. 79–93, Jan. 2004.
[13] D. Jia, H. Zhang, Z. Ji, N. Bai, and G. Li, "Optical fiber amplifiers for space-division multiplexing," Front. Optoelectron., vol. 5, no. 4, pp. 351–357, Dec. 2012.
[14] C. Headley and G. P. Agrawal, Raman Amplification in Fiber Optical Communication Systems. San Diego, CA, USA: Academic Press, 2005.
[15] D. Zibar et al., "Inverse system design using machine learning: The Raman amplifier case," J. Lightw. Technol., vol. 38, no. 4, pp. 736–753, Feb. 2020.
[16] Y. Chen, J. Du, Y. Huang, K. Xu, and Z. He, "Intelligent gain flattening in wavelength and space domain for FMF Raman amplification by machine learning based inverse design," Opt. Express, vol. 28, no. 8, pp. 11911–11920, Apr. 2020.
[17] M. Bartholomew-Biggs, S. Brown, B. Christianson, and L. Dixon, "Automatic differentiation of algorithms," J. Comput. Appl. Math., vol. 124, no. 1, pp. 171–190, Dec. 2000.
[18] M. Raissi, P. Perdikaris, and G. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys., vol. 378, pp. 686–707, Feb. 2019.
[19] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Proc. Adv. Neural Inf. Process. Syst. 32, H. Wallach et al., Eds., Curran Associates, 2019, pp. 8024–8035.
[20] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, "Automatic differentiation in machine learning: A survey," J. Mach. Learn. Res., vol. 18, no. 153, pp. 1–43, 2018.
[21] B. Karanov et al., "End-to-end deep learning of optical fiber communications," J. Lightw. Technol., vol. 36, no. 20, pp. 4843–4855, Oct. 2018.
[22] R. T. Jones, T. A. Eriksson, M. P. Yankov, and D. Zibar, "Deep learning of geometric constellation shaping including fiber nonlinearities," in Proc. Eur. Conf. Opt. Commun., Rome, Italy, Sep. 2018, pp. 1–3.
[23] J. Cho and P. J. Winzer, "Probabilistic constellation shaping for optical fiber communications," J. Lightw. Technol., vol. 37, no. 6, pp. 1590–1607, Mar. 2019.
[24] R. Ryf, R. Essiambre, J. von Hoyningen-Huene, and P. Winzer, "Analysis of mode-dependent gain in Raman amplified few-mode fiber," in Proc. Opt. Fiber Commun. Conf., Los Angeles, CA, USA, 2012, pp. 1–3, Art. no. OW1D.2.
[25] J. Zhou, "An analytical approach for gain optimization in multimode fiber Raman amplifiers," Opt. Express, vol. 22, no. 18, Sep. 2014, Art. no. 21393.
[26] C. Antonelli, A. Mecozzi, and M. Shtaif, "Raman amplification in multimode fibers with random mode coupling," Opt. Lett., vol. 38, no. 8, Apr. 2013, Art. no. 1188.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[28] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. 3rd Int. Conf. Learn. Representations (ICLR), San Diego, CA, USA, May 2015, pp. 1–15.
[29] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statist., Mar. 2010, pp. 249–256.
[30] H. M. Jiang and K. Xie, "Efficient and robust shooting algorithm for numerical design of bidirectionally pumped Raman fiber amplifiers," J. Opt. Soc. Amer. B, vol. 29, no. 1, pp. 8–14, 2012.
[31] D. Hollenbeck and C. D. Cantrell, "Multiple-vibrational-mode model for fiber-optic Raman gain spectrum and response function," J. Opt. Soc. Amer. B, vol. 19, no. 12, pp. 2886–2892, Dec. 2002.
