JLT 39 5 1371
Abstract—One of the most promising solutions to overcome the capacity limit of current optical fiber links is space-division multiplexing, which allows transmission over the cores of multi-core fibers or the modes of few-mode fibers. In order to realize such systems, suitable optical fiber amplifiers must be designed. In single-mode fibers, Raman amplification has shown significant advantages over doped fiber amplifiers due to its low noise and spectral flexibility. For these reasons, its use in next-generation space-division multiplexing transmission systems is being studied extensively. In this work, we propose a deep learning method that uses automatic differentiation to embed a complete few-mode Raman amplification model in the training process of a neural network, in order to identify the optimal pump wavelengths and power allocation scheme for both flat and tilted gain profiles. Compared to other machine learning methods, the proposed technique allows the neural network to be trained on ideal gain profiles, removing the need to compute a dataset that accurately covers the space of Raman gains of interest. The ability to directly target a selected region of the space of possible gains allows the method to be easily generalized to any type of Raman gain profile, while also being more robust when increasing the number of pumps, the number of modes, and the amplification bandwidth. This approach is tested on a 70 km long 4-mode fiber transmitting over the C+L band with various numbers of Raman pumps in the counter-propagating scheme, targeting gain profiles with an average gain in the interval from 5 dB to 15 dB and total tilt in the interval from −1.425 dB to 1.425 dB. We achieve wavelength- and mode-dependent gain fluctuations lower than 0.04 dB and 0.02 dB per dB of gain, respectively.

Index Terms—Raman amplification, space-division multiplexing, deep learning.

Manuscript received July 9, 2020; revised October 2, 2020; accepted October 25, 2020. Date of publication October 29, 2020; date of current version March 1, 2021. This work was supported in part by the Italian Ministry for Education, University and Research (MIUR) through law 232/2016 (“Departments of Excellence”) and PRIN 2017, project 2017HP5KH7: Fiber Infrastructure for Research on Space-division multiplexed Transmission (FIRST), and in part by the University of Padova through BIRD 2019, project MACFIBER. (Corresponding author: Gianluca Marcon.)
Gianluca Marcon is with the Department of Information Engineering, University of Padova, 35122 Padova, Italy (e-mail: [email protected]).
Andrea Galtarossa, Luca Palmieri, and Marco Santagiustina are with the Department of Information Engineering, University of Padova, 35122 Padova, Italy, and also with CNIT – National Inter-University Consortium for Telecommunications, 56127 Pisa, Italy (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2020.3034692
0733-8724 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

NONLINEAR phenomena arising in optical fibers impose an intrinsic limit to their information capacity [1], [2]. During the last three decades, the demand in internet traffic increased exponentially with an annual rate of 40%, while current technologies are rapidly approaching the nonlinear Shannon limit (NSL) of single-mode fibers (SMFs) [3]. In order to avoid bringing the existing optical fiber infrastructure to a “capacity crunch” [4], space-division multiplexing (SDM) has been proposed as the key technology for future lightwave systems operating beyond the NSL [5].

A promising approach to implement SDM is to exploit the spatial diversity of modes in few-mode fibers (FMFs) to transmit independent data streams, thus realizing mode-division multiplexing (MDM) [6]. In order to benefit from the added capacity of spatially-multiplexed transmissions, suitable network devices must be designed to be fully compatible with already well-established techniques such as wavelength-division multiplexing (WDM). To this end, the role of SDM-compatible amplifiers is of fundamental importance, with several experimental works demonstrating the effectiveness in MDM scenarios of both erbium-doped fiber amplifiers (EDFAs) [7], [8] and Raman amplifiers (RAs) [9], [10].

The compensation of link losses with minimal signal-to-noise ratio (SNR) reduction has always been a crucial aspect in optical communications, but additional care must be taken with SDM systems to minimize both mode-dependent gain (MDG) and wavelength-dependent gain (WDG), as both can be detrimental to the multiple-input multiple-output (MIMO) digital signal processing (DSP) algorithms that mitigate the effect of mode crosstalk to correctly recover the transmitted signals [11].

While the simplicity and power efficiency of EDFAs made them appealing for commercial communication systems, their reduced gain bandwidth has made Raman amplification an attractive solution for wideband WDM schemes [12]. The spectral flexibility of RAs, together with suitable optimization techniques, enables the design of flat gain profiles over large bandwidths by means of multiple wavelength pumps [12]. In the context of SDM, the additional degrees of freedom can lead to higher control of WDG and MDG [13]. Additionally, RAs can offer distributed amplification, resulting in a reduced noise contribution compared to EDFAs [14].

In SMF systems, different approaches have been followed to correctly determine the pump parameters required to obtain pre-determined gain profiles. A recent publication [15] proposed a machine learning (ML) technique to solve this problem. Specifically, a neural network (NN) can be trained to learn the inverse relationship $y = f^{-1}(G)$ between the vector $y$ of pump
1372 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 39, NO. 5, MARCH 1, 2021
wavelengths and powers and the corresponding gain profile $G$, using a synthetic dataset $\mathcal{D} = \{(y_i, G_i)\}$ of thousands of gain curves generated with random pump parameters. The learned mapping is then used to compute the required pump parameters $\tilde{y} = f^{-1}(G_{\mathrm{target}})$ to approximate a given target gain profile. This eliminates the need to solve complex iterative algorithms that require multiple integrations of the propagation equations for every new target profile, making Raman amplification suitable for next-generation self-adaptive and autonomous optical networks, where low-latency automation is fundamental [15]. The authors of [15] used two additional techniques to refine the prediction of the NN. The first is model averaging, which consists in training several NNs in parallel, each on a random permutation of the dataset, and finally averaging their output. This approach, while providing some improvements, is significantly heavier in terms of computational time, both for the training and the inference phase. This aspect, together with the memory needed to store hundreds of trained models, could pose a challenge to network controllers, where computational power may be limited. The second technique consists in a fine-tuning phase requiring an additional NN trained to learn the direct mapping $G = f(y)$. The prediction error on the gain profile obtained with the approximate pump parameters $\tilde{y}$ is estimated using the learned direct mapping $\tilde{f}$ and minimized using an iterative gradient-descent algorithm without integrating the propagation equations. Publication [15] showed promising results, demonstrating the feasibility of the method with flat and tilted gain profiles using a counter-propagating RA over the C and C+L bands, achieving a maximum prediction error on the considered gain profiles well below 1 dB for different levels of amplification.

In the context of MDM, a similar approach to design flat gain profiles both for 2-mode and 4-mode fibers has been demonstrated in [16]. This work uses neither model averaging nor fine-tuning algorithms; therefore, memory and time requirements for both the training and the inference phase are substantially cut. For the 4-mode FMF, [16] showed encouraging results in terms of MDG and gain flatness; nevertheless, the analysis is limited to the C band only.

The main drawback of both methods with respect to iterative optimization algorithms is that, while the latter specifically aim at minimizing a cost function $C(G_{\mathrm{target}}, \tilde{G})$ between a desired and a predicted gain profile by taking the propagation model into account, the former are instead optimized to minimize a cost function $C(y, \tilde{y})$ between pump parameters. The NNs are thus unaware of the underlying mathematical and physical relations between pump parameters and gain profile, which have to be learned from the available data. In order to approximate the inverse function $y = f^{-1}(G)$ using a NN and generate flat gain profiles, the region of space of approximately flat Raman gains must be properly sampled. This cannot be easily achieved since the training dataset is generated by solving the Raman equations with randomly drawn pump powers and wavelengths, meaning that only the codomain of $f^{-1}(\cdot)$ is sampled with full control. As a result, only a minor part of the generated gains fall inside the region of interest, resulting in the NNs being trained to learn the inverse function on a much bigger domain than required, potentially hindering their performance on flat/tilted gains. This aspect becomes more problematic when increasing the amplification bandwidth or the number of modes and pumps, as the dimensionality of the space to explore also increases. The choice of parameters for the generation of the dataset is also critical for the effectiveness of the methods presented in [15], [16]. For example, the ranges of powers and wavelengths of the pumps are selected a priori, which requires preliminary supervision and can ultimately mean that the trained NNs might not be able to predict the optimal pump parameters.

Owing to automatic differentiation (AD) techniques [17], analytical or numerical models describing dynamical systems can be embedded in ML architectures [18]. By recording the series of elementary operations performed on the model input in a computational graph, AD libraries such as PyTorch [19] can compute the exact derivatives of the model output with respect to any parameter to be optimized [20]. In the context of optical communications, this approach has been demonstrated to perform end-to-end (E2E) optimization of an intensity modulation/direct detection system by jointly optimizing the transmitter and receiver using NNs, outperforming classical feed-forward equalization [21]. The effectiveness of this technique has also been demonstrated for coherent transmissions [22], where probabilistic constellation shaping and geometric constellation shaping have been shown to be fundamental for achieving record spectral efficiencies in short- and long-haul experiments [23].

In this work we propose an unsupervised ML method which employs AD to embed a differentiable FMF Raman amplification model in the training procedure of a NN to predict the pump parameters able to generate flat and tilted gain profiles over a pre-determined range of amplification levels and gain tilts. The trained NN can then be used to obtain the required pump parameters for a desired gain profile with low time complexity. The presented method has the advantage of training the NN directly on the sought (e.g. flat and tilted) gain profiles, thereby directly sampling the selected region of the space of possible gains, instead of building a dataset by solving the Raman equations using random pump parameters. The supervised dataset design phase, along with the issues related to it, is thus completely avoided, with the relationship between target gain and pump parameters being learned in the training phase of the NN through the differentiable Raman model. The ability to directly target an arbitrary region of the space of Raman gains makes this method easily generalizable to any type of gain profile, and more robust and scalable with respect to changes in the number of modes, Raman pumps, and fiber parameters. For all these reasons this unsupervised method is expected to be more useful in self-adaptive networks. This method is validated on different 4-mode fibers using a counter-propagating scheme with various numbers of Raman pumps, up to 8, predicting the required pump powers and wavelengths to generate gain profiles on the C+L band with average gain and tilt in the interval from 5 dB to 15 dB and from −1.425 dB to 1.425 dB, respectively; results show MDG and gain flatness comparable to those reported in [16], but on a larger bandwidth, and quantify the advantage of a higher number of pumps also in terms of the reached root-mean-square error (RMSE).

MARCON et al.: MODEL-AWARE DEEP LEARNING METHOD FOR RAMAN AMPLIFICATION IN FMFs 1373

(AE) [27]. An AE is composed of two main blocks: an encoder,
$$E(\,\cdot\,; \theta_e) : \mathbb{R}^p \to \mathbb{R}^q, \tag{4}$$
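The idea of training through an embedded, differentiable forward model can be illustrated with a small self-contained sketch. It is not the FMF Raman model of this work: `toy_gain_db` is a hypothetical stand-in in which the gain in dB is linear in the pump powers with a Gaussian efficiency curve, the `Value` class is a minimal reverse-mode AD written for illustration (a library such as PyTorch would normally play this role), and the pump powers are optimized directly instead of being predicted by an encoder NN. What it does show is a cost defined on gain profiles whose gradient reaches the pump parameters through the recorded computational graph.

```python
import math


class Value:
    """Scalar node of a computational graph for reverse-mode AD."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    __rmul__ = __mul__

    def __sub__(self, other):
        return self + (-1.0) * other

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every graph node."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # consumers are processed before inputs
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def toy_gain_db(pump_powers, pump_wl_nm, signal_wl_nm,
                shift_nm=100.0, width_nm=40.0):
    """Hypothetical forward model (assumption): gain linear in pump power."""
    gains = []
    for ws in signal_wl_nm:
        gain = 0.0
        for power, wp in zip(pump_powers, pump_wl_nm):
            efficiency = math.exp(-((ws - wp - shift_nm) / width_nm) ** 2)
            gain = gain + power * efficiency
        gains.append(gain)
    return gains


def fit_pump_powers(target_db, pump_wl_nm, signal_wl_nm, steps=500, lr=0.1):
    """Gradient descent on a mean-squared error defined in gain space."""
    powers = [1.0] * len(pump_wl_nm)
    losses = []
    for _ in range(steps):
        params = [Value(p) for p in powers]
        gains = toy_gain_db(params, pump_wl_nm, signal_wl_nm)
        loss = 0.0
        for g, t in zip(gains, target_db):
            diff = g - t
            loss = loss + diff * diff
        loss = loss * (1.0 / len(target_db))
        loss.backward()  # gradients flow back through the forward model
        powers = [p.data - lr * p.grad for p in params]
        losses.append(loss.data)
    return powers, losses


signal_wl_nm = [1530.0 + 10.0 * i for i in range(9)]  # illustrative grid
pump_wl_nm = [1430.0, 1465.0, 1500.0]                 # illustrative pumps
target_db = [10.0] * len(signal_wl_nm)                # flat 10 dB target
powers, losses = fit_pump_powers(target_db, pump_wl_nm, signal_wl_nm)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```

In this toy setting the pump powers are unconstrained and the pump wavelengths are fixed; in a model-aware setup of the kind described in the text, the same backward pass would continue past the pump parameters into the weights of the network that produces them.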
$$C_{AE}(G, \tilde{G}) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{RMSE}_i\left(G_i^m, \tilde{G}_i^m\right), \tag{11}$$
for $i = 1, \ldots, N_s$. In the $k$-th iteration of the training algorithm, the AE reconstruction of each curve in the dataset $\mathcal{X}$ is computed as
$$\tilde{G} = R\big(E(G; \theta_e(k))\big) \quad \forall G \in \mathcal{X}, \tag{12}$$
where $\theta_e(k)$ are the encoder parameters at the current iteration.
The total cost function for the iteration is then evaluated by averaging (11) over $\mathcal{X}$:
$$C(k) = \frac{1}{|\mathcal{X}|} \sum_{G \in \mathcal{X}} C_{AE}(G, \tilde{G}). \tag{13}$$
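The cost in (11) and its dataset average in (13) translate almost directly into code. The following is a minimal sketch, assuming each gain profile is stored as a list of M per-mode lists of N_s gain samples in dB; the function names and the `reconstruct` callable (a stand-in for the full AE reconstruction) are hypothetical and introduced only for illustration.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cost_ae(G, G_rec):
    """Eq. (11): RMSE along the sample index i, averaged over the M modes."""
    return sum(rmse(g, g_rec) for g, g_rec in zip(G, G_rec)) / len(G)


def total_cost(dataset, reconstruct):
    """Eq. (13): average of C_AE over every target profile in the dataset."""
    return sum(cost_ae(G, reconstruct(G)) for G in dataset) / len(dataset)


# Two target profiles, each with M = 2 modes and N_s = 3 samples (dB).
dataset = [
    [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]],
    [[5.0, 5.5, 6.0], [5.0, 5.5, 6.0]],
]


def offset_reconstruction(G):
    """Placeholder reconstruction: every sample is off by +0.5 dB."""
    return [[g + 0.5 for g in mode] for mode in G]


cost = total_cost(dataset, offset_reconstruction)
```

Because the RMSE is taken per mode before averaging, a large error confined to one mode is not diluted by the number of wavelength samples in the other modes.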
Fig. 2. Diagram of the AE architecture for the design of the Raman gain profile in FMFs. Green arrows and boxes are related to the training phase of the AE.
TABLE I
OVERLAP INTEGRALS OF THE FMFS USED FOR SIMULATION, IN UNITS OF $1 \times 10^{9}\ \mathrm{m}^{-2}$
Fig. 4. Gain flatness variation along the modes of FMF1, as a function of the target gain level, using different numbers of pumps.
Fig. 6. Relative MDG as a function of the gain level, for different numbers of pumps.
Fig. 7. Total pump power at z = L in each mode of FMF1, as a function of the target gain level. Shaded areas indicate the variation using different numbers of pumps.
Fig. 9. Target and predicted gain profiles for the tilted case, using 8 pumps and with a total tilt of 1.425 dB, i.e. the maximum considered tilt during training.
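Since the method is trained on ideal target profiles rather than on simulated gains, the training set can be produced without solving any propagation equation. The sketch below generates flat and tilted targets under stated assumptions: the profile is taken to be linear across the channel index, the average gain and total tilt are drawn uniformly from the intervals quoted in the text (5 dB to 15 dB and −1.425 dB to 1.425 dB), and the same target is assigned to every mode. The function names, the uniform sampling, and the channel count are illustrative choices, not details taken from the text.

```python
import random


def target_profile(mean_gain_db, total_tilt_db, n_channels):
    """Linear profile with the given mean and first-to-last channel tilt."""
    if n_channels == 1:
        return [mean_gain_db]
    return [
        mean_gain_db + total_tilt_db * (i / (n_channels - 1) - 0.5)
        for i in range(n_channels)
    ]


def sample_targets(n_profiles, n_channels, n_modes,
                   gain_range_db=(5.0, 15.0),
                   tilt_range_db=(-1.425, 1.425),
                   seed=0):
    """Draw ideal targets; each one repeats the same profile for all modes."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_profiles):
        profile = target_profile(
            rng.uniform(*gain_range_db),
            rng.uniform(*tilt_range_db),
            n_channels,
        )
        dataset.append([list(profile) for _ in range(n_modes)])
    return dataset


# Maximum-tilt example at a 10 dB average gain, on 5 channels.
tilted = target_profile(10.0, 1.425, 5)
# 100 random targets for a 4-mode fiber on a hypothetical 50-channel grid.
targets = sample_targets(n_profiles=100, n_channels=50, n_modes=4)
```

A flat profile is simply the zero-tilt case, so one generator covers both families of targets.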
Fig. 10. Calculated metrics for the tilted gain case, varying the number of Raman pumps: RMSE (a)–(d), flatness (e)–(h), and MDG (i)–(l) as a function of the
target gain level and target tilt. For RMSE and flatness their maximum value among the modes is reported. Columns 1 through 4 refer to the case of 4, 5, 6, and 8
pumps, respectively.
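The flatness and MDG metrics named in the caption can be computed from a predicted gain matrix in a few lines. The definitions used here are assumptions, since this excerpt does not spell them out: flatness is taken as the max-min gain excursion over wavelength for each mode, MDG as the largest spread between modes at any single wavelength, and the relative values as those quantities divided by the target gain level in dB.

```python
def flatness_db(gain_per_mode):
    """Per-mode max-min gain excursion over wavelength (assumed definition)."""
    return [max(mode) - min(mode) for mode in gain_per_mode]


def mdg_db(gain_per_mode):
    """Largest gain spread between modes at any wavelength (assumed)."""
    return max(max(col) - min(col) for col in zip(*gain_per_mode))


def relative(value_db, gain_level_db):
    """Metric expressed as a fraction of the target gain level."""
    return value_db / gain_level_db


# Predicted gain (dB) for 2 modes on 3 wavelengths, 10 dB target level.
gains = [
    [10.0, 10.4, 10.2],
    [10.1, 10.3, 10.0],
]
worst_flatness = max(flatness_db(gains))   # maximum among the modes
rel_flatness = relative(worst_flatness, 10.0)
rel_mdg = relative(mdg_db(gains), 10.0)
```

Reporting the maximum flatness among the modes, as the caption does for RMSE and flatness, gives a single worst-case figure per target gain level.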
points inside the training region using 4 pumps; increasing the number of pumps leads to progressively lower flatness values, down to 5% inside T with 8 pumps.

Similarly, for the MDG, Fig. 10(i)–(l) show that a higher number of pumps brings no significant changes, as the minimum achievable MDG is determined by the overlap integrals of the fiber. Its value in fact stays approximately constant inside the training region regardless of the pump count, with the level curve showing that MDG values lower than 4% are achieved for a region considerably wider than T.

IV. CONCLUSION

We have demonstrated an unsupervised machine learning method based on autoencoders to predict the pump parameters required to generate flat and tilted gain profiles using Raman amplification in few-mode fibers. Thanks to automatic differentiation, a numerical Raman model is embedded in the autoencoder, allowing it to be trained directly on ideal gain profiles (e.g. flat or tilted) and yielding a robust unsupervised learning method that does not rely on a pre-computed dataset to learn the inverse model. In fact, the relationship between the input target gain and the pump parameters that best approximate it is learned in the training phase from the embedded numerical model, which allows the targeted region of the space of possible gain profiles to be sampled accurately. As a result, this method scales well with respect to the number of fiber modes, the number of Raman pumps, and the amplification bandwidth. In this regard, the low root-mean-square error (quantified for various numbers of pump wavelengths) demonstrates that the target profile is achieved. Another key advantage of this scheme is that it does not require supervision in selecting simulation parameters (like power and wavelength ranges) that might also affect the quality of the results.

This approach is tested on a 4-mode fiber using the counter-pumping scheme, with various numbers of pumps, up to 8, and over the C+L band. The training process is further simplified by the fact that the autoencoder can directly predict the pump powers at z = 0, eliminating the need to employ the costly shooting algorithms that are typically needed for counter-propagating Raman amplification models. The pump powers to be injected into the fiber are in fact computed with a single integration of the
propagation equations. We achieved very good results regarding flatness and mode-dependent gain over the entire C+L band and the considered interval of gain levels and tilts, reaching a gain flatness of 3% of the total gain using 8 pumps, and a residual mode-dependent gain of 2% of the total gain, independently of the number of Raman pumps. This method can be extended to the case of co-propagating pumps and even to a mixture of co- and counter-propagating pumps. Finally, if the numerical model is substituted by an experiment (with automatic data acquisition), the encoder neural network could, in principle, be trained by the experiments. This would also require the definition of a proper algorithm to update the neural network parameters, to replace automatic differentiation.

REFERENCES

[1] P. P. Mitra and J. B. Stark, “Nonlinear limits to the information capacity of optical fibre communications,” Nature, vol. 411, no. 6841, Jun. 2001, Art. no. 1027.
[2] R.-J. Essiambre, G. J. Foschini, G. Kramer, and P. J. Winzer, “Capacity limits of information transport in fiber-optic networks,” Phys. Rev. Lett., vol. 101, no. 16, Oct. 2008, Art. no. 163901.
[3] A. D. Ellis, N. M. Suibhne, D. Saad, and D. N. Payne, “Communication networks beyond the capacity crunch,” Philos. Trans. Roy. Soc. A: Math., Phys. Eng. Sci., vol. 374, no. 2062, Mar. 2016, Art. no. 20150191.
[4] A. Chraplyvy, “The coming capacity crunch,” in Proc. 35th Eur. Conf. Opt. Commun., Vienna, Austria, Sep. 2009, pp. 1–1.
[5] D. J. Richardson, J. M. Fini, and L. E. Nelson, “Space-division multiplexing in optical fibres,” Nat. Photon., vol. 7, no. 5, pp. 354–362, May 2013.
[6] R. Ryf et al., “Mode-division multiplexing over 96 km of few-mode fiber using coherent 6 × 6 MIMO processing,” J. Lightw. Technol., vol. 30, no. 4, pp. 521–531, Feb. 2012.
[7] V. Sleiffer et al., “73.7 Tb/s (96 × 3 × 256-Gb/s) mode-division-multiplexed DP-16QAM transmission with inline MM-EDFA,” Opt. Express, vol. 20, no. 26, Dec. 2012, Art. no. B428.
[8] N. Bai et al., “Mode-division multiplexed transmission with inline few-mode fiber amplifier,” Opt. Express, vol. 20, no. 3, Jan. 2012, Art. no. 2668.
[9] R. Ryf et al., “Mode-equalized distributed Raman amplification in 137-km few-mode fiber,” in Proc. 37th Eur. Conf. Exhib. Opt. Commun., Geneva, Switzerland, Sep. 2011, pp. 1–3.
[10] M. Esmaeelpour et al., “Transmission over 1050-km few-mode fiber based on bidirectional distributed Raman amplification,” J. Lightw. Technol., vol. 34, no. 8, pp. 1864–1871, Apr. 2016.
[11] J. Li et al., “Experimental demonstration of a few-mode Raman amplifier with a flat gain covering 1530–1605 nm,” Opt. Lett., vol. 43, no. 18, Sep. 2018, Art. no. 4530.
[12] J. Bromage, “Raman amplification for fiber communications systems,” J. Lightw. Technol., vol. 22, no. 1, pp. 79–93, Jan. 2004.
[13] D. Jia, H. Zhang, Z. Ji, N. Bai, and G. Li, “Optical fiber amplifiers for space-division multiplexing,” Front. Optoelectron., vol. 5, no. 4, pp. 351–357, Dec. 2012.
[14] C. Headley and G. P. Agrawal, Raman Amplification in Fiber Optical Communication Systems. San Diego, CA, USA: Academic Press, 2005.
[15] D. Zibar et al., “Inverse system design using machine learning: The Raman amplifier case,” J. Lightw. Technol., vol. 38, no. 4, pp. 736–753, Feb. 2020.
[16] Y. Chen, J. Du, Y. Huang, K. Xu, and Z. He, “Intelligent gain flattening in wavelength and space domain for FMF Raman amplification by machine learning based inverse design,” Opt. Express, vol. 28, no. 8, pp. 11911–11920, Apr. 2020.
[17] M. Bartholomew-Biggs, S. Brown, B. Christianson, and L. Dixon, “Automatic differentiation of algorithms,” J. Comput. Appl. Math., vol. 124, no. 1, pp. 171–190, Dec. 2000.
[18] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” J. Comput. Phys., vol. 378, pp. 686–707, Feb. 2019.
[19] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. Neural Inf. Proc. Syst. 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, pp. 8024–8035.
[20] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning: A survey,” J. Mach. Learn. Res., vol. 18, no. 153, pp. 1–43, 2018.
[21] B. Karanov et al., “End-to-end deep learning of optical fiber communications,” J. Lightw. Technol., vol. 36, no. 20, pp. 4843–4855, Oct. 2018.
[22] R. T. Jones, T. A. Eriksson, M. P. Yankov, and D. Zibar, “Deep learning of geometric constellation shaping including fiber nonlinearities,” in Proc. Eur. Conf. Opt. Commun., Rome, Italy, Sep. 2018, pp. 1–3.
[23] J. Cho and P. J. Winzer, “Probabilistic constellation shaping for optical fiber communications,” J. Lightw. Technol., vol. 37, no. 6, pp. 1590–1607, Mar. 2019.
[24] R. Ryf, R. Essiambre, J. von Hoyningen-Huene, and P. Winzer, “Analysis of mode-dependent gain in Raman amplified few-mode fiber,” in Proc. Opt. Fiber Commun. Conf., Los Angeles, CA, USA, 2012, pp. 1–3, Art. no. OW1D.2.
[25] J. Zhou, “An analytical approach for gain optimization in multimode fiber Raman amplifiers,” Opt. Express, vol. 22, no. 18, Sep. 2014, Art. no. 21393.
[26] C. Antonelli, A. Mecozzi, and M. Shtaif, “Raman amplification in multimode fibers with random mode coupling,” Opt. Lett., vol. 38, no. 8, Apr. 2013, Art. no. 1188.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[28] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. 3rd Int. Conf. Learning Representations (ICLR), San Diego, CA, USA, May 2015, pp. 1–15.
[29] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Statist., Mar. 2010, pp. 249–256.
[30] H. M. Jiang and K. Xie, “Efficient and robust shooting algorithm for numerical design of bidirectionally pumped Raman fiber amplifiers,” J. Opt. Soc. Amer. B, vol. 29, no. 1, pp. 8–14, 2012.
[31] D. Hollenbeck and C. D. Cantrell, “Multiple-vibrational-mode model for fiber-optic Raman gain spectrum and response function,” J. Opt. Soc. Amer. B, vol. 19, no. 12, pp. 2886–2892, Dec. 2002.