Efficient Wireless Traffic Prediction at the Edge: A Federated Meta-Learning Approach
Abstract—Wireless traffic prediction plays a vital role in managing highly dynamic, low-latency communication networks, especially 6G wireless networks. Given the data and computing resource constraints of edge devices, federated wireless traffic prediction has attracted considerable interest. However, federated learning is limited in dealing with heterogeneous scenarios and unbalanced data availability. Along this line, we propose an efficient federated meta-learning approach to learn a sensitive global model with knowledge collected from different regions. The global model can efficiently adapt to heterogeneous local scenarios by performing only one or a few fine-tuning steps on the local data sets. Additionally, distance-based weighted model aggregation is designed to capture the dependencies among different regions for better spatial-temporal prediction. We evaluate the proposed scheme by comparing it with conventional federated learning approaches and other commonly used traffic prediction benchmarks. Extensive simulation results reveal that the proposed scheme outperforms the benchmarks.

Index Terms—Wireless traffic prediction, federated meta-learning, heterogeneous scenarios.

I. INTRODUCTION

… is proposed for city-scale wireless traffic prediction. Zhang et al. exploit transfer learning to capture the complex patterns hidden in cellular data and transfer the knowledge to various types of traffic [4]. Transfer learning mitigates the problem of limited data and avoids training from scratch. However, the knowledge must be learnt in, and transferred from, a region with a similar scenario.

The aforementioned centralized DL schemes [4], [11] need access to geographically distributed data sets, which is hard to guarantee in wireless networks due to privacy concerns and communication overhead. This naturally motivates federated learning (FL) solutions for wireless traffic prediction, such as FedDA [12]. FL can significantly reduce bandwidth consumption and latency by sending only the model parameters rather than the raw data stream. However, it is challenging to ensure good performance when FL applications face spatially correlated scenarios [13]. FedDA adopts a clustering scheme to model spatial dependencies. However, the clusters in FedDA are predetermined, which is too rigid to model local spatial dependencies.

To avoid training from scratch and achieve personalized …
Authorized licensed use limited to: SHANDONG UNIVERSITY. Downloaded on November 24,2024 at 05:19:32 UTC from IEEE Xplore. Restrictions apply.
1574 IEEE COMMUNICATIONS LETTERS, VOL. 26, NO. 7, JULY 2022
ZHANG et al.: EFFICIENT WIRELESS TRAFFIC PREDICTION AT EDGE: FEDERATED META-LEARNING APPROACH 1575
… updated at the $j$-th step are calculated as follows:

$$\theta^t_{c,j} = \theta^t_{c,j-1} - \alpha \nabla_{\theta^t_{c,j-1}} \mathcal{L}(\theta^t_{c,j-1}; \mathcal{T}^s_c). \tag{4}$$

Subsequently, a batch of tasks $\mathcal{T}^q_c$ is sampled from the query set $\mathcal{P}^q_c$. The local model is improved by rapidly adjusting to the sampled query tasks, and the updated local models are uploaded to the central server to integrate knowledge from the heterogeneous scenarios. The local update is

$$\theta^{t+1}_{c,0} = \theta^t_{c,0} - \beta \nabla_{\theta^t} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c), \tag{5}$$

where $\nabla_{\theta^t} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c)$, with $\theta^t = \theta^t_{c,0}$, is the second-order gradient computed on the query tasks. It is evaluated at the slightly updated, internally transferred parameters $\theta^t_{c,J}$ and merged into the current local model. Taking (4) into consideration, this second-order gradient is given by

$$\begin{aligned}
\nabla_{\theta^t} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c)
&= \nabla_{\theta^t_{c,J}} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c) \cdot \nabla_{\theta^t} \theta^t_{c,J} \\
&= \nabla_{\theta^t_{c,J}} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c) \cdot \nabla_{\theta^t_{c,J-1}} \theta^t_{c,J} \cdot \nabla_{\theta^t} \theta^t_{c,J-1} \\
&= \nabla_{\theta^t_{c,J}} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c) \cdot \prod_{j=1}^{J} \nabla_{\theta^t_{c,j-1}} \theta^t_{c,j} \\
&= \nabla_{\theta^t_{c,J}} \mathcal{L}(\theta^t_{c,J}; \mathcal{T}^q_c) \cdot \prod_{j=1}^{J} \left(I - \alpha \nabla^2_{\theta^t_{c,j-1}} \mathcal{L}(\theta^t_{c,j-1}; \mathcal{T}^s_c)\right),
\end{aligned} \tag{6}$$

where the product is taken in the order $j = J, \dots, 1$.

…

The induced global model above captures the spatial dependencies among different regions and can adapt to new traffic patterns. Note that, for fairness, the size of $\mathcal{T}_c$ sampled at every client is set to be identical.

C. Model Personalization With Local Adaption

Before applying the model to a new traffic prediction task of a specific client, fine-tuning is executed on the local data set to adjust the model to the private data of that client. Specifically, we sample a batch of tasks $\mathcal{T}^s_c$ from the local data set and conduct only one or a few gradient descent steps. This adaptation repeats the sampling and internal updating process (lines 7-9) in Algorithm 1. The traffic volume is then predicted by the personalized model with parameters $\theta^t_{c,J}$, which is expressed as

$$\theta^t_{c,J} = \theta^t - \alpha \sum_{j=0}^{J-1} \nabla_{\theta^t_{c,j}} \mathcal{L}(\theta^t_{c,j}; \mathcal{T}^s_c). \tag{11}$$

The model is evaluated in terms of the MSE on the test data set $\mathcal{P}^{\mathrm{test}}_c$:

$$\mathcal{L}(\theta^t_{c,J}; \mathcal{P}^{\mathrm{test}}_c) = \frac{1}{N^{\mathrm{test}}_c} \sum_{\{x_n, y_n\} \in \mathcal{P}^{\mathrm{test}}_c} (\hat{y}_n - y_n)^2, \tag{12}$$

where $N^{\mathrm{test}}_c$ is the number of test samples.
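To make the interplay of (4)-(6) concrete, the following is a minimal sketch for a single client, not the authors' implementation. It assumes, purely for illustration, a two-parameter linear predictor $\hat{y} = wx + b$, an MSE loss, and tiny made-up support and query batches. The meta-gradient is built by chaining the Jacobians $I - \alpha H$ of the inner steps as in (6), with the Hessians taken on the support tasks, and is then checked against central finite differences of the unrolled query loss.

```python
# Minimal sketch of one client's local meta-update, Eqs. (4)-(6).
# Assumptions (not from the letter): linear predictor y_hat = w*x + b,
# MSE loss, and tiny made-up support/query batches.

def predict(theta, x):
    w, b = theta
    return w * x + b

def loss(theta, batch):
    # Mean squared error over (x, y) pairs
    return sum((predict(theta, x) - y) ** 2 for x, y in batch) / len(batch)

def grad(theta, batch):
    # Analytic gradient of the MSE with respect to (w, b)
    n = len(batch)
    gw = sum(2 * (predict(theta, x) - y) * x for x, y in batch) / n
    gb = sum(2 * (predict(theta, x) - y) for x, y in batch) / n
    return [gw, gb]

def hessian(theta, batch):
    # Hessian of the MSE; constant for a linear model, evaluated per step anyway
    n = len(batch)
    hww = sum(2 * x * x for x, _ in batch) / n
    hwb = sum(2 * x for x, _ in batch) / n
    return [[hww, hwb], [hwb, 2.0]]

def inner_steps(theta0, support, alpha, J):
    # Eq. (4): J gradient steps on the support tasks T_c^s
    thetas = [list(theta0)]
    for _ in range(J):
        g = grad(thetas[-1], support)
        thetas.append([t - alpha * gi for t, gi in zip(thetas[-1], g)])
    return thetas  # [theta_{c,0}, ..., theta_{c,J}]

def meta_gradient(theta0, support, query, alpha, J):
    # Eq. (6): query gradient at theta_{c,J}, right-multiplied by the
    # Jacobians (I - alpha * H(theta_{c,j-1})) for j = J, ..., 1
    thetas = inner_steps(theta0, support, alpha, J)
    v = grad(thetas[J], query)
    for j in range(J, 0, -1):
        h = hessian(thetas[j - 1], support)
        jac = [[(1.0 if r == c else 0.0) - alpha * h[r][c] for c in range(2)]
               for r in range(2)]
        v = [sum(v[r] * jac[r][c] for r in range(2)) for c in range(2)]
    return v

def local_update(theta0, support, query, alpha, beta, J):
    # Eq. (5): theta_{c,0}^{t+1} = theta_{c,0}^t - beta * meta-gradient
    mg = meta_gradient(theta0, support, query, alpha, J)
    return [t - beta * g for t, g in zip(theta0, mg)]

support = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9)]  # hypothetical (x, y) pairs
query = [(0.5, 2.0), (1.5, 4.0)]
theta = [0.0, 0.0]
ALPHA, BETA, J = 0.05, 0.1, 3  # illustrative step sizes and inner-step count

def meta_objective(th):
    # L(theta_{c,J}; T_c^q) as a function of the starting point theta_{c,0}
    return loss(inner_steps(th, support, ALPHA, J)[-1], query)

mg = meta_gradient(theta, support, query, ALPHA, J)

# Central finite differences as an independent check of Eq. (6)
eps = 1e-6
fd = []
for k in range(2):
    plus, minus = list(theta), list(theta)
    plus[k] += eps
    minus[k] -= eps
    fd.append((meta_objective(plus) - meta_objective(minus)) / (2 * eps))

new_theta = local_update(theta, support, query, ALPHA, BETA, J)
print("analytic:", mg, "finite-diff:", fd)
print("query loss before/after:", meta_objective(theta), meta_objective(new_theta))
```

Because the linear model makes the Hessian constant, every Jacobian factor here is identical; with a nonlinear predictor the same loop would evaluate a different Hessian at each $\theta^t_{c,j-1}$, which is why the sketch recomputes it per step.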
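The personalization rule (11) is the $J$ inner updates of (4) written in unrolled form, starting from the global parameters. The sketch below illustrates this under stated assumptions that do not come from the letter: a hypothetical linear predictor $\hat{y} = wx + b$, a made-up global model, and a toy client whose traffic follows $y = 2x + 1$. It runs the fine-tuning loop, rebuilds $\theta^t_{c,J}$ from the accumulated gradient sum, and reports the test MSE of (12) before and after adaptation.

```python
# Minimal sketch of local personalization, Eq. (11), and MSE evaluation, Eq. (12).
# Assumptions (not from the letter): linear predictor y_hat = w*x + b, a
# hypothetical global model, and a toy client whose traffic follows y = 2x + 1.

def predict(theta, x):
    w, b = theta
    return w * x + b

def mse(theta, data):
    # Eq. (12): mean of (y_hat_n - y_n)^2 over the test pairs (x_n, y_n)
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)

def grad(theta, batch):
    # Analytic gradient of the MSE with respect to (w, b)
    n = len(batch)
    gw = sum(2 * (predict(theta, x) - y) * x for x, y in batch) / n
    gb = sum(2 * (predict(theta, x) - y) for x, y in batch) / n
    return [gw, gb]

def personalize(theta_global, support, alpha, J):
    # Runs the J inner steps of Eq. (4) from the global model and also
    # rebuilds theta_{c,J} in the unrolled form of Eq. (11)
    theta = list(theta_global)
    grad_sum = [0.0, 0.0]
    for _ in range(J):
        g = grad(theta, support)  # gradient at theta_{c,j}
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    unrolled = [t0 - alpha * s for t0, s in zip(theta_global, grad_sum)]
    return theta, unrolled

support = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # local samples of y = 2x + 1
test_set = [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0)]  # held-out samples, same client
theta_global = [1.0, 0.0]                          # hypothetical global model

theta_c, theta_unrolled = personalize(theta_global, support, alpha=0.1, J=5)
before = mse(theta_global, test_set)
after = mse(theta_c, test_set)
print(f"test MSE before: {before:.3f}, after J=5 steps: {after:.3f}")
```

Keeping the iterative and unrolled forms side by side makes it easy to confirm that (11) adds nothing beyond repeating (4); only the starting point (the global model) and the number of steps $J$ matter.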
TABLE I
PREDICTION COMPARISONS AMONG DIFFERENT ALGORITHMS
… the Trentino data set, our method achieves much better results than FedAvg and FedDA, especially when the traffic volume increases from the fourth day onward. The superiority of our method is reflected even more clearly by the ECDF of the prediction errors in Fig. 3. For example, on the Trentino data set, 82.5% of the prediction errors of our method are below 0.2 GB, whereas the corresponding proportions for FedAvg and FedDA are 40% and 65%, respectively. These results indicate that introducing MAML and distance-based weighted model aggregation into federated learning can indeed enhance the generalization ability of the global model, particularly in highly heterogeneous scenarios such as the Trentino data set.

E. Homogeneous vs. Heterogeneous

To further demonstrate the ability of different models to handle heterogeneous wireless traffic, we select four regions with highly similar traffic patterns and a fifth region whose traffic pattern is distinct from those of the other four. We train a model on data samples from the four similar regions and test its performance on each of the five regions separately to mimic homogeneous and heterogeneous scenarios. The selected regions and the results of FedAvg, FedDA, and our proposed method are summarized in Table II. Table II shows that all three methods perform fairly well in the homogeneous scenarios. However, on the heterogeneous data set, i.e., when a model is tested on data samples with unseen (possibly unbalanced) and distinct traffic patterns, our proposed method performs much better than FedAvg and FedDA. Thus, the results in Table II demonstrate the strong adaptive ability of our method on heterogeneous wireless traffic data sets.

TABLE II
MSE COMPARISONS OF DIFFERENT MODELS WHEN DEALING WITH DIFFERENT TRAFFIC SCENARIOS

F. Impacts of Hyper-Parameters

There are two key hyper-parameters in our method: the number of data samples per slot and the number of fine-tuning steps. Fig. 4 reports the results when these two hyper-parameters are varied. As Fig. 4 shows, when the number of adaptation steps or the number of data samples per slot increases, the performance of our method improves, since more data samples are involved in the local model update. However, once the number of fine-tuning steps is large enough, the additional performance gain is minimal. In practice, the optimal values of these two hyper-parameters can be obtained through a grid search.

Fig. 4. Parameter sensitivity.

V. CONCLUSION

In this letter, we proposed an efficient federated meta-learning approach for decentralized wireless traffic prediction. A distance-based weighted model aggregation scheme was integrated to capture the spatial-temporal characteristics. By implementing the approach, we obtained a sensitive global model that can quickly adapt to heterogeneous scenarios and unbalanced data availability at the edge clients via only a few fine-tuning steps. Three metrics on two different data sets were used to evaluate the effectiveness and efficiency of the approach, and the impacts of the hyper-parameters were also reported. The experimental results showed that our proposed approach outperforms other federated learning approaches and classical prediction methods.

REFERENCES

[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, Oct. 2019.
[2] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, Aug. 2019.
[3] L. Zhang, A. Celik, S. Dang, and B. Shihada, “Energy-efficient trajectory optimization for UAV-assisted IoT networks,” IEEE Trans. Mobile Comput., early access, Apr. 22, 2021, doi: 10.1109/TMC.2021.3075083.
[4] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, “Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1389–1401, Jun. 2019.
[5] Y. Xu, F. Yin, W. Xu, J. Lin, and S. Cui, “Wireless traffic prediction with scalable Gaussian process: Framework, algorithms, and verification,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1291–1306, Jun. 2019.
[6] F. Tang, B. Mao, Z. M. Fadlullah, and N. Kato, “On a novel deep-learning-based intelligent partially overlapping channel assignment in SDN-IoT,” IEEE Commun. Mag., vol. 56, no. 9, pp. 80–86, Sep. 2018.
[7] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287, Mar. 2019.
[8] C. Zhang and P. Patras, “Long-term mobile traffic forecasting using deep spatio-temporal neural networks,” in Proc. 18th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., Los Angeles, CA, USA, Jun. 2018, pp. 231–240.
[9] J. Wang et al., “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Atlanta, GA, USA, May 2017, pp. 1–9.
[10] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, “Spatio-temporal wireless traffic prediction with recurrent neural network,” IEEE Wireless Commun. Lett., vol. 7, no. 4, pp. 554–557, Aug. 2018.
[11] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, “Citywide cellular traffic prediction based on densely connected convolutional neural networks,” IEEE Commun. Lett., vol. 22, no. 8, pp. 1656–1659, Aug. 2018.
[12] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, “Dual attention-based federated learning for wireless traffic prediction,” in Proc. IEEE INFOCOM, Vancouver, BC, Canada, May 2021, pp. 1–10.
[13] S. Hosseinalipour, C. G. Brinton, V. Aggarwal, H. Dai, and M. Chiang, “From federated to fog learning: Distributed machine learning over heterogeneous wireless networks,” IEEE Commun. Mag., vol. 58, no. 12, pp. 41–47, Dec. 2020.
[14] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-IID data,” 2018, arXiv:1806.00582.
[15] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. ICML, Sydney, NSW, Australia, Aug. 2017, pp. 1126–1135.
[16] G. Barlacchi et al., “A multi-source dataset of urban life in the city of Milan and the Province of Trentino,” Sci. Data, vol. 2, no. 1, pp. 1–15, Oct. 2015.