
IEEE COMMUNICATIONS LETTERS, VOL. 26, NO. 7, JULY 2022

Efficient Wireless Traffic Prediction at the Edge:
A Federated Meta-Learning Approach

Liang Zhang, Student Member, IEEE, Chuanting Zhang, Member, IEEE, and Basem Shihada, Senior Member, IEEE

Abstract—Wireless traffic prediction plays a vital role in managing highly dynamic, low-latency communication networks, especially 6G wireless networks. Given the data and computing resource constraints of edge devices, federated wireless traffic prediction has attracted considerable interest. However, federated learning is limited in dealing with heterogeneous scenarios and unbalanced data availability. Along this line, we propose an efficient federated meta-learning approach to learn a sensitive global model with knowledge collected from different regions. The global model can efficiently adapt to heterogeneous local scenarios by performing only one or a few steps of fine-tuning on the local data sets. Additionally, a distance-based weighted model aggregation is designed to capture the dependencies among different regions for better spatial-temporal prediction. We evaluate the performance of the proposed scheme by comparing it with conventional federated learning approaches and other commonly used benchmarks for traffic prediction. Extensive simulation results reveal that the proposed scheme outperforms the benchmarks.

Index Terms—Wireless traffic prediction, federated meta-learning, heterogeneous scenarios.

Manuscript received 14 February 2022; revised 6 April 2022; accepted 13 April 2022. Date of publication 18 April 2022; date of current version 12 July 2022. This work was supported by the King Abdullah University of Science and Technology. The associate editor coordinating the review of this letter and approving it for publication was Z. Yang. (Corresponding authors: Chuanting Zhang; Basem Shihada.)
Liang Zhang and Basem Shihada are with the Computer, Electrical, and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia (e-mail: [email protected]; [email protected]).
Chuanting Zhang was with the Computer, Electrical, and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia. He is now with the Department of Electrical and Electronic Engineering, University of Bristol, Bristol BS8 1UB, U.K. (e-mail: [email protected]).
Digital Object Identifier 10.1109/LCOMM.2022.3167813
I. INTRODUCTION
With the emergence of concepts such as 6G wireless networks [1], [2], the Internet of Things (IoT), and unmanned aerial vehicle (UAV) assisted networks [3], wireless traffic is anticipated to be dynamic, complex, and excessively high in scale. Wireless traffic prediction [4], [5] is one of the core ingredients of 6G networks, since proactive resource allocation and green communications rely heavily on accurate prediction of the future traffic state. For example, adaptive channel assignment can be obtained by predicting wireless traffic to avoid traffic congestion [6].

Recently, deep learning (DL) approaches have achieved promising improvements for wireless traffic prediction [7], [8]. Recurrent neural networks (RNNs) are exploited in [9], [10] to perform spatial-temporal wireless traffic prediction. In [11], a spatial-temporal densely connected network (STDenseNet) is proposed for city-scale wireless traffic prediction. Zhang et al. exploit transfer learning to capture the complex patterns hidden in cellular data and transfer the knowledge to various traffic scenarios [4]. Transfer learning solves the problem of limited data and avoids training from scratch. However, the knowledge needs to be learnt and transferred from a region with a similar scenario.

The aforementioned centralized DL schemes [4], [11] need access to the geographically distributed data sets, which is hard to guarantee in wireless networks due to privacy concerns and communication overhead. This naturally triggers the idea of a federated learning (FL) solution for wireless traffic prediction, such as FedDA [12]. FL can significantly reduce the network bandwidth and latency by sending only the model parameters rather than the raw data stream. However, it is challenging to ensure good performance when FL applications face spatially-correlated scenarios [13]. FedDA adopts a clustering scheme to address spatial dependency modeling, but its clusters are predetermined, which is too rigid to model the local spatial dependencies.

To avoid training from scratch and to achieve personalized models for geographically distributed heterogeneous data sets, data-sharing strategies [14], multi-task learning [10], and meta-learning [15] have been adopted to overcome the statistical heterogeneity problem confronted by FL. But data-sharing breaks the principle of data privacy. Multi-task learning relies heavily on the assumption of certain task relationships, limiting its ability to solve the heterogeneity problem. A meta-learning model, on the other hand, is capable of adapting or generalizing well to new tasks and new environments. Thus, in this letter, we introduce model-agnostic meta-learning (MAML) into wireless traffic prediction under the FL framework to achieve efficient wireless traffic prediction at the edge. Specifically, we aim to train a sensitive initial model that can adapt fast to heterogeneous scenarios in different regions. A distance-based weighted model aggregation is further proposed and integrated to capture the dependencies among different regions for better spatial-temporal prediction. The proposed scheme inherits all the benefits of the FL architecture and guarantees an extra personalized characteristic for each local model.
II. DATA AND PROBLEM FORMULATION

A. Wireless Traffic Data

The wireless traffic data sets are call detail records (CDRs) from the city of Milan, Italy, and the province of Trentino, Italy, collected every 10 minutes over a two-month time span [16].

The raw CDRs are geo-referenced, anonymized, and aggregated Internet traffic data based on the location of the regions. Specifically, a CDR is logged if a user transfers more than 5 MB of data or spends more than 15 minutes online. After that, the records are grouped by administrative regions to protect privacy.

Fig. 1. Spatial and temporal characteristics of wireless traffic.

The patterns hidden in wireless traffic are complex and challenging to model. The characteristics of wireless traffic in heterogeneous scenarios are analysed in Fig. 1, which includes the physical locations of five regions of Milan and the corresponding temporal and spatial traffic dynamics. We can observe that some regions have similar temporal patterns visually and high spatial correlation statistically. For example, region A and region B are physically near each other and have the same peak traffic hours. Their traffic series also have high spatial correlations (0.94 in terms of the Pearson correlation coefficient calculated with the traffic vectors of A and B). But we also observe that some regions have distinct traffic patterns. For example, region A and region E have different peak traffic hours and small correlations. Besides, as shown in Fig. 1d, different regions have various traffic statistics. In this context, we need to train a model capable of capturing both the pattern similarity (spatial and temporal dependencies) and the pattern diversity (personalization).
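As a quick illustration of the spatial-correlation statistic quoted above, the sketch below computes the Pearson correlation coefficient between two regional traffic series; the series values are made up for illustration and are not from the Milan data set.

```python
import numpy as np

# Hypothetical traffic series for two neighbouring regions (illustrative values only).
traffic_a = np.array([12.1, 9.8, 7.5, 8.2, 15.6, 30.4, 42.0, 38.7])
traffic_b = np.array([11.4, 9.1, 7.9, 8.8, 16.2, 29.5, 40.3, 37.9])

# Pearson correlation coefficient between the two series, the statistic used in
# the letter to quantify spatial similarity (e.g., 0.94 for regions A and B).
rho = np.corrcoef(traffic_a, traffic_b)[0, 1]
print(f"Pearson correlation: {rho:.2f}")
```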
B. Problem Formulation

We consider a decentralized communication network among geographically distributed regions. For each region, a local client records the wireless traffic and conducts the local model update. C = {1, . . . , k, . . . , K} denotes the set of clients, where k is the index and K is the total number of local clients. The sequential traffic data are divided into N time slots. In the n-th time slot, d_n is the random variable representing the traffic volume, and the closeness dependency x_n = {d_{n-m}, d_{n-m+1}, . . . , d_{n-1}} is regarded as the input feature, where m is the number of the nearest data points taken into consideration. We take d_n as the prediction target, labelled as the output y_n, since we consider one-step-ahead prediction. Thus the input-output pair {x_n, y_n} can be obtained by using a sliding-window scheme.
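A minimal sketch of this sliding-window sample construction is given below; the function name and the toy series are illustrative and not taken from the authors' code.

```python
import numpy as np

def make_samples(traffic: np.ndarray, m: int):
    """Build one-step-ahead samples with a sliding window:
    x_n = [d_{n-m}, ..., d_{n-1}] and y_n = d_n, as in Section II-B."""
    xs, ys = [], []
    for n in range(m, len(traffic)):
        xs.append(traffic[n - m:n])
        ys.append(traffic[n])
    return np.stack(xs), np.array(ys)

# Toy example: a series of 100 time slots with window size m = 6 (the paper's setting).
d = np.random.rand(100)
X, y = make_samples(d, m=6)
print(X.shape, y.shape)   # (94, 6) (94,)
```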
The samples are locally generated. The number of samples N_k varies from client to client, and zero samples are possible for an individual client. Furthermore, the training set of the k-th client P_k is divided into a support set P_k^s and a query set P_k^q. Personalized knowledge is preserved and internally transferred via P_k^s to P_k^q. To aggregate the local models at the central server and to pass the global model from the central server down to each local client, uplinks and downlinks between the local clients and the central server are established.

Generally, the objective of FL-based traffic prediction is to obtain a global model with parameter θ that minimizes the average loss over the local data sets, which is denoted as

\min_{\theta} \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}(\theta; \mathcal{P}_k),    (1)

where L(θ; P_k) is the loss function representing the difference between the predicted traffic volume ŷ_n and the ground truth y_n. Taking the mean squared error (MSE) as the metric, for example, the loss function is defined as

\mathcal{L}(\theta; \mathcal{P}_k) = \frac{1}{N_k} \sum_{\{x_n, y_n\} \in \mathcal{P}_k} (\hat{y}_n - y_n)^2.    (2)
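A minimal PyTorch sketch of the per-client MSE loss (2) and the averaged FL objective (1) follows; the predictor, tensor shapes, and toy data are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def client_loss(model: nn.Module, data) -> torch.Tensor:
    """MSE loss L(theta; P_k) of Eq. (2) on one client's samples."""
    x, y = data                                   # x: (N_k, m), y: (N_k,)
    y_hat = model(x).squeeze(-1)
    return torch.mean((y_hat - y) ** 2)

def global_objective(model: nn.Module, clients) -> torch.Tensor:
    """Average loss over the K clients, i.e., the FL objective of Eq. (1)."""
    return torch.stack([client_loss(model, d) for d in clients]).mean()

# Toy usage with two clients and a linear predictor (illustrative only).
clients = [(torch.randn(32, 6), torch.randn(32)) for _ in range(2)]
print(global_objective(nn.Linear(6, 1), clients))
```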

In contrast to traditional FL-based traffic prediction, which targets an ordinary model that ingests all clients, our objective is to obtain a sensitive global model that can adapt fast to a heterogeneous distribution of scenarios. In this regard, we minimize the loss between the true value of each client's traffic and the predicted value produced by the fine-tuned model, which is obtained by performing one or a few steps of fine-tuning on part of the local data set. The objective is formally described as

\min_{\theta} \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}\Big(\theta - \alpha \sum_{j=0}^{J-1} \nabla_{\theta} \mathcal{L}(\theta_{k,j}; \mathcal{P}_k^s);\ \mathcal{P}_k^q\Big),    (3)

where ∇_θ L(θ_{k,j}; P_k^s) denotes the gradient corresponding to the j-th step of the local update, j ∈ [0, J), and θ - α Σ_{j=0}^{J-1} ∇_θ L(θ_{k,j}; P_k^s) is the model obtained after fine-tuning on the support set P_k^s. The intuition of problem (3) is to minimize the average loss of the fine-tuned models on the query sets P_k^q, which play the role of the new tasks we expect our model to adapt to quickly.
III. FEDERATED META-LEARNING APPROACH

In this section, we propose a federated meta-learning approach for wireless traffic prediction. The training system is configured with a decentralized structure, the same as the conventional FL-based approach [12]. We implement the MAML strategy in the federated framework and conduct distance-based weighted model aggregation to simultaneously achieve efficient and personalized traffic prediction. Once the global model is well trained, testing is conducted individually at the edge after a few steps of gradient-descent fine-tuning. The scheme is illustrated in Algorithm 1.

A. MAML-Enhanced Parameter Learning

We randomly initialize the global model parameter θ. A set C_t of C = max(δK, 1) clients is randomly selected during each training episode, where δ is the hyper-parameter specifying the fraction of clients chosen at each round.


Algorithm 1: Federated Meta-Learning Algorithm for Wireless Traffic Prediction
Input: data sets P, step-size parameters α and β, fraction of selected clients δ
Output: learned model parameters θ
1:  Randomly initialize θ
2:  for each round t = 0, 1, 2, . . . do
3:      C = max(δK, 1)
4:      Sample a set C_t of C clients
5:      for each client c ∈ C_t in parallel do
6:          Load global model: θ_{c,0}^t = θ^t
7:          Sample a batch of tasks T_c^s from P_c^s
8:          for each step j = 1, 2, . . . , J do
9:              θ_{c,j}^t = θ_{c,j-1}^t - α ∇_{θ^t} L(θ_{c,j-1}^t; T_c^s)
10:         Sample a batch of tasks T_c^q from P_c^q
11:         Update model with (5)
12:     Individual model enhancement based on spatial dependencies: θ̃_c^{t+1} = Σ_{r∈C_t} ρ̃_{c,r}^{t+1} θ_{r,0}^{t+1}
13:     Global model update: θ^{t+1} = (1/C) Σ_{c∈C_t} θ̃_c^{t+1}
For each client c ∈ C_t, we load the current global model in parallel and initialize the local model parameter θ_{c,0}^t by copying the global model parameter θ^t. Thereafter, a batch of traffic prediction tasks T_c^s is sampled from the support set P_c^s. J steps of gradient descent are conducted on the sampled T_c^s, and the updated model is internally transferred to preserve the personalized knowledge. Formally, the local model parameters updated at the j-th step are calculated as follows

\theta_{c,j}^{t} = \theta_{c,j-1}^{t} - \alpha \nabla_{\theta^{t}} \mathcal{L}(\theta_{c,j-1}^{t}; \mathcal{T}_c^s).    (4)

Subsequently, a batch of tasks T_c^q is sampled from the query set P_c^q. The local model is improved by rapidly adjusting to the sampled query tasks. The updated local models are uploaded to the central server for knowledge integration from the heterogeneous scenarios, such as

\theta_{c,0}^{t+1} = \theta_{c,0}^{t} - \beta \nabla_{\theta^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q),    (5)

where ∇_{θ^t} L(θ_{c,J}^t; T_c^q) is the second-order gradient descent conducted on the query tasks and is merged into the current local model corresponding to the slightly updated and internally transferred model parameter θ_{c,J}^t. Taking equation (4) into consideration, the second-order gradient descent operation is given as

\nabla_{\theta^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q)
  = \nabla_{\theta_{c,J}^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q) \cdot \nabla_{\theta^{t}} \theta_{c,J}^{t}
  = \nabla_{\theta_{c,J}^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q) \cdot \nabla_{\theta_{c,J-1}^{t}} \theta_{c,J}^{t} \cdot \nabla_{\theta^{t}} \theta_{c,J-1}^{t}
  = \nabla_{\theta_{c,J}^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q) \cdot \prod_{j=1}^{J} \nabla_{\theta_{c,j-1}^{t}} \theta_{c,j}^{t}
  = \nabla_{\theta_{c,J}^{t}} \mathcal{L}(\theta_{c,J}^{t}; \mathcal{T}_c^q) \cdot \prod_{j=1}^{J} \big(I - \alpha \nabla^{2}_{\theta_{c,j-1}^{t}} \mathcal{L}(\theta_{c,j-1}^{t}; \mathcal{T}_c^s)\big).    (6)
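The following PyTorch sketch shows how one client could carry out the inner support updates of Eq. (4) and the second-order query update of Eqs. (5)-(6) with automatic differentiation. It is a minimal functional-style illustration, not the authors' implementation: the helper names, the assumption that the model is a plain stack of linear layers, and the toy usage at the end are all ours.

```python
import copy
import torch
import torch.nn as nn

def maml_local_update(global_model: nn.Module, support, query,
                      alpha: float, beta: float, J: int) -> nn.Module:
    """One client's MAML-style update, a sketch of Eqs. (4)-(5).
    `support` and `query` are (x, y) task batches drawn from P_c^s and P_c^q."""
    model = copy.deepcopy(global_model)          # theta_{c,0}^t = theta^t (Algorithm 1, line 6)
    params = list(model.parameters())
    fast = [p.clone() for p in params]           # differentiable copy of the local parameters
    mse = nn.MSELoss()

    def forward(weights, x):
        # Assumes an MLP whose parameters come in (weight, bias) pairs, layer by layer.
        h = x
        for i in range(0, len(weights) - 2, 2):
            h = torch.relu(h @ weights[i].t() + weights[i + 1])
        return h @ weights[-2].t() + weights[-1]

    # Inner loop: J gradient steps on the support tasks, Eq. (4).
    xs, ys = support
    for _ in range(J):
        loss = mse(forward(fast, xs).squeeze(-1), ys)
        grads = torch.autograd.grad(loss, fast, create_graph=True)  # keep graph for the 2nd-order term
        fast = [w - alpha * g for w, g in zip(fast, grads)]

    # Outer step: evaluate the adapted parameters on the query tasks and
    # back-propagate to the initial parameters, Eqs. (5)-(6).
    xq, yq = query
    query_loss = mse(forward(fast, xq).squeeze(-1), yq)
    outer_grads = torch.autograd.grad(query_loss, params)
    with torch.no_grad():
        for p, g in zip(params, outer_grads):
            p -= beta * g                        # theta_{c,0}^{t+1}
    return model

# Toy usage (shapes follow the m = 6 window of Section IV-A).
net = nn.Sequential(nn.Linear(6, 40), nn.ReLU(), nn.Linear(40, 1))
support = (torch.randn(20, 6), torch.randn(20))
query = (torch.randn(20, 6), torch.randn(20))
updated = maml_local_update(net, support, query, alpha=0.01, beta=0.01, J=2)
```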
B. Distance-Based Weighted Model Aggregation

To further model the spatial dependencies among different regions, we propose a distance-based weighted model aggregation scheme. More specifically, once the central server has received all the gradient information from the chosen clients at the t-th communication round, we calculate the cosine similarities among different regions, which yields a distance matrix ρ^{t+1}

\rho^{t+1} =
\begin{bmatrix}
\rho_{1,1}^{t+1} & \rho_{1,2}^{t+1} & \cdots & \rho_{1,C}^{t+1} \\
\rho_{2,1}^{t+1} & \rho_{2,2}^{t+1} & \cdots & \rho_{2,C}^{t+1} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{C,1}^{t+1} & \rho_{C,2}^{t+1} & \cdots & \rho_{C,C}^{t+1}
\end{bmatrix},    (7)

where ρ_{c,r}^{t+1} measures the cosine similarity between region c and region r, and is computed as

\rho_{c,r}^{t+1} = \frac{\theta_{c,0}^{t+1} \cdot \theta_{r,0}^{t+1}}{\|\theta_{c,0}^{t+1}\| \cdot \|\theta_{r,0}^{t+1}\|}.    (8)

For each client c, an enhanced individual model incorporating spatial dependencies is obtained as

\tilde{\theta}_c^{t+1} = \sum_{r \in \mathcal{C}_t} \tilde{\rho}_{c,r}^{t+1} \theta_{r,0}^{t+1},    (9)

where ρ̃_{c,r}^{t+1} is the softmax version of ρ_{c,r}^{t+1}. Then, the central server updates the global model as follows

\theta^{t+1} = \frac{1}{C} \sum_{c \in \mathcal{C}_t} \tilde{\theta}_c^{t+1}.    (10)

The induced global model captures the spatial dependencies among different regions and can adapt to new traffic patterns. Notice that, for fairness, we set the size of T_c sampled in every client to be identical.
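A compact PyTorch sketch of Eqs. (7)-(10) is shown below, operating on flattened parameter vectors; the function signature and the in-place write into a global model are our own choices, under the assumption that all clients share one architecture.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def distance_weighted_aggregate(local_models, global_model):
    """Distance-based weighted model aggregation, a sketch of Eqs. (7)-(10).
    `local_models` holds the C uploaded client models (theta_{c,0}^{t+1})."""
    thetas = torch.stack([parameters_to_vector(m.parameters()).detach()
                          for m in local_models])            # (C, D)

    # Eqs. (7)-(8): pairwise cosine similarities between the client models.
    normed = thetas / thetas.norm(dim=1, keepdim=True)
    rho = normed @ normed.t()                                 # (C, C)

    # Eq. (9): softmax the similarities row-wise and mix the client models.
    rho_soft = torch.softmax(rho, dim=1)
    enhanced = rho_soft @ thetas                              # row c = theta_tilde_c^{t+1}

    # Eq. (10): the new global parameters are the average of the enhanced models.
    vector_to_parameters(enhanced.mean(dim=0), global_model.parameters())
    return global_model
```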
C. Model Personalization With Local Adaption

Before adopting the model for a new traffic prediction task of a specific client, fine-tuning is executed on the local data set to adjust the model to the private data of each client. Specifically, we sample a batch of tasks T_c^s from the local data set and conduct only one or a few gradient descent steps. This adaption is the repetition of the sampling and internal updating process (lines 7-9) in Algorithm 1. The traffic volume is predicted by the personalized model with parameter θ_{c,J}, which is expressed as

\theta_{c,J} = \theta - \alpha \sum_{j=0}^{J-1} \nabla_{\theta} \mathcal{L}(\theta_{c,j}; \mathcal{T}_c^s).    (11)

The model can be evaluated in terms of MSE on the test data set P_c^test, such as

\mathcal{L}(\theta_{c,J}; \mathcal{P}_c^{test}) = \frac{1}{N_c^{test}} \sum_{\{x_n, y_n\} \in \mathcal{P}_c^{test}} (\hat{y}_n - y_n)^2,    (12)

where N_c^test is the number of samples for testing.
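A short sketch of this local adaptation and evaluation step (Eqs. (11)-(12)) is given below; the function name, the use of an SGD optimizer, and the (x, y) tensor layout are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def personalize_and_evaluate(global_model: nn.Module, support, test,
                             alpha: float, J: int):
    """Fine-tune the global model on a client's support tasks (Eq. (11))
    and report the MSE on that client's test set (Eq. (12))."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=alpha)
    mse = nn.MSELoss()

    # One or a few gradient steps on the local support tasks (lines 7-9 of Algorithm 1).
    xs, ys = support
    for _ in range(J):
        opt.zero_grad()
        loss = mse(model(xs).squeeze(-1), ys)
        loss.backward()
        opt.step()

    # Eq. (12): MSE of the personalized model on the client's test data.
    xt, yt = test
    with torch.no_grad():
        test_mse = mse(model(xt).squeeze(-1), yt).item()
    return model, test_mse
```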


TABLE I. Prediction Comparisons Among Different Algorithms.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. Experiment Settings

Our experiments use the first seven weeks of data to train a model and the last week of data to test it. In each communication round we assume only a few clients, e.g., δK, are involved, and we set δ to 0.1. K equals 88 and 223 for the Milano and Trentino data sets, respectively. To generate data samples, the window size m is set to 6. Data samples are standardized to accelerate training. We design a neural network with L layers, and each layer has M neurons. Considering the amount of data in each region and the power restrictions of the edge server, L and M are set to 3 and 40, unless otherwise specified. We train our model for 100 consecutive rounds with batch size 20 using SGD. The learning rates, i.e., α and β, are obtained by a grid search over α, β ∈ {0.1, 0.01, 0.001}.
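The sketch below builds a predictor matching the stated setup (window m = 6, L = 3 layers of M = 40 neurons, SGD training); the choice of ReLU activations, the single-output head, and the exact reading of "L layers" are assumptions on our part.

```python
import torch
import torch.nn as nn

def build_predictor(m: int = 6, M: int = 40, L: int = 3) -> nn.Sequential:
    """MLP with L hidden layers of M neurons, mapping an m-slot window
    to a one-step-ahead traffic volume (activation choice is assumed)."""
    layers, width = [], m
    for _ in range(L):
        layers += [nn.Linear(width, M), nn.ReLU()]
        width = M
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

model = build_predictor()                            # default setting: L = 3, M = 40
opt = torch.optim.SGD(model.parameters(), lr=0.01)   # lr picked by grid search over {0.1, 0.01, 0.001}
```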
B. Baselines and Evaluation Metrics

We compare our algorithm with the historical average (HA), support vector regression (SVR), random forest (RF), multi-layer perceptron (MLP), federated averaging (FedAvg), and FedDA. The first is a classical time-series prediction method. SVR and RF are traditional machine learning methods for wireless traffic prediction. FedAvg trains a global model by averaging the local ones. FedDA captures the spatial dependencies of regions by clustering. We train HA, SVR, and RF in a fully distributed fashion and train MLP in a centralized fashion. All the others are trained in a federated manner. To make a fair comparison, MLP, FedAvg, FedDA, and our proposed method share exactly the same network architecture and are configured with the same (hyper-)parameters, e.g., learning rate and batch size. To explore our model's robustness, we consider three variants, i.e., a standard network with L = 3 and M = 40 named Proposed-s, a wide network with L = 3 and M = 400 named Proposed-w, and a deep network with L = 10 and M = 40 named Proposed-d. We evaluate the prediction performance of the different algorithms in terms of the MSE and mean absolute error (MAE) metrics.
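For reference, the two evaluation metrics are computed as below; the historical-average baseline shown alongside is only one common reading of HA (averaging the m previous slots), since the letter does not spell out its exact form.

```python
import numpy as np

def mse(y_hat: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((y_hat - y) ** 2))

def mae(y_hat: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean(np.abs(y_hat - y)))

def ha_predict(X: np.ndarray) -> np.ndarray:
    """Historical-average baseline: predict the mean of each m-slot input window
    (one common variant; the paper's exact definition may differ)."""
    return X.mean(axis=1)
```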
C. Prediction Results

We repeat the experiments 10 times and report the averaged quantitative prediction results in Table I. The best results are marked in bold for clarity. We can observe from this table that our proposed method achieves the best prediction results, which validates the effectiveness of MAML integrated into a federated framework. Taking the MSE as an example, among all the baselines (except MLP) the best results are 0.0179 and 0.5463 on the Milano and Trentino data sets, respectively, whereas our method's best results are 0.0169 and 0.4815. Thus, we see a clear performance improvement, especially for the Trentino data set: the improvement percentages are up to 5% and 11.9% for the Milano and Trentino data sets, respectively. In addition, we notice that our method is relatively robust to the network architecture, as all three variants achieve similar prediction results. Thanks to its fast adaptability to different traffic scenarios, it achieves similar or even better predictions when compared with the centralized MLP strategy. The performance of HA is generally poor, as it is parameter-free and has no ability to learn the hidden patterns. Learning-based fully distributed methods can usually achieve lower prediction errors than HA, as they can model the traffic dynamics through adjustable parameters. Besides, a model's prediction ability has a positive relationship with its number of parameters. Another thing worth noting is that FL-based methods are superior to fully distributed methods, since they involve model aggregation and can fuse knowledge of different regions, which is an effective way to capture the spatial dependencies among different regions. Still, our proposed method achieves better predictions than FedAvg and FedDA, since it is aware of the spatial dependency diversities among different regions.

Fig. 2. Predictions versus ground truth values.

Fig. 3. ECDF as a function of absolute error.

D. Prediction vs Ground Truth

We report region-level prediction results in this subsection. For each data set, a random region is selected, and the comparisons between predictions and ground truth values are plotted in Fig. 2. Besides, the empirical cumulative distribution function (ECDF) of absolute prediction errors is also reported, and the results are summarized in Fig. 3. We include the results of FedAvg and FedDA in both figures for comparison.
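The ECDF of Fig. 3 can be computed from the absolute prediction errors as in the short sketch below; the placeholder error array is illustrative only.

```python
import numpy as np

def ecdf(abs_errors: np.ndarray):
    """Empirical CDF of the absolute prediction errors |y_hat - y| (as in Fig. 3)."""
    x = np.sort(abs_errors)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Example: fraction of errors below 0.2 GB, the threshold quoted for the Trentino data set.
errors = np.abs(np.random.randn(1000) * 0.15)   # placeholder errors, illustrative only
x, y = ecdf(errors)
frac_below_02 = float(np.mean(errors < 0.2))
print(f"P(|error| < 0.2 GB) = {frac_below_02:.2f}")
```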


From Fig. 2, we observe that our method has similar performance to FedAvg and FedDA on the Milano data set. But on the Trentino data set, our method achieves much better results than FedAvg and FedDA, especially when the traffic volume increases from the fourth day. Our method's superiority is reflected even more clearly by the ECDF of prediction errors in Fig. 3. For example, on the Trentino data set, 82.5% of the prediction errors of our method are less than 0.2 GB, while the corresponding figures for FedAvg and FedDA are 40% and 65%, respectively. The results indicate that introducing MAML and distance-based weighted model aggregation into federated learning can indeed enhance the generalization ability of the global model, particularly for highly heterogeneous scenarios such as the Trentino data set.

E. Homogeneous vs Heterogeneous

To further demonstrate the ability of the different models when dealing with heterogeneous wireless traffic, we select four regions with high similarities and another one whose traffic pattern is distinct from the other four. We train a model using data samples from the former four regions and test the model's performance on the five regions separately to mimic homogeneous and heterogeneous scenarios. The selected regions and the results obtained for FedAvg, FedDA, and our proposed method are summarized in Table II. We can clearly notice from Table II that all three methods perform fairly well in the homogeneous scenarios. But when dealing with a heterogeneous data set, i.e., testing a model on data samples with unseen (possibly unbalanced) and distinct traffic patterns, our proposed method achieves much better performance than FedAvg and FedDA. Thus, the results in Table II demonstrate the excellent adaptive ability of our method when dealing with heterogeneous wireless traffic data sets.

TABLE II. MSE Comparisons of Different Models When Dealing With Different Traffic Scenarios.
F. Impacts of Hyper-Parameters

There are two key hyper-parameters in our method, i.e., the number of data samples per slot and the number of fine-tuning steps. We report the results obtained when varying these two hyper-parameters in Fig. 4. It can be seen from Fig. 4 that when the number of adaption steps or the number of data samples per slot increases, the performance of our method improves, since more data samples are involved in the local model update. But when the number of fine-tuning steps is large enough, the performance gain becomes minimal. In practice, the optimal choices of these two parameters can be obtained through a grid search scheme.

Fig. 4. Parameter sensitivity.
V. CONCLUSION

In this letter, we proposed an efficient federated meta-learning approach for decentralized wireless traffic prediction. A distance-based weighted model aggregation scheme was integrated to capture the spatial-temporal characteristics. By implementing the approach, we obtained a sensitive global model that can quickly adapt to heterogeneous scenarios and unbalanced data availability at the edge clients via only a few steps of fine-tuning. Evaluations with three measures on two different data sets verified the effectiveness and efficiency of the approach, and the impacts of the hyper-parameters were also reported. The experimental results showed that our proposed approach outperforms other federated learning approaches and classical prediction methods.

REFERENCES

[1] W. Saad, M. Bennis, and M. Chen, "A vision of 6G wireless systems: Applications, trends, technologies, and open research problems," IEEE Netw., vol. 34, no. 3, pp. 134–142, Oct. 2019.
[2] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y. A. Zhang, "The roadmap to 6G: AI empowered wireless networks," IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, Aug. 2019.
[3] L. Zhang, A. Celik, S. Dang, and B. Shihada, "Energy-efficient trajectory optimization for UAV-assisted IoT networks," IEEE Trans. Mobile Comput., early access, Apr. 22, 2021, doi: 10.1109/TMC.2021.3075083.
[4] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, "Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1389–1401, Jun. 2019.
[5] Y. Xu, F. Yin, W. Xu, J. Lin, and S. Cui, "Wireless traffic prediction with scalable Gaussian process: Framework, algorithms, and verification," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1291–1306, Jun. 2019.
[6] F. Tang, B. Mao, Z. M. Fadlullah, and N. Kato, "On a novel deep-learning-based intelligent partially overlapping channel assignment in SDN-IoT," IEEE Commun. Mag., vol. 56, no. 9, pp. 80–86, Sep. 2018.
[7] C. Zhang, P. Patras, and H. Haddadi, "Deep learning in mobile and wireless networking: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287, Mar. 2019.
[8] C. Zhang and P. Patras, "Long-term mobile traffic forecasting using deep spatio-temporal neural networks," in Proc. 18th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., Los Angeles, CA, USA, Jun. 2018, pp. 231–240.
[9] J. Wang et al., "Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach," in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Atlanta, GA, USA, May 2017, pp. 1–9.
[10] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, "Spatio-temporal wireless traffic prediction with recurrent neural network," IEEE Wireless Commun. Lett., vol. 7, no. 4, pp. 554–557, Aug. 2018.
[11] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Citywide cellular traffic prediction based on densely connected convolutional neural networks," IEEE Commun. Lett., vol. 22, no. 8, pp. 1656–1659, Aug. 2018.
[12] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, "Dual attention-based federated learning for wireless traffic prediction," in Proc. IEEE INFOCOM, Vancouver, BC, Canada, May 2021, pp. 1–10.
[13] S. Hosseinalipour, C. G. Brinton, V. Aggarwal, H. Dai, and M. Chiang, "From federated to fog learning: Distributed machine learning over heterogeneous wireless networks," IEEE Commun. Mag., vol. 58, no. 12, pp. 41–47, Dec. 2020.
[14] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated learning with non-IID data," 2018, arXiv:1806.00582.
[15] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. ICML, Sydney, NSW, Australia, Aug. 2017, pp. 1126–1135.
[16] G. Barlacchi et al., "A multi-source dataset of urban life in the city of Milan and the Province of Trentino," Sci. Data, vol. 2, no. 1, pp. 1–15, Oct. 2015.