Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning
Motivation
Deep learning techniques have emerged as the most promising approach for training models for tasks such as object detection,
classification, and anomaly detection. However, a large share of big data is generated by resource-constrained user equipments
(UEs) such as smartphones and IoT devices, where it is impractical to upload all of this data to one centralized server for training.
Centralized model training is a cumbersome process that faces many hindrances, such as network quality restrictions, privacy and
ownership concerns, and lack of collaboration. Federated learning was therefore introduced to bring distributed learning under one
umbrella, enhancing collaboration among devices without exposing private data. Federated learning works in iterations: UEs train
their local models and upload the model parameters, instead of the data, to the centralized server, where a global model is
synthesized and then distributed back to the UEs. UEs, however, are heterogeneous in computation and communication capabilities
due to their varying underlying hardware, so there is a tradeoff between these two costs during federated learning, and active
research has been conducted on its optimization. There has also been a push for designing learning algorithms with faster
convergence, but the main issue addressed here is energy efficiency, which is the crucial motivation for the work in the paper. A
tradeoff between energy efficiency and learning time arises from the UEs' heterogeneous nature combined with the synchronization
among training nodes after each iteration, all under network quality that is unpredictable due to mobility or environmental factors.
Instead of combining network quality prediction with optimization algorithms, the paper turns to machine learning to solve this
federated learning problem.
Problem Statement
Federated learning over wireless networks poses the optimization problem of computational resource allocation on mobile devices:
the allocation should capture the tradeoff between communication and computation costs and improve energy efficiency (by
trading idle time for power savings) without slowing down training. This is a significant issue for mobile devices in heterogeneous
environments of computation and communication capabilities, combined with varying physical specifications and battery
exhaustion limitations. Another factor is that previous papers unrealistically assume stable network connectivity among the
connected devices.
Contributions
The paper contributes a new computational resource allocation algorithm for federated learning that considers both the
convergence time and the mobile devices' energy consumption. The proposed algorithm is experience-driven, i.e., it can learn the
best resource allocation strategies from previous actions (using an actor-critic model), and it is evaluated both on a small-scale
testbed and in large-scale simulations, where it outperforms traditional state-of-the-art solutions by a clear margin. The DRL agent,
based on an actor-critic network, forms the core of the federated learning system and predicts the most suitable CPU-cycle
frequency for each mobile device at the beginning of every iteration. The DRL agent interacts with the federated learning system,
which defines the rules, restrictions, and reward mechanism; it observes the system's state and determines the action based on
previous experience, which is what makes the algorithm experience-driven. It learns through states, actions, and rewards to find
the best policy mapping a state to an action that maximizes the discounted cumulative reward. Improving the energy efficiency of
federated learning by carefully controlling the CPU-cycle frequency is the key contribution of the paper. Because of the hardness
of the control problem and the lack of advance knowledge of network quality, machine learning methods were applied and an
experience-driven method based on DRL was devised to solve the control problem. The DRL agent is trained on real-world
network datasets, and the final trace-driven experiments further demonstrate the superiority of the DRL-based approach over
state-of-the-art solutions.
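To make the actor-critic structure concrete, the following is a minimal PyTorch sketch of how such an agent could be organized, assuming the state is each device's recent bandwidth history and the action is one (normalized) CPU-cycle frequency per device; the layer sizes, the Gaussian policy parameterization, and the names (Actor, Critic, num_devices, history) are illustrative assumptions rather than the paper's exact architecture.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Policy network: maps the observed state to a distribution over
        per-device CPU-cycle frequencies, normalized to (0, 1)."""
        def __init__(self, state_dim, num_devices, hidden=128):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.mu = nn.Linear(hidden, num_devices)
            self.log_std = nn.Parameter(torch.zeros(num_devices))

        def forward(self, state):
            h = self.backbone(state)
            mean = torch.sigmoid(self.mu(h))  # in (0, 1); rescale to each device's [f_min, f_max]
            return torch.distributions.Normal(mean, self.log_std.exp())

    class Critic(nn.Module):
        """Value network: estimates the expected discounted return of a state."""
        def __init__(self, state_dim, hidden=128):
            super().__init__()
            self.v = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state):
            return self.v(state)

    # Assumed dimensions: 10 devices, each contributing its last 8 bandwidth measurements.
    num_devices, history = 10, 8
    actor = Actor(state_dim=num_devices * history, num_devices=num_devices)
    critic = Critic(state_dim=num_devices * history)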
Proposed Approach
The proposed approach is broadly divided into two parts: the federated learning system and the DRL agent. Considering a
practical scenario with dynamic network bandwidth, the authors build the state in the DRL formulation from historical bandwidth
information, since future network bandwidth is related to past bandwidth. The action in the mth iteration is defined as the set of
CPU-cycle frequencies of all connected mobile devices in that iteration. Hence, in one iteration, the mobile devices complete their
federated learning updates and upload the new parameters to the parameter server, and once every mobile device has completed its
upload, the DRL agent obtains the system cost of the current iteration. The PPO algorithm is chosen for policy optimization
because of its ease of implementation and tuning, its sample efficiency, and its guarantee of low deviation from the previous
policy. The DRL agent maintains an experience replay buffer, a policy (the actor), and an estimate of the value function (the
critic). Since the parameter server can access the mobile devices' information, the DRL agent can be trained offline.
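As one illustration of how the per-iteration system cost could combine energy and time, the sketch below assumes a standard dynamic-energy model (energy proportional to cycles times frequency squared) and a synchronous round whose duration is set by the slowest device; the function name, the constant kappa, and the weight beta are assumptions for illustration, not quantities taken from the paper.

    def system_cost(cpu_freqs, workloads, upload_times, kappa=1e-27, beta=0.5):
        """Illustrative per-iteration system cost; the reward is its negative.

        cpu_freqs[i]    : CPU-cycle frequency chosen for device i (Hz)
        workloads[i]    : CPU cycles device i needs for its local update
        upload_times[i] : time for device i to upload its parameters (s)
        kappa, beta     : assumed energy coefficient and time/energy weight
        """
        # Assumed dynamic energy model: E_i = kappa * cycles_i * f_i^2
        energies = [kappa * c * f ** 2 for c, f in zip(workloads, cpu_freqs)]
        # Each device computes, then uploads; the synchronous round ends when the slowest device finishes
        round_time = max(c / f + u for c, f, u in zip(workloads, cpu_freqs, upload_times))
        return sum(energies) + beta * round_time

    # reward_k = -system_cost(...) once all devices have uploaded in iteration k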
The training procedure begins with random initialization of the actor and critic network parameters. The real-world network
dataset and the mobile devices' information are pre-loaded, constructing a simulated training environment for the federated
learning system. To train the DRL agent efficiently, a separate (old) policy is used to sample the federated learning environment,
so that the agent can reuse the experience sampled by the old policy multiple times. The federated learning system randomly
selects a start time, and the DRL agent constructs the initial state from each mobile device's bandwidth history. The DRL agent
then starts to execute CPU-cycle frequency control: at the beginning of the kth iteration of federated learning, it feeds the current
state into the policy network and derives the corresponding action. After the mobile devices receive the action from the DRL
agent, they train the deep learning model at the CPU-cycle frequencies specified by that action. The kth iteration ends when the
parameter server has received all the updates from the mobile devices. The DRL agent can then calculate the reward obtained in
the kth iteration, and the federated learning system moves to the next state. At the same time, the experience from the kth iteration
is stored in the experience replay buffer. When the experience replay buffer is full, the DRL agent is updated with the experience
in the buffer, with the actor network updated by the PPO approach. After the DRL agent has learned from the experience in the
buffer, the new actor-network parameters are assigned to the old policy for the next round of sampling, as sketched below.
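Under those steps, the offline training loop could look roughly like the following, continuing the Actor/Critic sketch from the Contributions section; the env object (a stand-in for the simulated environment built from the pre-loaded bandwidth traces), the hyperparameters, and the plain discounted-return advantage estimate are assumptions made for illustration, and the actual implementation may differ (for example by using GAE or minibatch updates).

    import copy
    import torch

    # Assumed hyperparameters and training budget
    gamma, clip_eps, ppo_epochs, buffer_size = 0.99, 0.2, 4, 256
    num_rounds = 10_000
    actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    old_actor = copy.deepcopy(actor)   # the "old" policy used only for sampling
    buffer = []                        # experience replay buffer: (state, action, log_prob, reward)

    state = env.reset()                # env: hypothetical simulated FL environment from the traces
    for _ in range(num_rounds):
        with torch.no_grad():
            dist = old_actor(state)
            action = dist.sample()                      # CPU-cycle frequencies for this iteration
            log_prob = dist.log_prob(action).sum(-1)
        next_state, reward, done = env.step(action)     # run one federated learning iteration
        buffer.append((state, action, log_prob, torch.tensor(float(reward))))
        state = env.reset() if done else next_state

        if len(buffer) == buffer_size:
            states, actions, old_lp, rewards = (torch.stack(x) for x in zip(*buffer))
            # Discounted returns, computed backwards over the sampled experience
            returns, running = [], 0.0
            for r in reversed(rewards.tolist()):
                running = r + gamma * running
                returns.insert(0, running)
            returns = torch.tensor(returns)
            advantages = (returns - critic(states).squeeze(-1)).detach()

            for _ in range(ppo_epochs):                 # reuse the old policy's samples several times
                dist = actor(states)
                ratio = (dist.log_prob(actions).sum(-1) - old_lp).exp()
                # Clipped PPO surrogate keeps the new policy close to the sampling policy
                actor_loss = -torch.min(ratio * advantages,
                                        ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages).mean()
                actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

                critic_loss = (critic(states).squeeze(-1) - returns).pow(2).mean()
                critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

            # Sync: assign the new actor parameters to the old policy before sampling again
            old_actor.load_state_dict(actor.state_dict())
            buffer.clear()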
Limitations
The question for federated learning implementations has always been efficiency: federated learning requires small delays and high
reliability from mobile devices. Although the paper tries to address this issue, the algorithm is tested only in simulation rather than
in a real-time experiment. The dataset itself is old and does not reflect modern interconnection bandwidth, traffic, and other
complications. The unpredictability of network bandwidth and QoS remains a limitation in this area and is still an open issue.