Mobile Keyword Prediction Using Federated Learning
https://doi.org/10.22214/ijraset.2023.50826
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue IV Apr 2023- Available at www.ijraset.com
Abstract: Federated learning is a decentralized form of machine learning in which models are trained on data subsets held by several edge devices, and only the resulting updates are aggregated at a centralized server. A federated application stores a local copy of the model on each edge device, such as a smartphone, where the user interacts with it directly. The model learns gradually from the input typed on the user's virtual keyboard and becomes smarter with each iteration.
Devices transfer their results, in the form of model parameters, to the centralized server, where the results are aggregated by federated algorithms. For next-word prediction using federated learning, we implemented several algorithms: FedAvg, FedProx, and FedSGD. In traditional ML, data is collected in a centralized location and used to train the model. Federated learning, by contrast, gives users more control over their own data, since the data is never shared with the central server or with other devices. Federated learning thus preserves data privacy while remaining practical for its users.
Keywords: federated, decentralized, centralized server, FedAvg, FedProx, FedSGD
I. INTRODUCTION
The next-word prediction feature facilitates text entry and plays an important role in improving the user experience. Today, smart mobile devices come with virtual keyboards that support more than 600 languages as well as emoticons. Current mobile keyword prediction models are constrained in multiple ways, including limited vocabulary, contextual awareness, personalization, and multilingual support.
For next-word prediction, users expect a visible response within a fraction of a second. Users may also be wary of the collection and remote storage of their personal data, even under data cleaning and strict access protocols, as required by traditional neural-network or N-gram approaches.
This paper presents a federated learning environment that encrypts user-sensitive data and addresses the privacy and latency issues in a practical way.
This approach is referred to as the secure aggregation principle: the server may securely combine the encrypted results but can decrypt only the aggregate. A federated averaging mechanism aggregates the encrypted chunks of results coming from the edge devices. The paper presents a comparative study of different federated algorithms, judging their accuracy, loss, and speed.
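To make secure aggregation concrete, the sketch below shows a minimal pairwise-masking scheme in Python. The helper name and the toy three-client setup are our own illustration, not the paper's implementation: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked updates while the masks cancel in the sum.

import numpy as np

# Minimal sketch of pairwise-masked secure aggregation (illustrative only).
# Each pair of clients (i, j) shares a random mask; client i adds it and
# client j subtracts it, so all masks cancel when the server sums the updates.

rng = np.random.default_rng(0)

def mask_updates(updates):
    """Return masked copies of the clients' model updates."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal(size=updates[i].shape)  # shared pairwise mask
            masked[i] += m
            masked[j] -= m
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = mask_updates(updates)
# The server never sees an individual update, yet the sum is preserved.
assert np.allclose(sum(masked), sum(updates))

Production schemes add key agreement and dropout recovery on top of this idea; the sketch only shows why the server learns nothing beyond the aggregate.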
Federated Learning has two main components: the aggregator and the remote training parties. The aggregator is the server, which averages the local updates using federated learning algorithms. Preprocessing happens on each device over its local data, and model training happens locally as well; the server's role is to aggregate the model updates it receives from the mobile devices. The remote training parties are the edge devices, often referred to as clients, which fetch the global model from the server and send their contributions back to it in the form of local updates.
A typical federated environment contains a very large population, potentially hundreds of millions of client devices, but only a fraction of them are active and available for training at any given time. In general, a training round involves only a randomly selected subset of the available devices, because it is impractical to coordinate an enormous number of clients at once. The server distributes the initial model and the parameters required for training to the subset of clients participating in that round of training and evaluation. On each client, the model is invoked iteratively, independently and in parallel, on a stream of local data batches to produce a new set of model parameters and a set of local metrics (also known as local aggregations). The model parameters and locally exported metrics are then accumulated across the system by running a distributed aggregation protocol.
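The round structure described above fits in a few lines of Python. The sketch below is a minimal simulation under our own assumptions (plain NumPy weight vectors and a hypothetical local_train helper); it is meant to show the control flow, not the paper's implementation.

import random

def run_round(global_weights, clients, clients_per_round, local_train):
    """One federated round: sample clients, train locally, aggregate.

    local_train(weights, client) is a hypothetical helper that returns
    (new_weights, num_examples, metrics) after training on that client's
    local data batches.
    """
    sampled = random.sample(clients, clients_per_round)
    results = [local_train(global_weights.copy(), c) for c in sampled]

    total = sum(n for _, n, _ in results)
    # Accumulate parameters and metrics, weighting by local example counts.
    new_global = sum(w * (n / total) for w, n, _ in results)
    avg_loss = sum(m["loss"] * (n / total) for _, n, m in results)
    return new_global, {"loss": avg_loss}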
V. FEDERATED LEARNING
Federated learning offers a distributed strategy for training a machine learning model. It is also a decentralized technique: training data stays on the mobile devices and is never collected centrally. Mobile devices act as clients and generate the large volumes of data used for training. Each client processes its local data and exchanges model updates with the server, rather than uploading the data itself as in centralized model training. The server aggregates the weights from the sampled clients to produce an improved, updated global model, which is then sent back to the clients, and the process repeats iteratively. Various aggregation algorithms can be used on the server side to combine the client weights and update the model into a new, more accurate one.
Even when server-hosted data is anonymised, this decentralized on-device processing technique offers advantages in security and privacy over other server-side storage methods. Previous research shows that privacy-preserving methods such as secure aggregation and differential privacy can complement federated learning. Users retain direct, hands-on control over their data because confidential information never leaves the client devices. The model updates transmitted by each client are ephemeral, focused, and aggregated: they are handled in memory and deleted immediately after they are folded into the global weight vector. They are never stored on the server, which preserves user privacy and supports secure aggregation.
VI. METHODS
In this work we use three federated learning methods, FedAvg, FedSGD, and FedProx, which differ from one another in how aggregation is performed.
At each communication round, the central server selects a subset of clients S ⊆ [N] and sends them the current global model. Each selected client then updates the global model with its own local data and sends the updated model back to the central server. The server aggregates the clients' inputs and updates the global model. This process repeats over several communication rounds until a termination criterion is reached, e.g. a target validation accuracy is met. A client's local dataset Di written with a superscript t, i.e. Di^t, refers to the t-th communication round between the central server and the selected client, where t ∊ {1, ..., T}.
A. FedAvg
Federated averaging, also known as FedAvg, is a simple and widely used federated learning algorithm: the new global model is produced by averaging the model updates from the client devices.
It has four hyperparameters: the fraction of clients C selected each round, the local minibatch size B, the number of epochs E for which each client trains over its local dataset per round, and the learning rate. FedAvg assumes that all sampled devices complete the full E epochs, and it therefore drops stragglers. Clients are selected anew prior to each round, which avoids depending on clients with a low probability of participating. Additionally, the learning rate should be decayed as optimization proceeds and the model approaches the global minimum of the loss function, which improves the model's accuracy and produces better results.
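A minimal sketch of the FedAvg update in Python, under our own assumptions: NumPy weight vectors and a hypothetical sgd_epochs helper standing in for E local epochs of minibatch SGD with batch size B. It illustrates the weighted average that defines the algorithm rather than the paper's exact code.

def fedavg_round(w_global, selected_clients, E, B, lr, sgd_epochs):
    """One FedAvg round.

    sgd_epochs(w, data, E, B, lr) is a hypothetical helper that runs E
    epochs of minibatch SGD (batch size B) at learning rate lr and returns
    the updated weights.
    """
    new_weights, counts = [], []
    for client in selected_clients:
        w_k = sgd_epochs(w_global.copy(), client.data, E, B, lr)
        new_weights.append(w_k)
        counts.append(len(client.data))

    n = sum(counts)
    # w_new = sum_k (n_k / n) * w_k: average weighted by local data size.
    return sum((n_k / n) * w_k for w_k, n_k in zip(new_weights, counts))

The example-count weighting means clients with more local data pull the global model further, which matches the averaging rule described above.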
B. FedSGD
A key difference between FedAvg and FedSGD is how they handle non-IID data, i.e. data that is not identically and independently distributed across clients. FedAvg tends to be sensitive to non-IID data because it lets each client device take multiple local SGD steps between aggregations, so local models drift toward their own data distributions. FedSGD, on the other hand, aggregates a single gradient step from each client per round, which reduces the impact of local data variations at the cost of more communication rounds.
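To contrast the two update rules concretely, here is a sketch in Python under the same toy assumptions as above (NumPy weights and a hypothetical grad helper returning the average loss gradient over a client's data): FedSGD collects one gradient per client per round where FedAvg would run several local epochs first.

def fedsgd_round(w_global, selected_clients, lr, grad):
    """One FedSGD round: each client contributes a single gradient.

    grad(w, data) is a hypothetical helper returning the average gradient
    of the loss over a client's local data at weights w.
    """
    counts = [len(c.data) for c in selected_clients]
    n = sum(counts)
    # Weighted average of per-client gradients, then one global SGD step.
    g = sum((n_k / n) * grad(w_global, c.data)
            for c, n_k in zip(selected_clients, counts))
    return w_global - lr * g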
C. FedProx
FedProx is a generalization of FedAvg with modifications that address heterogeneity in both data and systems. Learning is again performed in rounds: at each round, the server samples a set of m clients and sends them the current global model. The FedProx algorithm adds a proximal term to the objective function being optimized locally. This term penalizes large deviations of the model parameters from the previous round's global model, encouraging the model to converge to a more stable solution. The proximal term is controlled by a hyperparameter called the proximal coefficient, which determines the strength of the penalty. Additionally, the local optimization is run for a variable number of epochs, according to each client's system resources.
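A minimal sketch of the FedProx local objective in Python, with our own hypothetical loss_fn standing in for a client's empirical loss and mu denoting the proximal coefficient: each client minimizes its usual loss plus (mu/2)·||w - w_global||^2, which is what keeps local models from drifting far from the global one.

import numpy as np

def fedprox_local_objective(w, w_global, data, loss_fn, mu):
    """FedProx local objective: h(w) = F(w) + (mu / 2) * ||w - w_global||^2.

    loss_fn(w, data) is a hypothetical helper returning the client's
    empirical loss F(w); mu is the proximal coefficient.
    """
    proximal = 0.5 * mu * np.sum((w - w_global) ** 2)
    return loss_fn(w, data) + proximal

Setting mu = 0 recovers FedAvg's local objective, which is one way to see FedProx as a strict generalization of FedAvg.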
VII. RESULTS
A. Federated Learning Simulation on the Shakespeare Dataset
During training of the different FL models, we observe that FedProx converges faster and to a better optimum than FedAvg and FedSGD.
The partitioning of the Shakespeare dataset is shown in Figure 7.1.1.
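As a pointer for reproducing this setup: the Shakespeare dataset ships with the TensorFlow Federated simulation API, already partitioned so that each speaking role in the plays is one client. The snippet below is our assumption about the tooling, not necessarily what was used here.

import tensorflow_federated as tff

# Shakespeare, pre-partitioned for federated simulation: each client
# corresponds to one speaking character in the plays.
train_data, test_data = tff.simulation.datasets.shakespeare.load_data()
print(f"number of clients: {len(train_data.client_ids)}")

# Materialize one client's local dataset of text snippets.
sample = train_data.create_tf_dataset_for_client(train_data.client_ids[0])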
Figure 7.1.8 confirms that FedProx achieves better accuracy than the FedAvg and FedSGD models. It can also be observed that, as the number of rounds grows, the accuracy plateaus while the loss continues to decrease.
For comparison purposes, the learning rate was set to 0.001 for all the models.