WHITE PAPER

IEEE SA

IEEE FEDERATED MACHINE LEARNING

WHITE PAPER

Authored by

Qiang Yang
Lixin Fan
Richard Tong
Angelica Lv

Authorized licensed use limited to: IEEE Xplore. Downloaded on April 09,2024 at 09:28:55 UTC from IEEE Xplore. Restrictions apply.
TRADEMARKS AND DISCLAIMERS
IEEE believes the information in this publication is accurate as of its publication date; such information is subject to change
without notice. IEEE is not responsible for any inadvertent errors.

The ideas and proposals in this specification are the respective author’s views and do not represent the views of the affiliated
organization.

The Institute of Electrical and Electronics Engineers, Inc. 3 Park Avenue, New York, NY 10016‐5997, USA

Copyright © 2021 by The Institute of Electrical and Electronics Engineers, Inc.

PDF: STDVA24772 978‐1‐5044‐7660‐7

IEEE is a registered trademark in the U. S. Patent & Trademark Office, owned by The Institute of Electrical and Electronics Engineers,
Incorporated. All other trademarks are the property of the respective trademark owners.

IEEE prohibits discrimination, harassment, and bullying. For more information, visit
http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

No part of this publication may be reproduced in any form, in an electronic retrieval system, or otherwise, without the prior written
permission of the publisher.

Find IEEE standards and standards-related product listings at: http://standards.ieee.org.

2 IEEE SA Copyright © 2021 IEEE. All rights reserved.

NOTICE AND DISCLAIMER OF LIABILITY CONCERNING THE USE OF
IEEE SA DOCUMENTS
This IEEE Standards Association (“IEEE SA”) publication (“Work”) is not a consensus standard document. Specifically, this
document is NOT AN IEEE STANDARD. Information contained in this Work has been created by, or obtained from, sources
believed to be reliable, and reviewed by members of the activity that produced this Work. IEEE and the IEEE P3652.1 members
expressly disclaim all warranties (express, implied, and statutory) related to this Work, including, but not limited to, the
warranties of: merchantability; fitness for a particular purpose; non-infringement; quality, accuracy, effectiveness, currency,
or completeness of the Work or content within the Work. In addition, IEEE and the IEEE P3652.1 members disclaim any and all
conditions relating to: results; and workmanlike effort. This document is supplied “AS IS” and “WITH ALL FAULTS.”

Although the IEEE P3652.1 members who have created this Work believe that the information and guidance given in this Work
serve as an enhancement to users, all persons must rely upon their own skill and judgment when making use of it. IN NO EVENT
SHALL IEEE-SA OR ICAP MEMBERS BE LIABLE FOR ANY ERRORS OR OMISSIONS OR DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO: PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS WORK, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE AND REGARDLESS OF WHETHER
SUCH DAMAGE WAS FORESEEABLE.

Further, information contained in this Work may be protected by intellectual property rights held by third parties or
organizations, and the use of this information may require the user to negotiate with any such rights holders in order to legally
acquire the rights to do so, and such rights holders may refuse to grant such rights. Attention is also called to the possibility
that implementation of any or all of this Work may require use of subject matter covered by patent rights. By publication of
this Work, no position is taken by the IEEE with respect to the existence or validity of any patent rights in connection therewith.
The IEEE is not responsible for identifying patent rights for which a license may be required, or for conducting inquiries into
the legal validity or scope of patents claims. Users are expressly advised that determination of the validity of any patent rights,
and the risk of infringement of such rights, is entirely their own responsibility. No commitment to grant licenses under patent
rights on a reasonable or non-discriminatory basis has been sought or received from any rights holder.

This Work is published with the understanding that IEEE and the IEEE P3652.1 members are supplying information through this
Work, not attempting to render engineering or other professional services. If such services are required, the assistance of an
appropriate professional should be sought. IEEE is not responsible for the statements and opinions advanced in this Work.

TABLE OF CONTENTS

IEEE FEDERATED MACHINE LEARNING WHITE PAPER
ABSTRACT

1. DATA PRIVACY PROTECTION AND FEDERATED MACHINE LEARNING
1.1. DATA ISOLATION AND PRIVACY PROTECTION
1.2. FEDERATED MACHINE LEARNING FOR DATA PRIVACY PROTECTION
1.3. WHY IEEE FEDERATED MACHINE LEARNING (IEEE STD 3652.1-2020)

2. FRAMEWORK AND CATEGORIZATION OF FEDERATED MACHINE LEARNING

3. APPLICATIONS OF FEDERATED MACHINE LEARNING
3.1. FINANCE
3.2. TELECOMMUNICATIONS
3.3. HEALTHCARE
3.4. EDUCATION
3.5. URBAN COMPUTING
3.6. GOVERNMENT SERVICES
3.7. GOVERNMENT GOVERNANCE
3.8. MARKETING
3.9. IOT/EDGE COMPUTING

4. REFERENCES

IEEE FEDERATED MACHINE LEARNING
WHITE PAPER

ABSTRACT
Data privacy and information security pose significant challenges to the big data and artificial intelligence
(AI) community as these communities are increasingly under pressure to adhere to regulatory
requirements, such as the European Union’s General Data Protection Regulation. Many routine
operations in big data applications, such as merging user data from various sources in order to build a
machine learning model, are considered to be illegal under current regulatory frameworks. The purpose
of federated machine learning is to provide a feasible solution that enables machine learning applications
to utilize the data in a distributed manner that does not exchange raw data directly and does not allow
any party to infer private information of other parties. This white paper intends to present an overview
of the Federated Machine Learning (FML) technology that can be used as a basis for standards,
certifications, laws, policies, and/or product ratings.

This white paper targets an educated audience, including lawmakers, corporate and governmental policy
makers, manufacturers, engineers, and standard-setting bodies. It is written to be accessible to
non-technical managers and policy makers while also giving system developers and
manufacturers an overview of Federated Machine Learning techniques. Finally, credit is due
to the IEEE Federated Machine Learning (P3652.1) working group participants for their tremendous
dedication, expertise, and thoughtful collaboration, without which the publication of
IEEE Std 3652.1-2020 [1] would not have been possible.

1. DATA PRIVACY PROTECTION AND
FEDERATED MACHINE LEARNING

1.1. DATA ISOLATION AND PRIVACY PROTECTION

Today, artificial intelligence (AI) technology is showing its strengths in almost every industry and walk of life. The
current public interest in AI is partly driven by the availability of big data; for example, AlphaGo in 2016 used a total
of 300,000 games as training data and achieved excellent results. However, except for a few industries, most application
fields have only limited data of poor quality, making the realization of AI technology more difficult than previously
thought. Moreover, data privacy and information security pose significant challenges to the big data and AI
community as these communities are increasingly under pressure to adhere to regulatory requirements such as
the European Union’s General Data Protection Regulation. Many routine operations in big data applications, such
as merging user data from various sources to build a machine learning model, are deemed illegal under current
regulatory frameworks.

As a result, we face a dilemma: our data exist as isolated islands, yet in many situations we are forbidden
to collect, fuse, and use data from different places for AI processing. How to legally solve the
problem of data fragmentation and isolation is a major challenge for AI researchers and practitioners today.

1.2. FEDERATED MACHINE LEARNING FOR DATA
PRIVACY PROTECTION
The concept of federated learning was initially proposed by Google researchers to build machine learning models
based on data sets that are distributed across multiple mobile phone devices while preventing data leakage [2],
[3]. The idea was extended by researchers from China, Singapore, Europe, and the United States to cover secure
distributed and collaborative learning scenarios among multiple organizations such as banks and hospitals that
have respective private data but do not wish to have this data shared [4]. Organizations such as banks or health
centers would like to take advantage of machine learning models co‐developed with peer organizations but are
obligated to keep their own data under tight protection.

Federated machine learning is a technological framework that allows a machine learning model to be collectively
constructed and used through data that is distributed across repositories owned by different organizations or
devices. While facilitating the building of federated machine learning models, this framework also aims to
preserve privacy, improve security, and meet regulatory requirements concerning data usage.

1.3. WHY IEEE FEDERATED MACHINE LEARNING
(IEEE STD 3652.1-2020)
In response to the urgent need for a Federated Machine Learning technology framework, the IEEE P3652.1
Working Group came together to create an architectural framework and application guidelines for federated
machine learning (FML), which includes the following:

• A description and definition of federated machine learning
• The categories of federated machine learning technologies and the application scenarios to
  which each category applies
• A set of measures concerning the performance evaluation criteria for federated machine
  learning
• Associated features of federated machine learning that fulfill different regulatory
  requirements (see IEEE Std 3652.1-2020 [1])

This white paper does not detail the technical content of the guide; rather, it illustrates the need for
such a guide by showcasing a variety of use cases of the FML frameworks defined in the guide. The
hope is that the white paper will give readers a brief overview of the technological landscape of FML as
well as the underlying principles for implementing the FML framework in real-life applications.

2. FRAMEWORK AND CATEGORIZATION OF
FEDERATED MACHINE LEARNING
Federated machine learning is a distributed machine-learning framework that enables multiple participants to
collaboratively train and use a machine learning model for a given task, e.g., classification, prediction, or
recommendation. Within this framework, all raw data owned by different participants are protected by secure
and privacy-preserving techniques, which prevent the data from being tampered with, disclosed to, or
reverse-engineered by other participants.

FML, as a machine-learning framework, is first concerned with the performance of the learned models. Any sound FML
method is expected to maintain performance very close to that of a model built with the data from all
participants pooled in one location. Second, due to the distributed nature of learning in FML,
learning efficiency is of crucial importance for FML methods. IEEE Std 3652.1-2020 [1] devotes a great
deal of attention to reducing both computational complexity and communication costs with efficient FML
methods. Third, for the sake of data security and privacy preservation, the design, development, and
implementation of federated machine learning frameworks should be carefully considered, with monitoring for privacy
leakage and other security issues. Fourth, FML also needs to economically incentivize users to join and stay in the
federation. This economic incentive mechanism is a unique feature of FML that is not included in other
distributed learning paradigms.
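
The first requirement in the paragraph above, performance close to that of a centrally trained model, is often written as a simple bound in the federated learning literature. The notation below is an assumption introduced for illustration and is not taken from this white paper or from the guide.

```latex
% V_{FED}: performance (e.g., accuracy) of the federated model
% V_{SUM}: performance of a model trained on all participants' data pooled in one location
% \delta:  a small non-negative tolerance chosen by the practitioner
\left| V_{FED} - V_{SUM} \right| < \delta
```

Under this reading, an FML method would be said to have a performance loss of at most \delta relative to centralized training.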

Depending on how the data are partitioned between different participants, FML can be categorized as Horizontal
FML, Vertical FML, and Federated Transfer Learning (see Figure 1). Specifically, Horizontal FML refers to building
a model in the scenario where data sets overlap significantly in the feature space but not in the sample (ID) space.
For example, Google proposed a horizontal federated learning solution for Android phone model updates
(McMahan, et al. [5]). In that framework, a single user using an Android phone updates the model parameters
locally and uploads the parameters to the Android cloud, thus jointly training the centralized model together with
other data owners. Bonawitz, et al. [6] also introduced a secure aggregation scheme to protect the privacy of
aggregated user updates under this federated learning framework.
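
The horizontal setting described above can be sketched as a weighted averaging loop. This is a minimal illustration, not the implementation used in [5] or specified by the guide. The one-dimensional linear model, the synthetic data, and all function names (`local_update`, `federated_average`, `train`, `make_participants`) are assumptions made for the example. No encryption or secure aggregation is applied.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one participant's private data.

    The model is y ~ w * x + b. The raw `data` stays inside this function;
    only the updated parameters are returned to the server.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average parameter tuples, weighting each by its participant's sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train(participants, rounds=50):
    """Server loop: broadcast the global model, collect local updates, average them."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in participants]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in participants]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_participants(num=3, seed=0):
    """Synthetic data (assumption): every participant samples y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    participants = []
    for _ in range(num):
        n = rng.randint(20, 40)
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        participants.append([(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs])
    return participants


if __name__ == "__main__":
    print(train(make_participants()))
```

All participants share the same feature (`x`) but hold different samples, which is what makes the setting horizontal. Weighting by sample count keeps a participant with little data from dominating the global model.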

Vertical FML refers to building a model in the scenario where data sets overlap significantly in the sample
space but not in the feature space. For example, consider two different companies in the same city: one is a
bank, and the other is an e-commerce company. Their user sets are likely to contain most of the residents of the
area, so the intersection of their user spaces is large. However, since the bank records the user’s revenue and
expenditure behavior and credit rating, and the e-commerce company retains the user’s browsing and purchasing history,
their feature spaces are very different. Under this circumstance, one may apply Vertical FML,
which is the process of aggregating these different features and computing the training loss and gradients in a
privacy-preserving manner to build a model with data from both parties collaboratively.
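
The data flow of the bank and e-commerce example can be sketched as a logistic regression whose features are split between two parties. This is a plaintext illustration under stated assumptions. A real vertical FML system would align user IDs with a privacy-preserving protocol and exchange the partial scores and residuals under encryption or secret sharing, which is omitted here. The class `Party`, the functions `train_vertical` and `predict`, and the choice to let the coordinating function hold the labels are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Party:
    """Holds one party's private feature columns, keyed by user ID."""

    def __init__(self, features):
        self.features = features  # dict: user_id -> list of floats
        dim = len(next(iter(features.values())))
        self.weights = [0.0] * dim

    def partial_scores(self, ids):
        """Linear score computed from this party's features only."""
        return [sum(w * x for w, x in zip(self.weights, self.features[i])) for i in ids]

    def apply_residuals(self, ids, residuals, lr):
        """Gradient step on this party's weights, using the shared residuals."""
        n = len(ids)
        for j in range(len(self.weights)):
            grad = sum(r * self.features[i][j] for i, r in zip(ids, residuals)) / n
            self.weights[j] -= lr * grad


def train_vertical(party_a, party_b, labels, lr=0.5, rounds=200):
    """Train on the users common to both parties; returns the aligned ID list.

    Assumption: IDs are intersected in plaintext and residuals are sent in the
    clear. Both steps would be protected in a real deployment.
    """
    ids = sorted(set(party_a.features) & set(party_b.features) & set(labels))
    for _ in range(rounds):
        score_a = party_a.partial_scores(ids)
        score_b = party_b.partial_scores(ids)
        residuals = [sigmoid(a + b) - labels[i] for a, b, i in zip(score_a, score_b, ids)]
        party_a.apply_residuals(ids, residuals, lr)
        party_b.apply_residuals(ids, residuals, lr)
    return ids


def predict(party_a, party_b, user_id):
    """Joint prediction: each party contributes only its partial score."""
    z = party_a.partial_scores([user_id])[0] + party_b.partial_scores([user_id])[0]
    return sigmoid(z)
```

Neither party ever sees the other's feature values or weights. Each sees only the per-user residuals, which is the quantity a production system would additionally protect.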

Federated Transfer Learning (FTL) refers to the federated machine learning technique designed for application
scenarios where data sets have no significant overlap in either the sample space or the feature space. Consider
two institutions: one is a bank located in China, and the other is an e-commerce company located in the United
States. Due to geographical restrictions, the user groups of the two institutions have a small intersection. On the
other hand, due to the different businesses, only a small portion of the feature space from both parties overlaps.
In this case, transfer learning (Yang, et al. [7]) techniques can be applied to provide solutions for the entire sample
and feature space under a federation. Specifically, a common representation between the two feature spaces is
learned using the limited common sample sets and later applied to obtain predictions for samples with only
one-sided features.
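
The three categories above amount to a decision on sample overlap and feature overlap. The sketch below illustrates that decision under assumptions: the Jaccard ratio, the 0.5 threshold, and the names `overlap_ratio` and `categorize` are hypothetical choices, since the text says only "significant overlap". In practice, parties could not compare ID sets in plaintext as done here.

```python
def overlap_ratio(a, b):
    """Jaccard overlap between two collections: |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize(ids_a, feats_a, ids_b, feats_b, threshold=0.5):
    """Map sample and feature overlap to an FML category.

    `threshold` is an illustrative cut-off for "significant" overlap.
    """
    sample_overlap = overlap_ratio(ids_a, ids_b) >= threshold
    feature_overlap = overlap_ratio(feats_a, feats_b) >= threshold
    if feature_overlap and not sample_overlap:
        return "horizontal"
    if sample_overlap and not feature_overlap:
        return "vertical"
    if not sample_overlap and not feature_overlap:
        return "transfer"
    # Both overlap significantly: none of the three categories above is defined for this case.
    return "unclassified"
```

For instance, two parties with identical user IDs but disjoint feature names fall into the vertical category, matching the bank and e-commerce example.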

FIGURE 1 Categorization of Federated Learning

3. APPLICATIONS OF FEDERATED MACHINE
LEARNING
The application and commercialization of federated machine learning in industry span different use cases,
each of which draws on different features of federated machine learning. Federated machine learning application
areas are broadly categorized into three types, according to the requirements of different market sectors:
business-to-consumer (B2C), business-to-business (B2B), and business-to-government (B2G). Each category has
its own requirements on FML methods concerning privacy protection, model performance, efficiency, etc.

3.1. FINANCE
Finance, or financial services, is an important area that can be greatly improved with the use of AI and big data.
Traditionally, financial companies and banks make business decisions based on their own data, such as information from
bank accounts, credit card use, and loan history. These data might be insufficient to evaluate customers’ financial risks
because they capture only a small part of the user behavior needed for risk modeling. In contrast, customers’
yearly income, real estate ownership, and shopping history may provide more valuable information, but these are
private user information that needs to be protected. In financial application scenarios, regulatory
requirements and privacy concerns prevent banks and financial companies from sharing their data. The main risks
faced by financial institutions are overdue loans and fraudulent loans, which stem from user credit risk and fraud.
Traditional financial institutions may know users’ borrowing history and behavior only locally, and they know little
about users’ interests, consumption tendencies, behavior, and other private information. To model and better assess
loan risk without leaking private data, the traditional practice is to give each institution a separate
model and integrate all the models’ results to obtain the evaluation result.
However, this modeling method often performs poorly, and the result obtained may not be accurate enough.
Federated machine learning can solve this problem by jointly modeling users’ overall behavior across many
sectors and financial institutions without compromising model performance. By adopting FML methods, data holders
can exchange encrypted parameters to jointly train a model and obtain more reliable evaluation
results while the data is retained locally. This can help financial institutions avoid risks more effectively.
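
One way institutions can contribute parameters to a joint sum without revealing their individual values is pairwise additive masking, a much-simplified version of the idea behind secure aggregation [6]. The sketch below is an illustration only and is not the mechanism prescribed by the guide. The fixed-point encoding, the modulus, and all function names are assumptions. A single seeded `random.Random` stands in for the pairwise key agreement a real protocol would use, and it is not cryptographically secure.

```python
import random

MODULUS = 2 ** 32  # assumption: values are encoded as integers modulo 2^32
SCALE = 10 ** 4    # assumption: fixed-point scale (four decimal places)


def encode(value):
    """Map a float to a fixed-point integer modulo MODULUS."""
    return int(round(value * SCALE)) % MODULUS


def decode(total):
    """Map a modular integer back to a signed float."""
    if total >= MODULUS // 2:
        total -= MODULUS
    return total / SCALE


def pairwise_masks(num_parties, seed=0):
    """Return one mask per party such that all masks sum to 0 modulo MODULUS.

    For each pair (i, j), party i adds a shared random value and party j
    subtracts the same value, so the masks cancel in the aggregate.
    """
    rng = random.Random(seed)  # stand-in for pairwise key agreement; NOT secure
    masks = [0] * num_parties
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = rng.randrange(MODULUS)
            masks[i] = (masks[i] + shared) % MODULUS
            masks[j] = (masks[j] - shared) % MODULUS
    return masks


def masked_upload(value, mask):
    """What a party sends to the aggregator: its encoded value plus its mask."""
    return (encode(value) + mask) % MODULUS


def aggregate(uploads):
    """The aggregator sums the masked uploads; the masks cancel, leaving the true sum."""
    return decode(sum(uploads) % MODULUS)
```

The aggregator learns the sum of the contributions but, given properly generated masks, not any single institution's value. Real protocols must also handle parties that drop out mid-round, which this sketch ignores.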

3.2. TELECOMMUNICATIONS
Mobile devices equipped with neural network processing units exploit their strong computational power to train
neural network (NN) models using data captured by a wide range of on-device sensors. With such on-device computational
power and data, mobile applications have significantly improved their usability and brought convenience to people’s lives.
However, serious ethical and regulatory concerns about data privacy remain, since mobile devices have collected
enormous amounts of personal data, such as biometric data, photos, and other personal information, and sent
them to remote servers. A personalized recommendation service, for example, manages users’ flight,
meeting, and hotel booking information and provides recommendations based on users’ personal information such
as contacts, messages, calendar, location, sports/sleep data, and app usage. In this case, horizontal FML techniques
provide a secure and trustworthy solution to prevent leakage of sensitive personal data.

3.3. HEALTHCARE
Health-related data are diverse: trans-omics data (including genome, epigenome, transcriptome,
metabolome, proteome, and metagenome), imaging data, and phenotype data collected from wearable devices or
other channels, along with environmental, socioeconomic, and behavioral data. However, health-related data,
especially patients’ data, are highly sensitive and distributed in nature, so collecting and sharing such data may
raise critical legal and ethical privacy concerns. For example, if insurers learn a patient’s health data and find out
that he or she has a severe or high-cost disease, they may refuse to provide insurance. FML can overcome
those obstacles by providing a federated machine learning model across organizations while keeping sensitive
health data in the local environment. FML applications in healthcare may involve different scenarios,
including business-to-government (B2G), business-to-business (B2B), business-to-consumer (B2C), or mixed
models. The most common FML scenario in healthcare is B2B, where different hospitals, companies, research
institutions, etc. need to build FML models collaboratively. Moving data directly between
hospitals may raise concerns about the security, privacy, and availability of medical data. FML can address these
concerns, and a horizontal FML model should achieve better performance than models trained with data from a
single institution. As an example of horizontal FML in genetic studies, the comprehensive analysis of genes
helps to discover hidden patterns between genotype and phenotype and benefits the development of diagnostics and
treatments for diseases such as cancers. Currently, samples collected from a single institution are insufficient to
cover all the mutations in BRCA1/2, while FML provides a feasible and secure way of training a model
that predicts the risk of breast and ovarian cancer.

3.4. EDUCATION
Uses of machine learning in education and training applications range from standard data mining for
domain-specific student assessment (such as language skill diagnosis) to personalized learning, teacher’s aids,
and human knowledge discovery and representation. Educational AI employs a variety of traditional machine
learning techniques, including clustering, pattern recognition, classification, optimization, control, and
recommendation. Very recently, the research community has also started to consider reinforcement learning as
a way to learn pedagogical strategies, and deep learning as a way to dynamically generate content and to engage in
human-AI interactions using natural language and multimodal communication mechanisms. Typically, the
application of AI in education is B2C. There are at least three fundamental problems posed by these applications
of machine learning to education and training:

a) The protection of personally identifiable data, which is regulated in general, but even more highly
regulated in the educational arena, especially when children are involved.

b) The interoperable exchange and sharing of the models generated by machine-learning-driven learning
management systems, such as adaptive instructional systems (AIS), many of which are expressed in terms
of knowledge, skills, abilities, attitudes, and other characteristics and include learner models, domain
models, pedagogical models, adaptive models, and interface models.

c) The ethical practice of AI, which includes verifying that the models generated are not applied in
unwarranted or unwanted ways and are either not biased or transparent about their biases.

FML can help address these use cases by:

1) Constructing learner models with data from multiple learning systems: In this use case, multiple learning
systems produce data about learners, some of whom use more than one system, but the systems are
prohibited from sharing data and the identity of learners. Each system applies its own machine learning
to estimate mastery, or to make predictions or estimate the effect of a particular activity as a function of
the aggregation of learner states. These estimates are exchanged among multiple learning systems and
a larger model is constructed using federated machine learning to improve the accuracy of each system
and, if appropriate, the recommendations it makes.

2) Using FML to aggregate and combine learner interaction data related to domain models: This enables
machine-learning-driven content analysis, ontology construction, content generation, and content
quality improvement for adaptive instructional system authoring.

3) Using FML for improving pedagogical strategies: Pedagogical strategies are represented in AIS in many
ways. A common way is as a set of rules, which may be an event-condition-action table, a sequence of
speech or dialog acts, rules based on instructional design theories, or branching and remediation rules
based on estimates of the learner’s current state. In existing AIS, these action rules are almost always
determined a priori, but the input to the rules is often machine learned. For example, the rules might
dictate different instructional behaviors based on the categorization of learners, and the categorizations
might be learned from data. Or, the rules might require estimates of the effectiveness of learner activity,
or classification of learning activity in educational taxonomies (e.g., Bloom’s). To the extent that learners
or activities are exchanged by multiple systems, FML can be used to improve accuracy without
compromising the privacy of data. In addition, it is possible that the rules themselves are machine learned.
In that case, using methods like reinforcement learning, it is important to have large data sets by
aggregating data from multiple sources across multiple students.

3.5. URBAN COMPUTING

Urban computing is a process of acquiring, integrating, and analyzing heterogeneous big data generated by a
diversity of sources including sensors, devices, vehicles, buildings, and humans in urban space. Applications of
urban computing aim to tackle major city challenges, such as air pollution, increased energy consumption, and
traffic congestion. Urban computing also helps us understand the nature of urban phenomena and even predict
the future of cities. Urban computing is an interdisciplinary field fusing the computing science field with traditional
fields like transportation, civil engineering, economy, ecology, and sociology in the context of urban spaces. In
many urban computing scenarios, there are regulatory requirements and privacy concerns that prevent the
sharing of data. FML can overcome these challenges by building a federated machine learning model across
organizations while the individual data of each organization stay in their local environment. Smart ride-hailing is
one example of this use case. Ride-hailing companies have a strong incentive to find optimal solutions to the
Vehicle Routing Problem. GPS data from these vehicles provide information on the number of vehicles and their
speed along different road segments, facilitating predictions about future traffic conditions. While
ride-hailing companies are not allowed or may be unwilling to share valuable data with each other, FML solves the
problem by allowing different companies to build and train a federated machine learning model, with model
parameters, but not the private data, securely exchanged under the federated system’s encryption mechanism.
Environmental protection is another example of this use case. Air quality prediction can help residents take
precautionary measures and allow city governments to implement corresponding countermeasures. However,
this can be challenging since the air quality of a region depends on many factors, including industrial emissions,
vehicle exhaust, and meteorological conditions. Factories are not allowed or may not be willing to share data
about their real-time emissions, from which sensitive operational and financial information may be gleaned.
Regulatory and privacy concerns may also prevent environmental regulators from collecting data about individual
vehicles’ location, model, and speed. As a result, regulators are left with air quality index (AQI) readings from
sparsely distributed air quality monitoring stations and with meteorological conditions, and are unable to make use of
finer-resolution industrial emissions and vehicle exhaust data. FML solves this problem by allowing a federated
machine learning model to be trained on heterogeneous big data, increasing the accuracy of air quality prediction.

3.6. GOVERNMENT SERVICES

Government services can be greatly improved by using big data that is collected across multiple sources.
Applications of such services include accurate traffic flow predictions and early identification of public vigilance
threats, etc. Moreover, government services are government-sponsored, and jointly provided by government
agencies and private businesses. While government agencies possess a large volume of data, this data may exist
in the form of data silos, with administrative procedures and privacy concerns preventing data from being exploited
by different agencies. FML can overcome these challenges by building a federated machine learning model across
government agencies and businesses, while the individual data of each organization remains intact in its local
environment.

3.7. GOVERNMENT GOVERNANCE

Government governance can be understood as the self-optimization and management of social affairs carried out
by a government organization. Specifically, the development of e-Government towards “Open Government” and
“Smart Government” aims, ultimately, to provide efficient, smart, and personalized administrative services to
people. This reform calls for a new approach that takes advantage of an enormous amount of data, which is
distributed across a hierarchy of regional government departments. The difficulty with exploiting government
data from distributed datasets lies in the need to protect highly sensitive and private information held
by government organizations. It is essential to explore a new mode of exploiting the knowledge in government
data without disclosing the data itself. Federated machine learning provides an effective solution that allows
governments to work together while protecting data security and user privacy.

3.8. MARKETING

The development of a smart marketing strategy usually relies on mathematical modeling over large data sets.
Conventionally, the data sets used for modeling are the collected profiles and historical behaviors of
the advertiser’s existing clients. These data sets often cover different dimensions, depending on the category of
the subareas that the advertisers serve. Any individual site may have only limited descriptive capability to produce
an accurate statistical marketing model, which ultimately restricts the practical performance of a marketing strategy.
As increasingly strict data privacy regulations, including the GDPR, take effect and public awareness of
data confidentiality keeps rising, the sharing of data among subareas is generally prohibited owing to
data privacy concerns, which in turn impedes the development of high-performance models for traditional
marketing platforms.

FML, especially vertical FML across different organizations that complement each other in data dimensions,
enables virtual model aggregation with reduced risk of privacy breaches when making decisions. The wider
range of features and samples made available through FML enriches the patterns that can be extracted for training
a machine learning model and can thereby improve marketing model performance. For example, by cooperating via FML
with social media companies that hold clients' social behavior data, credit rating companies
can identify clients with potentially high default rates, which could otherwise be computed only by
collectively checking multiple financial organizations.
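
The vertical setting described above can be illustrated with a minimal sketch. It assumes two hypothetical parties that hold different features for overlapping client IDs, with only one party holding the label. All names and data here (PARTY_A, PARTY_B, train_vertical, the toy values) are illustrative assumptions and are not taken from this paper or from any FML framework. The sketch also exchanges IDs, partial scores, and residuals in plaintext; a production system would typically protect these steps with techniques such as private set intersection and encryption.

```python
import math

# Hypothetical toy data. Party A (e.g., a credit rating company) holds one
# feature and the label per client ID; party B (e.g., a social media
# company) holds a different feature for an overlapping set of client IDs.
PARTY_A = {
    "u1": {"features": [0.5], "label": 1},
    "u2": {"features": [0.1], "label": 0},
    "u3": {"features": [0.9], "label": 1},
    "u4": {"features": [0.3], "label": 0},
    "u5": {"features": [0.7], "label": 1},  # unknown to party B
}
PARTY_B = {"u1": [1.0], "u2": [-1.0], "u3": [0.8], "u4": [-0.6], "u6": [0.2]}


def align_ids(ids_a, ids_b):
    """Sample alignment: keep only the client IDs known to both parties."""
    return sorted(set(ids_a) & set(ids_b))


def partial_score(weights, features):
    """Linear score computed by one party on its own features only."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_vertical(party_a, party_b, rounds=200, lr=0.5):
    """Jointly fit a logistic model whose weights are split across parties."""
    ids = align_ids(party_a, party_b)
    w_a = [0.0] * len(party_a[ids[0]]["features"])
    w_b = [0.0] * len(party_b[ids[0]])
    for _ in range(rounds):
        for uid in ids:
            x_a, x_b = party_a[uid]["features"], party_b[uid]
            # Each party computes a partial score locally.
            s_a, s_b = partial_score(w_a, x_a), partial_score(w_b, x_b)
            # The label holder combines the scores into a residual ...
            residual = sigmoid(s_a + s_b) - party_a[uid]["label"]
            # ... and each party updates only its own weights with it.
            w_a = [w - lr * residual * x for w, x in zip(w_a, x_a)]
            w_b = [w - lr * residual * x for w, x in zip(w_b, x_b)]
    return ids, w_a, w_b


def predict(uid, w_a, w_b):
    score = partial_score(w_a, PARTY_A[uid]["features"]) + partial_score(w_b, PARTY_B[uid])
    return 1 if sigmoid(score) >= 0.5 else 0


ids, w_a, w_b = train_vertical(PARTY_A, PARTY_B)
predictions = [predict(uid, w_a, w_b) for uid in ids]
print(ids, predictions)
```

Neither party's raw feature values leave its own dictionary in this sketch; only the aligned IDs, partial scores, and residuals are exchanged, which is the part a real deployment would need to secure.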

3.9. IOT/EDGE COMPUTING

With the development of 5G and Internet of Things (IoT) technology, the number of edge devices and the volume of
local data are growing fast. One of the main trends in 5G and edge computing is to power edge devices with
intelligence instead of leaving the intelligence in a server cloud. As a result, more mobile phones and IoT devices
require the ability to make decisions locally while collaborating with servers globally. The ability to process data
and build machine learning models locally, both to preserve privacy and to personalize the user experience, is an
important area across the academic and industrial domains. Traditional AI follows a cloud-based or centralized
modeling process, in which data must be transferred to cloud servers that conduct training, evaluation, deployment,
and serving. With federated learning, collaborative training can be carried out on mobile phones
or IoT devices to enhance both data and models. AI scenarios on edge devices that can be improved include
personalized image processing for cameras, short video content generation, better automatic speech
recognition (ASR) and natural language understanding (NLU) for personal virtual assistants, improved
recommendation systems, online advertising, smart air conditioners, and intelligent glasses with AR/VR. Most
existing pipelines that aggregate local data into a logically or physically centralized server for processing can
be extended with FML.
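
The collaborative training described above can be sketched as a weighted model-averaging loop in the spirit of federated averaging. This is a minimal illustration under stated assumptions, not the procedure defined by this paper: the device data is synthetic, the model is a one-variable linear regression, and the names local_update, federated_average, and make_client_data are hypothetical.

```python
import random


def local_update(weights, data, lr=0.3, epochs=5):
    """On-device training: a few gradient steps for y = w*x + b on local data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(rng, n):
    """Synthetic local data (an assumption): y = 2x + 1 plus small noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]


rng = random.Random(0)
# Three hypothetical devices with different amounts of local data.
clients = [make_client_data(rng, n) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(30):
    # Each device trains locally; only model weights travel to the server.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

Only the model weights move between devices and the server in this loop. The raw (x, y) pairs stay on each device, which is the property that lets an existing centralized pipeline be extended rather than replaced.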

As AI applications become increasingly popular on mobile devices, application developers are no longer
satisfied with AI models trained on open datasets. They also wish to collect information from users to
optimize their models, improving model performance and user experience. However, collecting personal data
such as biometric data, photos, and other personal information may infringe on user privacy and could potentially
violate laws and regulations. FML is therefore a technical solution for improving model performance without
compromising user privacy. Take personalized recommendation services as an example: recommendation
systems are often used to manage users’ flight, meeting, and hotel booking information and provide
recommendations based on users’ personal information such as contacts, messages, calendar, location,
sports/sleep data, and app usage. In these cases, horizontal FML techniques can provide a more secure and
trustworthy solution to help prevent leakage of sensitive personal data.
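
One way horizontal FML can limit what a server learns from individual devices is secure aggregation, as studied in [6]. The sketch below illustrates only the core idea of pairwise additive masking and is a simplification under stated assumptions: the masks are generated in one place for brevity (in a real protocol each pair of devices would derive its shared mask through key agreement), device dropout is not handled, and updates are assumed to be already quantized to non-negative integers. The names MODULUS, pairwise_masks, mask_update, and aggregate are hypothetical.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value (an assumption)


def pairwise_masks(num_clients, dim, seed=0):
    """For each pair (i, j) with i < j, draw a shared random vector that
    client i adds and client j subtracts, so the masks cancel in the sum."""
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Client side: hide the local update behind the client's mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server side: sum the masked updates; the pairwise masks cancel out."""
    dim = len(masked_updates[0])
    return [sum(mu[k] for mu in masked_updates) % MODULUS for k in range(dim)]


# Hypothetical quantized model updates from three devices.
updates = [[3, 10, 7], [1, 0, 5], [4, 2, 2]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)  # [8, 12, 14], the element-wise sum of the unmasked updates
```

The server in this sketch sees only the masked vectors and their sum, so an individual device's update is not directly visible to it, while the aggregate needed for training is recovered exactly.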

4. REFERENCES

The following list of sources either has been referenced within this paper or may be useful for additional reading:

[1] IEEE Std 3652.1-2020, IEEE Guide for Architectural Framework and Application of Federated Machine Learning,
2020. https://standards.ieee.org/standard/3652_1-2020.html

[2] Jakub Konečný, H. Brendan McMahan, Daniel Ramage, and Peter Richtárik. Federated Optimization:
Distributed Machine Learning for On-Device Intelligence. CoRR abs/1610.02527 (2016).

[3] Peter Kairouz, et al. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine
Learning, Vol. 14, Issue 1–2. http://dx.doi.org/10.1561/2200000083. 2021.

[4] Qiang Yang, Yang Liu, Yong Cheng, Yan Kang, Tianjian Chen, Han Yu. Federated Learning. ISBN:
9781681736976, https://doi.org/10.2200/S00960ED2V01Y201910AIM043. Morgan & Claypool Publishers,
Dec 2019.

[5] H. Brendan McMahan, et al. Federated Learning of Deep Networks using Model Averaging. CoRR
abs/1602.05629 (2016).

[6] Keith Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of
the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY,
USA, 1175–1191. 2017.

[7] Qiang Yang, Yu Zhang, Wenyuan Dai and Sinno Jilin Pan, Transfer Learning. ISBN: 9781139061773. DOI:
https://doi.org/10.1017/9781139061773. Cambridge University Press. Jan. 2020.
