A Differential Privacy Protection Based Federat - 2024 - Engineering Application
Keywords: Deep learning, Fog computing, Internet of Things, Privacy, Sensitive information

Nowadays, companies collect massive quantities of data to enhance their operations, often at the expense of sharing users' sensitive information. This data is widely used to train Deep Learning (DL) neural networks to model, classify, or recognize complex data. These activities enable companies to offer an array of services to users, such as precise advertising and optimal location services. This study explores potential solutions for preserving privacy while utilizing DL applications.
To address the privacy issue, we develop a privacy-preserving framework specifically designed for fog computing environments. Unlike traditional cloud computing architectures, fog-embedded architectures share only a small portion of user data with a nearby fog node, ensuring that the majority of sensitive data remains secure. Within these fog nodes, we incorporate two additional algorithms, namely Generalization and Threshold, to enhance the privacy-preserving capabilities of the framework.
The first algorithm, Generalization, introduces a validation dataset within the fog nodes which not only
increases the accuracy of the fog-embedded framework but also ensures that user data is preserved. The second
algorithm, Threshold, is responsible for protecting user data samples and reducing the amount of information
sent to the server. By combining these two algorithms, we are able to provide an additional layer of protection
for user privacy while still maintaining the accuracy of the model.
We conduct an evaluation to test the framework's effectiveness using two separate datasets. In addition, we analyze them through a Feed Forward Neural Network (FFNN) and compare the results against a traditional centralized architecture to validate the proposed framework.
The results of our evaluation demonstrate that the proposed privacy-preserving framework, when combined
with the Generalization and Threshold algorithms, can preserve up to 38.44% of user data. Additionally, we
were able to extend the framework to multiple fog nodes without compromising the network’s accuracy, as
we only observed a 0.1% decrease in accuracy when using the proposed architecture.
This study emphasizes the importance of preserving user information while using DL applications and
provides a solution that trains the desired network without violating user privacy, hence preserving their
anonymity. Overall, the study highlights the potential of Federated Deep Learning to improve the accuracy
and privacy of DL applications in fog computing environments.
∗ Corresponding author.
E-mail addresses: [email protected] (N. Gutiérrez), [email protected] (B. Otero), [email protected] (E. Rodríguez), [email protected]
(G. Utrera), [email protected] (S. Mus), [email protected] (R. Canal).
1 These authors contributed equally to this work.
https://doi.org/10.1016/j.engappai.2023.107689
Received 10 February 2023; Received in revised form 14 July 2023; Accepted 7 December 2023
Available online 13 December 2023
0952-1976/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
N. Gutiérrez et al. Engineering Applications of Artificial Intelligence 130 (2024) 107689
Table 1
Related privacy protection works in IoT-Fog networks.

Reference                    Architecture    Protection model
Shokri and Shmatikov (2015)  Distributed DL  Distributed training
Phong et al. (2017)          Distributed DL  Homomorphic encryption
Lyu et al. (2019)            Distributed DL  –
Boulemtafes et al. (2020)    –               Differential Privacy
Gong et al. (2020)           –               Differential Privacy
Ma et al. (2022)             Federated DL    Homomorphic encryption
Moqurrab et al. (2022)       Distributed DL  –
Abdel-Basset et al. (2022)   Federated DL    Private identifiers
Utomo et al. (2022)          Federated DL    AI Fairness checking methods

proposed in the next section, as well as datasets that do contain sensitive information.

Fog-enabled deep learning models are used in the healthcare domain to preserve user data privacy. Moqurrab et al. (2022) base their model on a CNN with a bidirectional LSTM. This model provides flexible privacy protection via user-defined privacy thresholds; however, it fails at semantic privacy protection (as does differential privacy). Federated Deep Learning (FDL) is also used to improve data privacy against GAN attacks. Abdel-Basset et al. (2022) devise a framework where fog nodes train the FDL model, ensuring that contributors do not have access to data from other contributors by using private identifiers to protect class probabilities. The framework is evaluated for image classification using CNNs, demonstrating its effectiveness.

On the dataset side, Ligett et al. (2017) and Karapiperis and Verykios (2015) use datasets that contain sensitive information, such as clinical records or financial status. Their studies reveal an accuracy loss in their regression analysis, as the main limitation of differential privacy-based solutions is sensitive data management. Consequently, in Gong et al. (2019) and Gong et al. (2020), the authors overcome this problem by proposing the PrivR framework. This framework implements a relevance-based differentially private regression model. Their technique consists of a polynomial transformation of the objective function. Furthermore, it perturbs the coefficients, adding less noise to the more relevant coefficients and vice versa. The solution was tested using the Adult and Banking datasets (Dua and Graff, 2017), obtaining accuracies of 85% and 89%. Finally, the authors conclude that the PrivR framework effectively prevents data leakage while retaining the utility of the model. Table 1 summarizes the related work, highlighting the architecture of each solution and its protection model.

In contrast to the previous work, this proposal evaluates a distributed, fog-embedded framework, in which we test these same datasets. Also, their techniques do not preserve user data; instead, they help weigh the most relevant characteristics in the datasets. Finally, these articles do not use DL to predict the dataset outputs, whereas this work does.

In short, the main difference between the proposed solution and previous Federated DL solutions (Ma et al., 2022; Abdel-Basset et al., 2022; Utomo et al., 2022) is the use of the Generalization and Threshold algorithms, which enhance the privacy-preserving capabilities of the framework while maintaining the neural network accuracy. Moreover, we perform a comparative analysis of the centralized and federated models by implementing both systems and testing them using the Adult and Banking datasets.

3. Framework

This article proposes several techniques to preserve user privacy using a fog-embedded framework.

By combining Federated DL with the proposed fog-embedded framework, the privacy of user data can be further secured. Specifically, the fog nodes can act as intermediaries between the end devices and the central server, ensuring that the data never leaves the private internal system. The fog nodes can also perform the aggregation of the model updates, ensuring that only the encoded model updates are transmitted to the central server. Fig. 1 shows the privacy diagram of the fog-embedded framework.

The proposed privacy-preserving framework for DL applications in fog computing environments is designed with three layers: cloud server, fog nodes, and end-nodes. It consists of a tree-like structure: a generic cloud server is the root of the design, the middle components are the fog nodes, and at the leaves we find the numerous end devices, which represent different IoT devices or users. These end devices are the specific components that contain all the sensitive data that must be preserved.

In the proposed framework, the fog nodes act as a firewall, protecting user privacy by training on all the information within the fog nodes and later uploading only an encoded portion of the results to the server. Moreover, the main interest of this architecture is that each end device has an assigned fog node, which confines a private internal system. In this private network, only the components inside it can access the unprocessed information and gradients.

The operation of the architecture is defined by several steps. Before proceeding with the description of the steps illustrated in Fig. 1, it is important to establish the following definitions:

• 𝑅𝐷𝑥 is the user raw data, where 𝑥 stands for the user id.
• 𝑃𝐷𝑥 is the preprocessed user data.
• 𝐺𝐷 is the Gradient Dataset used in Section 3.1.
• [𝛥𝑊𝑥] are the gradient weights for each user.
• 𝑀𝑤 is the global model.
• 𝑤 are the different layer weights.
• [𝛥𝑊𝑀] stands for the gradient weights after applying the weighted mean in the Generalization Algorithm.
• [𝛥𝑊] are the resulting gradient weights of the Threshold Shuffler Algorithm.

First, a portion of each user's data, mapped to an IoT device, is sent to the fog node so that the global model can be trained. This corresponds to the first and second steps in Fig. 1. The global model resides in the cloud server and contains the weights that will be updated by the fog nodes. The fog node downloads the latest version of the global model from the cloud server and trains it with the end-nodes' dataset. This can be appreciated in the third and fourth steps of Fig. 1.

Each time a fog node trains the model with the given data separately, two privacy techniques are applied: the Generalization Algorithm, which is explained in Section 3.1, plus the Threshold Algorithm, described in Section 3.2. The training in conjunction with the aforementioned algorithms is represented in steps 5, 6 and 7 of Fig. 1. The new gradient weights produced by the algorithms and the training of the model are sent continuously to the cloud server in step 8. Subsequently, the cloud server updates the global model weights and tests them in steps 9 and 10.

After the global model is updated and uploaded to the cloud server, the new weights are re-sent to the next fog node, and the same process is repeated from step 3. The final goal is to ensure that in each iteration the model is further improved and that every fog node trains with the latest version of the global model.

Hence, the fog-embedded framework is designed to preserve user privacy, since the IoT end nodes reveal neither their raw data nor their raw gradients to the cloud server. In this way, Fig. 1 shows in detail the operation of the fog-embedded framework proposed in this work, describing the sequence of steps taken to preserve user privacy in fog computing environments.
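For illustration, one such round over the fog nodes can be sketched as follows. This is a minimal NumPy sketch of the message flow only: `train_locally` and the fixed damping factor stand in for real gradient computation and for the Generalization and Threshold algorithms of Sections 3.1 and 3.2, and all names are ours, not the paper's.

```python
import numpy as np

def train_locally(global_weights, node_data):
    # Stand-in for local training on a fog node: a toy "gradient" that
    # pulls the weights toward the mean of the node's private data.
    return node_data.mean(axis=0) - global_weights

def fog_round(global_weights, fog_node_datasets):
    """One iteration over the fog nodes; each trains on the latest global model."""
    for node_data in fog_node_datasets:
        delta = train_locally(global_weights, node_data)  # steps 3-5: download model, train
        delta = 0.5 * delta                               # steps 6-7: placeholder for the two algorithms
        global_weights = global_weights + delta           # steps 8-9: cloud server update
    return global_weights

rng = np.random.default_rng(0)
datasets = [rng.normal(loc=1.0, size=(100, 4)) for _ in range(3)]  # three fog nodes
weights = fog_round(np.zeros(4), datasets)  # weights drift toward the data mean
```

Note that the raw per-node data never leaves `fog_round`; only the (here damped, in the paper encoded) weight deltas reach the server.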
3.1. Generalization algorithm

The primary purpose of our Generalization algorithm is to achieve better accuracy under the fog-embedded framework. Due to the distributed nature of the data across the different fog nodes, each fog node trains with only a portion of the total dataset, which decreases the accuracy. This method proposes a solution to this accuracy problem.

The pseudocode listed in Fig. 2 describes the proposed Generalization algorithm. First, a portion of the data received from the IoT devices is saved to a new dataset, called the Generalization dataset. Then, the Generalization dataset is used to validate the model. The validation process generates a grade for each model weight and then applies the weighted mean to the result. Subsequently, a new gradient weight is computed.

We can observe that the algorithm updates the fog node's model with the updated weights on line 6. Then, the model is evaluated by extracting a grade for each weight using the Generalization dataset on line 7. In the last step, the weighted mean is computed on line 9 of the algorithm. This algorithm corresponds to step 6 in Fig. 1, which is why the input and output values match. Last, Fig. 3 shows a simplified flow of the high-level process of the Generalization algorithm.

3.2. Threshold shuffler algorithm

The Threshold Shuffler Algorithm is an improved technique to update the NN weights while preserving user privacy. The method is presented through a pseudocode listing in Fig. 4. The goal of this technique is to return a gradient of a random weight within a selected threshold, thereby ensuring that raw information is never sent to the server.

First, as seen in lines 4 and 5, the maximum and minimum values of the interval are calculated (𝑙𝛥𝑓𝑖𝑛𝑎𝑙 and 𝑙𝛥𝑖𝑛𝑖𝑡𝑖𝑎𝑙, respectively). This computation returns values 10% higher and 10% lower than the previously computed weight.

After the threshold is set, a cryptographically secure random number is selected between the threshold limits and returned to the fog node. By incorporating this technique, user privacy is further protected, as the returned number is unpredictable, which makes it unlikely that the exact initial weights can be recovered. This algorithm is the continuation of the Generalization Algorithm and corresponds to step 7 in Fig. 1; again, the input and output values match. Finally, Fig. 5 shows a simplified flow of the high-level process of the Threshold Shuffler algorithm.

The framework was also tested using alternative algorithms, namely the mean and the median of all node weights, in addition to the randomized solution. However, the mean and median algorithms were not able to provide the same level of privacy protection as the Threshold Shuffler Algorithm, which is why the Threshold Shuffler Algorithm was ultimately selected and implemented.
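Since the pseudocode of Figs. 2 and 4 is not reproduced here, the following sketch illustrates the two operations as described above: a weighted mean of graded weights for the Generalization algorithm, and a cryptographically secure draw within a ±10% interval around each weight for the Threshold Shuffler. The function names and the exact grading scheme are our assumptions, not the paper's.

```python
import secrets
import numpy as np

def generalize(candidate_weights, grades):
    # Weighted mean of the candidate weights, where each grade comes from
    # validating the model against the Generalization dataset (Section 3.1).
    return np.average(np.asarray(candidate_weights, dtype=float),
                      axis=0, weights=np.asarray(grades, dtype=float))

def threshold_shuffle(weight, margin=0.10):
    # Interval limits 10% below and above the computed weight (lines 4-5 of
    # Fig. 4); sorted() handles negative weights, an assumption of ours.
    lo, hi = sorted((weight * (1 - margin), weight * (1 + margin)))
    u = secrets.randbelow(10**9) / 10**9  # cryptographically secure uniform draw in [0, 1)
    return lo + u * (hi - lo)
```

For example, `threshold_shuffle(0.8)` returns an unpredictable value in [0.72, 0.88], so the server never observes the exact weight.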
To compare the proposal, we built a centralized architecture. The centralized architecture consists of a traditional, standard DL architecture using cloud computing. As shown in Fig. 6, the architecture has two parts: the cloud server and the users or IoT devices. The devices contain the raw data that is later used for training the model, while the cloud server comprises the model and the validation data. First, the clients connect to the server and upload raw data to it, as seen in step 1 of Fig. 6. Later, the cloud server preprocesses the raw data in step 2 of the same figure. Next, the cloud server trains, updates, and validates the model in steps 3, 4 and 5. Overall, as Fig. 6 shows, the architecture does not preserve user privacy. The raw data is information without any filter or preprocessing, containing all the user's private information. The cloud environment is a privacy-violating model because the raw data is uploaded directly to the server.

In conclusion, the system introduces a fog-embedded architecture that includes two algorithms to help preserve users' sensitive data while maintaining the neural network accuracy. It uses the fog nodes as a firewall, and only a selected amount of non-traceable data is sent to the cloud server.

4. Performance evaluation

In this section, we first describe the system implementation. After that, we describe and analyze the datasets. Then, we perform a sensitivity analysis to determine the hyperparameters that define the optimal neural network. Finally, the section concludes with the results and discussion.

4.1. System implementation and configuration

The implementation of the top-down system for distributed learning using fog computing and Python can be summarized in the following steps:

1. Receive configuration file: The system starts by receiving a configuration file that contains information about the dataset, the Neural Network configuration, the number of nodes, the algorithms to be used by the main node, the number of devices, and the number of epochs. The values of all these hyperparameters are shown in Table 6.
2. Create devices and nodes: Using the configuration, the system creates the required number of devices and nodes. These values are shown in Tables 4 and 5.
3. Assign processes or threads: Each created device and node is then assigned to an independent process or thread to ensure that they operate independently and concurrently.
4. Distribute dataset and assign Neural Networks: Each device receives a portion of the dataset and a Neural Network to train on. The devices are also connected to their associated nodes.
5. Training and communication: Devices train their respective Neural Networks. Upon completion, they send the training information to their associated fog nodes. The fog nodes then compute the specified algorithms using the information received.
6. Centralized controller: A central controller is in place to gather, store, and analyze the information from all the fog nodes. This enables the research team to evaluate the performance of the system and make improvements as needed.

The test was run on a 1.6 GHz Intel Core i5 with 8 GB of memory and an Intel UHD Graphics 617 with 1536 MB, using TensorFlow version 2.0.0 with Python 3.7.4 and the NumPy library. We used the STANDARD_D14_V2 virtual machine. We employed a callback function to register all the statistical metrics of each execution. This enabled us to easily compare each model and store all the data in a log folder.

The implementation of such a system enables the fog computing paradigm to be evaluated for distributed learning, resulting in reduced latency, improved data privacy, and better scalability compared to traditional cloud-based solutions. This approach can also aid in handling large-scale datasets and optimizing communication and computational resources.

4.2. Dataset description

To evaluate the effectiveness of the proposed fog-embedded framework, a series of experiments were conducted using two datasets containing user financial information, namely the Adult and Banking datasets (Dua and Graff, 2017). In this way, we demonstrate the versatility of the framework and algorithms. Plus, we highlight its potential for application in different contexts beyond the specific scenarios tested.
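As an illustration of the configuration file received in step 1 of Section 4.1, a hypothetical Python sketch is shown below; the field names are ours, while the values mirror the experimental setup summarized later in Table 7.

```python
# Hypothetical configuration mirroring the experiments (field names are ours).
config = {
    "dataset": "adult",                # "adult" or "banking"
    "nn": {
        "hidden_units": [32, 16, 32],  # three hidden layers (Section 4.5)
        "learning_rate": 0.01,
    },
    "num_fog_nodes": 3,                # 3, 5 or 7 nodes were tested
    "devices_per_node": 2,
    "node_algorithms": ["generalization", "threshold_shuffler"],
    "epochs": 15,
}
```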
Table 2
Individual parameter treatment for the Adult dataset.

Normalization   One-hot         Deleted
age             workclass       fnlwgt
education-num   occupation      education
capital-gain    marital-status
capital-loss    relationship
hours-per-week  race
                native-country
                sex

Table 3
Individual parameter treatment for the Banking dataset.

Normalization  One-hot    Deleted
age            job        contact
balance        marital    day
pdays          education  duration
previous       default    campaign
               housing
               loan
               month
               poutcome

Table 4
Number of samples in each device and fog node for the Adult dataset.

Framework                 Samples per fog node  Samples per IoT device
Centralized architecture  –                     43,958
Three nodes               14,652                7326
Five nodes                8790                  4395
Seven nodes               6278                  3139
Validation data           –                     4884

Table 5
Number of samples in each device and fog node for the Banking dataset.

Framework                 Samples per fog node  Samples per IoT device
Centralized architecture  –                     40,690
Three nodes               13,562                6781
Five nodes                8138                  4069
Seven nodes               5812                  2906
Validation data           –                     4521

The Adult dataset consists of 48,842 information samples. Each record has 14 features: age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country. The dataset also contains a response, income. The purpose of the Adult NN is to separate the input data into high income (more than 50K/year) or low income (less than or equal to 50K/year).

The Banking dataset consists of 45,211 information samples. Each record has 21 features: age, job, marital, education, default, housing, loan, contact, day, month, duration, campaign, pdays, previous, poutcome, emp.var.rate, cons.price.idx, cons.conf.idx, euribor3m, nr.employed and y. The purpose of the Banking NN is to predict whether a client will subscribe to a term deposit.

When choosing a dataset, some considerations were taken into account. First, since the framework is a privacy-preserving architecture, the dataset has to contain information about different individuals. The Adult and Banking datasets contain sensitive information samples, which are the data to be preserved. Therefore, our goal is to safeguard user privacy by applying our fog-embedded system to the dataset. Each dataset received its own preprocessing in accordance with its characteristics.

4.3. Preprocessing

The treatment and preprocessing of the data samples is performed in the end nodes (i.e. the end devices). Since both datasets are very similar, a similar preprocessing pipeline has been applied to both. For each parameter data type (e.g. numerical, multiple-choice), we apply a different approach.

Where a dataset parameter is a numerical value, the number is normalized to obtain a binary value. On the other hand, if the parameter is a multiple-choice field, it is processed through one-hot encoding to ensure it is properly handled by the model.

One-hot encoding consists of creating a column for each possible value of the parameter and then marking with a 1 the column to which the record belongs. To optimize the network performance, some parameters were deemed unnecessary and thus removed from the dataset. Tables 2 and 3 show the treatment of each parameter individually for both the Adult and Banking datasets.

Once these processes are finished, the new dataset is saved into two serialized files, one for training and another for validation. To create the testing dataset, 4884 samples from the Adult dataset and 4521 samples from the Banking dataset were selected, which correspond to 10% of all the data. Using only 10% of the data for validation ensures that this dataset does not exceed the number of samples each fog node receives. The remaining data samples were saved as training data.

To determine the number of samples allocated to each fog node, the number of devices per fog node must be taken into consideration. In our experiments, each fog node consists of two devices; therefore, the total data of all devices is divided and allocated to each fog node accordingly.

The system was tested with three, five and seven fog nodes. The reason for varying the number of fog nodes is that our goal is to maintain a similar accuracy independently of the number of nodes in the network. The distribution of data among the devices and fog nodes is shown in Tables 4 and 5.

Looking at Table 5, we can observe that the centralized architecture contains all the data samples from all the end nodes and trains the network with more points than any fog node. On the other side, the fog nodes have less information with which to train the network, and the more distributed the nodes are, the less data the users have to pass to the fog node.

4.4. Description models

A NN works by ascribing different weights to the samples in each layer. This process is repeated for a selected number of epochs so that the weights are shaped correctly. Several design parameters must be considered to achieve good performance without over- or underfitting, which is essential to assure a good accuracy and implementation of the network. To avoid over- and underfitting, the NN has been carefully designed.

Firstly, we conducted a thorough analysis of the datasets, taking into account their size. Both datasets have a similar number of data samples, since the Adult dataset has 48,842 and the Banking dataset has 45,211. Furthermore, both datasets have a similar number of parameters and, as stated in the previous section, they are very similar to each other. For this type of dataset, the most frequent and best-performing models are fully-connected NNs, as stated in Brilliant.org (2020).

Fig. 7 presents an example of a FFNN, a fully-connected neural network, applied to the Adult dataset. Each circle represents a unit and each column a layer. As can be observed, the NN has an input layer, three hidden layers and an output layer.

Next, the model hyperparameters must be computed. First, each layer is equipped with an activation function and a specific number of units. An activation function defines the node output. There is a large list of activation functions. In the proposed network, the hidden layers
have a Rectified Linear Unit (ReLU) activation function. The reason behind choosing a ReLU activation function is that it does not saturate or stop shaping the sample weights, which is vital to avoid overfitting. As for the output layer, the activation function used is Sigmoid.

Second, the number of units in each layer is ascertained through an extensive hyperparameter tuning process, which will be explained in more detail in the following section. Considering that fog nodes process a smaller portion of data compared to the centralized architecture, it is essential to employ a smaller model to circumvent overfitting. The optimal network for the fog-embedded framework is identified after conducting a series of experiments wherein the numbers of layers and units are modulated.

These experiments entail training multiple neural networks with diverse data subsets derived from both the Adult and Banking datasets. The initial experiments encompass a broad exploration of the numbers of layers and neurons, with extensive testing ranges for each network. The optimal configuration of the hyperparameters that define the neural network (layers, epochs, etc.) is explained in detail in the next section.

The neural network is defined by several key elements, including the optimizer, the loss function, and the number of epochs. The loss function serves as a mathematical measure of the error in the predictions made by the model, with the aim of minimizing this error and optimizing the predictions.

In this study, the Mean Square Error (MSE) was selected as the loss function. In addition to the loss function, an optimizer is required to shape and refine the model weights. The Adam optimizer (TensorFlow, 2020) was chosen due to its high performance in a limited number of epochs. The number of epochs for the fog node to train the model was determined as a final step. Considering the limited number of samples and the need to achieve stable model results without overtraining the network, we decided to set the number of epochs to 15.

4.5. Sensitivity analysis

The model selection in this study was conducted using the Keras Hyperband tool, a state-of-the-art optimization technique for hyperparameter tuning (see API (2020)). Hyperband is a novel approach for efficient hyperparameter optimization that employs adaptive resource allocation and early stopping strategies to explore the search space more effectively. This method enables the rapid identification of high-performing models at a reduced computational cost. More details of the algorithm for hyperparameter optimization can be found in Li et al. (2018).

To assess the sensitivity of the proposed model to different hyperparameter configurations, the following parameters were systematically varied during the model selection process:

1. Number of layers: Recognizing that the number of layers influences the model's abstraction capacity and that the samples available within each device are limited, the study focused on testing small networks with one to four layers. The experimental results showed a significant improvement in the NN performance when using a three-layer architecture.
2. Learning rate: The model's sensitivity to various learning rates, ranging from 0.001 to 0.1, was investigated. The outcomes revealed an increase in performance as the learning rate increased up to a certain threshold, beyond which performance declined. The optimal learning rate was determined to be 0.01.
3. Number of epochs: The sensitivity analysis also considered the impact of different epoch numbers, ranging from 5 to 50. The model performance increased with a growing number of epochs until a certain point, at which the performance plateaued. The optimal number of epochs was identified as 15, attributable to the limited number of samples each device receives.
4. Batch size: The model's sensitivity to varying batch sizes, from 8 to 128, was also examined. The performance was found to improve with an increasing batch size up to a certain limit, after which it plateaued. The optimal batch size was established to be between 16 (for layer 2) and 32 (for layers 1 and 3).

Table 6 summarizes the ranges used in the experimentation with the different hyperparameter configurations to determine the most optimal neural network. The hyperparameter values for the optimal configuration are shown in bold.

Upon completion of these experiments, we found that the ideal number of layers is three, with the number of neurons in each layer oscillating between 32 and 16. The resulting model structure, achieved through this rigorous tuning process, is presented in the following listing:
Table 6
Hyperparameter ranges used for experimentation.

Hyperparameter  Configurations
Layers          1, 2, 3, 4
Learning rates  0.001, 0.005, 0.01, 0.05, 0.1
Epochs          5, 10, 15, 20, 25, 30, 35, 40, 45, 50
Batch size      8, 16, 32, 64, 128

Table 7
Summary of the values of the parameters used to perform the experiments.

Parameter                      Value
Number of fog nodes            3, 5, 7
Number of devices per node     2
Samples to train the model     43,958 (Adult), 40,690 (Banking)
Samples to validate the model  4884 (Adult), 4521 (Banking)
Layers                         3
Learning rate                  0.01
Epochs                         15
Batch size                     16 (for layer 2), 32 (for layers 1 and 3)

Fig. 8. Mutual Information algorithm functioning and application.

model.Sequential([
    model.Dense(32, ReLU),
    model.Dense(16, ReLU),
    model.Dense(32, ReLU),
    model.Dense(1, Sigmoid)
])

Finally, the sensitivity analysis revealed that the proposed model performance is influenced by changes in its input parameters and conditions. By determining the optimal parameter values through this analysis, the model performance can be optimized to ensure robustness under varying conditions.

4.6. Experiments

In this section, we discuss the experiments and results obtained for the architectures, models, and algorithms described in the previous sections. Table 7 summarizes the values of all the parameters selected to determine the neural network configuration, the required number of devices and nodes, and the samples used for training and validation of the neural network model.

In this regard, to test the system accuracy, the fog-embedded architecture is tested with three, five and seven fog nodes for each dataset. Testing the framework in different scenarios allows the system to generalize and evaluates the proposed architecture. The same experiments were conducted in the centralized architecture for comparison. Therefore, for each experiment, 50 executions were carried out to ensure result stability, and we then took the mean of each result. All the results obtained are shown and discussed in the following section.

Then, to implement the code to execute the experiments, dependency injection was used to ensure that the components are treated separately. Each device and fog node is individually considered, as Hudli and Hudli (2013) depicted. Consequently, the architecture is

studied metric, since it perfectly depicts the binary classification performance. The loss was also computed to avoid overfitting, while the AUC provides an aggregate measure of performance across all possible classification thresholds. Details on the definition and meaning of these metrics can be found in Hossin and Sulaiman (2015).

Below, we show the results obtained for each of the metrics considered, both to evaluate the privacy of the data and to evaluate the results of the model.

4.6.1. Mutual information

Mutual information is a measure of the amount of information shared between two random variables, indicating the degree of dependence between them. In our case, mutual information will determine the percentage of hidden information that results from applying the algorithms described in the previous sections to the experiments carried out.

We used the Normalized Mutual Information to scale the results between 0.0 (no mutual information) and 1.0 (perfect correlation) (more details in Sun et al. (2019)).

To test the increase in user data protection, we compare the initial gradients, once the fog node has trained the network, with the final gradients, when the network has passed through the algorithms and returned to the cloud server. The Normalized Mutual Information can be calculated as follows:

    𝐼(𝑋, 𝑌) = (𝐻(𝑋) − 𝐻(𝑋 ∣ 𝑌)) / 𝐻(𝑋)

In the depicted equation, 𝐻(𝑋) represents the entropy of the system weights once the model is trained but before applying the algorithms. 𝐻(𝑋 ∣ 𝑌) is the conditional entropy that remains in the weight results after applying the algorithms. In other words, the mutual information calculates the quantity of information the model weights share before and after they have been through the algorithms. Fig. 8 represents the entropy and mutual information of the system. The circle on the left represents the model entropy of the weights without the algorithms, and the circle on the right, the model weights with the applied algorithms.

Furthermore, if the Normalized Mutual Information is 1.0, both gradients are the same (i.e. no information is preserved). The goal is to achieve as little mutual information as possible. In the centralized
scalable, flexible and with fewer intern dependencies. architecture and in the fog-embedded framework without algorithms
On the other hand, the evaluation of the framework both at the level the mutual information can be computed as 𝐼(𝑋, 𝑋) = 𝐻(𝑋) 𝐻(𝑋)
= 1.0.
of data privacy and at the level of accuracy of the model is verified in In those cases, 𝐻(𝑋 ∣ 𝑌 ) does not compute since the scenario where
two-fold. First, the mutual information between gradients is calculated the algorithms are applied (𝑌 ) is not performed. In these cases, the
to test the preservation of user privacy. The mutual information aims Mutual Information will be maximum (1.0) since no preservation is
to identify the percentage of hidden information that results from applied to the network gradients. The difference between the central-
applying the described framework with both algorithms. ized framework and the fog-embedded framework without algorithms
Second, we show the metrics that evaluate the models obtained for is that the second does not deliver the raw data directly to the cloud
the datasets considered, and compare them with those obtained for the server, whereas the first does. To calculate the Mutual Information,
centralized architecture. The metrics analyzed were the accuracy, the we go through the following steps: First, once the network is trained,
loss and the Area under the Curve (AUC). These metrics are usually the gradient values are saved. Furthermore, the system also saves the
used for classification problems. The accuracy was chosen as the main gradient values once the algorithms are applied. From both results the
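The paper does not provide an implementation of this computation; the following NumPy sketch estimates the Normalized Mutual Information between saved gradient vectors by histogram discretization. The function name, bin count, and plug-in entropy estimator are our own choices, not the authors' code.

```python
import numpy as np

def normalized_mutual_information(before, after, bins=32):
    """Estimate I(X, Y) = (H(X) - H(X|Y)) / H(X) by discretizing the
    gradient values into histogram bins (plug-in entropy estimates)."""
    x = np.digitize(before, np.histogram_bin_edges(before, bins=bins))
    y = np.digitize(after, np.histogram_bin_edges(after, bins=bins))
    joint = np.zeros((bins + 2, bins + 2))        # joint frequency table
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))                # H(X)
    hy = -np.sum(py[py > 0] * np.log2(py[py > 0]))                # H(Y)
    hxy = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))   # H(X, Y)
    # H(X|Y) = H(X, Y) - H(Y)
    return (hx - (hxy - hy)) / hx

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)          # stand-in for the saved gradients

# Identical gradients before and after: nothing is hidden, so the NMI
# is 1.0, matching the centralized case I(X, X) = H(X)/H(X) = 1.0.
print(normalized_mutual_information(g, g))               # 1.0
# Unrelated gradients: the NMI drops towards 0.0.
print(normalized_mutual_information(g, rng.normal(size=10_000)))
```

With the algorithms applied, the measured values (Table 8) fall between these two extremes, around 0.62–0.67.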
N. Gutiérrez et al. Engineering Applications of Artificial Intelligence 130 (2024) 107689
Fig. 9. Results obtained for the Adult dataset experiments taking into account accuracy, loss and AUC metrics.
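As a concrete reference for the configuration in Table 7 and the metrics reported in Fig. 9, the following Keras sketch trains the 32-16-32 FFNN with the Adam optimizer and tracks accuracy, loss and AUC. The synthetic features and all variable names are placeholders of ours, not the Adult/Banking preprocessing used in the paper.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 14 numeric features, binary label (illustrative only).
X = np.random.rand(1000, 14).astype("float32")
y = (X.sum(axis=1) > 7).astype("float32")

# The paper's FFNN: Dense 32-16-32 with ReLU, sigmoid output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # lr from Table 7
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
# 15 epochs as in Table 7; a validation split stands in for the
# held-out validation samples (4884 Adult / 4521 Banking).
history = model.fit(X, y, epochs=15, batch_size=32,
                    validation_split=0.2, verbose=0)
print(sorted(history.history))  # loss/accuracy/auc and val_ counterparts
```

Averaging such runs over 50 executions, as described above, yields the per-node means plotted in Figs. 9 and 10.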
Table 8
Results of the Mutual Information experiments.

Framework | Mutual information
Centralized model | 1.0
Nodes in Adult | 0.6735
Nodes in Banking | 0.6156

All calculations are performed for both the Adult and Banking datasets, and we observed that the Mutual Information between nodes is very similar. For that reason, we report the mean over all experiments. Table 8 shows the results.

As we can see in Table 8, the Mutual Information decreases in the fog-embedded framework with the applied algorithms; that is, there is a clear reduction in the information shared. With the Adult dataset, the system using the algorithms shares 32.65% less information; likewise, with the Banking dataset it shares 38.44% less information. Consequently, the results demonstrate the preservation of the user's information while maintaining the accuracy of the NNs when applying the proposed algorithms in the fog-embedded framework.

4.6.2. Performance preservation

The previous section shows that the Mutual Information helps to protect the user's information; it is also crucial to ensure accuracy and loss stability in the new environment. In this section, we show the results of the same experiments, now considering the metrics that determine neural network (model) performance.

Fig. 9 shows the mean of the results in each experiment. First, the centralized model has an average accuracy of 83.9%, a loss of 0.341 and an AUC of 0.85. These results do not present overfitting, since the loss decreases at each epoch and the accuracy remains stable between training and validation. The system performs similarly in the columns of Fig. 9, obtained in the cloud server, for three, five and seven fog nodes.

It is important to note that the decrease in accuracy between fog nodes is below 1% and the increase in loss can be considered insignificant. This indicates that the proposed framework achieves consistent results regardless of the number of fog nodes used. Additionally, the fact that the loss does not increase by more than 0.3 ensures that there is neither overfitting nor underfitting. Finally, although the values of the AUC metric increase slightly, the model found for the fog architecture is better than the centralized architecture model.

The same behavior is observed for the Banking dataset, as we can see in Fig. 10, with results similar to those for the Adult dataset. For this dataset we achieve an accuracy of 97%, a loss of 0.151 and an AUC of 0.55 in the centralized architecture. The increase in loss and AUC in the fog node experiments is negligible, and the accuracy remains constant, even when compared against the centralized model.

Having both datasets perform similarly allows us to generalize the results and stipulate that the system can be extended to other data collected in similar environments. Additionally, both datasets use the same network, allowing transfer learning between them, as proposed in Zhuang et al. (2021). Transfer learning aims to reuse a NN model from one task as the starting point of another. By using transfer learning, the system leverages a model built for one specific dataset and uses it for another with a similar scheme. In real-life scenarios, large quantities of data from multiple schemes can be extracted and analyzed, allowing companies to process and share valuable data without revealing users' sensitive information.

Overall, the evaluation shows that the proposed system maintains accuracy across the different numbers of fog nodes, allowing companies to use DL techniques on their data while significantly reducing the information shared in a cloud environment. Furthermore, the system works as well as the centralized architecture, and the increase in loss is minimal, indicating that no overfitting is taking place.

5. Conclusions

This work proposes an efficient way to enhance our DL knowledge and continue to train networks while preserving user sensitive information. To do so, a fog-embedded framework was introduced: a three-level framework with two new algorithms inside. The fog nodes train the network and apply encryption algorithms to update the weights before the final information is sent to the cloud server. Each fog node creates a secure local network where the data samples reside.

The fog nodes act as a privacy-preserving filter between the user devices and the cloud server. The weights are shuffled and randomized among the fog nodes, which makes it difficult to reconstruct the original user data from the updated weights.
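The paper does not give pseudocode for this shuffling of weight updates among fog nodes; the NumPy sketch below illustrates one possible per-coordinate scheme. All names and the permutation strategy are our assumptions, and the Generalization and Threshold algorithms themselves are not reproduced.

```python
import numpy as np

def shuffle_updates(node_updates, rng):
    """Per-coordinate shuffle of the weight updates reported by the fog
    nodes: the cloud server can no longer tell which node produced which
    value, yet the averaged update it aggregates is unchanged."""
    stacked = np.stack(node_updates)             # shape (n_nodes, n_weights)
    for col in range(stacked.shape[1]):
        perm = rng.permutation(stacked.shape[0])  # independent per weight
        stacked[:, col] = stacked[perm, col]
    return list(stacked)

rng = np.random.default_rng(42)
updates = [rng.normal(size=8) for _ in range(3)]  # 3 fog nodes, 8 weights
mixed = shuffle_updates(updates, rng)

# The per-coordinate multiset of values is preserved, so the mean update
# the cloud server computes is identical before and after shuffling.
assert np.allclose(np.mean(updates, axis=0), np.mean(mixed, axis=0))
```

This illustrates why such shuffling can hide the node-to-update mapping without affecting an averaging-based aggregation step.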
Fig. 10. Results obtained for the Banking dataset experiments taking into account accuracy, loss and AUC metrics.
In this way, the user's privacy is preserved while still allowing for model updating and improvement. The algorithms are run in the fog nodes to assure accuracy stability, no overfitting and the desired preservation of the information. Furthermore, we tested the system with different numbers of fog nodes to generalize our results, and conducted each experiment 50 times. The Normalized Mutual Information was calculated in the system and compared with the centralized architecture, yielding 32.65% and 38.44% less shared information between the IoT devices and the cloud servers. Finally, the accuracy results revealed a decrease of less than 1% in accuracy and a negligible increase in loss.

The results showed a favorable outcome, enabling the system to preserve the user's privacy. The decrease in accuracy is negligible, so the system can be quickly introduced in companies without raising accuracy concerns while helping to preserve the user's sensitive information. These results mean that the system allows companies to internally train on their data without sharing information in a cloud environment, while other security techniques, such as the proposed algorithms, are executed to assure further information preservation.

One potential limitation of the system is the need for large quantities of data to train the network. Additionally, the scalability of the usage of transfer learning for multiple datasets may be constrained and difficult to extend.

In terms of future work, we believe that expanding the amount of data with similar characteristics can improve the model performance. This expansion would allow a more comprehensive study of the capabilities and potential of transfer learning. Additionally, the system could be tested in an industrial setup that generates new data, which would provide further insight into the model's ability to handle novel and unique data.

In conclusion, the article proposes a system that helps to preserve user privacy while maintaining the neural network accuracy in a distributed fog architecture.

CRediT authorship contribution statement

Norma Gutiérrez: Software, Investigation, Data curation, Writing – original draft. Beatriz Otero: Conceptualization, Methodology, Resources, Validation, Writing, Supervision. Eva Rodríguez: Conceptualization, Validation, Methodology, Writing – review & editing. Gladys Utrera: Resources, Visualization, Writing – review & editing. Sergi Mus: Investigation, Software. Ramon Canal: Supervision, Methodology, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. All authors declare that they have no conflicts of interest.

Data availability

We have referenced the datasets used in this article.

Acknowledgments

This work is partially supported by the Spanish Ministry of Science and Innovation, Spain under contracts PID2021-124463OB-IOO and PID2019-107255GB-C22; by the Generalitat de Catalunya, Spain under grant 2021SGR00326; and by the DRAC (IU16-011591), the HORIZON Vitamin-V, Spain (101093062), the HORIZON-AG PHOENI2X, Spain (101070586) and the HORIZON HORSE, Spain (101096342) projects.

References

Abdel-Basset, M., Hawash, H., Moustafa, N., Razzak, I., Abd Elfattah, M., 2022. Privacy-preserved learning from non-iid data in fog-assisted IoT: A federated learning approach. Digit. Commun. Netw.

API, K., 2020. Keras API reference. URL: https://keras.io/api/keras_tuner/.

Boulemtafes, A., Derhab, A., Challal, Y., 2020. A review of privacy-preserving techniques for deep learning. Neurocomputing 384, 21–45. URL: http://www.sciencedirect.com/science/article/pii/S0925231219316431.

Brilliant.org, 2020. Feedforward neural networks. URL: https://brilliant.org/wiki/feedforward-neural-networks/.

Dua, D., Graff, C., 2017. UCI Machine Learning Repository. University of California, School of Information and Computer Sciences, Irvine. URL: http://archive.ics.uci.edu/ml.

Gong, M., Pan, K., Xie, Y., 2019. Differential privacy preservation in regression analysis based on relevance. Knowl.-Based Syst. 173, 140–149. http://dx.doi.org/10.1016/j.knosys.2019.02.028.

Gong, M., Pan, K., Xie, Y., Qin, A., Tang, Z., 2020. Preserving differential privacy in deep neural networks with relevance-based adaptive noise imposition. Neural Netw. 125, 131–141. http://dx.doi.org/10.1016/j.neunet.2020.02.001.
Gu, K., Zhang, W., Wang, X., Li, X., Jia, W., 2023. Dual attribute-based auditing scheme for fog computing-based data dynamic storage with distributed collaborative verification. IEEE Trans. Netw. Serv. Manag. 1. http://dx.doi.org/10.1109/TNSM.2023.3267235.

Hossin, M., Sulaiman, M.N., 2015. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 5, 1–11.

Hudli, S., Hudli, R., 2013. A verification strategy for dependency injection. Lect. Notes Softw. Eng. 71–74. http://dx.doi.org/10.7763/LNSE.2013.V1.16.

Karapiperis, D., Verykios, V.S., 2015. An LSH-based blocking approach with a homomorphic matching technique for privacy-preserving record linkage. IEEE Trans. Knowl. Data Eng. 27 (4), 909–921. http://dx.doi.org/10.1109/TKDE.2014.2349916.

LeCun, Y., Cortes, C., 2010. MNIST handwritten digit database. URL: http://yann.lecun.com/exdb/mnist/.

Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A., 2018. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv:1603.06560. URL: https://arxiv.org/abs/1603.06560.

Ligett, K., Neel, S., Roth, A., Waggoner, B., Wu, S.Z., 2017. Accuracy first: selecting a differential privacy level for accuracy constrained ERM. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems. Vol. 30, Curran Associates, Inc., pp. 2566–2576.

Lyu, L., Bezdek, J., He, X., Jin, J., 2019. Fog-embedded deep learning for the internet of things. IEEE Trans. Ind. Inform. http://dx.doi.org/10.1109/TII.2019.2912465.

Ma, J., Naas, S.A., Sigg, S., Lyu, X., 2022. Privacy-preserving federated learning based on multi-key homomorphic encryption. Int. J. Intell. Syst. 37 (9), 5880–5901.

Martínez-Villaseñor, L., Ponce, H., Brieva, J., Moya-Albor, E., Núñez-Martínez, J., Peñafort-Asturiano, C., 2019. UP-fall detection dataset: A multimodal approach. Sensors 19 (9), 1988.

Moqurrab, S.A., Tariq, N., Anjum, A., Asheralieva, A., Malik, S.U., Malik, H., Pervaiz, H., Gill, S.S., 2022. A deep learning-based privacy-preserving model for smart healthcare in internet of medical things using fog computing. Wirel. Pers. Commun. 126 (3), 2379–2401.

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., 2011. Reading digits in natural images with unsupervised feature learning. URL: http://ufldl.stanford.edu/housenumbers.

Phong, L.T., Aono, Y., Hayashi, T., Wang, L., Moriai, S., 2017. Privacy-preserving deep learning: revisited and enhanced. In: International Conference on Applications and Techniques in Information Security.

Shokri, R., Shmatikov, V., 2015. Privacy-preserving deep learning. Association for Computing Machinery, pp. 1310–1321. http://dx.doi.org/10.1145/2810103.2813687.

Sun, K., Tian, P., Qi, H., Ma, F., Yang, G., 2019. An improved normalized mutual information variable selection algorithm for neural network-based soft sensors. Sensors 19 (24), 5368. http://dx.doi.org/10.3390/s19245368.

TensorFlow, 2020. tf.keras.optimizers.Adam. URL: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam.

Utomo, S., John, A., Rouniyar, A., Hsu, H.-C., Hsiung, P.-A., 2022. Federated trustworthy AI architecture for smart cities. In: 2022 IEEE International Smart Cities Conference. ISC2, pp. 1–7. http://dx.doi.org/10.1109/ISC255366.2022.9922069.

Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q., 2021. A comprehensive survey on transfer learning. Proc. IEEE 109 (1), 43–76. http://dx.doi.org/10.1109/JPROC.2020.3004555.