
Received 5 October 2022; accepted 24 October 2022. Date of publication 1 November 2022; date of current version 30 November 2022. The review of this article was arranged by Associate Editor Celimuge Wu.
Digital Object Identifier 10.1109/OJCS.2022.3218559

Efficient Video Privacy Protection Against Malicious Face Recognition Models
ENTING GUO 1, PENG LI 1 (Senior Member, IEEE), SHUI YU 2 (Senior Member, IEEE), AND HAO WANG 3 (Senior Member, IEEE)
1 School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu 965-8580, Japan
2 School of Computer Science, University of Technology Sydney, Ultimo, NSW 2007, Australia
3 Department of Computer Science, Norwegian University of Science and Technology, 7034 Trondheim, Norway
CORRESPONDING AUTHOR: PENG LI (e-mail: [email protected]).
This work was supported in part by the Chinese Scholarship Council (CSC).

ABSTRACT The proliferation of powerful facial recognition systems poses a serious threat to user privacy. Attackers could train highly accurate facial recognition models using public data on social platforms. Therefore, recent works have proposed image pre-processing techniques to protect user privacy. Without affecting people's normal viewing, these techniques add special noises into images so that it becomes difficult for attackers to train models with high accuracy. However, existing protection techniques are mainly designed for image data, and they cannot be directly applied to video data because of their high computational overhead. In this paper, we propose an efficient protection method for video privacy that exploits unique features of video protection to eliminate computation redundancy and accelerate computation. The evaluation results under various benchmarks demonstrate that our method significantly outperforms traditional methods, reducing computation overhead by 35.5%.

INDEX TERMS Computation reuse, deep learning, video privacy.

I. INTRODUCTION
Deep Learning (DL) shows great promise for facial recognition [1], [2], which enables various intelligent services (e.g., medical records, smart communication) while posing a threat to personal privacy. With the popularity of social networks, people share their images or videos on Twitter or Facebook [3], [4], [5]. By using these data available on social platforms, attackers with modest resources can build highly accurate facial recognition models without people's awareness [5], [6], [7]. Therefore, it is important to protect user privacy hidden in public images and videos from unauthorized facial recognition trackers.

Some methods have been proposed to avoid facial recognition by deforming images, but they degrade user experience [5], [8], [9]. Others require users to wear specific clothing with patterns that interfere with the recognition model [10], [11]. Some protection methods [12], [13] need information about the attack models to generate protection data. As one of the state-of-the-art privacy protection techniques, Fawkes [14], [15] considers both visual effects and privacy for images. Specifically, the faces in images are masked by a special kind of matrices, called cloaks. These cloaks are designed in a sophisticated way, so that people cannot distinguish masked images from original ones with human vision. Meanwhile, even though these masked images are fed to malicious models, the model training cannot converge to high accuracy. Despite the promise of Fawkes for images, it does not work well for video protection because of its high computational overhead. Masking a single face takes about 400 seconds on a mid-range GPU, and a one-minute video with 60 frames per second needs 10 hours.

In this paper, we find that there exists a large amount of computation redundancy in the video masking process. Motivated by this finding, we propose to accelerate face masking for videos by reusing some cloaks. Specifically, we first locate faces in videos through the key positions of the eyes and mouths using the MTCNN method [15]. Then, we propose a matrix affine technique to transform cloaks based on the relationship of key positions, avoiding re-generating cloaks for every video frame.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

The main contributions of this paper are listed as follows:
• We find that existing privacy protection methods against malicious face recognition have high computational overhead for videos. The main overhead comes from the process of generating cloaks for faces in video frames.
• We propose a novel method that reuses existing cloaks, instead of generating new ones, to reduce computational overhead, so that video protection can be accelerated.
• We implement our method on TensorFlow and use well-known video data sets for performance evaluation. The experimental results show that 35.5% of the computation can be saved.
The rest of our paper is organized as follows. We first introduce the background and discuss the motivation in Section II. Section III presents the algorithm design. Then, we show the performance evaluation in Section IV. Finally, we draw the conclusion in Section V.

II. BACKGROUND AND RELATED WORK
The current protection techniques are outlined in this section. Additionally, we preview three aspects, including metrics, sorts of attackers, and protection methods. The current study covers numerous types of applications and focuses on the human vision and machine recognition aspects. Attacker types are separated into the authorized model and the unauthorized model based on the relationship between attackers and protectors. Additionally, we list the protections against the unauthorized model.

A. METRICS OF HUMAN AND MACHINE
An increasing number of apps now use this recognition method as DL develops. Modifying images has resulted in several applications, whether they be from a human or machine perspective. Applications in the arts and entertainment change how people perceive themselves. Moreover, applications that avoid face or sign recognition can result from machine-perspective analysis of photos. Applications that scan for malware tampering or circumvention also consider the two aforementioned factors.

Celebrity faces are replaced in videos with Deepfakes using DL, and the edited content is then made available online [16], [17]. Applications for this technology are innovative and useful. Examples include the accurate video dubbing of foreign films, historical figure recreations for educational purposes, and virtual dressing rooms when shopping. Although the resultant artifacts are imperceptible to humans, DL analysis makes them easy to spot. Some research looks for certain artifacts to identify deepfakes.

Additionally, certain information is not readily detectable by humans but is recognized by machines. In addition, the machine view can be utilized to conceal image or video details without impacting the structural data. Using a real-time video streaming analytics (VSA) front-end, the privacy enhancement system (PECAM) is a flexible privacy-enhancing system [18]. To carry out the privacy enhancement transformation, the authors offer a unique security-enhancement cycle-consistent generative adversarial network (GAN) [17], [19]. In PECAM, for instance, each vehicle's license plate number is regarded as private information when used for video surveillance. The monitoring video's finer details are masked by the GAN before being uploaded to the server. The structural information is also kept, and it has been empirically demonstrated to be secure and reversible. Consequently, applications such as road condition determination and accident analysis usually remain possible [20], [21]. Moreover, PECAM can retrieve the data for in-depth analysis. For privacy protection, this method works effectively. To obtain the same visual impression in the context of this paper, however, is challenging.

Some programs alter images and videos so they look good from both a human and a machine perspective. The backdoor attack is a typical example, in which the attacker controls the training of the DL model by supplying specific data and labels. This method is also analyzed below as protecting privacy via poisoning attacks. After the DL model is impacted by backdoor data, the data containing the relevant inputs are classified into predetermined categories. Since the change of the images or videos is the user's personal conduct in this article's context, other people's data cannot be impacted. Furthermore, personal information is shared on social media networks where tags cannot be changed. Therefore, the backdoor approach is not completely applicable to our scenario.

In summary, we compare two parallel metrics in existing applications and list some applications. Moreover, the shortcomings of the above methods for the scenario in our article are compared.

B. ATTACKER TYPES
Recent studies have shown that DL models can memorize the information of training data [22], [23]. Due to the possibility that these attacks could reveal sensitive information about training dataset participants, training models with differential privacy (DP) is becoming more popular [24], [25]. Note that these methods presume a stable model trainer and are not appropriate for unauthorized model trainers. This DP scheme is shown as:

    Pr[M(x) ∈ S] ≤ e^ε · Pr[M(x′) ∈ S] + δ    (1)

where δ is a positive real number for relaxation and ε is the scale coefficient of the protection method. The underlying premise of DP is that, if the impact of randomly altering database entries is sufficiently limited, the resulting statistical characteristics cannot be utilized to deduce the content of a single record [26], [27]. In general, an algorithm is a DP algorithm if it is impossible to determine whether the result of the algorithm uses data about a specific person.
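For intuition, the sketch below shows the classic Laplace mechanism applied to a simple count query; with noise scale sensitivity/ε it satisfies the guarantee in (1) with δ = 0. The function name, the threshold query, and the use of NumPy are illustrative assumptions of ours, not part of any cited system.

import numpy as np

def dp_count(values, threshold, epsilon, sensitivity=1.0):
    """Return a differentially private count of entries above a threshold.

    Adding Laplace noise with scale sensitivity/epsilon bounds how much any
    single record can change the output distribution, which is exactly the
    e^epsilon factor in (1) (the delta term is zero for this mechanism).
    """
    true_count = float(np.sum(np.asarray(values) > threshold))
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: privately query how many objects appear for more than 5 seconds.
appearance_seconds = [12, 0, 3, 45, 7]
print(dp_count(appearance_seconds, threshold=5, epsilon=0.5))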
FIGURE 1. There are three ways to protect privacy in images. 1) Evasion protection methods cover a portion of the face with a suitable decoration. 2) Poisoning protection methods generate images to interfere with the attackers. 3) Cloaking protection methods go a step further and modify the image at the pixel level.

DP is usually used to query data [28], [29]. The cutting-edge approach applies the concept of querying to video in order to safeguard privacy. Privid first divides long videos into short videos of equal length and counts the appearance time of each object [30]. After that, the object occurrence times are sorted into the same database records. The authors use statistics to measure the privacy of each object and the information available in a single query. Each access can only return results that have been processed, such as the quantity and duration of objects. Using this method, Privid divides continuous video into discrete query pieces and then increases the DP of the query results [26], [30]. This method mostly applies to tracking statistics, not to private social platform videos. We have no control over how the video is edited or how the query process works. Methods based on DP are therefore unrelated to the subject matter of this paper. It is not the amount of data a query can access at once that needs to be regulated, but rather the effect that an unauthorized attacker could have after training on the data.

C. PROTECTING PRIVACY METHODS
We provide a preview of cloaks, which protect privacy through pixel-level changes. As shown in Fig. 1, there are three ways to protect privacy in images. Protecting privacy via evasion attacks uses an appropriate decoration to cover part of the face. Poisoning protection methods generate images to interfere with the attackers. Cloaking protection methods modify the image at the pixel level. In order to protect user privacy from unauthorized attackers, many techniques use attacks against DL models. This scenario can be seen as a reversal of the traditional roles of attacker and protector, where the users are the attackers and a third-party tracker with unauthorized tracing is the protector.

1) PROTECTING PRIVACY VIA EVASION ATTACKS
This type of technology requires the user to wear an appropriate decoration, which is not suitable for normal use [22], [31]. To evade tracking, these methods need adequate white-box access to the attacker's model. The recognition effect of the tracker is calculated as the optimization objective. Therefore, the scope of application is small and easy to defend against [32], [33]. The other kind of evasion method changes the original image obviously, which affects its normal use.

2) PROTECTING PRIVACY VIA POISONING ATTACKS
Another way to avoid DL model attacks is to interfere with their training [31], [33]. A typical one is the backdoor attack, in which the attacker guides the training process of the DL model by generating specific data and labels. Model corruption attacks actively attack the tracker, which can easily lead the tracker to use more advanced attack methods. In fact, although it is difficult to eliminate the influence of backdoor data in the model, it is not difficult to detect such attacks [34]. On the other hand, the label is the owner of the image under the face recognition task. Since our images or videos are published on social platforms, they can only be protected by clean-label methods. Clean-label attacks do not change the label of the data but modify the original data to achieve the protection effect.

3) PROTECTING PRIVACY VIA CLOAK
Protecting privacy via cloak is more suitable for the scenario of individual privacy [10], [14], [32]. Firstly, it can be tailored to a user's personal data. Secondly, it has good generality and can deal with a wide range of models. Thirdly, its concealment against abnormality detection is better. Different from backdoor attacks, this approach does not trigger the wrong classification by having the model record a particular input-output pair that is independent of the original recognition task. The protection method via cloak is instead equivalent to modifying the input-output pairs of the recognition task itself.
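As an illustration of what a pixel-level cloak means in practice, the sketch below adds a small, bounded perturbation to a face crop. The budget value and the function names are our own illustrative assumptions, not the exact procedure of any cited system.

import numpy as np

def apply_cloak(face_crop, cloak, budget=0.03):
    """Superimpose a cloak (same shape as the face crop) onto the face.

    Pixels are assumed to be floats in [0, 1]. The cloak is clipped to a
    small per-pixel budget so the change stays visually imperceptible,
    while still shifting the image in the recognition model's feature space.
    """
    bounded = np.clip(cloak, -budget, budget)
    return np.clip(face_crop + bounded, 0.0, 1.0)

# Usage: a random (un-optimized) cloak applied to a dummy 112x112 RGB face crop.
face = np.random.rand(112, 112, 3).astype(np.float32)
cloak = np.random.uniform(-0.01, 0.01, face.shape).astype(np.float32)
protected = apply_cloak(face, cloak)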


FIGURE 2. Users would like to publish their videos or images on social media platforms but they don’t want unauthorized face recognition trackers to be
able to identify who they are. Trackers have sufficient processing power. They can train the recognition model using a vast data set available on social
media.

FIGURE 3. The operation principle of the protection method in feature space is visually demonstrated. In the figure, we show an example of a dataset
with four people. Classes are distinguished by different shapes, where the circle is the protected class. (a) The data is spread over locations. (b) The
protection method changes data features. (c) The feature distribution of video is more concentrated than individual images.

III. ALGORITHM DESIGN
This section describes the threat model and assumptions, as shown in Fig. 2. The method for locating key positions of the face is then demonstrated. By combining the consideration of visual effects and privacy protection, we formulate the loss function. We then go over how to re-generate the cloaks using adjacent frames.

A. THREAT MODEL AND ASSUMPTIONS
In this section, we present the threat model and assumptions for both users and trackers, as shown in Fig. 1. Then we analyse the intermediate results of face recognition models, which are called intermediate features. We follow existing work's assumptions about computing power, where users can obtain intermediate features [13], [14].

Users: Users' goal is to share their images or videos on social platforms while preventing facial recognition by unauthorized facial recognition trackers. In addition, changes of visual effects should be constrained, to maintain regular usage [14], [34], [35]. Therefore, users should apply data pre-processing for protection before uploading. The pre-processed data can lead trackers to train faulty models that fail to recognize user faces. Unfortunately, such pre-processing is computationally expensive, especially for videos, which brings high computation overhead for users. Therefore, we have the following design goals for the protection method.
• The protection method should not affect the visual effects of images or videos.
• The malicious models trained on the pre-processed data cannot recognize the user's face.
• The computational overhead of the protection method should be acceptable for users.
Tracker: We assume that trackers have sufficient computing power. They can access large data sets or use pre-trained feature extractors through transfer learning. The intermediate features of the faces in the data set are located in the same high-dimensional space, which is called the feature space. As a result of the large amount of data in the feature space, we have the chance to confuse our own data with other data. An attacker that only identifies a single user is out of this paper's scope. At the same time, we mainly consider the case where a social platform is the primary source of personal data, although, in reality, a user's data might also be leaked from other sources. If the user provides enough pre-processed data, the effect of real data provided by other sources is negligible.

Feature Space: We explain the intuition of the protection method using an example with four classes in the feature space, as shown in Fig. 3. Four users' data are mapped to different locations in the feature space. Note that the feature space has multiple dimensions; for visual purposes, we present a two-dimensional one here. The same user's data is distributed across nearby regions as a result of their similar features. The orange circle user pre-processes his data using the protection method. To obfuscate the malicious model, the protection method moves the user's data to the location of another user, called the target class. In our example, the user data is moved closer to the red triangle class. Due to the large variation of features, even if the tracker selects a different model, it typically will not assign the orange circle data to its real class.

Moreover, the relationship between the data belonging to the same user must be considered. In fact, the identical person's data are not entirely mapped to the same location in the feature space. Weather, angle, and camera equipment can also have an impact on the intermediate features of the same person. In a given video, these outside variables tend not to change drastically. As a result, compared to individual images, the intermediate features of video frames are more closely distributed, as shown in Fig. 3(c).
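The intuition of Fig. 3 can be phrased in code: a cloak succeeds when the cloaked frame's intermediate feature moves away from the user's own class and toward a chosen target class. The extractor and centroid variables below are placeholders for a pre-trained model and pre-computed class means, i.e., assumptions made only for illustration.

import numpy as np

def class_shift(feature_extractor, frame, cloaked_frame, own_centroid, target_centroid):
    """Measure how far cloaking moves a frame in feature space.

    feature_extractor maps an image to an intermediate feature vector;
    own_centroid / target_centroid are mean features of the user's own
    class and of the target class (Fig. 3(b)).
    """
    f_orig = feature_extractor(frame)
    f_cloak = feature_extractor(cloaked_frame)
    return {
        "dist_to_own_before": float(np.linalg.norm(f_orig - own_centroid)),
        "dist_to_own_after": float(np.linalg.norm(f_cloak - own_centroid)),
        "dist_to_target_after": float(np.linalg.norm(f_cloak - target_centroid)),
    }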



FIGURE 4. Video image cloaks are related to one another. The key positions are tracked by the model for position detection. Based on key positions, we
apply an affine transformation method for cloaks.
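The key positions shown in Fig. 4 (eye centers and mouth corners) can also be obtained with an off-the-shelf landmark detector. The sketch below uses the mtcnn Python package purely as an assumed stand-in; the positioning model described in the next subsection is a cascaded CNN in the same spirit as MTCNN [15], not this package.

# Sketch only: assumes `pip install mtcnn opencv-python` and BGR frames from OpenCV.
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def key_positions(frame_bgr):
    """Return the face box and landmark dictionary of the first detected face, or None."""
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    faces = detector.detect_faces(frame_rgb)
    if not faces:
        return None
    face = faces[0]
    # 'keypoints' holds left_eye, right_eye, nose, mouth_left, mouth_right.
    return face["box"], face["keypoints"]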

B. PROTECTION PROCESS
1) LOCATING FACE POSITIONS
When pre-processing video data, the video is first divided into frames. Inspired by MTCNN [15], we use a positioning model composed of a cascade of three deep convolutional neural networks (CNN) to locate faces and their landmarks, as shown in Fig. 4(b). First, using a shallow CNN, the positioning model quickly generates candidate windows, each of which has a chance of containing the face. Second, a more sophisticated CNN refines the candidate windows and filters out many of them. Finally, a CNN outputs the face location and refines the results. Note that the positioning model specifically outputs the location of the face together with the positions of the eyes and mouth, called key positions. Later, in the cloak generation section, we will detail how to identify the relationship between adjacent cloaks through key positions.

2) JOINT OPTIMIZATION OF VISUAL EFFECTS AND PRIVACY PROTECTION
The basic aspect of our protection method is to superimpose the faces with a pixel matrix, called a cloak. The cloak influences the frame in both the visual effect and the position of its intermediate features. We measure the visual effect changes using the Structure Similarity Metric (SSIM) [14], [31], [36]. SSIM assesses how similar two frames are, utilizing brightness, contrast, and structure as three different dimensions as follows:

    l(x, y) = (2 μ_x μ_y + C1) / (μ_x² + μ_y² + C1),
    c(x, y) = (2 σ_x σ_y + C2) / (σ_x² + σ_y² + C2),
    s(x, y) = (1/√(N−1)) · ((x − μ_x)/σ_x) · (1/√(N−1)) · ((y − μ_y)/σ_y).    (2)

where μ_x and μ_y represent the means of x and y, respectively. C1 and C2 are divide-by-zero protections, which are constants proportional to the value range of the images. σ_x and σ_y stand for the variances of x and y, respectively.
Then the SSIM can be obtained by multiplying the above brightness, contrast, and structure similarities as follows:

    SSIM(x, y) = l(x, y) · c(x, y) · s(x, y)
               = [(2 μ_x μ_y + C1)(2 σ_xy + C2)] / [(μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2)].    (3)

Note that the SSIM [14] value is proportional to the similarity between two frames.

While using SSIM to measure visual effects, we use the changes in intermediate features to reflect the protection effects. We train a CNN facial recognition model to extract the intermediate features of the frame. The frame's dimensionality can be swiftly decreased using the convolutional layer. We basically need to intercept the convolutional layer's output as an intermediary feature in our proposed protection method. It is straightforward for us to acquire the Minkowski Distance [37] from the intermediate features in the lower dimensions, called the feature distance. The output of the convolutional layer [38] at position (k, x, y) is computed over the corresponding region of inputs according to the following equation:

    F(x)_(k,x,y) = Σ_c^C Σ_m^{F_w} Σ_n^{F_h} W_(c,m,n,k) · In_(c,x+m,y+n) + b_j,
        0 ≤ k ≤ K,  0 ≤ x ≤ I_w − F_w,  0 ≤ y ≤ I_h − F_h,
    D_t = ( Σ_c^C Σ_m^{F_w} Σ_n^{F_h} |F(x_a)_(m,n,k) − F(x_b)_(m,n,k)|^p )^{1/p}.    (4)

where the convolution layer is defined by the weights W_(c,m,n,k) of height F_h, width F_w and channel C in the network. The convolution layer scans the space of the inputs with height I_h and width I_w. p is the order of the Minkowski Distance. The convolutional layer's parameters are frozen and no longer changed after the model has converged. We can map the frames to their positions in the feature space using the frozen recognition model. In addition, simply using a portion of the network for computation leads to lower costs, which can be kept within the user's tolerance range [11], [13], [14].

To sum up, the optimization strategy of the protection method should combine the visual effects and the feature distance. The optimization objective is:

    L_f = − [ D_t(F(x), F(x_m)) + λ1 · D_t(F(x_m), F(x_p)) ],
    L_s = λ · max(|D_s(x, x_m)| − ρ, 0) − λ2 · max(|D_s(x_m, x_p)| − ρ, 0),
    min  L_f − λ3 · L_s.    (5)

where F is the feature extractor that translates the frames to the feature space. λ · max(|D_s(x, x_m)| − ρ, 0) calculates the SSIM term, which is the measure of the visual effects. In order to improve the consistency of the video, we consider the feature distance and the SSIM between the adjacent frame x_p and the recent frame x_m. The image is mistakenly assigned to different categories by shifting its location within the feature space. Therefore, using these pre-processed data as guidelines, malicious face recognition likewise produces inaccurate results. The loss function is directly differentiated against the parameters in the cloak, as shown in Fig. 5. A random pixel matrix, which is the same size as the face, serves as the initialization of a cloak. Using an iterative gradient descent procedure, the cloaks are gradually updated to the optimal value.


FIGURE 5. The facial recognition model is frozen during optimization. The loss function is directly differentiated against the parameters in the cloak. The iterative gradient descent procedure gradually updates the cloaks to the ideal value.

FIGURE 6. We choose a number of continuous frames from a video, then we use SSIM to assess how similar they are. We choose a few other people's individual photos at random for comparison. Continuous videos typically have SSIM values that are higher than single images.

3) EFFICIENT CLOAK GENERATION
There are many similarities between continuous frames in a video, including similar positions in the feature space and similar visual effects. As shown in Fig. 6, we select continuous frames from a video and measure their similarity using SSIM. For comparison, we randomly select images of several people from the well-known VGG2 data set [14], [39]. The SSIM value is often greater than 0.7 in continuous videos. Even individual images of the same person have a low similarity, typically less than 0.5. The average similarity between random images is 0.25.

Fig. 7 presents the optimization process of individual images and of frames in a video. The process of random initialization and optimization of individual images is shown in Fig. 7(a). The central part is the final optimization point. The protection method uses gradient descent to update the parameters of the cloak during training. The iterative processes are depicted in the figure by the arrows. For the computation to be efficient, initialization is a crucial step in the optimization process. The number of iterations is increased by random or improper initialization. However, the traditional optimization process fails to take into account how the video's frames relate to one another, which results in numerous unnecessary iterations, as shown in Fig. 7(b).

To re-generate the cloak for the recent frame, we use the one from the previous frame. After the iteration process of the previous frame, the cloak is reused for the subsequent frame as the initialization, as shown in Fig. 7(c). Ideally, the optimization result of the previous frame is close to that of the next frame. In the actual process, if the motion between the two frames is inconspicuous, this intuitive approach works well. However, the face in the video may have some obvious rotation, which degrades the performance of the above method. Therefore, we propose a matrix affine transformation method to track the face rotation, which will be explained in detail in the following subsection.

4) RELATIONSHIP BETWEEN CONTINUOUS FRAMES
We re-generate the cloak based on the prior one using an affine transformation method [38]. The key positions, such as the corners of the mouth and the eyes, are first located in the continuous frames. Additional computational expense can be avoided by incorporating the task of finding key positions within the positioning model. As a result, we create the affine transformation matrix as follows:

    f(P)f(Q)→ = φ(PQ→),
    [y; 1] = [A | b; 0 ⋯ 0 1] · [x; 1]    (6)

where P, Q are the points before the affine transformation in f(P)f(Q)→. y is selected from the key positions. A and b are the affine matrices that determine the change of each point.

The corners of the mouth and the centers of the eyes are the key positions that the affine transformation method uses to identify the plane of the face in the recent frame. Once the affine matrices have been established, we can utilize the affine transformation method as given in (6). The pixels of the frame are affined in this manner [38]. The areas where the pixels in the new frame do not match those in the old frame are filled using area interpolation. The cloaks are created to match the updated frames following the affine transformation method. Therefore, the optimization process is accelerated.



FIGURE 7. The optimization process of the image and video data is visually demonstrated. The optimization target of each image is represented as the central point, respectively. (a) Two images are protected independently. (b) The objects in consecutive frames of a video have high similarity. (c) The cloaks in the video frames are reused.
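The difference between Fig. 7(a) and Fig. 7(c) reduces to the choice of initialization before the iterative optimization. A small sketch, reusing the hypothetical transfer_cloak helper from the previous sketch, could look as follows.

def init_cloak_for_frame(prev_cloak, prev_keypts, cur_keypts, face_shape, rng):
    """Choose the starting cloak for the current frame.

    Random initialization (Fig. 7(a)) is only needed for the first frame;
    afterwards the previous cloak, affine-aligned to the current key
    positions, is reused as a warm start (Fig. 7(c)).
    """
    if prev_cloak is None:
        return rng.uniform(-0.01, 0.01, size=face_shape).astype("float32")
    return transfer_cloak(prev_cloak, prev_keypts, cur_keypts, face_shape[:2])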

Algorithm 1: Privacy Protection.
1: Input Video
2: for images r = 1, . . ., m do
3:   Use the positioning model to locate the face in the image
4:   Store the key positions generated by the positioning model
5:   if the Cloak of r − 1 exists then
6:     Affine the Cloak of r − 1 by the key positions
7:   else
8:     Randomly initialize the Cloak
9:   end if
10:  while loss ≥ threshold do
11:    Iteratively optimize the Cloak
12:  end while
13:  Output protected images
14: end for
15: Restore the images to video
16: Output protected video

FIGURE 8. The affine approach uses three key positions to establish the face's plane in the most recent frame. We employ the affine transformation method. The cloak's pixels are affined in this manner. When the pixels in the new frame do not match those in the old frame, area interpolation is employed to fill those gaps. The cloaks are created to match the updated frames following the affine procedure.

FIGURE 9. Relation metrics of frames in videos.

5) VIDEO PROTECTION PROCEDURE
Frames are extracted from the input video at the beginning of the protection method in Algorithm 1. The pixel matrix of the face is then cut out once the positioning model has identified the key positions on each frame, as shown in Fig. 8. During this procedure, the key positions of each image are noted. The protection mechanism randomly initializes the cloak's parameters when dealing with the first frame. Otherwise, the affine transformation matrix is generated by extracting the key positions from the preceding and most recent frames. The loss function in this protection method is made up of the visual effects and the feature distance. The cloak's parameters are iteratively updated by the optimization function until the loss function is below the threshold, as shown in Algorithm 1. Each frame is protected and ultimately converted back to a video in the manner described above.
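A sketch of the whole procedure of Algorithm 1, wiring together the hypothetical helpers from the earlier sketches (key_positions, init_cloak_for_frame, optimize_cloak, apply_cloak); the OpenCV I/O calls, the codec, and the mapping to Algorithm 1's line numbers in the comments are illustrative assumptions about one possible implementation.

import cv2
import numpy as np

def protect_video(in_path, out_path, extractor, target_feat):
    """Frame-by-frame protection following Algorithm 1 (sketch)."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    rng = np.random.default_rng(0)
    prev_cloak, prev_keypts = None, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        located = key_positions(frame)                 # Algorithm 1, lines 3-4
        if located is not None:
            (x, y, bw, bh), kp = located
            pts = [kp["left_eye"], kp["right_eye"], kp["mouth_left"]]
            face = frame[y:y + bh, x:x + bw].astype(np.float32) / 255.0
            cloak = init_cloak_for_frame(prev_cloak, prev_keypts, pts,
                                         face.shape, rng)          # lines 5-9
            cloak = optimize_cloak(extractor, face, target_feat, cloak)  # lines 10-12
            protected = apply_cloak(face, cloak)
            frame[y:y + bh, x:x + bw] = np.uint8(protected * 255.0)      # line 13
            prev_cloak, prev_keypts = cloak, pts
        writer.write(frame)
    cap.release()
    writer.release()                                   # lines 15-16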


FIGURE 10. Evaluation on short videos.

FIGURE 11. Evaluation on long videos.

IV. EVALUATION
We conduct extensive experiments to evaluate the performance of the proposed method. This section evaluates the overhead of the protection method for videos. Moreover, some variants of the data are chosen for comparison. Finally, we analyze three loss function indexes to reflect the visual effect and the protection effect.

A. ENVIRONMENT
We deploy our experiment on TensorFlow [39], which runs with a 4 × 2.8 GHz Intel Core i7 CPU, 8 GB of memory and an Nvidia GeForce RTX 3060 GPU. After careful consideration of the experimental scale, we choose the well-known British original drama data for protection [14]. We deploy the well-performing protection system [14] and list the respective calculations. Each configuration is arranged according to previous work [12], [36].

TABLE 1. Protection Success Rate in Face Recognition Platforms

Metrics: Fawkes simultaneously optimizes two different modifications: users' visual effects are unaffected, while the attacker's training model may be misled by the pre-processed videos into misclassifying the real user data. As a result, there are two separate indicators to measure these objectives. The visual differences between videos taken before and after protection are represented by SSIM. A lower value means that the protection method has less of an influence on the image or video's routine use. The protective effect measurement is the other component. We continue to use the feature extractor of previous work [14]. Through the pre-trained model, the feature extractor compares the feature distance between two frames. The method's protective impact is generally shown by a noticeable feature change. The total loss function, which is inversely proportional to the space loss and the square of the input loss, is summarized at the end. In addition, our method can achieve the same image protection efficiency as Fawkes, as shown in Table 1.

Hyper-parameters: Our approach includes several hyper-parameters. We undertake a thorough analysis to determine the ideal hyper-parameter settings in order to improve the performance of the computation reuse approach. The selection of hyper-parameters mainly refers to previous experimental results [14], [39] and the results obtained in actual use. The first of the hyper-parameters is the number of iterations. Each cloak is initialized, either randomly in the conventional method or through the computation reuse process by mapping the cloak from the previous frame. The cloaks are iteratively optimized after initialization, with a maximum of 40 iterations set for each image. Typically, the iterations finish inside this upper limit. On the other hand, we establish a threshold and use it to gauge calculation speed throughout the comparison process. When the rate of optimization within the threshold accelerates, the aforementioned number of iterations can be reduced to boost the optimization rate.

B. EVALUATION
We demonstrate the effects of the traditional method and the computation reuse method on the data's visual effects in Fig. 9(a). Compared to the traditional method, our method keeps the similarity between frames closer to that of the original video. As shown in Fig. 9(a), both methods have an impact on the feature continuity of the original video, but our method is considerably more similar to the original video.
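For reference, a sketch of how the two indicators described above (SSIM for visual change, feature distance for protection strength) could be computed for a pair of frames, again assuming tf.image.ssim and a generic frozen extractor rather than the paper's exact implementations:

import tensorflow as tf

def frame_metrics(extractor, original, protected):
    """Return (SSIM, feature distance) between an original and a protected frame.

    Images are float32 tensors in [0, 1] with shape (H, W, 3). A high SSIM
    means little visible change; a large feature distance means the frame
    has moved far in the recognition model's feature space.
    """
    o = original[tf.newaxis, ...]
    p = protected[tf.newaxis, ...]
    ssim = tf.image.ssim(o, p, max_val=1.0)[0]
    dist = tf.norm(extractor(o) - extractor(p))
    return float(ssim), float(dist)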



The values of the loss metrics are shown in Fig. 10. We contrast the video processing outcomes of two examples based on the duration of the videos: the first is for seconds-long videos, while the second is for minutes-long ones. Firstly, the general situation of picture protection is observed. In the initial iteration of the process, the loss function is negligible. Since the cloak's value is so small, it has no discernible visual effect on the frames; on the other side, it does not really affect the feature space much. Adadelta is the optimization function we select [37]. So that the loss function can increase quickly at roughly 2-5 iterations, this optimization technique accumulates and updates in the direction of the first iterations. Adadelta can prevent the initial optimization process from staying at a local optimum. Firstly, observe the three indicators for the short video shown in Fig. 10(a). For the input space loss shown in Fig. 10(b), due to random initialization, the traditional method suffers a minor loss at the beginning. It is quickly optimized, nevertheless, bringing about several alterations. After this optimization, the loss stays at a similar level and varies until around 30 iterations have passed. The feature space loss presents the same trend in Fig. 10(c), but in the end the loss has undergone more notable alterations than with computation reuse. Unpredictability before and after the frames, of course, could influence visual effects. Based on the optimization loss results above, computation reuse can reach the threshold in fewer iterations, saving 33.4% of the computational costs.

The outcomes for long videos are comparable to those for short videos, as shown in Fig. 11. Finding optimal values is made easier by using computation reuse. Moreover, the benefits are more obvious than with short videos, since similar frames are more common during the calculation process: the previous calculation results are comparatively stable and can be utilized immediately for calculating the next frame. The final experimental result, however, continued to use the maximum value of iterations as the standard rather than the average result, since we did not consciously enhance the judgment of the similarity of the number of related frames in order to lessen the computing burden. It is important to note that the iterative option uses the iteration setting rather than the loss function's threshold value, because the loss function can occasionally oscillate above and below the threshold value, altering the outcome of the protection.

V. CONCLUSION
Pre-processing of images containing personal data is crucial for maintaining DL privacy. We present a video study and an empirical analysis of the affine relations between cloaks. We also consider the limitations of earlier protection methods, which are limited to individual image protection. As a result, the protection procedure is improved by the affine technique. We adapt the cloak to the ongoing frames using the key positions from the positioning model. Our evaluation shows that our method works much better than the conventional ways; for instance, 35.5% of the computation consumption is saved.

REFERENCES
[1] M. Abadi et al., "Deep learning with differential privacy," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 308–318.
[2] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proc. 10th ACM Workshop Artif. Intell. Secur., 2017, pp. 3–14.
[3] D. Ambra et al., "Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks," in Proc. 28th USENIX Conf. Secur. Symp., 2019, pp. 321–338.
[4] N. Aaron et al., "Level playing field for million scale face recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 7044–7053.
[5] T. Li and L. Lin, "AnonymousNet: Natural face de-identification with measurable privacy," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2019, pp. 56–65.
[6] M. Fredrikson, S. Jha, and T. Ristenpart, "Model inversion attacks that exploit confidence information and basic countermeasures," in Proc. 22nd ACM SIGSAC Conf. Comput. Commun. Secur., 2015, pp. 1322–1333.
[7] S. T. Jan, J. Messou, Y.-C. Lin, J.-B. Huang, and G. Wang, "Connecting the digital and physical world: Improving the robustness of adversarial attacks," in Proc. 33rd AAAI Conf. Artif. Intell. 31st Innov. Appl. Artif. Intell. Conf. 9th AAAI Symp. Educ. Adv. Artif. Intell., 2019, Art. no. 119.
[8] Q. Sun, A. Tewari, W. Xu, M. Fritz, C. Theobalt, and B. Schiele, "A hybrid model for identity obfuscation by face replacement," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 553–569.
[9] Y. Wu, F. Yang, Y. Xu, and H. Ling, "Privacy-protective-GAN for privacy preserving face de-identification," J. Comput. Sci. Technol., vol. 34, no. 1, pp. 47–60, 2019.
[10] S. Thys, W. V. Ranst, and T. Goedemé, "Fooling automated surveillance cameras: Adversarial patches to attack person detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 49–55.
[11] Z. Wu, S.-N. Lim, L. S. Davis, and T. Goldstein, "Making an invisibility cloak: Real world adversarial attacks on object detectors," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 1–17.
[12] A. Shafahi et al., "Poison frogs! Targeted clean-label poisoning attacks on neural networks," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 6103–6113.
[13] C. Zhu, W. R. Huang, H. Li, G. Taylor, C. Studer, and T. Goldstein, "Transferable clean-label poisoning attacks on deep neural nets," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 7614–7623.
[14] S. Shan, E. Wenger, J. Zhang, H. Li, H. Zheng, and B. Y. Zhao, "Fawkes: Protecting privacy against unauthorized deep learning models," in Proc. 29th USENIX Conf. Secur. Symp., 2020, Art. no. 90.
[15] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499–1503, Oct. 2016.
[16] Y. Mirsky and W. Lee, "The creation and detection of deepfakes: A survey," ACM Comput. Surv., vol. 54, no. 1, pp. 1–41, 2021.
[17] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proc. IEEE Int. Conf. Comput. Vis., 2016, pp. 2414–2423.
[18] H. Wu et al., "PECAM: Privacy-enhanced video streaming and analytics via securely-reversible transformation," in Proc. 27th Annu. Int. Conf. Mobile Comput. Netw., 2021, pp. 229–241.
[19] O. Gafni, L. Wolf, and Y. Taigman, "Live face de-identification in video," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2019, pp. 9377–9386.
[20] F. Mo et al., "DarkneTZ: Towards model privacy at the edge using trusted execution environments," in Proc. 18th Int. Conf. Mobile Syst. Appl. Serv., 2020, pp. 161–174.
[21] M. Xu, X. Zhang, Y. Liu, G. Huang, X. Liu, and F. X. Lin, "Approximate query service on autonomous IoT cameras," in Proc. 18th Int. Conf. Mobile Syst. Appl. Serv., 2020, pp. 191–205.
[22] C. Song, T. Ristenpart, and V. Shmatikov, "Machine learning models that remember too much," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 587–601.
[23] N. Carlini et al., "Towards evaluating the robustness of neural networks," in Proc. IEEE Symp. Secur. Privacy, 2017, pp. 39–47.
[24] C. Dwork, "Differential privacy: A survey of results," in Proc. Int. Conf. Theory Appl. Models Comput., 2008, pp. 1–19.


[25] Z. Yang et al., "Neural network inversion in adversarial setting via background knowledge alignment," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2019, pp. 225–240.
[26] N. Johnson, J. P. Near, and D. Song, "Towards practical differential privacy for SQL queries," Proc. VLDB Endowment, vol. 11, no. 5, pp. 526–539, 2018.
[27] F. D. McSherry, "Privacy integrated queries: An extensible platform for privacy-preserving data analysis," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2009, pp. 19–30.
[28] F. Bastani et al., "MIRIS: Fast object track queries in video," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2020, pp. 1907–1921.
[29] Z. Cai, M. Saberian, and N. Vasconcelos, "Learning complexity-aware cascades for deep pedestrian detection," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 3361–3369.
[30] F. Cangialosi, N. Agarwal, V. Arun, S. Narayana, A. Sarwate, and R. Netravali, "Privid: Practical, privacy-preserving video analytics queries," in Proc. 19th USENIX Symp. Netw. Syst. Des. Implementation, 2022, pp. 209–228.
[31] J. Steinhardt, P. W. Koh, and P. Liang, "Certified defenses for data poisoning attacks," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 3520–3532.
[32] O. Suciu, R. Marginean, Y. Kaya, H. Daume III, and T. Dumitras, "When does machine learning generalized transferability for evasion and poisoning attacks," in Proc. USENIX Conf. Secur. Symp., 2018, pp. 1299–1316.
[33] Y. Wu et al., "DeltaGrad: Rapid retraining of machine learning models," in Proc. 37th Int. Conf. Mach. Learn., 2020, pp. 10355–10366.
[34] B. Wang et al., "Neural cleanse: Identifying and mitigating backdoor attacks in neural networks," in Proc. IEEE Symp. Secur. Privacy, 2019, pp. 707–723.
[35] B. Wang, Y. Yao, B. Viswanath, H. Zheng, and B. Y. Zhao, "With great training comes great vulnerability: Practical attacks against transfer learning," in Proc. 27th USENIX Conf. Secur. Symp., 2018, pp. 1281–1297.
[36] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, "Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 1528–1540.
[37] O. Suciu et al., "When does machine learning fall? Generalized transferability for evasion and poisoning attacks," in Proc. 27th USENIX Conf. Secur. Symp., 2018, pp. 1299–1316.
[38] G. Singh, R. Ganvir, M. Püschel, and M. Vechev, "Beyond the single neuron convex barrier for neural network certification," in Proc. 33rd Int. Conf. Neural Inf. Process. Syst., 2019, Art. no. 1352.
[39] M. Abadi et al., "TensorFlow: A system for large-scale machine learning," in Proc. 12th USENIX Conf. Operating Syst. Des. Implementation, 2016, pp. 265–283.

ENTING GUO received the master's degree with the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2020. He is currently working toward the Ph.D. degree from the School of the Division of Computer Science, University of Aizu, Aizuwakamatsu, Japan. His research interests include AI systems, and AI security and privacy.

PENG LI (Senior Member, IEEE) received the B.S. degree from the Huazhong University of Science and Technology, Wuhan, China, in 2007, and the M.S. and Ph.D. degrees from the University of Aizu, Aizuwakamatsu, Japan, in 2009 and 2012, respectively. He is currently a Senior Associate Professor with the University of Aizu. He has authored or coauthored more than 100 papers in major conferences and journals. His research interests mainly include cloud/edge computing, Internet-of-Things, distributed AI systems, and AI security and privacy. He was the recipient of the Young Author Award of IEEE Computer Society Japan Chapter in 2014, Best Paper Award of IEEE TrustCom 2016, and Best Paper Award of IEEE Communication Society Big Data Technical Committee in 2019. He supervised students to win the First Prize of IEEE ComSoc Student Competition in 2016. Dr. Li was also the recipient of the 2020 Best Paper Award of IEEE Transactions on Computers. Dr. Li is the Editor of IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, and IEICE Transactions on Communications.

SHUI YU (Senior Member, IEEE) received the Ph.D. degree from Deakin University, Burwood, VIC, Australia, in 2004. He currently is a Professor with the School of Computer Science, University of Technology Sydney, Ultimo, NSW, Australia. He initiated the research field of networking for big data in 2013, and his research outputs have been widely adopted by industrial systems, such as Amazon cloud security. He has authored or coauthored four monographs and edited two books, more than 500 technical papers, including top journals and top conferences, such as IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, IEEE TRANSACTIONS ON COMPUTERS, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, IEEE TRANSACTIONS ON MOBILE COMPUTING, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, IEEE/ACM TRANSACTIONS ON NETWORKING, and INFOCOM. His research interests include Big Data, security and privacy, networking, and mathematical modeling. His h-index is 66. He is currently serving a number of prestigious editorial boards, including IEEE COMMUNICATIONS SURVEYS AND TUTORIALS as an Area Editor, IEEE Communications Magazine, IEEE INTERNET OF THINGS JOURNAL, and so on. He was a Distinguished Lecturer of IEEE Communications Society (2018–2021). He is a Distinguished Visitor of IEEE Computer Society, a Voting Member of IEEE ComSoc Educational Services Board, and an Elected Member of Board of Governor of IEEE Vehicular Technology Society.

HAO WANG (Senior Member, IEEE) is currently an Associate Professor and the Head of the Big Data Laboratory, Department of ICT and Natural Sciences, Norwegian University of Science and Technology, Trondheim, Norway. He was a Researcher with IBM Canada, McMaster, and St. Francis Xavier University, Antigonish, NS, Canada, before he moved to Norway. His research interests include Big Data analytics and industrial Internet of Things, high-performance computing, safety-critical systems, and communication security. He has authored more than 60 papers in the IEEE TVT, GlobalCom 2016, Sensors, the IEEE Design & Test, and Computer Communications. He is a Member of the IEEE IES Technical Committee on Industrial Informatics. He was a TPC Co-Chair of the IEEE DataCom 2015, IEEE CIT 2017, and ES 2017, and a Reviewer of journals, such as the IEEE TKDE, TBD, TETC, T-IFS, and ACM TOMM.

