Scientific Programming
Volume 2022, Article ID 2508690, 15 pages
https://doi.org/10.1155/2022/2508690
Research Article
Evaluating the Privacy Policy of Android Apps: A Privacy Policy
Compliance Study for Popular Apps in China and Europe
Kaijun Liu¹,² Guoai Xu¹,² Xiaomei Zhang³ Guosheng Xu¹,² and Zhangjie Zhao⁴
¹School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
²National Engineering Research Center of Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
³China Cybersecurity Review Technology and Certification Center, Beijing 100020, China
⁴Beijing Big Data Center, Beijing 100101, China
Received 20 February 2022; Revised 26 June 2022; Accepted 26 July 2022; Published 23 August 2022
Copyright © 2022 Kaijun Liu et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Recently, with the growing market share of the Android system and the sharp increase in the number of Android mobile apps, many countries and regions have successively enacted laws and regulations related to data security. The EU’s GDPR and China’s Information Security Technology-Personal Information Security Specification are two of the most important of these regulations, affecting vast areas and large populations. Both regulations impose requirements on the privacy policies of Android apps. As a result of these requirements, however, apps’ privacy policies have become longer. Researchers have studied whether the actual privacy behavior of apps conforms to their privacy policy descriptions but have not focused on the compliance of the privacy policy itself. In this paper, we propose evaluation metrics for privacy policy compliance and evaluate popular apps by analyzing both their privacy policies and the apps themselves. We applied our method to 1,000 apps from the Google Play Store in Europe and 1,000 apps from the Tencent Appstore in China. We detected a number of privacy policy noncompliance issues and discovered a number of privacy issues with third-party services and third-party libraries.
In previous studies, researchers examined the app privacy policy and the actual privacy behavior of the app, and analyzed whether the privacy policy was consistent with the app’s behavior [7]. Researchers also focused on the consent notices displayed by apps after the relevant laws took effect [8]. However, the community lacks research on the compliance of the app’s privacy policy itself. Figure 1 shows an excerpt of the privacy policy of Super Space Cleaner [9], an app with over 5 million downloads on the Google Play Store. Such a disclosure of privacy behavior is clearly not an effective way to show users what kind of data the app needs and how the data will be used. Even if the user agrees to the policy and uses the app, the user has no idea what personal data might be sent or stored. This raises two questions: Are the privacy policies of popular apps GDPR-compliant? Does a noncompliant privacy policy indicate more noncompliant data leakage?
“This information includes your device brand, device model, network status, etc. The device information we collect does not contain any user-sensitive information, such as device ID or any information that can be used to permanently identify the user or device.”
Figure 1: An excerpt of the privacy policy of Super Space Cleaner.
To investigate app privacy policy compliance issues, we conducted a study on the top 1,000 most popular apps each from the Google Play Store in Europe and the Tencent Appstore in China [10]. We tested the privacy policies of popular Google Play Store apps in Europe for compliance with GDPR, and those of popular Tencent Appstore apps in China for compliance with the Specification. We first identified the privacy policy writing requirements of GDPR and the Specification, then built an automated and scalable pipeline that analyzes the app’s privacy policy against these requirements and compares the app’s declared collection of private data with its actual privacy behavior, and applied it to our dataset. The related code and sample app list are available at https://github.com/xingyueren-qinmu/PPE_code.
Our research makes the following contributions:
(i) We propose a privacy policy evaluation scheme that assesses the readability, completeness, and accuracy of a privacy policy by analyzing both the policy text and the app.
(ii) We tested 1,000 app samples each from the Google Play Store and Tencent Appstore to understand the privacy policy compliance of current popular apps.
(iii) We analyzed the test results and discussed the presentation format of the apps' privacy policies, the completeness and accuracy of privacy policies, user consent violations, and the privacy behaviors of the third-party services and libraries used by apps.
This article is organized as follows: Section 2 introduces related work and our research objectives; Section 3 introduces the privacy policy requirements in GDPR and the Specification, as well as our perspective on evaluating privacy policies; Section 4 introduces our privacy policy text analysis method; Section 5 presents our app analysis method; Section 6 is an in-depth analysis of our results; Section 7 is the discussion; and Section 8 is the conclusion.
2. Context of Our Work and Research Questions
Our work focused on whether the privacy policies of popular Android apps in the two major markets are complete under the requirements of GDPR and the Specification, and whether a noncompliant privacy policy means that the app has more privacy issues. Is there a strong correlation between the app’s privacy policy and the app’s own privacy practices? In this section, we present related work and our research goals.
2.1. Context of Our Work. With the popularity of the Android operating system and the widespread use of Android apps, many researchers are concerned about the veracity of app privacy policies and whether the actual privacy behavior of an app is consistent with its privacy policy. In 2016, there was still no clear definition of how privacy policies should be written. Yu et al. [11] used NLP to analyze the privacy policies of Android apps sentence by sentence and analyzed the apps’ calls to privacy APIs and their data sending behaviors to determine whether these matched the privacy policies. Slavin et al. [7] focused on the language structure of privacy policies and proposed a framework for detecting violations, based on a privacy policy phrase ontology and a set of mappings from API methods to policy phrases. Wang et al. [12] focused on discovering private data entered through the GUI and proposed a static analysis method for dynamic GUI interfaces that enables large-scale analysis of app interfaces and identification of possible data breaches. Kununka et al. [13] conducted a detailed study on a small number of apps and found that the privacy behavior and privacy policies of the target samples were inconsistent. However, because of the small number of samples analyzed, their results may suffer from sampling bias. Yu et al. [14] designed a tool that comprehensively analyzes privacy policies, app bytecode, app advertisement descriptions, and app permissions to discover app privacy violations. In their research, they proposed nine semantic patterns in privacy policies, covering most privacy policy topics. In 2018, Ferrara and Spoto [15] performed automated static analysis of apps to produce customized reports for the four major actors in the GDPR compliance process. Momen et al. [16] examined whether app privacy problems had improved after the introduction of GDPR through a detailed comparison of app permissions before and after. Their research shows that the GDPR has had some normative effect on app privacy behavior. Fan et al.
[17] proposed an automated system to detect GDPR violations of apps and privacy policies by identifying the data practices declared in the app’s privacy policy and the data-related behaviors in the app code. Guamán et al. [18, 19] focused on whether apps violated the relevant provisions of the GDPR when transmitting data across borders. In 2021, Nguyen et al. [8] examined the illegal behavior of apps that collect private data before the user has agreed to the privacy statement. They paid special attention to the data collection behavior of advertising domains and informed the developers of this behavior.
2.2. Research Questions. In this work, we examined the legal compliance of privacy policies for popular Android apps. We first proposed three metrics for analyzing app privacy policy compliance and developed an automated pipeline to implement the analysis method. Then, we performed the tests on apps in China and Europe. Finally, we conducted an in-depth analysis of the detection data. Specifically, our study aims to answer the following research questions:
RQ1: What are the requirements of laws and regulations for the app’s privacy policy? We summarized and compared the privacy policy-related provisions of GDPR and the Specification and distilled a total of 10 requirements.
RQ2: How can we evaluate a privacy policy’s compliance? We proposed three evaluation perspectives for privacy policies: the integrity of the app privacy policy, private data collection before user consent, and the accuracy of the app privacy policy. Then, we designed the corresponding automated detection method and applied it to 1,000 popular apps each from the Google Play Store in Europe and the Tencent Appstore in China.
RQ3: What is the privacy policy compliance of popular apps in the Tencent Appstore (China) and the Google Play Store (Europe)? We compare the privacy policy compliance of popular apps from the two stores from multiple perspectives and discuss in depth the privacy violations by third-party services.
2.3. Data Collection.
Regulation Text. We downloaded the GDPR text from the GDPR official website [3] (updated to 23.05.2018) and the full text of the Specification from the China National Standard Full-Text Open System [20] (updated to 06.03.2020). Our analysis of the regulations in the following sections is based on these two texts.
App Samples. To obtain the popular apps of the Google Play Store in Europe, we crawled the popular app list on Diandian Data [21] from October 2021 for each EU member state and selected the 500 most popular free apps and the 500 most popular paid apps, sorted by the number of downloads. Then, we downloaded these apps using a web crawler. For the Tencent Appstore, we examined the top 100 most downloaded apps in each of 10 categories in October 2021, because the Tencent Appstore provides only per-category rankings rather than an overall ranking. See the app list in our GitHub repository for details. The app study described in the following sections is based on these 2,000 apps.
Privacy Policy Samples. The above 2,000 apps have links to their privacy policies on the corresponding app store pages. We crawled the privacy policies of these apps, and the follow-up studies are based on these samples.
3. Privacy Policy Specification
3.1. Legal Background of Privacy Policy. In GDPR Chapter III, Rights of the Data Subject, Article 13, “Information to be provided where personal data is collected from the data subject,” and Article 14, “Information to be provided where personal data has not been obtained from the data subject,” specify the information that data controllers need to provide to data subjects. Articles 15–22 enumerate the rights of the data subject. From these articles, the GDPR requirements for privacy policy writing can be distilled. In the Specification, clause 5.5, “Personal Information Protection Policy,” sets out the requirements of the personal information protection policy for personal information controllers, and a template of the personal information protection policy is given in the appendix of the Specification.
Using the requirements and templates of GDPR and the Specification, we summarize the specifications for writing privacy policies. The main topics of privacy policies are as follows:
Summary of the Policy. This part includes the scope of products or services to which the personal information protection policy applies, the applicable types of personal information subjects, the effective and update dates, and the policy's table of contents.
How the App Collects and Uses Personal Data? This part needs to list the types of personal information required and the method of collection (such as through permission-protected APIs or through cookies) based on different business functions or interests.
How Personal Data Are Shared, Transferred, and Publicly Disclosed? This section needs to explain the reasons for sharing and transferring personal information, the types of personal information to be shared and transferred, and the types and identities of data recipients. Where appropriate, it also needs to describe the types of and reasons for the public disclosure of personal data when the consent of the data subject is not needed (e.g., information required by a legal authority).
How the Data Controller Protects the User’s Personal Data? This part of the Specification requires the data
controller to explain the security measures protecting the user’s personal data, the personal information security standards currently followed, and the security certifications obtained. The GDPR has no clear requirements for this part.
How the Data Controller Stores the User’s Personal Data? This section explains where and for how long the data controller stores the user’s personal data.
Data Subject Rights. This section needs to inform users of their rights as data subjects, including the rights of access, rectification, erasure, restriction of processing, data portability, and objection, and the rights related to automated individual decision-making.
How the Data Subject Can Lodge a Complaint or Appeal with the Supervisory Authority? This section requires the data controller to publish the complaint and reporting channels and the response time to complaints.
How to Handle Children’s Personal Information? This section needs to explain that the collection of children’s information requires the consent of the guardian.
In addition to the above requirements, the GDPR requires that the privacy policy contain a corresponding legal basis; that is, the policy needs to state that it is written on the basis of the GDPR. In the Specification, complaints and responses are mandatory in privacy policies.
3.2. Explicit Privacy Policy to Users. According to our analysis, the average English privacy policy is more than 3,000 words long, and the average Chinese privacy policy is more than 8,000 words long, which makes them difficult for users to read before using the app. As a result, various laws and regulations require apps to explicitly ask users for consent to ensure that users are aware of the app’s data collection behavior. The GDPR requires consent to be freely given, specific, informed, and unambiguous (Article 4(11)). In recent years, China has issued a number of testing standards and specifications requiring that, after the app is installed and before its basic business functions start, the personal data processing rules be clearly displayed to users through the interactive interface. Therefore, whether an app falls under the GDPR or the Specification, it must seek the user’s consent through a visual interface before it starts to obtain and collect the user’s private data.
Based on the requirements of the two regulations on privacy policies, we divided our evaluation of the privacy policy into three parts:
(1) Evaluation of the Integrity of the Privacy Policy. This part examines whether the content of the app privacy policy is complete, that is, whether it contains the topics required by law. Section 4 presents our specific approach.
(2) Evaluation of Asking for Consent for Collecting Privacy Data. This part examines whether the privacy rules are clearly presented in the app and whether data collection starts only after the user agrees. Section 5 introduces this part in detail.
(3) Evaluation of the Accuracy of the Privacy Policy. This part examines whether the content of the privacy policy is consistent with the actual privacy behavior of the app. The specific analysis method is introduced in Section 5.
4. Privacy Policy Integrity
In this section, we introduce the privacy policy integrity analysis method. Privacy policy texts are expressed in various ways, which makes analysis difficult. For efficiency, and because privacy policies are typically organized under paragraph subtitles, we chose to analyze the topics of the privacy policy. We considered a privacy policy to be complete if it addressed all the topics covered by the relevant regulations.
4.1. Privacy Policy Text Preprocessing. We crawled the privacy policies from the app stores. Because the structure of privacy policy pages varies widely, we used the URL2io service [22] to extract the body of each web page. URL2io selects the nodes of the page source tree that may contain body text and uses a machine-learning-based scoring algorithm to predict the text nodes. Of the 2,000 privacy policies, we obtained 1,834 privacy policy texts using URL2io and 62 manually; the remaining 104 were either linked incorrectly or did not exist in text form.
To judge the topic integrity of a privacy policy, we classified the titles in the policy text. We did not extract topics from the body paragraphs because developers formulate privacy policies differently, and without standard datasets for training classification models, it is difficult to accurately extract the content of paragraphs. In contrast, analyzing titles that consist of only one sentence is much easier. We first used regular expressions to extract the titles, matching format markers such as # and ∗ along with paragraph numbers, then built a text tree according to title number and title level, and passed it to the privacy policy integrity analysis module.
4.2. Method of Privacy Policy Integrity Analysis. Figure 2 shows the workflow of integrity analysis. First, we manually extracted some of the titles in privacy policies to construct a topic-candidate title (T-CT) table. Then, we used Bert [23] to calculate the features of the candidate titles (CTs) and of the privacy policy titles (PPTs) to be tested. Next, we used the cosine similarity algorithm to calculate the similarity between each PPT and CT. Finally, we determined the similarity threshold through several experiments to complete the privacy policy integrity analysis.
Figure 2: Workflow of privacy policy subject integrity analysis. (The Bert model encodes the candidate titles CT1…CTn of each topic into features CTF1…CTFn and the privacy policy titles PPT1…PPTn of each policy into features PPTF1…PPTFn.)
Topic-Candidate Title Table Building: Both English and Chinese privacy policies express and describe the same topic in diverse ways. Therefore, if only one expression of a topic is used to calculate similarity with the titles in the privacy policy text, false negatives may result; that is, the privacy policy actually contains the topic, but the similarity between the privacy policy title and the topic text is too low because of the different expressions. To avoid this, we created a table of topic-candidate titles. We manually sampled 40 policies from each of the two stores. Our sampling criterion was that each sampled policy met the topic requirements of GDPR and the Specification. To avoid bias, we did not use multiple samples from the same developer, and the samples were evenly distributed by number of downloads (we automatically extracted each app's download count from its store page when obtaining the samples). From each sample, we extracted the title of each topic that met the requirements as a CT. Figure 3 shows possible expressions for the topic “how to collect users’ personal data”; all such expressions make up our T-CT table.
“Information Collection”: [
“How Do We Collect Your Information”,
“Information We Collect About You”,
“What information Do We Receive”,
“The information we collect”,
“Collection of your personal infromation”,
“What we collect”
]
Figure 3: Candidate titles of the topic “how to collect users’ personal data?”
Bert-Based Sentence Feature Calculation (Figure 4). We chose the Bert model for our experiments. Bert is a widely used pretrained language model that understands language well and performs well on downstream tasks after fine-tuning on a small amount of data. Bert is built on the Transformer's bidirectional attention mechanism and is pretrained with two self-supervised tasks, Masked Language Model and Next Sentence Prediction. The network has three parts: the embedding layer, the encoder layer, and the pooler layer. The embedding layer combines three parts: word embedding, segment embedding, and position embedding. The encoder layer uses the bidirectional attention mechanism of the Transformer; BERT-Base has 12 encoder layers (BERT-Large has 24), each consisting of an attention network and a feed-forward fully connected network. The pooler layer condenses the output of the encoder layer into the final result vector. In our study, for Chinese privacy policies, we use the BERT-Large Chinese model proposed by Cui et al. [24], which has a vocabulary of 21,128 tokens, a hidden size of 1024, 24 encoder layers, and 330 M parameters. For English privacy policies, we use the BERT-Large English model provided by Turc et al. [25], which has a vocabulary of 30,522 tokens, a hidden size of 1024, 24 encoder layers, and 340 M parameters.
Figure 4: Bert-based sentence feature calculation (Tokenizer followed by 24 encoder layers with 1024-dimensional outputs).
The feature calculation proceeds as follows. First, we use the Bert model to compute the feature vector of each CT in the T-CT table in turn; specifically, we average the output embeddings of the words in the title. Then, for each privacy policy to be tested, we extract the titles in the text as PPTs and compute their feature vectors using the same method. The feature vectors of all CTs and PPTs are stored in a library and used for subsequent similarity calculations.
Similarity Calculation. We computed the semantic features of all CTs and PPTs using the Bert model. Here, we chose the cosine similarity algorithm to calculate the similarity between two feature vectors. Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. Because it depends only on the direction of the vectors and not on their magnitude, it is widely used in classification work. For the output of the Bert model, we traverse the feature vector of each PPT and use cosine similarity to calculate its similarity to each CT feature vector. Formula (1) gives the similarity of the semantic features of two titles:
$$\mathrm{sim}(CT_i, PPT_j) = \frac{\sum_{k=1}^{1024} W_k(CT_i)\, W_k(PPT_j)}{\sqrt{\sum_{k=1}^{1024} W_k(CT_i)^2}\,\sqrt{\sum_{k=1}^{1024} W_k(PPT_j)^2}} \qquad (1)$$
where CT_i denotes the ith candidate title, PPT_j denotes the jth privacy policy title, and W_k(x) denotes the kth component of the 1024-dimensional feature vector of title x. Once the similarity between two titles exceeds a threshold, we assume that the privacy policy to which the PPT belongs contains the topic to which the CT belongs. After several rounds of tuning, we set the similarity threshold to 0.914 for Chinese and 0.922 for English.
5. App Analysis
In this section, we discuss the two analyses we applied to apps in the Google Play Store and Tencent Appstore. Through dynamic analysis, we detect data collection without consent, and through static analysis of the app, we check the correctness of its privacy policy. We describe our testing methods and processes in Sections 5.1 and 5.2, respectively.
5.1. Data Collection without Consent. Both the GDPR and the Specification require apps to ask users for consent, and that consent must be freely given, specific, informed, and unambiguous. That is, the acquisition and transmission of personal data must only take place after
the user’s active consent (e.g., “click to accept”); otherwise, the app’s privacy policy may be deemed to be in violation.
To detect whether the app’s privacy policy is clearly shown and whether the app contains potential violations, we designed an automatic detection scheme that analyzes the app dynamically. We ran each inspected app on a device (Pixel 1, Android 7.1.2) equipped with the Xposed framework [26] and our own Hook module to analyze the app’s private data acquisition behavior. To identify data sending, we monitored the app’s traffic with mitmproxy [27]; to intercept TLS traffic, we installed our own root certificate on the device as a system certificate. All detection ran in this environment. Figure 5 shows our analysis process. We first installed the app and then executed it automatically, monitoring API calls and network traffic during execution. The specific scheme is as follows.
App Installation. Nguyen et al. [8] granted all permission requests of the apps when installing them, so that the apps could obtain all possible data and any behavior of sending private data could be identified. In this research, we installed each app in two ways: with all permissions withheld and with all permissions granted. Pipeline A simulated a normal usage scenario, in which no permissions were granted during installation; in pipeline B, all permissions were granted to the app during installation.
Automated Execution. We automatically executed each app after installation. When the app runs for the first time, it may display a welcome screen that is not related to the analysis. For this situation, we customized automatic running rules that let the app reach its real starting interface, after which we stopped interfering with the app. At this point, the app may be in one of three states: requesting permissions, displaying the privacy consent interface, or showing the main interface. We used the Android system tool UI Automator [28] to obtain the structural information of the interface for subsequent analysis. Note that the apps did not ask us to pay or create an account during this process, and we were not required to do so.
API Monitoring and Traffic Monitoring. During automated execution, we used the Xposed framework to monitor 41 APIs related to private data and cryptography (for cryptography, javax.crypto.Cipher and the Base64 API). Some of the classes and the related private data are shown in Table 1. The API call parameters, return values, and call stack information were recorded in the log. We used mitmproxy to monitor the request traffic and record the host, parameters, and raw request data in the log. The API logs and traffic logs were used in the subsequent analysis.
API and Traffic Correlation Analysis. We first searched the traffic for private data that can be obtained through APIs, focusing on the data shown in Table 2. These data types come from Appendix A of the Specification, but some of them were not found during the analysis, so the final results cover only the data types reported in Section 6.3. We obtained the relevant values from the test device via the Android Debug Bridge (adb). We traversed the traffic log and performed string matching on the parameters and the raw data of each request. The data used for string matching contain the private data content and the category to which it belongs. Taking “latitude 32.899” as an example, we use various precisions of 32.899 and various spellings of the word “latitude” for string matching. Once there is a hit, the app is considered to have a privacy violation. Many apps send encrypted traffic content, for which simple string matching is ineffective. Our monitoring of cryptographic APIs enabled us to analyze the content of encrypted traffic. Consider the following example:
$$y = \mathrm{AES}(x), \quad z = \mathrm{base64}(y). \qquad (2)$$
When we observe traffic whose raw request is z (or contains z), by traversing the inputs and outputs of the cryptographic APIs, we can get the input y from the Base64 call and then the input x from the AES call, thereby decrypting the encrypted traffic. To handle multiple layers of encryption (such as AES(AES(...AES(x)...))), we keep traversing the cryptographic APIs until no associated input and output content can be found. In previous research, we found that only one of 48 lines of traffic was encrypted twice, so this design has little impact on the analysis efficiency of the system. We tested the above scheme and the cryptography-decryption-based scheme in [29] on the “No. 1 Community” app, analyzing 12 encrypted API hook results and 8 traffic monitoring results. Our method took 0.434 s, whereas the method in [29] took 1.568 s (test environment: macOS, 2.6 GHz 6-core Core i7, 16 GB RAM), making our method about 3.6 times faster.
Identification of Privacy Leakage Behaviors. We divided privacy behaviors into three types: data leakage, potential data leakage, and data acquisition. Obtaining and sending private data before asking for the user’s consent violates the GDPR and the Specification, and we marked it as data leakage. The use of a private network protocol and delayed sending (saving the data in a file and transmitting it over the Internet at some point in the future) was regarded as obtaining private data and encrypting it, so it was marked as potential data leakage. Behaviors that only obtain private data without sending or encrypting it were marked as data acquisition because
Figure 5: Analysis process (components: Automated Execution, API Hook, API Results, Correlation Analysis, Data Leakage Results).
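The title-extraction step of Section 4.1 (matching format markers and paragraph numbers, then building a text tree by title level) can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the two regular expressions, the 80-character length cap, and the helper names `parse_title`, `build_title_tree`, and `iter_titles` are hypothetical, and the `∗` marker and Chinese-style numbering are left out for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical title patterns: "## Title" (Markdown-style '#' markers) and
# numbered titles such as "3. Title", "2.1 Title", or "2.1. Title".
HASH_RE = re.compile(r"^(#{1,6})\s+(.+)$")
NUM_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")
MAX_TITLE_LEN = 80  # assumed: longer lines are treated as body text


@dataclass
class TitleNode:
    level: int
    text: str
    children: list = field(default_factory=list)


def parse_title(line):
    """Return (level, text) if the line looks like a title, else None."""
    line = line.strip()
    match = HASH_RE.match(line)
    if match:
        level, text = len(match.group(1)), match.group(2).strip()
    else:
        match = NUM_RE.match(line)
        if not match:
            return None
        # "2.1" contains one dot, so it is a level-2 title.
        level, text = match.group(1).count(".") + 1, match.group(2).strip()
    return (level, text) if len(text) <= MAX_TITLE_LEN else None


def build_title_tree(lines):
    """Nest titles by level under a synthetic root; body lines are skipped."""
    root = TitleNode(0, "ROOT")
    stack = [root]
    for line in lines:
        parsed = parse_title(line)
        if parsed is None:
            continue
        node = TitleNode(*parsed)
        # Pop until the stack top has a lower level, i.e., is an ancestor.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def iter_titles(node):
    """Yield title texts in document order (pre-order traversal)."""
    for child in node.children:
        yield child.text
        yield from iter_titles(child)
```

For an extracted policy split into lines, `list(iter_titles(build_title_tree(lines)))` gives the candidate PPTs in document order; keeping the tree (rather than a flat list) preserves which subtitles belong to which top-level section.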
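The matching step of Section 4.2, that is, mean-pooled title features compared with Formula (1) and a language-specific threshold, can be sketched in plain Python. The sketch assumes that Bert's per-token output vectors are already available as lists of floats (running an actual Bert checkpoint is out of scope here); the helper names, the T-CT table layout as a dictionary, and the toy 3-dimensional vectors are hypothetical. Only the 0.914 and 0.922 thresholds and the "complete if every topic is covered" rule come from the text.

```python
import math

# Similarity thresholds reported in Section 4.2.
THRESHOLDS = {"zh": 0.914, "en": 0.922}


def mean_pool(token_vectors):
    """Average the per-token output vectors of one title into one feature vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[k] for vec in token_vectors) / n for k in range(dim)]


def cosine_similarity(a, b):
    """Formula (1): dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def covered_topics(tct_features, ppt_features, lang="en"):
    """Topics for which some policy title exceeds the threshold against some candidate title.

    tct_features: {topic: [CT feature vector, ...]}  (hypothetical layout)
    ppt_features: [PPT feature vector, ...] for one privacy policy
    """
    threshold = THRESHOLDS[lang]
    covered = set()
    for topic, ct_vectors in tct_features.items():
        if any(cosine_similarity(ct, ppt) > threshold
               for ct in ct_vectors for ppt in ppt_features):
            covered.add(topic)
    return covered


def is_complete(tct_features, ppt_features, lang="en"):
    """A policy is complete if it covers every topic in the T-CT table."""
    return covered_topics(tct_features, ppt_features, lang) == set(tct_features)
```

Comparing each PPT against every candidate title of a topic, instead of a single canonical phrasing, is what guards against the false negatives described under Topic-Candidate Title Table Building. The sketch does not handle all-zero vectors, for which Formula (1) is undefined.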
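The three screen states listed under Automated Execution (requesting permissions, showing the privacy consent interface, or showing the main interface) could be told apart from a UI Automator hierarchy dump, such as the XML written by `adb shell uiautomator dump`. The following Python sketch is an illustration under stated assumptions: the keyword lists and the `classify_screen` helper are hypothetical, and treating nodes from the `com.android.packageinstaller` package as the runtime-permission dialog is an assumption about Android 7.x devices like the Pixel used in the study.

```python
import xml.etree.ElementTree as ET

# Assumed owner of the runtime-permission dialog on Android 7.x.
PERMISSION_PACKAGES = {"com.android.packageinstaller"}
# Hypothetical keyword lists (English and Chinese) for consent screens.
CONSENT_KEYWORDS = ("privacy policy", "privacy agreement", "隐私政策", "隐私协议")
AGREE_KEYWORDS = ("agree", "accept", "同意")


def classify_screen(dump_xml):
    """Classify a UI Automator dump as 'permission', 'consent', or 'main'."""
    root = ET.fromstring(dump_xml)
    nodes = list(root.iter("node"))
    if any(n.get("package") in PERMISSION_PACKAGES for n in nodes):
        return "permission"
    # Collect visible text and accessibility descriptions of all nodes.
    texts = " ".join((n.get("text") or "") + " " + (n.get("content-desc") or "")
                     for n in nodes).lower()
    if (any(k in texts for k in CONSENT_KEYWORDS)
            and any(k in texts for k in AGREE_KEYWORDS)):
        return "consent"
    return "main"
```

Requiring both a policy keyword and an agree-style keyword is a simple heuristic to avoid labeling a settings page that merely links to the policy as a consent screen; a production rule set would need per-app tuning, as the customized running rules in the text suggest.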
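Recording the host, parameters, and raw request data of each request, as described under API Monitoring and Traffic Monitoring, fits naturally into a small mitmproxy addon. This is a hedged sketch, not the authors' script: the `TrafficLogger` class, the JSON-lines record layout, and the file name are hypothetical. It relies only on mitmproxy's `request(flow)` event hook and on the `pretty_host`, `query`, and `get_content()` members of the request object, and it would be loaded with `mitmdump -s traffic_logger.py`.

```python
import json


class TrafficLogger:
    """mitmproxy addon: log host, query parameters, and raw body of every request."""

    def __init__(self, path=None):
        # Hypothetical JSON-lines output file; None keeps records in memory only.
        self.path = path
        self.records = []

    def request(self, flow):
        # Event hook that mitmproxy invokes once per client request.
        req = flow.request
        record = {
            "host": req.pretty_host,
            "params": dict(req.query),
            "raw": (req.get_content() or b"").decode("utf-8", errors="replace"),
        }
        self.records.append(record)
        if self.path is not None:
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# mitmproxy picks up addon instances from this module-level list.
addons = [TrafficLogger("traffic_log.jsonl")]
```

Decoding the body with `errors="replace"` keeps binary or encrypted payloads loggable as text for later string matching, at the cost of losing their exact bytes; correlating encrypted payloads with hooked crypto calls would need the raw bytes instead.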
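The “latitude 32.899” example from the API and Traffic Correlation Analysis can be sketched as follows. This minimal Python illustration assumes that the private value and its category were read from the device via adb and that each traffic record is a dictionary with `host`, `params`, and `raw` fields; the precision range (truncated and rounded forms from `min_places` decimals up to full precision), the spelling list, and the helper names are hypothetical rather than the authors' exact matching rules.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Hypothetical spelling variants per data category.
CATEGORY_SPELLINGS = {
    "latitude": ["latitude", "Latitude", "lat", "LAT"],
}


def precision_variants(value, min_places=1):
    """Truncated and rounded forms of value, from min_places decimals up to its own precision."""
    d = Decimal(value)
    decimals = -d.as_tuple().exponent
    variants = set()
    for places in range(min_places, decimals + 1):
        step = Decimal(1).scaleb(-places)  # 10 ** -places
        variants.add(str(d.quantize(step, rounding=ROUND_DOWN)))
        variants.add(str(d.quantize(step, rounding=ROUND_HALF_UP)))
    return variants


def find_hits(records, category, value, min_places=2):
    """Return (host, matched value) for records containing a category spelling and a value variant."""
    spellings = CATEGORY_SPELLINGS[category]
    variants = precision_variants(value, min_places)
    hits = []
    for rec in records:
        text = rec["raw"] + " " + " ".join(f"{k}={v}" for k, v in rec["params"].items())
        if not any(s in text for s in spellings):
            continue
        matched = [v for v in variants if v in text]
        if matched:
            # Report the most precise variant; sorting first makes ties deterministic.
            hits.append((rec["host"], max(sorted(matched), key=len)))
    return hits
```

For example, `precision_variants("32.899", 2)` yields `{"32.89", "32.90", "32.899"}`. Plain substring matching is deliberately simple and will over-match (for instance, `lat` inside `platform`); tokenizing parameters before matching would reduce such false positives.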
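The backward traversal behind Eq. (2) can be sketched as follows, assuming each hooked cryptographic call was logged as an (API name, input, output) record. In this hypothetical Python sketch, a toy XOR function stands in for AES only so that the example runs with the standard library; it is not encryption. The log format and the `unwrap` helper are assumptions, not the authors' Xposed log schema.

```python
import base64


def toy_cipher(data: bytes, key: int = 0x5A) -> bytes:
    """Stand-in for AES in this example only: XOR with one byte (NOT encryption)."""
    return bytes(b ^ key for b in data)


def unwrap(traffic: bytes, hook_log):
    """Walk hooked crypto calls backwards from observed traffic to the innermost input.

    hook_log: list of (api_name, input_bytes, output_bytes) records from the API hooks.
    Returns (apis_traversed, innermost_input).
    """
    chain, current, used = [], traffic, set()
    while True:
        # Find an unused call whose output equals or is contained in the current data.
        idx = next((i for i, (_, _, out) in enumerate(hook_log)
                    if i not in used and out and out in current), None)
        if idx is None:
            return chain, current  # no associated input/output left: stop traversing
        used.add(idx)
        api, inp, _ = hook_log[idx]
        chain.append(api)
        current = inp


# Usage: z = base64(toy_cipher(x)) is observed inside a request body.
x = b"lat=32.899"
y = toy_cipher(x)
z = base64.b64encode(y)
log = [("Cipher.doFinal", x, y), ("Base64.encode", y, z)]
apis, plaintext = unwrap(b"data=" + z, log)
```

Here `apis` is `["Base64.encode", "Cipher.doFinal"]` and `plaintext` is `b"lat=32.899"`, which can then be passed to ordinary string matching. The loop handles any number of nested layers, and the `used` set prevents endless cycling if a logged output happens to reappear in its own input.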
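The three labels used under Identification of Privacy Leakage Behaviors can be expressed as a small decision function. The sketch below is hypothetical: the `Observation` fields are assumptions about what the API and traffic logs provide, and applying all three labels only to behavior observed before consent is an assumption drawn from the context of Section 5.1. The rule order follows the prose: sending is checked first, then encryption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What the logs showed for one piece of private data (assumed fields)."""
    obtained: bool        # a hooked privacy API returned the data
    sent: bool            # the data (possibly after unwrapping crypto calls) appeared in traffic
    encrypted: bool       # the data entered a crypto API but was not matched in traffic
    before_consent: bool  # the behavior happened before the user accepted the policy


def classify(obs: Observation) -> Optional[str]:
    """Map an observation to one of the three behavior labels, or None if none applies."""
    if not (obs.obtained and obs.before_consent):
        return None  # assumption: only pre-consent behavior is labeled
    if obs.sent:
        return "data leakage"
    if obs.encrypted:
        return "potential data leakage"  # e.g., private protocol or delayed sending
    return "data acquisition"
```

Keeping the labels mutually exclusive and ordered by severity means a datum that is both encrypted and later found in traffic after unwrapping is reported as data leakage rather than as a potential leak.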
Application
amount
PDF format or image format. Figure 6 shows the results of our analysis. Interestingly, missing privacy policy links and link problems were mostly found in Google Play Store apps. We also noted that the Google Play Store requires that apps' privacy policies not be displayed in PDF format [30], and 24 apps violated this rule.

Figure 6: Reasons for the failure to obtain the privacy policy text. [Bar chart: number of apps ("amount", 0 to 15) for each reason (unavailable, image, pdf) in the Tencent App Store and Google Play.]

Based on the above findings, we randomly sampled apps with each of the three privacy policy presentations. We manually installed 30 apps and viewed their privacy policies: 10 presented in text, 10 as pictures, and 10 as PDFs. We found that only five apps opened the privacy policy page through the default browser, and the privacy policies of these five apps were all presented as text. Among the privacy policies presented as pictures and
PDFs, eight apps had a zoom function on their viewing pages; three apps in PDF format and two apps in image format could not be zoomed on a 5-inch 1920 × 1080 screen, making them hard to read. Interestingly, we found that two apps' privacy policy links point to a PDF file in the app store but to a web page within the app. Since extracting privacy policies from apps is a difficult task, we did not compare the app-store and in-app privacy policy links of all apps.

In general, a privacy policy is best displayed as text. Whether the developer uses Android WebView or the phone's built-in browser to display it, the user can view a text policy clearly by zooming in and out of the page. The text format is also convenient for typesetting, since multiple privacy policies from the same developer can share a unified format, which improves the user's reading experience. In contrast, images and PDFs not only may make the policy harder to read but also cannot guarantee that the displayed privacy policy page will not be tampered with (e.g., through traffic hijacking); they only increase the cost of page tampering to a limited extent.

6.2. App Privacy Policy Completeness and Accuracy. As described in Section 4.1, we performed text integrity analysis on the privacy policy texts of the 1,886 apps whose policies were successfully obtained. The overall conclusion is that the privacy policies of Tencent Appstore apps are more normative than those of Google Play Store apps. Figure 7 compares the integrity of the app privacy policies of the two stores. Note that "Complaints and Responses" is a topic required by the Specification, and "Legal Basis" is a topic required by the GDPR. Figure 7 shows that "how personal information is collected and used" and "the sharing, transfer, and public disclosure of personal information" are covered in 91% of apps' privacy policies in both stores. Since the Specification provides a privacy policy template, the privacy policies of Tencent Appstore apps are relatively complete: the topic with the lowest inclusion rate is "Complaints and Responses," at 77%, and the inclusion rates of all other topics are higher than 88%. In the Google Play Store samples, every topic except "data subjects' rights" and "update of the privacy policy" has a lower inclusion rate than in the Tencent Appstore samples, and the inclusion rates of the topics "information storage," "protection for minors," and "legal basis" are all below 60%. Note that, for topics with an inclusion rate below 70%, we also searched the full text using keyword extraction and string matching, so that content developers did not present as a separate topic would still be counted.

Ninety-two percent of the privacy policies we successfully analyzed included the topics "how data is collected and used" and "how data is shared." Among the privacy policy topics listed in Section 3.1, these two topics are often directly related to privacy data types. Does the existence of the topic mean that the content of the privacy policy is
6.3. Privacy Behavior of the App before Consent Is Obtained.
under the registrable domain names, we used the public suffix list [33] to further resolve these domain names into registrable domains. In the end, 158 apps in the Tencent Appstore contacted 81 PDDs, and 151 apps in the Google Play Store contacted 75 PDDs. The results show that multiple apps sometimes access the same PDD. We speculated that third-party libraries and third-party services cause this phenomenon. We assumed that if a PDD appears in at least five apps and these apps are not all from the same developer, the PDD is a domain accessed by a third-party service or a third-party library. Through this screening, we found that 24 PDDs were the target domains of 426 DLs, accounting for 68% of the cases in which private data was sent without user consent. This result shows that only a very small number of first parties collect private data. Instead, most private data is leaked, or may be leaked, to third parties, because developers rely heavily on third-party services for needs such as personalized push, analytics, and social networking. Figure 9 shows the top five third-party service domains for each of the two stores. Among them, qq.com and branch.io are the domains with the highest number of data breaches in the Tencent Appstore and the Google Play Store in Europe, respectively, and the companies they belong to are both well-known third-party service providers.

6.4. About Third-Party Services. In Section 6.3, we listed the top five third-party service domains for each of the two stores. In Section 3.1, we mentioned that the Specification and the GDPR require apps to state clearly in their privacy policies which third parties they share information with, to explain the types of information shared and the ways in which it is shared, and to list the privacy policies of the third-party services. Our integrity analysis indicated that 88% of the apps that used these services had a privacy policy that said so. Still, data breaches are widespread. We therefore ask the following two questions and try to answer them.

How can users assert their rights when third-party services are involved? When reading and analyzing the developer documents of third-party services, we noticed that their privacy policies all describe how users can manage their own information and exercise the rights of information subjects. However, we found that the difficulty of exercising these rights varies across third-party services. Because QQ open SDK [34] and PayPal [35] have their own clients, their users can manage the information collected by other apps through these services by logging into the corresponding clients. For example, the user can log in to the QQ [36] app, disable the QQ service connected to other apps in the settings, and delete the data. The provider branch gives users a variety of ways to delete data and exit the branch service in its privacy policy and also provides corresponding functions to developers through the SDK, but it cannot control whether the app developer makes the corresponding settings in the app. mParticle [37], to which mparticle.com belongs, recommends that users exercise the rights of data subjects by e-mail. Umeng [38], to which umeng.com and umsns.com belong, recommends that users log in to its official website to process personal data, but we did not find a suitable entry point. At present, then, it is difficult for users to protect their rights against data violations involving third-party services.

Is soliciting user consent for third-party services solely the responsibility of the third-party developer? We analyzed the privacy policies of the top three third-party services for each of the two stores (listed in Figure 9): QQ, Umeng, JPush [39], branch [40], mParticle, and Apptentive [41]. We found that all of these third-party services require developers to list the use of the service in the app's privacy policy, and the user's consent to the app's privacy policy is deemed to be consent to the third-party service's privacy policy. Therefore, once the user agrees to the privacy policy of the app, the third-party service can start collecting or using the user's private data. Among the six third-party services we analyzed, only Umeng, in its own compliance document [42], required developers to configure delayed initialization to ensure that users agreed to the app's privacy policy before the app initialized the Umeng SDK. We therefore concluded that both the app developer and the third-party service are responsible when a third-party service starts to collect the user's private information before the user's consent: the app developers did not adequately delay initialization, and the third-party services did not clearly state the relevant compliance requirements to app developers in documents such as the "Service Access Guide" (the Umeng reminder mentioned above was not in its service access documentation).

6.5. Summary and Recommendations. The above analysis shows that popular apps in the Tencent Appstore perform better than those in the Google Play Store in terms of privacy policy writing and user consent. In terms of user consent solicitation in particular, only 84 of the 1,000 sampled apps in the Google Play Store prompt users to review and agree to their privacy policy summary and privacy policy text before starting their regular services (including registration and login services). There are probably two reasons for this. The first is that the Specification was introduced later than the GDPR, and several studies [43, 44] have indicated that China took the GDPR into account in writing its data security regulations and refined many of the GDPR's elements to suit its national context. This makes the Specification a better reference for both application developers and data security regulators. The second is that, in recent years, China has continued its massive regulation of data security for apps.
Figure 9: Third-party service domains for each of the two stores.
Tencent App Store: qq.com 86 (38.22%); umeng.com 45 (20.00%); jpush.cn 35 (15.56%); umsns.com 22 (9.78%); baidu.com 17 (7.56%).
Google Play: branch.io 76 (38.19%); mparticle.com 30 (15.08%); apptentive.com 21 (10.55%); app-measurement.com 15 (7.54%); paypal.com 10 (5.03%).
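The screening rule described in Section 6.3 (a PDD reached by at least five apps that are not all from one developer is attributed to a third-party library or service) can be sketched as follows. The small hard-coded suffix set stands in for the full public suffix list [33] and ignores that list's wildcard and exception rules; the input triples, function names, and example hosts are hypothetical and serve only to show the logic, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A few entries standing in for the full Public Suffix List (illustrative only;
# the real list's wildcard "*" and exception "!" rules are not handled here).
PUBLIC_SUFFIXES = {"com", "cn", "com.cn", "io", "co.uk"}


def registrable_domain(host: str) -> str:
    """Reduce a host name to its public suffix plus one label."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # i = 0 tries the longest candidate first
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels)  # unknown suffix: keep the normalized host


def third_party_pdds(contacts: Iterable[Tuple[str, str, str]],
                     min_apps: int = 5) -> Set[str]:
    """contacts holds (app_id, developer, host) triples for pre-consent requests."""
    apps: Dict[str, Set[str]] = defaultdict(set)
    developers: Dict[str, Set[str]] = defaultdict(set)
    for app_id, developer, host in contacts:
        pdd = registrable_domain(host)
        apps[pdd].add(app_id)
        developers[pdd].add(developer)
    # Flag a PDD when enough distinct apps reach it and they are not all
    # published by the same developer.
    return {pdd for pdd, ids in apps.items()
            if len(ids) >= min_apps and len(developers[pdd]) > 1}


# Hypothetical observations: five unrelated apps reach one SDK domain,
# while six apps from a single studio reach that studio's own backend.
contacts = [(f"app{i}", f"dev{i}", f"sdk{i}.push.example.com") for i in range(5)]
contacts += [(f"game{i}", "studio", "api.studio-example.cn") for i in range(6)]

print(registrable_domain("a.b.example.com.cn"))  # example.com.cn
print(third_party_pdds(contacts))                # {'example.com'}
```

The test `len(developers[pdd]) > 1` is one reading of "not from the same developer"; a stricter reading would require the qualifying apps to come from at least five different developers.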
Multiple meetings and briefings [45] have prompted app developers to write more compliant privacy policies and to build apps with fewer data security concerns.

However, although privacy policy notification is better in the Tencent Appstore, the collection of private data before the user's consent has not decreased accordingly. As described in Section 6.3, this is mainly caused by third-party libraries and third-party services. Among the third-party libraries and services we observed, those from providers in China account for a very high proportion of this privacy behavior. With the Android ecosystem highly globalized, it is difficult for most apps to avoid using third-party services for functional or performance reasons, and privacy violations by third-party services may cause apps to violate their applicable data protection laws.

By examining the privacy policies and the app samples, we evaluated the privacy policy compliance of popular apps in two app stores against their applicable regulations. Overall, the privacy policy compliance of Tencent Appstore apps is better than that of Google Play Store apps.

Based on the above analysis, we make the following recommendations to app developers:

(1) Present the privacy policy in text form. Presenting it in image or PDF format may make it difficult for users to read.
(2) If the applicable regulations or specification documents provide a privacy policy template, write the privacy policy according to the template; if there is no template, learn from the privacy policies of top popular apps or of the app publishing platform itself.
(3) When using a third-party service, carefully read the privacy policy or compliance document of the service to understand the privacy requirements it places on app developers and to avoid potential privacy issues. We also hope that, when third-party services are involved, developers will do their best to provide users with a convenient way to claim their data rights.

We recommend that the Google Play Store test apps' privacy policies more rigorously, including checking how the privacy policies are presented and whether apps solicit user consent in a normative way. An automated evaluation process should be established to analyze whether app privacy policies comply with the legal requirements of each region; the store can then ask developers to improve their privacy policies based on the analysis results. In this way, privacy violations can be reduced.

7. Discussion

7.1. Poor Presentation of Privacy Policy. We mentioned in Section 3.2 that the average app privacy policy is more than 2,000 words long (more than 8,000 Chinese characters). In the first half of 2021, US consumers used an average of 46 apps per month [46]; to fully understand the privacy policies of these apps, consumers might need to read more than 100,000 words. Our analysis also found that each app uses at least two third-party libraries, and, as required by law, the app's privacy policy must include links to the privacy policies of these third-party services. The privacy policies of some large third-party services in turn contain links to the privacy policies of the third-party services they use. Some
service providers, such as DingTalk [47], do not write dedicated privacy policies for their own SDKs or third-party services. As a consequence, the developer can only cite the service provider's privacy policy, which may contain as many as 20 links to third-party privacy policies. Take Douyin [48], the most downloaded app in the Tencent Appstore, as an example. To fully understand the privacy policy of the app and its direct and indirect use of third-party services, users need to read at least 42 privacy policies totaling as many as 579,000 Chinese characters, which is unbearable for consumers. Developers of apps and third-party services should therefore also consider how to serve their customers more reasonably while writing complete privacy policies in accordance with the law.

7.2. Responsibilities of Third-Party Services. Third-party libraries and third-party services often hide technical details in a black-box manner to protect their code, which makes it difficult for developers to understand how the app works and to discover privacy behaviors that occur before users agree. In today's globalized Android ecosystem, a good third-party service may be used by app developers all over the world, and avoiding violations of the privacy protection laws of multiple regions is a test for third-party service developers. Third-party service developers urgently need to provide user consent interfaces to app developers or to display consent dialogs directly to users. For third parties that act as data processors, the developer team and enterprise are responsible for faithfully providing data processing services without "crossing the border" into the role of the data controller.

7.3. Limitations in Our Method. Our approach naturally suffers from certain limitations. First, we do not have access to information beyond what is stated in the privacy policy, for two reasons. One is that it is difficult to fully trigger all processes of the app and to automate registration and login, which is an inherent challenge for dynamic analysis. The other is that, as described in Section 7.1, the privacy policy text contains complex links that create multiple layers of references and nesting, making it difficult to obtain the full privacy policy and therefore to conduct privacy data cross-analysis.

Second, we use the Xposed framework to dynamically analyze the apps, and it cannot resist the apps' anti-detection methods very well. Privacy behavior in native code is also difficult to identify through the Xposed framework. In addition, since the Xposed framework only works well on Android 7.1.2, our detection may have missed private data operations that apps trigger only on higher Android versions, which may have led to some false negatives.

Furthermore, we had difficulty accessing some restricted domains during the test because of policy restrictions in our region. Therefore, some common domains such as facebook.com and google.com may be missing from the test results. Although we chose to download the sample apps directly from the Google Play Store on the European servers, this problem was still difficult to solve.

8. Conclusions

In this paper, we proposed three evaluation metrics for app privacy policy compliance: integrity of the app privacy policy, privacy data collection before user consent, and accuracy of the app privacy policy. We then analyzed 2,000 popular apps on the Google Play Store and Tencent Appstore to understand app privacy policy compliance with the GDPR and the Information Security Technology-Personal Information Security Specification. Compared with previous studies, this paper discusses the integrity of the privacy policy text in depth, increases the variety of privacy data considered when analyzing consent issues and privacy policy accuracy, and increases the scope of text scanning in general.

In our privacy policy completeness study, we found that the way regulations are written affects the completeness of privacy policies. Our research points out that the Specification does a better job than the GDPR in guiding developers to write a qualified privacy policy. Our study also found that there were 55 and 219 apps in the Tencent Appstore and Google Play Store, respectively, with poor privacy policies; 117 and 151 apps, respectively, that started to collect users' privacy data before obtaining the user's consent to the privacy policy; and 215 and 916 apps, respectively, that did not fulfil the obligation of consent solicitation. The results reveal how the two stores' popular apps violate their respective laws. Based on our in-depth analysis of the data, we found that incomplete app privacy policies are not directly linked to actual privacy violations, as a large number of privacy behavior violations are caused by popular third-party services. Finally, we put forward some suggestions for app stores and third-party service providers.

Data Availability

The program code, sample app list, and a list of candidate titles for privacy policies used to support the results of this study are hosted in our GitHub repository at the address included in the article. The intermediate data supporting the findings of this study, including the intermediate results of the privacy policy integrity analysis and of the dynamic analysis of the app samples, are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (grant no. 61873069).