Uncovering and Exploiting Hidden APIs in Mobile Super Apps

Chao Wang Yue Zhang Zhiqiang Lin


The Ohio State University The Ohio State University The Ohio State University

This is a preprint of our CCS 2023 paper. Chao Wang, Yue Zhang, and Zhiqiang Lin
arXiv:2306.08134v1 [cs.CR] 13 Jun 2023

ABSTRACT
Mobile applications, particularly those from social media platforms such as WeChat and TikTok, are evolving into “super apps” that offer a wide range of services such as instant messaging and media sharing, e-commerce, e-learning, and e-government. These super apps often provide APIs for developers to create “miniapps” that run within the super app. These APIs should have been thoroughly scrutinized for security. Unfortunately, we find that many of them are undocumented and unsecured, potentially allowing miniapps to bypass restrictions and gain higher privileged access. To systematically identify these hidden APIs before they are exploited by attackers, we developed a tool, APIScope, that combines static and dynamic analysis: static analysis is used to recognize hidden undocumented APIs, and dynamic analysis is used to confirm whether the identified APIs can be invoked by unprivileged 3rd-party miniapps. We have applied APIScope to five popular super apps (i.e., WeChat, WeCom, Baidu, QQ, and TikTok) and found that all of them contain hidden APIs, many of which can be exploited due to missing security checks. We have also quantified the hidden APIs that may have security implications by verifying whether they have access to resources protected by Android permissions. Furthermore, we demonstrate the potential security hazards by presenting various attack scenarios, including unauthorized access to any web pages, downloading and installing malicious software, and stealing sensitive information. We have reported our findings to the relevant vendors, some of whom have patched the vulnerabilities and rewarded us with bug bounties.

1 INTRODUCTION
Over the past few years, we have witnessed a rapid growth of the miniapp paradigm [33], in which a mobile super app (e.g., WeChat [6] and TikTok [5]) provides a seamless runtime environment for a miniapp, a small web-app-like application, for enhanced user experience (e.g., install-less) and stickiness with the super app (e.g., a user can access almost all the daily services without leaving it). Today, more than 4.3 million miniapps [7] have been developed in WeChat (a super app with 1.2 billion monthly active users [1]), surpassing the total number of Android apps in Google Play (which has about 2.7 million as of November 2022 [2]). These miniapps offer a variety of daily services, from transportation (e.g., ride hailing), e-commerce (e.g., online shopping), e-learning, e-government (e.g., pandemic control and contact tracing), and mobile gaming to entertainment (e.g., short-form user videos). They are developed by both the 1st-party (i.e., the one who makes the super app platform) and 3rd-parties (i.e., developers who create additional software based on the platform provided by the 1st-party).
Since both 1st-party and 3rd-party miniapps are built on top of the APIs provided by the super app platform, one would expect them to use the same set of APIs. However, by performing a manual analysis, we discovered discrepancies in the APIs used by these miniapps. For instance, privileged APIs like openUrl are present in 1st-party miniapps like Tencent Doc [4], which has more than 200 million online consumers. openUrl can open arbitrary URLs, but 3rd-party miniapps cannot use openUrl and must use the wx.request API to ensure that the URLs are checked by WeChat to prevent the loading of malicious content. Moreover, not all APIs are equally mentioned in the official documentation. The Chinese version of the development documentation comprises 975 APIs [8], while the English version has only 570 APIs [9]. Additionally, none of the privileged APIs, such as openUrl, are ever referenced in the official documentation, regardless of the language. Thus, there may be undocumented APIs in the super app platforms (at least in WeChat). Such undocumented APIs may pose security risks. For example, they may have a higher level of privilege, as they are designed exclusively for use by 1st-party apps. To ensure security, super apps should implement proper access controls for these privileged APIs, such as allowing access solely through an approved list of 1st-party miniapps. Otherwise, they may be a weak spot for unauthorized access by 3rd-party miniapps.
Although our manual analysis of the host app and its 1st-party miniapp implementation has yielded surprising findings, it is neither scalable nor complete. Meanwhile, given that so many super apps are available today, it would be extremely helpful to have a tool that identifies all of the hidden APIs, if that is possible, from their implementations. Also, since privileged APIs without any checks can be easily exploited by malicious miniapps, we must inform the super app vendors so that they can patch the missing or misplaced checks. Motivated by these pressing needs, in this paper, we present APIScope, a binary analysis tool that combines static and dynamic analysis to systematically scrutinize hidden (i.e., undocumented) APIs in super app implementations.
Multiple challenges must be addressed while developing APIScope. First, several programming languages have been used to implement a super app at various layers (e.g., JavaScript at the miniapp layer, C/C++ at the JavaScript runtime layer, and Java at the service abstraction layer provided by the host app), and consequently it is challenging to recognize how APIs across these different languages and interfaces are invoked. Second, after identifying an undocumented API, it is also challenging to classify whether it is an API that can be invoked by third-party miniapps. Fortunately, we have addressed these challenges and successfully implemented APIScope. There are two key components inside APIScope: Static API Recognition and Dynamic API Classification. At a high level, it takes a super app binary as well as its list of public APIs as input, and identifies the hidden APIs based on the invariants of the functions and interfaces from the public APIs in the super apps using Static API Recognition. Next, it dynamically executes the identified
APIs to confirm whether they are true APIs, and further classifies them into checked and unchecked ones based on whether they can only be invoked by 1st-party miniapps, using Dynamic API Classification.
We have tested APIScope with five popular super apps: WeChat, WeCom, Baidu, QQ, and TikTok. Our evaluation results show that all the tested super apps contained hidden APIs. Interestingly, our study found hidden APIs in different categories, with some super apps having more hidden APIs than documented ones. For example, the Payment API category of WeChat contains 28 hidden APIs, which is significantly more than its documented ones (i.e., only one). We also measure the usage of hidden APIs in both 1st-party miniapps and 3rd-party miniapps. We found that the use of undocumented APIs is common among both 1st-party miniapps and 3rd-party miniapps regardless of their category.
Not all hidden APIs pose security risks when misused. Therefore, our objective was to dive into the security implications of hidden APIs. Specifically, we focused on the hidden APIs that lack security checks but can access sensitive Android OS resources. To achieve this, we used dynamic analysis techniques. Our dynamic analysis approach involves identifying APIs that call native APIs, which can access sensitive resources. We achieved this by hooking APIs that access sensitive resources and monitoring their use by unchecked and undocumented APIs. We found that WeChat has 39 hidden unchecked APIs (7.77%) that invoke Android APIs protected by permissions. Similarly, WeCom has 40 (6.75%), Baidu has 8 (7.61%), TikTok has 32 (26.23%), and QQ has 38 (12.88%) such APIs, which can pose security risks.
To further validate our findings, we conducted several attack case studies by developing a number of malicious miniapps using these hidden APIs. Specifically, in WeChat, we developed a malicious miniapp to exploit the hidden private_openUrl API to access arbitrary malicious content without detection by the super app. Additionally, by using the installDownloadTask hidden API, we developed a miniapp that can download and install harmful Android apps surreptitiously. Malicious miniapps can also pilfer a user’s sensitive information. Our demonstration reveals the use of hidden APIs such as captureScreen, which enables malicious miniapps to steal screenshots, getLocalPhoneNumber, which permits theft of the user’s phone number, and searchContacts, which facilitates the theft of the user’s contact information.
Contributions. We make the following contributions:
• We are the first to discover that super apps may provide hidden, i.e., undocumented, APIs (for the 1st-party miniapps), and that those hidden APIs that do not have permission checks can be exploited by 3rd-party miniapps for privileged access.
• We propose APIScope to systematically identify and classify the hidden APIs in super apps, with two novel techniques to statically recognize the APIs and dynamically execute and classify them.
• We implement APIScope, evaluate it with 5 super apps, and find that all of them contain hidden APIs, some of which can be exploited by malicious 3rd-party miniapps. We have made responsible disclosures to their vendors and received bug bounties from some of them.

Figure 1: Architecture of Super App Runtime in Android
  Application Layer: MiniApps (JavaScript implementation)
  JavaScript Framework Layer: JavaScript APIs (JavaScript implementation)
  Customized V8 Layer: V8 Interfaces (C/C++ implementation)
  Service Abstraction Layer: User Data Services, Bluetooth Services, Network Services, ... (Java implementation)
  Host App Layer: Host App (Java & C/C++ implementation)
  OS Layer: Android OS (Java & C/C++ implementation)

2 BACKGROUND
Miniapps are programs that run on top of host apps instead of directly on the operating system. Host apps have to function like an operating system and provide resources (e.g., location, phone numbers, addresses, and social network information) to miniapps through APIs. Mobile super apps are organized in a layered architecture, with each layer focusing on different aspects like portability, security, and convenience, but working together to support miniapp execution within host apps, as shown in Figure 1:
• Mini-Application Layer, which is the top layer of a super-app runtime. All miniapps, including 1st-party and 3rd-party miniapps, are located in this layer. To prevent one miniapp from accessing resources of other miniapps, the host app creates an isolated process for each miniapp. If privileged access is given to 1st-party miniapps, it must be controlled and checked to prevent 3rd-party miniapps from using it. Typically, miniapps are implemented using JavaScript [33].
• JavaScript Framework Layer, which provides APIs for resource access and management that are consumed by miniapps in the Application Layer. These APIs allow miniapps to access resources (such as location-based services) and manage UI elements (such as opening a new UI window). The JavaScript Framework Layer is also implemented using JavaScript.
• Customized V8 Layer, which provides support for native C/C++ libraries such as WebGL to power the execution of miniapps. It also acts as a bridge between the JavaScript Framework Layer and the lower layers. When miniapps call APIs such as wx.getLocation, the Framework Layer sends the API name and parameters to the Customized V8 Layer, which then passes the request to the underlying layers. This layer is usually implemented using C/C++.
• Service Abstraction Layer, which provides an interface to access services from either the super apps (e.g., user account information) or the underlying OS (e.g., Bluetooth, location-based services). In the case of the wx.getLocation API, this layer communicates with the host app using IPC to invoke the Java API getSystemService(LOCATION_SERVICE) to retrieve the current location. This layer is implemented using a combination of Java and C/C++ code for the Android platform.
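
The layered call path described above can be pictured with a small mock. The following is only a sketch under stated assumptions: `hostApp`, `serviceAbstraction`, `customizedV8`, `framework`, the `trace` array, and the returned coordinates are hypothetical stand-ins, not code from any super app. Only the layer names and the getLocation example come from the text.

```javascript
// Mock of the layered runtime; every function here is a hypothetical stand-in.
const trace = [];

// Host App / OS side: owns the real resource (here, a fake location).
const hostApp = {
  handle(apiName, params) {
    trace.push("HostApp");
    if (apiName === "getLocation") {
      // Placeholder coordinates, not real data.
      return { type: params.type, latitude: 0, longitude: 0 };
    }
    return { errMsg: "fail: unknown api" }; // hypothetical message
  },
};

// Service Abstraction Layer: forwards the request to the host app
// (a real device would use IPC here).
function serviceAbstraction(apiName, params) {
  trace.push("ServiceAbstraction");
  return hostApp.handle(apiName, params);
}

// Customized V8 Layer: bridge between JavaScript and the lower layers.
function customizedV8(apiName, params) {
  trace.push("CustomizedV8");
  return serviceAbstraction(apiName, params);
}

// JavaScript Framework Layer: exposes the API object that miniapps consume.
const framework = {
  getLocation(params) {
    trace.push("Framework");
    return customizedV8("getLocation", params);
  },
};

// Mini-Application Layer: a miniapp only sees the framework API.
function miniapp() {
  trace.push("Miniapp");
  return framework.getLocation({ type: "wgs84" });
}

const result = miniapp();
console.log(trace.join(" -> "), result);
```

In this mock the API name and parameters travel unchanged from the framework down to the host app, which is why a check has to sit somewhere along that path if an API is meant for 1st-party miniapps only.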


3 MOTIVATION AND PROBLEM STATEMENT
This section describes the motivation of this work by providing some key observations in §3.1, then defines the problem and the scope in §3.2 and the threat model in §3.3.

3.1 Key Observations
As alluded to earlier, when manually inspecting the implementation of some of the 1st-party miniapps offered by WeChat, we found that other than the public APIs that all the miniapps can access without restrictions, the 1st-party miniapp Tencent Doc actually uses some undocumented APIs (e.g., openUrl for opening arbitrary URLs). Moreover, the designers of WeChat do not make these APIs public (their documentation does not even mention openUrl), and have placed security checks to prevent openUrl from being accessed by arbitrary miniapps. For example, whenever a 3rd-party miniapp attempts to invoke openUrl, WeChat will throw an insufficient permission exception (i.e., “fail: no permission”) and terminate its execution. The use of openUrl in the 1st-party Tencent Doc miniapp prompted us to investigate the possibility of other hidden APIs offered by WeChat without proper security checks. This inspired us to explore the feasibility of identifying and exploiting these APIs, but we faced two challenges: (i) identifying the hidden APIs and (ii) properly invoking them to test for potential vulnerabilities. Through further exploration, we made two key observations to address these challenges.
Observation-I: Undocumented API Recognition. By manually inspecting the implementation of WeChat, we found that multiple suspicious undocumented functions are co-located with their documented APIs. That is, those functions and the public APIs are located in the same super app packages, and their implementations look similar to those of the documented APIs (e.g., they have similar function signatures, parameter types, and return value types).
We start by inferring whether those functions are indeed undocumented APIs, since intuitively both public and undocumented APIs are APIs, and the developers would have followed the same practice to implement them. Unsurprisingly, we found the implementation of openUrl, which confirms our observation. In Figure 2, we show 3 API implementations of WeChat. Although the code is highly obfuscated (the names of the classes and methods are replaced with meaningless letters, such as “a” and “b”), we can still observe some invariants: WeChat’s public API getLocation (line 1–13) and its undocumented API openUrl (line 14–25) both have the same parameter types and return types, as well as the same superclass (i.e., class a). As such, we can use these invariants (e.g., the superclass of the API, the parameters of the API) collected from the public APIs to search for possible undocumented APIs. For instance, as shown in Figure 2, we identified another function private_openUrl (lines 28–38) that has the same function signature, which is very likely an undocumented API.

     1 // Implementation of Documented API getLocation
     2 package com.tencent.mm.plugin.appbrand.jsapi.m;
     3 public class x extends a {
     4     public static final int CTRL_INDEX = 17;
     5     public static final String NAME = "getLocation";
     6
     7     @Override
     8     public final void b(IAppBrandComponent env, JSONObject data, int cId) {
     9         // some other logic
    10         env.doCallback(cId, env.Map2JSON(result));
    11     }
    12 }
    13
    14 // Implementation of Undocumented API openUrl
    15 package com.tencent.mm.plugin.appbrand.jsapi.n;
    16 public class y extends a {
    17     public static final int CTRL_INDEX = 201;
    18     public static final String NAME = "openUrl";
    19
    20     @Override
    21     public final void b(IAppBrandComponent env, JSONObject data, int cId) {
    22         // some other logic
    23         env.doCallback(cId, env.Map2JSON(result));
    24     }
    25 }
    26
    27 // Implementation of Undocumented API private_openUrl
    28 package com.tencent.mm.plugin.appbrand.jsapi.n;
    29 public class z extends a {
    30     public static final int CTRL_INDEX = 406;
    31     public static final String NAME = "private_openUrl";
    32
    33     @Override
    34     public final void b(IAppBrandComponent env, JSONObject data, int cId) {
    35         // some other logic
    36         env.doCallback(cId, env.Map2JSON(result));
    37     }
    38 }

Figure 2: APIs implementations of WeChat.

Observation-II: Undocumented API Invocation. Although there may be undocumented APIs (e.g., private_openUrl) provided by WeChat, we have to find a way to invoke them (if they are indeed APIs). Interestingly, when we directly invoke undocumented APIs such as private_openUrl in a miniapp, we obtain an error, “fail: not supported”, which is different from the error we observed when invoking openUrl, “fail: no permission”. We therefore infer that the accessibility of private_openUrl is not the same as that of openUrl (since the observed error messages are different), and that there may be a way to invoke it. We further inspected the normal invocation of the documented APIs to obtain insights from the process.
To be more precise, as described in §2, the JavaScript Framework Layer acquires the invocation request during a regular API call and transfers it to the lower layers via the interfaces exposed by the Customized V8 Layer. In Figure 3, we provide a code snippet illustrating the API invocation chain of WeChat, where the invocation request for the getLocation API (line 3 in the top-left frame) is eventually passed to the NativeGlobal.invokeHandler function (line 11 in the bottom-left frame), which in turn conveys the API invocation request to the underlying layers. Notably, the NativeGlobal.invokeHandler function receives three inputs: the API name (e.g., getLocation), the API parameters, and a callback function ID (which enables the API to manage the asynchronous call).
Given that NativeGlobal.invokeHandler can deliver a normal invocation request to the underlying layers, we conclude that it is also capable of delivering undocumented API invocation requests. Therefore, we feed the API name private_openUrl and its parameter (which is a URL) to the interface and let it pass the API name and the URL to the underlying layers. Interestingly, we find that the underlying layers handle the passed API name and parameter as a normal API invocation and further pass the invocation request to the host app. As shown in Figure 4, while WeChat restricts miniapps from accessing undocumented APIs, unfortunately we find that not all undocumented APIs are protected through security checks. In particular, WeChat has enforced the security check for the undocumented API openUrl, but it does not add

Top-left frame:

     1 wx.getLocation = function (arg) {
     2     var params = 0 < arguments.length && void 0 !== arg ? arg : {};
     3     Object(WeixinJSBridge.invokeMethod)("getLocation", params, {
     4         beforeSuccess: function(e) {
     5             // Code Omitted //
     6         }
     7     })
     8 }

Bottom-left frame:

     1 WeixinJSBridge = function(global) {
     2     var NativeGlobal = global.NativeGlobal;
     3     var globalCount = 0;
     4
     5     function invokeMethod(apiName, params, callbackHandler) {
     6         params = WeixinNativeBuffer.pack(params);
     7         var filteredParams = paramFilter(params || {}),
     8             callbackId = ++globalCount;
     9         callbackQueue[callbackId] = callbackHandler,
    10         a(apiName, params, callbackId) {
    11             callbackId = NativeGlobal.invokeHandler(apiName, params,
    12                 callbackId);
    13             invokeCallbackHandler(callbackId, callbackHandler)
    14         }(apiName, filteredParams, callbackId)
    15     }
    16     return this;
    17 }(global);

Example values along the chain (right-hand annotations):

    wx.getLocation({type:'wgs84'})
    Object(WeixinJSBridge.invokeMethod)("getLocation", 'wgs84', Callback{})
    function a("getLocation", 'wgs84', callbackId)
    NativeGlobal.invokeHandler("getLocation", 'wgs84',callbackId)

Sequence diagram (Miniapp, JavaScript Framework Layer, Customized V8 Layer, Service Abstraction Layer, Host App): ❶ Invoking JavaScript API; ❷ Passing Invocation Request; ❸ Passing Invocation Request; ❹ Binder IPC; ❺ to ❽ Returning Results.

Figure 3: An Example of WeChat API Invocation At JavaScript Framework Layer.
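
To make the chain in Figure 3 concrete, here is a minimal, self-contained mock of the bridge. It is a sketch, not WeChat's code: `hostHandlers`, `firstPartyIds`, the `appId` argument, the "ok" and "fail: unknown api" messages, and the example URL are assumptions introduced for illustration. Only the API names, the role of `NativeGlobal.invokeHandler`, and the "fail: no permission" message come from the text. The mock shows what happens when API names are fed straight to the bridge and one host-side handler lacks a caller check.

```javascript
// Hypothetical host-side handlers; `checked` marks handlers that verify the caller.
const firstPartyIds = new Set(["first-party-app"]);
const hostHandlers = {
  getLocation: { checked: false, run: () => ({ errMsg: "ok" }) },
  openUrl: { checked: true, run: (p) => ({ errMsg: "ok", opened: p.url }) },
  private_openUrl: { checked: false, run: (p) => ({ errMsg: "ok", opened: p.url }) },
};

// Stand-in for the interface exposed by the Customized V8 Layer.
const NativeGlobal = {
  invokeHandler(apiName, params, callbackId, appId) {
    const handler = hostHandlers[apiName];
    if (!handler) {
      return { callbackId, errMsg: "fail: unknown api" }; // hypothetical message
    }
    if (handler.checked && !firstPartyIds.has(appId)) {
      return { callbackId, errMsg: "fail: no permission" };
    }
    return { callbackId, ...handler.run(params) };
  },
};

// Framework-side wrapper: assigns a callback ID and forwards the request.
let globalCount = 0;
const callbackQueue = {};
function invokeMethod(apiName, params, callback, appId) {
  const callbackId = ++globalCount;
  callbackQueue[callbackId] = callback;
  const res = NativeGlobal.invokeHandler(apiName, params, callbackId, appId);
  callbackQueue[res.callbackId](res);
  delete callbackQueue[res.callbackId];
}

// A 3rd-party caller feeding API names directly to the bridge.
const outcomes = {};
for (const name of ["getLocation", "openUrl", "private_openUrl"]) {
  invokeMethod(name, { url: "https://example.com" }, (res) => {
    outcomes[name] = res.errMsg;
  }, "third-party-app");
}
console.log(outcomes);
```

In the mock, the only difference between openUrl and private_openUrl is the `checked` flag, so the 3rd-party caller is rejected for the former and served for the latter.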

the security checks for the undocumented API private_openUrl, which has exactly the same functionality as openUrl. Also, the API name and parameters are not obfuscated since they have to be passed to lower layers.

[Figure 4 diagram: columns for Malicious Miniapp, JavaScript Framework Layer, Customized V8 Layer, Service Abstraction Layer, and Host App, with arrows tracing the getLocation, openUrl, and private_openUrl requests across the layers.]
Figure 4: The Workflow of API invocations. Public API invocation getLocation (green line); Checked Undocumented API openUrl (red line); Unchecked Undocumented API private_openUrl (purple line).

3.2 Problem Statement and Scope
Since our manual investigation has revealed that there are indeed hidden APIs in the super app platform and that some of them can be exploited, the goal of this work is to develop techniques to uncover them. More specifically, we need to recognize the hidden APIs based on how documented APIs are implemented and executed, and meanwhile test them to determine whether they can be invoked by 3rd-party miniapps to bypass security restrictions (or whether those APIs themselves may have vulnerabilities). Please note that we do not consider all APIs invocable by 3rd parties as exploitable, since whether an API is exploitable depends on its functionality (e.g., whether the API implements privileged operations).
Also, since there are multiple super apps available today, ideally, we would like to develop generic techniques to cover them all. However, our observation is heavily based on the miniapp runtime architecture presented in Figure 1. Therefore, super apps that do not follow this architecture, e.g., those that do not use the V8 engine to execute their miniapp code, are out of our scope. Finally, for convenience and because of our expertise, we focus on the super apps running on the Android platform, though in theory our approach should also work for the iOS platform.

3.3 Threat Model
As previously discussed, our objective is to develop techniques for detecting hidden APIs that lack security checks before a malicious app exploits them. In this context, the attacker is malware that has been installed on the user’s mobile device. We will not delve into the details of how this malware can be installed, as we believe it is practical to assume that super apps are not aware of such types of malware until we report our findings to them. It is worth noting that previous research on super apps has also made similar assumptions [25]. Undocumented APIs refer to functions or APIs that are not included in the official documentation, whether in English or Chinese. An attacker could learn about the existence of these hidden APIs by reverse engineering the super app client or by reading technical blogs on the internet. Specifically, undocumented APIs may have access to sensitive resources that are safeguarded by the Android OS. If an attacker exploits these APIs, they can launch attacks against the victim users.

4 CHALLENGES AND INSIGHTS
(I) Challenges in API Recognition. The first step of APIScope is to identify undocumented APIs when given a host app. Intuitively, this sounds trivial, since when given an API, we could compare it with the APIs released in the official documentation to decide whether it is documented or not. However, it is challenging to determine whether an internal function or an interface is an API. For instance, there are 3,702 functions and interfaces implemented in JavaScript, not to mention those implemented in 92 native C/C++ libraries and 56,492 Java classes in WeChat’s latest version. Note that we do not have to consider the functions in lower-layer implementations (i.e., any layer below the JavaScript framework), since the hidden APIs are not exposed at these layers. Obviously, we cannot directly treat all these functions as APIs.
Also, although for a specific implementation of host apps (e.g., WeChat), simple pattern matching approaches can be applied to

[Figure 5 diagram. Inputs: Super Apps (passed through a Decompiler to obtain Decompiled Code) and Public APIs. Static API Recognition (§5.1): (I) Automatic Invariants Extraction; (II) Undocumented API Recognition, producing Undocumented APIs. Dynamic API Classification (§5.2): (I) Test Case Generation; (II) Forward Slicing for API Invocation Identification; (III) Dynamic API Probing for API Category Classification, with testing cases executed in the JavaScript Runtime. Results: Undocumented Unchecked APIs and Undocumented Checked APIs.]
Figure 5: APIScope Architecture
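
The two stages in Figure 5 can be sketched on toy data. This is a simplified illustration under assumptions, not APIScope's implementation: the `classes` records, the `recognizeUndocumented`, `probe`, and `classify` functions, and the `checkedNames` set are hypothetical, and real input would come from decompiled super app code and a live miniapp runtime. The class and API names mirror Figure 2.

```javascript
// Toy "decompiled" class records (hypothetical shape); names mirror Figure 2.
const sig = "b(IAppBrandComponent,JSONObject,int)";
const classes = [
  { cls: "x", superclass: "a", name: "getLocation", method: sig },
  { cls: "y", superclass: "a", name: "openUrl", method: sig },
  { cls: "z", superclass: "a", name: "private_openUrl", method: sig },
  { cls: "q", superclass: "Object", name: null, method: "run()" }, // internal helper
];
const publicApis = new Set(["getLocation"]);

// Stage 1 (static): learn invariants from public APIs, then match other classes.
function recognizeUndocumented(allClasses, documented) {
  const known = allClasses.filter((c) => documented.has(c.name));
  const invariants = new Set(known.map((c) => `${c.superclass}|${c.method}`));
  return allClasses
    .filter((c) => c.name && !documented.has(c.name))
    .filter((c) => invariants.has(`${c.superclass}|${c.method}`))
    .map((c) => c.name);
}

// Stage 2 (dynamic): invoke each candidate as a 3rd-party caller and sort by outcome.
// `probe` stands in for a call through the runtime's invocation interface.
const checkedNames = new Set(["openUrl"]); // assumed host-side checks
function probe(apiName) {
  return checkedNames.has(apiName) ? "fail: no permission" : "ok";
}

function classify(candidates) {
  const report = { checked: [], unchecked: [] };
  for (const name of candidates) {
    const bucket = probe(name) === "fail: no permission" ? report.checked : report.unchecked;
    bucket.push(name);
  }
  return report;
}

const candidates = recognizeUndocumented(classes, publicApis);
const report = classify(candidates);
console.log(candidates, report);
```

The sketch keys the invariants on superclass plus method signature because those are the examples the text gives; a real tool would likely need more features to avoid matching unrelated classes.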

recognize APIs. For example, when implementing the callbacks of the APIs, WeChat uses android.webkit.ValueCallback at the Service Abstraction layer to handle all the callback results. From the callbacks, we can locate the corresponding APIs and extract patterns to pinpoint the remaining APIs. However, there are multiple super apps, each of which could have a different implementation. For example, unlike WeChat, TikTok uses com.he.jsbinding.JsContext.ScopeCallback at the Service Abstraction layer to handle the callback results of its APIs, so the pattern for WeChat will fail when dealing with TikTok. Moreover, such a pattern-matching approach requires recognizing callbacks first, which may be challenging due to code obfuscation. As discussed in §3.1, the miniapp is executed on top of the super apps (e.g., Android apps), which are often heavily obfuscated. It is hard to recognize callbacks statically unless we fully understand the obfuscated code, and as such, we need a more obfuscation-resilient approach instead of simple pattern matching.

Insights. We notice that there exist some invariants, such as the method signatures of public APIs and their superclasses in the API implementations, as illustrated in §3.1 based on super app WeChat (e.g., every API has the same superclass a, though this name is obfuscated; every public API must contain the name of the API for reference by the miniapps, and this name cannot be obfuscated but can be easily recognized). As such, we can first extract these API invariants from the public API implementations, and then use them to recognize the rest of the APIs. This process can be automated since it is easy to identify these API invariants when the implementation of public APIs is provided.

(II) Challenges in API Classification. Once we have identified all these hidden APIs, we still need to classify them into different categories and determine whether they are invocable (i.e., when there is no security check). Deciding this with static analysis alone would be very challenging, and thus we rely on dynamic analysis to actually invoke them. However, to invoke a hidden API, we still need to recognize the interface that communicates with the underlying layers. Although we already know that this interface takes the API name as its input (as described in §3.1), it is still challenging to know whether a given interface accepts the API name as its input before we actually execute it (due to the obfuscated JavaScript code). Meanwhile, although multiple dynamic tools are available for JavaScript, they cannot be applied to our case directly due to the highly customized JavaScript framework implementations. For example, most JavaScript analysis tools (e.g., Jalangi2 [30]) are designed for traditional web browsers. They cannot run with the super apps since the offered APIs are different. Moreover, most of these tools need to instrument the testing instances, which involves modifying them. In our case, the testing instances are the miniapps (not web applications), which usually have integrity checks and cannot be modified easily.

Insights. To invoke the API for its behavior classification, we need to find the interface, e.g., NativeGlobal.invokeHandler as shown in Figure 3. Interestingly, to identify this interface, we can monitor how a public API is executed, e.g., how it is invoked (its name and parameters), and when it is passed across the boundary between the layers. More specifically, we notice that we can use function trace analysis to identify interfaces such as NativeGlobal.invokeHandler, since the API execution starts from the invocation and ends at the interface boundary. By tracing all of the function executions with their parameters and then selecting those that use the API name, which is passed as a parameter, we can automatically identify the interface, which is typically the last invocation point in the JavaScript layer. With the identified invocation point, we can then feed it with different API names and invoke them to classify them further (e.g., whether they can be invoked by the 3rd-party miniapps).

5 APISCOPE

As shown in Figure 5, our developed APIScope consists of two phases of analysis (static analysis first, then dynamic analysis), with the following two key components:
• Static API Recognition (§5.1). This component takes the binary code of super apps (i.e., APKs) and the list of the official APIs in the documentation as input, and produces the undocumented APIs as output. At a high level, it first decompiles the APKs with Soot [3], automatically extracts the invariants based on the public APIs, and then uses the invariants to recognize the hidden APIs from the implementations of super apps.
• Dynamic API Classification (§5.2). This component takes the hidden APIs as input, and classifies them into three different categories: unchecked hidden APIs (exploitable by 3rd-party miniapps), checked APIs (available only to 1st-party miniapps), and non-APIs, as the final output. At a high level, it first uses the Test Case Generator to produce two types of test cases: one is for API invocation identification, executed by a lightweight tracing engine for the monitored execution, and the other is for API classification. With these test cases, APIScope eventually identifies the interfaces as well as the categories of the APIs.
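The function trace analysis outlined in the Insights above (follow the API name through the traced calls and keep the last call that still receives it) can be illustrated with a minimal sketch. The trace record shape ({ fn, args }), the helper name findInvocationInterface, and the sample trace are assumptions made for illustration; they are not APIScope's actual data structures.

```javascript
// Minimal sketch, assuming a trace is an ordered list of { fn, args } records
// (a hypothetical format, not the Profiler's real output).
function findInvocationInterface(trace, apiName) {
  let last = null;
  for (const call of trace) {
    // A call depends on the API if one of its string arguments carries the API name.
    const mentionsApi = call.args.some(
      (arg) => typeof arg === "string" && arg.includes(apiName)
    );
    if (mentionsApi) last = call.fn; // keep the latest such call
  }
  return last; // null when no traced call receives the API name
}

// Hypothetical trace of a documented API call, loosely modeled on Figure 3.
const trace = [
  { fn: "wx.getLocation", args: [{ type: "wgs84" }] },
  { fn: "invokeMethod", args: ["getLocation", { type: "wgs84" }, "cb"] },
  { fn: "paramFilter", args: [{ type: "wgs84" }] },
  { fn: "NativeGlobal.invokeHandler", args: ["getLocation", '{"type":"wgs84"}', 1] },
];

console.log(findInvocationInterface(trace, "getLocation"));
// prints "NativeGlobal.invokeHandler"
```

The sketch only matches on string arguments; a real trace would also need to follow the name through wrapped or serialized parameters.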

     1 // Documented API Implementation of Baidu
     2 package com.baidu.swan.apps.scheme.actions.f;
     3 public class a extends aa {
     4   public a (e context) {
     5     super(context, "/swanAPI/getLocation");
     6   }
     7
     8   @Override
     9   public boolean a (Context c, Scheme s, CallbackHandler cb, SwanApp a){
    10     // some other logic
    11   }
    12 }
    13
    14 // Undocumented API Implementation of Baidu
    15 package com.baidu.swan.apps.impl.account.a;
    16 public class f extends aa {
    17   public f (e context) {
    18     super(context, "/swanAPI/getBDUSS");
    19   }
    20
    21   @Override
    22   public boolean a (Context c, Scheme s, CallbackHandler cb, SwanApp a){
    23     // some other logic
    24   }
    25 }

Figure 6: API implementations of Baidu. Note that lines 1 – 12 contain a documented API, and lines 14 – 25 contain an undocumented API.

5.1 Static API Recognition

To recognize APIs, APIScope first needs to extract the invariants based on the decompiled code of public APIs. With the invariants, it then recognizes the hidden APIs. Therefore, it is a two-step process. In the following, we describe these two steps in greater detail.

Step-I: Automatic Invariants Extraction. APIScope first needs to extract the invariants based on the decompiled code of the public APIs from the implementations of the super apps. In particular, when given an API, APIScope will aggressively identify as many invariants as possible from the implementation, and these invariants include: (i) the method signatures (e.g., the return type, the number of the parameters, and parameter types); (ii) the superclass; (iii) the super packages (e.g., in super app Baidu, com.baidu.swan.apps is the super package of com.baidu.swan.apps.scheme.actions.f, as shown in Figure 6); and (iv) their callers. Again, they are invariants because they do not change across the API implementations (both public and undocumented) of a specific super app, though the specific content of an invariant may differ across super apps. For instance, for the superclass invariant in WeChat, when comparing any two implementations of the provided APIs (e.g., getLocation and private_openUrl), we can easily recognize that they both extend the superclass a, as shown in Figure 2; similarly, the APIs provided by Baidu all extend the same superclass aa, as shown in Figure 6.

Step-II: Undocumented API Recognition. With the invariants, APIScope then recognizes the undocumented APIs. In particular, it iterates over each of the function implementations again, matching them against the extracted invariants; if a function matches all the invariants found in the public APIs and has not been added to the undocumented set yet, this function's implementation is an undocumented API. That is, we use quite restrictive patterns that need to exist in all public API implementations for a particular super app, and a function must contain all of these invariants in order to be considered an undocumented API. Regarding how exactly APIScope identifies them, we present a detailed algorithm in Appendix §A for interested readers.

5.2 Dynamic API Classification

With the identified undocumented APIs, we next need to invoke each of them to decide whether it can be exploited by attackers, based on the error messages obtained while executing the corresponding test cases for each API. This is a three-step process, starting from test case generation, followed by API invocation identification using function trace analysis, and finally the API classification through dynamic API probing.

Step-I: Test Case Generation. In this step, we use our test case generator to produce test cases. The test cases are JavaScript code snippets that contain the APIs to be invoked (with their parameters configured). For example, wx.getLocation({type: "wgs84"}) is a test case for testing API wx.getLocation (how to invoke such test cases will be described in API invocation identification). There are two types of test cases: one for API invocation identification and the other for API classification. The goal of API invocation identification is to execute the documented API, and use the function trace analysis to identify the invocation point. Therefore, we only need to generate a few test cases (which are the test cases of documented APIs). However, in API classification, which invokes the undocumented APIs and categorizes them based on their outputs, we need to produce at least one valid test case for each undocumented API (to obtain the outputs). In particular, since each API may accept one or multiple parameters, to produce a valid test case, we have to identify all the types (e.g., Integer, Boolean) of the parameters, through which we can further feed each API a list of parameter instances in the right order (e.g., testAPI(true, 1234)):
• Parameter Type Extraction. While APIScope could identify the types of parameters through documentation analysis, such an approach cannot identify the types of parameters for undocumented APIs. Therefore, we need a more reliable approach to ensure that we can extract parameter types for both documented and undocumented APIs. Our idea is to analyze the implementations of the APIs, since we have already identified the implementations for both documented and undocumented APIs as described in §5.1. For instance, in WeChat's implementation, we notice that the types of the parameters of an API can be recognized by inspecting the methods invoked on JSON instances, e.g., in the implementation of getLocation, we can notice that a JSON object invokes method optString("paramname", paramvalue), which indicates that getLocation has a "paramname" parameter with type String. Similarly, if the API accepts a Boolean value as its parameter, there will be a method optBoolean("paramname", paramvalue) in its implementation.
• Parameter Instance Generation. The parameters must be instantiated before being fed into the APIs. We used a pre-defined template-based approach to instantiate the parameters. At a high level, the template specifies the appropriate values for the different types that can be used to produce the parameters (e.g., “1” and “0” are used when the “type” of the parameter is “number”, and “test” is used when the “type” of the parameter is “string”). For instance, WeChat API showToast (which shows a message to the user) has two parameters title and duration, with types string and number, respectively. As such, we produced an instance with the predefined template, where title is set to

“test” and duration is set to “1”. Using such a template method, we successfully instantiated all the parameters.
• Parameter Order Permutation. Although we have instantiated the parameters, we still do not know the order of those parameters for the undocumented APIs, as the parameters in the Service Abstraction layer are all encapsulated in JSON objects. Therefore, we have to properly order the parameters, and we use a brute-force approach. For example, true and 1234 are two parameters of testAPI, which could have two possible combinations: testAPI(true, 1234) and testAPI(1234, true). We simply assume that all those combinations are valid and invoke them one by one (the invalid ones will be filtered out during the API classification, which will be described later). Given that, according to our static analysis of the code, one API accepts no more than 4 parameters (which results in at most 24 combinations), we believe such a brute-force approach is acceptable.

We would like to clarify certain technical details. First, during our dynamic analysis, we only explore a limited range of inputs. This is because dynamic tracing does not require a broad range of inputs to expose hidden APIs. Additionally, the test case generation is sufficient for testing whether the hidden API is protected by security checks. In other words, as long as valid inputs are provided to the API, our tool can trigger the API if there are no security checks. If there are security checks, we can observe errors. Our objective is not to enumerate all possible inputs, as we are not fuzzing the actual hidden API. Second, hidden APIs may require complex parameter types, such as JSON objects. These complex parameter types are combinations of other basic parameter types (e.g., integer, string), and can be recursively decomposed until they become primitive types. For instance, an object may contain a string, an integer, and a boolean. We can simply inflate each parameter based on its respective parameter type. As APIs implemented in the Service Abstraction Layer lack states or context, it is unnecessary to determine their execution state within this layer. Our testing process involves providing our tool with a code snippet containing the API to be tested, which is sufficient for our purposes. The JavaScript Framework Layer handles most of the checks, so the API invocation is checked before its order or dependency state is resolved.

Step-II: API Invocation Identification. Next, APIScope needs to execute the generated test cases on top of our customized V8 engine to identify how the documented API is invoked, so that it can later similarly invoke the undocumented ones. Intuitively, when we test a specific API, we need to compile and produce a testing miniapp that contains the API for our test. However, this approach does not scale and can slow down our testing. Interestingly, we notice that we can let the V8 engine directly inject the JavaScript code into the JavaScript Framework Layer (the V8 engine has a function named script, which accepts JavaScript code as input, and injects the code for the JavaScript Framework Layer to execute). Since the JavaScript code is injected into the JavaScript Framework layer, the super apps will handle the code as they handle the code in a regular miniapp.

Also, in most cases, V8 Engine has a built-in Profiler, but the super apps do not directly expose any interfaces for developers to use. Meanwhile, although it is true that different platforms may customize the V8 Engine to enable their desired functionalities, they will not intentionally remove the built-in Profiler since it is also helpful for their own debugging purposes. Therefore, as long as we can find a way to invoke Profiler, we will be able to collect the traces. Fortunately, we can use Frida [16], an Android hooking tool, to dynamically instrument the V8 Engine to invoke startProfiling of Profiler and let it start profiling, and collect the function traces of documented API execution.

With the collected function traces, we then present how to find the desired interface using function trace analysis, a standard technique widely used in program analysis. As discussed in §3.1, API invocation is a complicated process involving multiple layers. Fortunately, the Profiler only runs inside the JavaScript Framework layer, and we can just monitor the function traces produced at this layer since we aim to identify how to invoke an API from the JavaScript layer. In particular, our analysis starts from the API of interest (e.g., wx.getLocation), identifies all the functions involved based on the dependencies of parameter and API names, and eventually identifies the last invocation function, e.g., NativeGlobal.invokeHandler (see Figure 3), which is the desired interface we aim to discover. Specifically, the dependencies form a chained relationship, and we build such dependencies based on the parameters that are fed into the functions (we can monitor the changes of parameters of the functions). For example, when we execute wx.getLocation, we will observe a function named NativeGlobal.invokeHandler that takes a parameter named getLocation as its input. Therefore, we know that wx.getLocation and NativeGlobal.invokeHandler have dependencies.

To provide a detailed explanation of how our trace analysis works, we use an example that features the implementations of API invocations across three layers, namely the JavaScript Framework layer, the Customized V8 layer, and the Service Abstraction layer. The process begins with the JavaScript Framework layer, which initiates the API invocation by calling NativeGlobal.invokeHandler. This invocation is then handed over to the Customized V8 layer, which is responsible for handling it. As shown in Figure 7, this step is represented by line 10 of the JavaScript Framework layer's implementation. Next, the Customized V8 layer extracts critical information from the API invocation, including the API name, its parameters, and any corresponding callbacks. This information is obtained from lines 28–32 of the Customized V8 layer's implementation. The Customized V8 layer then proceeds to invoke the relevant APIs at the Service Abstraction Layer through the use of the Java Native Interface (JNI) [21]. Finally, during the API invocations at the Service Abstraction layer (line 4), this layer may need to communicate with the Customized V8 layer for additional operations, such as performing permission checks if the API requires them. We have omitted this code for the sake of brevity. In summary, our trace analysis provides insight into the entire process of API invocations across the three layers of the system. We track the flow of control and collect data on API names, parameters, and callbacks to enable a more comprehensive analysis of the system's behavior.

Step-III: Dynamic Probing for API Category Classification. With the identified interfaces of how to invoke a public API, we then use it to similarly invoke undocumented APIs, by first generating

JavaScript Framework Layer

     1 WeixinJSBridge = function(global) {
     2   var NativeGlobal = global.NativeGlobal;
     3   var globalCount = 0;
     4
     5   function invokeMethod(apiName, params, callbackHandler) {
     6     params = WeixinNativeBuffer.pack(params);
     7     var filteredParams = paramFilter(params || {}),
     8     callbackId = ++globalCount;
     9     callbackQueue[callbackId] = callbackHandler,
    10     a(apiName, params, callbackId) {
    11       callbackId = NativeGlobal.invokeHandler(apiName, params,
    12         callbackId);
    13       invokeCallbackHandler(callbackId, callbackHandler)
    14     }(apiName, filteredParams, callbackId)
    15   }
    16   return this;
    17 }(global);

Customized V8 Layer

     1 // Implementation of invokeHandler in NativeGlobal JavaScript Object (C++)
     2 int magicbrush::BindingNativeGlobal::BindTo(v8::Object *a1, v8::Isolate *a2){
     3   /* Code Omitted */
     4
     5   v13 = 0;
     6   v7 = (v8::Value *)mm::JSGet<v8::Local<v8::Value>>(a1, v6, "NativeGlobal", &v12);
     7   if ( !v7 || (v9 = (int)v7, !v8::Value::IsObject(v7)) )
     8     v9 = v8::Object::New(a1, v8);
     9   v13 = v9;
    10
    11   /* Code Omitted */
    12
    13   mm::JSSetWithData((int)a1,
    14     v13,
    15     (int)"invokeHandler",
    16     (int)magicbrush::nativeglobal::invokeHandler,
    17     a2);
    18   mm::JSSet<v8::Local<v8::Object>>(a1, *a3, "NativeGlobal", v13);
    19   return v13;
    20 }
    21
    22 int magicbrush::nativeglobal::invokeHandler(v8::Isolate *a1, _DWORD *a2) {
    23   /* Code Omitted */
    24
    25   mm::JSConvert<std::string, void>::fromV8(api_name, a1, v6);
    26   mm::JSConvert<char16_t const*, void>::fromV8(api_param, a1, v6);
    27   mm::JSConvert<int, void>::fromV8(callback_id, a1, v6);
    28   Java_com_tencent_magicbrush_MBRuntime_nativeInvokeHandler(
    29     api_name,
    30     api_param,
    31     callback_id
    32   )
    33
    34   /* Code Omitted */
    35 }

Service Abstraction Layer

     1 // Implementation of invoke handler in Java framework
     2 package com.tencent.magicbrush;
     3 public abstract class MBRuntime {
     4   protected String nativeInvokeHandler(String apiName, String apiParam, int id) {
     5     if (this.nativeHandler != null) {
     6       try {
     7         return this.nativeHandler.invoke(apiName, apiParam, id);
     8       } catch (Throwable e) {
     9         Logger.printStackTrace("MBRuntime", e, "crash when invoke jsapi!");
    10         throw e;
    11       }
    12     }
    13     Logger.error("MBRuntime", "no native invoke handler");
    14     return "";
    15   }
    16 }

Figure 7: The implementations of API invocations across three layers (WeChat)
the corresponding test cases, and then injecting the JavaScript code using the script function into the V8 engine, as described earlier. When executing a particular test case, there could be three types of outcomes: the tested “API” is a checked API (when invoked, a permission denial will be observed based on the standard error messages), the tested “API” is an unchecked API (which can be invoked successfully), or the tested “API” is not an API. As such, we can use the following strategies to identify them.
• Unchecked APIs. Similar to the public APIs, the unchecked undocumented APIs can be invoked without requiring additional permissions. As such, we first deliver a public API invocation request, such as getLocation, and record the feedback of the host app. For example, WeChat and Baidu will not print any errors when the invocation request gets approved, and we then use this as a signature to see whether an invocation request is successfully executed.
• Checked APIs. The checked APIs are the APIs that are protected by security checks, which can only be invoked by their 1st-party miniapps. In the event of a security check failure, the super apps will generate error messages notifying the user of insufficient permissions. Such an error is raised for the checked APIs of all the super apps, albeit with minor variations in the error messages displayed. For example, when 3rd-party miniapps attempt to invoke a checked API of WeChat, the host app will throw an error message “fail: no permission”. For WeCom, the error message becomes “fail: access denied”. Therefore, we use keywords such as “fail”, “no permission” and “access denied” to match and decide whether the invocation request gets denied. If so, it is a checked API.
• Non-APIs. Theoretically, APIScope may have false positives, and as such, our tool may mistakenly recognize some non-APIs. Therefore, we need to filter them out. To that end, we first create an invalid request and then send it to the host app to see the feedback. For example, if we initiate an invalid request and send it to WeChat, WeChat will reject the invocation request and throw an error message “fail: not supported”. Then, such an error message is used as a signature to match the non-APIs.

As an example, in the case of WeChat, if we attempt to use the API openUrl, the super app will generate an error message stating “fail: no permission”. This error message implies that the API is a checked hidden API. On the other hand, if we use the API private_openUrl, the super app will handle the invocation request as a regular request without displaying any error message. As a result, we can conclude that this API is an unchecked hidden API.

6 EVALUATION

We have developed a prototype of APIScope with 5K lines of code on top of open source tools such as Soot [13] for decompilation and Frida [16] for tracing. In this section, we present the evaluation results. We first describe our experimental setup in §6.1, and then APIScope's effectiveness in §6.2. The efficiency of APIScope is presented in Appendix-§B for interested readers.

6.1 Experiment Setup

The Tested Host Apps. Today, there are quite a number of super apps that support the execution of miniapps. Although we wished to test all of them, we eventually selected five of them, as shown in Table 1: WeChat, WeCom and QQ from Tencent Holdings Ltd., Baidu from Baidu Inc., and TikTok from ByteDance Ltd. We excluded other super apps such as Alipay and Snapchat because they do not build on the V8 engine (making our tool unsuitable for them at this moment). Also, to study the security issues of the tested super apps, we registered an account on each platform, downloaded their development tools and SDKs, built miniapps by following their official documents, and inspected their code. Among them, Baidu has a relatively closed ecosystem, where only the enterprise developers are allowed to

register as their developers. However, they allow individuals to apply for trial accounts to use their development tools to develop miniapps, and therefore, we tested Baidu using their trial accounts.

Name    | Vendor    | Version | V8  | Date       | Installs       | 1st-party miniapp being tested?
Baidu   | Baidu     | 12.21   | 7.6 | 08/13/2021 | 5,000,000+     | ✓
QQ      | Tencent   | 8.8     | 7.2 | 10/05/2021 | 10,000,000+    | ✓
TikTok  | ByteDance | 17.9    | 7.2 | 10/19/2021 | 1,000,000,000+ | ✗
WeChat  | Tencent   | 8.0     | 8.0 | 07/21/2021 | 100,000,000+   | ✓
WeCom   | Tencent   | 3.1     | 8.0 | 09/14/2021 | 100,000+       | ✓

Table 1: Summary of the Tested Super Apps

The Tested Miniapps. We believe it is important to measure the usage of undocumented APIs in 1st-party and 3rd-party miniapps for two reasons. First, understanding how 1st-party miniapps use these APIs can help us comprehend the entire ecosystem. Second, if 3rd-party developers know about these APIs, they may use them, which can lead to security issues if these APIs have access to sensitive resources. To analyze the usage of undocumented APIs in 1st-party miniapps, we searched for interfaces provided by host apps and collected 236 miniapps from WeChat and WeCom, 340 miniapps from Baidu, and 24 miniapps from QQ. We could not find information about the 1st-party miniapps of TikTok, so we did not report their API usage. We could not scan all 3rd-party miniapps because there is no public dataset or crawler available. Therefore, we can only measure the usage of hidden APIs among 3rd-party miniapps within the WeChat ecosystem. We collected 267,359 miniapps using Mini-Crawler [38] within 3 weeks.

The Testing Environment. We performed our static analysis on one laptop, which has 6 cores, Intel Core i7-10850H (4.90 GHz) CPUs and 64 GB RAM, and our dynamic analysis on a Google Pixel 4 running Android 11 and a Google Pixel 2 running Android 9, since we particularly focused on the Android version of miniapps.

6.2 Effectiveness

The effectiveness evaluation aims to quantify how APIScope uncovered the hidden APIs in terms of the specific numbers for the involved analysis (which are presented in Table 2), and their quality (i.e., whether there are any false positives). It is worth noting that the manually created cases are indeed rare. For example, for Baidu, we automatically created 423 test cases, and created another 56 test cases manually, so the manual efforts are around 11%, i.e., 56/(56+423) = 0.11. Other super apps require even less manual effort than Baidu (e.g., WeCom has 2.9% manual efforts).

Specifically, the effectiveness of our static analysis is measured by the identification of API invariants and the number of identified API candidates (i.e., the functions that are very likely to be APIs). However, whether those API candidates are really APIs is determined in dynamic API classification. For the API invariants, while we have listed four invariants in §5.1, not all of them exist in all super apps (e.g., Baidu and QQ do not have the caller invariant), as shown in Table 2. That is why APIScope aggressively identifies as many invariants as possible. With these invariants, it sufficiently recognizes the undocumented APIs even though some of them do not exist in other super apps. During static API recognition, APIScope recognized in total 1,829 API candidates for these super apps. Among them, WeCom contains the most hidden API candidates (683), followed by WeChat (containing 575 API candidates). TikTok has fewer API candidates (i.e., 124 API candidates), likely because it has the fewest LoC among the tested super apps.

The effectiveness of dynamic analysis is measured by the number of traced functions during API invocation identification and the number of test cases used during API classification. Among the test cases, we also quantify the number of automatically generated test cases and manually created test cases. We can see that most of the test cases are automatically generated by our test case generation algorithm, and the number of automatically generated test cases is greater than the number of API candidates due to the parameter order permutation (as discussed in §5.2). With our dynamic classification for the identified APIs, APIScope detected a large number of hidden APIs, many of which are unchecked (as reported in Table 2). WeChat has more APIs (590 public APIs, 502 undocumented unchecked APIs, and 65 undocumented checked APIs) than the other super apps. However, TikTok has a relatively small number of APIs (383 public APIs, 120 undocumented unchecked APIs, and 2 undocumented checked APIs). With respect to the percentage of undocumented unchecked and checked APIs, WeCom has the most undocumented unchecked APIs (46.3%) and undocumented checked APIs (6.4%).

Correctness of Our Result. We quantify whether there are any false positives or false negatives for the identified hidden APIs. First, a false positive here means that the identified API is not hidden, or is not an API. By design, APIScope will not have false positives for two reasons: (1) the invariants we extracted have very strict patterns (they have to exist among all public APIs and all of them have to be present in the undocumented APIs), and (2) our dynamic probing for API classification can filter out the non-APIs, which eliminates potential false positives. Nevertheless, we still thoroughly scrutinized each API identified for WeChat by conducting a manual check to ensure that there were no false positives. In other words, we made sure that the tool did not mistakenly classify non-APIs as APIs. Thanks to our design, we did not come across any false positives during our examination. Second, with respect to false negatives (i.e., a “true” hidden API is missed by APIScope), we note that theoretically APIScope could have false negatives, for instance, if our invariants are too strong. However, we are not able to quantify this, since we do not have the ground truth unless we manually examine each line of code. Therefore, we leave this to future work.

Categories of the Identified APIs. With the identified APIs, we can then obtain some insights, such as which category contains more hidden APIs. To this end, we manually walked through each API and categorized the undocumented (i.e., hidden) APIs based on the categories of the documented ones. This result is presented in Table 3. Interestingly, we found that most of the categories contain undocumented unchecked APIs. In particular, for some of the super apps (e.g., WeChat), the undocumented unchecked APIs can even outnumber the documented APIs in some of the categories (e.g., the API category Payment has 28 undocumented APIs, which is far more than its documented APIs). Finally, we found that some well-documented APIs of a specific super app may not be open to the public in other super apps. For example, getUserInfo is an undocumented API of Baidu, while WeChat has the same API with the same functionalities,
This is a preprint of our CCS 2023 paper. Chao Wang, Yue Zhang, and Zhiqiang Lin

Column groups: Input (Name, Size, LoC, Public API); Static Analysis (API Invariants Identification, Hidden API Candidates); Dynamic Analysis (Invocation, API Classification test cases); Output (Checked, Unchecked, Non API).

Name | Size (MBs) | # of LoC | # of Public API | Method Signature | Super Class | Super Package | Callers | # of Hidden API Candidates | # of Traced Functions | # of Auto Generated Test Cases | # of Manually Created Test Cases | # of Checked API | # of Unchecked API | # of Non API
Baidu | 123.6 | 2,005,003 | 464 | ✓ | ✓ | ✓ | ✗ | 143 | 30 | 423 | 56 | 25 | 113 | 5
QQ | 138.6 | 1,557,805 | 506 | ✓ | ✓ | ✓ | ✗ | 304 | 43 | 1,083 | 61 | 6 | 295 | 3
TikTok | 6.2 | 718,395 | 383 | ✓ | ✓ | ✓ | ✓ | 124 | 37 | 352 | 53 | 2 | 122 | 0
WeChat | 199.2 | 1,609,650 | 590 | ✓ | ✓ | ✓ | ✓ | 575 | 28 | 2,184 | 66 | 65 | 502 | 8
WeCom | 224.8 | 1,067,273 | 606 | ✓ | ✓ | ✓ | ✓ | 683 | 31 | 2,315 | 70 | 82 | 593 | 8

Table 2: Effectiveness of APIScope with the tested super apps. The terms “Signature”, “Super Class”, “Super Package”, and
“Callers” have consistent meanings with those defined in §5.1.
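
As a quick cross-check of Table 2, the following minimal sketch recomputes the share of manually created test cases and confirms that every hidden API candidate ends up in exactly one output class. The numbers are copied from Table 2; the dictionary layout and the function names are illustrative assumptions of ours, not part of APIScope.

```python
# Rows copied from Table 2:
# (candidates, auto-generated tests, manual tests, checked, unchecked, non-API)
TABLE2 = {
    "Baidu":  (143, 423, 56, 25, 113, 5),
    "QQ":     (304, 1083, 61, 6, 295, 3),
    "TikTok": (124, 352, 53, 2, 122, 0),
    "WeChat": (575, 2184, 66, 65, 502, 8),
    "WeCom":  (683, 2315, 70, 82, 593, 8),
}


def manual_share(auto: int, manual: int) -> float:
    """Fraction of all test cases that were written by hand."""
    return manual / (auto + manual)


def summarize(table):
    """Return (total candidates, {app: manual share}) after a consistency check."""
    total_candidates = 0
    shares = {}
    for app, (cand, auto, manual, checked, unchecked, non_api) in table.items():
        # Each candidate is classified as checked, unchecked, or non-API.
        assert checked + unchecked + non_api == cand, app
        total_candidates += cand
        shares[app] = manual_share(auto, manual)
    return total_candidates, shares


total, shares = summarize(TABLE2)
for app, share in shares.items():
    print(f"{app}: {share:.1%} of test cases created manually")
print("total hidden API candidates:", total)
```

With the Table 2 numbers, this gives roughly 11.7% for Baidu and 2.9% for WeCom, and the per-app candidate counts add up to the 1,829 candidates reported in the text.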
Available APIs | WeChat | WeCom | Baidu | TikTok | QQ
(columns per super app) D % UU % UC % | D % UU % UC % | D % UU % UC % | D % UU % UC % | D % UU % UC %
Basic 5 71.4 2 28.6 - 0.0 6 66.7 3 33.3 - 0.0 8 72.7 2 18.2 1 9.1 7 63.6 4 36.4 - 0.0 3 100.0 - 0.0 - 0.0
App 13 39.4 14 42.4 6 18.2 13 37.1 16 45.7 6 17.1 8 42.1 10 52.6 1 5.3 6 50.0 6 50.0 - 0.0 9 34.6 17 65.4 - 0.0
Base
Debug 15 88.2 2 11.8 - 0.0 15 88.2 2 11.8 - 0.0 1 3.3 28 93.3 1 3.3 - 0.0 - 0.0 - 0.0 20 100.0 - 0.0 - 0.0
Misc 10 58.8 7 41.2 - 0.0 10 55.6 8 44.4 - 0.0 9 100.0 - 0.0 - 0.0 10 52.6 9 47.4 - 0.0 9 100.0 - 0.0 - 0.0
Interaction 6 46.2 7 53.8 - 0.0 6 46.2 7 53.8 - 0.0 7 41.2 10 58.8 - 0.0 9 81.8 2 18.2 - 0.0 6 40.0 9 60.0 - 0.0
Navigation 4 44.4 5 55.6 - 0.0 4 40.0 6 60.0 - 0.0 4 100.0 - 0.0 - 0.0 5 100.0 - 0.0 - 0.0 4 33.3 8 66.7 - 0.0
UI Animation 32 100.0 - 0.0 - 0.0 32 100.0 - 0.0 - 0.0 21 95.5 1 4.5 - 0.0 1 100.0 - 0.0 - 0.0 31 100.0 - 0.0 - 0.0
WebView - 0.0 22 95.7 1 4.3 - 0.0 24 96.0 1 4.0 - 0.0 3 75.0 1 25.0 - 0.0 3 100.0 - 0.0 - 0.0 16 100.0 - 0.0
Misc 20 27.0 54 73.0 - 0.0 20 25.6 58 74.4 - 0.0 37 77.1 11 22.9 - 0.0 14 73.7 5 26.3 - 0.0 18 42.9 24 57.1 - 0.0
Request 5 55.6 4 44.4 - 0.0 5 55.6 4 44.4 - 0.0 2 66.7 1 33.3 - 0.0 6 60.0 4 40.0 - 0.0 4 66.7 2 33.3 - 0.0
Download 7 24.1 21 72.4 1 3.4 7 23.3 22 73.3 1 3.3 11 100.0 - 0.0 - 0.0 - 0.0 4 100.0 - 0.0 6 60.0 4 40.0 - 0.0
Network Upload 7 50.0 5 35.7 2 14.3 7 46.7 6 40.0 2 13.3 6 100.0 - 0.0 - 0.0 - 0.0 4 100.0 - 0.0 6 75.0 2 25.0 - 0.0
Websocket 14 93.3 1 6.7 - 0.0 14 93.3 1 6.7 - 0.0 13 100.0 - 0.0 - 0.0 7 77.8 2 22.2 - 0.0 13 86.7 2 13.3 - 0.0
Misc 23 88.5 3 11.5 - 0.0 23 85.2 4 14.8 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 10 55.6 8 44.4 - 0.0
Storage 10 66.7 5 33.3 - 0.0 10 66.7 5 33.3 - 0.0 10 100.0 - 0.0 - 0.0 10 90.9 1 9.1 - 0.0 10 83.3 2 16.7 - 0.0
Map 8 14.3 48 85.7 - 0.0 8 14.3 48 85.7 - 0.0 7 100.0 - 0.0 - 0.0 6 100.0 - 0.0 - 0.0 9 36.0 16 64.0 - 0.0
Image 6 60.0 4 40.0 - 0.0 6 60.0 4 40.0 - 0.0 6 85.7 1 14.3 - 0.0 5 83.3 1 16.7 - 0.0 6 60.0 4 40.0 - 0.0
Video 14 35.0 26 65.0 - 0.0 14 31.8 30 68.2 - 0.0 19 95.0 1 5.0 - 0.0 8 80.0 2 20.0 - 0.0 14 63.6 8 36.4 - 0.0
Audio 64 84.2 9 11.8 3 3.9 64 79.0 14 17.3 3 3.7 44 100.0 - 0.0 - 0.0 44 81.5 10 18.5 - 0.0 61 85.9 10 14.1 - 0.0
Media
Live 26 46.4 30 53.6 - 0.0 26 39.4 40 60.6 - 0.0 8 100.0 - 0.0 - 0.0 19 100.0 - 0.0 - 0.0 23 57.5 17 42.5 - 0.0
Recorder 16 84.2 3 15.8 - 0.0 16 84.2 3 15.8 - 0.0 12 100.0 - 0.0 - 0.0 11 91.7 1 8.3 - 0.0 15 88.2 2 11.8 - 0.0
Camera 9 60.0 6 40.0 - 0.0 9 52.9 8 47.1 - 0.0 9 50.0 9 50.0 - 0.0 20 95.2 1 4.8 - 0.0 4 36.4 7 63.6 - 0.0
Misc 12 75.0 3 18.8 1 6.3 12 75.0 3 18.8 1 6.3 18 100.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 6 100.0 - 0.0 - 0.0
Location 3 42.9 4 57.1 - 0.0 3 42.9 4 57.1 - 0.0 7 100.0 - 0.0 - 0.0 3 100.0 - 0.0 - 0.0 3 100.0 - 0.0 - 0.0
Share 4 33.3 7 58.3 1 8.3 4 16.7 19 79.2 1 4.2 3 100.0 - 0.0 - 0.0 5 71.4 2 28.6 - 0.0 5 35.7 9 64.3 - 0.0
Canvas 60 74.1 21 25.9 - 0.0 60 74.1 21 25.9 - 0.0 46 92.0 4 8.0 - 0.0 49 98.0 1 2.0 - 0.0 48 92.3 4 7.7 - 0.0
File 39 97.5 1 2.5 - 0.0 39 92.9 3 7.1 - 0.0 35 100.0 - 0.0 - 0.0 34 97.1 1 2.9 - 0.0 37 97.4 1 2.6 - 0.0
Login 2 100.0 - 0.0 - 0.0 5 83.3 1 16.7 - 0.0 3 42.9 1 14.3 3 42.9 2 100.0 - 0.0 - 0.0 2 100.0 - 0.0 - 0.0
Navigate 2 33.3 2 33.3 2 33.3 2 22.2 5 55.6 2 22.2 3 100.0 - 0.0 - 0.0 7 100.0 - 0.0 - 0.0 2 50.0 1 25.0 1 25.0
User Info 2 16.7 7 58.3 3 25.0 5 23.8 13 61.9 3 14.3 1 10.0 6 60.0 3 30.0 2 13.3 13 86.7 - 0.0 2 28.6 4 57.1 1 14.3
Open API Payment 1 3.4 13 44.8 15 51.7 1 3.2 15 48.4 15 48.4 1 50.0 - 0.0 1 50.0 1 33.3 1 33.3 1 33.3 2 22.2 7 77.8 - 0.0
Bio-Auth 3 27.3 3 27.3 5 45.5 3 21.4 6 42.9 5 35.7 - 0.0 - 0.0 - 0.0 - 0.0 1 100.0 - 0.0 3 100.0 - 0.0 - 0.0
Enterprise - 0.0 1 100.0 - 0.0 5 17.9 6 21.4 17 60.7 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0
Misc 14 19.4 42 58.3 16 22.2 14 16.7 54 64.3 16 19.0 16 57.1 2 7.1 10 35.7 25 55.6 20 44.4 - 0.0 12 13.0 78 84.8 2 2.2
Wi-Fi 9 100.0 - 0.0 - 0.0 9 100.0 - 0.0 - 0.0 10 100.0 - 0.0 - 0.0 4 100.0 - 0.0 - 0.0 9 100.0 - 0.0 - 0.0
Bluetooth 18 60.0 11 36.7 1 3.3 18 58.1 12 38.7 1 3.2 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 18 100.0 - 0.0 - 0.0
Contact 1 10.0 5 50.0 4 40.0 1 9.1 6 54.5 4 36.4 1 33.3 2 66.7 - 0.0 - 0.0 - 0.0 - 0.0 1 25.0 2 50.0 1 25.0
Device NFC 5 26.3 14 73.7 - 0.0 9 39.1 14 60.9 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 5 100.0 - 0.0 - 0.0
Screen 4 36.4 6 54.5 1 9.1 4 36.4 6 54.5 1 9.1 3 100.0 - 0.0 - 0.0 9 100.0 - 0.0 - 0.0 4 100.0 - 0.0 - 0.0
Phone 1 4.3 21 91.3 1 4.3 1 4.3 21 91.3 1 4.3 1 100.0 - 0.0 - 0.0 1 100.0 - 0.0 - 0.0 1 50.0 1 50.0 - 0.0
Misc 28 63.6 15 34.1 1 2.3 28 59.6 18 38.3 1 2.1 21 80.8 5 19.2 - 0.0 16 69.6 7 30.4 - 0.0 28 82.4 6 17.6 - 0.0
CV 19 100.0 - 0.0 - 0.0 19 100.0 - 0.0 - 0.0 18 90.0 2 10.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0
AI
Misc - 0.0 - 0.0 - 0.0 - 0.0 1 100.0 - 0.0 11 100.0 - 0.0 - 0.0 7 100.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0
AD 19 95.0 1 5.0 - 0.0 19 95.0 1 5.0 - 0.0 9 64.3 4 28.6 1 7.1 13 61.9 8 38.1 - 0.0 3 25.0 9 75.0 - 0.0
Uncategorized 30 38.5 47 60.3 1 1.3 30 36.6 51 62.2 1 1.2 15 53.6 10 35.7 3 10.7 17 68.0 7 28.0 1 4.0 34 68.0 15 30.0 1 2.0
All 590 51.0 502 43.4 65 5.6 606 47.3 593 46.3 82 6.4 464 77.1 113 18.8 25 4.2 383 75.8 120 23.8 2 0.4 506 62.7 295 36.6 6 0.7

Table 3: Categories of Documented and Undocumented APIs. “D” means documented APIs; “UU” means undocumented
unchecked APIs; “UC” means undocumented checked APIs.
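
The percentage columns of Table 3 are the share of each class (D, UU, UC) within one category of one super app. The sketch below, which assumes one-decimal rounding and uses a helper name of our own choosing, reproduces the WeChat "All" row and the WeChat Payment row from their raw counts.

```python
from typing import Tuple


def class_shares(documented: int, unchecked: int, checked: int) -> Tuple[float, float, float]:
    """Return (D %, UU %, UC %) for one category, rounded to one decimal."""
    total = documented + unchecked + checked
    if total == 0:
        # Categories without any API are printed as 0.0 in the table.
        return (0.0, 0.0, 0.0)
    return tuple(round(100 * n / total, 1) for n in (documented, unchecked, checked))


# Counts copied from Table 3 (WeChat columns).
wechat_all = class_shares(590, 502, 65)
wechat_payment = class_shares(1, 13, 15)

print("WeChat, All:    ", wechat_all)
print("WeChat, Payment:", wechat_payment)
# Undocumented Payment APIs (unchecked + checked) versus documented ones.
print("Payment undocumented vs documented:", 13 + 15, "vs", 1)
```

The Payment row illustrates the observation in the text: 28 undocumented APIs (13 unchecked and 15 checked) against a single documented one.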

which is publicly accessible. Finally, since APIScope is a systematic and mostly automated tool, it can inspect API changes based on previous versions of the super app implementations as long as we can obtain both their APKs and documentation. We have a detailed evaluation of these API changes in Appendix-§C for interested readers.
Usage of Hidden APIs (Among the 1st-party Miniapps). We obtained many 1st-party miniapps and classified them into categories based on their meta-data. From the data in Table 4, we found that the use of undocumented APIs is common among 1st-party
This is a preprint of our CCS 2023 paper. Submission to ACM CCS 2023, 2023

miniapps regardless of their category. WeCom had the highest percentage of 1st-party miniapps using undocumented APIs at 38.1%, followed by QQ at 37.5%, WeChat at 36.9%, and Baidu at 34.7%. We also observed that 1st-party miniapps in the Travelling, Shopping, and Finance categories were more likely to use undocumented APIs, and these APIs were often related to payment. For example, many miniapps in these categories would use the undocumented API verifyPaymentPassword to verify payment passwords.
Next, we sought to understand the most popular undocumented APIs and how often they are used by 1st-party miniapps. We grouped the APIs by name and counted the number of miniapps that used each API. This information is presented in Table 5. We found that 7 undocumented APIs provided by Baidu were used by their 1st-party miniapps, 34 undocumented APIs provided by WeChat were used by their 1st-party miniapps (only 19 of which are listed in Table 5 due to space constraints), 43 undocumented APIs provided by WeCom were used by their 1st-party miniapps (again, only those used by more than two miniapps are shown), and 14 undocumented APIs provided by QQ were used by their 1st-party miniapps.
Finally, we present whether there are any missing security checks for these undocumented APIs from our API classification result in the last column of Table 5. We found that 3 out of 7 (42.9%) APIs used by Baidu’s 1st-party miniapps do not have security checks and can be invoked and exploited by 3rd-party miniapps; 16 of 34 (47.06%) APIs of WeChat; 22 of 43 (51.16%) APIs of WeCom; and 12 of 14 (85.7%) APIs of QQ can be exploited by 3rd-party miniapps. We also noticed that different vendors have different security restrictions on their undocumented APIs. For example, WeChat and WeCom place security checks on their undocumented APIs that are related to payment (wx.requestVirtualPayment), authentication (wx.startFacialRecognitionVerify) and access to resources (wx.openUrl).

Category | WeChat # U | # App | % | WeCom # U | # App | % | Baidu # U | # App | % | QQ # U | # App | %
Business | 14 | 49 | 28.6 | 16 | 49 | 32.7 | 21 | 38 | 55.3 | 1 | 3 | 33.3
Education | 6 | 26 | 23.1 | 7 | 26 | 26.9 | 5 | 16 | 31.3 | - | 3 | 0.0
E-learning | 5 | 9 | 55.6 | 5 | 9 | 55.6 | 12 | 33 | 36.4 | - | 1 | 0.0
Entertainment | 9 | 17 | 52.9 | 9 | 17 | 52.9 | 29 | 75 | 38.7 | 2 | 2 | 100.0
Finance | 1 | 1 | 100.0 | 1 | 1 | 100.0 | 21 | 23 | 91.3 | - | - | 0.0
Food | - | - | 0.0 | - | - | 0.0 | - | 5 | 0.0 | - | - | 0.0
Games | 18 | 36 | 50.0 | 18 | 36 | 50.0 | - | - | 0.0 | - | - | 0.0
Government | 2 | 7 | 28.6 | 2 | 7 | 28.6 | 3 | 8 | 37.5 | 1 | 1 | 100.0
Health | 2 | 7 | 28.6 | 2 | 7 | 28.6 | 1 | 5 | 20.0 | - | 1 | 0.0
Job | - | 1 | 0.0 | - | 1 | 0.0 | - | - | 0.0 | - | - | 0.0
Lifestyle | 2 | 5 | 40.0 | 2 | 5 | 40.0 | 3 | 15 | 20.0 | - | 1 | 0.0
Photo | 3 | 7 | 42.9 | 3 | 7 | 42.9 | - | - | 0.0 | - | - | 0.0
Shopping | 1 | 1 | 100.0 | 1 | 1 | 100.0 | - | 2 | 0.0 | - | - | 0.0
Social | 4 | 8 | 50.0 | 4 | 8 | 50.0 | 1 | 4 | 25.0 | - | 1 | 0.0
Sports | - | - | 0.0 | - | - | 0.0 | - | 1 | 0.0 | - | - | 0.0
Tool | 15 | 55 | 27.3 | 15 | 55 | 27.3 | 16 | 47 | 34.0 | 4 | 8 | 50.0
Traffic | 3 | 5 | 60.0 | 3 | 5 | 60.0 | 4 | 10 | 40.0 | - | 1 | 0.0
Travelling | 2 | 2 | 100.0 | 2 | 2 | 100.0 | 1 | 56 | 1.8 | 1 | 2 | 50.0
Uncategorized | - | - | 0.0 | - | - | 0.0 | 1 | 2 | 50.0 | - | - | 0.0
Total | 87 | 236 | 36.9 | 90 | 236 | 38.1 | 118 | 340 | 34.7 | 9 | 24 | 37.5

Table 4: The 1st party miniapps that have used the undocumented APIs. The first column indicates the number of 1st-party mini-apps using undocumented APIs, and the second column represents the total number of 1st-party mini-apps. We calculate the percentage of mini-apps by using the first column divided by the second.

API Name | Category | # App | % | *App w/ Check
Baidu
swan.button | Interaction | 104 | 88.14 | ✗
swan.login | Login | 31 | 26.27 | ✓
swan.postMessage | Uncategorized | 8 | 6.78 | ✗
swan.getBDUSS | User Info | 4 | 3.39 | ✓
swan.getCommonSysInfo | System | 3 | 2.54 | ✓
swan.getUserInfo | User Info | 3 | 2.54 | ✗
swan.getChannelID | Uncategorized | 2 | 1.69 | ✓
WeChat
wx.hideNavigationBar | Bar | 28 | 32.18 | ✗
wx.requestSubscribeMessage | Subscribe | 25 | 28.74 | ✗
wx.showNavigationBar | Bar | 23 | 26.44 | ✗
wx.requestVirtualPayment | Payment | 11 | 12.64 | ✓
wx.openUrl | Misc | 8 | 9.20 | ✓
wx.hideHomeButton | Interaction | 8 | 9.20 | ✗
wx.enterContact | Contact | 5 | 5.75 | ✓
wx.drawCanvas | Canvas | 5 | 5.75 | ✗
wx.setPageOrientation | Misc | 4 | 4.60 | ✗
wx.operateWXData | Misc | 4 | 4.60 | ✗
wx.getBackgroundFetchData | Misc | 3 | 3.45 | ✗
wx.setBackgroundFetchToken | Misc | 3 | 3.45 | ✗
wx.startFacialRecognitionVerify | Bio-Auth | 3 | 3.45 | ✓
wx.checkIsSupportFacialRecognition | Bio-Auth | 2 | 2.30 | ✓
wx.navigateBackApplication | Navigate | 2 | 2.30 | ✗
wx.navigateBackNative | Navigate | 2 | 2.30 | ✓
wx.onDeviceOrientationChange | Device | 2 | 2.30 | ✗
wx.openBusinessView | View | 2 | 2.30 | ✗
wx.verifyPaymentPassword | Payment | 2 | 2.30 | ✓
WeCom
wx.hideNavigationBar | Bar | 28 | 31.11 | ✗
wx.requestSubscribeMessage | Subscribe | 25 | 27.78 | ✗
wx.showNavigationBar | Bar | 23 | 25.56 | ✗
wx.requestVirtualPayment | Payment | 11 | 12.22 | ✓
wx.openUrl | Misc | 8 | 8.89 | ✓
wx.hideHomeButton | Interaction | 8 | 8.89 | ✗
wx.enterContact | Contact | 5 | 5.56 | ✓
wx.drawCanvas | Canvas | 5 | 5.56 | ✗
wx.setPageOrientation | Misc | 4 | 4.44 | ✗
wx.operateWXData | Misc | 4 | 4.44 | ✗
wx.getBackgroundFetchData | Misc | 3 | 3.33 | ✗
wx.setBackgroundFetchToken | Misc | 3 | 3.33 | ✗
wx.startFacialRecognitionVerify | Bio-Auth | 3 | 3.33 | ✓
wx.checkIsSupportFacialRecognition | Bio-Auth | 2 | 2.22 | ✓
wx.navigateBackApplication | Navigate | 2 | 2.22 | ✗
wx.navigateBackNative | Navigate | 2 | 2.22 | ✓
wx.openBusinessView | Misc | 2 | 2.22 | ✗
wx.qy.chooseAttach | File | 2 | 2.22 | ✓
wx.qy.chooseWxworkContact | Enterprise | 2 | 2.22 | ✓
wx.qy.chooseWxworkVisibleRange | Enterprise | 2 | 2.22 | ✓
wx.qy.openWechatWebviewUrl | WebView | 2 | 2.22 | ✗
wx.qy.postNotification | System | 2 | 2.22 | ✓
wx.qy.showUserProfile | User Info | 2 | 2.22 | ✓
wx.qy.wwLog | Uncategorized | 2 | 2.22 | ✗
wx.qy.wwOpenUrlScheme | Uncategorized | 2 | 2.22 | ✓
wx.verifyPaymentPassword | Payment | 2 | 2.22 | ✓
QQ
qq.openUrl | Misc | 4 | 44.44 | ✗
qq.addRecentColorSign | UI | 3 | 33.33 | ✗
qq.exitMiniProgram | App | 2 | 22.22 | ✗
qq.getGroupInfo | User Info | 2 | 22.22 | ✗
qq.getGroupInfoExtra | User Info | 2 | 22.22 | ✗
qq.getPerformance | System | 1 | 11.11 | ✗
qq.getQua | Uncategorized | 1 | 11.11 | ✗
qq.getUserInfoExtra | User Info | 1 | 11.11 | ✗
qq.invokeNativePlugin | System | 1 | 11.11 | ✓
qq.notifyNative | System | 1 | 11.11 | ✗
qq.openScheme | Misc | 1 | 11.11 | ✓
qq.requestMidasPayment | Payment | 1 | 11.11 | ✗
qq.toggleSecureWindow | UI | 1 | 11.11 | ✗
qq.wnsRequest | App | 1 | 11.11 | ✗

Table 5: The popular hidden APIs invoked by the 1st-party miniapps.

Usage of Hidden APIs (Among the 3rd-party Miniapps). Based on the data presented in Table 6, we have discovered that the utilization of undocumented APIs is widespread among 3rd-party miniapps, regardless of their category. The percentage of 3rd-party miniapps employing undocumented APIs is 29.54%. Our observations have further revealed that 3rd-party miniapps in the Shopping and Business categories are more inclined to use undocumented APIs, particularly those linked to sensitive operations like payment. In addition, we conducted an analysis to comprehend the most popular undocumented APIs and the frequency of their usage by

3rd-party miniapps. We categorized the APIs by name and tallied the number of miniapps that leveraged each API. We have found that 103 undocumented APIs provided by WeChat were utilized by their 3rd-party miniapps. Among these APIs, it is notable that 79 of them lack security checks. As shown in Table 7, we present a summary of undocumented APIs that have been utilized by over 50 mini-apps. It is evident that a majority of these hidden APIs lack proper security measures. To further understand the details, we delved into a selection of them to uncover why 3rd-party mini-apps have knowledge of them and whether they are being exploited. Our investigation has yielded some intriguing findings. (i) While some APIs are not publicly documented, Tencent does share them with certain vendors who work closely with them and permits these vendors to request access. An example of such an API is requestFacetoFacePayment [27] (which is used by 40,091 miniapps). (ii) There were some concealed APIs that were once freely available for use without any security checks. However, Tencent subsequently banned them. One such API is “openUrl” [23]. Interestingly, even though Tencent has banned the usage of this API, a whopping 17,140 miniapps have yet to remove the invocation of this API from their code (obviously, this will not work). This API had already been banned by Tencent prior to our report. (iii) There were still some APIs that remained usable until we notified Tencent of the issue. For example, captureScreen (12 miniapps used this API) can be utilized to obtain the user’s sensitive information (See §7.2).

Category | # U | # App | %
Business | 8,116 | 14,887 | 54.52
E-learning | 335 | 2,088 | 16.04
Education | 2,738 | 40,410 | 6.78
Entertainment | 1,286 | 5,258 | 24.46
Finance | 262 | 1,408 | 18.61
Food | 1,107 | 6,345 | 17.45
Games | 1,777 | 4,745 | 37.45
Government | 929 | 7,808 | 11.90
Health | 795 | 6,422 | 12.38
Job | 177 | 4,399 | 4.02
Lifestyle | 11,846 | 35,371 | 33.49
Photo | 136 | 1,981 | 6.87
Shopping | 44,629 | 46,202 | 96.60
Social | 217 | 5,694 | 3.81
Sports | 312 | 3,378 | 9.24
Tool | 3,423 | 72,301 | 4.73
Traffic | 580 | 6,502 | 8.92
Travelling | 309 | 2,160 | 14.31
Total | 78,974 | 267,359 | 29.54

Table 6: The 3rd party WeChat miniapps that have used the undocumented APIs.

API Name | Category | # App | % | *App w/ Check
wx.requestFacetoFacePayment | Payment | 40,091 | 14.98 | ✓
wx.operateWXData | Misc | 21,834 | 8.16 | ✗
wx.setPageOrientation | UI | 18,499 | 6.91 | ✗
wx.enterContact | Contact | 17,421 | 6.51 | ✓
wx.openUrl | Misc | 17,140 | 6.41 | ✓
wx.preloadWebview | WebView | 15,335 | 5.73 | ✓
wx.navigateBackNative | Navigate | 13,407 | 5.01 | ✓
wx.editTextWithPopForm | Misc | 13,390 | 5.00 | ✗
wx.openAddressWithLightMode | Address | 13,390 | 5.00 | ✗
wx.requestPersonalPay | Payment | 10,263 | 3.84 | ✗
wx.previewMedia | Media | 6,635 | 2.48 | ✗
wx.drawCanvas | Canvas | 6,055 | 2.26 | ✗
wx.openBusinessView | Misc | 3,800 | 1.42 | ✗
wx.onDeviceOrientationChange | Device | 1,626 | 0.61 | ✗
wx.startFacialRecognitionVerify | Bio-Auth | 1,239 | 0.46 | ✓
wx.checkIsSupportFacialRecognition | Bio-Auth | 669 | 0.25 | ✓
wx.notifyBLECharacteristicValueChanged | Bluetooth | 603 | 0.23 | ✗
wx.getBackgroundFetchData | Misc | 498 | 0.19 | ✗
wx.setBackgroundFetchToken | Misc | 485 | 0.18 | ✗
wx.startFacialRecognitionVerifyAndUploadVideo | Bio-Auth | 464 | 0.17 | ✓
wx.updateApp | Update | 448 | 0.17 | ✗
wx.openOfflinePayView | UI | 324 | 0.12 | ✓
wx.sendBizRedPacket | Payment | 212 | 0.08 | ✓
wx.getVideoInfo | Video | 193 | 0.07 | ✗
wx.compressVideo | Video | 148 | 0.06 | ✗
wx.setBLEMTU | Bluetooth | 127 | 0.05 | ✗
wx.getPhoneNumber | User Info | 122 | 0.05 | ✗
wx.openVideoEditor | Video | 118 | 0.04 | ✗
wx.chooseContact | Contact | 100 | 0.04 | ✗
wx.openChannelsLive | Misc | 97 | 0.04 | ✗
wx.openAddress | Address | 96 | 0.04 | ✗
wx.setMenuStyle | Menu | 74 | 0.03 | ✗

Table 7: The popular hidden APIs invoked by the 3rd-party WeChat miniapps.

7 EXPLOITING UNCHECKED HIDDEN APIS

7.1 Quantifying the Security Risks
Methodology. After quantifying the number of unchecked undocumented APIs, our goal is to gain a better understanding of whether or not these APIs pose any security risks. While it is possible to manually analyze each API individually, it is not very practical or reliable, especially given the vast number of APIs we need to analyze (more than 1,500 APIs). However, our observation is that for an undocumented API to have potential security implications, it must be able to access sensitive information and resources on the Android system (e.g., location, files, and the internet). Therefore, if we find that the hidden API calls a native API, we can conclude that it has the potential to pose security risks. Otherwise, we can proceed to examine the implementation of each method within that hidden API, conducting the process recursively as needed.
However, not all invoked APIs manipulate sensitive resources within the Android system. For example, the android.graphics API offers graphics tools that allow developers to draw directly onto the screen. It is evident that invoking these APIs would not result in any security consequences. Therefore, we consider APIs that access resources protected by permissions (such as location, the Internet, and file system) to have security risks. Consequently, we opted to utilize a lightweight dynamic analysis approach to identify such APIs. Specifically, we hook all Android APIs that access sensitive resources, which are typically protected by Android permissions, and invoke unchecked undocumented APIs one by one. By monitoring whether the sensitive resource access APIs are invoked during this process, we can determine whether the undocumented APIs are implemented based on them. Furthermore, we are able to infer whether these APIs pose any security risks. While this approach may not uncover all the APIs, since the execution of the hidden APIs may depend on the parameters and may not trigger the underlying security-sensitive APIs, it can at least provide a lower bound.
Results. We categorize the hidden APIs by analyzing the Android APIs that utilize the resources and grouping them accordingly. As shown in Table 8, we have identified 39 APIs (7.77%) in WeChat, 40 APIs (6.75%) in WeCom, 8 APIs (7.08%) in Baidu, 32 APIs (26.67%) in Tiktok and 38 APIs (12.88%) in QQ that invoke Android APIs that are protected by permissions. It should be noted that WeChat and WeCom have the most APIs that can access sensitive resources, while Baidu has the least number of such APIs. This is likely due to the fact that these super apps require more Android permissions. To be more specific, WeChat requires 92 permissions, which is larger than

[Figure 8 plot: "API Usage by Super App"; y-axis: Number of Uses (ticks at 10^0, 10^1, 10^2); series: WeChat, WeCom, Baidu, TikTok, QQ; x-axis: Android API classes (AudioDeviceInfo, AudioManager, BluetoothAdapter, BluetoothDevice, BluetoothGatt, BluetoothGattCharacteristic, BluetoothGattService, BluetoothManager, Camera, ConnectivityManager, IpPrefix, LinkProperties, LocalServerSocket, LocationManager, MacAddress, MediaExtractor, MediaFormat, MediaMetadataRetriever, MediaPlayer, NdefMessage, NdefRecord, NetworkInfo, NetworkRequest, NfcAdapter, NsdManager, PackageManager, SharedPreferences, WifiConfiguration, WifiInfo, WifiManager, WifiNetworkSpecifier).]

Resource | WeChat # UUS | % | WeCom # UUS | % | Baidu # UUS | % | Tiktok # UUS | % | QQ # UUS | %
Bluetooth | 3 | 0.59 | 3 | 0.51 | - | - | - | - | - | -
Camera | 1 | 0.20 | 1 | 0.17 | - | - | - | - | 1 | 0.34
Location | - | - | - | - | - | - | - | - | 1 | 0.34
Media | 5 | 0.96 | 5 | 0.84 | - | - | 11 | 9.17 | 11 | 3.73
NFC | 3 | 0.59 | 3 | 0.51 | - | - | - | - | - | -
Network | 16 | 3.19 | 16 | 2.70 | 7 | 6.19 | 20 | 16.67 | 24 | 8.14
Package | 3 | 0.59 | 4 | 0.67 | 1 | 0.88 | - | - | 1 | 0.34
Storage | 25 | 4.98 | 26 | 4.38 | 3 | 2.65 | 2 | 1.67 | 8 | 2.71
Telephony | - | - | - | - | - | - | 1 | 0.83 | - | -
Total | 39 | 7.77 | 40 | 6.75 | 8 | 7.08 | 32 | 26.67 | 38 | 12.88

Table 8: The sensitive resources that undocumented unchecked APIs accessed. UUS means undocumented unchecked sensitive APIs. Please note that a single hidden API may have access to multiple types of resources. Therefore, the total number of hidden APIs may not be equal to the sum of all the APIs that have been identified for each individual resource type.
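
The lightweight dynamic analysis behind Table 8 (hook the sensitive Android APIs, invoke each unchecked hidden API once, and record which hooks fire) can be illustrated with a small simulation. This is only a sketch under our own assumptions: the hooked Android calls are represented as plain strings, the three hidden APIs are toy stand-ins, and `HookRecorder` and `classify` are hypothetical names, not APIScope's implementation.

```python
from collections import defaultdict

# Assumed examples of hooked, permission-protected Android APIs,
# mapped to the resource type they guard (illustrative only).
SENSITIVE = {
    "LocationManager.getLastKnownLocation": "Location",
    "TelephonyManager.getLine1Number": "Telephony",
    "SharedPreferences.edit": "Storage",
    "ConnectivityManager.getActiveNetwork": "Network",
}


class HookRecorder:
    """Stands in for the hooked runtime; remembers which sensitive APIs fired."""

    def __init__(self):
        self.hits = set()

    def call(self, android_api: str) -> None:
        if android_api in SENSITIVE:
            self.hits.add(android_api)


# Toy hidden APIs: each receives the recorder instead of the real runtime.
def get_local_phone_number(rt):
    rt.call("TelephonyManager.getLine1Number")


def save_and_sync(rt):
    rt.call("SharedPreferences.edit")
    rt.call("ConnectivityManager.getActiveNetwork")


def draw_badge(rt):
    rt.call("Canvas.drawText")  # not permission-protected, so not hooked


def classify(hidden_apis):
    """Invoke hidden APIs one by one and group them by touched resource type."""
    per_resource = defaultdict(set)
    sensitive = set()
    for name, fn in hidden_apis.items():
        recorder = HookRecorder()
        try:
            fn(recorder)
        except Exception:
            continue  # a failed invocation is simply not counted (lower bound)
        for api in recorder.hits:
            per_resource[SENSITIVE[api]].add(name)
            sensitive.add(name)
    return sensitive, per_resource


sensitive, per_resource = classify({
    "getLocalPhoneNumber": get_local_phone_number,
    "saveAndSync": save_and_sync,
    "drawBadge": draw_badge,
})
print(sorted(sensitive))
print({res: sorted(apis) for res, apis in per_resource.items()})
```

In this toy run, two hidden APIs are flagged as sensitive, but the per-resource counts add up to three because `saveAndSync` touches both Storage and Network. This is the same effect the Table 8 caption describes: the "Total" row can be smaller than the sum of the per-resource rows.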

that of Baidu (82). These accessed sensitive resources include camera, location, audio, and Internet. It is important to note that a hidden API accessing a sensitive resource does not necessarily mean that it can do so without requiring permission. Specifically, in addition to the resources that are safeguarded by Android permissions, we also include SharedPreferences in our checklist. This is because miniapps may utilize this Android API to store files in the space belonging to the super apps, which could potentially compromise the files of both the super apps and other apps.
Next, our objective is to understand the Android APIs utilized by the undocumented APIs. For this purpose, we count the number of Android APIs invoked by each hidden API of the super app, and classify them based on the names of the corresponding Android API packages. We exclude the API packages that are invoked only once. It can be observed from Figure 8 that the API most commonly used is SharedPreferences. This is reasonable, as many of the APIs involve file operations. The available APIs consist of those dedicated to saving screenshots onto disks, which can be utilized to launch A3. Besides file access APIs, numerous hidden APIs make use of Internet access APIs for different purposes, including payment processing, network resource access, and more. The currently available APIs comprise those responsible for website access, which can be leveraged to trigger A1, APIs created for APK downloading and installation, which can be utilized to launch A2, and APIs for querying contact information, which can be employed to initiate A5. Please note that there are also APIs that access NFC, Camera, and Telephony Manager (which can be used to launch A4). However, since they have only been invoked once, we have excluded them from the figure.

Figure 8: Android APIs used by the hidden APIs from different companies.

7.2 Attack Case Studies
We present a few case studies to demonstrate how we can exploit those hidden unchecked (i.e., unprotected) APIs. For proof of concept, we present five case studies ranging from arbitrary webpage access to information theft, as shown in Table 9.

Attacks | Targeted Resources | Exploited APIs | Vulnerable Super Apps
A1 | Web Resources | private_openUrl, openUrl, postMessage | WeChat, WeCom, QQ, Baidu
A2 | Web Resources | installDownloadTask, addDownloadTaskStraight, startDownloadAppTask, installApp | WeChat, WeCom, QQ
A3 | User information | captureScreen | WeChat, WeCom
A4 | User phone number | getLocalPhoneNumber | Tiktok
A5 | User contacts | searchContacts | WeChat

Table 9: Summary of the attacks we tested

(A1) Arbitrary Web Page Access. We made a malicious miniapp that can open any webpage using the hidden API private_openUrl. Super apps usually have an allowlist of approved domains to prevent users from accessing untrusted sources (i.e., miniapps usually utilize the official API wx.request to access websites, and any network requests made through this API will be thoroughly vetted), but our malware can bypass these restrictions and navigate to any webpage without being vetted. This vulnerability allows our miniapp to open phishing websites and steal sensitive information, which is more powerful than previous phishing attacks [25]. We were successful in this attack on several super apps but could not test it on TikTok because it does not have the necessary APIs. This vulnerability is a significant security risk for super apps because they have a unique threat model that differs from web browsers. Super apps only allow access to specific domains, unlike web browsers that can access any website. This vulnerability has been confirmed as a high-severity vulnerability by Tencent.
(A2) Malware Download and Installation. We developed a malicious miniapp that can download and install malware using the APIs installDownloadTask or addDownloadTaskStraight. Regular miniapps cannot download or install APK files on a mobile device because they have limited capabilities and can only download certain file types from specific servers. However, by using these APIs, a miniapp can download and install harmful APKs, which can cause significant damage to the user’s mobile security and privacy. This attack works on both WeChat and WeCom. Finally, although APKs cannot be installed without the user’s consent, a miniapp is

running inside the super app, and as long as the super app has the installation permission (which most users will grant because they trust super apps), the malicious miniapp can install arbitrary APKs.
(A3) Screenshot-based Information Theft. We made a malicious miniapp that uses captureScreen to secretly take screenshots and store them without the user’s permission. This could be used by attackers to steal sensitive information like passwords and credit card numbers from the user’s screen. The consequences of this kind of attack are serious. For example, the attacker could use the screenshots to steal the victim’s identity and open fake accounts or make illegal purchases. They could also use the screenshots to commit financial fraud by stealing the victim’s credit card.
(A4) Phone Number Theft. Malicious miniapps may use getLocalPhoneNumber to illicitly obtain the user’s phone number. The hidden API is implemented by getLine1Number, which is a built-in feature of the Android SDK intended to provide the phone number associated with the SIM card currently inserted in the device. Nevertheless, access to phone number information from the SIM card may be blocked or restricted by some carriers or manufacturers, thereby rendering this attack unsuccessful in certain cases.
(A5) Contact Information Theft. A miniapp can potentially access sensitive information, such as the friend list (including usernames and WeChat IDs), using searchContacts. Our experiments were conducted primarily in 2021, during which we found that this hidden API was still functional based on our raw results. Upon reporting the issue to WeChat, we were informed that another group had already reported the problem to them (CVE-2021-40180 [32]), and that the exploit no longer works on the new version of WeChat.

8 DISCUSSION
Limitations and Future Work. Although effective, APIScope can still be improved in various ways. It is possible for the tool to have false positives and negatives, although none have been encountered through dynamic validation and manual verification. Also, while currently tested on Android, additional work is needed to support other platforms. However, our findings are representative across different platforms, as miniapp codebases are similar. Note that APIScope is limited to super-apps that use the V8 engine and is not suitable for those that do not (e.g., Alipay).
In our study, we discovered some hidden APIs that may be vulnerable, such as the installDownloadTask and addDownloadTaskStraight APIs, which are susceptible to SQL injection attacks. Attackers can compromise super app file download tasks by replacing the download URL of the WeChat update package with a malicious one. We also noticed that there are two APIs called dumpHeapSnapshot and HeapProfiler that also have vulnerabilities. These APIs are designed to save data from the V8 engine to a file, but our miniapp misuses them to write to any file it wants. While Android tries to prevent this, important files like chat histories are still at risk. This could lead to serious problems because our miniapp could overwrite important files of other miniapps and their host apps, which breaks the security measures put in place by super apps. Our experiment proved that we could overwrite a file called EnMicroMsg.db, which stores chat history on WeChat. Attackers might want to make these miniapps because chat history can be used as evidence in court. We plan to develop a tool that can identify hidden API vulnerabilities (e.g., SQL injection and buffer overflow).
Ethics and Responsible Disclosure. Because this work is offensive in nature, we must carefully address the ethical concerns. To this end, we have followed the community practice when exploiting the vulnerabilities and demonstrating our attacks. First, for proof of concept, we developed quite a number of malicious miniapps and launched attacks against our own accounts and devices. We have never uploaded our malicious miniapps onto the markets to harm other users. Second, we disclosed the vulnerabilities and our attacks against WeChat to Tencent in September 2021, and to the other four super apps in November 2021. They have all acknowledged and confirmed our findings, and so far among them Tencent (the biggest super app vendor with 1.2 billion monthly users) has confirmed 4 vulnerabilities (ranked 1 low, 2 medium, and 1 high), awarded us a bug bounty, and fixed them. TikTok has been patched too, but not Baidu at the time of writing.

9 RELATED WORK
Super Apps Security. More and more super apps have started to support the miniapp paradigm. Correspondingly, its security has received increasing attention. For instance, Lu et al. [25] identified multiple flaws in WeChat, and demonstrated how an attacker would be able to launch phishing attacks against mobile users and collect sensitive data from the host apps. Zhang et al. [38] developed a crawler, and studied the super apps by measuring the programming practices of the provided miniapps, including how often the miniapp code is obfuscated. Most recently, Zhang et al. [37] studied the identity confusion in WebView-based super apps, and identified that multiple super apps contain this vulnerability. A new attack named cross-miniapp request forgery (CMRF) [36] was also recently discovered, which exploits the missing checks of miniapp IDs for various attacks. Differently from those works, our study uncovers the undocumented APIs provided by the super apps and demonstrates how they can be exploited. In a broader scope, there is a large body of research studying the security of other super apps including web browsers and their lightweight apps, such as Google Instant Apps [11]. In particular, Aonzo et al. [11] and Tang et al. [31] point out that Google Instant Apps can be abused to mount password-stealing attacks.
Undocumented API Detection and Exploitation. APIScope is the first system to detect and exploit undocumented APIs in mobile super apps like WeChat. Previous work has focused on detecting undocumented APIs in other platforms, such as Android and iOS, or on identifying missing security checks (e.g., [10, 15, 19, 24, 28, 29, 39]). For example, PScout analyzed undocumented APIs in Android [12], and Li et al. showed that there are 17 undocumented Android APIs that are widely accessed by 3rd-party apps [20]. Zeinab and Yousra studied access control vulnerabilities caused by residual APIs [22]. In addition, there are ways to invoke undocumented APIs in iOS [17, 34] and detect their abuses [14]. Yang et al. [35] proposed BridgeScope to identify sensitive JavaScript bridge APIs in hybrid apps. Undocumented APIs have also been found in the Java language and exploited by attackers [18, 26]. APIScope builds
This is a preprint of our CCS 2023 paper. Submission to ACM CCS 2023, 2023

on this previous work to specifically focus on mobile super-apps. Finding hidden APIs in super apps using traditional techniques is difficult due to the combination of web views, host native apps, and mini app execution environments, along with code scattering and obfuscation. Our new approach monitors parameter propagation to detect API usage, using robust signatures based on super classnames and public methods. We have also created a method for automatic test case generation and API classification.

10 CONCLUSION

In this paper, we have revealed that super apps often contain undocumented and unchecked APIs for their 1st-party mini-apps, which can grant elevated privileges such as APK downloading, arbitrary web view accessing, and sensitive information querying. Unfortunately, these undocumented APIs can be exploited by malicious 3rd-party mini-apps, as they lack security checks. To address this issue, we have designed and implemented APIScope, a tool that can statically identify these undocumented APIs and dynamically verify their exploitability. Through our testing on five popular super apps such as WeChat and TikTok, we have found that all of them contain these types of APIs. Our findings suggest that super app vendors must thoroughly examine and take caution with their privileged APIs to prevent them from becoming potential exploit points.

REFERENCES

[1] “6 powerful wechat statistics you need to know in 2022,” https://brewinteractive.com/wechat-statistics/, (Accessed on 12/30/2022).
[2] “Google play store: number of apps 2022 | statista,” https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/, (Accessed on 12/27/2022).
[3] “Soot:a framework for analyzing and transforming java and android applications,” http://soot-oss.github.io/soot/, (Accessed on 12/30/2022).
[4] “Tencent app,” https://www.nbd.com.cn/articles/2022-12-01/2576229.html.
[5] “Tiktok - make your day,” https://www.tiktok.com/, (Accessed on 12/30/2022).
[6] “Wechat mini programs showcases new capabilities to celebrate its third anniversary,” https://www.tencent.com/en-us/articles/2200946.html.
[7] “What are wechat mini-programs? a simple introduction - walkthechat,” https://walkthechat.com/wechat-mini-programs-simple-introduction/, (Accessed on 12/30/2022).
[8] “WeChat Chinese Documentation,” https://developers.weixin.qq.com/miniprogram/en/dev/api/, 04 2022, (Accessed on 12/21/2022).
[9] “WeChat English Documentation,” https://developers.weixin.qq.com/miniprogram/en/dev/api/, 04 2022, (Accessed on 12/30/2022).
[10] M. Alhanahnah, Q. Yan, H. Bagheri, H. Zhou, Y. Tsutano, W. Srisa-An, and X. Luo, “Dina: Detecting hidden android inter-app communication in dynamic loaded code,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2782–2797, 2020.
[11] S. Aonzo, A. Merlo, G. Tavella, and Y. Fratantonio, “Phishing attacks on modern android,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 1788–1801.
[12] K. W. Y. Au, Y. F. Zhou, Z. Huang, and D. Lie, “Pscout: analyzing the android permission specification,” in Proceedings of the 2012 ACM conference on Computer and communications security, 2012, pp. 217–228.
[13] A. Bartel, J. Klein, Y. Le Traon, and M. Monperrus, “Dexpler: converting android dalvik bytecode to jimple for static analysis with soot,” in Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis, 2012, pp. 27–38.
[14] Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu, “iris: Vetting private api abuse in ios applications,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pp. 44–56.
[15] K. Drakonakis, S. Ioannidis, and J. Polakis, “The cookie hunter: Automated black-box auditing for web authentication and authorization flaws,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 1953–1970.
[16] A. Druffel and K. Heid, “Davinci: Android app analysis beyond frida via dynamic system call instrumentation,” in International Conference on Applied Cryptography and Network Security. Springer, 2020, pp. 473–489.
[17] J. Han, S. M. Kywe, Q. Yan, F. Bao, R. Deng, D. Gao, Y. Li, and J. Zhou, “Launching generic attacks on ios with approved third-party applications,” in International Conference on Applied Cryptography and Network Security. Springer, 2013, pp. 272–289.
[18] S. Huang, J. Guo, S. Li, X. Li, Y. Qi, K. Chow, and J. Huang, “Safecheck: safety enhancement of java unsafe api,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 889–899.
[19] S. M. Kywe, Y. Li, K. Petal, and M. Grace, “Attacking android smartphone systems without permissions,” in 2016 14th Annual Conference on Privacy, Security and Trust (PST). IEEE, 2016, pp. 147–156.
[20] L. Li, T. F. Bissyandé, Y. Le Traon, and J. Klein, “Accessing inaccessible android apis: An empirical study,” in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2016, pp. 411–422.
[21] S. Liang, The Java native interface: programmer’s guide and specification. Addison-Wesley Professional, 1999.
[22] Z. Ling, R. Liu, Y. Zhang, K. Jia, B. Pearson, X. Fu, and L. Junzhou, “Prison break of android reflection restriction and defense,” in IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 2021, pp. 1–10.
[23] Listen, “How to use “openUrl”?” https://developers.weixin.qq.com/community/develop/article/doc/00000efea1c4785424fc1dd4e51c13.
[24] B. Livshits and J. Jung, “Automatic mediation of Privacy-Sensitive resource access in smartphone applications,” in 22nd USENIX Security Symposium (USENIX Security 13), 2013, pp. 113–130.
[25] H. Lu, L. Xing, Y. Xiao, Y. Zhang, X. Liao, X. Wang, and X. Wang, “Demystifying resource management risks in emerging mobile app-in-app ecosystems,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 569–585.
[26] L. Mastrangelo, L. Ponzanelli, A. Mocci, M. Lanza, M. Hauswirth, and N. Nystrom, “Use at your own risk: the java unsafe api in the wild,” ACM Sigplan Notices, vol. 50, no. 10, pp. 695–710, 2015.
[27] MayBG, “How to use “requestFacetoFacePayment”?” https://developers.weixin.qq.com/community/develop/doc/000cce1ebd80006b1e8f5185b56800.
[28] X. Pan, X. Wang, Y. Duan, X. Wang, and H. Yin, “Dark hazard: Learning-based, large-scale discovery of hidden sensitive operations in android apps.” in NDSS, vol. 17, 2017, pp. 10–14 722.
[29] J. Samhi, L. Li, T. F. Bissyandé, and J. Klein, “Difuzer: Uncovering suspicious hidden sensitive operations in android apps,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 723–735.
[30] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs, “Jalangi: A selective record-replay and dynamic analysis framework for javascript,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013, pp. 488–498.
[31] Y. Tang, Y. Sui, H. Wang, X. Luo, H. Zhou, and Z. Xu, “All your app links are belong to us: understanding the threats of instant apps based attacks,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 914–926.
[32] vuldb, “Cve-2021-40180,” https://vuldb.com/?id.205138.
[33] W3C, “Miniapp standardization white paper,” https://w3c.github.io/miniapp/white-paper/, 2020.
[34] T. Wang, K. Lu, L. Lu, S. Chung, and W. Lee, “Jekyll on ios: When benign apps become evil,” in 22nd USENIX Security Symposium (USENIX Security 13), 2013, pp. 559–572.
[35] G. Yang, A. Mendoza, J. Zhang, and G. Gu, “Precisely and scalably vetting javascript bridge in android hybrid apps,” in International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 2017, pp. 143–166.
[36] Y. Yang, Y. Zhang, and Z. Lin, “Cross miniapp request forgery: Root causes, attacks, and vulnerability detection,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 3079–3092.
[37] L. Zhang, Z. Zhang, A. Liu, Y. Cao, X. Zhang, Y. Chen, Y. Zhang, G. Yang, and M. Yang, “Identity confusion in webview-based mobile app-in-app ecosystems,” in 31st USENIX Security Symposium (USENIX Security 22), 2022.
[38] Y. Zhang, B. Turkistani, A. Y. Yang, C. Zuo, and Z. Lin, “A measurement study of wechat mini-apps,” in Abstract Proceedings of the 2021 ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 2021, pp. 19–20.
[39] Q. Zhao, C. Zuo, B. Dolan-Gavitt, G. Pellegrino, and Z. Lin, “Automatic uncovering of hidden behaviors from input validation in mobile apps,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 1106–1120.
This is a preprint of our CCS 2023 paper. Chao Wang, Yue Zhang, and Zhiqiang Lin

Appendices

A ALGORITHM FOR INVARIANT EXTRACTION AND MATCHING

The detailed algorithm of how we extract the invariants is presented between line 1 and line 44 in Algorithm 1. In particular, it first identifies the implementation f_j of the public APIs by searching the strings with the name of the public API (lines 3-6). For instance, to identify the implementation of API wx.getLocation, we use the string “getLocation” as shown in the 5th line of Figure 2; if it matches, we add the implementation of the whole body into set P_API (line 6). Next, we iterate over the API implementations in P_API to extract the invariants (lines 7-44). For each specific invariant, e.g., the superclass (lines 19-26), only when this invariant exists in all APIs do we consider it an invariant and add it to the invariant set I (line 26); otherwise, we break the iteration and skip this invariant (line 22). After these iterations, our invariant set will contain the method signature, super class, super packages, and callers, if they exist in the corresponding public API implementations.

With the extracted API invariants, it then becomes straightforward to identify the undocumented APIs, as shown in lines 45-51. Specifically, we first iterate over the implementations of functions by matching the collected invariants (line 48), and if an implementation matches all the invariants as in the public APIs (and it has not been added to the undocumented set yet), the implementation is added as an undocumented API (line 50).

Algorithm 1: Invariant Extraction and Matching
Input: D_API: The Set of Documented APIs; F: The Set of All Functions
Output: U_API: The Set of Undocumented APIs
 1  PROCEDURE InvariantExtraction(P_API, F)
 2    P_API ← ∅;
 3    foreach f_j ∈ F do
 4      foreach api_k ∈ D_API do
 5        if searchString(api_k, f_j) then
 6          P_API.add(f_j);
 7    invariant ← ∅;
 8    I ← ∅;
 9    isInvariant ← TRUE;
10    foreach capi_i ∈ P_API do
11      invariant ← getMethodSignature(capi_i);
12      foreach capi_j ∈ P_API do
13        if invariant != getMethodSignature(capi_j) then
14          isInvariant ← FALSE;
15          BREAK;
16      if isInvariant == TRUE then
17        if invariant ∉ I then
18          I.add(invariant);
19      invariant ← getSuperClass(capi_i);
20      foreach capi_j ∈ P_API do
21        if invariant != getSuperClass(capi_j) then
22          isInvariant ← FALSE;
23          BREAK;
24      if isInvariant == TRUE then
25        if invariant ∉ I then
26          I.add(invariant);
27      isInvariant ← TRUE;
28      invariant ← getSuperPackage(capi_i);
29      foreach capi_j ∈ P_API do
30        if invariant != getSuperPackage(capi_j) then
31          isInvariant ← FALSE;
32          BREAK;
33      if isInvariant == TRUE then
34        if invariant ∉ I then
35          I.add(invariant);
36      isInvariant ← TRUE;
37      invariant ← getCaller(capi_i);
38      foreach capi_j ∈ P_API do
39        if invariant != getCaller(capi_j) then
40          isInvariant ← FALSE;
41          BREAK;
42      if isInvariant == TRUE then
43        if invariant ∉ I then
44          I.add(invariant);
45  PROCEDURE UndocumentedAPIRecognition(P_API, F)
46    U_API ← ∅;
47    foreach f_j ∈ F do
48      if matchinvariant(I, f_j) == TRUE then
49        if f_j ∉ I_API then
50          if f_j ∉ U_API then
51            U_API.add(f_j)

Figure 9: Time cost of APIScope in its static and dynamic analysis. The dynamic analysis only includes the time consumed for identifying API invocation points. (Panels: Time Consumed - Static (s) and Time Consumed - Dynamic (s), for QQ, WeChat, WeCom, Baidu, and TikTok.)

B EFFICIENCY

Since APIScope consists of two phases of analysis (static analysis and dynamic analysis), we measure the performance of the two phases separately. Specifically, the performance of the static analysis was measured by the time consumed by decompiling the super app (via Soot) and then scanning the decompiled code to identify the invariants and API candidates. The upper half of Figure 9 shows this time cost. It takes 85.1 seconds on average to finish the static analysis of a super app. Among all those super apps, QQ is the most time-consuming one, taking around 160 seconds.

The performance of dynamic analysis was measured by the time consumed for generating test cases, identifying API invocation points, and classifying the APIs based on their executions. As shown in the bottom half of Figure 9, APIScope takes an average of 3.05 seconds to identify the API invocation point of a specific host app. It is difficult to measure the time cost for API classification, as the invocation of an API may involve user interactions that
cannot be precisely measured. For example, when our tool invokes getLocation, the host app will pop up a dialog and ask the user to grant permission. The user’s reaction time, including the time taken to press the button, will also be included in the results. As such, we can only provide approximate results, and we found that none of the dynamic API executions took hours to complete, even though there may be thousands of test cases to execute (as shown in Table 2). In fact, most of them took only several minutes to complete, which is acceptable since APIScope is a one-time program analysis tool for a specific super app.

C THE API EVOLUTION OF SUPER APPS

APIScope can be used to analyze earlier versions of super apps. However, we have to note that its dynamic analysis component may not support older versions of the super apps (e.g., some cannot even be installed on our Google Pixel 4 phone). Also, to detect whether an API is documented or not, we need the official documentation. Unfortunately, among all five tested super apps, we cannot obtain all the historical documentation. Therefore, we eventually collected 93 historic versions of WeChat, 15 of WeCom, 14 of Baidu, and 9 of QQ, together with their corresponding documentation.

The detailed changes of APIs (including documented and undocumented) in these super apps over the previous versions are reported in Figure 10 and Figure 11. We can clearly see that most of the super apps contained undocumented APIs from their earliest versions, except for the first two versions of WeChat, and that all of them contain a significant number of undocumented APIs. Meanwhile, through our manual investigation of the historical versions, we also obtained two interesting findings: (i) APIs documented in earlier versions may later become undocumented while remaining available. For example, the API captureScreen, which is used to capture a screenshot, has been removed from the documentation and become an undocumented one; (ii) undocumented APIs can be released to the public. For example, an API named “chooseContact” was an undocumented API, and since 7.0.12, it has become a documented API.

Figure 10: Number of uncovered APIs in WeChat. The blue bar is the number of APIs, and the red bar is the number of public APIs. (Axes: Versions vs. Number of APIs.)

Figure 11: Number of uncovered APIs in WeCom, Baidu, and QQ. (Panels: APIs of WeCom, APIs of Baidu, APIs of QQ, each plotted against versions.)
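The two kinds of documentation changes described in this appendix can be detected mechanically once, for each version, the set of APIs found in the binary and the set of APIs listed in the documentation are available. The following is a minimal sketch under that assumption; the `status` and `transitions` helpers, the version labels `v1`/`v2`, and the toy snapshots are hypothetical illustrations, not part of APIScope and not data from our measurements.

```python
from typing import Dict, List, Set, Tuple

# One snapshot per version: (APIs implemented in the binary, APIs in the docs).
Snapshot = Tuple[Set[str], Set[str]]


def status(api: str, snap: Snapshot) -> str:
    """Return 'documented', 'undocumented', or 'absent' for one API in one version."""
    implemented, documented = snap
    if api not in implemented:
        return "absent"
    return "documented" if api in documented else "undocumented"


def transitions(
    history: List[Tuple[str, Snapshot]]
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each API to its status changes as (version, old_status, new_status)."""
    changes: Dict[str, List[Tuple[str, str, str]]] = {}
    # Compare every version with its immediate predecessor.
    for (_, prev), (version, cur) in zip(history, history[1:]):
        for api in sorted(prev[0] | cur[0]):
            old, new = status(api, prev), status(api, cur)
            if old != new:
                changes.setdefault(api, []).append((version, old, new))
    return changes


# Toy data (hypothetical): captureScreen leaves the docs but stays in the binary,
# while chooseContact is implemented in both versions and enters the docs in v2.
history = [
    ("v1", ({"captureScreen", "chooseContact"}, {"captureScreen"})),
    ("v2", ({"captureScreen", "chooseContact"}, {"chooseContact"})),
]

for api, events in transitions(history).items():
    print(api, events)
```

A transition from documented to undocumented corresponds to finding (i), and the reverse transition corresponds to finding (ii).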

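To make the invariant extraction and matching of Algorithm 1 (Appendix A) more concrete, the sketch below restates it over a simplified, hypothetical data model. The `Function` record, its fields, the package name `com.example.jsapi`, and the toy functions are assumptions made for illustration; the real tool works on decompiled code, and this sketch stores invariants per property in a dictionary rather than in a flat set I.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Function:
    """Hypothetical summary of one decompiled method (not APIScope's data model)."""
    name: str
    strings: FrozenSet[str]  # string constants appearing in the method body
    signature: str
    super_class: str
    super_package: str
    caller: str


PROPERTIES = ("signature", "super_class", "super_package", "caller")


def find_public_impls(documented: List[str], functions: List[Function]) -> List[Function]:
    """Lines 2-6: keep functions whose body mentions a documented API name."""
    return [f for f in functions if any(api in f.strings for api in documented)]


def extract_invariants(public_impls: List[Function]) -> Dict[str, str]:
    """Lines 7-44: a property is an invariant only if all public APIs share its value."""
    invariants: Dict[str, str] = {}
    for prop in PROPERTIES:
        values = {getattr(f, prop) for f in public_impls}
        if len(values) == 1:
            invariants[prop] = values.pop()
    return invariants


def find_undocumented(functions: List[Function], public_impls: List[Function],
                      invariants: Dict[str, str]) -> List[Function]:
    """Lines 45-51: functions matching every invariant that are not public APIs."""
    if not invariants:
        return []  # nothing to match against
    public = set(public_impls)
    return [f for f in functions
            if f not in public
            and all(getattr(f, p) == v for p, v in invariants.items())]


# Toy input (hypothetical names and values).
api_shape = ("(JSONObject,int)void", "BaseJsApi", "com.example.jsapi", "dispatch")
functions = [
    Function("a", frozenset({"getLocation"}), *api_shape),
    Function("b", frozenset({"chooseImage"}), *api_shape),
    Function("c", frozenset({"hiddenTask"}), *api_shape),
    Function("d", frozenset({"log"}), "(String)void", "Object", "com.example.util", "main"),
]
documented = ["getLocation", "chooseImage"]

impls = find_public_impls(documented, functions)
invariants = extract_invariants(impls)
print([f.name for f in find_undocumented(functions, impls, invariants)])
```

In this toy input, function `c` shares every invariant with the two documented implementations but matches no documented name, so it is the only candidate reported.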