ASE22 Industry Fastbot

Fastbot2 is a reusable automated model-based GUI testing tool for Android apps, enhanced by reinforcement learning, aimed at accelerating testing cycles by leveraging knowledge from previous runs. The tool has shown superior performance in activity coverage and fault detection compared to existing testing tools and has been successfully integrated into ByteDance's CI pipeline. Fastbot2 utilizes a probabilistic model to guide testing and has been publicly released for community use.

Fastbot2: Reusable Automated Model-based GUI Testing for Android Enhanced by Reinforcement Learning

Zhengwei Lv, Bytedance, Beijing, China
Chao Peng∗, Bytedance, Beijing, China
Zhao Zhang, Bytedance, Beijing, China
Ting Su∗, East China Normal University, Shanghai, China
Kai Liu†, East China Normal University / ByteDance, Beijing, China
Ping Yang, Bytedance, Beijing, China

ABSTRACT
We introduce a reusable automated model-based GUI testing technique for Android apps to accelerate the testing cycle. Our key insight is that the knowledge of event-activity transitions from the previous testing runs, i.e., executing which events can reach which activities, is valuable for guiding the follow-up testing runs to quickly cover major app functionalities. To this end, we propose (1) a probabilistic model to memorize and leverage this knowledge during testing, and (2) a model-based guided testing strategy (enhanced by a reinforcement learning algorithm). We implemented our technique as an automated testing tool named Fastbot2. The evaluation on two popular industrial apps (with billions of user installations), Douyin and Toutiao, shows that Fastbot2 outperforms the state-of-the-art testing tools (Monkey, Ape and Stoat) in both activity coverage and fault detection in the context of continuous testing. To date, Fastbot2 has been deployed in the CI pipeline at ByteDance for nearly two years, and 50.8% of the developer-fixed crash bugs were reported by Fastbot2, which significantly improves app quality. Fastbot2 has been made publicly available to benefit the community at: https://github.com/bytedance/Fastbot_Android.

CCS CONCEPTS
• Software and its engineering → Software testing and debugging; • Computing methodologies → Reinforcement learning.

ACM Reference Format:
Zhengwei Lv, Chao Peng, Zhao Zhang, Ting Su, Kai Liu, and Ping Yang. 2022. Fastbot2: Reusable Automated Model-based GUI Testing for Android Enhanced by Reinforcement Learning. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3551349.3559505

∗ Chao Peng and Ting Su are the corresponding authors.
† Kai Liu was a research intern at ByteDance when this work was conducted.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ASE ’22, October 10–14, 2022, Rochester, MI, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9475-8/22/10...$15.00
https://doi.org/10.1145/3551349.3559505

1 INTRODUCTION
Mobile apps have drastically increased in number over the recent years [1]. Ensuring app quality is crucial to keeping user loyalty and maintaining business success. To this end, automated GUI testing has become an attractive and cost-effective solution [2, 10, 11].

In practice, an industrial app undergoes frequent updates to catch up with the changing user demands. At ByteDance, we release new updates of major apps on a weekly basis. As a result, continuous testing becomes crucial for quick feedback on app quality (e.g., doing smoke testing) whenever a new internal version is built. However, simply adopting existing testing tools [10], although feasible, is inefficient and ineffective, as they simply rerun each version from scratch and do not leverage the knowledge from previous testing runs to accelerate GUI testing in the current run.

To fill this important gap, we introduce a reusable automated GUI testing technique. Our key idea is to leverage model-based testing (MBT). Among existing testing solutions, MBT is recognized for its unique model construction phase, which is ideal for storing and leveraging the prior knowledge. However, we face two major technical challenges in putting our idea into practice.

The first challenge is how to effectively store the knowledge from previous testing runs. Our key insight is that the knowledge of event-activity transitions, i.e., executing which events reaches which activities, is valuable for guiding the follow-up testing to quickly cover core activities. Thus, we propose a probabilistic model as the basis of MBT to memorize such knowledge from each testing run. This model stores a set of event-activity transitions, each of which records the historical probability of an event to reach an activity. Moreover, to tackle the complexity of industrial apps, we introduce the concept of hyper-event to represent events in this model, which is useful to balance the model scalability and accuracy.

The second challenge is how to effectively leverage the prior knowledge to guide GUI testing. Classic MBT methods traverse the model to generate GUI events (i.e., GUI tests). However, one prominent problem is that such GUI tests are likely to be broken due to the unawareness of the connectivity between different GUI events.

To overcome this issue, our key insight is to employ the probabilistic model to achieve on-the-fly, guided model-based testing. Specifically, the probabilistic model (which stores the knowledge of event-activity transitions) provides one-step guidance about which events on the current GUI page could be selected to quickly reach
those not-yet-covered activities in the current testing run. Moreover, to further improve performance, we develop a reinforcement learning algorithm to provide multi-step guidance (also informed by the probabilistic model), which aims to reach those deep activities requiring executing multiple sequential events.

We implemented our technique as an automated testing tool named Fastbot2. Our evaluation on our two popular industrial apps, Douyin and Toutiao, shows that Fastbot2 outperforms the two state-of-the-art MBT tools, Stoat [9] and Ape [5], and the random testing tool Monkey [4] in both activity coverage and bug finding in the context of continuous testing. To sum up, our work makes two major contributions: (1) We propose a reusable automated model-based GUI testing technique enhanced by reinforcement learning to satisfy the practical needs of continuous testing, which has not been considered by prior work. (2) Our implementation, Fastbot2, outperforms the state-of-the-art. It has also been successfully deployed in the CI pipeline at ByteDance and received positive feedback on its ability to improve app quality.

[Figure 1: workflow diagram. Inputs: APK (vn), Historical Data (v1, v2, …, vn-1). Steps: a1. Valid-text Extraction (Valid-text Pool), a2. Installation, a3. Probabilistic Model Construction, b1. GUI Info, b2. Available Events (Hyper-event Abstractor), b3. Algorithm Determination (Probabilistic Model, RL Agent with Q-Table), b4. Event Execution, b5. Update Historical Data and Probabilistic Model, b6. Rewards. Outputs: Activity Coverage, Crashes.]

Figure 1: Fastbot2’s workflow in each testing run. The arrows denote different steps, which are annotated by “a1”, “b1”, etc.

2 FASTBOT2
Figure 1 shows Fastbot2’s workflow. Fastbot2 takes as input an APK file of the app and outputs coverage and crash reports. Fastbot2 includes two major phases. The first phase does setup before testing: decompiling the APK to gather static widget text labels (Step “a1”), installing the app on a pool of mobile devices (Step “a2”), and loading the historical data if available to populate the probabilistic model (Step “a3”, cf. Section 2.2.1). The second phase does guided GUI exploration (cf. Section 2.3). Fastbot2 dumps the current GUI page from the app (Step “b1”), identifies and abstracts available hyper-events from the current page (Step “b2”, cf. Section 2.2.2), selects the event which is likely to increase activity coverage (Step “b3”), executes this UI event (Step “b4”) and updates the historical testing data, the probabilistic model (Step “b5”) and the reinforcement learning agent (Step “b6”). These steps (“b1”∼“b6”) will be iteratively conducted until the time budget is used up.

2.1 An Illustrative Example
Figure 2(a) gives an illustrative example taken from Toutiao, a popular daily news app. We use this example to ease the exposition of our approach in Sections 2.2 and 2.3. In Toutiao, users can view news on Activity 1, click the news title (e.g., “Title 1”) to reach Activity 2 to view the content, click the news author (“Author”) to view all news from this author on Activity 3 and watch the live video on Activity 4 if available by clicking the profile photo.

[Figure 2: three panels. (a) App Under Test (Toutiao): Activity 1 (Home) with Title (e1), Author (e2), Remarks (e3); Activity 2 (News Info) with Back (e4), Author (e5), News Content (e6); Activity 3 (Author Homepage) with Profile Photo (e7), News 1…n (e8); Activity 4 (Livestream) with Livestream Video (e9). (b) Probabilistic Model. M (initial): e1 → Activity 2 (60%), Activity 5 (40%); e2 → Activity 3 (100%); e3 → Activity 1 (90%), Activity 5 (10%); e4 → Activity 1 (100%). M′ (after executing e1 and e5): e1 → Activity 2 (63.6%), Activity 5 (36.4%); e2 → Activity 3 (100%); e3 → Activity 1 (90%), Activity 5 (10%); e4 → Activity 1 (100%); e5 → Activity 3 (100%). (c) Q-value Update: rewards and Q-table entries q1, q5, q7, …, qn for events e1, e5, e7, …, an.]

Figure 2: Activity Transition Example from Toutiao App

2.2 Probabilistic Model and Hyper-events
2.2.1 Probabilistic Model. We propose a probabilistic model as the basis of MBT to memorize the knowledge of event-activity transitions from previous testing runs. Specifically, this model $M$ is formally defined as a 3-tuple $M = (\mathcal{E}, \mathcal{A}, \delta)$, where
• $\mathcal{E}$ is the set of hyper-events created from UI widgets.
• $\mathcal{A}$ is the set of activities of the app under test.
• $\delta$ is the transition function, i.e., $\mathcal{E} \to \mathcal{P}(\mathcal{A} \times [0, 1])$. $\mathcal{P}$ is a powerset function and each transition is of the form $e \to (A, p)$, meaning that the probability of a hyper-event $e$ reaching an app activity $A$ is $p$, where $e \in \mathcal{E}$ and $A \in \mathcal{A}$.
The probabilistic model $M$ is constructed from historical exploration data. The probability of reaching activity $A_i$ by executing the hyper-event $e$ (denoted by $P(e, A_i)$) is calculated by:
$$P(e, A_i) = \frac{N(e, A_i)}{N(e)} \qquad (1)$$

where $N(e, A_i)$ denotes the number of times of $e$ reaching $A_i$, and $N(e)$ denotes the total execution times of $e$ in all the previous testing runs. If the hyper-event $e$ can reach $k$ activities (e.g., $A_1, \ldots, A_i, \ldots, A_k$), $\sum_{i=1}^{k} P(e, A_i) = 1$ holds.

Example. Figure 2(b) gives an example of the initial probabilistic model (see the left part) loaded from previous testing runs before starting the current testing run. For example, Activity 2 can be reached by executing the hyper-event e1 on Activity 1; e1 can reach Activity 2 and 5 with probability values 60% and 40%, respectively.

2.2.2 Hyper-events. We propose the concept of hyper-event to represent the events in the probabilistic model. A hyper-event is created from each UI widget according to its properties. Specifically, we only consider the following four properties of a widget: the activity which the widget belongs to, the widget’s text¹, resource-id, and the supported action types (e.g., click, long click). In other words, if some widgets have the same four properties, we assume they have similar functionality and only one hyper-event will be created. We ignore all the other minor widget properties (e.g., a widget’s type) when creating hyper-events with the aim of balancing between model scalability and accuracy.

¹ Here, the text means the static text labels stored in the resource files of the APK file. If the text is dynamically loaded from the app server, we treat its text as empty.

Example. In Figure 2(a), on Activity 1, Fastbot2 will create only one hyper-event “e1” for the widgets named “Title 1” and “Title 2” because these two widgets have identical widget properties: the same activity, the same empty text (“Title 1” and “Title 2” are the texts dynamically loaded from the app server without static text labels), the same resource-id, and the same click action type. In this way, we will create three hyper-events, i.e., “e1” (representing “Title 1” and “Title 2”), “e2” (representing “Author 1” and “Author 2”), “e3” (representing “Remarks 1” and “Remarks 2”), on Activity 1.

2.3 Model-based Guided UI Exploration
The key idea of Fastbot2 is to reuse the prior knowledge stored in the probabilistic model to effectively guide GUI testing. To achieve this, the key step is to decide which UI event on the current GUI page should be selected so as to quickly increase activity coverage. This step corresponds to Step “b3” in Figure 1. Specifically, given a GUI page, Fastbot2 extracts the available hyper-events, and selects the event² to be executed based on the two synergistically combined strategies: (1) model-based event selection (cf. Section 2.3.1), and (2) learning-based event selection (cf. Section 2.3.2).

² After we decide which hyper-event should be selected, if the selected hyper-event represents multiple UI widgets, we will randomly pick one UI widget to exercise.

2.3.1 Model-based event selection. Model-based event selection contains two modes, i.e., model expansion and model exploitation.

Model expansion. If some hyper-events from the current GUI page have not been included in the probabilistic model $M$, Fastbot2 will activate this mode to randomly select one not-yet-executed hyper-event. This situation may occur because the previous testing runs may not cover all hyper-events or some new app features have been added in the current tested app version. This mode can help expand the model and prioritize exploring potentially new features.

Model exploitation. If all the hyper-events from the current GUI page have been included in the probabilistic model $M$, Fastbot2 will activate this mode to select an event with higher probability to cover those not-yet-covered activities in the current testing run (which were covered in the previous testing runs). Let $\mathcal{A}_t$ be the set of already covered activities in the current testing run and $\mathcal{E}_c$ be the set of hyper-events from the current GUI page. The expectation of improving activity coverage by executing $e_i$ ($e_i \in \mathcal{E}_c$) can be computed as $\mathbb{E}(e_i) = \sum_{A \notin \mathcal{A}_t} P(e_i, A)$, $0 \le i \le |\mathcal{E}_c|$.

Here, $\mathbb{E}(e_i)$ represents the expectation value of probability that those not-yet-covered activities in the current testing run will be covered after the hyper-event $e_i$ is executed. The higher $\mathbb{E}(e_i)$, the more likely it is to improve activity coverage. Thus, Fastbot2 in this mode selects the hyper-event $e_i$ by probability $P_M(e_i)$:

$$P_M(e_i) = \exp\left(\frac{\mathbb{E}(e_i)}{\alpha}\right) \Big/ \sum_{e_i \in \mathcal{E}_c} \exp\left(\frac{\mathbb{E}(e_i)}{\alpha}\right) \qquad (2)$$

where $\alpha$ is a hyperparameter which adjusts the randomness of this mode. This equation is adapted from the softmax formula. We also require that $e_i$ should be selected no more than $K$ times to ensure fairness. In practice, we set $\alpha$ as 0.8 and $K$ as 1. By using the probabilistic model as prior information, the model exploitation mode can quickly improve activity coverage in a short time.

Example. In Figure 2(a), three hyper-events are available on Activity 1. Since all these three events have been included in the probabilistic model $M$ (see the left part of Figure 2(b)), Fastbot2 activates the model exploitation mode to select events. According to $M$, events e1 and e2 are more likely to reach unexplored activities (i.e., Activity 2, 3, 5), while event e3 has 90% probability to stay in Activity 1. Thus, Fastbot2 is likely to select e1 or e2. Assume e1 is selected and then Activity 2 is covered. In Activity 2, event e4 (the back button) has 100% probability to return to Activity 1, while events e5 and e6 have not been included in $M$. At this time, Fastbot2 activates the model expansion mode and randomly selects e5 or e6. Assume e5 is selected and then Activity 3 is covered. Meanwhile, $M$ is updated by adding e5→Activity 3 with probability value 100% (see the right part of Figure 2(b)).

2.3.2 Learning-based event selection. However, the probabilistic model can only express one-step guidance information. Fortunately, reinforcement learning is able to extend one-step guidance information into multi-step guidance information.

Q-table expansion. The key component of the RL agent is the Q-table, which contains the Q-values (which indicate the possibility of executing each hyper-event to reach a new activity). During testing, no matter which event selection strategy is used, the Q-value of the selected hyper-event $e_t$ on the current GUI page, i.e., $Q(e_t)$, is updated to $Q(e_t) + \alpha (G_{t,t+n} - Q(e_t))$, where $G_{t,t+n}$ is the n-step cumulative reward calculated by an N-step Sarsa method [3]:

$$G_{t,t+n} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n} Q(e_{t+n}) \qquad (3)$$

Here, $\gamma$ is the discount factor. $r_{t+1}$ is the immediate reward earned after the event $e_t$ is executed, which is defined as
$$r_{t+1} = \frac{\mathbb{E}(e_t)}{\sqrt{N(e_t) + 1}} + \frac{V}{\sqrt{N(A_t) + 1}} \qquad (4)$$

Here, $N(e_t)$ denotes the number of times $e_t$ is executed, $A_t$ denotes the activity $e_t$ leads to, and $N(A_t)$ denotes the number of times $A_t$ is visited so far in the current testing. $V$ represents the value of $A_t$ and is calculated using:

$$V = n_h + 0.5 * n_c + \sum_{e_i \in \mathcal{E}_c} \mathbb{E}(e_i) \qquad (5)$$

Here, $n_h$ is the number of hyper-events in the reached GUI page that are not in the probabilistic model. Thus, executing these hyper-events will likely touch new features. $n_c$ is the number of hyper-events in the next GUI page that are contained in the probabilistic model but have not been executed in the current testing run. $\sum_{e_i \in \mathcal{E}_c} \mathbb{E}(e_i)$ is the sum of expectation values of executing $e_i$ to improve activity coverage, which is the same formula defined in model exploitation.

Q-table exploitation. A hyper-event is selected by probability $P_Q$:

$$P_Q(e_i) = \exp\left(\frac{Q(e_i)}{\beta}\right) \Big/ \sum_{e_i \in \mathcal{E}_c} \exp\left(\frac{Q(e_i)}{\beta}\right) \qquad (6)$$

where $\beta$ is the hyperparameter that adjusts the randomness of the strategy, which is set to 0.1 in our practice.

Example. In Figure 2(a), on Activity 2, e4 and e6 have not been executed yet and e6 is in the model $M$. In this case, e1 (on Activity 1) will be given a higher reward because e1 can lead to interesting actions (e4 and e6). Similarly, e8 is also a new event (not in $M$) on Activity 3 after executing e5, thus e5 is also given a higher reward. Assume e1, e2 and e3 are all executed many times after a while, and Activities 2, 3 and 4 are all covered (all events have been included in the model $M$). In this case, the Q-table is used to make decisions on event selection: e1 is likely to be selected as it has the highest rewards, indicating that it can reach deeper activities.

2.4 Fastbot2’s Implementation
Fastbot2 is implemented as a fully automated tool and consists of client and server modules. The client reuses the GUI tree dumping and action execution capabilities of Ape [5] to interact with the app. The server written in GoLang performs event selection and supports multi-device collaboration mode (which allows multiple clients to test the same app in parallel on multiple devices and share the same probabilistic model and RL agent).

3 EVALUATION
We evaluate the effectiveness of Fastbot2 by comparing it with two popular model-based tools, Stoat [9] and Ape [5]; and Monkey [4], a popular industrial testing tool. We tried to include Q-testing [8], a recent reinforcement learning-based testing tool. However, Q-testing always fails with exceptions after a few minutes of testing our industrial apps. Thus, we do not compare with Q-testing. We investigate the following research questions:

RQ1: Test Effectiveness: Is Fastbot2 able to achieve higher activity coverage and reveal more unique crashes than existing tools when applied in the scenario of continuous app version updates?
We use our two popular apps, Douyin (short video) and Toutiao (daily news), as our subjects and selected ten recent consecutive versions of Douyin (v19.7∼v20.6) and Toutiao (v8.7.0∼v8.7.9) for continuous testing. We ran Fastbot2 to test one version on 10 devices in parallel for 1 hour, and tested the next version by reusing historical data from all previous runs. For the other tools, we ran them on each version on 10 devices in parallel as this is the typical usage scenario of these tools. We compare the achieved activity coverage and the number of uncovered unique crashes by each tool.

RQ2: Ablation Study. Do model-based and learning-based event selection strategies both contribute to Fastbot2’s overall performance?
We test Douyin (v.19.7) and Toutiao (v.8.7.0) on 10 devices for 1 hour with the model-based strategy enabled only, the learning-based strategy enabled only, and both strategies enabled to evaluate their respective impact on Fastbot2’s overall performance.

In RQ1 and RQ2, 10 different Huawei, OPPO and Google Pixel Android devices are used to mitigate the device fragmentation issue.

3.1 RQ1: Test Effectiveness
Figures 3(a) and 3(b) show activity coverage achieved: bars give numbers of activities covered by different tools on each app version, and curves give accumulated numbers of activities covered after testing ten consecutive versions. Figures 3(c) and 3(d) give the similar information on the number of unique crashes revealed by these tools (we deduplicate crashes according to stack traces [9]). We can see that Fastbot2 achieved the highest activity coverage on each single version of both apps (except Toutiao’s v8.7.3) and the highest accumulated activity coverage across ten continuous versions for both apps. It indicates that reusing the knowledge from previous testing runs can effectively improve activity coverage within the same time budget. On the other hand, we can see that Fastbot2 uncovered many more crashes on Douyin than all other tools (all the crashes were confirmed as real bugs). Fastbot2 uncovered 2 fewer crashes than Ape on Toutiao. We find that most crashes found by these two tools on Toutiao are similar native crashes (i.e., crashes triggered by the app but residing in native C++ libraries). The overall result indicates that reusing the knowledge from previous testing runs can also effectively improve Fastbot2’s bug finding ability.

Figures 4(a) and (b) use Venn diagrams to compare the differences between Fastbot2 and other tools in terms of the accumulated activity coverage after testing ten app versions. The overlapped part denotes the number of activities covered by both tools, while other parts denote the number of activities covered by the two tools alone, respectively. We can see that Fastbot2 is indeed effective as it can cover many more unique activities than other tools.

We note that Fastbot2 performed much better on Douyin than Toutiao, compared to Ape. This is because Douyin is much more complicated than Toutiao. For example, Douyin has more complicated features, e.g., online shopping in the livestreaming room, video recording and editing. In contrast, Toutiao has simpler features (e.g., news reading). As a result, Toutiao is more likely to reach saturated coverage within 1-hour testing by Fastbot2 and Ape.

We also note that Stoat did not perform as well as we expected. After inspection, we find Stoat only generated around 300 events during 1-hour testing. The main reason is that Stoat kept querying the GUI tree and only generated the next event once the current GUI page became stable. However, many features of Douyin and Toutiao, e.g., advertisements, profiles and user comments, are dynamically changing, which makes Stoat waste a lot of time waiting.
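To make the event-selection rules of Sections 2.2.1 and 2.3 concrete, the following Python sketch works through Equations (1), (2), (3) and (6) on the Activity 1 example of Figure 2. It is only an illustration under stated assumptions, not Fastbot2's actual implementation (whose server is written in GoLang): the function names, the `history` list format and the raw counts (e.g., 6 and 4 observations for e1) are hypothetical, chosen so that the resulting probabilities match the 60%/40% values shown in Figure 2(b).

```python
import math
from collections import defaultdict

def build_model(history):
    """Eq. (1): P(e, A) = N(e, A) / N(e), from (hyper_event, activity) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for event, activity in history:
        counts[event][activity] += 1
    return {e: {a: n / sum(reached.values()) for a, n in reached.items()}
            for e, reached in counts.items()}

def expectation(model, event, covered):
    """E(e): summed probability of reaching a not-yet-covered activity."""
    return sum(p for a, p in model.get(event, {}).items() if a not in covered)

def softmax(scores, temperature):
    """Eqs. (2) and (6): exp(score / temperature), normalised over the page.
    Subtracting the maximum only improves numerical stability."""
    top = max(scores.values())
    exps = {e: math.exp((s - top) / temperature) for e, s in scores.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}

def n_step_return(rewards, q_last, gamma):
    """Eq. (3): r_{t+1} + gamma * r_{t+2} + ... + gamma^n * Q(e_{t+n})."""
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return discounted + (gamma ** len(rewards)) * q_last

def q_update(q, g, alpha):
    """Q(e_t) <- Q(e_t) + alpha * (G - Q(e_t))."""
    return q + alpha * (g - q)

# Hypothetical counts that reproduce the initial model M of Figure 2(b).
history = ([("e1", "Activity 2")] * 6 + [("e1", "Activity 5")] * 4
           + [("e2", "Activity 3")]
           + [("e3", "Activity 1")] * 9 + [("e3", "Activity 5")]
           + [("e4", "Activity 1")])
model = build_model(history)

# Model exploitation on Activity 1: only Activity 1 is covered so far.
covered = {"Activity 1"}
scores = {e: expectation(model, e, covered) for e in ("e1", "e2", "e3")}
probs = softmax(scores, temperature=0.8)  # alpha = 0.8 as reported in the paper
print(scores)
print(probs)
```

Under these assumed counts, e1 and e2 receive the same selection probability and e3 a lower one, which mirrors the example in Section 2.3.1. Recording one more e1 → Activity 2 observation changes the estimate for that transition from 6/10 to 7/11, i.e., about 63.6%, in line with the updated model M′ in Figure 2(b).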
[Figure 3: four bar-and-curve charts. Panels: (a) Activity Coverage for Toutiao, (b) Activity Coverage for Douyin, (c) Uncovered Crashes of Toutiao, (d) Uncovered Crashes of Douyin. x-axis: app versions (Toutiao v8.7.0∼v8.7.9, Douyin v19.7∼v20.6). y-axes: Num. of Covered Activities, Num. of Revealed Crashes. Series: Fastbot, APE, Stoat, Monkey (per version) and Fastbot-Accum., APE-Accum., Stoat-Accum., Monkey-Accum. (accumulated).]

Figure 3: Testing results between Fastbot2 and other tools in terms of activity coverage and bug finding.

[Figure 4: Venn diagrams. Panels: (a) Toutiao, (b) Douyin.]

Figure 4: Differences of accumulative activity coverage.

3.2 RQ2: Ablation Study
Figures 5(a) and (b) show the activity coverage of Toutiao and Douyin achieved by different testing strategies within 1-hour testing. Fastbot2 (denoted by “RL+PM”) achieves 31.5% coverage for Toutiao, which is higher than both the model-based event selection strategy alone (28.5%, denoted by “PM Only”) and the learning-based event selection strategy alone (29.0%, denoted by “RL Only”). Similarly, Fastbot2 achieves higher coverage for Douyin (20.8%) than the model-based strategy alone (18.6%) and the learning-based strategy alone (19.2%). The result indicates that both the model-based and learning-based event selection strategies contribute to Fastbot2’s overall performance in improving activity coverage.

[Figure 5: charts. Panels: (a) Toutiao, (b) Douyin.]

Figure 5: Comparing Fastbot2’s internal strategies.

4 INDUSTRIAL DEPLOYMENT
To date, Fastbot2 has been deployed in the Continuous Integration pipeline at ByteDance for nearly two years. Fastbot2 is automatically triggered by nightly builds to obtain quick feedback on app quality when new code changes occur. We have received positive feedback from app development teams. For example, among all the developer-fixed bugs for Toutiao from September 1 to October 31, 2021, 50.8% of these bugs were uncovered by Fastbot2. Additionally, Fastbot2 can cover 80% of the hot-spot activities in Toutiao that are frequently visited by online users. These results corroborate Fastbot2’s strong effectiveness.

5 RELATED WORK
We focus on discussing the industrial practice of automated Android GUI testing. Facebook Sapienz [6] adopts search-based testing to improve code coverage and fault detection. They also extract information from crowd-based testing to enhance Sapienz [7]. WeChat WCTester [12, 13] adopts Monkey-based random testing. WCTester allows human testers to specify blacklisted widgets and define GUI event sequences to improve coverage. However, our work is significantly different from the prior work. First, Fastbot2 mainly adopts model-based testing (enhanced by a learning-based algorithm). Second, Fastbot2 reuses the knowledge from the historical exploration data to resolve the practical needs of continuous testing, which have not been considered by prior work.

6 CONCLUSION
This paper presents a reusable automated model-based GUI testing technique for Android enhanced by reinforcement learning to satisfy the practical needs of continuous testing. Our implementation Fastbot2 outperforms the three state-of-the-art testing tools in both activity coverage and bug finding in the scenario of continuous testing on two popular apps, Douyin and Toutiao. Fastbot2 has been successfully deployed in the CI pipeline at ByteDance and received positive feedback on its ability to improve app quality.

REFERENCES
[1] AppBrain. 2022. Retrieved June 3, 2022 from https://www.appbrain.com/stats/number-of-android-apps
[2] Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. 2015. Automated test input generation for android: Are we there yet?. In ASE. IEEE, 429–440.
[3] Kristopher De Asis, J Hernandez-Garcia, G Holland, and Richard Sutton. 2018. Multi-step reinforcement learning: A unifying algorithm. In AAAI, Vol. 32.
[4] Google. 2021. UI/Application Exerciser Monkey. Retrieved March 3, 2021 from https://developer.android.com/studio/test/monkey
[5] Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. 2019. Practical GUI testing of Android applications via model abstraction and refinement. In ICSE. IEEE, 269–280.
[6] Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective automated testing for android applications. In ISSTA. 94–105.
[7] Ke Mao, Mark Harman, and Yue Jia. 2017. Crowd intelligence enhances automated mobile testing. In ASE. IEEE, 16–26.
[8] Minxue Pan, An Huang, Guoxin Wang, Tian Zhang, and Xuandong Li. 2020. Reinforcement learning based curiosity-driven testing of Android applications. In ISSTA. 153–164.
[9] Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su. 2017. Guided, stochastic model-based GUI testing of Android apps. In ESEC/FSE. 245–256.
[10] Ting Su, Jue Wang, and Zhendong Su. 2021. Benchmarking Automated GUI Testing for Android against Real-World Bugs. In ESEC/FSE. to appear.
[11] Wenyu Wang, Dengfeng Li, Wei Yang, Yurui Cao, Zhenwen Zhang, Yuetang Deng, and Tao Xie. 2018. An empirical study of android test generation tools in industrial cases. In ASE. IEEE, 738–748.
[12] Xia Zeng, Dengfeng Li, Wujie Zheng, Fan Xia, Yuetang Deng, Wing Lam, Wei Yang, and Tao Xie. 2016. Automated test input generation for android: Are we really there yet in an industrial case?. In ESEC/FSE. 987–992.
[13] Haibing Zheng, Dengfeng Li, Beihai Liang, Xia Zeng, Wujie Zheng, Yuetang Deng, Wing Lam, Wei Yang, and Tao Xie. 2017. Automated test input generation for android: Towards getting there in an industrial case. In ICSE-SEIP. IEEE, 253–262.
