Repairing Crashes in Android Apps


Shin Hwei Tan∗ (Southern University of Science and Technology)
Zhen Dong (National University of Singapore)
Xiang Gao (National University of Singapore)
Abhik Roychoudhury (National University of Singapore)

ABSTRACT
Android apps are omnipresent and frequently suffer from crashes, leading to poor user experience and economic loss. Past work focused on automated test generation to detect crashes in Android apps. However, automated repair of crashes has not been studied. In this paper, we propose the first approach to automatically repair Android apps; specifically, we propose a technique for fixing crashes in Android apps. Unlike most test-based repair approaches, we do not need a test suite; instead, a single failing test is meticulously analyzed for crash locations and the reasons behind these crashes. Our approach hinges on a careful empirical study that seeks to establish common root causes for crashes in Android apps, and then distills the remedy for these root causes in the form of eight generic transformation operators. These operators are applied using a search-based repair framework embodied in our repair tool Droix. We also prepare a benchmark, DroixBench, capturing reproducible crashes in Android apps. Our evaluation of Droix on DroixBench reveals that the automatically produced patches are often syntactically identical to the human patch, and on some rare occasions even better than the human patch (in terms of avoiding regressions). These results confirm our intuition that our proposed transformations form a sufficient set of operators to patch crashes in Android.

CCS CONCEPTS
• Software and its engineering → Automatic programming; Software testing and debugging; Dynamic analysis;

KEYWORDS
Automated repair, Android apps, Crash, SBSE

ACM Reference Format:
Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury. 2018. Repairing Crashes in Android Apps. In ICSE ’18: 40th International Conference on Software Engineering, May 27-June 3, 2018, Gothenburg, Sweden. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3180155.3180243

∗ This work was done during the author’s PhD study at National University of Singapore.

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5638-1/18/05...$15.00
https://doi.org/10.1145/3180155.3180243

1 INTRODUCTION
Smartphones have become pervasive, with 492 million sold worldwide in 2011 alone [21]. Users increasingly rely on their smartphones to conduct their daily computing tasks, as smartphones are bundled with various mobile applications. Hence, it is important to ensure the reliability of the apps running on their smartphones.

Testing and analysis of mobile apps, with the goal of enhancing reliability, have been studied in prior work. Some of these works focus on static and dynamic analysis of mobile apps [2, 7, 18, 56], while others focus on testing of mobile apps [3, 4, 30, 31, 43]. To further improve the reliability of mobile applications, several approaches go beyond automated testing of apps by issuing security-related patches [6, 39]. While fixing security-related vulnerabilities is important, a survey revealed that most respondents have experienced a problem when using a mobile application, with 62 percent of them reporting a crash, freeze, or error [1]. Indeed, frequent crashes of an app lead to negative user experience and may eventually cause users to uninstall the app. In this paper, we study automated approaches that alleviate the concern about app crashes via automated repair.

Recently, several automated program repair techniques have been introduced to reduce the time and effort in fixing software errors [24, 28, 35, 40, 42, 52]. These approaches take in a buggy program P and some correctness criterion in the form of a test suite T, producing a modified program P′ that passes all tests in T. Despite recent advances in automated program repair, existing approaches cannot be directly applied to fix crashes found in mobile applications due to various challenges.

The key challenge in adopting automated repair approaches for mobile applications is that the quality of the generated patches depends heavily on the quality of the given test suite. Indeed, any repair technique tries to patch errors to achieve the intended behavior. Yet, in reality, the intended behavior is incompletely specified, often through a set of test cases. Thus, repair methods attempt to patch a given buggy program so that the patched program passes all tests in a given test suite T (we call repair techniques that use test cases to drive the patch generation process test-driven repair). Unsurprisingly, test-driven repair may not only produce incomplete fixes, but the patched program may also end up introducing new errors, because it may fail tests outside T that were previously passing [45, 49]. Meanwhile, several properties of test cases for mobile applications pose unique challenges for test-driven repair. First, regression test cases may not be available for a given mobile app A. While prior research on automated test generation for mobile apps could be used for generating crashing

inputs, regression test inputs that ensure the correct behaviors of A are often absent. Second, instead of simple inputs, test inputs for mobile apps are often given as a sequence of UI commands (e.g., clicks and touches) leading to crashes in the app. Meanwhile, GUI tests are often flaky [29, 36]: their outcome is non-deterministic for the same program version. As current repair approaches rely solely on test outcomes as their correctness criteria, they may not be able to correctly reproduce test behavior and may subsequently generate incorrect patches due to flaky tests.

Another key challenge in applying recent repair techniques to mobile applications lies in their reliance on the availability of source code. However, mobile applications are often distributed as standard Android .apk files, since the source code for a given version of a mobile app may be neither directly accessible nor actively maintained. Moreover, while previous automated repair techniques are applied to fix programs used by developers and programmers, mobile applications may be used by general non-technical users who may not have any prior knowledge of source code and test compilation.

We present a novel framework, called Droix, for automated repair of crashes in Android applications. In particular, our contributions can be summarized as follows:

Android repair: We propose a novel Android repair framework that automatically generates a fixed APK given a buggy APK and a UI test. Android applications were not studied in prior work on automated program repair, but various research efforts on analysis [2, 7, 18, 56] and automated testing [3, 4, 30, 31, 43] illustrate the importance of ensuring the reliability of Android apps.

Repairing UI-based test cases: Different from existing repair approaches based on a set of simple inputs, our approach fixes a crash with a single UI event sequence. Specifically, we employ techniques that allow end users to reproduce the crashing event sequence by recording user actions on Android devices instead of writing test code. The crashing input could be either recorded manually by users or automatically generated by GUI testing approaches [30, 47].

Lifecycle-aware transformations: Our approach differs from existing test-driven repair approaches since it does not seek to modify a program to pass a given test suite. Instead, it seeks to repair the crashes witnessed by a single crashing input by employing program transformations that are likely to repair the root causes behind crashes. We introduce a novel set of lifecycle-aware transformations that can automatically patch crashing Android apps by using management rules from the activity lifecycle and the fragment lifecycle.

Evaluation: We propose DroixBench, a set of 24 reproducible crashes in 15 open-source Android apps. Our evaluation on these 24 defects shows that Droix could repair 15 bugs, and seven of these repairs are syntactically equivalent to the human patches.

2 BACKGROUND: LIFECYCLE IN ANDROID
Unlike conventional Java programs, Android applications do not have a single main method. Instead, Android apps provide multiple entry points, such as the onCreate and onStart methods. Via these methods, the Android framework controls the execution of apps and maintains their lifecycle.

Figure 1: Activity Lifecycle, Fragment Lifecycle and the Activity-Fragment Coordination (panels: Activity lifecycle; Fragment lifecycle)

Figure 1 shows the lifecycles of activities and fragments in Android. Each method in Figure 1 represents a lifecycle callback, a method that gets called upon a change of state. Lifecycle transitions obey certain principles. For instance, an activity in the paused state could move to the resumed state or the stopped state, or may be killed by the Android system to free up RAM.

A fragment is a portion of the user interface, or a behavior, that can be placed in an activity. Each fragment can be modified independently of its host activity (the activity containing the fragment) by performing a set of changes. A fragment goes through more states than an activity on its way from being launched to the active state, e.g., the onAttach and onCreateView states.

The communication between an activity and a fragment needs to obey certain principles. A fragment is embedded in an activity and can communicate with its host activity after being attached. The allowed states of a fragment are determined by the state of its host activity. For instance, a fragment is not allowed to reach the onStart state before its host activity enters the onStart state. A violation of these principles may cause crashes in Android apps.

3 A MOTIVATING EXAMPLE
We illustrate the workflow of our automated repair technique by showing an example app and its crash. The crash occurred in Transistor, a radio app for Android with 63 stars on GitHub. According to the bug report¹, Transistor crashes when performing the event sequence shown in Figure 2: (a) starting Transistor; (b) shutting it down by pressing the system back button; (c) starting Transistor again and changing the icon of any radio station. Then, it crashes with the notification “Transistor keeps stopping” (d). Listing 1 shows the log relevant to this crash. The stack trace in Listing 1 suggests that the crash is caused by an IllegalStateException.

1 https://github.com/y20k/transistor/issues/21
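The lifecycle rules of Section 2 and the Transistor crash can be made concrete without the Android framework. The following is a minimal plain-Java sketch, not Android code: `LifecycleState`, `HostActivity`, `ModelFragment`, and all their methods are hypothetical stand-ins that mimic only two behaviors described above, namely that a fragment may not get ahead of its host activity's state, and that a detached fragment calling startActivityForResult fails with an IllegalStateException (message modeled on Listing 1). It also shows how a getActivity() != null guard sidesteps the call. The real lifecycle has more states and is driven by the framework, not by explicit calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Usage demo: the Transistor scenario in miniature (hypothetical model). */
class LifecycleDemo {
    public static void main(String[] args) {
        HostActivity first = new HostActivity();
        ModelFragment fragment = new ModelFragment();
        fragment.attach(first);
        first.moveTo(LifecycleState.STARTED);
        fragment.moveTo(LifecycleState.STARTED);   // allowed: the host is already STARTED

        fragment.detach();                         // app shut down via the back button
        try {
            fragment.startActivityForResult("pickImage", 1);
        } catch (IllegalStateException e) {
            System.out.println("crash: " + e.getMessage());
        }

        // Guard in the spirit of GetActivity-check: skip the call while detached.
        if (fragment.getActivity() != null) {
            fragment.startActivityForResult("pickImage", 1);
        }
        System.out.println("guarded call skipped, no crash");
    }
}

/** Coarse lifecycle states; declaration order encodes lifecycle progress. */
enum LifecycleState { INITIALIZED, CREATED, STARTED, RESUMED }

/** Hypothetical stand-in for a host activity (not android.app.Activity). */
class HostActivity {
    LifecycleState state = LifecycleState.INITIALIZED;
    final List<String> launched = new ArrayList<>();

    void moveTo(LifecycleState next) { state = next; }

    /** Records the request instead of launching anything. */
    void startActivityForResult(String intent, int requestCode) {
        launched.add(intent + "#" + requestCode);
    }
}

/** Hypothetical stand-in for a fragment (not android.app.Fragment). */
class ModelFragment {
    private HostActivity host;                     // null while detached
    LifecycleState state = LifecycleState.INITIALIZED;

    void attach(HostActivity activity) { host = activity; }
    void detach() { host = null; }

    /** Mirrors getActivity(): null when the fragment is not attached. */
    HostActivity getActivity() { return host; }

    /** Coordination rule: a fragment may not get ahead of its host activity. */
    void moveTo(LifecycleState next) {
        if (host == null) {
            throw new IllegalStateException("fragment is not attached");
        }
        if (next.compareTo(host.state) > 0) {
            throw new IllegalStateException(
                    "fragment cannot reach " + next + " before its host activity");
        }
        state = next;
    }

    /** Like the inherited Fragment method, this only works while attached. */
    void startActivityForResult(String intent, int requestCode) {
        if (host == null) {
            throw new IllegalStateException("Fragment not attached to Activity");
        }
        host.startActivityForResult(intent, requestCode);
    }
}
```

Running `LifecycleDemo` first reports the simulated crash and then shows that the guarded call is simply skipped; the guard avoids the exception but, as in the real patch, the request is not performed while no activity is attached.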

Figure 2: Continuous snapshots of a crash in Transistor (panels: (a) Open Transistor; (b) Press back; (c) Open again and change an icon; (d) Crashed with a notification)

Our automated repair framework, Droix, performs analysis of the Activity-Fragment coordination (dashed lines in Figure 1) and reports potential violations in the communication between a fragment and its host activity. Our manual analysis of the source code of this app further reveals that the crash occurs because the fragment attempts to call the inherited method startActivityForResult at line 482, which indirectly invokes a method of its host activity. However, the fragment is detached from the previous activity during the termination of the app and needs to be attached to a new activity in the restarted app. The method invocation occurs before the new activity has been completely created, which leads to the crash.

FATAL EXCEPTION: main
Process: org.y20k.transistor, PID: 2416
java.lang.IllegalStateException: Fragment MainActivityFragment{82e1bec} not attached to Activity
    at android...startActivityForResult(Fragment.java:925)
    at y20k...selectFromImagePicker(MainActivityFragment.java:482)
Listing 1: Stack trace for the crash in Transistor

+ if (getActivity() != null)
  482: startActivityForResult(pickImageIntent, REQUEST_LOAD_IMAGE);
Listing 2: Droix’s patch for the crash in Transistor

- startActivityForResult(pickImageIntent, REQUEST_LOAD_IMAGE);
+ 482: mActivity.startActivityForResult(pickImageIntent, REQUEST_LOAD_IMAGE);
Listing 3: Developer’s patch for the crash in Transistor

Droix defines specific repair operators based on our study of crashes in Android apps and the Android API documentation (see Section 4). One of the transformation operators identified through our study, GetActivity-check, is designed to check whether the activity containing the fragment has been created. The condition getActivity()!=null prevents the scenario where a fragment communicates with its host activity before the activity is created. Listing 2 shows the patch automatically generated by Droix. With the patch, the method startActivityForResult is not invoked if the host activity has not been created. The related function (i.e., changing the station icon) works well after our repair. In contrast, although the developer’s patch does not crash on the given input, it introduces regressions. Listing 3 shows the developer’s patch, where mActivity is a field of the fragment referencing its host activity. When the app restarts, this field still points to the previously attached activity. The developer’s patch explicitly invokes the startActivityForResult method of the previously attached activity instead of the newly created activity. After the developer’s patch is applied, a user reports that the system back button no longer functions correctly when changing the station icon (i.e., pressing the back button does not close the app but mistakenly opens a window for selecting images). Specifically, the user reports the following event sequence in which the app fails to function properly: open Transistor → tap to change icon → press back twice → open Transistor → tap to change icon → press back twice. We test the APK generated by Droix with this event sequence and observe that our fixed APK does not exhibit the faulty behavior reported by the user. Hence, we believe that the patch generated by Droix works better than the developer’s patch.

4 IDENTIFYING CAUSES OF CRASHES IN ANDROID APPLICATIONS
To study the root causes of crashes in Android apps, we manually inspect Android apps on GitHub and the API documentation (as prior work has shown success in finding bugs via API documentation [48]). Our goal is to identify a set of common causes of Android crashes. We first obtain a set of popular Android apps by crawling GitHub with the GitHub API², searching for “android app” repositories written in Java. For each app repository, we search for closed issues (resolved bug reports) containing the word “crash”. We focus on closed issues because those issues have been confirmed by the developers and are more likely to contain fixes for the crashes. From the list of closed issues on app crashes, we further extract issues that have at least one corresponding commit associated with the crash. The final output of our crawler is a list of crash-related closed issues that have been fixed by the developers. Overall, our crawler searches through 7691 GitHub closed issues, of which 1155 (15%) are related to crashes. The relatively high percentage of crash-related issues indicates the prevalence of crashes in Android apps. Among these 1155 issues, 107 issues from 15 different apps have corresponding bug-fixing commits. We manually analyzed all of these issues and attempted to answer two questions:
Q1: What are the possible root causes and exceptions that lead to crashes in Android apps?
Q2: How does the complexity of the activity/fragment lifecycle affect crashes in Android apps?

2 https://developer.github.com/v3/

We study Q2 because a survey of Android developers suggests that the topmost reason (47%) for NullPointerException in Android apps is the complexity of the activity/fragment lifecycle [18]. Our goal is to identify a set of generic transformations that are often used by Android developers in fixing Android apps. To gain a deeper understanding of the root causes of each crash (Q1) and to identify the effect of the activity/fragment lifecycle on the likelihood of introducing crashes (Q2), we manually examine the lifecycle management rules in the official Android API documentation³. Our study shows that the most common exceptions are:
• NullPointerException (40.19%)
• IllegalStateException (7.48%)

3 https://developer.android.com/guide/components/activities/activity-lifecycle.html
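The issue-mining procedure above is essentially a two-stage filter: keep closed issues that mention “crash”, then keep those linked to at least one bug-fixing commit. A minimal sketch of that filter over in-memory data follows. The `Issue` record, its fields, and the case-insensitive substring match are assumptions for illustration only, and querying the GitHub API is deliberately omitted. The `percent` helper recomputes the reported share of crash-related issues from the raw counts (1155 of 7691).

```java
import java.util.List;
import java.util.Locale;

/** Two-stage filter over already-downloaded issues (GitHub API access omitted). */
class CrashIssueFilter {
    /** Stage 1: closed issues whose title or body mentions "crash". */
    static List<Issue> crashRelated(List<Issue> issues) {
        return issues.stream()
                .filter(Issue::closed)
                .filter(issue -> issue.mentions("crash"))
                .toList();
    }

    /** Stage 2: crash issues linked to at least one bug-fixing commit. */
    static List<Issue> withFixingCommit(List<Issue> issues) {
        return issues.stream()
                .filter(issue -> !issue.fixingCommits().isEmpty())
                .toList();
    }

    /** Share of part in whole as a percentage, rounded to two decimals. */
    static double percent(int part, int whole) {
        return Math.round(10_000.0 * part / whole) / 100.0;
    }

    public static void main(String[] args) {
        List<Issue> sample = List.of(
                new Issue("App crashes on rotation", "NPE in onCreateView", true, List.of("a1b2c3")),
                new Issue("Crash when changing icon", "IllegalStateException", true, List.of()),
                new Issue("Typo in settings", "", true, List.of("d4e5f6")),
                new Issue("crash on start", "", false, List.of()));
        List<Issue> crashes = crashRelated(sample);
        List<Issue> fixed = withFixingCommit(crashes);
        System.out.println(crashes.size() + " crash-related, " + fixed.size() + " with a fix");
        // Counts reported in Section 4: 1155 crash-related out of 7691 closed issues.
        System.out.println(percent(1155, 7691) + "% of closed issues are crash-related");
    }
}

/** Hypothetical, simplified issue record; field names are illustrative only. */
record Issue(String title, String body, boolean closed, List<String> fixingCommits) {
    boolean mentions(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        return title.toLowerCase(Locale.ROOT).contains(needle)
                || body.toLowerCase(Locale.ROOT).contains(needle);
    }
}
```

On the toy sample, `main` prints a funnel of 2 crash-related issues with 1 fix, followed by 15.02%, which the text rounds to 15%.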

Table 1: Root cause of crashes in Android apps

Category | Specific reason | Description | GitHub issues (%) | Frequent exception type | Category total (%)
Lifecycle | Configuration changes | activity recreation during configuration changes | 5.61 | NullPointer | 14.02
 | Stateloss | transaction loss during commit | 2.80 | IllegalState |
 | GetActivity | activity-fragment coordination | 2.80 | IllegalState |
 | Activity backstack | inappropriate handling of activity stack | 1.87 | IllegalArgument |
 | Save instance | uninitialized object instances in onSaveInstance() callback | 0.93 | IllegalState |
Resource | Resource-related | resource type mismatches | 10.28 | NullPointer | 16.82
 | Resource limit | limited resources | 4.67 | OutOfMemory |
 | Incorrect resource | retrieve a wrong resource id | 1.87 | SQLite |
Callback | Activity-related | missing activities | 7.48 | NullPointer | 17.76
 | View-related | missing views | 6.54 | NullPointer |
 | Intent-related | missing intents | 3.74 | NullPointer |
 | Unhandled callbacks | missing callbacks | 2.80 | NullPointer |
Others | Missing Null-check | missing check for null object reference | 12.15 | NullPointer | 52.34
 | External Service/Library | defects in external service/library | 8.41 | NullPointer |
 | Workaround | temporary fixes for defect | 4.67 | IndexOutOfBound |
 | API changes | API version changes | 2.80 | SQLite |
 | Others | project-specific defects | 24.30 | - |
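The percentages in Table 1 and in the exception list are consistent with whole numbers of issues out of the 107 manually analyzed ones; for instance, 14.02% corresponds to 15 issues and 40.19% to 43. The small sketch below makes this back-calculation explicit. It assumes that 107 is the denominator for every percentage (the study analyzes 107 issues but does not restate the denominator for each figure), so the counts it prints are inferred, not quoted from the study.

```java
/** Relates reported percentages to whole issue counts (assumes 107 analyzed issues). */
class IssueCounts {
    /** Issues with a bug-fixing commit that were analyzed manually (Section 4). */
    static final int ANALYZED = 107;

    /** Nearest whole number of issues for a percentage of the given total. */
    static int countFor(double percent, int total) {
        return (int) Math.round(percent * total / 100.0);
    }

    /** Percentage, rounded to two decimals, that count/total would be reported as. */
    static double percentFor(int count, int total) {
        return Math.round(10_000.0 * count / total) / 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "NullPointerException", "IllegalStateException",
            "Lifecycle (category)", "Missing Null-check", "Others (project-specific)"
        };
        double[] reported = {40.19, 7.48, 14.02, 12.15, 24.30};
        for (int i = 0; i < labels.length; i++) {
            int count = countFor(reported[i], ANALYZED);
            // Round trip: the inferred count should reproduce the reported percentage.
            double back = percentFor(count, ANALYZED);
            System.out.printf("%-26s %6.2f%% ~ %2d/%d (round trip %.2f%%)%n",
                    labels[i], reported[i], count, ANALYZED, back);
        }
    }
}
```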

The high percentage of NullPointerException is consistent with the findings of a prior study of Android apps [18].

Table 1 shows the common root causes of crashes in the Android apps we investigated. Column “Category” in Table 1 describes the high-level causes of the crashes, while the “Specific reason” column gives the specific causes that lead to the crash. The last column (Category Total (%)) presents the total percentage of issues that fit into a particular category. Overall, 14.02% of crashes in our study occur due to violations of the management rules for Android activity/fragment lifecycles. The reader can refer to Section 5 for an explanation of these lifecycle-related crashes. Meanwhile, 16.82% of the investigated crashes are due to improper handling of resources, including resources that are either not available (Resource-related) or limited, such as memory (Resource limit). Furthermore, improper handling of callbacks contributes to 17.76% of crashes. Note that this “Callback” category denotes implementation-specific problems of different components in the Android library (e.g., Activity, View and Intent). Although 40.19% of the crashes in these apps throw a NullPointerException, only 12.15% are related to a missing check for null objects (Missing Null-check). Interestingly, 4.67% of the GitHub issues include comments by Android developers acknowledging that the issued patches are merely temporary fixes (Workaround) for these crashes, which may require future patches to resolve the crash completely.

Overall, Table 1 shows that the complexity of the activity/fragment lifecycle and incorrect resource handling are two general causes of crashes in Android apps. Moreover, “Missing Null-check” in the “Others” category also often leads to crashes in Android apps.

5 STRATEGIES TO RESOLVE CRASHES
Our manual analysis of crashes in Android apps identifies eight program transformation operators that are useful for repairing these crashes. Table 2 gives an overview of each operator derived through our analysis. As “Missing Null-check” is one of the common causes of crashes in Table 1, we include this operator (S7: Null-check) in our set of operators. Another frequently used operator (5%), which fixes crashes that occur across different categories in Table 1, is inserting an exception handler (S8: Try-catch), which we also include in our set of operators. In this section, we discuss the other program transformation operators in Table 2 and the specific reasons for the crashes associated with each operator.

Table 2: Supported Operators in Droix

Operator | Description
S1: GetActivity-check | Insert a condition to check whether the activity containing the fragment has been created.
S2: Retain object | Store objects and load them when the configuration changes.
S3: Replace resource id | Replace a resource id with another resource id of the same type.
S4: Replace method | Replace the current method call with another method call with a similar name and compatible parameter types.
S5: Replace cast | Replace the current type cast with another compatible type.
S6: Move stmt | Remove a statement and add it to another location.
S7: Null-check | Insert a condition to check whether a given object is null.
S8: Try-catch | Insert try-catch blocks for the given exception.

Retain stateful object: Configuration changes (e.g., phone rotation and language) cause the activity to be destroyed and recreated, which allows apps to adapt to the new configuration (transition from onDestroy() → onCreate()). According to the Android documentation⁴, developers can resolve this kind of crash by either (1) retaining a stateful object when the activity is recreated or (2) avoiding the activity recreation. We choose the first strategy because it is more flexible: it allows activity recreation instead of preventing the configuration changes altogether. Listing 4 presents an example that explains how we retain the Option object by using the saved instance after the configuration changes, to prevent a null reference to the object (S2: Retain object).

4 https://developer.android.com/guide/topics/resources/runtime-changes.html

public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setRetainInstance(true); // retain this fragment
}
// new field for saving the object
private static Option saveOption;

public View onCreateView(LayoutInflater inflater,
        ViewGroup container, Bundle savedInstanceState) {
    // saving and loading the object
    if (option != null) { saveOption = option; }
    else { option = saveOption; }
    switch (option.getButtonStyle()) { // crashing point
Listing 4: Example of handling crashes during configuration changes
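Listing 4 relies on a static field that outlives a recreated fragment instance. The framework-free sketch below mimics only that save-or-restore step. `ButtonFragment`, `Option`, and `recreate()` are hypothetical names, and constructing a new object merely stands in for the recreation that a configuration change triggers. The sketch does not model setRetainInstance(true) or the saved-instance Bundle.

```java
/** Plain-Java model of the save-or-restore step in Listing 4 (not Android code). */
class ButtonFragment {
    // Static slot shared by all instances: it outlives an instance that is
    // recreated, but not the process itself.
    private static Option saveOption;

    // Instance field that a recreation leaves empty (null).
    private Option option;

    ButtonFragment(Option initial) {
        this.option = initial;
    }

    /** Same logic as Listing 4's onCreateView: save when set, restore when null. */
    String onCreateView() {
        if (option != null) {
            saveOption = option;
        } else {
            option = saveOption;
        }
        // Former crashing point; still throws if nothing was ever saved.
        return option.buttonStyle();
    }

    /** Stand-in for a configuration change: a fresh instance without the field. */
    static ButtonFragment recreate() {
        return new ButtonFragment(null);
    }

    public static void main(String[] args) {
        ButtonFragment before = new ButtonFragment(new Option("rounded"));
        System.out.println(before.onCreateView());      // prints rounded
        System.out.println(recreate().onCreateView());  // prints rounded (restored)
    }
}

/** Hypothetical stand-in for the Option object from Listing 4. */
record Option(String buttonStyle) {}
```

One consequence of this design is that the static slot is shared by every instance and disappears when the process is killed. The pattern therefore covers recreation only within a running process.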

Commit transactions: Each fragment can be modified independently of its host activity by performing a set of changes. Each set of changes that we commit (i.e., perform the requested modifications atomically) to the activity is called a transaction. The Android documentation⁵ specifies rules that prohibit committing transactions at certain stages of the lifecycle. Transactions committed in disallowed stages cause the app to throw an exception. For example, invoking commit() after onSaveInstanceState() leads to an IllegalStateException, since the transaction cannot be recorded during this stage. We employ two strategies for resolving incorrect commits: (S6: Move stmt) moving commit() to a legal callback (e.g., onPostResume()), and (S4: Replace method) replacing commit() with commitAllowingStateLoss().

5 https://developer.android.com/reference/android/app/FragmentTransaction.html

Communication between activity and fragment: The lifecycle of a fragment is affected by the lifecycle of its host activity⁶. For example, in Figure 1, when an activity is created (onCreate()), the fragment cannot proceed beyond the onActivityCreated() stage. Invoking getActivity() in an illegal stage of the lifecycle returns null, since the host activity has not been created or the fragment is detached from its host activity. A NullPointerException may then be thrown in the subsequent execution. We employ two strategies for resolving this problem: (S1: GetActivity-check) inserting the condition if(getActivity()==null), and (S6: Move stmt) moving getActivity() to another stage of the fragment lifecycle (when the host activity is created and the fragment is not detached from the host activity).

6 https://developer.android.com/guide/components/fragments.html

Retrieve wrong resource id: Android resources are the additional files and static content used in Android source code (e.g., bitmaps and layouts)⁷. A resource id has the form R.x.y, where x refers to the type of resource and y represents the name of the resource. The resource id is defined in XML files, and it is a parameter of several Android APIs (e.g., findViewById(id) and setText(id)). Android developers may mistakenly use a non-existent resource id, which leads to a Resources$NotFoundException. Listing 5 shows a scenario where the developers change the string resource id (S3: Replace resource id).

7 https://developer.android.com/guide/topics/resources/accessing-resources.html

- int msgStrId = R.string.confirmation_remove_alert;
+ int msgStrId = R.string.confirmation_remove_file_alert;
Listing 5: Example of handling crashes due to wrong resource id

Incorrect type-cast of resource: To implement user interfaces, an Android API⁸ (findViewById(id)) can be invoked to retrieve widgets (views) in the UI. As each widget is identified by attributes defined in the corresponding XML files, an Android developer may misinterpret the correct type of a widget, resulting in crashes due to ClassCastException. We repair the crash by replacing the type-cast expression with the correct type (S5: Replace cast). Listing 6 shows an example where the ImageButton object is incorrectly type-cast.

8 https://developer.android.com/reference/android/app/Activity.html

- mDefinition = (TextView) findViewById(R.id.definition);
+ mDefinition = (ImageButton) findViewById(R.id.definition);
Listing 6: Example fix for incorrect resource type-cast

6 METHODOLOGY
Figure 3 presents the overall workflow of Droix’s repair framework. Droix consists of several components: a test replayer, a log analyzer, a mutant generator, a test checker, a code checker, and a selector. Given a buggy APK P and UI event sequences U extracted from its bug report, Droix produces a patched APK P′ that passes U and has the minimum number of property violations.

Droix fixes a crash using a two-phase approach. In the first phase, Droix generates an instrumented APK I to log all executed callbacks. With the instrumented APK, Droix replays the UI event sequences U on a device. The log analyzer parses the logs dumped from the execution, extracts program locations Locs from the stack trace, and identifies the test-level property $R_t^{orig}$ using the recorded callbacks.

In the second phase, Droix decompiles APK P to an intermediate representation. Then, our mutant generator produces a set of candidate apps (stored in the mutant pool) by applying a set of operators at each location l in Locs. For each operator op, our code checker records the code-level property $R_c^{cand}$ based on the program structure of l and the information in the thrown exception. For each candidate APK C, Droix reinstalls APK C onto the device and replays U on APK C. Then, our log analyzer parses the dumped logs, which include the execution information of callback methods, to extract new buggy locations and information on the test-level property $R_t^{cand}$. Given $R_t^{cand}$ for APK C as input, the test checker compares $R_t^{orig}$ with $R_t^{cand}$ to check whether APK C introduces any new property violations. Finally, our evaluator analyzes $R_t^{cand}$ and $R_c^{cand}$ to compute the number of property violations and passes the results to the selector, which chooses the best app as the final fixed APK.

6.1 Test with UI Sequences
Existing techniques in automated program repair typically rely on unit tests [32] or test scripts [28, 35, 53] to guide the repair process. As additional UI tests for checking correctness are often unavailable, Droix uses user event sequences (e.g., clicks and touches) as input to repair buggy apps, which introduces new challenges: (1) these event sequences are often not included in the source code repository, and reproducing them is often time-consuming; (2) ensuring that a recorded sequence is replayed reliably multiple times is difficult, as UI tests tend to be flaky (the test execution results may vary for the same configuration).

To reduce the manual effort in obtaining UI sequences, Droix supports several kinds of event sequences, including: (1) a set of actions (e.g., clicks and touches) leading to the crash, which can be recorded using the monkeyrunner⁹ GUI, (2) a set of Android Debug Bridge (adb) commands¹⁰, and (3) scripts with a mixture of recorded actions and adb commands. Non-technical users could record their actions with monkeyrunner, while Android developers could write adb commands to have better control of the devices (e.g., rotating the screen).

9 Monkeyrunner contains an API that allows controlling Android devices: https://developer.android.com/studio/test/monkeyrunner/index.html
10 ADB is a command-line tool that is used to control Android devices: https://developer.android.com/studio/command-line/adb.html

Droix employs several strategies to ensure that the UI test outcome is consistent across different executions [36]. Specifically, for each UI test, Droix automatically launches the app from the home screen, inserts pauses between the events in each sequence, terminates
the app after test execution, and brings the Android device back to the home screen (ensuring that the last state of the device is the same as its initial state). Moreover, Droix executes each UI test at least three times, where each execution inserts pauses of a different duration (5, 10, or 15 seconds) between events.

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Figure 3: Droix’s Android Repair Framework (diagram components: Buggy APK, Logs, Bug report, UI sequence, Log analyzer, Bug locations, Decompiler, Mutant generator, Operators, Mutant, Mutants pool, Code checker, Test checker, Violations, Evaluator, Selector, APKs, Fixed APK)

Table 3: Code-level and Test-level Properties Enforced in Droix

Level | Type | Description
Code-level | Well-formedness | Verifies that a mutated APK is compilable and that the structural type of the program matches the required context of the selected operator.
Code-level | Bug hazard | Checks whether a transformation violates Java exception-handling best practices.
Code-level | Exception Type | Checks whether a transformation matches a given exception type (e.g., Insert Null-check should be used exclusively for fixing NullPointerException).
Test-level | Lifecycle | Checks that the event transition matches the activity and fragment lifecycle model (Figure 1).
Test-level | Activity-Fragment | Checks that the interaction between a fragment and its parent activity matches the activity-fragment coordination model (dashed lines in Figure 1).
Test-level | Commit | Checks that a commit of a fragment’s transactions is performed in the allowed states (i.e., before an activity’s state is saved).

6.2 Fault Localization
Our fault localization step pinpoints the faulty program locations leading to the crash. Since our approach requires neither source code nor a heavy test suite, we leverage stack trace information for fault localization. The stack trace contains (1) the type of the exception being thrown, (2) the specific lines of code where the exception is thrown, and (3) the list of classes and method calls in the runtime stack when the exception occurs. We use stack trace information for fault localization because (1) this information is often included in the bug reports of crashes (which allows us to compare the actual exception thrown with the expected exception), and (2) a prior study has shown the effectiveness of using stack traces to locate Java runtime exceptions [44]. The stack trace information is given to our search algorithm for fix localization. When searching for complex fixes, once a fix based on the initial stack trace is generated, it may expose other crashes, leading to new stack traces and new fixes.

6.3 Code Checker and Test Checker
Instead of relying solely on the UI test outcome, Droix enforces two kinds of properties: code-level properties (properties that are checked prior to test execution) and test-level properties (properties that are verified during or after test execution). These properties are important because (1) they serve as additional test oracles for validating candidate apps, and (2) they compensate for the lack of passing UI tests.
Table 3 shows the different properties enforced in Droix. A bug hazard is a circumstance that increases the likelihood of a bug being present in a program [13]. A recent study of Android apps reveals several exception-handling bug hazards and Java exception-handling best practices [18]. Given an exception E that leads to a crash, our code checker categorizes E as either a checked exception, an unchecked exception, or an error to determine whether we can insert a handler (try-catch block) for E. Following the Java exception-handling best practice “Error represents an unrecoverable condition which should not be handled”, our code checker treats inserting a handler for an error as a hard-constraint violation and eliminates such patches. In contrast, inserting handlers for unchecked and checked exceptions is encoded as a soft constraint that can affect the score of a mutant. Meanwhile, we encode the well-formedness property and the exception type property as hard constraints that must be satisfied.
Given a previous lifecycle callback prev and a current lifecycle callback curr, our test checker verifies whether prev → curr obeys the activity/fragment lifecycle management rules (Figure 1). Droix treats all test-level properties as soft constraints because these properties may not be directly related to the crash (e.g., resource-related crashes).

Algorithm 1: Patch generation algorithm
Input: Buggy APK_P, Operators Op, Population size PopSize, UI test U, Program Locations Locs
Input: Fitness Fit : ⟨Patch, Rc, Rt⟩ → ℤ
Result: APK that passes U and contains the fewest property violations
Pop ← initialPopulation(APK_P, PopSize);
while ∄C ∈ Pop . C passes U do
    Mutants ← Mutate(Pop, Op, Locs); // apply Op at l ∈ Locs
    /* select mutants with the fewest Rc and Rt violations */
    Pop ← Select(Mutants, PopSize, Fit);
end

6.4 Mutant Generation and Evaluation
Droix supports eight operators derived from our study of crashes in Android apps (Section 4). Table 2 shows the details of each operator.
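To make the loop of Algorithm 1 concrete, the following Java sketch shows a generic (µ+λ) generate-and-validate search. It is not Droix's implementation: the candidate type `P` stands in for an APK, and the three callbacks (`mutate`, `violations`, `passesTest`) are hypothetical placeholders for applying an operator at a location in Locs, counting Rc and Rt violations (Fit), and replaying the UI test U. Choosing parents uniformly at random and bounding the search by a generation count instead of a wall-clock limit are likewise assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.BiFunction;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of a (mu + lambda) generate-and-validate loop shaped like Algorithm 1. */
final class MuPlusLambdaSearch<P> {
    private final int mu;                          // survivors per generation (PopSize)
    private final int lambda;                      // offspring per generation
    private final int maxGenerations;              // budget; stands in for the time limit
    private final BiFunction<P, Random, P> mutate; // placeholder for applying Op at some l in Locs
    private final ToIntFunction<P> violations;     // placeholder for Fit: Rc + Rt violations
    private final Predicate<P> passesTest;         // placeholder for "C passes U"
    private final Random rnd;

    MuPlusLambdaSearch(int mu, int lambda, int maxGenerations,
                       BiFunction<P, Random, P> mutate,
                       ToIntFunction<P> violations,
                       Predicate<P> passesTest,
                       long seed) {
        this.mu = mu;
        this.lambda = lambda;
        this.maxGenerations = maxGenerations;
        this.mutate = mutate;
        this.violations = violations;
        this.passesTest = passesTest;
        this.rnd = new Random(seed);
    }

    /** Returns a passing candidate with the fewest violations, or empty if the budget runs out.
     *  Assumes a non-empty initial population. */
    Optional<P> run(List<P> initialPopulation) {
        Comparator<P> byViolations = Comparator.comparingInt(violations);
        List<P> pop = new ArrayList<>(initialPopulation);
        for (int gen = 0; ; gen++) {
            // Loop guard of Algorithm 1: stop as soon as some candidate passes the UI test.
            Optional<P> passing = pop.stream().filter(passesTest).min(byViolations);
            if (passing.isPresent() || gen == maxGenerations) {
                return passing;
            }
            // (mu + lambda): parents stay in the pool and compete with their offspring.
            List<P> pool = new ArrayList<>(pop);
            for (int i = 0; i < lambda; i++) {
                P parent = pop.get(rnd.nextInt(pop.size()));
                pool.add(mutate.apply(parent, rnd));
            }
            // Select: keep the mu candidates with the fewest property violations.
            pool.sort(byViolations);
            pop = new ArrayList<>(pool.subList(0, Math.min(mu, pool.size())));
        }
    }

    public static void main(String[] args) {
        // Toy stand-in: a "patch" is an int; only 7 passes, and violations = distance to 7.
        MuPlusLambdaSearch<Integer> search = new MuPlusLambdaSearch<>(
                40, 20, 200,
                (x, r) -> x + (r.nextBoolean() ? 1 : -1),
                x -> Math.abs(x - 7),
                x -> x == 7,
                42L);
        List<Integer> initial = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            initial.add(0);
        }
        System.out.println(search.run(initial));
    }
}
```

The toy `main` uses µ = 40 and λ = 20 as in this section. Because parents survive into the next selection round, the best candidate found so far is never discarded, which is the usual reason for preferring the (µ+λ) scheme over replacing the whole population each generation.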
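The exception-type rule of the code checker (Section 6.3) depends only on the standard Java throwable hierarchy, so it can be sketched with reflection. In the minimal Java sketch below, the class, enum, and method names are hypothetical; the only behavior taken from the text is that inserting a handler for an `Error` is a hard violation (the mutant is discarded), whereas handlers for checked and unchecked exceptions are soft violations that only count against a mutant's score.

```java
/** Hypothetical sketch of the code checker's rule for inserting try-catch handlers. */
final class HandlerInsertionCheck {

    enum Kind { CHECKED_EXCEPTION, UNCHECKED_EXCEPTION, ERROR }

    enum Verdict { HARD_VIOLATION, SOFT_VIOLATION }

    /** Classifies a throwable type by its position in the standard Java hierarchy. */
    static Kind classify(Class<? extends Throwable> type) {
        if (Error.class.isAssignableFrom(type)) {
            return Kind.ERROR;               // e.g., OutOfMemoryError, StackOverflowError
        }
        if (RuntimeException.class.isAssignableFrom(type)) {
            return Kind.UNCHECKED_EXCEPTION; // e.g., NullPointerException, ClassCastException
        }
        return Kind.CHECKED_EXCEPTION;       // every other Throwable, e.g., java.io.IOException
    }

    /** Verdict for a mutant that wraps the crashing code in a handler for {@code type}. */
    static Verdict judgeHandlerInsertion(Class<? extends Throwable> type) {
        // "Error represents an unrecoverable condition which should not be handled":
        // such mutants are discarded; handlers for exceptions only count against the score.
        return classify(type) == Kind.ERROR ? Verdict.HARD_VIOLATION : Verdict.SOFT_VIOLATION;
    }

    public static void main(String[] args) {
        System.out.println(judgeHandlerInsertion(NullPointerException.class));
        System.out.println(judgeHandlerInsertion(java.io.IOException.class));
        System.out.println(judgeHandlerInsertion(StackOverflowError.class));
    }
}
```

Java treats every `Throwable` that is neither an `Error` nor a `RuntimeException` (nor a subclass of either) as checked, which is why the last branch of `classify` needs no further test.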
Algorithm 1 presents our patch generation algorithm. Droix leverages a (µ+λ) evolutionary algorithm with µ = 40 and λ = 20. Given as input the population size PopSize, the fitness function Fit, and a list of faulty locations Locs, our approach iteratively generates new mutants by applying one of the operators listed in Table 2 at each location in Locs, evaluates each mutant by executing the input UI event sequences U, and computes the number of code-level property (Rc) and test-level property (Rt) violations. The generate-and-validate process terminates when either at least one mutant in the population passes U or the time limit is exceeded. Our patch generation algorithm differs from existing approaches that use evolutionary algorithms [24, 53] in that we use a different patch representation and fitness function. Specifically, each mutant is an APK in our representation. Instead of using the number of passing tests as the fitness function, our fitness function Fit computes the number of code-level and test-level property violations.

7 IMPLEMENTATION
Our Android repair framework leverages various open source tools to support different components. Specifically, our log analyzer uses Logcat¹¹, a command-line tool that generates logs when events occur on an Android device. We implement the eight operators in Table 2 on top of the Soot framework (v2.5.0) [25]. Soot is a Java optimization framework that supports the analysis and transformation of Java bytecode. Dexpler, a module included in Soot, leverages a Dalvik bytecode disassembler to produce Jimple (a Soot representation), which enables reading and writing Dalvik bytecode directly [11]. We use the Dexpler module in Soot for our decompiler component in Figure 3. To support the “S4: Replace method” operator, we use the Levenshtein distance to select a method with a similar method name and compatible parameter types. Our implementation of the “S3: Replace resource id” operator uses the Android resource parser in FlowDroid [7] to obtain a resource id of the same type. As each compiled APK needs to be signed before installation, we use jarsigner¹² for signing the compiled APK. We re-install the signed APK onto the device using adb commands¹³. Re-installing each signed app over the existing one, instead of uninstalling it first, allows us to keep the app data (e.g., account information and settings), which saves the time of re-entering the required information during subsequent executions of U.

8 SUBJECTS
While there are various benchmarks used for evaluating the effectiveness of automated testing of Android applications [4, 5, 15, 30] and the effectiveness of repair approaches for C programs [27, 50, 57], a recent study [16] showed that the crashes in these benchmarks cannot be adequately reproduced by existing Android testing tools. Meanwhile, an Android-specific benchmark like DROIDBENCH [7] does not contain real Android apps and is designed for evaluating taint-analysis tools. Although empirical studies on Android apps [12, 18] investigated the bug reports of real Android apps, none of these studies tries to replicate the reported crashes. Therefore, none of the existing benchmarks can be used for evaluating the effectiveness of analyzing crashes in Android apps.
We introduce a new benchmark, called DroixBench, that contains 24 reproducible crashes in 15 real-world Android apps. Apart from evaluating Droix, this benchmark can be used to assess the effectiveness of detecting and analyzing crashes in Android apps. To facilitate future research on the analysis of crashes, we have made DroixBench publicly available at: https://droix2017.github.io/.
DroixBench is a new set of Android apps for evaluating Droix. Apps used for deriving the transformation operators in Section 4 are excluded from DroixBench to avoid overfitting in the evaluation. Specifically, we modified our crawler to find the most recent issues (bug reports) on Android app crashes on GitHub. Our goal is to identify a set of reproducible crashes in Android apps. To reduce the time spent on manual inspection of these bug reports, our crawler excludes (1) issues without any bug-fixing commits (which are essential for comparing patch quality); (2) unresolved issues (to avoid invalid failures); and (3) non-Android related issues (e.g., iOS crashes). This step yields more than 300 GitHub issues. We further exclude the following kinds of defects:
Device-specific defects. We eliminate defects that require specific versions/brands of Android devices.
Resource-dependent defects. We eliminate defects that require specific resources (e.g., making phone calls), as we may not be able to replicate these issues easily on an Android emulator.
Irreproducible crashes. We eliminate crashes that are deemed irreproducible by the developers.

9 EVALUATION
We evaluate the effectiveness of Droix in repairing crashes in real Android apps, and we compare the quality of Droix’s patches with the quality of the human patches. Our evaluation aims to address the following research questions:
RQ1 How many crashes in Android apps can Droix fix?
RQ2 How does the quality of the patches generated by Droix compare with that of the patches written by developers?

9.1 Experimental Setup
We evaluate Droix on 24 defects from 15 real Android apps in DroixBench. Table 4 lists information about the evaluated apps. The “Type” column contains the specific type of exception that causes the crash, whereas the “TestEx” column represents the time taken in seconds to execute the UI test. Overall, DroixBench contains a wide variety of apps of various sizes (4K to 115K lines of code) and different types of exceptions that lead to crashes.
As Droix relies on a randomized algorithm, we use the same parameters as GenProg [26] for our experiments (10 runs for each defect with PopSize=40 and a maximum of 10 generations). In each run, we report the first patch found among those with the lowest score (minimum property violations). Each run of Droix is terminated after one hour or when a patch with minimal violations is generated. All experiments were performed on a machine with a quad-core Intel Core i7-5600U 2.60GHz processor and 12GB of memory. All apps are executed on a Google Nexus 5x emulator (Android API 25).

11 https://developer.android.com/studio/command-line/logcat.html
12 http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jarsigner.html
13 https://developer.android.com/studio/command-line/adb.html
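The “S4: Replace method” operator described in Section 7 selects a replacement method by the Levenshtein distance between method names, restricted to compatible parameter types. The Java sketch below illustrates that selection rule under stated assumptions: it uses `java.lang.reflect` to enumerate the public methods of a class, whereas Droix operates on Soot's Jimple representation; it interprets "compatible" as "same arity, and each parameter type is assignable from the original argument type" (with no primitive widening); and the class and method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical sketch of choosing a replacement method by name similarity and parameter types. */
final class MethodReplacementPicker {

    /** Levenshtein edit distance: insertions, deletions, and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                          // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                          // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;                     // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Same arity, and each argument type is assignable to the parameter (no primitive widening). */
    static boolean compatible(Class<?>[] params, Class<?>[] argTypes) {
        if (params.length != argTypes.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            if (!params[i].isAssignableFrom(argTypes[i])) {
                return false;
            }
        }
        return true;
    }

    /** Closest compatible public method of {@code owner}, excluding the original signature itself. */
    static Optional<Method> pick(Class<?> owner, String originalName, Class<?>... argTypes) {
        Method best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Method m : owner.getMethods()) {
            boolean sameSignature = m.getName().equals(originalName)
                    && Arrays.equals(m.getParameterTypes(), argTypes);
            if (sameSignature || !compatible(m.getParameterTypes(), argTypes)) {
                continue;
            }
            int d = levenshtein(originalName, m.getName());
            // Break ties by the method's string form so the result does not depend on getMethods() order.
            if (d < bestDistance || (d == bestDistance && m.toString().compareTo(best.toString()) < 0)) {
                best = m;
                bestDistance = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // A call to the non-existent String.substr(int, int) is redirected to the closest match.
        pick(String.class, "substr", int.class, int.class)
                .ifPresent(m -> System.out.println(m.getName() + " " + levenshtein("substr", m.getName())));
    }
}
```

For a call to a non-existent `String.substr(int, int)`, the closest compatible candidate is `substring(int, int)`, at distance 3. Filtering on parameter types before ranking by name keeps the replacement well-typed, which mirrors the well-formedness property in Table 3.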
Table 4: Subject Apps and Their Basic Statistics

App Name | Description | Version | LOC | Type | TestEx (s)
Transistor | radio players | 1.2.3 | 4K | NullPointer | 42.1
Transistor | radio players | 1.1.5 | 4K | IllegalState | 40.1
Pix-art | photo editor | 1.17.1 | 54K | NullPointer | 37.2
Pix-art | photo editor | 1.17.0 | 60K | NullPointer | 42.0
PoetAssistant | poet writing helper | 1.18.2 | 12K | NullPointer | 42.3
PoetAssistant | poet writing helper | 1.10.4 | 6K | SQLite | 60.9
Anymemo | flashcard learning | 10.10.1 | 29K | NullPointer | 50.5
Anymemo | flashcard learning | 10.9.922 | 33K | NullPointer | 83.9
AnkiDroid | flashcard learning | 2.8.1 | 73K | IllegalState | 50.6
AnkiDroid | flashcard learning | 2.7b1 | 73K | ClassCast | 37.2
Fdroid | open-source app repository | 0.103.2 | 50K | IllegalState | 38.7
Fdroid | open-source app repository | 0.98 | 38K | SQLite | 37.3
Yalp | app repository | 0.17 | 11K | NullPointer | 57.4
LabCoat | GitLab client | 2.2.4 | 45K | NullPointer | 49.2
GnuCash | finance expense tracker | 2.1.4 | 42K | IllegalArgument | 32.0
GnuCash | finance expense tracker | 2.1.3 | 40K | NullPointer | 37.2
GnuCash | finance expense tracker | 2.0.5 | 37K | IllegalArgument | 42.2
NoiseCapture | noise evaluator | 0.4.2b | 10K | NullPointer | 42.5
NoiseCapture | noise evaluator | 0.4.2b | 10K | ClassCast | 41.2
ConnectBot | secure shell client | 1.9.2 | 26K | OutOfBounds | 57.4
K9 | email client | 5.111 | 115K | NullPointer | 42.2
OpenMF | Mifosx client | 1.0.1 | 75K | IllegalState | 134.0
Transdroid | torrents client | 2.5.0b1 | 37K | NullPointer | 45.9
Beem | communication tool | 0.1.7rc1 | 21K | NullPointer | 61.3

For each defect, we manually inspect the source code of the human-patched program and the source code decompiled from Droix’s patched program. If the source code of the automatically patched program differs from that of the human-patched program, we further investigate the UI behavior of the patched programs by installing both the human-generated APK and the automatically generated APK onto the Android device. For each APK, we manually perform a visual comparison of the screens triggered by a set of available UI actions (clicks, swipes) after the crashing point.
Definition 1. Given the source code of the human-patched program Src_human, the code of an automatically generated patch Src_machine, the compiled APK of the human-patched program APK_human, and the compiled APK of the automatically generated patch APK_machine, we measure patch quality using the criteria defined below:
(C1) Syntactically Equivalent. Src_machine is “Syntactically Equivalent” if Src_machine and Src_human are syntactically the same.
(C2) Semantically Equivalent. Src_machine is “Semantically Equivalent” if Src_machine and Src_human are not syntactically the same but produce the same semantic behavior.
(C3) UI-behavior Equivalent. APK_machine is “UI-behavior Equivalent” to APK_human if the UI state at the crashing point after applying the automated fix is the same as the UI state at the crashing point after applying the human patch. Two UI states are considered the same if their UI layouts are the same, the sets of enabled events are the same, and these events again (recursively) lead to UI-equivalent states. UI-behavior equivalence of APK_human against APK_machine is checked manually in our experiments.
(C4) Incorrect. We label APK_machine as “Incorrect” if APK_machine leads to undesirable behavior (e.g., causes another crash) but this behavior is not observed in APK_human.
(C5) Better. We label APK_machine as “Better” when APK_human leads to a regression witnessed by another UI test U_R whereas APK_machine passes U_R.
Formally, C1 ⟹ C2 ∧ C2 ⟹ C3; hence, a generated patch that is syntactically equivalent to the human patch is superior to both a semantically equivalent patch and a UI-behavior equivalent patch. We note that, in general, checking whether a patch is semantically equivalent to the human patch (C2) is an undecidable problem. However, in our manual analysis, the correct behavior for all evaluated patches is well-defined. While C1 and C2 investigate the behavior of patches at the source-code level, we introduce C3 to compare the behavior of patches at the GUI level. We consider C3 because our approach uses GUI tests to guide the repair process. Furthermore, since our approach does not require source code, direct manual checking of source code may sometimes be tedious.

9.2 Evaluation Results
Table 5 shows the patch quality results for Droix. The “Time” column in Table 5 indicates the time taken in seconds across 10 runs for generating the patch before the one-hour time limit is reached. On average, Droix takes 30 minutes to generate a patch. Meanwhile, the “Repair” column denotes the number of plausible patches (APKs that pass the UI test) generated by Droix. Overall, Droix generates 15 plausible patches out of the 24 evaluated defects. Our analysis of the 9 defects that are not repaired by Droix reveals that all of them are difficult to fix, because the corresponding human patches all require at least 10 lines of edits.
The “Fix type” column in Table 5 shows the operator used in each patch (refer to Table 2 for the description of each operator). The “Null-check” operator is the most frequently used operator (it is used in six patches, and 4/6 = 67% of these patches are syntactically equivalent to the human patches). These results match the high frequency of the “Null-check” operator in our empirical study (Table 1). Interestingly, we also observe that the “GetActivity-check” operator tends to produce high-quality patches because this operator aims to enforce the “Activity-Fragment” property that checks the coordination between the host activity and its embedded fragment.
The “Syntactic Equiv.” column in Table 5 shows the patches that fulfill C1, while the “Semantic Equiv.” column denotes the patches that fulfill C2. Similarly, the “UI-behavior Equiv.” column shows the number of fixed APKs that fulfill the C3 criterion. In particular, we consider the patch generated by Droix for Anymemo v10.9.922 “Semantically Equivalent” because both patches use an object of the same type retained before configuration changes to fix a NullPointerException, but the object is retained in different program locations (i.e., the patches are not syntactically equivalent). Meanwhile, Droix generates three APKs that are UI-behavior equivalent to the human-generated APKs. Interestingly, we observed that although the human patches for these defects require multi-line fixes, the bug reports for these UI-behavior equivalent patches indicate that specific conditions are required to trigger the crashes (e.g., mSpinner.getSelectedItemId()!=INVALID_ROW_ID for the GnuCash v2.0.5 defect). As these conditions are difficult to trigger
from the UI level, synthesizing precise conditions is not required for ensuring UI-behavior equivalence.

Table 5: Patch Quality Results

App | Version | Time (s) | Fix type | Repair | Syntactic Equiv. | Semantic Equiv. | UI-behavior Equiv. | Others
Transistor | 1.2.3 | 616 | - √ | | | | |
Transistor | 1.1.5 | 987 | GetActivity-check | yes | | | | better (⊕)
PixArt | 1.17.1 | 1164 | - √ | | | | |
PixArt | 1.17.0 | 1525 | Null-check | yes | | | △ |
PoetAssistant | 1.18.2 | 955 | Null-check | yes | | | △ |
PoetAssistant | 1.10.4 | 3600 | - | | | | |
Anymemo | 10.10.1 | 2104 | - √ | | | | |
Anymemo | 10.9.922 | 1336 | Retain Object | yes | | ⊙ | |
AnkiDroid | 2.8.1 | 3600 | - √ | | | | |
AnkiDroid | 2.7b1 | 3600 | Try-catch | yes | | | | text missing (×)
Fdroid | 0.103.2 | 2293 | Replace method | yes | ⋆ | | |
Fdroid | 0.98 | 518 | - | | | | |
Yalp | 0.17 | 2970 | - | | | | |
LabCoat | 2.2.4 | 2074 | Null-check | yes | ⋆ | | |
GnuCash | 2.1.3 | 360 | - √ | | | | |
GnuCash | 2.0.5 | 1492 | Try-catch | yes | | | △ |
GnuCash | 2.1.4 | 3600 | - | | | | |
ConnectBot | 1.9.2 | 572 | Try-catch | yes | | | | text missing (×)
NoiseCapture | 0.4.2b | 340 | Null-check √ | yes | ⋆ | | |
NoiseCapture | 0.4.2b | 520 | Replace cast | yes | ⋆ | | |
K9 | 5.111 | 1718 | Try-catch | yes | | | | crash (×)
OpenMF | 1.0.1 | 3600 | GetActivity-check | yes | ⋆ | | |
Beem | 0.1.7rc1 | 2378 | Null-check | yes | ⋆ | | |
Transdroid | 2.5.0b1 | 1315 | Null-check | yes | ⋆ | | |
Total (24 defects) | | | | 15 | 7 | 1 | 3 | 4

The “Others” column in Table 5 includes one patch that is better than the human patch (marked as ⊕) and three patches that are incorrect (marked as ×). We consider the patch for Transistor v1.1.5 to be better than the human patch, as it passes the regression test stated in the bug report, whereas the human patch introduces a new regression (see Section 3 for detailed explanations). For two of the incorrect patches, we notice that some text that appears on the screens of the human APKs is missing from the screens of the fixed APKs (text missing). Meanwhile, the crash in K9 v5.111 occurs due to an invalid email address for a particular contact. In this case, the human APK treats the contact as a non-existing contact, while the patched APK displays the contact as an unknown recipient and crashes when the unknown recipient is selected. We think that both the human APK and the patched APK could be improved (e.g., by prompting the user to enter a valid email address instead of ignoring the contact). Although the patch generated by Droix for K9 violates the bug hazard property (catching a runtime exception), we select this patch as no other patches are found within the time limit.

Droix fixes 15 out of 24 evaluated crashes: seven of these fixes are the same as the human patches, one is semantically equivalent, and three are UI-behavior equivalent. In one rare case, Droix generates a better repair than the human patch.

10 THREATS TO VALIDITY
We identify the following threats to the validity of our experiments:
Operators used. While we derive our operators from frequently used operators in fixing open source apps and from the Android API documentation, our set of operators is not exhaustive.
Reproducing crashes. We manually reproduce each crash in our proposed benchmark. As we rely on the Android emulator for reproducing crashes, the crashes in our benchmark are limited to those that can be reliably reproduced on Android emulators. Crashes that require a specific setup (e.g., making phone calls) may be more challenging or impractical to replay.
Crashes investigated. As we only investigate open source Android apps in our empirical study and in our proposed benchmark, our results may not generalize to closed-source apps. We focus on open source apps because our patch analysis requires the availability of source code. Nevertheless, as Droix takes an Android APK as input, it could be used for fixing closed-source apps. We leave the empirical evaluation of closed-source apps as future work.
Patch Quality. During our manual patch analysis, at least two of the authors independently analyze the quality of the human patches versus the quality of the automatically generated patches, and meet to resolve any disagreements. As most bug reports include detailed explanations of the human patches and the expected behavior of the crashing UI test, the patch analysis is relatively straightforward.

11 RELATED WORK
Testing and Analysis of Android Apps. Many automated techniques (AndroidRipper [4], ACTEVE [5], A3E [9], Collider [23], Dynodroid [30], FSMdroid [47], Fuzzdroid [43], Orbit [56], Sapienz [31], Swifthand [15], and work by Mirzaei et al. [37]) have been proposed to generate test inputs for Android apps. Our approach is orthogonal to these approaches, and the tests generated by these approaches could
serve as inputs to our Android repair system. Several approaches focus on reproducing crashes in Java projects [14, 46, 55]. Meanwhile, CRASHSCOPE [38] automatically detects and reproduces crashes in Android apps. Our benchmark with 24 reproducible crashes could be used for evaluating the effectiveness of these approaches. Similar to FlowDroid [7], we implement our fix operators on top of the Soot framework, and we use activity lifecycle information for our analysis of Android apps. Instead of considering only the activity lifecycle as in FlowDroid, we also encode the fragment lifecycle and activity-fragment coordination as test-level properties. RERAN [22] can precisely record and replay UI events on Android devices, including gestures (e.g., multitouch). While our approach accepts UI sequences in the form of scripts recorded in the user interface, the record-and-replay mechanism in RERAN could allow Droix to handle more complex UI events. Although our code checker incorporates some Java exception-handling best practices listed in a recent study of Android apps [18], our empirical study of crashes that occur in Android apps goes beyond that study by performing a thorough investigation of the common root causes of Android crashes.
Automated Program Repair. Several techniques (Angelix [35], ASTOR [33], ClearView [41], DirectFix [34], GenProg [26], PAR [24], Prophet [28], NOPOL [54], relifix [49]) have been introduced to automatically generate patches. There are several key differences between our Android repair framework and other existing repair approaches. Firstly, instead of relying on the quality of the test suite to guide the repair process, our approach augments a given UI test with code-level and test-level properties for ranking generated patches. Secondly, existing approaches cannot handle flaky UI tests, as they may misinterpret the outcome of UI tests and mistakenly produce invalid patches. Finally, our repair framework modifies the compiled APK and each test execution is performed remotely on Android emulators, whereas other approaches modify source code directly and execute each test on the same platform as the other components of the repair system. Other studies of automated repair use benchmarks for C programs [27, 50, 57, 58], whereas DroixBench contains a set of reproducible crashes for Android apps. QACrashFix [19] and the work by Azim et al. [8] use Android apps as datasets for experiments, without any Android-specific study of the causes of crashes. Their repair operators are Android-agnostic. Specifically, QACrashFix merely adds, deletes, or replaces a single node in the Abstract Syntax Tree, whereas the work by Azim et al. only inserts fault-avoiding code that is similar to the workarounds identified in our study in Section 4. To eliminate invalid patches, anti-patterns have been proposed as a set of forbidden rules that can be enforced on top of search-based repair approaches [51]. Although our code-level and test-level properties could be considered different forms of anti-patterns that are examined prior to and after test executions, we use these properties to select mutants that violate fewer properties instead of eliminating these mutants. Similar to Droix, which uses stack trace information for fault localization, the work of Sinha et al. uses stack trace information for locating Java exceptions [44]. However, their approach only supports the analysis of NullPointerException, whereas our approach can automatically repair different types of exceptions.
Other Repairs of Android Apps. EnergyPatch fixes energy bugs in Android apps using a repair expression that captures the resource expression and releases system calls [10]. The battery-aware transformations proposed in [17] aim to reduce the power consumption of mobile devices. Several approaches generate security patches for Android apps [39, 59]. While energy bugs and security-related vulnerabilities may cause crashes in Android apps, we present a generic framework for the automated repair of Android crashes, focusing on crashes that occur due to the misunderstanding of Android activity and fragment lifecycles.
UI Repair. FlowFixer is an approach that repairs broken workflows in GUI applications that evolve due to GUI refactoring. SITAR uses annotated event-flow graphs for fixing unusable GUI test scripts [20]. Although Droix takes a UI test as input, it automatically fixes buggy Android apps rather than the inputs that crash the GUI applications.

12 CONCLUSIONS AND FUTURE WORK
We study the common causes of 107 crashes in Android apps. Our investigation reveals that app crashes occur due to missing callback handlers (17.76%), improper handling of resources (16%), and violations of management rules for the Android activity and fragment lifecycles (14%). Based on our analysis of the patches issued by Android developers to fix these crashes and of the Android API documentation that specifies the correct usage of Android APIs, we derive a set of lifecycle-aware transformations. To reduce the time and effort of fixing crashes in Android apps, we also introduce Droix, a novel Android repair framework that automatically generates a fixed APK when given a buggy APK and UI event sequences as input. To encourage future research on Android crashes, we propose DroixBench, a benchmark that contains 24 reproducible crashes occurring in 15 open source Android apps. Our evaluation on DroixBench demonstrates that Droix can generate repairs for 63% (15/24) of the evaluated crashes and that seven of the automatically generated patches are syntactically equivalent to the human patches.
Although our repair framework currently performs analysis and mutation of Android apps on a desktop machine while executing UI tests on an Android emulator, in the future it is feasible to have a standalone repair system that can be installed as an app and that automatically fixes crashes occurring in other apps on Android devices. Since our GUI interface does not assume any programming knowledge, our repair framework could potentially benefit non-technical users who would like to have their own versions of fixed apps instead of waiting for the official releases. Moreover, as we observe that many crashes occur due to the misunderstanding of the activity/fragment lifecycles specified in the Android API documentation, we think that Droix could be used as a plugin that automatically reports management rule violations together with patch suggestions to assist developers in understanding Android API specifications.

ACKNOWLEDGEMENT
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Corporate Laboratory at University Scheme, National University of Singapore, and Singapore Telecommunications Ltd. The first author thanks Southern University of Science and Technology for the travel support.
REFERENCES
[1] 2017. What Consumers Really Need and Want. https://goo.gl/puYdkG. (2017). Accessed 2017-03-27.
[2] Sharad Agarwal, Ratul Mahajan, Alice Zheng, and Victor Bahl. 2010. Diagnosing mobile applications in the wild. In Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks. ACM, 22.
[3] Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. 2011. A GUI crawling-based technique for Android mobile application testing. In Software Testing, Verification and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on. IEEE, 252–261.
[4] Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Salvatore De Carmine, and Atif M Memon. 2012. Using GUI ripping for automated testing of Android applications. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. ACM, 258–261.
[5] Saswat Anand, Mayur Naik, Mary Jean Harrold, and Hongseok Yang. 2012. Automated concolic testing of smartphone apps. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM, 59.
[6] Alessandro Armando, Alessio Merlo, Mauro Migliardi, and Luca Verderame. 2013. Breaking and fixing the Android launching flow. Computers & Security 39 (2013), 104–115.
[7] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. ACM SIGPLAN Notices 49, 6 (2014), 259–269.
[8] Md Tanzirul Azim, Iulian Neamtiu, and Lisa M Marvel. 2014. Towards self-healing smartphone software via automated patching. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. ACM, 623–628.
[9] Tanzirul Azim and Iulian Neamtiu. 2013. Targeted and depth-first exploration
Symposium on Software Testing and Analysis (ISSTA 2013). ACM, New York, NY, USA, 67–77. https://doi.org/10.1145/2483760.2483777
[24] Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic patch generation learned from human-written patches. In ICSE 2013. IEEE Press, 802–811.
[25] Patrick Lam, Eric Bodden, Ondrej Lhoták, and Laurie Hendren. 2011. The Soot framework for Java program analysis: a retrospective. In Cetus Users and Compiler Infrastructure Workshop (CETUS 2011), Vol. 15. 35.
[26] Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 3–13.
[27] Claire Le Goues, Neal Holtschulte, Edward K Smith, Yuriy Brun, Premkumar Devanbu, Stephanie Forrest, and Westley Weimer. 2015. The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Transactions on Software Engineering 41, 12 (2015), 1236–1256.
[28] Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). ACM, New York, NY, USA, 298–312.
[29] Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. An Empirical Analysis of Flaky Tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 643–653. https://doi.org/10.1145/2635868.2635920
[30] Aravind Machiry, Rohan Tahiliani, and Mayur Naik. 2013. Dynodroid: An Input Generation System for Android Apps. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY, USA, 224–234. https://doi.org/10.1145/2491411.2491450
[31] Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective Automated Testing for Android Applications. In Proceedings of the 25th International Sympo-
for systematic testing of android apps. In Acm Sigplan Notices, Vol. 48. ACM, sium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA,
641–660. 94–105. https://doi.org/10.1145/2931037.2931054
[10] A. Banerjee, L. K. Chong, C. Ballabriga, and A. Roychoudhury. 2017. EnergyPatch: [32] Matias Martinez, Thomas Durieux, Romain Sommerard, Jifeng Xuan, and Martin
Repairing Resource Leaks to Improve Energy-efficiency of Android Apps. IEEE Monperrus. 2016. Automatic repair of real bugs in Java: A large-scale experiment
Transactions on Software Engineering PP, 99 (2017), 1–1. https://doi.org/10.1109/ on the Defects4J dataset. Empirical Software Engineering (2016), 1–29.
TSE.2017.2689012 [33] Matias Martinez and Martin Monperrus. 2016. ASTOR: A Program Repair Library
[11] Alexandre Bartel, Jacques Klein, Yves Le Traon, and Martin Monperrus. 2012. for Java (Demo). In Proceedings of the 25th International Symposium on Software
Dexpler: Converting Android Dalvik Bytecode to Jimple for Static Analysis with Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 441–444. https:
Soot. In Proceedings of the ACM SIGPLAN International Workshop on State of //doi.org/10.1145/2931037.2948705
the Art in Java Program Analysis (SOAP ’12). ACM, New York, NY, USA, 27–38. [34] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. Directfix: Looking
https://doi.org/10.1145/2259051.2259056 for simple program repairs. In Proceedings of the 37th International Conference on
[12] Pamela Bhattacharya, Liudmila Ulanova, Iulian Neamtiu, and Sai Charan Koduru. Software Engineering-Volume 1. IEEE Press, 448–458.
2013. An empirical analysis of bug reports and bug fixing in open source android [35] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable
apps. In Software Maintenance and Reengineering (CSMR), 2013 17th European multiline program patch synthesis via symbolic analysis. In Software Engineering
Conference on. IEEE, 133–143. (ICSE), 2016 IEEE/ACM 38th International Conference on. IEEE, 691–701.
[13] Robert V Binder. 2000. Testing object-oriented systems: models, patterns, and tools. [36] Atif M. Memon and Myra B. Cohen. 2013. Automated Testing of GUI Applications:
Addison-Wesley Professional. Models, Tools, and Controlling Flakiness. In Proceedings of the 2013 International
[14] N. Chen and S. Kim. 2015. STAR: Stack Trace Based Automatic Crash Reproduc- Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA,
tion via Symbolic Execution. IEEE Transactions on Software Engineering 41, 2 1479–1480. http://dl.acm.org/citation.cfm?id=2486788.2487046
(Feb 2015), 198–220. https://doi.org/10.1109/TSE.2014.2363469 [37] Nariman Mirzaei, Sam Malek, Corina S. Păsăreanu, Naeem Esfahani, and Riyadh
[15] Wontae Choi, George Necula, and Koushik Sen. 2013. Guided gui testing of Mahmood. 2012. Testing Android Apps Through Symbolic Execution. SIGSOFT
android apps with minimal restart and approximate learning. In Acm Sigplan Softw. Eng. Notes 37, 6 (Nov. 2012), 1–5. https://doi.org/10.1145/2382756.2382798
Notices, Vol. 48. ACM, 623–640. [38] Kevin Moran, Mario Linares-Vásquez, Carlos Bernal-Cárdenas, Christopher Ven-
[16] Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. 2015. Au- dome, and Denys Poshyvanyk. 2016. Automatically discovering, reporting and
tomated test input generation for android: Are we there yet?(e). In Automated reproducing android application crashes. In Software Testing, Verification and
Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. IEEE, Validation (ICST), 2016 IEEE International Conference on. IEEE, 33–44.
429–440. [39] Collin Mulliner, Jon Oberheide, William Robertson, and Engin Kirda. 2013. Patch-
[17] Jürgen Cito, Julia Rubin, Phillip Stanley-Marbell, and Martin Rinard. 2016. Battery- droid: Scalable third-party security patches for android devices. In Proceedings of
aware transformations in mobile applications. In Automated Software Engineering the 29th Annual Computer Security Applications Conference. ACM, 259–268.
(ASE), 2016 31st IEEE/ACM International Conference on. IEEE, 702–707. [40] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chan-
[18] Roberta Coelho, Lucas Almeida, Georgios Gousios, Arie Van Deursen, and dra. 2013. SemFix: Program repair via semantic analysis. In Proceedings of the
Christoph Treude. 2016. Exception handling bug hazards in Android. Empirical 2013 International Conference on Software Engineering. IEEE Press, 772–781.
Software Engineering (2016), 1–41. [41] Jeff H. Perkins, Sunghun Kim, Sam Larsen, Saman Amarasinghe, Jonathan
[19] Qing Gao, Hansheng Zhang, Jie Wang, Yingfei Xiong, Lu Zhang, and Hong Mei. Bachrach, Michael Carbin, Carlos Pacheco, Frank Sherwood, Stelios Sidiroglou,
2015. Fixing recurring crash bugs via analyzing q&a sites (T). In Automated Greg Sullivan, Weng-Fai Wong, Yoav Zibin, Michael D. Ernst, and Martin Rinard.
Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. 2009. Automatically Patching Errors in Deployed Software. In SOSP. 87–102.
IEEE, 307–318. [42] Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The
[20] Zebao Gao, Zhenyu Chen, Yunxiao Zou, and Atif M Memon. 2016. Sitar: GUI Strength of Random Search on Automated Program Repair. In Proceedings of the
test script repair. Ieee transactions on software engineering 42, 2 (2016), 170–186. 36th International Conference on Software Engineering (ICSE). ACM, New York,
[21] Laurence Goasduff and Christy Pettey. 2012. Gartner says worldwide smartphone NY, USA, 254–265.
sales soared in fourth quarter of 2011 with 47 percent growth. Visited April (2012). [43] Siegfried Rasthofer, Steven Arzt, Stefan Triller, and Michael Pradel. 2017. Making
[22] Lorenzo Gomez, Iulian Neamtiu, Tanzirul Azim, and Todd Millstein. 2013. RERAN: Malory Behave Maliciously: Targeted Fuzzing of Android Execution Environ-
Timing- and Touch-sensitive Record and Replay for Android. In Proceedings of ments. In Proceedings of the 39th International Conference on Software Engineering
the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 300–311. https://doi.org/10.1109/
Piscataway, NJ, USA, 72–81. http://dl.acm.org/citation.cfm?id=2486788.2486799 ICSE.2017.35
[23] Casper S. Jensen, Mukul R. Prasad, and Anders Møller. 2013. Automated Testing [44] Saurabh Sinha, Hina Shah, Carsten Görg, Shujuan Jiang, Mijung Kim, and
with Targeted Event Sequence Generation. In Proceedings of the 2013 International Mary Jean Harrold. 2009. Fault Localization and Repair for Java Runtime
ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Exceptions. In Proceedings of the Eighteenth International Symposium on Soft- ACM SIGSOFT International Symposium on Foundations of Software Engineering.
ware Testing and Analysis (ISSTA ’09). ACM, New York, NY, USA, 153–164. ACM, 727–738.
https://doi.org/10.1145/1572272.1572291 [52] W. Weimer, Z.P. Fry, and S. Forrest. 2013. Leveraging program equivalence
[45] Edward K Smith, Earl T Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure for adaptive program repair: Models and first results. In Automated Software
worse than the disease? overfitting in automated program repair. In Proceedings Engineering (ASE).
of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, [53] Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009.
532–543. Automatically finding patches using genetic programming. In ICSE. 364–374.
[46] Mozhan Soltani, Annibale Panichella, and Arie van Deursen. 2017. A guided [54] J. Xuan, M. Martinez, F. DeMarco, M. Clement, S. Lamelas Marcote, T. Durieux,
genetic algorithm for automated crash reproduction. In Proceedings of the 39th D. Le Berre, and M. Monperrus. 2016. Nopol: Automatic Repair of Conditional
International Conference on Software Engineering. IEEE Press, 209–220. Statement Bugs in Java Programs. IEEE Transactions on Software Engineering PP,
[47] Ting Su. 2016. FSMdroid: Guided GUI Testing of Android Apps. In Proceedings of 99 (2016), 1–1.
the 38th International Conference on Software Engineering Companion (ICSE ’16). [55] Jifeng Xuan, Xiaoyuan Xie, and Martin Monperrus. 2015. Crash reproduction via
ACM, New York, NY, USA, 689–691. https://doi.org/10.1145/2889160.2891043 test case mutation: Let existing test cases help. In Proceedings of the 2015 10th
[48] Shin Hwei Tan, Darko Marinov, Lin Tan, and Gary T. Leavens. 2012. @tCom- Joint Meeting on Foundations of Software Engineering. ACM, 910–913.
ment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. In [56] Wei Yang, Mukul R Prasad, and Tao Xie. 2013. A grey-box approach for automated
Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, GUI-model generation of mobile applications. In International Conference on
Verification and Validation (ICST ’12). IEEE Computer Society, Washington, DC, Fundamental Approaches to Software Engineering. Springer, 250–265.
USA, 260–269. https://doi.org/10.1109/ICST.2012.106 [57] Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roy-
[49] Shin Hwei Tan and Abhik Roychoudhury. 2015. Relifix: Automated Repair choudhury. 2017. A Feasibility Study of Using Automated Program Repair for
of Software Regressions. In Proceedings of the 37th International Conference on Introductory Programming Assignments. In Proceedings of the 2017 11th Joint
Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York,
471–482. http://dl.acm.org/citation.cfm?id=2818754.2818813 NY, USA, 740–751. https://doi.org/10.1145/3106237.3106262
[50] Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, and Abhik Roychoudhury. [58] Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, and Abhik Roy-
2017. Codeflaws: A Programming Competition Benchmark for Evaluating Auto- choudhury. 2017. A correlation study between automated program repair and
mated Program Repair Tools. In Proceedings of the 39th International Conference test-suite metrics. Empirical Software Engineering (2017), 1–32.
on Software Engineering Companion (ICSE-C ’17). IEEE Press, Piscataway, NJ, USA, [59] Mu Zhang and Heng Yin. 2014. AppSealer: Automatic Generation of Vulnerability-
180–182. https://doi.org/10.1109/ICSE-C.2017.76 Specific Patches for Preventing Component Hijacking Attacks in Android Appli-
[51] Shin Hwei Tan, Hiroaki Yoshida, Mukul R Prasad, and Abhik Roychoudhury. 2016. cations. In NDSS.
Anti-patterns in search-based program repair. In Proceedings of the 2016 24th

You might also like