CVE-2021-1048: Android kernel refcount increment on mid-destruction file
Jann Horn
The Basics
NOTE: The original vulnerability was in the Linux kernel, but in-the-wild exploitation was only seen on Android-based devices, which run Android-specific kernel forks.
Disclosure or Patch Date: it's complicated (but the Android bulletin is from 6 November 2021)
Product: Android / Linux kernel
Advisory: ASB 2021-11
Affected Versions (upstream Linux):
- 5.9-rc2 - 5.9-rc3 (mainline: only release candidates affected)
- 5.8.4 - 5.8.7 (short-lived stable branch)
- date range: 2020-08-26 - 2020-09-09
- 5.7.18 and higher (short-lived stable branch, EOL before fix)
- date range: 2020-08-26 - EOL
- 5.4.61 - 5.4.63 (LTS stable branch)
- date range: 2020-08-26 - 2020-09-09
- 4.19.142 - 4.19.143 (LTS stable branch)
- date range: 2020-08-26 - 2020-09-09
- 4.14.195 - 4.14.196
- date range: 2020-08-26 - 2020-09-09
- 4.9.234 - 4.9.235
- date range: 2020-08-26 - 2020-09-12
- 4.4.234 - 4.4.235
- date range: 2020-08-26 - 2020-09-12
Affected Versions (Android devices): possibly some Android devices before SPL 2021-11-06, depending on LTS syncs
First Patched Version:
- upstream: 5.9-rc4, 5.8.8, 5.4.64, 4.19.144, 4.14.197, 4.9.236, 4.4.236
- Android devices: SPL 2021-11-06 at the latest, possibly earlier (see "context of bug" section for explanation)
Issue/Bug Report (upstream Linux): https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u
Issue/Bug Report (Android devices): unknown
Patch CL: https://git.kernel.org/linus/77f4689de17c
Bug-Introducing CL: https://git.kernel.org/linus/a9ed4a6560b8 (bugfix for another memory corruption)
Reporter(s) (upstream Linux): syzbot/syzkaller
Reporter(s) (Android devices): unknown
The Code
Proof-of-concept: N/A
Exploit sample: N/A
Did you have access to the exploit sample when doing the analysis? no
The Vulnerability
Bug class: object state confusion leading to use-after-free
Vulnerability details:
ep_loop_check_proc() is trying to increment the refcount of a file with get_file(). However, get_file() is only allowed when a refcounted reference is already held to the file; and ep_loop_check_proc() instead relies on locking ep->mtx to protect the weak reference to the file from concurrent removal by eventpoll_release(), which doesn't prevent encountering a file with refcount zero.
[Diagram: the relevant lifetime states of struct file]
Essentially, get_file() is called on an object that may be in a state in which get_file() is not permitted.
Patch analysis:
get_file() is replaced with get_file_rcu(), which is valid for (a superset of) all possible states of the file.
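The difference between the two helpers can be illustrated with a toy model. The following is a minimal Python sketch, not kernel code: the `File` class, `fput()`, `finish_destruction()` and `simulate()` are simplified stand-ins invented for illustration, and the only property carried over from the kernel is that get_file() increments unconditionally while get_file_rcu() increments only if the count is not already zero.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Toy stand-in for struct file; only the lifetime state is modeled."""
    f_count: int = 1      # one reference held by the owner
    dying: bool = False   # refcount reached zero, teardown has started
    freed: bool = False   # memory has been released


def fput(f: File) -> None:
    """Drop a reference; dropping the last one starts teardown."""
    f.f_count -= 1
    if f.f_count == 0:
        f.dying = True


def finish_destruction(f: File) -> None:
    """Teardown completes no matter what happened to f_count meanwhile."""
    if f.dying:
        f.freed = True


def get_file(f: File) -> bool:
    """Unconditional increment: only valid if the caller already holds a reference."""
    f.f_count += 1
    return True


def get_file_rcu(f: File) -> bool:
    """Increment-unless-zero: refuses to take a reference on a dying file."""
    if f.f_count == 0:
        return False
    f.f_count += 1
    return True


def simulate(take_ref) -> bool:
    """Return True if the caller ends up holding a 'reference' to a freed file."""
    f = File()
    fput(f)                 # last real reference is dropped, file is mid-destruction
    got = take_ref(f)       # loop check reaches the file through its weak reference
    finish_destruction(f)   # teardown runs to completion regardless
    return got and f.freed


print("get_file():     dangling reference =", simulate(get_file))
print("get_file_rcu(): dangling reference =", simulate(get_file_rcu))
```

In this model, the unconditional increment hands out a reference to an object that is freed shortly afterwards, whereas the increment-unless-zero variant fails cleanly and lets the caller skip the file.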
Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.): Since the bug was quickly fixed in upstream Linux, but not in all Android devices, there's a good chance that the attackers specifically searched for memory corruption fixes that are present upstream but not in Android devices.
This reminds me of https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html , another case where a bug was fixed upstream but not in all Android kernels.
(Historical/present/future) context of bug:
The commit that introduced the bug (and fixed another one) was included in the Android Security Bulletin for December 2020, forcing all Android vendors to include that commit. However, the fix for this bug, despite quickly landing in upstream stable kernels (see "Affected Versions" above), was only included in an Android Security Bulletin in November 2021.
This means that devices by Android vendors who only cherrypick bugfixes referenced in Android Security Bulletins, rather than pulling the complete Android common kernel tree, will have been vulnerable for almost a year, even though upstream stable releases (and Android common kernels) were only affected for ~2-3 weeks.
That doesn't necessarily mean that all Android devices were affected that long
though; for example, Pixel 4 XL devices seem to have been patched in their
March 2021 security update through the periodic LTS update from 4.14.191 to
4.14.199.
The kernel versions that were shipped to Pixel 4 XL devices are (from running strings on boot.img in the firmware images):
- in the December 2020 update: 4.14.191-gf6c9439f069c-ab6924784 (still vulnerable?)
- in the January 2021 update: 4.14.191-gd36f32db91a3-ab6960308 (still vulnerable?)
- in the February 2021 update: 4.14.191-gd36f32db91a3-ab7006457 (still vulnerable?)
- in the March 2021 update: 4.14.199-g815ef3fd6754-ab7079165 (fixed)
- in the April 2021 update: 4.14.199-gb0863551cb91-ab7132611 (fixed)
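As a rough aid for this kind of triage, the version strings can be compared against the first patched upstream releases listed earlier. The sketch below is only an approximation under a stated assumption: it checks whether the LTS base of a kernel includes the upstream fix, and the function name `lts_base_includes_fix()` is made up for illustration. A `False` result does not prove a device is vulnerable, since vendors may cherrypick the fix (or never have picked up the bug-introducing commit) independently of the LTS version.

```python
import re

# First fixed upstream release per stable branch, from "First Patched Version".
FIRST_PATCHED = {
    (5, 8): 8,
    (5, 4): 64,
    (4, 19): 144,
    (4, 14): 197,
    (4, 9): 236,
    (4, 4): 236,
}


def lts_base_includes_fix(version_string):
    """Return True/False for known branches, None if the branch is not listed.

    Only the leading "major.minor.patch" part is used; suffixes such as
    "-g815ef3fd6754-ab7079165" are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if m is None:
        return None
    major, minor, patch = (int(x) for x in m.groups())
    first_fixed = FIRST_PATCHED.get((major, minor))
    if first_fixed is None:
        return None
    return patch >= first_fixed


for v in ("4.14.191-gf6c9439f069c-ab6924784", "4.14.199-g815ef3fd6754-ab7079165"):
    print(v, "->", lts_base_includes_fix(v))
```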
The Exploit
(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)
Exploit strategy (or strategies): N/A - no exploit sample to analyze
Exploit flow:
Known cases of the same exploit flow:
Part of an exploit chain?
The Next Steps
Variant analysis
Areas/approach for variant analysis (and why):
I think there are two approaches for variant analysis here:
- Check whether any Linux kernel patches listed in Android Security Bulletins are referenced by other commits in the Fixes: tag, and verify for any hits that they either aren't security-relevant or have also been included in an ASB.
- Look whether there are any other codepaths that extract a file from an epoll item and assume that its refcount is non-zero.
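The first approach can be partly automated. The following is a minimal sketch, assuming the commit messages have already been extracted (for example from `git log` output) into `(commit_id, message)` pairs; the function name `find_followups()` and the sample data are illustrative only, and the second sample entry is a made-up commit.

```python
import re

# Matches trailer lines such as:
#   Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list")
FIXES_RE = re.compile(r"^\s*Fixes:\s*([0-9a-f]{8,40})\b", re.IGNORECASE | re.MULTILINE)


def find_followups(commits, cherrypicked):
    """Map each cherrypicked commit id to the commits whose Fixes: tag names it.

    commits:      iterable of (commit_id, message) pairs
    cherrypicked: iterable of (possibly abbreviated) hex commit ids
    """
    cherrypicked = [c.lower() for c in cherrypicked]
    hits = {}
    for commit_id, message in commits:
        for ref in FIXES_RE.findall(message):
            ref = ref.lower()
            for picked in cherrypicked:
                # Abbreviations can differ in length, so compare the common prefix.
                n = min(len(ref), len(picked))
                if ref[:n] == picked[:n]:
                    hits.setdefault(picked, []).append(commit_id)
    return hits


# Illustrative input: the first entry mirrors the case from this report,
# the second is a hypothetical commit without a Fixes: tag.
sample = [
    ("77f4689de17c",
     'fix regression in "epoll: Keep a reference on files added to the check list"\n\n'
     'Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list")\n'),
    ("0123456789ab", "hypothetical: unrelated cleanup\n"),
]

print(find_followups(sample, ["a9ed4a6560b8"]))
```

Any hit still needs manual review to decide whether the followup is security-relevant.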
Found variants:
I found no variants with clear security implications.
Re #1, the following upstream Linux commits referenced in bulletins from 2020 and 2021 are referenced by followup fix commits:
- d0cb50185ae9 (do_last(): fetch directory ->i_mode and ->i_uid before it's too late)
  - followup: 6404674acd59 (vfs: fix do_last() regression)
    - reported by syzkaller: https://syzkaller.appspot.com/bug?extid=190005201ced78a74ad6
    - looks like just a NULL deref when racing?
- 07e6124a1a46 (vt: selection, close sel_buffer race)
  - followup: e8c75a30a23c (vt: selection, push sel_lock up)
    - deadlock fix
  - followup: 4b70dd57a15d (vt: selection, push console lock down)
    - deadlock fix
- 594cc251fdd0 (make 'user_access_begin()' do 'access_ok()')
  - followup: ab10ae1c3bef (lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user())
    - looks like a powerpc-specific performance regression fix?
- 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
  - followup: dcf23ac3e846 (locks: reinstate locks_delete_block optimization)
    - performance regression fix
- a9ed4a6560b8 (epoll: Keep a reference on files added to the check list)
  - followup: 77f4689de17c (fix regression in "epoll: Keep a reference on files added to the check list")
    - original case
- 21998a351512 (x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.)
  - followup: 33fc379df76b (x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb)
    - fixes incorrect reporting of speculation mitigation status on X86
  - followup: 1978b3a53a74 (x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP)
    - fixes not being able to turn on IBPB on X86
- 8019ad13ef7f (futex: Fix inode life-time issue)
  - followup: 8d67743653dc (futex: Unbreak futex hashing)
    - performance regression fix, theoretically also correctness fix
Re #2: The only place that looks vaguely interesting in that regard is ep_item_poll(): From what I can tell, it can invoke vfs_poll() on a file whose refcount is already zero, but only before the file's ->release() handler is called. But I think that's fine.
Structural improvements
What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?
Ideas to kill the bug class: In my opinion, the bug class here is "object state confusion", and killing the bug class would have to involve using static analysis and annotations to sanity-check whether object states match the requirements.
Ideas to mitigate the exploit flow: N/A
Other potential improvements:
When cherrypicking specific security fixes, it would probably be a good idea to at least monitor the upstream repository for commits that refer to the cherrypicked patch with Fixes:.
0-day detection methods
What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?