Tuesday, January 18, 2022

Zooming in on Zero-click Exploits

Posted by Natalie Silvanovich, Project Zero


Zoom is a video conferencing platform that has gained popularity throughout the pandemic. Unlike other video conferencing systems that I have investigated, where one user initiates a call that other users must immediately accept or reject, Zoom calls are typically scheduled in advance and joined via an email invitation. In the past, I hadn’t prioritized reviewing Zoom because I believed that any attack against a Zoom client would require multiple clicks from a user. However, a zero-click attack against the Windows Zoom client was recently revealed at Pwn2Own, showing that it does indeed have a fully remote attack surface. The following post details my investigation into Zoom.

This analysis resulted in two vulnerabilities being reported to Zoom. One was a buffer overflow that affected both Zoom clients and MMR servers, and one was an info leak that is only useful to attackers on MMR servers. Both of these vulnerabilities were fixed on November 24, 2021.

Zoom Attack Surface Overview

Zoom’s main feature is multi-user conference calls called meetings that support a variety of features including audio, video, screen sharing and in-call text messages. There are several ways that users can join Zoom meetings. To start, Zoom provides full-featured installable clients for many platforms, including Windows, Mac, Linux, Android and iPhone. Users can also join Zoom meetings using a browser link, but they are able to use fewer features of Zoom. Finally, users can join a meeting by dialing phone numbers provided in the invitation on a touch-tone phone, but this only allows access to the audio stream of a meeting. This research focused on the Zoom client software, as the other methods of joining calls use existing device features.

Zoom clients support several communication features other than meetings that are available to a user’s Zoom Contacts. A Zoom Contact is a user that another user has added as a contact using the Zoom user interface. Both users must consent before they become Zoom Contacts. Afterwards, the users can send text messages to one another outside of meetings and start channels for persistent group conversations. Also, if either user hosts a meeting, they can invite the other user in a manner that is similar to a phone call: the other user is immediately notified and they can join the meeting with a single click. These features represent the zero-click attack surface of Zoom. Note that this attack surface is only available to attackers that have convinced their target to accept them as a contact. Likewise, meetings are part of the one-click attack surface only for Zoom Contacts, as other users need to click several times to enter a meeting.

That said, it’s likely not that difficult for a dedicated attacker to convince a target to join a Zoom call even if it takes multiple clicks, and the way some organizations use Zoom presents interesting attack scenarios. For example, many groups host public Zoom meetings, and Zoom supports a paid Webinar feature where large groups of unknown attendees can join a one-way video conference. It could be possible for an attacker to join a public meeting and target other attendees. Zoom also relies on a server to transmit audio and video streams, and end-to-end encryption is off by default. It could be possible for an attacker to compromise Zoom’s servers and gain access to meeting data.

Zoom Messages

I started out by looking at the zero-click attack surface of Zoom. Loading the Linux client into IDA, it appeared that a great deal of its server communication occurred over XMPP. Based on strings in the binary, it was clear that XMPP parsing was performed using a library called gloox. I fuzzed this library using AFL and other coverage-guided fuzzers, but did not find any vulnerabilities. I then looked at how Zoom uses the data provided over XMPP.

XMPP traffic seemed to be sent over SSL, so I located the SSL_write function in the binary based on log strings, and hooked it using Frida. The output contained many XMPP stanzas (messages) as well as other network traffic, which I analyzed to determine how XMPP is used by Zoom. XMPP is used for most communication between Zoom clients outside of meetings, such as messages and channels, and is also used for signaling (call set-up) when a Zoom Contact invites another Zoom Contact to a meeting.

I spent some time going through the client binary trying to determine how the client processes XMPP, for example, if a stanza contains a text message, how is that message extracted and displayed in the client. Even though the Zoom client contains many log strings, this was challenging, and I eventually asked my teammate Ned Williamson for help locating symbols for the client. He discovered that several old versions of the Android Zoom SDK contained symbols. While these versions are roughly five years old, and do not present a complete view of the client as they only include some libraries that it uses, they were immensely helpful in understanding how Zoom uses XMPP.

Application-defined tags can be added to gloox’s XMPP parser by extending the class StanzaExtension and implementing the method newInstance to define how the tag is converted into a C++ object. Parsed XMPP stanzas are then processed using the MessageHandler class. Application developers extend this class, implementing the method handleMessage with code that performs application functionality based on the contents of the stanza received. Zoom implements its XMPP handling in CXmppIMSession::handleMessage, which is a large function that is an entrypoint to most messaging and calling features. The final processing stage of many XMPP tags is in the class ns_zoom_messager::CZoomMMXmppWrapper, which contains many methods starting with ‘On’ that handle specific events. I spent a fair amount of time analyzing these code paths, but didn’t find any bugs. Interestingly, Thijs Alkemade and Daan Keuper released a write-up of their Pwn2Own bug after I completed this research, and it involved a vulnerability in this area.

RTP Processing

Afterwards, I investigated how Zoom clients process audio and video content. Like all other video conferencing systems that I have analyzed, it uses Real-time Transport Protocol (RTP) to transport this data. Based on log strings included in the Linux client binary, Zoom appears to use a branch of WebRTC for audio. Since I have looked at this library a great deal in previous posts, I did not investigate it further. For video, Zoom implements its own RTP processing and uses a custom underlying codec named Zealot (libzlt).

Analyzing the Linux client in IDA, I found what I believed to be the video RTP entrypoint, and fuzzed it using afl-qemu. This resulted in several crashes, mostly in RTP extension processing. I tried modifying the RTP sent by a client to reproduce these bugs, but it was not received by the device on the other side and I suspected the server was filtering it. I tried to get around this by enabling end-to-end encryption, but Zoom does not encrypt RTP headers, only the contents of RTP packets (as is typical of most RTP implementations).

Curious about how Zoom server filtering works, I decided to set up Zoom On-Premises Deployment. This is a Zoom product that allows customers to set up on-site servers to process their organization’s Zoom calls. This required a fair amount of configuration, and I ended up reaching out to the Zoom Security Team for assistance. They helped me get it working, and I greatly appreciate their contribution to this research.

Zoom On-Premises Deployments consist of two hosts: the controller and the Multimedia Router (MMR). Analyzing the traffic to each server, it became clear that the MMR is the host that transmits audio and video content between Zoom clients. Loading the code for the MMR process into IDA, I located where RTP is processed, and it indeed parses the extensions as a part of its forwarding logic and verifies them correctly, dropping any RTP packets that are malformed.

The code that processes RTP on the MMR appeared different than the code that I fuzzed on the device, so I set up fuzzing on the server code as well. This was challenging, as the code was in the MMR binary, which was not compiled as a relocatable binary (more on this later). This meant that I couldn’t load it as a library and call into specific offsets in the binary as I usually do to fuzz binaries that don’t have source code available. Instead, I compiled my own fuzzing stub that called the function I wanted to fuzz as a relocatable that defined fopen, and loaded it using LD_PRELOAD when executing the MMR binary. Then my code would take control of execution the first time that the MMR binary called fopen, and was able to call the function being fuzzed.

This approach has a lot of downsides, the biggest being that the fuzzing stub can’t accept command line parameters, execution is fairly slow and a lot of fuzzing tools don’t honor LD_PRELOAD on the target. That said, I was able to fuzz with code coverage using Mateusz Jurczyk’s excellent DrSanCov, with no results.

Packet Processing

When analyzing RTP traffic, I noticed that both Zoom clients and the MMR server process a great deal of packets that didn’t appear to be RTP or XMPP. Looking at the SDK with symbols, one library appeared to do a lot of serialization: libssb_sdk.so. This library contains a great deal of classes with the methods load_from and save_to defined with identical declarations, so it is likely that they all implement the same virtual class.

One parameter to the load_from methods is an object of class msg_db_t,  which implements a buffer that supports reading different data types. Deserialization is performed by load_from methods by reading needed data from the msg_db_t object, and serialization is performed by save_to methods by writing to it.

After hooking a few save_to methods with Frida and comparing the written output to data sent with SSL_write, it became clear that these serialization classes are part of the remote attack surface of Zoom. Reviewing each load_from method, several contained code similar to the following (from ssb::conf_send_msg_req::load_from).

ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::operator>>(

msg_db, &this->str_len, consume_bytes, error_out);

  str_len = this->str_len;

  if ( str_len )

  {

    mem = operator new[](str_len);

    out_len = 0;

    this->str_mem = mem;

    ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::

read_str_with_len(msg_db, mem, &out_len);

read_str_with_len is defined as follows.

int __fastcall ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::

read_str_with_len(msg_db_t* msg, signed __int8 *mem,

unsigned int *len)

{

  if ( !msg->invalid )

  {

ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::operator>>(msg, len, (int)len, 0);

    if ( !msg->invalid )

    {

      if ( *len )

        ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::

read(msg, mem, *len, 0);

    }

  }

  return msg;

}

Note that the string buffer is allocated based on a length read from the msg_db_t buffer, but then a second length is read from the buffer and used as the length of the string that is read. This means that if an attacker could manipulate the contents of the msg_db_t buffer, they could specify the length of the buffer allocated, and overwrite it with any length of data (up to a limit of 0x1FFF bytes, not shown in the code snippet above).

I tested this bug by hooking SSL_write with Frida, and sending the malformed packet, and it caused the Zoom client to crash on a variety of platforms. This vulnerability was assigned CVE-2021-34423 and fixed on November 24, 2021.

Looking at the code for the MMR server, I noticed that ssb::conf_send_msg_req::load_from, the class the vulnerability occurs in was also present on the MMR server. Since the MMR forwards Zoom meeting traffic from one client to another, it makes sense that it might also deserialize this packet type. I analyzed the MMR code in IDA, and found that deserialization of this class only occurs during Zoom Webinars. I purchased a Zoom Webinar license, and was able to crash my own Zoom MMR server by sending this packet. I was not willing to test a vulnerability of this type on Zoom’s public MMR servers, but it seems reasonably likely that the same code was also in Zoom’s public servers.

Looking further at deserialization, I noticed that all deserialized objects contain an optional field of type ssb::dyna_para_table_t, which is basically a properties table that allows a map of name strings to variant objects to be included in the deserialized object. The variants in the table are implemented by the structure ssb::variant_t, as follows.

struct variant{

char type;

short length;

var_data data;

};

union var_data{

        char i8;

        char* i8_ptr;

        short i16;

        short* i16_ptr;

        int i32;

        int* i32_ptr;

        long long i64;

        long long i64*;

};

The value of the type field corresponds to the width of the variant data (1 for 8-bit, 2 for 16-bit, 3 for 32-bit and 4 four 64-bit). The length field specifies whether the variant is an array and its length. If it has the value 0, the variant is not an array, and a numeric value is read from the data field based on its type. If the length field has any other value, the data field is cast to a pointer, an array of that size is read.

My immediate concern with this implementation was that it could be prone to type confusion. One possibility is that a numeric value could be confused with an array pointer, which would allow an attacker to create a variant with a pointer that they specify. However, both the client and MMR perform very aggressive type checks on variants they treat as arrays. Another possibility is that a pointer could be confused with a numeric value. This could allow an attacker to determine the address of a buffer they control if the value is ever returned to the attacker. I found a few locations in the MMR code where a pointer is converted to a numeric value in this way and logged, but nowhere that an attacker could obtain the incorrectly cast value. Finally, I looked at how array data is handled, and I found that there are several locations where byte array variants are converted to strings, however not all of them checked that the byte array has a null terminator. This meant that if these variants were converted to strings, the string could contain the contents of uninitialized memory.

Most of the time, packets sent to the MMR by one user are immediately forwarded to other users without being deserialized by the server. For some bugs, this is a useful feature, for example, it is what allows CVE-2021-34423 discussed earlier to be triggered on a client. However, an information leak in variants needs to occur on the server to be useful to an attacker. When a client deserializes an incoming packet, it is for use on the device, so even if a deserialized string contains sensitive information, it is unlikely that this information will be transmitted off the device. Meanwhile, the MMR exists expressly to transmit information from one user to another, so if a string gets deserialized, there is a reasonable chance that it gets sent to another user, or alters server behavior in an observable way. So, I tried to find a way to get the server to deserialize a variant and convert it to a string. I eventually figured out that when a user logs into Zoom in a browser, the browser can’t process serialized packets, so the MMR must convert them to strings so they can be accessed through web requests. Indeed, I found that if I removed the null terminator from the user_name variant, it would be converted to a string and sent to the browser as the user’s display name.

The vulnerability was assigned CVE-2021-34424 and fixed on November 24, 2021. I tested it on my own MMR as well as Zoom’s public MMR, and it worked and returned pointer data in both cases.

Exploit Attempt

I attempted to exploit my local MMR server with these vulnerabilities, and while I had success with portions of the exploit, I was not able to get it working. I started off by investigating the possibility of creating a client that could trigger each bug outside of the Zoom client, but client authentication appeared complex and I lacked symbols for this part of the code, so I didn’t pursue this as I suspected it would be very time-consuming. Instead, I analyzed the exploitability of the bugs by triggering them from a Linux Zoom client hooked with Frida.

I started off by investigating the impact of heap corruption on the MMR process. MMR servers run on CentOS 7, which uses a modern glibc heap, so exploiting heap unlinking did not seem promising. I looked into overwriting the vtable of a C++ object allocated on the heap instead.

 

I wrote several Frida scripts that hooked malloc on the server, and used them to monitor how incoming traffic affects allocation. It turned out that there are not many ways for an attacker to control memory allocation on an MMR server that are useful for exploiting this vulnerability. There are several packet types that an attacker can send to the server that cause memory to be allocated on the heap and then freed when processing is finished, but not as many where the attacker can trigger both allocation and freeing. Moreover, the MMR server performs different types of processing in separate threads that use unique heap arenas, so many areas of the code where this type of allocation is likely to occur, such as connection management, allocate memory in a different heap arena than the thread where the bug occurs. The only such allocations I could find that were made in the same arena were related to meeting set-up: when a user joins a meeting, certain objects are allocated on the heap, which are then freed when they leave the meeting. Unfortunately, these allocations are difficult to automate as they require many unique users accounts in order for the allocation to be performed repeatedly, and allocation takes an observable amount of time (seconds).

I eventually wrote Frida scripts that looked for free chunks of unusual sizes that bordered C++ objects with vtables during normal MMR operation. There were a few allocation sizes that met this criteria, and since CVE-2021-34423 allows for the size of the buffer that is overflowed to be specified by the attacker, I was able to corrupt the memory of the adjacent object. Unfortunately, heap verification was very robust, so in most cases, the MMR process would crash due to a heap verification error before a virtual call was made on the corrupted object. I eventually got around this by focusing on allocation sizes that are small enough to be stored in fastbins by the heap, as heap chunks that are stored in fastbins do not contain verifiable heap metadata. Chunks of size 58 turned out to be the best choice, and by triggering the bug with an allocation of that size, I was able to control the pointer of a virtual call about one in ten times I triggered the bug.

The next step was to figure out where to point the pointer I could control, and this turned out to be more challenging than I expected. The MMR process did not have ASLR enabled when I did this research (it was enabled in version 4.6.20211128.136, which was released on November 28, 2021), so I was hoping to find a series of locations in the binary that this call could be directed to that would eventually end in a call to execv with controllable parameters, as the MMR initialization code contains many calls to this function. However, there were a few features of the server that made this difficult. First, only the MMR binary was loaded at a fixed location. The heap and system libraries were not, so only the actual MMR code was available without bypassing ASLR. Second, if the MMR crashes, it has an exponential backoff which culminates in it respawning every hour on the hour. This limits how many exploit attempts an attacker has. It is realistic that an attacker might spend days or even weeks trying to exploit a server, but this still limits them to hundreds of attempts. This means that any exploit of an MMR server would need to be at least somewhat reliable, so certain techniques that require a lot of attempts, such as allocating a large buffer on the heap and trying to guess its location were not practical.

I eventually decided that it would be helpful to allocate a buffer on the heap with controlled contents and determine its location. This would make the exploit fairly reliable in the case that the overflow successfully leads to a virtual call, as the buffer could be used as a fake vtable, and also contain strings that could be used as parameters to execv. I tried using CVE-2021-34424 to leak such an address, but wasn’t able to get this working.

This bug allows the attacker to provide a string of any size, which then gets copied out of bounds up until a null character is encountered in memory, and then returned. It is possible for CVE-2021-34424 to return a heap pointer, as the MMR maps the heap that gets corrupted at a low address that does not usually contain null bytes, however, I could not find a way to force a specific heap pointer to be allocated next to the string buffer that gets copied out of bounds. C++ objects used by the MMR tend to be virtual objects, so the first 64 bits of most object allocations are a vtable which contains null bytes, ending the copy. Other allocated structures, especially larger ones, tend to contain non-pointer data. I was able to get this bug to return heap pointers by specifying a string that was less than 64 bits long, so the nearby allocations were sometimes the pointers themselves, but allocations of this size are so frequent it was not possible to ascertain what heap data they pointed to with any accuracy.

One last idea I had was to use another type confusion bug to leak a pointer to a controllable buffer. There is one such bug in the processing of deserialized ssb::kv_update_req objects. This object’s ssb::dyna_para_table_t table contains a variant named nodeid which represents the specific Zoom client that the message refers to. If an attacker changes this variant to be of type array instead of a 32-bit integer, the address of the pointer to this array will be logged as a string. I tried to combine CVE-2021-34424 with this bug, hoping that it might be possible for the leaked data to be this log string that contains pointer information. Unfortunately, I wasn’t able to get this to work because of timing: the log entry needs to be logged at almost exactly the same time as the bug is triggered so that the log data is still in memory, and I wasn't able to send packets fast enough. I suspect it might be possible for this to work with improved automation, as I was relying on clients hooked with Frida and browsers to interact with the Zoom server, but I decided not to pursue this as it would require tooling that would take substantial effort to develop.

Conclusion

I performed a security analysis of Zoom and reported two vulnerabilities. One was a buffer overflow that affected both Zoom clients and MMR servers, and one was an info leak that is only useful to attackers on MMR servers. Both of these vulnerabilities were fixed on November 24, 2021.

The vulnerabilities in Zoom’s MMR server are especially concerning, as this server processes meeting audio and video content, so a compromise could allow an attacker to monitor any Zoom meetings that do not have end-to-end encryption enabled. While I was not successful in exploiting these vulnerabilities, I was able to use them to perform many elements of exploitation, and I believe that an attacker would be able to exploit them with sufficient investment. The lack of ASLR in the Zoom MMR process greatly increased the risk that an attacker could compromise it, and it is positive that Zoom has recently enabled it. That said, if vulnerabilities similar to the ones that I reported still exist in the MMR server, it is likely that an attacker could bypass it, so it is also important that Zoom continue to improve the robustness of the MMR code.

It is also important to note that this research was possible because Zoom allows customers to set up their own servers, meanwhile no other video conferencing solution with proprietary servers that I have investigated allows this, so it is unclear how these results compare to other video conferencing platforms.

Overall, while the client bugs that were discovered during this research were comparable to what Project Zero has found in other videoconferencing platforms, the server bugs were surprising, especially when the server lacked ASLR and supports modes of operation that are not end-to-end encrypted.

There are a few factors that commonly lead to security problems in videoconferencing applications that contributed to these bugs in Zoom. One is the huge amount of code included in Zoom. There were large portions of code that I couldn’t determine the functionality of, and many of the classes that could be deserialized didn’t appear to be commonly used. This both increases the difficulty of security research and increases the attack surface by making more code that could potentially contain vulnerabilities available to attackers. In addition, Zoom uses many proprietary formats and protocols which meant that understanding the attack surface of the platform and creating the tooling to manipulate specific interfaces was very time consuming. Using the features we tested also required paying roughly $1500 USD in licensing fees. These barriers to security research likely mean that Zoom is not investigated as often as it could be, potentially leading to simple bugs going undiscovered.  

Still, my largest concern in this assessment was the lack of ASLR in the Zoom MMR server. ASLR is arguably the most important mitigation in preventing exploitation of memory corruption, and most other mitigations rely on it on some level to be effective. There is no good reason for it to be disabled in the vast majority of software. There has recently been a push to reduce the susceptibility of software to memory corruption vulnerabilities by moving to memory-safe languages and implementing enhanced memory mitigations, but this relies on vendors using the security measures provided by the platforms they write software for. All software written for platforms that support ASLR should have it (and other basic memory mitigations) enabled.

The closed nature of Zoom also impacted this analysis greatly. Most video conferencing systems use open-source software, either WebRTC or PJSIP. While these platforms are not free of problems, it’s easier for researchers, customers and vendors alike to verify their security properties and understand the risk they present because they are open. Closed-source software presents unique security challenges, and Zoom could do more to make their platform accessible to security researchers and others who wish to evaluate it. While the Zoom Security Team helped me access and configure server software, it is not clear that support is available to other researchers, and licensing the software was still expensive. Zoom, and other companies that produce closed-source security-sensitive software should consider how to make their software accessible to security researchers.