Today’s The Fast and the Curious post explores how Chrome achieved the highest score on the new Speedometer 3.0, an upgraded browser benchmarking tool to optimize the performance of Web applications. Try out Chrome today!
Speedometer 3.0 is a recently published benchmark for measuring browser performance that was created as an industry collaboration between companies like Google, Apple, Mozilla, Intel, and Microsoft. This benchmark helped us identify areas in which we could optimize Chrome to deliver a faster browser experience to all our users.
Here’s a closer look at how we further optimized Chrome to achieve the highest score ever Speedometer 3, by carefully tracking its recent performance over time as the updated benchmark was being developed. Since the inception of Speedometer 3 in May 2022, we've driven a 72% increase in Chrome’s Speedometer score - translating into performance gains for our users:
By looking at the workloads in Speedometer and in which functions Chrome was spending the most time, we were able to make targeted optimizations to those functions that each drove an increase in Chrome’s score. For example, the SpaceSplitString function is used heavily to turn space-separated strings such as those in “class=’foo bar’ ” into a list representation. In this function we removed some unnecessary bound checks. When we detect that there are duplicated stylesheets, we dedupe them and reference a single stylesheet instance. We made an optimization to reduce the cost of drawing paths and arcs by tuning memory allocations. When creating form editors we detected some unnecessary processing that occurs when form elements are created. Within querySelector, we were able to detect what selector was commonly used and create a hot-path for that.
We previously shared how we optimized innerHTML using specialized fast paths for parsing, an implementation that also made its way into WebKit. Some workloads in Speedometer 3 use DOMParser so we extended the same optimization for another 1% gain.
We worked with the Harfbuzz maintainer to also optimize how Chrome renders AAT fonts such as those used by Apple Mac OS system fonts. Text starts as a processed stream of unicode characters that is then transformed into a glyph stream that is then run through a state machine defined in the AAT font. The optimization allows us to determine more quickly whether glyphs actually participate in the rules for the state machine, leading to speed-ups when processing text using AAT.
An important strategy for achieving high performance is tiering up code, which is picking the right code to further optimize within the engine. Intel contributed profile guided tiering to V8 that remembers tiering decisions from the past such that if a function was stably tiered up in the past, we eagerly tier it up on future runs.
Another area of changes that drove around 3% progression on Speedometer 3 was improvements around garbage collection. V8’s garbage collector has a long history of making use of renderer idle time to avoid interfering with actual application code. The recent changes follow this spirit by extending existing mechanisms to prefer garbage collection in idle time on otherwise very active renderers where possible. Specifically, DOM finalization code that is run on reclaiming objects is now also run in idle time. Previously, such operations would compete with regular application code over CPU resources. In addition, V8 now supports a much more compact layout for objects that wrap DOM elements, i.e., all objects that are exposed to JavaScript frameworks. The compact layout reduces memory pressure and results in less time spent on garbage collection.
Posted by Thomas Nattestad, Chrome Product Manager
On the Chrome team, we believe it’s not sufficient to be fast most of the time, we have to be fast all of the time. Today’s The Fast and the Curious post explores how we contributed to Core Web Vitals by surveying the field data of Chrome responding to user interactions across all websites, ultimately improving performance of the web.
As billions of people turn to the web to get things done every day, the browser becomes more responsible for hosting a multitude of apps at once, resource contention becomes a challenge. The multi-process Chrome browser contends for multiple resources: CPU and memory of course, but also its own queues of work between its internal services (in this article, the network service).
This is why we’ve been focused on identifying and fixing slow interactions from Chrome users’ field data, which is the authoritative source when it comes to real user experiences. We gather this field data by recording anonymized Perfetto traces on Chrome Canary, and report them using a privacy-preserving filter.
When looking at field data of slow interactions, one particular cause caught our attention: recurring synchronous calls to fetch the current site’s cookies from the network service.
Let’s dive into some history.
Cookies have been part of the web platform since the very beginning. They are commonly created like this:
document.cookie = "user=Alice;color=blue"
And later retrieved like this:
// Assuming a `getCookie` helper method: getCookie("user", document.cookie)
Its implementation was simple in single-process browsers, which kept the cookie jar in memory.
Over time, browsers became multi-process, and the process hosting the cookie jar became responsible for answering more and more queries. Because the Web Spec requires Javascript to fetch cookies synchronously, however, answering each document.cookie query is a blocking operation.
document.cookie
The operation itself is very fast, so this approach was generally fine, but under heavy load scenarios where multiple websites are requesting cookies (and other resources) from the network service, the queue of requests could get backed up.
We discovered through field traces of slow interactions that some websites were triggering inefficient scenarios with cookies being fetched multiple times in a row. We landed additional metrics to measure how often a GetCookieString() IPC was redundant (same value returned as last time) across all navigations. We were astonished to discover that 87% of cookie accesses were redundant and that, in some cases, this could happen hundreds of times per second.
GetCookieString()
The simple design of document.cookie was backfiring as JavaScript on the web was using it like a local value when it was really a remote lookup. Was this a classic computer science case of caching?! Not so fast!
The web spec allows collaborating domains to modify each other’s cookies. Hence, a simple cache per renderer process didn’t work, as it would have prevented writes from propagating between such sites (causing stale cookies and, for example, unsynchronized carts in ecommerce applications).
We solved this with a new paradigm which we called Shared Memory Versioning. The idea is that each value of document.cookie is now paired with a monotonically increasing version. Each renderer caches its last read of document.cookie alongside that version. The network service hosts the version of each document.cookie in shared memory. Renderers can thus tell whether they have the latest version without having to send an inter-process query to the network service.
This reduced cookie-related inter-process messages by 80% and made document.cookie accesses 60% faster 🥳.
Improving an algorithm is nice, but what we ultimately care about is whether that improvement results in improving slow interactions for users. In other words, we need to test the hypothesis that stalled cookie queries were a significant cause of slow interactions.
To achieve this, we used Chrome’s A/B testing framework to study the effect and determined that it, combined with other improvements to reduce resource contention, improved the slowest interactions by approximately 5% on all platforms. This further resulted in more websites passing Core Web Vitals 🥳. All of this adds up to a more seamless web for users.
Timeline of the weighted average of the slowest interactions across the web on Chrome as this was released to 1% (Nov), 50% (Dec), and then all users (Feb).
By Gabriel Charette, Olivier Li Shing Tat-Dupuis, Carlos Caballero Grolimund, and François Doray, from the Chrome engineering team
Today’s The Fast and the Curious post covers the release of Speedometer 3.0 an upgraded browser benchmarking tool to optimize the performance of Web applications.
In collaboration with major web browser engines, Blink/V8, Gecko/SpiderMonkey, and WebKit/JavaScriptCore, we’re excited to release Speedometer 3.0. Benchmarks, like Speedometer, are tools that can help browser vendors find opportunities to improve performance. Ideally, they simulate functionality that users encounter on typical websites, to ensure browsers can optimize areas that are beneficial to users.
Let’s dig into the new changes in Speedometer 3.0.
Since its initial release in 2014 by the WebKit team, browser vendors have successfully used Speedometer to optimize their engines and improve user experiences on the web. Speedometer 2.0, a result of a collaboration between Apple and Chrome, followed in 2018, and it included an updated set of workloads that were more representative of the modern web at that time.
The web has changed a lot since 2018, and so has Speedometer in its latest release, Speedometer 3. This work has been based on a joint multi-stakeholder governance model to share work, and build a collaborative understanding of performance on the web to help drive browser performance in ways that help users. The goal of this collaborative project is to create a shared understanding of web performance so that improvements can be made to enhance the user experience. Together, we were able to to improve how Speedometer captures and calculates scores, show more detailed results and introduce an even wider variety of workloads. This cross-browser collaboration introduced more diverse perspectives that enabled clearer insights into a broader set of web users and workflows, ensuring the newest version of Speedometer will help make the web better for everyone, regardless of which browser they use.
Building a reliable benchmark with representative tests and workloads is challenging enough. That task becomes even more challenging if it will be used as a tool to guide optimization of browser engines over multiple years. To develop the Speedometer 3 benchmark, the Chrome Aurora team, together with colleagues from other participating browser vendors, were tasked with finding new workloads that accurately reflect what users experience across the vast, diverse and eclectic web of 2024 and beyond.
A few tests and workloads can’t simulate the entire web, but while building Speedometer 3 we have established some criteria for selecting ones that are critical to user’s experience. We are now closer to a representative benchmark than ever before. Let’s take a look at how Speedometer workloads evolved
Since the goal is to use workloads that are representative of the web today, we needed to take a look at the previous workloads used in Speedometer and determine what changes were necessary. We needed to decide which frameworks are still relevant, which apps needed updating and what types of work we didn’t capture in previous versions. In Speedometer 2, all workloads were variations of a todo app implemented in different JS frameworks. We found that, as the web evolved over the past six years, we missed out on various JavaScript and Browser APIs that became popular, and apps tend to be much larger and more complicated than before. As a result, we made changes to the list of frameworks we included and we added a wider variety of workloads that cover a broader range of APIs and features.
To determine which frameworks to include, we used data from HTTP Archive and discussed inclusion with all browser vendors to ensure we cover a good range of implementations. For the initial evaluation, we took a snapshot of the HTTP Archive from March 2023 to determine the top JavaScript UI frameworks currently used to build complex web apps.
Another approach is to determine inclusion based on popularity with developers: Do we need to include frameworks that have “momentum”, where a framework's current usage in production might be low, but we anticipate growth in adoption? This is somewhat hard to determine and might not be the ideal sole indicator for inclusion. One data point to evaluate momentum might be monthly NPM downloads of frameworks.
Here are the same 15 frameworks NPM downloads for March 2023:
With both data points on hand, we decided on a list that we felt gives us a good representation of frameworks. We kept the list small to allow space for brand new types of workloads, instead of just todo apps. We also selected commonly used versions for each framework, based on the current usage.
In addition, we updated the previous JavaScript implementations and included a new web-component based version, implemented with vanilla JavaScript.
A simple Todo-list only tests a subset of functionality. For example: how well do browsers handle complicated flexbox and grid layouts? How can we capture SVG and canvas rendering and how can we include more realistic scenarios that happen on a website?
We collected and categorized areas of interest into DOM, layout, API and patterns, to be able to match them to potential workloads that would allow us to test these areas. In addition we collected user journeys that included the different categories of interest: editing text, rendering charts, navigating a site, and so on.
There are many more areas that we weren’t able to include, but the final list of workloads presents a larger variety and we hope that future versions of Speedometer will build upon the current list.
The Chrome Aurora team worked with the Chrome V8 team to validate our assumptions above. In Chrome, we can use runtime-call-stats to measure time spent in each web API (and additionally many internal components). This allows us to get an insight into how dominant certain APIs are.
If we look at Speedometer 2.1 we see that a disproportionate amount of benchmark time is spent in innerHTML.
While innerHTML is an important web API, it's overrepresented in Speedometer 2.1. Doing the same analysis on the new version 3.0 yields a slightly different picture:
We can see that innerHTML is still present, but its overall contribution shrunk from roughly 14% down to 4.5%. As a result, we get a better distribution that favors more DOM APIs to be optimized. We can also see that a few Canvas APIs have moved into this list, thanks to the new workloads in v3.0.
While we will never be able to perfectly represent the whole web in a fast-running and stable benchmark, it is clear that Speedometer 3.0 is a giant step in the right direction.
Ultimately, we ended up with the following list of workloads presented in the next few sections.
TodoMVC
Many developers might recognize the TodoMVC app. It’s a popular resource for learning and offers a wide range of TodoMVC implementations with different frameworks.
TodoMVC is a to-do application that allows a user to keep track of tasks. The user can enter a new task, update an existing one, mark a task as completed, or delete it. In addition to the basic CRUD operations, the TodoMVC app has some added functionality: filters are available to change the view to “all”, “active” or “completed” tasks and a status text displays the number of active tasks to complete.
In Speedometer, we introduced a local data source for todo items, which we use in our tests to populate the todo apps. This gave us the opportunity to test a larger character set with different languages.
The tests for these apps are all similar and are relatable to typical user journeys with a todo app:
These tests seem simple, but it lets us benchmark DOM manipulations. Having a variety of framework implementations also cover several different ways how this can be done.
Complex DOM / TodoMVC
The complex DOM workloads embed various TodoMVC implementations in a static UI shell that mimics a complex web page. The idea is to capture the performance impact on executing seemingly isolated actions (e.g. adding/deleting todo items) in the context of a complex website. Small performance hits that aren’t obvious in an isolated TodoMVC workload are amplified in a larger application and therefore capture more real-world impact.
The tests are similar to the TodoMVC tests, executed in the complex DOM & CSSOM environment.
This introduces an additional layer of complexity that browsers have to be able to handle effortlessly.
Single-page-applications (News Site)
Single-page-applications (SPAs) are widely used on the web for streaming, gaming, social media and pretty much anything you can imagine. A SPA lets us capture navigating between pages and interacting with an app. We chose a news site to represent a SPA, since it allows us to capture the main areas of interest in a deterministic way. An important factor was that we want to ensure we are using static local data and that the app doesn’t rely on network requests to present this data to the user.
Two implementations are included: one built with Next.js and the other with Nuxt. This gave us the opportunity to represent applications built with meta frameworks, with the caveat that we needed to ensure to use static outputs.
Tests for the news site mimic a typical user journey, by selecting a menu item and navigating to another section of the site.
These tests let us evaluate how well a browser can handle large DOM and CSSOM changes, by changing a large amount of data that needs to be displayed when navigating to a different page.
Charting Apps & Dashboards
Charting apps allow us to test SVG and canvas rendering by displaying charts in various workloads.
These apps represent popular sites that display financial information, stock charts or dashboards.
Both SVG rendering and the use of the canvas api weren’t represented in previous releases of Speedometer.
Observable Plot displays a stacked bar chart, as well as a dotted chart. It is based on D3, which is a JavaScript library for visualizing tabular data and outputs SVG elements. It loops through a big dataset to build the source data that D3 needs, using map, filter and flatMap methods. As a result this exercises creation and copying of objects and arrays.
Chart.js is a JavaScript charting library. The included workload displays a scatter graph with the canvas api, both with some transparency and with full opacity. This uses the same data as the previous workload, but with a different preparation phase. In this case it makes a heavy use of trigonometry to compute distances between airports.
React Stockcharts displays a dashboard for stocks. It is based on D3 for all computation, but outputs SVG directly using React.
Webkit Perf-Dashboard is an application used to track various performance metrics of WebKit. The dashboard uses canvas drawing and web components for its ui.
These workloads test DOM manipulation with SVG or canvas by interacting with charts. For example here are the interactions of the Observable Plot workload:
Code Editors
Editors, for example WYSIWYG text and code editors, let us focus on editing live text and capturing form interactions. Typical scenarios are writing an email, logging into a website or filling out an online form. Although there is some form interaction present in the TodoMVC apps, the editor workloads use a large data set, which lets us evaluate performance more accurately.
Codemirror is a code editor that implements a text input field with support for many editing features. Several languages and frameworks are available and for this workload we used the JavaScript library from Codemirror.
Tiptap Editor is a headless, framework-agnostic rich text editor that's customizable and extendable. This workload used Tiptap as its basis and added a simple ui to interact with.
Both apps test DOM insertion and manipulation of a large amount of data in the following way:
Being able to collaborate with all major browser vendors and having all of us contribute to workloads has been a unique experience and we are looking forward to continuing to collaborate in the browser benchmarking space.
Don’t forget to check out the new release of Speedometer and test it out in your favorite browser, dig into the results, check out our repo and feel free to open issues with any improvements or ideas for workloads you would like to see included in the next version. We are aiming for a more frequent release schedule in the future and if you are a framework author and want to contribute, feel free to file an issue on our Github to start the discussion.
Posted by Thorsten Kober, Chrome Aurora
Today’s The Fast and the Curious post explores how Core Web Vitals saved Chrome users more than 10,000 Years of waiting for web pages to load in 2023 (across Chrome desktop and Android) by quantifying the experience of sites and identifying opportunities to make improvements.
In 2020, we introduced Web Vitals - essential quality signals for webpages to ensure a better user experience. Since then, there has been a massive leap in web performance made possible by our work on Core Web Vitals (CWV) and its broader impact on the web. Today, over 40% of sites pass all of the CWV metrics, leading to pages that load and respond to interactions more quickly. Here’s a closer look at the journey to help improve the performance for sites and some specific work done in the browser and the ecosystem to enable this achievement.
Chrome's Quest for Speed
The very essence of the web lies in its ability to provide information and services efficiently and rapidly. This principle is at the heart of Google's business and drives our work on Chrome. However, we noticed an issue with sites over a long time horizon. Even if slow sites improved their performance for a while, it would often decline over time. No matter how fast Google Search might be, the user experience would be subpar if the pages found were slow to load.
We could not help these sites improve their performance directly, but we wanted users to have a great experience when they moved from Google Search to the individual sites. To tackle the challenge of improving the user experience while simultaneously providing unified guidance to developers, teams from Search and Chrome collaborated to address the issue of slow web pages.
Defining the Fast Web
We examined millions of pages to define a public standard for a fast, user-friendly web page (initially published in The Science Behind Web Vitals). We published our specifications and data to the open ecosystem and took note of the feedback we received. The introduction of CWV metrics such as LCP (Largest Contentful Paint) was groundbreaking because it allowed us to measure when the user actually sees the content. The ability to measure the actual user experience at scale has been foundational to the improvements that we will discuss in this blog post.
Next, we updated Google's search ranking algorithms in August 2021 to consider, among other factors, whether a page met the speed and usability standards established as part of CWV. Today, it remains highly recommended for site owners to achieve good Core Web Vitals for success with Search and to ensure a great user experience generally.
Exponential Impact of Small Changes
The results we saw after these changes were significant. The average page load in Chrome is now 166 ms faster. That might seem like a minor improvement, but small changes can accumulate to create a substantial impact on the web.
So far in 2023, this project saved users over 10,000 years of waiting for web pages to load and over 1,200 years of waiting for web pages to respond to user input. And the web continues to get faster. We also tracked improvements in how many navigations meet Core Web Vitals (CWV). The current figures stand at 64.45% for mobile (up from 64%) and 68.39% for desktop (up from 67%). The Chrome Data team projects a ~69% pass rate by the end of the year.
Caption: Our savings for LCP translate into 8,000 years saved for users waiting for pages to load on Android and 2,000 years in 2023 so far. On INP, we have saved users 800 years on Android and 450 years on Windows so far in 2023.
Next, let’s look at some recent updates from both the Chrome team and the wider developer ecosystem, demonstrating how our joint efforts are speeding up the web.
Chrome’s Core Web Vitals Achievements
We’re proud to highlight numerous ways we’ve optimized performance.
The Back/forward cache (bfcache) is designed to improve browsing experience by enabling instant back and forward navigation. BFCache’s hit rate has improved month-over-month on both Android (3.6%) and Desktop (1.8%).
Another example of a particularly impactful optimization is our PreconnectOnAnchorInteraction feature which connects to origins on pointer-down rather than pointer-up. This fully launched feature led to a 6/10ms (0.4/1%) median LCP improvement on Android/Desktop, and an improvement in cross-origin LCP by ~60ms on both Android and Desktop. The launch also resulted in a 0.08% Content Ad revenue increase, underlining the significant impact of performance optimizations on user engagement and ecosystem health.
We also introduced prerendering, which makes pages load instantly by rendering them before the user actually visits. Page loads via typing URLs directly in the omnibox get a 500-700ms (14-25%) median LCP improvement when prerendered, depending on the platform, moving global median LCP across all navigations by 6.4ms. We're currently rolling out prerendering of omnibox-initiated searches.
Chrome has been working hard to keep background tabs out of your way. Implementing tab throttling for background tabs running at EcoQOS on Windows 11 and Task Role and QoS Adjustments on macOS have led to improvements in Largest Contentful Paint (LCP) and Interaction to Next Paint (INP).
The web’s modern ability to run all types of applications also comes with a mandate to manage the workload that this encurs. We have been optimizing Chrome under mutliple active tabs and are happy to report improvements to scheduling and contention which improve INP by 5% and LCP by 2% in the last 6 months.
We have made targeted improvements to the page loading code in Chrome in 2022. These resulted in LCP improving by 10% on Android, and CWV pass rate improving by 1.5%.
Chrome's renderer has also seen some improvements. The renderer's main thread includes task queues for JavaScript, rendering, and image loading. Some changes that alter the priority of these tasks for optimal CWV include.
High priority image loading: Historically, image-loading had the same or lower priority than rendering. However, an experiment showed that between an image load task and a rendering task, choosing the image load task first can prevent layout shift of an intermediate frame that doesn't have the image and also improves LCP. The improvement on Android at the 75th percentile was -6.66% for CLS and -0.82% for LCP, improving the CWV pass rate on Android by +0.24%. A similar experiment that boosted the loading priority to "medium" of the first five images parsed from the HTML (for non-icon-sized images) showed an improvement on Android at the 75th percentile of -6.08% for CLS and -0.53% for LCP. A combined experiment showed the effects of both changes were largely independent.
Prioritize compositing after delay: If it has been more than 100ms since the last compositing task run, elevate the priority of any queued compositing task so that it will preempt normal-priority work. This produced an improvement of -0.27% for CLS on Android and Windows at the 95th percentile.
SVG Raster Optimizations: Another SVG drawing optimization improved INP pass rates on desktops by -2.28% for MacOS at the 75th percentile.
Ecosystem Core Web Vitals Achievements
The broader developer ecosystem has also achieved remarkable results by focusing on Core Web Vitals. The most significant achievement was the performance improvement on WordPress - the Content Management System that powers over a third of the web: "WordPress 6.3 loads 27% faster for block themes and 18% faster for classic themes, compared to WordPress 6.2, based on the Largest Contentful Paint (LCP) metric".
Some parts of the WordPress ecosystem are going even further. Prerendering some links via the speculation rules API, NitroPack's prerendered page loads have seen an 80% LCP improvement and 55% INP improvement compared to those without any speculative loading.
Caption: The percentage of origins passing all three Core Web Vitals (LCP, FID, CLS) with a "good" experience (Source: HTTP Archive)
The JavaScript framework community has also seen Core Web Vital gains. Over the past few years, Chrome Aurora has collaborated with Next.js, Angular, and Nuxt to release performance-focused features like the next/script component, NgOptimizedImage, and nuxt/google-fonts. In 2022, Next.js pass rates increased from 20.4% to 27.3%, Angular pass rates increased from 7.6% to 13.2%, and Nuxt pass rates increased from 15.8% to 20.2%. Enterprise partners who tried our features have seen wins in LCP. For example, after switching to NgOptimizedImage, Land's End saw a 40% LCP improvement on mobile in Lighthouse lab tests and a 75% improvement in LCP on desktop. In similar tests, CareerKarma's LCP reduced 24% when switching to next/script's web worker mode.
In the business world, performance optimization has led to remarkable growth. For instance, RedBus improved INP and observed a 7% increase in conversion rates. Economic Times improved INP and saw a 42% rise in page views and a 49% reduction in bounce rate. Meesho successfully brought LCP down from 6.9s to 2.5s, resulting in a 16.6% reduction in bounce rate and a 3% increase in conversions.
Major web platforms have also seen significant improvements. Amazon has leveraged the bfcache change introduced on Chrome and saw a 22.7 percentage point (pp) improvement in bfcache hit rate with Chrome's latest version (M112). Cricbuzz experienced an even higher increase, with a 31.40 pp improvement.
Partnering for a Better Web
These performance improvements aren't just statistics – they represent real-world improvements in user experience (and hence business metrics) as well as developer experience.
Crucially, we have managed to achieve these speed boosts without impacting developer satisfaction, which remains high at 90% overall. Through our developer satisfaction studies, we also found that about half (~51%) of developers are monitoring CWV and are either already optimizing for them or planning to do so. Furthermore, a significant majority (78%) of developers optimizing for CWV report seeing notable improvements in their scores.
Our aim is always to create a better web experience for all users, so we're excited to see the web getting faster. But we also understand that maintaining developer satisfaction is crucial to sustaining these improvements. As developers continue to monitor and optimize for CWV, we are optimistic about the future of web performance.
On behalf of the Chrome team, we want to thank the developer community for their incredible work. By focusing on Core Web Vitals, we've made the web a significantly faster and more enjoyable place to be. We look forward to continuing this journey together, making the web better for everyone, everywhere.
Posted by Addy Osmani, Annie Sullivan and Kouhei Ueno, Software Engineers for Chrome
Big performance wins can be found by taking a step back and tweaking what you already have. Today’s The Fast and the Curious post explores how we improved the scrolling experience of Chrome on Android, ultimately reducing slow scrolling jank by 2x. Read on to see how we discovered and evaluated the problem, and how that has helped us design a better browser experience going forward.
When measuring the performance of a browser, one might typically think of page load speed or Web Vitals. On mobile where touch interactions are common we also prioritize your interaction with Chrome to ensure it is always smooth and responsive including on new form factors like foldables. A significant focus of late has been on reducing jank while you scroll.
We recently improved the scrolling experience of Chrome on Android by 2x by filtering noise and reducing visual jumps in the content presented on screen. To get this result, we had to take a step back and figure out the problem of why Chrome on Android was lagging behind Chrome on iOS.
As we compared Chrome across platforms, we were hit with a particular observation. iOS Chrome scrolling was smooth and consistent whereas on Android, Chrome’s scrolling didn't follow your finger as closely. However, our metrics were telling us that while janks occurred occasionally, they weren’t as common as our perception when comparing with Chrome on iOS. Thus we had ourselves a mystery which needed some investigation.
Our metrics flagged that we often received input at an inconsistent rate; but since the input rate was greater than the display’s frame rate, we usually had at least one input event to trigger the production of a frame to display. However, this frame might have consumed fewer or more input events, which could result in inconsistent shifting of content on screen even while scrolling at a fixed speed.
This problem of different input rate vs frame rate is a problem that Chrome has had to address before. Internally, we resample input to predict/extrapolate where the finger was at a consistent point relative to the frame we want to produce. This should result in each frame representing a consistent amount of time and should mean smooth scrolling regardless of noise in the input events. The ideal scenario is illustrated in the following diagram where blue dots are real input events, green are resampled input events, and the displayed scroll deltas would fluctuate if you were to use the real input events rather than resampling.
Okay so we already do resampling so what's the problem?
Input resampling inside of Chrome (and Android) were added back in 2019 as 90hz devices emerged and the problem above became more apparent (oscillating between 2 vs 1 input events per frame rather than consistently 2 input events per frame we usually see on 60hz devices). Android implemented multiple resampling algorithms (kalman, linear, etc.) and arrived at the conclusion that linear resampling (drawing a line between two points to figure out velocity and then extrapolate to the given timestamp) was good enough for their use cases. This fixed the problem for most Android apps, but didn't address Chrome.
Due to historical reasons and web spec requirements for raw input, Chrome uses unbuffered input and thus as devices started to appear with sampling rates that didn’t fit with input, Chrome had to implement some version of resampling. Below we see that each frame (measuring the time from input to it being displayed) consumes a different amount of input events (2 for the first, 3 for the second, and 1 for the third), if we assume input is consistently arriving and each is exactly 30 pixels of displacement then ideally we should smooth it out to 60 pixels per frame as seen below:
However, while we were investigating the original mystery we discovered that reality was very different from the ideal situation pictured above. We found that the actual input movement of the screen was quite spiky and inconsistent (more than we expected) and that our predictor was improving things but not as much as desired. On the left is real finger displacement on a screen (each point is an input event) and on the right the result of our predictor of actual content offset after smoothing out (each point is a frame)
Frames are being presented consistently on the right, but the rate of displacement spikes between one to another isn’t consistent (-50 to -40 followed by another -52 being especially drastic). Human fingers don’t move this discretely (at frame level precision). Rather they should slide and flex in a gradient, speeding up or slowing down gradually. So we knew we had a problem here. We dug deeper into Chrome’s implementation and found there were some fundamental differences in Chrome’s implementation (which was supposedly a copy of Android’s).
1. Android uses the native C++ MotionEvent timestamp (with nanosecond precision), but Chrome uses Java MotionEvent.getEventTime & MotionEvent.getHistoricalEventTime (milliseconds precision). Unfortunately, nanosecond precision was not part of the public API. However, rounding of milliseconds can introduce error into our predictor when it computes velocity between event timestamps.2. Android’s implementation takes care when selecting the two input events so resampling is using the most relevant events. Chrome however uses a simple FIFO queue of input events, which can result in weird cases of using future events to predict velocity in the past in rare cases on high refresh rate devices.
1. Android uses the native C++ MotionEvent timestamp (with nanosecond precision), but Chrome uses Java MotionEvent.getEventTime & MotionEvent.getHistoricalEventTime (milliseconds precision). Unfortunately, nanosecond precision was not part of the public API. However, rounding of milliseconds can introduce error into our predictor when it computes velocity between event timestamps.
2. Android’s implementation takes care when selecting the two input events so resampling is using the most relevant events. Chrome however uses a simple FIFO queue of input events, which can result in weird cases of using future events to predict velocity in the past in rare cases on high refresh rate devices.
We prototyped using Android’s resampling in Chrome, but found it was still not perfect for Chrome’s architecture resulting in some jank. To improve on it, we experimented with different algorithms, using automation to replay the same input over and over again and evaluating the screen displacement curves. After tuning, this landed at the 1€ filter implementation that visibly and drastically improved the scrolling experience. With this filter, the screen tracks closely to your finger and websites smoothly scroll, preventing jank caused by inconsistent input events. The improvement is visible in our manual validation, on both top-end and low-end devices (Here's a redmi 9A video example).
In Android 14, the nanosecond API for java MotionEvents will be publicly exposed in the SDK so Chrome (and other apps with unbuffered input) will be able to call it. We also developed new metrics that track the quality of the scroll predictors frame, by creating a test app which introduced pixel level differences between frames (and no other form of jank) and running experiments to see what people would notice. These analysis can be read about here and will be used going forward for more exciting performance wins and to make this a visible area for tracking against regressions. In the end, after tuning and enabling the 1€ filter, our metrics show a 2x reduction in visible jank while scrolling slowly! This improvement is going live in M116 as the default, but will be launched all the way back to M110 and brings Chrome on Android on par with Chrome on iOS!
The moral of the story is: Sometimes metrics don’t cover all the cases and taking a step back and investigating from the top down and getting the lay of the land can end with a superior scrolling experience for users.
Post by: Stephen Nusko, Chrome Software Engineer