Layout Instability API

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

The shifting of DOM elements on a webpage detracts from the user’s experience, and occurs frequently on the web today. This shifting is often due to content loading asynchronously and displacing other elements on the page.

The Layout Instability API identifies these unstable pages by reporting a value (the "layout shift") for each animation frame in the user’s session. This specification presents a method for a user agent to compute the layout shift value.

The layout shift value is expected to have a general correspondence to the severity of layout instability at a particular time. The method of computing it considers both the area of the region impacted by instability and the distance by which elements on the page are shifted.

The values exposed by this specification are not meant to be used to serve as "layout change observers" for a couple of reasons. First, they are tied to the PerformanceObserver, hence the dispatch of the callbacks can be done lazily by the user agent if it deems this necessary to avoid impacting performance of the site. Second, very small layout shifts can be ignored by the user agent. Thus, it is not advised to rely on this API as a way of running any JavaScript that impacts the user-visible behavior of a website.

1.1. Cumulative Layout Shift (CLS)

This section is non-normative.

The layout shift value represents a single point in time, but it is also useful to have a value to represent the total instability of the page for the period of time the user spends on it.

To that end we propose two values that a user agent or a developer is able to compute to obtain such a representation. (These definitions are non-normative, because the API does not expose these values.)

The document cumulative layout shift (DCLS) score is the sum of every layout shift value that is reported inside a single browsing context. (The DCLS score does not account for layout instability inside descendant browsing contexts.)
The cumulative layout shift (CLS) score is the sum of every layout shift value that is reported inside a top-level browsing context, plus a fraction (the subframe weighting factor) of each layout shift value that is reported inside any descendant browsing context.
The subframe weighting factor for a layout shift value in a child browsing context is the fraction of the top-level viewport that is occupied by the viewport of the child browsing context.
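
As a rough illustration of these definitions, the sketch below combines layout shift values from a top-level browsing context and its descendants into a CLS score. The function names and the `{occupiedArea, shifts}` record shape are hypothetical. The API exposes neither, and frames would have to cooperate to share their values. Clamping the weighting factor to the range 0 to 1 is also an assumption of the sketch.

```javascript
// Hypothetical helper: fraction of the top-level viewport occupied by a
// child browsing context's viewport, clamped to [0, 1].
function subframeWeightingFactor(occupiedArea, topViewportArea) {
  if (topViewportArea <= 0) return 0;
  return Math.min(1, Math.max(0, occupiedArea / topViewportArea));
}

// topLevelShifts: layout shift values reported in the top-level context.
// subframes: array of {occupiedArea, shifts}, one per descendant context.
function cumulativeLayoutShift(topLevelShifts, subframes, topViewportArea) {
  // The top-level sum on its own is that document's DCLS score.
  let cls = topLevelShifts.reduce((sum, value) => sum + value, 0);
  for (const frame of subframes) {
    const weight = subframeWeightingFactor(frame.occupiedArea, topViewportArea);
    for (const value of frame.shifts) {
      cls += weight * value;
    }
  }
  return cls;
}
```

For instance, a subframe covering a quarter of the top-level viewport contributes a quarter of each of its layout shift values.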

The cumulative layout shift score is expected to have a general correspondence to the severity of layout instability for the lifetime of a page.

The developer can use this API to compute the DCLS or CLS scores, by summing the values as they are reported to the observer, and taking the "final" score at the time of the visibilitychange event.

This strategy is illustrated in the usage example.

1.2. Source attribution

This section is non-normative.

In addition to the layout shift value, the API reports a sampling of up to five DOM elements whose layout shifts most substantially contributed to the layout shift value for an animation frame.

It is possible that the true "root cause" of instability will be only indirectly related to the DOM element that experiences a layout shift. For example, if a newly inserted element shifts content below it, the sources attribute will report only the shifted elements, and not the inserted element.

We do not believe it is feasible for the user agent to understand causes of instability at the level of indirection necessary for a meaningful "root cause" attribution. However, we expect that the more straightforward reporting of shifted elements presented in this API will nevertheless be of significant value to developers who are attempting to diagnose an occurrence of layout instability.
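
One way a developer might use the reported sources when diagnosing a shift is to pick the attribution with the largest current rectangle. The helper below is a hypothetical sketch, not part of the API. It relies only on the `sources` and `currentRect` attributes, and ranking by area is an arbitrary choice made for illustration.

```javascript
// entry: a LayoutShift-like object exposing a `sources` array.
// Returns the attribution whose currentRect has the largest area,
// or null if the entry has no sources.
function largestShiftSource(entry) {
  let best = null;
  let bestArea = -1;
  for (const source of entry.sources || []) {
    const area = source.currentRect.width * source.currentRect.height;
    if (area > bestArea) {
      best = source;
      bestArea = area;
    }
  }
  return best;
}
```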

1.3. Usage example

This section is non-normative.

let perFrameLayoutShiftData = [];
let cumulativeLayoutShiftScore = 0;

function updateCLS(entries) {
  for (const entry of entries) {
    // Only count layout shifts without recent user input.
    if (entry.hadRecentInput)
      continue;

    perFrameLayoutShiftData.push({
      score: entry.value,
      timestamp: entry.startTime
    });
    cumulativeLayoutShiftScore += entry.value;
  }
}

// Observe all layout shift occurrences.
const observer = new PerformanceObserver((list) => {
  updateCLS(list.getEntries());
});
observer.observe({type: 'layout-shift', buffered: true});

// Send final data to an analytics back end once the page is hidden.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    // Force any pending records to be dispatched.
    updateCLS(observer.takeRecords());

    // Send data to your analytics back end (assumes `sendToAnalytics` is
    // defined elsewhere).
    sendToAnalytics({perFrameLayoutShiftData, cumulativeLayoutShiftScore});
  }
});

The layout shift score is only one signal, which correlates in an approximate manner with the user experience of "jumpiness".

Developers are advised not to worry about small variations between layout shift scores; this metric is not intended to be a high-precision value, and user agents might compromise precision in the interest of calculation efficiency. Moreover, the definition of the metric might evolve over time.

2. Terminology

2.1. Basic Concepts

The starting point of a Node N in a coordinate space C is defined as follows:

If N is an Element which generates one or more boxes, the starting point of N in C is the two-dimensional offset in pixel units from the origin of C to the flow-relative starting corner of the first fragment of the principal box of N.
If N is a text node, the starting point of N in C is the two-dimensional offset in pixel units from the origin of C to the flow-relative starting corner of the first line box generated by N.

The transform-indifferent starting point of a Node N in a coordinate space C is the starting point of N in C, calculated as if every transformed element had a transformation matrix equal to the identity matrix.

NOTE: To determine whether a node has shifted, we consider the starting point both with and without transforms, to ensure that a node is not made unstable solely due to a transform change. However, the CSS transform is always taken into account for the calculation of the visual representation and the associated exclusion of points outside of the viewport.

The visual representation of a Node N is defined as follows:

If N is an Element which generates one or more boxes, the visual representation of N is the set of all points that lie within the bounds of any fragment of any box generated by N, in the coordinate space of the viewport, excluding any points that lie outside of the viewport.
If N is a text node, the visual representation of N is the set of all points that lie within the bounds of any line box generated by N, in the coordinate space of the viewport, excluding any points that lie outside of the viewport.

A condition holds in the previous frame if it was true at the point in time immediately after the most recently completed invocation of the report the layout shift algorithm.

The previous frame starting point of a Node N in a coordinate space C is the point which, in the previous frame, was the starting point of N in C.

The previous frame transform-indifferent starting point of a Node N in a coordinate space C is the point which, in the previous frame, was the transform-indifferent starting point of N in C.

The previous frame visual representation of a Node N is the set which, in the previous frame, was the visual representation of N.

Each user agent defines a number of pixels to significance, an integer which is used to compute whether movement is considered as a layout shift. This flexibility is provided so that the user agent can adjust for performance or based on user experience considerations.

Point A differs significantly from point B if A and B differ by number of pixels to significance or more pixel units in either the horizontal or vertical direction.

Note: Chrome has defined number of pixels to significance as 3.
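
The comparison can be sketched as follows. Points are modeled as plain `{x, y}` objects, and the default threshold of 3 is taken from the note above. Both the representation and the constant name are assumptions of this sketch, since the threshold is left to the user agent.

```javascript
// Assumed threshold; the specification leaves this value to the user agent.
const PIXELS_TO_SIGNIFICANCE = 3;

// Returns true if the points differ by at least `threshold` pixel units
// in either the horizontal or the vertical direction.
function differsSignificantly(a, b, threshold = PIXELS_TO_SIGNIFICANCE) {
  return Math.abs(a.x - b.x) >= threshold ||
         Math.abs(a.y - b.y) >= threshold;
}
```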

2.2. Unstable Nodes

A Node N has shifted in a coordinate space C if:

the starting point of N in C differs significantly from the previous frame starting point of N in C, and
the transform-indifferent starting point of N in C differs significantly from the previous frame transform-indifferent starting point of N in C.

Otherwise, N has not shifted in C.
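
A minimal sketch of this test, assuming points are `{x, y}` objects and a threshold of 3 pixel units (both are illustrative choices, not requirements). Because both conditions must hold, a node that moves only because of a transform change is not treated as shifted.

```javascript
// All four points are {x, y} objects in the same coordinate space C.
function hasShifted(current, previous, currentNoTransform, previousNoTransform,
                    threshold = 3) {
  const differs = (a, b) =>
    Math.abs(a.x - b.x) >= threshold || Math.abs(a.y - b.y) >= threshold;
  // Both the actual and the transform-indifferent starting points must move.
  return differs(current, previous) &&
         differs(currentNoTransform, previousNoTransform);
}
```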

A Node N is an unstable-candidate if:

N is either
- an Element which generates one or more boxes, or
- a text node; and
currently and in the previous frame, the computed value of the visibility property for N equals "visible"; and
currently and in the previous frame, the computed value of the opacity property for N and for every ancestor of N is not equal to 0; and
N has shifted in the coordinate space of the viewport; and
N has shifted in the coordinate space of the initial containing block; and
there does not exist an Element P such that
1. currently and in the previous frame, P is in the containing block chain of N, and
2. currently and in the previous frame, P has a scrollable overflow region, and
3. P is not an unstable-candidate, and
4. N has not shifted in the coordinate space of the scrollable overflow region of P.

NOTE: The condition relating to a scrollable overflow region is intended to prevent nodes from being considered unstable solely because of a scroll operation.

A Node N is unstable if it is an unstable-candidate and it is not an inline clip crosser.

A Node N is an inline clip crosser if:

N is an unstable-candidate;
either the visual representation or the previous frame visual representation of N is empty; and
N would not be an unstable-candidate if the phrase "either the horizontal or vertical direction" in the definition of differs significantly were replaced by "the vertical direction" (if the block axis of N is vertical) or "the horizontal direction" (if the block axis of N is horizontal).

NOTE: An example of an inline clip crosser is an element that shifts into or out of view by moving in the inline direction across the boundary of a containing clip. We exclude such an element from the unstable node set as long as it does not shift in the block flow direction. This can make it easier to build certain types of "carousel" user interface controls.

The unstable node set of a Document D is the set containing every unstable shadow-including descendant of D.

NOTE: In the first frame, the previous frame starting point does not exist for any node, and therefore the unstable node set is empty.

2.3. Layout Shift Value

The viewport base distance is the greater of the visual viewport width and the visual viewport height.

The move vector of a Node N is the two-dimensional offset in pixel units from

the previous frame starting point of N in the coordinate space of the viewport, to
the starting point of N in the coordinate space of the viewport.

The move distance of a Node N is the greater of

the absolute value of the horizontal component of the move vector of N, and
the absolute value of the vertical component of the move vector of N.

The maximum move distance of a Document D is the greatest move distance of any Node in the unstable node set of D, or 0 if the unstable node set of D is empty.

The distance fraction of a Document D is the lesser of

the maximum move distance of D divided by the viewport base distance (or 0 if the viewport base distance is 0), and
1.0.
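
The definitions above can be sketched as a single function. The `{dx, dy}` representation of a move vector and the function name are assumptions made for illustration.

```javascript
// moveVectors: array of {dx, dy} offsets in pixel units, one per node in
// the unstable node set.
function distanceFraction(moveVectors, viewportWidth, viewportHeight) {
  const viewportBaseDistance = Math.max(viewportWidth, viewportHeight);
  if (viewportBaseDistance === 0) return 0;

  // Move distance of a node: the larger absolute component of its move
  // vector. The maximum stays 0 when the unstable node set is empty.
  let maxMoveDistance = 0;
  for (const {dx, dy} of moveVectors) {
    maxMoveDistance = Math.max(maxMoveDistance, Math.abs(dx), Math.abs(dy));
  }
  return Math.min(maxMoveDistance / viewportBaseDistance, 1.0);
}
```

For an 800 by 600 viewport, a node that moves 120 pixels vertically yields a distance fraction of 120 / 800 = 0.15.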

The node impact region of an unstable Node N is the set containing

every point in the visual representation of N, and
every point in the previous frame visual representation of N.

The impact region of a Document D is the set containing every point in the node impact region of any Node in the unstable node set of D.

The impact fraction of a Document D is the area of the impact region divided by the area of the viewport (or 0 if the area of the viewport is 0).

NOTE: Computing the area of the impact region is an instance of the Klee measure problem in two dimensions. It can be solved using a sweep line and a segment tree, with time complexity O(n lg n) for n unstable nodes.

The layout shift value of a Document D is the impact fraction of D multiplied by the distance fraction of D.

NOTE: The layout shift value takes into account both the fraction of the viewport that has been impacted by layout instability and the greatest distance by which any given element has moved. This recognizes that a large element which moves only a small distance can have a low impact on the perceived instability of the page.
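
The following sketch computes an impact fraction and the resulting layout shift value. It assumes the impact region is supplied as a list of axis-aligned rectangles already clipped to the viewport, and it uses a simple coordinate-compression grid rather than a sweep line, trading efficiency for brevity. All names are hypothetical.

```javascript
// rects: array of {left, top, right, bottom} in viewport coordinates.
// Returns the area of their union, counting overlapping parts once.
function unionArea(rects) {
  const byValue = (a, b) => a - b;
  const xs = [...new Set(rects.flatMap(r => [r.left, r.right]))].sort(byValue);
  const ys = [...new Set(rects.flatMap(r => [r.top, r.bottom]))].sort(byValue);
  let area = 0;
  for (let i = 0; i + 1 < xs.length; i++) {
    for (let j = 0; j + 1 < ys.length; j++) {
      // A grid cell is covered if some rectangle fully contains it.
      const covered = rects.some(r =>
        r.left <= xs[i] && xs[i + 1] <= r.right &&
        r.top <= ys[j] && ys[j + 1] <= r.bottom);
      if (covered) area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j]);
    }
  }
  return area;
}

// distanceFraction is assumed to have been computed separately.
function layoutShiftValue(impactRects, distanceFraction, viewportWidth,
                          viewportHeight) {
  const viewportArea = viewportWidth * viewportHeight;
  const impactFraction =
    viewportArea === 0 ? 0 : unionArea(impactRects) / viewportArea;
  return impactFraction * distanceFraction;
}
```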

2.4. Input Exclusion

An excluding input is any event from an input device which signals a user’s active interaction with the document, or any event which directly changes the size of the viewport.

Excluding inputs generally include mousedown, keydown, pointerdown, and change events. However, an event whose only effect is to begin or update a flick or scroll gesture is not an excluding input.

The user agent may delay the reporting of layout shifts after a pointerdown event until such time as it is known that the event does not begin a flick or scroll gesture.

The mousemove and pointermove events are also not excluding inputs.

3. `LayoutShift` interface

[Exposed=Window]
interface LayoutShift : PerformanceEntry {
  readonly attribute double value;
  readonly attribute boolean hadRecentInput;
  readonly attribute DOMHighResTimeStamp lastInputTime;
  readonly attribute FrozenArray<LayoutShiftAttribution> sources;
  [Default] object toJSON();
};

All attributes have the values which are assigned to them by the steps to report the layout shift.

A user agent implementing the Layout Instability API must include "layout-shift" in supportedEntryTypes for Window contexts. This allows developers to detect support for the Layout Instability API.
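
A page might perform this detection as sketched below. The function name is hypothetical, and the optional parameter exists only so that the check can be exercised outside a browser.

```javascript
// Returns true when the given PerformanceObserver constructor advertises
// support for 'layout-shift' entries.
function supportsLayoutShift(PO = globalThis.PerformanceObserver) {
  const types = PO && PO.supportedEntryTypes;
  return Boolean(types && types.includes('layout-shift'));
}
```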

4. `LayoutShiftAttribution` interface

[Exposed=Window]
interface LayoutShiftAttribution {
  readonly attribute Node? node;
  readonly attribute DOMRectReadOnly previousRect;
  readonly attribute DOMRectReadOnly currentRect;
};

Each LayoutShiftAttribution is associated with a Node (its associated node).

The getter of the node attribute of a LayoutShiftAttribution instance A invokes the get an element algorithm with the associated node of A, and the node document of the associated node of A, as inputs, and returns the result of that algorithm.

Note: The use of the get an element algorithm ensures that the node attribute is null if the attributed node is no longer connected, or is inside a shadow root.

ISSUE: The get an element algorithm should be moved out of the Element Timing spec and into a place more suitable for reuse here.

ISSUE: The get an element algorithm should be generalized to accept Node instead of Element.

The previousRect and currentRect attributes have the values which are assigned to them by the steps to create the attribution.

5. Processing model

Within the update the rendering step of the event loop processing model, a user agent implementing the Layout Instability API MUST perform the following step after the step that invokes the mark paint timing algorithm:

For each fully active Document in docs, invoke the algorithm to report the layout shift for that Document.

5.1. Report the layout shift

When asked to report the layout shift for an active Document D, run the following steps:

If the current layout shift value of D is not 0:
1. Create a new LayoutShift object newEntry with D’s relevant realm.
2. Set newEntry’s name attribute to "layout-shift".
3. Set newEntry’s entryType attribute to "layout-shift".
4. Set newEntry’s startTime attribute to current high resolution time given D’s relevant global object.
5. Set newEntry’s duration attribute to 0.
6. Set newEntry’s value attribute to the current layout shift value of D.
7. Set newEntry’s lastInputTime attribute to the time of the most recent excluding input, or 0 if no excluding input has occurred during the browsing session.
8. Set newEntry’s hadRecentInput attribute to true if lastInputTime is less than 500 milliseconds in the past, and false otherwise.
9. Set newEntry’s sources attribute to the result of invoking the algorithm to report the layout shift sources for D.
10. Queue the PerformanceEntry newEntry object.
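
Steps 7 and 8 can be sketched as follows. Representing "no excluding input yet" as `null`, and treating that case as not recent, are assumptions of this sketch. The function and constant names are hypothetical.

```javascript
const RECENT_INPUT_WINDOW_MS = 500;

// now: the current high resolution time, in milliseconds.
// lastExcludingInputTime: time of the most recent excluding input in
// milliseconds, or null if none has occurred (assumed representation).
function inputFields(now, lastExcludingInputTime) {
  if (lastExcludingInputTime === null) {
    return {lastInputTime: 0, hadRecentInput: false};
  }
  return {
    lastInputTime: lastExcludingInputTime,
    hadRecentInput: now - lastExcludingInputTime < RECENT_INPUT_WINDOW_MS,
  };
}
```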

5.2. Report the layout shift sources

When asked to report the layout shift sources for an active Document D, run the following steps:

Let C be an empty list of Node objects.
For each member N of the unstable node set of D, run these steps:
1. If there exists any member existingNode of C such that the node impact region of N is a subset of the node impact region of existingNode, then continue.
2. Otherwise, if there exists any member existingNode of C such that the node impact region of existingNode is a subset of the node impact region of N, then replace the first such member existingNode with N in C.
3. Otherwise, if there are fewer than 5 members of C, then append N to C.
  
  NOTE: The choice of 5 is arbitrary but it balances providing detailed attribution while not having a prohibitive memory cost or being spammy in the set of nodes exposed.
4. Otherwise, run these steps:
  1. Let smallest be the first member of C whose node impact region is not greater in area than the node impact region of any other member of C.
  2. If the area of the node impact region of N is greater than the area of the node impact region of smallest, then replace smallest with N in C.
Return a FrozenArray of LayoutShiftAttribution objects created by running the algorithm to create the attribution once for each member of C.
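
The selection steps above can be sketched as follows. As a simplification, each node impact region is modeled as a single axis-aligned rectangle, although a real impact region is the union of two visual representations and need not be rectangular. The `{id, region}` shape and all names are hypothetical.

```javascript
const MAX_SOURCES = 5;

const area = r =>
  Math.max(0, r.right - r.left) * Math.max(0, r.bottom - r.top);
const isSubset = (a, b) =>
  a.left >= b.left && a.top >= b.top &&
  a.right <= b.right && a.bottom <= b.bottom;

// unstableNodes: array of {id, region}, where region is a rectangle
// {left, top, right, bottom} standing in for the node impact region.
function selectSources(unstableNodes) {
  const chosen = [];
  for (const node of unstableNodes) {
    // Step 1: skip nodes whose region is covered by a chosen node.
    if (chosen.some(c => isSubset(node.region, c.region))) continue;
    // Step 2: replace the first chosen node that this node covers.
    const covered = chosen.findIndex(c => isSubset(c.region, node.region));
    if (covered !== -1) {
      chosen[covered] = node;
    } else if (chosen.length < MAX_SOURCES) {
      // Step 3: there is still room, so append.
      chosen.push(node);
    } else {
      // Step 4: replace the first smallest member if this node is larger.
      let smallest = 0;
      for (let i = 1; i < chosen.length; i++) {
        if (area(chosen[i].region) < area(chosen[smallest].region)) {
          smallest = i;
        }
      }
      if (area(node.region) > area(chosen[smallest].region)) {
        chosen[smallest] = node;
      }
    }
  }
  return chosen;
}
```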

When asked to create the attribution for a Node N, run the following steps:

Create a new LayoutShiftAttribution object A with N’s relevant realm.
Set the associated node of A to N.
Set the previousRect attribute of A to the smallest Rectangle containing the previous frame visual representation of N.
Set the currentRect attribute of A to the smallest Rectangle containing the visual representation of N.
Return A.
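
The "smallest Rectangle containing" a visual representation can be sketched as a bounding-box computation. This assumes the representation is given as a list of fragment rectangles, and that an empty representation maps to an all-zero rectangle. Both are assumptions of the sketch, which returns a plain object rather than a DOMRectReadOnly.

```javascript
// rects: array of {left, top, right, bottom} fragments in viewport
// coordinates. Returns the bounding box as {x, y, width, height}.
function boundingRect(rects) {
  if (rects.length === 0) return {x: 0, y: 0, width: 0, height: 0};
  const left = Math.min(...rects.map(r => r.left));
  const top = Math.min(...rects.map(r => r.top));
  const right = Math.max(...rects.map(r => r.right));
  const bottom = Math.max(...rects.map(r => r.bottom));
  return {x: left, y: top, width: right - left, height: bottom - top};
}
```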

6. Security & privacy considerations

Layout instability bears an indirect relationship to resource timing, as slow resources could cause intermediate layouts that would not otherwise be performed. Resource timing information can be used by malicious websites for statistical fingerprinting. The Layout Instability API only reports instability in the current browsing context. It does not directly provide any aggregation of instability scores across multiple browsing contexts. Developers can implement such aggregation manually, but browsing contexts with different origins would need to cooperate to share instability scores.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.