Clear Site Data

1. Introduction

This section is not normative.

Web applications store data locally on a user’s computer in order to provide functionality while the user is offline, and to increase performance when the user is online. These local caches have significant advantages for both users and developers, but present risks as well.

A user’s data is both sensitive and valuable; web developers ought to take reasonable steps to protect it. One such step would be to encrypt data before storing it. Another would be to remove data from the user’s machine when it is no longer necessary (for example, when the user signs out of the application, or deletes their account).

Site authors can remove data from a number of storage mechanisms via JavaScript, but others are difficult to deal with reliably. Consider cookies, for instance, which can be partially cleared via JavaScript access to document.cookie. HttpOnly cookies, however, can only be removed via a number of Set-Cookie headers in an HTTP response. This, of course, requires exhaustive knowledge of all the cookies set for a host, which can be complicated to ascertain. Cache is still harder; no imperative interface to a browser’s network cache exists, period.

This document defines a new mechanism to deal with removing data from these and other types of local storage, giving web developers the ability to clear out a user’s local cache of data via the Clear-Site-Data HTTP response header.

1.1. Examples

1.1.1. Signing Out

A user signs out of Super Secret Social Network via a CSRF-protected POST to https://supersecretsocialnetwork.example.com/logout, and the site author wishes to ensure that locally stored data is removed as a result.

They can do so by sending the following HTTP header in the response:

Clear-Site-Data: *

1.1.2. Targeted Clearing

A user signs out of Megacorp Inc.'s site via a CSRF-protected POST to https://megacorp.example.com/logout. Megacorp has a large number of services available as subdomains, so many that it’s not entirely clear which of them would be safe to clear as a response to a logout action. One option would be to simply clear everything, and deal with the fallout. Megacorp’s CEO, however, once lost hours and hours of progress in "Irate Ibix" due to inadvertent site-data clearing, and so refuses to allow such a sweeping impact to the site’s users.

The developers know, however, that the "Minus" application is certainly safe to clear out. They can target this specific subdomain by including a request to that subdomain as part of the logout landing page (ideally as a CORS-enabled, CSRF-protected POST):

fetch("https://minus.megacorp.example.com/clear-site-data",
      {
          method: "POST",
          mode: "cors",
          headers: new Headers({
              "CSRF": "[insert sekrit token here]"
          })
      });

That endpoint would return proper CORS headers in response to that request’s preflight, and would return the following header for the actual request:

Clear-Site-Data: *; includeSubdomains

1.1.3. Keep Critical Cookies

A user opts out of interest-based advertising via a CSRF-protected POST to https://ads-are-awesome.example.com/optout. The site author wishes to remove DOM-accessible data which might contain tracking information, but needs to ensure that the opt-out cookie which the user has just received isn’t wiped along with it.

They can do so by sending the following HTTP header in the response, which includes all the types except for "cookies":

Clear-Site-Data: domStorage executionContexts cache; includeSubdomains
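
The header values in the examples above share one shape: either "*" or a space-separated list of types, optionally followed by "; includeSubdomains". The following is a minimal sketch of a server-side helper that assembles such a value. The name buildClearSiteDataValue and its parameters are hypothetical and are not defined by this specification.

```javascript
// Hypothetical helper; not part of this specification.
const KNOWN_TYPES = ["domStorage", "cookies", "executionContexts", "cache"];

function buildClearSiteDataValue(types, includeSubdomains = false) {
  let list;
  if (types === "*") {
    // "*" stands for every data type defined in this document.
    list = "*";
  } else {
    const unknown = types.filter((t) => !KNOWN_TYPES.includes(t));
    if (types.length === 0 || unknown.length > 0) {
      throw new TypeError("Unknown or empty type list: " + unknown.join(", "));
    }
    list = types.join(" ");
  }
  return includeSubdomains ? list + "; includeSubdomains" : list;
}

// The opt-out example: every type except "cookies".
console.log(buildClearSiteDataValue(
  ["domStorage", "executionContexts", "cache"], true));
// prints "domStorage executionContexts cache; includeSubdomains"
```

Rejecting unknown type names at build time is a design choice of this sketch; a user agent would simply ignore them when parsing.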

1.1.4. Kill Switch

Super Secret Social Network’s developers learn that the site was vulnerable to cross-site scripting attacks which allowed malicious parties to inject arbitrary code into its origin. They fixed the site, and added a strong Content Security Policy [CSP2] to mitigate the risk going forward, but they can’t be entirely sure that clients are really back to a trustworthy state. Perhaps the attackers found a clever persistence mechanism?

They can reduce the risk of a persistent client-side XSS by sending the following HTTP header in a response to wipe out local sources of data:

Clear-Site-Data: *; includeSubdomains

Note: Installing a Service Worker guarantees that a request will go out to a server every ~24 hours. That update ping would be a wonderful time to send a header like this one in case of catastrophe. [SERVICE-WORKERS]

1.2. Goals

Generally, the goal is to allow web developers more control over the data stored locally by a user agent for their origins. In particular, developers should be able to reliably ensure the following:

Data stored in an origin’s client-side storage mechanisms like [INDEXEDDB], WebSQL, Filesystem, localStorage, and sessionStorage is cleared.
Cookies for an origin’s host are removed [RFC6265].
Web Workers (dedicated and shared) running for an origin are terminated.
Service Workers registered for an origin are terminated and deregistered.
Resources from an origin are removed from the user agent’s local cache.
All of the above can be propagated to an origin’s host’s subdomains.
All of the above can be propagated to the HTTP version of an HTTPS origin.
None of the above can be bypassed by a maliciously active document that retains interesting data in memory, and rewrites it if it’s cleared.

2. Clearing Site Data

Developers may instruct a user agent to clear various types of relevant data in two ways: an HTTP response header, and a JavaScript API.

2.1. The Clear-Site-Data HTTP Response Header Field

The Clear-Site-Data HTTP response header field sends a signal to the user agent that it ought to remove all data of a certain set of types. The header is represented by the following ABNF:

"Clear-Site-Data:" *WSP data-type-list *[ ";" *WSP extension *WSP ]  *WSP

data-type-list = "*" / ( type *( " " type ) )
type = "domStorage" / "cookies" / "executionContexts" / "cache"
extension = subdomain-extension / unknown-extension
subdomain-extension = "includeSubdomains"
unknown-extension = *( WSP / <VCHAR except ";" and ","> )

The header’s value contains either the U+002A ASTERISK character (*) or a space-separated list of the data types to clear, optionally followed by a set of extensions.

If the header’s value’s data-type-list component is "*", then all data types specified in this document that are related to this site will be removed.

Parsing details can be found in §3.1 Parsing.

User agent conformance requirements are specified in §3.2 Clear data for response. Those steps represent the following requirements when the header is present in a response (response):

User agents MUST ignore the Clear-Site-Data header if it is delivered along with a Response whose URL is a priori insecure.

Note: This means that the header will be ignored for unauthenticated or unencrypted connections ("HTTP" vs "HTTPS", for example).
If the value of the header’s data-type-list contains cookies or *, then all cookies which would be sent along with any request to the response’s url's host MUST be removed.

Further, if the same header’s extension contains includeSubdomains, then all cookies which would be sent along with any request to any host which is a subdomain of response’s url's host MUST be removed.
If the value of the header’s data-type-list contains domStorage or *, then all DOM-accessible storage mechanisms (localStorage, sessionStorage, [INDEXEDDB], [WEBDATABASE], etc) for response’s url's origin MUST be cleared.

If the includeSubdomains option is present, then all DOM-accessible storage mechanisms for any origin whose host is a subdomain of response’s url's host MUST be cleared.
If the value of the header’s data-type-list contains cache or *, then all locally cached data for response’s url's origin MUST be removed.

If the includeSubdomains option is present, then all locally cached data for any host which is a subdomain of response’s url's host MUST be removed.
If the value of the header’s data-type-list contains executionContexts or *, then all browsing contexts whose active Document’s origin is identical to response’s url's origin MUST be neutered by tightly sandboxing them.

If the includeSubdomains option is present, then all browsing contexts whose active Document’s origin’s host is a subdomain of response’s url's host MUST be neutered.

2.2. JavaScript API

This might live more cleanly in [STORAGE].

Megacorp, Inc. wants to remove data in response to a user’s activity on their site. They can execute the following JavaScript to clear all the relevant data for a user:

navigator.storage.clear();

If they only wished to clear the otherwise inaccessible cache for the current origin and all subdomains:

navigator.storage.clear({
  types: [ "cache" ],
  includeSubdomains: true
});

enum StorageClearType {
  "cache",
  "cookies",
  "domStorage",
  "executionContexts"
};

dictionary StorageClearOptions {
  sequence<StorageClearType> types;
  boolean includeSubdomains = false;
};

partial interface StorageManager {
  Promise<void> clear(StorageClearOptions options);
};

clear(options)

Clears data based on the values in the options argument. Returns a Promise that resolves when clearing is complete. If no types are specified, all data types will be cleared.

Arguments for the StorageManager.clear(options) method.
Parameter	Type	Nullable	Optional	Description
options	StorageClearOptions	✘	✘	The data to clear.
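
Because clear() is only proposed by this draft, a page might guard the call before relying on it. The following is a minimal sketch under that assumption. The wrapper name clearSiteData is hypothetical, and the storage manager is passed in as a parameter (it would be navigator.storage in a page) so that the sketch can be exercised outside a browser.

```javascript
// Hypothetical wrapper; `storageManager` would be navigator.storage in a page.
function clearSiteData(storageManager, options = {}) {
  // clear() is defined only by this draft, so check for it before calling.
  if (!storageManager || typeof storageManager.clear !== "function") {
    return Promise.resolve(false); // Unsupported: the caller needs a fallback.
  }
  // Resolves to true once the user agent reports that clearing is complete.
  return storageManager.clear(options).then(() => true);
}
```

A caller that receives false could fall back to a request whose response carries the Clear-Site-Data header.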

2.3. Fetch Integration

Monkey patching! Talk with Anne.

If the Clear-Site-Data header is present in an HTTP response, then data MUST be cleared before rendering the response to the user. That is, before step #9 in the current main fetch algorithm, execute the following step:

If response’s header list contains a header named Clear-Site-Data, then execute §3.2 Clear data for response on response.

Note: This happens after Set-Cookie headers are processed. If we clear cookies, we clear all of them. This is intentional, as removing only certain cookies might leave an application in an indeterminate and vulnerable state. Removing specific cookies is best done via expiration using the Set-Cookie header.

3. Algorithms

3.1. Parsing

3.1.1. Which data types ought to be removed for `response`?

If response does not contain a Clear-Site-Data header, return an empty list.
Let types be the value of response’s Clear-Site-Data data-type-list component, split on spaces.
Let remove be an empty list.
If types contains a single entry whose value is *:
1. Append cache, cookies, domStorage, and executionContexts to remove.
2. Return remove.
For each token in types:
1. If token is a valid type, append it to remove.
2. Otherwise, ignore token.
Return remove.
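
The steps above can be sketched in JavaScript as follows. This is an illustration rather than a normative implementation: the function name typesToRemove is hypothetical, and the header is assumed to arrive as a plain string (or null when absent).

```javascript
const VALID_TYPES = ["cache", "cookies", "domStorage", "executionContexts"];

// `headerValue` is the Clear-Site-Data field value, or null if absent.
function typesToRemove(headerValue) {
  if (headerValue === null || headerValue === undefined) return [];
  // The data-type-list is everything before the first ";".
  const dataTypeList = headerValue.split(";")[0].trim();
  const types = dataTypeList.split(" ").filter((t) => t !== "");
  // A lone "*" expands to every type defined in this document.
  if (types.length === 1 && types[0] === "*") return [...VALID_TYPES];
  // Unknown tokens are ignored rather than treated as errors.
  return types.filter((t) => VALID_TYPES.includes(t));
}
```

Ignoring unknown tokens lets a server send type names introduced later without breaking user agents that do not know them.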

3.1.2. Should subdomains' data be cleared for `response`

Let extensions be the list of response’s Clear-Site-Data extension components.
If extensions contains includeSubdomains, return Include Subdomains.
Otherwise, return Exclude Subdomains.
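
A sketch of the same check in JavaScript, assuming the header arrives as a plain string. The function name subdomainState is hypothetical; the two return values mirror the names used in this section.

```javascript
// `headerValue` is the Clear-Site-Data field value, or null if absent.
function subdomainState(headerValue) {
  if (typeof headerValue !== "string") return "Exclude Subdomains";
  // Extensions are the ";"-separated components after the data-type-list.
  const extensions = headerValue.split(";").slice(1).map((e) => e.trim());
  return extensions.includes("includeSubdomains")
    ? "Include Subdomains"
    : "Exclude Subdomains";
}
```

Only an exact "includeSubdomains" extension changes the result; unknown extensions are skipped, and the same token appearing in the data-type-list position has no effect.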

3.1.3. Does `origin` match `origin to clear` and `subdomain state`

Given an origin (origin), an origin to clear (origin to clear), and a subdomain state of either Include Subdomains or Exclude Subdomains (subdomain state), return Matches or Does Not Match.

If either origin or origin to clear is a globally unique identifier, return Does Not Match.
If origin is the same as origin to clear, return Matches.
If subdomain state is Exclude Subdomains, return Does Not Match.
Let labels to clear be the host component of origin to clear split into labels, and labels be the host component of origin, split into labels.
If labels does not have more entries than labels to clear, return Does Not Match.
While labels to clear is not empty:
1. If the final entry of labels to clear does not exactly match the final entry of labels, return Does Not Match.
2. Remove the final entry of labels to clear, and of labels.
Return Matches.
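
The label comparison above can be sketched as follows. The function name originMatches is hypothetical, and origins are assumed to be passed as serialized strings such as "https://megacorp.example.com", with the string "null" standing in for a globally unique identifier.

```javascript
function originMatches(origin, originToClear, subdomainState) {
  // "null" is the serialization of a globally unique (opaque) origin.
  if (origin === "null" || originToClear === "null") return "Does Not Match";
  if (origin === originToClear) return "Matches";
  if (subdomainState === "Exclude Subdomains") return "Does Not Match";
  // As in the steps above, only host labels are compared from here on.
  const labels = new URL(origin).hostname.split(".");
  const labelsToClear = new URL(originToClear).hostname.split(".");
  if (labels.length <= labelsToClear.length) return "Does Not Match";
  // Compare from the rightmost label inward, removing each pair as we go.
  while (labelsToClear.length > 0) {
    if (labelsToClear.pop() !== labels.pop()) return "Does Not Match";
  }
  return "Matches";
}
```

Comparing whole labels rather than string suffixes means that a host such as a.evilmegacorp.example.com is not treated as a subdomain of megacorp.example.com.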

3.2. Clear data for `response`

Given a response (response), this algorithm parses the Clear-Site-Data header to determine what needs to be cleared, which origins are affected, and then executes those requests.

If response’s URL is a priori insecure, skip the remaining steps of this algorithm.

Some have suggested that this might not be a restriction we want (see Martin Thomson’s public-webappsec post on the topic, for example).
Let types be the result of §3.1.1 Which data types ought to be removed for response? executed on response.
Let subdomain state be the result of §3.1.2 Should subdomains' data be cleared for response executed on response.
Execute §3.4 Clear types for origin with subdomain state on types, response’s url's origin, and subdomain state.

Note: Especially given the cross-context implications, user agents are encouraged to give web developers some mechanism by which the clearing operation can be debugged. This might take the form of a console message or timeline entry indicating success.

3.3. Clear data for `storageRequestOptions`

Given a StorageClearOptions (options), this algorithm determines what needs to be cleared, returns a Promise, and executes the request asynchronously.

If the incumbent settings object is not a secure context, return a Promise rejected with NotSupportedError.
Let promise be a newly created Promise object.
Return promise, and execute the remaining steps asynchronously.
Let subdomain state be Include Subdomains if options' includeSubdomains property is true, and Exclude Subdomains otherwise.
Let types be an empty list.
If options' types is an empty sequence:
1. Append cache, cookies, domStorage, and executionContexts to types.
Otherwise, for each StorageClearType type in options' types property:
1. Append type to types.
Execute §3.4 Clear types for origin with subdomain state on types, the incumbent settings object’s origin, and subdomain state.
Resolve promise with undefined.
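
The part of this algorithm that turns a StorageClearOptions dictionary into a list of types and a subdomain state can be sketched as follows. The function name resolveClearOptions is hypothetical, and treating a missing types member the same as an empty sequence is an assumption of this sketch.

```javascript
const ALL_TYPES = ["cache", "cookies", "domStorage", "executionContexts"];

function resolveClearOptions(options = {}) {
  const subdomainState = options.includeSubdomains === true
    ? "Include Subdomains"
    : "Exclude Subdomains";
  // Assumption: an absent `types` member behaves like an empty sequence.
  const requested = options.types || [];
  // An empty sequence means "clear every type".
  const types = requested.length === 0 ? [...ALL_TYPES] : [...requested];
  return { types, subdomainState };
}
```

The secure-context check and the Promise handling are left out here; they wrap this step without changing its result.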

3.4. Clear `types` for `origin` with `subdomain state`

If types contains "executionContexts", execute §3.4.1 Neuter browsing contexts matching origin with subdomain state on origin, with subdomain state.
If types contains "cookies", execute §3.4.4 Clear cookies for origin with subdomain state on origin, with subdomain state.
If types contains "domStorage", execute §3.4.5 Clear DOM-accessible storage for origin with subdomain state on origin, with subdomain state.
If types contains "cache", execute §3.4.3 Clear cache for origin with subdomain state on origin, with subdomain state.
If types contains "executionContexts", execute §3.4.2 Reload browsing contexts matching origin with subdomain state on origin, with subdomain state.
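
The ordering of these steps matters: contexts are neutered first and reloaded last. A minimal sketch of that ordering follows. The function name clearTypes and the hooks object with its five callbacks are hypothetical stand-ins for the sub-algorithms in §3.4.1 through §3.4.5.

```javascript
// `hooks` is a hypothetical object holding one callback per sub-algorithm.
function clearTypes(types, origin, subdomainState, hooks) {
  const has = (t) => types.includes(t);
  // Sandbox first so running scripts cannot rewrite data as it is cleared.
  if (has("executionContexts")) hooks.neuterContexts(origin, subdomainState);
  if (has("cookies")) hooks.clearCookies(origin, subdomainState);
  if (has("domStorage")) hooks.clearDomStorage(origin, subdomainState);
  if (has("cache")) hooks.clearCache(origin, subdomainState);
  // Reload last, once nothing is left for the old documents to restore.
  if (has("executionContexts")) hooks.reloadContexts(origin, subdomainState);
}
```

The order in which the caller lists the types has no effect on the order in which the hooks run.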

3.4.1. Neuter browsing contexts matching `origin` with `subdomain state`

Given an origin (origin) and a subdomain state of either Include Subdomains or Exclude Subdomains, this algorithm walks through the set of browsing contexts which the user agent knows about, and sandboxes each in order to prevent them from recreating cleared data (from in-memory JavaScript variables, for instance). Once data is cleared, the affected browsing contexts will be hard-reloaded, as defined in §3.4.2 Reload browsing contexts matching origin with subdomain state:

For each context in the user agent’s set of browsing contexts:
1. Let document be context’s active document.
2. While document is an iframe srcdoc document, let document be the active document of document’s browsing context container.
3. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on context’s origin, origin, and subdomain state:
  1. Parse a sandboxing directive using the empty string as the input, and document’s active sandboxing flag set as the output.

3.4.2. Reload browsing contexts matching `origin` with `subdomain state`

For each context in the user agent’s set of browsing contexts:
1. Let document be context’s active document.
2. While document is an iframe srcdoc document, let document be the active document of document’s browsing context container.
3. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on context’s origin, origin, and subdomain state:
  1. Navigate context to document’s URL with replacement enabled and exceptions enabled. The source browsing context is context. This is a reload-triggered navigation.

3.4.3. Clear cache for `origin` with `subdomain state`

Given an origin (origin) and a subdomain state of either Include Subdomains or Exclude Subdomains, this algorithm removes data from the user agent’s local caches that matches the origin and subdomain state.

Let host be origin’s host, canonicalized as per Section 5.1.2 of [RFC6265].
If subdomain state is Include Subdomains, then let cache list be the set of entries from the network cache whose target URI’s host domain-matches host when canonicalized as per Section 5.1.2 of [RFC6265].
Otherwise, subdomain state is Exclude Subdomains, so let cache list be the set of entries from the network cache whose target URI’s host is identical to host when canonicalized as per Section 5.1.2 of [RFC6265].
Remove each entry in cache list from the network cache.
If a user agent implements caches beyond a pure network cache, it MUST remove all entries from those caches which match origin and subdomain state.
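
The selection of cache entries can be sketched as follows, modelling the network cache as a plain array of URL strings. That model, and the names domainMatches and selectCacheEntries, are assumptions of this sketch. Host canonicalization is delegated to the URL parser, and the domain-match test is a simplification that does not special-case IP addresses.

```javascript
// Simplified domain-match: IP-address hosts are not special-cased here.
function domainMatches(string, domainString) {
  const s = string.toLowerCase();
  const d = domainString.toLowerCase();
  return s === d || s.endsWith("." + d);
}

// `cacheUrls` is a hypothetical stand-in for the network cache's entries.
function selectCacheEntries(cacheUrls, origin, subdomainState) {
  const host = new URL(origin).hostname;
  return cacheUrls.filter((url) => {
    const entryHost = new URL(url).hostname;
    return subdomainState === "Include Subdomains"
      ? domainMatches(entryHost, host)
      : entryHost.toLowerCase() === host.toLowerCase();
  });
}
```

The returned list corresponds to cache list; removing those entries is left to the cache implementation.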

We’re dealing with the network cache here, as defined in [RFC7234], but that’s not nearly everything a user agent caches. How hand-wavy with the vendor-specific section can we be? For instance, Chrome clears out prerendered pages, script caches, WebGL shader caches, WebRTC bits and pieces, address bar suggestion caches, and various networking bits that aren’t representations (HSTS/HPKP, SDCH, etc.). Perhaps [STORAGE] will make this clearer?

3.4.4. Clear cookies for `origin` with `subdomain state`

Given an origin (origin) and a subdomain state of either Include Subdomains or Exclude Subdomains, this algorithm removes cookies from the user agent’s cookie store whose domain attribute matches the origin and subdomain state.

Note: This algorithm assumes that the user agent has implemented a cookie store (as discussed in Section 5.3 of [RFC6265]), which offers the ability to retrieve a list of cookies by host, and to remove individual cookies.

Let host be origin’s host, canonicalized as per Section 5.1.2 of [RFC6265].
If subdomain state is Include Subdomains, then let cookie list be the set of cookies from the cookie store whose domain attribute is domain-matched by host.

Note: The direction of the matching is important. If subdomain.example.com delivers the Clear-Site-Data header and includes subdomains, then cookies for .another.subdomain.example.com will be cleared, but cookies for .example.com will not.
Otherwise, subdomain state is Exclude Subdomains, so let cookie list be the set of cookies from the cookie store whose domain attribute is identical to host.
Remove each cookie in cookie list from the cookie store.
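
The direction of the match described in the note above can be sketched as follows, modelling the cookie store as an array of objects with a domain property. That model and the function name selectCookies are assumptions of this sketch; a leading "." on the domain attribute is ignored before comparison.

```javascript
// `cookieStore` is a hypothetical array of { name, domain } objects.
function selectCookies(cookieStore, origin, subdomainState) {
  const host = new URL(origin).hostname;
  return cookieStore.filter((cookie) => {
    // Ignore a leading "." on the cookie's domain attribute.
    const domain = cookie.domain.replace(/^\./, "").toLowerCase();
    if (domain === host) return true;
    // With subdomains included, the cookie's domain must sit below host.
    return subdomainState === "Include Subdomains" &&
      domain.endsWith("." + host);
  });
}

const store = [
  { name: "a", domain: ".another.subdomain.example.com" },
  { name: "b", domain: ".example.com" },
  { name: "c", domain: "subdomain.example.com" },
];
const cleared = selectCookies(
  store, "https://subdomain.example.com", "Include Subdomains");
console.log(cleared.map((cookie) => cookie.name).join(","));
// prints "a,c"
```

The cookie for .example.com survives because the parent domain does not fall under the host that delivered the header.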

3.4.5. Clear DOM-accessible storage for `origin` with `subdomain state`

For each area in the user agent’s set of local storage areas [HTML]:
1. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on area’s origin, origin, and subdomain state:
  1. Execute clear() on the Storage object associated with area.
For each area in the user agent’s set of session storage areas [HTML]:
1. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on area’s origin, origin, and subdomain state:
  1. Execute clear() on the Storage object associated with area.
For each database in the user agent’s set of Indexed Databases [INDEXEDDB]:
1. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on database’s origin, origin, and subdomain state:
  1. Set database’s delete pending flag to true.
  2. For each connection in the set of all IDBDatabase objects connected to database:
    1. Execute the database closing steps on connection.
  3. Execute the database deletion steps on database, passing in database’s origin and name.
For each database in the user agent’s set of WebSQL databases [WEBDATABASE]:
1. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on database’s origin, origin, and subdomain state:
  1. Delete database.
    
    The [WEBDATABASE] spec is fairly unhelpful here with regard to deletion details.
For each registration in the user agent’s set of registered service worker registrations:
1. If §3.1.3 Does origin match origin to clear and subdomain state returns Matches when executed on registration’s scope URL’s origin, origin, and subdomain state:
  1. Execute unregister() on registration.

We still need to spell out Filesystems, Dedicated Workers, Shared Workers, etc. (This isn’t an exhaustive list. We should fix that too.)

How do we say something about plugins here? Point out to NPP_ClearSiteData?

4. Privacy Considerations

4.1. Web developers control the timing.

If triggered at appropriate times, Clear-Site-Data can increase a user’s privacy and security by clearing sensitive data from their user agent. However, note that the web developer (and not the user) is in control of when the clearing event is triggered. Even assuming a non-malicious site author, users can’t rely on data being cleared at any particular point, nor are users in control of what data types are cleared.

If a user wishes to ensure that site data is indeed cleared at some specific point, they ought to rely on the data-clearing functionality offered by their user agent.

At a bare minimum, user agents OUGHT TO (in the [RFC6919] sense of the words) offer the same functionality to users that they offer to web developers. Ideally, they will offer significantly more than we can offer at a platform level (clearing browsing history, for example).

4.2. Remnants of data on disk.

While Clear-Site-Data triggers a clearing event in a user’s user agent, it is difficult to make promises about the state of a user’s disk after a clearing event takes place. In particular, note that it is up to the user agent to ensure that all traces of a site’s data are actually removed from disk, which can be a herculean task (consider virtual memory, as a good example of a larger issue).

In short, most user agents implement data clearing as "best effort", but can’t promise an exhaustive wipe.

If a user wishes to ensure that site data does not remain on disk, the best way to do so is to use a browsing mode that promises not to intentionally write data to disk (Chrome’s "Incognito", Internet Explorer’s "InPrivate", etc). These modes will do a better job of keeping data off disk, but are still subject to a number of limitations at the edges.

5. IANA Considerations

The permanent message header field registry should be updated with the following registration: [RFC3864]

5.1. Clear-Site-Data

Header field name: Clear-Site-Data
Applicable protocol: http
Status: standard
Author/Change controller: W3C
Specification document: This specification (See §2.1 The Clear-Site-Data HTTP Response Header Field)

6. Acknowledgements

Michal Zalewski proposed a variant of this concept, and Mark Knichel helped refine the details.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words "for example" or are set apart from the normative text with class="example".

Informative notes begin with the word "Note" and are set apart from the normative text with class="note".

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Conformance Classes

A conformant user agent must implement all the requirements listed in this specification that are applicable to user agents.

A conformant server must implement all the requirements listed in this specification that are applicable to servers.

Clear Site Data

W3C First Public Working Draft, 4 August 2015

Table of Contents

1. Introduction

1.1. Examples

1.1.1. Signing Out

1.1.2. Targeted Clearing

1.1.3. Keep Critical Cookies

1.1.4. Kill Switch

1.2. Goals

2. Clearing Site Data

2.1. The Clear-Site-Data HTTP Response Header Field

2.2. JavaScript API

2.3. Fetch Integration

3. Algorithms

3.1. Parsing

3.1.1. Which data types ought to be removed for response?

3.1.2. Should subdomains' data be cleared for response

3.1.3. Does origin match origin to clear and subdomain state

3.2. Clear data for response

3.3. Clear data for storageRequestOptions

3.4. Clear types for origin with subdomain state

3.4.1. Neuter browsing contexts matching origin with subdomain state

3.4.2. Reload browsing contexts matching origin with subdomain state

3.4.3. Clear cache for origin with subdomain state

3.4.4. Clear cookies for origin with subdomain state

3.4.5. Clear DOM-accessible storage for origin with subdomain state