Navigational-Tracking Mitigations

Draft Community Group Report,

Participate:
GitHub Repository
Privacy CG Meetings
Issue Tracking:
GitHub
Inline In Spec
Editors:
(Brave)
(Google)
This version:
https://privacycg.github.io/nav-tracking-mitigations/

Abstract

This specification defines navigational tracking and specifies when and how browsers are required to prevent it.

Status of this document

This specification is intended to be migrated to the W3C standards track. It is not a W3C standard.

It was published by the Privacy Community Group. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

Browsers are working to prevent cross-site tracking, which threatens user privacy. In addition to third-party cookies and storage, other client-side mechanisms enable cross-site tracking. Navigational tracking correlates user identities across sites during navigations between those sites. It often uses link decoration to convey information, but not all link decoration is tracking. This specification aims to distinguish tracking navigations from non-tracking ones and to prevent the tracking without breaking similar but benign navigations.

2. Infrastructure

This specification depends on the Infra standard. [INFRA]

3. Terminology

Link decoration is when the source of a hyperlink "decorates" its URL with extra information beyond what’s necessary to identify the page a user wants to navigate to. This information can be placed almost anywhere inside the URL.
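To illustrate how much of a URL can carry decoration, the following Python sketch reports which components of a URL contain a given token. The URL, the identifier u123, and the function name components_containing are hypothetical and exist only for this example. The sketch does not judge whether a decoration is tracking; it only shows where one can appear.

```python
from urllib.parse import urlsplit


def components_containing(url: str, token: str) -> list[str]:
    """Return the names of the URL components in which `token` appears verbatim."""
    parts = urlsplit(url)
    components = {
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    return [name for name, value in components.items() if token in value]


# Hypothetical decorated link: the identifier "u123" is embedded in four places.
decorated = "https://u123.shop.example/item/42/u123?ref=u123#u123"
print(components_containing(decorated, "u123"))  # ['host', 'path', 'query', 'fragment']
```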

Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site. Navigations transmit information cross-site in a few different ways, including in the target URL, which might be decorated, and in the timing of the request.

Bounce tracking refers to the use of redirects in a top-level context (including HTTP 3xx statuses, meta elements with http-equiv=refresh attributes, and script-directed navigation that doesn’t wait for user input) along with link decoration to join user identities between sites. Bounce tracking is a subset of navigational tracking, and the automated navigations involved can pass through sites that are the same as, or different from, the link's source or ultimate destination.
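The redirect half of this definition can be sketched as follows: given a chain of top-level navigations, each tagged with how it was started, list the hosts the user was moved away from without user input. This is a minimal sketch under stated assumptions: the Trigger, Hop, and bounced_through names are hypothetical, hostnames stand in for sites (a browser would compare registrable domains), and link decoration is not checked at all.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlsplit


class Trigger(Enum):
    """How a top-level navigation was started."""
    USER_CLICK = "user click"
    HTTP_REDIRECT = "HTTP 3xx"
    META_REFRESH = "meta http-equiv=refresh"
    SCRIPT_NO_INPUT = "script without user input"


# Navigations that happen without user input, per the definition above.
AUTOMATIC = {Trigger.HTTP_REDIRECT, Trigger.META_REFRESH, Trigger.SCRIPT_NO_INPUT}


@dataclass
class Hop:
    url: str          # URL of the page that was loaded
    trigger: Trigger  # how the navigation to this URL was started (ignored for the first hop)


def bounced_through(hops: list[Hop]) -> list[str]:
    """Return the hosts that were left via an automatic (non-user) navigation."""
    hosts = []
    for previous, current in zip(hops, hops[1:]):
        if current.trigger in AUTOMATIC:
            hosts.append(urlsplit(previous.url).hostname)
    return hosts


chain = [
    Hop("https://start.example/", Trigger.USER_CLICK),
    Hop("https://tracker.example/r?uid=u123", Trigger.USER_CLICK),
    Hop("https://destination.example/?uid=u123", Trigger.HTTP_REDIRECT),
]
print(bounced_through(chain))  # ['tracker.example']
```

Because the same host can be left automatically more than once, the result is a list rather than a set, which keeps same-site bounces visible.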

Tracking via a bounce through an aggregation domain:

  1. The content publisher’s page (on publisher.example) embeds a third-party script from tracker.example.

  2. The third-party script tries to read an already-stored identifier, for example, one it previously set in publisher.example's storage, or one read from a third-party tracker.example iframe.

  3. If it can’t, it redirects the top-level page to tracker.example using window.location.

  4. During this load, tracker.example is the first party and can read and write its own cookie jar.

  5. tracker.example redirects back to the original page URL, with that URL decorated with its user ID in a query parameter.

  6. The tracker.example user ID is now available on publisher.example and can be saved into its first-party storage so that future visits don’t need to bounce.
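The steps above can be simulated in a few lines of Python. This is a hedged, in-memory sketch rather than browser behavior: two dictionaries stand in for the two sites' first-party storage, steps 3 through 5 collapse into one function call with no real network involved, the iframe path from step 2 is omitted, and the names tracker_redirect, load_publisher_page, and the query parameter tracker_uid are hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical stand-ins for each site's first-party storage.
publisher_storage: dict[str, str] = {}   # publisher.example's storage
tracker_cookie_jar: dict[str, str] = {}  # tracker.example's first-party cookie jar

PARAM = "tracker_uid"  # hypothetical name of the decorating query parameter


def tracker_redirect(return_url: str) -> str:
    """Steps 4 and 5: tracker.example, loaded as the first party, reads (or
    creates) its user ID and redirects back with the ID in a query parameter."""
    uid = tracker_cookie_jar.setdefault("uid", secrets.token_hex(8))
    parts = urlsplit(return_url)
    extra = urlencode({PARAM: uid})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def load_publisher_page(url: str) -> int:
    """Run the embedded script's logic (steps 2, 3, and 6) for one page load.
    Returns the number of top-level bounces that were needed."""
    bounces = 0
    while True:
        # Step 6: an ID delivered in the URL is copied into first-party storage.
        delivered = parse_qs(urlsplit(url).query).get(PARAM)
        if delivered:
            publisher_storage["uid"] = delivered[0]
        # Step 2: stop if an identifier is already available.
        if "uid" in publisher_storage:
            return bounces
        # Step 3: otherwise navigate the top-level page to tracker.example.
        url = tracker_redirect(url)
        bounces += 1


print(load_publisher_page("https://publisher.example/article"))  # 1
print(load_publisher_page("https://publisher.example/other"))    # 0
```

After the first visit, publisher_storage["uid"] equals tracker_cookie_jar["uid"], which is the cross-site join the example describes; the second visit needs no bounce.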

4. Threat model

This section will precisely define the goals and non-goals of this specification’s mitigations. It will define a few classes of actors with the ability to modify websites in particular ways. Then it will define what cross-site information each of these actors can or cannot learn.

4.1. Threat actors

TODO

5. Considered Alternatives

This section is non-normative.

So far, the alternative designs consist of mitigations that various browsers have already deployed.

5.1. Deployed Mitigations

Some browsers have deployed and announced protections against navigational tracking. This section is a work in progress that details which protections have shipped and/or are planned. It is not comprehensive.

5.1.1. Safari

Safari uses an algorithmic approach to combat navigational tracking. Safari labels a site as having navigation-based cross-site tracking capability if the following criteria are met within a particular client:

Sites can be "tainted" as having cross-site tracking capabilities if they redirect to sites already classified as having cross-site tracking capabilities. For example, consider the case of a user clicking on a link on start.example, which redirects to second.example, which redirects to third.example, which redirects to end.example. If Safari has classified third.example as having tracking capabilities, the above behavior can result in Safari classifying second.example as having cross-site tracking capabilities.
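The tainting rule in this example can be modeled as a set computation over a redirect chain. This is only a sketch of the rule as worded here, not Safari's actual classifier: the Navigation and newly_tainted names are hypothetical, the first hop is assumed to be the user's click (so start.example is not a redirector), and only sites classified before the chain is observed can taint others, so the sketch deliberately does not cascade taint along the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Navigation:
    source: str       # site navigated from
    destination: str  # site navigated to
    redirect: bool    # True for an automatic redirect, False for a user click


def newly_tainted(chain: list[Navigation], classified: set[str]) -> set[str]:
    """Sites in `chain` that redirected to a site already in `classified`."""
    return {
        nav.source
        for nav in chain
        if nav.redirect and nav.destination in classified
    } - classified


chain = [
    Navigation("start.example", "second.example", redirect=False),  # user's click
    Navigation("second.example", "third.example", redirect=True),
    Navigation("third.example", "end.example", redirect=True),
]
print(newly_tainted(chain, {"third.example"}))  # {'second.example'}
```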

If the registrable domain that the user is being automatically redirected from has been classified as having cross-site tracking capabilities, Safari will delete all non-cookie storage on the site the user is being redirected to if the user does not interact with that destination site (i.e., trigger a user activation on it) within seven days of browser use.
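The deletion rule can be expressed as a small predicate. This is a sketch under assumptions: DestinationState and should_delete_non_cookie_storage are hypothetical names, "days of browser use" is modeled as a plain counter of days on which the browser was used (not calendar days), and treating the seventh such day as the deadline is a guess at the boundary.

```python
from dataclasses import dataclass

INTERACTION_WINDOW_DAYS = 7  # "seven days of browser use"


@dataclass
class DestinationState:
    """Hypothetical bookkeeping for a site reached via an automatic redirect."""
    redirected_from_classified: bool  # the redirecting domain was classified as a tracker
    browser_use_days: int = 0         # days with browser use since the redirect
    user_activated: bool = False      # any user activation on this site since then


def should_delete_non_cookie_storage(state: DestinationState) -> bool:
    """True once the window has passed without a user activation on the site."""
    return (
        state.redirected_from_classified
        and not state.user_activated
        and state.browser_use_days >= INTERACTION_WINDOW_DAYS
    )
```

Counting days of use rather than wall-clock days means a browser left closed for a month does not consume the window.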

Additionally, if the URL the user is navigating to has either query parameters or a URL fragment, the lifetime of cookies set client-side on the destination page is capped at 24 hours.
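A minimal sketch of the expiry cap follows. It reads "Additionally" as applying under the same condition as the previous paragraph (a redirect from a classified domain); that reading, the function name capped_cookie_expiry, and the decision to treat an empty "?" or "#" as undecorated are all assumptions.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit

CLIENT_SIDE_COOKIE_CAP = timedelta(hours=24)


def capped_cookie_expiry(destination_url: str, requested_expiry: datetime,
                         now: datetime, redirected_from_classified: bool) -> datetime:
    """Expiry to apply to a cookie set client-side on the destination page."""
    parts = urlsplit(destination_url)
    # A non-empty query or fragment counts as potential link decoration.
    decorated = bool(parts.query or parts.fragment)
    if redirected_from_classified and decorated:
        # Cap the lifetime, but never extend a cookie that already expires sooner.
        return min(requested_expiry, now + CLIENT_SIDE_COOKIE_CAP)
    return requested_expiry
```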

5.1.2. Firefox

Firefox uses a list-based approach to combat navigational tracking. Sites on the Disconnect list are considered tracking sites. All storage for tracking sites is cleared after 24 hours, unless the user has interacted with the site in the first-party context in the last 45 days.
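The list-based purge decision can be sketched as a predicate evaluated for each site during a purge run. This is hedged in several ways: should_purge is a hypothetical name, tracker_list is a placeholder set standing in for the Disconnect list, and the sketch assumes the 24-hour figure means a purge job that runs every 24 hours and calls this predicate.

```python
from datetime import datetime, timedelta
from typing import Optional

INTERACTION_WINDOW = timedelta(days=45)


def should_purge(site: str, tracker_list: set[str],
                 last_first_party_interaction: Optional[datetime],
                 now: datetime) -> bool:
    """Decide, during a purge run, whether to clear all storage for `site`."""
    if site not in tracker_list:
        return False  # only listed tracking sites are purged
    if (last_first_party_interaction is not None
            and now - last_first_party_interaction <= INTERACTION_WINDOW):
        return False  # first-party interaction in the last 45 days exempts the site
    return True
```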

5.1.3. Brave

Brave uses three list-based approaches to combat navigational tracking.

First, Brave strips query parameters commonly used for navigational tracking from URLs on navigation. This list is maintained by Brave.
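Query-parameter stripping can be sketched with the standard library's URL helpers. The parameter names in STRIP_PARAMS are illustrative placeholders, not Brave's actual list, and strip_tracking_params is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative parameter names only; not Brave's actual list.
STRIP_PARAMS = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Remove listed query parameters, keeping every other part of the URL."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in STRIP_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params("https://shop.example/item?id=7&fbclid=abc"))
# https://shop.example/item?id=7
```

Round-tripping through parse_qsl and urlencode can re-encode the parameters that remain (for example, spaces become "+"), so a production implementation would likely edit the raw query string more conservatively.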

Second, in the non-default "aggressive blocking" configuration, Brave uses popular crowd-sourced filter lists (e.g., EasyList, EasyPrivacy, uBlock Origin) to identify URLs that are used for bounce tracking, and will preempt the navigation with an interstitial (similar to Google Safe Browsing), giving the user the option to continue or cancel the navigation.
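The interstitial flow can be sketched as a gate in front of the navigation. Real filter lists use a much richer rule syntax than this; here a hypothetical set of hosts (with subdomain matching) stands in for filter-list matches, and ask_user is a hypothetical callback representing the interstitial's continue/cancel choice.

```python
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical stand-in for hosts matched by bounce-tracking filter rules.
BOUNCE_TRACKING_HOSTS = {"tracker.example"}


def is_listed(host: str) -> bool:
    """Match a listed host exactly or any of its subdomains."""
    return any(host == h or host.endswith("." + h) for h in BOUNCE_TRACKING_HOSTS)


def navigate(url: str, aggressive: bool, ask_user: Callable[[str], bool]) -> bool:
    """Return True if the navigation should proceed."""
    host = urlsplit(url).hostname or ""
    if aggressive and is_listed(host):
        return ask_user(url)  # interstitial: the user decides to continue or cancel
    return True  # default configuration, or no match: navigate normally
```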

Third, Brave uses a list-based approach for identifying bounce tracking URLs where the destination URL is embedded in the intermediate tracking URL. In such cases, Brave will skip the intermediate navigation and request the destination URL instead. For example, if Brave observes the user about to navigate to the URL https://tracker.example/bounce?dest=https://destination.example/, the browser might replace the navigation to tracker.example/bounce with a navigation to https://destination.example/. This list is maintained by Brave and is drawn from a mix of crowd-sourcing and existing open-source projects.
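The example above can be sketched as a lookup from a listed bounce endpoint to the query parameter that carries the destination. DEBOUNCE_RULES and debounce are hypothetical names, and the single rule mirrors only the tracker.example example; the format of Brave's real list is not described here.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical rules: (host, path) of a bounce endpoint -> parameter holding the destination.
DEBOUNCE_RULES = {("tracker.example", "/bounce"): "dest"}


def debounce(url: str) -> str:
    """Return the URL to actually request, skipping a listed bounce endpoint."""
    parts = urlsplit(url)
    param = DEBOUNCE_RULES.get((parts.hostname, parts.path))
    if param is None:
        return url  # not a listed bounce endpoint
    values = parse_qs(parts.query).get(param)
    if not values:
        return url  # listed endpoint, but no destination present
    destination = values[0]
    # Only skip the bounce when the embedded value is itself an http(s) URL.
    if urlsplit(destination).scheme not in ("http", "https"):
        return url
    return destination


print(debounce("https://tracker.example/bounce?dest=https://destination.example/"))
# https://destination.example/
```

Checking the scheme of the extracted value keeps the rewrite from turning a listed URL into a javascript: or other non-web navigation.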

Acknowledgements

Thanks to the members of the Privacy Community Group for many good discussions about this proposal.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

Informative References

[TRACKING-DNT]
Roy Fielding; David Singer. Tracking Preference Expression (DNT). 17 January 2019. NOTE. URL: https://www.w3.org/TR/tracking-dnt/

Issues Index

Changing the numbers changes which map the user sees, and embedding a user ID would not successfully transfer that user ID to the target site, but it’s hard for an automated system inside a browser to prove that, and hard even for humans reading the URL to be confident of it. [Issue #4]
The URL identifies an action rather than a page, and the user ID might be essential for that action to happen. However, this is also clearly a user ID and sufficient to track a user if the source and target collaborate. [Issue #5]
This is probably the same case as the unsubscribe link. [Issue #5]