1. Introduction
This section is non-normative.
Browsers are working to prevent cross-site tracking, which threatens user privacy. In addition to third-party cookies and storage, other client-side methods exist that enable cross-site tracking. Navigational tracking correlates user identities across sites during navigations between those sites. Navigational tracking uses link decoration to convey information, but not all link decoration is tracking. This project attempts to distinguish tracking from non-tracking navigation and to prevent the tracking without damaging similar but benign navigations.
2. Infrastructure
This specification depends on the Infra standard. [INFRA]
3. Terminology
Link decoration is when the source of a hyperlink "decorates" its URL with extra information beyond what’s necessary to identify the page a user wants to navigate to. This information can be placed almost anywhere inside the URL.
Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site. Navigations transmit information cross-site in a few different ways, including in the target URL, which might be decorated, and in the timing of the request.
Examples and non-examples of link decoration and navigational tracking, with the potential decoration or tracking element named in each description:
- https://publisher.example/page?userId=5789rhkdsaf8urfnsd
  The userId parameter is link decoration, and also navigational tracking.
- https://bookshop.org/a/1122/9780062252074
  The path segment 1122 is link decoration but not navigational tracking: this number identifies an affiliate to credit with a book sale. Replacing it with another number gets to the same target page.
- https://bookshop.org/a/1122/9780062252074
  The path segment 9780062252074 is not decoration: this number identifies a particular book. Changing it yields a different target page.
- https://bugzilla.mozilla.org/show_bug.cgi?id=1460058
  The id parameter is not decoration: changing the number changes which bug the user sees.
- https://www.google.com/maps/@37.4220328,-122.0847584,17.12z
  Changing the numbers changes what map the user sees, and embedding a user ID would not successfully transfer that user ID to the target site, but it’s hard for an automated system inside a browser to prove that, and even hard for humans reading the URL to be confident of it. [Issue #4]
- https://publisher.example/unsubscribe?userId=5789rhkdsaf8urfnsd
  The URL identifies an action rather than a page, and the user ID might be essential for that action to happen. However, this is also clearly a user ID and sufficient to track a user if the source and target collaborate. [Issue #5]
- https://example.com/auth/callback?token=1234567
  This is probably the same case as the unsubscribe link. [Issue #5]
- https://example.com/login?returnto=item/12345
  Assuming a request for this URL shows a login page instead of immediately redirecting to item/12345, the returnto parameter is link decoration but not navigational tracking.
Bounce tracking refers to the use of redirects in a top-level context (including HTTP 3xx statuses, meta elements with http-equiv=refresh attributes, and script-directed navigation that doesn’t wait for user input) along with link decoration to join user identities between sites. Bounce tracking is a subset of navigational tracking and can include automated navigation through sites that are the same as, or different from, the source or ultimate destination of a link.
Tracking via a bounce through an aggregation domain:
- The content publisher’s page (on publisher.example) embeds a third-party script from tracker.example.
- The third-party script tries to read an already-stored identifier, for example one it has set into publisher.example's storage or one read from a third-party tracker.example iframe.
- If it can’t, it redirects the top-level page to tracker.example using window.location.
- During this load, tracker.example is the first party and can read and write its cookie jar.
- tracker.example redirects back to the original page URL, with that URL decorated with its user ID in a query parameter.
- The tracker.example user ID is now available on publisher.example and can be saved into its first-party storage so that future visits don’t need to bounce.
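The flow above can be sketched as a small simulation. This is an illustrative model only, not browser or tracker code: the in-memory storage dictionary, the /bounce path, the return and tracker_uid parameter names, and the placeholder ID value are all assumptions made for the example.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical in-memory stand-ins for each site's first-party storage.
storage: Dict[str, Dict[str, str]] = {"publisher.example": {}, "tracker.example": {}}


def publisher_script(page_url: str) -> Optional[str]:
    """Third-party script on publisher.example; returns a redirect target or None."""
    local = storage["publisher.example"]
    params = parse_qs(urlsplit(page_url).query)
    if "tracker_uid" in params:
        # Last step: save the decorated ID into first-party storage.
        local["tracker_uid"] = params["tracker_uid"][0]
    if "tracker_uid" in local:
        return None  # Identifier already known, so no bounce is needed.
    # No identifier yet: redirect the top-level page to the tracker.
    return "https://tracker.example/bounce?" + urlencode({"return": page_url})


def tracker_redirect(bounce_url: str) -> str:
    """tracker.example as first party: read or set its ID, then redirect back."""
    jar = storage["tracker.example"]
    uid = jar.setdefault("uid", "5789rhkdsaf8urfnsd")  # placeholder ID (assumption)
    return_url = parse_qs(urlsplit(bounce_url).query)["return"][0]
    parts = urlsplit(return_url)
    decoration = urlencode({"tracker_uid": uid})
    query = parts.query + "&" + decoration if parts.query else decoration
    return urlunsplit(parts._replace(query=query))


def visit(page_url: str) -> List[str]:
    """Return the sequence of top-level URLs loaded during one visit."""
    hops = [page_url]
    redirect = publisher_script(page_url)
    if redirect is not None:
        hops.append(redirect)
        decorated = tracker_redirect(redirect)
        hops.append(decorated)
        publisher_script(decorated)
    return hops
```

In this model the first call to visit("https://publisher.example/page") passes through three top-level URLs, while a second call loads only the original page because the identifier is already in publisher.example's storage.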
4. Threat model
This section will precisely define the goals and non-goals of this specification’s mitigations. It will define a few classes of actors with the ability to modify websites in particular ways. Then it will define what cross-site information each of these actors can or cannot learn.
4.1. Threat actors
TODO
5. Considered Alternatives
This section is non-normative.
So far, the alternative designs consist of mitigations that various browsers have already deployed.
5.1. Deployed Mitigations
Some browsers have deployed and announced protections against navigational tracking. This section is a work in progress that details which protections have shipped and/or are planned. It is not comprehensive.
5.1.1. Safari
Safari uses an algorithmic approach to combat navigational tracking. Safari classifies a site as having cross-site tracking capabilities if the following criteria are met within a particular client:
- The site appears as a third-party resource under enough different registrable domains.
- The site automatically redirects the user to enough other sites, immediately or after a short delay.
- The site redirects to sites that are classified as trackers, recursively.
For example, consider the case of a user clicking on a link on start.example, which redirects to second.example, which redirects to third.example, which redirects to end.example. If Safari has classified third.example as having tracking capabilities, the above behavior can result in Safari classifying second.example as having cross-site tracking capabilities.
If a user navigates or is redirected from a classified tracker with a URL that includes either query parameters or a URL fragment, the lifetime of client-side set cookies on the destination page is capped at 24 hours.
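The recursive redirect rule and the cookie lifetime cap described above can be sketched as follows. This is a simplified model and not Safari's implementation: the function names are hypothetical, the thresholds behind "enough" are not modeled, and start.example is treated as the navigation source and not as a redirector, which is an assumption about how to read the example.

```python
from datetime import timedelta
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit


def propagate_classification(chain: Iterable[str], classified: Set[str]) -> Set[str]:
    """Mark every site in a redirect chain that redirects to a classified site.

    `chain` lists the redirecting sites in order, followed by the final
    destination. Walking the chain backwards makes the rule recursive.
    """
    result = set(classified)
    hops = list(chain)
    for current, following in reversed(list(zip(hops, hops[1:]))):
        if following in result:
            result.add(current)
    return result


def cookie_lifetime_cap(
    source_site: str, destination_url: str, classified: Set[str]
) -> Optional[timedelta]:
    """Return the 24-hour cap when it applies, otherwise None."""
    parts = urlsplit(destination_url)
    if source_site in classified and (parts.query or parts.fragment):
        return timedelta(hours=24)
    return None
```

For the example chain, propagate_classification(["second.example", "third.example", "end.example"], {"third.example"}) adds second.example to the classified set and leaves end.example unclassified.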
5.1.2. Firefox
Firefox uses a list-based approach to combat navigational tracking. Sites on the Disconnect list are considered tracking sites. All storage for tracking sites is cleared after 24 hours, unless the user has interacted with the site in the first-party context in the last 45 days.
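The clearing rule in the previous paragraph can be expressed as a single predicate. This is a sketch under stated assumptions, not Firefox code: the function and parameter names are hypothetical, and measuring the 24 hours from the moment the storage was written is an assumption, since the text above does not say when the clock starts.

```python
from datetime import datetime, timedelta
from typing import Optional, Set

STORAGE_LIFETIME = timedelta(hours=24)
INTERACTION_WINDOW = timedelta(days=45)


def should_clear_storage(
    site: str,
    tracker_list: Set[str],
    stored_at: datetime,
    last_first_party_interaction: Optional[datetime],
    now: datetime,
) -> bool:
    """Decide whether a site's storage is due to be cleared."""
    if site not in tracker_list:
        return False  # Only list-classified tracking sites are affected.
    if (
        last_first_party_interaction is not None
        and now - last_first_party_interaction <= INTERACTION_WINDOW
    ):
        return False  # Recent first-party interaction exempts the site.
    return now - stored_at >= STORAGE_LIFETIME
```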
Firefox is also starting to remove query parameters known to be used for cross-site tracking ([FSN-2021-Q4]). The affected query parameters are chosen using the criteria in the Mozilla Anti Tracking Policy, which include:
- High-entropy parameters that might identify a user or encode user data, except:
  - Parameters exclusively identifying specific elements or actions on the navigating page (per-click or per-element identifiers), as long as those parameters assign a different value to each click or element they are identifying.
  - Identifiers necessary to complete a user-initiated task such as logging in or submitting a form.
- High-entropy parameters that are broadly included in nearly all outgoing navigations from a site, even if the parameters don’t uniquely identify a user.
As of May 2022, this query-parameter stripping is applied by default in the Firefox Nightly build, and is planned to be enabled in strict ETP mode and in private browsing.
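Query-parameter stripping of this kind can be sketched in a few lines. This is a minimal illustration, not Firefox's implementation: the strip_params name is hypothetical, and the strip list used in the usage note contains only the userId parameter from the terminology examples, not any browser's actual list.

```python
from typing import Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, strip_list: Set[str]) -> str:
    """Return `url` with every query parameter named in `strip_list` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in strip_list
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For instance, strip_params("https://publisher.example/page?userId=5789rhkdsaf8urfnsd&article=42", {"userId"}) yields https://publisher.example/page?article=42. Because the remaining parameters are re-encoded, their percent-encoding may be normalized; a production implementation would likely want to preserve the original bytes.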
5.1.3. Brave
Brave uses four list-based approaches to combat navigational tracking.
First, Brave strips query parameters commonly used for navigational tracking from URLs on navigation. This list is maintained by Brave.
Second, by default, when i) the user is about to visit a list-identified bounce-tracking URL, and ii) the current profile does not contain any cookies or localStorage for that site, Brave will create a new, "ephemeral", empty storage area for the site. This storage area persists as long as the user has any top-level frames open for the site. As soon as the user has no top-level frames for the labeled bounce-tracking site, the ephemeral storage area is deleted.
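The lifetime of such an ephemeral storage area can be modeled with a frame counter. The sketch below is an illustration under assumptions, not Brave's implementation: the Profile class, its method names, and the use of plain dictionaries for storage are all hypothetical.

```python
from typing import Dict, Iterable


class Profile:
    """Toy model of a browser profile with ephemeral storage for listed sites."""

    def __init__(self, bounce_list: Iterable[str]) -> None:
        self.bounce_list = set(bounce_list)
        self.persistent: Dict[str, Dict[str, str]] = {}
        self.ephemeral: Dict[str, Dict[str, str]] = {}
        self.open_frames: Dict[str, int] = {}

    def open_top_level(self, site: str) -> Dict[str, str]:
        """Open a top-level frame and return the storage area the site sees."""
        self.open_frames[site] = self.open_frames.get(site, 0) + 1
        if site in self.ephemeral:
            return self.ephemeral[site]  # Reuse the live ephemeral area.
        if site in self.bounce_list and not self.persistent.get(site):
            # Listed site with no existing storage: start an empty ephemeral area.
            self.ephemeral[site] = {}
            return self.ephemeral[site]
        return self.persistent.setdefault(site, {})

    def close_top_level(self, site: str) -> None:
        """Close one top-level frame; drop ephemeral storage when none remain."""
        remaining = max(self.open_frames.get(site, 0) - 1, 0)
        self.open_frames[site] = remaining
        if remaining == 0:
            self.ephemeral.pop(site, None)
```

Anything a listed site writes while its frames are open disappears once the last frame closes, so a later visit starts again from an empty area.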
Third, in the non-default, "aggressive blocking" configuration, Brave uses popular crowd-sourced filter lists (e.g., EasyList, EasyPrivacy, uBlock Origin) to identify URLs that are used for bounce tracking, and will preempt the navigation with an interstitial (similar to Google Safe Browsing), giving the user the option to continue the navigation or cancel it.
Fourth, Brave uses a list-based approach for identifying bounce-tracking URLs where the destination URL is present in the URL of the intermediate tracking URL. In such cases, Brave will skip the intermediate navigation and request the destination URL instead. For example, if Brave observes the user about to navigate to the URL https://tracker.example/bounce?dest=https://destination.example/, the browser might replace the navigation to tracker.example/bounce with a navigation to https://destination.example/. This list is maintained by Brave, and is drawn from a mix of crowd-sourcing and existing open-source projects.
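Skipping the intermediate navigation amounts to extracting the destination from the tracking URL. The sketch below assumes a hypothetical rule format that maps a host and path to the query parameter carrying the destination; the actual format of Brave's list is not described here, and the debounce function name is likewise an assumption.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical rule: (host, path) -> name of the parameter holding the destination.
DEBOUNCE_RULES: Dict[Tuple[str, str], str] = {("tracker.example", "/bounce"): "dest"}


def debounce(url: str, rules: Dict[Tuple[str, str], str] = DEBOUNCE_RULES) -> str:
    """Return the embedded destination for a listed bounce URL, else `url` unchanged."""
    parts = urlsplit(url)
    param = rules.get((parts.hostname or "", parts.path))
    if param is None:
        return url  # Not a listed bounce-tracking URL.
    values = parse_qs(parts.query).get(param)
    if not values:
        return url  # Listed URL, but no destination to extract.
    destination = values[0]
    if urlsplit(destination).scheme not in ("http", "https"):
        return url  # Only follow web destinations.
    return destination
```

With the rule above, debounce("https://tracker.example/bounce?dest=https://destination.example/") returns https://destination.example/, and URLs that match no rule are returned untouched.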
Acknowledgements
Many thanks to the Privacy Community Group for many good discussions about this proposal.