Should platform brand hint be a semicolon-separated list? #91

Closed
mcatanzaro opened this issue Mar 18, 2020 · 8 comments

Comments

@mcatanzaro

We'd want platform brand to default to, e.g. "Linux", but Ubuntu and Fedora will both insist on using their own branding. In a traditional user agent string that looks like "Mozilla/5.0 (X11; Fedora; Linux x86_64)". Disaster would result without X11 (which is a lie, because we stopped using X11 years ago, but that value has ossified) or without Linux, and just adding Fedora breaks some Google websites, so we have a quirk to not send it to those.

Anyway, point is that platform brand might really be a list, but it won't be parsed as a list unless the spec says to do so. E.g. we could make it semicolon-separated and GREASEd with gibberish values (but not fake values; e.g. "asdf; Windows" but never "Linux; macOS; Windows", following your proposal in #52 (comment)) to ensure servers parse it as a list. What I don't want to wind up with is a situation where "Ubuntu" or "Ubuntu Linux" or "Ubuntu; Linux" ossifies and we have to start faking Ubuntu in the platform brand.
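As a rough illustration of the list idea above, here is a minimal sketch of how a server might treat the value as a semicolon-separated list and skip entries it does not recognize. The function names, the `KNOWN_PLATFORMS` set, and the unquoted wire format are all assumptions made for this example; none of them come from the spec.

```python
from typing import List

# Hypothetical set of platform names this server cares about (not from the spec).
KNOWN_PLATFORMS = {"Linux", "Windows", "macOS"}


def parse_platform_brands(value: str) -> List[str]:
    """Split a semicolon-separated value into trimmed, non-empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def recognized_platforms(value: str) -> List[str]:
    """Keep only entries the server knows; GREASE gibberish is simply ignored."""
    return [brand for brand in parse_platform_brands(value) if brand in KNOWN_PLATFORMS]


def naive_is_linux(value: str) -> bool:
    """The kind of exact-match check that would ossify a single string."""
    return value == "Linux"


print(recognized_platforms("asdf; Windows"))
print(recognized_platforms("Fedora; Linux"))
print(naive_is_linux("Fedora; Linux"))
```

A server written like `recognized_platforms` keeps working when a distribution adds its own brand next to "Linux", while the exact-match check in `naive_is_linux` is the pattern that GREASE values are meant to discourage.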

The current spec says: "User agents SHOULD keep these strings short and to the point, but servers MUST accept arbitrary values for each, as they are all values constructed at the user agent's whim." Which is not compatible with list processing.

Downside of this proposal: adding a list structure to just one of the hints badly breaks the parallelism of the spec. It's elegant that currently the hints are all unstructured. Having just one of the hints be a list would be kinda weird. Another downside: currently only the UA list needs to be GREASEd (if I read the proposal correctly). Having exactly one of the hints require GREASing would be annoying. So I'm not sure we should necessarily change anything... it might be OK to send just "Ubuntu" or "Fedora" by default, and then maintain quirks for websites that expect it to be "Linux" or "Ubuntu". Quirks will always be required no matter how we construct the spec.

CC @othermaciej

@yoavweiss
Collaborator

Thanks for the thoughtfully-formed question! :)

I'd be inclined to say that a list is indeed needed if we were in a situation where the platform and its version expose the same amount of entropy and should always be exposed together.

In that world, I'd say that platform can be a list of NavigatorUABrandVersion.

But, we are not there. Platform version exposes different entropy characteristics, and it's likely that developers would want to look into one but not the other (e.g. look into platform for styling decisions, but without looking at a version number).

As such, I'm reluctant to (re-)entangle those 2 values together into a single hint. I'm also not a fan of conditionally filling the version number in that single hint, as it adds complexity for both users and implementers.

@amtunlimited
Contributor

As an added note, adding more brand than "linux" shoots the entropy for that client hint up a handful of bits and adds a bunch of extra buckets with (relatively) low sizes, making that hint more of an issue for some people and not others. It could complicate things like trying to calculate how much entropy each header is adding to the budget.
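To make the entropy point concrete, the sketch below compares a coarse bucketing with a finer one that splits "Linux" by distribution. The population shares are invented purely for illustration and are not measurements of any real user base; the function names are also hypothetical.

```python
import math


def entropy_bits(shares):
    """Shannon entropy, in bits, of a distribution given as bucket -> weight."""
    total = sum(shares.values())
    return -sum((n / total) * math.log2(n / total) for n in shares.values() if n > 0)


def surprisal_bits(shares, bucket):
    """Bits revealed about one user who falls into the given bucket."""
    total = sum(shares.values())
    return -math.log2(shares[bucket] / total)


# Made-up shares (percent of users), for illustration only.
coarse = {"Windows": 70, "macOS": 20, "Linux": 10}
fine = {"Windows": 70, "macOS": 20, "Ubuntu": 5, "Other Linux": 3, "Fedora": 2}

print(f"average, coarse buckets: {entropy_bits(coarse):.2f} bits")
print(f"average, fine buckets:   {entropy_bits(fine):.2f} bits")
print(f"a 'Linux' user reveals:  {surprisal_bits(coarse, 'Linux'):.2f} bits")
print(f"a 'Fedora' user reveals: {surprisal_bits(fine, 'Fedora'):.2f} bits")
```

Under these assumed shares, the average entropy rises only modestly, but users in the small buckets reveal noticeably more than they did as plain "Linux" users, while Windows and macOS users are unaffected. That asymmetry is what makes a single per-header entropy figure harder to state.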

@mcatanzaro
Author

So Yoav, I'm not sure I entirely understand your recommendation.

Platform version is something we'll just leave blank. It would be meaningless if the platform was Linux (what would we expose? kernel version?) and too much entropy if the platform is the operating system (exposing "Fedora 32", "Fedora 31", etc. is just too much).

What we want to do is make sure we stay compatible with any websites that are looking for "Linux" but also not ever expose "Linux" alone without "Fedora" or "Ubuntu" due to OS branding requirements.

> As an added note, adding more brand than "linux" shoots the entropy for that client hint up a handful of bits and adds a bunch of extra buckets with (relatively) low sizes, making that hint more of an issue for some people and not others. It could complicate things like trying to calculate how much entropy each header is adding to the budget.

It is a requirement of both Ubuntu and Fedora, though, so it's going to happen. Small distributions probably don't want to do this, but bigger distributions have branding requirements. We are not going to send "Linux" without also sending "Fedora," and Ubuntu is not going to want to do this either.

@yoavweiss
Collaborator

yoavweiss commented Apr 27, 2020

Apologies for the delayed reply...

> What we want to do is make sure we stay compatible with any websites that are looking for "Linux" but also not ever expose "Linux" alone without "Fedora" or "Ubuntu" due to OS branding requirements.
>
> As an added note, adding more brand than "linux" shoots the entropy for that client hint up a handful of bits and adds a bunch of extra buckets with (relatively) low sizes, making that hint more of an issue for some people and not others. It could complicate things like trying to calculate how much entropy each header is adding to the budget.
>
> It is a requirement of both Ubuntu and Fedora, though, so it's going to happen. Small distributions probably don't want to do this, but bigger distributions have branding requirements. We are not going to send "Linux" without also sending "Fedora," and Ubuntu is not going to want to do this either.

I understand, and that seems like something that's already exposed in the User-Agent string. Let me think about this a bit.

@yoavweiss
Collaborator

I was informed that the addition of Linux distro to the current User-Agent string is something that's done by extensions, rather than part of the browser's code (and something browser privacy folks are not excited about, due to the added entropy). As such, I don't think there's necessarily a need to officially make the platform a list.

Linux distros could switch from adding that entropy passively to the User-Agent header, to adding it only to requests which contain Sec-CH-UA-Platform. That can enable their analytics use-case with lower fingerprinting risk.
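A minimal sketch of that switch, from the browser's side, might look like the following. It assumes the server's opt-in has already been reduced to a set of requested hint names; the function name, the generic User-Agent string, and the "Fedora; Linux" value format are hypothetical choices for this example, not something the spec defines.

```python
# Generic User-Agent with no distro token (illustrative value).
GENERIC_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request_headers(requested_hints, branded_platform="Fedora; Linux"):
    """Build request headers for a hypothetical distro-branded browser build.

    requested_hints: set of hint names the server opted into (assumed to have
    been parsed from the server's opt-in elsewhere).
    branded_platform: hypothetical distro-branded value; its format is an
    assumption, since the spec does not treat the platform as a list.
    """
    headers = {"User-Agent": GENERIC_USER_AGENT}
    if "Sec-CH-UA-Platform" in requested_hints:
        # Only requests that carry the hint expose the distro brand.
        headers["Sec-CH-UA-Platform"] = f'"{branded_platform}"'
    return headers


print(build_request_headers(set()))
print(build_request_headers({"Sec-CH-UA-Platform"}))
```

With no opt-in, every request looks like generic Linux; the distro brand appears only on requests that already carry the platform hint, which is where the fingerprinting reduction would come from.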

@mcatanzaro
Author

> I was informed that the addition of Linux distro to the current User-Agent string is something that's done by extensions, rather than part of the browser's code

Hm, on Fedora, that's only correct for Chromium, and only because we want it to work in Google Chrome also and can't build that ourselves. In Firefox and WebKit, the user agent is configured at build time.

@mcatanzaro
Author

> Linux distros could switch from adding that entropy passively to the User-Agent header, to adding it only to requests which contain Sec-CH-UA-Platform. That can enable their analytics use-case with lower fingerprinting risk.

But that sounds good to me. ;)

@yoavweiss
Collaborator

Great! Closing for now, but let me know if you feel more discussion is warranted.


3 participants