Sec-CH-UA randomization vs. Sec-CH-UA-Engine #52
Another disadvantage to randomness is the increased Sending Would
As you suggested on that issue, limiting the randomness to be fixed per browser version (so that the string changes with each version, but overall there are only one or two buckets per browser version) would solve or mitigate that issue.
That's indeed what I'm concerned with. Making the engine more prominent can lead to engine block/allow lists, which is arguably not better than browser block/allow lists.
That's what I envisioned.
@miketaylr @foolip, I'd love to hear your opinions on the above.
Also @slightlyoff, who I hear has opinions on things.
I should also mention that there may be a third option here (as explained in #54) that discourages allow/block lists from a more technical perspective.
The randomization solution seems very strange to me. What is Sec-CH-UA actually for? If it's for telling the website what browser is being used, then partially randomizing it defeats its purpose.
I'm not sold on these allow/block lists being a problem in the first place. Yes, it goes somewhat against the open principles of the web but I don't see why trying to enforce that openness is the responsibility of the browser platform. It seems to me that Sec-CH-UA should simply tell the site what the browser is. Whether you have:
or
Seems like a moot point to me.
That is, on the first request only the engine is sent, as this may well be all the site wants or needs to know.
Sec-CH-UA-Engine is an interesting proposal, and a benefit is that, at least initially, there would be no need for one engine to pretend to be another. However, I think the dynamics in the longer term are going to be the same as for Sec-CH-UA. If this mechanism were in place and WebKit were launched today, it would likely pretend to be both KHTML and Gecko, just as it does in the UA string. Similarly, EdgeHTML would pretend to be Chromium. Looking forward, any fork of Chromium would certainly claim to be Chromium. If a wholly new engine comes along, it would likely also have to pretend to be one of the existing engines to get off the ground. In other words, I don't see this as avoiding the need to present a set of tokens and throw in random tokens.
Safari even today has site-specific UA string quirks where we pretend to be Chrome or Firefox (because some sites have a UA string lockout or conditional feature but work fine with a fictional UA string). Under this new model, on those sites we'd probably need to claim to be Gecko or Chromium respectively, in addition to Firefox or Chrome. I think the randomized token list (as an incentive to search for inclusion of a tag) might help a little. (Note, though, that it only helps if it's required, not totally optional as currently written, as I suggested in #60.) However, sites might have a priority list of UA tokens to look for. For example, once they have decided a browser is Safari because that tag is present, they won't believe it's Firefox. In that case we'd (still) have to send completely fictional values instead of half-true values. Overall, I am not sure there's any solution that would let browsers make compatibility claims successfully while also always honestly reporting their actual brand.
@othermaciej That is interesting. I'm surprised there are sites both significant enough and lax enough to require that kind of workaround from a major browser vendor. Ideally, I would think such issues are more a problem for the site to resolve than for the browser. However, I can also see the problem from the other side: if users of your browser are saying "major site X doesn't work on this browser", then that's a problem for you as well. I see this as a separate issue from the spec of the Sec-CH-UA header, though. That header is either for telling the website what the browser is or it's not. If it's not (or it's randomized to the point of being useless for that), then what is its purpose?
@Steve51D there have been times when even some Google web properties required such a workaround (because there's a site or feature lockout, but the site actually works fine with a different UA string). It's even worse for WebKit-based browsers on other platforms, for example Epiphany. I think if
@othermaciej Are these types of issues mostly caused by the fact that Epiphany has its own unique UA token? If so, this is what I was thinking. Of course, neither solution is perfect. As you say, lying would likely exist in both cases (unless we pursued a way to technically discourage the use of UA tokens in allow/deny lists, such as #54), but it seems that exposing equivalence-class targeting by default could help smaller browsers based on larger engines avoid compatibility pitfalls caused by exposing a per-browser brand identifier by default.
In the existing UA string, there are both Safari tokens and WebKit tokens. Many sites seem to check for the Safari token, not the WebKit one. I don't think this would change if the same info were refactored into two separate header fields.
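To make the point about which token sites key on concrete, here is a hypothetical sketch of two server-side allow-list checks over a simplified, space-separated token list. The token sets, the `parse_tokens` helper, and both check functions are illustrative assumptions, not how any real site or spec parses UA data.

```python
# Hypothetical allow-list checks over a simplified UA token list.
# Real UA strings look nothing like this; tokens here are placeholders.

def parse_tokens(ua_tokens: str) -> set[str]:
    """Split a simplified, space-separated token list into a set."""
    return set(ua_tokens.split())


def brand_allow_list(tokens: set[str]) -> bool:
    # Keys on the per-browser brand: only browsers named here get the feature.
    return bool(tokens & {"Safari", "Chrome", "Firefox"})


def engine_allow_list(tokens: set[str]) -> bool:
    # Keys on the engine: any browser built on a listed engine gets the feature.
    return bool(tokens & {"WebKit", "Gecko", "Chromium"})


safari = parse_tokens("WebKit Safari")
epiphany = parse_tokens("WebKit Epiphany")

print(brand_allow_list(safari), brand_allow_list(epiphany))    # True False
print(engine_allow_list(safari), engine_allow_list(epiphany))  # True True
```

As long as sites write the first kind of check, a WebKit-based browser with its own brand (such as Epiphany) is locked out whether the engine travels in the same header or a separate one.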
@othermaciej that makes sense, since most browsers expose WebKit in their UA today. If the default value exposed through UA client hints were an equivalence class instead of a per-browser identifier, however, I do wonder if we'd see more developers use the default (i.e. target all WebKit-like browsers) instead of compiling these equivalence classes themselves by using
I think there's some comprehensibility benefit here even if, for example, WebKit has to send Specifically,
Exposing the engine by default and the exact browser on opt-in would probably be an improvement. Would this be sufficient for the stats-gathering purpose of browser knowledge? (I think WebKit would continue to say
Oh, here's a complication. For bug workarounds, sites often need a version, so the Engine field might need versioning. But WebKit has only a frozen version in the current UA string. I guess we could duplicate the Safari version as the WebKit version, but this would be weird for WebKit clients that are not Safari, particularly on non-Apple ports.
I'm not an expert on what's needed for stats gathering, but my guess is that my suggestion is not sufficient for statistics. I liked the suggestion in w3ctag/design-reviews#467 (comment) to send detailed information on X% of requests. I'm not sure exactly what constraints we'd want on the choice of requests to minimize fingerprinting information: maybe roll the die each time a top-level page load starts with no storage? Sites like gs.statcounter.com could also (ask their embedder to let them) just send
I believe #53 is tracking the splitting of version into its own CH. Would that opt-in work for sites that require versioning?
That was my thinking as well.
It probably depends who you ask.
Overall, it seems like a good compromise between the competing interests.
I talked to @torgo yesterday, and he raised some good points that seem relevant to this thread. Beyond that, I'm concerned that over-indexing on "engine" would limit the future forkability of rendering engines and enforce undesired conformity between different browsers that all use the same engine. As it stands, it's possible and likely for such browsers to differ in the features they enable or disable, and they are also free to apply their own patches to the engine in the versions they ship. All of that would be harder if server-side differential serving assumed that different browsers with the same engine are all identical. An approach where we have
I completely agree, which is why I have some reservations about browsers pretending to be other browsers some fixed percentage of the time. It seems like this could cause share measurement to become difficult, as it would be less clear how much share came from a particular browser itself versus another browser with more share pretending to be that browser. If we were to pursue the
I'm not sure I follow the concern here. Given that there's no technical limitation preventing developers from creating allow/block lists, I expect that we will eventually arrive at a future where any new or forked engine will have to include the name of a more popular engine in its

The code that sites would need to write to detect browsers based on such engines wouldn't change; they'd still need to parse the unique brand token in either case. The only difference is that in the
Is this assuming that a second round trip will be required on first navigate before sites will receive the client hints they've opted into? If so, then yes, I agree that the
TL;DR, my main concern with exposing both brand and engine in a single hint is that it does not move the needle very far from where we are today. While UA client hints in general will transform the UA string from a passive fingerprinting surface to an active one (something I am super supportive of), exposing both fields in a single hint by default doesn't seem like it will inspire developer change. We could certainly provide guidance encouraging developers to detect the engine field in
Thanks all for the ongoing discussion. After talking to folks and thinking about this some more, I think the best approach would be something along these lines:
Closing as I didn't hear any objections to my conclusions. Please let me know if there's something more to discuss here and I'll reopen.
@yoavweiss and I had an offline discussion today where we agreed that we needed additional community feedback on how browser equivalence classes are defined. During this conversation, we identified two potential paths forward, each with their own trade-offs:
Sec-CH-UA randomization
This is how the spec is currently written. It involves GREASE-ing the Sec-CH-UA set to ensure that sites cannot create block lists for unknown tokens in the set. By itself, however, it does not address the commonly seen case where sites create allow lists of known per-browser tokens to enable certain features. To combat this, the current proposal is for browsers to pretend to be other browsers in their equivalence set by sending other browsers' Sec-CH-UA sets in place of their own for a small number of navigations.
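The two mechanisms described above can be sketched as follows, assuming a simplified model in which a Sec-CH-UA value is a comma-separated list of `"brand";v="version"` entries. The browser names, the GREASE token format, the equivalence-set table, and the substitution rate are all hypothetical placeholders, not values from the spec.

```python
import random

# Hypothetical brand lists for two browsers in one equivalence set
# (same engine, different brands). Names and versions are placeholders.
EQUIVALENCE_SET = {
    "BrowserA": [("BrowserA", "80"), ("EngineX", "80")],
    "BrowserB": [("BrowserB", "12"), ("EngineX", "80")],
}

# Hypothetical stand-in for "a small number of navigations".
SUBSTITUTION_RATE = 0.05


def grease_brand(rng: random.Random) -> tuple[str, str]:
    """Return a deliberately nonsensical brand entry (hypothetical format)."""
    name = "Not" + rng.choice(" ;_") + "A" + rng.choice(" (.") + "Brand"
    return (name, "99")


def sec_ch_ua(browser: str, rng: random.Random) -> str:
    """Build a simplified Sec-CH-UA value for a single navigation."""
    brands = EQUIVALENCE_SET[browser]
    # Occasionally send another equivalence-set member's brand list instead,
    # so a per-brand allow list cannot reliably separate members of the set.
    if rng.random() < SUBSTITUTION_RATE:
        others = [name for name in EQUIVALENCE_SET if name != browser]
        brands = EQUIVALENCE_SET[rng.choice(others)]
    # Always add an unknown token, so sites must tolerate tokens they
    # don't recognize instead of blocking on them.
    entries = list(brands) + [grease_brand(rng)]
    rng.shuffle(entries)  # order is randomized so it carries no meaning
    return ", ".join(f'"{name}";v="{version}"' for name, version in entries)


rng = random.Random(0)
for _ in range(3):
    print(sec_ch_ua("BrowserA", rng))
```

A seeded `random.Random` keeps the sketch reproducible; a browser would presumably use its own randomness source and could also fix the choice per version, as suggested earlier in the thread.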
Advantages
Disadvantages
Unknowns
Sec-CH-UA-Engine
This is a proposal that has been mentioned in various issues (#4, #7, #21, #29) that involves creating a new Sec-CH-UA-Engine hint that would describe a browser's underlying engine and would be sent in place of the Sec-CH-UA hint by default. The idea is that this would allow developers to target browser equivalence classes by default, while still allowing them to target individual browsers (perhaps with some penalization due to Privacy Budget) by using the Accept-CH header to request a per-browser token using the Sec-CH-UA hint.
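To illustrate the proposed default/opt-in split, here is a toy simulation of the header exchange. It assumes the browser sends only Sec-CH-UA-Engine until an origin lists Sec-CH-UA in an Accept-CH response header, after which Sec-CH-UA is added to later requests to that origin. The `Browser` class, its method names, and the header values are hypothetical; the sketch ignores Privacy Budget accounting and treats header names as case-sensitive for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Browser:
    engine: str
    brand: str
    # Per-origin set of hints the origin has requested via Accept-CH.
    opted_in: dict[str, set[str]] = field(default_factory=dict)

    def request_headers(self, origin: str) -> dict[str, str]:
        """Return the UA hint headers sent on a request to `origin`."""
        headers = {"Sec-CH-UA-Engine": f'"{self.engine}"'}  # always sent
        if "Sec-CH-UA" in self.opted_in.get(origin, set()):
            headers["Sec-CH-UA"] = f'"{self.brand}"'  # only after opt-in
        return headers

    def handle_response(self, origin: str, response_headers: dict[str, str]) -> None:
        """Record which hints the origin asked for in its Accept-CH header."""
        accept_ch = response_headers.get("Accept-CH", "")
        hints = {h.strip() for h in accept_ch.split(",") if h.strip()}
        self.opted_in.setdefault(origin, set()).update(hints)


browser = Browser(engine="WebKit", brand="Epiphany")
origin = "https://example.com"

print(browser.request_headers(origin))  # engine only
browser.handle_response(origin, {"Accept-CH": "Sec-CH-UA"})
print(browser.request_headers(origin))  # engine plus brand
```

Note that the brand only reaches the origin on requests after the opt-in, which is the extra round trip the thread debates for first navigations.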
Advantages
Disadvantages
Unknowns
In short, we'd appreciate community feedback on this issue to help drive the best outcome for the web. If you have feedback, data, or other suggestions that could help shape the future of this feature, please feel free to join the discussion!