Hidden text passed to Readability #930
Comments
Can you elaborate on what you mean by "obvious"? On what basis/heuristic do you think readability should discard text like this? (Reuters' use of visually-hidden text that is semantically non-hidden is heavily frowned upon from an accessibility perspective... but I suppose that's not stopping them. 😞)
@gijsk You have just identified heuristics. Can you say why you wouldn't use them, aside from "purity"? If browser engine developers were not tolerant of issues like this, we wouldn't have the web in today's form at all. To deliver even remotely acceptable results, any HTML parsing package must work under the assumption that anything can happen; that has been the accepted standard behaviour since the early 90s. Also, I do agree with your point, but it has nothing to do with my issue. If you have to pick between supporting 99% of users or 1%, it is normal to support the 99%. But I do agree with your idea that the 1% should be supported too; the Readability API should just have something like a VoiceReaderAccessibility boolean parameter to switch its behaviour to properly support that 1% as well. I would be very happy to see it and would use it on day 1 once it is implemented.
@ivanlabsii I didn't say that I wouldn't use them. I'm trying to ask... based on what heuristic/algorithm/logic do you think readability should discard this text? Which of those CSS properties do you think should "count" towards saying "this text should not be included in the output"? I don't think the answer is "obvious"... my money would probably be on the clip rect, though it would (a) be a pain to parse it correctly irrespective of the syntax used and (b) ignore the cases without the clip (e.g. off-screen positioned items that are 1px high/wide).
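To make the trade-offs in that question concrete, here is a minimal sketch of what such a heuristic might look like for inline styles only. It is written in Python for illustration and is not Readability's code; every function name is hypothetical, and the thresholds (an empty clip rect, `clip-path: inset(50%)`, or an absolutely positioned box of at most 1px by 1px with `overflow: hidden`) are assumptions drawn from the span quoted in this issue, not an agreed rule. It does not look at stylesheet rules or classes, and the naive split on `;` would mishandle values such as `url(data:...)`.

```python
import re


def parse_inline_style(style: str) -> dict[str, str]:
    """Split an inline style attribute into a lowercase property -> value map."""
    props = {}
    for decl in style.split(";"):
        name, sep, value = decl.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip().lower()
    return props


def _px(value):
    """Return a plain px or unitless length as a float, else None."""
    m = re.fullmatch(r"(-?\d*\.?\d+)(px)?", value or "")
    return float(m.group(1)) if m else None


def is_clipped_to_nothing(props: dict[str, str]) -> bool:
    """True if `clip` is a rect() whose visible region is empty.

    Accepts both the comma-separated and the legacy space-separated syntax.
    """
    m = re.fullmatch(r"rect\((.*)\)", props.get("clip", ""))
    if not m:
        return False
    vals = [_px(p) for p in re.split(r"[\s,]+", m.group(1).strip()) if p]
    if len(vals) != 4 or None in vals:
        return False  # e.g. rect(auto, ...) is not handled in this sketch
    top, right, bottom, left = vals
    return bottom <= top or right <= left


def is_tiny_clipped_box(props: dict[str, str]) -> bool:
    """True for an out-of-flow box of at most 1px x 1px with overflow hidden."""
    w, h = _px(props.get("width")), _px(props.get("height"))
    return (
        props.get("position") in ("absolute", "fixed")
        and props.get("overflow") == "hidden"
        and w is not None and h is not None
        and w <= 1 and h <= 1
    )


def looks_visually_hidden(style: str) -> bool:
    """Hypothetical heuristic: does this inline style hide the element visually?"""
    props = parse_inline_style(style)
    # `clip` only has an effect on absolutely or fixed positioned elements.
    positioned = props.get("position") in ("absolute", "fixed")
    return (
        (positioned and is_clipped_to_nothing(props))
        or props.get("clip-path") == "inset(50%)"  # only this exact spelling
        or is_tiny_clipped_box(props)
    )


REUTERS_STYLE = (
    "border: 0px; clip: rect(0px, 0px, 0px, 0px); clip-path: inset(50%); "
    "height: 1px; margin: -1px; overflow: hidden; padding: 0px; "
    "position: absolute; width: 1px; white-space: nowrap;"
)
print(looks_visually_hidden(REUTERS_STYLE))
```

Even this small sketch shows both concerns raised above: the clip parser needs separate handling for each syntax variant, and the no-clip case needs its own rule, which risks false positives on legitimately small elements.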
This is the sample page:
https://www.reuters.com/legal/qualcomm-saw-nuvia-buy-chance-save-14-billion-year-arm-fees-ceo-tells-jury-2024-12-18/
It contains the hidden text in this form:
<span style="border: 0px; clip: rect(0px, 0px, 0px, 0px); clip-path: inset(50%); height: 1px; margin: -1px; overflow: hidden; padding: 0px; position: absolute; width: 1px; white-space: nowrap;">, opens new tab</span>
This is passed as a cleaned span to Readability, though it should be obvious that it is hidden:
<span>, opens new tab</span>