Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proper canonicalization of domain names #255

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

rubycon
Copy link
Contributor

@rubycon rubycon commented Jan 30, 2025

The current specs only handle host canonicalization for IP addresses and opaque hostnames but if an URL has a special scheme its host should be normalized as a domain name (IDNA processing).

Cf. #220

  • Tests are written and can be reviewed and commented upon at:
    • At least 2 tests already exist in WPT: Pattern: [{ "hostname": "xn--caf-dma.com" }], inputs= [{ "hostname": "café.com" }]; Pattern: ["http://🚲.com/"], Expected: { "protocol": "http", "hostname": "xn--h78h.com", "pathname": "/" }.
  • The top of this comment includes a clear commit message to use.

(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

Yves-Marie and others added 22 commits March 15, 2024 14:39
…#233)

Per https://whatwg.org/working-mode#anchors this is being done to ensure
anchors to these concepts continue to work into the future.

Resolves whatwg#231.
These examples form the only quick reference material in the spec. While MDN goes into more depth, brief explanations of what is happening in this syntax don't take up much room and may make it easier to tell what's going on, especially with regexp parts.

The example explaining pattern strings is also expanded to be explicit that regular expressions are enclosed in parentheses.

Resolves whatwg#229.
Per discussion on whatwg#182 some text explaining this would be useful.

This mostly consists of advice since the useful algorithms are already exposed.
This PR fixes pathname processing for inputs that have opaque pathnames.
This PR fixes pathname processing for inputs that have opaque pathnames.

(Amended by editor to apply to the correct draft.)
Reverting the review draft change caused a commit that the deploy script
doesn't like. This one doesn't change that file so should succeed.
Use the basic URL parser when parsing URLs

Blob handling is not required, since the resulting URL is not stored in
any way that would make the blob handling visible. This is consistent
with what the URL constructor and similar uses do.

This change should not be observable.

Partially addresses whatwg#242.
Editorial: Correct "set" to "let"

These uses introduce the variable, so must be "set".
If the URL's host is an IP address, the string representation is
required.

This corrects/clarifies existing behavior.
The current specs only handle host canonicalization for IP addresses and opaque hostnames but if an URL has a special scheme its host should be normalized as a domain name (IDNA processing).

Cf. whatwg#220
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants