<regex>: Some escape sequences are mishandled #5244

Open
muellerj2 opened this issue Jan 17, 2025 · 0 comments

@muellerj2
Contributor

There are a number of escape sequences that the parser mistakenly accepts or miscompiles.

ECMAScript

  • Backreferences with leading zero digits (e.g., \01 for capture group 1) should be rejected. [ECMA-262 3rd ed., Section 15.10.2.11 "DecimalEscape"]
  • \00 and longer runs of zero digits should be rejected rather than interpreted as an escape for NUL. Only \0 is a valid escape sequence for NUL. [ECMA-262 3rd ed., Section 15.10.2.11 "DecimalEscape"]
  • When a custom traits implementation defines a new character class "z", [\z] incorrectly matches the characters in this class instead of the character z. (Meanwhile, \z without brackets matches the character z and not the characters in the class "z".) [ECMA-262 3rd ed., Sections 15.10.1 "Patterns" and 15.10.2.12 "CharacterClassEscape"]
  • [\b] should match U+0008 BACKSPACE, not b. [ECMA-262 3rd ed., Section 15.10.2.19 "ClassEscape"]
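
The ECMAScript cases above (except the custom-traits one, which needs a user-defined traits class) can be probed with a small program. This is only a sketch: the helper names compiles, full_match, describe and report_ecmascript_escapes are made up for illustration, and the "issue expects" notes restate the bullets above rather than what any particular standard library actually prints.

```cpp
#include <iostream>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: true if std::regex accepts the pattern,
// false if construction throws std::regex_error.
bool compiles(const std::string& pattern,
              std::regex::flag_type grammar = std::regex::ECMAScript) {
    try {
        const std::regex re(pattern, grammar);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: whether the whole subject matches the pattern,
// or std::nullopt if the pattern is rejected.
std::optional<bool> full_match(const std::string& pattern,
                               const std::string& subject,
                               std::regex::flag_type grammar = std::regex::ECMAScript) {
    try {
        const std::regex re(pattern, grammar);
        return std::regex_match(subject, re);
    } catch (const std::regex_error&) {
        return std::nullopt;
    }
}

// Turns a full_match result into a short label for the report.
std::string describe(const std::optional<bool>& result) {
    if (!result) return "rejected";
    return *result ? "match" : "no match";
}

// Prints the observed behavior next to what the issue expects.
void report_ecmascript_escapes(std::ostream& out) {
    const std::string nul(1, '\0');  // subject consisting of a single NUL
    out << "(a)\\01 : " << (compiles(R"((a)\01)") ? "accepted" : "rejected")
        << " (issue expects rejected)\n";
    out << "\\00 : " << (compiles(R"(\00)") ? "accepted" : "rejected")
        << " (issue expects rejected)\n";
    out << "\\0 vs NUL : " << describe(full_match(R"(\0)", nul))
        << " (issue expects match)\n";
    out << "[\\b] vs U+0008 : " << describe(full_match(R"([\b])", "\b"))
        << " (issue expects match)\n";
    out << "[\\b] vs b : " << describe(full_match(R"([\b])", "b"))
        << " (issue expects no match)\n";
}
```

Calling report_ecmascript_escapes(std::cout) from main prints one line per case; the observed column depends on the standard library in use.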

awk

See Section "Regular expressions" in the awk specification.

  • Octal escape sequences are not parsed correctly in square-bracket character class definitions. (E.g., [\040] should match U+0020 SPACE.)
  • Similarly, [\"] and [\/] also match a backslash, which they should not.
  • While the awk specification says that using unspecified escape sequences results in undefined behavior, I think we should reject them. (I believe we should handle this differently from ECMAScript mode, where unrecognized escape sequences just yield the escaped character.)
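
The awk cases above can be written down as a small table of expectations and checked against std::regex::awk. Again, this is a sketch under assumptions: Outcome, probe, AwkCase, awk_cases and count_deviations are hypothetical names, \q is just one arbitrarily chosen unspecified escape, and its "Rejected" expectation reflects the proposal in the last bullet, not a requirement of the awk specification.

```cpp
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

enum class Outcome { Rejected, Matched, NotMatched };

// Hypothetical helper: compiles the pattern in the given grammar (awk by
// default) and reports whether the whole subject matches.
Outcome probe(const std::string& pattern, const std::string& subject,
              std::regex::flag_type grammar = std::regex::awk) {
    try {
        const std::regex re(pattern, grammar);
        return std::regex_match(subject, re) ? Outcome::Matched
                                             : Outcome::NotMatched;
    } catch (const std::regex_error&) {
        return Outcome::Rejected;
    }
}

struct AwkCase {
    std::string pattern;
    std::string subject;
    Outcome expected;  // behavior the issue asks for
};

// Expectations taken from the bullets above.
std::vector<AwkCase> awk_cases() {
    return {
        {R"([\040])", " ", Outcome::Matched},    // octal escape for U+0020
        {R"([\"])", "\"", Outcome::Matched},     // escaped double quote
        {R"([\"])", "\\", Outcome::NotMatched},  // must not match a backslash
        {R"([\/])", "/", Outcome::Matched},      // escaped slash
        {R"([\/])", "\\", Outcome::NotMatched},  // must not match a backslash
        {R"(\q)", "q", Outcome::Rejected},       // unspecified escape (proposal)
    };
}

// Prints every case whose observed outcome differs from the expectation
// and returns how many such cases there were.
std::size_t count_deviations(std::ostream& out) {
    std::size_t deviations = 0;
    for (const AwkCase& c : awk_cases()) {
        if (probe(c.pattern, c.subject) != c.expected) {
            out << "deviation: pattern " << c.pattern << '\n';
            ++deviations;
        }
    }
    return deviations;
}
```

Calling count_deviations(std::cout) from main lists the patterns on which the implementation under test differs from the behavior requested here.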