Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor PyUnicodeError internal C helpers #127787

Closed
picnixz opened this issue Dec 10, 2024 · 3 comments
Closed

Refactor PyUnicodeError internal C helpers #127787

picnixz opened this issue Dec 10, 2024 · 3 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@picnixz
Copy link
Member

picnixz commented Dec 10, 2024

Feature or enhancement

Proposal:

Some helpers make the realization of #126004 a bit harder as we end up with many duplicated code. In order to make the related PRs and #127694 smoother, I suggest refactoring the following helpers:

  • Unify get_unicode and get_string as a single as_unicode_error_attribute.
  • Allow to retrieve the underlying object attribute and its size in one function, say get_unicode_error_object_and_size. This is typically useful for unifying
  • The PyUnicode{Encode,Decode,Translate}Error_GetObject public functions can use a common implementation unicode_error_get_object_impl. Since this depends on the underlying object type, the implementation function would take a flag as a parameter to indicate whether it's a bytes or a string.
  • All PyUnicode{Encode,Decode}Error_GetEncoding public functions can use a common implementation unicode_error_get_encoding_impl. The encoding is always a string.
  • All PyUnicode{Encode,Decode,Translate}Error_{Get,Set}Reason public functions can use a common implementation unicode_error_{get,set}_reason_impl. The reason is always a string.
  • All PyUnicode{Encode,Decode,Translate}_{Get,Set}{Start,End} public functions can use a common implementation unicode_error_{get,set}_{start,end}_impl. Since this depends on the underlying object type, the implementation function would take a flag as a parameter to indicate whether it's a bytes or a string.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

#127674 (comment)

Linked PRs

@picnixz picnixz added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 10, 2024
@picnixz picnixz self-assigned this Dec 10, 2024
@encukou encukou changed the title Refactor PyUnicodeError internal C helpers gh-126004: Refactor PyUnicodeError internal C helpers Dec 10, 2024
@encukou encukou changed the title gh-126004: Refactor PyUnicodeError internal C helpers Refactor PyUnicodeError internal C helpers Dec 10, 2024
@encukou
Copy link
Member

encukou commented Dec 10, 2024

I think the gh-126004 issue can be used for the related refactoring. But I guess this one works as well.

IMO, there can be a _PyUnicodeError_GetParams to get all of object, size, start and end at the same time, and base the public functions on that.
In unicodeobject the compiler could hopefully inline the function and elide the extra work, and codecs usually needs all the values.

@picnixz
Copy link
Member Author

picnixz commented Dec 10, 2024

_PyUnicodeError_GetParams

Ah I missed this one. I can add it now (the interface is nice enough). You can comment on the PR which functions you'd like to unify even more.

EDIT: I've added this one and removed un-necessary functions. Now it looks much nicer!

encukou pushed a commit that referenced this issue Jan 3, 2025
…face (GH-127789)

- Unify `get_unicode` and `get_string` in a single function.

- Allow to retrieve the underlying `object` attribute, its
  size, and the adjusted 'start' and 'end', all at once.
  Add a new `_PyUnicodeError_GetParams` internal function for this.
  (In `exceptions.c`, it's somewhat common to not need all the attributes,
  but the compiler has opportunity to inline the function and optimize
  unneeded work away. Outside that file, we'll usually need all or
  most of them at once.)

- Use a common implementation for the following functions:

  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
@picnixz
Copy link
Member Author

picnixz commented Jan 3, 2025

I can now close this one and move back to fixing the codecs.

@picnixz picnixz closed this as completed Jan 3, 2025
WolframAlph pushed a commit to WolframAlph/cpython that referenced this issue Jan 4, 2025
… interface (pythonGH-127789)

- Unify `get_unicode` and `get_string` in a single function.

- Allow to retrieve the underlying `object` attribute, its
  size, and the adjusted 'start' and 'end', all at once.
  Add a new `_PyUnicodeError_GetParams` internal function for this.
  (In `exceptions.c`, it's somewhat common to not need all the attributes,
  but the compiler has opportunity to inline the function and optimize
  unneeded work away. Outside that file, we'll usually need all or
  most of them at once.)

- Use a common implementation for the following functions:

  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
… interface (pythonGH-127789)

- Unify `get_unicode` and `get_string` in a single function.

- Allow to retrieve the underlying `object` attribute, its
  size, and the adjusted 'start' and 'end', all at once.
  Add a new `_PyUnicodeError_GetParams` internal function for this.
  (In `exceptions.c`, it's somewhat common to not need all the attributes,
  but the compiler has opportunity to inline the function and optimize
  unneeded work away. Outside that file, we'll usually need all or
  most of them at once.)

- Use a common implementation for the following functions:

  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants