ERR: consistent error messages for unsupported setitem values #60218
Comments
Very nice! I believe in the ValueError cases pandas is trying to convert in setitem.
data = Index(np.arange(5), dtype="int64")
data.array[0] = "1"
print(data)
# Index([1, 1, 2, 3, 4], dtype='int64')
Some of the TypeError cases are also a result of pandas attempting to convert. Whether we try to convert or raise seems to me to be the most important question from an API consistency perspective. That then determines whether we should be raising a ValueError (conversion failed) or a TypeError (RHS has the wrong type).
I think it is actually coming from numpy, but indeed from trying to convert the input (numpy allows you to set strings, as long as they can be converted to an int):
>>> arr = np.array([1, 2, 3])
>>> arr[0] = "10"
>>> arr
array([10, 2, 3])
>>> arr[0] = "not an integer"
...
ValueError: invalid literal for int() with base 10: 'not an integer'
>>> int("not an integer")
...
ValueError: invalid literal for int() with base 10: 'not an integer'
For our nullable dtypes, we don't allow setting strings, so in that case there is no "failed conversion"; the value's type is simply considered wrong:
>>> arr = pd.array([1, 2, 3])
>>> arr[0] = "10"
...
File ~/scipy/repos/pandas/pandas/core/arrays/masked.py:289, in BaseMaskedArray._validate_setitem_value(self, value)
...
TypeError: Invalid value '10' for dtype Int64
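A minimal sketch of the contrast described above, assuming a reasonably recent numpy and pandas. `try_setitem` is a hypothetical helper written for this illustration, not part of either library, and it reports only the exception class because the message text varies between versions:

```python
import numpy as np
import pandas as pd


def try_setitem(arr, value):
    """Set arr[0] = value and report what happened (hypothetical helper)."""
    try:
        arr[0] = value
    except (TypeError, ValueError) as err:
        # Report only the exception class; the message differs by version.
        return type(err).__name__
    return "ok"


# NumPy-backed integer array: strings go through an int()-style conversion.
numpy_ok = try_setitem(np.array([1, 2, 3]), "10")
numpy_bad = try_setitem(np.array([1, 2, 3]), "not an integer")

# Nullable (masked) Int64 array: strings are rejected without conversion.
masked_str = try_setitem(pd.array([1, 2, 3]), "10")

print(numpy_ok, numpy_bad, masked_str)
```

With the behaviour shown in this thread, the three results should be `ok`, `ValueError` and `TypeError` respectively.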
In terms of messages in the case of a TypeError, we currently have these variations:
Any preference there? I was thinking of some combination like "Invalid value '{value}' for dtype xx. Value should be a 'xx' or .., got '{type(value)}' instead."
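To make the proposed wording concrete, here is a rough sketch of how such a message could be assembled. `format_setitem_error` is a hypothetical helper, not an existing pandas function, and the expected-type names passed in below are purely illustrative assumptions:

```python
def format_setitem_error(value, dtype_name, expected_types):
    """Build the proposed message (hypothetical helper, not pandas API)."""
    # Join the accepted type names as "'a' or 'b'".
    expected = " or ".join(f"'{name}'" for name in expected_types)
    return (
        f"Invalid value '{value}' for dtype {dtype_name}. "
        f"Value should be a {expected}, got '{type(value).__name__}' instead."
    )


# Illustrative inputs only: an int assigned into a string-typed array.
message = format_setitem_error(1, "string", ["str", "NA"])
print(message)
```

This uses the type's short name (`type(value).__name__`) rather than the full `type(value)` repr, which is one possible reading of the template above.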
Thanks @jorisvandenbossche, I didn't realize conversion was a NumPy behavior. Should we be unifying the behavior between NumPy-backed (converting & raising ValueError on failure) and EA-backed (no attempt to convert & raising TypeError)?
While cleaning up some string tests, I noticed that the setitem validation error message was different between pyarrow vs python storage for StringDtype (and will do a PR to make that consistent), but that made me wonder how the situation is in general. Creating an overview here, similarly to #59580 (error messages in reduction operations).
The code to generate the table above (the above table is a trimmed version of the result, removing some lines with identical results):