Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128942: make arraymodule.c free-thread safe #128943

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

tom-pytel
Copy link
Contributor

@tom-pytel tom-pytel commented Jan 17, 2025

This PR is a work in progress but it currently fixes the crashes mentioned in the related issue so I am sending it up now to elicit feedback to make sure I am on the right path. So far the changes have been made functional but as minimal as possible to facilitate review. The critical sections are fairly broad so optimization may follow if functionality is confirmed. The changes are mostly additions of critical sections to publicly visible methods and slots gotten from the two structures listed below (as well as a few internal functions):

static PyMethodDef array_methods[] = {
    ARRAY_ARRAY_APPEND_METHODDEF
    ARRAY_ARRAY_BUFFER_INFO_METHODDEF
    ARRAY_ARRAY_BYTESWAP_METHODDEF
    ARRAY_ARRAY_CLEAR_METHODDEF
    ARRAY_ARRAY___COPY___METHODDEF
    ARRAY_ARRAY_COUNT_METHODDEF
    ARRAY_ARRAY___DEEPCOPY___METHODDEF
    ARRAY_ARRAY_EXTEND_METHODDEF
    ARRAY_ARRAY_FROMFILE_METHODDEF
    ARRAY_ARRAY_FROMLIST_METHODDEF
    ARRAY_ARRAY_FROMBYTES_METHODDEF
    ARRAY_ARRAY_FROMUNICODE_METHODDEF
    ARRAY_ARRAY_INDEX_METHODDEF
    ARRAY_ARRAY_INSERT_METHODDEF
    ARRAY_ARRAY_POP_METHODDEF
    ARRAY_ARRAY___REDUCE_EX___METHODDEF
    ARRAY_ARRAY_REMOVE_METHODDEF
    ARRAY_ARRAY_REVERSE_METHODDEF
    ARRAY_ARRAY_TOFILE_METHODDEF
    ARRAY_ARRAY_TOLIST_METHODDEF
    ARRAY_ARRAY_TOBYTES_METHODDEF
    ARRAY_ARRAY_TOUNICODE_METHODDEF
    ARRAY_ARRAY___SIZEOF___METHODDEF
    {"__class_getitem__", Py_GenericAlias, METH_O|METH_CLASS, PyDoc_STR("See PEP 585")},
    {NULL, NULL}  /* sentinel */
};

static PyType_Slot array_slots[] = {
    {Py_tp_dealloc, array_dealloc},
    {Py_tp_repr, array_repr},
    {Py_tp_getattro, PyObject_GenericGetAttr},
    {Py_tp_doc, (void *)arraytype_doc},
    {Py_tp_richcompare, array_richcompare},
    {Py_tp_iter, array_iter},
    {Py_tp_methods, array_methods},
    {Py_tp_members, array_members},
    {Py_tp_getset, array_getsets},
    {Py_tp_alloc, PyType_GenericAlloc},
    {Py_tp_new, array_new},
    {Py_tp_traverse, array_tp_traverse},

    /* as sequence */
    {Py_sq_length, array_length},
    {Py_sq_concat, array_concat},
    {Py_sq_repeat, array_repeat},
    {Py_sq_item, array_item},
    {Py_sq_ass_item, array_ass_item},
    {Py_sq_contains, array_contains},
    {Py_sq_inplace_concat, array_inplace_concat},
    {Py_sq_inplace_repeat, array_inplace_repeat},

    /* as mapping */
    {Py_mp_length, array_length},
    {Py_mp_subscript, array_subscr},
    {Py_mp_ass_subscript, array_ass_subscr},

    /* as buffer */
    {Py_bf_getbuffer, array_buffer_getbuf},
    {Py_bf_releasebuffer, array_buffer_relbuf},

    {0, NULL},
};

The following public methods/slots have been modified directly:

* array_array_buffer_info   = ARRAY_ARRAY_BUFFER_INFO_METHODDEF
* array_array_byteswap      = ARRAY_ARRAY_BYTESWAP_METHODDEF
* array_array_clear         = ARRAY_ARRAY_CLEAR_METHODDEF
* array_array_count         = ARRAY_ARRAY_COUNT_METHODDEF
* array_array_fromlist      = ARRAY_ARRAY_FROMLIST_METHODDEF  / CRITICAL2
* array_array_fromunicode   = ARRAY_ARRAY_FROMUNICODE_METHODDEF
* array_array_index         = ARRAY_ARRAY_INDEX_METHODDEF
* array_array_pop           = ARRAY_ARRAY_POP_METHODDEF
* array_array_remove        = ARRAY_ARRAY_REMOVE_METHODDEF
* array_array_reverse       = ARRAY_ARRAY_REVERSE_METHODDEF
* array_array_tofile        = ARRAY_ARRAY_TOFILE_METHODDEF
* array_array_tolist        = ARRAY_ARRAY_TOLIST_METHODDEF
* array_array_tobytes       = ARRAY_ARRAY_TOBYTES_METHODDEF
* array_array_tounicode     = ARRAY_ARRAY_TOUNICODE_METHODDEF
* array_array___sizeof__    = ARRAY_ARRAY___SIZEOF___METHODDEF
* array_repr                = Py_tp_repr
* array_richcompare         = Py_tp_richcompare  / CRITICAL2
* array_new                 = Py_tp_new
* array_length              = Py_sq_length
* array_concat              = Py_sq_concat  / CRITICAL2
* array_repeat              = Py_sq_repeat
* array_item                = Py_sq_item
* array_ass_item            = Py_sq_ass_item
* array_contains            = Py_sq_contains
* array_inplace_repeat      = Py_sq_inplace_repeat
* array_length              = Py_mp_length
* array_subscr              = Py_mp_subscript  -> array_item
* array_ass_subscr          = Py_mp_ass_subscript  / CRITICAL2
* array_buffer_getbuf       = Py_bf_getbuffer
* array_buffer_relbuf       = Py_bf_releasebuffer

The following have not been modified but depend for safety on the functions they call:

+ array_array_append        = ARRAY_ARRAY_APPEND_METHODDEF  -> ins
+ array_array___copy__      = ARRAY_ARRAY___COPY___METHODDEF  -> array_slice
+ array_array___deepcopy__  = ARRAY_ARRAY___DEEPCOPY___METHODDEF  -> array_array___copy__
+ array_array_extend        = ARRAY_ARRAY_EXTEND_METHODDEF  -> array_do_extend  / CRITICAL2
+ array_array_frombytes     = ARRAY_ARRAY_FROMBYTES_METHODDEF  -> frombytes  / CRITICAL2
+ array_array_insert        = ARRAY_ARRAY_INSERT_METHODDEF  -> ins
+ array_array___reduce_ex__ = ARRAY_ARRAY___REDUCE_EX___METHODDEF  -> array_array_tolist_impl, array_array_tobytes_impl
+ array_inplace_concat      = Py_sq_inplace_concat  -> array_do_extend  / CRITICAL2

The following look safe as is:

- array_array_fromfile      = ARRAY_ARRAY_FROMFILE_METHODDEF  -> array_array_frombytes
- array_dealloc             = Py_tp_dealloc
- array_iter                = Py_tp_iter
- array_tp_traverse         = Py_tp_traverse

And the following non-public utility functions have been made safe:

* array_slice
* array_do_extend  / CRITICAL2
* ins
* frombytes  / CRITICAL2

Here are a few questions needing guidance:

  • Changed clinic sig of array_array_frombytes() so I could lock the original source object during op (maybe its a writeable bytearray). Want to prevent modification of data during the operation but not sure this will even do this so not sure is necessary or even desired. Should I remove this and leave just the PyObject_GetBuffer() for protection? Also for this added a forward declaration of array_array_frombytes() for array_array_fromfile_impl() instead of moving stuff around for minimal impact.

  • In array_new there may or may not be an initializer object, if there is not then the args tuple is used as a stand-in for the critical section lock. Is there a better way to do this?

Misc:

  • array_richcompare was incrementing refcount of Py_True and Py_False, removed that as they are immortal.

  • The following were larger / slightly more complicated functions to check so should double check array_array_frombytes, array_richcompare, array_ass_subscr and array_new.

More work is still needed, like the array iterator and more testing and verification.

@bedevere-app
Copy link

bedevere-app bot commented Jan 17, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@tom-pytel
Copy link
Contributor Author

Requesting a look from @ZeroIntensity.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! At a glance, don't use goto with critical sections--it's messier and can be error-prone (IIRC there's problems with it compiling on MSVC too). Instead, create a wrapper function like this:

static PyObject *
array_something(arrayobject *self, PyObject *arg)
{
    PyObject *res;
    Py_BEGIN_CRITICAL_SECTION(self);
    res = array_something_lock_held(); // the actual implementation function
    Py_END_CRITICAL_SECTION();
    return res;
}

For functions that are defined with the Argument Clinic, you can use @critical_section to do this even faster. (See what I did for the ssl module.)

@tom-pytel
Copy link
Contributor Author

tom-pytel commented Jan 17, 2025

Made requested changes, also made arrayiter safe.

Also found a good old-fashioned non-free-threaded bug which probably goes back a long way - if try to arrayiter.__setstate__() when it is exhausted then segfault, fixed here obviously.

Question remaining about need to lock bytes source object in frombytes() or if holding the buffer is sufficient?

@erlend-aasland
Copy link
Contributor

Also found a good old-fashioned non-free-threaded bug which probably goes back a long way - if try to arrayiter.__setstate__() when it is exhausted then segfault, fixed here obviously.

Could you separate that out from this PR and submit a dedicated PR for this bug? We might want to backport it.

@erlend-aasland
Copy link
Contributor

Regarding the style nits: please make sure all new code (and in most cases, also changed code) follow PEP-7.

/*[clinic input]
array.array.frombytes
buffer: Py_buffer
bytes: object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remind me again why we can't use @critical_section here? It's been a while since I've had a look at these kinds of changes. cc. @ZeroIntensity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just put it on the internal function same as with ins(), but I can raise the critical sections to the public functions array_array_frombytes, array_array_insert and array_array_append if you prefer.

Copy link
Contributor

@erlend-aasland erlend-aasland Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. If Peter is fine with this variant, I'm also fine with it, but please leave the docstring intact. You need this (coupled with a make clinic):

Suggested change
bytes: object
bytes as buffer: object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe you can answer that question for me here, does bytes need to be locked in this case or is it sufficient just to hold the buffer as was the case in the original non-free safe code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, I don't think we need a lock_held function. Just @critical_section should do.

I'm not sure what the rules are for the buffer protocol in regards to free-threading. Hopefully it's fine to leave bytes unlocked.

Comment on lines +2654 to +2656
if (value != NULL) {
_Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pre-processor guard instead of runtime guards for these: Py_DEBUG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole if block gets optimized out when Py_DEBUG is not defined, but having the #ifdef doesn't hurt either.

@tom-pytel
Copy link
Contributor Author

Still would like to know if array_array_frombytes() really needs to lock the bytes object or if just holding the buffer is sufficient. The idea being trying to prevent the CONTENTS of the buffer changing during the operation. Just holding buffer is not enough if is writable but not sure locking object would help that anyway.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting there! Now, we're overlocking things a bit. This isn't always the rule, but in general you only need to lock something if you want to avoid inconsistent state (making it a "critical section"!).

That, and keep in mind that things that have a _Py or Py prefix are generally thread safe themselves and will lock if they need to. You don't need to lock for every call to Python.

/*[clinic input]
array.array.frombytes
buffer: Py_buffer
bytes: object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, I don't think we need a lock_held function. Just @critical_section should do.

I'm not sure what the rules are for the buffer protocol in regards to free-threading. Hopefully it's fine to leave bytes unlocked.

@@ -2657,6 +2825,7 @@ array_buffer_getbuf(arrayobject *self, Py_buffer *view, int flags)
return -1;
}

Py_BEGIN_CRITICAL_SECTION(self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably just be a lock_held function.

return 0;
}

static void
array_buffer_relbuf(arrayobject *self, Py_buffer *view)
{
self->ob_exports--;
_Py_atomic_add_ssize(&self->ob_exports, -1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect the default build, we need an #ifdef Py_GIL_DISABLED.

@@ -754,7 +760,7 @@ array_richcompare(PyObject *v, PyObject *w, int op)
res = Py_False;
else
res = Py_True;
return Py_NewRef(res);
return res;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unrelated. Let's try to avoid that for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants