-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PROTON-2818: Move epoll proctor connection logic to a task thread. #427
Conversation
c/src/proactor/epoll.c
Outdated
@@ -1139,6 +1141,21 @@ static pn_event_batch_t *pconnection_process(pconnection_t *pc, uint32_t events, | |||
} | |||
if (sched_ready) schedule_done(&pc->task); | |||
|
|||
if (pc->first_schedule) { | |||
// Normal case: resumed logic from pn_proactor_connect2. | |||
// But possible tie: pn_connection_wake() or pn_proactor_disconnect(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo? "possibility of"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a typo but obviously not clear. While the original logic is preserved as much as possible, the dropping of the task lock and context switch allows competitor threads that were not possible prior to this change. Either of those two calls are possible from an arbitrary thread between the setting of first_schedule and arriving at this code.
A comment which doesn't make sense on its own is obviously not helpful. I will try to rework the comments and code structure for clarity on their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general if this works and passes the tests it's fine. But I feel that we've added (yet another) state in the implicit lifecycle state machine of connections and raw_connections. It would be easier to understand the lifecycle IMO if this state machine was explicit instead of being represented by ostensibly orthogonal booleans which I suspect are not truely orthogonal.
if (rc->first_schedule) { | ||
// Normal case: resumed logic from pn_proactor_raw_connect. | ||
// But possible tie: pn_raw_connection_wake() | ||
// Defer wake check until getaddrinfo is done. | ||
rc->first_schedule = false; | ||
assert(!events); // No socket yet. | ||
praw_connection_first_connect_lh(rc); // Drops and reacquires lock. | ||
if (rc->psocket.epoll_io.fd != -1 && !pni_task_wake_pending(&rc->task)) { | ||
unlock(&rc->task.mutex); | ||
return NULL; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this code should be the first piece of code in the possibilities, as it is the first that should happen; currently the logi cto kick off the connect is first; but now the logic to do the lookup must be earlier in the lifecycle of the connection so for clarity it should be the first condition in the sequence (unless the semantics mean that this doesn't work for some reason).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully addressed in the reworked version.
Agreed about the lifecycle state issue. This proposed fix strives for the minimal code logic changes to move the blocking activity to a different thread. The subsequent "real fix" for the parent JIRA will necessarily introduce a new state (presumably with early cancel option compared to the current blocked-until-done). The initiating of the getaddrinfo call will also presumably be sensibly moved back to the pn_xxx_connect call to avoid a pointless thread switch and the first_call boolean will have no purpose.
@cliffjansen Thanks for these changes - this is definitely clearer than the original change. |
Merged |
This patch moves connection logic that can possibly block from the calling thread to the first actual scheduling of the task itself. It otherwise makes no change to the existing logic. Including in the following cases:
The trimmed "if (conn->disconnected)..." bits are code simplifications where the value of "disconnected" cannot have changed.
Similarly, the schedule_if_inactive() calls are trimmed as the inactive test is necessarily false with the newly added task in the task list.