Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC, not for merge] Net: Websocket: introduce non-blocking receive frame API #4709

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Cahb
Copy link

@Cahb Cahb commented Sep 24, 2024

First of all, I'd like to emphasize that it's an Request For Comments PR, as I would love to hear your opinion on the changes as a whole, to make sure it's an community approved / way-to-go approach / changes.

Changes description:
Currently, the Receive Frame of WebsocketImpl is implemented in a way, that it expects for a single websocket frame to be received in a blocking fashion: untill frame gets received, context is blocked till completion.

This is an issue with poor-network-connection clients that connect to server in a way, that clients with high packet loss / clients that send only part of a WSS frame could potentially lock-up processing thread for a very long time.

This fix implements a new set of coroutines, that mimic the behavior of previous implementation, however are aimed to use internal buffer for caching partially received frame, and return one whenever frame assembly is completed.
This proves to unblock the caller thread from any hussle with bad connections, and enables the true async
implementation of socket processing.

Real life use case / why I made the changes:
The reason why I did these changes in the first place, is to address an issue we've had with a SW that we were using: Poco library is the backbone of the OpenWiFi uCentral Gateway (later on referrenced as 'GW') software (link that is used as a Cloud controller for uCentral / OpenWifi capable and compatible Access Point devices.
The GW itself is built on top of Poco in the Async-programming fashion.

The part of the GW where we had an issue, is the Observer / Reactor part, that get's triggered upon EPOLLIN events arriving at the underlying websocket FD, which later on makes a call to a blocking receiveFrame call - link 2.
After thorough investigation, we found out, that underlying receiveFrame calls receiveBytes->receiveHeader / receiveMask / receivePayload and finally WebSocketImpl::receiveNBytes.
The problem with receiveNBytes function, is that it's a loop-based function, that tries to receiveSomeBytes up untill the size of received data is not satisfied with payloadSize of websocket frame.

E.g. this function expects the underlying socket to return complete websocket frame / payload in a loop. Untill exact number of requested bytes of data is not being read the function does not return control to the caller.

A different approach - proposed with this RFC / PR - would be to expose a new API call, that would utilize internal buffering on each recv() call, and in case if bufferized data is sufficient to compose a single WSS frame, it would then return complete frame back to the caller.

Final words:
While these changes address specific issue we've had in our production deployments - APs with bad internet connection caused control / websocket processing threads to hang - the use case does not explicitly limits changes usage to just better handling of connections with bad internet whatsoever.
These changes fundamentally challenge the synchronous base nature of the underlying receiveFrame function.

CC: @phwhite @stephb9959 @i-chvets @SviatoslavBoichuk @serhiy-p @taraschornyiplv

Currently, the Receive Frame of WebsocketImpl is implemented in a way, that it
expects for a single websocket frame to be received in a blocking fashion:
untill frame gets received, context is blocked till completion.

This is an issue with poor-network-connection clients that
connect to server in a way, that clients with high packet
loss / clients that send only part of a WSS frame could
potentially lock-up processing thread for a very long
time.

This fix implements a new set of coroutines, that mimic
the behavior of previous implementation, however are
aimed to use internal buffer for caching partially
received frame, and return one whenever frame assembly
is completed.
This proves to unblock the caller thread from any hussle
with bad connections, and enables the true async
implementation of socket processing.

Signed-off-by: Oleksandr Mazur <[email protected]>
_bufferOffset -= totalRewind;
_frameFlags = 0;
return 0;
} else if (frameSize < 0) {

Check warning

Code scanning / CodeQL

Comparison result is always the same Warning

Comparison is always false because frameSize >= 1.
return frameSize;
}

// clear internal buffer from received frame;

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant