Crash on com.apollographql.websocket on _inputStreamCallbackFunc #3390
Comments
Hi @aaronbarsky - you mentioned this being consistent; do you know the conditions or sequence of events that leads to the crash? Having a reproduction case would be immensely helpful in resolving this.
By consistent, I mean that the call stack is exactly the same in each case. It happens about 1/500 sessions. The challenge is that the start of the crash is
which makes it quite tough to write a unit test or minimally reproducible example. |
We're going to really struggle to create or verify a fix for this since, as you said, it's difficult to reproduce.
What codepaths are you referring to here? If there are some obvious areas where we are creating unsafe memory access, I'd suggest trying to fork the library, fix the |
I worked around the problem with an explicit call to closeConnection. Previously I had
Now I keep an explicit reference to the networkTransport so I can close it before deallocating the client
(Note that networkTransport is Fixing it in the library would require a lot of changes that could be complex. The problem is that FoundationStream marks the delegate as Current situation
My theoretical fix would look something like
The tricky bit is that you can't make strong references to self in the deinit. So rather than the delegate being an instance of Apollo.FoundationStream, it would have to be a separate object with a weak reference back to FoundationStream. I'm too busy to make an attempt on this now. Perhaps if this code doesn't change much in Apollo 2.0, I can give it another look in the future.
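A minimal sketch of the separate-delegate idea described above, not Apollo's actual code. `StreamOwner`, `StreamDelegateProxy`, and the `queue` parameter are hypothetical names used only for illustration, and handing the teardown to the queue from `deinit` is an assumption about how such a proxy might be used. The point it shows: `deinit` captures the proxy and the stream (never `self`), so the delegate object stays alive until the cleanup has run on the queue.

```swift
import Foundation
import Dispatch

// Hypothetical delegate object: the stream talks to this, never to the owner directly.
final class StreamDelegateProxy: NSObject, StreamDelegate {
    // Weak back-reference, so a late callback finds nil instead of a freed object.
    weak var owner: StreamOwner?

    func stream(_ aStream: Stream, handle eventCode: Stream.Event) {
        owner?.handle(eventCode)
    }
}

// Hypothetical stand-in for the object that owns the stream (not Apollo.FoundationStream).
final class StreamOwner {
    let proxy = StreamDelegateProxy()
    private let input: InputStream
    private let queue: DispatchQueue
    private(set) var eventCount = 0

    init(input: InputStream, queue: DispatchQueue) {
        self.input = input
        self.queue = queue
        proxy.owner = self
        input.delegate = proxy
    }

    func handle(_ event: Stream.Event) {
        eventCount += 1
    }

    deinit {
        // Capture the proxy and the stream, not self: both outlive this object
        // until the teardown below has run on the queue.
        let proxy = self.proxy
        let input = self.input
        queue.async {
            input.delegate = nil
            input.close()
            withExtendedLifetime(proxy) {}
        }
    }
}
```

Whether this closes the race in practice would still depend on the stream callbacks being delivered on the same queue that performs the teardown.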
Thanks for the great debugging @aaronbarsky.
The current StarScream-based websocket implementation will be removed from Apollo iOS; it's old, unsupported and hindering progress. Ideally we build a default websocket implementation using iOS native websocket APIs but allow users to swap in whatever websocket library they choose to use. When that happens is still undecided though. We're trying to keep the scope of 2.0 breaking changes as small as possible and it really depends how much work is needed here to achieve that.
I'm seeing a handful of crashes like this as well in my app
@aaronbarsky Isn't setting the
If that's the case, could the crash be that On my end, the crashes seem to be happening because I call |
Any update here? I see a lot of crashes with 100% iOS 18 on |
@arnauddorgans - if you're able to provide a reproduction sample project or more debugging on your end, that may help in figuring out the root cause. I think this issue is becoming a catch-all for anything websocket related and they may not all be the same thing.
Is it possible that the next minor version will contain the fix for this? It seems reproducible.
@vladdorfman, all the recent instances of crash captures seem to be ad hoc. I don't think it's reliably reproducible yet, but if you're able to write a test that can consistently crash, please do.
Yeah, we will need a reproduction case we can use to debug and verify a fix. Even if it only crashes one out of ten times. But a screenshot of the stack trace in your existing project is not enough for us to track this down. If you are able to create a unit test or provide an example project with the crash occurring, we will prioritize a fix.
I spent some time on this issue yesterday and again today trying to craft a test that fails. I haven't managed to replicate a single failure in 10000 repetitions of various tests; closing the websocket, leaving it dangling, setting the transport to nil, cancelling the subscription, etc. The stack trace screenshots unfortunately don't tell us much other than it's happening in cleanup, and I don't think we can confidently make any changes to the cleanup code without knowing exactly how things are failing.
This is still the number one crash for our app by a large margin. I can't reproduce it in the debugger, but I have experienced it occasionally on my device in the wild. This morning I found a potentially related discussion on the Apple Developer forums. The stack traces are very similar to what I'm seeing in Sentry & Xcode Organizer. https://forums.developer.apple.com/forums/thread/769191
Thanks for linking to that discussion @pixelmatrix, it's an interesting read. I agree it sounds very similar to what you and others are experiencing in this issue. While there are probably improvements we can make to the cleanup sequence, this is looking more like an iOS 18 +
That was my thought as well. Given that it may not be actionable on our side, any recommendation for how we can get this in front of someone at Apple?
Maybe a +1 on that Apple issue with your own traces if available? I don't have any contacts that could ensure prompt action on this unfortunately.
Hi @pixelmatrix - have you had any luck with this issue since early December? In the Apple discussion thread it looks like one of the reporters claims to have fixed it after investigating threading. Wondering if that could lead anywhere in this issue too.
I didn't see that update, but it sounds like a promising theory worth exploring. I've actually moved on to a different approach here to solve our crashes, given how prominent this crash is and how long we've been stuck. Since the SDK technically allows for other implementations of WebSocketClient, I'm about to ship an update that uses a custom implementation powered by a more modern websocket library. Has the team considered updating WebSocketClient using a more modern stack? There's proper support for WebSockets now in both Network.framework and URLSession.
I agree our websocket implementation is old; it's based on the 3.1.2 version of StarScream. I believe we did investigate moving to the more modern 4.x versions a long time ago but it was not practical/feasible. I'd have to go dig up the exact reasons for that outcome though. We do have plans to modernize the websocket implementation once the 2.0 work is done. That would be released as a minor version post 2.0 release.
Regarding the comment in the Apple thread, I wonder if |
@pixelmatrix we've been working with another customer that was able to reproduce the crash and got PR apollographql/apollo-ios-dev#578 up to resolve that. It looks successful from our limited testing and we're waiting for them to provide feedback, but it would be great if you could check it out and test on your end too.
Oops, did not mean to close this. Will reopen.
Summary
Crashlytics is reporting a very rare but consistent crash on the com.apollographql.websocket queue.
The app is a mix of 60% foreground and 40% background.
Version
1.9.2
Steps to reproduce the behavior
Unfortunately I only have crash logs
Logs
No response
Anything else?
My hunch is that it's the FoundationStream teardown sequence:
stream.delegate is an unowned(unsafe) var. There are several codepaths where cleanup is not called on com.apollographql.websocket. When this happens, delegate can be set to nil as the CFReadStream is attempting to invoke a callback on the deallocated delegate.
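To illustrate why the reference kind matters here, a small sketch with hypothetical types (`Listener` and `SafeSource` are made up for this example and are not part of Apollo or Foundation). A `weak` reference is zeroed when the target is deallocated, so a late callback becomes a no-op. An `unowned(unsafe)` reference in the same position would keep pointing at freed memory, which is the hazard suspected above; that variant is only described in a comment because running it would be undefined behavior.

```swift
// Hypothetical callback target.
final class Listener {
    func ping() -> String { "pong" }
}

// Hypothetical event source that holds its delegate weakly.
final class SafeSource {
    // With `unowned(unsafe) var delegate: Listener?` this would dangle
    // after the listener is deallocated instead of becoming nil.
    weak var delegate: Listener?

    // Simulates a callback arriving from another part of the system.
    func fire() -> String? {
        delegate?.ping()
    }
}
```

Calling `fire()` after the listener has been released returns nil rather than touching a deallocated object.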