-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Edge case in async DNS causes crash #3389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks... if I understand it, it's the new ongoing client connection wsi that's getting destroyed (and not the adns wsi) during a callback from the adns code into connect_3 as a callback. And that's happening, eg, because we timed-out or ran out of connection attempts or somesuch. But due to a lack of passing the problem up from connect_3 through the adns code, we end up concealing the wsi has been destroyed and soldiering on with its dead body in the adns code.
If so, the patch with adding the very specific return type spelling out to the caller that, eg, the wsi has been destroyed has worked well elsewhere in lws and makes it very clear to the caller what the situation is, it looks like the right way. Whether all connect_3 paths return NULL so that's enough I dunno, but AFAIK it should be better than what's there atm. I pushed it on main and v4.3-stable, thanks again. |
Yup, for me it was eventually segfaulting in https://github.com/warmcat/libwebsockets/blob/main/lib/core-net/client/connect.c#L487 when it tried to call the protocol callback, although I'm running 4.2 and it looks like on mainline this particular case might have been caught by the checks added in d3783f0. But if the wsi is dead, then any number of bad things might also happen, so it seemed like reporting the error immediately was the right thing to do. |
I've encountered a crash when running in a scenario where network connectivity is interrupted and restored. The crash occurs towards the end of
lws_client_connect_via_info()
, but appears to happen because earlier on in the function the wsi was destroyed.I managed to trace the problem to
lws_async_dns_query()
, which is being called withlws_client_connect_3_connect()
as a callback. This callback has some error paths which can result in the wsi being destroyed, however in my case this seems to be happening when the callback is called inlws_async_dns_complete()
, which does not report an error in this case.I put together a patch which fixes the issue for me, but I'd appreciate some feedback on whether this is the correct fix. Please see PR #3388
The text was updated successfully, but these errors were encountered: