10000 Edge case in async DNS causes crash · Issue #3389 · warmcat/libwebsockets · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Edge case in async DNS causes crash #3389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mattfischer opened this issue May 14, 2025 · 2 comments
Closed

Edge case in async DNS causes crash #3389

mattfischer opened this issue May 14, 2025 · 2 comments

Comments

@mattfischer
Copy link

I've encountered a crash when running in a scenario where network connectivity is interrupted and restored. The crash occurs towards the end of lws_client_connect_via_info(), but appears to happen because earlier on in the function the wsi was destroyed.

I managed to trace the problem to lws_async_dns_query(), which is being called with lws_client_connect_3_connect() as a callback. This callback has some error paths which can result in the wsi being destroyed, however in my case this seems to be happening when the callback is called in lws_async_dns_complete(), which does not report an error in this case.

I put together a patch which fixes the issue for me, but I'd appreciate some feedback on whether this is the correct fix. Please see PR #3388

@lws-team
Copy link
Member

The crash occurs towards the end of lws_client_connect_via_info(), but appears to happen because earlier on in the function the wsi was destroyed.

Thanks... if I understand it, it's the new ongoing client connection wsi that's getting destroyed (and not the adns wsi) during a callback from the adns code into connect_3 as a callback. And that's happening, eg, because we timed-out or ran out of connection attempts or somesuch.

But due to a lack of passing the problem up from connect_3 through the adns code, we end up concealing the wsi has been destroyed and soldiering on with its dead body in the adns code.

I'd appreciate some feedback on whether this is the correct fix.

If so, the patch with adding the very specific return type spelling out to the caller that, eg, the wsi has been destroyed has worked well elsewhere in lws and makes it very clear to the caller what the situation is, it looks like the right way. Whether all connect_3 paths return NULL so that's enough I dunno, but AFAIK it should be better than what's there atm.

I pushed it on main and v4.3-stable, thanks again.

@mattfischer
Copy link
Author
mattfischer commented May 14, 2025

But due to a lack of passing the p 5D1E roblem up from connect_3 through the adns code, we end up concealing the wsi has been destroyed and soldiering on with its dead body in the adns code.

Yup, for me it was eventually segfaulting in https://github.com/warmcat/libwebsockets/blob/main/lib/core-net/client/connect.c#L487 when it tried to call the protocol callback, although I'm running 4.2 and it looks like on mainline this particular case might have been caught by the checks added in d3783f0. But if the wsi is dead, then any number of bad things might also happen, so it seemed like reporting the error immediately was the right thing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants
0