-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Try to skip getaddrinfo if "host" is already an IP #3113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Supporting code is dozens of lines, which seems like a lot. How much difference does it make? I guess it must be substantial since this got merged into the standard library, but I'm surprised. I wonder if this could be moved from its current location into |
@bdarnell here's the associated PR: python/asyncio#302 I think it's substantial as a redundant trip around |
Thanks for that link. My question was really about teasing out where exactly the slowness is - is it about using any I asked this question because depending on the answer, it seems like the optimization could be built in to asyncio.loop.getaddrinfo itself instead of being duplicated in Tornado. And indeed, python/asyncio#302 did this. However, it was reversed in python/asyncio#357: So by the same logic, this optimization should exist in TCPClient and not in Resolver (just as you have it in #3116). The amount of code duplication is unfortunate but I doubt its worth trying to get a public version of
That is, run getaddrinfo once with AI_NUMERICHOST without an executor for the case in which the requested host is already resolved, and only if that fails run regular getaddrinfo in an executor. I'm sure this gets a little more complicated than my 5-line sketch but should still be simpler than completely duplicating @ajdavis Anything to add? |
It’s been a long time since I did the asyncio patch. I remember thinking at the time it was a surprising amount of code duplication but the Python core team decided it was acceptable. Yes, the getaddrinfo lock is gone on major platforms now, that might or might not affect your decisions here. 😉 |
fyi the trio implementation uses this AI_NUMERICHOST trick also: https://github.com/python-trio/trio/blob/4edfd41bd5519a2e626e87f6c6ca9fb32b90a6f4/trio/_socket.py#L125-L192 |
I've made a PR to port the AI_NUMERICHOST | AI_NUMERICSERV shortcut from trio to asyncio here: python/cpython#31497 |
OK, so if that cpython patch is accepted then it'll be solved in getaddrinfo and we don't have to do anything on the tornado side. (we could do a similar patch in the tornado resolver to get the benefit sooner but then we'd also have to make sure that there's no locking around getaddrinfo on older python versions that tornado still supports) But I'm not sure that patch should be accepted - Jesse's benchmarking found that getaddrinfo with AI_NUMERICHOST was much slower than inet_pton on macos in 2015. Unless that's changed, the trio patch would be a performance regression, so it might be better to leave it as is. That would mean we'd still need something like #3116 to get this performance benefit for tornado. Given the amount of subtleties we've uncovered here and the lack of concrete benchmarks showing improvements in Tornado functionality, I'm inclined to keep things as they are and not worry about making this optimization. |
@bdarnell I did some benchmarks against a fresh Macos runner and it seems the issues in getaddrinfo are fixed python/cpython#90980 (comment) |
OK, so given those performance results I'm in favor of the |
TCPClient always calls
resolver.resolve(
even if"host"
is already an IPtornado/tornado/tcpclient.py
Lines 260 to 265 in 00c9e0a
asyncio has a nice implementation that skips the getaddrinfo call https://github.com/python/cpython/blob/b5527688aae11d0b5af58176267a9943576e71e5/Lib/asyncio/base_events.py#L1369-L1380 in this case
The text was updated successfully, but these errors were encountered: