dnsdist: add option to support cache sharing between different payload sizes #15532
Conversation
:-( ... waiting for other errors ...
I have a few concerns about this feature request: ignoring the whole additional section, including EDNS options, seems like a recipe for disaster, especially since we cannot know which options will be implemented in the future. Even ignoring TSIG is probably a problem.
OK, skipping the whole additional section sounds like a shortcut, though it fits many customer environments, which are fairly simple and closed. My initial thought was to skip only the AR count and the fixed header of the OPT RR; however, that would involve changing
I think TSIG should not impact the actual answers? So it should not be critical to include it in the cache key? I have not touched this part yet: does dnsdist perform any security validation, or does it just forward? However, this reminds me that the DO bit might be important: we might not want to return a result cached for a DO=0 client to a DO=1 client. In case other bits become important in the future, it might be good to include the 4-byte TTL field of the OPT RR in the cache key. We can construct a 'fake' TTL (all zeros) for clients that did not send an OPT RR.
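To illustrate the idea (a rough sketch under my own assumptions, not dnsdist's actual cache-key code), the 4 "TTL" bytes of the OPT RR, which carry the extended RCODE, the EDNS version and the DO bit, could be folded into the key, with all-zero bytes standing in for queries that carry no OPT RR:

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <string>

// Sketch only: mix the OPT RR "TTL" bytes (extended RCODE, version, DO bit
// and Z flags) into a cache key, using all zeroes when there is no OPT RR.
std::size_t cacheKeyWithEDNSFlags(const std::string& qname, uint16_t qtype,
                                  uint16_t qclass, bool hasOPT, uint32_t optTTL)
{
  std::array<uint8_t, 4> ednsFlags{0, 0, 0, 0};
  if (hasOPT) {
    ednsFlags[0] = static_cast<uint8_t>((optTTL >> 24) & 0xff); // extended RCODE
    ednsFlags[1] = static_cast<uint8_t>((optTTL >> 16) & 0xff); // EDNS version
    ednsFlags[2] = static_cast<uint8_t>((optTTL >> 8) & 0xff);  // DO bit + high Z bits
    ednsFlags[3] = static_cast<uint8_t>(optTTL & 0xff);         // low Z bits
  }
  std::string material = qname;
  material.append(reinterpret_cast<const char*>(&qtype), sizeof(qtype));
  material.append(reinterpret_cast<const char*>(&qclass), sizeof(qclass));
  material.append(reinterpret_cast<const char*>(ednsFlags.data()), ednsFlags.size());
  return std::hash<std::string>{}(material);
}
```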
Are we caching TC answers? I have not tested or noticed this yet. If that is the case, then I would think we should not cache TC answers. If TC happens a lot, we should somehow raise an alarm (via a user hook or metrics monitoring) and then config
TSIG absolutely can affect what answer is provided if it is used by the backend to do view selection, which BIND, for example, is capable of. I operate an environment where one criterion used for view selection is which TSIG key the query is signed with, and have 4 views of the same zone, each of which has very different answers for some queries.
That's a fair point. Speaking of BIND 9 views, they can be configured based on many other parameters as well, and PowerDNS can also edit an answer with hooks based on many other factors; this actually provides extra flexibility that the RFCs do not define. Anyway, I agree that skipping the whole additional section seems heavy-handed, so let's look for more precise control over this.
A recent PR (#15423) introduced (optional) caching of TC=1 answers. On a side note, TC=1 answers are not an error condition but a protocol feature indicating that the client should switch to TCP. Having a bigger buffer upstream does not mean downstream clients can handle that larger size.
Thanks for pointing this out. I did see the parameter name but did not notice that it was introduced recently for the purpose of caching TC answers. This does create a conflict: the two features look mutually exclusive, and both try to improve cache efficiency for given use cases. By the way, this PR also adds a handling block so that if the cached response size is greater than the client's UDP payload size, dnsdist sends back a TC answer immediately. That way clients with a smaller payload size get a quick TC and can switch to TCP, while clients with a larger payload size are still served the full answer.
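A minimal sketch of that check, assuming a simplified cache-lookup path (illustrative only, not the code in this PR):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: when a cached answer is larger than the client's advertised
// UDP payload size, reply with TC=1 right away so the client retries over TCP;
// otherwise serve the cached answer as-is.
enum class CacheAction { ServeCached, ReplyTruncated };

CacheAction decideFromCache(std::size_t cachedResponseSize, uint16_t clientPayloadSize)
{
  // A client that sent no OPT RR is limited to the classic 512-byte UDP limit.
  const std::size_t effectiveLimit = clientPayloadSize > 0 ? clientPayloadSize : 512;
  if (cachedResponseSize > effectiveLimit) {
    return CacheAction::ReplyTruncated;
  }
  return CacheAction::ServeCached;
}
```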
Just found that the
Yes, the EDNS buffer size does need to be part of the cache key; see for example what RFC 8906 says about whole answer caches: https://datatracker.ietf.org/doc/html/rfc8906#name-whole-answer-caches
When I look at real world distributions of EDNS buffer sizes (specifically at authoritative nameservers) I do see a larger set of unique EDNS buffer sizes than I would like from a caching perspective, but also some buffer sizes are much more common than others. So I do think it makes sense to "quantize" the buffer size offered by the client so that the cache only needs to retain answers for a few common buffer sizes. For instance, at least in the data I've looked at, there are very few queries that use buffer sizes in the range of 513 to 1219 bytes (inclusive), so quantizing 512-1219 to 512 (that is, if a client offers anywhere from 512-1219, canonicalize it to 512 bytes when querying the downstream / searching the cache) is pretty non-impactful.

Of course the tradeoff of doing this kind of quantization is that a client using an unusual buffer size might be more likely to get a TC=1 response when it's not strictly necessary. But it also prevents such clients from polluting the cache with not very useful cache entries. There are actually a decent number of resolvers out there using 1220, so using 1220 instead of 1232 as the cutpoint for the next quantization range can make sense.

Note that you have to round down to the lower value in the quantization range with a scheme like this. If a client offers 1219 they can be validly served a response that was generated from a query with a 512 byte buffer size, but the opposite is not true.
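A small sketch of the round-down quantization described above; the bucket boundaries here are just the values mentioned in this thread, not anything dnsdist ships with:

```cpp
#include <cstdint>
#include <vector>

// Sketch only: round the client's advertised EDNS UDP payload size down to
// the lower bound of its bucket. Rounding DOWN matters: an answer produced
// for the bucket's lower bound fits every client in that bucket, but not
// vice versa.
uint16_t quantizePayloadSize(uint16_t offered, const std::vector<uint16_t>& lowerBounds)
{
  // lowerBounds must be sorted ascending, e.g. {512, 1220, 1452, 4096}.
  if (lowerBounds.empty()) {
    return offered; // quantization disabled
  }
  uint16_t result = lowerBounds.front(); // EDNS never goes below 512 anyway
  for (const uint16_t bound : lowerBounds) {
    if (offered >= bound) {
      result = bound;
    }
  }
  return result;
}

// quantizePayloadSize(1219, {512, 1220, 1452, 4096}) == 512
// quantizePayloadSize(1400, {512, 1220, 1452, 4096}) == 1220
```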
Thanks for the good discussion. I can see the reason for the buffer size to be part of the cache key, while I also see the use case where a customer wants to share the cache between different buffer size settings. Defining quantized groups looks like a reasonable trade-off for both. In this case I would suggest not using a hardcoded quantization, e.g. 512 to 1219 or 512 to 1231. We could provide a vector option (default empty, meaning the feature is disabled, or with a pre-defined grouping?) so that users can customize the groups for their own use case if needed. One point is that we may not need to modify the payload size in the query; we may only need to round down the payload size when calculating the cache key.
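If the groups were made configurable as suggested, the option could simply be a vector of bucket lower bounds validated once at configuration time. A hedged sketch, with hypothetical names rather than an actual dnsdist setting:

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch only, hypothetical helper: normalize a user-supplied list of bucket
// lower bounds. An empty list means the feature stays disabled.
std::vector<uint16_t> normalizeBucketBounds(std::vector<uint16_t> bounds)
{
  if (bounds.empty()) {
    return bounds; // feature disabled, use the client's payload size as-is
  }
  std::sort(bounds.begin(), bounds.end());
  bounds.erase(std::unique(bounds.begin(), bounds.end()), bounds.end());
  if (bounds.front() < 512) {
    throw std::runtime_error("payload size buckets below 512 bytes are not useful for EDNS");
  }
  return bounds;
}
```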
I think the cache key needs to be based on the EDNS UDP payload size that the client offers in its query. And that value needs to be quantized when searching the cache in order to avoid fragmenting the cache if clients use many different payload sizes.

I think it also needs to be quantized when sending a downstream query to a server, to avoid generating a response that is too large for some clients in the quantization range. E.g., if you have a configured quantization range of 1220 to 1451 bytes and you get a client query with a payload size of 1400 bytes, and that value is sent unchanged to the downstream server, and the server sends a response with an actual size of 1300 bytes, that response cannot be used for clients that use payload sizes in the 1220 to 1299 part of the range. The payload size in the downstream query needs to be reset to 1220 so that the response is guaranteed to be usable for all the clients in the range.
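In other words, the same rounded-down value would be applied both to the cache lookup and to the OPT RR of the forwarded query. A sketch of that second half, under the same assumptions as above (not the actual dnsdist code):

```cpp
#include <cstdint>

// Sketch only: clamp the payload size advertised to the backend down to the
// lower bound of the client's bucket, so the backend never builds an answer
// that is too large for some clients sharing the cached entry.
// Example from the comment above: the client offers 1400, the bucket is
// [1220, 1451], so the downstream query advertises 1220 and any cached
// answer then fits every client in the bucket.
uint16_t downstreamPayloadSize(uint16_t clientOffered, uint16_t bucketLowerBound)
{
  return clientOffered > bucketLowerBound ? bucketLowerBound : clientOffered;
}
```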
This is covered by the validation of the cached size against the query's payload size before a cached result is sent to the client. If a customer sets up its downstream servers to return answers of different lengths for different payload sizes of the same name, then they do not need this feature and should not activate it.

Note that dnsdist already provides the capability of modifying the payload size of the query before it gets sent to the downstream server. If we modify the payload size after applying the rules, then it breaks the rule; if we modify the payload size before applying the rules, then it does not stop the described use case from happening with the new size set by the rule.
Hello, is this one still progressing?
I still need to review it and I have been quite busy, sorry about that! I'll plan on doing that next week.
OK, no problem, thanks for the reply. Just a quick keep-alive query.
Thanks for opening this pull request, and sorry about the review delay! I have to say I was a bit uneasy with this feature, mostly because it only works if the backend is not adding additional records based on the EDNS UDP payload size; otherwise it might actually degrade performance, because a client that could have received a small enough response over UDP will have to retry over TCP instead (if a reply with a bigger payload size is present in the cache). That being said, as long as the feature is not enabled by default this might be an acceptable trade-off, and the code is not too intrusive, so I don't see any reason not to accept it. I have requested some small changes and after that we should be good to go.
This branch is conflicted because we removed
Hi, thanks for the conflict notice! I wasn't sure if I needed to do anything, since the "Resolve conflicts" button is greyed out for me; it looks like this must be done via the command line. I could use some help here. After searching, I guess I should do the following: git checkout master Would you help check if these are the right steps? I'm not quite sure whether "--force-with-lease" is the right argument or a must here, but the search results say it should be used.
What I usually do in this case is:
Thanks! That is helpful, I followed it and with a few
Short description
A use case arose where two clients use different UDP payload sizes: a customized client uses 4096 while the glibc stub resolver uses 1232. The user would like to share the cached result for the same query name, type, and class in this case. The downstream servers do not use ECS and do not return different answers based on other EDNS options, so this adds an option to support such a use case.
Checklist
I have: