dnsdist: add option to support cache sharing between different payload sizes by pacnal · Pull Request #15532 · PowerDNS/pdns · GitHub

dnsdist: add option to support cache sharing between different payload sizes #15532


Open · pacnal wants to merge 5 commits into master

Conversation

@pacnal (Contributor) commented May 7, 2025

Short description

A use case arose where two clients use different UDP payload sizes: a customized client uses 4096 while the glibc resolver uses 1232. The user would like to share cached results for the same query name, type, and class in this case. The downstream servers do not use ECS and would not return different answers based on other EDNS options, so this adds an option to support such a use case.
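To illustrate the mechanism, here is a minimal sketch (hypothetical names and offsets, not this PR's actual code) of what ignoring the advertised payload size during cache-key hashing can look like. Per RFC 6891, the requestor's UDP payload size is carried in the OPT record's CLASS field, so zeroing that field in a scratch copy before hashing makes 1232- and 4096-byte clients produce the same key:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not this PR's actual code: blank out the OPT RR's
// CLASS field (which carries the requestor's advertised UDP payload size,
// RFC 6891 section 6.1.2) in a scratch copy of the query before hashing,
// so the cache key no longer depends on the advertised size.
void maskAdvertisedPayloadSize(std::vector<uint8_t>& scratchPacket, size_t optClassOffset)
{
  if (optClassOffset + 2 <= scratchPacket.size()) {
    scratchPacket[optClassOffset] = 0;
    scratchPacket[optClassOffset + 1] = 0;
  }
}
```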

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@pacnal (Contributor, Author) commented May 7, 2025

[Spell checking: pdns/dnsdistdist/docs/reference/config.rst#L1068]
abvoe is not a recognized word. (unrecognized-spelling)

:-( ... waiting for other errors ...

@rgacogne (Member) commented May 7, 2025

I have a few concerns about this feature request: ignoring the whole additional section, including EDNS options, seems like a recipe for disaster, especially since we cannot know which options will be implemented in the future. Even ignoring TSIG is probably a problem.
On the other hand I do see a use-case for trying to be smarter about EDNS UDP buffer size, perhaps by defining a few groups (512, 512-1232, 1232-4096, I have not really thought about it). Doesn't skipping the UDP buffer size altogether mean that we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

@pacnal (Contributor, Author) commented May 8, 2025

I have a few concerns about this feature request: ignoring the whole additional section, including EDNS options, seems like a recipe for disaster

OK, skipping the whole AR section does sound like a shortcut, though it fits many customer environments that are fairly simple and closed. My initial idea was to skip only the AR count and the fixed header of the OPT RR; however, that would involve changing PacketCache::hashAfterQname(), where we do not have an argument available to indicate the need. I also considered adding a special option value to the d_optionsToSkip array to indicate this, but that is still tricky. And this function seems to be used outside of dnsdist, so I'm hesitant to add a new argument to it. Do you think it would be OK to add a new argument (with a default value of false) to PacketCache::hashAfterQname() so that more accurate operations can be done?

Even ignoring TSIG is probably a problem.

I think TSIG should not impact the actual answers, so it should not be critical to include it in the cache key. I have not touched this part yet; does dnsdist perform any security validation, or does it just forward? However, this reminds me that the DO bit might be important: we might not want to return a cached result from a DO=0 client to a DO=1 client. In case other bits become important in the future, it might be good to include the 4 bytes of the OPT RR's TTL field in the key. We can construct a 'faked' TTL (all zeroes) for clients that did not send an OPT RR.
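As a rough sketch of that idea (hypothetical helper, not dnsdist's code):

```cpp
#include <cstdint>

// Hypothetical helper, not dnsdist's code: the 32-bit value mixed into the
// cache key from the OPT RR's TTL field. Per RFC 6891 that TTL packs the
// extended RCODE (8 bits), the EDNS version (8 bits), the DO bit (bit 15 of
// the low 16 bits), and 15 Z bits. Queries without an OPT RR contribute an
// all-zero 'faked' TTL, so a DO=1 query never matches a DO=0 cache entry.
uint32_t optTTLForCacheKey(bool hasOPT, uint32_t optTTL)
{
  return hasOPT ? optTTL : 0;
}
```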

we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

Are we caching TC answers? I have not tested or noticed this yet. If that is the case, then I would think we should not cache TC answers. If TC happens a lot, we should somehow raise an alarm (via a user hook or metrics monitoring) and then configure dnsdist to enlarge its payload size towards the downstream servers.

@dwfreed (Contributor) commented May 8, 2025

I think TSIG should not impact the actual answers? So it should not be critical to be included in cache key?

TSIG absolutely can affect what answer is provided if it is used by the backend to do view selection, which BIND, for example, is capable of. I operate an environment where one criterion used for view selection is which TSIG key the query is signed with, and I have 4 views with the same zone, each of which has very different answers for some queries.

@pacnal (Contributor, Author) commented May 8, 2025

TSIG absolutely can affect what answer is provided if it is used by the backend to do view selection, which BIND, for example, is capable of. I operate an environment where one criterion used for view selection is which TSIG key the query is signed with, and I have 4 views with the same zone, each of which has very different answers for some queries.

That's a fair answer. Speaking of BIND 9 views, they can be configured with many other parameters as well, and PowerDNS can also edit an answer with hooks based on many other factors. That actually provides extra flexibility that the RFCs do not define. Anyway, I agree that skipping the whole AR section seems too heavy; let's seek more accurate control over this.

@omoerbeek (Member) commented

Are we caching TC answers? I have not tested or noticed this yet. If that is the case, then I would think we should not cache TC answers. If TC happens a lot, we should somehow raise an alarm (via a user hook or metrics monitoring) and then configure dnsdist to enlarge its payload size towards the downstream servers.

A recent PR (#15423) introduced (optional) caching of TC=1 answers. On a side note, TC=1 answers are not an error condition, but a protocol feature indicating: switch to TCP. Having a bigger buffer upstream does not mean downstream clients can handle that larger size.

@pacnal (Contributor, Author) commented May 8, 2025

A recent PR (#15423) introduced (optional) caching of TC=1 answers. On a side note, TC=1 answers are not an error condition, but a protocol feature indicating: switch to TCP. Having a bigger buffer upstream does not mean downstream clients can handle that larger size.

Thanks for pointing this out. I did see this parameter name but did not notice that it was introduced recently for the purpose of caching TC answers. This does create a conflict; it looks like the two features are mutually exclusive, even though both try to improve cache efficiency for given use cases.

BTW, it just occurred to me that this PR also adds a handling block: if the cached response size is greater than the client's UDP payload size, dnsdist sends back TC immediately. It can serve clients with a smaller payload size a quick TC so they can switch to TCP, while also serving clients with a larger payload size the right answer.
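A simplified sketch of that behaviour (not the PR's actual code; it assumes the query buffer holds only the header and question section):

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch, not the PR's actual code: when the cached response is
// larger than the client's advertised UDP payload size, reply immediately
// with an empty TC=1 answer so the client retries over TCP. Assumes the
// query holds only the 12-byte header plus the question section.
std::vector<uint8_t> quickTruncatedReply(const std::vector<uint8_t>& query)
{
  if (query.size() < 12) {
    return {}; // not a valid DNS message
  }
  std::vector<uint8_t> reply(query); // keep the ID and question verbatim
  reply[2] |= 0x80;          // QR: this is a response
  reply[2] |= 0x02;          // TC: truncated, retry over TCP
  reply[6] = reply[7] = 0;   // ANCOUNT = 0
  reply[8] = reply[9] = 0;   // NSCOUNT = 0
  reply[10] = reply[11] = 0; // ARCOUNT = 0
  return reply;
}
```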

@pacnal (Contributor, Author) commented May 8, 2025

I just found that d_settings.d_optionsToSkip does not really skip the EDNS option when hashing. It only skips the option's data part; it still requires the option to be present, and with the same data length. Is this the intended design?

@edmonds (Contributor) commented May 8, 2025

On the other hand I do see a use-case for trying to be smarter about EDNS UDP buffer size, perhaps by defining a few groups (512, 512-1232, 1232-4096, I have not really thought about it). Doesn't skipping the UDP buffer size altogether mean that we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

Yes, the EDNS buffer size does need to be part of the cache key, e.g. RFC 8906 claims:

https://datatracker.ietf.org/doc/html/rfc8906#name-whole-answer-caches

Whole answer caches take a previously constructed answer and return it to a subsequent query for the same question. However, they can return the wrong response if they do not take all of the relevant attributes of the query into account.

In addition to the standard tuple of <qname,qtype,qclass>, a non-exhaustive set of attributes that must be considered include: RD, AD, CD, OPT record, DO, EDNS buffer size, EDNS version, EDNS options, and transport.

When I look at real world distributions of EDNS buffer sizes (specifically at authoritative nameservers) I do see a larger set of unique EDNS buffer sizes than I would like from a caching perspective, but also some buffer sizes are much more common than others. So I do think it makes sense to "quantize" the buffer size offered by the client so that the cache only needs to retain answers for a few common buffer sizes.

For instance, at least in the data I've looked at, there are very few queries that use buffer sizes in the range of 513 to 1219 bytes (inclusive), so quantizing 512-1219 to 512 (that is, if a client offers anywhere from 512-1219, canonicalize it to 512 bytes when querying the downstream / searching the cache) is pretty non-impactful. Of course the tradeoff of doing this kind of quantization is that a client using an unusual buffer size might be more likely to get a TC=1 response when it's not strictly necessary. But it also prevents such clients from polluting the cache with not very useful cache entries.

There are actually a decent number of resolvers out there using 1220, so using 1220 instead of 1232 as the cutpoint for the next quantization range can make sense.

Note that you have to round down to the lower value in the quantization range with a scheme like this. If a client offers 1219 they can be validly served a response that was generated from a query with a 512 byte buffer size, but the opposite is not true.
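A hypothetical sketch of such a quantization scheme (names and cutpoints are illustrative, not dnsdist's API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the quantization described above. 'floors' holds
// the configured range floors, sorted ascending and non-empty, e.g.
// {512, 1220, 1452}. The advertised size is always rounded DOWN to its
// range floor: a client offering 1219 can safely be served an answer
// generated for 512 bytes, but never the other way around. (RFC 6891
// treats advertised sizes below 512 as 512, so the smallest floor also
// catches those.)
uint16_t quantizePayloadSize(uint16_t advertised, const std::vector<uint16_t>& floors)
{
  uint16_t result = floors.front();
  for (uint16_t floor : floors) {
    if (advertised >= floor) {
      result = floor;
    }
  }
  return result;
}
```

With floors {512, 1220, 1452}, a client offering 1400 is quantized down to 1220 and one offering 1219 down to 512.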

@pacnal (Contributor, Author) commented May 9, 2025

Thanks for the good discussion. I can see the reason for the buffer size to be part of the cache key, while I also see use cases where customers want to share the cache between different buffer size settings. Defining quantized groups looks like a good trade-off for both. In that case I would suggest not hardcoding the quantization, e.g. 512 to 1219 or 512 to 1231. We could provide a vector option (defaulting to empty, meaning the feature is disabled, or perhaps to a pre-defined grouping) so that users can customize the groups based on their own use case. One point is that we may not need to modify the payload size in the query; we just need to round down the payload size when calculating the cache key.

@pacnal changed the title from "dnsdist: support skip hashing AR section for caching" to "dnsdist: add option to support cache sharing between different payload sizes" on May 9, 2025
@edmonds (Contributor) commented May 11, 2025

One point is that we may not need to modify the payload size in the query; we just need to round down the payload size when calculating the cache key.

I think the cache key needs to be based on the EDNS UDP payload size that the client offers in its query. And that value needs to be quantized when searching the cache in order to avoid fragmenting the cache if clients use many different payload sizes. I think it also needs to be quantized when sending a downstream query to a server, to avoid generating a response that is too large for some clients in the quantization range.

E.g., if you have a configured quantization range of 1220 to 1451 bytes and you get a client query with a payload size of 1400 bytes, and that value is sent unchanged to the downstream server, and the server sends a response with an actual size of 1300 bytes, that response cannot be used for clients that use payload sizes in the 1220 to 1299 part of the range. The payload size in the downstream query needs to be reset to 1220 so that the response is guaranteed to be usable for all the clients in the range.
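Reusing the quantizePayloadSize() sketch from the earlier comment, this scenario works out like this:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Repeats the hypothetical quantizePayloadSize() sketch from the earlier
// comment so this worked example compiles on its own.
static uint16_t quantizePayloadSize(uint16_t advertised, const std::vector<uint16_t>& floors)
{
  uint16_t result = floors.front();
  for (uint16_t floor : floors) {
    if (advertised >= floor) {
      result = floor;
    }
  }
  return result;
}

int main()
{
  const std::vector<uint16_t> floors{512, 1220, 1452};
  // A client advertising 1400 falls into the 1220-1451 range; both the
  // cache lookup and the downstream query must use the range floor.
  assert(quantizePayloadSize(1400, floors) == 1220);
  // Querying downstream with 1220 caps the response at 1220 bytes, so the
  // cached entry fits every client in the range. Querying with 1400 could
  // instead yield a 1300-byte response that fails clients at 1220-1299.
  return 0;
}
```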

@pacnal (Contributor, Author) commented May 11, 2025

E.g., if you have a configured quantization range of 1220 to 1451 bytes and you get a client query with a payload size of 1400 bytes, and that value is sent unchanged to the downstream server, and the server sends a response with an actual size of 1300 bytes, that response cannot be used for clients that use payload sizes in the 1220 to 1299 part of the range.

This is covered by the validation of the cached size against the query's payload size before a cached result is sent to the client. If a customer sets up their downstream servers to send results of different lengths for different payload sizes for the same name, then they do not need this feature and should not activate it.

The payload size in the downstream query needs to be reset to 1220 so that the response is guaranteed to be usable for all the clients in the range.

Note that dnsdist provides the capability of modifying the payload size of the query before it gets sent to the downstream server. If we modify the payload size after applying the rules, then it breaks the rule; if we modify it before applying the rules, then it does not stop the described scenario from happening with the new size set by the rule.

@rgacogne self-requested a review May 19, 2025 07:41
@pacnal (Contributor, Author) commented May 30, 2025

Hello, is this one still progressing?

@rgacogne (Member) commented

I still need to review it and I have been quite busy, sorry about that! I'll plan on doing that next week.

@pacnal (Contributor, Author) commented May 30, 2025

OK, no problem, thanks for the reply. Just a quick keep-alive query.

@rgacogne (Member) left a review

Thanks for opening this pull request, and sorry about the review delay! I have to say I was a bit uneasy with this feature, mostly because it only works if the backend is not adding additional records based on the EDNS UDP payload size; otherwise it might actually degrade performance, because a client that could have received a small enough response over UDP will have to retry over TCP instead (if a reply with a bigger payload size is present in the cache). That being said, as long as the feature is not enabled by default this might be an acceptable trade-off, and the code is not too intrusive, so I don't see any reason not to accept it. I have requested some small changes and after that we should be good to go.

@rgacogne (Member) commented

This branch is conflicted because we removed dnsdist-rust-lib/rust/src/lib.rs from the repository in the meantime. It's not hard to fix by rebasing on the current master and just doing git rm dnsdist-rust-lib/rust/src/lib.rs every time there is a reported conflict, but let me know if you need help with that!

@pacnal (Contributor, Author) commented Jun 10, 2025

Hi, thanks for the conflict notice! I wasn't sure if I needed to do anything, since the "Resolve conflicts" button is greyed out for me. It looks like this must be done via the command line, and I do need some help here. After searching, I guess I should do the following:

git checkout master
git pull --rebase upstream
// resolve conflicts by "git rm dnsdist-rust-lib/rust/src/lib.rs"
git push --force-with-lease

Would you help check whether these are the right steps? I'm not quite sure if "--force-with-lease" is the right argument or a must here, but the search results say it should be used.

@rgacogne (Member) commented

What I usually do in this case is:

  • make sure I have an up-to-date version of the upstream master branch: git fetch upstream master (if the name of the upstream repository is upstream, check git remote -v, and if necessary add it with git remote add upstream git@github.com:PowerDNS/pdns.git)
  • switch to the branch I want to rebase: git switch <branch-name>. In your case it looks like the branch name might be master which is a bit confusing but not a problem
  • rebase against the upstream master branch: git rebase -i upstream/master
  • Push the resulting branch with git push --force-with-lease. --force-with-lease is not required but will prevent mistakes by ensuring that the remote branch matches the expected state.

pacnal and others added 5 commits June 11, 2025 00:10
Use case arose where two clients use different UDP payload sizes:
a customized client uses 4096 while the glibc resolver uses 1232.
The user would like to share cached results for the same query name,
type, and class in this case. The downstream servers do not use
ECS and would not return different answers based on other EDNS
options, so this adds an option to support such a use case.
Signed-off-by: Remi Gacogne <remi.gacogne@powerdns.com>
@pacnal (Contributor, Author) commented Jun 11, 2025

Thanks! That was helpful. I followed the steps, and with a few git rm commands the rebase succeeded and I finally pushed.

@coveralls commented

Pull Request Test Coverage Report for Build 15572927645

Details

  • 49 of 59 (83.05%) changed or added relevant lines in 5 files are covered.
  • 29 unchanged lines in 11 files lost coverage.
  • Overall coverage increased (+0.009%) to 65.516%

Changes Missing Coverage:

  File                                                   Covered Lines   Changed/Added Lines   %
  pdns/dnsdistdist/dnsdist-lua-bindings-packetcache.cc   12              14                    85.71%
  pdns/dnsdistdist/dnsdist-configuration-yaml.cc         3               11                    27.27%

Files with Coverage Reduction:

  File                                     New Missed Lines   %
  pdns/dnsdistdist/dnsdist-tcp.cc          1                  77.14%
  modules/godbcbackend/sodbc.cc            2                  77.41%
  modules/lmdbbackend/lmdbbackend.cc       2                  74.59%
  pdns/dnsdistdist/dnsdist-crypto.cc       2                  75.72%
  pdns/misc.cc                             2                  61.39%
  pdns/opensslsigners.cc                   2                  61.27%
  pdns/recursordist/sortlist.cc            2                  72.94%
  modules/gpgsqlbackend/spgsql.cc          3                  68.18%
  pdns/recursordist/test-syncres_cc2.cc    3                  88.91%
  pdns/remote_logger.cc                    3                  57.59%

Totals:

  Change from base Build 15554991653: +0.009%
  Covered Lines: 125573
  Relevant Lines: 163246

💛 - Coveralls
