dnsdist: add option to support cache sharing between different payload sizes by pacnal · Pull Request #15532 · PowerDNS/pdns · GitHub

dnsdist: add option to support cache sharing between different payload sizes #15532


Open · pacnal wants to merge 5 commits into master

Conversation

@pacnal (Contributor) commented May 7, 2025

Short description

A use case arose where two clients use different UDP payload sizes: a customized client uses 4096 while the glibc resolver uses 1232. The user would like to share cached results for the same query name, type, and class in this case. The downstream servers do not use ECS and would not return different answers based on other EDNS options, so this adds an option to support such a use case.
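To illustrate the mechanism, here is a minimal sketch (hypothetical names and offsets, not this PR's actual code) of what ignoring the advertised payload size during cache-key hashing can look like. Per RFC 6891, the requestor's UDP payload size is carried in the OPT record's CLASS field, so zeroing that field in a scratch copy before hashing makes 1232- and 4096-byte clients produce the same key:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not this PR's actual code: blank out the OPT RR's
// CLASS field (which carries the requestor's advertised UDP payload size,
// RFC 6891 section 6.1.2) in a scratch copy of the query before hashing,
// so the cache key no longer depends on the advertised size.
void maskAdvertisedPayloadSize(std::vector<uint8_t>& scratchPacket, size_t optClassOffset)
{
  if (optClassOffset + 2 <= scratchPacket.size()) {
    scratchPacket[optClassOffset] = 0;
    scratchPacket[optClassOffset + 1] = 0;
  }
}
```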

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@pacnal (Contributor, Author) commented May 7, 2025

[Spell checking: pdns/dnsdistdist/docs/reference/config.rst#L1068]
abvoe is not a recognized word. (unrecognized-spelling)

:-( ... waiting for other errors ...

@rgacogne (Member) commented May 7, 2025

I have a few concerns about this feature request: ignoring the whole additional section, including EDNS options, seems like a recipe for disaster, especially since we cannot know which options will be implemented in the future. Even ignoring TSIG is probably a problem.
On the other hand I do see a use-case for trying to be smarter about EDNS UDP buffer size, perhaps by defining a few groups (512, 512-1232, 1232-4096, I have not really thought about it). Doesn't skipping the UDP buffer size altogether mean that we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

@pacnal (Contributor, Author) commented May 8, 2025

I have a few concerns about this feature request: ignoring the whole additional section, including EDNS options, seems like a recipe for disaster

OK, skipping the whole AR section does sound like a shortcut, though it fits many customer environments that are fairly simple and closed. My initial idea was to skip only the AR count and the fixed header of the OPT RR; however, that would involve changing PacketCache::hashAfterQname(), where we do not have an argument available to indicate the need. I also considered adding a special option value to the d_optionsToSkip array to indicate this, but that is still tricky. And this function seems to be used outside of dnsdist, so I'm hesitant to add a new argument to it. Do you think it would be OK to add a new argument (with a default value of false) to PacketCache::hashAfterQname() so that more accurate operations can be done?

Even ignoring TSIG is probably a problem.

I think TSIG should not impact the actual answers, so it should not be critical to include it in the cache key. I have not touched this part yet; does dnsdist perform any security validation, or does it just forward? However, this reminds me that the DO bit might be important: we might not want to return a cached result from a DO=0 client to a DO=1 client. In case other bits become important in the future, it might be good to include the 4 bytes of the OPT RR's TTL field in the key. We can construct a 'faked' TTL (all zeroes) for clients that did not send an OPT RR.
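As a rough sketch of that idea (hypothetical helper, not dnsdist's code):

```cpp
#include <cstdint>

// Hypothetical helper, not dnsdist's code: the 32-bit value mixed into the
// cache key from the OPT RR's TTL field. Per RFC 6891 that TTL packs the
// extended RCODE (8 bits), the EDNS version (8 bits), the DO bit (bit 15 of
// the low 16 bits), and 15 Z bits. Queries without an OPT RR contribute an
// all-zero 'faked' TTL, so a DO=1 query never matches a DO=0 cache entry.
uint32_t optTTLForCacheKey(bool hasOPT, uint32_t optTTL)
{
  return hasOPT ? optTTL : 0;
}
```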

we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

Are we caching TC answers? I have not tested or noticed this yet. If that is the case, then I would think we should not cache TC answers. If TC happens a lot, we should somehow raise an alarm (via a user hook or metrics monitoring) and then configure dnsdist to enlarge its payload size towards the downstream servers.

@dwfreed (Contributor) commented May 8, 2025

I think TSIG should not impact the actual answers? So it should not be critical to be included in cache key?

TSIG absolutely can affect what answer is provided if it is used by the backend to do view selection, which BIND, for example, is capable of. I operate an environment where one criterion used for view selection is which TSIG key the query is signed with, and I have 4 views with the same zone, each of which has very different answers for some queries.

@pacnal (Contributor, Author) commented May 8, 2025

TSIG absolutely can affect what answer is provided if it is used by the backend to do view selection, which BIND, for example, is capable of. I operate an environment where one criterion used for view selection is which TSIG key the query is signed with, and I have 4 views with the same zone, each of which has very different answers for some queries.

That's a fair answer. Speaking of BIND 9 views, they can be configured with many other parameters as well, and PowerDNS can also edit an answer with hooks based on many other factors. That actually provides extra flexibility that the RFCs do not define. Anyway, I agree that skipping the whole AR section seems too heavy; let's seek more accurate control over this.

@omoerbeek (Member) commented

Are we caching TC answers? I have not tested or noticed this yet. If that is the case, then I would think we should not cache TC answers. If TC happens a lot, we should somehow raise an alarm (via a user hook or metrics monitoring) and then configure dnsdist to enlarge its payload size towards the downstream servers.

A recent PR (#15423) introduced (optional) caching of TC=1 answers. On a side note, TC=1 answers are not an error condition, but a protocol feature indicating: switch to TCP. Having a bigger buffer upstream does not mean downstream clients can handle that larger size.

@pacnal (Contributor, Author) commented May 8, 2025

A recent PR (#15423) introduced (optional) caching of TC=1 answers. On a side note, TC=1 answers are not an error condition, but a protocol feature indicating: switch to TCP. Having a bigger buffer upstream does not mean downstream clients can handle that larger size.

Thanks for pointing this out. I did see this parameter name but did not notice that it was introduced recently for the purpose of caching TC answers. This does create a conflict; it looks like the two features are mutually exclusive, even though both try to improve cache efficiency for given use cases.

BTW, it just occurred to me that this PR also adds a handling block: if the cached response size is greater than the client's UDP payload size, dnsdist sends back TC immediately. It can serve clients with a smaller payload size a quick TC so they can switch to TCP, while also serving clients with a larger payload size the right answer.
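A simplified sketch of that behaviour (not the PR's actual code; it assumes the query buffer holds only the header and question section):

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch, not the PR's actual code: when the cached response is
// larger than the client's advertised UDP payload size, reply immediately
// with an empty TC=1 answer so the client retries over TCP. Assumes the
// query holds only the 12-byte header plus the question section.
std::vector<uint8_t> quickTruncatedReply(const std::vector<uint8_t>& query)
{
  if (query.size() < 12) {
    return {}; // not a valid DNS message
  }
  std::vector<uint8_t> reply(query); // keep the ID and question verbatim
  reply[2] |= 0x80;          // QR: this is a response
  reply[2] |= 0x02;          // TC: truncated, retry over TCP
  reply[6] = reply[7] = 0;   // ANCOUNT = 0
  reply[8] = reply[9] = 0;   // NSCOUNT = 0
  reply[10] = reply[11] = 0; // ARCOUNT = 0
  return reply;
}
```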

@pacnal (Contributor, Author) commented May 8, 2025

I just found that d_settings.d_optionsToSkip does not really skip the EDNS option when hashing. It only skips the option's data part; it still requires the option to be present, and with the same data length. Is this the intended design?

@edmonds (Contributor) commented May 8, 2025

On the other hand I do see a use-case for trying to be smarter about EDNS UDP buffer size, perhaps by defining a few groups (512, 512-1232, 1232-4096, I have not really thought about it). Doesn't skipping the UDP buffer size altogether mean that we could end up with only a truncated answer in the cache, if the first query for the name has a low value (512) or no EDNS? And then be stuck with that useless entry even for clients willing to accept a bigger response over UDP?

Yes, the EDNS buffer size does need to be part of the cache key, e.g. RFC 8906 claims:

https://datatracker.ietf.org/doc/html/rfc8906#name-whole-answer-caches

Whole answer caches take a previously constructed answer and return it to a subsequent query for the same question. However, they can return the wrong response if they do not take all of the relevant attributes of the query into account.

In addition to the standard tuple of <qname,qtype,qclass>, a non-exhaustive set of attributes that must be considered include: RD, AD, CD, OPT record, DO, EDNS buffer size, EDNS version, EDNS options, and transport.

When I look at real world distributions of EDNS buffer sizes (specifically at authoritative nameservers) I do see a larger set of unique EDNS buffer sizes than I would like from a caching perspective, but also some buffer sizes are much more common than others. So I do think it makes sense to "quantize" the buffer size offered by the client so that the cache only needs to retain answers for a few common buffer sizes.

For instance, at least in the data I've looked at, there are very few queries that use buffer sizes in the range of 513 to 1219 bytes (inclusive), so quantizing 512-1219 to 512 (that is, if a client offers anywhere from 512-1219, canonicalize it to 512 bytes when querying the downstream / searching the cache) is pretty non-impactful. Of course the tradeoff of doing this kind of quantization is that a client using an unusual buffer size might be more likely to get a TC=1 response when it's not strictly necessary. But it also prevents such clients from polluting the cache with not very useful cache entries.

There are actually a decent number of resolvers out there using 1220, so using 1220 instead of 1232 as the cutpoint for the next quantization range can make sense.

Note that you have to round down to the lower value in the quantization range with a scheme like this. If a client offers 1219 they can be validly served a response that was generated from a query with a 512 byte buffer size, but the opposite is not true.
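A hypothetical sketch of such a quantization scheme (names and cutpoints are illustrative, not dnsdist's API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the quantization described above. 'floors' holds
// the configured range floors, sorted ascending and non-empty, e.g.
// {512, 1220, 1452}. The advertised size is always rounded DOWN to its
// range floor: a client offering 1219 can safely be served an answer
// generated for 512 bytes, but never the other way around. (RFC 6891
// treats advertised sizes below 512 as 512, so the smallest floor also
// catches those.)
uint16_t quantizePayloadSize(uint16_t advertised, const std::vector<uint16_t>& floors)
{
  uint16_t result = floors.front();
  for (uint16_t floor : floors) {
    if (advertised >= floor) {
      result = floor;
    }
  }
  return result;
}
```

With floors {512, 1220, 1452}, a client offering 1400 is quantized down to 1220 and one offering 1219 down to 512.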

@pacnal (Contributor, Author) commented May 9, 2025

Thanks for the good discussion. I can see the reason for the buffer size to be part of the cache key, while I also see use cases where customers want to share the cache between different buffer size settings. Defining quantized groups looks like a good trade-off for both. In that case I would suggest not hardcoding the quantization, e.g. 512 to 1219 or 512 to 1231. We could provide a vector option (defaulting to empty, meaning the feature is disabled, or perhaps to a pre-defined grouping) so that users can customize the groups based on their own use case. One point is that we may not need to modify the payload size in the query; we just need to round down the payload size when calculating the cache key.

@pacnal changed the title from "dnsdist: support skip hashing AR section for caching" to "dnsdist: add option to support cache sharing between different payload sizes" on May 9, 2025
@edmonds (Contributor) commented May 11, 2025

One point is that we may not need to modify the payload size in the query; we just need to round down the payload size when calculating the cache key.

I think the cache key needs to be based on the EDNS UDP payload size that the client offers in its query. And that value needs to be quantized when searching the cache in order to avoid fragmenting the cache if clients use many different payload sizes. I think it also needs to be quantized when sending a downstream query to a server, to avoid generating a response that is too large for some clients in the quantization range.

E.g., if you have a configured quantization range of 1220 to 1451 bytes and you get a client query with a payload size of 1400 bytes, and that value is sent unchanged to the downstream server, and the server sends a response with an actual size of 1300 bytes, that response cannot be used for clients that use payload sizes in the 1220 to 1299 part of the range. The payload size in the downstream query needs to be reset to 1220 so that the response is guaranteed to be usable for all the clients in the range.
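Reusing the quantizePayloadSize() sketch from the earlier comment, this scenario works out like this:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Repeats the hypothetical quantizePayloadSize() sketch from the earlier
// comment so this worked example compiles on its own.
static uint16_t quantizePayloadSize(uint16_t advertised, const std::vector<uint16_t>& floors)
{
  uint16_t result = floors.front();
  for (uint16_t floor : floors) {
    if (advertised >= floor) {
      result = floor;
    }
  }
  return result;
}

int main()
{
  const std::vector<uint16_t> floors{512, 1220, 1452};
  // A client advertising 1400 falls into the 1220-1451 range; both the
  // cache lookup and the downstream query must use the range floor.
  assert(quantizePayloadSize(1400, floors) == 1220);
  // Querying downstream with 1220 caps the response at 1220 bytes, so the
  // cached entry fits every client in the range. Querying with 1400 could
  // instead yield a 1300-byte response that fails clients at 1220-1299.
  return 0;
}
```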

@pacnal (Contributor, Author) commented May 11, 2025

E.g., if you have a configured quantization range of 1220 to 1451 bytes and you get a client query with a payload size of 1400 bytes, and that value is sent unchanged to the downstream server, and the server sends a response with an actual size of 1300 bytes, that response cannot be used for clients that use payload sizes in the 1220 to 1299 part of the range.

This is covered by the validation of the cached size against the query's payload size before a cached result is sent to the client. If a customer sets up their downstream servers to send results of different lengths for different payload sizes for the same name, then they do not need this feature and should not activate it.

The payload size in the downstream query needs to be reset to 1220 so that the response is guaranteed to be usable for all the clients in the range.

Note that dnsdist provides the capability of modifying the payload size of the query before it gets sent to the downstream server. If we modify the payload size after applying the rules, then it breaks the rule; if we modify it before applying the rules, then it does not stop the described scenario from happening with the new size set by the rule.

@rgacogne self-requested a review May 19, 2025 07:41
@pacnal (Contributor, Author) commented May 30, 2025

Hello, is this one still progressing?

@rgacogne (Member) commented

I still need to review it and I have been quite busy, sorry about that! I'll plan on doing that next week.

@pacnal (Contributor, Author) commented May 30, 2025

OK, no problem, thanks for the reply. Just a quick keep-alive query.

@rgacogne (Member) left a review

Thanks for opening this pull request, and sorry about the review delay! I have to say I was a bit uneasy with this feature, mostly because it only works if the backend is not adding additional records based on the EDNS UDP payload size; otherwise it might actually degrade performance, because a client that could have received a small enough response over UDP will have to retry over TCP instead (if a reply with a bigger payload size is present in the cache). That being said, as long as the feature is not enabled by default this might be an acceptable trade-off, and the code is not too intrusive, so I don't see any reason not to accept it. I have requested some small changes and after that we should be good to go.

@rgacogne (Member) commented

This branch is conflicted because we removed dnsdist-rust-lib/rust/src/lib.rs from the repository in the meantime. It's not hard to fix by rebasing on the current master and just doing git rm dnsdist-rust-lib/rust/src/lib.rs every time there is a reported conflict, but let me know if you need help with that!

@pacnal (Contributor, Author) commented Jun 10, 2025

Hi, thanks for the conflict notice! I wasn't sure if I needed to do anything, since the "Resolve conflicts" button is greyed out for me. It looks like this must be done via the command line, and I do need some help here. After searching, I guess I should do the following:

git checkout master
git pull --rebase upstream
// resolve conflicts by "git rm dnsdist-rust-lib/rust/src/lib.rs"
git push --force-with-lease

Would you help check whether these are the right steps? I'm not quite sure if "--force-with-lease" is the right argument or a must here, but the search results say it should be used.

@rgacogne (Member) commented

What I usually do in this case is:

  • make sure I have an up-to-date version of the upstream master branch: git fetch upstream master (if the name of the upstream repository is upstream, check git remote -v, and if necessary add it with git remote add upstream git@github.com:PowerDNS/pdns.git)
  • switch to the branch I want to rebase: git switch <branch-name>. In your case it looks like the branch name might be master which is a bit confusing but not a problem
  • rebase against the upstream master branch: git rebase -i upstream/master
  • Push the resulting branch with git push --force-with-lease. --force-with-lease is not required but will prevent mistakes by ensuring that the remote branch matches the expected state.

pacnal and others added 5 commits June 11, 2025 00:10
Use case arose where two clients use different UDP payload sizes:
a customized client uses 4096 while the glibc resolver uses 1232.
The user would like to share cached results for the same query name,
type, and class in this case. The downstream servers do not use
ECS and would not return different answers based on other EDNS
options, so this adds an option to support such a use case.
Signed-off-by: Remi Gacogne <remi.gacogne@powerdns.com>
@pacnal (Contributor, Author) commented Jun 11, 2025

Thanks! That was helpful. I followed the steps, and with a few git rm commands the rebase succeeded and I finally pushed.

@coveralls commented

Pull Request Test Coverage Report for Build 15572927645

Details

  • 49 of 59 (83.05%) changed or added relevant lines in 5 files are covered.
  • 29 unchanged lines in 11 files lost coverage.
  • Overall coverage increased (+0.009%) to 65.516%

Changes Missing Coverage:

  File                                                   Covered Lines   Changed/Added Lines   %
  pdns/dnsdistdist/dnsdist-lua-bindings-packetcache.cc   12              14                    85.71%
  pdns/dnsdistdist/dnsdist-configuration-yaml.cc         3               11                    27.27%

Files with Coverage Reduction:

  File                                     New Missed Lines   %
  pdns/dnsdistdist/dnsdist-tcp.cc          1                  77.14%
  modules/godbcbackend/sodbc.cc            2                  77.41%
  modules/lmdbbackend/lmdbbackend.cc       2                  74.59%
  pdns/dnsdistdist/dnsdist-crypto.cc       2                  75.72%
  pdns/misc.cc                             2                  61.39%
  pdns/opensslsigners.cc                   2                  61.27%
  pdns/recursordist/sortlist.cc            2                  72.94%
  modules/gpgsqlbackend/spgsql.cc          3                  68.18%
  pdns/recursordist/test-syncres_cc2.cc    3                  88.91%
  pdns/remote_logger.cc                    3                  57.59%

Totals:

  Change from base Build 15554991653: +0.009%
  Covered Lines: 125573
  Relevant Lines: 163246

💛 - Coveralls
