8000 Create Istio owned CNI config by jaellio · Pull Request #56156 · istio/istio · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Create Istio owned CNI config #56156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jaellio
Copy link
Contributor
@jaellio jaellio commented May 3, 2025

Please provide a description of this PR:

Changes for creating an Istio owned CNI config. Instead of appending the istio-cni plugin to the default CNI config we will copy the CNI config, append the istio-cni plugin and write a new Istio owned config file in the same directory. The config will also be created with a higher priority than the primary CNI. This change helps prevent the race condition described in #55968.

Significant Changes/ Open Questions to Consider:

  1. This PR defines CNIConfName to be the primary CNI filename. Previously this value was used to specify the desired CNI config file name to add the istio-cni plugin to. Now, the file will be used to defined the primary CNI filename and path from where to copy the config.
  2. Should there be a plugin equality check between the primary cni plugins and the istio owned config file plugins?
  3. Should the istio owned cni plugin filename be configurable? Should the file priority be determined programmatically? - UPDATED
  4. If the istio owned cni plugin file already existed when the cni daemon restarts, can we assume the second highest priority config file is the primary CNI?
  5. What is the intended behavior of the unchained CNI plugin setting? Should this PR impact the unchained CNI plugin's behavior?

TODO:

@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 3, 2025
@jaellio
Copy link
Contributor Author
jaellio commented May 12, 2025

/test all

@jaellio jaellio marked this pull request as ready for review May 12, 2025 17:55
@jaellio jaellio requested a review from a team as a code owner May 12, 2025 17:55
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 12, 2025
@jaellio jaellio changed the title [Proposal Draft] Create Istio owned CNI config Create Istio owned CNI config May 12, 2025
@aaronjwood
Copy link
Contributor

Regarding #55968 (comment):

  1. What happens when you run up against other CNI's that are named 00-*? What if they're 000-*? In summary, how does this address the issue where you end up racing with something else to "be first?"
  2. How will consumers know they're save if this is tucked away in a config? What happens if consumers change their infra and there's another CNI that beats out Istio's, which brings the situation back to problem 1 above? How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?
  3. At one point @howardjohn mentioned the idea of using init containers as a temporary measure to provide strong guarantees, at least until the CNI changes that are needed are available as a baseline for a patch. Did this discussion go any further?

Considering this problem-space touches security, compliance, and packet flows I feel we should provide a very binary guarantee, even if it's for a temporary implementation, and disallow any holes that 1 and 2 bring to the table. Basically, it should be impossible (provided in an automated manner) for something to come up and "start working" outside the confines of a mesh, especially when it can be silent.

@keithmattix
Copy link
Contributor

Considering this problem-space touches security, compliance, and packet flows I feel we should provide a very binary guarantee, even if it's for a temporary implementation, and disallow any holes that 1 and 2 bring to the table.

While this is ideal, I'm not sure how possible it is without containernetworking/cni#1052 given that pods can be added to the mesh while they're running. That means that an initcontainer won't work, and it means the hooks that we have post-restart are fairly scarce. I think if this PR allowed the file name to be configurable (like 3) suggests in the PR description), users at least have some control if another CNI has a different name.

How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?

This is the fundamental problem IMO; the CNI spec provides no controls (not even a file/directory lock!) for the kind of binary control we're looking for (well until containernetworking/cni#1052), so istio-cni doesn't even know it's been preempted post reboot. So, I think this PR gets us as close as we can get for now

@aaronjwood
Copy link
Contributor

That means that an initcontainer won't work

Could it be injected via a webhook, forcing the pod to restart and run the init container when added to the mesh?

so istio-cni doesn't even know it's been preempted post reboot

My understanding was that there's already some code in place to watch for changes to these CNI configs via inotify. Couldn't inotify be used to provide a kind of hook for this part of the problem?

@keithmattix
Copy link
Contributor

Could it be injected via a webhook, forcing the pod to restart and run the init container when added to the mesh?

There is no event for a node restart. There's no safe action istio can do to force the pod to restart once it's been added to the mesh, and even if we could, that's a lot of disruption and negates one of the primary benefits of ambient (not requiring restarts)

Couldn't inotify be used to provide a kind of hook for this part of the problem?

This is the entire problem; no other pods on the cluster can come up before the primary CNI post-reboot because the primary CNI creates the network for the istio-cni pod (especially now that istio-cni runs out of the host network). During that time, there's nothing istio-cni can do to force the primary CNI to wait for istio-cni to come up. It would be nice, but we can't rely on it as the CNI spec doesn't require it

@aaronjwood
Copy link
Contributor
aaronjwood commented May 13, 2025

There is no event for a node restart.

True, I was thinking there'd be some reconciliation (could this be done in the program mentioned at the bottom of this comment that'd run first, outside the CNI network?) tied to the labeling of the workload, and based on that the container would be injected. So on one side you'd have a reconciled approach and on the other side you'd have a webhook to catch anything new coming up. Adding the container as part of reconcilation sucks, but it at least makes the solution correct and reliable, and closes the holes.

that's a lot of disruption and negates one of the primary benefits of ambient (not requiring restarts)

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

because the primary CNI creates the network for the istio-cni pod (especially now that istio-cni runs out of the host network)

Ah, yes you're right. Forgot that istio-cni is not on the host net! There could be something run outside of that network that could come up first, just to do the watch and to send some sort of message about it to istio-cni (when it eventually comes up). I imagine that would be a lot more (throwaway) work.

@keithmattix
Copy link
Contributor

I think what you're suggesting is feasible @aaronjwood, but I think this PR gets us to about the same place as a practical matter. Anybody installing a new CNI onto a cluster is going to be aware/can learn what that config file's name is. From there, you're a helm upgrade away from changing the istio-cni's owned config file name with this PR. You only need to change it again if there's a change to the existing CNI's config naming scheme or a new CNI is added (not to common from my understanding). I'll also add that this is a stopgap until the new CNI versions get plumbed through to containerd and Kubernetes.

@jaellio
Copy link
Contributor Author
jaellio commented May 13, 2025

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

I think the benefit of the fix in this PR depends on some of the primary CNI specifics you mentioned above and users' knowledge of how this fix interacts with a primary CNI:

  1. What happens when you run up against other CNI's that are named 00-? What if they're 000-? In summary, how does this address the issue where you end up racing with something else to "be first?"
  2. How will consumers know they're save if this is tucked away in a config? What happens if consumers change their infra and there's another CNI that beats out Istio's, which brings the situation back to problem 1 above? How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?

If there is a CNI that has the requirement of "owning"/controlling the highest priority CNI config and will overwrite any higher priority config then the solution in this PR will never work. I am not sure how often CNI's have this dynamic requirement. In the case where the primary CNI has a fixed priority, this solution works as long as the user properly configures the Istio owned CNI config's priority if necessary (the default of 02 isn't high enough).

@jaellio
Copy link
Contributor Author
jaellio commented May 13, 2025

At one point @howardjohn mentioned the idea of using init containers as a temporary measure to provide strong guarantees, at least until the CNI changes that are needed are available as a baseline for a patch. Did this discussion go any further?

I agree with this being an alternative, but it modifies the fundamental patterns ambient is based on (not injecting a sidecar or init container into application workloads).

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

I totally hear this though. I think we need to find the balance between a solution that favors complete security and disrupts the existing ambient workflows, and a solution where security is based on the user's proper configuration and knowledge of their underlying CNI and doesn't impact the existing ambient workflow.

@jaellio jaellio added the do-not-merge Block automatic merging of a PR. label May 13, 2025
@aaronjwood
Copy link
Contributor

I am not sure how often CNI's have this dynamic requirement.

While not "dynamic" I believe Multus has the requirement of being first. Last I looked their config is prefixed with 00. I'm not sure what'll happen if something loads before it, haven't really dug in there. To your point, I'm not sure which other CNIs in the landscape (who knows about closed source ones used inside company x, etc.) would run into this. I'd imagine the whole "starting first" requirement to not be overly rare.

and will overwrite any higher priority config

I'm not sure how the design of CNI holds together for this situation then. For example, if Multus really does have a hard requirement on being first what else can they do besides say "we don't work with anything that tries to get above us" and punt the problem? To @keithmattix's point I totally get that this is all a temporary solution until the CNI change is available for use.

In the case where the primary CNI has a fixed priority, this solution works

This was going to be my next question. Currently we say Istio's CNI works with anything, with this change what'll be the messaging for folks using Multus (or something similar) which requires to be first? This would account for both OSS and proprietary solutions so anything is on the table.

I totally hear this though. I think we need to find the balance between a solution that favors complete security and disrupts the existing ambient workflows, and a solution where security is based on the user's proper configuration and knowledge of their underlying CNI and doesn't impact the existing ambient workflow.

Agreed, in a perfect reality the change to CNI will become available immediately and this whole intermediate state we're in can be skipped :) I understand the situation is a bit tricky logistically and overall quite annoying!

@jaellio
Copy link
Contributor Author
jaellio commented May 13, 2025

This was going to be my next question. Currently we say Istio's CNI works with anything, with this change what'll be the messaging for folks using Multus (or something similar) which requires to be first? This would account for both OSS and proprietary solutions so anything is on the table.

One thought here is an Istio owned CNI config could be opt in...but if users opt out they're vulnerable (which is definitely not ideal)

@jaellio
Copy link
Contributor Author
jaellio commented May 13, 2025

While not "dynamic" I believe Multus has the requirement of being first. Last I looked their config is prefixed with 00. I'm not sure what'll happen if something loads before it, haven't really dug in there. To your point, I'm not sure which other CNIs in the landscape (who knows about closed source ones used inside company x, etc.) would run into this. I'd imagine the whole "starting first" requirement to not be overly rare.

I'll do some more investigation here to see which CNIs this solution might be incompatible with.

@howardjohn
Copy link
Member

A few incomplete notes ..

  • For multus users I don't think they need any fix , pods in multus explicitly declare which network they need(?) though maybe that doesn't work for ambient...
  • For init container I don't think it's bad as an option. It won't work for dynamically enrolled pods but that's largely fine.. if you opt into the extra layer of enforcement then you can handle restarting your pods to onboard. Note if we do this we need to make the repair controller ambient aware.
  • Given the high risk of environmental incompatibilities this should definitely be optional. I guess tbd what the default is based on some more investigation
    thanks for leading this

jaellio added 3 commits June 4, 2025 17:22
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
configurable

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@jaellio jaellio force-pushed the jaellio/fixcnirace branch from 31badbb to 21e5fb4 Compare June 6, 2025 21:01
@jaellio jaellio requested a review from a team as a code owner June 6, 2025 21:01
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2025
Signed-off-by: Jackie Elliott <jaellio@microsoft.com>
@jaellio jaellio requested a review from howardjohn June 6, 2025 23:59
@keithmattix
Copy link
Contributor

/test integ-ambient-mc

@jaellio
Copy link
Contributor Author
jaellio commented Jun 16, 2025

@howardjohn could you PTAL?

@jaellio
Copy link
Contributor Author
jaellio commented Jun 23, 2025

Per discussion in the WG meeting on 6/18 we are moving forward with this temporary solution for 1.27 @howardjohn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Block automatic merging of a PR. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0