Create Istio owned CNI config #56156

jaellio · 2025-05-03T00:53:45Z

Please provide a description of this PR:

Changes for creating an Istio owned CNI config. Instead of appending the istio-cni plugin to the default CNI config we will copy the CNI config, append the istio-cni plugin and write a new Istio owned config file in the same directory. The config will also be created with a higher priority than the primary CNI. This change helps prevent the race condition described in #55968.

Significant Changes/ Open Questions to Consider:

This PR defines CNIConfName to be the primary CNI filename. Previously this value was used to specify the desired CNI config file name to add the istio-cni plugin to. Now, the file will be used to defined the primary CNI filename and path from where to copy the config.
Should there be a plugin equality check between the primary cni plugins and the istio owned config file plugins?
Should the istio owned cni plugin filename be configurable? Should the file priority be determined programmatically? - UPDATED
If the istio owned cni plugin file already existed when the cni daemon restarts, can we assume the second highest priority config file is the primary CNI?
What is the intended behavior of the unchained CNI plugin setting? Should this PR impact the unchained CNI plugin's behavior?

TODO:

Add istio config cleanup
Consider alternative solutions proposed here Ambient: iptables inconsistently missing for pod(s) upon node reboot #55968 (comment). This may be a temporary solution until RFC - Support safe subdirectory-based plugin conf loading containernetworking/cni#1052 is available
Add more robust testing
Add documentation
Consider edge cases (istio-cni already defined in primary CNI)

istio-testing · 2025-05-03T00:53:48Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

jaellio · 2025-05-12T07:08:34Z

/test all

aaronjwood · 2025-05-13T00:24:02Z

Regarding #55968 (comment):

What happens when you run up against other CNI's that are named 00-*? What if they're 000-*? In summary, how does this address the issue where you end up racing with something else to "be first?"
How will consumers know they're save if this is tucked away in a config? What happens if consumers change their infra and there's another CNI that beats out Istio's, which brings the situation back to problem 1 above? How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?
At one point @howardjohn mentioned the idea of using init containers as a temporary measure to provide strong guarantees, at least until the CNI changes that are needed are available as a baseline for a patch. Did this discussion go any further?

Considering this problem-space touches security, compliance, and packet flows I feel we should provide a very binary guarantee, even if it's for a temporary implementation, and disallow any holes that 1 and 2 bring to the table. Basically, it should be impossible (provided in an automated manner) for something to come up and "start working" outside the confines of a mesh, especially when it can be silent.

keithmattix · 2025-05-13T15:41:22Z

Considering this problem-space touches security, compliance, and packet flows I feel we should provide a very binary guarantee, even if it's for a temporary implementation, and disallow any holes that 1 and 2 bring to the table.

While this is ideal, I'm not sure how possible it is without containernetworking/cni#1052 given that pods can be added to the mesh while they're running. That means that an initcontainer won't work, and it means the hooks that we have post-restart are fairly scarce. I think if this PR allowed the file name to be configurable (like 3) suggests in the PR description), users at least have some control if another CNI has a different name.

How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?

This is the fundamental problem IMO; the CNI spec provides no controls (not even a file/directory lock!) for the kind of binary control we're looking for (well until containernetworking/cni#1052), so istio-cni doesn't even know it's been preempted post reboot. So, I think this PR gets us as close as we can get for now

aaronjwood · 2025-05-13T19:47:09Z

That means that an initcontainer won't work

Could it be injected via a webhook, forcing the pod to restart and run the init container when added to the mesh?

so istio-cni doesn't even know it's been preempted post reboot

My understanding was that there's already some code in place to watch for changes to these CNI configs via inotify. Couldn't inotify be used to provide a kind of hook for this part of the problem?

keithmattix · 2025-05-13T20:01:37Z

Could it be injected via a webhook, forcing the pod to restart and run the init container when added to the mesh?

There is no event for a node restart. There's no safe action istio can do to force the pod to restart once it's been added to the mesh, and even if we could, that's a lot of disruption and negates one of the primary benefits of ambient (not requiring restarts)

Couldn't inotify be used to provide a kind of hook for this part of the problem?

This is the entire problem; no other pods on the cluster can come up before the primary CNI post-reboot because the primary CNI creates the network for the istio-cni pod (especially now that istio-cni runs out of the host network). During that time, there's nothing istio-cni can do to force the primary CNI to wait for istio-cni to come up. It would be nice, but we can't rely on it as the CNI spec doesn't require it

aaronjwood · 2025-05-13T20:25:23Z

There is no event for a node restart.

True, I was thinking there'd be some reconciliation (could this be done in the program mentioned at the bottom of this comment that'd run first, outside the CNI network?) tied to the labeling of the workload, and based on that the container would be injected. So on one side you'd have a reconciled approach and on the other side you'd have a webhook to catch anything new coming up. Adding the container as part of reconcilation sucks, but it at least makes the solution correct and reliable, and closes the holes.

that's a lot of disruption and negates one of the primary benefits of ambient (not requiring restarts)

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

because the primary CNI creates the network for the istio-cni pod (especially now that istio-cni runs out of the host network)

Ah, yes you're right. Forgot that istio-cni is not on the host net! There could be something run outside of that network that could come up first, just to do the watch and to send some sort of message about it to istio-cni (when it eventually comes up). I imagine that would be a lot more (throwaway) work.

keithmattix · 2025-05-13T20:51:58Z

I think what you're suggesting is feasible @aaronjwood, but I think this PR gets us to about the same place as a practical matter. Anybody installing a new CNI onto a cluster is going to be aware/can learn what that config file's name is. From there, you're a helm upgrade away from changing the istio-cni's owned config file name with this PR. You only need to change it again if there's a change to the existing CNI's config naming scheme or a new CNI is added (not to common from my understanding). I'll also add that this is a stopgap until the new CNI versions get plumbed through to containerd and Kubernetes.

jaellio · 2025-05-13T20:57:14Z

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

I think the benefit of the fix in this PR depends on some of the primary CNI specifics you mentioned above and users' knowledge of how this fix interacts with a primary CNI:

What happens when you run up against other CNI's that are named 00-? What if they're 000-? In summary, how does this address the issue where you end up racing with something else to "be first?"

How will consumers know they're save if this is tucked away in a config? What happens if consumers change their infra and there's another CNI that beats out Istio's, which brings the situation back to problem 1 above? How do we avoid silently putting folks back in a state where they can, again silently, be affected by this based on specific triggers?

If there is a CNI that has the requirement of "owning"/controlling the highest priority CNI config and will overwrite any higher priority config then the solution in this PR will never work. I am not sure how often CNI's have this dynamic requirement. In the case where the primary CNI has a fixed priority, this solution works as long as the user properly configures the Istio owned CNI config's priority if necessary (the default of 02 isn't high enough).

jaellio · 2025-05-13T21:05:38Z

At one point @howardjohn mentioned the idea of using init containers as a temporary measure to provide strong guarantees, at least until the CNI changes that are needed are available as a baseline for a patch. Did this discussion go any further?

I agree with this being an alternative, but it modifies the fundamental patterns ambient is based on (not injecting a sidecar or init container into application workloads).

You're right, and the disruption is terrible. I'd argue that the security and compliance holes eclipse that problem. For example, we won't ship to customers with the possibility of ambient mode being enabled due to this issue. A few other folks I know (unrelated to my day job) are backing out of their ambient setups until this is fixed. I see it similar to crypto: if there was a fancy new super safe/fast/compressing cipher but had a hole or two its benefits wouldn't matter since it's fundamentally broken.

I totally hear this though. I think we need to find the balance between a solution that favors complete security and disrupts the existing ambient workflows, and a solution where security is based on the user's proper configuration and knowledge of their underlying CNI and doesn't impact the existing ambient workflow.

aaronjwood · 2025-05-13T21:32:03Z

I am not sure how often CNI's have this dynamic requirement.

While not "dynamic" I believe Multus has the requirement of being first. Last I looked their config is prefixed with 00. I'm not sure what'll happen if something loads before it, haven't really dug in there. To your point, I'm not sure which other CNIs in the landscape (who knows about closed source ones used inside company x, etc.) would run into this. I'd imagine the whole "starting first" requirement to not be overly rare.

and will overwrite any higher priority config

I'm not sure how the design of CNI holds together for this situation then. For example, if Multus really does have a hard requirement on being first what else can they do besides say "we don't work with anything that tries to get above us" and punt the problem? To @keithmattix's point I totally get that this is all a temporary solution until the CNI change is available for use.

In the case where the primary CNI has a fixed priority, this solution works

This was going to be my next question. Currently we say Istio's CNI works with anything, with this change what'll be the messaging for folks using Multus (or something similar) which requires to be first? This would account for both OSS and proprietary solutions so anything is on the table.

I totally hear this though. I think we need to find the balance between a solution that favors complete security and disrupts the existing ambient workflows, and a solution where security is based on the user's proper configuration and knowledge of their underlying CNI and doesn't impact the existing ambient workflow.

Agreed, in a perfect reality the change to CNI will become available immediately and this whole intermediate state we're in can be skipped :) I understand the situation is a bit tricky logistically and overall quite annoying!

jaellio · 2025-05-13T22:01:18Z

This was going to be my next question. Currently we say Istio's CNI works with anything, with this change what'll be the messaging for folks using Multus (or something similar) which requires to be first? This would account for both OSS and proprietary solutions so anything is on the table.

One thought here is an Istio owned CNI config could be opt in...but if users opt out they're vulnerable (which is definitely not ideal)

jaellio · 2025-05-13T22:03:13Z

While not "dynamic" I believe Multus has the requirement of being first. Last I looked their config is prefixed with 00. I'm not sure what'll happen if something loads before it, haven't really dug in there. To your point, I'm not sure which other CNIs in the landscape (who knows about closed source ones used inside company x, etc.) would run into this. I'd imagine the whole "starting first" requirement to not be overly rare.

I'll do some more investigation here to see which CNIs this solution might be incompatible with.

howardjohn · 2025-05-14T01:55:16Z

A few incomplete notes ..

For multus users I don't think they need any fix , pods in multus explicitly declare which network they need(?) though maybe that doesn't work for ambient...
For init container I don't think it's bad as an option. It won't work for dynamically enrolled pods but that's largely fine.. if you opt into the extra layer of enforcement then you can handle restarting your pods to onboard. Note if we do this we need to make the repair controller ambient aware.
Given the high risk of environmental incompatibilities this should definitely be optional. I guess tbd what the default is based on some more investigation
thanks for leading this

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

configurable Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

keithmattix · 2025-06-09T18:24:00Z

/test integ-ambient-mc

jaellio · 2025-06-16T18:22:51Z

@howardjohn could you PTAL?

jaellio · 2025-06-23T17:13:50Z

Per discussion in the WG meeting on 6/18 we are moving forward with this temporary solution for 1.27 @howardjohn

istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 3, 2025

jaellio marked this pull request as ready for review May 12, 2025 17:55

jaellio requested a review from a team as a code owner May 12, 2025 17:55

istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 12, 2025

jaellio changed the title ~~[Proposal Draft] Create Istio owned CNI config~~ Create Istio owned CNI config May 12, 2025

jaellio added the do-not-merge Block automatic merging of a PR. label May 13, 2025

jaellio added 3 commits June 4, 2025 17:22

Create Istio owned CNI config

373e226

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

Set CNIConfName to primary CNI config file name

6901820

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

Make Istio owned cni config opt in and the config name

21e5fb4

configurable Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

jaellio force-pushed the jaellio/fixcnirace branch from 31badbb to 21e5fb4 Compare June 6, 2025 21:01

jaellio requested a review from a team as a code owner June 6, 2025 21:01

Add warning for invalid istio own cni config filename priority

6734e2d

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2025

Add release note

acdb74b

Signed-off-by: Jackie Elliott <jaellio@microsoft.com>

jaellio requested a review from howardjohn June 6, 2025 23:59

howardjohn approved these changes Jun 24, 2025

View reviewed changes

zirkome mentioned this pull request Jun 25, 2025

Ambient: iptables inconsistently missing for pod(s) upon node reboot #55968

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create Istio owned CNI config #56156

Create Istio owned CNI config #56156

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Create Istio owned CNI config #56156

Are you sure you want to change the base?

Create Istio owned CNI config #56156

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!