8000 Allow direct routing to container ports from trusted interfaces by robmry · Pull Request #49832 · moby/moby · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Allow direct routing to container ports from trusted interfaces #49832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 1, 2025

Conversation

robmry
Copy link
Contributor
@robmry robmry commented Apr 17, 2025

- What I did

Since 28.0.0, docker's iptables rules will block packets addressed to a container's address, unless they originate from the host.

The issue linked above describes a case where this is undesirable (Flannel VXLAN provides an overlay network, and packets arriving on its interface should be treated as belonging to the bridge network). The existing workarounds are:

  • set env var DOCKER_INSECURE_NO_IPTABLES_RAW=1
    • but this enables access to published ports from any interface, for containers in all bridge networks.
    • there's no plan for an nftables equivalent for that env var, it was only added as a workaround for kernels without iptables "raw" support, and "raw" rules aren't special in nftables.
  • use gateway mode nat-unprotected
    • but that enables routed access to any port (published or not), on any container in the network 8000 , from any network interface.

Follow-ups needed:

  • update the CLI repo's dockerd.md
  • docs updates

- How I did it

bridge: add option com.docker.network.bridge.trusted_host_interfaces

Interfaces listed have access to published ports on the container's address - enabling direct routing to the container via those interfaces.

It accepts a colon-separated list (comma is allowed in an interface name, colon isn't).

For example:

docker network create -o com.docker.network.bridge.trusted_host_interfaces="flannel.1:eth42" mynet

(Like other options for user-defined networks, it's not possible to supply this option for the default bridge network.)

Add daemon option --allow-direct-routing

This daemon level option disables direct access filtering, enabling direct access to published ports on container addresses in all bridge networks, via all host interfaces.

For example, in /etc/docker/daemon.json:

  "allow-direct-routing": true

It overlaps with short-term env-var workaround DOCKER_INSECURE_NO_IPTABLES_RAW=1, but:

  • it does not allow packets sent from outside the host to reach ports published only to 127.0.0.1.
  • it will outlive iptables (the workaround was initially intended for hosts that do not have kernel support for the "raw" iptables table).

- How to verify it

Updated integration tests.

- Human readable description for the release notes

- Add daemon option `"allow-direct-routing"` to disable filtering of packets from outside the host addressed directly to containers.
- Add bridge network option `"com.docker.network.bridge.trusted_host_interfaces"`, accepting a colon-separated list of interface names. These interfaces have direct access to published ports on container IP addresses.

@robmry robmry added this to the 28.1.1 milestone Apr 17, 2025
@robmry robmry self-assigned this Apr 17, 2025
@robmry robmry force-pushed the trusted_interfaces branch 4 times, most recently from 44ca6dc to c1d50bc Compare April 17, 2025 18:37
@robmry robmry closed this Apr 17, 2025
@robmry robmry reopened this Apr 17, 2025
@robmry robmry force-pushed the trusted_interfaces branch from c1d50bc to 29b4c52 Compare April 17, 2025 19:37
@vvoland vvoland modified the milestones: 28.1.1, 28.1.2 Apr 18, 2025
@robmry robmry force-pushed the trusted_interfaces branch from 29b4c52 to eec99ec Compare April 22, 2025 10:27
@robmry robmry modified the milestones: 28.1.2, 28.2.0 Apr 23, 2025
@robmry robmry force-pushed the trusted_interfaces branch from eec99ec to 79c88a3 Compare April 24, 2025 14:01
@robmry robmry marked this pull request as ready for review April 24, 2025 14:09
Comment on lines +27 to +29
// TrustedHostInterfaces can be used to supply a list of host interfaces that are
// allowed direct access to published ports on a container's address.
TrustedHostInterfaces = "com.docker.network.bridge.trusted_host_interfaces"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly related to this PR, but I think we should consider adding CLI flags that would set these options and make them more discoverable.
Alternatively, at least list them in the description of --opt in the network create --help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed - they are terrible ... the worst part is we can't currently error when a -o option isn't matched because it contains a typo - it'll still appear in inspect output, but just silently won't do what was expected.

Copy link
Contributor
@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vvoland
Copy link
Contributor
vvoland commented Apr 24, 2025

Should we consider removing DOCKER_INSECURE_NO_IPTABLES_RAW now?

@robmry
Copy link
Contributor Author
robmry commented Apr 24, 2025

Should we consider removing DOCKER_INSECURE_NO_IPTABLES_RAW now?

Not yet unfortunately ... it's still needed to stop the daemon from trying to create iptables "raw" rules on hosts that don't have kernel support for it (the original purpose of that env var).

This new flag only stops it from creating the raw rule that blocks remote access to container ports, not the one that blocks access remote access to ports published the host's IPv4 loopback address. (And it doesn't stop the daemon from trying to delete from the raw table, which would also be an error without the kernel module.)

@robmry robmry mentioned this pull request Apr 24, 2025
@@ -137,6 +138,7 @@ func (ncfg *networkConfiguration) MarshalJSON() ([]byte, error) {
nMap["GwModeIPv4"] = ncfg.GwModeIPv4
nMap["GwModeIPv6"] = ncfg.GwModeIPv6
nMap["EnableICC"] = ncfg.EnableICC
nMap["TrustedHostInterfaces"] = strings.Join(ncfg.TrustedHostInterfaces, " ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply marshal the slice of strings as a JSON array of strings? The persistence of bridge network configurations is so far removed from parsing the netlabels that I am unconvinced that there would be any maintenance or code readability advantage to using the same serialization scheme in the persistence layer. Marshaling to a JSON array has the advantage of being compatible with the default un/marshalling of a Go []string struct field. With a stringly-typed list we would have to define a custom type that implements json.Marshaler and json.Unmarshaler in order to un/marshal directly to/from a struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a stringly-typed list we would have to define a custom type that implements json.Marshaler and json.Unmarshaler in order to un/marshal directly to/from a struct.

Yes, exactly. Although, it'd marshal without needing a custom type so the other way to be to unmarshal by type asserting the value to []interface{}, then iterate over that and type assert the elements to strings.

So, "simply marshal the slice of strings as a JSON array of strings" isn't as simple as simply storing a string!

(I've added a commit with a mashal/unmarshal unit test for networkConfiguration as there didn't seem to be one.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other way to be to unmarshal by type asserting the value to []interface{}, then iterate over that and type assert the elements to strings.

Fair point. Though just for completeness' sake, we do have a "./internal/sliceutil".Map function which makes that operation not much more verbose than a strings.FieldsFunc call.

sliceutil.Map(v, func(a any) string { s, _ := a.(string); return s })

@robmry robmry force-pushed the trusted_interfaces branch from 2170160 to 5075be7 Compare April 30, 2025 11:45
@robmry robmry requested a review from corhere April 30, 2025 11:47
@@ -337,6 +340,8 @@ func (c *networkConfiguration) fromLabels(labels map[string]string) error {
if c.EnableICC, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case TrustedHostInterfaces:
c.TrustedHostInterfaces = strings.FieldsFunc(value, func(r rune) bool { return r == ':' })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FieldsFunc splits the string s at each run of Unicode code points...

(emphasis mine.)

value is a bag of bytes, not a Unicode string. While I can't come up with an pathological input which would split in an unexpected way, I still worry about the risk of surprising behaviours from treating the value as a UTF-8 string.

Suggested change
c.TrustedHostInterfaces = strings.FieldsFunc(value, func(r rune) bool { return r == ':' })
c.TrustedHostInterfaces = strings.Split(value, ":")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite that, because strings.Split("", ":") is [""] ... so the bridge driver sees an empty interface name and generates a broken iptables rule. But, ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's annoying. I overlooked that difference in behaviour in the godoc. Since my worry is more theoretical than anything at this point, I'm fine with reverting to strings.FieldsFunc until I can prove the existence of a pathological input that would round-trip incorrectly. Such an input may not exist. Now excuse me while I whip up a quick fuzz test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fuzzing found nothing. My fears were unfounded. strings.FieldsFunc is fit for purpose, as far as I can tell, to split a bag-of-bytes string on a UTF-8 codepoint. And it makes sense once I think about it: UTF-8 is self-synchronizing so there is no confusing an ASCII char for part of a multibyte sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks Cory - I've put it back to FieldsFunc, and rebased just because it had got a bit behind. I'll let the tests run, then merge.

@@ -215,6 +217,10 @@ func (ncfg *networkConfiguration) UnmarshalJSON(b []byte) error {
ncfg.GwModeIPv6, _ = newGwMode(v.(string))
}
ncfg.EnableICC = nMap["EnableICC"].(bool)
if v, ok := nMap["TrustedHostInterfaces"]; ok {
s, _ := v.(string)
ncfg.TrustedHostInterfaces = strings.FieldsFunc(s, func(r rune) bool { return r == ':' })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ncfg.TrustedHostInterfaces = strings.FieldsFunc(s, func(r rune) bool { return r == ':' })
ncfg.TrustedHostInterfaces = strings.Split(s, ":")

GwModeIPv4 gwMode
GwModeIPv6 gwMode
EnableICC bool
TrustedHostInterfaces []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should write down the preconditions that other code depends upon so the next person knows not to break the contract.

Suggested change
TrustedHostInterfaces []string
TrustedHostInterfaces []string // Interface names must not contain ':' characters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, just in case Linux starts allowing ':' chars!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the bridge_store code which would not faithfully round-trip names that contain colon characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure - I added the comment. (But, it's still a list of interface names, and it's a Linux rule that interface names can't contain ':' chars.)

@robmry robmry force-pushed the trusted_interfaces branch from 5075be7 to a62fe56 Compare April 30, 2025 17:21
@robmry robmry requested a review from corhere April 30, 2025 17:27
robmry added 3 commits April 30, 2025 20:59
trusted_host_interface have access to published ports on container
addresses - enabling direct routing to the container via those
interfaces.

Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
Per-network option com.docker.network.bridge.trusted-host-interfaces
accepts a list of interfaces that are allowed to route
directly to a container's published ports in a bridge
network with nat enabled.

This daemon level option disables direct access filtering,
enabling direct access to published ports on container
addresses in all bridge networks, via all host interfaces.

It overlaps with short-term env-var workaround:
  DOCKER_INSECURE_NO_IPTABLES_RAW=1
- it does not allow packets sent from outside the host to reach
  ports published only to 127.0.0.1
- it will outlive iptables (the workaround was initially intended
  for hosts that do not have kernel support for the "raw" iptables
  table).

Signed-off-by: Rob Murray <rob.murray@docker.com>
@robmry robmry force-pushed the trusted_interfaces branch from a62fe56 to 44a3453 Compare April 30, 2025 19:59
@robmry robmry merged commit fa23123 into moby:master May 1, 2025
141 checks passed
@robmry robmry deleted the trusted_interfaces branch May 14, 2025 09:46
ArthurFlag pushed a commit to docker/docs that referenced this pull request Jun 3, 2025
…es" (#22601)

## Description

~**This update is for moby 28.2.0 - do not merge until it ships.**~ -
28.2 has shipped now.

Add description for daemon option `--allow-direct-routing` and network
option `com.docker.network.bridge.trusted_host_interfaces`.

## Related issues or tickets

moby/moby#49832

## Reviews

<!-- Notes for reviewers here -->
<!-- List applicable reviews (optionally @tag reviewers) -->

- [ ] Technical review
- [ ] Editorial review
- [ ] Product review

Signed-off-by: Rob Murray <rob.murray@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inter host container communication is not working for Publish ports with Flannel VXLAN in Docker V28.0
3 participants
0