8000 cmd/k8s-operator, k8s-operator: support direct connections on ProxyGroups by ChaosInTheCRD · Pull Request #16115 · tailscale/tailscale · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

cmd/k8s-operator, k8s-operator: support direct connections on ProxyGroups #16115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

ChaosInTheCRD
Copy link
Contributor

updates: #14674

Copy link
review-ai-agent bot commented May 28, 2025

Pull Request Revisions

RevisionDescription
r18
Modified constant service name valueReplaced invalidSvcName constant with testSvcName in ports configuration
r17
Updated comment for filter predicateModified comment to clarify predicate function's filtering logic for Tailscale managed objects
r16
Inverted object managed resource filterModified predicate function to negate the original isManagedResource check in the Kubernetes operator filter
r15
Port name changed in service configurationModified service port name from directConnPortName to staticEndpointPortName in test configuration
r14
Added node access and refactoredModified ProxyGroup and operator logic to improve nodeport service creation, port allocation, and added node resource access permissions
13 more revisions
r13
Added comment about tailnet listenerAdded a documentation comment with a reference link explaining Tailscale device connectivity for the TailnetListenerConfig field
r12
Updated GitHub reference linkUpdated the GitHub link to a specific commit for Kubernetes port range allocator reference
r11
Updated TailnetListenerConfig listener type commentModified comment for TailnetListenerConfig to clarify NodePort is currently the only supported listener type
r10
Fixed JSON tag for Tailnet EndpointsCorrected JSON tag from endpoint to endpoints in TailnetListener struct definition
r9No changes since last revision
r8
Added node filter for ProxyGroupImplemented nodeHandlerForProxyGroup to create reconcile requests for ProxyGroups based on Node labels
r7
Added NodePort-based tailnet listener configurationIntroduced new ProxyClass configuration for NodePort-based tailnet listeners, enabling direct connections from other Tailnet devices by dynamically allocating and managing NodePort services across cluster nodes
r6
Added copyright and license headersInserted Tailscale copyright and BSD-3-Clause license header at the top of the ports.go file
r5
Port range validation logic modifiedModified port range validation to handle Kubernetes NodePort range detection more gracefully, adding logging and removing strict error handling
r4
Port allocation and validation refactoredIntroduced new ports.go with port range validation, updated port allocation logic in proxygroup.go, and improved NodePort configuration checks
r3
Added NodePort-based tailnet listener configurationIntroduced new ProxyClass configuration for NodePort-based tailnet listeners, enabling direct connections via external node IPs and configurable port ranges with node label selectors
r2
Enhanced Tailnet Listener Configuration DescriptionsAdded detailed documentation and comments for Tailnet Listener configuration, including NodePort configuration and type descriptions
r1
Added NodePort support for ProxyGroupsImplemented functionality to expose ProxyGroup replicas via NodePort, including port range allocation, node selection, and static endpoint configuration for Tailscale proxy instances

☑️ AI review skipped after 5 revisions, comment with `/review` to review again
Help React with emojis to give feedback on AI-generated reviews:
  • 👍 means the feedback was helpful and actionable
  • 👎 means the feedback was incorrect or unhelpful
💬 Replying to feedback with a comment helps us improve the system. Your input also contributes to shaping future interactions with the AI reviewer.

We'd love to hear from you—reach out anytime at team@review.ai.

@ChaosInTheCRD ChaosInTheCRD requested review from irbekrm and tomhjp and removed request for irbekrm May 28, 2025 12:35
@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from a6d809b to 8085b80 Compare May 28, 2025 16:26
@ChaosInTheCRD
Copy link
Contributor Author
ChaosInTheCRD commented May 30, 2025

Just want to add some notes for this PR:

  • Currently only one staticEndpoints is picked up by tailscaled, even though the config field takes a slice. Ideally we want to fix this so that multiple endpoints can be advertised by tailscaled. (not necessary for this PR)
  • We need to ensure that rather than erroring out we are updating the ProxyGroup / ProxyClass resource where appropriate. Otherwise further reconciles could be triggered.
  • We need to update the status of the ProxyGroup to clearly display the staticEndpoints that have been configured (and therefore the firewall rules that need to be applied for things to be functional).

@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from 8085b80 to dd9850a Compare June 4, 2025 13:27
Comment on lines +737 to +769
func (r *ProxyGroupReconciler) findStaticEndpoints(ctx context.Context, port int32, proxyClass *tsapi.ProxyClass, logger *zap.SugaredLogger) ([]netip.AddrPort, error) {
if len(proxyClass.Spec.TailnetListenerConfig.NodePortConfig.Selector) < 1 {
logger.Debugf("no LabelSelectors specified on tailnetListenerConfig for ProxyClass %q. Listing all nodes")
}

nodes := new(corev1.NodeList)
selectors := client.MatchingLabels(proxyClass.Spec.TailnetListenerConfig.NodePortConfig.Selector)

err := r.List(ctx, nodes, selectors)
if err != nil {
return nil, fmt.Errorf("failed to list nodes: %w", err)
}

if len(nodes.Items) == 0 {
err := fmt.Errorf("failed to match nodes to configured NodeSelectors in TailnetListenerConfig")
logger.Warn(err.Error())
return nil, err
}

endpoints := []netip.AddrPort{}

for _, n := range nodes.Items {
for _, a := range n.Status.Addresses {
if a.Type == corev1.NodeExternalIP {
addrPort := fmt.Sprintf("%s:%d", a.Address, port)
i, err := netip.ParseAddrPort(addrPort)
if err != nil {
logger.Debugf("failed to parse external address on node %q: %q", n.Name, addrPort)
continue
}
logger.Debugf("adding endpoint %q to staticEndpoints config", addrPort)
endpoints = append(endpoints, i)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the findStaticEndpoints function, there's no limit on the number of endpoints collected. For clusters with many nodes, this could result in a very large list. Consider adding a limit or pagination mechanism to avoid potential performance issues with large node counts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm interesting. Going to leave this comment open for now. At the moment we could just limit it to 1 node since tailscaled won't leverage more than one anyway (but this will change in the future).

@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from 4132310 to d75f0bb Compare June 5, 2025 12:35
…oups

updates: #14674

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from 5a166f2 to 66a43ea Compare June 10, 2025 09:49
Copy link
Member
@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the many comments. Please could you also add a more detailed PR description? As well as any specific decisions made in the implementation, it would also be super helpful to include some example config for how to try it out.

@@ -104,6 +104,10 @@ func (r *HAServiceReconciler) Reconcile(ctx context.Context, req reconcile.Reque
hostname := nameForService(svc)
logger = logger.With("hostname", hostname)

if isManagedResource(svc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we filter these in operator.go instead, so we never get into the Reconcile function in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I've gone ahead and done this. Will need to make sure we test it because we haven't done this for any other reconcilers. Also good to note that we haven't done this for other reconcilers, would be good to do this across the board where necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping this open so that I double check that this is working as expected. Also meaning to circle back and test the behaviour of the reconciler (I was seeing a lot of noise)

@@ -206,6 +215,164 @@ func (r *ProxyGroupReconciler) Reconcile(ctx context.Context, req reconcile.Requ
return setStatusReady(pg, metav1.ConditionTrue, reasonProxyGroupReady, reasonProxyGroupReady)
}

func getProxyGroupSvcPorts(ctx context.Context, c client.Client, namespace string, usedPorts map[int32]bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading correctly I think the map[int32]bool type would be better off as a Set from the util/set package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry could you be a bit more specific? I am not familiar with util/set package 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I omitted the tailscale.com part of the packge: https://pkg.go.dev/tailscale.com/util/set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that the bool is necessary here. Later on we iterate over the map to discover ports in the range that have not been used (value is false). See here.

ChaosInTheCRD and others added 2 commits June 10, 2025 14:29
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Tom Meadows <tom@tmlabs.co.uk>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Tom Meadows <tom@tmlabs.co.uk>
ChaosInTheCRD and others added 2 commits June 10, 2025 17:26
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Tom Meadows <tom@tmlabs.co.uk>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Tom Meadows <tom@tmlabs.co.uk>
@@ -84,7 +84,7 @@ type ProxyClassSpec struct {
UseLetsEncryptStagingEnvironment bool `json:"useLetsEncryptStagingEnvironment,omitempty"`
// Configuration for listeners on proxies in order to facilitate
// 'direct connections' from other devices on the tailnet.
// See https://tailscale.com/kb/1411/device-connectivity for details on tailnet device connectivity.
// See https://tailscale.com/kb/1445/kubernetes-operator-customization#static-endpoints
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might change... make sure we update this if it does.

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from 7ce42da to dcce7f8 Compare June 10, 2025 17:18
@@ -375,6 +383,7 @@ func runReconcilers(opts reconcilerOpts) {
err = builder.
ControllerManagedBy(mgr).
For(&corev1.Service{}).
WithEventFilter(objectManagedResourceFilterPredicate()).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but why are we adding this? The Services we reconcile are not operator's managed Services?

Copy link
Contributor
@irbekrm irbekrm Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break this reconciler- we'll no longer reconcile the HA Services at all.

I think what you are aiming for is the opposite (dont reconcile managed Services rather than only reconcile managed Services) and also that filter should only to be applied to Services - you can achieve that with a filter similar to what we use elsewhere and only applied to Services.

We should generally avoid filters that are added to all watched objects, it'd be quite easy to introduce subtle bugs that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I added this yesterday and hadn't tested yet.

This filter makes sure we don't reconcile on Kubernetes Services that are tailscale.com/managed = true. This was upon review comment from @tomhjp requesting that we do this as a filter on the reconciler rather than in reconcile function itself.

However, I just noticed that the filter in fact is missing a !, and is currently reconciling on managed Services only. I will fix this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest commit. Still need to test this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the filter to Services only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See my last comment - also in this specific case it will break all other reconciles that are for tailscale managed objects like state Secrets)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I see what you're saying. This will break the reconcile for other objects that we watch on this reconciler. Sorry about this, I genericized it because I thought it was going to make it more expandable but in doing so you're right it will affect the other watches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's why we shouldn't use these- it's really easy to break other things especially given we don't have e2e tests so this is really only ever tested by hand

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
@ChaosInTheCRD ChaosInTheCRD force-pushed the chaosinthecrd/k8s-operator-direct-connections branch from cbbc5d4 to b2f913e Compare June 11, 2025 09:09
// ProxyGroup
reqs := make([]reconcile.Request, 0)
for _, pg := range pgList.Items {
if pg.Spec.ProxyClass != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also have default ProxyClass

Comment on lines +884 to +887
for k, v := range tlc.NodePortConfig.Selector {
if val, ok := labels[k]; ok && val == v {
logger.Debugf("reconciling on ProxyGroup %q due to event from Node %q", pg.Name, o.GetName())
reqs = append(reqs, reconcile.Request{NamespacedName: client.ObjectKeyFromObject(&pg)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic will result in the same ProxyGroup being appended to reqs multiple times if the selector specifies multiple labels. It will also result in ProxyGroup being reconciled if only one of the labels matches.

I think what we want is the same behaviour as for any Kubernetes selector where all labels have to match for the node to be selected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be some helper function somwhere that you could use

Name: invalidSvcName,
Namespace: tsNamespace,
Labels: map[string]string{
kubetypes.LabelManaged: "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add tailscale.com/managed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - do you mean, "we shouldn't add tailscale.com/managed here? Since kubetypes.LabelManaged = "tailscale.com/managed".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I'm sorry, I actually forgot the kubetypes is our library 🤦🏼 ignore this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all!

Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Meadows <tom@tmlabs.co.uk>
Spec: corev1.ServiceSpec{
Type: corev1.ServiceTypeNodePort,
Ports: []corev1.ServicePort{
// NOTE (ChaosInTheCRD): we set the ports once we've iterated over every svc and found any old configuration we want to persist
Copy link
Contributor
@irbekrm irbekrm Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// NOTE (ChaosInTheCRD): we set the ports once we've iterated over every svc and found any old configuration we want to persist
// NOTE(ChaosInTheCRD): we set the ports once we've iterated over every svc and found any old configuration we want to persist.

// Copyright (c) Tailscale Inc & AUTHORS
// SPDX-License-Identifier: BSD-3-Clause

package main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we name this file something more specific to the NodePort Services? We have other bits of fairly complicated port wrangling in other places (see egress services logic, for example)

Comment on lines +23 to +24
portMax = 65535
portMin = 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Suggested change
portMax = 65535
portMin = 1024
tailscaledPortMax = 65535
tailscaledPortMin = 1024

or something else more explicit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0