8000 Bgp control plane: add route aggregation feature by romanspb80 · Pull Request #37275 · cilium/cilium · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Bgp control plane: add route aggregation feature #37275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

romanspb80
Copy link
Contributor
@romanspb80 romanspb80 commented Jan 27, 2025

Add route aggregation feature in BGP Control Plane (RFC 4632)
New route aggregation options under the CiliumBGPAdvertisement.

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        aggregationLengthIPv4: 24                  <<<<<<<<<<<<<<   New Field
        aggregationLengthIPv6: 120                 <<<<<<<<<<<<<<   New Field

Fixes: #36777

Signed-off-by: Roman Ptitcyn romanspb@yahoo.com

@romanspb80 romanspb80 requested review from a team as code owners January 27, 2025 00:53
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 27, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 27, 2025
Copy link
Contributor
@harsimran-pabla harsimran-pabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @romanspb80 Before jumping into the code review of the implementation, I think we need to agree on the API change. I still think user should specify aggregation length instead of dynamically calculating it based on the boolean flag.

In vast majority of cases there is going to be single IP returned from getServicePrefix
, so with single /32 or /128 prefix there will be nothing to aggregate.

Have you tested this change on kubernetes cluster with sample loadbalancer service ? can you provide some example outputs how aggregation will look like?

@romanspb80
Copy link
Contributor Author

Hi @romanspb80 Before jumping into the code review of the implementation, I think we need to agree on the API change. I still think user should specify aggregation length instead of dynamically calculating it based on the boolean flag.

In vast majority of cases there is going to be single IP returned from getServicePrefix , so with single /32 or /128 prefix there will be nothing to aggregate.

Have you tested this change on kubernetes cluster with sample loadbalancer service ? can you provide some example outputs how aggregation will look like?

Hi @harsimran-pabla . Thanks for the comment.
I agree with you that the most common case would be single prefix. But there ar 8000 e couple of points which are worth discussing.
For example, we can use annotation to assign multiple ips for serice advertisement:

apiVersion: v1
kind: Service
metadata:
  name: nginxservice
  annotations:
    "io.cilium/lb-ipam-ips": "172.16.102.2,172.16.102.8"
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: grpc
    port: 8888
    targetPort: 80 

In this case the dynamically calculating works:

bash-5.1# ./gobgp neighbor 172.18.0.2 adj-in
   ID  Network              Next Hop             AS_PATH              Age        Attrs
   0   10.244.1.0/24        172.18.0.2           65002                00:03:05   [{Origin: i}]
   0   172.16.102.0/28      172.18.0.2           65002                00:00:05   [{Origin: i}]
bash-5.1# 

The dynamically calculated mask /28 is more efficiently then if we set the mask /24 manualy.

Also there is another case which I noted in my comment :
"I would also like to note that in this pull request, aggregation is performed only for prefixes within the service. But in this PR I made a feature in which aggregation takes into account all services with this option and the route aggregation is global for all prefixes of the affected services. I suggest discussing this point too."

Thanks

@harsimran-pabla
Copy link
Contributor

But there are couple of points which are worth discussing. For example, we can use annotation to assign multiple ips for serice advertisement

Yes, but the common case is that each service gets single /32 prefix. And if we pick start and end address from various services which have aggregate flag set, resulting aggregated prefix can be very large range - as services can have IPs allocated from different lb-ipam pools.

@harsimran-pabla
Copy link
Contributor

"I would also like to note that in this pull request, aggregation is performed only for prefixes within the service. But in this #36791 I made a feature in which aggregation takes into account all services with this option and the route aggregation is global for all prefixes of the affected services. I suggest discussing this point too."

This approach was modifying LB Pool with a aggregate flag. Why not introduce new BGP advertisement type which will advertise complete LB IPAM pool range ? Implementation would be much simpler and not change existing service reconcilers.

I am trying to understand if the goal is to reduce /32 prefix announcements, can we achieve it with this simpler mechanism. Existing service announcements can be configured if user wants to service with eTP=Local to work.

@romanspb80
Copy link
Contributor Author
romanspb80 commented Jan 30, 2025

"I would also like to note that in this pull request, aggregation is performed only for prefixes within the service. But in this #36791 I made a feature in which aggregation takes into account all services with this option and the route aggregation is global for all prefixes of the affected services. I suggest discussing this point too."

This approach was modifying LB Pool with a aggregate flag. Why not introduce new BGP advertisement type which will advertise complete LB IPAM pool range ? Implementation would be much simpler and not change existing service reconcilers.

I am trying to understand if the goal is to reduce /32 prefix announcements, can we achieve it with this simpler mechanism. Existing service announcements can be configured if user wants to service with eTP=Local to work.

Yes, that makes sense. So instead of dynamic calculation use the cidr (or all cidrs from the LB pool) mask right away?
For example, if cidr from LB pool is 172.16.102.0/25 then bgp advertisement will be same in case of enabled aggregation.
We can also add an option for the case where the pool contains multiple cidrs, then they can be aggregated if they are adjacent.

@harsimran-pabla
Copy link
Contributor

We can also add an option for the case where the pool contains multiple cidrs, then they can be aggregated if they are adjacent.

No, I would still prefer each pool advertisement. Atleast we should start with this common case and see later if it makes sense to improve further upon it. In this approach, feature is not about prefix aggregation. It is more about LB Pool advertisement.

@harsimran-pabla
Copy link
Contributor

There are two approaches to reduce /32 advertisements, I am not leaning towards one or the other.

  1. Introduce new advertisement type, which is LB Pool Advert. This will advertise LB Pool prefix regardless of checking backend services. Much simpler implementation as well as larger prefix block can be advertised. Drawback is that if you'd want services with eTP=Local, this is not enough. And other cases of cluster ip/external ip advertisement, this advertisement type is not the solution for it.
  2. Provide prefix length as previously discussed here.

I am not yet convinced if we need to jump straight into auto-aggregation, we can look at it in the future. At this stage, I think we need something simpler and deterministic.

@romanspb80
Copy link
Contributor Author

Ok. Will do this approach:

  1. Provide prefix length as previously discussed here.

@maintainer-s-little-helper
Copy link

Commit 3a5d0c8 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 8, 2025
@romanspb80 romanspb80 force-pushed the BGP-Control-Plane--add-route-aggregation branch from 3a5d0c8 to 89a459f Compare February 8, 2025 21:28
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 8, 2025
@romanspb80
Copy link
Contributor Author

Hi @harsimran-pabla
I've updated PR in accordance with the second approach. I added two fields:

  • aggregationLengthIPv4 with range 24..31
  • aggregationLengthIPv6 with range 120..127

It makes sense to discuss the thresholds of these ranges.

Thanks

Copy link
Contributor
@harsimran-pabla harsimran-pabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romanspb80 for the changes. API looks good to me, there are few other comments regarding the implementation.

  • Route policies should be done for all service types ( external IP, cluster IP ), currently it implemented for LB Ingress IP only.
  • Unit tests are missing.
  • Route policies which are merged here should be sorted based on prefix length. For eg, if multiple advertisements match same service, longer prefix should be first.

Copy link
Contributor
@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cilium/sig-k8s CRD changes look good to me. I do see that @harsimran-pabla had a comment regarding the need for the min values.

@romanspb80 romanspb80 force-pushed the BGP-Control-Plane--add-route-aggregation branch 2 times, most recently from 4e3904e to 4d27eee Compare February 16, 2025 14:14
@romanspb80
Copy link
Contributor Author

@harsimran-pabla , I've updated the PR in accordance with the comments.
Thanks

@romanspb80 romanspb80 force-pushed the BGP-Control-Plane--add-route-aggregation branch from 4d27eee to 1af4fc5 Compare February 19, 2025 19:56
@romanspb80
Copy link
Contributor Author

Hi @romanspb80, instead of merge commit, can you please rebase only your change on top of latest. merge-commit is not allowed in Cilium.

Also, we promoted the BGP APIs from v2alpha1 to v2, I see that you updated the API in v2. Similar to change in pkg/k8s/apis/cilium.io/v2/bgp_advert_types.go, you should do same change in pkg/k8s/apis/cilium.io/v2alpha1/bgp_advert_types.go, we want to keep both APIs same for time being. This helps in easy migration for users from v2alpha1 to v2.

Thank you, @harsimran-pabla . I updated the change

@YutaroHayakawa
Copy link
Member

/test

@pchaigno pchaigno enabled auto-merge March 9, 2025 08:20
@pchaigno pchaigno disabled auto-merge March 9, 2025 08:22
@romanspb80 romanspb80 force-pushed the BGP-Control-Plane--add-route-aggregation branch 3 times, most recently from 20fc92b to 27801fa Compare March 10, 2025 20:28
@harsimran-pabla
Copy link
Contributor

@romanspb80 I think you'd need to rebase on latest to get the failing test to pass. Check this comment on slack.

New route aggregation option under the CiliumBGPAdvertisement.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        aggregationLengthIPv4: 24                   <<<<<<<<<<<<<<   New Field
        aggregationLengthIPv6: 120                  <<<<<<<<<<<<<<   New Field

Fixes: cilium#36777

Signed-off-by: Roman Ptitcyn <romanspb@yahoo.com>
@romanspb80 romanspb80 force-pushed the BGP-Control-Plane--add-route-aggregation branch from 27801fa to f61646e Compare March 14, 2025 23:25
@romanspb80
Copy link
Contributor Author

/test

@romanspb80
Copy link
Contributor Author

@romanspb80 I think you'd need to rebase on latest to get the failing test to pass. Check this comment on slack.

done. Thanks!

@pchaigno pchaigno added this pull request to the merge queue Mar 15, 2025
Merged via the queue into cilium:main with commit 9e1e31a Mar 15, 2025
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bgp Impacts the Border Gateway Protocol feature. kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: BGP Control Plane - BGP advertisements, route aggregation
7 participants
0