feat!: helm chart and base image refactor DO NOT MERGE #816
Conversation
BREAKING CHANGE: This was moved to legacy over a year ago; removing it now to simplify the chart. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
BREAKING CHANGE: Moving CRDs to the root crds folder so that they will not be removed on uninstall. The consequence of this change is that CRDs are not upgraded by Helm; however, common CD tools such as Argo and Flux2 will upgrade them. It is common for k8s solutions using CRDs to require manual updates when deploying with Helm. An alternative would be to use manifests or OLM in the future. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
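For context, this relies on Helm's special top-level crds directory: Helm installs its contents on first install but skips it on upgrade and uninstall. A rough layout (file names illustrative, not taken from the PR):

charts/topolvm/
  Chart.yaml
  crds/
    logicalvolumes.yaml   # installed once by helm install; never upgraded or deleted by helm
  templates/
    ...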
A LoadBalancer can incur cost and is not required for this use case; the fix changes the default to ClusterIP. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
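A minimal values sketch of the new default (key names are assumptions about the chart layout, not verified against it):

controller:
  service:
    type: ClusterIP   # previously LoadBalancer; set LoadBalancer explicitly if the
                      # service must be reachable from outside the cluster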
TopoLVM is critical to scheduling and node readiness; set appropriate default values. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
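Something along these lines is what "appropriate default values" could mean for a scheduling-critical DaemonSet (a sketch; the exact keys and values in the chart may differ):

node:
  priorityClassName: system-node-critical   # assumption: schedule ahead of ordinary workloads
  tolerations:
    - operator: Exists                       # assumption: tolerate all taints so storage starts on every node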
Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
Alpine is a slightly smaller image. This change swaps in a similar configuration and adds nvme-cli and lvm2 so user-provided init routines have a likely chance of being able to use the same image, keeping pull size down. The result is 284MB vs the current image size of 319MB. A side benefit: when the current image is built, it is more likely than not that a different package version will be installed, because versions are not easily pinned. In this PR I am not pinning versions, as doing so would add a responsibility for the upstream team to update package-level locked versions. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
This refactor handles embedding lvmd as a container rather than a process in the node CSI container. This makes logging more friendly, as the lvmd logs can be monitored without "reading between the lines" of the node pod. The socket is mounted using an emptyDir with memory as the medium. Removed use of hostPID. Signed-off-by: Ryan Faircloth <ryan@dss-i.com>
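To illustrate, the node DaemonSet might wire the two containers together roughly like this (container layout and socket path are illustrative, not the exact PR content):

containers:
  - name: lvmd
    # lvmd as its own container: its logs stream separately from topolvm-node
    volumeMounts:
      - name: lvmd-socket-dir
        mountPath: /run/lvmd
  - name: topolvm-node
    # topolvm-node reaches lvmd over the gRPC socket in the shared in-memory emptyDir
    volumeMounts:
      - name: lvmd-socket-dir
        mountPath: /run/lvmd
volumes:
  - name: lvmd-socket-dir
    emptyDir:
      medium: Memory   # the socket lives in RAM and never touches the node filesystem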
@ryanfaircloth force-pushed from 7eade4e to 8a4ac8f (Compare)
log.FromContext(ctx).Info("invoking LVM command", "args", args)
err := c.Run()
log.FromContext(ctx).Info("result of LVM command", "stdout", stdout.Bytes(), "stderr", stderr.Bytes())
Can we introduce this at a lower log level? I do not think this is necessary for common debugging.
The current run command, which is slightly less useful for troubleshooting, is already at the info level. I would say move both to debug, but I don't see support for changing the log level, and this runs once at startup, so I don't see the harm in including it. It took me several hours to get to the issue of hostPID not working on EKS 1.28 because I was unaware nsenter was failing silently.
Overall, what is logged and when could use some improvement. It's not "bad", but it's not very intentional. For example, at the info level I would like to know what VGs were found on startup, so in the list function we should have a dump of the VGs; LV actions we should also see at info level. This really should be debug, but as a non-Go programmer I wasn't really wanting to do more than I could safely copy and paste.
I agree with @jakobmoellerdev. Please separate it.
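For what it's worth, with logr-style loggers (which log.FromContext returns), demoting these lines to a debug verbosity would be a small change along these lines (a sketch, not the PR's code):

// V(1) is conventionally debug verbosity in logr; these lines only emit
// when the logger is configured with verbosity >= 1
log.FromContext(ctx).V(1).Info("invoking LVM command", "args", args)
log.FromContext(ctx).V(1).Info("result of LVM command", "stdout", stdout.Bytes(), "stderr", stderr.Bytes())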
{{- end }}
{{- with .Values.node.initContainers }}
initContainers: {{ toYaml . | nindent 6 }}
{{- end }}
containers:
- name: topolvm-node
{{- if .Values.node.lvmdEmbedded }}
lvmdEmbedded is fundamentally different from what you are trying to do here. Embedded means embedded in topolvm-node, not a separate binary. If you run lvmd separately from topolvm-node it is no longer embedded.
Yes, I do agree. I changed the implementation and re-used the same flag; the admin experience should be the same, though. Would you like a new word here, or a more verbose description of the change?
Just trying to add some context as the original implementor of the embedded functionality: IMO this is a fundamentally different deployment from using --embed-lvmd in topolvm-node, so we should find a different name for it.
From what I can see, this functionality should not be removed just because we change the way we move from 2 DS to 1 DS without using this flag. When using the flag, we do not even need to run LVMD, so there will always be 1 DS.
I have 2 follow-up questions from your work:
1. How in the future would we allow users to run with --embed-lvmd on topolvm-node? IMO this is answered by running 1 DS with 2 containers in the default setting (lvmdEmbedded = false) or introducing a new flag that is mutually exclusive with lvmdEmbedded = true.
2. What tangible benefit does it bring to have a single daemon set with 2 containers over 2 DS with 1 container? I currently do not see major gains; it would be cool to have this in the PR explanation.
- I see we have multiple implementation choices right now: LVMD can run under systemd, in its own DS, or as a child process of the node DS container. In this PR I removed "embedded" as it is currently defined, because it didn't seem to solve a unique problem or offer any value on its own, and to me it made troubleshooting more complicated. Having LVMD as its own daemon set also doesn't seem to be particularly useful as an option, because we could have race conditions between the pods, or the node may not start for some reason such as lack of resources. So really I moved the LVMD "managed" into this pod and redefined "embedded". I think adding node.lvmd.external would be more descriptive, as this really just reduces the deployment choices to two: DS (via the node pod) and external.
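To make that concrete, the proposed shape of the values could look like this (node.lvmd.external is the name suggested above; it is not an existing chart option, and socketName is an assumption):

node:
  lvmd:
    external: false            # default: lvmd runs as a container in the node pod
    # external: true           # alternative: lvmd is managed outside the chart (e.g. systemd)
    # socketName: /run/lvmd/lvmd.sock   # assumption: socket used to reach an external lvmd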
> I see we have multiple implementation choices right now: LVMD can run under systemd, in its own DS, or as a child process of the node DS container.

It is not a child process if you run it within topolvm-node. It is one process and we do not open up a gRPC server in this case.

> In this PR I removed "embedded" as it is currently defined, because it didn't seem to solve a unique problem or offer any value on its own, and to me it made troubleshooting more complicated.

I believe you are quite quick to come to this resolution. Please reference the dedicated design proposal and see if you still do not understand the use case. Just because this option of deployment is not applicable for you does not mean it is not useful for other users.

> Having LVMD as its own daemon set also doesn't seem to be particularly useful as an option, because we could have race conditions between the pods, or the node may not start for some reason such as lack of resources.

This does not make sense because 1. race conditions between pods are not a thing (we are an eventually consistent system, and starting one before the other does not result in problems) and 2. if the node does not have enough resources for both lvmd and topolvm-node and you run with 1 DS, the daemonset will still not start and run successfully.

> So really I moved the LVMD "managed" into this pod and redefined "embedded". I think adding node.lvmd.external would be more descriptive, as this really just reduces the deployment choices to two: DS (via the node pod) and external.

I disagree that you redefined embedded. You removed the option to configure it entirely. There is no way in your chart to enable this kind of configuration.
However, I agree that having 2 deployment choices should be the goal.
My conclusion is different, however: I think your way of deploying (1 DS with 2 containers) should succeed the 2 DS approach and NOT embedded LVMD.
Re race conditions: I have seen instances of the node pod crashing when lvmd is not ready, and I have also seen a few instances where workload pods start before node/lvmd is ready.
Thanks for the doc pointer. After reading, I think my approach achieves the same goals as this design doc: the goal of not needing a second DS is achieved by treating LVMD as a library (I thought it was a process), with the consequence of having two very different interface methods. My approach does result in retention of an independent process and gRPC, but means the interface between node and lvmd is the same whether it is deployed in the pod or deployed as a systemd service. The goals, as I understand them, are the same. My approach has a slightly smaller code surface, less to maintain, and would make embedded moot, assuming I understood the goals correctly. While I am removing choices, have I caused an unforeseen consequence by impacting a feature or function?
For the crashes resulting from startup: While it is true that topolvm-node will continue to restart if it does not establish a connection to LVMD, this will work just fine after a restart and is part of a normal regression loop in k8s. Your approach can fix this though, I agree.
To your comment: I believe your approach does not achieve 2 of the main goals of why this was introduced in the first place: 1. Reduce memory consumption, because you still need a gRPC server and a socket, and 2. Reduce container count - even though you have 1 DS, you still retain both containers, which results in more crun/kubelet overhead. This is especially important in edge scenarios where local provisioning needs low resource overhead, which is one of the main use cases for our users (the environment in which we bundle TopoLVM). Without these reasons there would not be a valid reason to even combine lvmd and topolvm-node.
That being said, and assuming all of this is not valid in the end for the greater open source community, you would still have to remove all of the code that was introduced with embed-lvmd on top of your existing changes, otherwise it will end up as dead code. Same goes for the test code. All of this while not gaining features and even increasing resource consumption again.
I am honestly surprised how you think that your change is beneficial over an implementation approach that is already present, provides the same functionality in a less complex deployment model (because compared to your suggestion, you will have to set up a socket and maintain 2 containers with 2 log streams, while the existing implementation does not need a socket at all), has lower resource overhead, and is already tested.
From what I can see, the only reason you really did this is because:
> [embedded] made troubleshooting more complicated [during testing]
Which is an entirely separate problem and should be tackled within the given code imo.
Thanks for the back and forth. I see the issue and will revise the approach to use embedded rather than a sidecar.
FROM --platform=$TARGETPLATFORM topolvm as topolvm-with-sidecar
FROM --platform=$TARGETPLATFORM alpine:3.19 as topolvm-with-sidecar
RUN apk add --no-cache \
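Based on the commit message above (Alpine 3.19 plus nvme-cli and lvm2), the completed stage presumably looks roughly like this; the COPY line and any packages beyond those two are assumptions, not the exact PR content:

FROM --platform=$TARGETPLATFORM alpine:3.19 as topolvm-with-sidecar
RUN apk add --no-cache \
    lvm2 \
    nvme-cli
# hypothetical: binaries built in the earlier topolvm stage are carried over
COPY --from=topolvm /topolvm-node /topolvm-node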
This will add licensing issues to the released TopoLVM images, as they now include various OS packages that need to be mentioned in the licensing.
While that is possible, I don't think it's a new concern; the only things that are likely net new are lvm2 and nvme-cli. To resolve this and future concerns I would suggest the addition of FOSSA (a free service for OSS apps) to produce an SBOM. You may have specific reasons here I don't understand; happy to discuss and resolve.
Honestly this is a great point. @toshipp I suggest we open up an issue on LICENSE tracking for all of our open source dependencies in TopoLVM so that we remain compliant in the project and questions like these don't arise in PRs. I suggest we do this before we merge here.
We (speaking on behalf of Red Hat) are interested in remaining license compliant for TopoLVM, and we personally do not use the Dockerfile in the open source version because we rebuild with Red Hat maintained versions of packages such as lvm2 (which is also maintained and owned by Red Hat and published as open source).
Until this is cleared up, however, I suggest not moving forward with this change, to make sure no licensing aspects change while we move for more transparency.
I also see that, the way this is being built today, the sidecars from the kube project are embedded into the container, and I don't see a THIRDPARTY file or other SBOM. So I would like to continue with the change, as it's likely slightly better (Alpine being a smaller base image with fewer inclusions), or a net same.
It's a different story if you use third-party container images vs. if you build your own container image, imo. But let's wait for other maintainers' opinions.
#818 for LICENSE discussions. Thanks for creating!
@jakobmoellerdev @ryanfaircloth I'll handle #818.
@@ -24,8 +24,8 @@ spec:
{{- with .Values.node.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if and .Values.node.metrics.enabled .Values.node.metrics.annotations }}
annotations: {{ toYaml .Values.node.metrics.annotations | nindent 8 }}
{{- if and .Values.node.csi.metrics.enabled .Values.node.csi.metrics.annotations }}
Breaking change with previous chart versions; needs a major version.
Agree; added ! and BREAKING CHANGE to the commits to ensure this was communicated.
That's nice! But maybe we should think about whether we need to move this CSI section at all? What benefit does it bring, or is it necessary? If not, maybe we can avoid the breaking change altogether?
Might have been over-aggressive, but these options don't apply to the pod, only the CSI container, so I was attempting not to introduce ambiguity with the changes to move lvmd into the pod.
Ah, understood, makes sense to me :). Then let's wait for other maintainers to give feedback on whether this will be accepted.
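For chart users, the rename amounts to this values migration (keys taken from the diff; the annotation content is only an example):

# before
node:
  metrics:
    enabled: true
    annotations:
      prometheus.io/scrape: "true"

# after (this PR)
node:
  csi:
    metrics:
      enabled: true
      annotations:
        prometheus.io/scrape: "true"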
{{- if .Values.node.lvmdEmbedded }}
- name: lvmd-socket-dir
  emptyDir:
    medium: Memory
Nice change, but are you sure this is the only thing requiring hostPID? Just because it starts doesn't mean all block devices are recognized properly.
I did test in my environment, which at the moment is EKS with k8s 1.28.
See #816 (comment)
I incorrectly removed the topolvm target. I missed that the image is published but not used in the helm chart; I had no reason to remove it other than simplification, so I am putting it back.
The current process was building the upstream sidecars with no specific options or modifications. All required arch versions are provided upstream; this change simply uses them as provided. A future change could adopt the upstream containers.
@ryanfaircloth I leave my comments for each commit: 7b33e32 6cdae05 5c9d859 f62a1ac 3cce93d
Thanks, I will keep this open for easy access to the feedback and close it off when I've proposed the smaller PRs for proper discussion.
Individual commits cover the specific changes; at a high level this PR removes the need for hostPID (the container is still privileged).
I really don't expect to see this PR merged, but the feedback is useful, and I will pull out the parts that make sense and submit them as bite-size PRs.