Fix for final inventory update #319

seans3 · 2021-01-26T06:37:37Z

Updates the Prune function to correctly store the final inventory items. These items are:
1. the successfully applied objects
2. the prune failures
Fixes an error where objects that failed to apply were included in the final inventory.
Extends the Prune unit test to validate that the stored final inventory is correct.
Cleans up the TaskContext handling in apply_task so that only successfully applied objects are added to the context.
Adds many new debug logging statements.
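The rule above (final inventory = successfully applied objects + prune failures) can be sketched as a small set union. This is a minimal illustration, not the cli-utils code: `ObjMeta` and `finalInventory` are hypothetical simplified stand-ins, and objects that failed to apply are excluded simply by never being passed in.

```go
package main

// ObjMeta is a hypothetical, simplified stand-in for the inventory's object
// identifier (group, kind, namespace, name); it is not the cli-utils type.
type ObjMeta struct {
	Group     string
	Kind      string
	Namespace string
	Name      string
}

// finalInventory returns the identifiers to store in the final inventory:
// every successfully applied object plus every object whose prune failed
// (it may still exist in the cluster). Objects that failed to apply are not
// passed in, so they are excluded. Duplicates are dropped and first-seen
// order is preserved.
func finalInventory(applied, pruneFailures []ObjMeta) []ObjMeta {
	seen := make(map[ObjMeta]bool, len(applied)+len(pruneFailures))
	result := make([]ObjMeta, 0, len(applied)+len(pruneFailures))
	for _, ids := range [][]ObjMeta{applied, pruneFailures} {
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				result = append(result, id)
			}
		}
	}
	return result
}
```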

k8s-ci-robot · 2021-01-26T06:37:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: seans3


Needs approval from an approver in each of these files:

~~OWNERS~~ [seans3]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Liujingfang1

Overall looks good to me.

Liujingfang1 · 2021-01-26T18:10:21Z

pkg/apply/prune/prune_test.go

 				}
 			})
 		}
 	}
 }

+// unionObjects returns the union of sliceA and sliceB as a slice of unstructured objects.


If both sliceA and sliceB contain an unstructured object with the same ObjMeta, the one from sliceB is taken.
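A minimal sketch of the behavior the suggested doc comment describes, assuming hypothetical simplified types (`ObjID`, `Object`) in place of ObjMeta and unstructured objects; it is not the helper from the PR.

```go
package main

// ObjID is a hypothetical simplified object identifier (stand-in for ObjMeta).
type ObjID struct {
	Kind, Namespace, Name string
}

// Object is a hypothetical stand-in for an unstructured object.
type Object struct {
	ID      ObjID
	Payload string
}

// unionObjects returns the union of sliceA and sliceB. When an identifier
// appears more than once, the later occurrence wins, so an object in sliceB
// replaces one with the same identifier from sliceA. Each object keeps the
// position at which its identifier first appeared.
func unionObjects(sliceA, sliceB []Object) []Object {
	index := make(map[ObjID]int, len(sliceA)+len(sliceB))
	result := make([]Object, 0, len(sliceA)+len(sliceB))
	for _, objs := range [][]Object{sliceA, sliceB} {
		for _, obj := range objs {
			if i, ok := index[obj.ID]; ok {
				result[i] = obj // later occurrence (sliceB) replaces earlier one
				continue
			}
			index[obj.ID] = len(result)
			result = append(result, obj)
		}
	}
	return result
}
```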

Liujingfang1 · 2021-01-26T18:21:40Z

pkg/apply/prune/prune.go

-			localIds = append(localIds, clusterObj)
+			klog.V(4).Infof("skip prune for lifecycle directive %s/%s", pruneObj.Namespace, pruneObj.Name)
+			taskContext.EventChannel() <- createPruneEvent(pruneObj, obj, event.PruneSkipped)
+			pruneFailures = append(pruneFailures, pruneObj)


Maybe add a comment here explaining why it is added to pruneFailures. It's not actually a failure; we use pruneFailures to keep it in the final inventory list.

Liujingfang1 · 2021-01-26T18:28:36Z

pkg/apply/prune/prune.go

 				continue
 			}
+			err = namespacedClient.Delete(context.TODO(), pruneObj.Name, metav1.DeleteOptions{})
+			if err != nil {


The error handling here is the same as the one after namespacedClient, err := po.namespacedClient(pruneObj). We could combine getting the client and deleting the object into one function, as you did for getObject. The new function could be called deleteObject.

mortent · 2021-01-27T22:58:42Z

So overall looks good, but I want to understand how we decide whether a resource that we failed to apply should be included in the inventory or not. I want to make sure that in a situation where a resource already exists in the cluster and the apply (which in this case would be an update/patch) fails, we don't end up orphaning the resource. I don't think this was among the test cases.

seans3 · 2021-02-01T22:26:04Z

We had an informative discussion about this corner case, and we believe we know a way forward to make the final inventory storage step even more error tolerant. This PR is strictly an improvement over the previous behavior, so we will merge it now. The improvements to the algorithm will come in a future PR. The following documents our discussion of how to further improve the final inventory calculation.

The apply operation currently does a get before the apply. From these two operations, we believe we have the information needed to pass to the Step 4 final inventory calculation (through the TaskContext) to make it more error tolerant. Currently, we only store the UID of successfully applied objects. We will enhance this data structure according to the following decision tree.

1. If we are able to successfully get the object, and the apply succeeds, the object remains in the inventory.
2. If we are able to successfully get the object, but the apply fails, then we know it exists in the cluster, and it should therefore remain in the inventory.
3. If the get fails with a Not Found error, then we know the item is not in the cluster. This is considered an initial creation. If the apply succeeds for this initial creation, then it will be added to the final inventory. If the apply fails it will not be added to the final inventory because it is not in the cluster.
4. If the get fails with a different error, then we check if the item was in the previous inventory.
   a. If the item was in the previous inventory, then we assume it is still in the cluster and we keep it in the inventory.
   b. If the item was not in the previous inventory, then we can confidently assume it is not in the cluster and we do not add it to the inventory.
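The decision tree above can be condensed into a single predicate. This is a minimal sketch, not the future implementation: `getResult`, its constants, and `keepInInventory` are hypothetical names, and branch 4 follows the tree as written, consulting only the previous inventory.

```go
package main

// getResult is a hypothetical classification of the get issued before apply.
type getResult int

const (
	getFound    getResult = iota // the object exists in the cluster
	getNotFound                  // the server returned Not Found
	getOtherErr                  // any other error; existence is unknown
)

// keepInInventory reports whether an object belongs in the final inventory.
func keepInInventory(get getResult, applySucceeded, inPrevInventory bool) bool {
	switch get {
	case getFound:
		// Branches 1 and 2: the object exists in the cluster whether or not
		// the apply succeeded, so it stays in the inventory.
		return true
	case getNotFound:
		// Branch 3: initial creation; it is in the cluster only if the apply
		// created it.
		return applySucceeded
	default:
		// Branch 4: existence is unknown, so fall back to the previous
		// inventory (4a keep, 4b do not add).
		return inPrevInventory
	}
}
```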

Implementing this decision tree (in a future PR) would have two prerequisites:

1. We need to store more than UIDs for successful applies in the TaskContext. We would probably have to store a Map<ObjMetadata, UID> for all objects we intend to store in the final inventory.
2. We would need to pass the set of previous inventory objects into the apply step, since we do not keep this set separate after the initial union inventory storage.
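One possible shape for these two prerequisites is sketched below. Everything here is hypothetical: `ObjMetadata` is a simplified local struct (not the cli-utils type), UIDs are plain strings, and an empty UID is assumed to mean "kept because it was in the previous inventory, UID unknown".

```go
package main

import "sort"

// ObjMetadata is a hypothetical simplified object identifier.
type ObjMetadata struct {
	Namespace, Name, Kind string
}

// taskContext sketches the extra state: identifier -> UID for every object
// destined for the final inventory, plus the previous inventory set.
type taskContext struct {
	inventoryObjs map[ObjMetadata]string // UID; "" if unknown (assumption)
	prevInventory map[ObjMetadata]bool
}

// newTaskContext builds the context, seeding it with the previous inventory.
func newTaskContext(prev []ObjMetadata) *taskContext {
	tc := &taskContext{
		inventoryObjs: make(map[ObjMetadata]string),
		prevInventory: make(map[ObjMetadata]bool, len(prev)),
	}
	for _, id := range prev {
		tc.prevInventory[id] = true
	}
	return tc
}

// AddInventoryObj records an object (and its UID, if known) for the final inventory.
func (tc *taskContext) AddInventoryObj(id ObjMetadata, uid string) {
	tc.inventoryObjs[id] = uid
}

// InPreviousInventory reports whether id was in the inventory before this apply.
func (tc *taskContext) InPreviousInventory(id ObjMetadata) bool {
	return tc.prevInventory[id]
}

// FinalInventory returns the recorded identifiers sorted by namespace, kind, name.
func (tc *taskContext) FinalInventory() []ObjMetadata {
	ids := make([]ObjMetadata, 0, len(tc.inventoryObjs))
	for id := range tc.inventoryObjs {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		a, b := ids[i], ids[j]
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		return a.Name < b.Name
	})
	return ids
}
```

Keeping the previous inventory as a set lets branch 4 of the decision tree be answered during the apply step, without the final get iteration in Step 4.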

Implementing this decision tree would allow us to remove the final get iteration on all currently applied objects in the Step 4 final inventory calculation.

Liujingfang1 · 2021-02-01T23:13:11Z

/lgtm

Fix for final inventory update

6db68fd

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 26, 2021

seans3 requested a review from mortent January 26, 2021 06:37

k8s-ci-robot requested review from Liujingfang1 and soltysh January 26, 2021 06:37

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 26, 2021

seans3 removed the request for review from soltysh January 26, 2021 06:37

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 26, 2021

Liujingfang1 reviewed Jan 26, 2021

Liujingfang1 mentioned this pull request Jan 26, 2021

Log filename and/or Object identifier when running into issues applying resources kptdev/kpt#1395

Closed

k8s-ci-robot assigned Liujingfang1 Feb 1, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 1, 2021

k8s-ci-robot merged commit 152b41c into kubernetes-sigs:master Feb 1, 2021
