8000 Improve APIManager status by miguelsorianod · Pull Request #549 · 3scale/3scale-operator · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Improve APIManager status #549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 29, 2021
Merged

Improve APIManager status #549

merged 9 commits into from
Jan 29, 2021

Conversation

miguelsorianod
Copy link
Contributor
@miguelsorianod miguelsorianod commented Dec 15, 2020

This PR adds a new condition in APIManager's Status section named Available.

We define the Available condition to true when ALL of the following scenarios are true:

  • All expected DeploymentConfigs to be deployed exist and have the Available condition set to true
  • All 3scale default OpenShift routes exist and have the Admitted condition set to true. The default routes are:
    • Master route
    • Backend Listener route
    • Default tenant admin route, developer route, APIcast staging and production routes beloinging to the default tenant

Design decisions / notable changes:

  • A new APIManagerStatusReconciler reconciler has been created that is now used in the APIManager controller
  • The previously defined conditions field in APIManager has been replaced by the common.Conditions type that we have available. This is a change in schema but if I'm not wrong this change in type has implied the addition of fields to the conditions field so this specific change should not be breaking compatibility.
  • The upgrade process does not update the status section. This keeps the same behavior as before this PR.

Pending:

  • Documentation
  • Testing
  • Clean commit history

return false
}

// TODO should we sort the DCs? when we set the status section we are
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the TODO by a comment indicating that at that point the DCs should be sorted already

@eguzki
Copy link
Member
eguzki commented Dec 15, 2020

missing error reporting. Are you implementing it in this PR?

@miguelsorianod
Copy link
Contributor Author

missing error reporting. Are you implementing it in this PR?

Do you mean spec validation implementation? If that's the case I was not intending to implement it in this PR.


statusResult, err := r.reconcileAPIManagerStatus(instance)
if err != nil {
statusResult, statusErr := r.reconcileAPIManagerStatus(instance, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure on the requeue/received errors management at this point between logic and status reconciliation. Let's iterate from this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now if there's a requeue or error in the logic reconciler we'll reconcile at that point. Same when status scenario is reached.

@miguelsorianod miguelsorianod force-pushed the improve-apimanager-status branch from 6a38650 to 77d8e50 Compare December 16, 2020 11:55
@miguelsorianod
Copy link
Contributor Author

missing error reporting. Are you implementing it in this PR?

I'll open an issue to investigate how we could improve in that aspect. I understand the idea is to further improve the status by checking usual errors that might happen (imagestreams cannot pull the image, etc...)

@eguzki
Copy link
Member
eguzki commented Dec 16, 2020

From https://issues.redhat.com/browse/THREESCALE-4753, this "state machine" would be a nice to have:

Create event when all DC condition is available when current status is not available
Create event when any DC condition is unavailable when current status is available

@miguelsorianod miguelsorianod force-pushed the improve-apimanager-status branch 2 times, most recently from e9963a5 to df5c188 Compare January 11, 2021 17:02
@miguelsorianod
Copy link
Contributor Author

When testing the functionality the following scenario has arisen:

We check the readiness of the OCP Routes. Due to Zync-related routes are not created by the APIManager they don't have an APIManager OwnerReference. This means that we are not able to detect changes on routes, and this implies that the reconciler is not able to detect them and the state calculation cannot be done appropriately. The following scenario happens:

  1. We deploy an APIManager
  2. Eventually the APIManager Deployments and Routes end up being deployed appropriately and in a ready state but the last state that the APIManager receives is one corresponding to one of its Deployments. At that point in time of the event all DCs are already ready but not all routes. Due to we don't receive events related to Zync-related routes for the reasons explained above this change in status is not observed by the APIManager reconcile and it leaves the APIManager's availability condition permanently set to false which would be incorrect.

The following solutions have been thought and are not feasible:

  • Watch all routes. The problem with this is that the event received is the name and namespace of the route and not the APIManager one
  • Watch all routes that contain an APIManager OwnerReference with the EnqueueRequestForOwner method. The issue here is that Zync-related routes do not directly have an APIManager OwnerReference but an OwnerReference to Zync-Que DC

Possible solutions that might be feasible:

  • System somehow adds the APIManager to which it belongs as an annotation or some information that from the operator we can take a look at. The issue with this is that it sort of ties APIManager knowledge in System
  • We create our own handler that recursively resolves OwnerReferences. This happens before the Event is received and we would watch Route events with this handler and this handler would try to remap from Route objects to its APIManager object. Related to this approach there are several unknowns and the implementation could have several edge cases. We should be careful if we decide to implement this
  • If we detect that routes are not ready then we reenqueue the request. The issue with this is that if all routes are ready and then some of them go into a "not ready" state we would not detect this either so we would end up having the same issue. There's also the issue on deciding WHEN should we reenqueue and how much delay. We have to keep in mind that we don't always want to wait when it's not ready

Maybe the best idea for now is to just take into account Deployments only and not consider routes. In this way we have something already and we can continue thinking about whether we want to include Routes and how. I'm not 100% sure though as I'm a little bit concerned that introducing this would be more problematic that if we don't due to users might report that "routes are not available but the 'Available' flag is set to true". At the end it will depend on how we define the Available flag and what expectations we set to the users / users have.

@miguelsorianod
Copy link
Contributor Author

I'm currently exploring the following possible solution that I commented in my previous reply:

We create our own handler that recursively resolves OwnerReferences. This happens before the Event is received and we would watch Route events with this handler and this handler would try to remap from Route objects to its APIManager object. Related to this approach there are several unknowns and the implementation could have several edge cases. We should be careful if we decide to implement this

@miguelsorianod
Copy link
Contributor Author

New changes:

@miguelsorianod
Copy link
Contributor Author

@eguzki I think maybe this could be good enough already?

Regarding your previous comment #549 (comment), depending on what's the objective we want to achieve with that it might not be needed: The Condition provides the field LastTransitionTime which provides the last time the condition transit from one status to another.

@miguelsorianod
Copy link
Contributor Author
miguelsorianod commented Jan 18, 2021

Examples:

APIManager status conditions when it is not available:

  "conditions": [
    {
      "lastTransitionTime": "2021-01-18T17:08:46Z",
      "status": "False",
      "type": "Available"
    }
  ],

APIManager status conditions when it is Available:

  "conditions": [
    {
      "lastTransitionTime": "2021-01-18T17:13:42Z",
      "status": "True",
      "type": "Available"
    }
  ],

@eguzki
Copy link
Member
eguzki commented Jan 26, 2021

About the generation of k8s events on:

Create event when all DC condition is available when current status is not available
Create event when any DC condition is unavailable when current status is available

The state can be the status field itself. No need to inspect the transition time.

It can be done in another PR.

8000
if routeFound {
routeReady := helper.IsRouteReady(matchedRoute)
if !routeReady {
allDefaultRoutesReady = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can break the loop, right?

In the future, we can build a list or errors (errorList) to report which route is not ready. The allDefaultRoutesReady would be when the length of the errList is 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct.
The reason I didn't break the loop is exactly what you said. I left it prepared in case we want to add the list of routes.

@eguzki
Copy link
Member
eguzki commented Jan 26, 2021

Looking good. Added some feedback

Add 'Available' field to APIManager status section.
We define the Available condition to true when ALL of the following scenarios are true:
  · All expected DeploymentConfigs to be deployed exist and have the Available condition set to true
  · All 3scale default OpenShift routes exist and have the Admitted condition set to true. The default routes are:
    · Master route
    · Backend Listener route
    · Default tenant admin route, developer route, APIcast staging and production routes beloinging to the default tenant

Additionally status reconciliation has been refactored into its own reconciler
@miguelsorianod miguelsorianod force-pushed the improve-apimanager-status branch from 5ab0662 to a718df6 Compare January 28, 2021 16:43
@codeclimate
Copy link
codeclimate bot commented Jan 28, 2021

Code Climate has analyzed commit a718df6 and detected 15 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 1
Style 14

View more on Code Climate.

@miguelsorianod
Copy link
Contributor Author

Changes applied. Ready for a new review

@eguzki
Copy link
Member
eguzki commented Jan 28, 2021

looking good

@miguelsorianod miguelsorianod changed the title [WIP] Improve APIManager status Improve APIManager status Jan 29, 2021
@miguelsorianod miguelsorianod merged commit 910ee20 into master Jan 29, 2021
@miguelsorianod miguelsorianod deleted the improve-apimanager-status branch January 29, 2021 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0