e2e: add workload health validation to DR ops and refactor cluster name handling by parikshithb · Pull Request #2071 · RamenDR/ramen · GitHub

e2e: add workload health validation to DR ops and refactor cluster name handling #2071


Merged
merged 9 commits into RamenDR:main from validate_wl
Jun 5, 2025

Conversation

parikshithb
@parikshithb parikshithb commented May 30, 2025

This PR adds workload health validation to all DR operations and deployer methods to ensure workloads are fully operational before considering operations successful. The implementation required refactoring to work with cluster objects instead of cluster name strings.

Changes Made

Workload Health Validation:

  • Added health validation after all deployment operations (ApplicationSet & Subscription)
  • Added health validation after DR protection enable operation for both regular and discovered apps
  • Added health validation after failover/relocate operations to ensure workloads are healthy on target clusters

Cluster Handling Refactoring:
Moved GetCluster from standalone function to method on *Env type, updated functions to return and accept types.Cluster objects instead of cluster name strings, and refactored failoverRelocate functions to eliminate redundant cluster lookups.

Fixes #2018

This change converts GetCluster function from e2e/env package
to types package as a method on *Env type, encapsulating env
functionality by grouping cluster retrieval logic with the Env
type it operates on.

Signed-off-by: Parikshith <parikshithb@gmail.com>
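The shape of that refactoring can be sketched as follows; `Cluster` and `Env` here are simplified stand-ins for the structs in the e2e types package (the real ones also carry kubeconfig paths and clients), and the field names are assumptions for illustration:

```go
package main

import "fmt"

// Cluster is a simplified stand-in for types.Cluster.
type Cluster struct {
	Name string
}

// Env is a simplified stand-in for the e2e environment type.
type Env struct {
	Hub, C1, C2 Cluster
}

// GetCluster looks up a cluster by name. As a method on *Env, the
// lookup logic lives next to the data it operates on, instead of
// being a standalone function in a separate package.
func (e *Env) GetCluster(name string) (Cluster, error) {
	switch name {
	case e.Hub.Name:
		return e.Hub, nil
	case e.C1.Name:
		return e.C1, nil
	case e.C2.Name:
		return e.C2, nil
	default:
		return Cluster{}, fmt.Errorf("cluster %q not found in environment", name)
	}
}

func main() {
	env := &Env{Hub: Cluster{Name: "hub"}, C1: Cluster{Name: "dr1"}, C2: Cluster{Name: "dr2"}}
	c, err := env.GetCluster("dr1")
	if err != nil {
		panic(err)
	}
	fmt.Println("found cluster:", c.Name)
}
```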
@parikshithb parikshithb removed the request for review from raghavendra-talur May 30, 2025 11:48
@nirs nirs left a comment


Awesome! See the comment about simplifying error handling.

Can you share results with this change? How much time do we wait for health in each step, and logs showing that we wait for health? If we don't have these logs, we need to add them to make it easier to debug when the wait times out.

@parikshithb parikshithb force-pushed the validate_wl branch 2 times, most recently from 1e68575 to e61af30 Compare June 2, 2025 09:37
@parikshithb
parikshithb commented Jun 2, 2025
  1. Workload validation in deploy takes <10 seconds, except for appset apps, which take <1 min:
2025-06-02T16:51:12.772+0530	INFO	appset-deploy-rbd-busybox	deployers/appset.go:23	Deploying applicationset app "e2e-appset-deploy-rbd-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:13.043+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:52:03.144+0530	DEBUG	appset-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:52:03.144+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:52:03.144+0530	INFO	appset-deploy-rbd-busybox	deployers/appset.go:49	Workload deployed

2025-06-02T16:51:12.773+0530	INFO	appset-deploy-cephfs-busybox	deployers/appset.go:23	Deploying applicationset app "e2e-appset-deploy-cephfs-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:13.078+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:52:03.207+0530	DEBUG	appset-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:52:03.207+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:52:03.207+0530	INFO	appset-deploy-cephfs-busybox	deployers/appset.go:49	Workload deployed

2025-06-02T16:51:12.773+0530	INFO	subscr-deploy-cephfs-busybox	deployers/subscr.go:41	Deploying subscription app "e2e-subscr-deploy-cephfs-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:24.473+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:34.494+0530	DEBUG	subscr-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:51:34.494+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:34.494+0530	INFO	subscr-deploy-cephfs-busybox	deployers/subscr.go:78	Workload deployed

2025-06-02T16:51:12.773+0530	INFO	subscr-deploy-rbd-busybox	deployers/subscr.go:41	Deploying subscription app "e2e-subscr-deploy-rbd-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:29.503+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:39.524+0530	DEBUG	subscr-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:51:39.524+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:39.524+0530	INFO	subscr-deploy-rbd-busybox	deployers/subscr.go:78	Workload deployed

2025-06-02T16:51:12.880+0530	INFO	disapp-deploy-cephfs-busybox	deployers/disapp.go:47	Deploying discovered app "e2e-disapp-deploy-cephfs-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:16.080+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:16.108+0530	DEBUG	disapp-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:51:16.108+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:16.108+0530	INFO	disapp-deploy-cephfs-busybox	deployers/disapp.go:66	Workload deployed

2025-06-02T16:51:12.880+0530	INFO	disapp-deploy-rbd-busybox	deployers/disapp.go:47	Deploying discovered app "e2e-disapp-deploy-rbd-busybox/busybox" in cluster "dr1"
2025-06-02T16:51:15.983+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:16.010+0530	DEBUG	disapp-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:51:16.010+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:51:16.010+0530	INFO	disapp-deploy-rbd-busybox	deployers/disapp.go:66	Workload deployed
  2. In protect and unprotect, workload validation completes in milliseconds for all apps.

  3. In failover as well, workload validation completes in milliseconds for all apps:

2025-06-02T16:52:46.792+0530	INFO	disapp-deploy-rbd-busybox	dractions/actions.go:163	Failing over workload "e2e-disapp-deploy-rbd-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T16:56:23.021+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:23.036+0530	DEBUG	disapp-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-rbd-busybox/busybox" is ready in cluster "dr2"
2025-06-02T16:56:23.036+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:23.036+0530	INFO	disapp-deploy-rbd-busybox	dractions/actions.go:171	Workload failed over

2025-06-02T16:52:51.975+0530	INFO	disapp-deploy-cephfs-busybox	dractions/actions.go:163	Failing over workload "e2e-disapp-deploy-cephfs-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T16:56:51.023+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:51.028+0530	DEBUG	disapp-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-cephfs-busybox/busybox" is ready in cluster "dr2"
2025-06-02T16:56:51.028+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:51.028+0530	INFO	disapp-deploy-cephfs-busybox	dractions/actions.go:171	Workload failed over

2025-06-02T16:53:15.129+0530	INFO	subscr-deploy-rbd-busybox	dractions/actions.go:163	Failing over workload "e2e-subscr-deploy-rbd-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T16:56:43.056+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:43.060+0530	DEBUG	subscr-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-rbd-busybox/busybox" is ready in cluster "dr2"
2025-06-02T16:56:43.060+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:56:43.060+0530	INFO	subscr-deploy-rbd-busybox	dractions/actions.go:171	Workload failed over

2025-06-02T16:53:38.470+0530	INFO	appset-deploy-rbd-busybox	dractions/actions.go:163	Failing over workload "e2e-appset-deploy-rbd-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T17:00:09.649+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T17:00:09.682+0530	DEBUG	appset-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-rbd-busybox/busybox" is ready in cluster "dr2"
2025-06-02T17:00:09.682+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T17:00:09.682+0530	INFO	appset-deploy-rbd-busybox	dractions/actions.go:171	Workload failed over

2025-06-02T16:53:38.588+0530	INFO	appset-deploy-cephfs-busybox	dractions/actions.go:163	Failing over workload "e2e-appset-deploy-cephfs-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T17:00:09.654+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T17:00:09.681+0530	DEBUG	appset-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-cephfs-busybox/busybox" is ready in cluster "dr2"
2025-06-02T17:00:09.682+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T17:00:09.682+0530	INFO	appset-deploy-cephfs-busybox	dractions/actions.go:171	Workload failed over

2025-06-02T16:53:40.677+0530	INFO	subscr-deploy-cephfs-busybox	dractions/actions.go:163	Failing over workload "e2e-subscr-deploy-cephfs-busybox/busybox" from cluster "dr1" to cluster "dr2"
2025-06-02T16:58:08.215+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:58:08.223+0530	DEBUG	subscr-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-cephfs-busybox/busybox" is ready in cluster "dr2"
2025-06-02T16:58:08.223+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr2"
2025-06-02T16:58:08.223+0530	INFO	subscr-deploy-cephfs-busybox	dractions/actions.go:171	Workload failed over
  4. During relocate, workload validation took ~15 s for subscr rbd, ~2.2 min for appset cephfs (which might reach the relocate deadline), and ~80 s for appset rbd:
2025-06-02T16:56:23.057+0530	INFO	disapp-deploy-rbd-busybox	dractions/actions.go:196	Relocating workload "e2e-disapp-deploy-rbd-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T16:58:21.134+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:58:21.149+0530	DEBUG	disapp-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:58:21.149+0530	DEBUG	disapp-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:58:21.149+0530	INFO	disapp-deploy-rbd-busybox	dractions/actions.go:204	Workload relocated

2025-06-02T16:56:43.085+0530	INFO	subscr-deploy-rbd-busybox	dractions/actions.go:196	Relocating workload "e2e-subscr-deploy-rbd-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T16:58:43.363+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:58:58.382+0530	DEBUG	subscr-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T16:58:58.383+0530	DEBUG	subscr-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T16:58:58.383+0530	INFO	subscr-deploy-rbd-busybox	dractions/actions.go:204	Workload relocated

2025-06-02T16:56:51.055+0530	INFO	disapp-deploy-cephfs-busybox	dractions/actions.go:196	Relocating workload "e2e-disapp-deploy-cephfs-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T17:00:49.462+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:00:49.468+0530	DEBUG	disapp-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-disapp-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T17:00:49.468+0530	DEBUG	disapp-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-disapp-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:00:49.468+0530	INFO	disapp-deploy-cephfs-busybox	dractions/actions.go:204	Workload relocated

2025-06-02T16:58:08.244+0530	INFO	subscr-deploy-cephfs-busybox	dractions/actions.go:196	Relocating workload "e2e-subscr-deploy-cephfs-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T17:03:14.749+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:03:14.756+0530	DEBUG	subscr-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-subscr-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T17:03:14.756+0530	DEBUG	subscr-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-subscr-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:03:14.756+0530	INFO	subscr-deploy-cephfs-busybox	dractions/actions.go:204	Workload relocated

2025-06-02T17:00:09.785+0530	INFO	appset-deploy-rbd-busybox	dractions/actions.go:196	Relocating workload "e2e-appset-deploy-rbd-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T17:05:08.509+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:06:33.653+0530	DEBUG	appset-deploy-rbd-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-rbd-busybox/busybox" is ready in cluster "dr1"
2025-06-02T17:06:33.653+0530	DEBUG	appset-deploy-rbd-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-rbd-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:06:33.653+0530	INFO	appset-deploy-rbd-busybox	dractions/actions.go:204	Workload relocated

2025-06-02T17:00:09.790+0530	INFO	appset-deploy-cephfs-busybox	dractions/actions.go:196	Relocating workload "e2e-appset-deploy-cephfs-busybox/busybox" from cluster "dr2" to cluster "dr1"
2025-06-02T17:07:08.992+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:46	Waiting until workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:09:24.374+0530	DEBUG	appset-deploy-cephfs-busybox	workloads/deploy.go:95	Deployment "e2e-appset-deploy-cephfs-busybox/busybox" is ready in cluster "dr1"
2025-06-02T17:09:24.374+0530	DEBUG	appset-deploy-cephfs-busybox	deployers/retry.go:51	Workload "e2e-appset-deploy-cephfs-busybox/busybox" is healthy in cluster "dr1"
2025-06-02T17:09:24.374+0530	INFO	appset-deploy-cephfs-busybox	dractions/actions.go:204	Workload relocated

dr.log

@parikshithb parikshithb requested a review from nirs June 2, 2025 13:53
@nirs nirs left a comment


Let's understand why the application needs 2m:30s to become healthy after relocate when using appset-deploy-cephfs. We may need to increase the relocate timeout to avoid failures on slow setups.

@parikshithb parikshithb marked this pull request as draft June 4, 2025 12:05
@nirs nirs self-requested a review June 4, 2025 12:26
Updated GetCurrentCluster function to return types.Cluster
object by looking up the cluster name from PlacementDecision
and retrieving the corresponding cluster from the env.
Updated the function comment to reflect that it now returns
the cluster object.

Updated callers to access the full cluster object rather than
name string across deployers and DR actions to handle the new
return type.

Signed-off-by: Parikshith <parikshithb@gmail.com>
Updated getTargetCluster function to return types.Cluster object
by looking up the target cluster name and retrieving the cluster
from the env.
Updated variable naming from targetCluster to targetClusterName.

Updated all callers in Failover and Relocate functions to use
targetCluster.Name when passing cluster names.

Signed-off-by: Parikshith <parikshithb@gmail.com>
Updated failoverRelocate and failoverRelocateDiscoveredApps
functions to accept types.Cluster objects instead of cluster
name strings for currentCluster and targetCluster parameters.
This eliminates redundant GetCluster calls within
failoverRelocateDiscoveredApps since the cluster objects are
now passed directly from the callers.
Updated function calls in Failover and Relocate to pass cluster
objects instead of cluster names

Signed-off-by: Parikshith <parikshithb@gmail.com>
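The point of this commit is that the caller resolves each cluster once and passes the objects through, instead of the helper re-resolving names internally. A small sketch under assumed names (`env`, `getCluster`, `failoverRelocate` are illustrative stand-ins, not the actual Ramen signatures):

```go
package main

import "fmt"

// Cluster is a stand-in for types.Cluster.
type Cluster struct{ Name string }

// env is a stand-in for the e2e environment with two DR clusters.
type env struct{ c1, c2 Cluster }

func (e *env) getCluster(name string) (Cluster, error) {
	switch name {
	case e.c1.Name:
		return e.c1, nil
	case e.c2.Name:
		return e.c2, nil
	}
	return Cluster{}, fmt.Errorf("cluster %q not found", name)
}

// Before the refactor, a helper like this received name strings and
// had to call getCluster again internally; accepting Cluster objects
// eliminates that redundant lookup.
func failoverRelocate(currentCluster, targetCluster Cluster) string {
	return fmt.Sprintf("moving workload from %q to %q",
		currentCluster.Name, targetCluster.Name)
}

func main() {
	e := &env{c1: Cluster{Name: "dr1"}, c2: Cluster{Name: "dr2"}}
	current, _ := e.getCluster("dr1") // single lookup at the call site
	target, _ := e.getCluster("dr2")
	fmt.Println(failoverRelocate(current, target))
}
```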
Updated waitAndUpdateDRPC function to accept types.Cluster
object instead of cluster name string, maintaining
consistency with other functions that now work with cluster
objects.
Updated callers in failoverRelocate for managed and disapp
to pass cluster objects instead of cluster names.

Signed-off-by: Parikshith <parikshithb@gmail.com>
Add debug log when starting to wait for workload health to
improve debugging visibility when health checks time out.
Standardize error message format to use namespace/appName
and fix method call to getAppName() instead of GetName().

Signed-off-by: Parikshith <parikshithb@gmail.com>
@parikshithb parikshithb requested a review from nirs June 4, 2025 14:34
@nirs
nirs commented Jun 4, 2025

Ran e2e locally, including the workload validation after ops, with DR policies having 5 min and 1 min (default) scheduling intervals:

5m drpolicy used in config:

    name: dr-policy-5m
    resourceVersion: "67851"
    uid: 870a6b4d-8c3b-432d-af79-ca1cd393735c
  spec:
    drClusters:
    - dr1
    - dr2
    replicationClassSelector: {}
    schedulingInterval: 5m

The times look very similar; did you change the drPolicy in the config? The validation logs should show the DRPolicy.

@parikshithb

Ran e2e including the workload validation after ops with drpolicies having 5 min and 1m(default) scheduling interval locally:
5m drpolicy used in config:

    name: dr-policy-5m
    resourceVersion: "67851"
    uid: 870a6b4d-8c3b-432d-af79-ca1cd393735c
  spec:
    drClusters:
    - dr1
    - dr2
    replicationClassSelector: {}
    schedulingInterval: 5m

The times looks very similar, did you change the drPolicy in the config? The validation logs should show the drpolicy.

Yup, I made sure I updated the drpolicy in the config; it is logged in our logs:

2025-06-04T18:57:20.107+0530	INFO	validate/validate.go:145	Validated clusters ["dr1", "dr2"] in DRPolicy "dr-policy-5m"

@parikshithb parikshithb marked this pull request as ready for review June 4, 2025 15:25
@parikshithb parikshithb requested a review from nirs June 4, 2025 15:30
@nirs nirs left a comment


Looks good! One commit message should be fixed.

Add WaitWorkloadHealth checks to all DR operations except
unprotect (due to RamenDR#2077) to ensure the workload is
healthy after operations complete.
Updated DiscoveredApp deployer to use the cluster variable
instead of ctx.Env().C1 for consistency.

These changes ensure workloads are fully operational before
considering workload deployments and different DR operations
successful.

Signed-off-by: Parikshith <parikshithb@gmail.com>
The Health method was incorrectly returning nil (success) even
when deployments were not ready, causing WaitWorkloadHealth to
immediately succeed without waiting. Now returns proper error
with replica status when deployment is not healthy.

Signed-off-by: Parikshith <parikshithb@gmail.com>
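The fix described in this commit is the difference between an unconditional `return nil` and a check against the replica counts. A minimal sketch, assuming hypothetical names (`health`, `deploymentStatus`) that only mirror the two appsv1.DeploymentStatus fields the readiness check needs:

```go
package main

import "fmt"

// deploymentStatus mirrors the appsv1.DeploymentStatus fields used
// by the readiness check; simplified for illustration.
type deploymentStatus struct {
	Replicas      int32
	ReadyReplicas int32
}

// health returns nil only when every desired replica is ready. The
// bug was returning nil unconditionally, which made the wait loop
// succeed immediately; including the replica counts in the error
// makes a timeout log show exactly what was still pending.
func health(name string, s deploymentStatus) error {
	if s.Replicas == 0 || s.ReadyReplicas != s.Replicas {
		return fmt.Errorf("deployment %q is not ready: %d/%d replicas ready",
			name, s.ReadyReplicas, s.Replicas)
	}
	return nil
}

func main() {
	// Not ready: the error carries the replica status.
	fmt.Println(health("busybox", deploymentStatus{Replicas: 1, ReadyReplicas: 0}))
	// Ready: no error.
	fmt.Println(health("busybox", deploymentStatus{Replicas: 1, ReadyReplicas: 1}))
}
```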
The Health() method was logging when a deployment is ready,
but this is redundant since the caller (WaitWorkloadHealth)
already logs both the "waiting" and "healthy" status messages.
This eliminates duplicate log entries for the same event.

Signed-off-by: Parikshith <parikshithb@gmail.com>
@nirs nirs left a comment


Thanks!

@nirs nirs merged commit 7ca6bd4 into RamenDR:main Jun 5, 2025
23 checks passed
@nirs
nirs commented Jun 5, 2025

@parikshithb please send a ramenctl PR to consume this change.

parikshithb added a commit to parikshithb/ramenctl that referenced this pull request Jun 5, 2025
Consuming fix for validating workload health after DR
operations: RamenDR/ramen#2071

Issue fixed in ramen e2e: RamenDR/ramen#2018

Signed-off-by: Parikshith <parikshithb@gmail.com>
nirs pushed a commit to RamenDR/ramenctl that referenced this pull request Jun 5, 2025
Consuming fix for validating workload health after DR
operations: RamenDR/ramen#2071

Issue fixed in ramen e2e: RamenDR/ramen#2018

Signed-off-by: Parikshith <parikshithb@gmail.com>
Development

Successfully merging this pull request may close these issues.

e2e: Validate workload health after key operations