UPDATE: I see "nats disconnected" in the Vertex init log, so I suspect the isbsvc was being flaky.
Describe the bug
e2e failure
Snippet:
Concurrent e2e Should update all rollouts concurrently
COMBINATION: Data Loss: Pipeline, NumaflowController; Direct Apply: ISBService
/home/runner/work/numaplane/numaplane/tests/e2e/concurrent-e2e/concurrent_test.go:312
STEP: Verifying that the NumaflowControllerRollout was created @ 05/14/25 22:34:18.04
STEP: Verifying that the NumaflowControllerRollout is ready @ 05/14/25 22:34:18.09
STEP: Verifying that NumaflowControllerRollout is deployed @ 05/14/25 22:34:18.09
STEP: Verifying that NumaflowControllerRollout child resources is deployed @ 05/14/25 22:34:18.138
STEP: Verifying that NumaflowControllerRollout child resources is healthy @ 05/14/25 22:34:18.188
STEP: Verifying that the Numaflow Controller Deployment exists @ 05/14/25 22:34:22.537
STEP: Verifying that the ISBServiceRollout was created @ 05/14/25 22:34:22.542
STEP: Verifying that the ISBServiceRollout is ready @ 05/14/25 22:34:22.544
STEP: verify ISBServiceRollout is deployed @ 05/14/25 22:34:22.544
STEP: verify ISBServiceRollout child resource is deployed @ 05/14/25 22:34:22.663
STEP: verify ISBServiceRollout child resource is healthy @ 05/14/25 22:34:22.665
STEP: Verifying that the ISBService exists @ 05/14/25 22:35:00.743
STEP: Verifying that the StatefulSet exists and is ready @ 05/14/25 22:35:00.787
STEP: Verifying that the StatefulSet Pods are in Running phase @ 05/14/25 22:35:00.79
STEP: getting new isbservice name @ 05/14/25 22:35:00.794
STEP: Verifying PDB @ 05/14/25 22:35:00.796
STEP: found PDB with Match Labels: map[app.kubernetes.io/component:isbsvc numaflow.numaproj.io/isbsvc-name:test-isbservice-rollout-0] @ 05/14/25 22:35:00.798
found PDB with Match Labels: map[app.kubernetes.io/component:isbsvc numaflow.numaproj.io/isbsvc-name:test-isbservice-rollout-0]
STEP: Verifying that the PipelineRollout was created @ 05/14/25 22:35:00.803
STEP: Verifying that the Pipeline was created @ 05/14/25 22:35:00.807
STEP: verifying Pipeline Spec @ 05/14/25 22:35:00.807
STEP: Verifying that the PipelineRollout is Deployed @ 05/14/25 22:35:00.903
STEP: Verifying that the PipelineRollout is Deployed @ 05/14/25 22:35:01.118
STEP: Verifying InProgressStrategy @ 05/14/25 22:35:01.126
STEP: Verifying that the PipelineRollout Child Condition is Healthy @ 05/14/25 22:35:01.141
[FAILED] in [It] - /home/runner/work/numaplane/numaplane/tests/e2e/pipeline.go:97 @ 05/14/25 22:41:01.157
[ABORTED] in [AfterEach] - /home/runner/work/numaplane/numaplane/tests/e2e/common.go:636 @ 05/14/25 22:41:01.157
• [ABORTED] [403.168 seconds]
TOP-LEVEL [AfterEach] Concurrent e2e Should update all rollouts concurrently
COMBINATION: Data Loss: Pipeline, NumaflowController; Direct Apply: ISBService
[AfterEach] /home/runner/work/numaplane/numaplane/tests/e2e/common.go:632
[It] /home/runner/work/numaplane/numaplane/tests/e2e/concurrent-e2e/concurrent_test.go:312
[ABORTED] Test spec has failed, aborting suite run
In [AfterEach] at: /home/runner/work/numaplane/numaplane/tests/e2e/common.go:636 @ 05/14/25 22:41:01.157
pod-logs-progressive-concurrent.zip
resource-changes-progressive-concurrent.zip
Message from the maintainers:
Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.