Run the chaosduck component during our e2e testing. by mattmoor · Pull Request #3565 · knative/eventing · GitHub
Run the chaosduck component during our e2e testing. #3565

Merged
merged 1 commit into from
Jul 15, 2020

Conversation

mattmoor
Member

No description provided.

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2020
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 11, 2020
@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/test-and-release Test infrastructure, tests or release labels Jul 11, 2020
@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2020
@mattmoor
Member Author

It passed, running it again.

/test pull-knative-eventing-integration-tests

@mattmoor
Member Author

Pretty sure this isn't running properly yet. Without a replicated webhook, I'd at least expect intermittent failures there, and looking through the logs the pods are far too old relative to the chaos duck:

chaosduck-5989f5cd88-mrb4n             1/1   Running   0     103s
eventing-controller-76cbd5d948-gcvbk   1/1   Running   0     2m38s
eventing-webhook-b66887bcd-ctnk6       1/1   Running   1     2m32s
eventing-webhook-b66887bcd-hr8x2       1/1   Running   0     2m32s
imc-controller-6c7979cfd9-ft7kz        1/1   Running   0     4s
imc-dispatcher-6596db8fcc-wk82v        1/1   Running   0     4s
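
The age comparison above can be made mechanical. The following is a minimal sketch, not part of the PR: it assumes the AGE column uses only units that Go's `time.ParseDuration` understands (`s`, `m`, `h`), and the `pod` type and `olderThan` helper are hypothetical names introduced for illustration. It flags pods that have been alive longer than the chaosduck pod, which is what one would not expect if the chaosduck were deleting them.

```go
package main

import (
	"fmt"
	"time"
)

// pod holds the two columns of the listing that matter here.
// Hypothetical type, introduced only for this sketch.
type pod struct {
	name string
	age  string // AGE column as printed, e.g. "2m38s"
}

// olderThan returns the names of pods whose age exceeds the reference age.
// Ages must be parseable by time.ParseDuration, so day-based ages such as
// "2d" are rejected with an error.
func olderThan(ref string, pods []pod) ([]string, error) {
	refAge, err := time.ParseDuration(ref)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, p := range pods {
		age, err := time.ParseDuration(p.age)
		if err != nil {
			return nil, err
		}
		if age > refAge {
			out = append(out, p.name)
		}
	}
	return out, nil
}

func main() {
	// Values copied from the pod listing in this comment.
	pods := []pod{
		{"eventing-controller-76cbd5d948-gcvbk", "2m38s"},
		{"eventing-webhook-b66887bcd-ctnk6", "2m32s"},
		{"eventing-webhook-b66887bcd-hr8x2", "2m32s"},
		{"imc-controller-6c7979cfd9-ft7kz", "4s"},
		{"imc-dispatcher-6596db8fcc-wk82v", "4s"},
	}
	// The chaosduck pod was 103s old in the same listing.
	stale, err := olderThan("103s", pods)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range stale {
		fmt.Println("older than chaosduck:", name)
	}
}
```

With the listing above, the controller and both webhook pods are flagged, while the two IMC pods are not.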

@mattmoor
Member Author

Cracking the logs I see:

knative-eventing-9utkcux231/chaosduck-5989f5cd88-mrb4n[chaosduck]: 2020/07/12 15:00:22 Ended iteration with err: pods "eventing-controller-76cbd5d948-gcvbk" is forbidden: User "system:serviceaccount:knative-eventing-9utkcux231:eventing-controller" cannot delete resource "pods" in API group "" in the namespace "knative-eventing-9utkcux231"

I just need to be less lazy about piggybacking on an existing SA and create my own with the right RBAC 🙃
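
To illustrate why the borrowed service account is rejected, here is a simplified model of RBAC rule matching. It is a sketch under stated assumptions: the `rule` type and `allows` function are hypothetical stand-ins rather than the Kubernetes authorizer, and both rule sets are invented for illustration, since the actual rules bound to `eventing-controller` and the ones added in this PR are not shown in this thread.

```go
package main

import "fmt"

// rule is a simplified stand-in for an RBAC policy rule: it grants the
// listed verbs on the listed resources in the listed API groups.
// Hypothetical type, not the Kubernetes API type.
type rule struct {
	apiGroups []string
	resources []string
	verbs     []string
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want || s == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any single rule grants verb on resource in group.
func allows(rules []rule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.apiGroups, group) && contains(r.resources, resource) && contains(r.verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Assumed rules for a borrowed, read-only service account.
	borrowed := []rule{
		{apiGroups: []string{""}, resources: []string{"pods"}, verbs: []string{"get", "list", "watch"}},
	}
	// Assumed rules for a dedicated account that may also delete pods.
	dedicated := []rule{
		{apiGroups: []string{""}, resources: []string{"pods"}, verbs: []string{"get", "list", "delete"}},
	}

	// The log line above is a denied "delete" on "pods" in the core group "".
	fmt.Println(allows(borrowed, "", "pods", "delete"))  // false
	fmt.Println(allows(dedicated, "", "pods", "delete")) // true
}
```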

@mattmoor
Member Author

This should be fixed, let's see what breaks now 😈

@mattmoor
Member Author

The problem now is where I'm standing things up. I saw the same thing in serving, where wait_for_ready_pods in the serving namespace never completes because there is always a pod in the Terminating state 😈
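
The failure mode can be sketched as follows. This is only a model: the real wait_for_ready_pods is a shell helper whose logic is not shown in this thread, and `allSettled` and `waitForPods` are hypothetical names. The assumption is that the wait succeeds only when a single snapshot shows every pod as Running.

```go
package main

import "fmt"

// allSettled reports whether every status in the snapshot is "Running".
// Statuses are strings as kubectl prints them in the STATUS column.
func allSettled(statuses []string) bool {
	for _, s := range statuses {
		if s != "Running" {
			return false
		}
	}
	return true
}

// waitForPods walks through successive snapshots (one per poll) and
// returns true as soon as one of them is fully settled.
func waitForPods(snapshots [][]string) bool {
	for _, s := range snapshots {
		if allSettled(s) {
			return true
		}
	}
	return false
}

func main() {
	// Without chaos, pods converge and the wait succeeds.
	calm := [][]string{
		{"Running", "Pending"},
		{"Running", "Running"},
	}
	// With pods being deleted continuously, every snapshot contains a
	// Terminating pod, so the wait exhausts its polls.
	chaos := [][]string{
		{"Running", "Terminating"},
		{"Terminating", "Running"},
		{"Running", "Terminating"},
	}
	fmt.Println(waitForPods(calm))  // true
	fmt.Println(waitForPods(chaos)) // false
}
```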

@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 14, 2020
@mattmoor
Member Author

After enabling the mt-broker-controller, some Brokers start failing to become ready 🤔

Rolled back the broker bit and trying the next component on my list.

This runs the following components in an HA configuration and enables "chaosduck" on them:
 - eventing webhook
 - eventing controller
 - sugar controller

This also stubs things out for the IMC controller/dispatcher and the MT Broker, but these are disabled due to observed issues (see linked issues).
@mattmoor mattmoor changed the title [WIP] Run the chaosduck component during our e2e testing. Run the chaosduck component during our e2e testing. Jul 14, 2020
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 14, 2020
@mattmoor
Member Author

/hold

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2020
@mattmoor
Member Author
mattmoor commented Jul 15, 2020

/test pull-knative-eventing-integration-tests

(running again to flush out flakes) 😇

@mattmoor
Member Author

One more time...

/test pull-knative-eventing-integration-tests

@mattmoor
Member Author

This looks like the webhook shutdown failure I am chasing (here):

TestChannelNamespaceDefaulter/InMemoryChannel-messaging.knative.dev/v1: creation.go:79: Failed to create channel "e2e-defaulter-channel": Internal error occurred: failed calling webhook "webhook.eventing.knative.dev": Post https://eventing-webhook.knative-eventing-cjvz5x2e0p.svc:443/defaulting?timeout=2s: EOF

/retest

@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-knative-eventing-integration-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-eventing-integration-tests:

test/e2e.TestDefaultBrokerWithManyTriggers
test/e2e.TestDefaultBrokerWithManyTriggers/test_default_broker_with_many_attribute_and_extension_triggers

@mattmoor
Member Author

Webhook again:

TestDefaultBrokerWithManyTriggers/test_default_broker_with_many_attribute_and_extension_triggers: creation.go:219: Failed to create v1beta1 trigger "trigger-testany-testany--extname1-extval1": Internal error occurred: failed calling webhook "validation.webhook.eventing.knative.dev": Post https://eventing-webhook.knative-eventing-kckqbzjwly.svc:443/resource-validation?timeout=2s: EOF

There is half a fix in (#3596), and I talked to @tcnghia about bumping network.DefaultDrainTimeout as well.

/retest

@vaikas
Contributor
vaikas commented Jul 15, 2020

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2020
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, vaikas

@mattmoor
Member Author

/hold cancel

If we start seeing pervasive issues, we should roll this back, but the scope of this is intended to be a relatively stable subset to start flushing out more niche HA issues.

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 15, 2020
@knative-prow-robot knative-prow-robot merged commit 884ad13 into knative:master Jul 15, 2020
@mattmoor mattmoor deleted the chaosduck branch July 15, 2020 20:35
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
area/test-and-release Test infrastructure, tests or release
cla: yes Indicates the PR's author has signed the CLA.
lgtm Indicates that a PR is ready to be merged.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants