Add Enos benchmark scenario #30675

raskchanky · 2025-05-19T20:59:37Z

Description

This PR adds a new scenario called benchmark. Its purpose is to set up the infrastructure required to run performance tests against a Vault cluster. In its current state, the tests are not run automatically.

Note that the scenario hardcodes ubuntu/amd64. This was a choice made for expediency, as some of the required packages had to be installed from source rather than from a package manager. Ultimately, it would probably be nice to go back, remove those hardcoded choices, and make the package installation a bit more flexible.

There are a few pieces to this new scenario:

- A new benchmark module, which:
  - Creates the metrics and k6 instances.
  - Installs k6 on the k6 instance, surprise!
  - Installs grafana and prometheus on the metrics instance.
  - Uses sed to add some telemetry to Consul, if that’s being used as the Vault backend.
  - Installs the prometheus node exporter on all the Vault nodes, Consul backend nodes (if present), and the k6 node. This is for collecting host metrics, e.g. CPU, memory, disk.
  - Copies all the grafana dashboards up to the metrics instance.
  - Copies all the k6 templates up to the k6 instance.
- A new restart_consul module, for restarting Consul (surprise!) after the telemetry is added.
- A new create_metrics_security_groups module for creating the security groups needed for opening grafana and prometheus ports.
- Added telemetry to the Vault config for all the Vault nodes.
- A max_io variable for passing to the target_ec2_instances module. If present, this will configure fast disks.
- Four grafana dashboards
- Three k6 templates
- Various shell scripts for installing and running things
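
For reviewers unfamiliar with the sed step, here is a minimal sketch of how a telemetry stanza could be appended to a Consul config. It is not the script from this PR: the config contents and the temp-file paths are made-up stand-ins, and the two telemetry settings (`prometheus_retention_time`, `disable_hostname`) are assumed values chosen because they are standard Consul telemetry options. It assumes GNU sed, as found on Ubuntu.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the Consul config file on a backend node.
consul_config="$(mktemp)"
cat > "$consul_config" <<'EOF'
data_dir = "/opt/consul/data"
server   = true
EOF

# Telemetry stanza to add. The values are assumptions, not taken from the PR.
telemetry_stanza="$(mktemp)"
cat > "$telemetry_stanza" <<'EOF'
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
}
EOF

# Append the stanza after the last line ($) using sed's r (read file) command.
# The grep guard keeps the edit idempotent if the script runs twice.
if ! grep -q '^telemetry' "$consul_config"; then
  sed -i -e "\$r ${telemetry_stanza}" "$consul_config"
fi

cat "$consul_config"
```

Consul has to be restarted to pick up the change, which is what the restart_consul module is for.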

Right now this is used by:

1. Launching the scenario
2. Opening the grafana dashboard on the metrics instance in a browser
3. SSHing into the k6 instance and using the k6-run.sh script to run a specific k6 scenario
4. Capturing the output from the grafana dashboard by hand, typically via screenshots
5. Destroying everything
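
The manual steps above can be sketched as a dry-run script. This is only an illustration under stated assumptions: the host names, the `ubuntu` login user, the location of `k6-run.sh`, and the argument passed to it are all hypothetical placeholders, and the Grafana port is Grafana's default (3000) rather than something confirmed by this PR. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical placeholders; substitute the addresses of the launched instances.
K6_HOST="${K6_HOST:-k6.example.internal}"
METRICS_HOST="${METRICS_HOST:-metrics.example.internal}"
K6_SCENARIO="${K6_SCENARIO:-example-scenario}"

# 1. Launch the scenario (add variant filters as needed).
run enos scenario launch benchmark

# 2. Open the grafana dashboard in a browser (3000 is Grafana's default port).
echo "Grafana: http://${METRICS_HOST}:3000"

# 3. Run one k6 scenario on the k6 instance. The login user, the script
#    location, and the argument form are assumptions.
run ssh "ubuntu@${K6_HOST}" ./k6-run.sh "${K6_SCENARIO}"

# 4. Capture the dashboard output by hand, then
# 5. destroy everything.
run enos scenario destroy benchmark
```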

Eventually, I think the plan is to automate all of this in CI somehow, but it’s not clear to me how the data would be automatically collected after a run. That’s a future problem though, not something that needs solving in this PR (unless there’s an easy way).

Also, note that I’m very new to Terraform and Enos both, so I’m sure this whole PR is ripe for optimization.

https://hashicorp.atlassian.net/browse/VAULT-36161

TODO only if you're a HashiCorp employee

- Backport Labels: If this fix needs to be backported, use the appropriate backport/ label that matches the desired release branch. Note that in the CE repo, the latest release branch will look like backport/x.x.x, but older release branches will be backport/ent/x.x.x+ent.
  - LTS: If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
- ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature of a public function, even if that change is in a CE file, double check that applying the patch for this PR to the ENT repo and running tests doesn't break any tests. Sometimes ENT only tests rely on public functions in CE files.
- Jira: If this change has an associated Jira, it's referenced either in the PR description, commit message, or branch name.
- RFC: If this change has an associated RFC, please link it in the description.
- ENT PR: If this change has an associated ENT PR, please link it in the description. Also, make sure the changelog is in this PR, not in your ENT PR.

github-actions · 2025-05-19T21:00:27Z

CI Results:
All Go tests succeeded! ✅

github-actions · 2025-05-19T21:14:21Z

Build Results:
Build failed for these jobs: test:failure. Please refer to this workflow to learn more: https://github.com/hashicorp/vault/actions/runs/15220344513

Add Enos benchmark scenario

b30d15c

raskchanky requested a review from a team as a code owner May 19, 2025 20:59

github-actions bot added the hashicorp-contributed-pr label May 19, 2025

raskchanky requested a review from ryancragun May 19, 2025 21:00

raskchanky added this to the 1.20.0-rc milestone May 20, 2025

raskchanky added the pr/no-changelog label May 20, 2025

Merge branch 'main' into enos-benchmark

d095d1f

vercel bot deployed to Preview May 20, 2025 22:22

raskchanky added 6 commits May 21, 2025 15:41

add docs on how to run the scenario

9bfc465

update description again

39831fb

see if this works better if we return an empty map

97ebc19

hopefully disabling telemetry doesn't crash everything now

f98041d

yet another try at making telemetry configurable

3f420f3

swap consul nodes over to be the same as the vault ones

3f53bb8
