Description
@JamesKunstle suggested that we hit NCCL timeouts in CI because our runners use gp3 volumes whose throughput is configured too low (125 MB/s, the gp3 default), and that we should configure the CI runners to use 512 MB/s instead.
Let's identify:
- Where this is configured today (it does not appear to be set in code)
- Which e2e tests need the new setting (small, large, or all)
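
For reference, here is a rough sketch of what the new setting could look like if the runners end up being launched through the EC2 API. It only builds the `BlockDeviceMappings` structure with a gp3 `Throughput` of 512. The helper name, device name, and volume size are assumptions for illustration, not what our CI uses today (we still need to find out where that is configured).

```python
# Sketch only: helper name, device name, and volume size are hypothetical.
# EC2 expresses gp3 throughput in MiB/s; the gp3 defaults are 125 MiB/s and 3000 IOPS.
GP3_DEFAULT_THROUGHPUT_MIBPS = 125
GP3_DEFAULT_IOPS = 3000


def gp3_block_device_mappings(
    throughput_mibps=512,
    iops=GP3_DEFAULT_IOPS,
    size_gib=200,              # assumed size, adjust to the runner's needs
    device_name="/dev/sda1",   # assumed root device name, depends on the AMI
):
    """Build a BlockDeviceMappings list requesting a gp3 volume."""
    if throughput_mibps < GP3_DEFAULT_THROUGHPUT_MIBPS:
        raise ValueError("gp3 throughput cannot be below the 125 MiB/s default")
    return [
        {
            "DeviceName": device_name,
            "Ebs": {
                "VolumeType": "gp3",
                "Throughput": throughput_mibps,
                "Iops": iops,
                "VolumeSize": size_gib,
                "DeleteOnTermination": True,
            },
        }
    ]


mappings = gp3_block_device_mappings()
print(mappings[0]["Ebs"]["Throughput"])
```

If the runners are launched with boto3, the result could be passed as `BlockDeviceMappings=` to `run_instances`. If they are launched some other way (launch template, Terraform, console), the same `Throughput` value would need to be set there instead.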