feat: support cloud files for TB sync using wandb-core by timoffex · Pull Request #9849 · wandb/wandb

feat: support cloud files for TB sync using wandb-core #9849


Merged: timoffex merged 1 commit into main from 05-12-tensorboard_cloud_sync on May 22, 2025

timoffex (Contributor) commented May 13, 2025

Fixes WB-21735. Documentation PR: wandb/docs#1338

Replaces paths in internal/tensorboard with a new LocalOrCloudPath type. Guessing root directories needs some additional work with this type, so that logic is moved out of tensorboard.go into a new rootdirguesser.go file.
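
To illustrate why root-directory guessing needs extra care once a path may be a cloud URL, here is a minimal Python sketch. It is not the logic in rootdirguesser.go: the name guess_root_dir and the longest-common-prefix rule are assumptions made for illustration only. The point it shows is that scheme and bucket have to match before path components can be compared.

```python
def _split(path):
    """Split a local path or cloud URL into (prefix, components)."""
    scheme, sep, rest = path.partition("://")
    if sep:
        # Cloud URL: the prefix is "scheme://bucket/".
        bucket, _, key = rest.partition("/")
        return f"{scheme}://{bucket}/", [c for c in key.split("/") if c]
    # Local path: keep the leading slash of an absolute path as the prefix.
    prefix = "/" if path.startswith("/") else ""
    return prefix, [c for c in path.split("/") if c]


def guess_root_dir(log_dirs):
    """Hypothetical helper: deepest directory that contains every log dir.

    Returns None when the paths do not share a scheme and bucket
    (or when no paths are given).
    """
    split = [_split(p) for p in log_dirs]
    prefixes = {prefix for prefix, _ in split}
    if len(prefixes) != 1:
        return None
    common = []
    for parts in zip(*(components for _, components in split)):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    return prefixes.pop() + "/".join(common)
```

With this sketch, gs://b/runs/train and gs://b/runs/validation share the root gs://b/runs, while a gs:// path and an s3:// path share nothing.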

Cloud support

GCS format: gs://bucket/path/to/file. GCS uses the standard Google Cloud application default credentials, which can be configured using gcloud auth application-default login.

S3 format: s3://bucket/path/to/file. S3 uses its standard default credentials. The credentials file can be configured using aws configure, but environment variables might also work. I'm not sure how to use IAM instead of access keys, or whether that is possible with the Go SDK, so there may be limitations (which are likely also present in TensorFlow's built-in support).

Azure format: az://account/container/path/to/file. Azure uses credentials set by az login, but also requires the environment variables AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY to be set. These variables and some alternatives are described here. In contrast, TensorFlow's Azure support requires the TF_AZURE_STORAGE_KEY variable to be set. So to run a script that uses both TensorFlow and W&B with Azure log directories, one must use a pattern like:

AZURE_STORAGE_ACCOUNT=myaccount \
AZURE_STORAGE_KEY=mykey \
TF_AZURE_STORAGE_KEY="$AZURE_STORAGE_KEY" \
python my_script.py
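
A script can check up front that the variables from the pattern above are set. This is a minimal sketch: missing_azure_vars is a hypothetical helper, not part of wandb or TensorFlow, and it only covers the three variables named here, not the alternatives mentioned above.

```python
import os

# Variables from the shell pattern above: the first two are the ones W&B
# needs, the third is the one TensorFlow's Azure support needs.
REQUIRED_AZURE_VARS = (
    "AZURE_STORAGE_ACCOUNT",
    "AZURE_STORAGE_KEY",
    "TF_AZURE_STORAGE_KEY",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]
```

Calling missing_azure_vars() before wandb.init() and stopping when the list is non-empty turns a confusing credentials failure into a clear message.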

Testing

The only code that's specific to different clouds is in localorcloudpath.go; all other code uses the generic API provided by gocloud.dev, even for local files.
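
As a rough illustration of how little needs to be cloud-specific, the sketch below splits the three URL formats from the Cloud support section into a scheme, an optional Azure account, a bucket or container, and a key. It is written in Python to match the test script and is not a translation of localorcloudpath.go; CloudPath and parse_cloud_path are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudPath:
    scheme: str             # "gs", "s3" or "az"
    account: Optional[str]  # Azure storage account; None for GCS and S3
    bucket: str             # bucket (GCS, S3) or container (Azure)
    key: str                # object path inside the bucket or container


def parse_cloud_path(url: str) -> Optional[CloudPath]:
    """Parse a gs://, s3:// or az:// URL; return None for anything else."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in ("gs", "s3", "az"):
        return None  # treated as a local path by the caller
    parts = rest.split("/")
    account = None
    if scheme == "az":
        # az://account/container/path/to/file
        if len(parts) < 2 or not parts[0]:
            return None
        account, parts = parts[0], parts[1:]
    bucket, key = parts[0], "/".join(parts[1:])
    if not bucket:
        return None
    return CloudPath(scheme, account, bucket, key)
```

Everything after this parsing step can work with the same bucket-plus-key pair regardless of provider, which is the property the generic API makes possible.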

For manual testing, create a bucket in any of the supported clouds and then use this script:

import tensorflow as tf
import wandb

# This import is required for its side-effect of enabling
# cloud paths for tf.summary.create_file_writer().
# It's not necessary for GCS.
import tensorflow_io


with wandb.init(sync_tensorboard=True) as run:
    # Placeholder bucket and path; replace with your own s3:// or gs:// URL.
    path = "gs://my-bucket/path/to/logdir"
    with tf.summary.create_file_writer(path).as_default():
        tf.summary.scalar("x", 0.3, step=1)
        tf.summary.scalar("y", 0.7, step=1)
        tf.summary.scalar("z", 2.1, step=1)

        tf.summary.scalar("x", 0.5, step=2)
        tf.summary.scalar("x", 0.9, step=3)

        tf.summary.scalar("y", 1.1, step=2)
        tf.summary.scalar("z", 3.5, step=2)
        tf.summary.scalar("y", 0.8, step=3)

Confirmed this works with GCS, S3 and Azure (though for S3, my script ended with a segfault after the run finished, which I think must have come from TensorFlow itself).
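
When checking the result in the W&B UI, it helps to know which series to expect. The sketch below replays the tf.summary.scalar calls from the script as plain tuples and groups them by tag, ordered by step. It uses no W&B or TensorFlow API; expected_series is a hypothetical helper and says nothing about how wandb-core orders events internally.

```python
from collections import defaultdict

# (tag, value, step) in the order the script logs them.
LOGGED = [
    ("x", 0.3, 1), ("y", 0.7, 1), ("z", 2.1, 1),
    ("x", 0.5, 2), ("x", 0.9, 3),
    ("y", 1.1, 2), ("z", 3.5, 2), ("y", 0.8, 3),
]


def expected_series(events):
    """Group logged scalars by tag and sort each series by step."""
    series = defaultdict(list)
    for tag, value, step in events:
        series[tag].append((step, value))
    return {tag: sorted(points) for tag, points in series.items()}


for tag, points in sorted(expected_series(LOGGED).items()):
    print(tag, points)
```

The script logs three points each for x and y and two points for z, interleaved across steps, so the synced run should show the same counts per metric.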


timoffex changed the title from "tensorboard cloud sync" to "feat: support cloud files for TB sync using wandb-core" on May 13, 2025
socket-security bot commented May 13, 2025

Review the following changes in direct dependencies.

Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License

codecov bot commented May 13, 2025

Codecov Report

Attention: Patch coverage is 73.81703% with 83 lines in your changes missing coverage. Please review.

Project coverage is 81.26%. Comparing base (98b0847) to head (64f3966).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/internal/tensorboard/localorcloudpath.go | 70.40% | 34 Missing and 3 partials ⚠️ |
| core/internal/tensorboard/tensorboard.go | 63.09% | 26 Missing and 5 partials ⚠️ |
| core/internal/tensorboard/rootdirguesser.go | 85.85% | 9 Missing and 5 partials ⚠️ |
| core/internal/tensorboard/tfeventreader.go | 83.33% | 1 Missing ⚠️ |
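
The headline figure can be cross-checked against the table with a little arithmetic. The sketch below assumes, as the numbers suggest, that the 83 lines missing coverage are the missing and partial lines added together; the inferred size of the patch is a derived value, not something Codecov reports here.

```python
# (file, missing lines, partial lines) from the Codecov table above.
UNCOVERED = [
    ("localorcloudpath.go", 34, 3),
    ("tensorboard.go", 26, 5),
    ("rootdirguesser.go", 9, 5),
    ("tfeventreader.go", 1, 0),
]
PATCH_COVERAGE = 73.81703  # percent, from the report

# Missing and partial lines both count as lacking coverage.
uncovered = sum(missing + partial for _, missing, partial in UNCOVERED)

# If `uncovered` is the uncovered share of the patch, the patch size follows.
patch_lines = round(uncovered / (1 - PATCH_COVERAGE / 100))
covered = patch_lines - uncovered

print(uncovered, patch_lines, covered)  # 83 317 234
```

So the patch is about 317 tracked lines, of which 234 are covered: 234 / 317 ≈ 73.817%.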
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #9849      +/-   ##
==========================================
- Coverage   81.36%   81.26%   -0.10%     
==========================================
  Files         826      828       +2     
  Lines       84073    84244     +171     
==========================================
+ Hits        68409    68465      +56     
- Misses      14917    15030     +113     
- Partials      747      749       +2     
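
The coverage diff is internally consistent, which the following sketch verifies. That the percentages are truncated rather than rounded is an inference from these numbers, not documented behaviour relied on here; rounding to the nearest hundredth would give 81.37% and 81.27% instead.

```python
BASE = {"lines": 84073, "hits": 68409, "misses": 14917, "partials": 747}
HEAD = {"lines": 84244, "hits": 68465, "misses": 15030, "partials": 749}


def is_consistent(report):
    """Every tracked line is a hit, a miss or a partial."""
    return report["hits"] + report["misses"] + report["partials"] == report["lines"]


def coverage_hundredths(report):
    """Coverage in hundredths of a percent, truncated via integer division."""
    return report["hits"] * 10000 // report["lines"]


for name, report in (("base", BASE), ("head", HEAD)):
    print(name, is_consistent(report), f"{coverage_hundredths(report) / 100:.2f}%")
```

The difference between the two truncated values is 0.10 percentage points, matching the -0.10% shown in the diff.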
| Flag | Coverage Δ |
|---|---|
| func | 48.83% <51.10%> (-0.10%) ⬇️ |
| system | 66.42% <4.10%> (-0.23%) ⬇️ |
| unit | 67.29% <65.61%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| core/internal/tensorboard/tfeventstream.go | 83.87% <100.00%> (ø) |
| core/internal/tensorboard/tfeventreader.go | 69.39% <83.33%> (-0.55%) ⬇️ |
| core/internal/tensorboard/rootdirguesser.go | 85.85% <85.85%> (ø) |
| core/internal/tensorboard/tensorboard.go | 77.60% <63.09%> (+1.68%) ⬆️ |
| core/internal/tensorboard/localorcloudpath.go | 70.40% <70.40%> (ø) |

... and 35 files with indirect coverage changes


timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from bb3a996 to 600caf6 on May 13, 2025 22:10
timoffex changed the base branch from main to graphite-base/9849 on May 14, 2025 23:25
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from 600caf6 to b7e3028 on May 14, 2025 23:25
timoffex changed the base branch from graphite-base/9849 to 05-14-tensorboard_blob on May 14, 2025 23:25
timoffex force-pushed the 05-14-tensorboard_blob branch from 8337d42 to dada1c3 on May 14, 2025 23:45
timoffex marked this pull request as ready for review on May 14, 2025 23:45
timoffex requested a review from a team as a code owner on May 14, 2025 23:45
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from b7e3028 to 0e75b5a on May 14, 2025 23:45
timoffex force-pushed the 05-14-tensorboard_blob branch from dada1c3 to 4b887d1 on May 20, 2025 01:14
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from 0e75b5a to a9bd189 on May 20, 2025 01:14
timoffex force-pushed the 05-14-tensorboard_blob branch from 4b887d1 to 2513ac4 on May 20, 2025 19:10
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from a9bd189 to 327d551 on May 20, 2025 19:10
timoffex force-pushed the 05-14-tensorboard_blob branch 2 times, most recently from 09a5dfc to 18d3c1f on May 20, 2025 19:28
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch 2 times, most recently from 99af253 to 97c5877 on May 20, 2025 19:38
timoffex changed the base branch from 05-14-tensorboard_blob to graphite-base/9849 on May 20, 2025 19:44
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from 97c5877 to 1287dcb on May 20, 2025 19:44
timoffex force-pushed the graphite-base/9849 branch from 18d3c1f to 0e9e80a on May 20, 2025 19:44
graphite-app bot changed the base branch from graphite-base/9849 to main on May 20, 2025 19:45
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch 2 times, most recently from b38a41c to 9e27556 on May 21, 2025 00:41
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from 9e27556 to 64f3966 on May 21, 2025 01:01
timoffex force-pushed the 05-12-tensorboard_cloud_sync branch from 64f3966 to 55c0d1e on May 22, 2025 21:20
timoffex merged commit 8391e03 into main on May 22, 2025
24 checks passed

timoffex deleted the 05-12-tensorboard_cloud_sync branch on May 22, 2025 21:36
timoffex added a commit to wandb/docs that referenced this pull request Jul 10, 2025