Integrate SwanLab for offline/online experiment tracking for Accelerate #3605
LGTM! Thanks for adding this! Can you add a couple of tests like the other trackers?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi, we're currently updating the local log storage format for SwanLab, and a new version will be released within 2 days. We'll also complete the corresponding test cases for the new version!
I've added some new test cases. Would you mind helping me review them when you're available? Thanks a lot for your time! @SunMarc
Thanks a lot!
Thanks for the review and merge! 🤗
What does this PR do?
This PR introduces SwanLab, a lightweight open-source experiment tracking tool, as a new logging option for the training framework. The integration provides both online and offline tracking capabilities, along with a local dashboard for visualizing results.
SwanLab has previously supported the `report_to` parameter (documentation).
We've received numerous requests from the community to add native Accelerate support (see here), and we're excited to officially integrate with this excellent project to provide a more seamless experience for developers. This integration is particularly valuable for users in regions with limited network connectivity (such as China), offering them robust experiment tracking capabilities.
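As a rough illustration of what the new logging option might look like in a training script, here is a minimal sketch. It assumes the tracker is selected with `log_with="swanlab"` and that extra keyword arguments under the `"swanlab"` key are forwarded to `swanlab.init`. The `mode` values `"cloud"` and `"local"`, the project name `cv_example`, and the helper names `build_tracker_kwargs` and `train_with_tracking` are illustrative assumptions, not taken from this PR.

```python
def build_tracker_kwargs(offline: bool = False) -> dict:
    """Build the init_kwargs passed to Accelerator.init_trackers.

    Assumption: the inner dict is forwarded to swanlab.init, and its `mode`
    argument accepts "cloud" (online) and "local" (offline dashboard).
    """
    mode = "local" if offline else "cloud"
    return {"swanlab": {"mode": mode}}


def train_with_tracking(config: dict, offline: bool = False) -> None:
    """Log a few dummy values through Accelerate's tracker API."""
    # Imported lazily so the helper above can be used without Accelerate installed.
    from accelerate import Accelerator

    # Assumption: "swanlab" is the tracker name registered by this integration.
    accelerator = Accelerator(log_with="swanlab")
    accelerator.init_trackers(
        "cv_example",  # hypothetical project name
        config=config,
        init_kwargs=build_tracker_kwargs(offline),
    )
    for step in range(3):
        # Placeholder metric; a real script would log its training loss here.
        accelerator.log({"train_loss": 1.0 / (step + 1)}, step=step)
    accelerator.end_training()


if __name__ == "__main__":
    print(build_tracker_kwargs(offline=True))
```

Calling `train_with_tracking({"lr": 3e-4}, offline=True)` would then keep the logs on the local machine, provided the assumptions above hold for the installed SwanLab version.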
Who can review?
@SunMarc I noticed that you recently reviewed some related PRs—would you mind helping review my PR as well? Thank you! (I believe you also reviewed the Hugging Face Transformers integration previously—looking forward to collaborating again! 😄)
Additional information about this PR
Usage guideline
Step 0: Set up Accelerate and the example environment
This walkthrough follows the official Accelerate CV example (a pet image classification task):
Step 1: Set Up SwanLab Online Tracking
Install:
To use SwanLab's online tracking, log in to the SwanLab website and obtain your API key from the Settings page. Then, authenticate using the following command:
If you prefer offline mode, skip this step and install the local dashboard:
Step 2: Download the Oxford-IIIT Pet Dataset used in the example code
You can find the download link here
Step 3: Run the official example script in the Accelerate project
Visualization demo here
Since my server is offline, I changed the `pretrained` parameter to `False` in the `create_model` call to avoid downloading the model, which led to very poor accuracy after just 3 epochs 😂.
Test suite passes
Since my AI training server couldn't connect to Hugging Face, some tests failed during the automated testing process. 😭