[LLM] [Cherry-Pick] Integrate PDC SDK for LLM training fault tolerance platform #9706
* Add PDC checksum support
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9706      +/-   ##
===========================================
+ Coverage    51.11%   52.64%   +1.53%
===========================================
  Files          730      720      -10
  Lines       122587   112928    -9659
===========================================
- Hits         62657    59453    -3204
+ Misses       59930    53475    -6455
5b3b379 to 2c6f2c4
* Support automatically saving args to a local JSON file
* Add resume-ckpt support
* pdc-versioncontrol-6234 pdc-versioncontrol-6400 SFT multi-stage interface
from paddlenlp.utils.log import logger

PDC_AGENT_BIN = "/root/paddlejob/tools/agent"
Could hard-coding this path cause problems? If another platform, or a different PDC image, wants to use this, the path may not exist there.
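No, it won't. /root/paddlejob/tools is not a path baked into the image; it is the preset location for all PDC plugins. Each time a container is launched, the latest plugin set is deployed into this directory on the fly, so the path will not change arbitrarily.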
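For readers who still want a guard against the portability concern raised above, one option is to allow an override while keeping the preset location as the default. The following is only a minimal sketch under stated assumptions: the environment variable name `PDC_AGENT_BIN_OVERRIDE` and the helpers `resolve_pdc_agent_bin` and `pdc_agent_available` are hypothetical and are not part of this PR or of the PDC SDK; only the default path comes from the diff.

```python
import os

# Default taken from the diff under review.
PDC_AGENT_BIN = "/root/paddlejob/tools/agent"

# Hypothetical variable name, used for illustration only (not in the PR).
PDC_AGENT_BIN_ENV = "PDC_AGENT_BIN_OVERRIDE"


def resolve_pdc_agent_bin(environ=None):
    """Return the agent path, preferring a non-empty environment override."""
    env = os.environ if environ is None else environ
    override = env.get(PDC_AGENT_BIN_ENV, "").strip()
    return override or PDC_AGENT_BIN


def pdc_agent_available(path=None):
    """Report whether an executable file exists at the resolved agent path."""
    path = path or resolve_pdc_agent_bin()
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

With this shape, callers on the PDC platform see no change in behavior, and code running elsewhere could check `pdc_agent_available()` before invoking the agent rather than failing on a missing binary.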
…nto cp_pdc_sdk
…nto cp_pdc_sdk
LGTM
PR types
New features
PR changes
Others
Description
Cherry-picked from: #9263, #9431, #9443, #9539, #9618, #9690
[Pcard-88789]