remove IPO loss + small fixes #1615
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1615
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 6bbcf8f with merge base 48eb35d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
IPO learns from a preferences dataset simply by regressing the gap between the log-likelihood ratios
:math:`\log \bigg(\frac{\pi(\text{chosen})}{\pi(\text{rejected})}\bigg)` and :math:`\log \bigg(\frac{\pi_{\text{ref}}(\text{chosen})}{\pi_{\text{ref}}(\text{rejected})}\bigg)`
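For context, a minimal sketch of the objective this docstring describes, following the IPO formulation of regressing that gap toward :math:`\frac{1}{2\tau}` (class and parameter names here are illustrative, not necessarily torchtune's exact API):

```python
import torch
import torch.nn as nn

class IPOLoss(nn.Module):
    """Sketch of the IPO objective: regress the gap between the policy and
    reference log-likelihood ratios toward 1 / (2 * tau)."""

    def __init__(self, tau: float = 0.1):
        super().__init__()
        self.tau = tau

    def forward(
        self,
        policy_chosen_logps: torch.Tensor,
        policy_rejected_logps: torch.Tensor,
        reference_chosen_logps: torch.Tensor,
        reference_rejected_logps: torch.Tensor,
    ) -> torch.Tensor:
        # log(pi(chosen) / pi(rejected)) and its reference-model counterpart
        pi_logratios = policy_chosen_logps - policy_rejected_logps
        ref_logratios = reference_chosen_logps - reference_rejected_logps
        gap = pi_logratios - ref_logratios
        return ((gap - 1 / (2 * self.tau)) ** 2).mean()
```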
my beautiful latex :(
It's not a goodbye. It's just an "until next time".
Thanks for all the reviews and help with this, @SalmanMohammadi!
one thing about IPO was that we needed to average over the length of the completions rather than just summing the log-probs over the sequence; see: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L1416-L1417
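A hedged sketch of the length normalization being pointed at here, loosely following the TRL code linked above (function and argument names are assumptions, not torchtune's API):

```python
import torch

def get_batch_log_probs(
    logits: torch.Tensor,   # (batch, seq_len, vocab)
    labels: torch.Tensor,   # (batch, seq_len)
    average_log_prob: bool = False,
    ignore_index: int = -100,
) -> torch.Tensor:
    """Gather per-token log-probs for the label tokens, then either sum
    them (DPO-style) or average over the completion length (IPO-style)."""
    mask = labels != ignore_index
    labels = labels.masked_fill(~mask, 0)
    per_token_logps = torch.gather(
        logits.log_softmax(dim=-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    if average_log_prob:
        # IPO needs the mean log-prob over the completion tokens
        return (per_token_logps * mask).sum(-1) / mask.sum(-1)
    # DPO sums the log-probs over the sequence
    return (per_token_logps * mask).sum(-1)
```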
Co-authored-by: Felipe Mello <felipemello@fb.com>
Hey @kashif. Yep, you're totally right. We actually had an issue for this (#1291), but didn't get round to it, and since we just put out a new release we didn't want to ship a version of the IPOLoss which didn't have this fix. If you'd be interested in contributing this fix at all, I can restore the loss on main now that the release is out.
no worries! I'll let someone else take a shot at it! I already have a bunch on my plate haha, but yeah, if no one takes it, I can do it too...
No sweat at all. It's a pretty straightforward fix since we already do something similar with the SimPO loss, so hopefully we can make it into the next release.
we need to get your sweet latex back in!
Context
What is the purpose of this PR?
IPO is currently not working properly (https://github.com/pytorch/torchtune/issues/1291) and needs more testing. This PR removes it before our new release.
Some small fixes were also made. The major one was removing loss parameters from DPO, so it's easy to switch losses with different parameters in the config.
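As an illustrative sketch of what this enables (the component path and parameter names below are assumptions, not necessarily the real ones): with loss hyperparameters living on the loss component rather than hardcoded in the recipe, swapping losses is just a config edit:

```python
from omegaconf import OmegaConf
from torchtune import config

# Loss hyperparameters live on the loss component itself, so swapping to a
# different loss (with different parameters) is a pure config change.
cfg = OmegaConf.create(
    {
        "loss": {
            "_component_": "torchtune.rlhf.loss.DPOLoss",  # illustrative path
            "beta": 0.1,
            "label_smoothing": 0.0,
        }
    }
)
loss_fn = config.instantiate(cfg.loss)  # recipe no longer hardcodes loss args
```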