remove ipo loss + small fixed by felipemello1 · Pull Request #1615 · pytorch/torchtune · GitHub

remove ipo loss + small fixed #1615


Merged
felipemello1 merged 3 commits into pytorch:main on Sep 17, 2024

Conversation

felipemello1
Contributor

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

IPO is currently not working properly (https://github.com/pytorch/torchtune/issues/1291) and needs more testing. This PR removes it before our new release.

Some small fixes were also made. The major one was removing the loss parameters from the DPO recipe, so it's easy to switch losses with different parameters in the config.
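As a rough illustration of what that buys (a sketch of mine, not code from this PR; the component path and loss kwargs below are assumptions), keeping a loss and its parameters together as a single config component means swapping losses only ever touches the config:

```python
# Minimal sketch: the loss is fully described by its config component, so a loss
# with different kwargs is a config-only swap; the recipe stays loss-agnostic.
from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.create({
    "loss": {
        # The component path and kwargs here are illustrative assumptions.
        "_component_": "torchtune.modules.rlhf.loss.DPOLoss",
        "beta": 0.1,
        "label_smoothing": 0.0,
    }
})
loss_fn = config.instantiate(cfg.loss)  # recipe code never hard-codes loss args
```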

pytorch-bot bot commented Sep 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1615

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6bbcf8f with merge base 48eb35d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Sep 17, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).

IPO learns from preferences dataset simply by regressing the gap between log-likelihood ratios

:math:`\log \bigg(\frac{\pi(\text{chosen})}{\pi(\text{rejected})}\bigg)` and :math:`\log \bigg(\frac{\pi_{\text{ref}}(\text{chosen})}{\pi_{\text{ref}}(\text{rejected})}\bigg)`
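For context only (this gloss is mine, based on the standard IPO formulation, and is not text from the PR or the removed docstring), the loss then regresses the gap between those two log-ratios toward a constant set by the :math:`\tau` parameter:

```latex
% Standard IPO objective (assumed notation matching the docstring above):
\mathcal{L}_{\text{IPO}}
  = \left(
      \log \frac{\pi(\text{chosen})}{\pi(\text{rejected})}
      - \log \frac{\pi_{\text{ref}}(\text{chosen})}{\pi_{\text{ref}}(\text{rejected})}
      - \frac{1}{2\tau}
    \right)^{2}
```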
Contributor


my beautiful latex : (

Contributor Author


It's not a goodbye. It's just an "until next time".

Thanks for all the reviews and help with this @SalmanMohammadi !

@felipemello1 felipemello1 merged commit 6d3e065 into pytorch:main Sep 17, 2024
17 checks passed
@felipemello1 felipemello1 deleted the dpo_small_fix branch September 17, 2024 18:23
@kashif
kashif commented Sep 17, 2024

One thing about IPO is that we needed to average over the length of the completions rather than just summing the log-probs over the sequence; see: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L1416-L1417
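A minimal sketch of the distinction (mine, with assumed tensor shapes; not torchtune or TRL code): IPO expects each completion's per-token log-probs to be averaged over its length, whereas a plain sum gives a length-dependent quantity.

```python
# Sum vs. length-normalized (masked average) of per-token log-probs.
import torch

def completion_logps(per_token_logps: torch.Tensor,
                     mask: torch.Tensor,
                     average: bool) -> torch.Tensor:
    """per_token_logps, mask: [batch, seq_len]; mask is 1 on completion tokens."""
    summed = (per_token_logps * mask).sum(-1)
    if average:
        return summed / mask.sum(-1)  # length-normalized, what IPO wants
    return summed                     # length-dependent sum

logps = torch.randn(2, 5)
mask = torch.tensor([[1., 1., 1., 0., 0.],
                     [1., 1., 1., 1., 1.]])
print(completion_logps(logps, mask, average=True))
print(completion_logps(logps, mask, average=False))
```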

ebsmothers pushed a commit that referenced this pull request Sep 17, 2024
Co-authored-by: Felipe Mello <felipemello@fb.com>
@SalmanMohammadi
Contributor

> One thing about IPO is that we needed to average over the length of the completions rather than just summing the log-probs over the sequence; see: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L1416-L1417

Hey @kashif. Yep you're totally right. We actually had an issue for this (#1291), but didn't get round to it, and since we just put out a new release we didn't want to ship a version of the IPOLoss which didn't have this fix.

If you'd be interested in contributing this fix at all, I can restore the loss on main now that the release is out.

@kashif
kashif commented Sep 18, 2024

No worries! I'll let someone else take a shot at it! Already have a bunch on my plate haha, but yeah, if no one takes it, I can do it too...

@SalmanMohammadi
Contributor
SalmanMohammadi commented Sep 18, 2024

> No worries! I'll let someone else take a shot at it! Already have a bunch on my plate haha, but yeah, if no one takes it, I can do it too...

No sweat at all. It's a pretty straightforward fix since we already do something similar with the SimPO loss, so hopefully we can make it into the next release.

@kashif
kashif commented Sep 18, 2024

we need to get your sweet latex back in!
