remove IPO loss + small fixes #1615
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1615
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 6bbcf8f with merge base 48eb35d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
IPO learns from a preferences dataset simply by regressing the gap between the log-likelihood ratios
:math:`\log \bigg(\frac{\pi(\text{chosen})}{\pi(\text{rejected})}\bigg)` and :math:`\log \bigg(\frac{\pi_{\text{ref}}(\text{chosen})}{\pi_{\text{ref}}(\text{rejected})}\bigg)`
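For context, a minimal sketch of the objective this docstring describes, following the IPO formulation of regressing that gap toward :math:`\frac{1}{2\tau}` (class and parameter names here are illustrative, not necessarily torchtune's exact API):

```python
import torch
import torch.nn as nn

class IPOLoss(nn.Module):
    """Sketch of the IPO objective: regress the gap between the policy and
    reference log-likelihood ratios toward 1 / (2 * tau)."""

    def __init__(self, tau: float = 0.1):
        super().__init__()
        self.tau = tau

    def forward(
        self,
        policy_chosen_logps: torch.Tensor,
        policy_rejected_logps: torch.Tensor,
        reference_chosen_logps: torch.Tensor,
        reference_rejected_logps: torch.Tensor,
    ) -> torch.Tensor:
        # log(pi(chosen) / pi(rejected)) and its reference-model counterpart
        pi_logratios = policy_chosen_logps - policy_rejected_logps
        ref_logratios = reference_chosen_logps - reference_rejected_logps
        gap = pi_logratios - ref_logratios
        return ((gap - 1 / (2 * self.tau)) ** 2).mean()
```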
my beautiful latex :(
It's not a goodbye. It's just an "until next time".
Thanks for all the reviews and help with this, @SalmanMohammadi!
one thing about IPO was that we needed to average over the length of the completions rather than just summing the log-probs over the sequence; see: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L1416-L1417
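A hedged sketch of the length normalization being pointed at here, loosely following the TRL code linked above (function and argument names are assumptions, not torchtune's API):

```python
import torch

def get_batch_log_probs(
    logits: torch.Tensor,   # (batch, seq_len, vocab)
    labels: torch.Tensor,   # (batch, seq_len)
    average_log_prob: bool = False,
    ignore_index: int = -100,
) -> torch.Tensor:
    """Gather per-token log-probs for the label tokens, then either sum
    them (DPO-style) or average over the completion length (IPO-style)."""
    mask = labels != ignore_index
    labels = labels.masked_fill(~mask, 0)
    per_token_logps = torch.gather(
        logits.log_softmax(dim=-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    if average_log_prob:
        # IPO needs the mean log-prob over the completion tokens
        return (per_token_logps * mask).sum(-1) / mask.sum(-1)
    # DPO sums the log-probs over the sequence
    return (per_token_logps * mask).sum(-1)
```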
Co-authored-by: Felipe Mello <felipemello@fb.com>
Hey @kashif. Yep, you're totally right. We actually had an issue for this (#1291), but didn't get round to it, and since we just put out a new release we didn't want to ship a version of the IPOLoss which didn't have this fix. If you'd be interested in contributing this fix at all, I can restore the loss on main now that the release is out.
no worries! I'll let someone else take a shot at it! I already have a bunch on my plate haha, but yeah, if no one takes it, I can do it too...
No sweat at all. It's a pretty straightforward fix since we already do something similar with the SimPO loss, so hopefully we can make it into the next release.
we need to get your sweet latex back in!
Context
What is the purpose of this PR?
IPO is currently not working properly (https://github.com/pytorch/torchtune/issues/1291) and needs more testing. This PR removes it before our new release.
Some small fixes were also made. The major one was removing loss parameters from DPO, so it's easy to switch losses with different parameters in the config.
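As an illustrative sketch of what this enables (the component path and parameter names below are assumptions, not necessarily the real ones): with loss hyperparameters living on the loss component rather than hardcoded in the recipe, swapping losses is just a config edit:

```python
from omegaconf import OmegaConf
from torchtune import config

# Loss hyperparameters live on the loss component itself, so swapping to a
# different loss (with different parameters) is a pure config change.
cfg = OmegaConf.create(
    {
        "loss": {
            "_component_": "torchtune.rlhf.loss.DPOLoss",  # illustrative path
            "beta": 0.1,
            "label_smoothing": 0.0,
        }
    }
)
loss_fn = config.instantiate(cfg.loss)  # recipe no longer hardcodes loss args
```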