As stated in Section 3.2, one of the motivations is to reduce the interference between small-scale and large-scale training dependencies. However, in your practical implementation, both the drafter and the refiner are trained over all scales, and in particular there is no special design for drafter training. Could you explain this?