[fix] Fix math reward hanging [WIP] by BlankCheng · Pull Request #109 · LLM360/Reasoning360

[fix] Fix math reward hanging [WIP] #109


Open: BlankCheng wants to merge 2 commits into main

BlankCheng
Collaborator

Problem

The training hangs during the math reward computation phase when meeting timeouts.

Cause

The suspected cause is hanging worker processes inside the math reward function. For example, expensive calls such as sympy.simplify() may run indefinitely on certain inputs unless the worker is explicitly killed.

The existing asyncio timeout cancels the top-level task but does not terminate the orphaned sympy subprocess. The ProcessPoolExecutor may then be unable to shut down cleanly, which leads to a deadlock. In addition, reward calculation is CPU-bound, so the asyncio event loop likely brings little benefit.

Solution

This PR implements mp_reward_manager.py, a multiprocessing reward manager that replaces the previous asyncio implementation.

  • Removal of asyncio
  • Explicit Process Timeouts: A timeout is now enforced at the per-process level. This guarantees that any single hanging reward process is terminated, preventing it from blocking the ProcessPoolExecutor shutdown.
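
The per-process timeout idea can be sketched roughly as below. This is not the code in mp_reward_manager.py; it is a minimal illustration that assumes one child process per reward call. The names run_with_timeout and fake_math_reward, and the default score of 0.0, are hypothetical.

```python
import multiprocessing as mp
import time

# Prefer "fork" where available so the sketch needs no __main__ guard to be
# importable; otherwise fall back to the platform default start method.
_CTX = mp.get_context("fork" if "fork" in mp.get_all_start_methods() else None)


def _worker(conn, fn, args):
    """Run fn(*args) in the child and send the outcome back over the pipe."""
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:  # report failures instead of dying silently
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(fn, args=(), timeout=1.0, default=0.0):
    """Return fn(*args), or `default` if it fails or exceeds `timeout` seconds.

    The child process is always terminated and joined before returning, so a
    hung computation cannot outlive the call.
    """
    parent_conn, child_conn = _CTX.Pipe(duplex=False)
    proc = _CTX.Process(target=_worker, args=(child_conn, fn, args), daemon=True)
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if parent_conn.poll(timeout):
            status, value = parent_conn.recv()
            return value if status == "ok" else default
        return default  # timed out
    except EOFError:
        return default  # child exited without sending a result
    finally:
        if proc.is_alive():
            proc.terminate()  # SIGTERM on POSIX
            proc.join(1.0)
        if proc.is_alive():
            proc.kill()  # last resort if terminate was not enough
        proc.join()
        parent_conn.close()


def fake_math_reward(seconds):
    """Hypothetical stand-in for a reward function that may hang."""
    time.sleep(seconds)
    return 1.0


if __name__ == "__main__":
    print(run_with_timeout(fake_math_reward, args=(0.0,), timeout=5.0))
    print(run_with_timeout(fake_math_reward, args=(30.0,), timeout=0.3))
```

Spawning a process per call is more expensive than reusing pool workers; the sketch trades throughput for the guarantee that the parent can always kill a stuck computation.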

The implementation is currently undergoing testing to validate the fix.

@BlankCheng BlankCheng requested a review from AndreasXie July 7, 2025 02:43