Currently, general FSDP2 runs well for all non-MoE models, but not for MoE models (e.g., Qwen3-30B-A3B, DeepSeek-V2-Lite):

- For Qwen3-30B-A3B, it is noticeably slower than Qwen3-32B, especially during the refit process or when using the hf-tp-plan with dtensor tp > 1.
- For DeepSeek-V2-Lite, it fails with the following error on `model.layers.0.self_attn.rotary_emb.cos_cached`, reporting `v.shape=torch.Size([2048, 64])` and `self.reference_model_buffers[k].shape=torch.Size([163840, 64])`:
File "/workspace/nemo_rl/models/policy/dtensor_policy_worker.py", line 649, in get_reference_policy_logprobs
with self.use_reference_model():
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yukih/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/workspacenemo_rl/models/policy/dtensor_policy_worker.py", line 626, in use_reference_model
val.copy_(self.reference_model_buffers[k])
RuntimeError: The size of tensor a (2048) must match the size of tensor b (163840) at non-singleton dimension 0
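Note that 163840 matches DeepSeek-V2-Lite's `max_position_embeddings`, while 2048 is presumably the sequence length the live model's rotary cache was last built for. Since `cos_cached`/`sin_cached` are non-persistent caches that the HF rotary modules resize on demand, the policy model and the saved reference-buffer snapshot can legitimately hold different shapes. Below is a minimal sketch of one possible workaround, assuming `use_reference_model` restores buffers by iterating `named_buffers()` as the traceback suggests; the standalone signature and the cache-skip heuristic are my assumptions, not the actual NeMo-RL code:

```python
from contextlib import contextmanager

import torch

@contextmanager
def use_reference_model(model: torch.nn.Module, reference_buffers: dict):
    """Temporarily swap the model's buffers for the saved reference copies.

    Hypothetical sketch mirroring use_reference_model in
    dtensor_policy_worker.py; shape-mismatched rotary caches are
    skipped rather than copied.
    """
    # Save the current (policy) buffers so they can be restored on exit.
    saved = {k: v.detach().clone() for k, v in model.named_buffers()}
    try:
        with torch.no_grad():
            for k, val in model.named_buffers():
                ref = reference_buffers.get(k)
                if ref is None:
                    continue
                if ref.shape != val.shape:
                    # Non-persistent rotary caches are sized by the longest
                    # sequence seen so far and are recomputed on demand, so a
                    # mismatch here is expected and safe to skip (assumption).
                    if k.endswith(("cos_cached", "sin_cached")):
                        continue
                    raise RuntimeError(f"Unexpected buffer shape mismatch: {k}")
                val.copy_(ref)
        yield
    finally:
        with torch.no_grad():
            for k, val in model.named_buffers():
                val.copy_(saved[k])
```

If skipping turns out to be unsafe for some module, recomputing the cache for the current maximum sequence length (or slicing the larger saved copy down to the live shape, since rotary caches are indexed by position from 0) would be a more conservative fix.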