Dimension issue during evaluation of val_seen #12
Open
@Ehrmantra

Description

I really appreciate your work. While reproducing it, I ran into a dimension mismatch when evaluating on the val_seen split: a 513-element tensor shows up where a 512-slot token buffer is expected, and I'm not sure how to resolve it.

[04:07:39.871757] Beginning evaluation of val_seen
Traceback (most recent call last):
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/main.py", line 316, in <module>
    main()
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/main.py", line 312, in main
    valid(args, train_env, val_envs, rank=rank)
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/main.py", line 273, in valid
    agent.test(use_dropout=False, feedback='argmax', iters=iters)
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/agent_cmt.py", line 1114, in test
    super().test(iters=iters, rollout_function=self.rollout_llm)
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/agent_base.py", line 42, in test
    for traj in rollout_function(**kwargs):
  File "/data1_8t/user/zr/codes/NavCoT/finetune_src/r2r/agent_cmt.py", line 1042, in rollout_llm
    nav_output = self.llm.generate(nav_input["prompts"], images=None, max_gen_len=64, temperature=self.args.temperature)
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data1_8t/user/zr/codes/NavCoT/LLaMA2-Accessory/accessory/model/meta.py", line 127, in generate
    tokens[k, : len(t)] = torch.tensor(t).long()
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/utils/_device.py", line 62, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: The expanded size of the tensor (512) must match the existing size (513) at non-singleton dimension 0. Target sizes: [512]. Tensor sizes: [513]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1516850) of binary: /data1_8t/user/zr/miniconda3/envs/accessory/bin/python3.9
Traceback (most recent call last):
  File "/data1_8t/user/zr/miniconda3/envs/accessory/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data1_8t/user/zr/miniconda3/envs/accessory/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

r2r/main.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2025-06-12_04:13:29
  host      : ubuntu
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1516850)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
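For what it's worth, the failing assignment in meta.py seems reproducible in isolation. Below is a minimal sketch of my own (not code from the repository), assuming the 512 in the error is the model's pre-allocated token buffer width and the 513 is the length of the tokenized prompt:

    import torch

    max_seq_len = 512   # assumed: width of the pre-allocated token buffer in generate()
    prompt_len = 513    # assumed: length of the tokenized prompt that triggers the error

    tokens = torch.zeros((1, max_seq_len), dtype=torch.long)  # stand-in for the token buffer
    t = list(range(prompt_len))                               # stand-in for one tokenized prompt

    # Reproduces: "The expanded size of the tensor (512) must match the existing size (513) ..."
    tokens[0, : len(t)] = torch.tensor(t).long()

So it looks like the tokenized prompt is one token longer than the buffer. Should the prompt be truncated to the model's max sequence length before calling generate(), or is a larger max_seq_len expected in the config?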

Could you kindly spare some time amidst your busy schedule to help me with this? Thank you.
