8000 Are values from Tables 3-7 for task MPE Tag, algorithms MAA2C and MAA2C_NS swapped? · Issue #44 · uoe-agents/epymarl · GitHub
Are values from Tables 3-7 for task MPE Tag, algorithms MAA2C and MAA2C_NS swapped? #44
Open
@gsavarela

Description


I was unable to verify the results reported for the MAA2C_NS algorithm on the Tag
task, even after correcting for `add_value_last_step=False` as per issue #43.
Upon cross-validation I found evidence suggesting that the maximum returns for the
shared-parameter variant (Table 3) and the non-shared-parameter variant (Table 7)
may have been swapped.

Reproduce:

Config:

```json
{
    "action_selector": "soft_policies",
    "add_value_last_step": false,
    "agent": "rnn_ns",
    "agent_output_type": "pi_logits",
    "batch_size": 10,
    "batch_size_run": 10,
    "buffer_cpu_only": true,
    "buffer_size": 10,
    "checkpoint_path": "",
    "critic_type": "cv_critic_ns",
    "entropy_coef": 0.01,
    "env": "gymma",
    "env_args": {
        "key": "mpe:SimpleTag-v0",
        "pretrained_wrapper": "PretrainedTag",
        "seed": 343532797,
        "state_last_action": false,
        "time_limit": 25
    },
    "evaluate": false,
    "gamma": 0.99,
    "grad_norm_clip": 10,
    "hidden_dim": 128,
    "hypergroup": null,
    "label": "default_label",
    "learner": "actor_critic_learner",
    "learner_log_interval": 10000,
    "load_step": 0,
    "local_results_path": "results",
    "log_interval": 250000,
    "lr": 0.0003,
    "mac": "non_shared_mac",
    "mask_before_softmax": true,
    "name": "maa2c_ns",
    "obs_agent_id": false,
    "obs_individual_obs": false,
    "obs_last_action": false,
    "optim_alpha": 0.99,
    "optim_eps": 1e-05,
    "q_nstep": 5,
    "repeat_id": 1,
    "runner": "parallel",
    "runner_log_interval": 10000,
    "save_model": false,
    "save_model_interval": 500000,
    "save_replay": false,
    "seed": 343532797,
    "standardise_returns": false,
    "standardise_rewards": true,
    "t_max": 20050000,
    "target_update_interval_or_tau": 0.01,
    "test_greedy": true,
    "test_interval": 500000,
    "test_nepisode": 100,
    "use_cuda": false,
    "use_rnn": true,
    "use_tensorboard": true
}
```
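For completeness, a config like the above corresponds to an invocation along these lines (sketched from the EPyMARL README's Sacred-style command format, so treat the exact flags as an assumption):

```shell
python3 src/main.py --config=maa2c_ns --env-config=gymma \
    with env_args.time_limit=25 \
    env_args.key="mpe:SimpleTag-v0" \
    env_args.pretrained_wrapper="PretrainedTag"
```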

Considerations

The first consideration is that I have run experiments for both MAA2C and MAA2C_NS,
and obtained better results with MAA2C.

The second consideration is the consistency of results for the Tag task, as reported
in the paper: "We observe that in all environments except the matrix games, parameter
sharing improves the returns over no parameter sharing." While the average values
presented in Figure 3 do not appear statistically significant, a closer look at
Tables 3 and 7 shows that for several algorithm-task pairs the improvement due to
parameter sharing seems significant. Such improvements can be observed for most
algorithms in MPE tasks, especially in Speaker-Listener and Tag.

Table A groups the results for all algorithms except COMA, for both modalities, on
the MPE tasks, and shows the variation in results. A positive change means that the
parameter-sharing variant achieves higher maximum returns than the non-shared
variant.

Table A: Maximum returns over five seeds for eight algorithms with
parameter sharing (PS), without parameter sharing (NS), and the change in
excess of returns for MPE tasks.

| Algorithm | Task | PS | NS | Change (%) |
|---|---|---:|---:|---:|
| IQL | Speaker-Listener | -18.36 | -18.61 | 1.36% |
| IQL | Spread | -132.63 | -141.87 | 6.97% |
| IQL | Adversary | 9.38 | 9.09 | 3.09% |
| IQL | Tag | 22.18 | 19.18 | 13.53% |
| IA2C | Speaker-Listener | -12.6 | -17.08 | 35.56% |
| IA2C | Spread | -134.43 | -131.74 | -2.00% |
| IA2C | Adversary | 12.12 | 10.8 | 10.89% |
| IA2C | Tag | 17.44 | 16.04 | 8.03% |
| IPPO | Speaker-Listener | -13.1 | -15.56 | 18.78% |
| IPPO | Spread | -133.86 | -132.46 | -1.05% |
| IPPO | Adversary | 12.17 | 11.17 | 8.22% |
| IPPO | Tag | 19.44 | 18.46 | 5.04% |
| MADDPG | Speaker-Listener | -13.56 | -12.73 | -6.12% |
| MADDPG | Spread | -141.7 | -136.73 | -3.51% |
| MADDPG | Adversary | 8.97 | 8.81 | 1.78% |
| MADDPG | Tag | 12.5 | 2.82 | 77.44% |
| MAA2C | Speaker-Listener | -10.71 | -13.66 | 27.54% |
| MAA2C | Spread | -129.9 | -130.88 | 0.75% |
| MAA2C | Adversary | 12.06 | 10.88 | 9.78% |
| MAA2C | Tag | 19.95 | 26.5 | -32.83% |
| MAPPO | Speaker-Listener | -10.68 | -14.35 | 34.36% |
| MAPPO | Spread | -133.54 | -128.64 | -3.67% |
| MAPPO | Adversary | 11.3 | 12.04 | -6.55% |
| MAPPO | Tag | 18.52 | 17.96 | 3.02% |
| VDN | Speaker-Listener | -15.95 | -15.47 | -3.01% |
| VDN | Spread | -131.03 | -142.13 | 8.47% |
| VDN | Adversary | 9.28 | 9.34 | -0.65% |
| VDN | Tag | 24.5 | 18.44 | 24.73% |
| QMIX | Speaker-Listener | -11.56 | -11.59 | 0.26% |
| QMIX | Spread | -126.62 | -130.97 | 3.44% |
| QMIX | Adversary | 9.67 | 11.32 | -17.06% |
| QMIX | Tag | 31.18 | 26.88 | 13.79% |
  • Average Change (%): 7.51%
  • Total Change (%): 240.40%
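The Change (%) column appears to be the excess of the PS return over the NS return, normalised by the magnitude of the PS return (this formula is inferred from the table values, not taken from the paper); a minimal sketch:

```python
def change_pct(ps: float, ns: float) -> float:
    """Relative excess of the parameter-sharing (PS) maximum return
    over the non-shared (NS) one, normalised by |PS|, in percent."""
    return 100.0 * (ps - ns) / abs(ps)

# Reproduces entries from Table A:
print(round(change_pct(19.95, 26.5), 2))    # MAA2C Tag: -32.83
print(round(change_pct(22.18, 19.18), 2))   # IQL Tag: 13.53
print(round(change_pct(-18.36, -18.61), 2)) # IQL Speaker-Listener: 1.36
```

Note that normalising by |PS| (rather than |NS|) is what matches the reported percentages, e.g. 3/22.18 = 13.53% for IQL Tag.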

More strikingly, the differences are even larger when we restrict attention to the Tag task alone.

Table B: Maximum returns over five seeds for the Tag task with parameter sharing (PS),
without parameter sharing (NS), the excess of returns of PS over NS, and the change in
excess of returns for the eight algorithms.

| Algorithm | PS | NS | Excess of Returns | Change (%) |
|---|---:|---:|---:|---:|
| IQL | 22.18 | 19.18 | 3 | 13.53% |
| IA2C | 17.44 | 16.04 | 1.4 | 8.03% |
| IPPO | 19.44 | 18.46 | 0.98 | 5.04% |
| MADDPG | 12.5 | 2.82 | 9.68 | 77.44% |
| MAA2C | 19.95 | 26.5 | -6.55 | -32.83% |
| MAPPO | 18.52 | 17.96 | 0.56 | 3.02% |
| VDN | 24.5 | 18.44 | 6.06 | 24.73% |
| QMIX | 31.18 | 26.88 | 4.3 | 13.79% |
  • Average: excess of returns 2.42875, change 14.09%
  • Total: excess of returns 19.43, change 112.75%
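The summary rows of Table B follow directly from the per-algorithm columns; a quick check in Python (using the change formula inferred above, normalised by |PS|):

```python
from statistics import mean

# (algorithm, PS, NS) maximum returns on Tag, from Table B
rows = [
    ("IQL", 22.18, 19.18), ("IA2C", 17.44, 16.04),
    ("IPPO", 19.44, 18.46), ("MADDPG", 12.5, 2.82),
    ("MAA2C", 19.95, 26.5), ("MAPPO", 18.52, 17.96),
    ("VDN", 24.5, 18.44), ("QMIX", 31.18, 26.88),
]
excess = [ps - ns for _, ps, ns in rows]
change = [100.0 * (ps - ns) / abs(ps) for _, ps, ns in rows]

print(round(mean(excess), 5))  # average excess: 2.42875
print(round(sum(excess), 2))   # total excess: 19.43
print(round(mean(change), 2))  # average change: 14.09
print(round(sum(change), 2))   # total change: 112.75
```

The MAA2C row is the only negative entry, which is what motivates the question of whether its PS and NS values were swapped between Tables 3 and 7.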

Can you confirm whether this is indeed the case, or point me in the right direction?

Thanks,
