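2.0.1 meta new
1. Added the RL2 meta method.
2. Added the env package.
3. Added validation: the agent can now be tested in a test environment without exploration. Only single-threaded validation is supported for now.
4. Controllable reset: you can now specify when the hidden state is initialized.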
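As a rough illustration of the validation and controllable-reset items in this release, here is a minimal sketch in plain Python. It is not this library's API: `TinyRecurrentPolicy`, `run_trial`, and the `reset_each_episode` and `explore` parameters are hypothetical names chosen for the example. It assumes a recurrent policy whose hidden state can either persist across the episodes of a trial (as meta-learning methods in the RL2 style typically want) or be re-initialized on demand, and whose exploration can be turned off for validation.

```python
import math
import random


class TinyRecurrentPolicy:
    """Hypothetical toy recurrent policy; a stand-in, not the library's class."""

    def __init__(self, n_actions=2, hidden_size=4, seed=0):
        self.n_actions = n_actions
        self.hidden_size = hidden_size
        self.rng = random.Random(seed)
        self.hidden = [0.0] * hidden_size

    def reset_hidden(self):
        # Re-initialize the hidden state; the caller decides when this happens.
        self.hidden = [0.0] * self.hidden_size

    def _logits(self, obs):
        # Toy recurrence: mix the observation into the hidden state.
        self.hidden = [
            math.tanh(0.5 * h + 0.1 * (i + 1) * obs)
            for i, h in enumerate(self.hidden)
        ]
        total = sum(self.hidden)
        return [total * (a + 1) for a in range(self.n_actions)]

    def act(self, obs, explore=True):
        logits = self._logits(obs)
        if not explore:
            # Validation mode: greedy action, no sampling.
            return max(range(self.n_actions), key=lambda a: logits[a])
        # Training mode: sample from a softmax over the logits.
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        return self.rng.choices(range(self.n_actions), weights=weights)[0]


def run_trial(policy, episodes, steps=3, reset_each_episode=False, explore=True):
    """Run one trial; the hidden state is always reset at the start of the trial."""
    policy.reset_hidden()
    actions = []
    for _ in range(episodes):
        if reset_each_episode:
            policy.reset_hidden()
        for t in range(steps):
            actions.append(policy.act(obs=float(t + 1), explore=explore))
    return actions


if __name__ == "__main__":
    policy = TinyRecurrentPolicy()
    # Validation run: no exploration, hidden state carried across episodes.
    print(run_trial(policy, episodes=2, explore=False))
```

With `reset_each_episode=False` the hidden state carries information from one episode to the next within a trial; setting it to `True` makes every episode start from the same initial state. Passing `explore=False` makes the rollout deterministic, which is what an exploration-free validation run relies on.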
Optimize PPO
Optimized the PPO algorithm to make it more stable.