The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:
"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13"
However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy gradient loss, even though it does use the minimum in the value gradient loss.
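To make the difference concrete, here is a minimal NumPy sketch of the two variants. It assumes batched arrays `q1` and `q2` (the two Q-function outputs) and `log_pi` (log-probabilities of actions sampled from the current policy). The function names are hypothetical and not taken from sac/algos/sac.py. The sketch computes loss values only, with no explicit temperature term; a real implementation would differentiate them through the networks.

```python
import numpy as np

def value_target(q1, q2, log_pi):
    """Target for the state-value function: min of the two Q-estimates minus log pi(a|s)."""
    return np.minimum(q1, q2) - log_pi

def policy_loss(q1, q2, log_pi, use_min=True):
    """Mean of log pi(a|s) - Q(s, a) over actions sampled from the current policy.

    use_min=True uses min(Q1, Q2), matching the paper's description;
    use_min=False uses Q1 only, matching what the code reportedly does.
    """
    q = np.minimum(q1, q2) if use_min else q1
    return float(np.mean(log_pi - q))

q1 = np.array([1.0, 3.0])
q2 = np.array([2.0, 0.5])
log_pi = np.array([-1.0, -1.0])

print(value_target(q1, q2, log_pi))                # [2.  1.5]
print(policy_loss(q1, q2, log_pi))                 # -1.75
print(policy_loss(q1, q2, log_pi, use_min=False))  # -3.0
```

The two policy losses differ only where Q1 is the larger estimate (the second sample here), which is where taking the minimum would counteract a possibly overestimated Q-value.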
Is there a reason for the discrepancy? Thanks!