Python version: 3.10.13; torch version: 2.1.1+cu121
This is an implementation of the Easy21 assignment from David Silver's 2015 course COMPM050/COMPGI13, Reinforcement Learning, which is a truly excellent course.
I highly recommend the course: many of its ideas and explanations still hold up ten years later. If you want to implement the Easy21 assignment but don't know where to start, or find that other people's code lacks explanation, you can refer to my implementation :D
File structure:
├─policy_gradient
├─tabular_method
└─utils
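policy_gradient contains a simple linear_approximation implementation and an implementation of the REINFORCE algorithm. tabular_method contains the implementations required by the original Easy21 assignment, along with a Q-value array obtained after one million iterations.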
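For readers new to the assignment, here is a minimal sketch of the Easy21 rules as given in the assignment handout. It is not the code in this repository, and the names `draw_card`, `reset`, `step`, `HIT`, and `STICK` are hypothetical, chosen only for illustration.

```python
import random

HIT, STICK = 0, 1  # hypothetical action encoding


def draw_card(rng, force_black=False):
    """Return a signed card value: black cards add, red cards subtract."""
    value = rng.randint(1, 10)  # values 1..10, uniformly distributed
    if force_black or rng.random() < 2 / 3:  # black with probability 2/3
        return value
    return -value  # red with probability 1/3


def is_bust(total):
    """A hand goes bust if its sum is below 1 or above 21."""
    return total < 1 or total > 21


def reset(rng):
    """Start a game: dealer and player each draw one black card."""
    return draw_card(rng, force_black=True), draw_card(rng, force_black=True)


def step(state, action, rng):
    """Apply an action to (dealer_sum, player_sum); return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # STICK: the dealer hits until reaching 17 or more, or going bust.
    while not is_bust(dealer) and dealer < 17:
        dealer += draw_card(rng)
    if is_bust(dealer) or player > dealer:
        reward = 1
    elif player < dealer:
        reward = -1
    else:
        reward = 0
    return (dealer, player), reward, True
```

An episode under this sketch would start with `state = reset(random.Random(0))` and call `step` repeatedly until `done` is true. Passing the random generator explicitly keeps runs reproducible, which helps when comparing learning curves across methods.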
Citation: