megforr/toy_rl

RL Algorithms on Toy Environments

Environments

Algorithms

  • Tabular Q-learning
  • Deep Q-Learning (DQN)
  • Policy Gradients (PG)
    • REINFORCE
      • On-policy method: optimizes the current policy directly (no replay buffer needed).
      • Policy gradient: $\nabla J \approx \mathbb{E}[Q(s,a)\, \nabla \log \pi(a|s)]$
        • The scale of the gradient is proportional to the value of the action taken, $Q(s,a)$.
        • The direction of the gradient is the gradient of the log-probability of the action taken, $\nabla \log \pi(a|s)$.
        • Stochastic gradient ascent is performed by minimizing the loss $L = -Q(s,a) \log \pi(a|s)$.
    • Proximal Policy Optimization (PPO)
    • Deep Deterministic Policy Gradient (DDPG)
  • Actor Critic
    • Advantage Actor Critic (A2C)
    • Soft Actor Critic (SAC)
  • Multi-Armed Bandit (MAB)
    • Epsilon Greedy
    • Upper Confidence Bound-1
    • Thompson Sampling
    • Best Arm ID - Fixed Confidence
    • Best Arm ID - Fixed Budget
  • Contextual MAB (cMAB)
    • LinUCB
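The REINFORCE loss above, $L = -Q(s,a) \log \pi(a|s)$, can be sketched in NumPy for a linear softmax policy, with the Monte Carlo return standing in for $Q(s,a)$. This is a minimal illustration; the network sizes and names are not from this repo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear softmax policy over 2 actions with 4-dim states (illustrative sizes).
theta = rng.normal(size=(4, 2)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reinforce_grad(theta, states, actions, returns):
    """Gradient of L = -mean(G * log pi(a|s)) for a linear softmax policy.

    For softmax over logits = states @ theta, d log pi(a|s) / d logits
    is (one_hot(a) - probs); chain that through the linear layer.
    """
    probs = softmax(states @ theta)            # (N, A)
    one_hot = np.eye(theta.shape[1])[actions]  # (N, A)
    dlogits = (one_hot - probs) * returns[:, None]
    return -states.T @ dlogits / len(states)   # minimize L -> ascend J

states = rng.normal(size=(8, 4))
actions = rng.integers(0, 2, size=8)
returns = rng.normal(size=8)  # Monte Carlo returns stand in for Q(s,a)
grad = reinforce_grad(theta, states, actions, returns)
theta -= 0.1 * grad  # one stochastic gradient step
```

Note how each sample's contribution is scaled by its return, matching the "scale proportional to $Q(s,a)$" point above.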
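For the MAB methods listed above, epsilon-greedy is the simplest: with probability eps pull a random arm, otherwise pull the arm with the highest estimated mean reward, updating estimates incrementally. A small sketch on a Bernoulli bandit (arm means and step counts are illustrative, not from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_bandit(true_means, steps=2000, eps=0.1):
    """Run epsilon-greedy on a Bernoulli multi-armed bandit."""
    k = len(true_means)
    counts = np.zeros(k)
    values = np.zeros(k)  # incremental estimate of each arm's mean reward
    for _ in range(steps):
        if rng.random() < eps:
            arm = int(rng.integers(k))     # explore: random arm
        else:
            arm = int(np.argmax(values))   # exploit: current best estimate
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
# after enough steps, the best arm (index 2) dominates the pull counts
```

The same loop becomes UCB1 or Thompson Sampling by swapping only the arm-selection rule.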
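LinUCB, listed under contextual bandits, keeps per-arm statistics $A_a = I + \sum x x^T$ and $b_a = \sum r\, x$, and picks the arm maximizing $\hat\theta_a^T x + \alpha \sqrt{x^T A_a^{-1} x}$. A minimal sketch (dimension, arm count, and function names are illustrative):

```python
import numpy as np

def linucb_select(As, bs, x, alpha=1.0):
    """Pick the arm maximizing theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
    scores = []
    for A, b in zip(As, bs):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                  # ridge-regression estimate
        bonus = alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
        scores.append(float(theta @ x + bonus))
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the chosen arm's statistics after a reward."""
    A += np.outer(x, x)
    b += reward * x
    return A, b

d, n_arms = 3, 2
As = [np.eye(d) for _ in range(n_arms)]   # A_a initialized to identity
bs = [np.zeros(d) for _ in range(n_arms)]
x = np.array([1.0, 0.5, -0.2])            # context for this round
arm = linucb_select(As, bs, x)
As[arm], bs[arm] = linucb_update(As[arm], bs[arm], x, reward=1.0)
```

With identity $A_a$ and zero $b_a$, all arms score equally, so early rounds are driven entirely by the exploration bonus.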

Frameworks
