- Tabular Q-learning
- Deep Q-Learning (DQN)
- Policy Gradients (PG)
- REINFORCE
  - On-policy method: directly optimizes the current policy (no replay buffer needed).
  - Policy gradient:
    $\nabla J \approx E[Q(s,a)\, \nabla \log \pi(a|s)]$
    - The scale of the gradient is proportional to the value of the action taken: $Q(s,a)$
    - The direction of the gradient is the gradient of the log probability of the action taken: $\nabla \log \pi(a|s)$
  - Stochastic gradient ascent, implemented by minimizing the loss (see the PyTorch sketch after this list):
    $L = -Q(s,a)\, \log \pi(a|s)$
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Actor Critic
- Advantage Actor Critic (A2C)
- Soft Actor Critic (SAC)
- Multi-Arm Bandit (MAB)
- Epsilon Greedy
- Upper Confidence Bound-1
- Thompson Sampling
- Best Arm ID - Fixed Confidence
- Best Arm ID - Fixed Budget
- Contextual MAB (cMAB)
- LinUCB
- PyTorch
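
Below is a minimal PyTorch sketch of the REINFORCE loss listed above, assuming a discrete action space and using the discounted episode return as the estimate of $Q(s,a)$. `PolicyNet`, `reinforce_loss`, and the toy batch are illustrative, not anything prescribed by these notes.

```python
# Minimal REINFORCE sketch: minimize L = -Q(s,a) * log pi(a|s).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):  # illustrative name
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # raw logits; softmax applied in the loss

def reinforce_loss(logits, actions, returns):
    """L = -Q(s,a) * log pi(a|s), averaged over the batch."""
    log_probs = torch.log_softmax(logits, dim=1)                        # log pi(.|s)
    log_prob_a = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # log pi(a|s)
    return -(returns * log_prob_a).mean()

# Toy usage: one "episode" of fake data, just to show the update step.
policy = PolicyNet(obs_dim=4, n_actions=2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

obs = torch.randn(5, 4)                              # 5 states visited in the episode
actions = torch.randint(0, 2, (5,))                  # actions recorded (random here, for the sketch)
returns = torch.tensor([1.0, 0.9, 0.8, 0.7, 0.6])    # discounted returns used as Q(s,a) estimates

loss = reinforce_loss(policy(obs), actions, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # gradient ascent on J via descent on L
```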