GitHub - keeeal/temporal-ut3: Temporal difference learning for ultimate tic-tac-toe.

keeeal/temporal-ut3


TemporalUT3

Temporal difference learning for ultimate tic-tac-toe.

What is ultimate tic-tac-toe?

It's like tic-tac-toe, but each square of the game contains another game of tic-tac-toe in it! Win small games to claim the squares in the big game. Simple, right? But there is a catch: Whichever small square you pick is the next big square your opponent must play in. Read more...
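The "catch" described above can be sketched as a legal-move function. This is only an illustration, not code from this repository: the board representation (`cells`, `closed`), the `Move` tuple, and the function name `legal_moves` are all hypothetical. It also assumes the common convention that a player sent to a small board that is already won or full may play in any open board, which the description above does not specify.

```python
from typing import List, Optional, Tuple

# Hypothetical move encoding: (big square index 0-8, small square index 0-8).
Move = Tuple[int, int]


def legal_moves(
    cells: List[List[str]],
    closed: List[bool],
    last_move: Optional[Move],
) -> List[Move]:
    """Return the moves available to the player whose turn it is.

    cells[b][s] is 'X', 'O', or ' ' for small square s of big square b.
    closed[b] is True when small board b has been won or is full.
    last_move is the opponent's previous move, or None on the first turn.
    """

    def open_cells(b: int) -> List[Move]:
        return [(b, s) for s in range(9) if cells[b][s] == " "]

    if last_move is not None:
        # The small square the opponent picked names the big square we must use.
        target = last_move[1]
        if not closed[target]:
            return open_cells(target)

    # First turn, or (by assumption) the target board is closed: free choice.
    return [m for b in range(9) if not closed[b] for m in open_cells(b)]
```

For example, if the opponent has just played the centre square (index 4) of any small board, the only moves returned are the empty squares of big square 4, as long as that board is still open.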

ultimate tic-tac-toe gif

What is temporal difference learning?

Temporal difference (TD) learning is a reinforcement learning method; in this project the model is trained using only self-play. The algorithm learns by bootstrapping from the current estimate of the value function, i.e. the value of a state is updated toward the current estimate of the value of the states that follow it. Read more...
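To make the bootstrapping idea concrete, here is a minimal sketch of a tabular TD(0) update. It is an assumption-laden illustration rather than this repository's training code, which may well use a function approximator instead of a table: the function name `td0_update`, the dictionary-based value table, and the `lr`, `gamma`, and `terminal` parameters are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable


def td0_update(
    values: DefaultDict[Hashable, float],
    state: Hashable,
    next_state: Hashable,
    reward: float,
    lr: float = 0.1,
    gamma: float = 1.0,
    terminal: bool = False,
) -> float:
    """Move values[state] toward the bootstrapped target and return the TD error."""
    # The target uses the *current estimate* of the next state's value,
    # unless the episode has ended, in which case only the reward counts.
    target = reward if terminal else reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += lr * td_error
    return td_error


# Usage: a win (reward 1) is observed at the end of a two-step episode.
values: DefaultDict[Hashable, float] = defaultdict(float)
td0_update(values, "s2", "end", reward=1.0, lr=0.5, terminal=True)
td0_update(values, "s1", "s2", reward=0.0, lr=0.5)
```

In the usage lines, the second call updates `s1` using the freshly revised estimate for `s2` rather than the final game outcome, which is the sense in which TD learning bootstraps.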

How to use

Training

To begin training:

python train.py

or set the learning hyperparameters using any of the optional arguments:

python train.py --lr LEARN_RATE --a ALPHA --e EPSILON

Playing

You can play against a trained model using:

python player.py --params path/to/parameters.params

If no parameters are provided, the opponent will make moves randomly.
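The fallback described above could look roughly like the following sketch. It is not taken from `player.py`: the helper name `choose_move` and the idea of passing an optional `score` callable (standing in for a loaded model) are assumptions made purely for illustration.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

M = TypeVar("M")


def choose_move(
    moves: Sequence[M],
    score: Optional[Callable[[M], float]] = None,
    rng=random,
) -> M:
    """Pick the opponent's move from a non-empty sequence of legal moves.

    score is a hypothetical stand-in for a trained model: it rates each move.
    When it is None (no parameters were loaded), a move is chosen uniformly
    at random.
    """
    if not moves:
        raise ValueError("no legal moves available")
    if score is None:
        return rng.choice(list(moves))
    # With a model available, play the move it rates highest.
    return max(moves, key=score)
```

Passing a seeded `random.Random` instance as `rng` makes the random opponent reproducible, which can be handy when comparing trained models against it.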

Experiments

Coming soon.

To-do

Requirements

Thanks
