
raymondchua/gym-tokens


gym-tokens

This environment is used for the project "An inference perspective on urgency in decision-making: A drunkard’s walk case study", which was presented at Cosyne 2020.

Abstract

Agents are often tasked with deciding early to maximize reward rate. Evidence accumulation-to-bound and urgency-gating models each fail to wholly reproduce experimental results of single tasks, but each bring useful ingredients. Here, we provide an intuitive theory of time-constrained decision-making, combining these ingredients in the context of the well-known, yet under-studied ‘tokens task’, which challenges agents to use prediction to capitalize on early decisions. We support the theory with the development and analysis of a solution by a neurally plausible reinforcement learning (RL) algorithm, by an interpretable optimal solution, and with a qualitative match to measured neural recordings for urgency and commitment time from non-human primates and to human behavioral data. Our approach offers three novel attributes. First, the agent employs a compressed representation of future trajectories, inspired by the recently proposed successor state representation in hippocampus, as a powerful and learnable balance between model-based and model-free RL approaches. Second, the agent exploits its evidence accumulation model to compute a real-time posterior estimate of a trial’s value that we find decays with time, introducing a bias towards high-value trials. We show that hyperbolic discounting, the prevalent form found in primates, emerges naturally if the agent’s memory is limited to storing mean values and uses a least-biased prior. Third, the resulting urgency signal in our model is an estimate of trial difficulty and has a particularly simple form that combines intrinsically generated subjective confidence, experimentally imposed time pressure, and time linearly. These dependencies are also exhibited by the urgency signal measured in basal ganglia, providing neural evidence for our formulation. Finally, we use a variant of Q-learning, a neurally plausible learning algorithm, on the task and achieve primate-level performance. Our formulation of time-constrained decision-making provides an experimentally grounded inference perspective on reward-based learning.
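As a side note on the hyperbolic-discounting point, the following is a generic mathematical sketch (not code from this repository, and not necessarily the paper's derivation): marginalizing an exponential discount e^{-kt} over an exponential (maximum-entropy, i.e. least-biased given a mean) prior on the discount rate k yields a hyperbolic curve, E[e^{-kt}] = 1 / (1 + t/rate). The snippet below checks this numerically; the prior rate parameter is an arbitrary demo value.

```python
import numpy as np

rng = np.random.default_rng(0)

rate = 2.0  # rate of the exponential (maximum-entropy) prior over the discount rate k; arbitrary demo value
ks = rng.exponential(scale=1.0 / rate, size=200_000)  # samples k ~ Exp(rate)

ts = np.linspace(0.0, 10.0, 6)
marginalized = np.array([np.mean(np.exp(-ks * t)) for t in ts])  # Monte Carlo estimate of E[e^{-k t}]
hyperbolic = 1.0 / (1.0 + ts / rate)                             # closed form: rate / (rate + t)

for t, m, h in zip(ts, marginalized, hyperbolic):
    print(f"t={t:4.1f}  marginalized exponential discount={m:.3f}  hyperbolic={h:.3f}")
```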

To install, run python setup.py install from the root directory of this repository.
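Once installed, the environment should be usable through the standard Gym API. The snippet below is a minimal sketch: the environment ID (`tokens-v0`), the need to import `gym_tokens` for registration, and the observation/action details are assumptions and may differ from what this repository actually registers.

```python
import gym
import gym_tokens  # assumed to register the tokens-task environment(s) on import

# The environment ID is a guess; check the package's registration code for the real name.
env = gym.make('tokens-v0')

obs = env.reset()
done = False
total_reward = 0.0

while not done:
    action = env.action_space.sample()          # random policy, just to exercise the API
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym step signature
    total_reward += reward

print('episode return:', total_reward)
```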
