
raymondchua/gym-tokens


gym-tokens

This environment is used for the project "An inference perspective on urgency in decision-making: A drunkard’s walk case study", which was presented at Cosyne 2020.

Abstract

Agents are often tasked with deciding early to maximize reward rate. Evidence accumulation-to-bound and urgency-gating models each fail to wholly reproduce experimental results of single tasks, but each bring useful ingredients. Here, we provide an intuitive theory of time-constrained decision-making, combining these ingredients in the context of the well-known, yet under-studied ‘tokens task’, which challenges agents to use prediction to capitalize on early decisions. We support the theory with the development and analysis of a solution by a neurally plausible reinforcement learning (RL) algorithm, by an interpretable optimal solution, and with a qualitative match to measured neural recordings for urgency and commitment time from non-human primates and to human behavioral data. Our approach offers three novel attributes. First, the agent employs a compressed representation of future trajectories, inspired by the recently proposed successor state representation in hippocampus, as a powerful and learnable balance between model-based and model-free RL approaches. Second, the agent exploits its evidence accumulation model to compute a real-time posterior estimate of a trial’s value that we find decays with time, introducing a bias towards high-value trials. We show that hyperbolic discounting, the prevalent form found in primates, emerges naturally if the agent’s memory is limited to storing mean values and uses a least-biased prior. Third, the resulting urgency signal in our model is an estimate of trial difficulty and has a particularly simple form that combines intrinsically generated subjective confidence, experimentally imposed time pressure, and time linearly. These dependencies are also exhibited by the urgency signal measured in basal ganglia, providing neural evidence for our formulation. Finally, we use a variant of Q-learning, a neurally plausible learning algorithm, on the task and achieve primate-level performance. Our formulation of time-constrained decision-making provides an experimentally grounded inference perspective on reward-based learning.
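As a side note on the hyperbolic-discounting point, the following is a generic mathematical sketch (not code from this repository, and not necessarily the paper's derivation): marginalizing an exponential discount e^{-kt} over an exponential (maximum-entropy, i.e. least-biased given a mean) prior on the discount rate k yields a hyperbolic curve, E[e^{-kt}] = 1 / (1 + t/rate). The snippet below checks this numerically; the prior rate parameter is an arbitrary demo value.

```python
import numpy as np

rng = np.random.default_rng(0)

rate = 2.0  # rate of the exponential (maximum-entropy) prior over the discount rate k; arbitrary demo value
ks = rng.exponential(scale=1.0 / rate, size=200_000)  # samples k ~ Exp(rate)

ts = np.linspace(0.0, 10.0, 6)
marginalized = np.array([np.mean(np.exp(-ks * t)) for t in ts])  # Monte Carlo estimate of E[e^{-k t}]
hyperbolic = 1.0 / (1.0 + ts / rate)                             # closed form: rate / (rate + t)

for t, m, h in zip(ts, marginalized, hyperbolic):
    print(f"t={t:4.1f}  marginalized exponential discount={m:.3f}  hyperbolic={h:.3f}")
```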

To install, run python setup.py install from the root directory of this repository.
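Once installed, the environment should be usable through the standard Gym API. The snippet below is a minimal sketch: the environment ID (`tokens-v0`), the need to import `gym_tokens` for registration, and the observation/action details are assumptions and may differ from what this repository actually registers.

```python
import gym
import gym_tokens  # assumed to register the tokens-task environment(s) on import

# The environment ID is a guess; check the package's registration code for the real name.
env = gym.make('tokens-v0')

obs = env.reset()
done = False
total_reward = 0.0

while not done:
    action = env.action_space.sample()          # random policy, just to exercise the API
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym step signature
    total_reward += reward

print('episode return:', total_reward)
```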
