Abstract
In Q-learning, artificial neural networks (ANNs) are employed as function approximators to learn policies that maximise rewards. Although ANNs were inspired by biological neural networks, little is known about how similarly the two function. In particular, it is unknown whether concepts from animal learning such as the Matching Law (the proportion of choices directed towards a particular option matches the proportion of reinforcements obtained from that option) also apply to learning in ANNs. In this study, we trained a Deep Q-Network (DQN) agent built on recurrent neural networks (RNNs) on a modified delayed sample-to-match (DSM) task to investigate the mechanisms by which ANNs learn to perform cognitive tasks of varying difficulty. The task requires maintaining a working memory of a target stimulus presented at the trial's onset through a period of intervening stimuli. We systematically varied the difficulty of the task and identified two mechanisms critical for achieving high performance. First, longer and more complex sequences required moving from a vanilla RNN architecture to a Long Short-Term Memory (LSTM) network to learn the task. This more advanced architecture allowed us to scale to sequence lengths beyond what can be studied in laboratory animals. Second, we observed an effect analogous to the Matching Law in animal learning: altering the magnitude of rewards for correct responses significantly affected the model's learning. Models receiving low rewards for target stimuli tended to adopt a neutral action strategy during both the target and intervening stimulus phases, resulting in poor performance. We quantified this effect using the accuracy of the trained model on independent draws from the environment.
Our experiments indicate that although ANNs and biological systems use different learning mechanisms and scale in distinct ways, they can exhibit convergent behaviour in response to similar reinforcement principles.