PUBLICATIONS WITH ABSTRACTS
(LISTED BY DATE)
Abstract:
Synaptic plasticity configures interactions between neurons and is therefore likely to be a primary driver of behavioral learning and development. How this microscopic-macroscopic interaction occurs is poorly understood, as researchers frequently examine models within particular ranges of abstraction and scale. Computational neuroscience and machine learning models offer theoretically powerful analyses of plasticity in neural networks, but results are often siloed and only coarsely linked to biology. In this review, we examine connections between these areas, asking how network computations change as a function of diverse features of plasticity and vice versa. We review how plasticity can be controlled at synapses by calcium dynamics and neuromodulatory signals, the manifestation of these changes in networks, and their impacts in specialized circuits. We conclude that metaplasticity—defined broadly as the adaptive control of plasticity—forges connections across scales by governing what groups of synapses can and can't learn about, when, and to what ends. The metaplasticity we discuss acts by co-opting Hebbian mechanisms, shifting network properties, and routing activity within and across brain systems. Asking how these operations can go awry should also be useful for understanding pathology, which we address in the context of autism, schizophrenia and Parkinson’s disease.
Abstract:
A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalise, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalise task components (transition and reward functions). But a satisfactory account for how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we discuss how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.
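As an illustration of the nonparametric clustering at the heart of this account, the sketch below implements the Chinese restaurant process prior underlying Dirichlet process models: a new context joins an existing cluster in proportion to that cluster's popularity, or opens a fresh cluster in proportion to a concentration parameter. This is a minimal reduction for intuition, not the paper's hierarchical implementation; all names are illustrative.

```python
import numpy as np

def crp_assignment_probs(cluster_counts, alpha):
    """Chinese restaurant process prior: a new context joins an existing
    cluster with probability proportional to its popularity, or starts a
    new cluster with probability proportional to concentration alpha."""
    counts = np.asarray(cluster_counts, dtype=float)
    probs = np.append(counts, alpha)  # final entry = brand-new cluster
    return probs / probs.sum()

# Four contexts already assigned to clusters of sizes 3 and 1:
print(crp_assignment_probs([3, 1], alpha=1.0))  # -> [0.6 0.2 0.2]
```

Larger alpha favors positing new structures; smaller alpha favors reusing popular ones.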
Abstract:
Computational modeling has become a central aspect of research in the cognitive neurosciences. As the field matures, it is increasingly important to move beyond standard models to quantitatively assess models with richer dynamics that may better reflect underlying cognitive and neural processes. For example, sequential sampling models (SSMs) are a general class of models of decision making intended to capture processes jointly giving rise to reaction time distributions and choice data in n-alternative choice paradigms. A number of model variations are of theoretical interest, but empirical data analysis has historically been tied to a small subset for which likelihood functions are analytically tractable. Advances in methods designed for likelihood-free inference have recently made it computationally feasible to consider a much larger spectrum of sequential sampling models. In addition, recent work has motivated the combination of SSMs with reinforcement learning (RL) models, which had historically been considered in separate literatures. Here we provide a significant addition to the widely used HDDM Python toolbox and include a tutorial for how users can easily fit and assess a wide (and user-extensible) variety of SSMs, and how they can be combined with RL models. The extension comes batteries included, with model visualization tools, posterior predictive checks, and the ability to link trial-wise neural signals with model parameters via hierarchical Bayesian regression.
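A hypothetical usage sketch follows. The class name HDDMnn, the model argument, and the data-column conventions are assumptions based on the extension described above; consult the toolbox documentation for the actual interface.

```python
import hddm

# Trial-wise data; columns such as subj_idx, rt, and response are assumed.
data = hddm.load_csv('ssm_data.csv')

# Fit one of the likelihood-free SSM variants; 'angle' (a DDM variant with
# linearly collapsing bounds) is assumed here to be among the bundled models.
m = hddm.HDDMnn(data, model='angle')
m.sample(2000, burn=500)   # MCMC over the hierarchical posterior
m.print_stats()            # posterior summaries per parameter
```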
Abstract:
The cortico-basal ganglia circuit is needed to suppress prepotent actions and to facilitate controlled behavior. Under conditions of response conflict, the frontal cortex and subthalamic nucleus [STN] exhibit increased spiking and theta band power, which are linked to adaptive regulation of behavioral output. The electrophysiological mechanisms underlying these neural signatures of impulse control remain poorly understood. To address this lacuna, we constructed a novel large-scale, biophysically principled model of the subthalamopallidal [STN-Globus Pallidus externus (GPe)] network, and examined the mechanisms that modulate theta power and spiking in response to cortical input. Simulations confirmed that theta power does not emerge from intrinsic network dynamics but is robustly elicited in response to cortical input as burst events representing action selection dynamics. Rhythmic burst events of multiple cortical populations, representing a state of conflict where cortical motor plans vacillate in the theta range, led to prolonged STN theta and increased spiking, consistent with empirical literature. Notably, theta band signaling required NMDA, but not AMPA, currents, which were in turn related to a triphasic STN response characterized by spiking, silence and bursting periods. Finally, theta band resonance was also strongly modulated by architectural connectivity, with maximal theta arising when multiple cortical populations project to individual STN “conflict detector” units, due to an NMDA-dependent supralinear response. Our results provide insights into the biophysical principles and architectural constraints that give rise to STN dynamics during response conflict, and how their disruption can lead to impulsivity and compulsivity.
Abstract:
When navigating uncertain worlds, humans must balance exploring new options versus exploiting known rewards. Longer horizons and spatially structured option values encourage humans to explore, but the impact of real-world cognitive constraints such as environment size and memory demands on explore-exploit decisions is unclear. In the present study, humans chose between options varying in uncertainty during a multi-armed bandit task with varying environment size and memory demands. Regression and cognitive computational models of choice behavior showed that with a lower cognitive load, humans are more exploratory than a simulated value-maximizing learner, but under cognitive constraints, they adaptively scale down exploration to maintain exploitation. Thus, while humans are curious, cognitive constraints force people to decrease their strategic exploration in a resource-rational-like manner to focus on harvesting known rewards.
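One common formalization of the uncertainty-directed exploration at issue here is an upper-confidence-bound choice rule; the sketch below is illustrative, not the paper's fitted model, and the bonus weight beta is a stand-in for strategic exploration that can be scaled down under load.

```python
import numpy as np

def ucb_choice(q_values, visit_counts, t, beta=1.0):
    """Directed exploration via an uncertainty bonus: rarely sampled
    options get a boost, so choice balances exploiting high estimated
    values against exploring uncertain ones."""
    q = np.asarray(q_values, dtype=float)
    n = np.asarray(visit_counts, dtype=float)
    bonus = beta * np.sqrt(np.log(t + 1) / (n + 1e-6))
    return int(np.argmax(q + bonus))

# Shrinking beta mimics scaling down exploration under cognitive load.
print(ucb_choice([0.5, 0.48], [40, 2], t=42, beta=0.2))  # picks the rare arm
```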
Abstract:
Adaptive sequential behavior is a hallmark of human cognition. In particular, humans can learn to produce precise spatiotemporal sequences given a certain context. For instance, musicians can not only reproduce learned action sequences in a context-dependent manner but can also quickly and flexibly reapply them in any desired tempo or rhythm without overwriting previous learning. Existing neural network models fail to account for these properties. We argue that this limitation emerges from the fact that sequence information (i.e., the position of the action) and timing (i.e., the moment of response execution) are typically stored in the same neural network weights. Here, we augment a biologically plausible recurrent neural network of cortical dynamics to include a basal ganglia-thalamic module which uses reinforcement learning to dynamically modulate action. This “associative cluster-dependent chain” (ACDC) model modularly stores sequence and timing information in distinct loci of the network. This feature increases computational power and allows ACDC to display a wide range of temporal properties (e.g., multiple sequences, temporal shifting, rescaling, and compositionality), while still accounting for several behavioral and neurophysiological empirical observations. Finally, we apply this ACDC network to show how it can learn the famous “Thunderstruck” song intro and then flexibly play it in a “bossa nova” rhythm without further training.
Abstract:
Significant evidence supports the view that dopamine shapes learning by encoding reward prediction errors. However, it is unknown whether striatal targets receive tailored dopamine dynamics based on regional functional specialization. Here, we report novel, wave-like spatiotemporal activity patterns in dopamine axons and release across the dorsal striatum. These waves switch between activational motifs and organize dopamine transients into localized clusters within functionally related striatal sub-regions. Notably, wave trajectories were tailored to task demands, propagating from dorsomedial to dorsolateral striatum when rewards are contingent on animal behavior, and in the opposite direction when rewards are independent of behavioral responses. We propose a computational architecture in which striatal dopamine waves are sculpted by inference about agency, and provide a mechanism to direct credit assignment to specialized striatal subregions. Supporting model predictions, dorsomedial dopamine activity during reward-pursuit signaled the extent of instrumental control, and interacted with reward waves to predict future behavioral adjustments.
Abstract:
Even the most ardent proponents of computational psychiatry admit that the field is far from influencing routine clinical practice. We propose one reason for this is that the field has had difficulty recognizing the variability among mental health problems—and the resulting need to model context and temporal dynamics for many problems. We develop three heuristics for estimating whether time and context are important to a mental health problem. Is it characterized by a core neurobiological mechanism? Does it follow a straightforward natural trajectory? And is intentional mental content peripheral to the problem? For many problems the answers are no, suggesting modeling time and context is critical. We review computational psychiatry advances toward this end, including modeling state variation, using domain-specific stimuli, and interpreting differences in context. We discuss complementary network and complex systems approaches. Novel methods and unification with adjacent fields may inspire a new generation of computational psychiatry.
Abstract:
In cognitive neuroscience, computational modeling can formally adjudicate between theories and affords quantitative fits to behavioral/brain data. Pragmatically, however, the space of plausible generative models considered is dramatically limited by the set of models with known likelihood functions. For many models, the lack of a closed-form likelihood typically impedes Bayesian inference methods. As a result, standard models are evaluated for convenience, even when other models might be superior. Likelihood-free methods exist but are limited by their computational cost or their restriction to particular inference scenarios. Here, we propose neural networks that learn approximate likelihoods for arbitrary generative models, allowing fast posterior sampling with only a one-off cost for model simulations that is amortized for future inference. We show that these methods can accurately recover posterior parameter distributions for a variety of neurocognitive process models. We provide code allowing users to deploy these methods for arbitrary hierarchical model instantiations without further training.
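The amortization idea can be sketched as follows: simulate the generative model across parameter draws, turn the simulations into empirical likelihood targets, and train a regression network mapping (parameters, data point) to log-likelihood. The simulator and network below are deliberately simple stand-ins for intuition, not the paper's architecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta, n):
    """Stand-in simulator: a shifted lognormal 'response time' model."""
    mu, sigma = theta
    return rng.lognormal(mu, sigma, n) + 0.2

# Training pairs: (parameters, observation) -> log-likelihood target,
# with targets estimated from simulation histograms.
X, y = [], []
for _ in range(300):
    theta = rng.uniform([-1.0, 0.2], [1.0, 1.0])
    sims = simulate(theta, 5000)
    hist, edges = np.histogram(sims, bins=50, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    for c, h in zip(centers, hist):
        if h > 0:
            X.append([theta[0], theta[1], c])
            y.append(np.log(h))

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
# After this one-off training cost, net.predict gives fast approximate
# log-likelihoods that an MCMC sampler can query instead of re-simulating.
```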
Abstract:
Adaptive cognitive control involves a hierarchical cortico-striatal gating system that supports selective updating, maintenance, and retrieval of useful cognitive and motor information. Here, we developed a task that independently manipulates selective gating operations into working memory (input gating), from working memory (output gating), and of responses (motor gating) and tested the neural dynamics and computational principles that support them. Increases in gating demands, captured by gate switches, were expressed by distinct EEG correlates at each gating level that evolved dynamically in partially overlapping time windows. Further, categorical representations of specific maintained items and of motor responses could be decoded from EEG when the corresponding gate was switching, thereby linking gating operations to prioritization. Finally, gate switching at all levels was related to increases in the motor decision threshold as quantified by the drift diffusion model. Together these results support the notion that cognitive gating operations scaffold on top of mechanisms involved in motor gating.
Abstract:
Dopamine is well-established to contribute to cognitive control through its effects in the striatum and cortex. Considering growing evidence that cognitive control is effort-costly, striatal dopamine may also impact control by mediating motivation. Here, we consider the emerging perspective that striatal dopamine boosts control by making people more sensitive to the benefits versus the costs of cognitive effort, reconciling data across multiple instances of what counts as cognitive effort. In addition, we build on knowledge about dopamine’s role in motor action selection to make predictions about how it impacts the competition between controlled and prepotent actions. Finally, we speculate about heterogeneous functional consequences of dopamine signaling in distinct striatal sub-regions, retaining a core cost-benefit interpretation in all regions.
Abstract:
Adaptive behavior requires balancing approach and avoidance based on the rewarding and aversive consequences of actions. Imbalances in this evaluation are thought to characterize mood disorders such as major depressive disorder (MDD). We present a novel application of the drift diffusion model (DDM) suited to quantify how offers of reward and aversiveness, and neural correlates thereof, are dynamically integrated to form decisions, and how such processes are altered in MDD. Hierarchical parameter estimation from the DDM demonstrated that the MDD group differed in three distinct reward-related parameters driving approach-based decision making. First, MDD was associated with reduced reward sensitivity, measured as the impact of offered reward on evidence accumulation. Notably, this effect was replicated in a follow-up study. Second, the MDD group showed lower starting point bias towards approaching offers. Third, this starting point was influenced in opposite directions by Pavlovian effects and by nucleus accumbens activity across the groups: greater accumbens activity was related to approach bias in controls but avoid bias in MDD. Cross-validation revealed that the combination of these computational biomarkers – and not the raw behavioral or neural data – were diagnostic of patient status, with accumbens influences being particularly diagnostic. Finally, within the MDD group, reward sensitivity and nucleus accumbens parameters were differentially related to symptoms of perceived stress and depression. Collectively, these findings establish the promise of computational psychiatry approaches to dissecting approach-avoidance decision dynamics relevant for affective disorders.
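A minimal generative sketch of the decision process assumed by this application: offered reward and aversiveness jointly set the drift rate, while the starting point encodes an a priori approach/avoid bias. Parameter names and values below are illustrative, not the fitted ones.

```python
import numpy as np

def simulate_trial(reward, aversion, w_r=0.5, w_a=0.7, z=0.55, a=2.0,
                   dt=0.001, rng=np.random.default_rng()):
    """One drift diffusion trial: drift scales with offered reward minus
    aversiveness; the starting point z (fraction of threshold a) encodes
    a baseline approach/avoid bias."""
    v = w_r * reward - w_a * aversion      # net evidence for approach
    x, t = z * a, 0.0                      # biased start, boundaries at 0 and a
    while 0.0 < x < a:
        x += v * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("approach" if x >= a else "avoid"), t

print(simulate_trial(reward=2.0, aversion=1.0))  # (choice, decision time)
```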
Abstract:
Background: Psychiatric diagnosis and treatment have historically taken a symptom-based approach, with less attention on identifying underlying symptom-producing mechanisms. Recent efforts have illuminated the extent to which different underlying circuitry can produce phenotypically similar symptomatology (e.g. psychosis in bipolar disorder vs schizophrenia). Computational modelling makes it possible to identify and mathematically differentiate behaviorally unobservable, specific reinforcement-learning (RL) differences in schizophrenia (SZ) patients versus other disorders, likely due to a higher reliance on prediction-error (PE)-driven learning associated with the basal ganglia, and under-reliance on explicit value representations associated with the OFC. Methods: We use a well-established probabilistic-RL task to replicate those findings in individuals with schizophrenia both on (N=120) and off (N=44) anti-psychotic medications, and include a patient comparison group of bipolar patients with psychosis (N=60) and healthy controls (N=72). Results: Using accuracy, there was a main effect of group (F(3,279)=7.87, p<0.001), such that all patient groups were less accurate than controls. Using computationally derived parameters, both medicated and unmedicated individuals with SZ, but not bipolar patients, demonstrated a reduced “mixing” parameter (F(3,295)=13.91, p<0.001), indicating less dependence on learning explicit value representations, as well as greater learning decay between training and test (F(1,289)=12.81, p<0.001). Unmedicated SZ also showed greater decision noise (F(3,295)=2.67, p=0.04). Conclusions: Both medicated and unmedicated patients show overreliance on PE-driven learning, as well as significantly higher noise and value-related memory decay, compared to the healthy controls and the bipolar patients. Additionally, the computational model parameters capturing these processes can significantly improve patient/control classification, potentially providing useful diagnostic insight.
Abstract:
Schizophrenia is characterized by abnormal perceptions and beliefs, but the computational mechanisms through which these abnormalities emerge remain unclear. One prominent hypothesis asserts that such abnormalities result from overly precise representations of prior knowledge, which in turn lead beliefs to become insensitive to feedback. In contrast, another prominent hypothesis asserts that such abnormalities result from a tendency to interpret prediction errors as indicating meaningful change, leading to the assignment of aberrant salience to noisy or misleading information. Here we examine behaviour of patients and control subjects in a behavioural paradigm capable of adjudicating between these competing hypotheses and characterizing belief updates directly on individual trials. We show that patients are more prone to completely ignoring new information and perseverating on previous responses, but when they do update, they tend to do so completely. This updating strategy limits the integration of information over time, reducing both the flexibility and precision of beliefs, and provides a potential explanation for how patients could simultaneously show over-sensitivity and under-sensitivity to feedback in different paradigms.
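The updating strategy attributed to patients can be caricatured in a few lines: on most trials new information is ignored outright, and when an update does occur it is complete rather than graded. This is an illustrative caricature of the described pattern, not the paper's fitted model; p_update is an assumed parameter.

```python
import numpy as np

def all_or_none_update(belief, evidence, p_update=0.4,
                       rng=np.random.default_rng()):
    """All-or-none updating: usually perseverate on the old belief;
    occasionally jump all the way to the new evidence, rather than
    integrating information gradually across trials."""
    return evidence if rng.random() < p_update else belief
```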
Abstract:
In computer science, reinforcement learning is a powerful framework with which artificial agents can learn to maximize their performance for any given Markov decision process (MDP). Over the last decade, advances combining RL with deep neural networks have yielded performance that surpasses humans in many difficult task settings. However, such frameworks perform far less favorably when evaluated in their ability to generalize or transfer representations across different tasks. Existing algorithms that facilitate transfer typically are limited to cases in which the transition function or the optimal policy is portable to new contexts, but achieving "deep transfer" characteristic of human behavior has been elusive. Such transfer typically requires discovery of abstractions that permit analogical reuse of previously learned representations to superficially distinct tasks. Here, we demonstrate that abstractions that minimize error in predictions of reward outcomes generalize across tasks with different transition and reward functions. Such reward-predictive representations compress the state space of a task into a lower dimensional representation by combining states that are equivalent in terms of both the transition and reward functions. Because only state equivalences are considered, the resulting state representation is not tied to the transition and reward functions themselves and thus generalizes across tasks with different reward and transition functions. These results contrast with those using abstractions that myopically maximize reward in any given MDP and motivate further experiments in humans and animals to investigate if neural and cognitive systems involved in state representation perform abstractions that facilitate such equivalence relations.
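The idea of merging states that are equivalent under both the transition and reward functions can be illustrated with a crude, exact-match partition refinement. The paper learns approximate abstractions; this bisimulation-style sketch is ours.

```python
def reward_predictive_partition(T, R):
    """Iteratively split states whose expected reward or transition
    profile over the current clusters differ, until a fixed point
    (a crude bisimulation-style refinement)."""
    n = len(R)
    labels = [0] * n
    while True:
        k = max(labels) + 1
        sigs = []
        for s in range(n):
            mass = tuple(round(sum(T[s][s2] for s2 in range(n)
                                   if labels[s2] == c), 6) for c in range(k))
            # Including the current label makes each pass refinement-only.
            sigs.append((labels[s], round(R[s], 6), mass))
        mapping = {sig: i for i, sig in enumerate(dict.fromkeys(sigs))}
        new = [mapping[sig] for sig in sigs]
        if new == labels:
            return labels
        labels = new

# Two perceptually distinct states sharing reward and transition profiles
# collapse into one abstract state:
T = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
R = [1.0, 1.0, 0.0]
print(reward_predictive_partition(T, R))  # -> [0, 0, 1]
```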
Abstract:
Stimulants such as methylphenidate are increasingly used for cognitive enhancement but precise mechanisms are unknown. We found that methylphenidate boosts willingness to expend cognitive effort by altering the benefit-to-cost ratio of cognitive work. Willingness to expend effort was greater for participants with higher striatal dopamine synthesis capacity, whereas methylphenidate and sulpiride, a selective D2 receptor antagonist, increased cognitive motivation more for participants with lower synthesis capacity. A sequential sampling model informed by momentary gaze revealed that decisions to expend effort are related to amplification of benefit-versus-cost information attended early in the decision process, whereas the effect of benefits is strengthened with higher synthesis capacity and by methylphenidate. These findings demonstrate that methylphenidate boosts the perceived benefits versus costs of cognitive effort by modulating striatal dopamine signaling.
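The gaze-informed sequential sampling account can be illustrated with an aDDM-style drift rule in which the momentarily attended attribute enters at full strength and the unattended attribute is discounted. The weights and the discount parameter theta below are illustrative assumptions, not the paper's estimates.

```python
def gaze_weighted_drift(benefit, cost, gaze_on_benefit, w=1.0, theta=0.3):
    """Attention-weighted evidence for expending effort: the attended
    attribute (benefit or cost) contributes fully, the unattended one
    is discounted by theta, so early attention amplifies its impact."""
    if gaze_on_benefit:
        return w * (benefit - theta * cost)
    return w * (theta * benefit - cost)
```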
Abstract:
Computational psychiatry is a rapidly growing field attempting to translate advances in computational neuroscience and machine learning into improved outcomes for patients suffering from mental illness. It encompasses both data-driven and theory-driven efforts. Here, recent advances in theory-driven work are reviewed. We argue that the brain is a computational organ. As such, an understanding of the illnesses arising from it will require a computational framework. The review divides work into three theoretical approaches that have deep mathematical connections: dynamical systems, Bayesian inference and reinforcement learning. We discuss both general and specific challenges for the field, and suggest ways forward.
Abstract:
Humans routinely face novel environments in which they have to generalize in order to act adaptively. However, doing so involves the non-trivial challenge of deciding which aspects of a task domain to generalize. While it is sometimes appropriate to simply re-use a learned behavior, often adaptive generalization entails recombining distinct components of knowledge acquired across multiple contexts. Theoretical work has suggested a computational trade-off in which it can be more or less useful to learn and generalize aspects of task structure jointly or compositionally, depending on previous task statistics, but it is unknown whether humans modulate their generalization strategy accordingly. Here we develop a series of navigation tasks that separately manipulate the statistics of goal values ("what to do") and state transitions ("how to do it") across contexts and assess whether human subjects generalize these task components separately or conjunctively. We find that human generalization is sensitive to the statistics of the previously experienced task domain, favoring compositional or conjunctive generalization when the task statistics are indicative of such structures, and a mixture of the two when they are more ambiguous. These results support a normative "meta-generalization" account and suggest that people not only generalize previous task components but also generalize the statistical structure most likely to support generalization.
Abstract:
Cognitive models have been instrumental for generating insights into the brain processes underlying learning and decision making. In reinforcement learning it has recently been shown that not only choice proportions, but also their latency distributions, can be well captured when the choice function is replaced with a sequential sampling model such as the drift diffusion model. Hierarchical Bayesian parameter estimation further enhances the identifiability of distinct learning and choice parameters. One caveat is that these models can be time-consuming to build, sample from, and validate, especially when models include links between neural activations and model parameters. Here we describe a novel extension to the widely used hierarchical drift diffusion model (HDDM) toolbox which facilitates flexible construction, estimation, and evaluation of the reinforcement learning drift diffusion model (RLDDM) using hierarchical Bayesian methods. We describe the types of experiments most applicable to the model and provide a tutorial to illustrate how to perform quantitative data analysis and model evaluation. Parameter recovery confirmed that the method can reliably estimate parameters with varying numbers of synthetic subjects and trials. We also show that the simultaneous estimation of learning and choice parameters can improve the sensitivity to detect brain-behavioral relationships, including the impact of learned values and fronto-basal ganglia activity patterns on dynamic decision parameters.
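In outline, fitting the RLDDM with the toolbox looks like the sketch below; the exact column names and class behavior should be checked against the HDDM documentation rather than taken from this sketch.

```python
import hddm

# Long-format data; columns such as subj_idx, rt, response, and trial-wise
# feedback are assumed here, following HDDM conventions.
data = hddm.load_csv('rlddm_data.csv')

# Reinforcement learning drift diffusion model: trial-wise drift rate is
# driven by learned value differences, which are updated by a delta rule.
m = hddm.HDDMrl(data)
m.sample(2000, burn=500)
m.print_stats()  # posterior summaries: learning rate, drift scaling, a, t
```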
Abstract:
Very little is known about how individuals learn under uncertainty when other people are involved. We propose that humans are particularly tuned to social uncertainty, which is especially noisy and ambiguous. Individuals exhibiting less tolerance for uncertainty, such as those with anxiety, may have greater difficulty learning in uncertain social contexts and therefore provide an ideal test population to probe learning dynamics under uncertainty. Using a dynamic trust game and a matched nonsocial task, we found that healthy subjects (n = 257) were particularly good at learning under negative social uncertainty, swiftly figuring out when to stop investing in an exploitative social partner. In contrast, subjects with anxiety (n = 97) overinvested in exploitative partners. Computational modeling attributed this pattern to a selective reduction in learning from negative social events and a failure to enhance learning as uncertainty rises—two mechanisms that likely facilitate adaptive social choice.
Abstract:
Computational modeling plays an important role in modern neuroscience research. Much previous research has relied on statistical methods, separately, to address two problems that are actually interdependent. First, given a particular computational model, Bayesian hierarchical techniques have been used to estimate individual variation in parameters over a population of subjects, leveraging their population-level distributions. Second, candidate models are themselves compared, and individual variation in the expressed model estimated, according to the fits of the models to each subject. The interdependence between these two problems arises because the relevant population for estimating parameters of a model depends on which other subjects express the model. Here, we propose a hierarchical Bayesian inference (HBI) framework for concurrent model comparison, parameter estimation and inference at the population level, combining previous approaches. We show that this framework has important advantages for both parameter estimation and model comparison theoretically and experimentally. The parameters estimated by the HBI show smaller errors compared to other methods. Model comparison by HBI is robust against outliers and is not biased towards overly simplistic models. Furthermore, the fully Bayesian approach of our theory enables researchers to make inference on group-level parameters by performing an HBI t-test.
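A core ingredient of such concurrent inference can be sketched as per-subject model responsibilities: each subject's log model evidence is converted into the probability that each candidate model generated that subject's data, and these weights then determine how strongly the subject informs each model's group-level estimates. The snippet is a schematic fragment, not the full HBI algorithm.

```python
import numpy as np

def model_responsibilities(log_evidence):
    """Convert one subject's per-model log evidences into normalized
    responsibilities (a numerically stable softmax)."""
    le = np.asarray(log_evidence, dtype=float)
    w = np.exp(le - le.max())
    return w / w.sum()

# Model 1 fits this subject far better, so it dominates the weights:
print(model_responsibilities([-100.0, -102.3]))  # -> [~0.91, ~0.09]
```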
Abstract:
Learning should be adjusted according to the surprise associated with observed outcomes but calibrated according to statistical context. For example, when occasional changepoints are expected, surprising outcomes should be weighted heavily to speed learning. In contrast, when uninformative outliers are expected to occur occasionally, surprising outcomes should be less influential. Here we dissociate surprising outcomes from the degree to which they demand learning using a predictive inference task and computational modeling. We show that the P300, a stimulus-locked electrophysiological response previously associated with adjustments in learning behavior, predicts such adjustments conditionally on the source of surprise. Larger P300 signals predicted greater learning in a changing context, but less learning in a context where surprise was indicative of a one-off outlier (oddball). Our results suggest that the P300 provides a surprise signal that is interpreted by downstream learning processes differentially according to statistical context in order to appropriately calibrate learning across complex environments.
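The central claim can be made concrete with a delta rule whose learning rate is modulated by surprise in opposite directions across the two statistical contexts; the specific numbers below are illustrative.

```python
def update_prediction(pred, outcome, surprise_prob, context):
    """Delta-rule update whose learning rate depends on what surprise
    means in context: in a changepoint context surprise signals real
    change (learn more), in an oddball context it signals a one-off
    outlier (learn less)."""
    if context == "changepoint":
        lr = 0.1 + 0.9 * surprise_prob    # surprise -> faster learning
    else:  # oddball context
        lr = 0.1 * (1.0 - surprise_prob)  # surprise -> discount outlier
    return pred + lr * (outcome - pred)
```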
Abstract:
Background. Several studies have reported diminished learning from non-social outcomes in depressed individuals. However, it is not clear how depression impacts learning from social feedback. Notably, mood disorders are commonly associated with deficits in social functioning, which raises the possibility that potential impairments in social learning may negatively affect real-life social experiences in depressed subjects. Methods. Ninety-two participants with high (HD; N = 40) and low (LD; N = 52) depression scores were recruited. Subjects performed a learning task, during which they received monetary outcomes or social feedback which they were told came from other people. Additionally, participants answered questions about their everyday social experiences. Computational models were fit to the data and model parameters were related to social experience measures. Results. HD subjects reported a reduced quality and quantity of social experiences compared to LD controls, including an increase in the amount of time spent in negative social situations. Moreover, HD participants showed lower learning rates than LD subjects in the social condition of the task. Interestingly, across all participants, reduced social learning rates predicted higher amounts of time spent in negative social situations, even when depression scores were controlled for. Conclusion. These findings indicate that deficits in social learning may affect the quality of everyday social experiences. Specifically, the impaired ability to use social feedback to appropriately update future actions, which was observed in HD subjects, may lead to suboptimal interpersonal behavior in real life. This, in turn, may evoke negative feedback from others, thus bringing about more unpleasant social encounters.
Abstract:
Background: Maladaptive approach-avoidance behavior has been implicated in the pathophysiology of major depressive disorder (MDD), but the neural basis of these abnormalities in decision-making remains unclear. Capitalizing on recent preclinical findings, we adapted an approach-avoidance conflict task from non-human primate research for use in human functional MRI. Methods: Forty-two female participants, including 18 unmedicated individuals with current MDD (mean age 25.2 ± 5.1) and 24 psychiatrically healthy controls (mean age 26.3 ± 7.6), completed the adapted approach-avoidance task during functional MRI. To probe potential mechanistic factors underlying the observed behavioral and fMRI findings and inform interpretation of putative group differences, we examined electrophysiological data from two female Macaca mulatta monkeys performing the approach-avoidance conflict task mimicked in the fMRI study. Results: Findings demonstrated congruent neural correlates of approach-avoidance conflict and aversive responsiveness in the anterior cingulate cortex, including pregenual cortex, of human subjects and macaques (humans p<0.05 whole-brain corrected; macaques p<0.05). The MDD group exhibited aberrant task-related activations in the anterior cingulate cortex, prefrontal cortex and striatum (all ps<0.05). Neural effects in the MDD group were cross-sectionally associated with stress and depressive symptoms. Importantly, they also prospectively predicted stress at six-month follow-up (all ps<0.05). Conclusions: Findings indicate there is conservation of anterior cingulate regions of activation across species and that frontal and striatal regions, in unmedicated humans with MDD, are abnormally responsive during cost-benefit decision-making. We suggest that these disruptions could be valuable candidates for translational biomarkers.
Abstract:
Background. Cognitive deficits in depressed adults may reflect impaired decision-making. To investigate this possibility, we analyzed data from unmedicated adults with Major Depressive Disorder (MDD) and healthy controls as they performed a probabilistic reward task. The Hierarchical Drift Diffusion Model (HDDM) was used to quantify decision-making mechanisms recruited by the task, to determine if any such mechanism was disrupted by depression. Methods. Data came from two samples (Study 1: 258 MDD, 36 controls; Study 2: 23 MDD, 25 controls). On each trial, participants indicated which of two similar stimuli was presented; correct identifications were rewarded. Quantile-probability plots and the HDDM quantified the impact of MDD on response times (RT), speed of evidence accumulation (drift rate), and the width of decision thresholds, among other parameters. Results. RTs were more positively skewed in depressed v. healthy adults, and the HDDM revealed that drift rates were reduced—and decision thresholds were wider—in the MDD groups. This pattern suggests that depressed adults accumulated the evidence needed to make decisions more slowly than controls did. Conclusions. Depressed adults responded slower than controls in both studies, and poorer performance led the MDD group to receive fewer rewards than controls in Study 1. These results did not reflect a sensorimotor deficit but were instead due to sluggish evidence accumulation. Thus, slowed decision-making—not slowed perception or response execution—caused the performance deficit in MDD. If these results generalize to other tasks, they may help explain the broad cognitive deficits seen in depression.
Abstract:
Dopamine is thought to provide reward prediction error signals to temporal lobe memory systems, but the role of these signals in episodic memory has not been fully characterized. Here we developed an incidental memory paradigm to (i) estimate the influence of reward prediction errors on the formation of episodic memories, (ii) dissociate this influence from surprise and uncertainty, (iii) characterize the role of temporal correspondence between prediction error and memoranda presentation and (iv) determine the extent to which this influence is dependent on memory consolidation. We found that people encoded incidental memoranda more strongly when they gambled for potential rewards. Moreover, the degree to which gambling strengthened encoding scaled with the reward prediction error experienced when memoranda were presented (and not before or after). This encoding enhancement was detectable within minutes and did not differ substantially after 24 h, indicating that it is not dependent on memory consolidation. These results suggest a computationally and temporally specific role for reward prediction error signalling in memory formation.
Abstract:
Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experiences that do not fully overlap with, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models can only re-use policies as a whole and are unable to transfer knowledge about the transition structure of the environment even if only the goal has changed (or vice-versa). This contrasts with ecological settings, where some aspects of task structure, such as the transition function, will be shared between contexts separately from other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, affording separable transfer of their constituent parts across contexts. We show that the relative performance of this agent compared to an agent that jointly clusters reward and transition functions depends on environmental task statistics: the mutual information between transition and reward functions and the stochasticity of the observations. We formalize our analysis through an information theoretic account of the priors, and propose a meta-learning agent that dynamically arbitrates between strategies across task domains to optimize a statistical tradeoff.
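The task statistic that arbitrates between strategies can be computed directly: the mutual information between which reward function and which transition function co-occur across contexts. High mutual information favors joint clustering; low mutual information favors independent clustering. A minimal sketch:

```python
import numpy as np

def mutual_information(joint):
    """MI (in bits) between reward- and transition-function identities,
    from their joint occurrence counts across contexts."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    pr = joint.sum(axis=1, keepdims=True)   # P(reward function)
    pt = joint.sum(axis=0, keepdims=True)   # P(transition function)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pr @ pt)[nz])).sum())

# Perfectly coupled components -> 1 bit; independent components -> 0 bits.
print(mutual_information([[5, 0], [0, 5]]))          # 1.0
print(mutual_information([[2.5, 2.5], [2.5, 2.5]]))  # 0.0
```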
Abstract:
Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electro-encephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
Abstract:
The nature of capacity limits for visual working memory has been the subject of an intense debate that has relied on models that assume items are encoded independently. Here we propose that instead, similar features are jointly encoded through a "chunking" process to optimize performance on visual working memory tasks. We show that such chunking can: 1) facilitate performance improvements for abstract capacity-limited systems, 2) be optimized through reinforcement, 3) be implemented by center-surround dynamics, and 4) increase effective storage capacity at the expense of recall precision. Human performance on a variant of a canonical working memory task demonstrated performance advantages, precision detriments, inter-item dependencies, and trial-to-trial behavioral adjustments diagnostic of performance optimization through center-surround chunking. Models incorporating center-surround chunking provided a better quantitative description of human performance in our study as well as in a meta-analytic dataset, and apparent differences in working memory capacity across individuals were attributable to individual differences in the implementation of chunking. Our results reveal a normative rationale for center-surround connectivity in working memory circuitry, call for re-evaluation of memory performance differences that have previously been attributed to differences in capacity, and support a more nuanced view of visual working memory capacity limitations: a strategic tradeoff between storage capacity and memory precision through chunking contributes to flexible capacity limitations that include both discrete and continuous aspects.
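The proposed chunking process can be illustrated with a greedy merge of items that fall within a similarity criterion, storing each chunk at its mean and thereby trading recall precision for effective capacity. This toy sketch ignores the circularity of color space and is not the paper's center-surround implementation.

```python
import numpy as np

def chunk_items(features, criterion=30.0):
    """Greedy chunking: items whose feature values (e.g., color angles in
    degrees) fall within the criterion of the previous item are stored as
    a single chunk at their mean value."""
    chunks = []
    for f in sorted(features):
        if chunks and abs(f - chunks[-1][-1]) < criterion:
            chunks[-1].append(f)   # absorb into the current chunk
        else:
            chunks.append([f])     # open a new chunk
    return [float(np.mean(c)) for c in chunks]

# Five items compress to three chunks, freeing capacity at a precision cost:
print(chunk_items([10, 25, 180, 200, 355]))  # -> [17.5, 190.0, 355.0]
```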
Abstract:
In this report, we provide the first evidence that mood and anxiety dimensions are associated with unique aspects of EEG responses to reward and punishment, respectively. We reanalyzed data from our prior publication of a categorical depiction of depression to address more sophisticated dimensional hypotheses. Highly symptomatic depressed individuals (N = 46) completed a probabilistic learning task with concurrent EEG. Measures of anxiety and depression symptomatology were significantly correlated with each other; however, only anxiety predicted better avoidance learning due to a tighter coupling of negative prediction error signaling with punishment-specific EEG features. In contrast, depression predicted a smaller reward-related EEG feature, but this did not affect prediction error coupling or the ability to learn from reward. We suggest that this reward-related alteration reflects motivational or hedonic aspects of reward and not a diminishment in the ability to represent the information content of reinforcements. These findings compel further research into the domain-specific neural systems underlying dimensional aspects of psychiatric disease.
Abstract:
Motivation exerts control over behavior by eliciting Pavlovian responses, which can either match or conflict with instrumental action. We can overcome maladaptive motivational influences putatively through frontal cognitive control. However, the neurocomputational mechanisms subserving this control are unclear; does control entail up-regulating instrumental systems, down-regulating Pavlovian systems, or both? We combined electroencephalography (EEG) recordings with a motivational Go/NoGo learning task (N = 34), in which multiple Go options enabled us to disentangle selective action learning from nonselective Pavlovian responses. Midfrontal theta-band (4–8 Hz) activity covaried with the level of Pavlovian conflict and was associated with reduced Pavlovian biases rather than reduced instrumental learning biases. Motor and lateral prefrontal regions synchronized to the midfrontal cortex, and these network dynamics predicted the reduction of Pavlovian biases over and above local, midfrontal theta activity. This work links midfrontal processing to detecting Pavlovian conflict and highlights the importance of network processing in reducing the impact of maladaptive, Pavlovian biases.
Abstract:
The subthalamic nucleus (STN) is a small almond-shaped subcortical structure classically known for its role in motor inhibition through the indirect pathway within the basal ganglia. Little is known about the role of the STN in mediating cognitive functions in humans. Here, we explore the role of the STN in human subjects making decisions under conditions of uncertainty using single-neuron recordings and intermittent deep brain stimulation (DBS) during a financial decision-making task. Intraoperative single-neuronal data from the STN reveals that on high-uncertainty trials, spiking activity encodes the upcoming decision within a brief (500 ms) temporal window during the choice period, prior to the manifestation of the choice. Application of intermittent DBS selectively prior to the choice period alters decisions and biases subject behavior towards conservative wagers.
Abstract:
To behave adaptively in environments that are noisy and nonstationary, humans and other animals must monitor feedback from their environment and adjust their predictions and actions accordingly. An understudied approach for modeling these adaptive processes comes from the engineering field of control theory, which provides general principles for regulating dynamical systems, often without requiring a generative model. The proportional–integral–derivative (PID) controller is one of the most popular models of industrial process control. The proportional term is analogous to the "delta rule" in psychology, adjusting estimates in proportion to each error in prediction. The integral and derivative terms augment this update to simultaneously improve accuracy and stability. Here, we tested whether the PID algorithm can describe how people sequentially adjust their predictions in response to new information. Across three experiments, we found that the PID controller was an effective model of participants’ decisions in noisy, changing environments. In Experiment 1, we reanalyzed a change-point detection experiment and showed that participants’ behavior incorporated elements of PID updating. In Experiments 2–3, we developed a task with gradual transitions that we optimized to detect PID-like adjustments. In both experiments, the PID model offered better descriptions of behavioral adjustments than both the classical delta-rule model and its more sophisticated variant, the Kalman filter. We further examined how participants weighted different PID terms in response to salient environmental events, finding that these control terms were modulated by reward, surprise, and outcome entropy. These experiments provide preliminary evidence that adaptive learning in dynamic environments resembles PID control.
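Since the abstract leans on the PID update, here it is written out explicitly; with ki = kd = 0 it reduces to the classical delta rule with learning rate kp. The gain values are illustrative, not the fitted ones.

```python
class PIDLearner:
    """Sequential prediction via PID control: the proportional term is
    the classic delta rule; the integral term corrects accumulated bias;
    the derivative term damps reactions to transient noise."""

    def __init__(self, kp=0.5, ki=0.1, kd=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_error = 0.0, 0.0

    def update(self, prediction, outcome):
        error = outcome - prediction          # prediction error
        self.integral += error                # accumulated past error
        derivative = error - self.prev_error  # error rate of change
        self.prev_error = error
        return (prediction + self.kp * error
                + self.ki * self.integral + self.kd * derivative)
```

In this framing, weighting the three terms differently in response to reward, surprise, or outcome entropy corresponds to the modulations reported in the final experiments.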
Abstract:
Why are we so slow in choosing the lesser of two evils? We considered whether such slowing relates to uncertainty about the value of these options, which arises from the tendency to avoid them during learning, and whether such slowing relates to frontosubthalamic inhibitory control mechanisms. In total, 49 participants performed a reinforcement-learning task and a stop-signal task while fMRI was recorded. A reinforcement-learning model was used to quantify learning strategies. Individual differences in lose–lose slowing related to information uncertainty due to sampling, and independently, to less efficient response inhibition in the stop-signal task. Neuroimaging analysis revealed an analogous dissociation: subthalamic nucleus (STN) BOLD activity related to variability in stopping latencies, whereas weaker frontosubthalamic connectivity related to slowing and information sampling. Across tasks, fast inhibitors increased STN activity for successfully canceled responses in the stop task, but decreased activity for lose–lose choices. These data support the notion that fronto-STN communication implements a rapid but transient brake on response execution, and that slowing due to decision uncertainty could result from an inefficient release of this "hold your horses" mechanism.
Abstract:
BACKGROUND: While many have emphasized impaired reward prediction error signaling in schizophrenia, multiple studies suggest that some decision-making deficits may arise from overreliance on stimulus-response systems together with a compromised ability to represent expected value. Guided by computational frameworks, we formulated and tested two scenarios in which maladaptive representations of expected value should be most evident, thereby delineating conditions that may evoke decision-making impairments in schizophrenia. METHODS: In a modified reinforcement learning paradigm, 42 medicated people with schizophrenia and 36 healthy volunteers learned to select the most frequently rewarded option in a 75-25 pair: once when presented with a more deterministic (90-10) pair and once when presented with a more probabilistic (60-40) pair. Novel and old combinations of choice options were presented in a subsequent transfer phase. Computational modeling was employed to elucidate contributions from stimulus-response systems (actor–critic) and expected value (Q-learning). RESULTS: People with schizophrenia showed robust performance impairments with increasing value difference between two competing options, which strongly correlated with decreased contributions from expected value-based learning (Q-learning). Moreover, a subtle yet consistent contextual choice bias for the probabilistic 75 option was present in people with schizophrenia, which could be accounted for by a context-dependent reward prediction error in the actor–critic. CONCLUSIONS: We provide evidence that decision-making impairments in schizophrenia increase monotonically with demands placed on expected value computations. A contextual choice bias is consistent with overreliance on stimulus-response learning, which may signify a deficit secondary to the maladaptive representation of expected value. These results shed new light on conditions under which decision-making impairments may arise.
Abstract:
BACKGROUND: The current study was designed to test the hypothesis that motivational deficits in schizophrenia (SZ) are tied to a reduced ability to differentially signal gains and instances of loss-avoidance in the brain, leading to reduced ability to form adaptive representations of expected value. METHODS: We administered a reinforcement learning paradigm to 27 medicated SZ patients and 27 control subjects in which participants learned three probabilistic discriminations. In regions of interest in reward networks identified a priori, we examined contrasts between trial types with different expected values (e.g., expected gain–nonmonetary) and between outcomes with the same prediction error valence but different experienced values (e.g., gain–loss-avoidance outcome, miss–loss outcome). RESULTS: Both whole-brain and region of interest analyses revealed that SZ patients showed reduced differentiation between gain and loss-avoidance outcomes in the dorsal anterior cingulate cortex and bilateral anterior insula. That is, SZ patients showed reduced contrasts between positive prediction errors of different objective values in these areas. In addition, we observed significant correlations between gain–loss-avoidance outcome contrasts in the ventral striatum and ratings for avolition/anhedonia and between expected gain–nonmonetary contrasts in the ventral striatum and ventromedial prefrontal cortex. CONCLUSIONS: These results provide further evidence for intact prediction error signaling in medicated SZ patients, especially with regard to loss-avoidance. By contrast, components of frontostriatal circuits appear to show reduced sensitivity to the absolute valence of expected and experienced outcomes, suggesting a mechanism by which motivational deficits may emerge.
Abstract:
Cognitive control—the ability to override a salient or prepotent action to execute a more deliberate one—is required for flexible, goal-directed behavior, and yet it is subjectively costly: decision-makers avoid allocating control resources, even when doing so affords more valuable outcomes. Dopamine likely offsets effort costs just as it does for physical effort. And yet, dopamine can also promote impulsive action, undermining control. We propose a novel hypothesis that reconciles opposing effects of dopamine on cognitive control: during action selection, striatal dopamine biases benefits relative to costs, but does so preferentially for 'proximal' motor and cognitive actions. Considering the nature of instrumental affordances and their dynamics during action selection facilitates a parsimonious interpretation and conserved corticostriatal mechanisms across physical and cognitive domains.
Abstract:
Background: When studying learning, researchers directly observe only the participants’ choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system’s contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help design tasks that allow such a separable identification of processes, and infer their contributions in individuals. Methods: We present a new experimental protocol that separately identifies the contributions of RL and WM to learning, is sensitive to parametric variations in both, and allows us to investigate whether the processes interact. In Experiments 1-2, we test this protocol with healthy young adults (n=29 and n=52). In Experiment 3, we use it to investigate learning deficits in medicated individuals with schizophrenia (n=49 patients, n=32 controls). Results: Experiments 1-2 established WM and RL contributions to learning, evidenced by parametric modulations of choice by load and delay, and reward history, respectively. They also showed interactions between WM and RL, where RL was enhanced under high WM load. Moreover, we observed a cost of mental effort, controlling for reinforcement history: participants preferred stimuli they encountered under low WM load. Experiment 3 revealed selective deficits in WM contributions and preserved RL value learning in individuals with schizophrenia compared to controls. Conclusions: Computational approaches allow us to disentangle contributions of multiple systems to learning and, consequently, further our understanding of psychiatric diseases.
Abstract:
Reinforcement learning in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors are used to update expected values of choice options. This modeling ignores the different contributions of different memory and decision-making systems thought to contribute even to simple learning. In an fMRI experiment, we asked how working memory and incremental reinforcement learning processes interact to guide human learning. Working memory load was manipulated by varying the number of stimuli to be learned across blocks. Behavioral results and computational modeling confirmed that learning was best explained as a mixture of two mechanisms: a fast, capacity-limited, and delay-sensitive working memory process together with slower reinforcement learning. Model-based analysis of fMRI data showed that striatum and lateral prefrontal cortex were sensitive to reward prediction error, as shown previously, but critically, these signals were reduced when the learning problem was within the capacity of working memory. The degree of this neural interaction related to individual differences in the use of working memory to guide behavioral learning. These results indicate that the two systems do not process information independently, but rather interact during learning.
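The two-system account can be sketched as a mixture policy: a capacity-limited working memory expert is weighted against an incremental RL learner, with the WM weight shrinking as set size exceeds capacity. Parameter values below are illustrative, not the fitted ones.

```python
import numpy as np

def rlwm_policy(q_rl, q_wm, set_size, capacity=3.0, beta=8.0):
    """Mixture of a capacity-limited WM policy and incremental RL:
    the WM weight shrinks as set size exceeds capacity, so RL
    dominates choice under high load."""
    w = min(1.0, capacity / set_size)
    softmax = lambda q: np.exp(beta * np.asarray(q)) / np.exp(beta * np.asarray(q)).sum()
    return w * softmax(q_wm) + (1 - w) * softmax(q_rl)

# At set size 6 (above capacity 3), slow RL values carry half the weight:
print(rlwm_policy(q_rl=[0.6, 0.4], q_wm=[1.0, 0.0], set_size=6))
```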
Abstract:
Catecholamines modulate the impact of motivational cues on action. Such motivational biases have been proposed to reflect cue-based, ‘Pavlovian’ effects. Here, we assess whether motivational biases may also arise from asymmetrical instrumental learning of active and passive responses following reward and punishment outcomes. We present a novel paradigm, allowing us to disentangle the impact of reward and punishment on instrumental learning from Pavlovian response biasing. Computational analyses showed that motivational biases reflect both Pavlovian and instrumental effects: reward and punishment cues promoted generalized (in)action in a Pavlovian manner, whereas outcomes enhanced instrumental (un)learning of chosen actions. These cue- and outcome-based biases were altered independently by the catecholamine enhancer methylphenidate. Methylphenidate’s effect varied across individuals with a putative proxy of baseline dopamine synthesis capacity, working memory span. Our study uncovers two distinct mechanisms by which motivation impacts behaviour, and helps refine current models of catecholaminergic modulation of motivated action.
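The Pavlovian response-biasing component dissociated here is often formalized by adding a cue-value term to the 'go' action weight; the sketch below follows that convention. Weights are illustrative, and the asymmetric-learning component of the full model is omitted.

```python
def action_weights(q_go, q_nogo, cue_value, pavlovian_pi=0.3, go_bias=0.2):
    """Go/NoGo action weights: the instrumental value of 'go' is biased
    toward action for appetitive cues (positive cue_value) and toward
    inaction for aversive cues (negative cue_value), scaled by a
    Pavlovian weight, plus a constant go bias."""
    w_go = q_go + go_bias + pavlovian_pi * cue_value
    w_nogo = q_nogo
    return w_go, w_nogo
```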
Abstract:
Much human behavior is driven by rewards. Preclinical neurophysiological and clinical positron emission tomography (PET) studies have implicated striatal phasic dopamine (DA) release as a primary modulator of reward processing. However, the relationship between experimental reward-induced striatal DA release and responsiveness to naturalistic rewards, and therefore functional relevance of these findings, has been elusive. We therefore combined, for the first time, DA D2/3 receptor [18F]fallypride PET during a probabilistic reinforcement learning (RL) task with six days of ecological momentary assessment (EMA) of reward-related behavior in the everyday life of 16 healthy volunteers. We detected significant reward-induced DA release in the bilateral putamen, caudate nucleus and ventral striatum, the extent of which was associated with better behavioral performance on the RL task across all regions. Furthermore, individual variability in the extent of reward-induced DA release in the right caudate nucleus and ventral striatum modulated the tendency to be actively engaged in a behavior if the active engagement was previously deemed enjoyable. This study suggests a link between striatal reward-related DA release and ecologically relevant reward-oriented behavior, suggesting an avenue for the inquiry into the DAergic basis of optimal and impaired motivational drive.
Abstract:
Advances in our understanding of brain function and dysfunction require the integration of heterogeneous sources of data across multiple levels of analysis, from biophysics to cognition and back. This chapter reviews the utility of computational neuroscience approaches across these levels and how they have advanced our understanding of multiple constructs relevant for mental illness, including working memory, reward-based decision making, model-free and model-based reinforcement learning, exploration versus exploitation, Pavlovian contributions to motivated behavior, inhibitory control, and social interactions. The computational framework formalizes these processes, providing quantitative and falsifiable predictions. It also affords a characterization of mental illnesses not in terms of overall deficit but rather in terms of aberrations in managing fundamental trade-offs inherent within healthy cognitive processing.
Abstract:
Vast spectra of biological and psychological processes are potentially involved in the mechanisms of psychiatric illness. Computational neuroscience brings a diverse toolkit to bear on understanding these processes. This chapter begins by organizing the many ways in which computational neuroscience may provide insight to the mechanisms of psychiatric illness. It then contextualizes the quest for deep mechanistic understanding through the perspective that even partial or nonmechanistic understanding can be applied productively. Finally, it questions the standards by which these approaches should be evaluated. If computational psychiatry hopes to go beyond traditional psychiatry, it cannot be judged solely on the basis of how closely it reproduces the diagnoses and prognoses of traditional psychiatry, but must also be judged against more fundamental measures such as patient outcomes.
Abstract:
Current reinforcement learning models often assume simplified decision processes that do not fully reflect the dynamic complexities of choice processes. Conversely, sequential sampling models of decision making account for both choice-accuracy and response time, but assume that decisions are based on static decision values. To combine these two computational models of decision making and learning, we implemented reinforcement learning models in which the drift diffusion model describes the choice process, thereby capturing both within and across trial dynamics. To exemplify the utility of this approach, we quantitatively fit data from a common reinforcement learning paradigm using hierarchical Bayesian parameter estimation, and compared model variants to determine whether they could capture the effects of stimulant medication in adult patients with attention-deficit hyperactivity disorder (ADHD). The model with best relative fit provided a good description of the learning process, choices, and response times. A parameter recovery experiment showed that the hierarchical Bayesian modeling approach enables accurate estimation of model parameters. The model approach described here, using simultaneous estimation of reinforcement learning and drift diffusion model parameters, shows promise to reveal new insight into cognitive and neural mechanisms of learning and decision making, and alteration of such processes in clinical groups.
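The core linkage in this model family can be sketched in a few lines: learned values set the drift rate of a diffusion process, so a single latent mechanism generates both choices and response times. A toy simulation under assumed parameter values (not the fitted hierarchical model):

    import numpy as np

    def ddm_trial(drift, a=2.0, t0=0.3, dt=1e-3):
        # Euler simulation of a two-boundary diffusion decision.
        x, t = 0.0, 0.0
        while abs(x) < a / 2.0:
            x += drift * dt + np.random.normal(0.0, np.sqrt(dt))
            t += dt
        return (0 if x > 0 else 1), t + t0      # (choice, response time)

    q = np.array([0.5, 0.5])                    # learned option values
    alpha, scale = 0.1, 1.5                     # RL rate, value-to-drift scaling
    for _ in range(100):
        choice, rt = ddm_trial(scale * (q[0] - q[1]))
        reward = float(np.random.rand() < (0.8 if choice == 0 else 0.2))
        q[choice] += alpha * (reward - q[choice])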
Abstract:
Dopaminergic medications commonly used to treat Parkinson's disease can be associated with motor and non-motor behavioural side effects such as dyskinesias and impulse control disorders (ICDs, also known as behavioural addictions). These behaviours, including gambling disorder, binge eating disorder, compulsive sexual behaviour and compulsive shopping, are common, occurring in approximately 17% of patients on dopamine agonists. The behaviours represent an interaction between dopaminergic medications and either an individual susceptibility or the underlying neurobiology of Parkinson's disease. Parkinsonian rodent models show enhanced reinforcing effects of dopaminergic medications and a potential role for individual vulnerability. Neurophysiological preclinical studies provide insight into the influence of chronic dopaminergic medications on phasic and tonic dopamine, mediated via D2 autoreceptor downregulation. Greater stimulus-driven phasic dopamine release is consistent with observations of enhanced novelty preference, greater responsiveness to reward outcomes and enhanced striatal dopamine release to conditioned cues. Tonic stimulation of D2 post-synaptic receptors is also consistent with impaired learning from loss outcomes. Impairments are observed across subtypes of decisional impulsivity (delay discounting, reflection impulsivity and risk taking), together possibly reflecting impaired action–outcome mapping in the context of uncertainty and the relative balance of rewards and losses. Impairments appear to be more specific to the decisional than the motor domains of impulsivity, which may reflect differences in ventral and dorsal striatal engagement. Emerging evidence discriminates amongst subtypes of ICDs, including ventral striatal anatomical specificity, gender, novelty seeking, delay discounting, response conflict and the effect of deep brain stimulation targeting the subthalamic nucleus (STN DBS), factors that might point towards individual vulnerability predisposing towards the expression of different behavioural subtypes and differing neurobiological substrates. Therapeutic interventions including medication adjustment, naltrexone, cognitive behavioural therapy and possibly the capacity of STN DBS to allow medication changes have demonstrated efficacy. Large-scale studies are indicated to identify risk predictors and therapeutic targets leading towards precision medicine.
Abstract:
Often the world is structured such that distinct sensory contexts signify the same abstract rule set. Learning from feedback thus informs us not only about the value of stimulus-action associations but also about which rule set applies. Hierarchical clustering models suggest that learners discover structure in the environment, clustering distinct sensory events into a single latent rule set. Such structure enables a learner to transfer any newly acquired information to other contexts linked to the same rule set, and facilitates re-use of learned knowledge in novel contexts. Here, we show that humans exhibit this transfer, generalization and clustering during learning. Trial-by-trial model-based analysis of EEG signals revealed that subjects’ reward expectations incorporated this hierarchical structure; these structured neural signals were predictive of behavioral transfer and clustering. These results further our understanding of how humans learn and generalize flexibly by building abstract, behaviorally relevant representations of the complex, high-dimensional sensory environment.
Abstract:
Recent research indicates that adults and infants spontaneously create and generalize hierarchical rule sets during incidental learning. Computational models and empirical data suggest that, in adults, this process is supported by circuits linking prefrontal cortex (PFC) with striatum and their modulation by dopamine, but the neural circuits supporting this form of learning in infants are largely unknown. We used near-infrared spectroscopy to record PFC activity in 8-month-old human infants during a simple audiovisual hierarchical rule-learning task. Behavioral results confirmed that infants adopted hierarchical rule sets to learn and generalize spoken object–label mappings across different speaker contexts. Infants had increased activity over right dorsolateral PFC when rule sets switched from one trial to the next, a neural marker related to updating rule sets into working memory in the adult literature. Infants' eye blink rate, a possible physiological correlate of striatal dopamine activity, also increased when rule sets switched from one trial to the next. Moreover, the increase in right dorsolateral PFC activity in conjunction with eye blink rate also predicted infants' generalization ability, providing exploratory evidence for frontostriatal involvement during learning. These findings provide evidence that PFC is involved in rudimentary hierarchical rule learning in 8-month-old infants, an ability that was previously thought to emerge later in life in concert with PFC maturation.
Abstract:
We propose that schizophrenia involves a combination of decreased phasic dopamine responses for relevant stimuli and increased spontaneous phasic dopamine release. Using insights from computational reinforcement-learning models and basic-science studies of the dopamine system, we show that each of these two disturbances contributes to a specific symptom domain and explains a large set of experimental findings associated with that domain. Reduced phasic responses for relevant stimuli help to explain negative symptoms and provide a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with negative symptoms: reduced learning from rewards; blunted activation of the ventral striatum, midbrain, and other limbic regions for rewards and positive prediction errors; blunted activation of the ventral striatum during reward anticipation; blunted autonomic responding for relevant stimuli; blunted neural activation for aversive outcomes and aversive prediction errors; reduced willingness to expend effort for rewards; and psychomotor slowing. Increased spontaneous phasic dopamine release helps to explain positive symptoms and provides a unified explanation for the following experimental findings in schizophrenia, most of which have been shown to correlate with positive symptoms: aberrant learning for neutral cues (assessed with behavioral and autonomic responses), and aberrant, increased activation of the ventral striatum, midbrain, and other limbic regions for neutral cues, neutral outcomes, and neutral prediction errors. Taken together, then, these two disturbances explain many findings in schizophrenia. We review evidence supporting their co-occurrence and consider their differential implications for the treatment of positive and negative symptoms.
Abstract:
Generalizing knowledge from experimental data requires constructing theories capable of explaining observations and extending beyond them. Computational modeling offers formal quantitative methods for generating and testing theories of cognition and neural processing. These techniques can be used to extract general principles from specific experimental measurements, but introduce dangers inherent to theory: model-based analyses are conditioned on a set of fixed assumptions that impact the interpretations of experimental data. When these conditions are not met, model-based results can be misleading or biased. Recent work in computational modeling has highlighted the implications of this problem and developed new methods for minimizing its negative impact. Here we discuss the issues that arise when data are interpreted through models, and strategies for avoiding misinterpretation of data through model fitting.
Abstract:
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.
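At the algorithmic level, the proposed TAN mechanism amounts to an uncertainty-modulated learning rate. One minimal caricature, with all parameters illustrative, tracks recent surprise and scales value updates accordingly:

    import numpy as np

    def adaptive_value_trace(outcomes, base=0.05, gain=0.5, kappa=0.1):
        # The effective learning rate rises with a running estimate of
        # surprise, mimicking the proposed TAN-pause control signal.
        q, surprise = 0.5, 0.0
        for r in outcomes:
            delta = r - q
            surprise += kappa * (abs(delta) - surprise)
            q += (base + gain * surprise) * delta
        return q

    # A stable block followed by a change-point in reward contingencies:
    outcomes = np.r_[np.random.rand(200) < 0.8, np.random.rand(200) < 0.2]
    print(adaptive_value_trace(outcomes.astype(float)))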
Abstract:
Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational psychiatry, broadly defined, encompasses two complementary approaches: data driven and theory driven. Data-driven approaches apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical applications, and highlight the utility of combining them.
Abstract:
The negative symptoms of schizophrenia (SZ) are associated with a pattern of reinforcement learning (RL) deficits likely related to degraded representations of reward values. However, the RL tasks used to date have required active responses to both reward and punishing stimuli. Pavlovian biases have been shown to affect performance on these tasks through invigoration of action to reward and inhibition of action to punishment, and may be partially responsible for the effects found in patients. Forty-five patients with schizophrenia and 30 demographically matched controls completed a four-stimulus reinforcement learning task that crossed action (“Go” or “NoGo”) and the valence of the optimal outcome (reward or punishment-avoidance), such that all combinations of action and outcome valence were tested. Behaviour was modelled using a six-parameter RL model, and EEG was simultaneously recorded. Patients demonstrated a reduction in Pavlovian performance bias that was evident in a reduced Go bias across the full group. In a subset of patients administered clozapine, the reduction in Pavlovian bias was enhanced. The reduction in Pavlovian bias in SZ patients was accompanied by feedback processing differences at the time of the P3a component. The reduced Pavlovian bias in patients is suggested to be due to reduced fidelity in the communication between striatal regions and frontal cortex. It may also partially account for previous findings of poorer "Go-learning" in schizophrenia where "Go" responses or Pavlovian consistent responses are required for optimal performance. An attenuated P3a component dynamic in patients is consistent with a view that deficits in operant learning are due to impairments in adaptively using feedback to update representations of stimulus value.
Abstract:
Study of human executive function focuses on our ability to represent cognitive rules independently of stimulus or response modality. However, recent findings suggest that executive functions cannot be modularized separately from perceptual and motor systems, and that they instead scaffold on top of motor action selection. Here we investigate whether patterns of motor demands influence how participants choose to implement abstract rule structures. In a learning task that requires integrating two stimulus dimensions for determining appropriate responses, subjects typically structure the problem hierarchically, using one dimension to cue the task-set and the other to cue the response given the task-set. However, the choice of which dimension to use at each level can be arbitrary. We hypothesized that the specific structure subjects adopt would be constrained by the motor patterns afforded within each rule. Across four independent data-sets, we show that subjects create rule structures that afford motor clustering, preferring structures in which adjacent motor actions are valid within each task-set. In a fifth data-set using instructed rules, this bias was strong enough to counteract the well-known task switch-cost when instructions were incongruent with motor clustering. Computational simulations confirm that observed biases can be explained by leveraging overlap in cortical motor representations to improve outcome prediction and hence infer the structure to be learned. These results highlight the importance of sensorimotor constraints in abstract rule formation and shed light on why humans have strong biases to invent structure even when it does not exist.
Abstract:
Huntington's disease (HD) is genetically determined but variable in symptom onset, leading to uncertainty as to when pharmacological intervention should be initiated. Here we take a computational approach based on neurocognitive phenotyping, computational modeling, and classification, in an effort to provide quantitative predictors of HD before symptom onset. A large sample of subjects, consisting of pre-manifest individuals carrying the HD mutation (pre-HD), early symptomatic patients, and healthy controls, performed the antisaccade conflict task, which requires executive control and response inhibition. While symptomatic HD subjects differed substantially from controls in behavioral measures [reaction time (RT) and error rates], there were no such clear behavioral differences in pre-HD. RT distributions and error rates were fit with an accumulator-based model which summarizes the computational processes involved and which is related to identified mechanisms in more detailed neural models of prefrontal cortex and basal ganglia. Classification based on fitted model parameters revealed that a key parameter related to executive control differentiated pre-HD from controls, whereas the response inhibition parameter declined only after symptom onset. These findings demonstrate the utility of computational approaches for classification and prediction of brain disorders, and provide clues as to the underlying neural mechanisms.
Abstract:
Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by "model-free" learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by "model-based" learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning.
Abstract:
Two studies invite us to reconsider the nature of striatal dopamine signals. Accumbens dopamine appears to signal the value of overt action and prediction errors arise from deviations in these signals.
Abstract:
Contemporary psychiatry faces major challenges. Its syndrome-based disease classification is not based on mechanisms and does not guide treatment, which largely depends on trial and error. The development of therapies is hindered by ignorance of potential beneficiary patient subgroups. Neuroscientific and genetics research have yet to affect disease definitions or contribute to clinical decision making. In this challenging setting, what should psychiatric research focus on? In two companion papers, we present a list of problems nominated by clinicians and researchers from different disciplines as candidates for future scientific investigation of mental disorders. These problems are loosely grouped into challenges concerning nosology and diagnosis (this Personal View) and problems related to pathogenesis and aetiology (in the companion Personal View). Motivated by successful examples in other disciplines, particularly the list of Hilbert’s problems in mathematics, this subjective and eclectic list of priority problems is intended for psychiatric researchers, helping to re-focus existing research and providing perspectives for future psychiatric science.
Abstract:
We argue that bidirectional interaction between animal and human studies is essential for understanding the human brain. The revolution in meso-scale study of circuits in non-human species provides a historical opportunity. However, to fully realize its potential requires integration with human neuroscience. We describe three strategies for successful interactionist neuroscience.
Abstract:
The cognitive and affective factors implicated in the motivational impairments seen in many people with schizophrenia remain poorly understood. Many research groups have done studies in the past 2 years examining the role of effort-cost computations, driven by the hypothesis that overestimation of the cost of effort involved in volitional behavior might underlie the reduction in goal-directed behavior seen in some people with schizophrenia. The goal of this review is to assess the available evidence and the interpretative ambiguities that remain to be addressed by further studies. There is a clear preponderance of evidence suggesting that people with schizophrenia demonstrate altered effort allocation by failing to make high-effort response choices to maximize reward. The evidence relating altered effort allocation to the severity of negative symptoms is mixed. It remains for future work to determine the precise mechanisms implicated in altered effort allocation, with two prominent possibilities: that patients 1) overestimate the cost of effort or 2) underestimate the value of potential rewards. Other mechanisms that need to be investigated include the potential contributions of other impairments associated with the illness that increase the cost of effort. Furthermore, it is possible that accurate value representations fail to invigorate behavior. Although questions remain, evidence available to date suggests that the study of cost/benefit decision making may shed new light on the motivational impairments seen in many people with schizophrenia.
Abstract:
We focus on exploratory decisions across disorders of compulsivity, a potential dimensional construct for the classification of mental disorders. Behaviours associated with the pathological use of alcohol or food, in alcohol use disorders (AUD) or binge-eating disorder (BED), suggest a disturbance in explore-exploit decision-making, whereby strategic exploratory decisions made in an attempt to improve long-term outcomes may diminish in favour of more repetitive or exploitative choices. We compare exploration versus exploitation across disorders of natural (obesity with and without BED) and drug rewards (AUD). We separately acquired resting state functional MRI data using a novel multi-echo planar imaging sequence and independent components analysis from healthy individuals to assess the neural correlates underlying exploration. Participants with AUD showed reduced exploratory behaviour across gain and loss environments, leading to lower-yielding exploitative choices. Obese subjects with and without BED did not differ from healthy volunteers, but when compared with each other or with AUD subjects, BED subjects showed enhanced exploratory behaviours, particularly in the loss domain. All subject groups had decreased exploration or greater uncertainty avoidance for losses compared to rewards. More exploratory decisions in the context of reward were associated with frontal polar and ventral striatal connectivity. For losses, exploration was associated with frontal polar and precuneus connectivity. We further implicate the relevance and dimensionality of constructs of compulsivity across disorders of both natural and drug rewards.
Abstract:
The ability to extract hierarchically organized rule structures from noisy environments is critical to human cognitive, social, and emotional intelligence. Adults spontaneously create hierarchical rule structures of this sort. In the present research, we conducted two experiments to examine the previously unknown developmental origins of this hallmark skill. In Experiment 1, we exploited a visual paradigm previously shown to elicit incidental hierarchical rule learning in adults. In Experiment 2, we used the same learning structure to examine whether these hierarchical-rule-learning mechanisms are domain general and can help infants learn spoken object-label mappings across different speaker contexts. In both experiments, we found that 8-month-olds created and generalized hierarchical rules during learning. Eyeblink rate, an exploratory indicator of striatal dopamine activity, mirrored behavioral-learning patterns. Our results provide direct evidence that the human brain is predisposed to extract knowledge from noisy environments, and they add a fundamental learning mechanism to what is currently known about the neurocognitive toolbox available to infants.
Abstract:
A large body of research shows that striatal dopamine critically affects the extent to which we learn from the positive and negative outcomes of our decisions. In this study, we examined the relationship between reinforcement learning and spontaneous eye blink rate (sEBR), a cheap, non-invasive, and easy to obtain marker of striatal dopaminergic activity. Based on previous findings from pharmacological and patient studies, our main prediction was that in healthy individuals, low blink rates (and concomitant lower striatal dopamine levels) would be associated with better learning from negative outcomes, while high blink rates (and concomitant higher striatal dopamine levels) would be associated with better learning from positive outcomes. Behavioral analyses showed that in healthy individuals, lower blink rates were indeed associated with greater learning from negative outcomes, indicating that lower dopamine levels per se may enhance avoidance learning. Yet higher sEBR was not associated with better learning from positive outcomes. These observations support the notion that sEBR reflects tonic dopamine levels, and suggest that sEBR may specifically relate to dopamine D2 receptor function, given the importance of the dopaminergic D2 pathway in avoidance learning. More generally, these findings highlight the usefulness of sEBR as a non-invasive and cheap method for assessing the relationship between striatal dopaminergic function and behavior.
Abstract:
What are the neural dynamics of choice processes during reinforcement learning? Two largely separate literatures have examined dynamics of reinforcement learning (RL) as a function of experience but assuming a static choice process, or conversely, the dynamics of choice processes in decision making but based on static decision values. Here we show that human choice processes during RL are well described by a drift diffusion model (DDM) of decision making in which the learned trial-by-trial reward values are sequentially sampled, with a choice made when the value signal crosses a decision threshold. Moreover, simultaneous fMRI and EEG recordings revealed that this decision threshold is not fixed across trials but varies as a function of activity in the subthalamic nucleus (STN) and is further modulated by trial-by-trial measures of decision conflict and activity in the dorsomedial frontal cortex (pre-SMA BOLD and mediofrontal theta in EEG). These findings provide converging multimodal evidence for a model in which decision threshold in reward-based tasks is adjusted as a function of communication from pre-SMA to STN when choices differ subtly in reward values, allowing more time to choose the statistically more rewarding option.
Abstract:
The extent to which we learn from positive and negative outcomes of decisions is modulated by the neurotransmitter dopamine. Dopamine neurons burst fire in response to unexpected rewards and pause following negative outcomes. This dual signaling mechanism is hypothesized to drive both approach and avoidance behavior. Here we test a prediction deriving from a computational reinforcement learning model, in which approach is mediated via activation of the direct corticostriatal pathway due to striatal D1 receptor stimulation, while avoidance occurs via disinhibition of indirect pathway striatal neurons secondary to a reduction of D2 receptor stimulation. Using positron emission tomography with two separate radioligands, we demonstrate that individual differences in human approach and avoidance learning are predicted by variability in striatal D1 and D2 receptor binding, respectively. Moreover, transient dopamine precursor depletion improved learning from negative outcomes. These findings support a bidirectional modulatory role for striatal dopamine in reward and avoidance learning via segregated D1 and D2 corticostriatal pathways.
Abstract:
Computational approaches to cognitive neuroscience encompass multiple levels of analysis, from detailed biophysical models of neural activity to abstract algorithmic or normative models of cognition, with several levels in between. Despite often strong opinions on the ‘right’ level of modeling, there is no single panacea: attempts to link biological with higher level cognitive processes require a multitude of approaches. Here I argue that these disparate approaches should not be viewed as competitive, nor should each be accessible only to researchers already endorsing that particular level of modeling. Rather, insights gained from one level of modeling should inform modeling endeavors at the level above and below it. One way to achieve this synergism is to link levels of modeling by quantitatively fitting the behavioral outputs of detailed mechanistic models with higher level descriptions. If the fits are reasonable (e.g., similar to those achieved when applying high level models to human behavior), one can then derive plausible links between mechanism and computation. Model-based cognitive neuroscience approaches can then be employed to manipulate or measure neural function motivated by the candidate mechanisms, and to test whether these are related to high level model parameters. I describe several examples of this approach in the domains of reward-based learning, cognitive control, and decision making and show how neural and algorithmic models have each informed or refined the other.
Abstract:
Psychiatric research is in crisis. We highlight efforts to overcome current challenges, focusing on the emerging field of computational psychiatry, which might enable us to move from a symptom-based description of mental illness to descriptors based on objective computational multidimensional functional variables. We survey recent efforts towards this goal, and describe a set of methods that together form a toolbox to aid this research program. We identify four levels in computational psychiatry: (i) behavioral tasks indexing various psychological processes; (ii) computational models that identify the generative psychological processes; (iii) parameter estimation methods concerned with quantitatively fitting these models to subject behavior, focusing on hierarchical Bayesian estimation as a rich framework with many desirable properties; and (iv) machine learning clustering methods which identify clinically significant conditions and sub-groups of individuals. As a proof of principle we apply these methods to two different data sets. Finally, we highlight challenges for future research.
Abstract:
The field of cognitive science studies latent, unobservable cognitive processes that generate observable behaviors. Similarly, cognitive neuroscience attempts to link latent cognitive processes with the neural mechanisms that generate them. Although neural processes are partially observable (with imaging and electrophysiology), it would be a mistake to 'skip' the cognitive level and pursue a purely neuroscientific approach to studying behavior. In fact, virtually all of the major advances in understanding the neural basis of behavior over the last century have relied fundamentally on principles of cognition for guiding the appropriate measurements, manipulations, tasks, and interpretations. We provide several examples from the domains of episodic memory, working memory and cognitive control, and decision making in which cognitive theorizing and prior experimentation has been essential in guiding neuroscientific investigations and discoveries.
Abstract:
Conflict has been proposed to act as a cost in action selection, implying a general function of mediofrontal cortex in the adaptation to aversive events. Here we investigate whether response conflict acts as a cost during reinforcement learning by modulating experienced reward values in cortical and striatal systems. Electroencephalography recordings show that conflict diminishes the relationship between reward-related frontal theta power and cue preference, yet enhances the relationship between punishment and cue avoidance. Individual differences in the cost of conflict on reward versus punishment sensitivity are also related to a genetic polymorphism associated with striatal D1 versus D2 pathway balance (DARPP-32). We manipulate these patterns with the D2 agent cabergoline, which induces a strong bias to amplify the aversive value of punishment outcomes following conflict. Collectively, these findings demonstrate that interactive corticostriatal systems implicitly modulate experienced reward and punishment values as a function of conflict.
Abstract:
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia.
Abstract:
Humans exhibit a preference for options they have freely chosen over equally valued options they have not; however, the neural mechanism that drives this bias and its functional significance have yet to be identified. Here, we propose a model in which choice biases arise due to amplified positive reward prediction errors associated with free choice. Using a novel variant of a probabilistic learning task, we show that choice biases are selective to options that are predominantly associated with positive outcomes. A polymorphism in DARPP-32, a gene linked to dopaminergic striatal plasticity and individual differences in reinforcement learning, was found to predict the effect of choice as a function of value. We propose that these choice biases are the behavioral byproduct of a credit assignment mechanism responsible for ensuring the effective delivery of dopaminergic reinforcement learning signals broadcast to the striatum.
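The proposed credit-assignment mechanism reduces to a simple asymmetry in the value update, sketched below with hypothetical parameter names and values (Q is an array of option values):

    def update_value(Q, action, reward, freely_chosen, alpha=0.1, amp=0.5):
        # Positive prediction errors receive extra credit when the option
        # was freely (vs. forcibly) chosen; negative errors are unaffected.
        delta = reward - Q[action]
        if freely_chosen and delta > 0:
            delta *= 1.0 + amp
        Q[action] += alpha * delta
        return Q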
Abstract:
The striatal dopaminergic system has been implicated in reinforcement learning (RL), motor performance, and incentive motivation. Various computational models have been proposed to account for each of these effects individually, but a formal analysis of their interactions is lacking. Here we present a novel algorithmic model expanding the classical actor-critic architecture to include fundamental interactive properties of neural circuit models, incorporating both incentive and learning effects into a single theoretical framework. The standard actor is replaced by a dual opponent actor system representing distinct striatal populations, which come to differentially specialize in discriminating positive and negative action values. Dopamine modulates the degree to which each actor component contributes to both learning and choice discriminations. In contrast to standard frameworks, this model simultaneously captures documented effects of dopamine on both learning and choice incentive – and their interactions – across a variety of studies, including probabilistic RL, effort-based choice, and motor skill learning.
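The dual-actor architecture can be sketched compactly: a critic computes prediction errors, two opponent actors accumulate weight-dependent updates of opposite sign, and a dopamine parameter weights their influence at choice. This is an illustrative reduction under assumed parameter values, not the full published model:

    import numpy as np

    alpha_c = alpha_g = alpha_n = 0.1          # illustrative learning rates
    rho = 0.3                                  # dopamine level at choice time
    V = 0.0                                    # critic (state-independent here)
    G, N = np.ones(2), np.ones(2)              # opponent "Go"/"NoGo" actors

    def act():
        # Dopamine asymmetrically weights the two actors, capturing
        # incentive effects on choice separately from learning effects.
        w = (1.0 + rho) * G - (1.0 - rho) * N
        p = np.exp(w - w.max()); p /= p.sum()
        return np.random.choice(2, p=p)

    def learn(action, reward):
        global V
        delta = reward - V
        V += alpha_c * delta
        G[action] += alpha_g * G[action] * delta    # weight-dependent updates let
        N[action] += alpha_n * N[action] * -delta   # actors specialize in +/- values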
Abstract:
Objective. Impairments in learning are central to autism spectrum disorders. The authors investigated the cognitive and neural basis of these deficits in young adults with autism spectrum disorders using a well-characterized probabilistic reinforcement learning paradigm. Method. The probabilistic selection task was implemented among matched participants with autism spectrum disorders (N=22) and with typical development (N=25), aged 18–40 years, using rapid event-related functional MRI. Participants were trained to choose the correct stimulus in high-probability (AB), medium-probability (CD), and low-probability (EF) pairs, presented with valid feedback 80%, 70%, and 60% of the time, respectively. Whole-brain voxel-wise and parametric modulator analyses examined early and late learning during the stimulus and feedback epochs of the task. Results. The groups exhibited comparable performance on medium- and low-probability pairs. Typically developing persons showed higher accuracy on the high-probability pair, better win-stay performance (selection of the previously rewarded stimulus on the next trial of that type), and more robust recruitment of the anterior and medial prefrontal cortex during the stimulus epoch, suggesting development of an intact reward-based working memory for recent stimulus values. Throughout the feedback epoch, individuals with autism spectrum disorders exhibited greater recruitment of the anterior cingulate and orbito-frontal cortices compared with individuals with typical development, indicating continuing trial-by-trial activity related to feedback processing. Conclusions. Individuals with autism spectrum disorders exhibit learning deficits reflecting impaired ability to develop an effective reward-based working memory to guide stimulus selection. Instead, they continue to rely on trial-by-trial feedback processing to support learning dependent upon engagement of the anterior cingulate and orbito-frontal cortices.
Abstract:
Whether to continue to exploit a source of reward, or to search for a new one of potentially greater value, is a fundamental and underconstrained decision. Recent computational studies of this exploration-exploitation tradeoff have found that variability in exploration across individuals is influenced by a functional polymorphism (Val158Met) in the catechol-O-methyltransferase (COMT) gene, whose protein product degrades synaptically-released dopamine. However, these and other genotype-phenotype associations have rarely been causally tested. To directly test this association and to evaluate additional behavioral characteristics, including perceived locus of control, here we used the COMT inhibitor tolcapone in a randomized, double-blind, counterbalanced, within-subject study of 66 subjects genotyped for the Val158Met allele to assess the hypothesis that reducing COMT enzymatic activity interacts with genotype to increase uncertainty-driven exploration. In keeping with our initial hypothesis, tolcapone led to an increase in exploratory, but not exploitative, behavior in Met/Met rather than Val/Val subjects. Independent of genotype, those subjects with a more external locus of control also showed increases in uncertainty-driven exploration on tolcapone relative to placebo. However, we did not replicate our previous finding that Met/Met subjects show greater exploration at baseline. Together these findings support a model in which exploration is hypothesized to have a dopaminergic basis. Moreover, in keeping with findings in other behavioral and cognitive domains, the response to an increase in presumptively frontal dopamine is dependent upon baseline dopamine tone.
Abstract:
Recent advancements in cognitive neuroscience have afforded a description of neural responses in terms of latent algorithmic operations. However, the adoption of this approach to human scalp electroencephalography (EEG) has been more limited, despite the ability of this methodology to quantify canonical neuronal processes. Here, we provide evidence that theta band activities over the midfrontal cortex appear to reflect a common computation used for realizing the need for cognitive control. Moreover, by virtue of inherent properties of field oscillations, these theta band processes may be used to communicate this need and subsequently implement such control across disparate brain regions. Thus, frontal theta is a compelling candidate mechanism by which emergent processes, such as 'cognitive control', may be biophysically realized.
Abstract:
Human cognition is flexible and adaptive, affording the ability to detect and leverage complex structure inherent in the environment and generalize this structure to novel situations. Behavioral studies show that humans impute structure into simple learning problems, even when this tendency affords no behavioral advantage. Here we used electroencephalography to investigate the neural dynamics indicative of such incidental latent structure. Event-related potentials over lateral prefrontal cortex, typically observed for instructed task rules, were stratified according to individual participants’ constructed rule sets. Moreover, this individualized latent rule structure could be independently decoded from multielectrode pattern classification. Both neural markers were predictive of participants’ ability to subsequently generalize rule structure to new contexts. These EEG dynamics reveal that the human brain spontaneously constructs hierarchically structured representations during learning of simple task rules.
Abstract:
The prefrontal cortex is proposed to implement cognitive control via directed top-down influence over behavior. But how is this feat achieved? The virtue of such a descriptive model is contingent on a mechanistic understanding of how motor execution is altered in specific circumstances. In this report, we provide evidence that the well-known phenomenon of slowed RTs following mistakes (post-error slowing) is directly influenced by the degree of subthalamic nucleus (STN) activity. The STN is proposed to act as a brake on motor execution following conflict or errors, buying more time so a more cautious response can be made on the next trial. STN local field potentials from nine Parkinson's patients undergoing deep brain stimulation surgery were recorded while they performed a response conflict task. In a 2.5- to 5-Hz frequency range previously associated with conflict and error processing, the degree of phase consistency preceding the response was associated with increasingly slower RTs specifically following errors. These findings provide compelling evidence that post-error slowing is in part mediated by a corticosubthalamic "hyperdirect" pathway for increased response caution.
Abstract:
Convergent evidence suggests that corticostriatal interactions act as a gate to select the input to working memory (WM). However, not all information in WM is relevant for behavior simultaneously. For this reason, a second "output gate" might advantageously govern which contents of WM influence behavior. Here, we test whether frontostriatal circuits previously implicated in input gating also support output gating during selection from WM. fMRI of a hierarchical rule task with dissociable input and output gating demands demonstrated greater lateral prefrontal cortex (PFC) recruitment and frontostriatal connectivity during output gating. Moreover, PFC and striatum correlated with distinct behavioral profiles. Whereas PFC recruitment correlated with mean efficiency of selection from WM, striatal recruitment and frontostriatal interactions correlated with its reliability, as though such dynamics stochastically gate WM’s output. These results support the output gating hypothesis, suggesting that contextual representations in PFC influence striatum to select which information in WM drives responding.
Abstract:
Can you predict what people are going to do just by watching them? This is certainly difficult: it would require a clear mapping between observable indicators and unobservable cognitive states. In this report, we demonstrate how this is possible by monitoring eye gaze and pupil dilation, which predict dissociable biases during decision making. We quantified decision making using the drift diffusion model (DDM), which provides an algorithmic account of how evidence accumulation and response caution contribute to decisions through separate latent parameters of drift rate and decision threshold, respectively. We used a hierarchical Bayesian estimation approach to assess the single trial influence of observable physiological signals on these latent DDM parameters. Increased eye gaze dwell time specifically predicted an increased drift rate toward the fixated option, irrespective of the value of the option. In contrast, greater pupil dilation specifically predicted an increase in decision threshold during difficult decisions. These findings suggest that eye tracking and pupillometry reflect the operations of dissociated latent decision processes.
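Analyses of this kind are typically implemented as hierarchical Bayesian regressions of trial-by-trial covariates onto latent DDM parameters. A sketch using the HDDM toolbox's regression interface (the data file and column names 'gaze_dwell' and 'pupil' are hypothetical placeholders):

    import hddm

    # Expected columns: 'rt' (sec), 'response' (0/1), 'subj_idx', plus
    # per-trial covariates, typically z-scored before fitting.
    data = hddm.load_csv('eyetracking_trials.csv')
    model = hddm.HDDMRegressor(data, ['v ~ gaze_dwell',   # dwell -> drift rate
                                      'a ~ pupil'])       # pupil -> threshold
    model.sample(2000, burn=200)
    model.print_stats()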
Abstract:
Patients with schizophrenia (SZ) show cognitive impairments on a wide range of tasks, with clear deficiencies in tasks reliant on prefrontal cortex function and less consistently observed impairments in tasks recruiting the striatum. This study leverages tasks hypothesized to differentially recruit these neural structures to assess relative deficiencies of each. Forty-eight patients and 38 controls completed two reinforcement learning tasks hypothesized to interrogate prefrontal and striatal functions and their interaction. In each task, participants learned reward discriminations by trial and error and were tested on novel stimulus combinations to assess learned values. In the task putatively assessing fronto-striatal interaction, participants were (inaccurately) instructed that one of the stimuli was valuable. Consistent with prior reports and a model of confirmation bias, this manipulation resulted in overvaluation of the instructed stimulus after its true value had been experienced. Patients showed less susceptibility to this confirmation bias effect than did controls. In the choice bias task hypothesized to more purely assess striatal function, biases in endogenously and exogenously chosen actions were assessed. No group differences were observed. In the subset of participants who showed learning in both tasks, larger group differences were observed in the confirmation bias task than in the choice bias task. In the confirmation bias task, patients also showed impairment in the task conditions with no prior instruction. This deficit was most readily observed on the most deterministic discriminations. Taken together, these results suggest impairments in fronto-striatal interaction in SZ, rather than in striatal function per se.
Abstract:
The diffusion model is a commonly used tool to infer latent psychological processes underlying decision making, and to link them to neural mechanisms based on reaction times. Although efficient open source software has been made available to quantitatively fit the model to data, current estimation methods require an abundance of reaction time measurements to recover meaningful parameters, and only provide point estimates of each parameter. In contrast, hierarchical Bayesian parameter estimation methods are useful for enhancing statistical power, allowing for simultaneous estimation of individual subject parameters and the group distribution that they are drawn from, while also providing measures of uncertainty in these parameters in the posterior distribution. Here, we present a novel Python-based toolbox called HDDM (hierarchical drift diffusion model), which allows fast and flexible estimation of the drift-diffusion model and the related linear ballistic accumulator model. HDDM requires fewer data per subject/condition than non-hierarchical methods, allows for full Bayesian data analysis, and can handle outliers in the data. Finally, HDDM supports the estimation of how trial-by-trial measurements (e.g. fMRI) influence decision-making parameters. This paper first describes the theoretical background of the drift-diffusion model and Bayesian inference. We then illustrate usage of the toolbox on a real-world data set from our lab. Finally, parameter recovery studies show that HDDM outperforms alternative fitting methods such as the quantile chi-square method and maximum likelihood estimation. The software and documentation can be downloaded at: http://ski.clps.brown.edu/hddm_docs
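Basic usage of the toolbox follows the pattern below (the data file and the 'stim' condition column are placeholders; see the linked documentation for the full interface):

    import hddm

    # Expected columns: 'rt' in seconds, 'response' (0/1), 'subj_idx',
    # and any condition columns ('stim' here is a placeholder).
    data = hddm.load_csv('mydata.csv')
    model = hddm.HDDM(data, depends_on={'v': 'stim'})   # drift varies by condition
    model.sample(2000, burn=200)                        # MCMC over the hierarchy
    model.print_stats()                                 # posterior summaries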
Abstract:
In this report we describe how common brain networks within the medial frontal cortex (MFC) facilitate adaptive behavioral control in rodents and humans. We demonstrate that after errors, low-frequency oscillations below 12 Hz are modulated over the midfrontal cortex in humans and within the prelimbic and anterior cingulate regions of the MFC in rats. These oscillations were phase locked between the MFC and motor areas in both rats and humans. In rats, single neurons that encoded prior behavioral outcomes were phase coherent with low-frequency field oscillations, particularly after errors. Inactivating the medial frontal regions in rats led to impaired behavioral adjustments after errors, eliminated the differential expression of low-frequency oscillations after errors and increased low-frequency spike-field coupling within the motor cortex. Our results describe a new mechanism for behavioral adaptation through low-frequency oscillations and elucidate how medial frontal networks synchronize brain activity to guide performance.
Abstract:
Pavlovian biases influence learning and decision making by intricately coupling reward seeking with action invigoration and punishment avoidance with action suppression. This bias is not always adaptive; it can oftentimes interfere with instrumental requirements. The prefrontal cortex is thought to help resolve such conflict between motivational systems, but the nature of this control process remains unknown. EEG recordings of mid-frontal theta band power are sensitive to conflict and predictive of adaptive control over behavior, but it is not clear whether this signal would reflect control over conflict between motivational systems. Here we utilized a task that orthogonalized action requirements and outcome valence while recording concurrent EEG in human participants. By applying a computational model of task performance, we derived parameters reflective of the latent influence of Pavlovian bias and how it was modulated by midfrontal theta power during motivational conflict. Between subjects, individuals who performed better under Pavlovian conflict exhibited higher mid-frontal theta power. Within subjects, trial-to-trial variance in theta power was predictive of ability to overcome the influence of the Pavlovian bias, and this effect was most pronounced in individuals with higher mid-frontal theta to conflict. These findings demonstrate that mid-frontal theta is not only a sensitive index of prefrontal control, but it can also reflect the application of top-down control over instrumental processes.
Abstract:
Learning and executive functions such as task-switching share common neural substrates, notably prefrontal cortex and basal ganglia. Understanding how they interact requires studying how cognitive control facilitates learning but also how learning provides the (potentially hidden) structure, such as abstract rules or task-sets, needed for cognitive control. We investigate this question from three complementary angles. First, we develop a new context-task-set (C-TS) model, inspired by nonparametric Bayesian methods, specifying how the learner might infer hidden structure (hierarchical rules) and decide to reuse or create new structure in novel situations. Second, we develop a neurobiologically explicit network model to assess mechanisms of such structured learning in hierarchical frontal cortex and basal ganglia circuits. We systematically explore the link between these modeling levels across task demands. We find that the network provides an approximate implementation of high-level C-TS computations, with specific neural mechanisms modulating distinct C-TS parameters. Third, this synergism yields predictions about the nature of human optimal and suboptimal choices and response times during learning and task-switching. In particular, the models suggest that participants spontaneously build task-set structure into a learning problem when not cued to do so, which predicts positive and negative transfer in subsequent generalization tests. We provide experimental evidence for these predictions and show that C-TS provides a good quantitative fit to human sequences of choices. These findings implicate a strong tendency to interactively engage cognitive control and learning, resulting in structured abstract representations that afford generalization opportunities and, thus, potentially long-term rather than short-term optimality.
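The nonparametric prior at the heart of the C-TS model is a Dirichlet process, under which a new context either reuses an existing task-set in proportion to its popularity or spawns a new one. A minimal sketch of that prior (the full model combines it with the likelihood of observed outcomes under each candidate task-set):

    import numpy as np

    def crp_prior(task_set_counts, alpha=1.0):
        # Reuse an existing task-set in proportion to how many contexts
        # already use it, or create a new one with weight alpha.
        counts = np.asarray(task_set_counts, dtype=float)
        p = np.append(counts, alpha)        # last entry = brand-new task-set
        return p / p.sum()

    print(crp_prior([5, 2]))                # -> [0.625, 0.25, 0.125]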
Abstract:
Planning and executing volitional actions in the face of conflicting habitual responses is a critical aspect of human behavior. At the core of the interplay between these two control systems lies an override mechanism that can suppress the habitual action selection process and allow executive control to take over. Here, we construct a neural circuit model informed by behavioral and electrophysiological data collected on various response inhibition paradigms. This model extends a well-established model of action selection in the basal ganglia (BG) by including a frontal executive control network which integrates information about sensory input and task rules to facilitate well-informed decision making via the oculomotor system. Our simulations of the antisaccade, Simon, and saccade-override tasks produce conflict between a prepotent and a controlled response, which causes the network to pause action selection via projections to the subthalamic nucleus. Our model reproduces key behavioral and electrophysiological patterns and their sensitivity to lesions and pharmacological manipulations. Finally, we show how this network can be extended to include the inferior frontal cortex to simulate key qualitative patterns of global response inhibition demands as required in the stop-signal task.
Abstract:
Dopamine contributes to corticostriatal plasticity and motor learning. Dopamine denervation profoundly alters motor performance, as in Parkinson’s disease (PD); however, the extent to which these symptoms reflect impaired motor learning is unknown. Here, we demonstrate a D2 receptor blockade-induced aberrant learning that impedes future motor performance when dopamine signaling is restored, an effect diminished by coadministration of adenosine antagonists during blockade. We hypothesize that an inappropriate corticostriatal potentiation in striatopallidal cells of the indirect pathway underlies aberrant learning. We demonstrate synaptic potentiation in striatopallidal neurons induced by D2 blockade and diminished by application of an adenosine antagonist, consistent with behavioral observations. A neurocomputational model of the basal ganglia recapitulates the behavioral pattern and further links aberrant learning to plasticity in the indirect pathway. Thus, D2-mediated aberrant learning may contribute to motor deficits in PD, suggesting new avenues for the development of therapeutics.
Abstract:
Background: Decision-making studies show that response selection is influenced by the "effort cost" associated with response alternatives. These effort-cost calculations seem to be mediated by a distributed neural circuit including the anterior cingulate cortex and subcortical targets of dopamine neurons. On the basis of evidence of dysfunction in these systems in schizophrenia (SZ), we examined whether effort-cost computations were impaired in SZ patients and whether these deficits were associated with negative symptoms. Methods: Effort-cost decision-making performance was evaluated in 44 patients with SZ and 36 demographically matched control subjects. Subjects performed a computerized task where they were presented with a series of 30 trials in which they could choose between making 20 button presses for $1 or 100 button presses for higher amounts (varying from $3 to $7 across trials). Probability of reward receipt was also manipulated to determine whether certain (100%) or uncertain (50%) reward affected effort-based decision-making. Results: Patients were less likely than control subjects to select the high-effort response alternative during the 100% probability condition, particularly when the value payoff was highest (i.e., $6 and $7). Patients were also less likely to select the high-effort option on trials after reward in the 50% probability condition. Furthermore, these impairments in effort-cost computations were greatest among patients with elevated negative symptoms. There was no association with haloperidol equivalent dosage. Conclusions: The motivational impairments of SZ might be associated with abnormalities in estimating the "cost" of effortful behavior. This increased effort cost might undermine volition.
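To see why this pattern suggests an inflated effort cost rather than insensitivity to reward, consider a simple linear effort-discounting rule (a common modeling convention, not necessarily the analysis used in this study; k is a hypothetical per-press cost):

    def subjective_value(reward, presses, k, p_reward=1.0):
        # Expected reward discounted by a linear effort cost.
        return p_reward * reward - k * presses

    # With k = 0.04, the high-effort offer dominates at high payoffs,
    # so declining it implies an abnormally large k:
    print(subjective_value(7, 100, k=0.04))  # 3.0
    print(subjective_value(1, 20, k=0.04))   # 0.2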
Abstract:
Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects’ behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal (COMT) and basal ganglia (GPR6) functions. Whereas COMT selectively influenced model estimates of WM capacity, GPR6 selectively influenced RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple reinforcement learning models.
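A minimal sketch of the kind of mixture policy described above (Python; the weighting scheme is a simplified assumption in the spirit of the model, not its exact formulation):

    import numpy as np

    def softmax(x, beta=1.0):
        x = np.asarray(x, dtype=float)
        e = np.exp(beta * (x - np.max(x)))
        return e / e.sum()

    def rlwm_policy(q_rl, q_wm, set_size, capacity, w0):
        # Reliance on the fast WM policy shrinks once the set size
        # exceeds capacity; the remainder of choice is carried by the
        # slow, incremental RL values.
        w = w0 * min(1.0, capacity / set_size)
        return w * softmax(q_wm) + (1 - w) * softmax(q_rl)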
Abstract:
In order to understand the exploitation/exploration trade-off in reinforcement learning, previous theoretical and empirical accounts have suggested that increased uncertainty may precede the decision to explore an alternative option. To date, the neural mechanisms that support the strategic application of uncertainty-driven exploration remain underspecified. In this study, electroencephalography (EEG) was used to assess trial-to-trial dynamics relevant to exploration and exploitation. Theta-band activities over middle and lateral frontal areas have previously been implicated in EEG studies of reinforcement learning and strategic control. It was hypothesized that these areas may interact during top-down strategic behavioral control involved in exploratory choices. Here, we used a dynamic reward-learning task and an associated mathematical model that predicted individual response times. This reinforcement-learning model generated value-based prediction errors and trial-by-trial estimates of exploration as a function of uncertainty. Mid-frontal theta power correlated with unsigned prediction error, although negative prediction errors had greater power overall. Trial-to-trial variations in response-locked frontal theta were linearly related to relative uncertainty and were larger in individuals who used uncertainty to guide exploration. This finding suggests that theta-band activities reflect prefrontal-directed strategic control during exploratory choices.
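A minimal sketch of an uncertainty-driven exploration term of the kind the model estimates (Python; Gaussian beliefs over the outcomes of fast versus slow responses are an assumption of this illustration):

    import numpy as np

    def explore_rt_shift(var_fast, var_slow, eps):
        # Push the next response time toward the more uncertain option,
        # in proportion to the difference in belief standard deviations;
        # eps indexes how strongly an individual explores.
        return eps * (np.sqrt(var_slow) - np.sqrt(var_fast))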
Abstract:
How do individuals decide to act based on a rewarding status quo versus an unexplored choice that might yield a better outcome? Recent evidence suggests individuals may strategically explore as a function of the relative uncertainty about the expected value of options. However, the neural mechanisms supporting uncertainty-driven exploration remain underspecified. In the present fMRI study, participants were scanned while performing a reinforcement learning task in which they stopped a rotating clock hand in order to win points. Reward schedules were such that expected value could increase, decrease, or remain constant with respect to time. We fit several mathematical models to subject behavior to generate trial-by-trial estimates of exploration as a function of relative uncertainty. These estimates were used to analyze our fMRI data. Results indicate that rostrolateral prefrontal cortex tracks trial-by-trial changes in relative uncertainty, and this pattern distinguished individuals who rely on relative uncertainty for their exploratory decisions versus those who do not.
Abstract:
Context: Negative symptoms are a core feature of schizophrenia, but their pathogenesis remains unclear. Negative symptoms are defined by the absence of normal function. However, there must be a productive mechanism that leads to this absence. Objective: To test a reinforcement learning account suggesting that negative symptoms result from a failure in the representation of the expected value of rewards coupled with preserved loss-avoidance learning. Design: Participants performed a probabilistic reinforcement learning paradigm involving stimulus pairs in which choices resulted in reward or in loss avoidance. Following training, participants indicated their valuation of the stimuli in a transfer test phase. Computational modeling was used to distinguish between alternative accounts of the data. Setting: A tertiary care research outpatient clinic. Patients: In total, 47 clinically stable patients with a diagnosis of schizophrenia or schizoaffective disorder and 28 healthy volunteers participated in the study. Patients were divided into a high-negative symptom group and a low-negative symptom group. Main Outcome Measures: The number of choices leading to reward or loss avoidance, as well as performance in the transfer test phase. Quantitative fits from 3 different models were examined. Results: Patients in the high-negative symptom group demonstrated impaired learning from rewards but intact loss-avoidance learning and failed to distinguish rewarding stimuli from loss-avoiding stimuli in the transfer test phase. Model fits revealed that patients in the high-negative symptom group were better characterized by an 'actor-critic' model, learning stimulus-response associations, whereas control subjects and patients in the low-negative symptom group incorporated expected value of their actions ('Q learning') into the selection process. Conclusions: Negative symptoms in schizophrenia are associated with a specific reinforcement learning abnormality: patients with high-negative symptoms do not represent the expected value of rewards when making decisions but learn to avoid punishments through the use of prediction errors. This computational framework offers the potential to understand negative symptoms at a mechanistic level.
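The contrast between the two accounts can be made concrete with their update rules (a minimal sketch; learning-rate names are illustrative):

    def q_update(q, r, alpha):
        # Q-learning tracks the expected value of the chosen action, so
        # a rewarded stimulus (r = 1) ends up valued above one that
        # merely avoids loss (r = 0).
        return q + alpha * (r - q)

    def actor_critic_update(v, w, r, alpha_c, alpha_a):
        # The critic's prediction error trains the actor's propensity w,
        # which tracks "better than expected" rather than expected value:
        # frequent loss avoidance yields positive errors just as frequent
        # reward does, so the two become indistinguishable at transfer.
        delta = r - v
        return v + alpha_c * delta, w + alpha_a * delta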
Abstract:
Goal-oriented signals from the prefrontal cortex gate the selection of appropriate actions in the basal ganglia. Key nodes within this fronto-basal ganglia action regulation network are increasingly engaged when one anticipates the need to inhibit and override planned actions. Here, we ask how the advance preparation of action plans modulates the need for fronto-subcortical control when a planned action needs to be withdrawn. Functional magnetic resonance imaging data were collected while human participants performed a stop task with cues indicating the likelihood of a stop signal being sounded. Mathematical modeling of go trial responses suggested that participants attained a more cautious response strategy when the probability of a stop signal increased. Effective connectivity analysis indicated that, even in the absence of stop signals, the proactive engagement of the full control network is tailored to the likelihood of stop trial occurrence. Importantly, during actual stop trials, the strength of fronto-subcortical projections was stronger when stopping had to be engaged reactively compared with when it was proactively prepared in advance. These findings suggest that fronto-basal ganglia control is strongest in an unpredictable environment, where the prefrontal cortex plays an important role in the optimization of reactive control. Importantly, these results further indicate that the advance preparation of action plans reduces the need for reactive fronto-basal ganglia communication to gate voluntary actions.
Abstract:
It takes effort and time to tame one's impulses. Although medial prefrontal cortex (mPFC) is broadly implicated in effortful control over behavior, the subthalamic nucleus (STN) is specifically thought to contribute by acting as a brake on cortico-striatal function during decision conflict, buying time until the right decision can be made. Using the drift diffusion model of decision making, we found that trial-to-trial increases in mPFC activity (EEG theta power, 4-8 Hz) were related to an increased threshold for evidence accumulation (decision threshold) as a function of conflict. Deep brain stimulation of the STN in individuals with Parkinson's disease reversed this relationship, resulting in impulsive choice. In addition, intracranial recordings of the STN area revealed increased activity (2.5-5 Hz) during these same high-conflict decisions. Activity in these slow frequency bands may reflect a neural substrate for cortico-basal ganglia communication regulating decision processes.
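The theta-threshold relationship can be summarized as a trial-wise linear model (a sketch of the form of the fitted relationship, with illustrative coefficient names):

    def decision_threshold(a0, b_theta, theta, conflict):
        # Threshold rises with mPFC theta on high-conflict trials
        # (b_theta > 0 in unstimulated participants); under STN DBS the
        # fitted relationship reversed (b_theta < 0), yielding impulsive,
        # low-threshold choices.
        return a0 + b_theta * theta * conflict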
Abstract:
In this letter, we examine the computational mechanisms of reinforcement-based decision making. We bridge the gap across multiple levels of analysis, from a neural model of corticostriatal circuits, the basal ganglia (BG) model (Frank, 2005, 2006), to simpler but mathematically tractable diffusion models of two-choice decision making. Specifically, we generated simulated data from the BG model and fit the diffusion model (Ratcliff, 1978) to it. The standard diffusion model fits underestimated response times under conditions of high response/reinforcement conflict. Follow-up fits showed that the data were well captured either by raising decision thresholds as a function of conflict or by allowing the threshold to collapse over time. This profile captures the role and dynamics of the subthalamic nucleus in BG circuitry, and as such, parametric modulations of projection strengths from this nucleus were associated with parametric increases in the decision boundary and its modulation by conflict. We then present data from a human reinforcement learning experiment involving decisions with low and high reinforcement conflict. Again, the standard model failed to fit the data, but the same two variants that fit the BG model data also fit the experimental data, providing a convergence of theoretical accounts of complex interactive decision-making mechanisms consistent with available data. This work also demonstrates how to make modest modifications to diffusion models to summarize core computations of the BG model. The result is a better fit and understanding of reinforcement-based choice data than would have been achieved with either model alone.
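A minimal simulation of the collapsing-bound variant (Python; the parameterization is illustrative, not the fitted values):

    import numpy as np

    def ddm_trial(drift, a0, collapse, dt=0.001, rng=None):
        # Evidence diffuses between symmetric bounds at +/- bound(t);
        # the bound narrows linearly over time, mimicking a decaying
        # subthalamic brake on action selection.
        rng = rng or np.random.default_rng()
        x, t = 0.0, 0.0
        while True:
            bound = max(a0 / 2.0 - collapse * t, 0.01)
            if abs(x) >= bound:
                return t, int(x > 0)  # (decision time, choice)
            x += drift * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt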
Abstract:
Growing evidence suggests that the prefrontal cortex (PFC) is organized hierarchically, with more anterior regions having increasingly abstract representations. How does this organization support hierarchical cognitive control and the rapid discovery of abstract action rules? We present computational models at different levels of description. A neural circuit model simulates interacting corticostriatal circuits organized hierarchically. In each circuit the basal ganglia (BG) gate frontal actions, with some striatal units gating the inputs to PFC, and others gating the outputs to influence response selection. Learning at all of these levels is accomplished via dopaminergic reward prediction error signals in each corticostriatal circuit. This functionality allows the system to exhibit conditional if-then hypothesis testing and to learn rapidly in environments with hierarchical structure. We also develop a hybrid Bayesian-RL mixture of experts (MoE) model, which can estimate the most likely hypothesis state of individual participants based on their observed sequence of choices and rewards. We validated this model by manipulating attentional states in the generative neural model and recovering them with the MoE model, which yields accurate probabilistic estimates of which hypotheses are attended. This two-pronged modeling approach leads to multiple quantitative predictions that are tested with fMRI in the companion paper.
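The hypothesis-tracking step of the MoE model reduces to a Bayesian responsibility update (a minimal sketch; the inputs are the likelihoods each hypothesis 'expert' assigns to the observed choice and reward):

    import numpy as np

    def update_responsibilities(prior, likelihoods):
        # Posterior probability that each candidate hypothesis (e.g., a
        # rule at some level of abstraction) is the one the participant
        # is attending, given one trial's outcome.
        post = np.asarray(prior, dtype=float) * np.asarray(likelihoods, dtype=float)
        return post / post.sum()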
Abstract:
The frontal lobes may be organized hierarchically such that more rostral frontal regions modulate cognitive control operations in caudal regions. In our companion paper (Frank and Badre, submitted), we provide novel neural circuit and algorithmic models of hierarchical cognitive control in cortico-striatal circuits. Here, we test key model predictions using fMRI. Our neural circuit model proposes that contextual representations in rostral frontal cortex influence the striatal gating of contextual representations in caudal frontal cortex. Reinforcement learning operates at each level, such that the system adaptively learns to gate higher order contextual information into rostral regions. Our algorithmic "mixture of experts" model captures the key computations of this neural model and provides trial-by-trial estimates of the learner's latent hypothesis states. In the present paper, we used these quantitative estimates to reanalyze fMRI data from a hierarchical reinforcement learning task reported in Badre et al. (2010). Results validate key predictions of the models and provide evidence for a distinct cortico-striatal circuit supporting reinforcement learning of hierarchical structure at a specific level of policy abstraction. These findings are consistent with the proposal that hierarchical control in frontal cortex may emerge from interactions among nested cortico-striatal circuits at different levels of abstraction.
Abstract:
The striatum is critical for the incremental learning of values associated with behavioral actions. The prefrontal cortex (PFC) represents abstract rules and explicit contingencies to support rapid behavioral adaptation in the absence of cumulative experience. Here we test two alternative models of the interaction between these systems, and individual differences thereof, when human subjects are instructed with prior information about reward contingencies that may or may not be accurate. Behaviorally, subjects are overly influenced by prior instructions, at the expense of learning true reinforcement statistics. Computational analysis found that this pattern of data is best accounted for by a confirmation bias mechanism in which prior beliefs, putatively represented in PFC, influence the learning that occurs in the striatum such that reinforcement statistics are distorted. We assessed genetic variants affecting prefrontal and striatal dopaminergic neurotransmission. A polymorphism in the COMT gene (rs4680), associated with prefrontal dopaminergic function, was predictive of the degree to which participants persisted in responding in accordance with prior instructions even as evidence against their veracity accumulated. Polymorphisms in genes associated with striatal dopamine function (DARPP-32, rs907094, and DRD2, rs6277) were predictive of learning from positive and negative outcomes. Notably, these same variants were predictive of the degree to which such learning was overly inflated or neglected when outcomes were consistent or inconsistent with prior instructions. These findings indicate dissociable neurocomputational and genetic mechanisms by which initial biases are strengthened by experience.
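A minimal sketch of the confirmation-bias mechanism as a distortion of learning rates (Python; the symmetric bias parameter is an illustrative simplification):

    def biased_q_update(q, r, instructed_good, alpha, bias):
        # Outcomes consistent with the prior instruction are amplified
        # and inconsistent ones diminished, so experience strengthens
        # rather than corrects the initial belief.
        delta = r - q
        consistent = (delta > 0) == instructed_good
        rate = alpha * (1 + bias if consistent else 1 - bias)
        return q + rate * delta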
Abstract:
Computational models of the basal ganglia have matured and received increasing attention over the last decade. This article reviews some of the theoretical advances offered by these models, focusing on motor and cognitive action selection, learning, and the interaction between multiple corticostriatal circuits in selection and learning.
Abstract:
Over the last decade and a half, reinforcement learning models have fostered an increasingly sophisticated understanding of the functions of dopamine and cortico-basal ganglia-thalamo-cortical (CBGTC) circuits. More recently, these models, and the insights that they afford, have started to be used to understand important aspects of several psychiatric and neurological disorders that involve disturbances of the dopaminergic system and CBGTC circuits. We review this approach and its existing and potential applications to Parkinson’s disease, Tourette’s syndrome, attention-deficit/hyperactivity disorder, addiction, schizophrenia and preclinical animal models used to screen new antipsychotic drugs. The approach’s proven explanatory and predictive power bodes well for the continued growth of computational psychiatry and computational neurology.
Abstract:
Many of the individual differences in cognition, motivation and learning -- and the disruption of these processes in neurological conditions -- are influenced by genetic factors. We provide an integrative synthesis across human and animal studies, focusing on a recent spate of evidence implicating a role for genes controlling dopaminergic function in frontostriatal circuitry, including COMT, DARPP-32, DAT1, DRD2, and DRD4. These genetic effects are interpreted within theoretical frameworks developed in the context of the broader cognitive and computational neuroscience literature, constrained by data from pharmacological, neuroimaging, electrophysiological, and patient studies. In this framework, genes modulate the efficacy of particular neural computations, and effects of genetic variation are revealed by assays designed to be maximally sensitive to these computations. We discuss the merits and caveats of this approach and outline a number of novel candidate genes of interest for future study.
Abstract:
The medial prefrontal cortex (mPFC) is particularly reactive to signals of error, punishment, and conflict in the service of behavioral adaptation and it is consistently implicated in the etiology of major depressive disorder (MDD). This association makes conceptual sense, given that MDD has been associated with hyper-reactivity in neural systems associated with punishment processing. Yet in practice, depression-related variance in measures of mPFC functioning often fails to relate to performance. For example, neuroelectric reflections of mediofrontal error signals are often found to be larger in MDD, but a deficit in post-error performance suggests that these error signals are not being used to rapidly adapt behavior. Thus, it remains unknown if depression-related variance in error signals reflects a meaningful alteration in the use of error or punishment information. However, larger mediofrontal error signals have also been related to another behavioral tendency: increased accuracy in avoidance learning. The integrity of this error-avoidance system remains untested in MDD. In this study, EEG was recorded as 21 symptomatic, drug-free participants with current or past MDD and 24 control participants performed a probabilistic reinforcement learning task. Depressed participants had larger mid-frontal EEG responses to error feedback than controls. The direct relationship between error signal amplitudes and avoidance learning accuracy was replicated. Crucially, this relationship was stronger in depressed participants for high conflict “lose–lose” situations, demonstrating a selective alteration of avoidance learning. This investigation provided evidence that larger error signal amplitudes in depression are associated with increased avoidance learning, identifying a candidate mechanistic model for hypersensitivity to negative outcomes in depression.
Abstract:
Fortune favors those who are able to align their plans and goals to accord with the constraints imposed on them by an intricate and dynamic world. However, this presents an exceedingly difficult assignment, since the constraints pressed on an organism are typically complex, uncertain, and even paradoxical. When foodstuffs run low in the fall, should a hungry forager explore new and unfamiliar territory, or should it conserve energy and wait for something to turn up? The situation may appear dire and warrant the hazards of straying from routine, yet knowledge built up over years of experience may suggest that patience will be rewarded. Flexible goal-directed behavior demands an adaptive system capable of selecting behavior appropriate for a given context. Evidence spanning a range of methodologies suggests that the anterior cingulate cortex (ACC) and the basal ganglia (BG) are two of the core brain structures involved in cognitive control, both contributing to pathways critical for learning and decision making. At an abstract level of description, the ACC is generally thought to be involved in monitoring performance and instigating rapid behavioral adjustments when required, whereas the BG are thought to facilitate and suppress behavior based on more stable environmental statistics. Both of these functions have been simulated in separate computational models of the ACC and BG. Although considerable debate still surrounds the unique function each system serves, here we take the approach that a better understanding may also emerge from considering the interaction between the ACC and the BG. We propose a model in which ACC activity is modulated in part by reinforcement learning processes in the BG. In particular, we focus on how this relationship between the ACC and the BG may help clarify our understanding of the error-related negativity (ERN), a component of the event-related potential (ERP) thought to be generated in the ACC. We begin with a brief overview of the two dominant theories explaining the ERN: the reinforcement learning hypothesis advanced by Holroyd, Coles, and colleagues, and the conflict monitoring hypothesis advocated by Botvinick, Yeung, and colleagues. This overview is followed by a sketch of the core BG model and its role in reinforcement learning and action selection. We then include a novel extension incorporating the ACC into the BG model, using simulated ACC activity to quantify the ERN as response conflict driven by reinforcement learning processes in the BG as a function of feedback processing. We conclude with a discussion of how this model may advance our understanding of both the ACC and the BG, in addition to resolving an ongoing debate between the two dominant models of the ERN.
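One concrete way to quantify simulated response conflict of the kind that drives the ERN in this account is the energy measure familiar from conflict-monitoring models; a minimal sketch, offered only as an illustration of how conflict can be computed from response-unit activities:

    def response_conflict(activities):
        # Sum of pairwise products of response-unit activities: maximal
        # when competing responses are co-active, near zero once one
        # response dominates.
        total = 0.0
        for i in range(len(activities)):
            for j in range(i + 1, len(activities)):
                total += activities[i] * activities[j]
        return total

    print(response_conflict([0.5, 0.5]))  # 0.25 (high conflict)
    print(response_conflict([0.9, 0.1]))  # 0.09 (low conflict)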
Abstract:
Background: Negative symptoms are core features of schizophrenia (SZ); however, the cognitive and neural basis for individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and the exploration of actions that might produce outcomes that are better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptom domains. Methods: We administered a temporal decision-making task, which required trial-by-trial adjustment of reaction time to maximize reward receipt, to 51 patients with SZ and 39 age-matched healthy control subjects. Task conditions were designed such that expected value (probability × magnitude) increased, decreased, or remained constant with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results: Individuals with SZ showed impaired Go learning but intact NoGo learning relative to control subjects. These effects were most pronounced in patients with higher levels of negative symptoms. Uncertainty-based exploration was substantially reduced in individuals with SZ and selectively correlated with clinical ratings of anhedonia. Conclusions: Schizophrenia patients, particularly those with high negative symptoms, failed to speed reaction times to increase positive outcomes and showed reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes.
Abstract:
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punishment sensitive individuals. Increasing state-level negative affect was directly related to punishment learning accuracy in highly punishment sensitive individuals, but these measures were inversely related in less sensitive individuals. Combined electrophysiological measurement, performance accuracy and computational estimations of learning parameters suggest that trait and state vulnerability to stress alter cortico-striatal functioning during reinforcement learning, possibly mediated via medio-frontal cortical systems.
Abstract:
Background. Autism spectrum disorders (ASDs) can be conceptualized as disorders of learning, however there have been few experimental studies taking this perspective. Methods. We examined the probabilistic reinforcement learning performance of 28 adults with ASDs and 30 typically developing adults on a task requiring learning relationships between three stimulus pairs consisting of Japanese characters with feedback that was valid with different probabilities (80%, 70%, and 60%). Both univariate and Bayesian state-space data analytic methods were employed. Hypotheses were based on the extant literature as well as on neurobiological and computational models of reinforcement learning. Results. Both groups learned the task after training. However, there were group differences in early learning in the first task block where individuals with ASDs acquired the most frequently accurately reinforced stimulus pair (80%) comparably to typically developing individuals; exhibited poorer acquisition of the less frequently reinforced 70% pair as assessed by state-space learning curves; and outperformed typically developing individuals on the near chance (60%) pair. Individuals with ASDs also demonstrated deficits in using positive feedback to exploit rewarded choices. Conclusions. Results support the contention that individuals with ASDs are slower learners. Based on neurobiology and on the results of computational modeling, one interpretation of this pattern of findings is that impairments are related to deficits in flexible updating of reinforcement history as mediated by the orbito-frontal cortex, with spared functioning of the basal ganglia. This hypothesis about the pathophysiology of learning in ASDs can be tested using functional magnetic resonance imaging.
Abstract:
Individuals with autism spectrum disorders (ASDs) exhibit intact rote learning with impaired generalization. A transitive inference paradigm, involving training on four sequentially presented stimulus pairs containing overlapping items, with subsequent testing on two novel pairs, was used to investigate this pattern of learning in 27 young adults with ASDs and 31 matched neurotypical individuals (TYPs). On the basis of findings about memory and neuropathology, we hypothesized that individuals with ASDs would use a relational flexibility/conjunctive strategy reliant on an intact hippocampus, versus an associative strength/value transfer strategy requiring intact interactions between the prefrontal cortex and the striatum. Hypotheses were largely confirmed. ASDs demonstrated reduced interference from intervening pairs in early training; only TYPs formed a serial position curve by test; and ASDs exhibited impairments on the novel test pair consisting of end items with intact performance on the inner test pair. However, comparable serial position curves formed for both groups by the end of the first block.
Abstract:
Rationale. Aversively motivated learning is more poorly understood than appetitively motivated learning in many aspects, including the role of dopamine receptors in different regions of the striatum. Objectives. The present study investigated the roles of the D1-like DA receptors in the nucleus accumbens (NAc) and dorsolateral striatum (DLS) in learning and performance of conditioned avoidance responses (CARs). Methods. Adult male Wistar rats received intraperitoneal (i.p.), intra-NAc, or intra-DLS injections of the D1 dopamine receptor agonist SKF 81297 or the D1 receptor antagonist SCH 23390 20 min before or immediately after a training session in the CAR task (two-way active avoidance), carried out 24 h before a test session. Results. Pre-training administration of SCH 23390, but not SKF 81297, caused a significant decrease in the number of CARs in the test session, but not in the training session, when injected into the DLS, and no decrease in either session when injected into the NAc. It also caused a significant increase in the number of escape failures in the training session when injected into the NAc. Systemic administration caused a combination of these effects. Post-training administration of these drugs caused no significant effect. Conclusions. The results suggest that the D1-like receptors in the NAc and DLS play important, though different, roles in learning and performance of CAR.
Abstract:
Previous research indicates that behavioral performance in simple probability learning tasks can be organized into response strategy classifications that are thought to predict important personal characteristics and individual differences. Typically, a relatively small proportion of subjects can be identified as optimizers, effectively exploiting the environment and choosing the more rewarding stimulus nearly all of the time. In contrast, the vast majority of subjects behave sub-optimally, adopting a matching or super-matching strategy and apportioning their responses in a way that matches or slightly exceeds the probabilities of reinforcement. In the present study, we administered a two-choice probability learning paradigm to 51 individuals with schizophrenia (SZ) and 29 healthy controls (NC) to examine whether there are differences in the proportion of subjects falling into these response strategy classifications, and to determine whether task performance is differentially associated with symptom severity and neuropsychological functioning. Although the sample of SZ patients did not differ from NC in overall rate of learning or end performance, significant clinical differences emerged when patients were divided into optimizing, super-matching and matching subgroups based upon task performance. Patients classified as optimizers, who adopted the most advantageous learning strategy, exhibited higher levels of positive and negative symptoms than their matching and super-matching counterparts. Importantly, when both positive and negative symptoms were considered together, only negative symptom severity was a significant predictor of whether a subject would behave optimally, with each one standard deviation increase in negative symptoms increasing the odds of a patient being an optimizer by as much as 80%. These data provide a rare example of greater clinical impairment being associated with better behavioral performance.
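The expected accuracy gap between the strategies is easy to work out (illustrative arithmetic for a schedule in which the richer option pays off on 70% of trials):

    # Maximizing always selects the richer option; matching selects each
    # option at its reinforcement rate.
    p = 0.7
    maximizing = p                          # expected accuracy: 0.70
    matching = p * p + (1 - p) * (1 - p)    # expected accuracy: 0.58
    print(maximizing, matching)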
Abstract:
Objective: Patients with schizophrenia (SZ) show reinforcement learning impairments related to both the gradual/procedural acquisition of reward contingencies, and the ability to use trial-to-trial feedback to make rapid behavioral adjustments. Method: We used neurocomputational modeling to develop plausible mechanistic hypotheses explaining reinforcement learning impairments in individuals with SZ. We tested the model with a novel Go/NoGo learning task in which subjects had to learn to respond or withhold responses when presented with different stimuli associated with different probabilities of gains or losses in points. We analyzed data from 34 patients and 23 matched controls, characterizing positive- and negative-feedback-driven learning in both a training phase and a test phase. Results: Consistent with simulations from a computational model of aberrant dopamine input to the basal ganglia, patients with SZ showed an overall increased rate of responding in the training phase, together with reduced response-time acceleration to frequently rewarded stimuli across training blocks, and a reduced relative preference for frequently rewarded training stimuli in the test phase. Patients did not differ from controls on measures of procedural negative-feedback-driven learning, although patients with SZ exhibited deficits in trial-to-trial adjustments to negative feedback, with these measures correlating with negative symptom severity. Conclusions: These findings support the hypothesis that patients with SZ have a deficit in procedural 'Go' learning, linked to abnormalities in DA transmission at D1-type receptors, despite a 'Go bias' (increased response rate), potentially related to excessive tonic dopamine. Deficits in trial-to-trial reinforcement learning were limited to a subset of patients with SZ with severe negative symptoms, putatively stemming from prefrontal cortical dysfunction.
Abstract:
We review the contributions of biologically constrained computational models to our understanding of motor and cognitive deficits in Parkinson's disease (PD). The loss of dopaminergic neurons innervating the striatum in PD, and the well-established role of dopamine (DA) in reinforcement learning (RL), enable neural network models of the basal ganglia (BG) to derive concrete and testable predictions. We focus in this review on one simple underlying principle – the notion that reduced DA increases activity and causes long-term potentiation in the indirect pathway of the BG. We show how this theory can provide a unified account of diverse and seemingly unrelated phenomena in PD, including progressive motor degeneration as well as cognitive deficits in RL, decision making and working memory. DA replacement therapy and deep brain stimulation can alleviate some aspects of these impairments, but can actually introduce negative effects such as motor dyskinesias and cognitive impulsivity. We discuss these treatment effects in terms of modulation of specific mechanisms within the computational framework. In addition, we review neurocomputational interpretations of increased impulsivity in the face of response conflict in patients treated with deep brain stimulation.
Abstract:
Investigations into action monitoring have consistently detailed a frontocentral voltage deflection in the event-related potential (ERP) following the presentation of negatively valenced feedback, sometimes termed the feedback-related negativity (FRN). The FRN has been proposed to reflect a neural response to prediction errors during reinforcement learning, yet the single-trial relationship between neural activity and the quanta of expectation violation remains untested. Although ERP methods are not well suited to single-trial analyses, the FRN has been associated with theta band oscillatory perturbations in the medial prefrontal cortex. Mediofrontal theta oscillations have been previously associated with expectation violation and behavioral adaptation and are well suited to single-trial analysis. Here, we recorded EEG activity during a probabilistic reinforcement learning task and fit the performance data to an abstract computational model (Q-learning) for calculation of single-trial reward prediction errors. Single-trial theta oscillatory activities following feedback were investigated within the context of expectation (prediction error) and adaptation (subsequent reaction time change). Results indicate that interactive medial and lateral frontal theta activities reflect the degree of negative and positive reward prediction error in the service of behavioral adaptation. These different brain areas use prediction error calculations for different behavioral adaptations, with medial frontal theta reflecting the utilization of prediction errors for reaction time slowing (specifically following errors), but lateral frontal theta reflecting prediction errors leading to working memory-related reaction time speeding for the correct choice.
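The single-trial regressor used here is just each trial's reward prediction error from the fitted Q-learning model (a minimal sketch; alpha is the fitted learning rate):

    def q_step(q, choice, reward, alpha):
        # delta is the single-trial reward prediction error that can be
        # regressed against post-feedback theta power.
        delta = reward - q[choice]
        q[choice] += alpha * delta
        return delta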
Abstract:
Larger error-related negativities (ERNs) have been consistently found in obsessive-compulsive disorder (OCD) patients, and are thought to reflect the activities of a hyperactive cortico-striatal circuit during action monitoring. We previously observed that obsessive-compulsive (OC) symptomatic students (non-patients) have larger ERNs during errors in a response competition task, yet smaller ERNs in a reinforcement learning task. The finding of a task-specific dissociation suggests that distinct yet partially overlapping medio-frontal systems underlie the ERN in different tasks, and that OC symptoms are associated with functional differences in these systems. Here, we used EEG source localization to identify why OC symptoms are associated with hyperactive ERNs to errors yet hypoactive ERNs when selecting maladaptive actions. At rest, OC symptomatology predicted greater activity in rostral anterior cingulate cortex (rACC) and lower activity in dorsal anterior cingulate cortex (dACC). When compared to a group with low OC symptom scores, the high OC group had greater rACC reactivity during errors in the response competition task and less deactivation of dACC activity during errors in the reinforcement learning task. The degree of activation in these areas correlated with ERN amplitudes during both tasks in the high OC group, but not in the low group. Interactive anterior cingulate cortex (ACC) systems associated with avoidance of maladaptive actions were intact in the high OC group, but were related to poorer performance on a third task: probabilistic reversal learning. These novel findings link both tonic and phasic activities in the ACC to action monitoring alterations, including a dissociation in performance deficits, in OC symptomatic participants.
Abstract:
Adaptive behavior depends on the ability to flexibly alter our choices in response to changes in reward and punishment contingencies. One brain region frequently implicated in such behavior is the striatum. However, this region is functionally diverse and there are a number of apparent inconsistencies across previous studies. For instance, how can significant BOLD responses in the ventral striatum during punishment-based reversal learning be reconciled with the frequently demonstrated role of the ventral striatum in reward processing? Here we attempt to address this question by separately examining BOLD responses during reversal learning driven by reward and during reversal learning driven by punishment. We demonstrate simultaneous valence-specific and valence-nonspecific signals in the striatum, with the posterior dorsal striatum responding only to unexpected reward, and the anterior ventral striatum responding to both unexpected punishment as well as unexpected reward. These data help to reconcile conflicting findings from previous studies by showing that distinct regions of the striatum exhibit dissociable responses to punishment during reversal learning.
Abstract:
Previous studies have typically found that individuals with schizophrenia (SZ) report levels of emotional experience that are similar to controls (CN) when asked to view a single evocative stimulus and make an absolute judgment of stimulus "value." However, value is rarely assigned in absolute terms in real-life situations, where one alternative or experience is often evaluated alongside others, and value judgments are made in relative terms. In the current study, we examined performance on a preference task that requires individuals to differentiate between the relative values of different stimuli. In this task, subjects were presented with many pairs of moderately positive stimuli and asked to indicate which stimulus they preferred in each pair. Resulting data indicated the rank order of preference across stimuli and the consistency of their transitive mapping (i.e., if A > B and B > C, then A should be > C). Individuals with SZ (n = 38) were both less consistent in their rankings of stimuli and more likely to have larger magnitudes of discrepant responses than control subjects (n = 27). Furthermore, CN showed clear differentiation between different valence categories of stimuli (i.e., highly positive > mildly positive > mildly negative > highly negative), while individuals with SZ showed the same general pattern of results but with less differentiation between the valence levels. These data suggest that individuals with SZ are impaired in developing or maintaining nuanced representations of the different attributes of a stimulus, thus making stimuli of similar general value easily confusable.
Abstract:
Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neuro-computational models of reinforcement learning, we suggest that the efficiency of this type of learning resides in the dynamic and synergistic cooperation of brain systems that use different levels of computations. The implementation of reward signals at the synaptic, cellular, network and system levels gives the organism the necessary robustness, adaptability and processing speed required for evolutionary and behavioral success.
Abstract:
What biological mechanisms underlie the reward-predictive firing properties of midbrain dopaminergic neurons, and how do they relate to the complex constellation of empirical findings understood as Pavlovian and instrumental conditioning? We previously presented PVLV, a biologically inspired Pavlovian learning algorithm accounting for DA activity in terms of two interrelated systems: a primary value (PV) system, which governs how DA cells respond to a US (reward), and a learned value (LV) system, which governs how DA cells respond to a CS. Here, we provide a more extensive review of the biological mechanisms supporting phasic DA firing and their relation to a wide range of Pavlovian conditioning phenomena and their sensitivity to focal brain lesions. We further extend the model by incorporating a new NV (novelty value) component reflecting the ability of novel stimuli to trigger phasic DA firing, providing "novelty bonuses" which encourage exploratory working memory updating and in turn speed learning in trace conditioning and other working memory-dependent paradigms. The evolving PVLV model builds upon insights developed in many earlier computational models, especially reinforcement learning models based on the ideas of Sutton and Barto, biological models, and the psychological model developed by Savastano and Miller. The PVLV framework synthesizes these various approaches, overcoming important shortcomings of each by providing a coherent and specific mapping to much of the relevant empirical data at both the micro- and macro-levels, and examines their relevance for higher-order cognitive functions.
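A heavily simplified sketch of the PV/LV division of labor (Python; the real PVLV model operates over biologically grounded substrates, so this illustrates only the two delta rules and the LV learning filter):

    def pvlv_step(w_pv, w_lv, x_cs, r, a_pv, a_lv):
        # PV learns at US time and cancels dopamine to expected rewards.
        da_us = r - w_pv * x_cs
        w_pv += a_pv * x_cs * da_us
        # LV is trained toward the US value only when a US is present
        # (the key departure from TD), and drives the phasic CS response.
        if r != 0:
            w_lv += a_lv * x_cs * (r - w_lv * x_cs)
        da_cs = w_lv * x_cs
        return w_pv, w_lv, da_us, da_cs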
Abstract:
Background. Central to understanding of the behavioural consequences of depression has been the theory that the disorder is accompanied by an increased sensitivity to negative compared with positive reinforcement (negative bias), whereas other theorists have emphasized a global reduction in sensitivity to reinforcement in depression (blunting). Method. In this study, we used a probabilistic selection task that was designed to examine independently rates of learning to predict both positive and negative reinforcement. Twenty-three depressed out-patients and 23 healthy controls from the local population participated in the study. Results. No evidence for a negative bias was observed on the task, either during acquisition of the task or during generalization of the learned information. Depressed patients responded slower on the task than controls but showed a similar modulation of reaction times (RTs) as controls following reinforcement. Evidence for blunting was observed on the training phase, as reflected in reduced trial-by-trial adjustment during this phase. However, this effect was related specifically to the severity of anhedonia, as measured by the Snaith–Hamilton Pleasure Scale (SHAPS), and was independent of overall depression severity. Conclusions. We argue that the observation of a negative bias or blunting in a group of depressed patients may be dependent on the neuropsychological task and the symptoms of the patients tested. Our results provide insight into how these theories might be further tested.
Abstract:
The basal ganglia support learning to exploit decisions that have yielded positive outcomes in the past. In contrast, limited evidence implicates the prefrontal cortex in the process of making strategic exploratory decisions when the magnitude of potential outcomes is unknown. Here we examine neurogenetic contributions to individual differences in these distinct aspects of motivated human behavior, using a temporal decision-making task and computational analysis. We show that two genes controlling striatal dopamine function, DARPP-32 (also called PPP1R1B) and DRD2, are associated with exploitative learning to adjust response times incrementally as a function of positive and negative decision outcomes. In contrast, a gene primarily controlling prefrontal dopamine function (COMT) is associated with a particular type of ‘directed exploration’, in which exploratory decisions are made in proportion to Bayesian uncertainty about whether other choices might produce outcomes that are better than the status quo. Quantitative model fits reveal that genetic factors modulate independent parameters of a reinforcement learning system.
Abstract:
The basal ganglia (BG) are implicated in a wide variety of motor and cognitive behaviors, making it difficult to extract a unifying function of these brain structures. We review a series of neurocomputational models that focus on the action selection and reinforcement learning functions of the BG, and their modulation by dopamine, as constrained by a broad range of data. We begin with the “basic” model, which forms the core mechanism for later appended models including the roles of norepinephrine, cognitive and affective components of prefrontal cortex, and the interaction between verbal instructions and reinforcement learning. We further review experiments designed to test model predictions as a function of disease, medications, genetics, and behavioral manipulations. Abstract mathematical models that have been used in conjunction with these accounts are also discussed.
Abstract:
Individuals differ in their tendencies to seek positive decision outcomes or to avoid negative ones. At the neurobiological level, our model suggests that phasic changes in dopamine support learning to reinforce good decisions via striatal D1 receptors, and to avoid maladaptive choices via striatal D2 receptors. Accordingly, in a previous study individual differences in positive and negative learning were strongly modulated by two genetic polymorphisms related to striatal D1 and D2 function, respectively. Nevertheless, whereas the role for dopamine in positive learning is relatively well accepted, its role in learning to avoid negative outcomes is more controversial. Here we further explore D2-receptor-related genetic contributions to probabilistic avoidance, in light of recent data showing that particular DRD2 polymorphisms are associated with functional modulation of receptor expression (Zhang et al., 2007, PNAS). We find that a promoter polymorphism, rs12364283, associated with transcription and D2 receptor density was strongly and selectively predictive of avoidance-based decisions. Two further polymorphisms (rs2283265 and rs1076560) associated with relatively reduced presynaptic compared to postsynaptic D2 receptor expression were predictive of relative impairments in negative compared to positive decisions. These previously undocumented effects of DRD2 polymorphisms were largely independent of those we reported previously for the C957T polymorphism (rs6277) associated with striatal D2 density. In contrast, effects of the commonly studied Taq1A polymorphism on reinforcement-based decisions were due to indirect association with C957T. Taken together, these findings suggest multiple D2-dependent genetic mechanisms contributing to avoidance. We discuss these effects in the context of neurocomputational models of reinforcement learning in the basal ganglia.
Abstract:
Hyperactive cortico-striatal circuits including the anterior cingulate cortex (ACC) have been implicated in the intrusive thoughts and repetitive behaviors of obsessive-compulsive disorder (OCD). Larger error-related negativities (ERNs) in OCD patients during simple flanker tasks have been proposed to reflect an amplified error signal in these hyperactive circuits. Such amplified error signals typically are associated with an adaptive change in response, yet in OCD these same repetitive responses persist to the point of distress and impairment. In contrast to this repetitive character of OC behavior, larger ERN amplitudes have been linked to better avoidance learning in reinforcement learning tasks. Study I thus investigated whether OC symptomatology in non-patients predicted an enhanced ERN after suboptimal choices in a probabilistic learning task. Absent any behavioral differences, higher OC symptoms predicted smaller ERNs. Study II replicated this effect in an independent sample while also replicating findings of a larger ERN in a flanker task. There were no relevant behavioral differences in reinforcement learning or error monitoring as a function of symptom score. These findings implicate different, yet overlapping, neural mechanisms underlying the negative deflection in the ERP following the execution of an erroneous motor response and the one following a suboptimal choice in a reinforcement learning paradigm. OC symptomatology may be dissociated in these neural systems, with hypoactivity in a system that enables learning to avoid maladaptive choices, and hyperactivity in another system that enables the same behavior to be repeated when it was assessed as not quite good enough the first time.
Abstract:
The capacity to anticipate and prepare for future events is thought to be critical for cognitive control. Dominant accounts of cognitive control treat the developing system as merely a weaker version of the adult system, progressively strengthening over time. Using the AX Continuous Performance Task (AX-CPT) in combination with high-resolution pupillometry, we find that whereas 8-year-old children resemble adults in their proactive use of cognitive control, 3.5-year-old children exhibit a qualitatively different, reactive form of cognitive control, responding to events only as they unfold and retrieving information from memory as needed in the moment. These results demonstrate the need to reconsider the origins of cognitive control and the basis for children's behaviors across domains.
Abstract:
Rationale: Repeated haloperidol treatment in rodents results in a day-to-day intensification of catalepsy (i.e., sensitization). Prior experiments suggest that this sensitization is context dependent and resistant to extinction training. Objectives: To provide a neurobiological mechanistic explanation for these findings. Methods: We use a neurocomputational model of the basal ganglia, and simulate two alternative models based on the reward prediction error and novelty hypotheses of dopamine function. We also conducted a behavioral rat experiment to adjudicate between these models. Twenty male Sprague-Dawley rats were challenged with 0.25 mg/kg haloperidol across multiple days, and were subsequently tested in either a familiar or novel context. Results: Simulation results show that catalepsy sensitization, and its context dependency, can be explained by "NoGo" learning via simulated D2 receptor antagonism in striatopallidal neurons, leading to increasingly slowed response latencies. The model further exhibits a non-extinguishable component of catalepsy sensitization, due to latent NoGo representations that are prevented from being expressed, and therefore from being unlearned, during extinction. In the rat experiment, context dependency effects were not dependent on the novelty of the context, ruling out the novelty model's account of context dependency. Conclusions: Simulations lend insight into potential complex mechanisms leading to context-dependent catalepsy sensitization, extinction, and renewal.
Abstract:
Humans learn how to behave directly through environmental experience and indirectly through rules and instructions. Behavior analytic research has shown that instructions can control behavior, even when such behavior leads to sub-optimal outcomes (Hayes, 1989). Here we examine the control of behavior through instructions in a reinforcement-learning task known to depend on striatal dopaminergic function. Participants selected between probabilistically reinforced stimuli, and were (incorrectly) told that a specific stimulus had the highest (or lowest) reinforcement probability. Despite experience to the contrary, instructions drove choice behavior. We present neural network simulations that capture the interactions between instruction-driven and reinforcement-driven behavior via two potential neural circuits: one in which the striatum is inaccurately trained by instruction representations coming from prefrontal cortex/hippocampus (PFC/HC), and another in which the striatum learns the environmentally based reinforcement contingencies, but is "overridden" at decision output. Both models capture the core behavioral phenomena but, because they differ fundamentally on what is learned, make distinct predictions for subsequent behavioral and neuroimaging experiments. Finally, we attempt to distinguish between the proposed computational mechanisms governing instructed behavior by fitting a series of abstract "Q-learning" and Bayesian models to subject data. The best-fitting model supports one of the neural models, suggesting the existence of a "confirmation bias" in which the PFC/HC system trains the reinforcement system by amplifying outcomes that are consistent with instructions while diminishing inconsistent outcomes.
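The two candidate circuits differ in where the instruction acts; the 'override' alternative can be sketched as a choice-time intervention over veridically learned values (illustrative; p_follow is a hypothetical parameter):

    import numpy as np

    def override_choice(q_env, instructed_action, p_follow, rng):
        # The striatum learns the true contingencies (q_env, updated
        # veridically elsewhere); PFC forces the instructed action at
        # output with probability p_follow, leaving learned values
        # intact, in contrast to the bias account, which distorts them.
        if rng.random() < p_follow:
            return instructed_action
        return int(np.argmax(q_env))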
Abstract:
Various psychological models posit two systems that contribute to decision making. The first system is bottom-up, automatic, intuitive, emotional, and implicit, while the second system is top-down, controlled, deliberative, and explicit. It has become increasingly evident that this dichotomy is both too simplistic and too vague. Here we consider insights gained from a different approach: one that considers the multiple computational demands of the decision making system in the context of neural mechanisms specialized to accomplish some of its more basic functions. The use of explicit computational models has led to (i) identification of core trade-offs between cognitive systems that are solved by having multiple neural systems, and (ii) novel predictions that can be tested empirically, and which serve to further refine the models.
Abstract:
Individual variability in reward-based learning has been ascribed to quantitative variation in baseline levels of striatal dopamine. However, direct evidence for this pervasive hypothesis has hitherto been unavailable. We demonstrate that individual differences in reward-based reversal learning reflect variation in baseline striatal dopamine synthesis capacity, as measured with neurochemical positron emission tomography. Subjects with high baseline dopamine synthesis in the striatum showed relatively better reversal learning from unexpected rewards than from unexpected punishments, whereas subjects with low baseline dopamine synthesis in the striatum showed the reverse pattern. In addition, baseline dopamine synthesis predicted the direction of dopaminergic drug effects. The D2 receptor agonist bromocriptine improved reward-based relative to punishment-based reversal learning in subjects with low baseline dopamine synthesis capacity, while impairing it in subjects with high baseline dopamine synthesis capacity in the striatum. Finally, this pattern of drug effects was outcome-specific, and driven primarily by drug effects on punishment-, but not reward-based reversal learning. These data demonstrate that the effects of D2 receptor stimulation on reversal learning in humans depend on task demands and baseline striatal dopamine synthesis capacity.
Abstract:
The basal ganglia (BG) are critical for the coordination of several motor, cognitive, and emotional functions and become dysfunctional in several pathological states ranging from Parkinson's disease to schizophrenia. Here we review principles developed within a neurocomputational framework of the BG and related circuitry which provide insights into their functional roles in behavior. We focus on two classes of models: those that incorporate aspects of biological realism and are constrained by functional principles, and more abstract mathematical models focusing on the higher level computational goals of the BG. While the former are arguably more ``realistic'', the latter have a complementary advantage in being able to describe functional principles of how the system works in a relatively simple set of equations, but are less suited to making specific hypotheses about the roles of specific nuclei and neurophysiological processes. We review the basic architecture and assumptions of these models, their relevance to our understanding of the neurobiological and cognitive functions of the BG, and provide an update on the potential roles of biological details not explicitly incorporated in existing models. Empirical studies ranging from those in transgenic mice to dopaminergic manipulation, deep brain stimulation, and genetics in humans largely support model predictions and provide the basis for further refinement. Finally, we discuss future directions and possible ways to integrate different types of models.
Abstract:
Converging evidence implicates striatal dopamine (DA) in reinforcement learning, such that DA increases enhance "Go learning" to pursue actions with rewarding outcomes, whereas DA decreases enhance "NoGo learning" to avoid non-rewarding actions. Here we test whether these effects apply to the response time domain. We employ a novel paradigm which requires the adjustment of response times to a single response. Reward probability varies as a function of response time, while reward magnitude changes in the opposite direction. In the control condition, these factors exactly cancel, such that the expected value across time is constant (CEV). In two other conditions, expected value increases (IEV) or decreases (DEV), such that reward maximization requires either speeding up (Go learning) or slowing down (NoGo learning) relative to the CEV condition. We tested patients with Parkinson's disease (depleted striatal DA levels) on and off dopaminergic medication, compared to age-matched controls. While medicated, patients were better at speeding up in the DEV relative to CEV conditions. Conversely, non-medicated patients were better at slowing down to maximize reward in the IEV condition. These effects of DA manipulation on cumulative Go/NoGo response time adaptation were captured with an a priori computational model of the basal ganglia, previously applied only to forced-choice tasks. There were also robust trial-to-trial changes in response time, but these single trial adaptations were not affected by disease or medication and are posited to rely on extrastriatal, possibly prefrontal, structures.
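The CEV/IEV/DEV logic is easiest to see with explicit schedules. The functions below are our own illustrative choices, not the published ones: reward probability rises with response time while magnitude falls, and the exponent on the magnitude schedule determines whether their product (expected value) is constant, decreasing, or increasing in time.

```python
import numpy as np

t = np.linspace(0.5, 5.0, 100)     # candidate response times, in seconds

# Illustrative (not the published) probability/magnitude schedules:
p = t / (1.0 + t)                  # reward probability rises with RT
m_cev = (1.0 + t) / t              # p * m constant        -> CEV
m_dev = (1.0 + t) / t**1.5         # p * m falls with RT   -> speed up (DEV)
m_iev = (1.0 + t) / t**0.5         # p * m rises with RT   -> slow down (IEV)

for name, m in [("CEV", m_cev), ("DEV", m_dev), ("IEV", m_iev)]:
    ev = p * m
    print(name, "EV at fastest/slowest RT:", round(ev[0], 2), round(ev[-1], 2))
```

Under these toy schedules, reward maximization requires responding fast in DEV, slow in IEV, and is indifferent to timing in CEV, which is the structure the task exploits.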
Abstract:
Parkinson's disease (PD) patients exhibit cognitive deficits, including in reinforcement learning, working memory, and set shifting. Computational models of the basal ganglia--frontal system posit similar mechanisms for these deficits in terms of reduced dynamic range of striatal dopamine (DA) signals in both medicated and non-medicated states. Here, we report results from the first study that tests PD patients on and off dopaminergic medications in a modified version of the AX-CPT working memory (WM) task, consisting of three performance phases and one phase requiring WM associations to be learned via reinforcement feedback. Patients generally showed impairments relative to controls. Medicated patients showed deficits specifically when having to ignore distracting stimuli during the delay. Our models suggest that this impairment is due to medication causing excessive WM updating by enhancing striatal "Go" signals that facilitate such updating, while concurrently suppressing "NoGo" signals. In contrast, patients off medication showed deficits consistent with an overall reduction in striatal DA and associated Go updating signals. Supporting this dichotomy, patients on and off medication both showed attentional shifting deficits, but for different reasons. Deficits in non-medicated patients were consistent with an inability to update the new attentional set, whereas those in medicated patients were evident when having to ignore distractors that had previously been task-relevant. Finally, in the feedback-based WM phase, medicated patients were better than unmedicated patients, suggesting a key role of striatal DA in using feedback to update information into WM. These results lend further insight into the role of basal ganglia dopamine in WM and broadly support predictions from neurocomputational models.
Abstract:
Animal findings have highlighted the modulatory role of phasic dopamine (DA) signaling in incentive learning, particularly in the acquisition of reward-related behavior. In humans, these processes remain largely unknown. In a recent study, we demonstrated that a single low dose of a D2/D3 agonist (pramipexole) - assumed to activate DA autoreceptors and thus reduce phasic DA bursts - impaired reward learning in healthy subjects performing a probabilistic reward task. The purpose of this study was to extend these behavioral findings using event-related potentials and computational modeling. Compared with the placebo group, participants receiving pramipexole showed increased feedback-related negativity to probabilistic rewards and decreased activation in dorsal anterior cingulate regions previously implicated in integrating reinforcement history over time. Additionally, findings of blunted reward learning in participants receiving pramipexole were simulated by reduced presynaptic DA signaling in response to reward in a neural network model of striatal-cortical function. These preliminary findings offer important insights into the role of phasic DA signals in reinforcement learning in humans and provide initial evidence regarding the spatiotemporal dynamics of brain mechanisms underlying these processes.
Abstract:
The dopamine hypothesis of aging suggests that a monotonic dopaminergic decline accounts for many of the changes found in cognitive aging. We tested 44 older adults with a probabilistic selection task sensitive to dopaminergic function and designed to assess relative biases to learn more from positive or negative feedback. Previous studies demonstrated that low levels of dopamine lead to avoidance of those choices that lead to negative outcomes, whereas high levels of dopamine result in an increased sensitivity to positive outcomes. In the current study, age had a significant effect on the bias to avoid negative outcomes: older seniors showed an enhanced tendency to learn from negative compared with positive consequences of their decisions. Younger seniors failed to show this negative learning bias. Moreover, the enhanced probabilistic integration of negative outcomes in older seniors was accompanied by a reduction in trial-to-trial learning from positive outcomes, thought to rely on working memory. These findings are consistent with models positing multiple neural mechanisms that support probabilistic integration and trial-to-trial behavior, which may be differentially impacted by older age.
Abstract:
The transitive inference (TI) task assesses the ability to generalize learned knowledge to new contexts, and is thought to depend on the hippocampus (Dusek & Eichenbaum, 1997). Animals or humans learn in separate trials to choose stimulus A over B, B over C, C over D and D over E, via reinforcement feedback. Transitive responding based on the hierarchical structure A>B>C>D>E is then tested with the novel BD pair. We and others have argued that successful BD performance by animals -- and even humans in some implicit studies -- can be explained by simple reinforcement learning processes which do not depend critically on the hippocampus, but rather on the striatal dopamine system. We recently showed that the benzodiazepine midazolam, which is thought to disrupt hippocampal function, profoundly impaired human memory recall performance but actually enhanced implicit TI performance (Frank, O'Reilly & Curran, 2006). We posited that midazolam biased participants to recruit striatum during learning due to dysfunctional hippocampal processing, and that this change actually supported generalization of reinforcement values. Greene (2007) questions the validity of our pharmacological assumptions and argues that our conclusions are unfounded. Here we stand by our original hypothesis, which remains the most parsimonious account of the data, and is grounded by multiple lines of evidence.
Abstract:
Rationale: The dopaminergic system, particularly D2-like dopamine receptors, has been strongly implicated in reward processing. Animal studies have emphasized the role of phasic dopamine (DA) signaling in reward-related learning, but these processes remain largely unexplored in humans. Objectives: To evaluate the effect of a single, low dose of a D2/D3 agonist---pramipexole---on reinforcement learning in healthy adults. Based on prior evidence indicating that low doses of DA agonists decrease phasic DA release through autoreceptor stimulation, we hypothesized that 0.5 mg of pramipexole would impair reward learning due to presynaptic mechanisms. Materials and methods: Using a double-blind design, a single 0.5-mg dose of pramipexole or placebo was administered to 32 healthy volunteers, who performed a probabilistic reward task involving a differential reinforcement schedule as well as various control tasks. Results: As hypothesized, response bias toward the more frequently rewarded stimulus was impaired in the pramipexole group, even after adjusting for transient adverse effects. In addition, the pramipexole group showed reaction time and motor speed slowing and increased negative affect; however, when adverse physical side effects were considered, group differences in motor speed and negative affect disappeared. Conclusions: These findings show that a single low dose of pramipexole impaired the acquisition of reward-related behavior in healthy participants, and they are consistent with prior evidence suggesting that phasic DA signaling is required to reinforce actions leading to reward. The potential implications of the present findings for psychiatric conditions, including depression and impulse control disorders related to addiction, are discussed.
Abstract:
Deep brain stimulation (DBS) of the subthalamic nucleus dramatically improves the motor symptoms of Parkinson's disease, but causes cognitive side effects such as impulsivity. Here we show that DBS selectively interferes with the normal ability to slow down when faced with decision conflict. While on DBS, patients actually sped up under high conflict conditions. This form of impulsivity was not affected by dopaminergic medication status. Instead, medication impaired patients' ability to learn from negative decision outcomes. These findings implicate independent mechanisms leading to impulsivity in treated Parkinson's patients, and were predicted by a single neurocomputational model of the basal ganglia.
Abstract:
What are the genetic and neural components that support adaptive learning from positive and negative outcomes? Here we show with genetic analyses that three independent dopaminergic mechanisms contribute to reward and avoidance learning in humans. A polymorphism in the DARPP-32 gene, associated with striatal dopamine function, predicted relatively better probabilistic reward learning. Conversely, the C957T polymorphism of the DRD2 gene, associated with striatal D2 receptor function, predicted the degree to which participants learned to avoid choices that had been probabilistically associated with negative outcomes. The val/met polymorphism of the COMT gene, associated with prefrontal cortical dopamine function, predicted participants' ability to rapidly adapt behavior on a trial-to-trial basis. These findings support a neurocomputational dissociation between striatal and prefrontal dopaminergic mechanisms in reinforcement learning. Computational maximum likelihood analyses reveal independent gene effects on three reinforcement learning parameters that can explain the observed dissociations.
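The parameter dissociations in studies like this rest on fitting trial-by-trial choices by maximum likelihood. As an illustration of the general approach (not the exact models reported here), the sketch below simulates a two-choice probabilistic task with separate learning rates for positive and negative prediction errors, then recovers those parameters by minimizing the negative log-likelihood; all parameter values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def simulate(alpha_g=0.4, alpha_l=0.1, beta=4.0, n=400):
    """Generate choices from a softmax Q-learner with asymmetric learning rates."""
    q = np.zeros(2); choices = []; rewards = []
    for _ in range(n):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        c = rng.choice(2, p=p)
        r = float(rng.random() < (0.8 if c == 0 else 0.2))
        a = alpha_g if r > q[c] else alpha_l        # gain vs. loss learning rate
        q[c] += a * (r - q[c])
        choices.append(c); rewards.append(r)
    return np.array(choices), np.array(rewards)

def nll(params, choices, rewards):
    """Negative log-likelihood of the observed choices under the model."""
    alpha_g, alpha_l, beta = params
    q = np.zeros(2); ll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        ll += np.log(p[c] + 1e-12)
        a = alpha_g if r > q[c] else alpha_l
        q[c] += a * (r - q[c])
    return -ll

choices, rewards = simulate()
fit = minimize(nll, x0=[0.3, 0.3, 2.0], args=(choices, rewards),
               bounds=[(0.01, 1), (0.01, 1), (0.1, 20)])
print("recovered alpha_gain, alpha_loss, beta:", np.round(fit.x, 2))
```

Gene effects of the kind reported correspond, in this framework, to group differences in the recovered parameters rather than in raw accuracy.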
Abstract:
The error-related negativity (ERN) and error positivity (Pe) are electrophysiological markers of error processing thought to originate in medial frontal cortex. Previous studies using probabilistic reinforcement showed that individuals who learn more from negative than positive feedback (``negative learners'') had larger ERNs than did ``positive learners''. These findings support the dopamine reinforcement learning hypothesis of the ERN and associated computational models. However, it remains unclear (a) to what extent these effects generalize to tasks outside the restricted probabilistic reinforcement learning domain, and (b) whether there is a dopaminergic source of these effects. To address these issues, we tested subjects' reinforcement learning biases behaviorally, and recorded EEG during an unrelated recognition memory experiment. Initial recognition responses were speeded, but subjects were subsequently allowed to self-correct their responses. We found that negative learners, as assessed via probabilistic learning, had larger ERNs in the recognition memory task, suggestive of a common underlying enhanced error processing mechanism. Negative learners also had larger Pe's than positive learners when self-correcting errors. Moreover, the ERN and Pe components contributed independently to negative learning. We also tested for a dopaminergic genetic basis for these ERP components. We analyzed the COMT val/met polymorphism, which has been linked to frontal dopamine levels. COMT genotype affected Pe (but not ERN) magnitude; met/met homozygotes showed enhanced Pe's to self-corrected errors compared to val carriers. These results are consistent with a role for the Pe and frontal monoamines in error awareness.
Abstract:
We test our neurocomputational model of fronto-striatal dopamine (DA) and noradrenaline (NA) function for understanding cognitive and motivational deficits in attention-deficit/hyperactivity disorder (ADHD). Our model predicts that low striatal DA levels in ADHD should lead to deficits in ``Go'' learning from positive reinforcement, which should be alleviated by stimulant medications, as observed with DA manipulations in other populations. Indeed, while non-medicated adult ADHD participants were impaired at both positive (Go) and negative (NoGo) reinforcement learning, only the former deficits were ameliorated by medication. We also found evidence for our model's extension of the same striatal DA mechanisms to working memory, via interactions with prefrontal cortex. In a modified AX-continuous performance task, ADHD participants showed reduced sensitivity to working memory contextual information, despite no global performance deficits, and were more susceptible to the influence of distractor stimuli presented during the delay. These effects were reversed with stimulant medications. Moreover, the tendency for medications to improve Go relative to NoGo reinforcement learning was predictive of their improvement in working memory in distracting conditions, suggestive of common DA mechanisms and supporting a unified account of DA function in ADHD. However, other ADHD effects such as erratic trial-to-trial switching and reaction time variability are not accounted for by model DA mechanisms, and are instead consistent with cortical noradrenergic dysfunction and associated computational models. Accordingly, putative NA deficits were correlated with each other and independent of putative DA-related deficits. Taken together, our results demonstrate the usefulness of computational approaches for understanding cognitive deficits in ADHD.
Abstract:
Models of natural action selection implicate fronto-striatal circuits in both motor and cognitive "actions". Dysfunction of these circuits leads to decision making deficits in various populations. We review how computational models provide insights into the mechanistic basis for these deficits in Parkinson's patients and those with ventromedial frontal damage. We then consider implications of the models for understanding behavior and cognition in attention-deficit/hyperactivity disorder (ADHD). Incorporation of cortical norepinephrine function into the model improves action selection in noisy environments and accounts for response variability in ADHD. We close with more general clinical implications.
Abstract:
The prefrontal cortex (PFC) has long been thought to serve as an "executive" that controls the selection of actions, and cognitive functions more generally. However, the mechanistic basis of this executive function has not been clearly specified, often amounting to a homunculus. This paper reviews recent attempts to deconstruct this homunculus by elucidating the precise computational and neural mechanisms underlying the executive functions of the PFC. The overall approach builds upon existing mechanistic models of the basal ganglia (BG) and frontal systems known to play a critical role in motor control and action selection, where the basal ganglia provide a "Go" vs. "NoGo" modulation of frontal action representations. In our model, the basal ganglia modulate working memory representations in prefrontal areas, to support more abstract executive functions. We have developed a computational model of this system that is capable of developing human-like performance on working memory and executive control tasks through trial-and-error learning. This learning is based on reinforcement learning mechanisms associated with the midbrain dopaminergic system and its activation via the BG and amygdala. Finally, we briefly describe various empirical tests of this framework.
Abstract:
Background: Rewards and punishments may make distinct contributions to learning via separate striato-cortical pathways. We investigated whether fronto-striatal dysfunction in schizophrenia (SZ) is characterized by selective impairment in either reward- (Go) or punishment-driven (NoGo) learning. Methods: We administered two versions of a Probabilistic Selection task (Frank et al., 2004) to 40 SZs and 31 controls, using difficult-to-verbalize stimuli (Exp 1) and nameable objects (Exp 2). In an acquisition phase, participants learned to choose between three different stimulus pairs (AB, CD, EF) presented in random order, based on probabilistic feedback (80%, 70%, 60%). We used ANOVAs to assess the effects of group and reinforcement probability on two measures of contingency learning. To characterize the preference of subjects for choosing the most rewarded stimulus and avoiding the most punished stimulus, we subsequently tested participants with novel pairs of stimuli involving either A or B, providing no feedback. Results: Controls demonstrated superior performance during the first 40 acquisition trials in each of the 80% and 70% conditions versus the 60% condition; patients showed similarly impaired (<60%) performance in all three conditions. In novel test pairs, patients showed decreased preference for the most rewarded stimulus (A; t=2.674; p=0.01). Patients were unimpaired at avoiding the most negative stimulus (B; t=0.737). Conclusions: The results of these experiments provide additional evidence for the presence of deficits in reinforcement learning in SZ, suggesting that reward-driven (Go) learning may be more profoundly impaired than punishment-driven (NoGo) learning.
Abstract:
The ability to stop motor responses depends critically on the right inferior frontal cortex (IFC), and also engages a midbrain region consistent with the subthalamic nucleus (STN). Here we used diffusion-weighted imaging (DWI) tractography to show that the IFC and the STN region are connected via a white matter tract, which could underlie a 'hyperdirect' pathway for basal ganglia control. Using a novel method of 'triangulation' analysis of tractography data, we also found that both the IFC and the STN region are connected with the presupplementary motor area (preSMA). We hypothesized that the preSMA could play a conflict detection role within a network between preSMA, IFC and the STN region. A second experiment tested this idea with fMRI using a conditional stop-signal paradigm, enabling examination of behavioral and neural signatures of conflict-induced slowing. Activation in the preSMA, IFC, and STN region increased significantly with the degree of conflict-induced slowing. Activation corresponded strongly with spatial foci predicted by the DWI tract analysis, as well as with foci activated by complete response inhibition. The results illustrate how tractography can reveal connections that are verifiable with fMRI. The results also demonstrate a three-way functional-anatomical right-hemisphere network which could either brake or completely stop responses.
Abstract:
What biological mechanisms enable midbrain dopaminergic neurons to exhibit their well-established reward-predictive firing properties? A number of existing theories use the temporal-differences (TD) reinforcement learning algorithm as a computational framework for addressing this question. We propose an alternative mechanism called PVLV, which can be more directly related to the underlying biology than the more abstract TD model, and is also more robust to variability in the environment. PVLV contains two subsystems: a primary value (PV) system that controls performance and learning at the time of primary rewards, learning to expect and thus inhibit dopamine firing to primary rewards; and a learned value (LV) system that learns about conditioned stimuli (CS) associated with primary rewards, and can drive dopamine firing at CS onset. The PV system is essentially the Rescorla-Wagner/delta-rule, and is associated with neurons in the ventral striatum/nucleus accumbens that send inhibitory projections to the dopaminergic midbrain nuclei. The LV system is associated with neurons in the central nucleus of the amygdala, which send net excitatory projections to the dopaminergic nuclei. We show that the PVLV model can account for critical aspects of the dopamine firing data, and it makes a number of clear predictions about effects of lesions and other manipulations, several of which are consistent with existing data. For example, first- and second-order conditioning can be anatomically dissociated, which is consistent with PVLV and not TD. Overall, our model provides a biologically plausible framework for understanding the neural basis of reward learning.
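The PV/LV division of labor can be caricatured with two scalar weights. The sketch below is a drastic simplification of PVLV (one CS, a fixed reward, no timing or PV-gating of LV learning), included only to show the core idea: the PV weight follows the delta rule and comes to cancel dopamine firing at reward, while the LV weight, trained only when primary reward occurs, comes to drive firing at CS onset. The learning rate and trial count are arbitrary.

```python
eps = 0.2                 # learning rate (arbitrary)
w_pv = 0.0                # PV: expectation of primary reward at reward time
w_lv = 0.0                # LV: CS association, trained only at reward time

for trial in range(50):
    da_at_reward = 1.0 - w_pv        # dopamine ~ unpredicted reward
    w_pv += eps * (1.0 - w_pv)       # Rescorla-Wagner / delta rule
    w_lv += eps * (1.0 - w_lv)       # LV learning gated by reward delivery
    da_at_cs = w_lv                  # LV drives dopamine firing at CS onset

print(f"after training: DA at CS = {da_at_cs:.2f}, DA at reward = {da_at_reward:.2f}")
```

After training, the simulated dopamine response has migrated from the reward to the CS, the signature firing pattern the model is meant to explain.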
Abstract:
The basal ganglia (BG) coordinate decision making processes by facilitating adaptive frontal motor commands while suppressing others. In previous work, neural network simulations accounted for response selection deficits associated with BG dopamine depletion in Parkinson's disease. Novel predictions from this model have been subsequently confirmed in Parkinson patients and in healthy participants under pharmacological challenge. Nevertheless, one clear limitation of that model is in its omission of the subthalamic nucleus (STN), a key BG structure that participates in both motor and cognitive processes. The present model incorporates the STN and shows that by modulating when a response is executed, the STN reduces premature responding and therefore has substantial effects on which response is ultimately selected, particularly when there are multiple competing responses. The model accurately captures the dynamics of activity in various BG areas during response selection. Simulated dopamine depletion results in emergent oscillatory activity in BG structures, which has been linked with Parkinson's tremor. Finally, the model accounts for the beneficial effects of STN lesions on these oscillations, but suggests that this benefit may come at the expense of impaired decision making.
Abstract:
We test a neurocomputational model of dopamine function in cognition by administering to healthy participants low doses of D2 agents cabergoline and haloperidol. Our model suggests that dopamine dynamically modulates the balance of ``Go'' and ``NoGo'' basal ganglia pathways during cognitive learning and performance. Cabergoline impaired, while haloperidol enhanced, Go learning from positive reinforcement, consistent with presynaptic drug effects. Cabergoline also caused an overall bias toward Go responding, consistent with postsynaptic action. These same effects extended to working memory and attentional domains, supporting the idea that the basal ganglia / dopamine system modulates the updating of prefrontal representations. Drug effects interacted with baseline working memory span in all tasks. Taken together, our results support a unified account of the role of dopamine in modulating cognitive processes that depend on the basal ganglia.
Abstract:
We explore the division of labor between the basal ganglia (BG) / dopamine (DA) system and orbitofrontal cortex (OFC) in reinforcement learning and decision making. We show that a "primitive" neural network model of the BG/DA system learns to make decisions based on their relative likelihood of reinforcement, but that the same model fails when the magnitude of gains and losses is more relevant than their frequency of occurrence. An augmented model including OFC and amygdalar interactions with the BG system is more successful at estimating the true expected value of decisions, and is faster at learning to switch behavior when decision-outcome contingencies change. In our combined model, "Go" and "NoGo" BG pathways modulate the selection of premotor responses based on their probability of reinforcement, whereas medial and lateral OFC areas exert top-down control by representing reinforcement magnitudes in working memory. The model successfully captures patterns of behavior resulting from OFC damage in decision making, reversal learning, and devaluation paradigms, and makes additional predictions for the underlying source of these deficits.
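A two-option example (numbers invented for illustration) makes the frequency/magnitude dissociation concrete: a learner that tracks only how often each option pays off and a learner that tracks expected value disagree about which option is better.

```python
# Option A pays +1 with probability .8; option B pays +5 with probability .4.
p_a, m_a = 0.8, 1.0
p_b, m_b = 0.4, 5.0

print("reinforcement frequency:", p_a, "vs.", p_b)     # A "wins" more often
print("expected value:", p_a * m_a, "vs.", p_b * m_b)  # but B is worth more
```

The unaugmented BG/DA model behaves like the first learner and prefers A; the OFC extension supplies the magnitude information in working memory needed to prefer B.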
Abstract:
People often make logically sound decisions using explicit reasoning strategies, but sometimes it pays to rely on more implicit ``gut-level'' intuition. The transitive inference paradigm has been widely used as a test of explicit logical reasoning in animals and humans, but it can also be solved in a more implicit manner. Some have argued that the hippocampus supports relational memories required for making logical inferences. Here we show that the benzodiazepine midazolam, which inactivates the hippocampus, causes profound explicit memory deficits in healthy participants, but actually enhances their ability to make implicit transitive inferences. These results are consistent with neurocomputational models of the basal ganglia/dopamine system that learn to make decisions based on positive and negative reinforcement. We suggest that disengaging the hippocampal explicit memory system can be advantageous for this more implicit form of learning.
Abstract:
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and ``executive'' functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This paper presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia and amygdala, which together form an actor/critic architecture. The critic system learns which prefrontal representations are task-relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model's performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task, and other benchmark working memory tasks.
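A heavily stripped-down version of the gating idea (not the model in the paper, and far simpler than the 1-2-AX task) can be written as an actor/critic over store/ignore decisions: the critic's reward-prediction error trains a gating policy that learns to admit task-relevant items into memory and shut out distractors. All names and parameter values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gate_w = {"cue": 0.0, "distractor": 0.0}   # actor: gating propensities
v = 0.0                                    # critic: expected reward baseline
alpha, eta = 0.1, 0.3                      # critic / actor learning rates

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for trial in range(3000):
    memory, decisions = None, []
    for s in ("cue", "distractor"):        # a cue arrives, then a distractor
        p = sigmoid(gate_w[s])
        store = rng.random() < p
        decisions.append((s, store, p))
        if store:
            memory = s                     # gating in overwrites memory
    r = 1.0 if memory == "cue" else 0.0    # reward iff the cue is still held
    delta = r - v                          # critic's reward-prediction error
    v += alpha * delta
    for s, store, p in decisions:          # REINFORCE-style actor update
        grad = (1.0 - p) if store else -p  # d log(policy) / d weight
        gate_w[s] += eta * delta * grad

print({s: round(sigmoid(w), 2) for s, w in gate_w.items()})
```

The gating probabilities converge toward storing the cue and ignoring the distractor, which is the essence of solving the temporal and structural credit assignment problems jointly: the critic decides whether the trial went better than expected, and the actor learns which gating decisions were responsible.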
Abstract:
The prefrontal cortex (PFC) has long been thought to subserve both working memory and ``executive'' function, but the mechanistic basis of their integrated function has remained poorly understood, often amounting to a homunculus. This paper reviews the progress in our lab and others pursuing a long-term research agenda to deconstruct this homunculus by elucidating the precise computational and neural mechanisms underlying these phenomena. We outline six key functional demands underlying working memory, and then describe the current state of our computational model of the PFC and associated systems in the basal ganglia (BG). The model, called PBWM (prefrontal-cortex, basal-ganglia working memory model), relies on actively maintained representations in the PFC, which are dynamically updated/gated by the BG. It is capable of developing human-like performance largely on its own by taking advantage of powerful reinforcement learning mechanisms, based on the midbrain dopaminergic system and its activation via the BG and amygdala. These learning mechanisms enable the model to learn to control both itself and other brain areas in a strategic, task-appropriate manner. The model can learn challenging working memory tasks, and has been corroborated by several important empirical studies.
Abstract:
Transitive inference (TI) in animals (e.g., choosing A over C based on knowing that A is better than B and B is better than C) has been interpreted by some as reflecting a declarative, logical inference process. We invert this anthropomorphic interpretation by providing evidence that humans can exhibit TI-like behavior based on simpler associative mechanisms that underlie many theories of animal learning. In this study, human participants were trained on a five-pair TI problem (A+B-, B+C-, C+D-, D+E-, E+F-), and, unlike in previous human TI studies, were prevented from becoming explicitly aware of the logical hierarchy, so they could not employ logical reasoning. They were then tested with three problems: B vs. D, B vs. E, and C vs. E. Participants only reliably chose B over E, whereas the other test conditions yielded chance performance. This result is inconsistent with the use of logical reasoning, and is instead consistent with an account developed to explain earlier TI studies with rats that found the same pattern of results. In this account, choice performance is based on differential associative strengths across the stimulus items that develop over training, despite equal overt reinforcement.
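The associative account can be demonstrated with nothing more than a delta rule over chosen items plus softmax choice. The toy simulation below is our own illustration (parameters arbitrary, not fitted): it trains the five premise pairs and shows how B typically ends up with greater associative strength than E even though both receive equal overt reinforcement across pairs, because B is rarely selected, and hence rarely extinguished, in the A+B- pair once A is near ceiling.

```python
import numpy as np

rng = np.random.default_rng(7)
items = list("ABCDEF")
pairs = [("A","B"), ("B","C"), ("C","D"), ("D","E"), ("E","F")]  # first = S+
alpha, beta, n_trials, n_runs = 0.05, 3.0, 5000, 20

mean_v = np.zeros(len(items))
for _ in range(n_runs):
    v = dict.fromkeys(items, 0.5)                 # associative strengths
    for _ in range(n_trials):
        sp, sm = pairs[rng.integers(len(pairs))]
        p_sp = 1.0 / (1.0 + np.exp(-beta * (v[sp] - v[sm])))  # softmax choice
        choice = sp if rng.random() < p_sp else sm
        v[choice] += alpha * ((choice == sp) - v[choice])     # delta rule
    mean_v += np.array([v[s] for s in items]) / n_runs

print(dict(zip(items, np.round(mean_v, 2))))      # B typically exceeds E
```

Exact orderings among the middle items depend on parameters; the point is only that equal overt reinforcement does not imply equal associative strength, which is all the B-over-E preference requires.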
Abstract:
Dopamine (DA) depletion in the basal ganglia (BG) of Parkinson's patients gives rise to both frontal-like and implicit learning impairments. Dopaminergic medication alleviates some cognitive deficits but impairs those that depend on intact areas of the BG, apparently due to DA ``overdose''. These findings are difficult to accommodate with verbal theories of BG/DA function, owing to the complexity of the system dynamics: DA dynamically modulates function in the BG, which is itself a modulatory system. This paper presents a neural network model that instantiates key biological properties and provides insight into the underlying role of DA in the BG during learning and execution of cognitive tasks. Specifically, the BG modulates the execution of ``actions'' (e.g., motor responses and working memory updating) that are being considered in different parts of frontal cortex. Phasic changes in DA, which occur during error feedback, dynamically modulate the BG threshold for facilitating/suppressing a cortical command in response to particular stimuli. Reduced dynamic range of DA explains Parkinson and DA overdose deficits with a single underlying dysfunction, despite overall differences in raw DA levels. Simulated Parkinsonism and medication effects provide a theoretical basis for behavioral data in probabilistic classification and reversal tasks. The model also provides novel testable predictions for neuropsychological and pharmacological studies, and motivates further investigation of BG/DA interactions with prefrontal cortex in working memory.
Abstract:
The error-related negativity (ERN) is an electrophysiological marker thought to reflect changes in dopamine when participants make errors in cognitive tasks. Our computational model further predicts that larger ERNs should be associated with better learning to avoid maladaptive responses. Here we show that participants who avoided negative events had larger ERNs than those who were biased to learn more from positive outcomes. We also tested for effects of response conflict on ERN magnitude. While there was no overall effect of conflict, positive learners had larger ERNs when having to choose between two good options (``win/win'' decisions) compared with two bad options (``lose/lose'' decisions), whereas negative learners exhibited the opposite pattern. These results demonstrate that the ERN predicts the degree to which participants are biased to learn more from their mistakes than their correct choices, and clarify the extent to which it indexes decision conflict.
Abstract:
To what extent do we learn from the positive versus negative outcomes of our decisions? The neuromodulator dopamine plays a key role in these reinforcement learning processes. Patients with Parkinson's disease, who have depleted dopamine in the basal ganglia, are impaired in tasks that require learning from trial and error. Here we show, using two cognitive procedural learning tasks, that Parkinson's patients off medication are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes. Dopamine medication reverses this bias, making patients more sensitive to positive than negative outcomes. This pattern was predicted by our biologically-based computational model of basal ganglia/dopamine interactions in cognition, which has separate pathways for ``Go'' and ``NoGo'' responses that are differentially modulated by positive and negative reinforcement.
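The medication-dependent asymmetry can be caricatured with two scalar weights. In the sketch below (an illustration, not the published network), dopamine bursts train a Go weight and dips train a NoGo weight; DA depletion is modeled as a low ceiling on bursts and medication as a low ceiling on dips, with all values chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(5)

def train(da_ceiling, da_floor, alpha=0.2, n=300):
    """Learn Go/NoGo weights for a stimulus rewarded on 20% of trials.
    Bursts (positive prediction errors) are clipped by da_ceiling
    (depletion blunts them); dips (negative prediction errors) are
    clipped by da_floor (medication fills them in)."""
    go = nogo = 0.0
    for _ in range(n):
        r = float(rng.random() < 0.2)
        pe = r - (go - nogo)                    # reward prediction error
        burst = min(max(pe, 0.0), da_ceiling)   # D1-mediated Go learning
        dip = min(max(-pe, 0.0), da_floor)      # D2-mediated NoGo learning
        go += alpha * burst * (1 - go)
        nogo += alpha * dip * (1 - nogo)
    return go, nogo

print("off medication (blunted bursts), (go, nogo):", np.round(train(0.2, 1.0), 2))
print("on medication (blocked dips),   (go, nogo):", np.round(train(1.0, 0.2), 2))
```

With blunted bursts the NoGo weight dominates (better avoidance of negative outcomes); with blocked dips the Go weight dominates (better learning from positive outcomes), which is the reversal the patient data show.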
Abstract:
How do we produce complex motor sequences? To what extent do we learn from the positive versus negative consequences of our decisions? How do we maintain task-relevant information in working memory while ignoring distracting information? This dissertation provides a mechanistic framework that explores how these seemingly unrelated processes recruit remarkably similar neural circuits linking the basal ganglia (BG) with frontal cortex. Drawing from neuroanatomical and biochemical considerations, this framework suggests that the BG facilitate or suppress cortical ``actions'' (e.g., motor responses and working memory updating) via separate Go and NoGo pathways projecting to frontal cortex, and that the relative balance of these pathways is dynamically modulated by dopamine (DA). Transient DA bursts and dips during positive and negative reinforcement support Go and NoGo learning via D1 and D2 receptors, respectively. Computational neural network models instantiate key biological properties and provide insight into the underlying role of BG/DA interactions during the learning and execution of cognitive tasks. These models account for complex medication-dependent cognitive deficits in Parkinson's disease, and make simple predictions for the underlying source of these deficits, emphasizing the importance of the dynamic range of DA signals. These predictions have been subsequently confirmed in medicated and non-medicated Parkinson's patients and in healthy individuals under pharmacologically-induced DA manipulation. In all of these studies, elevated levels of phasic DA release led to greater Go learning from positive outcomes of decisions, whereas diminished DA levels led to better NoGo learning to avoid negative outcomes. Tonic DA stimulation led to more overall Go responding. These effects extended to higher level cognitive function: tonic DA stimulation led to more overall working memory updating and concomitant distractibility, whereas enhanced phasic DA release led to greater selective updating for task-relevant (i.e., ``positively-valenced'') information, but difficulty in ignoring this information in a subsequent set-shift. Drug effects also interacted with baseline working memory span. Taken together, these results provide substantial support for a unified account of the role of DA in modulating cognitive processes that depend on the basal ganglia.
Abstract:
We present a framework for understanding how the hippocampus, neocortex, and basal ganglia work together to support cognitive and behavioral function in the mammalian brain. This framework is based on computational tradeoffs that arise in neural network models, where achieving one type of learning function requires very different parameters from those necessary to achieve another form of learning. For example, we dissociate the hippocampus from cortex with respect to general levels of activity, learning rate, and level of overlap between activation patterns. Similarly, the frontal cortex and associated basal ganglia system have important neural specializations not required of the posterior cortex system. Taken together, this overall cognitive architecture, which has been implemented in functioning computational models, provides a rich and often subtle means of explaining a wide range of behavioral and cognitive neuroscience data. Here, we summarize recent results in the domains of recognition memory, contextual fear conditioning, effects of basal ganglia lesions on stimulus-response and place learning, and flexible responding.
Abstract:
Following training on a set of 4 ordered, simultaneous odor discrimination problems (A+B-; B+C-; C+D-; D+E-), intact rats display transitivity: when tested on the novel combination BD, they choose B. Rats with damage to the hippocampus, however, do not show transitivity (Dusek & Eichenbaum, 1997). These results have been interpreted as support for the idea that the hippocampus is a relational memory storage system that enables the subject to make comparisons among representations of the individual problems and choose based on inferential logic. We provide evidence for a simpler explanation: specifically, subjects make their choices based on the absolute excitatory value of the individual stimuli. This value determines the ability of that stimulus to attract a response. This conclusion emerged because, following training on a 5-problem set (A+B-; B+C-; C+D-; D+E-; E+F-), rats preferred B when tested with BE but not when tested with BD. The implication of these results for how to conceptualize the role of the hippocampus in transitive-like phenomena is discussed.
Abstract:
The frontal cortex and basal ganglia interact via a relatively well-understood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or ``releasing the brakes'' on frontal motor action plans --- the basal ganglia detect appropriate contexts for performing motor actions, and enable the frontal cortex to execute such actions at the appropriate time. We build on this idea in the domain of working memory through the use of computational neural network models of this circuit. In our model, the frontal cortex exhibits robust active maintenance, while the basal ganglia contribute a selective, dynamic gating function that enables frontal memory representations to be rapidly updated in a task-relevant manner. We apply the model to a novel version of the continuous performance task (CPT) that requires subroutine-like selective working memory updating, and compare and contrast our model with other existing models and theories of frontal cortex--basal ganglia interactions.