Optimizing rewards

Actor-critic algorithms from reinforcement learning (a branch of machine learning) use two artificial neural networks (ANNs).

  • The actor network is responsible for selecting an action given a state of the environment.
  • The critic network is responsible for evaluating the action selected by the actor network by learning the value of environment states.
  • Basal ganglia

    Our cortex and numerous subcortical nuclei (e.g. the basal ganglia) are responsible for learning from rewards and optimizing them.

  • In particular, these areas of the brain learn to act as a critic by predicting the rewards or punishments that follow our actions.
  • These are the areas that push us to act or to remain silent.
  • We also have metacognition: the ability to predict what will happen if we act in different ways.
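
To make the actor/critic split described at the top more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the two neural networks are replaced by small lookup tables, and the environment is a hypothetical four-state corridor in which only reaching the right-hand end is rewarded. The function name train, the learning rates, and the discount factor gamma are all choices made for this example. The critic learns a value for each state, and its prediction error (the gap between what it expected and what happened) is the signal that pushes the actor toward or away from the action it just took.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, alpha_actor=0.1, alpha_critic=0.1, gamma=0.9, seed=0):
    """Tabular actor-critic on a hypothetical 4-state corridor (assumed setup)."""
    rng = random.Random(seed)
    n_states, n_actions = 4, 2  # action 0 = step left, action 1 = step right
    prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor: action preferences
    values = [0.0] * n_states                             # critic: state values

    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            # Actor: select an action given the current state.
            probs = softmax(prefs[state])
            action = rng.choices(range(n_actions), weights=probs)[0]

            # Environment: move, reward only on reaching the last state.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0

            # Critic: compare the outcome with its own prediction.
            target = reward + (0.0 if done else gamma * values[next_state])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error

            # Actor: reinforce the chosen action if the outcome was better
            # than predicted, weaken it if the outcome was worse.
            for a in range(n_actions):
                grad = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += alpha_actor * td_error * grad

            if done:
                break
            state = next_state

    return prefs, values
```

Calling train() returns the learned preferences and state values. In this toy setting the actor should come to prefer stepping right in every non-terminal state, and the critic should assign higher values to states closer to the reward.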