Replies: 2 comments
Some initial thoughts and questions:
Happy to discuss more during sync or another call!
In your thinking, is this a hierarchy problem, a sequencing problem, or both?
This is a working draft of an idea for an agent focused on learning skills across multiple tasks. The idea is not complete, but I want others to see it as it evolves.
Our high-level goals in building this agent are:
Key design objectives
To reuse as much existing work as possible, we will start by building a hybrid agent that combines components from Deep RL and symbolic cognitive architectures.
The RL components will be from a popular Deep RL framework such as Acme or Ray RLlib, and include a
They may also include a meta-trainer for meta-RL, an experience dataset for offline learning, and multiple environments.
The cognitive architecture side will use the following components from Soar, ACT-R, or some other cognitive architecture or production system:
They may also include a declarative memory module and productions (i.e., symbolic policy rules).
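As a very rough illustration of how these pieces might fit together, here is a speculative sketch. Every name in it (HybridAgent, select_action, matches, fire, etc.) is hypothetical; neither the RL framework interface nor the production-rule interface has been chosen yet.

```python
# A speculative sketch of the hybrid agent's composition. All class and
# method names are hypothetical placeholders, not a committed design.
class HybridAgent:
    def __init__(self, rl_policy, declarative_memory, productions):
        self.rl_policy = rl_policy                    # e.g. an Acme or RLlib policy
        self.declarative_memory = declarative_memory  # symbolic facts/chunks
        self.productions = productions                # symbolic if-then rules

    def act(self, observation):
        # One possible arbitration scheme: productions take precedence
        # when they match; otherwise the learned Deep RL policy acts.
        for rule in self.productions:
            if rule.matches(observation, self.declarative_memory):
                return rule.fire(observation)
        return self.rl_policy.select_action(observation)
```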
Background
The idea for this approach to a general agent came from our observation that modern RL does not have an explicit abstraction for goal management.
Human cognitive control
Humans clearly maintain a high-level representation of many of their goals, and those representations are available to our executive control processes. For example, when prompted, we can describe what our plans are for the day, the year, or the rest of our lives. We can also describe "what we are currently doing" or "working on," usually in terms of a goal, objective, intention, or need.
Neuroscientists and cognitive scientists use the term cognitive control for the part of the human mind that manages goals and working memory and carries out conscious planning to achieve those goals, and they differentiate it from knowledge (declarative memory) and learned habits (procedural memory). Cognitive control (a.k.a. executive function) is most closely associated with the prefrontal cortex of the brain. There has been significant work to understand and model cognitive control (e.g., Badre's book On Task, Eliasmith's book and papers on Spaun and Nengo, and the work in symbolic cognitive architectures described below).
Goal management in Deep RL
Popular RL algorithms such as TD-learning train a policy to do well at a given task based on experience gathered through an observation-action-reward-observation loop. This is similar to operant conditioning in humans, which is how we learn unconscious responses, i.e., habits. The simplest form of Deep RL results in a DNN policy that performs well on a given task in a given environment (e.g., the Atari game Breakout). This is akin to a human learning to perform well at a task like touch typing by building up muscle memory through a lot of experience. In this type of learning, the goal is always to maximize reward, but no explicit semantic representation of the goal (what cognitive scientists call a "symbolic representation") exists.
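To make the loop itself concrete, here is a minimal sketch using the Gymnasium API, with a random policy standing in for a learned one (the environment name is just an example):

```python
# The observation-action-reward-observation loop, sketched with Gymnasium.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(1_000):
    action = env.action_space.sample()  # a trained DNN policy would act here
    next_obs, reward, terminated, truncated, info = env.step(action)
    # A TD-style learner would update its value estimates and policy from
    # (obs, action, reward, next_obs) at this point.
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```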
Multi-task RL and meta-RL explicitly define a task abstraction, which can be represented in a variety of ways (e.g., as a one-hot-encoded vector) and passed to a task-conditioned DNN (e.g., a policy network). Also, goal-conditioned RL allows a state from the environment's state space to be used as a goal, i.e., a target state that the agent tries to progress the environment toward by means of its actions.
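The mechanical core of both ideas is small enough to sketch (all names and dimensions below are made up for illustration): the task is a one-hot vector and the goal is a target state, and both are simply concatenated with the observation before being fed to the policy network.

```python
# Task- and goal-conditioning reduced to its mechanics. The "network" here
# is a placeholder linear map; a real agent would use a trained DNN.
import numpy as np

n_tasks, obs_dim, act_dim = 4, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(act_dim, obs_dim + n_tasks + obs_dim))  # stand-in weights

def one_hot(task_id: int) -> np.ndarray:
    vec = np.zeros(n_tasks)
    vec[task_id] = 1.0
    return vec

def conditioned_policy(obs, task_id, goal):
    # Observation, one-hot task id, and goal state are concatenated into
    # a single input vector for the (placeholder) policy network.
    return W @ np.concatenate([obs, one_hot(task_id), goal])

obs = rng.normal(size=obs_dim)   # current observation
goal = rng.normal(size=obs_dim)  # target state drawn from the state space
action_logits = conditioned_policy(obs, task_id=2, goal=goal)
```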
Some existing benchmarks and frameworks for multi-task RL include:
Goal management in cognitive architectures
In this section we will look at how goals are represented in two of the most mature and well-known cognitive architectures: ACT-R and Soar.
In ACT-R, goals are chunks, i.e., data structures that represent highly structured short-term memory. They are managed by the goal module, which has a goal buffer that allows only one goal to be active at a time in an agent (note that agents in ACT-R are called models). Other ACT-R modules (e.g., the procedural module) access the currently active goal by inspecting the goal buffer.
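To make the single-slot constraint concrete, here is an illustrative Python sketch (not actual ACT-R code; the chunk contents and names are invented) of a buffer that holds at most one goal chunk at a time:

```python
# An illustrative goal buffer: only one goal chunk is active at a time,
# and other modules read the active goal by inspecting the buffer.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Chunk:
    """A chunk: a named collection of slot-value pairs."""
    name: str
    slots: dict = field(default_factory=dict)

class GoalBuffer:
    def __init__(self) -> None:
        self._active: Optional[Chunk] = None

    def set_goal(self, chunk: Chunk) -> None:
        self._active = chunk  # a new goal displaces the previous one

    def inspect(self) -> Optional[Chunk]:
        # e.g. the procedural module matches its rules against this chunk
        return self._active

buffer = GoalBuffer()
buffer.set_goal(Chunk("count-up", {"current": 2, "target": 5}))
```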
In Soar, goals are...