Replies: 2 comments
Some initial thoughts and questions:
Happy to discuss more during sync or another call!
In your thinking, is this a hierarchy problem, a sequencing problem, or both?
This is a working draft of an idea for an agent focused on learning skills across multiple tasks. The idea is not complete, but I want others to see it as it evolves.
Our high-level goals in building this agent are:
Key design objectives
To reuse as much existing work as possible, we will start by building a hybrid agent that combines components from Deep RL and symbolic cognitive architectures.
The RL components will be from a popular Deep RL framework such as Acme or Ray RLlib, and include a
They may also include a meta-trainer for meta-RL, an experience dataset for offline learning, and multiple environments.
The cognitive architecture side will use the following components from Soar, ACT-R, or some other cognitive architecture or production system:
They may also include a declarative memory module and productions (i.e., symbolic policy rules).
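As a very rough illustration of how these pieces might fit together, here is a speculative sketch. Every name in it (HybridAgent, select_action, matches, fire, etc.) is hypothetical; neither the RL framework interface nor the production-rule interface has been chosen yet.

```python
# A speculative sketch of the hybrid agent's composition. All class and
# method names are hypothetical placeholders, not a committed design.
class HybridAgent:
    def __init__(self, rl_policy, declarative_memory, productions):
        self.rl_policy = rl_policy                    # e.g. an Acme or RLlib policy
        self.declarative_memory = declarative_memory  # symbolic facts/chunks
        self.productions = productions                # symbolic if-then rules

    def act(self, observation):
        # One possible arbitration scheme: productions take precedence
        # when they match; otherwise the learned Deep RL policy acts.
        for rule in self.productions:
            if rule.matches(observation, self.declarative_memory):
                return rule.fire(observation)
        return self.rl_policy.select_action(observation)
```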
Background
The idea for this approach to a general agent came from our observation that modern RL does not have an explicit abstraction for goal management.
Human cognitive control
Humans clearly maintain a high-level representation of many of their goals, and those representations are available to our executive control processes. For example, when prompted, we can describe what our plans are for the day, the year, or the rest of our lives. We can also describe "what we are currently doing" or "working on," usually in terms of a goal, objective, intention, or need.
Neuroscientists and cognitive scientists use the term cognitive control for the part of the human mind that manages goals and working memory and carries out conscious planning to achieve those goals, and they differentiate it from knowledge (declarative memory) and learned habits (procedural memory). Cognitive control (a.k.a. executive function) is most closely associated with the prefrontal cortex of the brain. There has been significant work to understand and model cognitive control (e.g., Badre's book On Task, Eliasmith's book and papers on Spaun and Nengo, and the work in symbolic cognitive architectures described below).
Goal management in Deep RL
Popular RL algorithms such as TD-learning train a policy to do well at a given task based on experience gathered through an observation-action-reward-observation loop. This is similar to operant conditioning in humans, which is how we learn unconscious responses, i.e., habits. The simplest form of Deep RL results in a DNN policy that performs well on a given task in a given environment (e.g., the Atari game Breakout). This is akin to a human learning to perform well at a task like touch typing by building up muscle memory through a lot of experience. In this type of learning, the goal is always to maximize reward, but no explicit semantic representation of the goal (what cognitive scientists call a "symbolic representation") exists.
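To make the loop itself concrete, here is a minimal sketch using the Gymnasium API, with a random policy standing in for a learned one (the environment name is just an example):

```python
# The observation-action-reward-observation loop, sketched with Gymnasium.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(1_000):
    action = env.action_space.sample()  # a trained DNN policy would act here
    next_obs, reward, terminated, truncated, info = env.step(action)
    # A TD-style learner would update its value estimates and policy from
    # (obs, action, reward, next_obs) at this point.
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```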
Multi-task RL and meta-RL explicitly define a task abstraction, which can be represented in a variety of ways (e.g., as a one-hot-encoded vector) and passed to a task-conditioned DNN (e.g., a policy network). Also, goal-conditioned RL allows a state from the environment's state space to be used as a goal, i.e., a target state that the agent tries to progress the environment toward by means of its actions.
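The mechanical core of both ideas is small enough to sketch (all names and dimensions below are made up for illustration): the task is a one-hot vector and the goal is a target state, and both are simply concatenated with the observation before being fed to the policy network.

```python
# Task- and goal-conditioning reduced to its mechanics. The "network" here
# is a placeholder linear map; a real agent would use a trained DNN.
import numpy as np

n_tasks, obs_dim, act_dim = 4, 8, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(act_dim, obs_dim + n_tasks + obs_dim))  # stand-in weights

def one_hot(task_id: int) -> np.ndarray:
    vec = np.zeros(n_tasks)
    vec[task_id] = 1.0
    return vec

def conditioned_policy(obs, task_id, goal):
    # Observation, one-hot task id, and goal state are concatenated into
    # a single input vector for the (placeholder) policy network.
    return W @ np.concatenate([obs, one_hot(task_id), goal])

obs = rng.normal(size=obs_dim)   # current observation
goal = rng.normal(size=obs_dim)  # target state drawn from the state space
action_logits = conditioned_policy(obs, task_id=2, goal=goal)
```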
Some existing benchmarks and frameworks for multi-task RL include:
Goal management in cognitive architectures
In this section we will look at how goals are represented in two of the most mature and well-known cognitive architectures: ACT-R and Soar.
In ACT-R, goals are chunks, i.e., data structures that represent highly structured short-term memory. They are managed by the goal module, which has a goal buffer that allows only one goal to be active at a time in an agent (note that agents in ACT-R are called models). Other ACT-R modules (e.g., the procedural module) access the currently active goal by inspecting the goal buffer.
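To make the single-slot constraint concrete, here is an illustrative Python sketch (not actual ACT-R code; the chunk contents and names are invented) of a buffer that holds at most one goal chunk at a time:

```python
# An illustrative goal buffer: only one goal chunk is active at a time,
# and other modules read the active goal by inspecting the buffer.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Chunk:
    """A chunk: a named collection of slot-value pairs."""
    name: str
    slots: dict = field(default_factory=dict)

class GoalBuffer:
    def __init__(self) -> None:
        self._active: Optional[Chunk] = None

    def set_goal(self, chunk: Chunk) -> None:
        self._active = chunk  # a new goal displaces the previous one

    def inspect(self) -> Optional[Chunk]:
        # e.g. the procedural module matches its rules against this chunk
        return self._active

buffer = GoalBuffer()
buffer.set_goal(Chunk("count-up", {"current": 2, "target": 5}))
```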
In Soar, goals are...