Agent Workflow Memory

by Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig

Key Points

This paper introduces Agent Workflow Memory (AWM), a method that enables language model agents to learn and reuse task workflows, improving their performance on complex, long-horizon tasks. AWM induces routines from past experiences and selectively guides the agent, boosting success rates on web navigation benchmarks.

Expected outcomes:

  • Improve task completion rates on complex tasks.
  • Reduce the number of steps needed to solve tasks.
  • Enhance generalization across different tasks and domains.

Core Content:

1. Workflow Induction and Integration:

  • AWM induces workflows by extracting reusable routines from agent trajectories.
  • These workflows are integrated into the agent's memory to guide future task-solving.
  • Each workflow pairs a goal with a common routine distilled from the agent's action trajectories.

Explanation:

  • AWM uses an induction module to identify frequently used sequences of actions (workflows) from the agent's past attempts at solving tasks.
  • These workflows are added to the agent's memory.
  • During future tasks, the agent consults its memory of workflows to guide its actions.
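The induce-store-consult loop above can be sketched in a few lines. This is a minimal illustration, not AWM's actual induction module (which is LM-based, as described later): here "induction" is approximated by counting action sub-sequences that recur across past trajectories, and all names (`Workflow`, `induce_workflows`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Workflow:
    goal: str
    steps: tuple  # reusable sequence of abstract actions

def induce_workflows(trajectories, min_support=2, length=2):
    """Extract action sub-sequences that recur in at least
    `min_support` trajectories (a crude stand-in for induction)."""
    counts = Counter()
    for goal, actions in trajectories:
        seen = set()
        for i in range(len(actions) - length + 1):
            gram = tuple(actions[i:i + length])
            if gram not in seen:  # count each sub-sequence once per trajectory
                counts[gram] += 1
                seen.add(gram)
    return [Workflow(goal="reusable routine", steps=g)
            for g, c in counts.items() if c >= min_support]

# Past trajectories: (task goal, actions the agent took).
trajectories = [
    ("buy item", ["search", "click_result", "add_to_cart", "checkout"]),
    ("buy gift", ["search", "click_result", "add_to_cart", "checkout"]),
    ("find price", ["search", "click_result", "read_price"]),
]

# Workflow memory the agent would consult on future tasks.
memory = induce_workflows(trajectories)
for wf in memory:
    print(wf.steps)
```

Only the routines shared by multiple trajectories (search-and-click, add-and-checkout) survive induction; the one-off `read_price` step does not.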

2. Offline vs. Online Scenarios:

  • AWM operates in both offline and online scenarios.
  • Offline: Workflows are extracted from annotated training examples and used at test time.
  • Online: Workflows are induced from test queries on the fly in a supervision-free setting.

Explanation:

  • In the offline setting, the agent learns workflows from a pre-existing dataset of solved tasks before encountering new, unseen tasks.
  • In the online setting, the agent learns and adapts its workflows as it encounters new tasks, without relying on a pre-existing dataset.
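The two regimes differ only in where induction happens relative to test time. The sketch below is illustrative: `induce`, `solve`, and `judge_success` stand in for AWM's induction module, the agent, and its self-evaluation of whether a trajectory succeeded; none of these names come from the paper.

```python
def offline_awm(train_set, test_queries, induce, solve):
    """Offline: induce workflows once from annotated examples,
    then reuse the fixed memory on every test query."""
    memory = []
    for goal, gold_actions in train_set:
        memory.extend(induce(goal, gold_actions))
    return [solve(q, memory) for q in test_queries]

def online_awm(test_queries, induce, solve, judge_success):
    """Online: no training data; memory grows from the agent's own
    (self-judged successful) trajectories as queries stream in."""
    memory, results = [], []
    for q in test_queries:
        trajectory = solve(q, memory)
        if judge_success(q, trajectory):       # keep only likely-correct runs
            memory.extend(induce(q, trajectory))
        results.append(trajectory)
    return results
```

The key contrast: offline memory is frozen before test time, while online memory is updated after each query, so later queries benefit from earlier ones.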

3. LM-based Workflow Induction:

  • AWM uses an LM-based induction module that prompts a language model to extract common sub-routines.
  • The model abstracts example-specific contexts to enhance workflow generality.
  • Workflows are segmented and stored in the workflow memory.

Explanation:

  • The language model is prompted to identify and extract common action sequences (sub-routines) from the agent's experiences.
  • The LM generalizes these workflows by replacing specific details with more general terms.
  • The generalized workflows are stored for future use.
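To make the abstraction step concrete: the real AWM prompts an LM to generalize, but the effect can be sketched with simple pattern substitution. The action syntax (`click [id]`, `type [id] "text"`) and the regexes here are illustrative stand-ins, not the paper's format.

```python
import re

def abstract_action(action: str) -> str:
    """Replace example-specific values with placeholders so a recorded
    action generalizes beyond the page it was observed on."""
    # Quoted literals (e.g. typed search terms) become a generic value slot.
    action = re.sub(r'"[^"]*"', '"{value}"', action)
    # Concrete numeric element ids become a generic id slot.
    action = re.sub(r"\b\d+\b", "{id}", action)
    return action

# A concrete trajectory from one specific page...
concrete = ['click [1582]', 'type [904] "blue running shoes"', 'click [77]']
# ...becomes a reusable workflow after abstraction.
workflow = [abstract_action(a) for a in concrete]
print(workflow)
# ['click [{id}]', 'type [{id}] "{value}"', 'click [{id}]']
```

After abstraction, the same search-and-click routine applies to any product query, not just "blue running shoes".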

4. WebArena Evaluation:

  • AWM improves over the BrowserGym baseline by 51.1% relative success rate.
  • It outperforms methods with human expert-written workflows by 7.6%.
  • AWM reduces the average number of steps taken to solve a task.

Explanation:

  • AWM significantly improves task success rates compared to existing methods.
  • AWM achieves better performance than methods that rely on human-designed workflows.
  • AWM solves tasks more efficiently, requiring fewer steps than other approaches.
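Since the summary reports improvements in *relative* terms, a quick numeric note on what that means (the absolute rates below are placeholders, not figures from the paper):

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain: (improved - baseline) / baseline."""
    return (improved - baseline) / baseline

# Hypothetical: a baseline solving 20% of tasks vs. a method solving 30%
# is a 10-point absolute gain but a 50% relative improvement.
print(f"{relative_improvement(0.20, 0.30):.1%}")
```

A 51.1% relative gain therefore does not mean AWM solves 51.1% more of all tasks in absolute terms; it means its success rate is about 1.5 times the baseline's.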

5. Mind2Web Evaluation:

  • AWM improves cross-task results by 24.6% in relative step-wise success rate.
  • It demonstrates generalization across tasks, websites, and domains.
  • AWM scores 8.9 – 14.0 absolute points higher than the baseline as the train-test distribution gap widens.

Explanation:

  • AWM shows strong performance in scenarios where the agent needs to adapt to new tasks.
  • AWM generalizes well to different websites and domains, even when there are significant differences between the training and testing data.

Q&A:

Q: What is the key idea behind Agent Workflow Memory (AWM)?

A: AWM allows agents to learn and reuse common task routines (workflows) from their past experiences, enabling them to solve complex tasks more efficiently and generalize better across different scenarios.

Q: How does AWM operate in offline versus online settings?

A: In the offline setting, AWM learns workflows from a pre-existing dataset of solved tasks. In the online setting, AWM learns and adapts its workflows as it encounters new tasks, without relying on a pre-existing dataset.

Q: What are the benefits of using LM-based workflow induction?

A: LM-based workflow induction allows the agent to extract common sub-routines from its experiences and generalize them by abstracting example-specific contexts, leading to more flexible and reusable workflows.

Q: How does AWM compare to methods that use human expert-written workflows?

A: AWM can outperform methods that rely on human expert-written workflows, demonstrating its ability to learn and adapt to new tasks without human supervision.

Target Audience

The primary target audience includes researchers and practitioners in the fields of natural language processing, machine learning, and artificial intelligence, particularly those interested in developing more robust and adaptable language model-based agents for real-world applications. The paper is also relevant to individuals working on web navigation, task automation, and human-computer interaction.

Author Background

The authors are affiliated with Carnegie Mellon University and Massachusetts Institute of Technology. Their expertise lies in the field of language models and their application in solving real-world tasks.

Historical Context

The research is situated in the context of rapidly advancing language model-based agents and their increasing application in digital tasks. It addresses the limitations of current agents in handling complex tasks and adapting to changing environments, drawing inspiration from human problem-solving strategies.
