Agents are an emerging class of artificial intelligence (AI) systems that use large language models (LLMs) to interact with the world. In the ‘Towards AGI’ series, we aim to explore the future of Agents. However, before we delve into the future, let’s first revisit the past.
Here is a brief diagram that captures the evolution of agents:
Prompt chaining is not a new concept. It is the underlying technology behind traditional agents and chatbots, which often required handcrafted rules, making it hard to adapt to new environments. Modern agents, however, can access a suite of tools depending on the user’s input and demands. These agents use the common-sense priors present in LLMs to adapt to novel tasks and answer user queries in environments where pre-determined chains don’t exist.
Despite the hype around Agents in 2023, they are yet to become part of our day-to-day lives. But mark my words: “2024 is going to go down in history as the year of Agents.” Why am I so confident? Because Agents have already made the leap from concept to reality; we are now in the phase of scaling them from early adoption to widespread use.
With the launch of GPT-3.5, LLMs became powerful enough for decision-making (choosing actions from a given action space), which is the core capability of Agents. Now, we are making progress on two fronts:
- the core reasoning capabilities are advancing due to improved models; and
- the agent designs are becoming more refined and ready for production with the introduction of new foundational building blocks beyond just decision-making.
In this article, we will focus on one such building block – Memory (also known as State).
Memory is about adding state to stateless systems
LLMs in their current form are stateless: they do not retain information about the user’s previous interactions. Agents, however, store previous interactions in variables and use them in subsequent LLM calls, so agents are stateful and have memory. The catch is that most agents follow a simple design pattern in which the entire history of previous interactions is passed to the LLM on every call (a minimal sketch of this pattern follows the list below). This simplistic design ensures there is no information loss, but it also has multiple limitations:
- Memory is limited by the context window size of the LLM
- Context pollution deteriorates the output quality of the LLM
- No ability to synthesize deeper insights on top of raw observations
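To make this concrete, here is a minimal sketch of the naive pattern, assuming a placeholder `call_llm` function in place of whatever chat-completion API you actually use:

```python
# Naive stateful agent: memory is just the full message history,
# replayed into every call. `call_llm` is a hypothetical placeholder.

def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion API here")

history: list[dict] = []  # the agent's entire "memory"

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = call_llm(history)  # the FULL history is sent on every call
    history.append({"role": "assistant", "content": reply})
    return reply
```

This is lossless, but the history grows without bound, which is exactly where the limitations above come from.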
Thus, agents need to judiciously use the context window for better performance, and at the same time, they need a larger context window to store information in a lossless way. This trade-off can be resolved by dividing the memory into two parts:
- Short-Term Memory (STM): the Main Context that is fed to the LLM at runtime. Its size is limited by the context window length of the LLM.
- Long-Term Memory (LTM): the External Context, which is stored on disk. Before every LLM call, the agent retrieves relevant information from the LTM and uses it to edit the STM; after the call, it writes relevant information from the output back into the LTM. A rough sketch of this cycle follows below.
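Here is one way that decision cycle could look, sketched with a toy keyword retriever and a hypothetical `call_llm` placeholder rather than any specific framework’s API:

```python
# One decision cycle with the STM/LTM split. The LTM here is just a
# Python list with naive keyword retrieval; `call_llm` is a placeholder.

PROMPT_TEMPLATE = (
    "Relevant memories:\n{retrieved}\n\n"
    "User: {user_input}\nAssistant:"
)

long_term_memory: list[str] = ["The user's name is Alice."]  # toy "on-disk" LTM

def retrieve(query: str, top_k: int = 3) -> list[str]:
    # Naive retrieval: rank memories by word overlap with the query.
    words = set(query.lower().split())
    scored = [(len(words & set(m.lower().split())), m) for m in long_term_memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

def decision_cycle(user_input: str) -> str:
    # 1. Read: pull only the relevant pieces of LTM into the short-term memory.
    stm = PROMPT_TEMPLATE.format(
        retrieved="\n".join(retrieve(user_input)), user_input=user_input
    )
    # 2. Think: the LLM only ever sees the bounded STM, never the whole LTM.
    output = call_llm(stm)
    # 3. Write: persist anything worth remembering back into the LTM.
    long_term_memory.append(f"Assistant said: {output}")
    return output
```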
This solution is similar to how any modern computer manages its memory. This analogy is better captured in the following table:
|  | Agents | Computers |
| --- | --- | --- |
| Short-Term memory | Context Window | RAM |
| Long-Term memory | VectorDB, GraphDB, RelationalDB, Files and Folders | SSD, External Hard Drives, Cloud Storage |
Deep dive into various types of Agent Memory
- STM: Working memory (LLM Context): A data structure with multiple parts, usually represented by a prompt template and its variables. Before each LLM call, the STM is synthesized by filling the template’s variables with information retrieved from the LTM. It includes:
- Perceptual inputs: Observation (aka Grounding) from previous tool calls
- Active knowledge: generated by reasoning or retrieved from long-term memory
- Other core information carried over from the previous decision cycle (e.g., agent’s active goals).
- LTM type 1: Episodic memory (aka Raw memory): Stores the ground truth of all actions, their outputs (observations), and the reasoning (thoughts) behind those actions. During the planning stage of a decision cycle, these episodes may be retrieved into working memory to support reasoning. It can be stored in relational DBs and files (a toy sketch of episodic and semantic stores follows this list). It can consist of:
- Input-Output pairs of tools called by the agent during the current run
- History event flows (see Memory Stream in Generative Agents paper)
- Game trajectories from previous episodes
- LTM type 2: Semantic memory (aka Reflections): Stores the agent’s knowledge about the world and itself. Semantic memory is usually initialized from an external database for knowledge support, but it can also be learned by deriving insights from raw observations (see Reflections in the Generative Agents paper). Some examples:
- RAG: Leveraging game manuals and facts as a semantic memory to affect the policy or using internal docs like HR Policy to answer questions.
- RAG-based In-context Learning: A vectorDB stores recipes for doing some known tasks which can be used as in-context examples while solving newer tasks (Example: Langchain’s Extending the SQL toolkit).
- Self-learning from user inputs: For example, MemGPT uses archival storage to store facts, experiences, preferences of the user, etc.
- Self-learning from environment interactions: Reflexion (Shinn et al., 2023) uses an LLM to reflect on failed episodes and stores the results (e.g., “there is no dishwasher in the kitchen”).
- LTM type 3: Procedural memory: Represents the agent’s procedures for thinking, acting, decision-making, etc. Unlike episodic or semantic memory, which may initially be empty or even absent, procedural memory must be initialized by the designer with proper code to bootstrap the agent. It comes in two types:
- implicit knowledge stored in the LLM weights
- explicit knowledge written in the agent’s code, which can be further divided into two types:
- procedures that implement actions (reasoning, retrieval, grounding, and learning)
- procedures that implement decision-making itself
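To ground the first two LTM types, here is a toy sketch (assuming a hypothetical `call_llm` helper): the episodic store is an append-only log of tool calls, and the semantic store holds reflections distilled from those episodes. Procedural memory, by contrast, is simply the code that implements this loop.

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

@dataclass
class Episode:
    thought: str      # the reasoning behind the action
    action: str       # e.g. the tool name and its inputs
    observation: str  # the tool's output

@dataclass
class LongTermMemory:
    episodes: list[Episode] = field(default_factory=list)  # episodic (raw) memory
    reflections: list[str] = field(default_factory=list)   # semantic memory

    def log(self, thought: str, action: str, observation: str) -> None:
        # Episodic memory: append the ground truth of what happened.
        self.episodes.append(Episode(thought, action, observation))

    def reflect(self) -> None:
        # Semantic memory: ask the LLM to distill durable insights from the
        # raw episodes (in the spirit of Reflexion / Generative Agents).
        transcript = json.dumps([e.__dict__ for e in self.episodes], indent=2)
        insight = call_llm(
            f"What general lessons follow from these episodes?\n{transcript}"
        )
        self.reflections.append(insight)

ltm = LongTermMemory()
ltm.log(
    thought="The dishes need cleaning; try the dishwasher.",
    action="use_appliance('dishwasher')",
    observation="Error: no dishwasher found in the kitchen.",
)
# ltm.reflect()  # would store e.g. "there is no dishwasher in the kitchen"
```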
Choosing the right Memory design in Production
Since agents are powered by LLMs, they are inherently probabilistic. It is therefore important to keep the agent design as simple as possible; for instance, if an application is simple enough, the agent should be able to perform without any Semantic Memory component. In this section, let’s look at how the memory design depends on some sample end-use cases.
Use-case 1: Role-play with the user as a friend or assistant
This is one of the most popular use cases, and MemGPT is a great example of it. The main KPI for agents in this setup is to remember facts that the user reveals during normal conversation. So you’ll need an Episodic memory component to store all the conversations, and a Semantic memory component to store the user’s details and preferences, populated by extracting insights from the raw conversations.
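A hedged sketch of that split (the `call_llm` helper and the extraction prompt below are illustrative assumptions, not MemGPT’s actual implementation):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

episodic: list[str] = []  # raw conversation turns
semantic: list[str] = []  # distilled facts and preferences about the user

def on_user_message(message: str) -> None:
    # Episodic memory: keep the raw conversation.
    episodic.append(f"user: {message}")
    # Semantic memory: distill lasting facts and preferences from the raw turn.
    facts = call_llm(
        "List any lasting facts or preferences the user revealed here, "
        "one per line (or 'none'):\n" + message
    )
    if facts.strip().lower() != "none":
        semantic.extend(line for line in facts.splitlines() if line.strip())
```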
Use-case 2: Interacting with the user for customer support
In this setup, let’s assume there are humans in the backend resolving customer queries, while the responsibility for communicating with the user is delegated to an AI agent. In a professional setup like this, the agent is responsible for extracting tasks from conversations and passing them to an employee; once the employee completes a task, the agent conveys the output to the user. This use case is quite transactional, so the Semantic memory does not need to focus on user preferences. Instead, it should maintain a task list and update the status of each task as human support agents do the work.
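In a case like this, a minimal task ledger (field names and statuses here are illustrative) may be all the semantic memory the agent needs:

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    assignee: str
    status: str = "open"  # open -> in_progress -> done

# Semantic memory here is just the ledger of tasks extracted from conversations.
task_ledger: list[Task] = []

def add_task(description: str, assignee: str) -> Task:
    task = Task(description, assignee)
    task_ledger.append(task)
    return task

def complete_task(task: Task, result: str) -> str:
    task.status = "done"
    # The agent would relay `result` back to the customer at this point.
    return f"Update on '{task.description}': {result}"
```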
Use-case 3: Task-execution expert that interacts with tools
Of the three use cases, this is the most transactional one; AutoGPT and SuperAGI are the archetypal examples. Here the agents are given a goal and need to achieve it by calling tools. The episodic memory will not store chat history with the user; instead, it will store the tool-call history (inputs and outputs). The semantic memory could be a VectorDB storing recipes for basic tasks. Any new task will likely be a combination of basic tasks, so we can use RAG to find the top n basic tasks relevant to the current task and use their solutions as in-context examples to solve it.
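A rough sketch of that retrieval step, using a toy overlap score in place of a real vector database and a hypothetical `call_llm` placeholder:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

# Semantic memory: recipes for basic tasks the agent already knows how to do.
recipes = {
    "send an email": "1. open mail tool 2. fill recipient, subject, body 3. send",
    "create a calendar event": "1. open calendar tool 2. set title and time 3. save",
}

def top_n_recipes(task: str, n: int = 2) -> list[str]:
    # Stand-in for vector search: rank recipes by word overlap with the task.
    words = set(task.lower().split())
    ranked = sorted(
        recipes.items(),
        key=lambda kv: len(words & set(kv[0].split())),
        reverse=True,
    )
    return [f"Task: {name}\nRecipe: {steps}" for name, steps in ranked[:n]]

def solve(task: str) -> str:
    # The retrieved recipes become in-context examples for the new task.
    examples = "\n\n".join(top_n_recipes(task))
    return call_llm(f"{examples}\n\nNow plan the steps for: {task}")
```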
Conclusion & Next Steps
In this blog, we saw that design choices for Memory depend on the end use case. But we haven’t gone deeper into how we interact with memory: there are nuances to reading from memory (Retrieval) and writing into memory (Learning). In fact, Retrieval and Learning are just two types of actions in an agent’s overall action space. So stay tuned for the next articles, in which we will go deeper into the Action Space (Retrieval, Learning, Reasoning, and Grounding).