There are many choices to make when designing the architecture for an LLM application: should you create a RAG flow, an agent, or perhaps a multi-agent architecture? This post gives an overview of the different types of LLM agents and how, why, and whether they should be used. Is the technology mature enough for production systems? What are the challenges, and can they be overcome?

An agent can be used when a task is too complex to be solved by a single recipe: when there are multiple reasoning steps to solve the problem, and the steps are not always the same. Agents are relevant when they are placed in a dynamic environment, where the agent needs to learn from its surroundings and adapt to them.

“An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.” - Franklin and Graesser (1997)

Agents try to mimic human intelligence with planning, memory, and reasoning. To do so, an agent needs the following components:

  • Profiling, to define the role of the agent
  • Memory, to store and use information
  • Planning, to plan the actions
  • Action, to act on the environment

The memory and planning modules enable the agent to learn and adapt to a dynamic environment. The agent can also use its memory to store information about the environment and use it to make decisions.

Agents aim at the problem of artificial general intelligence (AGI) and might not be the best choice for a specific task, where a recipe can be used.

An overview of different approaches to an LLM app. The figure is for illustrative purposes only. [image by author]

The components of an LLM agent

Let’s take a closer look at the different components of an LLM autonomous agent. There are many different agent architectures; take a look at the great survey paper https://arxiv.org/abs/2308.11432 for an overview and comparison. Much of the content in the following sections is inspired by this paper.

Understanding how agents work is useful even if you will not use them, since it can help you design better RAG-based systems. In a RAG system you can implement parts of the agent architecture, like the planning or memory module, to make the system more dynamic and adaptive.

Profiling

The profile is the role of the agent, such as supervisor, teacher, domain expert, or writer. The profile can contain information like age, career, education, etc. It influences what the agent remembers and how it plans. It is the foundation for the agent design, as it influences all the other components of the agent.

Planning

There are different strategies for planning in LLM agents.

Planning without feedback

Planning without feedback uses no external signals to influence future actions. Examples include Chain of Thought (CoT) for single-path reasoning, where the LLM divides the task into a series of steps and follows them. Then there are self-consistency with CoT (CoT-SC), Tree of Thoughts (ToT), and Graph of Thoughts (GoT), which are multiple-path reasoning methods: they generate multiple reasoning paths and then select the most frequent or most suitable option.
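To make the multiple-path idea concrete, here is a minimal sketch of CoT-SC-style self-consistency: sample several reasoning traces at a non-zero temperature and majority-vote on the final answer. The `llm_complete` function is a hypothetical stand-in for your provider's API, and the `Answer:` convention is an assumption about the prompt format.

```python
from collections import Counter

def llm_complete(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM call -- replace with your provider's API."""
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a reasoning trace, e.g. after 'Answer:'."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def cot_self_consistency(question: str, n_paths: int = 5) -> str:
    # Sample several independent chain-of-thought traces at temperature > 0 ...
    prompt = f"{question}\nLet's think step by step."
    answers = [extract_answer(llm_complete(prompt)) for _ in range(n_paths)]
    # ... and return the most frequent final answer (majority vote).
    return Counter(answers).most_common(1)[0][0]
```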

Planning with feedback

This strategy uses external or internal signals to revise or improve the agent’s actions. It can be further divided into three sub-strategies:

Environmental feedback: The agent obtains feedback from the objective world or a virtual environment, such as task completion signals or scene observations. For example, ReAct and Voyager use environmental feedback to adapt their plans to the current situation.

Human feedback: The agent interacts with humans to obtain subjective feedback that can align with human values and preferences. For example, Inner Monologue and ChatCoT use human feedback to enhance their reasoning and action-taking processes.

Model feedback: The agent uses pre-trained models to provide feedback on its own actions and outputs. For example, Reflexion and SelfCheck use LLMs to generate verbal feedback that can help the agent correct errors or improve quality.

To understand the planning methods better, let’s take a deeper look at the widely used ReAct method.

ReAct prompting

The agent follows a three-step loop: thought, act, and observation.

  • Thoughts: In the thought step, the agent generates a natural language sentence that expresses its intention or plan for the next action.
  • Actions: In the act step, the agent executes the action by either responding to the user or calling an external function or API.
  • Observations: In the observation step, the agent receives feedback from the environment or the user and updates its internal state accordingly.

The agent repeats this loop until the task is completed or an exception occurs. And how does it do that? It is actually just a well-crafted prompt that guides the agent through the loop. If you find the prompt and read it, for instance here, you can see how it is constructed.
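As a rough illustration of that loop, here is a minimal, hypothetical sketch. It assumes an `llm_complete(prompt, stop=...)` function for your provider, and the `search` tool is a toy placeholder; real implementations (such as LangChain's) are considerably more robust.

```python
import re

TOOLS = {"search": lambda q: f"(search results for {q!r})"}  # toy action space

REACT_PROMPT = """Answer the question. You may use tools.
Use this format:
Thought: reason about what to do next
Action: tool_name[input]
Observation: tool result (provided by the system)
... repeat Thought/Action/Observation as needed ...
Final Answer: the answer

Question: {question}
"""

def react_loop(question: str, llm_complete, max_steps: int = 5) -> str:
    transcript = REACT_PROMPT.format(question=question)
    for _ in range(max_steps):
        # Stop generation at "Observation:" so the model cannot invent tool output.
        step = llm_complete(transcript, stop=["Observation:"])
        transcript += step
        if "Final Answer:" in step:
            return step.rsplit("Final Answer:", 1)[-1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            tool, arg = match.groups()
            observation = TOOLS.get(tool, lambda a: "unknown tool")(arg)
            transcript += f"\nObservation: {observation}\n"
    return "Agent stopped: step limit reached."
```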

Memory

The memory module stores and retrieves information that the agent needs to perform its tasks. It can be divided into short-term memory and long-term memory inspired by human memory systems.

Short-term memory: the context provided, keeping track of a state and serving it intelligently in the prompts for the agent. This is the in-context chat history.

Long-term memory: this is the knowledge base. To use knowledge, the LLM can use memory reading to extract useful information from memory, or memory reflection to summarize and infer high-level information from it. Long-term memory can be realized with an external vector store, which can be updated with a memory-writing operation when users provide new useful information.
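Here is a small sketch of what memory writing and memory reading over a vector store can look like. The `toy_embed` function is a deliberately crude stand-in for a real embedding model, and the `LongTermMemory` class is an illustrative assumption, not an API from any of the frameworks discussed below.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashing-based embedding for illustration -- use a real embedding model in practice."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class LongTermMemory:
    def __init__(self):
        self.store: list[tuple[list[float], str]] = []

    def write(self, fact: str) -> None:
        """Memory writing: persist a new piece of information."""
        self.store.append((toy_embed(fact), fact))

    def read(self, query: str, k: int = 3) -> list[str]:
        """Memory reading: retrieve the k facts most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self.store, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
        return [fact for _, fact in ranked[:k]]
```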

Actions

Actions are the things the agent can do; they are often called “tools”. Actions can be to generate a new observation, ask a question, make a decision, act on the external environment, alter the internal state of the agent, trigger another action, etc. Actions are needed to interact with the environment. They are influenced by the previous components and can be described from four perspectives:

Action goal: what are the intended outcomes of the actions? The agent can perform actions with various objectives, such as task completion, communication, or environment exploration.

Action production: how are the actions generated? The agent may take actions via different strategies and sources, such as memory recollection or plan following.

Action space: what are the available actions? The agent can choose from a set of possible actions, such as external tools or internal knowledge of the LLMs.

Action impact: what are the consequences of the actions? The agent’s actions can have different effects on the environment, the agent itself, or other agents, such as altering environment states, triggering new actions, or updating memories.

See https://arxiv.org/abs/2308.11432 for a more detailed description of these perspectives.
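To make the action space concrete, here is a small, hypothetical sketch of how tools are often registered and exposed to an agent. The names `Tool`, `ACTION_SPACE`, and `get_weather` are illustrative assumptions, not taken from the paper or any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str  # shown to the LLM so it can pick from the action space
    func: Callable[[str], str]

def get_weather(city: str) -> str:
    return f"(weather report for {city})"  # placeholder for a real external call

ACTION_SPACE = {
    "get_weather": Tool("get_weather", "Look up the weather for a city.", get_weather),
}

def render_action_space() -> str:
    """Describe the available actions inside the agent's prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in ACTION_SPACE.values())

def execute(action_name: str, argument: str) -> str:
    """Action impact: run the chosen tool and return an observation."""
    tool = ACTION_SPACE.get(action_name)
    return tool.func(argument) if tool else f"Unknown action: {action_name}"
```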

The frameworks

There are several frameworks for creating LLM agents. The most popular ones are LangChain, LangGraph, and AutoGen.

LangChain

LangChain is an open-source framework designed to simplify the creation of applications using large language models (LLMs). It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. LangChain is ideal when you need to integrate LLMs with external sources of computation and data. It has an easy-to-use implementation of the ReAct method described above so you can get a single-agent system up and running really quickly.
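As a rough sketch of that ReAct convenience (the agent API has changed across LangChain versions, so treat the exact imports as illustrative and check the current docs):

```python
# Illustrative only: LangChain's agent API varies between versions.
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)
tools = load_tools(["llm-math"], llm=llm)  # built-in calculator tool

# ZERO_SHOT_REACT_DESCRIPTION wires up the ReAct prompt loop for you.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is 2 to the power of 10, minus 24?")
```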

LangGraph

LangGraph is not as much a multi-agent framework as a graph framework that allows developers to define complex inter-agent interactions as graphs. It focuses on building stateful, multi-actor applications with fine-grained control over agent interactions.

LangGraph is suitable when your solution requires complex inter-agent interactions and fine-grained control over them. You can implement either a static graph or a dynamic graph: a static graph does not change between runs, while a dynamic graph does, with agents planning and building the graph on the fly.
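Here is a minimal sketch of a static, cyclic graph in LangGraph: a writer node and a reviewer node, looping until the reviewer approves. The node bodies are placeholders where real LLM calls would go.

```python
from typing import TypedDict
from langgraph.graph import END, StateGraph

class State(TypedDict):
    task: str
    draft: str
    approved: bool

def writer(state: State) -> dict:
    return {"draft": f"(draft for: {state['task']})"}  # placeholder LLM call

def reviewer(state: State) -> dict:
    return {"approved": True}  # placeholder LLM-based review

graph = StateGraph(State)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("writer")
graph.add_edge("writer", "reviewer")
# Loop back to the writer until the reviewer approves -- a static but cyclic graph.
graph.add_conditional_edges("reviewer", lambda s: END if s["approved"] else "writer")

app = graph.compile()
print(app.invoke({"task": "summarize the report", "draft": "", "approved": False}))
```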

AutoGen

AutoGen specializes in conversational agents, providing conversation as a high-level abstraction over multi-agent collaboration. It emerged as perhaps the first multi-agent framework. The biggest difference in mental model between LangGraph and AutoGen is in the construction of the agents.

AutoGen is the better choice when you’re looking to develop a solution that requires conversational agents.
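A minimal sketch of AutoGen's conversation-as-abstraction model looks like this; the `llm_config` details depend on your provider and AutoGen version, so treat them as assumptions.

```python
# Illustrative sketch using AutoGen (pip install pyautogen).
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4"}]}  # API key picked up from the environment

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",     # fully automated conversation
    code_execution_config=False,  # disable local code execution for this sketch
)

# The multi-agent collaboration is expressed as a conversation between agents.
user_proxy.initiate_chat(
    assistant, message="Plan a three-step outline for a blog post on LLM agents."
)
```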

Multi-agents vs Single agents

According to a recent research paper (https://arxiv.org/html/2402.18272v1), experiments were conducted to compare the performance of single-agent reasoning tasks with multi-agent discussions. The results indicate that when equipped with a robust prompt and a powerful language model, a single agent can match the performance of multi-LLM multi-agent discussions. Nonetheless, without the aid of demonstrations in the prompt, multi-agent discussions generally surpass the capabilities of a solo agent. The research also highlights that in multi-agent setups, the inclusion of agents with stronger language models can boost the overall performance, even elevating the contributions of agents with less powerful models.

The problems

Autonomous agents have a number of challenges that make them more difficult to operationalize than RAG systems. These challenges include:

  • Less control over the system’s behaviour, since it can be dynamic and cyclic
  • Can get stuck in a loop or far off track pursuing costly paths with little potential business value
  • The agent can have difficulty figuring out when it has completed a task
  • Can be too slow
  • Can be too expensive, an agent will likely use more tokens than a RAG system
  • More difficult to evaluate and debug, as they are more complex and less predictable than RAG systems
  • More difficult to understand and explain
  • Difficult to steer to solve a specific problem

For us to operationalize autonomous agents, we need better internal guardrails and a good way of tracking and understanding the inner reasoning of the agent.

The reason I have not found autonomous LLM agents to be as useful as RAG systems is perhaps that they are designed for a different purpose. Autonomous agents aim at artificial general intelligence (AGI), which is different from solving a scoped, specific task where recipes can be used (artificial narrow intelligence).

In conclusion, although I have learned a lot from investigating autonomous agents, I have not found them to be useful for production systems in our company. However, I might experiment with implementing parts of the agent architecture into our RAG system, since it can make the system better at adapting to many different tasks.