The messy reality of evaluating conversational AI: lessons from the fog

Have you ever tried to measure fog with a ruler? Probably not. But evaluating AI chatbots can feel a lot like that. Fog is shapeless, slippery, and has no clear edges, which makes a ruler a pretty useless tool. Just when you think you’ve nailed down the perfect metric or found “the right answer,” poof, the fog shifts and you’re back at square one. Still, if you care about building trustworthy AI, learning to navigate this fog is essential. ...

October 7, 2025 · 19 min · Martin Møldrup

LLM Agents: Hype or Real Value for Your Projects?

There are many decisions to make when choosing the architecture for an LLM application: should you build a RAG flow, an agent, or a multi-agent architecture? This post gives an overview of the different types of LLM agents and explains how, why, and whether they should be used. Is the technology mature enough for production systems? What are the challenges, and can they be overcome? An agent can be useful when a task is too complex to be solved by a single fixed recipe, that is, when solving the problem takes multiple reasoning steps and those steps are not always the same. Agents are also relevant in dynamic environments, where the agent needs to learn from its surroundings and adapt to them. ...

July 2, 2024 · 9 min · Martin Møldrup