For users of advanced technical products, finding the right documentation and understanding the underlying systems can be daunting. With a wide range of products and the complex engineering knowledge they require, both our internal teams and our customers often struggle to get the information they need. To address this, we are developing an AI chatbot that simplifies access to documentation and provides quick answers to technical questions, aiming to improve both internal efficiency and customer satisfaction. Over the past seven months we have made significant progress, and as the AI architect on the project I am excited to share our results and insights.

Design and architecture

The chatbot is built on a large language model (LLM) and a knowledge base, following the widely used Retrieval-Augmented Generation (RAG) pattern. We also considered an agent-based approach, but wanted to see how far we could get with RAG first, since it is much easier to manage and debug. A good rule in data science is to start with simple methods and reach for advanced ones only when necessary. In my experience, agents often get caught in loops, veer significantly off course, or struggle to recognize when a task has been completed.

The current architecture of our chatbot's conversation functionality.

Let me describe each of the steps in the current chatbot conversation chain:

Step 1: Rewrite the question

We rewrite the query to make it more specific and to remove any ambiguity, using an LLM. This step is how we handle coreference: for instance, in "what is this?", the "this" refers to something defined earlier in the chat history.
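As a minimal sketch of this step, the rewrite can be framed as a prompt-filling function handed to any LLM client. The template wording, the `call_llm` parameter, and the XR-200 example below are illustrative assumptions, not our production code.

```python
# Sketch of the query-rewrite step. The prompt text and `call_llm`
# stub are illustrative placeholders.

REWRITE_PROMPT = """Given the chat history, rewrite the latest user question \
so it is self-contained and unambiguous. Resolve references like "it" or \
"this" using the history.

Chat history:
{history}

Latest question: {question}

Rewritten question:"""

def build_rewrite_prompt(history: list[str], question: str) -> str:
    """Fill the template; only the last few turns matter for coreference."""
    recent = "\n".join(history[-6:])  # keep the context window small
    return REWRITE_PROMPT.format(history=recent, question=question)

def rewrite_question(history, question, call_llm):
    """`call_llm` is any text-completion function (placeholder here)."""
    return call_llm(build_rewrite_prompt(history, question))

prompt = build_rewrite_prompt(
    ["User: How do I calibrate the XR-200 sensor?",
     "Bot: Use the calibration menu under Settings."],
    "What tools do I need for this?",
)
```

With the history in the prompt, the model can resolve "this" to the XR-200 calibration procedure.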

Step 2: Intent classification and entity recognition

A transformer-based model is trained for intent classification and entity recognition. The intent is used to route the query and select strategies; for instance, intents like summarize, translate, or "thanks" do not require a search in the knowledge base. The intent is also used in step 6 to combine the right templates for the prompt.

See: How to make sense of what a user is asking for in a LLM chatbot
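To illustrate the routing role of the intent, here is a minimal sketch. The intent names, routing table, and the example entity are assumptions for illustration; the real classifier is the fine-tuned transformer described above.

```python
# Sketch of intent-based routing. Intent names are illustrative.

NO_SEARCH_INTENTS = {"summarize", "translate", "thanks"}

def route(intent: str) -> str:
    """Decide whether the query needs a knowledge-base search."""
    return "skip_search" if intent in NO_SEARCH_INTENTS else "search"

# A classified query carries both an intent and extracted entities,
# e.g. product names or codes found by the entity recognizer.
example_query = {
    "intent": "technical_question",
    "entities": [{"type": "product_code", "text": "XR-200"}],
}
```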

Step 3: Prepare the advanced search query

We use the entities extracted in step 2 to prepare the advanced search query. We do a lookup through our APIs to find which literature is relevant for the product names or product codes mentioned in the question. The output of this step is a list of sources to search in the next step.
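A sketch of this lookup, with an in-memory table standing in for the internal literature API (the product codes and filenames are made up for the example):

```python
# Sketch of preparing the search scope from extracted entities.
# PRODUCT_LITERATURE is a placeholder for the real API lookup.

PRODUCT_LITERATURE = {
    "XR-200": ["xr200_manual.pdf", "xr200_datasheet.pdf"],
    "XR-300": ["xr300_manual.pdf"],
}

def sources_for(entities: list[dict]) -> list[str]:
    """Collect the literature relevant to any product mentioned."""
    sources = []
    for ent in entities:
        if ent["type"] in ("product_name", "product_code"):
            sources.extend(PRODUCT_LITERATURE.get(ent["text"], []))
    return sorted(set(sources))

scope = sources_for([{"type": "product_code", "text": "XR-200"}])
```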

Step 4: Retrieve the relevant documents

We use the advanced search query to retrieve the relevant documents from the knowledge base. The output of this step is a list of documents that might be relevant to the question.
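The retrieval itself can be sketched as a nearest-neighbor search over embeddings. This toy version computes cosine similarity by hand over tiny hand-made vectors; in practice an embedding model and a vector store do this work, and the document ids are invented for the example.

```python
# Toy retrieval sketch: cosine similarity over pre-computed embeddings.
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the k document ids closest to the query embedding."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]),
                    reverse=True)
    return [d["id"] for d in ranked[:k]]

index = [
    {"id": "doc_a", "vec": [1.0, 0.0]},
    {"id": "doc_b", "vec": [0.9, 0.1]},
    {"id": "doc_c", "vec": [0.0, 1.0]},
]
hits = retrieve([1.0, 0.05], index, k=2)
```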

Step 5: Post-retrieval reranking and filtering

The search is not perfect; a vector search, for instance, finds similarity, not relevance. We filter out any duplicates, then use a reranking algorithm to sort the top 50 documents by relevance and keep the top 5 from the reranked list. The output of this step is a list of documents that are relevant to the question.
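The dedupe-then-rerank logic can be sketched as follows. The `rerank_score` parameter stands in for a real reranking model (e.g. a cross-encoder); here a plain score field is used so the example is self-contained.

```python
# Sketch of post-retrieval cleanup: drop duplicates, rerank the top
# `pool` candidates, keep the top `keep`.

def dedupe(docs):
    """Keep only the first occurrence of each document id."""
    seen, unique = set(), []
    for d in docs:
        if d["id"] not in seen:
            seen.add(d["id"])
            unique.append(d)
    return unique

def rerank(docs, rerank_score, pool=50, keep=5):
    """Rescore the top `pool` unique candidates, return the best `keep`."""
    candidates = dedupe(docs)[:pool]
    candidates.sort(key=rerank_score, reverse=True)
    return candidates[:keep]

docs = [{"id": "a", "score": 0.2}, {"id": "b", "score": 0.9},
        {"id": "a", "score": 0.2}, {"id": "c", "score": 0.5}]
top = rerank(docs, rerank_score=lambda d: d["score"], keep=2)
```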

Step 6: Generate the prompt from templates

Now we have a list of documents that are relevant to the question. We use the intent from step 2 to select the right templates for the prompt. The output of this step is a prompt ready to be used in the next step.
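A minimal sketch of intent-driven prompt assembly. The template texts and intent names below are illustrative placeholders, not our actual templates.

```python
# Sketch of selecting and filling a prompt template by intent.

TEMPLATES = {  # illustrative template texts
    "technical_question": ("Answer using only the context below.\n\n"
                           "{context}\n\nQ: {question}"),
    "summarize": "Summarize the following documents:\n\n{context}",
}

def build_prompt(intent: str, question: str, docs: list[dict]) -> str:
    """Join the retrieved documents and fill the intent's template."""
    context = "\n---\n".join(d["text"] for d in docs)
    return TEMPLATES[intent].format(context=context, question=question)

final_prompt = build_prompt(
    "technical_question",
    "How do I reset the device?",
    [{"text": "Hold the power button for 10 seconds."}],
)
```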

Step 7: Generate the response using LLM

Finally, we use the LLM to generate the response from the prompt. The output of this step is the response that is sent back to the user.
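This last step is a single model call. In the sketch below, `call_llm` is a placeholder for any chat-completion client; a canned function stands in so the example is self-contained.

```python
# Sketch of the final generation step. `fake_llm` is a stand-in for
# a real model client.

def generate_response(prompt: str, call_llm) -> str:
    """Send the assembled prompt to the model and return its answer."""
    return call_llm(prompt)

def fake_llm(prompt: str) -> str:  # placeholder for a real LLM call
    return "Hold the power button for 10 seconds to reset the device."

answer = generate_response(
    "Answer using only the context below.\n...\nQ: How do I reset the device?",
    fake_llm,
)
```

Keeping the client behind a plain callable like this makes it easy to swap models or add a mock in tests.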


I will dive deeper into each of these steps in future posts, as well as into how we index documents into the knowledge base.