Chat interfaces, the needle, and the haystack
Exciting times at Prismos! We just released a major upgrade of our chat assistant. In this blog post, I’ll explain why this release is a big deal.
The problem at hand
Western democracies produce enormous amounts of data. At Prismos, our goal is to fix search and thereby solve the information overload problem that policy experts face daily. We capture and structure an ever-increasing amount of this data, build search algorithms on top, and make it available to our users. One of the ways we do that is through a chat interface; that is, an interface where users type instructions rather than click buttons to execute an action. We have been strong believers in the chat interface from the get-go. The very first version of Prismos, which we launched ahead of the Commissioner Hearings in November 2024, already had one, and we have been improving it continuously for about a year now. Users love it for its simplicity: you just ask questions in plain language, and it becomes our problem to make sure you get the answer you want.
Making it our problem is the only right approach. There should be no learning curve to using Prismos. But take a step back and consider the complexity of this problem. Users can ask literally anything, from broad questions like “which members of parliaments X and Y have been most vocal about topic Z in the past month?” to very precise ones like “what was said about topic X during meeting Y?”. We can’t simply train an LLM on the data we capture and let that model power the chat interface. LLMs aren’t architected to produce true statements. They predict the most likely next word based on the sequence they’ve already generated and the user’s input. If the whole sequence happens to be a true statement, that’s luck, not design. Add to that the enormous cost of continually retraining a model to keep up with the latest developments in politics, and it quickly becomes clear that we need something else.
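To make that concrete, here is a deliberately silly sketch of what “predicting the next word” means. The lookup table below stands in for a real neural network, and the example sentence is invented; the point is the loop, which never asks whether the sentence it is building is true, only which word is statistically likely to come next.

```python
# A tiny toy of greedy next-word generation. TOY_MODEL is a hard-coded
# stand-in for a neural network: given the last two words, it assigns a
# probability to each candidate next word.

TOY_MODEL = {
    ("the", "parliament"): {"voted": 0.6, "debated": 0.3, "<end>": 0.1},
    ("parliament", "voted"): {"against": 0.5, "for": 0.4, "<end>": 0.1},
    ("voted", "against"): {"the": 0.7, "<end>": 0.3},
    ("against", "the"): {"proposal": 0.8, "<end>": 0.2},
    ("the", "proposal"): {"<end>": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.lower().split()
    for _ in range(max_tokens):
        context = tuple(tokens[-2:])                    # what came before
        probs = TOY_MODEL.get(context, {"<end>": 1.0})
        next_token = max(probs, key=probs.get)          # most likely next word
        if next_token == "<end>":
            break
        tokens.append(next_token)
    # Nothing here checks whether the result is true; only whether it is likely.
    return " ".join(tokens)

print(generate("the parliament"))  # "the parliament voted against the proposal"
```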
What if we treat an LLM as what it is - a smart next-word predictor - and stop caring about the world knowledge it may or may not store? With that approach, we can split the problem in two: a text generation problem and a data management problem. The LLM takes care of text generation. Another piece of software takes care of data management. Let’s call that second piece a “query engine”. Essentially, the query engine retrieves the data points from our data infrastructure that contain the answer to the user’s question and feeds them to the LLM, which then synthesises that raw data into a coherent whole. Now things become easier to tackle. For starters, we only expect our LLM to excel at synthesising raw data; there is no more reliance on the knowledge stored in its neurons. Off-the-shelf LLMs are already good at this, so we can save ourselves the time and energy of training one. That means we can solve our problem by focusing on the query engine.
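Here is a minimal sketch of that split, using made-up names (`DataPoint`, `QueryEngine`, `Assistant`) rather than our actual code: the query engine owns data management, and the LLM only ever sees what the engine hands it.

```python
from dataclasses import dataclass

@dataclass
class DataPoint:
    source: str  # e.g. a meeting transcript or a parliamentary question
    text: str

class QueryEngine:
    """Data management: decide which data points answer the question."""

    def __init__(self, store: list[DataPoint]):
        self.store = store

    def retrieve(self, question: str) -> list[DataPoint]:
        # Stand-in for real search: crude keyword overlap instead of proper ranking.
        terms = set(question.lower().split())
        return [d for d in self.store if terms & set(d.text.lower().split())]

class Assistant:
    """Text generation: the LLM only synthesises what was retrieved for it."""

    def __init__(self, engine: QueryEngine, llm):
        self.engine = engine
        self.llm = llm  # any "prompt in, text out" callable

    def answer(self, question: str) -> str:
        context = self.engine.retrieve(question)
        prompt = "Answer using only the sources below.\n\n"
        prompt += "\n".join(f"[{d.source}] {d.text}" for d in context)
        prompt += f"\n\nQuestion: {question}"
        return self.llm(prompt)
```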
Hide and seek
This is where things become interesting. By decomposing the problem this way, we can frame it as a Retrieval-Augmented Generation (RAG) problem. The RAG paradigm was first introduced in 2020, early in the current wave of large language models. RAG applications work as described above: a user asks a question; a query engine retrieves relevant data points from a database and feeds them to an LLM; the LLM generates a response based on the context it received; and that output is presented to the user. The very first versions of the Prismos chat assistant were traditional RAG applications along these lines. But we weren’t quite there yet. The main issue, it turned out, was very simple: users could ask literally anything. That means the query engine needs to intelligently pull together data that may live in very different corners of our data infrastructure, often requiring sequential database queries where each query depends on what the previous one returned.
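To see why a single retrieval pass falls short, here is a toy example of such a question, with made-up data and helper functions: you don’t know which meetings to look inside until the first query has returned.

```python
# Invented sample data for illustration only.
MEETINGS = [
    {"id": "ENVI-2025-03", "topic": "water resilience"},
    {"id": "ENVI-2025-07", "topic": "water resilience"},
]
SPEECHES = {
    "ENVI-2025-03": ["MEP A", "MEP B"],
    "ENVI-2025-07": ["MEP B", "MEP C"],
}

def find_meetings(topic: str) -> list[dict]:
    """Query 1: which meetings discussed this topic?"""
    return [m for m in MEETINGS if m["topic"] == topic]

def find_speakers(meeting_id: str) -> list[str]:
    """Query 2: who spoke at a given meeting? Only answerable once query 1 has run."""
    return SPEECHES.get(meeting_id, [])

def most_vocal_members(topic: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for meeting in find_meetings(topic):              # first query
        for speaker in find_speakers(meeting["id"]):  # second, dependent query
            counts[speaker] = counts.get(speaker, 0) + 1
    return counts

print(most_vocal_members("water resilience"))
# {'MEP A': 1, 'MEP B': 2, 'MEP C': 1}
```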
Traditional RAG applications typically retrieve all context upfront, in a single pass, so they cannot easily handle these complex, multi-step queries. AI engineers often went for the nuclear option: force the query engine to read everything that is possibly relevant, in the hope that the answer to the user’s query is “somewhere” in that data dump. This doesn’t work well. There is only so much context an LLM can handle (see the infamous “context window” limits), and the more context you add, the harder it becomes for the LLM to synthesise it. If you expect someone to find a needle, don’t hide it in a haystack to begin with. Traditional RAG, it turns out, is too rigid. It lacks the flexibility you need when you’re sitting on as much data as Prismos is. We need a just-in-time query engine: software that excels at fetching context exactly when, and only when, it is needed for the question at hand, and, crucially, at forgetting that context as soon as it’s no longer relevant to the conversation.
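What could such a just-in-time query engine look like? Below is a bare-bones sketch, with invented tool names and a naive forgetting heuristic, of an LLM-driven loop that decides, step by step, what to fetch and what to drop. It illustrates the pattern, not our actual implementation.

```python
import json

# Stubbed tools standing in for real queries against our data infrastructure.
TOOLS = {
    "search_meetings": lambda args: '[{"id": "ENVI-2025-03"}]',
    "get_transcript": lambda args: '"...transcript excerpt..."',
}

def agentic_answer(llm, question: str, max_steps: int = 5) -> str:
    # Working context: starts with just the question and grows one tool call at a time.
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        # The LLM is expected to reply with JSON: either
        # {"tool": "...", "args": {...}} to fetch more context, or {"answer": "..."}.
        decision = json.loads(llm(messages, tools=list(TOOLS)))
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision.get("args", {}))
        messages.append({"role": "tool", "name": decision["tool"], "content": result})
        # Forgetting, naively: keep only the three most recent tool results so
        # stale context doesn't pile up as the conversation moves on.
        tool_messages = [m for m in messages if m["role"] == "tool"]
        if len(tool_messages) > 3:
            messages.remove(tool_messages[0])
    return "I could not find a confident answer within the step budget."
```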
Enter Agentic RAG: an approach to RAG where the query engine is implemented as an agent that dynamically decides which data points to read from storage and when. The new Prismos chat assistant is an Agentic RAG assistant. It checks, with every user message, which context needs to be fetched and which needs to be forgotten. This is why this release is a big deal. You can now ask follow-up questions that require pulling data from multiple sources, or shift topics mid-conversation without the chat assistant getting confused by irrelevant context. Think of mapping stakeholder networks across institutions, tracking how specific issues evolve through different political bodies, or even analysing how a politician’s stance on a topic has shifted over time. All of this used to be painstaking manual work, but can now be done conversationally. This isn’t just an incremental improvement. It’s the foundation for an entirely new way of working with political data.