Local Government Watch

Local Gov Watch is a website that enables you to search local government websites easily using natural language.

Purpose

Local governments produce thousands of PDFs each day. These documents are hard to find using external search engines such as Google or local council search engines.

This site has parsed PDFs into small chunks of text so that they can be searched and then passed to a large language model (LLM) to produce more natural, human-friendly search results.

If the large language model's response does not answer your question sufficiently, you can click the links to the source documents and find the original information on the council website.

How it works - Overview

The search is an example of retrieval augmented generation (RAG) using a large language model (LLM).

Roughly the local government search proceeds as follows:-

  1. The user submits a query through the search interface.
  2. The query is passed to a search engine (Google) and relevant results are retrieved from the world wide web.
  3. The system retrieves relevant documents from a database of previously parsed local council PDFs.
  4. The combined web search results and PDF documents are then processed by a large language model (ChatGPT) to generate an intelligent response to the original question.

The process of searching a database and then combining this information with the original query before sending both to a large language model is known as retrieval-augmented generation (RAG).

This helps to reduce LLM hallucinations: a direct query to ChatGPT may give an incorrect answer because it relies on out-of-date or incorrect information, whereas the RAG process restricts the answer to the supplied source material, reducing this effect.
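The overview steps can be sketched in a few lines of Python. Everything here is a simplified stand-in, not the site's actual code: the chunk list replaces the parsed-PDF database, the keyword-overlap scoring replaces vector search, and the assembled prompt would be sent to an LLM rather than used directly.

```python
# Simplified sketch of the RAG flow: retrieve relevant chunks,
# then combine them with the original query into an LLM prompt.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by naive keyword overlap with the query (a stand-in
    for real vector similarity search)."""
    q_words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context with the original query, restricting
    the LLM's answer to the source material."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "The council meets on the first Tuesday of each month.",
    "Planning application PL/2024/001 was approved.",
    "Bin collection days vary by postcode.",
]
query = "When does the council meet?"
prompt = build_prompt(query, retrieve(query, chunks))
```

In a real system the prompt would then be passed to the LLM, which answers using only the retrieved material.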

Technical details

The method of retrieving relevant documents given a query is slightly more complex than described above. We use a variety of techniques to make the search more robust and reliable, for example:-

  • The query and the web documents are sent to ChatGPT to generate an initial web-only answer
  • The web-only answer is then embedded and used to retrieve the nearest 50 text chunks from the vector database
  • Maximal marginal relevance (MMR) is a simple weighting technique which reduces the set of embedded document vectors (here from 50 to 10) by combining the metrics of similarity to a target vector and diversity from each other.
  • The remaining 10 matches are then re-ranked by directly asking ChatGPT to pick the top 5 matches most relevant to the original query.

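The staged retrieval above can be sketched as a short pipeline. Every function below is a hypothetical stub (no real LLM or vector database is called); only the ordering of the stages and the candidate counts (50 → 10 → 5) reflect the description.

```python
# Structural sketch of the staged retrieval pipeline. Each stage is a
# stub standing in for a real LLM call or vector-database lookup.

def hyde_answer(query: str) -> str:
    # Stage 1: ask the LLM for an initial web-only draft answer (stub).
    return f"Draft answer about: {query}"

def vector_search(text: str, k: int = 50) -> list[str]:
    # Stage 2: embed the draft and fetch the nearest k chunks (stub).
    return [f"chunk-{i}" for i in range(k)]

def mmr_filter(chunks: list[str], k: int = 10) -> list[str]:
    # Stage 3: MMR trims the candidates to k diverse chunks (stub).
    return chunks[:k]

def llm_rerank(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Stage 4: the LLM picks the k chunks most relevant to the
    # ORIGINAL query, not the draft answer (stub).
    return chunks[:k]

def retrieve_chunks(query: str) -> list[str]:
    draft = hyde_answer(query)
    candidates = vector_search(draft, k=50)
    diverse = mmr_filter(candidates, k=10)
    return llm_rerank(query, diverse, k=5)

top_chunks = retrieve_chunks("When does the council meet?")
```

Note that the original query reappears at the re-ranking stage, so the final selection is anchored to the user's question rather than to the draft answer.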
The steps above make the extraction of relevant PDFs slower than simply embedding the query and picking the most similar chunk from the vector database. This simple approach works well for most toy examples in RAG tutorials; however, as the vector database grows (ours contains ~1 million entries), the matching task becomes more challenging, and we have found that the extra steps above are necessary to return relevant results.

Simpler direct RAG methods are also more sensitive to slight misspellings and abbreviations than the HyDE+MMR+re-ranking approach described above.

The advantages of the extra steps are as follows:-

  • HyDE generalises a query into an answer format before finding similar embedded answers in the backend database. Attempting to embed a query (i.e. a question) and match it to an answer embedding is less reliable.
  • MMR allows more PDF text chunks to be considered and guarantees diversity, avoiding returning many near-identical chunks, which can happen when using a similarity metric alone. For example, government PDFs contain a lot of boilerplate text, of which we mostly want to return at most one copy.
  • Re-ranking applies an LLM to the set of candidate text chunks, using the power of a direct LLM question to remove less relevant material. This is more reliable than comparing embedding vectors alone, but is only practical for a small set of promising candidates.
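As an illustration, MMR itself fits in a few lines. The pure-Python cosine, the toy two-dimensional vectors, and the λ weighting below are assumptions made for the sketch; real embeddings come from a model and live in the vector database.

```python
import math

# Sketch of maximal marginal relevance: greedily pick vectors that are
# similar to the target but different from those already selected.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(target, candidates, k, lam=0.3):
    """Return indices of k candidates, scoring each as
    lam * similarity-to-target - (1 - lam) * similarity-to-selected."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(target, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query_vec = [1.0, 0.0]
docs = [[1.0, 0.3],    # relevant
        [1.0, 0.35],   # near-duplicate of the first
        [0.4, 1.0]]    # less relevant but diverse
picked = mmr(query_vec, docs, k=2)
```

With these toy vectors the second document is skipped in favour of the third: it is slightly more relevant, but its redundancy with the first pick outweighs that, which is exactly the boilerplate-suppression behaviour described above.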

Finally, the set of retrieved PDFs, the web links and the original query are passed to the LLM to formulate a coherent answer. This step is conceptually fairly simple, but relies on the PDF retrieval step working reliably.

How RAG can be applied to your data

If you have a text database that you wish to make easily searchable, e.g. for internal or external users, then TWA can help you build a human-friendly search index using RAG-based methods. For more information, contact us and see how we can help with your AI projects.