The objective of this exercise series is to develop a prototype of a Retrieval-Augmented Generation (RAG) system capable of answering questions based on 10-K filings submitted to the U.S. Securities and Exchange Commission (SEC). The full series includes six Colab notebooks, each exploring progressively advanced concepts in RAG systems and their applications:
Exercise 3: RAG with Query Decomposition & Tracing with LangSmith/LangFuse
Code with Explanation is posted here: Colab Notebook Link
Exercise 5: RAG with Agentic Pattern: ReAct + Reflection
These exercises incrementally build on basic RAG, with a focus on “why” before “what” and “how”.
This exercise, the third in the series, illustrates how decomposing complex queries into simpler sub-queries can improve the quality of the responses generated by a RAG system. It extends the previous exercise, which added a Reranker to the retrieval pipeline.
We encourage readers to read Reranking Retrieved Chunks using Reranker (Cross-Encoder model) before going through the code.
When users interact with RAG systems, they often pose complex questions that encompass multiple aspects or require information from different areas of the knowledge base. Consider a query like "How do Tesla and GM's approaches to manufacturing and production compare, particularly for electric vehicles? Where are their vehicles produced?" This question combines several distinct informational needs: manufacturing methodologies, EV-specific production approaches, and factory locations for two different companies. Direct vector similarity search with such compound queries can be suboptimal, as the embedding may not effectively capture all query dimensions simultaneously. Query decomposition addresses this challenge by leveraging a Large Language Model (LLM) to break down complex queries into simpler, more focused sub-queries that can be processed independently before being synthesized into a comprehensive response.
The decomposition process typically starts by prompting the LLM to analyze the user's question and identify its core components. A sample prompt, shown below, guides the LLM to generate a set of atomic sub-queries that collectively cover all aspects of the original question. For the automotive manufacturing comparison, the LLM might generate targeted sub-queries like:
"What is Tesla's approach to manufacturing and production, particularly for electric vehicles?"
"What is GM's approach to manufacturing and production, particularly for electric vehicles?"
"Where are Tesla's vehicles produced?"
"Where are GM's vehicles produced?"
This approach enables more precise matching with relevant chunks in the vector database, as each sub-query can be vectorized to capture specific semantic aspects more accurately. The retrieved chunks for each sub-query are then combined and reranked to provide a complete context for the language model to generate a coherent response that compares and contrasts both companies' manufacturing strategies and facility locations.
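As a rough illustration of this combine-and-rerank step, here is a minimal sketch. It assumes a LangChain-style vector store (a `similarity_search` method and documents with a `page_content` attribute) and a sentence-transformers cross-encoder; the `vector_store` object, checkpoint name, and `k`/`top_n` values are illustrative assumptions, not necessarily the notebook's actual settings.

```python
from sentence_transformers import CrossEncoder

def retrieve_for_subqueries(vector_store, sub_queries, k=4):
    """Retrieve chunks for each sub-query and pool them without duplicates."""
    seen, pooled = set(), []
    for sq in sub_queries:
        for doc in vector_store.similarity_search(sq, k=k):
            # The same chunk may be retrieved by more than one sub-query.
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                pooled.append(doc)
    return pooled

def rerank(original_question, docs, top_n=8):
    """Score each (question, chunk) pair with a cross-encoder; keep the best."""
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(original_question, d.page_content) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_n]]
```

Note that the reranker scores each pooled chunk against the original question rather than the individual sub-queries, so the final context reflects relevance to the user's full intent.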
The use of LLMs for query decomposition offers several advantages over rule-based or keyword-based approaches. LLMs can understand implicit relationships within questions, identify logical dependencies between different query components, and generate sub-queries that maintain the original intent while being optimized for retrieval.
Sample prompt for query decomposition:
You are an expert at converting user questions into specific database queries for similarity search. Break down the `user-question` into distinct sub-queries that address different aspects of the original question. Ensure that the set of sub-queries comprehensively covers the main aspects of the original question.
user-question: ```<paste user-question here>```
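A minimal sketch of how this prompt might be wired up, assuming a LangChain `ChatOpenAI` model with structured output; the model name and the `SubQueries` schema are illustrative choices, not the notebook's exact implementation:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class SubQueries(BaseModel):
    """Structured output holding the decomposed sub-queries."""
    sub_queries: list[str] = Field(
        description="Distinct sub-queries that together cover the original question"
    )

DECOMPOSITION_PROMPT = """You are an expert at converting user questions into specific \
database queries for similarity search. Break down the `user-question` into distinct \
sub-queries that address different aspects of the original question. Ensure that the \
set of sub-queries comprehensively covers the main aspects of the original question.

user-question: ```{question}```"""

# temperature=0 keeps the decomposition deterministic across runs.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
decomposer = llm.with_structured_output(SubQueries)

question = (
    "How do Tesla and GM's approaches to manufacturing and production compare, "
    "particularly for electric vehicles? Where are their vehicles produced?"
)
result = decomposer.invoke(DECOMPOSITION_PROMPT.format(question=question))
for sq in result.sub_queries:
    print(sq)
```

Each printed sub-query can then be fed to the per-sub-query retrieval and reranking step sketched earlier.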