Passage, Sentence, or Proposition? An Empirical Comparison of Retrieval Granularity Effects on LLM Answer Accuracy in Retrieval-Augmented Generation
DOI:
https://doi.org/10.66372/JGER.v3i1.6
Keywords:
retrieval-augmented generation, retrieval granularity, open-domain question answering, large language models
Abstract
Retrieval-Augmented Generation (RAG) has become a dominant paradigm for grounding large language model (LLM) outputs in external knowledge. While extensive research has focused on retriever architectures and generation strategies, the choice of retrieval granularity—the textual unit indexed and retrieved—remains insufficiently studied. This paper presents a controlled empirical comparison of four retrieval granularity levels: document, passage (100-word window), sentence, and proposition. Experiments are conducted across three open-domain question answering benchmarks (Natural Questions, TriviaQA, and HotpotQA) using two representative dense retrievers (DPR and Contriever) paired with LLaMA-2-7B-Chat as the reader. Results indicate that finer-grained retrieval units consistently improve retrieval recall, with proposition-level indexing achieving up to 6.8 absolute points higher Recall@20 than passage-level on Natural Questions under DPR. End-to-end answer accuracy follows a similar trend for single-hop factoid questions, where proposition-level retrieval yields the highest Exact Match scores. On multi-hop questions in HotpotQA, this advantage diminishes and passage-level retrieval produces comparable or slightly superior accuracy, suggesting that broader contextual units are beneficial when reasoning across multiple evidence pieces. These findings provide practical guidance for RAG pipeline design: retrieval granularity should be selected in accordance with question complexity, and no single granularity level dominates across all conditions.
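The two metrics named in the abstract, Recall@20 and Exact Match, can be illustrated with a minimal sketch. It assumes the answer-containment definition of retrieval recall that is common in open-domain QA (a question counts as a hit if any of its top-k retrieved units contains a gold answer string) and SQuAD-style answer normalization. The abstract does not spell out the paper's exact evaluation protocol, so the function names and normalization rules below are illustrative assumptions rather than the authors' implementation.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}


def hit_at_k(retrieved_units: list[str], gold_answers: list[str], k: int) -> bool:
    """True if any of the top-k units contains a gold answer as a substring.

    A retrieved unit may be a document, passage, sentence, or proposition;
    the check is the same regardless of granularity (assumed protocol).
    """
    top_k = [normalize(unit) for unit in retrieved_units[:k]]
    golds = [normalize(g) for g in gold_answers]
    return any(g in unit for unit in top_k for g in golds)


def recall_at_k(runs: list[tuple[list[str], list[str]]], k: int) -> float:
    """Fraction of questions with at least one hit in the top-k units.

    Each run is (ranked retrieved units, gold answers) for one question.
    """
    hits = sum(hit_at_k(units, golds, k) for units, golds in runs)
    return hits / len(runs)


# Hypothetical toy data: two questions, each with two ranked retrieved units.
runs = [
    (["Paris is the capital of France.", "Berlin is in Germany."], ["Paris"]),
    (["The Nile flows north.", "The Amazon is in South America."], ["Amazon"]),
]
recall_1 = recall_at_k(runs, k=1)
recall_2 = recall_at_k(runs, k=2)
em = exact_match("The Eiffel Tower.", ["Eiffel Tower"])
```

Under this definition, a difference such as the reported 6.8-point gain in Recall@20 corresponds to the difference between two `recall_at_k(..., k=20)` values, each multiplied by 100, computed over the same question set with different index granularities.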

