Evaluating the Quality of Large Language Model-Generated Explanations in Recommendation Tasks: A Multi-Dimensional Comparative Analysis

Authors

  • Zijie Chen, Computer Engineering, University of Toronto Master, Toronto, Canada
  • Minghui Wang, School of Software and Microelectronics, Peking University, Beijing, China

DOI:

https://doi.org/10.66372/JGER.v3i1.4

Keywords:

large language models, recommendation explanation, evaluation framework, prompt strategy

Abstract

The integration of large language models (LLMs) into recommendation systems has opened new possibilities for generating natural language explanations that accompany recommended items. While prior research has explored methods for using LLMs to generate explanations, little attention has been paid to systematically evaluating the quality of these explanations across models, domains, and prompting strategies. This paper presents a multi-dimensional comparative analysis of LLM-generated recommendation explanations, examining four LLMs, both commercially available and open-source, across three public recommendation datasets. It proposes a structured evaluation framework covering four quality dimensions: faithfulness, informativeness, persuasiveness, and personalization. The evaluation combines automatic metrics (BLEU-4, ROUGE-L, BERTScore) with human annotation by 12 trained evaluators. The experimental results indicate that models with more parameters produce more informative and faithful explanations, though the gap narrows substantially when context-enhanced prompting strategies are applied. Automatic metrics correlate moderately with human judgments on informativeness and faithfulness but show limited alignment on persuasiveness and personalization. These findings offer practical guidance for selecting appropriate LLMs and prompting strategies in explanation-augmented recommendation applications.
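
To make the comparison between automatic metrics and human judgments concrete, the following is a minimal sketch of one way such an analysis could be set up. It is not the authors' implementation: the abstract does not say which tokenization, ROUGE-L variant, or correlation coefficient was used. The sketch assumes whitespace tokenization, the balanced F1 form of ROUGE-L, and Spearman rank correlation. All function names and the toy inputs are hypothetical.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as a balanced F1 over LCS precision and recall.

    Assumption: lowercased whitespace tokens; the paper's setup may differ.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy illustration only: made-up explanations and ratings, not data from the paper.
generated = ["you liked similar sci-fi films", "popular this week", "great movie"]
references = ["recommended because you liked similar sci-fi films",
              "this item is popular with users this week",
              "matches your interest in historical drama"]
human_informativeness = [4.5, 3.0, 1.5]  # hypothetical mean evaluator ratings

metric_scores = [rouge_l_f1(g, r) for g, r in zip(generated, references)]
rho = spearman(metric_scores, human_informativeness)
```

Here `rho` summarizes how closely the metric's ranking of explanations follows the human ranking for a single quality dimension. Repeating the computation per dimension would give the kind of per-dimension alignment comparison the abstract describes.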

Author Biography

  • Minghui Wang, School of Software and Microelectronics, Peking University, Beijing, China

Published

2025-01-16

How to Cite

Evaluating the Quality of Large Language Model-Generated Explanations in Recommendation Tasks: A Multi-Dimensional Comparative Analysis. (2025). Journal of Global Engineering Review, 3(1), 54-63. https://doi.org/10.66372/JGER.v3i1.4