Evaluating the Quality of Large Language Model-Generated Explanations in Recommendation Tasks: A Multi-Dimensional Comparative Analysis

Zijie Chen; Minghui Wang

doi:10.66372/JGER.v3i1.4

Authors

Zijie Chen, Computer Engineering (Master's program), University of Toronto, Toronto, Canada
Minghui Wang, School of Software and Microelectronics, Peking University, Beijing, China

DOI:

https://doi.org/10.66372/JGER.v3i1.4

Keywords:

large language models, recommendation explanation, evaluation framework, prompt strategy

Abstract

The integration of large language models (LLMs) into recommendation systems has introduced new possibilities for generating natural language explanations that accompany recommended items. While prior research has explored methods of leveraging LLMs for explanation generation, limited attention has been given to systematically evaluating the quality of these explanations across different models, domains, and prompting strategies. This paper presents a multi-dimensional comparative analysis of LLM-generated recommendation explanations, examining four LLMs, spanning commercially available and open-source models, across three public recommendation datasets. It proposes a structured evaluation framework that covers four quality dimensions: faithfulness, informativeness, persuasiveness, and personalization. The evaluation combines automatic metrics (BLEU-4, ROUGE-L, BERTScore) with human annotation by 12 trained evaluators. The experimental results indicate that models with more parameters produce more informative and faithful explanations, though the gap narrows substantially when context-enhanced prompting strategies are applied. Automatic metrics correlate moderately with human judgments on informativeness and faithfulness but align only weakly with them on persuasiveness and personalization. These findings offer practical guidance for selecting appropriate LLMs and prompting strategies in explanation-augmented recommendation applications.
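
To make the comparison between automatic metrics and human judgments concrete, the following is a minimal sketch, not the authors' implementation. It computes a simplified ROUGE-L F1 score (whitespace tokens, no stemming) and a Spearman rank correlation between metric scores and human ratings. The choice of Spearman correlation, the function names, and the toy inputs are all assumptions made for illustration; the abstract does not state which correlation coefficient or tokenization was used.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy placeholder data, invented for illustration only (not from the paper).
    reference = "recommended because you liked similar science fiction films"
    candidates = [
        "recommended because you liked similar science fiction films",
        "you liked science fiction films",
        "this item is popular",
    ]
    human_ratings = [5, 4, 1]  # hypothetical 1-5 informativeness ratings
    scores = [rouge_l_f1(c, reference) for c in candidates]
    print(scores, spearman(scores, human_ratings))
```

In practice an established metric implementation and a statistics library would replace these hand-written functions; the sketch only shows what "correlation between an automatic metric and human judgment" amounts to computationally.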

Author Biography

Minghui Wang, School of Software and Microelectronics, Peking University, Beijing, China
