A Comparative Evaluation of Prompting Strategies for Code Generation with Large Language Models

Authors

  • Fanyi Zhao, Computer Science, Stevens Institute of Technology, NJ, USA
  • Mingzhuo Yu, Computer Science, Northeastern University, MA, USA
  • Chuankai Luo, Electronic Information Engineering, Tsinghua University, Beijing, China

DOI:

https://doi.org/10.66372/JGER.v2i1.1

Keywords:

large language models, code generation, prompting strategies, empirical evaluation

Abstract

Large language models have demonstrated remarkable capabilities in automated code generation, yet the relative effectiveness of different prompting strategies remains insufficiently characterized under controlled experimental conditions. This study presents a systematic comparative evaluation of six prompting strategies (zero-shot, few-shot, chain-of-thought, structured chain-of-thought, self-debugging, and self-refine) across four representative models (GPT-4o, GPT-3.5-Turbo, DeepSeek-Coder-33B-Instruct, and CodeLlama-34B-Instruct) on three established benchmarks (HumanEval+, MBPP+, and LiveCodeBench). The evaluation covers functional correctness measured by pass@1, computational cost quantified by average token consumption, and difficulty-stratified performance on contamination-free competitive programming tasks. Results indicate that structured chain-of-thought prompting yields the most consistent improvements across all models, achieving gains of 6.7–8.5 percentage points over zero-shot baselines on HumanEval+. Multi-turn strategies such as self-debugging deliver the highest absolute performance on frontier models (80.5% pass@1 for GPT-4o on HumanEval+) while offering diminished returns on smaller open-source models. Cost-effectiveness analysis reveals that single-turn structured prompting achieves 77–93% of multi-turn performance at 72% lower token cost. These findings provide empirically grounded guidance for practitioners selecting prompting strategies under varying model capability levels and computational resource constraints.
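
For readers unfamiliar with the pass@1 metric named in the abstract, the following is a minimal sketch of the widely used unbiased pass@k estimator, which reduces to the fraction of correct samples when k = 1. The abstract does not state how many samples per problem were drawn or whether this estimator was used, so the function names and the (n_samples, n_correct) input layout are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples passing all tests, k: budget.
    (Hypothetical helper; not taken from the paper.)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over problems; each entry is (n_samples, n_correct)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

With a single sample per problem, `benchmark_pass_at_1` is simply the share of problems whose one generated solution passes every test.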

Published

2024-01-06

How to Cite

Zhao, F., Yu, M., & Luo, C. (2024). A Comparative Evaluation of Prompting Strategies for Code Generation with Large Language Models. Journal of Global Engineering Review, 2(1), 1-11. https://doi.org/10.66372/JGER.v2i1.1