A Comparative Evaluation of Prompting Strategies for Code Generation with Large Language Models

Fanyi Zhao; Mingzhuo Yu; Chuankai Luo

doi:10.66372/JGER.v2i1.1

Authors

Fanyi Zhao, Computer Science, Stevens Institute of Technology, NJ, USA
Mingzhuo Yu, Computer Science, Northeastern University, MA, USA
Chuankai Luo, Electronic Information Engineering, Tsinghua University, Beijing, China

DOI:

https://doi.org/10.66372/JGER.v2i1.1

Keywords:

large language models, code generation, prompting strategies, empirical evaluation

Abstract

Large language models have demonstrated remarkable capabilities in automated code generation, yet the relative effectiveness of different prompting strategies remains insufficiently characterized under controlled experimental conditions. This study presents a systematic comparative evaluation of six prompting strategies (zero-shot, few-shot, chain-of-thought, structured chain-of-thought, self-debugging, and self-refine) across four representative models (GPT-4o, GPT-3.5-Turbo, DeepSeek-Coder-33B-Instruct, and CodeLlama-34B-Instruct) on three established benchmarks (HumanEval+, MBPP+, and LiveCodeBench). The evaluation covers functional correctness measured by pass@1, computational cost quantified by average token consumption, and difficulty-stratified performance on contamination-free competitive programming tasks. Results indicate that structured chain-of-thought prompting yields the most consistent improvements across all models, with gains of 6.7–8.5 percentage points over zero-shot baselines on HumanEval+. Multi-turn strategies such as self-debugging deliver the highest absolute performance on frontier models (80.5% pass@1 for GPT-4o on HumanEval+) but offer diminishing returns on smaller open-source models. Cost-effectiveness analysis shows that single-turn structured prompting achieves 77–93% of multi-turn performance at 72% lower token cost. These findings provide empirically grounded guidance for practitioners selecting prompting strategies under varying model capability levels and computational resource constraints.
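The abstract reports correctness as pass@1 and cost as average token consumption, and summarizes cost-effectiveness as a share of multi-turn performance retained at a given token saving. The following is a minimal sketch of how such figures could be computed, assuming the widely used unbiased pass@k estimator from the HumanEval evaluation (Chen et al., 2021), which reduces to the fraction of passing samples when k = 1, and assuming that cost-effectiveness is expressed as two ratios: retained pass@1 and fractional token saving. The names `pass_at_k`, `benchmark_pass_at_1`, `StrategyResult`, and `relative_efficiency` are hypothetical illustrations, not the authors' evaluation code.

```python
from dataclasses import dataclass
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed estimator, Chen et al., 2021).

    n: number of samples generated for the problem
    c: number of those samples that pass every test
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset holds no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over a benchmark, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


@dataclass
class StrategyResult:
    """Hypothetical per-strategy summary for one model and benchmark."""
    pass_at_1: float   # fraction in [0, 1]
    avg_tokens: float  # mean tokens per problem (assumed: prompt + completion, all turns)


def relative_efficiency(single: StrategyResult, multi: StrategyResult) -> tuple[float, float]:
    """Return (share of multi-turn pass@1 retained, fractional token saving)."""
    retained = single.pass_at_1 / multi.pass_at_1
    saving = 1.0 - single.avg_tokens / multi.avg_tokens
    return retained, saving
```

Under this reading, the abstract's statement that single-turn structured prompting achieves 77–93% of multi-turn performance at 72% lower token cost corresponds to `retained` between 0.77 and 0.93 and `saving` of about 0.72. Keeping the two ratios separate, rather than folding them into a single score, leaves the accuracy/cost trade-off visible to the reader.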

Author Biography

Chuankai Luo, Electronic Information Engineering, Tsinghua University, Beijing, China
