[1] “Latency–Throughput Tradeoffs of ONNX Runtime, TensorRT-LLM, vLLM, and Triton: An Empirical Comparison on 1B–3B Parameter LLM Inference,” JGER, vol. 4, no. 1, pp. 173–182, Feb. 2026, doi: 10.66372/JGER.v4i1.12.