Large language models for source code generation and editing
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 276-350. This article was harvested from the source Math-Net.Ru

See the book chapter record

Lomshakov V. M., Nikolenko S. I. Large Language Models for Source Code Generation and Editing. In recent years, large language models (LLMs) have significantly transformed approaches to the automation of software development, providing powerful tools for code generation, correction, and optimization. In this survey, we examine methods for adapting LLMs to programming tasks, including reinforcement learning from human feedback (RLHF), instruction tuning, parameter-efficient fine-tuning (PEFT), and effective prompting strategies. We review modern approaches to fine-tuning and applying LLMs, discuss their advantages and limitations, and consider relevant datasets for code generation and correction tasks, along with the corresponding evaluation metrics. Additionally, we describe state-of-the-art open-weight models for working with source code.
@article{ZNSL_2024_540_a14,
     author = {V. M. Lomshakov and S. I. Nikolenko},
     title = {Large language models for source code generation and editing},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {276--350},
     year = {2024},
     volume = {540},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a14/}
}
TY  - JOUR
AU  - V. M. Lomshakov
AU  - S. I. Nikolenko
TI  - Large language models for source code generation and editing
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2024
SP  - 276
EP  - 350
VL  - 540
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a14/
LA  - ru
ID  - ZNSL_2024_540_a14
ER  - 
%0 Journal Article
%A V. M. Lomshakov
%A S. I. Nikolenko
%T Large language models for source code generation and editing
%J Zapiski Nauchnykh Seminarov POMI
%D 2024
%P 276-350
%V 540
%U http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a14/
%G ru
%F ZNSL_2024_540_a14
V. M. Lomshakov; S. I. Nikolenko. Large language models for source code generation and editing. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 276-350. http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a14/
