Leaderboard (sorted by HumanEval Pass@1)
| Rank | Model | Params | HumanEval | MBPP | HF | Paper |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 + Reflexion | ? | 91.0 | 77.1 | | paper |
| 2 | GPT-4 | ? | 67.0 | | | paper |
| 3 | PanGu-Coder2 | 15B | 61.6 | | | paper |
| 4 | WizardCoder-15B | 15B | 57.3 | 51.8 | ckpt | paper |
| 5 | GPT-3.5 | ? | 48.1 | | | paper |
| 6 | Code-Davinci-002 | ? | 47.0 | | | paper |
| 7 | StarCoder-15B (Prompted) | 15B | 40.8 | 49.5 | ckpt | paper |
| 8 | PaLM 2-S | ? | 37.6 | 50.0 | | paper |
| 9 | PaLM-Coder-540B | 540B | 36.0 | 47.0 | | paper |
| 10 | InstructCodeT5+ | 16B | 35.0 | | | paper |
| 11 | StarCoder-15B | 15B | 33.6 | 52.7 | ckpt | paper |
| 12 | Code-Cushman-001 | ? | 33.5 | 45.9 | | paper |
| 13 | CodeT5+ | 16B | 30.9 | | | paper |
| 14 | LLaMA2-70B | 70B | 29.9 | | ckpt | paper |
| 15 | CodeGen-16B-Mono | 16B | 29.3 | 35.3 | | paper |
| 16 | PaLM-540B | 540B | 26.2 | 36.8 | | paper |
| 17 | LLaMA-65B | 65B | 23.7 | 37.7 | | paper |
| 18 | CodeGeeX | 13B | 22.9 | 24.4 | | paper |
| 19 | LLaMA-33B | 33B | 21.7 | 30.2 | | paper |
| 20 | CodeGen-16B-Multi | 16B | 18.3 | 20.9 | | paper |
| 21 | AlphaCode | 1.1B | 17.1 | | | paper |
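The Pass@1 scores above use the unbiased pass@k estimator introduced in "Evaluating Large Language Models Trained on Code" (listed below): draw n samples per problem, count the c samples that pass all unit tests, and estimate pass@k as 1 − C(n−c, k)/C(n, k). A minimal sketch of that formula (the function name and example counts are illustrative, not from any specific harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated for a problem
    c: samples that passed all unit tests
    k: budget of samples considered
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 200 samples per problem, 124 passing.
print(pass_at_k(200, 124, 1))  # 0.62, i.e. the empirical pass rate for k=1
```

For k=1 this reduces to the fraction of passing samples, but for larger k it avoids the bias of naively averaging over random k-subsets.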
💡 Toolkit:
- bigcode-evaluation-harness: A framework for the evaluation of autoregressive code generation language models.
- multilingual-code-evals: Multilingual Code Models Evaluation.
- Evaluating Large Language Models Trained on Code (Preprint)
  [Paper] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, et al., 2021.07
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (ICLR23)
  [Paper] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong, 2022.03
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages (ICLR23)
  [Paper] Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou, 2023.05
- SantaCoder: don't reach for the stars! (Preprint)
  [Paper] Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, et al., 2023.01
- StarCoder: may the source be with you! (Preprint)
  [Paper] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, et al., 2023.05
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct (Preprint)
  [Paper] Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang, 2023.07
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback (Preprint)
  [Paper] Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang, 2023.07
- CodeT: Code Generation with Generated Tests (ICLR23)
  [Paper] Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen, 2022.07
- Coder Reviewer Reranking for Code Generation (ICML23)
  [Paper] Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang, 2022.11
- Measuring Coding Challenge Competence With APPS (NeurIPS21) — named APPS
  [Paper][Repo] Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt, 2021.05
- Program Synthesis with Large Language Models (Preprint) — named MBPP
  [Paper] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton, 2021.08
This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at [email protected].
```bibtex
@software{awesome-code-llm,
  author       = {Binyuan Hui},
  title        = {An awesome and curated list of best code-LLM for research},
  howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
  year         = 2023,
}
```
This project is inspired by Awesome-LLM.