🌟 A curated list of awesome repository-level code generation research papers and resources. If you want to contribute to this list (please do), feel free to send me a pull request. 🚀
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [2025-01-arXiv] [📄 paper] [🔗 repo]
- Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [📄 paper] [🔗 repo]
- CODEV: Issue Resolving with Visual Data [2024-12-arXiv] [📄 paper] [🔗 repo]
- LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [📄 paper]
- Globant Code Fixer Agent Whitepaper [2024-11] [📄 paper]
- MarsCode Agent: AI-native Automated Bug Fixing [2024-11-arXiv] [📄 paper]
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [2024-11-arXiv] [📄 paper] [🔗 repo]
- SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [2024-10-arXiv] [📄 paper] [🔗 repo]
- AutoCodeRover: Autonomous Program Improvement [2024-09-ISSTA] [📄 paper] [🔗 repo]
- SpecRover: Code Intent Extraction via LLMs [2024-08-arXiv] [📄 paper]
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents [2024-07-arXiv] [📄 paper] [🔗 repo]
- AGENTLESS: Demystifying LLM-based Software Engineering Agents [2024-07-arXiv] [📄 paper]
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [2024-07-arXiv] [📄 paper] [🔗 repo]
- How to Understand Whole Software Repository? [2024-06-arXiv] [📄 paper]
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [2024-05-arXiv] [📄 paper] [🔗 repo]
- Improving FIM Code Completions via Context & Curriculum Based Learning [2024-12-arXiv] [📄 paper]
- ContextModule: Improving Code Completion via Repository-level Contextual Information [2024-12-arXiv] [📄 paper]
- RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [2024-09-arXiv] [📄 paper]
- RAMBO: Enhancing RAG-based Repository-Level Method Body Completion [2024-09-arXiv] [📄 paper] [🔗 repo]
- RLCoder: Reinforcement Learning for Repository-Level Code Completion [2024-07-arXiv] [📄 paper] [🔗 repo]
- Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [2024-06-arXiv] [📄 paper] [🔗 repo]
- STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis [2024-06-arXiv] [📄 paper]
- GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [2024-06-arXiv] [📄 paper]
- Enhancing Repository-Level Code Generation with Integrated Contextual Information [2024-06-arXiv] [📄 paper]
- R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [📄 paper]
- Natural Language to Class-level Code Generation by Iterative Tool-augmented Reasoning over Repository [2024-05-arXiv] [📄 paper] [🔗 repo]
- Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [2024-03-arXiv] [📄 paper] [🔗 repo]
- Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-03-arXiv] [📄 paper] [🔗 repo]
- RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [2024-03-arXiv] [📄 paper] [🔗 repo]
- RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening [2024-07-Internetware] [📄 paper]
- CodePlan: Repository-Level Coding using LLMs and Planning [2024-07-FSE] [📄 paper] [🔗 repo]
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-10-EMNLP] [📄 paper] [🔗 repo]
- Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context [2023-09-NeurIPS] [📄 paper] [🔗 repo]
- RepoFusion: Training Code Models to Understand Your Repository [2023-06-arXiv] [📄 paper] [🔗 repo]
- Repository-Level Prompt Generation for Large Language Models of Code [2023-06-ICML] [📄 paper] [🔗 repo]
- Fully Autonomous Programming with Large Language Models [2023-06-GECCO] [📄 paper] [🔗 repo]
- LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation [2024-arXiv] [📄 paper]
- REPOCOD: Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' [2024-arXiv] [📄 paper] [🔗 repo](https://github.com/lt-asset/REPOCOD)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [2024-arXiv] [📄 paper] [🔗 repo]
- RepoExec: On the Impacts of Contexts on Repository-Level Code Generation [2024-arXiv] [📄 paper] [🔗 repo](https://github.com/FSoft-AI4Code/RepoExec)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [2024-arXiv] [📄 paper] [🔗 repo]
- R2C2-Bench: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-arXiv] [📄 paper]
- DevEval: Evaluating Code Generation in Practical Software Projects [2024-ACL-Findings] [📄 paper] [🔗 repo]
- CodeAgentBench: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [2024-ACL] [📄 paper]
- RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems [2024-ICLR] [📄 paper] [🔗 repo]
- R2E-Eval: Turning Any GitHub Repository into a Programming Agent Test Environment [2024-ICML] [📄 paper] [🔗 repo]
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [2024-ICLR] [📄 paper] [🔗 repo]
- SWE-bench+: Enhanced Coding Benchmark for LLMs [2024-arXiv] [📄 paper]
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? [2024-arXiv] [📄 paper] [🔗 site]
- Visual SWE-bench: Issue Resolving with Visual Data [2024-arXiv] [📄 paper] [🔗 repo]
- SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [📄 paper] [🔗 repo]
- RepoEval: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-EMNLP] [📄 paper] [🔗 repo]
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion [2023-NeurIPS] [📄 paper] [🔗 site]
- CrossCodeLongEval: Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-ICML] [📄 paper] [🔗 repo]
- M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation [2024-arXiv] [📄 paper] [🔗 repo]
- ExecRepoBench: Multi-level Executable Code Completion Evaluation [2024-arXiv] [📄 paper] [🔗 site]