🤖✨ Awesome Repository-Level Code Generation ✨🤖

🌟 A curated list of awesome repository-level code generation research papers and resources. If you want to contribute to this list (please do), feel free to send me a pull request. 🚀

📚 Contents

  • 💥 Repo-Level Issue Resolution
  • 🤖 Repo-Level Code Completion
  • 📊 Datasets and Benchmarks

💥 Repo-Level Issue Resolution

  • SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [2025-01-arXiv] [📄 paper] [🔗 repo]

  • Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [📄 paper] [🔗 repo]

  • CODEV: Issue Resolving with Visual Data [2024-12-arXiv] [📄 paper] [🔗 repo]

  • LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [📄 paper]

  • Globant Code Fixer Agent Whitepaper [2024-11] [📄 paper]

  • MarsCode Agent: AI-native Automated Bug Fixing [2024-11-arXiv] [📄 paper]

  • Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [2024-11-arXiv] [📄 paper] [🔗 repo]

  • SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [2024-10-arXiv] [📄 paper] [🔗 repo]

  • AutoCodeRover: Autonomous Program Improvement [2024-09-ISSTA] [📄 paper] [🔗 repo]

  • SpecRover: Code Intent Extraction via LLMs [2024-08-arXiv] [📄 paper]

  • OpenHands: An Open Platform for AI Software Developers as Generalist Agents [2024-07-arXiv] [📄 paper] [🔗 repo]

  • AGENTLESS: Demystifying LLM-based Software Engineering Agents [2024-07-arXiv] [📄 paper]

  • RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [2024-07-arXiv] [📄 paper] [🔗 repo]

  • How to Understand Whole Software Repository? [2024-06-arXiv] [📄 paper]

  • SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering [2024-05-arXiv] [📄 paper] [🔗 repo]

🤖 Repo-Level Code Completion

  • Improving FIM Code Completions via Context & Curriculum Based Learning [2024-12-arXiv] [📄 paper]

  • ContextModule: Improving Code Completion via Repository-level Contextual Information [2024-12-arXiv] [📄 paper]

  • RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [2024-09-arXiv] [📄 paper]

  • RAMBO: Enhancing RAG-based Repository-Level Method Body Completion [2024-09-arXiv] [📄 paper] [🔗 repo]

  • RLCoder: Reinforcement Learning for Repository-Level Code Completion [2024-07-arXiv] [📄 paper] [🔗 repo]

  • Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [2024-06-arXiv] [📄 paper] [🔗 repo]

  • STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis [2024-06-arXiv] [📄 paper]

  • GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [2024-06-arXiv] [📄 paper]

  • Enhancing Repository-Level Code Generation with Integrated Contextual Information [2024-06-arXiv] [📄 paper]

  • R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [📄 paper]

  • Natural Language to Class-level Code Generation by Iterative Tool-augmented Reasoning over Repository [2024-05-arXiv] [📄 paper] [🔗 repo]

  • Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [2024-03-arXiv] [📄 paper] [🔗 repo]

  • Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-03-arXiv] [📄 paper] [🔗 repo]

  • RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [2024-03-arXiv] [📄 paper] [🔗 repo]

  • RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening [2024-07-Internetware] [📄 paper]

  • CodePlan: Repository-Level Coding using LLMs and Planning [2024-07-FSE] [📄 paper] [🔗 repo]

  • RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-10-EMNLP] [📄 paper] [🔗 repo]

  • Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context [2023-09-NeurIPS] [📄 paper] [🔗 repo]

  • RepoFusion: Training Code Models to Understand Your Repository [2023-06-arXiv] [📄 paper] [🔗 repo]

  • Repository-Level Prompt Generation for Large Language Models of Code [2023-06-ICML] [📄 paper] [🔗 repo]

  • Fully Autonomous Programming with Large Language Models [2023-06-GECCO] [📄 paper] [🔗 repo]

📊 Datasets and Benchmarks

  • LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation [2024-arXiv] [📄 paper]

  • REPOCOD: Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' [2024-arXiv] [📄 paper] [🔗 repo](https://github.com/lt-asset/REPOCOD)

  • Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [2024-arXiv] [📄 paper] [🔗 repo]

  • RepoExec: On the Impacts of Contexts on Repository-Level Code Generation [2024-arXiv] [📄 paper] [🔗 repo](https://github.com/FSoft-AI4Code/RepoExec)

  • CodeRAG-Bench: Can Retrieval Augment Code Generation? [2024-arXiv] [📄 paper] [🔗 repo]

  • R2C2-Bench: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-arXiv] [📄 paper]

  • DevEval: Evaluating Code Generation in Practical Software Projects [2024-ACL-Findings] [📄 paper] [🔗 repo]

  • CodeAgentBench: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [2024-ACL] [📄 paper]

  • RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems [2024-ICLR] [📄 paper] [🔗 repo]

  • R2E-Eval: Turning Any GitHub Repository into a Programming Agent Test Environment [2024-ICML] [📄 paper] [🔗 repo]

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [2024-ICLR] [📄 paper] [🔗 repo]

  • SWE-bench+: Enhanced Coding Benchmark for LLMs [2024-arXiv] [📄 paper]

  • SWE-bench Multimodal: Multimodal Software Engineering Benchmark [2024-arXiv] [📄 paper] [🔗 site]

  • Visual SWE-bench: Issue Resolving with Visual Data [2024-arXiv] [📄 paper] [🔗 repo]

  • SWE-Gym: Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [📄 paper] [🔗 repo]

  • RepoEval: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-EMNLP] [📄 paper] [🔗 repo]

  • CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion [2023-NeurIPS] [📄 paper] [🔗 site]

  • CrossCodeLongEval: Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-ICML] [📄 paper] [🔗 repo]

  • M2RC-Eval: Massively Multilingual Repository-level Code Completion Evaluation [2024-arXiv] [📄 paper] [🔗 repo]

  • ExecRepoBench: Multi-level Executable Code Completion Evaluation [2024-arXiv] [📄 paper] [🔗 site]