TRUMANCFY/CoQuIR
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

TL;DR: Code quality, a fundamental property of software, has been largely overlooked by modern retrievers.

Abstract: Code retrieval is essential in modern software development as it boosts reuse and speeds up debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four critical dimensions: correctness, efficiency, security, and maintainability. CoQuIR includes fine-grained quality annotations over 42,725 queries and 134,907 code snippets in 11 programming languages and is accompanied by two quality-centric evaluation metrics (Pairwise Preference Accuracy and Margin-based Ranking Score). Based on CoQuIR, we benchmark 23 retrieval models, spanning open-source and proprietary models, and find that even top-performing models often fail to distinguish buggy or insecure code from their robust counterparts. Furthermore, we conduct preliminary investigations into methods for explicitly training retrievers to recognize code quality. Through synthetic datasets, we demonstrate promising improvements in quality-aware metrics across different models without compromising semantic relevance. Downstream code generation performance further validates the effectiveness of our approach. Our work underscores the importance of incorporating quality signals into code retrieval systems, establishing a foundation for more trustworthy software development tools.

Installation

To begin, set up the conda environment using the following command:

conda env create -f environment.yml

Specifically, we include CoQuIR in the mteb library:

pip uninstall mteb
pip install git+https://github.com/TRUMANCFY/mteb@main

Resources

CoQuIR is publicly available on Hugging Face 🤗. It covers four aspects of code quality: correctness, efficiency, security, and maintainability.

As a multilingual benchmark, CoQuIR includes code in 11 programming languages drawn from various use cases.

Evaluation

We benchmark 23 retrieval models, including both open-source and proprietary models. The evaluation script and complete results are provided in evaluate_coquir.sh and preference_code_retrieval_evaluation.ipynb.

We retain the standard information retrieval metrics, namely Normalized Discounted Cumulative Gain@10 (NDCG@10) and Mean Reciprocal Rank@10 (MRR@10).
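For reference, the two standard metrics can be sketched as follows (a minimal illustrative implementation, not the exact code used in the repository's evaluation scripts):

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mrr_at_k(ranked_relevances, k=10):
    """MRR@k: reciprocal rank of the first relevant result within the top k."""
    for i, rel in enumerate(ranked_relevances[:k]):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0
```

Here `ranked_relevances` is the list of gold relevance labels of the retrieved documents, in the order the retriever ranked them.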

To better measure the quality-awareness of retrievers, we design two quality-aware metrics: Pairwise Preference Accuracy (PPA) and Margin-based Ranking Score (MRS). We find that even top-performing models often fail to distinguish high-quality code from its lower-quality counterparts.
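The intuition behind the two quality-aware metrics can be sketched over quality-contrastive pairs (a higher-quality snippet and its lower-quality counterpart for the same query). The definitions below are an illustrative reading, not the exact formulas from the paper; `pairs` and `scores` are hypothetical data structures assumed for the sketch:

```python
def pairwise_preference_accuracy(pairs, scores):
    """Fraction of quality-contrastive pairs where the retriever scores the
    higher-quality snippet above its lower-quality counterpart.
    pairs:  query -> (good_id, bad_id)
    scores: (query, doc_id) -> retrieval score
    (Illustrative definition; see the paper for the exact metric.)"""
    correct = sum(
        1 for q, (good, bad) in pairs.items()
        if scores[(q, good)] > scores[(q, bad)]
    )
    return correct / len(pairs) if pairs else 0.0

def margin_ranking_score(pairs, scores):
    """Mean score margin between the higher- and lower-quality snippet in each
    pair; positive values indicate quality-aware ranking. (Illustrative.)"""
    margins = [
        scores[(q, good)] - scores[(q, bad)]
        for q, (good, bad) in pairs.items()
    ]
    return sum(margins) / len(margins) if margins else 0.0
```

A quality-blind retriever that scores both snippets of every pair equally would land at chance-level PPA and a margin near zero.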

Citing
