GuardBench is a Python library for evaluating guardrail models.
It provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability.
It also allows you to quickly compare results and export LaTeX tables for scientific publications.
GuardBench's benchmarking pipeline can also be leveraged on custom datasets.
GuardBench was presented at EMNLP 2024.
The related paper is available here.
You can find the list of supported datasets here. A few of them require authorization. Please read this.
If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.
- 40 datasets for guardrail model evaluation.
- Automated evaluation pipeline.
- User-friendly.
- Extendable.
- Reproducible and sharable evaluation.
- Exportable evaluation reports.
GuardBench requires `python>=3.10` and can be installed with `pip`:

```bash
pip install guardbench
```
```python
from guardbench import benchmark

def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional `kwargs` as needed
) -> list[float]:
    # Do moderation here: return one unsafe probability per conversation.
    # Placeholder implementation; replace it with your model's inference.
    return [0.5 for _ in conversations]

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=32,
    datasets="all",
    # Note: you can pass additional `kwargs` for `moderate`
)
```
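For illustration, here is a minimal, self-contained `moderate` implementation: a naive keyword-based scorer rather than a real guardrail model. The keyword list and scoring heuristic are purely illustrative assumptions, and the messages are assumed to follow the usual `{"role": ..., "content": ...}` chat format.

```python
from guardbench import benchmark

# Purely illustrative keyword list; a real guardrail model would run an LLM or classifier.
UNSAFE_KEYWORDS = ("bomb", "weapon", "poison", "malware")

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    scores = []
    for conversation in conversations:
        # Concatenate the message contents of the conversation (assumed chat format).
        text = " ".join(message["content"].lower() for message in conversation)
        hits = sum(keyword in text for keyword in UNSAFE_KEYWORDS)
        # Map keyword hits to a pseudo-probability in [0, 1].
        scores.append(min(1.0, hits / len(UNSAFE_KEYWORDS)))
    return scores

benchmark(
    moderate=moderate,
    model_name="Keyword Baseline (toy example)",
    batch_size=32,
    datasets="all",
)
```

Any model that can produce a per-conversation unsafe probability can be wrapped in the same way, and extra inputs (e.g., a tokenizer or a decision threshold) can be forwarded to `moderate` as additional keyword arguments of `benchmark`.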
- Follow our tutorial on benchmarking Llama Guard with GuardBench.
- More examples are available in the `scripts` folder.
Browse the documentation for more details about:
- The datasets and how to obtain them.
- The data format used by GuardBench.
- How to use the `Report` class to compare models and export results as LaTeX tables (a rough sketch follows below).
- How to leverage GuardBench's benchmarking pipeline on custom datasets.
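As a purely hypothetical sketch of that comparison workflow (the `Report` constructor and `to_latex` method used below are assumptions, not the documented API; please consult the documentation for the actual interface), comparing benchmarked models and exporting a LaTeX table might look roughly like this:

```python
# Hypothetical sketch only: the names below are assumptions, not GuardBench's documented API.
from guardbench import Report

# Assumed: a Report aggregates the results produced by previous `benchmark` runs.
report = Report(model_names=["My Guardrail Model", "Llama Guard"])

# Assumed: export the comparison as a LaTeX table for a publication.
print(report.to_latex())
```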
You can find GuardBench's leaderboard here.
All results can be reproduced using the provided scripts.
If you want to submit your results, please contact us.
- Elias Bassani (European Commission - Joint Research Centre)
```bibtex
@inproceedings{guardbench,
    title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
    author = "Bassani, Elias and
      Sanchez, Ignacio",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1022",
    doi = "10.18653/v1/2024.emnlp-main.1022",
    pages = "18393--18409",
}
```
Would you like to see other features implemented? Please open a feature request.
GuardBench is provided as open-source software licensed under EUPL v1.2.