Get the datasets

Warning

By downloading and using the datasets provided by GuardBench, you agree to the licenses of the specific datasets you intend to use. If you are unsure about what you can and cannot do with a specific dataset, please refer to its original repository and owners.

HuggingFace CLI Login

To download some datasets, you need to be logged into HuggingFace. If you don't have an account, create one. Then, create or copy an access token from your profile settings. Finally, log into HuggingFace using the HuggingFace CLI:

huggingface-cli login
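After logging in, the CLI stores your access token locally so later downloads can reuse it. The sketch below checks whether a token appears to be available before you run GuardBench. It assumes the default `huggingface_hub` lookup (an `HF_TOKEN` environment variable first, otherwise a `token` file under `HF_HOME`, which defaults to `~/.cache/huggingface`) and ignores other overrides. The helper `find_hf_token` is hypothetical and not part of GuardBench.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a HuggingFace token if one appears to be configured, else None.

    Assumption: simplified version of huggingface_hub's default lookup
    (HF_TOKEN env var, then $HF_HOME/token with HF_HOME defaulting to
    ~/.cache/huggingface). Other configuration overrides are ignored.
    """
    env = os.environ if env is None else env
    # 1. An explicit environment variable takes precedence.
    token = env.get("HF_TOKEN")
    if token and token.strip():
        return token.strip()
    # 2. Otherwise look for the token file written by `huggingface-cli login`.
    hf_home = Path(env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
    token_file = hf_home.expanduser() / "token"
    if token_file.is_file():
        content = token_file.read_text().strip()
        return content or None
    return None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("No HuggingFace token found; run `huggingface-cli login` first.")
    else:
        print("HuggingFace token found.")
```

The token value itself is never printed, so the check is safe to run in shared terminals or CI logs.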

DecodingTrust

To download DecodingTrust, you must first fill out this form.

HEx-PHI

To download HEx-PHI, you must first fill out this form.

ToxiGen

To download ToxiGen, you must first fill out this form. We suggest using the same email address as your HuggingFace account.

Download Datasets

Datasets are downloaded and formatted the first time they are requested, then stored in ~/.guardbench/datasets for later use.
You can also download and format all datasets in advance with the following command:

python -c "import guardbench; guardbench.download_all()"
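To check which datasets have already been downloaded, you can list the contents of the cache directory. This is a minimal sketch that assumes each dataset is stored as a file or subdirectory directly under `~/.guardbench/datasets`. The exact on-disk layout is not described here, so treat the entry names as indicative. `list_cached_datasets` is a hypothetical helper, not part of GuardBench.

```python
from pathlib import Path
from typing import List, Optional

# Cache location used by GuardBench for downloaded and formatted datasets.
DEFAULT_CACHE = Path.home() / ".guardbench" / "datasets"


def list_cached_datasets(cache_dir: Optional[Path] = None) -> List[str]:
    """Return the sorted names of entries in the dataset cache.

    Assumption: one file or subdirectory per dataset; hidden entries are skipped.
    Returns an empty list if the cache directory does not exist yet.
    """
    cache_dir = DEFAULT_CACHE if cache_dir is None else Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if not p.name.startswith("."))


if __name__ == "__main__":
    cached = list_cached_datasets()
    print(f"{len(cached)} cached entries in {DEFAULT_CACHE}")
    for name in cached:
        print(" -", name)
```

Because the helper only reads the directory, it never triggers a download. An empty result means either nothing has been requested yet or the assumed layout does not match the installed version.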