


Babelscape/LLM-Oasis


Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis

Paper | License: CC BY-NC-SA 4.0 | Hugging Face Collection

LLM-Oasis Overview

This repository contains the resource introduced in the paper "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis". LLM-Oasis is a large-scale resource for end-to-end factuality evaluation, obtained by extracting and falsifying information from Wikipedia. Specifically, given a Wikipedia text, we extract a set of factual claims and a set of unfactual claims, the latter obtained by falsifying one of the facts expressed in the original text. Starting from these sets, we define two claims2text tasks to generate a factual text, which paraphrases the original one, and its unfactual counterpart, which features the falsified claim. The result is 81k ⟨factual, unfactual⟩ pairs suitable for training and evaluating fact-checking systems.
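The data flow behind each pair can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the falsification and claims2text steps are stubbed out as hypothetical callables supplied by the caller, both texts are assumed to be generated from their respective claim sets, and `OasisExample` and `build_pair` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OasisExample:
    """Hypothetical container for one <factual, unfactual> pair."""
    factual_claims: List[str]
    unfactual_claims: List[str]
    factual_text: str
    unfactual_text: str


def build_pair(
    claims: List[str],
    falsify_index: int,
    falsify: Callable[[str], str],
    claims2text: Callable[[List[str]], str],
) -> OasisExample:
    """Falsify exactly one claim, then verbalize both claim sets."""
    if not 0 <= falsify_index < len(claims):
        raise IndexError("falsify_index out of range")
    unfactual = list(claims)  # copy, so the factual set stays untouched
    unfactual[falsify_index] = falsify(claims[falsify_index])
    return OasisExample(
        factual_claims=list(claims),
        unfactual_claims=unfactual,
        factual_text=claims2text(claims),        # stands in for the paraphrase
        unfactual_text=claims2text(unfactual),   # contains the falsified claim
    )


# Toy stand-ins for the model-based steps (hypothetical, for illustration only).
claims = ["The Eiffel Tower is in Paris.", "The Eiffel Tower was completed in 1889."]
pair = build_pair(claims, 1, lambda c: c.replace("1889", "1901"), " ".join)
print(pair.unfactual_text)
```

Copying the claim list and substituting a single entry mirrors the description above: the factual and unfactual claim sets differ in exactly one fact.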

Datasets

LLM-Oasis comprises multiple datasets, all hosted on Hugging Face, addressing different stages of the pipeline as described in the paper:

  • Claim Extraction
  • Claim Falsification
  • Paraphrase Generation
  • Unfactual Text Generation

Gold Benchmark

Task 1: End-to-End Factuality Evaluation

🚨 Evaluate your LLM 🚨

Do you want to evaluate your LLM as an end-to-end factuality evaluator on our gold benchmark? Submit your predictions here: Submission form

Upload a .jsonl file in which each line is a JSON object of the following form:

{
  "id": str,       # matches the "id" value in Babelscape/LLM-Oasis_e2e_factuality_evaluation
  "factual": bool  # true if the text is factual, false otherwise
}
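A minimal sketch of producing such a file in Python, assuming your predictions are held in a dict that maps each benchmark `id` to a boolean verdict. The helpers `to_jsonl_lines` and `write_submission`, the example ids and the output path are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def to_jsonl_lines(predictions: Dict[str, bool]) -> List[str]:
    """Serialize {id: is_factual} predictions as one JSON object per line."""
    # bool() also normalizes e.g. NumPy booleans, which json.dumps rejects.
    return [
        json.dumps({"id": str(example_id), "factual": bool(is_factual)})
        for example_id, is_factual in predictions.items()
    ]


def write_submission(predictions: Dict[str, bool], path: str) -> None:
    """Write the predictions to a UTF-8 .jsonl file at `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(predictions):
            f.write(line + "\n")


# Hypothetical ids: replace them with the real 'id' values from
# Babelscape/LLM-Oasis_e2e_factuality_evaluation and your model's verdicts.
predictions = {"example-0": True, "example-1": False}
lines = to_jsonl_lines(predictions)
out_path = os.path.join(tempfile.gettempdir(), "task1_predictions.jsonl")
write_submission(predictions, out_path)
```

Serializing with `json.dumps` rather than formatting strings by hand guarantees double-quoted keys and lowercase `true`/`false`, as JSON requires.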

Task 2: Evidence-Based Claim Verification

  • Hugging Face: Babelscape/LLM-Oasis_claim_verification
    • Contains data for verifying the veracity of a single claim against evidence from Wikipedia.
    • Labels have been removed for blind evaluation.
    • Refer to Section 4.2 of the paper for more details.

🚨 Evaluate your LLM 🚨

Do you want to evaluate your LLM for claim verification on our gold benchmark?

Submit your predictions here: Submission form

Upload a .jsonl file in which each line is a JSON object of the following form:

{
  "id": str,       # matches the "id" value in Babelscape/LLM-Oasis_claim_verification
  "factual": bool  # true if the claim is factual, false otherwise
}
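Before uploading, a quick local check can catch malformed lines. The sketch below works for either task's file; it assumes (this is not stated by the organizers) that every benchmark id should receive exactly one prediction, and `validate_submission` and the sample ids are hypothetical.

```python
import json
from typing import Iterable, List, Set


def validate_submission(lines: Iterable[str], expected_ids: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    seen: Set[str] = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {lineno}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict) or not isinstance(record.get("id"), str):
            problems.append(f"line {lineno}: expected an object with a string 'id'")
            continue
        record_id = record["id"]
        # JSON true/false load as Python bool; strings such as "yes" or 0/1 do not pass.
        if not isinstance(record.get("factual"), bool):
            problems.append(f"line {lineno}: 'factual' must be true or false")
        if record_id in seen:
            problems.append(f"line {lineno}: duplicate id {record_id!r}")
        elif record_id not in expected_ids:
            problems.append(f"line {lineno}: unknown id {record_id!r}")
        seen.add(record_id)
    missing = expected_ids - seen
    if missing:
        problems.append(f"{len(missing)} expected id(s) have no prediction")
    return problems


# Hypothetical ids and a deliberately faulty two-line submission.
expected = {"claim-0", "claim-1"}
submission = [
    '{"id": "claim-0", "factual": true}',
    '{"id": "claim-0", "factual": "yes"}',
]
problems = validate_submission(submission, expected)
for problem in problems:
    print(problem)
```

An open file object can be passed directly as `lines`, since iterating over it yields one line at a time and `json.loads` tolerates the trailing newline.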

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Citation

If you use LLM-Oasis in your work, please cite our paper:

@misc{scirè2024truthmirageendtoendfactuality,
      title={Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis}, 
      author={Alessandro Scirè and Andrei Stefan Bejgu and Simone Tedeschi and Karim Ghonim and Federico Martelli and Roberto Navigli},
      year={2024},
      eprint={2411.19655},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.19655}, 
}
