Truth or Mirage?
Towards End-to-End Factuality Evaluation
with LLM-Oasis

This repository contains the resource introduced in the paper "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis". LLM-Oasis is a large-scale resource for end-to-end factuality evaluation, built by extracting and falsifying information from Wikipedia. Specifically, given a text from Wikipedia, we extract a set of factual claims and derive a set of unfactual claims by falsifying one of the facts expressed in the original text. Starting from these two sets, we design two claims2text tasks that generate a factual text, which is a paraphrase of the original, and its unfactual counterpart, which features the falsified claim. The result is 81k ⟨factual, unfactual⟩ pairs suitable for training and evaluating fact-checking systems.

Datasets

LLM-Oasis comprises multiple datasets, all hosted on Hugging Face, addressing different stages of the pipeline as described in the paper:

Claim Extraction

Babelscape/LLM-Oasis_claim_extraction
- Contains the text-claims pairs used to train the claim extraction system.
- Refer to Section 3.1 of the paper for more details.

Claim Falsification

Babelscape/LLM-Oasis_claim_falsification
- Includes the outcome of the claim falsification process.
- Refer to Section 3.2 of the paper for more details.

Paraphrase Generation

Babelscape/LLM-Oasis_paraphrase_generation
- Contains the factual texts, i.e., the paraphrases, generated from the extracted claims.
- Refer to Section 3.3 of the paper for more details.

Unfactual Text Generation

Babelscape/LLM-Oasis_unfactual_text_generation
- Contains the unfactual texts generated from the set of extracted claims that includes the falsified one.
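
The four pipeline datasets listed above can be pulled from Hugging Face by their repository IDs. The snippet below is a minimal sketch, assuming the `datasets` library is installed; the `OASIS_DATASETS` mapping and the `load_oasis` helper are hypothetical names introduced here for illustration, and the available splits and column names are not stated in this README, so inspect the returned object before relying on them.

```python
# Repository IDs as listed in this README, keyed by pipeline stage.
OASIS_DATASETS = {
    "claim_extraction": "Babelscape/LLM-Oasis_claim_extraction",
    "claim_falsification": "Babelscape/LLM-Oasis_claim_falsification",
    "paraphrase_generation": "Babelscape/LLM-Oasis_paraphrase_generation",
    "unfactual_text_generation": "Babelscape/LLM-Oasis_unfactual_text_generation",
}


def load_oasis(stage, split=None):
    """Load one LLM-Oasis dataset by stage name (hypothetical helper).

    With split=None, `load_dataset` returns every available split.
    """
    if stage not in OASIS_DATASETS:
        raise ValueError(f"unknown stage {stage!r}; choose from {sorted(OASIS_DATASETS)}")
    # Imported lazily so the mapping can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(OASIS_DATASETS[stage], split=split)
```

Calling `load_oasis("claim_extraction")` and printing the result shows the split names and columns that the hosted dataset actually exposes.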

Gold Benchmark

Task 1: End-to-End Factuality Evaluation

Babelscape/LLM-Oasis_e2e_factuality_evaluation
- Contains data for assessing the factuality of raw texts in natural language.
- Labels have been removed for blind evaluation.
- Refer to Section 4.2 of the paper for more details.

🚨 Evaluate your LLM 🚨

Do you want to evaluate your LLM as an end-to-end factuality evaluator on our gold benchmark? Submit your predictions here: Submission form

Upload a .jsonl file whose entries are formatted as follows:

{
  'id': str,       # matches the 'id' value in Babelscape/LLM-Oasis_e2e_factuality_evaluation
  'factual': bool  # True if the text is factual, False otherwise
}
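
A predictions file in the entry format above can be produced with the standard `json` module. This is a minimal sketch, not the official submission tooling: the helper names `to_jsonl` and `save_predictions` and the default file name `predictions.jsonl` are assumptions made for illustration, and the predictions are assumed to be held in a dict that maps each example `id` to a Python `bool`.

```python
import json


def to_jsonl(predictions):
    """Serialize a {id: bool} mapping as JSON Lines, one entry per line."""
    lines = []
    for example_id, is_factual in predictions.items():
        # Reject anything that is not a real bool: bool("False") would be True.
        if not isinstance(is_factual, bool):
            raise TypeError(f"prediction for {example_id!r} must be a bool")
        lines.append(json.dumps({"id": example_id, "factual": is_factual}))
    return "".join(line + "\n" for line in lines)


def save_predictions(predictions, path="predictions.jsonl"):
    """Write the serialized predictions to `path` (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(predictions))
```

Note that the serialized file uses JSON syntax (double quotes, lowercase `true`/`false`), which is what `json.dumps` emits for Python strings and booleans.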

Task 2: Evidence-Based Claim Verification

Babelscape/LLM-Oasis_claim_verification
- Contains data for verifying the veracity of a single claim against evidence from Wikipedia.
- Labels have been removed for blind evaluation.
- Refer to Section 4.2 of the paper for more details.

🚨 Evaluate your LLM 🚨

Do you want to evaluate your LLM for claim verification on our gold benchmark?

Submit your predictions here: Submission form

Upload a .jsonl file whose entries are formatted as follows:

{
  'id': str,       # matches the 'id' value in Babelscape/LLM-Oasis_claim_verification
  'factual': bool  # True if the claim is factual, False otherwise
}
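
Before uploading, it can help to check that every line of the file parses as JSON and carries the two fields described above. The checker below is a minimal sketch under stated assumptions: `validate_submission` is a hypothetical helper, not part of the benchmark, and `expected_ids` is an optional list of `id` values that you would collect yourself from the corresponding dataset.

```python
import json


def validate_submission(jsonl_text, expected_ids=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        if not isinstance(entry.get("id"), str):
            problems.append(f"line {lineno}: 'id' must be a string")
            continue
        if not isinstance(entry.get("factual"), bool):
            problems.append(f"line {lineno}: 'factual' must be true or false")
        if entry["id"] in seen:
            problems.append(f"line {lineno}: duplicate id {entry['id']!r}")
        seen.add(entry["id"])
    if expected_ids is not None:
        missing = set(expected_ids) - seen
        extra = seen - set(expected_ids)
        if missing:
            problems.append(f"{len(missing)} expected id(s) have no prediction")
        if extra:
            problems.append(f"{len(extra)} id(s) are not in the expected set")
    return problems
```

A line written in Python-literal style, such as `{'id': 'a', 'factual': True}`, is reported as invalid JSON, because JSON requires double-quoted strings and lowercase booleans.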

License

This work is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Citation

If you use LLM-Oasis in your work, please cite our paper:

@misc{scirè2024truthmirageendtoendfactuality,
      title={Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis}, 
      author={Alessandro Scirè and Andrei Stefan Bejgu and Simone Tedeschi and Karim Ghonim and Federico Martelli and Roberto Navigli},
      year={2024},
      eprint={2411.19655},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.19655}, 
}
