Available on Hugging Face: https://huggingface.co/datasets/MMG/SpanishBFF
Spanish-BFF is the first AI-generated Spanish dictionary, produced with GPT-3.
- Paper: Spanish Built Factual Freectianary (Spanish-BFF): the first AI-generated free dictionary
- Point of Contact: [email protected] , [email protected]
Spanish-BFF contains a total of 66353 lemmas with their definitions (only one definition per lemma).
These lemmas belong to the nominal, adjectival, verbal and adverbial classes.
- Spanish (es)
{ 'id': 'b0o8', 'lemma': 'fomo', 'definition': 'FOMO es un acrónimo de "miedo a perderse", y se refiere a la ansiedad que uno puede sentir cuando ve que otros están disfrutando de algo que él o ella no está haciendo.', }
{ id: str, lemma: str, definition: str }
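The schema above can be checked mechanically before records are used downstream. The following is a minimal sketch, not part of the dataset tooling: the names `Entry` and `is_valid_entry` are hypothetical, and the only assumption is that every record carries exactly the three string fields listed.

```python
from typing import TypedDict


class Entry(TypedDict):
    """One dictionary record; field names follow the schema above."""
    id: str
    lemma: str
    definition: str


def is_valid_entry(record: dict) -> bool:
    """Return True if the record has exactly the three expected string fields."""
    expected = {"id", "lemma", "definition"}
    return set(record) == expected and all(
        isinstance(record[key], str) for key in expected
    )


# The example instance shown above, reused as a sanity check.
example: Entry = {
    "id": "b0o8",
    "lemma": "fomo",
    "definition": (
        'FOMO es un acrónimo de "miedo a perderse", y se refiere a la ansiedad '
        "que uno puede sentir cuando ve que otros están disfrutando de algo "
        "que él o ella no está haciendo."
    ),
}

print(is_valid_entry(example))
```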
Split | Size |
---|---|
train | 66,353 |
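Since the dataset ships as a single `train` split, loading it might look like the sketch below. It assumes the third-party `datasets` library is installed and that the repository id `MMG/SpanishBFF` (taken from the URL above) loads without extra configuration; the helper name `load_spanish_bff` is hypothetical.

```python
DATASET_ID = "MMG/SpanishBFF"  # repository id taken from the Hugging Face URL


def load_spanish_bff(split: str = "train"):
    """Load one split of the dataset (downloads it on the first call)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_spanish_bff()` should return the `train` split, from which individual records can be read by index, for example `load_spanish_bff()[0]["lemma"]`.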
- Number of nouns: 38093 (57.41 %)
- Number of adjectives: 17424 (26.26 %)
- Number of verbs: 9296 (14.01 %)
- Number of adverbs: 1540 (2.32 %)
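The percentages above follow directly from the per-class counts. A short sketch that recomputes them, using only the figures listed (the variable names are illustrative):

```python
# Per-class lemma counts as listed above.
counts = {"nouns": 38093, "adjectives": 17424, "verbs": 9296, "adverbs": 1540}

total = sum(counts.values())

# Share of each class as a percentage, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share:.2f} %)")
```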
Uncertainties are given with a coverage factor k=1, that is, one standard deviation of the population of definitions.
- Total words in definitions: 551878
- Average words/definition: 8.3 +/- 5.1 words
- Average characters/definition: 49.1 +/- 28.4 characters
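Statistics of this kind can be reproduced along the following lines. This is a sketch under assumptions: words are counted by whitespace splitting and the spread is the population standard deviation (`statistics.pstdev`). The card does not state the exact tokenization used, so results on the real data may differ slightly. The helper name `length_stats` and the sample strings are hypothetical.

```python
from statistics import mean, pstdev


def length_stats(definitions, unit="words"):
    """Return (mean, population standard deviation) of definition lengths."""
    if unit == "words":
        # Assumption: a "word" is any whitespace-separated token.
        lengths = [len(text.split()) for text in definitions]
    elif unit == "chars":
        lengths = [len(text) for text in definitions]
    else:
        raise ValueError("unit must be 'words' or 'chars'")
    return mean(lengths), pstdev(lengths)


# Toy input for illustration only; not taken from the dataset.
sample = ["uno dos tres", "uno"]
print(length_stats(sample, "words"))
print(length_stats(sample, "chars"))
```

As a consistency check on the reported figures, 551878 total words over 66353 definitions gives about 8.3 words per definition.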
Each definition was generated in batches using the following prompt:
Generate in Spanish a definition of the word "[word]"
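The prompt template and the batching step could be sketched as follows. Only the prompt wording comes from the card; the batch size, the helper names `build_prompt` and `batched`, and the sample lemmas are assumptions for illustration, and the call to the language model itself is deliberately left out.

```python
PROMPT_TEMPLATE = 'Generate in Spanish a definition of the word "{word}"'


def build_prompt(lemma: str) -> str:
    """Fill the prompt template with a single lemma."""
    return PROMPT_TEMPLATE.format(word=lemma)


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical lemmas and batch size, for illustration only.
lemmas = ["fomo", "casa", "correr"]
for batch in batched(lemmas, 2):
    prompts = [build_prompt(lemma) for lemma in batch]
    print(prompts)
```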
This corpus is the first open-source complete dictionary produced by LLMs. We intend to contribute to a better understanding and development of NLP and promote responsible use.
This version has not been postprocessed to mitigate potential errors, biases or hallucinations the AI model could have generated.
@misc{https://doi.org/10.48550/arxiv.2302.12746,
doi = {10.48550/ARXIV.2302.12746},
url = {https://arxiv.org/abs/2302.12746},
author = {Ortega-Martín, Miguel and García-Sierra, Óscar and Ardoiz, Alfonso and Armenteros, Juan Carlos and Álvarez, Jorge and Alonso, Adrián},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Spanish Built Factual Freectianary (Spanish-BFF): the first AI-generated free dictionary},
publisher = {arXiv},
year = {2023},
copyright = {Creative Commons Attribution 4.0 International}
}