
Training fulltext - annotation doubt #1237

Open
martasoricetti opened this issue Jan 23, 2025 · 1 comment
Labels
models:fulltext question There's no such thing as a stupid question training guidelines Related to the annotation guidelines for training data

Comments


martasoricetti commented Jan 23, 2025

I’m training the fulltext model and I have a question. Sometimes an in-text reference is split in two by a figure, a table, or a formula.
How should I handle these situations in the annotation process?

<p>[…]<ref type="biblio">(Ramanathan et<lb/></ref></p>
 
 <figure type="table">Table 5. […] </figure> 
 
 <figure>Figure 3. […] </figure>
 
 <p><ref type="biblio">al., 2001)</ref>. However, it was argued that the use of APCADA,<lb/>[…]</p>

I tried this approach (splitting the same in-text reference pointer across two separate tags), but I don’t know if it’s the right choice.

@lfoppiano
Collaborator

Hi @martasoricetti, I think that's the right approach. However, I'm not sure how these rare cases will be reconstructed after the model extracts them.

You could add xml:id / corresp attributes, but only to references that are split by other elements.
Those attributes are ignored at the moment, but they might be used in the future to establish that both fragments belong to the same reference.

Something like:

<p>[…]<ref type="biblio" xml:id="ref1">(Ramanathan et<lb/></ref></p>
 
 <figure type="table">Table 5. […] </figure> 
 
 <figure>Figure 3. […] </figure>
 
 <p><ref type="biblio" corresp="#ref1">al., 2001)</ref>. However, it was argued that the use of APCADA,<lb/>[…]</p>
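
For illustration only, here is a minimal sketch of how the xml:id / corresp pairing above could be used to rejoin the two fragments in a post-processing step. This is not something GROBID does today (the attributes are currently ignored); the function name `merge_split_refs` and the sample wrapper element are hypothetical, the elided `[…]` content is replaced with placeholder text, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is predefined, so xml:id is exposed under this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def merge_split_refs(xml_text):
    """Return the text of each biblio ref, joining fragments linked via corresp.

    Hypothetical helper: assumes un-namespaced <ref> elements and that a
    continuation fragment points back to the first fragment with corresp="#id".
    """
    root = ET.fromstring(xml_text)
    merged = {}  # key (xml:id or generated) -> list of text fragments
    order = []   # keys in document order
    for i, ref in enumerate(root.iter("ref")):
        if ref.get("type") != "biblio":
            continue
        # Collect the text inside the ref and normalise whitespace.
        text = " ".join("".join(ref.itertext()).split())
        target = ref.get("corresp", "")
        if target.startswith("#") and target[1:] in merged:
            # Continuation of a reference that was opened earlier.
            merged[target[1:]].append(text)
        else:
            key = ref.get(XML_ID) or f"_anon{i}"
            merged[key] = [text]
            order.append(key)
    return [" ".join(merged[key]) for key in order]


# Placeholder sample modelled on the annotation above.
sample = """<text>
<p>Some text <ref type="biblio" xml:id="ref1">(Ramanathan et<lb/></ref></p>
<figure type="table">Table 5.</figure>
<figure>Figure 3.</figure>
<p><ref type="biblio" corresp="#ref1">al., 2001)</ref>. However, it was argued.</p>
</text>"""

print(merge_split_refs(sample))
```

References without a corresp attribute pass through unchanged, so the attributes only need to be added to the rare split cases.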

@lfoppiano lfoppiano added question There's no such thing as a stupid question models:fulltext training guidelines Related to the annotation guidelines for training data labels Jan 23, 2025