<p> duplicates #1232

Open
Samuel-Scalbert opened this issue Jan 16, 2025 · 5 comments
Labels
bug From Hemiptera and especially its suborder Heteroptera

Comments

@Samuel-Scalbert

When processing this file from HAL with the latest Docker image, the <p> tags are duplicated and nested inside one another:

  • hal-03811257 (line 229)
<p><p><p>The creation of Web browser extensions to improve Web search is not new; almost twenty years ago, the first approaches started to appear. Although, most of them are still focused on primary searches or context-aware searches, such as the case of SearchPad</p>[? ]</p>. SearchPanel [? ] is an approach based on Web browser extensibility that helps users improve their search activity. However, SearchPanel is focused on improving information seeking by augmenting results on Web search services like Google or DuckDuckGo. However, it does not allow users to create new search services specific to a domain.</p>
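
A rough way to spot this kind of nesting in the output is to track how deep the <p> tags go. The sketch below is only an illustration and is not part of GROBID: the function name `max_p_depth` and the regular expression are assumptions. It scans a fragment as plain text rather than parsing it as XML, so it also works on output that is not well-formed.

```python
import re

# Matches opening <p ...> and closing </p> tags only; tags such as <pb/> are ignored.
# Limitation: a self-closing "<p />" written with a space would be counted as an opening tag.
P_TAG = re.compile(r"<(/?)p(?:\s[^>]*)?>")


def max_p_depth(fragment: str) -> int:
    """Return the deepest nesting level of <p> elements found in the fragment."""
    depth = 0
    deepest = 0
    for match in P_TAG.finditer(fragment):
        if match.group(1):
            # Closing tag: step back out, never going below zero.
            depth = max(depth - 1, 0)
        else:
            # Opening tag: step in and remember the deepest level seen.
            depth += 1
            deepest = max(deepest, depth)
    return deepest


# Shortened version of the fragment reported above (hypothetical sample text).
sample = "<p><p><p>first sentence</p>[? ]</p>. Rest of the paragraph.</p>"
if max_p_depth(sample) > 1:
    print("nested <p> elements found")
```

Any result greater than 1 means a paragraph was opened inside another paragraph.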
@lfoppiano
Collaborator

Hi @Samuel-Scalbert, which version and operating system are you using?

@lfoppiano lfoppiano added the bug From Hemiptera and especially its suborder Heteroptera label Jan 16, 2025
@Samuel-Scalbert
Author

Hi, here is the information:

Operating System: Ubuntu 22.04.5 LTS
Kernel: Linux 6.8.0-51-generic
Architecture: x86-64

@lfoppiano
Collaborator

Sorry, which version of grobid are you using?

@Samuel-Scalbert
Author

I use this image: grobid/grobid:0.8.0

@lfoppiano
Collaborator

This is indeed a bug, and it is also present in version 0.8.1. It is caused by a table and its related note being created where they shouldn't be. It was fixed in #1207.

The latest dev version should work better: https://huggingface.co/spaces/lfoppiano/grobid-dev

Let's leave this issue open so that I don't forget to double-check how the note is created.
