Add free URI field to both place and person Recogito search #5

Open
gabrielbodard opened this issue Jul 27, 2017 · 8 comments

Comments

@gabrielbodard

No description provided.

@rsimon

rsimon commented Jul 28, 2017

For Person, no specific extension is needed. I checked & confirmed that the standard uri field in the annotation body can just take any URI, and Recogito will store it happily.

For Places, the current implementation strictly requires place URIs that are known in the built-in gazetteer index, at the time the annotation is created. This kind of integrity enforcement would need to be relaxed if we want to allow this.

A warning, however: such "dangling links" would have a few consequences further down the line, so a more thorough analysis of the pros and cons will be needed at some point.

@rsimon

rsimon commented Jul 29, 2017

Looked into this further & think it makes sense to treat "known" URIs differently from the others in the model. It would be (much) easier to query them separately, meaning that:

  • we can more easily derive separate stats for them (for the future annotation stats page)
  • it could be easier to handle updates to gazetteers or person data (which might lead to an "unknown URI" turning into a "known" one). But this could also be treated differently....
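
A minimal sketch of the "known" vs. "unknown" split discussed above, assuming the gazetteer index can be represented as a plain set of URIs. The names `partition_uris` and `gazetteer_index`, and the example URIs, are hypothetical and not taken from Recogito's code base.

```python
from typing import Iterable, List, Set, Tuple


def partition_uris(
    uris: Iterable[str], gazetteer_index: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split annotation URIs into (known, unknown) relative to the index."""
    known: List[str] = []
    unknown: List[str] = []
    for uri in uris:
        (known if uri in gazetteer_index else unknown).append(uri)
    return known, unknown


# Hypothetical data, for illustration only.
gazetteer_index = {"http://example.org/place/1"}
annotation_uris = ["http://example.org/place/1", "http://example.org/place/2"]

known, unknown = partition_uris(annotation_uris, gazetteer_index)

# A gazetteer update can turn an "unknown" URI into a "known" one.
gazetteer_index.add("http://example.org/place/2")
known_after, unknown_after = partition_uris(annotation_uris, gazetteer_index)
```

Computing the split at query time, as here, is one way to handle gazetteer updates; storing the flag on each annotation would instead require re-indexing whenever a gazetteer changes.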

@gabrielbodard
Author

That's fair enough, and I'm all in favour of granular fields that allow separate querying down the line. But, and I think this is important, the two URI fields should also be queryable together. I see several possible cases for this:

  • Free entry of URIs that are known to Pelagios (or that become known to it later)
  • preserving the difference between Recogito-selected and user-pasted URIs (regardless of whether they are or become recognised)
  • comparison of URIs for machine-assisted disambiguation/coreference or reasoning
  • analysis of annotations with multiple URIs for assisted gazetteer alignment

In other words, I'm wondering whether the more useful distinction is not between known and unknown (although that has value for parsing/visualisation), but between interface-selected and manually entered.
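
As a rough illustration of that distinction, the sketch below keeps all URIs in one list and tags each with how it was entered, so they can be queried together or separately. The `UriRef` type, the `origin` values "selected" and "manual", and the example URIs are assumptions made for this sketch, not part of Recogito's model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UriRef:
    uri: str
    origin: str  # hypothetical values: "selected" (via the interface) or "manual" (pasted)


def query_uris(refs: List[UriRef], origin: Optional[str] = None) -> List[str]:
    """Return all URIs, or only those with the given origin."""
    return [r.uri for r in refs if origin is None or r.origin == origin]


# Hypothetical data, for illustration only.
refs = [
    UriRef("http://example.org/place/1", "selected"),
    UriRef("http://example.org/person/7", "manual"),
]

all_uris = query_uris(refs)               # both origins together
manual_only = query_uris(refs, "manual")  # manually entered only
```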

@rsimon

rsimon commented Jul 31, 2017

Hm... good point. Probably doesn't make modeling easier though ;-) Can we discuss the use cases a bit more?

  • Free entry of URIs that are known to Pelagios (or that become known to it later)

It totally makes sense to be able to add a URI directly, irrespective of whether it's known or not, I agree. But would it be essential to know whether it's been manually added or not? (Vs. manually searched in the gazetteer, for example?) After all, there is still the "confirmed" vs. "non-confirmed" flag, if the point was to distinguish between NER annotations and user-provided ones. In addition, automated NER that has not been touched by a human user is already identifiable because it has no "created by" information attached to it.
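
To make the point about existing flags concrete, here is a small sketch of how the two signals could be combined. The `Annotation` fields and the labels returned by `provenance` are hypothetical simplifications; `created_by = None` stands in for an annotation with no "created by" information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    uri: Optional[str]
    confirmed: bool
    created_by: Optional[str]  # None: no "created by" information attached


def provenance(a: Annotation) -> str:
    """Classify an annotation using only the two existing signals."""
    if a.created_by is None:
        return "ner-untouched"  # automated NER, never touched by a user
    return "user-confirmed" if a.confirmed else "user-unconfirmed"


# Hypothetical data, for illustration only.
ner = Annotation("http://example.org/place/1", confirmed=False, created_by=None)
edited = Annotation("http://example.org/place/2", confirmed=True, created_by="alice")
```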

  • preserving the difference between Recogito-selected and user-pasted URIs

Would the key use case for this be to benchmark the NER?

  • comparison of URIs for machine-assisted disambiguation/coreference or reasoning
  • analysis of annotations with multiple URIs for assisted gazetteer alignment

Can you elaborate on these two a bit more? E.g. give examples?

@rsimon

rsimon commented Jan 17, 2018

Hi @gabrielbodard (and CC-ing @thegsi),

just a quick heads-up that I'm picking up work on this again. Time (as always) is limited, but I'd at least like to spend a couple of days building a prototype branch of Recogito with a changed internal data model, where the "URIs have to be known & indexed" constraint is removed.

I think the code/schema changes may not be so bad after all. I'm expecting a performance hit on some rather essential features (map view, data exports) & don't yet know how bad it will be. Also, transforming the 500k+ existing annotations in our live instance to a new format will be a bit of open-heart surgery, but let's worry about that when we get there ;-)

If it works, however, I think we would not only be able to support the feature as such; but it would also simplify Recogito's internal structure and potentially make it a lot easier to plug in external knowledge bases and URI sources. Hence, definitely a goal worth pursuing. I'll keep you posted on the progress!

P.S.: I'm also documenting stuff at pelagios#413

@thegsi

thegsi commented Feb 1, 2018

@rsimon Sounds great. Probably Scala work? Do keep me updated on progress here and/or by email, and let me know if you need some JavaScript work.

@rsimon

rsimon commented Feb 1, 2018

The bulk of the backend work (yes, all Scala) is now done. It still needs testing and probably a bit of bugfixing here and there, but overall it's looking good. For various reasons, however, I won't be deploying this to the live instance until mid-February, with a second update at the beginning of March. (Most importantly, my institution is moving office and will take down the server for a week or so. I'll therefore need to move everything to a rented VM and then back once the move is done. I'm planning to combine the move-related downtimes with the system upgrades.)

@rsimon

rsimon commented Feb 1, 2018

P.S.: I am now moving back to mostly frontend/JavaScript work. E.g. options to change the map colouring based on different properties (tags, annotation status etc.):

https://github.com/pelagios/recogito2/tree/master/app/assets/javascripts/document/map

and enhancing the gazetteer search dialog, e.g. adding options to filter by gazetteer etc.:

https://github.com/pelagios/recogito2/tree/master/app/assets/javascripts/document/annotation/common/georesolution

Also, of course, a "georesolution-panel-alternative" for searching person datasets would be highly interesting, as discussed.
