Add free URI field to both place and person Recogito search #5
Comments
For Person, no specific extension is needed. I checked & confirmed that the standard
For Places, the current implementation strictly requires place URIs that are known in the built-in gazetteer index at the time the annotation is created. This kind of integrity enforcement would need to be relaxed if we want to allow this. Warning, however: such "dangling links" would have a few consequences further down the line, i.e. a somewhat more thorough analysis of pros/cons will be needed at some point. |
Looked into this further & think it makes sense to treat "known" URIs differently from the others in the model. It would be (much) easier to query them separately meaning that
|
That's fair enough, and I'm all in favour of granular fields that allow separate querying down the line. But, and I think this is important, the two URI fields must also be queryable together. I see several possible cases for this:
In other words, I'm wondering if the more useful distinction isn't between known and unknown (although that has value for parsing/visualisation), but between interface-selected and manually entered. |
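To make the modelling question above concrete, here is a minimal sketch of two separate URI fields that can still be queried together. The class and field names (`EntityBody`, `known_uri`, `free_uri`) and the `example.org` URIs are assumptions for illustration only; they are not taken from Recogito's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityBody:
    """Hypothetical annotation body; names are illustrative, not Recogito's schema."""
    known_uri: Optional[str] = None  # URI found in the built-in gazetteer index
    free_uri: Optional[str] = None   # URI entered by hand, not checked against the index

    def uris(self) -> List[str]:
        # Combined view over both fields, so callers can ignore the distinction.
        return [u for u in (self.known_uri, self.free_uri) if u is not None]


def find_by_uri(bodies: List[EntityBody], uri: str) -> List[EntityBody]:
    """Query both URI fields together."""
    return [b for b in bodies if uri in b.uris()]


def find_known_only(bodies: List[EntityBody], uri: str) -> List[EntityBody]:
    """Query only the field backed by the gazetteer index."""
    return [b for b in bodies if b.known_uri == uri]
```

With this shape, the separate query stays cheap while the combined query is a thin wrapper over both fields. Whether the second field should mean "unknown" or "manually entered" is exactly the open question in the thread; the sketch works the same either way.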
Hm... good point. Probably doesn't make modeling easier though ;-) Can we discuss the use cases a bit more?
It totally makes sense to be able to add a URI directly, irrespective of whether it's known or not, I agree. But would it be essential to know whether it's been manually added or not? (Vs. manually searched in the gazetteer, for example?) After all, there is still the "confirmed" vs. "non-confirmed" flag, if the point was to distinguish between NER annotations and user-provided ones. In addition, automated NER that has not been touched by a human user is already identifiable because it has no "created by" information attached to it.
Would the key use case for this be to benchmark the NER?
Can you elaborate on these two a bit more? E.g. give examples? |
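The provenance argument in the comment above can be sketched as follows: if the "confirmed" flag and the presence of "created by" information are both available, annotations can already be sorted into three groups without an extra "manually entered" marker. The `Annotation` class and its field names are hypothetical and only stand in for whatever the real model stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are assumptions."""
    uri: str
    confirmed: bool            # stands in for the "confirmed" vs. "non-confirmed" flag
    created_by: Optional[str]  # None models an annotation with no "created by" information


def is_untouched_ner(annotation: Annotation) -> bool:
    # Automated NER output that no user has edited carries no creator.
    return annotation.created_by is None


def provenance(annotation: Annotation) -> str:
    """Classify an annotation using only the two existing signals."""
    if is_untouched_ner(annotation):
        return "automatic"
    return "confirmed" if annotation.confirmed else "unconfirmed"
```

What this cannot tell apart is a URI picked from the gazetteer search versus one typed in by hand, which is the distinction the earlier comment asks about.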
Hi @gabrielbodard (and CC-ing @thegsi), just a quick heads-up that I'm picking up work on this again. Time (as always) is limited, but I'd at least like to spend a couple of days building a prototype branch of Recogito with a changed internal data model, where the "URIs have to be known & indexed" constraint is removed. I think the code/schema changes may not be so bad after all. I'm expecting a performance hit on some rather essential features (map view, data exports) & don't yet know how bad it will be. Also, transforming the 500k+ existing annotations in our live instance to a new format will be a bit of open-heart surgery, but let's worry about that when we get there ;-) If it works, however, I think we would not only be able to support the feature as such, but it would also simplify Recogito's internal structure and potentially make it a lot easier to plug in external knowledge bases and URI sources. Hence, definitely a goal worth pursuing. I'll keep you posted on the progress! P.S.: I'm also documenting stuff at pelagios#413 |
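One way to picture the relaxed constraint described above: instead of rejecting unknown URIs when an annotation is created, the URI is stored as-is and resolved against the index only when a feature such as the map view needs it. The sketch below is an assumption about how that could look, not the prototype's code; `GAZETTEER`, `resolve` and `split_for_map` are made-up names, and the index is reduced to a plain dictionary.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical stand-in for the gazetteer index: URI -> (lat, lon).
GAZETTEER: Dict[str, Coord] = {
    "http://example.org/place/1": (37.97, 23.72),
}


def resolve(uri: str, index: Dict[str, Coord]) -> Optional[Coord]:
    """Look the URI up at read time; None marks a "dangling link"."""
    return index.get(uri)


def split_for_map(
    uris: Iterable[str], index: Dict[str, Coord]
) -> Tuple[List[Tuple[str, Coord]], List[str]]:
    """Separate URIs that can be drawn on a map from those that cannot."""
    mappable: List[Tuple[str, Coord]] = []
    dangling: List[str] = []
    for uri in uris:
        coord = resolve(uri, index)
        if coord is None:
            dangling.append(uri)
        else:
            mappable.append((uri, coord))
    return mappable, dangling
```

The extra lookup per annotation at read time is one plausible source of the performance hit on map view and exports mentioned in the comment; how large it is in practice would depend on the real index.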
@rsimon Sounds great. Probably Scala work? Do keep me updated on progress here and/or by email, and let me know if you need some JavaScript work. |
The bulk of the backend work (yes, all Scala) is now done. It still needs testing and probably a bit of bugfixing here and there, but overall it's looking good. For various reasons, however, I won't be deploying this to the live instance until mid-February, with a second update at the beginning of March. (Most importantly, my institution is moving office and will take down the server for a week or so. Therefore I'll need to move everything to a rented VM and then back after everything is done. I'm planning to combine the move-related downtimes with the system upgrades.) |
P.S.: I am now moving back to mostly frontend/JavaScript work. E.g. options to change the map colouring based on different properties (tags, annotation status etc.): https://github.com/pelagios/recogito2/tree/master/app/assets/javascripts/document/map and enhancing the gazetteer search dialog, e.g. adding options to filter by gazetteer etc.: Also, of course, a "georesolution-panel-alternative" for searching person datasets would be highly interesting, as discussed. |