I’m writing some code to detect mentions of companies. The corpus uses the ehf/ohf/sf/etc. suffixes, so that’s a strong indicator for me, and potentially for Greynir too.
I know that the Greynir website has an entity recognizer, but it seems quite strongly coupled to the database. Is there a case for bintokenizer to adopt a new token type? Or perhaps for Greynir to become company-entity aware?
I have some interesting examples of company names if that’s useful. I’m currently using an imperfect regex to match company names and then using Greynir to go back to the indefinite form.
Miðbæjarhótel/Centerhotels ehf.
Reitir - hótel ehf.
105 Miðborg slhf.
Faxaflóahafnir sf.
Bjarg íbúðafélag hses.
Efstaleitis Apótek ehf.
Íþrótta- og sýningahöllin hf.
V-16 ehf.
These are the suffixes I’ve come across:
ehf.
slhf.
sf.
hses.
hf.
ohf.
bs.
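To make the suffix heuristic concrete, here is a minimal sketch in Python. It is not the regex I actually use and it is not part of Greynir or bintokenizer. It assumes the candidate string has already been isolated (for example, one name per line), and the names `COMPANY_SUFFIXES`, `COMPANY_RE` and `split_company_name` are made up for illustration. It only checks that a string ends in one of the suffixes listed above and splits the name from the suffix; finding where a name starts inside running text is the hard part and is not attempted here.

```python
import re

# Suffixes taken from the list above; the trailing period is required.
COMPANY_SUFFIXES = ("ehf", "slhf", "sf", "hses", "hf", "ohf", "bs")

# Hypothetical pattern: a non-empty name, whitespace, a known suffix, a period.
COMPANY_RE = re.compile(
    r"^(?P<name>\S.*?)\s+(?P<suffix>" + "|".join(COMPANY_SUFFIXES) + r")\.$"
)


def split_company_name(candidate):
    """Return (name, suffix) if the candidate ends in a known suffix, else None."""
    match = COMPANY_RE.match(candidate.strip())
    if match is None:
        return None
    return match.group("name"), match.group("suffix")


if __name__ == "__main__":
    for example in ["Reitir - hótel ehf.", "105 Miðborg slhf.", "Veitur"]:
        print(example, "->", split_company_name(example))
```

Because the name part accepts any characters, names with slashes, hyphens, digits or lowercase words (as in the examples above) all pass. The flip side is that the check says nothing about whether the text before the suffix really is a company name.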
Better company name tokenization was added recently, but I noticed that it makes no attempt to detect lemmas. Being able to get at least to the indefinite form of company names (from Veitna to Veitur) makes sense. Reducing plural names to the singular, or undoing other word-form changes, may not be as useful.
Here the correct lemma would be Veitur ohf., but lemmatizing company and/or entity names is perhaps not a priority for this project. I’m doing it manually with a convoluted matching algorithm, so that I can take a sentence like that and link companies to their company pages. It works OK.