-
@dschwalm, thanks for your feedback! The lemmatizer is a machine-learning-based model, albeit a fairly simple one. This means there is no straightforward way to handle a specific grammatical case like this one inside the system.
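Since the model itself cannot easily be taught a single construction, one possible workaround is to correct the lemmas after the pipeline has run. The sketch below is only an illustration under stated assumptions: the override table and the `override_lemma` helper are hypothetical names, not part of spaCy or the Hungarian models, and the entries are just the examples from this thread. Whether a derived form such as 'lovas' should map to its base noun at all is a design decision.

```python
# Hypothetical override table: surface form -> desired lemma.
# The entries are the examples from this thread; extend as needed.
DERIVED_LEMMA_OVERRIDES = {
    "macskás": "macska",
    "lovas": "ló",
    "havas": "hó",
}


def override_lemma(text: str, model_lemma: str) -> str:
    """Return the override lemma if one exists, else keep the model's lemma."""
    return DERIVED_LEMMA_OVERRIDES.get(text.lower(), model_lemma)


# (text, lemma) pairs as reported in the question's output
pairs = [("macskás", "macskás"), ("könyvek", "könyv")]
fixed = [override_lemma(text, lemma) for text, lemma in pairs]
print(fixed)  # ['macska', 'könyv']
```

In a spaCy v3 pipeline, a helper like this could be called from a custom component registered with `Language.component` and added after the lemmatizer with `nlp.add_pipe`, assigning the result to `token.lemma_`.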
-
@dschwalm, a new, improved lemmatizer is in the works. Expect a release in the coming weeks. Hopefully it will solve your problem.
-
Hello,
I have a question regarding lemmatization.
For the input 'macskás könyvek' I expected the lemmas 'macska' and 'könyv'. Instead, 'macskás' was not lemmatized to 'macska'.
'macskát', 'macskák', and 'macskával' were lemmatized successfully.
Is there a way to enhance the lemmatizer so that it supports this grammatical structure ('macskás', 'lovas', 'havas')?
I would gladly contribute to the implementation.
Thanks,
Daniel
ps. my code:
import spacy

nlp = spacy.load("hu_core_news_lg")
doc = nlp("macskás könyvek")
for token in doc:
    print(token, token.lemma, token.lemma_)
Output:
macskás 8631042501551719371 macskás
könyvek 2221763149769740023 könyv