Replace deprecated 'punkt' with 'punkt_tab' #83
+3
−3
Problem
The NLTK package `punkt` has been deprecated, resulting in an error when calling a BM25Tokenizer.
Solution
Replace `punkt` with the new `punkt_tab`.
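As a rough illustration of the replacement (the actual diff is not reproduced here), the resource lookup could be sketched as below. The helper name `ensure_punkt_tab` is hypothetical and not part of this repository; it only assumes the standard `nltk.data.find` and `nltk.download` calls.

```python
def ensure_punkt_tab() -> None:
    """Make sure the 'punkt_tab' tokenizer data is available locally.

    Hypothetical helper for illustration; not the code changed in this PR.
    """
    # Imported lazily so that importing this module does not require NLTK.
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("tokenizers/punkt_tab")
    except LookupError:
        # Fetch the new resource instead of the deprecated 'punkt'.
        nltk.download("punkt_tab", quiet=True)
```

Checking before downloading avoids a network call on every import, which may matter in CI or offline environments.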
Type of Change
This might be a breaking change. NLTK 3.8.1 and lower use `punkt`, whereas NLTK 3.8.2 and above will use `punkt_tab`. The `pyproject.toml` file references `nltk = "^3.6.5"`, meaning it will install NLTK 3.8.2 if possible, thus breaking. Introducing this breaking change in a patch version is something that the NLTK maintainers should not have done, but alas.
Another fix would be to freeze the NLTK version.
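If both older and newer NLTK versions need to keep working, one option is to pick the resource name from the installed version. The sketch below is only an illustration: `punkt_resource_for` is a hypothetical name, and the 3.8.2 cutoff is taken from the description above rather than checked against the NLTK release notes.

```python
import re


def punkt_resource_for(nltk_version: str) -> str:
    """Return the tokenizer resource name to use for an NLTK version string.

    Hypothetical helper; the (3, 8, 2) cutoff is an assumption taken from
    the PR description.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", nltk_version)
    if match is None:
        raise ValueError(f"unrecognised NLTK version: {nltk_version!r}")
    # A missing patch component (e.g. "3.9") is treated as 0.
    major, minor, patch = (int(g) if g else 0 for g in match.groups())
    return "punkt_tab" if (major, minor, patch) >= (3, 8, 2) else "punkt"
```

In use, the result would be passed to `nltk.download`, for example `nltk.download(punkt_resource_for(nltk.__version__))`.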
Test Plan
I tried it locally and it fixed my issue.