-
There is no built-in method for this. You should be able to filter out these words as a preprocessing step.
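As a rough illustration of that preprocessing step, here is a minimal sketch using only the Python standard library. The helper name `remove_stop_words` and the `STOP_WORDS` set are assumptions made up for this example; they are not part of any library.

```python
import re


def remove_stop_words(text, stop_words):
    """Lowercase `text`, split it into word tokens, and drop the stop words."""
    stop_words = {word.lower() for word in stop_words}
    # \w+ keeps runs of letters, digits, and underscores; punctuation is dropped.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(token for token in tokens if token not in stop_words)


# Hypothetical stop words: a few common ones plus domain-specific ones.
STOP_WORDS = {"and", "the", "application", "server"}

cleaned = remove_stop_words("The Billing Application server", STOP_WORDS)
print(cleaned)  # billing
```

The cleaned strings can then be passed to whatever comparison function you already use, so the matching code itself does not need to change.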
-
I should be more precise: I'm thinking of introducing stop words when using a tokenizer, like in methods
-
Sure, let me explain my use case. For an information system, I'm trying to link applications and computing resources. Currently, I do:
This works. The process is fast (a few seconds, less than a minute), but the results are so-so. When investigating, I saw many false positives caused by matches on irrelevant words, like "application" or "server". I then pre-processed the input using string replacement, but that is quite tedious, and doing it properly would require tokenization. Since functions like
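To make the difference between string replacement and token-based filtering concrete, here is a small sketch under stated assumptions. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer (not the scorer used in the workflow above), and the stop-word sets, helper names, and sample strings are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stop words: common ones plus domain-specific ones.
COMMON_STOP_WORDS = {"and", "this", "the"}
CUSTOM_STOP_WORDS = {"application", "server"}
STOP_WORDS = COMMON_STOP_WORDS | CUSTOM_STOP_WORDS


def strip_by_replace(text):
    """Naive approach: remove each stop word wherever it occurs as a substring."""
    for word in sorted(STOP_WORDS):
        text = text.replace(word, "")
    return text


def strip_by_tokens(text):
    """Token-based approach: remove stop words only when they are whole tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


def similarity(a, b):
    """Stand-in similarity score in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


# Substring replacement damages words that merely contain a stop word.
print(strip_by_replace("observer theme"))  # ob me
print(strip_by_tokens("observer theme"))   # observer theme

# Shared stop words inflate the score of two unrelated names.
a = "billing application server"
b = "payroll application server"
raw_score = similarity(a, b)
filtered_score = similarity(strip_by_tokens(a), strip_by_tokens(b))
print(raw_score > filtered_score)  # True
```

Filtering at the token level leaves words such as "observer" intact, and the score for the two unrelated names drops once the shared stop words no longer contribute to the match.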
-
Would it be feasible to introduce "stop words", that is, words that should not be considered when comparing strings?
This would cover common stop words like "and", "this", and so on, but also custom stop words. In my domain, for example, "application" is not relevant when comparing strings.