What happens with multiple comparisons on same column #2582
-
I'm relatively new to splink and record linkage. I wanted to see how the gamma values differ when using multiple comparisons on the same column, but was a little surprised to get only a single gamma. I'm wondering whether splink just uses the last comparison listed for the column or somehow combines the comparisons. I have three different comparisons from the comparisons library, and I wondered if this could be a way to build an ensemble of sorts. However, I figure that if this were the case it would be mentioned in the documentation or would already have been asked in the Q&A, and I can't find either. I'd appreciate any help, and thanks for the package, which so far has been great for addressing a need I have.
-
This article should clarify; specifically, look at the part where it says:
And see also:
Each comparison is composed of a number of comparison levels:
https://moj-analytical-services.github.io/splink/topic_guides/comparisons/comparisons_and_comparison_levels.html#defining-similarity
They are mutually exclusive: a given pair of records falls into exactly one of a comparison's levels, and each comparison level corresponds to a single gamma value. That is why each comparison produces one gamma per record pair.
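To illustrate why a comparison yields a single gamma, here is a minimal plain-Python sketch (not splink's API) of how ordered, mutually exclusive levels could be evaluated: the levels are checked in order and the first one that matches wins. The `Level` type, `assign_gamma` function and the "same first three characters" level are hypothetical, and the numbering (null = -1, else = 0, more similar levels get higher values) is an assumption modelled on splink's convention.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical structure: each comparison level is a (label, predicate) pair.
Level = Tuple[str, Callable[[str, str], bool]]


def assign_gamma(left: Optional[str], right: Optional[str],
                 levels: List[Level]) -> Tuple[int, str]:
    """Return (gamma, label) for the single level a record pair falls into.

    Levels are checked in order and the first match wins, so the levels are
    mutually exclusive. Assumed numbering: null level = -1, 'else' level = 0,
    and earlier (more similar) levels get higher values.
    """
    # Null level is checked first, so a missing value never reaches the others.
    if left is None or right is None:
        return -1, "null"
    for i, (label, condition) in enumerate(levels):
        if condition(left, right):
            # First listed level -> len(levels), last listed level -> 1.
            return len(levels) - i, label
    # Nothing matched: fall through to the 'else' level.
    return 0, "else"


first_name_levels: List[Level] = [
    ("exact match", lambda a, b: a == b),
    ("same first three characters", lambda a, b: a[:3] == b[:3]),
]

for pair in [("john", "john"), ("jonathan", "jonny"), ("john", "mary"), (None, "mary")]:
    print(pair, assign_gamma(pair[0], pair[1], first_name_levels))
```

Note that "john" vs "john" also satisfies the second condition, but it only ever receives the gamma of the first level it matches (2 here), so the levels never combine.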
If you have two comparisons that refer to the same column, it might actually cause a bug or error, because it shouldn't be done: the evidence from that column would be double counted. If you really wanted to do it, I think you'd have to specify the comparison as a dict:
https://moj-analytical-services.github.io/splink/topic_guides/comparisons/customisin…
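As a rough illustration of the double counting: in the Fellegi-Sunter model that splink implements, a pair's match weight is a prior term plus the sum of log2(m/u) over the level each comparison lands in, which assumes the comparisons are independent given match status. The plain-Python sketch below drops the prior term and uses made-up m and u values (not outputs of any trained model) to show that a second comparison on `first_name` adds evidence from the same agreement a second time. The function name and all probabilities are hypothetical.

```python
import math
from typing import List, Tuple


def total_match_weight(levels_hit: List[Tuple[float, float]]) -> float:
    """Sum log2(m/u) over the (m, u) of the level each comparison fell into.

    This is the Fellegi-Sunter match weight without the prior term; summing is
    only justified if the comparisons are independent given match status.
    """
    return sum(math.log2(m / u) for m, u in levels_hit)


# Hypothetical (m, u) probabilities for the levels hit when two records share
# the same first name and surname. Illustrative values only.
first_name_exact = (0.9, 0.01)    # comparison A on first_name
first_name_fuzzy = (0.95, 0.02)   # comparison B, also on first_name
surname_exact = (0.95, 0.001)

one_per_column = total_match_weight([first_name_exact, surname_exact])
double_counted = total_match_weight([first_name_exact, first_name_fuzzy, surname_exact])

print(f"one comparison per column: {one_per_column:.2f}")
print(f"first_name counted twice:  {double_counted:.2f}")
```

The extra weight is exactly log2(m/u) of the second first-name comparison, even though it carries almost no information beyond the first. Putting the different similarity measures as levels of a single comparison avoids this, because a pair then lands in only one of them.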