This repository has been archived by the owner on Jan 8, 2022. It is now read-only.
`#intersect` is algorithmically incorrect #58
Labels
Feature
Current implementation Quirks
1.) `intersect` isn't commutative, even though it should be
When taking the intersection (∩) of two sets T and S:
T ∩ S = S ∩ T
Order doesn't affect the final outcome of the intersection. This makes sense since we're dealing with equality of elements within a given set and if elements are equal in both sets, the element is kept. Order has no involvement here.
Now let's see if `Collection#intersect` respects this property of set intersection:
Sample code:
Output:
The output for each intersection is different. This means it isn't a true intersection, because proper intersections are commutative. I.e. `T.intersect(S)` and `S.intersect(T)` should yield values that are exactly equal.
We can look at the current implementation to see why this is caused:
This method only considers values from the right hand side of the intersection. More specifically it only uses keys as the basis of equality. Which brings me to another design oversight.
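To make the asymmetry concrete, here is a minimal sketch over plain `Map`s. The `keyBasedIntersect` helper and the sample maps are hypothetical: they mirror the behavior described above (match on keys, take values from the right-hand side) and are not the library's actual source.

```typescript
// Hypothetical helper mirroring the behavior described above; not the library's code.
function keyBasedIntersect<K, V>(left: Map<K, V>, right: Map<K, V>): Map<K, V> {
  const result = new Map<K, V>();
  // Walk the right-hand map and keep its value whenever the left-hand map has the key.
  right.forEach((value, key) => {
    if (left.has(key)) result.set(key, value);
  });
  return result;
}

// Both maps share the key "a", but with different values.
const mapT = new Map<string, number>([["a", 1], ["b", 2]]);
const mapS = new Map<string, number>([["a", 10], ["c", 3]]);

const tThenS = keyBasedIntersect(mapT, mapS); // Map { "a" => 10 }
const sThenT = keyBasedIntersect(mapS, mapT); // Map { "a" => 1 }
```

Swapping the operands changes which value ends up under `"a"`, so the two results are not equal.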
2.) Key-based intersection doesn't actually intersect a `Map`-like structure.
Key-based intersection is just that: an intersection of keys within the key set of a map. The issue comes into play once you try to tie values to those keys. You never intersected the values, you only intersected the keys. Attaching values that haven't been intersected to the result means that this is no longer an actual intersection.
Ideal solution or implementation
Compare Maps as a set of entries, not as a key-value store
To fix this, we need to change the way we think of map comparisons. Everything I stated above applies to sets. Well... we have an issue here: `Map`s aren't sets. At least in terms of K/V access they aren't. In terms of implementation, however, they definitely are. Hence we have:
`Map#entries`
Now we can consider the map as a set rather than just a pure K/V access object. This means we can now apply the proper type of intersection on this map.
Let's try a proper intersection on the previous code example:
Output:
Wait what? The intersect is empty?!
Yup, this is completely intentional. We're now taking values into account which means if a given entry has the same key but differing values, it's no longer in the intersection. This preserves the true function of a proper intersection.
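The entry-based comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `entryIntersect` and the sample maps are hypothetical names, keys are matched with `Map#has`, and values are compared with `Object.is`.

```typescript
// Hypothetical helper: an entry survives only if both maps hold the same key AND the same value.
function entryIntersect<K, V>(left: Map<K, V>, right: Map<K, V>): Map<K, V> {
  const result = new Map<K, V>();
  left.forEach((value, key) => {
    // Map#has matches the key; Object.is compares the two values.
    if (right.has(key) && Object.is(right.get(key), value)) result.set(key, value);
  });
  return result;
}

const first = new Map<string, number>([["a", 1], ["b", 2]]);
const second = new Map<string, number>([["a", 10], ["c", 3]]);
const third = new Map<string, number>([["a", 1], ["z", 26]]);

const forward = entryIntersect(first, second);  // empty: "a" maps to 1 on one side, 10 on the other
const backward = entryIntersect(second, first); // empty as well, so operand order no longer matters
const shared = entryIntersect(first, third);    // Map { "a" => 1 }: identical entries are kept
```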
I want to note that `intersect` is now commutative, since `T.intersect(S)` is the same set as `S.intersect(T)`.
But isn't that less useful?
Well what use cases are there for key-based intersection?
Also, if the library is deciding which collection's values to keep and which ones not to keep, doesn't this make the whole idea of intersecting collections more confusing?
Usually if you want to `intersect` maps, you have two maps that are homogeneous in terms of key types and value types. So in most of those cases the "proper" intersection won't change the result.
For example, intersecting the roles of a user with the ones given in a list has the same effect in both implementations. It just so happens that the algorithm I'm proposing is actually correct for all cases, not just certain cases.
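As a rough illustration of the roles case, the sketch below runs a key-based and an entry-based intersection over the same data. Everything here is hypothetical (the `byKeys` and `byEntries` helpers, the role IDs, and the role names), and plain strings stand in for role objects.

```typescript
// Hypothetical key-based variant: match on keys, take values from the right-hand map.
function byKeys<K, V>(left: Map<K, V>, right: Map<K, V>): Map<K, V> {
  const out = new Map<K, V>();
  right.forEach((value, key) => {
    if (left.has(key)) out.set(key, value);
  });
  return out;
}

// Hypothetical entry-based variant: key and value must both match.
function byEntries<K, V>(left: Map<K, V>, right: Map<K, V>): Map<K, V> {
  const out = new Map<K, V>();
  left.forEach((value, key) => {
    if (right.has(key) && Object.is(right.get(key), value)) out.set(key, value);
  });
  return out;
}

// Made-up role data: the same ID always maps to the same role in both maps.
const userRoles = new Map<string, string>([["1", "admin"], ["2", "moderator"]]);
const allowedRoles = new Map<string, string>([["2", "moderator"], ["3", "helper"]]);

const keyResult = byKeys(userRoles, allowedRoles);      // Map { "2" => "moderator" }
const entryResult = byEntries(userRoles, allowedRoles); // Map { "2" => "moderator" }
```

Because a given key carries the same value in both maps, the two approaches agree here; they only diverge when a shared key holds different values.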
How would this be implemented?
For key equality, the same method of using `Map#has` would be used. Under the hood this method uses the `SameValueZero` algorithm for testing equality. `Object.is` is the closest API equivalent: it implements `SameValue`, which differs from `SameValueZero` only in treating +0 and -0 as distinct. So naturally it would also be used for map-value equality.
Now the `intersection` method does correct intersections, and avoids the pitfalls of the previous implementations.
Ok, but there are some use cases where I find key-based map intersection useful
Understood, however this isn't something the library should be implementing. This is because, as stated above, the library obscures the precedence it uses for intersections. And it would be confusing for a method literally called `intersect` to not perform a proper intersection.
Instead, this functionality should be implemented by the user. Doing so is quite trivial:
With the user constructing their own key-based intersection, control over precedence returns to the user, and nothing relies on an implementation that is opinionated in terms of collection precedence.
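One possible shape for such a user-side helper is sketched below over plain `Map`s. The name `intersectKeys`, its parameter names, and the sample data are assumptions made for illustration; the point is only that the caller states explicitly which side's values are kept.

```typescript
// Hypothetical helper: values always come from `keepFrom`; `other` only contributes keys.
function intersectKeys<K, V>(keepFrom: Map<K, V>, other: ReadonlyMap<K, unknown>): Map<K, V> {
  const result = new Map<K, V>();
  keepFrom.forEach((value, key) => {
    if (other.has(key)) result.set(key, value);
  });
  return result;
}

const local = new Map<string, number>([["a", 1], ["b", 2]]);
const remote = new Map<string, number>([["a", 10], ["c", 3]]);

const preferLocal = intersectKeys(local, remote);  // Map { "a" => 1 }
const preferRemote = intersectKeys(remote, local); // Map { "a" => 10 }
```

The result still depends on argument order, but that order is now a visible choice at the call site rather than a hidden library convention.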
Alternative solutions or implementations
We could remove `intersect`, but I don't think that's very ideal.
Other context
No response