Review suitable distance metrics #32

MaxGhenis · 2018-12-21T06:30:47Z

Analyses thus far have used Euclidean distance, which has worked well enough for initial eyeballing. However, it barely distinguishes between a small value and zero, a distinction that matters given the PUF's sparsity. One proposed rule of thumb is that Euclidean distance isn't useful when fewer than 3/4 of attributes are non-zero, which is certainly the case in the PUF.
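To make the concern concrete, here is a minimal sketch using made-up records (not real PUF values). The attribute layout and the helper names `euclidean`, `zero_pattern_mismatch`, and `nonzero_share` are assumptions for illustration only. It shows that a record differing only by "zero vs. small non-zero" looks far closer under Euclidean distance than one with the same zero pattern but a modestly different large value.

```python
import numpy as np

# Hypothetical records with three made-up attributes (not real PUF data).
zero_rec = np.array([50_000.0, 0.0, 0.0])
small_rec = np.array([50_000.0, 1.0, 0.0])   # second attribute is small but non-zero
large_rec = np.array([51_000.0, 0.0, 0.0])   # same zero pattern as zero_rec


def euclidean(a, b):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def zero_pattern_mismatch(a, b):
    """Count attributes that are zero in one record but non-zero in the other."""
    return int(np.sum((a == 0) != (b == 0)))


def nonzero_share(X):
    """Share of non-zero cells in a 2-D array, to compare against the 3/4 rule of thumb."""
    return float(np.mean(X != 0))


print(euclidean(zero_rec, small_rec), zero_pattern_mismatch(zero_rec, small_rec))
print(euclidean(zero_rec, large_rec), zero_pattern_mismatch(zero_rec, large_rec))
print(nonzero_share(np.vstack([zero_rec, small_rec, large_rec])))
```

The same `nonzero_share` check could be run on the actual data to see how far below 3/4 it falls.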

That same thread suggested that cosine similarity can work better in these cases, though a comment here suggests it's best suited to categorical data. Cosine similarity should be normalized. Other metrics, such as Gower and Mahalanobis distances, could also be investigated here.
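As a starting point for comparing these candidates, here is a rough sketch on made-up data. It assumes SciPy's `cdist` for cosine and Mahalanobis distance. SciPy has no built-in Gower metric, so `gower_numeric` below is a hypothetical helper implementing only the numeric-attribute case (range-scaled absolute differences, averaged over attributes). The pseudo-inverse of the covariance matrix is also an assumption, chosen because sparse columns can make the covariance near-singular.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical sparse records (rows) with three made-up attributes.
X = np.array([
    [50_000.0, 0.0, 0.0],
    [50_000.0, 1.0, 0.0],
    [51_000.0, 0.0, 0.0],
    [0.0, 200.0, 30.0],
    [20_000.0, 0.0, 500.0],
])

# Cosine distance (1 - cosine similarity) on the raw values.
d_cos = cdist(X, X, metric="cosine")

# Mahalanobis distance; pinv guards against a near-singular covariance matrix.
VI = np.linalg.pinv(np.cov(X, rowvar=False))
d_mah = cdist(X, X, metric="mahalanobis", VI=VI)


def gower_numeric(X):
    """Gower distance for all-numeric data: mean of |a - b| / range per attribute."""
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    scaled = X / ranges
    return cdist(scaled, scaled, metric="cityblock") / X.shape[1]


d_gow = gower_numeric(X)

print(np.round(d_cos, 3))
print(np.round(d_mah, 3))
print(np.round(d_gow, 3))
```

Note that cosine distance ignores magnitude entirely: rows 0 and 2 point in the same direction, so their distance is effectively zero even though their values differ.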
