Review suitable distance metrics #32

Open
MaxGhenis opened this issue Dec 21, 2018 · 0 comments


Analyses thus far have used Euclidean distance, which has worked well enough for initial eyeballing. However, it doesn't distinguish much between a small value and zero, which is important given the PUF's sparsity. One proposed rule of thumb is that Euclidean distance isn't useful when fewer than 3/4 of attributes are non-zero, which is certainly the case in the PUF.
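To make the concern concrete, here is a minimal sketch on made-up records (hypothetical values, not PUF data). It shows that Euclidean distance scores "zero vs. 1" the same as "50000 vs. 50001", and it computes the share of non-zero entries, which is one possible reading of the 3/4 rule of thumb.

```python
import math

# Hypothetical toy records with made-up attributes; not PUF data.
a = [50000.0, 0.0, 0.0, 0.0]
b = [50000.0, 1.0, 0.0, 0.0]  # differs from a only by zero vs. 1
c = [50001.0, 0.0, 0.0, 0.0]  # differs from a only by 50000 vs. 50001

# Euclidean distance only sees the absolute gap, so both pairs look alike.
d_zero_vs_small = math.dist(a, b)
d_small_shift = math.dist(a, c)

# Share of non-zero entries across all records (assumed interpretation
# of "fraction of attributes that are non-zero").
values = a + b + c
nonzero_share = sum(v != 0 for v in values) / len(values)

print(d_zero_vs_small, d_small_shift, nonzero_share)
```

Both distances come out equal here, even though moving from zero to a positive amount may matter more than a one-unit shift in a large value.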

That same thread suggested that cosine similarity can be better in these cases, though a comment here suggests it's best suited to categorical data. Cosine similarity should be normalized. Other metrics, such as Gower and Mahalanobis distances, can also be investigated here.
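As a starting point for comparing candidates, here is a rough sketch of cosine distance and a numeric-only Gower distance in plain Python. The records, the helper names `cosine_distance` and `gower_distance`, and the choice to scale each attribute by its observed range are all assumptions for illustration. Mahalanobis is left out because it needs an inverse covariance matrix, which is awkward to estimate on a toy sample.

```python
import math

def cosine_distance(x, y):
    """1 minus cosine similarity; undefined for all-zero vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)

def gower_distance(x, y, ranges):
    """Mean of range-scaled absolute differences (numeric attributes only)."""
    parts = [abs(xi - yi) / r for xi, yi, r in zip(x, y, ranges) if r > 0]
    return sum(parts) / len(parts)

# Hypothetical toy records; not PUF data.
records = [
    [50000.0, 0.0, 0.0],
    [50000.0, 1.0, 0.0],    # zero vs. 1 in the second attribute
    [50001.0, 0.0, 0.0],    # 50000 vs. 50001 in the first attribute
    [90000.0, 500.0, 20.0],
]

# Observed range of each attribute, used to scale Gower differences.
columns = list(zip(*records))
ranges = [max(col) - min(col) for col in columns]

a, b, c = records[0], records[1], records[2]
gower_ab = gower_distance(a, b, ranges)
gower_ac = gower_distance(a, c, ranges)
cosine_ab = cosine_distance(a, b)
cosine_ac = cosine_distance(a, c)

print(gower_ab, gower_ac, cosine_ab, cosine_ac)
```

On this toy sample both metrics rank the zero-vs-1 pair as farther apart than the 50000-vs-50001 pair, unlike Euclidean distance, though the cosine gap is tiny because the first attribute dominates the vector norms. Whether either behaves well on the real data would need to be checked.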
