Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add functions for query-target distance calculations #17

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

schlegelp
Copy link

@schlegelp schlegelp commented Sep 11, 2024

Hi! This PR would add functions to run query->target distance calculations, complementing the current all-by-all functionality.

The use-case for these (at least in my work) would be to find, for each query neuron, the best match(es) among a pool of target neurons.

TODOs:

  • query->target version of:
    • GW distance
    • QGW distance
    • SLB distance
  • update docs

Before I continue working on the TODOs above, I was wondering if you think these would be a good fit? And if you do, whether you agree with the way it is implemented in this first pass?

@schlegelp schlegelp changed the title Add functions for query-target GW distance calculations [WIP] Add functions for query-target GW distance calculations Sep 11, 2024
@schlegelp schlegelp changed the title [WIP] Add functions for query-target GW distance calculations [WIP] Add functions for query-target distance calculations Sep 11, 2024
gw_coupling_mat_csv: Optional[str] = None,
return_coupling_mats: bool = False,
) -> tuple[
DistanceMatrix, # GW distance matrix (Squareform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be

Matrix,  # GW distance matrix (not necessarily a square)


:return: If `return_coupling_mats` is True,
returns `( gw_dmat, couplings )`,
where gw_dmat is a square matrix whose (i,j) entry is the GW distance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gw_dmat is not necessarily square

raise Exception(
"Must supply list of cell identifiers for writing to file."
)
gw_data = csv_output_writer(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that you were able to make use of this existing function. When I wrote it I had square matrices in mind but I think this should be correct.

return_coupling_mats: bool = False,
verbose: Optional[bool] = False,
) -> tuple[
DistanceMatrix, # Pairwise GW distance matrix (Squareform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matrix, # Pairwise GW distance matrix (need not be square)

@patrick-nicodemus
Copy link
Collaborator

Hi Philipp,
Thanks very much for this contribution. It looks great.
I think the code is simple and clear.
I think the overall architecture / structure is good and fits in with other parts of the library.
Thank you very much for writing type annotations and adding a unit test.

I don't have any comments on the code but there are a few changes to the docstrings and type labels I suggested.

The way the code currently stands, the type DistanceMatrix refers to a square, symmetric distance matrix with zeroes along the diagonal. I have been using the name Matrix = NewType("Matrix", npt.NDArray[np.float64]) to refer to a numpy array of shape 2.

We can rename the type to something else to make it easier to distinguish between square symmetric distance matrices and general rectangular distance matrices, but on the other hand the distance matrices in the query/target code don't have any special properties other than being nonnegative, so perhaps they need a special name.

For future work, I don't think it's necessary (for computational reasons, at least) to do something like this for the SLB distances, which I expect to be sufficiently fast to compute that one can just do all-by-all comparisons and then filter afterwards. Although the SLB is itself a poor estimator of GW, in its own right it seems to give reasonably good performance on our k-nn classifier tests, so it may be of direct use.

On the other hand, I do think that implementing the query version of the QGW is likely to be useful, and the design can be basically identical to the design of this one. It would be nice to unify some of this redundant code but I am not sure that the benefits of a uniform interface would outweigh the increased complexity (in Python, at least).

Again, thank you for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants