
Allow passing spike rates directly to gpfa #507

Closed

Conversation

jonahpearl

Hi there -- I made a little edit to the gpfa class that allows spike rates to be passed in directly, bypassing the first step of the processing. This would allow gpfa to be used on df/f traces, or pre-interpolated spike rates, or really any continuous set of values.

I don't know if the downstream analysis assumes things specific to spike rates, e.g. only positive values, but if so, it might be good to add checks to that effect.

Also, the check in transform(), `if len(spiketrains[0]) != len(self.has_spikes_bool)`, seemed redundant with / conflicting with the line a few lines later, `seq['y'] = seq['y'][self.has_spikes_bool, :]`, so I didn't add any checks there, but it feels like it ought to check something about `seq`.

Anyways, hope this helps someone :)
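To make the idea concrete, here is a hedged sketch of what such pre-binned input could look like, assuming the recarray layout elephant's GPFA uses internally ('T' = number of bins, 'y' = neurons x bins matrix); the commented-out fit call is illustrative only, not the merged API:

```python
import numpy as np

# Sketch: build a 'seqs'-style recarray filled with continuous traces
# (e.g. df/f or smoothed rates) instead of binned spike counts.
n_trials, n_neurons, n_bins = 5, 10, 50
rng = np.random.default_rng(0)

seqs = np.empty(n_trials, dtype=[('T', int), ('y', object)])
for trial in range(n_trials):
    seqs['T'][trial] = n_bins
    seqs['y'][trial] = rng.random((n_neurons, n_bins))  # continuous traces

# With the proposed change, these could then be passed directly, e.g.:
# gpfa = GPFA(bin_size=20 * pq.ms, x_dim=3)
# gpfa.fit(spiketrains=None, seqs_train=seqs)
```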

@pep8speaks

Hello @jonahpearl! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 292:80: E501 line too long (83 > 79 characters)
Line 320:69: E261 at least two spaces before inline comment
Line 320:80: E501 line too long (88 > 79 characters)
Line 379:80: E501 line too long (88 > 79 characters)
Line 452:1: W293 blank line contains whitespace
Line 456:33: E127 continuation line over-indented for visual indent
Line 460:80: E501 line too long (81 > 79 characters)
Line 461:33: E128 continuation line under-indented for visual indent

@Moritz-Alexander-Kern Moritz-Alexander-Kern added the enhancement Editing an existing module, improving something label Jul 26, 2022
@Moritz-Alexander-Kern
Member

Hi @jonahpearl ,
thank you for contributing this neat enhancement to elephant.

I've created a pull request here: jonahpearl#1. This PR makes the code comply with PEP 8 (formatting only).

I will have to look more deeply into this and also come up with some unit tests. If you have any suggestions for tests or a minimal example, don't hesitate to share with us.

Looking forward to continuing with this.

What do you think @essink ?

@Moritz-Alexander-Kern
Member

Hey @jonahpearl ,

Here's a quick reply to your comments; I need some more time to do a complete review and come up with a more detailed response.

> One key unit test would be the case where some traces are all zeros, and so get removed in pre-processing during training, but then, e.g., the test set has some non-zeros on those neurons' traces and so it isn't removed. I think there's a "has-spikes" bool that deals with this but I didn't totally understand the logic of its implementation so I left it alone.

Good point, I agree this is a good unit-test, thanks for your input!

The has_spikes_bool logic in line 374:

```python
self.has_spikes_bool = np.hstack(seqs['y']).any(axis=1)
```

https://github.com/jonahpearl/elephant/blob/67492e9852607f08211546f726df9dd5fbe64dfe/elephant/gpfa/gpfa.py#L374

checks, per trace, whether there is a nonzero entry at any time point and builds a boolean mask from it, so this should work the way you used it. (Credit goes to essink for pointing this out to me.)
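The mask construction can be illustrated with a tiny standalone numpy example (not elephant code): the per-trial `y` matrices (neurons x time) are stacked horizontally, and each neuron's row is reduced to a single boolean.

```python
import numpy as np

# Two toy trials, 2 neurons x 3 time bins each.
trial1 = np.array([[0., 0., 0.],   # neuron 0: silent in this trial
                   [1., 0., 2.]])  # neuron 1: active
trial2 = np.array([[0., 0., 0.],   # neuron 0: silent here too
                   [0., 3., 0.]])

# Stack trials along time, then ask per neuron (row) whether any entry
# over all trials and time points is nonzero.
has_spikes_bool = np.hstack([trial1, trial2]).any(axis=1)
print(has_spikes_bool)  # → [False  True]
```

Neuron 0 is all-zero across both trials and gets masked out; neuron 1 is kept.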

> Something else to consider is how the results change when gpfa is given a smoothed trace. Normally the input is binned spike counts, so if a Poisson process is assumed somewhere deep down, that assumption would be violated. But if I understand correctly, it's just fitting factor analysis on the traces x time matrix and then calculating trajectories in that space, with no Poisson anywhere. I'm not sure, though, how smoothing would affect FA.

I will come back to this and give a more detailed answer once I have looked into it, but at first glance: I do agree that there is no Poisson here. (Thanks again, essink.)

@jonahpearl
Author

Ah, I see now. I was worried that `seq_trains` was getting stored and re-used, but I see now that it's created each time from the input, so the `has_spikes_bool` should always work the same. Then I think simply copying the spirit of lines 459-461 into line 464 should do what we need:

```python
if len(seq_trains) != len(self.has_spikes_bool):
    raise ValueError("'seq_trains' must contain the same number of "
                     "neurons as the training spiketrain data")
```


@Moritz-Alexander-Kern Moritz-Alexander-Kern left a comment


Thank you for your patience, here is a more in-depth review:
Additional points:

  • consider adding the change to fit_transform() as well: an argument would have to be added and passed to fit and transform accordingly (fit_transform is just a wrapper for fit and transform).

  • consider adding, analogous to _check_training_data(), a check for seqs, e.g. that it actually contains traces and is a recarray with the right fields.

I will soon push suggestions for unit-tests to the PR already opened here: jonahpearl#1
(edit: first unit-tests for basic functionality added)

```python
    self._check_training_data(spiketrains)
    seqs_train = self._format_training_data(spiketrains)
elif seqs_train is not None:
    seqs_train = self._format_training_data_seqs(seqs_train)  # TODO: write this!
```


Suggested change

```diff
-seqs_train = self._format_training_data_seqs(seqs_train)  # TODO: write this!
+seqs_train = self._format_training_data_seqs(seqs_train)
```

Remove the TODO, since _format_training_data_seqs is implemented, or is this referring to something different?

```python
        seq['y'] = seq['y'][self.has_spikes_bool, :]
    return seqs

def transform(self, spiketrains, seqs=None,
              returned_data=['latent_variable_orth']):
```


Since `seqs` is now a parameter of `transform`, consider adding a description of it to the docstring of `transform`.

```python
            "neurons as the training spiketrain data")
    seqs = gpfa_util.get_seqs(spiketrains, self.bin_size)
elif seqs is not None:
    # check some stuff
```

@Moritz-Alexander-Kern Moritz-Alexander-Kern Aug 10, 2022


Suggested change

```diff
-# check some stuff
+if len(seqs['y'][0]) != len(self.has_spikes_bool):
+    raise ValueError(
+        "'seq_trains' must contain the same number of neurons as "
+        "the training spiketrain data")
```

Thanks for your suggestion; I took the liberty of adding it here. I hope I captured the spirit of your idea correctly? #507 (comment)

```python
    seqs = gpfa_util.get_seqs(spiketrains, self.bin_size)
elif seqs is not None:
    # check some stuff
    pass
```


Suggested change

```diff
-pass
```

No longer needed, see above.

@Moritz-Alexander-Kern
Member

Thanks for the contribution, development will be continued in #539, feel free to reopen at any time if necessary.
