
Allow passing spike rates directly to gpfa #507

Closed

Conversation

jonahpearl

Hi there -- I made a little edit to the gpfa class that allows spike rates to be passed in directly, bypassing the first step of the processing. This would allow gpfa to be used on df/f traces, or pre-interpolated spike rates, or really any continuous set of values.

I don't know if the downstream analysis assumes things specific to spike rates, e.g. only positive values, but if so, it might be good to add checks to that effect.

Also, the check in transform(), `if len(spiketrains[0]) != len(self.has_spikes_bool)`, seemed redundant with / conflicting with the line a few lines later, `seq['y'] = seq['y'][self.has_spikes_bool, :]`, so I didn't add any checks there, but it feels like it ought to check something about `seq`.

Anyways, hope this helps someone :)
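To make the idea concrete, here is a hedged sketch of what such pre-binned input could look like, assuming the recarray layout elephant's GPFA uses internally ('T' = number of bins, 'y' = neurons x bins matrix); the commented-out fit call is illustrative only, not the merged API:

```python
import numpy as np

# Sketch: build a 'seqs'-style recarray filled with continuous traces
# (e.g. df/f or smoothed rates) instead of binned spike counts.
n_trials, n_neurons, n_bins = 5, 10, 50
rng = np.random.default_rng(0)

seqs = np.empty(n_trials, dtype=[('T', int), ('y', object)])
for trial in range(n_trials):
    seqs['T'][trial] = n_bins
    seqs['y'][trial] = rng.random((n_neurons, n_bins))  # continuous traces

# With the proposed change, these could then be passed directly, e.g.:
# gpfa = GPFA(bin_size=20 * pq.ms, x_dim=3)
# gpfa.fit(spiketrains=None, seqs_train=seqs)
```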

@pep8speaks

Hello @jonahpearl! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 292:80: E501 line too long (83 > 79 characters)
Line 320:69: E261 at least two spaces before inline comment
Line 320:80: E501 line too long (88 > 79 characters)
Line 379:80: E501 line too long (88 > 79 characters)
Line 452:1: W293 blank line contains whitespace
Line 456:33: E127 continuation line over-indented for visual indent
Line 460:80: E501 line too long (81 > 79 characters)
Line 461:33: E128 continuation line under-indented for visual indent

@Moritz-Alexander-Kern Moritz-Alexander-Kern added the enhancement Editing an existing module, improving something label Jul 26, 2022
@Moritz-Alexander-Kern
Member

Hi @jonahpearl ,
thank you for contributing this neat enhancement to elephant.

I've created a pull request here: jonahpearl#1. This PR makes the code comply with PEP 8 (formatting only).

I will have to look more deeply into this and also come up with some unit tests. If you have any suggestions for tests or a minimal example, don't hesitate to share with us.

Looking forward to continuing with this.

What do you think @essink ?

@Moritz-Alexander-Kern
Member

Hey @jonahpearl ,

Here's a quick reply to your comments; I need some more time to do a complete review and come up with a more detailed response.

> One key unit test would be the case where some traces are all zeros, and so get removed in pre-processing during training, but then, e.g., the test set has some non-zeros on those neurons' traces and so it isn't removed. I think there's a "has-spikes" bool that deals with this but I didn't totally understand the logic of its implementation so I left it alone.

Good point, I agree this is a good unit-test, thanks for your input!

The has_spikes_bool logic in line 374:

```python
self.has_spikes_bool = np.hstack(seqs['y']).any(axis=1)
```

https://github.com/jonahpearl/elephant/blob/67492e9852607f08211546f726df9dd5fbe64dfe/elephant/gpfa/gpfa.py#L374

checks, per trace, whether there is a nonzero entry at any time point and builds a boolean mask from it, so this should work the way you used it. (Credit goes to essink for pointing this out to me.)
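The mask construction can be illustrated with a tiny standalone numpy example (not elephant code): the per-trial `y` matrices (neurons x time) are stacked horizontally, and each neuron's row is reduced to a single boolean.

```python
import numpy as np

# Two toy trials, 2 neurons x 3 time bins each.
trial1 = np.array([[0., 0., 0.],   # neuron 0: silent in this trial
                   [1., 0., 2.]])  # neuron 1: active
trial2 = np.array([[0., 0., 0.],   # neuron 0: silent here too
                   [0., 3., 0.]])

# Stack trials along time, then ask per neuron (row) whether any entry
# over all trials and time points is nonzero.
has_spikes_bool = np.hstack([trial1, trial2]).any(axis=1)
print(has_spikes_bool)  # → [False  True]
```

Neuron 0 is all-zero across both trials and gets masked out; neuron 1 is kept.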

> Something else to consider is how the results change when gpfa is given a smoothed trace. Normally the input is binned spike counts, so if a Poisson process is assumed somewhere deep down, that assumption would be violated. But if I understand correctly, it's just fitting factor analysis on the traces x time matrix and then calculating trajectories in that space, with no Poisson anywhere. I'm not sure, though, how smoothing would affect FA.

I will come back to this and give a more detailed answer once I have looked into it, but at first glance: I do agree that there is no Poisson here. (Thanks again, essink.)

@jonahpearl
Author

Ah, I see now. I was worried that `seq_trains` was getting stored and re-used, but I see now that it's created each time from the input, so the `has_spikes_bool` should always work the same. Then I think simply copying the spirit of lines 459-461 into line 464 should do what we need:

```python
if len(seq_trains) != len(self.has_spikes_bool):
    raise ValueError("'seq_trains' must contain the same number of "
                     "neurons as the training spiketrain data")
```


@Moritz-Alexander-Kern Moritz-Alexander-Kern left a comment


Thank you for your patience, here is a more in-depth review:
Additional points:

  • consider adding the change to fit_transform() as well: an argument would have to be added and passed to fit and transform accordingly (fit_transform is just a wrapper for fit and transform).

  • consider adding, analogous to _check_training_data(), a check for seqs, e.g. that it actually contains traces and is a recarray with the right fields.

I will soon push suggestions for unit-tests to the PR already opened here: jonahpearl#1
(edit: first unit-tests for basic functionality added)

```python
    self._check_training_data(spiketrains)
    seqs_train = self._format_training_data(spiketrains)
elif seqs_train is not None:
    seqs_train = self._format_training_data_seqs(seqs_train)  # TODO: write this!
```


Suggested change

```diff
-seqs_train = self._format_training_data_seqs(seqs_train)  # TODO: write this!
+seqs_train = self._format_training_data_seqs(seqs_train)
```

Remove the TODO, since _format_training_data_seqs is implemented, or is this referring to something different?

```python
        seq['y'] = seq['y'][self.has_spikes_bool, :]
    return seqs

def transform(self, spiketrains, seqs=None,
              returned_data=['latent_variable_orth']):
```


Since `seqs` is now a parameter of `transform`, consider adding a description of it to the docstring of `transform`.

```python
            "neurons as the training spiketrain data")
    seqs = gpfa_util.get_seqs(spiketrains, self.bin_size)
elif seqs is not None:
    # check some stuff
```

@Moritz-Alexander-Kern Moritz-Alexander-Kern Aug 10, 2022


Suggested change

```diff
-# check some stuff
+if len(seqs['y'][0]) != len(self.has_spikes_bool):
+    raise ValueError(
+        "'seq_trains' must contain the same number of neurons as "
+        "the training spiketrain data")
```

Thanks for your suggestion; I took the liberty of adding it here. I hope I captured the spirit of your idea correctly? #507 (comment)

```python
    seqs = gpfa_util.get_seqs(spiketrains, self.bin_size)
elif seqs is not None:
    # check some stuff
    pass
```


Suggested change

```diff
-pass
```

No longer needed, see above.

@Moritz-Alexander-Kern
Member

Thanks for the contribution, development will be continued in #539, feel free to reopen at any time if necessary.
