Simplify data setup #634

Closed
wants to merge 5 commits into from

maedoc
Member

@maedoc maedoc commented Dec 7, 2022

This adds a simple way to get the demo dataset from Zenodo w/o the wget/unzip stuff or missing data with the pip package. It's currently in the library part but could be anywhere, @liadomide thoughts?

@i-Zaak
Contributor

i-Zaak commented Dec 7, 2022

This might be helpful: https://github.com/fatiando/pooch

@liadomide
Member

> This adds a simple way to get the demo dataset from Zenodo w/o the wget/unzip stuff or missing data with the pip package. It's currently in the library part but could be anywhere, @liadomide thoughts?

I appreciate the suggestion; I think this is a good idea.
Maybe we put it inside the tvb-library package, so that after a pip install tvb-library it is at hand in the user's environment for other modules that depend on tvb-library (e.g. tvb-framework) to use?

@i-Zaak
Contributor

i-Zaak commented Dec 7, 2022

100% agree with @liadomide - tvb-library would be the most intuitive place.

@liadomide
Member

> This might be helpful: https://github.com/fatiando/pooch

@i-Zaak what exactly do you have in mind? To also publish with pooch, and thus keep the data replicated in 2 places or replace Zenodo entirely?

@i-Zaak
Contributor

i-Zaak commented Dec 7, 2022

I'd use pooch to fetch the zip file from Zenodo, verify the checksum, and unzip it.

I recently started using it for the EBRAINS datasets:

import pooch

# Download the archive (cached under `path`), verify its hash, then unzip it.
_ = pooch.retrieve(
    url='https://object.cscs.ch/v1/AUTH_227176556f3c4bb38df9feea4b91200c/hbp-d000059_Atlas_based_HCP_connectomes_v1.1_pub/200-Schaefer17Networks.zip',
    known_hash='5086f4b3405acff84ffe132cee17c67a90000a3fae98da50d4e14fb55d7f5d57',  # no "alg:" prefix, so SHA256 is assumed
    path='some/path/where/the/data/lives',  # local cache directory
    processor=pooch.Unzip(extract_dir='.')  # unpack after the download is verified
)
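
For readers who have not used pooch, the three steps named above (fetch, verify checksum, unzip) can be sketched with the standard library alone. This is only an illustration of the idea, not pooch's implementation and not the code in this PR; the names `sha256_of` and `fetch_and_unzip` are hypothetical.

```python
from __future__ import annotations

import hashlib
import urllib.request
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_and_unzip(url: str, known_hash: str, cache_dir: Path) -> list[Path]:
    """Download `url` into `cache_dir` unless a verified copy is already
    there, check its SHA256 against `known_hash`, and extract it in place."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    archive = cache_dir / url.rsplit("/", 1)[-1]

    # Re-download only when the cached copy is missing or does not match.
    if not archive.exists() or sha256_of(archive) != known_hash:
        urllib.request.urlretrieve(url, archive)

    actual = sha256_of(archive)
    if actual != known_hash:
        raise ValueError(f"hash mismatch for {archive.name}: got {actual}")

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(cache_dir)
        return [cache_dir / name for name in zf.namelist()]
```

A caller would pass the Zenodo download URL and its published checksum; a second call with the same arguments reuses the cached archive instead of downloading again.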

@maedoc maedoc self-assigned this Dec 8, 2022
@maedoc
Member Author

maedoc commented Dec 8, 2022

I'm on board with pooch, but I'd like to get #633 done first so that it's easier to add the dependency.

@maedoc
Member Author

maedoc commented Dec 8, 2022

> keep the data replicated in 2 places or replace Zenodo entirely?

In the longer term, it could be worth keeping Zenodo for the DOI but making the primary source a DataLad repository that stores significant demo datasets, simulated data, etc., which could help jump-start users' projects. For instance, having 100 HCP subjects preprocessed, with some basic simulations done, could be a huge benefit.

In the short term, having an easy fetcher for our Zenodo data and also for EBRAINS is very nice.

@i-Zaak
Contributor

i-Zaak commented Dec 8, 2022

We could start incorporating various data sources in the TVB data (datalad, zenodo, EBRAINS) similarly to how MNE is handling datasets: https://mne.tools/stable/overview/datasets_index.html

In the end, the tvb-data package itself should contain minimal data, but it should have code to fetch the rest. For me, the criterion for prioritizing inclusion would be public availability, to avoid dealing with tokens and other auth interfaces (with an exception for EBRAINS HDG, which is a known beast).
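
One way to picture "minimal data, but code to get it" is a small registry of sources that records where each dataset lives and whether it needs authentication. The sketch below is purely illustrative: the `DataSource` class, the `REGISTRY` entries, and the example.org URLs are all hypothetical placeholders, not part of tvb-data or of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Where a dataset lives and whether fetching it needs credentials."""
    name: str
    host: str  # e.g. "zenodo", "datalad", "ebrains"
    url: str   # placeholder URLs below, not real download links
    requires_auth: bool = False


# Hypothetical entries, for illustration only.
REGISTRY = [
    DataSource("demo-dataset", "zenodo", "https://example.org/demo.zip"),
    DataSource("hcp-connectomes", "ebrains", "https://example.org/hcp.zip"),
    DataSource("hdg-restricted", "ebrains", "https://example.org/hdg.zip",
               requires_auth=True),
]


def public_sources(registry):
    """Return the sources that can be fetched without tokens or logins."""
    return [src for src in registry if not src.requires_auth]


def by_host(registry, host):
    """Return all sources served from the given host."""
    return [src for src in registry if src.host == host]
```

Filtering on `requires_auth` mirrors the priority rule stated above: publicly available sources come first, and authenticated ones are handled as explicit exceptions.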

Another candidate to be merged in: https://gitlab.ebrains.eu/fousekjan/tvb-ebrains-data

@maedoc
Member Author

maedoc commented May 30, 2024

@liadomide do you think this PR is still relevant, given the work in #691 ?

(btw do you have a link to the issue TVB-1999? I cannot find it at req.thevirtualbrain.org)

@romina1601
Member

> @liadomide do you think this PR is still relevant, given the work in #691 ?
>
> (btw do you have a link to the issue TVB-1999? I cannot find it at req.thevirtualbrain.org)

@maedoc here is the link for TVB-1999: https://tvb-projects.atlassian.net/browse/TVB-1999

@liadomide
Member

@maedoc I think it is safe to close this and leave #691 for a final review and integration.
This was a good base for #691.

@maedoc
Member Author

maedoc commented May 30, 2024

Closing in favor of #691.

@maedoc maedoc closed this May 30, 2024

4 participants