Reproducibility -- include datasets from paper in repo #20

Open
ulupo opened this issue Aug 6, 2021 · 3 comments

@ulupo
Contributor

ulupo commented Aug 6, 2021

I think it would be good to include the datasets we used in the benchmarks reported in the paper. Perhaps an extra benchmarks folder could be created, containing scripts people can use to test performance along with at least the datasets we used in the paper? I think that would be a good service to people interested in the library, and possibly to anyone looking for remaining bottlenecks.

@MonkeyBreaker
Collaborator

I think it is a good idea, but I would not put it directly in the main branch.
Maybe we could create a benchmark branch and put all the datasets we want there. What do you think?
My reasoning is that data should not live directly in the package, or at least not in its main branch.

@ulupo
Contributor Author

ulupo commented Aug 6, 2021

> I think it is a good idea, but I would not put it directly in the main branch.

Hmm, I'm not sure I agree. Though I think I see why you say this ("only code in main"), including benchmark scripts and data is the approach taken by scikit-learn, for example: https://github.com/scikit-learn/scikit-learn (see the benchmarks folder with the scripts, and the datasets subpackage with the data itself). Additionally, the data would be saved as text files, not as binary.

@MonkeyBreaker
Collaborator

I was not aware of scikit-learn's practices, and since we try to follow them in giotto-tda, we should also follow them here.

> Though I think I see why you say this ("only code in main")

Exactly, but it is more a matter of taste (mine in this case) 😛
