Release datasets on Hugging Face #7
Comments
Hi @NielsRogge, thanks for reaching out! To be honest, I did check the guides on Datasets on HF during the development phase of SambaMixer, because I wanted to make the data easily accessible (which is certainly not the case with this NASA Battery dataset...). The one thing that held me back was the fact that I am not the original creator of that dataset. Is there a way to explicitly give credit to the original creators? Besides that, I had some technical doubts concerning the implementation details, which I would figure out after reading the docs. We have this CSV, which contains a link to an npy file that contains the time signals. I might look into that for our follow-up work.
You could upload the datasets under your HF username or organization, and give credit to the original authors in the dataset card (README). Of course, you could also reach out to the authors for explicit permission. But if the datasets can be freely downloaded on the website, they might allow redistribution (does the license say anything about that?)
The Datasets library supports CSV files: https://huggingface.co/docs/datasets/loading#csv. Basically, you can load the CSV file as a 🤗 Dataset, then call
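For anyone following along, here is a rough sketch of how the CSV route could look. It is not taken from the SambaMixer repo: the column names, the file name `metadata.csv`, and the repo id are hypothetical placeholders, and using `push_to_hub` as the upload step is an assumption on my side. The upload needs the third-party `datasets` package and a prior `huggingface-cli login`.

```python
import csv


def read_metadata(csv_path):
    """Read the metadata CSV into a list of row dicts for a quick sanity check."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def push_csv_to_hub(csv_path, repo_id):
    """Load the CSV as a 🤗 Dataset and upload it to the Hub.

    Assumption: `repo_id` has the form "<username-or-org>/<dataset-name>".
    """
    # Imported lazily so the sanity check above works without `datasets` installed.
    from datasets import load_dataset

    # The "csv" builder returns a DatasetDict with a single "train" split.
    dataset = load_dataset("csv", data_files=str(csv_path))
    dataset.push_to_hub(repo_id)
    return dataset


# Hypothetical usage (placeholder names, requires network access and a login):
# rows = read_metadata("metadata.csv")
# push_csv_to_hub("metadata.csv", "your-hf-username/your-dataset")
```

Note that this only uploads the CSV rows. The npy files referenced from the CSV would still have to be uploaded separately or converted into columns first.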
Hello @sascha-kirch 🤗
I'm Niels and I work as an ML engineer at Hugging Face. I discovered your work as it got featured in AK's daily papers: https://huggingface.co/papers/2411.00233. The paper page lets people discuss your paper and find artifacts related to it (the datasets used in your paper, for instance). You can also claim the paper as yours, which will show up on your public profile at HF.
Would you like to host the datasets on https://huggingface.co/datasets? Hosting on Hugging Face will give them more visibility/enable better discoverability, and will also allow people to do:
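As a minimal sketch, loading a Hub-hosted dataset could look roughly like this. The repo id in the usage comment is a hypothetical placeholder, the `load_from_hub` wrapper is only for illustration, and the `"train"` default assumes the dataset exposes a split with that name.

```python
def load_from_hub(repo_id, split="train"):
    """Load one split of a dataset hosted on the Hugging Face Hub."""
    # Third-party dependency: pip install datasets
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


# Hypothetical usage (placeholder repo id, requires network access):
# dataset = load_from_hub("your-hf-username/your-dataset")
# print(dataset[0])
```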
If you're down, here's a guide: https://huggingface.co/docs/datasets/loading. If the Datasets format doesn't work for your dataset, see https://huggingface.co/docs/huggingface_hub/en/guides/upload.
Besides that, there's the dataset viewer which allows people to quickly explore the first few rows of the data in the browser.
Once they are uploaded, we can also link the datasets to the paper page (read here) so people can discover your work.
What do you think?
Kind regards,
Niels