
reducing the size of large datasets #94

Open
KennethEnevoldsen opened this issue Jan 25, 2024 · 3 comments
Labels
dataset (new dataset to add) · enhancement (New feature or request)

Comments

@KennethEnevoldsen
Owner

KennethEnevoldsen commented Jan 25, 2024

If it is possible to reduce the size of some datasets without changing performance too much, it would be great; it would make the benchmark run faster.

I am especially thinking of ScaLA and Da Political comments, as well as Massive Intent and Massive Scenario.

@x-tabdeveloping would you think this is reasonable as well?
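One way to shrink such classification datasets without skewing results is a stratified subsample that preserves the label distribution. A minimal sketch in plain Python (the function name and sample size are illustrative, not part of the benchmark code):

```python
import random
from collections import defaultdict

def stratified_subsample(dataset, n_samples, seed=42):
    """Downsample a list of (text, label) pairs to roughly n_samples
    while keeping the per-label proportions of the original dataset."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label = defaultdict(list)
    for item in dataset:
        by_label[item[1]].append(item)
    total = len(dataset)
    subsample = []
    for label, group in by_label.items():
        # allocate this label's share of the budget, at least one example
        k = max(1, round(n_samples * len(group) / total))
        subsample.extend(rng.sample(group, min(k, len(group))))
    return subsample
```

Fixing the seed matters here: every model evaluated on the benchmark should see the exact same reduced dataset, otherwise scores are not comparable.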

KennethEnevoldsen added the enhancement and dataset labels on Jan 25, 2024
@x-tabdeveloping
Collaborator

Hmm, yeah, it would be nice if we could make it faster somehow. Especially if we're planning on bootstrapping, it's a really good idea.

@KennethEnevoldsen
Owner Author

Well, only bootstrapping the evaluation (not the encoding), but I agree.
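That split (encode once, bootstrap only the evaluation) can be sketched as resampling cached predictions rather than re-running the model; a minimal illustration, with all names hypothetical:

```python
import random

def bootstrap_accuracy(preds, golds, n_boot=1000, seed=42):
    """Bootstrap a 95% confidence interval for accuracy by resampling
    (prediction, gold) pairs. The expensive encoding/prediction step
    runs once; only the cheap metric computation is repeated."""
    rng = random.Random(seed)
    pairs = list(zip(preds, golds))
    n = len(pairs)
    scores = []
    for _ in range(n_boot):
        # resample pairs with replacement and rescore
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(p == g for p, g in sample) / n)
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot)]
```

Since each bootstrap iteration only touches the cached predictions, its cost is independent of the embedding model, which is why shrinking the datasets mainly speeds up the encoding pass.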

@KennethEnevoldsen
Owner Author

Notably, this got better with the fixes in #130 (which don't actually change the dataset sizes).
