
reducing the size of large datasets #94

Open
KennethEnevoldsen opened this issue Jan 25, 2024 · 3 comments
Labels
dataset (new dataset to add) · enhancement (New feature or request)

Comments

@KennethEnevoldsen
Owner

KennethEnevoldsen commented Jan 25, 2024

If it is possible to reduce the size of some datasets without changing performance too much, it would be great; it would make the benchmark run faster.

I am especially thinking of ScaLA and Da Political comments, as well as Massive Intent and Massive Scenario.

@x-tabdeveloping would you think this is reasonable as well?
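One way to shrink such classification datasets without skewing results is a stratified subsample that preserves the label distribution. A minimal sketch in plain Python (the function name and sample size are illustrative, not part of the benchmark code):

```python
import random
from collections import defaultdict

def stratified_subsample(dataset, n_samples, seed=42):
    """Downsample a list of (text, label) pairs to roughly n_samples
    while keeping the per-label proportions of the original dataset."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label = defaultdict(list)
    for item in dataset:
        by_label[item[1]].append(item)
    total = len(dataset)
    subsample = []
    for label, group in by_label.items():
        # allocate this label's share of the budget, at least one example
        k = max(1, round(n_samples * len(group) / total))
        subsample.extend(rng.sample(group, min(k, len(group))))
    return subsample
```

Fixing the seed matters here: every model evaluated on the benchmark should see the exact same reduced dataset, otherwise scores are not comparable.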

KennethEnevoldsen added the enhancement and dataset labels on Jan 25, 2024
@x-tabdeveloping
Collaborator

Hmm, yeah, it would be nice if we could make it faster somehow. Especially if we're planning on bootstrapping, it's a really good idea.

@KennethEnevoldsen
Owner Author

Well, only bootstrapping the evaluation (not the encoding), but I agree.
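That split (encode once, bootstrap only the evaluation) can be sketched as resampling cached predictions rather than re-running the model; a minimal illustration, with all names hypothetical:

```python
import random

def bootstrap_accuracy(preds, golds, n_boot=1000, seed=42):
    """Bootstrap a 95% confidence interval for accuracy by resampling
    (prediction, gold) pairs. The expensive encoding/prediction step
    runs once; only the cheap metric computation is repeated."""
    rng = random.Random(seed)
    pairs = list(zip(preds, golds))
    n = len(pairs)
    scores = []
    for _ in range(n_boot):
        # resample pairs with replacement and rescore
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(p == g for p, g in sample) / n)
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot)]
```

Since each bootstrap iteration only touches the cached predictions, its cost is independent of the embedding model, which is why shrinking the datasets mainly speeds up the encoding pass.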

@KennethEnevoldsen
Owner Author

Notably, this got better with the fixes in #130 (which don't actually change the dataset sizes).
