Describe the bug
I want to create train, test and eval splits from a pandas DataFrame, so I looked at the options available. I noticed the split parameter of Dataset.from_pandas and hoped to use it to generate all three splits at once. However, after reading the code, it seems that the parameter does not split anything: it is only passed through to the Dataset constructor (correct me if I am wrong or misunderstood the code).
from_pandas function code:

    if info is not None and features is not None and info.features != features:
        raise ValueError(
            f"Features specified in `features` and `info.features` can't be different:\n{features}\n{info.features}"
        )
    features = features if features is not None else info.features if info is not None else None
    if info is None:
        info = DatasetInfo()
    info.features = features
    table = InMemoryTable.from_pandas(
        df=df,
        preserve_index=preserve_index,
    )
    if features is not None:
        # more expensive cast than InMemoryTable.from_pandas(..., schema=features.arrow_schema)
        # needed to support the str to Audio conversion for instance
        table = table.cast(features.arrow_schema)
    return cls(table, info=info, split=split)

Steps to reproduce the bug

    from datasets import Dataset

    # Filling the split parameter with whatever causes no harm at all
    data = Dataset.from_pandas(self.raw_data, split='egiojegoierjgoiejgrefiergiuorenvuirgurthgi')

Expected behavior
It would be great to either remove the split parameter (if it has no effect) or add a concrete example of how it can be used.
Environment info
datasets version: 3.2.0
huggingface_hub version: 0.27.1
fsspec version: 2024.9.0