In terms of licensing, it looks like the ThermoML archive has recently been "FAIRified" and can be downloaded in bulk at http://doi.org/10.18434/mds2-2422
If I remember correctly, I used the massive flat file from this link!
Also, to save you some time, here's a rough sketch of the code you need to just get pandas dataframes:
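```python
import pandas as pd
from thermopyl import Parser

# Get the data for each of the journals (one directory per DOI prefix)
data_paths = [
    "ThermoML.v2020-09-30/10.1021/",
    "ThermoML.v2020-09-30/10.1016/",
    "ThermoML.v2020-09-30/10.1007/",
]

for data_path in data_paths:
    # Parse the ThermoML records under this journal's directory
    parser = Parser(data_path)
    parsed_data = parser.parse()
    parsed_data = pd.DataFrame(parsed_data)

    # e.g. "ThermoML.v2020-09-30/10.1021/" -> "10.1021"
    doi = data_path.split("/")[-2]

    # Writing Parquet requires pyarrow or fastparquet to be installed
    parsed_data.to_parquet(f"{doi}.pq")
```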
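One fragile spot in the loop above is `data_path.split("/")[-2]`, which only yields the DOI prefix when the path ends with a slash. Below is a minimal, standard-library-only sketch of a more forgiving approach, assuming the extracted archive keeps one directory per journal DOI prefix as in the paths above. `doi_prefix` and `group_by_doi_prefix` are hypothetical helpers (not part of thermopyl), and the example file names are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def doi_prefix(data_path: str) -> str:
    """Return the journal DOI prefix (e.g. "10.1021") from a directory path.

    PurePosixPath drops a trailing slash, so this works with or without one,
    unlike data_path.split("/")[-2], which relies on the trailing slash.
    """
    return PurePosixPath(data_path).name


def group_by_doi_prefix(file_paths):
    """Group XML file paths by the directory component that looks like a DOI prefix.

    Assumption (hypothetical layout): <archive root>/<doi prefix>/.../<file>.xml
    """
    groups = defaultdict(list)
    for path in file_paths:
        p = PurePosixPath(path)
        if p.suffix.lower() != ".xml":
            continue  # skip README files and other non-XML content
        # Treat the first directory component starting with "10." as the DOI prefix
        prefix = next((part for part in p.parts[:-1] if part.startswith("10.")), None)
        if prefix is not None:
            groups[prefix].append(path)
    return dict(groups)


# Usage with made-up file names: one Parquet output name per journal prefix
files = [
    "ThermoML.v2020-09-30/10.1021/example1.xml",
    "ThermoML.v2020-09-30/10.1016/example2.xml",
    "ThermoML.v2020-09-30/10.1016/example3.xml",
    "ThermoML.v2020-09-30/README.txt",
]
for prefix, paths in sorted(group_by_doi_prefix(files).items()):
    print(f"{prefix}.pq <- {len(paths)} file(s)")
```

Grouping by path component rather than hard-coding the three journal directories means any additional DOI prefixes in the archive would be picked up automatically, at the cost of assuming the prefix directory always starts with "10.".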
I'm happy to have a go at adding the ThermoML archives as a dataset, if this is useful.
Already mentioned on Discord by @marcosfelt, including the useful link to thermopyl and @marcosfelt's updated fork.