Adding the ThermoML dataset #117

ml-evs · 2023-03-16T17:44:51Z

I'm happy to have a go at adding the ThermoML archives as a dataset, if this is useful.

This was already mentioned on Discord by @marcosfelt, along with useful links to thermopyl and @marcosfelt's updated fork.

ml-evs · 2023-03-16T17:54:17Z

In terms of licensing, it looks like the ThermoML archive has recently been "FAIRified" and can be downloaded in bulk at http://doi.org/10.18434/mds2-2422

marcosfelt · 2023-03-16T18:30:45Z

If I remember correctly, I used the massive flat file from the bulk download link above (http://doi.org/10.18434/mds2-2422)!

Also, to save you some time, here's a rough sketch of the code you need to just get pandas dataframes:

import pandas as pd
from thermopyl import Parser

# Get the data for each of the journals (one directory per DOI prefix)
data_paths = [
    "ThermoML.v2020-09-30/10.1021/",
    "ThermoML.v2020-09-30/10.1016/",
    "ThermoML.v2020-09-30/10.1007/",
]

for data_path in data_paths:
    parser = Parser(data_path)
    parsed_data = parser.parse()
    parsed_data = pd.DataFrame(parsed_data)
    # The second-to-last path component is the DOI prefix, e.g. "10.1021"
    doi = data_path.split("/")[-2]
    # Writing Parquet requires pyarrow or fastparquet to be installed
    parsed_data.to_parquet(f"{doi}.pq")
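
The output file name in the sketch above depends on each path ending in a slash, since `split("/")[-2]` picks the second-to-last component. A slightly more forgiving variant, offered only as a rough sketch, could use `pathlib` so the prefix is found with or without the trailing slash. The helper names `doi_prefix` and `output_name` and the `PUBLISHERS` labels are hypothetical and not part of thermopyl; the prefixes themselves are the DOI registrant prefixes of ACS, Elsevier, and Springer.

```python
from pathlib import PurePosixPath

# DOI registrant prefixes seen in the archive paths, with illustrative labels.
# (Hypothetical mapping, used here only to make the file names easier to read.)
PUBLISHERS = {
    "10.1021": "ACS",
    "10.1016": "Elsevier",
    "10.1007": "Springer",
}


def doi_prefix(data_path: str) -> str:
    """Return the last path component, ignoring any trailing slash."""
    return PurePosixPath(data_path).name


def output_name(data_path: str) -> str:
    """Build a Parquet file name such as '10.1021_ACS.pq' for a data directory."""
    prefix = doi_prefix(data_path)
    label = PUBLISHERS.get(prefix, "unknown")
    return f"{prefix}_{label}.pq"
```

In the loop, `parsed_data.to_parquet(output_name(data_path))` would then replace the `split`-based file name.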

ml-evs linked a pull request Mar 20, 2023 that will close this issue

Add ThermoML Archive dataset #118

Open
