How to handle PubMed documents from various sources #209

korikuzma · 2022-12-09T23:59:26Z

For our sources, we store PMIDs as Documents. Some sources provide more information than others: for example, OncoKB only gives the PMID, whereas CIViC gives authors and a description. An example is pmid:22663011. Currently, we just take the first source that loads that document, since there is an ID constraint in the db. We should think about how we want to handle this as we add more sources. Should we combine source data? Should we prefix the ID with the source it came from?
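To make the "combine source data" option concrete, here is a minimal sketch. The record shape (a source name paired with a plain dict) and the function name `merge_documents` are hypothetical and not taken from the existing code. It keeps the first non-empty value seen per field and remembers which source supplied it, instead of dropping later sources entirely.

```python
def merge_documents(records):
    """Merge per-source documents that share one ID.

    `records` is a list of (source_name, document_dict) tuples in load
    order. Both the shape and the field names are assumptions made for
    illustration only.
    """
    merged = {}
    provenance = {}
    for source, doc in records:
        for field, value in doc.items():
            # Skip fields the source left empty.
            if value in (None, "", []):
                continue
            # First non-empty value wins; later sources only fill gaps.
            if field not in merged:
                merged[field] = value
                provenance[field] = source
    return merged, provenance
```

With this approach the ID constraint stays as it is, and a sparse source loaded first no longer hides the richer fields that a later source provides.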


jsstevenson · 2024-03-30T00:23:09Z

Some of this metadata, like article title and authors, should be objectively determinable -- we could use NCBI esearch/efetch to grab a minimum set of attributes for every article, regardless of what a source supplies.

That said, curated properties like a CIViC description definitely go above and beyond that.
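A rough sketch of what the lookup could look like, using the E-utilities ESummary endpoint with JSON output. The helper names are hypothetical, and the response field names (`result`, `title`, `authors`, `fulljournalname`, `pubdate`) reflect my understanding of the ESummary JSON format, so they should be checked against a real response before relying on them. The HTTP request itself is left out so the parsing can be tested offline.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(pmid: str) -> str:
    """Build an ESummary request URL for a single PubMed ID."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


def parse_esummary(payload: dict, pmid: str) -> dict:
    """Pull a minimal attribute set out of a decoded ESummary JSON payload.

    Assumes the record for `pmid` lives under payload["result"][pmid].
    """
    record = payload["result"][pmid]
    return {
        "pmid": pmid,
        "title": record.get("title"),
        "authors": [author["name"] for author in record.get("authors", [])],
        "journal": record.get("fulljournalname"),
        "pubdate": record.get("pubdate"),
    }
```

Fetching would then be a single GET on `esummary_url("22663011")` with whatever HTTP client the project already uses, followed by `parse_esummary` on the decoded JSON.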

jsstevenson · 2024-06-14T17:57:48Z

My proposal:

- It's a job for eutils!
- Given a DOI or PMID, fetch any basic metadata we might want (e.g. for display purposes: author list, title, date, journal, issue/vol/no, etc.). It's relatively safe to map between DOI and PMID, so we can pick one as the identifier when it's available.
- If neither of the above is available, just fill in what we can and figure out some way to identify it.
- If sources DO provide a PMID/DOI and additional metadata, we could check that what they provide matches what comes out of our eutils lookup, and log/raise warnings for any discrepancies.
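The discrepancy check in the last point could be sketched as below. Everything here is illustrative: `find_discrepancies`, the dict-based document shape, and the normalization rules (case, whitespace, trailing period) are assumptions, not existing code. Fields that only the source supplies, such as a curated description, are skipped rather than flagged.

```python
import logging

logger = logging.getLogger(__name__)


def _normalize(value):
    """Loosely normalize values so cosmetic differences are not flagged."""
    if isinstance(value, str):
        return " ".join(value.split()).casefold().rstrip(".")
    if isinstance(value, (list, tuple)):
        return [_normalize(item) for item in value]
    return value


def find_discrepancies(source_name, source_doc, reference_doc):
    """Return the fields where a source disagrees with the reference lookup.

    `reference_doc` stands in for the metadata from the eutils lookup.
    """
    mismatches = []
    for field, supplied in source_doc.items():
        # Only compare fields both sides actually provide.
        if supplied is None or field not in reference_doc:
            continue
        expected = reference_doc[field]
        if _normalize(supplied) != _normalize(expected):
            mismatches.append(field)
            logger.warning(
                "%s disagrees with reference on %r: %r != %r",
                source_name, field, supplied, expected,
            )
    return mismatches
```

Whether a mismatch should only be logged or should raise is a policy choice; returning the list of fields leaves that decision to the caller.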

korikuzma added the "question" label (Further information is requested) on Dec 9, 2022
