How to handle PubMed documents from various sources #209

Open
korikuzma opened this issue Dec 9, 2022 · 2 comments
Labels
question Further information is requested

Comments

@korikuzma
Member

For our sources, we store PMIDs as Documents. Some sources provide more information than others: OncoKB gives only the PMID, whereas CIViC also gives authors and a description. An example is pmid:22663011. Because there is an ID constraint in the db, we currently keep whichever source loads that document first and ignore the rest. We should think about how we want to handle this as we add more sources. Should we combine source data? Should we prefix the ID with the source it came from?
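To make the "combine source data" option concrete, here is a minimal sketch. Everything in it is hypothetical: the `Document` dataclass, its field names, and `merge_document` are illustrative and do not reflect the project's actual schema. It assumes one record per PMID, with curated descriptions kept per source so that a later loader adds to the record instead of being dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    """Hypothetical document record; field names are illustrative only."""
    id: str
    authors: Optional[List[str]] = None
    # Curated descriptions are stored per source so none overwrites another.
    descriptions: Dict[str, str] = field(default_factory=dict)
    # Every source that has referenced this document, in load order.
    sources: List[str] = field(default_factory=list)


def merge_document(
    existing: Optional[Document],
    source: str,
    pmid: str,
    authors: Optional[List[str]] = None,
    description: Optional[str] = None,
) -> Document:
    """Fold one source's view of a PMID into a single shared record."""
    doc = existing or Document(id=f"pmid:{pmid}")
    if source not in doc.sources:
        doc.sources.append(source)
    # Fill bibliographic fields only if no earlier source supplied them.
    if doc.authors is None and authors:
        doc.authors = list(authors)
    if description:
        doc.descriptions[source] = description
    return doc


# Placeholder values; not the real metadata for this article.
doc = merge_document(None, "oncokb", "22663011")
doc = merge_document(doc, "civic", "22663011",
                     authors=["Author A"], description="curated text")
```

With this shape the ID stays unprefixed (`pmid:22663011`) and load order no longer decides which metadata survives. Prefixing the ID by source would instead produce one record per source for the same article.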

@korikuzma added the question label (Further information is requested) on Dec 9, 2022
@jsstevenson
Member

Some of this metadata, like article title and authors, should be objectively determinable -- we could use NCBI esearch/efetch to grab a minimum set of attributes for every article, regardless of what a source supplies.

That said, curated properties like a CIViC description definitely go above and beyond that.
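As a rough illustration of fetching a minimum attribute set per article, the sketch below uses the E-utilities `esummary` endpoint with JSON output, swapped in for esearch/efetch only because the response is simpler to parse. The helper names are hypothetical, and the response field names (`result`, `authors`, `fulljournalname`, `pubdate`) are my understanding of the esummary JSON layout; verify them against a live response before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esummary_url(pmid: str) -> str:
    """Build an esummary request URL for a single PubMed ID."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


def parse_esummary(payload: dict, pmid: str) -> dict:
    """Extract a minimal attribute set from an esummary JSON payload.

    Assumes the payload nests each record under payload["result"][pmid].
    """
    record = payload["result"][pmid]
    return {
        "pmid": pmid,
        "title": record.get("title"),
        "authors": [a["name"] for a in record.get("authors", [])],
        "journal": record.get("fulljournalname"),
        "pubdate": record.get("pubdate"),
    }


def fetch_basic_metadata(pmid: str) -> dict:
    """Network call: look up one PMID and return its basic metadata."""
    with urlopen(esummary_url(pmid)) as resp:
        return parse_esummary(json.load(resp), pmid)
```

Keeping the parsing separate from the network call means the parser can be tested against saved responses, and source-specific curation such as a CIViC description stays outside this lookup.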

@jsstevenson
Member

My proposal:

  1. It's a job for eutils!
  2. Given a DOI or PMID, fetch any basic metadata we might want (e.g. for display purposes -- author list, title, date, journal, issue/vol/no, etc). It's relatively safe to map between DOI <-> PMID, so we can pick one as the identifier whenever either is available.
  3. If neither a DOI nor a PMID is available, just fill in what we can and figure out some way to identify the document.
  4. If sources DO provide a PMID/DOI plus additional metadata, we could check that what they provide matches the result of our eutils lookup, and log/raise warnings for any discrepancies.
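The discrepancy check in the last step could look something like the sketch below. The function names, the dictionary shapes, and the normalization rules (collapse whitespace, ignore case and a trailing period) are all assumptions made for illustration. Real author lists would likely need looser matching, since sources format names differently.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def _normalize(value: Any) -> Any:
    """Reduce cosmetic differences before comparing metadata values."""
    if isinstance(value, str):
        # Collapse whitespace, ignore case, drop a trailing period.
        return " ".join(value.split()).casefold().rstrip(".")
    if isinstance(value, (list, tuple)):
        return [_normalize(v) for v in value]
    return value


def find_discrepancies(
    source_name: str,
    doc_id: str,
    supplied: Dict[str, Any],
    reference: Dict[str, Any],
) -> List[str]:
    """Return the fields where a source disagrees with the eutils lookup.

    Fields the lookup does not cover (e.g. a curated description) are
    skipped, since there is nothing to compare them against.
    """
    mismatched = []
    for key, supplied_value in supplied.items():
        if supplied_value is None or key not in reference:
            continue
        if _normalize(supplied_value) != _normalize(reference[key]):
            mismatched.append(key)
            logger.warning(
                "%s: %s supplied %s=%r but lookup returned %r",
                doc_id, source_name, key, supplied_value, reference[key],
            )
    return mismatched
```

Returning the mismatched field names, in addition to logging them, leaves it to the caller to decide whether a discrepancy is only a warning or should block the load.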
