-
This seems like simple denormalization on the surface.
Everyone can already get everywhere from anything; doing more cannot add functionality, but it can multiply the maintenance workload. What are you trying to do?
-
I need to insert the DOI into the Arctos record as a link. Do we have a protocol for linking to DOIs?
-
The DOI is the link to the actual genomic data. On the surface, this publication does not reference specimens; you have to dig to find out that it embeds genomic data from an Arctos specimen. We need a direct link from the specimen record to the Dryad DOI containing the genomic data.
-
That's a problem for the publication, right? From Arctos, someone has already done the digging and created citations (or a project, if it's really underground), and the links are front-and-center obvious. In any case, that sounds like a job best suited to Media: if you use the DOI (https://doi.org/10.5061/dryad.26j38) as the media_uri, it should be fairly stable. If you can somehow justify otherIDs (and I don't think you can here without losing functionality; see #2278), you could make a new type, DOI, with a base URL of https://doi.org/ and enter 10.5061/dryad.26j38 with the specimen, which would link to https://doi.org/10.5061/dryad.26j38.
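The otherID option proposes a DOI type with a base URL of https://doi.org/, so that storing 10.5061/dryad.26j38 on a specimen yields the link https://doi.org/10.5061/dryad.26j38. A minimal Python sketch of that base-URL mechanism follows. The `BASE_URLS` table and `other_id_link` function are hypothetical illustrations, not Arctos code, and the sketch assumes the link is formed by simply appending the stored value to the type's base URL.

```python
from typing import Optional

# Hypothetical table of Other ID types and their base URLs (not Arctos's actual table).
BASE_URLS = {
    "DOI": "https://doi.org/",
}


def other_id_link(id_type: str, value: str) -> Optional[str]:
    """Return the link for an Other ID, or None if the type has no base URL.

    Assumes the link is simply the type's base URL followed by the stored value.
    """
    base = BASE_URLS.get(id_type)
    if base is None:
        return None  # no base URL: the value would be displayed as plain text
    return base + value.strip()


print(other_id_link("DOI", "10.5061/dryad.26j38"))  # prints https://doi.org/10.5061/dryad.26j38
```

Because only the DOI itself is stored, the link always goes through the doi.org resolver, which is what keeps it stable if the repository moves its landing page.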
-
Why not do both? Media and Other ID? I think we should have a similar linkage to genomic data as to GenBank data and IsoBank data.
-
Well, it's all three in this case, and the answer as always is "denormalization bad." If you do this through the publication, then everything works because it's part of a system meant to do just this. If something changes, you (or others who use this system, and there are lots of them with extensive resources) fix one thing and it all works again. If you also do it through otherIDs, then you get to choose between maintaining more data (maybe lots more: 43,546 specimens in the case of http://arctos.database.museum/publication/10007289) or having inconsistent data (if you do this for some publications, or some specimens in some publication, then users don't have a single clear path that'll do the same thing for all specimens). You'll get no help maintaining those links; they're just from your specimens. Add in Media and it's simply doubled: you now have 87,092 (plus one, the publication) THINGS to maintain for that one publication, and doing so still doesn't expose any new information.
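The counting argument in that comment can be written out as a small sketch. It assumes, as the comment does, that the publication (with its data link) is one record and that each per-specimen mechanism (Other ID, Media) adds one record per cited specimen; `records_to_maintain` is a hypothetical helper, not anything in Arctos.

```python
from typing import Iterable

# Mechanisms that add one record per cited specimen (assumption from the comment).
PER_SPECIMEN_MECHANISMS = {"other_id", "media"}


def records_to_maintain(n_specimens: int, mechanisms: Iterable[str]) -> int:
    """Count link records to maintain for one publication.

    Assumes the publication itself is one record and that every
    per-specimen mechanism adds one record for each cited specimen.
    """
    total = 1  # the publication (and the data link attached to it)
    for mechanism in set(mechanisms):
        if mechanism in PER_SPECIMEN_MECHANISMS:
            total += n_specimens
    return total


# Publication 10007289 cites 43,546 specimens.
print(records_to_maintain(43546, []))                     # 1
print(records_to_maintain(43546, ["other_id"]))           # 43547
print(records_to_maintain(43546, ["other_id", "media"]))  # 87093
```

The publication-only path stays at one record no matter how many specimens are cited, which is the core of the "denormalization bad" argument.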
-
So if I understand correctly, you are saying I should link the publication only, and that will be the only way this information is discoverable. That means we would have no explicit way of searching for specimens that have genomic data. Is there some other way we could flag specimens that have published genome data to make them discoverable? Currently, we find specimens with sequence data by looking for specimens that have GenBank identifiers. We will soon be able to find specimens with isotopic data by searching on IsoBank identifiers. How do we find specimens with genomic data if the data are embedded in a Dryad DOI?
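The search described here (finding specimens by the presence of an identifier type, as is done with GenBank identifiers) can be sketched as below. The `Specimen` model, the GUIDs, the accession placeholder, and the type strings are all hypothetical and are not Arctos's schema. The sketch also illustrates the caveat raised later in the thread: a specimen whose genomic data was never recorded under the identifier type simply does not come back from the search.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Specimen:
    guid: str
    # (identifier type, value) pairs; a hypothetical stand-in for Other IDs
    other_ids: list[tuple[str, str]] = field(default_factory=list)


def specimens_with_id_type(specimens: list[Specimen], id_type: str) -> list[str]:
    """Return GUIDs of specimens that carry at least one ID of the given type."""
    return [
        s.guid
        for s in specimens
        if any(t == id_type for t, _ in s.other_ids)
    ]


specimens = [
    Specimen("EXAMPLE:1", [("GenBank", "EXAMPLE-ACC-1")]),
    Specimen("EXAMPLE:2", [("Genomic Data", "https://doi.org/10.5061/dryad.26j38")]),
    Specimen("EXAMPLE:3"),  # assume genomic data exists but was never recorded
]

print(specimens_with_id_type(specimens, "GenBank"))       # ['EXAMPLE:1']
print(specimens_with_id_type(specimens, "Genomic Data"))  # ['EXAMPLE:2']
```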
-
What's the distinction between "genomic" and "sequence"? Will these specimens not also have GenBank IDs? I don't think there's anything in the DOI or Dryad that would make me think "genomic."
-
No, they will not have GenBank IDs. The data have been deposited in Dryad. There is no way to know from the publication link that there is associated genomic data, but there is. That is why we need to include an identifier for the dataset, and also why we need some way to flag the existence of the genomic data. Solution?
-
Perhaps we need an Other ID of "Genomic Data" with a value of the URL to whatever repository is used?
-
I can look harder later; Crossref might be tracking supplements (although it's not there in a quick check).
That's back to the normalization thing: someone will use the flag to find these two records, figure that's all that's in Arctos, and leave. The "solution" is to maintain the flag for everything with "genomics data", which I'd guess is a bit more than double the workload?
That's functionally identical to Media (including the maintenance implications). It looks a lot like more arbitrariness (which of the various methods of doing the same thing did this particular specimen use?), which as always leads to users not getting what they're looking for.
Probably not without some more background information; this still doesn't make much sense to me. Someone didn't cite specimens and did some genomics thing that didn't result in anything in GenBank, and we need to do something special to capture that?? I'm sure I'm missing something, but I don't know what it could be. Given what I know now, Media seems the least-evil way of doing this. There's some label ("subject"?) used for similar things; a value of 'genomics data' there might even work well enough for the flag.
-
I see both points. It is nice to have the data right on Arctos, but there is also the work of putting it there. I know there are a lot more pubs on Arctos that use Dryad, and I don't plan to go back and add them to other identifiers if that is the route things go. The DOI is not really an other identifier, so it doesn't really fit there the way a GenBank or BOLD number does. I'm not even really up for adding the DOI as an independent publication; again, it comes down to time. We barely keep up with adding publications as it is, and if I also had to check every one for Dryad data and add that information too... Ugh. I think it really comes down to who needs the data. Researchers can go figure it out themselves. And as of right now, we don't need to know exactly which specimens have genomic data, as we are simply counting publications.
-
I dug around a bit more: the Crossref metadata does not contain the supplemental data, so that path won't (yet) work. I'm not sure whether they don't do that yet (seems unlikely) or everything I could find just didn't supply it (frustratingly common; see ORCID).
I'm not sure there are "both" points of view here. If y'all want to include or link data or whatever, then so do I! My only concern is HOW. Are we painting ourselves into a corner from which future development will have a hard time escaping, misleading users, making a bunch of work for ourselves in maintaining this or cleaning up the giant mess, building something that's going to add the most bang for the least buck, ????
Without a fairly detailed understanding of what we're trying to do (and I do not have that here), I don't really know the answer to those questions. I'm just trying to find the best way to model whatever y'all want to do. I see @campmlc added a new otherID with a baseURL. Shall we close this?
-
I've been in discussion with Joe Cook, who has reached out to various researchers publishing genomic data. The agreement is that we should not just put out the publication DOI and force people to search through it to find genomic data, but there is not yet a standard repository or standard way of citing. They are working on this. In the meantime, I'm thinking we could have an ID of "Genomic Data" and link to the URL.
-
--------- Forwarded message ---------
From: Joseph Cook
Date: Tue, Oct 1, 2019 at 8:30 AM
Subject: Re: [ArctosDB/arctos] Creating reciprocal linkages to Dryad for genomic data (#2281)
To: Jocelyn Colella
Yes, those are links that should be built (eventually) into Arctos---not searching through pubs for links.........
You are right, but we need to get some of the faculty players behind this effort (Mike, Chris, MVZ folks).
On Tue, Oct 1, 2019 at 8:26 AM Jocelyn Colella wrote:
SRA definitely. Possibly also link to ENSEMBL? http://ensemblgenomes.org
--
Postdoctoral Research Faculty, University of New Hampshire
MacManes Lab - Evolutionary & Ecophysiological Genomics
Molecular, Cellular, & Biomedical Sciences Dept.
512-567-9843 • jcolella [at] gmail [dot] com • [http://] jpcolella [dot]
weebly [dot] com
-
On Mon, Sep 30, 2019 at 11:22 PM Julia Allen wrote:
Hey Joe et al.,
Right now GenBank has a specific repository for genomic data, for both raw sequencing reads and genome assemblies. It is called the Sequence Read Archive (SRA <https://www.ncbi.nlm.nih.gov/sra>). As you upload the data, you can add in any specimen info of your choice. It is a little trickier to search the SRA for specimen info, but you can search by taxon and then look at the data associated with the uploads to see if there is specimen data.
The SRA is growing at a rapid pace (and is huge, as you might imagine).
Hope that was helpful! Feel free to email anytime!
Julie
On Mon, Sep 30, 2019 at 7:16 PM Joseph Cook wrote:
> Hi Folks,
>
> Could you weigh in on genome data and Arctos? We're trying to figure out how to
> link (like we now link to GenBank, but GenBank won't be the repository for
> genomes, correct?)
>
> Where will genomes be deposited eventually? (Not Dryad, right?)
>
> thanks,
> Joe
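Julie's note that the SRA can be searched by taxon could be scripted against NCBI's E-utilities. The sketch below only builds an `esearch` request URL for the `sra` database, restricting the query with the `[Organism]` search field; it makes no network call, the helper name is hypothetical, and the example taxon is arbitrary. As Julie points out, each returned record would still have to be inspected for specimen information.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_taxon_search_url(taxon: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL for SRA records of one taxon."""
    params = {
        "db": "sra",                   # search the Sequence Read Archive
        "term": f"{taxon}[Organism]",  # restrict the query to the organism field
        "retmax": retmax,              # maximum number of record IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(sra_taxon_search_url("Mus musculus"))
```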
-
See above; this method still looks like a lot of work for low-quality data to me.
Ideally (if I'm at least understanding the basics), this would be bigger than Arctos. If we can get Crossref to supply the DOI of the supplemental data, then implementing and maintaining this should be trivial in Arctos. If we do the same thing we've done for other such data, with the mechanism built for that sort of thing (Media), then this should at least be predictable (e.g., the URL is typed as a URL instead of a string) and manageable.
That's been in Arctos for quite some time: http://arctos.database.museum/info/ctDocumentation.cfm?table=CTCOLL_OTHER_ID_TYPE&field=NCBI%20Sequence%20Read%20Archive%20Run%20ID
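The NCBI Sequence Read Archive Run ID type linked above works like the base-URL DOI type: a structured accession plus a base URL, which also makes the value checkable rather than a free-text string. A minimal sketch follows. Both the accession pattern (SRR, ERR, or DRR prefixes, as used by NCBI, ENA, and DDBJ, followed by digits) and the base URL are assumptions, not values read from Arctos's code table, and `sra_run_link` is a hypothetical helper.

```python
import re

# Assumed run-accession format: SRR/ERR/DRR followed by at least six digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d{6,}")
# Assumed base URL for the Other ID type.
SRA_BASE_URL = "https://www.ncbi.nlm.nih.gov/sra/"


def sra_run_link(value: str) -> str:
    """Validate an SRA run accession and return its link.

    Raises ValueError when the value does not look like a run accession,
    e.g. when a DOI or free text is entered under this type by mistake.
    """
    accession = value.strip()
    if RUN_ACCESSION.fullmatch(accession) is None:
        raise ValueError(f"not an SRA run accession: {value!r}")
    return SRA_BASE_URL + accession
```

Rejecting a DOI entered under this type is the kind of predictability a typed identifier buys over a free-text "Genomic Data" value.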
-
I am reopening this because I don't believe this is an other identifier for any specific record.
-
There is only one record using this identifier, https://arctos.database.museum/guid/MSB:Para:20350, and it links to a publication: https://datadryad.org/stash/dataset/doi:10.5061/dryad.26j38. I don't see anywhere that this specific specimen is cited or that the publication contains any specific genetic information.
-
I'm still baffled, and this still looks like a job for Media, but that is definitely not a publication. The publication is http://dx.doi.org/10.1093/sysbio/syw105; the data used in the publication are https://dx.doi.org/10.5061/dryad.26j38. (They could also have been stashed in Dropbox, GitHub, or anything else; Dryad is just a sciencey hard drive.) Note that we've also explicitly tossed out the entire functionality of the DOI ecosystem by making the link to https://datadryad.org/stash/dataset/doi:10.5061/dryad.26j38 for some reason.
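The point here is that a stored link should go through the DOI resolver rather than a repository-specific landing page. A minimal sketch of normalizing the URL forms seen in this thread to a canonical `https://doi.org/` link follows. The regular expression is a simplified assumption about DOI syntax (a `10.` prefix, a numeric registrant code, a slash, and a suffix), not a full validator, and `canonical_doi_url` is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Simplified DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def canonical_doi_url(text: str) -> Optional[str]:
    """Extract a DOI from a URL or string and return it as a doi.org link."""
    # Landing pages sometimes percent-encode the DOI (e.g. "doi%3A10.5061%2F...").
    match = DOI_PATTERN.search(unquote(text))
    if match is None:
        return None
    return "https://doi.org/" + match.group(0)


for raw in [
    "https://datadryad.org/stash/dataset/doi:10.5061/dryad.26j38",
    "https://dx.doi.org/10.5061/dryad.26j38",
    "http://dx.doi.org/10.1093/sysbio/syw105",
]:
    print(canonical_doi_url(raw))
```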
-
I need to create an Other ID for Dryad DOIs. I'd like to confirm I'm doing this correctly, and also see about initiating a discussion with Dryad regarding reciprocal links.
e.g.
https://datadryad.org/stash/our_platform
https://datadryad.org/stash/dataset/doi:10.5061/dryad.26j38
We need this quickly for demo of Arctos ability to link to genomic data.