Definition of Data acquisition method in the specification [DO NOT MERGE] #718

Merged: 14 commits merged on Aug 29, 2024

Conversation

ypriverol
Member

This PR includes small changes in the specification to define how DIA experiments and other acquisition methods are captured. In addition, it adds an example and makes other small fixes in other datasets.

@ypriverol added the enhancement, help wanted, Specification, and PSI-Discussion labels on Aug 27, 2024
@ypriverol changed the title from "Definition of Data acquisition method in the specification" to "Definition of Data acquisition method in the specification [DO NOT MERGE]" on Aug 27, 2024
@trishorts
Collaborator

This is good stuff. Really appreciate your service to the community. Maybe there is stuff in another place that I am not aware of.

But I wonder where the info about the protease (if any) is located. I also wonder where the search engine, its version, and its parameters are recorded. And what about the details of the protein database (FASTA or XML)?

Can you point me to the definitions file for each of your headings? I don't know what you expect for modifications, etc.

@ypriverol
Member Author

> This is good stuff. Really appreciate your service to the community. Maybe there is stuff in another place that I am not aware of.
>
> But, I wonder where the info about which protease (if any) is located.

https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/README.adoc#1022-cleavage-agents

> Also wonder about what search engine and version and parameters is located. Also, what about the details of the protein database (fasta or xml).

We are trying not to encode databases, because the database can change between the original experiment and reanalyses.

> Can you point me to the definitions file for each of your headings? I don't know what you expect for modifications, etc.

Some data analysis fields are described here: https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/data-analysis-metadata.adoc

@trishorts
Collaborator

trishorts commented Aug 27, 2024

> > Also wonder about what search engine and version and parameters is located. Also, what about the details of the protein database (fasta or xml).
>
> We are trying to do not encode databases because it can change from experiment to reanalyses.

Nonetheless, without knowledge of the protein sequence data that was used, it becomes hard to reproduce the original result. Re-analysis is important, but I think reproduction is equally important. So the metadata should store everything necessary to enable reproduction. You don't have to store the FASTA/XML itself, but one should know where and when it was obtained, and whether the modifications were obtained from the same place.

Similarly, all the search software metadata should come along: engine, version, and all settings.

This is my opinion, of course.
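
To make the suggestion above concrete, here is a minimal sketch of the kind of record that could carry this information. It is not part of the SDRF specification or of this PR: the class names, field names, and the `missing_fields` helper are all hypothetical, chosen only to illustrate "location, date, modifications" for the database and "engine, version, settings" for the search software.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class DatabaseProvenance:
    """Hypothetical record of where and when the sequence database was obtained."""
    source_location: str             # URL or path the FASTA/XML came from
    retrieved_on: date               # date the file was obtained
    file_format: str                 # "fasta" or "xml"
    modifications_from_source: bool  # True if modifications were taken from the same source


@dataclass
class SearchProvenance:
    """Hypothetical record of the search software used."""
    engine: str
    version: str
    settings: dict = field(default_factory=dict)  # all search parameters


def missing_fields(record) -> list:
    """Return the names of fields left empty, so incomplete records can be flagged."""
    return [name for name, value in asdict(record).items() if value in ("", None, {})]


# Example values are placeholders, not real resources or tools.
db = DatabaseProvenance("https://example.org/proteome.fasta", date(2024, 8, 27), "fasta", False)
search = SearchProvenance(engine="ExampleEngine", version="")

print(missing_fields(db))      # []
print(missing_fields(search))  # ['version', 'settings']
```

A record like this does not store the database itself; it only keeps enough provenance to fetch the same file again and rerun the same search.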

@ypriverol
Member Author

> > > Also wonder about what search engine and version and parameters is located. Also, what about the details of the protein database (fasta or xml).
> >
> > We are trying to do not encode databases because it can change from experiment to reanalyses.
>
> Nonetheless, without knowledge of the protein sequence data that was used, it becomes hard to reproduce the original result. Re-analysis is important. But, I think reproduction is equally important. So, metadata should store everything necessary to enable reproduction. You don't have to store the FASTA/XML. But, one should know the location and date when it was obtained and if modifications were obtained there.
>
> Similarly, all the search software metadata should come along. Engine, version and all settings.
>
> This is opinion of course.

I think for search engines and data analysis we should continue the work started by @david-bouyssie and @veitveit (https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/data-analysis-metadata.adoc), which is not yet part of the full specification but is a proposal for how to capture these fields.

This PR is mainly to discuss how we can capture DDA vs DIA in the current specification, which users are requesting more and more. It would be great to hear, from your point of view, what other parameters we should capture for DIA instrument settings.

@ypriverol requested a review from TineClaeys on August 28, 2024 13:47
@ypriverol
Member Author

PRIDE-Archive/pride-ontology#104 adds the terms needed for this PR.

@ypriverol requested a review from enryH on August 28, 2024 15:58
@ypriverol merged commit 3a13234 into master on Aug 29, 2024
2 checks passed
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), PSI-Discussion, Specification (Specification issues related with PRIDE formats, API, etc)
6 participants