Big dataset quantified before and now with ms2rescore not quant #459

Open
ypriverol opened this issue Dec 7, 2024 · 15 comments

@ypriverol
Member

ypriverol commented Dec 7, 2024

Description of the bug

I ran this dataset before with the previous version of quantms (1.2), without ms2rescore and sage. Now it is not working:

The exit status of the task that caused the workflow execution to fail was: 8

Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER (MSV000085836.sdrf_openms_design)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER (MSV000085836.sdrf_openms_design)` terminated with an error exit status (8)


Command executed:

  ProteinQuantifier \
      -method 'top' \
      -in ID_mapper_merge_epi_filter_resconf.consensusXML \
      -design MSV000085836.sdrf_openms_design.tsv \
      -out MSV000085836.sdrf_openms_design_protein_openms.csv \
      -mztab MSV000085836.sdrf_openms_design_openms.mzTab \
      -peptide_out MSV000085836.sdrf_openms_design_peptide_openms.csv \
      -top:N 3 \
      -top:aggregate median \
      -top:include_all \
       \
      -ratios \
      -threads 8 \
       \
      -debug 0 \
      2>&1 | tee pro_quant.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER":
      ProteinQuantifier: $(ProteinQuantifier 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -c 1-50)
  END_VERSIONS

Command exit status:
  8

Command output:
  Quantifying peptides...
  Warning: No peptides quantified.
  /tmp/OpenMS/src/openms/source/ANALYSIS/QUANTITATION/PeptideAndProteinQuant.cpp(446): No protein matches found, cannot quantify proteins.
  Error: Unexpected internal error (No protein matches found, cannot quantify proteins.)

Command wrapper:
  Quantifying peptides...
  Warning: No peptides quantified.
  /tmp/OpenMS/src/openms/source/ANALYSIS/QUANTITATION/PeptideAndProteinQuant.cpp(446): No protein matches found, cannot quantify proteins.
  Error: Unexpected internal error (No protein matches found, cannot quantify proteins.)

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/MSV000085836/work/27/076c4aca914f0b880f6820aea85f7f

Container:
  /hps/nobackup/juan/pride/reanalysis/singularity/ghcr.io-openms-openms-tools-thirdparty-sif-latest.img

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

ID Filter only found 1 Peptide: tail -n 400 -f ID_mapper_merge_epi_filter_resconf.log

ConsensusXMLFile::store():  found 1 invalid unique ids
IDConflictResolver took 30:22 m (wall), 30:12 m (CPU), 01:35 m (system), 28:37 m (user); Peak Memory Usage: 35086 MB.

Looks like IDFilter is the issue:

tail -n 400 ID_mapper_merge_epi_idfilter.log 
Filtering by PSM score (better than 0.05)...
No 'score:peptide' threshold set. Not filtering by peptide score.
No 'score:protein' threshold set. Not filtering by protein score.
Filtering by protein group score...
Removing unreferenced protein hits...
Removing peptide hits without protein references...
Before filtering:
1 identification runs with 79890 proteins,
8175488 spectra identified with 8175488 spectrum matches.
After filtering:
1 identification runs with 0 proteins,
0 spectra identified with 0 spectrum matches.
ConsensusXMLFile::store():  found 1 invalid unique ids
IDFilter took 35:14 m (wall), 34:46 m (CPU), 02:00 m (system), 32:46 m (user); Peak Memory Usage: 42713 MB.

Here is the IDFilter command:

#!/usr/bin/env bash

set -e # Exit if a tool returns a non-zero status/exit code
set -u # Treat unset variables and parameters as an error
set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute
set -C # No clobber - prevent output redirection from overwriting files.

IDFilter \
    -in ID_mapper_merge_epi.consensusXML \
    -out ID_mapper_merge_epi_filter.consensusXML \
    -threads 22 \
    -score:proteingroup "0.01" -score:psm "0.05" -delete_unreferenced_peptide_hits \
    2>&1 | tee ID_mapper_merge_epi_idfilter.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:IDFILTER":
    IDFilter: $(IDFilter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Dec 7, 2024
@ypriverol ypriverol self-assigned this Dec 7, 2024
@ypriverol
Member Author

Here is the quantms PROTEININFERENCE command:

#!/usr/bin/env bash

set -e # Exit if a tool returns a non-zero status/exit code
set -u # Treat unset variables and parameters as an error
set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute
set -C # No clobber - prevent output redirection from overwriting files.

ProteinInference \
    -in ID_mapper_merge.consensusXML \
    -threads 16 \
    -picked_fdr true \
    -picked_decoy_string DECOY_ \
    -protein_fdr true \
    -Algorithm:use_shared_peptides true \
    -Algorithm:annotate_indistinguishable_groups true \
     \
    -Algorithm:score_aggregation_method best \
    -Algorithm:min_peptides_per_protein 1 \
    -out ID_mapper_merge_epi.consensusXML \
    -debug 0 \
    2>&1 | tee ID_mapper_merge_inference.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:PROTEININFERENCE:PROTEININFERENCER":
    ProteinInference: $(ProteinInference 2>&1 | grep -E '^Version(.*) ' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS

@jpfeuffer
Collaborator

Did you also use epifany before? And this is TMT, right? You'll probably have a lot of errors from ms2rescore. I'm convinced that it's not working correctly for TMT the way we do it right now.

@daichengxin
Collaborator

Did you use 1.2.0? Could you share the ID_mapper_merge_epi.consensusXML and other related files?

@ypriverol
Member Author

ypriverol commented Dec 8, 2024

@daichengxin
Collaborator

NoMS2Rescore:

Filtering by PSM score (better than 0.01)...
No 'score:protein' threshold set. Not filtering by protein score.
Removing decoy hits...
Filtering by protein group score...
Removing unreferenced protein hits...
Removing peptide hits without protein references...
Before filtering:
1 identification runs with 40419 proteins,
9822955 spectra identified with 9822955 spectrum matches.
After filtering:
1 identification runs with 13161 proteins,
7780873 spectra identified with 7780873 spectrum matches.
ConsensusXMLFile::store():  found 1 invalid unique ids
IDFilter took 34:12 m (wall), 29:05 m (CPU), 01:38 m (system), 27:27 m (user); Peak Memory Usage: 42328 MB.

MS2Rescore:

Filtering by PSM score (better than 0.01)...
No 'score:protein' threshold set. Not filtering by protein score.
Removing decoy hits...
Filtering by protein group score...
Removing unreferenced protein hits...
Removing peptide hits without protein references...
Before filtering:
1 identification runs with 40566 proteins,
10184014 spectra identified with 10184014 spectrum matches.
After filtering:
1 identification runs with 0 proteins,
0 spectra identified with 0 spectrum matches.
ConsensusXMLFile::store():  found 1 invalid unique ids
IDFilter took 33:37 m (wall), 29:02 m (CPU), 02:03 m (system), 26:59 m (user); Peak Memory Usage: 42670 MB.
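
The two logs above differ only in their "After filtering" counts. A minimal sketch, assuming the log layout shown above (the helper name `parse_idfilter_counts` is hypothetical and not part of OpenMS), of how those counts could be extracted so that an empty result is caught before ProteinQuantifier runs:

```python
import re

# Patterns for the count lines printed in the IDFilter logs above.
# The layout is assumed from those logs, not from an OpenMS specification.
RUNS_RE = re.compile(r"(\d+) identification runs with (\d+) proteins")
SPECTRA_RE = re.compile(r"(\d+) spectra identified with (\d+) spectrum matches")


def parse_idfilter_counts(log_text):
    """Return {'before': {...}, 'after': {...}} with protein and PSM counts."""
    counts = {}
    section = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Before filtering"):
            section = "before"
            counts[section] = {}
        elif line.startswith("After filtering"):
            section = "after"
            counts[section] = {}
        elif section is not None:
            runs = RUNS_RE.search(line)
            if runs:
                counts[section]["proteins"] = int(runs.group(2))
            spectra = SPECTRA_RE.search(line)
            if spectra:
                counts[section]["psms"] = int(spectra.group(2))
    return counts


# Excerpt copied from the MS2Rescore log above.
example_log = """\
Before filtering:
1 identification runs with 40566 proteins,
10184014 spectra identified with 10184014 spectrum matches.
After filtering:
1 identification runs with 0 proteins,
0 spectra identified with 0 spectrum matches.
"""

counts = parse_idfilter_counts(example_log)
if counts["after"]["proteins"] == 0:
    print("IDFilter removed every protein; nothing is left to quantify.")
```

Running the same function on the NoMS2Rescore log would give the retained fraction directly (13161 of 40419 proteins).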

@ypriverol
Member Author

Is this with the latest version of OpenMS? Can you make both files and the commands used to run IDFilter available?

@timosachsenberg

Is the PSM score still the q-value, or something else (e.g., a p-value), after ms2rescore?

@jpfeuffer
Collaborator

Off topic:

30 minutes for filtering? It's really time for a new file format.

@ypriverol
Member Author

We have one student here working on this.

@daichengxin
Collaborator

Yes, dev branch. The PSM score is still a q-value. The protein score is also a q-value.
[Two images]

I also plotted the protein q-value without ms2rescore (orange line) and with ms2rescore (blue line).
With ms2rescore, all target PSMs are removed at a protein q-value of 0.01.
Without ms2rescore, almost all target PSMs remain at a protein q-value of 0.01.
[Image: NoMS2Rescore]

@ypriverol
Member Author

Can you plot the PSM q-value distribution?

@daichengxin
Collaborator

daichengxin commented Dec 10, 2024

PSM q-value distribution from the large-scale datasets. It looks normal.

[Image: psm_MS2Rescore]
[Image: psm_NoMS2Rescore]

@ypriverol
Member Author

ypriverol commented Dec 10, 2024

@daichengxin, @timosachsenberg has pointed out in #459 (comment) that the scores don't look quite right. Can you plot the protein scores without the smoothing?

How many proteins do we have with 0 in the q-value?
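
A minimal sketch of how both questions could be answered from a plain list of protein q-values: count the entries at or below a few cutoffs (including exactly 0) without any smoothing. The function name `count_at_cutoffs` and the example values are hypothetical and not taken from the dataset.

```python
from bisect import bisect_right


def count_at_cutoffs(values, cutoffs=(0.0, 0.01, 0.05, 1.0)):
    """Return {cutoff: number of values <= cutoff}, using raw counts only."""
    ordered = sorted(values)
    return {cutoff: bisect_right(ordered, cutoff) for cutoff in cutoffs}


# Hypothetical protein q-values, for illustration only.
example_q = [0.0, 0.0, 0.0, 0.004, 0.012, 0.03, 0.2, 1.0]
summary = count_at_cutoffs(example_q)
print(summary)
```

The count at cutoff 0.0 is the number of proteins with a q-value of exactly 0, and the remaining counts give the unsmoothed cumulative curve.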

@timosachsenberg

Can you also plot the target/decoy curves and how they change before and after ms2rescore? To me it looks like some peptide decoys got promoted to a good score, which results in bad protein q-values.
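
The hypothesis above can be illustrated with a toy target-decoy calculation. This is a simplified sketch, assuming a plain decoys/targets FDR estimate with higher scores being better. It is not the picked-FDR procedure that ProteinInference applies with `-picked_fdr true`, and all scores below are made up.

```python
def q_values(scores, is_decoy):
    """Simple target-decoy q-values (higher score = better), aligned with the input."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this rank or at any more permissive rank.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def targets_passing(scores, is_decoy, cutoff=0.01):
    """Number of target entries with q-value <= cutoff."""
    q = q_values(scores, is_decoy)
    return sum(1 for qi, decoy in zip(q, is_decoy) if not decoy and qi <= cutoff)


# Made-up scores: 200 targets, scored 1000 down to 801.
target_scores = [1000 - i for i in range(200)]

# Case 1: two decoys that score far below every target.
clean_scores = target_scores + [10, 5]
clean_decoy = [False] * 200 + [True] * 2

# Case 2: five decoys "promoted" above every target.
promoted_scores = target_scores + [2000 + i for i in range(5)]
promoted_decoy = [False] * 200 + [True] * 5

clean_pass = targets_passing(clean_scores, clean_decoy)
promoted_pass = targets_passing(promoted_scores, promoted_decoy)
print(clean_pass, promoted_pass)  # prints "200 0"
```

In the second case the best achievable q-value is 5/200 = 0.025, so a 0.01 cutoff removes every target. That is the same all-or-nothing pattern as in the IDFilter logs, although this toy example does not show that this is what happened in the real data.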

@ypriverol
Member Author

OK, it looks like the problem is not ms2rescore. Here is my recent benchmark using the same dataset but without ms2rescore:

nextflow run /hps/nobackup/juan/pride/reanalysis/quantms/main.nf \
    -profile pride_slurm,dev \
    --input MSV000085836.sdrf.tsv \
    --search_engines comet,sage,msgf \
    --root_folder /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/MSV000085836/ \
    --local_input_type raw \
    --outdir /hps/nobackup/juan/pride/reanalysis/absolute-expression/ae-entrapment/cell-lines/MSV000085836 \
    --database /hps/nobackup/juan/pride/reanalysis/multiomics-configs/databases/Homo-sapiens-uniprot-reviewed-contam-entrap-decoy-20241105.fasta \
    --protein_level_fdr_cutoff 0.01 \
    --posterior_probabilities percolator \
    --psm_level_fdr_cutoff 0.05 \
    --protocol TMT \
    --quantify_decoys true \
    --sage_processes 100 \
    --skip_post_msstats true \
    --enable_pmultiqc false \
    -resume \
    -with-tower

Here are the results from ProteinQuantifier:

The exit status of the task that caused the workflow execution to fail was: 8

Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER (MSV000085836.sdrf_openms_design)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER (MSV000085836.sdrf_openms_design)` terminated with an error exit status (8)


Command executed:

  ProteinQuantifier \
      -method 'top' \
      -in ID_mapper_merge_epi_filter_resconf.consensusXML \
      -design MSV000085836.sdrf_openms_design.tsv \
      -out MSV000085836.sdrf_openms_design_protein_openms.csv \
      -mztab MSV000085836.sdrf_openms_design_openms.mzTab \
      -peptide_out MSV000085836.sdrf_openms_design_peptide_openms.csv \
      -top:N 3 \
      -top:aggregate median \
      -top:include_all \
       \
      -ratios \
      -threads 8 \
       \
      -debug 0 \
      2>&1 | tee pro_quant.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:TMT:PROTEINQUANT:PROTEINQUANTIFIER":
      ProteinQuantifier: $(ProteinQuantifier 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -c 1-50)
  END_VERSIONS

Command exit status:
  8

Command output:
  Quantifying peptides...
  Warning: No peptides quantified.
  /tmp/OpenMS/src/openms/source/ANALYSIS/QUANTITATION/PeptideAndProteinQuant.cpp(446): No protein matches found, cannot quantify proteins.
  Error: Unexpected internal error (No protein matches found, cannot quantify proteins.)

Command wrapper:
  Quantifying peptides...
  Warning: No peptides quantified.
  /tmp/OpenMS/src/openms/source/ANALYSIS/QUANTITATION/PeptideAndProteinQuant.cpp(446): No protein matches found, cannot quantify proteins.
  Error: Unexpected internal error (No protein matches found, cannot quantify proteins.)

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/MSV000085836/work/10/f70f443e848d1982b23d4d7895fe80

Container:
  /hps/nobackup/juan/pride/reanalysis/singularity/ghcr.io-openms-openms-tools-thirdparty-sif-latest.img

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Here is the ProteinInference data @timosachsenberg @jpfeuffer:

https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/8711e85cea3f3da8ec117ec7477be9/

Here is the folder for IDFilter @timosachsenberg @jpfeuffer:

https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/298bfca6ce4a4c7fd531b62d62f168/
