Issue with decoy prefix when trying to get protein level confidence. #114

Open
louisebuur opened this issue Nov 9, 2023 · 2 comments

louisebuur commented Nov 9, 2023

Hi!

I am having an issue with getting protein level confidence.

I have an MS Amanda output file that I read with the psm_utils package and then convert to a LinearPsmDataset.

This is the file that I am using:
https://drive.google.com/file/d/1PiztK5BY4byAR2Loup6kTf7j9Q4QMgGQ/view?usp=drive_link

from ms2rescore.rescoring_engines.mokapot import convert_psm_list
from psm_utils.io import read_file
import mokapot
psm_list = read_file("path_to_file.csv", filetype="msamanda")
mokapot_psms = convert_psm_list(psm_list)

I used the make_decoys function to add decoy sequences to my FASTA file

mokapot.make_decoys(fasta="path_to_file.fasta", decoy_prefix="REV_", reverse=True, out_file="E:/test.fasta")

Then I call the add_proteins function with digest parameters that match the ones I used in the search:

mokapot_psms.add_proteins("E:/test.fasta", enzyme="[KR]", missed_cleavages=2, min_length=6, max_length=60)
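
For readers checking the digest settings independently of mokapot, the digest these parameters describe can be sketched roughly as below. This is a minimal sketch and not mokapot's own implementation: `digest` is a hypothetical helper that assumes cleavage right after every match of the enzyme regex, with no proline rule and no N-terminal methionine clipping.

```python
import re


def digest(sequence, enzyme="[KR]", missed_cleavages=2, min_length=6, max_length=60):
    """Return the set of peptides from a simple in-silico digest (hypothetical helper)."""
    # Cleavage sites sit just after each residue matched by the enzyme regex.
    sites = [0] + [m.end() for m in re.finditer(enzyme, sequence)]
    if sites[-1] != len(sequence):
        sites.append(len(sequence))

    peptides = set()
    for i, start in enumerate(sites[:-1]):
        # Span up to `missed_cleavages` internal cleavage sites per peptide.
        last = min(i + missed_cleavages + 2, len(sites))
        for j in range(i + 1, last):
            peptide = sequence[start:sites[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides
```

Digesting one protein from the FASTA file and testing whether a reported peptide is in the returned set gives a quick check that is separate from the decoy prefix handling.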

Then I want to assign confidence and print the results

confidence_result = mokapot_psms.assign_confidence()
print(confidence_result.accepted)

However, I get this error:
25362 out of 46118 peptides could not be mapped. Please check your digest settings.
ValueError: Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.

I realized that I hadn't passed the decoy_prefix, so I added it:

mokapot_psms.add_proteins("E:/test.fasta", enzyme="[KR]", missed_cleavages=2, min_length=6, max_length=60, decoy_prefix="REV_")

Then I ran this again:

confidence_result = mokapot_psms.assign_confidence()
print(confidence_result.accepted)

And then I get this error:
46118 out of 46118 peptides could not be mapped. Please check your digest settings.
ValueError: Fewer than 90% of all peptides could be matched to proteins. Please verify that your digest settings are correct.

I double-checked that the digest settings are correct. About half of the peptides (20756 of 46118) can be mapped when I do not specify the decoy prefix in the add_proteins function, and none can be mapped when I do.
So, to me it seems like there is an issue with the decoy prefix pattern.

I also tried the default decoy prefix decoy_ when creating the FASTA file with the make_decoys function, and changed the decoy prefix in the input file accordingly. I still got the same errors.
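
One way to narrow this down is to confirm how the prefix actually appears in the FASTA headers. The following is a minimal sketch under stated assumptions: `count_decoy_headers` is a hypothetical helper, not part of mokapot, and it accepts the prefix either at the start of the accession or after a `|` separator because it is not known which convention the file uses.

```python
def count_decoy_headers(fasta_lines, decoy_prefix="REV_"):
    """Count (targets, decoys) among FASTA header lines (hypothetical helper)."""
    targets, decoys = 0, 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # The accession is the first whitespace-delimited token after ">".
        fields = line[1:].split()
        accession = fields[0] if fields else ""
        # Accept the prefix at the start or after any "|" separator.
        if any(part.startswith(decoy_prefix) for part in accession.split("|")):
            decoys += 1
        else:
            targets += 1
    return targets, decoys
```

Passing the open FASTA file (any iterable of lines works) should return roughly equal target and decoy counts if the decoys were written with the expected prefix.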

I am using mokapot version 0.10.0.

Can you give me a hint of what may be wrong here?
And please let me know if I should provide further information.

Thanks in advance!

Owner

wfondrie commented Nov 9, 2023

This indeed sounds like a problem! I've just requested access to the files so I can take a look. My guess is that perhaps the peptide strings are formatted in a way that isn't accounted for in mokapot, but I'll have to take a closer look.
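
To illustrate the kind of mismatch this guess refers to: if peptide strings carry modification annotations or flanking residues, a literal comparison against digested FASTA sequences would fail for every peptide. The sketch below is purely illustrative. `bare_sequence` is a hypothetical helper, and the formats it handles (bracketed or parenthesised modifications such as `PEPT[+79.97]IDE`, and `K.PEPTIDE.R` flanks) are assumptions, not confirmed to be what this file contains.

```python
import re


def bare_sequence(peptide):
    """Reduce an annotated peptide string to plain residues (hypothetical helper)."""
    # Drop flanking residues written in the "X.SEQUENCE.X" convention.
    match = re.fullmatch(r"[A-Z-]\.(.+)\.[A-Z-]", peptide)
    if match:
        peptide = match.group(1)
    # Remove bracketed or parenthesised modification annotations.
    peptide = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)
    # Keep only uppercase letters; this drops "-", digits, and charge suffixes.
    return "".join(ch for ch in peptide if ch.isupper())
```

Comparing `bare_sequence(p)` against the digested sequences, instead of the raw string, would show whether formatting alone explains the unmapped peptides.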

Would it be alright to post examples from the pin file here as future documentation for the issue?

Author

louisebuur commented Nov 9, 2023

Hi Will,
I should have given you access to the file now.
And yes of course, feel free to post examples :)

The file I analysed with MS Amanda is not mine, but obtained from this data set

Thanks!
