
proteomicsLFQ with new SVM results in UPS1 dataset #301

Closed
ypriverol opened this issue Oct 6, 2023 · 23 comments
Labels: documentation, enhancement, high-priority

Comments

@ypriverol
Member

ypriverol commented Oct 6, 2023

Description of feature

@timosachsenberg @jpfeuffer @daichengxin we have also run it with the parameters:

feature_with_id_min_score = 0.25
feature_without_id_min_score = 0.75

Dataset PXD001819, Files: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD001819NEWSAGE/

Results Table:

[image: results table]

Previous UPS detected:

[image: previously detected UPS proteins]

Current UPS detected:

[image: currently detected UPS proteins]

This issue needs a discussion about the default parameters for MBR. It is related to #287.

@ypriverol added the enhancement label Oct 6, 2023
@ypriverol added the high-priority, release 1.3, and documentation labels Oct 6, 2023
@ypriverol
Member Author

@daichengxin can you provide a similar plot to the last one for MaxQuant?

@timosachsenberg

I think the numbers don't tell you that much here. Can we somehow see how well the quantities match? E.g., if we only picked up noise in the old version, the new one would be better (and the other way around).

@ypriverol
Member Author

ypriverol commented Oct 6, 2023

You are right; that is why I asked for the corresponding plot from MQ to match the last one. Our first version of the plot had a lot of noise quantities, and we have solved that, but we now have to find the right values for both thresholds. It looks like the current 0.25 and 0.75 are too stringent, judging by the results in #287.

I'm now running both datasets with 0.10 and 0.90.

@jpfeuffer
Collaborator

jpfeuffer commented Oct 6, 2023

I agree. Can we maybe do an automated evaluation? We could even add a small step in Nextflow and then launch the process multiple times with different parameters.
How long is the runtime with -resume in the last step?
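
A minimal sketch of such a sweep, assuming the two thresholds are exposed as pipeline parameters under the names used above (the pipeline entry point, input file, and flag names are assumptions and may differ):

```python
import itertools
import subprocess

# Hypothetical sweep over the two feature-linking score thresholds.
# The pipeline entry point, input file, and flag names are assumptions.
with_id = [0.10, 0.25, 0.50]
without_id = [0.50, 0.75, 0.90]

for w, wo in itertools.product(with_id, without_id):
    cmd = [
        "nextflow", "run", "bigbio/quantms", "-resume",
        "--input", "PXD001819.sdrf.tsv",
        "--feature_with_id_min_score", str(w),
        "--feature_without_id_min_score", str(wo),
        "--outdir", f"results_w{w}_wo{wo}",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

With -resume, the upstream search and rescoring steps should be reused from the Nextflow cache, so each combination should mostly re-run the quantification step and anything downstream.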

@jpfeuffer
Collaborator

jpfeuffer commented Oct 6, 2023

But from my memory this actually looks a bit like the MQ results. 2500 amol was the turning point.

Except for the lowest concentration, of course. In theory it does not make sense to find more features at lower concentrations if you did not even find them at higher ones.
Unless the higher number of MS2 IDs at the higher concentrations, and consequently more extracted features, leads to increased pressure to link untargeted features in the lowest concentration. And the more higher concentrations (relative to the concentration you are currently looking at) you have, the more features you try to link, so the lowest concentration gets more "chances". Does this make sense?
I think in this case an FDR approach, rather than the same probability cutoff for every run, would bring additional value.
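
For illustration only, a minimal sketch of what a per-run cutoff derived from the SVM linking probabilities could look like, using a posterior-based FDR estimate (the estimator and names are assumptions, not the pipeline's implementation):

```python
import numpy as np

def score_cutoff_at_fdr(link_probabilities, target_fdr=0.05):
    """Sketch of a per-run cutoff from posterior linking probabilities.

    Sort features by probability (best first) and use the running mean of
    (1 - p) as an estimate of the FDR in the accepted set; keep everything up
    to the last position where that estimate stays below the target.
    """
    p = np.sort(np.asarray(link_probabilities, dtype=float))[::-1]
    est_fdr = np.cumsum(1.0 - p) / np.arange(1, p.size + 1)
    accepted = np.flatnonzero(est_fdr <= target_fdr)
    return p[accepted[-1]] if accepted.size else 1.0

# A run dominated by low-probability links gets a stricter cutoff than a fixed
# 0.75 threshold would impose, while a cleaner run can keep more features.
print(score_cutoff_at_fdr([0.99, 0.97, 0.90, 0.60, 0.40], target_fdr=0.05))
```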

I also agree with @timosachsenberg that we should make plots that look at the relative differences between the concentrations, instead of just the number of found features/proteins.
E.g., check the number of significantly deregulated proteins between samples (a rough sketch follows below).
(Although the higher number of found features in the lowest concentration still seems a bit odd to me.)
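
A rough sketch of that check on an MSstats-style groupComparison table (the file name is a placeholder; the column names Protein, Label, log2FC, and adj.pvalue follow MSstats conventions):

```python
import pandas as pd

# Count significantly deregulated UPS proteins per pairwise comparison from an
# MSstats-style groupComparison table. The file name is a placeholder.
msstats = pd.read_csv("msstats_group_comparison.csv")
ups = msstats[msstats["Protein"].astype(str).str.contains("UPS", na=False)]
significant = ups[(ups["adj.pvalue"] < 0.05) & (ups["log2FC"].abs() > 1)]
print(significant.groupby("Label").size().sort_index())
```

The significance and fold-change cutoffs (0.05 and |log2FC| > 1) are arbitrary choices for the sketch.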

@daichengxin
Collaborator

daichengxin commented Oct 7, 2023

MaxQuant results provided by the original paper:
PalombaA_pubmed_34038140_2.xlsx

edit: replaced P1-P9 with concentrations. @jpfeuffer

@daichengxin
Collaborator

daichengxin commented Oct 7, 2023

The same plot given by the authors: https://europepmc.org/articles/PMC8280745/figure/fig2/. 2500 amol was the turning point. And our results look better than MQ at the low concentrations?


@jpfeuffer
Collaborator

Can you replace P1-P9 with something reasonable that includes the amol concentrations in the headers of the MQ xlsx?

@ypriverol
Member Author

Test results. Results folder: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD001819NEWSAGE/proteomicslfq/

[images: result plots]

@timosachsenberg @jpfeuffer for the 0.10 and 0.75 run I used an intensity threshold of 1000. It actually looks much better than 0.10 and 0.90 with an intensity threshold of 10,000. What do you think?

@jpfeuffer
Collaborator

Yes, it looks better IMO. But we should really check the expected fold changes.
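
One way to make that concrete: compare the estimated log2 fold changes of the UPS proteins against the ratios expected from the spiked amounts. A sketch assuming an MSstats-style groupComparison table; the file name, condition labels, and spike-in amounts are placeholders and must be taken from the PXD001819 design:

```python
import numpy as np
import pandas as pd

# Compare estimated vs. expected UPS1 log2 fold changes. The spike-in amounts,
# condition labels, and file name below are placeholders.
spike_amol = {"cond1": 50000, "cond2": 12500, "cond3": 2500, "cond4": 500}

msstats = pd.read_csv("msstats_group_comparison.csv")
ups = msstats[msstats["Protein"].astype(str).str.contains("UPS", na=False)].copy()

def expected_log2fc(label):
    # MSstats encodes pairwise comparisons as "groupA-groupB".
    a, b = label.split("-")
    return np.log2(spike_amol[a] / spike_amol[b])

ups["expected_log2FC"] = ups["Label"].map(expected_log2fc)
ups["error"] = ups["log2FC"] - ups["expected_log2FC"]
print(ups.groupby("Label")["error"].agg(["mean", "std", "count"]))
```

The per-comparison mean and standard deviation of the error then show directly where the quants drift, e.g. for the comparisons at or below 500 amol.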

@jpfeuffer
Collaborator

upsReval.zip

Here is my old script. At the beginning it includes some fixes for MSstats loading and plotting that are probably not necessary anymore.
I am not sure when I will get to it. Maybe Friday.

@ypriverol
Member Author

I will try to make it run today and let you know. Do you have the original figures from MQ?

@jpfeuffer
Collaborator

It can read MQ files and PD files, too. I use the results from https://github.com/wombat-p

@ypriverol
Member Author

Can you provide me a direct link to those PD and MQ outputs from wombat-p? I'm now configuring your script.

@ypriverol
Member Author

Here are the results of your script with this data, @jpfeuffer:

[image: fold-change plot produced by the script]

@jpfeuffer
Collaborator

Looks good, but I don't see a large improvement in fold changes for comparisons with <=500 amol.

@ypriverol
Member Author

ypriverol commented Oct 11, 2023

We are fine with that. I have the feeling we have reduced the false-positive signals in MBR a lot, and that is the major advantage. But as you said, nothing has changed in terms of feature detection itself at the low concentrations.

@jpfeuffer
Collaborator

jpfeuffer commented Oct 11, 2023

I agree that it might be enough, since the fold changes did not get worse and the number of found features looks a bit better. I think MQ identified far fewer in the lower concentrations (which does not mean much, since they might actually be close to unquantifiable and, looking at the quants, should maybe stay unreported). We might actually be able to set a higher minimum threshold for unidentified features.
I am not sure if my MQ files are the best to use (old version, and I don't know the settings used), but I can send them tomorrow.

It would be great if we could have some debug output of all traces of all features across a consensusFeature, or of the extracted areas.
@timosachsenberg we could also check the interpolation of quantities for traces that could not be fit (in the OpenSwath algorithms). I think you fixed something there, but maybe this approach is in general not so great for our purpose?
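
As a starting point for that kind of inspection, a sketch that dumps the per-run intensities of each consensus feature from the consensusXML output (if written), assuming pyopenms is available; this only shows the linked sub-features, not the underlying mass traces:

```python
import pyopenms as oms

# Inspect per-run intensities of each consensus feature from the consensusXML
# produced by the quantification step. The file path is a placeholder.
cmap = oms.ConsensusMap()
oms.ConsensusXMLFile().load("out.consensusXML", cmap)

headers = cmap.getColumnHeaders()  # map index -> input file description

for cf in cmap:
    handles = cf.getFeatureList()  # one handle per run in which the feature was linked
    print(f"consensus mz={cf.getMZ():.4f} rt={cf.getRT():.1f} runs={len(handles)}")
    for h in handles:
        run = headers[h.getMapIndex()].filename
        print(f"  map {h.getMapIndex()} ({run}): intensity={h.getIntensity():.0f} rt={h.getRT():.1f}")
```

Consensus features that carry near-identical intensities across very different spike-in amounts would be the first candidates to look at.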

@ypriverol
Member Author

I will close this issue in favor of #303. Please move the discussions about future improvements to MBR in LFQ to that issue.

@jpfeuffer
Collaborator

jpfeuffer commented Oct 15, 2023

@ypriverol @timosachsenberg This was my old plot for MQ. I don't remember the version. But you can clearly see that when MQ finds a feature, it is usually correct in the relative quants. We have more proteins basically everywhere, but our quants are very off in the low concentrations. My interpretation is a significant overestimation of the quants of features at lower concentrations. Basically, from 2500 amol downwards there is no difference in quants for a linked feature anymore.
FDR might help, but maybe a more sensitive quantification is also necessary for those low-concentration features (since, as you can see, MQ is able to recover 3-4 proteins with sometimes close-to-correct relative quants).
[image: MQ_final - MaxQuant relative quantification plot]

@timosachsenberg

@jpfeuffer do you think the interpolation you mentioned could be an issue? It could be easy to check...
