Single cell analysis performance with worse results #287
Comments
@daichengxin can you add a description of how the data was analyzed with MaxQuant in the original paper?
Thank you for the analysis. Very interesting. Could we run the analysis with looser FDR cutoffs to see whether this is an identification problem or a quantification problem?
@jpfeuffer a couple of ideas: for the single-cell runs, it looks to me like MBR is doing most of the work. If you look, we identify most of the peptides in the 20 and 100 samples (the complex samples), more than MQ does. Those samples are used as "channels" to look for weak signals in the single-cell files. My point is the following:
Do you think the 20 min window that they use for MBR in MQ could account for the difference from our results? Do you think we need to change something in the experimental design? To enable MBR, we are annotating every sample as a biological replicate.
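To make the window question concrete, here is a minimal, generic sketch of retention-time-window matching after alignment: a peptide ID from a donor run is transferred to an unidentified feature in an acceptor run when the charge agrees, the m/z is within a ppm tolerance, and the aligned RT is within the window. It is not MaxQuant's or ProteomicsLFQ's implementation; the function and class names, the ppm tolerance, and the closest-RT tie-break are assumptions for illustration. MaxQuant exposes separate match and alignment time windows, so it is worth checking which of the two the 20 min refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float      # precursor m/z
    rt: float      # retention time in minutes, already aligned to a common scale
    charge: int


def match_between_runs(donor_ids, acceptor_features, rt_window_min, mz_tol_ppm):
    """Hypothetical MBR sketch: transfer donor IDs to unidentified acceptor features.

    donor_ids maps a peptide sequence to the Feature at which it was identified.
    Returns {sequence: matched acceptor Feature} for every successful transfer.
    """
    transferred = {}
    for sequence, donor in donor_ids.items():
        best = None
        for feat in acceptor_features:
            if feat.charge != donor.charge:
                continue
            ppm = abs(feat.mz - donor.mz) / donor.mz * 1e6
            drt = abs(feat.rt - donor.rt)
            if ppm <= mz_tol_ppm and drt <= rt_window_min:
                # among valid candidates, keep the one closest in retention time
                if best is None or drt < abs(best.rt - donor.rt):
                    best = feat
        if best is not None:
            transferred[sequence] = best
    return transferred


donor = {"PEPTIDEK": Feature(500.0, 30.0, 2)}
acceptor = [Feature(500.0, 40.0, 2)]
print(match_between_runs(donor, acceptor, 0.7, 10.0))   # narrow window
print(match_between_runs(donor, acceptor, 20.0, 10.0))  # wide window
```

With a narrow window the feature 10 min away is not matched; with a 20 min window it is. Wider windows transfer more IDs, which is also why they arguably need some error control for the transferred quantifications.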
Yes. If we find out that it is not the identification performance, we will check quantification.
@daichengxin can we check in the MQ output of the original project how many peptides from the single-cell samples are quantified/identified using MBR? I think MQ uses a notation like
This is what I thought, see @jpfeuffer: the majority of peptides come not from IDs but from MBR, so we should look at why we fail to do proper MBR on our side (see my previous two comments).
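One way to quantify this on the MaxQuant side is to split each run's peptides into those with MS/MS evidence and those supported only by matching. The sketch below assumes MaxQuant's evidence.txt with its Raw file, Sequence, and Type columns, where match-between-runs transfers carry the Type MULTI-MATCH; the helper names are hypothetical and the counts are per peptide sequence, ignoring charge and modification state.

```python
import csv
from collections import defaultdict


def read_evidence(handle):
    """Read a tab-separated MaxQuant evidence.txt from an open file handle."""
    return list(csv.DictReader(handle, delimiter="\t"))


def mbr_only_peptides_per_run(evidence_rows):
    """Return {raw_file: (n_msms_peptides, n_mbr_only_peptides)}.

    A sequence counts as MBR-only in a run when every evidence row for it
    in that run has Type == "MULTI-MATCH".
    """
    msms = defaultdict(set)
    matched = defaultdict(set)
    for row in evidence_rows:
        target = matched if row["Type"] == "MULTI-MATCH" else msms
        target[row["Raw file"]].add(row["Sequence"])
    runs = sorted(set(msms) | set(matched))
    return {run: (len(msms[run]), len(matched[run] - msms[run])) for run in runs}
```

Running this over the single-cell raw files would show how much of MQ's per-run peptide count comes from matching alone.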
Still... if they don't pass FDR, there is nothing to quantify. So we have to wait for the ID analysis.
Can you please add the command you used for quantms?
I don't know. I can't see any signs of transfer in the ProteomicsLFQ logs.
The command with the relaxed FDR is:
Here are the results: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD016921-MBR/
I found the plfq (ProteomicsLFQ) command. It looks correct.
@jpfeuffer @ypriverol I ran the data with a more relaxed FDR of 0.15 at the protein and PSM levels. However, the number of quantified peptides did not increase significantly. Similarly, almost all of the peptides quantified only in MQ come from MBR. In quantms, these peptides are likewise quantified only in other files, not in the same file.
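Why relaxing the cutoff from 0.01 to 0.15 can change little is easier to see with a target-decoy estimate. This is a minimal, generic sketch (FDR estimated as decoys/targets above a score threshold, then converted to q-values); it is not quantms's actual FDR computation, which involves the search engines, rescoring, and OpenMS tools, and the scores below are made up.

```python
def target_decoy_qvalues(psms):
    """q-values for a list of (score, is_decoy) pairs; higher scores are better.

    FDR at each rank is decoys / targets among PSMs at or above it; q-values
    are the running minimum of that FDR taken from the bottom of the ranking.
    Tied scores are not grouped, which is a simplification.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


def accepted_targets(psms, cutoff):
    """Number of target PSMs with a q-value at or below the FDR cutoff."""
    q = target_decoy_qvalues(psms)
    return sum(1 for (_, is_decoy), qv in zip(psms, q) if not is_decoy and qv <= cutoff)


psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True), (4, False)]
for cutoff in (0.01, 0.15, 0.3):
    print(cutoff, accepted_targets(psms, cutoff))
```

In this toy ranking, 0.01 and 0.15 accept the same two targets because the next q-value jumps to 0.25; only a much looser cutoff adds more.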
I think that before we loosen the percentage of aligned samples required to trigger requantification, we should first think about an FDR approach for transferred quantifications.
You can use the new container tomorrow to play around with a parameter that controls this proportion. |
Can you let me know which parameter we will need to use?
id_transfer_threshold
What is the default value?
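The exact semantics of id_transfer_threshold are not spelled out in this thread. The sketch below assumes it is the minimum fraction of aligned runs in which a peptide must be identified before its ID is transferred to the runs where it was not; the function name is hypothetical and this is not ProteomicsLFQ's code.

```python
def peptides_to_transfer(ids_per_run, id_transfer_threshold):
    """Hypothetical sketch of a run-fraction threshold for ID transfer.

    ids_per_run maps each run name to the set of peptides identified there.
    Returns {peptide: sorted runs that would receive a transferred ID} for
    peptides identified in at least id_transfer_threshold of all runs.
    """
    runs = sorted(ids_per_run)
    all_peptides = set().union(*ids_per_run.values())
    result = {}
    for peptide in sorted(all_peptides):
        found_in = [run for run in runs if peptide in ids_per_run[run]]
        if len(found_in) / len(runs) >= id_transfer_threshold:
            missing = [run for run in runs if peptide not in ids_per_run[run]]
            if missing:
                result[peptide] = missing
    return result


ids = {"bulk_100": {"A", "B"}, "bulk_20": {"A", "B"}, "sc_01": {"A"}, "sc_02": set()}
for threshold in (0.5, 0.75, 0.9):
    print(threshold, peptides_to_transfer(ids, threshold))
```

Under this reading, lowering the threshold increases the number of transferred IDs into the single-cell runs, which is exactly where the concern about an FDR for transferred quantifications comes in.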
like ?
These are my values after filtering out modified ones:
I think he means unique as in "unique for a protein". But IIRC we tried to export only unique-to-group peptides to MSstats, so Timo's approach might be correct.
You need to check what MQ exports and how it defines uniqueness if you want to compare by number of identifications. You can check by picking some of the group-unique peptides and seeing how MQ defines them.
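For a like-for-like count on the MaxQuant side, a minimal sketch over peptides.txt rows (for example from csv.DictReader with a tab delimiter). It assumes the columns Unique (Groups), Reverse, Potential contaminant, and one Intensity column per sample; check these names against the actual file header, since they can differ between MaxQuant versions. The function name is hypothetical.

```python
def count_group_unique_peptides(peptide_rows, intensity_columns):
    """Count, per sample, peptides that are unique to a protein group.

    Keeps rows with Unique (Groups) == "yes", drops decoys (Reverse == "+")
    and contaminants (Potential contaminant == "+"), and counts a peptide for
    a sample when that sample's intensity is greater than zero.
    """
    counts = {column: 0 for column in intensity_columns}
    for row in peptide_rows:
        if row.get("Unique (Groups)") != "yes":
            continue
        if row.get("Reverse") == "+" or row.get("Potential contaminant") == "+":
            continue
        for column in intensity_columns:
            value = row.get(column) or "0"  # treat empty cells as not quantified
            if float(value) > 0:
                counts[column] += 1
    return counts
```

Switching the first filter to Unique (Proteins) shows how much the choice of uniqueness definition alone moves the numbers.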
Here is my script. I also tried to run @timosachsenberg's command but got different values from his.
What does your MQ script look like?
Results for 0.1 & 0.9. Results folder: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD016921-MBR-newSVM/
I think that, to get more IDs, you should have used 0.1 and something less than 0.75, e.g. 0.5.
Why did you comment out the filter for Unique (groups)?
Filtered results (unique for protein groups) @jpfeuffer
OK, can we now compare it with different quantms cutoffs? Especially something lower than 0.9 for unidentified features. And without
I don't think we need to compare protein groups for now. There will (almost) always be fewer of them if there are fewer peptides. I'd rather see multiple cutoffs in the table.
This is interesting. I used the old ID files that Yasset provided in the original analysis and ran them through the ProteomicsLFQ tool. I got numbers much closer to MQ. I am currently traveling but would like to dig a bit more into the exact parameters used.
Thank you very much for the quick table. Surprisingly, there are very few changes. This must mean that all unidentified features have scores either below 0.6 or above 0.9.
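The inference above can be checked directly on the feature scores: if no unidentified feature scores in [0.6, 0.9), every cutoff in that range keeps exactly the same features. A minimal sketch with made-up scores; it assumes a feature is kept when its score is greater than or equal to the cutoff, which may not match the exact comparison used in ProteomicsLFQ, and the function names are hypothetical.

```python
from bisect import bisect_left


def retained_per_cutoff(scores, cutoffs):
    """Number of features kept (score >= cutoff) for each cutoff."""
    ordered = sorted(scores)
    # bisect_left returns how many scores are strictly below the cutoff
    return {c: len(ordered) - bisect_left(ordered, c) for c in cutoffs}


def scores_in_band(scores, low, high):
    """Scores with low <= s < high.

    If this is empty, every cutoff between low and high keeps exactly the
    same features, so the result tables for those cutoffs cannot differ.
    """
    return sorted(s for s in scores if low <= s < high)


scores = [0.05, 0.2, 0.45, 0.55, 0.92, 0.95, 0.99]
print(retained_per_cutoff(scores, [0.5, 0.6, 0.75, 0.9]))
print(scores_in_band(scores, 0.6, 0.9))
```

A histogram of the real scores would answer the same question for the whole dataset at once.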
@timosachsenberg the exact parameters are always visible in the pipeline_info folder.
That's the command I used. The FDR settings seem to be the same. Do you see anything I missed?
No, it looks the same as the one in the 0.25_0.75 directory.
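Comparing long command lines by eye is error-prone. A minimal sketch that parses simple `--name value` and `-name value` pairs with the standard shlex module and reports only the options that differ; the helper names are hypothetical, and values that start with "-" (such as negative numbers) would be misread as options.

```python
import shlex


def parse_params(command):
    """Collect '--name value' / '-name value' options from a command line.

    An option followed by another option (or by nothing) is stored as True,
    so bare flags such as -resume are kept. Positional tokens are ignored.
    """
    tokens = shlex.split(command)
    params = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("-"):
                params[token] = nxt
                i += 2
                continue
            params[token] = True
        i += 1
    return params


def diff_params(cmd_a, cmd_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    a, b = parse_params(cmd_a), parse_params(cmd_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


run_a = "nextflow run main.nf --psm_level_fdr_cutoff 0.01 --transfer_ids mean -resume"
run_b = "nextflow run main.nf --psm_level_fdr_cutoff 0.15 --transfer_ids mean"
print(diff_params(run_a, run_b))
```

Feeding it the commands recorded in each run's pipeline_info folder would list every differing parameter, not only the FDR settings.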
Maybe the idXMLs changed in the meantime? Changes in Sage or the Percolator adapter? No idea.
I can confirm that these outputs match the ones Dai reported. I will now check whether -Seeding:intThreshold 1000.0 makes such a difference.
It looks like we are using the default parameter for
Yep, the main reason seems to be the intensity threshold.
Dai, would it be possible to generate the results for different thresholds using the new parameter?
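Why a fixed intensity threshold hits single-cell runs harder can be sketched with made-up numbers. This assumes -Seeding:intThreshold acts as a minimum peak intensity for seed detection (please confirm against the ProteomicsLFQ help); the intensities, run names, and function name are hypothetical.

```python
def seed_retention(intensities_by_run, thresholds):
    """Fraction of candidate seed peaks per run with intensity >= each threshold.

    Returns {run: {threshold: fraction}}; a run with no peaks gets 0.0.
    """
    result = {}
    for run, intensities in intensities_by_run.items():
        n = len(intensities)
        result[run] = {
            t: (sum(1 for x in intensities if x >= t) / n if n else 0.0)
            for t in thresholds
        }
    return result


peaks = {
    "bulk_100": [5e4, 2e4, 1.5e4, 8e3],  # high-input run
    "sc_01": [3e3, 1.2e3, 900.0, 400.0],  # low-input single-cell run
}
print(seed_retention(peaks, [1000.0, 10000.0]))
```

In this toy example, raising the threshold from 1000 to 10000 removes a quarter of the bulk seeds but all of the single-cell seeds, so a threshold sweep should be evaluated per run type.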
The numbers look really good. I wonder whether MQ has a minimum feature intensity cutoff. I guess we need to see the effect on UPS.
It looks like the right parameters are close to 0.20 - 0.80. @jpfeuffer @timosachsenberg
I will close the current issue in favor of #303. All discussions about MBR LFQ should be moved to that issue.
Description of the bug
Analyzing the single-cell dataset PXD016921. The results folder can be found here. MBR is enabled. The results look quite different from the original paper: fewer proteins are quantified in the single-cell samples, and more proteins are misquantified in the blank sample, than in the original results. The original paper used MaxQuant with the following parameters:
Command used and terminal output
$ nextflow run /hps/nobackup/juan/pride/reanalysis/quantms/main.nf \
    -c /hps/nobackup/juan/pride/reanalysis/quantms/nextflow.config \
    -profile ebislurm \
    --input PXD016921-MBR.sdrf.tsv \
    --search_engines comet,msgf \
    --root_folder /hps/nobackup/juan/pride/reanalysis/single-cell/PXD016921/ \
    --local_input_type raw \
    --outdir PXD016921-MBR \
    --database /hps/nobackup/juan/pride/reanalysis/multiomics-configs/databases/Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta \
    --protein_level_fdr_cutoff 0.01 \
    --psm_level_fdr_cutoff 0.01 \
    --quantify_decoys true \
    --transfer_ids mean \
    --targeted_only false \
    --skip_post_msstats false \
    --enable_pmultiqc true \
    -resume
Relevant files
Original MaxQuant results
quantms results
System information
No response