Details of preprocessing steps for data #1
Comments
Hello! Thank you for your interest in our work and for being the first to raise this question. Data sources are indeed crucial. I have checked the issue you mentioned, but there seems to be a discrepancy with the information you provided. I downloaded the file CPTAC3_Lung_Adeno_Carcinoma_Proteome.tmt10.tsv, which contains only 11,032 protein entries; I am not sure where the 11,485 proteins you mention come from. Additionally, at this link https://proteomic.datacommons.cancer.gov/pdc/analysis/f1c59a53-ab7c-11e9-9a07-0a80fada099c?StudyName=CPTAC%20LUAD%20Discovery%20Study%20-%20Proteome you can also view the heatmap for this dataset, which includes 11,029 protein entries. In fact, the original data used by our tool is entirely consistent with the data in this heatmap.
Hi, thanks for the quick reply. I have attached the CPTAC proteomic data for tumour samples, which contains data for 12,433 proteins (from https://pdc.cancer.gov/pdc/cptac-pancancer, under proteome). Also, the file you provided contains many missing values, so what imputation strategy is employed in your workflow?
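For context, a common baseline for this kind of proteomics preprocessing is to drop proteins with too many missing values and then impute the remainder. The sketch below is purely illustrative: the 30% threshold, the per-protein median imputation, and the function name `filter_and_impute` are assumptions for discussion, not the tool's documented behavior.

```python
# Hypothetical preprocessing sketch (assumed threshold and method,
# NOT the tool's confirmed workflow): filter proteins by missing-value
# fraction, then fill remaining gaps with each protein's median.
import numpy as np
import pandas as pd

def filter_and_impute(df: pd.DataFrame, max_missing_frac: float = 0.3) -> pd.DataFrame:
    """Keep proteins (rows) quantified in enough samples, then impute
    the remaining missing values with each protein's median abundance."""
    keep = df.isna().mean(axis=1) <= max_missing_frac
    kept = df.loc[keep]
    # Row-wise median imputation: a simple, common baseline.
    return kept.apply(lambda row: row.fillna(row.median()), axis=1)

# Toy example: 3 proteins x 4 samples.
data = pd.DataFrame(
    [[1.0, 2.0, np.nan, 4.0],
     [np.nan, np.nan, np.nan, 1.0],   # 75% missing -> dropped
     [5.0, np.nan, 5.0, 5.0]],
    index=["P1", "P2", "P3"],
)
clean = filter_and_impute(data)
```

With these assumptions, P2 (75% missing) would be dropped and the single gaps in P1 and P3 would be filled with 2.0 and 5.0 respectively; which choices your workflow actually makes is exactly what I'd like to know.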
Hello! Great tool you have developed.
I was curious about the specific preprocessing steps for the proteomic data. For instance, the CPTAC LUAD dataset seems to contain 11,485 proteins, whereas the original had 12,400+. What were the thresholds for missing values, what imputation strategy was used, and why?
Thanks.