SDRF to MaxQuant analysis
Here, we show how to re-analyse the proteomic standard data set (publication) using the annotations in the SDRF file. The procedure can easily be adapted to other datasets.
We used the following versions of the tools: sdrf-pipelines (0.0.14) and MaxQuant (1.6.10.43, https://www.maxquant.org/). We recommend using Conda for the installation.
You need to download the SDRF file, a FASTA database that contains the yeast proteome and the UPS proteins (e.g. this one), and the raw data files from PRIDE.
The following command adds the experimental design, file paths and available search parameters in the sdrf-file to a MaxQuant parameter file with default settings.
parse_sdrf convert-maxquant -s sdrf.tsv -f $PWD/yeast_UPS.fasta -r PATH_TO_RAW_FILES
Here, we assume that the files sdrf.tsv and yeast_UPS.fasta are located in the current folder. Do not forget to change PATH_TO_RAW_FILES accordingly.
Important: Always use absolute paths for the fasta file and the folder with the raw files, as MaxQuant can have issues with relative paths. $PWD is a shell variable that holds the current working directory; you might need to replace it with the full path if you are in a Windows or a Mac environment.
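If `$PWD` is not available in your shell, one option is to resolve the absolute paths in Python before calling parse_sdrf. The following is a minimal sketch, not part of sdrf-pipelines: the helper name `build_convert_command` is hypothetical, and it only assembles the command shown above with its `-s`, `-f` and `-r` options.

```python
import os


def build_convert_command(sdrf, fasta, raw_dir):
    """Assemble the parse_sdrf call with absolute fasta and raw-file paths.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run anything.
    """
    fasta_abs = os.path.abspath(fasta)    # absolute path on any OS
    raw_abs = os.path.abspath(raw_dir)
    return [
        "parse_sdrf", "convert-maxquant",
        "-s", str(sdrf),
        "-f", fasta_abs,
        "-r", raw_abs,
    ]
```

Calling `build_convert_command("sdrf.tsv", "yeast_UPS.fasta", "PATH_TO_RAW_FILES")` returns a list that can be passed to `subprocess.run` from the folder that contains the files.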
This produces a MaxQuant parameter file named mqpar.xml.
The resulting mqpar.xml starts with the following lines:
<?xml version="1.0" encoding="utf-8"?>
<MaxQuantParams xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<fastaFiles>
<FastaFileInfo>
<fastaFilePath>/home/veit/Test_sdrf_MQ/yeast_UPS.fasta</fastaFilePath>
<identifierParseRule>>([^\s]*)</identifierParseRule>
<descriptionParseRule>>(.*)</descriptionParseRule>
<taxonomyParseRule></taxonomyParseRule>
<variationParseRule></variationParseRule>
<modificationParseRule></modificationParseRule>
<taxonomyId></taxonomyId>
</FastaFileInfo>
</fastaFiles>
<fastaFilesProteogenomics></fastaFilesProteogenomics>
<fastaFilesFirstSearch></fastaFilesFirstSearch>
<fixedSearchFolder></fixedSearchFolder>
<andromedaCacheSize>350000</andromedaCacheSize>
<advancedRatios>True</advancedRatios>
<pvalThres>0.005</pvalThres>
<neucodeRatioBasedQuantification>False</neucodeRatioBasedQuantification>
<neucodeStabilizeLargeRatios>False</neucodeStabilizeLargeRatios>
<rtShift>False</rtShift>
<separateLfq>False</separateLfq>
<lfqStabilizeLargeRatios>True</lfqStabilizeLargeRatios>
The mqpar.xml for the UPS example can be found here.
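Because MaxQuant can have issues with relative paths, it can be worth checking the generated file before starting a long run. The sketch below reads the `fastaFilePath` entries shown in the excerpt above and reports those that are not absolute. The function names are hypothetical and the check is an illustration only, not a feature of sdrf-pipelines or MaxQuant.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath, PureWindowsPath


def fasta_paths(mqpar_file):
    """Return the text of every <fastaFilePath> element in an mqpar.xml file."""
    root = ET.parse(mqpar_file).getroot()
    return [el.text for el in root.iter("fastaFilePath") if el.text]


def is_absolute_any_os(path):
    """True if the path is absolute under POSIX or Windows conventions."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def relative_fasta_paths(mqpar_file):
    """Return the fasta paths in mqpar_file that are not absolute."""
    return [p for p in fasta_paths(mqpar_file) if not is_absolute_any_os(p)]
```

An empty list from `relative_fasta_paths("mqpar.xml")` means that every fasta path in the file is absolute.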
Note: Check the description of sdrf-pipelines for further options, such as setting the temporary folder or the number of threads to accelerate the MaxQuant analysis.
MaxQuant can then be run from the command line with the generated parameter file:
maxquant mqpar.xml
Running the full UPS data set will take a while (hours to a day), depending on the computer. You will find the output files in the subfolder combined inside the directory that contains the raw files.
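To see what a run has produced, the output folder can be listed with a few lines of Python. This is a minimal sketch that assumes only what is stated above, namely that results are written to a subfolder named combined inside the raw-file directory. The function name `list_maxquant_outputs` is hypothetical.

```python
from pathlib import Path


def list_maxquant_outputs(raw_dir):
    """List all files below <raw_dir>/combined, sorted by path."""
    combined = Path(raw_dir) / "combined"
    if not combined.is_dir():
        # The folder does not exist, e.g. no results have been written yet.
        return []
    return sorted(p for p in combined.rglob("*") if p.is_file())
```

Use the same directory that was passed to parse_sdrf with `-r`, for example `list_maxquant_outputs("PATH_TO_RAW_FILES")`.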