Scripts to run the VGP pipelines through planemo. Trio data is not supported yet. The scripts are designed to import data from the GenomeArk AWS repository.
See the file installs.sh for the list of dependencies.
You need:
- the species name (no spaces; use underscores) (e.g. Taeniopygia_guttata),
- the specimen ID (e.g. bTaeGut2), and
- the path of the output table (e.g. ./list_file.tab)
sh VGP-planemo-scripts/get_files_names.sh $Species_name $Specimen_ID $output
A tabular file containing the names of the PacBio, Arima, and Bionano files on GenomeArk, e.g.:
Taeniopygia_guttata bTaeGut2 m54306U_210519_154448.hifi_reads.fastq.gz m54306U_210521_004211.hifi_reads.fastq.gz m54306Ue_210629_211205.hifi_reads.fastq.gz m54306Ue_210719_083927.hifi_reads.fastq.gz m64055e_210624_223222.hifi_reads.fastq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L1_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L2_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L3_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L4_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L5_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L6_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L7_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L8_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMMCCXY_L6_R1.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L1_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L2_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L3_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L4_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L5_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L6_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L7_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L8_R2.fq.gz bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMMCCXY_L6_R2.fq.gz bTaeGut2_Saphyr_DLE1_3172351.cmap
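To make the layout of this table concrete, here is a minimal Python sketch that reads one row and groups the file names by data type. It assumes the columns are tab-separated, that the first two columns are the species name and the specimen ID, and that file types can be recognized from the suffixes seen in the example above (.hifi_reads.fastq.gz, _R1.fq.gz, _R2.fq.gz, .cmap). The function name group_files is hypothetical and is not part of the scripts in this repository.

```python
def group_files(row):
    """Split one tab-separated table row into species, specimen and file groups.

    Assumption: the suffix rules below are inferred from the example row only.
    """
    fields = row.rstrip("\n").split("\t")
    species, specimen, files = fields[0], fields[1], fields[2:]
    groups = {"pacbio": [], "arima_r1": [], "arima_r2": [], "bionano": [], "other": []}
    for name in files:
        if not name:
            continue  # skip empty cells
        if name.endswith(".hifi_reads.fastq.gz"):
            groups["pacbio"].append(name)
        elif name.endswith("_R1.fq.gz"):
            groups["arima_r1"].append(name)
        elif name.endswith("_R2.fq.gz"):
            groups["arima_r2"].append(name)
        elif name.endswith(".cmap"):
            groups["bionano"].append(name)
        else:
            groups["other"].append(name)
    return species, specimen, groups


# A shortened version of the example row, joined with tabs.
example_row = "\t".join([
    "Taeniopygia_guttata",
    "bTaeGut2",
    "m54306U_210519_154448.hifi_reads.fastq.gz",
    "bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L1_R1.fq.gz",
    "bTaeGut2_ARI8_001_USPD16084394-AK5146_HJFMFCCXY_L1_R2.fq.gz",
    "bTaeGut2_Saphyr_DLE1_3172351.cmap",
])

species, specimen, groups = group_files(example_row)
for kind, names in groups.items():
    print(species, specimen, kind, len(names))
```

A quick check like this can help confirm that forward and reverse Arima reads are present in equal numbers before preparing the workflow files.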
You need:
- the name of the table output of the previous step
- The prefix for the yaml files used to run the workflow (e.g. ./test will produce files called ./test_wf1_$Specimen_ID.yaml)
python VGP-planemo-scripts/prepare_wf1.py $Input_table $Yaml_prefix
To change the parameters of all jobs, modify the file wf1_run.sample.yaml
For each species:
- A yaml file containing the input paths and the job parameters, named $Yaml_prefix_wf1_$Specimen_ID.yml (modify these files to change the parameters of an individual job)
For all species:
- A table named wf_run_$Input_table containing the input table plus columns listing:
  - the yaml file to use for the workflow
  - the json file that will contain the results of the workflow
  - the command line to paste into your shell to run the workflow on Galaxy.org (change the command line if you want to run against another Galaxy instance). Set the $MAINKEY variable, or replace it with your Galaxy API key.
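As an illustration of replacing $MAINKEY, the sketch below fills the key into a command string before it is pasted into a shell. It assumes the key is kept in an environment variable also called MAINKEY unless passed explicitly. The command string shown is a placeholder, not the actual command written by prepare_wf1.py, and fill_api_key is a hypothetical helper.

```python
import os
import re


def fill_api_key(command, key=None):
    """Replace $MAINKEY in a command line, leaving other $variables untouched."""
    if key is None:
        key = os.environ.get("MAINKEY")  # assumption: key stored in the environment
    if not key:
        raise ValueError("Set the MAINKEY environment variable or pass key= explicitly")
    # \b keeps longer names such as $MAINKEY_OLD from being rewritten;
    # the lambda avoids backslash interpretation in the replacement text.
    return re.sub(r"\$MAINKEY\b", lambda match: key, command)


# Placeholder command for illustration only.
placeholder_command = "echo key=$MAINKEY other=$OTHER"
print(fill_api_key(placeholder_command, key="abc123"))
```

Setting the variable in the shell before pasting the command works equally well; the substitution above is only useful if you prefer to keep fully expanded commands in a file.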
You need:
- The table named wf_run_$Input_table
- The prefix for the yaml files used to run the workflow (e.g. ./test will produce files called ./test_wf3_$Specimen_ID.yaml)
python VGP-planemo-scripts/prepare_wf3.py wf_run_$Input_table $Yaml_prefix
To change the parameters of all jobs, modify the file wf3_run.sample.yaml