When running the individual steps of the pipeline I get different results compared to using the batch command
Using batch leads to noisy results.

When running commands individually:

When using the batch command:
Step-by-step differences
I have not included the coverage commands here, as I use multiple normals and their outputs did not look much different except for the antitarget region.
Binning
Individual

> Detected file format: bed
> Detected file format: bed
> Estimated read length 101.0
> Wrote /tmp/tmp0od03n9s.bed with 100 regions
> Splitting large targets
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 204770 regions
> Skipping untargeted chromosomes MT
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 134163 regions

Batch

> Detected file format: bed
> Splitting large targets
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 232655 regions
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 38328 regions
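For context, the individual run's "Estimated read length" line suggests autobin was used to size the bins. A sketch of the equivalent explicit commands (the BAM name and access BED path are placeholders, not the exact commands from the run):

```shell
# Sketch only -- file paths are illustrative.
# autobin samples a BAM to estimate read length and choose bin sizes:
cnvkit.py autobin normal1.bam \
    -t Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.bed \
    -g access.hg38.bed
# target/antitarget then write the split BEDs; the average bin size
# (-a/--avg-size) is the knob that changes the antitarget bin count:
cnvkit.py target Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.bed --split \
    -o Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed
cnvkit.py antitarget Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed \
    -g access.hg38.bed \
    -o Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed
```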
Reference
Individual

> Targets: 9665 (4.72%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)
> Antitargets: 18937 (14.11%) bins failed filters
> Wrote reference.cnn with 338933 regions

Batch

> Targets: 13067 (5.616%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)
> Antitargets: 1894 (4.764%) bins failed filters
> Wrote reference.cnn with 272408 regions
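A quick arithmetic check on the logged totals (pure arithmetic on the numbers above, nothing from CNVkit):

```shell
# Individual: reference bins = target bins + antitarget bins from binning
echo $((204770 + 134163))  # prints 338933, matching reference.cnn
# Batch: the written antitarget BED does not add up the same way
echo $((232655 + 38328))   # prints 270983, not the logged 272408
echo $((272408 - 232655))  # prints 39753 implied antitarget bins
# The batch antitarget failure rate is consistent with 39753, not 38328:
awk 'BEGIN { printf "%.3f\n", 100 * 1894 / 39753 }'  # prints 4.764
```

So the batch reference seems to carry 39753 antitarget bins, the same count that appears later in the batch fix log, rather than the 38328 written to the antitarget BED.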
Difference in fix
Individual

> Processing target: tumour
> Keeping 195105 of 204770 bins
> Correcting for GC bias...
> Correcting for density bias...
> Processing antitarget: tumour
> Keeping 115226 of 134163 bins
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Antitargets are 3.42 x more variable than targets

Batch

> Processing target: tumour
> Keeping 219588 of 232655 bins
> Correcting for GC bias...
> Correcting for density bias...
> Processing antitarget: tumour
> Keeping 37859 of 39753 bins
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Antitargets are 1.39 x more variable than targets
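For reference, the individual fix invocation was of this form (file names are illustrative, not the exact paths from the run):

```shell
# Sketch only -- coverage and reference file names are placeholders.
cnvkit.py fix tumour.targetcoverage.cnn tumour.antitargetcoverage.cnn \
    reference.cnn -o tumour.cnr
```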
Difference in Segment
This one is a bit different: even when segment is run on its own as `cnvkit.py segment tumor.cnr -o tumor.cns`, it still starts smoothing by default, which is pretty strange, as I thought `--smooth-cbs` was an opt-in feature. Or is this something else?
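One way to check whether the smoothing is really opt-in is to run segment both ways on the same .cnr file and compare (output file names here are illustrative):

```shell
# Sketch only -- if --smooth-cbs is truly opt-in, only the second run
# should log "Smoothing overshot ..." messages.
cnvkit.py segment tumor.cnr -o tumor.cns
cnvkit.py segment tumor.cnr --smooth-cbs -o tumor.smoothed.cns
diff tumor.cns tumor.smoothed.cns
```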
Individual

> Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
> Smoothing overshot at 8 / 233 indices: (-30.268828150561895, -0.21054706747579377) vs. original (-27.9209, 0.53479)
> Smoothing overshot at 10 / 595 indices: (-29.16372209174013, 1.8425761663484446) vs. original (-27.9546, -0.028386)

Batch

> Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
> Dropped 3 / 13645 bins on chromosome 1
> Dropped 2 / 11956 bins on chromosome 1
> Dropped 1 / 9698 bins on chromosome 5
> Dropped 2 / 10534 bins on chromosome 12
> Dropped 48 / 126 bins on chromosome Y
> Dropped 254 / 375 bins on chromosome Y
Then there are a bunch of post-processing steps in batch mode, such as segmetrics and call to filter based on CI, which are not documented as part of the batch pipeline at all in the stable-release version of the readthedocs.
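From the logs, this batch post-processing looks roughly equivalent to the following explicit steps (my reconstruction; file names are illustrative):

```shell
# Sketch only -- compute segment confidence intervals, then filter by CI:
cnvkit.py segmetrics tumor.cnr -s tumor.cns --ci -o tumor.segmetrics.cns
cnvkit.py call tumor.segmetrics.cns --filter ci -o tumor.call.cns
```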
CI filtering
Individual

> Applying filter 'ci'
> Filtered by 'ci' from 59 to 34 rows
> Wrote tumor.ci.cns with 34 regions

Batch

> Applying filter 'ci'
> Filtered by 'ci' from 729 to 395 rows
This was followed by median centering and a t-test.
Version Info
Tested on v0.9.10, v0.9.11, v0.9.12
And finally, batch mode ends with bintest.
Bintest
Individual

> Significant hits in 7141/195105 bins (3.66%)

Batch

> Significant hits in 5976/219588 bins (2.72%)
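The percentages check out against the kept-bin counts from each run's fix step (pure arithmetic on the logged numbers):

```shell
# Hit rates recomputed from the logged counts:
awk 'BEGIN { printf "%.2f\n", 100 * 7141 / 195105 }'  # prints 3.66 (individual)
awk 'BEGIN { printf "%.2f\n", 100 * 5976 / 219588 }'  # prints 2.72 (batch)
```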
Overall, I see two differences.