Difference between running batch and each step individually #935

shashwatsahay · 2024-12-21T17:48:37Z

Version Info

Tested on v0.9.10, v0.9.11, v0.9.12

What is this all about

When I run the individual steps of the pipeline, I get different results than when I use the batch command.
Using batch leads to noisier results.

When running commands individually

When using the batch command

Step-by-step differences

I have not included the coverage commands here because I use multiple normals, and their output did not look much different except for the antitarget regions.

Binning

Individual:
> Detected file format: bed
> Detected file format: bed
> Estimated read length 101.0
> Wrote /tmp/tmp0od03n9s.bed with 100 regions
> Splitting large targets
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 204770 regions
> Skipping untargeted chromosomes MT
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 134163 regions

Batch:
> Detected file format: bed
> Splitting large targets
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 232655 regions
> Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 38328 regions
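To make the binning difference easier to see, here is a minimal Python sketch that pulls the region counts out of the "Wrote ... with N regions" lines and compares the two runs. The log text is copied from the output above; the helper name `region_counts` and the regular expression are just for illustration and are not part of CNVkit.

```python
import re

# Log lines copied from the binning output above.
individual_log = """
Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 204770 regions
Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 134163 regions
"""
batch_log = """
Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 232655 regions
Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 38328 regions
"""

# Matches only the *.target.bed / *.antitarget.bed lines, not temporary files.
PATTERN = re.compile(r"Wrote \S+\.(target|antitarget)\.bed with (\d+) regions")


def region_counts(log_text):
    """Return {'target': n, 'antitarget': m} parsed from the log text."""
    return {kind: int(count) for kind, count in PATTERN.findall(log_text)}


individual = region_counts(individual_log)
batch = region_counts(batch_log)

for kind in ("target", "antitarget"):
    ratio = batch[kind] / individual[kind]
    print(f"{kind}: individual={individual[kind]} batch={batch[kind]} "
          f"batch/individual={ratio:.2f}")
```

In these logs the target counts are fairly close, while batch produces far fewer antitarget bins than the individual run.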

Reference

Individual:
> Targets: 9665 (4.72%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)
> Antitargets: 18937 (14.11%) bins failed filters
> Wrote reference.cnn with 338933 regions

Batch:
> Targets: 13067 (5.616%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)
> Antitargets: 1894 (4.764%) bins failed filters
> Wrote reference.cnn with 272408 regions

Difference in fix

Individual:
> Processing target: tumour
> Keeping 195105 of 204770 bins
> Correcting for GC bias...
> Correcting for density bias...
> Processing antitarget: tumour
> Keeping 115226 of 134163 bins
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Antitargets are 3.42 x more variable than targets

Batch:
> Processing target: tumour
> Keeping 219588 of 232655 bins
> Correcting for GC bias...
> Correcting for density bias...
> Processing antitarget: tumour
> Keeping 37859 of 39753 bins
> Correcting for GC bias...
> Correcting for RepeatMasker bias...
> Antitargets are 1.39 x more variable than targets
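As a sanity check on the numbers above, the following Python sketch tests whether the bin counts from the binning, reference and fix logs add up within each run. All numbers are copied from the logs in this report; the dictionary keys and the `check_run` helper are hypothetical names chosen for this example.

```python
# Bin counts copied from the binning, reference and fix logs in this report.
RUNS = {
    "individual": {
        "targets": 204770,
        "antitargets_binned": 134163,   # from the binning log
        "antitargets_in_fix": 134163,   # from "Keeping ... of N bins"
        "failed_targets": 9665,
        "failed_antitargets": 18937,
        "kept_targets": 195105,
        "kept_antitargets": 115226,
        "reference_rows": 338933,
    },
    "batch": {
        "targets": 232655,
        "antitargets_binned": 38328,    # from the binning log
        "antitargets_in_fix": 39753,    # from "Keeping ... of N bins"
        "failed_targets": 13067,
        "failed_antitargets": 1894,
        "kept_targets": 219588,
        "kept_antitargets": 37859,
        "reference_rows": 272408,
    },
}


def check_run(run):
    """Return simple consistency checks between the logged bin counts."""
    return {
        "reference = targets + antitargets (fix)":
            run["targets"] + run["antitargets_in_fix"] == run["reference_rows"],
        "kept targets = targets - failed":
            run["targets"] - run["failed_targets"] == run["kept_targets"],
        "kept antitargets = antitargets (fix) - failed":
            run["antitargets_in_fix"] - run["failed_antitargets"]
            == run["kept_antitargets"],
        "binned antitargets = antitargets (fix)":
            run["antitargets_binned"] == run["antitargets_in_fix"],
    }


results = {name: check_run(run) for name, run in RUNS.items()}
for name, checks in results.items():
    for label, ok in checks.items():
        print(f"{name:10s} {label}: {'ok' if ok else 'MISMATCH'}")
```

With the numbers as logged, the only check that fails is the batch antitarget count: the binning step wrote 38328 antitarget regions, while fix reports 39753 bins. The logs alone do not show where the extra 1425 bins come from.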

Difference in Segment

This step differs in another way. Even when segment is run simply as
cnvkit.py segment tumor.cnr -o tumor.cns
it still appears to apply smoothing by default, which is strange because I thought --smooth-cbs was an opt-in feature. Or is this something else?

Individual:
> Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
> Smoothing overshot at 8 / 233 indices: (-30.268828150561895, -0.21054706747579377) vs. original (-27.9209, 0.53479)
> Smoothing overshot at 10 / 595 indices: (-29.16372209174013, 1.8425761663484446) vs. original (-27.9546, -0.028386)

Batch:
> Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
> Dropped 3 / 13645 bins on chromosome 1
> Dropped 2 / 11956 bins on chromosome 1
> Dropped 1 / 9698 bins on chromosome 5
> Dropped 2 / 10534 bins on chromosome 12
> Dropped 48 / 126 bins on chromosome Y
> Dropped 254 / 375 bins on chromosome Y
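The "Dropped a / b bins" lines from the batch log are easier to compare as fractions. This is a small Python sketch that parses them; the log text is copied from the batch output above, and the regular expression is my own, not something CNVkit provides.

```python
import re

# "Dropped" lines copied from the batch segment log above.
batch_segment_log = """
Dropped 3 / 13645 bins on chromosome 1
Dropped 2 / 11956 bins on chromosome 1
Dropped 1 / 9698 bins on chromosome 5
Dropped 2 / 10534 bins on chromosome 12
Dropped 48 / 126 bins on chromosome Y
Dropped 254 / 375 bins on chromosome Y
"""

DROPPED = re.compile(r"Dropped (\d+) / (\d+) bins on chromosome (\S+)")

# One (chromosome, dropped, total) tuple per log line, in log order.
rows = [(chrom, int(dropped), int(total))
        for dropped, total, chrom in DROPPED.findall(batch_segment_log)]

for chrom, dropped, total in rows:
    print(f"chromosome {chrom}: {dropped}/{total} bins dropped "
          f"({dropped / total:.1%})")
```

In this log the autosomes lose well under 0.1% of their bins, while the two chromosome Y entries lose a large share of theirs.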

Batch mode then runs several post-processing steps, such as segmetrics and call (to filter based on CI), that are not documented as part of the batch pipeline in the stable release version of the Read the Docs documentation.

CI filtering

Individual:
> Applying filter 'ci'
> Filtered by 'ci' from 59 to 34 rows
> Wrote tumor.ci.cns with 34 regions

Batch:
> Applying filter 'ci'
> Filtered by 'ci' from 729 to 395 rows
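To put the CI filtering numbers side by side, here is a short Python sketch that computes how many segments each run starts with and what fraction survives the filter. The counts are copied from the logs above; `retained_fraction` is a hypothetical helper for this example.

```python
# (segments before 'ci' filter, segments after), copied from the logs above.
ci_filter = {
    "individual": (59, 34),
    "batch": (729, 395),
}


def retained_fraction(before, after):
    """Fraction of segments that remain after filtering."""
    return after / before


for name, (before, after) in ci_filter.items():
    print(f"{name}: {before} -> {after} segments "
          f"({retained_fraction(before, after):.1%} retained)")

# How many times more segments batch has before filtering.
segment_ratio = ci_filter["batch"][0] / ci_filter["individual"][0]
print(f"batch has {segment_ratio:.1f}x as many segments before filtering")
```

The filter keeps a similar share of segments in both runs, so the main difference is the number of segments going in, which is consistent with the batch result looking noisier.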

This was followed by median centering and p-t-test, and finally ended with bintest.

Bintest

Individual:
> Ignoring 115226 off-target bins
> Significant hits in 7141/195105 bins (3.66%)

Batch:
> Ignoring 37859 off-target bins
> Significant hits in 5976/219588 bins (2.72%)

Overall, I see two differences:

The binning step, which in batch uses target and antitarget instead of autobin, if I am not wrong (refer to my comment on "batch hybrid: Use autobin for target and antitarget bin sizes #302").

#302 (comment)

Or it could be due to the automatic smoothing in the segment step. I don't understand why that is happening at all.
