-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HISAT2.wdl: replace output command substitutions with explicit fifo/wait #233
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Could you provide a test (or a set of example inputs) that can be run to verify that this works?
Given the wide impact to multiple pipelines of changes to this step it would be great if we had a test specifically for this step |
Hi @mlin - Any chance you're planning to add a test to this PR? I suspect it's also a bit out of date with the codebase at this point. Let us know if you don't have the right access to add a test. |
Hi @kbergin, all, The new lines of code proposed here run unconditionally and thus, should be exercised in any existing tests which call on the HISAT2 tasks. They're a mechanical restructuring of the existing commands, eliminating the concurrency race condition described above. I think it would be disproportionately difficult to design a test that specifically provokes the race condition. |
The hisat2 tasks stream output into samtools to avoid having to materialize a giant text SAM file on the scratch disk. This is a good idea but it's implemented in a slightly risky way, using an output command substitution like
hisat2 ... -S >(samtools view -o output.bam ...)
. In this construct samtools is spawned as a background process, and bash does not wait for it before proceeding to the next command or exiting at the end of the script. Furthermore according to this Q&A it does not even provide a way to wait for it!This creates a race condition where the next step is liable to start reading a partial BAM file, including the runtime system potentially outputting a truncated file (cf. chanzuckerberg/miniwdl#211).
Here we replace the output command substitutions with a less-elegant but hopefully reliable construct, which allows us to explicitly wait for samtools before proceeding.