Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HISAT2.wdl: replace output command substitutions with explicit fifo/wait #233

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mlin
Copy link
Contributor

@mlin mlin commented Jul 18, 2019

The hisat2 tasks stream output into samtools to avoid having to materialize a giant text SAM file on the scratch disk. This is a good idea but it's implemented in a slightly risky way, using an output command substitution like hisat2 ... -S >(samtools view -o output.bam ...). In this construct samtools is spawned as a background process, and bash does not wait for it before proceeding to the next command or exiting at the end of the script. Furthermore according to this Q&A it does not even provide a way to wait for it!

This creates a race condition where the next step is liable to start reading a partial BAM file, including the runtime system potentially outputting a truncated file (cf. chanzuckerberg/miniwdl#211).

Here we replace the output command substitutions with a less-elegant but hopefully reliable construct, which allows us to explicitly wait for samtools before proceeding.

@mlin mlin requested a review from mckinsel July 18, 2019 21:43
@pullapprove pullapprove bot requested a review from gbggrant July 18, 2019 22:22
Copy link

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Could you provide a test (or a set of example inputs) that can be run to verify that this works?

@barkasn
Copy link
Contributor

barkasn commented Jul 19, 2019

Given the wide impact to multiple pipelines of changes to this step it would be great if we had a test specifically for this step

@kbergin
Copy link
Contributor

kbergin commented Nov 7, 2019

Hi @mlin - Any chance you're planning to add a test to this PR? I suspect it's also a bit out of date with the codebase at this point. Let us know if you don't have the right access to add a test.

@mlin
Copy link
Contributor Author

mlin commented Nov 8, 2019

Hi @kbergin, all,

The new lines of code proposed here run unconditionally and thus, should be exercised in any existing tests which call on the HISAT2 tasks. They're a mechanical restructuring of the existing commands, eliminating the concurrency race condition described above. I think it would be disproportionately difficult to design a test that specifically provokes the race condition.

@kbergin
Copy link
Contributor

kbergin commented Nov 9, 2019

That seems reasonable to me, Mike, but I was mostly just bumping the PR to help get it resolved. @barkasn @gbggrant opinions here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants