Add support for output to BAM #95

Open
plijnzaad opened this issue Apr 21, 2020 · 10 comments

@plijnzaad

plijnzaad commented Apr 21, 2020

I just partially fixed the File Not Found error (issue #86; see my pull request). Atropos now seems able to read both SAM and BAM files, but it can only write SAM, not BAM:

/Users/philip/tmp/atropos.fix/bin/atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC -se grep_adapt.bam -o trimmed4.bam --report-file summary5.txt --input-format bam --output-format bam

With atropos 2.0.0a5.dev1+g3aa3791, Python 3.7.7, pysam 0.15.4, and Cython 0.29.16, this yields UnknownFileTypeError: File format <SequenceFileType.BAM: ({'.bam'}, False)> is unknown (expected 'fasta' or 'fastq').

For details see this attachment

@jdidion
Owner

jdidion commented Apr 21, 2020

Thanks! I'll try to work on it this week. Please submit a PR if you figure out the solution.

@plijnzaad
Author

plijnzaad commented May 6, 2020

Just a comment for those who, like me, would try some FIFO magic for this. Something like

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC -se TM433_trunc.bam --input-format bam -o >(samtools view -b > trimmed.bam) --output-format sam --report-file summary3.txt

does not work: it crashes with a message like Path /dev/fd/63 is not writable. It looks like Python or pysam makes too many assumptions here.

It also crashes when using -o - or -o /dev/stdout.
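For readers hitting the same "not writable" message: process substitution hands the program a path such as /dev/fd/63 that refers to a pipe, not a regular file. The thread does not show how atropos validates output paths, so the following is only a minimal sketch of how a path check could classify and accept such targets. Both function names are hypothetical and are not part of atropos or pysam.

```python
import os
import stat


def describe_output_path(path: str) -> str:
    """Classify an existing path by file type (hypothetical helper)."""
    mode = os.stat(path).st_mode
    if stat.S_ISFIFO(mode):
        return "fifo"  # named pipes and process-substitution pipes
    if stat.S_ISCHR(mode):
        return "character device"  # e.g. a terminal or /dev/null
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def is_writable_target(path: str) -> bool:
    """Permissive writability check (an assumption, not atropos' logic).

    An existing path of any type is accepted if the OS reports it as
    writable; a missing path is accepted if its parent directory exists
    and is writable, so a new file could be created there.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)
```

A check written this way would not reject a pipe merely because it is not a regular file; whether that matches the actual cause of the crash reported here is not established by the thread.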

@jdidion
Owner

jdidion commented May 6, 2020

Hi @plijnzaad - writing to BAM is not currently supported, and I'm still debating whether or not to add it. For now you can work around this by writing SAM to stdout and piping it to samtools view -Sb.

Regarding the FIFO issue - can you please try again after installing bamnostic? atropos will use bamnostic first if it's installed and it should avoid most of the issues that exist with pysam. Thanks

@jdidion changed the title from "Can read bam and sam, but only write sam, not bam." to "Add support for output to BAM" on May 6, 2020
@jdidion added the enhancement label and removed the bug label on May 6, 2020
@plijnzaad
Author

plijnzaad commented May 7, 2020

Hi, I installed bamnostic (version 1.1.4) and atropos finds it, but it still doesn't work. With bamnostic, atropos again crashes on input BAM files that contain @CO header lines (it says Malformed BGZF block; see e-bamnostic-baminput-with-CO.txt). Using a BAM file without a @CO header line and the following incantation:

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC --input-format bam -se $bam --output-format sam -o >(samtools view -b -o testout.bam )

it again crashes with Path /dev/fd/63 is not writable (see e-bamnostic-bamoutput.txt).

Using -o - or -o /dev/stdout instead leads to ValueError: Invalid path: /dev/stdout

@jdidion
Owner

jdidion commented May 7, 2020

This appears to be a problem with your bamfile. When I converted it to SAM and then back to BAM it worked fine (samtools view TM249_trunc2.bam | samtools view -Sb > new.bam).

To get output to stdout, you just need to not specify the -o option. You should be able to specify -o /dev/stdout or -o - but it appears that is not working - I will fix that. But the following command works as expected:

atropos -a RA5=GATCGTCGGACTGTAGAACTCTGAAC --input-format bam -se new.bam --output-format sam | samtools view -Sb -o testout.bam
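For anyone driving this workaround from a script rather than an interactive shell, the same pipe can be set up with Python's subprocess module. This is a minimal sketch: the atropos and samtools flags are copied from the command above, while the function names are hypothetical and the trailing "-" (which names stdin explicitly for samtools) is an addition not present in the original command.

```python
import subprocess


def build_commands(bam_in: str, bam_out: str, adapter: str):
    """Return argv lists for 'atropos ... | samtools view -Sb -o OUT -'."""
    atropos_cmd = [
        "atropos", "-a", adapter,
        "--input-format", "bam", "-se", bam_in,
        "--output-format", "sam",  # no -o option, so SAM goes to stdout
    ]
    # "-" makes samtools read from stdin explicitly (added for clarity).
    samtools_cmd = ["samtools", "view", "-Sb", "-o", bam_out, "-"]
    return atropos_cmd, samtools_cmd


def run_pipeline(producer_cmd, consumer_cmd):
    """Connect producer stdout to consumer stdin; return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

Usage would look like `run_pipeline(*build_commands("new.bam", "testout.bam", "RA5=GATCGTCGGACTGTAGAACTCTGAAC"))`, assuming both tools are on the PATH. Checking both exit codes matters because a shell pipe by default only reports the status of the last command.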

@jdidion
Owner

jdidion commented May 7, 2020

Specifying stdout/stderr is now fixed in develop.

@plijnzaad
Author

plijnzaad commented May 8, 2020

Brilliant, seems to work fine, many thanks. I realized that I overlooked the fact that leaving out the -o option already resulted in output to stdout - sorry!

Still puzzled about the bam-formatting error that trips bamnostic up, I'll have another look.

(PS: your example conversion gets rid of all header lines, so doesn't say much)

@plijnzaad
Author

plijnzaad commented May 8, 2020

Weirdly, the SAM conversions of the original BAM and of the reformatted (bam->sam->bam) version are identical. I also ran a strict check with ValidateSamFile from picardtools-2.21.1; both BAMs give identical (and harmless) warnings. My conclusion is that bamnostic is broken. Is there a way to make atropos prefer pysam over bamnostic (other than uninstalling bamnostic)?

@jdidion
Owner

jdidion commented May 8, 2020

The error is "Malformed BGZF block" so it's not a difference in the contents but with the compression of the data.
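To make the distinction between contents and compression concrete: a BAM file is a sequence of BGZF blocks, each a gzip member carrying a "BC" extra subfield that records the block size. Two files can decompress to identical records while being split into blocks differently. The sketch below walks the blocks of a byte string following the block layout in the SAM/BAM specification. It is not bamnostic's code, its checks may differ from whatever triggers bamnostic's error, and both function names are hypothetical.

```python
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF block around a small payload (for demonstration)."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate
    cdata = compressor.compress(payload) + compressor.flush()
    bsize = 18 + len(cdata) + 8 - 1  # total block size minus one
    # ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN
    header = struct.pack("<4BI2BH", 31, 139, 8, 4, 0, 0, 255, 6)
    # SI1='B', SI2='C', SLEN=2, BSIZE
    extra = struct.pack("<2BHH", 66, 67, 2, bsize)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def iter_bgzf_blocks(data: bytes):
    """Yield the decompressed payload of each block; raise ValueError if bad."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 18:
            raise ValueError(f"truncated block header at offset {offset}")
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<4BI2BH", data, offset)
        if (id1, id2, cm, flg) != (31, 139, 8, 4):
            raise ValueError(f"bad gzip/BGZF magic at offset {offset}")
        extra = data[offset + 12:offset + 12 + xlen]
        bsize, pos = None, 0
        while pos + 4 <= len(extra):  # scan extra subfields for 'BC'
            si1, si2, slen = struct.unpack_from("<2BH", extra, pos)
            if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
                bsize = struct.unpack_from("<H", extra, pos + 4)[0]
            pos += 4 + slen
        if bsize is None:
            raise ValueError(f"no BC subfield at offset {offset}")
        end = offset + bsize + 1
        if end > len(data):
            raise ValueError(f"block at offset {offset} runs past end of data")
        payload = zlib.decompress(data[offset + 12 + xlen:end - 8], -15)
        crc, isize = struct.unpack_from("<II", data, end - 8)
        if zlib.crc32(payload) != crc or len(payload) != isize:
            raise ValueError(f"CRC or size mismatch at offset {offset}")
        yield payload
        offset = end
```

Running `iter_bgzf_blocks` over the raw bytes of the two BAM files would show whether their block boundaries differ even though the converted SAM text is the same.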

I will add a new issue to enable the choice of BAM reading library to be configurable. For the time being, the solution is just to pip uninstall bamnostic.
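A configurable choice of BAM reading library could be as simple as an ordered preference list. The sketch below is purely illustrative: the function name and the `order` parameter are hypothetical, and nothing here reflects how atropos actually selects a backend beyond the statement earlier in the thread that it tries bamnostic first when it is installed.

```python
import importlib.util


def choose_bam_backend(order=("bamnostic", "pysam"), is_installed=None):
    """Return the first installed library from `order`, or None.

    `is_installed` can be injected for testing; by default it asks the
    import system whether a top-level module of that name can be found.
    """
    if is_installed is None:
        def is_installed(name):
            return importlib.util.find_spec(name) is not None
    for name in order:
        if is_installed(name):
            return name
    return None
```

With such a hook, preferring pysam would be a matter of passing `order=("pysam", "bamnostic")` instead of uninstalling a package.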

@plijnzaad
Author

Brilliant, thanks!
