Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpreting warning messages in discovery only mode #415

Open
sparthib opened this issue Feb 28, 2024 · 2 comments
Open

Interpreting warning messages in discovery only mode #415

sparthib opened this issue Feb 28, 2024 · 2 comments

Comments

@sparthib
Copy link

Hello, I have a bam file where my reads are aligned to the GENCODE GRCh38.p14.genome.fa file.
The corresponding gtf file is gencode.v44.chr_patch_hapl_scaff.annotation.gtf

I tried running BAMBU in discovery only mode

bambuAnnotations <- prepareAnnotations(gtf.file)
se.discoveryOnly <- bambu(reads = test.bam,
                          annotations = bambuAnnotations,
                          genome = fa.file,
                          quant = FALSE,
                          rcOutDir = se_output_dir,
                          NDR = 1)

I reloaded the output rcFile as

se.discoveryOnly <- bambu(reads = c("test_rc.rds"),
                                 annotations = bambuAnnotations, genome = fa.file)

In my resulting se file, I see these 3 warnings in the metadata:

# [1] "not all chromosomes present in reference annotations, annotations might be incomplete. Please compare objects on the same reference"
# [2] "25701 reads are mapped outside the provided genomic regions. These reads will be dropped. Check you are using the same genome used for the alignment"
# [3] "No aligned spliced reads detected!Bambu expects spliced reads. If this is intended, see Documentation on how to handle single-exon transcripts"

Also, for each transcript in my bambuAnnotations object, I see the following message as well.
505 sequences from an unspecified genome; no seqlengths

I seem to be using matching reference genome fastas and gtfs. Any pointers as to what could be wrong would be appreciated.

Thanks,

Sowmya

@andredsim
Copy link
Collaborator

Hi Sowmya,

The first two warnings are more informational, than detrimental to your analysis. The first can occur when there are reads to scaffolds in the fasta file, that have no annotations in the gtf file, for example on accessory chromosomes, which is quite common. The second is a consequence of some aligners which can align part of a read near the end of scaffolds, but the entire read extends beyond those boundaries. I have observed this myself in my own data and we havn't come up with a clean solution, which is why we only report how many reads we remove from the analysis, so users can know how much of their data is impacted.

The third warning however is critical. By default bambu expects spliced reads, and in most use-cases they should be present in the data. The most typical cause for a lack of spliced reads is either the user mapped the reads to the transcriptome (which doesn't seem to be the case in this instance), or they ran an aligner such as minimap2 without setting the splice aware parameter.

Let me know how you aligned your data, or if you don't expect spliced reads (prokaryote sample, targeted amplification sequencing etc), and we can go from there.

Kind Regards,
Andre Sim

@sparthib
Copy link
Author

Hi Andre,

Thank you for your quick and detailed response. I believe this is because I ran minimap2 without setting the splice aware parameter. Will update this thread after rerunning alignment.

Thanks!
Sowmya

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants