The same transcript with different counts in counts_transcript.txt #410

thegreatFANG · 2024-01-10T10:46:04Z

Hi there,

Thanks for creating this great tool.
When I followed the test and ran Bambu with my own data, I got the following results.

I am wondering how this happened and how to fix it.

Thanks again!
fang

thegreatFANG · 2024-01-10T14:16:06Z

And here is the code.

# Point Bioconductor at a local config file
options(BIOCONDUCTOR_CONFIG_FILE = "/data/home/zyfang/config.yaml")
library(bambu)

# Input files: reference genome, annotation, and aligned reads
fa.file = "/data/home/zyfang/Data/fa/GRCh38_full_analysis_set_plus_decoy_hla.fa"
gtf.file = "/data/home/zyfang/Data/gtf/hg38.annot.gtf"
samples.bam = "/data/home/zyfang/Data/bam/RY01.chr22.bam"

# Run Bambu and write the output files to disk
se = bambu(reads = samples.bam, annotations = gtf.file, genome = fa.file)
writeBambuOutput(se, path = "/data/home/zyfang/Data/output_chr22")
q()

andredsim · 2024-01-11T01:35:11Z

Hi,

This is a strange output. Below I have attached what should be expected. From what I can gather from your image, I believe that the transcripts highlighted in the red boxes are actually different transcripts, but for some reason the ENST transcript id has been incorrectly appended to the gene id. You could confirm this by looking at the output gtf file for these transcripts to see if they have different coordinates.

If you don't find a case in this output where the gene id is the same but the ENST transcript id is different, then it is likely that this is just a superficial issue: you can remove the ENST transcript id and the counts will be fine. However, if there are cases where two transcripts have the same gene id but different ENST transcript ids, then I would be worried that the quantification is impacted.
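
A minimal base R sketch of one way to start this check is shown here. It is only an illustration: the toy table stands in for counts_transcript.txt, the column names TXNAME and GENEID are assumed to match the file header, and the IDs and counts are made up. It lists every row whose transcript id appears more than once, so that the gene ids and counts can be compared side by side.

```r
# Toy stand-in for counts_transcript.txt (hypothetical IDs and counts).
# For real data, something like
#   counts <- read.delim("counts_transcript.txt", stringsAsFactors = FALSE)
# could be used instead, assuming a tab-separated file with a header row.
counts <- data.frame(
  TXNAME  = c("ENST0001", "ENST0001", "ENST0002"),
  GENEID  = c("GENE_A", "GENE_A", "GENE_B"),
  sample1 = c(10, 3, 7),
  stringsAsFactors = FALSE
)

# Transcript ids that appear on more than one row
dup_tx <- unique(counts$TXNAME[duplicated(counts$TXNAME)])

# All rows for those ids, kept together for inspection
dup_rows <- counts[counts$TXNAME %in% dup_tx, ]
print(dup_rows)
```

If `dup_rows` is empty, no transcript id is repeated in the table. Rows that do show up can then be looked up in the output gtf file to see whether their coordinates differ.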

Why this has occurred is puzzling. Would you be able to upload some of the lines from the gtf file you used, and the output from the following?

library(bambu)
gtf.file = "/data/home/zyfang/Data/gtf/hg38.annot.gtf"
annotations = prepareAnnotations(gtf.file)
print(mcols(annotations))

Kind Regards,
Andre Sim

thegreatFANG · 2024-01-16T13:54:03Z

Thank you for your reply. Based on the information you provided, I have determined that my gtf file caused the issue. It was converted from a gff3 file, and for some reason the "gene" is missing from it. When I tested with a correct gtf file, I obtained the correct results.
Thanks again,
fang
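
For anyone who converts gff3 to gtf and wants to screen the result before running Bambu, a rough base R sketch follows. It is not part of Bambu, and it makes assumptions: the toy lines and IDs are hypothetical, and since the thread does not say whether the missing "gene" was the feature records or the gene_id attribute, the sketch looks at both. It relies only on the standard gtf layout of nine tab-separated columns, with the feature type in column 3 and the attributes in column 9.

```r
# Toy gtf lines (hypothetical coordinates and IDs); fields are tab-separated.
# For a real file: gtf_lines <- readLines("path/to/annotation.gtf")
gtf_lines <- c(
  "chr22\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
  "chr22\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
  "chr22\tsrc\texon\t300\t500\t.\t+\t.\ttranscript_id \"T2\";"
)

# Drop comment lines, then split each record into its columns
gtf_lines <- gtf_lines[!grepl("^#", gtf_lines)]
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)

# Column 3 is the feature type; tabulate it to see whether "gene" records exist
feature <- vapply(fields, `[`, character(1), 3)
print(table(feature))
has_gene_feature <- "gene" %in% feature

# Column 9 holds the attributes; flag records that lack a gene_id attribute
attrs <- vapply(fields, `[`, character(1), 9)
missing_gene_id <- which(!grepl("gene_id \"", attrs, fixed = TRUE))
print(missing_gene_id)
```

Records listed in `missing_gene_id` are the ones worth inspecting by hand. Whether the absence of "gene" feature records matters depends on the tool reading the file, so `has_gene_feature` is informational only.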

thegreatFANG closed this as completed Jan 16, 2024
