The same transcript with different counts in counts_transcript.txt #410

Closed
thegreatFANG opened this issue Jan 10, 2024 · 3 comments

@thegreatFANG

Hi there,

Thanks for creating this great tool.
When I followed the test and then ran Bambu on my own data, I got the following results in counts_transcript.txt:
[Screenshot of the counts_transcript.txt results]
I am wondering how this happened and how to fix it.

Thanks again!
fang

@thegreatFANG
Author

And here is the code.

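options(BIOCONDUCTOR_CONFIG_FILE = "/data/home/zyfang/config.yaml")
library(bambu)
# Reference genome, annotation and the aligned long-read BAM (chr22 only)
fa.file = "/data/home/zyfang/Data/fa/GRCh38_full_analysis_set_plus_decoy_hla.fa"
gtf.file = "/data/home/zyfang/Data/gtf/hg38.annot.gtf"
samples.bam = "/data/home/zyfang/Data/bam/RY01.chr22.bam"
# Run transcript discovery and quantification
se = bambu(reads = samples.bam, annotations = gtf.file, genome = fa.file)
# Write the count tables (including counts_transcript.txt) and the output GTF
writeBambuOutput(se, path = "/data/home/zyfang/Data/output_chr22")
q()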

@andredsim
Collaborator

Hi,

This is a strange output. Below I have attached what should be expected. From your image, I believe the transcripts highlighted in the red boxes are actually different transcripts, but for some reason the ENST transcript ID has been incorrectly appended to the gene ID. You could confirm this by checking whether these transcripts have different coordinates in the output GTF file.

If you don't find a case in this output where the gene ID is the same but the ENST transcript ID is different, then this is likely just a superficial issue: you can remove the ENST transcript ID and the counts will be fine. However, if there are cases where two transcripts have the same gene ID but different ENST transcript IDs, then I would be worried that the quantification is affected.
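As a first pass before comparing coordinates in the output GTF, the affected rows can be pulled out of the transcript count table. This is a minimal base R sketch, not part of bambu: `find_repeated_tx` is a hypothetical helper, and the column names `TXNAME` and `GENEID` are assumed to match the header of counts_transcript.txt.

```r
# Hypothetical helper (not part of bambu): list the rows of a transcript
# count table whose transcript ID occurs more than once.
# Assumes columns named TXNAME and GENEID, as in counts_transcript.txt.
find_repeated_tx <- function(counts) {
  # IDs seen at least twice
  repeated <- counts$TXNAME[duplicated(counts$TXNAME)]
  hits <- counts[counts$TXNAME %in% repeated, c("TXNAME", "GENEID")]
  # Group identical IDs next to each other for easier inspection
  hits[order(hits$TXNAME), ]
}

# Toy table standing in for read.delim(".../counts_transcript.txt")
counts <- data.frame(
  TXNAME = c("ENST0001", "ENST0002", "ENST0001"),
  GENEID = c("geneA", "geneB", "geneC"),
  sample1 = c(10, 5, 3),
  stringsAsFactors = FALSE
)
res <- find_repeated_tx(counts)
print(res)
```

On real output, the toy table would be replaced by `read.delim("/data/home/zyfang/Data/output_chr22/counts_transcript.txt")`; the IDs it reports are the ones worth looking up in the output GTF.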

Why this has occurred is puzzling. Would you be able to upload some lines from the GTF file you used, along with the output from the following:

library(bambu)
gtf.file = "/data/home/zyfang/Data/gtf/hg38.annot.gtf"
annotations = prepareAnnotations(gtf.file)
print(mcols(annotations))

[Screenshot of the expected output]

Kind Regards,
Andre Sim

@thegreatFANG
Author

Thank you for your reply. Based on the information you provided, I found that my GTF file caused the issue. It had been converted from a GFF3 file and, for some reason, "gene" was missing from it. When I tested with a correct GTF file, I obtained the correct results.
Thanks again,
fang
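A GTF converted from GFF3 can lose attributes that downstream tools rely on. As a rough check, assuming the missing piece is the `gene_id` attribute that GTF expects on feature lines, the base R sketch below flags lines whose ninth (attribute) column lacks `gene_id` or `transcript_id`. `check_gtf_ids` is a hypothetical helper, not part of bambu; note that gene-level lines legitimately have no `transcript_id`.

```r
# Hypothetical helper (not part of bambu): report, per GTF line, whether the
# attribute column (column 9) carries gene_id and transcript_id.
check_gtf_ids <- function(lines) {
  # Drop comment/header lines starting with '#'
  lines <- lines[!startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  attrs <- vapply(fields, function(f) if (length(f) >= 9) f[9] else "", character(1))
  feature <- vapply(fields, function(f) if (length(f) >= 3) f[3] else NA_character_, character(1))
  data.frame(
    feature = feature,
    # Match the key only at the start of an attribute, not inside e.g. ref_gene_id
    has_gene_id = grepl('(^|;)\\s*gene_id "', attrs, perl = TRUE),
    has_transcript_id = grepl('(^|;)\\s*transcript_id "', attrs, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

# Toy input: the second exon line is missing its gene_id
gtf_lines <- c(
  '#!comment line',
  'chr22\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
  'chr22\tsrc\texon\t300\t400\t.\t+\t.\ttranscript_id "T2";'
)
res <- check_gtf_ids(gtf_lines)
print(res)
```

On a real annotation, `table(check_gtf_ids(readLines(gtf.file))$has_gene_id)` gives a quick count; `readLines` loads the whole file into memory, which is acceptable for a one-off check.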
