"total" alignments is less than "protein_coding" #325
Comments
Thanks, I will have a look. The counts come from bedtools, while the "total" value comes from the BAM file and is the number of (mapped) alignments.
@AndreasHeger - Did you get a chance to look at this? I am experiencing the same issue. Could you please shed some light on it?
Sorry for the delay. The likely cause is double-counting. I assume the BED file is derived from a gene set with multiple alternative transcripts? If so, there are probably duplicates:
This will be counted twice in the protein_coding bin by bedtools. To avoid this, try:
I will add some text to the help message to warn users.
Please let me know if this is likely to be the issue.
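To illustrate the double-counting described above, here is a minimal Python sketch. It is not how bam_vs_bed or bedtools is implemented; the function name `count_overlaps` and the toy reads and intervals are made up for illustration. It assumes one count per (read, interval) overlap pair, so a duplicated interval counts each overlapping read twice and a label's count can exceed the total number of reads.

```python
from collections import Counter


def count_overlaps(reads, intervals):
    """Count one hit per (read, interval) overlap pair, grouped by interval label."""
    counts = Counter()
    for r_chrom, r_start, r_end in reads:
        for i_chrom, i_start, i_end, label in intervals:
            # Half-open coordinates as in BED: two ranges overlap when each
            # one starts before the other one ends.
            if r_chrom == i_chrom and r_start < i_end and i_start < r_end:
                counts[label] += 1
    return counts


# Hypothetical toy data: three reads, one interval listed twice.
reads = [("chr1", 100, 150), ("chr1", 120, 170), ("chr1", 900, 950)]
intervals = [
    ("chr1", 50, 200, "protein_coding"),
    ("chr1", 50, 200, "protein_coding"),  # exact duplicate of the line above
    ("chr1", 800, 1000, "lincRNA"),
]

total = len(reads)                                      # analogous to "total"
raw = count_overlaps(reads, intervals)                  # duplicates kept
dedup = count_overlaps(reads, sorted(set(intervals)))   # duplicates removed

print("total:", total)
print("with duplicates:", dict(raw))
print("deduplicated:", dict(dedup))
```

With the duplicate present, the protein_coding count is larger than the number of reads; after removing exact duplicates it is not.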
@AndreasHeger - Thanks for your suggestion. I tried it and it doesn't make any difference. It seems the input BED file doesn't contain multiple alternative transcripts.
Thanks. Could you share some of your data? I was able to reproduce the issue using duplicated intervals, but I don't know how else it could come about.
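For anyone who wants to check their own file for the duplicated intervals mentioned above, a small sketch follows. It is not part of the tool; the helper name `find_duplicate_intervals` is hypothetical, and it assumes a tab-separated BED file with the label in the fourth column.

```python
from collections import Counter


def find_duplicate_intervals(bed_lines):
    """Return {(chrom, start, end, name): n} for intervals seen more than once."""
    seen = Counter()
    for line in bed_lines:
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        seen[(fields[0], int(fields[1]), int(fields[2]), name)] += 1
    return {key: n for key, n in seen.items() if n > 1}


# Hypothetical example lines; in practice pass an open file handle instead.
example = [
    "chr1\t50\t200\tprotein_coding\n",
    "chr1\t50\t200\tprotein_coding\n",
    "chr1\t800\t1000\tlincRNA\n",
]
duplicates = find_duplicate_intervals(example)
print(duplicates)
```

An empty result only rules out exact duplicates; it does not explain the discrepancy by itself.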
Hi,
I'm using bam_vs_bed to quantify the number of reads aligned to different transcript types (RNA-seq), using a BED file where the ID is a GENCODE gene_type. The output for one sample is below. The apparent issue (unless I'm missing something) is that the "total" count is an order of magnitude less than the count for "protein_coding".
Thanks!