Hello,
I'm not sure if I'm missing something obvious or using the wrong parameters for gffread, but when I run it to extract transcript sequences from a GFF3 file, it doesn't seem to extract the full transcript from the start coordinate to the end coordinate. I examined a few sequences from the output, and as best I can tell it is extracting only the exon or CDS segments and not the introns in between, which is not what I want. Here's the command I use (example filenames):
gffread -w transcripts.fa -g genomic_seq.fa gene_models.gff3
In the transcripts.fa file, I noticed that the sequences are not the complete transcripts including introns. Is there a particular set of parameters that will give me that output? For example, here's the mRNA/transcript line from the GFF3:
Chr1 Xenbase mRNA 329594 331864 . - . ID=mRNA099831;Name=XM_031895097.1;Dbxref=GeneID:116408318,Genbank:XM_031895097.1;Parent=XBXT10g022928;gbkey=mRNA;gene=bbc3;transcript_id=RefSeq:XM_031895097.1;curie=RefSeq:XM_031895097.1;Ontology_term=SO:0000234
I expected the sequence length to be approximately the distance between that start and end coordinate, unless I'm interpreting this wrongly?
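As a rough sanity check on that expectation, the genomic span of a feature can be computed from columns 4 and 5 of its GFF3 line (1-based, inclusive). The sketch below is only an illustration: the helper name `gff3_span_length` is made up here, and the attribute column is shortened for readability. Any output made of spliced segments only would be shorter than this span by the total intron length.

```python
def gff3_span_length(line: str) -> int:
    """Return the genomic span (end - start + 1) of one GFF3 feature line."""
    # GFF3 is tab-delimited; fall back to whitespace for pasted lines
    # where the tabs were converted to spaces.
    fields = line.split("\t") if "\t" in line else line.split(None, 8)
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


# The mRNA line from this post, with the attribute column shortened.
mrna_line = "Chr1 Xenbase mRNA 329594 331864 . - . ID=mRNA099831;Name=XM_031895097.1"
print(gff3_span_length(mrna_line))  # 2271
```

So a sequence covering the full locus, introns included, should be 2271 bases long for this record.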
Most of the transcripts I'm dealing with in the GFF3 have both CDS and exon segments that overlap identically, except at the start and end of the transcripts, where the exon sequence outside the CDS acts as 'implied UTRs'.
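To make the 'implied UTR' idea concrete, here is a minimal sketch that subtracts CDS intervals from exon intervals. The function name `implied_utrs` and the coordinates in the example are hypothetical and are not taken from the GFF3 file above; intervals are assumed to be 1-based and inclusive, as in GFF3.

```python
def implied_utrs(exons, cds):
    """Return exon portions not covered by any CDS segment.

    Both arguments are lists of 1-based inclusive (start, end) pairs.
    """
    utrs = []
    cds_sorted = sorted(cds)
    for ex_start, ex_end in sorted(exons):
        pos = ex_start  # first exon base not yet accounted for
        for c_start, c_end in cds_sorted:
            if c_end < pos or c_start > ex_end:
                continue  # this CDS segment does not touch the remaining exon
            if c_start > pos:
                utrs.append((pos, c_start - 1))  # uncovered gap before the CDS
            pos = max(pos, c_end + 1)
        if pos <= ex_end:
            utrs.append((pos, ex_end))  # uncovered tail of the exon
    return utrs


# Hypothetical three-exon transcript: CDS matches the exons except at both ends.
exons = [(100, 200), (300, 400), (500, 600)]
cds = [(150, 200), (300, 400), (500, 550)]
print(implied_utrs(exons, cds))  # [(100, 149), (551, 600)]
```

Which of the two pieces is the 5' UTR and which is the 3' UTR depends on the strand in column 7.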
Any help you can provide would be appreciated, thanks!
Vaneet