AT2 Transcript assemblies - which one?! #1

danmaclean · 2012-12-19T19:57:52Z

Diane (@DGOS) and Matt (@MattBashton) have both done Trinity assemblies of the read sets.
Here is the transcript of an email conversation that they had about which one should be kept.

Hi Matt,

I have added another assembly to the AT2 assemblies folder. This was
generated with the older version of Trinity and has fewer transcripts than
the one you posted. I will try to get some time to compare the two soon.

Great to see your contributions and enthusiasm for the project. We are
hoping to undertake genome assemblies early in the new year, so I will let
you know when we have these as I'm sure you too will be excited to
annotate them.

Have a good Christmas,

Diane

Hi Diane,

Yes, I noticed that it was substantially smaller than the assembly I
generated. I'm assuming this is because the newer version of Trinity I'm
using produces a larger assembly, perhaps with longer contigs, since as
far as I can tell we've used the same input. I was just about to email
Dan on this subject, but thanks for your mail. It's interesting that they
are different. I've not followed the development of Trinity closely, so
I'm not sure what has changed in it during the last year. I guess there
are other assemblers out there too.

It might be worth deciding on an assembly version for most of the analysis
to be run against (annotation etc.), as I guess analyses run on different
versions of assemblies will soon get confusing. I don't mind kicking off
an assembly of the AT1 data if need be; it should only take about 9 hours
to run, if that. I can stick it on an FTP server here, or upload it
under a different name, e.g. with the Trinity version appended, if you
don't want to confuse things.

Hope you all have a good Christmas,

Regards,

Matt.

Hi Matt,

I'm presuming it's due to some inbuilt parameter settings (min. contig
length etc.) that are slightly different in the newer version. I also
haven't been following the developments in Trinity, so I need to have a
look in more detail to see how the versions differ. I agree that more than
one assembly could be confusing, so I will try to look into this as a
priority with others here at TSL. If you have time to also run AT1 with the
newer version, this would also be interesting to compare.
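
One way to test the min. contig length idea is to summarise both assemblies with and without a length cutoff and see whether the transcript counts converge. The following is only a sketch: the function names and the `min_length` parameter are made up for illustration, and it reads plain FASTA lines instead of relying on any Trinity-specific output.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarise(lines, min_length=0):
    """Count, total bases and N50, ignoring records shorter than min_length.

    min_length is a hypothetical cutoff used only to mimic a minimum contig
    length setting; it is not taken from any Trinity release.
    """
    lengths = [n for n in fasta_lengths(lines) if n >= min_length]
    return {
        "transcripts": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
    }
```

Running `summarise(open(path), min_length=...)` on each assembly with the same cutoff would show whether the extra transcripts in the larger assembly are mostly short ones.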

It is a good idea to start a message board with suggestions and
discussions for future analysis as the project progresses. Once we have
genome assemblies I'm sure more people will start to contribute or want to
suggest useful analysis. I will have a look into this.

Thanks again for your contributions as they are really useful. It's really
exciting to work together on such an interesting project.

Regards,

Diane

danmaclean · 2012-12-19T20:07:15Z

So here's what I think about this apparent repetition: I think it's brilliant! It's exactly the sort of thing we wanted, and I would say we keep both for the time being. At the very least they are great fodder for things like the MAKER pipeline, which would help give us a combined non-redundant set of transcripts.

As for uniquely identifying the assembly, the URL of the file in the repository will uniquely identify it, and you can come up with a short tag name for it using the metadata files. You could easily create a subfolder within the assemblies folder for each assembly that you do, and name each folder after its short tag.
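
As a rough illustration of the tagging idea, here is a small sketch that builds a folder name from the sample, the assembler and its version. The naming scheme, the function names and the example version string are all assumptions made for illustration; the repository does not prescribe any of them.

```python
import re
from pathlib import PurePosixPath


def assembly_tag(sample, assembler, version):
    """Build a short tag such as 'AT2_trinity_<version>' (hypothetical scheme)."""
    parts = [sample, assembler, version]
    # Replace anything that is awkward in a folder name with a hyphen.
    cleaned = [re.sub(r"[^A-Za-z0-9.]+", "-", p.strip()).strip("-") for p in parts]
    return "_".join(c for c in cleaned if c)


def assembly_dir(root, sample, assembler, version):
    """Path of the subfolder for one assembly under the assemblies folder."""
    return PurePosixPath(root) / assembly_tag(sample, assembler, version)


# The version string here is a placeholder, not a real release name.
print(assembly_dir("assemblies", "AT2", "trinity", "newer version"))
```

Putting the assembler version in the tag means two assemblies of the same read set can sit side by side without overwriting each other.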

If you want to upload large files, let me know. We can stick them on ftp-oadb.tsl.ac.uk, and I can give you an upload username and password.

As for setting up a message board to discuss issues, GitHub has one built in (and you're using it now!). Look at the top of the repository and you'll see a button labelled Issues. That keeps track of issues with repositories and allows us to discuss how to resolve them.

cheers
Dan

ghost · 2013-02-21T16:15:18Z

For interest: I ran the assemblies found on GitHub through an evaluation routine I have.
The 'reference' used was the set of Glarea_lozoyensis proteins (7904) from NCBI.
I search each protein against the assembly with tblastn, score each hit from 0 to 1 (1 being a perfect hit), and take the average score for each assembly. The absolute value is pretty meaningless in this case, but the relative values should provide an indication.
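
The scoring function itself is not given in this thread, so the following is only a sketch of how such a routine could look. It assumes tblastn was run with a custom tabular format (`-outfmt "6 qseqid sseqid pident qstart qend qlen"`) and it assumes a score of query coverage multiplied by fractional identity, taking the best hit per protein. Both the column choice and the scoring formula are illustrative assumptions, not the method actually used here.

```python
from collections import defaultdict


def mean_best_hit_score(tabular_lines, n_proteins):
    """Average per-protein score in [0, 1] over a reference protein set.

    tabular_lines: tab-separated tblastn hits with the assumed columns
        qseqid, sseqid, pident, qstart, qend, qlen.
    n_proteins: size of the reference set; proteins with no hit count as 0.
    """
    best = defaultdict(float)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        qseqid, _sseqid, pident, qstart, qend, qlen = line.split("\t")[:6]
        # Assumed score: fraction of the protein aligned times fractional identity.
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        score = min(1.0, coverage * float(pident) / 100.0)
        # Keep only the best-scoring hit for each reference protein.
        best[qseqid] = max(best[qseqid], score)
    return sum(best.values()) / n_proteins
```

Dividing by the size of the reference set instead of the number of proteins with hits means a protein that is missing from an assembly pulls its average down, which is what makes the relative values comparable between assemblies.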

I have included scores for some Velvet/Oases assemblies with and without reads corrected by SEECER.

Hope that is of some interest.

Paul O'Neill
University of Exeter

danmaclean · 2013-04-19T08:17:27Z

Hi Paul,

I've been a little tardy in getting my head round this, really sorry. I really like the analysis, and I think the conclusion is that the methods for the assemblies are working out well (with the exception of Oases for AT2) and that the AT1 assemblies are a bit more variable. I'm not sure I understand the significance of the magnitude of the differences. It looks to me like they are all pretty similar within a sample (i.e. all the AT1 samples are similar), but the AT1 is closer than the AT2 to your reference set. Would you like to comment on this? Were there any particular regions/contigs that were really bad between these? Lastly, would you like to write a blog post for http://oadb.tsl.ac.uk summarising this work?

Cheers
Dan

ghost closed this as completed Feb 21, 2013

ghost reopened this Feb 21, 2013

danmaclean mentioned this issue Apr 19, 2013

Project organisation - discussion forum? #2

Open
