Length of identified repeats #102

Open
FabianDK opened this issue Sep 16, 2020 · 1 comment
Comments

@FabianDK

In your paper you report that RepeatModeler2 has a low number of false positives.
I am wondering, however, whether short repeats are more likely to be false positives than longer ones.

In my analysis, I obtained 464 repeats, of which about 10% are shorter than 100 bp and almost 50% are shorter than 500 bp (min = 56 bp, max = 17331 bp, mean = 1131 bp).

Would you recommend filtering the identified repeat sequences by a minimum length?
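
For readers who want to reproduce this kind of length summary on their own library, here is a minimal sketch in plain Python. It assumes the families are in a standard FASTA file; the function names, the thresholds, and the toy records are illustrative and are not part of RepeatModeler.

```python
from io import StringIO
from statistics import mean


def read_fasta_lengths(handle):
    """Return {record name: sequence length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths, thresholds=(100, 500)):
    """Report count, min, max, mean, and the fraction below each threshold."""
    values = list(lengths.values())
    summary = {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }
    for t in thresholds:
        summary[f"frac_below_{t}"] = sum(v < t for v in values) / len(values)
    return summary


# Toy input (hypothetical records); replace with open("your_library.fa").
demo = StringIO(">famA#Unknown\n" + "A" * 60 + "\n>famB#LTR\n" + "ACGT" * 200 + "\n")
demo_lengths = read_fasta_lengths(demo)
demo_stats = summarize(demo_lengths)
print(demo_stats)
```

Looking at the full length distribution, rather than applying a single cutoff, makes it easier to judge how many families a given minimum length would discard.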

@rmhubley
Member

Sorry for the long delay. It is hard to say from size alone; it really depends on the organism, the classes of TEs present, and so on. In many cases shorter sequences may simply be fragments of true but much longer families. When curating a de novo generated library, we typically take the longer sequences first. After curation (i.e., extension), we compare the smaller fragments against the curated library to see whether we can discard duplicated results or identify subfamilies. The remaining sequences are then extended (if possible) and a final library is generated.
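
The triage order described above (long sequences first, then short fragments checked against the curated set) could be sketched roughly as follows. This is only an illustration under stated assumptions: the comment does not name a comparison tool, so exact substring matching in either orientation is used here as a crude stand-in for a real alignment-based search, and the 500 bp split, the function names, and the toy sequences are all hypothetical.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def triage_by_length(library, min_long=500):
    """Split families into long ones (curate first) and short ones (compare later)."""
    long_fams = {n: s for n, s in library.items() if len(s) >= min_long}
    short_fams = {n: s for n, s in library.items() if len(s) < min_long}
    return long_fams, short_fams


def find_redundant_fragments(short_fams, curated):
    """Map each short family to a curated family that contains it.

    ASSUMPTION: exact containment on either strand stands in for an
    alignment-based comparison, which would tolerate mismatches and gaps.
    """
    hits = {}
    for name, seq in short_fams.items():
        fwd = seq.upper()
        rev = reverse_complement(fwd)
        for cname, cseq in curated.items():
            target = cseq.upper()
            if fwd in target or rev in target:
                hits[name] = cname
                break
    return hits


# Toy library (hypothetical sequences); min_long is lowered to fit the toy data.
library = {
    "long1": "ACGGTCATTGCAAGCTTAGG" * 3,
    "frag1": "CATTGCAAGC",   # forward-strand fragment of long1
    "frag2": "CAATGACC",     # reverse-complement fragment of long1
    "frag3": "TTTTTTTTTT",   # unrelated
}
long_fams, short_fams = triage_by_length(library, min_long=50)
hits = find_redundant_fragments(short_fams, long_fams)
remaining = sorted(n for n in short_fams if n not in hits)
print(hits, remaining)
```

In this sketch, fragments listed in `hits` are candidates for removal or subfamily assignment, while those in `remaining` would be the ones to try extending before building the final library.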
