Can atropos remove 5' adapter variants that are incomplete from the tail? #128

Open
lokapal opened this issue Aug 7, 2021 · 5 comments

@lokapal

lokapal commented Aug 7, 2021

Hello!

Just to clarify: as far as I understand, atropos (like cutadapt) cannot remove incomplete 5' adapters that are truncated at the tail rather than at the head?
That is, if my adapter is MYVERYLONGADAPTER and I have a lot of reads like

read1
MYVERYLONGAmysequence1
read2
MYVERYLOmysequence2

then the adapter leftovers can only be removed by listing every possible truncated variant in the adapter.fa file?
I just installed atropos version 1.1.31 system-wide with "python3.7 -m pip install atropos".
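If listing every variant really is the workaround, the FASTA file does not have to be written by hand. This is a minimal Python sketch, not anything atropos provides: the function name `prefix_variants`, the `_prefixN` record names, and the minimum length of 8 are hypothetical choices. It emits every 5' prefix of an adapter (i.e. the adapter truncated at its tail), longest first. Whether atropos treats each record the way you intend, including what it does with bases upstream of a 5' match, is an assumption you would need to check.

```python
def prefix_variants(name: str, adapter: str, min_len: int = 8) -> str:
    """Return FASTA text with every 5' prefix of `adapter` that is at least
    `min_len` bases long, longest (the full adapter) first."""
    if not 1 <= min_len <= len(adapter):
        raise ValueError("min_len must be between 1 and len(adapter)")
    records = []
    for length in range(len(adapter), min_len - 1, -1):
        # Each record is the adapter cut short at its 3' end (the "tail").
        records.append(f">{name}_prefix{length}\n{adapter[:length]}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Placeholder adapter taken from the question above.
    print(prefix_variants("A1D", "MYVERYLONGADAPTER", min_len=8), end="")
```

A very short minimum length would let a handful of bases that merely resemble the adapter start be trimmed, so the cutoff is a trade-off between sensitivity and over-trimming.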

@jdidion
Owner

jdidion commented Aug 7, 2021

Do you have an example of a library prep that would produce reads with these characteristics?

@lokapal
Author

lokapal commented Aug 7, 2021

Yes, I do; it's not a theoretical question. Please find attached an example with two annotated entries. It's a 4C library.
reads.fa.gz
There are three adapters: A1, A2, and Illumina/IlluminaPE.
A1D and A2D are the direct adapters; A1RC and A2RC are their reverse complements.

@jdidion
Owner

jdidion commented Aug 7, 2021 via email

@jdidion
Owner

jdidion commented Aug 7, 2021

Found this. I assume this is the standard protocol? https://www.sciencedirect.com/science/article/pii/S1046202318304742

Here they trim all reads at the same position (i.e. with the -u flag of atropos). Is what they suggest standard, or is variable-length trimming like you're trying to do more the norm?
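To make the difference between the two approaches concrete, here is a small Python sketch (not atropos code; both function names are hypothetical). `trim_fixed` cuts the same number of bases from every read, which is the behaviour the -u flag is described as giving in the comment above, while `trim_adapter_prefix` removes only the longest exact prefix of the adapter found at the read start. The prefix version allows no mismatches and could over-trim if the insert happens to continue the adapter sequence.

```python
def trim_fixed(read: str, n: int) -> str:
    """Drop the first `n` bases of the read, whatever they contain."""
    return read[n:]


def trim_adapter_prefix(read: str, adapter: str, min_len: int = 8) -> str:
    """Drop the longest prefix of `adapter` (at least `min_len` bases) that the
    read starts with exactly; leave the read unchanged if there is none."""
    for length in range(min(len(adapter), len(read)), min_len - 1, -1):
        if read.startswith(adapter[:length]):
            return read[length:]
    return read


if __name__ == "__main__":
    adapter = "MYVERYLONGADAPTER"  # placeholder from the question
    for read in ("MYVERYLONGAmysequence1", "MYVERYLOmysequence2"):
        print(trim_fixed(read, 11), trim_adapter_prefix(read, adapter))
    # mysequence1 mysequence1
    # equence2 mysequence2
```

With a fixed cut of 11 bases the first example read comes out right, but the second loses three bases of its insert, which is why a single cut position only works when every read carries the same amount of adapter.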

@lokapal
Author

lokapal commented Aug 7, 2021

I can't speak for all researchers, but the wet-lab biologists I work with regularly give me libraries in which different reads may contain two 4C adapters or only one, and either two full adapters or one full and one incomplete adapter. Previously I always had SE reads, and it was much simpler: I always cut off the full 5' adapter, whichever one it was, and didn't care about what came before it. But now I have PE reads and it is much more complicated, as you can see from the attached example.
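For reads that may carry zero, one, or two adapters, each either full or truncated at the tail, one possible approach is to strip the longest matching adapter prefix repeatedly. This Python sketch assumes the adapters sit back to back at the 5' end of the read; the function names and adapter strings are hypothetical placeholders, matching is exact only, and this is not how atropos works internally.

```python
from typing import Iterable, Tuple


def longest_prefix_hit(read: str, adapter: str, min_len: int) -> int:
    """Length of the longest prefix of `adapter` (at least `min_len` bases)
    that `read` starts with exactly; 0 if there is none."""
    for length in range(min(len(adapter), len(read)), min_len - 1, -1):
        if read.startswith(adapter[:length]):
            return length
    return 0


def trim_stacked_adapters(
    read: str, adapters: Iterable[str], min_len: int = 8, max_rounds: int = 2
) -> Tuple[str, int]:
    """Strip up to `max_rounds` adapters (full or truncated at the tail) from
    the 5' end of `read`. Returns the trimmed read and how many were removed."""
    candidates = list(adapters)
    removed = 0
    for _ in range(max_rounds):
        # Pick the adapter whose prefix covers the most bases at the read start.
        best = max((longest_prefix_hit(read, a, min_len) for a in candidates), default=0)
        if best == 0:
            break  # nothing adapter-like left at the 5' end
        read = read[best:]
        removed += 1
    return read, removed


if __name__ == "__main__":
    adapters = ["MYVERYLONGADAPTER", "SECONDADAPTERSEQ"]  # placeholders
    print(trim_stacked_adapters("MYVERYLONGADAPTERSECONDADAPinsert", adapters))
    # ('insert', 2)
```

Capping the number of rounds keeps the sketch from eating into an insert that happens to begin with adapter-like sequence; for PE data each mate would presumably need its own adapter set (direct vs. reverse complement).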
