script: extract the maximum sequence extent around a set of hashes #2413
Comments
Been thinking. This could also be informative in identifying QTLs at the metapangenome level (or narrower) when comparing phenotypes.
hot take response: that's already what the hashes themselves (hopefully) do. This is more about what you do after that!
How can I skip the contig hashing and take prefetch output Two example files ( Or is there another way of identifying the genomic region of matching hashes?
hot take: you need the contig sequence itself so that it can be output with
Sure. Though my question is: can I return all the unknown contiguous sequences from all the hashes I have from prefetch? I think I'll need to write a different script for that, though.
moving script into plugin: [ command
extract-max-extent-around-hashes.py is a potentially useful script that does the following:
Given a query sig, and a target set of contigs,
extract the maximum extent of actual DNA sequence from the contigs that
would generate the hashes in the query sig. For example, for a single hash,
this would be all DNA sequence between the neighboring hashes, but NOT
including the k-mers that generated those neighboring hashes. If multiple
hashes in a row matched, it would include all sequence up to the surrounding
(non-matching) k-mers.
The motivation is to extract the maximum potentially alignable sequence
when two genomes share some hashes, but it might be useful for other
purposes, too.
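
A rough sketch of the extent rule described above, in plain Python. This is not the script's actual code and does not use the sourmash API. The function name `max_extents` and its inputs are hypothetical: it assumes you already have the start position and hash value of every k-mer on a contig that ends up in the sketch, sorted by position. It also assumes one reading of "NOT including the k-mers that generated those neighboring hashes": the extracted region may overlap a neighboring non-matching k-mer but never contains it in full. Strand and canonical k-mer handling are ignored.

```python
def max_extents(seq_len, k, sketch_kmers, query_hashes):
    """Return half-open (start, end) intervals around runs of matching hashes.

    seq_len      -- length of the contig
    k            -- k-mer size
    sketch_kmers -- list of (start, hashval) for k-mers retained in the
                    sketch, sorted by start (hypothetical input format)
    query_hashes -- set of hash values from the query sig
    """
    extents = []
    n = len(sketch_kmers)
    i = 0
    while i < n:
        if sketch_kmers[i][1] not in query_hashes:
            i += 1
            continue

        # Extend over consecutive sketch k-mers that all match the query.
        j = i
        while j + 1 < n and sketch_kmers[j + 1][1] in query_hashes:
            j += 1

        # Left edge: one base past the start of the previous non-matching
        # k-mer, so that k-mer is not fully contained. No neighbor: contig start.
        start = sketch_kmers[i - 1][0] + 1 if i > 0 else 0

        # Right edge (exclusive): stop one base short of the end of the next
        # non-matching k-mer. No neighbor: contig end.
        end = sketch_kmers[j + 1][0] + k - 1 if j + 1 < n else seq_len

        extents.append((start, end))
        i = j + 1
    return extents


# Toy data: four sketch k-mers (k=5) on a 100 bp contig.
kmers = [(10, 1), (30, 2), (40, 3), (70, 4)]
single = max_extents(100, 5, kmers, {2})     # one matching hash
run = max_extents(100, 5, kmers, {2, 3})     # two matching hashes in a row
```

With the toy data, the single hash at position 30 yields the interval from just after the k-mer at 10 to just before the end of the k-mer at 40, while the run of two hashes extends to the k-mer at 70. Slicing the contig with each `(start, end)` pair gives the sequence to output.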