are there instructions anywhere for using directsketch to sketch all reference genomes under a certain taxonomic node? #166

Open
ctb opened this issue Dec 26, 2024 · 2 comments

Comments

@ctb
Contributor

ctb commented Dec 26, 2024

e.g. all plant genomes - sourmash-bio/sourmash#3172 - but I'd like to do fungi.

@bluegenes
Collaborator

I have code to do everything except generate the CSV of assembly accessions + taxids. Perhaps we can download that from NCBI via the REST API (taxonomy) or similar? https://www.ncbi.nlm.nih.gov/datasets/docs/v2/api/rest-api/
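A rough sketch of what generating that CSV could look like, assuming the NCBI Datasets v2 genome-by-taxon `dataset_report` endpoint and its `filters.reference_only`, `page_size`, and `page_token` parameters. The endpoint URL, the JSON field names (`reports`, `accession`, `organism.tax_id`, `organism.organism_name`, `next_page_token`), and the output columns (`accession,name,taxid`) are all assumptions to check against the API docs above and against whatever directsketch expects. The function names are made up for illustration.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed base URL for the NCBI Datasets v2 REST API; verify against the docs.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon"


def fetch_reports(taxon, page_size=500):
    """Yield genome dataset reports for reference genomes under `taxon`.

    Follows `next_page_token` until the API stops returning one.
    """
    token = None
    while True:
        params = {"filters.reference_only": "true", "page_size": str(page_size)}
        if token:
            params["page_token"] = token
        url = "{}/{}/dataset_report?{}".format(
            BASE_URL,
            urllib.parse.quote(str(taxon)),
            urllib.parse.urlencode(params),
        )
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        yield from data.get("reports", [])
        token = data.get("next_page_token")
        if not token:
            break


def reports_to_rows(reports):
    """Convert report records into accession/name/taxid rows.

    Records without an accession are skipped.
    """
    rows = []
    for report in reports:
        accession = report.get("accession")
        if not accession:
            continue
        organism = report.get("organism", {})
        name = "{} {}".format(accession, organism.get("organism_name", "")).strip()
        rows.append(
            {"accession": accession, "name": name, "taxid": organism.get("tax_id", "")}
        )
    return rows


def write_csv(rows, handle):
    """Write rows to an open text handle with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["accession", "name", "taxid"])
    writer.writeheader()
    writer.writerows(rows)


def build_csv(taxon, path):
    """Fetch reports for `taxon` and write them to `path` (needs network access)."""
    with open(path, "w", newline="") as handle:
        write_csv(reports_to_rows(fetch_reports(taxon)), handle)
```

For fungi, something like `build_csv(4751, "fungi.reference.csv")` would be the call (4751 is the NCBI taxid for Fungi). The parsing is kept separate from the download so it can be checked against a saved response without hitting the API.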

Code is mostly in https://github.com/bluegenes/2024-ds-plant, but I wrote some improved assembly version handling for the ICTV db that I will add. Versioning shouldn't really be an issue if our downloaded file is recent and doesn't contain suppressed accessions.
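The version handling itself isn't shown in this thread. As a purely illustrative guess at one piece of it, here is a sketch that keeps only the highest version of each assembly accession in a list. It assumes accessions look like `GCF_000146045.2`, with the version after the final dot. `latest_versions` is a hypothetical helper, not code from the repo above.

```python
def latest_versions(accessions):
    """Keep the highest-versioned accession for each base accession.

    An accession without a numeric version suffix is treated as version 0.
    """
    best = {}
    for accession in accessions:
        base, _, version = accession.partition(".")
        number = int(version) if version.isdigit() else 0
        if base not in best or number > best[base][0]:
            best[base] = (number, accession)
    return sorted(accession for _, accession in best.values())
```

This only deduplicates within one list. It does not detect suppressed accessions, which would need a lookup against NCBI.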

@bluegenes
Collaborator

For us I think we want an automated workflow, but it may be worth writing a tutorial for this too, so folks know how to build custom dbs.
