Create evaluation dataset for the contact prediction task #43

Open
pascalnotin opened this issue Sep 7, 2023 · 0 comments

This issue covers creating the dataset that will support evaluation of the contact prediction task (see issue #19).

We need to select ~100 protein sequences that cover a balanced spectrum of:

  1. Depth of alignment / number of homologs (covering a wide range of depths, as in Fig. 1B of the ESM2 paper)
  2. Taxa (prokaryotes, humans, other eukaryotes, viruses)
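
One possible way to balance the two criteria above is to stratify candidates by taxon and by a log-spaced alignment-depth bin, then pick round-robin across strata. This is only a sketch: the record fields (`id`, `taxon`, `depth`), the number of bins, and the `max_log10` cap are assumptions for illustration, not something decided in this issue.

```python
import math
from collections import defaultdict


def depth_bin(depth, n_bins=4, max_log10=5.0):
    """Map an alignment depth to a log10-spaced bin index in [0, n_bins - 1]."""
    log_depth = math.log10(max(depth, 1))  # treat depth 0 like depth 1
    idx = int(log_depth / max_log10 * n_bins)
    return min(idx, n_bins - 1)  # depths above 10**max_log10 go in the last bin


def select_balanced(candidates, target=100, n_bins=4):
    """Pick up to `target` proteins, cycling over (taxon, depth bin) strata."""
    strata = defaultdict(list)
    for cand in candidates:
        key = (cand["taxon"], depth_bin(cand["depth"], n_bins))
        strata[key].append(cand)

    # Sort strata and their members so the selection is deterministic.
    queues = [sorted(members, key=lambda c: c["id"])
              for _, members in sorted(strata.items())]

    selected = []
    while len(selected) < target and any(queues):
        for queue in queues:
            if queue and len(selected) < target:
                selected.append(queue.pop(0))
    return selected
```

Round-robin selection keeps sparse strata (for example, viral proteins with deep alignments) represented even when other strata have many more candidates.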

We then need to extract the contact maps for each protein, which we will use as ground truth in the evaluation.
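
As a rough sketch of the extraction step, a contact map can be derived from per-residue 3D coordinates by thresholding pairwise distances. The 8 Å cutoff and the minimum sequence separation of 6 are common conventions assumed here, not values fixed by this issue. Which atom represents each residue (for example Cβ, with Cα for glycine) is likewise an assumption left to the caller.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    threshold: distance cutoff in angstroms (assumed value).
    min_separation: minimum |i - j| for a pair to be considered (assumed value).
    """
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts[i][j] = True
                contacts[j][i] = True
    return contacts


# Toy example: 10 residues on a straight line, 1 angstrom apart.
toy_coords = [(float(i), 0.0, 0.0) for i in range(10)]
toy_map = contact_map(toy_coords)
```

In practice the coordinates would come from an experimental or predicted structure file; parsing that file is outside the scope of this sketch.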
