Generate new sequence by fixing the length #14

Open
shrimonmuke0202 opened this issue Dec 29, 2024 · 6 comments

Comments

@shrimonmuke0202

Hi,

Your work is fantastic!

I have a question: I want to generate a sequence of proteins by providing the length as input.

For example:
Generate a sequence of length 100.

How can I do this?

Thanks and regards,
Shrimon Mukherjee

@Lyu6PosHao
Member

Hi,
Thanks for your kind words!
For decoder-only autoregressive models such as ProLLaMA, there is no direct way to control the output length to a fixed value such as 100.
One possible approach is to let the model generate many sequences and then filter out those with a length of exactly 100.
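The generate-then-filter approach can be sketched as below. This is only an illustration, not code from the ProLLaMA repository: the function name `filter_by_length`, the `tolerance` parameter, and the restriction to the 20 standard amino acids are all assumptions made for the example. It expects the generated sequences as plain strings, however they were produced.

```python
import re
from typing import Iterable, List

# Assumption: only the 20 standard amino acids count as a valid sequence.
VALID_RESIDUES = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")


def filter_by_length(
    sequences: Iterable[str], target_length: int, tolerance: int = 0
) -> List[str]:
    """Keep sequences whose length is within `tolerance` of `target_length`.

    Hypothetical helper: sequences are stripped and upper-cased first, and
    anything containing non-standard residue letters is dropped.
    """
    kept = []
    for raw in sequences:
        seq = raw.strip().upper()
        if not VALID_RESIDUES.match(seq):
            continue  # skip empty or malformed generations
        if abs(len(seq) - target_length) <= tolerance:
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Toy stand-ins for model output; real input would be the generated proteins.
    generated = ["MKT" * 5, "A" * 100, "ACDX" * 25, "g" * 100 + "\n"]
    hits = filter_by_length(generated, target_length=100)
    print(f"{len(hits)} of {len(generated)} sequences have length 100")
```

Setting `tolerance` above zero (for example 5) keeps sequences of roughly the desired length, which may reduce how many generations are needed before enough candidates pass the filter.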

Kind regards

@shrimonmuke0202
Author

shrimonmuke0202 commented Jan 5, 2025

Thanks for your answer. I want to reproduce the results presented in the paper, in particular Table 2, for my work. Could you share the code used to calculate metrics such as TM-score and RMSD?

@Lyu6PosHao
Member

Please refer to https://github.com/steineggerlab/foldseek. You can use the tools provided by foldseek to calculate TM-score, RMSD, etc.
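Once a structure search has been run, the per-hit scores still need to be aggregated. The sketch below is one possible way to do that and is not taken from the paper's evaluation code. It assumes a tab-separated results file with four columns in the order query, target, TM-score, RMSD (foldseek lets the output columns be chosen with `--format-output`; check its documentation for the exact field names). The best-hit-per-query averaging and the helper names `best_hits` and `summarize` are assumptions for illustration; the thread does not state how the paper aggregated its scores.

```python
import csv
import io
from statistics import mean
from typing import Dict, TextIO, Tuple


def best_hits(handle: TextIO) -> Dict[str, Tuple[float, float]]:
    """Return {query: (tm_score, rmsd)} for each query's highest-TM-score hit.

    Assumed column order (hypothetical): query, target, TM-score, RMSD.
    """
    best: Dict[str, Tuple[float, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or truncated lines
        query, tm, rmsd = row[0], float(row[2]), float(row[3])
        if query not in best or tm > best[query][0]:
            best[query] = (tm, rmsd)
    return best


def summarize(best: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Average TM-score and RMSD over the best hit of every query."""
    tm_scores = [tm for tm, _ in best.values()]
    rmsds = [rmsd for _, rmsd in best.values()]
    return mean(tm_scores), mean(rmsds)


if __name__ == "__main__":
    # Made-up rows standing in for a real results file.
    example = io.StringIO(
        "gen_1\tref_a\t0.50\t4.0\n"
        "gen_1\tref_b\t0.80\t2.0\n"
        "gen_2\tref_a\t0.40\t6.0\n"
    )
    avg_tm, avg_rmsd = summarize(best_hits(example))
    print(f"mean TM-score: {avg_tm:.3f}, mean RMSD: {avg_rmsd:.3f}")
```

For a real run, replace the `io.StringIO` example with `open("results.tsv")`, where the file name is whatever the search tool was told to write.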

@shrimonmuke0202
Author

Thanks. How did you perform the evaluation, i.e., how many new sequences were generated for it?

@shrimonmuke0202
Author

shrimonmuke0202 commented Jan 7, 2025

Also, could you share the test set? It would help me compare ProLLaMA with our proposed model.

@Lyu6PosHao
Member

We used 1000 protein sequences for each model in Table 2. That is, each model in Table 2 was used to generate 1000 proteins.

You could use ProLLaMA_Stage_1 to generate 1000 sequences unconditionally through scripts/infer.py, and then compare the proteins generated by ProLLaMA with those from your proposed model.

Best Regards
