Question: Output Bias Proportional to Sequence Length? #10

Open
dvanderwall opened this issue Dec 19, 2024 · 1 comment
Labels
question Further information is requested

Comments

@dvanderwall

Thanks for your earlier replies. I'm really enjoying this software. One thing I have noticed is that the outputs seem biased toward proteins with longer sequences, which tend to appear among the top results. There also appears to be a sequence-length filter available. What solutions or workflows have you found most useful for eliminating this bias, e.g. which cutoffs to use or which proteins to exclude from the index? Thanks in advance.

@khb7840
Member

khb7840 commented Dec 26, 2024

Can you share an example of this behavior, please? I would like to reproduce the issue locally. Currently, when the same hits are identified, the shorter ones get a higher score because Folddisco applies a length-based penalty in scoring.
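For putting together a reproducible example, one rough way to quantify the suspected skew is to compare the median sequence length of the top-ranked hits with the median over all hits. The sketch below is only an illustration: the `(protein_id, sequence_length, score)` tuple layout, the `length_skew` helper, and the sample values are assumptions made up for this example and are not Folddisco's output format or scoring.

```python
from statistics import median


def length_skew(hits, top_k=10):
    """Compare median sequence length of the top_k hits with that of all hits.

    hits: list of (protein_id, sequence_length, score) tuples, where a higher
    score means a better hit. This layout is hypothetical, not Folddisco's.
    Returns (median_top, median_all, ratio).
    """
    if not hits:
        raise ValueError("hits must not be empty")
    # Rank hits from best to worst score.
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)
    top = ranked[:top_k]
    median_top = median(h[1] for h in top)
    median_all = median(h[1] for h in ranked)
    # A ratio well above 1 suggests the top hits are longer than typical.
    return median_top, median_all, median_top / median_all


# Made-up data purely for illustration.
example_hits = [
    ("protA", 1200, 0.95),
    ("protB", 150, 0.90),
    ("protC", 900, 0.88),
    ("protD", 200, 0.40),
    ("protE", 180, 0.35),
    ("protF", 220, 0.30),
]

median_top, median_all, ratio = length_skew(example_hits, top_k=3)
print(f"top-3 median length: {median_top}, overall median: {median_all}, ratio: {ratio:.2f}")
```

A ratio close to 1 would suggest no strong length effect in that result set, while a large ratio on real query results would be a concrete case worth attaching to the issue.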

@khb7840 khb7840 added the question Further information is requested label Dec 26, 2024
2 participants