Question: Output Bias Proportional to Sequence Length? #10

dvanderwall · 2024-12-19T19:47:26Z

Thanks for your earlier replies. I'm really enjoying this software. One thing I have noticed is that the output seems biased toward proteins with longer sequences, which tend to appear among the top results. It also appears that a sequence-length filter is available. What solutions or workflows have you found most useful for eliminating this bias, e.g. which cutoffs to use or which proteins to exclude from the index? Thanks in advance.

khb7840 · 2024-12-26T06:46:33Z

Can you share an example of this behavior, please? I would like to reproduce the issue locally. Currently, when the same hits are identified, the shorter ones get a higher score because Folddisco applies a length-based penalty in scoring.
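One way to put together a reproducible example is to measure how strongly the reported scores track sequence length. The sketch below is not part of Folddisco and does not reproduce its scoring; it assumes the hits have already been parsed into hypothetical `(identifier, score, length)` tuples and computes a Spearman rank correlation using only the Python standard library. The hit values are made up for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5


# Hypothetical hits: (identifier, score, sequence length).
hits = [("A", 0.9, 500), ("B", 0.8, 400), ("C", 0.7, 300), ("D", 0.6, 200)]
rho = spearman([h[1] for h in hits], [h[2] for h in hits])
print(f"Spearman correlation between score and length: {rho:.2f}")
```

A value near 1 would mean longer proteins consistently score higher, near -1 that shorter ones do, and near 0 that there is no monotonic relationship. Posting this number together with the query and the top hits should make the behavior easier to reproduce.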

khb7840 added the "question" label (Further information is requested) · Dec 26, 2024
