This script performs in silico evolution of hallucinated protein homo-oligomers (HALs) by clustering many trajectories according to AF2 fitness metrics. To maximize exploration of the design space while minimizing the use of computational resources, we devised an evolution-based computational strategy: the outputs of many short MCMC trajectories (< 50 steps) are clustered by fitness score, and the resulting clusters are then used to seed new generations of trajectories.
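The generational loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual implementation: `toy_fitness` is a hypothetical stand-in for the AF2 fitness metrics, clustering is simplified to ranking trajectories by score and keeping the top few, and all function names and parameters (`short_mcmc`, `evolve`, `n_traj`, `n_keep`, `temperature`) are invented for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq):
    # Hypothetical placeholder for AF2-derived metrics:
    # here, simply the fraction of alanine residues.
    return seq.count("A") / len(seq)


def short_mcmc(seq, rng, steps=40, temperature=0.01):
    """Run one short Metropolis trajectory (< 50 steps) of point mutations."""
    score = toy_fitness(seq)
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        candidate_score = toy_fitness(candidate)
        delta = candidate_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            seq, score = candidate, candidate_score
    return seq, score


def evolve(seed_seq, generations=5, n_traj=20, n_keep=4, rng_seed=0):
    """Seed each generation of short trajectories from the best of the previous one."""
    rng = random.Random(rng_seed)
    seeds = [seed_seq]
    best_per_generation = []
    for _ in range(generations):
        results = [short_mcmc(rng.choice(seeds), rng) for _ in range(n_traj)]
        # Simplification (assumption): rank by score instead of clustering.
        results.sort(key=lambda r: r[1], reverse=True)
        seeds = [seq for seq, _ in results[:n_keep]]
        best_per_generation.append(results[0][1])
    return seeds, best_per_generation


if __name__ == "__main__":
    final_seeds, history = evolve("G" * 30)
    print(history)
```

Keeping each trajectory short and re-seeding from the best outputs spends compute on promising regions of sequence space rather than on long runs that may stall early.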
This strategy was used to generate HALs of increased complexity across longer length scales, by extending the design specifications to structures of high symmetry (>C15) and longer assembly sequence lengths (>1000 residues). To obtain multiple possible oligomers from a single large structure, we specified the MCMC trajectories as single chains with internal sequence symmetry. The goal was to generate structurally symmetric repeat proteins that can be split into any oligomeric assembly compatible with the factorization of the repeat number (e.g. C15 into a pentamer). Using this approach, we hallucinated C15, C18, C20, C24, C25, C30, C33, and C42 oligomers of up to 2000 residues, ranging from 5 to 12 nm along their largest dimension, which were then divided into homo-trimers, homo-pentamers, homo-hexamers and homo-heptamers.
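To make the factorization idea concrete, the sketch below lists the oligomeric states available for a given repeat number and splits a single-chain repeat sequence into equal chains. It is an illustration only: the function names `oligomer_options` and `split_into_chains` are hypothetical and not part of the script, and the example assumes that every repeat has the same length and that each chain receives a whole number of consecutive repeats.

```python
def oligomer_options(n_repeats):
    """Return (n_chains, repeats_per_chain) for every divisor of n_repeats above 1."""
    return [
        (n_chains, n_repeats // n_chains)
        for n_chains in range(2, n_repeats + 1)
        if n_repeats % n_chains == 0
    ]


def split_into_chains(sequence, n_repeats, n_chains):
    """Split a single-chain repeat sequence into n_chains identical-length chains."""
    if len(sequence) % n_repeats != 0:
        raise ValueError("sequence length must be a multiple of the repeat number")
    if n_repeats % n_chains != 0:
        raise ValueError("chain count must divide the repeat number")
    chain_length = len(sequence) // n_chains
    return [
        sequence[i * chain_length:(i + 1) * chain_length]
        for i in range(n_chains)
    ]


if __name__ == "__main__":
    # A C15 single chain can become a trimer, a pentamer, or a 15-mer.
    print(oligomer_options(15))  # [(3, 5), (5, 3), (15, 1)]

    # Toy 4-residue repeat unit, 15 copies, split into a homo-pentamer.
    toy_sequence = "ABCD" * 15
    chains = split_into_chains(toy_sequence, n_repeats=15, n_chains=5)
    print(len(chains), chains[0])  # 5 ABCDABCDABCD
```

Because the sequence is internally symmetric, every chain produced this way is identical, which is what makes the resulting assembly a homo-oligomer.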