Hello everyone, I'm currently implementing a workflow based on the recent RFpeptides paper (Rettie et al., 2025, Nature Chemical Biology) and have a question about the sequence design step. I'd appreciate any insights from the community.
The Paper's Workflow: The authors describe an iterative, 4-round process for each diffused backbone, which (as I understand it) looks like this:
- Run ProteinMPNN on the RFdiffusion backbone (using a temperature of 0.0001) to get the single best sequence.
- Run Rosetta FastRelax on the new sequence/backbone complex.
- Use the relaxed backbone from the previous step as the new input for ProteinMPNN (again at T=0.0001).
- Repeat this MPNN-Relax loop for a total of 4 cycles.
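For concreteness, this is how I picture that loop, as a minimal Python sketch. `design_sequence` and `fast_relax` are hypothetical placeholder callables standing in for however ProteinMPNN and Rosetta FastRelax are actually invoked; they are not real functions from either package, and the paper's actual scripts may differ.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder types: a backbone is whatever the pipeline passes
# around (e.g. a PDB path); a sequence is a one-letter amino acid string.
Backbone = str
Sequence = str


def iterative_design(
    backbone: Backbone,
    design_sequence: Callable[[Backbone, float], Sequence],  # assumed wrapper around ProteinMPNN
    fast_relax: Callable[[Backbone, Sequence], Backbone],    # assumed wrapper around FastRelax
    n_cycles: int = 4,
    temperature: float = 0.0001,
) -> List[Tuple[Backbone, Sequence]]:
    """Alternate sequence design and relaxation, feeding each relaxed
    backbone into the next design round."""
    history = []
    current = backbone
    for _ in range(n_cycles):
        # Design on the current backbone (near-greedy at very low temperature).
        seq = design_sequence(current, temperature)
        # Relax the designed sequence; the result becomes the next input.
        current = fast_relax(current, seq)
        history.append((current, seq))
    return history
```

Each cycle depends on the output of the previous one, so the four rounds have to run one after another.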
My Alternative Workflow Idea: I was considering an alternative, and potentially computationally cheaper, approach to achieve sequence diversity:
1. Take the original, single backbone from RFdiffusion. Generate 4 sequences on this same fixed backbone using LigandMPNN, but use a higher temperature (e.g., T=0.1, 0.2...?).
2. Take each of these 4 sequences and run Rosetta FastRelax on them once.
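As a rough sketch of these two steps in Python: `sample_sequence` and `fast_relax` are again hypothetical placeholder callables for the LigandMPNN and FastRelax invocations (not real APIs), and the default temperature of 0.1 is only one of the example values I am asking about.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder types, e.g. a PDB path and a one-letter sequence.
Backbone = str
Sequence = str


def fixed_backbone_design(
    backbone: Backbone,
    sample_sequence: Callable[[Backbone, float], Sequence],  # assumed wrapper around LigandMPNN
    fast_relax: Callable[[Backbone, Sequence], Backbone],    # assumed wrapper around FastRelax
    n_sequences: int = 4,
    temperature: float = 0.1,  # example value only
) -> List[Tuple[Backbone, Sequence]]:
    """Sample several sequences on one fixed backbone, then relax each once."""
    results = []
    for _ in range(n_sequences):
        # Every sequence is sampled on the same, original backbone.
        seq = sample_sequence(backbone, temperature)
        # Each sequence is relaxed once, starting from that original backbone.
        relaxed = fast_relax(backbone, seq)
        results.append((relaxed, seq))
    return results
```

In this sketch the four relax runs do not depend on each other, so they could run in parallel; the sequence sampler never sees a relaxed backbone.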
My Questions:
- What are the perceived pros and cons of my proposed workflow versus the iterative one in the paper?
- The authors' method seems like a local "sequence-structure co-optimization," whereas my idea is more of a "fixed-backbone sampling" followed by refinement. Is one inherently superior for this task?
- For those who use ProteinMPNN or LigandMPNN for sampling (not just greedy optimization), what temperature values have you found offer a good balance between meaningful diversity and sequence quality (i.e., avoiding sequences that are too random)?
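To make the temperature question concrete, here is a small illustration of how I understand the temperature to act, assuming the sampler divides the per-residue logits by the temperature before a softmax. The logits below are made up for illustration and do not come from any model.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits / temperature, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up logits for three candidate residues at one position.
example_logits = [2.0, 1.5, 0.5]
for t in (0.0001, 0.1, 0.5, 1.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T=0.0001 essentially all probability mass lands on the top-scoring residue, which is why I describe that setting as greedy; higher temperatures flatten the distribution and let lower-scoring residues be sampled.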
Any thoughts or experiences with these different design strategies would be extremely helpful.