Prot2Prop

Structure-aware fine-tuning of protein language models for joint prediction of multiple developability properties from protein structure inputs.

Objectives

This project aims to train one shared adapter + task‑specific heads for key developability properties relevant to enzymes, binders, synthetic biology, and related protein engineering tasks.

Aggregate data from multiple datasets for properties of interest
Train a single adapter (or LoRA block) on all properties with a separate head per property.
Use a multi‑task loss with masking for missing labels.
Repeat training to build ensembles and evaluate ensembling strategies.

Properties (Primary)

Thermal unfolding metrics (Tm, ΔG)
Stability under pH or ionic strength shifts
Aggregation propensity
Solubility
Secretion signal / signal peptide presence
Protease susceptibility / half-life
Immunogenicity / epitope likelihood
Oligomerization state / assembly propensity
Metal binding / cofactor dependence
Membrane association / transmembrane topology

Properties (Secondary)

Properties for future investigation (lower priority and potentially less synergistic with the primary set).

Binding affinity (protein–protein, protein–ligand)
Enzymatic activity (kcat, Km, specificity)
Expression yield / solubility in specific hosts
Subcellular localization
Allosteric regulation potential
Disorder content / intrinsically disordered regions
PTM site likelihood (phosphorylation, glycosylation, ubiquitination, acetylation)

I suggest we ignore these for the time being, as they all require some degree of auxiliary information. For example, binding affinity depends on interaction partners, while enzymatic activity and kcat are only meaningful with substrate information. Without this context, I'm of the opinion that these properties would be of limited value. That said, I’ve kept them listed to provide a sense of future direction for the project, once the initial set of primary properties has been implemented.

Motivation

The current ecosystem provides many task-specific models for properties such as solubility and stability, but few efforts unify these objectives within a single, structure-aware model. A multi-property framework can reduce parameter count and improve throughput, while potentially improving accuracy through shared signal across correlated biochemical traits (e.g., stability-related metrics).

Additionally, widely used tools (e.g., NetSolP-1.0) rely on older base models and datasets. With improved architectures, larger corpora, and better training algorithms now available, a modern, multi-property, structure-aware approach is both timely and high-impact.

Installation

# install python dependencies
python -m venv .venv && source .venv/bin/activate
pip install -e .

# training-only dependencies
pip install -e ".[dev]"

Results (WIP)

Crude Solubility Results for Reference

Epoch 1/3: 100%|???????????| 7810/7810 [1:05:11<00:00,  2.00it/s]
Train Loss: 0.5971 | Val Acc: 0.7097 F1: 0.7000
Epoch 2/3: 100%|???????????| 7810/7810 [1:05:04<00:00,  2.00it/s]
Train Loss: 0.5374 | Val Acc: 0.7233 F1: 0.7051
Epoch 3/3: 100%|???????????| 7810/7810 [1:04:57<00:00,  2.00it/s]
Train Loss: 0.5194 | Val Acc: 0.7224 F1: 0.7146

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
examples		examples
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
aggregate_data.py		aggregate_data.py
inference.py		inference.py
pyproject.toml		pyproject.toml
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prot2Prop

Objectives

Properties (Primary)

Properties (Secondary)

Motivation

Installation

Results (WIP)

Crude Solubility Results for Reference

About

Uh oh!

Releases

Packages

Languages

License

NeurosnapInc/Prot2Prop

Folders and files

Latest commit

History

Repository files navigation

Prot2Prop

Objectives

Properties (Primary)

Properties (Secondary)

Motivation

Installation

Results (WIP)

Crude Solubility Results for Reference

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages