easypqp-rs is a Rust library for in-silico peptide library generation, with Python bindings for integration with the Python EasyPQP library.
- Fast in-silico library generation using Rust
- Includes a command-line tool for batch library generation
- Python bindings for integration within the easypqp Python package
- Configurable via JSON for fine-tuning predictions, fragmentation settings, and NCE/instrument profiles

easypqp-rs has an optional standalone command-line interface (CLI) binary for generating in-silico libraries. This can be used independently of the EasyPQP Python package if you prefer.
easypqp-insilico ./config.json

The tool is configured via a JSON file. Below is a comprehensive guide to all available parameters.

Full example config.json:
{
"database": {
"fasta": "path/to/proteins.fasta",
"enzyme": {
"name": "Trypsin/P",
"cleave_at": "KR",
"restrict": "P",
"c_terminal": null,
"min_len": 7,
"max_len": 50,
"missed_cleavages": 2
},
"peptide_min_mass": 500.0,
"peptide_max_mass": 5000.0,
"generate_decoys": true,
"decoy_tag": "rev_",
"static_mods": {
"C": 57.0215
},
"variable_mods": {
"M": [15.9949],
"[": [42.0106]
},
"max_variable_mods": 2
},
"insilico_settings": {
"precursor_charge": [2, 3, 4],
"max_fragment_charge": 2,
"min_transitions": 6,
"max_transitions": 6,
"fragmentation_model": "HCD",
"allowed_fragment_types": ["b", "y"],
"rt_scale": 100.0
},
"dl_feature_generators": {
"retention_time": {
"model_path": "path/to/rt_model.safetensors",
"constants_path": "path/to/rt_model_const.yaml",
"architecture": "rt_cnn_tf"
},
"ion_mobility": {
"model_path": "path/to/ccs_model.safetensors",
"constants_path": "path/to/ccs_model_const.yaml",
"architecture": "ccs_cnn_tf"
},
"ms2_intensity": {
"model_path": "path/to/ms2_model.pth",
"constants_path": "path/to/ms2_model_const.yaml",
"architecture": "ms2_bert"
},
"device": "cpu",
"instrument": "timsTOF",
"nce": 20.0,
"batch_size": 64,
"fine_tune_config": {
"fine_tune": false,
"train_data_path": "",
"batch_size": 256,
"epochs": 3,
"learning_rate": 0.001,
"save_model": false
}
},
"peptide_chunking": 0,
"output_file": "insilico_library.tsv",
"write_report": true,
"parquet_output": false
}

database - FASTA file, enzyme, modifications, and decoy generation
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fasta` | string | REQUIRED | Path to FASTA protein database file |
| `generate_decoys` | boolean | `true` | Auto-generate decoy sequences by reversing protein sequences |
| `decoy_tag` | string | `"rev_"` | Prefix added to decoy protein names |
| `peptide_min_mass` | number | `500.0` | Minimum peptide mass in Daltons |
| `peptide_max_mass` | number | `5000.0` | Maximum peptide mass in Daltons |
| `max_variable_mods` | integer | `2` | Maximum number of variable modifications per peptide |
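
To make the `peptide_min_mass` / `peptide_max_mass` window concrete, here is a minimal Python sketch of a monoisotopic mass filter. It is an illustration only, not the tool's implementation: the helper names `peptide_mass` and `in_mass_window` are hypothetical, and it assumes static modification masses are simply added to the residue they are keyed on.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the peptide termini


def peptide_mass(seq, static_mods=None):
    """Neutral monoisotopic mass of an unmodified or statically modified peptide."""
    static_mods = static_mods or {}
    return WATER + sum(MONO[aa] + static_mods.get(aa, 0.0) for aa in seq)


def in_mass_window(seq, min_mass=500.0, max_mass=5000.0, static_mods=None):
    """True if the peptide mass lies inside [min_mass, max_mass] (README defaults)."""
    return min_mass <= peptide_mass(seq, static_mods) <= max_mass


# Example with the static modification from the config (assumed additive per residue).
static_mods = {"C": 57.0215}
for peptide in ["GG", "PEPTIDEK", "ACDEFGHIK"]:
    mass = peptide_mass(peptide, static_mods)
    print(f"{peptide}\t{mass:.4f}\t{in_mass_window(peptide, static_mods=static_mods)}")
```

With the default window, very short peptides such as `GG` fall below 500 Da and would be excluded under this assumption.
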
Enzyme Configuration:
"enzyme": {
"name": "Trypsin/P", // Enzyme name (for reference)
"cleave_at": "KR", // Amino acids where enzyme cleaves
"restrict": "P", // Amino acid that prevents cleavage if following cleavage site
"c_terminal": true, // Cleavage occurs C-terminal to the cleavage site
"min_len": 7, // Minimum peptide length
"max_len": 50, // Maximum peptide length
"missed_cleavages": 2 // Number of allowed missed cleavages
}

Static Modifications:
"static_mods": {
"C": 57.0215 // Carbamidomethylation of Cysteine
}

Variable Modifications:
"variable_mods": {
"M": [15.9949], // Oxidation of Methionine
"[": [42.0106] // N-terminal Acetylation
}

Common modification masses:
- Carbamidomethyl (C): `57.0215`
- Oxidation (M): `15.9949`
- Phosphorylation (STY): `79.9663`
- N-terminal Acetylation: `42.0106`
- Deamidation (NQ): `0.9840`

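
The enzyme settings described in this section (`cleave_at`, `restrict`, `missed_cleavages`, `min_len`, `max_len`) can be illustrated with a short Python sketch. This is not the library's digestion code: the `digest` function is hypothetical, and it assumes cleavage on the C-terminal side of each `cleave_at` residue, blocked when the next residue is in `restrict`.

```python
def digest(protein, cleave_at="KR", restrict="P",
           missed_cleavages=2, min_len=7, max_len=50):
    """Return peptides from an in-silico digest (assumes C-terminal cleavage)."""
    # Collect cut positions: after a cleave_at residue unless followed by restrict.
    cuts = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in cleave_at and protein[i + 1] not in restrict:
            cuts.append(i + 1)
    cuts.append(len(protein))

    peptides = []
    for start in range(len(cuts) - 1):
        # Join 0..missed_cleavages adjacent fragments into one peptide.
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            peptide = protein[cuts[start]:cuts[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# The K at position 3 is followed by P, so it is not cleaved.
print(digest("AAKPCCRDDDKEE", missed_cleavages=1, min_len=1))
# → ['AAKPCCR', 'AAKPCCRDDDK', 'DDDK', 'DDDKEE', 'EE']
```

Raising `min_len` to the documented default of 7 would keep only `AAKPCCR` and `AAKPCCRDDDK` from this toy sequence.
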
insilico_settings - Precursor/fragment charges, transitions, fragmentation model
| Parameter | Type | Default | Description |
|---|---|---|---|
| `precursor_charge` | array[int] | `[2, 3, 4]` | Precursor charge states to generate |
| `max_fragment_charge` | integer | `2` | Maximum fragment ion charge |
| `min_transitions` | integer | `6` | Minimum number of transitions per precursor |
| `max_transitions` | integer | `6` | Maximum number of transitions per precursor |
| `fragmentation_model` | string | `"HCD"` | Fragmentation type: `"HCD"`, `"CID"`, or `"ETD"` |
| `allowed_fragment_types` | array[string] | `["b", "y"]` | Allowed fragment ion types: `"b"`, `"y"` |
| `rt_scale` | number | `100.0` | Retention time scaling factor (multiplies predicted RT) |
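
To show what `allowed_fragment_types` and `max_fragment_charge` control, the following Python sketch enumerates theoretical b and y fragment m/z values for an unmodified peptide using standard monoisotopic masses. The function name `fragment_mz` and the tuple layout are hypothetical, and the sketch says nothing about how the tool ranks or selects transitions.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(seq, allowed_fragment_types=("b", "y"), max_fragment_charge=2):
    """List (type, series_number, charge, m/z) for each allowed fragment."""
    fragments = []
    n = len(seq)
    for i in range(1, n):
        neutral = {
            "b": sum(MONO[aa] for aa in seq[:i]),              # first i residues
            "y": sum(MONO[aa] for aa in seq[n - i:]) + WATER,  # last i residues + water
        }
        for ion_type in allowed_fragment_types:
            for z in range(1, max_fragment_charge + 1):
                mz = (neutral[ion_type] + z * PROTON) / z
                fragments.append((ion_type, i, z, mz))
    return fragments


for ion_type, number, charge, mz in fragment_mz("PEPTIDEK", max_fragment_charge=1):
    print(f"{ion_type}{number}^{charge}+\t{mz:.4f}")
```

A peptide of length n yields n - 1 fragments per ion type and charge state, which is the pool that `min_transitions` and `max_transitions` then constrain.
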
> [!NOTE]
> The current MS2 intensity prediction models only support `"b"` and `"y"` fragment ions.

Example:
"insilico_settings": {
"precursor_charge": [2, 3],
"max_fragment_charge": 1,
"min_transitions": 6,
"max_transitions": 12,
"fragmentation_model": "HCD",
"allowed_fragment_types": ["b", "y"],
"rt_scale": 1.0
}

> [!NOTE]
> If no `retention_time`, `ion_mobility`, or `ms2_intensity` fields are provided under `dl_feature_generators`, pretrained models are downloaded automatically and used. The current default pretrained models are:
>
> - RT: `rt_cnn_tf` - A CNN-Transformer model trained on the ProteomicsML repository RT dataset. This model is based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
> - CCS: `ccs_cnn_tf` - A CNN-Transformer model trained on the ProteomicsML repository CCS dataset. This model is also based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
> - MS2: `ms2_bert` - A BERT-based model retrieved from AlphaPeptDeep's pretrained models.

dl_feature_generators - Custom or pretrained RT/IM/MS2 prediction models
If this section is omitted or empty, the default pretrained models (listed in the note above) are downloaded automatically and used.
Model Configuration:
Each model (RT, IM, MS2) is described by three fields: a weights file, a constants file, and an architecture identifier:
{
"model_path": "path/to/model.safetensors", // Model weights (.pth or .safetensors)
"constants_path": "path/to/model_const.yaml", // Model configuration constants
"architecture": "model_architecture_name" // Architecture identifier
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| `retention_time` | object | pretrained | Custom RT prediction model |
| `ion_mobility` | object | pretrained | Custom IM/CCS prediction model (timsTOF only) |
| `ms2_intensity` | object | pretrained | Custom MS2 intensity prediction model |
| `device` | string | `"cpu"` | Compute device: `"cpu"`, `"cuda"`, or `"mps"` (Apple Silicon) |
| `instrument` | string | `"timsTOF"` | Instrument type: `"QE"` or `"timsTOF"` |
| `nce` | number | `20.0` | Normalized collision energy for fragmentation |
| `batch_size` | integer | `64` | Batch size for model inference |
| `fine_tune_config` | object | see below | Optional fine-tuning configuration |
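
Before launching a long run, it can help to sanity-check a custom `dl_feature_generators` block. The Python sketch below is a hypothetical pre-flight check, not part of easypqp-rs: it only verifies that each custom model entry has the three documented fields and that its `architecture` is one of the identifiers this README lists as supported.

```python
# Architecture identifiers as listed in this README, keyed by model section.
SUPPORTED_ARCHITECTURES = {
    "retention_time": {"rt_cnn_tf", "rt_lstm", "rt_transformer"},
    "ion_mobility": {"ccs_cnn_tf", "ccs_lstm"},
    "ms2_intensity": {"ms2_bert", "ms2_transformer"},
}
REQUIRED_KEYS = ("model_path", "constants_path", "architecture")


def check_models(dl_config):
    """Return a list of human-readable problems (empty list means no issues found)."""
    problems = []
    for name, architectures in SUPPORTED_ARCHITECTURES.items():
        entry = dl_config.get(name)
        if entry is None:
            continue  # omitted entries fall back to the pretrained defaults
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"{name}: missing '{key}'")
        arch = entry.get("architecture")
        if arch is not None and arch not in architectures:
            problems.append(f"{name}: unsupported architecture '{arch}'")
    return problems


example = {
    "retention_time": {
        "model_path": "path/to/rt_model.safetensors",
        "constants_path": "path/to/rt_model_const.yaml",
        "architecture": "rt_cnn_tf",
    },
    "ms2_intensity": {"model_path": "path/to/ms2_model.pth", "architecture": "ms2_lstm"},
}
for problem in check_models(example):
    print(problem)
```

The check deliberately does not touch the filesystem, so it can run on a machine that does not hold the model files.
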
Supported Architectures:
- RT: `"rt_cnn_tf"`, `"rt_lstm"`, `"rt_transformer"`
- IM/CCS: `"ccs_cnn_tf"`, `"ccs_lstm"`
- MS2: `"ms2_bert"`, `"ms2_transformer"`

fine_tune_config - Transfer learning on experimental data
Fine-tune pretrained models on your own experimental data for improved accuracy.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fine_tune` | boolean | `false` | Enable fine-tuning |
| `train_data_path` | string | `""` | Path to training data TSV file |
| `batch_size` | integer | `256` | Training batch size |
| `epochs` | integer | `3` | Number of training epochs |
| `learning_rate` | number | `0.001` | Learning rate for optimizer |
| `save_model` | boolean | `false` | Save fine-tuned model weights to disk |

Training Data Format (TSV):

Required columns:
- `sequence`: Modified sequence with square bracket notation (e.g., `MGC[+57.0215]AAR`)
- `precursor_charge`: Precursor charge state
- `retention_time`: Experimental retention time
- `ion_mobility`: CCS value (only if using timsTOF)
- `fragment_type`: Fragment ion type (`b`, `y`, etc.)
- `fragment_series_number`: Fragment position
- `product_charge`: Fragment charge
- `intensity`: Normalized fragment intensity

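
As a rough illustration of the column layout, the following Python sketch writes a training TSV with the standard library `csv` module. Several things here are assumptions rather than documented behavior: the column order, the one-row-per-fragment layout, and every numeric value in the sample rows, which are made-up placeholders and not real measurements.

```python
import csv
import io

# Column names from the list above; their order here is an assumption.
COLUMNS = [
    "sequence", "precursor_charge", "retention_time", "ion_mobility",
    "fragment_type", "fragment_series_number", "product_charge", "intensity",
]


def write_training_tsv(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as tab-separated text with a header."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder rows: one row per fragment ion of the same precursor (assumed layout).
rows = [
    {"sequence": "MGC[+57.0215]AAR", "precursor_charge": 2, "retention_time": 35.2,
     "ion_mobility": 350.0, "fragment_type": "y", "fragment_series_number": 3,
     "product_charge": 1, "intensity": 1.0},
    {"sequence": "MGC[+57.0215]AAR", "precursor_charge": 2, "retention_time": 35.2,
     "ion_mobility": 350.0, "fragment_type": "b", "fragment_series_number": 2,
     "product_charge": 1, "intensity": 0.4},
]

buffer = io.StringIO()
write_training_tsv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("experimental_data.tsv", "w", newline="")` in place of the `StringIO` buffer.
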
Example:
"fine_tune_config": {
"fine_tune": true,
"train_data_path": "experimental_data.tsv",
"batch_size": 256,
"epochs": 5,
"learning_rate": 0.0001,
"save_model": true
}

Output file format, reporting, and memory management
| Parameter | Type | Default | Description |
|---|---|---|---|
| `output_file` | string | `"insilico_library.tsv"` | Path for output library file |
| `write_report` | boolean | `true` | Generate HTML quality control report |
| `parquet_output` | boolean | `false` | Output in Parquet format instead of TSV |
| `peptide_chunking` | integer | `0` | Peptides per chunk (0 = auto-calculate based on memory) |

Peptide Chunking:
- `0` (default): Automatically calculate chunk size based on available memory (recommended)
- `> 0`: Manual chunk size for processing large FASTA files with limited RAM
- Larger chunks = faster processing but more memory usage

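
The chunking trade-off can be pictured with a small Python sketch. It is a conceptual illustration only: `iter_chunks` is a hypothetical helper, and the `auto_size` argument is a stand-in for the memory-based calculation, whose actual logic is not described here.

```python
def iter_chunks(peptides, peptide_chunking=0, auto_size=100_000):
    """Yield successive slices of `peptides`.

    peptide_chunking > 0 is used as the chunk size directly; 0 falls back to
    `auto_size`, a placeholder for the tool's memory-based estimate.
    """
    size = peptide_chunking if peptide_chunking > 0 else auto_size
    for start in range(0, len(peptides), size):
        yield peptides[start:start + size]


peptides = ["PEPTIDEK", "AAKPCCR", "DDDKEER", "GGSLLNR", "MAMVVVK"]
for index, chunk in enumerate(iter_chunks(peptides, peptide_chunking=2)):
    # Only one chunk needs to be held for prediction at a time.
    print(index, chunk)
```

Smaller chunks mean more passes through the prediction models but a lower peak memory footprint, which matches the guidance above.
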
The minimum required configuration only needs a FASTA file:
{
"database": {
"fasta": "proteins.fasta"
}
}

All other parameters will use sensible defaults and pretrained models will be auto-downloaded.
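
To see what a minimal config expands to, the Python sketch below overlays a user config on the defaults documented in the parameter tables. The merge logic and the `with_defaults` helper are hypothetical and only mirror the documented values; the tool applies its own defaults internally, and the enzyme and modification defaults are left out because they are not tabulated here.

```python
import copy
import json

# Defaults copied from the parameter tables in this README.
DEFAULTS = {
    "database": {
        "generate_decoys": True, "decoy_tag": "rev_",
        "peptide_min_mass": 500.0, "peptide_max_mass": 5000.0,
        "max_variable_mods": 2,
    },
    "insilico_settings": {
        "precursor_charge": [2, 3, 4], "max_fragment_charge": 2,
        "min_transitions": 6, "max_transitions": 6,
        "fragmentation_model": "HCD", "allowed_fragment_types": ["b", "y"],
        "rt_scale": 100.0,
    },
    "dl_feature_generators": {
        "device": "cpu", "instrument": "timsTOF", "nce": 20.0, "batch_size": 64,
    },
    "peptide_chunking": 0,
    "output_file": "insilico_library.tsv",
    "write_report": True,
    "parquet_output": False,
}


def with_defaults(user, defaults=DEFAULTS):
    """Recursively overlay `user` on `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


minimal = {"database": {"fasta": "proteins.fasta"}}
effective = with_defaults(minimal)
if "fasta" not in effective["database"]:
    raise ValueError("database.fasta is required")
print(json.dumps(effective, indent=2))
```

Printing the merged result is a convenient way to document exactly which settings a given run is expected to use.
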
You can override JSON configuration values via command-line arguments:
easypqp-insilico config.json \
--fasta my_proteins.fasta \
--output_file my_library.tsv \
--no-write-report \
--parquet

Available flags:
- `--fasta <PATH>`: Override database FASTA file
- `--output_file <PATH>`: Override output file path
- `--no-write-report`: Disable HTML report generation
- `--parquet`: Output in Parquet format instead of TSV
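
When the CLI is driven from a Python script, the command line can be assembled programmatically. The sketch below uses only the flags listed above; the `build_command` helper is hypothetical, and the actual `subprocess.run` call is left commented out because it assumes the `easypqp-insilico` binary is installed and on `PATH`.

```python
import shlex


def build_command(config_path, fasta=None, output_file=None,
                  write_report=True, parquet=False):
    """Assemble the easypqp-insilico argument list from optional overrides."""
    command = ["easypqp-insilico", config_path]
    if fasta:
        command += ["--fasta", fasta]
    if output_file:
        command += ["--output_file", output_file]
    if not write_report:
        command.append("--no-write-report")
    if parquet:
        command.append("--parquet")
    return command


command = build_command(
    "config.json",
    fasta="my_proteins.fasta",
    output_file="my_library.tsv",
    write_report=False,
    parquet=True,
)
print(shlex.join(command))
# To execute (requires the binary on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.
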