Commit e8ebd9a

Update README for msa server
1 parent c4af2f0 commit e8ebd9a

1 file changed: +33 -29 lines changed

MsaServer/README.md

Lines changed: 33 additions & 29 deletions
@@ -1,42 +1,56 @@
 # Setting up your local ColabFold API server

-Here you will find two examples of how to setup your own API server on a Linux machine.
+Here you will find two examples of how to setup your own API server on a Linux (or macOS for testing) machine.

 ## `setup-and-start-local.sh`

 The `setup-and-start-local.sh` script will execute most of the steps to get a server running for you.
 It will do the following steps:
-* check that all required software is installed (go, git, aria2c, curl)
-* download a specific MMseqs2 version that we tested to work with ColabFold
+* check that all required software is installed (`curl`, `aria2c`, `rsync`, `aws`)
+* download pinned **MMseqs2** and **mmseqs-server** binaries for your platform (Linux x86\_64/arm64, macOS universal)
 * download the databases (UniRef30 and ColabFoldDB, this might take some time)
 * download the API server and compile its binary
 * start the API server

-The script can be called repeatedly to start the server. It will avoid doing any unnecessary setup work.
+The script can be called repeatedly to (re)start the server. It avoids unnecessary work and only re-downloads components when the pinned commit changed.

-### Tweaking `config.json`
-You can tweak the provided example `config.json` file. Two values you will likely want to change are the:
-* `server.address` field to specify a custom port. We recommend putting a `nginx` server infront of the ColabFold API server to deal with gzip compression, SSL etc.
-* `local.workers` field to specify how many job workers are allowed to run in parallel.
+### CPU/GPU and platform detection

-## Setup a systemd service
-To better manage the ColabFold API server, we recommend to setup a systemd service. It will automatically restart the server in case of failure and allow managing the server with common operating system tools (like `journalctl`, `systemctl`, etc.).
+* Uncomment the `export GPU=1` line to enable GPU mode (Linux only).
+* The script adds the parameters `--paths.colabfold.gpu.gpu 1 --paths.colabfold.gpu.server 1`. See `config.json` for more details.
+
+### Choosing a PDB rsync mirror
+
+At the top of the script you can set the PDB mirror to use (RCSB, PDBe or PDBj).
+Uncomment the pair you want. The script exits if no mirror is selected.
+
+### Configuration

-You can first execute the `setup-and-start-local.sh` script to get a working server. Then tweak the `systemd-example-mmseqs-server.service` file and adjust the directories to the MSA server binary/directories.
+Edit `config.json` as needed. Common tweaks:

-Afterwards copy the service file to `/etc/systemd/system/mmseqs-server.service` (this might vary by distribution).
+* `server.address` — change the bind address/port (we recommend putting `nginx` in front for gzip/SSL).
+* `local.workers` — number of local job workers.
+* Optional GPU block under `paths.colabfold.gpu` lets you pin device IDs per DB when you run multi-GPU.
+* A `server.ratelimit` example is included and can be enabled.
+
+### Run

-Then call:
 ```
-sudo systemctl daemon-reload
-sudo systemctl start mmseqs-server.service
+./setup-and-start-local.sh
 ```

-The `restart-systemd.sh` script contains an example how to stop the server, clear the job cache and start it again.
+If `DEBUG_MINI_DB=1` is set, the server starts with templates disabled and a tiny DB for quick tests.
+
+## Setup a systemd service
+To better manage the ColabFold API server, we recommend to setup a systemd service. It will automatically restart on failure and lets you use `journalctl`/`systemctl`.
+
+1. First run `setup-and-start-local.sh` once to get the folder structure and binaries.
+2. Adjust the `systemd-example-mmseqs-server.service` example and point it to your paths:
+3. Enable and start `./restart-systemd.sh`

 ## Forcing databases to stay resident in system memory

-The ColabFold MSA API server will only achieve response time of few seconds if the search database are held fully within system memory. We use vmtouch (https://github.com/hoytech/vmtouch) to keep the precomputed database index file within system memory. This is the most expensive part of the MSA API server, as the two default databases (UniRef30+ColabFoldDB), require currently 768GB-1024GB RAM to stay resident in RAM and have enough RAM spare for worker processes. If you are only running batch searches or are using the command line tool with our API server, system requirements are much much lower.
+The ColabFold MSA API server will only achieve response time of few seconds if the search database are held fully within system memory. We use vmtouch (https://github.com/hoytech/vmtouch) to keep the precomputed database index file within system memory. In CPU mode, this is the most expensive part of the MSA API server, as the two default databases (UniRef30+ColabFoldDB) require currently 768GB-1024GB RAM to stay resident in RAM and have enough RAM spare for worker processes.

 After installing `vmtouch`, you can execute the following command to make sure that the search databases are not evicted from the system cache:

@@ -45,17 +59,7 @@ cd databases
 sudo vmtouch -f -w -t -l -d -m 1000G *.idx
 ```

-This assumes that precomputed database index was created without splits. Check that there are no `uniref30_2103_db.idx.{0,1,...}` or `colabfold_envdb_202108_db.idx.{0,1,...}` files in the databases folder. If these files are there, you should recreate the precomputed database indices with the following command:
-
-```
-cd databases
-rm uniref30_2103_db.idx* colabfold_envdb_202108_db.idx*
-mmseqs createindex uniref30_2103_db tmp --remove-tmp-files 1 --split 1
-mmseqs createindex colabfold_envdb_202108_db tmp --remove-tmp-files 1 --split 1
-```
-
 ## Using a custom API server

-You can now pass the server URL to `colabfold_batch`'s `--host-url` parameter. If you want to use the notebook with a custom API server add a `host_url=https://yourserver.example.org` parameter to the `run()` call in the *Run Prediction* cell.
-
-Templates are still requested from our server (if the `--templates` flag is used). We will improve templates in a future release.
+You can pass the server URL to `colabfold_batch` via `--host-url`.
+In notebooks, add `host_url=https://yourserver.example.org` to the `run()` call in the *Run Prediction* cell.
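
The updated README says the script checks that `curl`, `aria2c`, `rsync` and `aws` are installed. As a rough illustration of what such a check could look like, here is a minimal POSIX shell sketch. It is not taken from `setup-and-start-local.sh`; the helper name `missing_tools` is hypothetical, and the real script may perform the check differently (for example, by exiting immediately).

```shell
#!/bin/sh
# Hypothetical helper (not from the actual script): print, one per line,
# every named tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tool list as given in the updated README line of this commit.
missing=$(missing_tools curl aria2c rsync aws)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing required tools:" $missing >&2
fi
```

The sketch only warns on stderr; a setup script would more likely stop at this point, since the later download steps depend on these tools.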
