Commit b4c744d

Several bugfixes concerning formatting of CIF files. Support of MacOS (M1 and x64 architectures).
Co-authored-by: Sebastian Deorowicz <[email protected]>
1 parent d25f8ab commit b4c744d

114 files changed: +2471430 -54595 lines changed

.github/workflows/self-hosted.yml

Lines changed: 155 additions & 15 deletions
@@ -12,44 +12,184 @@ jobs:
 ########################################################################################
   checkout:
     name: Checkout
-    runs-on: [self-hosted, protestar]
+    strategy:
+      matrix:
+        machine: [xeon, mac-i7, mac-m1]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
     steps:
     - uses: actions/checkout@v2
       with:
         submodules: recursive
 
-########################################################################################
-  make-tests:
-    name: Make tests
+########################################################################################
+  make:
+    name: Make
     needs: checkout
-    runs-on: [self-hosted, protestar]
     strategy:
       fail-fast: false
       matrix:
-        compiler: [g++-9, g++-10, g++-11]
+        machine: [xeon, mac-i7, mac-m1]
+        compiler: [g++-11]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
 
     steps:
+    - name: clean
+      run: |
+        make clean
     - name: make
       run: |
         make -j32 CXX=${{matrix.compiler}}
         cp ./bin/protestar ./bin/protestar-${{matrix.compiler}}
-        make clean
 
 ########################################################################################
-  pdb-tests:
-    needs: make-tests
-    name: PDB tests
-    runs-on: [self-hosted, protestar]
+  alpha-fold:
+    needs: make
+    name: Alpha-fold
     strategy:
       fail-fast: false
       matrix:
+        machine: [xeon, mac-i7, mac-m1]
         compiler: [g++-11]
-        subdir: [ligands, nmr, nucleotide]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
+
+    steps:
+    - name: add PDB (${{matrix.compiler}})
+      run: |
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pdb --indir ./apsd-data/pdb/ --out test.psa
+
+    - name: add CIF (${{matrix.compiler}})
+      run: |
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type cif --indir ./apsd-data/cif/ --in test.psa --out test.psa
+
+    - name: add PAE (${{matrix.compiler}})
+      run: |
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pae --indir ./apsd-data/pae/ --in test.psa --out test.psa
+
+    - name: extract PDB (${{matrix.compiler}})
+      run: |
+        mkdir -p ./pdb-dir
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in test.psa --outdir ./pdb-dir/ --type pdb --all
+        python3 ./data/cmp.py ./pdb-dir ./apsd-data/pdb/
+        rm -r ./pdb-dir
+
+    - name: extract CIF (${{matrix.compiler}})
+      run: |
+        mkdir -p ./cif-dir
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in test.psa --outdir ./cif-dir/ --type cif --all
+        rm -r ./cif-dir
+
+    - name: extract PAE (${{matrix.compiler}})
+      run: |
+        mkdir ./pae-dir
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in test.psa --outdir ./pae-dir/ --type pae --all
+        rm -r ./pae-dir
+
+########################################################################################
+  pdb-lossless:
+    needs: alpha-fold
+    name: PDB lossless
+    strategy:
+      fail-fast: false
+      matrix:
+        machine: [xeon, mac-i7, mac-m1]
+        compiler: [g++-11]
+        subdir: [ligands, nmr, nucleotide]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
     env:
-      DATA: ../../../../data
-      OUT: ../../../../out
+      DATA: ./data
+      OUT: ./out
 
     steps:
     - name: lossless (${{matrix.compiler}}, ${{matrix.subdir}})
       run: |
-        ./bin/protestar-${{matrix.compiler}} add --type pdb --indir ${DATA}/${{matrix.subdir}} --out $OUT{}/${{matrix.subdir.psa}}
+        mkdir -p ${OUT}/${{matrix.subdir}}/pdb-lossless
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pdb --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}.psa
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}.psa --outdir ${OUT}/${{matrix.subdir}}/pdb-lossless --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/pdb-lossless ${DATA}/${{matrix.subdir}}
+        rm -r ${OUT}/${{matrix.subdir}}/pdb-lossless
+
+########################################################################################
+  pdb-lossy:
+    needs: alpha-fold
+    name: PDB minimal and lossy
+    strategy:
+      fail-fast: false
+      matrix:
+        machine: [xeon, mac-i7, mac-m1]
+        compiler: [g++-11]
+        subdir: [ligands]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
+    env:
+      DATA: ./data
+      OUT: ./out
+
+    steps:
+    - name: minimal (${{matrix.compiler}}, ${{matrix.subdir}})
+      run: |
+        mkdir -p ${OUT}/${{matrix.subdir}}/pdb-minimal
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pdb --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}-minimal.psa --minimal
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}-minimal.psa --outdir ${OUT}/${{matrix.subdir}}/pdb-minimal --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/pdb-minimal ${DATA}/${{matrix.subdir}}/minimal
+        rm -r ${OUT}/${{matrix.subdir}}/pdb-minimal
+    - name: lossy (${{matrix.compiler}}, ${{matrix.subdir}})
+      run: |
+        mkdir -p ${OUT}/${{matrix.subdir}}/pdb-lossy
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pdb --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}-lossy.psa --lossy --max-error-bb 100 --max-error-sc 100
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}-lossy.psa --outdir ${OUT}/${{matrix.subdir}}/pdb-lossy --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/pdb-lossy ${DATA}/${{matrix.subdir}}/lossy_100_100
+        rm -r ${OUT}/${{matrix.subdir}}/pdb-lossy
+
+########################################################################################
+  cif-lossy:
+    needs: alpha-fold
+    name: CIF
+    strategy:
+      fail-fast: false
+      matrix:
+        machine: [xeon, mac-i7, mac-m1]
+        compiler: [g++-11]
+        subdir: [ligands]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
+    env:
+      DATA: ./data
+      OUT: ./out
+
+    steps:
+    - name: lossless (${{matrix.compiler}}, ${{matrix.subdir}})
+      run: |
+        mkdir -p ${OUT}/${{matrix.subdir}}/cif-lossless
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type cif --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}.psa
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}.psa --outdir ${OUT}/${{matrix.subdir}}/cif-lossless --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/cif-lossless ${DATA}/${{matrix.subdir}}/cif-lossless
+        rm -r ${OUT}/${{matrix.subdir}}/cif-lossless
+    - name: minimal (${{matrix.compiler}}, ${{matrix.subdir}})
+      run: |
+        mkdir -p ${OUT}/${{matrix.subdir}}/cif-minimal
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type cif --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}-minimal.psa --minimal
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}-minimal.psa --outdir ${OUT}/${{matrix.subdir}}/cif-minimal --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/cif-minimal ${DATA}/${{matrix.subdir}}/minimal
+        rm -r ${OUT}/${{matrix.subdir}}/cif-minimal
+    - name: lossy (${{matrix.compiler}}, ${{matrix.subdir}})
+      run: |
+        mkdir -p ${OUT}/${{matrix.subdir}}/cif-lossy
+        ./bin/protestar-${{matrix.compiler}} add -v 3 --type pdb --indir ${DATA}/${{matrix.subdir}} --out ${OUT}/${{matrix.subdir}}-lossy.psa --lossy --max-error-bb 100 --max-error-sc 100
+        ./bin/protestar-${{matrix.compiler}} get -v 3 --in ${OUT}/${{matrix.subdir}}-lossy.psa --outdir ${OUT}/${{matrix.subdir}}/cif-lossy --type pdb --all
+        python3 ${DATA}/cmp.py ${OUT}/${{matrix.subdir}}/cif-lossy ${DATA}/${{matrix.subdir}}/lossy_100_100
+        rm -r ${OUT}/${{matrix.subdir}}/cif-lossy
+
+
+########################################################################################
+  pyprotestar:
+    name: pyprotestar tests
+    needs: checkout
+    strategy:
+      fail-fast: false
+      matrix:
+        machine: [xeon, mac-i7]
+        compiler: [g++-11]
+    runs-on: [self-hosted, protestar, '${{ matrix.machine }}']
+
+    steps:
+    - name: make
+      run: |
+        make -j32 pyprotestar
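
The new `machine` entry multiplies the number of jobs each stage runs, because GitHub Actions starts one job per combination of matrix values. As a rough illustration only (plain Python, not part of this commit; `expand_matrix` is a hypothetical helper), the combinations for the `pdb-lossless` job can be enumerated like this:

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (Cartesian product)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# Matrix of the pdb-lossless job, copied from the workflow in this commit.
pdb_lossless = {
    "machine": ["xeon", "mac-i7", "mac-m1"],
    "compiler": ["g++-11"],
    "subdir": ["ligands", "nmr", "nucleotide"],
}

jobs = expand_matrix(pdb_lossless)
for job in jobs:
    print(job["machine"], job["compiler"], job["subdir"])
```

With three machines, one compiler, and three subdirectories, the job runs nine times. Each run is routed to a self-hosted runner carrying the matching `machine` label through `runs-on`.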

.gitignore

Lines changed: 14 additions & 0 deletions
@@ -366,3 +366,17 @@ FodyWeavers.xsd
 # make sure everything in data is not ignored
 !data/**
 /src/_old/.gitmodules
+/data/ligands/ligands.psa
+/data/apsd-cif-out
+/data/apsd-pdb-out
+/data/ligands-out
+/data/pdb-out
+/data/apsd-cif.psa
+/data/apsd-pdb.psa
+/data/cif.psa
+/data/ligands.psa
+/data/pdb.psa
+/data/cif-out
+/data/a
+/data/bin.bin0
+/data/bin.bin

README.md

Lines changed: 6 additions & 3 deletions
@@ -51,6 +51,10 @@ ProteStAr should be downloaded from [https://github.com/refresh-bio/protestar](h
 Support for MacOS and well as ARM-based CPUs will be added soon.
 
 ## Version history
+* 1.1.0 (8 May 2024)
+  * Support of ANISOU, SIGATM, and SIGUIJ sections in PDB files.
+  * Some fixes in the PDB and CIF output formatting.
+  * Support of MacOS (M1 and x64 architectures).
 * 1.0.0 (8 December 2023)
   * *pyprotestar* Python package added,
   * fixed incorrect alignment of ATOM column in some PDB files.
@@ -124,7 +128,7 @@ THe C++ API is provided in `src/lib-cxx/protestar-api.h` file.
 You can also take a look at `src/example_api` to see the API in use.
 
 ### Python package
-ProteStAr archives can be accessed through *pyprotestar* Python package. The package is built automatically together with ProteStAr binaries but can be also compiled separately:
+ProteStAr archives can be accessed through *pyprotestar* Python package. The package has to be compiled separately:
 ```
 make -j pyprotestar
 ```
@@ -151,8 +155,7 @@ The data in apsd-data were selected from [AlphaFold Protein Structure Database](
 * The subset of ESM Atlas used in the experiments can be downloaded from [ESM subset](https://polslpl-my.sharepoint.com/:u:/g/personal/sdeorowicz_polsl_pl/EZlvCxYITEhNuXJeorf5vggBQlwCuBiEu6vzoUmEutAtoA?e=fYI6an) (1.6 GB file).
 
 ## Known issues and limitations
-* After the decompression of CIF and PDB files, the formating of the tables may be a bit different than the original one. The contents is, however, the same. This will be fixed in one of the future releases.
-* In atom tables (CIF, PDB) we support only ATOM and HETATM sections. Thus other lines, like ANISOU, SIGUIJ will be skipper. This will be fixed in one on the future releases.
+* After decompression of CIF files, the formating of tables may be a bit different than the original one. The contents is, however, the same.
 
 ## Citing
 
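
The remaining known issue says that decompressed CIF tables may be formatted differently while the contents stay the same. A byte-for-byte comparison cannot confirm that, but a token-level one can. The sketch below is only an illustration: `same_tokens` is a hypothetical helper, not part of ProteStAr or `data/cmp.py`, and it assumes that splitting on whitespace is good enough, which does not hold for quoted CIF values containing spaces.

```python
def same_tokens(text_a, text_b):
    """True if both texts carry the same whitespace-separated tokens on each line."""
    lines_a = [line.split() for line in text_a.splitlines()]
    lines_b = [line.split() for line in text_b.splitlines()]
    return lines_a == lines_b


# Same records, different column padding (made-up sample rows).
original = "ATOM 1 N  MET A 1\nATOM 2 CA MET A 1\n"
reformatted = "ATOM  1  N   MET  A  1\nATOM  2  CA  MET  A  1\n"

print(original == reformatted)             # raw text comparison
print(same_tokens(original, reformatted))  # token comparison
```

The raw comparison reports a mismatch while the token comparison reports agreement, which is the situation the note describes.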

data/cmp.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+import os
+import sys
+import filecmp
+import gzip
+
+if len(sys.argv) != 3:
+    print("Usage: cmp.py <check-dir> <ref-dir>")
+
+check_dir = sys.argv[1]
+ref_dir = sys.argv[2]
+
+all_agree = True
+print("Comparing files...")
+for file in os.listdir(check_dir):
+    if os.path.isfile(check_dir + "/" + file):
+
+        print(file + " ", end='')
+
+        ref_path = ref_dir + "/" + file
+        if os.path.exists(ref_path):
+            f1 = open(check_dir + "/" + file, 'rt')
+            f2 = open(ref_path, 'rt')
+
+            b1 = f1.read()
+            b2 = f2.read()
+            agree = (b1 == b2)
+            all_agree &= agree
+            print(agree)
+        elif os.path.exists(ref_path + ".gz"):
+            f1 = open(check_dir + "/" + file, 'rt')
+            f2 = gzip.open(ref_path + ".gz", 'rt')
+
+            b1 = f1.read()
+            b2 = f2.read()
+            agree = (b1 == b2)
+            all_agree &= agree
+            print(agree, "(gz)")
+        else:
+            all_agree = False
+            print("Not found")
+
+if all_agree:
+    sys.exit(0)
+
+sys.exit(-1)
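
A few things stand out in `data/cmp.py` as committed: the usage check prints a message but does not exit, so a call with too few arguments goes on to index `sys.argv` and raises `IndexError`; the opened files are never closed explicitly; and `filecmp` is imported but unused. The following is a possible tightening, offered as a sketch rather than as the project's code. The names `read_reference`, `compare_dirs`, and `main` are hypothetical, and the exit codes 1 and 2 are a choice made here (the committed script exits with -1 on mismatch).

```python
import gzip
import os
import tempfile


def read_reference(ref_path):
    """Return reference text from ref_path or ref_path + '.gz'; None if neither exists."""
    if os.path.exists(ref_path):
        with open(ref_path, "rt") as f:
            return f.read()
    if os.path.exists(ref_path + ".gz"):
        with gzip.open(ref_path + ".gz", "rt") as f:
            return f.read()
    return None


def compare_dirs(check_dir, ref_dir):
    """True only if every regular file in check_dir matches its reference."""
    all_agree = True
    for name in sorted(os.listdir(check_dir)):
        path = os.path.join(check_dir, name)
        if not os.path.isfile(path):
            continue
        expected = read_reference(os.path.join(ref_dir, name))
        with open(path, "rt") as f:
            agree = expected is not None and f.read() == expected
        print(name, agree if expected is not None else "Not found")
        all_agree = all_agree and agree
    return all_agree


def main(argv):
    """Return a process exit code: 0 on full agreement, 1 on mismatch, 2 on bad usage."""
    if len(argv) != 3:
        print("Usage: cmp.py <check-dir> <ref-dir>")
        return 2
    return 0 if compare_dirs(argv[1], argv[2]) else 1


# Small demo on throwaway directories: one plain and one gzipped reference.
with tempfile.TemporaryDirectory() as check, tempfile.TemporaryDirectory() as ref:
    for name in ("a.pdb", "b.pdb"):
        with open(os.path.join(check, name), "wt") as f:
            f.write("ATOM\n")
    with open(os.path.join(ref, "a.pdb"), "wt") as f:
        f.write("ATOM\n")
    with gzip.open(os.path.join(ref, "b.pdb.gz"), "wt") as f:
        f.write("ATOM\n")
    demo_ok = compare_dirs(check, ref)
```

As a script, this would end with `sys.exit(main(sys.argv))`. Keeping the comparison in a function also makes it callable from a unit test without spawning a process.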
