Description
Hi!
I am trying to parse a few different MaxQuant output files and the modified sequences extracted from the file are not parsed as ProForma correctly. Here is what I am dealing with. Here is one row from one file:
Raw file Scan number Scan index Sequence Length Missed cleavages Modifications Modified sequence Oxidation (M) Probabilities Oxidation (M) Score Diffs Oxidation (M) Proteins Charge Fragmentation Mass analyzer Type Scan event number
01974c_BA1-TUM_missing_first_1_01_01-3xHCD-1h-R4 18859 16929 ACVINGMQLK 10 0 Oxidation (M) _ACVINGM(ox)QLK_ ACVINGM(1)QLK ACVINGM(101.9)QLK 1 TUM_missing_first_1 2 HCD FTMS MULTI-MSMS 31 0 575.29138 1148.5682 0.17725 1.7827292 24.19 0.00036581 101.9 100.71 101.9 1 1 0.812871 0.009392453 0.1333722 18828 6096732 0.426561 -2 0.06135368 y1;y2;y3;y4;y5;y6;y7;y8;y9;y1-NH3;y6-NH3;y7-NH3;y8-NH3;y8(2+);a2;b2;b3;b4;b5 71809.2;29973.1;24011.2;32854.1;190928.3;556904.4;805166.8;685423.2;105967.8;11586.8;26761.8;29306.5;32482.7;10282.6;114158.7;652840.1;435104.2;53264.8;28631 -0.0004923382;-0.0005195443;-0.003007707;0.0006837579;0.0004915272;-0.0003559902;-0.0005144462;0.0002259294;-0.00348233;0.0003633455;-0.00334429;-0.002593298;-0.006689147;0.003100681;4.57087E-05;-6.530397E-05;-0.0001010637;-0.0003278859;-0.002078173 -3.34666;-1.996731;-7.746661;1.277359;0.8298453;-0.5039816;-0.6278023;0.2459745;-3.228739;2.79312;-4.851494;-3.231865;-7.420119;6.744212;0.2239743;-0.2813915;-0.305196;-0.7381029;-3.722506 147.113296508789;260.197387695313;388.258453369141;535.290161132813;592.311817087127;706.355592051727;819.439814488134;918.507488028666;1078.54184448957;130.085891723633;689.33203125;802.415344238281;901.487854003906;459.75439453125;204.080078125;232.075103759766;331.143553435689;444.227844238281;558.272521972656 19 0.5166685 0.2289157 None Unknown 101.8962;1.189599;0.2601814 ACVINGMQLK;LKDSEGSGTAGK;DAHKSEVAHR _ACVINGM(ox)QLK_;_LKDSEGSGTAGK_;_DAHKSEVAHR_ 66 8 7 7 15 1
And now a row from another file:
Raw file Scan number Scan index Sequence Length Missed cleavages Modifications Modified sequence Oxidation (M) Probabilities Phospho (STY) Probabilities Oxidation (M) Score diffs Phospho (STY) Score diffs Acetyl (Protein N-term) Oxidation (M)
OXPAL230121_44 14613 7897 AAAEGEMK 8 0 Oxidation (M) _AAAEGEM(Oxidation (M))K_ AAAEGEM(1)K AAAEGEM(79)K 0 1 0 P0A9B2 gapA Glyceraldehyde-3-phosphate dehydrogenase A 2 HCD FTMS MULTI-MSMS 1 0.0 411.68674 821.35892 0.31799 0.00013091 -0.57361634 25.967 0.0047701 79.116 68.3 79.116 1.0 1 0 0 0 14612 8211736.5 0.0651129635157789 -8 0.0703334808349609 y1;y2;y4;y5;y6;y7;y5-H2O;y6-H2O;y1-NH3;a2;b2;b3 162904.734375;75995.5625;159223.859375;106600.2265625;201834.5625;48688.8359375;47573.99609375;7619.5693359375;65347.02734375;367253.65625;314543.28125;40487.2734375 2.646063904876428E-05;-0.0001748173209534798;0.00041258822591316857;-0.0005543576077116086;-0.00011734497638826724;0.00025578379938906437;-7.163136046983709E-05;-0.0024110601625579875;-6.3900626571467E-05;3.3939142966232794E-05;3.7367395322007724E-05;-3.99617186985779E-06 0.17986635464753814;-0.5943167934958867;0.8591796057282154;-0.9098936188837109;-0.17249205021037203;0.34044188222478894;-0.12115356235845015;-3.6405240676221244;-0.4912171170462424;0.2949010231855326;0.2611616737682475;-0.018663355086892004 147.11277770996094;294.148378216221;480.2118476304741;609.2554076725077;680.2920844476763;751.3288251067005;591.2443602599604;662.2838134765625;130.08631896972656;115.08655548095703;143.0814666748047;214.11862182617188 12 0.303405304480312 0.0923076923076923 Unknown 79.11602402068965;10.816500709941389;10.816500709941389 AAAEGEMK;ALNDMDK;SGDEWTK _AAAEGEM(Oxidation (M))K_;_ALNDM(Oxidation (M))DK_;_SGDEWTK_ 177 495 7 7 133 639
So when reading the PSMs with psm_utils, you get the following peptidoforms: Peptidoform('ACVINGM[ox]QLK/2') and Peptidoform('AAAEGEM[Oxidation (M)]K/2'), respectively.
If you then try to calculate masses, neither peptidoform gives the correct result. The first one actually resolves ox as carboxymethyl, because the last-ditch resolution attempt in Pyteomics is currently a very permissive Unimod search, while the second one just raises an exception. In both files there is a Modifications column that gives the modification in the same form: Oxidation (M). It looks like if we remove the site, we can use the name and have a much better chance of getting consistent ProForma. However, I'm not sure how many other kinds of MaxQuant tables are out there.
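To make the idea concrete, here is a rough sketch of what I mean by "remove the site and use the name". It is not psm_utils or Pyteomics code: `strip_site`, `split_labels`, `resolve`, and `to_proforma` are hypothetical helpers that only use the standard library. It assumes that the Modifications column is comma-separated with an optional leading count (e.g. "2 Oxidation (M)"), that unmodified rows say "Unmodified", and that short labels such as (ox) are lowercase prefixes of the full modification name. None of that is verified against all MaxQuant versions.

```python
import re


def strip_site(mod: str) -> str:
    """'Oxidation (M)' -> 'Oxidation'; '2 Oxidation (M)' -> 'Oxidation'."""
    mod = re.sub(r"^\d+\s+", "", mod.strip())   # drop a leading count, if any
    return re.sub(r"\s*\([^()]*\)$", "", mod)   # drop a trailing "(site)"


def split_labels(modified_sequence: str):
    """Yield ('aa', residue) and ('mod', label), allowing nested parentheses."""
    seq = modified_sequence.strip("_")
    i = 0
    while i < len(seq):
        if seq[i] != "(":
            yield "aa", seq[i]
            i += 1
            continue
        depth, j = 1, i + 1
        while j < len(seq) and depth:
            depth += {"(": 1, ")": -1}.get(seq[j], 0)
            j += 1
        if depth:
            raise ValueError(f"Unbalanced parentheses in {modified_sequence!r}")
        yield "mod", seq[i + 1 : j - 1]
        i = j


def resolve(label: str, names: list) -> str:
    """Map a sequence label to exactly one name from the Modifications column."""
    stripped = strip_site(label)
    matches = {
        name
        for name in names
        # Either the full name with its site removed, or (assumption) a
        # lowercase prefix abbreviation such as 'ox' for 'Oxidation'.
        if stripped == name or name.lower().startswith(label.lower())
    }
    if len(matches) != 1:
        raise ValueError(f"Cannot resolve {label!r} against {names!r}")
    return matches.pop()


def to_proforma(modified_sequence: str, modifications: str) -> str:
    """Build a ProForma string that uses site-free modification names."""
    names = [
        strip_site(m)
        for m in modifications.split(",")
        if m.strip() and m.strip() != "Unmodified"
    ]
    out, seen_residue = [], False
    for kind, text in split_labels(modified_sequence):
        if kind == "aa":
            out.append(text)
            seen_residue = True
        else:
            # A label before the first residue is treated as N-terminal.
            out.append(f"[{resolve(text, names)}]" + ("" if seen_residue else "-"))
    return "".join(out)


print(to_proforma("_ACVINGM(ox)QLK_", "Oxidation (M)"))
print(to_proforma("_AAAEGEM(Oxidation (M))K_", "Oxidation (M)"))
```

With the two rows above, both calls end up with the same name, Oxidation, which is the consistency I am after. Whether that name then resolves to the right mass still depends on it existing under that exact name in Unimod, and the prefix matching will refuse to guess when two modifications share a prefix.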