Peptidoforms parsed from MaxQuant output are not always valid #138

@levitsky

Hi!

I am trying to parse a few different MaxQuant output files, and the modified sequences extracted from them are not converted to valid ProForma. Here is one row from the first file:

Raw file	Scan number	Scan index	Sequence	Length	Missed cleavages	Modifications	Modified sequence	Oxidation (M) Probabilities	Oxidation (M) Score Diffs	Oxidation (M)	Proteins	Charge	Fragmentation	Mass analyzer	Type	Scan event number
01974c_BA1-TUM_missing_first_1_01_01-3xHCD-1h-R4	18859	16929	ACVINGMQLK	10	0	Oxidation (M)	_ACVINGM(ox)QLK_	ACVINGM(1)QLK	ACVINGM(101.9)QLK	1	TUM_missing_first_1	2	HCD	FTMS	MULTI-MSMS	31	0	575.29138	1148.5682	0.17725	1.7827292	24.19	0.00036581	101.9	100.71	101.9	1	1	0.812871	0.009392453	0.1333722	18828	6096732	0.426561	-2	0.06135368	y1;y2;y3;y4;y5;y6;y7;y8;y9;y1-NH3;y6-NH3;y7-NH3;y8-NH3;y8(2+);a2;b2;b3;b4;b5	71809.2;29973.1;24011.2;32854.1;190928.3;556904.4;805166.8;685423.2;105967.8;11586.8;26761.8;29306.5;32482.7;10282.6;114158.7;652840.1;435104.2;53264.8;28631	-0.0004923382;-0.0005195443;-0.003007707;0.0006837579;0.0004915272;-0.0003559902;-0.0005144462;0.0002259294;-0.00348233;0.0003633455;-0.00334429;-0.002593298;-0.006689147;0.003100681;4.57087E-05;-6.530397E-05;-0.0001010637;-0.0003278859;-0.002078173	-3.34666;-1.996731;-7.746661;1.277359;0.8298453;-0.5039816;-0.6278023;0.2459745;-3.228739;2.79312;-4.851494;-3.231865;-7.420119;6.744212;0.2239743;-0.2813915;-0.305196;-0.7381029;-3.722506	147.113296508789;260.197387695313;388.258453369141;535.290161132813;592.311817087127;706.355592051727;819.439814488134;918.507488028666;1078.54184448957;130.085891723633;689.33203125;802.415344238281;901.487854003906;459.75439453125;204.080078125;232.075103759766;331.143553435689;444.227844238281;558.272521972656	19	0.5166685	0.2289157	None	Unknown		101.8962;1.189599;0.2601814	ACVINGMQLK;LKDSEGSGTAGK;DAHKSEVAHR	_ACVINGM(ox)QLK_;_LKDSEGSGTAGK_;_DAHKSEVAHR_	66	8	7	7	15	1

and now another file:

Raw file	Scan number	Scan index	Sequence	Length	Missed cleavages	Modifications	Modified sequence	Oxidation (M) Probabilities	Phospho (STY) Probabilities	Oxidation (M) Score diffs	Phospho (STY) Score diffs	Acetyl (Protein N-term)	Oxidation (M)
OXPAL230121_44	14613	7897	AAAEGEMK	8	0	Oxidation (M)	_AAAEGEM(Oxidation (M))K_	AAAEGEM(1)K		AAAEGEM(79)K		0	1	0	P0A9B2	gapA	Glyceraldehyde-3-phosphate dehydrogenase A	2	HCD	FTMS	MULTI-MSMS	1	0.0	411.68674	821.35892	0.31799	0.00013091	-0.57361634	25.967	0.0047701	79.116	68.3	79.116	1.0	1	0	0	0	14612	8211736.5	0.0651129635157789	-8	0.0703334808349609		y1;y2;y4;y5;y6;y7;y5-H2O;y6-H2O;y1-NH3;a2;b2;b3	162904.734375;75995.5625;159223.859375;106600.2265625;201834.5625;48688.8359375;47573.99609375;7619.5693359375;65347.02734375;367253.65625;314543.28125;40487.2734375	2.646063904876428E-05;-0.0001748173209534798;0.00041258822591316857;-0.0005543576077116086;-0.00011734497638826724;0.00025578379938906437;-7.163136046983709E-05;-0.0024110601625579875;-6.3900626571467E-05;3.3939142966232794E-05;3.7367395322007724E-05;-3.99617186985779E-06	0.17986635464753814;-0.5943167934958867;0.8591796057282154;-0.9098936188837109;-0.17249205021037203;0.34044188222478894;-0.12115356235845015;-3.6405240676221244;-0.4912171170462424;0.2949010231855326;0.2611616737682475;-0.018663355086892004	147.11277770996094;294.148378216221;480.2118476304741;609.2554076725077;680.2920844476763;751.3288251067005;591.2443602599604;662.2838134765625;130.08631896972656;115.08655548095703;143.0814666748047;214.11862182617188	12	0.303405304480312	0.0923076923076923		Unknown		79.11602402068965;10.816500709941389;10.816500709941389	AAAEGEMK;ALNDMDK;SGDEWTK	_AAAEGEM(Oxidation (M))K_;_ALNDM(Oxidation (M))DK_;_SGDEWTK_				177	495	7	7	133	639

So when reading the PSMs with psm_utils, you get the following peptidoforms: Peptidoform('ACVINGM[ox]QLK/2') and Peptidoform('AAAEGEM[Oxidation (M)]K/2'), respectively.

If you then try to calculate masses, neither gives the correct result. The first one actually resolves ox as carboxymethyl, because the last-ditch resolution attempt in Pyteomics is currently a very permissive Unimod search; the second just raises an exception. Both files have a Modifications column that gives the modification in the same form: Oxidation (M). It looks like if we strip the site from that value, we can use the remaining name and have a much better chance of getting consistent ProForma. However, I'm not sure how many other kinds of MaxQuant tables are out there.
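
A rough sketch of that idea in plain Python, without touching psm_utils or Pyteomics. All function names here are hypothetical. It assumes that the Modifications column is a comma-separated list of names with an optional leading count, that a short label such as ox is a case-insensitive prefix of exactly one of those names, and that a modification written before the first residue is N-terminal. None of this has been checked against other MaxQuant versions.

```python
import re


def strip_site(mod_name):
    """Drop the trailing site: 'Oxidation (M)' -> 'Oxidation'."""
    return re.sub(r"\s*\([^()]*\)$", "", mod_name.strip())


def split_modifications(modifications):
    """Split the Modifications column into full names such as 'Oxidation (M)'."""
    names = []
    for entry in modifications.split(","):
        # Assumption: entries may carry a leading count, e.g. '2 Oxidation (M)'.
        entry = re.sub(r"^\d+\s+", "", entry.strip())
        if entry and entry != "Unmodified":
            names.append(entry)
    return names


def resolve_label(label, full_names):
    """Map a label from Modified sequence to a site-free modification name."""
    if label in full_names:
        return strip_site(label)
    # Assumption: short labels ('ox') are prefixes of the full name.
    candidates = {
        strip_site(name)
        for name in full_names
        if name.lower().startswith(label.lower())
    }
    if len(candidates) == 1:
        return candidates.pop()
    raise ValueError(f"Cannot resolve {label!r} against {full_names!r}")


def maxquant_to_proforma(modified_sequence, modifications):
    """Convert e.g. '_ACVINGM(ox)QLK_' to a ProForma-style string."""
    full_names = split_modifications(modifications)
    seq = modified_sequence.strip("_")
    out = []
    i = 0
    while i < len(seq):
        if seq[i] != "(":
            out.append(seq[i])
            i += 1
            continue
        # Find the matching ')' while allowing nesting, as in '(Oxidation (M))'.
        depth = 1
        j = i + 1
        while j < len(seq) and depth:
            if seq[j] == "(":
                depth += 1
            elif seq[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError(f"Unbalanced parentheses in {modified_sequence!r}")
        name = resolve_label(seq[i + 1 : j - 1], full_names)
        # A modification before the first residue is written as N-terminal.
        out.append(f"[{name}]-" if i == 0 else f"[{name}]")
        i = j
    return "".join(out)


print(maxquant_to_proforma("_ACVINGM(ox)QLK_", "Oxidation (M)"))
print(maxquant_to_proforma("_AAAEGEM(Oxidation (M))K_", "Oxidation (M)"))
```

For the two rows above, both calls produce the same tag, [Oxidation], on the methionine. Raising on an ambiguous or unknown label is deliberate: a hard failure seems safer than silently resolving to the wrong modification.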
