Description
Describe the bug
align_mapper.p_to_g currently fails when converting protein coordinates to genomic coordinates if the amino acid is encoded by a split codon across an exon junction.
A concrete example is BRAF N581S, which corresponds to a codon split across two exons in transcript NM_004333.6.
Steps to reproduce
align_mapper.p_to_g(
    p_ac='NP_004324.2',
    p_start_pos=580,
    p_end_pos=580,
)
The call raises an error:
Unable to find transcript alignment for query:
SELECT hgnc, tx_ac, tx_start_i, tx_end_i, alt_ac, alt_start_i,
alt_end_i, alt_strand, alt_aln_method, ord, tx_exon_id, alt_exon_id
FROM uta_20241220.tx_exon_aln_v
WHERE tx_ac='NM_004333.6'
AND alt_ac LIKE 'NC_00%'
AND alt_aln_method='splign'
AND 1966 BETWEEN tx_start_i AND tx_end_i
AND 1969 BETWEEN tx_start_i AND tx_end_i
ORDER BY CAST(
SUBSTR(alt_ac, position('.' in alt_ac) + 1, LENGTH(alt_ac)) AS INT
)
Expected behavior
p_to_g should be able to handle protein positions whose underlying codon spans an exon junction and return the correct genomic coordinates (potentially as a compound or multi-interval mapping).
Acceptance Criteria
align_mapper.p_to_g successfully maps protein positions whose underlying codon spans an exon junction (split codons).
Possible reason(s)/Suggested Fix
BRAF N581 is encoded by a split codon across an exon boundary in NM_004333.6.
Relevant exon alignment records from uta_20241220.tx_exon_aln_v:
SELECT *
FROM uta_20241220.tx_exon_aln_v
WHERE tx_ac = 'NM_004333.6'
AND alt_ac LIKE 'NC_00%'
AND alt_aln_method = 'splign'
AND tx_start_i < 1975
AND tx_end_i > 1960
ORDER BY tx_start_i, tx_end_i;
The codon corresponding to protein position 581 spans transcript positions 1966–1969 (the values used in the failing query above), which cross the boundary between exon ord=13 and ord=14.
However, the current query requires both transcript positions (1966 and 1969) to fall between tx_start_i and tx_end_i of a single exon alignment row, so no row matches when the codon is split across two exons.
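To make the difference concrete, here is a minimal sketch that contrasts the containment test used by the failing query with the overlap test used by the diagnostic query in this issue. The exon boundaries in `ROWS` are made up for illustration and are not the real NM_004333.6 values; `ExonRow`, `rows_containing`, and `rows_overlapping` are hypothetical names, not part of the library.

```python
from typing import NamedTuple


class ExonRow(NamedTuple):
    """Subset of tx_exon_aln_v columns needed for the comparison."""
    ord: int
    tx_start_i: int
    tx_end_i: int


# Hypothetical boundaries: the junction is placed at 1967 purely for illustration.
ROWS = [ExonRow(13, 1800, 1967), ExonRow(14, 1967, 2100)]


def rows_containing(rows, start, end):
    """Mimic the failing query: both positions must sit inside one row."""
    return [
        r for r in rows
        if r.tx_start_i <= start <= r.tx_end_i
        and r.tx_start_i <= end <= r.tx_end_i
    ]


def rows_overlapping(rows, start, end):
    """Mimic the diagnostic query: any row that overlaps the interval."""
    return [r for r in rows if r.tx_start_i < end and r.tx_end_i > start]


print(rows_containing(ROWS, 1966, 1969))                    # no single row matches
print([r.ord for r in rows_overlapping(ROWS, 1966, 1969)])  # both exons overlap
```

With these illustrative rows, the containment test finds nothing while the overlap test returns both exons, which is the behavior reported here.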
Suggestion
Consider enhancing p_to_g to:
detect codons spanning exon junctions, and
support mapping them by aggregating multiple exon alignment segments rather than requiring a single tx_exon_aln_v row.
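One possible shape for the aggregation step is sketched below. It is not the library's implementation: `ExonAln` and `tx_to_genomic_segments` are hypothetical names, the coordinates are invented, and the sketch assumes inter-residue (0-based, half-open) coordinates and ungapped alignments within each exon. Only the column names come from tx_exon_aln_v.

```python
from typing import NamedTuple


class ExonAln(NamedTuple):
    """Subset of tx_exon_aln_v columns (names taken from the view)."""
    tx_start_i: int
    tx_end_i: int
    alt_start_i: int
    alt_end_i: int
    alt_strand: int  # 1 for plus strand, -1 for minus strand


def tx_to_genomic_segments(rows, tx_start, tx_end):
    """Map a transcript interval to one genomic interval per overlapping exon.

    Assumes ungapped exon alignments, so offsets carry over one-to-one.
    """
    segments = []
    for r in sorted(rows, key=lambda row: row.tx_start_i):
        lo = max(tx_start, r.tx_start_i)
        hi = min(tx_end, r.tx_end_i)
        if lo >= hi:
            continue  # this exon does not overlap the requested interval
        off_lo = lo - r.tx_start_i
        off_hi = hi - r.tx_start_i
        if r.alt_strand == 1:
            segments.append((r.alt_start_i + off_lo, r.alt_start_i + off_hi))
        else:
            # Minus strand: transcript offsets count back from alt_end_i.
            segments.append((r.alt_end_i - off_hi, r.alt_end_i - off_lo))
    return segments


# Invented minus-strand exons with a junction at transcript position 1967.
rows = [
    ExonAln(1800, 1967, 5000, 5167, -1),
    ExonAln(1967, 2100, 4000, 4133, -1),
]
print(tx_to_genomic_segments(rows, 1966, 1969))  # [(5000, 5001), (4131, 4133)]
```

The two returned intervals cover one and two bases respectively, together accounting for the three bases of the codon. Whether p_to_g should return such a list or a single compound location is a design question for the maintainers.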
Environment & Version
None