
Backfill publication associations on legacy score calibrations #668

@bencap


Summary
The score calibration publication source validator (functional_classifications_require_publication_sources) currently lives on ScoreCalibrationModify instead of ScoreCalibrationBase. This was done intentionally so that legacy calibrations without publication associations can still be serialized for API read responses. Once all existing calibrations are backfilled with appropriate publications, the validator should be moved to ScoreCalibrationBase so it applies on reads as well.

See the TODO comment in src/mavedb/view_models/score_calibration.py on ScoreCalibrationModify.

Requirements
Calibrations with functional classifications must have:

  • At least one method source (relation = 'method')
  • At least one threshold source (relation = 'threshold')
  • At least one classification source (relation = 'classification') if any functional classification has an acmg_classification
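As a rough illustration of the rule above, and of why attaching it to the write model only lets legacy rows still serialize on reads, here is a sketch in plain Python dataclasses rather than the project's real view models, so it runs without dependencies. The dataclass shapes, the fields `acmg_classification` and `relation` as modeled here, and the helper `missing_publication_relations` are hypothetical simplifications. In MaveDB the check is the validator functional_classifications_require_publication_sources.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionalClassification:
    # Hypothetical stand-in: keeps only the field the rule depends on.
    acmg_classification: Optional[str] = None


@dataclass
class PublicationSource:
    identifier: str
    relation: str  # one of "method", "threshold", "classification"


def missing_publication_relations(
    classifications: list[FunctionalClassification],
    sources: list[PublicationSource],
) -> set[str]:
    """Return the relation types a calibration still needs a publication for."""
    if not classifications:
        # The rule only applies to calibrations with functional classifications.
        return set()
    required = {"method", "threshold"}
    if any(fc.acmg_classification is not None for fc in classifications):
        required.add("classification")
    present = {src.relation for src in sources}
    return required - present


@dataclass
class ScoreCalibrationBase:
    # Simplified read model: no publication check, so legacy rows still load.
    functional_classifications: list[FunctionalClassification] = field(default_factory=list)
    publication_sources: list[PublicationSource] = field(default_factory=list)


@dataclass
class ScoreCalibrationModify(ScoreCalibrationBase):
    # Simplified write model: enforces the publication requirement on create/update.
    def __post_init__(self) -> None:
        missing = missing_publication_relations(
            self.functional_classifications, self.publication_sources
        )
        if missing:
            raise ValueError(f"missing publication sources for: {sorted(missing)}")
```

In this sketch, step 4 below corresponds to moving `__post_init__` up to `ScoreCalibrationBase`. After that, constructing the read model from an un-backfilled calibration would raise, which is why the backfill has to land first.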

Diagnostic SQL
Identify calibrations that have functional classifications but are missing required publication associations:

```sql
SELECT
    sc.id AS calibration_id,
    sc.urn AS calibration_urn,
    sc.title,
    BOOL_OR(fc.acmg_classification_id IS NOT NULL) AS has_acmg,
    COALESCE(
        ARRAY_AGG(DISTINCT scpi.relation) FILTER (WHERE scpi.relation IS NOT NULL),
        '{}'
    ) AS existing_relations
FROM score_calibrations sc
JOIN score_calibration_functional_classifications fc
    ON fc.calibration_id = sc.id
LEFT JOIN score_calibration_publication_identifiers scpi
    ON scpi.score_calibration_id = sc.id
GROUP BY sc.id, sc.urn, sc.title
HAVING
    -- Missing method sources
    NOT ('method' = ANY(
        COALESCE(ARRAY_AGG(DISTINCT scpi.relation) FILTER (WHERE scpi.relation IS NOT NULL), '{}')
    ))
    -- Missing threshold sources
    OR NOT ('threshold' = ANY(
        COALESCE(ARRAY_AGG(DISTINCT scpi.relation) FILTER (WHERE scpi.relation IS NOT NULL), '{}')
    ))
    -- Missing classification sources when ACMG is present
    OR (
        BOOL_OR(fc.acmg_classification_id IS NOT NULL)
        AND NOT ('classification' = ANY(
            COALESCE(ARRAY_AGG(DISTINCT scpi.relation) FILTER (WHERE scpi.relation IS NOT NULL), '{}')
        ))
    )
ORDER BY sc.id;
```
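The query above relies on PostgreSQL-specific aggregates (ARRAY_AGG ... FILTER, BOOL_OR, = ANY). To sanity-check the HAVING logic on fixture data before running it against production, the same conditions can be restated with portable conditional aggregation (MAX(CASE ...)) and exercised against an in-memory SQLite database. This is a sketch only: the fixture tables keep just the columns the query touches, relation is stored as plain text, and the rows and URNs are made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE score_calibrations (id INTEGER PRIMARY KEY, urn TEXT, title TEXT);
CREATE TABLE score_calibration_functional_classifications (
    id INTEGER PRIMARY KEY, calibration_id INTEGER, acmg_classification_id INTEGER);
CREATE TABLE score_calibration_publication_identifiers (
    score_calibration_id INTEGER, relation TEXT);
"""

# MAX(CASE ...) stands in for BOOL_OR and for the "'x' = ANY(ARRAY_AGG(...))"
# membership tests. A NULL relation (no publications at all) falls to ELSE 0,
# so calibrations with no associations are still flagged.
DIAGNOSTIC_SQL = """
SELECT
    sc.id,
    sc.urn,
    MAX(CASE WHEN fc.acmg_classification_id IS NOT NULL THEN 1 ELSE 0 END) AS has_acmg
FROM score_calibrations sc
JOIN score_calibration_functional_classifications fc
    ON fc.calibration_id = sc.id
LEFT JOIN score_calibration_publication_identifiers scpi
    ON scpi.score_calibration_id = sc.id
GROUP BY sc.id, sc.urn
HAVING
    MAX(CASE WHEN scpi.relation = 'method' THEN 1 ELSE 0 END) = 0
    OR MAX(CASE WHEN scpi.relation = 'threshold' THEN 1 ELSE 0 END) = 0
    OR (
        MAX(CASE WHEN fc.acmg_classification_id IS NOT NULL THEN 1 ELSE 0 END) = 1
        AND MAX(CASE WHEN scpi.relation = 'classification' THEN 1 ELSE 0 END) = 0
    )
ORDER BY sc.id
"""


def build_fixture() -> sqlite3.Connection:
    """Hypothetical fixture: 1 is complete, 2 lacks a classification source,
    3 has no publications, 4 has no functional classifications."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO score_calibrations VALUES (?, ?, ?)",
        [(i, f"urn:example:{i}", f"Calibration {i}") for i in (1, 2, 3, 4)],
    )
    conn.executemany(
        "INSERT INTO score_calibration_functional_classifications "
        "(calibration_id, acmg_classification_id) VALUES (?, ?)",
        [(1, None), (2, 7), (3, None)],
    )
    conn.executemany(
        "INSERT INTO score_calibration_publication_identifiers VALUES (?, ?)",
        [(1, "method"), (1, "threshold"), (2, "method"), (2, "threshold")],
    )
    return conn


def find_incomplete_calibrations(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(DIAGNOSTIC_SQL).fetchall()
```

On this fixture the query should flag calibrations 2 and 3 only; calibration 4 is excluded by the inner join because it has no functional classifications, matching the PostgreSQL version.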

Steps

  1. Run the diagnostic query above against production to identify affected calibrations
  2. Determine the appropriate publications for each affected calibration (likely the publications associated with the parent score set, or the ExCALIBR bioRxiv preprint for ExCALIBR-loaded calibrations)
  3. Write a data migration to insert score_calibration_publication_identifiers rows for each missing relation type
  4. After deploying the migration, move the functional_classifications_require_publication_sources validator from ScoreCalibrationModify to ScoreCalibrationBase and remove the TODO comment
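For step 3, one way to keep the data migration reviewable is to compute the rows to insert in plain Python first and inspect them before writing anything. The sketch below assumes each affected calibration has already been mapped by hand (step 2) to a single publication identifier id; the function `plan_backfill_rows`, the input shapes, and the `publication_identifier_id` column name are hypothetical. Only relations reported missing are emitted, so existing associations are not duplicated.

```python
from typing import NamedTuple


class BackfillRow(NamedTuple):
    # One row of score_calibration_publication_identifiers
    # (publication_identifier_id is an assumed column name).
    score_calibration_id: int
    publication_identifier_id: int
    relation: str


def plan_backfill_rows(
    missing: dict[int, set[str]],
    chosen_publication: dict[int, int],
) -> list[BackfillRow]:
    """Build one row per missing relation for each affected calibration.

    missing: calibration id -> relation types the diagnostic query reports as absent.
    chosen_publication: calibration id -> publication identifier id chosen in step 2.
    """
    rows = []
    for calibration_id in sorted(missing):
        if calibration_id not in chosen_publication:
            # Fail loudly rather than silently skipping an affected calibration.
            raise KeyError(f"no publication chosen for calibration {calibration_id}")
        publication_id = chosen_publication[calibration_id]
        for relation in sorted(missing[calibration_id]):
            rows.append(BackfillRow(calibration_id, publication_id, relation))
    return rows
```

The `missing` mapping can be derived from the diagnostic query's has_acmg and existing_relations columns. Once the planned rows are inserted, rerunning the diagnostic query should return no rows, which is a reasonable precondition for step 4.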

Metadata

Assignees: No one assigned

Labels: app: backend (Task implementation touches the backend), type: maintenance (Maintaining this project)
