Skip to content

Medial-EarlySign/medpython

Repository files navigation

Medial EarlySign Libraries

Pepy Total Downloads PyPI - License GitHub contributors GitHub commit activity

Overview of Medial Infrastructure

Medial EarlySign provides an infrastructure to convert Electronic Medical Records (EMR) - a complex, semi-structured time-series dataset into machine-learning-ready data and reproducible model pipelines. The library is optimized for sparse time-series EMR data and is designed for low memory usage and fast processing at scale. Unlike images or free text, EMR data can be stored in complex format. The infrastructure standardize both the storage and the processing of time-series signals. We can think about this infrastructure as "TensorFlow" of medical data machine learning.

Key benefits at a glance:

  • Fast and memory-efficient processing for large-scale EMR sparse time series where general-purpose libraries (e.g., pandas) are often impractical.
  • Shareable, tested pipelines and methods that save engineering time and reduce duplicated effort.
  • Built-in safeguards to reduce data leakage and time-series-specific overfitting.
  • Production-ready: easily deployable in Docker or minimal Linux images.

This framework is deployed in production across multiple healthcare sites and played a key role in our award-winning submission to the CMS AI Health Outcomes Challenge.

Main contributers from recent years:

Challenges

  • Variety of Questions: Risk prediction (e.g., cancer, CKD), compliance, diagnostics, treatment recommendations
  • Medical Data Complexity: Temporal irregularity, high dimensionality (>100k categories), sparse signals, multiple data types
  • Retrospective Data Issues: Noise, bias, spurious patterns, policy sensitivity

Platform Requirements

  • Performance: Ultra-efficient in memory & time (>100x compare to native python pandas in some cases, mainly in preprocessing)
  • Extensibility: Rich APIs, configurable pipelines, support new data types
  • Minimal Rewriting & Ease Of Usage: JSON‑driven configs, unified codebase, python API to the C library
  • Comprehensive: From "raw" data to model deployment
  • Reproducible & Versioned: Track data, code, models, and parameters

Documentation

Please refer to MR_WIKI for full documentation

Setup

To install the python module:

pip install medpython

Build from source:

python -m pip install -v "medpython @ git+https://github.com/Medial-EarlySign/medpython.git/#subdirectory=Internal/MedPyExport/generate_binding"

To explore the code:

  • Internal/MedPython - for the python module. The folder generate_bindings contains the setup.py to install the module.
  • Internal/AlgoMarker - the folder to compile the minimal shared library for production and running it in distroless container.