Urartu

Urartu is an ML workflow runner built around Pipelines (orchestrators) and Actions (reusable steps), with automatic caching and dependency injection. Develop locally, run the same pipelines on Slurm-managed HPC clusters, and track experiments with Aim.

Installation

pip install urartu

From source:

git clone git@github.com:tamohannes/urartu.git
cd urartu
pip install -e .

Project layout (recommended)

Run the CLI from a project root that contains:

my_project/
├── __init__.py
├── actions/
│   ├── __init__.py
│   └── my_action.py
├── pipelines/
│   ├── __init__.py
│   └── my_pipeline.py
└── configs/
    └── pipeline/
        └── my_pipeline.yaml

Optional (per-user configs):

my_project/
└── configs_<username>/
    ├── aim/
    ├── machine/
    └── slurm/

Quickstart

Create a pipeline config:

# configs/pipeline/my_pipeline.yaml
pipeline_name: my_pipeline
debug: false

pipeline:
  experiment_name: "My pipeline"
  device: auto
  seed: 42

  # Pipeline-level cache policy (propagates to actions)
  cache_enabled: true
  force_rerun: false
  cache_max_age_days: 7

  actions:
    - action_name: my_action
      # Action-specific config (merged with pipeline-level common settings)
      some_param: 123
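
The cache-policy keys above can also be overridden at launch time with the same pipeline.<key>=<value> syntax described under "CLI overrides and config groups" below, for example (assuming boolean values are parsed the same way as the numeric overrides shown there):

urartu my_pipeline pipeline.force_rerun=true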

Create the pipeline file:

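# pipelines/my_pipeline.py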
from aim import Run
from omegaconf import DictConfig
from urartu.common import Pipeline


class MyPipeline(Pipeline):
    pass


def main(cfg: DictConfig, aim_run: Run):
    MyPipeline(cfg, aim_run).main()

Create an action:

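# actions/my_action.py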
from omegaconf import DictConfig
from aim import Run
from urartu.common import Action


class MyAction(Action):
    def run(self):
        cache_dir = self.get_cache_entry_dir()
        run_dir = self.get_run_dir()
        # ... compute, write machine-readable artifacts to cache_dir ...
        # ... write plots/reports to run_dir ...

    def get_outputs(self):
        return {
            "cache_dir": str(self.get_cache_entry_dir()),
            "run_dir": str(self.get_run_dir()),
        }

Run it:

urartu my_pipeline

CLI overrides and config groups

  • Overrides set individual config values, e.g. pipeline.seed=123, pipeline.device=cuda, descr="my run".
  • Config-group selectors (unquoted) load *.yaml files from:
    • configs_<username>/<group>/<selector>.yaml
    • configs/<group>/<selector>.yaml
    • built-in defaults in the Urartu package

Examples:

# Select config files (unquoted values)
urartu my_pipeline machine=local slurm=no_slurm aim=no_aim

# Set literal strings (quoted values)
urartu my_pipeline descr="experiment 001" machine="local"
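
Selectors and overrides can be combined in a single call. In the example below, my_cluster is a hypothetical selector that would resolve to configs_<username>/slurm/my_cluster.yaml (or configs/slurm/my_cluster.yaml):

urartu my_pipeline machine=local slurm=my_cluster pipeline.seed=123 descr="seed sweep"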

Notes on outputs and caching

  • Cached, machine-readable artifacts: write under self.get_cache_entry_dir(...) (shared across runs).
  • Run artifacts (plots/reports/logs): write under self.get_run_dir(...) (unique per run); see the sketch below.
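
A minimal sketch of this convention inside an action's run() method, reusing only the get_cache_entry_dir() and get_run_dir() helpers from the Quickstart (the file names and the result payload are illustrative placeholders):

import json
from pathlib import Path

from urartu.common import Action


class MyAction(Action):
    def run(self):
        cache_dir = Path(self.get_cache_entry_dir())
        run_dir = Path(self.get_run_dir())

        # Stand-in for the real computation
        result = {"score": 0.42}

        # Machine-readable artifact: goes to the shared cache so that
        # subsequent runs with the same config can reuse it
        (cache_dir / "result.json").write_text(json.dumps(result))

        # Human-readable report: tied to this particular run
        (run_dir / "report.txt").write_text(f"score = {result['score']}\n")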

Citation

If you find Urartu helpful in your research, please cite it:

@software{Tamoyan_Urartu_2023,
  author = {Hovhannes Tamoyan},
  license = {Apache-2.0},
  month = {8},
  title = {{Urartu}},
  url = {https://github.com/tamohannes/urartu},
  year = {2023}
}
