
Copier template for METPO/KG-Microbe ontology curation workflow


berkeleybop/metpo-kgm-copier


METPO/KG-Microbe Curation Copier Template

A Copier template for creating structured ontology curation workflows, specifically designed for the METPO (Microbial Environment and Traits Ontology) and KG-Microbe knowledge graph projects.

Features

🎯 Curation Workflow

  • Google Sheets Integration: Automatically fetch and split ROBOT templates from Google Sheets
  • Overlapping Assignments: Configurable overlap for inter-curator agreement assessment
  • LLM-Guided Curation: Safe, structured prompts for using LLMs in curation
  • Version Control: Git-based workflow with issue → branch → commits → PR pattern

✅ Quality Assurance

  • OBO Foundry Validators: Automatic checking of FP-006 (textual definitions) and FP-012 (naming conventions)
  • Definition Quality Checks: Genus-differentia form, circularity, source validation
  • CI/CD: GitHub Actions for automated validation on pull requests
  • Pre-commit Hooks: Catch issues before they're committed
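The template's FP-012 validator is not reproduced in this README. As an illustration only, a label check of this kind might look like the sketch below. The specific rules (no surrounding whitespace, no underscores, no repeated spaces, case-insensitive uniqueness) are assumptions chosen for the example, not the official FP-012 wording or the template's actual code.

```python
"""Illustrative label checks in the spirit of OBO Foundry FP-012 (naming conventions).

Hypothetical sketch: these heuristics are assumptions, not the template's validator.
"""
from collections import Counter


def check_labels(labels: list[str]) -> dict[str, list[str]]:
    """Map each problematic label to the list of issues found for it."""
    # Count normalized labels so near-duplicates differing only in case are caught.
    counts = Counter(label.strip().lower() for label in labels)
    problems: dict[str, list[str]] = {}
    for label in labels:
        issues: list[str] = []
        if not label.strip():
            issues.append("empty label")
        if label != label.strip():
            issues.append("leading or trailing whitespace")
        if "  " in label:
            issues.append("repeated spaces")
        if "_" in label:
            issues.append("contains underscore")
        if counts[label.strip().lower()] > 1:
            issues.append("duplicate label (case-insensitive)")
        if issues:
            problems[label] = issues
    return problems


if __name__ == "__main__":
    for label, issues in check_labels(["aerobic growth", "Aerobic growth", "optimal_temperature"]).items():
        print(f"{label!r}: {', '.join(issues)}")
```

Returning a mapping rather than raising on the first problem lets a curator see every issue in a file at once.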

🛠️ Developer Experience

  • Type Checking: Full mypy support with type hints
  • Linting: Ruff for code quality
  • Testing: pytest with example tests
  • Just Commands: Simple, memorable commands for all operations
  • Comprehensive Documentation: README, CURATION_GUIDE, and inline help

🎓 Intern-Friendly

  • Step-by-Step Guides: Detailed CURATION_GUIDE.md for beginners
  • Approved LLM Prompts: Templates following best practices
  • Workflow Helpers: Commands like just new-branch and just progress
  • Learning-Oriented: Teaches git, Python, ontology principles, and LLM best practices
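The README does not say how `just new-branch` names branches. One possible convention, sketched here purely as an assumption, derives the branch name from the GitHub issue number and title so that each branch traces back to its issue in the issue → branch → commits → PR workflow. The `issue-<number>-<slug>` scheme and the helper names are hypothetical.

```python
"""Hypothetical helper for an issue -> branch workflow (naming scheme is an assumption)."""
import re
import subprocess


def branch_name(issue_number: int, title: str, max_slug: int = 40) -> str:
    """Build a branch name such as 'issue-12-define-aerobic-growth'."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Truncate long titles, then drop any hyphen left dangling by the cut.
    slug = slug[:max_slug].rstrip("-")
    return f"issue-{issue_number}-{slug}" if slug else f"issue-{issue_number}"


def create_branch(issue_number: int, title: str) -> str:
    """Create and switch to the branch ('git switch' needs Git 2.23 or newer)."""
    name = branch_name(issue_number, title)
    subprocess.run(["git", "switch", "-c", name], check=True)
    return name
```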

Prerequisites

  • Python >= 3.10
  • uv (installation)
  • copier with jinja2-time:
    uv tool install copier --with jinja2-time
  • just (optional but recommended):
    uv tool install rust-just

Quick Start

1. Generate a New Workspace

# Create and navigate to your project directory
mkdir metpo-curation-workspace
cd metpo-curation-workspace

# Generate from template
copier copy --trust https://github.com/berkeleybop/metpo-kgm-copier .

You'll be prompted for:

  • Project name (default: metpo-curation-workspace)
  • Python package name (default: metpo_curation)
  • Your name and email
  • GitHub organization
  • Google Sheets ID and GID
  • Number of curators (default: 3)
  • Overlap percentage (default: 10)

2. Set Up the Workspace

just setup

3. Start Curating

# Fetch assignments from Google Sheets
just fetch-assignments

# Read the workflow guide
cat CURATION_GUIDE.md

# Validate your work
just validate-all

Template Structure

metpo-kgm-copier/
├── copier.yaml              # Copier configuration
├── template/                # Template files
│   ├── src/                 # Python source (validators, splitter)
│   ├── prompts/templates/   # LLM prompt templates
│   ├── tests/               # Unit tests
│   ├── .github/workflows/   # CI/CD
│   ├── justfile.jinja       # Command runner
│   └── README.md.jinja      # Project documentation
└── README.md                # This file

Customization

Changing Defaults

Edit copier.yaml to change default values:

  • Number of curators
  • Overlap percentage
  • Google Sheets ID
  • License, Python version, etc.

Adding New Validators

Add validation logic to template/src/{{project_slug}}/validators.py.
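This README does not describe how validators.py is organized. One common pattern, sketched below under the assumption that each validator receives a TSV row as a dictionary and returns a list of error messages, is a small registry: decorate a new function and it runs alongside the existing checks. The `definition` column name and the trailing-period rule are hypothetical examples.

```python
"""Sketch of a validator registry; the row format and the example rule are hypothetical."""
from collections.abc import Callable

Row = dict[str, str]
Validator = Callable[[Row], list[str]]

VALIDATORS: list[Validator] = []


def register(check: Validator) -> Validator:
    """Decorator: add a check to the list run by validate_row."""
    VALIDATORS.append(check)
    return check


@register
def definition_ends_with_period(row: Row) -> list[str]:
    """Example rule: a non-empty definition should end with a period."""
    definition = row.get("definition", "").strip()
    if definition and not definition.endswith("."):
        return ["definition should end with a period"]
    return []


def validate_row(row: Row) -> list[str]:
    """Run every registered check and collect all messages."""
    return [message for check in VALIDATORS for message in check(row)]
```

With this shape, adding a rule is one decorated function; nothing else in the runner has to change.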

Adding New Prompts

Create new prompt templates in template/prompts/templates/.
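This README does not specify the file format in prompts/templates/. If a prompt is kept as plain text with named placeholders, Python's `string.Template` is one simple way to fill it in: `substitute` raises `KeyError` when a placeholder is left unfilled, which catches typos before anything is sent to an LLM. The placeholder names and prompt wording below are illustrative assumptions.

```python
"""Sketch: fill a prompt template's placeholders (placeholder names are hypothetical)."""
from pathlib import Path
from string import Template

EXAMPLE_PROMPT = (
    "Propose a genus-differentia definition for the class '$label' "
    "(parent class: $parent). Cite only sources you can verify, "
    "and answer 'unknown' if you cannot."
)


def render_prompt(template_text: str, **fields: str) -> str:
    """Substitute $placeholders; a missing field raises KeyError."""
    return Template(template_text).substitute(fields)


def render_prompt_file(path: Path, **fields: str) -> str:
    """Load a prompt template from disk and fill it in."""
    return render_prompt(path.read_text(encoding="utf-8"), **fields)
```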

Modifying Workflow

Edit template/justfile.jinja to add or modify commands.

For Interns: What You'll Learn

This template is designed to teach marketable skills:

  1. Ontology Development: OBO Foundry principles, ROBOT templates
  2. Version Control: Git branching, commits, pull requests, code review
  3. Python Development: Type hints, testing, linting, modern tools (uv, ruff, mypy)
  4. LLM Best Practices: Prompt engineering, critical evaluation, reproducibility
  5. Scientific Curation: Literature search, source verification, domain knowledge
  6. Collaborative Coding: Issue tracking, PR workflow, documentation

Updating Generated Projects

Projects generated from this template can be updated when the template improves:

cd your-generated-project
copier update --trust

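This re-applies the template using the answers recorded in the project's Copier answers file and merges the template changes into your project, preserving your customizations where it can. Copier expects a clean Git working tree, so commit your work first, then review and resolve any conflicts it reports before committing the update.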

Design Principles

1. Safety First

  • Validators catch common errors
  • Pre-commit hooks prevent bad commits
  • CI/CD ensures quality before merging

2. Reproducibility

  • All prompts and outputs tracked in git
  • Timestamped execution records
  • Clear audit trail from LLM to final definition
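As one way to produce timestamped execution records, the sketch below stores the prompt, the raw output, and SHA-256 hashes of both in a JSON file named by UTC timestamp, so a reviewer can later confirm which prompt produced which text. The record fields, the file naming, and the free-form `model` string are assumptions, not the template's actual format.

```python
"""Sketch: write a timestamped, hash-stamped record of one LLM call (format is an assumption)."""
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def execution_record(prompt: str, output: str, model: str, template_name: str) -> dict[str, str]:
    """Bundle the prompt, output, and provenance fields into one dictionary."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "template": template_name,
        "model": model,
        "prompt_sha256": sha256_text(prompt),
        "output_sha256": sha256_text(output),
        "prompt": prompt,
        "output": output,
    }


def write_record(record: dict[str, str], directory: Path) -> Path:
    """Save the record as JSON, named by timestamp so files sort chronologically."""
    directory.mkdir(parents=True, exist_ok=True)
    stamp = record["timestamp"].replace(":", "")  # colons are not allowed in Windows filenames
    path = directory / f"{stamp}_{record['template']}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path
```

Committing these JSON files alongside the reviewed definitions keeps the LLM step inside the same git history as the rest of the curation.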

3. Learning-Oriented

  • Extensive documentation
  • Helpful error messages
  • Progressive complexity

4. Team Collaboration

  • Fork-friendly workflow
  • Overlap for inter-curator agreement
  • Clear contribution guidelines

Examples

Fetch and Split Assignments

just fetch-assignments

Downloads the ROBOT template from Google Sheets and splits it into per-curator TSV files, with the configured overlap between curators.
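A sketch of how an overlapping split could work, assuming round-robin assignment plus a seeded random sample of rows that is also given to the next curator. ROBOT templates carry two header rows (column names, then template strings), so both are copied into every curator's file. The function names, seeding, and output layout are illustrative assumptions, not the template's actual splitter.

```python
"""Sketch: split ROBOT template rows among curators with partial overlap (assumed strategy)."""
import csv
import io
import random
from pathlib import Path


def split_with_overlap(
    tsv_text: str, n_curators: int = 3, overlap_pct: float = 10.0, seed: int = 42
) -> dict[str, list[list[str]]]:
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # ROBOT templates: row 1 holds column names, row 2 holds template strings.
    headers, data = rows[:2], rows[2:]
    assignments = {f"curator{i + 1}": [list(h) for h in headers] for i in range(n_curators)}
    names = list(assignments)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_overlap = round(len(data) * overlap_pct / 100)
    overlap_idx = set(rng.sample(range(len(data)), n_overlap))
    for i, row in enumerate(data):
        primary = i % n_curators  # round-robin primary assignment
        assignments[names[primary]].append(row)
        if i in overlap_idx and n_curators > 1:
            # Shared rows also go to the next curator for agreement checks.
            assignments[names[(primary + 1) % n_curators]].append(row)
    return assignments


def write_assignments(assignments: dict[str, list[list[str]]], out_dir: Path) -> None:
    """Write one TSV per curator, e.g. assignments/curator1.tsv."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for curator, rows in assignments.items():
        with (out_dir / f"{curator}.tsv").open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)
```

A fixed seed means re-running the split on the same sheet yields the same assignments, which matters when curators have already started work.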

Validate Definitions

just validate-file assignments/curator1.tsv

Checks:

  • Definition exists and is non-empty
  • Follows genus-differentia form
  • No circularity
  • Sources in valid format (PMID:, DOI:, etc.)
  • Label follows FP-012 naming conventions
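The definition checks above can be approximated with regular expressions. The sketch below is heuristic: the "A/An <genus> that/which/..." pattern, the `|` separator between multiple sources, and the accepted source prefixes are assumptions for illustration, and the circularity test only catches the full label reappearing verbatim in its own definition.

```python
"""Heuristic definition checks (illustrative; not the template's validators.py)."""
import re

# Assumed accepted source formats; extend to match project conventions.
SOURCE_PATTERN = re.compile(r"(PMID:\d+|DOI:10\.\d{4,9}/\S+|ISBN:[0-9Xx-]+|https?://\S+)")
# Rough genus-differentia shape: "A/An <genus> ... that|which|whose|where|with ..."
GENUS_DIFFERENTIA = re.compile(r"an?\s+\S.*?\b(that|which|whose|where|with)\b", re.IGNORECASE)


def check_definition(label: str, definition: str, sources: str) -> list[str]:
    """Return a list of problems; an empty list means every heuristic passed."""
    text = definition.strip()
    if not text:
        return ["definition is empty"]
    errors: list[str] = []
    if not GENUS_DIFFERENTIA.match(text):
        errors.append("not in 'A <genus> that <differentia>' form")
    # Circularity: the full label appearing as a phrase inside its own definition.
    label_re = rf"(?<!\w){re.escape(label.strip())}(?!\w)"
    if label.strip() and re.search(label_re, text, re.IGNORECASE):
        errors.append("definition contains the label (possible circularity)")
    cited = [s.strip() for s in sources.split("|") if s.strip()]
    if not cited:
        errors.append("no source given")
    for source in cited:
        if not SOURCE_PATTERN.fullmatch(source):
            errors.append(f"unrecognized source format: {source}")
    return errors
```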

Track Progress

just progress curator1

Shows:

  • Total classes assigned
  • Raw outputs created
  • Reviewed outputs completed
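The README lists what `just progress` reports but not where the files live. The sketch below assumes a hypothetical layout: assignments/<curator>.tsv with ROBOT's two header rows, plus outputs/raw/<curator>/ and outputs/reviewed/<curator>/ holding one file per class. Adjust the paths to match the generated workspace.

```python
"""Sketch of a progress summary; the directory layout is a hypothetical assumption."""
import csv
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Progress:
    assigned: int
    raw: int
    reviewed: int


def count_assigned(tsv_text: str, header_rows: int = 2) -> int:
    """Count non-blank data rows, skipping the ROBOT template's header rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [r for r in reader if any(cell.strip() for cell in r)]
    return max(len(rows) - header_rows, 0)


def count_files(directory: Path) -> int:
    """Count regular files in a directory; a missing directory counts as zero."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.iterdir() if p.is_file())


def progress(root: Path, curator: str) -> Progress:
    """Summarize one curator's assignment and output counts."""
    tsv_text = (root / "assignments" / f"{curator}.tsv").read_text(encoding="utf-8")
    return Progress(
        assigned=count_assigned(tsv_text),
        raw=count_files(root / "outputs" / "raw" / curator),
        reviewed=count_files(root / "outputs" / "reviewed" / curator),
    )
```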

Related Projects

Contributing

Improvements to this template are welcome!

  1. Fork this repository
  2. Create a branch for your changes
  3. Test by materializing a project from your branch: copier copy --trust --vcs-ref your-branch . /tmp/test-project
  4. Submit a pull request

License

BSD-3-Clause (generated projects can instead use MIT or Apache-2.0; the license is a configurable option)

Authors

Created by Montana Smith and team at Lawrence Berkeley National Laboratory for the METPO/KG-Microbe project.

Based on monarch-project-copier by Chris Mungall and David Linke.

Support


Happy Curating! 🦠🔬
