A Copier template for creating structured ontology curation workflows, specifically designed for the METPO (Microbial Environment and Traits Ontology) and KG-Microbe knowledge graph projects.
- Google Sheets Integration: Automatically fetch and split ROBOT templates from Google Sheets
- Overlapping Assignments: Configurable overlap for inter-curator agreement assessment
- LLM-Guided Curation: Safe, structured prompts for using LLMs in curation
- Version Control: Git-based workflow with issue → branch → commits → PR pattern
- OBO Foundry Validators: Automatic checking of FP-006 (textual definitions) and FP-012 (naming conventions)
- Definition Quality Checks: Genus-differentia form, circularity, source validation
- CI/CD: GitHub Actions for automated validation on pull requests
- Pre-commit Hooks: Catch issues before they're committed
- Type Checking: Full mypy support with type hints
- Linting: Ruff for code quality
- Testing: pytest with example tests
- Just Commands: Simple, memorable commands for all operations
- Comprehensive Documentation: README, CURATION_GUIDE, and inline help
- Step-by-Step Guides: Detailed CURATION_GUIDE.md for beginners
- Approved LLM Prompts: Templates following best practices
- Workflow Helpers: Commands like `just new-branch` and `just progress`
- Learning-Oriented: Teaches git, Python, ontology principles, and LLM best practices
- Python >= 3.10
- uv (see its installation instructions)
- copier with jinja2-time:
uv tool install copier --with jinja2-time
- just (optional but recommended):
uv tool install rust-just
# Create and navigate to your project directory
mkdir metpo-curation-workspace
cd metpo-curation-workspace
# Generate from template
copier copy --trust https://github.com/berkeleybop/metpo-kgm-copier .
You'll be prompted for:
- Project name (default: metpo-curation-workspace)
- Python package name (default: metpo_curation)
- Your name and email
- GitHub organization
- Google Sheets ID and GID
- Number of curators (default: 3)
- Overlap percentage (default: 10)
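The Google Sheets ID and GID together identify one tab of one spreadsheet. As a rough sketch of how those two values could become a TSV download URL, the snippet below uses Google's standard export endpoint. It assumes the sheet is readable by anyone with the link; the function name `sheet_export_url` is hypothetical and is not taken from the template.

```python
from urllib.parse import urlencode


def sheet_export_url(sheet_id: str, gid: str, fmt: str = "tsv") -> str:
    """Build the export URL for one tab (gid) of a Google Sheet.

    Hypothetical helper for illustration; the generated project may
    fetch the sheet differently.
    """
    query = urlencode({"format": fmt, "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{query}"


if __name__ == "__main__":
    # Placeholder values; substitute the ID and GID entered at the prompt.
    print(sheet_export_url("YOUR_SHEET_ID", "0"))
```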
just setup
# Fetch assignments from Google Sheets
just fetch-assignments
# Read the workflow guide
cat CURATION_GUIDE.md
# Validate your work
just validate-all
metpo-kgm-copier/
├── copier.yaml                  # Copier configuration
├── template/                    # Template files
│   ├── src/                     # Python source (validators, splitter)
│   ├── prompts/templates/       # LLM prompt templates
│   ├── tests/                   # Unit tests
│   ├── .github/workflows/       # CI/CD
│   ├── justfile.jinja           # Command runner
│   └── README.md.jinja          # Project documentation
└── README.md                    # This file
Edit copier.yaml to change default values:
- Number of curators
- Overlap percentage
- Google Sheets ID
- License, Python version, etc.
Add validation logic to template/src/{{project_slug}}/validators.py.
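As an illustration of the kind of check that could be added there, here is a minimal sketch of a label validator. The function name `check_label`, its signature, and the specific rules are assumptions made for this example; they cover only a simplified subset of what naming conventions typically ask for and do not reproduce the template's actual validators.

```python
def check_label(label: str) -> list[str]:
    """Return a list of problems found in an ontology class label.

    Hypothetical example rules; adapt them to the conventions your
    project actually enforces.
    """
    if not label.strip():
        return ["label is empty"]

    problems: list[str] = []
    if label != label.strip():
        problems.append("label has leading or trailing whitespace")
    if "_" in label:
        problems.append("label contains underscores; use spaces instead")
    if "  " in label:
        problems.append("label contains consecutive spaces")
    return problems


if __name__ == "__main__":
    for candidate in ["thermophilic", "halo_tolerant", " salt  tolerant"]:
        print(repr(candidate), check_label(candidate))
```

Returning a list of messages rather than raising on the first failure lets a curator see every problem with a row in one pass.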
Create new prompt templates in template/prompts/templates/.
Edit template/justfile.jinja to add or modify commands.
This template is designed to teach marketable skills:
- Ontology Development: OBO Foundry principles, ROBOT templates
- Version Control: Git branching, commits, pull requests, code review
- Python Development: Type hints, testing, linting, modern tools (uv, ruff, mypy)
- LLM Best Practices: Prompt engineering, critical evaluation, reproducibility
- Scientific Curation: Literature search, source verification, domain knowledge
- Collaborative Coding: Issue tracking, PR workflow, documentation
Projects generated from this template can be updated when the template improves:
cd your-generated-project
copier update --trust
This will merge template changes into your project, preserving your customizations.
- Validators catch common errors
- Pre-commit hooks prevent bad commits
- CI/CD ensures quality before merging
- All prompts and outputs tracked in git
- Timestamped execution records
- Clear audit trail from LLM to final definition
- Extensive documentation
- Helpful error messages
- Progressive complexity
- Fork-friendly workflow
- Overlap for inter-curator agreement
- Clear contribution guidelines
just fetch-assignments
Creates overlapping TSV files from the Google Sheets ROBOT template.
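One way to picture the overlapping split is sketched below. This is not the template's splitter: the round-robin strategy and the name `split_with_overlap` are assumptions for illustration. It works on data rows only; a ROBOT template's two header rows (column names and template strings) would need to be copied into every output file.

```python
def split_with_overlap(
    rows: list[str], n_curators: int, overlap_pct: float
) -> list[list[str]]:
    """Deal rows out to curators, then share a percentage with a second curator.

    Hypothetical strategy: row i belongs to curator i % n_curators, and the
    first `overlap_pct` percent of rows are also given to the next curator.
    """
    if n_curators < 1:
        raise ValueError("n_curators must be at least 1")

    # Primary assignment: round-robin over the curators.
    assignments = [rows[i::n_curators] for i in range(n_curators)]
    if n_curators == 1:
        return assignments

    # Shared rows: each goes to the curator after its primary one.
    n_shared = round(len(rows) * overlap_pct / 100)
    for idx in range(n_shared):
        second = (idx % n_curators + 1) % n_curators
        assignments[second].append(rows[idx])
    return assignments


if __name__ == "__main__":
    example_rows = [f"row-{i}" for i in range(20)]
    for number, chunk in enumerate(split_with_overlap(example_rows, 3, 10), start=1):
        print(f"curator{number}: {len(chunk)} rows")
```

Rows that appear in two curators' files are the ones on which inter-curator agreement can later be measured.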
just validate-file assignments/curator1.tsv
Checks:
- Definition exists and is non-empty
- Follows genus-differentia form
- No circularity
- Sources in valid format (PMID:, DOI:, etc.)
- Label follows FP-012 naming conventions
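The definition checks listed above can be approximated with a few heuristics. The sketch below is an assumption-laden illustration, not the template's validator: `check_definition` and `SOURCE_PATTERN` are hypothetical names, the genus-differentia test is a simple pattern match on "A/An <genus> that/which ... <differentia>", and only the PMID and DOI source prefixes are recognized.

```python
import re

# Assumed source formats for this sketch: PMID:<digits> or DOI:10.<suffix>.
SOURCE_PATTERN = re.compile(r"^(PMID:\d+|DOI:10\.\S+)$")

# Heuristic for "A <genus> that <differentia>" style definitions.
GENUS_DIFFERENTIA = re.compile(
    r"^(A|An)\s+.+?\s+(that|which|in which|where|whose)\s+.+"
)


def check_definition(label: str, definition: str, sources: list[str]) -> list[str]:
    """Return a list of problems found in a textual definition."""
    text = definition.strip()
    if not text:
        return ["definition is empty"]

    problems: list[str] = []
    if not GENUS_DIFFERENTIA.match(text):
        problems.append("definition is not in genus-differentia form")
    # Circularity heuristic: the label is repeated inside its own definition.
    if label.strip() and label.strip().lower() in text.lower():
        problems.append("definition is circular (repeats the label)")
    for source in sources:
        if not SOURCE_PATTERN.match(source.strip()):
            problems.append(f"source not in a recognized format: {source}")
    return problems


if __name__ == "__main__":
    print(check_definition(
        "thermophile",
        "An organism that grows optimally at temperatures above 45 degrees Celsius.",
        ["PMID:12345"],
    ))
```

Pattern-based checks like these flag likely problems for a human to review; they cannot confirm that a definition is scientifically correct.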
just progress curator1
Shows:
- Total classes assigned
- Raw outputs created
- Reviewed outputs completed
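A progress report of this kind boils down to counting rows and files. The sketch below assumes a layout that the generated project may not share: one assignment TSV per curator with the two ROBOT header rows, plus separate directories for raw and reviewed outputs. The function name `curator_progress` and all paths are hypothetical.

```python
import csv
from pathlib import Path


def count_files(directory: Path) -> int:
    """Count regular files directly inside a directory (0 if it is missing)."""
    if not directory.is_dir():
        return 0
    return sum(1 for path in directory.iterdir() if path.is_file())


def curator_progress(
    assignment_tsv: Path, raw_dir: Path, reviewed_dir: Path
) -> dict[str, int]:
    """Summarize assigned classes and completed outputs for one curator."""
    with assignment_tsv.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    # Skip the two ROBOT header rows (column names and template strings).
    assigned = max(len(rows) - 2, 0)
    return {
        "assigned": assigned,
        "raw": count_files(raw_dir),
        "reviewed": count_files(reviewed_dir),
    }


if __name__ == "__main__":
    # Assumed example paths; adjust to the generated project's layout.
    tsv = Path("assignments/curator1.tsv")
    if tsv.exists():
        print(curator_progress(tsv, Path("raw/curator1"), Path("reviewed/curator1")))
```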
- monarch-project-copier: Base template for Monarch projects
- linkml-project-copier: LinkML schema projects
- METPO: Microbial Environment and Traits Ontology
- ROBOT: Tool for ontology development
Improvements to this template are welcome!
- Fork this repository
- Create a branch for your changes
- Test by materializing a project:
copier copy --vcs-ref your-branch . /tmp/test-project
- Submit a pull request
BSD-3-Clause (or MIT, Apache-2.0 - configurable in generated projects)
Created by Montana Smith and team at Lawrence Berkeley National Laboratory for the METPO/KG-Microbe project.
Based on monarch-project-copier by Chris Mungall and David Linke.
- Issues: https://github.com/berkeleybop/metpo-kgm-copier/issues
- Discussions: Ask in METPO team meetings
- Documentation: See generated project's README.md and CURATION_GUIDE.md
Happy Curating! 🦠 🔬