feat: (m-decompose) Module Prompt V3 #770
csbobby wants to merge 21 commits into generative-computing:main from
Conversation

The PR description has been updated. Please fill out the template for your PR to be reviewed.
Updated the modular prompts to adapt to recent LLMs.

Hi, this passes the tests and is ready to be reviewed @jakelorocco.
planetf1 left a comment
A few observations, mostly around parsing robustness and test coverage. Also worth checking whether the `from e` exception-chaining fix applied in `_general_instructions.py` is needed in any equivalent `except` blocks in `_subtask_constraint_assign.py`.
```python
raise TagExtractionError(
    'LLM failed to generate correct tags for extraction: "<assigned_constraints>"'
)
subtask_constraint_assign_str = re.sub(
```
The pattern is already compiled as `RE_FINAL_SENTENCE` in `_general_instructions.py`; it could be extracted and reused rather than re-compiled on every iteration.
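A minimal sketch of the hoisting being suggested. The constant name mirrors `RE_FINAL_SENTENCE` from the comment, but the pattern and helper here are illustrative, not the actual code in `_general_instructions.py`:

```python
import re

# Hypothetical module-level constant: compiled once at import time and
# reused on every call, instead of re-compiling inside the parsing loop.
RE_FINAL_SENTENCE = re.compile(r"[^.]*\.\s*$")


def strip_final_sentence(text: str) -> str:
    # Drop a trailing final sentence, if one is present (illustrative behavior).
    return RE_FINAL_SENTENCE.sub("", text)
```

The win is small per call but adds up when the parser runs once per LLM response line.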
```python
subtask_constraint_assign_match.group(1).strip()
if subtask_constraint_assign_match
else None
```

```python
else data[3].strip()
```
This silently swallows the block when the tag is missing. The general-instructions parser has a nicer graduated fallback (open-tag-only). At a minimum, log a warning when the tag is absent (and what should we return in that case?).
```shell
#MODEL_ID=granite4:latest
MODEL_ID=qwen2.5:7b
MODEL_ID=mistral-small3.2:latest # granite4:latest
```
Hardcoded model `mistral-small3.2:latest`: can we make it a variable?
Is the model used documented in the test docs?
How much memory does it need? Is it reasonable on a smaller system (though if it's needed for results, that's understandable)?
```python
continue

# Only keep lines starting with "- "
if stripped.startswith("- "):
```
This silently drops any constraint not prefixed with `- ` (e.g. numbered lists). The old code kept those. It might be worth logging skipped lines or broadening the heuristic a touch.
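A sketch of the broadened heuristic, assuming the same line-by-line loop as the diff; the function and pattern names are hypothetical:

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical broadened heuristic: accept "- " bullets plus "1." / "1)"
# numbered items, and log anything that still gets skipped.
BULLET_OR_NUMBERED = re.compile(r"^(?:-\s+|\d+[.)]\s+)(.*)")


def keep_constraint_lines(lines: list[str]) -> list[str]:
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        match = BULLET_OR_NUMBERED.match(stripped)
        if match:
            kept.append(match.group(1))
        else:
            logger.warning("Skipping unrecognized constraint line: %r", stripped)
    return kept
```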
```python
# Remove duplicates while preserving order
subtask_constraint_assign = list(
    dict.fromkeys(subtask_constraint_assign)
```
Probably fine in practice, but `dict.fromkeys` will quietly collapse genuinely distinct constraints that happen to share identical text.
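To make the concern concrete, a tiny self-contained example of the dedup from the diff (the constraint strings here are made up):

```python
constraints = [
    "Output must be valid JSON.",
    "Limit the answer to 100 words.",
    "Output must be valid JSON.",  # distinct origin, identical text: silently dropped
]

# Order-preserving dedup, exactly as in the diff.
deduped = list(dict.fromkeys(constraints))
```

If two subtasks legitimately carry the same constraint text, only one survives; whether that matters depends on how the constraints are consumed downstream.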
```python
)
```

```python
assert requested_templates == ["m_decomp_result_v2.py.jinja2"]
assert requested_templates == ["m_decomp_result_v3.py.jinja2"]
```
The new fallback parsing paths (open-tag-only, final-sentence stripping, `- ` filtering) don't seem to have dedicated tests yet; could we add a few unit cases?
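A sketch of what those unit cases could look like. The `parse_constraints` stand-in below is hypothetical; the real tests would exercise the actual parser in `_subtask_constraint_assign.py`:

```python
# Hypothetical stand-in for the real extraction helper.
def parse_constraints(text: str) -> list[str]:
    body = text
    if "<assigned_constraints>" in text:
        body = text.split("<assigned_constraints>", 1)[1]
        body = body.split("</assigned_constraints>", 1)[0]
    return [
        line.strip()[2:]
        for line in body.splitlines()
        if line.strip().startswith("- ")
    ]


# Unit cases covering the fallback paths called out above:
def test_full_tag_pair():
    assert parse_constraints("<assigned_constraints>\n- a\n</assigned_constraints>") == ["a"]


def test_open_tag_only_fallback():
    assert parse_constraints("<assigned_constraints>\n- b") == ["b"]


def test_bullet_filtering_drops_prose():
    assert parse_constraints("<assigned_constraints>\nnote\n- c\n</assigned_constraints>") == ["c"]
```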
Would this example work with a newer granite4 model?
Module Prompt V3

Description

Implementation Checklist

Protocol Compliance
- `parts()` returns list of constituent parts (Components or CBlocks)
- `format_for_llm()` returns TemplateRepresentation or string
- `_parse(computed: ModelOutputThunk)` parses model output correctly into the specified Component return type

Testing
tests/components/