
[Pelis Agent Factory Advisor] Agentic Workflow Maturity Report - Mar 2026 #1115

@github-actions

📊 Executive Summary

gh-aw-firewall is a highly mature agentic workflow repository, among the most advanced outside the gh-aw factory itself, with 28 compiled agentic workflows spanning security, CI/CD, documentation, and multi-engine smoke testing. However, three actionable gaps stand out: no issue triage/labeling agent, no meta-agent to audit workflow health, and a missing Firewall Escape Test Agent that is referenced in the security review workflow but doesn't yet exist.


🎓 Patterns Learned from Pelis Agent Factory

From crawling the full Pelis blog series and exploring the githubnext/agentics reference repo, the key patterns are:

| Pattern | Description | Present here? |
| --- | --- | --- |
| Specialization | Many focused workflows vs one monolithic agent | ✅ Yes: 28 specialized workflows |
| Multi-engine | Different AI models for different tasks | ✅ Yes: claude, codex, copilot |
| Meta-agents | Agents that monitor other agents (Audit Workflows, Workflow Health Manager) | ❌ Missing |
| Cascade workflows | Issues → downstream PR chains via issue-monster | ✅ Partial: issue-monster exists |
| Cache-memory | Cross-run persistent state (e.g., issue-duplication-detector) | ✅ Yes |
| skip-if-match | Preventing duplicate outputs | ⚠️ Partially broken: duplicates observed |
| Observability | Metrics Collector, Portfolio Analyst | ❌ Missing |
| Issue triage | Automated labeling + triage comments | ❌ Missing |
| Code quality agents | Continuous Simplicity, Refactoring, Style | ❌ Missing |
| Breaking change detection | Alerting on backward-incompatible changes | ❌ Missing |
| Daily malicious code scan | Supply chain defense | ❌ Missing |

📋 Current Agentic Workflow Inventory

| Workflow | Purpose | Trigger | Engine | Assessment |
| --- | --- | --- | --- | --- |
| build-test-{bun,cpp,deno,dotnet,go,java,node,rust} | Build & test PRs in 8 ecosystems | PR opened/sync | copilot | ✅ Excellent coverage |
| ci-cd-gaps-assessment | Daily CI/CD gap analysis | Schedule daily | copilot | ✅ Active, creating discussions |
| ci-doctor | Investigate CI failures, open issues | workflow_run failed | copilot | ✅ Core workflow |
| cli-flag-consistency-checker | Weekly CLI flag consistency check | Schedule weekly | copilot | ✅ Good hygiene |
| dependency-security-monitor | Daily CVE monitoring + dep PRs | Schedule daily | copilot | ✅ Very active (3 open PRs) |
| doc-maintainer | Daily docs sync with code changes | Schedule daily | copilot | ✅ Good coverage |
| issue-duplication-detector | Detect duplicate issues | Issue opened | copilot | ✅ Uses cache-memory |
| issue-monster | Dispatch issues to Copilot SWE agent | Issue opened + hourly | copilot | ✅ Core orchestrator |
| pelis-agent-factory-advisor | This workflow | Schedule daily | copilot | ⚠️ UNCOMPILED |
| plan | /plan slash command | Discussion/issue comment | copilot | ✅ Interactive |
| secret-digger-claude/codex/copilot | Hourly secret scanning (3 engines) | Hourly cron | all 3 | ⚠️ Codex + Copilot failing |
| security-guard | PR security review | PR opened/sync | claude | ✅ Excellent for this repo |
| security-review | Daily comprehensive security review | Schedule daily | copilot | ✅ Very thorough |
| smoke-{chroot,claude,codex,copilot} | End-to-end smoke tests | PR + schedule | all 3 + copilot | ✅ Multi-engine, excellent |
| test-coverage-improver | Weekly test coverage PRs | Schedule weekly | copilot | ⚠️ UNCOMPILED |
| update-release-notes | Enhance release notes on publish | Release published | copilot | ✅ Good |

🚨 Immediate Issues to Address

These are operational problems with existing workflows that need fixing now.

1. Two workflows are uncompiled (pelis-agent-factory-advisor, test-coverage-improver)

  • These will not run because GitHub Actions executes the .lock.yml files, not the .md files
  • Run gh aw compile .github/workflows/test-coverage-improver.md and gh aw compile .github/workflows/pelis-agent-factory-advisor.md followed by the post-processing script
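A minimal sketch of how this gap could be detected automatically, assuming each `.md` workflow sits next to its compiled `.lock.yml` in the same directory. The function name and the temporary directory layout are hypothetical, and a real check would need to exclude non-workflow Markdown files that happen to live in the folder.

```python
from pathlib import Path
import tempfile


def find_uncompiled(workflows_dir: Path) -> list[str]:
    """Return names of .md workflows that have no sibling .lock.yml file."""
    missing = []
    for md in sorted(workflows_dir.glob("*.md")):
        lock = md.with_name(md.stem + ".lock.yml")
        if not lock.exists():
            missing.append(md.name)
    return missing


def demo() -> list[str]:
    # Throwaway directory that mimics .github/workflows (contents are placeholders).
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "ci-doctor.md").write_text("---\n---\n")
        (d / "ci-doctor.lock.yml").write_text("name: ci-doctor\n")
        (d / "test-coverage-improver.md").write_text("---\n---\n")
        return find_uncompiled(d)


print(demo())
```

Running a check like this in CI would flag a workflow source that was committed without its compiled counterpart.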

2. Duplicate discussions accumulating

3. Secret Digger failing for Codex + Copilot engines (#1107, #1105)

  • Three parallel hourly secret scanners; the codex and copilot variants are failing. Investigate and fix.

4. Three open Dependency PRs stacking without merge (#1114, #1110, #1104)

  • dependency-security-monitor is creating PRs faster than they're being merged; consider adding auto-merge for patch-level safe updates or a stale-PR cleanup
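One way to decide which dependency PRs qualify as "patch-level" is to compare the old and new version strings. The sketch below assumes plain `MAJOR.MINOR.PATCH` versions (optionally prefixed with `v`); pre-release suffixes and non-semver schemes are not handled, and the function names are illustrative only.

```python
def parse_version(v: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v') into integers."""
    major, minor, patch = v.lstrip("v").split(".")[:3]
    return int(major), int(minor), int(patch)


def bump_kind(old: str, new: str) -> str:
    """Classify an upgrade as 'major', 'minor', 'patch', or 'none'."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        return "none"  # same version or a downgrade
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def is_auto_merge_candidate(old: str, new: str) -> bool:
    """Only patch-level upgrades are considered safe to auto-merge here."""
    return bump_kind(old, new) == "patch"


print(bump_kind("1.2.3", "1.2.4"), is_auto_merge_candidate("1.2.3", "1.3.0"))
```

Whether a patch bump is actually safe still depends on the dependency; passing CI should remain a precondition for any auto-merge.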

🚀 Actionable Recommendations

P0 - Implement Immediately

P0.1: Issue Triage Agent

What: Automatically label incoming issues with appropriate categories (bug, security, enhancement, documentation, question, good-first-issue)

Why: Currently 10 open issues have zero labels, making the issue tracker hard to navigate. The issue-monster dispatches issues but skips unlabeled/un-triaged ones. A triage agent feeds better quality issues into the cascade. From the factory: issue triage is the "hello world" of agentic workflows with immediate, clear value.

How: Add a new issue-triage.md workflow triggered on issues: [opened] with safe-outputs: add-labels and add-comment. Uses codebase context to label issues by analyzing title + body.

Effort: Low

---
on:
  issues:
    types: [opened]
permissions:
  issues: read
  contents: read
tools:
  github:
    toolsets: [issues, labels]
safe-outputs:
  add-labels:
    allowed: [bug, security, enhancement, documentation, question, good-first-issue, firewall, proxy, docker, ci]
  add-comment: {}
timeout-minutes: 5
---
# Issue Triage Agent
Analyze issue #${{ github.event.issue.number }} in ${{ github.repository }}...
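The allowlist idea behind `add-labels` can be illustrated in isolation. This is a sketch of the concept, not of how gh-aw enforces `allowed` internally; the function name is hypothetical and the label list is copied from the frontmatter above.

```python
ALLOWED_LABELS = [
    "bug", "security", "enhancement", "documentation", "question",
    "good-first-issue", "firewall", "proxy", "docker", "ci",
]


def filter_labels(proposed: list[str]) -> list[str]:
    """Keep only allowlisted labels, normalized and de-duplicated in order."""
    allowed = set(ALLOWED_LABELS)
    kept: list[str] = []
    for label in proposed:
        normalized = label.strip().lower()
        if normalized in allowed and normalized not in kept:
            kept.append(normalized)
    return kept


print(filter_labels(["Bug", "wontfix", "proxy", "bug "]))
```

Restricting the agent to a fixed label set keeps a misfiring triage run from creating arbitrary labels.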

P1 - Plan for Near-Term

P1.1: Firewall Escape Test Agent 🔥

What: A dedicated daily agent that attempts to escape the AWF network firewall using known techniques and reports findings as a discussion

Why: The security-review.md workflow already references this agent ("Read the Firewall Escape Test Agent's Report") but it doesn't exist, which leaves a gap in the security review pipeline. For a security firewall repository, continuous adversarial escape testing is uniquely domain-relevant. This workflow would try known bypass techniques (DNS tunneling, HTTP CONNECT abuse, IPv6 bypass, localhost tricks) and report on which ones are properly blocked.

How: A daily scheduled workflow using bash: true that runs actual awf commands with various bypass attempts inside the container, checks squid logs, and reports success/failure per technique.

Effort: Medium

Unique to this repo: No other repository type can benefit from this as directly as a network firewall tool. Each test run validates real security invariants.
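The "check squid logs, report per technique" step could look roughly like the sketch below. It assumes Squid's native `access.log` layout, where the fourth field is the result code (for example `TCP_DENIED/403`) and the seventh is the URL. The technique names, hosts, and log lines are made up for illustration, and whether AWF's Squid uses this log format is an assumption.

```python
def classify_attempts(log_lines: list[str], probes: dict[str, str]) -> dict[str, str]:
    """Map each technique to 'blocked', 'ALLOWED', or 'not observed'.

    probes maps a (hypothetical) technique name to a substring of its target URL.
    """
    results = {name: "not observed" for name in probes}
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a native-format access.log line
        result_code, url = fields[3], fields[6]
        for name, target in probes.items():
            if target not in url:
                continue
            if not result_code.startswith("TCP_DENIED"):
                results[name] = "ALLOWED"  # any allowed request counts as an escape
            elif results[name] != "ALLOWED":
                results[name] = "blocked"
    return results


SAMPLE_LOG = [
    "1709350000.123 45 172.30.0.20 TCP_DENIED/403 3900 CONNECT blocked.example.com:443 - HIER_NONE/- text/html",
    "1709350001.456 80 172.30.0.20 TCP_TUNNEL/200 5120 CONNECT leaky.example.com:443 - HIER_DIRECT/203.0.113.7 -",
]
PROBES = {
    "unlisted-domain": "blocked.example.com",
    "connect-abuse": "leaky.example.com",
    "raw-ip": "198.51.100.9",
}
print(classify_attempts(SAMPLE_LOG, PROBES))
```

Proxy logs only cover traffic that reaches Squid; techniques such as DNS tunneling or an IPv6 bypass would need separate evidence (for example, the exit status of the probe itself).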


P1.2: Workflow Health Monitor (Meta-Agent)

What: A weekly meta-agent that reviews all other agentic workflow runs and creates a health report with issues for unhealthy agents

Why: The factory learned that meta-agents are incredibly valuable. Currently there's no observability on the 28 workflows themselves; nobody is watching the watchers. The duplicate discussion problem (#1111/#1106, etc.) would be caught automatically. Secret Digger failures (#1107, #1105) linger as issues but there's no systematic health check.

How: Weekly scheduled workflow using agentic-workflows tool to inspect recent runs of all workflows, identify failure rates, duplicate outputs, and cost anomalies. Creates issues for unhealthy workflows.

Effort: Low–Medium

---
on:
  schedule: weekly
tools:
  agentic-workflows:
  github:
    toolsets: [default, actions]
  cache-memory: true
safe-outputs:
  create-discussion:
    title-prefix: "[Workflow Health] "
  create-issue:
    title-prefix: "[Workflow Health] "
    labels: [agentic-workflows]
    max: 5
---
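The failure-rate part of such a health check can be sketched as follows. The record shape (`workflow`, `conclusion`) and the thresholds are assumptions chosen for illustration; in practice the run data would come from whatever the agentic-workflows tool or the Actions API returns.

```python
from collections import defaultdict


def unhealthy_workflows(runs: list[dict], threshold: float = 0.5, min_runs: int = 3) -> dict[str, float]:
    """Return {workflow: failure_rate} for workflows at or above the threshold."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run["workflow"]] += 1
        if run["conclusion"] == "failure":
            failures[run["workflow"]] += 1
    report = {}
    for name, total in totals.items():
        rate = failures[name] / total
        # Ignore workflows with too few runs to judge.
        if total >= min_runs and rate >= threshold:
            report[name] = round(rate, 2)
    return report


SAMPLE_RUNS = (
    [{"workflow": "secret-digger-codex", "conclusion": "failure"}] * 3
    + [{"workflow": "ci-doctor", "conclusion": "success"}] * 3
    + [{"workflow": "ci-doctor", "conclusion": "failure"}]
)
print(unhealthy_workflows(SAMPLE_RUNS))
```

The `min_runs` guard avoids opening an issue for a workflow that failed once on its only run.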

P1.3: Breaking Change Checker

What: On each PR, detect backward-incompatible CLI changes (removed flags, changed defaults, renamed options, Docker API changes)

Why: AWF is a distributed CLI tool consumed by users who script it. Breaking changes in --allow-domains semantics, flag names, or Docker compose configuration need early detection. The factory uses this pattern with a 100% causal chain merge rate. Recent PRs adding --build-local, changing --image-tag behavior, and adding API proxy ports are exactly the type of changes this catches.

How: PR-triggered workflow that diffs src/cli.ts, src/types.ts, and containers/ against base branch, identifies potentially breaking changes, and comments on the PR.

Effort: Low
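A crude, non-agentic baseline for the flag-removal case: extract `--flag` tokens from the base and head versions of a source file and report the ones that disappeared. The regex and the sample snippets are illustrative assumptions and do not reflect how `src/cli.ts` actually declares its options.

```python
import re

FLAG_RE = re.compile(r"--[a-z][a-z0-9-]*")


def extract_flags(source: str) -> set[str]:
    """Collect every long-option token such as --allow-domains."""
    return set(FLAG_RE.findall(source))


def removed_flags(base_src: str, head_src: str) -> list[str]:
    """Flags present on the base branch but missing from the PR head."""
    return sorted(extract_flags(base_src) - extract_flags(head_src))


def added_flags(base_src: str, head_src: str) -> list[str]:
    """Flags introduced by the PR head."""
    return sorted(extract_flags(head_src) - extract_flags(base_src))


# Hypothetical before/after snippets.
BASE = "option('--allow-domains <domains>')\noption('--image-tag <tag>')"
HEAD = "option('--allow-domains <domains>')\noption('--build-local')"
print(removed_flags(BASE, HEAD), added_flags(BASE, HEAD))
```

A text diff like this only catches removed or renamed flags; changed defaults and altered semantics are where an agent reading the diff adds value.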


P2 - Consider for Roadmap

P2.1: Daily Malicious Code Scan

What: Daily scan of recent commits for suspicious patterns: obfuscated code, unusual network calls, hardcoded credentials, suspicious shell commands

Why: AWF runs as root with NET_ADMIN capability and accesses docker.sock. A supply chain compromise here would be particularly dangerous. The factory runs this daily in gh-aw. For a security-critical tool, this defensive layer is especially important.

Effort: Low (based on existing secret-digger pattern, just different analysis focus)


P2.2: Sub Issue Closer

What: Automatically close sub-issues when parent issues are resolved

Why: As issue-monster creates more Copilot SWE agent tasks, sub-issue tracking will accumulate stale closed/merged items. From the factory: "keeps the issue tracker clean."

Effort: Low


P2.3: Changeset Generator

What: On merging to main, analyze commits since last release and auto-generate a PR with version bump + CHANGELOG entry

Why: update-release-notes improves notes after a release is published, but there's no automation for preparing releases. The factory's Changeset workflow had a 78% merge rate across 28 proposed PRs. Given AWF releases container images via GHCR, having well-tracked version bumps matters.

Effort: Medium


P2.4: Fix skip-if-match for Discussion-Creating Workflows

What: Update ci-cd-gaps-assessment, security-review, and pelis-agent-factory-advisor to use better deduplication to avoid accumulating stale duplicate discussions/issues

Why: There are currently 6 open duplicate issues (#1113/#1109, #1112/#1108, #1111/#1106). The skip-if-match queries need to match the title prefixes + date patterns.

Effort: Low (just adjust the skip-if-match queries in each workflow)
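The matching logic implied by "title prefixes + date patterns" might be sketched like this. It is not gh-aw's skip-if-match query syntax; it only shows the intended comparison, assuming titles of the form `[Prefix] ... Mon YYYY`. The regex and function names are hypothetical.

```python
import re

# Bracketed prefix at the start, three-letter month plus year at the end.
TITLE_RE = re.compile(r"^\[(?P<prefix>[^\]]+)\].*?(?P<period>[A-Z][a-z]{2} \d{4})\s*$")


def dedup_key(title: str):
    """Return (prefix, period) for a matching title, else None."""
    m = TITLE_RE.match(title.strip())
    return (m["prefix"], m["period"]) if m else None


def should_skip(new_title: str, open_titles: list[str]) -> bool:
    """Skip creation when an open item shares both prefix and period."""
    key = dedup_key(new_title)
    return key is not None and any(dedup_key(t) == key for t in open_titles)


OPEN = ["[Pelis Agent Factory Advisor] Agentic Workflow Maturity Report - Mar 2026"]
print(should_skip("[Pelis Agent Factory Advisor] Agentic Workflow Maturity Report - Mar 2026", OPEN))
```

Keying on the period as well as the prefix lets next month's report through while blocking a second copy of this month's.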


P3 - Future Ideas

P3.1: Portfolio Analyst (Token Cost Optimizer)

What: Weekly analysis of workflow token usage and costs across all 28 workflows, identifying expensive agents and optimization opportunities

Why: With 28 workflows running daily/hourly/weekly, token costs accumulate. The factory found some agents were "way too chatty" with LLM calls. Secret-digger alone runs 3× per hour.

Effort: Low (read-only analysis)


P3.2: Weekly Issue & PR Summary

What: Weekly digest of repository activity (open issues, PR status, workflow health) posted as a discussion

Why: With automated agents creating many issues/PRs, maintainers need a curated weekly digest to stay informed without reading every individual output.

Effort: Low


P3.3: Contribution Guidelines Checker

What: On new PRs from external contributors, check that contribution guidelines (conventional commits, scope, PR title format) are followed and comment with guidance

Why: AWF enforces strict conventional commits with a limited scope allowlist (cli, docker, squid, proxy, ci, deps). External contributors frequently get PR title check failures. An early-comment agent reduces frustration.

Effort: Low
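The deterministic part of this check, validating a PR title against the conventional-commit shape and the scope allowlist, can be sketched as below. The scope list comes from this report; the type list and the choice to treat the scope as optional are assumptions, since the repository's actual rules are not shown here.

```python
import re

ALLOWED_SCOPES = {"cli", "docker", "squid", "proxy", "ci", "deps"}  # from this report
# Assumption: a typical conventional-commit type set, not confirmed for this repo.
ALLOWED_TYPES = {"feat", "fix", "docs", "chore", "refactor", "test", "ci", "build", "perf"}

TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[a-z0-9-]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def check_title(title: str) -> list[str]:
    """Return a list of problems; an empty list means the title passes."""
    m = TITLE_RE.match(title)
    if m is None:
        return ["title does not match 'type(scope): subject'"]
    problems = []
    commit_type, scope = m["type"], m["scope"]
    if commit_type not in ALLOWED_TYPES:
        problems.append(f"unknown type '{commit_type}'")
    if scope is not None and scope not in ALLOWED_SCOPES:
        problems.append(f"scope '{scope}' is not in the allowlist")
    return problems


print(check_title("fix(squid): handle empty allowlist"))
print(check_title("fix(firewall): tighten rules"))
```

An agent could post the returned problems as a comment, along with a suggested corrected title.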


📈 Maturity Assessment

Current Level: 4/5 - Advanced Factory

This is one of the most sophisticated agentic workflow setups outside the gh-aw factory itself. Strengths:

  • ✅ 28 compiled agentic workflows across all major categories
  • ✅ Multi-engine support (Claude, Codex, Copilot)
  • ✅ Domain-specific workflows (security-guard, smoke tests, secret-digger × 3)
  • ✅ Good cascade design (ci-doctor → issues → issue-monster → PRs)
  • ✅ Cache-memory usage for stateful agents

Target Level: 4.5/5 - Add meta-monitoring and triage

Gap Analysis:

  1. Add issue triage (P0) → improves issue quality entering issue-monster cascade
  2. Add workflow health monitor (P1) → closes the observability gap for 28 workflows
  3. Fix uncompiled workflows (operational) → pelis-advisor and test-coverage-improver aren't running
  4. Build the escape test agent (P1) → unique to this repo's security mission

🔄 Comparison with Best Practices

| Best Practice | This Repo | Notes |
| --- | --- | --- |
| Issue triage | ❌ | Missing; all auto-created issues unlabeled |
| Fault investigation | ✅ | ci-doctor is excellent |
| Security compliance | ✅✅ | Above average: security-guard, security-review, secret-digger×3 |
| Documentation sync | ✅ | doc-maintainer + cli-flag-consistency-checker |
| Meta-agent monitoring | ❌ | No workflow health manager or audit workflows |
| Release automation | ⚠️ | update-release-notes exists but no changeset generation |
| Code quality agents | ❌ | No simplicity/refactoring/style agents |
| Interactive/ChatOps | ✅ | /plan slash command |
| Multi-engine testing | ✅✅ | Unique strength: smoke tests on 4 configs |
| Observability/metrics | ❌ | No portfolio analyst or metrics collector |

What this repo does uniquely well: The triple-engine secret digger (running hourly on claude/codex/copilot) and the four-way smoke testing matrix are standout patterns not seen in the factory itself. The security-guard PR reviewer using Claude is particularly well-suited to this security-critical codebase.

Domain opportunity: A Firewall Escape Test Agent is uniquely valuable here, since no other repository type can leverage this pattern. It would turn the firewall into its own test subject, continuously verifying security invariants.


Generated by Pelis Agent Factory Advisor · 2026-03-02


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Expires on Mar 9, 2026, 3:25 AM UTC
