
Feat/Batch Multi Form Fill with Canonical Extraction and Evidence Attribution#158

Open
Acuspeedster wants to merge 4 commits into fireform-core:main from Acuspeedster:feat/batch-fill-canonical-extraction


@Acuspeedster

Closes #155
Closes #156
Closes #157


Summary

This PR operationalizes FireForm’s "report once, file everywhere" promise at the API level.

It introduces:

  • Canonical transcript extraction (single-pass)
  • Concurrent template mapping
  • Batch multi-form endpoint
  • Evidence attribution per field
  • Persisted audit trail
  • Partial failure resilience
  • Complexity reduction from T×F to 1 + T

From an architectural standpoint, this PR should be considered high priority, as it resolves a core scalability and compliance limitation affecting FireForm’s primary multi-agency use case.


Architectural Redesign

Separation of Concerns

Previously:

  • Extraction and template filling were fused.
  • Each template re-extracted from raw transcript.

Now:

  • Canonical incident extraction (1 LLM call)
  • Template mapping (T concurrent calls)
  • Concurrent PDF writing via a thread pool executor

IncidentExtractor

New extractor.py introduces IncidentExtractor.

Pass 1:

  • Single LLM call
  • Produces canonical incident record
  • 26 template-agnostic categories
  • Each category contains:
    • value
    • evidence_quote
    • confidence

Pass 2:

  • Template-specific mapping
  • Stateless with respect to the transcript (reads only the canonical record)
  • Concurrent via asyncio.gather
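The two passes above can be sketched as follows. This is a minimal illustration, not the code in extractor.py: `FakeLLM`, `extract_canonical`, `map_template`, and `fill_batch` are hypothetical names, the category `incident_location` is invented, and the string-valued `confidence` is an assumption about the field's type. The fake client only counts calls, so the sketch shows the call pattern rather than real inference.

```python
import asyncio


class FakeLLM:
    """Stand-in for a real model client; it counts calls instead of running inference."""

    def __init__(self) -> None:
        self.calls = 0

    async def complete(self, prompt: str) -> dict:
        self.calls += 1
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return {"prompt": prompt}


async def extract_canonical(llm: FakeLLM, transcript: str) -> dict:
    # Pass 1: a single call over the raw transcript.
    await llm.complete(f"extract: {transcript}")
    # Hard-coded result for illustration; each category carries the three keys
    # named in the description.
    return {
        "incident_location": {
            "value": "12 Oak St",
            "evidence_quote": "on scene at 12 Oak St",
            "confidence": "high",
        }
    }


async def map_template(llm: FakeLLM, canonical: dict, template_id: int) -> dict:
    # Pass 2: receives only the canonical record, never the transcript.
    await llm.complete(f"map template {template_id}")
    return {
        "template_id": template_id,
        "fields": {"location": canonical["incident_location"]["value"]},
    }


async def fill_batch(llm: FakeLLM, transcript: str, template_ids: list[int]):
    canonical = await extract_canonical(llm, transcript)
    # One mapping call per template, all scheduled concurrently.
    mapped = await asyncio.gather(
        *(map_template(llm, canonical, tid) for tid in template_ids)
    )
    return canonical, list(mapped)
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the mapped results line up with `template_ids` without extra bookkeeping.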

Complexity Improvement

Previous:
T × F LLM calls

New:
1 + T LLM calls

Example:
5 templates × 10 fields = 50 calls
Now: 1 + 5 = 6 calls
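The arithmetic generalizes to templates with different field counts. The helpers below are hypothetical and assume, as the T × F figure implies, that the previous pipeline issued one LLM call per field.

```python
def calls_before(fields_per_template: list[int]) -> int:
    # Old pipeline (assumed): one LLM call for every field of every template.
    return sum(fields_per_template)


def calls_after(template_count: int) -> int:
    # New pipeline: one canonical extraction plus one mapping call per template.
    return 1 + template_count


fields = [10, 10, 10, 10, 10]  # 5 templates, 10 fields each
print(calls_before(fields), calls_after(len(fields)))
```

The saving grows with the number of fields per template, since the new count depends only on T.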


Concurrency

  • Mapping calls executed concurrently
  • PDF fills executed in thread pool executor
  • Event loop is not blocked by PDF writes, which run off-thread
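A minimal sketch of the executor pattern, assuming the PDF writer is a blocking function. `write_pdf`, `fill_all`, the output path format, and the pool size are all hypothetical; the sleep stands in for real file I/O.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def write_pdf(template_id: int, fields: dict) -> str:
    # Hypothetical blocking writer; the sleep stands in for PDF rendering and disk I/O.
    time.sleep(0.05)
    return f"out/template_{template_id}.pdf"


async def fill_all(mapped: dict[int, dict]) -> list[str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking write runs on a worker thread; the coroutine only awaits.
        futures = [
            loop.run_in_executor(pool, write_pdf, tid, fields)
            for tid, fields in mapped.items()
        ]
        return await asyncio.gather(*futures)
```

While the workers write files, the event loop stays free to serve other requests, which is the property the list above describes.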

New Endpoints

POST /forms/fill/batch

  • One transcript
  • Multiple templates
  • Returns all output paths
  • Returns batch_id

GET /forms/batches/{id}

  • Per-template success/failure
  • Batch state

GET /forms/batches/{id}/audit

  • Canonical extraction
  • Evidence quotes
  • Confidence levels
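One way a reviewer might consume the audit payload is to check that every evidence quote appears verbatim in the transcript. The sketch below assumes the audit response contains a mapping of category name to `value` / `evidence_quote` / `confidence`; the function name and the sample categories are hypothetical, and the actual response schema may differ.

```python
def unverified_fields(transcript: str, extraction: dict) -> list[str]:
    """Return category names whose evidence_quote is missing or not a verbatim substring."""
    missing = []
    for name, field in extraction.items():
        quote = field.get("evidence_quote")
        if not quote or quote not in transcript:
            missing.append(name)
    return missing


transcript = "Engine 4 on scene at 12 Oak St. Two patients, one critical."
extraction = {
    "location": {
        "value": "12 Oak St",
        "evidence_quote": "on scene at 12 Oak St",
        "confidence": "high",
    },
    "patient_count": {
        "value": 3,
        "evidence_quote": "three patients",  # not present in the transcript
        "confidence": "low",
    },
}
print(unverified_fields(transcript, extraction))
```

An exact substring match is deliberately strict; a real check might normalize whitespace or casing first.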

Partial Failure Handling

  • One template failure does not abort batch
  • Status states:
    • complete
    • partial
    • failed

Callers never lose successful fills because of an unrelated template error.
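The three states can be derived from per-template outcomes roughly as follows. This is a sketch under assumptions: `fill_one`, `run_batch`, and `batch_status` are hypothetical names, and using `asyncio.gather(..., return_exceptions=True)` to isolate failures is one plausible approach, not necessarily the one in this PR.

```python
import asyncio


def batch_status(succeeded: int, failed: int) -> str:
    # An empty batch would report "complete" here; input validation is
    # assumed to reject it before this point.
    if failed == 0:
        return "complete"
    if succeeded == 0:
        return "failed"
    return "partial"


async def fill_one(template_id: int) -> str:
    # Hypothetical per-template fill; negative ids simulate a broken template.
    if template_id < 0:
        raise ValueError(f"unknown template {template_id}")
    return f"out/template_{template_id}.pdf"


async def run_batch(template_ids: list[int]) -> dict:
    # return_exceptions=True keeps one failure from cancelling the other fills.
    results = await asyncio.gather(
        *(fill_one(tid) for tid in template_ids), return_exceptions=True
    )
    outputs, errors = {}, {}
    for tid, result in zip(template_ids, results):
        if isinstance(result, Exception):
            errors[tid] = str(result)
        else:
            outputs[tid] = result
    return {
        "status": batch_status(len(outputs), len(errors)),
        "outputs": outputs,
        "errors": errors,
    }
```

Successful output paths are returned alongside the errors, so a caller can retry only the failed templates.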


Database Changes

New BatchSubmission table:

  • batch_id
  • template_ids
  • status
  • per-template outputs
  • canonical extraction JSON
  • evidence fields
  • created_at
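For orientation, the record shape listed above might look like the following. The PR defines BatchSubmission as a SQLModel table; this sketch uses a plain dataclass as a stand-in, and every field type, the hex UUID for `batch_id`, and the JSON-string storage of nested fields are assumptions.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class BatchSubmissionSketch:
    template_ids: list[int]
    status: str                      # "complete", "partial", or "failed"
    outputs: dict[str, str]          # template id (as string) -> output path
    canonical_extraction: dict       # full pass-1 result
    evidence: dict                   # category -> evidence quote
    batch_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_row(self) -> dict:
        # Serialize nested structures to JSON text, as a single-table store might.
        row = asdict(self)
        for key in ("template_ids", "outputs", "canonical_extraction", "evidence"):
            row[key] = json.dumps(row[key])
        return row
```

Keeping the canonical extraction on the batch record, rather than on each form submission, means the audit endpoint can serve it from a single row.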

Repository additions:

  • create_batch
  • get_batch

Testing

24 new tests
Total: 38 passing

Coverage includes:

  • Single-template success
  • Multi-template success
  • Partial failure
  • All-failed case
  • Audit endpoint validation
  • Evidence content validation
  • 404 handling
  • Input validation (empty, duplicates, limits)
  • Unit tests for evidence builder
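The input-validation cases in the list above could correspond to checks like these. The function name, the error strings, and the limit of 20 templates are hypothetical, and rejecting duplicates (rather than silently de-duplicating them) is an assumption about the endpoint's behavior.

```python
MAX_TEMPLATES = 20  # hypothetical limit; the PR does not state the real value


def validate_batch_request(transcript: str, template_ids: list[int]) -> list[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    if not transcript.strip():
        errors.append("transcript must not be empty")
    if not template_ids:
        errors.append("template_ids must not be empty")
    if len(set(template_ids)) != len(template_ids):
        errors.append("template_ids must not contain duplicates")
    if len(template_ids) > MAX_TEMPLATES:
        errors.append(f"at most {MAX_TEMPLATES} templates per batch")
    return errors
```

Collecting all errors instead of stopping at the first lets the caller fix a request in one round trip.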

Operational Significance

This PR:

  • Removes redundant extraction cost
  • Enables true multi-agency filing
  • Introduces legal-grade evidence attribution
  • Reduces inference complexity
  • Improves scalability
  • Aligns FireForm with production emergency-services requirements
  • Establishes architectural clarity between extraction, mapping, and presentation

Given that multi-agency filing is the primary use case FireForm is designed to solve, resolving this redundancy and audit gap is foundational for production readiness.

…emplate tests with detailed assertions and mock setups
…tribution

The core promise of FireForm is 'report once, file everywhere'. This commit
delivers that promise at the API level.

## What this adds

### POST /forms/fill/batch
One incident transcript -> fill N agency PDFs in a single request.

  Extraction complexity: O(T*F) LLM calls  ->  O(1 + T) LLM calls
  (T templates, F fields each; e.g. 5 agencies x 10 fields = 50 -> 6 calls)

  Pipeline:
  1. Single canonical LLM extraction pass (all incident data, one call)
  2. Concurrent template-mapping passes via asyncio.gather() (T fast calls)
  3. Concurrent PDF fills via loop.run_in_executor() (no event-loop blocking)
  4. Per-template FormSubmission records + one BatchSubmission audit record
  5. Partial failure tolerance: some PDFs can fail without aborting others

### GET /forms/batches/{id}
Lightweight status check: per-template output paths, success/failure counts.

### GET /forms/batches/{id}/audit  [legal compliance endpoint]
Full canonical extraction for chain-of-custody verification. Every extracted
field carries the verbatim transcript quote used as evidence, allowing
supervisors and legal teams to trace every value in every filed form back
to a specific statement in the original incident recording.

## New files
- src/extractor.py          IncidentExtractor class (canonical + mapping)
- api/routes/batch.py       Three new endpoints
- api/schemas/batch.py      BatchFill, TemplateResult, EvidenceField, ...
- tests/test_batch.py       24 tests (all passing)

## Modified files
- api/db/models.py          BatchSubmission SQLModel table
- api/db/repositories.py    create_batch / get_batch CRUD
- api/main.py               Register batch router; add OpenAPI metadata
- src/filler.py             Add fill_form_with_data() (name-based field fill)
- tests/conftest.py         Import FillJob + BatchSubmission for test DB

38 tests pass.