
feat: frontend UI, batch LLM extraction, dynamic PDF labels, API hardening#210

Open
utkarshqz wants to merge 1 commit into fireform-core:main from utkarshqz:feat/frontend-ui-and-api-fixes

Conversation


@utkarshqz utkarshqz commented Mar 9, 2026

feat: frontend UI, batch LLM extraction, dynamic PDF labels, API hardening

Summary

This PR delivers the complete FireForm pipeline working end-to-end for the first time. It adds the missing browser frontend, fixes the root cause of the LLM hallucination bug, hardens the API, and provides full documentation.

All changes in this PR are tightly interconnected — they cannot be separated without creating a broken intermediate state. The dependency chain is:

frontend/index.html
  └── needs CORS middleware          → api/main.py
  └── needs GET /templates           → api/routes/templates.py
        └── needs get_all_templates  → api/db/repositories.py
  └── needs POST /forms/fill         → api/routes/forms.py
        └── needs working LLM        → src/llm.py
              └── needs human labels → api/routes/templates.py (stores labels, not empty strings)
  └── needs GET /forms/download      → api/routes/forms.py + api/db/repositories.py

Every file changed is a direct dependency of the frontend working correctly. This is why they are submitted together.


Closes / Fixes

Closes #1
Closes #102
Closes #160
Closes #162
Closes #165
Closes #173
Closes #196
Closes #205
Fixes #135
Fixes #145
Fixes #149
Fixes #187
Addresses #206


Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

What changed and why

1. 🧠 Root cause fix — LLM hallucination (#173)

This was the most impactful bug in the codebase. Every PDF came out with the same value repeated in every field.

Root cause — traced through 3 layers:

| Layer | Problem | Fix |
|---|---|---|
| `templates.py` | Stored fields as `{"textbox_0_0": ""}` — empty values, meaningless to any LLM | Now reads real labels from PDF annotations: `{"JobTitle": "Job Title"}` |
| `llm.py` | Called Ollama once per field using the internal PDF name `textbox_0_0` — Mistral had no idea what the field meant | Single batch call with ALL fields + human-readable labels in one prompt |
| Mistral | Received "What is textbox_0_0?" 7 times → guessed the same name each time | Receives the complete form context → extracts correct, distinct values |

Performance improvement: 7 Ollama API calls → 1 Ollama API call (O(N) → O(1))

The exact prompt now sent to Mistral:

You are a form-filling assistant. Extract values from the transcript to fill the form fields below.
Return ONLY a JSON object. No explanation. No markdown. No extra text.

FORM FIELDS (each line: "internal_key": null  // visible label on form):
{
  "NAME/SID": null,         // Name/Sid
  "JobTitle": null,         // Job Title
  "Department": null,       // Department
  "Phone Number": null,     // Phone Number
  "email": null,            // Email
  "Date7_af_date": null,    // Date7
  "signature": null         // Signature
}

RULES:
1. Replace null with the value from the transcript
2. Keep the exact key names
3. Return valid JSON only
4. If a value is not found in the transcript, use null

Transcript:
Employee name is John Smith. Employee ID is EMP-2024-789.
Job title is Firefighter Paramedic. Department is Emergency Medical Services.
Phone number is 916-555-0147.

JSON:

Real Mistral output (verified locally, 09/03/2026):

{
  "NAME/SID": "John Smith; EMP-2024-789",
  "JobTitle": "Firefighter Paramedic",
  "Department": "Emergency Medical Services",
  "Phone Number": "916-555-0147",
  "email": null,
  "Date7_af_date": null,
  "signature": null
}

4 fields correctly extracted with distinct values. 3 fields correctly null (email, date, signature were not in the input transcript). Zero hallucination.
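For reference, the batch prompt above can be assembled mechanically from the labeled-fields dict. The sketch below is illustrative only — the function name `build_batch_prompt` is hypothetical and the real builder lives in `src/llm.py`:

```python
def build_batch_prompt(fields: dict[str, str], transcript: str) -> str:
    """Assemble the single batch prompt from {internal_key: visible_label}
    pairs, e.g. {"JobTitle": "Job Title"}. Illustrative sketch, not the
    literal src/llm.py code."""
    lines = [f'  "{key}": null,  // {label}' for key, label in fields.items()]
    # Drop the trailing comma on the last entry to keep the JSON-ish block valid
    lines[-1] = lines[-1].replace('null,', 'null ', 1)
    return (
        "You are a form-filling assistant. Extract values from the transcript "
        "to fill the form fields below.\n"
        "Return ONLY a JSON object. No explanation. No markdown. No extra text.\n\n"
        'FORM FIELDS (each line: "internal_key": null  // visible label on form):\n'
        "{\n" + "\n".join(lines) + "\n}\n\n"
        "RULES:\n"
        "1. Replace null with the value from the transcript\n"
        "2. Keep the exact key names\n"
        "3. Return valid JSON only\n"
        "4. If a value is not found in the transcript, use null\n\n"
        f"Transcript:\n{transcript}\n\nJSON:"
    )
```

Because the whole form is serialized into one prompt, the call count no longer scales with the field count.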


2. 🖥️ Frontend UI — first working browser interface (#1, #196)

FireForm had no UI. Users had to write curl commands or use Postman to interact with it. This PR adds a complete single-file browser interface at frontend/index.html.

What the UI does:

  • Step 1 — Upload: Drag or select any fillable PDF form. FireForm reads the field names automatically.
  • Step 2 — Select: Choose from previously uploaded templates via dropdown.
  • Step 3 — Describe: Type or paste a plain-language incident description.
  • Step 4 — Fill: One click sends everything to Ollama and fills the PDF.
  • Step 5 — Download: Get the completed PDF immediately.

Additional UI features:

  • Live API status indicator — shows green/red dot for uvicorn + Ollama connectivity
  • Submission history — see all previously filled forms in the session
  • Dark/light mode toggle — persists across page reloads
  • Clear error messages — tells the user exactly what went wrong (Ollama offline, invalid PDF, etc.)
  • No build step required — pure HTML/CSS/JS, served with python -m http.server 3000

Screenshots (from local testing):

FireForm UI

FireForm UI — results panel


3. 📄 Dynamic PDF field label extraction (#162, #173)

The old template system stored every PDF field as an empty string, throwing away the label information that was sitting right there in the PDF file.

Old templates.py:

# Every field got an empty string — useless to LLM
fields = {name: "" for name in raw_fields}
# Stored: {"textbox_0_0": "", "textbox_0_1": "", "textbox_0_2": ""}

New templates.py:

for internal_name, field_data in raw_fields.items():
    # Try PDF tooltip label first (/TU), then internal name (/T)
    label = field_data.get("/TU") or field_data.get("/T")
    if not label or label == internal_name:
        # Prettify camelCase and underscores as readable fallback
        label = re.sub(r'([a-z])([A-Z])', r'\1 \2', internal_name)
        label = re.sub(r'_af_.*$', '', label).replace('_', ' ').title()
    fields[internal_name] = label

# Stored: {
#   "JobTitle": "Job Title",
#   "Phone Number": "Phone Number",
#   "Date7_af_date": "Date7",
#   "NAME/SID": "Name/Sid"
# }

This works with any PDF regardless of how it was created — whether it has proper tooltip labels or just internal names.
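The fallback prettification path can be checked in isolation. The helper name `prettify_label` is introduced here just for illustration; it mirrors the regex logic in the snippet above:

```python
import re

def prettify_label(internal_name: str) -> str:
    """Derive a readable label when the PDF has no /TU tooltip
    (same transformations as the templates.py snippet above)."""
    label = re.sub(r'([a-z])([A-Z])', r'\1 \2', internal_name)  # split camelCase
    label = re.sub(r'_af_.*$', '', label)  # drop Acrobat suffixes like _af_date
    return label.replace('_', ' ').title()

# "JobTitle"      -> "Job Title"
# "Date7_af_date" -> "Date7"
# "NAME/SID"      -> "Name/Sid"
```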


4. 🔌 API hardening (#135, #145, #149, #165, #187, #205)

api/main.py — three critical additions:

# 1. CORS middleware — allows browser to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# 2. Global error handler — AppError now returns proper JSON with correct status codes
#    Without this, any AppError caused a 500 crash AND CORS headers were never added,
#    so the browser saw an opaque network error with no useful message
@app.exception_handler(AppError)
def app_error_handler(request: Request, exc: AppError):
    return JSONResponse(
        status_code=exc.status_code,
        content={"detail": exc.message}
    )

# 3. from typing import Union — fixes NameError on startup (#135, #187)

api/routes/forms.py:

  • Single DB query per request — was fetching the template twice (#149)
  • Fixed fields type mismatch — template.fields is stored as a dict but Controller.fill_form() expects a list; added an isinstance guard with list(fields.keys()) conversion for backward compatibility
  • ConnectionError → returns 503 with an "Ollama not running" message (not a cryptic 500)
  • Null path guard — returns 500 with a clear message if PDF generation fails silently
  • os.path.exists() check before the DB insert — prevents saving records that point to missing files
  • GET /forms/download/{submission_id} — serves the filled PDF as a file download (#205)
  • GET /forms/{submission_id} — retrieves a submission record by ID
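The dict→list guard and the error-to-status mapping described above can be sketched as two small helpers. Both names (`normalize_fields`, `classify_fill_error`) are hypothetical, chosen for illustration rather than taken from forms.py:

```python
def normalize_fields(fields):
    """Controller.fill_form() expects a list of field names; templates now
    store a dict of {internal_name: label}, so accept both formats."""
    return list(fields.keys()) if isinstance(fields, dict) else fields

def classify_fill_error(exc: Exception) -> tuple[int, str]:
    """Map low-level failures to HTTP status + detail message
    (illustrative policy sketch, not the literal route code)."""
    if isinstance(exc, ConnectionError):
        return 503, "Ollama not running"
    return 500, "PDF generation failed; see server logs"
```

In the route, the returned status/detail pair would feed the same JSON error shape that the global AppError handler produces.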

api/routes/templates.py:

  • Stores human-readable labels extracted from PDF annotations instead of empty strings (see section 3)

api/db/repositories.py:

  • Adds get_all_templates, which backs the new GET /templates endpoint used by the frontend dropdown

src/llm.py:

  • Handles both dict and list for target_fields — an isinstance check keeps compatibility with both the old list-based and new dict-based field formats
  • Strips markdown code fences from the Mistral response (a fenced ```json block → clean JSON)
  • Falls back gracefully to per-field extraction if the batch JSON parse fails

src/main.py:

  • Fixed broken import (from backend import FillController) which caused silent PDF generation failures
  • Added os.fspath() normalization for PDF paths on Windows

5. 📚 Documentation (#102)

docs/SETUP.md — complete setup guide for Windows, Linux, and macOS covering:

  • Prerequisites (Python 3.11+, Ollama 0.17.7+, Mistral 7B)
  • Virtual environment setup
  • Step-by-step server startup
  • Frontend usage walkthrough with example transcript
  • How the batch AI extraction works (with before/after explanation)
  • Environment variables
  • Full troubleshooting section for every known startup error

docs/frontend.md — frontend architecture and API integration reference

docs/demo/ — real screenshots and filled PDF from local testing on 09/03/2026


How Has This Been Tested?

Manual end-to-end test (verified locally, 09/03/2026):

1. uvicorn api.main:app --reload         ← API running
2. ollama serve                           ← Mistral running
3. cd frontend && python -m http.server 3000
4. http://localhost:3000                  ← UI loads, green dot visible
5. Upload file.pdf (Cal Fire vaccination form)
6. Enter transcript (John Smith, Firefighter Paramedic, etc.)
7. Click Fill Form
8. Download filled PDF — 4 fields correctly populated, 3 correctly null

Unit tests (PR #209): 52/52 passing

python -m pytest tests/ -v
52 passed in 0.58s
  • End-to-end pipeline: upload PDF → describe incident → download filled PDF ✅
  • 52 unit tests pass (see PR #209) ✅
  • Frontend loads, API status indicator shows green ✅
  • All error states return correct HTTP codes (404/422/500/503) ✅
  • CORS — browser can call API from localhost:3000 ✅
  • Batch LLM: 1 Ollama call fills all 7 fields (verified in uvicorn logs) ✅
  • fields dict→list conversion verified working end-to-end ✅

Environment:

  • OS: Windows 11
  • Python: 3.11.9
  • Ollama: 0.17.7
  • Model: Mistral 7B

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
