
feat: frontend UI, batch LLM extraction, dynamic PDF labels, API hardening#210

Open
utkarshqz wants to merge 1 commit into fireform-core:main from utkarshqz:feat/frontend-ui-and-api-fixes

Conversation


@utkarshqz utkarshqz commented Mar 9, 2026

feat: frontend UI, batch LLM extraction, dynamic PDF labels, API hardening

Summary

This PR delivers the complete FireForm pipeline working end-to-end for the first time. It adds the missing browser frontend, fixes the root cause of the LLM hallucination bug, hardens the API, and provides full documentation.

All changes in this PR are tightly interconnected — they cannot be separated without creating a broken intermediate state. The dependency chain is:

frontend/index.html
  └── needs CORS middleware          → api/main.py
  └── needs GET /templates           → api/routes/templates.py
        └── needs get_all_templates  → api/db/repositories.py
  └── needs POST /forms/fill         → api/routes/forms.py
        └── needs working LLM        → src/llm.py
              └── needs human labels → api/routes/templates.py (stores labels, not empty strings)
  └── needs GET /forms/download      → api/routes/forms.py + api/db/repositories.py

Every file changed is a direct dependency of the frontend working correctly. This is why they are submitted together.


Closes / Fixes

Closes #1
Closes #102
Closes #160
Closes #162
Closes #165
Closes #173
Closes #196
Closes #205
Fixes #135
Fixes #145
Fixes #149
Fixes #187
Addresses #206


Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

What changed and why

1. 🧠 Root cause fix — LLM hallucination (#173)

This was the most impactful bug in the codebase. Every PDF came out with the same value repeated in every field.

Root cause — traced through 3 layers:

| Layer | Problem | Fix |
|---|---|---|
| `templates.py` | Stored fields as `{"textbox_0_0": ""}` — empty values, meaningless to any LLM | Now reads real labels from PDF annotations: `{"JobTitle": "Job Title"}` |
| `llm.py` | Called Ollama once per field using the internal PDF name `textbox_0_0` — Mistral had no idea what the field meant | Single batch call with ALL fields + human-readable labels in one prompt |
| Mistral | Received "What is textbox_0_0?" 7 times → guessed the same name each time | Receives the complete form context → extracts correct, distinct values |

Performance improvement: 7 Ollama API calls → 1 Ollama API call (O(N) → O(1))

The exact prompt now sent to Mistral:

You are a form-filling assistant. Extract values from the transcript to fill the form fields below.
Return ONLY a JSON object. No explanation. No markdown. No extra text.

FORM FIELDS (each line: "internal_key": null  // visible label on form):
{
  "NAME/SID": null,         // Name/Sid
  "JobTitle": null,         // Job Title
  "Department": null,       // Department
  "Phone Number": null,     // Phone Number
  "email": null,            // Email
  "Date7_af_date": null,    // Date7
  "signature": null         // Signature
}

RULES:
1. Replace null with the value from the transcript
2. Keep the exact key names
3. Return valid JSON only
4. If a value is not found in the transcript, use null

Transcript:
Employee name is John Smith. Employee ID is EMP-2024-789.
Job title is Firefighter Paramedic. Department is Emergency Medical Services.
Phone number is 916-555-0147.

JSON:

Real Mistral output (verified locally, 09/03/2026):

{
  "NAME/SID": "John Smith; EMP-2024-789",
  "JobTitle": "Firefighter Paramedic",
  "Department": "Emergency Medical Services",
  "Phone Number": "916-555-0147",
  "email": null,
  "Date7_af_date": null,
  "signature": null
}

4 fields correctly extracted with distinct values. 3 fields correctly null (email, date, signature were not in the input transcript). Zero hallucination.
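For reference, the batch prompt above can be assembled mechanically from the labeled-fields dict. The sketch below is illustrative only — the function name `build_batch_prompt` is hypothetical and the real builder lives in `src/llm.py`:

```python
def build_batch_prompt(fields: dict[str, str], transcript: str) -> str:
    """Assemble the single batch prompt from {internal_key: visible_label}
    pairs, e.g. {"JobTitle": "Job Title"}. Illustrative sketch, not the
    literal src/llm.py code."""
    lines = [f'  "{key}": null,  // {label}' for key, label in fields.items()]
    # Drop the trailing comma on the last entry to keep the JSON-ish block valid
    lines[-1] = lines[-1].replace('null,', 'null ', 1)
    return (
        "You are a form-filling assistant. Extract values from the transcript "
        "to fill the form fields below.\n"
        "Return ONLY a JSON object. No explanation. No markdown. No extra text.\n\n"
        'FORM FIELDS (each line: "internal_key": null  // visible label on form):\n'
        "{\n" + "\n".join(lines) + "\n}\n\n"
        "RULES:\n"
        "1. Replace null with the value from the transcript\n"
        "2. Keep the exact key names\n"
        "3. Return valid JSON only\n"
        "4. If a value is not found in the transcript, use null\n\n"
        f"Transcript:\n{transcript}\n\nJSON:"
    )
```

Because the whole form is serialized into one prompt, the call count no longer scales with the field count.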


2. 🖥️ Frontend UI — first working browser interface (#1, #196)

FireForm had no UI. Users had to write curl commands or use Postman to interact with it. This PR adds a complete single-file browser interface at frontend/index.html.

What the UI does:

  • Step 1 — Upload: Drag or select any fillable PDF form. FireForm reads the field names automatically.
  • Step 2 — Select: Choose from previously uploaded templates via dropdown.
  • Step 3 — Describe: Type or paste a plain-language incident description.
  • Step 4 — Fill: One click sends everything to Ollama and fills the PDF.
  • Step 5 — Download: Get the completed PDF immediately.

Additional UI features:

  • Live API status indicator — shows green/red dot for uvicorn + Ollama connectivity
  • Submission history — see all previously filled forms in the session
  • Dark/light mode toggle — persists across page reloads
  • Clear error messages — tells the user exactly what went wrong (Ollama offline, invalid PDF, etc.)
  • No build step required — pure HTML/CSS/JS, served with python -m http.server 3000

Screenshots (from local testing):

FireForm UI

FireForm UI — results panel


3. 📄 Dynamic PDF field label extraction (#162, #173)

The old template system stored every PDF field as an empty string, throwing away the label information that was sitting right there in the PDF file.

Old templates.py:

# Every field got an empty string — useless to LLM
fields = {name: "" for name in raw_fields}
# Stored: {"textbox_0_0": "", "textbox_0_1": "", "textbox_0_2": ""}

New templates.py:

for internal_name, field_data in raw_fields.items():
    # Try PDF tooltip label first (/TU), then internal name (/T)
    label = field_data.get("/TU") or field_data.get("/T")
    if not label or label == internal_name:
        # Prettify camelCase and underscores as readable fallback
        label = re.sub(r'([a-z])([A-Z])', r'\1 \2', internal_name)
        label = re.sub(r'_af_.*$', '', label).replace('_', ' ').title()
    fields[internal_name] = label

# Stored: {
#   "JobTitle": "Job Title",
#   "Phone Number": "Phone Number",
#   "Date7_af_date": "Date7",
#   "NAME/SID": "Name/Sid"
# }

This works with any PDF regardless of how it was created — whether it has proper tooltip labels or just internal names.
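The fallback prettification path can be checked in isolation. The helper name `prettify_label` is introduced here just for illustration; it mirrors the regex logic in the snippet above:

```python
import re

def prettify_label(internal_name: str) -> str:
    """Derive a readable label when the PDF has no /TU tooltip
    (same transformations as the templates.py snippet above)."""
    label = re.sub(r'([a-z])([A-Z])', r'\1 \2', internal_name)  # split camelCase
    label = re.sub(r'_af_.*$', '', label)  # drop Acrobat suffixes like _af_date
    return label.replace('_', ' ').title()

# "JobTitle"      -> "Job Title"
# "Date7_af_date" -> "Date7"
# "NAME/SID"      -> "Name/Sid"
```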


4. 🔌 API hardening (#135, #145, #149, #165, #187, #205)

api/main.py — three critical additions:

# 1. CORS middleware — allows browser to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# 2. Global error handler — AppError now returns proper JSON with correct status codes
#    Without this, any AppError caused a 500 crash AND CORS headers were never added,
#    so the browser saw an opaque network error with no useful message
@app.exception_handler(AppError)
def app_error_handler(request: Request, exc: AppError):
    return JSONResponse(
        status_code=exc.status_code,
        content={"detail": exc.message}
    )

# 3. from typing import Union — fixes NameError on startup (#135, #187)

api/routes/forms.py:

  • Single DB query per request — was fetching the template twice (#149)
  • Fixed fields type mismatch — template.fields is stored as a dict but Controller.fill_form() expects a list; added an isinstance guard with list(fields.keys()) conversion for backward compatibility
  • ConnectionError → returns 503 with an "Ollama not running" message (not a cryptic 500)
  • Null path guard — returns 500 with a clear message if PDF generation fails silently
  • os.path.exists() check before the DB insert — prevents saving records that point to missing files
  • GET /forms/download/{submission_id} — serves the filled PDF as a file download (#205)
  • GET /forms/{submission_id} — retrieves a submission record by ID
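The dict→list guard and the error-to-status mapping described above can be sketched as two small helpers. Both names (`normalize_fields`, `classify_fill_error`) are hypothetical, chosen for illustration rather than taken from forms.py:

```python
def normalize_fields(fields):
    """Controller.fill_form() expects a list of field names; templates now
    store a dict of {internal_name: label}, so accept both formats."""
    return list(fields.keys()) if isinstance(fields, dict) else fields

def classify_fill_error(exc: Exception) -> tuple[int, str]:
    """Map low-level failures to HTTP status + detail message
    (illustrative policy sketch, not the literal route code)."""
    if isinstance(exc, ConnectionError):
        return 503, "Ollama not running"
    return 500, "PDF generation failed; see server logs"
```

In the route, the returned status/detail pair would feed the same JSON error shape that the global AppError handler produces.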

api/routes/templates.py:

  • Stores human-readable labels extracted from PDF annotations instead of empty strings (see section 3)

api/db/repositories.py:

  • Adds get_all_templates, which backs the new GET /templates endpoint used by the frontend dropdown

src/llm.py:

  • Handles both dict and list for target_fields — an isinstance check keeps compatibility with both the old list-based and new dict-based field formats
  • Strips markdown code fences from the Mistral response (a fenced ```json block → clean JSON)
  • Falls back gracefully to per-field extraction if the batch JSON parse fails

src/main.py:

  • Fixed broken import (from backend import FillController) which caused silent PDF generation failures
  • Added os.fspath() normalization for PDF paths on Windows

5. 📚 Documentation (#102)

docs/SETUP.md — complete setup guide for Windows, Linux, and macOS covering:

  • Prerequisites (Python 3.11+, Ollama 0.17.7+, Mistral 7B)
  • Virtual environment setup
  • Step-by-step server startup
  • Frontend usage walkthrough with example transcript
  • How the batch AI extraction works (with before/after explanation)
  • Environment variables
  • Full troubleshooting section for every known startup error

docs/frontend.md — frontend architecture and API integration reference

docs/demo/ — real screenshots and filled PDF from local testing on 09/03/2026


How Has This Been Tested?

Manual end-to-end test (verified locally, 09/03/2026):

1. uvicorn api.main:app --reload         ← API running
2. ollama serve                           ← Mistral running
3. cd frontend && python -m http.server 3000
4. http://localhost:3000                  ← UI loads, green dot visible
5. Upload file.pdf (Cal Fire vaccination form)
6. Enter transcript (John Smith, Firefighter Paramedic, etc.)
7. Click Fill Form
8. Download filled PDF — 4 fields correctly populated, 3 correctly null

Unit tests (PR #209): 52/52 passing

python -m pytest tests/ -v
52 passed in 0.58s
  • End-to-end pipeline: upload PDF → describe incident → download filled PDF ✅
  • 52 unit tests pass (see PR #209) ✅
  • Frontend loads, API status indicator shows green ✅
  • All error states return correct HTTP codes (404/422/500/503) ✅
  • CORS — browser can call API from localhost:3000 ✅
  • Batch LLM: 1 Ollama call fills all 7 fields (verified in uvicorn logs) ✅
  • fields dict→list conversion verified working end-to-end ✅

Environment:

  • OS: Windows 11
  • Python: 3.11.9
  • Ollama: 0.17.7
  • Model: Mistral 7B

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
