Spatial sorting #175

Open
Cubix33 wants to merge 2 commits into fireform-core:main from Cubix33:spatial-sorting

Cubix33 commented Mar 3, 2026

Closes #168

📝 Description

This PR refactors the PDF annotation sorting logic in src/filler.py, replacing the strict coordinate-based sort with a Y-Cluster Spatial Sorting algorithm. Form fields on the same visual line (like "First Name" and "Last Name") are first grouped into "rows" and then sorted left-to-right within each row, even if their Y-coordinates differ by a few PDF points.

🛠️ Technical Changes

  • Y-Cluster Implementation: Introduced a clustering mechanism in src/filler.py that groups PDF widgets whose Y-coordinates fall within a 10-point tolerance (PDF user-space units, 1/72 inch by default).
  • Two-Pass Sorting:
    1. Vertical Pass: Groups annotations into "visual rows" based on their bottom-edge Y-coordinate (Rect[1]).
    2. Horizontal Pass: Sorts each individual row strictly by the X-coordinate (Rect[0]).
  • Robustness: This prevents "Last Name" from being filled before "First Name" simply because its box was drawn 1 point higher on the page.
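
The two passes above can be sketched roughly as follows. This is a minimal illustration, not the code in src/filler.py: the function name `sort_rects_by_visual_row` is hypothetical, rectangles are assumed to be `[x0, y0, x1, y1]` sequences in the default PDF coordinate system (origin at the bottom-left), and each row is assumed to be anchored on its topmost member when the tolerance is applied.

```python
from typing import List, Sequence

# Assumed shape: [x0, y0, x1, y1] with a bottom-left origin.
# Values may be numbers or numeric strings, so they are passed through float().
Rect = Sequence

Y_TOLERANCE = 10.0  # tolerance named in the PR description


def sort_rects_by_visual_row(rects: List[Rect], tolerance: float = Y_TOLERANCE) -> List[Rect]:
    """Hypothetical helper: group rects into visual rows, then sort each row by x."""
    # Vertical pass: walk from the top of the page down (descending bottom-edge y).
    top_down = sorted(rects, key=lambda r: -float(r[1]))
    rows: List[List[Rect]] = []
    for rect in top_down:
        # Assumption: a rect joins the current row if its y is within the
        # tolerance of that row's first (topmost) member.
        if rows and abs(float(rows[-1][0][1]) - float(rect[1])) <= tolerance:
            rows[-1].append(rect)
        else:
            rows.append([rect])

    # Horizontal pass: sort every row left to right by x0.
    result: List[Rect] = []
    for row in rows:
        result.extend(sorted(row, key=lambda r: float(r[0])))
    return result


example = [
    [300, 701, 500, 720],  # "Last Name", drawn 1 unit higher
    [50, 700, 250, 720],   # "First Name"
    [50, 650, 250, 670],   # a field on the next row
]
print(sort_rects_by_visual_row(example))
```

Anchoring each row on its topmost member keeps a long run of slightly descending fields from chaining into one oversized row; comparing against the previous member instead would be an equally plausible design with the opposite trade-off.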

💡 Rationale

Strict coordinate sorting often fails on professionally designed, multi-column forms or tables where fields are meant to be read left-to-right but are slightly misaligned in the underlying PDF. This change lets FireForm fill dense, complex layouts (medical, legal, and government forms) in the intended reading order more reliably.
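
To make the failure mode concrete, here is a small sketch of a strict sort going wrong. It assumes the strict sort keys on (descending Y, ascending X), which may not match the previous code exactly; the field names and coordinates are invented for illustration.

```python
# Hypothetical fields: name -> [x0, y0, x1, y1], bottom-left origin.
fields = {
    "First Name": [50, 700, 250, 720],
    "Last Name": [300, 701, 500, 720],  # drawn 1 unit higher than "First Name"
    "Email": [50, 650, 250, 670],
}

# Assumed strict sort: top to bottom by exact y, then left to right by x.
strict_order = sorted(fields, key=lambda name: (-fields[name][1], fields[name][0]))

# "Last Name" comes first because 701 > 700, even though both fields
# sit on the same visual line.
print(strict_order)
```

With exact Y values as the primary key, the X-coordinate only breaks ties between fields that share an identical Y, so any small misalignment decides the order.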

🧪 Quality Check

  • Verified that side-by-side fields on the same visual line are filled correctly (Left → Right).
  • Confirmed that the vertical (Top → Bottom) flow is maintained for multi-row forms.
  • Validated that the 10-point tolerance threshold handles common PDF alignment variances.
  • Tested on standard single-column PDFs to confirm there are no regressions.

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT]: Implement Y-Cluster Spatial Sorting for PDF Annotations
