@Uli-Z Uli-Z commented Nov 16, 2025

This PR introduces a lean, extensible import architecture and wires the Spliit‑JSON format as the first adapter. Goal: reliable import/export as a core capability to ensure data sovereignty and smooth onboarding.

Motivation

  • Import/Export is central: users should control their data and move it between instances.
  • Onboarding from Splitwise should be simple and lossless (no balance drift) and feel reliable.
  • A long-requested Splitwise import (refs Import from Splitwise #22) needs a solid, generic import foundation, not a one-off integration.
  • To accelerate the discussion, I built a working prototype with a lot of help from AI. It works, but I’m not a web developer, so a thorough review is welcome.

Scope in this PR

  • Core layer: import models, registry, and file pipeline under src/lib/imports/**.
  • Formats: Spliit‑JSON v1 and a Debug format; automatic detection via looksLike + confidence (highest wins; deterministic tie-breaks).
  • Server: TRPC procedures for preview/start/chunk/cancel/finalize under groups/import.
  • UI: import dialog (analyze → preview → import) with progress and cancel; import is enabled only after a successful analysis.
  • Integration: “Import from file” as a submenu next to Create; “Add by URL” is unchanged vs trunk.
  • Large flows: progress UI and user-initiated cancel are included.
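
The detection rule from the Formats bullet above (looksLike + confidence, highest wins, deterministic tie-breaks) could look roughly like the sketch below. Only the names `ImportFormat` and `looksLike` come from this PR; the signatures, the `priority` field as a tie-breaker, the `expenses` key check, and the example confidence values are assumptions made for illustration, not the actual code under src/lib/imports/**.

```typescript
// Hypothetical shapes: the exact signatures in the PR may differ.
interface ImportFormat {
  id: string;
  priority: number; // assumed tie-breaker: higher wins on equal confidence
  looksLike(content: string): number; // confidence in [0, 1]; 0 means "not this format"
}

const registry: ImportFormat[] = [];

function registerFormat(format: ImportFormat): void {
  registry.push(format);
}

// Pick the format with the highest confidence. Ties are broken by priority,
// then by id, so the result never depends on registration order.
function detectFormat(content: string): ImportFormat | undefined {
  const candidates = registry
    .map((format) => ({ format, confidence: format.looksLike(content) }))
    .filter((candidate) => candidate.confidence > 0);
  candidates.sort(
    (a, b) =>
      b.confidence - a.confidence ||
      b.format.priority - a.format.priority ||
      a.format.id.localeCompare(b.format.id),
  );
  return candidates[0]?.format;
}

// Example adapters. The structure check (a top-level "expenses" array) is an
// assumption about the export, used here only as a minimal structure check.
registerFormat({
  id: "spliit-json",
  priority: 10,
  looksLike(content) {
    try {
      const data: unknown = JSON.parse(content);
      const hasExpenses =
        typeof data === "object" &&
        data !== null &&
        Array.isArray((data as { expenses?: unknown }).expenses);
      return hasExpenses ? 0.9 : 0;
    } catch {
      return 0; // not JSON at all
    }
  },
});

registerFormat({
  id: "debug",
  priority: 0, // registered at low priority
  looksLike: (content) => (content.startsWith("DEBUG_IMPORT") ? 1 : 0),
});
```

With these two adapters registered, `detectFormat('{"expenses":[]}')` resolves to the `spliit-json` adapter and a file starting with `DEBUG_IMPORT` resolves to `debug`; content that no adapter recognizes yields `undefined`, which is the case where the UI would keep the import button disabled.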

Decisions

  • Automatic format detection keeps the UI simple and resilient; explicit user selection would add friction, and future versions of the same tools will vary in structure.
  • Focus on unobtrusive integration: the new import flow sits next to existing create actions without altering the established UI patterns.

Future work

  • Splitwise adapter (multi-language), more formats.

Demo

[Demo attachment: firefox_B57h8EyK8Z]

@Uli-Z Uli-Z force-pushed the feature/generic-import branch from f64243a to d42fd24 Compare November 17, 2025 07:47
Uli-Z added 8 commits December 7, 2025 20:26
- Introduce ImportFormat interface and in-memory registry for adapters
- Add registry helper to detect formats and delegate parsing
- Add file import builder to parse, collect errors, and compute participant summaries via balances
- Establish clear types for parsed group meta (name, currency, participants)
- Implement robust detection on full JSON payload with minimal structure checks
- Parse export into ExpenseFormValues; coerce amounts/dates and validate against schema
- Aggregate per-row errors and expose optional group meta (name, currency, participants)
- Self-register adapter in the global registry
- Add marker-based debug format (DEBUG_IMPORT/DEBUG_ERRORS) with unambiguous detection
- Emit one error per line for quick UI testing of failure paths
- Include simple fixture file for manual verification
- Register debug adapter with registry at low priority
…finalize)

- Expose preview endpoint to parse and summarize uploaded file before import
- Implement job-based create flow with chunked processing and progress reporting
- Provide cancel/cleanup and finalize endpoints to control lifecycle
- Register endpoints under groups router
- Dropzone with drag-and-drop and accessible labeling
- Analysis panel to display detected format, totals and errors
- Progress view for chunked import with visual bar
- Result view to confirm completion or cancellation
- Combine upload + preview + scroll-to-confirm + chunked import in one dialog
- Handle cancel/finalize flows with toasts and resilient state reset
- Support optional prefill of group name from parsed file meta
- Integrate TRPC mutations with defensive error handling
- Add Import from file option to create menu
- Mount FileImportModal and navigate to new group on success
- Persist created group to recent list and refresh view
- Add strings for upload, preview errors, progress, and results
- Provide German, English, Spanish and French localizations
- Wire keys used across import components and modal
@Uli-Z Uli-Z force-pushed the feature/generic-import branch from d42fd24 to 907bd3a Compare December 7, 2025 19:31
Uli-Z added 7 commits December 9, 2025 08:56
…ility

Extracted complex parsing logic from 'parseToInternal' into smaller, private helper methods for better readability and easier maintenance.
Implemented a daily cleanup of ImportJob records older than 24 hours at the start of a new import, preventing database bloat from stale jobs.
Implemented Zod validation for 'expensesToCreate' in the ImportJob model when processing chunks. This ensures data integrity and prevents runtime errors from corrupted job data by safely parsing and validating the JSON.
…factoring

This commit consolidates the review feedback implementation:

Security & Robustness:
- Enforced a 10MB limit on file uploads to prevent DoS.
- Implemented optimistic locking in chunk processing to prevent race conditions.

Refactoring:
- Extracted UI logic into 'useFileImportProcess' hook for better separation of concerns.
- Centralized participant derivation logic in the import library.

Quality:
- Added integration tests for category mapping consistency.
- Applied code formatting.

Uli-Z commented Dec 10, 2025

Recent Updates (Yesterday & Today)

Since the initial submission of this PR, significant further improvements have been implemented:

  1. Robust Backend Architecture (Serverless Ready)

    • Database Persistence: Replaced in-memory job tracking with a dedicated ImportJob database model. This is critical for serverless environments (like Vercel), ensuring long-running imports survive lambda restarts or cold starts.
    • State Management: The import process is now fully stateful in the DB, allowing the client to poll progress reliably.
    • Auto-Cleanup: Implemented a maintenance routine that automatically cleans up stale import jobs older than 24 hours to prevent database bloat.
  2. Security & Safety

    • Strict DoS Protection: Enforced a 10MB hard limit on file uploads to prevent memory exhaustion attacks.
    • Concurrency Control: Implemented optimistic locking in the chunk processing logic. This prevents race conditions in which parallel requests could otherwise create duplicate expenses.
  3. Code Quality & Refactoring

    • UI Logic Extraction: The FileImportModal was refactored into a "dumb" view component. Complex state and orchestration logic now resides in a custom useFileImportProcess hook, improving testability and separation of concerns.
    • Centralized Business Logic: Logic for deriving participants (when missing in source files) was moved from the API layer to the core library (file-import.ts), keeping the tRPC routers thin.
  4. Comprehensive Testing

    • Category Consistency: Added integration tests ensuring the hardcoded CATEGORY_LOOKUP in the adapter stays in sync with the database seed data.
    • Adapter Validation: Added unit tests for the SpliitJsonFormat adapter to verify parsing edge cases and error handling.
      (Note: I used the Gemini 3 CLI agent for this code review.)
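
For reviewers unfamiliar with the pattern, the optimistic locking mentioned under Concurrency Control above can be sketched as follows. This is a simplified in-memory illustration, not the code in this PR: the `version` and `nextIndex` fields and the function names are hypothetical, and the `Map` stands in for a conditional database update of the form `UPDATE ... WHERE id = ? AND version = ?`.

```typescript
// Hypothetical job shape; field names are assumptions for illustration.
interface ImportJob {
  id: string;
  version: number; // bumped on every successful update
  nextIndex: number; // index of the first expense not yet imported
  total: number;
}

const jobs = new Map<string, ImportJob>();

// Stand-in for a conditional UPDATE: succeeds only if the stored row still
// has the version the caller read earlier.
function updateIfVersionMatches(
  id: string,
  expectedVersion: number,
  nextIndex: number,
): boolean {
  const job = jobs.get(id);
  if (!job || job.version !== expectedVersion) return false;
  jobs.set(id, { ...job, nextIndex, version: job.version + 1 });
  return true;
}

// Process one chunk based on a previously read snapshot of the job.
// The range is claimed first; only the request that wins the version check
// goes on to insert, so a parallel request with the same snapshot is rejected.
function processChunk(
  snapshot: ImportJob,
  chunkSize: number,
  inserted: number[],
): "processed" | "conflict" {
  const end = Math.min(snapshot.nextIndex + chunkSize, snapshot.total);
  if (!updateIfVersionMatches(snapshot.id, snapshot.version, end)) {
    return "conflict";
  }
  for (let i = snapshot.nextIndex; i < end; i++) {
    inserted.push(i); // stands in for creating expense number i
  }
  return "processed";
}
```

If two requests read the same snapshot, the first call returns "processed" and the second returns "conflict" without inserting anything, so the same expenses are never created twice. The losing request can re-read the job and retry with the fresh version.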

- Updated ImportFormat interface to be async (Promise-based) to support non-blocking operations and future worker offloading.
- Moved file preview logic from server-side tRPC to client-side.
- Removed fs dependencies from import parsers to allow browser execution.
- Removed obsolete importFromFilePreview tRPC endpoint.
- Fixed type errors and updated tests to align with async interface.
Adds a new tRPC procedure 'processBatch' that accepts a list of expenses and inserts them transactionally. This enables client-side batching strategies.
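
To illustrate what "inserts them transactionally" means for a batch, here is a minimal sketch with an in-memory store standing in for the database. The class, the `ExpenseInput` shape, and the validation rule are hypothetical; the real procedure would rely on a database transaction rather than a staged copy.

```typescript
// Hypothetical input shape for illustration only.
interface ExpenseInput {
  title: string;
  amount: number;
}

class InMemoryExpenseStore {
  private rows: ExpenseInput[] = [];

  // Apply all writes to a staged copy and swap it in only if `work` returns
  // normally, mimicking commit/rollback semantics.
  transaction(work: (staged: ExpenseInput[]) => void): void {
    const staged = this.rows.slice();
    work(staged); // a throw here leaves this.rows untouched
    this.rows = staged;
  }

  count(): number {
    return this.rows.length;
  }
}

// Insert the whole batch or nothing: one invalid expense aborts the batch.
function processBatch(store: InMemoryExpenseStore, batch: ExpenseInput[]): number {
  store.transaction((staged) => {
    for (const expense of batch) {
      if (!Number.isFinite(expense.amount)) {
        throw new Error(`Invalid amount for "${expense.title}"`);
      }
      staged.push(expense);
    }
  });
  return batch.length;
}
```

The all-or-nothing behavior matters for client-driven batching: if a batch fails, the client knows none of its expenses were written and can retry or report that batch without risking partial duplicates.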
Moves the import orchestration to the client to reduce server load and complexity.

- Implements batching logic (processing 50 expenses at a time).
- Adds a 10MB file size limit validation.
- Includes logic to correctly map participant names to UUIDs during the import process, ensuring expenses are linked to the correct users even if the source file uses names.
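
The three points above (batches of 50, the 10MB check, and name-to-UUID mapping) can be sketched on the client side as below. All names, types, and the `sendBatch` callback are assumptions for illustration; `sendBatch` stands in for calling the processBatch mutation and is not the actual hook code.

```typescript
const MAX_FILE_BYTES = 10 * 1024 * 1024; // 10MB limit
const BATCH_SIZE = 50;

// Hypothetical shapes: the source file refers to participants by name,
// the server expects participant ids.
interface ParsedExpense {
  title: string;
  amount: number;
  paidByName: string;
}
interface ExpenseToCreate {
  title: string;
  amount: number;
  paidById: string;
}

function assertFileSize(bytes: number): void {
  if (bytes > MAX_FILE_BYTES) {
    throw new Error("File exceeds the 10MB import limit");
  }
}

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Replace participant names with ids; fail loudly on an unknown name
// instead of linking the expense to the wrong user.
function resolveParticipants(
  expenses: ParsedExpense[],
  idsByName: Map<string, string>,
): ExpenseToCreate[] {
  return expenses.map((expense) => {
    const paidById = idsByName.get(expense.paidByName);
    if (paidById === undefined) {
      throw new Error(`Unknown participant: ${expense.paidByName}`);
    }
    return { title: expense.title, amount: expense.amount, paidById };
  });
}

// Send batches one after another and report progress after each one.
async function importAll(
  expenses: ExpenseToCreate[],
  sendBatch: (batch: ExpenseToCreate[]) => Promise<void>,
  onProgress: (done: number, total: number) => void,
): Promise<void> {
  let done = 0;
  for (const batch of toBatches(expenses, BATCH_SIZE)) {
    await sendBatch(batch);
    done += batch.length;
    onProgress(done, expenses.length);
  }
}
```

Sending batches sequentially keeps progress reporting simple and gives a natural place to check a cancel flag between batches, at the cost of not parallelizing requests.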
…cedures

Removes the stateful 'ImportJob' model and associated procedures (start-job, run-chunk, etc.).

Previously, the server tracked import progress in the database, which caused excessive I/O overhead. The process is now stateless on the server, relying on the new client-driven batching approach.