Skip to content

feat: delete attachment files when removing an account#257

Open
YourEconProf wants to merge 5 commits intowesm:mainfrom
YourEconProf:delete-attachments-on-remove
Open

feat: delete attachment files when removing an account#257
YourEconProf wants to merge 5 commits intowesm:mainfrom
YourEconProf:delete-attachments-on-remove

Conversation

@YourEconProf
Copy link
Copy Markdown
Contributor

Attachments unique to the removed source are deleted from disk during remove-account. Attachments whose content_hash is shared with another source are preserved.

Removes need for 'msgvault gc (when available)' message

Attachments unique to the removed source are deleted from disk during
`remove-account`. Attachments whose content_hash is shared with another
source are preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (2ee1998)

Verdict: Changes are not ready to merge due to 2 medium-severity correctness issues in attachment cleanup.

Medium

  • Race can delete attachments that became shared during account removal
    Location: cmd/msgvault/cmd/remove_account.go:118-138
    Problem: The command computes deletable attachment paths before RemoveSource executes, then deletes those files afterward without re-validating. If another sync/import adds a reference to the same content_hash in that window, this can remove a file that is now referenced by a different source, leaving attachment metadata pointing to a missing file.
    Fix: Determine deletable paths within the same write transaction/snapshot as source removal, or re-check that no attachment rows still reference each path immediately before os.Remove.

  • storage_path null/empty values can break removal or target the attachments root
    Location: internal/store/messages.go:1228, cmd/msgvault/cmd/remove_account.go:133
    Problem: If storage_path is NULL, scanning into a plain string will fail and abort account removal. If it is an empty string, filepath.Join(attachmentsDir, relPath) resolves to the attachments directory itself, causing os.Remove to attempt deletion of the root attachments directory.
    Fix: Filter these rows out in the query, e.g. AND a.storage_path IS NOT NULL AND a.storage_path != ''.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- Filter NULL/empty storage_path in AttachmentPathsUniqueToSource query to
  prevent scan panic and accidental deletion of the attachments root dir
- Re-check each path is unreferenced immediately before os.Remove to guard
  against a concurrent sync adding a reference after the candidate list
  was collected (TOCTOU race)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (3f7f504)

Verdict: Changes are not safe to merge yet due to a high-severity race that can delete live attachment data.

High

  • cmd/msgvault/cmd/remove_account.go:135
  • internal/store/messages.go:1262
  • internal/sync/sync.go:747

The new attachment cleanup is still vulnerable to a TOCTOU race. remove-account checks whether a storage_path is referenced in the database and then deletes the file, but attachment ingestion writes or validates the file on disk first (StoreAttachmentFile) and only inserts the attachments row afterward (UpsertAttachment). That leaves a window where another concurrent sync/import can have a real attachment file in flight while the DB still shows zero references. In that case, remove-account can delete the blob and the concurrent sync then records a live attachment row pointing at a missing file.

Suggested fix: make attachment creation and attachment cleanup mutually exclusive, for example by taking the same shared lock around StoreAttachmentFile + UpsertAttachment and around IsAttachmentPathReferenced + os.Remove, or by deferring physical deletion to a GC phase that runs only when sync/import jobs are not active.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

StoreAttachmentFile writes the file before UpsertAttachment inserts the DB
row, leaving a window where IsAttachmentPathReferenced returns false for a
file a concurrent sync has already written. To close this race, skip all
file deletion when HasAnyActiveSync reports a running sync and print a
message directing the user to re-run when idle.

The per-path IsAttachmentPathReferenced re-check is kept as a last-ditch
guard for the narrow window between the HasAnyActiveSync gate and the
individual os.Remove calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (af44f07)

Verdict: Changes are not safe to merge yet due to one high-severity concurrency bug and one medium-severity cleanup regression.

High

  • cmd/msgvault/cmd/remove_account.go:123-141
    The active-sync safeguard runs after s.RemoveSource(source.ID). Because sync_runs.source_id is ON DELETE CASCADE, deleting the source can remove the very running row that HasAnyActiveSync() relies on. That means the check can incorrectly report no active sync while the removed account's sync is still ingesting attachments, allowing file deletion to race with in-flight writes.

Medium

  • cmd/msgvault/cmd/remove_account.go:119-150
    If HasAnyActiveSync() returns true or errors, the source has already been deleted, so attachment cleanup is skipped after discarding the DB rows needed to determine which files became orphaned. The current warning to re-run msgvault remove-account is not actionable because the account no longer exists.

  • cmd/msgvault/cmd/remove_account.go:140-166, internal/store/sync.go:240-248, internal/sync/sync.go:745-753
    The new guard is still a TOCTOU check rather than real synchronization. HasAnyActiveSync() only snapshots current state; a new sync can start after it returns false but before the delete loop runs. Since attachment ingestion writes the file before inserting its DB reference, IsAttachmentPathReferenced() can return false for a file another sync has already created, and os.Remove() can delete it, leaving a later attachment row pointing at a missing file.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

YourEconProf and others added 2 commits April 10, 2026 18:37
sync_runs.source_id is ON DELETE CASCADE, so calling HasAnyActiveSync()
after RemoveSource() can miss a running sync on the account being removed —
the sync_run row is already gone by the time we check.

Move the HasAnyActiveSync check (and the skip decision) to before
RemoveSource so sync_runs are still intact. The skip flag is then acted on
after the cascade, keeping the correct invariant: per-path
IsAttachmentPathReferenced catches any new reference a concurrent sync for a
*different* source may have inserted in the window between the pre-check and
the os.Remove calls.

Also fix the warning message: since the account no longer exists after
removal, directing the user to re-run 'remove-account' is not actionable.
Instead point them at the attachments directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etion

- Validate that each resolved attachment path stays within attachmentsDir
  before calling os.Remove. filepath.Join resolves '..' components, so a
  corrupt or tampered DB storage_path value could otherwise direct deletion
  at a file outside the attachments directory.

- Add idx_attachments_storage_path index so IsAttachmentPathReferenced does
  not full-scan the attachments table on every per-file re-check. The
  existing content_hash index does not cover the storage_path = ? query.

- Document in AttachmentPathsUniqueToSource that thumbnail_path is not
  included; no sync/import code currently writes thumbnail files, so there
  is nothing to clean up, but the gap is noted for future implementors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Apr 10, 2026

roborev: Combined Review (4eb0125)

Blocking issues remain: account removal can still delete live attachment files, and the current “skip deletion” path can delete the wrong files.

High

  • Concurrent sync race can delete live attachments

    • Locations: cmd/msgvault/cmd/remove_account.go:124, cmd/msgvault/cmd/remove_account.go:136, cmd/msgvault/cmd/remove_account.go:200, internal/store/sync.go:240, internal/sync/sync.go:652
    • The new HasAnyActiveSync() check is only point-in-time. A new sync can start after the check passes but before or during attachment deletion. Since attachment ingestion writes the blob before inserting the DB row, IsAttachmentPathReferenced() can still observe the path as unreferenced and delete a file that an in-flight sync is about to reference, leaving the database pointing to a missing file.
    • This needs serialization between remove-account deletion and sync startup / attachment writes; re-checking DB references alone is not sufficient.
  • filepath.Abs failure does not actually skip deletion

    • Locations: cmd/msgvault/cmd/remove_account.go:169, cmd/msgvault/cmd/remove_account.go:173
    • When filepath.Abs(attachmentsDir) fails, the code logs that file deletion is being skipped, but still proceeds into the deletion loop with cleanDir = "". That bypasses the containment check and can attempt to remove relPath relative to the current working directory.
    • The deletion loop should be fully bypassed on Abs failure, or file deletion should be explicitly disabled before iteration.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant