
Conversation

@mro68 (Contributor) commented Dec 21, 2025

This PR replaces #445 with a clean branch/commit history.

Summary:

  • Add PackCachingBackend (LRU in-memory cache) to reduce API calls for expensive backends like Google Drive by caching whole pack files (a minimal sketch of the idea follows below).
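For illustration, here is a minimal sketch of what such a caching layer could look like. The ReadPack trait, its method names, and the use of the lru crate are assumptions made for this example only, not rustic's actual backend API:

```rust
// Illustrative sketch only: an LRU cache of whole pack files wrapped around an
// expensive backend. The `ReadPack` trait and the `lru` crate are assumptions
// for this example, not rustic's real backend interface.

use std::num::NonZeroUsize;
use std::sync::Mutex;

use lru::LruCache; // assumed dependency: the `lru` crate

/// Hypothetical minimal read interface of a pack store.
pub trait ReadPack {
    fn read_pack(&self, id: &str) -> std::io::Result<Vec<u8>>;
}

/// Wraps any `ReadPack` backend and keeps recently read pack files in memory.
pub struct PackCachingBackend<B: ReadPack> {
    inner: B,
    cache: Mutex<LruCache<String, Vec<u8>>>,
}

impl<B: ReadPack> PackCachingBackend<B> {
    pub fn new(inner: B, max_packs: NonZeroUsize) -> Self {
        Self {
            inner,
            cache: Mutex::new(LruCache::new(max_packs)),
        }
    }
}

impl<B: ReadPack> ReadPack for PackCachingBackend<B> {
    fn read_pack(&self, id: &str) -> std::io::Result<Vec<u8>> {
        let mut cache = self.cache.lock().unwrap();
        if let Some(data) = cache.get(id) {
            // Cache hit: no API call to the remote backend.
            return Ok(data.clone());
        }
        // Cache miss: fetch the whole pack once; later reads of other blobs in
        // the same pack are then served from memory.
        let data = self.inner.read_pack(id)?;
        cache.put(id.to_owned(), data.clone());
        Ok(data)
    }
}
```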

Dependency:

Testing:

  • cargo fmt
  • cargo clippy --all-features -- -D warnings
  • cargo test --all-features

Notes:

  • I’m still learning some git pitfalls; sorry for the noise in the previous PR.

@mro68 (Contributor, Author) commented Dec 21, 2025

Response to @aawsome's Feedback

Thank you @aawsome for your detailed and thoughtful feedback! I completely understand and appreciate your concerns about architecture, memory usage, and the trade-offs involved.

1. Acknowledgment of Concerns

You're absolutely right about the architectural considerations:

  • The memory overhead with large pack files (up to 2 GiB) is a valid concern
  • Reading entire pack files does transfer more data than strictly necessary
  • A prune-specific solution (like in restore) would be more elegant architecturally

2. Why I Chose This Approach

My goal was actually broader than just fixing my GDrive issue - I wanted to enable the same performance benefits for all cloud services that OpenDAL supports (GDrive, OneDrive, Dropbox, etc.) where API calls are expensive.

I deliberately designed this solution as a transparent backend layer to:

  • Avoid touching security-critical core functions
  • Keep prune working exactly as before (no changes to deduplication, compression, security)
  • Make it easy to enable/disable without affecting rustic's core strengths
  • Provide a quick win while we work on a better long-term solution

3. Offer to Collaborate on Prune Optimization

I would be very happy to collaborate on a proper prune-specific optimization! Your suggestion of:

  • Reading contiguous blob ranges
  • Configurable "hole size" threshold for the API-calls vs. data-transfer trade-off
  • Applying the same logic to restore

...sounds like the right long-term approach; a rough sketch of the range-merging idea follows below. I'm willing to work on this, though I should mention I'm still learning Rust (my background is CNC/robotics programming, Python, C#, etc.), so I might need some guidance on the rustic codebase architecture.
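To make sure I understood the idea correctly, here is a rough, self-contained sketch of merging the blob reads of one pack whenever the gap between them stays below a configurable hole size. The Range type and merge_ranges function are invented for this example and are not rustic's actual code:

```rust
// Rough sketch of the range-merging idea (not rustic's actual code): given the
// blobs to read from one pack, merge reads separated by a gap of at most
// `max_hole` bytes into a single ranged request, trading a little extra data
// transfer for far fewer API calls.

/// A byte range inside a pack file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    offset: u64,
    len: u64,
}

/// Merge blob ranges whose gaps are at most `max_hole` bytes.
fn merge_ranges(mut blobs: Vec<Range>, max_hole: u64) -> Vec<Range> {
    blobs.sort_by_key(|r| r.offset);
    let mut merged: Vec<Range> = Vec::new();
    for blob in blobs {
        if let Some(last) = merged.last_mut() {
            if blob.offset <= last.offset + last.len + max_hole {
                // Hole is small enough: extend the previous request instead of
                // issuing a new one.
                let end = (last.offset + last.len).max(blob.offset + blob.len);
                last.len = end - last.offset;
                continue;
            }
        }
        merged.push(blob);
    }
    merged
}

fn main() {
    // Three blobs: the first two are 100 bytes apart, the third is ~10 MiB away.
    let blobs = vec![
        Range { offset: 0, len: 4096 },
        Range { offset: 4196, len: 4096 },
        Range { offset: 10 * 1024 * 1024, len: 4096 },
    ];
    // With a 256 kiB hole tolerance the first two collapse into one request.
    let requests = merge_ranges(blobs, 256 * 1024);
    assert_eq!(requests.len(), 2);
    println!("{requests:?}");
}
```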

4. Making PackCachingBackend Configurable

I propose making the cache configurable to address your concerns:

Option A: Configuration-based

[pack-cache]
enabled = true  # or false to disable completely
max_packs = 128  # configurable cache size
max_memory_mb = 6144  # memory limit (e.g., 6 GB)
backends = ["gdrive", "onedrive"]  # which backends to cache

Option B: Prune-only activation

  • Only activate PackCachingBackend during prune operations
  • Other operations use the normal backend

Option C: Both

  • Configurable + only active for specific operations

Which approach would you prefer? I'm happy to implement whichever makes the most sense for rustic's architecture.
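For illustration only, here is a hypothetical sketch of how Options A–C could be combined: a config section toggles the cache, and an optional prune-only flag restricts it to the prune command. None of these names exist in rustic today:

```rust
// Hypothetical sketch only: how the proposed options could be wired together.
// None of these names exist in rustic today.

#[derive(Debug, Clone)]
struct PackCacheOpts {
    enabled: bool,      // Option A: turn the cache on/off via config
    max_packs: usize,   // cache size limit
    max_memory_mb: u64, // memory limit (enforcement omitted in this sketch)
    prune_only: bool,   // Option B/C: restrict caching to prune
}

impl Default for PackCacheOpts {
    fn default() -> Self {
        Self {
            enabled: false,
            max_packs: 128,
            max_memory_mb: 6144,
            prune_only: true,
        }
    }
}

/// Decide whether the caching layer should wrap the backend for this command.
fn use_pack_cache(opts: &PackCacheOpts, command_is_prune: bool) -> bool {
    opts.enabled && (!opts.prune_only || command_is_prune)
}

fn main() {
    let opts = PackCacheOpts { enabled: true, ..Default::default() };
    assert!(use_pack_cache(&opts, true));   // prune: wrap the backend
    assert!(!use_pack_cache(&opts, false)); // e.g. backup: use the plain backend
    println!("{opts:?}");
}
```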

5. Practical Reality Check

The performance difference is dramatic:

  • Without cache: 2h 48min for prune (160 GB repo, 1,211 blobs in 21 packs)
  • With cache: 8 minutes for the same operation
  • Speedup: ~21x faster

For GDrive users, this is the difference between "unusable" and "practical". Even with the memory overhead and extra data transfer, it's a massive improvement.

Summary

I see this PR as a pragmatic short-term solution that:

  • Makes GDrive (and similar backends) usable today
  • Doesn't break anything for existing backends
  • Can be made configurable to address memory concerns
  • Doesn't prevent us from implementing a better long-term solution

I'm committed to working on the proper prune optimization you suggested - this cache can serve as a bridge until we have that in place.

What do you think? Should I add configuration options, or would you prefer to discuss the prune-specific approach first?


P.S.: I'm still learning Git workflows - sorry for the noise with the previous PRs. I've learned to use clean branches now! 😅

@aawsome (Member) commented Dec 21, 2025

Hi @mro68!

Thanks a lot for the proposals. I must say, I took the opportunity to check out how the prune rewrite could be optimized directly, and have already implemented this in #448.
I am very sorry if this accidentally overruns your ambitions to work on this. I really appreciate every bit of help and the contributions you have made so far, and I hope you stay motivated to make more improvements!
Can you help me finish #448 by testing it (of course only on a test repo, as there could still be errors in it)? It should have a similar or even better effect than this PR, but with no impact on memory usage. Maybe we'll need to tweak the added parameters (40 MiB / 256 kiB) or even make them configurable for users...

@mro68 (Contributor, Author) commented Dec 31, 2025

Closing in favor of #448, which provides an elegant algorithmic solution to the same problem.

My testing showed both approaches achieve similar performance for --repack-all (~10min vs 3.5h baseline), but #448 has the advantage of zero memory overhead compared to the pack cache approach (which could use up to 6 GB RAM).

Thanks @aawsome for the alternative solution! 👍


Happy New Year! 🎆 Wishing you a smooth transition into 2026 and a healthy, successful year ahead!

@mro68 closed this Dec 31, 2025