
Conversation

@mro68 (Contributor) commented Dec 21, 2025

This PR replaces #445 with a clean branch/commit history.

Summary:

  • Add PackCachingBackend (LRU in-memory cache) to reduce API calls for expensive backends like Google Drive by caching whole pack files (a minimal sketch of the idea follows below).
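For illustration, here is a minimal sketch of what such a caching layer could look like. The ReadPack trait, its method names, and the use of the lru crate are assumptions made for this example only, not rustic's actual backend API:

```rust
// Illustrative sketch only: an LRU cache of whole pack files wrapped around an
// expensive backend. The `ReadPack` trait and the `lru` crate are assumptions
// for this example, not rustic's real backend interface.

use std::num::NonZeroUsize;
use std::sync::Mutex;

use lru::LruCache; // assumed dependency: the `lru` crate

/// Hypothetical minimal read interface of a pack store.
pub trait ReadPack {
    fn read_pack(&self, id: &str) -> std::io::Result<Vec<u8>>;
}

/// Wraps any `ReadPack` backend and keeps recently read pack files in memory.
pub struct PackCachingBackend<B: ReadPack> {
    inner: B,
    cache: Mutex<LruCache<String, Vec<u8>>>,
}

impl<B: ReadPack> PackCachingBackend<B> {
    pub fn new(inner: B, max_packs: NonZeroUsize) -> Self {
        Self {
            inner,
            cache: Mutex::new(LruCache::new(max_packs)),
        }
    }
}

impl<B: ReadPack> ReadPack for PackCachingBackend<B> {
    fn read_pack(&self, id: &str) -> std::io::Result<Vec<u8>> {
        let mut cache = self.cache.lock().unwrap();
        if let Some(data) = cache.get(id) {
            // Cache hit: no API call to the remote backend.
            return Ok(data.clone());
        }
        // Cache miss: fetch the whole pack once; later reads of other blobs in
        // the same pack are then served from memory.
        let data = self.inner.read_pack(id)?;
        cache.put(id.to_owned(), data.clone());
        Ok(data)
    }
}
```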

Dependency:

Testing:

  • cargo fmt
  • cargo clippy --all-features -- -D warnings
  • cargo test --all-features

Notes:

  • I’m still learning some git pitfalls; sorry for the noise in the previous PR.

@mro68 (Contributor, Author) commented Dec 21, 2025

Response to @aawsome's Feedback

Thank you @aawsome for your detailed and thoughtful feedback! I completely understand and appreciate your concerns about architecture, memory usage, and the trade-offs involved.

1. Acknowledgment of Concerns

You're absolutely right about the architectural considerations:

  • The memory overhead with large pack files (up to 2 GiB) is a valid concern
  • Reading entire pack files does transfer more data than strictly necessary
  • A prune-specific solution (like in restore) would be more elegant architecturally

2. Why I Chose This Approach

My goal was actually broader than just fixing my GDrive issue - I wanted to enable the same performance benefits for all cloud services that OpenDAL supports (GDrive, OneDrive, Dropbox, etc.) where API calls are expensive.

I deliberately designed this solution as a transparent backend layer to:

  • Avoid touching security-critical core functions
  • Keep prune working exactly as before (no changes to deduplication, compression, security)
  • Make it easy to enable/disable without affecting rustic's core strengths
  • Provide a quick win while we work on a better long-term solution

3. Offer to Collaborate on Prune Optimization

I would be very happy to collaborate on a proper prune-specific optimization! Your suggestion of:

  • Reading contiguous blob ranges
  • Configurable "hole size" threshold for the API-calls vs. data-transfer trade-off
  • Applying the same logic to restore

...sounds like the right long-term approach; a rough sketch of the range-merging idea follows below. I'm willing to work on this, though I should mention I'm still learning Rust (my background is CNC/robotics programming, Python, C#, etc.), so I might need some guidance on the rustic codebase architecture.
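To make sure I understood the idea correctly, here is a rough, self-contained sketch of merging the blob reads of one pack whenever the gap between them stays below a configurable hole size. The Range type and merge_ranges function are invented for this example and are not rustic's actual code:

```rust
// Rough sketch of the range-merging idea (not rustic's actual code): given the
// blobs to read from one pack, merge reads separated by a gap of at most
// `max_hole` bytes into a single ranged request, trading a little extra data
// transfer for far fewer API calls.

/// A byte range inside a pack file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    offset: u64,
    len: u64,
}

/// Merge blob ranges whose gaps are at most `max_hole` bytes.
fn merge_ranges(mut blobs: Vec<Range>, max_hole: u64) -> Vec<Range> {
    blobs.sort_by_key(|r| r.offset);
    let mut merged: Vec<Range> = Vec::new();
    for blob in blobs {
        if let Some(last) = merged.last_mut() {
            if blob.offset <= last.offset + last.len + max_hole {
                // Hole is small enough: extend the previous request instead of
                // issuing a new one.
                let end = (last.offset + last.len).max(blob.offset + blob.len);
                last.len = end - last.offset;
                continue;
            }
        }
        merged.push(blob);
    }
    merged
}

fn main() {
    // Three blobs: the first two are 100 bytes apart, the third is ~10 MiB away.
    let blobs = vec![
        Range { offset: 0, len: 4096 },
        Range { offset: 4196, len: 4096 },
        Range { offset: 10 * 1024 * 1024, len: 4096 },
    ];
    // With a 256 kiB hole tolerance the first two collapse into one request.
    let requests = merge_ranges(blobs, 256 * 1024);
    assert_eq!(requests.len(), 2);
    println!("{requests:?}");
}
```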

4. Making PackCachingBackend Configurable

I propose making the cache configurable to address your concerns:

Option A: Configuration-based

[pack-cache]
enabled = true  # or false to disable completely
max_packs = 128  # configurable cache size
max_memory_mb = 6144  # memory limit (e.g., 6 GB)
backends = ["gdrive", "onedrive"]  # which backends to cache

Option B: Prune-only activation

  • Only activate PackCachingBackend during prune operations
  • Other operations use the normal backend

Option C: Both

  • Configurable + only active for specific operations

Which approach would you prefer? I'm happy to implement whichever makes the most sense for rustic's architecture.
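For illustration only, here is a hypothetical sketch of how Options A–C could be combined: a config section toggles the cache, and an optional prune-only flag restricts it to the prune command. None of these names exist in rustic today:

```rust
// Hypothetical sketch only: how the proposed options could be wired together.
// None of these names exist in rustic today.

#[derive(Debug, Clone)]
struct PackCacheOpts {
    enabled: bool,      // Option A: turn the cache on/off via config
    max_packs: usize,   // cache size limit
    max_memory_mb: u64, // memory limit (enforcement omitted in this sketch)
    prune_only: bool,   // Option B/C: restrict caching to prune
}

impl Default for PackCacheOpts {
    fn default() -> Self {
        Self {
            enabled: false,
            max_packs: 128,
            max_memory_mb: 6144,
            prune_only: true,
        }
    }
}

/// Decide whether the caching layer should wrap the backend for this command.
fn use_pack_cache(opts: &PackCacheOpts, command_is_prune: bool) -> bool {
    opts.enabled && (!opts.prune_only || command_is_prune)
}

fn main() {
    let opts = PackCacheOpts { enabled: true, ..Default::default() };
    assert!(use_pack_cache(&opts, true));   // prune: wrap the backend
    assert!(!use_pack_cache(&opts, false)); // e.g. backup: use the plain backend
    println!("{opts:?}");
}
```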

5. Practical Reality Check

The performance difference is dramatic:

  • Without cache: 2h 48min for prune (160 GB repo, 1,211 blobs in 21 packs)
  • With cache: 8 minutes for the same operation
  • Speedup: ~21x faster

For GDrive users, this is the difference between "unusable" and "practical". Even with the memory overhead and extra data transfer, it's a massive improvement.

Summary

I see this PR as a pragmatic short-term solution that:

  • Makes GDrive (and similar backends) usable today
  • Doesn't break anything for existing backends
  • Can be made configurable to address memory concerns
  • Doesn't prevent us from implementing a better long-term solution

I'm committed to working on the proper prune optimization you suggested - this cache can serve as a bridge until we have that in place.

What do you think? Should I add configuration options, or would you prefer to discuss the prune-specific approach first?


P.S.: I'm still learning Git workflows - sorry for the noise with the previous PRs. I've learned to use clean branches now! 😅

@aawsome (Member) commented Dec 21, 2025

Hi @mro68!

Thanks a lot for the proposals. I must say, I took the opportunity to check out how the prune rewrite could be optimized directly, and have already implemented this in #448.
I am very sorry if this accidentally overruns your ambitions to work on this. I really appreciate every bit of help and the contributions you have made so far, and I hope you stay motivated to make more improvements!
Can you help me finish #448 by testing it (of course only on a test repo, as there could still be errors in it)? It should have a similar or even better effect than this PR, but with no impact on memory usage. Maybe we'll need to tweak the added parameters (40 MiB / 256 kiB) or even make them configurable for users...

@mro68 (Contributor, Author) commented Dec 31, 2025

Closing in favor of #448, which provides an elegant algorithmic solution to the same problem.

My testing showed both approaches achieve similar performance for --repack-all (~10min vs 3.5h baseline), but #448 has the advantage of zero memory overhead compared to the pack cache approach (which could use up to 6 GB RAM).

Thanks @aawsome for the alternative solution! 👍


Happy New Year! 🎆 Wishing you a smooth transition into 2026 and a healthy, successful year ahead!

@mro68 closed this Dec 31, 2025