A lightweight way to save outputs from (expensive) computations.
The function form saves the output of running a function
and can be used with the do...end syntax.
cache("test.bson") do
    a = "a very time-consuming quantity to compute"
    b = "a very long simulation to run"
    return (; a = a, b = b)
end

The first time this runs,
it saves the output in a BSON file called test.bson.
Subsequent runs load the saved output from the file test.bson
rather than re-running the potentially time-consuming computations!
Especially handy for long simulations.
An example of the output:
julia> using CacheVariables

julia> cache("test.bson") do
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           return (; a = a, b = b)
       end
[ Info: Saving to test.bson
(a = "a very time-consuming quantity to compute", b = "a very long simulation to run")

julia> cache("test.bson") do
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           return (; a = a, b = b)
       end
[ Info: Loading from test.bson
(a = "a very time-consuming quantity to compute", b = "a very long simulation to run")

If the path is set to nothing, then caching is skipped and the function is simply run.
julia> cache(nothing) do
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           return (; a = a, b = b)
       end
[ Info: No path provided, running without caching.
(a = "a very time-consuming quantity to compute", b = "a very long simulation to run")

This can be useful for conditionally saving a cache (see Using pattern 3 on a cluster below).
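The save-or-load behaviour of the function form can be sketched in a few lines. This is only an illustration, not the CacheVariables implementation: `sketch_cache` and `compute` are hypothetical names, and Julia's Serialization standard library stands in for BSON so that the example runs without extra packages.

```julia
using Serialization  # standard library; stands in for BSON in this sketch

# Hypothetical helper (not part of CacheVariables).
function sketch_cache(f, path::Union{AbstractString,Nothing})
    path === nothing && return f()            # no path: just run the function
    isfile(path) && return deserialize(path)  # cache exists: load instead of recomputing
    output = f()                              # first run: compute ...
    serialize(path, output)                   # ... and save for next time
    return output
end

calls = Ref(0)  # counts how often the "expensive" computation actually runs
function compute()
    calls[] += 1
    return (; a = "a very time-consuming quantity to compute",
              b = "a very long simulation to run")
end

path = joinpath(mktempdir(), "test.jls")
first_run  = sketch_cache(compute, path)     # computes and saves
second_run = sketch_cache(compute, path)     # loads from the file
uncached   = sketch_cache(compute, nothing)  # computes again, saves nothing
println(calls[])  # prints 2: one cached computation plus the uncached call
```

Because the function is the first argument, the sketch also accepts the do...end syntax, e.g. `sketch_cache(path) do ... end`.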
The macro form looks at the code to determine what variables to save.
@cache "test.bson" begin
    a = "a very time-consuming quantity to compute"
    b = "a very long simulation to run"
    100
end

The first time this block runs,
it identifies the variables a and b and saves them
(in addition to the final output 100 that is saved as ans)
in a BSON file called test.bson.
Subsequent runs load the saved values from the file test.bson
rather than re-running the potentially time-consuming computations!
Especially handy for long simulations.
An example of the output:
julia> using CacheVariables

julia> @cache "test.bson" begin
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           100
       end
┌ Info: Saving to test.bson
│ a
└ b
100

julia> @cache "test.bson" begin
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           100
       end
┌ Info: Loading from test.bson
│ a
└ b
100

An optional overwrite flag (default is false) at the end
tells the macro to always save,
even when a file with the given name already exists.
julia> @cache "test.bson" begin
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           100
       end false
┌ Info: Loading from test.bson
│ a
└ b
100

julia> @cache "test.bson" begin
           a = "a very time-consuming quantity to compute"
           b = "a very long simulation to run"
           100
       end true
┌ Info: Overwriting test.bson
│ a
└ b
100

Caveat:
The variable name ans is used for storing the final output
(100 in the above examples),
so it is best to avoid using this as a variable name.
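To give a feel for how a macro can "look at the code" to decide which variables to save, the sketch below collects the names assigned at the top level of a block. `assigned_names` is a hypothetical helper written for this illustration; it is not necessarily how `@cache` is implemented, and it ignores nested or destructuring assignments.

```julia
# Hypothetical helper: list the variables assigned at the top level of a block.
function assigned_names(block::Expr)
    names = Symbol[]
    for ex in block.args
        # Keep only plain assignments of the form `name = value`;
        # line-number nodes and bare values (like the final 100) are skipped.
        if ex isa Expr && ex.head === :(=) && ex.args[1] isa Symbol
            push!(names, ex.args[1])
        end
    end
    return names
end

block = quote
    a = "a very time-consuming quantity to compute"
    b = "a very long simulation to run"
    100
end

names = assigned_names(block)
println(names)  # prints [:a, :b]
```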
It is often necessary to cache the results of a large sweep (e.g., over parameters or trials of a simulation).
Caching the full sweep can simply be done as follows:
julia> using CacheVariables

julia> cache("test.bson") do
           map(1:3) do run
               result = "time-consuming result of run $run"
               return result
           end
       end
[ Info: Saving to test.bson
3-element Vector{String}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

julia> cache("test.bson") do
           map(1:3) do run
               result = "time-consuming result of run $run"
               return result
           end
       end
[ Info: Loading from test.bson
3-element Vector{Any}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

If each run in the sweep itself takes a very long time, it can be better to cache each individual run separately as follows:
julia> using CacheVariables

julia> map(1:3) do run
           cache(joinpath("cache", "run-$run.bson")) do
               result = "time-consuming result of run $run"
               return result
           end
       end
[ Info: Saving to cache/run-1.bson
[ Info: Saving to cache/run-2.bson
[ Info: Saving to cache/run-3.bson
3-element Vector{String}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

julia> map(1:3) do run
           cache(joinpath("cache", "run-$run.bson")) do
               result = "time-consuming result of run $run"
               return result
           end
       end
[ Info: Loading from cache/run-1.bson
[ Info: Loading from cache/run-2.bson
[ Info: Loading from cache/run-3.bson
3-element Vector{String}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

A convenient aspect of this pattern is that the runs can then be performed independently, such as on different nodes of a computing cluster. For example, the following code allows the runs to be spread across a SLURM job array:
julia> using CacheVariables

julia> ENV["SLURM_ARRAY_TASK_ID"] = 2 # simulate run from SLURM job array
2

julia> SUBSET = haskey(ENV, "SLURM_ARRAY_TASK_ID") ?
           (IDX = parse(Int, ENV["SLURM_ARRAY_TASK_ID"]); IDX:IDX) : Colon()
2:2

julia> map((1:3)[SUBSET]) do run
           cache(joinpath("cache", "run-$run.bson")) do
               result = "time-consuming result of run $run"
               return result
           end
       end
[ Info: Saving to cache/run-2.bson
1-element Vector{String}:
 "time-consuming result of run 2"

When run on the cluster, this only runs (and caches) the case indexed by the job array index. Then, when the code is run again (off the cluster), the caches from the full sweep will simply be loaded!
Sometimes it's useful to make a merged cache file (e.g., to reduce the number of cache files to commit to git).
A convenient pattern here is to use nested cache calls.
julia> using CacheVariables

julia> cache("fullsweep.bson") do
           map(1:3) do run
               cache(joinpath("cache", "run-$run.bson")) do
                   result = "time-consuming result of run $run"
                   return result
               end
           end
       end
[ Info: Saving to cache/run-1.bson
[ Info: Saving to cache/run-2.bson
[ Info: Saving to cache/run-3.bson
[ Info: Saving to fullsweep.bson
3-element Vector{String}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

julia> cache("fullsweep.bson") do
           map(1:3) do run
               cache(joinpath("cache", "run-$run.bson")) do
                   result = "time-consuming result of run $run"
                   return result
               end
           end
       end
[ Info: Loading from fullsweep.bson
3-element Vector{Any}:
 "time-consuming result of run 1"
 "time-consuming result of run 2"
 "time-consuming result of run 3"

Note that only the fullsweep.bson cache file was used when loading.
Once this file is produced, the intermediate files (cache/run-1.bson, etc.) are no longer needed.
To use this pattern on a cluster (as in Using pattern 2 on a cluster), we need to make sure the outer cache is not formed until we have all the results.
This can be done as follows:
julia> using CacheVariables

julia> ENV["SLURM_ARRAY_TASK_ID"] = 2 # simulate run from SLURM job array
2

julia> SUBSET = haskey(ENV, "SLURM_ARRAY_TASK_ID") ?
           (IDX = parse(Int, ENV["SLURM_ARRAY_TASK_ID"]); IDX:IDX) : Colon()
2:2

julia> cache(SUBSET === Colon() ? "fullsweep.bson" : nothing) do
           map((1:3)[SUBSET]) do run
               cache(joinpath("cache", "run-$run.bson")) do
                   result = "time-consuming result of run $run"
                   return result
               end
           end
       end
[ Info: No path provided, running without caching.
[ Info: Saving to cache/run-2.bson
1-element Vector{String}:
 "time-consuming result of run 2"

Note that the full cache was not generated here.