
DatasetRestoring not optimised for distributed GPUs or CPUs #633

@taimoorsohail

Description

In a distributed model setup, the DatasetRestoring step uses a huge amount of memory: up to 9 GB per GPU in a 1/4-degree ocean model.

I ran a series of tests with the MWE below, which reports memory usage at key steps, and found an interesting trend:

using MPI
using CUDA

MPI.Init()
atexit(MPI.Finalize)  

using Oceananigans
using Oceananigans.Units
using ClimaOcean
using Oceananigans.DistributedComputations
using Printf
using Dates
using ClimaOcean.EN4
using ClimaOcean.ECCO
using ClimaOcean.EN4: download_dataset

data_path = expanduser("/g/data/v46/txs156/ocean-ensembles/data/")

arch = Distributed(CPU(); partition = Partition(y = DistributedComputations.Equal()), synchronized_communication=true)

function memory_status(arch::Union{Distributed{<:GPU}, <:GPU})
    free, total = CUDA.memory_info()
    used = total - free
    used_GiB, free_GiB, total_GiB = used / 2^30, free / 2^30, total / 2^30
    @show arch.local_rank, used_GiB, free_GiB, total_GiB
end

function memory_status(arch::Union{Distributed{<:CPU}, <:CPU})
    total = Sys.total_memory()
    free  = Sys.free_memory()
    used  = total - free
    used_GiB, free_GiB, total_GiB = used / 2^30, free / 2^30, total / 2^30
    @show arch.local_rank, used_GiB, free_GiB, total_GiB
end

Nx, Ny, Nz = 100, 100, 50
Lx, Ly = 100, 100
@info "Defining vertical z faces"
depth = -6000.0 # Depth of the ocean in meters
z_faces = ExponentialCoordinate(Nz, depth, 0) 
memory_status(arch)
@info "Creating grid"

underlying_grid = TripolarGrid(arch;
                    size = (Nx, Ny, Nz),
                    z = z_faces,
                    halo = (6, 6, 3))

@info "Defining bottom bathymetry"

memory_status(arch)

ETOPOmetadata = Metadatum(:bottom_height, dataset=ETOPO2022(), dir = data_path)
ClimaOcean.DataWrangling.download_dataset(ETOPOmetadata)
memory_status(arch)
@time bottom_height = regrid_bathymetry(underlying_grid, ETOPOmetadata;
                                        minimum_depth = 15,
                                        interpolation_passes = 1, # 75 interpolation passes smooth the bathymetry near Florida so that the Gulf Stream is able to flow
                                        major_basins = 2)

memory_status(arch)

@info "Defining grid"

@time grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom(bottom_height); active_cells_map=true)

memory_status(arch)
### Restoring

# We include surface salinity restoring to a predetermined dataset.

dates = vcat(collect(DateTime(1991, 1, 1): Month(1): DateTime(1991, 5, 1)),
             collect(DateTime(1990, 5, 1): Month(1): DateTime(1990, 12, 1)))

@info "We download the 1990-1991 data for an RYF implementation"

dataset = EN4Monthly() # Other options include ECCO2Monthly(), ECCO4Monthly() or ECCO2Daily()

temperature = Metadata(:temperature; dates, dataset = dataset, dir=data_path)
salinity    = Metadata(:salinity;    dates, dataset = dataset, dir=data_path)

download_dataset(temperature)
download_dataset(salinity)

memory_status(arch)

@info "Defining restoring rate"

restoring_rate  = 1 / 18days
z_surf = 0 # assumed surface height; not defined in the original snippet
@inline mask(x, y, z, t) = z ≥ z_surf - 1 # restore only within ~1 m of the surface (comparison operator assumed; it was garbled in the paste)

# time_indices_in_memory sets how many time slices of the dataset are held on each rank at once
FS = DatasetRestoring(salinity, grid; mask, rate=restoring_rate, time_indices_in_memory = 10)
forcing = (; S=FS)

memory_status(arch)

Gives:

Memory usage:

| No. of ranks | GPU            | CPU            |
|--------------|----------------|----------------|
| 2            | 125 MB per GPU | 237 MB per CPU |
| 3            | 190 MB per GPU | 403 MB per CPU |


  1. In the above simulations, the creation of the model itself uses ~60 MB of memory, so the restoring is very memory intensive.
  2. Increasing the number of G/CPUs INCREASES the per-GPU memory allocation, which seems wrong (a sketch for estimating and isolating the restoring footprint is below).
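For reference, here is a back-of-envelope sketch of what the per-rank footprint of the salinity restoring data should be if the in-memory time slices were partitioned like the grid. It assumes the MWE's 100 × 100 × 50 grid, the y-partition across ranks, Float64 data, and ignores halos and any copy of the native dataset. It also checks the host-side size of the FS object from the MWE; Base.summarysize only walks CPU arrays, so on GPU ranks the device cost is better captured by differencing the used_GiB reported by memory_status(arch) immediately before and after the DatasetRestoring call.

using MPI

Nranks = MPI.Comm_size(MPI.COMM_WORLD)
rank   = MPI.Comm_rank(MPI.COMM_WORLD)

# Expected local footprint of 10 in-memory time slices of one tracer,
# if the data were partitioned in y like the grid (no halos, Float64):
Nx, Ny, Nz = 100, 100, 50
time_slices = 10
expected_MB = Nx * (Ny ÷ Nranks) * Nz * time_slices * sizeof(Float64) / 2^20
@info "Expected per-rank restoring data" rank expected_MB  # ~19 MB on 2 ranks, ~13 MB on 3

# Host-side size actually retained by the restoring object
# (misses device allocations; on GPU ranks, difference the memory_status(arch)
# readings around the DatasetRestoring call instead).
fs_MB = Base.summarysize(FS) / 2^20
@info "DatasetRestoring host footprint" rank fs_MB

If the measured per-rank numbers grow with the rank count instead of shrinking towards these estimates, that would suggest each rank is holding an unpartitioned copy of the restoring data rather than its local slice.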

cc @navidcy @simone-silvestri
