The Unet model expects an encoder depth of up to 5, but ConvNeXt yields only 4 feature maps, one from each stage. In the official ConvNeXt repository, the outputs of the 4 stage conv blocks are used to build an UperNet model. I'm not sure whether the Unet implementation in this repo can be used to build a ConvNeXt-Unet at all. Is there any way to do it?
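
To make the mismatch concrete, here is a rough sketch in plain Python of how a U-Net style decoder could be laid out over 4 feature maps instead of 5. It assumes ConvNeXt-T, whose four stages output 96, 192, 384 and 768 channels at strides 4, 8, 16 and 32. The helper `plan_unet_decoder` is hypothetical and is not part of this repo's API; it only works out the upsampling factor and skip channels for each decoder step.

```python
def plan_unet_decoder(strides, channels):
    """Plan U-Net decoder steps for a list of encoder feature maps.

    strides and channels are ordered from the shallowest feature map to
    the deepest one. Returns a list of (upsample_factor, skip_channels)
    tuples, ordered from the deepest decoder step to the output.
    Hypothetical helper for illustration only.
    """
    if len(strides) != len(channels):
        raise ValueError("strides and channels must have the same length")

    # Walk from the deepest feature map towards the shallowest one.
    rev_strides = strides[::-1]
    rev_channels = channels[::-1]

    steps = []
    for i in range(len(rev_strides) - 1):
        # Upsample to the resolution of the next shallower feature map,
        # then concatenate that feature map as the skip connection.
        factor = rev_strides[i] // rev_strides[i + 1]
        steps.append((factor, rev_channels[i + 1]))

    # No skip is left for the last step: upsample straight back to the
    # input resolution (skip_channels = 0).
    steps.append((rev_strides[-1], 0))
    return steps


# Assumed ConvNeXt-T stage outputs (4 feature maps).
CONVNEXT_T_STRIDES = [4, 8, 16, 32]
CONVNEXT_T_CHANNELS = [96, 192, 384, 768]

plan = plan_unet_decoder(CONVNEXT_T_STRIDES, CONVNEXT_T_CHANNELS)
for factor, skip in plan:
    print(f"upsample x{factor}, skip channels: {skip}")
```

Under these assumptions the decoder has three x2 steps with skips and one final x4 step without a skip, because ConvNeXt has no stride-2 feature map. Whether the Unet in this repo can accept such a layout is exactly what I'm unsure about.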