sd: relax size restrictions for DiT models #1986
wbruna wants to merge 1 commit into LostRuins:concedo_experimental
Round image dimensions to the specific multiple required by each DiT model. These range from 32 (certain Wan models) down to 1 (Chroma Radiance), with most models requiring multiples of 8 or 16. UNet models are still rounded to multiples of 64.

The current sd.cpp rounds the sizes internally, but it always rounds up, so we still need to round on our side to apply image size restrictions and to trigger VAE tiling correctly.

Also, remove a legacy test that could abort a generation with unsupported image sizes: it would never trigger, because it was applied after the image side adjustments.
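To make the rounding direction concrete, here is a minimal sketch of the two behaviours contrasted above. The helper names are hypothetical and are not taken from the PR or from sd.cpp; the "round down" variant is an assumption based on the results reported later in this thread.

```cpp
// Hypothetical helpers for illustration only; not the PR's actual code.

// Largest multiple of `multiple` that is <= value.
// Assumes value >= 0 and multiple >= 1.
constexpr int round_down_to_multiple(int value, int multiple) {
    return (value / multiple) * multiple;
}

// Smallest multiple of `multiple` that is >= value.
// Assumes value >= 0 and multiple >= 1.
constexpr int round_up_to_multiple(int value, int multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// The same request lands on different sizes depending on the direction.
static_assert(round_down_to_multiple(127, 8) == 120, "rounds down to 120");
static_assert(round_up_to_multiple(127, 8) == 128, "rounds up to 128");

// A multiple of 1 leaves the size untouched.
static_assert(round_down_to_multiple(127, 1) == 127, "no rounding needed");
```

Rounding down on the server side before handing the size to the library means the library's own round-up step has nothing left to change, so the final size should never exceed what was requested.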
I'm not sure what would be the best approach to the stable-ui side. Maybe a new config item like the "Allow Larger Params" to change the granularity, still defaulting to 64? We could explain in a tooltip that 64 is the most compatible value, and that different values could be rounded by the server to model-specific multiples. Not that I am eager to program that or anything 🙂
} else {
if (params.width <= 0 || params.width % 64 != 0 || params.height <= 0 || params.height % 64 != 0) {
Still needed for handling negative numbers, I think.
The sd_fix_resolution function deals with negatives right at its beginning:
width = std::max(std::min(width, 8192), img_side_min);
height = std::max(std::min(height, 8192), img_side_min);
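As a quick check of why that clamp covers negative inputs, the expression can be wrapped in a small function. This is only a sketch: `clamp_side` is a hypothetical name, and the value 64 passed as `img_side_min` is an assumption, not something confirmed by the quoted code.

```cpp
#include <algorithm>

// Hypothetical wrapper around the clamp expression quoted above.
// img_side_min is a parameter here; 64 in the checks below is an assumption.
constexpr int clamp_side(int side, int img_side_min) {
    // Cap at 8192 first, then raise anything too small (including
    // zero and negative values) to the minimum side length.
    return std::max(std::min(side, 8192), img_side_min);
}

static_assert(clamp_side(-1, 64) == 64, "negative sides are raised to the minimum");
static_assert(clamp_side(0, 64) == 64, "zero is raised to the minimum");
static_assert(clamp_side(512, 64) == 512, "in-range sides pass through");
static_assert(clamp_side(10000, 64) == 8192, "oversized sides are capped");
```

Because the outer `std::max` always runs, no value at or below zero can survive the clamp, which is why a later `<= 0` test would have nothing to catch.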
Yep; it was changed upstream at leejet/stable-diffusion.cpp#1073. I'm using the same code to get the needed multiple.
One way to test is with sides that are just short of a multiple of 64: 127x127 becomes 120x120 for ZIT, 112x112 for Klein 4B, and 64x64 for SDXL. I also tested VAE tiling, with 767x1023 requests. -1x-1 becomes 64x64 as before, though we should probably return an error at the API level instead.
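The numbers above can be reproduced with a small sketch that clamps a side and then rounds it down. Everything here is an assumption made for illustration: `fix_side` is a hypothetical name, the minimum side of 64 is a guess, and the multiples 8 (ZIT), 16 (Klein 4B), and 64 (SDXL) are inferred from the reported results rather than read from the code.

```cpp
#include <algorithm>

// Hypothetical stand-in for the size fix-up; not the actual sd_fix_resolution.
// side_min = 64 is an assumption.
constexpr int fix_side(int side, int multiple, int side_min = 64) {
    // Clamp first, so negative or oversized requests become valid sides.
    int clamped = std::max(std::min(side, 8192), side_min);
    // Then round down to the model's multiple (assumes multiple >= 1).
    return (clamped / multiple) * multiple;
}

// Multiples below are inferred from the results reported above.
static_assert(fix_side(127, 8) == 120, "ZIT, assuming a multiple of 8");
static_assert(fix_side(127, 16) == 112, "Klein 4B, assuming a multiple of 16");
static_assert(fix_side(127, 64) == 64, "SDXL, multiple of 64");
static_assert(fix_side(-1, 64) == 64, "negative request falls back to the minimum");
```

A side of 127 is a useful probe because it sits one pixel below 128, so each multiple produces a different result and a rounding-direction mistake shows up immediately.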
