[Feature] FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers acts like T2V rather than I2V.

### Motivation

Dear authors,
Recently, we have been using FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers for causal inference.
We found that the model uses the input image in a way very similar to T2V, namely:
1) The image is first passed through FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers at timestep 0 to populate the KV cache.
2) The model then generates each subsequent chunk in pure T2V mode: its input is only a 16-channel noisy latent. This differs from Wan2.2 I2V, whose input concatenates 16 channels of image latents, 4 mask channels, and 16 channels of noisy latents, 36 channels in total.
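To make the channel difference concrete, here is a minimal shape-level sketch of how a Wan-style I2V input reaches 36 channels. It is illustrative only. The helper names `first_frame_mask` and `i2v_model_input` are hypothetical. The channel order (noisy, mask, image) and the first-frame mask construction are assumptions modeled on the Wan/Diffusers I2V pipelines, not taken from the SFWan2.2 code.

```python
import numpy as np

LATENT_CHANNELS = 16  # Wan VAE latent channels
MASK_CHANNELS = 4     # assumed: one per pixel frame folded by the VAE's 4x temporal compression


def first_frame_mask(num_frames, latent_h, latent_w, temporal_stride=4):
    """Hypothetical helper: mask of shape (4, T_lat, H, W), 1 on the conditioning frame.

    Assumes num_frames = temporal_stride * k + 1 (e.g. 81 frames -> 21 latent frames).
    """
    m = np.zeros((num_frames, latent_h, latent_w), dtype=np.float32)
    m[0] = 1.0  # only the first (given) frame is known
    # Repeat the first frame so the total frame count divides evenly by the stride
    m = np.concatenate([np.repeat(m[:1], temporal_stride, axis=0), m[1:]], axis=0)
    latent_frames = m.shape[0] // temporal_stride
    # Fold each group of 4 pixel frames into the channel dimension
    return m.reshape(latent_frames, temporal_stride, latent_h, latent_w).transpose(1, 0, 2, 3)


def i2v_model_input(noisy_latents, mask, image_latents):
    """Hypothetical helper: concatenate noisy (16) + mask (4) + image latents (16) = 36 channels.

    All arrays have shape (channels, latent_frames, height, width).
    """
    assert noisy_latents.shape[0] == LATENT_CHANNELS
    assert mask.shape[0] == MASK_CHANNELS
    assert image_latents.shape[0] == LATENT_CHANNELS
    return np.concatenate([noisy_latents, mask, image_latents], axis=0)


# Usage: 81 frames at 480x832 -> 21 latent frames at 60x104 (8x spatial compression)
mask = first_frame_mask(81, 60, 104)
noisy = np.random.randn(LATENT_CHANNELS, 21, 60, 104).astype(np.float32)
image_latents = np.zeros((LATENT_CHANNELS, 21, 60, 104), dtype=np.float32)  # placeholder for VAE output
x_i2v = i2v_model_input(noisy, mask, image_latents)
print(x_i2v.shape)  # (36, 21, 60, 104)
print(noisy.shape)  # (16, 21, 60, 104): what the causal path feeds per chunk
```

The comparison at the end is the point of the issue. A model whose input layer expects 36 channels would need the mask and image-latent channels for every chunk. The causal path described above instead feeds only the 16-channel noisy latent and relies on the KV cache for image conditioning.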

We would therefore like to know: was FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers actually trained as a T2V model rather than an I2V model?

Additionally, is there a way to do true I2V conditioning with SF (causal) inference as well?


Thanks!

### Related resources

_No response_

[Feature] FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers acts like T2V rather than I2V. #1086

Motivation

Related resources

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature] FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers acts like T2V rather than I2V. #1086

Description

Motivation

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions