LTX-2: IC-LoRA training with reference videos #2498

bghira · 2026-01-26T03:21:06Z

This pull request adds support for IC-LoRA-style reference video conditioning to the LTX-2 model and pipeline, enabling the use of reference videos during inference and training. The changes include new methods for handling conditioning latents, validation logic, and integration of reference video tokens into the inference and training workflows.

IC-LoRA Reference Video Conditioning Support:

- Added `supports_conditioning_dataset`, `requires_conditioning_latents`, and `prepare_batch_conditions` methods to the LTX-2 model to indicate and validate support for reference video conditioning. These methods ensure that only valid conditioning types and latent inputs are accepted.
- Enhanced the `model_predict` method to handle reference video conditioning latents, including input validation, packing of reference and target latents, concatenation, and alignment of timesteps and positional encodings. The method now also ensures reference tokens are excluded from the output and not used with incompatible regularizers.
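
The packing, per-token timestep alignment, and output slicing described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `pack_with_reference` and `strip_reference` are hypothetical names, plain Python lists stand in for tensors, and both the "reference tokens first" ordering and the timestep of 0 for reference tokens are assumptions.

```python
from typing import List, Tuple

Token = List[float]


def pack_with_reference(
    ref_tokens: List[Token],
    target_tokens: List[Token],
    timestep: float,
    ref_timestep: float = 0.0,  # assumption: reference tokens are treated as clean
) -> Tuple[List[Token], List[float], int]:
    """Concatenate reference and target tokens along the sequence axis.

    Returns the combined sequence, one timestep per token, and the number
    of reference tokens so the caller can slice them off the prediction.
    """
    if not ref_tokens or not target_tokens:
        raise ValueError("both reference and target tokens are required")
    if len(ref_tokens[0]) != len(target_tokens[0]):
        raise ValueError("reference and target tokens must share a channel size")
    # Assumption: reference tokens are placed before the target tokens.
    sequence = ref_tokens + target_tokens
    timesteps = [ref_timestep] * len(ref_tokens) + [timestep] * len(target_tokens)
    return sequence, timesteps, len(ref_tokens)


def strip_reference(prediction: List[Token], num_ref: int) -> List[Token]:
    """Drop the leading reference positions from a per-token prediction."""
    return prediction[num_ref:]


ref = [[0.1, 0.2], [0.3, 0.4]]
tgt = [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]
seq, ts, n_ref = pack_with_reference(ref, tgt, timestep=0.7)
pred = [[v * 2 for v in tok] for tok in seq]  # stand-in for the transformer
target_pred = strip_reference(pred, n_ref)
```

Returning the reference token count alongside the packed sequence keeps the later slicing step from having to recompute it.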

Pipeline Integration and Data Handling:

- Introduced `_prepare_video_conditioning` to the pipeline for loading, resizing, encoding, and packing reference video frames as latents, along with generating corresponding masks and positional encodings.
- Updated the pipeline's `__call__` method to accept a `video_conditioning` argument, process and concatenate reference tokens and masks, and ensure correct handling during the denoising loop (including timestep masking and restoration of reference tokens at each step). Reference tokens are removed from the final output.
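
As a rough sketch of the denoising-loop handling described above, the loop below masks the timestep at reference positions, restores the reference tokens after every step, and strips them before returning. It is not the pipeline's implementation: `denoise_with_reference` is a hypothetical name, each token is a single float for brevity, the Euler-style update is a stand-in for the real scheduler step, and the masked timestep value of 0 is an assumption.

```python
from typing import Callable, List


def denoise_with_reference(
    ref_tokens: List[float],
    target_tokens: List[float],
    sigmas: List[float],
    predict: Callable[[List[float], List[float]], List[float]],
) -> List[float]:
    """Run a toy denoising loop while keeping reference tokens fixed."""
    num_ref = len(ref_tokens)
    num_target = len(target_tokens)
    latents = list(ref_tokens) + list(target_tokens)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Timestep masking: reference positions see 0, target positions see sigma.
        per_token_t = [0.0] * num_ref + [sigma] * num_target
        velocity = predict(latents, per_token_t)
        # Euler-style update, used here only as a stand-in for the scheduler.
        latents = [x + (sigma_next - sigma) * v for x, v in zip(latents, velocity)]
        # Restore the reference tokens so they never drift between steps.
        latents[:num_ref] = ref_tokens
    # Reference tokens are removed before the result is decoded.
    return latents[num_ref:]


seen_reference = []


def toy_predict(latents: List[float], timesteps: List[float]) -> List[float]:
    seen_reference.append(latents[0])  # record what the model saw at position 0
    return list(latents)


result = denoise_with_reference([5.0], [2.0], [1.0, 0.5, 0.0], toy_predict)
```

Restoring after the update, instead of skipping the update for those positions, keeps the loop body uniform across all tokens.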

Utilities and Imports:

- Added imports for `load_video` and `resize_video_frames` to support video loading and preprocessing in both the main and image-to-video pipelines.
- Added a utility function `retrieve_latents` to extract latents from VAE encoder output, supporting both sampling and argmax modes.
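
A helper of this shape might look like the sketch below, which is modeled on the `retrieve_latents` function that appears in several diffusers pipelines. The version added in this PR may differ. The `_FakeDist` stub is purely illustrative and stands in for a real VAE posterior so the example runs without any dependencies.

```python
from types import SimpleNamespace
from typing import Any, Optional


def retrieve_latents(
    encoder_output: Any,
    generator: Optional[Any] = None,
    sample_mode: str = "sample",
) -> Any:
    """Extract latents from a VAE encoder output.

    "sample" draws from the posterior; "argmax" takes its mode, which is
    deterministic. Outputs that already carry `.latents` are passed through.
    """
    latent_dist = getattr(encoder_output, "latent_dist", None)
    if latent_dist is not None and sample_mode == "sample":
        return latent_dist.sample(generator)
    if latent_dist is not None and sample_mode == "argmax":
        return latent_dist.mode()
    if hasattr(encoder_output, "latents"):
        return encoder_output.latents
    raise AttributeError("Could not access latents of provided encoder_output")


class _FakeDist:
    """Illustrative stub for a VAE posterior distribution."""

    def sample(self, generator: Optional[Any] = None) -> str:
        return "sampled"

    def mode(self) -> str:
        return "mode"


demo_output = SimpleNamespace(latent_dist=_FakeDist())
sampled = retrieve_latents(demo_output)
deterministic = retrieve_latents(demo_output, sample_mode="argmax")
```

For reference videos, the argmax mode is the natural choice when the same reference should encode to the same latents on every call.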

Together, these changes let LTX-2 consume reference videos as latent-space conditioning during both training and inference, which is what IC-LoRA-style workflows require.

Copilot

Pull request overview

This PR adds IC-LoRA-style reference video conditioning support to the LTX-2 training model and both the text-to-video and image-to-video pipelines, enabling the model to consume reference videos as latent-space conditions during training and inference.

Changes:

- Add reference video loading, resizing, VAE encoding, and packing utilities to both LTX-2 pipelines, plus a shared `retrieve_latents` helper.
- Extend the LTXVideo2 training model to declare conditioning support, validate conditioning batches, and integrate reference conditioning latents into `model_predict` (timesteps, RoPE coords, force-keep mask, and output slicing).
- Wire `video_conditioning` into the pipelines’ `__call__` methods, including token/mask concatenation, timestep masking for reference tokens, and removal of reference tokens from outputs.
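
The force-keep mask extension mentioned above could be sketched as follows. This is an assumption-laden illustration and not the PR's code: `extend_force_keep_mask` is a hypothetical name, the mask is a plain list of booleans where `True` means "never drop this token", and reference tokens are assumed to be prepended to the sequence and always kept.

```python
from typing import List, Optional


def extend_force_keep_mask(
    target_mask: Optional[List[bool]],
    num_ref: int,
    num_target: int,
) -> List[bool]:
    """Prepend always-keep entries for reference tokens to a token mask.

    If no mask exists yet for the target tokens, start from one that
    forces nothing, so only the reference positions are protected.
    """
    if target_mask is None:
        target_mask = [False] * num_target
    if len(target_mask) != num_target:
        raise ValueError("mask length must match the number of target tokens")
    # Assumption: reference tokens precede target tokens and must all survive routing.
    return [True] * num_ref + list(target_mask)


extended = extend_force_keep_mask([False, True, False], num_ref=2, num_target=3)
default_extended = extend_force_keep_mask(None, num_ref=1, num_target=2)
```

Keeping the mask aligned with the concatenated sequence matters because any token-dropping step would otherwise index into the wrong positions.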

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `simpletuner/helpers/models/ltxvideo2/pipeline_ltx2.py` | Adds reference video conditioning support for text-to-video inference: video loading/resizing/encoding, packing into reference tokens/masks/coords, integrating `video_conditioning` into the denoising loop via sequence concatenation, timestep masking, and coordinate concatenation, and stripping reference tokens before decoding. |
| `simpletuner/helpers/models/ltxvideo2/pipeline_ltx2_image2video.py` | Mirrors the reference conditioning path for image-to-video, reusing similar `_prepare_video_conditioning` logic and adapting the denoising loop to separate reference vs target tokens in latent space and keep reference tokens fixed across steps. |
| `simpletuner/helpers/models/ltxvideo2/model.py` | Declares that LTX-2 supports and uses conditioning latents, validates conditioning inputs in `prepare_batch_conditions` / `model_predict`, and augments `model_predict` to pack reference latents, build per-token timesteps and RoPE coords, extend TREAD force-keep masks, disallow CREPA with reference tokens, and drop reference tokens from the video prediction. |

Comments suppressed due to low confidence (1)

simpletuner/helpers/models/ltxvideo2/model.py:1011

This assignment to 'prepare_batch_conditions' is unnecessary as it is redefined before this value is used.

    def prepare_batch_conditions(self, batch: dict, state: dict) -> dict:


simpletuner/helpers/models/ltxvideo2/model.py

LTX-2: IC-LoRA training with reference videos

d5246eb

bghira requested a review from Copilot January 26, 2026 03:21

Copilot started reviewing on behalf of bghira January 26, 2026 03:21

Copilot AI reviewed Jan 26, 2026


simpletuner/helpers/models/ltxvideo2/model.py (outdated)

bghira added 5 commits January 25, 2026 21:50

fix duplicate method

14c0938

run the image_path() method when TrainingSample is returned

fba6b27

forcibly disable audio for conditioning video datasets

f22256f

fix regression in str path handling

e7d604d

maybe fix video latent caching for conditioning sets

ecc8c64
