Feature/multi modal #4
base: main
Conversation
Pull request overview
This pull request adds comprehensive multimodal (image + text) support to MIRAGE, enabling the framework to process datasets containing images alongside text using Vision-Language Models (VLMs). The changes introduce image input handling, path resolution for external image files, and batch processing logic that accommodates both embedded PIL Images and path-based images.
Key changes include:
- Extended the configuration schema with `type: image` and `image_base_path` fields for input variables, supporting both embedded and path-based images
- Implemented a `resolve_image_input` function to handle various image input formats (PIL Images, URLs, absolute/relative paths); a sketch follows this list
- Modified batch processing to detect multimodal inputs and route them through per-example generation for compatibility with VLM APIs
- Added graceful handling of empty shards in distributed processing
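As a rough sketch of the resolution order described above (the real implementation lives in `src/mirage/utils.py` and may differ; the exact signature is an assumption):

```python
from pathlib import Path

from PIL import Image


def resolve_image_input(value, image_base_path=None):
    """Sketch only: resolve an image input to something a VLM backend accepts.

    Covers the cases named in this PR: embedded PIL Images, URLs, and
    absolute/relative file paths (relative ones joined with image_base_path).
    """
    # Embedded PIL Images (e.g. decoded by an HF dataset) pass through as-is.
    if isinstance(value, Image.Image):
        return value
    # Remote images are left untouched for the backend to fetch.
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return value
    # Local paths: absolute paths are used directly, relative paths are
    # resolved against the configured image_base_path when one is set.
    path = Path(value)
    if not path.is_absolute() and image_base_path is not None:
        path = Path(image_base_path) / path
    return str(path)
```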
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/mirage/config.py | Added type and image_base_path fields to InputVar dataclass with is_image() helper method for image input identification |
| src/mirage/utils.py | Implemented image path resolution logic, added PIL Image imports, and updated template filling to preserve non-string objects like images |
| src/mirage/shard_process.py | Added multimodal prompt builder, modified batch processing to handle image inputs with per-example generation, and added empty shard handling |
| src/mirage/prompts.py | Removed unused ASSISTANT_ONLY_MD_PROMPT constant |
| run.sh | Simplified script by removing hardcoded output directory variables |
| configs/config_pmc_oa.yaml | Added example configuration for medical imaging dataset demonstrating multimodal features with Qwen3-VL model |
| README.md | Added comprehensive documentation section explaining multimodal usage with examples for both embedded and path-based images |
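Judging from the `src/mirage/config.py` row above, the extended dataclass plausibly looks like the following (field defaults and typing are assumptions, not taken from the diff):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputVar:
    name: str
    type: str = "text"                     # "image" marks a multimodal input
    image_base_path: Optional[str] = None  # root for resolving relative paths

    def is_image(self) -> bool:
        # Helper named in the file summary for identifying image inputs.
        return self.type == "image"
```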
Comments suppressed due to low confidence (1)
src/mirage/utils.py:6
- Import of 'Path' is not used.
`from pathlib import Path`
src/mirage/shard_process.py (Outdated)

```python
ds_shard.save_to_disk(shard_out_dir)
try:
    llm.shutdown()
except Exception:
    pass
```
Copilot AI · Dec 19, 2025
'except' clause does nothing but pass and there is no explanatory comment.
Yeah add a warning here probably
Ok I'll do it
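For reference, the agreed-upon fix would look something like this (logger setup assumed; `llm` comes from the surrounding shard-processing code):

```python
import logging

logger = logging.getLogger(__name__)

try:
    llm.shutdown()
except Exception as exc:
    # Warn instead of silently swallowing shutdown failures,
    # as suggested in the review.
    logger.warning("llm.shutdown() failed: %s", exc)
```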
Co-authored-by: Copilot <[email protected]>
* Initial plan
* Make chat template configurable for multimodal models
* Update documentation for configurable chat template
* Address code review feedback: improve chat template validation and inference
* Improve chat template inference and add early validation
* Remove infer_chat_template method, make chat_template explicit in config
* Improve error message and simplify comment

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: qchapp <[email protected]>
* Initial plan
* Optimize batch processing by separating text-only and multimodal samples
* Optimize chat template validation to run once per batch
* Enable batched processing for multimodal samples

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: qchapp <[email protected]>
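A rough illustration of the text/multimodal split those commits describe (hypothetical helper; the real logic lives in `src/mirage/shard_process.py`):

```python
def split_batch(examples, input_vars):
    """Sketch: separate text-only samples from ones carrying images.

    Text-only samples can use plain batched generation, while samples
    with image inputs are grouped for the VLM path.
    """
    image_vars = [v for v in input_vars if v.is_image()]
    text_only, multimodal = [], []
    for ex in examples:
        has_image = any(ex.get(v.name) is not None for v in image_vars)
        (multimodal if has_image else text_only).append(ex)
    return text_only, multimodal
```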
I will test the new changes before merging.
I tested again on my small test and it worked:

This pull request adds robust multimodal (image + text) support to MIRAGE, enabling the processing of datasets containing both images and text with vision-language models (VLMs). The changes cover configuration, input handling, batch processing, and documentation, making MIRAGE compatible with datasets containing embedded images or image file paths. Additionally, the code now gracefully handles empty shards and includes an example configuration for a medical imaging dataset.
Multimodal (Image) Support:
- Extended input variable configuration with `type: image` and an optional `image_base_path`. [1] [2] [3]
- Added `resolve_image_input` to robustly resolve image paths, URLs, and embedded objects for SGLang compatibility.

Configuration and Documentation:
- Updated `README.md` with detailed instructions and examples for configuring and using multimodal (image + text) datasets, including both embedded and path-based image scenarios. [1] [2]
- Added an example configuration (`config_pmc_oa.yaml`) for a medical imaging dataset using a vision-language model, demonstrating the new multimodal features.

General Improvements and Maintenance:
- Removed the unused `ASSISTANT_ONLY_MD_PROMPT` constant from `prompts.py`.
- Updated `run.sh` to simplify configuration and output directory handling.

These changes significantly enhance MIRAGE's flexibility for multimodal data and improve its usability for a wider range of datasets and models.