Owner

@MGAMZ MGAMZ commented Jan 16, 2026

As title.

Copilot AI review requested due to automatic review settings January 16, 2026 14:53
@MGAMZ MGAMZ added the dataset itkit.dataset module label Jan 16, 2026

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@bcf313a).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #59   +/-   ##
=======================================
  Coverage        ?   72.22%           
=======================================
  Files           ?      106           
  Lines           ?    11798           
  Branches        ?     1054           
=======================================
  Hits            ?     8521           
  Misses          ?     3032           
  Partials        ?      245           

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds support for the Medical Segmentation Decathlon (MSD) dataset to the itkit framework. The MSD dataset includes 10 different medical imaging segmentation tasks across various anatomical structures and imaging modalities.

Changes:

  • Added MSD dataset classes for 10 tasks (Brain Tumor, Heart, Liver, Hippocampus, Prostate, Lung, Pancreas, Hepatic Vessel, Spleen, and Colon)
  • Provided metadata JSON file with dataset information including labels, modalities, and references
  • Included a conversion script to reorganize MSD dataset directory structure
  • Removed git submodules configuration

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

Show a summary per file
| File | Description |
| --- | --- |
| itkit/dataset/MSD/mm_dataset.py | Defines dataset classes for all 10 MSD tasks, with both SeriesVolumeDataset and PatchedDataset variants |
| itkit/dataset/MSD/dataset.json | Contains metadata for all 10 MSD tasks, including labels, modalities, references, and dataset sizes |
| itkit/dataset/MSD/convert.py | Provides a utility script to reorganize the MSD dataset from its original structure into itkit's expected format |
| itkit/dataset/MSD/__init__.py | Exports all MSD dataset classes for external use |
| .gitmodules | Removes git submodules configuration (mmengine, mmsegmentation, mmpretrain) |

"tensorImageSize": "3D",
"reference": "King’s College London",
"licence": "CC-BY-SA 4.0",
"relase": "1.0 04/05/2018",

Copilot AI Jan 16, 2026

Corrected spelling of 'relase' to 'release'.

Copilot uses AI. Check for mistakes.
"description": "Left and right hippocampus segmentation",
"reference": " Vanderbilt University Medical Center",
"licence": "CC-BY-SA 4.0",
"relase": "1.0 04/05/2018",

Copilot AI Jan 16, 2026

Corrected spelling of 'relase' to 'release'.

"description": "Prostate transitional zone and peripheral zone segmentation",
"reference": "Radboud University, Nijmegen Medical Centre",
"licence": "CC-BY-SA 4.0",
"relase": "1.0 04/05/2018",

Copilot AI Jan 16, 2026

Corrected spelling of 'relase' to 'release'.

"description": "Lung and cancer segmentation",
"reference": "The Cancer Imaging Archive",
"licence": "CC-BY-SA 4.0",
"relase": "1.0 04/05/2018",

Copilot AI Jan 16, 2026

Corrected spelling of 'relase' to 'release'.

"description": "Pancreas and cancer segmentation",
"reference": "Memorial Sloan Kettering Cancer Center ",
"licence": "CC-BY-SA 4.0",
"relase": "1.0 04/05/2018",

Copilot AI Jan 16, 2026

Corrected spelling of 'relase' to 'release'.

Comment on lines +10 to +21
    if not JSON_PATH.exists():
        return {}
    with open(JSON_PATH, encoding="utf-8") as f:
        return json.load(f)

MSD_METADATA = _load_metadata()

def _get_classes(task_id: str) -> list[str]:
    labels = MSD_METADATA.get(task_id, {}).get("labels", {})
    return [labels[k] for k in sorted(labels.keys(), key=int)]


Copilot AI Jan 16, 2026

The _get_classes function fails silently if MSD_METADATA is empty or dataset.json is missing. When JSON_PATH doesn't exist, _load_metadata returns an empty dict, so _get_classes returns an empty list of classes instead of raising. The dataset classes are then defined with an invalid METAINFO, which could lead to runtime errors later, when they are instantiated. Consider adding validation or error handling to ensure the JSON file exists and contains the expected task data.

Suggested change
-    if not JSON_PATH.exists():
-        return {}
-    with open(JSON_PATH, encoding="utf-8") as f:
-        return json.load(f)
-MSD_METADATA = _load_metadata()
-def _get_classes(task_id: str) -> list[str]:
-    labels = MSD_METADATA.get(task_id, {}).get("labels", {})
-    return [labels[k] for k in sorted(labels.keys(), key=int)]
+    """Load dataset metadata from JSON_PATH.
+    Raises:
+        FileNotFoundError: If the metadata JSON file does not exist.
+        ValueError: If the JSON cannot be parsed or does not contain a dict.
+    """
+    if not JSON_PATH.exists():
+        raise FileNotFoundError(f"Dataset metadata file not found: {JSON_PATH}")
+    try:
+        with open(JSON_PATH, encoding="utf-8") as f:
+            data = json.load(f)
+    except json.JSONDecodeError as exc:
+        raise ValueError(f"Failed to parse dataset metadata JSON at {JSON_PATH}") from exc
+    if not isinstance(data, dict):
+        raise ValueError(f"Dataset metadata in {JSON_PATH} must be a JSON object (dict)")
+    return data
+MSD_METADATA = _load_metadata()
+def _get_classes(task_id: str) -> list[str]:
+    """Return the ordered list of class labels for a given task.
+    Raises:
+        KeyError: If the task_id is not present in the metadata.
+        ValueError: If labels are missing, empty, or have invalid keys.
+    """
+    task_meta = MSD_METADATA.get(task_id)
+    if task_meta is None:
+        raise KeyError(f"Task id {task_id!r} not found in dataset metadata")
+    labels = task_meta.get("labels")
+    if not isinstance(labels, dict) or not labels:
+        raise ValueError(f"Missing or invalid 'labels' for task {task_id!r} in dataset metadata")
+    try:
+        sorted_keys = sorted(labels.keys(), key=int)
+    except (TypeError, ValueError) as exc:
+        raise ValueError(
+            f"Label keys for task {task_id!r} must be strings convertible to integers"
+        ) from exc
+    return [labels[k] for k in sorted_keys]
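
To make the silent failure in this comment concrete, here is a minimal sketch that mirrors the original chained `.get` lookup in a standalone function. The name `get_classes_lenient`, the explicit `metadata` parameter, and the task ids and labels used below are illustrative assumptions; they are not taken from itkit's `dataset.json`.

```python
def get_classes_lenient(metadata: dict, task_id: str) -> list[str]:
    """Mirror of the original lookup: missing data falls back to empty dicts."""
    labels = metadata.get(task_id, {}).get("labels", {})
    return [labels[k] for k in sorted(labels.keys(), key=int)]


# Hypothetical metadata for one task (illustrative values only).
sample_metadata = {
    "Task09_Spleen": {"labels": {"0": "background", "1": "spleen"}},
}

# With metadata present, the class list is built as expected.
print(get_classes_lenient(sample_metadata, "Task09_Spleen"))  # ['background', 'spleen']

# With empty metadata (the missing-file case), no exception is raised:
# the caller just receives an empty class list.
print(get_classes_lenient({}, "Task09_Spleen"))  # []
```

Because the empty list is produced without any exception, the problem would only become visible later, wherever the class list is actually consumed.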

def _get_classes(task_id: str) -> list[str]:
    labels = MSD_METADATA.get(task_id, {}).get("labels", {})
    return [labels[k] for k in sorted(labels.keys(), key=int)]

Copilot AI Jan 16, 2026

The sorted call uses key=int which will fail if any label key cannot be converted to an integer. This could raise a ValueError at module import time if the JSON contains non-numeric label keys. Consider adding error handling or validation for the label keys format.

Suggested change
-    return [labels[k] for k in sorted(labels.keys(), key=int)]
+    try:
+        sorted_keys = sorted(labels.keys(), key=int)
+    except (ValueError, TypeError):
+        # Fallback: sort keys as strings if they cannot be converted to int
+        sorted_keys = sorted(labels.keys())
+    return [labels[k] for k in sorted_keys]
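
One caveat about a string-sort fallback is that lexicographic order differs from numeric order once keys reach two digits, so the class-index mapping could shift without any error. The sketch below shows the difference on a hypothetical label map; `ordered_label_names` and the `class_N` names are assumptions made for illustration and do not come from the PR.

```python
def ordered_label_names(labels: dict[str, str], numeric: bool = True) -> list[str]:
    """Return label names ordered by key, numerically or lexicographically."""
    keys = sorted(labels, key=int) if numeric else sorted(labels)
    return [labels[k] for k in keys]


# Hypothetical label map with twelve entries, keyed "0" through "11".
labels = {str(i): f"class_{i}" for i in range(12)}

numeric_order = ordered_label_names(labels, numeric=True)
string_order = ordered_label_names(labels, numeric=False)

print(numeric_order[:4])  # ['class_0', 'class_1', 'class_2', 'class_3']
print(string_order[:4])   # ['class_0', 'class_1', 'class_10', 'class_11']
```

With fewer than eleven integer-like keys the two orders coincide, so the difference only matters for larger label sets or for keys that are not plain integers.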

if target.exists():
    print(f" Warning: {item.name} already exists in 'image', skipping.")
else:
    shutil.move(str(item), str(target))

Copilot AI Jan 16, 2026

Using shutil.move on potentially large medical image files without error handling could lead to data loss if the operation fails mid-transfer. Consider copying files first and only removing the source after verification, or at least wrap the move operation in a try-except block with appropriate error handling and recovery.
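
A minimal sketch of the copy, verify, then remove approach suggested here, for single files only. The helper names `safe_move` and `_sha256`, the `.partial` suffix, and the choice of SHA-256 for verification are all assumptions for illustration and are not part of `convert.py`. For context, `shutil.move` already uses a rename when source and destination are on the same filesystem and otherwise copies and then removes the source, so the extra verification mostly matters for cross-filesystem moves.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large volumes are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_move(src: Path, dst: Path) -> bool:
    """Copy src to dst, verify the copy, then delete src.

    Returns False (and leaves src untouched) if dst already exists.
    """
    if dst.exists():
        return False
    tmp = dst.with_name(dst.name + ".partial")  # hypothetical temp suffix
    try:
        shutil.copy2(src, tmp)
        if _sha256(src) != _sha256(tmp):
            raise OSError(f"Checksum mismatch while copying {src} to {dst}")
        tmp.replace(dst)  # rename within the destination directory
    except Exception:
        # Clean up the partial copy; the source file is still intact.
        tmp.unlink(missing_ok=True)
        raise
    src.unlink()  # remove the source only after the copy is verified
    return True
```

The trade-off is that every file is read twice more for hashing, which may be noticeable on large image volumes.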

if target.exists():
    print(f" Warning: {item.name} already exists in 'label', skipping.")
else:
    shutil.move(str(item), str(target))

Copilot AI Jan 16, 2026

Using shutil.move on potentially large medical image files without error handling could lead to data loss if the operation fails mid-transfer. Consider copying files first and only removing the source after verification, or at least wrap the move operation in a try-except block with appropriate error handling and recovery.

Comment on lines +1 to +22
from .mm_dataset import (
    Task01_BrainTumour_Mha,
    Task01_BrainTumour_Patch,
    Task02_Heart_Mha,
    Task02_Heart_Patch,
    Task03_Liver_Mha,
    Task03_Liver_Patch,
    Task04_Hippocampus_Mha,
    Task04_Hippocampus_Patch,
    Task05_Prostate_Mha,
    Task05_Prostate_Patch,
    Task06_Lung_Mha,
    Task06_Lung_Patch,
    Task07_Pancreas_Mha,
    Task07_Pancreas_Patch,
    Task08_HepaticVessel_Mha,
    Task08_HepaticVessel_Patch,
    Task09_Spleen_Mha,
    Task09_Spleen_Patch,
    Task10_Colon_Mha,
    Task10_Colon_Patch,
)

Copilot AI Jan 16, 2026

The new MSD dataset classes lack test coverage. Other datasets in the repository have test coverage in tests/dataset/test_dataset_registry.py with parametrized tests for common metainfo validation. Consider adding similar test coverage for the MSD dataset classes.
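
As a rough sketch of what such a metainfo check could look like, the helper below collects problems with a dataset class's `METAINFO`. It assumes `METAINFO` is a dict with a `"classes"` list of names, which is an assumption about itkit's layout; `metainfo_problems` and the stand-in class `FakeSpleenMha` are hypothetical and exist only to keep the example self-contained.

```python
def metainfo_problems(dataset_cls: type) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    metainfo = getattr(dataset_cls, "METAINFO", None)
    if not isinstance(metainfo, dict):
        return [f"{dataset_cls.__name__}: METAINFO is missing or not a dict"]
    classes = metainfo.get("classes")
    if not classes:
        return [f"{dataset_cls.__name__}: 'classes' is missing or empty"]
    problems = []
    if not all(isinstance(name, str) and name for name in classes):
        problems.append(f"{dataset_cls.__name__}: class names must be non-empty strings")
    if len(set(classes)) != len(classes):
        problems.append(f"{dataset_cls.__name__}: class names must be unique")
    return problems


class FakeSpleenMha:
    """Hypothetical stand-in for a real class such as Task09_Spleen_Mha."""
    METAINFO = {"classes": ["background", "spleen"]}


print(metainfo_problems(FakeSpleenMha))  # []
```

In a real test module, the same helper could be applied to each exported MSD class through a parametrized test, asserting that the returned list is empty.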

@MGAMZ MGAMZ merged commit 94b7e74 into main Jan 16, 2026
3 of 5 checks passed
@MGAMZ MGAMZ deleted the dev/MSD branch January 16, 2026 15:04