Skip to content

Conversation

@jokkalt
Copy link

@jokkalt jokkalt commented Jan 8, 2026

Implements the following flow from the matlab files RotateCropImage.m and RemoveHolesAndStitchesNew.m:
- Determine the rotation angle for the image and mask
By using the given rotation_angle if rotation_angle != 0. Otherwise, if the first object of crop_info is a
rectangle, then the rotation_angle of that rectangle is determined. Any other case will lead to a rotation angle
of 0.
- If the rotation angle is not 0, the mask is binary dilated using DILATE_STEPS iterations to correct for
imperfections when rotating. A margin is determined to reduce the final image to compensate for the dilation.
- The mask and image are cropped to the bounds of the mask.
- The scan image is cleaned of needles. The parameter times_median is used to determine the threshold to find
outliers. (this step is RemoveHolesAndStitchesNew.m; note that the parameter interpolate_data in that script is ALWAYS 0, so a lot of code is skipped)
- The cleaned image is masked using the cropped mask.
- The image and mask are rotated by rotation_angle. (This code is freshly written, not using matlab code, since matlab code did not translate well due to java/matlab/python coordinate system used)
- The rotated image and rotated mask are cropped to the bounds of the mask, using margin to compensate for dilation.

NOTE: there was a bug in the existing crop_to_mask function: y and x were switched

jokkalt and others added 30 commits January 6, 2026 12:37
Returns will be use to wrap function in container that can be passed
around in rail. Thus this will be use for railway function methodology
We no longer are working with the raw value, since it's unclear at
runtime if a failure occured or not. Thus the raw value will be
encapsulated in a container that can be passed around until the entended
reciever consumes it; think of a mail currier package.

State of the container can be Success or Failure, but that can be unpack
lazily
Writen module docstrings for new modules

with the assistant of Claud (AI)
Pipelines is an abstracted module that communicates with the returns
layer. This means endpoints can call scratch-core function without
caring without caring what the underlying protocol is used for
failures/exceptions
Create an upload scan parameters model that hold all the parameters to
be used for the pipelines. All the parameters have a defaults and
defaults will be generated if no parameters is given, thus nothing
changes for the user or endpoint contract
deptry notify that dependencies needed maintainance
* mask and crop functionality added

* type fixes and some efficiency
gaussian filter

---------

Co-authored-by: Sharlon N. Regales <[email protected]>
* resampling method added
* Mark type and resampling for marks implemented
added files to lfs
@jokkalt jokkalt changed the title WIP Feature/rotate cropped image Feature/rotate cropped image Jan 15, 2026
@jokkalt jokkalt marked this pull request as ready for review January 15, 2026 15:05
selected area.
The points dict differs per CropType:
CIRCLE: {'center': array [x, y], 'radius': float}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way we could enforce/test/validate this when creating a CropInfo?

ELLIPSE: {'center': array [x, y], 'majoraxis': float, 'minoraxis': float, angle_majoraxis: float}
"""

data: dict[str, Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can create a pydantic model per type and let pydantic figure out which one matches the parameters in data?


scan_image: ScanImage
mark_type: MarkType
crop_infos: list[CropInfo]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a tuple here instead of a list?

def crop_to_mask(
image: ScanMap2DArray,
mask: MaskArray,
margin: Optional[int] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
margin: Optional[int] = None,
margin: int | None = None,

Crops an image to the bounding box of a mask.
:param image: The image to crop
:param mask: Binary mask
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the new margin parameter to docstring


def _determine_bounding_box(mask: MaskArray) -> tuple[slice, slice]:
def _determine_bounding_box(
mask: MaskArray, margin: Optional[int] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mask: MaskArray, margin: Optional[int] = None
mask: MaskArray, margin: int | None = None

:param times_median: Parameter used to determine what is considered an outlier when removing outliers/needles.
:return: Tuple of the cropped, rotated and masked scan image, and the rotated and cropped mask.
"""
rotation_angle = get_rotation_angle(crop_infos, rotation_angle)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to this function, crop_infos cannot be None

)

# Apply mask and prepare data
scan_image_masked = ScanImage(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a mask_and_crop_scan_image method for this

:param times_median: Parameter to help determine the outlier threshold.
:return: The cleaned scan image.
"""
filter_size_moderated = 5
Copy link
Collaborator

@cfs-data cfs-data Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a module constant, or parameter with default value?

times_median = times_median * 6

# Check if this is a small strip of data
is_small_strip = scan_image.width <= 20 or scan_image.height <= 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the magic value 20 into a module constant?

:param mask: Binary mask array
:param margin: Margin around the bounding box to either crop (positive) or extend (negative) the bounding box
:return: Tuple of (y_slice, x_slice) for the bounding box
Copy link
Collaborator

@cfs-data cfs-data Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order of the tuple in return type here is not matching the output



def _determine_bounding_box(mask: MaskArray) -> tuple[slice, slice]:
def _determine_bounding_box(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _determine_bounding_box(
def determine_bounding_box(

:param mask: Binary mask array
:param margin: Margin around the bounding box to either crop (positive) or extend (negative) the bounding box
:return: Tuple of (y_slice, x_slice) for the bounding box
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch as well

y_max, x_max = np.max(non_zero_coords, axis=1)
return slice(x_min, x_max + 1), slice(y_min, y_max + 1)

if margin:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if margin defaults to 0 then this is cleaner:

y_coords, x_coords = np.nonzero(mask)
y_min = max(0, y_coords.min() + margin)
y_max = min(mask.shape[0], y_coords.max() - margin + 1)
x_min = max(0, x_coords.min() + margin)
x_max = min(mask.shape[1], x_coords.max() - margin + 1)

scan_image: ScanImage, mask: MaskArray, times_median: float = 15.0
) -> ScanImage:
"""
Remove needle artifacts (outliers) from depth measurement data using median filtering.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are needle artifacts?

# Calculate subsampling factor for computational efficiency
# Goal: 7 μm sampling with 70 μm filter diameter
subsample_factor = int(
np.ceil(70e-6 / filter_size_moderated / scan_image.scale_x)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value should be module constant


if not is_small_strip:
# Calculate subsampling factor for computational efficiency
# Goal: 7 μm sampling with 70 μm filter diameter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sampling" is not clear here, do you mean pixel size / scale?

if subsample_factor > 1:
scan_image_subsampled = unwrap_result(
subsample_scan_image(
scan_image_masked, subsample_factor, subsample_factor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use named keywords when there are multiple keywords


residual_image = (
scan_image_masked.data
- scan_image_filtered.data[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scan_image_filtered containing NaNs in the same places as scan_image_masked?

]
)
else:
# For small strips: use 1D filtering with adjusted kernel size
Copy link
Collaborator

@cfs-data cfs-data Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is now very long. I think it would be beneficial to split the function up into smaller functions with single responsibilities, and only create a small pipeline / logical flow here


# Find outliers: points where |residual| > threshold
threshold = times_median * np.nanmedian(np.abs(residual_image))
indices_invalid = np.abs(residual_image) > threshold
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
indices_invalid = np.abs(residual_image) > threshold
mask_outliers = np.abs(residual_image) > threshold

times_median = times_median * 6

# Check if this is a small strip of data
is_small_strip = scan_image.width <= 20 or scan_image.height <= 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you are checking for both width and height (as opposed to the matlab code) but further down you are assuming the width is the shortest?

)


def apply_median_filter(scan_image: ScanImage, filter_size: int) -> ScanImage:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to rewrite this implementation in a way that makes more clear what actually happens. From the code it seems we are applying a windowing function and compute the medians over the windows?

def rotate_crop_and_mask_image_by_crop(
scan_image: ScanImage,
mask: MaskArray,
rotation_angle: float = 0.0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this function need a default value of 0.0?

indices_invalid = np.abs(residual_image) > threshold

# Remove outliers by setting them to NaN
scan_image_without_outliers = scan_image.data.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a scan_image anymore. And I was wondering whether we could make a update_mark_data-like function to cover all this boilerplate up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants