Skip to content

Conversation

@Millefeuille42
Copy link

No description provided.

@Millefeuille42 Millefeuille42 self-assigned this Jun 30, 2025
Comment on lines 2082 to 2086
self._linstor.resource_auto_place(
rsc_name=resource.name,
place_count=1,
diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be recreated on the correct node, otherwise you change the behavior on the caller side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I read on the linstor docs, I'd say something like this would be appropriate. Is it ?

        self._linstor.resource_create(
            rscs=[
                linstor.linstorapi.ResourceData(
                    node_name=resource.node_name,
                    rsc_name=resource.name,
                    storage_pool=volume.storage_pool_name,
                    diskless=linstor.consts.FLAG_DISKLESS in resource.flags,
                    drbd_diskless=linstor.consts.FLAG_DRBD_DISKLESS in resource.flags,
                    nvme_initiator=linstor.consts.FLAG_NVME_INITIATOR in resource.flags,
                    ebs_initiator=linstor.consts.FLAG_EBS_INITIATOR in resource.flags
                )
            ]
        )

Copy link
Author

@Millefeuille42 Millefeuille42 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion added in 6b32030

place_count=1,
diskless_type=linstor.consts.FLAG_DRBD_DISKLESS
)
self._mark_resource_cache_as_dirty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure there is no issue for caller with for resource in self._get_resource_cache().resources code? 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in current usage I break the parent loop with a return, however for now it is up to the caller to handle the fact that this function implicitly modifies shared state. I see two options:

  • Document this clearly in a docstring, specifying that it mutates the resource cache
  • Move cache invalidation responsibility to the caller

I'd prefer the first option since as I know there is little to no point to not invalidate the cache afterwards ? 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed to a scan happening before the loop in 9a9dec1. So this is not an issue anymore.

Comment on lines 527 to 533
return util.retry(
lambda: self.allocated_volume_size,
maxretry=1,
period=1
)
except LinstorVolumeManagerError as e:
raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this code. 😅 Maybe we should talk about it in a call.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See answers below, the util.retry was unnecessary and removed in bae7edb, meanwhile the except->raise is there to prevent nested exceptions of the same type.

Comment on lines 2149 to 2150
except LinstorVolumeManagerError as e:
raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the function will call itself if a LinstorVolumeManagerError is raised, that exception can be caught again below, unnecessarily nesting exceptions of the same type. This might be a bit of overkill though, so I can remove it if you see no use in it.

Comment on lines 2144 to 2148
return util.retry(
lambda: self._get_volumes_info(volume_name),
maxretry=1,
period=1
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a retry, and with a maxretry of 1 and a return here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might have mixed up stuff because I'm not sure of why I did this in the first place. I have fixed this in bae7edb. However this brings the eventuality of a loop if the resource is gets broken in the meantime.

@Wescoeur
Copy link
Member

Wescoeur commented Jul 8, 2025

  • Can we delete a diskless that is open in read-only mode? Does DRBD prevent deletion if it is open?
  • Can we detect if a diskless is open (without using /sys/kernel/debug/)?

@Millefeuille42
Copy link
Author

Following issues that @Wescoeur noted we decided to have this PR on hold. We don't really know what is the cause of this issue and it seems relatively rare.

Not being sure of what happens when trying to delete a resource open in read-only and how to know if it is open makes room for further errors. Thus, manual repair remains preferred for the moment.

This is something that will be easily handled in SMAPIv3.

@stormi
Copy link
Member

stormi commented Jul 17, 2025

Then I suggest converting this PR back to draft.

@Millefeuille42 Millefeuille42 marked this pull request as draft July 17, 2025 10:18
@Wescoeur Wescoeur force-pushed the 3.2.12-8.3 branch 3 times, most recently from e7801da to ca5b52d Compare August 28, 2025 15:32
@Wescoeur Wescoeur force-pushed the 3.2.12-8.3 branch 2 times, most recently from 66c7c8e to aaaab27 Compare December 1, 2025 23:24
@Millefeuille42 Millefeuille42 force-pushed the mlr-1037-robustify-linstor-diskless-path-recreation branch from 9a9dec1 to 4ba4f62 Compare December 8, 2025 10:04
@Millefeuille42 Millefeuille42 force-pushed the mlr-1037-robustify-linstor-diskless-path-recreation branch from 4ba4f62 to 5e85ea6 Compare December 10, 2025 09:55
Signed-off-by: Mathieu Labourier <mathieu.labourier@vates.tech>
@Millefeuille42 Millefeuille42 force-pushed the mlr-1037-robustify-linstor-diskless-path-recreation branch from 5e85ea6 to a53dd00 Compare December 10, 2025 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants