Skip to content

Conversation

@alfonsosanchezbeato
Copy link
Member

@alfonsosanchezbeato alfonsosanchezbeato commented Oct 20, 2025

Use libblkid instead of using the disks package, as the latter calls to udevadm
trigger/settle, which delays boot considerably.

@alfonsosanchezbeato alfonsosanchezbeato added the Run nested The PR also runs tests inluded in nested suite label Oct 20, 2025
@github-actions github-actions bot added the Run only one system Only runs spread tests on one system label Oct 20, 2025
@codecov
Copy link

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 75.43424% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.59%. Comparing base (5c3292b) to head (f5e6b94).
⚠️ Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
cmd/snap-bootstrap/disk.go 79.69% 37 Missing and 17 partials ⚠️
cmd/snap-bootstrap/blkid/blkid.go 0.00% 16 Missing ⚠️
cmd/snap-bootstrap/cmd_initramfs_mounts.go 83.54% 10 Missing and 3 partials ⚠️
cmd/snap-bootstrap/blkid/mock_blkid.go 0.00% 11 Missing ⚠️
cmd/snap-bootstrap/cmd_initramfs_mounts_cvm.go 50.00% 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16182      +/-   ##
==========================================
+ Coverage   77.50%   77.59%   +0.08%     
==========================================
  Files        1342     1337       -5     
  Lines      184078   184336     +258     
  Branches     2444     2444              
==========================================
+ Hits       142664   143029     +365     
+ Misses      32780    32670     -110     
- Partials     8634     8637       +3     
Flag Coverage Δ
unittests 77.59% <75.43%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Oct 20, 2025

Wed Jan 21 02:34:17 UTC 2026
The following results are from: https://github.com/canonical/snapd/actions/runs/21191868217

Failures:

Preparing:

  • openstack-ext:ubuntu-18.04-64:tests/nested/manual/devmode-snaps-can-run-other-snaps
  • openstack-ext:ubuntu-18.04-64:tests/nested/classic/hotplug
  • openstack-ext:ubuntu-18.04-64:tests/nested/classic/snapshots-with-core-refresh-revert
  • openstack-ext:ubuntu-22.04-64:tests/nested/manual/hybrid-fde-all-key-databases:db
  • openstack-ext:ubuntu-24.04-64:tests/nested/manual/hybrid-fde-all-key-databases:pk

Executing:

  • openstack:arch-linux-64:tests/main/refresh:strict_remote
  • openstack-ext:ubuntu-22.04-64:tests/nested/core/core20-fault-inject-on-refresh:kernel_reboot_link_snap
  • openstack-ext:ubuntu-22.04-64:tests/nested/core/connected-after-reboot-revert
  • openstack-ext:ubuntu-22.04-64:tests/nested/core/core20-fault-inject
  • openstack-ext:ubuntu-22.04-64:tests/nested/manual/muinstaller
  • openstack-ext:ubuntu-22.04-64:tests/nested/manual/muinstaller-real:seeded
  • openstack-ext:ubuntu-24.04-64:tests/nested/core/core20-fault-inject-on-refresh:base_reboot_link_snap
  • openstack-ext:ubuntu-24.04-64:tests/nested/manual/muinstaller-core:install_optional_all
  • openstack-ext:ubuntu-26.04-64:tests/nested/manual/minimal-smoke:secboot_disabled
  • openstack-ext:ubuntu-26.04-64:tests/nested/manual/minimal-smoke:secboot_enabled
  • openstack:ubuntu-core-20-64:tests/regression/lp-2065077
  • openstack:ubuntu-core-24-64:tests/main/snap-user-service-upgrade-failure
  • openstack:ubuntu-core-24-64:tests/main/snap-user-service-start-on-install
  • openstack:ubuntu-core-24-64:tests/main/snap-user-service-restart-on-upgrade
  • openstack:ubuntu-24.04-64:tests/main/interfaces-snap-interfaces-requests-control

Restoring:

  • openstack-ext:ubuntu-22.04-64:tests/nested/manual/muinstaller-real:seeded
  • openstack-ext:ubuntu-22.04-64:tests/nested/manual/
  • openstack-ext:ubuntu-22.04-64:
  • openstack:ubuntu-20.04-64:tests/main/nss-modules:winbind
  • openstack:ubuntu-24.04-64:tests/main/nss-modules:winbind

@alfonsosanchezbeato
Copy link
Member Author

alfonsosanchezbeato commented Oct 21, 2025

I have been doing some tests with the imx93. I think that we should stop relying on the udev properties added by 60-persistent-storage.rules in the initramfs (ID_*), as we can be sure that they exist only after a udevadm settle which will always delay things more than we want. We can be sure only that the udev properties for the disk are already present as the symlink is created later in 90-ubuntu-core-partitions.rules and we have a dependency for it, but events from the partitions might not have been processed yet. And disk.populatePartitions() looks for ID_* properties for all partitions of the disk failing if it does not find any of them.

So, we have 3 options to know which are the partitions to mount:

  • Wait for the symlinks of all partitions that we are interested in to be present (that was already one of my comments in the scan disk PR - @valentindavid noted that the devices would be already present, but it is not the case for the udev properties where we have the race)
  • Use again libblkid as we have done for the snap-bootstrap scan command to find out the ubuntu disk partitions instead of relaying on ID_*. This should not be hard, but we will be doing again some of the work of the scan command.
  • Eventually split up things so each mount is a separate command. But we are still a bit far from that.

@alfonsosanchezbeato
Copy link
Member Author

alfonsosanchezbeato commented Dec 16, 2025

This is rebased now on top of #16370 - it is ready for an initial pass, even if it might be necessary to split. Fwiw even on qemu this removes 25% of the time spent by the snap-bootstrap command.

@alfonsosanchezbeato alfonsosanchezbeato force-pushed the remove-udevadm-calls branch 3 times, most recently from 5bac26e to 45fb797 Compare January 6, 2026 16:15
@alfonsosanchezbeato alfonsosanchezbeato marked this pull request as ready for review January 6, 2026 22:37
@github-actions github-actions bot removed the Run only one system Only runs spread tests on one system label Jan 6, 2026
@alfonsosanchezbeato
Copy link
Member Author

Changing to ready to review now


// PartitionWithFsLabel returns a partition with a filesystem matching fsLabel.
func (d *Disk) PartitionWithFsLabel(fsLabel string) (*Partition, error) {
// We are case-insensitive, this is especially important for the vfat case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rethink that. We should only use filesystem labels only when the partition table does not have names (mbr).

Copy link
Member Author

@alfonsosanchezbeato alfonsosanchezbeato Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We saw labels using capital letters in vfat filesystems, so we allow this in the code that was comparing labels in the past, but only for vfat. I can add a check for the fs type easily by extracting also that infomation when probing, let me know if this was your concern here.

Otherwise, we still use only partition info in the GPT case - it is just that now we always probe the fs as that was needed for the "initramfs-mounts" snap-bootstrap command, and in the past we were using this only for the "scan" command. It was simpler code to just probe always everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the objection here is that we should try not to match against the FS label, only in cases where we have no other choice (mbr) - not whether it's case-insensitive. You say we are still using partition information for GPT case, but I'm not sure i see that? It looks to me like we are now using filesystem label in all cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says that we are case-insensitive, which puts me on guard in general as it makes things more forgiving, but it just says "this is especially important for the vfat case" - but is that because we want to only be case-insensitive for VFAT or do we also want for other filesystems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked and we use the filesystem label/UUID in quite a few places when running the "initramfs-mounts" command, so we need to probe in that case. But that is not necessary for the "scan-disk" command, so I have added an option so the caller can decide whether to fully probe or not, so we save some time when scanning disks.

Otherwise, now I check for the partition type so we are case insensitive for vfat, which is what we were doing previously.

// GetSize is a wrapper for blkid_partition_get_size
// GetSize returns the size of the partition in bytes
GetSize() int64
// GetNumber is a wrapper for blkid_partition_get_partno
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// GetNumber is a wrapper for blkid_partition_get_partno
// GetNumber returns the proposed partition number of the partition

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following here the style of the previous function descriptions, just changed GetSize/Start because there were not wrappers anymore. I'm actually happy with changing this, but we should do that then for all methods in AbstractBlkidProbe/AbstractBlkidPartition. Let me know if you are ok with changing this now or as a follow-up.

Comment on lines 263 to 269
func (p *blkidPartition) GetNumber() int {
return int(C.blkid_partition_get_partno(p.partitionHandle))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (p *blkidPartition) GetNumber() int {
return int(C.blkid_partition_get_partno(p.partitionHandle))
func (p *blkidPartition) GetNumber() int {
// blkid_partition_get_partno proposed partition number (e.g. 'N' from sda'N'). Note that the number
// is generated by library independently of the OS.
// See
// https://www.kernel.org/pub/linux/utils/util-linux/v2.41/libblkid-docs/libblkid-Partitions-probing.html#blkid-partition-get-partno
return int(C.blkid_partition_get_partno(p.partitionHandle))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added. I find it a bit weird that is says "proposed" as I expect the number to be deterministic as afaict by looking at blkid it comes from the order in GPT/MBR headers. And we would be in trouble if it was assigned differently in the kernel.

}

// Filesystem is the information reported by blkid on a filesystem.
type Filesystem struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice this is public API, and used across the cmd files - I'm not a huge fan of this being inside a cmd_* file as this feels like library code, even if it's simple. My opinion is it would be clearer for this to be in a dedicated file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, moved now.


// PartitionWithFsLabel returns a partition with a filesystem matching fsLabel.
func (d *Disk) PartitionWithFsLabel(fsLabel string) (*Partition, error) {
// We are case-insensitive, this is especially important for the vfat case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the objection here is that we should try not to match against the FS label, only in cases where we have no other choice (mbr) - not whether it's case-insensitive. You say we are still using partition information for GPT case, but I'm not sure i see that? It looks to me like we are now using filesystem label in all cases?


// PartitionWithFsLabel returns a partition with a filesystem matching fsLabel.
func (d *Disk) PartitionWithFsLabel(fsLabel string) (*Partition, error) {
// We are case-insensitive, this is especially important for the vfat case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says that we are case-insensitive, which puts me on guard in general as it makes things more forgiving, but it just says "this is especially important for the vfat case" - but is that because we want to only be case-insensitive for VFAT or do we also want for other filesystems?

@alfonsosanchezbeato alfonsosanchezbeato force-pushed the remove-udevadm-calls branch 2 times, most recently from 0f75c39 to 9ffa601 Compare January 14, 2026 19:20
Use libblkid instead of using the disks package, as the latter calls to udevadm
trigger/settle, which delays boot considerably.
@alfonsosanchezbeato alfonsosanchezbeato changed the title osutil/disks: remove calls to udevadm trigger/settle when finding cmd/snap-bootstrap: use libblkid to find information about disks Jan 15, 2026
Copy link
Member Author

@alfonsosanchezbeato alfonsosanchezbeato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed now. I have also implemented now Model() so it returns the content of the "model" file in sysfs, which looks like the main thing that is used by udevd to fill ID_MODEL.

// GetSize is a wrapper for blkid_partition_get_size
// GetSize returns the size of the partition in bytes
GetSize() int64
// GetNumber is a wrapper for blkid_partition_get_partno
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following here the style of the previous function descriptions, just changed GetSize/Start because there were not wrappers anymore. I'm actually happy with changing this, but we should do that then for all methods in AbstractBlkidProbe/AbstractBlkidPartition. Let me know if you are ok with changing this now or as a follow-up.

Comment on lines 263 to 269
func (p *blkidPartition) GetNumber() int {
return int(C.blkid_partition_get_partno(p.partitionHandle))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added. I find it a bit weird that is says "proposed" as I expect the number to be deterministic as afaict by looking at blkid it comes from the order in GPT/MBR headers. And we would be in trouble if it was assigned differently in the kernel.

}

// Filesystem is the information reported by blkid on a filesystem.
type Filesystem struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, moved now.


// PartitionWithFsLabel returns a partition with a filesystem matching fsLabel.
func (d *Disk) PartitionWithFsLabel(fsLabel string) (*Partition, error) {
// We are case-insensitive, this is especially important for the vfat case
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked and we use the filesystem label/UUID in quite a few places when running the "initramfs-mounts" command, so we need to probe in that case. But that is not necessary for the "scan-disk" command, so I have added an option so the caller can decide whether to fully probe or not, so we save some time when scanning disks.

Otherwise, now I check for the partition type so we are case insensitive for vfat, which is what we were doing previously.

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a first pass, some comments and questions

// Disk implementations provide disk information required by secboot.
type Disk interface {
SecbootPartitionWithFsLabel(string) (Partition, error)
Model() string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be called DiskModel, as we have too many things called model already

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done


// Disk implementations provide disk information required by secboot.
type Disk interface {
SecbootPartitionWithFsLabel(string) (Partition, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a strange name, are you trying to convey that the result is a secboot.Partition, I don't think golang conventions usually do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a conflict with the definition of PartitionWithFsLabel in snap-bootstrap for Disk. In the end I have used an adapter so the name can be the same.

// Implemention of interface needed by secboot.
var _ = secboot.Partition(&Partition{})

func (p *Partition) PartitionNode() string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few of these accessors don't seem to be tested

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added checks

}

func (d *Disk) Model() string {
// Get file information for the device node
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is not tested it seems

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests now

m.degradedState.partition(part).MountState = boot.PartitionMounted
m.degradedState.partition(part).MountLocation = where

if err := m.verifyMountPoint(where, part); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we don't need/want this anymore? is it related to the switch to scan_disk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of that and also because we are moving to libblkid also in initramfs-mounts command - this means that snap-bootstrap is probing directly blocks in /dev/<node>, so we read all partition information for the disk we know it was the boot disk. Previously we had to do this verification as we were relying on symlinks created by udevd, where there is always a chance that some other disk had the same label/UUID, etc.

So now fsDevice points to /dev/<disk>p<N> instead of to /dev/disk/by-partuuid/<UUID>.

disk, err := disks.DiskFromDeviceName("/dev/disk/snapd/disk")
// If the disk kernel came from cannot be determined, then it will fallback to
// looking at the specified disk label.
func findBootDisk(fallbacklabel string) (*Disk, string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this and helpers perhaps live in disk.go ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moved now

Copy link
Member Author

@alfonsosanchezbeato alfonsosanchezbeato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedronis thanks for the review, I've addressed the comments now

m.degradedState.partition(part).MountState = boot.PartitionMounted
m.degradedState.partition(part).MountLocation = where

if err := m.verifyMountPoint(where, part); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because of that and also because we are moving to libblkid also in initramfs-mounts command - this means that snap-bootstrap is probing directly blocks in /dev/<node>, so we read all partition information for the disk we know it was the boot disk. Previously we had to do this verification as we were relying on symlinks created by udevd, where there is always a chance that some other disk had the same label/UUID, etc.

So now fsDevice points to /dev/<disk>p<N> instead of to /dev/disk/by-partuuid/<UUID>.

disk, err := disks.DiskFromDeviceName("/dev/disk/snapd/disk")
// If the disk kernel came from cannot be determined, then it will fallback to
// looking at the specified disk label.
func findBootDisk(fallbacklabel string) (*Disk, string, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moved now


// Disk implementations provide disk information required by secboot.
type Disk interface {
SecbootPartitionWithFsLabel(string) (Partition, error)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a conflict with the definition of PartitionWithFsLabel in snap-bootstrap for Disk. In the end I have used an adapter so the name can be the same.

// Disk implementations provide disk information required by secboot.
type Disk interface {
SecbootPartitionWithFsLabel(string) (Partition, error)
Model() string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done

// Implemention of interface needed by secboot.
var _ = secboot.Partition(&Partition{})

func (p *Partition) PartitionNode() string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added checks

}

func (d *Disk) Model() string {
// Get file information for the device node
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Run nested The PR also runs tests inluded in nested suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants