Skip to content

refactor: refactor rs/ic_os/metrics: shared write helper, replace hand-rolled formatter, rename tools#9330

Open
shilingwang wants to merge 5 commits intomasterfrom
shiling/metrics-tool
Open

refactor: refactor rs/ic_os/metrics: shared write helper, replace hand-rolled formatter, rename tools#9330
shilingwang wants to merge 5 commits intomasterfrom
shiling/metrics-tool

Conversation

@shilingwang
Copy link
Contributor

@shilingwang shilingwang commented Mar 12, 2026

#NODE-1524

Summary

This PR cleans up the five metric-writing tools under rs/ic_os/ in three ways:

  1. Shared atomic write helper (rs/ic_os/metrics/utils)
    A new ic-os-metrics-utils crate provides a single write_registry_to_file(registry, path) function. It encodes a Prometheus registry and writes the result atomically via a temp-file rename (using ic_sys::write_string_using_tmp_file), so a partially-written .prom file is never visible to node_exporter.
    Previously, metrics_tool, nft_exporter, and guest_vm_runner all duplicated the same 6-line BufWriter / TextEncoder / flush block and wrote non-atomically. All four are now updated to use the shared helper.
  2. Replace hostos_tool's hand-rolled Prometheus formatter
    hostos_tool/src/prometheus_metric.rs (158 lines) reimplemented the Prometheus text exposition format from scratch, used f32 for metric values, and only supported writing a single metric. It is deleted and replaced with the standard prometheus crate (GaugeVec with a gen label), consistent with the rest of the codebase.
  3. Rename tools for consistency
    The _tool suffix was used inconsistently — hostos_tool, guestos_tool, setupos_tool are all configuration/provisioning binaries, while fstrim_tool and metrics_tool are periodic metric exporters that live in rs/ic_os/metrics/. Both are renamed to match the folder's naming convention:
  • metrics_toolcustom_metrics (binary, crate, systemd units)
  • fstrim_toolfstrim_metrics (binary, crate, library, systemd units)

The public entry-point function fstrim_tool(...) in the library crate is also simplified to run(...).
4. Fix integration test metric ordering
fstrim_metrics integration tests were asserting on a hardcoded metric ordering that did not match the alphabetical order Prometheus registries actually produce (fstrim_datadir_* < fstrim_last_* < fstrim_runs_*). All three affected expected strings are corrected.

Files changed
Area Change
rs/ic_os/metrics/utils/ New shared crate
rs/ic_os/metrics/custom_metrics/ Renamed from metrics_tool; uses shared helper
rs/ic_os/metrics/fstrim_metrics/ Renamed from fstrim_tool; integration test assertions fixed
rs/ic_os/metrics/nft_exporter/ Uses shared helper (atomic writes)
rs/ic_os/os_tools/guest_vm_runner/src/metrics.rs Uses shared helper (atomic writes)
rs/ic_os/os_tools/hostos_tool/ Deletes prometheus_metric.rs; uses prometheus crate + shared helper
ic-os/components/monitoring/guestos/ Renamed metrics_tool.* and fstrim_tool.* systemd units
rs/tests/crypto/ Updated for fstrim_metrics rename and new crate name

@github-actions github-actions bot added the chore label Mar 12, 2026
@shilingwang shilingwang changed the title chore: use prometheus trait for all metrics tools of guestos refactor: refactor rs/ic_os/metrics: shared write helper, replace hand-rolled formatter, rename tools Mar 13, 2026
@github-actions github-actions bot added refactor and removed chore labels Mar 13, 2026
@shilingwang shilingwang marked this pull request as ready for review March 13, 2026 13:27
@shilingwang shilingwang requested review from a team as code owners March 13, 2026 13:27
Copy link
Contributor

@andrewbattat andrewbattat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants