Skip to content

Conversation

@edwintorok
Copy link
Member

Draft, because I still need to test it.

@lindig
Copy link
Contributor

lindig commented Apr 11, 2025

diff --git a/ocaml/xcp-rrdd/bin/rrdd/xcp_rrdd.ml b/ocaml/xcp-rrdd/bin/rrdd/xcp_rrdd.ml
index e2b6f741f..3fe57962e 100644
--- a/ocaml/xcp-rrdd/bin/rrdd/xcp_rrdd.ml
+++ b/ocaml/xcp-rrdd/bin/rrdd/xcp_rrdd.ml
@@ -543,7 +543,7 @@ let monitor_write_loop writers =
               ) ;
               Thread.delay !Rrdd_shared.timeslice
             with e ->
-              debug
+              warn
                 "Monitor/write thread caught an exception. Pausing for 10s, \
                  then restarting: %s"
                 (Printexc.to_string e) ;

Would add this. Debug level is info so this is invisible.

…wn races

Note that this hostload metric is completely wrong, and we should
probably drop it.
But for now prevent it from introducing gaps in the entire host CPU
graph: if an exception escapes here, then nothing else gets reported,
because the host CPU, VM CPU and hostload metrics all get returned as a single
list. An exception escaping from any prevents all the others from
working too.

Fixes: 9aa7dfc ("CP-43574: Add host load data source")
Signed-off-by: Edwin Török <[email protected]>
@edwintorok edwintorok force-pushed the pr/rrd-cpu branch 2 times, most recently from 8e5fa03 to a61816a Compare April 11, 2025 15:07
When this plugin was part of xcp-rrdd the shared memory protocol wasn't
used for transfering data (it was transferred by direct function call).
The shared memory protocol has a limitation that you need to declare a
maximum size from the beginning, or carefully grow the file itself
dynamically.

We support max 1024 VMs/host, and if each one needs 5 pages,
that'd be 20Mib/host, so always use that as the minimum number of pages.

Fixes: b3ea092 ("IH-615: Move CPU-related data-source collection into a separate RRDD plugin")
Signed-off-by: Edwin Török <[email protected]>
@edwintorok
Copy link
Member Author

In fact 1 page / VM is not enough, a VM with 32 vCPUs needs a bit more than 2, so 64 should fit in 5. Still only 20MiB/host to avoid the bug.

Even with these 2 fixes there are still occasional gaps in the CPU graphs, so there are likely more bugs in the CPU plugin (other plugins don't have the gaps).

@edwintorok edwintorok marked this pull request as ready for review April 11, 2025 15:10
@psafont psafont enabled auto-merge April 11, 2025 15:18
@psafont psafont added this pull request to the merge queue Apr 11, 2025
Merged via the queue into xapi-project:master with commit ece6444 Apr 11, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants