From d308a918855cc104288ed83be260db028fba0014 Mon Sep 17 00:00:00 2001
From: Changlei Li
Date: Wed, 31 Dec 2025 13:46:41 +0800
Subject: [PATCH] CA-422282 Fix race condition about xenops cache

There is a race condition on the VM cache between pool_migrate_complete
and VM events. In the cross-pool migration case, the VM is created with
power_state Halted in the XAPI DB by design. In pool_migrate_complete,
add_caches creates an empty xenops_cache for the VM, then refresh_vm
compares the cached power_state (None) with the real state (Running) and
writes the correct power_state to the XAPI DB.

In the failing case, the following sequence was observed:
-> VM event 1 update_vm
-> pool_migrate_complete add_caches (cache power_state None)
-> pool_migrate_complete refresh_vm
-> VM event 1 update cache (cache power_state Running)
-> VM event 2 update_vm (Running <-> Running, XAPI DB not updated)

The cache update from the earlier VM event 1 lands after add_caches in
pool_migrate_complete, which breaks the design intention.

This commit adds a wait in pool_migrate_complete to ensure that all
in-flight events complete before add_caches, which removes the race
condition.

Signed-off-by: Changlei Li
---
 ocaml/xapi/xapi_vm_migrate.ml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ocaml/xapi/xapi_vm_migrate.ml b/ocaml/xapi/xapi_vm_migrate.ml
index cba3d5f4966..3e147054e6b 100644
--- a/ocaml/xapi/xapi_vm_migrate.ml
+++ b/ocaml/xapi/xapi_vm_migrate.ml
@@ -492,6 +492,7 @@ let pool_migrate_complete ~__context ~vm ~host:_ =
   if Xapi_xenops.vm_exists_in_xenopsd queue_name dbg id then (
     remove_stale_pcis ~__context ~vm ;
     Xapi_xenops.set_resident_on ~__context ~self:vm ;
+    Xapi_xenops.Events_from_xenopsd.wait queue_name dbg id () ;
     Xapi_xenops.add_caches id ;
     Xapi_xenops.refresh_vm ~__context ~self:vm ;
     Monitor_dbcalls_cache.clear_cache_for_vm ~vm_uuid:id
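
Reviewer note: the interleaving described in the commit message can be
replayed with a small, single-threaded toy model. This is only a sketch
and not the XAPI implementation. The `world` record, `run`, and
`event1_update_cache` are hypothetical names invented for illustration,
and `add_caches` and `update_vm` here are simplified stand-ins for the
real functions. The model assumes, as the trace implies, that the late
half of event 1 only writes the cache and does not touch the DB. Why that
is so is not modelled.

```ocaml
(* Toy model of the race; all names are illustrative, not XAPI code. *)
type power_state = Halted | Running

type world = {
  mutable db : power_state;  (* power_state recorded in the XAPI DB *)
  mutable cache : power_state option;  (* cache entry, None = empty *)
}

(* Assumption: the late half of VM event 1 only stores the state it saw. *)
let event1_update_cache w = w.cache <- Some Running

(* Stand-in for add_caches: install an empty cache entry. *)
let add_caches w = w.cache <- None

(* Stand-in for the update_vm run triggered by refresh_vm (VM event 2):
   the DB is written only when the cached state differs from the real one. *)
let update_vm w ~actual =
  if w.cache <> Some actual then (
    w.db <- actual ;
    w.cache <- Some actual
  )

(* Replay a list of steps against a freshly migrated VM (DB says Halted). *)
let run steps =
  let w = {db= Halted; cache= None} in
  List.iter (fun step -> step w) steps ;
  w.db

(* Failing order: event 1's cache write lands after add_caches. *)
let racy = run [add_caches; event1_update_cache; update_vm ~actual:Running]

(* Order enforced by the wait: event 1 finishes before add_caches. *)
let fixed = run [event1_update_cache; add_caches; update_vm ~actual:Running]

let string_of_state = function Halted -> "Halted" | Running -> "Running"

let () =
  Printf.printf "racy DB state:  %s\n" (string_of_state racy) ;
  Printf.printf "fixed DB state: %s\n" (string_of_state fixed)
```

In the failing order the DB stays Halted, because the comparison sees
Running on both sides. Once event 1 is drained before add_caches, the
comparison sees None against Running and the DB is updated.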