Skip to content

Conversation

@gangj
Copy link
Contributor

@gangj gangj commented Apr 14, 2025

No description provided.

The destroy operation is missing in the allowed operations for HA
statefile VDI after HA is disable. It is because when
update_allowed_operations for the VDI, it is found that
ha_disable_in_progress is true.

Xapi_ha.disable_internal will also be called when HA is not enabled
successfully, in that case, ha_enable_in_progress is true.

This fix removes `ha_enable or `ha_disable from pool's current
operations to allow update_allowed_operations to enable destory
operation for the VDI.

Signed-off-by: Gang Ji <[email protected]>
@gangj
Copy link
Contributor Author

gangj commented Apr 14, 2025

Ring3 BST+BVT passed: 215961

Copy link
Contributor

@lindig lindig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is difficult to review; the change is small but there is a lot of context. When was the current operation previously removed?

@gangj
Copy link
Contributor Author

gangj commented Apr 14, 2025

This is difficult to review; the change is small but there is a lot of context. When was the current operation previously removed?

It will be removed "automatically" after the operation is done:

 850       let enable_ha ~__context ~heartbeat_srs ~configuration =
 851         info
 852           "Pool.enable_ha: pool = '%s'; heartbeat_srs = [ %s ]; configuration \
 853            = [ %s ]"
 854           (current_pool_uuid ~__context)
 855           (String.concat ", " (List.map Ref.string_of heartbeat_srs))
 856           (String.concat "; "
 857              (List.map (fun (k, v) -> k ^ "=" ^ v) configuration)
 858           ) ;
 859         let pool = Helpers.get_pool ~__context in
 860         Xapi_pool_helpers.with_pool_operation ~__context ~doc:"Pool.ha_enable"
 861           ~self:pool ~op:`ha_enable (fun () ->
 862             Local.Pool.enable_ha ~__context ~heartbeat_srs ~configuration
 863         )
 864
 865       let disable_ha ~__context =
 866         info "Pool.disable_ha: pool = '%s'" (current_pool_uuid ~__context) ;
 867         let pool = Helpers.get_pool ~__context in
 868         Xapi_pool_helpers.with_pool_operation ~__context ~doc:"Pool.ha_disable"
 869           ~self:pool ~op:`ha_disable (fun () -> Local.Pool.disable_ha ~__context
 870         )
 871
 872       let ha_prevent_restarts_for ~__context ~seconds =
"ocaml/xapi/message_forwarding.ml" 6662 lines --12%--  
220 (** Add to the Pool's current operations, call a function and then remove from the
221     current operations. Ensure the allowed_operations are kept up to date. *)
222 let with_pool_operation ~__context ~self ~doc ~op f =
223   let task_id = Ref.string_of (Context.get_task_id __context) in
224   Helpers.retry_with_global_lock ~__context ~doc (fun () ->
225       assert_operation_valid ~__context ~self ~op ;
226       Db.Pool.add_to_current_operations ~__context ~self ~key:task_id ~value:op
227   ) ;
228   update_allowed_operations ~__context ~self ;
229   (* Then do the action with the lock released *)
230   finally f (* Make sure to clean up at the end *) (fun () ->
231       try
232         Db.Pool.remove_from_current_operations ~__context ~self ~key:task_id ;
233         update_allowed_operations ~__context ~self ;
234         Helpers.Early_wakeup.broadcast
235           (Datamodel_common._pool, Ref.string_of self)
236       with _ -> ()
237   )
"ocaml/xapi/xapi_pool_helpers.ml" 326 lines --59%-- 

Which is a little late for us in this case.

Error (Api_errors.ha_is_enabled, [])
else if
List.mem record.Db_actions.vDI_type [`ha_statefile; `metadata]
List.mem record.Db_actions.vDI_type [`ha_statefile; `redo_log]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between metadata and redo_log here?

Copy link
Contributor Author

@gangj gangj Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix here don't require the understanding of the difference between the two types, the "metadata vdi for HA" is created of type `redo_type, not as its name suggests:

 24 let create ~__context ~sr =
 25   Helpers.call_api_functions ~__context (fun rpc session_id ->
 26       Client.VDI.create ~rpc ~session_id ~name_label:"Metadata for HA"
 27         ~name_description:"Used for master failover" ~sR:sr
 28         ~virtual_size:Redo_log.minimum_vdi_size ~_type:`redo_log ~sharable:true
 29         ~read_only:false ~other_config:[] ~xenstore_data:[]
 30         ~sm_config:Redo_log.redo_log_sm_config ~tags:[]
 31   )
 32
"ocaml/xapi/xha_metadata_vdi.ml" 89 lines --1%--    

@psafont
Copy link
Member

psafont commented Apr 16, 2025

Can you change the title of the second commit? It doesn't look like a fixup, but a genuine fix.

As xha_metadata_vdi is of type `redo_log.

This is why the destroy operation was not missing for the HA metadata VDI
after HA is disabled.

Signed-off-by: Gang Ji <[email protected]>
@gangj gangj force-pushed the private/gangj/CA-408230 branch from ebf9d9e to f19f381 Compare April 16, 2025 09:01
@gangj gangj added this pull request to the merge queue Apr 16, 2025
Merged via the queue into xapi-project:master with commit 48ae1a3 Apr 16, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants