kvm: suspend/resume in deleting vm snapshot on kvm #4033
DaanHoogland merged 1 commit into apache:4.13
Conversation
|
@weizhouapache being Wei Zhou, as usual... :) thanks! Shall we try to squeeze this in master only (4.13 is already out for voting) - or shall we craft another 4.13 RC2 due to this? (I don't see a problem there) - this seems to be a serious enough issue/fix to warrant RC2? (and simple enough change to NOT ask for a serious full-blown retesting of everything) ? /cc @DaanHoogland @rhtyd |
|
@blueorangutan package |
|
@andrijapanicsb a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔centos7 ✔debian. JID-1171 |
|
@blueorangutan test |
|
@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Good boy, ape.... |
|
Would expect to have it in 4.13 RC2 given its potential but serious impact. |
|
Trillian test result (tid-1420)
|
...in/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtDeleteVMSnapshotCommandWrapper.java
DaanHoogland
left a comment
Code looks good but for one scenario: when the VM was suspended and the user wants to delete a snapshot without starting the VM up again. I'm not sure if this can happen. Can we be sure it can't?
|
Not sure I understand your question, @DaanHoogland? |
|
@andrijapanicsb @DaanHoogland this also LGTM, smaller than the other two. The code to suspend is okay, but a check may be preferable. @blueorangutan package |
|
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
andrijapanicsb
left a comment
LGTM
Tested manually and it works as expected - VM gets suspended very briefly.
NOTE: VM does get resumed unconditionally - I'm fine with that (if someone suspends the VM manually, outside of the ACS, then ACS resumes his VM - "sucks to be you") :)
|
Packaging result: ✔centos7 ✔debian. JID-1173 |
|
2 x LGTMs/Approvals, manual testing done fine, regression tests are fine. Ready for merge @DaanHoogland |
|
@weizhouapache ... should we also suspend the VM while we delete the VM snapshot right after backing up the volume snapshot to Secondary Storage, below:

The code in this PR (#4033) is executed when we delete a VM snapshot that we manually created previously (i.e. proper resume during creation and deletion of the VM snapshot) - all good here.

But when kvm.snapshots.enabled=true and you snapshot just a volume of a running VM, I could see that the whole VM is paused, a whole VM snapshot is taken (all volumes and RAM), the VM is resumed, the single volume snapshot is then exported (from the qcow2) via qemu-img to Secondary Storage, and finally the original VM snapshot is deleted. During this VM snapshot deletion I could not confirm that the VM is ALWAYS paused/suspended: there are never any log messages, and I could see it being paused most of the time when a DATA volume snapshot is deleted, but not when ROOT volumes are snapshotted. In other words, it looks like the code of this PR is NOT executed, but some other code is (what I shared above, I guess?), since I can't see, for example, this logging from this PR: s_logger.debug("Suspending domain " + vmName);

The only thing I would like to see verified is that the VM is always suspended (like in this PR) when we CREATE a VOLUME SNAPSHOT, because a whole VM snapshot is taken and deleted when we take just a volume snapshot (running VM, with kvm.snapshots.enabled=true). Could you please advise? |
Description
To avoid qcow2 image corruption, we'd better suspend the VM when deleting a VM snapshot, and resume it once the VM snapshot is removed.
Fixes: #3193
related to: #3194 #4029
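For readers following the thread, here is a minimal sketch of the suspend, delete, resume pattern described above. It is not the code from this PR: `VmDomain`, `FakeDomain`, `SnapshotDeletion` and `deleteWithSuspend` are hypothetical names made up for illustration, and the in-memory fake stands in for a real libvirt domain so the example runs without a hypervisor. Unlike the unconditional resume reported in the review, this variant assumes we only want to resume a VM that this call itself suspended, which is one possible answer to the "already suspended VM" question raised in the conversation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a hypervisor domain handle; NOT the real libvirt-java API.
interface VmDomain {
    boolean isPaused();
    void suspend();
    void resume();
    void deleteSnapshot(String snapshotName);
}

// In-memory fake that records the order of calls, so the sketch runs without a hypervisor.
class FakeDomain implements VmDomain {
    private boolean paused;
    final List<String> calls = new ArrayList<>();

    FakeDomain(boolean initiallyPaused) {
        this.paused = initiallyPaused;
    }

    public boolean isPaused() { return paused; }
    public void suspend() { paused = true; calls.add("suspend"); }
    public void resume() { paused = false; calls.add("resume"); }
    public void deleteSnapshot(String snapshotName) { calls.add("delete:" + snapshotName); }
}

class SnapshotDeletion {
    // Suspend the VM (if it is not already paused), delete the snapshot,
    // and resume only if this method did the suspending.
    static void deleteWithSuspend(VmDomain domain, String snapshotName) {
        boolean suspendedHere = false;
        if (!domain.isPaused()) {
            domain.suspend();
            suspendedHere = true;
        }
        try {
            domain.deleteSnapshot(snapshotName);
        } finally {
            // Runs even if the deletion throws, so the VM is not left paused.
            if (suspendedHere) {
                domain.resume();
            }
        }
    }
}

class SuspendResumeSketch {
    public static void main(String[] args) {
        FakeDomain running = new FakeDomain(false);
        SnapshotDeletion.deleteWithSuspend(running, "snap1");
        System.out.println(running.calls); // prints [suspend, delete:snap1, resume]
    }
}
```

The `try`/`finally` is the main design point: if the snapshot deletion fails, the VM is still resumed rather than staying suspended. Whether a VM that was paused outside of ACS should be left paused is a judgment call that the thread leaves open.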
Types of changes
Screenshots (if appropriate):
How Has This Been Tested?