Skip to content

template: copy md5 mismatch#3383

Merged
yadvr merged 13 commits intoapache:4.13from
shapeblue:fix-copy-template-md5-mismatch-3191
Jan 31, 2020
Merged

template: copy md5 mismatch#3383
yadvr merged 13 commits intoapache:4.13from
shapeblue:fix-copy-template-md5-mismatch-3191

Conversation

@shwstppr
Copy link
Copy Markdown
Contributor

@shwstppr shwstppr commented Jun 7, 2019

Description

Fixes #3191

When a template is registered, code stores md5sum of the downloaded file in the vm_template table. However, this downloaded file could be deleted after template installation if it is not an actual (.qcow2, .ova, etc.) file. When the user copies a template using copyTemplate API, the actual template file will be copied across the image stores. Matching checksum for the copied templated file and the stored value from the vm_template table will result in a mismatch.
Changes will set an empty checksum value for the copied template while passing to download service which allows skipping wrong checksum check for the copied while install.
However, this results in a change in checksum value for concerned template entry in vm_template table post template install.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

template-md5-mismatch-1

template-md5-mismatch-2

How Has This Been Tested?

cmk

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Jun 10, 2019

Is this ready for testing @shwstppr or implementation in progress?

@shwstppr
Copy link
Copy Markdown
Contributor Author

@rhtyd I don't think the change/implementation is correct. I'm thinking of a better way. Will appreciate help here.
As mentioned in summary, when copying template code should not compare the checksum of the original download template file with the copied file. But if I make any changes here, https://github.com/apache/cloudstack/blob/master/services/secondary-storage/server/src/main/java/org/apache/cloudstack/storage/template/DownloadManagerImpl.java#L383 it would affect normal template registration/download as well.

@yadvr yadvr modified the milestones: 4.14.0.0, 4.13.0.0 Jun 20, 2019
@yadvr yadvr modified the milestones: 4.13.0.0, 4.13.1.0 Jul 4, 2019
@andrijapanicsb
Copy link
Copy Markdown
Contributor

ping if any updates here.

@shwstppr
Copy link
Copy Markdown
Contributor Author

@andrijapanicsb will look into it next week

@DaanHoogland
Copy link
Copy Markdown
Contributor

DaanHoogland commented Nov 25, 2019

@shwstppr there are two different checksums indeed.

  • One for the file to be downloaded to compare against its origin
  • One for checking internal copying

the original design has not taken this into account.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@andrijapanicsb, I discussed with @shwstppr and this simple fix seems to work but does not take into account any special requirements on the design flaws as described above. any opinions? /cc @PaulAngus

@DaanHoogland DaanHoogland self-assigned this Dec 5, 2019
@andrijapanicsb
Copy link
Copy Markdown
Contributor

As discussed (internally) this fix simply set the source image md5sum to NULL (instead of reading a wrong md5sum from DB, from the compressed file) and thus when SSVM receives the files in the new zone, since no source MD5 sum was defined, it will not check and will consider an image as downloaded successfully - which should be fine in 99% of cases.

I would like to test a silly thing - if the download of template (via the SSVM in the destination zone) is somehow interrupted - do we ever reach the MD5 "to check or not to check" part of code - ideally the download process should break and no false positive in sense of "successful" download.

This can be tested while copying a template from zone1 to zone2, by logging in to zone1 SSVM and simply stopping the Apache webserver - simulating connection drop - and checking/confirm that on the destination zone the template download has broken as exected.

@weizhouapache ping for opinion pls CC @rhtyd

@yadvr yadvr modified the milestones: 4.13.1.0, 4.14.0.0 Dec 7, 2019
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Dec 7, 2019

Is this ready or needs further update - @shwstppr @andrijapanicsb @PaulAngus @borisstoyanov ?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

needs testing as I suggested, unless someone has other opinions as well

Copy link
Copy Markdown
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of -1 on this fix, simply because we're immediately cutting a functionality to the end user. I think we should add an explicit checksum parameter to the copy template api if the user wants to do that check (default = null). if checksum passed is null then yes skip it, but if a sum is passed then do the check.

@shwstppr
Copy link
Copy Markdown
Contributor Author

I'm not sure if adding checksum parameter would add any functionality to the user or solve the problem here. As mentioned above copy and create template same code for downloading and checking checksum. For correctly checking checksum while copying a template code should compare the checksum of the copied file and the installed template file and not the checksum of copied file with the original checksum of the template.
Adding a checksum parameter, I'm not sure if user will ever input the checksum of template's installed file. And if user inputs original checksum of template copy will fail. Making it irrelevant IMO.
We should rather refactor underlying code (https://github.com/apache/cloudstack/blob/master/engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/TemplateServiceImpl.java#L1026-L1075) to pass checksum string for the installed template file, adding a flag in DownloadCmd and DownloadAnswer classes to identify if we are copying so it doesn't change checksum value in DB later(https://github.com/apache/cloudstack/blob/master/engine/storage/src/main/java/org/apache/cloudstack/storage/image/BaseImageStoreDriverImpl.java#L247), after successful completion.
I'll try to do it over the weekend. Will update if I'm not able to.
cc @DaanHoogland

@DaanHoogland
Copy link
Copy Markdown
Contributor

I agree with @borisstoyanov that having a checking mechanism is useful functionality, but not that we are cutting that with this PR. It is not there and would be new functionality. The proposal is sensible, but the problem is that the user would not know the checksum to input, unless they are an admin with access to the source store and manually create it, @shwstppr. Making your solution an automation of the process that would happen in @borisstoyanov 's solution. If you can make it happen, please. Else, we are half way there with the extra parameter.

@shwstppr shwstppr force-pushed the fix-copy-template-md5-mismatch-3191 branch from 63e538b to 6b0ae7d Compare December 13, 2019 08:17
@andrijapanicsb
Copy link
Copy Markdown
Contributor

@shwstppr @borisstoyanov /cc @DaanHoogland where are we with this one?
i.e. we don't even expose MD5 option to users when downloading the template from the internet - sharing this just for the sake of the feature comparison

@andrijapanicsb
Copy link
Copy Markdown
Contributor

I'm also OK if we move forward withOUT MD5 sum now for 4.14, and create an additional issue for 4.15 to add MD5 checksum ?

@DaanHoogland
Copy link
Copy Markdown
Contributor

@andrijapanicsb It is a bug, i think we want it on 4.13

@andrijapanicsb
Copy link
Copy Markdown
Contributor

And what about backward "compatibility".
If adding i.e. a new storage, SSVNM will try to download template from the original URL - does md5 sum gets checked and if so, will it fail for older templates created using md5sum if you use sha512?
i. e. will it break anything?

@DaanHoogland
Copy link
Copy Markdown
Contributor

no, in theory ;) if the checksum stored is an md5 a check will be done against that.

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-799)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30187 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3383-t799-kvm-centos7.zip
Smoke tests completed. 77 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-800)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35798 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3383-t800-vmware-65u2.zip
Smoke tests completed. 77 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-796)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40046 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3383-t796-xenserver-71.zip
Intermittent failure detected: /marvin/tests/smoke/test_scale_vm.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_scale_vm Failure 28.68 test_scale_vm.py

@yadvr yadvr closed this Jan 28, 2020
@yadvr yadvr reopened this Jan 28, 2020
@yadvr yadvr closed this Jan 29, 2020
@yadvr yadvr reopened this Jan 29, 2020
Copy link
Copy Markdown
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, manually verified we don't see the checksum error on copy. The other case where having a crosszone template and adding a new zone is not really a valid case since we can't add a checksum and leave blank zone at the same time. I think it's good for merging.

@DaanHoogland
Copy link
Copy Markdown
Contributor

DaanHoogland commented Jan 30, 2020

The job exceeded the maximum log length, and has been terminated. not related to the code. we might want to lower the travis log levels?

@DaanHoogland DaanHoogland requested a review from yadvr January 30, 2020 09:12
@DaanHoogland
Copy link
Copy Markdown
Contributor

ping @shwstppr @rhtyd @andrijapanicsb @nvazquez @PaulAngus , alright like this? I need a second ok, after @borisstoyanov 's testing

@borisstoyanov
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-717

@borisstoyanov
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-859)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32228 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3383-t859-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Error 205.36 test_internal_lb.py

@apache apache deleted a comment from borisstoyanov Jan 31, 2020
@DaanHoogland
Copy link
Copy Markdown
Contributor

@rhtyd good to go?

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Jan 31, 2020

LGTM, checking travis. Will merge if it's ok.

@yadvr yadvr merged commit 9d105b6 into apache:4.13 Jan 31, 2020
ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Feb 28, 2020
Fixes apache#3191

When a template is registered, code stores md5sum of the downloaded file in the vm_template table. However, this downloaded file could be deleted after template installation if it is not an actual (.qcow2, .ova, etc.) file. When the user copies a template using copyTemplate API, the actual template file will be copied across the image stores. Matching checksum for the copied templated file and the stored value from the vm_template table will result in a mismatch.
Changes will set an empty checksum value for the copied template while passing to download service which allows skipping wrong checksum check for the copied while install.
However, this results in a change in checksum value for concerned template entry in vm_template table post template install.

Co-authored-by: dahn <daan.hoogland@gmail.com>
@DaanHoogland DaanHoogland removed their assignment Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants