RvR: Set up metadata/password/dhcp server on gateway IP instead of guest IP in RVR #3477

Merged
yadvr merged 4 commits into apache:4.13 from ustcweizhou:4.11-rvr-services-on-gw
Jan 28, 2020

Conversation

@ustcweizhou
Contributor

Description

When we create a VM in a network with redundant VRs, the lease file in the VM (for example /var/lib/dhcp/dhclient.eth0.leases) shows that the dhcp-server-identifier is the guest IP (not the VIP/gateway) of the master VR. That is the IP address from which the VM fetches its password and metadata.
If we stop the master VR (so the backup becomes master) or restart the network with cleanup (so the VRs are recreated), the guest IP of the master VR changes, and VMs are no longer able to get metadata/ssh-key using the IP in the DHCP lease file.

Setting up metadata/password/dhcp server on gateway instead of guest IP in redundant VRs will fix the issues.
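
To check which address a guest is currently pinned to, the lease file can be inspected. The following is a minimal sketch, assuming the usual ISC dhclient lease syntax (`option dhcp-server-identifier <ip>;`). The sample lease text, its addresses, and the function names are made up for illustration and are not part of this PR.

```python
import re

# Hypothetical lease content; the addresses are illustrative only.
SAMPLE_LEASE = """\
lease {
  interface "eth0";
  fixed-address 10.1.1.50;
  option routers 10.1.1.1;
  option dhcp-server-identifier 10.1.1.203;
}
"""


def server_identifiers(lease_text):
    """Return every dhcp-server-identifier found, in file order."""
    return re.findall(
        r"option\s+dhcp-server-identifier\s+([0-9.]+);", lease_text)


def latest_server_identifier(lease_text):
    """Return the identifier of the last lease block, or None if absent."""
    ids = server_identifiers(lease_text)
    return ids[-1] if ids else None


if __name__ == "__main__":
    # In this sample the identifier (10.1.1.203) differs from the
    # gateway (10.1.1.1), which is the situation described above.
    print(latest_server_identifier(SAMPLE_LEASE))
```

On a real guest the text would be read from the lease file named above. If the reported address is a VR's guest IP rather than the gateway, the VM is exposed to the failover problem this PR addresses.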

Fixes: #3409

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

@ustcweizhou
Contributor Author

@rhtyd Here is the PR for issue #3409
I will not be able to respond to your comments in time because I will be on holiday in the coming weeks.

@yadvr yadvr added this to the 4.13.0.0 milestone Jul 9, 2019
@yadvr
Member

yadvr commented Jul 9, 2019

Thanks @ustcweizhou, I'll help review and test. If you're unavailable, I may extend the PR and address any review comments myself.

@@ -1,4 +1,4 @@
-<VirtualHost 10.1.1.1:80>
+<VirtualHost 10.1.1.1:8180>
Member

Why change the port?

Contributor Author

These lines will be replaced with the gateway IP and guest IP in CsApp.py.
If we use :80 and :443 in this template, the first run is fine, but after that the configuration file gets messed up.
If we use :8180 and :8443, the configuration file is changed only once.
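
One way to read the point about placeholder ports is sketched below. It is a hypothetical illustration, not the code in CsApp.py: the function names, the regular expression, the two-address output line, and the guest IP 10.1.1.2 are all assumptions. It only shows that a placeholder port which never appears in the rendered output stops matching after the first run, while a pattern on the real port keeps matching on every run. It does not reproduce the exact corruption mentioned above.

```python
import re

# Hypothetical template content; only the <VirtualHost ...> line matters here.
TEMPLATE_8180 = "<VirtualHost 10.1.1.1:8180>\n</VirtualHost>\n"
TEMPLATE_80 = "<VirtualHost 10.1.1.1:80>\n</VirtualHost>\n"


def rewrite_vhost(text, placeholder_port, real_port, gateway_ip, guest_ip):
    """Rewrite placeholder <VirtualHost> lines.

    Returns (new_text, number_of_substitutions).
    """
    pattern = r"<VirtualHost [^>]*:%d>" % placeholder_port
    replacement = "<VirtualHost %s:%d %s:%d>" % (
        gateway_ip, real_port, guest_ip, real_port)
    return re.subn(pattern, replacement, text)


def count_rewrites(template, placeholder_port, runs=3):
    """Apply the rewrite repeatedly; report substitutions made on each run."""
    text, counts = template, []
    for _ in range(runs):
        text, n = rewrite_vhost(
            text, placeholder_port, 80, "10.1.1.1", "10.1.1.2")
        counts.append(n)
    return counts


if __name__ == "__main__":
    print(count_rewrites(TEMPLATE_8180, 8180))  # [1, 0, 0]
    print(count_rewrites(TEMPLATE_80, 80))      # [1, 1, 1]
```

With the :8180 placeholder the rewrite is a no-op after the first run, which matches the "changed only once" behaviour described in the comment.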

Member

@ustcweizhou Thanks for explaining. I'll test it.


 <IfModule mod_ssl.c>
-<VirtualHost 10.1.1.1:443>
+<VirtualHost 10.1.1.1:8443>
Member

Same as above - any reason to change the port or did it come from your internal branch?

@yadvr
Member

yadvr commented Jul 9, 2019

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-105

@yadvr
Member

yadvr commented Jul 9, 2019

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@ustcweizhou ustcweizhou closed this Jul 9, 2019
@ustcweizhou ustcweizhou reopened this Jul 9, 2019
@yadvr
Member

yadvr commented Jul 10, 2019

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr yadvr changed the base branch from 4.11 to master July 10, 2019 15:37
@yadvr
Member

yadvr commented Jul 10, 2019

Looks like on 4.11 branch there is some issue, I'll kick tests against master.
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-111

@yadvr
Member

yadvr commented Jul 10, 2019

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-139)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41924 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t139-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 67 look OK, 5 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_04_rvpc_internallb_haproxy_stats_on_all_interfaces | Error | 199.56 | test_internal_lb.py |
| ContextSuite context=TestTemplateHierarchy>:setup | Error | 1521.01 | test_accounts.py |
| test_04_extract_Iso | Failure | 1.09 | test_iso.py |
| test_04_extract_template | Failure | 1.11 | test_templates.py |
| test_06_download_detached_volume | Failure | 11.49 | test_volumes.py |

@yadvr
Member

yadvr commented Jul 11, 2019

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@apache apache deleted a comment from blueorangutan Jul 11, 2019
@yadvr
Member

yadvr commented Jul 11, 2019

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-149)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40829 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t149-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 67 look OK, 5 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| ContextSuite context=TestTemplateHierarchy>:setup | Error | 1521.43 | test_accounts.py |
| test_04_extract_Iso | Failure | 1.13 | test_iso.py |
| test_04_extract_template | Failure | 1.09 | test_templates.py |
| test_06_download_detached_volume | Failure | 10.49 | test_volumes.py |
| test_05_rvpc_multi_tiers | Failure | 409.02 | test_vpc_redundant.py |
| test_05_rvpc_multi_tiers | Error | 437.98 | test_vpc_redundant.py |

@yadvr
Member

yadvr commented Jul 12, 2019

@blueorangutan package

@blueorangutan

Trillian test result (tid-425)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 53497 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t425-vmware-65u2.zip
Intermittent failure detected: /marvin/tests/smoke/test_accounts.py
Intermittent failure detected: /marvin/tests/smoke/test_iso.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Intermittent failure detected: /marvin/tests/smoke/test_volumes.py
Smoke tests completed. 73 look OK, 4 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| ContextSuite context=TestTemplateHierarchy>:setup | Error | 1517.79 | test_accounts.py |
| test_04_extract_Iso | Failure | 1.06 | test_iso.py |
| test_04_extract_template | Failure | 74.53 | test_templates.py |
| test_06_download_detached_volume | Failure | 89.92 | test_volumes.py |

@ustcweizhou
Contributor Author

@rhtyd I will look into the failures.
It seems the SSVM is broken by this PR.

@ustcweizhou
Contributor Author

@rhtyd
I have pushed a new commit to fix the SSVM.
Could you please kick off another test? Thanks.

@DennisKonrad
Contributor

DennisKonrad commented Nov 7, 2019

Hi @ustcweizhou,

Do you think this will solve #3179?
We have this problem on current master with KVM+OvS and a redundant VPC offering, where the wrong dev num is chosen.

I suspect the changes in this PR will at least change the behaviour of VPC also.

@ustcweizhou
Contributor Author

@DennisKonrad unfortunately, I do not think this PR will help fix the issue you mentioned.

@DaanHoogland DaanHoogland reopened this Jan 3, 2020
@andrijapanicsb
Contributor

@DaanHoogland all 3 envs failed, I wiped them to regain some resources on Trillian

@DaanHoogland
Contributor

build failures again "TASK [Remove previous SSH key from Project if it exists] ". Not sure if this is related to the PR, but looks like it.

@yadvr
Member

yadvr commented Jan 6, 2020

Trillian test failed, needs re-run @DaanHoogland cc @andrijapanicsb
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-536

@andrijapanicsb
Contributor

@blueorangutan test matrix

@blueorangutan

@andrijapanicsb a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@apache apache deleted 7 comments from blueorangutan Jan 6, 2020
@blueorangutan

Trillian test result (tid-704)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42566 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t704-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_02_vpc_privategw_static_routes | Failure | 267.01 | test_privategw_acl.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 267.22 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 412.07 | test_privategw_acl.py |

@blueorangutan

Trillian test result (tid-703)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47090 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t703-xenserver-71.zip
Intermittent failure detected: /marvin/tests/smoke/test_scale_vm.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_scale_vm | Failure | 35.02 | test_scale_vm.py |

@blueorangutan

Trillian test result (tid-705)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 57883 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3477-t705-vmware-65u2.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_vm_root_resize.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_00_deploy_vm_root_resize | Failure | 404.93 | test_deploy_vm_root_resize.py |

@DaanHoogland
Contributor

@rhtyd @andrijapanicsb different errors in different environments above. As this had been reviewed and tested before, do we spend more time investigating?

@DaanHoogland
Contributor

ping @rhtyd @andrijapanicsb ??

@yadvr yadvr merged commit ff1c6e7 into apache:4.13 Jan 28, 2020
ustcweizhou added a commit to ustcweizhou/cloudstack that referenced this pull request Feb 28, 2020
… guest IP in RVR (apache#3477)
