Skip to content

server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed#4002

Merged
andrijapanicsb merged 1 commit intoapache:4.13from
ustcweizhou:4.13-fix-zone-wide-pool-while-firstfitleastconsume
Apr 6, 2020
Merged

server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed#4002
andrijapanicsb merged 1 commit intoapache:4.13from
ustcweizhou:4.13-fix-zone-wide-pool-while-firstfitleastconsume

Conversation

@ustcweizhou
Copy link
Copy Markdown
Contributor

Description

When vm.allocation.algorithm is set to 'firstfitleastconsumed', we are unable to create vm or volume if all storage pools are zone-wide (which have cluster_id=NULL)

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

  1. create zone with zone-wide storage pool
  2. deploy vm successfully.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@andrijapanicsb can you test this?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

will do, unless I forget..!

@DaanHoogland
Copy link
Copy Markdown
Contributor

@andrijapanicsb can you test this?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@andrijapanicsb a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos7 ✔debian. JID-1121

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@blueorangutan test

@andrijapanicsb andrijapanicsb added this to the 4.13.1.0 milestone Apr 3, 2020
@blueorangutan
Copy link
Copy Markdown

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@weizhouapache is this "if ALL pools are zone-wide" - i.e. in mixed cluster/zone-wide pools you did NOT observe the issue?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@andrijapanicsb
Copy link
Copy Markdown
Contributor

Single zone-wide - VM started fine
image
image
image

Added another zone-wide, and was able to spring VM as expected:
image
image

Have not reproduced before the fix - but the fix is working fine

LGTM

@andrijapanicsb andrijapanicsb self-requested a review April 3, 2020 19:53
@weizhouapache
Copy link
Copy Markdown
Member

Single zone-wide - VM started fine
image
image
image

Added another zone-wide, and was able to spring VM as expected:
image
image

Have not reproduced before the fix - but the fix is working fine

LGTM

@andrijapanicsb Have you restarted the management server after vm.allocation.algorithm is changed to firstfitleastconsumed (it is 'random' by default) ?

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1355)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31759 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4002-t1355-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_02_vpc_privategw_static_routes Failure 190.39 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 178.21 test_privategw_acl.py
test_04_rvpc_privategw_static_routes Failure 252.20 test_privategw_acl.py

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1358)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30179 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4002-t1358-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_02_vpc_privategw_static_routes Failure 177.50 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 174.52 test_privategw_acl.py
test_04_rvpc_privategw_static_routes Failure 244.24 test_privategw_acl.py

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@ustcweizhou @weizhouapache yes, of course

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one possible improvement but code looks good. let's merge if testing is good.


@Override
public List<Long> orderHostsByFreeCapacity(Long clusterId, short capacityTypeForOrdering){
public List<Long> orderHostsByFreeCapacity(Long zoneId, Long clusterId, short capacityTypeForOrdering){
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weizhouapache wouldn't a cluster imply a zone? we wouldn't have to pass a zone if we already know the cluster.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland if we search storage pools by clusterid, we would not get the list of zone-wide storage pools, because cluster_id is NULL.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, but if we have a cluster we can find the zone it is in and use that to find zone wide pools for it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear, I think this is aesthetic, not a potential bug.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland
please look at the code, there is only 'AND cluster_id = ' in the sql.
you can also execute the query in mysql

select * from op_host_capacity where capacity_type=3 and cluster_id=?? 

it lists only the cluster-wide storage pools.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i read the code. Either zone_id or cluster_id should be null. What is the use case of both being not null?
My point is; pass the cluster and find all storage pools for that cluster either cluster wide or for the zone the cluster is in.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland what if the storage pool is used for another zone ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we pass the cluster and deduce the zone from that, or if we deliberately pass another zone than the cluster we are looking to deploy in? I don't see how can it ever be another zone,except for a bad bug. Or do you mean a storage pool that is used for multiple zones? I have not seen that before.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaanHoogland as I said, if we pass the cluster id, it lists only cluster-wide storage pools.
To list zone-wide storage pools, we should pass the zoneId/data_center_id (and cluster_id should be NULL).

@weizhouapache
Copy link
Copy Markdown
Member

@andrijapanicsb if you cannot reproduce the issue, please do not merge this PR.

ps: I am still able to reproduce the issue. We had the issue in 4.7 as well.

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@weizhouapache - I have not TRIED to reproduce the issue - but I can if you find it needed?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

@weizhouapache which version should I test?

@weizhouapache
Copy link
Copy Markdown
Member

@weizhouapache which version should I test?

@andrijapanicsb 4.13 or 4.14/master

@nvazquez
Copy link
Copy Markdown
Contributor

nvazquez commented Apr 6, 2020

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos7 ✔debian. JID-1131

Copy link
Copy Markdown
Contributor

@nvazquez nvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested manually on KVM env with zone-wide storage pool and vm.allocation.algorithm = firstfitleastconsumed

@weizhouapache
Copy link
Copy Markdown
Member

LGTM
Tested manually on KVM env with zone-wide storage pool and vm.allocation.algorithm = firstfitleastconsumed

@nvazquez thanks for approval. Have you tested without this PR ?

@nvazquez
Copy link
Copy Markdown
Contributor

nvazquez commented Apr 6, 2020

@weizhouapache yes, tested in master and failed with this error:

2020-04-06 19:45:04,506 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Found a suitable host, adding to list: 1
2020-04-06 19:45:04,507 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Host Allocator returning 1 suitable hosts
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking suitable pools for volume (Id, Type): (129,ROOT)
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) We need to allocate new storagepool for this volume
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Calling StoragePoolAllocators to find suitable pools
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ClusterScopeStoragePoolAllocator looking for storage pool
2020-04-06 19:45:04,515 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Looking for pools in dc: 1  pod:1  cluster:1. Disabled pools will be ignored.
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found pools matching tags: []
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No storage pools available for shared volume allocation, returning
2020-04-06 19:45:04,518 DEBUG [o.a.c.s.a.ZoneWideStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ZoneWideStoragePoolAllocator to find storage pool
2020-04-06 19:45:04,521 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking if storage pool is suitable, name: null ,poolId: 2
2020-04-06 19:45:04,523 INFO  [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Storage pool null (2) does not supply IOPS capacity, assuming enough capacity
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool 2 for storage, totalSize: 2164369915904, usedBytes: 1493502525440, usedPct: 0.6900403274253623, disable threshold: 0.85
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destination pool id: 2
2020-04-06 19:45:04,551 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Pool ID for the volume with ID 129 is null
2020-04-06 19:45:04,554 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found storage pool ps of type NetworkFilesystem with over-provisioning factor 2
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total over-provisioned capacity calculated is 2 * 2164369915904
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total capacity of the pool ps with ID 2 is 4328739831808
2020-04-06 19:45:04,559 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool: 2 for storage allocation , maxSize : 4328739831808, totalAllocatedSize : 12784435712, askingSize : 8589934592, allocated disable threshold: 0.85
2020-04-06 19:45:04,563 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) List of pools in descending order of free capacity: []
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found for volume: Vol[129|vm=123|ROOT] under cluster: 1
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable storagePools found under this Cluster: 1
2020-04-06 19:45:04,567 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Could not find suitable Deployment Destination for this VM under any clusters, returning. 
2020-04-06 19:45:04,567 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Searching all possible resources under this Zone: 1
2020-04-06 19:45:04,568 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Zone: 1
2020-04-06 19:45:04,569 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Removing from the clusterId list these clusters from avoid set: [1]
2020-04-06 19:45:04,570 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2020-04-06 19:45:04,572 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destroying vm VM[User|i-2-123-VM] as it failed to create on Host with Id:null
2020-04-06 19:45:04,578 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2020-04-06 19:45:04,594 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,601 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = 8589934592
2020-04-06 19:45:04,612 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) AlertType:: 8 | dataCenterId:: 1 | podId:: null | clusterId:: null | message:: Failed to deploy Vm with Id: 123, on Host with Id: null
2020-04-06 19:45:04,619 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = user_vm count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,625 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = cpu count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,632 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = memory count for Account = 2 Operation = decreasing Amount = 512
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Unable to create a deployment for VM[User|i-2-123-VM]
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:225)
	at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:202)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4766)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4315)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4305)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy176.startVirtualMachine(Unknown Source)
	at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:635)
	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
	at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:586)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:534)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-06 19:45:04,663 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045) (logid:bc36d8d1) Complete async job-1045, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":533,"errortext":"Unable to create a deployment for VM[User|i-2-123-VM]"}

@weizhouapache
Copy link
Copy Markdown
Member

@weizhouapache yes, tested in master and failed with this error:

2020-04-06 19:45:04,506 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Found a suitable host, adding to list: 1
2020-04-06 19:45:04,507 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Host Allocator returning 1 suitable hosts
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking suitable pools for volume (Id, Type): (129,ROOT)
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) We need to allocate new storagepool for this volume
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Calling StoragePoolAllocators to find suitable pools
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ClusterScopeStoragePoolAllocator looking for storage pool
2020-04-06 19:45:04,515 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Looking for pools in dc: 1  pod:1  cluster:1. Disabled pools will be ignored.
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found pools matching tags: []
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No storage pools available for shared volume allocation, returning
2020-04-06 19:45:04,518 DEBUG [o.a.c.s.a.ZoneWideStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ZoneWideStoragePoolAllocator to find storage pool
2020-04-06 19:45:04,521 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking if storage pool is suitable, name: null ,poolId: 2
2020-04-06 19:45:04,523 INFO  [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Storage pool null (2) does not supply IOPS capacity, assuming enough capacity
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool 2 for storage, totalSize: 2164369915904, usedBytes: 1493502525440, usedPct: 0.6900403274253623, disable threshold: 0.85
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destination pool id: 2
2020-04-06 19:45:04,551 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Pool ID for the volume with ID 129 is null
2020-04-06 19:45:04,554 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found storage pool ps of type NetworkFilesystem with over-provisioning factor 2
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total over-provisioned capacity calculated is 2 * 2164369915904
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total capacity of the pool ps with ID 2 is 4328739831808
2020-04-06 19:45:04,559 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool: 2 for storage allocation , maxSize : 4328739831808, totalAllocatedSize : 12784435712, askingSize : 8589934592, allocated disable threshold: 0.85
2020-04-06 19:45:04,563 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) List of pools in descending order of free capacity: []
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found for volume: Vol[129|vm=123|ROOT] under cluster: 1
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable storagePools found under this Cluster: 1
2020-04-06 19:45:04,567 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Could not find suitable Deployment Destination for this VM under any clusters, returning. 
2020-04-06 19:45:04,567 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Searching all possible resources under this Zone: 1
2020-04-06 19:45:04,568 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Zone: 1
2020-04-06 19:45:04,569 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Removing from the clusterId list these clusters from avoid set: [1]
2020-04-06 19:45:04,570 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2020-04-06 19:45:04,572 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destroying vm VM[User|i-2-123-VM] as it failed to create on Host with Id:null
2020-04-06 19:45:04,578 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2020-04-06 19:45:04,594 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,601 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = 8589934592
2020-04-06 19:45:04,612 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) AlertType:: 8 | dataCenterId:: 1 | podId:: null | clusterId:: null | message:: Failed to deploy Vm with Id: 123, on Host with Id: null
2020-04-06 19:45:04,619 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = user_vm count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,625 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = cpu count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,632 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = memory count for Account = 2 Operation = decreasing Amount = 512
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Unable to create a deployment for VM[User|i-2-123-VM]
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:225)
	at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:202)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4766)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4315)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4305)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy176.startVirtualMachine(Unknown Source)
	at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:635)
	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
	at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:586)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:534)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-06 19:45:04,663 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045) (logid:bc36d8d1) Complete async job-1045, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":533,"errortext":"Unable to create a deployment for VM[User|i-2-123-VM]"}

@nvazquez good. We got same issue and it's fixed by this PR.

@andrijapanicsb
Copy link
Copy Markdown
Contributor

3 x LGTM, automated tests fine (3 known issues in 4,13, fixed in 4,14), manual tests fine

Merging

@andrijapanicsb andrijapanicsb merged commit 6bf92fb into apache:4.13 Apr 6, 2020
@weizhouapache
Copy link
Copy Markdown
Member

thanks @andrijapanicsb

DaanHoogland added a commit that referenced this pull request Apr 11, 2020
* 4.13:
  Snapshot deletion issues (#3969)
  server: Cannot list affinity group if there are hosts dedicated… (#4025)
  server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants