server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed by ustcweizhou · Pull Request #4002 · apache/cloudstack

ustcweizhou · 2020-03-30T21:06:06Z

Description

When vm.allocation.algorithm is set to 'firstfitleastconsumed', we are unable to create vm or volume if all storage pools are zone-wide (which have cluster_id=NULL)

Types of changes

Breaking change (fix or feature that would cause existing functionality to change)
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Enhancement (improves an existing feature and functionality)
Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

create zone with zone-wide storage pool
deploy vm successfully.

…rstfitleastconsumed

DaanHoogland · 2020-04-01T20:33:34Z

@andrijapanicsb can you test this?

andrijapanicsb · 2020-04-01T21:15:46Z

will do, unless I forget..!

DaanHoogland · 2020-04-03T11:25:35Z

@andrijapanicsb can you test this?

andrijapanicsb · 2020-04-03T18:08:44Z

@blueorangutan package

blueorangutan · 2020-04-03T18:09:09Z

@andrijapanicsb a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2020-04-03T18:27:25Z

Packaging result: ✔centos7 ✔debian. JID-1121

andrijapanicsb · 2020-04-03T18:41:06Z

@blueorangutan test

blueorangutan · 2020-04-03T18:42:09Z

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

andrijapanicsb · 2020-04-03T19:30:29Z

@weizhouapache is this "if ALL pools are zone-wide" - i.e. in mixed cluster/zone-wide pools you did NOT observe the issue?

andrijapanicsb · 2020-04-03T19:37:56Z

@blueorangutan test

blueorangutan · 2020-04-03T19:39:09Z

@andrijapanicsb a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

andrijapanicsb · 2020-04-03T19:53:12Z

Single zone-wide - VM started fine

Added another zone-wide, and was able to spring VM as expected:

Have not reproduced before the fix - but the fix is working fine

LGTM

weizhouapache · 2020-04-03T21:02:01Z

Single zone-wide - VM started fine

Added another zone-wide, and was able to spring VM as expected:

Have not reproduced before the fix - but the fix is working fine

LGTM

@andrijapanicsb Have you restarted the management server after vm.allocation.algorithm is changed to firstfitleastconsumed (it is 'random' by default) ?

blueorangutan · 2020-04-04T04:03:10Z

Trillian test result (tid-1355)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31759 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4002-t1355-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test	Result	Time (s)	Test File
test_02_vpc_privategw_static_routes	`Failure`	190.39	test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup	`Failure`	178.21	test_privategw_acl.py
test_04_rvpc_privategw_static_routes	`Failure`	252.20	test_privategw_acl.py

blueorangutan · 2020-04-04T04:35:07Z

Trillian test result (tid-1358)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30179 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4002-t1358-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test	Result	Time (s)	Test File
test_02_vpc_privategw_static_routes	`Failure`	177.50	test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup	`Failure`	174.52	test_privategw_acl.py
test_04_rvpc_privategw_static_routes	`Failure`	244.24	test_privategw_acl.py

andrijapanicsb · 2020-04-04T12:15:33Z

@ustcweizhou @weizhouapache yes, of course

DaanHoogland

one possible improvement but code looks good. let's merge if testing is good.

DaanHoogland · 2020-04-06T07:46:42Z

engine/schema/src/main/java/com/cloud/capacity/dao/CapacityDaoImpl.java


    @Override
-    public List<Long> orderHostsByFreeCapacity(Long clusterId, short capacityTypeForOrdering){
+    public List<Long> orderHostsByFreeCapacity(Long zoneId, Long clusterId, short capacityTypeForOrdering){


@weizhouapache wouldn't a cluster imply a zone? we wouldn't have to pass a zone if we already know the cluster.

@DaanHoogland if we search storage pools by clusterid, we would not get the list of zone-wide storage pools, because cluster_id is NULL.

makes sense, but if we have a cluster we can find the zone it is in and use that to find zone wide pools for it?

to be clear, I think this is aesthetic, not a potential bug.

@DaanHoogland
please look at the code, there is only 'AND cluster_id = ' in the sql.
you can also execute the query in mysql

select * from op_host_capacity where capacity_type=3 and cluster_id=??

it lists only the cluster-wide storage pools.

yes, i read the code. Either zone_id or cluster_id should be null. What is the use case of both being not null?
My point is; pass the cluster and find all storage pools for that cluster either cluster wide or for the zone the cluster is in.

@DaanHoogland what if the storage pool is used for another zone ?

if we pass the cluster and deduce the zone from that, or if we deliberately pass another zone than the cluster we are looking to deploy in? I don't see how can it ever be another zone,except for a bad bug. Or do you mean a storage pool that is used for multiple zones? I have not seen that before.

@DaanHoogland as I said, if we pass the cluster id, it lists only cluster-wide storage pools.
To list zone-wide storage pools, we should pass the zoneId/data_center_id (and cluster_id should be NULL).

weizhouapache · 2020-04-06T09:30:57Z

@andrijapanicsb if you cannot reproduce the issue, please do not merge this PR.

ps: I am still able to reproduce the issue. We had the issue in 4.7 as well.

andrijapanicsb · 2020-04-06T09:41:40Z

@weizhouapache - I have not TRIED to reproduce the issue - but I can if you find it needed?

andrijapanicsb · 2020-04-06T10:03:11Z

@weizhouapache which version should I test?

weizhouapache · 2020-04-06T11:15:59Z

@weizhouapache which version should I test?

@andrijapanicsb 4.13 or 4.14/master

nvazquez · 2020-04-06T13:46:47Z

@blueorangutan package

blueorangutan · 2020-04-06T13:47:22Z

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2020-04-06T14:05:03Z

Packaging result: ✔centos7 ✔debian. JID-1131

nvazquez

LGTM
Tested manually on KVM env with zone-wide storage pool and vm.allocation.algorithm = firstfitleastconsumed

weizhouapache · 2020-04-06T19:16:24Z

LGTM
Tested manually on KVM env with zone-wide storage pool and vm.allocation.algorithm = firstfitleastconsumed

@nvazquez thanks for approval. Have you tested without this PR ?

nvazquez · 2020-04-06T19:48:57Z

@weizhouapache yes, tested in master and failed with this error:

2020-04-06 19:45:04,506 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Found a suitable host, adding to list: 1
2020-04-06 19:45:04,507 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Host Allocator returning 1 suitable hosts
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking suitable pools for volume (Id, Type): (129,ROOT)
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) We need to allocate new storagepool for this volume
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Calling StoragePoolAllocators to find suitable pools
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ClusterScopeStoragePoolAllocator looking for storage pool
2020-04-06 19:45:04,515 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Looking for pools in dc: 1  pod:1  cluster:1. Disabled pools will be ignored.
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found pools matching tags: []
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No storage pools available for shared volume allocation, returning
2020-04-06 19:45:04,518 DEBUG [o.a.c.s.a.ZoneWideStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ZoneWideStoragePoolAllocator to find storage pool
2020-04-06 19:45:04,521 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking if storage pool is suitable, name: null ,poolId: 2
2020-04-06 19:45:04,523 INFO  [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Storage pool null (2) does not supply IOPS capacity, assuming enough capacity
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool 2 for storage, totalSize: 2164369915904, usedBytes: 1493502525440, usedPct: 0.6900403274253623, disable threshold: 0.85
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destination pool id: 2
2020-04-06 19:45:04,551 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Pool ID for the volume with ID 129 is null
2020-04-06 19:45:04,554 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found storage pool ps of type NetworkFilesystem with over-provisioning factor 2
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total over-provisioned capacity calculated is 2 * 2164369915904
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total capacity of the pool ps with ID 2 is 4328739831808
2020-04-06 19:45:04,559 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool: 2 for storage allocation , maxSize : 4328739831808, totalAllocatedSize : 12784435712, askingSize : 8589934592, allocated disable threshold: 0.85
2020-04-06 19:45:04,563 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) List of pools in descending order of free capacity: []
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found for volume: Vol[129|vm=123|ROOT] under cluster: 1
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable storagePools found under this Cluster: 1
2020-04-06 19:45:04,567 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Could not find suitable Deployment Destination for this VM under any clusters, returning. 
2020-04-06 19:45:04,567 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Searching all possible resources under this Zone: 1
2020-04-06 19:45:04,568 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Zone: 1
2020-04-06 19:45:04,569 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Removing from the clusterId list these clusters from avoid set: [1]
2020-04-06 19:45:04,570 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2020-04-06 19:45:04,572 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destroying vm VM[User|i-2-123-VM] as it failed to create on Host with Id:null
2020-04-06 19:45:04,578 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2020-04-06 19:45:04,594 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,601 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = 8589934592
2020-04-06 19:45:04,612 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) AlertType:: 8 | dataCenterId:: 1 | podId:: null | clusterId:: null | message:: Failed to deploy Vm with Id: 123, on Host with Id: null
2020-04-06 19:45:04,619 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = user_vm count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,625 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = cpu count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,632 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = memory count for Account = 2 Operation = decreasing Amount = 512
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Unable to create a deployment for VM[User|i-2-123-VM]
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:225)
	at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:202)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4766)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4315)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4305)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy176.startVirtualMachine(Unknown Source)
	at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:635)
	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
	at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:586)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:534)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-06 19:45:04,663 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045) (logid:bc36d8d1) Complete async job-1045, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":533,"errortext":"Unable to create a deployment for VM[User|i-2-123-VM]"}

weizhouapache · 2020-04-06T19:59:24Z

@weizhouapache yes, tested in master and failed with this error:

2020-04-06 19:45:04,506 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Found a suitable host, adding to list: 1
2020-04-06 19:45:04,507 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7 FirstFitRoutingAllocator) (logid:bc36d8d1) Host Allocator returning 1 suitable hosts
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking suitable pools for volume (Id, Type): (129,ROOT)
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) We need to allocate new storagepool for this volume
2020-04-06 19:45:04,510 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Calling StoragePoolAllocators to find suitable pools
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.LocalStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) LocalStoragePoolAllocator trying to find storage pool to fit the vm
2020-04-06 19:45:04,511 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ClusterScopeStoragePoolAllocator looking for storage pool
2020-04-06 19:45:04,515 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Looking for pools in dc: 1  pod:1  cluster:1. Disabled pools will be ignored.
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found pools matching tags: []
2020-04-06 19:45:04,517 DEBUG [o.a.c.s.a.ClusterScopeStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No storage pools available for shared volume allocation, returning
2020-04-06 19:45:04,518 DEBUG [o.a.c.s.a.ZoneWideStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) ZoneWideStoragePoolAllocator to find storage pool
2020-04-06 19:45:04,521 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking if storage pool is suitable, name: null ,poolId: 2
2020-04-06 19:45:04,523 INFO  [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Storage pool null (2) does not supply IOPS capacity, assuming enough capacity
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool 2 for storage, totalSize: 2164369915904, usedBytes: 1493502525440, usedPct: 0.6900403274253623, disable threshold: 0.85
2020-04-06 19:45:04,531 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destination pool id: 2
2020-04-06 19:45:04,551 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Pool ID for the volume with ID 129 is null
2020-04-06 19:45:04,554 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Found storage pool ps of type NetworkFilesystem with over-provisioning factor 2
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total over-provisioned capacity calculated is 2 * 2164369915904
2020-04-06 19:45:04,555 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Total capacity of the pool ps with ID 2 is 4328739831808
2020-04-06 19:45:04,559 DEBUG [c.c.s.StorageManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Checking pool: 2 for storage allocation , maxSize : 4328739831808, totalAllocatedSize : 12784435712, askingSize : 8589934592, allocated disable threshold: 0.85
2020-04-06 19:45:04,563 DEBUG [o.a.c.s.a.AbstractStoragePoolAllocator] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) List of pools in descending order of free capacity: []
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found for volume: Vol[129|vm=123|ROOT] under cluster: 1
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable pools found
2020-04-06 19:45:04,563 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No suitable storagePools found under this Cluster: 1
2020-04-06 19:45:04,567 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Could not find suitable Deployment Destination for this VM under any clusters, returning. 
2020-04-06 19:45:04,567 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Searching all possible resources under this Zone: 1
2020-04-06 19:45:04,568 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Listing clusters in order of aggregate capacity, that have (atleast one host with) enough CPU and RAM capacity under this Zone: 1
2020-04-06 19:45:04,569 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Removing from the clusterId list these clusters from avoid set: [1]
2020-04-06 19:45:04,570 DEBUG [c.c.d.FirstFitPlanner] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) No clusters found after removing disabled clusters and clusters in avoid list, returning.
2020-04-06 19:45:04,572 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Destroying vm VM[User|i-2-123-VM] as it failed to create on Host with Id:null
2020-04-06 19:45:04,578 DEBUG [c.c.c.CapacityManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) VM state transitted from :Stopped to Error with event: OperationFailedToErrorvm's original host id: null new host id: null host id before state transition: null
2020-04-06 19:45:04,594 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = volume count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,601 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = primary_storage count for Account = 2 Operation = decreasing Amount = 8589934592
2020-04-06 19:45:04,612 WARN  [c.c.a.AlertManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) AlertType:: 8 | dataCenterId:: 1 | podId:: null | clusterId:: null | message:: Failed to deploy Vm with Id: 123, on Host with Id: null
2020-04-06 19:45:04,619 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = user_vm count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,625 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = cpu count for Account = 2 Operation = decreasing Amount = 1
2020-04-06 19:45:04,632 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Updating resource Type = memory count for Account = 2 Operation = decreasing Amount = 512
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2020-04-06 19:45:04,643 INFO  [o.a.c.a.c.u.v.DeployVMCmd] (API-Job-Executor-1:ctx-acf00d62 job-1045 ctx-c26962c7) (logid:bc36d8d1) Unable to create a deployment for VM[User|i-2-123-VM]
com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[User|i-2-123-VM]Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:225)
	at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:202)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4766)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4315)
	at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4305)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
	at com.sun.proxy.$Proxy176.startVirtualMachine(Unknown Source)
	at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:635)
	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
	at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:586)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
	at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
	at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
	at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:534)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-06 19:45:04,663 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-acf00d62 job-1045) (logid:bc36d8d1) Complete async job-1045, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":533,"errortext":"Unable to create a deployment for VM[User|i-2-123-VM]"}

@nvazquez good. We got same issue and it's fixed by this PR.

andrijapanicsb · 2020-04-06T20:01:32Z

3 x LGTM, automated tests fine (3 known issues in 4,13, fixed in 4,14), manual tests fine

Merging

weizhouapache · 2020-04-06T20:04:58Z

thanks @andrijapanicsb

* 4.13: Snapshot deletion issues (#3969) server: Cannot list affinity group if there are hosts dedicated… (#4025) server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)

server: Search zone-wide storage pool when allocation algothrim is fi…

089835b

…rstfitleastconsumed

svenvogel added type:bug component:management-server labels Apr 1, 2020

andrijapanicsb added this to the 4.13.1.0 milestone Apr 3, 2020

andrijapanicsb self-requested a review April 3, 2020 19:53

andrijapanicsb approved these changes Apr 3, 2020

View reviewed changes

andrijapanicsb requested a review from DaanHoogland April 4, 2020 12:14

DaanHoogland approved these changes Apr 6, 2020

View reviewed changes

nvazquez approved these changes Apr 6, 2020

View reviewed changes

andrijapanicsb merged commit 6bf92fb into apache:4.13 Apr 6, 2020

Conversation

ustcweizhou commented Mar 30, 2020

Description

Types of changes

Screenshots (if appropriate):

How Has This Been Tested?

Uh oh!

DaanHoogland commented Apr 1, 2020

Uh oh!

andrijapanicsb commented Apr 1, 2020

Uh oh!

DaanHoogland commented Apr 3, 2020

Uh oh!

andrijapanicsb commented Apr 3, 2020

Uh oh!

blueorangutan commented Apr 3, 2020

Uh oh!

blueorangutan commented Apr 3, 2020

Uh oh!

andrijapanicsb commented Apr 3, 2020

Uh oh!

blueorangutan commented Apr 3, 2020

Uh oh!

andrijapanicsb commented Apr 3, 2020

Uh oh!

andrijapanicsb commented Apr 3, 2020

Uh oh!

blueorangutan commented Apr 3, 2020

Uh oh!

andrijapanicsb commented Apr 3, 2020

Uh oh!

weizhouapache commented Apr 3, 2020

Uh oh!

blueorangutan commented Apr 4, 2020

Uh oh!

blueorangutan commented Apr 4, 2020

Uh oh!

andrijapanicsb commented Apr 4, 2020

Uh oh!

DaanHoogland left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

weizhouapache commented Apr 6, 2020

Uh oh!

andrijapanicsb commented Apr 6, 2020

Uh oh!

andrijapanicsb commented Apr 6, 2020

Uh oh!

weizhouapache commented Apr 6, 2020

Uh oh!

nvazquez commented Apr 6, 2020

Uh oh!

blueorangutan commented Apr 6, 2020

Uh oh!

blueorangutan commented Apr 6, 2020

Uh oh!

nvazquez left a comment

Choose a reason for hiding this comment

Uh oh!

weizhouapache commented Apr 6, 2020

Uh oh!

nvazquez commented Apr 6, 2020