Fix volume allocation on local VMFS storage #10201

Open
wants to merge 1 commit into base: 4.19

Conversation

bernardodemarco
Collaborator

Description

During the allocation of VM volumes, the reordering of storage pools by capacity is wrongly identifying whether a pool is shared or not. A pool is considered shared when its type (represented by the Storage.StoragePoolType enum) is flagged as shared; however, as can be observed from the enum options below, the VMFS storage type is always flagged as shared, even though VMFS pools can also be local.

public static enum StoragePoolType {
    Filesystem(false, true, true), // local directory
    NetworkFilesystem(true, true, true), // NFS
    IscsiLUN(true, false, false), // shared LUN, with a clusterfs overlay
    Iscsi(true, false, false), // for e.g., ZFS Comstar
    ISO(false, false, false), // for iso image
    LVM(false, false, false), // XenServer local LVM SR
    CLVM(true, false, false),
    RBD(true, true, false), // http://libvirt.org/storage.html#StorageBackendRBD
    SharedMountPoint(true, true, true),
    VMFS(true, true, false), // VMware VMFS storage
    PreSetup(true, true, false), // for XenServer, Storage Pool is set up by customers.
    EXT(false, true, false), // XenServer local EXT SR
    OCFS2(true, false, false),
    SMB(true, false, false),
    Gluster(true, false, false),
    PowerFlex(true, true, true), // Dell EMC PowerFlex/ScaleIO (formerly VxFlexOS)
    ManagedNFS(true, false, false),
    Linstor(true, true, false),
    DatastoreCluster(true, true, false), // for VMware, to abstract pool of clusters
    StorPool(true, true, true),
    FiberChannel(true, true, false); // Fiber Channel Pool for KVM hypervisors is used to find the volume by WWN value (/dev/disk/by-id/wwn-<wwnvalue>)

As a consequence, when trying to deploy VMs on local VMFS storage, the allocation of the volumes fails. With the current reordering flow, CloudStack detects that the VMFS pool type is flagged as shared and sets the capacity type to Capacity.CAPACITY_TYPE_STORAGE_ALLOCATED. Then, when the DAO method is executed, the VMFS pools are not present in the returned list of pools, since local pools are only retrieved with the capacity type Capacity.CAPACITY_TYPE_LOCAL_STORAGE.

if (pools.get(0).getPoolType().isShared()) {
    capacityType = Capacity.CAPACITY_TYPE_STORAGE_ALLOCATED;
} else {
    capacityType = Capacity.CAPACITY_TYPE_LOCAL_STORAGE;
}
List<Long> poolIdsByCapacity = capacityDao.orderHostsByFreeCapacity(zoneId, clusterId, capacityType);
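
To make the mismatch concrete, here is a minimal sketch (illustrative only, not code from the PR) of why a local VMFS pool ends up being queried with the wrong capacity type:

// The type-level flag reports "shared" for VMFS regardless of the pool's actual scope.
// A VMFS pool registered as local (HOST scope) therefore takes the first branch above,
// and the DAO query uses CAPACITY_TYPE_STORAGE_ALLOCATED instead of
// CAPACITY_TYPE_LOCAL_STORAGE, so the pool is never returned.
boolean typeSaysShared = Storage.StoragePoolType.VMFS.isShared(); // true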

Therefore, this PR proposes to fix the issue by changing how it is verified whether a given storage pool is shared. The check now invokes the com.cloud.storage.StoragePool#isShared method, which considers a storage pool to be local when its scope is HOST.
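
For reference, a minimal sketch of what the adjusted check could look like (the actual diff in AbstractStoragePoolAllocator.java may differ; this only illustrates the pool-level check described above):

// Sketch: rely on the pool-level StoragePool#isShared(), which accounts for the pool's
// scope, instead of the type-level StoragePoolType flag.
if (pools.get(0).isShared()) {
    capacityType = Capacity.CAPACITY_TYPE_STORAGE_ALLOCATED;
} else {
    capacityType = Capacity.CAPACITY_TYPE_LOCAL_STORAGE;
}
List<Long> poolIdsByCapacity = capacityDao.orderHostsByFreeCapacity(zoneId, clusterId, capacityType);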

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

To validate the allocation process of VM volumes before the PR changes, I first changed the vm.allocation.algorithm setting to firstfitleastconsumed. Then, when deploying a VM with a local storage compute offering, I verified through the logs that no VMFS pool was being retrieved. After applying the PR changes and repeating the same steps, I verified that the VMFS pools were correctly retrieved.

@bernardodemarco
Collaborator Author

@blueorangutan package

@blueorangutan

@bernardodemarco a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov bot commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 12.50000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 15.13%. Comparing base (35fe19f) to head (d25141f).
Report is 4 commits behind head on 4.19.

Files with missing lines Patch % Lines
...torage/allocator/AbstractStoragePoolAllocator.java 12.50% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.19   #10201   +/-   ##
=========================================
  Coverage     15.13%   15.13%           
- Complexity    11272    11275    +3     
=========================================
  Files          5408     5408           
  Lines        473958   473979   +21     
  Branches      57811    57814    +3     
=========================================
+ Hits          71721    71730    +9     
- Misses       394219   394232   +13     
+ Partials       8018     8017    -1     
Flag Coverage Δ
uitests 4.30% <ø> (ø)
unittests 15.85% <12.50%> (+<0.01%) ⬆️



@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12100

@blueorangutan

[LL] Trillian Build Failed (tid-7057)

Member

@weizhouapache weizhouapache left a comment


code lgtm

not tested yet

@blueorangutan

[LL] Trillian test result (tid-7064)
Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8
Total time taken: 41863 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10201-t7064-kvm-rocky8.zip
Smoke tests completed. 132 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_migrate_VM_and_root_volume Error 69.56 test_vm_life_cycle.py
test_02_migrate_VM_with_two_data_disks Error 47.37 test_vm_life_cycle.py
test_08_migrate_vm Error 42.78 test_vm_life_cycle.py

@bernardodemarco
Collaborator Author

The integration test errors appear to be related to environment issues. The three test cases returned the following error message:

 Cannot migrate VM, destination host pr10201-t7064-kvm-rocky8-kvm2 (ID: 2b59b904-b34b-475a-a997-fa5de6ac584f) is not in correct state, has status: Connecting, state: Enabled

Contributor

@DaanHoogland DaanHoogland left a comment


clgtm
