Skip to content

Conversation

@adityagesh
Copy link
Collaborator

  1. Convert GPU Driver installation to a Tool. GPU Feature in Azure is overloaded and has multiple responsibilities.
  2. Add support for AMD GPU Driver installation
  3. Add basic amd-smi tool

@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from 7ef7268 to 09513d7 Compare October 28, 2025 13:01
GPU feature was responsible for both capability
and driver installation. This change is refactoring
without any functional changes.

The driver installation responsibility is now
converted to a tool
_is_nvidia is initialized, but never used
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch 7 times, most recently from 0081041 to 4636902 Compare October 29, 2025 08:35
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from 4636902 to 9b0fa7a Compare October 29, 2025 10:05
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch 2 times, most recently from 505a16d to eeee25b Compare October 31, 2025 18:16
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from eeee25b to 056479b Compare October 31, 2025 18:18
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch 3 times, most recently from 760b044 to ccc7398 Compare November 1, 2025 12:28
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from ccc7398 to 86355c1 Compare November 1, 2025 12:31
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch 3 times, most recently from 203c42c to 73e3f98 Compare November 5, 2025 09:18
Comment on lines 89 to 93
gpu_driver_factory = Factory[GpuDriver](GpuDriver)

driver_class = gpu_driver_factory.create_by_type_name(
compute_sdk, node=node, **kwargs
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lisa/tools/gpu_drivers.py:89:49: error: Only concrete class can be given where "Type[GpuDriver]" is expected
Found 1 error in 1 file (checked 511 source files)
nox > Command mypy -p lisa failed with exit code 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove abstract decorators, just raise NotImplementError

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh okay, I thought the comment regarding abstract was only for the get_dependencies method

@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch 2 times, most recently from 20d47d0 to 7a950e6 Compare November 5, 2025 15:37
@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from 7a950e6 to 315488f Compare November 5, 2025 17:54
GpuDriver # type: ignore[type-abstract]
)

driver_class = gpu_driver_factory.create_by_type_name(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver_class -> driver, it creates an instance, not a class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, forgot to change the variable name

compute_sdk = _get_supported_driver(node)

# Create GPU driver instance using virtual tool pattern
gpu_driver = node.tools.create(GpuDriver, compute_sdk=compute_sdk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use tools.get. It creates the object too, but will cache it for next use.

return AmdSmi

@property
def driver_name(self) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks the driver name is duplicated with type_name, should it be deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@adityagesh adityagesh force-pushed the aditya/cleanup_gpu_installation branch from 175d62f to 347778f Compare November 6, 2025 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants