Discover GPU vendor from CDI spec before injecting GPU for --gpus option #28008
base: main
Conversation
Force-pushed from 4f051d4 to 44cc798
Force-pushed from 44cc798 to 2b50a66
Luap99 left a comment
I am not sure we want to expand the gpu option further, but if other maintainers want this then it must at least be implemented in a way where CDI specs are resolved on the server side, not the client.
pkg/specgenutil/specgen.go (outdated)

	devices := c.Devices
-	for _, gpu := range c.GPUs {
-		devices = append(devices, "nvidia.com/gpu="+gpu)
+	if len(c.GPUs) > 0 {
The problem is that this cannot be handled here: FillOutSpecGen() is called on the client side, i.e. podman-remote, and the CDI configs only ever exist on the server.
That is also why the bloat check is failing; this would drag all the CDI code into the remote client, which we should avoid.
Understood. I will look into moving this code to the server side.
I have moved the GPUs-to-CDI-device resolution logic to the server side. Please have a look.
Thank you so much for taking the time to look at this, btw.
pkg/specgen/gpus.go (outdated)

	for _, knownVendor := range knownGPUVendors {
		for _, vendor := range vendors {
			if vendor == knownVendor {
				logrus.Debugf("Discovered GPU vendor from CDI specs: %s", vendor)
				return vendor, nil
			}
		}
	}
I am not sure the UX of this would make sense: what if you have both an AMD and an NVIDIA card in the same system?
I think the --gpu option is just a bad idea in general.
I am not sure we should continue expanding the --gpu option at all for this. Users should really just use the --device CDI syntax directly.
I think keeping the --gpus option is very helpful: users can just request GPUs without needing to know what CDI is and how it works.
Also, having both NVIDIA and AMD GPUs on the same machine is a rare case, and the logic implemented here favours NVIDIA (which is okay).
Another argument in favour of these changes: it is better to support AMD GPUs at least on machines that have only AMD GPUs than not to support them at all :-)
We can work out the logic for machines with mixed vendors in the future, but it would be really nice to start supporting AMD GPUs now.
(our customers love podman <3 😃)
rather than not supporting them at all :-)
I mean, my point is that we do support CDI, so if they have CDI files we already support them.
But sure, I get your point; the multiple-GPU-vendor use case can be limited to having to use the full proper CDI syntax.
I also think that handling the common/simplest case here (i.e. a single vendor's CDI spec) is fine -- given that using --device=vendor1.com/gpu=ID1 --device=vendor2.com/gpu=ID2 is already possible. With this in mind, does it make sense to log a warning if multiple supported vendors are detected?
As a follow-up: Does it make sense to make the chosen vendor configurable, or is that not control that we expect users to actually want to use?
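For illustration, the warning floated above could be as small as the following Go sketch (the helper and its priority-ordered input are my assumptions, not the PR's code):

package main

import "github.com/sirupsen/logrus"

// warnOnMultipleVendors logs a warning when more than one known GPU vendor
// has CDI specs on the system; "found" is assumed to already be sorted in
// priority order, NVIDIA first.
func warnOnMultipleVendors(found []string) {
	if len(found) > 1 {
		logrus.Warnf("multiple GPU vendors detected (%v); --gpus will use %s; use --device with a fully qualified CDI device name to select another vendor", found, found[0])
	}
}

func main() {
	warnOnMultipleVendors([]string{"nvidia.com", "amd.com"})
}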
I'd rather keep this straightforward; for any complex scenario people should use the CDI syntax with --device. It may not be optimal, but I think the need to support AMD and NVIDIA cards in the same system will be rare enough that it should not matter.
Force-pushed from 2b50a66 to 3fb8e70
Are these the right official docs for AMD CDI? https://instinct.docs.amd.com/projects/container-toolkit/en/latest/container-runtime/quick-start-guide.html#regenerating-and-validating-cdi-specifications We should likely add links to the docs for both NVIDIA and AMD in the man page on how to generate the CDI files, because this won't work without them. I'll do a proper review later.

Yes, @Luap99. This should do for now. If you'd like, I can add it to the docs in this PR itself. Also, once this is merged, we might organize our docs to have an end-to-end guide on how to get things working with podman.
Force-pushed from ab24545 to ca6e32c
ninja-quokka left a comment
Nice work @shiv-tyagi
I think this looks nice. Please update your PR description with a release note under the "Does this PR introduce a user-facing change?" section, though; I think this change is worthy of one, and users with AMD GPUs would like to know.
pkg/specgen/gpus.go (outdated)

	registry, err := cdi.NewCache(
		cdi.WithSpecDirs(cdiSpecDirs...),
		cdi.WithAutoRefresh(false),
	)
What is required to ensure that the same registry that is used to perform injection is used to resolve the vendor? Since these are both instantiated at different points in time, it may be that their view of the world (in terms of valid vendors) is not consistent.
This seems to be a valid point. I will look into this.
I guess this is already kind of broken in DevicesFromPath, which is called inside a loop, so we get a new cache each time, which seems wrong. Rework-wise, it would be best to move this up the call stack, outside the loop, so we can reuse it for all lookups.
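To make that suggestion concrete, a minimal sketch of the hoisting, using the cdi package from tags.cncf.io/container-device-interface that podman already depends on (the function name and error wrapping are mine, not podman's actual code):

import (
	"fmt"

	spec "github.com/opencontainers/runtime-spec/specs-go"
	"tags.cncf.io/container-device-interface/pkg/cdi"
)

// injectAllCDIDevices builds the CDI cache once, outside any per-device
// loop, and hands the whole device list to a single InjectDevices call.
func injectAllCDIDevices(ociSpec *spec.Spec, devices []string, cdiSpecDirs []string) error {
	registry, err := cdi.NewCache(
		cdi.WithSpecDirs(cdiSpecDirs...),
		cdi.WithAutoRefresh(false),
	)
	if err != nil {
		return fmt.Errorf("creating CDI registry: %w", err)
	}
	if _, err := registry.InjectDevices(ociSpec, devices...); err != nil {
		return fmt.Errorf("injecting CDI devices: %w", err)
	}
	return nil
}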
We now use the same registry. The GPUs are resolved to CDI devices just before the other CDI devices are injected into the container, and the same CDI cache that is used to discover the vendor is used to inject the devices.
-	for _, gpu := range c.GPUs {
-		devices = append(devices, "nvidia.com/gpu="+gpu)
-	}
+	s.GPUs = c.GPUs
Since we are not explicitly updating s.Devices based on s.GPUs here, we need to ensure that s.ResolveGPUsToCDIDevices is called before we call DevicesFromPath for the generated devices. How is this done?
DevicesFromPath is called inside SpecGenToOCI, which is called after s.ResolveGPUsToCDIDevices in the MakeContainer call, which ensures the device resolution precedes the actual injection.
I think logically it would be nicer to move this directly above the DevicesFromPath() calls and just call InjectDevices() on the path directly, without first appending it to the devices string list. That way it would not have to go through the isCDIDevice() condition of DevicesFromPath().
I have reworked the logic. Now we pass the GPUs directly into ContainerMiscConfig using libpod.WithGPUs (similar to libpod.WithCDI), where they get resolved to full device IDs just before the other CDI devices are injected. DevicesFromPath is skipped entirely this way.
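For readers unfamiliar with libpod's option pattern, a hedged sketch of what a WithGPUs-style CtrCreateOption can look like, modeled on libpod's existing options rather than copied from the PR:

// Sketch only: field and error names follow libpod conventions but are
// assumptions, not the PR's exact code.
func WithGPUs(gpus []string) CtrCreateOption {
	return func(ctr *Container) error {
		if ctr.valid {
			return define.ErrCtrFinalized
		}
		// Stored on the container config; resolved to fully qualified CDI
		// device names later, just before CDI injection.
		ctr.config.GPUs = gpus
		return nil
	}
}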
Thanks, I think this looks a lot better. The one question I have (which is probably out of scope for this PR) is how the injection we're doing here differs from the
func DevicesFromPath(g *generate.Generator, devicePath string, config *config.Config) error {
implementations that also exist in the codebase (https://github.com/shiv-tyagi/podman/blob/ddf6b4c0bcd964dd7ce28e97e051eea8a851e200/pkg/specgen/generate/config_linux.go#L27 and https://github.com/shiv-tyagi/podman/blob/ddf6b4c0bcd964dd7ce28e97e051eea8a851e200/pkg/specgen/generate/config_freebsd.go#L20-L21)? Do those also need to be extended to support GPUs? (I'm happy to move this to another issue if that requires further/separate discussion).
I think DevicesFromPath only receives the devices specified in containers.conf (see this). I am not sure that has anything to do with GPU support.
The ones we pass using the --device option are extracted early using

	func ExtractCDIDevices(s *specgen.SpecGenerator) []libpod.CtrCreateOption {

and injected within libpod, as we are doing here.
I think that even though DevicesFromPath does its own injection, I would still argue that this is out of scope for this PR, since those devices are always fully specified (as discussed in #16232). It may make sense to see if we could move to a state where we at least reuse the same registry in a follow-up, but I don't have a good feeling for the complexity of that.
I have created #28060 as a follow-up to this one. Let's discuss this there.
Force-pushed from ca6e32c to a38eec9
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.
Force-pushed from a38eec9 to 176b4aa
Honny1 left a comment
I only checked the code structure and found just a small nit. I also have a question about Apple GPU access through libkrun: will it work, or could it break the Apple GPU? I haven't dug into it yet, but that was the first thing that came to mind.
| return "", fmt.Errorf("CDI cache cannot be nil") | ||
| } | ||
|
|
||
| knownGPUVendors := []string{ |
I would use map[string]struct{} to reduce nested loops.
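For illustration, the set-based lookup suggested here might look like this sketch (not the PR's code); note the follow-up below about why ordering matters:

// One pass over the seen vendors instead of nested loops; returns the
// first *seen* vendor that is known, in input order.
var knownGPUVendors = map[string]struct{}{
	"nvidia.com": {},
	"amd.com":    {},
}

func firstKnownVendor(seen []string) (string, bool) {
	for _, vendor := range seen {
		if _, ok := knownGPUVendors[vendor]; ok {
			return vendor, true
		}
	}
	return "", false
}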
I guess it would be even simpler to just inline that as a switch case statement.
If it is just a nit, I would prefer to keep it like this, to make sure the order in which known vendors are searched looks for NVIDIA first.
Using a map/switch for knownGPUVendors would change it to return the first known vendor among the seen vendors, which could be either, depending on which comes first in cdiCache.ListVendors().
For example, if cdiCache.ListVendors() returns ["some-vendor.com", "amd.com", "nvidia.com"], it would pick "amd.com", since that is the first vendor which is known, whereas I want it to look for NVIDIA first and then AMD.
cc @elezar
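The ordering concern restated as code: keeping a priority-ordered slice for the outer loop preserves the NVIDIA-first guarantee, and slices.Contains (Go 1.21+) flattens the inner loop (a sketch, not the PR's code):

import (
	"fmt"
	"slices"
)

// preferredVendor walks the known vendors in priority order, so NVIDIA wins
// even if cdiCache.ListVendors() happens to return AMD first.
func preferredVendor(seen []string) (string, error) {
	knownGPUVendors := []string{"nvidia.com", "amd.com"} // priority order
	for _, known := range knownGPUVendors {
		if slices.Contains(seen, known) {
			return known, nil
		}
	}
	return "", fmt.Errorf("no known GPU vendor found among %v", seen)
}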
Traditionally, the --gpus flag has only supported NVIDIA GPUs, but this was recently changed. I would argue that many users still expect --gpus to give them an NVIDIA GPU, and we should not change this behaviour now.
Even IF we decide that AMD GPUs should be preferred, we should make this explicit and deterministic and not rely on the ordering of the underlying maps.
Yes, agreed. +1 for doing it in order with NVIDIA followed by AMD.
This only makes the

@shiv-tyagi Thanks for the explanation.
Luap99 left a comment
Just some small comments; I think the design of moving this into libpod is the best way to go about it.
cc @mheon
libpod/container_internal_common.go (outdated)

	cdiDevices := c.config.CDIDevices
	cdiDevices = append(cdiDevices, gpuCDIDevices...)
FYI, slice handling in Go sucks.
This is not safe, because append may modify the underlying backing array directly as long as cdiDevices has enough capacity.
So this could alter c.config.CDIDevices, which would be quite unexpected.
I think the proper thing here is
cdiDevices := make([]string, 0, len(c.config.CDIDevices) + len(gpuCDIDevices))
cdiDevices = append(cdiDevices, c.config.CDIDevices...)
cdiDevices = append(cdiDevices, gpuCDIDevices...)
Got it. I have fixed this. Thanks.
| return "", fmt.Errorf("CDI cache cannot be nil") | ||
| } | ||
|
|
||
| knownGPUVendors := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be even simpler to just inline that as switch case statement
libpod/util.go (outdated)

	// by discovering the vendor from the provided CDI cache.
	func gpusToCDIDevices(gpus []string, cdiCache *cdi.Cache) ([]string, error) {
		if len(gpus) == 0 {
			return []string{}, nil
This can return nil instead of an empty slice, which avoids an empty allocation.
Done. Thanks for spotting.
libpod/util.go (outdated)

		return nil, fmt.Errorf("could not discover GPU vendor from CDI cache: %w", err)
	}

	cdiDevices := []string{}
I wonder why the prealloc linter is not flagging this; it can be preallocated to avoid extra memory copies/allocations:
cdiDevices := make([]string, 0, len(gpus))
Done. Thanks.
Force-pushed from 176b4aa to ddf6b4c
	gpuCDIDevices, err := gpusToCDIDevices(c.config.GPUs, registry)
	if err != nil {
		return nil, nil, fmt.Errorf("converting GPU identifiers to CDI devices: %w", err)
	}
	cdiDevices := make([]string, 0, len(c.config.CDIDevices)+len(gpuCDIDevices))
	cdiDevices = append(cdiDevices, c.config.CDIDevices...)
	cdiDevices = append(cdiDevices, gpuCDIDevices...)
	if _, err := registry.InjectDevices(g.Config, cdiDevices...); err != nil {
Would adding a helper that encapsulates this logic for a given config and registry here make things easier to follow?
func getCDIDeviceNames(registry *cdi.Cache, c *ContainerConfig) ([]string, error) {
	if len(c.GPUs) == 0 {
		return c.CDIDevices, nil
	}
	gpuCDIDevices, err := gpusToCDIDevices(c.GPUs, registry)
	if err != nil {
		return nil, fmt.Errorf("converting GPU identifiers to CDI devices: %w", err)
	}
	return append(c.CDIDevices, gpuCDIDevices...), nil
}
(Note that as a side note: I have left out the pre-allocation of the slice. Is such an optimisation worth it in go, or does the compiler generally do the right thing in simple cases such as this?)
and then we update the code here:

-	gpuCDIDevices, err := gpusToCDIDevices(c.config.GPUs, registry)
-	if err != nil {
-		return nil, nil, fmt.Errorf("converting GPU identifiers to CDI devices: %w", err)
-	}
-	cdiDevices := make([]string, 0, len(c.config.CDIDevices)+len(gpuCDIDevices))
-	cdiDevices = append(cdiDevices, c.config.CDIDevices...)
-	cdiDevices = append(cdiDevices, gpuCDIDevices...)
-	if _, err := registry.InjectDevices(g.Config, cdiDevices...); err != nil {
+	cdiDevices, err := getCDIDeviceNames(registry, c.config)
+	if _, err := registry.InjectDevices(g.Config, cdiDevices...); err != nil {
(Note that as a side note: I have left out the pre-allocation of the slice. Is such an optimisation worth it in go, or does the compiler generally do the right thing in simple cases such as this?)
Worth it? Well, in many cases no, if you measure the performance of the whole program. But it can certainly add up if you add thousands of elements and cause a bunch of memory reallocations instead of allocating the right size once. And really, doing this does not hurt; it optimises the number of allocations we need.
The bigger issue, as I described in my review comments above, is that assigning a slice with := is dangerous as hell in Go. Just like a map, this is not a deep copy: you only copy the pointer to the backing array, and if that slice has enough capacity, append will not create a new backing array.
What do you expect to happen here? Likely not what we wanted.
package main

import "fmt"

func main() {
	sliceA := make([]string, 0, 2)
	sliceA = append(sliceA, "1")
	sliceB := sliceA             // copies the slice header only; both share one backing array
	sliceA = append(sliceA, "A") // writes "A" into the shared array at index 1
	sliceB = append(sliceB, "B") // overwrites that same slot with "B"
	fmt.Println("A ", sliceA)    // A  [1 B]
	fmt.Println("B ", sliceB)    // B  [1 B]
}

https://go.dev/play/p/Ukjfci3BPzT
And then if you have callers modify elements in place you have the same issue.
Looking again into the stdlib, I guess we do have slices.Concat(s1, s2) now, so that can be preferred to join two slices instead of the manual make allocation. But never blindly append into an existing slice that you do not plan to modify.
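slices.Concat (Go 1.22+) copies its inputs into a freshly allocated backing array, which is exactly the safety property wanted here; a runnable sketch:

package main

import (
	"fmt"
	"slices"
)

func main() {
	cdiDevices := []string{"nvidia.com/gpu=0"}
	gpuCDIDevices := []string{"amd.com/gpu=0"}

	// Concat allocates a new slice, so appending to or mutating the result
	// can never reach back into either input's backing array.
	combined := slices.Concat(cdiDevices, gpuCDIDevices)
	fmt.Println(combined) // [nvidia.com/gpu=0 amd.com/gpu=0]
}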
libpod/util.go (outdated)

	// It returns the vendor domain (e.g., "nvidia.com", "amd.com") that should
	// be used to construct fully qualified CDI device names.
	// Returns an error if no known GPU vendor is found.
	func discoverGPUVendorFromCDI(cdiCache *cdi.Cache) (string, error) {
As a note: if we were to define a local interface:
type vendorLister interface {
	ListVendors() []string
}
here we can test this function without creating CDI specs on disk.
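A sketch of the kind of unit test the interface enables (the fake type is mine, and it assumes discoverGPUVendorFromCDI was refactored to accept a vendorLister):

import "testing"

// fakeLister satisfies vendorLister without any CDI specs on disk.
type fakeLister struct{ vendors []string }

func (f fakeLister) ListVendors() []string { return f.vendors }

func TestDiscoverGPUVendorFromCDI(t *testing.T) {
	vendor, err := discoverGPUVendorFromCDI(fakeLister{vendors: []string{"amd.com"}})
	if err != nil {
		t.Fatal(err)
	}
	if vendor != "amd.com" {
		t.Fatalf("expected amd.com, got %q", vendor)
	}
}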
Thanks for the suggestion. I added the interface and used it in the function and the tests.
docs/source/markdown/options/gpus.md (outdated)

	GPU devices to add to the container ('all' to pass all GPUs) Currently only
-	Nvidia devices are supported.
+	NVIDIA and AMD devices are supported.
Not a blocker, but what about updating this to the following:

-	GPU devices to add to the container ('all' to pass all GPUs) Currently only
-	Nvidia devices are supported.
-	NVIDIA and AMD devices are supported.
+	Start the container with GPU support. Where `ENTRY` can be `all` to request all GPUs, or a vendor-specific identifier. Currently NVIDIA and AMD devices are supported. If both NVIDIA and AMD devices are present the NVIDIA devices will be preferred and a CDI device name must be specified using the `--device` flag to request a set of GPUs from a *specific* vendor.
I like that suggested wording
Thanks @elezar. I have updated this.
Force-pushed from 6677acf to 8013d3d
Commit: Discover GPU vendor from CDI spec before injecting GPU for --gpus option. Signed-off-by: Shiv Tyagi <Shiv.Tyagi@amd.com>
Force-pushed from 8013d3d to 054793d
Luap99 left a comment
LGTM
elezar left a comment
Thanks @shiv-tyagi. I think this is good to go.
I've also created #28060 as a follow-up for some of our discussions.
This gets the --gpus option working in podman run (or podman create) commands with AMD GPUs, by discovering the vendor from CDI specs to construct the device ID for the GPU to inject into the container.

Checklist
Ensure you have completed the following checklist for your pull request to be reviewed:

- Sign-off on all commits (git commit -s; if needed, use git commit -s --amend). The author email must match the sign-off email address. See CONTRIBUTING.md for more information.
- Fixes: #00000 in commit message (if applicable)
- make validatepr (format/lint checks)
- Release note (None if no user-facing changes)

Does this PR introduce a user-facing change?
Yes
--gpus option now supports AMD GPUs