Description
Is your feature request related to a problem? Please describe.
Currently, running an operation on a remote Ray cluster requires a number of steps, including writing a Ray env YAML, constructing the command line, exporting a context, creating a working directory to send, creating a port-forward and possibly building wheels. This is quite complex, and ado is not in the loop to validate or check inputs.
It also only works when the remote ray cluster already exists.
Describe the solution you'd like.
To have this supported by the ado CLI, e.g.
ado create operation --execution-context=$YAML -f operation.yaml
where the execution context is a pydantic model that contains all the details you might want to configure for a remote run. In particular, ado can then
- apply intelligent defaults
- validate what is provided
Fields to support include
- Whether the run is a one-off job or a submission to an existing cluster
- Whether plugins should be downloaded or wheels built locally -> should be inferred from the operation
- the environment variables to set
- ...
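As a rough illustration of the fields above, here is a minimal sketch of what such an execution context could look like. It uses a standard-library dataclass as a stand-in for the pydantic model so that it runs without extra dependencies. All names (`ExecutionContext`, `SubmissionMode`, `PluginSource`, `cluster_address`, `env_vars`) are hypothetical and not part of ado today.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class SubmissionMode(str, Enum):
    """Hypothetical: how the operation reaches Ray."""
    ONE_OFF_JOB = "one_off_job"            # a new job/cluster is created for this run
    EXISTING_CLUSTER = "existing_cluster"  # submit to a cluster that already exists


class PluginSource(str, Enum):
    """Hypothetical: where the remote side gets plugins from."""
    DOWNLOAD = "download"
    BUILD_WHEELS = "build_wheels"


@dataclass
class ExecutionContext:
    # Default chosen for illustration only.
    mode: SubmissionMode = SubmissionMode.ONE_OFF_JOB
    # Only meaningful when submitting to an existing cluster.
    cluster_address: Optional[str] = None
    # None means "infer from the operation", as suggested in the list above.
    plugin_source: Optional[PluginSource] = None
    # Environment variables to set for the remote run.
    env_vars: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Coerce plain strings (e.g. loaded from YAML) into enums;
        # an unknown value raises ValueError.
        self.mode = SubmissionMode(self.mode)
        if self.plugin_source is not None:
            self.plugin_source = PluginSource(self.plugin_source)
        # Validate what is provided.
        if self.mode is SubmissionMode.EXISTING_CLUSTER and not self.cluster_address:
            raise ValueError(
                "cluster_address is required when submitting to an existing cluster"
            )
        for key, value in self.env_vars.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise ValueError("env_vars keys and values must be strings")
```

With a pydantic model the coercion and type checks would come from the field annotations, and only the cross-field rule would need an explicit validator.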
Then ado can
- create a working directory
- copy the necessary operation files there -> including exporting the current context if not provided
- build wheels and copy them to the working directory
- construct a KubeRay job YAML if necessary
- construct the ray job submit command line and launch the job
- validate the inputs
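Two of the steps above, creating the working directory and constructing the `ray job submit` command line, can be sketched as follows. This is only an illustration under assumptions: the helper names `prepare_working_dir` and `build_submit_command` are hypothetical, the remote entrypoint is assumed to be the `ado create operation -f ...` command from this issue, and context export, wheel building and KubeRay YAML generation are left out. The sketch builds the command but does not run it.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def prepare_working_dir(operation_file: Path, extra_files: List[Path]) -> Path:
    """Create a temporary working directory and copy the needed files into it."""
    working_dir = Path(tempfile.mkdtemp(prefix="ado-remote-"))
    for src in [operation_file, *extra_files]:
        shutil.copy(src, working_dir / src.name)
    return working_dir


def build_submit_command(
    address: str,
    working_dir: Path,
    operation_filename: str,
    env_vars: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build (but do not execute) a `ray job submit` argument list."""
    cmd = [
        "ray", "job", "submit",
        "--address", address,
        "--working-dir", str(working_dir),
    ]
    if env_vars:
        # Pass environment variables through the Ray runtime environment.
        cmd += ["--runtime-env-json", json.dumps({"env_vars": env_vars})]
    # Everything after "--" is the entrypoint run on the cluster.
    # Assumption: the remote side runs the same ado command as a local run.
    cmd += ["--", "ado", "create", "operation", "-f", operation_filename]
    return cmd
```

Returning the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting, and makes it easy for ado to log or validate before launching.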
Future additions could include creating the RayJob with data from the actuators, or validating the cluster with data from the actuators (e.g. what resources they expect to be exposed).
Describe alternatives you've considered.
Adding flags to ado create operation instead of a YAML
- The downside is that many flags would be required to support the variety of options; the command line (and docs) would become very long, with many unused flags
Creating a separate command e.g. ado run remote
- This breaks the "sense" of the CLI, i.e. ado create operation would no longer be the only way to create operations
Additional context.