Skip to content

Conversation

@btovar
Copy link
Member

@btovar btovar commented Dec 9, 2024

Fix #3999

Proposed Changes

Give an overall description of the changes, along with the context and motivation.
Mention relevant issues and pull requests as needed.

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

@colinthomas-z80
Copy link
Contributor

colinthomas-z80 commented Dec 9, 2024

Make sure this works even when both manager name and object are specified. like:

Factory(manager=m, manager_name="xyz")

Where it "working" means that Factory(manager=m, manager_name="xyz", batch_type="condor") will produce workers that point to your hostname:port rather than localhost:port

@btovar
Copy link
Member Author

btovar commented Dec 9, 2024

localhost:port should only be set when using batch_type local. Is this not the case?

@colinthomas-z80
Copy link
Contributor

Yeah the problem I brought up in the issue was that when the manager object is passed to the factory it always invokes the worker with localhost:port regardless of batch type or other kwargs. So condor workers would not connect

@btovar
Copy link
Member Author

btovar commented Dec 9, 2024

So this pr does not solve it? In the examples you posted in the issue the batch type is not being set, so it defaults to "local", and that's why the localhost:port is used. This pr is trying to make it so that you only get localhost:port if you specified a manager and the batch_type is local.

@colinthomas-z80
Copy link
Contributor

It might be that I was creating the factory giving only the manager object as an argument, then modifying the factory object afterwards like f.batch_type = "condor". So the batch type gets changed but it does not modify the manager_hostport

@colinthomas-z80
Copy link
Contributor

Could we stop using localhost and always use a proper hostname so that local and remote workers can resolve if batch_type gets changed on the fly?

@btovar
Copy link
Member Author

btovar commented Dec 9, 2024

I'm more inclined to forbid changing the batch type and manager host port after init. This would reflect what the vine_factory command line does. We should only update parameters that correspond to fields in the config file once the factory is created. The localhost:port default is meant to be user friendly for the introductory examples.

@colinthomas-z80
Copy link
Contributor

Sounds good to me.

@btovar
Copy link
Member Author

btovar commented Dec 9, 2024

does this one work for you?

@colinthomas-z80
Copy link
Contributor

yes!

@btovar btovar merged commit baaf877 into cooperative-computing-lab:master Dec 9, 2024
10 checks passed
@btovar btovar deleted the factory_fix_name branch December 9, 2024 17:52
btovar added a commit that referenced this pull request Dec 14, 2024
… available (#4001)

* vine factory:always use manager.name as fallback when no host:port is available

* lint

* do not allow changing batch-type, manager-name after initial configuration
dthain pushed a commit to dthain/cctools that referenced this pull request Jan 10, 2025
… available (cooperative-computing-lab#4001)

* vine factory:always use manager.name as fallback when no host:port is available

* lint

* do not allow changing batch-type, manager-name after initial configuration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vine: in-context factory uses wrong hosturl for remote workers when given manager object as argument.

2 participants