Open
Labels
infra: Issues related to infrastructure
initiative: Large piece of work covering multiple sprint
Milestone
Description
Describe the task. It can be a feature, a set of experiments, documentation, etc.
Use case: as a scientist, I launch a batch of experiments, for example with this loop:
for lr in "5e-5" "1e-4" "2e-4" "4e-4" ; do
  for node in 2 4 8 ; do
    echo "$lr $node"
    ../WeatherGenerator-private/hpc/launch-slurm.py --chain-jobs 1 --nodes "$node" --options "wgtags.org='ecmwf'" "wgtags.exp='lr_scaling'" "wgtags.issue='1168'" "lr_max=$lr" "num_mini_epochs=1024" "wgtags.num_nodes=$node"
  done
done
Currently, the configs and the tags are uploaded only at the end of the training run, so I have to wait for an experiment to complete before I can see its tags. This prevents me from:
- understanding which run_id is associated with which experiment
- monitoring a large batch of experiments (8+) from within MLflow.
Feature request: when an experiment is launched and registered, also upload the wgtags.* namespace of the config at that point (and, if it is easy to do, the rest of the config as well).
Marked as initiative because the upload has to happen after the config is fully resolved.
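A minimal sketch of what the launch-time upload could look like, assuming the fully resolved config is available as a nested mapping at registration time. The function names `flatten_tags` and `upload_tags_at_launch` are hypothetical and do not exist in the codebase. The `set_tags` argument stands in for whatever the tracking backend provides. With MLflow, `mlflow.set_tags` called inside an active run would be one candidate.

```python
from collections.abc import Callable, Mapping
from typing import Any


def flatten_tags(config: Mapping[str, Any], prefix: str = "wgtags") -> dict[str, str]:
    """Flatten config[prefix] into dotted keys, e.g. {"wgtags.exp": "lr_scaling"}."""
    flat: dict[str, str] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, Mapping):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif node is not None:
            # Tracking backends typically store tag values as strings.
            flat[path] = str(node)

    walk(config.get(prefix, {}), prefix)
    return flat


def upload_tags_at_launch(
    config: Mapping[str, Any],
    set_tags: Callable[[dict[str, str]], Any],
) -> dict[str, str]:
    """Send the wgtags.* entries to the tracker as soon as the run is registered.

    Assumption: `config` is already fully resolved when this is called.
    `set_tags` is injected so the sketch does not depend on a specific backend.
    """
    tags = flatten_tags(config)
    if tags:
        set_tags(tags)
    return tags


if __name__ == "__main__":
    # Example config mirroring the options passed in the loop above.
    config = {
        "wgtags": {"org": "ecmwf", "exp": "lr_scaling", "issue": "1168", "num_nodes": 2},
        "lr_max": 5e-5,
    }
    recorded: dict[str, str] = {}
    upload_tags_at_launch(config, recorded.update)  # a dict stands in for the tracker
    print(recorded)
```

Passing the tag setter in as a parameter keeps the sketch testable without a tracking server. Extending it to the rest of the config would mostly be a matter of flattening from the root instead of from `wgtags`.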
Hedgedoc URL, if you are keeping notes, plots, logs in hedgedoc.
No response
URL to the design document
No response
Area
- datasets, data readers, data preparation and transfer
- model
- science
- infrastructure and engineering
- evaluation, export and visualization
- documentation