Add method to generate scaling experiments to PayuManager#46
Add method to generate scaling experiments to PayuManager#46micaeljtoliveira merged 3 commits intomainfrom
Conversation
|
@manodeep Would you mind also having a look at this PR? It should clarify some of the changes made in ACCESS-NRI/access-config-utils#34 |
e4a72cb to
34536c1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #46 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 791 834 +43
=========================================
+ Hits 791 834 +43 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
34536c1 to
7e9dab0
Compare
|
@manodeep Later this week I'll do a final check with the latest changes from ACCESS-NRI/access-config-utils#34, but I think this is ready for review. |
manodeep
left a comment
There was a problem hiding this comment.
@micaeljtoliveira Looked through the changes - looks mostly good to me, and asked for clarifications on some.
I don't know enough of 'MagicMock' to properly review the testing bits, but seems to be fine. Coverage is 100% right?
| return logs | ||
|
|
||
| def generate_core_layouts_from_node_count( | ||
| self, num_nodes: float, cores_per_node: int, layout_search_config: LayoutSearchConfig | None = None |
There was a problem hiding this comment.
I don't know the | None = None - does that mean allow "None" as an input, and then pass that "None" to the routine? (Basically, trying to figure out what the = None bit is doing)
There was a problem hiding this comment.
You need to read this in two parts:
layout_search_config: LayoutSearchConfig | None: parameter is of type LayoutSearchConfing or None.
layout_search_config = None: the usual way of providing a default value.
There was a problem hiding this comment.
Thanks! I still don't follow why the =None is necessary. Either the parameter is passed as an instance of LayoutSearchConfig or it is passed as None
Regardless, this is just for my curiosity :)
There was a problem hiding this comment.
Scratch that, type-hinting does not actually do runtime checks - so that =None would be required
There was a problem hiding this comment.
Yeah, sorry for not clarifying that part.
| logger.info(f"Generated {len(layouts)} layouts for {num_nodes} nodes. Layouts: {layouts}") | ||
|
|
||
| # TODO: the branch name needs to be simpler and model agnostic | ||
| branch_name = f"layout-unused-cores-to-cice-{layout_config.allocate_unused_cores_to_ice}" |
There was a problem hiding this comment.
Thanks for creating an issue about it!
| for layout in layouts: | ||
| pert_config = self.generate_perturbation_block(layout=layout, branch_name_prefix=branch_name) | ||
| branch = pert_config["branches"][0] | ||
| pert_config["config.yaml"]["walltime"] = str(timedelta(hours=walltime_hrs)) |
There was a problem hiding this comment.
Why does the walltime need to be set explicitly?
(I remember there was something odd about the walltime in experiment-generator - but may be just add a comment that clarifies the need for walltime setting)
There was a problem hiding this comment.
The walltime is a value set in the payu config (config.yaml), so here we are letting the user set it differently for each perturbation by providing a function.
You had a similar logic in the ESM1.6 scaling notebook, but were directly replacing the text of the yaml block:
walltime_hrs += nblocks_added * (1.5 * 4.0/num_nodes) # use a 1.5 hrs time for 4-node runs, and then scale linearly
entire_block += block
entire_block = entire_block.replace("walltime: Inf", f"walltime: {int(walltime_hrs)}:00:00")There was a problem hiding this comment.
To be clear, one could just pass it as part of the control_options, but then it would always be the same for all perturbations.
There was a problem hiding this comment.
I am not sure why janky walltime increasing logic with the number of scaling experiments actually worked 😅 - might have had to edit by hand afterwards
There was a problem hiding this comment.
Hopefully the new implementation will be less janky 😄
| # Verify the configuration passed to ExperimentGenerator has correct walltime | ||
| call_args = mock_experiment_generator.call_args[0][0] | ||
| assert ( | ||
| call_args["Perturbation_Experiment"]["Experiment_1"]["config.yaml"]["walltime"] == "5:00:00" |
There was a problem hiding this comment.
If you split out num_nodes_list into a variable above, then you can use that variable in the manager.generate_scaling_experiments call, as well as here with calls to walltime_func for each element in that list.
There was a problem hiding this comment.
Yes, but that requires manipulating further the result of the function, just like done in the PayuManager code:
pert_config["config.yaml"]["walltime"] = str(timedelta(hours=walltime_hrs))So the tests would be:
assert (
call_args["Perturbation_Experiment"]["Experiment_1"]["config.yaml"]["walltime"] == str(timedelta(hours=walltime_func(2.0))
)which then doesn't check that str(timedelta(hours=walltime_func(2.0)) is doing what it's suppose to do.
There was a problem hiding this comment.
I can see it both ways. I like making changes in one place and having that update everywhere; however, I can also see that this being in a "tests" context, it makes sense to have duplicated logic
There was a problem hiding this comment.
Yes, I agree, keeping the code DRY is a good thing, but it's not always clear how far to take that in tests. I think not including any of the code being tested in the unit test itself is a good rule of thumb.
424a347 to
7e9dab0
Compare
…ake a list of node counts, generate layouts that match the user-supplied criteria, and create the corresponding experiments. Currently the method still has some code that is somehow ESM1.6 specific (e.g., the branch naming scheme), but we need to have more supported models before deciding on the best way to generalise it. Deleted the generate_experiments method, as this should not be needed anymore. If other types of generation is needed (i.e, not scaling tests), one should implement specific methods for those.
7e9dab0 to
22c20f2
Compare
|
@micaeljtoliveira Thanks for the clarifications - is this all finalised now? |
|
@manodeep Yes, all finalised from my side. And thanks for the review. |
|
I forgot to say, I did test this the best I could by trying to make sure it was producing the same perturbations as the ESM1.6 scaling notebook, but I couldn't actually run those perturbation experiments (the corresponding executables are gone...). That means that, when we update the notebook, we'll have to carefully check the results. |
Companion PR to ACCESS-NRI/access-config-utils#34