Add method to generate scaling experiments to PayuManager #46

Merged
micaeljtoliveira merged 3 commits into main from generate_experiments on Feb 17, 2026
Conversation

@micaeljtoliveira
Member Author

@manodeep Would you mind also having a look at this PR? It should clarify some of the changes made in ACCESS-NRI/access-config-utils#34

@codecov

codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (4ae8487) to head (22c20f2).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #46   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines          791       834   +43     
=========================================
+ Hits           791       834   +43     

☔ View full report in Codecov by Sentry.

@micaeljtoliveira
Member Author

@manodeep Later this week I'll do a final check with the latest changes from ACCESS-NRI/access-config-utils#34, but I think this is ready for review.

@manodeep (Collaborator) left a comment:

@micaeljtoliveira Looked through the changes - mostly looks good to me, and I've asked for clarification on a few points.

I don't know enough about 'MagicMock' to properly review the testing bits, but they seem fine. Coverage is 100%, right?

        return logs

    def generate_core_layouts_from_node_count(
        self, num_nodes: float, cores_per_node: int, layout_search_config: LayoutSearchConfig | None = None
Collaborator:

I don't know the | None = None syntax - does that mean "None" is allowed as an input, and is then passed on to the routine? (Basically, trying to figure out what the = None bit is doing)

Member Author:

You need to read this in two parts:
layout_search_config: LayoutSearchConfig | None: the parameter is of type LayoutSearchConfig or None.
layout_search_config = None: the usual way of providing a default value.

Collaborator:

Thanks! I still don't follow why the =None is necessary. Either the parameter is passed as an instance of LayoutSearchConfig or it is passed as None.

Regardless, this is just for my curiosity :)

Collaborator:

Scratch that - type-hinting does not actually do runtime checks, so the =None is indeed required
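For the curious, a small sketch illustrating the point above: annotations are metadata only, and CPython does not enforce them at runtime.

```python
def describe(count: int) -> str:
    # Despite the "int" annotation, no runtime check is performed here;
    # annotations are only consumed by static checkers like mypy.
    return f"count is {count}"


# Passing a string despite the "int" annotation raises no error at runtime:
result = describe("not an int")
print(result)  # count is not an int
```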

Member Author:

Yeah, sorry for not clarifying that part.

    logger.info(f"Generated {len(layouts)} layouts for {num_nodes} nodes. Layouts: {layouts}")

    # TODO: the branch name needs to be simpler and model agnostic
    branch_name = f"layout-unused-cores-to-cice-{layout_config.allocate_unused_cores_to_ice}"
Collaborator:

This reminded me - that feature is broken :(

Member Author:

Thanks for creating an issue about it!

    for layout in layouts:
        pert_config = self.generate_perturbation_block(layout=layout, branch_name_prefix=branch_name)
        branch = pert_config["branches"][0]
        pert_config["config.yaml"]["walltime"] = str(timedelta(hours=walltime_hrs))
Collaborator:

Why does the walltime need to be set explicitly?

(I remember there was something odd about the walltime in experiment-generator - but maybe just add a comment that clarifies why the walltime needs to be set)

Member Author:

The walltime is a value set in the payu config (config.yaml), so here we are letting the user set it differently for each perturbation by providing a function.

You had similar logic in the ESM1.6 scaling notebook, but there you were directly replacing the text of the yaml block:

        walltime_hrs += nblocks_added * (1.5 * 4.0/num_nodes) # use a 1.5 hrs time for 4-node runs, and then scale linearly
        entire_block += block

    entire_block = entire_block.replace("walltime: Inf", f"walltime: {int(walltime_hrs)}:00:00")
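A minimal sketch of the per-perturbation approach described above: the user supplies a function mapping node count to walltime hours, and each perturbation's config gets its own walltime string. The names walltime_func and the inverse-scaling rule (mirroring the notebook line above) are illustrative assumptions, not the actual PayuManager API.

```python
from datetime import timedelta


def walltime_func(num_nodes: float) -> float:
    # Hypothetical user-supplied rule: 1.5 hours for a 4-node run,
    # scaled inversely with node count (as in the notebook snippet above).
    return 1.5 * 4.0 / num_nodes


# Each perturbation's config.yaml would get its own walltime string:
walltime = str(timedelta(hours=walltime_func(2.0)))
print(walltime)  # 3:00:00
```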

Member Author:

To be clear, one could just pass it as part of the control_options, but then it would always be the same for all perturbations.

Collaborator:

I am not sure why the janky logic of increasing the walltime with the number of scaling experiments actually worked 😅 - I might have had to edit it by hand afterwards

Member Author:

Hopefully the new implementation will be less janky 😄

    # Verify the configuration passed to ExperimentGenerator has correct walltime
    call_args = mock_experiment_generator.call_args[0][0]
    assert (
        call_args["Perturbation_Experiment"]["Experiment_1"]["config.yaml"]["walltime"] == "5:00:00"
Collaborator:

If you split out num_nodes_list into a variable above, then you can use that variable in the manager.generate_scaling_experiments call, as well as here with calls to walltime_func for each element in that list.

Member Author:

Yes, but that requires further manipulation of the result of the function, just like what's done in the PayuManager code:

                pert_config["config.yaml"]["walltime"] = str(timedelta(hours=walltime_hrs))

So the tests would be:

    assert (
        call_args["Perturbation_Experiment"]["Experiment_1"]["config.yaml"]["walltime"] == str(timedelta(hours=walltime_func(2.0)))
    )

which then doesn't check that str(timedelta(hours=walltime_func(2.0))) is doing what it's supposed to do.

Collaborator:

I can see it both ways. I like making changes in one place and having that update everywhere; however, I can also see that, since this is in a "tests" context, it makes sense to have duplicated logic.

Member Author:

Yes, I agree that keeping the code DRY is a good thing, but it's not always clear how far to take that in tests. I think a good rule of thumb is to not include any of the code being tested in the unit test itself.
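To make the rule of thumb concrete, here is a small illustrative sketch (format_walltime is a hypothetical simplification of the conversion under discussion, not actual PayuManager code):

```python
from datetime import timedelta


def format_walltime(hours: float) -> str:
    # Simplified stand-in for the conversion the code under test applies.
    return str(timedelta(hours=hours))


# Recomputing the expectation with the code under test proves nothing:
assert format_walltime(5.0) == str(timedelta(hours=5.0))  # tautological

# Hard-coding the expected value actually exercises the conversion:
assert format_walltime(5.0) == "5:00:00"
```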

…ake a list of node counts, generate layouts that match the user-supplied criteria, and create the corresponding experiments.

Currently the method still has some code that is somewhat ESM1.6-specific (e.g., the branch naming scheme), but we need to support more models before deciding on the best way to generalise it.

Deleted the generate_experiments method, as it should not be needed anymore. If other types of generation are needed (i.e., not scaling tests), specific methods should be implemented for those.
@manodeep
Collaborator

@micaeljtoliveira Thanks for the clarifications - is this all finalised now?

@micaeljtoliveira
Member Author

@manodeep Yes, all finalised from my side. And thanks for the review.

@micaeljtoliveira
Member Author

I forgot to say, I did test this as best I could by making sure it was producing the same perturbations as the ESM1.6 scaling notebook, but I couldn't actually run those perturbation experiments (the corresponding executables are gone...). That means that, when we update the notebook, we'll have to check the results carefully.

@micaeljtoliveira micaeljtoliveira merged commit 20576d1 into main Feb 17, 2026
8 checks passed
@micaeljtoliveira micaeljtoliveira deleted the generate_experiments branch February 17, 2026 02:52