
EasyBuild does not unload modules before creating / uploading test report #5086

@Thyre


Recently, I've been seeing more and more spurious EasyBuild failures when using --upload-test-report. Symptoms include EasyBuild being unable to access my GitHub token, or this crash:

== COMPLETED: Installation ended successfully (took 9 mins 17 secs)
== Results of the build can be found in the log file(s) /data/EasyBuild-develop/software/X11/20250608-GCCcore-14.3.0/easybuild/easybuild-X11-20250608-20260109.184743.log

EasyBuild crashed! Please consider reporting a bug, this should not happen...

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 859, in <module>
    main_with_hooks()
    ~~~~~~~~~~~~~~~^^
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 844, in main_with_hooks
    exit_code: EasyBuildExit = main(args=args, prepared_cfg_data=(init_session_state, eb_go, cfg_settings))
                               ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 795, in main
    is_successful = process_eb_args(orig_paths, eb_go, cfg_settings, modtool, testing, init_session_state,
                                    hooks, do_build)
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 629, in process_eb_args
    test_report_msg = overall_test_report(ecs_with_res, len(paths), overall_success, success_msg, init_session_state)
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/testing.py", line 459, in overall_test_report
    txt = post_pr_test_report(pr_nrs, GITHUB_EASYCONFIGS_REPO, test_report, msg, init_session_state,
                              success)
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/testing.py", line 372, in post_pr_test_report
    gpu_info = get_gpu_info()
  File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/systemtools.py", line 746, in get_gpu_info
    amd_driver = res.output.strip().split('\n')[1].split(',')[1]
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

A test report is uploaded (see e.g. https://gist.github.com/Thyre/777493e6e4311a48e7dffa9095ce396e), but generating the PR comment fails. I suspected that some modules might still be loaded when EasyBuild generates the test report, because the system-level AMD tool amd-smi can fail when other modules are loaded:

[jreuter@Linux ~]$ ml
No modules loaded
[jreuter@Linux ~]$ amd-smi version
AMDSMI Tool: 26.2.0+unknown | AMDSMI Library version: 26.2.0 | ROCm version: 7.1.1 | amdgpu version: Linuxversion6.18.3-arch1-1(linux@archlinux)(gcc(GCC)15.2.120251112,GNUld(GNUBinutils)2.45.1)#1SMPPREEMPT_DYNAMICFri,02Jan202617:52:55+0000 | amd_hsmp version: N/A
[jreuter@Linux ~]$ ml GCCcore/14.3.0 Python/3.13.5
[jreuter@Linux ~]$ amd-smi version
Unhandled import error: No module named 'amdsmi'
Failed to import the amdsmi Python library. Ensure it is installed in Python.
Alternatively, verify that the library is in the path:
/opt/rocm/libexec/amdsmi_cli/../../share/amd_smi
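The ml check in the terminal session above can also be done from Python without spawning a shell: both Lmod and Environment Modules keep the names of the loaded modules in the colon-separated LOADEDMODULES environment variable. A minimal sketch; loaded_modules is a hypothetical helper name, not part of EasyBuild:

```python
import os
from typing import List, Mapping, Optional


def loaded_modules(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the modules listed in LOADEDMODULES (colon-separated), or [] if none.

    Hypothetical helper: relies on the modules tool exporting LOADEDMODULES.
    """
    env = os.environ if env is None else env
    return [name for name in env.get("LOADEDMODULES", "").split(":") if name]


if __name__ == "__main__":
    mods = loaded_modules()
    if mods:
        print("modules still loaded:", ", ".join(mods))
    else:
        print("no modules loaded")
```

Calling this right before running amd-smi or rocm-smi would show the same state as the module list output in the log below.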

So I added a module list call right before these commands are executed. Lo and behold, modules are still loaded:

== 2026-01-09 18:47:44,152 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,152 run.py:518 INFO Running 'module ...' shell command in /home/jreuter:
        module list
== 2026-01-09 18:47:44,229 run.py:632 INFO 'module ...' shell command completed successfully
== 2026-01-09 18:47:44,229 run.py:634 INFO Output of 'module ...' shell command (stdout + stderr):
Currently Loaded Modules:
  1) GCCcore/14.3.0   5) Automake/1.18        9) XZ/5.8.1        13) gettext/0.25   17) Perl-bundle-CPAN/5.40.2  21) SQLite/3.50.1  25) Ninja/1.13.0     29) fontconfig/2.17.0  33) zlib/1.3.1
  2) M4/1.4.20        6) libtool/2.5.4       10) libxml2/2.14.3  14) pkgconf/2.4.3  18) intltool/0.51.0          22) libffi/3.5.1   26) Doxygen/1.14.0   30) libpng/1.6.50      34) xorg-macros/1.20.2
  3) Perl/5.40.2      7) Autotools/20250527  11) ncurses/6.5     15) expat/2.7.1    19) libtommath/1.3.0         23) Python/3.13.5  27) bzip2/1.0.8      31) Brotli/1.1.0       35) libpciaccess/0.18.1
  4) Autoconf/2.72    8) Bison/3.8.2         12) libiconv/1.18   16) OpenSSL/3      20) Tcl/9.0.1                24) Meson/1.8.2    28) util-linux/2.41  32) freetype/2.13.3

== 2026-01-09 18:47:44,229 systemtools.py:703 INFO loaded modules: Currently Loaded Modules:
  1) GCCcore/14.3.0   5) Automake/1.18        9) XZ/5.8.1        13) gettext/0.25   17) Perl-bundle-CPAN/5.40.2  21) SQLite/3.50.1  25) Ninja/1.13.0     29) fontconfig/2.17.0  33) zlib/1.3.1
  2) M4/1.4.20        6) libtool/2.5.4       10) libxml2/2.14.3  14) pkgconf/2.4.3  18) intltool/0.51.0          22) libffi/3.5.1   26) Doxygen/1.14.0   30) libpng/1.6.50      34) xorg-macros/1.20.2
  3) Perl/5.40.2      7) Autotools/20250527  11) ncurses/6.5     15) expat/2.7.1    19) libtommath/1.3.0         23) Python/3.13.5  27) bzip2/1.0.8      31) Brotli/1.1.0       35) libpciaccess/0.18.1
  4) Autoconf/2.72    8) Bison/3.8.2         12) libiconv/1.18   16) OpenSSL/3      20) Tcl/9.0.1                24) Meson/1.8.2    28) util-linux/2.41  32) freetype/2.13.3
== 2026-01-09 18:47:44,230 filetools.py:579 INFO Command amd-smi found at /opt/rocm/bin/amd-smi
== 2026-01-09 18:47:44,230 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,230 run.py:518 INFO Running 'amd-smi ...' shell command in /home/jreuter:
        amd-smi static --driver --board --asic --csv
== 2026-01-09 18:47:44,269 run.py:636 WARNING 'amd-smi ...' shell command FAILED (exit code 1)
== 2026-01-09 18:47:44,269 run.py:637 INFO Output of 'amd-smi ...' shell command (stdout + stderr):
Unhandled import error: No module named 'amdsmi'
Failed to import the amdsmi Python library. Ensure it is installed in Python.
Alternatively, verify that the library is in the path:
/opt/rocm/libexec/amdsmi_cli/../../share/amd_smi

== 2026-01-09 18:47:44,269 filetools.py:579 INFO Command rocm-smi found at /opt/rocm/bin/rocm-smi
== 2026-01-09 18:47:44,269 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,269 run.py:518 INFO Running 'rocm-smi ...' shell command in /home/jreuter:
        rocm-smi --showdriverversion --csv
== 2026-01-09 18:47:44,362 run.py:632 INFO 'rocm-smi ...' shell command completed successfully
== 2026-01-09 18:47:44,363 run.py:634 INFO Output of 'rocm-smi ...' shell command (stdout + stderr):
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status

name, value
"Driver version", "6.18.3-arch1-1"

rocm-smi itself still succeeds, but the low-power warning is not separated from the CSV output, and EasyBuild's parsing chokes on it.
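Assuming the expression from the traceback (systemtools.py, line 746) is applied to the rocm-smi output above, the crash reproduces: after strip() and split('\n'), index 1 is the empty line between the warning and the CSV header, so split(',') yields a single element and [1] raises IndexError. The sketch below reproduces this and shows one more tolerant way to parse. parse_driver_version is a hypothetical helper, not EasyBuild code, and it assumes the header line is always "name, value" as in this log.

```python
import csv
from typing import Optional

# Combined stdout + stderr of 'rocm-smi --showdriverversion --csv', copied from the log above.
ROCM_SMI_OUTPUT = (
    "WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status\n"
    "\n"
    "name, value\n"
    '"Driver version", "6.18.3-arch1-1"\n'
)


def naive_driver_version(output: str) -> str:
    """Same indexing as in the traceback: assumes the value is on the second line."""
    return output.strip().split('\n')[1].split(',')[1]


def parse_driver_version(output: str) -> Optional[str]:
    """Hypothetical, more tolerant parser: ignore everything before the 'name, value' header."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for idx, line in enumerate(lines):
        if line.replace(' ', '').lower() == 'name,value':
            # Parse the remaining lines as CSV; skipinitialspace handles the ', "..."' layout.
            for row in csv.reader(lines[idx + 1:], skipinitialspace=True):
                if len(row) >= 2 and row[0] == 'Driver version':
                    return row[1]
            return None
    return None


if __name__ == "__main__":
    try:
        naive_driver_version(ROCM_SMI_OUTPUT)
    except IndexError as err:
        print("naive parser:", err)  # naive parser: list index out of range
    print(parse_driver_version(ROCM_SMI_OUTPUT))  # 6.18.3-arch1-1
```

Locating the header instead of assuming a fixed line number keeps the parser working whether or not rocm-smi prints a warning first.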

In general, I think EasyBuild should no longer have any modules loaded when it determines the system information, since the loaded modules can interfere with the commands it executes. Or am I missing something here?
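One way to act on this, sketched under the assumption that the environment at startup is free of build modules: snapshot os.environ once when the tool starts, and run system-information commands such as amd-smi or rocm-smi in that snapshot instead of the current environment. STARTUP_ENV, run_in_env and HYPOTHETICAL_MODULE_VAR are illustrative names, not EasyBuild API.

```python
import os
import subprocess
import sys
from typing import List, Mapping

# Hypothetical: environment snapshot taken once at startup, before any
# build dependencies are loaded as modules (assumption: this state is clean).
STARTUP_ENV = dict(os.environ)


def run_in_env(cmd: List[str], env: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Run a system-information command in the given environment instead of os.environ.

    stdout and stderr are captured separately, so anything a tool writes to
    stderr stays out of the text that gets parsed.
    """
    return subprocess.run(cmd, env=dict(env), capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Simulate a module being loaded after startup by changing the environment.
    os.environ["HYPOTHETICAL_MODULE_VAR"] = "loaded"
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('HYPOTHETICAL_MODULE_VAR', 'unset'))"]
    print(run_in_env(probe, os.environ).stdout.strip())   # loaded
    print(run_in_env(probe, STARTUP_ENV).stdout.strip())  # unset
```

Restoring the full startup environment is a blunt instrument; unloading the build modules through the modules tool before collecting system information would be the more targeted option.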
