Description
Recently, I've been getting more and more spurious failures of EasyBuild when using --upload-test-report. The errors include EasyBuild failing to access my GitHub token, or this crash:
== COMPLETED: Installation ended successfully (took 9 mins 17 secs)
== Results of the build can be found in the log file(s) /data/EasyBuild-develop/software/X11/20250608-GCCcore-14.3.0/easybuild/easybuild-X11-20250608-20260109.184743.log
EasyBuild crashed! Please consider reporting a bug, this should not happen...
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 859, in <module>
main_with_hooks()
~~~~~~~~~~~~~~~^^
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 844, in main_with_hooks
exit_code: EasyBuildExit = main(args=args, prepared_cfg_data=(init_session_state, eb_go, cfg_settings))
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 795, in main
is_successful = process_eb_args(orig_paths, eb_go, cfg_settings, modtool, testing, init_session_state,
hooks, do_build)
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/main.py", line 629, in process_eb_args
test_report_msg = overall_test_report(ecs_with_res, len(paths), overall_success, success_msg, init_session_state)
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/testing.py", line 459, in overall_test_report
txt = post_pr_test_report(pr_nrs, GITHUB_EASYCONFIGS_REPO, test_report, msg, init_session_state,
success)
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/testing.py", line 372, in post_pr_test_report
gpu_info = get_gpu_info()
File "/data/EasyBuild/lib/python3.13/site-packages/easybuild/tools/systemtools.py", line 746, in get_gpu_info
amd_driver = res.output.strip().split('\n')[1].split(',')[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
The test report is uploaded (see e.g. https://gist.github.com/Thyre/777493e6e4311a48e7dffa9095ce396e), but generating the comment fails. My suspicion was that some modules might still be loaded when EasyBuild generates the test report, since the system-level AMD tool amd-smi can fail when other modules are loaded:
[jreuter@Linux ~]$ ml
No modules loaded
[jreuter@Linux ~]$ amd-smi version
AMDSMI Tool: 26.2.0+unknown | AMDSMI Library version: 26.2.0 | ROCm version: 7.1.1 | amdgpu version: Linuxversion6.18.3-arch1-1(linux@archlinux)(gcc(GCC)15.2.120251112,GNUld(GNUBinutils)2.45.1)#1SMPPREEMPT_DYNAMICFri,02Jan202617:52:55+0000 | amd_hsmp version: N/A
[jreuter@Linux ~]$ ml GCCcore/14.3.0 Python/3.13.5
[jreuter@Linux ~]$ amd-smi version
Unhandled import error: No module named 'amdsmi'
Failed to import the amdsmi Python library. Ensure it is installed in Python.
Alternatively, verify that the library is in the path:
/opt/rocm/libexec/amdsmi_cli/../../share/amd_smi
So I added a "module list" call right before these commands are executed. Lo and behold, modules are still loaded:
== 2026-01-09 18:47:44,152 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,152 run.py:518 INFO Running 'module ...' shell command in /home/jreuter:
module list
== 2026-01-09 18:47:44,229 run.py:632 INFO 'module ...' shell command completed successfully
== 2026-01-09 18:47:44,229 run.py:634 INFO Output of 'module ...' shell command (stdout + stderr):
Currently Loaded Modules:
1) GCCcore/14.3.0 5) Automake/1.18 9) XZ/5.8.1 13) gettext/0.25 17) Perl-bundle-CPAN/5.40.2 21) SQLite/3.50.1 25) Ninja/1.13.0 29) fontconfig/2.17.0 33) zlib/1.3.1
2) M4/1.4.20 6) libtool/2.5.4 10) libxml2/2.14.3 14) pkgconf/2.4.3 18) intltool/0.51.0 22) libffi/3.5.1 26) Doxygen/1.14.0 30) libpng/1.6.50 34) xorg-macros/1.20.2
3) Perl/5.40.2 7) Autotools/20250527 11) ncurses/6.5 15) expat/2.7.1 19) libtommath/1.3.0 23) Python/3.13.5 27) bzip2/1.0.8 31) Brotli/1.1.0 35) libpciaccess/0.18.1
4) Autoconf/2.72 8) Bison/3.8.2 12) libiconv/1.18 16) OpenSSL/3 20) Tcl/9.0.1 24) Meson/1.8.2 28) util-linux/2.41 32) freetype/2.13.3
== 2026-01-09 18:47:44,229 systemtools.py:703 INFO loaded modules: Currently Loaded Modules:
1) GCCcore/14.3.0 5) Automake/1.18 9) XZ/5.8.1 13) gettext/0.25 17) Perl-bundle-CPAN/5.40.2 21) SQLite/3.50.1 25) Ninja/1.13.0 29) fontconfig/2.17.0 33) zlib/1.3.1
2) M4/1.4.20 6) libtool/2.5.4 10) libxml2/2.14.3 14) pkgconf/2.4.3 18) intltool/0.51.0 22) libffi/3.5.1 26) Doxygen/1.14.0 30) libpng/1.6.50 34) xorg-macros/1.20.2
3) Perl/5.40.2 7) Autotools/20250527 11) ncurses/6.5 15) expat/2.7.1 19) libtommath/1.3.0 23) Python/3.13.5 27) bzip2/1.0.8 31) Brotli/1.1.0 35) libpciaccess/0.18.1
4) Autoconf/2.72 8) Bison/3.8.2 12) libiconv/1.18 16) OpenSSL/3 20) Tcl/9.0.1 24) Meson/1.8.2 28) util-linux/2.41 32) freetype/2.13.3
== 2026-01-09 18:47:44,230 filetools.py:579 INFO Command amd-smi found at /opt/rocm/bin/amd-smi
== 2026-01-09 18:47:44,230 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,230 run.py:518 INFO Running 'amd-smi ...' shell command in /home/jreuter:
amd-smi static --driver --board --asic --csv
== 2026-01-09 18:47:44,269 run.py:636 WARNING 'amd-smi ...' shell command FAILED (exit code 1)
== 2026-01-09 18:47:44,269 run.py:637 INFO Output of 'amd-smi ...' shell command (stdout + stderr):
Unhandled import error: No module named 'amdsmi'
Failed to import the amdsmi Python library. Ensure it is installed in Python.
Alternatively, verify that the library is in the path:
/opt/rocm/libexec/amdsmi_cli/../../share/amd_smi
== 2026-01-09 18:47:44,269 filetools.py:579 INFO Command rocm-smi found at /opt/rocm/bin/rocm-smi
== 2026-01-09 18:47:44,269 run.py:504 INFO Path to bash that will be used to run shell commands: /usr/bin/bash
== 2026-01-09 18:47:44,269 run.py:518 INFO Running 'rocm-smi ...' shell command in /home/jreuter:
rocm-smi --showdriverversion --csv
== 2026-01-09 18:47:44,362 run.py:632 INFO 'rocm-smi ...' shell command completed successfully
== 2026-01-09 18:47:44,363 run.py:634 INFO Output of 'rocm-smi ...' shell command (stdout + stderr):
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
name, value
"Driver version", "6.18.3-arch1-1"
rocm-smi itself still works, but EasyBuild chokes on its output because the warning is not separated from the CSV data that follows it.
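For comparison, here is a rough sketch of a more tolerant parser. It is a hypothetical helper (parse_smi_csv_field is not part of EasyBuild, and this is not the actual code in systemtools.py): instead of assuming the data sits on the second line of the output, it searches for the CSV header line and ignores anything printed before it, such as the low-power warning above.

```python
import csv
import io
from typing import Optional


def parse_smi_csv_field(output: str, column: str) -> Optional[str]:
    """Hypothetical helper (not part of EasyBuild): return the value of `column`
    from the first non-empty data row of CSV output, skipping any non-CSV lines
    (such as warnings) that appear before the header."""
    lines = output.strip().splitlines()
    # Find the header: the first line whose comma-separated fields contain `column`
    for idx, line in enumerate(lines):
        fields = [field.strip().strip('"') for field in line.split(',')]
        if column in fields:
            break
    else:
        return None  # no CSV header found, e.g. the tool failed to start

    # Parse from the header onwards; skipinitialspace handles ', "value"' style fields
    reader = csv.DictReader(io.StringIO('\n'.join(lines[idx:])), skipinitialspace=True)
    for row in reader:
        value = row.get(column)
        if value:
            return value.strip()
    return None


rocm_output = (
    'WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status\n'
    'name, value\n'
    '"Driver version", "6.18.3-arch1-1"\n'
)
print(parse_smi_csv_field(rocm_output, "value"))  # prints 6.18.3-arch1-1
```

Returning None instead of raising means a failing or chatty tool would only leave the GPU info empty in the comment rather than crash the whole run; the column name for amd-smi output would have to be checked against its actual CSV header.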
In general, I feel that EasyBuild should no longer have any modules loaded when it determines the system information, since they might interfere with the commands it executes. Or am I missing something here?
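One possible way to avoid the interference, sketched under the assumption that a copy of the environment from before any modules were loaded is still available (init_session_state, which the traceback shows being passed into post_pr_test_report, looks like a natural candidate, but that is not verified here), is to run the GPU query commands with that snapshot instead of the current os.environ. The helper run_in_env and the variable EB_SKETCH_VAR are hypothetical, and the sketch uses plain subprocess rather than EasyBuild's own run.py machinery. It also keeps stderr apart from stdout; whether rocm-smi prints its low-power warning to stderr is not checked here.

```python
import os
import subprocess
import sys
from typing import Mapping, Sequence


def run_in_env(cmd: Sequence[str], env_snapshot: Mapping[str, str]) -> subprocess.CompletedProcess:
    """Hypothetical helper (not part of EasyBuild): run `cmd` with exactly the
    variables in `env_snapshot`, e.g. a copy of os.environ taken at startup,
    before any modules were loaded."""
    return subprocess.run(
        list(cmd),
        env=dict(env_snapshot),   # ignore whatever loaded modules changed in os.environ
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # keep stderr out of the text that gets parsed
        text=True,
        check=False,              # caller decides how to handle a non-zero exit code
    )


if __name__ == "__main__":
    startup_env = dict(os.environ)                    # snapshot taken "at startup"
    os.environ["EB_SKETCH_VAR"] = "set-by-a-module"   # mimic a module changing the environment
    probe = [sys.executable, "-c", "import os; print(os.environ.get('EB_SKETCH_VAR', 'unset'))"]
    print(run_in_env(probe, startup_env).stdout.strip())
```

Unloading all modules before collecting system information would be the alternative; restoring a saved environment has the advantage of not touching the module state of the running session.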