This issue reports and tracks performance regressions with recent CUDA versions (>= v12.5.0) that were uncovered during recent benchmarking.
To confirm, see the following benchmarks: quick_benchmarks.xlsx
These benchmarks were run on a number of different platforms (Michigan State High Performance Computing Center, San Diego Supercomputer Center [Expanse], lab machines).
As for the performance regression itself, the timings in the logs point to the ERI kernels as the main culprit. Notably, the energies and gradients are all correct even with the newer CUDA versions, so the regression affects performance only, not correctness.