
Conversation


@vtripath65 vtripath65 commented Jan 12, 2026

The CUDA_DIAG function was being called to perform both symmetric orthogonalization and diagonalization, and it handled both the Fock matrix and the overlap matrix. For overlap matrix diagonalization, an identity matrix was passed as the transformation matrix, since the overlap matrix should not be orthogonalized.

This PR fixes the issue by splitting CUDA_DIAG into CUDA_DIAG and Fock_DIAG: CUDA_DIAG is called only to diagonalize a matrix, while Fock_DIAG is called when a matrix needs to be orthogonalized and then diagonalized. Fock_DIAG calls CUDA_DIAG for the diagonalization step.

This PR also prepares us for a shift to canonical orthogonalization for handling near-linear dependencies in the AO basis.
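
As a rough illustration of that split (a sketch only: the subroutine name, the argument list, and the use of LAPACK's dsyev as a stand-in for the GPU diagonalization are assumptions, not the actual QUICK interface), the orthogonalize-then-diagonalize path looks like this:

! Sketch only: X is the transformation matrix (e.g. S**(-1/2)), F the Fock
! matrix, E the eigenvalues, C the MO coefficients. Argument lists differ
! from the real QUICK routines.
subroutine fock_diag_sketch(F, X, E, C, n)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in)  :: F(n,n), X(n,n)
  double precision, intent(out) :: E(n), C(n,n)
  double precision :: tmp(n,n), Fp(n,n), work(3*n-1)
  integer :: info

  ! F' = X**T * F * X  (symmetric orthogonalization of the Fock matrix)
  call dgemm('N', 'N', n, n, n, 1.0d0, F, n, X, n, 0.0d0, tmp, n)
  call dgemm('T', 'N', n, n, n, 1.0d0, X, n, tmp, n, 0.0d0, Fp, n)

  ! Diagonalize F' only; this is the step CUDA_DIAG now handles by itself
  ! (LAPACK's dsyev stands in for it here). Eigenvectors overwrite Fp.
  call dsyev('V', 'U', n, Fp, n, E, work, 3*n-1, info)

  ! Back-transform the eigenvectors to the original basis: C = X * C'
  call dgemm('N', 'N', n, n, n, 1.0d0, X, n, Fp, n, 0.0d0, C, n)
end subroutine fock_diag_sketch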

@vtripath65 vtripath65 requested a review from ohearnk January 12, 2026 22:03

@agoetz agoetz left a comment


Some things to check, and improvements to be made.

quick_qm_struct%E, quick_qm_struct%idegen, &
quick_qm_struct%vec, quick_qm_struct%co, &
V2, nbasis)
call fock_diag(quick_qm_struct%o, quick_qm_struct%x, &
Collaborator

I think we should remove code duplication between HIP and CUDA here. We should either

  1. have a function GPU_FOCK_DIAG which is defined as CUDA_FOCK_DIAG or HIP_FOCK_DIAG (similar to GPU_DGEMM), or
  2. use GPU_DGEMM (twice) and have a new function GPU_DIAG which is defined as cudaDIAG, rocDIAG, or magmaDIAG.

We should have GPU_DIAG in any case. Option 1 above would also simplify the code here.
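
A minimal sketch of such compile-time dispatch, in the spirit of GPU_DGEMM (the HIP and MAGMA guard names are guesses; only CUDA and CUDA_MPIV appear in the code under review):

/* Sketch only: select the diagonalization backend at build time,
 * mirroring a GPU_DGEMM-style macro. HIP/HIP_MPIV/MAGMA guards are
 * assumptions, not the actual QUICK build flags. */
#if defined(CUDA) || defined(CUDA_MPIV)
  #define GPU_DIAG cudaDIAG
#elif defined(HIP) || defined(HIP_MPIV)
  #if defined(MAGMA)
    #define GPU_DIAG magmaDIAG
  #else
    #define GPU_DIAG rocDIAG
  #endif
#endif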

Collaborator Author

MatDiag.f90 in the subs directory is now called for diagonalization by both the CPU and GPU code paths.

Collaborator Author

This simplifies the code significantly.

@@ -243,8 +243,7 @@ subroutine fullx
RECORD_TIME(timer_begin%T1eSD)

#if defined(CUDA) || defined(CUDA_MPIV)
Collaborator

We should have a function GPU_DIAG similar to GPU_DGEMM

{
int ka, kb;
cudaError_t err1, err2, err3;
cublasStatus_t stat1, stat2, stat3;
Collaborator

Are we using stat1, stat2, stat3 any more?

Collaborator Author

No, we are not using the stats. Just checking the errors.

Collaborator Author

The fock_diag function does not exist anymore.

stat1=cublasSetMatrix(dim,dim,sizeof(devPtr_o[0]),o,dim,devPtr_o,dim);
stat2=cublasSetMatrix(dim,dim,sizeof(devPtr_x[0]),x,dim,devPtr_x,dim);
stat3=cublasSetMatrix(dim,dim,sizeof(devPtr_hold[0]),hold,dim,devPtr_hold,dim);
err1 = cudaMemcpy(devPtr_o, o, sizeof(double)*dim*dim, cudaMemcpyHostToDevice);
Collaborator

When replacing cublasSetMatrix with cudaMemcpy, does this guarantee the same row/column order mapping (Fortran vs C)? I am not sure.

Collaborator Author

I am not sure how I would check this with our setup. All the matrices we pass to this routine are symmetric.

PS - All tests are passing.

Collaborator Author

I did some tests to check whether switching to cudaMemcpy would affect asymmetric matrices.

Since Fortran uses column-major storage, there will not be any issue. There would be an issue with row-major arrays, so if we ever create an array in C/C++, we cannot use this routine as is.
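
For a full, contiguous column-major dim-by-dim matrix (leading dimension equal to dim), the two host-to-device copies move exactly the same bytes; they would differ only for a sub-matrix or for a row-major array built in C/C++. Using the variable names from the diff above:

// Byte-for-byte equivalent for a contiguous column-major matrix with lda == dim
cublasSetMatrix(dim, dim, sizeof(double), o, dim, devPtr_o, dim);
cudaMemcpy(devPtr_o, o, sizeof(double) * dim * dim, cudaMemcpyHostToDevice);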


ohearnk commented Jan 20, 2026

@vtripath65, @agoetz -- Thanks for working on this. Overall, this is moving in the right direction. I'd like to make a few more changes (e.g., further simplify CPU and GPU codepaths for diagonalization-related routines, backport CUDA changes to HIP codes, etc.).

Give me a few days to make the changes and test them.


@vtripath65 vtripath65 closed this Jan 21, 2026
@vtripath65 vtripath65 reopened this Jan 21, 2026
@vtripath65
Collaborator Author

@ohearnk -- Please let me know if you need any help from me.

@ohearnk ohearnk changed the title from "CUDA_DIAG function call simplified" to "Refactor matrix operation interfaces (eigen-decomposition, matrix-matrix multiplication)" Jan 26, 2026

agoetz commented Feb 3, 2026

Is this ready? Any idea why tests are cancelled?
