Full dumps are slow on distributed file systems when using recent versions of OpenMPI, which default to ompio for MPI-I/O.
MPICH and MPI implementations based on MPICH (e.g. IntelMPI) use romio and do not have this issue.
Setting the environment variables

OMPI_MCA_io=^ompio
OMPI_MCA_fs_ufs_lock_algorithm=3

or calling mpirun with --mca io ^ompio --mca fs_ufs_lock_algorithm 3 disables ompio, falls back to romio, and fixes the slow writes.
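In shell form, the two equivalent ways of applying this look roughly like the sketch below (bash assumed; ./my_app and the process count are placeholders, not from the original post):

```shell
# Option 1: export the MCA parameters as environment variables,
# then run the application as usual.
export OMPI_MCA_io=^ompio                  # disable the ompio MPI-I/O component
export OMPI_MCA_fs_ufs_lock_algorithm=3    # romio/ufs file-locking scheme
mpirun -np 4 ./my_app

# Option 2: pass the same parameters directly on the mpirun command line.
mpirun --mca io ^ompio --mca fs_ufs_lock_algorithm 3 -np 4 ./my_app
```

Depending on your shell, quoting the ^ompio value (e.g. '--mca io "^ompio"') can avoid surprises with special characters.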
From what I found out, this seems to be a known issue for (Fortran) applications using parallel HDF5 written for API v18 (see for example a note here or on this mailing list).
I thought I should share this information with other users.