AMBER GPU Compilation Notes
These notes document the local AMBER GPU patches used for BATTER runs. Apply
the AMBER26 netfrc patch first because it affects both NVIDIA CUDA and AMD
HIP GPU GTI/TI runs. The AMD section below is limited to GTI/HIP runtime
stability patches.
AMBER26 netfrc Patch
This AMBER26 GPU GTI/TI issue is not AMD-specific: it can affect both NVIDIA CUDA and AMD HIP builds.
netfrc is the PME net-force correction switch in the &ewald namelist:
netfrc = 1 removes the average nonbonded/PME force offset;
netfrc = 0 leaves the forces exactly as computed.
AMBER24 defaulted to netfrc = 1 for MD runs whenever imin = 0. AMBER26
changed the default so that restrained runs with ntr = 1 use
netfrc = 0:
if (imin .eq. 0 .and. ntr .eq. 0) then
  netfrc = 1
else
  netfrc = 0
end if
For GPU GTI softcore/TI, this can destabilize the force finalization path. In
the failing BATTER case, the run eventually reported an illegal memory access
while copying the 42-term energy buffer, but the trigger was the AMBER26
ntr = 1 default selecting netfrc = 0.
The local AMBER26 patch is in:
pmemd26_src/src/pmemd/src/mdin_ewald_dat.F90
It keeps the AMBER26 default for non-GTI and non-TI restrained runs, but restores the AMBER24-style MD default for CUDA/HIP GTI TI runs:
if (netfrc .lt. 0) then
#if defined(CUDA) && defined(GTI)
  ! GPU GTI force finalization is sensitive to disabling the PME net-force
  ! correction. Keep the normal MD default for TI even when ntr is set.
  if (imin .eq. 0 .and. (ntr .eq. 0 .or. icfe .ne. 0)) then
#else
  if (imin .eq. 0 .and. ntr .eq. 0) then
#endif
    netfrc = 1
  else
    netfrc = 0
  end if
end if
The warning for netfrc == 1 with restraints is also narrowed so GPU GTI TI
does not warn for this intentional default, while minimization and non-TI
restrained runs still warn.
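The three default rules described above can be compared side by side. The following is a minimal host-side C++ sketch, not AMBER code: the names `Rule`, `default_netfrc`, and `gpu_gti` are hypothetical, `gpu_gti` stands in for a build with both `CUDA` and `GTI` defined (assuming HIP builds also define `CUDA`), and the AMBER24 value of 0 for minimization is an assumption, since these notes only state the AMBER24 default for `imin = 0`.

```cpp
#include <cassert>

// Hypothetical labels for the three default rules discussed in these notes.
enum class Rule { Amber24, Amber26, Amber26Patched };

// Returns the netfrc value selected when the user leaves netfrc unset.
// gpu_gti is a stand-in for "compiled with CUDA and GTI defined".
int default_netfrc(Rule rule, int imin, int ntr, int icfe, bool gpu_gti) {
    const bool md = (imin == 0);
    switch (rule) {
    case Rule::Amber24:
        // Assumption: minimization (imin != 0) defaulted to 0 in AMBER24.
        return md ? 1 : 0;
    case Rule::Amber26:
        // Restrained runs (ntr != 0) now fall through to netfrc = 0.
        return (md && ntr == 0) ? 1 : 0;
    case Rule::Amber26Patched:
        if (gpu_gti) {
            // TI runs (icfe != 0) keep netfrc = 1 even when ntr is set.
            return (md && (ntr == 0 || icfe != 0)) ? 1 : 0;
        }
        // Builds without CUDA and GTI keep the AMBER26 rule.
        return (md && ntr == 0) ? 1 : 0;
    }
    return 0;  // not reached; keeps compilers quiet
}
```

For the failing case described above (`imin = 0`, `ntr = 1`, TI enabled, GPU GTI build), the sketch returns 1 for the AMBER24 rule, 0 for the AMBER26 rule, and 1 again for the patched rule.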
If using an unpatched AMBER26 build, add this namelist to affected BATTER input files as a workaround:
&ewald
  netfrc = 1,
/
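When many input files need the workaround, the edit can be scripted. The following is a minimal C++ sketch of the text transformation only; `add_netfrc_workaround` is a hypothetical helper, not part of AMBER or BATTER. It assumes that appending the namelist at the end of the file is acceptable for the file's layout, so inputs with trailing group cards should be checked by hand.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: takes the text of an mdin file and returns it with an
// &ewald namelist setting netfrc = 1 appended. If the text already contains
// an &ewald group (in any letter case), it is returned unchanged so that the
// existing group can be edited by hand instead.
std::string add_netfrc_workaround(const std::string& mdin) {
    std::string lower(mdin);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    if (lower.find("&ewald") != std::string::npos) {
        return mdin;
    }

    std::string out(mdin);
    // Make sure the new namelist starts on its own line.
    if (!out.empty() && out.back() != '\n') {
        out += '\n';
    }
    out += " &ewald\n   netfrc = 1,\n /\n";
    return out;
}
```

Leaving files with an existing `&ewald` group untouched is deliberate: merging into a user-written namelist safely would need a real namelist parser, which is outside the scope of this sketch.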
AMD GPU GTI/HIP Runtime Patches
These patches are for ROCm/HIP stability in GTI/TI runs. The observed symptoms
were HSA memory aperture violations, illegal memory accesses, or follow-on
hipGetDevice failures during TI/softcore simulations.
gti_cuda.cu
In ik_BuildTINBList, keep the launch shape for the GTI neighbor-list phases at:
threadsPerBlock = 128;
blocksToUse = gpu->blocks;
This matches the older working pmemd24 HIP behavior. On Frontier/ROCm, the larger architecture-dependent launch shapes could fail later in kernels such as kCalculateTIKineticEnergy_kernel even though the original fault came from GTI neighbor-list construction.
gti_general_kernels.cu
In vec_sync, keep the combinedMode cases split explicitly. The ROCm-sensitive case is combinedMode == 2: copy a0 into temporaries before writing a1.
T vx = pVector[a0];
T vy = pVector[a0 + cSim.stride];
T vz = pVector[a0 + cSim.stride2];
pVector[a1] = vx;
pVector[a1 + cSim.stride] = vy;
pVector[a1 + cSim.stride2] = vz;
This preserves the intended V0 -> V1 sync while avoiding the old shared branch that faulted in kgSyncVector_kernel under ROCm.
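The copy-through-temporaries pattern for combinedMode == 2 can be tried on the host. The following is a minimal C++ sketch under stated assumptions, not the actual kernel: `sync_v0_to_v1` is a hypothetical name, a `std::vector` stands in for the device array `pVector`, and `stride` and `stride2` are passed as plain arguments in place of `cSim.stride` and `cSim.stride2`. It does not reproduce how the kernel derives `a0` and `a1`, nor the other combinedMode branches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for the combinedMode == 2 branch of vec_sync.
// The x, y, and z components of an entry live at index, index + stride,
// and index + stride2 respectively.
template <typename T>
void sync_v0_to_v1(std::vector<T>& pVector, std::size_t a0, std::size_t a1,
                   std::size_t stride, std::size_t stride2) {
    assert(a0 + stride2 < pVector.size());
    assert(a1 + stride2 < pVector.size());

    // Read all three source components into temporaries first ...
    T vx = pVector[a0];
    T vy = pVector[a0 + stride];
    T vz = pVector[a0 + stride2];

    // ... then write them to the destination entry.
    pVector[a1] = vx;
    pVector[a1 + stride] = vy;
    pVector[a1 + stride2] = vz;
}
```

With `stride = 4` and `stride2 = 8`, calling `sync_v0_to_v1(v, 0, 2, 4, 8)` copies the values at indices 0, 4, and 8 to indices 2, 6, and 10 and leaves the source entry unchanged.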
These AMD/HIP runtime patches should not be applied blindly to NVIDIA-specific
launch tuning. The netfrc patch above is the cross-vendor AMBER26 fix; the
GTI/HIP launch and vec_sync changes are ROCm stability fixes.