ROCm-aware (AMDGPU) MPI multi-GPU test
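Running the test assumes MPI.jl is configured against a ROCm-aware system MPI; the default JLL-provided MPI binaries are generally not GPU-aware. A minimal setup sketch (an assumption, not part of the original gist), using MPIPreferences.jl with a suitable system MPI already on the PATH:

using MPIPreferences
MPIPreferences.use_system_binary()  # record the system MPI in LocalPreferences.toml so MPI.jl picks it up

The test script itself: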
using MPI
using AMDGPU
MPI.Init()
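# optional sanity check (assumes a recent MPI.jl providing `MPI.has_rocm()`):
# uncomment to verify that the underlying MPI library was built with ROCm support
# MPI.has_rocm() || error("the MPI library is not ROCm-aware")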
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
# select device: use a node-local communicator to retrieve the node-local rank
comm_l = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_l = MPI.Comm_rank(comm_l)
device = AMDGPU.device_id!(rank_l+1)
# use the default device instead if the scheduler exposes a different GPU to each rank (e.g. SLURM `--gpus-per-task=1`)
# device = AMDGPU.device_id!(1)
gpu_id = AMDGPU.device_id(AMDGPU.device())
# set up a ring exchange: each rank sends to the next rank and receives from the previous one
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
println("rank=$rank rank_loc=$rank_l (gpu_id=$gpu_id - $device), size=$size, dst=$dst, src=$src")
N = 4
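# allocate the message buffers directly in GPU memory so ROCm-aware MPI can operate on device pointers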
send_mesg = ROCArray{Float64}(undef, N)
recv_mesg = ROCArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))
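# make sure the fill! kernel has completed before the device buffer is handed to MPI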
AMDGPU.synchronize()
rank==0 && println("start sending...")
MPI.Sendrecv!(send_mesg, recv_mesg, comm; dest=dst, sendtag=0, source=src, recvtag=0)
println("recv_mesg on proc $rank: $recv_mesg")
rank==0 && println("done.")
luraess commented Dec 11, 2025

Reporting issues

Please report any questions or issues you encounter related to GPU-aware MPI either on the Julia at Scale Discourse category or as an issue on MPI.jl.
