Play a bit with local LLM models on a budget laptop while keeping a stable Linux base system; in my case, Fedora Linux.
- CPU: AMD Ryzen 5 Pro 7535U
- GPU: AMD Radeon 660M (integrated)
- RAM: 16/32 GB
Distrobox is ideal for testing or playing with Linux distros because it is lightweight and tightly integrated with the host machine.
sudo dnf install distrobox
distrobox create \
--name rocm-ubuntu \
--image ubuntu:22.04 \
--additional-flags "--device /dev/kfd --device /dev/dri"
distrobox enter rocm-ubuntu # add --verbose if you hit any issue during the first boot
sudo apt update
sudo apt install -y \
wget \
gnupg \
ca-certificates \
software-properties-common \
lsb-release
sudo apt install -y python3-pip python3-setuptools python3-wheel
wget https://repo.radeon.com/amdgpu-install/7.2/ubuntu/jammy/amdgpu-install_7.2.70200-1_all.deb
sudo dpkg -i amdgpu-install_7.2.70200-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm --no-dkms
export HSA_OVERRIDE_GFX_VERSION=10.3.0 # the Radeon 660M (gfx1035) has no official ROCm support, so present it as gfx1030; remember to add this to your .bashrc or .zshrc!
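To double-check what the ROCm runtime actually sees, rocminfo lists every agent and its gfx target; the grep below is just a convenience:
rocminfo | grep -i gfx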
mkdir -p ~/pip_tmp
export TMPDIR=~/pip_tmp
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1 --no-cache-dir
To verify the setup, you can run rocminfo and/or this Python script:
import torch

try:
    print(f"ROCm Version: {torch.version.hip}")
    if torch.cuda.is_available():
        # Try to actually move data to the GPU (this fails if binaries are missing)
        x = torch.tensor([1.0, 2.0, 3.0]).cuda()
        print(f"✅ SUCCESS! Tensor created on: {torch.cuda.get_device_name(0)}")
        print(x)
    else:
        print("❌ No GPU detected by PyTorch.")
except Exception as e:
    print(f"❌ CRASHED: {e}")
Inside the Ubuntu distrobox, install the Vulkan SDK:
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk
Now test it with:
vulkaninfo | head -20
Download and install the Ollama binary and libs
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
In another terminal, test a basic model.
distrobox enter rocm-ubuntu
ollama run llama3.2
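To confirm the model is actually offloaded to the iGPU rather than running on the CPU, you can ask Ollama where each loaded model lives (from another terminal inside the same distrobox):
ollama ps
The processor column should report GPU; if it only shows CPU, re-check the HSA_OVERRIDE_GFX_VERSION export.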
Note: if you want to compile llama.cpp directly on the Fedora Linux host (Vulkan backend), install the build dependencies:
sudo dnf install git make gcc-c++ vulkan-headers vulkan-loader-devel libshaderc-devel glslc glslang cmake ninja-build
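For that Vulkan build the CMake flags are much simpler than the ROCm ones shown next; a minimal sketch, assuming the repository is already cloned as below:
cmake -B build-vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)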
I get slightly better performance (not by much) with llama.cpp than with Ollama; the llama-bench sketch after the build commands below is an easy way to measure it on your own hardware.
git clone --depth 1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make clean
export ROCM_PATH=/opt/rocm-7.2.0
export HIP_PATH=${ROCM_PATH}
export HSA_PATH=${ROCM_PATH}
export LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH
# rm -rf build-rocm # if you built it previously
cmake -B build-rocm \
-S . \
-DGGML_HIP=ON \
-DAMDGPU_TARGETS=gfx1030 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_PREFIX_PATH=${ROCM_PATH} \
-DCMAKE_HIP_COMPILER=${ROCM_PATH}/llvm/bin/clang++
cmake --build build-rocm -j$(nproc)
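To put numbers on the llama.cpp vs. Ollama comparison, llama-bench (built alongside llama-cli) reports prompt-processing and generation speed in tokens per second. A minimal sketch, assuming the GGUF model used later in this post already lives under ~/models:
build-rocm/bin/llama-bench \
  -m ~/models/qwen2.5-coder-7b-instruct-q2_k.gguf \
  -ngl 99 \
  -p 512 \
  -n 128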
In other terminals you can monitor the hardware resources:
watch -n 1 rocm-smi # subterminal 1
radeontop # subterminal 2
Common code-related prompt:
export PROMPT_AI="Write a Python function that loads a YAML file and validates required keys."
- ollama:
ollama serve
ollama pull hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q2_K
curl -N http://localhost:11434/api/generate -d "{
  \"model\": \"hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q2_K\",
  \"prompt\": \"$PROMPT_AI\",
  \"options\": { \"num_ctx\": 2048, \"num_gpu\": 99 },
  \"stream\": false
}" | jq '.eval_duration, .prompt_eval_count, .response'
- llama.cpp:
build-rocm/bin/llama-cli \
-m ~/models/qwen2.5-coder-7b-instruct-q2_k.gguf \
-ngl 99 \
-c 8192 \
-t 6 \
-n 512 \
--color \
-p "$PROMPT_AI"
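To compare both runtimes with the same yardstick, you can turn Ollama's timing fields into a generation speed (the *_duration fields are in nanoseconds); a minimal sketch reusing the request from above:
curl -N http://localhost:11434/api/generate -d "{
  \"model\": \"hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q2_K\",
  \"prompt\": \"$PROMPT_AI\",
  \"stream\": false
}" | jq '.eval_count / (.eval_duration / 1000000000)'  # generated tokens per second
llama-cli prints its own timing summary (prompt processing and generation speed) at the end of each run, so the llama.cpp number can be read straight off the terminal.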