Local LLM in Linux

Overview

Play a little bit with local LLM models on a budget laptop, while keeping a stable Linux base system; in my case, Fedora Linux.

Current hardware

  • CPU: AMD Ryzen 5 Pro 7535U
  • GPU: AMD Radeon 660M GPU
  • RAM: 32 GB (previously 16 GB)

Main setup

Distrobox

Distrobox is ideal for testing or playing with Linux distros because it is lightweight and very well integrated with the host machine.

sudo dnf install distrobox

Install Ubuntu 22.04

distrobox create \
--name rocm-ubuntu \
--image ubuntu:22.04 \
--additional-flags "--device /dev/kfd --device /dev/dri"
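
You can check that the container was created and that the GPU device nodes are visible inside it; a quick sanity check (distrobox enter rocm-ubuntu -- <command> runs a single command inside the container):

distrobox list
distrobox enter rocm-ubuntu -- ls -l /dev/kfd /dev/dri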

Install ROCm and Pytorch for ROCm

distrobox enter rocm-ubuntu # add --verbose if you have any issue during first boot
sudo apt update
sudo apt install -y \
  wget \
  gnupg \
  ca-certificates \
  software-properties-common \
  lsb-release
sudo apt install -y python3-pip python3-setuptools python3-wheel
wget https://repo.radeon.com/amdgpu-install/7.2/ubuntu/jammy/amdgpu-install_7.2.70200-1_all.deb
sudo dpkg -i amdgpu-install_7.2.70200-1_all.deb
sudo amdgpu-install -y --usecase=graphics,rocm --no-dkms
export HSA_OVERRIDE_GFX_VERSION=10.3.0 # remember to add this in your .bashrc or .zshrc!
mkdir -p ~/pip_tmp
export TMPDIR=~/pip_tmp
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1 --no-cache-dir
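
To make the gfx override persist across shells (as the inline comment suggests), append it to your shell rc file, for example:

echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc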

Testing GPU with ROCm

You can run rocminfo and/or run this Python script:

import torch

try:
    print(f"ROCm Version: {torch.version.hip}")
    if torch.cuda.is_available():
        # Try to actually move data to the GPU (This fails if binaries are missing)
        x = torch.tensor([1.0, 2.0, 3.0]).cuda()
        print(f"✅ SUCCESS! Tensor created on: {torch.cuda.get_device_name(0)}")
        print(x)
    else:
        print("❌ No GPU detected by PyTorch.")
except Exception as e:
    print(f"❌ CRASHED: {e}")

Testing GPU with Vulkan

Inside the Ubuntu distrobox:

wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update
sudo apt install vulkan-sdk

Now test it with: vulkaninfo | head -20
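
As an alternative to the ROCm/HIP build used below, llama.cpp can also be built against this Vulkan SDK. A minimal sketch, assuming the current llama.cpp CMake options:

cmake -B build-vulkan -S . -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j$(nproc)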

Install local LLMs

Ollama

Download and install the Ollama binary and libs

curl -fsSL https://ollama.com/install.sh | sh
ollama serve

In another terminal, test a basic model.

distrobox enter rocm-ubuntu
ollama run llama3.2 
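
While a model is loaded, you can check from another terminal whether Ollama is actually offloading it to the GPU:

ollama ps # the PROCESSOR column should report mostly GPU when ROCm offload is working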

llama.cpp

Note: if you want to compile on Fedora Linux (the host) instead, install these build dependencies:

sudo dnf install git make gcc-c++ vulkan-headers vulkan-loader-devel libshaderc-devel glslc glslang cmake ninja-build 

I get a little bit (though not much) better performance with llama.cpp than with Ollama.

git clone --depth 1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make clean
export ROCM_PATH=/opt/rocm-7.2.0
export HIP_PATH=${ROCM_PATH}
export HSA_PATH=${ROCM_PATH}
export LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH
# rm -rf build-rocm # if you built it previously
cmake -B build-rocm \
  -S . \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1030 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_PREFIX_PATH=${ROCM_PATH} \
  -DCMAKE_HIP_COMPILER=${ROCM_PATH}/llvm/bin/clang++
cmake --build build-rocm -j$(nproc)
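
Once the build finishes, llama-bench (built alongside llama-cli) gives repeatable prompt-processing and token-generation throughput numbers; the model path below is just the one used in the benchmark section later:

build-rocm/bin/llama-bench \
  -m ~/models/qwen2.5-coder-7b-instruct-q2_k.gguf \
  -ngl 99 \
  -p 512 \
  -n 128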

GPU monitoring

In other terminals you can monitor the hardware resources:

watch -n 1 rocm-smi # subterminal 1
radeontop # subterminal 2

LLMs performance benchmarks

Common code-related prompt:

export PROMPT_AI="Write a Python function that loads a YAML file and validates required keys."
  • ollama:
ollama serve
ollama pull hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q2_K
curl -N http://localhost:11434/api/generate -d "{
  \"model\": \"hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:Q2_K\",
  \"prompt\": \"$PROMPT_AI\",
  \"stream\": false,
  \"options\": {
    \"num_ctx\": 2048,
    \"num_gpu\": 99
  }
}" | jq '.eval_duration, .prompt_eval_count, .response'
  • llama.cpp
build-rocm/bin/llama-cli \
  -m ~/models/qwen2.5-coder-7b-instruct-q2_k.gguf \
  -ngl 99 \
  -c 8192 \
  -t 6 \
  -n 512 \
  --color \
  -p "$PROMPT_AI"

References
