vLLM on GH200
# Upgrade to the 64k-page-size kernel (drop the Lambda repo pin first so apt
# resolves the kernel from Ubuntu's own archives)
sudo rm -f /etc/apt/sources.list.d/lambda-repository.list
sudo apt update
sudo apt install linux-nvidia-64k -y
sudo reboot now
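# Optional sanity check after the reboot: the 64k kernel should report a
# 65536-byte page size
uname -r
getconf PAGESIZE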
# Install the CUDA 13.0 toolkit and the latest open-kernel-module driver
# (cuda-keyring adds NVIDIA's SBSA/arm64 apt repository)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | sed 's/\.//')/sbsa/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring*.deb
sudo apt update
sudo apt-get install cuda-toolkit-13-0 -y
# Replace any preinstalled proprietary driver with the open kernel modules
sudo apt-get purge -y 'nvidia-*' 'libnvidia-*'
sudo apt-get autoremove -y
sudo apt-get install -y nvidia-open
sudo reboot now
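# Optional sanity check after the reboot (assumes the toolkit's default
# /usr/local/cuda symlink)
nvidia-smi
export PATH=/usr/local/cuda/bin:${PATH}
nvcc --version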
# Enable persistence mode (via the daemon; nvidia-smi -pm 1 is the legacy alternative)
sudo systemctl start nvidia-persistenced
sudo nvidia-smi -pm 1
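# Optional: confirm persistence mode took effect
nvidia-smi -q | grep "Persistence Mode"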
# Perf settings
sudo sysctl kernel.perf_event_paranoid=-1   # allow unprivileged perf event access
sudo sysctl kernel.sched_schedstats=1       # collect scheduler statistics
sudo sysctl kernel.kptr_restrict=0          # expose kernel pointers for symbol resolution
sudo apt install -y linux-tools-$(uname -r) # perf for the running kernel
sudo modprobe arm_spe_pmu                   # Arm Statistical Profiling Extension PMU
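# Optional: persist the sysctls across reboots (the file name is arbitrary)
printf 'kernel.perf_event_paranoid=-1\nkernel.sched_schedstats=1\nkernel.kptr_restrict=0\n' | sudo tee /etc/sysctl.d/99-profiling.conf
# Example system-wide SPE capture; the PMU usually shows up as arm_spe_0 in
# `perf list`, but the name can vary by platform
sudo perf record -e arm_spe_0// -a -- sleep 5
sudo perf report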
# Memory policy: auto-online hotplugged memory as movable (GH200 exposes the
# GPU-attached memory via hotplug), then make it persistent on the kernel
# cmdline; init_on_alloc=0 skips zeroing pages on allocation for performance
echo online_movable | sudo tee /sys/devices/system/memory/auto_online_blocks
sudo nano /etc/default/grub # set GRUB_CMDLINE_LINUX_DEFAULT="memhp_default_state=online_movable init_on_alloc=0"
sudo update-grub
sudo reboot now
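# Optional: after the reboot, confirm the cmdline and onlining policy stuck;
# on GH200 the GPU-attached memory may also appear as an extra NUMA node
cat /proc/cmdline
cat /sys/devices/system/memory/auto_online_blocks
sudo apt install -y numactl && numactl -H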
# vLLM, installed into a uv-managed virtualenv
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto # auto-select the torch wheel matching the CUDA driver
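# Quick smoke test; the model below is only an example, substitute any model
# you have access to
vllm serve Qwen/Qwen2.5-0.5B-Instruct --max-model-len 4096
# From another shell:
curl http://localhost:8000/v1/models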
# LMCache, built from source against the torch installed above
export LMCACHE_REF=v0.3.11
git clone https://github.com/LMCache/LMCache.git -b ${LMCACHE_REF}
cd LMCache && sed -i '/torch/d' pyproject.toml # drop the torch pin so the build reuses the installed torch
uv pip install build wheel setuptools_scm
python -m build --wheel --no-isolation
uv pip install dist/*.whl
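# Example launch with LMCache as vLLM's KV connector, following the LMCache
# quickstart at the time of writing (env vars and flags may change between
# releases; the model is again only a placeholder)
LMCACHE_CHUNK_SIZE=256 LMCACHE_LOCAL_CPU=True LMCACHE_MAX_LOCAL_CPU_SIZE=5.0 \
  vllm serve Qwen/Qwen2.5-0.5B-Instruct \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'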