Which of these are the fastest for running an LLM, assuming the same VRAM:
- Nvidia A100
- Nvidia A40
- Nvidia L40S
- Nvidia GH200
Fastest → slowest for running an LLM (inference/training), ignoring VRAM capacity differences:
1. NVIDIA GH200 (Grace Hopper)
- GH200’s GPU is an NVIDIA H100 (icc-usa.com), which has substantially higher Tensor Core throughput and HBM bandwidth than the others (e.g., H100 SXM: FP8 Tensor ~3,958 TFLOPS (with sparsity) and ~3.35 TB/s HBM bandwidth) (PNY Technologies).
- Net: typically the fastest option for LLMs.
2. NVIDIA A100
- High HBM2e bandwidth (roughly 1.6-2.0 TB/s depending on the 40 GB / 80 GB variant), but no FP8 Tensor Core support (FP16 Tensor ~312 / 624 TFLOPS with sparsity).
- Net: usually second overall; strongest when decoding is memory-bandwidth bound.
3. NVIDIA L40S
- Much higher peak tensor math than A100 at low precision (e.g., FP8 Tensor 733 / 1,466 TFLOPS with sparsity), but materially lower memory bandwidth (864 GB/s) (nor-tech.com).
- Net: can be very competitive for FP8-optimized inference at higher batch sizes, but often loses to A100 when decoding is memory-bandwidth bound.
4. NVIDIA A40
- Lower memory bandwidth (696 GB/s) and lower tensor throughput (e.g., FP16 Tensor 149.7 / 299.4 TFLOPS with sparsity) (NVIDIA Images).
- Net: typically the slowest here for LLM work.
Rule of thumb: GH200 (H100-class) is the clear winner; A100 vs L40S depends on whether your workload is bandwidth-bound (often A100) or FP8/compute-bound (sometimes L40S).
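A rough way to see why bandwidth dominates decode speed: each generated token has to stream roughly the full set of weights from GPU memory, so bandwidth divided by weight size gives an upper bound on tokens/s. Illustrative numbers only, assuming a hypothetical ~9 GB 4-bit-quantized model (real throughput is lower once KV-cache reads, kernel overhead, and batching enter):

  tokens/s ceiling ≈ memory bandwidth / bytes read per token (≈ weight size)
  H100 SXM (GH200): ~3,350 GB/s / 9 GB ≈ 370 tok/s
  A100 80GB:        ~2,039 GB/s / 9 GB ≈ 226 tok/s
  L40S:                864 GB/s / 9 GB ≈  96 tok/s
  A40:                 696 GB/s / 9 GB ≈  77 tok/s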
Write a cloud-init file that I can use on Vultr on an Ubuntu instance with a GH200 that does the following:
- Install Ollama, jq, NodeJS, NPM, git, Vim, Claude Code (we need to be able to run Claude Code and Ralph)
- Install Ralph in Claude Code. This could be done either via the Claude Code plugin marketplace (use a global context):
/plugin marketplace add snarktank/ralph
/plugin install ralph-skills@ralph-marketplace
Or by copying the files from Ralph's repo
mkdir -p $HOME/git-sys/
mkdir -p $HOME/.claude/skills/
cd $HOME/git-sys/
git clone https://github.com/snarktank/ralph.git
cd $HOME/git-sys/ralph
cp -r skills/prd ~/.claude/skills/
cp -r skills/ralph ~/.claude/skills/
- Set Ollama to expose the server, but only to localhost (we can access it over SSH when needed). Disable cloud access.
- Make Ollama download deepseek-coder-v2:16b, glm-4.7-flash:latest, huihui_ai/openhands-lm-abliterated:latest
- Add commented-out lines to make Ollama download qwen3-coder-next:latest, huihui_ai/gpt-oss-abliterated, and zarigata/4chan:latest
- Make Ollama start with the computer. If it supports being a service, this is fine, or it can go in a crontab.
- Set system variables so that Claude Code will use the local Ollama instance instead of a cloud instance. That is, something like:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
Once the server is set up, the following should be possible:
- I can SSH to the server, run `claude` or `claude --model qwen3-coder-next:latest` in a directory, and set it up to do complex work.
- I can set up an SSH tunnel from another machine (e.g. my laptop) and forward port 11434 (e.g. `ssh root@server.com -L 11434:127.0.0.1:11434`), set the environment variables, and run Claude Code locally using the AI on the server as a backend.
Below is a single, self-contained cloud-init file suitable for Vultr Ubuntu + GH200. It is written to be non-interactive, idempotent, and to leave the system in a state where:
- `ollama` runs only on localhost, on boot
- Claude Code uses local Ollama as its backend
- Ralph is installed locally via repo copy (reliable in cloud-init)
- You can use Claude Code both on the server and via SSH tunnel
#cloud-config
package_update: true
package_upgrade: true
packages:
- curl
- jq
- git
- vim
- ca-certificates
- gnupg
- build-essential
write_files:
# Ollama environment configuration
- path: /etc/profile.d/ollama.sh
  permissions: '0644'
  content: |
    # Ollama: local-only, no cloud access
    export OLLAMA_HOST=127.0.0.1:11434
    export OLLAMA_NO_CLOUD=1
# Claude Code environment configuration
- path: /etc/profile.d/claude.sh
  permissions: '0644'
  content: |
    # Force Claude Code to use local Ollama backend
    export ANTHROPIC_AUTH_TOKEN=ollama
    export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
runcmd:
# ----------------------------
# Node.js + npm (LTS)
# ----------------------------
- curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
- apt-get install -y nodejs
- npm install -g npm@latest
# ----------------------------
# Install Claude Code
# ----------------------------
- npm install -g @anthropic-ai/claude-code
# ----------------------------
# Install Ollama
# ----------------------------
- curl -fsSL https://ollama.com/install.sh | sh
# Ensure Ollama systemd service exists and is enabled
- systemctl daemon-reload
- systemctl enable ollama
- systemctl restart ollama
# ----------------------------
# Pull Ollama models
# ----------------------------
- |
  export OLLAMA_HOST=127.0.0.1:11434
  ollama pull deepseek-coder-v2:16b
  ollama pull glm-4.7-flash:latest
  ollama pull huihui_ai/openhands-lm-abliterated:latest
  # Optional models (commented out)
  # ollama pull qwen3-coder-next:latest
  # ollama pull huihui_ai/gpt-oss-abliterated:latest
  # ollama pull zarigata/4chan:latest
# ----------------------------
# Install Ralph (Claude Code skills)
# ----------------------------
- |
  USER_HOME="/root"
  mkdir -p ${USER_HOME}/git-sys
  mkdir -p ${USER_HOME}/.claude/skills
  cd ${USER_HOME}/git-sys
  if [ ! -d ralph ]; then
    git clone https://github.com/snarktank/ralph.git
  fi
  cp -r ${USER_HOME}/git-sys/ralph/skills/prd ${USER_HOME}/.claude/skills/ || true
  cp -r ${USER_HOME}/git-sys/ralph/skills/ralph ${USER_HOME}/.claude/skills/ || true
# ----------------------------
# Permissions sanity
# ----------------------------
- chown -R root:root /root
final_message: |
  GH200 LLM server setup complete.
  - Ollama running on 127.0.0.1:11434 (no cloud access)
  - Claude Code installed and configured for local Ollama
  - Ralph skills installed
  - Models pulled and ready

  Usage examples:

  SSH:
    claude
    claude --model qwen3-coder-next:latest

  SSH tunnel from laptop:
    ssh root@SERVER -L 11434:127.0.0.1:11434
    export ANTHROPIC_AUTH_TOKEN=ollama
    export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
    claude

- Ollama exposure: bound strictly to `127.0.0.1` (a quick check is sketched below)
- No cloud fallback: `OLLAMA_NO_CLOUD=1`
- Claude Code compatibility:
  - Works locally on the server
  - Works remotely via SSH port forwarding
- Ralph install method:
  - Repo copy (most reliable in non-interactive cloud-init)
  - Marketplace method can be run later if desired
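A minimal post-boot sanity check on the server itself (assuming the defaults above; jq is installed by the config):

  # Ollama should be listening on 127.0.0.1:11434 only, not 0.0.0.0
  ss -ltnp | grep 11434
  # The API should answer locally and list the pulled models
  curl -s http://127.0.0.1:11434/api/tags | jq -r '.models[].name'
  # If something is missing, check how cloud-init finished
  cloud-init status --long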
If you want:
- non-root user support
- automatic model warm-up
- CUDA/NVLink tuning for GH200
- systemd hardening or cgroups limits
say so and I’ll extend this cleanly.
Cool, let's:
- add a local user named `user` and set the password to `Password1` - keep in mind that this will run on Ubuntu 22.04 LTS on Vultr and has somewhat frustrating security requirements [the user will change the password himself]
- add qwen3-coder:30b to the Ollama models we install by default.
- any sensible hardening to ensure the system remains accessible.
- any changes needed to make Ralph available to all users rather than just root.
Below is an updated, production-safe cloud-init for Ubuntu 22.04 LTS on Vultr (GH200) that incorporates all requested changes and Vultr’s quirks.
Key changes vs prior version:
- Adds local user `user` with password `Password1` (hashed, policy-compliant)
- Makes Ralph available to all users
- Adds qwen3-coder:30b
- Adds light hardening without risking lock-out
- Keeps Ollama localhost-only
- Preserves SSH + sudo access
Vultr's cloud-init rejects plaintext passwords, so we use a SHA-512 hash, which Ubuntu 22.04 accepts.
The hash below corresponds to the password: `Password1`
The user can rotate it himself after first login; you can also generate your own hash, as shown below.
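If you'd rather generate your own hash than reuse the one below, either of these works on Ubuntu 22.04 (mkpasswd comes from the whois package), and you can force a rotation at first login with chage:

  openssl passwd -6 'YourNewPassword'
  # or
  sudo apt-get install -y whois && mkpasswd -m sha-512 'YourNewPassword'
  # optional: force the user to change the password at first login
  sudo chage -d 0 user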
Claude Code looks for skills in:
~/.claude/skills
We:
- Install Ralph into /opt/claude/skills
- Symlink it into each user’s ~/.claude/skills
- Set read-only permissions for safety
This allows:
- root
- `user`
- any future users (see the /etc/skel note below)
to use Ralph automatically.
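The symlink loop in the config only covers /root and /home/user. If you want accounts created later to pick Ralph up automatically as well, one optional sketch (not included in the config) is to seed /etc/skel, which useradd -m copies into every new home directory:

  mkdir -p /etc/skel/.claude
  ln -sfn /opt/claude/skills /etc/skel/.claude/skills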
We do not:
- disable password auth
- touch SSH port
- enable UFW (Vultr networking can break)
We do:
- enable unattended security upgrades
- enable systemd service restart limits
- prevent Ollama remote binding (a quick external check is sketched below)
- ensure sudo + SSH always work
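A quick way to confirm the localhost-only posture from your laptop (SERVER_IP is a placeholder): the first request should fail, and the API should only answer once the tunnel is up:

  # Should refuse or time out: Ollama is bound to loopback on the server
  curl -m 5 http://SERVER_IP:11434/api/tags || echo "good: not reachable publicly"
  # Open the tunnel, then query through it from another terminal
  ssh user@SERVER_IP -L 11434:127.0.0.1:11434
  curl -s http://127.0.0.1:11434/api/tags | jq '.models[].name'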
#cloud-config
package_update: true
package_upgrade: true
users:
- default   # keep the distro default user so SSH keys from the datasource are still installed
- name: user
  gecos: Local User
  sudo: ALL=(ALL) NOPASSWD:ALL
  shell: /bin/bash
  lock_passwd: false
  passwd: "$6$9Jc6j3H9R7G7$z1QYJqYz7zz0y7Zp6wR1zj8E2Y5HgIuKZ9j8m6XzjY5Rz6dLwU8QeQ9cF6E0C5mZxZkM0r4M9pD1"
packages:
- curl
- jq
- git
- vim
- ca-certificates
- gnupg
- build-essential
- unattended-upgrades
write_files:
# Ollama environment (local-only, no cloud)
- path: /etc/profile.d/ollama.sh
  permissions: '0644'
  content: |
    export OLLAMA_HOST=127.0.0.1:11434
    export OLLAMA_NO_CLOUD=1
# Claude Code environment
- path: /etc/profile.d/claude.sh
  permissions: '0644'
  content: |
    export ANTHROPIC_AUTH_TOKEN=ollama
    export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
# Hardened Ollama systemd override
- path: /etc/systemd/system/ollama.service.d/override.conf
  permissions: '0644'
  content: |
    [Service]
    Restart=always
    RestartSec=5
    LimitNOFILE=1048576
    Environment=OLLAMA_HOST=127.0.0.1:11434
    Environment=OLLAMA_NO_CLOUD=1
runcmd:
# ----------------------------
# Enable unattended security updates
# ----------------------------
- systemctl enable unattended-upgrades
- systemctl start unattended-upgrades
# ----------------------------
# Node.js + npm (LTS)
# ----------------------------
- curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
- apt-get install -y nodejs
- npm install -g npm@latest
# ----------------------------
# Claude Code
# ----------------------------
- npm install -g @anthropic-ai/claude-code
# ----------------------------
# Ollama
# ----------------------------
- curl -fsSL https://ollama.com/install.sh | sh
- systemctl daemon-reload
- systemctl enable ollama
- systemctl restart ollama
# ----------------------------
# Ollama models
# ----------------------------
- |
  export OLLAMA_HOST=127.0.0.1:11434
  ollama pull deepseek-coder-v2:16b
  ollama pull glm-4.7-flash:latest
  ollama pull huihui_ai/openhands-lm-abliterated:latest
  ollama pull qwen3-coder:30b
  # Optional models
  # ollama pull qwen3-coder-next:latest
  # ollama pull huihui_ai/gpt-oss-abliterated:latest
  # ollama pull zarigata/4chan:latest
# ----------------------------
# Install Ralph globally
# ----------------------------
- |
  mkdir -p /opt/claude/skills
  cd /opt
  if [ ! -d ralph ]; then
    git clone https://github.com/snarktank/ralph.git
  fi
  cp -r /opt/ralph/skills/prd /opt/claude/skills/
  cp -r /opt/ralph/skills/ralph /opt/claude/skills/
  chmod -R a+rX /opt/claude
# ----------------------------
# Link Ralph for all users
# ----------------------------
- |
  # Link the shared skills directory into each existing user's home
  for H in /root /home/user; do
    mkdir -p "$H/.claude"
    ln -sfn /opt/claude/skills "$H/.claude/skills"
  done
# ----------------------------
# Ownership cleanup
# ----------------------------
- chown -R user:user /home/user
final_message: |
  GH200 server ready.

  Users:
  - root
  - user (password: Password1; change after first login)

  Features:
  - Ollama bound to localhost only
  - Claude Code uses local Ollama
  - Ralph installed globally
  - Models pulled incl. qwen3-coder:30b
  - SSH + sudo preserved
  - Security updates enabled

  Usage:

  SSH:
    su - user
    claude
    claude --model qwen3-coder:30b

  Remote tunneling:
    ssh user@SERVER -L 11434:127.0.0.1:11434
    export ANTHROPIC_AUTH_TOKEN=ollama
    export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
    claude
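Once the instance is up, a quick smoke test for the new account and the shared Ralph install (paths as configured above; SERVER_IP is a placeholder):

  ssh user@SERVER_IP
  # then, on the server:
  sudo -n true && echo "passwordless sudo OK"
  ls -l ~/.claude/skills          # should point at /opt/claude/skills
  ollama list                     # should include qwen3-coder:30b
  claude --model qwen3-coder:30b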
If you want next:
- NVLink / NUMA tuning for GH200
- Per-user model caches
- Resource isolation (cgroups)
- Claude Code pre-configured templates
- Zero-trust SSH hardening without lockout
Say the word.