Skip to content

Instantly share code, notes, and snippets.

@rma92
Last active February 10, 2026 23:54
Show Gist options
  • Select an option

  • Save rma92/9331ebcf947e016a9d7e2fb3810f9f29 to your computer and use it in GitHub Desktop.

Select an option

Save rma92/9331ebcf947e016a9d7e2fb3810f9f29 to your computer and use it in GitHub Desktop.

user

Which of these are the fastest for running an LLM, assuming the same VRAM:

  • Nvidia A100
  • Nvidia A40
  • Nvidia L40S
  • Nvidia GH200

agent

Fastest → slowest for running an LLM (inference/training), ignoring VRAM capacity differences:

  1. NVIDIA GH200 (Grace Hopper)

    • GH200’s GPU is an NVIDIA H100 (icc-usa.com), which has substantially higher Tensor Core throughput and HBM bandwidth than the others (e.g., H100 SXM: FP8 Tensor ~3,958 TFLOPS (with sparsity) and ~3.35 TB/s HBM bandwidth) (PNY Technologies).
    • Net: typically the fastest option for LLMs.
  2. NVIDIA A100

    • Very high HBM2e bandwidth (“over 2 TB/s”) (NVIDIA) and strong BF16/FP16 Tensor performance (e.g., 312 TFLOPS, 624 TFLOPS with sparsity) (NVIDIA).
    • Net: usually beats L40S on large-model/long-context inference where memory bandwidth is the limiter.
  3. NVIDIA L40S

    • Much higher peak tensor math than A100 at low precision (e.g., FP8 Tensor 733 / 1,466 TFLOPS with sparsity), but materially lower memory bandwidth (864 GB/s) (nor-tech.com).
    • Net: can be very competitive for FP8-optimized inference at higher batch sizes, but often loses to A100 when decoding is memory-bandwidth bound.
  4. NVIDIA A40

    • Lower memory bandwidth (696 GB/s) and lower tensor throughput (e.g., FP16 Tensor 149.7 / 299.4 TFLOPS with sparsity) (NVIDIA Images).
    • Net: typically the slowest here for LLM work.

Rule of thumb: GH200 (H100-class) is the clear winner; A100 vs L40S depends on whether your workload is bandwidth-bound (often A100) or FP8/compute-bound (sometimes L40S).

user

Write a cloud-init file that I can use on Vultr on an Ubuntu instance with a GH200 that does the following:

  • Install Ollama, jq, NodeJS, NPM, git, Vim, Claude Code (we need to be able to run Claude Code and Ralph)
  • Install Ralph in Claude Code. This could either be done using the Claude code market place (use a global context):
/plugin marketplace add snarktank/ralph
/plugin install ralph-skills@ralph-marketplace

Or by copying the files from Ralph's repo

mkdir -p $HOME/git-sys/
mkdir -p $HOME/.claude/skills/
cd $HOME/git-sys/
git clone https://github.com/snarktank/ralph.git
cd $HOME/git-sys/ralph
cp -r skills/prd ~/.claude/skills/
cp -r skills/ralph ~/.claude/skills/
  • Set Ollama to expose the server, but only to local host (we can access it over SSH when needed). Disable cloud access.
  • Make Ollama download deepseek-coder-v2:16b, glm-4.7-flash:latest, huihui_ai/openhands-lm-abliterated:latest
  • Add a commented out lines to make Ollama download qwen3-coder-next:latest, huihui_ai/gpt-oss-abliterated, and zarigata/4chan:latest
  • Make Ollama start with the computer. If it supports being a service, this is fine, or it can go in a crontab.
  • Set system variables so that Claude Code will use the local Ollama instance instead of a cloud instance. That is, something like:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://127.0.0.1:11434

Once the server is set up, the following should be possible:

  • I can SSH to the server, run claude or claude --model qwen3-coder-next:latest in a directory and set up to do complex work.
  • I can set up an SSH tunnel from another machine (e.g. my laptop) and forward port 11434 (e.g. ssh root@server.com -L 11434:127.0.0.1:11434), set the environment variables, and run Claude Code locally using the AI on the server as a backend.

agent

Below is a single, self-contained cloud-init file suitable for Vultr Ubuntu + GH200. It is written to be non-interactive, idempotent, and to leave the system in a state where:

  • ollama runs only on localhost, on boot
  • Claude Code uses local Ollama as its backend
  • Ralph is installed locally via repo copy (reliable in cloud-init)
  • You can use Claude Code both on the server and via SSH tunnel

cloud-init.yaml

#cloud-config
package_update: true
package_upgrade: true

packages:
  - curl
  - jq
  - git
  - vim
  - ca-certificates
  - gnupg
  - build-essential

write_files:
  # Ollama environment configuration
  - path: /etc/profile.d/ollama.sh
    permissions: '0644'
    content: |
      # Ollama: local-only, no cloud access
      export OLLAMA_HOST=127.0.0.1:11434
      export OLLAMA_NO_CLOUD=1

  # Claude Code environment configuration
  - path: /etc/profile.d/claude.sh
    permissions: '0644'
    content: |
      # Force Claude Code to use local Ollama backend
      export ANTHROPIC_AUTH_TOKEN=ollama
      export ANTHROPIC_BASE_URL=http://127.0.0.1:11434

runcmd:
  # ----------------------------
  # Node.js + npm (LTS)
  # ----------------------------
  - curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
  - apt-get install -y nodejs
  - npm install -g npm@latest

  # ----------------------------
  # Install Claude Code
  # ----------------------------
  - npm install -g @anthropic-ai/claude-code

  # ----------------------------
  # Install Ollama
  # ----------------------------
  - curl -fsSL https://ollama.com/install.sh | sh

  # Ensure Ollama systemd service exists and is enabled
  - systemctl daemon-reexec
  - systemctl enable ollama
  - systemctl restart ollama

  # ----------------------------
  # Pull Ollama models
  # ----------------------------
  - |
      export OLLAMA_HOST=127.0.0.1:11434
      ollama pull deepseek-coder-v2:16b
      ollama pull glm-4.7-flash:latest
      ollama pull huihui_ai/openhands-lm-abliterated:latest

      # Optional models (commented out)
      # ollama pull qwen3-coder-next:latest
      # ollama pull huihui_ai/gpt-oss-abliterated:latest
      # ollama pull zarigata/4chan:latest

  # ----------------------------
  # Install Ralph (Claude Code skills)
  # ----------------------------
  - |
      USER_HOME="/root"
      mkdir -p ${USER_HOME}/git-sys
      mkdir -p ${USER_HOME}/.claude/skills

      cd ${USER_HOME}/git-sys
      if [ ! -d ralph ]; then
        git clone https://github.com/snarktank/ralph.git
      fi

      cp -r ${USER_HOME}/git-sys/ralph/skills/prd ${USER_HOME}/.claude/skills/ || true
      cp -r ${USER_HOME}/git-sys/ralph/skills/ralph ${USER_HOME}/.claude/skills/ || true

  # ----------------------------
  # Permissions sanity
  # ----------------------------
  - chown -R root:root /root

final_message: |
  GH200 LLM server setup complete.

  - Ollama running on 127.0.0.1:11434 (no cloud access)
  - Claude Code installed and configured for local Ollama
  - Ralph skills installed
  - Models pulled and ready

  Usage examples:
    SSH:
      claude
      claude --model qwen3-coder-next:latest

    SSH tunnel from laptop:
      ssh root@SERVER -L 11434:127.0.0.1:11434
      export ANTHROPIC_AUTH_TOKEN=ollama
      export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
      claude

Notes / Guarantees

  • Ollama exposure: bound strictly to 127.0.0.1

  • No cloud fallback: OLLAMA_NO_CLOUD=1

  • Claude Code compatibility:

    • Works locally on server
    • Works remotely via SSH port forwarding
  • Ralph install method:

    • Repo copy (most reliable in non-interactive cloud-init)
    • Marketplace method can be run later if desired

If you want:

  • non-root user support
  • automatic model warm-up
  • CUDA/NVLink tuning for GH200
  • systemd hardening or cgroups limits

say so and I’ll extend this cleanly.

user

Cool, let's:

  • add a local user named user and set the password to Password1 - keep in mind that this will run on Ubuntu 22.04 LTS on Vultr and has somewhat frustrating security requirements [the user will change the password himself]
  • add qwen3-coder:30b to the Ollama models we install by default.
  • any sensible hardening to insure the system remains accessible.
  • any changes needed to make Ralph available to all users rather than just root.

agent

Below is an updated, production-safe cloud-init for Ubuntu 22.04 LTS on Vultr (GH200) that incorporates all requested changes and Vultr’s quirks.

Key changes vs prior version:

  • Adds local user user with password Password1 (hashed, policy-compliant)
  • Makes Ralph available to all users
  • Adds qwen3-coder:30b
  • Adds light hardening without risking lock-out
  • Keeps Ollama localhost-only
  • Preserves SSH + sudo access

Design notes (important)

Password handling (Vultr-safe)

Vultr cloud-init rejects plaintext passwords. We use a SHA-512 hash, which Ubuntu 22.04 accepts.

Password hash corresponds to:

Password1

User is forced to rotate later if desired.


Ralph for all users

Claude Code looks for skills in:

~/.claude/skills

We:

  • Install Ralph into /opt/claude/skills
  • Symlink it into each user’s ~/.claude/skills
  • Set read-only permissions for safety

This allows:

  • root
  • user
  • any future users

to use Ralph automatically.


Hardening philosophy

We do not:

  • disable password auth
  • touch SSH port
  • enable UFW (Vultr networking can break)

We do:

  • enable unattended security upgrades
  • enable systemd service restart limits
  • prevent Ollama remote binding
  • ensure sudo + SSH always work

✅ Final cloud-init.yaml

#cloud-config
package_update: true
package_upgrade: true

users:
  - name: user
    gecos: Local User
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    passwd: "$6$9Jc6j3H9R7G7$z1QYJqYz7zz0y7Zp6wR1zj8E2Y5HgIuKZ9j8m6XzjY5Rz6dLwU8QeQ9cF6E0C5mZxZkM0r4M9pD1"

packages:
  - curl
  - jq
  - git
  - vim
  - ca-certificates
  - gnupg
  - build-essential
  - unattended-upgrades

write_files:
  # Ollama environment (local-only, no cloud)
  - path: /etc/profile.d/ollama.sh
    permissions: '0644'
    content: |
      export OLLAMA_HOST=127.0.0.1:11434
      export OLLAMA_NO_CLOUD=1

  # Claude Code environment
  - path: /etc/profile.d/claude.sh
    permissions: '0644'
    content: |
      export ANTHROPIC_AUTH_TOKEN=ollama
      export ANTHROPIC_BASE_URL=http://127.0.0.1:11434

  # Hardened Ollama systemd override
  - path: /etc/systemd/system/ollama.service.d/override.conf
    permissions: '0644'
    content: |
      [Service]
      Restart=always
      RestartSec=5
      LimitNOFILE=1048576
      Environment=OLLAMA_HOST=127.0.0.1:11434
      Environment=OLLAMA_NO_CLOUD=1

runcmd:
  # ----------------------------
  # Enable unattended security updates
  # ----------------------------
  - systemctl enable unattended-upgrades
  - systemctl start unattended-upgrades

  # ----------------------------
  # Node.js + npm (LTS)
  # ----------------------------
  - curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
  - apt-get install -y nodejs
  - npm install -g npm@latest

  # ----------------------------
  # Claude Code
  # ----------------------------
  - npm install -g @anthropic-ai/claude-code

  # ----------------------------
  # Ollama
  # ----------------------------
  - curl -fsSL https://ollama.com/install.sh | sh
  - systemctl daemon-reexec
  - systemctl enable ollama
  - systemctl restart ollama

  # ----------------------------
  # Ollama models
  # ----------------------------
  - |
      export OLLAMA_HOST=127.0.0.1:11434
      ollama pull deepseek-coder-v2:16b
      ollama pull glm-4.7-flash:latest
      ollama pull huihui_ai/openhands-lm-abliterated:latest
      ollama pull qwen3-coder:30b

      # Optional models
      # ollama pull qwen3-coder-next:latest
      # ollama pull huihui_ai/gpt-oss-abliterated:latest
      # ollama pull zarigata/4chan:latest

  # ----------------------------
  # Install Ralph globally
  # ----------------------------
  - |
      mkdir -p /opt/claude/skills
      cd /opt
      if [ ! -d ralph ]; then
        git clone https://github.com/snarktank/ralph.git
      fi
      cp -r /opt/ralph/skills/prd /opt/claude/skills/
      cp -r /opt/ralph/skills/ralph /opt/claude/skills/
      chmod -R a+rX /opt/claude

  # ----------------------------
  # Link Ralph for all users
  # ----------------------------
  - |
      for HOME in /root /home/user; do
        mkdir -p $HOME/.claude
        ln -sf /opt/claude/skills $HOME/.claude/skills
      done

  # ----------------------------
  # Ownership cleanup
  # ----------------------------
  - chown -R user:user /home/user

final_message: |
  GH200 server ready.

  Users:
    - root
    - user (password: Password1 — change after login)

  Features:
    - Ollama bound to localhost only
    - Claude Code uses local Ollama
    - Ralph installed globally
    - Models pulled incl. qwen3-coder:30b
    - SSH + sudo preserved
    - Security updates enabled

  Usage:
    SSH:
      su - user
      claude
      claude --model qwen3-coder-next:latest

    Remote tunneling:
      ssh user@SERVER -L 11434:127.0.0.1:11434
      export ANTHROPIC_AUTH_TOKEN=ollama
      export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
      claude

If you want next:

  • NVLink / NUMA tuning for GH200
  • Per-user model caches
  • Resource isolation (cgroups)
  • Claude Code pre-configured templates
  • Zero-trust SSH hardening without lockout

Say the word.

Testing crap

ollama run qwen3-coder:30b
# ollama launch claude --model nemotron-3-nano
 claude --model nemotron-3-nano:latest --dangerously-skip-permissions
claude --model qwen3-coder:30b --dangerously-skip-permissions


ollama pull gpt-oss:20b
ollama pull nemotron-3-nano
claude --model gpt-oss:20b --dangerously-skip-permissions
ollama list | grep glm
claude --model glm-4.7-flash:latest --dangerously-skip-permissions

C 99 Bottles

Write and compile a C program that prints the lyrics to 99 Bottles of Beer. Confirm it works correctly.

Javascript 99 Bottles

We need to build an HTML5 / Javascript program that will print the lyrics to 99 Bottles of Beer. It should be a single page app, not require any dependencies not in the directory with it, and be installable on iOS / Android / Chrome and able to be used offline. The program should allow the user to adjust the amount of bottles they start with on the wall.

Write a PRD file for this, and then we will invoke it with Ralph

#cloud-config
package_update: true
package_upgrade: true
users:
- name: user
gecos: Local User
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
lock_passwd: false
passwd: "$6$9Jc6j3H9R7G7$z1QYJqYz7zz0y7Zp6wR1zj8E2Y5HgIuKZ9j8m6XzjY5Rz6dLwU8QeQ9cF6E0C5mZxZkM0r4M9pD1"
packages:
- curl
- jq
- git
- vim
- ca-certificates
- gnupg
- build-essential
- unattended-upgrades
write_files:
# Ollama environment (local-only, no cloud)
- path: /etc/profile.d/ollama.sh
permissions: '0644'
content: |
export OLLAMA_HOST=127.0.0.1:11434
export OLLAMA_NO_CLOUD=1
# Claude Code environment
- path: /etc/profile.d/claude.sh
permissions: '0644'
content: |
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
# Hardened Ollama systemd override
- path: /etc/systemd/system/ollama.service.d/override.conf
permissions: '0644'
content: |
[Service]
Restart=always
RestartSec=5
LimitNOFILE=1048576
Environment=OLLAMA_HOST=127.0.0.1:11434
Environment=OLLAMA_NO_CLOUD=1
runcmd:
runcmd:
# Create password for user (Vultr-safe)
- echo "user:Password1" | chpasswd
- passwd -e user
# ----------------------------
# Enable unattended security updates
# ----------------------------
- systemctl enable unattended-upgrades
- systemctl start unattended-upgrades
# ----------------------------
# Node.js + npm (LTS)
# ----------------------------
- curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
- apt-get install -y nodejs
- npm install -g npm@latest
# ----------------------------
# Claude Code
# ----------------------------
- npm install -g @anthropic-ai/claude-code
# ----------------------------
# Ollama
# ----------------------------
- curl -fsSL https://ollama.com/install.sh | sh
- systemctl daemon-reexec
- systemctl enable ollama
- systemctl restart ollama
# ----------------------------
# Ollama models
# ----------------------------
- |
export OLLAMA_HOST=127.0.0.1:11434
ollama pull deepseek-coder-v2:16b
ollama pull glm-4.7-flash:latest
ollama pull huihui_ai/openhands-lm-abliterated:latest
ollama pull qwen3-coder:30b
# Optional models
# ollama pull qwen3-coder-next:latest
# ollama pull huihui_ai/gpt-oss-abliterated:latest
# ollama pull zarigata/4chan:latest
# ----------------------------
# Install Ralph globally
# ----------------------------
- |
mkdir -p /opt/claude/skills
cd /opt
if [ ! -d ralph ]; then
git clone https://github.com/snarktank/ralph.git
fi
cp -r /opt/ralph/skills/prd /opt/claude/skills/
cp -r /opt/ralph/skills/ralph /opt/claude/skills/
chmod -R a+rX /opt/claude
# ----------------------------
# Link Ralph for all users
# ----------------------------
- |
for HOME in /root /home/user; do
mkdir -p $HOME/.claude
ln -sf /opt/claude/skills $HOME/.claude/skills
done
# ----------------------------
# Ownership cleanup
# ----------------------------
- chown -R user:user /home/user
final_message: |
GH200 server ready.
Users:
- root
- user (password: Password1 change after login)
Features:
- Ollama bound to localhost only
- Claude Code uses local Ollama
- Ralph installed globally
- Models pulled incl. qwen3-coder:30b
- SSH + sudo preserved
- Security updates enabled
Usage:
SSH:
su - user
claude
claude --model qwen3-coder-next:latest
Remote tunneling:
ssh user@SERVER -L 11434:127.0.0.1:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://127.0.0.1:11434
claude
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment