Steps to use the Cline AI coding agent in VSCode using a locally served LLM

VSCode + Cline + Ollama + Docker + Qwen3 Coder 30B

For environments that must be particularly locked down, where code and data should not be sent to an external service, a locally served LLM can still be used as the backend for agentic AI coding tools. This gist details the steps to use the Cline AI coding agent in VSCode with a locally served LLM running in an Ollama Docker container (assuming you use VSCode, optionally with Remote-SSH, on the same machine that will serve the model):

  1. Start the Ollama Docker container (a quick health check is sketched after this list):
    docker run -d --rm --gpus='"device=0"' -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    • requires Docker with the NVIDIA Container Toolkit installed
    • set the GPU device index to control which GPU is used on multi-GPU systems
  2. Serve a capable agentic coding model inside the container (at the time of writing, Cline suggests Qwen3 Coder 30B at 8-bit quantization; see the model listing sketch after this list):
    docker exec ollama ollama run qwen3-coder:30b-a3b-q8_0
    • 8-bit quantization fits in <80 GB of VRAM with a 32k-token context window (tested on an A100 80GB)
    • 4-bit quantization is also available and fits in <40 GB of VRAM: ollama qwen3-coder tags
  3. Install Cline from the VSCode Extensions marketplace
  4. Configure Cline (a direct API test is sketched after this list):
    • use Ollama as the API provider
    • select the served model
    • enable the "Use compact prompt" checkbox
    • adjust the auto-approve permissions as desired
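
Before pulling a model, it can help to confirm the container and API are up. A minimal health check for step 1, assuming the default port mapping above:

    # confirm the container is running
    docker ps --filter name=ollama
    # confirm the Ollama HTTP API answers on the mapped port
    curl http://localhost:11434/api/version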
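
Once the model from step 2 has downloaded, confirm it is available and check GPU memory headroom. These commands assume the container name and model tag used above:

    # list models pulled inside the container
    docker exec ollama ollama list
    # the same listing via the HTTP API that Cline will talk to
    curl http://localhost:11434/api/tags
    # watch VRAM usage on the host while the model is loaded
    nvidia-smi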
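
If Cline fails to connect, a request straight against Ollama's chat endpoint isolates whether the server side works. A sketch of a non-streaming request, assuming the model tag from step 2:

    curl http://localhost:11434/api/chat -d '{
      "model": "qwen3-coder:30b-a3b-q8_0",
      "messages": [{"role": "user", "content": "Write hello world in Python."}],
      "stream": false
    }'

If this returns a completion but Cline still cannot connect, the issue is likely the base URL configured in the Cline settings rather than the model server.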

