For environments that are locked down and where code/data must not be sent to an external service, a locally served LLM can still be used as the backend for agentic AI coding tools. This gist details the steps to use the Cline AI coding agent in VSCode with a locally served LLM running in an Ollama Docker container (assuming you use VSCode, with or without Remote SSH, on the same machine that will serve the model):
- start the Ollama Docker container: `docker run -d --rm --gpus='"device=0"' -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama`
  - requires Docker with the NVIDIA Container Toolkit installed
  - set the GPU device index to control which GPU is used on multi-GPU systems
  - to confirm the container and API are up, see the verification sketch after this list
- serve a capable agentic coding model inside the container (at the time of writing, Cline suggests Qwen3 Coder 30B at 8-bit quantization): `docker exec ollama ollama run qwen3-coder:30b-a3b-q8_0`
  - 8-bit quantization fits in <80GB of VRAM with a 32k-token context window (tested on an A100 80GB)
  - 4-bit quantization is also available and fits in <40GB of VRAM: see the ollama qwen3-coder tags
  - a quick smoke test of the served model is sketched after this list
- install the Cline extension from the VSCode Extensions marketplace (a CLI alternative is sketched after this list)
- configure Cline:
- use Ollama as the API provider
- select the served model
- select the checkbox for "Use compact prompt"
- change the auto-approve permissions as desired
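
To confirm the container and API are up (referenced from the first step above), a minimal shell check, assuming the container name `ollama` and port `11434` from the `docker run` command:

```sh
# confirm the container is running and check its startup logs
docker ps --filter name=ollama
docker logs ollama

# the Ollama HTTP API should respond on the mapped port;
# /api/tags lists the models available to the server
curl http://localhost:11434/api/tags
```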
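
To smoke-test the served model (referenced from the model step above) before pointing Cline at it, a hedged sketch using Ollama's `/api/generate` endpoint; the prompt text is arbitrary, and the model tag should match the variant you pulled:

```sh
# list models currently loaded in memory and their reported size
docker exec ollama ollama ps

# send a one-off, non-streaming generation request
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b-a3b-q8_0",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'

# check GPU memory headroom on the host
nvidia-smi
```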
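
As a CLI alternative to installing Cline through the Extensions view (the extension ID below is an assumption; verify it on the VSCode Marketplace):

```sh
# install the Cline extension from the command line
# (extension ID assumed to be saoudrizwan.claude-dev; confirm on the Marketplace)
code --install-extension saoudrizwan.claude-dev
```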