For environments that need to be particularly locked down, where code and data must not be sent to an external service, a locally served LLM can still be used as the backend for agentic AI coding tools. This gist details the steps to use the Cline AI coding agent in VSCode with a locally served LLM running in an Ollama Docker container (assuming you use VSCode ± Remote-SSH on the same machine that will serve the model):
- start the Ollama Docker container (requires Docker with the NVIDIA Container Toolkit installed):

docker run -d --rm --gpus='"device=0"' -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
- set the GPU device index in `--gpus` to control which GPU(s) the container can use on multi-GPU systems; see the examples below
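For example, on a multi-GPU host the value passed to `--gpus` can name a different device or several devices; the indices below are illustrative and depend on your hardware:

```bash
# pin the container to the second GPU (index 1)
docker run -d --rm --gpus='"device=1"' -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# or expose two specific GPUs (indices 0 and 2) to the container
docker run -d --rm --gpus='"device=0,2"' -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# optional sanity check: the Ollama API should respond on the mapped port
curl http://localhost:11434
```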
- serve a capable agentic code model inside the container (at the time of writing, Cline suggests Qwen3 Coder 30B at 8-bit quantization):