Running Claude Code Locally with Ollama and Open-Source Models as a Free Alternative to the Anthropic API
Claude Code's API costs add up fast for heavy users, often $50 to $200+/month on Opus 4.5/4.6. Ollama (v0.14.0+) now supports the Anthropic Messages API natively, which means Claude Code can run against local open-source models at zero cost, with no data leaving the machine.
This guide covers the full setup: installing Ollama and Claude Code, choosing a model that fits 16 GB of RAM, connecting the pieces, and understanding the real tradeoffs.
Claude Code is Anthropic's terminal-based coding agent. It reads codebases, edits files, runs shell commands, calls tools, and handles multi-step workflows, all from the command line via natural language.
Under the hood, it communicates with Anthropic's API, typically using Claude Sonnet 4.5 or Opus 4.5/4.6. Opus 4.5 charges roughly $15 per million input tokens and $75 per million output tokens. Daily use during active development routinely reaches $50 to $200/month. The Claude Max subscription ($100 to $200/month) flattens that cost but remains significant for independent developers, students, and hobbyists.
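As a rough illustration (the token counts here are assumptions, not measurements): a working day that consumes 200,000 input tokens and 20,000 output tokens at those rates costs about $3.00 + $1.50 = $4.50, which over roughly 22 working days lands near $100 per month, squarely in that range.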
Ollama runs large language models locally. It handles model downloads, quantization, memory management, and API serving. One command pulls a model, another runs it. It supports macOS, Windows, and Linux, with hardware acceleration on both Apple Silicon and NVIDIA GPUs.
Since v0.14.0 (January 2026), Ollama exposes an Anthropic-compatible Messages API on localhost:11434. This is the same protocol Claude Code uses to reach Anthropic's servers. By redirecting Claude Code to Ollama's local endpoint, the agent continues to function (file editing, tool calling, multi-turn reasoning) but inference runs on a local open-source model instead of Anthropic's cloud.
No API key. No usage bill. No data transmitted externally.
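To see what Claude Code is actually talking to, the endpoint can be exercised directly. The request below is a minimal sketch that assumes the compatibility layer mirrors Anthropic's /v1/messages path and accepts a placeholder credential; adjust the model name to one that is installed:
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-oss:20b",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
  }'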
Claude Code is not an autocomplete tool or a chat wrapper. It is an agent that operates inside the terminal with the following capabilities:
- Codebase awareness: reads project structure, files, and dependencies
- Direct file editing: writes and modifies code across multiple files
- Shell execution: runs commands, tests, and package installations
- Tool calling: invokes external tools and chains multi-step operations
- Git integration: handles commits, branches, and diffs
- Multi-turn reasoning: plans, iterates, and refines across conversation turns
The agent itself is free to install. The cost comes from the model it talks to. This guide replaces the paid model with a free, locally-hosted one.
Ollama is an open-source tool for downloading, managing, and serving LLMs (large language models) on local hardware. It abstracts away model format handling (GGUF quantization, memory allocation, GPU offloading) behind a CLI and HTTP API.
Key details:
- Runs on macOS, Windows 11, and Linux
- Supports Apple Silicon (unified memory) and NVIDIA GPUs (CUDA)
- Falls back to CPU inference when no GPU is available (much slower)
- Serves models via a local HTTP API on port 11434
- Since v0.14.0, that API includes Anthropic Messages API compatibility
- Since v0.15.0, the ollama launch command automates Claude Code configuration
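Independent of the Anthropic-compatible layer, Ollama's own HTTP API on port 11434 can confirm that a model responds locally. This uses Ollama's documented /api/generate endpoint; gpt-oss:20b is used as an example here and recommended later in this guide, and the prompt is arbitrary:
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Explain what a hash map is in two sentences.",
  "stream": false
}'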
| Setup | Monthly Cost | Notes |
|---|---|---|
| Claude Code + Opus 4.5 API | ~$50 to $200+ | Scales with token usage |
| Claude Max subscription | $100 to $200 | Flat rate |
| Claude Code + Ollama (local) | $0 | Electricity only |
| Claude Code + Ollama Cloud | Free tier available | Paid plans start at ~$3/month |
Annual savings range from $600 to $2,400 depending on prior usage. The tradeoff is model capability, covered in the Caveats section below.
| Component | Specification |
|---|---|
| OS | macOS 13.0+ (Apple Silicon recommended) or Windows 11 |
| RAM | 16 GB minimum |
| Disk | 15 to 25 GB free for model files |
| Ollama | v0.15+ (ollama.com/download) |
| Claude Code | Current release (code.claude.com) |
| Internet | Required for initial downloads only |
16 GB of RAM limits model selection to the 14B to 20B parameter range. The experience will be noticeably slower than on 32 GB+ machines, and model quality drops compared to larger models. Specific model recommendations for this constraint follow in Step 2.
Download the installer from ollama.com/download.
macOS: Open the .dmg, drag Ollama to Applications. It runs as a background service.
Windows 11: Run OllamaSetup.exe and follow the prompts. It installs as a system service.
Verify the installation:
ollama --version
Expected output: ollama version is 0.15.x (or newer). If the command fails, the Ollama service may not be running. Start it manually with ollama serve in a separate terminal window.
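If the version check succeeds but it is unclear whether the server is actually serving, a plain request to the local port settles it; a running instance answers with a short status string:
curl http://localhost:11434
# A running server responds with: Ollama is running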
Model selection determines the quality/speed/memory tradeoff. These are the current recommendations for 16 GB RAM systems, ordered by general coding effectiveness:
| Model | Download Size | Strengths | Command |
|---|---|---|---|
| gpt-oss:20b | ~13 GB | Strong coding, reliable tool calling. Top pick at this memory tier. | ollama pull gpt-oss:20b |
| glm-4.7-flash | ~12 GB | MoE architecture (30B total, 3B active per token). Fast inference, native tool calling, 128K context. | ollama pull glm-4.7-flash |
| qwen3-coder:14b | ~9 GB | Coding-specialized. Lower memory footprint, reasonable quality. | ollama pull qwen3-coder:14b |
| devstral-small | ~14 GB | Mistral's coding model. Competent at general development tasks. | ollama pull devstral-small |
Start with gpt-oss:20b. If memory pressure causes slowdowns or heavy swapping, drop to qwen3-coder:14b.
Ollama also hosts cloud-served models accessible through the same CLI. These run at full context length on remote infrastructure with a free tier:
ollama pull glm-4.7:cloud
ollama pull gpt-oss:120b-cloud
ollama pull minimax-m2.1:cloud
Cloud models are significantly more capable than anything that fits in 16 GB locally. They serve as a practical fallback when local inference is too slow or too limited for a given task. Note: data does leave the machine when using cloud models.
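Once pulled, a cloud model is selected the same way as a local one. Assuming the Claude Code endpoint configuration shown later in this guide, for example:
claude --model gpt-oss:120b-cloud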
For this guide, using gpt-oss:20b:
ollama pull gpt-oss:20b
This downloads approximately 13 GB of model weights. After completion, verify:
ollama list
The model should appear with its name, ID, and size.
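Before wiring the model into Claude Code, a quick smoke test directly through Ollama confirms that it loads and generates; the prompt is arbitrary:
ollama run gpt-oss:20b "Write a Python function that reverses a string."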
Claude Code installs as a standalone binary. The npm installation method is deprecated; use the native installer.
macOS / Linux:
curl -fsSL https://claude.ai/install.sh | bash
Reload the shell configuration:
source ~/.bashrc # or: source ~/.zshrc
Windows (PowerShell):
irm https://claude.ai/install.ps1 | iex
Windows (CMD):
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
Verify the installation:
claude --version
If the command is not found, confirm that ~/.local/bin or ~/.claude/bin is in the system PATH. On macOS/Linux:
echo 'export PATH="$HOME/.local/bin:$HOME/.claude/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
Three configuration methods, from simplest to most flexible:
Method 1: ollama launch (Ollama v0.15+)
ollama launch claude
This walks through model selection and starts Claude Code with the correct environment variables. No manual configuration required.
Method 2: Environment variables
macOS / Linux:
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
claude --model gpt-oss:20b
Add the three export lines to ~/.bashrc or ~/.zshrc to persist across sessions.
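For example, on a zsh setup the three lines can be appended in one step (use ~/.bashrc for bash):
cat >> ~/.zshrc <<'EOF'
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
EOF
source ~/.zshrc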
Windows (PowerShell):
$env:ANTHROPIC_BASE_URL = "http://localhost:11434"
$env:ANTHROPIC_AUTH_TOKEN = "ollama"
$env:ANTHROPIC_API_KEY = ""
claude --model gpt-oss:20b
Method 3: Claude Code settings file
Create or edit ~/.claude/settings.json:
{
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_API_KEY": ""
}
}
Then run Claude Code with the --model flag:
claude --model gpt-oss:20b
Navigate to a project directory and start Claude Code:
cd ~/my-project
claude --model gpt-oss:20b
Test with a prompt:
What files are in this project and what does it do?
or:
Write a Python function that reads a CSV and returns the top 5 rows sorted by a given column.
Claude Code reads the project files, reasons about the request, and writes or edits code, all powered by the local model.
Disconnect from the internet and run a prompt. A successful response confirms fully local operation with no external data transmission.
Claude Code performs better with large context windows. Ollama recommends at least 64K tokens for coding tools. On a 16 GB RAM machine, 16K to 32K is more realistic to avoid excessive memory pressure.
Set context length via environment variable before starting Ollama:
export OLLAMA_CONTEXT_LENGTH=32000
ollama serve
Cloud models do not have this constraint; they run at their full context length on remote infrastructure.
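For local models, an alternative to the global environment variable is baking the context length into a model variant with a Modelfile (num_ctx is Ollama's per-model context parameter; the variant name gpt-oss-32k below is arbitrary):
printf 'FROM gpt-oss:20b\nPARAMETER num_ctx 32768\n' > Modelfile
ollama create gpt-oss-32k -f Modelfile
claude --model gpt-oss-32k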
These are real constraints, not footnotes.
Inference speed. Local inference on 16 GB hardware is slow. Expect 10 to 60 seconds per response depending on complexity. Multi-file refactors can take several minutes. This is a fundamental hardware limitation, not a software bug.
Model quality. Open-source models in the 14B to 20B parameter range are competent for common coding patterns, code explanation, test generation, and standard refactoring. They fall short on complex multi-step reasoning, novel architecture decisions, and tasks requiring deep domain knowledge. They are not comparable to Opus 4.5/4.6 in capability.
Tool calling reliability. Claude Code depends on the model's ability to produce correctly formatted tool calls. gpt-oss:20b and glm-4.7-flash handle this consistently. Other models may fail intermittently. If tool calling breaks repeatedly with a given model, switch to one of these two.
Memory pressure at 16 GB. Running a 13 GB model leaves roughly 3 GB for the OS, context window, and other applications. Close unnecessary programs. Expect swapping if running memory-heavy applications alongside inference. If the system becomes unresponsive, switch to a smaller model or use Ollama Cloud.
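Ollama's process listing shows how much memory a loaded model actually occupies and how it is split between CPU and GPU, which makes memory pressure easy to spot:
ollama ps
# Lists loaded models with their size and CPU/GPU split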
Maturity. Ollama's Anthropic API compatibility shipped in January 2026. Edge cases in streaming and tool calling are still being patched. Check Ollama's release notes for fixes relevant to Claude Code workflows.
# --- ONE-TIME SETUP ---
# Install Ollama (or download from ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull gpt-oss:20b
# Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash
source ~/.bashrc
# --- DAILY USE ---
# Easiest method (Ollama v0.15+):
ollama launch claude
# Manual method:
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
cd ~/my-project
claude --model gpt-oss:20b
# --- MANAGEMENT ---
ollama list # List installed models
ollama pull <model> # Download a model
ollama rm <model> # Remove a model
claude --version # Check Claude Code version
To return to Anthropic's API for tasks that require a more capable model, unset the environment variables:
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
unset ANTHROPIC_API_KEY
Run claude without --model to use the default Anthropic backend. A practical workflow: use local models for routine development, switch to Anthropic for tasks where model quality is the bottleneck.
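One way to make that switch less tedious is a pair of small shell functions in ~/.bashrc or ~/.zshrc; the function names here are arbitrary:
# Point Claude Code at the local Ollama endpoint
claude-local() {
  export ANTHROPIC_BASE_URL="http://localhost:11434"
  export ANTHROPIC_AUTH_TOKEN="ollama"
  export ANTHROPIC_API_KEY=""
  claude --model "${1:-gpt-oss:20b}"
}
# Revert to the default Anthropic backend
claude-cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY
  claude "$@"
}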
- Ollama - Download
- Ollama Blog - Claude Code Compatibility
- Ollama Blog - ollama launch
- Ollama Docs - Claude Code Integration
- Claude Code - Setup Documentation
- Ollama Docs - Anthropic API Compatibility
February 2026. Models and tooling evolve rapidly; verify versions against official documentation before following these steps.