A fully local, privacy-first voice chat setup running on a single machine (tested on WSL2 with an NVIDIA RTX 4070 SUPER). You talk to an LLM through a web UI using your microphone, it responds with a cloned custom voice — all processing stays on your hardware. No cloud APIs required.
Latency is fairly high on non-Mac setups, but it is usable if you are patient. You can privately voice chat with your favorite video game character now!
Note that this README is not 100% tested; the real setup has been evolving organically, and it may be a bit chaotic. If you are stuck on something, try asking, or just work through it together with an AI assistant. E.g.:

- Open a terminal and start `wsl` (you will need WSL set up on Windows; on Linux this should all also work, even more easily).
- Copy this file to your home directory.
- Set up `nenv` with node-22, then run `npx --yes @mariozechner/pi-coding-agent@latest`. Use `/login`, pick Antigravity, and log in with your Google account (you will get a free-tier quota); then `/model` and pick `claude-opus-4-6`.
- Ask Opus to read the README and help you set things up.
- If there is a problem, open a separate session to solve it, to save context so your free quota lasts a bit longer.
```
┌────────────────────────────────────────┐
│ LLM Backend (LM Studio)                │
│ e.g. Qwen3-14B @ localhost:1234        │
└──────┬─────────────────────────────────┘
       │ OpenAI-compat chat API
       ▼
┌─────────────────────────────────────────────────────────────┐
│ Open WebUI (:8080)                                          │
│ Web UI for chat + voice I/O                                 │
│ socat SSL proxy (:8443) ← browser mic needs HTTPS           │
└────────────┬──────────────────────────┬─────────────────────┘
             │ STT (OpenAI-compat)      │ TTS (OpenAI-compat)
             ▼                          ▼
┌────────────────────────┐  ┌─────────────────────────────────┐
│ Parakeet TDT 0.6B v3   │  │ Qwen3-TTS 1.7B + Voice Clone    │
│ ONNX/CPU ASR (:5092)   │  │ GPU TTS (:8880)                 │
│ Docker container       │  │ Custom voice via .pkl prompt    │
└────────────────────────┘  └─────────────────────────────────┘
```
Components launched by `talk.sh`:
| # | Component | Port | Role |
|---|---|---|---|
| 1 | Parakeet TDT 0.6B v3 (Docker/CPU) | 5092 | Speech-to-Text — blazing fast ONNX ASR, OpenAI-compatible API |
| 2 | Qwen3-TTS (GPU, uv) | 8880 | Text-to-Speech — with a cloned custom voice loaded from a .pkl prompt |
| 3 | Open WebUI (uv) | 8080 | Chat frontend — connects to your local LLM and wires STT + TTS together |
| 4 | socat SSL proxy | 8443 | HTTPS wrapper — browsers require HTTPS for microphone access |
Not launched by `talk.sh` (run separately):

- LLM backend — e.g. LM Studio serving a model on `localhost:1234`, or Ollama on `localhost:11434`. Configure this in Open WebUI after first launch; a quick way to check the endpoint is shown after this list.
- For an RTX 4070S with 16GB VRAM, we recommend Qwen3-14B (great at roleplay!) with Context Length set to ~8000 and GPU Offload set to 15/40 (so that there is enough free VRAM for Qwen3-TTS).
- When using a small model, set a fairly short prompt. Example (explore on your own):
```
You are Vel'koz (the champion from LoL).
(It is unclear if Vel'Koz was the first Void-spawn to emerge on Runeterra, but there has certainly never been another to match his level of cruel, calculating sentience. While his kin devour or defile everything around them, he seeks instead to scrutinize and study the physical realm—and the strange, warlike beings that dwell there—for any weakness the Void might exploit. But Vel'Koz is far from a passive observer, striking back at threats with deadly plasma, or by disrupting the very fabric of the world itself.)
You are a friend(?), not an assistant.
You do not talk very much.
/no_think
```
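
Before wiring the backend into Open WebUI, you can check the endpoint from a terminal (port 1234 assumes LM Studio's default; use 11434 for Ollama):

```bash
# Lists the models the OpenAI-compatible endpoint exposes; if your model's id
# shows up in the returned JSON, Open WebUI will be able to see it too.
curl -s http://localhost:1234/v1/models
```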
- OS: Linux or WSL2 on Windows
- GPU: NVIDIA GPU with CUDA support (for Qwen3-TTS; ~4-6 GB VRAM for the 1.7B model)
- NVIDIA Driver: 525+ (CUDA 12.x)
- Docker + Docker Compose: For the Parakeet STT container
- uv: Python package manager (install)
- socat + openssl: For the HTTPS proxy (`sudo apt install socat openssl`)
- A local LLM server: LM Studio, Ollama, or any OpenAI-compatible endpoint
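
To confirm the prerequisites are in place before starting, a quick check (informational only; any "command not found" means that prerequisite still needs installing):

```bash
# Print versions of the required tools
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
docker --version && docker compose version
uv --version
socat -V | head -n 2
openssl version
```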
Ultra-fast multilingual ASR using NVIDIA's Parakeet TDT model converted to ONNX INT8. Runs on CPU (~30x real-time on modern Intel CPUs), exposed as an OpenAI-compatible `/v1/audio/transcriptions` endpoint.
```bash
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai
cd parakeet-tdt-0.6b-v3-fastapi-openai
docker compose up parakeet-cpu -d
```

The first run will build the Docker image and download the model (~1.2 GB). Verify it's working:
```bash
# Health check
curl http://localhost:5092/health
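
# Example transcription request (assumed to follow the OpenAI multipart
# shape; the model field is typically optional on local servers --
# sample.wav is any short speech clip you have on hand)
curl -s http://localhost:5092/v1/audio/transcriptions \
  -F file=@sample.wav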

# Web UI for testing
# Open http://localhost:5092 in a browser
```

Qwen3-TTS serves an OpenAI-compatible `/v1/audio/speech` endpoint. It uses the Base model variant, which supports voice cloning — a pre-extracted voice prompt (`.pkl` file) is loaded at startup so every response uses your custom voice.
Note: You must use the pasky/Qwen3-TTS-Openai-Fastapi fork — it adds `CUSTOM_VOICE` prompt support and automatic speech batching on top of the upstream repo.
```bash
mkdir -p ~/tts-clone
cd ~/tts-clone
git clone https://github.com/pasky/Qwen3-TTS-Openai-Fastapi
cd Qwen3-TTS-Openai-Fastapi
uv sync
```

The first `uv sync` will create a `.venv` and install all dependencies (including PyTorch with CUDA). The Qwen3-TTS model weights (~3.4 GB) are downloaded automatically from HuggingFace on first server start.
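
Before starting the server, it's worth confirming that the venv's PyTorch can actually see your GPU (a one-line check; run it from the repo directory so uv picks up the project environment):

```bash
# Should print True; False means the CUDA wheel or driver isn't set up
# and the TTS server will fail to start or fall back to CPU
uv run python -c "import torch; print(torch.cuda.is_available())"
```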
You need a voice prompt `.pkl` file — a pickled list of voice features extracted from reference audio. There are two ways to create one:
Option A: Using the Voice Studio web UI (recommended)
Start the server with `ENABLE_VOICE_STUDIO=true` (see step 2c), then open the Gradio Voice Studio at http://localhost:8880/voice-studio. Use the "Voice Clone" tab to upload reference audio, provide its transcript, generate a test sample, and save the profile. Export the profile as a `.pkl` file.
Option B: Using the standalone cloning script
Create a script like `clone_voice.py` in the `~/tts-clone/` directory:
```bash
cd ~/tts-clone

# Install dependencies for the cloning script
uv venv
# (this will redownload the qwen-tts model; ymmv, you can do this in the cloned repo above instead)
uv pip install qwen-tts torch torchaudio soundfile

# Prepare reference audio:
# - Use 5-15 seconds of clean speech from your target voice
# - Provide an accurate transcript of what's said in the audio
# - Supported formats: WAV, OGG, MP3

# Run the cloning script (example with a local file):
# To generate a script like this, just ask a coding agent to
# "generate a wrapper around model.create_voice_clone_prompt that
# will pickle and save its output" and give it this chapter's context.
uv run python clone_voice.py \
  --ref_audio "reference_audio.wav" \
  --ref_text "Exact transcript of the reference audio." \
  --save_prompt my_voice_prompt.pkl \
  --text "Test sentence to verify the cloned voice." \
  --output test_output.wav
```

This produces:

- `my_voice_prompt.pkl` — the reusable voice prompt (pass as the `CUSTOM_VOICE` env var)
- `test_output.wav` — a test audio file to verify the voice sounds right
You can iterate on the reference audio and transcript until you're happy with the result. Multiple reference utterances concatenated together tend to produce better voice quality.
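
If you have sox installed, concatenation is a one-liner (a sketch: the clips should share sample rate and channel count, and the `--ref_text` transcript must then cover all clips in order):

```bash
# sox joins its input files in order into one output file
sox clip1.wav clip2.wav clip3.wav reference_audio.wav
```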
```bash
cd ~/tts-clone/Qwen3-TTS-Openai-Fastapi
ENABLE_VOICE_STUDIO=true \
CUSTOM_VOICE=../my_voice_prompt.pkl \
TTS_MODEL_NAME=Qwen/Qwen3-TTS-12Hz-1.7B-Base \
HOST=0.0.0.0 \
PORT=8880 \
uv run python -m api.main
```

Key environment variables:
| Variable | Value | Purpose |
|---|---|---|
| `TTS_MODEL_NAME` | `Qwen/Qwen3-TTS-12Hz-1.7B-Base` | Must use the Base model for voice cloning (not CustomVoice) |
| `CUSTOM_VOICE` | Path to `.pkl` file | Pre-extracted voice prompt — all TTS output uses this voice |
| `ENABLE_VOICE_STUDIO` | `true` | Enables the Gradio Voice Studio UI at `/voice-studio` |
| `HOST` / `PORT` | `0.0.0.0` / `8880` | Listen address and port |
Verify it's working:
```bash
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, this is a test.", "voice": "alloy"}' \
  --output test.mp3
```

Open WebUI provides the chat interface with built-in voice input/output support. It connects to your local LLM backend and routes STT/TTS through the services above.
```bash
mkdir -p ~/openwebui
cd ~/openwebui
uv venv
uv pip install open-webui
```

However, upstream Open WebUI has voice-mode bugs (as of early 2026) — playback stalls when TTS generation is slower than sentence playback, ellipses break sentence splitting, and the voice prompt mode resets on every restart. The pasky/open-webui fork fixes these. To use it instead:
```bash
mkdir -p ~/openwebui
cd ~/openwebui
git clone https://github.com/pasky/open-webui .

# Install frontend dependencies and build
npm install
npm run build

# Install backend
uv venv
uv sync
```

Browsers require HTTPS to access the microphone. We use socat to wrap Open WebUI's HTTP port in SSL:
```bash
cd ~/openwebui

# Generate a self-signed certificate (valid 30 days)
openssl req -x509 -newkey rsa:4096 -keyout localhost.key -out localhost.cert \
  -days 30 -nodes -subj '/CN=localhost'

# Generate DH parameters (optional; 512 bits is quick to generate but weak,
# which is tolerable for a localhost-only proxy)
openssl dhparam -out dhparams.pem 512

# Combine key + cert into a single PEM file (required by socat)
cat localhost.key localhost.cert > localhost.pem
chmod 600 localhost.key localhost.pem
```

```bash
cd ~/openwebui

# Start the SSL proxy (background)
socat ssl-l:8443,reuseaddr,fork,cert=localhost.pem,verify=0 tcp4-connect:localhost:8080 &

# Start Open WebUI
uv run open-webui serve
```

Open WebUI will be available at:
- http://localhost:8080 — direct HTTP (no mic access)
- https://localhost:8443 — via SSL proxy (use this for voice chat)
On first visit to https://localhost:8443, your browser will warn about the self-signed certificate — accept/trust it to proceed.
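
You can also confirm the SSL proxy end to end from a terminal before involving the browser (`-k` skips verification of the self-signed certificate):

```bash
# Should print the start of the Open WebUI HTML page via the SSL wrapper
curl -ks https://localhost:8443/ | head -n 5
```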
After creating your admin account on first launch:
1. Connect your LLM backend:
   - Go to Admin Panel → Settings → Connections
   - Add your local LLM endpoint, e.g.:
     - LM Studio: `http://localhost:1234/v1` (OpenAI API section)
     - Ollama: `http://localhost:11434` (auto-detected; disable if not using Ollama!)
   - Save and verify models appear in the Models screen of Settings
2. Configure Speech-to-Text (Parakeet):
   - Go to Settings → Audio
   - Set STT Engine to `OpenAI`
   - Set OpenAI Base URL to `http://localhost:5092/v1`
   - Set OpenAI API Key to `sk-no-key-required`
   - Leave STT Model empty
3. Configure Text-to-Speech (Qwen3-TTS):
   - Go to Settings → Audio
   - Set TTS Engine to `OpenAI`
   - Set OpenAI Base URL to `http://localhost:8880/v1`
   - Set OpenAI API Key to `sk-no-key-required`
   - Set TTS Voice to `custom`
   - Set TTS Model to `qwen3-tts`
4. Switch off the personality-numbing voice mode defaults:
   - Go to Settings → Interface
   - Disable Voice Mode Custom Prompt
   - You may need to redo this after restarts unless you deployed the Open WebUI code modifications recommended above. If personality in voice mode seems degraded compared to text chat, open the LM Studio Developer screen, scroll back to the last `"role": "system"` block, and double-check that it ends with `/no_think` (or whatever your system prompt ends with) and not some random "you are a helpful assistant replying in short sentences" junk.
This script starts everything; save it as `talk.sh`:
```bash
#!/bin/bash
cd /home/freeman/parakeet-tdt-0.6b-v3-fastapi-openai/
docker compose up parakeet-cpu &

cd /home/freeman/tts-clone/Qwen3-TTS-Openai-Fastapi
VLLM_OMNI_LOG_LEVEL=DEBUG ENABLE_VOICE_STUDIO=true CUSTOM_VOICE=../velkoz_prompt.pkl TTS_MODEL_NAME=Qwen/Qwen3-TTS-12Hz-1.7B-Base HOST=0.0.0.0 PORT=8880 uv run python -m api.main &

cd /home/freeman/openwebui
socat ssl-l:8443,reuseaddr,fork,cert=localhost.pem,verify=0 tcp4-connect:localhost:8080 &
uv run open-webui serve
```
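
The script starts the services in parallel, so the very first request can race them coming up. If you want to wait until everything is actually answering, a small readiness gate can be run alongside it (a minimal sketch, assuming the default ports from the table above):

```bash
#!/bin/bash
# Optional readiness gate: block until each service answers over HTTP.
# Any HTTP response (even an error status) counts as "up" here, since we
# only probe that the port is being served.
wait_for() {
  until curl -ks "$2" >/dev/null 2>&1; do
    echo "waiting for $1 ..."
    sleep 2
  done
  echo "$1 is up"
}

wait_for "Parakeet STT" http://localhost:5092/health
wait_for "Qwen3-TTS"    http://localhost:8880/
wait_for "Open WebUI"   http://localhost:8080/
```

Run it from a second terminal after starting `./talk.sh`, and open the browser once all three report up.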
Once all components are set up (and LM Studio fired up with the model and API enabled!), `talk.sh` launches everything in one go:
```bash
chmod +x talk.sh
./talk.sh
```

From Windows, you can launch it directly into WSL:

```
C:\Windows\System32\wsl.exe bash --login -c "cd /home/freeman; ./talk.sh"
```
The script starts all four services (Parakeet Docker container, Qwen3-TTS server, socat SSL proxy, Open WebUI) and then you can open https://localhost:8443 in your browser to start chatting with voice.
Note that this setup expects WSL networking to be in mirrored mode (`networkingMode=mirrored` under `[wsl2]` in `.wslconfig`), so that ports opened on the Linux side also appear on localhost on the Windows side.
Use the "voice mode" in openwebui for best experience (it should be smoother than mic and read aloud icons).
Press Ctrl+C to stop the foreground Open WebUI process. Then clean up the background processes:
```bash
# Stop the Parakeet container
docker compose -f ~/parakeet-tdt-0.6b-v3-fastapi-openai/docker-compose.yml down

# Kill background socat and TTS processes
kill %1 %2  # or: pkill -f socat; pkill -f "api.main"
```

| Problem | Solution |
|---|---|
| Browser says "microphone blocked" | Make sure you're using `https://localhost:8443`, not HTTP |
| Certificate warning in browser | Expected with self-signed certs — click "Advanced" → "Proceed" |
| Parakeet container won't start | Run `docker compose up parakeet-cpu` (not `parakeet-gpu`) and check `docker logs parakeet-cpu` |
| TTS server crashes on start | Ensure you have enough VRAM (~4-6 GB). Check that the `CUSTOM_VOICE` path is correct and points to a valid `.pkl` |
| TTS output sounds wrong/generic | Verify `TTS_MODEL_NAME` is set to the Base model (`Qwen3-TTS-12Hz-1.7B-Base`), not CustomVoice |
| "Model not found" in Open WebUI | Check that your LLM backend (LM Studio/Ollama) is running and the connection URL is correct |
| STT not working in Open WebUI | Verify Parakeet is healthy (`curl http://localhost:5092/health`) and the Audio settings use the OpenAI engine with the correct base URL |
| Port conflicts | Ensure nothing else is using ports 5092, 8080, 8443, or 8880 |
| SSL certificate expired | Regenerate with the `openssl req` command above (default validity is 30 days) |
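
For the port-conflict row, this lists what is currently bound on the four ports (uses iproute2's `ss`, present on most modern distros; `sudo` lets it name processes owned by other users):

```bash
# Show listeners on the setup's ports; -p includes the owning process
sudo ss -tlnp | grep -E ':(5092|8080|8443|8880)'
```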
- Repo: https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai
- Model: NVIDIA Parakeet TDT 0.6B v3, ONNX INT8 quantized
- Runs on: CPU (Docker container)
- Performance: ~30x real-time on modern CPUs
- Languages: 25 European languages with auto-detection
- API: OpenAI-compatible `/v1/audio/transcriptions`
- Repo: https://github.com/pasky/Qwen3-TTS-Openai-Fastapi (fork of groxaxo/Qwen3-TTS-Openai-Fastapi)
- Model: Qwen/Qwen3-TTS-12Hz-1.7B-Base (supports voice cloning)
- Runs on: GPU (CUDA) — ~4-6 GB VRAM
- Languages: 10+ languages (English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
- API: OpenAI-compatible `/v1/audio/speech`
- Voice Cloning: Load a pre-built `.pkl` prompt via the `CUSTOM_VOICE` env var
- Site: https://openwebui.com/
- Recommended fork: https://github.com/pasky/open-webui (voice mode fixes)
- Version: 0.7.2+
- Runs on: CPU (Python/uv)
- Default port: 8080 (proxied to 8443 via socat for HTTPS)