Verified: February 2026
Hardware: Radxa Dragon Q6A (Qualcomm QCS6490)
OS: Ubuntu 24.04 (Noble)
Goal: create a "Jarvis-like" voice assistant that runs entirely on-device (offline) with low latency.
Getting AI running on embedded NPUs is often a trade-off between "possible" and "practical."
- Llama 3.2 (The Brain): Runs beautifully on the NPU using Qualcomm's Genie runtime. It's fast (~15-20 t/s) and efficient.
- Whisper (The Ears): While running Whisper on the NPU is possible, the available pre-compiled models often lack the "Decoder Loop" logic required to turn raw model outputs into text.
- Lesson: Instead of fighting complex C++ graphs for NPU Whisper, we use Whisper-Tiny on the CPU. It is lightweight, reliable, and more than fast enough for real-time speech (see the sketch after this list).
- Dependencies: Ubuntu 24.04 is strict about Python packages. Always use a virtual environment (venv).
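For reference, here is a minimal sketch of the CPU-only Whisper path described above, using the Hugging Face ASR pipeline and sounddevice (both installed in the venv step later in this guide). The model ID and record length are illustrative, not the guide's exact script:

```python
# asr_cpu_sketch.py -- minimal sketch of the CPU Whisper path: record a few
# seconds from the default microphone, then transcribe with the HF pipeline.
import numpy as np
import sounddevice as sd
from transformers import pipeline

SAMPLE_RATE = 16_000      # Whisper expects 16 kHz mono audio
RECORD_SECONDS = 5

# whisper-tiny is small enough to run in near real time on the QCS6490's CPU cores
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny", device=-1)

print("Listening...")
audio = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()                 # block until the recording finishes

result = asr({"raw": np.squeeze(audio), "sampling_rate": SAMPLE_RATE})
print("You said:", result["text"])
```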
- Radxa Dragon Q6A board
- Microphone (USB or onboard)
- Internet connection (for initial setup)
Install the audio and system libraries required for recording and processing.
sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490 \
python3-pip python3-venv libportaudio2 ffmpeg git
Your user needs permission to access the NPU (Hexagon DSP) device nodes. Create a udev rule so you don't need sudo later.
- Edit /etc/udev/rules.d/99-fastrpc.rules:
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
- Apply changes:
sudo udevadm control --reload-rules
sudo udevadm trigger
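To confirm the rule took effect, you can run a quick check from Python (a hypothetical helper, not part of the guide; it just inspects the FastRPC device node permissions):

```python
# check_npu_nodes.py -- confirm the udev rule applied: the FastRPC nodes
# should now be world read/writable (mode 0666).
import glob
import os
import stat

for node in sorted(glob.glob("/dev/fastrpc-*")):
    mode = stat.S_IMODE(os.stat(node).st_mode)
    status = "OK" if (mode & 0o666) == 0o666 else "re-run udevadm trigger"
    print(f"{node}: {oct(mode)} -> {status}")
```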
Set up a clean environment. Crucial: We pin transformers to version 4.48.1 to avoid audio pipeline bugs in newer versions.
# Create and activate venv
python3 -m venv ~/ai-assistant-venv
source ~/ai-assistant-venv/bin/activate
# Install core libraries
pip install --upgrade pip
pip install modelscope sounddevice soundfile numpy torch
# Install specific transformers version for Whisper stability
pip install "transformers==4.48.1"
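A quick, hypothetical sanity check you can run from inside the venv to confirm the pinned transformers version and that an audio input device is visible:

```python
# env_check.py -- run inside the venv: verify pinned versions and microphone visibility
import sys
import sounddevice as sd
import torch
import transformers

print("python       :", sys.executable)            # should point into ~/ai-assistant-venv
print("transformers :", transformers.__version__)  # expected: 4.48.1
print("torch        :", torch.__version__)
print("input devices:", [d["name"] for d in sd.query_devices()
                         if d["max_input_channels"] > 0])
```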
We use modelscope to fetch the Radxa-packaged Llama model, which ships with the genie-t2t-run NPU runner binary.
mkdir -p ~/llama-npu && cd ~/llama-npu
pip install modelscope  # Already installed in the venv above; run only if you skipped that step
modelscope download --model radxa/Llama3.2-1B-4096-qairt-v68 --local_dir .
# Make the NPU runner executable
chmod +x genie-t2t-run
- Download/Create llama_engine.py (The NPU wrapper).
- Download/Create npu_voice_assistant.py (The main application).
- Place them in your home folder (e.g., ~/ai-assistant/).
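Below is a minimal sketch of what llama_engine.py looks like, assuming genie-t2t-run accepts a config file and a prompt on the command line. The flag names and the config filename are assumptions: check the README inside the downloaded radxa/Llama3.2-1B-4096-qairt-v68 bundle for the exact invocation.

```python
# llama_engine.py -- sketch of the NPU wrapper around the bundled genie-t2t-run binary.
# Flag names and config filename below are assumptions; see the model bundle's README.
import subprocess
from pathlib import Path

MODEL_DIR = Path.home() / "llama-npu"

def generate(prompt: str, timeout: int = 120) -> str:
    """Run one prompt through the Llama 3.2 NPU runtime and return its stdout."""
    cmd = [
        str(MODEL_DIR / "genie-t2t-run"),
        "-c", str(MODEL_DIR / "genie_config.json"),  # assumed config filename
        "-p", prompt,                                # assumed prompt flag
    ]
    result = subprocess.run(cmd, cwd=MODEL_DIR, capture_output=True,
                            text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"genie-t2t-run failed: {result.stderr.strip()}")
    return result.stdout.strip()

if __name__ == "__main__":
    print(generate("You are Jarvis. Greet me in one sentence."))
```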
Always make sure your microphone is plugged in and its input level is turned up.
source ~/ai-assistant-venv/bin/activate
python3 npu_voice_assistant.py
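For orientation, here is a simplified sketch of the main loop inside npu_voice_assistant.py (the real script adds prompt formatting and output cleanup). It assumes the llama_engine.py module from the previous step sits alongside it and exposes generate():

```python
# npu_voice_assistant.py -- simplified main loop: record, transcribe on CPU,
# then answer with Llama 3.2 on the NPU via the llama_engine wrapper.
import numpy as np
import sounddevice as sd
from transformers import pipeline

import llama_engine  # the NPU wrapper sketched above

SAMPLE_RATE = 16_000
RECORD_SECONDS = 5

# Ears: Whisper-Tiny on the CPU (device=-1 forces CPU execution)
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny", device=-1)

while True:
    input("Press Enter, then speak...")
    audio = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    heard = asr({"raw": np.squeeze(audio), "sampling_rate": SAMPLE_RATE})["text"].strip()
    if not heard:
        continue
    print(f"You   : {heard}")
    # Brain: Llama 3.2 on the NPU
    print(f"Jarvis: {llama_engine.generate(heard)}")
```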
Credits: Developed through trial and error on the Radxa forums and community debugging sessions.