Radxa Dragon Q6A - AI Quick Start Guide

This guide sets up hardware-accelerated AI on the Radxa Dragon Q6A. We will run Llama 3.2 (an LLM) on the NPU and Whisper (speech-to-text) on the CPU to create a fully voice-interactive system.

Hardware: Radxa Dragon Q6A (QCS6490)
OS: Ubuntu 24.04 Noble (T7 Image or newer)
Status: ✅ Verified Working (Jan 2026)


πŸ› οΈ Step 1: System Preparation

Run these commands once to install drivers and set permissions.

1. Install Dependencies

sudo apt update
sudo apt install -y fastrpc fastrpc-dev libcdsprpc1 radxa-firmware-qcs6490 \
    python3-pip python3.12-venv libportaudio2 ffmpeg git alsa-utils
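
To confirm the NPU userspace packages actually installed, you can query dpkg; the package names below are exactly the ones from the command above:

# Should show status "ii" (installed) for each package
dpkg -l fastrpc libcdsprpc1 radxa-firmware-qcs6490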

2. Set Permanent NPU Permissions

This ensures you don't get "Permission Denied" errors after rebooting.

sudo tee /etc/udev/rules.d/99-fastrpc.rules << 'EOF'
KERNEL=="fastrpc-*", MODE="0666"
SUBSYSTEM=="dma_heap", KERNEL=="system", MODE="0666"
EOF

# Apply immediately
sudo udevadm control --reload-rules
sudo udevadm trigger
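
To verify that the rules took effect, check the device node permissions. The exact fastrpc node names (e.g. fastrpc-adsp, fastrpc-cdsp) depend on the firmware, so treat this as a quick sanity check:

# Both should report mode 0666 (crw-rw-rw-)
ls -l /dev/fastrpc-* /dev/dma_heap/system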

3. Create Python Virtual Environment

We use a virtual environment to prevent dependency conflicts with system packages.

# Create and activate
python3 -m venv ~/qai-venv
source ~/qai-venv/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install AI tools (Whisper via Transformers, Audio libraries)
pip install "transformers[torch]" librosa soundfile sounddevice accelerate

Note: We use HuggingFace Transformers for Whisper instead of qai_hub_models because the QCS6490 NPU requires quantized models, and the qai_hub_models Whisper variants don't support quantization for this device.
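
A quick way to confirm the environment is ready (with the venv still activated) is to import the packages just installed; this is only a smoke test:

# Should print the transformers version without errors
python3 -c "import transformers, librosa, soundfile, sounddevice; print(transformers.__version__)"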


🦙 Step 2: Set Up Llama 3.2 (NPU)

We use the 4096-token-context model for better conversation memory.

1. Download Model

(Note: the model download requires ~2 GB of free space.)

# Ensure you are NOT in the venv for this part (using system tools for binary download)
deactivate 2>/dev/null

# Install downloader
pip3 install modelscope --break-system-packages

# Download
mkdir -p ~/llama-4k && cd ~/llama-4k
modelscope download --model radxa/Llama3.2-1B-4096-qairt-v68 --local_dir .

# Make the runner executable
chmod +x genie-t2t-run
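
Before moving on, confirm that the runner and the config referenced by the chat script below actually landed in place (the rest of the file list may vary with the ModelScope package):

# Expect genie-t2t-run, the .json config, and the model binaries
ls -lh ~/llama-4k
test -f ~/llama-4k/htp-model-config-llama32-1b-gqa.json && echo "Config found"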

2. Create "Chat" Shortcut

Create a simple script to run the NPU model.

cd ~/llama-4k
cat << 'EOF' > chat
#!/bin/bash
cd ~/llama-4k
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"

# Llama 3 Prompt Format
PROMPT="<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n$1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

./genie-t2t-run -c htp-model-config-llama32-1b-gqa.json -p "$PROMPT"
EOF
chmod +x chat

Test it: ~/llama-4k/chat "What is the capital of France?"
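
If you prefer a back-and-forth session instead of one-shot prompts, a minimal wrapper can call the single-turn script in a loop (note: this sketch keeps no conversation history; each line is an independent prompt):

cat << 'EOF' > ~/llama-4k/chat-loop
#!/bin/bash
# Repeatedly read a line and pass it to the single-turn chat script
while read -r -p "You: " LINE; do
    [ -z "$LINE" ] && continue
    ~/llama-4k/chat "$LINE"
done
EOF
chmod +x ~/llama-4k/chat-loop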


πŸŽ™οΈ Step 3: Setup Whisper (Speech-to-Text)

We run Whisper on the CPU using HuggingFace Transformers. This approach is reliable and produces accurate transcriptions.

1. Create "Transcribe" Script

cat << 'EOF' > ~/transcribe.sh
#!/bin/bash
# Transcribe audio using Whisper via HuggingFace Transformers

source ~/qai-venv/bin/activate

python3 << PYTHON
from transformers import pipeline
import warnings
import sys

# Suppress deprecation warnings
warnings.filterwarnings("ignore")

# Use whisper-tiny for speed, or whisper-small for accuracy
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",
    device="cpu"
)

result = pipe("$1", generate_kwargs={"language": "en", "task": "transcribe"})
print(result["text"].strip())
PYTHON
EOF
chmod +x ~/transcribe.sh

Model Options:

  • openai/whisper-tiny (39M params) - Fastest, ~1x realtime
  • openai/whisper-base (74M params) - Balanced
  • openai/whisper-small (244M params) - Most accurate, ~2.3x realtime
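
To switch between these models without editing the script each time, here is a sketch of a variant that reads the model name from an environment variable (WHISPER_MODEL is a convention invented for this guide, not something Transformers defines):

cat << 'EOF' > ~/transcribe-model.sh
#!/bin/bash
# Like ~/transcribe.sh, but the model comes from $WHISPER_MODEL (default: whisper-tiny)
source ~/qai-venv/bin/activate
MODEL="${WHISPER_MODEL:-openai/whisper-tiny}"

python3 << PYTHON
from transformers import pipeline
import warnings
warnings.filterwarnings("ignore")

pipe = pipeline("automatic-speech-recognition", model="$MODEL", device="cpu")
result = pipe("$1", generate_kwargs={"language": "en", "task": "transcribe"})
print(result["text"].strip())
PYTHON
EOF
chmod +x ~/transcribe-model.sh

Usage: WHISPER_MODEL=openai/whisper-base ~/transcribe-model.sh some_audio.wav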

2. Verify Audio

Download a sample file to test the system.

cd ~
wget https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac -O test_audio.flac
~/transcribe.sh test_audio.flac

Expected Output: "He hoped there would be stew for dinner, turnips and carrots and bruised potatoes..."

3. (Optional) Benchmark Transcription Speed

cat << 'EOF' > ~/benchmark_whisper.sh
#!/bin/bash
source ~/qai-venv/bin/activate

python3 << 'PYTHON'
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa
import time

# Load model
print("Loading Whisper Small...")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load audio
audio, sr = librosa.load("test_audio.flac", sr=16000)
duration = len(audio) / sr
print(f"Audio: {duration:.2f} seconds")

# Transcribe
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
start = time.time()
with torch.no_grad():
    ids = model.generate(inputs.input_features, language="en", task="transcribe")
elapsed = time.time() - start

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(f"\nTranscription: {text}")
print(f"Time: {elapsed:.2f}s | Realtime factor: {elapsed/duration:.2f}x")
PYTHON
EOF
chmod +x ~/benchmark_whisper.sh
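
Run it from your home directory so the relative audio path resolves:

cd ~ && ~/benchmark_whisper.sh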

🤖 Step 4: The "Jarvis" Demo (Voice to AI)

Combine both tools! This script records your voice, converts it to text, sends it to Llama, and prints the answer.

1. Create the Voice Assistant Script

cat << 'EOF' > ~/voice-chat.sh
#!/bin/bash

RECORDING="$HOME/my_voice.wav"

echo "πŸ”΄ Recording... (Press Ctrl+C to stop, or wait 5 seconds)"
arecord -d 5 -f S16_LE -r 16000 -c 1 -t wav "$RECORDING" 2>/dev/null
echo "βœ… Processing..."

# 1. Speech to Text (Whisper on CPU)
echo "πŸŽ™οΈ Transcribing..."
USER_TEXT=$(~/transcribe.sh "$RECORDING")
echo "πŸ—£οΈ  You said: $USER_TEXT"

if [ -z "$USER_TEXT" ]; then
    echo "❌ No speech detected."
    exit 1
fi

# 2. Text to Intelligence (Llama on NPU)
echo "πŸ€– AI Thinking..."
~/llama-4k/chat "$USER_TEXT"
EOF
chmod +x ~/voice-chat.sh

2. Run It

Plug in a USB microphone and run:

~/voice-chat.sh
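
If you have more than one capture device, arecord may pick the wrong one. You can list the devices and, if needed, point arecord at a specific card (the card/device numbers below are examples; check your own arecord -l output):

# List capture devices, then record from card 1, device 0
arecord -l
arecord -D plughw:1,0 -d 5 -f S16_LE -r 16000 -c 1 -t wav ~/my_voice.wav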

📊 Performance Summary

Component   Model                  Processor        Performance
Brain       Llama 3.2 1B (4096)    NPU (Hexagon)    ~15 tokens/sec
Ears        Whisper Tiny           CPU (Kryo)       ~1x realtime
Ears        Whisper Small          CPU (Kryo)       ~2.3x realtime
Memory      System RAM             Shared           ~2.5 GB total

Why not Whisper on NPU?
The QCS6490 NPU requires quantized (INT8) model I/O, but the qai_hub_models Whisper variants only support float precision. Qualcomm's pre-quantized Whisper models require AIMET-ONNX which isn't available for aarch64 Linux. CPU inference via Transformers is the most reliable path.


πŸ› οΈ Advanced: Using Larger Whisper Models

For better accuracy with longer audio or difficult accents:

# Edit ~/transcribe.sh and change the model line:
# whisper-tiny → whisper-base → whisper-small → whisper-medium

# Or create a high-accuracy version:
cat << 'EOF' > ~/transcribe-accurate.sh
#!/bin/bash
source ~/qai-venv/bin/activate

python3 << PYTHON
from transformers import pipeline
import warnings
warnings.filterwarnings("ignore")

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    device="cpu"
)

result = pipe("$1", generate_kwargs={"language": "en", "task": "transcribe"})
print(result["text"].strip())
PYTHON
EOF
chmod +x ~/transcribe-accurate.sh
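
Usage is identical to the tiny version; expect noticeably slower but more accurate output:

~/transcribe-accurate.sh test_audio.flac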

πŸ› Troubleshooting

Issue                              Solution
Permission denied (/dev/fastrpc)   Run the Step 1 udev commands and reboot.
genie-t2t-run: not found           Ensure you are in ~/llama-4k and run chmod +x genie-t2t-run.
ModuleNotFoundError (Python)       Run source ~/qai-venv/bin/activate before using scripts.
Whisper outputs gibberish          Audio may be corrupt. Check with aplay your_audio.wav.
ALSA lib... warnings               Safe to ignore; audio still records correctly.
Slow Whisper performance           Use whisper-tiny instead of whisper-small.
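
When something misbehaves and the table above doesn't pin it down, running the basic checks in one pass usually does; this sketch only re-uses commands from earlier steps:

# Quick health check: NPU nodes, Llama files, Python venv, audio capture devices
ls -l /dev/fastrpc-* /dev/dma_heap/system
ls ~/llama-4k/genie-t2t-run ~/llama-4k/htp-model-config-llama32-1b-gqa.json
source ~/qai-venv/bin/activate && python3 -c "import transformers; print('venv OK')"
arecord -l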
