Claudir Architecture — Part 7: Database Schema

Part 7: Database Schema

Each bot maintains its own SQLite database (claudir.db) plus a shared bot-to-bot messaging database. The schema covers ~20 tables organized by domain.

Core Messaging

-- All messages the bot has seen (group + DM)
messages (
    chat_id INTEGER NOT NULL,
    message_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    username TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    text TEXT NOT NULL,
    reply_to_id INTEGER,          -- ID of the replied-to message
    reply_to_username TEXT,        -- Who wrote the reply target
    reply_to_text TEXT,            -- Text of the reply target
    has_image INTEGER DEFAULT 0,
    image_file_id TEXT,            -- Telegram file_id for image retrieval
    model TEXT,                    -- Which Claude model generated the response (if bot)
    PRIMARY KEY (chat_id, message_id)
)
-- Indexed on: timestamp, user_id, username, chat_id

-- Known users per chat
users (
    chat_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    username TEXT,
    first_name TEXT NOT NULL,
    join_date TEXT NOT NULL,
    last_message_date TEXT,
    message_count INTEGER DEFAULT 0,
    status TEXT DEFAULT 'member',  -- 'member', 'left', 'banned'
    PRIMARY KEY (chat_id, user_id)
)
-- Indexed on: username, status, chat_id

Both tables use composite primary keys (chat_id, *) — a migration from the original single-chat schema to support multi-chat operation.

Spam Enforcement

-- Strike tracking (persisted across restarts)
strikes (
    user_id INTEGER PRIMARY KEY,
    count INTEGER NOT NULL DEFAULT 0,
    last_strike_at TEXT
)

Reminders

reminders (
    id INTEGER PRIMARY KEY,
    chat_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,          -- Who created it
    message TEXT NOT NULL,             -- What to send
    trigger_at TEXT NOT NULL,          -- ISO8601 for time-based triggers
    repeat_cron TEXT,                  -- Cron expression for periodic (NULL = one-time)
    created_at TEXT NOT NULL,
    last_triggered_at TEXT,
    active INTEGER DEFAULT 1,
    -- Token-based triggers (alternative to time-based)
    token_trigger_at INTEGER,          -- One-time: fire when cumulative tokens >= this
    repeat_every_tokens INTEGER,       -- Repeating: fire every N tokens consumed
    last_token_trigger INTEGER         -- Track tokens at last trigger for repeat
)
-- Indexed on: trigger_at (active only), chat_id (active only), token_trigger_at (active + non-null)

Reminders support three trigger modes: one-time at a specific time, periodic via cron expression, and token-threshold triggers that fire based on cumulative API usage. The background reminder task checks every 60 seconds.

Focus Mode

-- Singleton: current focus target (NULL = process all)
focus_state (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    focused_chat_id INTEGER,
    focused_at TEXT
)

-- Per-chat cursor tracking
focus_chats (
    chat_id INTEGER PRIMARY KEY,
    chat_title TEXT,
    cursor_message_id INTEGER DEFAULT 0,  -- Last processed message_id
    last_injected_at TEXT,                -- Debounce: when context was last injected
    last_pending_count INTEGER,           -- Debounce: message count at last injection
    created_at TEXT NOT NULL
)

-- Human-friendly names for chat IDs
chat_aliases (
    alias TEXT PRIMARY KEY,
    chat_id INTEGER NOT NULL,
    created_at TEXT NOT NULL
)

Muting

-- Muted group chats
muted_chats (
    chat_id INTEGER PRIMARY KEY,
    muted_at TEXT NOT NULL,
    muted_until TEXT,              -- NULL = indefinite, otherwise ISO8601
    reason TEXT,
    messages_received INTEGER DEFAULT 0,
    unique_users TEXT DEFAULT '[]'  -- JSON array of user_ids (for unmute summary)
)

-- Muted DMs (separate from chat muting)
muted_dms (
    muted_at TEXT NOT NULL,
    muted_until TEXT,
    reason TEXT,
    messages_received INTEGER DEFAULT 0,
    unique_users TEXT DEFAULT '[]'
)

Billing & DMs

-- Star balances for paid DMs
user_balances (
    user_id INTEGER PRIMARY KEY,
    balance INTEGER DEFAULT 0,
    total_deposited INTEGER DEFAULT 0,
    total_spent INTEGER DEFAULT 0,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
)

-- Audit log for all star transactions
transactions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount INTEGER NOT NULL,          -- Positive = deposit, negative = spend
    balance_after INTEGER NOT NULL,
    transaction_type TEXT NOT NULL,    -- 'deposit', 'dm_text', 'dm_image_flash', etc.
    description TEXT,
    created_at TEXT NOT NULL
)

-- DM rate limiting (per user per hour)
dm_rate_limits (
    user_id INTEGER NOT NULL,
    hour TEXT NOT NULL,                -- 'YYYY-MM-DD HH' format
    message_count INTEGER DEFAULT 0,
    PRIMARY KEY (user_id, hour)
)

-- Free trial tracking (lifetime free messages per user)
dm_free_trial (
    user_id INTEGER PRIMARY KEY,
    messages_used INTEGER DEFAULT 0,
    created_at TEXT NOT NULL
)

-- Write-ahead log for crash-safe billing
-- Records intent BEFORE deducting stars; incomplete entries auto-refunded on startup
pending_dms (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    chat_id INTEGER NOT NULL,
    message_text TEXT,
    cost INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    completed_at TEXT                  -- NULL = incomplete, non-NULL = processed
)

-- Privacy consent tracking
dm_privacy_consent (
    user_id INTEGER PRIMARY KEY,
    consent_version TEXT NOT NULL,     -- e.g., "v1", "v2"
    consented_at TEXT NOT NULL,
    declined_at TEXT                   -- If user ever declined (for audit)
)

The pending_dms table is notable — it implements a write-ahead log pattern for billing. Before deducting stars from a user's balance, the harness writes the intent to pending_dms. If the process crashes mid-transaction, startup logic scans for incomplete entries (where completed_at IS NULL) and auto-refunds them. This ensures no user is ever charged for a message that wasn't delivered.

Search & Embeddings

-- Message embeddings for semantic auto-recall
embeddings (
    chat_id INTEGER NOT NULL,
    message_id INTEGER NOT NULL,
    embedding BLOB NOT NULL,          -- 768 f32 values = 3072 bytes
    text_preview TEXT NOT NULL,       -- First 200 chars for display
    username TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    PRIMARY KEY (chat_id, message_id)
)

-- Memory file embeddings for RAG over notes/research
memory_embeddings (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,     -- Chunk number within file (0-based)
    chunk_text TEXT NOT NULL,
    embedding BLOB NOT NULL,          -- 768 f32 values = 3072 bytes
    updated_at TEXT NOT NULL,
    UNIQUE(file_path, chunk_index)
)

-- LLM-generated summaries of message chunks (PageIndex)
page_index (
    id INTEGER PRIMARY KEY,
    chat_id INTEGER NOT NULL,
    start_message_id INTEGER NOT NULL,
    end_message_id INTEGER NOT NULL,
    start_timestamp TEXT NOT NULL,
    end_timestamp TEXT NOT NULL,
    message_count INTEGER NOT NULL,
    summary TEXT NOT NULL,            -- LLM-generated summary of the message chunk
    created_at TEXT NOT NULL,
    UNIQUE(chat_id, start_message_id)
)

PageIndex stores LLM-generated summaries of message chunks (pages). The page_search tool uses these summaries for retrieval — a cheaper and more robust approach than pure vector similarity. A separate page-index binary generates the summaries and runs as a background process.

Operations

-- Cross-bot heartbeat monitoring
heartbeats (
    bot_name TEXT PRIMARY KEY,        -- "nodira", "mirzo", "dilya"
    last_heartbeat TEXT NOT NULL,     -- ISO8601 timestamp
    iteration_count INTEGER DEFAULT 0 -- Total CC response cycles completed
)

-- Channel post rate limiting
channel_posts (
    chat_id INTEGER NOT NULL,
    date TEXT NOT NULL,
    post_count INTEGER DEFAULT 0,
    PRIMARY KEY (chat_id, date)
)

Design Notes

Everything is SQLite. No external databases, no Redis, no message brokers. SQLite handles all persistence including full-text message storage, billing, reminders, focus tracking, and embeddings. For a single-machine deployment serving Telegram bots, this is the right tradeoff — zero operational overhead, ACID guarantees, and good-enough performance.

All timestamps are ISO8601 strings. SQLite lacks native datetime types, so timestamps are stored as TEXT in YYYY-MM-DD HH:MM:SS format. This is human-readable in raw queries and sorts correctly as strings.

Composite primary keys for multi-chat. The original schema used message_id as the primary key for messages and user_id for users. When multi-chat support was added, these became (chat_id, message_id) and (chat_id, user_id) — a migration that preserved all existing data while enabling per-chat isolation.

Rustam-Z/07-database-schema.md

Select an option

No results found