@mshuffett
Created January 28, 2026 09:32
Moltbot Cloud Consumer Product Plan - GPT 5.2 (Codex)

Moltbot Cloud Consumer Product: Architecture + Execution Plan

Author: OpenAI GPT-5.2 (via Codex CLI)
Date: 2026-01-28
Status: Proposal


Executive Summary (Opinionated)

Moltbot becomes a cloud-hosted unified inbox + AI agent with two connectivity modes:

  1. Cloud Connectors (server-side) for platforms that support reliable cloud operation (Telegram/Discord/Slack/Teams, etc.).
  2. Personal Bridge (client-side) for platforms that require a local device (iMessage, WhatsApp Web, Signal, and any future “device-tethered” channels).

The consumer product is primarily a web + mobile app (plus a lighter CLI), with the Mac app evolving into an optional Bridge + native notifications client rather than the primary gateway host.

Key design principle: “Cloud brain, optional local hands.” The cloud owns identity, storage, agent execution, billing, and the unified inbox. The bridge only owns device-bound sessions and forwards events over an encrypted tunnel.


1. Cloud Architecture

1.1 Service Decomposition (from monolith to cloud)

Break the current gateway into domain services that can scale independently and isolate channel risk.

Core (always-on)

  • API Gateway / BFF (HTTP + WebSocket)
    • Authenticated client APIs for web/mobile/CLI
    • Real-time subscriptions and presence
  • Identity & Billing
    • Users, devices, sessions, entitlements, subscriptions
  • Inbox / Messaging Core
    • Normalized conversations/messages/events
    • Search, retention, export
  • Connections Service
    • Channel accounts, OAuth tokens, bot tokens, webhooks, connection state
  • Event Bus
    • Internal pub/sub for message flow, agent triggers, retries
  • Agent Orchestrator
    • Turns inbound events into “agent jobs”
    • Enforces per-user policy, budgets, safety, routing to model
  • Media Service
    • Upload, transform, virus scan, CDN delivery, lifecycle policies

Channel execution plane

  • Cloud Connector Workers (per channel type)
    • Telegram connector, Discord connector, Slack connector, Teams connector, etc.
    • Responsible for:
      • receiving platform events
      • dedupe/order normalization
      • emitting canonical events into the Event Bus
      • sending outbound messages
  • Bridge Gateway
    • Manages persistent tunnels from user devices (Mac/mobile/desktop bridge)
    • Device registration, mTLS, NAT traversal
    • Multiplexes “local-only channel events” into Event Bus

Optional / later

  • Search service (if Postgres FTS isn’t enough)
  • Rules/Automation engine (IFTTT-like triggers beyond the agent)
  • Data warehouse + analytics for growth + retention metrics

1.2 Canonical Message Model (event-sourced-ish, practical)

Use a canonical event stream internally, backed by a relational store for queries.

Canonical inbound event types

  • message.received
  • message.edited
  • message.deleted
  • reaction.added
  • reaction.removed
  • conversation.created/updated
  • connection.state_changed
  • media.available
  • agent.requested (derived trigger)
  • agent.responded

Rule: all connectors (cloud + bridge) must emit these canonical events with stable IDs.
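
As a rough illustration of the rule above, a canonical event could be modelled as a small validated envelope. This is a minimal sketch, not the plan's actual schema: the field names (event_id, connection_id, occurred_at, payload) are assumptions, and "conversation.created/updated" is split into two type names for convenience.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Event types copied from the list above. Splitting "conversation.created/updated"
# into two names is an assumption made for this sketch.
CANONICAL_EVENT_TYPES = frozenset({
    "message.received", "message.edited", "message.deleted",
    "reaction.added", "reaction.removed",
    "conversation.created", "conversation.updated",
    "connection.state_changed", "media.available",
    "agent.requested", "agent.responded",
})


@dataclass(frozen=True)
class CanonicalEvent:
    """Envelope every connector (cloud or bridge) would emit. Field names are hypothetical."""
    event_id: str            # stable ID, e.g. a ULID
    type: str                # one of CANONICAL_EVENT_TYPES
    tenant_id: str
    connection_id: str
    occurred_at: datetime
    payload: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything outside the canonical vocabulary or without a stable ID.
        if self.type not in CANONICAL_EVENT_TYPES:
            raise ValueError(f"unknown canonical event type: {self.type}")
        if not self.event_id:
            raise ValueError("event_id must be a stable, non-empty ID")
```

Validating at construction time means a misbehaving connector fails loudly before anything reaches the Event Bus.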

1.3 Technology Choices (specific and pragmatic)

Baseline cloud stack (recommended)

  • Compute: AWS ECS Fargate (long-lived connectors) + AWS Lambda (spiky jobs)
    • Reason: connectors often need persistent sockets and predictable networking; Fargate tasks can run indefinitely, whereas a Lambda invocation is capped at 15 minutes.
  • Primary DB: Amazon RDS Postgres
    • Reason: multi-tenant relational data + strong consistency + mature indexing + JSONB.
  • Cache / ephemeral state: ElastiCache Redis
    • Rate limits, dedupe windows, websocket presence, short-lived locks.
  • Eventing / jobs (start): SQS + SNS
    • Simple, reliable retries, DLQs, cheap.
  • Event streaming (scale later): MSK (Kafka) or NATS JetStream
    • Add when throughput/fanout outgrows SQS/SNS patterns.
  • Media: S3 + CloudFront CDN
    • Presigned uploads, lifecycle, cheap storage/egress control.
  • Secrets: AWS Secrets Manager (+ KMS encryption)
  • Observability: OpenTelemetry + CloudWatch + managed traces (Datadog optional)

Web + client apps

  • Web app: Vercel (fast iteration, great DX)
  • API/BFF: still in AWS (latency is fine; stable networking; connector proximity)
  • Auth: Auth0 or Cognito (pick one; for consumer scale, Auth0 is faster to ship; Cognito is cheaper long-term)

If you want “single-provider simplicity,” do everything in AWS (including Next.js on ECS), but Vercel is a strong consumer-product speed advantage.

1.4 Database Design (multi-tenant, consumer-first)

Use tenant = household/workspace even for solo users; it future-proofs family/team plans.

Core tables (schema sketch)

tenants

  • id (ULID, pk)
  • plan (enum: free, pro, family, team, enterprise)
  • created_at

users

  • id (ULID, pk)
  • email (unique)
  • name
  • created_at

memberships

  • tenant_id (fk)
  • user_id (fk)
  • role (owner, admin, member)
  • created_at
  • unique(tenant_id,user_id)

devices

  • id (ULID)
  • tenant_id (fk)
  • user_id (fk)
  • platform (ios, android, mac, web, cli, bridge)
  • push_token (nullable)
  • public_key (for device E2E features, optional)
  • last_seen_at

connections (channel accounts)

  • id (ULID)
  • tenant_id
  • type (telegram, discord, slack, teams, whatsapp, imessage, signal, …)
  • mode (cloud, bridge)
  • display_name
  • status (connected, degraded, disconnected, action_required)
  • created_at, updated_at

connection_credentials

  • connection_id
  • provider (oauth, bot_token, session_blob)
  • encrypted_secret (KMS envelope)
  • scopes (jsonb)
  • expires_at
  • rotated_at

conversations

  • id (ULID)
  • tenant_id
  • primary_connection_id (nullable; for “source”)
  • title
  • kind (dm, group, channel)
  • created_at, updated_at

conversation_participants

  • conversation_id
  • participant_id (fk to identities)
  • role (member/admin)
  • unique(conversation_id,participant_id)

identities (normalized people/accounts)

  • id (ULID)
  • tenant_id
  • connection_id (nullable; identity can be cross-connection if linked)
  • provider_user_id (string)
  • display_name
  • handle (nullable)
  • avatar_url (nullable)

messages

  • id (ULID)
  • tenant_id
  • conversation_id
  • direction (inbound/outbound)
  • source (connection_id or “agent”)
  • provider_message_id (string, nullable, indexed)
  • sender_identity_id (nullable for system/agent)
  • text (nullable)
  • content (jsonb: blocks, mentions, formatting)
  • sent_at (timestamp)
  • received_at (timestamp)
  • dedupe_key (string, unique within tenant; used for idempotency)

message_attachments

  • id (ULID)
  • message_id
  • media_id
  • kind (image, video, audio, file)
  • metadata (jsonb)

media_objects

  • id (ULID)
  • tenant_id
  • sha256 (unique per tenant, optional global dedupe later)
  • mime_type
  • size_bytes
  • storage_key (s3 key)
  • created_at
  • status (uploading, ready, quarantined, failed)
  • variants (jsonb: thumbnails, transcodes)

agent_sessions

  • id (ULID)
  • tenant_id
  • conversation_id
  • policy (jsonb)
  • created_at, updated_at

agent_runs

  • id (ULID)
  • tenant_id
  • agent_session_id
  • trigger_message_id (nullable)
  • status (queued, running, succeeded, failed, cancelled)
  • model (string)
  • input_tokens, output_tokens
  • cost_usd_micros
  • started_at, ended_at
  • trace_id

usage_events (append-only)

  • id (ULID)
  • tenant_id
  • type (message_ingested, message_sent, media_bytes_stored, media_bytes_egressed, agent_tokens, connector_runtime_seconds, bridge_runtime_seconds)
  • quantity
  • at (timestamp)
  • meta (jsonb)

Notes

  • Use Row Level Security (RLS) keyed on tenant_id for safety.
  • Partition messages by time (monthly) once volume grows.
  • Start with Postgres full-text search; add OpenSearch later if needed.
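
To make the role of dedupe_key in the messages table concrete, here is a minimal sketch of idempotent ingestion. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; the plan itself targets Postgres, where the equivalent would be INSERT ... ON CONFLICT DO NOTHING. The trimmed-down column set and the function names are assumptions.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a reduced subset of the messages columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE messages (
            id              TEXT PRIMARY KEY,
            tenant_id       TEXT NOT NULL,
            conversation_id TEXT NOT NULL,
            text            TEXT,
            dedupe_key      TEXT NOT NULL,
            UNIQUE (tenant_id, dedupe_key)   -- "unique within tenant"
        )
        """
    )
    return conn


def ingest_message(conn: sqlite3.Connection, msg_id: str, tenant_id: str,
                   conversation_id: str, text: str, dedupe_key: str) -> bool:
    """Insert a message once; return False if it was a duplicate delivery."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO messages "
        "(id, tenant_id, conversation_id, text, dedupe_key) VALUES (?, ?, ?, ?, ?)",
        (msg_id, tenant_id, conversation_id, text, dedupe_key),
    )
    conn.commit()
    return cur.rowcount == 1
```

A connector that redelivers the same platform event would call ingest_message twice with the same dedupe_key; the second call returns False and downstream work (such as agent triggers) can be skipped.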

1.5 Real-time Event Streaming & Subscriptions

Client real-time (web/mobile)

  • WebSocket endpoint: wss://api.moltbot.com/v1/ws
  • Auth: short-lived JWT + device registration.
  • Subscription model:
    • subscribe: { conversation_id }
    • subscribe: { tenant_id, topics: ["connection.*", "agent.*"] }
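
One way the wildcard topics in the tenant subscription could be matched on the server is with simple glob matching. This is a sketch under assumptions: the class and method names are hypothetical, and glob semantics for "connection.*" are one plausible reading of the subscription model above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass(frozen=True)
class TenantSubscription:
    """A client's tenant-level subscription (hypothetical shape)."""
    tenant_id: str
    topics: Tuple[str, ...]   # e.g. ("connection.*", "agent.*")

    def wants(self, tenant_id: str, event_type: str) -> bool:
        # Never deliver events across tenants, whatever the topic pattern says.
        if tenant_id != self.tenant_id:
            return False
        # Case-sensitive glob match: "*" matches any run of characters.
        return any(fnmatchcase(event_type, pattern) for pattern in self.topics)
```

For example, a subscription with topics ("connection.*", "agent.*") would receive connection.state_changed but not message.received.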

Internal event propagation

  • Phase 1: SQS (jobs) + SNS (fanout) + Redis for short dedupe windows
  • Phase 2: Kafka/NATS for high-throughput ordering guarantees

Ordering & dedupe

  • Canonical event_id + dedupe_key per message.
  • Connectors must provide stable provider IDs; otherwise derive:
    • dedupe_key = hash(provider + conversation + sender + timestamp_bucket + content_hash)
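
The derived key above can be sketched as follows. The choice of SHA-256, the 60-second bucket width, and the field separator are all assumptions for illustration; the plan only specifies which inputs go into the hash.

```python
import hashlib


def derive_dedupe_key(provider: str, conversation: str, sender: str,
                      timestamp_s: float, content: str,
                      bucket_seconds: int = 60) -> str:
    """Derive a dedupe key when the platform gives no stable message ID."""
    # Coarsen the timestamp so small clock differences between redeliveries collapse.
    bucket = int(timestamp_s // bucket_seconds)
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Join with a unit separator so field boundaries stay unambiguous.
    material = "\x1f".join([provider, conversation, sender, str(bucket), content_hash])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

One limitation of bucketing worth noting: two deliveries of the same message that straddle a bucket boundary produce different keys, and two genuinely distinct but identical messages in the same bucket collapse into one. Stable provider IDs avoid both problems, which is why the derived key is only the fallback.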

1.6 Media Storage & Delivery

  • Upload flow:
    1. Client requests POST /v1/media/presign
    2. Client uploads to S3 via presigned URL
    3. Media service verifies checksum, marks ready
    4. Async pipeline generates variants (thumbnail, waveform, transcodes)
  • Delivery:
    • CloudFront signed URLs (prevents hotlinking and controls egress costs)
  • Security:
    • Virus scanning (ClamAV in Lambda/ECS) before ready
    • “Quarantine” bucket/prefix for suspect files
  • Retention:
    • Free tier: shorter retention + smaller quotas
    • Paid: longer retention + higher caps
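
The checksum and scan steps of the upload flow map naturally onto the media_objects status values (uploading, ready, quarantined, failed). The sketch below assumes the client declares a SHA-256 when requesting the presigned URL and that the scanner is a callable returning True for clean files; both are assumptions, and the real scanner would be ClamAV rather than a Python function.

```python
import hashlib
from typing import Callable


def finalize_upload(data: bytes, expected_sha256: str,
                    scan_is_clean: Callable[[bytes], bool]) -> str:
    """Return the next status for a media object currently in 'uploading'."""
    # Step 3 of the upload flow: verify the checksum the client declared.
    if hashlib.sha256(data).hexdigest() != expected_sha256.lower():
        return "failed"
    # Security rule: virus scan must pass before the object becomes 'ready'.
    if not scan_is_clean(data):
        return "quarantined"
    return "ready"
```

Variant generation (thumbnails, transcodes) would only be queued for objects that reach "ready".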

1.7 Agent/AI Execution Model (reliable + cost-controlled)

High-level

  • The agent is not a long-lived process. It is a job triggered by events, with state persisted to DB.
  • Agent orchestration steps:
    1. Inbound message event → policy evaluation (is agent enabled here?)
    2. Create agent_run (queued)
    3. Worker claims job with a conversation lock (Redis or DB advisory lock)
    4. Build context:
      • recent messages
      • conversation summary (if exists)
      • user preferences + rules
    5. Run model with tool-use enabled
    6. Persist tool calls + outputs
    7. Emit outbound message via connector
    8. Record usage + cost
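
The eight orchestration steps can be sketched as a single job function. Everything here is a simplified stand-in: an in-process threading.Lock replaces the Redis or DB advisory lock, the model and connector are passed in as plain callables, and the AgentRun fields are a reduced, hypothetical version of the agent_runs table.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# In-memory stand-in for a per-conversation Redis/DB advisory lock.
_conversation_locks = defaultdict(threading.Lock)


@dataclass
class AgentRun:
    conversation_id: str
    trigger_text: str
    status: str = "queued"
    reply: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0


def process_inbound(conversation_id: str, text: str, agent_enabled: bool,
                    recent_messages: List[str],
                    model: Callable[[List[str]], Tuple[str, int, int]],
                    send: Callable[[str, str], None]) -> Optional[AgentRun]:
    if not agent_enabled:                          # 1. policy evaluation
        return None
    run = AgentRun(conversation_id, text)          # 2. create agent_run (queued)
    lock = _conversation_locks[conversation_id]
    if not lock.acquire(blocking=False):           # 3. claim the conversation lock
        return run                                 #    stays queued; a real worker would retry
    try:
        run.status = "running"
        context = recent_messages[-20:] + [text]   # 4. build context (summary/prefs omitted)
        reply, in_tok, out_tok = model(context)    # 5. run the model
        run.reply = reply                          # 6. persist outputs (in memory here)
        send(conversation_id, reply)               # 7. emit outbound message via connector
        run.input_tokens, run.output_tokens = in_tok, out_tok  # 8. record usage
        run.status = "succeeded"
    except Exception:
        run.status = "failed"
    finally:
        lock.release()
    return run
```

Because the run record is created before the lock is taken, a job that cannot claim the conversation is still visible as "queued" rather than silently dropped.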

Context strategy (keeps cost sane)

  • Maintain rolling:
    • conversation_summary (short)
    • long_term_memory (explicit user-approved facts)
  • Use budgeted prompting:
    • Hard caps per plan for tokens/day and tokens/run
    • Graceful fallback: “I’m at my budget; upgrade or wait”
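
A budget check along these lines could sit in front of every model call. The cap values below are invented for illustration only (the plan does not specify per-day or per-run numbers), and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Budget:
    tokens_per_day: int
    tokens_per_run: int


# Hypothetical caps; real values would come from the pricing plan.
BUDGETS = {
    "free": Budget(tokens_per_day=2_000, tokens_per_run=1_000),
    "pro": Budget(tokens_per_day=70_000, tokens_per_run=8_000),
}

FALLBACK_MESSAGE = "I'm at my budget; upgrade or wait"


def tokens_allowed(plan: str, used_today: int, requested: int) -> int:
    """Clamp a request to both the per-run cap and what is left of today's cap."""
    budget = BUDGETS[plan]
    remaining_today = max(budget.tokens_per_day - used_today, 0)
    return min(requested, budget.tokens_per_run, remaining_today)


def plan_run(plan: str, used_today: int, requested: int) -> Tuple[int, Optional[str]]:
    """Return (token allowance, fallback message or None)."""
    allowed = tokens_allowed(plan, used_today, requested)
    if allowed == 0:
        return 0, FALLBACK_MESSAGE
    return allowed, None
```

Returning the fallback text from the same function keeps the "graceful" path explicit: the orchestrator sends that message instead of calling the model.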

Tool execution safety

  • Tool calls run in a capability sandbox:
    • “read-only” tools vs “write” tools
    • per-tenant allowlists (e.g., Google Calendar only if connected)
  • Audit every tool invocation: link each one to its agent_runs row and record it in a tool_calls table (not shown in the schema sketch in 1.4).
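
The capability sandbox can be illustrated with a small authorization check that also writes an audit record for every attempt, allowed or not. The types and the audit record shape are assumptions for this sketch; a real implementation would persist the audit rows rather than append to a list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List


class Access(Enum):
    READ = "read-only"
    WRITE = "write"


@dataclass(frozen=True)
class Tool:
    name: str
    access: Access


@dataclass(frozen=True)
class TenantPolicy:
    allowed_tools: FrozenSet[str]     # per-tenant allowlist
    allow_writes: bool = False        # write tools need explicit opt-in


def authorize(tool: Tool, policy: TenantPolicy) -> bool:
    if tool.name not in policy.allowed_tools:
        return False
    if tool.access is Access.WRITE and not policy.allow_writes:
        return False
    return True


def call_tool(tool: Tool, policy: TenantPolicy, audit_log: List[Dict]) -> bool:
    """Check the policy and audit the attempt whether or not it is allowed."""
    allowed = authorize(tool, policy)
    audit_log.append({"tool": tool.name, "access": tool.access.value, "allowed": allowed})
    return allowed
```

Logging denied attempts as well as successful ones makes it possible to see when the agent tries to reach beyond its allowlist.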

2. Onboarding Changes (web-first, <2 minutes)

2.1 Signup/Registration Flow

Goal: first successful inbound+outbound message in <2 minutes.

Flow

  1. Land on marketing page → “Get Started”
  2. Create account (email + magic link, or Apple/Google)
  3. Create tenant (default: “Personal”)
  4. Choose first channel to connect (recommend Telegram or Discord as fastest)
  5. Guided “send a test message” step
  6. Offer to enable AI agent (“Pi”) with privacy + cost explanation

2.2 Channel Connection Flows (by type)

Telegram (fastest path)

  • Option A (recommended): “Connect via Moltbot Bot”
    • User clicks deep link to start bot
    • Bot provides handshake code
    • Web app confirms code and binds Telegram chat to tenant
  • Option B: bring-your-own-bot token (advanced)
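
The handshake in Option A could work roughly as below: the bot issues a short-lived, single-use code for the Telegram chat, and the web app later confirms it to bind that chat to the tenant. This is a sketch only; the store is in-memory, and the code length, alphabet, and five-minute expiry are assumptions. No Telegram API calls are shown.

```python
import secrets
import string
import time
from typing import Callable, Dict, Optional, Tuple

ALPHABET = string.ascii_uppercase + string.digits


class HandshakeStore:
    """In-memory stand-in for wherever pending handshake codes would live."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: Dict[str, Tuple[int, float]] = {}
        self.bindings: Dict[int, str] = {}   # telegram chat_id -> tenant_id

    def issue(self, chat_id: int) -> str:
        """Called when the user starts the bot: returns the code the bot shows."""
        code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        self._pending[code] = (chat_id, self._clock() + self._ttl)
        return code

    def confirm(self, code: str, tenant_id: str) -> Optional[int]:
        """Called by the web app: binds the chat to the tenant if the code is valid."""
        entry = self._pending.pop(code.strip().upper(), None)   # single use
        if entry is None:
            return None
        chat_id, expires_at = entry
        if self._clock() > expires_at:
            return None
        self.bindings[chat_id] = tenant_id
        return chat_id
```

Popping the code on the first confirmation attempt means it cannot be replayed, even if it was valid.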

Discord

  • OAuth2 install flow:
    • “Add Moltbot to your server”
    • Select server + permissions
    • Confirm test message in a channel

Slack

  • Standard Slack OAuth:
    • Choose workspace
    • Install app + approve requested scopes
    • Moltbot receives workspace events via the Events API
    • Test message

MS Teams

  • Guided webhook install (or OAuth-based app install if supported in your chosen Teams integration approach)
  • Clear “copy/paste this URL” step + verification ping

2.3 First-run Experience (product UX)

The Unified Inbox screen opens immediately, even before all channels are connected:

  • Left sidebar:
    • “Connected” vs “Needs action”
  • Main panel:
    • A single “Welcome” conversation with Pi
    • A “Test message” checklist
  • Success moment:
    • A real inbound message appears, and Pi replies (if enabled)

2.4 iMessage / WhatsApp / Signal Pairing (Bridge UX)

These require a Personal Bridge.

Bridge pairing (unified pattern)

  1. User chooses “Connect iMessage” (or WhatsApp/Signal)
  2. Web app shows:
    • Download Mac app (or mobile app if supported)
    • A QR code / pairing code
  3. Bridge app signs in → scans code → establishes encrypted tunnel
  4. Bridge advertises capabilities:
    • imessage.send/read
    • whatsapp.send/read
    • signal.send/read
  5. Web app shows live “Connected” and runs a test message

Bridge security model

  • Each bridge device gets an identity (device_id) + mTLS cert.
  • The tunnel uses encrypted, mutually authenticated transport (TLS with the device's mTLS cert, plus device attestation).
  • Secrets for device-bound sessions remain on the bridge when possible; the cloud stores only what it must.

2.5 Free vs Paid Tier Differences (during onboarding)

  • Free
    • 1 cloud connector
    • limited message history (e.g., 7–30 days)
    • limited AI (small monthly token grant)
    • no local-bridge channels
  • Paid
    • unlock Bridge channels (iMessage/WhatsApp/Signal) on the Plus and Family plans
    • longer retention + search
    • higher AI budgets
    • multi-device sync + advanced automations

UX rule: don’t block early—let free users connect one cloud channel and feel value immediately, then upsell when they hit:

  • adding a second channel
  • enabling Pi autopilot
  • connecting iMessage/WhatsApp

3. App Changes

3.1 Mac App: from “Gateway Host” → “Bridge + Native Client”

New Mac app responsibilities

  • Optional Bridge runtime for device-bound channels (iMessage, WhatsApp Web, Signal)
  • System integrations:
    • native notifications
    • microphone/voice capture (if you keep voice wake)
  • Background reliability:
    • auto-start, reconnect, self-update
  • Troubleshooting UI:
    • connection health, logs, “Repair” actions

What disappears from Mac app

  • Owning the full gateway for cloud-compatible channels
  • Managing global config files and local YAML as primary UX

3.2 Mobile Apps: direct cloud clients

New mobile responsibilities

  • First-class unified inbox experience
  • Push notifications from cloud (APNs/FCM)
  • Optional: “mobile bridge” only if a channel truly needs it (avoid if possible; it’s battery-hostile)

3.3 Web Dashboard/App (primary surface)

Must include:

  • Unified inbox (search, filters, unread, pin)
  • Connection manager (connect/disconnect, scopes, health)
  • Agent controls:
    • per-conversation enable/disable
    • tone and behavior settings
    • budgets and privacy controls
  • Billing + usage dashboards
  • Data export / delete account (trust-critical)

3.4 CLI: from “gateway manager” → “cloud power tool”

Keep CLI, but reposition it for:

  • moltbot login
  • moltbot status (cloud connections, bridge health)
  • moltbot connect <channel> (opens browser OAuth, returns)
  • moltbot tail (stream events for debugging)
  • moltbot export (download archive)
  • moltbot selfhost (for the self-hosted SKU; separate docs)

4. Business Model

4.1 Pricing (concrete numbers; adjust with real costs)

Consumer plans (monthly)

  • Free — $0
    • 1 cloud connector
    • 30 days message retention
    • 1 GB media
    • Pi: 50k tokens/month (light usage)
  • Pro — $12
    • 5 cloud connectors
    • 1 year retention
    • 25 GB media
    • Pi: 2M tokens/month
    • Basic automations
  • Plus (Bridge) — $24
    • everything in Pro
    • Bridge channels (iMessage/WhatsApp/Signal)
    • 100 GB media
    • Pi: 6M tokens/month
  • Family — $35
    • up to 5 users in one tenant
    • shared inbox + shared Pi
    • Bridge included
    • pooled token + storage budgets

Teams (lightweight “prosumer/creator”)

  • Team — $49 (5 seats included)
    • shared inbox + roles
    • shared connectors
    • audit log
    • basic compliance export
    • extra seats $8/seat

Enterprise (custom)

  • SSO/SAML, SCIM, retention policies, legal hold, DLP integrations, dedicated support.

4.2 Usage Metering (what you actually bill on)

Track usage in usage_events, aggregate daily/monthly.

Metering dimensions:

  • Messages ingested (count)
  • Messages sent (count)
  • AI tokens (input/output)
  • Media stored (GB-month)
  • Media egress (GB)
  • Connector runtime (optional; mostly internal cost driver)
  • Bridge runtime (not billed directly; can be used for abuse detection)
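
Daily aggregation of usage_events can be sketched as a simple roll-up keyed by tenant, UTC day, and event type. The dictionary shape mirrors the usage_events columns (tenant_id, type, quantity, at); treating days as UTC days is an assumption, and in production this would more likely be a SQL GROUP BY than Python.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple

UsageKey = Tuple[str, date, str]   # (tenant_id, UTC day, usage type)


def aggregate_daily(events: Iterable[dict]) -> Dict[UsageKey, int]:
    """Sum quantities per tenant, per UTC day, per usage type."""
    totals: Dict[UsageKey, int] = defaultdict(int)
    for event in events:
        # Normalise to UTC so events near midnight land in a consistent day.
        day = event["at"].astimezone(timezone.utc).date()
        totals[(event["tenant_id"], day, event["type"])] += event["quantity"]
    return dict(totals)
```

Because usage_events is append-only, the roll-up can be recomputed at any time and compared against stored aggregates.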

Billing approach:

  • Plans include generous bundles.
  • Overages:
    • AI tokens: $5 per additional 2M tokens
    • Storage: $3 per additional 50 GB
    • Egress: bake into margins; only charge for extreme usage
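
As a worked example of the overage prices, the sketch below computes a monthly overage charge in cents. It uses the Pro and Plus bundle sizes from section 4.1 and assumes overages are billed per started block (rounded up) rather than prorated; the plan does not say which, so treat that as an assumption.

```python
import math

# Included bundles from section 4.1: (AI tokens per month, media GB).
INCLUDED = {"pro": (2_000_000, 25), "plus": (6_000_000, 100)}

TOKEN_BLOCK, TOKEN_BLOCK_CENTS = 2_000_000, 500     # $5 per additional 2M tokens
STORAGE_BLOCK_GB, STORAGE_BLOCK_CENTS = 50, 300     # $3 per additional 50 GB


def overage_cents(plan: str, tokens_used: int, storage_gb: float) -> int:
    """Monthly overage in cents, billing each started block in full (assumption)."""
    included_tokens, included_gb = INCLUDED[plan]
    extra_tokens = max(tokens_used - included_tokens, 0)
    extra_gb = max(storage_gb - included_gb, 0)
    token_blocks = math.ceil(extra_tokens / TOKEN_BLOCK)
    storage_blocks = math.ceil(extra_gb / STORAGE_BLOCK_GB)
    return token_blocks * TOKEN_BLOCK_CENTS + storage_blocks * STORAGE_BLOCK_CENTS
```

For instance, a Pro tenant using 4.5M tokens and 30 GB is 2.5M tokens and 5 GB over its bundle, which rounds up to two token blocks and one storage block: 2 × $5 + 1 × $3 = $13.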

4.3 Self-hosted vs Cloud (positioning)

  • Cloud: fastest setup, best UX, push notifications, managed AI, no YAML, no servers.
  • Self-hosted: maximum control, can run offline/local-only, advanced tinkering, community extensions.

Keep self-hosted as:

  • a premium “Power User” story, or
  • an open/core tier that increases trust and reduces churn risk from “lock-in” fear.

4.4 Enterprise/team features (what to build later)

  • Shared inbox assignment, labels, SLA timers
  • Audit logs + export APIs
  • Role-based access control
  • Per-channel restrictions
  • Data retention & legal hold

5. Migration Strategy

5.1 Migrating Existing Self-hosted Users

Migration goals

  • Preserve message history (if user wants)
  • Reduce reconnect pain
  • Keep local-only channels working via Bridge without requiring “a server”

Migration tool concept

  • moltbot cloud migrate
    • Auth to cloud
    • Upload:
      • session metadata
      • transcripts (optionally)
      • media (optional; can be huge—do incremental)
    • Create conversations + identities in cloud
    • Then guide:
      • reconnect cloud-compatible channels via OAuth
      • install bridge for local-only channels

Hybrid mode (important)

For a transition period, users can keep the existing gateway running but point it at the cloud as a “bridge-like” node. This reduces churn by avoiding an all-at-once migration.

5.2 Phased Implementation (maximize value early)

Phase 0 — Foundations (2–4 weeks)

  • Cloud identity + tenant model
  • Basic web app shell + billing scaffolding
  • Postgres schema + message ingestion API
  • WebSocket subscriptions for clients

Phase 1 — First cloud connector + unified inbox (4–8 weeks)

  • Ship Telegram connector (fastest)
  • Inbound + outbound messaging
  • Minimal agent response loop (on/off toggle)
  • Mobile push notifications (basic)

Success metric: new user can sign up and see Moltbot reply in Telegram in <2 minutes.

Phase 2 — More cloud connectors + reliability (6–10 weeks)

  • Discord + Slack
  • Robust retries, idempotency, connection health dashboard
  • Search + better conversation model

Phase 3 — Bridge MVP (6–12 weeks)

  • Mac bridge app + pairing
  • iMessage bridge first (big differentiator)
  • WhatsApp Web bridge second (if ToS/risk acceptable)
  • Signal bridge (if reliable in your environment)

Phase 4 — Product polish + growth (ongoing)

  • Automations, summaries, smart inbox
  • Family plan + multi-user tenants
  • Referrals, virality hooks, deeper analytics

5.3 Risk Assessment (and mitigations)

Platform/ToS risk (high)

WhatsApp Web automation and some unofficial APIs can be fragile or disallowed. Mitigation:

  • Prefer “user-run bridge” and be transparent.
  • Design connectors as replaceable modules.
  • Invest in reliability + graceful degradation UX.

Security risk (high)

You store extremely sensitive communications. Mitigation:

  • Strong tenant isolation, encryption, audit logs, least-privilege secrets.
  • Clear incident response plan early.
  • Optional “bring your own key” for advanced users later.

Reliability risk (medium)

Real-time messaging is unforgiving. Mitigation:

  • Idempotency everywhere, DLQs, replayable event log, connector health checks.

Cost risk (medium)

AI can burn margin fast. Mitigation:

  • Strict budgets, summaries, caching, and paid tiers tied to token grants.

6. Consumer Product Considerations

6.1 Why non-technical users will care

  • One inbox for “too many apps”
  • Fast search across everything
  • AI that can:
    • draft replies in your voice
    • summarize threads
    • remind you to follow up
    • extract action items
    • schedule/send based on your rules

The magic is not “it connects to 7 platforms.” The magic is: “I never miss important messages, and replying is effortless.”

6.2 Competitive positioning

Versus Beeper / unified inbox apps

  • Differentiator: AI-native workflows, not just aggregation.
  • Differentiator: bridge architecture for device-bound channels without forcing a self-hosted server.
  • Differentiator: credible self-host escape hatch to build trust.

Versus texts.com and similar

  • Compete on:
    • better agent + automation
    • better cross-channel search and summaries
    • privacy controls + transparent data handling

6.3 Key differentiators to lean into

  • Pi agent across all channels with consistent memory + controls
  • Per-conversation policies (“Pi is allowed here, not there”)
  • Bridge model that enables iMessage/WhatsApp/Signal without requiring a home server
  • Local-first option (self-host) for the privacy-conscious

6.4 Retention & engagement features (concrete)

  • Daily “Important messages” digest
  • Unread triage: “only show messages that need a reply”
  • Follow-up reminders (“nudge me if no response in 2 days”)
  • Contact intelligence: lightweight “CRM” notes per person (user-approved memory)
  • “Vacation mode” and “Do Not Disturb” across channels
  • Personal analytics: response time, inbox zero streaks (optional)

Appendix A: API Design (concrete)

Client HTTP API (examples)

  • POST /v1/auth/magiclink
  • GET /v1/me
  • GET /v1/tenants/:tenantId
  • POST /v1/connections/:type/start (returns OAuth URL or pairing QR payload)
  • POST /v1/connections/:connectionId/complete
  • GET /v1/conversations?cursor=...
  • GET /v1/conversations/:id/messages?cursor=...
  • POST /v1/messages (send)
    • body: { conversation_id, text, attachments[] }
  • POST /v1/agent/enable / POST /v1/agent/disable
  • POST /v1/media/presign

WebSocket topics

Client sends:

  • {"op":"subscribe","topic":"conversation","conversation_id":"..."}
  • {"op":"subscribe","topic":"tenant","tenant_id":"...","events":["connection.*","agent.*"]}

Server sends:

  • message.created
  • message.updated
  • conversation.updated
  • connection.state_changed
  • agent.run_started / agent.run_finished

Appendix B: Bridge Protocol (cloud tunnel)

Bridge registration

  • Bridge app signs in → obtains device credentials
  • Establishes tunnel: wss://bridge.moltbot.com/v1/tunnel
  • Heartbeats + capability advertisement:
    • { capabilities: ["imessage.read","imessage.send", ...], version, device_meta }

Message forwarding

  • Bridge → cloud canonical events:
    • message.received with provider_message_id, conversation mapping, attachments references
  • Cloud → bridge commands:
    • send.message with canonical payload
    • sync.history (optional; careful with volume)
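
To show how capability advertisement and command routing might fit together, the sketch below builds a heartbeat frame in the shape given above and checks whether a send.message command for a given channel can be routed to that bridge. JSON encoding and the helper names are assumptions; the plan does not fix a wire format.

```python
import json
from typing import Dict, Iterable


def heartbeat_frame(capabilities: Iterable[str], version: str,
                    device_meta: Dict[str, str]) -> str:
    """Serialise the bridge's heartbeat + capability advertisement."""
    return json.dumps({
        "capabilities": sorted(capabilities),
        "version": version,
        "device_meta": device_meta,
    })


def can_route_send(frame_json: str, channel: str) -> bool:
    """Cloud side: may a send.message command for this channel go to this bridge?"""
    frame = json.loads(frame_json)
    return f"{channel}.send" in frame.get("capabilities", [])
```

Checking the advertised capabilities before dispatch lets the cloud surface an "action required" state instead of sending commands a bridge cannot execute.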

Appendix C: What to Build First (if you want maximum consumer impact)

  1. Web signup + Telegram connector + unified inbox + Pi replies (fastest wow moment)
  2. Mobile push notifications + inbox UX polish
  3. Mac bridge for iMessage (killer differentiator, if reliable)
  4. Expand connectors + automations + family plan
