Moltbot Cloud Consumer Product: Architecture + Execution Plan

Author: OpenAI GPT-5.2 (via Codex CLI) · Date: 2026-01-28 · Status: Proposal

Executive Summary (Opinionated)

Moltbot becomes a cloud-hosted unified inbox + AI agent with two connectivity modes:

Cloud Connectors (server-side) for platforms that support reliable cloud operation (Telegram/Discord/Slack/Teams, etc.).
Personal Bridge (client-side) for platforms that require a local device (iMessage, WhatsApp Web, Signal, and any future “device-tethered” channels).

The consumer product is primarily a web + mobile app (plus a lighter CLI), with the Mac app evolving into an optional Bridge + native notifications client rather than the primary gateway host.

Key design principle: “Cloud brain, optional local hands.” The cloud owns identity, storage, agent execution, billing, and the unified inbox. The bridge only owns device-bound sessions and forwards events over an encrypted tunnel.

1. Cloud Architecture

1.1 Service Decomposition (from monolith to cloud)

Break the current gateway into domain services that can scale independently and isolate channel risk.

Core (always-on)

API Gateway / BFF (HTTP + WebSocket)
- Authenticated client APIs for web/mobile/CLI
- Real-time subscriptions and presence
Identity & Billing
- Users, devices, sessions, entitlements, subscriptions
Inbox / Messaging Core
- Normalized conversations/messages/events
- Search, retention, export
Connections Service
- Channel accounts, OAuth tokens, bot tokens, webhooks, connection state
Event Bus
- Internal pub/sub for message flow, agent triggers, retries
Agent Orchestrator
- Turns inbound events into “agent jobs”
- Enforces per-user policy, budgets, safety, routing to model
Media Service
- Upload, transform, virus scan, CDN delivery, lifecycle policies

Channel execution plane

Cloud Connector Workers (per channel type)
- Telegram connector, Discord connector, Slack connector, Teams connector, etc.
- Responsible for:
  - receiving platform events
  - dedupe/order normalization
  - emitting canonical events into the Event Bus
  - sending outbound messages
Bridge Gateway
- Manages persistent tunnels from user devices (Mac/mobile/desktop bridge)
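- Device registration, mTLS, outbound-only tunnels (the bridge dials out, so no inbound ports or NAT traversal are needed on the user's network)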
- Multiplexes “local-only channel events” into Event Bus

Optional / later

Search service (if Postgres FTS isn’t enough)
Rules/Automation engine (IFTTT-like triggers beyond the agent)
Data warehouse + analytics for growth + retention metrics

1.2 Canonical Message Model (event-sourced-ish, practical)

Use a canonical event stream internally, backed by a relational store for queries.

Canonical inbound event types

message.received
message.edited
message.deleted
reaction.added
reaction.removed
conversation.created/updated
connection.state_changed
media.available
agent.requested (derived trigger)
agent.responded

Rule: all connectors (cloud + bridge) must emit these canonical events with stable IDs.
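A minimal sketch of a canonical event envelope, assuming Python and a hypothetical `CanonicalEvent` type. The `event_id` is derived deterministically with `uuid.uuid5` from tenant, connection, event type, and the provider's own event ID, so a redelivered platform event maps to the same ID. The namespace URL and field names are illustrative assumptions, not an existing Moltbot API.

```python
import uuid
from dataclasses import dataclass, field

# Event types from the list above; "conversation.created/updated" is split in two.
EVENT_TYPES = frozenset({
    "message.received", "message.edited", "message.deleted",
    "reaction.added", "reaction.removed",
    "conversation.created", "conversation.updated",
    "connection.state_changed", "media.available",
    "agent.requested", "agent.responded",
})

# Hypothetical namespace used only to derive deterministic event IDs.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://moltbot.com/events")


@dataclass(frozen=True)
class CanonicalEvent:
    type: str
    tenant_id: str
    connection_id: str
    provider_event_id: str  # stable ID supplied by the platform (or derived)
    payload: dict = field(default_factory=dict, compare=False)

    def __post_init__(self) -> None:
        # Reject anything outside the canonical vocabulary at the connector edge.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")

    @property
    def event_id(self) -> str:
        # Same inputs always yield the same UUID, so redeliveries collapse.
        name = "|".join(
            [self.tenant_id, self.connection_id, self.type, self.provider_event_id]
        )
        return str(uuid.uuid5(EVENT_NAMESPACE, name))
```

For `message.edited`, the provider event ID has to distinguish each edit (for example, message ID plus an edit version); otherwise successive edits would collapse into a single event.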

1.3 Technology Choices (specific and pragmatic)

Baseline cloud stack (recommended)

Compute: AWS ECS Fargate (long-lived connectors) + AWS Lambda (spiky jobs)
- Reason: connectors often need persistent sockets and predictable networking, which suit long-running ECS services; Lambda invocations are capped at 15 minutes.
Primary DB: Amazon RDS Postgres
- Reason: multi-tenant relational data + strong consistency + mature indexing + JSONB.
Cache / ephemeral state: ElastiCache Redis
- Rate limits, dedupe windows, websocket presence, short-lived locks.
Eventing / jobs (start): SQS + SNS
- Simple, reliable retries, DLQs, cheap.
Event streaming (scale later): MSK (Kafka) or NATS JetStream
- Add when throughput/fanout outgrows SQS/SNS patterns.
Media: S3 + CloudFront CDN
- Presigned uploads, lifecycle, cheap storage/egress control.
Secrets: AWS Secrets Manager (+ KMS encryption)
Observability: OpenTelemetry + CloudWatch + managed traces (Datadog optional)

Web + client apps

Web app: Vercel (fast iteration, great DX)
API/BFF: still in AWS (latency is fine; stable networking; connector proximity)
Auth: Auth0 or Cognito (pick one; for consumer scale, Auth0 is faster to ship; Cognito is cheaper long-term)

If you want “single-provider simplicity,” do everything in AWS (including Next.js on ECS); otherwise, Vercel gives a consumer product a real iteration-speed advantage.

1.4 Database Design (multi-tenant, consumer-first)

Use tenant = household/workspace even for solo users; it future-proofs family/team plans.

Core tables (schema sketch)

`tenants`

id (ULID, pk)
plan (enum: free, pro, family, team, enterprise)
created_at

`users`

id (ULID, pk)
email (unique)
name
created_at

`memberships`

tenant_id (fk)
user_id (fk)
role (owner, admin, member)
created_at
unique(tenant_id,user_id)

`devices`

id (ULID)
tenant_id (fk)
user_id (fk)
platform (ios, android, mac, web, cli, bridge)
push_token (nullable)
public_key (for device E2E features, optional)
last_seen_at

`connections` (channel accounts)

id (ULID)
tenant_id
type (telegram, discord, slack, teams, whatsapp, imessage, signal, …)
mode (cloud, bridge)
display_name
status (connected, degraded, disconnected, action_required)
created_at, updated_at

`connection_credentials`

connection_id
provider (oauth, bot_token, session_blob)
encrypted_secret (KMS envelope)
scopes (jsonb)
expires_at
rotated_at

`conversations`

id (ULID)
tenant_id
primary_connection_id (nullable; for “source”)
title
kind (dm, group, channel)
created_at, updated_at

`conversation_participants`

conversation_id
participant_id (fk to identities)
role (member/admin)
unique(conversation_id,participant_id)

`identities` (normalized people/accounts)

id (ULID)
tenant_id
connection_id (nullable; identity can be cross-connection if linked)
provider_user_id (string)
display_name
handle (nullable)
avatar_url (nullable)

`messages`

id (ULID)
tenant_id
conversation_id
direction (inbound/outbound)
source (connection_id or “agent”)
provider_message_id (string, nullable, indexed)
sender_identity_id (nullable for system/agent)
text (nullable)
content (jsonb: blocks, mentions, formatting)
sent_at (timestamp)
received_at (timestamp)
dedupe_key (string, unique within tenant; used for idempotency)

`message_attachments`

id (ULID)
message_id
media_id
kind (image, video, audio, file)
metadata (jsonb)

`media_objects`

id (ULID)
tenant_id
sha256 (unique per tenant, optional global dedupe later)
mime_type
size_bytes
storage_key (s3 key)
created_at
status (uploading, ready, quarantined, failed)
variants (jsonb: thumbnails, transcodes)

`agent_sessions`

id (ULID)
tenant_id
conversation_id
policy (jsonb)
created_at, updated_at

`agent_runs`

id (ULID)
tenant_id
agent_session_id
trigger_message_id (nullable)
status (queued, running, succeeded, failed, cancelled)
model (string)
input_tokens, output_tokens
cost_usd_micros
started_at, ended_at
trace_id

`usage_events` (append-only)

id (ULID)
tenant_id
type (message_ingested, message_sent, media_bytes_stored, media_bytes_egressed, agent_tokens, connector_runtime_seconds, bridge_runtime_seconds)
quantity
at (timestamp)
meta (jsonb)

Notes

Use Row Level Security (RLS) keyed on tenant_id for safety.
Partition messages by time (monthly) once volume grows.
Start with Postgres full-text search; add OpenSearch later if needed.
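A minimal sketch of the RLS note, assuming `tenant_id` is stored as text (ULID strings) and that the API sets a custom setting named `app.tenant_id` at the start of each transaction (for example with `SELECT set_config('app.tenant_id', $1, true)`). The helper only emits Postgres DDL for a migration; the setting name, policy name, and table list are assumptions.

```python
import re
from typing import List

# Tenant-scoped tables from the schema sketch above (illustrative list).
TENANT_TABLES = [
    "memberships", "devices", "connections", "conversations", "identities",
    "messages", "media_objects", "agent_sessions", "agent_runs", "usage_events",
]

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")
_SETTING = re.compile(r"^[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*$")


def rls_ddl(table: str, setting: str = "app.tenant_id") -> List[str]:
    """Return Postgres statements enforcing tenant isolation on one table."""
    # Identifiers are interpolated into SQL, so only allow plain names.
    if not _IDENT.match(table) or not _SETTING.match(setting):
        raise ValueError(f"unsafe identifier: {table!r} / {setting!r}")
    # missing_ok=true makes current_setting return NULL when unset; the
    # comparison is then never true, so the policy fails closed.
    predicate = f"tenant_id = current_setting('{setting}', true)"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]
```

Superusers and roles with the BYPASSRLS attribute skip these policies even with FORCE, so the application should connect as an ordinary role. If `tenant_id` ends up as a `uuid` column rather than text, the predicate needs a cast such as `current_setting('app.tenant_id', true)::uuid`.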

1.5 Real-time Event Streaming & Subscriptions

Client real-time (web/mobile)

WebSocket endpoint: wss://api.moltbot.com/v1/ws
Auth: short-lived JWT + device registration.
Subscription model:
- subscribe: { conversation_id }
- subscribe: { tenant_id, events: ["connection.*", "agent.*"] }

Internal event propagation

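Phase 1: SQS (jobs) + SNS (fanout) + Redis for short dedupe windows; standard SQS delivers at least once with best-effort ordering, so every consumer must be idempotent
Phase 2: Kafka or NATS JetStream when you need higher throughput, replay, and per-key ordering (e.g., keyed by conversation_id)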

Ordering & dedupe

Canonical event_id + dedupe_key per message.
Connectors must provide stable provider IDs; otherwise derive:
- dedupe_key = hash(provider + conversation + sender + timestamp_bucket + content_hash)
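A minimal sketch of the fallback derivation above, assuming Python. The 60-second bucket, the `|` separator, and the in-memory `DedupeWindow` (standing in for a Redis `SET key 1 NX EX <ttl>` check) are assumptions for illustration.

```python
import hashlib
import time
from typing import Dict, Optional


def derive_dedupe_key(provider: str, conversation: str, sender: str,
                      sent_at: float, content: str,
                      bucket_seconds: int = 60) -> str:
    """Fallback dedupe_key for connectors without stable provider IDs."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Coarse bucket so a redelivery with a slightly different timestamp
    # still lands on the same key.
    timestamp_bucket = int(sent_at // bucket_seconds)
    # A separator prevents ("ab", "c") and ("a", "bc") from colliding.
    material = "|".join(
        [provider, conversation, sender, str(timestamp_bucket), content_hash]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class DedupeWindow:
    """In-memory stand-in for a short Redis dedupe window."""

    def __init__(self, ttl_seconds: float = 600) -> None:
        self.ttl_seconds = ttl_seconds
        self._seen: Dict[str, float] = {}

    def first_time(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Evict expired keys, mimicking Redis key expiry.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self.ttl_seconds}
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

One caveat of fixed buckets: two copies that straddle a bucket boundary (e.g., at 59.9 s and 60.1 s) get different keys, so the derived key reduces duplicates but cannot guarantee exactly-once delivery; also checking the adjacent bucket is one possible mitigation.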

1.6 Media Storage & Delivery

Upload flow:
1. Client requests POST /v1/media/presign
2. Client uploads to S3 via presigned URL
3. Media service verifies the checksum and runs the virus scan, then marks the object ready (or quarantined)
4. Async pipeline generates variants (thumbnail, waveform, transcodes)
Delivery:
- CloudFront signed URLs (prevents hotlinking and controls egress costs)
Security:
- Virus scanning (ClamAV in Lambda/ECS) before ready
- “Quarantine” bucket/prefix for suspect files
Retention:
- Free tier: shorter retention + smaller quotas
- Paid: longer retention + higher caps
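A minimal sketch of the finalize step, assuming the client declares a SHA-256 digest when requesting the presigned URL and that `scan_is_clean` wraps the scanner (e.g., ClamAV). Both `finalize_upload` and the in-memory `MediaObject` are hypothetical; in practice the bytes would be streamed from S3 rather than held in memory.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

# media_objects.status values from the schema sketch in 1.4.
TERMINAL = {"ready", "quarantined", "failed"}


@dataclass
class MediaObject:
    id: str
    expected_sha256: str  # declared by the client at presign time
    status: str = "uploading"


def finalize_upload(obj: MediaObject, uploaded: bytes,
                    scan_is_clean: Callable[[bytes], bool]) -> str:
    """Move an uploaded object to ready, quarantined, or failed."""
    if obj.status in TERMINAL:
        # Idempotent: a duplicate "object created" notification is a no-op.
        return obj.status
    if hashlib.sha256(uploaded).hexdigest() != obj.expected_sha256:
        obj.status = "failed"  # corrupted or tampered upload
    elif not scan_is_clean(uploaded):
        obj.status = "quarantined"  # kept for review, never served
    else:
        obj.status = "ready"  # only now eligible for signed CloudFront URLs
    return obj.status
```

Making the transition idempotent matters because storage event notifications may be delivered more than once.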

1.7 Agent/AI Execution Model (reliable + cost-controlled)

High-level

The agent is not a long-lived process. It is a job triggered by events, with state persisted to DB.
Agent orchestration steps:
1. Inbound message event → policy evaluation (is agent enabled here?)
2. Create agent_run (queued)
3. Worker claims job with a conversation lock (Redis or DB advisory lock)
4. Build context:
  - recent messages
  - conversation summary (if exists)
  - user preferences + rules
5. Run model with tool-use enabled
6. Persist tool calls + outputs
7. Emit outbound message via connector
8. Record usage + cost
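A minimal single-process sketch of the eight steps, assuming Python. The callables (`agent_enabled`, `build_context`, `run_model`, `send_outbound`, `record_usage`) are hypothetical injection points, and an in-process `threading.Lock` per conversation stands in for the Redis or Postgres advisory lock named in step 3.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentRun:
    id: str
    conversation_id: str
    trigger_message_id: str
    status: str = "queued"  # queued | running | succeeded | failed
    tool_calls: List[dict] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0


class Orchestrator:
    """Single-process sketch; a real worker would use a distributed lock."""

    def __init__(self, agent_enabled: Callable[[str, str], bool],
                 build_context: Callable[[str], list],
                 run_model: Callable[[list], dict],
                 send_outbound: Callable[[str, str], None],
                 record_usage: Callable[[str, AgentRun], None]) -> None:
        self.agent_enabled = agent_enabled
        self.build_context = build_context
        self.run_model = run_model
        self.send_outbound = send_outbound
        self.record_usage = record_usage
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, conversation_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(conversation_id, threading.Lock())

    def handle_inbound(self, tenant_id: str, conversation_id: str,
                       message_id: str) -> Optional[AgentRun]:
        # 1. Policy evaluation: is the agent enabled in this conversation?
        if not self.agent_enabled(tenant_id, conversation_id):
            return None
        # 2. Create the run in the queued state.
        run = AgentRun(f"run-{message_id}", conversation_id, message_id)
        # 3. Claim with a per-conversation lock; if busy, leave it queued.
        lock = self._lock_for(conversation_id)
        if not lock.acquire(blocking=False):
            return run
        try:
            run.status = "running"
            context = self.build_context(conversation_id)  # 4. build context
            result = self.run_model(context)  # 5. run model (tools enabled)
            run.tool_calls.extend(result.get("tool_calls", []))  # 6. persist tool calls
            run.input_tokens, run.output_tokens = result["usage"]
            self.send_outbound(conversation_id, result["reply"])  # 7. emit outbound
            run.status = "succeeded"
        except Exception:
            run.status = "failed"  # the job queue decides whether to retry
        finally:
            lock.release()
            self.record_usage(tenant_id, run)  # 8. usage, even on failure
        return run
```

Returning a still-queued run when the lock is busy keeps one reply in flight per conversation; the queue can redeliver the job later instead of running two agent turns in parallel on the same thread.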

Context strategy (keeps cost sane)

Maintain rolling:
- conversation_summary (short)
- long_term_memory (explicit user-approved facts)
Use budgeted prompting:
- Hard caps per plan for tokens/day and tokens/run
- Graceful fallback: “I’m at my budget; upgrade or wait”
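A minimal sketch of budgeted prompting, assuming Python. The cap values in `PLAN_CAPS` and the 4-characters-per-token estimate are placeholders, not figures from this plan; the point is the shape: the summary is always kept, recent messages are added newest-first until the per-run budget is spent, and a run that would exceed a cap gets the fallback reply instead.

```python
from typing import List, Tuple

# Hypothetical per-plan caps; the plan only specifies that hard caps exist.
PLAN_CAPS = {
    "free": {"tokens_per_day": 2_000, "tokens_per_run": 1_500},
    "pro": {"tokens_per_day": 80_000, "tokens_per_run": 8_000},
}

BUDGET_MESSAGE = "I'm at my budget; upgrade or wait"


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def build_context(summary: str, recent: List[str],
                  max_tokens: int) -> Tuple[List[str], int]:
    """Keep the summary, then add the newest messages until the budget is hit."""
    parts = [summary] if summary else []
    used = sum(estimate_tokens(p) for p in parts)
    picked: List[str] = []
    for msg in reversed(recent):  # newest first
        cost = estimate_tokens(msg)
        if used + cost > max_tokens:
            break
        picked.append(msg)
        used += cost
    return parts + picked[::-1], used  # restore chronological order


def check_budget(plan: str, used_today: int, requested: int) -> Tuple[bool, str]:
    """Enforce both the per-run and the per-day hard cap."""
    caps = PLAN_CAPS[plan]
    if (requested > caps["tokens_per_run"]
            or used_today + requested > caps["tokens_per_day"]):
        return False, BUDGET_MESSAGE
    return True, ""
```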

Tool execution safety

Tool calls run in a capability sandbox:
- “read-only” tools vs “write” tools
- per-tenant allowlists (e.g., Google Calendar only if connected)
Audit every tool invocation in agent_runs plus a dedicated tool_calls table (not yet part of the schema sketch in 1.4).
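A minimal sketch of the capability check, assuming Python; the tool names, the `allow_writes` flag (for example, derived from the conversation's policy), and the in-memory audit list are hypothetical. Denials are recorded too, since a refused call is as informative in an audit as an executed one.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool  # "write" tools change external state; "read-only" tools do not
    requires_connection: Optional[str] = None  # e.g. "google_calendar"


def authorize_tool_call(tool: Tool, allowlist: Set[str], connected: Set[str],
                        allow_writes: bool,
                        audit_log: List[Tuple[str, str]]) -> bool:
    """Decide whether a tool may run for this tenant; every decision is audited."""
    if tool.name not in allowlist:
        decision = "deny:not_allowlisted"
    elif tool.requires_connection and tool.requires_connection not in connected:
        decision = "deny:connection_missing"
    elif tool.writes and not allow_writes:
        decision = "deny:writes_disabled"
    else:
        decision = "allow"
    audit_log.append((tool.name, decision))
    return decision == "allow"
```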

2. Onboarding Changes (web-first, <2 minutes)

2.1 Signup/Registration Flow

Goal: first successful inbound+outbound message in <2 minutes.

Flow

Land on marketing page → “Get Started”
Create account (email + magic link, or Apple/Google)
Create tenant (default: “Personal”)
Choose first channel to connect (recommend Telegram or Discord as fastest)
Guided “send a test message” step
Offer to enable AI agent (“Pi”) with privacy + cost explanation

2.2 Channel Connection Flows (by type)

Telegram (fastest path)

Option A (recommended): “Connect via Moltbot Bot”
- User clicks deep link to start bot
- Bot provides handshake code
- Web app confirms code and binds Telegram chat to tenant
Option B: bring-your-own-bot token (advanced)
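A minimal sketch of the Option A handshake, assuming Python and an in-memory store (a real deployment would keep pending codes in Redis or Postgres with the same TTL). The 8-character length, 10-minute TTL, and alphabet are assumptions; binding the returned `chat_id` to the tenant is left to the caller.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# Unambiguous alphabet (no 0/O or 1/I) so codes survive being retyped.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class HandshakeCodes:
    """Short-lived, single-use codes that bind a Telegram chat to a tenant."""

    def __init__(self, ttl_seconds: float = 600, length: int = 8) -> None:
        self.ttl_seconds = ttl_seconds
        self.length = length
        self._pending: Dict[str, Tuple[int, float]] = {}  # code -> (chat_id, issued_at)

    def issue(self, chat_id: int, now: Optional[float] = None) -> str:
        """Called by the bot when the user starts it via the deep link."""
        now = time.time() if now is None else now
        code = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._pending[code] = (chat_id, now)
        return code

    def confirm(self, code: str, now: Optional[float] = None) -> Optional[int]:
        """Called by the web app; returns the chat_id to bind, or None."""
        now = time.time() if now is None else now
        entry = self._pending.pop(code.strip().upper(), None)  # single use
        if entry is None:
            return None
        chat_id, issued_at = entry
        if now - issued_at > self.ttl_seconds:
            return None  # expired
        return chat_id
```

With 32 symbols and 8 characters there are about 1.1 × 10^12 possible codes; combined with a rate limit on confirmation attempts, guessing a live code is impractical.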

Discord

OAuth2 install flow:
- “Add Moltbot to your server”
- Select server + permissions
- Confirm test message in a channel

Slack

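Standard Slack OAuth:
- Choose workspace
- Install app + approve requested scopes
- Messages arrive via the Events API (event subscriptions are configured once on the Moltbot app, not by each user)
- Test message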

MS Teams

Guided webhook install (or OAuth-based app install if supported in your chosen Teams integration approach)
Clear “copy/paste this URL” step + verification ping

2.3 First-run Experience (product UX)

The Unified Inbox screen opens immediately, even before all channels are connected:

Left sidebar:
- “Connected” vs “Needs action”
Main panel:
- A single “Welcome” conversation with Pi
- A “Test message” checklist
Success moment:
- A real inbound message appears, and Pi replies (if enabled)

2.4 iMessage / WhatsApp / Signal Pairing (Bridge UX)

These require a Personal Bridge.

Bridge pairing (unified pattern)

User chooses “Connect iMessage” (or WhatsApp/Signal)
Web app shows:
- Download Mac app (or mobile app if supported)
- A QR code / pairing code
Bridge app signs in → scans code → establishes encrypted tunnel
Bridge advertises capabilities:
- imessage.send/read
- whatsapp.send/read
- signal.send/read
Web app shows live “Connected” and runs a test message

Bridge security model

Each bridge device gets an identity (device_id) + mTLS cert.
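Tunnel traffic is encrypted in transit (mutual TLS), with device attestation at registration. This is transport encryption that terminates at the Bridge Gateway, not end-to-end encryption: the cloud can read message content, which the inbox and agent require.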
Secrets for device-bound sessions remain on the bridge when possible; the cloud stores only what it must.

2.5 Free vs Paid Tier Differences (during onboarding)

Free
- 1 cloud connector
- limited message history (30 days)
- limited AI (small monthly token grant)
- no local-bridge channels
Paid
- Bridge channels (iMessage/WhatsApp/Signal) on the Plus and Family plans
- longer retention + search
- higher AI budgets
- multi-device sync + advanced automations

UX rule: don’t block early—let free users connect one cloud channel and feel value immediately, then upsell when they hit:

adding a second channel
enabling Pi autopilot
connecting iMessage/WhatsApp

3. App Changes

3.1 Mac App: from “Gateway Host” → “Bridge + Native Client”

New Mac app responsibilities

Optional Bridge runtime for device-bound channels (iMessage, WhatsApp Web, Signal)
System integrations:
- native notifications
- microphone/voice capture (if you keep voice wake)
Background reliability:
- auto-start, reconnect, self-update
Troubleshooting UI:
- connection health, logs, “Repair” actions

What disappears from Mac app

Owning the full gateway for cloud-compatible channels
Managing global config files and local YAML as primary UX

3.2 Mobile Apps: direct cloud clients

New mobile responsibilities

First-class unified inbox experience
Push notifications from cloud (APNS/FCM)
Optional: “mobile bridge” only if a channel truly needs it (avoid if possible; it’s battery-hostile)

3.3 Web Dashboard/App (primary surface)

Must include:

Unified inbox (search, filters, unread, pin)
Connection manager (connect/disconnect, scopes, health)
Agent controls:
- per-conversation enable/disable
- tone and behavior settings
- budgets and privacy controls
Billing + usage dashboards
Data export / delete account (trust-critical)

3.4 CLI: from “gateway manager” → “cloud power tool”

Keep CLI, but reposition it for:

moltbot login
moltbot status (cloud connections, bridge health)
moltbot connect <channel> (opens browser OAuth, returns)
moltbot tail (stream events for debugging)
moltbot export (download archive)
moltbot selfhost (for the self-hosted SKU; separate docs)

4. Business Model

4.1 Pricing (concrete numbers; adjust with real costs)

Consumer plans (monthly)

Free — $0
- 1 cloud connector
- 30 days message retention
- 1 GB media
- Pi: 50k tokens/month (light usage)
Pro — $12
- 5 cloud connectors
- 1 year retention
- 25 GB media
- Pi: 2M tokens/month
- Basic automations
Plus (Bridge) — $24
- everything in Pro
- Bridge channels (iMessage/WhatsApp/Signal)
- 100 GB media
- Pi: 6M tokens/month
Family — $35
- up to 5 users in one tenant
- shared inbox + shared Pi
- Bridge included
- pooled token + storage budgets

Teams (lightweight “prosumer/creator”)

Team — $49 (5 seats included)
- shared inbox + roles
- shared connectors
- audit log
- basic compliance export
- extra seats $8/seat

Enterprise (custom)

SSO/SAML, SCIM, retention policies, legal hold, DLP integrations, dedicated support.

4.2 Usage Metering (what you actually bill on)

Track usage in usage_events, aggregate daily/monthly.

Metering dimensions:

Messages ingested (count)
Messages sent (count)
AI tokens (input/output)
Media stored (GB-month)
Media egress (GB)
Connector runtime (optional; mostly internal cost driver)
Bridge runtime (not billed directly; can be used for abuse detection)
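A minimal sketch of the daily/monthly aggregation, assuming Python and `usage_events` rows shaped like the schema in 1.4. Counters (messages, tokens, egress bytes) can simply be summed per month, but "Media stored (GB-month)" is a time average, so the sketch assumes a hypothetical daily snapshot of stored bytes per tenant; decimal GB (10^9 bytes) is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_totals(events: Iterable[dict]) -> Dict[Tuple[str, str, str], int]:
    """Sum usage_events quantities per (tenant_id, type, YYYY-MM)."""
    totals: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for e in events:
        month = e["at"].strftime("%Y-%m")  # "at" is a datetime
        totals[(e["tenant_id"], e["type"], month)] += e["quantity"]
    return dict(totals)


def gb_months(daily_snapshot_bytes: List[int], days_in_month: int) -> float:
    """GB-month from one stored-bytes snapshot per day (missing days count as 0)."""
    return sum(daily_snapshot_bytes) / days_in_month / 10**9
```

For example, 30 GB held for 15 days of a 30-day month comes to 15 GB-month.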

Billing approach:

Plans include generous bundles.
Overages:
- AI tokens: $5 per additional 2M tokens
- Storage: $3 per additional 50GB
- Egress: bake into margins; only charge for extreme usage
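A worked sketch of the overage arithmetic, assuming Python and whole-block billing rounded up (the plan does not say how partial blocks are charged). For example, a Pro user with 4.5M tokens and 30 GB of media pays $12 + 2 × $5 (2.5M extra tokens round up to two 2M blocks) + 1 × $3 (5 GB extra rounds up to one 50 GB block) = $25.

```python
import math

# Bundles from 4.1 (paid plans only; Free is assumed to hard-cap instead).
PLANS = {
    "pro": {"price_usd": 12, "tokens": 2_000_000, "media_gb": 25},
    "plus": {"price_usd": 24, "tokens": 6_000_000, "media_gb": 100},
}
TOKEN_BLOCK, TOKEN_BLOCK_USD = 2_000_000, 5  # $5 per additional 2M tokens
STORAGE_BLOCK_GB, STORAGE_BLOCK_USD = 50, 3  # $3 per additional 50 GB


def monthly_bill_usd(plan: str, tokens_used: int, media_gb: float) -> int:
    p = PLANS[plan]
    extra_tokens = max(0, tokens_used - p["tokens"])
    extra_gb = max(0.0, media_gb - p["media_gb"])
    # Assumption: overages are sold in whole blocks, rounded up.
    token_blocks = math.ceil(extra_tokens / TOKEN_BLOCK)
    storage_blocks = math.ceil(extra_gb / STORAGE_BLOCK_GB)
    return (p["price_usd"]
            + token_blocks * TOKEN_BLOCK_USD
            + storage_blocks * STORAGE_BLOCK_USD)
```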

4.3 Self-hosted vs Cloud (positioning)

Cloud: fastest setup, best UX, push notifications, managed AI, no YAML, no servers.
Self-hosted: maximum control, can run offline/local-only, advanced tinkering, community extensions.

Keep self-hosted as:

a premium “Power User” story, or
an open/core tier that increases trust and reduces churn risk from “lock-in” fear.

4.4 Enterprise/team features (what to build later)

Shared inbox assignment, labels, SLA timers
Audit logs + export APIs
Role-based access control
Per-channel restrictions
Data retention & legal hold

5. Migration Strategy

5.1 Migrating Existing Self-hosted Users

Migration goals

Preserve message history (if user wants)
Reduce reconnect pain
Keep local-only channels working via Bridge without requiring “a server”

Migration tool concept

moltbot cloud migrate
- Auth to cloud
- Upload:
  - session metadata
  - transcripts (optionally)
  - media (optional; can be huge—do incremental)
- Create conversations + identities in cloud
- Then guide:
  - reconnect cloud-compatible channels via OAuth
  - install bridge for local-only channels
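A minimal sketch of the incremental media step, assuming Python, a local media directory, and a hypothetical `upload(path, sha256)` callable that wraps the presign + upload flow from 1.6. Content hashes act as the resume checkpoint, so re-running `moltbot cloud migrate` after an interruption skips files already sent, and identical files are uploaded once; the JSON checkpoint file is an assumption.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Union


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def incremental_upload(media_dir: Union[str, Path],
                       checkpoint_path: Union[str, Path],
                       upload: Callable[[Path, str], None],
                       batch_size: int = 50) -> int:
    """Upload files not yet recorded in the checkpoint; returns how many were sent.

    Keep checkpoint_path outside media_dir so it is not uploaded itself.
    """
    checkpoint = Path(checkpoint_path)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    sent = 0
    for path in sorted(Path(media_dir).rglob("*")):
        if not path.is_file():
            continue
        digest = sha256_file(path)
        if digest in done:
            continue  # already uploaded (or an identical file was)
        upload(path, digest)
        done.add(digest)
        sent += 1
        if sent % batch_size == 0:
            # Persist progress so a crash re-sends at most one batch.
            checkpoint.write_text(json.dumps(sorted(done)))
    checkpoint.write_text(json.dumps(sorted(done)))
    return sent
```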

Hybrid mode (important)

For a transition period:

Users can keep the existing gateway running but point it at the cloud as a “bridge-like” node. This reduces churn by avoiding an all-at-once migration.

5.2 Phased Implementation (maximize value early)

Phase 0 — Foundations (2–4 weeks)

Cloud identity + tenant model
Basic web app shell + billing scaffolding
Postgres schema + message ingestion API
WebSocket subscriptions for clients

Phase 1 — First cloud connector + unified inbox (4–8 weeks)

Ship Telegram connector (fastest)
Inbound + outbound messaging
Minimal agent response loop (on/off toggle)
Mobile push notifications (basic)

Success metric: new user can sign up and see Moltbot reply in Telegram in <2 minutes.

Phase 2 — More cloud connectors + reliability (6–10 weeks)

Discord + Slack
Robust retries, idempotency, connection health dashboard
Search + better conversation model

Phase 3 — Bridge MVP (6–12 weeks)

Mac bridge app + pairing
iMessage bridge first (big differentiator)
WhatsApp Web bridge second (if ToS/risk acceptable)
Signal bridge (if reliable in your environment)

Phase 4 — Product polish + growth (ongoing)

Automations, summaries, smart inbox
Family plan + multi-user tenants
Referrals, virality hooks, deeper analytics

5.3 Risk Assessment (and mitigations)

Platform/ToS risk (high)

WhatsApp Web automation and some unofficial APIs can be fragile or disallowed. Mitigation:
Prefer “user-run bridge” and be transparent.
Design connectors as replaceable modules.
Invest in reliability + graceful degradation UX.

Security risk (high)

You store extremely sensitive communications. Mitigation:

Strong tenant isolation, encryption, audit logs, least-privilege secrets.
Clear incident response plan early.
Optional “bring your own key” for advanced users later.

Reliability risk (medium)

Real-time messaging is unforgiving. Mitigation:

Idempotency everywhere, DLQs, replayable event log, connector health checks.

Cost risk (medium)

AI can burn margin fast. Mitigation:

Strict budgets, summaries, caching, and paid tiers tied to token grants.

6. Consumer Product Considerations

6.1 Why non-technical users will care

One inbox for “too many apps”
Fast search across everything
AI that can:
- draft replies in your voice
- summarize threads
- remind you to follow up
- extract action items
- schedule/send based on your rules

The magic is not “it connects to 7 platforms.” The magic is: “I never miss important messages, and replying is effortless.”

6.2 Competitive positioning

Versus Beeper / unified inbox apps

Differentiator: AI-native workflows, not just aggregation.
Differentiator: bridge architecture for device-bound channels without forcing a self-hosted server.
Differentiator: credible self-host escape hatch to build trust.

Versus texts.com and similar

Compete on:
- better agent + automation
- better cross-channel search and summaries
- privacy controls + transparent data handling

6.3 Key differentiators to lean into

Pi agent across all channels with consistent memory + controls
Per-conversation policies (“Pi is allowed here, not there”)
Bridge model that enables iMessage/WhatsApp/Signal without requiring a home server
Local-first option (self-host) for the privacy-conscious

6.4 Retention & engagement features (concrete)

Daily “Important messages” digest
Unread triage: “only show messages that need a reply”
Follow-up reminders (“nudge me if no response in 2 days”)
Contact intelligence: lightweight “CRM” notes per person (user-approved memory)
“Vacation mode” and “Do Not Disturb” across channels
Personal analytics: response time, inbox zero streaks (optional)

Appendix A: API Design (concrete)

Client HTTP API (examples)

POST /v1/auth/magiclink
GET /v1/me
GET /v1/tenants/:tenantId
POST /v1/connections/:type/start (creates a pending connection; returns its connection_id plus an OAuth URL or pairing QR payload)
POST /v1/connections/:connectionId/complete
GET /v1/conversations?cursor=...
GET /v1/conversations/:id/messages?cursor=...
POST /v1/messages (send)
- body: { conversation_id, text, attachments[] }
POST /v1/agent/enable / POST /v1/agent/disable (body: { conversation_id }, since agent controls are per-conversation)
POST /v1/media/presign

WebSocket topics

Client sends:

{"op":"subscribe","topic":"conversation","conversation_id":"..."}
{"op":"subscribe","topic":"tenant","tenant_id":"...","events":["connection.*","agent.*"]}

Server sends:

message.created
message.updated
conversation.updated
connection.state_changed
agent.run_started / agent.run_finished
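A minimal sketch of server-side subscription matching for the frames above, assuming Python; the `Subscriptions` class is hypothetical. Event patterns such as `connection.*` are matched with the standard library's `fnmatch.fnmatchcase`, where `*` also matches dots, so `agent.*` would also cover a future `agent.run.started`. Authorization (does this user belong to that tenant?) must be checked before a subscription is accepted; it is omitted here.

```python
import json
from fnmatch import fnmatchcase
from typing import Dict, List, Set


class Subscriptions:
    """Per-socket subscription state."""

    def __init__(self) -> None:
        self.conversations: Set[str] = set()
        self.tenant_patterns: Dict[str, List[str]] = {}

    def handle_client_frame(self, raw: str) -> None:
        frame = json.loads(raw)
        if frame.get("op") != "subscribe":
            raise ValueError(f"unsupported op: {frame.get('op')!r}")
        topic = frame.get("topic")
        if topic == "conversation":
            self.conversations.add(frame["conversation_id"])
        elif topic == "tenant":
            self.tenant_patterns.setdefault(frame["tenant_id"], []).extend(frame["events"])
        else:
            raise ValueError(f"unknown topic: {topic!r}")

    def wants(self, event: dict) -> bool:
        """Should this server event be pushed to this socket?"""
        if event.get("conversation_id") in self.conversations:
            return True
        patterns = self.tenant_patterns.get(event.get("tenant_id"), [])
        return any(fnmatchcase(event["type"], p) for p in patterns)
```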

Appendix B: Bridge Protocol (cloud tunnel)

Bridge registration

Bridge app signs in → obtains device credentials
Establishes tunnel: wss://bridge.moltbot.com/v1/tunnel
Heartbeats + capability advertisement:
- { capabilities: ["imessage.read","imessage.send", ...], version, device_meta }

Message forwarding

Bridge → cloud canonical events:
- message.received with provider_message_id, conversation mapping, attachments references
Cloud → bridge commands:
- send.message with canonical payload
- sync.history (optional; careful with volume)
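A minimal sketch of how the cloud side might track one tunnel, assuming Python; the 15-second heartbeat interval and the 2- and 6-missed-heartbeat thresholds are hypothetical. Status values reuse `connections.status` from 1.4 so that a change can be emitted as `connection.state_changed`, and `send.message` commands are routed only to a bridge that currently advertises the matching `<channel>.send` capability.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

HEARTBEAT_INTERVAL_S = 15  # hypothetical; not specified in the protocol sketch


@dataclass
class BridgeSession:
    device_id: str
    capabilities: Set[str] = field(default_factory=set)
    version: str = ""
    last_heartbeat: float = float("-inf")

    def on_heartbeat(self, payload: dict, now: float) -> None:
        # Each heartbeat re-advertises capabilities, so a bridge that loses
        # (say) its WhatsApp session simply stops listing whatsapp.send.
        self.capabilities = set(payload["capabilities"])
        self.version = payload.get("version", "")
        self.last_heartbeat = now

    def status(self, now: float) -> str:
        """Map heartbeat age onto connections.status values."""
        missed = (now - self.last_heartbeat) / HEARTBEAT_INTERVAL_S
        if missed < 2:
            return "connected"
        if missed < 6:
            return "degraded"
        return "disconnected"


def route_send(session: BridgeSession, channel: str, payload: dict,
               now: float) -> Optional[dict]:
    """Build a send.message command, or None if the bridge cannot take it now."""
    if session.status(now) == "disconnected":
        return None
    if f"{channel}.send" not in session.capabilities:
        return None
    return {"type": "send.message", "device_id": session.device_id, "payload": payload}
```

When `route_send` returns None, the outbound message would stay queued (and the inbox could show it as pending) rather than being dropped.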

Appendix C: What to Build First (if you want maximum consumer impact)

Web signup + Telegram connector + unified inbox + Pi replies (fastest wow moment)
Mobile push notifications + inbox UX polish
Mac bridge for iMessage (killer differentiator, if reliable)
Expand connectors + automations + family plan
