Author: OpenAI GPT-5.2 (via Codex CLI)
Date: 2026-01-28
Status: Proposal
Moltbot becomes a cloud-hosted unified inbox + AI agent with two connectivity modes:
- Cloud Connectors (server-side) for platforms that support reliable cloud operation (Telegram/Discord/Slack/Teams, etc.).
- Personal Bridge (client-side) for platforms that require a local device (iMessage, WhatsApp Web, Signal, and any future “device-tethered” channels).
The consumer product is primarily a web + mobile app (plus a lighter CLI), with the Mac app evolving into an optional Bridge + native notifications client rather than the primary gateway host.
Key design principle: “Cloud brain, optional local hands.” The cloud owns identity, storage, agent execution, billing, and the unified inbox. The bridge only owns device-bound sessions and forwards events over an encrypted tunnel.
Break the current gateway into domain services that can scale independently and isolate channel risk.
- API Gateway / BFF (HTTP + WebSocket)
  - Authenticated client APIs for web/mobile/CLI
  - Real-time subscriptions and presence
- Identity & Billing
  - Users, devices, sessions, entitlements, subscriptions
- Inbox / Messaging Core
  - Normalized conversations/messages/events
  - Search, retention, export
- Connections Service
  - Channel accounts, OAuth tokens, bot tokens, webhooks, connection state
- Event Bus
  - Internal pub/sub for message flow, agent triggers, retries
- Agent Orchestrator
  - Turns inbound events into “agent jobs”
  - Enforces per-user policy, budgets, safety, routing to model
- Media Service
  - Upload, transform, virus scan, CDN delivery, lifecycle policies
- Cloud Connector Workers (per channel type)
  - Telegram connector, Discord connector, Slack connector, Teams connector, etc.
  - Responsible for:
    - receiving platform events
    - dedupe/order normalization
    - emitting canonical events into the Event Bus
    - sending outbound messages
- Bridge Gateway
  - Manages persistent tunnels from user devices (Mac/mobile/desktop bridge)
  - Device registration, mTLS, NAT traversal
  - Multiplexes “local-only channel events” into Event Bus
- Search service (if Postgres FTS isn’t enough)
- Rules/Automation engine (IFTTT-like triggers beyond the agent)
- Data warehouse + analytics for growth + retention metrics
Use a canonical event stream internally, backed by a relational store for queries.
- message.received
- message.edited
- message.deleted
- reaction.added
- reaction.removed
- conversation.created / conversation.updated
- connection.state_changed
- media.available
- agent.requested (derived trigger)
- agent.responded
Rule: all connectors (cloud + bridge) must emit these canonical events with stable IDs.
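A minimal sketch of what a canonical event envelope could look like, so that cloud connectors and the bridge emit the same shape. The class name and every field other than the event type strings are assumptions for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Event types taken from the canonical stream listed in this proposal.
CANONICAL_EVENT_TYPES = frozenset({
    "message.received", "message.edited", "message.deleted",
    "reaction.added", "reaction.removed",
    "conversation.created", "conversation.updated",
    "connection.state_changed", "media.available",
    "agent.requested", "agent.responded",
})


@dataclass(frozen=True)
class CanonicalEvent:
    """Hypothetical envelope; field names are illustrative assumptions."""
    event_id: str       # stable ID minted once by the emitting connector
    type: str           # must be one of CANONICAL_EVENT_TYPES
    tenant_id: str
    connection_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Reject anything outside the canonical vocabulary at the boundary.
        if self.type not in CANONICAL_EVENT_TYPES:
            raise ValueError(f"unknown canonical event type: {self.type}")
        if not self.event_id:
            raise ValueError("event_id must be a stable, non-empty ID")
```

Validating at construction time keeps malformed events out of the Event Bus regardless of which connector produced them.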
- Compute: AWS ECS Fargate (long-lived connectors) + AWS Lambda (spiky jobs)
  - Reason: connectors often need persistent sockets and predictable networking; ECS excels.
- Primary DB: Amazon RDS Postgres
  - Reason: multi-tenant relational data + strong consistency + mature indexing + JSONB.
- Cache / ephemeral state: ElastiCache Redis
  - Rate limits, dedupe windows, websocket presence, short-lived locks.
- Eventing / jobs (start): SQS + SNS
  - Simple, reliable retries, DLQs, cheap.
- Event streaming (scale later): MSK (Kafka) or NATS JetStream
  - Add when throughput/fanout outgrows SQS/SNS patterns.
- Media: S3 + CloudFront CDN
  - Presigned uploads, lifecycle, cheap storage/egress control.
- Secrets: AWS Secrets Manager (+ KMS encryption)
- Observability: OpenTelemetry + CloudWatch + managed traces (Datadog optional)
- Web app: Vercel (fast iteration, great DX)
- API/BFF: still in AWS (latency is fine; stable networking; connector proximity)
- Auth: Auth0 or Cognito (pick one; for consumer scale, Auth0 is faster to ship; Cognito is cheaper long-term)
If you want “single-provider simplicity,” do everything in AWS (including Next.js on ECS), but Vercel is a strong consumer-product speed advantage.
Use tenant = household/workspace even for solo users; it future-proofs family/team plans.
Tenants:
- id (ULID, pk)
- plan (enum: free, pro, family, team, enterprise)
- created_at

Users:
- id (ULID, pk)
- email (unique)
- name
- created_at

Tenant memberships:
- tenant_id (fk)
- user_id (fk)
- role (owner, admin, member)
- created_at
- unique (tenant_id, user_id)

Devices:
- id (ULID)
- tenant_id (fk)
- user_id (fk)
- platform (ios, android, mac, web, cli, bridge)
- push_token (nullable)
- public_key (for device E2E features, optional)
- last_seen_at

Connections:
- id (ULID)
- tenant_id
- type (telegram, discord, slack, teams, whatsapp, imessage, signal, …)
- mode (cloud, bridge)
- display_name
- status (connected, degraded, disconnected, action_required)
- created_at, updated_at

Connection credentials:
- connection_id
- provider (oauth, bot_token, session_blob)
- encrypted_secret (KMS envelope)
- scopes (jsonb)
- expires_at
- rotated_at

Conversations:
- id (ULID)
- tenant_id
- primary_connection_id (nullable; for “source”)
- title
- kind (dm, group, channel)
- created_at, updated_at

Conversation participants:
- conversation_id
- participant_id (fk to identities)
- role (member/admin)
- unique (conversation_id, participant_id)

Identities:
- id (ULID)
- tenant_id
- connection_id (nullable; identity can be cross-connection if linked)
- provider_user_id (string)
- display_name
- handle (nullable)
- avatar_url (nullable)

Messages:
- id (ULID)
- tenant_id
- conversation_id
- direction (inbound/outbound)
- source (connection_id or “agent”)
- provider_message_id (string, nullable, indexed)
- sender_identity_id (nullable for system/agent)
- text (nullable)
- content (jsonb: blocks, mentions, formatting)
- sent_at (timestamp)
- received_at (timestamp)
- dedupe_key (string, unique within tenant; used for idempotency)

Message attachments:
- id (ULID)
- message_id
- media_id
- kind (image, video, audio, file)
- metadata (jsonb)

Media:
- id (ULID)
- tenant_id
- sha256 (unique per tenant, optional global dedupe later)
- mime_type
- size_bytes
- storage_key (s3 key)
- created_at
- status (uploading, ready, quarantined, failed)
- variants (jsonb: thumbnails, transcodes)

Agent sessions:
- id (ULID)
- tenant_id
- conversation_id
- policy (jsonb)
- created_at, updated_at

Agent runs:
- id (ULID)
- tenant_id
- agent_session_id
- trigger_message_id (nullable)
- status (queued, running, succeeded, failed, cancelled)
- model (string)
- input_tokens, output_tokens
- cost_usd_micros
- started_at, ended_at
- trace_id

Usage events:
- id (ULID)
- tenant_id
- type (message_ingested, message_sent, media_bytes_stored, media_bytes_egressed, agent_tokens, connector_runtime_seconds, bridge_runtime_seconds)
- quantity
- at (timestamp)
- meta (jsonb)
- Use Row Level Security (RLS) keyed on tenant_id for safety.
- Partition messages by time (monthly) once volume grows.
- Start with Postgres full-text search; add OpenSearch later if needed.
- WebSocket endpoint: wss://api.moltbot.com/v1/ws
- Auth: short-lived JWT + device registration.
- Subscription model:
  - subscribe: { conversation_id }
  - subscribe: { tenant_id, topics: ["connection.*", "agent.*"] }
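One way the tenant-level topic patterns could be evaluated on the server is plain glob matching. The sketch below uses Python's standard `fnmatch` module; the `TenantSubscription` class is a hypothetical name, and treating `connection.*` as a glob is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class TenantSubscription:
    """Hypothetical server-side record of one tenant-level subscription."""
    tenant_id: str
    topics: tuple[str, ...]  # glob-style patterns such as "connection.*"

    def wants(self, tenant_id: str, event_type: str) -> bool:
        # Never deliver across tenants, whatever the pattern says.
        if tenant_id != self.tenant_id:
            return False
        # Case-sensitive glob match against each subscribed pattern.
        return any(fnmatchcase(event_type, pattern) for pattern in self.topics)


sub = TenantSubscription("t1", ("connection.*", "agent.*"))
print(sub.wants("t1", "connection.state_changed"))
print(sub.wants("t1", "message.created"))
```

Checking the tenant before the pattern keeps the isolation rule independent of how patterns are written.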
- Phase 1: SQS (jobs) + SNS (fanout) + Redis for short dedupe windows
- Phase 2: Kafka/NATS for high-throughput ordering guarantees
- Canonical event_id + dedupe_key per message.
- Connectors must provide stable provider IDs; otherwise derive:
  dedupe_key = hash(provider + conversation + sender + timestamp_bucket + content_hash)
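The fallback derivation can be sketched as follows. The choice of SHA-256, the 60-second bucket width, and the function name are assumptions; the proposal only fixes which inputs go into the hash.

```python
import hashlib


def derive_dedupe_key(
    provider: str,
    conversation: str,
    sender: str,
    sent_at_epoch_s: float,
    content: str,
    bucket_seconds: int = 60,  # assumed bucket width, tune per connector
) -> str:
    """Fallback key for connectors that lack stable provider message IDs."""
    # Coarsen the timestamp so small clock jitter maps to the same bucket.
    timestamp_bucket = int(sent_at_epoch_s // bucket_seconds)
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Join with a unit separator so ("ab", "c") and ("a", "bc") differ.
    material = "\x1f".join(
        [provider, conversation, sender, str(timestamp_bucket), content_hash]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

A fixed bucket has an edge case: two deliveries of the same message that straddle a bucket boundary get different keys, so this is best treated as a last resort behind real provider IDs.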
- Upload flow:
  - Client requests POST /v1/media/presign
  - Client uploads to S3 via presigned URL
  - Media service verifies checksum, marks ready
  - Async pipeline generates variants (thumbnail, waveform, transcodes)
- Delivery:
  - CloudFront signed URLs (prevents hotlinking and controls egress costs)
- Security:
  - Virus scanning (ClamAV in Lambda/ECS) before ready
  - “Quarantine” bucket/prefix for suspect files
- Retention:
  - Free tier: shorter retention + smaller quotas
  - Paid: longer retention + higher caps
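The checksum and scan gates above decide which status a media row leaves `uploading` for. A minimal sketch, assuming the client declared a SHA-256 at presign time and that the scan result arrives as a boolean; the function name and signature are hypothetical.

```python
import hashlib


def finalize_upload(data: bytes, expected_sha256: str, scan_is_clean: bool) -> str:
    """Return the next status for a media row currently in 'uploading'.

    The status names match the media status values in this proposal:
    ready, quarantined, failed.
    """
    # A checksum mismatch means the upload is corrupt or not what was declared.
    if hashlib.sha256(data).hexdigest() != expected_sha256.lower():
        return "failed"
    # Suspect files are parked in quarantine and never become 'ready'.
    if not scan_is_clean:
        return "quarantined"
    return "ready"
```

In a real pipeline the bytes would be streamed from S3 rather than held in memory; the ordering of the two checks is the part this sketch is meant to show.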
- The agent is not a long-lived process. It is a job triggered by events, with state persisted to DB.
- Agent orchestration steps:
  - Inbound message event → policy evaluation (is agent enabled here?)
  - Create agent_run (queued)
  - Worker claims job with a conversation lock (Redis or DB advisory lock)
  - Build context:
    - recent messages
    - conversation summary (if exists)
    - user preferences + rules
  - Run model with tool-use enabled
  - Persist tool calls + outputs
  - Emit outbound message via connector
  - Record usage + cost
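The orchestration steps can be sketched as a single worker function. This is an in-memory illustration only: `ConversationLocks` stands in for the Redis or DB advisory lock, `call_model` is a caller-supplied placeholder for the model invocation, and context building, persistence, and outbound delivery are omitted.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentRun:
    conversation_id: str
    trigger_message_id: str
    status: str = "queued"  # queued, running, succeeded, failed, cancelled
    output: Optional[str] = None


class ConversationLocks:
    """In-memory stand-in for a Redis or DB advisory lock (assumption)."""

    def __init__(self) -> None:
        self._held: set[str] = set()
        self._guard = threading.Lock()

    def try_acquire(self, conversation_id: str) -> bool:
        with self._guard:
            if conversation_id in self._held:
                return False
            self._held.add(conversation_id)
            return True

    def release(self, conversation_id: str) -> None:
        with self._guard:
            self._held.discard(conversation_id)


def handle_inbound(
    conversation_id: str,
    message_id: str,
    agent_enabled: bool,
    locks: ConversationLocks,
    call_model: Callable[[AgentRun], str],
) -> Optional[AgentRun]:
    # Policy evaluation comes first: no run is created if the agent is off.
    if not agent_enabled:
        return None
    run = AgentRun(conversation_id, message_id)  # created as "queued"
    # Only one worker may act on a conversation at a time.
    if not locks.try_acquire(conversation_id):
        return run  # stays queued for a later attempt
    try:
        run.status = "running"
        run.output = call_model(run)
        run.status = "succeeded"
    except Exception:
        run.status = "failed"
    finally:
        locks.release(conversation_id)
    return run
```

Releasing the lock in `finally` matters: a failed model call must not leave the conversation permanently blocked.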
- Maintain rolling:
  - conversation_summary (short)
  - long_term_memory (explicit user-approved facts)
- Use budgeted prompting:
  - Hard caps per plan for tokens/day and tokens/run
  - Graceful fallback: “I’m at my budget; upgrade or wait”
- Tool calls run in a capability sandbox:
  - “read-only” tools vs “write” tools
  - per-tenant allowlists (e.g., Google Calendar only if connected)
- Audit every tool invocation in agent_runs + tool_calls.
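The budget caps and the capability sandbox can both be expressed as pre-flight checks run before the model or a tool is invoked. The sketch below is illustrative: `TenantPolicy`, its field names, and the in-memory audit list are assumptions, and the cap values used in any real plan are not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional

FALLBACK_MESSAGE = "I’m at my budget; upgrade or wait"


@dataclass
class TenantPolicy:
    """Hypothetical per-tenant policy; cap values are supplied by the plan."""
    tokens_per_day: int
    tokens_per_run: int
    allowed_tools: set[str]          # per-tenant allowlist
    allow_write_tools: bool = False  # read-only unless explicitly enabled
    audit_log: list[dict] = field(default_factory=list)


def grant_tokens(
    policy: TenantPolicy, used_today: int, requested: int
) -> tuple[int, Optional[str]]:
    """Return (granted tokens, fallback message if the budget is exhausted)."""
    remaining = policy.tokens_per_day - used_today
    if remaining <= 0:
        return 0, FALLBACK_MESSAGE
    # The grant is bounded by the per-run cap and by what is left today.
    return min(requested, policy.tokens_per_run, remaining), None


def authorize_tool(
    policy: TenantPolicy, agent_run_id: str, tool_name: str, is_write: bool
) -> bool:
    allowed = tool_name in policy.allowed_tools and (
        policy.allow_write_tools or not is_write
    )
    # Every invocation attempt is audited, including denied ones.
    policy.audit_log.append(
        {"agent_run_id": agent_run_id, "tool": tool_name,
         "write": is_write, "allowed": allowed}
    )
    return allowed
```

Logging denied attempts as well as allowed ones keeps the audit trail useful for abuse detection, not just billing.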
Goal: first successful inbound+outbound message in <2 minutes.
Flow
- Land on marketing page → “Get Started”
- Create account (email + magic link, or Apple/Google)
- Create tenant (default: “Personal”)
- Choose first channel to connect (recommend Telegram or Discord as fastest)
- Guided “send a test message” step
- Offer to enable AI agent (“Pi”) with privacy + cost explanation
- Option A (recommended): “Connect via Moltbot Bot”
  - User clicks deep link to start bot
  - Bot provides handshake code
  - Web app confirms code and binds Telegram chat to tenant
- Option B: bring-your-own-bot token (advanced)
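The Option A handshake amounts to a short-lived, single-use code that maps back to the chat the bot saw. A minimal in-memory sketch follows; the class name, the 6-character code format, and the 5-minute expiry are assumptions, and a real deployment would keep pending codes in Redis or Postgres.

```python
import secrets
import time
from typing import Optional


class HandshakeStore:
    """Hypothetical store for pending bot handshake codes."""

    def __init__(self, ttl_seconds: int = 300) -> None:
        self._ttl = ttl_seconds
        self._pending: dict[str, tuple[str, float]] = {}  # code -> (chat_id, expiry)

    def issue(self, chat_id: str, now: Optional[float] = None) -> str:
        """Called by the bot when the user starts it; returns the code to show."""
        now = time.time() if now is None else now
        code = secrets.token_hex(3).upper()  # 6 hex characters
        self._pending[code] = (chat_id, now + self._ttl)
        return code

    def confirm(self, code: str, now: Optional[float] = None) -> Optional[str]:
        """Called by the web app; returns the chat_id to bind, or None."""
        now = time.time() if now is None else now
        # pop() makes the code single-use even if it turns out to be expired.
        entry = self._pending.pop(code.strip().upper(), None)
        if entry is None:
            return None
        chat_id, expires_at = entry
        return chat_id if now <= expires_at else None
```

The web app would then bind the returned chat ID to the signed-in user's tenant.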
- OAuth2 install flow:
  - “Add Moltbot to your server”
  - Select server + permissions
  - Confirm test message in a channel
- Standard Slack OAuth:
  - Choose workspace
  - Subscribe to Events API
  - Install app + request scopes
  - Test message
- Guided webhook install (or OAuth-based app install if supported in your chosen Teams integration approach)
- Clear “copy/paste this URL” step + verification ping
The Unified Inbox screen opens immediately, even before all channels are connected:
- Left sidebar:
  - “Connected” vs “Needs action”
- Main panel:
  - A single “Welcome” conversation with Pi
  - A “Test message” checklist
- Success moment:
  - A real inbound message appears, and Pi replies (if enabled)
These require a Personal Bridge.
- User chooses “Connect iMessage” (or WhatsApp/Signal)
- Web app shows:
  - Download Mac app (or mobile app if supported)
  - A QR code / pairing code
- Bridge app signs in → scans code → establishes encrypted tunnel
- Bridge advertises capabilities:
  - imessage.send / imessage.read
  - whatsapp.send / whatsapp.read
  - signal.send / signal.read
- Web app shows live “Connected” and runs a test message
- Each bridge device gets an identity (device_id) + mTLS cert.
- Tunnel is end-to-end encrypted transport (TLS + device attestation).
- Secrets for device-bound sessions remain on the bridge when possible; the cloud stores only what it must.
- Free
  - 1–2 cloud connectors
  - limited message history (e.g., 7–30 days)
  - limited AI (small monthly token grant)
  - no local-bridge channels
- Paid
  - unlock Bridge channels (iMessage/WhatsApp/Signal)
  - longer retention + search
  - higher AI budgets
  - multi-device sync + advanced automations
UX rule: don’t block early—let free users connect one cloud channel and feel value immediately, then upsell when they hit:
- adding a second channel
- enabling Pi autopilot
- connecting iMessage/WhatsApp
- Optional Bridge runtime for device-bound channels (iMessage, WhatsApp Web, Signal)
- System integrations:
  - native notifications
  - microphone/voice capture (if you keep voice wake)
- Background reliability:
  - auto-start, reconnect, self-update
- Troubleshooting UI:
  - connection health, logs, “Repair” actions
- Owning the full gateway for cloud-compatible channels
- Managing global config files and local YAML as primary UX
- First-class unified inbox experience
- Push notifications from cloud (APNS/FCM)
- Optional: “mobile bridge” only if a channel truly needs it (avoid if possible; it’s battery-hostile)
Must include:
- Unified inbox (search, filters, unread, pin)
- Connection manager (connect/disconnect, scopes, health)
- Agent controls:
  - per-conversation enable/disable
  - tone and behavior settings
  - budgets and privacy controls
- Billing + usage dashboards
- Data export / delete account (trust-critical)
Keep CLI, but reposition it for:
- moltbot login
- moltbot status (cloud connections, bridge health)
- moltbot connect <channel> (opens browser OAuth, returns)
- moltbot tail (stream events for debugging)
- moltbot export (download archive)
- moltbot selfhost (for the self-hosted SKU; separate docs)
- Free: $0
  - 1 cloud connector
  - 30 days message retention
  - 1 GB media
  - Pi: 50k tokens/month (light usage)
- Pro: $12
  - 5 cloud connectors
  - 1 year retention
  - 25 GB media
  - Pi: 2M tokens/month
  - Basic automations
- Plus (Bridge): $24
  - everything in Pro
  - Bridge channels (iMessage/WhatsApp/Signal)
  - 100 GB media
  - Pi: 6M tokens/month
- Family: $35
  - up to 5 users in one tenant
  - shared inbox + shared Pi
  - Bridge included
  - pooled token + storage budgets
- Team: $49 (5 seats included)
  - shared inbox + roles
  - shared connectors
  - audit log
  - basic compliance export
  - extra seats $8/seat
- SSO/SAML, SCIM, retention policies, legal hold, DLP integrations, dedicated support.
Track usage in usage_events, aggregate daily/monthly.
Metering dimensions:
- Messages ingested (count)
- Messages sent (count)
- AI tokens (input/output)
- Media stored (GB-month)
- Media egress (GB)
- Connector runtime (optional; mostly internal cost driver)
- Bridge runtime (not billed directly; can be used for abuse detection)
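Daily aggregation over usage_events can be sketched as a simple group-by. The dictionary keys mirror the usage event fields in the data model; the function name and the choice to bucket by UTC day are assumptions, and in production this would more likely be a SQL rollup.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Iterable


def aggregate_daily(events: Iterable[dict]) -> dict[tuple[str, str, str], int]:
    """Sum quantity per (tenant_id, UTC day, type).

    Each event is a dict with tenant_id, type, quantity, and a
    timezone-aware datetime under "at".
    """
    totals: dict[tuple[str, str, str], int] = defaultdict(int)
    for event in events:
        # Normalise to UTC so a day boundary means the same thing everywhere.
        day = event["at"].astimezone(timezone.utc).date().isoformat()
        totals[(event["tenant_id"], day, event["type"])] += event["quantity"]
    return dict(totals)


events = [
    {"tenant_id": "t1", "type": "agent_tokens", "quantity": 100,
     "at": datetime(2026, 1, 28, 10, tzinfo=timezone.utc)},
    {"tenant_id": "t1", "type": "agent_tokens", "quantity": 50,
     "at": datetime(2026, 1, 28, 23, tzinfo=timezone.utc)},
]
print(aggregate_daily(events))
```

Monthly totals follow the same pattern with the day key truncated to year and month.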
Billing approach:
- Plans include generous bundles.
- Overages:
  - AI tokens: $5 per additional 2M tokens
  - Storage: $3 per additional 50 GB
  - Egress: bake into margins; only charge for extreme usage
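A worked sketch of the overage arithmetic, using the block sizes and prices stated above. Billing in whole blocks rounded up is an assumption; the proposal does not say whether partial blocks are prorated.

```python
import math

TOKEN_BLOCK = 2_000_000        # tokens per overage block
TOKEN_BLOCK_PRICE_USD = 5
STORAGE_BLOCK_GB = 50          # GB per overage block
STORAGE_BLOCK_PRICE_USD = 3


def overage_usd(
    tokens_used: int,
    tokens_included: int,
    storage_gb_used: float,
    storage_gb_included: float,
) -> int:
    """Overage charge in whole dollars (partial blocks rounded up: assumption)."""
    extra_tokens = max(0, tokens_used - tokens_included)
    extra_gb = max(0.0, storage_gb_used - storage_gb_included)
    token_blocks = math.ceil(extra_tokens / TOKEN_BLOCK)
    storage_blocks = math.ceil(extra_gb / STORAGE_BLOCK_GB)
    return (
        token_blocks * TOKEN_BLOCK_PRICE_USD
        + storage_blocks * STORAGE_BLOCK_PRICE_USD
    )


# A Pro tenant (2M tokens, 25 GB included) that used 5M tokens and 30 GB:
# 3M extra tokens -> 2 blocks -> $10; 5 extra GB -> 1 block -> $3.
print(overage_usd(5_000_000, 2_000_000, 30, 25))  # prints 13
```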
- Cloud: fastest setup, best UX, push notifications, managed AI, no YAML, no servers.
- Self-hosted: maximum control, can run offline/local-only, advanced tinkering, community extensions.
Keep self-hosted as:
- a premium “Power User” story, or
- an open/core tier that increases trust and reduces churn risk from “lock-in” fear.
- Shared inbox assignment, labels, SLA timers
- Audit logs + export APIs
- Role-based access control
- Per-channel restrictions
- Data retention & legal hold
- Preserve message history (if user wants)
- Reduce reconnect pain
- Keep local-only channels working via Bridge without requiring “a server”
moltbot cloud migrate
- Auth to cloud
- Upload:
  - session metadata
  - transcripts (optionally)
  - media (optional; can be huge, so upload incrementally)
- Create conversations + identities in cloud
- Then guide:
  - reconnect cloud-compatible channels via OAuth
  - install bridge for local-only channels
For a transition period, users can keep the existing gateway running but point it at the cloud as a “bridge-like” node. This reduces churn by avoiding an all-at-once migration.
- Cloud identity + tenant model
- Basic web app shell + billing scaffolding
- Postgres schema + message ingestion API
- WebSocket subscriptions for clients
- Ship Telegram connector (fastest)
- Inbound + outbound messaging
- Minimal agent response loop (on/off toggle)
- Mobile push notifications (basic)
Success metric: new user can sign up and see Moltbot reply in Telegram in <2 minutes.
- Discord + Slack
- Robust retries, idempotency, connection health dashboard
- Search + better conversation model
- Mac bridge app + pairing
- iMessage bridge first (big differentiator)
- WhatsApp Web bridge second (if ToS/risk acceptable)
- Signal bridge (if reliable in your environment)
- Automations, summaries, smart inbox
- Family plan + multi-user tenants
- Referrals, virality hooks, deeper analytics
WhatsApp Web automation and some unofficial APIs can be fragile or disallowed. Mitigation:
- Prefer “user-run bridge” and be transparent.
- Design connectors as replaceable modules.
- Invest in reliability + graceful degradation UX.
You store extremely sensitive communications. Mitigation:
- Strong tenant isolation, encryption, audit logs, least-privilege secrets.
- Clear incident response plan early.
- Optional “bring your own key” for advanced users later.
Real-time messaging is unforgiving. Mitigation:
- Idempotency everywhere, DLQs, replayable event log, connector health checks.
AI can burn margin fast. Mitigation:
- Strict budgets, summaries, caching, and paid tiers tied to token grants.
- One inbox for “too many apps”
- Fast search across everything
- AI that can:
  - draft replies in your voice
  - summarize threads
  - remind you to follow up
  - extract action items
  - schedule/send based on your rules
The magic is not “it connects to 7 platforms.” The magic is: “I never miss important messages, and replying is effortless.”
- Differentiator: AI-native workflows, not just aggregation.
- Differentiator: bridge architecture for device-bound channels without forcing a self-hosted server.
- Differentiator: credible self-host escape hatch to build trust.
- Compete on:
  - better agent + automation
  - better cross-channel search and summaries
  - privacy controls + transparent data handling
- Pi agent across all channels with consistent memory + controls
- Per-conversation policies (“Pi is allowed here, not there”)
- Bridge model that enables iMessage/WhatsApp/Signal without requiring a home server
- Local-first option (self-host) for the privacy-conscious
- Daily “Important messages” digest
- Unread triage: “only show messages that need a reply”
- Follow-up reminders (“nudge me if no response in 2 days”)
- Contact intelligence: lightweight “CRM” notes per person (user-approved memory)
- “Vacation mode” and “Do Not Disturb” across channels
- Personal analytics: response time, inbox zero streaks (optional)
- POST /v1/auth/magiclink
- GET /v1/me
- GET /v1/tenants/:tenantId
- POST /v1/connections/:type/start (returns OAuth URL or pairing QR payload)
- POST /v1/connections/:connectionId/complete
- GET /v1/conversations?cursor=...
- GET /v1/conversations/:id/messages?cursor=...
- POST /v1/messages (send)
  - body: { conversation_id, text, attachments[] }
- POST /v1/agent/enable / POST /v1/agent/disable
- POST /v1/media/presign
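The cursor-based list endpoints imply a simple drain loop on the client. The sketch below assumes each page is a dict with `items` and `next_cursor`; that response shape is not specified in this proposal, and `fetch_page` is a caller-supplied function that performs the actual HTTP request.

```python
from typing import Callable, Iterator, Optional


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a cursor-paginated endpoint.

    fetch_page(cursor) is expected to return a dict shaped like
    {"items": [...], "next_cursor": "..." or None} (assumed shape).
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            return  # no further pages


# Usage with a stubbed fetcher standing in for GET /v1/conversations?cursor=...
pages = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "next_cursor": "c1"},
    "c1": {"items": [{"id": "c"}], "next_cursor": None},
}
print([c["id"] for c in iter_all(lambda cur: pages[cur])])
```

Keeping the HTTP call behind `fetch_page` lets the same loop serve both the conversations and messages endpoints.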
Client sends:
- {"op":"subscribe","topic":"conversation","conversation_id":"..."}
- {"op":"subscribe","topic":"tenant","tenant_id":"...","events":["connection.*","agent.*"]}
Server sends:
- message.created
- message.updated
- conversation.updated
- connection.state_changed
- agent.run_started / agent.run_finished
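Server-side handling of the two subscribe messages can be sketched as a small parser that validates the frame before registering anything. The function name and the error behaviour (raising `ValueError`) are assumptions; the field names come from the client messages above.

```python
import json


def parse_subscribe(raw: str) -> dict:
    """Validate a client subscribe frame and return a normalised dict."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("op") != "subscribe":
        raise ValueError("expected a subscribe op")
    topic = msg.get("topic")
    if topic == "conversation":
        if not msg.get("conversation_id"):
            raise ValueError("conversation_id is required")
        return {"topic": "conversation",
                "conversation_id": msg["conversation_id"]}
    if topic == "tenant":
        if not msg.get("tenant_id"):
            raise ValueError("tenant_id is required")
        events = msg.get("events", [])
        if not all(isinstance(e, str) for e in events):
            raise ValueError("events must be a list of strings")
        return {"topic": "tenant", "tenant_id": msg["tenant_id"],
                "events": list(events)}
    raise ValueError(f"unknown topic: {topic!r}")
```

The caller would still need to check that the authenticated user may see the requested conversation or tenant; that authorization step is outside this sketch.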
- Bridge app signs in → obtains device credentials
- Establishes tunnel: wss://bridge.moltbot.com/v1/tunnel
- Heartbeats + capability advertisement:
  { capabilities: ["imessage.read","imessage.send", ...], version, device_meta }
- Bridge → cloud canonical events:
  - message.received with provider_message_id, conversation mapping, attachments references
- Cloud → bridge commands:
  - send.message with canonical payload
  - sync.history (optional; careful with volume)
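On the bridge side, the protocol reduces to building the heartbeat frame and gating each cloud command on an advertised capability. In the sketch below the heartbeat fields follow the advertisement shown above, while the command envelope (`op`, `payload`, `channel`), the error codes, and the `senders` mapping are assumptions for illustration.

```python
import json
from typing import Callable


def build_heartbeat(capabilities: set[str], version: str, device_meta: dict) -> str:
    """Serialise the heartbeat + capability advertisement frame."""
    return json.dumps({
        "capabilities": sorted(capabilities),  # sorted for stable output
        "version": version,
        "device_meta": device_meta,
    })


def handle_command(
    command: dict,
    capabilities: set[str],
    senders: dict[str, Callable[[dict], str]],
) -> dict:
    """Execute a cloud-to-bridge command if the bridge advertised support."""
    if command.get("op") != "send.message":
        return {"ok": False, "error": "unsupported_command"}
    payload = command["payload"]
    channel = payload["channel"]  # e.g. "imessage" (assumed field)
    # Refuse anything the bridge did not advertise, such as "imessage.send".
    if f"{channel}.send" not in capabilities or channel not in senders:
        return {"ok": False, "error": "missing_capability"}
    provider_message_id = senders[channel](payload)
    return {"ok": True, "provider_message_id": provider_message_id}
```

Returning the provider message ID lets the cloud record it on the outbound message and dedupe the echo when the same message later arrives as message.received.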
- Web signup + Telegram connector + unified inbox + Pi replies (fastest wow moment)
- Mobile push notifications + inbox UX polish
- Mac bridge for iMessage (killer differentiator, if reliable)
- Expand connectors + automations + family plan