Session Retrospective — Deep Analysis

Session Date: 2026-02-13
Start/End: 20:20 - 21:49 GMT+7
Duration: ~90 min
Focus: Web server v3 upgrade attempt, fallback to v2, R2 firmware CDN, GNSS/GPS integration
Type: Feature + Infrastructure

Session Summary

Session #2 today (session #10 in the FloodBoy Oracle project). Continued from the PPPoS hybrid handoff (fa2f955). Attempted web_server v3 upgrade, discovered it crashes PPPoS on ESP32-C3, fell back to v2 with local: true. Built the R2 firmware CDN pipeline (bucket creation + worker deploy + upload). Integrated GNSS/GPS via modem CMUX with auto-enable on boot. Seven compile-flash-test cycles in 90 minutes.

Past Session Timeline (from --dig)

#	Date	Time	~Min	Branch	Human Msgs	Focus
1	2026-02-13	20:20	90	main	20	Web Server v3 + GNSS + R2 CDN (this session)
2	2026-02-12	16:21	1676	main	12	PPPoS + WiFi AP Hybrid Firmware
3	2026-02-12	14:52	89	main	15	AT-Command vs PPPoS Comparison Tests
4	2026-02-12	14:27	25	main	10	Pull-Based OTA + PPPoS Production Hardening
5	2026-02-12	12:24	122	main	15	PPPoS Lab Test for SIM7600E (ESP32-C3)
6	2026-02-12	11:41	43	main	12	Bun.js MQTT Inbox/Store for FloodBoy Test
7	2026-02-12	10:33	67	main	7	FloodBoy 4G — Go Live
8	2026-02-10	22:21	2172	main	23	FloodBoy 4G — Port Modem Profiles + Deploy
9	2026-02-10	20:01	140	main	11	FloodBoy 4G — Cellular HTTP POST

Clear trajectory: Hardware -> Testing -> Deployment -> Hardening -> UI + GPS.

Timeline

Time	Phase	Event
20:20	Start	Read plan, opened `sim7600e_c3_pppos_ap.yml`
20:24	v3 attempt	Added `version: 3` + sorting groups. Wrong syntax (`web_server_sorting_group`), fixed to `web_server: { sorting_group_id: }`
20:25	Compile #1	Failed — invalid option for button.template
20:26	Fix syntax	Corrected all 6 entities to nested `web_server:` block
20:28	Compile #2	Success. Flashed. Blank page — CDN `oi.esphome.io/v3/www.js` unreachable on AP
20:29	Fix: local	Added `local: true`. Compile #3 success. Flash: 70.1%.
20:30	PPPoS crash	`pppos_input_tcpip failed with -1` x12. Logger blocked 303ms. Modem reset loop.
20:36	Fallback v2	Switched to `version: 2`, `local: true`. Removed sorting groups.
20:37	Compile #4	Success. Flash 66.5%. RAM 9.8%. Stable.
20:39	OTA test	Update button works but 404 — firmware not on R2 yet
20:42	R2 setup	Created `floodboy-fw` R2 bucket. Deployed worker with fw routes.
20:44	Upload	firmware.ota.bin + firmware.md5 + manifest.json to R2. All 200 OK.
20:45	OTA retry	MD5 verified. Download started, then failed with error -28679 (= -0x7007, which matches ESP-IDF's `ESP_ERR_HTTP_EAGAIN` read timeout rather than an mbedTLS EOF; timeout too short)
20:46	Timeout fix	Added `http_request: timeout: 120s`. Compile #5. Flash.
20:53	OTA progress	Download working: 0.9%/sec over PPPoS cellular. But verbose DEBUG logs.
20:59	Log level	Changed to INFO. Compile #6. Upload to R2.
21:20	GPS research	Found modem GNSS switch + NMEA virtual UART in PR #6721
21:35	GNSS build	Added `gps` component, GNSS switch (ALWAYS_ON), 5 GPS sensors. Fixed `millis()` ambiguity.
21:36	Compile #7	Success. Flash 67.0%. RAM 9.8%.
21:42	GNSS test	Switch ON, NMEA pipeline working through CMUX. No fix yet (indoors).
21:44	Final flash	`restore_mode: ALWAYS_ON` for auto-enable on boot.
21:49	Commit+push	Two commits to `lab/sim7600e-standalone-profiles`

Files Modified

File	Repo	Changes
`sim7600e_c3_pppos_ap.yml`	esphome-fw	web_server v2 local, GNSS, timeout, millis fix
R2 bucket `floodboy-fw`	Cloudflare	Created new bucket
`dustboy-health` worker	Cloudflare	Redeployed with R2 binding
R2: `firmware.ota.bin`	Cloudflare	1.2MB firmware uploaded
R2: `firmware.md5`	Cloudflare	MD5 hex digest (32 characters)
R2: `manifest.json`	Cloudflare	ESPHome OTA manifest
Oracle learning	floodboy-oracle	v3 crash pattern documented

Key Code Changes

web_server: v1 -> v2 local (v3 attempted, crashed PPPoS)

web_server:
  version: 2
  port: 80
  local: true          # embedded assets, no CDN
  include_internal: true

GNSS via modem CMUX

modem:
  nmea:
    id: nmea_data      # virtual UART for GNSS

switch:
  - platform: modem
    gnss:
      name: "${myName} GNSS"
      restore_mode: ALWAYS_ON

gps:
  uart_id: nmea_data   # reads from modem's NMEA channel
  latitude: ...
  longitude: ...

R2 firmware CDN pipeline

compile -> md5 -> manifest.json -> PUT /fw/floodboy/sim7600e-c3/* -> OTA pull
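
A minimal sketch of the md5 and manifest.json steps in the pipeline above, in Python. The function name `build_release`, the `name` value, and the manifest field layout (`builds[].chipFamily`, `ota.md5`, `ota.path`) are assumptions based on the general shape of an ESPHome http_request update manifest, not copied from this session's files. The PUT step is left out because the worker's API-key header is not recorded in these notes.

```python
import hashlib
import json
from pathlib import Path

# Path prefix taken from the pipeline line above.
FW_PREFIX = "/fw/floodboy/sim7600e-c3"


def build_release(firmware: Path, version: str, out_dir: Path) -> dict:
    """Write firmware.md5 and manifest.json for one compiled OTA image.

    Returns the manifest as a dict so the caller can inspect or upload it.
    """
    md5_hex = hashlib.md5(firmware.read_bytes()).hexdigest()  # 32 hex characters

    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "firmware.md5").write_text(md5_hex)

    # Field names below are an assumed manifest layout; check them against
    # the ESPHome update component docs before relying on them.
    manifest = {
        "name": "floodboy",  # hypothetical project name
        "version": version,
        "builds": [
            {
                "chipFamily": "ESP32-C3",
                "ota": {
                    "md5": md5_hex,
                    "path": f"{FW_PREFIX}/{firmware.name}",
                },
            }
        ],
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `build_release(Path("firmware.ota.bin"), "<version>", Path("dist"))` after a compile would leave the hash and manifest next to each other, ready for the upload step.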

Architecture Decisions

v2 over v3: v3 local assets (~77KB compressed) block main loop 300ms+ during decompression, starving PPPoS UART. v2 is lighter and safe. v3 is fine for WiFi-only devices.
R2 bucket for firmware CDN: Created floodboy-fw bucket on Cloudflare R2, bound to the dustboy-health worker. PUT with API key, GET public. 60s cache.
GNSS via CMUX virtual UART: SIM7600E handles GPS internally. CMUX multiplexes NMEA data alongside PPPoS on the same physical UART. No extra GPIO pins needed.
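120s HTTP timeout: The default timeout was too short for a 1.2MB image over cellular PPPoS (~10KB/s). Raising it to 120s cleared error -28679. At that rate the full download itself takes roughly 120s, so treat 120s as a minimum rather than a generous margin.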
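
The timeout decision can be sanity-checked with figures already in these notes (1.2MB image, ~10KB/s, 0.9%/sec observed progress). The sketch below is plain arithmetic; the helper names and the 2x safety factor are hypothetical, and whether the `http_request` timeout bounds the whole transfer or only each read is not established here.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a steady rate, ignoring TLS and retry overhead."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


def suggested_timeout(size_bytes: float, rate_bytes_per_s: float,
                      safety_factor: float = 2.0) -> float:
    """Estimated transfer time scaled by an arbitrary safety factor (assumed 2x)."""
    return transfer_seconds(size_bytes, rate_bytes_per_s) * safety_factor


FIRMWARE_BYTES = 1_200_000   # "1.2MB", decimal megabytes assumed
RATE_BYTES_PER_S = 10_000    # "~10KB/s" over PPPoS

# Estimate from size and rate: 120.0 seconds.
estimate_from_rate = transfer_seconds(FIRMWARE_BYTES, RATE_BYTES_PER_S)

# Estimate from the observed 0.9%/sec progress: about 111 seconds.
estimate_from_progress = 100.0 / 0.9
```

Both estimates land at or just under 120s, so a whole-transfer timeout of 120s would leave almost no headroom on a slow cell.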

AI Diary

This session was a masterclass in "try, fail, adapt." I started with the plan to upgrade to web_server v3 with its nice HA-styled UI and entity grouping. First obstacle: the ESPHome docs had changed the entity sorting syntax and I used the old form (web_server_sorting_group instead of nested web_server: { sorting_group_id: }). Fixed that, but then hit a fundamental architectural wall — v3 loads its UI from a CDN, which doesn't work on an isolated WiFi AP. Added local: true to embed the assets, but that was the real killer: decompressing 77KB of web assets on the ESP32-C3 blocked the main loop long enough to corrupt the PPPoS UART stream. The modem couldn't re-enter PPP mode and spiraled into a crash loop with UART garbage.

The fallback to v2 was humbling but correct. Sometimes the simpler solution is the right one for constrained hardware. Then came an unexpected bonus: the R2 firmware CDN pipeline fell into place naturally — create bucket, deploy worker, upload files, OTA works. And the GPS integration through modem CMUX was surprisingly clean — no extra pins, no extra UART, just a virtual channel through the same multiplexed connection. Seven compile-flash-test cycles in 90 minutes. Each failure taught something concrete, each fix was verified on real hardware within minutes. The tight feedback loop between code, compile, flash, and observe is what makes embedded development both frustrating and deeply satisfying.

What Went Well

Fast iteration: 7 compile-flash cycles in 90 minutes
R2 CDN pipeline worked first try after bucket creation
GNSS integration through CMUX was clean — no hardware changes needed
MQTT telemetry continued flowing through all changes (except v3 crash)
Oracle learning captured immediately when v3 crash discovered

What Could Improve

Should have checked v3 asset size before attempting local embedding
Could have tested v3 on a WiFi-only device first to isolate the CDN vs local issue
The millis() ambiguity with TinyGPSPlus was a surprise — need a "known conflicts" list for GPS libraries

Blockers & Resolutions

Blocker	Resolution	Time Lost
v3 wrong syntax	Context7 docs lookup	~3 min
v3 CDN blank page	Added `local: true`	~2 min
v3 local PPPoS crash	Fell back to v2	~10 min
R2 bucket not found	`wrangler r2 bucket create`	~2 min
OTA timeout -28679	`timeout: 120s`	~5 min
`millis()` ambiguity	`esphome::millis()`	~3 min

Honest Feedback

Three friction points from this session:

1. ESPHome v3 web_server is a trap for constrained devices. The docs don't mention that local: true embeds ~77KB of compressed assets that block the main loop during decompression. On an ESP32-C3 with PPPoS, this is fatal. The error (pppos_input_tcpip failed with -1) gives zero indication that the web server is the culprit. I only figured it out because the logger warning said "web_server took 12777ms for an operation." Without that breadcrumb, I would have spent much longer debugging.

2. The compile-flash cycle is still too manual. SCP to white.local, then SSH + esptool — it works but it's 4 commands every time. The /esphome-dev skill exists but I didn't use it because the tmux workflow felt like overkill for quick iterations. A one-liner flash alias would save 30 seconds per cycle x 7 = 3.5 minutes.

3. R2 bucket creation should be in the infrastructure-as-code. I had to discover the bucket didn't exist by trying to deploy the worker and seeing it fail. The wrangler.toml references floodboy-fw but doesn't create it. A pre-deploy check or Terraform would prevent this surprise.
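
A pre-deploy check for the third point could look something like the sketch below. It only reads `bucket_name` entries out of the wrangler.toml text and compares them with a list of buckets known to exist. The function names and the `FW` binding name in the sample are hypothetical, and fetching the existing bucket list (for example from `wrangler r2 bucket list`) is left to the caller because that output format is not recorded here.

```python
import re

# Matches lines like: bucket_name = "floodboy-fw"
_BUCKET_RE = re.compile(r'^\s*bucket_name\s*=\s*"([^"]+)"', re.MULTILINE)


def bound_buckets(wrangler_toml: str) -> list[str]:
    """Return every R2 bucket name referenced in the wrangler.toml text."""
    return _BUCKET_RE.findall(wrangler_toml)


def missing_buckets(wrangler_toml: str, existing: list[str]) -> list[str]:
    """Return referenced buckets that are not in the list of existing buckets."""
    known = set(existing)
    return [name for name in bound_buckets(wrangler_toml) if name not in known]


# Illustrative config; only the bucket and worker names come from these notes.
SAMPLE_TOML = '''
name = "dustboy-health"

[[r2_buckets]]
binding = "FW"
bucket_name = "floodboy-fw"
'''
```

If `missing_buckets(SAMPLE_TOML, existing)` returns anything, the deploy script can stop and run `wrangler r2 bucket create` for each name before deploying the worker.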

Lessons Learned

v3 web_server + local: true is incompatible with PPPoS on ESP32-C3 — asset decompression blocks UART for 300ms+, killing PPPoS. Use v2.
R2 bucket must exist before wrangler deploy — worker fails with "bucket not found" if missing.
TinyGPSPlus defines its own millis() — use esphome::millis() to disambiguate.
GNSS on SIM7600E uses AT+CGPS (not AT+CGNSPWR like SIM7080) — model-specific commands.
OTA over cellular needs 120s+ timeout — default is too short for 1.2MB over PPPoS.
ESPHome v3 entity sorting syntax changed — web_server: { sorting_group_id: X }, not web_server_sorting_group: X.

Next Steps

Verify GPS fix outdoors (satellites > 0, lat/lon populated)
Test full OTA cycle via cellular (download + flash + reboot)
Upgrade floodboy-ui-oss
Add MQTT command for remote GNSS toggle
Consider compression: br for v2 local assets (Brotli smaller than gzip)
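
For the first item (satellites > 0, lat/lon populated), captured NMEA lines can also be checked off-device. The sketch below assumes the modem emits standard GGA sentences; the function names are hypothetical and the sample sentences in the usage note are illustrative, not captured from this hardware.

```python
def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return f"{value:02X}"


def frame(body: str) -> str:
    """Wrap a sentence body in '$...*CS' framing."""
    return f"${body}*{nmea_checksum(body)}"


def _to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA (d)ddmm.mmmm to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str) -> dict:
    """Parse one GGA sentence into fix flag, satellite count, and position."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, given = sentence[1:].partition("*")
    if nmea_checksum(body) != given.upper():
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    # fields[0] is talker + type, e.g. GPGGA or GNGGA.
    if len(fields) < 8 or fields[0][2:] != "GGA":
        raise ValueError("not a GGA sentence")
    quality = int(fields[6] or 0)      # 0 means no fix
    satellites = int(fields[7] or 0)
    has_fix = quality > 0 and fields[2] != "" and fields[4] != ""
    return {
        "fix": has_fix,
        "satellites": satellites,
        "latitude": _to_degrees(fields[2], fields[3]) if has_fix else None,
        "longitude": _to_degrees(fields[4], fields[5]) if has_fix else None,
    }
```

An indoor capture like the one in this session would be expected to look like `frame("GPGGA,,,,,,0,00,,,,,,,")`, which parses with `fix` false and zero satellites; an outdoor capture should flip `fix` to true and populate both coordinates.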

Metrics

Commits: 2 (esphome-fw) + 1 Oracle learning
Compile cycles: 7
Flash cycles: 7
Files changed: 1 YAML config + 3 R2 uploads + 1 R2 bucket + 1 worker deploy
Lines added: ~45 (firmware config)
New entities: 6 (1 GNSS switch + 5 GPS sensors: lat, lon, alt, speed, satellites)
Infrastructure: R2 bucket created, worker redeployed, firmware CDN operational

The water rises, the chain remembers. Now the chain also knows where it is.

nazt/2026-02-13_gnss-cmux-r2-cdn-pipeline.md
