Created
February 6, 2026 22:15
Medium blog post about AI podcast automation
| <!DOCTYPE html> | |
| <html> | |
| <head> | |
| <title>How I Built a Fully Automated AI Podcast from Gmail Newsletters</title> | |
| </head> | |
| <body> | |
| <article> | |
| <h1>How I Built a Fully Automated AI Podcast from Gmail Newsletters</h1> | |
| <p><em>From email to Spotify in minutes — no manual intervention required</em></p> | |
| <p>Last week, I published the first episode of "The Inference Times" — a podcast covering housing markets, stock markets, and tech trends. The twist? I never wrote a script, never recorded my voice, and never touched a single button on Spotify's interface.</p> | |
| <p>Everything was automated.</p> | |
| <p>In this post, I'll walk you through how I built a system that:</p> | |
| <ol> | |
| <li>Extracts newsletter content from Gmail</li> | |
| <li>Generates natural-sounding audio using AI text-to-speech</li> | |
| <li>Creates custom cover art using Gemini</li> | |
| <li>Publishes directly to Spotify — all orchestrated by an AI coding agent</li> | |
| </ol> | |
| <h2>The Problem: Newsletter Overload</h2> | |
| <p>Like many of you, I subscribe to several high-quality newsletters. CalculatedRisk for housing market analysis. Matt Levine's Money Stuff for finance. Benedict Evans for tech trends.</p> | |
| <p>But here's the thing: <strong>I rarely read them.</strong></p> | |
| <p>They pile up in my inbox, guilt-inducing reminders of content I'll "get to later." What I <em>do</em> have time for is listening during my commute.</p> | |
| <p>So I asked myself: What if my newsletters could come to me as a podcast?</p> | |
| <h2>The Stack</h2> | |
| <p>Here's what powers "The Inference Times":</p> | |
| <ul> | |
| <li><strong>OpenCode</strong> — AI coding agent that orchestrates the entire workflow</li> | |
| <li><strong>OpenCode Skills</strong> — Custom skill definitions for repeatable tasks</li> | |
| <li><strong>Gmail API</strong> — Extract newsletter content</li> | |
| <li><strong>Coqui TTS</strong> — Generate natural-sounding speech (fast, local)</li> | |
| <li><strong>Bark</strong> — Alternative TTS for expressive speech</li> | |
| <li><strong>Gemini</strong> — Create episode cover art</li> | |
| <li><strong>Chrome DevTools MCP</strong> — Automate Spotify publishing</li> | |
| </ul> | |
| <h2>Step 1: Extracting Content from Gmail</h2> | |
| <p>The first challenge was getting newsletter content out of Gmail in a clean, usable format. I wrote a Python script that connects to the Gmail API, fetches emails matching specific criteria, and converts the HTML to clean, speakable text.</p> | |
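<p>As a rough illustration of the fetch step, here is a minimal sketch, not the actual script. It assumes you already have an authorized Gmail API client (the <em>service</em> object from google-api-python-client); the helper names build_query, find_html_body, and fetch_newsletter_html are hypothetical, and pagination is left out for brevity.</p>

```python
import base64


def build_query(sender, days=7):
    """Build a Gmail search query for recent mail from one sender."""
    return f"from:{sender} newer_than:{days}d"


def find_html_body(payload):
    """Walk a Gmail message payload and return the decoded text/html part, or None."""
    if payload.get("mimeType") == "text/html":
        data = payload.get("body", {}).get("data")
        if data:
            # Gmail returns body data as URL-safe base64; restore padding if it is missing.
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    # Multipart messages nest their bodies under "parts".
    for part in payload.get("parts", []):
        html = find_html_body(part)
        if html is not None:
            return html
    return None


def fetch_newsletter_html(service, sender, days=7):
    """Return the HTML bodies of recent messages from `sender` (first page of results only)."""
    listing = service.users().messages().list(
        userId="me", q=build_query(sender, days)
    ).execute()
    bodies = []
    for ref in listing.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=ref["id"], format="full"
        ).execute()
        html = find_html_body(msg["payload"])
        if html:
            bodies.append(html)
    return bodies
```

<p>Keeping the payload walk separate from the network calls makes it easy to test against a saved message without touching a real inbox.</p>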
| <p>The key piece is the html_to_podcast_script() function. Raw email HTML is full of navigation, footers, unsubscribe links, and formatting cruft, so I use BeautifulSoup to strip that noise while preserving paragraph breaks, which become natural pauses in the audio.</p> | |
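<p>To make the cleanup idea concrete, here is a dependency-free sketch of what a function like html_to_podcast_script() might do. Note the swap: it uses Python's built-in html.parser rather than BeautifulSoup so it runs anywhere, and the tag lists and noise phrases are assumptions chosen for illustration, not the real script's rules.</p>

```python
from html.parser import HTMLParser

# Assumed lists for illustration; tune them for your own newsletters.
SKIP_TAGS = {"script", "style", "head", "nav", "footer"}
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "blockquote"}
NOISE_PHRASES = ("unsubscribe", "view in browser", "manage preferences")


class _TextExtractor(HTMLParser):
    """Collect visible text, marking block boundaries with newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Newlines inside text are just source formatting, not paragraph breaks.
            self.chunks.append(data.replace("\r", " ").replace("\n", " "))


def html_to_podcast_script(html):
    """Convert email HTML into plain paragraphs separated by blank lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    paragraphs = []
    for block in "".join(parser.chunks).split("\n"):
        text = " ".join(block.split())  # collapse runs of whitespace
        if not text:
            continue
        if any(phrase in text.lower() for phrase in NOISE_PHRASES):
            continue  # drop boilerplate lines such as unsubscribe links
        paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

<p>The blank line between paragraphs is what survives as a pause once the text reaches the speech model.</p>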
| <h2>Step 2: Generating Audio with AI Text-to-Speech</h2> | |
| <p>For text-to-speech, I evaluated several options:</p> | |
| <ul> | |
| <li><strong>ElevenLabs</strong> — Excellent quality, fast, but expensive</li> | |
| <li><strong>OpenAI TTS</strong> — Great quality, API-based</li> | |
| <li><strong>Bark</strong> — Excellent quality, slow, free</li> | |
| <li><strong>Coqui TTS</strong> — Good quality, fast, free, local</li> | |
| </ul> | |
| <p>I went with Coqui TTS using the VCTK VITS model for most content. It runs entirely on my MacBook, costs nothing, and generates 3 minutes of audio in about 45 seconds.</p> | |
| <p>The p226 voice from the VCTK dataset has a pleasant, professional British tone — perfect for financial news.</p> | |
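<p>For reference, a minimal synthesis sketch using Coqui's Python API with the VCTK VITS model and the p226 speaker. The function names and the output path are assumptions, the model downloads on first use, and depending on your setup it may also need espeak-ng installed. The small real_time_factor helper just restates the speed figure above: 45 seconds of compute for 180 seconds of audio is a factor of 0.25.</p>

```python
def real_time_factor(audio_seconds, synthesis_seconds):
    """Synthesis time divided by audio duration; below 1.0 means faster than real time."""
    return synthesis_seconds / audio_seconds


def synthesize(script, out_path="episode.wav", speaker="p226"):
    """Render `script` to a WAV file with Coqui TTS (hypothetical wrapper)."""
    # Imported lazily so the helper above works without the TTS package installed.
    from TTS.api import TTS

    tts = TTS(model_name="tts_models/en/vctk/vits")
    # VCTK is a multi-speaker model, so a speaker ID must be passed.
    tts.tts_to_file(text=script, speaker=speaker, file_path=out_path)
    return out_path
```

<p>Swapping the speaker argument is the quickest way to audition other VCTK voices before settling on one.</p>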
| <h2>Step 3: Cover Art Generation with Gemini</h2> | |
| <p>Every episode needs cover art. Rather than use a static image, I wanted dynamic art that reflects the episode topic.</p> | |
| <p>I use Google's Gemini model through its web interface. My OpenCode agent navigates to gemini.google.com and generates art with prompts describing the desired style and topic.</p> | |
| <h2>Step 4: Publishing to Spotify with Chrome DevTools</h2> | |
| <p>This is where the magic happens.</p> | |
| <p>Spotify doesn't have a public API for podcast publishing. You have to use their web interface at creators.spotify.com. Most people would say "automation stops here."</p> | |
| <p>Not with OpenCode.</p> | |
| <p>Using the Chrome DevTools MCP (Model Context Protocol) server, my AI agent can navigate web pages, fill out forms, click buttons, handle authentication, and wait for page loads.</p> | |
| <p>The entire flow runs autonomously. The agent handles edge cases like rich text editor bugs and loading states.</p> | |
| <h2>Results</h2> | |
| <p>Here's what "The Inference Times" Episode 1 looks like:</p> | |
| <ul> | |
| <li>Source: 5 CalculatedRisk emails about housing markets</li> | |
| <li>Audio length: 3 minutes 10 seconds</li> | |
| <li>Generation time: ~2 minutes total</li> | |
| <li>Manual effort: Zero</li> | |
| </ul> | |
| <h2>Lessons Learned</h2> | |
| <p><strong>1. Web automation is fragile but powerful</strong> — AI agents can adapt when traditional automation scripts would fail.</p> | |
| <p><strong>2. Local TTS is good enough</strong> — Coqui's VCTK model produces perfectly listenable audio for informational content.</p> | |
| <p><strong>3. Skills > Scripts</strong> — Packaging workflows as OpenCode Skills means the AI agent can adapt and improve the process.</p> | |
| <h2>What's Next</h2> | |
| <p>I'm planning to:</p> | |
| <ul> | |
| <li>Add more newsletter sources — Matt Levine, Benedict Evans, Stratechery</li> | |
| <li>Implement scheduling — Auto-publish every Monday morning</li> | |
| <li>Add intro/outro music — Using AI-generated audio beds</li> | |
| </ul> | |
| <p>The dream is a fully autonomous media company that transforms written content into audio content at scale.</p> | |
| <h2>Try It Yourself</h2> | |
| <p>If you want to build something similar:</p> | |
| <ol> | |
| <li>Install OpenCode: https://opencode.ai</li> | |
| <li>Set up Coqui TTS with Python</li> | |
| <li>Launch Chrome with remote debugging enabled: --remote-debugging-port=9222</li> | |
| <li>Create your skill in ~/.config/opencode/skills/</li> | |
| </ol> | |
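<p>Before pointing an agent at the browser, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below queries the /json/version endpoint that Chrome exposes when started with --remote-debugging-port; the function names are hypothetical, and it assumes Chrome is running on the same machine.</p>

```python
import json
import urllib.error
import urllib.request


def devtools_version_url(port=9222, host="127.0.0.1"):
    """URL of the DevTools HTTP endpoint that reports browser version info."""
    return f"http://{host}:{port}/json/version"


def get_browser_info(port=9222, host="127.0.0.1", timeout=2.0):
    """Return Chrome's version info as a dict, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(devtools_version_url(port, host), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all mean "not ready".
        return None
```

<p>If get_browser_info() returns None, Chrome was most likely started without the flag or on a different port.</p>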
| <p>The future of content isn't creation — it's transformation. We're swimming in high-quality written content. The bottleneck is format conversion.</p> | |
| <p>AI agents like OpenCode make that conversion automatic.</p> | |
| <hr> | |
| <p><em>Den is building AI-powered tools for content transformation. Follow for more posts on automation, AI agents, and the future of media.</em></p> | |
| </article> | |
| </body> | |
| </html> |