Hosted by Dr. Marily Nika and Hamel Husain
45 minutes | Virtual (Zoom) | Free to join
Most PMs know something is wrong with their AI product but can't pinpoint what. They rely on vague feedback ("it feels off") or wait for engineering to investigate. This session gives you a repeatable process to diagnose exactly where your AI is failing in under an hour—no code required.
You'll walk away knowing how to look at your AI's behavior systematically, spot the patterns that matter, and have a concrete list of what to fix first.
The #1 mistake PMs make is not reviewing real outputs. We'll show you how to pull 20 examples that represent the full range of your product's behavior—not just the easy wins or obvious disasters.
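The session needs no code, but if your traces live in an exported file and you'd rather script the pull than eyeball it, a minimal sketch might look like the one below. The file name and the `trace_id`, `user_type`, and `feature` columns are assumptions about how your logs are stored, not something the lesson requires.

```python
# Minimal sketch: pull ~20 traces spread across segments instead of the 20 most
# recent ones. Assumes a hypothetical traces.csv with trace_id, user_type,
# feature, and model_output columns.
import pandas as pd

traces = pd.read_csv("traces.csv")

sample = (
    traces.sample(frac=1, random_state=0)   # shuffle so every segment gets a fair draw
          .groupby(["user_type", "feature"])
          .head(2)                          # up to 2 traces per user type x feature
          .head(20)                         # cap the review set at 20
)

sample.to_csv("review_set.csv", index=False)
```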
"The tone is off" or "it's not helpful" isn't actionable. We'll show you how to name your failures specifically (e.g., "ignores budget constraints," "wrong client persona") so your team knows exactly what to build against.
Not all failures matter equally. You'll leave with a prioritized list of 3-5 issues ranked by how often they occur and how much they hurt the user experience.
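If you want to do the tally somewhere other than a spreadsheet, the count-and-rank step is only a few lines. The labels and the 1-3 impact scores below are hypothetical placeholders; you'd substitute the categories and severity judgments from your own review.

```python
# Minimal sketch of prioritization: frequency x rough impact score.
from collections import Counter

labels = [                                  # hypothetical labels from your review notes
    "ignores budget constraints", "wrong client persona",
    "ignores budget constraints", "answers the wrong question",
    "ignores budget constraints", "wrong client persona",
]
impact = {                                  # hypothetical 1-3 severity scores
    "ignores budget constraints": 3,
    "wrong client persona": 2,
    "answers the wrong question": 1,
}

counts = Counter(labels)
ranked = sorted(counts, key=lambda c: counts[c] * impact[c], reverse=True)
for category in ranked:
    print(f"{category}: seen {counts[category]}x, impact {impact[category]}")
```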
PMs are the bridge between what users need and what engineering builds. But when AI products fail, PMs often feel stuck—they know something's wrong but can't translate that intuition into specific, fixable problems.
This isn't about becoming an ML engineer. It's about developing the systematic eye that separates PMs who ship AI that works from those who ship AI that "kind of works sometimes."
Marily has seen hundreds of AI product failures at Google and Meta. Hamel has debugged AI systems across dozens of companies. Together, we'll give you the process we actually use.
| Time | Step | What You're Doing |
|---|---|---|
| Pre-work | Sample | We'll come prepared with 20 examples that cover different user types, features, and edge cases |
| 0-40 min | Read & React | Go through each example together and write one sentence about what went wrong. This is where the real learning happens—you'll build intuition for spotting failures by watching us do it live. |
| 40-42 min | Group | Cluster your notes into 3-5 categories of failure |
| 42-45 min | Prioritize | Count how often each category appears and rank by impact |
We'll do this live on a real AI product so you can follow along and apply the same process to your own products.
- PMs building AI products who want to move faster on quality
- Product leaders who need to give concrete feedback to engineering
- Anyone who's ever said "the AI just doesn't get it" and wanted to be more specific
This Lightning Lesson is a taste of systematic AI evaluation. For the full methodology:
- AI Product Management Bootcamp (Marily Nika) – Build and launch real AI products with certification
- AI Evals for Engineers & PMs (Hamel Husain & Shreya Shankar) – The complete framework for measuring and improving AI quality
Dr. Marily Nika is a Gen AI Product Leader at Google with 12+ years of experience across Google and Meta. She created Maven's first AI PM Certification and has taught 20k+ students worldwide. PhD in Machine Learning, TED AI speaker, Harvard Business School Fellow.
Hamel Husain is an ML engineer with 20+ years of experience at companies including Airbnb and GitHub. He co-created the AI Evals course and consults with companies building AI products. His early LLM research was used by OpenAI for code understanding.