@numman-ali
Created December 28, 2025 23:08

Generative AI Model Research Plan

Priority: TOP — This informs schema design, skill prompts, and render pipeline.

Last Updated: 2025-12-28

Status: ✅ REFERENCES VALIDATED (cloud + on-device) — synthesis still pending


✅ Reference Library Re-Validation (Status)

Problem Identified (resolved for references/): Early drafts were created from web searches and aggregator sources and required systematic validation against official vendor documentation.

What exists:

  • 50 cloud model reference docs
  • 3 on-device model compilation docs (55+ models)
  • Synthesis documents in planning/synthesis/* (not yet revalidated)
  • Model ID audit (complete for models already documented; new models may appear over time)

What is now done (2025-12-28):

  • All existing references/** docs have been validated against official vendor documentation (for cloud) or primary upstream sources (HF/GitHub) for on-device.
  • Gaps (models that should exist in the library but do not yet have dedicated docs) are tracked in:
    • references/GAPS.md
    • references/MODEL-INVENTORY.md

Remaining risk: Any earlier inaccuracies may still exist in planning/synthesis/* until those documents are revalidated against the now-canonical reference docs.


Re-Validation Plan

Phase 1: Cloud Models (existing docs) — DONE (2025-12-28)

Each document required an agent to:

  1. Fetch official vendor documentation
  2. Compare EVERY claim in the reference doc
  3. Verify prompting vocabulary matches official guidance
  4. Verify capabilities (resolution, duration, formats)
  5. Verify pricing (cross-check with MODEL-AUDIT.md)
  6. Verify API parameters and endpoints
  7. Update the reference doc with corrections
  8. Note any new features/capabilities not captured

Phase 2: On-Device Models (3 compilation docs) — DONE (2025-12-28)

Similar validation against HuggingFace model cards and GitHub repos.

Phase 3: Synthesis Documents (7 documents) — TODO

After reference docs are validated, verify synthesis docs reflect corrected information.


Cloud Models in the Reference Library (as of 2025-12-28)

The canonical “what’s covered vs missing” list lives in:

  • references/MODEL-INVENTORY.md
  • references/GAPS.md

The tables below are a convenience snapshot for this plan doc.

Video Generation (15 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/video/veo-3.md` | Veo 3.1 | Google | cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| `references/video/sora-2.md` | Sora 2 | OpenAI | platform.openai.com/docs | COVERED |
| `references/video/runway-gen4.5.md` | Gen-4/4.5 | Runway | docs.dev.runwayml.com | COVERED |
| `references/video/kling-2.1.md` | Kling 2.1 | Kuaishou | klingai.com/global/dev | COVERED |
| `references/video/luma-ray3.md` | Ray2/Ray3 | Luma AI | docs.lumalabs.ai | COVERED |
| `references/video/hailuo-02.md` | Hailuo 02 | MiniMax | platform.minimaxi.com/docs/api-reference/video-generation-intro | COVERED |
| `references/video/midjourney-video.md` | Midjourney Video | Midjourney | docs.midjourney.com/docs/video | COVERED |
| `references/video/seedance-1.5-pro.md` | Seedance 1.5 Pro / 1.0 family | ByteDance (Volcengine Ark) | volcengine.com/docs/82379 | COVERED |
| `references/video/pika-2.md` | Pika 2.2 (via fal.ai) | Pika | fal.ai/models | COVERED |
| `references/video/pixverse.md` | PixVerse (v5.5) | PixVerse | docs.platform.pixverse.ai | COVERED |
| `references/video/haiper-2.x.md` | Haiper Video 2.x | Haiper | docs.haiper.ai/api-reference | COVERED |
| `references/video/vidu.md` | Vidu (viduq1 / 2.0 / 1.5) | Vidu | docs.platform.vidu.com | COVERED |
| `references/video/firefly-video.md` | Firefly Video (Generate Video API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| `references/video/nova-reel.md` | Nova Reel | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| `references/video/alibaba-wan.md` | Wan (Wan2.x / Wanx2.1 + VACE editing) | Alibaba Cloud (Model Studio / DashScope) | alibabacloud.com/help | COVERED |

Image Generation (17 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/image/nano-banana-pro.md` | Nano Banana / Nano Banana Pro | Google | ai.google.dev/gemini-api/docs/image-generation | COVERED |
| `references/image/imagen-4.md` | Imagen 4 | Google | ai.google.dev/gemini-api/docs/imagen | COVERED |
| `references/image/flux-2.md` | FLUX.2 | Black Forest Labs | docs.bfl.ai | COVERED |
| `references/image/flux-kontext.md` | FLUX.1 Kontext | Black Forest Labs | docs.bfl.ai/kontext | COVERED |
| `references/image/gpt-image.md` | GPT Image 1.5 | OpenAI | platform.openai.com/docs/guides/image-generation | COVERED |
| `references/image/midjourney.md` | Midjourney V7 | Midjourney | docs.midjourney.com | COVERED |
| `references/image/ideogram-3.md` | Ideogram 3.0 | Ideogram | developer.ideogram.ai | COVERED |
| `references/image/seedream-4.md` | Seedream 4.5 | ByteDance | docs.byteplus.com | COVERED |
| `references/image/firefly-image.md` | Firefly Image (API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| `references/image/stability-image.md` | Stable Image + SD 3.5 (API) | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| `references/image/nova-canvas.md` | Nova Canvas | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| `references/image/minimax-image.md` | MiniMax Image Generation (image-01, image-01-live) | MiniMax | platform.minimaxi.com/docs/api-reference/image-generation-intro | COVERED |
| `references/image/recraft.md` | Recraft (Recraft API) | Recraft | recraft.ai/docs/api-reference | COVERED |
| `references/image/leonardo.md` | Leonardo (Image API) | Leonardo AI | docs.leonardo.ai/reference | COVERED |
| `references/image/reve-image.md` | Reve Image API (Create/Edit/Remix) | Reve | api.reve.com | COVERED |
| `references/image/krea.md` | Krea (Image/Video API) | Krea | docs.krea.ai/api-reference | COVERED |
| `references/image/freepik-mystic.md` | Freepik Mystic | Freepik | docs.freepik.com/api-reference | COVERED |

Audio Generation (18 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/audio/elevenlabs.md` | ElevenLabs TTS | ElevenLabs | elevenlabs.io/docs | COVERED |
| `references/audio/eleven-music.md` | Eleven Music | ElevenLabs | elevenlabs.io/docs | COVERED |
| `references/audio/minimax-music.md` | MiniMax Music 2.0 (music-2.0) | MiniMax | platform.minimaxi.com/docs/api-reference/music-intro | COVERED |
| `references/audio/suno-v5.md` | Suno v5 | Suno | help.suno.com | COVERED |
| `references/audio/udio.md` | Udio v1.5 | Udio | help.udio.com | COVERED |
| `references/audio/openai-tts.md` | OpenAI TTS | OpenAI | platform.openai.com/docs/guides/text-to-speech | COVERED |
| `references/audio/fish-audio-openaudio-s1.md` | OpenAudio S1 | Fish Audio | docs.fish.audio | COVERED |
| `references/audio/cartesia-sonic.md` | Sonic 3 | Cartesia | docs.cartesia.ai | COVERED |
| `references/audio/playht.md` | PlayHT | PlayHT | docs.play.ht | COVERED |
| `references/audio/gemini-tts.md` | Gemini Preview TTS | Google (Gemini API) | ai.google.dev/gemini-api/docs/speech-generation | COVERED |
| `references/audio/minimax-speech.md` | MiniMax Speech (T2A + Async + Voice Design/Cloning) | MiniMax | platform.minimaxi.com/docs/api-reference/speech-t2a-intro | COVERED |
| `references/audio/google-cloud-tts.md` | Google Cloud TTS | Google Cloud | cloud.google.com/text-to-speech | COVERED |
| `references/audio/azure-tts.md` | Azure TTS | Microsoft | learn.microsoft.com/azure/ai-services/speech-service | COVERED |
| `references/audio/amazon-polly.md` | Amazon Polly | AWS | docs.aws.amazon.com/polly | COVERED |
| `references/audio/respeecher.md` | Respeecher | Respeecher | docs.respeecher.com | COVERED |
| `references/audio/stable-audio.md` | Stable Audio 2 / 2.5 | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| `references/audio/lyria-2.md` | Lyria 2 | Google | docs.cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| `references/audio/lyria-realtime.md` | Lyria RealTime | Google (Gemini API) | ai.google.dev/gemini-api/docs/music-generation | COVERED |

On-Device Models to Validate

| Document | Models | Primary Sources | Status |
|---|---|---|---|
| `references/video/on-device-models.md` | compilation doc | HuggingFace model cards, GitHub | COVERED |
| `references/image/on-device-models.md` | compilation doc | HuggingFace model cards, GitHub | COVERED |
| `references/audio/on-device-models.md` | compilation doc | HuggingFace model cards | COVERED |

Agent Prompts for Validation

Template: Cloud Model Validation Agent

**Task**: Validate the reference document for [MODEL] against official [VENDOR] documentation.

**Reference Document**: `references/[category]/[file].md`
**Primary Source**: [OFFICIAL_DOCS_URL]
**Secondary Sources**: [AGGREGATOR_URLS]

**Validation Checklist**:

1. **Model Identity**
   - [ ] Correct model name/version
   - [ ] Correct API model_id (cross-check MODEL-AUDIT.md)
   - [ ] Correct provider attribution

2. **Capabilities**
   - [ ] Resolution limits verified
   - [ ] Duration limits verified
   - [ ] Supported formats verified
   - [ ] Feature claims verified (audio support, text rendering, etc.)

3. **Pricing**
   - [ ] Current pricing verified
   - [ ] Pricing tiers/variants verified
   - [ ] Credit system (if applicable) verified

4. **API Documentation**
   - [ ] Endpoint format verified
   - [ ] Authentication method verified
   - [ ] Required parameters verified
   - [ ] Optional parameters verified
   - [ ] Response format verified

5. **Prompting Guide**
   - [ ] Camera movement vocabulary verified (video)
   - [ ] Style/aesthetic terminology verified (image)
   - [ ] Voice/emotion controls verified (audio)
   - [ ] Best practices match official guidance
   - [ ] Example prompts verified

6. **Limitations**
   - [ ] Known limitations documented
   - [ ] Rate limits documented
   - [ ] Content restrictions documented

**Output**:
- List of CONFIRMED items (with evidence links)
- List of CORRECTIONS needed (with correct information and evidence)
- List of ADDITIONS (new features/capabilities not in current doc)
- Updated reference document content

**Quality Bar**:
- Every claim must have evidence from official source
- No "seems" or "probably" - use UNKNOWN if unverifiable
- Preserve document structure, only update content
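The template's output contract and quality bar can be captured in a small record type. This is an illustrative sketch, not part of any existing tooling: the `Verdict` and `ClaimCheck` names are hypothetical, but the rule they enforce is the one above (every non-UNKNOWN finding needs an evidence link).

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    CONFIRMED = "confirmed"
    CORRECTION = "correction"
    ADDITION = "addition"
    UNKNOWN = "unknown"  # per the quality bar: no "seems"/"probably"

@dataclass
class ClaimCheck:
    claim: str                # the statement as written in the reference doc
    verdict: Verdict
    evidence_url: str = ""    # required unless verdict is UNKNOWN
    corrected_text: str = ""  # filled in for CORRECTION/ADDITION

    def __post_init__(self):
        # UNKNOWN is the only verdict allowed to lack an evidence link
        if self.verdict is not Verdict.UNKNOWN and not self.evidence_url:
            raise ValueError(f"{self.verdict.value} requires an evidence link")
```

An agent's output then reduces to a list of `ClaimCheck` records, which can be filtered by verdict to produce the CONFIRMED/CORRECTIONS/ADDITIONS lists mechanically.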

Specific Agent Prompts

Video: Veo 3.1

Validate `references/video/veo-3.md` against:
- https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/veo/3-1-generate
- https://ai.google.dev/gemini-api/docs

Focus areas:
- Timestamp prompting format (is [00:00-00:03] correct?)
- Audio generation capabilities
- Camera movement vocabulary (what terms does Google recommend?)
- Resolution/duration limits
- Pricing per second

Video: Sora 2

Validate `references/video/sora-2.md` against:
- https://platform.openai.com/docs/models/sora-2
- https://platform.openai.com/docs/models/sora-2-pro
- https://platform.openai.com/docs/api-reference/videos

Focus areas:
- Multi-scene capabilities
- Duration limits (sora-2 vs sora-2-pro)
- Resolution options
- Prompt structure recommendations
- Credit/pricing system

Video: Runway Gen-4

Validate `references/video/runway-gen4.5.md` against:
- https://docs.dev.runwayml.com/guides/models/
- https://docs.dev.runwayml.com/guides/pricing/

Focus areas:
- Gen-4 vs Gen-4.5 availability (Gen-4.5 API not yet available per audit)
- Motion Brush documentation
- Camera control parameters
- Credit system

Video: Kling 2.1

Validate `references/video/kling-2.1.md` against:
- https://klingai.com/global/dev
- https://app.klingai.com/global/dev/document-api

Focus areas:
- Model tiers (standard/pro/master)
- Lip-sync capabilities
- Camera movement vocabulary
- Duration limits per tier
- Pricing structure

Video: Luma Ray

Validate `references/video/luma-ray3.md` against:
- https://docs.lumalabs.ai/docs/api
- https://lumalabs.ai/learning-hub

Focus areas:
- Ray2 vs Ray3 availability (Ray3 API not yet available per audit)
- HDR capabilities
- Draft mode documentation
- Credit system

Video: Hailuo 02

Validate `references/video/hailuo-02.md` against:
- https://platform.minimaxi.com/docs/api-reference/video-generation-intro

Focus areas:
- Model variants (02 vs 2.3 vs 2.3-Fast)
- Resolution/duration options
- Pricing per resolution tier

Image: Nano Banana / Nano Banana Pro (Gemini native image generation)

Validate `references/image/nano-banana-pro.md` against:
- https://ai.google.dev/gemini-api/docs/nanobanana
- https://ai.google.dev/gemini-api/docs/image-generation
- https://ai.google.dev/gemini-api/docs/pricing

Focus areas:
- Correct model IDs (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`)
- Token/pricing tables and image-size token costs
- 4K output + “Thinking” + thought signatures behavior (Pro)
- Prompting vocabulary + official prompt templates

Image: Imagen 4

Validate `references/image/imagen-4.md` against:
- https://ai.google.dev/gemini-api/docs/imagen
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/imagen/4-0-generate
- https://cloud.google.com/vertex-ai/generative-ai/pricing

Focus areas:
- Model variants (fast/standard/ultra) and IDs
- Pricing (Gemini API vs Vertex AI pricing surfaces)
- Aspect ratio + output size constraints
- Prompting guidance (official)

Image: FLUX.2

Validate `references/image/flux-2.md` against:
- https://docs.bfl.ai/quick_start/generating_images
- https://docs.bfl.ai/flux_2/flux2_overview
- https://bfl.ai/pricing

Focus areas:
- All FLUX.2 variants (pro/max/flex/dev)
- Endpoint-based API (not model_id based)
- Text rendering capabilities
- Pricing per megapixel
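Because BFL addresses models by endpoint rather than a `model_id` field, the model choice lives in the URL, not the JSON body. The sketch below illustrates that shape only: the `api.bfl.ai` base URL, the `x-key` header, and the `flux-2-pro` endpoint name are assumptions to be confirmed against docs.bfl.ai during validation.

```python
import json
import urllib.request

BASE = "https://api.bfl.ai/v1"  # assumed base URL; verify against docs.bfl.ai

def build_request(endpoint: str, prompt: str, api_key: str, **params) -> urllib.request.Request:
    """Build a submission request for an endpoint-addressed model.

    Each model is its own path segment (e.g. a hypothetical 'flux-2-pro'),
    so switching models means switching URLs, not body fields.
    """
    body = {"prompt": prompt, **params}
    return urllib.request.Request(
        f"{BASE}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# req = build_request("flux-2-pro", "a lighthouse at dusk", "KEY", width=1024)
# urllib.request.urlopen(req) would submit the job; results are polled separately.
```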

Image: GPT Image 1.5

Validate `references/image/gpt-image.md` against:
- https://platform.openai.com/docs/models/gpt-image-1.5
- https://platform.openai.com/docs/guides/image-generation

Focus areas:
- Model versions (1.5 vs 1 vs 1-mini)
- Token-based pricing
- Quality tiers
- Text rendering accuracy

Image: Midjourney V7

Validate `references/image/midjourney.md` against:
- https://docs.midjourney.com

Focus areas:
- V7 capabilities
- API availability (still no public API?)
- Parameter syntax (--ar, --stylize, etc.)
- Style reference system

Image: Ideogram 3.0

Validate `references/image/ideogram-3.md` against:
- https://developer.ideogram.ai/api-reference/api-reference/generate-v3
- https://ideogram.ai/features/3.0

Focus areas:
- Version 3.0 features
- Text rendering accuracy claims
- Style Codes feature
- API endpoint format

Image: Seedream 4.5

Validate `references/image/seedream-4.md` against:
- https://docs.byteplus.com/en/docs/ModelArk
- https://seed.bytedance.com/en/seedream4_5

Focus areas:
- API availability (via BytePlus ModelArk)
- Multi-reference fusion capabilities
- Speed benchmarks
- Pricing

Audio: ElevenLabs

Validate `references/audio/elevenlabs.md` against:
- https://elevenlabs.io/docs/overview/models
- https://elevenlabs.io/docs/api-reference

Focus areas:
- Model IDs (eleven_v3, eleven_multilingual_v2, etc. - use underscores!)
- Voice cloning requirements
- Stability/similarity controls
- Pricing per character
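The underscore convention can be enforced mechanically before any request goes out. This sketch builds a request body in the shape the ElevenLabs text-to-speech endpoint documents (`model_id`, `voice_settings` with `stability`/`similarity_boost`); the guard logic and the partial ID list are illustrative, not exhaustive.

```python
# Partial list from this doc; the full set lives in elevenlabs.io/docs/overview/models
KNOWN_MODEL_IDS = {"eleven_v3", "eleven_multilingual_v2"}

def tts_payload(text: str, model_id: str, stability: float = 0.5,
                similarity_boost: float = 0.75) -> dict:
    """Body for POST /v1/text-to-speech/{voice_id}.

    ElevenLabs model IDs use underscores (eleven_multilingual_v2),
    never hyphens -- a common transcription error in aggregator docs.
    """
    if "-" in model_id:
        raise ValueError(f"ElevenLabs model IDs use underscores, got {model_id!r}")
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
```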

Audio: Suno v5

Validate `references/audio/suno-v5.md` against:
- https://help.suno.com
- https://suno.com

Focus areas:
- v5 capabilities vs v4
- NO official API (only third-party wrappers)
- Song duration limits
- Lyric formatting

Audio: Udio v1.5

Validate `references/audio/udio.md` against:
- https://help.udio.com
- https://www.udio.com/blog

Focus areas:
- v1.5 and v1.5 Allegro differences
- NO official API (Udio explicitly states this)
- Stem separation features
- Key control

Audio: OpenAI TTS

Validate `references/audio/openai-tts.md` against:
- https://platform.openai.com/docs/guides/text-to-speech
- https://platform.openai.com/docs/api-reference/audio

Focus areas:
- Model IDs (tts-1, tts-1-hd, gpt-4o-mini-tts)
- Voice options
- Instructions support (gpt-4o-mini-tts only)
- Pricing structure

Audio: Fish Audio S1

Validate `references/audio/fish-audio-openaudio-s1.md` against:
- https://docs.fish.audio/api-reference/endpoint/openapi-v1/text-to-speech
- https://docs.fish.audio/developer-guide/models-pricing

Focus areas:
- Model ID is just "s1" in API
- Pricing per UTF-8 bytes
- Emotion control capabilities
- Voice cloning
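Billing per UTF-8 byte means multi-byte scripts cost more than their character count suggests (CJK characters are typically 3 bytes each). A cost estimator is a one-liner; the rate parameter below is a placeholder, not Fish Audio's actual price.

```python
def billable_bytes(text: str) -> int:
    """Fish Audio bills TTS by UTF-8 byte count, not character count."""
    return len(text.encode("utf-8"))

def estimate_cost(text: str, usd_per_million_bytes: float) -> float:
    # usd_per_million_bytes is a placeholder rate; take the real value from
    # docs.fish.audio/developer-guide/models-pricing
    return billable_bytes(text) / 1_000_000 * usd_per_million_bytes

# billable_bytes("hello") == 5, but billable_bytes("你好") == 6 (2 chars, 3 bytes each)
```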

Audio: Cartesia Sonic

Validate `references/audio/cartesia-sonic.md` against:
- https://docs.cartesia.ai/build-with-cartesia/tts-models
- https://cartesia.ai/pricing

Focus areas:
- Sonic-3 vs Sonic-2 vs Sonic-turbo
- Date-stamped version snapshots
- State Space Models claims
- Latency benchmarks

On-Device Model Validation (55 Agents - 1 Per Model)

On-Device Agent Template

**Task**: Validate on-device model [MODEL] against HuggingFace/GitHub.

**Sources**:
- HuggingFace model card: [HF_URL]
- GitHub repo: [GITHUB_URL]

**MANDATORY Validation Checklist**:

1. **Hardware Requirements**
   - [ ] Minimum VRAM verified
   - [ ] Recommended VRAM verified
   - [ ] RAM requirements verified

2. **Mac Compatibility** (CRITICAL - user uses MacBook)
   - [ ] MPS (Metal) support: YES/NO/PARTIAL
   - [ ] Apple Silicon (M1/M2/M3/M4) tested: YES/NO/UNKNOWN
   - [ ] Mac-specific installation steps documented
   - [ ] Mac performance benchmarks if available
   - [ ] Known Mac limitations or issues

3. **License**
   - [ ] License type verified
   - [ ] Commercial use allowed: YES/NO/CONDITIONAL
   - [ ] Revenue limits (if any)

4. **Model Specs**
   - [ ] Parameter count verified
   - [ ] Current version/release date
   - [ ] Output specs (resolution, duration, quality)

5. **Quality Claims**
   - [ ] Benchmark scores verified with source
   - [ ] Comparison claims verified

**Output**: Corrections + Mac compatibility assessment
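A first-pass probe for the Mac-compatibility checklist can be automated. The sketch below uses only the standard library plus an optional `torch` import; `torch.backends.mps.is_available()` is PyTorch's documented MPS check, but treat per-model MPS support as something each model card must still confirm.

```python
import platform

def mac_accel_report() -> dict:
    """Coarse hardware report feeding the Mac-compatibility checklist."""
    is_mac = platform.system() == "Darwin"
    report = {
        "is_mac": is_mac,
        "apple_silicon": is_mac and platform.machine() == "arm64",
        "mps_available": None,  # unknown until torch is importable
    }
    try:
        import torch  # optional dependency; absent on a bare interpreter
        report["mps_available"] = torch.backends.mps.is_available()
    except ImportError:
        pass
    return report
```

A `None` for `mps_available` maps to the checklist's UNKNOWN, keeping the no-guessing quality bar intact.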

Video On-Device (13 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 1 | HunyuanVideo 1.5 | tencent/HunyuanVideo-1.5 | GGUF options, VRAM, SSTA claims |
| 2 | Wan2.1/2.2 | Wan-AI/Wan2.1-T2V-14B, Wan-AI/Wan2.2-TI2V-5B | MoE architecture, Apache 2.0 |
| 3 | LTX-Video | Lightricks/LTX-Video | MPS support, speed claims |
| 4 | CogVideoX | THUDM/CogVideoX-5b, THUDM/CogVideoX-2b | Quantization, Mac support |
| 5 | Mochi 1 | genmo/mochi-1-preview | VRAM requirements, ComfyUI |
| 6 | Stable Video Diffusion | stabilityai/stable-video-diffusion-img2vid-xt | License, optimizations |
| 7 | Open-Sora 2.0 | hpcaitech/Open-Sora | VRAM, output specs |
| 8 | Open-Sora Plan | PKU-YuanGroup/Open-Sora-Plan | v1.5 capabilities |
| 9 | AnimateDiff | guoyww/AnimateDiff | VRAM by config, SDXL support |
| 10 | SkyReels V1 | SkyworkAI/SkyReels-V1 | Human-centric features, VBench |
| 11 | Pyramid Flow | rain1011/pyramid-flow-sd3 | MIT license, Mac support |
| 12 | Kandinsky 5.0 | kandinskylab/Kandinsky-5.0-T2V-Lite | 10s video, attention engines |
| 13 | Step-Video | stepfun-ai/Step-Video-T2V | 30B params, multi-GPU |

Image On-Device (18 agents)

| # | Model | HuggingFace | Focus |
|---|---|---|---|
| 14 | SD 1.5 | runwayml/stable-diffusion-v1-5 | License, ecosystem |
| 15 | SDXL | stabilityai/stable-diffusion-xl-base-1.0 | License terms, refiner |
| 16 | SDXL Turbo | stabilityai/sdxl-turbo | Steps, resolution limits |
| 17 | SDXL Lightning | ByteDance | 2-8 step quality |
| 18 | SD 3.5 Medium | stabilityai/stable-diffusion-3.5-medium | License (<$1M), VRAM |
| 19 | SD 3.5 Large | stabilityai/stable-diffusion-3.5-large | Quantization options |
| 20 | FLUX.1 Schnell | black-forest-labs/FLUX.1-schnell | Apache 2.0, NF4 options |
| 21 | FLUX.1 Dev | black-forest-labs/FLUX.1-dev | Non-commercial terms |
| 22 | FLUX.2 Dev | black-forest-labs/FLUX.2-dev | 32B params, consumer viability |
| 23 | Stable Cascade | stabilityai/stable-cascade | 3-stage architecture |
| 24 | PixArt-Sigma | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | DiT architecture, 4K |
| 25 | HiDream-I1 | HiDream.ai | 17B params, GGUF variants |
| 26 | Z-Image Turbo | Tongyi-MAI/Z-Image-Turbo | #1 leaderboard, bilingual |
| 27 | Kolors | Kwai-Kolors/Kolors | Commercial registration |
| 28 | Playground v2.5 | playgroundai/playground-v2.5-1024px-aesthetic | Open vs v3 closed |
| 29 | HunyuanDiT | Tencent | OpenVINO, Chinese |
| 30 | DeepFloyd IF | DeepFloyd/IF-I-XL-v1.0 | Text rendering, VRAM |
| 31 | Kandinsky 5.0 Lite | kandinskylab/kandinsky-5.0-image-lite | Multi-modal family |

Audio TTS On-Device (17 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 32 | Chatterbox | ResembleAI/chatterbox | MIT, emotion control, 63.8% pref |
| 33 | Fish Speech/OpenAudio S1 | fishaudio/fish-speech | CC-BY-NC, #1 TTS-Arena |
| 34 | CosyVoice2 | FunAudioLLM/CosyVoice2-0.5B | Apache 2.0, streaming |
| 35 | Kokoro-82M | hexgrad/Kokoro-82M | Apache 2.0, 82M params |
| 36 | F5-TTS | SWivid/F5-TTS | CC-BY-NC weights |
| 37 | IndexTTS-2 | index-tts/index-tts | Duration control |
| 38 | XTTS v2 | coqui/XTTS-v2 | Coqui license, 17 langs |
| 39 | StyleTTS2 | yl4579/StyleTTS2 | MIT, human-level |
| 40 | GPT-SoVITS | RVC-Boss/GPT-SoVITS | MIT, singing support |
| 41 | Bark | suno/bark | MIT, sound effects |
| 42 | OpenVoice v2 | myshell-ai/OpenVoiceV2 | MIT, lightweight |
| 43 | Piper | rhasspy/piper | MIT, CPU-only |
| 44 | Tortoise TTS | neonbjb/tortoise-tts | Apache 2.0, slow |
| 45 | WhisperSpeech | WhisperSpeech/WhisperSpeech | Apache 2.0/MIT |
| 46 | MaskGCT | Amphion | ICLR 2025, 6 langs |
| 47 | OuteTTS | edwko/OuteTTS | MIT, llama.cpp |
| 48 | Spark-TTS | SparkAudio/Spark-TTS-0.5B | CC-BY-NC-SA |

Audio Music On-Device (7 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 49 | ACE-Step | ACE-Step/ACE-Step-v1-3.5B | Apache 2.0, 4min songs |
| 50 | YuE | multimodal-art-projection/YuE | Apache 2.0, 5min |
| 51 | DiffRhythm | ASLP-lab/DiffRhythm | Apache 2.0, 4m45s |
| 52 | MusicGen | facebook/musicgen-large | CC-BY-NC, variants |
| 53 | Stable Audio Open | stabilityai/stable-audio-open-1.0 | <$1M license |
| 54 | Riffusion | riffusion/riffusion-model-v1 | MIT, spectrograms |
| 55 | Magenta RT | Google | Open weights, real-time |

Execution Plan

Session: Full Library Re-Validation (75 Agents Total)

Phase 1: Cloud Models (20 Opus agents, parallel)

  • 7 video model agents
  • 7 image model agents
  • 6 audio model agents (4 TTS, 2 music)
  • Each validates against official vendor docs
  • Returns: corrections, updated content, evidence links
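Fanning the per-document agents out in parallel and collecting their results is a standard fan-out/fan-in job. A minimal standard-library sketch follows; `run_validation_agent` is a hypothetical placeholder for whatever actually dispatches an agent.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_validation_agent(doc_path: str) -> dict:
    # Placeholder: in practice this would dispatch an agent against the
    # official docs for the model covered by doc_path and return its findings.
    return {"doc": doc_path, "status": "validated"}

def validate_all(doc_paths: list[str], max_workers: int = 6) -> list[dict]:
    """Run one validation agent per reference doc, bounded by max_workers."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_validation_agent, p): p for p in doc_paths}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

Bounding `max_workers` keeps the batch within whatever rate limits the vendor doc sites impose.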

Phase 2: On-Device Models (55 agents, parallel batches)

  • 13 video model agents
  • 18 image model agents
  • 17 TTS model agents
  • 7 music model agents
  • Each validates against HuggingFace + GitHub
  • CRITICAL: Mac compatibility verification for each model

Phase 3: Merge & Update

  • Merge all corrections into reference docs
  • Update 3 on-device compilation docs with per-model corrections
  • Cross-check against MODEL-AUDIT.md

Phase 4: Synthesis Update

  • Update PROMPT-VOCABULARY.md with verified terminology
  • Update comparison docs with verified capabilities
  • Update COST-OPTIMIZATION.md with verified pricing

Phase 5: Finalize

  • Mark all documents as validated
  • Update CONTINUITY.md
  • Layer 2 truly complete

Agent Summary

| Category | Cloud Agents | On-Device Agents | Total |
|---|---|---|---|
| Video | 7 | 13 | 20 |
| Image | 7 | 18 | 25 |
| Audio (TTS) | 4 | 17 | 21 |
| Audio (Music) | 2 | 7 | 9 |
| **Total** | **20** | **55** | **75** |

Success Criteria

  • All 50 cloud model reference docs validated against official sources
  • All 55 on-device models validated against HuggingFace/GitHub
  • Mac compatibility verified for every on-device model
  • Every prompting guide verified against vendor recommendations
  • Every capability claim has evidence link
  • MODEL-AUDIT.md corrections applied to reference docs
  • Synthesis docs updated to reflect corrected information
  • CONTINUITY.md updated with completion status

Files Structure

references/
├── README.md              # Library index (needs status update)
├── GLOSSARY.md            # Terms and conventions
├── GAPS.md                # Known gaps
├── VALIDATION-REPORT.md   # Accuracy verification (needs update)
├── video/
│   ├── README.md
│   ├── veo-3.md           # NEEDS REVIEW
│   ├── sora-2.md          # NEEDS REVIEW
│   ├── runway-gen4.5.md   # NEEDS REVIEW
│   ├── kling-2.1.md       # NEEDS REVIEW
│   ├── luma-ray3.md       # NEEDS REVIEW
│   ├── hailuo-02.md       # NEEDS REVIEW
│   ├── midjourney-video.md # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
├── image/
│   ├── README.md
│   ├── nano-banana-pro.md # NEEDS REVIEW
│   ├── imagen-4.md        # NEEDS REVIEW
│   ├── flux-2.md          # NEEDS REVIEW
│   ├── gpt-image.md       # NEEDS REVIEW
│   ├── midjourney.md      # NEEDS REVIEW
│   ├── ideogram-3.md      # NEEDS REVIEW
│   ├── seedream-4.md      # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
└── audio/
    ├── README.md
    ├── elevenlabs.md      # NEEDS REVIEW
    ├── suno-v5.md         # NEEDS REVIEW
    ├── udio.md            # NEEDS REVIEW
    ├── openai-tts.md      # NEEDS REVIEW
    ├── fish-audio-openaudio-s1.md # NEEDS REVIEW
    ├── cartesia-sonic.md  # NEEDS REVIEW
    └── on-device-models.md # NEEDS REVIEW

planning/synthesis/
├── MODEL-AUDIT.md         # COMPLETE (model IDs verified)
├── VIDEO-COMPARISON.md    # NEEDS UPDATE after validation
├── IMAGE-COMPARISON.md    # NEEDS UPDATE after validation
├── AUDIO-COMPARISON.md    # NEEDS UPDATE after validation
├── PROMPT-VOCABULARY.md   # NEEDS UPDATE after validation
├── COST-OPTIMIZATION.md   # NEEDS UPDATE after validation
├── SCHEMA-RECOMMENDATIONS.md # NEEDS UPDATE after validation
└── INTEGRATION-PATTERNS.md # NEEDS UPDATE after validation

This research plan was updated 2025-12-27 to require full library re-validation before Layer 2 can be considered complete.
