@numman-ali
Created December 28, 2025 23:08

Generative AI Model Research Plan

Priority: TOP — This informs schema design, skill prompts, and render pipeline.

Last Updated: 2025-12-28

Status: ✅ REFERENCES VALIDATED (cloud + on-device) — synthesis still pending


✅ Reference Library Re-Validation (Status)

Problem Identified (resolved for references/): Early drafts were created from web searches and aggregator sources and required systematic validation against official vendor documentation.

What exists:

  • 50 cloud model reference docs
  • 3 on-device model compilation docs (55+ models)
  • Synthesis documents in planning/synthesis/* (not yet revalidated)
  • Model ID audit (complete for models already documented; new models may appear over time)

What is now done (2025-12-28):

  • All existing references/** docs have been validated against official vendor documentation (for cloud) or primary upstream sources (HF/GitHub) for on-device.
  • Gaps (models that should exist in the library but do not yet have dedicated docs) are tracked in:
    • references/GAPS.md
    • references/MODEL-INVENTORY.md

Remaining risk: Any earlier inaccuracies may still exist in planning/synthesis/* until those documents are revalidated against the now-canonical reference docs.


Re-Validation Plan

Phase 1: Cloud Models (existing docs) — DONE (2025-12-28)

Each document required an agent to:

  1. Fetch official vendor documentation
  2. Compare EVERY claim in the reference doc
  3. Verify prompting vocabulary matches official guidance
  4. Verify capabilities (resolution, duration, formats)
  5. Verify pricing (cross-check with MODEL-AUDIT.md)
  6. Verify API parameters and endpoints
  7. Update the reference doc with corrections
  8. Note any new features/capabilities not captured

Phase 2: On-Device Models (3 compilation docs) — DONE (2025-12-28)

Similar validation against HuggingFace model cards and GitHub repos.

Phase 3: Synthesis Documents (7 documents) — TODO

After reference docs are validated, verify synthesis docs reflect corrected information.


Cloud Models in the Reference Library (as of 2025-12-28)

The canonical “what’s covered vs missing” list lives in:

  • references/MODEL-INVENTORY.md
  • references/GAPS.md

The tables below are a convenience snapshot for this plan doc.

Video Generation (15 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/video/veo-3.md` | Veo 3.1 | Google | cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| `references/video/sora-2.md` | Sora 2 | OpenAI | platform.openai.com/docs | COVERED |
| `references/video/runway-gen4.5.md` | Gen-4/4.5 | Runway | docs.dev.runwayml.com | COVERED |
| `references/video/kling-2.1.md` | Kling 2.1 | Kuaishou | klingai.com/global/dev | COVERED |
| `references/video/luma-ray3.md` | Ray2/Ray3 | Luma AI | docs.lumalabs.ai | COVERED |
| `references/video/hailuo-02.md` | Hailuo 02 | MiniMax | platform.minimaxi.com/docs/api-reference/video-generation-intro | COVERED |
| `references/video/midjourney-video.md` | Midjourney Video | Midjourney | docs.midjourney.com/docs/video | COVERED |
| `references/video/seedance-1.5-pro.md` | Seedance 1.5 Pro / 1.0 family | ByteDance (Volcengine Ark) | volcengine.com/docs/82379 | COVERED |
| `references/video/pika-2.md` | Pika 2.2 (via fal.ai) | Pika | fal.ai/models | COVERED |
| `references/video/pixverse.md` | PixVerse (v5.5) | PixVerse | docs.platform.pixverse.ai | COVERED |
| `references/video/haiper-2.x.md` | Haiper Video 2.x | Haiper | docs.haiper.ai/api-reference | COVERED |
| `references/video/vidu.md` | Vidu (viduq1 / 2.0 / 1.5) | Vidu | docs.platform.vidu.com | COVERED |
| `references/video/firefly-video.md` | Firefly Video (Generate Video API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| `references/video/nova-reel.md` | Nova Reel | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| `references/video/alibaba-wan.md` | Wan (Wan2.x / Wanx2.1 + VACE editing) | Alibaba Cloud (Model Studio / DashScope) | alibabacloud.com/help | COVERED |

Image Generation (17 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/image/nano-banana-pro.md` | Nano Banana / Nano Banana Pro | Google | ai.google.dev/gemini-api/docs/image-generation | COVERED |
| `references/image/imagen-4.md` | Imagen 4 | Google | ai.google.dev/gemini-api/docs/imagen | COVERED |
| `references/image/flux-2.md` | FLUX.2 | Black Forest Labs | docs.bfl.ai | COVERED |
| `references/image/flux-kontext.md` | FLUX.1 Kontext | Black Forest Labs | docs.bfl.ai/kontext | COVERED |
| `references/image/gpt-image.md` | GPT Image 1.5 | OpenAI | platform.openai.com/docs/guides/image-generation | COVERED |
| `references/image/midjourney.md` | Midjourney V7 | Midjourney | docs.midjourney.com | COVERED |
| `references/image/ideogram-3.md` | Ideogram 3.0 | Ideogram | developer.ideogram.ai | COVERED |
| `references/image/seedream-4.md` | Seedream 4.5 | ByteDance | docs.byteplus.com | COVERED |
| `references/image/firefly-image.md` | Firefly Image (API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| `references/image/stability-image.md` | Stable Image + SD 3.5 (API) | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| `references/image/nova-canvas.md` | Nova Canvas | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| `references/image/minimax-image.md` | MiniMax Image Generation (image-01, image-01-live) | MiniMax | platform.minimaxi.com/docs/api-reference/image-generation-intro | COVERED |
| `references/image/recraft.md` | Recraft (Recraft API) | Recraft | recraft.ai/docs/api-reference | COVERED |
| `references/image/leonardo.md` | Leonardo (Image API) | Leonardo AI | docs.leonardo.ai/reference | COVERED |
| `references/image/reve-image.md` | Reve Image API (Create/Edit/Remix) | Reve | api.reve.com | COVERED |
| `references/image/krea.md` | Krea (Image/Video API) | Krea | docs.krea.ai/api-reference | COVERED |
| `references/image/freepik-mystic.md` | Freepik Mystic | Freepik | docs.freepik.com/api-reference | COVERED |

Audio Generation (18 documents)

| Document | Model | Provider | Primary Source | Status |
|---|---|---|---|---|
| `references/audio/elevenlabs.md` | ElevenLabs TTS | ElevenLabs | elevenlabs.io/docs | COVERED |
| `references/audio/eleven-music.md` | Eleven Music | ElevenLabs | elevenlabs.io/docs | COVERED |
| `references/audio/minimax-music.md` | MiniMax Music 2.0 (music-2.0) | MiniMax | platform.minimaxi.com/docs/api-reference/music-intro | COVERED |
| `references/audio/suno-v5.md` | Suno v5 | Suno | help.suno.com | COVERED |
| `references/audio/udio.md` | Udio v1.5 | Udio | help.udio.com | COVERED |
| `references/audio/openai-tts.md` | OpenAI TTS | OpenAI | platform.openai.com/docs/guides/text-to-speech | COVERED |
| `references/audio/fish-audio-openaudio-s1.md` | OpenAudio S1 | Fish Audio | docs.fish.audio | COVERED |
| `references/audio/cartesia-sonic.md` | Sonic 3 | Cartesia | docs.cartesia.ai | COVERED |
| `references/audio/playht.md` | PlayHT | PlayHT | docs.play.ht | COVERED |
| `references/audio/gemini-tts.md` | Gemini Preview TTS | Google (Gemini API) | ai.google.dev/gemini-api/docs/speech-generation | COVERED |
| `references/audio/minimax-speech.md` | MiniMax Speech (T2A + Async + Voice Design/Cloning) | MiniMax | platform.minimaxi.com/docs/api-reference/speech-t2a-intro | COVERED |
| `references/audio/google-cloud-tts.md` | Google Cloud TTS | Google Cloud | cloud.google.com/text-to-speech | COVERED |
| `references/audio/azure-tts.md` | Azure TTS | Microsoft | learn.microsoft.com/azure/ai-services/speech-service | COVERED |
| `references/audio/amazon-polly.md` | Amazon Polly | AWS | docs.aws.amazon.com/polly | COVERED |
| `references/audio/respeecher.md` | Respeecher | Respeecher | docs.respeecher.com | COVERED |
| `references/audio/stable-audio.md` | Stable Audio 2 / 2.5 | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| `references/audio/lyria-2.md` | Lyria 2 | Google | docs.cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| `references/audio/lyria-realtime.md` | Lyria RealTime | Google (Gemini API) | ai.google.dev/gemini-api/docs/music-generation | COVERED |

On-Device Models to Validate

| Document | Models | Primary Sources | Status |
|---|---|---|---|
| `references/video/on-device-models.md` | compilation doc | HuggingFace model cards, GitHub | COVERED |
| `references/image/on-device-models.md` | compilation doc | HuggingFace model cards, GitHub | COVERED |
| `references/audio/on-device-models.md` | compilation doc | HuggingFace model cards | COVERED |

Agent Prompts for Validation

Template: Cloud Model Validation Agent

**Task**: Validate the reference document for [MODEL] against official [VENDOR] documentation.

**Reference Document**: `references/[category]/[file].md`
**Primary Source**: [OFFICIAL_DOCS_URL]
**Secondary Sources**: [AGGREGATOR_URLS]

**Validation Checklist**:

1. **Model Identity**
   - [ ] Correct model name/version
   - [ ] Correct API model_id (cross-check MODEL-AUDIT.md)
   - [ ] Correct provider attribution

2. **Capabilities**
   - [ ] Resolution limits verified
   - [ ] Duration limits verified
   - [ ] Supported formats verified
   - [ ] Feature claims verified (audio support, text rendering, etc.)

3. **Pricing**
   - [ ] Current pricing verified
   - [ ] Pricing tiers/variants verified
   - [ ] Credit system (if applicable) verified

4. **API Documentation**
   - [ ] Endpoint format verified
   - [ ] Authentication method verified
   - [ ] Required parameters verified
   - [ ] Optional parameters verified
   - [ ] Response format verified

5. **Prompting Guide**
   - [ ] Camera movement vocabulary verified (video)
   - [ ] Style/aesthetic terminology verified (image)
   - [ ] Voice/emotion controls verified (audio)
   - [ ] Best practices match official guidance
   - [ ] Example prompts verified

6. **Limitations**
   - [ ] Known limitations documented
   - [ ] Rate limits documented
   - [ ] Content restrictions documented

**Output**:
- List of CONFIRMED items (with evidence links)
- List of CORRECTIONS needed (with correct information and evidence)
- List of ADDITIONS (new features/capabilities not in current doc)
- Updated reference document content

**Quality Bar**:
- Every claim must have evidence from official source
- No "seems" or "probably" - use UNKNOWN if unverifiable
- Preserve document structure, only update content
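The template's output contract and quality bar can be captured in a small record type. This is an illustrative sketch, not part of any existing tooling: the `Verdict` and `ClaimCheck` names are hypothetical, but the rule they enforce is the one above (every non-UNKNOWN finding needs an evidence link).

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    CONFIRMED = "confirmed"
    CORRECTION = "correction"
    ADDITION = "addition"
    UNKNOWN = "unknown"  # per the quality bar: no "seems"/"probably"

@dataclass
class ClaimCheck:
    claim: str                # the statement as written in the reference doc
    verdict: Verdict
    evidence_url: str = ""    # required unless verdict is UNKNOWN
    corrected_text: str = ""  # filled in for CORRECTION/ADDITION

    def __post_init__(self):
        # UNKNOWN is the only verdict allowed to lack an evidence link
        if self.verdict is not Verdict.UNKNOWN and not self.evidence_url:
            raise ValueError(f"{self.verdict.value} requires an evidence link")
```

An agent's output then reduces to a list of `ClaimCheck` records, which can be filtered by verdict to produce the CONFIRMED/CORRECTIONS/ADDITIONS lists mechanically.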

Specific Agent Prompts

Video: Veo 3.1

Validate `references/video/veo-3.md` against:
- https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/veo/3-1-generate
- https://ai.google.dev/gemini-api/docs

Focus areas:
- Timestamp prompting format (is [00:00-00:03] correct?)
- Audio generation capabilities
- Camera movement vocabulary (what terms does Google recommend?)
- Resolution/duration limits
- Pricing per second

Video: Sora 2

Validate `references/video/sora-2.md` against:
- https://platform.openai.com/docs/models/sora-2
- https://platform.openai.com/docs/models/sora-2-pro
- https://platform.openai.com/docs/api-reference/videos

Focus areas:
- Multi-scene capabilities
- Duration limits (sora-2 vs sora-2-pro)
- Resolution options
- Prompt structure recommendations
- Credit/pricing system

Video: Runway Gen-4

Validate `references/video/runway-gen4.5.md` against:
- https://docs.dev.runwayml.com/guides/models/
- https://docs.dev.runwayml.com/guides/pricing/

Focus areas:
- Gen-4 vs Gen-4.5 availability (Gen-4.5 API not yet available per audit)
- Motion Brush documentation
- Camera control parameters
- Credit system

Video: Kling 2.1

Validate `references/video/kling-2.1.md` against:
- https://klingai.com/global/dev
- https://app.klingai.com/global/dev/document-api

Focus areas:
- Model tiers (standard/pro/master)
- Lip-sync capabilities
- Camera movement vocabulary
- Duration limits per tier
- Pricing structure

Video: Luma Ray

Validate `references/video/luma-ray3.md` against:
- https://docs.lumalabs.ai/docs/api
- https://lumalabs.ai/learning-hub

Focus areas:
- Ray2 vs Ray3 availability (Ray3 API not yet available per audit)
- HDR capabilities
- Draft mode documentation
- Credit system

Video: Hailuo 02

Validate `references/video/hailuo-02.md` against:
- https://platform.minimaxi.com/docs/api-reference/video-generation-intro

Focus areas:
- Model variants (02 vs 2.3 vs 2.3-Fast)
- Resolution/duration options
- Pricing per resolution tier

Image: Nano Banana / Nano Banana Pro (Gemini native image generation)

Validate `references/image/nano-banana-pro.md` against:
- https://ai.google.dev/gemini-api/docs/nanobanana
- https://ai.google.dev/gemini-api/docs/image-generation
- https://ai.google.dev/gemini-api/docs/pricing

Focus areas:
- Correct model IDs (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`)
- Token/pricing tables and image-size token costs
- 4K output + “Thinking” + thought signatures behavior (Pro)
- Prompting vocabulary + official prompt templates

Image: Imagen 4

Validate `references/image/imagen-4.md` against:
- https://ai.google.dev/gemini-api/docs/imagen
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/imagen/4-0-generate
- https://cloud.google.com/vertex-ai/generative-ai/pricing

Focus areas:
- Model variants (fast/standard/ultra) and IDs
- Pricing (Gemini API vs Vertex AI pricing surfaces)
- Aspect ratio + output size constraints
- Prompting guidance (official)

Image: FLUX.2

Validate `references/image/flux-2.md` against:
- https://docs.bfl.ai/quick_start/generating_images
- https://docs.bfl.ai/flux_2/flux2_overview
- https://bfl.ai/pricing

Focus areas:
- All FLUX.2 variants (pro/max/flex/dev)
- Endpoint-based API (not model_id based)
- Text rendering capabilities
- Pricing per megapixel
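Because BFL addresses models by endpoint rather than a `model_id` field, the model choice lives in the URL, not the JSON body. The sketch below illustrates that shape only: the `api.bfl.ai` base URL, the `x-key` header, and the `flux-2-pro` endpoint name are assumptions to be confirmed against docs.bfl.ai during validation.

```python
import json
import urllib.request

BASE = "https://api.bfl.ai/v1"  # assumed base URL; verify against docs.bfl.ai

def build_request(endpoint: str, prompt: str, api_key: str, **params) -> urllib.request.Request:
    """Build a submission request for an endpoint-addressed model.

    Each model is its own path segment (e.g. a hypothetical 'flux-2-pro'),
    so switching models means switching URLs, not body fields.
    """
    body = {"prompt": prompt, **params}
    return urllib.request.Request(
        f"{BASE}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# req = build_request("flux-2-pro", "a lighthouse at dusk", "KEY", width=1024)
# urllib.request.urlopen(req) would submit the job; results are polled separately.
```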

Image: GPT Image 1.5

Validate `references/image/gpt-image.md` against:
- https://platform.openai.com/docs/models/gpt-image-1.5
- https://platform.openai.com/docs/guides/image-generation

Focus areas:
- Model versions (1.5 vs 1 vs 1-mini)
- Token-based pricing
- Quality tiers
- Text rendering accuracy

Image: Midjourney V7

Validate `references/image/midjourney.md` against:
- https://docs.midjourney.com

Focus areas:
- V7 capabilities
- API availability (still no public API?)
- Parameter syntax (--ar, --stylize, etc.)
- Style reference system

Image: Ideogram 3.0

Validate `references/image/ideogram-3.md` against:
- https://developer.ideogram.ai/api-reference/api-reference/generate-v3
- https://ideogram.ai/features/3.0

Focus areas:
- Version 3.0 features
- Text rendering accuracy claims
- Style Codes feature
- API endpoint format

Image: Seedream 4.5

Validate `references/image/seedream-4.md` against:
- https://docs.byteplus.com/en/docs/ModelArk
- https://seed.bytedance.com/en/seedream4_5

Focus areas:
- API availability (via BytePlus ModelArk)
- Multi-reference fusion capabilities
- Speed benchmarks
- Pricing

Audio: ElevenLabs

Validate `references/audio/elevenlabs.md` against:
- https://elevenlabs.io/docs/overview/models
- https://elevenlabs.io/docs/api-reference

Focus areas:
- Model IDs (eleven_v3, eleven_multilingual_v2, etc. - use underscores!)
- Voice cloning requirements
- Stability/similarity controls
- Pricing per character
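The underscore convention can be enforced mechanically before any request goes out. This sketch builds a request body in the shape the ElevenLabs text-to-speech endpoint documents (`model_id`, `voice_settings` with `stability`/`similarity_boost`); the guard logic and the partial ID list are illustrative, not exhaustive.

```python
# Partial list from this doc; the full set lives in elevenlabs.io/docs/overview/models
KNOWN_MODEL_IDS = {"eleven_v3", "eleven_multilingual_v2"}

def tts_payload(text: str, model_id: str, stability: float = 0.5,
                similarity_boost: float = 0.75) -> dict:
    """Body for POST /v1/text-to-speech/{voice_id}.

    ElevenLabs model IDs use underscores (eleven_multilingual_v2),
    never hyphens -- a common transcription error in aggregator docs.
    """
    if "-" in model_id:
        raise ValueError(f"ElevenLabs model IDs use underscores, got {model_id!r}")
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
```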

Audio: Suno v5

Validate `references/audio/suno-v5.md` against:
- https://help.suno.com
- https://suno.com

Focus areas:
- v5 capabilities vs v4
- NO official API (only third-party wrappers)
- Song duration limits
- Lyric formatting

Audio: Udio v1.5

Validate `references/audio/udio.md` against:
- https://help.udio.com
- https://www.udio.com/blog

Focus areas:
- v1.5 and v1.5 Allegro differences
- NO official API (Udio explicitly states this)
- Stem separation features
- Key control

Audio: OpenAI TTS

Validate `references/audio/openai-tts.md` against:
- https://platform.openai.com/docs/guides/text-to-speech
- https://platform.openai.com/docs/api-reference/audio

Focus areas:
- Model IDs (tts-1, tts-1-hd, gpt-4o-mini-tts)
- Voice options
- Instructions support (gpt-4o-mini-tts only)
- Pricing structure

Audio: Fish Audio S1

Validate `references/audio/fish-audio-openaudio-s1.md` against:
- https://docs.fish.audio/api-reference/endpoint/openapi-v1/text-to-speech
- https://docs.fish.audio/developer-guide/models-pricing

Focus areas:
- Model ID is just "s1" in API
- Pricing per UTF-8 bytes
- Emotion control capabilities
- Voice cloning
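Billing per UTF-8 byte means multi-byte scripts cost more than their character count suggests (CJK characters are typically 3 bytes each). A cost estimator is a one-liner; the rate parameter below is a placeholder, not Fish Audio's actual price.

```python
def billable_bytes(text: str) -> int:
    """Fish Audio bills TTS by UTF-8 byte count, not character count."""
    return len(text.encode("utf-8"))

def estimate_cost(text: str, usd_per_million_bytes: float) -> float:
    # usd_per_million_bytes is a placeholder rate; take the real value from
    # docs.fish.audio/developer-guide/models-pricing
    return billable_bytes(text) / 1_000_000 * usd_per_million_bytes

# billable_bytes("hello") == 5, but billable_bytes("你好") == 6 (2 chars, 3 bytes each)
```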

Audio: Cartesia Sonic

Validate `references/audio/cartesia-sonic.md` against:
- https://docs.cartesia.ai/build-with-cartesia/tts-models
- https://cartesia.ai/pricing

Focus areas:
- Sonic-3 vs Sonic-2 vs Sonic-turbo
- Date-stamped version snapshots
- State Space Models claims
- Latency benchmarks

On-Device Model Validation (55 Agents - 1 Per Model)

On-Device Agent Template

**Task**: Validate on-device model [MODEL] against HuggingFace/GitHub.

**Sources**:
- HuggingFace model card: [HF_URL]
- GitHub repo: [GITHUB_URL]

**MANDATORY Validation Checklist**:

1. **Hardware Requirements**
   - [ ] Minimum VRAM verified
   - [ ] Recommended VRAM verified
   - [ ] RAM requirements verified

2. **Mac Compatibility** (CRITICAL - user uses MacBook)
   - [ ] MPS (Metal) support: YES/NO/PARTIAL
   - [ ] Apple Silicon (M1/M2/M3/M4) tested: YES/NO/UNKNOWN
   - [ ] Mac-specific installation steps documented
   - [ ] Mac performance benchmarks if available
   - [ ] Known Mac limitations or issues

3. **License**
   - [ ] License type verified
   - [ ] Commercial use allowed: YES/NO/CONDITIONAL
   - [ ] Revenue limits (if any)

4. **Model Specs**
   - [ ] Parameter count verified
   - [ ] Current version/release date
   - [ ] Output specs (resolution, duration, quality)

5. **Quality Claims**
   - [ ] Benchmark scores verified with source
   - [ ] Comparison claims verified

**Output**: Corrections + Mac compatibility assessment
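A first-pass probe for the Mac-compatibility checklist can be automated. The sketch below uses only the standard library plus an optional `torch` import; `torch.backends.mps.is_available()` is PyTorch's documented MPS check, but treat per-model MPS support as something each model card must still confirm.

```python
import platform

def mac_accel_report() -> dict:
    """Coarse hardware report feeding the Mac-compatibility checklist."""
    is_mac = platform.system() == "Darwin"
    report = {
        "is_mac": is_mac,
        "apple_silicon": is_mac and platform.machine() == "arm64",
        "mps_available": None,  # unknown until torch is importable
    }
    try:
        import torch  # optional dependency; absent on a bare interpreter
        report["mps_available"] = torch.backends.mps.is_available()
    except ImportError:
        pass
    return report
```

A `None` for `mps_available` maps to the checklist's UNKNOWN, keeping the no-guessing quality bar intact.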

Video On-Device (13 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 1 | HunyuanVideo 1.5 | tencent/HunyuanVideo-1.5 | GGUF options, VRAM, SSTA claims |
| 2 | Wan2.1/2.2 | Wan-AI/Wan2.1-T2V-14B, Wan-AI/Wan2.2-TI2V-5B | MoE architecture, Apache 2.0 |
| 3 | LTX-Video | Lightricks/LTX-Video | MPS support, speed claims |
| 4 | CogVideoX | THUDM/CogVideoX-5b, THUDM/CogVideoX-2b | Quantization, Mac support |
| 5 | Mochi 1 | genmo/mochi-1-preview | VRAM requirements, ComfyUI |
| 6 | Stable Video Diffusion | stabilityai/stable-video-diffusion-img2vid-xt | License, optimizations |
| 7 | Open-Sora 2.0 | hpcaitech/Open-Sora | VRAM, output specs |
| 8 | Open-Sora Plan | PKU-YuanGroup/Open-Sora-Plan | v1.5 capabilities |
| 9 | AnimateDiff | guoyww/AnimateDiff | VRAM by config, SDXL support |
| 10 | SkyReels V1 | SkyworkAI/SkyReels-V1 | Human-centric features, VBench |
| 11 | Pyramid Flow | rain1011/pyramid-flow-sd3 | MIT license, Mac support |
| 12 | Kandinsky 5.0 | kandinskylab/Kandinsky-5.0-T2V-Lite | 10s video, attention engines |
| 13 | Step-Video | stepfun-ai/Step-Video-T2V | 30B params, multi-GPU |

Image On-Device (18 agents)

| # | Model | HuggingFace | Focus |
|---|---|---|---|
| 14 | SD 1.5 | runwayml/stable-diffusion-v1-5 | License, ecosystem |
| 15 | SDXL | stabilityai/stable-diffusion-xl-base-1.0 | License terms, refiner |
| 16 | SDXL Turbo | stabilityai/sdxl-turbo | Steps, resolution limits |
| 17 | SDXL Lightning | ByteDance | 2-8 step quality |
| 18 | SD 3.5 Medium | stabilityai/stable-diffusion-3.5-medium | License (<$1M), VRAM |
| 19 | SD 3.5 Large | stabilityai/stable-diffusion-3.5-large | Quantization options |
| 20 | FLUX.1 Schnell | black-forest-labs/FLUX.1-schnell | Apache 2.0, NF4 options |
| 21 | FLUX.1 Dev | black-forest-labs/FLUX.1-dev | Non-commercial terms |
| 22 | FLUX.2 Dev | black-forest-labs/FLUX.2-dev | 32B params, consumer viability |
| 23 | Stable Cascade | stabilityai/stable-cascade | 3-stage architecture |
| 24 | PixArt-Sigma | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | DiT architecture, 4K |
| 25 | HiDream-I1 | HiDream.ai | 17B params, GGUF variants |
| 26 | Z-Image Turbo | Tongyi-MAI/Z-Image-Turbo | #1 leaderboard, bilingual |
| 27 | Kolors | Kwai-Kolors/Kolors | Commercial registration |
| 28 | Playground v2.5 | playgroundai/playground-v2.5-1024px-aesthetic | Open vs v3 closed |
| 29 | HunyuanDiT | Tencent | OpenVINO, Chinese |
| 30 | DeepFloyd IF | DeepFloyd/IF-I-XL-v1.0 | Text rendering, VRAM |
| 31 | Kandinsky 5.0 Lite | kandinskylab/kandinsky-5.0-image-lite | Multi-modal family |

Audio TTS On-Device (17 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 32 | Chatterbox | ResembleAI/chatterbox | MIT, emotion control, 63.8% pref |
| 33 | Fish Speech/OpenAudio S1 | fishaudio/fish-speech | CC-BY-NC, #1 TTS-Arena |
| 34 | CosyVoice2 | FunAudioLLM/CosyVoice2-0.5B | Apache 2.0, streaming |
| 35 | Kokoro-82M | hexgrad/Kokoro-82M | Apache 2.0, 82M params |
| 36 | F5-TTS | SWivid/F5-TTS | CC-BY-NC weights |
| 37 | IndexTTS-2 | index-tts/index-tts | Duration control |
| 38 | XTTS v2 | coqui/XTTS-v2 | Coqui license, 17 langs |
| 39 | StyleTTS2 | yl4579/StyleTTS2 | MIT, human-level |
| 40 | GPT-SoVITS | RVC-Boss/GPT-SoVITS | MIT, singing support |
| 41 | Bark | suno/bark | MIT, sound effects |
| 42 | OpenVoice v2 | myshell-ai/OpenVoiceV2 | MIT, lightweight |
| 43 | Piper | rhasspy/piper | MIT, CPU-only |
| 44 | Tortoise TTS | neonbjb/tortoise-tts | Apache 2.0, slow |
| 45 | WhisperSpeech | WhisperSpeech/WhisperSpeech | Apache 2.0/MIT |
| 46 | MaskGCT | Amphion | ICLR 2025, 6 langs |
| 47 | OuteTTS | edwko/OuteTTS | MIT, llama.cpp |
| 48 | Spark-TTS | SparkAudio/Spark-TTS-0.5B | CC-BY-NC-SA |

Audio Music On-Device (7 agents)

| # | Model | HuggingFace/GitHub | Focus |
|---|---|---|---|
| 49 | ACE-Step | ACE-Step/ACE-Step-v1-3.5B | Apache 2.0, 4min songs |
| 50 | YuE | multimodal-art-projection/YuE | Apache 2.0, 5min |
| 51 | DiffRhythm | ASLP-lab/DiffRhythm | Apache 2.0, 4m45s |
| 52 | MusicGen | facebook/musicgen-large | CC-BY-NC, variants |
| 53 | Stable Audio Open | stabilityai/stable-audio-open-1.0 | <$1M license |
| 54 | Riffusion | riffusion/riffusion-model-v1 | MIT, spectrograms |
| 55 | Magenta RT | Google | Open weights, real-time |

Execution Plan

Session: Full Library Re-Validation (75 Agents Total)

Phase 1: Cloud Models (20 Opus agents, parallel)

  • 7 video model agents
  • 7 image model agents
  • 6 audio model agents (4 TTS, 2 music)
  • Each validates against official vendor docs
  • Returns: corrections, updated content, evidence links
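Fanning the per-document agents out in parallel and collecting their results is a standard fan-out/fan-in job. A minimal standard-library sketch follows; `run_validation_agent` is a hypothetical placeholder for whatever actually dispatches an agent.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_validation_agent(doc_path: str) -> dict:
    # Placeholder: in practice this would dispatch an agent against the
    # official docs for the model covered by doc_path and return its findings.
    return {"doc": doc_path, "status": "validated"}

def validate_all(doc_paths: list[str], max_workers: int = 6) -> list[dict]:
    """Run one validation agent per reference doc, bounded by max_workers."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_validation_agent, p): p for p in doc_paths}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

Bounding `max_workers` keeps the batch within whatever rate limits the vendor doc sites impose.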

Phase 2: On-Device Models (55 agents, parallel batches)

  • 13 video model agents
  • 18 image model agents
  • 17 TTS model agents
  • 7 music model agents
  • Each validates against HuggingFace + GitHub
  • CRITICAL: Mac compatibility verification for each model

Phase 3: Merge & Update

  • Merge all corrections into reference docs
  • Update 3 on-device compilation docs with per-model corrections
  • Cross-check against MODEL-AUDIT.md

Phase 4: Synthesis Update

  • Update PROMPT-VOCABULARY.md with verified terminology
  • Update comparison docs with verified capabilities
  • Update COST-OPTIMIZATION.md with verified pricing

Phase 5: Finalize

  • Mark all documents as validated
  • Update CONTINUITY.md
  • Layer 2 truly complete

Agent Summary

| Category | Cloud Agents | On-Device Agents | Total |
|---|---|---|---|
| Video | 7 | 13 | 20 |
| Image | 7 | 18 | 25 |
| Audio (TTS) | 4 | 17 | 21 |
| Audio (Music) | 2 | 7 | 9 |
| **Total** | **20** | **55** | **75** |

Success Criteria

  • All 50 cloud model reference docs validated against official sources
  • All 55 on-device models validated against HuggingFace/GitHub
  • Mac compatibility verified for every on-device model
  • Every prompting guide verified against vendor recommendations
  • Every capability claim has evidence link
  • MODEL-AUDIT.md corrections applied to reference docs
  • Synthesis docs updated to reflect corrected information
  • CONTINUITY.md updated with completion status

Files Structure

references/
├── README.md              # Library index (needs status update)
├── GLOSSARY.md            # Terms and conventions
├── GAPS.md                # Known gaps
├── VALIDATION-REPORT.md   # Accuracy verification (needs update)
├── video/
│   ├── README.md
│   ├── veo-3.md           # NEEDS REVIEW
│   ├── sora-2.md          # NEEDS REVIEW
│   ├── runway-gen4.5.md   # NEEDS REVIEW
│   ├── kling-2.1.md       # NEEDS REVIEW
│   ├── luma-ray3.md       # NEEDS REVIEW
│   ├── hailuo-02.md       # NEEDS REVIEW
│   ├── midjourney-video.md # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
├── image/
│   ├── README.md
│   ├── nano-banana-pro.md # NEEDS REVIEW
│   ├── imagen-4.md        # NEEDS REVIEW
│   ├── flux-2.md          # NEEDS REVIEW
│   ├── gpt-image.md       # NEEDS REVIEW
│   ├── midjourney.md      # NEEDS REVIEW
│   ├── ideogram-3.md      # NEEDS REVIEW
│   ├── seedream-4.md      # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
└── audio/
    ├── README.md
    ├── elevenlabs.md      # NEEDS REVIEW
    ├── suno-v5.md         # NEEDS REVIEW
    ├── udio.md            # NEEDS REVIEW
    ├── openai-tts.md      # NEEDS REVIEW
    ├── fish-audio-openaudio-s1.md # NEEDS REVIEW
    ├── cartesia-sonic.md  # NEEDS REVIEW
    └── on-device-models.md # NEEDS REVIEW

planning/synthesis/
├── MODEL-AUDIT.md         # COMPLETE (model IDs verified)
├── VIDEO-COMPARISON.md    # NEEDS UPDATE after validation
├── IMAGE-COMPARISON.md    # NEEDS UPDATE after validation
├── AUDIO-COMPARISON.md    # NEEDS UPDATE after validation
├── PROMPT-VOCABULARY.md   # NEEDS UPDATE after validation
├── COST-OPTIMIZATION.md   # NEEDS UPDATE after validation
├── SCHEMA-RECOMMENDATIONS.md # NEEDS UPDATE after validation
└── INTEGRATION-PATTERNS.md # NEEDS UPDATE after validation

This research plan was updated 2025-12-27 to require full library re-validation before Layer 2 can be considered complete.
