Author: Albert Bentov
Date: 2026-02-11
Status: Design Proposal
This document proposes a two-tiered approach for intelligent model selection in contextual ad generation:
- Simple Approach (immediate): Rule-based routing for current production based on expected click value (eCPM) and domain reputation
- Advanced Approach (post-online learning): Learned routing integrated with quality predictor (Ĉ) and performance predictor (P̂)
Expected Impact:
- 50-70% cost reduction on low-value impressions
- Maintained quality on high-value impressions
- 2-5x faster response times for most requests
- Profitability threshold enforcement per impression
Contents:
- Problem Statement
- API Cost Analysis (2026)
- Simple Approach: Rule-Based Routing
- Advanced Approach: Learned Routing
- Fast Wins: Input Token Optimization
- ROI Analysis & Break-Even Scenarios
- Implementation Roadmap
- Risk Mitigation
Our production system (ControlledAd.py) serves contextual ads with human-in-the-loop approval:
- Fetch article (title + body)
- Generate embeddings (256d)
- Find anchor ad via similarity search
- Exploration trigger: When predefined categories or approved candidates fail similarity threshold
- Brand safety check (LLM call on title + content)
- Generate candidate variants (LLM with mega-prompt: brand + styling + strategies + few-shot + safety instructions)
- Generate image
- Human approval → serve winning ad
Problem: We use expensive models uniformly regardless of:
- Expected click value (advertiser's willingness to pay)
- Domain quality/reputation (premium publishers vs low-traffic blogs)
- Content complexity (simple product ads vs nuanced brand campaigns)
Result: Unprofitable on low-eCPM impressions, over-engineered for simple contexts.
Objective: Dynamically select the model tier (LLM size, image generation quality) based on expected click value (eCPM), domain quality/reputation, and content complexity.
Constraint: Maintain quality standards while maximizing profit margin per impression.
| Model | Provider | Input Cost ($/1M tokens) | Output Cost ($/1M tokens) | Context | Speed | Use Case |
|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Fast | Premium tier |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 200K | Fast | Premium tier |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Very Fast | Balanced tier |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Very Fast | Budget tier |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | Ultra Fast | Ultra-budget |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | 32K | Very Fast | Budget alternative |
| Claude Haiku | Anthropic | $1.00 | $5.00 | 200K | Fast | Budget fallback |
Key Observations:
- Gemini 2.0 Flash is 20x cheaper than Gemini 3 Pro on input and ~18x cheaper than GPT-5.2 on input (35x on output)
- Groq inference is 35-40x cheaper than premium models with acceptable quality trade-offs
- Context caching available on Gemini (75% savings on repeated prompts, cache reads at 10% of the input price); see the cost sketch after this list
- GPT-5.2 generates internal "thinking" tokens billed as output ($14/1M)
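To make the caching observation above concrete, here is a rough arithmetic sketch of the input-cost impact of caching the ~1,350 static mega-prompt tokens (see the prompt breakdown later in this document) on Gemini 3 Flash, assuming cache reads at 10% of the input price and ignoring cache storage fees:

```python
# Rough input-cost sketch for Gemini context caching on the tagline call.
# Assumption: cache reads billed at 10% of the input price; storage/TTL fees ignored.
INPUT_PRICE_PER_M = 0.50      # Gemini 3 Flash input price, $/1M tokens (table above)
STATIC_TOKENS = 1_350         # brand + styling + strategies + few-shot + safety
ARTICLE_TOKENS = 2_500        # full article

uncached = (STATIC_TOKENS + ARTICLE_TOKENS) * INPUT_PRICE_PER_M / 1e6
cached = (STATIC_TOKENS * 0.10 + ARTICLE_TOKENS) * INPUT_PRICE_PER_M / 1e6
print(f"uncached input ${uncached:.4f} vs cached input ${cached:.4f}")
# -> roughly $0.0019 vs $0.0013 per call: ~32% off input cost for this prompt shape.
```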
Sources: see the LLM pricing references in the Appendix.
| Model | Provider | Resolution | Cost per Image | Speed | Use Case |
|---|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s | Premium tier |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s | High quality |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s | Balanced tier |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s | Budget tier |
Key Observations:
- Flux schnell is 10x cheaper than Imagen 3 with acceptable quality
- 2-3 second generation time enables real-time workflows
- Flux dev offers good balance (same price as Imagen 3, faster)
Sources: see the image model pricing references in the Appendix.
Production mega-prompt structure:
- Brand description: ~200 tokens
- Styling instructions: ~300 tokens
- Strategy guidelines: ~200 tokens
- Few-shot examples (3-5 examples): ~500 tokens
- Safety instructions: ~150 tokens
- Article content (full): ~2500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- System prompt: ~200 tokens
- Article (title + content): ~2500 tokens
- Total safety check: ~2,700 tokens
Scenario: Generate 1 contextual ad with exploration triggered
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (current prod) | $0.00543 |
| Article embedding | 2,500 input | text-embedding-3-small | $0.00005 |
| Tagline generation | 3,850 input + 150 output | GPT-5.2 or Gemini 3 Pro | $0.00884 |
| Image generation | 1 image | Imagen 3 | $0.03000 |
| Total (Premium) | | | $0.0443 |
Alternative (Budget):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (unchanged) | $0.00543 |
| Article embedding | 800 input (title + para1) | text-embedding-3-small | $0.00002 |
| Tagline generation (compact) | 1,200 input + 150 output | Gemini 2.0 Flash | $0.00018 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Budget) | | | $0.0086 |
Savings: ~81% cost reduction per generation vs Premium (≈91% excluding the unchanged brand safety check)
Ultra-budget (Groq):
| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | Llama 3.1 8B (Groq) | $0.00014 |
| Tagline generation (compact) | 1,200 input + 150 output | Llama 3.1 8B (Groq) | $0.00007 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Ultra-budget) | | | $0.0032 |
Savings: 92% cost reduction, 5x faster
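The tier totals above can be reproduced with a small cost model. The sketch below uses only the token counts and prices from this document; the $0.02/1M embedding price is implied by the $0.00005 figure for 2,500 tokens:

```python
# Illustrative per-generation cost model; prices are $ per 1M tokens, images are flat per-image.
PRICES = {
    'gpt-5.2':                {'in': 1.75, 'out': 14.00},
    'gemini-2-flash':         {'in': 0.10, 'out': 0.40},
    'llama-3.1-8b-groq':      {'in': 0.05, 'out': 0.08},
    'text-embedding-3-small': {'in': 0.02, 'out': 0.00},
}
IMAGES = {'imagen-3': 0.030, 'flux-schnell': 0.003}

def llm_cost(model: str, tokens_in: int, tokens_out: int = 0) -> float:
    p = PRICES[model]
    return (tokens_in * p['in'] + tokens_out * p['out']) / 1e6

premium = (llm_cost('gpt-5.2', 2_700, 50)               # brand safety check
           + llm_cost('text-embedding-3-small', 2_500)  # article embedding
           + llm_cost('gpt-5.2', 3_850, 150)            # tagline generation
           + IMAGES['imagen-3'])                        # ≈ 0.0443

budget = (llm_cost('gpt-5.2', 2_700, 50)                # safety unchanged
          + llm_cost('text-embedding-3-small', 800)
          + llm_cost('gemini-2-flash', 1_200, 150)
          + IMAGES['flux-schnell'])                     # ≈ 0.0086

ultra = (llm_cost('llama-3.1-8b-groq', 2_700, 50)
         + llm_cost('llama-3.1-8b-groq', 1_200, 150)
         + IMAGES['flux-schnell'])                      # ≈ 0.0032

print(f"premium={premium:.4f} budget={budget:.4f} ultra={ultra:.4f}")
```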
def select_model_tier(
ecpm: float, # Expected CPM ($/1000 impressions)
domain_quality: str, # 'premium' | 'standard' | 'low'
content_length: int, # Article word count
campaign_type: str # 'brand_awareness' | 'performance'
) -> dict:
"""
Simple rule-based model selection.
Returns:
{
'llm': str,
'llm_tier': 'premium' | 'balanced' | 'budget',
'image': str,
'image_tier': 'premium' | 'balanced' | 'budget',
'input_mode': 'full_article' | 'title_plus_para1',
'max_cost': float
}
"""
    # eCPM routing thresholds (see ROI Analysis & Break-Even Scenarios for margin implications)
    MIN_ECPM_PREMIUM = 10.0   # $10 eCPM = $0.010 revenue per impression
    MIN_ECPM_BALANCED = 3.0   # $3 eCPM = $0.003 revenue per impression
# Decision tree
if ecpm >= MIN_ECPM_PREMIUM and domain_quality == 'premium':
# High-value, premium publishers → best quality
return {
'llm': 'gpt-5.2', # or 'gemini-3-pro'
'llm_tier': 'premium',
'image': 'imagen-3',
'image_tier': 'premium',
'input_mode': 'full_article',
'safety_model': 'gpt-5.2', # Current production
'max_cost': 0.0443
}
elif ecpm >= MIN_ECPM_BALANCED and domain_quality in ['premium', 'standard']:
# Mid-value, good publishers → balanced
return {
'llm': 'gemini-3-flash',
'llm_tier': 'balanced',
'image': 'flux-dev',
'image_tier': 'balanced',
'input_mode': 'title_plus_para1',
'safety_model': 'gpt-5.2', # Current production
'max_cost': 0.0117
}
elif campaign_type == 'brand_awareness':
# Brand campaigns → prioritize quality over cost
return {
'llm': 'gpt-5.2', # or 'gemini-3-pro'
'llm_tier': 'premium',
'image': 'flux-dev', # Balanced image sufficient
'image_tier': 'balanced',
'input_mode': 'title_plus_para1',
'safety_model': 'gpt-5.2', # Current production
'max_cost': 0.0159
}
else:
# Low-value or unproven domains → budget
return {
'llm': 'gemini-2-flash',
'llm_tier': 'budget',
'image': 'flux-schnell',
'image_tier': 'budget',
'input_mode': 'title_plus_para1',
'safety_model': 'gpt-5.2', # Current production
'max_cost': 0.0086
        }

Data Sources (existing in production):
- Impression count (from the `impressions` table)
- CTR history (clicks / impressions per domain)
- Human approval rate (from `controlled_ads`, type=2 vs type=-1)
- Publisher whitelist/blacklist
Simple Heuristic:
def classify_domain_quality(domain: str) -> str:
"""Classify domain based on historical stats."""
stats = get_domain_stats(domain)
if domain in PREMIUM_WHITELIST:
return 'premium'
if stats['impression_count'] > 10000 and stats['ctr'] > 0.02:
return 'premium'
if stats['impression_count'] > 1000 and stats['ctr'] > 0.01:
return 'standard'
    return 'low'

Modification point before exploration trigger:
def _trigger_exploration_async(self, selected_ad: Dict | None) -> None:
"""Trigger exploration with dynamic model selection."""
# NEW: Select model tier before generation
model_config = select_model_tier(
ecpm=self.calculate_ecpm(),
domain_quality=self.classify_domain(),
content_length=len(self.article_text.split()),
campaign_type=self.campaign_type
)
# Store config for exploration method to use
self.model_config = model_config
# Existing exploration logic...
if self.cache.get_from_cache(self.key_lock_exploration):
return
self.cache.update_cache(
self.key_lock_exploration,
{'exploration_in_progress': 1},
EXPIRATION_60_SEC
)
if self.exploration_method:
            async_call(self._execute_exploration_on_copy, selected_ad)

Traffic Distribution (estimated):
| Tier | % Traffic | Avg eCPM | Current Cost | New Cost | Savings |
|---|---|---|---|---|---|
| Premium | 15% | $12.00 | $0.0443 | $0.0443 | $0 |
| Balanced | 35% | $5.00 | $0.0443 | $0.0117 | $0.0326 |
| Budget | 50% | $1.50 | $0.0443 | $0.0086 | $0.0357 |
Total Savings: (0.35 × $0.0326) + (0.50 × $0.0357) = $0.0293 per impression (66% reduction)
Annual Impact (1M impressions/month):
- Current: $44,300/month
- New: $15,055/month
- Savings: $29,245/month ($350,940/year)
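A quick check of the blended figures, using the traffic split and per-tier costs above (small differences versus the quoted monthly numbers are rounding):

```python
# Blended per-impression cost under the 15/35/50 traffic split.
MIX = {'premium': 0.15, 'balanced': 0.35, 'budget': 0.50}
NEW_COST = {'premium': 0.0443, 'balanced': 0.0117, 'budget': 0.0086}
CURRENT_COST = 0.0443

blended_new = sum(MIX[t] * NEW_COST[t] for t in MIX)    # ~0.0150 per impression
savings_per_impression = CURRENT_COST - blended_new     # ~0.0293 (~66% reduction)
monthly_savings = savings_per_impression * 1_000_000    # ~$29K/month at 1M impressions
```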
Once the self-learning framework (Ĉ, P̂, DSPy) is operational, upgrade routing to use learned signals:
def select_model_tier_learned(
context: dict, # Brand, article, domain
C_hat_threshold: float = 0.7, # Quality predictor threshold
P_hat_threshold: float = 0.02, # Performance predictor threshold
ecpm: float = None
) -> dict:
"""
Learned model selection using quality and performance predictors.
Key insight: If we predict high approval (Ĉ) and high CTR (P̂),
it's worth investing in premium models. Otherwise, use budget.
"""
# Quick quality pre-check using Ĉ on anchor ad
anchor_quality = C_hat(context['brand'], context['article'], context['anchor'])
# Predicted performance using P̂ on anchor
predicted_ctr = P_hat(context['article'], context['anchor'])
# Calculate expected value of premium vs budget generation
premium_value = (
predicted_ctr * 1.2 * # Assume 20% CTR lift from premium models
ecpm / 1000 - # Revenue per impression
        0.0443  # Premium tier cost per generation (cost analysis above)
)
budget_value = (
predicted_ctr * # No CTR lift assumption
ecpm / 1000 - # Revenue per impression
        0.0086  # Budget tier cost per generation (cost analysis above)
)
# Decision: use premium only if EV is higher
if premium_value > budget_value and anchor_quality > C_hat_threshold:
return {
'llm': 'gpt-5.2', # or 'gemini-3-pro'
'llm_tier': 'premium',
'image': 'imagen-3',
'image_tier': 'premium',
'input_mode': 'full_article',
'expected_value': premium_value,
'reason': f'High quality ({anchor_quality:.2f}) + high CTR ({predicted_ctr:.3f})'
}
else:
return {
'llm': 'gemini-2-flash',
'llm_tier': 'budget',
'image': 'flux-schnell',
'image_tier': 'budget',
'input_mode': 'title_plus_para1',
'expected_value': budget_value,
'reason': f'Budget sufficient (quality={anchor_quality:.2f}, CTR={predicted_ctr:.3f})'
        }

Treat model tier selection as a contextual bandit problem:
- Context: (brand_id, domain_tier, content_category, article_length)
- Actions: (premium, balanced, budget)
- Reward: (revenue − cost) per impression
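For illustration, a minimal sketch of how the ModelTierBandit class defined below could be wired into serving. get_request_context, run_generation_for_tier, log_decision, and load_decision are hypothetical stand-ins for existing production hooks; in practice the reward (revenue − cost) arrives later via click/revenue attribution:

```python
bandit = ModelTierBandit()

def route_impression(request):
    """Pick a tier for this impression and remember the decision for later credit assignment."""
    context = get_request_context(request)          # hypothetical: brand_id, domain_tier, category, length
    tier = bandit.select_tier(context)              # 'premium' | 'balanced' | 'budget'
    cost = run_generation_for_tier(request, tier)   # hypothetical: runs the pipeline, returns $ spent
    log_decision(request.id, context, tier, cost)   # hypothetical: persisted for delayed feedback
    return tier

def on_revenue_attributed(request_id, revenue):
    """Delayed feedback job: update the bandit once revenue for the impression is known."""
    context, tier, cost = load_decision(request_id)  # hypothetical: read back the logged decision
    bandit.update(context, tier, reward=revenue - cost)
```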
import random
from collections import defaultdict

class ModelTierBandit:
    """Contextual bandit for model tier selection."""
    def __init__(self):
        # EpsilonGreedy and embed_context are assumed to be provided by the self-learning framework
        self.policy = EpsilonGreedy(epsilon=0.1)
        self.context_encoder = embed_context
        self.Q_table = defaultdict(lambda: {'premium': 0.0, 'balanced': 0.0, 'budget': 0.0})
def select_tier(self, context: dict) -> str:
"""Select model tier using ε-greedy policy."""
context_key = self.context_encoder(context)
if random.random() < self.policy.epsilon:
return random.choice(['premium', 'balanced', 'budget'])
else:
return max(self.Q_table[context_key], key=self.Q_table[context_key].get)
def update(self, context: dict, tier: str, reward: float):
"""Update Q-value after observing reward."""
context_key = self.context_encoder(context)
alpha = 0.1 # Learning rate
old_Q = self.Q_table[context_key][tier]
        self.Q_table[context_key][tier] = old_Q + alpha * (reward - old_Q)

Typical production prompt:
- Mega-prompt components: ~1,350 tokens
- Brand description: 200
- Styling instructions: 300
- Strategy guidelines: 200
- Few-shot examples: 500
- Safety instructions: 150
- Article (full): ~2,500 tokens
- Total input: ~3,850 tokens
Brand safety call:
- Article (title + content): ~2,500 tokens
- Safety prompt: ~200 tokens
- Total: ~2,700 tokens
Reduced input:
- Mega-prompt components: ~1,350 tokens (same)
- Article (title + para1): ~400 tokens
- Total input: ~1,750 tokens
Brand safety call (unchanged):
- Still uses full article for safety: ~2,700 tokens
Savings: 54% input token reduction on generation (safety unchanged for quality)
| Model | Full Article Input Cost | Compact Input Cost | Input Savings |
|---|---|---|---|
| GPT-5.2 | $0.00674 | $0.00306 | $0.00368 (55%) |
| Gemini 3 Pro | $0.00770 | $0.00350 | $0.00420 (55%) |
| Gemini 2.0 Flash | $0.00039 | $0.00018 | $0.00021 (54%) |
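The build_compact_prompt method shown next calls a self.extract_first_paragraph helper that is not included in this document; below is a minimal standalone sketch of the assumed behavior (paragraphs separated by blank lines, capped at max_words):

```python
def extract_first_paragraph(body: str, max_words: int = 150) -> str:
    """Return the first non-empty paragraph of an article body, capped at max_words words."""
    paragraphs = [p.strip() for p in body.split('\n\n') if p.strip()]
    first = paragraphs[0] if paragraphs else body
    words = first.split()
    return ' '.join(words[:max_words])
```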
def build_compact_prompt(self, context: dict) -> str:
"""Build prompt using only title + first paragraph."""
article_title = context['article']['title']
article_body = context['article']['body']
# Extract first paragraph (split by \n\n or first 150 words)
first_paragraph = self.extract_first_paragraph(article_body, max_words=150)
# Mega-prompt components (unchanged)
mega_prompt = self.build_mega_prompt_base(context['brand'])
prompt = f"""{mega_prompt}
Article title: {article_title}
Article excerpt: {first_paragraph}
Anchor tagline: {context['anchor']['tagline']}
Generate contextual tagline variant following brand guidelines above.
Tagline:
"""
    return prompt

Profit per impression:

profit = (eCPM / 1000) − generation_cost

Or for CPC campaigns:

profit = (CTR × CPC) − generation_cost
| Model Tier | Cost per Generation | Break-even eCPM (2× margin) | Break-even CPC (1% CTR) |
|---|---|---|---|
| Premium (GPT-5.2 + Imagen) | $0.0443 | $88.60 | $4.43 |
| Balanced (Gemini 3 Flash + Flux) | $0.0117 | $23.40 | $1.17 |
| Budget (Gemini 2 Flash + Flux) | $0.0086 | $17.20 | $0.86 |
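These break-even columns follow directly from the profit formulas above; a small sanity-check helper (the 2× margin and 1% CTR defaults mirror the table headers):

```python
def break_even(cost_per_generation: float, margin: float = 2.0, ctr: float = 0.01) -> tuple[float, float]:
    """Return (break-even eCPM at the given margin, break-even CPC at the given CTR)."""
    ecpm = margin * cost_per_generation * 1000   # revenue per 1000 impressions needed
    cpc = cost_per_generation / ctr              # CPC needed so one click at this CTR covers cost
    return ecpm, cpc

print(break_even(0.0443))   # ≈ (88.6, 4.43)  -> premium tier
print(break_even(0.0117))   # ≈ (23.4, 1.17)  -> balanced tier
print(break_even(0.0086))   # ≈ (17.2, 0.86)  -> budget tier
```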
Interpretation:
- Premium tier requires ~$89 eCPM to clear a 2× margin
- Budget tier breaks even at ~$8.60 eCPM (~$17 for a 2× margin)
- Ultra-budget (Groq + Flux schnell) breaks even at ~$3.20 eCPM (~$6.40 for a 2× margin)
Scenario 1: Mid-value campaign (eCPM = $5, CTR = 1.5%)
| Tier | Cost | Revenue | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.0050 | -$0.0393 | -88.7% |
| Balanced | $0.0117 | $0.0050 | -$0.0067 | -57.3% |
| Budget | $0.0086 | $0.0050 | -$0.0036 | -41.9% |
Conclusion: At $5 eCPM, no tier is profitable on a per-impression basis; the budget tier minimizes the loss.
Scenario 2: Premium publisher (CPC = $8, CTR = 2.5%)
| Tier | Cost | Revenue (CTR × CPC) | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.20 | $0.1557 | +351.5% |
| Balanced | $0.0117 | $0.20 | $0.1883 | +1609.4% |
| Budget | $0.0086 | $0.20 × 0.95 | $0.1814 | +2109.3% |
Insight: Even with -5% quality penalty, budget tier delivers highest ROI. Premium justified only for brand-sensitive campaigns.
Deliverables:
- `select_model_tier()` function with eCPM + domain quality routing
- Domain quality classifier (premium/standard/low)
- Integration into `ControlledAd._trigger_exploration_async()`
- Logging: model_tier, generation_cost, decision_reason (see the example record below)
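A possible shape for the per-decision log record; request_id, timestamp, llm, and image are assumed fields beyond the three named above:

```python
import time

def build_routing_log(request_id: str, config: dict, reason: str) -> dict:
    """Assemble the logging payload for one routing decision (sketch)."""
    return {
        'request_id': request_id,            # assumed join key against impression data
        'timestamp': time.time(),
        'model_tier': config['llm_tier'],
        'llm': config['llm'],
        'image': config['image'],
        'generation_cost': config['max_cost'],
        'decision_reason': reason,
    }
```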
Success criteria:
- 50% of traffic routed to budget tier
- No drop in approval rate
- Cost savings confirmed
Deliverables:
- `build_compact_prompt()` using title + para1
- A/B test framework (50/50 split)
- Quality monitoring dashboard
Success criteria:
- <5% approval rate drop
- <3% CTR drop
- 54% input token savings confirmed
Deliverables:
- Historical analysis: profit vs tier by campaign
- Per-campaign threshold learning
- Threshold update automation
Success criteria:
- 10% additional profit vs fixed thresholds
- Thresholds stable (not oscillating)
Deliverables:
- `select_model_tier_learned()` using Ĉ and P̂
- Expected value calculation framework
- Bandit policy for exploration
Prerequisites:
- Ĉ (quality predictor) trained and deployed
- P̂ (performance predictor) trained and deployed
- Propensity logging operational
Success criteria:
- 15% profit improvement vs rule-based
- Bandit policy converges
Risk: Budget models produce lower quality, reducing approval rate and CTR.
Mitigation:
- Start with conservative thresholds (only low-value traffic to budget)
- Monitor approval rate daily, alert if <80%
- Circuit breaker: auto-revert to premium if approval drops >10%
- Human review sample: 100 budget-generated ads for manual QA
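A minimal sketch of the approval-rate alert and circuit breaker described above. get_recent_approval_rate, alert, and router.force_tier are hypothetical hooks, and the 90% baseline is an assumption to be replaced with the measured premium-tier approval rate:

```python
BASELINE_APPROVAL = 0.90        # assumption: replace with measured premium-tier approval rate
APPROVAL_ALERT_FLOOR = 0.80     # alert if approval rate falls below 80% (as above)
MAX_RELATIVE_DROP = 0.10        # auto-revert if approval drops >10% vs baseline (as above)

def run_quality_circuit_breaker(router) -> None:
    """Daily job: alert on low approval and revert budget traffic to premium if it degrades."""
    rate = get_recent_approval_rate(tier='budget')            # hypothetical metrics helper
    if rate < APPROVAL_ALERT_FLOOR:
        alert(f'Budget-tier approval rate {rate:.1%} below {APPROVAL_ALERT_FLOOR:.0%} floor')
    if rate < BASELINE_APPROVAL * (1 - MAX_RELATIVE_DROP):
        router.force_tier('premium')                          # hypothetical override on the router
```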
Risk: eCPM thresholds miscalibrated, losing money on expensive generations.
Mitigation:
- Default to budget tier unless eCPM exceeds 2× generation cost
- Continuous profit analysis per tier
- Threshold adjustment automation
Risk: Primary model down or rate-limited, fallback needed.
Mitigation:
- Fallback chain: Gemini 2.0 Flash → Groq Llama → Claude Haiku
- Cache model availability status (Redis, 1min TTL)
- Alert if fallback rate >5%
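A sketch of the fallback chain with cached availability status. call_model, is_available, and mark_unavailable are hypothetical wrappers around the existing provider clients and the Redis status cache (60-second TTL, as stated above):

```python
FALLBACK_CHAIN = ['gemini-2-flash', 'llama-3.1-8b-groq', 'claude-haiku']

def generate_with_fallback(prompt: str) -> str:
    """Try each model in order, skipping any recently marked unavailable."""
    for model in FALLBACK_CHAIN:
        if not is_available(model):              # hypothetical: Redis-backed status check
            continue
        try:
            return call_model(model, prompt)     # hypothetical provider-agnostic client call
        except Exception:                        # rate limit / outage from any provider client
            mark_unavailable(model, ttl_seconds=60)  # hypothetical: flag in Redis for 60s
    raise RuntimeError('All models in the fallback chain are unavailable')
```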
| Model | Provider | Input | Output | Speed | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Fast | 400K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | Fast | 200K |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Very Fast | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Very Fast | 1M |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | Ultra Fast | 128K |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | Very Fast | 32K |
| Claude Haiku | Anthropic | $1.00 | $5.00 | Fast | 200K |
| Model | Provider | Resolution | Cost | Speed |
|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s |
- GPT-5.2 API Pricing
- GPT-5.2 Pricing Calculator
- Gemini API Pricing
- Gemini 3 Pricing Guide
- Groq Pricing
- Replicate Pricing
- Claude API Pricing
- AI Image Model Pricing
End of Document