@albertbn
Last active February 11, 2026 12:51
Dynamic Model Selection for Contextual Ad Generation - Design Proposal

Dynamic Model Selection for Contextual Ad Generation

Author: Albert Bentov
Date: 2026-02-11
Status: Design Proposal


Summary

This document proposes a two-tiered approach for intelligent model selection in contextual ad generation:

  1. Simple Approach (immediate): Rule-based routing for current production based on expected revenue per 1,000 impressions (eCPM) and domain reputation
  2. Advanced Approach (post-online learning): Learned routing integrated with quality predictor (Ĉ) and performance predictor (P̂)

Expected Impact:

  • 50-70% cost reduction on low-value impressions
  • Maintained quality on high-value impressions
  • 2-5x faster response times for most requests
  • Profitability threshold enforcement per impression

Table of Contents

  1. Problem Statement
  2. API Cost Analysis (2026)
  3. Simple Approach: Rule-Based Routing
  4. Advanced Approach: Learned Routing
  5. Fast Wins: Input Token Optimization
  6. ROI Analysis & Break-Even Scenarios
  7. Implementation Roadmap
  8. Risk Mitigation

1. Problem Statement

Current State

Our production system (ControlledAd.py) serves contextual ads with human-in-the-loop approval:

  1. Fetch article (title + body)
  2. Generate embeddings (256d)
  3. Find anchor ad via similarity search
  4. Exploration trigger: When predefined categories or approved candidates fail similarity threshold
  5. Brand safety check (LLM call on title + content)
  6. Generate candidate variants (LLM with mega-prompt: brand + styling + strategies + few-shot + safety instructions)
  7. Generate image
  8. Human approval → serve winning ad

Problem: We use expensive models uniformly regardless of:

  • Expected click value (advertiser's willingness to pay)
  • Domain quality/reputation (premium publishers vs low-traffic blogs)
  • Content complexity (simple product ads vs nuanced brand campaigns)

Result: Unprofitable on low-eCPM impressions, over-engineered for simple contexts.

Goal

Dynamically select model tier (LLM size, image generation quality) based on:

$$ \text{ModelTier} = f(\text{eCPM}, \text{domainStats}, \text{contentComplexity}) $$

Constraint: Maintain quality standards while maximizing profit margin per impression.


2. API Cost Analysis (2026)

2.1 LLM Pricing (Text Generation)

| Model | Provider | Input Cost ($/1M tokens) | Output Cost ($/1M tokens) | Context | Speed | Use Case |
|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Fast | Premium tier |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 200K | Fast | Premium tier |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Very Fast | Balanced tier |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Very Fast | Budget tier |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | 128K | Ultra Fast | Ultra-budget |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | 32K | Very Fast | Budget alternative |
| Claude Haiku | Anthropic | $1.00 | $5.00 | 200K | Fast | Budget fallback |

Key Observations:

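  • Gemini 2.0 Flash is 20x cheaper than Gemini 3 Pro on input ($0.10 vs $2.00) and 17.5x cheaper than GPT-5.2 on input ($0.10 vs $1.75); on output it is 30x and 35x cheaper, respectively
  • Groq-hosted Llama 3.1 8B is 35-40x cheaper than the premium models on input ($0.05 vs $1.75-$2.00), with acceptable quality trade-offs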
  • Context caching available on Gemini (75% savings on repeated prompts, cache reads at 10% of input price)
  • GPT-5.2 generates internal "thinking" tokens billed as output ($14/1M)


2.2 Image Generation Pricing

| Model | Provider | Resolution | Cost per Image | Speed | Use Case |
|---|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s | Premium tier |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s | High quality |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s | Balanced tier |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s | Budget tier |

Key Observations:

  • Flux schnell is 10x cheaper than Imagen 3 with acceptable quality
  • 2-3 second generation time enables real-time workflows
  • Flux dev offers good balance (same price as Imagen 3, faster)


2.3 Realistic Request Cost Breakdown

Production mega-prompt structure:

  • Brand description: ~200 tokens
  • Styling instructions: ~300 tokens
  • Strategy guidelines: ~200 tokens
  • Few-shot examples (3-5 examples): ~500 tokens
  • Safety instructions: ~150 tokens
  • Article content (full): ~2500 tokens
  • Total input: ~3,850 tokens

Brand safety call:

  • System prompt: ~200 tokens
  • Article (title + content): ~2500 tokens
  • Total safety check: ~2,700 tokens

Scenario: Generate 1 contextual ad with exploration triggered

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (current prod) | $0.00543 |
| Article embedding | 2,500 input | text-embedding-3-small | $0.00005 |
| Tagline generation | 3,850 input + 150 output | GPT-5.2 or Gemini 3 Pro | $0.00884 |
| Image generation | 1 image | Imagen 3 | $0.03000 |
| Total (Premium) | | | $0.0443 |

Alternative (Budget):

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | GPT-5.2 (unchanged) | $0.00543 |
| Article embedding | 800 input (title + para1) | text-embedding-3-small | $0.00002 |
| Tagline generation (compact) | 1,200 input + 150 output | Gemini 2.0 Flash | $0.00018 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Budget) | | | $0.0086 |

Savings: 81% cost reduction per generation ($0.0443 → $0.0086)

Ultra-budget (Groq):

| Component | Tokens/Params | Model | Cost |
|---|---|---|---|
| Brand safety check | 2,700 input + 50 output | Llama 3.1 8B (Groq) | $0.00014 |
| Tagline generation (compact) | 1,200 input + 150 output | Llama 3.1 8B (Groq) | $0.00007 |
| Image generation | 1 image | Flux schnell | $0.00300 |
| Total (Ultra-budget) | | | $0.0032 |

Savings: 92% cost reduction, 5x faster
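
The tier totals above can be reproduced from the per-token prices in Section 2.1. The following is a minimal sketch; the helper name `llm_cost` and the model keys are illustrative, and the embedding price of $0.02 per 1M tokens is an assumption inferred from the $0.00005 row for 2,500 tokens.

```python
# Prices in $ per 1M tokens (input, output), copied from the table in Section 2.1.
PRICES = {
    "gpt-5.2": (1.75, 14.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "llama-3.1-8b": (0.05, 0.08),
}
EMBED_PRICE = 0.02  # $/1M tokens; assumed, inferred from the embedding row above


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


premium = (
    llm_cost("gpt-5.2", 2700, 50)              # brand safety check
    + 2500 * EMBED_PRICE / 1_000_000           # article embedding
    + llm_cost("gpt-5.2", 3850, 150)           # tagline generation
    + 0.030                                    # Imagen 3
)
budget = (
    llm_cost("gpt-5.2", 2700, 50)              # brand safety check (unchanged)
    + 800 * EMBED_PRICE / 1_000_000            # title + para1 embedding
    + llm_cost("gemini-2.0-flash", 1200, 150)  # compact tagline generation
    + 0.003                                    # FLUX.1 [schnell]
)
ultra = (
    llm_cost("llama-3.1-8b", 2700, 50)         # brand safety check
    + llm_cost("llama-3.1-8b", 1200, 150)      # compact tagline generation
    + 0.003                                    # FLUX.1 [schnell]
)

print(f"premium=${premium:.4f} budget=${budget:.4f} ultra=${ultra:.4f}")
```

Keeping the prices in one table makes it straightforward to re-run the comparison when a provider changes its rates.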


3. Simple Approach: Rule-Based Routing

3.1 Routing Logic

def select_model_tier(
    ecpm: float,              # Expected CPM ($/1000 impressions)
    domain_quality: str,      # 'premium' | 'standard' | 'low'
    content_length: int,      # Article word count
    campaign_type: str        # 'brand_awareness' | 'performance'
) -> dict:
    """
    Simple rule-based model selection.

    Returns:
        {
            'llm': str,
            'llm_tier': 'premium' | 'balanced' | 'budget',
            'image': str,
            'image_tier': 'premium' | 'balanced' | 'budget',
            'input_mode': 'full_article' | 'title_plus_para1',
            'safety_model': str,
            'max_cost': float
        }
    """

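    # Profitability thresholds (target: revenue covers at least 2x generation cost;
    # at these eCPMs that holds only if one generation is amortized over many impressions)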
    MIN_ECPM_PREMIUM = 10.0   # $10 eCPM = $0.010 per impression
    MIN_ECPM_BALANCED = 3.0   # $3 eCPM = $0.003 per impression

    # Decision tree
    if ecpm >= MIN_ECPM_PREMIUM and domain_quality == 'premium':
        # High-value, premium publishers → best quality
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0443
        }

    elif ecpm >= MIN_ECPM_BALANCED and domain_quality in ['premium', 'standard']:
        # Mid-value, good publishers → balanced
        return {
            'llm': 'gemini-3-flash',
            'llm_tier': 'balanced',
            'image': 'flux-dev',
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0117
        }

    elif campaign_type == 'brand_awareness':
        # Brand campaigns → prioritize quality over cost
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'flux-dev',  # Balanced image sufficient
            'image_tier': 'balanced',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0159
        }

    else:
        # Low-value or unproven domains → budget
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'safety_model': 'gpt-5.2',  # Current production
            'max_cost': 0.0086
        }

3.2 Domain Quality Classification

Data Sources (existing in production):

  • Impression count (from impressions table)
  • CTR history (clicks / impressions per domain)
  • Human approval rate (from controlled_ads type=2 vs type=-1)
  • Publisher whitelist/blacklist

Simple Heuristic:

def classify_domain_quality(domain: str) -> str:
    """Classify domain based on historical stats."""
    stats = get_domain_stats(domain)

    if domain in PREMIUM_WHITELIST:
        return 'premium'

    if stats['impression_count'] > 10000 and stats['ctr'] > 0.02:
        return 'premium'

    if stats['impression_count'] > 1000 and stats['ctr'] > 0.01:
        return 'standard'

    return 'low'

3.3 Integration into ControlledAd.py

Modification point before exploration trigger:

def _trigger_exploration_async(self, selected_ad: Dict | None) -> None:
    """Trigger exploration with dynamic model selection."""

    # NEW: Select model tier before generation
    model_config = select_model_tier(
        ecpm=self.calculate_ecpm(),
        domain_quality=self.classify_domain(),
        content_length=len(self.article_text.split()),
        campaign_type=self.campaign_type
    )

    # Store config for exploration method to use
    self.model_config = model_config

    # Existing exploration logic...
    if self.cache.get_from_cache(self.key_lock_exploration):
        return

    self.cache.update_cache(
        self.key_lock_exploration,
        {'exploration_in_progress': 1},
        EXPIRATION_60_SEC
    )

    if self.exploration_method:
        async_call(self._execute_exploration_on_copy, selected_ad)

3.4 Expected Impact

Traffic Distribution (estimated):

| Tier | % Traffic | Avg eCPM | Current Cost | New Cost | Savings |
|---|---|---|---|---|---|
| Premium | 15% | $12.00 | $0.0443 | $0.0443 | $0 |
| Balanced | 35% | $5.00 | $0.0443 | $0.0117 | $0.0326 |
| Budget | 50% | $1.50 | $0.0443 | $0.0086 | $0.0357 |

Total Savings: (0.35 × $0.0326) + (0.50 × $0.0357) = $0.0293 per impression (66% reduction)

Annual Impact (1M impressions/month):

  • Current: $44,300/month
  • New: $15,040/month
  • Savings: $29,260/month ($351,120/year)
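
The blended cost can be recomputed from the traffic table. This is a minimal sketch that, like the arithmetic above, assumes every impression triggers one generation; the names `TIER_MIX` and `blended_cost` are illustrative.

```python
# (traffic share, new cost per generation) per tier, from the Section 3.4 table.
TIER_MIX = {
    "premium": (0.15, 0.0443),
    "balanced": (0.35, 0.0117),
    "budget": (0.50, 0.0086),
}
CURRENT_COST = 0.0443            # every request uses the premium stack today
MONTHLY_IMPRESSIONS = 1_000_000  # volume assumed in the text


def blended_cost(mix: dict) -> float:
    """Traffic-weighted average generation cost."""
    return sum(share * cost for share, cost in mix.values())


new_cost = blended_cost(TIER_MIX)
savings = CURRENT_COST - new_cost
reduction = savings / CURRENT_COST

print(f"new cost per generation: ${new_cost:.5f}")
print(f"savings per generation:  ${savings:.5f} ({reduction:.0%})")
print(f"monthly savings:         ${savings * MONTHLY_IMPRESSIONS:,.0f}")
```

Changing the traffic shares in `TIER_MIX` shows how sensitive the savings are to the estimated 15/35/50 split.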

4. Advanced Approach: Learned Routing

4.1 Integration with Online Learning System

Once the self-learning framework (Ĉ, P̂, DSPy) is operational, upgrade routing to use learned signals:

def select_model_tier_learned(
    context: dict,           # Brand, article, domain, anchor
    ecpm: float,             # Required: used in the expected-value calculation
    C_hat_threshold: float = 0.7,  # Quality predictor threshold
    P_hat_threshold: float = 0.02, # Performance predictor threshold (not yet used below)
) -> dict:
    """
    Learned model selection using quality and performance predictors.

    Key insight: If we predict high approval (Ĉ) and high CTR (P̂),
    it's worth investing in premium models. Otherwise, use budget.
    """

    # Quick quality pre-check using Ĉ on anchor ad
    anchor_quality = C_hat(context['brand'], context['article'], context['anchor'])

    # Predicted performance using P̂ on anchor
    predicted_ctr = P_hat(context['article'], context['anchor'])

    # Calculate expected value of premium vs budget generation
    # (costs are the tier totals from Section 2.3)
    premium_value = (
        predicted_ctr * 1.2 *  # Assume 20% CTR lift from premium models
        ecpm / 1000 -           # Revenue per impression
        0.0443                  # Premium cost
    )

    budget_value = (
        predicted_ctr *         # No CTR lift assumption
        ecpm / 1000 -           # Revenue per impression
        0.0086                  # Budget cost
    )

    # Decision: use premium only if EV is higher
    if premium_value > budget_value and anchor_quality > C_hat_threshold:
        return {
            'llm': 'gpt-5.2',  # or 'gemini-3-pro'
            'llm_tier': 'premium',
            'image': 'imagen-3',
            'image_tier': 'premium',
            'input_mode': 'full_article',
            'expected_value': premium_value,
            'reason': f'High quality ({anchor_quality:.2f}) + high CTR ({predicted_ctr:.3f})'
        }

    else:
        return {
            'llm': 'gemini-2-flash',
            'llm_tier': 'budget',
            'image': 'flux-schnell',
            'image_tier': 'budget',
            'input_mode': 'title_plus_para1',
            'expected_value': budget_value,
            'reason': f'Budget sufficient (quality={anchor_quality:.2f}, CTR={predicted_ctr:.3f})'
        }
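
The expected-value comparison in the function above reduces to a single inequality. Writing $p$ for the predicted CTR from P̂, $r = \text{eCPM}/1000$, $\lambda = 0.2$ for the assumed CTR lift, and $C_{prem}$, $C_{bud}$ for the two generation costs (symbols introduced here only for illustration), the premium tier has the higher expected value when:

$$ (1 + \lambda)\, p\, r - C_{prem} > p\, r - C_{bud} \iff \lambda\, p\, r > C_{prem} - C_{bud} $$

Under this model, premium generation pays off only when the extra revenue attributed to the lift exceeds the cost gap between the tiers. The outcome is sensitive to the 20% lift, which is an assumption in the code and would need to be measured.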

4.2 Multi-Armed Bandit for Model Tier Selection

Treat model tier selection as a contextual bandit problem:

  • Context: (brand_id, domain_tier, content_category, article_length)
  • Actions: (premium, balanced, budget)
  • Reward: (revenue - cost) per impression

import random
from collections import defaultdict

TIERS = ('premium', 'balanced', 'budget')

class ModelTierBandit:
    """Contextual bandit for model tier selection (tabular, ε-greedy)."""

    def __init__(self, context_encoder, epsilon: float = 0.1, alpha: float = 0.1):
        # context_encoder must map a context dict to a hashable key, e.g. a tuple
        # of (brand_id, domain_tier, content_category, article-length bucket).
        self.context_encoder = context_encoder
        self.epsilon = epsilon  # Exploration rate
        self.alpha = alpha      # Learning rate
        self.Q_table = defaultdict(lambda: {tier: 0.0 for tier in TIERS})

    def select_tier(self, context: dict) -> str:
        """Select model tier using ε-greedy policy."""
        context_key = self.context_encoder(context)

        if random.random() < self.epsilon:
            return random.choice(TIERS)
        q_values = self.Q_table[context_key]
        return max(q_values, key=q_values.get)

    def update(self, context: dict, tier: str, reward: float) -> None:
        """Move the Q-value for (context, tier) toward the observed reward."""
        context_key = self.context_encoder(context)
        old_Q = self.Q_table[context_key][tier]
        self.Q_table[context_key][tier] = old_Q + self.alpha * (reward - old_Q)
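
To illustrate how the update rule behaves, the sketch below applies the same incremental formula to a constant reward and compares it with the closed form. The reward value and the helper name `q_update` are hypothetical; real rewards would be noisy per-impression profit.

```python
def q_update(old_q: float, reward: float, alpha: float = 0.1) -> float:
    """One incremental update: move the estimate a fraction alpha toward the reward."""
    return old_q + alpha * (reward - old_q)


reward = 0.01  # hypothetical constant profit per impression, in dollars
q = 0.0
history = []
for _ in range(50):
    q = q_update(q, reward)
    history.append(q)

# With a constant reward and Q starting at 0, n updates give
# Q_n = reward * (1 - (1 - alpha) ** n).
closed_form = reward * (1 - 0.9 ** 50)
print(f"after 50 updates: {q:.6f} (closed form: {closed_form:.6f})")
```

With alpha = 0.1 the estimate weights recent rewards exponentially, so it tracks drift in tier profitability but never fully averages out noise; a smaller alpha trades responsiveness for stability.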

5. Fast Wins: Input Token Optimization

5.1 Current Input (Full Article + Mega Prompt)

Typical production prompt:

  • Mega-prompt components: ~1,350 tokens
    • Brand description: 200
    • Styling instructions: 300
    • Strategy guidelines: 200
    • Few-shot examples: 500
    • Safety instructions: 150
  • Article (full): ~2,500 tokens
  • Total input: ~3,850 tokens

Brand safety call:

  • Article (title + content): ~2,500 tokens
  • Safety prompt: ~200 tokens
  • Total: ~2,700 tokens

5.2 Optimized Input (Title + First Paragraph)

Reduced input:

  • Mega-prompt components: ~1,350 tokens (same)
  • Article (title + para1): ~400 tokens
  • Total input: ~1,750 tokens

Brand safety call (unchanged):

  • Still uses full article for safety: ~2,700 tokens

Savings: 54% input token reduction on generation (safety unchanged for quality)

5.3 Cost Impact

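| Model | Full Article Cost (3,850 in + 150 out) | Compact Cost (1,750 in + 150 out) | Savings |
|---|---|---|---|
| GPT-5.2 | $0.00884 | $0.00516 | $0.00368 (42%) |
| Gemini 3 Pro | $0.00950 | $0.00530 | $0.00420 (44%) |
| Gemini 2.0 Flash | $0.000445 | $0.000235 | $0.000210 (47%) |

Output tokens are unchanged by the compact prompt, so total cost falls by less than the 54% input reduction.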

5.4 Implementation

def build_compact_prompt(self, context: dict) -> str:
    """Build prompt using only title + first paragraph."""

    article_title = context['article']['title']
    article_body = context['article']['body']

    # Extract first paragraph (split by \n\n or first 150 words)
    first_paragraph = self.extract_first_paragraph(article_body, max_words=150)

    # Mega-prompt components (unchanged)
    mega_prompt = self.build_mega_prompt_base(context['brand'])

    prompt = f"""{mega_prompt}

Article title: {article_title}
Article excerpt: {first_paragraph}

Anchor tagline: {context['anchor']['tagline']}

Generate contextual tagline variant following brand guidelines above.

Tagline:
"""

    return prompt
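
The prompt builder above relies on an `extract_first_paragraph` helper that is not shown. One possible version, written as a standalone function and following the comment in the code (split on blank lines, cap at 150 words), is sketched here; the exact behavior of the production helper is an assumption.

```python
def extract_first_paragraph(body: str, max_words: int = 150) -> str:
    """Return the first non-empty paragraph of `body`, capped at `max_words` words."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    if not paragraphs:
        return ""
    # Splitting on whitespace also collapses stray newlines inside the paragraph.
    words = paragraphs[0].split()
    return " ".join(words[:max_words])


article = "Rates held steady this week.\nAnalysts expected a cut.\n\nSecond paragraph."
print(extract_first_paragraph(article))
print(extract_first_paragraph(article, max_words=3))
```

Word count is only a proxy for tokens, so the ~400-token estimate for title plus first paragraph should be checked against the tokenizer of the chosen model.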

6. ROI Analysis & Break-Even Scenarios

6.1 Cost-Revenue Model

Profit per impression:

$$ \text{Profit} = \frac{\text{eCPM}}{1000} - C_{gen} $$

Or for CPC campaigns:

$$ \text{Profit} = \text{CTR} \times \text{CPC} - C_{gen} $$

6.2 Break-Even Analysis

| Model Tier | $C_{gen}$ | Break-even eCPM (2× margin) | Break-even CPC (1% CTR) |
|---|---|---|---|
| Premium (GPT-5.2 + Imagen) | $0.0443 | $88.60 | $4.43 |
| Balanced (Gemini 3 Flash + Flux) | $0.0117 | $23.40 | $1.17 |
| Budget (Gemini 2 Flash + Flux) | $0.0086 | $17.20 | $0.86 |

Interpretation:

  • Premium tier requires $88.60+ eCPM to be profitable with 2× margin
  • Budget tier requires $17.20 eCPM for 2× margin ($8.60 to break even)
  • Ultra-budget tier (Groq + Flux schnell, $0.0032 per generation) requires $6.40 eCPM for 2× margin ($3.20 to break even)
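
The break-even columns in the table follow directly from $C_{gen}$ and the profit formulas in Section 6.1. A minimal sketch, with illustrative function names:

```python
def break_even_ecpm(c_gen: float, margin: float = 2.0) -> float:
    """eCPM at which revenue per impression (eCPM / 1000) equals margin × C_gen."""
    return c_gen * margin * 1000


def break_even_cpc(c_gen: float, ctr: float = 0.01) -> float:
    """CPC at which CTR × CPC equals C_gen (no margin applied)."""
    return c_gen / ctr


for tier, c_gen in [("premium", 0.0443), ("balanced", 0.0117), ("budget", 0.0086)]:
    print(f"{tier}: eCPM ${break_even_ecpm(c_gen):.2f}, CPC ${break_even_cpc(c_gen):.2f}")
```

Both functions treat $C_{gen}$ as a per-impression cost; if one generated ad is served over several impressions, dividing `c_gen` by that count lowers the break-even proportionally.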

6.3 Scenario Analysis

Scenario 1: Mid-value campaign (eCPM = $5, CTR = 1.5%)

| Tier | Cost | Revenue | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.0050 | -$0.0393 | -88.7% |
| Balanced | $0.0117 | $0.0050 | -$0.0067 | -57.3% |
| Budget | $0.0086 | $0.0050 | -$0.0036 | -41.9% |

Conclusion: No tier is profitable at $5 eCPM; the budget tier loses the least per generation.

Scenario 2: Premium publisher (CPC = $8, CTR = 2.5%)

| Tier | Cost | Revenue (CTR × CPC) | Profit | ROI |
|---|---|---|---|---|
| Premium | $0.0443 | $0.20 | $0.1557 | +351.5% |
| Balanced | $0.0117 | $0.20 | $0.1883 | +1609.4% |
| Budget | $0.0086 | $0.20 × 0.95 | $0.1814 | +2109.3% |

Insight: Even with a 5% quality penalty on revenue, the budget tier delivers the highest ROI, while the balanced tier yields the highest absolute profit ($0.1883). Premium is justified only for brand-sensitive campaigns.
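
The Scenario 2 figures can be checked with a few lines. This sketch defines ROI as profit divided by generation cost, which matches the table; the function name is illustrative.

```python
def profit_and_roi(revenue: float, cost: float) -> tuple[float, float]:
    """Return (profit, ROI) per generation, with ROI = profit / cost."""
    profit = revenue - cost
    return profit, profit / cost


revenue = 0.025 * 8.00  # CTR × CPC from Scenario 2
scenario = {
    "premium": (revenue, 0.0443),
    "balanced": (revenue, 0.0117),
    "budget": (revenue * 0.95, 0.0086),  # 5% revenue penalty, as in the table
}
results = {tier: profit_and_roi(rev, cost) for tier, (rev, cost) in scenario.items()}

for tier, (profit, roi) in results.items():
    print(f"{tier}: profit=${profit:.4f} ROI={roi:+.1%}")

best_by_roi = max(results, key=lambda t: results[t][1])
best_by_profit = max(results, key=lambda t: results[t][0])
print(f"highest ROI: {best_by_roi}; highest profit: {best_by_profit}")
```

Ranking by ROI and ranking by absolute profit can disagree, so the routing objective should state which of the two it optimizes.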


7. Implementation Roadmap

Phase 1: Simple Rule-Based Routing

Deliverables:

  1. select_model_tier() function with eCPM + domain quality routing
  2. Domain quality classifier (premium/standard/low)
  3. Integration into ControlledAd._trigger_exploration_async()
  4. Logging: model_tier, generation_cost, decision_reason

Success criteria:

  • 50% of traffic routed to budget tier
  • No drop in approval rate
  • Cost savings confirmed

Phase 2: Input Token Optimization

Deliverables:

  1. build_compact_prompt() using title + para1
  2. A/B test framework (50/50 split)
  3. Quality monitoring dashboard
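
For the 50/50 split, one common option is deterministic hash-based bucketing, so that a given request always lands in the same arm. The sketch below is one way to do it; the experiment name, the arm labels, and the choice of request ID as the bucketing key are all assumptions.

```python
import hashlib


def ab_bucket(request_id: str, experiment: str = "compact_prompt_v1") -> str:
    """Deterministically assign a request to the 'compact' or 'full' prompt arm."""
    digest = hashlib.sha256(f"{experiment}:{request_id}".encode("utf-8")).digest()
    # First byte is uniform over 0-255: lower half -> compact, upper half -> full.
    return "compact" if digest[0] < 128 else "full"


counts = {"compact": 0, "full": 0}
for i in range(10_000):
    counts[ab_bucket(f"req-{i}")] += 1
print(counts)
```

Including the experiment name in the hash input keeps assignments independent across experiments, so a later test does not inherit this split.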

Success criteria:

  • <5% approval rate drop
  • <3% CTR drop
  • 54% input token savings confirmed

Phase 3: Adaptive Thresholds

Deliverables:

  1. Historical analysis: profit vs tier by campaign
  2. Per-campaign threshold learning
  3. Threshold update automation

Success criteria:

  • 10% additional profit vs fixed thresholds
  • Thresholds stable (not oscillating)

Phase 4: Learned Routing

Deliverables:

  1. select_model_tier_learned() using Ĉ and P̂
  2. Expected value calculation framework
  3. Bandit policy for exploration

Prerequisites:

  • Ĉ (quality predictor) trained and deployed
  • P̂ (performance predictor) trained and deployed
  • Propensity logging operational

Success criteria:

  • 15% profit improvement vs rule-based
  • Bandit policy converges

8. Risk Mitigation

8.1 Quality Degradation Risk

Risk: Budget models produce lower quality, reducing approval rate and CTR.

Mitigation:

  1. Start with conservative thresholds (only low-value traffic to budget)
  2. Monitor approval rate daily, alert if <80%
  3. Circuit breaker: auto-revert to premium if approval drops >10%
  4. Human review sample: 100 budget-generated ads for manual QA
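
The alert and circuit-breaker rules above could be combined into a single check, sketched below. The class name is hypothetical, and "drops >10%" is read here as a relative drop against a known baseline approval rate, which is an assumption; an absolute 10-point drop would need a different comparison.

```python
from dataclasses import dataclass


@dataclass
class ApprovalMonitor:
    baseline_rate: float       # approval rate before budget routing (assumed known)
    alert_below: float = 0.80  # "alert if <80%"
    revert_drop: float = 0.10  # ">10%" drop, interpreted as relative to baseline

    def check(self, approved: int, reviewed: int) -> str:
        """Return 'ok', 'alert', or 'revert_to_premium' for the current window."""
        if reviewed == 0:
            return "ok"  # no data in this window, nothing to act on
        rate = approved / reviewed
        relative_drop = (self.baseline_rate - rate) / self.baseline_rate
        if relative_drop > self.revert_drop:
            return "revert_to_premium"
        if rate < self.alert_below:
            return "alert"
        return "ok"


monitor = ApprovalMonitor(baseline_rate=0.90)
print(monitor.check(approved=88, reviewed=100))
print(monitor.check(approved=79, reviewed=100))
```

In practice the window would need a minimum sample size before tripping the breaker, since small daily volumes make the approval rate noisy.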

8.2 Profitability Threshold Risk

Risk: eCPM thresholds miscalibrated, losing money on expensive generations.

Mitigation:

  1. Default to budget tier unless revenue per impression (eCPM / 1000) exceeds 2× generation cost
  2. Continuous profit analysis per tier
  3. Threshold adjustment automation

8.3 Model Availability Risk

Risk: Primary model down or rate-limited, fallback needed.

Mitigation:

  1. Fallback chain: Gemini 2.0 Flash → Groq Llama → Claude Haiku
  2. Cache model availability status (Redis, 1min TTL)
  3. Alert if fallback rate >5%
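
The fallback chain can be expressed as a loop over models that returns the first successful result. In this sketch, `call_model` stands in for whatever client wrapper production uses, and the model identifiers are placeholders rather than confirmed API names.

```python
from typing import Callable, Sequence

# Order taken from the fallback chain above; identifiers are placeholders.
FALLBACK_CHAIN = ("gemini-2.0-flash", "groq-llama-3.1-8b", "claude-haiku")


def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # e.g. outage or rate limit
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in client that simulates the primary model being rate-limited."""
    if model == "gemini-2.0-flash":
        raise TimeoutError("rate limited")
    return f"[{model}] tagline for: {prompt}"


model_used, output = generate_with_fallback("article title", fake_call)
print(model_used, "->", output)
```

Returning the model name alongside the output makes it possible to log the fallback rate that the alert in item 3 depends on.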

Appendix A: Detailed Pricing Tables

Text Generation (per 1M tokens)

| Model | Provider | Input | Output | Speed | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Fast | 400K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | Fast | 200K |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Very Fast | 1M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Very Fast | 1M |
| Llama 3.1 8B | Groq | $0.05 | $0.08 | Ultra Fast | 128K |
| Mixtral 8x7B | Groq | $0.27 | $0.27 | Very Fast | 32K |
| Claude Haiku | Anthropic | $1.00 | $5.00 | Fast | 200K |

Image Generation

| Model | Provider | Resolution | Cost | Speed |
|---|---|---|---|---|
| Imagen 3 | Google | 1024×1024 | $0.030 | ~8s |
| FLUX.1 [pro] | Replicate | 1024×1024 | $0.055 | ~10s |
| FLUX.1 [dev] | Replicate | 1024×1024 | $0.030 | ~6s |
| FLUX.1 [schnell] | Replicate | 1024×1024 | $0.003 | ~2s |



End of Document
