Skip to content

Instantly share code, notes, and snippets.

@greggdonovan
Last active January 26, 2026 16:30
Show Gist options
  • Select an option

  • Save greggdonovan/f8728ba0f55399edb182260a8f1ce9a2 to your computer and use it in GitHub Desktop.

Select an option

Save greggdonovan/f8728ba0f55399edb182260a8f1ce9a2 to your computer and use it in GitHub Desktop.
Prolist campaign embedding indexer diagram
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

Prolist campaign embedding indexer

flowchart TB
  %% Prolist Campaign Embedding Indexer (CDC -> Bigtable -> Vespa)

  subgraph CDC["CDC ingest"]
    A["Kafka CDC topic<br/>cdc.etsy_shard.prolist_campaign"] --> B["DecodeCdcPayload<br/>KafkaAvroDeserializer + Schema Registry"]
    B --> C["ProlistCampaignChange<br/>shop_id, before/after status, op, ts"]
  end

  C --> D{"Activation event?<br/>(op=c with after=ACTIVE)<br/>OR (op=u: inactive→ACTIVE)"}
  D -->|"no"| X["drop"]
  D -->|"yes"| E["ExtractActivationShopIds"]

  E --> J{"Dry run?"}
  J -->|"yes"| K["Debug output<br/>Logs only"]
  J -->|"no"| L["FetchListingIdsForShop<br/>Bigtable shop→listing index<br/>rowKeyPrefix={shop_id}#"]

  L --> M["KV(shop_id, listing_id)"]
  M --> N["FetchEmbeddingsFromBigtable<br/>listing embeddings table"]
  N --> O{"Embedding parsed?"}
  O -->|"no, placeholders disabled"| X2["drop"]
  O -->|"no, placeholders enabled"| Q["ListingEmbedding<br/>listing_id + zero vector"]
  O -->|"yes"| Q["ListingEmbedding<br/>listing_id + vector"]

  Q --> R["Create Vespa Update JSON<br/>field = ads_air_* (configurable)"]
  R --> S["Fan out by document type<br/>prolist and/or listing"]
  S --> T["VespaFeedFn<br/>mTLS + endpoint"]

  %% Notes for optional behavior
  L -.-> L2["If listingIdSource=none, logs warning and emits nothing"]

Loading

Prolist campaign embedding indexer (CDC → Bigtable → Vespa)

This Dataflow job listens to CDC updates on prolist_campaign and detects when a shop transitions into the ACTIVE status for Ads. For each activation event, it emits the shop id (optionally deduped within a short window), then looks up the shop’s listing ids in a Bigtable shop→listing index. It fetches the latest L2L embedding vector for each listing from the embeddings Bigtable and formats a Vespa partial update for the configured ads_air_* field. Those updates are then fanned out to the target document types (prolist and/or listing) and fed to Vespa over mTLS.

In dry‑run mode, the pipeline does not read Bigtable or write to Vespa; it only logs activation debug lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment