MaxGhenis · February 10, 2026 14:01
diff --git a/sparse_calibrator_vs_hardconcrete.ipynb b/sparse_calibrator_vs_hardconcrete.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SparseCalibrator vs HardConcrete: reweighting frontier comparison\n",
    "\n",
    "This notebook compares two sparse survey calibration methods on an out-of-sample evaluation:\n",
    "\n",
    "- **SparseCalibrator** ($L_1$ penalty, FISTA solver — convex, deterministic)\n",
    "- **HardConcrete** ($L_0$ penalty via Hard Concrete gates — non-convex, stochastic)\n",
    "\n",
    "Both are implemented in [microplex](https://github.com/CosilicoAI/microplex). HardConcrete wraps PolicyEngine's [l0-python](https://github.com/PolicyEngine/l0-python) package, which is currently used in [policyengine-us-data](https://github.com/PolicyEngine/policyengine-us-data) for survey weight calibration.\n",
    "\n",
    "## Setup\n",
    "\n",
    "```bash\n",
    "pip install microplex l0-python\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from microplex.calibration import SparseCalibrator, HardConcreteCalibrator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate synthetic population\n",
    "\n",
    "5,000 records with age groups, sex, and survey weights — mimicking a CPS-like dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 5_000\n",
    "rng = np.random.RandomState(42)\n",
    "\n",
    "df = pd.DataFrame({\n",
    "    \"age_group\": rng.choice([\"0-17\", \"18-34\", \"35-54\", \"55-64\", \"65+\"], N,\n",
    "                             p=[0.22, 0.22, 0.26, 0.13, 0.17]),\n",
    "    \"is_male\": rng.choice([True, False], N, p=[0.49, 0.51]),\n",
    "    \"weight\": rng.lognormal(8.0, 0.5, N),  # Survey weights ~3000 mean\n",
    "})\n",
    "print(f\"Records: {len(df):,}\")\n",
    "print(f\"Mean weight: {df['weight'].mean():.0f}\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train/test target split\n",
    "\n",
    "- **Train targets** (used during calibration): age group marginals + total weight (6 targets)\n",
    "- **Test targets** (held out): sex marginals (2 targets)\n",
    "\n",
    "Targets are perturbed 10-30% from the sample distribution to simulate calibration to external population totals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rng2 = np.random.RandomState(123)\n",
    "\n",
    "# Train: age_group marginals\n",
    "train_marginal = {\"age_group\": {\n",
    "    cat: round(count * rng2.uniform(0.7, 1.3))\n",
    "    for cat, count in df[\"age_group\"].value_counts().items()\n",
    "}}\n",
    "\n",
    "# Train: total weight (continuous)\n",
    "train_continuous = {\"weight\": round(df[\"weight\"].sum() * rng2.uniform(0.9, 1.1))}\n",
    "\n",
    "# Test: sex marginals (held out during calibration)\n",
    "test_marginal = {\"is_male\": {\n",
    "    cat: round(count * rng2.uniform(0.8, 1.2))\n",
    "    for cat, count in df[\"is_male\"].value_counts().items()\n",
    "}}\n",
    "\n",
    "print(\"Train targets:\")\n",
    "for var, targets in {**train_marginal, **{\"weight\": train_continuous}}.items():\n",
    "    print(f\"  {var}: {targets}\")\n",
    "print(f\"\\nTest targets (held out):\")\n",
    "for var, targets in test_marginal.items():\n",
    "    print(f\"  {var}: {targets}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(df, weights, marginal_targets):\n",
    "    \"\"\"Mean absolute relative error on a set of marginal targets.\"\"\"\n",
    "    errors = []\n",
    "    for var, var_targets in marginal_targets.items():\n",
    "        for cat, target in var_targets.items():\n",
    "            mask = df[var] == cat\n",
    "            actual = float(weights[mask].sum())\n",
    "            errors.append(abs(actual - target) / target if target > 0 else 0)\n",
    "    return np.mean(errors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sweep SparseCalibrator (L1, convex)\n",
    "\n",
    "SparseCalibrator solves:\n",
    "$$\\min \\frac{1}{2} \\sum_i \\left(\\frac{A_i w - b_i}{|b_i|}\\right)^2 + \\lambda \\|w\\|_1 \\quad \\text{s.t.} \\quad w \\geq 0$$\n",
    "\n",
    "This is non-negative LASSO with relative loss (via target normalization). FISTA solver — convex, deterministic, no random seed dependence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sc_lambdas = [0.0, 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]\n",
    "sc_results = []\n",
    "\n",
    "for lam in sc_lambdas:\n",
    "    cal = SparseCalibrator(sparsity_weight=lam)\n",
    "    t0 = time.time()\n",
    "    cal.fit(df, train_marginal, train_continuous)\n",
    "    elapsed = time.time() - t0\n",
    "    w = cal.weights_\n",
    "    n_active = int((w > 1e-9).sum())\n",
    "    sc_results.append({\n",
    "        \"lambda\": lam,\n",
    "        \"n_active\": n_active,\n",
    "        \"train_error\": evaluate(df, w, train_marginal),\n",
    "        \"test_error\": evaluate(df, w, test_marginal),\n",
    "        \"elapsed\": elapsed,\n",
    "    })\n",
    "    print(f\"  λ={lam:<6} n_active={n_active:>5}  test={sc_results[-1]['test_error']:.1%}  {elapsed:.3f}s\")\n",
    "\n",
    "sc_df = pd.DataFrame(sc_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sweep HardConcrete (L0, non-convex)\n",
    "\n",
    "HardConcrete uses differentiable $L_0$ gates (Hard Concrete distribution, Louizos et al. 2018) to jointly optimize which records to keep and what weights to assign. Non-convex — results depend on random initialization.\n",
    "\n",
    "We run **5 seeds per λ** to get reliable mean ± SE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "hc_lambdas = [1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2]\n",
    "n_seeds = 5\n",
    "hc_results = []\n",
    "\n",
    "for lam in hc_lambdas:\n",
    "    seed_errors = []\n",
    "    seed_actives = []\n",
    "    seed_times = []\n",
    "    for seed in range(n_seeds):\n",
    "        torch.manual_seed(42 + seed)\n",
    "        np.random.seed(42 + seed)\n",
    "        cal = HardConcreteCalibrator(lambda_l0=lam, epochs=2000, verbose=False)\n",
    "        t0 = time.time()\n",
    "        cal.fit(df, train_marginal, train_continuous)\n",
    "        elapsed = time.time() - t0\n",
    "        w = cal.weights_\n",
    "        seed_errors.append(evaluate(df, w, test_marginal))\n",
    "        seed_actives.append(int((w > 1e-9).sum()))\n",
    "        seed_times.append(elapsed)\n",
    "    hc_results.append({\n",
    "        \"lambda\": lam,\n",
    "        \"n_active_mean\": np.mean(seed_actives),\n",
    "        \"test_error_mean\": np.mean(seed_errors),\n",
    "        \"test_error_se\": np.std(seed_errors) / np.sqrt(n_seeds),\n",
    "        \"elapsed_mean\": np.mean(seed_times),\n",
    "    })\n",
    "    r = hc_results[-1]\n",
    "    print(f\"  λ={lam:<8.0e} n_active={r['n_active_mean']:>6.0f}  \"\n",
    "          f\"test={r['test_error_mean']:.1%} ± {r['test_error_se']:.1%}  \"\n",
    "          f\"{r['elapsed_mean']:.2f}s\")\n",
    "\n",
    "hc_df = pd.DataFrame(hc_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Frontier plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(8, 5))\n",
    "\n",
    "# SparseCalibrator frontier\n",
    "sc_sorted = sc_df.sort_values(\"n_active\")\n",
    "ax.plot(sc_sorted[\"n_active\"], sc_sorted[\"test_error\"],\n",
    "        \"s-\", color=\"#2ca02c\", linewidth=1.5, markersize=6,\n",
    "        label=r\"SparseCalibrator ($L_1$, convex)\")\n",
    "\n",
    "# HardConcrete frontier with error bars\n",
    "hc_sorted = hc_df.sort_values(\"n_active_mean\")\n",
    "ax.errorbar(hc_sorted[\"n_active_mean\"], hc_sorted[\"test_error_mean\"],\n",
    "            yerr=hc_sorted[\"test_error_se\"],\n",
    "            fmt=\"o-\", color=\"#1f77b4\", linewidth=1.5, markersize=6,\n",
    "            capsize=3, capthick=1,\n",
    "            label=r\"HardConcrete ($L_0$, non-convex)\")\n",
    "\n",
    "ax.set_xlabel(\"Active records (non-zero weight)\", fontsize=12)\n",
    "ax.set_ylabel(\"Out-of-sample error\\n(held-out sex margin)\", fontsize=12)\n",
    "ax.set_xscale(\"log\")\n",
    "ax.set_ylim(0, None)\n",
    "ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f\"{y:.0%}\"))\n",
    "ax.legend(fontsize=10)\n",
    "ax.grid(True, alpha=0.2)\n",
    "ax.set_title(\"Reweighting frontier: records used vs out-of-sample error\", fontsize=13)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Runtime comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"SparseCalibrator runtime (all λ values):\")\n",
    "print(f\"  Mean: {sc_df['elapsed'].mean():.3f}s\")\n",
    "print(f\"  Max:  {sc_df['elapsed'].max():.3f}s\")\n",
    "\n",
    "print(f\"\\nHardConcrete runtime (2000 epochs, all λ values):\")\n",
    "print(f\"  Mean: {hc_df['elapsed_mean'].mean():.2f}s\")\n",
    "print(f\"  Max:  {hc_df['elapsed_mean'].max():.2f}s\")\n",
    "\n",
    "print(f\"\\nSpeedup: {hc_df['elapsed_mean'].mean() / sc_df['elapsed'].mean():.0f}x\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "| Property | SparseCalibrator ($L_1$) | HardConcrete ($L_0$) |\n",
    "|----------|------------------------|---------------------|\n",
    "| **Solver** | FISTA (convex) | Adam SGD (non-convex) |\n",
    "| **Deterministic** | Yes | No (seed-dependent) |\n",
    "| **Dominates frontier** | Yes | No |\n",
    "| **Runtime (5K records)** | ~0.02s | ~1.8s (2000 epochs) |\n",
    "| **Loss function** | Relative MSE (via normalization) | Relative MSE |\n",
    "| **Regularization** | $\\lambda \\|w\\|_1$ (weight shrinkage) | $\\lambda_{L_0} \\|\\alpha\\|_0$ (gate selection) |\n",
    "\n",
    "### Key findings\n",
    "\n",
    "1. **SparseCalibrator dominates the entire frontier** — lower or equal out-of-sample error at every sparsity level\n",
    "2. **HardConcrete variance explodes at high sparsity** — ±10-14% SE below 100 records vs zero variance for SparseCalibrator\n",
    "3. **SparseCalibrator is 12-90x faster** depending on configuration\n",
    "4. **SparseCalibrator already supports relative loss** — `normalize_targets=True` (default) divides each constraint row by `|b_i|`, which is mathematically equivalent to `loss_type=\"relative\"` in l0-python\n",
    "5. **No new method** — SparseCalibrator is non-negative LASSO (Tibshirani 1996) solved with FISTA (Beck & Teboulle 2009). The contribution is the empirical comparison, not the algorithm.\n",
    "\n",
    "### Recommendation\n",
    "\n",
    "Replace `l0-python`'s `SparseCalibrationWeights` with `microplex`'s `SparseCalibrator` for survey weight calibration in PolicyEngine datasets. The convex formulation is faster, more reliable, and produces better out-of-sample generalization."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.14.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# SparseCalibrator vs HardConcrete: reweighting frontier comparison\n",
	"\n",
	"This notebook compares two sparse survey calibration methods on an out-of-sample evaluation:\n",
	"\n",
	"- SparseCalibrator ($L_1$ penalty, FISTA solver — convex, deterministic)\n",
	"- HardConcrete ($L_0$ penalty via Hard Concrete gates — non-convex, stochastic)\n",
	"\n",
	"Both are implemented in [microplex](https://github.com/CosilicoAI/microplex). HardConcrete wraps PolicyEngine's [l0-python](https://github.com/PolicyEngine/l0-python) package, which is currently used in [policyengine-us-data](https://github.com/PolicyEngine/policyengine-us-data) for survey weight calibration.\n",
	"\n",
	"## Setup\n",
	"\n",
	"```bash\n",
	"pip install microplex l0-python\n",
	"```"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"import time\n",
	"import numpy as np\n",
	"import pandas as pd\n",
	"import matplotlib.pyplot as plt\n",
	"from microplex.calibration import SparseCalibrator, HardConcreteCalibrator"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Generate synthetic population\n",
	"\n",
	"5,000 records with age groups, sex, and survey weights — mimicking a CPS-like dataset."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"N = 5_000\n",
	"rng = np.random.RandomState(42)\n",
	"\n",
	"df = pd.DataFrame({\n",
	" \"age_group\": rng.choice([\"0-17\", \"18-34\", \"35-54\", \"55-64\", \"65+\"], N,\n",
	" p=[0.22, 0.22, 0.26, 0.13, 0.17]),\n",
	" \"is_male\": rng.choice([True, False], N, p=[0.49, 0.51]),\n",
	" \"weight\": rng.lognormal(8.0, 0.5, N), # Survey weights ~3000 mean\n",
	"})\n",
	"print(f\"Records: {len(df):,}\")\n",
	"print(f\"Mean weight: {df['weight'].mean():.0f}\")\n",
	"df.head()"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Train/test target split\n",
	"\n",
	"- Train targets (used during calibration): age group marginals + total weight (6 targets)\n",
	"- Test targets (held out): sex marginals (2 targets)\n",
	"\n",
	"Targets are perturbed 10-30% from the sample distribution to simulate calibration to external population totals."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"rng2 = np.random.RandomState(123)\n",
	"\n",
	"# Train: age_group marginals\n",
	"train_marginal = {\"age_group\": {\n",
	" cat: round(count * rng2.uniform(0.7, 1.3))\n",
	" for cat, count in df[\"age_group\"].value_counts().items()\n",
	"}}\n",
	"\n",
	"# Train: total weight (continuous)\n",
	"train_continuous = {\"weight\": round(df[\"weight\"].sum() * rng2.uniform(0.9, 1.1))}\n",
	"\n",
	"# Test: sex marginals (held out during calibration)\n",
	"test_marginal = {\"is_male\": {\n",
	" cat: round(count * rng2.uniform(0.8, 1.2))\n",
	" for cat, count in df[\"is_male\"].value_counts().items()\n",
	"}}\n",
	"\n",
	"print(\"Train targets:\")\n",
	"for var, targets in {train_marginal, {\"weight\": train_continuous}}.items():\n",
	" print(f\" {var}: {targets}\")\n",
	"print(f\"\\nTest targets (held out):\")\n",
	"for var, targets in test_marginal.items():\n",
	" print(f\" {var}: {targets}\")"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Evaluation function"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"def evaluate(df, weights, marginal_targets):\n",
	" \"\"\"Mean absolute relative error on a set of marginal targets.\"\"\"\n",
	" errors = []\n",
	" for var, var_targets in marginal_targets.items():\n",
	" for cat, target in var_targets.items():\n",
	" mask = df[var] == cat\n",
	" actual = float(weights[mask].sum())\n",
	" errors.append(abs(actual - target) / target if target > 0 else 0)\n",
	" return np.mean(errors)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Sweep SparseCalibrator (L1, convex)\n",
	"\n",
	"SparseCalibrator solves:\n",
	"$$\\min \\frac{1}{2} \\sum_i \\left(\\frac{A_i w - b_i}{\|b_i\|}\\right)^2 + \\lambda \\\|w\\\|_1 \\quad \\text{s.t.} \\quad w \\geq 0$$\n",
	"\n",
	"This is non-negative LASSO with relative loss (via target normalization). FISTA solver — convex, deterministic, no random seed dependence."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"sc_lambdas = [0.0, 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]\n",
	"sc_results = []\n",
	"\n",
	"for lam in sc_lambdas:\n",
	" cal = SparseCalibrator(sparsity_weight=lam)\n",
	" t0 = time.time()\n",
	" cal.fit(df, train_marginal, train_continuous)\n",
	" elapsed = time.time() - t0\n",
	" w = cal.weights_\n",
	" n_active = int((w > 1e-9).sum())\n",
	" sc_results.append({\n",
	" \"lambda\": lam,\n",
	" \"n_active\": n_active,\n",
	" \"train_error\": evaluate(df, w, train_marginal),\n",
	" \"test_error\": evaluate(df, w, test_marginal),\n",
	" \"elapsed\": elapsed,\n",
	" })\n",
	" print(f\" λ={lam:<6} n_active={n_active:>5} test={sc_results[-1]['test_error']:.1%} {elapsed:.3f}s\")\n",
	"\n",
	"sc_df = pd.DataFrame(sc_results)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Sweep HardConcrete (L0, non-convex)\n",
	"\n",
	"HardConcrete uses differentiable $L_0$ gates (Hard Concrete distribution, Louizos et al. 2018) to jointly optimize which records to keep and what weights to assign. Non-convex — results depend on random initialization.\n",
	"\n",
	"We run 5 seeds per λ to get reliable mean ± SE."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"import torch\n",
	"\n",
	"hc_lambdas = [1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2]\n",
	"n_seeds = 5\n",
	"hc_results = []\n",
	"\n",
	"for lam in hc_lambdas:\n",
	" seed_errors = []\n",
	" seed_actives = []\n",
	" seed_times = []\n",
	" for seed in range(n_seeds):\n",
	" torch.manual_seed(42 + seed)\n",
	" np.random.seed(42 + seed)\n",
	" cal = HardConcreteCalibrator(lambda_l0=lam, epochs=2000, verbose=False)\n",
	" t0 = time.time()\n",
	" cal.fit(df, train_marginal, train_continuous)\n",
	" elapsed = time.time() - t0\n",
	" w = cal.weights_\n",
	" seed_errors.append(evaluate(df, w, test_marginal))\n",
	" seed_actives.append(int((w > 1e-9).sum()))\n",
	" seed_times.append(elapsed)\n",
	" hc_results.append({\n",
	" \"lambda\": lam,\n",
	" \"n_active_mean\": np.mean(seed_actives),\n",
	" \"test_error_mean\": np.mean(seed_errors),\n",
	" \"test_error_se\": np.std(seed_errors) / np.sqrt(n_seeds),\n",
	" \"elapsed_mean\": np.mean(seed_times),\n",
	" })\n",
	" r = hc_results[-1]\n",
	" print(f\" λ={lam:<8.0e} n_active={r['n_active_mean']:>6.0f} \"\n",
	" f\"test={r['test_error_mean']:.1%} ± {r['test_error_se']:.1%} \"\n",
	" f\"{r['elapsed_mean']:.2f}s\")\n",
	"\n",
	"hc_df = pd.DataFrame(hc_results)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Frontier plot"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"fig, ax = plt.subplots(figsize=(8, 5))\n",
	"\n",
	"# SparseCalibrator frontier\n",
	"sc_sorted = sc_df.sort_values(\"n_active\")\n",
	"ax.plot(sc_sorted[\"n_active\"], sc_sorted[\"test_error\"],\n",
	" \"s-\", color=\"#2ca02c\", linewidth=1.5, markersize=6,\n",
	" label=r\"SparseCalibrator ($L_1$, convex)\")\n",
	"\n",
	"# HardConcrete frontier with error bars\n",
	"hc_sorted = hc_df.sort_values(\"n_active_mean\")\n",
	"ax.errorbar(hc_sorted[\"n_active_mean\"], hc_sorted[\"test_error_mean\"],\n",
	" yerr=hc_sorted[\"test_error_se\"],\n",
	" fmt=\"o-\", color=\"#1f77b4\", linewidth=1.5, markersize=6,\n",
	" capsize=3, capthick=1,\n",
	" label=r\"HardConcrete ($L_0$, non-convex)\")\n",
	"\n",
	"ax.set_xlabel(\"Active records (non-zero weight)\", fontsize=12)\n",
	"ax.set_ylabel(\"Out-of-sample error\\n(held-out sex margin)\", fontsize=12)\n",
	"ax.set_xscale(\"log\")\n",
	"ax.set_ylim(0, None)\n",
	"ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f\"{y:.0%}\"))\n",
	"ax.legend(fontsize=10)\n",
	"ax.grid(True, alpha=0.2)\n",
	"ax.set_title(\"Reweighting frontier: records used vs out-of-sample error\", fontsize=13)\n",
	"plt.tight_layout()\n",
	"plt.show()"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Runtime comparison"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"print(\"SparseCalibrator runtime (all λ values):\")\n",
	"print(f\" Mean: {sc_df['elapsed'].mean():.3f}s\")\n",
	"print(f\" Max: {sc_df['elapsed'].max():.3f}s\")\n",
	"\n",
	"print(f\"\\nHardConcrete runtime (2000 epochs, all λ values):\")\n",
	"print(f\" Mean: {hc_df['elapsed_mean'].mean():.2f}s\")\n",
	"print(f\" Max: {hc_df['elapsed_mean'].max():.2f}s\")\n",
	"\n",
	"print(f\"\\nSpeedup: {hc_df['elapsed_mean'].mean() / sc_df['elapsed'].mean():.0f}x\")"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Summary\n",
	"\n",
	"\| Property \| SparseCalibrator ($L_1$) \| HardConcrete ($L_0$) \|\n",
	"\|----------\|------------------------\|---------------------\|\n",
	"\| Solver \| FISTA (convex) \| Adam SGD (non-convex) \|\n",
	"\| Deterministic \| Yes \| No (seed-dependent) \|\n",
	"\| Dominates frontier \| Yes \| No \|\n",
	"\| Runtime (5K records) \| ~0.02s \| ~1.8s (2000 epochs) \|\n",
	"\| Loss function \| Relative MSE (via normalization) \| Relative MSE \|\n",
	"\| Regularization \| $\\lambda \\\|w\\\|_1$ (weight shrinkage) \| $\\lambda_{L_0} \\\|\\alpha\\\|_0$ (gate selection) \|\n",
	"\n",
	"### Key findings\n",
	"\n",
	"1. SparseCalibrator dominates the entire frontier — lower or equal out-of-sample error at every sparsity level\n",
	"2. HardConcrete variance explodes at high sparsity — ±10-14% SE below 100 records vs zero variance for SparseCalibrator\n",
	"3. SparseCalibrator is 12-90x faster depending on configuration\n",
	"4. SparseCalibrator already supports relative loss — `normalize_targets=True` (default) divides each constraint row by `\|b_i\|`, which is mathematically equivalent to `loss_type=\"relative\"` in l0-python\n",
	"5. No new method — SparseCalibrator is non-negative LASSO (Tibshirani 1996) solved with FISTA (Beck & Teboulle 2009). The contribution is the empirical comparison, not the algorithm.\n",
	"\n",
	"### Recommendation\n",
	"\n",
	"Replace `l0-python`'s `SparseCalibrationWeights` with `microplex`'s `SparseCalibrator` for survey weight calibration in PolicyEngine datasets. The convex formulation is faster, more reliable, and produces better out-of-sample generalization."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"name": "python",
	"version": "3.14.0"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 4
	}
No results found