Prepared for Vahid's meeting with the ONS firm data team, December 2025
PolicyEngine is exploring whether to build an overlapping generations (OLG) model for the UK, similar to OG-USA/OG-Core but with greater firm-level heterogeneity. We also want to expand our firm microsimulation capabilities for tax policy analysis (VAT, corporation tax, business rates).
This document outlines what firm-level data would be most valuable from ONS.
Based on the Trends in UK Business Dynamism and Productivity bulletin:
| Source | Coverage | Key Variables |
|---|---|---|
| Annual Business Survey (ABS) | 1997-2023, ~98% of turnover | Turnover, employment, GVA, intermediate consumption |
| Longitudinal Business Database (LBD) | 1999-present, quarterly | Firm entry/exit, employment dynamics |
| Inter-departmental Business Register (IDBR) | Universe of businesses | Registration, legal status |
- Firm size distributions by turnover band
- Entry/exit rates (aggregate and by broad sector)
- Sectoral capital-output ratios
- Labor shares by sector
- Aggregate markup trends
- Job creation/destruction rates
Individual moments exist, but cross-tabulations are limited. For calibrating heterogeneous firm models, we need the shape of distributions and correlations between characteristics.
Our VAT threshold analysis constructed synthetic microdata via optimization because joint distributions weren't published; only marginals were available, from ONS and HMRC separately. Better data would enable:
- More accurate policy simulations
- Behavioral response modeling (bunching near thresholds)
- Distributional analysis by firm type
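Whether bunching near a threshold is visible at all depends on bin width. The sketch below uses made-up turnover values (not ONS or HMRC data) and a hypothetical helper, `bin_counts`, to show the same firms counted in £50k bands and in £5k bins.

```python
from collections import Counter

def bin_counts(turnovers, width, lo, hi):
    """Count firms with turnover in [lo, hi), keyed by the lower edge of each bin."""
    counts = Counter({edge: 0 for edge in range(lo, hi, width)})
    for t in turnovers:
        if lo <= t < hi:
            counts[lo + ((t - lo) // width) * width] += 1
    return dict(sorted(counts.items()))

# Hypothetical turnovers (£): 40 evenly spaced firms plus 5 clustered just under £90k.
turnovers = list(range(50_000, 150_000, 2_500))
turnovers += [88_000, 88_500, 89_000, 89_500, 89_900]

coarse = bin_counts(turnovers, 50_000, 50_000, 150_000)  # £50k bands
fine = bin_counts(turnovers, 5_000, 50_000, 150_000)     # £5k bins

print(coarse)
print(fine[85_000], fine[90_000])
```

In the £50k bands the cluster only shows up as a slightly larger first band (25 firms against 20). In the £5k bins, the £85k-£90k bin holds 7 firms while every other bin holds 2, which is the kind of local spike a bunching estimator needs to see.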
OG-Core's firm sector uses representative firms per sector. For heterogeneous firms, we need:
| Parameter Type | Examples |
|---|---|
| Production | Capital-labor ratios, TFP distributions, markup dispersion |
| Dynamics | Entry/exit rates, growth transitions, survival by age |
| Labor | Wage distributions by firm size, hiring patterns |
We evaluated candidate cross-tabulations against five criteria:
- Calibration value - Improves model fit to the real economy
- New policy questions - Unlocks analyses we can't currently do
- Publication potential - Leads to high-visibility outputs
- Feasibility - Likelihood ONS can/will provide
- Reusability - Useful across multiple projects
| Priority | Cross-Tabulation | Rationale |
|---|---|---|
| 1 | Entry/Exit rates × Sector × Firm Size | Core to OLG dynamics, high feasibility |
| 2 | Fine turnover bins (£5k increments) from £50k-£150k × Sector | Critical for VAT/corp tax threshold analysis |
| 3 | Turnover × Firm Age × Sector | Enables firm lifecycle modeling |
| 4 | Capital stock × Employment × Sector | Essential for production function calibration |
| 5 | Turnover × Employment × Sector | Baseline joint distribution |
| 6 | Size class transition matrices (panel) | Highest reuse value for dynamic models |
| 7 | Markup × Firm Size × Sector | Novel for market power analysis |
| 8 | Wage bill × Turnover × Sector | Labor share heterogeneity |
- Why: OLG models need firm birth/death rates to calibrate steady-state firm distributions and transition dynamics
- Current gap: Published rates are aggregate or by broad sector only
- Ideal format: Annual rates by 2-digit SIC, size class (micro/small/medium/large), for last 10 years
- Why: The VAT registration threshold (£90k), the corporation tax small profits threshold (£50k), and other policy thresholds create behavioral responses. Current ONS turnover bands (e.g., £50k-£99k) are too coarse to identify bunching.
- Current gap: Finest public data is £50k bands
- Ideal format: £5k bins from £50k-£150k, by sector
- Why: Firm age is strongly predictive of growth, exit risk, and productivity. Essential for lifecycle modeling.
- Current gap: Age distributions exist separately from turnover distributions
- Ideal format: Joint distribution by firm age cohort (0-2, 3-5, 6-10, 11+), turnover band, sector
- Why: Calibrating production functions requires knowing capital intensity variation across firm types
- Current gap: Capital stocks published at sector level only
- Ideal format: Average capital per worker by sector and firm size class
- Why: Dynamic models need P(size class at t+1 | size class at t) to calibrate adjustment costs and growth processes
- Current gap: Not published; would require LBD panel analysis
- Ideal format: 5×5 transition matrix (by size class) for each major sector, annual
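As an illustration of the transition-matrix request, here is a minimal sketch of how P(size class at t+1 | size class at t) could be estimated from a firm panel. The panel rows are invented, and treating "exit" as the fifth state alongside micro/small/medium/large is our assumption; ONS may define the five classes differently.

```python
from collections import defaultdict

# Assumed states: four size classes plus exit (the fifth state is our assumption).
SIZE_CLASSES = ["micro", "small", "medium", "large", "exit"]

def transition_matrix(panel):
    """Estimate P[from][to] from rows of (firm_id, year, size_class)."""
    by_firm = defaultdict(dict)
    for firm_id, year, size in panel:
        by_firm[firm_id][year] = size

    # Count observed one-year transitions for each firm.
    counts = {a: {b: 0 for b in SIZE_CLASSES} for a in SIZE_CLASSES}
    for years in by_firm.values():
        for year, size in years.items():
            nxt = years.get(year + 1)
            if nxt is not None:
                counts[size][nxt] += 1

    # Normalise each row to probabilities; rows with no observations stay at zero.
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return probs

# Hypothetical panel, for illustration only.
panel = [
    (1, 2020, "micro"), (1, 2021, "micro"), (1, 2022, "small"),
    (2, 2020, "micro"), (2, 2021, "exit"),
    (3, 2020, "small"), (3, 2021, "small"), (3, 2022, "medium"),
    (4, 2020, "micro"), (4, 2021, "small"),
]
P = transition_matrix(panel)
print(P["micro"])
```

A tabulation in this form, one matrix per major sector and year, would contain only cell probabilities (or counts) rather than firm records, which may make it easier to clear for release than the underlying panel.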
If custom tabulations aren't feasible:
- Secure Research Service - Direct access to anonymised microdata (requires accreditation, but PolicyEngine has academic collaborators)
- Existing detailed tables - ONS may have unpublished tables from previous projects
- HMRC linkage - Some data may exist in linked ONS-HMRC datasets
- What's the process for requesting custom tabulations vs. requiring SRS access?
- Are there existing unpublished tables that match our needs?
- What's the timeline for data requests?
- Is there appetite for a formal data-sharing agreement for ongoing research?
- Can longitudinal linkages (LBD panel) be provided as tabulations, or only via SRS?
This could be a two-way collaboration. We've built open-source tools that may address challenges ONS faces:
The problem: Bespoke models for corporation tax, business rates, or VAT policy analysis often end up as one-off spreadsheets or scripts. Does ONS build models of this kind?
Our approach: PolicyEngine's rules-as-code framework encodes tax-benefit logic in a modular, version-controlled, testable way. We currently cover household taxes and benefits; extending to firm taxes would mean:
- Transparent, auditable corporation tax calculations
- Easy scenario analysis (rate changes, threshold shifts, reliefs)
- API access for integration with other tools
- Automatic handling of policy changes over time
We'd be interested in collaborating on a firm tax module if ONS sees value.
The problem: Combining multiple firm datasets (e.g., ABS survey data with HMRC administrative tax records) requires statistical matching or imputation when direct linkage isn't possible.
Our approach: microimpute is our open-source package for imputing variables across datasets. It supports:
- Quantile regression forests for continuous variables
- Multiple imputation for uncertainty quantification
- Calibration to known marginals
- Donor-based matching methods
If ONS needs to combine, say, detailed ABS characteristics with HMRC tax liability data without full record linkage, microimpute could help.
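To make the donor-based idea concrete, the sketch below implements a bare-bones nearest-donor match in plain Python. It is not microimpute's API; the function name, the variable names, and the records are all hypothetical, and a real application would at least standardize the matching variables and carry imputation uncertainty through.

```python
def nearest_donor_impute(recipients, donors, keys, target):
    """Copy `target` from the closest donor, by squared Euclidean distance on `keys`."""
    imputed = []
    for r in recipients:
        best = min(donors, key=lambda d: sum((r[k] - d[k]) ** 2 for k in keys))
        imputed.append({**r, target: best[target]})
    return imputed

# Hypothetical donor records (tax-style data that includes the liability variable).
donors = [
    {"log_turnover": 11.0, "log_employment": 1.0, "tax_liability": 4_000},
    {"log_turnover": 13.0, "log_employment": 3.0, "tax_liability": 60_000},
    {"log_turnover": 15.0, "log_employment": 5.0, "tax_liability": 900_000},
]
# Hypothetical recipient records (survey-style data that lacks the liability variable).
recipients = [
    {"log_turnover": 11.2, "log_employment": 1.1},
    {"log_turnover": 14.6, "log_employment": 4.8},
]

matched = nearest_donor_impute(
    recipients, donors, keys=["log_turnover", "log_employment"], target="tax_liability"
)
print([m["tax_liability"] for m in matched])
```

Each recipient inherits the liability of its closest donor, so the imputed values are always values that actually occur in the donor file.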
The problem: Producing reliable estimates at fine geographic levels (constituencies, local authorities) when surveys are designed for national/regional representativeness.
Our approach: We've built survey-enhance for reweighting survey data to match area-level targets. For UK household data, we calibrate to all 650 parliamentary constituencies using:
- Gradient-based optimization for survey weights
- Multiple constraint types (means, totals, quantiles)
- Entropy-based regularization to stay close to original design weights
This could apply to firm surveys if ONS wants constituency-level business statistics without running massive sample boosts.
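The reweighting idea can be sketched in a few lines. The code below is a simplified illustration, not survey-enhance's API: it runs plain gradient descent on log weight adjustments, penalizes squared relative error against each target, and adds an entropy-style penalty that pulls weights back toward the design weights. The function name, the parameter values (`lam`, `lr`, `steps`), and the four-firm sample are all assumptions made for the example.

```python
import math

def reweight(features, design_weights, targets, lam=0.01, lr=0.5, steps=2000):
    """Adjust weights so weighted feature totals approach `targets`.

    Weights are parameterised as w_i = w0_i * exp(u_i), which keeps them positive.
    """
    n, m = len(design_weights), len(targets)
    u = [0.0] * n
    total_w0 = sum(design_weights)
    for _ in range(steps):
        w = [w0 * math.exp(ui) for w0, ui in zip(design_weights, u)]
        # Relative error of each weighted total against its target.
        err = [
            (sum(w[i] * features[i][j] for i in range(n)) - targets[j]) / targets[j]
            for j in range(m)
        ]
        for i in range(n):
            # Gradient of the summed squared relative errors with respect to u_i.
            fit = sum(2 * err[j] * features[i][j] * w[i] / targets[j] for j in range(m))
            # Gradient of the entropy-style penalty w0 * (r*log(r) - r + 1), with r = exp(u_i).
            reg = lam * design_weights[i] / total_w0 * u[i] * math.exp(u[i])
            u[i] -= lr * (fit + reg)
    return [w0 * math.exp(ui) for w0, ui in zip(design_weights, u)]

# Hypothetical sample: each row is (1 for the firm count, employees).
features = [(1, 2), (1, 5), (1, 20), (1, 100)]
design_weights = [10.0, 10.0, 10.0, 10.0]
targets = [40.0, 1500.0]  # area-level firm count and total employment (made up)

weights = reweight(features, design_weights, targets)
firm_count = sum(weights)
employment = sum(w * f[1] for w, f in zip(weights, features))
print([round(w, 2) for w in weights], round(firm_count, 1), round(employment, 1))
```

The design weights imply 1,270 employees against a target of 1,500, so the procedure shifts weight toward the largest firm while holding the firm count near 40. A production version would use an autodiff library and many more targets, but the structure of the loss is the same.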
The problem: Releasing microdata with disclosure control is costly; synthetic data is an alternative but quality varies.
Our approach: For our VAT analysis, we generated synthetic firm microdata that matches published marginals from multiple sources. The optimization-based approach ensures:
- Exact calibration to published totals
- Plausible joint distributions
- No disclosure risk (fully synthetic)
We could collaborate on synthetic firm datasets that ONS could release publicly and that we could use for modeling.
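For a sense of how published marginals can pin down a joint table, here is a small sketch using iterative proportional fitting (IPF). IPF is a standard stand-in chosen for brevity, not the optimization routine used in our VAT analysis, and the seed table and marginal totals below are invented.

```python
def ipf(seed, row_targets, col_targets, iters=100):
    """Scale a positive seed table until its row and column sums match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iters):
        # Rescale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [v * target / s for v in table[i]]
        # Rescale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
    return table

# Hypothetical seed: a prior guess at how two sectors spread across three turnover bands.
seed = [[1.0, 1.0, 1.0],
        [1.0, 2.0, 4.0]]
sector_totals = [600.0, 400.0]       # firms per sector (made up)
band_totals = [500.0, 300.0, 200.0]  # firms per turnover band (made up)

joint = ipf(seed, sector_totals, band_totals)
for row in joint:
    print([round(v, 1) for v in row])
```

The fitted table reproduces both sets of marginals while keeping the association pattern in the seed. That is also why the quality of a synthetic joint distribution depends heavily on the seed: genuine cross-tabulations from ONS, even coarse ones, would replace guesswork at exactly this step.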
PolicyEngine is a nonprofit that builds open-source tax and benefit microsimulation models. Our UK model covers the full tax-benefit system and is used by researchers, journalists, and policymakers. We're expanding into firm-level modeling to analyze business taxation and macroeconomic policy.
- Vahid Ahmadi - vahid@policyengine.org
- Max Ghenis - max@policyengine.org
Last updated: December 2025