@rgommers
Last active December 19, 2025 20:20
NumPy test suite runtime estimates with pytest-run-parallel and TSan

On a macOS arm64 (M1) machine, running the NumPy test suite:

Default build:

  • pixi r test -- --collect-only: 10 s
  • pixi r test (1 thread): 163 s.
  • pixi r test -j2 (pytest-xdist): 116 s.

Free-threaded build (python 3.14):

  • pixi r test-nogil -- --collect-only: 10 s
  • pixi r test-nogil -- --collect-only --parallel-threads=2: 30 s
  • pixi r test-nogil (1 thread): 167 s.
  • pixi r test-nogil -j2 (pytest-xdist): 123 s.
  • pixi r test-nogil -- --parallel-threads=2: 270 s.
  • pixi r test-nogil -j2 -- --parallel-threads=2: 192 s.

Now with submodule test selection:

  • pixi r test-nogil -- numpy/_core: 57 s.
  • pixi r test-nogil -- numpy/f2py: 80 s.
  • pixi r test-nogil -- --ignore=numpy/f2py: 78 s.

Putting that all together:

  • pixi r test-nogil -- --parallel-threads=2 --ignore=numpy/f2py: 186 s.
  • pixi r test-nogil -j2 -- --parallel-threads=2 --ignore=numpy/f2py: 123 s.
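
The "shard tests into independent processes" idea behind these numbers can be sketched as a small script that builds one pytest invocation per submodule shard; the shard list below is illustrative only (not an actual partition of the NumPy test suite), and plain `pytest` stands in for the `pixi` tasks.

```python
# Hypothetical sharding sketch: one pytest process per submodule shard.
# The shard paths are examples, not an exhaustive split of the test suite.
shards = ["numpy/_core", "numpy/linalg", "numpy/random", "numpy/ma"]

def shard_command(path):
    # Each shard still runs pytest-run-parallel inside its own process.
    return ["pytest", "--parallel-threads=2", path]

commands = [shard_command(s) for s in shards]
for cmd in commands:
    print(" ".join(cmd))
# Launching each command via subprocess.Popen (or GNU parallel) would run the
# shards concurrently, so total wall time approaches that of the slowest shard.
```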

Benchmarks show that an M3 Ultra is about 30% faster than the M1 single-core, and about 4x faster on a multi-core benchmark. For actually running multiple sets of tests in parallel in separate processes, performance should scale linearly with the number of cores - which would be roughly 5x faster, given 8 vs. 28 cores and 4 vs. 20 performance cores, times the ~30% per-core advantage. I'd expect the upcoming M5 Ultra to give us at least another 30%. So let's say 6.5x faster.

Now assume TSan makes everything 50x slower (that seems to be the worst case). That gives a test suite runtime for the non-slow tests, with f2py excluded, of roughly 186 s × 50 / 6.5 ≈ 1430 s (≈ 24 min). This assumes we can shard the tests easily into independent processes. If that's not the case and we're stuck with something like -j2, it's going to be more like 100 minutes of runtime.
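
The estimate above is easy to reproduce with a few lines of arithmetic. The 50x TSan slowdown and the ~6.5x machine speedup are the assumptions stated in the text; the -j2 fallback uses the 123 s -j2 timing measured above.

```python
# Back-of-envelope TSan runtime estimate, using the numbers above.
tsan_slowdown = 50           # assumed worst-case TSan overhead
machine_speedup = 5 * 1.3    # ~5x from core count, ~30% per core -> 6.5x
base_s = 186                 # --parallel-threads=2 --ignore=numpy/f2py on M1

sharded_min = base_s * tsan_slowdown / machine_speedup / 60
print(f"fully sharded: ~{sharded_min:.0f} min")   # ~24 min

# Fallback if sharding isn't possible and we only get -j2 (123 s above):
j2_min = 123 * tsan_slowdown / 60
print(f"-j2 only: ~{j2_min:.0f} min")             # ~102 min
```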


For SciPy, the test suite is heavier but definitely shardable across submodules. The slowest submodule is scipy.stats, which takes a similar amount of time as all of NumPy minus f2py:

  • pixi r test-nogil -- --parallel-threads=2 scipy/stats: 171 s.
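
Since a fully sharded run is bounded by its slowest shard, applying the same (assumed) 50x slowdown and 6.5x machine speedup to the scipy/stats timing gives a NumPy-like bound:

```python
# Same back-of-envelope as for NumPy, applied to the slowest SciPy shard.
# The 50x and 6.5x factors are the same assumptions as above, not measurements.
stats_s = 171                # scipy/stats with --parallel-threads=2 on M1
wall_min = stats_s * 50 / 6.5 / 60
print(f"slowest shard under TSan: ~{wall_min:.0f} min")  # ~22 min
```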