With no CUDA JIT cache
========================= 1 run
cubin real 0m3.553s
fatbin real 0m4.106s
=========================
========================= 10 runs
cubin real 0m38.732s
fatbin real 0m44.738s
=========================
========================= 100 runs
cubin real 5m53.634s
fatbin real 6m52.232s
=========================
========================= 1000 runs
cubin real 58m24.084s
fatbin real 68m42.272s
=========================
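The summary table converts each `real` time into seconds (minutes × 60 + seconds). Below is a minimal Python sketch of that conversion. It assumes every log line has the `<label>[:] real <M>m<S>s` shape used above. `parse_real` is a hypothetical helper name and does not come from any tool.

```python
import re

# Matches a labeled "real" line like those in the log above,
# e.g. "cubin real 5m53.634s" or "cubin: real 0m3.553s".
_REAL_RE = re.compile(r"^(\w+):?\s+real\s+(\d+)m(\d+(?:\.\d+)?)s$")


def parse_real(line: str) -> tuple[str, float]:
    """Return (label, seconds) from one 'real XmY.YYYs' log line."""
    m = _REAL_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a 'real' timing line: {line!r}")
    label, minutes, seconds = m.groups()
    # Minutes become seconds; the fractional seconds are kept as-is.
    return label, int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    for line in ["cubin real 58m24.084s", "fatbin real 68m42.272s"]:
        label, secs = parse_real(line)
        print(f"{label}: {secs:.3f} s")
```

Non-timing lines, such as the `=====` separators, raise `ValueError`. A caller can therefore skip them explicitly and they cannot be silently misread as timings.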
| Runs | CUBIN Total | FATBIN Total | CUBIN/Run | FATBIN/Run | Diff/Run (FATBIN − CUBIN) | Diff (% of CUBIN/Run) |
|---|---|---|---|---|---|---|
| 1 | 3.553 s | 4.106 s | 3.553 s | 4.106 s | 0.553 s | ~15.6% |
| 10 | 38.732 s | 44.738 s | 3.873 s | 4.474 s | 0.601 s | ~15.5% |
| 100 | 353.634 s | 412.232 s | 3.536 s | 4.122 s | 0.586 s | ~16.6% |
| 1000 | 3504.084 s | 4122.272 s | 3.504 s | 4.122 s | 0.618 s | ~17.6% |
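The derived columns follow from the totals:

* per-run time = total / runs
* Diff/Run = FATBIN/Run − CUBIN/Run
* Diff (%) = Diff/Run ÷ CUBIN/Run × 100

The sketch below recomputes these columns from the totals in seconds, as a quick arithmetic check. `summarize` is a hypothetical helper. Rounding to 3 decimals (seconds) and 1 decimal (percent) is an assumption made to match the table's precision.

```python
# Totals in seconds, taken from the `real` times in the log above.
TOTALS = {
    1: (3.553, 4.106),
    10: (38.732, 44.738),
    100: (353.634, 412.232),
    1000: (3504.084, 4122.272),
}


def summarize(runs: int, cubin_total: float, fatbin_total: float):
    """Return (cubin/run, fatbin/run, diff/run, diff % of cubin/run)."""
    cubin_per_run = cubin_total / runs
    fatbin_per_run = fatbin_total / runs
    diff_per_run = fatbin_per_run - cubin_per_run
    diff_pct = 100.0 * diff_per_run / cubin_per_run
    return cubin_per_run, fatbin_per_run, diff_per_run, diff_pct


if __name__ == "__main__":
    for runs, (c, f) in TOTALS.items():
        c_run, f_run, d_run, pct = summarize(runs, c, f)
        # One markdown table row per run count.
        print(f"| {runs} | {c:.3f} s | {f:.3f} s | {c_run:.3f} s | "
              f"{f_run:.3f} s | {d_run:.3f} s | ~{pct:.1f}% |")
```

The percentage is relative to the CUBIN per-run time, so it reads as "how much slower FATBIN loads than CUBIN".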