Performance Benchmarks

Across the 17 rules we benchmark, Basilisk’s cold full-file check is a median 35× faster than Pyright (up to 58×): a median 17 ms against Pyright’s median 609 ms. These numbers are self-measured and reproducible. Every release runs make bench and commits the raw CSV, so the table below is exactly what was last measured, never a hand-typed figure.

Rule	basilisk	basilisk-warm	pyright	mypy	mypy-warm	ty	pyrefly	zuban
E0001 Missing param	12 ms	4 ms ✓	531 ms	635 ms	168 ms	31 ms	111 ms	32 ms
E0002 Missing return	16 ms	4 ms ✓	580 ms	633 ms	166 ms	36 ms	113 ms	38 ms
E0010 Unresolved import	50 ms	8 ms ✓	477 ms	668 ms	166 ms	246 ms	783 ms	244 ms
E0011 Explicit any	15 ms	4 ms ✓	530 ms	622 ms	164 ms	30 ms	108 ms	30 ms
E0012 Argument type mismatch	17 ms	6 ms ✓	668 ms	624 ms	166 ms	54 ms	114 ms	48 ms
E0014 Assignment incompatibility	12 ms	8 ms ✓	609 ms	592 ms	170 ms	51 ms	114 ms	31 ms
E0016 Incompatible override	52 ms	4 ms ✓	655 ms	624 ms	167 ms	54 ms	118 ms	30 ms
E0018 Undefined name	19 ms	8 ms ✓	493 ms	641 ms	169 ms	49 ms	549 ms	35 ms
E0022 Unhashable dict key	16 ms	8 ms ✓	532 ms	627 ms	164 ms	37 ms	104 ms	31 ms
E0023 Nonexhaustive match	13 ms	6 ms ✓	541 ms	613 ms	165 ms	23 ms	109 ms	27 ms
E0026 Typevar single constraint	21 ms	8 ms ✓	737 ms	596 ms	166 ms	39 ms	109 ms	33 ms
E0036 Classvar misuse	19 ms	8 ms ✓	616 ms	624 ms	166 ms	56 ms	132 ms	32 ms
E0038 Typeddict readonly inheritance	92 ms	5 ms ✓	680 ms	585 ms	165 ms	38 ms	117 ms	26 ms
E0050 Newtype name mismatch	14 ms	8 ms ✓	727 ms	633 ms	166 ms	44 ms	121 ms	35 ms
E0054 Final reassignment	8 ms	5 ms ✓	474 ms	579 ms	166 ms	27 ms	99 ms	24 ms
E0056 Readonly typeddict mutation	39 ms	5 ms ✓	645 ms	591 ms	166 ms	41 ms	107 ms	25 ms
E0093 Typeddict invalid key	38 ms	5 ms ✓	640 ms	588 ms	166 ms	37 ms	110 ms	26 ms

Versions benchmarked: basilisk dev (c7d11042); pyright 1.1.408; mypy 1.19.1; ty 0.0.19; pyrefly 0.54.0; zuban 0.9.0

Mean wall-clock time over 10 hyperfine runs; lower is better, and ✓ marks the fastest time in each row. Measured on Apple M4 Max (Darwin 25.5.0, 14 cores) on 2026-07-03T17:39:13+1000. Source: benchmarks/status/darwin-arm64-apple-m4-max.csv, regenerated by make bench.
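As a rough cross-check, the headline figures can be recomputed from the rounded cold timings in the table. The sketch below is illustrative only: the dictionary and function names are hypothetical, and it works from the rounded millisecond values on this page rather than the raw CSV, so the maximum ratio it yields (about 59×) differs slightly from the quoted 58×, presumably because of rounding.

```python
from statistics import median

# Rounded cold timings in ms, copied from the table above: (basilisk, pyright).
COLD_MS = {
    "E0001": (12, 531), "E0002": (16, 580), "E0010": (50, 477),
    "E0011": (15, 530), "E0012": (17, 668), "E0014": (12, 609),
    "E0016": (52, 655), "E0018": (19, 493), "E0022": (16, 532),
    "E0023": (13, 541), "E0026": (21, 737), "E0036": (19, 616),
    "E0038": (92, 680), "E0050": (14, 727), "E0054": (8, 474),
    "E0056": (39, 645), "E0093": (38, 640),
}


def speedups(rows):
    """Per-rule ratio of Pyright's cold time to Basilisk's cold time."""
    return {rule: pyright / basilisk for rule, (basilisk, pyright) in rows.items()}


ratios = speedups(COLD_MS)
median_basilisk = median(b for b, _ in COLD_MS.values())
median_pyright = median(p for _, p in COLD_MS.values())
median_speedup = median(ratios.values())
max_speedup = max(ratios.values())

print(f"median cold time: basilisk {median_basilisk} ms, pyright {median_pyright} ms")
print(f"median speedup {median_speedup:.1f}x, max speedup {max_speedup:.1f}x")
```

The median is taken over the per-rule ratios, not as a ratio of the two medians, which is why 609 / 17 and the median speedup are close but not identical.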

Methodology

Every number is produced by hyperfine, which runs each tool’s real command-line checker 10 times and reports the mean wall-clock time. We run the same harness against every tool on identical single-rule fixtures (large Python files that stress one diagnostic each), so the comparison is like-for-like.
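A minimal sketch of how one such measurement could be driven and read back, assuming hyperfine's JSON export is used. The helper names, the output file name, and the sample payload are hypothetical, and the real harness behind make bench may be structured quite differently.

```python
import json


def hyperfine_argv(command: str, runs: int = 10, export: str = "result.json") -> list[str]:
    """Build a hyperfine command line that times `command` `runs` times."""
    return ["hyperfine", "--runs", str(runs), "--export-json", export, command]


def mean_ms(hyperfine_json: str) -> dict[str, float]:
    """Map each benchmarked command to its mean wall-clock time in milliseconds."""
    results = json.loads(hyperfine_json)["results"]
    # hyperfine reports times in seconds; the table on this page uses milliseconds.
    return {r["command"]: r["mean"] * 1000.0 for r in results}


# Hand-written sample in the shape of hyperfine's JSON export.
# The values are made up for illustration and are not measurements.
SAMPLE = '{"results": [{"command": "pyright fixture.py", "mean": 0.5, "times": [0.4, 0.6]}]}'

print(hyperfine_argv("pyright fixture.py"))
print(mean_ms(SAMPLE))
```

The argv list can be passed to subprocess.run on a machine where hyperfine and the tool under test are installed; the parsing step is kept separate so it can be checked without running any benchmark.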

Cold vs. warm

The main <tool> column is a cold full-file check from scratch: the time to start the tool and check the file with no prior state. Only Basilisk and mypy also have a -warm column, because they are the only two that keep a real cross-run cache: basilisk-warm is a --cache result-cache hit, and mypy-warm is an incremental .mypy_cache hit, with cold mypy measured under --no-incremental. Pyright, ty and Pyrefly keep no cross-run result cache, so a repeat run is a cold run and they are measured cold-only. zuban is cold-only too, but its mypy mode reuses a ./.mypy_cache when present, so we delete that cache before each timed run to keep the measurement honestly cold.
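One way the cold and warm setups described above could be expressed with hyperfine's own options is sketched here for mypy and zuban. This is an assumption about how such a harness might look, not the project's actual script: the function name is hypothetical, Basilisk is omitted because its exact command line is not given on this page, and the cache deletion uses a plain rm -rf.

```python
def cache_aware_argv(tool: str, fixture: str, warm: bool = False) -> list[str]:
    """Build a hyperfine argv for one cold or warm measurement of `tool`."""
    argv = ["hyperfine", "--runs", "10"]
    if tool == "mypy" and warm:
        # One untimed warmup run populates .mypy_cache, so every timed run is a cache hit.
        return argv + ["--warmup", "1", f"mypy --strict {fixture}"]
    if tool == "mypy":
        # --no-incremental makes mypy ignore its cache, so every timed run is cold.
        return argv + [f"mypy --strict --no-incremental {fixture}"]
    if tool == "zuban" and not warm:
        # --prepare runs before each timed run, removing any cache zuban could reuse.
        return argv + ["--prepare", "rm -rf ./.mypy_cache", f"zuban mypy --strict {fixture}"]
    raise ValueError(f"no {'warm' if warm else 'cold'} setup sketched for {tool!r}")


for args in (("mypy", "fixture.py"), ("mypy", "fixture.py", True), ("zuban", "fixture.py")):
    print(cache_aware_argv(*args))
```

Putting the cache handling in --prepare and --warmup rather than in the timed command keeps setup cost out of the reported wall-clock time.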

Fair strictness

mypy runs with --strict and zuban runs as zuban mypy --strict, because the fixtures exercise strict-mode analysis. Without it, those tools report “no issues” on the strictness fixtures and would not be doing comparable work. Basilisk, Pyright, ty and Pyrefly are run in their default checking mode.

What this does and doesn’t show

These are full-file CLI runs, cold unless the column is marked -warm. They reflect batch/CI checking, not warm incremental editor latency: inside the LSP, an edit re-checks only the file you touched and its importers, so interactive re-checks do strictly less work than the cold runs measured here. Throughput on a single-rule fixture is also not the same as throughput on a large real codebase.

Reproduce any of this yourself with make bench. The raw results are committed to the repository under benchmarks/status/<machine>.csv; this page renders the darwin-arm64-apple-m4-max machine. Competitor PEP-conformance context lives in the type-checker comparison, and Pyrefly’s own throughput figure (1.85M LOC/sec on 166-core Meta infrastructure) is published at pyrefly.org.