Performance Benchmarks
Across the 17 rules we benchmark, Basilisk’s cold full-file check is a median 35× faster than Pyright (up to 58×): a median 17 ms against Pyright’s 609 ms. These are self-measured, reproducible numbers. Every release runs `make bench` and commits the raw CSV, so the table below shows exactly what was last measured, never a hand-typed figure.
| Rule | basilisk | basilisk-warm | pyright | mypy | mypy-warm | ty | pyrefly | zuban |
|---|---|---|---|---|---|---|---|---|
| E0001 Missing param | 12 ms | 4 ms ✓ | 531 ms | 635 ms | 168 ms | 31 ms | 111 ms | 32 ms |
| E0002 Missing return | 16 ms | 4 ms ✓ | 580 ms | 633 ms | 166 ms | 36 ms | 113 ms | 38 ms |
| E0010 Unresolved import | 50 ms | 8 ms ✓ | 477 ms | 668 ms | 166 ms | 246 ms | 783 ms | 244 ms |
| E0011 Explicit any | 15 ms | 4 ms ✓ | 530 ms | 622 ms | 164 ms | 30 ms | 108 ms | 30 ms |
| E0012 Argument type mismatch | 17 ms | 6 ms ✓ | 668 ms | 624 ms | 166 ms | 54 ms | 114 ms | 48 ms |
| E0014 Assignment incompatibility | 12 ms | 8 ms ✓ | 609 ms | 592 ms | 170 ms | 51 ms | 114 ms | 31 ms |
| E0016 Incompatible override | 52 ms | 4 ms ✓ | 655 ms | 624 ms | 167 ms | 54 ms | 118 ms | 30 ms |
| E0018 Undefined name | 19 ms | 8 ms ✓ | 493 ms | 641 ms | 169 ms | 49 ms | 549 ms | 35 ms |
| E0022 Unhashable dict key | 16 ms | 8 ms ✓ | 532 ms | 627 ms | 164 ms | 37 ms | 104 ms | 31 ms |
| E0023 Nonexhaustive match | 13 ms | 6 ms ✓ | 541 ms | 613 ms | 165 ms | 23 ms | 109 ms | 27 ms |
| E0026 Typevar single constraint | 21 ms | 8 ms ✓ | 737 ms | 596 ms | 166 ms | 39 ms | 109 ms | 33 ms |
| E0036 Classvar misuse | 19 ms | 8 ms ✓ | 616 ms | 624 ms | 166 ms | 56 ms | 132 ms | 32 ms |
| E0038 Typeddict readonly inheritance | 92 ms | 5 ms ✓ | 680 ms | 585 ms | 165 ms | 38 ms | 117 ms | 26 ms |
| E0050 Newtype name mismatch | 14 ms | 8 ms ✓ | 727 ms | 633 ms | 166 ms | 44 ms | 121 ms | 35 ms |
| E0054 Final reassignment | 8 ms | 5 ms ✓ | 474 ms | 579 ms | 166 ms | 27 ms | 99 ms | 24 ms |
| E0056 Readonly typeddict mutation | 39 ms | 5 ms ✓ | 645 ms | 591 ms | 166 ms | 41 ms | 107 ms | 25 ms |
| E0093 Typeddict invalid key | 38 ms | 5 ms ✓ | 640 ms | 588 ms | 166 ms | 37 ms | 110 ms | 26 ms |
Versions benchmarked:
- basilisk dev (c7d11042)
- pyright 1.1.408
- mypy 1.19.1
- ty 0.0.19
- pyrefly 0.54.0
- zuban 0.9.0
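The headline medians can be re-derived from the cold `basilisk` and `pyright` columns of the table. The sketch below uses the rounded millisecond values shown on this page rather than the committed CSV, and it assumes the speedup is the median of the per-rule ratios; `COLD_MS` and `summarize` are illustrative names.

```python
from statistics import median

# Cold-run times in ms as (basilisk, pyright), copied from the table above.
# These are rounded display values, so they can differ slightly from the raw CSV.
COLD_MS = {
    "E0001": (12, 531), "E0002": (16, 580), "E0010": (50, 477),
    "E0011": (15, 530), "E0012": (17, 668), "E0014": (12, 609),
    "E0016": (52, 655), "E0018": (19, 493), "E0022": (16, 532),
    "E0023": (13, 541), "E0026": (21, 737), "E0036": (19, 616),
    "E0038": (92, 680), "E0050": (14, 727), "E0054": (8, 474),
    "E0056": (39, 645), "E0093": (38, 640),
}


def summarize(rows):
    """Return (median basilisk ms, median pyright ms, median per-rule speedup)."""
    basilisk = [b for b, _ in rows.values()]
    pyright = [p for _, p in rows.values()]
    # Speedup is computed per rule first; the median is then taken across rules.
    speedups = [p / b for b, p in rows.values()]
    return median(basilisk), median(pyright), median(speedups)


b_ms, p_ms, speedup = summarize(COLD_MS)
print(f"median basilisk {b_ms} ms, pyright {p_ms} ms, median speedup {speedup:.0f}x")
# → median basilisk 17 ms, pyright 609 ms, median speedup 35x
```

Dividing the two medians instead (609 / 17 ≈ 35.8) gives a slightly different figure, so it matters which statistic a headline speedup is based on.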
Methodology
Every number is produced by hyperfine, which runs each tool’s real command-line checker many times and reports the mean wall-clock time. We run the same harness against every tool on identical single-rule fixtures — large Python files that stress one diagnostic each — so the comparison is like-for-like.
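To make “mean wall-clock time over many runs” concrete, here is a minimal Python stand-in for the kind of measurement hyperfine performs. It is not hyperfine: it skips hyperfine’s warmup runs, statistical outlier warnings and shell-startup correction, and the `mean_wall_clock_ms` helper and its `prepare` hook are hypothetical names.

```python
import subprocess
import sys
import time
from statistics import mean


def mean_wall_clock_ms(argv, runs=10, prepare=None):
    """Run `argv` `runs` times and return the mean wall-clock time in ms.

    `prepare`, if given, is called before each run, outside the timed region.
    """
    samples = []
    for _ in range(runs):
        if prepare is not None:
            prepare()
        start = time.perf_counter()
        # check=False: a type checker exits non-zero when it reports
        # diagnostics, which is the expected outcome on these fixtures.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append((time.perf_counter() - start) * 1000)
    return mean(samples)


# Hypothetical usage: time the Python interpreter starting up and exiting.
print(f"{mean_wall_clock_ms([sys.executable, '-c', 'pass'], runs=5):.1f} ms")
```

Keeping `prepare` outside the timed region mirrors the idea that setup work (such as clearing a cache) should not count toward the measured check time.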
Cold vs. warm
Each tool’s main column (headed with just the tool’s name) is a cold full-file check from scratch: the time to start the tool and check the file with no prior state. Only Basilisk and mypy also have a `-warm` column, because they are the only two that keep a real cross-run cache (`basilisk-warm` = a `--cache` result-cache hit; `mypy-warm` = an incremental `.mypy_cache` hit, with cold mypy measured under `--no-incremental`). Pyright, ty and Pyrefly keep no cross-run result cache (a repeat run is a cold run), so they are measured cold-only. zuban is cold-only too, but its mypy mode reuses a `./.mypy_cache` when present, so we delete that cache before each timed run to keep the measurement genuinely cold.
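One way to express these three measurement modes with hyperfine is sketched below. The `hyperfine_argv` helper and the `fixture.py` path are hypothetical, and using hyperfine’s `--warmup` to prime `.mypy_cache` and `--prepare` to delete it is an assumption about how such a harness could work, not a description of what `make bench` actually runs.

```python
import shlex

FIXTURE = "fixture.py"  # hypothetical single-rule fixture path


def hyperfine_argv(command, *, warmup=0, prepare=None, csv_path=None):
    """Build a hyperfine command line for one checker command string."""
    argv = ["hyperfine", "--warmup", str(warmup)]
    if prepare is not None:
        # --prepare runs before every timing run, outside the measured time.
        argv += ["--prepare", prepare]
    if csv_path is not None:
        argv += ["--export-csv", csv_path]
    argv.append(command)
    return argv


# Cold mypy: --no-incremental makes mypy ignore any existing .mypy_cache.
mypy_cold = hyperfine_argv(f"mypy --strict --no-incremental {FIXTURE}")
# Warm mypy: the warmup run populates .mypy_cache, so timed runs are cache hits.
mypy_warm = hyperfine_argv(f"mypy --strict {FIXTURE}", warmup=1)
# Cold zuban: remove any ./.mypy_cache before each timed run.
zuban_cold = hyperfine_argv(f"zuban mypy --strict {FIXTURE}",
                            prepare="rm -rf ./.mypy_cache")

for argv in (mypy_cold, mypy_warm, zuban_cold):
    print(shlex.join(argv))
```

`shlex.join` quotes each argument, so the printed lines can be pasted into a POSIX shell as-is.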
Fair strictness
mypy runs with `--strict` and zuban runs as `zuban mypy --strict`, because the fixtures exercise strict-mode analysis. Without it, those tools report “no issues” on the strictness fixtures and would not be doing comparable work. Basilisk, Pyright, ty and Pyrefly run in their default checking mode.
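As a rough illustration, these strictness settings can be captured as a small per-tool table of invocations. This is a sketch, not the harness behind `make bench`: `ty check` and `pyrefly check` are those tools’ standard check subcommands, `fixture.py` is a hypothetical path, and Basilisk is left out because its CLI invocation is not given on this page.

```python
# Hypothetical per-tool check commands following the strictness choices above.
CHECK_ARGV = {
    "mypy": ["mypy", "--strict"],            # strict mode
    "zuban": ["zuban", "mypy", "--strict"],  # zuban's mypy mode, strict
    "pyright": ["pyright"],                  # default checking mode
    "ty": ["ty", "check"],                   # default checking mode
    "pyrefly": ["pyrefly", "check"],         # default checking mode
}


def check_argv(tool: str, fixture: str) -> list[str]:
    """Return the argv that checks one fixture file with the given tool."""
    return [*CHECK_ARGV[tool], fixture]


def runs_strict(tool: str) -> bool:
    """True if the tool is invoked with --strict (only mypy and zuban here)."""
    return "--strict" in CHECK_ARGV[tool]


for tool in CHECK_ARGV:
    mode = "strict" if runs_strict(tool) else "default"
    print(f"{tool:8} {mode:8} {' '.join(check_argv(tool, 'fixture.py'))}")
```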
What this does and doesn’t show
These are full-file CLI runs, cold in every column except the two `-warm` cache-hit columns. They reflect batch/CI checking, not warm incremental editor latency: inside the LSP, an edit re-checks only the file you touched and its importers, so interactive re-checks generally do less work than the cold runs measured here. Timing a single-rule fixture is also not the same as measuring throughput on a large real codebase.
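What “the file you touched and its importers” means can be sketched as a reverse-dependency walk over an import graph. This is an illustration only, not Basilisk’s LSP implementation: the graph, file names and `files_to_recheck` helper are hypothetical, and the sketch assumes “importers” includes transitive importers.

```python
from collections import deque


def files_to_recheck(imports, touched):
    """Return the touched file plus every file that transitively imports it.

    `imports` maps each file to the set of files it imports directly.
    """
    # Invert the graph: for each file, record which files import it.
    importers = {}
    for src, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(src)

    # Breadth-first walk outward from the edited file through its importers.
    seen = {touched}
    queue = deque([touched])
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


graph = {
    "app.py": {"models.py", "utils.py"},
    "models.py": {"utils.py"},
    "cli.py": {"app.py"},
    "scripts.py": set(),
}
print(sorted(files_to_recheck(graph, "models.py")))
# → ['app.py', 'cli.py', 'models.py']
```

Files outside that set (here `utils.py` and `scripts.py`) keep their previous results, which is where an interactive re-check saves work compared with a cold full check.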
Reproduce any of this yourself with `make bench`. The raw results are committed to the repository under `benchmarks/status/<machine>.csv`; this page renders the results for the `darwin-arm64-apple-m4-max` machine. Competitor PEP-conformance context lives in the type-checker comparison, and Pyrefly’s own throughput figure (1.85M LOC/sec on 166-core Meta infrastructure) is published at pyrefly.org.