Inference grade definitions

A provisional grading scheme for inference markets. Three tiers, each fixed by a latency band, a context-window floor, an evaluation floor, and an indicative price floor. See grade for the rationale.

grade specification · revision 0

frontier

T1-FRONTIER

Latency: ≤ 200 ms p95
Context: ≥ 200k tokens
Eval floor: MMLU-Pro ≥ 0.78 · GPQA ≥ 0.65
Typical use: Interactive frontier reasoning, agentic tool use
Price floor: $8.00 / Mtok

production

T2-PRODUCTION

Latency: ≤ 800 ms p95
Context: ≥ 64k tokens
Eval floor: MMLU-Pro ≥ 0.62 · GPQA ≥ 0.45
Typical use: Steady-state production load, RAG, summarisation
Price floor: $1.20 / Mtok

bulk

T3-BULK

Latency: ≤ 24 h batch
Context: ≥ 16k tokens
Eval floor: MMLU-Pro ≥ 0.45
Typical use: Batch enrichment, offline data pipelines
Price floor: $0.15 / Mtok

notes

Eval floors assume a frozen, signed evaluation harness with reproducible scoring. None has industry-wide acceptance today.
Latency targets are end-to-end at the delivery point (network + queueing + model). Provider-side timing is necessary but not sufficient.
Price floors are indicative — they exist to make the grade-spread visible, not to set a market.

Speculative design. The grades, evaluations, and price floors on this page are illustrative; no standards body has issued them.