artifacts / grade definitions
// specimen — taxonomy
Inference grade definitions
A provisional grading scheme for inference markets. Three tiers,
each fixed by a latency band, a context-window floor, an
evaluation floor, and an indicative price floor. See
grade for the rationale.
grade specification · revision 0
- Latency
- ≤ 200 ms p95
- Context
- ≥ 200k tokens
- Eval floor
- MMLU-Pro ≥ 0.78 · GPQA ≥ 0.65
- Typical use
- Interactive frontier reasoning, agentic tool use
- Price floor
- $8.00 / Mtok
- Latency
- ≤ 800 ms p95
- Context
- ≥ 64k tokens
- Eval floor
- MMLU-Pro ≥ 0.62 · GPQA ≥ 0.45
- Typical use
- Steady-state production load, RAG, summarisation
- Price floor
- $1.20 / Mtok
- Latency
- ≤ 24 h batch
- Context
- ≥ 16k tokens
- Eval floor
- MMLU-Pro ≥ 0.45
- Typical use
- Batch enrichment, offline data pipelines
- Price floor
- $0.15 / Mtok
notes
-
Eval floors assume a frozen, signed evaluation harness with
reproducible scoring. None has industry-wide acceptance today.
-
Latency targets are end-to-end at the delivery point (network +
queueing + model). Provider-side timing is necessary but not
sufficient.
-
Price floors are indicative — they exist to make the
grade-spread visible, not to set a market.
Speculative design. The grades, evaluations, and price floors on
this page are illustrative; no standards body has issued them.