artifacts / grade definitions

// specimen — taxonomy

Inference grade definitions

A provisional grading scheme for inference markets. Three tiers, each fixed by a latency band, a context-window floor, an evaluation floor, and an indicative price floor. See grade for the rationale.

grade specification · revision 0

frontier

T1-FRONTIER

Latency
≤ 200 ms p95
Context
≥ 200k tokens
Eval floor
MMLU-Pro ≥ 0.78 · GPQA ≥ 0.65
Typical use
Interactive frontier reasoning, agentic tool use
Price floor
$8.00 / Mtok
production

T2-PRODUCTION

Latency
≤ 800 ms p95
Context
≥ 64k tokens
Eval floor
MMLU-Pro ≥ 0.62 · GPQA ≥ 0.45
Typical use
Steady-state production load, RAG, summarisation
Price floor
$1.20 / Mtok
bulk

T3-BULK

Latency
≤ 24 h batch
Context
≥ 16k tokens
Eval floor
MMLU-Pro ≥ 0.45
Typical use
Batch enrichment, offline data pipelines
Price floor
$0.15 / Mtok

notes

  1. Eval floors assume a frozen, signed evaluation harness with reproducible scoring. None has industry-wide acceptance today.
  2. Latency targets are end-to-end at the delivery point (network + queueing + model). Provider-side timing is necessary but not sufficient.
  3. Price floors are indicative — they exist to make the grade-spread visible, not to set a market.

Speculative design. The grades, evaluations, and price floors on this page are illustrative; no standards body has issued them.