Dispatches tag · kv-cache

Filed under kv-cache.

One Dispatch carries this tag.

Essays · 22 June 2026 · 8 min read
Pulling Apart the Inference Stack
By mid-2026 every serious inference framework has accepted that the two phases of LLM inference want different hardware: compute-bound prefill on one pool of GPUs, bandwidth-bound decode on another, with the KV cache shipped between them over a fast fabric. It is the deepest reshaping of LLM serving since continuous batching, and it happened almost entirely without anyone outside the inference crowd noticing.
llm-inference · prefill-decode-disaggregation · vllm · kv-cache