How we built the most efficient inference engine for Cloudflare's network

Freedom5093 · 2025-08-28T09:52:56

I don't understand how these two statements fit together:

> all of the prompt tokens are available in advance and do not require decoding

> The other technique is called batching: this technique aggregates multiple prompts into a single decode operation.

So do prompts get decoded or not? Are there two decode steps? This is unclear to me.