Hacker News
How we built the most efficient inference engine for Cloudflare's network (cloudflare.com)
5 points by jgrahamc 6 months ago | 1 comment


I don't understand:

> all of the prompt tokens are available in advance and do not require decoding

> The other technique is called batching: this technique aggregates multiple prompts into a single decode operation.

So do prompts get decoded or not? Are there two decode steps? The article leaves this unclear.
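
One possible reading, which the article does not confirm as Cloudflare's implementation: the first quote describes what is usually called prefill, where the whole prompt is processed in one pass and no tokens are generated, and the second quote describes batching the later decode steps, where each step produces one new token for several requests at once. Under that reading there is one decode phase, and the prompts only feed it. A toy sketch of the idea follows; `toy_next_token`, `prefill`, `batched_decode_step`, and `generate` are hypothetical names invented for illustration, and the "model" is a made-up arithmetic rule rather than a real network.

```python
from typing import List


def toy_next_token(context: List[int]) -> int:
    """Hypothetical stand-in for a model forward pass (a toy arithmetic rule)."""
    return (sum(context) + len(context)) % 50


def prefill(prompt: List[int]) -> List[int]:
    """Take in every prompt token in one pass.

    Nothing is generated here. The returned list stands in for the
    per-request cache that later decode steps extend.
    """
    return list(prompt)


def batched_decode_step(caches: List[List[int]]) -> List[int]:
    """One decode operation: produce one new token for each request in the batch."""
    new_tokens = [toy_next_token(cache) for cache in caches]
    for cache, token in zip(caches, new_tokens):
        cache.append(token)  # the new token becomes context for the next step
    return new_tokens


def generate(prompts: List[List[int]], max_new_tokens: int) -> List[List[int]]:
    """Prefill each prompt once, then run batched decode steps."""
    caches = [prefill(p) for p in prompts]
    outputs: List[List[int]] = [[] for _ in prompts]
    for _ in range(max_new_tokens):
        for out, token in zip(outputs, batched_decode_step(caches)):
            out.append(token)
    return outputs


if __name__ == "__main__":
    print(generate([[1, 2, 3], [4, 5]], max_new_tokens=2))
```

In this sketch the prompt tokens are never produced by a decode step; only the generated tokens are, and one call to `batched_decode_step` advances every request in the batch by one token.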