Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Oh actually yeah that's true. You have correctly out-nitpicked my nitpick lol.

But at that point i feel like we are getting close to "everything that isn't a perfect Turing-machine is somewhat-stochastic" ;)

Edit: someone corrected me above, it does seem to matter more then I thought



> someone corrected me above, it does seem to matter more then I thought

if you llm agent takes different decisions from the same prompt, then you have to deal with it

1) your benchmarks become stochastic so you need multiple samples to get confidence for your AB testing

2) if your system assumes at least once completion you have to implement single record and replay so you dont get multiple rollout of with different actions




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: