
This is great! Is it CPU-only for now, or can it leverage the GPU for faster inference?


The focus is CPU for now - but it'd be very useful to have GPU support. As a baseline, this can support anything that llama.cpp supports. Relevant link: https://github.com/ggerganov/llama.cpp#blas-build
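For anyone curious, the linked BLAS-build section boils down to passing a build flag when compiling llama.cpp. A rough sketch (the exact flag names have changed across llama.cpp versions, so treat these as illustrative rather than authoritative):

```shell
# Clone llama.cpp (the project this tool builds on)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# CPU build with OpenBLAS acceleration (flag name per the linked README era)
make LLAMA_OPENBLAS=1

# Or, on an NVIDIA GPU, build with cuBLAS to offload work to the GPU
make LLAMA_CUBLAS=1
```

With a cuBLAS build, inference can then offload layers to the GPU via a runtime option (e.g. a `--n-gpu-layers`-style flag), so GPU support here would largely be a matter of inheriting whatever llama.cpp exposes.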



