Hacker News: aleinin's comments

One that I’ve seen recently is https://reducto.ai. It appears to be an OCR wrapper.


Cool project! How do you think about targeting hardware-specific ISAs directly? There’s an interesting paper from Citadel (https://arxiv.org/pdf/1804.06826) that highlights inefficiencies in nvcc for the Volta architecture. Do you see Luminal’s search-based paradigm eventually extending beyond outperforming handwritten kernels, towards actually competing with NVIDIA’s compiler optimizations at the PTX level?


yep! currently we're emitting CUDA / Metal, but once the search is better, i want to directly emit PTX / low-level asm for other hardware.


I don't suppose you have an eye towards Verilog in the long term?

I'm curious as to the breadth of possibilities that could be searched. I would imagine something like this could invent flash attention if it cast its net wide enough, but that is a pretty broad net. [Edit: I scrolled back and saw flash attention was explicitly mentioned, cool stuff]


Equality saturation (which Luminal uses at its core) is also a topic in hardware synthesis and verification. It could enable something like dynamic hardware generation (instead of kernel generation). For example, see this thesis [1] by Samuel Coward of Imperial.

[1] https://samuelcoward.co.uk/assets/pdf/Thesis_Imperial.pdf


you suppose correctly ;)


If you're looking for a high-level introduction to GPU development on Apple silicon, I would recommend learning Metal. It's Apple's GPU programming framework, similar to CUDA for Nvidia hardware. I ported GPU-Puzzles [1], a collection of CUDA exercises designed to teach GPU programming fundamentals, to Metal [2]. I think it's a very accessible introduction to Metal and writing GPU kernels.

[1] https://github.com/srush/GPU-Puzzles

[2] https://github.com/abeleinin/Metal-Puzzles


After a quick scan through the [2] link, I have added this to the list of things to look into in 2025


Curious about the others in your list


Can anyone recommend a CUDA equivalent of [2]? That’s a spectacular learning resource, and I’d like to use a similar one to upskill in CUDA.


Isn’t the link right before it exactly what you’re asking for? [2] is a port of [1].


I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M-series Mac, I think this is a very accessible option. Thanks to Sasha for making this!

https://github.com/abeleinin/Metal-Puzzles


Wow, thank you! I've been wanting to learn about GPUs on my next flight, and this is the perfect material for that.

