Cool project! How do you think about targeting hardware-specific ISAs directly? There’s an interesting paper from Citadel (https://arxiv.org/pdf/1804.06826) that highlights inefficiencies in nvcc for the Volta architecture. Do you see Luminal’s search-based paradigm eventually extending beyond outperforming handwritten kernels, towards actually competing with NVIDIA’s compiler optimizations at the PTX level?
I don't suppose you have an eye towards Verilog in the long term?
I'm curious as to the breadth of possibilities that could be searched. I would imagine something like this could invent flash attention if it cast its net wide enough, but that is a pretty broad net. [Edit: I scrolled back and saw flash attention was explicitly mentioned, cool stuff]
Equality saturation (something Luminal uses at its core) is also a topic in hardware synthesis and verification. It could enable something like dynamic hardware generation, rather than just kernel generation. For example, see this thesis [1] by Samuel Coward of Imperial.
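For anyone unfamiliar with the idea: equality saturation grows a set of provably equivalent expressions by applying rewrite rules until nothing new appears, then extracts the cheapest form, instead of greedily rewriting and losing alternatives. Here's a toy sketch in Python (my own illustration, not Luminal's implementation; a real system uses an e-graph for sharing, and the rules and costs here are made up):

```python
# Toy equality saturation: expand the set of equivalent terms to a
# fixpoint, then extract the cheapest one. Terms are nested tuples.

def step(t):
    """Yield terms equal to t after one top-level rewrite."""
    if t[0] == "mul":
        _, a, b = t
        yield ("mul", b, a)                 # commutativity
        if b == ("const", 2):
            yield ("shl", a, ("const", 1))  # strength reduction: x*2 -> x<<1

def saturate(term):
    """Grow the equivalence set until no rewrite adds a new term."""
    seen, frontier = {term}, {term}
    while frontier:
        frontier = {r for t in frontier for r in step(t)} - seen
        seen |= frontier
    return seen

def cost(t):
    """Made-up cost model: multiplies are expensive, shifts cheap."""
    op_cost = {"mul": 4, "shl": 1}.get(t[0], 0)
    return op_cost + sum(cost(c) for c in t[1:] if isinstance(c, tuple))

best = min(saturate(("mul", ("var", "x"), ("const", 2))), key=cost)
print(best)  # -> ('shl', ('var', 'x'), ('const', 1))
```

The key property is that commutativity and strength reduction coexist in the set, so extraction can pick the globally cheapest form; a greedy rewriter that applied commutativity first would never see the shift. Real e-graph engines (egg, the equality saturation work in that thesis) do this with shared subterms so the set stays compact.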
If you're looking for a high-level introduction to GPU development on Apple silicon, I would recommend learning Metal, Apple's GPU programming framework, analogous to CUDA for Nvidia hardware. I ported GPU-Puzzles (a collection of exercises designed to teach GPU programming fundamentals, originally written for CUDA) [1] to Metal [2]. I think it's a very accessible introduction to Metal and writing GPU kernels.
I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M series Mac, I think this is a very accessible option. Thanks to Sasha for making this!