| 1. | | LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com) |
| 540 points by gpjt 26 days ago | past | 121 comments |
|
| 2. | | Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com) |
| 1 point by gpjt 54 days ago | past |
|
| 3. | | Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com) |
| 4 points by gpjt 55 days ago | past |
|
| 4. | | Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com) |
| 2 points by gpjt 60 days ago | past |
|
| 5. | | Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com) |
| 1 point by gpjt 61 days ago | past |
|
| 6. | | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 3 points by gpjt 65 days ago | past |
|
| 7. | | Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com) |
| 1 point by gpjt 66 days ago | past |
|
| 8. | | Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com) |
| 254 points by gpjt 73 days ago | past | 10 comments |
|
| 9. | | Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com) |
| 2 points by gpjt 78 days ago | past |
|
| 10. | | Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com) |
| 1 point by gpjt 82 days ago | past |
|
| 11. | | Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com) |
| 41 points by gpjt 86 days ago | past | 3 comments |
|
| 12. | | How Do LLMs Work? (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past | 1 comment |
|
| 13. | | The maths you need to start understanding LLMs (gilesthomas.com) |
| 616 points by gpjt 3 months ago | past | 120 comments |
|
| 14. | | What AI chatbots are doing under the hood (gilesthomas.com) |
| 2 points by gpjt 4 months ago | past |
|
| 15. | | LLM from scratch, part 18 – residuals, shortcut connections, and the Talmud (gilesthomas.com) |
| 2 points by gpjt 4 months ago | past |
|
| 16. | | The fixed length bottleneck and the feed forward network (gilesthomas.com) |
| 1 point by gpjt 4 months ago | past |
|
| 17. | | Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com) |
| 8 points by gpjt 4 months ago | past |
|
| 18. | | Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com) |
| 1 point by gpjt 5 months ago | past |
|
| 19. | | Leaving PythonAnywhere (gilesthomas.com) |
| 3 points by gpjt 6 months ago | past |
|
| 20. | | Writing an LLM from scratch, part 15 – from context vectors to logits (gilesthomas.com) |
| 7 points by gpjt 7 months ago | past |
|
| 21. | | Writing an LLM from scratch, part 14 – the complexity of self-attention at scale (gilesthomas.com) |
| 1 point by gpjt 7 months ago | past |
|
| 22. | | Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com) |
| 351 points by gpjt 7 months ago | past | 67 comments |
|
| 23. | | Writing an LLM from scratch, part 12 – multi-head attention (gilesthomas.com) |
| 3 points by gpjt 8 months ago | past |
|
| 24. | | Writing an LLM from scratch, part 11 – batches (gilesthomas.com) |
| 2 points by gpjt 8 months ago | past |
|
| 25. | | The Business of the AI Labs (omega-prime.co.uk) |
| 19 points by gpjt 8 months ago | past | 3 comments |
|
| 26. | | Writing an LLM from scratch, part 10 – dropout (gilesthomas.com) |
| 90 points by gpjt 9 months ago | past | 8 comments |
|
| 27. | | Adding /Llms.txt (gilesthomas.com) |
| 1 point by gpjt 9 months ago | past |
|
| 28. | | Writing an LLM from scratch, part 9 – causal attention (gilesthomas.com) |
| 4 points by gpjt 9 months ago | past |
|
| 29. | | Writing an LLM from scratch, part 8 – trainable self-attention (gilesthomas.com) |
| 380 points by gpjt 9 months ago | past | 31 comments |
|
| 30. | | It’s still worth blogging in the age of AI (gilesthomas.com) |
| 333 points by gpjt 10 months ago | past | 223 comments |