> TBF there is no good explanation why it works My mental justification for atte...

eurekin · on April 15, 2024

> TBF there is no good explanation why it works

I thought the general consensus was: "transformers allow neural networks to have adaptive weights".

That's in contrast to previous architectures, where every edge connecting two neurons always has the same weight.
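To make "adaptive weights" concrete, here's a rough sketch in plain Python. It assumes a stripped-down form of attention in which queries and keys are just the raw token vectors (real transformers apply learned projection matrices first, and those stay fixed after training). The function names are made up for illustration. The point is only that the mixing weights between positions are recomputed from each input instead of being stored constants.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(tokens):
    # Assumption: queries == keys == raw token vectors (no learned projections).
    # Row i holds how strongly position i attends to every position j.
    scale = math.sqrt(len(tokens[0]))
    return [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]

# Two different inputs produce two different weight matrices.
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[1.0, 0.0], [1.0, 0.0]]

weights_a = attention_weights(tokens_a)
weights_b = attention_weights(tokens_b)

print(weights_a[0])  # position 0 favors itself over position 1
print(weights_b[0])  # both positions look identical, so the split is even
```

In a plain feed-forward layer the equivalent of `weights_a` would be a stored matrix that never changes with the input; here it is a function of the tokens themselves.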

EDIT: a good video, where it's actually explained better: https://youtu.be/OFS90-FX6pg?t=750&si=A_HrX1P3TEfFvLay

rcarmo · on April 15, 2024

You're effectively steering the predictions based on adjacent vectors (and precursors from the prompt). That mental model works fine.