How do LLMs add numbers? Drawing largely on Anthropic's research "On the Biology of a Large Language Model", we walk through how LLMs handle addition, with key takeaways on self-attention and on the replacement model (a cross-layer transcoder) that makes neural networks more interpretable via sparse features. In the self-attention matrix multiplication animation, we use 4-dimensional toy embeddings for illustration purposes.
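As a minimal sketch of what the animation shows, here is scaled dot-product self-attention over 4-dimensional toy embeddings; the random weights, token count, and example tokens are illustrative assumptions, not values from Anthropic's paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4        # toy embedding dimension, as in the animation
n = 3        # hypothetical tokens, e.g. "3", "+", "5"
X = rng.normal(size=(n, d))   # stand-in token embeddings

# Random stand-ins for the learned query/key/value projections
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V

print(weights.shape)  # (3, 3): each token attends over all tokens
print(out.shape)      # (3, 4): contextualized 4-d embeddings
```

Each row of `weights` sums to 1, so every output embedding is a weighted mix of the value vectors, which is the mechanism the animation visualizes step by step.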