Attention is all you need – Part 2
This is the 11th and final post in my series on building a toy GPT. Read the earlier posts first, since this one builds on them. I concluded the previous post by explaining how Attention works. You can download the Excel sheet and Python code that explain how Attention works from here and here. The Transformer architecture was originally…