Attention is all you need – Part 2

This is the 11th and final post in my series on building a toy GPT. For better understanding, I recommend reading my earlier posts first. I concluded the previous post by explaining how Attention works. You can download the Excel sheet and Python code that explain how Attention works from here and here. The Transformer architecture was originally…

Embeddings – Part 2

This is the 9th post in my series on building a toy GPT. For better understanding, I recommend reading my earlier posts first. Word embeddings convert words into fixed-length numerical arrays. Each number in these arrays corresponds to a specific characteristic of the word, such as its association with a place, person, gender, or concept.…
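The idea above can be sketched in a few lines of Python: each word maps to a fixed-length array of numbers, and words with related meanings end up with similar arrays. The words, the three dimensions, and all the values below are made-up illustrations, not real learned embeddings.

```python
import math

# Hypothetical 3-dimensional embeddings. The dimensions might loosely
# stand for characteristics such as "royalty", "gender", and "place"
# (illustrative values only, not trained weights).
embeddings = {
    "king":  [0.9,  0.5, 0.1],
    "queen": [0.9, -0.5, 0.1],
    "paris": [0.1,  0.0, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0
    means the vectors point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related words ("king", "queen") score higher than unrelated ones.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["paris"]))
```

With these toy values, "king" and "queen" come out more similar to each other than either is to "paris", which is the whole point of embedding words as numbers.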

Embeddings – Part 1

This is the 8th post in my series on building a toy GPT. For better understanding, I recommend reading my earlier posts first. I love playing and watching cricket. The dominance India showed in the recently concluded World Cup is astounding. I have never seen anything like it in the four decades I’ve been following…

Neural Networks – Part 3

This is the 7th post in my series on building a toy GPT. For better understanding, I recommend reading my earlier posts first. The MNIST dataset is the “hello world” of machine learning: a collection of handwritten-digit images used to train and evaluate models. It includes 60,000 training images and 10,000 test images…
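A common first step when working with MNIST is scaling each 28×28 image's 8-bit pixel intensities from 0–255 down to 0.0–1.0 before feeding them to a network. This is a minimal sketch of that step; the `fake_image` below is placeholder data standing in for one real digit, since the actual dataset isn't loaded here.

```python
ROWS, COLS = 28, 28  # MNIST images are 28x28 grayscale

# Fabricate a placeholder "image": a bright diagonal stroke on black.
fake_image = [[255 if r == c else 0 for c in range(COLS)]
              for r in range(ROWS)]

def normalize(image):
    """Scale 0..255 pixel intensities into the 0.0..1.0 range,
    which tends to make neural-network training more stable."""
    return [[pixel / 255.0 for pixel in row] for row in image]

normalized = normalize(fake_image)
print(normalized[0][0])  # brightest pixel -> 1.0
print(normalized[0][1])  # black pixel -> 0.0
```

The same scaling applies unchanged to the real 60,000 training and 10,000 test images once they are loaded.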