There's only one way to deeply understand something: get your hands dirty and start building it from the ground up!
In this repo we implement and train a small decoder-only transformer: Shakespeare GPT.
- `Bigram.ipynb`: the implementation of the transformer
- `train.py`: simply run this if you want to train your own model on a dataset
- `inferencing_trained_model.ipynb`: if you want to see how my final trained model speaks Shakespeare
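
If you're curious what the core building block looks like, here is a minimal sketch of one head of causal self-attention in the spirit of the series (not a verbatim excerpt from `Bigram.ipynb`; the class and parameter names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One head of masked (causal) self-attention."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask: each position may only attend to the past.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Scaled dot-product attention scores.
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)  # (B, T, head_size)
        return wei @ v     # (B, T, head_size)
```

The lower-triangular mask is what makes the model decoder-only: every token can attend to earlier tokens but never to future ones.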
I followed along with sensei 👨‍🏫 Andrej Karpathy's Neural Networks: Zero to Hero series. I think it's the best resource out there to get started and make quick progress in deep neural nets.
The model was trained for about 50 minutes on an NVIDIA Tesla T4 GPU, which is available for free in a Colab notebook. I was able to get the final loss down to 0.976.
You can take the model for a test run using the `inferencing_trained_model.ipynb` notebook (the trained model weights are also included in the repo), or see the roughly 2K tokens generated by the model in `2k_lines_output.txt`.
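
For reference, generation in a Karpathy-style GPT boils down to an autoregressive sampling loop like the sketch below (this assumes the model's forward pass returns `(logits, loss)` as in the series; the exact names in the notebook may differ):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Autoregressively sample tokens; idx is a (B, T) tensor of token ids."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits, _ = model(idx_cond)              # (B, T, vocab_size)
        logits = logits[:, -1, :]                # keep only the last time step
        probs = torch.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)  # sample one token
        idx = torch.cat((idx, idx_next), dim=1)  # append and feed back in
    return idx
```

Each new token is sampled from the model's predicted distribution over the vocabulary, appended to the context, and fed back in; decoding the resulting ids gives you the Shakespeare-flavored text.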
