myT-LLM (pronounced as mighty-llm)
Author: Mahanth Yalla
Objective: Training a Generative Pre-trained Transformer (GPT), a.k.a. a large language model (LLM), from scratch.
Ack: AI-generated image by Gemini (nanobanana.ai).
myT-LLM is a full-stack, from-scratch implementation of a Generative Pre-trained Transformer (GPT) model — built for research, reproducibility, and understanding.
It's the continuation of the miniGPT project and represents a practical exploration of modern SOTA (state-of-the-art) architectural innovations.
This repository brings together:
- End-to-end LLM pretraining pipelines
- Custom tokenization and data streaming (see the tokenizer sketch below)
- Experimental architecture variants (inspired by Cerebras-GPT, LLaMA, and Qwen)
- Efficient GPU-optimized training
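To give a taste of the tokenization side, the snippet below trains a byte-level BPE vocabulary with the Hugging Face `tokenizers` library. It is a minimal sketch: the toy corpus, vocabulary size, and special tokens are placeholder assumptions, and the repo's own `tokenizer.py` may do this differently.

```python
# Minimal sketch: training a byte-level BPE tokenizer with Hugging Face
# `tokenizers`. The toy corpus, vocab size, and special tokens are
# placeholders, not myT-LLM's actual settings.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                  # placeholder vocab size
    special_tokens=["<unk>", "<bos>", "<eos>"],
)

# Toy in-memory corpus; real training would call tokenizer.train(files=[...])
# on the preprocessed corpus files instead.
corpus = ["myT-LLM trains a GPT-style decoder from scratch."] * 1_000
tokenizer.train_from_iterator(corpus, trainer=trainer)
tokenizer.save("tokenizer.json")

# Round-trip check
ids = tokenizer.encode("myT-LLM trains a GPT from scratch.").ids
print(tokenizer.decode(ids))
```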
To build, train, and evaluate a GPT-style decoder-only Transformer from first principles — while integrating the latest architectural and optimization techniques.
Core goals:
- Understand each component's role in LLM design
- Train scalable GPT variants (nano → small → base)
- Compare SOTA enhancements like SwiGLU, RMSNorm, GQA, and Flash Attention
- Benchmark loss curves, scaling laws, and stability (see the scaling sketch after this list)
- Open-source a clean research-grade training framework
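To make the scaling-law goal concrete, the Chinchilla paper's compute-optimal recipe works out to roughly 20 training tokens per model parameter. The helper below is a back-of-the-envelope sketch; the nano/small/base sizes are hypothetical examples, not the actual myT-LLM configurations.

```python
# Back-of-the-envelope sketch of the Chinchilla "~20 tokens per parameter"
# rule of thumb. Model sizes below are hypothetical, not myT-LLM's configs.
CHINCHILLA_TOKENS_PER_PARAM = 20


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training-token budget for a parameter count."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params


for name, n_params in [("nano", 15e6), ("small", 125e6), ("base", 350e6)]:
    tokens = compute_optimal_tokens(n_params)
    print(f"{name:>5}: {n_params / 1e6:5.0f}M params -> ~{tokens / 1e9:4.1f}B tokens")
```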
| Component | Choice / Implementation | Reference |
|---|---|---|
| Model Type | Decoder-only Transformer | Qwen / LLaMA lineage |
| Normalization | Pre-RMSNorm | [Zhang et al., 2019] |
| Attention | Multi-Head / Grouped-Query / Flash Attention | [Dao et al., 2022] |
| Activation | SwiGLU (2× FFN) | [Shazeer, 2020] |
| Optimizer | AdamW (with fused kernels) | |
| Scheduler | Cosine Decay + Warmup | [Chinchilla, 2022] |
| Tokenizer | Custom BPE / SentencePiece | |
| Initialization | GPT-2 style (std = 0.02) | [Radford et al., 2019] |
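As a rough sketch of how the rows in this table compose in PyTorch, the block below wires a pre-RMSNorm residual layout around grouped-query attention (routed through `torch.nn.functional.scaled_dot_product_attention`, which dispatches to Flash Attention kernels when available) and a 2× SwiGLU feed-forward, with GPT-2-style `std = 0.02` initialization. All dimensions, class names, and defaults are illustrative assumptions, not code from `prod/architecture.py`.

```python
# Illustrative sketch only: pre-RMSNorm decoder block with grouped-query
# attention (GQA) and a SwiGLU FFN. Sizes and names are placeholders and do
# not mirror prod/architecture.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization (Zhang et al., 2019): no mean-centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class GQAttention(nn.Module):
    """Grouped-query attention: n_kv_heads < n_heads share key/value projections."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Expand the shared KV heads across query-head groups, then let SDPA
        # dispatch to a Flash Attention kernel when one is available.
        groups = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(groups, dim=1), v.repeat_interleave(groups, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(y.transpose(1, 2).reshape(B, T, -1))


class SwiGLU(nn.Module):
    """SwiGLU feed-forward (Shazeer, 2020): silu(W1 x) * (W3 x), projected by W2."""

    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, hidden, bias=False)
        self.w3 = nn.Linear(d_model, hidden, bias=False)
        self.w2 = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class DecoderBlock(nn.Module):
    """Pre-norm residuals: x + attn(norm(x)), then x + ffn(norm(x))."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.attn_norm = RMSNorm(d_model)
        self.attn = GQAttention(d_model, n_heads, n_kv_heads)
        self.ffn_norm = RMSNorm(d_model)
        self.ffn = SwiGLU(d_model, 2 * d_model)    # "2x FFN" from the table
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module: nn.Module) -> None:
        if isinstance(module, nn.Linear):          # GPT-2 style: N(0, 0.02)
            nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))
        return x + self.ffn(self.ffn_norm(x))


print(DecoderBlock()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

Repeating the KV heads with `repeat_interleave` keeps the sketch compatible with older PyTorch releases; recent releases also expose an `enable_gqa` flag on `scaled_dot_product_attention` that avoids materializing the repeated heads.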
- Custom BPE Tokenizer (train your own)
- End-to-End LLM Training Suite
- Checkpointing, Resumption, and Logging
- Flash Attention for faster, memory-efficient attention over long context windows
- RMSNorm + SwiGLU + GQA architecture
- Cosine LR Scheduling + Warmup (see the scheduler sketch below)
- ROCm/AMD GPU compatibility
- Config-driven YAML setup for multi-model scaling
- Built-in tests for dataset, trainer, and architecture
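For the scheduler feature, here is a minimal sketch of linear warmup followed by cosine decay, expressed as a multiplier for `torch.optim.lr_scheduler.LambdaLR`. The step counts, base learning rate, and decay floor are placeholder assumptions, not the repo's training configuration.

```python
# Minimal sketch: linear warmup then cosine decay, as an LR multiplier for
# LambdaLR. Step counts, base LR, and the decay floor are placeholders.
import math

import torch


def warmup_cosine(step: int, warmup_steps: int, total_steps: int, floor: float = 0.1) -> float:
    """LR multiplier: 0 -> 1 over warmup_steps, then cosine from 1 down to `floor`."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return floor + 0.5 * (1.0 - floor) * (1.0 + math.cos(math.pi * min(1.0, progress)))


model = torch.nn.Linear(8, 8)                       # stand-in for the actual model
# fused=True is available for AdamW when the parameters live on a GPU.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda s: warmup_cosine(s, warmup_steps=200, total_steps=10_000)
)

for step in range(1_000):
    optimizer.step()                                # forward/backward omitted here
    scheduler.step()
print(optimizer.param_groups[0]["lr"])              # LR after 1,000 optimizer steps
```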
```text
myT-LLM/
├── Research/                        # Papers, notes, experiments
├── dev/                             # Tokenization & preprocessing
│   ├── 11_train_suite_v1/           # Baseline trainer
│   ├── 12_train_suite_v2/           # Improved dataset pipeline
│   ├── 13_train_suite_v3/           # Benchmarks & timing
│   ├── 16_Last_Min_Improvements/    # Stability & performance patches
│   ├── 17_The_End_Game/             # Finalized pipeline
│   └── 18_Lets_Train/               # Final training runs
├── stage/                           # Archived early stages
├── prod/                            # Production-ready LLM trainer
│   ├── architecture.py
│   ├── tokenizer.py
│   ├── trainer.py
│   ├── configs/
│   ├── cmds.prod.sh
│   └── main.py
└── assets/                          # Images
```
Project lineage:
- miniGPT → theoretical and educational prototype
- myT-LLM → scaled-up, SOTA implementation with modern training infrastructure
- Attention is All You Need — Vaswani et al., 2017
- Language Models are Few-Shot Learners — OpenAI, 2020
- FlashAttention — Dao et al., 2022
- Chinchilla Scaling Laws — DeepMind, 2022
- LLaMA: Open and Efficient Foundation Language Models — Meta, 2023
“Don't just use Transformers — understand them.” — myT-LLM Project
