A from-scratch PyTorch LLM implementing a Sparse Mixture-of-Experts (MoE) architecture with Top-2 gating. It integrates modern Llama-3-style components (RMSNorm, SwiGLU, RoPE, GQA) with a custom-built Byte-Level BPE tokenizer, and is pre-trained on a curated corpus of existential and dark philosophical literature.
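To make the Top-2 gating concrete, the sketch below shows how a sparse MoE feed-forward block of this kind is typically wired: a linear router scores the experts, each token keeps only its two highest-scoring SwiGLU experts, and their outputs are mixed with the renormalized router weights. This is a minimal illustration, not the repository's actual code; the class names, expert count, and dimensions (`MoELayer`, `SwiGLUExpert`, `n_experts`, `d_model`, `d_ff`) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUExpert(nn.Module):
    """One expert: the SwiGLU feed-forward used in Llama-style blocks."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


class MoELayer(nn.Module):
    """Sparse MoE: a linear router dispatches each token to its top-2 experts."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([SwiGLUExpert(d_model, d_ff) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        flat = x.reshape(-1, d)                             # (b*t, d)
        logits = self.router(flat)                          # (b*t, n_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # top-2 scores per token
        weights = weights.softmax(dim=-1)                   # renormalize the 2 kept scores
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # Tokens that routed to expert e (each token appears at most once here).
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(flat[token_idx])
        return out.reshape(b, t, d)


# Drop-in replacement for a dense FFN: same input/output shape,
# but only 2 of the n_experts run for any given token.
y = MoELayer(d_model=256, d_ff=1024)(torch.randn(2, 16, 256))  # -> (2, 16, 256)
```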
The EXIS-MoE project runs as a clear, sequential four-stage process:

1. Install the required dependencies.
2. Run the dataloader script, which curates the specialized philosophical and horror corpus from Project Gutenberg and generates the input.txt file.
3. Run train.py. This script trains the custom BPE tokenizer, constructs the Sparse Mixture-of-Experts (MoE) architecture, optimizes the model over the dataset with AdamW, and saves the resulting weights as frad.pth.
4. Run inference.py, which loads the trained weights, initializes the KV cache, and starts the auto-regressive generation loop (sketched below), enabling interaction with the domain-specialized model.
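A KV-cached generation loop of the kind step 4 describes usually prefills the cache with the full prompt once, then feeds only the newest token on each subsequent step. The sketch below illustrates that pattern; it assumes a model whose forward pass returns `(logits, updated_cache)` and a tokenizer exposing `encode`/`decode`. The `generate()` helper, the `past_kv` convention, and temperature sampling are assumptions for illustration, not inference.py's actual API.

```python
import torch


@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 200,
             temperature: float = 0.8, device: str = "cpu") -> str:
    # Encode the prompt and prefill the KV cache in a single forward pass.
    tokens = torch.tensor([tokenizer.encode(prompt)], device=device)  # (1, T)
    logits, past_kv = model(tokens, past_kv=None)

    for _ in range(max_new_tokens):
        # Sample the next token from the distribution at the last position.
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)          # (1, 1)
        tokens = torch.cat([tokens, next_token], dim=1)
        # With a warm cache, only the newest token needs a forward pass.
        logits, past_kv = model(next_token, past_kv=past_kv)

    return tokenizer.decode(tokens[0].tolist())
```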