deepseek-from-scratch

Here is 1 public repository matching this topic...

AnkitaMungalpara / Building-DeepSeek-From-Scratch

This repository shows how to build a DeepSeek language model from scratch using PyTorch. It includes clean, well-structured implementations of advanced attention techniques such as key–value caching for fast decoding, multi-query attention, grouped-query attention, and multi-head latent attention.

transformers pytorch multi-query-attention grouped-query-attention multi-head-latent-attention deepseek-from-scratch

Updated Jan 10, 2026
Jupyter Notebook
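
The attention variants named in the description differ mainly in how much key-value state they cache per token during decoding. The following is a minimal sketch of that arithmetic, not code from the repository: the function name `kv_cache_values_per_token` and every dimension used below (32 heads, head size 128, latent size 512, positional key size 64) are illustrative assumptions.

```python
def kv_cache_values_per_token(n_heads, head_dim, n_kv_heads=None,
                              latent_dim=None, rope_dim=0):
    """Count the values cached per token, per layer (hypothetical helper).

    - Multi-head attention: every head stores its own key and value.
    - Grouped-query / multi-query attention: only n_kv_heads key-value
      heads are stored and shared across the query heads.
    - Multi-head latent attention: one compressed latent vector is stored,
      plus an optional small positional key (rope_dim).
    """
    if latent_dim is not None:
        # Latent attention caches the compressed vector instead of per-head K/V.
        return latent_dim + rope_dim
    kv_heads = n_heads if n_kv_heads is None else n_kv_heads
    if n_heads % kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    # Factor of 2: one key vector and one value vector per stored head.
    return 2 * kv_heads * head_dim


# Assumed, illustrative dimensions; not taken from the repository.
N_HEADS, HEAD_DIM = 32, 128

sizes = {
    "multi-head": kv_cache_values_per_token(N_HEADS, HEAD_DIM),
    "grouped-query (8 KV heads)": kv_cache_values_per_token(N_HEADS, HEAD_DIM, n_kv_heads=8),
    "multi-query": kv_cache_values_per_token(N_HEADS, HEAD_DIM, n_kv_heads=1),
    "multi-head latent": kv_cache_values_per_token(N_HEADS, HEAD_DIM, latent_dim=512, rope_dim=64),
}

for name, size in sizes.items():
    print(f"{name}: {size} values per token per layer")
```

Under these assumed dimensions, sharing key-value heads shrinks the cache in proportion to the number of stored heads, while the latent variant stores a fixed-size vector regardless of head count.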

