Train a small reasoning language model from scratch.
AetherMind is a project to build Forge-1, a compact decoder-only transformer that can reason step-by-step using chain-of-thought (CoT) thinking tokens. Two variants are available:
| Variant | Parameters | VRAM Required | Target Hardware |
|---|---|---|---|
| Forge-1-Nano | ~125M | 4 GB | RTX 3050 Ti, GTX 1650 |
| Forge-1-Mini | ~350M | 15 GB | Colab T4, RTX 3060+ |
Forge-1 follows a modern LLaMA/Mistral-style architecture (a short illustrative sketch follows the list):
- RMSNorm — Pre-normalization (more efficient than LayerNorm)
- Grouped Query Attention (GQA) — Reduces KV-cache memory by sharing KV heads
- Rotary Position Embeddings (RoPE) — Relative position encoding via rotation
- SwiGLU Feed-Forward — Gated activation function for better feature selection
- Weight Tying — Input embeddings shared with output projection head
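The snippet below is a minimal PyTorch sketch of two items from this list, RMSNorm and weight tying, plus a config illustrating how GQA uses fewer KV heads than query heads. The class names (ForgeConfig, ForgeLM) and the sizes are illustrative assumptions, not the repository's actual code.

```python
# Illustrative sketch only; names and sizes are hypothetical, not the repo's real modules.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ForgeConfig:
    vocab_size: int = 32000
    dim: int = 768        # hidden size (hypothetical for the ~125M variant)
    n_heads: int = 12     # query heads
    n_kv_heads: int = 4   # GQA: KV projections are shared across groups of query heads


class RMSNorm(nn.Module):
    """Pre-normalization that rescales by the root mean square; no mean-centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each token vector by the reciprocal of its root mean square.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class ForgeLM(nn.Module):
    """Skeleton showing only the pieces relevant to the list above."""

    def __init__(self, cfg: ForgeConfig):
        super().__init__()
        self.embed = nn.Embedding(cfg.vocab_size, cfg.dim)
        self.norm = RMSNorm(cfg.dim)
        self.lm_head = nn.Linear(cfg.dim, cfg.vocab_size, bias=False)
        # Weight tying: the output projection reuses the input embedding matrix.
        self.lm_head.weight = self.embed.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Transformer blocks (GQA attention, RoPE, SwiGLU) would sit between these lines.
        h = self.norm(self.embed(tokens))
        return self.lm_head(h)
```

With the illustrative sizes above (32K vocabulary, 768-dim embeddings), tying the output head to the embedding matrix avoids roughly 25M duplicate parameters, a meaningful saving at the ~125M scale.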
# Clone the repo
git clone https://github.com/yourusername/AetherMind.git
cd AetherMind

# Create virtual environment
python -m venv venv
venv\Scripts\activate       # Windows
# source venv/bin/activate  # Linux/macOS

# Install dependencies
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download datasets (use 1000 for a quick test)
python data/download_datasets.py 1000

# Full download (takes longer)
python data/download_datasets.py

# Local training (RTX 3050 Ti / 4GB VRAM)
python scripts/train.py --config configs/local_config.yaml --variant nano

# Or on Colab (T4 / 15GB VRAM)
python scripts/train.py --config configs/colab_config.yaml --variant mini

# Resume interrupted training
python scripts/train.py --config configs/local_config.yaml --variant nano --resume

# Chat with the trained model
python scripts/chat.py --model outputs/final_model/forge1_nano_final.pt --variant nano

# Evaluate the trained model
python scripts/evaluate.py --model outputs/final_model/forge1_nano_final.pt --variant nano

# Export the trained model
python scripts/export.py --model outputs/final_model/forge1_nano_final.pt --variant nano

# On Colab (notebook cells)
!git clone https://github.com/yourusername/AetherMind.git
%cd AetherMind
!python scripts/colab_setup.py
!python scripts/train.py --config configs/colab_config.yaml --variant mini

| Dataset | Size | Purpose |
|---|---|---|
| OpenHermes-2.5 | 1M examples | Diverse instruction following |
| Open-Platypus | 25K examples | STEM reasoning with CoT |
| OpenThoughts-114k | 114K examples | Synthetic CoT reasoning traces |
For low-VRAM GPUs (4 GB), AetherMind uses the following techniques (a short training-loop sketch follows the list):
- FP16 mixed precision — Halves memory for activations
- Gradient checkpointing — Trades compute for memory
- Gradient accumulation — Small physical batch, large effective batch
- Weight tying — Shares embedding weights between input and output layers
- GQA — Fewer KV heads reduce KV-cache memory
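The real loop lives in scripts/train.py and the YAML configs; the snippet below is only a self-contained sketch of how FP16 autocast, loss scaling, and gradient accumulation typically fit together in PyTorch. Gradient checkpointing (usually wrapped around each transformer block via torch.utils.checkpoint) is omitted for brevity, and the tiny model and random token batches are placeholders.

```python
# Sketch of FP16 mixed precision + gradient accumulation; not the project's actual script.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 1000)).to(device)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # FP16 loss scaling
accum_steps = 8  # gradients from 8 micro-batches form one effective batch

for step in range(32):
    tokens = torch.randint(0, 1000, (2, 16), device=device)  # tiny random micro-batch
    with torch.cuda.amp.autocast(enabled=(device == "cuda"), dtype=torch.float16):
        logits = model(tokens)
        # Next-token loss, divided by accum_steps so accumulated gradients average correctly.
        loss = nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1)
        ) / accum_steps
    scaler.scale(loss).backward()  # accumulate scaled gradients

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)     # unscales gradients, then takes the optimizer step
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```

With accum_steps = 8, eight micro-batches of 2 sequences each yield the gradient of an effective batch of 16 while only ever holding 2 sequences of activations in memory.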
MIT License — see LICENSE for details.