This project demonstrates a minimal RLHF loop for aligning language models with human preferences. It covers supervised fine-tuning, preference data collection, reward model training, and policy optimization with PPO or DPO, and is designed for clarity, reproducibility, and scalability.
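
As a sketch of the policy-optimization step mentioned above, the snippet below implements a minimal DPO preference loss in PyTorch. It is illustrative only, not code from this repository: the tensor names (`policy_chosen_logps`, `ref_chosen_logps`, and so on) and the default `beta=0.1` are assumptions about how summed per-completion log-probabilities would be passed in.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed log-probabilities that the policy
    (or frozen reference model) assigns to the chosen / rejected completion of
    each prompt. `beta` controls the strength of the implicit KL constraint
    toward the reference model. (Names and defaults are assumptions.)
    """
    # Log-ratio of policy vs. reference for chosen and rejected completions.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps

    # DPO objective: push the chosen log-ratio above the rejected one.
    logits = beta * (chosen_ratio - rejected_ratio)
    loss = -F.logsigmoid(logits).mean()

    # Implicit rewards, useful for logging preference accuracy.
    chosen_rewards = beta * chosen_ratio.detach()
    rejected_rewards = beta * rejected_ratio.detach()
    return loss, chosen_rewards, rejected_rewards

if __name__ == "__main__":
    # Random log-probabilities stand in for real model outputs.
    b = 4
    loss, cr, rr = dpo_loss(torch.randn(b), torch.randn(b),
                            torch.randn(b), torch.randn(b))
    print(loss.item(), (cr > rr).float().mean().item())
```

The implicit rewards returned alongside the loss make it easy to log how often the policy ranks the chosen completion above the rejected one during training.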

kantkrishan0206-crypto/AlignGPT
About
This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs.
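
The reward-model stage described above is typically trained with a pairwise Bradley-Terry loss over human-annotated preference pairs. The sketch below is a hypothetical illustration, not this project's code: `RewardHead`, the `hidden_size=768` feature dimension, and the random features in the usage example are placeholders for the base language model's pooled hidden states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Scalar reward head on top of pooled transformer features.

    A real reward model would reuse the base LM's final hidden state of the
    last token; here a plain feature vector is assumed for illustration.
    """
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.value = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Map each pooled representation to a single scalar reward.
        return self.value(features).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry pairwise loss: the chosen response should score higher.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

if __name__ == "__main__":
    head = RewardHead()
    # Random features stand in for encoded (prompt, response) pairs.
    feats_chosen, feats_rejected = torch.randn(8, 768), torch.randn(8, 768)
    loss = preference_loss(head(feats_chosen), head(feats_rejected))
    loss.backward()
    print(loss.item())
```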