AlignGPT

This project demonstrates a minimal RLHF loop for aligning language models with human preferences. It covers supervised fine-tuning, preference data collection, reward model training, and policy optimization using PPO or DPO, and is designed for clarity, reproducibility, and scalability.
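As a rough illustration of the DPO variant of the policy-optimization step, the sketch below shows the standard DPO loss computed from per-sequence log-probabilities under the trainable policy and a frozen reference model. This is plain PyTorch, not code from this repository; the function name and the beta value are illustrative assumptions.

```python
# Hedged sketch of a DPO training objective; not this repository's implementation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each tensor holds per-sequence log-probabilities (summed over response
    tokens) under either the trainable policy or the frozen reference model.
    """
    # Log-ratios of policy vs. reference for the preferred / dispreferred responses
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to widen the margin between the two log-ratios, scaled by beta
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

In a loop of this shape, DPO replaces the PPO rollout-and-reward step: no explicit reward model or value function is needed at optimization time, only a frozen copy of the SFT model as the reference.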

About

This project implements a minimal LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model on human-annotated preference data, fine-tuning the language model via policy optimization, and running ablation studies to evaluate robustness, fairness, and alignment trade-offs.
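For the reward-model stage, a common choice (and the one assumed in this sketch; the names and the scalar-head setup are illustrative, not taken from this repository) is a Bradley-Terry pairwise loss over human-annotated chosen/rejected pairs:

```python
# Hedged sketch of pairwise reward-model training; illustrative names only.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: the human-preferred response should score higher."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Usage, assuming a hypothetical `reward_model` that maps token ids to one
# scalar reward per sequence:
#   r_c = reward_model(chosen_ids, chosen_mask)      # shape [batch]
#   r_r = reward_model(rejected_ids, rejected_mask)  # shape [batch]
#   loss = pairwise_reward_loss(r_c, r_r)
```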
