
Conversation

@J-SUPHA (Collaborator) commented Jan 19, 2026

PR Type

  • RL Environment PR - Complete Environment Snapshot & Zero-Training sections
  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

Introduced training with LoRAs, plus training via shared weights, in the Atropos example trainer.

Related Issues

Not an issue but a feature.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Code refactor (no functional changes)
  • Build/CI/CD related changes
  • Other (please describe):

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Docstrings added for all new public classes / functions
  • If new .env vars are required, add them to the .env.example in the repo root

@dmahan93 (Collaborator) commented:

so, some comments:

  • can you break up the giant files a bit 😅
  • How are you handling QKV? vLLM has QKV in one tensor, but HF has it in three different ones, right?
  • How does the LoRA work here?
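The QKV point is a layout mismatch: HF checkpoints keep `q_proj`, `k_proj`, and `v_proj` as separate weight matrices, while vLLM packs them into a single `qkv_proj` tensor by concatenating rows (q first, then k, then v). A minimal sketch of the mapping, shown on plain nested lists so it is self-contained; in the trainer this would be `torch.cat([q, k, v], dim=0)` on real weight tensors, and the row counts below are toy numbers, not any model's actual shapes:

```python
def fuse_qkv(q, k, v):
    """Row-wise concatenation: vLLM's packed layout is [q; k; v]."""
    return q + k + v  # list concat, equivalent to torch.cat(..., dim=0)


def split_qkv(qkv, q_rows, k_rows):
    """Inverse mapping, needed when pushing fused weights back to HF names."""
    q = qkv[:q_rows]
    k = qkv[q_rows:q_rows + k_rows]
    v = qkv[q_rows + k_rows:]
    return q, k, v


# Toy GQA-style shapes: q has 4 rows, k/v have 2 rows each, hidden size 2.
q = [[1, 0]] * 4
k = [[2, 0]] * 2
v = [[3, 0]] * 2
qkv = fuse_qkv(q, k, v)          # 8 rows total
assert split_qkv(qkv, 4, 2) == (q, k, v)  # round-trip is lossless
```

The round-trip matters because weight sync runs in both directions: fusing when pushing trained HF weights into vLLM, splitting when comparing or exporting.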

@J-SUPHA (Collaborator, Author) commented Jan 20, 2026

  • Will break up the files once I fix point 2 and make sure it works as intended
  • Yes, this is a mistake on my end. HF and vLLM both have fused QKV for Qwen, so I assumed all models did. Will fix and report back once I test it with all models.
  • For the LoRA workflow: the trainer loads the HF base model, creates LoRA adapters on q_proj and v_proj, trains the adapters, saves the adapter, then calls POST /lora/load on vLLM. But the /lora/load endpoint in vllm_api_server is not correct; I will update this as well.
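The hand-off step of that workflow can be sketched as below. This is a stdlib-only sketch under assumptions: the `/lora/load` route and JSON payload shape are taken from this thread (and are stated above to still need fixing), the base URL and adapter name are made up for illustration, and vLLM's stock OpenAI-compatible server instead exposes `/v1/load_lora_adapter` with `lora_name`/`lora_path` fields when runtime LoRA updating is enabled:

```python
import json
import urllib.request


def build_lora_load_request(base_url: str, name: str, path: str) -> urllib.request.Request:
    """Build the POST that asks the vLLM server to hot-load a saved adapter.

    Route and payload follow the PR discussion (/lora/load); vLLM's stock
    server uses /v1/load_lora_adapter, so adjust to whichever the repo ships.
    """
    payload = json.dumps({"lora_name": name, "lora_path": path}).encode()
    return urllib.request.Request(
        f"{base_url}/lora/load",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a trainer step that just saved adapters/step_100
# (e.g. via PEFT's model.save_pretrained) and notifies a local vLLM server.
req = build_lora_load_request(
    "http://localhost:8000", "step_100", "adapters/step_100"
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```

The earlier steps (LoRA adapters on `q_proj`/`v_proj`) correspond to a PEFT `LoraConfig(target_modules=["q_proj", "v_proj"])` wrapped around the HF base model; only the final notification to vLLM is shown since that is the piece under repair.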

@dmahan93 (Collaborator) commented:

@J-SUPHA (Collaborator, Author) commented Jan 20, 2026

No, you are right; will look into it.
