Draft
Changes from 250 commits (300 commits total)
ba40987
delete data_preprocess, enable get_dataset_info
Jensen246 Sep 4, 2025
ac24a57
delete code about preprocess, enable finetune with one dataset_info.json
Jensen246 Sep 4, 2025
d68a951
refine prompt to get file_name of dir correctly
Jensen246 Sep 4, 2025
f7f7830
remove pre_model_path, move import
Jensen246 Sep 5, 2025
d3ee94d
rename task
Jensen246 Sep 5, 2025
f82f5d0
enable task components, refactor TrainingTask
Jensen246 Sep 5, 2025
e7fcdbd
feat: eval yaml of coder, including mini-batch test
Jensen246 Sep 5, 2025
ac8dc80
refactor: remove deprecated file, simplify code
Jensen246 Sep 5, 2025
e9d8a70
feat: new FTExperiment class
Jensen246 Sep 5, 2025
1e45761
feat: print filtered params
Jensen246 Sep 5, 2025
029a32f
run extract_params in docker, remove make llamafactory-install(no lon…
Jensen246 Sep 8, 2025
17e27c7
feat: extract llama factory info and save it as file(need debug and d…
Jensen246 Sep 9, 2025
7a7ebb8
fix: use cache in llamafactory info
Jensen246 Sep 9, 2025
cfdb08b
feat: select model automatically
Jensen246 Sep 9, 2025
fd45c42
feat: extract info from llama-factory in docker, refine model&paramet…
Jensen246 Sep 15, 2025
0a20190
refactor: move env prepare&llamafactory manager into scene init
Jensen246 Sep 15, 2025
1883520
feat: check commit hash and refresh llamafactory info
Jensen246 Sep 15, 2025
1d2ffba
fix: remove redundant file check and add chmod
Jensen246 Sep 18, 2025
d78154c
fix: add required fields in yaml&template
Jensen246 Oct 9, 2025
1b31fbf
refactor: simplify llama factory manager
Jensen246 Oct 9, 2025
77f49e0
feat: docker_cache
Jensen246 Oct 10, 2025
9649def
chore: move import position
Jensen246 Oct 10, 2025
45b863c
feat: update llama factory optionally
Jensen246 Oct 11, 2025
3b6e559
chore: rename some parameters
Jensen246 Oct 11, 2025
a2b7a70
refactor: simplify coder evaluator
Jensen246 Oct 11, 2025
10cc554
feat: coder docker cache
Jensen246 Oct 11, 2025
5106556
feat: create_ws_ckp for FT, and clean init file
Jensen246 Oct 11, 2025
6ffe9eb
chore: rename, more gpu_info in exp_gen
Jensen246 Oct 13, 2025
58326be
feat: add sample , sample count&data size
Jensen246 Oct 13, 2025
71f808c
chore: rename, reset prompt
Jensen246 Oct 13, 2025
49d9d2f
chore: update TODO
Hoder-zyf Oct 14, 2025
67722e0
fix: check if any file exists in info directory before extracting LLa…
Jensen246 Oct 15, 2025
76dae4e
fix: add yaml in md5_hash(for cached_run)
Jensen246 Oct 15, 2025
5bb1bd8
fix(proposal): yaml path bug
Jensen246 Oct 20, 2025
f8f1ac9
refactor: remove fallback in exp_gen
Jensen246 Oct 20, 2025
49ce181
feat: json schema in expgen
Jensen246 Oct 20, 2025
61df6d4
fix: filter params, support tensorboard
Jensen246 Oct 20, 2025
6da0f1a
add tensorboard in requirements
Jensen246 Oct 20, 2025
3366e90
fix bugs
Jensen246 Oct 21, 2025
993b46c
refactor: rm redundant __init__ in ds
Jensen246 Oct 21, 2025
7121089
fix: use await in direct_exp_gen
Jensen246 Oct 21, 2025
8007f05
docs: add TODO for pending_task_list
Jensen246 Oct 21, 2025
a0362d0
refactory: rm create_ws_ckp, inherit from FBWorkspace
Jensen246 Oct 21, 2025
492ef18
refactor: rm redundant feedback&running func, which have been inherited
Jensen246 Oct 21, 2025
35484f7
refactor: rm base_model download during scenario init
Jensen246 Oct 21, 2025
d8b8ef8
refactor(env): use declarative config for exclude_chmod_paths from sh…
Jensen246 Oct 21, 2025
1fd5b7c
refactor: remove LLMPipelineEvaluator
Jensen246 Oct 22, 2025
ec4bada
refactor: remove pending_tasks_list, use sub_tasks directly
Jensen246 Oct 22, 2025
2935625
feat(DockerEnv): show log dynamically, save log in docker(optional)
Jensen246 Oct 22, 2025
11c91a9
refactor(coder): remove debug_mode, which is outdated
Jensen246 Oct 22, 2025
2afaed8
Merge branch 'main' into qzli/ft
Jensen246 Oct 22, 2025
c5bdf04
fix: exclude template field(auto detect)
Jensen246 Oct 22, 2025
6d2817c
fix: set ckp size limit
Jensen246 Oct 22, 2025
13d8c2b
refactor: rename evaluator
Jensen246 Oct 23, 2025
02ba0fe
refactor(runner): rename classes, set es improve_mode
Jensen246 Oct 23, 2025
b845cd6
fix: enhance feedback handling in MultiProcessEvolvingStrategy for im…
peteryang1 Oct 23, 2025
bd024fc
Merge remote-tracking branch 'upstream/xuyang1/fix_improve_mode_bug' …
Jensen246 Oct 23, 2025
5a3d02d
Merge remote-tracking branch 'upstream/main' into qzli/ft
Jensen246 Oct 23, 2025
623a656
fix: enforce queried_knowledge requirement and update type hints for …
you-n-g Oct 24, 2025
14b259d
refactor: refine exp_feedback
Jensen246 Oct 24, 2025
1b118b8
feat: frame for runner and exp feedback
Jensen246 Oct 24, 2025
95b76af
refactor: simplify runner eval
Jensen246 Oct 27, 2025
ca68896
refactor: rm model_dump setting
Jensen246 Oct 27, 2025
ecb5748
refactor: rm eval in exp2feedback
Jensen246 Oct 27, 2025
e77bac6
feat: eval with opencompass
Jensen246 Oct 27, 2025
ef0932e
fix: set model_abbr
Jensen246 Oct 27, 2025
814124e
feat: Benchmark Docker
Jensen246 Oct 27, 2025
aff5e2f
feat: lm_eval instead of opencompass
Jensen246 Oct 29, 2025
71f7747
feat: share benchmarks cache
Jensen246 Oct 29, 2025
af2ced7
feat(benchmark): set gpu_nums automatically
Jensen246 Oct 29, 2025
ba6be59
fix(benchmark): get gpu count
Jensen246 Oct 29, 2025
d1e8e60
chore: use aime25 benchmark
Jensen246 Oct 29, 2025
fbc483a
fix: git clone lm_eval, for newest benchmark
Jensen246 Oct 29, 2025
92bc892
feat(benchmark): limit and more log info
Jensen246 Oct 30, 2025
4dfba68
use env to get gpu count
peteryang1 Nov 4, 2025
2fd1844
feat: use opencompass instead of lm_eval
Jensen246 Nov 4, 2025
4bc4b0a
refactor(benchmark): simplify sh
Jensen246 Nov 5, 2025
61052f3
enable costeer
peteryang1 Nov 6, 2025
7f82cf2
Merge branch 'main' into qzli/ft
peteryang1 Nov 6, 2025
6d8ea6d
some update on the code
peteryang1 Nov 6, 2025
4d3da5b
feat(coder): add type and default value in prompt
Jensen246 Nov 6, 2025
9921831
fix: unable to quantization
Jensen246 Nov 6, 2025
723d55a
fix: docker for opencompass
Jensen246 Nov 6, 2025
058a4b1
fix: eval with opencompass[vllm]
Jensen246 Nov 6, 2025
d0f1f46
docs: llm ft readme
Jensen246 Nov 7, 2025
a2c3ccd
fix: coder and benchmark bugs
Jensen246 Nov 8, 2025
c85fb2e
remove useless doc
Jensen246 Nov 10, 2025
9cb2415
feat: add feedback in expgen
Jensen246 Nov 10, 2025
33d62fe
expgen gets all params and decides more
Jensen246 Nov 10, 2025
c686e1f
update all code
peteryang1 Nov 11, 2025
9e81941
resume generate_dataset_info_config
peteryang1 Nov 11, 2025
904fef1
new update
peteryang1 Nov 11, 2025
75f0986
improve proposal prompt
peteryang1 Nov 12, 2025
433b1c1
refactor: dataset info, and fix some bugs
Jensen246 Nov 12, 2025
190fe2e
update exp_gen
peteryang1 Nov 12, 2025
d36ce4c
fix several bugs
peteryang1 Nov 12, 2025
a68c94c
update prompt
peteryang1 Nov 14, 2025
86d0240
some update to code
peteryang1 Nov 14, 2025
357ec13
use scenario to describe the data and device
peteryang1 Nov 17, 2025
5049c4f
simple update on prompts
peteryang1 Nov 18, 2025
25c295b
small update
peteryang1 Nov 18, 2025
0b94fbd
update code
peteryang1 Nov 18, 2025
df37d49
add dataset info into scenario
peteryang1 Nov 19, 2025
f01792b
remove redundant parameters
Jensen246 Nov 20, 2025
0fcc3bc
enable opencompass eval
Jensen246 Nov 22, 2025
19e08d5
fix benchmark test(enable dotenv)
Jensen246 Nov 22, 2025
517347c
fix lora eval bug
Jensen246 Nov 22, 2025
836d187
some update for benchmark
Jensen246 Nov 23, 2025
a5b2aca
update feedback
peteryang1 Nov 20, 2025
c15bcfb
finalize feedback
peteryang1 Nov 24, 2025
0e7dc31
user benchmark as input
peteryang1 Nov 24, 2025
4d5d19a
clean benchmark code
Jensen246 Nov 24, 2025
83eba89
some update to benchmark.py
peteryang1 Nov 24, 2025
c293ba0
new benchmark file
peteryang1 Nov 24, 2025
7a78046
several small update
peteryang1 Nov 24, 2025
a837282
modify the prompt of coder
peteryang1 Nov 24, 2025
55601b7
remove oft, which has been obsoleted
Jensen246 Nov 24, 2025
8f83d31
comment for 2 level dataset instructure
Jensen246 Nov 24, 2025
0c92d34
prompts and dockerfile refine for using deepspeed and fa2
Jensen246 Nov 24, 2025
b95254c
lint
Jensen246 Nov 24, 2025
2e7f69a
hot fix
peteryang1 Nov 25, 2025
3cf9ac9
several major update
peteryang1 Nov 25, 2025
84972cf
prompt key refinement
peteryang1 Nov 25, 2025
dc2e96a
refine prompt
peteryang1 Nov 25, 2025
eb613cd
Merge branch 'main' into qzli/ft
peteryang1 Nov 25, 2025
7d2b64b
small update
peteryang1 Nov 25, 2025
3574238
fix a small bug
peteryang1 Nov 26, 2025
2056c0b
remove debug config after execution
peteryang1 Nov 27, 2025
0979827
fix: only remove <think> at start
Jensen246 Nov 27, 2025
1f2ca73
feat: support creating dataset & multi-eval frame (#1302)
you-n-g Nov 28, 2025
b16c3a7
feat: data implement for pre-proposal and proposal and add datasets (…
Hoder-zyf Nov 28, 2025
e489d9c
feat: add stats in dataset_info, and enable data coder (#1306)
Jensen246 Dec 1, 2025
e104f50
feat: Merge data coder (#1307)
Jensen246 Dec 1, 2025
5b7dc33
replace str length with token_limit
Hoder-zyf Dec 1, 2025
41fc3c5
add readme to dataset_info and remove useless blank lines in scenario…
Hoder-zyf Dec 1, 2025
a7e2734
feat: dataset prepare
Jensen246 Dec 2, 2025
c4d59b5
fix: extract prams script name
Jensen246 Dec 2, 2025
a5e306c
feat: add loss&predictions samples to feedback
Jensen246 Dec 2, 2025
24d4f2c
remove duplicate envs and and add llm_api_preferences and enhance rea…
Hoder-zyf Dec 2, 2025
e9574e6
feat: network for ft_env
Jensen246 Dec 3, 2025
9c68d3c
fix: remove gpt-4o, which has low quota
Jensen246 Dec 3, 2025
324fac2
feat: a simple ui
Jensen246 Dec 3, 2025
c89fab9
feat: merge data and train task type (#1309)
Jensen246 Dec 3, 2025
20bd353
feat: filter redundant prams of lf
Jensen246 Dec 3, 2025
e921e8a
fix: ui bug caused by removing task_type
Jensen246 Dec 3, 2025
6ca6f7a
fix: force agent to use high concurrency, and remove redundant prompt
Jensen246 Dec 4, 2025
6042eca
feat: extract info from llama factory log, and check data exists befo…
Jensen246 Dec 4, 2025
ddeb8b4
fix: add compatibility rules
Jensen246 Dec 4, 2025
6a0290a
feat: llm evaluator for data coder
Jensen246 Dec 4, 2025
d873181
feat: openai package in ft docker, and refine prompt
Jensen246 Dec 5, 2025
c17f8ae
feat: refine ft ui, add more info
Jensen246 Dec 5, 2025
8d5a4c3
feat: add raw logs
Jensen246 Dec 5, 2025
54c2b71
refine data coder prompt(for feedback debug)
Jensen246 Dec 5, 2025
88ac617
feat: select dataset in scen init
Jensen246 Dec 5, 2025
63e849e
fix: ui for docker log seperately
Jensen246 Dec 5, 2025
42abe13
feat: sync log through blob
Jensen246 Dec 6, 2025
e11df1c
improve ui, and add llm feedback in Runner&Exp2FB (#1312)
Jensen246 Dec 7, 2025
fbdbdc0
feat(UI): add running info and benchmark metric in loop expander
Jensen246 Dec 8, 2025
f433d3c
feat(UI): add render markdown toggle
Jensen246 Dec 8, 2025
700569e
feat: refine prompts and add error type in exp2fb
Jensen246 Dec 8, 2025
c3a2bd7
feat: add filterd params reason, set default benchmark timeout to inf…
Jensen246 Dec 8, 2025
be1faa2
recover dataset deepscaler
Jensen246 Dec 8, 2025
fec6d76
feat: set timeout in .env
Jensen246 Dec 9, 2025
7a37170
refactor: unifiied ft_env timeout
Jensen246 Dec 9, 2025
fe61835
feat: debug mode for data coder
Jensen246 Dec 11, 2025
98d77e4
feat: deliver data_stats after generate debug_data
Jensen246 Dec 11, 2025
6ff8105
feat: use gpt-5.1 as judge model, set judge_retry, and refine debug m…
Jensen246 Dec 11, 2025
38c9574
refine prompt
Jensen246 Dec 11, 2025
4d023c3
refactor: llama factory manager logic, and refine data processing prompt
Jensen246 Dec 11, 2025
8bab158
feat(DockerEnv): support GPU selection via CUDA_VISIBLE_DEVICES
Jensen246 Dec 11, 2025
7ff3e91
feat: set api concurrency via .env
Jensen246 Dec 11, 2025
f3b6cc2
fix: ft env timeout bug
Jensen246 Dec 11, 2025
171a07c
feat: enable CondaEnv run
Jensen246 Dec 12, 2025
39efce0
fix: can't update bin path in first run, and path bug in lf manager
Jensen246 Dec 12, 2025
506491b
feat(ui): set log path through .env
Jensen246 Dec 15, 2025
42d4ca6
refactor(ui): wrap_lines, remove css
Jensen246 Dec 15, 2025
625da8a
feat(coder): retry when parse code-block fail
Jensen246 Dec 15, 2025
442514b
fix: refine single-fb in ui, and fix path bug(not allow proposal to d…
Jensen246 Dec 15, 2025
3cfb3ad
fix: opencompass CondaEnv torch compatible with vllm
Jensen246 Dec 16, 2025
3924326
fix: refine error text in coding
Jensen246 Dec 16, 2025
96e2dc4
feat: deepspeed config for CondaEnv
Jensen246 Dec 16, 2025
92c5770
feat: memory estimator
Jensen246 Dec 16, 2025
9b7f7c8
fix: deepspeed package for condaenv
Jensen246 Dec 16, 2025
410a5b5
fix: use `client.chat.completions.create()` only
Jensen246 Dec 16, 2025
a7f1b8a
feat: flash attention for condaenv
Jensen246 Dec 16, 2025
23a9d7b
feat: strong and weak models interface
Jensen246 Dec 16, 2025
00de2c0
fix: condaenv package dependency
Jensen246 Dec 16, 2025
4c5b6fb
use multi round conversation in llm finetune proposal
peteryang1 Dec 17, 2025
7345abe
refine prompt for data processing
Jensen246 Dec 17, 2025
336c6cd
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 17, 2025
3fa8280
enable evolving in data coder
peteryang1 Dec 17, 2025
3ee61d1
maximize output token size
peteryang1 Dec 17, 2025
d6c26e3
fix: refine ui
Jensen246 Dec 17, 2025
271167b
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 17, 2025
75f7a8c
fix: optional packages for llama factory
Jensen246 Dec 17, 2025
d3e20e5
fix: torch denpendency for b200
Jensen246 Dec 17, 2025
ddb6315
fix: opencompass dependency
Jensen246 Dec 18, 2025
b0a79f3
update cot prompts
Hoder-zyf Dec 18, 2025
99367bd
skip the sub implement
peteryang1 Dec 18, 2025
90b728c
skip conda preparation if env exists
peteryang1 Dec 18, 2025
f19027c
update chemcot datasets
Hoder-zyf Dec 18, 2025
77a37b7
fix: unify docker to use litellm
Jensen246 Dec 18, 2025
cc01cdc
update readme and instructions
Hoder-zyf Dec 19, 2025
daaab94
fix: set CUDA_VISIBLE_DEVICES for CondaEnv
Jensen246 Dec 21, 2025
936a181
feat: add panorama dataset, refactor dataset interface
Jensen246 Dec 21, 2025
a81ffb4
feat: calculate token using tiktoken, and ndarray bug
Jensen246 Dec 21, 2025
08f8e86
Merge upstream/chemcot: add chemcot dataset with DatasetConfig interface
Jensen246 Dec 21, 2025
7763cc6
fix: download subtasks of chemcotdataset seperately
Jensen246 Dec 21, 2025
0a41502
feat: customized prepare func for datasets
Jensen246 Dec 21, 2025
781d6d0
feat: update new benchmarks
Jensen246 Dec 21, 2025
5b22c9c
add datasets package
Jensen246 Dec 21, 2025
a963bfe
docs: readme for llm finetune
Jensen246 Dec 22, 2025
4a6a4fe
feat: download raw data directly, with post-process function
Jensen246 Dec 22, 2025
b26e72c
feat: analyze raw dataset
Jensen246 Dec 22, 2025
3d71857
suppress litellm debug info
Jensen246 Dec 22, 2025
0d1fd17
feat(ui): summary page
Jensen246 Dec 23, 2025
473cfe5
feat: run multi-jobs
Jensen246 Dec 23, 2025
fe3374c
feat: improve ui
Jensen246 Dec 23, 2025
60c3e75
feat: add path and checkout options to LLM finetune loop entrypoint
you-n-g Dec 23, 2025
37c7804
feat: add FinanceIQ_ppl benchmark with auto-download and dataset desc…
you-n-g Dec 23, 2025
37147c4
refactor: remove unused imports and dead code, fix session folder log…
you-n-g Dec 23, 2025
1000aa0
feat: enable tablebench and tableInstruct dataset
chelsea97 Dec 23, 2025
3d18e0a
refine dataset readme, and coder prompt
Jensen246 Dec 23, 2025
93ecc78
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 23, 2025
5b88eac
refine proposal and coder prompt
Jensen246 Dec 23, 2025
d830351
fix: ui path (default log path)
Jensen246 Dec 23, 2025
a225fd5
feat: add automatic LoRA model merging for benchmarking with vLLM
you-n-g Dec 23, 2025
90c621d
refactor: reorganize finetune benchmark and merge modules under bench…
you-n-g Dec 23, 2025
7cc2a8a
refactor: modularize benchmark config and error extraction for finetu…
you-n-g Dec 23, 2025
d232af0
fix: update benchmark import paths and disable env cache for device info
you-n-g Dec 24, 2025
bc0742b
refactor docke&conda env and fix import bugs
Jensen246 Dec 24, 2025
46743d0
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 24, 2025
18f85be
modify init python file
chelsea97 Dec 24, 2025
97e2f4c
feat: add FinanceIQ dataset split utility and integrate with pipeline
you-n-g Dec 24, 2025
e73ffc6
feat: set weak and strong model by env, distribute workload across mo…
Jensen246 Dec 24, 2025
18e7207
Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune
Jensen246 Dec 24, 2025
325d0d2
feat: sample dataset and rm params for tensorboard, wandb
Jensen246 Dec 24, 2025
66a5677
update script to run jobs
Jensen246 Dec 24, 2025
b6b967f
refine proposal prompt, remove specific dataset name
Jensen246 Dec 24, 2025
2c9fbc2
fix(ui): auto switch log folder
Jensen246 Dec 24, 2025
9143e6f
fix: estimate the processed full data after sample
Jensen246 Dec 25, 2025
bdf9f5b
feat: filter raw data more aggressively, and lower data_eval standard
Jensen246 Dec 25, 2025
62f0c58
feat: sync workspace to blob
Jensen246 Dec 25, 2025
5d07fea
feat: rdkit for chemcotbench
Jensen246 Dec 25, 2025
7c0610e
update qwen2.5&llama3.1 context
Jensen246 Dec 25, 2025
0bbc492
fix: force failure on validation error and remove try/except in valid…
you-n-g Dec 26, 2025
01a65b5
feat: unified error sample extraction (with test scripts)
Jensen246 Dec 28, 2025
3e60b88
feat: set conda cache with .env
Jensen246 Dec 28, 2025
ee51fa8
feat: skip data eval if data pass in last evo
Jensen246 Dec 28, 2025
afb867a
fix: rm redundant param
Jensen246 Dec 28, 2025
2 changes: 2 additions & 0 deletions .env.example
@@ -55,5 +55,7 @@ EMBEDDING_MODEL="litellm_proxy/BAAI/bge-large-en-v1.5"
# Cache Setting (Optional):
# USE_CHAT_CACHE=True
# USE_EMBEDDING_CACHE=True
# FT_DOCKER_ENABLE_CACHE=True
# DS_DOCKER_ENABLE_CACHE=True
# Senario Configs:
# ==========================================
4 changes: 3 additions & 1 deletion .gitignore
@@ -182,4 +182,6 @@ static/
# AI assistant
.cursor/
.claude/
AGENTS.md
AGENTS.md

scripts/
256 changes: 256 additions & 0 deletions rdagent/app/finetune/llm/README.md
@@ -0,0 +1,256 @@
# LLM Fine-tuning (FT) Scenario: Usage Guide

This document describes how to run RD-Agent's LLM Fine-tuning (FT) scenario.

## Overview

The FT scenario automates improving a large language model's performance on a specific benchmark. The system automatically:
1. Generates data-processing and training code
2. Runs model fine-tuning
3. Evaluates the model on the target benchmark
4. Iterates based on feedback

## Supported Benchmarks

| Category | Benchmark | Dataset | Description |
|------|-----------|--------|------|
| Math | `aime24`, `aime25` | `deepscaler` | AIME math competitions |
| Patent | `panorama_par4pc` | `panorama-par4pc` | Patent prior-art retrieval |
| Patent | `panorama_pi4pc` | `panorama-pi4pc` | Patent paragraph identification |
| Patent | `panorama_noc4pc` | `panorama-noc4pc` | Patent novelty classification |
| Chemistry | `chemcotbench_mol_und` | `chemcot-mol_und` | Molecule understanding |
| Chemistry | `chemcotbench_mol_edit` | `chemcot-mol_edit` | Molecule editing |
| Chemistry | `chemcotbench_mol_opt` | `chemcot-mol_opt` | Molecule optimization |
| Chemistry | `chemcotbench_reaction` | `chemcot-rxn` | Chemical reaction prediction |

> Dataset configurations live in the `DATASETS` dict in `rdagent/scenarios/finetune/datasets/__init__.py`.

> At runtime, the agent inspects all datasets and selects the ones relevant to the target benchmark and scenario.

## Environment Setup

### 1. Runtime Environment

Make sure the main `rdagent` environment is installed; any other environments it needs are created automatically.

> Choose the environment type by setting `FT_Coder_CoSTEER_env_type=conda` or `docker` in the `.env` file.

### 2. The `.env` File

Create a `.env` file in the project root, using the template below as a reference:

```bash
# ========== API Configuration ==========
BACKEND=rdagent.oai.backend.LiteLLMAPIBackend
CHAT_MODEL=gpt-5.2
CHAT_TEMPERATURE=1
CHAT_STREAM=True
OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=http://your-api-endpoint

EMBEDDING_MODEL=text-embedding-ada-002
EMBEDDING_USE_AZURE=True

# ========== Global Configs ==========
MAX_RETRY=12000
RETRY_WAIT_SECONDS=5
MULTI_PROC_N=16
STEP_SEMAPHORE=1

# ========== Cache Settings ==========
DUMP_CHAT_CACHE=False
USE_CHAT_CACHE=False
DUMP_EMBEDDING_CACHE=True
USE_EMBEDDING_CACHE=True
LOG_LLM_CHAT_CONTENT=True

CHAT_FREQUENCY_PENALTY=0.1
CHAT_PRESENCE_PENALTY=0.0

# ========== FT Scenario Specific ==========
FT_FILE_PATH=/path/to/your/finetune/workspace

# Environment type: docker or conda
# Set to "conda" when Docker is unavailable
FT_Coder_CoSTEER_env_type=conda

# Docker settings (only used when env_type=docker)
FT_DOCKER_ENABLE_CACHE=True
FT_UPDATE_LLAMA_FACTORY=False

# Data processing API concurrency (adjust based on target API capacity)
FT_API_MAX_WORKERS=1000

# Data processing Model
FT_STRONG_MODELS='["gpt-5", "gpt-5.1"]'
FT_WEAK_MODELS='["gpt-4o-mini"]'

# Benchmark and target (can be overridden in script)
FT_TARGET_BENCHMARK=aime25
FT_USER_TARGET_SCENARIO="I need to enhance the model's performance on math reasoning tasks."

# Timeout settings
FT_DATA_PROCESSING_TIMEOUT=28800

# Judge settings (optional)
# FT_JUDGE_MODEL=gpt-5.1
# FT_JUDGE_RETRY=10

REASONING_THINK_RM=True

# ========== Logging ==========
LOG_FORMAT_CONSOLE="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | <cyan>{process}</cyan> | {name}:{function}:{line} - {message}"

# ========== HuggingFace ==========
HF_TOKEN=hf_xxx
```
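A quick sanity check that the key variables from `.env` are actually set can save a failed run. This is a minimal sketch, not part of the repo; the chosen variable names are an assumption based on the template above:

```shell
# Sanity-check a few key variables before launching (hypothetical helper;
# the variable list is an assumption based on the .env template above).
check_ft_env() {
  missing=""
  for var in OPENAI_API_KEY FT_FILE_PATH FT_TARGET_BENCHMARK; do
    eval "val=\${$var}"
    [ -z "$val" ] && missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "OK"
}
```

Run it once (e.g. via `dotenv run -- sh -c '... check_ft_env'`) before kicking off a long loop.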

## Running

### Basic Command

```bash
# Activate the conda environment
conda activate rdagent

# Run the FT scenario
dotenv run -- python rdagent/app/finetune/llm/loop.py --base-model <MODEL>
```

### Command-line Arguments

| Argument | Description | Example |
|------|------|------|
| `--base-model` | Base model name (required; all other arguments are optional) | `Qwen/Qwen2.5-7B-Instruct` |
| `--benchmark` | Target benchmark | `aime25` |
| `--benchmark-description` | Benchmark description | - |
| `--dataset` | Dataset to use | - |
| `--step-n` | Step limit | `10` |
| `--loop-n` | Loop limit | `5` |
| `--timeout` | Overall time limit | - |

### Examples

```bash
# Fine-tune Qwen2.5-7B on AIME25
dotenv run -- python rdagent/app/finetune/llm/loop.py \
    --base-model Qwen/Qwen2.5-7B-Instruct

# Run on specific GPUs
CUDA_VISIBLE_DEVICES=0,1 dotenv run -- python rdagent/app/finetune/llm/loop.py \
    --base-model Qwen/Qwen2.5-7B-Instruct

# Limit the number of loops
dotenv run -- python rdagent/app/finetune/llm/loop.py \
    --base-model Qwen/Qwen2.5-7B-Instruct \
    --loop-n 3
```

### Running Multiple Jobs in Parallel

Create a `tasks.json` config file:
```json
{
  "tasks": [
    {"model": "Qwen/Qwen2.5-7B-Instruct", "benchmark": "aime25", "gpus": "0,1"},
    {"model": "Qwen/Qwen2.5-7B-Instruct", "benchmark": "gsm8k", "gpus": "2,3"}
  ]
}
```

Run it with the `run_ft_deploy.sh` script:
```bash
./run_ft_deploy.sh tasks.json            # normal run
./run_ft_deploy.sh tasks.json --dry-run  # preview the configuration only
./run_ft_deploy.sh tasks.json --no-sync  # disable blob sync
```
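Since the deploy script consumes `tasks.json` blindly, a quick structural check before launching can catch typos early. This is a hypothetical helper (not part of the repo), using `python3` for the JSON parsing:

```shell
# Validate that every task in tasks.json defines model, benchmark and gpus
# (hypothetical helper; not part of the repo).
validate_tasks() {
  python3 - "$1" <<'PYEOF'
import json, sys

tasks = json.load(open(sys.argv[1]))["tasks"]
missing = [i for i, t in enumerate(tasks)
           if not all(k in t for k in ("model", "benchmark", "gpus"))]
print("OK" if not missing else "tasks missing fields: %s" % missing)
PYEOF
}
```

A task that lacks any of the three keys is reported by its index, so you can fix it before GPUs are allocated.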

<details>
<summary>run_ft_deploy.sh reference</summary>

```bash
#!/bin/bash
# Multi-task parallel deployment script (simplified)

RDAGENT_DIR="$HOME/RD-Agent"
ENV_TEMPLATE=".env.ft"
STAGGER_DELAY=60

cd "$RDAGENT_DIR"
source ~/miniconda3/etc/profile.d/conda.sh
conda activate rdagent

CONFIG_FILE="${1:-tasks.json}"
NUM_TASKS=$(jq '.tasks | length' "$CONFIG_FILE")

for ((i=0; i<NUM_TASKS; i++)); do
    model=$(jq -r ".tasks[$i].model" "$CONFIG_FILE")
    benchmark=$(jq -r ".tasks[$i].benchmark" "$CONFIG_FILE")
    gpus=$(jq -r ".tasks[$i].gpus" "$CONFIG_FILE")

    # Point .env at this task's benchmark
    cp "$ENV_TEMPLATE" .env
    sed -i "s|^FT_TARGET_BENCHMARK=.*|FT_TARGET_BENCHMARK=$benchmark|" .env

    CUDA_VISIBLE_DEVICES=$gpus \
        dotenv run -- python rdagent/app/finetune/llm/loop.py --base-model "$model" &

    # The first task waits for environment creation; later tasks start staggered
    [[ $i -eq 0 ]] && sleep 120 || sleep $STAGGER_DELAY
done

wait
```

</details>

## Blob Log Sync

Use Azure Blob Storage to sync log files across machines.

### 1. Generate a SAS Token

```bash
# Log in with the Azure CLI first
az login

# Generate a token (valid for 7 days by default)
bash rdagent/utils/blob/gen_token.sh

# Or specify an expiry time
bash rdagent/utils/blob/gen_token.sh 2025-01-31T00:00Z
```

The token is saved to `git_ignore_folder/.az_sas_token`.

### 2. Sync Logs

Sync path: `log/` ↔ `blob://epeastus/rdagent/FinetuneAgenticLLM/FT_qizheng/logs`

```bash
# Upload local logs to Blob
bash rdagent/utils/blob/azsync.sh up

# Download logs from Blob
bash rdagent/utils/blob/azsync.sh down
```

> To change the remote path, edit the `REMOTE_PATH` variable in `rdagent/utils/blob/azsync.sh`.

## Viewing Logs

Run logs are saved under the `log/` directory:

```
log/
└── 2025-01-01_12-00-00-123456/
    ├── Loop_0/
    │   ├── direct_exp_gen/   # hypothesis generation
    │   ├── coding/           # code generation
    │   ├── running/          # training execution
    │   └── feedback/         # feedback summary
    └── Loop_1/
        └── ...
```
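Because the run folders are timestamped, the lexicographically last one is also the most recent. A small convenience sketch (hypothetical, not part of the repo) for jumping to the latest run:

```shell
# Print the most recent run folder under a log directory
# (hypothetical helper; relies on the timestamped folder names
# sorting lexicographically in chronological order).
latest_run() {
  ls -1d "$1"/*/ 2>/dev/null | sort | tail -n 1
}
```

For example, `ls "$(latest_run log)Loop_0"` lists the first loop's artifacts of the newest run.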

