RDAgent Finetune LLM #1314

XianBW · 2025-12-15T07:05:38Z

Description

Motivation and Context

How Has This Been Tested?

If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

Your own tests:

Types of changes

Fix bugs
Add new feature
Update documentation

📚 Documentation preview 📚: https://RDAgent--1314.org.readthedocs.build/en/1314/

…ger needed in host)

…evelop)

…er select(need debug)

…mark dir

…ne scenario

…dels

…ator

test/finetune/test_benchmark_api.py

+    )
+
+    config_file = workspace / "config.py"
+    config_file.write_text(config_content)


In general, the fix is to avoid persisting the API key in clear text in the generated config.py. Instead, the config should reference the key from a safer source (for example, environment variables), or the portion of the template dealing with credentials should be removed and the runtime should read them from the environment. For the hard-coded API_KEY = "sk-1234" example, we should also stop embedding a literal key in the script and instead demonstrate reading from an environment variable.

The best fix here, without changing the overall behavior of the benchmark runner, is:

Stop passing api_key into generate_api_config and stop inserting it into API_CONFIG_TEMPLATE.

Modify the template so that, instead of having a concrete key value formatted in, it reads the API key from os.environ["API_KEY"] at runtime (or similar).

Update run_benchmark_api so that it no longer passes the key into generate_api_config; instead, it sets the environment variable (for the Docker process or worker) if needed.

In the __main__ example, remove the hard-coded API_KEY = "sk-1234" and show how to set it via environment (os.environ.setdefault("API_KEY", "...") or just read it, failing if not set).

Because we must not assume other files’ behavior, we will: (1) remove use of api_key from the config template and function call, ensuring the key is no longer written to disk; and (2) keep the function signatures intact where possible to avoid breaking callers, but mark api_key as unused in the template. We’ll also replace the hard-coded API_KEY = "sk-1234" default with reading from an environment variable, so secrets are not stored in source code.

Concretely in test/finetune/test_benchmark_api.py:

Update API_CONFIG_TEMPLATE so that it does not contain any {api_key} placeholder or other key contents; instead, it will read API_KEY from the environment inside the generated config code (e.g., os.environ.get("API_KEY")). This ensures the key is available at runtime but never written into the file content.

Keep generate_api_config’s api_key parameter but stop using it in the .format() call; this preserves the call signature for any other uses while eliminating the clear-text storage.

In run_benchmark_api, keep receiving api_key and use os.environ.setdefault("API_KEY", api_key) (or just os.environ["API_KEY"] = api_key) before invoking downstream processes, rather than passing it into generate_api_config for file interpolation.

In the __main__ section, remove the literal API_KEY = "sk-1234" and instead read from os.environ.get("API_KEY"), optionally falling back to None or raising an error, so secrets are not embedded in the file.

These edits are all within the shown file and ensure that the API key is no longer written to disk in clear text, addressing all alert variants.

Jensen246 added 30 commits October 22, 2025 03:15

rename config.yaml as train.yaml

23df0f8

fix: path bug in coder

82ee0f0

delete data_preprocess, enable get_dataset_info

ba40987

delete code about preprocess, enable finetune with one dataset_info.json

ac24a57

refine prompt to get file_name of dir correctly

d68a951

remove pre_model_path, move import

f7f7830

rename task

d3ee94d

enable task components, refactor TrainingTask

f82f5d0

feat: eval yaml of coder, including mini-batch test

e7fcdbd

refactor: remove deprecated file, simplify code

ac8dc80

feat: new FTExperiment class

e9d8a70

feat: print filtered params

1e45761

run extract_params in docker, remove make llamafactory-install(no lon…

029a32f

…ger needed in host)

feat: extract llama factory info and save it as file(need debug and d…

17e27c7

…evelop)

fix: use cache in llamafactory info

7a7ebb8

feat: select model automatically

cfdb08b

feat: extract info from llama-factory in docker, refine model&paramet…

fd45c42

…er select(need debug)

refactor: move env prepare&llamafactory manager into scene init

0a20190

feat: check commit hash and refresh llamafactory info

1883520

fix: remove redundant file check and add chmod

1d2ffba

fix: add required fields in yaml&template

d78154c

refactor: simplify llama factory manager

1b31fbf

feat: docker_cache

77f49e0

chore: move import position

9649def

feat: update llama factory optionally

45b863c

chore: rename some parameters

3b6e559

refactor: simplify coder evaluator

a2b7a70

feat: coder docker cache

10cc554

feat: create_ws_ckp for FT, and clean init file

5106556

chore: rename, more gpu_info in exp_gen

6ffe9eb

chelsea97 and others added 16 commits December 23, 2025 13:17

feat: enable tablebench and tableInstruct dataset

1000aa0

refine dataset readme, and coder prompt

3d18e0a

Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune

93ecc78

refine proposal and coder prompt

5b88eac

fix: ui path (default log path)

d830351

feat: add automatic LoRA model merging for benchmarking with vLLM

a225fd5

refactor: reorganize finetune benchmark and merge modules under bench…

90c621d

…mark dir

refactor: modularize benchmark config and error extraction for finetu…

7cc2a8a

…ne scenario

fix: update benchmark import paths and disable env cache for device info

d232af0

refactor docke&conda env and fix import bugs

bc0742b

Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune

46743d0

modify init python file

18f85be

feat: add FinanceIQ dataset split utility and integrate with pipeline

97e2f4c

feat: set weak and strong model by env, distribute workload across mo…

e73ffc6

…dels

Merge branch 'finetune' of github.com:microsoft/RD-Agent into finetune

18e7207

feat: sample dataset and rm params for tensorboard, wandb

325d0d2

Jensen246 force-pushed the finetune branch from 7e32165 to 325d0d2 Compare December 24, 2025 13:39

Jensen246 added 3 commits December 24, 2025 13:58

update script to run jobs

66a5677

refine proposal prompt, remove specific dataset name

b6b967f

fix(ui): auto switch log folder

2c9fbc2

Jensen246 force-pushed the finetune branch from 099648e to 2c9fbc2 Compare December 24, 2025 15:12

Jensen246 and others added 7 commits December 25, 2025 08:36

fix: estimate the processed full data after sample

9143e6f

feat: filter raw data more aggressively, and lower data_eval standard

bdf9f5b

feat: sync workspace to blob

62f0c58

feat: rdkit for chemcotbench

5d07fea

update qwen2.5&llama3.1 context

7c0610e

fix: force failure on validation error and remove try/except in valid…

0bbc492

…ator

feat: unified error sample extraction (with test scripts)

01a65b5

github-advanced-security bot found potential problems Dec 28, 2025

View reviewed changes

feat: set conda cache with .env

3e60b88

@@ -141,7 +141,6 @@
                     limit_config=limit_config,
                     model_abbr=model_abbr,
                     model_path=model_path,
-                    api_key=api_key,
                     api_base=api_base,
                     work_dir=work_dir,
                     max_out_len=max_out_len,
@@ -202,6 +201,10 @@
                 # OpenAISDK class (LLM judge) auto-appends /chat/completions, so use base only
                 docker_api_base_sdk = "http://localhost:3000/v1"
+                # Expose API key to benchmark environment via environment variable
+                if api_key is not None:
+                    os.environ["API_KEY"] = api_key
                 # Generate config.py
                 config_content = generate_api_config(
                     model_abbr=f"api-{benchmark_name}",
@@ -296,7 +299,7 @@
                 os.chdir(_project_root)
                 # ==================== API Configuration ====================
-                API_KEY = "sk-1234"
+                API_KEY = os.environ.get("API_KEY")
                 API_BASE = "http://localhost:3000"
                 MODEL = "gpt-4o-mini"
                 HF_TOKEN = "hf_xxxx"  # For gated datasets

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

RDAgent Finetune LLM #1314

RDAgent Finetune LLM #1314

XianBW commented Dec 15, 2025 •

edited by github-actions bot

Loading

Uh oh!

Check failure

Copilot Autofix

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

RDAgent Finetune LLM #1314

Are you sure you want to change the base?

RDAgent Finetune LLM #1314

Conversation

XianBW commented Dec 15, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How Has This Been Tested?

Screenshots of Test Results (if appropriate):

Types of changes

Uh oh!

Check failure

Uh oh!

Uh oh!

Uh oh!

Copilot Autofix

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

XianBW commented Dec 15, 2025 •

edited by github-actions bot

Loading