RDAgent Finetune LLM #1314
Conversation
…ger needed in host)
…er select(need debug)
Flagged lines in the diff:

```python
config_file = workspace / "config.py"
config_file.write_text(config_content)
```
Check failure: Code scanning / CodeQL
Clear-text storage of sensitive information (High, test)
This expression stores sensitive data (password).
Copilot Autofix
In general, the fix is to avoid persisting the API key in clear text in the generated `config.py`. Instead, the config should reference the key from a safer source (for example, environment variables), or the portion of the template dealing with credentials should be removed and the runtime should read them from the environment. For the hard-coded `API_KEY = "sk-1234"` example, we should also stop embedding a literal key in the script and instead demonstrate reading from an environment variable.
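For illustration, a minimal sketch of the two patterns; the template strings and file name here are placeholders, not the project's actual code:

```python
from pathlib import Path

# Unsafe: the secret is interpolated into the generated file and persisted in clear text.
unsafe_config = 'API_KEY = "{api_key}"\n'.format(api_key="sk-REDACTED")

# Safer: the generated file only contains code that resolves the key at runtime,
# so nothing sensitive ever lands on disk.
safe_config = 'import os\nAPI_KEY = os.environ.get("API_KEY")\n'

Path("config.py").write_text(safe_config)
```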
The best fix here, without changing the overall behavior of the benchmark runner, is:

- Stop passing `api_key` into `generate_api_config` and stop inserting it into `API_CONFIG_TEMPLATE`.
- Modify the template so that, instead of having a concrete key value formatted in, it reads the API key from `os.environ["API_KEY"]` at runtime (or similar).
- Update `run_benchmark_api` so that it no longer passes the key into `generate_api_config`; instead, it sets the environment variable (for the Docker process or worker) if needed.
- In the `__main__` example, remove the hard-coded `API_KEY = "sk-1234"` and show how to set it via environment (`os.environ.setdefault("API_KEY", "...")` or just read it, failing if not set).
Because we must not assume other files' behavior, we will: (1) remove use of `api_key` from the config template and function call, ensuring the key is no longer written to disk; and (2) keep the function signatures intact where possible to avoid breaking callers, but mark `api_key` as unused in the template. We'll also replace the hard-coded `API_KEY = "sk-1234"` default with reading from an environment variable, so secrets are not stored in source code.
Concretely, in `test/finetune/test_benchmark_api.py`:
- Update `API_CONFIG_TEMPLATE` so that it does not contain any `{api_key}` placeholder or other key contents; instead, it will read `API_KEY` from the environment inside the generated config code (e.g., `os.environ.get("API_KEY")`). This ensures the key is available at runtime but never written into the file content (a sketch of such a template appears after this list).
- Keep `generate_api_config`'s `api_key` parameter but stop using it in the `.format()` call; this preserves the call signature for any other uses while eliminating the clear-text storage.
- In `run_benchmark_api`, keep receiving `api_key` and use `os.environ.setdefault("API_KEY", api_key)` (or just `os.environ["API_KEY"] = api_key`) before invoking downstream processes, rather than passing it into `generate_api_config` for file interpolation.
- In the `__main__` section, remove the literal `API_KEY = "sk-1234"` and instead read from `os.environ.get("API_KEY")`, optionally falling back to `None` or raising an error, so secrets are not embedded in the file.
These edits are all within the shown file and ensure that the API key is no longer written to disk in clear text, addressing all alert variants.
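To make the template side of this concrete (the suggested diff below only shows the call-site changes), here is a sketch of what a rewritten `API_CONFIG_TEMPLATE` and `generate_api_config` could look like. The template body and its fields are assumptions, since the original template content is not shown in the alert:

```python
# Hypothetical template: no {api_key} placeholder; the generated config
# resolves the key from the environment when it is imported.
API_CONFIG_TEMPLATE = """\
import os

MODEL_ABBR = "{model_abbr}"
API_BASE = "{api_base}"
API_KEY = os.environ.get("API_KEY")  # read at runtime, never written to disk
"""


def generate_api_config(model_abbr, api_base, api_key=None, **kwargs):
    # `api_key` is kept only for signature compatibility and is intentionally
    # unused, so the key is never interpolated into the file content.
    return API_CONFIG_TEMPLATE.format(model_abbr=model_abbr, api_base=api_base)
```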
The suggested changes:

```diff
@@ -141,7 +141,6 @@
         limit_config=limit_config,
         model_abbr=model_abbr,
         model_path=model_path,
-        api_key=api_key,
         api_base=api_base,
         work_dir=work_dir,
         max_out_len=max_out_len,
@@ -202,6 +201,10 @@
     # OpenAISDK class (LLM judge) auto-appends /chat/completions, so use base only
     docker_api_base_sdk = "http://localhost:3000/v1"
 
+    # Expose API key to benchmark environment via environment variable
+    if api_key is not None:
+        os.environ["API_KEY"] = api_key
+
     # Generate config.py
     config_content = generate_api_config(
         model_abbr=f"api-{benchmark_name}",
@@ -296,7 +299,7 @@
     os.chdir(_project_root)
 
     # ==================== API Configuration ====================
-    API_KEY = "sk-1234"
+    API_KEY = os.environ.get("API_KEY")
     API_BASE = "http://localhost:3000"
     MODEL = "gpt-4o-mini"
     HF_TOKEN = "hf_xxxx"  # For gated datasets
```
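As a follow-up to the last hunk: if a missing key should be a hard error rather than silently becoming `None`, the `__main__` example could fail fast instead. A sketch under that assumption (not part of the suggested diff):

```python
import os

# Fail fast if the key is absent, instead of letting a None key surface later.
API_KEY = os.environ.get("API_KEY")
if API_KEY is None:
    raise RuntimeError("API_KEY environment variable is not set; refusing to run the benchmark.")
```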
Description
Motivation and Context
How Has This Been Tested?
Screenshots of Test Results (if appropriate):
Types of changes
📚 Documentation preview 📚: https://RDAgent--1314.org.readthedocs.build/en/1314/