Dataset: Download the test.jsonl file here and save it as data/leetcode_contest.jsonl.
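
Before launching any run, it can help to confirm that the file is in the expected location and parses as JSON Lines. The following is a minimal sketch, assuming only that each non-empty line holds one JSON value. The helper name `count_jsonl_records` is hypothetical and not part of this repository, and no particular field names are assumed.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Return the number of JSON records in a JSON Lines file.

    Raises FileNotFoundError if the file is missing and ValueError
    (naming the offending line) if a line is not valid JSON.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON"
                ) from exc
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the dataset instruction above.
    dataset = Path("data/leetcode_contest.jsonl")
    if dataset.exists():
        print(f"{dataset}: {count_jsonl_records(dataset)} records")
    else:
        print(f"{dataset} not found; download test.jsonl and save it there first")
```

A truncated download usually shows up here as a ValueError on the last line.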
- Run {run_type}
python {file_name} \
--config 'config/config-leetcode-contest-qwen2.5-coder-32b.json' \
--run_type {run_type} \
--api_key 'xxx' \
--base_url 'xxx'

Required execution order: each method to the right of an arrow depends on the results (generated code or test cases) of the methods to its left, so run those first.

Sampling / Gen_tests -> Sampling+Filtering / CodeT / MBR_Exec / Self_repair
Gen_tests -> Reflexion
- run_type
| Method | file_name | run_type |
|---|---|---|
| Sampling | b_sampling.py | sampling |
| Sampling+Filtering | b_sampling_filtering.py | sampling_filtering |
| Gen_tests | b_gen_tests.py | gen_tests |
| CodeT | b_codet.py | codet |
| MBR_Exec | b_mbr_exec.py | mbr_exec |
| Self_repair | b_self_repair.py | self_repair |
| Reflexion | b_reflexion.py | reflexion |
| CoCoEvo | coevod.py | coevo |
| Evolution (CoCoEvo w/o test evolution) | evolutiond.py | evolution |
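
The execution order described above can be sketched as a small dependency lookup that lists which run_type values must be executed first, in order. This is an illustrative sketch and not part of the repository: `PREREQUISITES` and `execution_order` are hypothetical names, and the mapping assumes that a method to the right of an arrow needs every method to the left of that arrow. CoCoEvo and Evolution are left out because the ordering does not mention them.

```python
# Prerequisites per run_type, read off the execution order above.
# Assumption: a method to the right of an arrow needs every method
# listed to the left of that arrow.
PREREQUISITES = {
    "sampling": [],
    "gen_tests": [],
    "sampling_filtering": ["sampling", "gen_tests"],
    "codet": ["sampling", "gen_tests"],
    "mbr_exec": ["sampling", "gen_tests"],
    "self_repair": ["sampling", "gen_tests"],
    "reflexion": ["gen_tests"],
}


def execution_order(targets):
    """Return the run_types to execute, prerequisites first, without repeats."""
    ordered = []

    def visit(run_type):
        if run_type not in PREREQUISITES:
            raise KeyError(f"unknown run_type: {run_type}")
        for dependency in PREREQUISITES[run_type]:
            visit(dependency)
        if run_type not in ordered:
            ordered.append(run_type)

    for target in targets:
        visit(target)
    return ordered


if __name__ == "__main__":
    print(execution_order(["codet", "reflexion"]))
    # ['sampling', 'gen_tests', 'codet', 'reflexion']
```

Each returned run_type can then be passed, together with its file_name from the table, to the run command shown earlier.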
- Run CoCoEvo
python coevod.py \
--config 'config/config-leetcode-contest-qwen2.5-coder-32b.json' \
--run_type 'coevo' \
--api_key 'xxx' \
--base_url 'xxx'

- Evaluate CoCoEvo / Evolution
python count_code_population.py \
--result_dir 'result/leetcode_contest/qwen2.5-coder-32b/coevod'

- Evaluate generated test cases
# evaluate generated tests
python b_gen_tests_eval.py \
--config 'config/config-leetcode-contest-qwen2.5-coder-32b.json' \
--run_type 'gen_tests_eval'
# show evaluation result
python show_tests.py --result_dir='result/leetcode_contest/qwen2.5-coder-32b' --run_type='gen_tests_eval'

- For other methods, use submit.py
# submit to private tests
python submit.py \
--config 'config/config-leetcode-contest-qwen2.5-coder-32b.json' \
--run_type {run_type}

baselines (AgentCoder, CodeCOT, INTERVENOR)

@ARTICLE{11098743,
author={Li, Kefan and Yuan, Yuan and Yu, Hongyue and Guo, Tingyu and Cao, Shijie},
journal={IEEE Transactions on Evolutionary Computation},
title={CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation},
year={2025},
volume={},
number={},
pages={1-1},
keywords={Codes;Accuracy;Maintenance engineering;Evolutionary computation;Electronic mail;Software development management;Programming;Dynamic scheduling;Computer bugs;Training;Large Language Models;Code Generation;Test Case Generation;Co-Evolution},
doi={10.1109/TEVC.2025.3593272}}