I got a pass@1 of only about 9.8% for llama2-7B-chat on human-eval using your script:

```python
{'pass@1': 0.0975609756097561}
```
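
For context, here is a minimal sketch of how a pass@1 value like this is usually computed, using the unbiased pass@k estimator from the HumanEval paper. The per-problem results below are hypothetical: I am assuming one generated sample per problem and 16 of the 164 HumanEval problems solved, which would match the reported number. The script's actual sampling settings may differ.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated for the problem, c: samples that passed the tests.
    """
    if n - c < k:
        # Fewer than k failing samples, so any k-subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Hypothetical results (an assumption, not taken from the script's output):
# 164 problems, one sample each, 16 problems solved.
num_correct = [1] * 16 + [0] * 148

# The benchmark score is the mean of the per-problem estimates.
score = sum(pass_at_k(n=1, c=c, k=1) for c in num_correct) / len(num_correct)
print({"pass@1": score})
```

With one sample per problem, pass@1 reduces to the fraction of problems solved, so the score moves in steps of 1/164.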