Bfcl v4 #3905
base: main
Conversation
tests/accuracy/test_small_models.sh
Outdated
```diff
@@ -0,0 +1,86 @@
#!/bin/bash -x
#
# Copyright (c) 2024 Intel Corporation
```
The copyright year should be 2026.
```diff
@@ -0,0 +1,22 @@
export MODEL=$1
```
This file is missing a copyright header.
Pull request overview
This PR updates the Berkeley Function Call Leaderboard (BFCL) integration to version 4, refactoring test infrastructure and adding support for new models and chat templates.
Changes:
- Refactored testing scripts by extracting model test logic from `export_all_models.sh` into dedicated test scripts
- Updated gorilla benchmark integration to a newer version with modified patch configurations
- Added support for additional models (GPT-OSS-20B, Qwen3-Coder-30B, Devstral) with corresponding chat templates
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/accuracy/test_small_models.sh | New script containing extracted model testing logic with configurable tool-guided generation |
| tests/accuracy/test_single_model.sh | New script for testing individual models with specific configurations |
| tests/accuracy/test_case_ids_to_generate.json | Configuration file defining test case IDs for generation |
| tests/accuracy/install_gorilla.sh | New installation script for gorilla benchmark with updated commit hash |
| tests/accuracy/export_all_models.sh | Removed model testing logic and added new model export commands |
| extras/chat_template_examples/chat_template_devstral.jinja | New Devstral chat template with comprehensive system prompts and tool call formatting |
| demos/continuous_batching/accuracy/gorilla.patch | Updated patch for newer gorilla version with modified configuration handling |
| demos/continuous_batching/accuracy/README.md | Updated documentation with new gorilla version, installation steps, and test categories |
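A rough invocation sketch of the new scripts is given below. The script names come from the file table above; the argument order and example values are assumptions for illustration and are not taken from the PR.

```bash
# Hypothetical usage sketch; argument order and values are assumptions.
# test_single_model.sh appears to take the model name as a positional argument,
# per the `export MODEL=$1` hunk shown above.
./tests/accuracy/install_gorilla.sh                    # set up the gorilla/BFCL benchmark
./tests/accuracy/test_single_model.sh Qwen3-Coder-30B  # exercise a single model
./tests/accuracy/test_small_models.sh                  # run the extracted small-model suite
```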
tests/accuracy/test_single_model.sh
Outdated
```bash
    --rest_port 8000 --model_repository_path /models --source_model ${model_name}-${precision} \
    --tool_parser ${tool_parser} --model_name ovms-model \
    --cache_size 0 --task text_generation

echo wait for model server to be ready
while [ "$(curl -s http://localhost:8000/v3/models | jq -r '.data[0].id')" != "${model_name}-${precision}" ] ; do echo waiting for LLM model; sleep 1; done
```
Copilot AI (Jan 13, 2026)
Continuing from the previous issue, this line also uses undefined lowercase variables ${model_name} and ${precision} instead of ${MODEL} and ${PRECISION}.
Suggested change:

```diff
-    --rest_port 8000 --model_repository_path /models --source_model ${model_name}-${precision} \
-    --tool_parser ${tool_parser} --model_name ovms-model \
-    --cache_size 0 --task text_generation
-echo wait for model server to be ready
-while [ "$(curl -s http://localhost:8000/v3/models | jq -r '.data[0].id')" != "${model_name}-${precision}" ] ; do echo waiting for LLM model; sleep 1; done
+    --rest_port 8000 --model_repository_path /models --source_model ${MODEL}-${PRECISION} \
+    --tool_parser ${tool_parser} --model_name ovms-model \
+    --cache_size 0 --task text_generation
+echo wait for model server to be ready
+while [ "$(curl -s http://localhost:8000/v3/models | jq -r '.data[0].id')" != "${MODEL}-${PRECISION}" ] ; do echo waiting for LLM model; sleep 1; done
```
```bash
docker stop ovms 2>/dev/null
docker run -d --name ovms --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models openvino/model_server:latest \
```
Most probably we would need to pass the image label/tag as an argument; this could remain the default, but otherwise the user-passed value should be used here?
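One possible shape for that, sketched here as a suggestion rather than the PR's actual code (the `OVMS_IMAGE` name and the `$3` position are assumptions):

```bash
# Sketch only: accept an optional image argument and fall back to the current
# default tag. The remaining server arguments from the original script are
# omitted here.
OVMS_IMAGE="${3:-openvino/model_server:latest}"
docker stop ovms 2>/dev/null
docker run -d --name ovms --user "$(id -u):$(id -g)" --rm -p 8000:8000 \
  -v "$(pwd)/models:/models" "${OVMS_IMAGE}"
```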
Co-authored-by: Copilot <[email protected]>
🛠 Summary
CVS-179106
🧪 Checklist