
Commit 744bbcb

shrutiyer (Shruti Iyer) and a co-author authored
Propagate OAI eval and run ids to the results (Azure#44899)
* Propagate oai eval and run ids to the results

* Code formatter

---------

Co-authored-by: Shruti Iyer <[email protected]>
1 parent 950e660 commit 744bbcb

13 files changed: +71 -10 lines changed

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py

Lines changed: 4 additions & 0 deletions
@@ -1052,6 +1052,10 @@ def _evaluate(  # pylint: disable=too-many-locals,too-many-statements

     result_df_dict = results_df.to_dict("records")
     result: EvaluationResult = {"rows": result_df_dict, "metrics": metrics, "studio_url": studio_url}  # type: ignore
+    if eval_run_info_list:
+        result["oai_eval_run_ids"] = [
+            {"eval_group_id": info["eval_group_id"], "eval_run_id": info["eval_run_id"]} for info in eval_run_info_list
+        ]
     # _add_aoai_structured_results_to_results(result, LOGGER, kwargs.get("eval_meta_data"))

     eval_id: Optional[str] = kwargs.get("_eval_id")
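For orientation, a minimal sketch of how a caller might read the newly propagated ids off an EvaluationResult-shaped dict. The helper name and the sample values below are ours for illustration and are not part of the SDK:

from typing import Dict, List

def get_oai_run_ids(result: Dict) -> List[Dict[str, str]]:
    # "oai_eval_run_ids" is only set when AOAI eval runs were recorded, so default to [].
    return result.get("oai_eval_run_ids", [])

# Fabricated result shaped like the dict built in _evaluate above.
sample_result = {
    "rows": [],
    "metrics": {},
    "studio_url": None,
    "oai_eval_run_ids": [{"eval_group_id": "group-abc", "eval_run_id": "run-123"}],
}

for ids in get_oai_run_ids(sample_result):
    print(ids["eval_group_id"], ids["eval_run_id"])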

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_model_configurations.py

Lines changed: 1 addition & 0 deletions
@@ -139,6 +139,7 @@ class Conversation(TypedDict):

 class EvaluationResult(TypedDict):
     metrics: Dict
+    oai_eval_run_ids: NotRequired[List[Dict[str, str]]]
     studio_url: NotRequired[str]
     rows: List[Dict]
     _evaluation_results_list: List[Dict]
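A small self-contained illustration of why the new key is NotRequired: results produced before this change (without the key) still satisfy the TypedDict, while new results may carry the id pairs. The class below is a trimmed copy written here for demonstration, not an import of the real one:

from typing import Dict, List
from typing_extensions import NotRequired, TypedDict

class EvaluationResult(TypedDict):  # trimmed copy of the shape shown in the diff above
    metrics: Dict
    rows: List[Dict]
    oai_eval_run_ids: NotRequired[List[Dict[str, str]]]

# Both assignments type-check: the key is optional.
without_ids: EvaluationResult = {"metrics": {}, "rows": []}
with_ids: EvaluationResult = {
    "metrics": {},
    "rows": [],
    "oai_eval_run_ids": [{"eval_group_id": "group-abc", "eval_run_id": "run-123"}],
}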

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_utils/formatting_utils.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def get_strategy_name(attack_strategy: Union[AttackStrategy, List[AttackStrategy


 def get_flattened_attack_strategies(
-    attack_strategies: List[Union[AttackStrategy, List[AttackStrategy]]]
+    attack_strategies: List[Union[AttackStrategy, List[AttackStrategy]]],
 ) -> List[Union[AttackStrategy, List[AttackStrategy]]]:
     """Flatten complex attack strategies into individual strategies.


sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/simulator/_simulator.py

Lines changed: 1 addition & 1 deletion
@@ -426,7 +426,7 @@ def _parse_prompty_response(self, *, response: str) -> Dict[str, Any]:
         try:
             if isinstance(response, str):
                 response = response.replace("\u2019", "'").replace("\u2018", "'")
-                response = response.replace("\u201C", '"').replace("\u201D", '"')
+                response = response.replace("\u201c", '"').replace("\u201d", '"')

                 # Replace None with null
                 response = response.replace("None", "null")

sdk/evaluation/azure-ai-evaluation/samples/aoai_score_model_grader_sample.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
         - AZURE_AI_PROJECT_ENDPOINT
     2. Hub-based project (legacy):
         - AZURE_SUBSCRIPTION_ID
-        - AZURE_RESOURCE_GROUP_NAME
+        - AZURE_RESOURCE_GROUP_NAME
         - AZURE_PROJECT_NAME
 """


sdk/evaluation/azure-ai-evaluation/samples/evaluation_samples_safety_evaluation.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 """
 DESCRIPTION:
     These samples demonstrate usage of _SafetyEvaluation class with various _SafetyEvaluator instances.
-
+
 USAGE:
     python evaluation_samples_safety_evaluation.py


sdk/evaluation/azure-ai-evaluation/samples/evaluation_samples_simulate.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 """
 DESCRIPTION:
     These samples demonstrate usage of various classes and methods used to perform simulation in the azure-ai-evaluation library.
-
+
 USAGE:
     python evaluation_samples_simulate.py


sdk/evaluation/azure-ai-evaluation/samples/evaluation_samples_threshold.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 """
 DESCRIPTION:
     These samples demonstrate usage of various classes and methods used to perform evaluation with thresholds in the azure-ai-evaluation library.
-
+
 USAGE:
     python evaluation_samples_threshold.py


sdk/evaluation/azure-ai-evaluation/samples/red_team_samples.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 DESCRIPTION:
     These samples demonstrate usage of various classes and methods used in Red Team
     functionality within the azure-ai-evaluation library.
-
+
 USAGE:
     python red_team_samples.py


sdk/evaluation/azure-ai-evaluation/samples/score_model_multimodal/aoai_score_model_grader_sample_audio.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
         - AZURE_AI_PROJECT_ENDPOINT
     2. Hub-based project (legacy):
         - AZURE_SUBSCRIPTION_ID
-        - AZURE_RESOURCE_GROUP_NAME
+        - AZURE_RESOURCE_GROUP_NAME
         - AZURE_PROJECT_NAME
 """

