feat: Proposed SIMBAUQ Sampling Strategy by radum2275 · Pull Request #785 · generative-computing/mellea

radum2275 · 2026-04-03T16:39:57Z

Sampling Strategy PR

Use this template when adding or modifying sampling strategies in mellea/stdlib/sampling/.

Description

Link to Issue: Fixes #718 (Proposal: Integrating Similarity-Based Aggregation for Uncertainty Quantification into Mellea)

Implementation Checklist

Base Class

Extends appropriate base class:
- BaseSamplingStrategy if your changes are mostly modifying the repair and/or select_from_failure functions
- SamplingStrategy if your changes involve a new sample method
- Other defined sampling strategies if your implementation is similar to existing implementations

Return Value

Returns a properly typed SamplingResult. Specifically, this means:
- ModelOutputThunks in sample_generations are properly typed from the Component and the parsed_repr is the expected type.

Integration

Strategy exported in mellea/stdlib/sampling/__init__.py

Testing

Tests added to tests/sampling/
New code has 100% coverage
Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

github-actions · 2026-04-03T16:40:10Z

The PR description has been updated. Please fill out the template for your PR to be reviewed.

planetf1

Also noticed we don't export SOFAISamplingStrategy in __all__. Not an issue from this PR, but worth noting.

planetf1 · 2026-04-07T13:40:48Z

test/stdlib/sampling/test_simbauq.py

+
+
+@pytest.mark.ollama
+@pytest.mark.llm


Suggested change

-@pytest.mark.llm
+@pytest.mark.e2e

llm was recently removed/deprecated as a marker. Our 'integration' tests are those that test multiple components of mellea together but don't require external dependencies; tests that do need them (like ollama here) are classified as e2e.

planetf1 · 2026-04-07T13:47:12Z

docs/examples/simbauq/simbauq_example.py

@@ -0,0 +1,216 @@
+# pytest: openai, llm, qualitative


Suggested change

-# pytest: openai, llm, qualitative
+# pytest: openai, e2e, qualitative, skip

Since this has dependencies we don't set up automatically, it can't run automatically in most environments or in CI, so I think we need skip. (Also updated llm -> e2e.)

The openai marker isn't applicable here, as the example uses RITS. (We'd need to clarify what we mean by that marker in the framework automation, since the example does of course use the OpenAI API.)
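For context on why `skip` matters: the `# pytest:` line at the top of an example appears to be read by the docs automation to decide how the file is treated. We have not checked how mellea's runner actually parses it; the sketch below simply assumes one comma-separated `# pytest:` line, and `parse_pytest_header` and `is_skipped` are hypothetical names.

```python
HEADER_PREFIX = "# pytest:"  # assumed header format, taken from the suggested change


def parse_pytest_header(source: str) -> list[str]:
    """Return the markers listed on the first '# pytest:' line, or [] if none."""
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(HEADER_PREFIX):
            body = stripped[len(HEADER_PREFIX):]
            # Markers are comma separated; drop blanks such as a trailing comma.
            return [m.strip() for m in body.split(",") if m.strip()]
    return []


def is_skipped(markers: list[str]) -> bool:
    # Assumed policy: 'skip' opts the example out of automatic runs entirely.
    return "skip" in markers
```

Under this assumption, the original header would still be picked up by CI, while the suggested one is excluded.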

planetf1 · 2026-04-07T13:47:37Z

docs/examples/simbauq/simbauq_example.py

+    uv run python docs/examples/simbauq/simbauq_example.py
+
+Requires:
+    RITS_API_KEY environment variable or hardcoded key below.


This won't work for users outside IBM. Should the example be based on an external or local service?

+1, everything on public GitHub should only reference external or local services.
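One possible direction, sketched under assumptions: read the endpoint from environment variables and default to a local Ollama server, which exposes an OpenAI-compatible API under `http://localhost:11434/v1`. The variable names (`SIMBAUQ_BASE_URL`, `SIMBAUQ_API_KEY`, `SIMBAUQ_MODEL`), the `endpoint_from_env` helper, and the default model name are hypothetical and not part of the PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Ollama serves an OpenAI-compatible API under /v1 on its default port.
DEFAULT_BASE_URL = "http://localhost:11434/v1"


@dataclass(frozen=True)
class EndpointConfig:
    base_url: str
    api_key: str
    model: str


def endpoint_from_env(env: Optional[Mapping[str, str]] = None) -> EndpointConfig:
    """Build the example's endpoint settings from environment variables.

    Every setting falls back to a local default, so the example needs no
    private service and no hardcoded credentials.
    """
    env = os.environ if env is None else env
    return EndpointConfig(
        base_url=env.get("SIMBAUQ_BASE_URL", DEFAULT_BASE_URL),
        # Ollama ignores the key, but OpenAI-style clients expect a non-empty value.
        api_key=env.get("SIMBAUQ_API_KEY", "ollama"),
        # Hypothetical default; any model already pulled into Ollama would do.
        model=env.get("SIMBAUQ_MODEL", "llama3.2"),
    )
```

Users with access to a hosted OpenAI-compatible service could point `SIMBAUQ_BASE_URL` and `SIMBAUQ_API_KEY` at it without editing the example.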

planetf1 · 2026-04-07T13:48:55Z

.gitignore

 docs/docs/api/
 docs/docs/api-reference.mdx
 .venv-docs-autogen/
+CLAUDE.md


Suggested change

-CLAUDE.md

We have a CLAUDE.md checked in as part of the project so it should not be ignored.

planetf1 · 2026-04-07T13:49:56Z

docs/examples/simbauq/README.md

+# SIMBA-UQ Sampling Strategy
+
+Confidence-aware sample selection using the SIMBA-UQ framework
+(Bhattacharjya et al., 2024). Generates multiple samples across a range of


Suggested change

-(Bhattacharjya et al., 2024). Generates multiple samples across a range of
+(Bhattacharjya et al., 2025). Generates multiple samples across a range of

The paper is from 2025?

planetf1 · 2026-04-07T14:04:51Z

mellea/stdlib/sampling/simbauq.py

+            return scores[self.rouge_type].fmeasure
+
+        if self.similarity_metric == "sbert":
+            from sklearn.metrics.pairwise import cosine_similarity


This isn't (currently) a mandatory dependency, nor is it in any dependency group (unless there's a transitive chain)? So the import needs a guard with a clear message, and/or fallback behaviour or an explicit error.
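A minimal sketch of such a guard, assuming a plain `ImportError` with an install hint is the desired behaviour (if mellea later adds a dedicated extra, the hint would name that instead). `require_optional` is a hypothetical helper, not an existing mellea function; `scikit-learn` is the PyPI name of the package that provides `sklearn`.

```python
import importlib
from types import ModuleType


def require_optional(module: str, package: str, feature: str) -> ModuleType:
    """Import an optional dependency at call time, or fail with an actionable message.

    Importing inside the code path that needs it means only users who select
    the feature need the package installed.
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"{feature} requires the optional package '{package}'. "
            f"Install it with: pip install {package}"
        ) from exc
```

In the sbert branch this would read roughly `pairwise = require_optional("sklearn.metrics.pairwise", "scikit-learn", "similarity_metric='sbert'")` followed by `pairwise.cosine_similarity(...)`.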

planetf1 · 2026-04-07T14:05:11Z

mellea/stdlib/sampling/simbauq.py

+            return float(np.exp(np.mean(log_sims)))
+
+        if self.aggregation == "harmonic_mean":
+            from scipy import stats as scipy_stats


See the comment on the import above (guard).
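For the harmonic mean specifically, a fallback is cheap: the standard library has provided `statistics.harmonic_mean` since Python 3.6, so scipy could stay optional. A sketch, assuming the similarity scores are strictly positive (edge cases such as zeros or negative values may not behave identically across the two implementations, so a real version would validate or clip first); `harmonic_mean` here is a hypothetical module-level helper.

```python
import statistics
from typing import Sequence


def harmonic_mean(similarities: Sequence[float]) -> float:
    """Harmonic mean of strictly positive similarity scores.

    Prefers scipy when it is installed and falls back to the standard
    library otherwise, so scipy never becomes a hard requirement.
    """
    try:
        from scipy import stats as scipy_stats
    except ImportError:
        return float(statistics.harmonic_mean(similarities))
    return float(scipy_stats.hmean(similarities))
```

If the two paths should never be allowed to diverge, dropping scipy for this one call and always using the stdlib version is the simpler option.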

planetf1 · 2026-04-07T14:05:23Z

mellea/stdlib/sampling/simbauq.py

+        Returns:
+            Trained ``RandomForestClassifier``.
+        """
+        from sklearn.ensemble import RandomForestClassifier


See the comment on the import above (guard).

planetf1 · 2026-04-07T14:07:37Z

test/stdlib/sampling/test_simbauq.py

@@ -0,0 +1,365 @@
+"""Tests for SIMBAUQSamplingStrategy."""
+
+import numpy as np


numpy is a transitive dependency of rouge_score, which is currently mandatory, but it's probably a good idea to add it as an explicit dependency?

Note that it appears in the vllm group, but another PR is removing that.

planetf1 · 2026-04-07T14:11:23Z

mellea/stdlib/sampling/simbauq.py

+        probs = self._classifier.predict_proba(x_test)  # type: ignore[union-attr]
+        return probs[:, 1]
+
+    def _compute_confidences(self, samples: list[str]) -> np.ndarray:


Nit: _compute_confidences() looks like it duplicates the aggregation loop already inlined in sample() (lines ~233-240) but isn't actually called from there. Would it make sense to have sample() call _compute_confidences() instead of inlining the logic? That way the tests exercise the same code path that runs in production.
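A sketch of the suggested shape, under assumptions: the `SimilarityConfidence` class, the `similarity` callable, and the `select` method (standing in for `sample()`) are hypothetical, and confidence is taken as the arithmetic mean of each sample's similarity to the other samples, whereas the PR also supports other aggregations such as the harmonic mean. The point is only that selection goes through `_compute_confidences()`, so tests of that helper exercise the production path.

```python
from typing import Callable, Sequence

Similarity = Callable[[str, str], float]


class SimilarityConfidence:
    """Hypothetical stand-in for the strategy's confidence logic."""

    def __init__(self, similarity: Similarity) -> None:
        self.similarity = similarity

    def _compute_confidences(self, samples: Sequence[str]) -> list[float]:
        """Confidence of sample i = mean similarity to every other sample."""
        if len(samples) < 2:
            raise ValueError("need at least two samples to compare")
        confidences = []
        for i, a in enumerate(samples):
            sims = [self.similarity(a, b) for j, b in enumerate(samples) if j != i]
            confidences.append(sum(sims) / len(sims))
        return confidences

    def select(self, samples: Sequence[str]) -> tuple[str, float]:
        """Single code path: reuse _compute_confidences instead of re-inlining it."""
        confidences = self._compute_confidences(samples)
        # max() returns the first index on ties, so earlier samples win ties.
        best = max(range(len(samples)), key=confidences.__getitem__)
        return samples[best], confidences[best]
```

With an exact-match similarity (1.0 if equal, else 0.0), `select(["Paris", "Paris", "Lyon"])` returns `("Paris", 0.5)`.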

Radu Marinescu added 4 commits April 2, 2026 14:12

feat: initial commit for the SIMBAUQSamplingStrategy

c5236f0

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: added a separate field to mot.meta for the similarity matrix

ea51043

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: added a second aggregation by classification CE algorithm

5c23a58

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

refactor: revised and moved the SIMBAUQSamplingStrategy in docs/examples

d7f3b6a

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

radum2275 requested a review from a team as a code owner April 3, 2026 16:39

jakelorocco changed the title ~~Proposed SIMBAUQ Sampling Strategy~~ feat: Proposed SIMBAUQ Sampling Strategy Apr 3, 2026

github-actions bot added the enhancement New feature or request label Apr 3, 2026

planetf1 requested changes Apr 7, 2026



3 participants
