feat: Proposed SIMBAUQ Sampling Strategy #785

Open
radum2275 wants to merge 4 commits into generative-computing:main from radum2275:feat/simba_uq

@radum2275 radum2275 commented Apr 3, 2026

Sampling Strategy PR

Use this template when adding or modifying sampling strategies in mellea/stdlib/sampling/.

Description

Implementation Checklist

Base Class

  • Extends appropriate base class:
    • BaseSamplingStrategy if your changes are mostly modifying the repair and/or select_from_failure functions
    • SamplingStrategy if your changes involve a new sample method
    • Other defined sampling strategies if your implementation is similar to existing implementations

Return Value

  • Returns a properly typed SamplingResult. Specifically, this means:
    • ModelOutputThunks in sample_generations are properly typed from the Component and the parsed_repr is the expected type.

Integration

  • Strategy exported in mellea/stdlib/sampling/__init__.py

Testing

  • Tests added to tests/sampling/
  • New code has 100% coverage
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Radu Marinescu added 4 commits April 2, 2026 14:12
Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>
Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>
Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>
Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>
@radum2275 radum2275 requested a review from a team as a code owner April 3, 2026 16:39
Contributor

github-actions bot commented Apr 3, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco jakelorocco changed the title from "Proposed SIMBAUQ Sampling Strategy" to "feat: Proposed SIMBAUQ Sampling Strategy" Apr 3, 2026
@github-actions github-actions bot added the enhancement New feature or request label Apr 3, 2026
Contributor

@planetf1 planetf1 left a comment

Also noticed we don't export SOFAISamplingStrategy in __all__. Not an issue from this PR, but observed.



@pytest.mark.ollama
@pytest.mark.llm
Contributor

Suggested change
@pytest.mark.llm
@pytest.mark.e2e

llm was recently removed/deprecated as a marker. Our 'integration' tests are those that test multiple components of mellea together but don't require external dependencies (like ollama), hence e2e as the classification here.

@@ -0,0 +1,216 @@
# pytest: openai, llm, qualitative
Contributor

Suggested change
# pytest: openai, llm, qualitative
# pytest: openai, e2e, qualitative, skip

Since this has dependencies we don't automatically set up, it can't automatically run in most environments/CI, so I think we need skip. (Also updated llm->e2e).

The openai marker isn't applicable here, as the example uses RITS (we'd need to clarify what the marker means in the framework automation, since the example is of course using the OpenAI API).

uv run python docs/examples/simbauq/simbauq_example.py

Requires:
RITS_API_KEY environment variable or hardcoded key below.
Contributor

This won't work for users outside IBM. Should the example be based on an external or local service?

Member

+1, everything on public GitHub should reference only external/local services.

docs/docs/api/
docs/docs/api-reference.mdx
.venv-docs-autogen/
CLAUDE.md
Contributor

Suggested change
CLAUDE.md

We have a CLAUDE.md checked in as part of the project so it should not be ignored.

# SIMBA-UQ Sampling Strategy

Confidence-aware sample selection using the SIMBA-UQ framework
(Bhattacharjya et al., 2024). Generates multiple samples across a range of
Contributor

Suggested change
(Bhattacharjya et al., 2024). Generates multiple samples across a range of
(Bhattacharjya et al., 2025). Generates multiple samples across a range of

The paper is from 2025?

return scores[self.rouge_type].fmeasure

if self.similarity_metric == "sbert":
    from sklearn.metrics.pairwise import cosine_similarity
Contributor

This isn't (currently) a mandatory dependency, nor is it in any dependency group (unless there's a transitive chain)? So the import needs a guard that raises an error with a clear message, and/or fallback behaviour.
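
One way such a guard could look, as a minimal sketch rather than a proposal for the final code. The helper name `require_optional`, its parameters, and the install hint are all assumptions made for illustration; none of them exist in mellea.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with a clear message.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: {install_hint}"
        ) from exc


def cosine_similarity_backend() -> ModuleType:
    # Deferred import: this only fails when the sbert metric is actually requested.
    return require_optional(
        "sklearn.metrics.pairwise",
        feature="similarity_metric='sbert'",
        install_hint="pip install scikit-learn",
    )
```

Calling `cosine_similarity_backend().cosine_similarity(a, b)` would then behave like the direct import when scikit-learn is installed, and fail with an actionable message when it is not.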

return float(np.exp(np.mean(log_sims)))

if self.aggregation == "harmonic_mean":
    from scipy import stats as scipy_stats
Contributor

see comment on import above (guard)
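
As a possible fallback for this particular import, Python's standard library `statistics` module already provides harmonic and geometric means, so the aggregation could avoid scipy entirely. The sketch below is hypothetical: only the `"harmonic_mean"` option name appears in the diff, while `"mean"`, `"geometric_mean"`, and the function name `aggregate_similarities` are assumptions.

```python
import statistics
from collections.abc import Sequence


def aggregate_similarities(sims: Sequence[float], aggregation: str = "mean") -> float:
    """Collapse one sample's pairwise similarities into a single score.

    Hypothetical stand-in for the PR's aggregation step, stdlib only.
    """
    if not sims:
        raise ValueError("need at least one similarity value")
    if aggregation == "mean":
        return statistics.fmean(sims)
    if aggregation == "geometric_mean":
        # Expects positive inputs.
        return statistics.geometric_mean(sims)
    if aggregation == "harmonic_mean":
        # Returns 0 if any input is 0; raises on negative inputs.
        return statistics.harmonic_mean(sims)
    raise ValueError(f"unknown aggregation: {aggregation!r}")
```

Whether the stdlib versions match the scipy results closely enough for the existing tests would need checking against the PR's own test data.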

Returns:
Trained ``RandomForestClassifier``.
"""
from sklearn.ensemble import RandomForestClassifier
Contributor

see comment on import above (guard)

@@ -0,0 +1,365 @@
"""Tests for SIMBAUQSamplingStrategy."""

import numpy as np
Contributor

numpy is a transitive dependency of rouge_score, which is currently mandatory - but it's probably a good idea to ensure it's added as an explicit dependency?

Note that it appears in the vllm group, but another PR is removing that.

probs = self._classifier.predict_proba(x_test) # type: ignore[union-attr]
return probs[:, 1]

def _compute_confidences(self, samples: list[str]) -> np.ndarray:
Contributor

Nit: _compute_confidences() looks like it duplicates the aggregation loop already inlined in sample() (lines ~233-240) but isn't actually called from there. Would it make sense to have sample() call _compute_confidences() instead of inlining the logic? That way the tests exercise the same code path that runs in production.
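
A rough sketch of the shape this refactor could take, reduced to the part under discussion. Everything here is hypothetical: the class `SimilaritySelector`, the toy `jaccard` similarity (standing in for ROUGE or SBERT), and the use of plain lists instead of `np.ndarray` are assumptions chosen to keep the example dependency-free.

```python
from collections.abc import Callable


def jaccard(a: str, b: str) -> float:
    """Toy token-overlap similarity, standing in for ROUGE or SBERT."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


class SimilaritySelector:
    """Hypothetical reduction of the strategy to its confidence logic."""

    def __init__(self, similarity: Callable[[str, str], float] = jaccard) -> None:
        self.similarity = similarity

    def _compute_confidences(self, samples: list[str]) -> list[float]:
        # Confidence of sample i = mean similarity to every other sample.
        confidences = []
        for i, s in enumerate(samples):
            others = [self.similarity(s, t) for j, t in enumerate(samples) if j != i]
            confidences.append(sum(others) / len(others) if others else 0.0)
        return confidences

    def sample(self, samples: list[str]) -> tuple[str, float]:
        # Single code path: selection reuses the helper that the tests call.
        confidences = self._compute_confidences(samples)
        best = max(range(len(samples)), key=confidences.__getitem__)
        return samples[best], confidences[best]
```

With this layout a unit test on `_compute_confidences()` and an end-to-end test on `sample()` cover the same aggregation loop, so the two cannot drift apart.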

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Proposal: Integrating Similarity‑Based Aggregation for Uncertainty Quantification into Mellea

3 participants