[JAX] Add warning if using BSHD and max_segments_per_seq > 1 by jberchtold-nvidia · Pull Request #2796 · NVIDIA/TransformerEngine

jberchtold-nvidia · 2026-03-24T17:25:31Z

Description

Adds a small warning if the user tries to use BSHD with max_segments_per_seq > 1

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Adds a warning if the user tries to use BSHD with max_segments_per_seq > 1
Adds a new test to validate this warning is shown correctly

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

for more information, see https://pre-commit.ci

jberchtold-nvidia · 2026-03-24T17:28:25Z

/te-ci jax

greptile-apps · 2026-03-24T17:30:06Z

Greptile Summary

This PR adds a UserWarning to fused_attn in transformer_engine/jax/attention.py when the caller sets max_segments_per_seq > 1 with a non-THD (BSHD) layout, since sequence packing is only meaningful for THD formats. The change is small, self-contained, and the warning logic — guarded by not qkv_layout.is_thd() — correctly covers all BSHD variants (BS3HD, BSHD_BS2HD, BSHD_BSHD_BSHD). The two issues flagged in earlier review rounds (stacklevel omission and the duplicate if causing IndentationError) are both resolved in the current state.

Key observations:

Warning is correctly placed in the non-legacy code path (after the early-return for deprecated mask usage), so it won't fire spuriously for callers using the old mask-based API.
stacklevel=2 is present, directing the warning to the caller's site rather than library internals.
The PR description and checklist claim a test class (TestMaxSegmentsPerSeqWarning) was added to tests/jax/test_fused_attn.py, but it was removed in a subsequent commit (6d7ad99) with no stated reason, leaving the new warning path without automated test coverage.
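
The stacklevel point above can be illustrated with the standard library alone. This is a minimal sketch, not the TransformerEngine code: `library_function` and `attributed_line` are hypothetical names introduced here. It shows that `stacklevel=1` attributes the warning to the `warnings.warn` line itself, while `stacklevel=2` attributes it to the line that called the function.

```python
import warnings


def library_function(stacklevel):
    # Hypothetical stand-in for a library entry point that emits a warning.
    warnings.warn("illustrative warning", UserWarning, stacklevel=stacklevel)


def attributed_line(stacklevel):
    """Return the source line number the recorded warning points at."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no deduplication
        library_function(stacklevel)  # call site, reported when stacklevel=2
    return caught[0].lineno


if __name__ == "__main__":
    inside = attributed_line(1)  # line of warnings.warn inside library_function
    caller = attributed_line(2)  # line of the call inside attributed_line
    print("stacklevel=1 ->", inside, "| stacklevel=2 ->", caller)
```

With `stacklevel=2`, the message a user sees points at their own call rather than at a line inside the library, which is usually the more actionable location.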

Confidence Score: 5/5

Safe to merge — the one-line logic change is correct and non-breaking; all remaining feedback is P2 quality/documentation concerns.

All identified issues are P2 (missing test coverage, inaccurate PR checklist). No runtime defects, security issues, or logic errors remain. Previous P0/P1 issues (duplicate if / missing stacklevel) are resolved.

No files require special attention for merge safety, but tests/jax/test_fused_attn.py should ideally have the removed test reinstated.

Important Files Changed

Filename	Overview
transformer_engine/jax/attention.py	Adds a correctly placed UserWarning (with stacklevel=2) when max_segments_per_seq > 1 is used with a non-THD layout; prior review issues (duplicate if, missing stacklevel) are resolved.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["fused_attn(qkv, ..., max_segments_per_seq, qkv_layout)"]
    B{"sequence_descriptor is None\nor isinstance(jnp.ndarray)?"}
    C["warnings.warn(DeprecationWarning)\n+ raise ValueError if max_segments_per_seq != 1"]
    D["_legacy_fused_attn(...)  return"]
    E{"max_segments_per_seq > 1\nAND NOT qkv_layout.is_thd()?"}
    F["warnings.warn(UserWarning, stacklevel=2)\n'max_segments_per_seq only applies to THD'"]
    G["_fused_attn(...) → output  return"]

    A --> B
    B -- yes --> C --> D
    B -- no --> E
    E -- yes --> F --> G
    E -- no --> G
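
The branch order in the flowchart can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the actual `fused_attn` implementation: the function name, the boolean parameters `uses_legacy_mask` and `layout_is_thd`, the message strings, and the string return values are all hypothetical and replace the real sequence-descriptor check, `qkv_layout.is_thd()`, and the attention outputs.

```python
import warnings


def fused_attn_sketch(uses_legacy_mask, max_segments_per_seq, layout_is_thd):
    """Hypothetical stand-in mirroring the branch order in the flowchart."""
    if uses_legacy_mask:
        # Deprecated mask-based call: warn, validate, then return early,
        # so the layout check below is never reached on this path.
        warnings.warn("mask-based API is deprecated", DeprecationWarning, stacklevel=2)
        if max_segments_per_seq != 1:
            raise ValueError("legacy path requires max_segments_per_seq == 1")
        return "legacy"

    if max_segments_per_seq > 1 and not layout_is_thd:
        # Non-fatal: the call still proceeds after the warning.
        warnings.warn(
            "max_segments_per_seq only applies to THD",
            UserWarning,
            stacklevel=2,
        )
    return "fused"


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fused_attn_sketch(False, 4, False)
    print(result, [w.category.__name__ for w in caught])
```

Because the legacy branch returns before the layout check, callers of the old mask-based API see only the deprecation warning, which matches the first key observation in the summary.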

Reviews (5): Last reviewed commit: "Merge branch 'main' into jberchtold/te-m..."

transformer_engine/jax/attention.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

jberchtold-nvidia · 2026-03-24T18:53:48Z

/te-ci jax

transformer_engine/jax/attention.py

KshitijLakhani · 2026-03-30T16:57:37Z

tests/jax/test_fused_attn.py

        )
+
+
+class TestMaxSegmentsPerSeqWarning:


Thanks for testing this warning out, Jeremy!
I think negative testing is crucial in such cases.
However, I think merging a test case just to check a warning might be overkill.
We have quite a few warnings in TE that we do not test (at least as far as I'm aware), so I'd suggest we drop the test and keep only the warning change in fused attn.
Local negative testing to confirm that the warning is triggered, followed by a passing CI run for the fused attn tests, should be enough, I think. Thoughts?

That sounds good to me. I'll remove the new test.
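
For reference, the kind of local negative check discussed in this thread can be done with the standard `warnings` module and no test framework. This is a sketch under assumptions, not the removed `TestMaxSegmentsPerSeqWarning` class (its contents are not shown here): `emitted_categories` and `check_layout` are hypothetical helpers, and `check_layout` only imitates the guard rather than calling TransformerEngine.

```python
import warnings


def emitted_categories(fn, *args, **kwargs):
    """Call fn and return the categories of all warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not suppress repeated warnings
        fn(*args, **kwargs)
    return [w.category for w in caught]


def check_layout(max_segments_per_seq, layout_is_thd):
    # Hypothetical stand-in for the guarded call, not the TransformerEngine API.
    if max_segments_per_seq > 1 and not layout_is_thd:
        warnings.warn(
            "max_segments_per_seq only applies to THD",
            UserWarning,
            stacklevel=2,
        )


if __name__ == "__main__":
    # Positive case: packed segments requested with a non-THD layout.
    assert emitted_categories(check_layout, 4, False) == [UserWarning]
    # Negative cases: THD layout, or a single segment per sequence.
    assert emitted_categories(check_layout, 4, True) == []
    assert emitted_categories(check_layout, 1, False) == []
    print("warning fires only for packed non-THD input")
```

The same helper could be pointed at a real call site during local testing by passing the actual function and arguments in place of `check_layout`.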

Co-authored-by: Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

jberchtold-nvidia · 2026-03-30T17:47:08Z

/te-ci jax

KshitijLakhani

LGTM!
Good to merge after successful CI runs for fused attn.
Thanks!

…2796)

* Add warning if using BSHD and max_segments_per_seq > 1

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/jax/attention.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

* Update transformer_engine/jax/attention.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

* Remove warning test

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

---------

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com>

Add warning if using BSHD and max_segments_per_seq > 1

da09120

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

jberchtold-nvidia requested a review from KshitijLakhani March 24, 2026 17:26

[pre-commit.ci] auto fixes from pre-commit.com hooks

c6750aa

for more information, see https://pre-commit.ci

greptile-apps bot reviewed Mar 24, 2026

transformer_engine/jax/attention.py (resolved)

Update transformer_engine/jax/attention.py

87a9cb0

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

greptile-apps bot reviewed Mar 24, 2026

transformer_engine/jax/attention.py (outdated, resolved)

Update transformer_engine/jax/attention.py

7b675ef

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

KshitijLakhani requested changes Mar 30, 2026

KshitijLakhani assigned jberchtold-nvidia Mar 30, 2026

jberchtold-nvidia and others added 3 commits March 30, 2026 10:41

Apply suggestions from code review

7a110b0

Co-authored-by: Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>

Remove warning test

6d7ad99

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

Merge branch 'main' into jberchtold/te-max-segments-per-seq-warning-bshd

1aeb83f

KshitijLakhani self-requested a review March 30, 2026 17:46

KshitijLakhani approved these changes Mar 30, 2026

jberchtold-nvidia merged commit f4debf6 into NVIDIA:main Mar 30, 2026
11 of 14 checks passed

jberchtold-nvidia deleted the jberchtold/te-max-segments-per-seq-warning-bshd branch March 30, 2026 23:36

jberchtold-nvidia merged 7 commits into NVIDIA:main from jberchtold-nvidia:jberchtold/te-max-segments-per-seq-warning-bshd