
[Feature][NSA] Implement Grouped-Query Attention (GQA) kernel with sliding window. #167

@michaelwithu

Parent Issue

Part of #70

Task Type

  • L1: Kernel Implementation (Write TileLang kernel)
  • L2: Op Implementation (Wrapper + Unit Tests + Benchmarks)
  • L3: Function Implementation (Autograd Function)
  • L4: Layer Implementation (nn.Module Wrapper)
  • Benchmarks (Performance Profiling)

Description

This kernel is used in NSA (Native Sparse Attention) to preserve local precision during long-context modeling while retaining fine-grained token selection. The kernel supports both:

  1. Variable-length sequences (varlen)
  2. Sliding window attention (local attention with a fixed window size); a concrete visibility predicate is sketched below
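For reference, here is a minimal sketch of the sliding-window visibility rule, assuming the common flash-attention-style convention that a negative window size means "unlimited on that side" (this issue does not state the convention explicitly):

```python
# Hedged sketch: sliding-window visibility for query position i and key position j.
# Assumes window_size_left / window_size_right < 0 means "no limit on that side",
# a convention borrowed from flash-attention-style APIs (not confirmed by this issue).
def in_window(i: int, j: int, window_size_left: int, window_size_right: int) -> bool:
    left_ok = window_size_left < 0 or j >= i - window_size_left
    right_ok = window_size_right < 0 or j <= i + window_size_right
    return left_ok and right_ok
```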

The kernel should operate on unpadded tensors with the following inputs and outputs:

Inputs

  • Q_unpad: shape [UQ, heads, dim]
  • K_unpad: shape [UKV, head_kv, dim]
  • V_unpad: shape [UKV, head_kv, dim]
  • cu_seqlens_q: shape [B + 1]
  • cu_seqlens_k: shape [B + 1]
  • window_size_left
  • window_size_right

Outputs

  • Output_unpad: shape [UQ, heads, dim]
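As a correctness baseline for the L2 unit tests, a naive PyTorch reference for the varlen GQA + sliding-window semantics might look like the sketch below. The function name `gqa_sliding_window_ref`, the right-aligned query/key position convention, and the negative-means-unlimited window convention are assumptions for illustration, not part of this issue.

```python
# Hedged reference sketch (not the TileLang kernel itself): a naive PyTorch
# implementation of varlen GQA with a sliding window, usable as a unit-test baseline.
import torch


def gqa_sliding_window_ref(
    q_unpad: torch.Tensor,       # [UQ, heads, dim]
    k_unpad: torch.Tensor,       # [UKV, head_kv, dim]
    v_unpad: torch.Tensor,       # [UKV, head_kv, dim]
    cu_seqlens_q: torch.Tensor,  # [B + 1], prefix sums of per-sequence query lengths
    cu_seqlens_k: torch.Tensor,  # [B + 1], prefix sums of per-sequence key lengths
    window_size_left: int,       # assumed: < 0 means unlimited to the left
    window_size_right: int,      # assumed: < 0 means unlimited to the right
) -> torch.Tensor:               # [UQ, heads, dim]
    UQ, heads, dim = q_unpad.shape
    head_kv = k_unpad.shape[1]
    group = heads // head_kv     # GQA: each KV head serves `group` query heads
    scale = dim ** -0.5
    out = torch.empty_like(q_unpad)
    B = cu_seqlens_q.numel() - 1

    for b in range(B):
        q_start, q_end = cu_seqlens_q[b].item(), cu_seqlens_q[b + 1].item()
        k_start, k_end = cu_seqlens_k[b].item(), cu_seqlens_k[b + 1].item()
        sq, sk = q_end - q_start, k_end - k_start

        q = q_unpad[q_start:q_end]             # [sq, heads, dim]
        k = k_unpad[k_start:k_end]             # [sk, head_kv, dim]
        v = v_unpad[k_start:k_end]             # [sk, head_kv, dim]

        # Expand KV heads so query head h uses KV head h // group (GQA).
        k = k.repeat_interleave(group, dim=1)  # [sk, heads, dim]
        v = v.repeat_interleave(group, dim=1)  # [sk, heads, dim]

        # Attention scores: [heads, sq, sk]
        scores = torch.einsum("qhd,khd->hqk", q.float(), k.float()) * scale

        # Sliding-window mask. Query i is aligned to key position i + (sk - sq),
        # i.e. right-aligned queries (an assumption; the kernel should document
        # its own alignment). Rows whose window contains no keys would produce
        # NaN in this naive sketch.
        qi = torch.arange(sq, device=q.device)[:, None] + (sk - sq)
        kj = torch.arange(sk, device=q.device)[None, :]
        mask = torch.ones(sq, sk, dtype=torch.bool, device=q.device)
        if window_size_left >= 0:
            mask &= kj >= qi - window_size_left
        if window_size_right >= 0:
            mask &= kj <= qi + window_size_right
        scores = scores.masked_fill(~mask, float("-inf"))

        attn = torch.softmax(scores, dim=-1)
        out[q_start:q_end] = torch.einsum(
            "hqk,khd->qhd", attn, v.float()
        ).to(q_unpad.dtype)

    return out
```

For a batch with query lengths [3, 5, 2], `cu_seqlens_q` would be `[0, 3, 8, 10]` and `UQ = 10`; the kernel output can then be compared element-wise against this reference in the unit tests.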
