### Parent Issue

Part of #70

### Task Type

### Description
This kernel is used in NSA (Native Sparse Attention) to preserve local precision during long-context modeling by retaining fine-grained token selection. The kernel supports both:
- Variable-length sequences (varlen)
- Sliding window attention (local attention with a fixed window size); the masking rule is sketched after this list
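
As a point of reference, the sliding-window masking rule can be written as a per-position predicate. The sketch below is a minimal illustration, assuming the FlashAttention-style convention that a negative window size leaves that side unbounded; `in_window` is a hypothetical helper name, not part of the kernel interface.

```python
def in_window(i: int, j: int, window_size_left: int, window_size_right: int) -> bool:
    """Whether query position i may attend to key position j.

    Assumes the FlashAttention-style convention that a negative
    window size means "unbounded" on that side.
    """
    left_ok = window_size_left < 0 or j >= i - window_size_left
    right_ok = window_size_right < 0 or j <= i + window_size_right
    return left_ok and right_ok
```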
The kernel should operate on unpadded tensors, with the following inputs and outputs:
Inputs:
- `Q_unpad`: shape `[UQ, heads, dim]`
- `K_unpad`: shape `[UKV, head_kv, dim]`
- `V_unpad`: shape `[UKV, head_kv, dim]`
- `cu_seqlens_q`: shape `[B + 1]`, cumulative start offsets of the query sequences
- `cu_seqlens_k`: shape `[B + 1]`, cumulative start offsets of the key/value sequences
- `window_size_left`
- `window_size_right`
Outputs:
- `Output_unpad`: shape `[UQ, heads, dim]`
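
For clarity, here is a minimal, unoptimized PyTorch reference of the intended semantics. It is a sketch under stated assumptions, not the implementation: it assumes FlashAttention-style varlen conventions (`cu_seqlens_*` hold cumulative start offsets, queries are right-aligned against keys when the two lengths differ, and a negative window size leaves that side unbounded) and GQA-style head grouping where `heads` is a multiple of `head_kv`; the function name is hypothetical.

```python
import torch


def naive_varlen_sliding_window_attention(
    q_unpad,       # [UQ, heads, dim]
    k_unpad,       # [UKV, head_kv, dim]
    v_unpad,       # [UKV, head_kv, dim]
    cu_seqlens_q,  # [B + 1] cumulative start offsets of each query sequence
    cu_seqlens_k,  # [B + 1] cumulative start offsets of each key/value sequence
    window_size_left: int,
    window_size_right: int,
):
    """Unoptimized reference: loops over the batch, materializes full scores."""
    heads, dim = q_unpad.shape[1], q_unpad.shape[2]
    head_kv = k_unpad.shape[1]
    group = heads // head_kv  # GQA: query head h uses kv head h // group
    scale = dim ** -0.5
    out = torch.empty_like(q_unpad)
    batch = cu_seqlens_q.numel() - 1
    for b in range(batch):
        qs, qe = cu_seqlens_q[b].item(), cu_seqlens_q[b + 1].item()
        ks, ke = cu_seqlens_k[b].item(), cu_seqlens_k[b + 1].item()
        q = q_unpad[qs:qe]                                   # [Lq, heads, dim]
        k = k_unpad[ks:ke].repeat_interleave(group, dim=1)   # [Lk, heads, dim]
        v = v_unpad[ks:ke].repeat_interleave(group, dim=1)
        lq, lk = q.shape[0], k.shape[0]
        scores = torch.einsum("qhd,khd->hqk", q, k) * scale  # [heads, Lq, Lk]
        # Right-align queries against keys (FlashAttention varlen convention),
        # then keep key j for query i iff i - left <= j <= i + right,
        # where a negative window size disables that bound.
        qi = torch.arange(lq, device=q.device)[:, None] + (lk - lq)
        kj = torch.arange(lk, device=q.device)[None, :]
        visible = torch.ones(lq, lk, dtype=torch.bool, device=q.device)
        if window_size_left >= 0:
            visible &= kj >= qi - window_size_left
        if window_size_right >= 0:
            visible &= kj <= qi + window_size_right
        scores = scores.masked_fill(~visible, float("-inf"))
        probs = torch.softmax(scores, dim=-1)
        out[qs:qe] = torch.einsum("hqk,khd->qhd", probs, v)
    return out
```

A real kernel would fuse these steps (FlashAttention-style tiling with an online softmax) and skip key/value blocks that fall entirely outside the window; the loop above only pins down the expected numerics.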