Optimize ragged CUDA attention with per-row length masks.
Keywords: attention
Optimize a Triton gated dot-product attention kernel.
Optimize a Triton flash-attention kernel with causal masking.
Optimize a Triton decoding-attention kernel for decoder-style attention shapes.