Optimize a Triton GEMM with GELU around tile-boundary dimensions.
Keywords: cuda
Optimize a Triton GEMM with GELU for small and large K dimensions.
Optimize a Triton GEMM with GELU for awkward matrix dimensions.
Optimize a Triton gated dot-product attention kernel.
Optimize a fused dual-linear Jensen-Shannon divergence Triton kernel.
Optimize a fused linear and cross-entropy Triton kernel.
Optimize a Triton flash-attention kernel with causal masking.
Optimize a Triton decoding-attention kernel for decoder-style attention shapes.
Optimize a Triton cross-entropy loss kernel against PyTorch GPU baselines.